Using the segmented package to detect changepoints in a time series

Abrupt changes in trends in a time series can be estimated by segmented regression.

Install and load the package

install.packages("segmented")
library(segmented)

Read a time series data set. This is biogenic silica (BSi) concentrations from a sediment core from Eleanor Lake, British Columbia.

ele <- read.csv("ele.csv")
names(ele)
plot(ele$age,ele$bsi,type="l", xlab="Age (cal yrs BP)", ylab="BSi (%)")

Note that one dip in values is due to the presence of the Mazama tephra. Remove the rows of data from ele using negative row values.

ele <- ele[-273:-289,]
plot(ele$age,ele$bsi,type="l", xlab="Age (cal yrs BP)", ylab="BSi (%)")

Interpolate the BSi record to a constant time interval. First, view the distribution of the inter-sample age differences (the temporal resolution of the record).

hist(ele$age[2:495]-ele$age[1:494])

It looks like interpolation to 20-year resolution is warranted. Now show the age range of the data set:

min(ele$age)
max(ele$age)

Use approx to interpolate between nearest points surrounding the 20-year target ages.

bsi.20 <-approx(x=ele$age,y=ele$bsi,xout=seq(-40,10860,20))
names(bsi.20)

Show the interpolated values as a red line (col=2) over the original points.

plot(ele$age,ele$bsi, xlab="Age (cal yrs BP)", ylab="BSi (%)")
lines(bsi.20$x,bsi.20$y,col=2)

Note that the interpolation does not integrate over areas of high density sample points. Therefore, a brute-force integration method of interpolation involves 1) interpolate to annual values, then 2) average the annual values in contiguous 20-year blocks. bsi.1 is the annually interpolated time series from the raw data. xout are the target ages. bsi.20i will hold the results.

bsi.1 <-approx(x=ele$age,y=ele$bsi,xout=(seq(-47,10870,1)))
xout <- seq(-40,10860,20)
bsi.20i <- data.frame(x=xout,y=rep(NA,length(xout)))

for(i in 1:length(xout)){
    low.bound <- xout[i]-10
    high.bound <- xout[i]+10
    bsi.20i$y[i] <- mean(bsi.1$y[bsi.1$x>low.bound & bsi.1$x<high.bound])
    }

Plot this interpolated series as a green line.

lines(bsi.20i$x,bsi.20i$y,col=3)

Now we can run segmented regression. First we need to make simple vector objects for the 'x' and the 'y' data. Then we run a regression using a simple command for linear regression (lm).

age <- bsi.20i$x
bsi <- bsi.20i$y
ele.lm <- lm(bsi~age)
lines(age,ele.lm$fitted.values,col=1)

The lm object is used as input to the segmented function. Initial guesses on breakpoints are 4000 and 8000 years ago.

ele.lm.seg <- segmented(ele.lm,seg.Z=~age,psi=c(4000,8000))
lines(xout,ele.lm.seg$fitted.values,col=5)
summary(ele.lm.seg)
AIC(ele.lm.seg)

The aic gives an overall fit of the segmented regression. Continue to add segments (by increaseing the number of events in psi) until aic is no longer decreasing by at least a value of 2 from the previous fit. Note that the locations of the starting-points of the data affect the final solution. Also, note that autocorrelation in the data set will result in over-fitting of the segmented regression model if using AIC alone to determine the number of changepoints.