Geography 423/523. Winter 2019. Dan Gavin.

Problem set 1

The goal of this exercise is to visualize a large vegetation plot data base and analzyze gradients in species abundance with elevation and topographic moisture (=local soil water). Follow through these commands after installing R on your computer. The data come from the ecological plot data base (ecoplot) for SW Oregon (If you want, you can find it under the core data sets in ecoshare.info, but I extracted the data for you). The data come from the same region where Robert Whittaker conducted a classic vegetation survey and presented patterns of vegetation with respect to moisture, elevation, and parent material (rocks of diorite or serpentinite). I have not subsetted the data based on parent material, so the examples below only address elevation and topographic moisture. (Serpentine rocks are relatively rare, anyway). There are other data out there on species distributions, but the ecoplot dataset is one of the more complete datasets in the Pacific Northwest.

Download the files from canvas.uoregon.edu (under Files) into a single folder on your computer. They are also downloadable here (right-click or control-click and select 'save-as'...). You do not need to open these files directly with other software. Just move them into a single folder. These files are:

The following preprocessing was done for you:

  1. I removed a few geographic outlier plots.
  2. I kept all tree and shrub species. I removed other species occurring in <1% of plots. This reduced the original 1200+ species down to 250 species.
  3. I did not update taxonomy to modern taxonomic names.

File sites.csv: This file has information about each site (vegetation plot). There are 8353 sites (8353 rows in the file):

File species.csv has these columns:

The file veg.csv gives the percent cover of each species at each site. The total percent in a site may be over 100% because trees + shrubs + herbs can add up to more than 100%. The file has 250 columns (species) and 8353 rows (sites). The row numbers match the sites in the sites.csv file.


Open R. Under Misc Menu, select the directory where the data files are located

Load the file and see the column names.

veg <- read.csv("veg.csv")
names(veg)
sites <- read.csv("sites.csv")
species <- read.csv("species.csv")

Start exploring data See a spatial distribution of the plots. The $ means select the column in the data set.

plot(sites$UTMEAST,sites$UTMNORTH)

We should use R's map-making capabilities, too.

install.packages("rgdal")
install.packages("lattice")
install.packages("rasterVis")
install.packages("raster")
library(rgdal)
library(lattice)
library(rasterVis)
library(raster)

#change the point locations into a 'spatialpoint' object that maps
sites.sp<-SpatialPoints(coords=cbind(sites$UTMEAST,sites$UTMNORTH))

#read the raster
dem <- raster("swodem.tif")

# this makes the base map (nothing plots now)
base <- rasterVis::levelplot(dem, col.regions = grey(seq(1,0,-0.01)),margin=F)

It would be nice to see the abundance of a species on the plots. This plots Douglas-fir (PSME) as a bubble plot on the base map. [veg$PSME>0,] subsets the rows where veg$PSME>0. cex command scales the dots to Douglas-fir cover level. Colors are red,green,blue,alpha. red,green,blue are from 0 to 1, alpha =0 means transparent, alpha=0.5 means partly transparent.

base + spplot(sites.sp[veg$PSME>0],cex=veg$PSME[veg$PSME>0]/100,col.regions=rgb(1,.5,0,alpha=0.5))

Add a second species (mountain hemlock) after another + sign

base + 
spplot(sites.sp[veg$PSME>0],cex= veg $PSME[veg$PSME>0]/100,col.regions=rgb(1,.5,0,alpha=0.5)) + 
spplot(sites.sp[veg$TSME>0],cex=veg$TSME[veg$TSME>0]/100,col.regions=rgb(0,0,.75,alpha=0.5))

add white fir (ABCO) as yellow and Sitka spruce (PISI) as purple and western hemlock (TSHE) as green

base + 
spplot(sites.sp[veg$PSME>0],cex=veg$PSME[veg$PSME>0]/100,col.regions=rgb(1,.5,0,alpha=0.5)) + 
spplot(sites.sp[veg$TSME>0],cex=veg$TSME[veg$TSME>0]/100,col.regions=rgb(0,0,.75,alpha=0.5)) +
spplot(sites.sp[veg$ABCO>0],cex=veg$ABCO[veg$ABCO>0]/100,col.regions=rgb(1,1,0,alpha=0.5)) +
spplot(sites.sp[veg$PISI>0],cex=veg$PISI[veg$PISI>0]/100,col.regions=rgb(0,0,.4,alpha=0.5)) +
spplot(sites.sp[veg$TSHE>0],cex=veg$TSHE[veg$TSHE>0]/100,col.regions=rgb(0,1,0,alpha=0.5)) 

We can make a page of maps by dividing the graphics monitor into smaller panes and using a loop in some code. First we can focus on just the species that occur in more than 25% of the plots. The data file species.csv has the species list with a column called 'frequency' which is the percent of plots that species occurs in. (note that some very important 'indicator species' of certain habitats are excluded by doing this).

species.25 <- species[species$frequency>25,]
species.25

Now you can see the 28 common species and their names. The numbers in species.25$column provides an index of the columns in the veg file. The next command divides the monitor into 7 rows and 4 columns, with almost no margin around each plot area:

par(mfrow=c(7,4),mar=c(0.5,0.5,1,0.5))

This is where programming comes in very handy. We are not using the base plot with the elevation data, for simplicity.

for(i in species.25$column){
    x<-sites$UTMEAST[veg[,i]>0]
    y<-sites$UTMNORTH[veg[,i]>0]
    z<-veg[veg[,i]>0,i]/100  ##scaled by 100
    plot(x,y,col="blue",cex=z,pch=16,xlim=c(375000,590000),ylim=c(4630000,4863000), xaxt='n',yaxt='n')
    mtext(side=3,line=0.2,text=species$Species_name[i],cex=.6)
    }

You can select just the trees using the code below, then use the code above. Substitute species.25$column with species.trees$column[1:28] or species.trees$column[29:34].

species.trees <- species[species$form=="Tree",]

Reset the graphics monitor to make one large plot using this command:

par(mfrow=c(1,1),mar=c(5, 4, 4, 2) + 0.1)

A lot of spatial overlap occurs because of the steep elevational gradients in the mountains. We can look at plots of abundance vs. elevation. First, see a histogram of the elevations of the plots:

hist(sites$ELEV)

This makes a scatter plot of mountain hemlock cover by elevation

plot(sites$ELEV,veg$TSME,cex=0.4)

We can fit a smoothed line through these points, to see the average change in abundance with elevation First load the function that can calculate this smoothed line. The file 'lowessmoduleR.txt' should be in your working directory

source("lowess_module_R.txt")

This next command should take a few seconds to run. It calculates the average abundance within a window of +/- 300 feet, for 0 to 7500 feet above sea level, at intervals of 10 feet

tsme.300<-lowess_curve(xin=sites$ELEV,yin=veg$TSME,xt=seq(0,7500,10))

Plot the result on the graph. The "lines" command puts a line on the current plot.

lines(tsme.300$xt,tsme.300$yfit,col=2)

Do the same for Douglas-fir or other species.
We can visualize the elevational profiles for all tree species. This does not show how the y-axis varies among plots. The y-axis (percent cover) is square-root transformed as is common for percentage data. Select which ses of species as described above. Run the par command if you need to clear the graphics monitor.

par(mfrow=c(7,4),mar=c(0.5,0.5,1,0.5))
for(i in species.25$column){
    plot(x=sites$ELEV,y=sqrt(veg[,i]),cex=0.4,xaxt='n', yaxt='n',xlim=c(0,7500))
    axis(1,at=seq(0,7000,1000),labels=F)
    mtext(side=3,line=0.15,cex=0.6,text=species$Species_name[i])
    fit <- lowess_curve(xin=sites$ELEV,yin=veg[,i],xt=seq(0,7500,10))
    lines(x=fit$xt,y=sqrt(fit$yfit),col="red")
    }

Now let's try to create Whittaker's diagram of vegetation zones by elevation (y-axis) and topographic moisture (x-axis).

This plots Douglas-fir with MICROPOS as the x-axis and Elevation on the Y-axis This only plots points where Douglas-fir is >0.

par(mfrow=c(1,1),mar=c(5, 4, 4, 2) + 0.1)
plot(sites$MICROPOS[veg$PSME>0],sites$ELEV[veg$PSME>0],col="darkorange",cex=veg$PSME[veg$PSME>0]/100,pch=16,ylim=c(0,7500),xlim=c(1,9.5),ylab="Elevation (feet)",xlab="Topographic Moisture")
text(1,-50,"PSME",col="darkorange",srt=90,cex=0.4)

Add points for other species with the points (here is is mountain hemlock). I am adding 0.1 to the x-axis MICROPOS variable so the points do not overlap.

points(sites$MICROPOS[veg$TSME>0]+0.1,sites$ELEV[veg$TSME>0],col="darkblue",cex=veg$TSME[veg$TSME>0]/100,pch=16)
text(1.1,-50,"TSME",col="darkblue",srt=90,cex=0.4)

Note that Douglas-fir and mountain hemlock do not overlap. Now add Abies concolor (white fir) and tanbark (Notholithcarpus, or LIDE).

points(sites$MICROPOS[veg$ABCO>0]+0.2,sites$ELEV[veg$ABCO>0],col="brown",cex=veg$ABCO[veg$ABCO>0]/100,pch=16)
text(1.2,-50,"ABCO",col="brown",srt=90,cex=0.4)
points(sites$MICROPOS[veg$LIDE>0]+0.3,sites$ELEV[veg$LIDE>0],col="purple",cex=veg$LIDE[veg$LIDE>0]/100,pch=16)
text(1.3,-50,"LIDE",col="purple",srt=90,cex=0.4)
points(sites$MICROPOS[veg$CADE>0]+0.4, sites$ELEV[veg$CADE>0],col="magenta",cex=veg$CADE[veg$CADE>0]/100,pch=16)
text(1.4,-50,"CADE",col="magenta",srt=90,cex=0.4)

It is getting messy to look at! This is the problem with multidimensional data!

We can create clusters of plots based on the similarity of the vegetation composition. Two methods vegetation ecologists use are 'ordination' and 'cluster analysis'. The goal of ordination is to place the plots in a two-dimensional space in which the distance between plots is proportional to the dissimilarity in the vegetation composition of the plots. There are many ways of accomplishing this. A n easy method called "principal components analysis; PCA" can achieve this (almost) but there are some proporties of PCA that are not desireable for vegetation ordination. A good method called "nonmetric multidimensional scaling" literally moves points (sites) around in a 2-dimentionsal space until it finds a best fit where the distance between points is proportional to the compositional difference. Since this takes a long time to run with 8353 sites, we will use a version of PCA called "detrended correspondence analysis" or DCA. It works OK. All vegetation ordination functions are in the R package called vegan.

install.packages("vegan")
library(vegan)

Reduce number of species to those occurring in 5% of plots or more

species.5 <- species[species$frequency>5,]
 length(species.5$column)
# [1] 89 ...89 species remain

# remove rows (plots) that sum to less than 5% percent cover (after trimming out some species)
veg.5 <- veg[,species.5$column]
sums <- array(NA,8353)
for(i in 1:8353){
    sums[i]<-sum(veg.5[i,])
    }
veg.5<-veg.5[sums>5,]  # removes zero rows
sites.5 <- sites[sums>5,]  
length(veg.5[,1])
# [1] 8321 (remaining plots)

Detrended correspondence analysis arranges plots into a a several-dimensional space, but we focus on the most important first two dimensions. Run the analysis on proportions (0 to 1), not percentages (0 to 100). In the function below, all percentages going into the decorana function are divided by 100.

swo.dca <- decorana(veg.5/100)
plot(swo.dca,display="sites",cex=0.4,xlim=c(-2.2,4.1))
text(swo.dca, display="species",cex=0.4,col="red")

The small black dots are the sample plots arrayed into an abstract ordination space. The red letters are the locations where each species is most abundant. What environmental gradients are represented by the horizontal axis and the vertical axis? Note where TSME (mountain hemlock) is located. This is a high elevation species. It is possible to add a smoothed surface that shows how the typical elevation changes over the ordination space. Note that only vegetation data were used to "build" this space.

es <- ordisurf(swo.dca,sites.5$ELEV)

We can also see if distance from the coast is meaningful (using the UTM easting).

es <- ordisurf(swo.dca,sites.5$UTMEAST)

Further explore other variables...such as ASPECT. Here is a transform of aspect so that cool moist NE facing slopes have a value of 0, and warm dry SW facing slopes have a value of 1.

t.aspect <- (1-cos((sites.5$ASPECT-45)*pi/180))/2
es <- ordisurf(swo.dca,t.aspect)

You can access the "site scores" (location along each axis) using the vegan function "scores" and make scatter plots with environmental gradients (i.e. what you saw on the surfaces in the previous commands. the [,1] and [,2] means select the first column (DCA axis 1) or the second column.

par(mfrow=c(2,1),mar=c(3,4,1,1))  # two rows and one column of plots
plot(sites.5$ELEV,scores(swo.dca)[,1],ylab="DCA axis 1",xlab="Elevation (feet)", cex=0.4)
plot(t.aspect,scores(swo.dca)[,2],ylab="DCA axis 2",xlab="Transformed aspect", cex=0.4)

Strangely I didn't see the strong correlation with aspect evident from the earlier function. There is a virtually endless series of analyses that could be done. How about geographic patterns in DCA axis 1 and DCA axis 2? (and you can also try axes 3 and 4 of DCA in the column number '[,1] or [,2], or [,3] in the lines below.) Since these values go into negative range, we need to scale them from 0 to 1.5 so the symbol size varies appropriately.

t <- scores(swo.dca)[,1]  # using axis 1 as an example.
dca.minmax <- 1.5*(t-min(t))/(max(t)-min(t))

sites.5.sp <- SpatialPoints(coords=cbind(sites.5$UTMEAST,sites.5$UTMNORTH))
base + spplot(sites.5.sp,cex=dca.minmax,col.regions=rgb(0,0,1,alpha=0.5))

To add text (title) you need the grid package

library(grid) #you may need to install.packages("grid") first
grid.text("DCA axis 1 scores",x=.2,y=.9,just="left")

Questions for problem set 1

Please write your complete thoughts concisely on these topics after exploring the data as shown above. Further exploring by examining additional species, looking up general species characteristics on Wikipedia, will certainly reveal additional vegetation patterns. Include figures to show what you are talking about (just save the graphics window in R to a PDF file, which you can put into Microsoft Word). I am looking for two pages of text plus figures. Submit on canvas.uoregon.edu

  1. Are species patterned elevationally, and are there important elevational transition points?
  2. How do these results compare with the concepts in Whittaker regarding vegetation gradients in the Siskiyou Mountains?
  3. Despite identifying some species pattering on environmental gradients (elevation, aspect), there is a lot of unexplained variability. What other environmental data would you try to pull in to explain the patterning of species on the landscape>