Introduction

Cyanobacteria, or blue-green algae, are photosynthetic prokaryotes (Wiegand et al., 2005) capable of producing myriad secondary metabolites, many of which have been identified as potent cyanotoxins (Blaha et al., 2009). These cyanotoxins can be classified into five categories: hepatotoxins (affecting the liver), neurotoxins (affecting the nervous system), cytotoxins (affecting cells), dermatotoxins (affecting the skin), and irritant toxins (Wiegand et al., 2005). Because the most extreme cases of exposure end in death, understanding the factors that contribute to cyanobacterial vigor is important for protecting the health of people living near affected waters.

One important factor behind the increase in observed cyanobacterial blooms is the spread of anthropogenic eutrophication, a state of excessive nutrient load in a water body caused largely by human land use practices (Blaha et al., 2009). As agriculture and urbanization expand, more impermeable surfaces and more nutrient-rich land together increase the amount of nutrient-laden surface runoff. This runoff carries nutrients from the land to the water, producing eutrophic conditions that favor excessive plant and algae growth. According to a 1999 estimate by Bartram et al., over 40% of lakes and reservoirs have become eutrophic, presenting ideal conditions for widespread cyanobacterial blooms. Almost twenty years later, an updated proportion would likely be higher.

In addition to the increase in anthropogenic eutrophication, global climate change also plays a large part in the increased burden of cyanobacterial blooms (Blaha et al., 2009). Put simply, when the weather is hot and calm and nutrient levels are high, more widespread cyanobacterial blooms are expected (NYSDEC, 2017). These preferences are troubling, since scientists widely agree that temperatures will continue to rise for decades to come because of anthropogenic greenhouse gases that have accumulated in the atmosphere (NASA, 2017). Forecasts indicate a probable temperature increase of 2.5 to 10 degrees Fahrenheit over the next century, offering ideal conditions for cyanobacteria to thrive (NASA, 2017).

In New York State, citizens and government bodies understand all too well how cyanobacteria are becoming increasingly successful in water bodies across the state. The New York State Department of Environmental Conservation has a webpage dedicated to the monitoring of harmful algal blooms, relying on information gathered from the DEC Lake Classification and Inventory Program, the Citizen Statewide Lake Assessment Program volunteers, and public reports submitted through the Suspicious Algal Bloom Report form (NYSDEC, 2017). According to the archived data, between 2012 and 2016 suspicious blooms increased from 19 to 41, confirmed blooms increased from 29 to 95, and highly toxic blooms increased from 9 to 37 (NYSDEC HAB Program Archive Summary, 2016).

The Finger Lakes represent 11 of New York State’s monitored water bodies, all close to one another geographically and yet impacted to varying degrees by cyanobacteria. The table below shows how many weeks per year harmful algal blooms were detected by the NYS Department of Environmental Conservation:

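library(readr)
library(DT)
# Read the 2012-2016 bloom weeks for the 11 lakes, skipping the CSV header row
lakedata=read.table(file = 'data/BloomWks_2.csv',skip=1,sep=',',col.names=c('Lake','BlmWk12','BlmWk13','BlmWk14','BlmWk15','BlmWk16'),nrows=11)
# 2017 bloom weeks, entered by hand and appended as a new column
BlmWk17=c(8,0,0,14,4,6,5,10,10,5,2)
lakedata2017=cbind(lakedata,BlmWk17)
# Interactive table showing six lakes per page
datatable(lakedata2017, options = list(pageLength = 6))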

From this table, we can clearly see that some of the Finger Lakes experience more weeks of bloom than others. We can also see that many of the lakes that had never had recorded blooms experienced them in 2017, a troubling sign.

These data raise a few questions: 1) If these lakes are so similar in geography, topography, and use, why are they impacted so disparately by cyanobacteria? 2) What might explain the increasing bloom records across all lakes?

For this project, we will seek to answer these questions by looking into land use. Specifically, this project will seek to quantify the proportion of each lake’s watershed that can be categorized as agricultural land. It is hypothesized that lakes with higher proportions of agricultural and urban lands will have higher nutrient loads and will therefore face higher burdens of cyanobacterial bloom. In addition, if land use has changed or is changing, this could indicate why blooms are becoming more widespread across all lakes.

Materials and methods

In order to quantify the ratio of agricultural land for each lake’s watershed, we first have to delineate the boundaries between the lakes’ watersheds.

Originally, we planned to use the ‘rgrass7’ package to call GRASS from within R. In this way, we could in theory use the watershed tools already built into GRASS to help with the watershed delineation. The problem, however, was the learning curve. The process within GRASS is already somewhat complicated, so translating it into the R environment was even more so, especially since there was little literature on how it should be done. Instead, the watershed delineation was performed entirely within GRASS.

The data used to perform this watershed delineation are 1 arc-second DEM data that can be freely downloaded by anyone from USGS’ National Map database.

First let’s find the elevation data that we need!

library(sp)
library(raster)
library(rgdal)
library(XML)
library(RArcInfo)
library(imager)
library(sf)
library(gridExtra)
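# Local data directory (machine-specific path)
datadir='C:/Users/gills/Desktop/Geo503_R/FinalProjectR/RDataScience_Project/data'
# Country outline and elevation data downloaded with raster::getData
USA=getData('GADM',country='USA',level=0,path=datadir)
dem_FL=getData('alt',country='USA',lat=42.72,lon=-77.05,path=datadir,download=T)
# getData returns several rasters for the USA; select the USA1_msk_alt raster by name
dem_cont_USA=dem_FL$`C:/Users/gills/Desktop/Geo503_R/FinalProjectR/RDataScience_Project/data/USA1_msk_alt.grd`
# Crop to the region of interest (longitude -79 to -75, latitude 41 to 44) and save as GeoTIFF
dem_roi=crop(dem_cont_USA,extent(-79,-75,41,44),filename=file.path(datadir,"dem_flr.tif"),overwrite=T)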

After narrowing down our DEM data to the region of interest around the 11 Finger Lakes, we can use the raster package to analyze terrain details like slope, aspect, and flow direction as preliminary steps toward identifying the boundaries between the watersheds of the 11 Finger Lakes.

FlowDir=terrain(x = dem_roi,opt = 'flowdir',unit = 'degrees',neighbors = 8)

Slope=terrain(x=dem_roi,opt='slope',unit='degrees',neighbors=8)

Aspect=terrain(x=dem_roi,opt='aspect',unit='degrees',neighbors=8)

These calculations within R have given us a good idea of how the terrain behaves in the Finger Lakes Region and thus where we’d expect water to flow. For the actual watershed delineation, however, we’ll turn it over to GRASS!

Once GRASS helps us identify watershed boundaries, however, we’ll need a land use map for the Finger Lakes Region. This can be found freely online through the National Land Cover Database (NLCD) or you can create a land use map of your own using remote sensing software such as ENVI and freely downloadable Landsat imagery from USGS’ Earth Explorer or GloVis interfaces. Since I did this for another class, I’ll bring in my own work here and we can do a data overlay to calculate land use proportions for each lake. My land use map was derived from a Landsat 8 OLI image that was captured on April 23, 2017. The image was preprocessed and classified within ENVI software.

After pulling in our watershed shapefile and our land use classification for the region, R will help us perform overlay analysis in order to pull information and knowledge out of the once raw data. First, we’ll need to break up the watershed multipolygon shapefile into separate polygons representing each separate lake watershed. This will make it simpler to analyze land use within each separate watershed.

Separating the Watershed into Sub-watersheds

watersheds=readOGR(dsn='FL_polygons.shp')

## OGR data source with driver: ESRI Shapefile 
## Source: "FL_polygons.shp", layer: "FL_polygons"
## with 11 features
## It has 2 fields
## Integer64 fields read as strings:  Area

watersheds_proj=readOGR(dsn='data/FL_polygons.shp')

## OGR data source with driver: ESRI Shapefile 
## Source: "data/FL_polygons.shp", layer: "FL_polygons"
## with 11 features
## It has 2 fields
## Integer64 fields read as strings:  Area

conesus=watersheds_proj[watersheds_proj$Name=='Conesus',]
otisco=watersheds_proj[watersheds_proj$Name=='Otisco',]
skaneateles=watersheds_proj[watersheds_proj$Name=='Skaneateles',]
honeoye=watersheds_proj[watersheds_proj$Name=='Honeoye',]
canandaigua=watersheds_proj[watersheds_proj$Name=='Canandaigua',]
owasco=watersheds_proj[watersheds_proj$Name=='Owasco',]
cayuga=watersheds_proj[watersheds_proj$Name=='Cayuga',]
seneca=watersheds_proj[watersheds_proj$Name=='Seneca',]
keuka=watersheds_proj[watersheds_proj$Name=='Keuka',]
canadice=watersheds_proj[watersheds_proj$Name=='Canadice',]
hemlock=watersheds_proj[watersheds_proj$Name=='Hemlock',]

Clipping Land Use Raster by Watershed Boundaries

Next, we’ll need to clip our land use raster image 11 times using the 11 separate watershed polygons. This will enable us to visualize the land use separately for each watershed and analyze the data of each watershed separately.

landuse=raster(x='data/Class2017_2')
# Conesus Lake
landuse_conesus=crop(landuse, conesus)
mask_conesus=mask(landuse_conesus,conesus)
conesus_data=extract(landuse_conesus, watersheds_proj[watersheds_proj$Name == "Conesus",])
conesus_summary=table(conesus_data)

#Otisco Lake
landuse_otisco=crop(landuse, otisco)
mask_otisco=mask(landuse_otisco,otisco)
otisco_data=extract(landuse_otisco, watersheds_proj[watersheds_proj$Name == "Otisco",])
otisco_summary=table(otisco_data)

#Skaneateles Lake
landuse_skaneateles=crop(landuse, skaneateles)
mask_skaneateles=mask(landuse_skaneateles,skaneateles)
skaneateles_data=extract(landuse_skaneateles, watersheds_proj[watersheds_proj$Name == "Skaneateles",])
skaneateles_summary=table(skaneateles_data)

#Honeoye Lake
landuse_honeoye=crop(landuse, honeoye)
mask_honeoye=mask(landuse_honeoye,honeoye)
honeoye_data=extract(landuse_honeoye, watersheds_proj[watersheds_proj$Name == "Honeoye",])
honeoye_summary=table(honeoye_data)

#Canandaigua Lake
landuse_canandaigua=crop(landuse, canandaigua)
mask_canandaigua=mask(landuse_canandaigua,canandaigua)
canandaigua_data=extract(landuse_canandaigua, watersheds_proj[watersheds_proj$Name == "Canandaigua",])
canandaigua_summary=table(canandaigua_data)

#Owasco Lake
landuse_owasco=crop(landuse, owasco)
mask_owasco=mask(landuse_owasco,owasco)
owasco_data=extract(landuse_owasco, watersheds_proj[watersheds_proj$Name == "Owasco",])
owasco_summary=table(owasco_data)

#Cayuga Lake
landuse_cayuga=crop(landuse, cayuga)
mask_cayuga=mask(landuse_cayuga,cayuga)
cayuga_data=extract(landuse_cayuga, watersheds_proj[watersheds_proj$Name == "Cayuga",])
cayuga_summary=table(cayuga_data)

#Seneca Lake
landuse_seneca=crop(landuse, seneca)
mask_seneca=mask(landuse_seneca,seneca)
seneca_data=extract(landuse_seneca, watersheds_proj[watersheds_proj$Name == "Seneca",])
seneca_summary=table(seneca_data)

#Keuka Lake
landuse_keuka=crop(landuse, keuka)
mask_keuka=mask(landuse_keuka,keuka)
keuka_data=extract(landuse_keuka, watersheds_proj[watersheds_proj$Name == "Keuka",])
keuka_summary=table(keuka_data)

#Canadice Lake
landuse_canadice=crop(landuse, canadice)
mask_canadice=mask(landuse_canadice,canadice)
canadice_data=extract(landuse_canadice, watersheds_proj[watersheds_proj$Name == "Canadice",])
canadice_summary=table(canadice_data)

#Hemlock Lake
landuse_hemlock=crop(landuse, hemlock)
mask_hemlock=mask(landuse_hemlock,hemlock)
hemlock_data=extract(landuse_hemlock, watersheds_proj[watersheds_proj$Name == "Hemlock",])
hemlock_summary=table(hemlock_data)
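
One detail worth guarding against in the tabulation above: table() only reports classes that actually occur, so a watershed with no pixels in one class would produce a shorter table that no longer lines up with the others when the summaries are combined. The following is a minimal base R sketch of one way to keep all five classes in every summary. The helper name count_classes and the toy pixel vectors are hypothetical stand-ins for the values returned by extract(); they are not part of the original analysis.

```r
# Land use class labels, in the order of class codes 1-5 used in this project.
class_labels <- c("Urban", "Agriculture", "Water", "Forest", "Bare")

# Hypothetical helper: count pixels per class, keeping a zero for any class
# that does not occur so every watershed yields five entries.
count_classes <- function(pixel_values, labels = class_labels) {
  counts <- table(factor(pixel_values, levels = seq_along(labels)))
  setNames(as.integer(counts), labels)
}

# Toy pixel vectors (invented for illustration, not project data).
toy_pixels <- list(
  LakeA = c(1, 2, 2, 4, 4, 4, 3),
  LakeB = c(2, 2, 4, 4)  # no Urban, Water, or Bare pixels
)

# One row per watershed, one column per class.
toy_counts <- t(sapply(toy_pixels, count_classes))
print(toy_counts)
```

With real data, each list element would be the pixel vector extracted for one watershed, and NA pixels outside the classified image would simply be left out of the counts.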

Above, the data behind the clipped land use maps for each watershed have been extracted into 11 summary tables. In each table, there is a count of how many pixels in each land use map are represented by each land use class. The classes are numbered 1-5 and represent Urban, Agriculture, Water, Forest, and Bare Land respectively. While it is useful to have 11 separate watershed land use summaries, it would be even more useful to put all of that summarized data into one table.

Pulling Watershed-specific Land Use Data into Summary Table

conesus_matrix=c(conesus_summary)
otisco_matrix=c(otisco_summary)
skaneateles_matrix=c(skaneateles_summary)
honeoye_matrix=c(honeoye_summary)
canandaigua_matrix=c(canandaigua_summary)
owasco_matrix=c(owasco_summary)
cayuga_matrix=c(cayuga_summary)
seneca_matrix=c(seneca_summary)
keuka_matrix=c(keuka_summary)
canadice_matrix=c(canadice_summary)
hemlock_matrix=c(hemlock_summary)
lakes=c('Conesus','Otisco','Skaneateles','Honeoye','Canandaigua','Owasco','Cayuga','Seneca','Keuka','Canadice','Hemlock')
FL_results=as.data.frame(rbind(conesus_matrix,otisco_matrix,skaneateles_matrix,honeoye_matrix,canandaigua_matrix,owasco_matrix,cayuga_matrix,seneca_matrix,keuka_matrix,canadice_matrix,hemlock_matrix))
row.names(FL_results)=c(lakes)
colnames(FL_results)=c('Urban','Agriculture','Water','Forest','Bare')
FL_results$sum_pixels=c(rowSums(FL_results))
FL_results$U_prop=c(FL_results$Urban/FL_results$sum_pixels)
FL_results$U_prop=round(FL_results$U_prop,2)
FL_results$A_prop=c(FL_results$Agriculture/FL_results$sum_pixels)
FL_results$A_prop=round(FL_results$A_prop,2)
FL_results$W_prop=c(FL_results$Water/FL_results$sum_pixels)
FL_results$W_prop=round(FL_results$W_prop,2)
FL_results$F_prop=c(FL_results$Forest/FL_results$sum_pixels)
FL_results$F_prop=round(FL_results$F_prop,2)
FL_results$B_prop=c(FL_results$Bare/FL_results$sum_pixels)
FL_results$B_prop=round(FL_results$B_prop,2)
FL_results$Urban=NULL
FL_results$Agriculture=NULL
FL_results$Water=NULL
FL_results$Forest=NULL
FL_results$Bare=NULL
FL_results$sum_pixels=NULL
conesus@data$Area

## [1] 181
## Levels: 112 1432 177 178 181 2034 206 367 45 489 543

otisco@data$Area

## [1] 177
## Levels: 112 1432 177 178 181 2034 206 367 45 489 543

skaneateles@data$Area

## [1] 206
## Levels: 112 1432 177 178 181 2034 206 367 45 489 543

honeoye@data$Area

## [1] 178
## Levels: 112 1432 177 178 181 2034 206 367 45 489 543

canandaigua@data$Area

## [1] 489
## Levels: 112 1432 177 178 181 2034 206 367 45 489 543

owasco@data$Area

## [1] 543
## Levels: 112 1432 177 178 181 2034 206 367 45 489 543

cayuga@data$Area

## [1] 2034
## Levels: 112 1432 177 178 181 2034 206 367 45 489 543

seneca@data$Area

## [1] 1432
## Levels: 112 1432 177 178 181 2034 206 367 45 489 543

keuka@data$Area

## [1] 367
## Levels: 112 1432 177 178 181 2034 206 367 45 489 543

canadice@data$Area

## [1] 45
## Levels: 112 1432 177 178 181 2034 206 367 45 489 543

hemlock@data$Area

## [1] 112
## Levels: 112 1432 177 178 181 2034 206 367 45 489 543

FL_results$Wtshd_Area=c(181,177,206,178,489,543,2034,1432,367,45,112)
#Lake Area measurements pulled from Google Search
FL_results$Lake_Area=c(13.84,8.29,35.61,7.17,43.50,26.97,172.00,173.30,47.47,2.63,7.28)
FL_results$LW_prop=FL_results$Lake_Area/FL_results$Wtshd_Area
FL_results$LW_prop=round(FL_results$LW_prop,2)
#Bloom weeks are the summed bloom weeks from each lake between 2012-2017
FL_results$BlmWks=c(22,2,5,68,11,48,20,13,6,0,0)

Above, the data were rearranged to create a table summarizing the analysis results. The table summarizes each lake’s land use makeup, watershed size, lake size, and cyanobacterial bloom frequency.
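
The conversion from pixel counts to land use proportions can also be expressed compactly with prop.table(). The sketch below is an illustration only: the counts are invented, the object names toy_counts, toy_props, and toy_results are hypothetical, and only the column names follow those used for FL_results.

```r
# Hypothetical pixel counts (rows = watersheds, columns = land use classes).
# The numbers are invented for illustration and are not project results.
toy_counts <- matrix(
  c(120, 600, 80, 190, 10,
     40, 300, 60, 590, 10),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("LakeA", "LakeB"),
                  c("Urban", "Agriculture", "Water", "Forest", "Bare"))
)

# Row-wise proportions: each count is divided by its watershed's pixel total.
toy_props <- round(prop.table(toy_counts, margin = 1), 2)
colnames(toy_props) <- c("U_prop", "A_prop", "W_prop", "F_prop", "B_prop")

toy_results <- as.data.frame(toy_props)
print(toy_results)
```

Because the rounding is applied to each class separately, the rounded proportions in a row may not sum to exactly 1.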

With these data summarized into a table, it is possible to dig into it in order to look for patterns. In this way, it might be possible to understand why the eleven lakes that are so close in proximity are affected so differently by cyanobacteria.

Scatter plots can help visualize how data are or are not related. With the use of scatter plots, we can look for relationships between the number of cyanobacterial bloom weeks and:

Proportion of Watershed that is Urban
Proportion of Watershed that is Agricultural
Proportion of Watershed that is Water
Proportion of Watershed that is Forested
Proportion of Watershed that is Bare
Size of Watershed
Size of Lake
Proportion of Lake Area to Watershed Area
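
As a rough sketch of how the comparisons listed above might be set up, the code below rebuilds the non-land-use columns from values already given in this document (watershed areas as printed from the shapefile's Area field, lake areas, and summed bloom weeks) and computes a Spearman rank correlation for each against bloom weeks. The data frame name bloom and the helper plot_bloom_scatter are hypothetical, and the choice of Spearman correlation is an assumption rather than the method used in this project. With only 11 lakes, any correlation should be read with caution.

```r
lakes <- c("Conesus", "Otisco", "Skaneateles", "Honeoye", "Canandaigua",
           "Owasco", "Cayuga", "Seneca", "Keuka", "Canadice", "Hemlock")

# Values copied from the code and console output earlier in this document.
bloom <- data.frame(
  row.names  = lakes,
  Wtshd_Area = c(181, 177, 206, 178, 489, 543, 2034, 1432, 367, 45, 112),
  Lake_Area  = c(13.84, 8.29, 35.61, 7.17, 43.50, 26.97, 172.00, 173.30,
                 47.47, 2.63, 7.28),
  BlmWks     = c(22, 2, 5, 68, 11, 48, 20, 13, 6, 0, 0)
)
bloom$LW_prop <- round(bloom$Lake_Area / bloom$Wtshd_Area, 2)

# Every column except the response is treated as a candidate predictor.
predictors <- setdiff(names(bloom), "BlmWks")

# Spearman rank correlation of each predictor with summed bloom weeks.
rank_cor <- sapply(predictors, function(p) {
  cor(bloom[[p]], bloom$BlmWks, method = "spearman")
})
print(round(rank_cor, 2))

# Hypothetical helper: draw one scatter plot per predictor, side by side.
plot_bloom_scatter <- function(data = bloom, vars = predictors) {
  old_par <- par(mfrow = c(1, length(vars)))
  on.exit(par(old_par))
  for (v in vars) {
    plot(data[[v]], data$BlmWks, xlab = v,
         ylab = "Bloom weeks (2012-2017)", main = v, pch = 19)
  }
  invisible(NULL)
}
```

Calling plot_bloom_scatter() draws the panels; the land use proportion columns from FL_results could be added to the data frame in the same way.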

After completing these steps, perhaps the results can inform about which factors might have the greatest impact on cyanobacterial bloom frequency.

Results

Basic Terrain Analysis

plot(dem_roi)

DEM for Finger Lakes Region

The above plot shows the DEM data for the region of interest surrounding the Finger Lakes. The DEM was cropped with a wide margin around the lakes because it is hard to tell just by looking at the DEM data how far each lake’s watershed might extend.

plot(Slope)

Slope for Finger Lakes Region

The above plot demonstrates how the slope changes throughout the region of interest. Slope plays an important part in the process of how water moves through a watershed since water follows the path of least resistance.

plot(Aspect)

Aspect for Finger Lakes Region

The above plot demonstrates the aspect of the land in the Finger Lakes Region. Since aspect informs about the compass direction any given slope faces, understanding the aspect of the terrain is also an important consideration in trying to understand how water might move through the watershed.

plot(FlowDir)

Flow Direction for Finger Lakes Region

This plot shows the flow direction within the region of interest. The legend for this parameter goes up to 128, and its values are not necessarily intuitive. To understand them, think of water falling into a pixel. If the water does not stay in that pixel, it can move in one of eight directions, into one of the eight surrounding pixels. Each direction is represented by a number, so the numerical scale of the legend can be understood as follows:

East : 2^0 = 1
Southeast : 2^1 = 2
South : 2^2 = 4
Southwest : 2^3 = 8
West : 2^4 = 16
Northwest : 2^5 = 32
North : 2^6 = 64
Northeast : 2^7 = 128
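
The encoding listed above can be written as a small lookup table, which makes it easier to read individual cell values from the flow direction raster. This is a minimal sketch; the names flowdir_codes and decode_flowdir are hypothetical helpers, not functions from the raster package.

```r
# Flow direction codes as listed above: powers of two, starting at East
# and proceeding clockwise.
flowdir_codes <- c(East = 1, Southeast = 2, South = 4, Southwest = 8,
                   West = 16, Northwest = 32, North = 64, Northeast = 128)

# Hypothetical helper: translate numeric codes into direction names.
# Codes that are not in the table come back as NA.
decode_flowdir <- function(code) {
  names(flowdir_codes)[match(code, flowdir_codes)]
}

print(decode_flowdir(c(1, 64, 8)))
```

The same helper could be applied to a vector of cell values taken from FlowDir to summarize which directions dominate in a given area.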

Loading Watershed Boundaries into R from GRASS

plot(watersheds)

Watershed Boundaries: Finger Lakes Region, NY

Originally, I had hoped to complete the entire watershed delineation within R using the rgrass7 package. This package in theory calls on the GRASS software in order to use GRASS’s watershed tools. The code seemed simple, and yet was very reluctant to work. Having some experience delineating watershed boundaries within GRASS, I know that the process is not exactly straightforward even there, so it makes sense that it would be no simpler with R as a proxy. Instead, I worked within the GRASS GUI to produce watershed boundaries using the DEM data we already have. This watershed shapefile can be seen above.

Overlaying Watershed Boundaries onto DEM

DEM_proj=raster(x = 'data/dem_flr_proj.tif')
DEM_proj_crop=crop(DEM_proj,extent(250000,420000,4650000,4800000),filename=file.path(datadir,"DEM_crop.tif"),overwrite=T)
plot(DEM_proj_crop)
plot(watersheds_proj, add=T)

Overlaying Watershed Boundaries over DEM

Here, we can see how the watershed boundaries fit nicely onto the DEM from earlier in a way that makes sense visually. Boundaries fall on what appear to be local high points between the lakes.

The reluctance of R to be useful in the watershed delineation process, however, turned out to be useful. Instead of the focus of this project being on watershed delineation, we can take it a step further in order to understand more about the land use within the watersheds.

Land Use Classification Data via Analysis of Landsat 8 Image with Validation Information

plot(landuse)

Land Use Classification derived from a Landsat 8 OLI Image from April 23, 2017

conf_mat=load.image(file = 'data/accuracy.jpg')
plot(conf_mat)

I created the above map in ENVI, using a Landsat 8 OLI image captured on April 23, 2017 that was downloaded from USGS’s National Map. The image was preprocessed and then a supervised classification was run on it. In the classification map above, the colors signify:

Yellow - Agriculture
Green - Forest
Red - Bare Land
Blue - Water
Light blue - Urban

While this land use map is in no way perfect, validation of the classification did indicate a high accuracy of classification. Using training samples collected from higher spatial resolution imagery, a confusion matrix was constructed and demonstrates an overall accuracy of 98.246%. This lends some confidence to the classification results, supporting its usefulness in the watershed analysis.
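
For readers unfamiliar with how an overall accuracy figure is obtained from a confusion matrix, the sketch below shows the arithmetic on a small made-up example. The matrix values are hypothetical and are not the validation data from this project; only the calculation (correctly classified samples divided by all samples) is the point.

```r
# Hypothetical confusion matrix (rows = classified class, columns = reference
# class). Values are invented for illustration only.
conf <- matrix(
  c(20,  1,  0,
     2, 30,  1,
     0,  1, 45),
  nrow = 3, byrow = TRUE,
  dimnames = list(classified = c("Agriculture", "Forest", "Water"),
                  reference  = c("Agriculture", "Forest", "Water"))
)

# Overall accuracy: samples on the diagonal divided by all samples.
overall_accuracy <- sum(diag(conf)) / sum(conf)

# Per-class accuracies: user's (by row) and producer's (by column).
users_accuracy     <- diag(conf) / rowSums(conf)
producers_accuracy <- diag(conf) / colSums(conf)

print(overall_accuracy)
```

Per-class accuracies are worth checking alongside the overall figure, since a high overall accuracy can hide a weakly classified class.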

Watershed / Land Use Overlay Analysis

plot(landuse)

plot(watersheds_proj, add=T)

Overlaying Watershed Boundaries over Land Use Classification Map

In the above map, we can see how the land use varies between watersheds. There is clearly an issue at the bottom of the land use map, where the boundary of the original Landsat image did not extend to the full reach of two of the central Finger Lakes. This should and will be addressed in future land use classification analysis by overlaying two adjacent Landsat images.

For the purposes of this project, however, we can still do our analysis, simply keeping in mind that the information we have for lakes 5 and 6, Keuka Lake and Seneca Lake, is incomplete.