Working with Spatial Data

Learning Objectives

After completing this assignment, students should be able to:

import, view properties, and plot a raster

perform simple raster math

extract points from a raster using a shapefile

evaluate a time series of rasters

Reading

Topics
- Rasters
- Raster math
- Plotting spatial images
- Shapefile import
- Integrate raster and vector data
Readings
Additional information
- Rasters in R
- Vectors in R
  - Part I
  - Part II
- Projections in R
- Combining rasters and vectors in R

Lecture Notes

Spatial Data Introduction

Exercises

Canopy Height from Space (30 pts)

The National Ecological Observatory Network (NEON) has invested in high-resolution airborne imaging of its field sites. Elevation models generated from LiDAR can be used to map the topography and vegetation structure at these sites, and the data become especially powerful when you compare ecological processes across sites. Download the elevation models for Harvard Forest (HARV) and the San Joaquin Experimental Range (SJER), along with the plot locations for each site. Plots within a site are often used as representative samples of the larger site; they serve as reference areas that provide more detailed information and help verify the accuracy of satellite imagery (i.e., ground truth).
1. Create two canopy height models using simple raster math (chm = dsm - dtm, where the digital surface model (DSM) gives the elevation of the top of whatever covers the ground, such as the vegetation canopy, and the digital terrain model (DTM) gives the elevation of the bare ground): one for the HARV site (done during the lecture) and one for the SJER site.
2. Create maps and histograms of canopy heights for both sites using ggplot.
3. Add the corresponding points from the plot_locations folder to each site's map.
4. Create a single data frame with two columns: the maximum canopy height for each HARV plot and the maximum canopy height for each SJER plot. When extracting the canopy height values, use a buffer of 10 (in map units; these NEON rasters use UTM coordinates, so this is 10 m).
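A minimal sketch of steps 1 and 4 with the raster package, assuming small made-up rasters in place of the real NEON files: the 3 x 3 grids, their coordinates, and the plot locations below are hypothetical, and with the real data you would read the DSM and DTM with raster() and take the plot locations from the shapefiles. It shows that raster subtraction works cell by cell, and that extract() with a buffer and fun = max returns one summary value per point (according to the raster documentation, a cell is included when its center lies within the buffer distance of the point).

```r
library(raster)

# Hypothetical canopy heights (m) on a 3 x 3 grid of 10 m cells
heights = matrix(c(2, 18,  0,
                   4,  3,  6,
                   0,  7, 15), nrow = 3, byrow = TRUE)

# Hypothetical DTM (bare ground) and DSM (top of canopy) on the same grid;
# coordinates are arbitrary projected units, not real NEON locations
dtm = raster(matrix(100, nrow = 3, ncol = 3),
             xmn = 1000, xmx = 1030, ymn = 1000, ymx = 1030)
dsm = raster(100 + heights,
             xmn = 1000, xmx = 1030, ymn = 1000, ymx = 1030)

# Canopy height model: raster math is applied cell by cell
chm = dsm - dtm

# Hypothetical plot locations as (x, y) pairs in the same coordinate system
plots = matrix(c(1008, 1025,
                 1025, 1005), ncol = 2, byrow = TRUE)

# Value of the single cell under each plot
cell_heights = raster::extract(chm, plots)

# Maximum canopy height among cells whose centers lie within 10 units of each plot;
# raster:: avoids a clash with tidyr::extract() if the tidyverse is loaded
max_heights = raster::extract(chm, plots, buffer = 10, fun = max)
max_heights
```

Here the first plot sits on a 2 m cell next to an 18 m cell, so the buffered maximum (18) differs from the single-cell value (2). For the maps and histograms in step 2, as.data.frame(chm, xy = TRUE) converts a raster into a data frame with x, y, and cell-value columns that geom_raster() and geom_histogram() can use.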
Phenology from Space (40 pts)

The high-resolution airborne data from Canopy Height from Space can be integrated with satellite imagery, which is collected more frequently. We will use data collected by MODIS (the Moderate Resolution Imaging Spectroradiometer). One ecological process that can be observed from space is plant phenology (seasonal patterns of plant growth). Multi-band satellite imagery can be processed into a vegetation index of greenness called NDVI (Normalized Difference Vegetation Index). NDVI values range from -1.0 to 1.0: negative values indicate clouds, snow, and water; bare soil returns values from 0.1 to 0.2; and green vegetation returns values greater than 0.3.
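The MODIS files in this exercise already contain NDVI, so you do not need to compute it, but the standard definition, NDVI = (NIR - Red) / (NIR + Red) with near-infrared (NIR) and red reflectance, shows why the index is bounded between -1 and 1: reflectances are non-negative, so the numerator can never exceed the denominator in magnitude. A small sketch of that formula, using made-up reflectance values:

```r
# Standard NDVI formula; the reflectance values below are made up for illustration
ndvi = function(nir, red) {
  (nir - red) / (nir + red)
}

ndvi(nir = 0.50, red = 0.08)  # about 0.72: dense green vegetation
ndvi(nir = 0.30, red = 0.22)  # about 0.15: within the bare-soil range
ndvi(nir = 0.02, red = 0.05)  # about -0.43: water (red exceeds NIR)
```

Healthy leaves reflect strongly in the near infrared and absorb red light, which is why vegetated pixels sit toward the top of the range.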

Download HARV_NDVI and SJER_NDVI and place them in the same folder as the NEON airborne data. The zip files contain folders with a year's worth of NDVI sampling from MODIS. The files are named by date, so listing them in name order also puts them in chronological order, and each file's position in that list can serve as its sampling period in the analysis.
1. Plot the whole-raster mean NDVI (cellStats()) for Harvard Forest and SJER through time using different colors for the two sites. To do this:
  - Load the files for each site as a raster stack
  - Use cellStats() to calculate the mean values for each raster in the stack. Call the outputs harv_avg and sjer_avg
  - Create a vector of sampling periods for each site: e.g., samp_period = c(1:length(harv_avg), 1:length(sjer_avg))
  - Create a vector of site names for each site: e.g., site_name = c(rep("harv", length(harv_avg)), rep("sjer", length(sjer_avg)))
  - Make a data frame that includes columns for site name, sampling period, and the average NDVI values (concatenate the two vectors using c()).
  - Graph the trends through time using ggplot
2. Extract the NDVI values from all rasters for the HARV_plots and SJER_plots in NEON-airborne/plot_locations. Running extract() on a raster stack returns a matrix with one row per point and one column per raster. To make these data easier to work with, we want one row per raster, with one column holding the raster names and one column per point. Transpose the matrix with t() so the raster names become the row names, convert the result to a data frame, and then turn the row names into a column with tibble::rownames_to_column(your_dataframe, var = "date"). Do this for both HARV and SJER.
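Step 1 can be sketched in base R as follows, assuming placeholder values: in the assignment, harv_avg and sjer_avg come from cellStats() on each site's raster stack (for example cellStats(harv_stack, mean), where harv_stack is a hypothetical name for the HARV stack), whereas the numbers below are made up. The sketch builds the long-format data frame that ggplot needs, with one row per site and sampling period.

```r
# Placeholder mean NDVI values standing in for the cellStats() output
harv_avg = c(0.35, 0.52, 0.81, 0.78, 0.41)
sjer_avg = c(0.45, 0.60, 0.38, 0.22)

# Sampling period counts each site's rasters in order, starting again at 1 for SJER
samp_period = c(1:length(harv_avg), 1:length(sjer_avg))
site_name = c(rep("harv", length(harv_avg)), rep("sjer", length(sjer_avg)))

# One row per site and sampling period
ndvi_df = data.frame(site_name = site_name,
                     samp_period = samp_period,
                     avg_ndvi = c(harv_avg, sjer_avg))
head(ndvi_df)
```

Passing this data frame to ggplot(ndvi_df, aes(x = samp_period, y = avg_ndvi, color = site_name)) + geom_line() then draws one colored line per site.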
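The reshaping in step 2 can be sketched on a small hypothetical matrix shaped like extract() output (one row per plot, one column per raster); the layer names, plot names, and values are made up. The sketch moves the row names into a date column with base R, and the commented line shows the tibble::rownames_to_column() call from the instructions, which should give the same result.

```r
# Hypothetical extract() output: rows are plots, columns are NDVI rasters (dates)
ndvi_matrix = matrix(c(0.61, 0.72, 0.80,
                       0.55, 0.69, 0.77),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(NULL, c("X2011_001", "X2011_017", "X2011_033")))

# Transpose: rows become rasters (dates) and columns become plots
ndvi_t = t(ndvi_matrix)
colnames(ndvi_t) = c("plot_1", "plot_2")

# Move the raster names from the row names into a "date" column
ndvi_by_date = data.frame(date = rownames(ndvi_t), ndvi_t, row.names = NULL)
# Equivalent with tibble:
# ndvi_by_date = tibble::rownames_to_column(data.frame(ndvi_t), var = "date")
ndvi_by_date
```

Note that rownames_to_column() expects a data frame rather than a matrix, which is why the transposed matrix is converted first.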
Species Occurrences Map (30 pts)

A colleague of yours is working on a project on banner-tailed kangaroo rats (Dipodomys spectabilis) and is interested in the elevations these rodents tend to occupy in the continental United States. You offer to help by collecting coordinates for specimens of this species and looking up the elevation at those coordinates.
1. Get banner-tailed kangaroo rat occurrences from GBIF, the Global Biodiversity Information Facility, using the spocc R package, which is designed to retrieve species occurrence data from various openly available data resources. Use the following code to do so:
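```
library(spocc)

# Query GBIF for up to 1000 records of the species that have coordinates
dipo_df = occ(query = "Dipodomys spectabilis",
              from = "gbif",
              limit = 1000,
              has_coords = TRUE)
# Convert the GBIF results into a single data frame
dipo_df = data.frame(dipo_df$gbif$data)
```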
2. Clean up the data by:
  - Use the rename() function from dplyr to rename the second and third columns of this dataset to longitude and latitude
  - Filter the data to include only specimens whose Dipodomys_spectabilis.basisOfRecord is PRESERVED_SPECIMEN and whose Dipodomys_spectabilis.countryCode is US
  - Remove points with a value of 0 for latitude or longitude
  - Use select() to remove all columns except latitude and longitude
  - Use the head() function to show the first few rows of the cleaned dataset
3. Do the following to display the locations of these points on a map of the United States:
  - Get data for a US map using usmap = map_data("usa")
  - Plot it using geom_polygon. In the aesthetic mapping, use group = group so that each polygon is drawn separately; otherwise stray lines cut across your graph. Use fill = "white" and color = "black".
  - Plot the kangaroo rat locations as points (e.g., with geom_point)
  - Use coord_quickmap() to automatically use a reasonable spatial projection
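The cleaning steps in part 2 can be sketched with dplyr on a tiny hypothetical data frame. The column names are modeled on the Dipodomys_spectabilis.* names used in the instructions, but the rows and values are invented, and the real data frame from GBIF has many more columns. rename() accepts column positions, so rename(longitude = 2, latitude = 3) renames the second and third columns without typing their full names.

```r
library(dplyr)

# Hypothetical stand-in for data.frame(dipo_df$gbif$data): invented rows,
# and far fewer columns than the real GBIF download
dipo_df = data.frame(
  Dipodomys_spectabilis.name = rep("Dipodomys spectabilis", 4),
  Dipodomys_spectabilis.longitude = c(-106.8, 0, -110.9, -107.1),
  Dipodomys_spectabilis.latitude = c(32.6, 0, 31.9, 25.0),
  Dipodomys_spectabilis.basisOfRecord = c("PRESERVED_SPECIMEN", "PRESERVED_SPECIMEN",
                                          "HUMAN_OBSERVATION", "PRESERVED_SPECIMEN"),
  Dipodomys_spectabilis.countryCode = c("US", "US", "US", "MX")
)

dipo_clean = dipo_df %>%
  # Rename the 2nd and 3rd columns by position
  rename(longitude = 2, latitude = 3) %>%
  # Keep only preserved museum specimens collected in the US
  filter(Dipodomys_spectabilis.basisOfRecord == "PRESERVED_SPECIMEN",
         Dipodomys_spectabilis.countryCode == "US") %>%
  # Drop records with a latitude or longitude of 0
  filter(latitude != 0, longitude != 0) %>%
  select(latitude, longitude)

head(dipo_clean)
```

Of the four invented rows, only the first survives: the second has 0 coordinates, the third is an observation rather than a specimen, and the fourth is outside the US.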
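Part 3 might look like the following sketch, assuming ggplot2 and the maps package (which map_data() relies on) are installed. The three points are hypothetical placeholders for the cleaned kangaroo rat data; substitute your own data frame with latitude and longitude columns.

```r
library(ggplot2)

# Outline of the contiguous US; map_data() needs the maps package installed
usmap = map_data("usa")

# Hypothetical points standing in for the cleaned kangaroo rat data
dipo_clean = data.frame(latitude = c(32.6, 31.9, 35.1),
                        longitude = c(-106.8, -110.9, -106.6))

dipo_map = ggplot() +
  # group = group draws each polygon separately, avoiding stray connecting lines
  geom_polygon(data = usmap, aes(x = long, y = lat, group = group),
               fill = "white", color = "black") +
  geom_point(data = dipo_clean, aes(x = longitude, y = latitude),
             color = "red") +
  # Approximate map projection that keeps the aspect ratio reasonable
  coord_quickmap()

dipo_map
```

Because the outline and the points come from different data frames, each layer receives its own data argument instead of a single data frame in ggplot().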
