diff --git a/.gitignore b/.gitignore
index e1ae1a3..2e5954a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -2,5 +2,5 @@
.Rhistory
.RData
.Ruserdata
-_bookdown_files
+_bookdown_files/
*.tif
diff --git a/.travis.yml b/.travis.yml
index 5b719bc..aa83dd1 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -2,6 +2,10 @@ language: R
sudo: false
cache: packages
+branches:
+ except:
+ - develop
+
before_install:
- sudo apt-get install libudunits2-dev
- sudo apt-get install libgdal-dev
diff --git a/README.md b/README.md
index 06c326c..4e261b7 100644
--- a/README.md
+++ b/README.md
@@ -5,28 +5,27 @@
## An introduction to the use of TERRA REF data and software
-This repository provides a set of tutorials that are divided by data types and use cases.
+This repository provides a set of tutorials that are divided by data types and use cases.
-In the repository, you will find three folders that contain examples of how to access data:
+In the repository, you will find folders that contain examples of how to access data.
+Within each folder there are both R markdown and Jupyter notebooks.
-* traits
-* sensors
-* plantCV
+_If you are **interested in learning how to use the TERRA REF data**_, please find these tutorials published in book form at [terraref.org/tutorials](https://terraref.org/tutorials).
-Within each folder there are both R markdown and Jupyter notebooks. These describe different approaches to accessing data. These are intended to cover diverse use cases, and you will find information about accessing data from web interfaces but the primary focus is on accessing data using R, Python, SQL, and REST APIs. These are intended to provide quick-start introductions to access data along with computing environments required for further exploration. They are not intended to teach analyses, although some illustrative visualizations and statistical models are provided.
+If you want to **fix, improve, or contribute new tutorials**, please continue reading here!
This is a work in progress and an open source community effort that welcomes contributions in many forms. Please feel welcome to ask questions, provide suggestions, or share analyses that may be of interest to others.
-## Getting Started
+## Contributing
-### Requirements
+While some of these tutorials can be run locally, many require access to the TERRA REF filesystem and databases. These are available on a web-based cloud development environment that provides RStudio, Jupyter Notebooks, and other interfaces. Therefore, the _only technical requirements_ are:
-All of the tutorials have been designed to work in the cloud and can be accessed using a web browser. Therefore, the _only technical requirements_ are:
* Web browser
* Internet connection
In addition, you will need to:
+
* Sign up as a TERRA REF [Beta User by filling out this application](http://terraref.org/beta).
* Sign up for an account on the [TERRA REF Workbench](https://www.workbench.terraref.org), and wait for approval.
@@ -38,4 +37,4 @@ Although we provide a few pre-configured computing environments, Workbench is de
**To get started**, follow the [Workbench Quick Start](https://htmlpreview.github.io/?https://github.com/terraref/tutorials/blob/master/workbench/ndslabs_workbench_intro.html).
-This will walk you through the process of getting started with the first tutorials on how to access data.
+This will walk you through the process of getting started with the first tutorials on how to access data.
\ No newline at end of file
diff --git a/_bookdown.yml b/_bookdown.yml
index cd04963..87e602d 100644
--- a/_bookdown.yml
+++ b/_bookdown.yml
@@ -10,6 +10,10 @@ rmd_files: ["index.Rmd",
"vignettes/02-get-weather-data-R.Rmd",
"vignettes/03-get-images-python.Rmd",
"vignettes/04-synthesis-data.Rmd",
+"traits/00-BETYdb-getting-started.Rmd",
"traits/03-access-r-traits.Rmd",
+"traits/08-access-traits-python.Rmd",
"sensors/01-meteorological-data.Rmd",
-"sensors/06-list-datasets-by-plot.Rmd"]
+"sensors/06-list-datasets-by-plot.Rmd",
+"sensors/10-meteorological-data-python.Rmd",
+"data_use_policy.Rmd"]
diff --git a/data_use_policy.Rmd b/data_use_policy.Rmd
new file mode 100644
index 0000000..66f6c35
--- /dev/null
+++ b/data_use_policy.Rmd
@@ -0,0 +1,38 @@
+# Data Use Policy {#data_use_policy}
+
+## Release with Attribution
+
+We plan to make data from the Transportation Energy Resources from Renewable Agriculture Phenotyping Reference Platform (TERRA-REF) project available for use with attribution. Each type of data will include or point to the appropriate attribution policy.
+
+## Timing and Control of Release
+
+We plan to release the data in stages or tiers. For pre-release access please complete the [beta tester application](https://terraref.org/beta).
+
+1. The **first tier** will be an internal release to the TERRA-REF team and the standards committee. This first tier release will be used to quality check and calibrate the data and will take place as data sets are produced and compiled.
+2. The **second tier** will enable the release of the data generated solely by the TERRA-REF team to other TERRA teams as well as non-TERRA entities. Release of the data to the second tier may occur prior to publication, and access is granted with the understanding that the contributions and interests of the TERRA-REF team should be recognized and respected by the users of the data. The TERRA-REF team reserves the right to analyze and publish its own data. Resource users should appropriately cite the source of the data and acknowledge the resource producers. The publication of the data, as suggested in the TERRA-REF Authorship Guidelines, should specify the collaborative nature of the project, and authorship is expected to include all those TERRA-REF team members contributing significantly to the work.
+3. The **third tier** will provide the public with access to curated datasets from the TERRA REF program. It is an objective of the TERRA-REF team to release the data to the public by November 2019. These will be released under the conditions described below.
+
+## Genomic Data
+
+### Restrictions on dataset usage
+
+Genomic data for the _Sorghum bicolor_ Bioenergy Association Panel (BAP) from the TERRA-REF project is available pre-publication to maximize the community benefit of these resources. Use of the raw and processed data that is available should follow the principles of the [Fort Lauderdale Agreement](https://www.genome.gov/pages/research/wellcomereport0303.pdf) and the [Department of Energy's Joint Genome Institute (JGI) early release policies](http://genome.jgi.doe.gov/pages/data-usage-policy.jsf).
+
+By accessing these data, you agree not to publish any articles containing analyses of genes or genomic data on a whole genome or chromosome scale prior to publication by TERRA-REF and/or its collaborators of a comprehensive genome analysis ("Reserved Analyses"). "Reserved analyses" include the identification of complete (whole genome) sets of genomic features such as genes, gene families, regulatory elements, repeat structures, GC content, or any other genome feature, and whole-genome- or chromosome-scale comparisons with other species. The embargo on publication of Reserved Analyses by researchers outside of the TERRA-REF project is expected to extend until the publication of the results of the sequencing project is accepted. Scientific users are free to publish papers dealing with specific genes or small sets of genes using the sequence data. If these data are used for publication, the following acknowledgment should be included: 'These sequence data were produced by the US Department of Energy Transportation Energy Resources from Renewable Agriculture Phenotyping Reference Platform (TERRA-REF) Project'. These data may be freely downloaded and used by all who respect the restrictions in the previous paragraphs. The assembly and sequence data should not be redistributed or repackaged without permission from TERRA-REF. Any redistribution of the data during the embargo period should carry this notice: "The TERRA-REF project provides these data in good faith, but makes no warranty, expressed or implied, nor assumes any legal liability or responsibility for any purpose for which the data are used. Once the sequence is moved to unreserved status, the data will be freely available for any subsequent use."
+
+We prefer that potential users of these sequence data contact the individuals listed under Contacts with their plans to ensure that proposed usage of sequence data are not considered Reserved Analyses.
+
+## Software and Algorithms
+
+For algorithms, we intend to release code under a BSD 3-clause, MIT, or other BSD-compatible license. Algorithms are available on GitHub in the terraref organization: github.com/terraref.
+
+## Images, Phenotypes, and Other Raw Data
+
+For other raw data, such as phenotypic data and associated metadata, we intend to release data under [CC0: Creative Commons with No Rights Reserved (Public Domain)](https://creativecommons.org/share-your-work/public-domain/cc0/). This is to enable reuse of these data, but **scientists are expected to cite our data and research publications.** For more information, see related discussion and links in https://github.com/terraref/reference-data/issues/216.
+
+## Contacts
+
+* Todd Mockler, Project/Genomics Lead (email: tmockler@danforthcenter.org)
+* David LeBauer, Computing Pipeline Lead (email: dlebauer@email.arizona.edu)
+* Nadia Shakoor, Project Director (email: nshakoor@danforthcenter.org)
+
diff --git a/docs/.nojekyll b/docs/.nojekyll
deleted file mode 100644
index 8b13789..0000000
--- a/docs/.nojekyll
+++ /dev/null
@@ -1 +0,0 @@
-
diff --git a/docs/accessing-meteorological-data.html b/docs/accessing-meteorological-data.html
deleted file mode 100644
index ff2e212..0000000
--- a/docs/accessing-meteorological-data.html
+++ /dev/null
@@ -1,595 +0,0 @@
-
-
-
-
units can be converted by udunits, so these can vary (e.g. the time denominator may change with time frequency of inputs)
-
soil moisture for the full column, rather than a layer, is soil_moisture_content
-
-
For example, in the MsTMIP-CRUNCEP data, the variable rain should be precipitation_rate. We want to standardize the units as well as part of the met2CF.<product> step. I believe we want to use the CF “canonical” units but retain the MsTMIP units any time CF is ambiguous about the units.
-
The key is to process each type of met data (site, reanalysis, forecast, climate scenario, etc) to the exact same standard. This way every operation after that (extract, gap fill, downscale, convert to a model, etc) will always have the exact same inputs. This will make everything else much simpler to code and allow us to avoid a lot of unnecessary data checking, tests, etc being repeated in every downstream function.
-
-
-
-
8.1.2 Using the API to get data
-
In order to access the data, we need to construct a URL that points to where the data is located on Clowder. The data is then pulled down using the API, which “receives requests and sends responses” for Clowder.
-
-
-
8.1.3 The structure of the Geostreams database
-
The meteorological data collected for the TERRA REF project is contained in multiple related tables, also known as a relational database. The first table contains data about the sensor that is collecting data. This is then linked to a stream table, which contains information about a datastream from the sensor. Sensors can have multiple datastreams. The actual weather data is in the third table, the datapoint table. A visual representation of this structure is shown below.
-
-
-
-
-
In this vignette, we will be using data from a weather station at the Maricopa Agricultural Center, with datapoints for the month of January 2017 from a certain sensor. These data are five minute summaries aggregated from observations taken every second.
A certain time period can be specified for the datapoints.
-
For example, below are the URLs for the particular data being used in this vignette. These can be pasted into a browser to see how the data is stored as text using JSON.
Possible sensor numbers for a station are found on the page for that station under “id:”, and the stream numbers used to request datapoints are found on the sensor page under “stream_id:”.
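As an illustration, the datapoints URL for the five minute weather stream used in this vignette can be assembled in R like this (the stream id and date range match the curl example later in this chapter):

api_url <- "https://terraref.ncsa.illinois.edu/clowder/api/geostreams"
datapoints_url <- paste0(api_url, "/datapoints",
                         "?stream_id=", 46431,
                         "&since=2017-01-02",
                         "&until=2017-01-31")
datapoints_url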
-
The table below lists the names of some stations that have available meteorological data and their associated stream ids.
-
| stream id | name |
|-----------|------|
| 3211 | UA-MAC AZMET Weather Station - weather |
| 3212 | UA-MAC AZMET Weather Station - irrigation |
| 46431 | UA-MAC AZMET Weather Station - weather (5 min) |
| 3208 | EnvironmentLogger sensor_weather_station |
| 3207 | EnvironmentLogger sensor_par |
| 748 | EnvironmentLogger sensor_spectrum |
| 3210 | EnvironmentLogger sensor_co2 |
| 4806 | UIUC Energy Farm SE |
| 4807 | UIUC Energy Farm CEN |
| 4805 | UIUC Energy Farm NE |
Here is the json representation of a single five-minute observation:
The data represent 5 minute summaries aggregated from 1/s observations.
-
-
-
8.1.6 Download data using the command line
-
Data can be downloaded from Clowder using the command line program curl. If the following is typed into the command line, it will download the datapoints we’re interested in as a file which we have chosen to call spectra.json. Note that the URL is quoted so that the shell does not interpret the & characters.

curl -o spectra.json -X GET "https://terraref.ncsa.illinois.edu/clowder/api/geostreams/datapoints?stream_id=46431&since=2017-01-02&until=2017-01-31"
-
-
8.1.6.1 Using R
-
The following code sets the defaults for showing R code.
And this is how you can access the same data in R. This uses the jsonlite R package and the desired URL to pull the data in. The data is in a dataframe with two nested dataframes, called properties and geometries.
The geometries dataframe, which contains the datapoints from this stream, is then pulled out from these data. This is combined with a transformed version of the end of the time period from the stream.
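A minimal sketch of those steps, assuming the datapoints URL assembled above; the properties and end_time field names are assumptions based on the description of the stream, so check the returned object before relying on them.

library(jsonlite)
library(lubridate)
library(dplyr)

datapoints <- fromJSON(datapoints_url)

# keep the measured variables and attach a timestamp taken from the end of
# each five minute interval
weather_data <- datapoints$properties %>%
  mutate(time = ymd_hms(datapoints$end_time))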
Create a time series plot for one of the eight variables, wind speed, in the newly created dataframe.
-
theme_set(ggthemes::theme_few())
ggplot(data = weather_data) +
  geom_point(aes(x = time, y = wind_speed), size = 0.7) +
  labs(x = "Day", y = "Wind speed (m/s)")
-
-
-
8.2.1 High resolution data (1/s) + spectroradiometer
-
This higher resolution weather data can be used for VNIR calibration, for example. But at 1/s it is very large!
-
-
8.2.1.1 Download data
-
Here we will download the files using the Clowder API, but note that if you have access to the filesystem (on www.workbench.terraref.org or Globus), you can directly access the data in the sites/ua-mac/Level_1/EnvironmentLogger folder.
# Get Spaces from Clowder - without authentication, the result will be Sample Data
spaces <- fromJSON(paste0(api_url, '/spaces'))
print(spaces %>% select(id, name))

# Get list of (at most 20) Datasets within that Space from Clowder
datasets <- fromJSON(paste0(api_url, '/spaces/', spaces$id, '/datasets'))
print(datasets %>% select(id, name))

# Get list of Files within any EnvironmentLogger datasets and filter .nc files
files <- fromJSON(paste0(api_url, '/datasets/', datasets$id[grepl("EnvironmentLogger", datasets$name)], '/files'))
ncfiles <- files[grepl('environmentlogger.nc', files$filename), ]
print(ncfiles %>% select(id, filename))
-
-
-
8.2.1.2 Download netCDF 1/s data from Clowder
-
-
-
8.2.1.3 Using the netCDF 1/s data
-
One use case is getting the solar spectrum associated with a particular hyperspectral image.
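A sketch of downloading one of the EnvironmentLogger netCDF files found above and opening it with the ncdf4 package; api_url is assumed to point at the Clowder API root used earlier, and the file is written to a temporary directory.

library(ncdf4)

nc_url  <- paste0(api_url, "/files/", ncfiles$id[1])
nc_path <- file.path(tempdir(), ncfiles$filename[1])
download.file(nc_url, nc_path, mode = "wb")

el <- nc_open(nc_path)
print(names(el$var))   # list the variables stored in the 1/s EnvironmentLogger file
nc_close(el)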
The rOpenSci traits package makes it easier to query the TERRA REF trait database because 1) you can pass the query parameters to an R function, and the package takes care of putting the parameters into a valid URL, and 2) the package returns data in a tabular format that is ready to analyze.
-
-
7.1 Using the R traits package to query the database
-
-
7.1.1 Setup
-
Install the traits package
-
The traits package can be installed from GitHub using the following command:

if (packageVersion("traits") == '0.2.0') {
  devtools::install_github('terraref/traits', force = TRUE)
}
-
Load other packages that we will need to get started.
Create a file that contains your API key. If you have signed up for access to the TERRA REF database, your API key will have been sent to you in an email. You will need this personal key and permissions to access the trait data. If you receive empty (NULL) datasets, it is likely that you do not have permissions.
-
# This should be done once with the key sent to you in your email
-
-# Example:
-#writeLines('abcdefg_rest_of_key_sent_in_email',
-# con = '.betykey')
-
-
7.1.1.1 R - using the traits package
-
The R traits package is an API ‘client’. It does two important things:
1. It makes it easier to specify the query parameters without having to construct a URL.
2. It returns the results as a data frame, which is easier to use within R.
-
Let’s start with a query of information about Sorghum from the species table.
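A sketch of what that query could look like; the genus filter is an assumption about the columns available in the species table.

sorghum_info <- betydb_query(table = "species",
                             genus = "Sorghum",
                             limit = "none")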
How to create a summary of available data to query from a TERRA REF season
-
How to query a specific trait
-
How to visualize query results
-
-
-
-
3.2 Introduction
-
In this chapter, we go over how to query TERRA REF trait data using the traits package. The traits package is a way to query various sources of species trait data, including BETYdb, NCBI, the Coral Traits Database, and others. In this chapter we use BETYdb as our trait source, as it contains the TERRA REF data that we are interested in.

Our example will show how to query for Season 6 data and visualize canopy height. In addition to the traits package we will also be using some of the tidyverse packages, which allow us to manipulate the data in an efficient, understandable way. If you are unfamiliar with tidyverse syntax, we recommend checking out some of the resources here.
-
-
-
3.3 Query for available traits
-
-
3.3.1 Getting Started
-
First, we will need to install the traits package from GitHub and load it into our environment, along with the other packages we will use in this tutorial.
-
# devtools::install_github('terraref/traits', force = TRUE) # run once
library(traits)
library(ggplot2)
library(lubridate)
library(dplyr)
library(knitr)
-
-
-
3.3.2 Setting options
-
The function that is used to query BETYdb is called betydb_query. To reduce the number of arguments that need to be passed to this function, we can set some global options using options. In this case, we will set the URL used in the query and the API version.
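For example, these are the same options that are set in the image analysis example later in this book:

options(betydb_url = "https://terraref.ncsa.illinois.edu/bety/",
        betydb_api_version = 'v1')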
The TERRA REF database contains trait data for many other seasons of observation, and available data may vary by season. Here, we get a visual summary of available traits and methods of measurement for a season.
-
First we construct a general query for the Season 4 data. This returns all Season 4 data. The function betydb_query takes as arguments key = "value" pairs which represent columns in the database to query. In this example, we query the sitename column for Season 4 data, and set the limit to “none” to return all records. By default, the function will search all tables in the database. To specify a particular table you can use the table argument.
-
# get all of season 4 data
season_4 <- betydb_query(sitename = "~Season 4",
                         limit = "none")
-
The return value for the betydb_query function is just a data.frame so we can work with it like any other data.frame in R.
-
Let’s plot a time series of all traits returned. First you might notice that the relevant date columns in the season_4 data.frame are returned as characters instead of a date format. Before plotting, let’s get our raw_date column into a proper date format using functions from dplyr and lubridate.
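A sketch of that conversion; the exact parsing function depends on how raw_date is stored, so ymd_hms here is an assumption.

season_4 <- season_4 %>%
  mutate(raw_date = lubridate::ymd_hms(raw_date))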
| trait | definition |
|-------|------------|
| canopy_height | Top of the general canopy of the plant, discounting any exceptional branches, leaves or photosynthetic portions of the inflorescence. |
| canopy_cover | Fraction of ground covered by plant |
| leaf_desiccation_present | Visual assessment of presence or absence of leaves showing desiccation. 1 = present, 0 = absent |
| leaf_length | Length of leaf from tip to stem along the midrib. |
| leaf_width | Width of leaf at widest point along leaf |
| lodging_present | Visual assessment of presence or absence of lodging or severe leaning within a plot. 1 = present, 0 = absent |
| surface_temperature_leaf | Leaf surface temperature estimates of an individual haphazardly selected sunlit leaf in the upper canopy, targeting fully expanded leaves when possible, using hand-held instruments |
| panicle_height | Height to top of panicle |
| stand_count | Number of plants per subplot or plot, counted after thinning |
| absorbance_730 | Absorbance at 730 nm |
| relative_chlorophyll | Relative value describing the concentration of chlorophyll in the leaf, ranges from 0 to 80 |
-
3.4 Querying a specific trait
-
-
3.4.1 Querying season 6 canopy height data
-
You may find after constructing a general query as above that you want to query only a specific trait. Here, we query for the canopy height trait by adding the key-value pair trait = "canopy_height" to our query function. Note that the limit is also set to return only 250 records, shown here for demonstration purposes.
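A sketch of the query described above; the Season 6 sitename pattern follows the Season 4 example earlier in this chapter.

canopy_height <- betydb_query(sitename = "~Season 6",
                              trait = "canopy_height",
                              limit = 250)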
And we can generate a time series plot of just the canopy height data.
-
# plot a time series of canopy height
ggplot(data = canopy_height,
       aes(x = raw_date, y = mean)) +
  geom_point(size = 0.5, position = position_jitter(width = 0.1)) +
  xlab("Date") +
  ylab("Plant height (cm)") +
  ggtitle("Sorghum canopy height, Season 6 TERRA REF") +
  theme_bw()
4.1 Objective: To be able to demonstrate how to get TERRA REF meteorological data
-
This vignette shows how to read weather data for the month of January 2017 from the weather station at the University of Arizona’s Maricopa Agricultural Center into R. These data are stored online on the data management system Clowder, which is accessed using an API. More detailed information about the structure of the database and how API URLs are created is available in the weather tutorial. Data across time for one weather variable, temperature, is plotted in R. Then all eight of the weather variables have their time series plotted.
-
-
-
4.2 Read in data using R
-
A set of weather data can be accessed with a URL using the R package jsonlite. We are calling that library along with several others that will be used to clean and plot the data. The data is read in by the fromJSON function as a dataframe that also has two nested dataframes, called properties and geometries.
The geometries dataframe is then pulled out from these data, which contains the datapoints from this stream. This is combined with a transformed version of the end of the time period from the stream.
The temperature data, which is five minute averages for the entire month of January 2017, is used to calculate the growing degree days for each day. Growing degree days is a measurement that is used to predict when certain plant developmental phases happen. This new dataframe will be used in the last vignette to synthesize the trait, weather, and image data.
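A sketch of that calculation, assuming the weather_data data frame created above (with time and air_temperature in Kelvin); both the base temperature of 10 C and the min/max formulation are assumptions, not necessarily the exact method used in the original vignette.

library(dplyr)

daily_gdd <- weather_data %>%
  mutate(date = as.Date(time),
         temp_c = air_temperature - 273.15) %>%   # convert Kelvin to Celsius
  group_by(date) %>%
  summarise(tmin = min(temp_c), tmax = max(temp_c)) %>%
  mutate(gdd = pmax((tmin + tmax) / 2 - 10, 0),   # daily growing degree days, base 10 C
         gdd_cum = cumsum(gdd))                   # cumulative GDD used later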
The five minute summary weather variables in the weather_data dataframe can be plotted across time, as shown below for temperature.
-
ggplot(data = weather_data) +
  geom_point(aes(x = time, y = air_temperature), size = 0.1) +
  labs(x = "Date", y = "Temperature (K)")
-
-
We can also plot the time series for all eight of the weather variables in a single figure. We first have to rearrange the data to make plotting possible using the R package ggplot2.
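One way to do that rearranging, sketched with tidyr; the assumption is that, besides time, the remaining columns of weather_data are the numeric weather variables.

library(tidyr)
library(dplyr)
library(ggplot2)

weather_long <- weather_data %>%
  pivot_longer(-time, names_to = "variable", values_to = "value")

ggplot(weather_long, aes(x = time, y = value)) +
  geom_point(size = 0.1) +
  facet_wrap(~ variable, scales = "free_y")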
Chapter 6 Combining trait, weather, and image datasets
-
The objective of this vignette is to walk through how to combine our several types of data, and demonstrate several realistic analyses that can be done on these merged data.
-
For the first analysis, we want to figure out how the number of sufficiently warm days affects the amount of canopy cover at our site. We do this by combining the canopy cover data with the meteorological data on growing degree days, then modeling and plotting their relationship. We are specifically interested in figuring out when the increase in canopy cover starts to slow down in response to warm temperature days.
-
The second analysis compares greenness from image data with canopy cover. The second analysis uses the gdal_translate tool from the GDAL package. You can use one of the prepared downloads available on the GDAL web site to install the needed software tool. Alternatively, it’s possible to install the tools with a package manager; on macOS, for example, the command is brew install gdal.
-
-
6.1 Get and join data
-
Here we combine two dataframes. The first contains all the canopy height values for 2017, which was created in the traits vignette. The second is the cumulative growing degree days for all of 2017, which were calculated from the daily minimum and maximum temperatures in the weather vignette. They are combined by their common column, the date.
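A sketch of that join; canopy_2017 and gdd_2017 are stand-in names for the two data frames described above, and raw_date follows the trait query results shown earlier.

library(dplyr)

combined <- canopy_2017 %>%
  mutate(date = as.Date(raw_date)) %>%   # match the date column used in the GDD table
  left_join(gdd_2017, by = "date")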
6.2 Plot and model relationship between GDD and canopy cover for each cultivar
-
We are interested in how growing degree days affect canopy cover. To investigate this, we are going to model and plot their relationship. We want to know the relationship for each cultivar, so we’ll start off by determining the parameters of the model for one of the cultivars in our dataset. We are using a logistic growth model here because it is appropriate for the shape of the GDD-cover relationship.
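A sketch of fitting that model for a single cultivar with R’s self-starting logistic curve; one_cultivar is a stand-in for one cultivar’s rows of the combined data, and gdd_cum and mean are the column names used in the plotting code below.

# Fit canopy response to cumulative GDD with a logistic curve
fit <- nls(mean ~ SSlogis(gdd_cum, Asym, xmid, scal), data = one_cultivar)

coef(fit)                          # Asym = upper asymptote, xmid = inflection point, scal = scale
inf_point <- coef(fit)[["xmid"]]   # inflection point used in the next section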
We then use the parameters from a single cultivar to run a model for each of the rest of the cultivars. These results are used to plot the model predictions, which are shown as an orange line. We also calculated the inflection point from each cultivar’s model, which will be used in the following section.
6.3 Create histogram of growth rate for all cultivars
-
The last thing that we are going to do is assess the difference in this relationship among the cultivars. We are going to use the inflection point from the logistic growth model, which indicates when canopy cover stops increasing as quickly with additional warm days. The resulting inflection points for each cultivar are plotted as a histogram.
-
ggplot(all_cultivars) +
  geom_point(aes(x = gdd_cum, y = mean)) +
  geom_line(aes(x = gdd_cum, y = mean_predict), color = "orange") +
  geom_vline(aes(xintercept = inf_point)) +
  facet_wrap(~cultivar, scales = "free_y") +
  labs(x = "Cumulative growing degree days", y = "Canopy Height")
In this example we will extract our plot data from a series of images taken in May of Season 6, measure its “greenness” and plot that against the plant heights from earlier in this vignette. The chosen statistic here is the normalised green-red difference index, NGRDI = (G - R) / (G + R) (Rasmussen et al., 2016), which uses the red and green bands from the image raster.

Below we retrieve all the available plots for a particular date, then find and convert the plot boundary JSON into tuples. We will use these tuples to extract the data for our plot.
-
library(traits)
library(stringr)

# Function for breaking apart a corner into its Lat, Lon components
getLatLon <- function(corner){
  p <- strsplit(corner, ' ')
  return(c(p[[1]][1], p[[1]][2]))
}

# Gets the bounding box of the array of points
getBounds <- function(bounds){
  minX <- NA
  minY <- NA
  maxX <- NA
  maxY <- NA
  for (corner in unique(bounds)){
    # convert to numeric so comparisons are by value, not by string
    p <- as.numeric(getLatLon(corner))
    if (is.na(minX) || (minX > p[2]))
      minX <- p[2]
    if (is.na(minY) || (minY > p[1]))
      minY <- p[1]
    if (is.na(maxX) || (maxX < p[2]))
      maxX <- p[2]
    if (is.na(maxY) || (maxY < p[1]))
      maxY <- p[1]
  }

  return(c(minX, minY, maxX, maxY))
}

# Setting up our options
options(betydb_url = "https://terraref.ncsa.illinois.edu/bety/",
        betydb_api_version = 'v1')

# Making the query for our site
sites <- betydb_query(table = "sites",
                      sitename = "MAC Field Scanner Season 6 Range 19 Column 1")

# Assigning the geometry of the site (GeoJSON format)
site.geom <- sites$geometry

# Stripping out the extra information to get to the points
complete_str <- str_match_all(site.geom, '(\\(\\(\\((.*)\\)\\)\\))')[[1]][, 3]
bounds <- strsplit(complete_str, ', ')[[1]]

# Getting the bounding box of the polygon
bounding_box <- getBounds(bounds)
-
These are the names of the full field RGB data for the month of May. We will be extracting our plot data from these files. A compressed file containing these images can be found on Google Drive. Be sure to extract the image files into a folder that’s accessible to the code below.
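A sketch of building the image_files vector used by the code below; the folder name and the .tif extension are assumptions about where and how you extracted the download.

image_files <- list.files("rgb_fullfield_may",   # folder where the images were extracted
                          pattern = "\\.tif$",
                          full.names = TRUE)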
We will loop through these images, extract our plot data, and calculate the “greenness” of each extract. We are using the name of the file to extract the date for later.
-
library(raster)
library(stringr)
library(dplyr)

# Extract the date from the file name
getDate <- function(file_name){
  date <- str_match_all(file_name, '[0-9]{4}-[0-9]{2}-[0-9]{2}')[[1]][, 1]
  return(date)
}


# Get the clip coordinates into the correct order
clip_coords <- paste(toString(bounding_box[4]), ' ', toString(bounding_box[3]),
                     ' ', toString(bounding_box[2]), ' ', toString(bounding_box[1]))


# Returns the greenness value of the clipped image
getGreenness <- function(file_name, clip_coords){
  out_file <- "extract.tif"

  # Execute the GDAL command to extract the plot
  command <- paste("gdal_translate -projwin ", clip_coords, " ", file_name, " ", out_file)
  system(command)

  # Load the red & green bands of the clipped image
  red_image <- raster(out_file, band = 1)
  green_image <- raster(out_file, band = 2)

  # NGRDI = (G - R) / (G + R), summed over the plot
  numerator <- cellStats(green_image - red_image, stat = "sum")
  denominator <- cellStats(green_image + red_image, stat = "sum")

  greenness <- numerator / denominator

  # Remove the temporary file
  if (file.exists(out_file))
    file.remove(out_file)

  return(greenness)
}

# Extract all the dates from the images
day <- sapply(image_files, getDate, USE.NAMES = FALSE)

# Extract the greenness of the plot in each image
greenness <- sapply(image_files, getGreenness, clip_coords = clip_coords, USE.NAMES = FALSE)

# Build the final day and greenness data frame
greenness_df <- data.frame(day, greenness)

# Convert to tibble for later joining
greenness_df <- as_tibble(greenness_df)
greenness_df <- greenness_df %>%
  mutate(day = as.Date(day))
-
We then pull in the canopy data for our charting purposes.

We now need to add the height data to the data set we are going to plot.

We then determine the average canopy cover across the site for the day that the sensor data were collected. The relationship between our greenness metric and average canopy cover is plotted.
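A sketch of that comparison; canopy_cover_daily and its mean_cover column are stand-ins for the daily average canopy cover computed from the trait data.

library(dplyr)
library(ggplot2)

greenness_cover <- greenness_df %>%
  left_join(canopy_cover_daily, by = "day")

ggplot(greenness_cover, aes(x = mean_cover, y = greenness)) +
  geom_point() +
  labs(x = "Mean canopy cover", y = "Greenness (NGRDI)")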
If you have not already done so, you will need to 1) sign up for the beta user program and 2) sign up and be approved for access to the sensor data portal in order to get the API key that will be used in this tutorial.
-
-
The terrautils python package has a new products module that aids in connecting plot boundaries stored within betydb with the file-based data products available from the workbench or Globus.
-
-
If you are using RStudio and want to run the Python code chunks, the R package “reticulate” is required.

Use pip3 install terrautils to install the terrautils Python library.
-
-
-
-
9.2 Getting started
-
After installing terrautils, you should be able to import the products module.
-
from terrautils.products import get_sensor_list, unique_sensor_names
from terrautils.products import get_file_listing, extract_file_paths
-
The get_sensor_list and get_file_listing functions both require the connection, url, and key parameters. The connection can be ‘None’. The url (called host in the code) should be something like https://terraref.ncsa.illinois.edu/clowder/. The key is a unique access key for the Clowder API.
-
-
-
9.3 Getting the sensor list
-
The first thing to get is the sensor name. This can be retrieved using the get_sensor_list function. This function returns the full record, which may be useful in some cases, but it primarily includes sensor names that contain a plot id number. The utility function unique_sensor_names accepts the sensor list and provides a list of names suitable for use in the get_file_listing function.
-
To use this tutorial you will need to sign up for Clowder, have your account approved, and then get an API key from the Clowder web interface.
-
url = 'https://terraref.ncsa.illinois.edu/clowder/'
key = 'ENTER YOUR KEY HERE'
Names will now contain a list of sensor names available in the Clowder geostreams API. The list of returned sensor names could be something like the following:
-
-
* flirIrCamera Datasets
* IR Surface Temperature
* RGB GeoTIFFs Datasets
* stereoTop Datasets
* scanner3DTop Datasets
* Thermal IR GeoTIFFs Datasets
* …
-
-
-
-
9.4 Getting a list of files
-
The geostreams API can be used to get a list of datasets that overlap a specific plot boundary, optionally limited by a time range. Iterating over the datasets allows the paths to all the files to be extracted.
-
sensor = 'Thermal IR GeoTIFFs Datasets'
sitename = 'MAC Field Scanner Season 1 Field Plot 101 W'
key = 'INSERT YOUR KEY HERE'
datasets = get_file_listing(None, url, key, sensor, sitename)
files = extract_file_paths(datasets)
-
Datasets can be further filtered using the since and until parameters of get_file_listing with a date string.
The source files behind the data are available for downloading through the API. By executing a series of requests against the API it’s possible to determine the files of interest and then download them.
-
Each of the API URLs has the same beginning (https://terraref.ncsa.illinois.edu/clowder/api), followed by the data needed for a specific request. As we step through the process you will be able to see how the end of the URL changes depending upon the request.
-
Below is what the API looks like as a URL. Try pasting it into your browser.
This will return data for the requested plot including its id. This id (or identifier) can then be used for additional queries against the API.
-
In the examples below we will be using curl on the command line to make our API calls. Since the API is accessed through URLs, it’s possible to use the URLs in software programs or with a programming language to retrieve its data.
-
-
9.5.1 A Word of Caution
-
From this point on we are no longer using the terrautils Python package, which provides helper functions that simplify interactions with the Clowder API. One of the ways it makes the interface easier is by using function names that make sense in the scope of the project; the raw API and the Clowder database use different names, and this can be confusing because the same terms are used for different parts of the database.

The names and meanings of variables in this section don’t necessarily match the ones in the section above, and it may be easy to get them confused. The API queries the database directly and thereby reflects the database structure. This is the main reason for the naming differences between the API and the terraref client.
-
For example, the Clowder API’s use of the term SENSOR_NAME is equivalent to site_name above.
-
-
-
9.5.2 Finding plot ID
-
We can query the API to find the identifier associated with the name of a plot. For this example we use the variable SENSOR_NAME to hold the name of the plot.
-
SENSOR_NAME="MAC Field Scanner Season 1 Field Plot 101 W"
curl -o plot.json -X GET "https://terraref.ncsa.illinois.edu/clowder/api/geostreams/sensors?sensor_name=${SENSOR_NAME}"
-
This creates a file named plot.json containing the JSON object returned by the API. The JSON object has an ‘id’ parameter. This ID parameter can be used to specify the correct data stream.
-
-
-
9.5.3 Finding stream ID within a plot
-
Use the sensor ID returned in the JSON from the previous call to get the stream ID. The names of streams are formatted as “<sensor name> Datasets (<sensor ID>)”.
-
SENSOR_ID=3355
STREAM_NAME="Thermal IR GeoTIFFs Datasets (${SENSOR_ID})"
curl -o stream.json -X GET "https://terraref.ncsa.illinois.edu/clowder/api/geostreams/streams?stream_name=${STREAM_NAME}"
-
A file named stream.json will be created containing the returned JSON object. This JSON object has an ‘id’ parameter that contains the stream ID. You can use this ID parameter to get the datasets, and then datapoints, of interest.
-
-
-
9.5.4 Listing Clowder dataset IDs for that plot & sensor stream
-
We now have a stream ID that we can use to list our datasets. The datasets in turn contain files of interest.
-
STREAM_ID=11586
curl -o datasets.json -X GET "https://terraref.ncsa.illinois.edu/clowder/api/geostreams/datapoints?stream_id=${STREAM_ID}"
-
After the call succeeds, a file named datasets.json is created containing the returned JSON object. As part of the JSON object there are one or more properties fields containing source_dataset parameters.
-
properties: {
  dataset_name: "Thermal IR GeoTIFFs - 2016-05-09__12-07-57-990",
  source_dataset: "https://terraref.ncsa.illinois.edu/clowder/datasets/59fc9e7d4f0c3383c73d2905"
},
-
The URL of each source_dataset can be used to view the dataset in Clowder.
-
The datasets can also be filtered by date. The following filters out datasets that are outside of the range of January 2, 2017 through June 10, 2017.
-
curl -o datasets.json -X GET "https://terraref.ncsa.illinois.edu/clowder/api/geostreams/datapoints?stream_id=${STREAM_ID}&since=2017-01-02&until=2017-06-10"
-
-
-
9.5.5 Getting file paths from dataset
-
Now that we know what the dataset URLs are, we can use the URLs to query the API for file IDs in addition to their names and paths.
-
Note that the URL has changed from our previous examples now that we’re using the URLs returned by the previous call.
-
SOURCE_DATASET="https://terraref.ncsa.illinois.edu/clowder/datasets/59fc9e7d4f0c3383c73d2905"
curl -o files.json -X GET "${SOURCE_DATASET}/files"
-
As before, we will have a file containing the returned JSON, named files.json in this case. The returned JSON consists of a list of the files in the dataset with their IDs, and other data if available:
Given that a large number of files may be contained in a dataset, it may be desirable to automate the process of pulling down files to the local system.
-
For each file to be retrieved, the unique file ID is needed on the URL.
-
FILE_NAME="ir_geotiff_L1_ua-mac_2016-05-09__12-07-57-990.tif"
FILE_ID=59fc9e844f0c3383c73d2980
curl -o "${FILE_NAME}" -X GET "https://terraref.ncsa.illinois.edu/clowder/api/files/${FILE_ID}"
-
This call will cause the server to return the contents of the file identified in the URL. This file is then stored locally in *ir_geotiff_L1_ua-mac_2016-05-09__12-07-57-990.tif*.
This book is intended to quickly introduce users to TERRA REF data through a series of tutorials. TERRA REF has many types of data, and most can be accessed in multiple ways. Although this makes it more complicated to learn (and teach!), the objective is to provide users with the flexibility to access data in the most useful way.
-
The first section walks the user through the steps of downloading and combining three different types of data: plot level phenotypes, meteorological data, and images. Subsequent sections provide more detailed examples that show how to access a larger variety of data and metadata.
-
-
1.1 Pre-requisites
-
While we assume that readers will have some familiarity with the nature of the problem - remote sensing of crop plants - for the most part these tutorials assume that users will bring their own scientific questions, a sense of curiosity, and an eagerness to learn.
-
These tutorials are aimed at users who are familiar with or willing to learn programming languages including R (particularly for accessing plot level trait data) and Python (primarily for accessing environmental data and sensor data). In addition, there are examples of using SQL for more sophisticated database queries as well as the bash terminal.
-
Some of the lessons only require a web browser; others will assume familiarity with programming at the command line in (typically only one of) Python, R, and / or SQL. You should be willing to find help (see finding help, below).
-
-
1.1.1 Technical Requirements
-
At a minimum, you should have:

* An internet connection
* Web browser
* Access to the data that you are using
  * The tutorials will state which databases you will need access to
* Software:
  * Software requirements vary with the tutorials, and may be complex
-
-
-
-
-
1.1.2 User Accounts and permission to access TERRA REF data
-
The first few chapters in the ‘vignettes’ section use publicly available sample data sets. Subsequent sections are also written to use publicly available data sets, but some of the examples use data that requires users to sign up for access. To sign up, you will need to 1) fill out the TERRA REF Beta user questionnaire (terraref.org/beta) and 2) request access to specific databases.