
Enhancing Visualizations for Biodiversity Data

RAHUL edited this page Mar 25, 2019 · 16 revisions

Background

Standardizing and aggregating publicly available biodiversity data from numerous sources (e.g., scientific research, citizen science, and natural history collections) within a single portal has the potential to answer a staggering variety of research questions. Yet biodiversity big data are prone to various data quality issues and biases, which may invalidate their use in research. Furthermore, handling biodiversity big data requires complex technical and analytical skills. The bdverse is a collection of packages that form a general framework for facilitating biodiversity science in R. It comprises six packages arranged in a hierarchical structure that represents different levels of functionality, and it can be used by users with or without programming skills. We hope it will serve as a sustainable and agile infrastructure that enhances the value of biodiversity data by allowing users to conveniently employ R for data exploration, quality assessment, data cleaning, and standardization.

The bdvis R package [1, 2, 3], created by Dr. Vijay Barve and Dr. Javier Otegui, is one of the packages in this hierarchy. It provides a set of functions to create visualizations for different aspects of biodiversity data, such as inventory completeness and the extent of coverage (taxonomic, temporal and spatial). These features can promote the identification of gaps and biases in the data. Moreover, bdvis exemplifies the value of diagnostic visualization for quickly unveiling patterns and anomalies in biodiversity big data. This coding project is aimed at adding state-of-the-art functionalities to bdvis as explained below.

Related work

bdvis, available from CRAN, has functions for spatial visualizations (bdwebmap, mapgrid and distrigraph), temporal visualizations (bdcalendarheat, chronohorogram and tempolar) and taxonomic visualizations (taxotree). A few other R packages, such as BiodiversityR, gdm and metacoder, offer limited functionality for visual exploration. However, all of these lack most exploration methods, offer little interactivity and are confined to specific use cases.

More interactive visualizations, such as density grid maps, chronohorograms, average distribution of records within years, ecological sensitivity maps, Voronoi treemaps, cartographic maps, taxonomy bar charts, taxonomy bubble charts and Sankey diagrams, will allow users to select suspect records and explore them further (drilling down). Such visualizations can greatly facilitate user-level data cleaning procedures. The opportunities for visualizations are endless, and we are open to suggestions as well. An interactive visualization dashboard was developed by one of our students as a Proof of Concept (PoC) for this project; it illustrates the potential of dashboards for biodiversity data.

Details of your coding project

We plan to incorporate two state-of-the-art elements into bdvis: interactive plotting and dashboards. We plan to develop and test an interface that enables graphic interactivity with ‘drilling down’ capabilities. Different tools, techniques and R packages will be evaluated. The most suitable solution will be incorporated into bdvis, and relevant visualization templates will be constructed. The tentative steps are as follows:

  1. Getting familiar with GBIF data and the rgbif package.
  2. Getting familiar with biodiversity visualizations and the bdvis package.
  3. Experimenting with new visualizations and R capabilities.
  4. Developing these visualizations as stable functions.
  5. Getting familiar with shiny UI development and the bdverse package system.
  6. Developing visualizations as shiny modules and dashboards.
  7. Adding these functionalities to the bdvis package and the bdverse ecosystem.
  8. Being extremely experimental and creative.
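
As a rough illustration of step 6, the following is a minimal sketch of a shiny module that filters occurrence records by year and draws a bar chart. It is not part of bdvis: the names `records_per_year`, `yearBarUI` and `yearBarServer` are hypothetical, the input is assumed to be a data frame with a numeric `year` column (as rgbif typically returns), and `moduleServer()` assumes shiny 1.5.0 or later.

```r
library(shiny)

# Hypothetical helper: count occurrence records per year.
# Assumes `occ` is a data frame with a numeric `year` column; NA years are dropped.
records_per_year <- function(occ) {
  counts <- table(occ$year)
  data.frame(
    year = as.integer(names(counts)),
    n = as.integer(counts)
  )
}

# Module UI: a year-range slider and a bar chart (hypothetical names).
yearBarUI <- function(id) {
  ns <- NS(id)
  tagList(
    sliderInput(ns("range"), "Years", min = 1900, max = 2019,
                value = c(1990, 2019), sep = ""),
    plotOutput(ns("bars"))
  )
}

# Module server: `occ` is a reactive that yields the occurrence data frame.
yearBarServer <- function(id, occ) {
  moduleServer(id, function(input, output, session) {
    filtered <- reactive({
      yrs <- occ()$year
      keep <- !is.na(yrs) & yrs >= input$range[1] & yrs <= input$range[2]
      occ()[keep, , drop = FALSE]
    })
    output$bars <- renderPlot({
      req(nrow(filtered()) > 0)  # skip drawing when no records match
      counts <- records_per_year(filtered())
      barplot(counts$n, names.arg = counts$year,
              xlab = "Year", ylab = "Records")
    })
    filtered  # returned so a parent app can drill down into the subset
  })
}

# Toy data for a quick manual check; real data would come from rgbif.
demo <- data.frame(year = c(2001, 2001, 2005, NA))
ui <- fluidPage(yearBarUI("years"))
server <- function(input, output, session) {
  yearBarServer("years", reactive(demo))
}
if (interactive()) runApp(shinyApp(ui, server))
```

Returning the filtered reactive from the module is one possible way to support drilling down: a parent dashboard could pass the selected subset on to another module.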

Skills Required

R, shiny, data visualization, HTML, JavaScript, testthat

Expected impact

Diagnostic visualization can unveil hidden patterns and anomalies in the data and allow quick exploration of massive datasets. Developing novel interactive visualizations, coupled with a modular dashboard system for biodiversity data that R experts and novices alike can easily employ, will promote biodiversity research.

Mentors

Students, please contact the mentors below after completing at least one of the tests in the Tests section.

  • Thiloshon Nagarajah [email protected] is a key member of the bdverse development team. He is a past GSoC and GCI student with the Fedora Project, Sahana Foundation and R Language.
  • Vijay Barve [email protected] is the author and maintainer of bdvis and a key member of the bdverse development team. Vijay is a biodiversity data scientist and has been a GSoC student and mentor with the R project organization since 2012. Vijay has contributed to several packages on CRAN.
  • Tomer Gueta [email protected] is leading the bdverse project. He is a postdoctoral fellow at the Faculty of Civil and Environmental Engineering at the Technion, working with Prof. Yohay Carmel. His research deals with developing tools and methodologies for data-intensive biodiversity research. For the last two years, Tomer has served as a GSoC mentor with the R project organization.

Join our Slack Channel for immediate clarifications: bd-r-group.slack.com

Tests

Students, please do one or more of the following tests before contacting the mentors above.

  • Easy: Download 10,000 GBIF occurrence records of mammals in the U.S. (georeferenced records only) using the ‘rgbif’ R package.
  • Medium: Build a few visualizations in R with ‘bdvis’ that most effectively summarize the mammal data you downloaded. Extra points for high aesthetics and creativity.
  • Medium: Write a simple shiny app with these visualizations.
  • Hard: Create a markdown report on visualizations not covered in bdvis that can be used in biodiversity research.
  • Hard: Choose one such visualization and create a simple shiny module for drawing it.
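
As a starting point for the first two tests, here is a minimal sketch of how the download and a first pass of bdvis plots might look. It assumes that the rgbif and bdvis packages are installed and that network access to GBIF is available. The wrapper names `fetch_mammals` and `summarize_mammals` are hypothetical, and the choice of plots is only one possibility.

```r
# Hypothetical wrapper: fetch georeferenced mammal records for one country.
# Functions are called with the pkg:: prefix to show where each one comes from.
fetch_mammals <- function(n = 10000, country = "US") {
  # Look up the GBIF backbone key for the class Mammalia.
  key <- rgbif::name_backbone(name = "Mammalia", rank = "class")$usageKey
  res <- rgbif::occ_search(classKey = key, country = country,
                           hasCoordinate = TRUE, limit = n)
  res$data  # the occurrence records as a data frame
}

# Hypothetical wrapper: standardize the columns, then draw a few bdvis plots.
summarize_mammals <- function(occ) {
  occ <- bdvis::format_bdvis(occ, source = "rgbif")  # map rgbif columns to bdvis names
  bdvis::bdsummary(occ)                    # text summary of the dataset
  bdvis::mapgrid(occ, ptype = "records")   # gridded map of record density
  bdvis::tempolar(occ, timescale = "m")    # polar plot of records by month
  invisible(occ)
}

# Only run the download in an interactive session; it may take several minutes.
if (interactive()) {
  mammals <- fetch_mammals()
  summarize_mammals(mammals)
}
```

Looking up the taxon key with `name_backbone()` rather than hard-coding it keeps the sketch independent of any particular key value.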

Solutions of tests

Students, please post the GitHub link to your test solutions here in the following format:

  1. Name - Email - University - Link to solutions