Introduction to Data Processing Workflow Languages

Exercises for the Harvard University Introduction to Data Processing Workflow Languages training course.

For details, see the course slides.

This repository contains sample pipelines in CWL and Nextflow.

The pipeline explores the correlation between numeric columns of a tab-separated file and plots the results as a bar chart. The user selects one column (the main variable) and a set of secondary columns; the pipeline then computes how strongly each secondary column correlates with the main variable.

The actual data file contains a mix of demographic, behavioral, climate, and air pollution data. It is hosted in an IBM Cloud S3 bucket.

Note: This pipeline is not intended as an example of best practices but as a playground for exploring different features of the CWL and Nextflow workflow definition languages.

The same tasks might be easier to do in a standalone Python or R program, but the goal here is to show how to use specialized domain-specific languages (DSLs) for workflow definition.

The pipeline performs the following steps:

1. Cleanse the data. The input file contains the string (null) in place of some numeric values. Rows with such values cannot be parsed as numbers by the pandas package and are therefore removed.
2. For every column to be correlated with the main variable, calculate the Pearson correlation coefficient using the Python pandas package. The calculations for different columns can run in parallel.
3. Gather and combine the results of the calculations in step 2.
4. Use Gnuplot to build a bar chart.
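
The first three steps can be sketched in plain Python to show the logic the workflow expresses. This is only an illustration under several assumptions, not the code from the src folder: it uses the standard library instead of pandas so that it runs without extra packages, the sample data and the column names (pm25, temperature, income) are invented, rows are dropped if any selected column fails to parse, and a thread pool stands in for whatever parallel execution the workflow engine provides.

```python
import csv
import io
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data; the real file and its column names differ.
SAMPLE_TSV = (
    "pm25\ttemperature\tincome\n"
    "1\t2\t10\n"
    "2\t4\t8\n"
    "(null)\t5\t7\n"
    "3\t6\t6\n"
    "4\t8\t(null)\n"
    "5\t10\t2\n"
)


def cleanse(rows, columns):
    """Step 1: keep only rows where every selected column parses as a number."""
    clean = []
    for row in rows:
        try:
            clean.append({col: float(row[col]) for col in columns})
        except (ValueError, TypeError):
            continue  # e.g. the literal string "(null)" or a missing field
    return clean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


def correlate_all(rows, main, others):
    """Steps 2 and 3: one task per secondary column, gathered into a dict."""
    main_values = [row[main] for row in rows]

    def task(col):
        return col, pearson(main_values, [row[col] for row in rows])

    # Each column is independent, so the tasks can run concurrently.
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(task, others))


rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
clean = cleanse(rows, ["pm25", "temperature", "income"])
result = correlate_all(clean, "pm25", ["temperature", "income"])

# Step 4 would pass a table like this to Gnuplot.
for col, r in result.items():
    print(f"{col}\t{r:.3f}")
```

In a workflow language, each of these functions would typically become a separate step, with the engine rather than the program deciding how the per-column tasks are scheduled.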

Repository: HarvardRC/pipelines-tutorial