Fundamentals of Data Analysis Workflows

Date: Tue 18 Apr 2023
Location: Training Portal
Best for: Independent User

Would you like to learn more about data analysis workflows and how to implement them in Nextflow and Apache Airflow?

Data-intensive fields in industry, such as machine learning and bioinformatics, require methodical, scalable, and reproducible workflows. In this course we will introduce you to the principles of scalable and reproducible data analysis, workflow design, and established practices for developing data analysis pipelines. We will deepen your understanding of data analysis pipelines through practical demonstrations in Nextflow and Apache Airflow. You will gain insight into the established best practices and tools that support reproducible, data-intensive analysis.

The relationship between workflows and pipelines is often misunderstood, and the two terms are frequently used interchangeably. We will explore workflows and pipelines by examining how scientific questions are approached and which computational steps are typically taken to help answer them.

Aimed at independent users, the course covers:

  • An introduction to data analysis workflows and pipelines, including an exploration of the phases of data analysis, how to distinguish workflows from pipelines, and a tour of pipeline types.
  • Demonstrations of example Apache Airflow and Nextflow pipelines.
  • Guidance on how to get started with your own pipeline.

Prerequisites: None

Create a free account on our Training Portal to register for a course and browse our full training catalogue.

Join Newsletter

Provide your details to receive regular updates from the STFC Hartree Centre.