24 May 2024

Nextflow Leading the Way for Pipeline Optimization

WRITTEN BY

Divya Anantsri

Nextflow is a workflow management system for developing scalable and reproducible workflows. In recent years, it has become increasingly popular in bioinformatics because it lets users combine their various computational scripts into a single pipeline.

Some of the main advantages of using Nextflow are:

  • Portability 
  • Reproducibility 
  • Parallelization
  • Run time efficiency

Here at Strand, we have leveraged these features to optimize our methylation pipelines, cutting turnaround time by more than 80% and increasing data processing volume for our customers by more than 90%. To summarize:

  • The methylation pipelines were migrated to Nextflow and deployed on the Seqera platform, which cut costs by more than 50% and allowed ~100 samples to be processed simultaneously using AWS Batch and AWS ECS.
  • The use of the Sentieon toolkit, which breaks large files into smaller chunks for simultaneous processing, made processing over 20% faster than with non-parallelized tools like Picard.
  • The pipeline runtime was reduced from 8 days to just 1.5 days, enabling customers to increase processing from 1 batch per week (10 samples, 0.5 TB) to 20 batches per week (200 samples, 10 TB), ensuring the pipeline is no longer a bottleneck.
  • The pipeline was dockerized to encapsulate dependencies, enhance scalability and reduce overhead.
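
The chunk-and-process-in-parallel idea mentioned in the list above can be illustrated with a minimal Python sketch. This is not how Sentieon or our pipeline is implemented; the function names, the chunk size, and the per-chunk "work" (simply counting records) are hypothetical stand-ins chosen only to show the pattern.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def process_chunk(chunk):
    """Hypothetical per-chunk work: here, just count the records in the chunk."""
    return len(chunk)


def process_in_parallel(items, chunk_size, max_workers=4):
    """Process every chunk concurrently and combine the partial results."""
    chunks = split_into_chunks(items, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves chunk order, so partial results line up with chunks.
        partial_counts = list(pool.map(process_chunk, chunks))
    return sum(partial_counts)


if __name__ == "__main__":
    # Stand-in for the records of one large input file.
    records = [f"read_{i}" for i in range(1000)]
    print(process_in_parallel(records, chunk_size=100))
```

Threads keep the sketch short and portable. For CPU-bound work in Python, a process pool or a natively multithreaded tool would be the more realistic choice.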

Our bioinformatics team (Pavan Kotha, Neha Bhojani, Nishant Shekhar, Mayur Saini, Juhi Pandey, Ruthvik Bobba, and Jaya Singh) recently presented this work as a poster at Bio-IT World 2024. The PDF version of the poster is available here!

Overall, porting the pipeline to Nextflow, dockerizing workflow steps, deploying on the Seqera platform, and enabling multiprocessing led to significant improvements in scalability, efficiency, reproducibility, and cost-effectiveness.
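
As a quick sanity check, the headline figures can be recomputed from the numbers quoted above (8 days down to 1.5 days; 10 samples and 0.5 TB per week up to 200 samples and 10 TB per week). The helper names below are made up for illustration, and this is simple arithmetic rather than a benchmark.

```python
def percent_reduction(before, after):
    """Percentage by which a quantity dropped going from before to after."""
    return (before - after) / before * 100.0


def fold_change(before, after):
    """How many times larger the new value is than the old one."""
    return after / before


# Figures taken from the summary above.
runtime_reduction = percent_reduction(8.0, 1.5)   # pipeline runtime, in days
sample_fold = fold_change(10, 200)                # samples per week
data_fold = fold_change(0.5, 10.0)                # TB per week

if __name__ == "__main__":
    print(f"Runtime reduction: {runtime_reduction:.2f}%")
    print(f"Samples per week:  {sample_fold:.0f}x")
    print(f"Data per week:     {data_fold:.0f}x")
```

Under these inputs, the runtime reduction works out to 81.25%, consistent with the "over 80%" figure, and weekly throughput rises 20-fold by both sample count and data volume.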

Stay tuned for future updates on this project! 
