28 Nov 2024

A Dockerized PacBio Workflow that slashed run time by 41%

WRITTEN BY

Sakshi Shinghal

Run time can become a major bottleneck when analyzing large files. So when a client approached us to build a long-read analysis pipeline, we created a Dockerized solution that cut run time by 41%. Our pipeline produced a VCF file containing over 50,000 variants from a 42 GB input file in just 120 minutes. Running the same analysis through each tool individually took 204 minutes.

The client was working with unaligned long reads from a Pacific Biosciences (PacBio) sequencer, but the analysis proved tedious and time-consuming. Each tool required separate setup, configuration, and resource allocation, which led to inconsistent environments, inefficient execution, and fragmented outputs.

We created a unified Docker-based solution that packages these scattered workflows into a single pipeline and optimizes them. With this solution, a typical 100 GB WGS (whole-genome sequencing) sample would be $1.24 cheaper to run, and over one month you could process 62 more WGS samples than by running each tool separately.

Running the analysis tools through this Docker container reduced the overall run time from 204 minutes, when each tool is used individually, to a combined 120 minutes.
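The run time figures can be checked with a few lines of arithmetic. The Python sketch below uses only the timings quoted in this post. The throughput estimate additionally assumes that run time scales linearly with input size and that samples run back to back for a 30-day month. Both are assumptions made for illustration and are not stated elsewhere in this post, although the result lands close to the 62-sample figure.

```python
# Timings reported in this post for the 42 GB input.
SEPARATE_MIN = 204   # each tool run individually
DOCKER_MIN = 120     # unified Docker pipeline
INPUT_GB = 42


def percent_reduction(before: float, after: float) -> float:
    """Relative saving as a percentage of the original run time."""
    return (before - after) / before * 100


def scale_to(size_gb: float, minutes: float, measured_gb: float = INPUT_GB) -> float:
    """ASSUMPTION: run time grows linearly with input size."""
    return minutes * size_gb / measured_gb


def samples_per_month(minutes_per_sample: float, days: int = 30) -> float:
    """ASSUMPTION: samples run back to back, around the clock, for `days` days."""
    return days * 24 * 60 / minutes_per_sample


reduction = percent_reduction(SEPARATE_MIN, DOCKER_MIN)
extra = (samples_per_month(scale_to(100, DOCKER_MIN))
         - samples_per_month(scale_to(100, SEPARATE_MIN)))

print(f"run time reduction: {reduction:.1f}%")
print(f"extra 100 GB samples per month: {extra:.1f}")
```

Under these assumptions the script reports a reduction of about 41% and roughly 62 additional 100 GB samples per month.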

 

Through this Docker solution, we were able to:

  • Manage different dependencies for each tool within a single Docker image without version conflicts
  • Ensure tools can run in parallel without interference, while handling interdependencies where they exist
  • Implement standardized logging and robust error handling
  • Keep the Docker image flexible enough to add tools and scalable enough to handle larger datasets in the future
  • Minimize Docker overhead so that the image does not slow down the analysis pipeline
  • Store parameters in config files for easy sharing and reproducibility
  • Provide the option to run only a subset of tools, depending on the user's needs
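To make several of the points above concrete, here is a minimal Python sketch of a config-driven runner that starts independent steps in parallel, respects dependencies, logs each step, and can run a subset of steps. It is an illustration only and not the pipeline described in this post: the step names, the `CONFIG` layout, and the `placeholder` commands are all hypothetical, and a real setup would load the config from a file and call the actual tools.

```python
import logging
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pipeline")


def placeholder(name):
    """Stand-in command; a real config would hold the tool's own command line."""
    return [sys.executable, "-c", f"print('{name} done')"]


# HYPOTHETICAL config: step names and dependencies are illustrative only.
CONFIG = {
    "align":         {"cmd": placeholder("align"), "needs": []},
    "call_variants": {"cmd": placeholder("call_variants"), "needs": ["align"]},
    "call_sv":       {"cmd": placeholder("call_sv"), "needs": ["align"]},
    "report":        {"cmd": placeholder("report"), "needs": ["call_variants", "call_sv"]},
}


def select(config, wanted):
    """Return the requested steps plus everything they depend on."""
    chosen, stack = set(), list(wanted)
    while stack:
        step = stack.pop()
        if step not in chosen:
            chosen.add(step)
            stack.extend(config[step]["needs"])
    return chosen


def run_step(name, cmd):
    """Run one step as a subprocess, logging start, finish, and failure."""
    log.info("starting %s", name)
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        log.error("%s failed: %s", name, result.stderr.strip())
        raise RuntimeError(f"step {name} failed")
    log.info("finished %s", name)
    return name


def run_pipeline(config, wanted=None, workers=4):
    """Run steps in waves: all steps whose dependencies are done run in parallel."""
    pending = select(config, wanted or config.keys())
    done = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            ready = sorted(s for s in pending if set(config[s]["needs"]) <= set(done))
            if not ready:
                raise ValueError("circular dependency in config")
            # Consuming the map results waits for the whole wave and re-raises failures.
            done.extend(pool.map(lambda s: run_step(s, config[s]["cmd"]), ready))
            pending -= set(ready)
    return done


order = run_pipeline(CONFIG, wanted=["call_variants"])
print(order)
```

Requesting only `call_variants` also runs `align`, because the config lists it as a dependency. Running in waves is a deliberate simplification: a step waits for the entire previous wave rather than just its own dependencies, which is easier to reason about but can leave some parallelism unused.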

 

Our Docker solution streamlines tool execution, enabling consistent environments, parallel processing, simplified commands, and consolidated outputs for efficient genomic analysis.

Contact us to learn more about how we can deliver a custom Docker solution for you!
