
04 Jul 2024

Overcoming Data Management Hurdles in Multiomics Analysis

WRITTEN BY

Divya Anantsri


Biomedical insights based on a single omics level are only one piece of the puzzle.

Integrating and analyzing various omics layers (genomics, transcriptomics, proteomics, metabolomics and others) allows us to probe deeper into disease pathways and potential drug targets.

With these multi-layered molecular insights, we are able to predict disease, accelerate drug discovery and provide personalized treatments to patients.

 

However, bringing high-throughput multi-omics data together comes with the challenges of data integration and management.

Combining diverse datasets is not straightforward, because each individual dataset brings its own complexities: missing values, inconsistent metadata and non-standard naming conventions are common.
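To make these problems concrete, here is a minimal Python sketch of a first-pass audit, assuming sample metadata has already been loaded as a list of dictionaries. The records, field names and the `audit` helper are hypothetical illustrations, not Strand's tooling. The sketch counts missing values per field and flags raw spellings that collapse to the same value once case and surrounding whitespace are ignored.

```python
from collections import defaultdict

# Hypothetical sample-metadata records merged from two studies; None marks a missing value.
records = [
    {"sample_id": "S1", "tissue": "Liver", "sex": "Male"},
    {"sample_id": "S2", "tissue": "liver", "sex": None},
    {"sample_id": "S3", "tissue": "LIVER ", "sex": "M"},
    {"sample_id": "S4", "tissue": None, "sex": "female"},
]


def audit(records, fields):
    """Count missing values and list raw spellings that normalize to the same value."""
    missing = {f: 0 for f in fields}
    variants = {f: defaultdict(set) for f in fields}
    for rec in records:
        for f in fields:
            value = rec.get(f)
            if value is None or str(value).strip() == "":
                missing[f] += 1
                continue
            # Group raw spellings under a case- and whitespace-insensitive key.
            variants[f][str(value).strip().lower()].add(value)
    # Keep only normalized keys that appear under more than one raw spelling.
    inconsistent = {
        f: {key: sorted(raw) for key, raw in groups.items() if len(raw) > 1}
        for f, groups in variants.items()
    }
    return missing, inconsistent


missing, inconsistent = audit(records, ["tissue", "sex"])
print(missing)
print(inconsistent)
```

The audit does not flag `"M"` and `"Male"`. Case-folding only catches spelling variants, and recognizing abbreviations or synonyms requires a controlled vocabulary.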

This came up in a talk I attended at Bio-IT World 2024, where Dr Joseph Pearson, Global Product Manager at Qiagen, discussed the benefits of unified omics data and how it can accelerate the bioinformatic prioritization of drug targets. He noted that data scientists and bioinformaticians spend 80% of their time cleaning data, since integrated data is a prerequisite for answering biological questions.

Data integration, which includes onboarding, cleaning, harmonization and manual curation, is therefore a crucial step on our path toward multiomics discovery.

On this note, we presented a poster at Bio-IT World 2024 emphasizing our strategies for managing multi-omics datasets through harmonization and curation.

It is becoming increasingly clear that good data management cannot happen in silos. This is especially true for multi-omics data, which requires a range of experts to understand and process the breadth of information. At Strand, curators, bioinformaticians, data scientists and software engineers work together to make this happen.

 

 

Curators, bioinformaticians, data scientists and software engineers work together to manage the workflow from data ingestion to uploading of harmonized data for user access.

 

 

The process begins with data ingestion: we onboard multi-omics data from sources such as public databases (GEO, SRA and ENA), publications, cloud-based data repositories, custom databases and labs, and integrate it into a unified format.
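The following minimal sketch shows what mapping records into "a unified format" can look like in practice. It assumes that each source exposes sample metadata as key-value records with its own field names. The `UnifiedSample` fields, the source names and the per-source mappings are all hypothetical placeholders, because real GEO, SRA and ENA exports each follow their own schemas.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UnifiedSample:
    """Hypothetical unified record that every source is mapped into."""
    sample_id: str
    source: str
    organism: Optional[str]
    assay: Optional[str]


# Hypothetical per-source field mappings (unified field -> source field).
# These names are placeholders, not the real field names of any public database.
FIELD_MAPS = {
    "source_a": {"sample_id": "accession", "organism": "species", "assay": "assay_kind"},
    "source_b": {"sample_id": "id", "organism": "organism_name", "assay": "assay_type"},
}


def to_unified(raw: dict, source: str) -> UnifiedSample:
    """Translate one source-specific record into the unified record type."""
    mapping = FIELD_MAPS[source]
    return UnifiedSample(
        sample_id=raw[mapping["sample_id"]],    # the identifier is required
        source=source,
        organism=raw.get(mapping["organism"]),  # optional fields default to None
        assay=raw.get(mapping["assay"]),
    )


batch = [
    ({"accession": "A001", "species": "Homo sapiens", "assay_kind": "RNA-Seq"}, "source_a"),
    ({"id": "B042", "organism_name": "Homo sapiens"}, "source_b"),
]
unified = [to_unified(raw, src) for raw, src in batch]
for sample in unified:
    print(sample)
```

This design keeps the source-specific knowledge in a mapping table rather than in code, so onboarding a new source mostly means adding one entry.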

 

Next, we apply harmonization and curation methods to prepare the data for downstream sharing and analysis. These include:

  • using controlled vocabulary
  • developing metadata schemas
  • enforcing naming conventions
  • providing clear definitions
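The four practices above can be combined in a single harmonization pass. The Python sketch below is illustrative only. The vocabulary, schema and field names are hypothetical, and a production pipeline would typically draw its terms from an established ontology rather than a hand-written dictionary. The sketch renames fields to snake_case, maps synonyms onto canonical terms and rejects records that are missing a required field. The schema also doubles as a place to keep each field's definition.

```python
import re

# Hypothetical controlled vocabulary: canonical term -> accepted lowercase synonyms.
CONTROLLED_VOCAB = {
    "sex": {"male": {"male", "m"}, "female": {"female", "f"}},
    "tissue": {"liver": {"liver", "hepatic tissue"}},
}

# Hypothetical metadata schema: required fields and their definitions.
SCHEMA = {
    "sample_id": "Unique identifier of the biological sample",
    "sex": "Biological sex of the donor",
    "tissue": "Tissue of origin",
}


def to_snake_case(name: str) -> str:
    """Naming convention: lowercase words joined by underscores."""
    # Insert "_" at lowercase/digit -> uppercase boundaries ("SampleID" -> "Sample_ID").
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Collapse any other run of non-alphanumeric characters into a single "_".
    return re.sub(r"[^0-9a-zA-Z]+", "_", name).strip("_").lower()


def harmonize(record: dict) -> dict:
    # 1. Enforce the naming convention on field names.
    rec = {to_snake_case(k): v for k, v in record.items()}
    # 2. Map free-text values onto controlled-vocabulary terms.
    for field, terms in CONTROLLED_VOCAB.items():
        value = rec.get(field)
        if value is None:
            continue
        key = str(value).strip().lower()
        matches = [canon for canon, synonyms in terms.items() if key in synonyms]
        if not matches:
            raise ValueError(f"{field}={value!r} is not in the controlled vocabulary")
        rec[field] = matches[0]
    # 3. Validate against the schema's required fields.
    missing = [f for f in SCHEMA if rec.get(f) is None]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return rec


print(harmonize({"SampleID": "S3", "Sex": "M", "Tissue": "Hepatic tissue"}))
```

Values outside the vocabulary raise an error rather than passing through silently, so they can be routed to a curator for review.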

Finally, once the datasets are harmonized, we upload them to cloud storage or another specified location with appropriate access controls.

This streamlined system has enabled us to:

  • harmonize over 1000 datasets to improve gene models
  • source scRNA-seq and scATAC-seq data for model building
  • create an integrated atlas and a time-series model for cell-cell interaction studies

Yasodha Kannan S. in conversation about this poster at Bio-IT World 2024

Our efforts to standardize and manage diverse datasets do not end here. We're actively working on improving and automating the metadata curation process, and the team is experimenting with large language models (LLMs) to automate ontology mapping.
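The post does not describe the LLM approach itself. The sketch below therefore shows only the shape of the ontology-mapping task, using a much simpler, swapped-in technique: lexical similarity from Python's standard `difflib` module. It is neither an LLM method nor Strand's implementation, and the ontology slice and its term IDs are placeholders. A free-text label goes in, and the closest ontology term comes out. If no label is similar enough, the function returns nothing.

```python
import difflib

# Hypothetical slice of an ontology: term ID -> preferred label plus synonyms.
# The IDs are placeholders, not real ontology identifiers.
ONTOLOGY = {
    "TERM:0001": ["hepatocyte", "liver parenchymal cell"],
    "TERM:0002": ["T cell", "T lymphocyte"],
    "TERM:0003": ["B cell", "B lymphocyte"],
}


def map_to_ontology(label: str, cutoff: float = 0.85):
    """Return (term_id, matched_label) for the closest label, or None below the cutoff."""
    # Flatten the ontology into lowercase label -> term ID.
    lookup = {syn.lower(): tid for tid, syns in ONTOLOGY.items() for syn in syns}
    hits = difflib.get_close_matches(label.strip().lower(), lookup.keys(), n=1, cutoff=cutoff)
    if not hits:
        return None  # leave the label unmapped for a human curator
    return lookup[hits[0]], hits[0]


for raw_label in ["T-lymphocytes", "Hepatocytes", "fibroblast"]:
    print(raw_label, "->", map_to_ontology(raw_label))
```

A baseline like this handles plural and punctuation variants. However, it misses synonyms that share little spelling, such as an abbreviation and its expansion, and a model with broader language knowledge may help in those cases.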

You can download a PDF version of this poster from our website!

Feel free to get in touch with Yasodha Kannan Sivasamy, Radhakrishna Bettadapura, Jaya Singh, PhD, or Ernie Hobbs to learn about our data harmonization services.
