In the scientific community, particularly within the vast and diverse biomedical field, the need for harmonized data is critical. Harmonization not only adheres to the FAIR principles—making datasets findable, accessible, interoperable, and reusable—but also turns diverse data sources into a unified, standardized, and reliable research foundation. By providing comparable variables, harmonized datasets become immediately usable for research, facilitating deeper analysis and interpretation.
Furthermore, harmonized data enhances the statistical robustness of subsequent analyses and enables researchers to assess the generalizability of findings. This process ensures that discoveries are not only relevant within specific datasets but can also be applied more broadly across different research contexts.
Applications of harmonized data in the biomedical field include:
Data harmonization has advanced research in neurology, immunology, and biotechnology, enabling significant strides in disease understanding, diagnostics, and treatment innovation. The following initiatives underscore this impact:
The Alzheimer’s Disease Neuroimaging Initiative (ADNI) has been pivotal in identifying Alzheimer’s biomarkers, as demonstrated by its contribution to over 1,500 publications. The Human Immunology Project Consortium (HIPC) has accelerated vaccine and immunotherapy advancements by developing a comprehensive immune response catalog. Similarly, The Cancer Genome Atlas (TCGA) has reshaped our understanding of cancer with 2.5 petabytes of data, leading to breakthrough publications and the development of FDA-approved treatments.
In addition, initiatives like the Human Cell Atlas (HCA) have been instrumental in providing detailed cellular insights and promoting global scientific collaboration through open-source data and advanced visualization tools, such as cBioPortal. Visualization platforms coupled with clean data enhance data interpretation and decision-making processes [1].
Recent stories from Strand:
Biomarker Validation
- Strand recently collaborated with a biotech company to validate biomarkers for early detection of a specific autoimmune disease.
- Our team used public omics data for this task, efficiently validating the biomarkers and stratifying them by disease stage and phenotype.
This approach not only streamlined the categorization of biomarkers across omics fields but also significantly reduced the company's biomarker research costs, in part by allowing it to forgo non-viable targets.
Atlas Generation and Data Visualization
- Two of our clients required data harmonization and atlas generation to meet their specific research objectives.
- Strand’s data management process involved a comprehensive workflow starting with curating raw data from over 1,500 experiments, totaling more than 600 terabytes.
- This data was processed, analyzed, and organized into a hierarchy to ensure the accuracy and consistency of terminology before being moved to a secure storage area.
- By collecting detailed information from users regarding their studies and experiments, we created standardized datasets.
- These files were validated and then prepared for visualization, undergoing a thorough recheck on our visualization platform to maintain accuracy and consistency throughout our data management workflow.
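To make the terminology and validation steps above more concrete, here is a minimal Python sketch of how free-text metadata labels might be mapped to a controlled vocabulary and checked before a record is accepted. It is an illustration only, not Strand's actual pipeline: the vocabulary, the field names (`experiment_id`, `cell_type`, `tissue`), and the `harmonize_record` helper are all hypothetical.

```python
# Hypothetical controlled vocabulary: canonical term -> accepted synonyms.
# A real workflow would draw these from a curated ontology instead.
CELL_TYPE_SYNONYMS = {
    "T cell": {"t cell", "t-cell", "t lymphocyte"},
    "B cell": {"b cell", "b-cell", "b lymphocyte"},
}

# Invert the vocabulary so each synonym points at its canonical term.
SYNONYM_TO_CANONICAL = {
    synonym: canonical
    for canonical, synonyms in CELL_TYPE_SYNONYMS.items()
    for synonym in synonyms
}

# Hypothetical metadata fields that every record must supply.
REQUIRED_FIELDS = ("experiment_id", "cell_type", "tissue")


def normalize(label):
    """Lowercase a label and collapse runs of whitespace."""
    return " ".join(label.strip().lower().split())


def harmonize_record(record):
    """Return (harmonized_record, errors) for one metadata record."""
    # Flag required fields that are absent or blank.
    errors = [
        f"missing field: {field}"
        for field in REQUIRED_FIELDS
        if not str(record.get(field, "")).strip()
    ]
    harmonized = dict(record)
    raw_label = str(record.get("cell_type", ""))
    if raw_label.strip():
        canonical = SYNONYM_TO_CANONICAL.get(normalize(raw_label))
        if canonical is None:
            # Leave the value untouched and flag it for manual curation.
            errors.append(f"unmapped cell_type: {raw_label!r}")
        else:
            harmonized["cell_type"] = canonical
    return harmonized, errors


if __name__ == "__main__":
    record = {"experiment_id": "EXP-001", "cell_type": " T-Cell ", "tissue": "blood"}
    print(harmonize_record(record))
```

Records that come back with an empty error list can move on to visualization, while flagged records go back for curation. Unmapped labels are reported instead of guessed so that inconsistent terms are not silently carried into the standardized dataset.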
At Strand, we have enhanced our data curation and tracking systems to handle projects of this scale efficiently. We have also been able to create detailed atlases for different cell types, overcoming the complexities of integrating internal data and managing extensive datasets.
Throughout this blog series, we've explored the essential role of data harmonization in scientific research, delved into the function of ontologies in facilitating this process, and examined the various applications of harmonized data. As we've seen, harmonization not only enhances the reliability and comparability of data across different studies and disciplines but also significantly contributes to the acceleration of scientific discovery and innovation. As the demand for interoperable data continues to grow, the principles and practices of data harmonization will remain at the forefront of efforts to harness the full potential of scientific research in the digital age.
For a more in-depth look at how we helped a large pharmaceutical company with data harmonization, read this case study!
You can find the rest of the blog series below:
- Part 1: From Chaos to Clarity: Harmonizing Data
- Part 2: What’s in a Term?
- Part 3: Resolving Ontology Inconsistencies: Insights from Strand's Approach
Reference:
1. Nan Y, Ser JD, et al. Data harmonisation for information fusion in digital healthcare: A state-of-the-art systematic review, meta-analysis and future research directions. Inf Fusion. 2022 Jun;82:99-122. doi: 10.1016/j.inffus.2022.01.001. PMID: 35664012; PMCID: PMC8878813.