AI-Assisted Variant Interpretation

LLM-Assisted Variant Interpretation

Value proposition: faster variant interpretation for germline applications through the use of LLMs.

01/ Our AI and LLM workflow

LLM-Assisted Variant Interpretation

  • Clinical notes (doctors' notes) arrive as images; the text is extracted with Cloud Vision.
  • Phenotypes are extracted from the text by a hybrid of fine-tuned and zero-shot LLMs.
  • Phenotypes are then mapped to HPO (Human Phenotype Ontology) terms by semantic search against a vector database of HPO terms.
  • The publicly available HPO-to-gene database is used to find the associated genes.
  • Results: Converting doctors' notes to HPO terms and then to associated genes gives a best recall of 94.7%, i.e., it misses about 5.3% of the genes associated with the underlying HPO terms. The best human performance, for comparison, is 96%.
  • Some genes are not sufficiently curated in HPO, e.g., KCDT3.
  • These genes are curated through a workflow that uses PubMed and MedGen to find the phenotypes associated with them.
  • This augmentation method further improves recall by 1.5%.
  • Gene scoring consists of two components: an HPO score and a variant score. Higher-scoring genes have more specific phenotypes and better variant characteristics, as measured by well-known methods.
  • Genes are ranked according to score.
  • We created a benchmark of 1,500 cases with reportable variants. Of these, 877 were positive, i.e., involved a pathogenic or likely pathogenic variant.
  • The ranking described above placed the reported gene in the top 3 in 75% of all positive cases.
  • Fewer than 0.75% of cases had a reported gene outside the top 25 ranked genes.
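
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production workflow: the three-dimensional vectors, the HPO-to-gene links, and the function names are toy, hypothetical values; a real system would embed text with a model and load links from the public HPO annotation files. The source does not say how the HPO score and the variant score are combined, so the sketch simply assumes a sum.

```python
from math import sqrt

# Toy, hypothetical data for illustration only.
HPO_VECTORS = {
    "HP:0001250": [0.9, 0.1, 0.0],  # Seizure
    "HP:0001263": [0.1, 0.9, 0.1],  # Global developmental delay
    "HP:0000252": [0.0, 0.2, 0.9],  # Microcephaly
}
HPO_TO_GENES = {
    "HP:0001250": {"SCN1A", "KCNQ2"},
    "HP:0001263": {"KCNQ2", "MECP2"},
    "HP:0000252": {"MECP2", "ASPM"},
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def nearest_hpo(phenotype_vector):
    """Return the HPO ID whose vector is most similar to the phenotype vector."""
    return max(HPO_VECTORS, key=lambda hpo: cosine(phenotype_vector, HPO_VECTORS[hpo]))


def candidate_genes(phenotype_vectors):
    """Map each phenotype to its nearest HPO term and collect the linked genes."""
    genes = set()
    for vector in phenotype_vectors:
        genes |= HPO_TO_GENES[nearest_hpo(vector)]
    return genes


def recall(found, expected):
    """Fraction of the expected genes that were found."""
    return len(found & expected) / len(expected)


def rank_genes(hpo_scores, variant_scores):
    """Rank genes by the sum of the two score components (assumed combination)."""
    genes = set(hpo_scores) | set(variant_scores)
    total = {g: hpo_scores.get(g, 0.0) + variant_scores.get(g, 0.0) for g in genes}
    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(genes, key=lambda g: (-total[g], g))


def top_k_rate(cases, k):
    """Share of cases whose reported gene appears in the top k of its ranking."""
    hits = sum(1 for ranking, reported in cases if reported in ranking[:k])
    return hits / len(cases)
```

For a benchmark like the one described above, `cases` would hold one `(ranking, reported_gene)` pair per positive case, and `top_k_rate(cases, 3)` would give the top-3 figure.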
02/ ACMG Labelling

Automated ACMG Labelling

For labels needing literature review

LLM Acceleration

LLMs are used to speed up assignment of the functional-study and segregation-study labels.

Smart Variant Labelling

The workflow uses an LLM that is prompted with the ACMG criteria and supplied with gene- and variant-specific information from a literature search.
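
As a rough sketch of what such a prompt could look like, the snippet below assembles the criteria text and literature excerpts into a single string. Everything here is an assumption for illustration: the function name `build_prompt`, the prompt wording, and the choice of the four functional and segregation evidence codes (paraphrased from the ACMG/AMP guidelines) are not taken from the actual workflow, and no specific LLM API is called.

```python
# Paraphrased ACMG/AMP evidence codes for functional and segregation studies.
# The selection and wording are illustrative, not the workflow's real prompt.
ACMG_CRITERIA = {
    "PS3": "Well-established functional studies show a damaging effect on the gene or gene product.",
    "BS3": "Well-established functional studies show no damaging effect on protein function or splicing.",
    "PP1": "Cosegregation with disease in multiple affected family members.",
    "BS4": "Lack of segregation in affected members of a family.",
}


def build_prompt(gene, variant, snippets):
    """Assemble one prompt from the criteria, the variant, and literature excerpts."""
    criteria = "\n".join(f"- {code}: {text}" for code, text in ACMG_CRITERIA.items())
    # Number the excerpts so the model can cite them in its answer.
    evidence = "\n".join(f"[{i}] {s}" for i, s in enumerate(snippets, start=1))
    return (
        "You are assisting a clinical variant curator.\n"
        f"Gene: {gene}\nVariant: {variant}\n\n"
        "ACMG criteria to consider:\n" + criteria + "\n\n"
        "Literature excerpts:\n" + (evidence or "(none found)") + "\n\n"
        "For each criterion, answer 'met', 'not met', or 'insufficient evidence', "
        "and cite the excerpt numbers that support the answer."
    )


if __name__ == "__main__":
    # Placeholder gene, variant, and excerpt, used only to show the prompt layout.
    print(build_prompt("SCN1A", "c.123A>G", ["Patch-clamp assays showed loss of channel function."]))
```

Numbering the excerpts lets a reviewer trace each suggested label back to the passage that supports it.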

Genomic Speedup

On average, LLM-enabled ACMG labelling cuts the time spent per case by a factor of ≈5.

1.5x Faster ACMG Analysis

The overall workflow raises throughput from 25 to ≈40 variants reviewed per person per day.