Enrichment of Variant Information for the Variant Standardization and Annotation Pipeline

Christina Coppola

Mentor: Dr. Jennifer Lee, Bioinformatics Applications Development Center (BADC) within the National Cancer Institute’s Frederick National Lab for Cancer Research

Date/Time: August 25, 2020, 3pm

Abstract: This project contributed to a pipeline that supports precision medicine initiatives at the Frederick National Laboratory for Cancer Research (FNLCR) and the National Cancer Institute (NCI) that advance cancer research. The National Cancer Institute-Molecular Analysis for Therapy Choice (NCI-MATCH) trial is a precision medicine clinical trial for cancer treatment. Precision medicine uses prevention and treatment strategies tailored to the distinct characteristics of individuals and their disease. With regard to cancer, this includes identifying tumor variants that can be targeted by a specific treatment.

This project focused on enhancing the Variant Standardization and Annotation Pipeline (VarSAP). VarSAP normalizes variants so that consistent variant terminology is used across projects, which helps with assigning trial participants to treatments. The enhancements aimed to add meaningful annotation data for variants on the adult MATCH treatment arms. We wrote a Python script that retrieves the annotation data VarSAP needs for variants of interest.
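VarSAP's own normalization rules are not described in this abstract. As a minimal sketch of the general idea only, assuming simple single-amino-acid substitutions written in HGVS-style protein notation, the hypothetical helper normalize_protein_change below maps different spellings of the same change (for example "p.Val600Glu", "p.V600E", and "V600E") to one canonical label so that records from different projects can be matched.

```python
import re

# Standard three-letter to one-letter amino acid codes, plus "Ter" for a stop codon.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}

# Matches simple substitutions such as "p.Val600Glu", "p.V600E", or "V600E".
# Anything more complex (indels, fusions, DNA-level "c." notation) is rejected.
_SUBSTITUTION = re.compile(
    r"^(?:p\.)?([A-Z][a-z]{2}|[A-Z])(\d+)([A-Z][a-z]{2}|[A-Z*])$"
)


def normalize_protein_change(gene: str, change: str) -> str:
    """Return a canonical 'GENE p.X###Y' label (hypothetical helper, not VarSAP's API)."""
    match = _SUBSTITUTION.match(change.strip())
    if match is None:
        raise ValueError(f"unsupported protein change: {change!r}")
    ref, pos, alt = match.groups()
    # Convert three-letter codes to one-letter codes; one-letter codes pass through.
    ref = THREE_TO_ONE.get(ref, ref)
    alt = THREE_TO_ONE.get(alt, alt)
    return f"{gene.upper()} p.{ref}{pos}{alt}"


if __name__ == "__main__":
    for raw in ("p.Val600Glu", "p.V600E", "V600E"):
        print(normalize_protein_change("braf", raw))  # each prints "BRAF p.V600E"
```

Collapsing every spelling to one string key is the simplest design for matching; a production pipeline would more likely rely on a dedicated HGVS validation and normalization tool.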

We conducted a comprehensive comparison of resources that house variant information. We also completed an in-depth analysis, with respect to the NCI-Adult-MATCH treatment arms, of three cancer knowledgebases: the precision oncology knowledgebase (OncoKB), the Clinical Interpretation of Variants in Cancer knowledgebase (CIViC), and the Catalogue of Somatic Mutations in Cancer (COSMIC). We wrote Python scripts for this analysis, and the resulting studies assessed the value of incorporating annotation data from these knowledgebases into VarSAP. Ultimately, we enriched the variant information in VarSAP with annotation from CIViC, a public cancer knowledgebase. CIViC contributes information on each variant's biological significance, clinical implications, protein domains, drug interactions, and transcripts.
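One way to picture the knowledgebase comparison is as a set-overlap calculation: for each knowledgebase, what fraction of the treatment-arm variants does it annotate, and which arm variants does no source cover? The sketch below is illustrative only. The functions coverage_by_source and uncovered and the placeholder labels "v1" to "v5" are hypothetical, not the scripts or data used in this project, and the sketch assumes every source's variants have already been normalized to the same labels as the treatment-arm variants.

```python
from typing import Dict, Set


def coverage_by_source(
    arm_variants: Set[str], source_variants: Dict[str, Set[str]]
) -> Dict[str, float]:
    """Fraction of treatment-arm variants that each knowledgebase annotates."""
    if not arm_variants:
        raise ValueError("arm_variants must not be empty")
    return {
        name: len(arm_variants & annotated) / len(arm_variants)
        for name, annotated in source_variants.items()
    }


def uncovered(arm_variants: Set[str], source_variants: Dict[str, Set[str]]) -> Set[str]:
    """Treatment-arm variants that no knowledgebase annotates."""
    annotated_anywhere: Set[str] = set().union(*source_variants.values())
    return arm_variants - annotated_anywhere


if __name__ == "__main__":
    # Placeholder labels only; these are not real coverage results.
    arm = {"v1", "v2", "v3", "v4", "v5"}
    sources = {
        "OncoKB": {"v1", "v2", "v9"},
        "CIViC": {"v1", "v2", "v3"},
        "COSMIC": {"v4"},
    }
    print(coverage_by_source(arm, sources))  # {'OncoKB': 0.4, 'CIViC': 0.6, 'COSMIC': 0.2}
    print(sorted(uncovered(arm, sources)))   # ['v5']
```

Variants that a knowledgebase annotates but that are not on any arm (such as "v9" above) are ignored, because the question is how well each source covers the variants that drive treatment assignment.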