Evaluating pathogenic variants of Fanconi anemia

Bioinformatics Internship Presentation

Subhiksha Nandakumar (Mentor: Dr. Settara Chandrasekharappa, National Institutes of Health)

August 26th, 3:20pm-3:40pm Room 1300, Harris Building.

Introduction

Fanconi anemia (FA) is the most common genetic form of aplastic anemia, a recessive disorder characterized by progressive pancytopenia, diverse congenital abnormalities and predisposition to malignancy. Sixteen FA genes are known: FANCA, FANCB, FANCC, FANCD1 (BRCA2), FANCD2, FANCE, FANCF, FANCG, FANCI, FANCJ (BRIP1), FANCL, FANCM, FANCN (PALB2), FANCO (RAD51C), FANCP (SLX4) and FANCQ (XPF/ERCC4). All FA genes are autosomal with the exception of FANCB, which is X-linked. The lab analyzes DNA from a large number of FA patients to identify the gene and the pathogenic variants causing the disease. DNA sequencing is performed using next-generation sequencing (NGS) technologies, and for comprehensive analysis, sequencing is complemented with custom array comparative genomic hybridization (aCGH) and RNA sequencing (RNAseq) in order to identify biallelic germline mutations. The aim of this project was to analyze sequence data from patients to find the defective gene and the disease-causing mutations. This effort involved sifting through a large number of variants and evaluating their pathogenicity, particularly for synonymous, missense and splice variants. Multiple amino acid pathogenicity prediction and splice prediction programs were employed.

Method

The NGS data were processed and aligned to the reference genome, and the variants were displayed using VarSifter (http://research.nhgri.nih.gov/software/VarSifter/index.shtml). This tool was also used to filter, sort and sift through the sequence variation data. ANNOVAR (functional annotation of genetic variants from high-throughput sequencing data; http://www.openbioinformatics.org/annovar/annovar_filter.html), a versatile software tool, was used to functionally annotate missense variants with results from various pathogenicity prediction programs such as SIFT, PolyPhen2 HDIV, PolyPhen2 HVAR, LRT, MutationTaster, MutationAssessor, FATHMM, GERP++, PhyloP and SiPhy. CADD (Combined Annotation Dependent Depletion; http://cadd.gs.washington.edu/) was also used to score the deleteriousness of single nucleotide variants as well as insertion/deletion variants in the human genome. Splice prediction tools such as Human Splice Finder (http://www.umd.be/HSF3/HSF.html) were used to predict the transcriptional impact of mutations at known splice sites and the creation of novel splice sites.

Results

It was necessary to sift through the large number of variants identified from the NGS data of a patient's DNA and find the two variants in an FA gene causing the disease, as this is a recessive genetic disorder, with the exception of the X-linked FANCB. Variants at a frequency >0.05 in the 1000 Genomes and ClinSeq data were considered nonpathogenic common variants and were eliminated. Cross-checking with the known mutation database, the Fanconi Anemia Mutation database (http://www.rockefeller.edu/fanconi/), as well as a list of variants proven to be pathogenic in the lab, helped identify some of the pathogenic variants. To recognize the remaining pathogenic variants, we needed to evaluate many missense, intronic and synonymous variants. Results from ANNOVAR and CADD analysis were evaluated for consistency in prediction across multiple programs. The scores obtained from splice prediction programs were interpreted by comparing the mutant score with the reference score, with a higher score implying a higher probability of the variant affecting an existing splice site or creating a novel splice site. In silico methods significantly reduce the number of potential candidate variants to be evaluated by functional assays. Often RNA samples are unavailable for evaluation of splicing aberrations caused by a synonymous variant, or by an intronic variant located far from the splice junction. In such cases, splice prediction programs are helpful, despite the difficulty in interpreting the results.
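
The first filtering steps described above (dropping common variants, cross-checking against known pathogenic variants, and looking for agreement among prediction programs) could be sketched roughly as follows. This is an illustration only, not the lab's actual pipeline: the Variant record, its field names, the helper functions and the example entries are all hypothetical. Only the 0.05 frequency cutoff comes from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# Frequency cutoff taken from the text: variants above it are treated as common.
MAX_POPULATION_FREQUENCY = 0.05


@dataclass
class Variant:
    """Hypothetical record for one annotated variant (not a VarSifter/ANNOVAR format)."""
    gene: str
    change: str                     # placeholder label for the sequence change
    population_frequency: float     # highest frequency seen in the population data
    # Program name -> True if that program predicts the variant to be damaging.
    predictions: Dict[str, bool] = field(default_factory=dict)


def is_rare(variant: Variant, cutoff: float = MAX_POPULATION_FREQUENCY) -> bool:
    """Keep a variant only if its population frequency does not exceed the cutoff."""
    return variant.population_frequency <= cutoff


def damaging_fraction(variant: Variant) -> Optional[float]:
    """Fraction of programs calling the variant damaging; None if no predictions."""
    calls = list(variant.predictions.values())
    if not calls:
        return None
    return sum(calls) / len(calls)


def prioritize(
    variants: List[Variant],
    known_pathogenic: Set[Tuple[str, str]],
) -> Tuple[List[Variant], List[Variant]]:
    """Split rare variants into known pathogenic hits and ranked candidates."""
    rare = [v for v in variants if is_rare(v)]
    known_hits = [v for v in rare if (v.gene, v.change) in known_pathogenic]
    candidates = [v for v in rare if (v.gene, v.change) not in known_pathogenic]

    # Rank candidates so that those with the most consistent damaging
    # predictions come first; variants without predictions go last.
    def sort_key(v: Variant) -> float:
        fraction = damaging_fraction(v)
        return -1.0 if fraction is None else fraction

    candidates.sort(key=sort_key, reverse=True)
    return known_hits, candidates


# Made-up example data, for illustration only.
example_variants = [
    Variant("FANCA", "variant_1", 0.20, {"SIFT": False, "PolyPhen2 HDIV": False}),
    Variant("FANCA", "variant_2", 0.001,
            {"SIFT": True, "PolyPhen2 HDIV": True, "MutationTaster": False}),
    Variant("FANCA", "variant_3", 0.0),
    Variant("FANCC", "variant_4", 0.01, {"SIFT": False, "PolyPhen2 HDIV": True}),
]
example_known = {("FANCA", "variant_3")}

known_hits, candidates = prioritize(example_variants, example_known)
for v in candidates:
    print(v.gene, v.change, damaging_fraction(v))
```

A ranking like this only narrows the list. As noted above, the remaining candidates still need to be evaluated by functional assays, and splice predictions in particular can be difficult to interpret.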