Rare-Variant Association Study of Inherited Retinal Diseases via Exome Sequencing

Bioinformatics Internship Presentation

Jiayao Wang (Mentor: Rinki Ratnapriya, Ph.D., NIH/NEI Neurobiology Neurodegeneration & Repair Laboratory)

August 30th, 2016, 2:00pm, Room 1300, Harris Building

Inherited retinal diseases (IRDs), which lead to partial or total blindness, affect approximately 1 in 3,000 people. Exceptional progress has been made in identifying the genetic causes of IRDs, with about 220 genes identified so far. However, it has been suggested that a single-gene mutation must reside in a permissive genetic background for a disease phenotype to manifest. This is often reflected in poor genotype-phenotype correlation among patients, and IRDs display a great deal of heterogeneity. For example, mutations in at least 14 genes have been reported to cause Leber congenital amaurosis and 9 genes have been linked to Usher syndrome, yet some patients carry no mutations in any of these genes. Segregating background genes can modify the age of onset, rate of progression or severity of disease. Such background genes, which interact with the disease mutation to produce the specific observed phenotype, are commonly called genetic modifiers. Identifying these modifier genes may elucidate the biological pathways that lead from the primary genetic defect to the aberrant phenotype. Once modifier genes that suppress vision or hearing loss are identified, the door opens to new therapeutic targets, since these modifiers may be more amenable to therapeutic intervention than the primary gene defect.

Rare-variant association tests (rare variants being those with a minor allele frequency (MAF) < 0.5%) have been widely used to evaluate the contribution of rare variants to the missing heritability of common diseases, for which most associations identified so far involve common variants. Whole-exome and whole-genome sequencing are increasingly used to explore the extent to which rare alleles explain the heritability of complex diseases and health-related traits. Our study borrows methods from rare-variant association studies to systematically search for modifier genes of disease phenotypes and to refine our understanding of the genetic architecture of IRDs.
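As a minimal illustration of the 0.5% MAF threshold, the sketch below computes in-sample MAF from a genotype vector coded as alternate-allele counts (0, 1, 2) and flags a variant as rare when its MAF falls below the cutoff. The function names and the genotype coding are assumptions for illustration only; in practice the rarity cutoff is often applied to frequencies taken from external population databases rather than computed within the study sample.

```python
from typing import List, Optional

RARE_MAF_THRESHOLD = 0.005  # 0.5%, the cutoff stated in the text


def minor_allele_frequency(genotypes: List[Optional[int]]) -> float:
    """MAF from diploid genotypes coded as alt-allele counts (0, 1, 2).

    None marks a missing call and is left out of the denominator.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    alt_freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of ref/alt is less frequent.
    return min(alt_freq, 1.0 - alt_freq)


def is_rare(genotypes: List[Optional[int]], threshold: float = RARE_MAF_THRESHOLD) -> bool:
    """True when the variant's MAF is below the rarity threshold."""
    return minor_allele_frequency(genotypes) < threshold
```

For example, a single heterozygote among 300 called samples gives MAF = 1/600, about 0.17%, which counts as rare, while two heterozygotes among 100 samples give 1%, which does not.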

At the current phase of the project, we have sequenced the exomes of about 500 patients and examined two super populations: EUR (108 patients and 345 controls of European ancestry) and AMR (48 patients and 251 controls of admixed American ancestry). We called variants in these individuals using the germline SNP and indel discovery pipeline for whole-exome sequence data recommended by the GATK Best Practices. Functional annotation was performed with ANNOVAR. We then filtered variants by multiple criteria, focusing mostly on exonic rare variants, and ran gene-based aggregation tests of multiple variants, including burden tests and variance component tests, on the selected variants. The test results reveal possible associations between phenotypes and genes. Under different filtering criteria we identified different top statistically significant genes, including potential causal genes as well as some false positives. Our tests also detected some genes already known to cause retinal diseases, demonstrating that our approach is able to uncover IRD-associated genes.
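The abstract does not say which burden tests were run. As one hedged example of the general idea, the sketch below implements a CAST-style collapsing test: each sample is coded as a carrier if it has at least one qualifying rare allele in the gene, and carrier counts in cases versus controls are compared with a two-sided Fisher's exact test. The function names and input layout (sample ID mapped to per-variant alternate-allele counts) are hypothetical; the sketch does no covariate or ancestry adjustment, and variance component tests such as SKAT are not shown.

```python
from math import comb
from typing import Dict, List, Tuple

Genotypes = Dict[str, List[int]]  # sample ID -> alt-allele counts at qualifying variants


def collapse_carriers(genotypes: Genotypes) -> Dict[str, int]:
    """CAST-style collapsing: 1 if the sample carries any qualifying alt allele, else 0."""
    return {sample: int(any(g > 0 for g in calls)) for sample, calls in genotypes.items()}


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum every table no more probable than the observed one; the small
    # relative tolerance keeps floating-point ties from being dropped.
    p = sum(q for q in map(prob, support) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def burden_test(cases: Genotypes, controls: Genotypes) -> Tuple[Tuple[int, int, int, int], float]:
    """Return the 2x2 carrier table (a, b, c, d) and its Fisher p-value."""
    case_carriers = collapse_carriers(cases)
    control_carriers = collapse_carriers(controls)
    a = sum(case_carriers.values())      # cases carrying a qualifying allele
    b = len(case_carriers) - a           # cases without one
    c = sum(control_carriers.values())   # controls carrying a qualifying allele
    d = len(control_carriers) - c        # controls without one
    return (a, b, c, d), fisher_exact_two_sided(a, b, c, d)
```

Running the test within each super population separately (EUR and AMR) keeps ancestry-specific allele frequencies from confounding the case/control comparison; across many genes, the resulting p-values would still need multiple-testing correction.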

The small sample size was the key limitation of this study. However, this exercise helped us develop methods and pipelines toward constructing a comprehensive map of modifier genes in retinal disease. We also developed a Spark script to filter large variant datasets (tens of gigabytes) according to multiple criteria, which helps researchers parse large datasets for useful information. Finally, we developed a pipeline to call variants from RNA-seq data.
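The Spark script itself is not shown here. As a plain-Python sketch of the kind of per-row, multi-criteria predicate such a job might apply (in PySpark the same conditions could be expressed with `DataFrame.filter`), the code below streams an ANNOVAR-style tab-delimited table and keeps exonic or splicing, non-synonymous, rare variants. The column names follow `table_annovar.pl` output with the refGene protocol, but the `ExAC_ALL` frequency column, the kept/dropped categories, and all function names are assumptions for illustration.

```python
import csv
from typing import Dict, Iterable, Iterator

# Hypothetical filter settings. "ExAC_ALL" assumes an ExAC frequency database
# was annotated; adjust to the databases actually used.
FUNC_COL = "Func.refGene"
EXONIC_FUNC_COL = "ExonicFunc.refGene"
FREQ_COL = "ExAC_ALL"
MAX_FREQ = 0.005  # rarity cutoff applied to the database alt-allele frequency
KEEP_FUNC = {"exonic", "splicing", "exonic;splicing"}
DROP_EXONIC_FUNC = {"synonymous SNV", "unknown"}


def population_freq(row: Dict[str, str]) -> float:
    value = row.get(FREQ_COL, ".")
    # ANNOVAR writes "." when a variant is absent from the database;
    # absent variants are treated as frequency 0 (novel, hence rare).
    return 0.0 if value in (".", "") else float(value)


def passes_filters(row: Dict[str, str]) -> bool:
    """True if the variant meets all region, consequence and frequency criteria."""
    return (
        row[FUNC_COL] in KEEP_FUNC
        and row.get(EXONIC_FUNC_COL, ".") not in DROP_EXONIC_FUNC
        and population_freq(row) < MAX_FREQ
    )


def filter_variants(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Stream tab-delimited rows so memory use stays flat for large files."""
    for row in csv.DictReader(lines, delimiter="\t"):
        if passes_filters(row):
            yield row
```

Because `filter_variants` accepts any iterable of lines, an open file handle can be passed directly, so a large table is processed one row at a time rather than loaded into memory.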