Increasing the diagnostic yield of an autoinflammatory disease cohort by copy number variant analysis

Brynja Matthíasardóttir

Mentor: Dr. Ivona Aksentijevich, National Human Genome Research Institute, National Institutes of Health

Date/Time: August 27, 2019 at 2pm

Location: Room 1300, Harris Building

Background: Although whole exome sequencing (WES) is increasingly used in clinical settings to identify causal variants, its success rate in identifying causal genes in monogenic diseases remains unsatisfactory. Copy number variations (CNVs), defined as DNA segments present at a variable copy number relative to a reference genome, have been linked to dozens of human diseases. Multiple software packages can call CNVs from WES data; however, their specificity is quite low because of the large number of false-positive calls. To address this, the Inflammatory Disease Section (IDS) at the NHGRI recently performed both WES and SNP chip microarray analysis on genomic DNA from 452 patients with hereditary autoinflammatory diseases and their unaffected family members. The aim of this project was to increase the diagnostic yield of the IDS cohort by developing and fine-tuning a bioinformatics pipeline that combines CNV data from the SNP chip microarray with conventional WES data.

Methods: DNA microarray analysis was performed using the Infinium Global Screening Array-24 v1.0 BeadChip, with 640,000 markers. CNVs were called with PennCNV from GenomeStudio output. Data from public databases such as DGV and DECIPHER were used to filter out common CNVs. The WES data were processed with a BWA-GATK pipeline to call single nucleotide variants (SNVs) and short indels, and variants were annotated with VEP. The outputs of the two pipelines were combined using in-house scripts to identify candidate variants.
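The filtering and combining steps might look roughly like the sketch below. It is not the in-house script itself, which the abstract does not describe. The 50% reciprocal-overlap cutoff, the `Interval` type, the function names, and the gene names in the usage note are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # inclusive


def overlap_bp(a, b):
    """Number of base pairs shared by two intervals (0 if none)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions, one per interval."""
    shared = overlap_bp(a, b)
    len_a = a.end - a.start + 1
    len_b = b.end - b.start + 1
    return min(shared / len_a, shared / len_b)


def drop_common(calls, common, min_reciprocal=0.5):
    """Keep calls that do not match any common CNV.

    min_reciprocal is an assumed cutoff, not a value from the abstract.
    """
    return [
        call for call in calls
        if all(reciprocal_overlap(call, known) < min_reciprocal
               for known in common)
    ]


def combine(rare_cnvs, gene_coords, genes_with_snv):
    """List (gene, has_cnv, has_snv) for genes hit by either data type.

    gene_coords maps gene name -> Interval; genes_with_snv is the set of
    genes carrying a candidate SNV/indel from the WES pipeline.
    """
    rows = []
    for gene in sorted(gene_coords):
        has_cnv = any(overlap_bp(gene_coords[gene], cnv) > 0
                      for cnv in rare_cnvs)
        has_snv = gene in genes_with_snv
        if has_cnv or has_snv:
            rows.append((gene, has_cnv, has_snv))
    return rows
```

For example, with made-up coordinates, `drop_common([Interval("chr1", 100, 199), Interval("chr22", 1000, 1999)], [Interval("chr1", 90, 210)])` keeps only the chr22 call, and `combine` then flags any gene that overlaps it alongside genes with WES variants. A gene flagged by both data types could point to a deletion on one allele and a sequence variant on the other.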

Results: To confirm the identified CNVs and validate the methodology, positive control data were analyzed, including a mosaic trisomy 8 and a single-exon deletion in the ADA2 gene. The pipeline detected the trisomy of chromosome 8, with 60% mosaicism in peripheral blood. However, it failed to detect the single-exon deletion in ADA2, which was attributed to the limited number of markers located in the deleted region. To increase the sensitivity of the analysis, the pipeline was further modified to detect CNVs supported by only a small number of markers.
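One way to picture the sensitivity adjustment is as a minimum-marker threshold applied to the CNV calls. The sketch below is an assumption and not the pipeline's actual code. It assumes calls arrive in the text layout PennCNV typically writes (`chr:start-end numsnp=N length=L stateS,cn=C ...`). The threshold values and function names are illustrative only.

```python
import re

# Assumed layout of one PennCNV call line, for example:
# chr3:37957465-37961253  numsnp=3  length=3,789  state2,cn=1 sample.txt ...
CALL_PATTERN = re.compile(
    r"^(?P<chrom>chr\w+):(?P<start>\d+)-(?P<end>\d+)\s+"
    r"numsnp=(?P<numsnp>\d+)\s+"
    r"length=(?P<length>[\d,]+)\s+"
    r"state\d+,cn=(?P<cn>\d+)"
)


def parse_call(line):
    """Parse one call line into a dict, or return None if it does not match."""
    match = CALL_PATTERN.match(line.strip())
    if match is None:
        return None
    return {
        "chrom": match["chrom"],
        "start": int(match["start"]),
        "end": int(match["end"]),
        "numsnp": int(match["numsnp"]),
        "length": int(match["length"].replace(",", "")),
        "cn": int(match["cn"]),
    }


def keep_calls(lines, min_snps):
    """Keep parsed calls supported by at least min_snps markers."""
    parsed = (parse_call(line) for line in lines)
    return [call for call in parsed
            if call is not None and call["numsnp"] >= min_snps]
```

With a strict threshold such as `min_snps=10`, a three-marker deletion is discarded; lowering it to `min_snps=3` retains the call. The trade-off is that calls backed by few markers are more likely to be false positives, so a relaxed threshold would usually be paired with confirmation by another method.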

Conclusion: A workflow for analyzing SNP chip data has been established to improve the diagnostic yield of the IDS cohort.