Targeted RNASeq Analysis of Fanconi Anemia Patients
Vibhish Raghuraman (Mentor: Dr. Settara Chandrasekharappa, National Institutes of Health)
August 28, 2018, 2:00pm, Room 1300, Harris Building
Fanconi anemia (FA) is a rare, autosomal recessive genetic disorder. It inevitably results in bone marrow failure and sharply increases the likelihood of developing acute myeloid leukemia or solid tumors, particularly head and neck cancer. The FA pathway is involved in DNA repair and is active during replication. Currently, mutations in twenty-two FA genes are known to cause this disease. In this pilot project, targeted RNAseq data were collected in conjunction with the NIH Intramural Sequencing Center (NISC), and we analyzed those data from nineteen cell lines, derived from fourteen patients, for six of these genes: FANCA, FANCC, FANCD2, FANCE, FANCI, and FANCL. These nineteen cell lines were of interest because each either carried a mutation predicted to affect splicing or had a “missing mutation”, meaning a mutation had been found in only one of the two alleles of that FA gene. The known mutation(s) in these genes had previously been discovered using various methods, including Sanger sequencing and targeted next-generation sequencing. We hoped that RNAseq analysis would reveal the specific splicing effect of the known splice mutations and provide insight into the unknown mutation in patients with only one known FA variant.
Once the RNA had been sequenced for these genes, the reads were aligned to the genome using STAR, and the resulting alignments were run through the GATK pipeline to call variants. I then used ANNOVAR to annotate the variants and identify possible pathogenicity. Any variant with an allele frequency greater than 0.02 in the ExAC database was treated as a common variant and therefore considered non-pathogenic. All known coding variants were detected in the RNAseq data except for one stop-gain variant, whose absence indicated loss of that allele in the RNA due to nonsense-mediated decay. However, we did not find any additional pathogenic variants in the patients with a “missing mutation”. The next step was to determine whether the previously discovered splice variants led to differential exon usage (DEU), using JunctionSeq and DEXSeq for statistical analysis and IGV for visual assessment. It was difficult to assess DEU effectively in these samples because we were working with a small subset of genes, whereas these statistical tools were designed to evaluate DEU on a whole-genome scale. IGV, however, did allow clear visual evaluation of two exon-skipping events: one associated with a splice variant in FANCI, and the other with a homozygous synonymous variant in FANCL. Both exon-skipping events had been predicted by earlier RT-PCR experiments. Further evaluation of the read depths will be needed to identify low-frequency changes, if any.
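The population-frequency filter described above can be sketched in a few lines of Python. This is a minimal illustration, not the script used in the project. It assumes the annotated variants are in a tab-delimited table with a header row and an allele-frequency column named `ExAC_ALL`, with `.` marking variants absent from ExAC; the column name, the helper names `parse_af` and `split_by_frequency`, and the decision to keep variants without an ExAC entry are all assumptions made for the example.

```python
import csv

COMMON_AF_CUTOFF = 0.02  # frequency threshold stated in the abstract


def parse_af(value):
    """Convert an allele-frequency field to a float, or None if it is missing."""
    value = (value or ".").strip()
    if value in ("", ".", "NA"):
        return None
    return float(value)


def split_by_frequency(handle, af_column="ExAC_ALL", cutoff=COMMON_AF_CUTOFF):
    """Split annotated variants into (rare_or_unseen, common) lists of rows.

    A variant counts as common only when its frequency is strictly greater
    than the cutoff; variants with no recorded frequency are kept as
    candidates (an assumption of this sketch).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    rare_or_unseen, common = [], []
    for row in reader:
        af = parse_af(row.get(af_column))
        if af is not None and af > cutoff:
            common.append(row)
        else:
            rare_or_unseen.append(row)
    return rare_or_unseen, common
```

Passing an open file handle for the annotation table returns the candidate variants first and the variants discarded as common second, so the discarded set can still be reviewed by hand.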
Since this was a pilot project using the targeted RNAseq capture method, a secondary goal was to evaluate the capture method itself, to understand how to improve it and whether it could be used in further studies. To that end, the alignment statistics were studied, and the read counts were compared across all of the samples for every gene. Based on this comparison, we noticed that samples from fibroblast cell lines had fewer reads than samples derived from lymphoblastoid cell lines (LCLs), which may be related to the observation that fibroblasts divide more slowly than LCLs. Because the number of exons and the lengths of those exons vary across these six genes, the capture and library preparation methods may generate reads of variable length, which could result in variable read depth across exons and across genes. Steps to improve the capture method, specifically probe design, and the preparation methods will be taken in the future to make this a more viable method of data collection and analysis.
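One way to make the per-gene read-count comparison concrete is to group samples by cell type and summarize each gene separately. The sketch below is illustrative only. It assumes read counts have already been tabulated per sample and per gene, and the function name `median_reads_by_cell_type` and the input layout are hypothetical rather than taken from the project.

```python
from collections import defaultdict
from statistics import median


def median_reads_by_cell_type(read_counts, cell_types):
    """Summarize read counts per gene, grouped by cell type.

    read_counts: {sample_id: {gene: read_count}}
    cell_types:  {sample_id: cell-type label, e.g. "fibroblast" or "LCL"}
    Returns {gene: {cell_type: median read count}}.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for sample_id, per_gene in read_counts.items():
        cell_type = cell_types[sample_id]
        for gene, count in per_gene.items():
            grouped[gene][cell_type].append(count)
    return {
        gene: {cell_type: median(counts) for cell_type, counts in by_type.items()}
        for gene, by_type in grouped.items()
    }
```

The median is used here rather than the mean so that a single unusually deep or shallow library does not dominate the comparison; that choice is an assumption of the sketch, not a statement about how the original comparison was done.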