Variant Analysis of LY6 Genes in Breast and Ovarian Cancer

Bioinformatics Internship Presentation

Midrar Al Hossiny (Mentors: Dr. Geeta Upadhyay, Lombardi Comprehensive Cancer Center and Innovation Center for Biomedical Informatics, Department of Oncology, Georgetown University; and Dr. Yuriy Gusev, Innovation Center for Biomedical Informatics, Department of Oncology, Georgetown University)

May 13th, 2016, 10:00pm, Room 1300, Harris Building

Sca-1 is a well-characterized molecule in cancer stem cells that regulates TGF-β and Wnt signaling and is important in cancer progression and metastasis in mouse models. We focused on the LY6 gene family because its members are the human homologues of mouse Sca-1.

This project used breast and ovarian cancer RNAseq data to analyze mutations and expression at the transcript-isoform level. The aim was to discover coding and non-coding mutations in the LY6 gene family that could affect the function and/or regulation of LY6 genes in human cancers, with possible relevance to drug resistance or disease recurrence. Variant analysis and annotation were performed using the RNAseq pipelines and workflows developed at the Innovation Center for Biomedical Informatics (ICBI at GUMC), as well as open-source tools and publicly available resources such as TCGA.

The major steps of this project included identifying RNAseq samples in the TCGA online repository, downloading BAM files, running a variant-calling algorithm to generate VCF files, annotating variants with a focus on the LY6 gene family and its downstream target genes, mapping variants onto known human pathways, and conducting systems biology analysis.
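The step of focusing variant annotation on a gene family can be illustrated with a minimal sketch. It is not the ICBI pipeline; it only assumes that variant calls are available as tab-separated VCF text and that the caller supplies genomic intervals for the genes of interest. The function name `variants_in_regions`, the gene label `GENE_X`, the chromosome names, and all coordinates below are hypothetical placeholders, not real LY6 coordinates.

```python
from typing import Dict, Iterable, Iterator, Tuple

# (chromosome, start, end), 1-based and inclusive; supplied by the caller.
Region = Tuple[str, int, int]
# (gene label, chromosome, position, reference allele, alternate allele)
Hit = Tuple[str, str, int, str, str]


def variants_in_regions(vcf_lines: Iterable[str],
                        regions: Dict[str, Region]) -> Iterator[Hit]:
    """Yield VCF records whose POS falls inside any of the given regions."""
    for line in vcf_lines:
        # Skip VCF meta-information, the column header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        for gene, (r_chrom, start, end) in regions.items():
            if chrom == r_chrom and start <= pos <= end:
                yield (gene, chrom, pos, ref, alt)


# Made-up records and made-up coordinates, for illustration only.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chrA\t150\t.\tG\tA\t50\tPASS\t.",
    "chrA\t900\t.\tC\tT\t50\tPASS\t.",
    "chrB\t150\t.\tT\tC\t50\tPASS\t.",
]
example_regions = {"GENE_X": ("chrA", 100, 200)}

hits = list(variants_in_regions(example_vcf, example_regions))
for hit in hits:
    print(hit)
```

In practice the intervals would come from a gene annotation file, and a dedicated annotation tool would assign variant effects; this sketch only shows the positional filter.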

We analyzed 21 ovarian samples and 29 breast samples from the TCGA collection. Although the number of samples was higher for breast cancer, the number of variants was much higher in the ovarian cancer samples: we found 3739 variants there, compared with only 418 variants in the breast cancer samples. Comparing the findings for the two cancer types, we found 55 variants common to both. Several types of variants were detected, including high-impact start and stop codon gains and losses, missense mutations in the coding region, and a large number of variants affecting the 3' and 5' untranslated regulatory regions of the genes. A short list of variants has been selected for future validation in the lab to test their possible effects on protein structure and function.
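The comparison between the two cancer types can be sketched as a set intersection. The abstract does not state how a variant was judged "common" to both cohorts; the sketch below assumes a match on chromosome, position, reference allele, and alternate allele. The function name `compare_cohorts` and all variant records are hypothetical and are not data from this study.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumed identity of a variant: (chromosome, position, ref allele, alt allele).
VariantKey = Tuple[str, int, str, str]


def compare_cohorts(cohort_a: Iterable[VariantKey],
                    cohort_b: Iterable[VariantKey]) -> Dict[str, Set[VariantKey]]:
    """Split the distinct variants of two cohorts into shared and cohort-only sets."""
    set_a, set_b = set(cohort_a), set(cohort_b)
    return {
        "shared": set_a & set_b,   # present in both cohorts
        "only_a": set_a - set_b,   # present only in the first cohort
        "only_b": set_b - set_a,   # present only in the second cohort
    }


# Made-up variant calls, for illustration only.
ovarian_example = [
    ("chrA", 150, "G", "A"),
    ("chrA", 175, "C", "T"),
    ("chrA", 150, "G", "A"),   # same variant seen in a second sample
]
breast_example = [
    ("chrA", 150, "G", "A"),
    ("chrA", 175, "C", "G"),   # same position, different alternate allele
]

result = compare_cohorts(ovarian_example, breast_example)
for label, variants in result.items():
    print(label, len(variants))
```

Under this definition, two calls at the same position with different alternate alleles count as different variants; a position-only comparison would give a larger overlap.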