Metatranscriptomic annotation of oral microbial population from tissues with root caries

Bioinformatics Internship Presentation

Luis G. Rivera-García (Mentor: Dr. Matthew D. McCoy, Innovation Center for Biomedical Informatics, Department of Oncology, Georgetown University)

August 29, 2017, 2:30pm, Room 341, Basic Science

Twenty years ago, human microbiomes were considered beneficial to health because of their role in metabolic regulation and protection against pathogenic microbes (Wroblewski et al. 2016; Wang et al. 2015). However, recent metagenomic studies have implicated several microbiomes in the human body in disease development (Wroblewski et al. 2016). Although metagenomic approaches are making progress in elucidating microbiome composition, they cannot provide functional and metabolic information based on gene expression levels. In 2016, Westreich and collaborators developed a metatranscriptomic data annotation tool (SAMSA) to address this gap between microbiome structure and functional annotation. We used SAMSA to gain insight into existing RNAseq data generated by Dame-Teixeira and collaborators in 2016. To supplement SAMSA, we developed a new methodology for microbiome functional annotation, broken down into six steps: preprocessing, de novo transcript assembly, open reading frame extraction, protein domain identification, selection of highly differentially expressed domains and domain sets, and domain annotation using Gene Ontology terms. We observed differences in domain expression between healthy and diseased samples. Among the highly differentially expressed domain sets, all but one were upregulated in healthy samples. The functional annotation was enriched for domains related to bacterial flagellum-dependent cell motility, transmembrane transport, regulation of transport, regulation of carbohydrate metabolic process, and mRNA catabolic processes. These results demonstrate that functional data can be obtained by detecting Pfam domains in metatranscriptomic RNAseq data. Future work should include protein identification for a better understanding of functional roles in microbial populations.