Bioinformatic software development for analysis of admixed populations

Guangchuan Ji (Mentor: Dr. Simina Boca, Innovation Center for Biomedical Informatics, Department of Oncology and Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University)

August 28, 2018, 2:00pm, Room 1300, Harris Building

An admixed population arises from the amalgamation of multiple ancestral populations. In addition to revealing aspects of population history, admixed populations can be used to find the genetic loci that contribute to diseases or other phenotypes. We consider expected heterozygosity and the fixation index, two quantities commonly used in population genetics, in the context of admixture. The expected heterozygosity is the probability that two alleles drawn at random from a population differ from each other, and is thus a measure of genetic diversity. The fixation index, Fst, quantifies genetic divergence between populations. It can be defined as Fst = (Ht - Hs)/Ht, where Ht is the expected heterozygosity of the overall population and Hs is the mean expected heterozygosity across subpopulations. A previous study showed that the Fst between an admixed population and one of its founding populations is maximized when the admixed population consists entirely of one of the other founding populations. These results provide a basis for interpreting Fst in admixed populations. For the case of only two parental populations, the results show that Fst is monotonic and convex as a function of the admixture fraction, so Fst is informative about the admixture fraction, and vice versa. It was also shown that the heterozygosity of an admixed population can be higher than the heterozygosity of its ancestral populations; thus, an admixed population can be more genetically diverse than its founder populations.
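
As a rough illustration of these quantities, the sketch below computes expected heterozygosity and Fst at a single locus for two hypothetical source populations and their admixture. It is written in Python for illustration only and is not code from the R package described here. The function names, the allele frequencies, the equal weighting of the two populations when pooling, and the convention of returning 0 when Ht is 0 are all assumptions made for this example; Fst is taken in its standard form, (Ht - Hs)/Ht.

```python
def heterozygosity(freqs):
    """Expected heterozygosity: probability that two randomly drawn alleles differ."""
    return 1.0 - sum(p * p for p in freqs)


def admix(p1, p2, gamma):
    """Allele frequencies of a population drawing a fraction gamma from
    source 1 and (1 - gamma) from source 2 (assumed linear mixing)."""
    return [gamma * a + (1.0 - gamma) * b for a, b in zip(p1, p2)]


def fst(pa, pb):
    """Fst = (Ht - Hs) / Ht for two populations pooled with equal weight.

    Assumption: returns 0.0 when Ht is 0, i.e. both populations are fixed
    for the same allele and there is no variation to partition.
    """
    pooled = [(a + b) / 2.0 for a, b in zip(pa, pb)]
    ht = heterozygosity(pooled)
    hs = (heterozygosity(pa) + heterozygosity(pb)) / 2.0
    return 0.0 if ht == 0 else (ht - hs) / ht


if __name__ == "__main__":
    # Hypothetical allele frequencies at one biallelic locus.
    source1 = [0.9, 0.1]
    source2 = [0.2, 0.8]
    for gamma in (0.0, 0.25, 0.5, 0.75, 1.0):
        admixed = admix(source1, source2, gamma)
        print(gamma, heterozygosity(admixed), fst(admixed, source1))
```

Running the script prints, for each admixture fraction, the heterozygosity of the admixed population and its Fst with the first source population, which makes it easy to see how both quantities change as the fraction moves from 0 to 1.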

In this project, we developed an R package for analyzing the Fst between an admixed population and one of its founding populations, and for analyzing the heterozygosity of an admixed population. The package includes documentation and examples. In particular, it includes an HTML vignette, built using R Markdown, with reproducible examples and several types of plots. For a more user-friendly visualization approach, we also developed a web application using the Shiny framework, which exposes some of the functions in the R package.

Tagged: Summer 2018