Generation of a computational pipeline to assess the phase separation potential of oncogenic fusion proteins
Teya Dragovic
Mentors: Dr. Sreejith Nair, Department of Oncology – Georgetown University Medical Center; Dr. Matthew McCoy, Innovation Center for Biomedical Informatics (ICBI) – Georgetown University Medical Center
Date/Time: August 23rd, 2022 at 1:20pm.
Abstract: Studies over the past decade have shown that several cellular processes happen in distinct subcellular compartments that assemble spontaneously through a physical process known as liquid-liquid phase separation (LLPS). Although this remains a highly debated topic in biology, it is accepted that several proteins involved in gene regulation and other biological processes assemble as dynamic, liquid-like puncta in cells. The physical properties of these macromolecular assemblies correlate with their regulatory potential. In this project I examine the role of LLPS in cancer.
One class of genetic dysregulation that leads to cancer is gene fusion, which results from structural rearrangements of chromosomes. Such rearrangements can lead to deregulation by bringing pro-proliferative genes under the control of an ectopic non-coding regulatory element (e.g., a promoter or enhancer), resulting in elevated levels of proliferative gene products. Chromosomal rearrangements can also result in the in-frame fusion of two separate proteins, producing novel proteins with neomorphic properties. The aim of my project was to investigate the role of LLPS in the oncogenic activity of such fusion proteins. The first part of my project consisted of assessing available curated datasets to identify common gene fusion events that have been implicated across different types of cancer. Next, using Python scripting, I created a computational pipeline to characterize the physical properties that are predictive of a protein's phase separation behavior. A research group has recently developed an algorithm that can predict the phase separation potential of a protein. Using the extent of pi-pi interaction frequency of non-aromatic amino acids in a protein, this algorithm generates a score that correlates with the probability of protein phase separation. I built an analysis pipeline that calculates the phase separation score of each fusion partner and of the fused protein, then compares these scores to determine whether the fusion event has significantly altered the phase separation score.
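The comparison step described above could be sketched roughly as follows. This is an illustration only, not the pipeline used in the project: the names `build_fusion`, `score_fusion`, and `FusionScores` are hypothetical, the breakpoints are assumed to be given as 0-based residue positions, and `toy_score` is a deliberately crude placeholder that stands in for the published predictor (it is not that algorithm and its residue set is arbitrary). In practice the real predictor would be passed in as the `scorer` argument.

```python
from dataclasses import dataclass
from typing import Callable

# ASSUMPTION: arbitrary residue set used only so the sketch runs end to end.
# It does NOT reproduce the pi-pi contact based predictor from the abstract.
PLACEHOLDER_RESIDUES = set("FWYHRQNGSP")


def toy_score(seq: str) -> float:
    """Placeholder scorer: fraction of residues in PLACEHOLDER_RESIDUES."""
    if not seq:
        return 0.0
    return sum(res in PLACEHOLDER_RESIDUES for res in seq.upper()) / len(seq)


@dataclass
class FusionScores:
    five_prime: float   # score of the full-length 5' partner
    three_prime: float  # score of the full-length 3' partner
    fusion: float       # score of the fused protein

    @property
    def delta_vs_best_partner(self) -> float:
        """Change in score relative to the higher-scoring parent protein."""
        return self.fusion - max(self.five_prime, self.three_prime)


def build_fusion(five_seq: str, three_seq: str,
                 five_end: int, three_start: int) -> str:
    """Join residues [0, five_end) of the 5' partner with
    residues [three_start, end) of the 3' partner (0-based positions)."""
    return five_seq[:five_end] + three_seq[three_start:]


def score_fusion(five_seq: str, three_seq: str,
                 five_end: int, three_start: int,
                 scorer: Callable[[str], float] = toy_score) -> FusionScores:
    """Score both partners and the fused protein with the same scorer."""
    fused = build_fusion(five_seq, three_seq, five_end, three_start)
    return FusionScores(scorer(five_seq), scorer(three_seq), scorer(fused))
```

Keeping the scorer as a parameter means the same comparison logic can be reused unchanged once a real predictor is wrapped as a function from sequence to score.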
Using these predicted phase separation scores, I further categorized the gene fusion partners according to molecular function, focusing specifically on nuclear proteins such as transcription factors and co-factors. With this workflow I established a system for creating subsets of interesting candidates based on the previously established parameters. In the future I hope this can serve as a tool to identify which factors or patterns, such as amino acid composition, unique motifs, or post-translational modifications, may be responsible for such changes in phase separation.
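A minimal sketch of the subsetting step might look like the following. Everything here is assumed for illustration: the `Candidate` fields, the function labels, and the `min_abs_delta` threshold are hypothetical and are not the parameters used in the project.

```python
from dataclasses import dataclass
from typing import Iterable, List

# ASSUMPTION: illustrative labels for the molecular function annotation.
NUCLEAR_FUNCTIONS = frozenset({"transcription factor", "co-factor"})


@dataclass
class Candidate:
    fusion_name: str         # hypothetical identifier, e.g. "GENEA-GENEB"
    molecular_function: str  # annotation of the fusion partner
    partner_score: float     # predicted score of the unfused partner
    fusion_score: float      # predicted score of the fused protein

    @property
    def delta(self) -> float:
        """Score change introduced by the fusion event."""
        return self.fusion_score - self.partner_score


def select_candidates(candidates: Iterable[Candidate],
                      functions: frozenset = NUCLEAR_FUNCTIONS,
                      min_abs_delta: float = 1.0) -> List[Candidate]:
    """Keep candidates with a matching molecular function whose score
    changed by at least min_abs_delta, largest change first."""
    kept = [
        c for c in candidates
        if c.molecular_function.lower() in functions
        and abs(c.delta) >= min_abs_delta
    ]
    return sorted(kept, key=lambda c: abs(c.delta), reverse=True)
```

Sorting by the absolute change puts both gains and losses of predicted phase separation potential at the top of the list, which suits a first-pass screen where either direction may be of interest.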