Gene set enrichment with correlation of gene expression and its translated protein abundance in colorectal cancer study

Bioinformatics Internship Presentation

Xu Liu (Mentor: Dr. Nathan Edwards, Department of Biochemistry and Molecular & Cellular Biology, Georgetown University)

September 1st, 2015, 3:00-3:20pm, Room 1300, Harris Building

Colorectal cancer is the development of cancer in the colon or rectum.[i] It is due to the abnormal growth of cells that have the ability to invade or spread to other parts of the body.[ii] Between 75% and 95% of colon cancer cases occur in people with little or no genetic risk.[iii][iv] Other risk factors include older age, male gender,[iv] high intake of fat, alcohol, or red meat, obesity, smoking, and a lack of physical exercise.[iii] Approximately 10% of cases are linked to insufficient physical activity.[v]

The Cancer Genome Atlas (TCGA) has published sets of genomic features of human colorectal cancer, and the Clinical Proteomic Tumor Analysis Consortium (CPTAC) is analyzing the proteomes of the TCGA tumor specimens. For this internship, I selected RNAseq v2 gene expression data from TCGA and mass spectrometry protein abundance data from CPTAC for the same patients. The gene expression data, normalized by the RSEM algorithm, were combined into a single set and then processed together with the normalized protein abundance data. Fewer than one third of the gene-protein pairs show a significant self-correlation, that is, a significant correlation between a gene's expression and the abundance of its own translated protein. Genes were classified by self-correlation, and the resulting classes were compared with the output of other functional enrichment tools to uncover hidden relationships. The resulting statistics can contribute to future studies, e.g. finding transcription factors.
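The self-correlation step can be illustrated with a minimal sketch. The abstract does not say which correlation coefficient or significance test was used, so the Spearman rank correlation, the permutation test, the 0.05 cutoff, and every name below (`self_correlation`, `GENE_A`, `GENE_B`, patients `P1` to `P8`) are assumptions made for illustration, and the numbers are toy values, not TCGA or CPTAC data.

```python
import random


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the Spearman correlation."""
    rng = random.Random(seed)
    rx, ry = ranks(x), ranks(y)
    observed = abs(pearson(rx, ry))
    shuffled = ry[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(rx, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def self_correlation(mrna, protein, alpha=0.05):
    """Map gene -> (rho, p_value, is_significant) over matched patients.

    mrna and protein are dicts of the form {gene: {patient_id: value}}.
    """
    results = {}
    for gene in sorted(mrna.keys() & protein.keys()):
        patients = sorted(mrna[gene].keys() & protein[gene].keys())
        if len(patients) < 3:  # too few matched samples to correlate
            continue
        x = [mrna[gene][p] for p in patients]
        y = [protein[gene][p] for p in patients]
        p_value = permutation_pvalue(x, y)
        results[gene] = (spearman(x, y), p_value, p_value < alpha)
    return results


# Toy data (hypothetical): normalized mRNA and protein values per patient.
patients = ["P%d" % i for i in range(1, 9)]
mrna = {
    "GENE_A": dict(zip(patients, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])),
    "GENE_B": dict(zip(patients, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])),
}
protein = {
    "GENE_A": dict(zip(patients, [0.1, 0.3, 0.4, 0.9, 1.2, 1.5, 2.0, 2.6])),
    "GENE_B": dict(zip(patients, [4.0, 7.0, 1.0, 8.0, 2.0, 6.0, 3.0, 5.0])),
}

for gene, (rho, p_value, significant) in self_correlation(mrna, protein).items():
    print(gene, round(rho, 3), round(p_value, 4), significant)
```

In this toy input, GENE_A's protein abundance rises monotonically with its mRNA level, while GENE_B's does not. With thousands of gene-protein pairs, the per-gene p-values would normally also be corrected for multiple testing, which this sketch leaves out.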

[i] "Colon Cancer Treatment (PDQ®)". NCI. 2014-05-12. Retrieved 29 June 2014.

[ii] "Defining Cancer". National Cancer Institute. Retrieved 10 June 2014.

[iii] Watson AJ, Collins PD (2011). "Colon cancer: a civilization disorder". Digestive Diseases 29 (2): 222–8. doi:10.1159/000323926. PMID 21734388.

[iv] Cunningham D, Atkin W, Lenz HJ, Lynch HT, Minsky B, Nordlinger B, Starling N (2010). "Colorectal cancer". Lancet 375 (9719): 1030–47. doi:10.1016/S0140-6736(10)60353-4. PMID 20304247.

[v] Lee IM, Shiroma EJ, Lobelo F, Puska P, Blair SN, Katzmarzyk PT (2012). "Effect of physical inactivity on major non-communicable diseases worldwide: an analysis of burden of disease and life expectancy". Lancet 380 (9838): 219–29. doi:10.1016/S0140-6736(12)61031-9. PMC 3645500. PMID 22818936.