Internship Presentations

Data-Mining of Oncogenomic Datasets

Anna Abaimova (Mentor: Dr. Bassem Haddad, Department of Oncology, Georgetown University)

August 29, 2017, 2:30pm, Room 341, Basic Science

Microarray experiments generate a tremendous amount of data. New
bioinformatics tools that facilitate the utilization of such data are necessary. Several
bioinformatics platforms have been developed to allow evaluation of oncogenomics
data from microarray studies. Differential expression patterns of genes suggest their
potential oncogenic role in cancer and may lead to reliable hypotheses for cancer
research. Oncomine is a powerful web-based platform that provides tools for
bioinformaticians and scientists to explore gene expression across a wide variety of
cancer types and a wide range of cancer-related cell lines. It allows users to query
and visualize a gene of interest across multiple datasets. Currently, Oncomine
contains data from 715 studies collected from published cancer genomic datasets.
One of the datasets available on Oncomine is The Cancer Genome Atlas (TCGA)
dataset. TCGA is a joint effort of the National Cancer Institute (NCI) and the National
Human Genome Research Institute (NHGRI) to provide researchers with the entire
spectrum of genomic changes involved in human cancer. TCGA data includes clinical
information, genomic characterization data, and high-level sequence analysis of the
tumor genomes. This extraordinary dataset will be used in our studies along with
other datasets available at Oncomine.

To illustrate the power of data-mining of oncogenomic datasets using
Oncomine, we evaluated the correlation of different clinical variables with the
expression of STAT5 in prostate cancer and of calcium channel genes in breast
cancer.

Studies have shown a potential role of calcium in the development and
progression of breast cancer. Calcium channels play a critical role in a wide variety
of biological processes. Many studies in cancer have identified alterations in the
expression of proteins involved in the movement of calcium across the plasma
membrane and subcellular organelles. However, the overall impact of calcium
channels in breast cancer remains controversial. Using Oncomine, we found that in
breast cancer, upregulation of CACNA1C, CACNA1D, CACNA1B, CACNA1G, and CACNA1I
is correlated with clinical characteristics such as Estrogen Receptor (ER) expression,
Progesterone Receptor (PR) expression, recurrence, and metastatic events. The
observed overexpression of these genes could make them candidate targets for
cancer treatment. However, further detailed investigation of the mechanism by which
calcium channels contribute to cancer onset and progression is needed.
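
As a rough illustration of the kind of comparison described above, the following sketch computes Welch's t statistic for the expression of one gene in two clinical groups (for example, ER-positive versus ER-negative tumors). It is not the Oncomine implementation, and the function name `welch_t` and all expression values are hypothetical, invented only to make the example runnable.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    va, vb = variance(group_a), variance(group_b)  # sample variances
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (mean(group_a) - mean(group_b)) / sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical log2 expression values for one gene; NOT real Oncomine data.
er_positive = [2.1, 2.4, 1.9, 2.8, 2.5]
er_negative = [1.2, 1.5, 1.1, 1.7, 1.4]

t_stat, dof = welch_t(er_positive, er_negative)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A positive t statistic here would indicate higher mean expression in the first group; converting it to a p-value requires a t distribution, which is left out to keep the sketch within the standard library.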

Recent studies have established that the transcription factor STAT5a/b is a
critical protein for the survival of human prostate cancer cells. STAT5 is
constitutively active in human prostate cancer, but not in normal prostate
epithelium. According to a study published by Haddad et al. (Am J Pathol, 2013), the
STAT5A/B gene locus undergoes amplification during human prostate cancer
progression. Expression of STAT5 is increased in high-grade prostate cancer,
castrate-resistant prostate cancer, and distant metastases, and it predicts early
disease recurrence and prostate cancer-specific death. When we used Oncomine to
evaluate data from prostate cancer studies, amplification of STAT5 showed a
tendency to correlate with recurrence, but without reaching statistical
significance. This may be due to the small sample size available through Oncomine.

Oncomine is an excellent web-based bioinformatics platform to analyze gene
expression information and has many useful functions. This platform facilitates the
evaluation of clinical information available from an extensive number of datasets
including the TCGA data. It is a very effective tool that enables the identification of
candidate genes associated with tumor development and clinical outcome and
facilitates discoveries using publicly available oncogenomic datasets.

Summer 2017