Internship Presentations

Molecular profiling of breast cancer recurrence: a meta-analysis

Shruti Gautam

Mentors: Dr. Robert Clarke, Department of Oncology, Georgetown University; Lu Jin, Department of Oncology, Georgetown University

Date/Time: August 27, 2019 at 2pm

Location: Room 1300, Harris Building

In the next 12 months, the American Cancer Society estimates there will be over 62,000 newly diagnosed cases of in situ breast cancer and approximately 220,000 newly diagnosed cases of invasive breast cancer in the United States. The molecular events that drive breast cancer progression towards more aggressive forms are largely unknown in both sporadic and inherited breast cancers. Despite the benefits of endocrine therapy and cytotoxic therapy, advanced breast cancer remains incurable for most women, and new treatment regimens and schedules have led to only incremental decreases in breast cancer-related mortality and recurrence. New tools to better identify which cancers require more aggressive chemotherapy, and which do not, continue to emerge. These tools are often molecular classifiers that attempt to predict prognosis (often risk of recurrence/metastasis) by measuring the expression values of a panel of genes, e.g., MammaPrint (measures 70 genes) and Oncotype DX (measures 21 genes). Finding effective and accurate prognostic tools to better inform treatment decisions remains an area of active research within the breast cancer research community.

Our meta-analysis aims to evaluate the consistency of gene expression data from pre-treatment, estrogen receptor-positive tumors from patients who later experienced local and/or distant recurrence of their breast cancer. Expression data were retrieved as either CEL files or FPKM files from public repositories, including GEO DataSets and The Cancer Genome Atlas (TCGA). Each dataset included in the study contained clinical annotations for estrogen receptor status, recurrence status, and time to either local or distant recurrence. Patients within these datasets were included in downstream analysis if they were estrogen receptor positive and annotated with a last follow-up date or a cancer recurrence date.

Each dataset was evaluated independently, with patients grouped into three categories: early recurrence, late recurrence, or no recurrence. Early recurrence was defined as local or distant recurrence within 3 years, late recurrence as recurrence between 3 and 5 years, and no recurrence as no recurrence over a 10-year period.

Gene expression matrices were created after pre-processing, normalization, and data reduction. These matrices were then analyzed with the limma package in R to test for differential gene expression among the three groups. For each dataset we produced a histogram showing the distribution of the normalized data, volcano plots of the statistical results, and a list of differentially expressed genes, which included genes meeting a threshold of either an FDR-adjusted p-value of 0.05 or an absolute fold change of 1.5 (log2 fold change of approximately 0.6). Each gene list was exported, and the intersection of the lists across datasets was evaluated. All figures produced for the datasets included in the study may be accessed through the R Shiny application at the following link:

We found no genes that were upregulated or downregulated across all evaluated datasets. Across multiple datasets, CD24 and NR3C1 were upregulated, and CA12, CYB561 and MAPKAPK2 were downregulated, in the late recurrence cohort compared with the early recurrence cohort. A full intersection gene list may be found in the R Shiny application above.
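The grouping, thresholding, and intersection steps described in the methods can be illustrated with a minimal sketch. This is not the study's code: the original analysis used limma in R, and the sketch below is written in Python purely for illustration. The function names, the record fields (gene, adj_p, log2_fc), the handling of boundary values (for example, recurrence at exactly 3 years counted as early), and the choice to leave patients outside the three stated groups unclassified are all assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional, Set

# Cutoffs taken from the abstract; how exact boundary values are treated is an assumption.
EARLY_MAX_YEARS = 3.0           # early recurrence: within 3 years
LATE_MAX_YEARS = 5.0            # late recurrence: between 3 and 5 years
NO_RECURRENCE_MIN_YEARS = 10.0  # no recurrence: recurrence-free for 10 years
FDR_CUTOFF = 0.05               # FDR-adjusted p-value threshold
LOG2_FC_CUTOFF = 0.6            # roughly log2(1.5)


def recurrence_group(recurred: bool, years: float) -> Optional[str]:
    """Assign a patient to a recurrence group.

    `years` is time to recurrence if `recurred` is True, otherwise time to
    last follow-up. Returns None for patients outside the three groups
    (an assumption about how such patients are handled).
    """
    if recurred:
        if years <= EARLY_MAX_YEARS:
            return "early"
        if years <= LATE_MAX_YEARS:
            return "late"
        return None  # recurrence after 5 years is not one of the stated groups
    if years >= NO_RECURRENCE_MIN_YEARS:
        return "none"
    return None  # recurrence-free but follow-up shorter than 10 years


def significant_genes(results: List[dict]) -> Set[str]:
    """Keep genes passing either threshold, as the abstract states ("either ... or")."""
    return {
        row["gene"]
        for row in results
        if row["adj_p"] < FDR_CUTOFF or abs(row["log2_fc"]) >= LOG2_FC_CUTOFF
    }


def shared_genes(gene_sets: Dict[str, Set[str]], min_datasets: int) -> Set[str]:
    """Return genes that appear in at least `min_datasets` of the per-dataset lists."""
    counts = Counter(gene for genes in gene_sets.values() for gene in genes)
    return {gene for gene, n in counts.items() if n >= min_datasets}
```

Calling shared_genes with min_datasets equal to the number of datasets gives the strict intersection (empty in this study), while a smaller value corresponds to genes found "across multiple datasets". In practice the up- and downregulated lists would be intersected separately so that direction of change is preserved.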

Summer 2019