Visualization and automated reporting of proteomic consistency metrics

Bioinformatics Internship Presentation

Yi Bai (Mentor: Dr. Simina Boca, Innovation Center for Biomedical Informatics, Department of Oncology and Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University)

August 29, 2017, 2:30pm, Room 341, Basic Science

Mass spectrometry (MS) provides key tools for analyzing proteins, including purified proteins, protein mixtures, and complex proteomes. The most common use of proteomic MS is the comparison of protein levels between specific groups, for example between individuals who have a disease of interest (cases) and comparable individuals who do not (controls), or among different types of tumor samples. However, this comparison assumes that the observed differences in protein levels are primarily due to the biological groups, rather than being confounded by other factors such as the variability of the analytical system. This assumption may be especially problematic if the biological differences between samples are modest relative to the technical variability. Known sources of technical variability include differences in sample preparation and signal acquisition. In general, it is helpful to collect metrics that can be used to quantitatively assess the stability of the MS platform and to help identify outlying samples or potential problems with specific fractions. Herein, we consider exploratory analyses of these consistency metrics using visualization and reporting tools. The resulting reports can be used both for downstream troubleshooting of the experimental approach and for informing data analyses, by explicitly modelling batch effects or by removing or downweighting outliers.

In this project, we developed a user-friendly visualization approach and generated automated reports that allow researchers to assess the consistency of proteomics experiments within the Clinical Proteomic Tumor Analysis Consortium (CPTAC) using specific MS performance metrics. These metrics covered chromatography, electrospray ionization (ESI), MS1, MS2, and data analysis; they included the number of MS2 spectra and specific quantiles of the precursor intensities, m/z values, and molecular weights, for each analytical sample and fraction. The R programming language was used to develop visualizations of these data and to flag potential outliers. Interactive graphics were created with the shiny package, and automated reports were built using R Markdown.
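As a rough illustration of the kind of per-run summary and outlier screen described above, the following minimal sketch computes quartiles of precursor intensities for each (sample, fraction) pair and flags runs whose median is far from the others. It is written in Python rather than R and is not the project's code: the function names, the synthetic intensity values, and the modified z-score rule (based on the median absolute deviation, with a cutoff of 3.5) are all assumptions made for the example.

```python
from statistics import median, quantiles


def summarize_run(intensities):
    """Return the count and quartiles of one run's precursor intensities."""
    q1, q2, q3 = quantiles(intensities, n=4, method="inclusive")
    return {"n": len(intensities), "q1": q1, "median": q2, "q3": q3}


def flag_outliers(value_by_run, threshold=3.5):
    """Flag runs whose value is far from the median across all runs.

    Uses a modified z-score, 0.6745 * |x - median| / MAD. This rule and the
    default threshold are assumptions for illustration only.
    """
    values = list(value_by_run.values())
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread across runs, so nothing can be flagged
    return sorted(
        run
        for run, v in value_by_run.items()
        if 0.6745 * abs(v - center) / mad > threshold
    )


# Synthetic precursor intensities keyed by (sample, fraction); not real data.
runs = {
    ("S1", "F1"): [10.0, 12.0, 14.0, 16.0, 18.0],
    ("S1", "F2"): [11.0, 13.0, 15.0, 17.0, 19.0],
    ("S2", "F1"): [9.0, 11.0, 13.0, 15.0, 17.0],
    ("S2", "F2"): [40.0, 45.0, 50.0, 55.0, 60.0],
    ("S3", "F1"): [12.0, 14.0, 16.0, 18.0, 20.0],
}

summaries = {run: summarize_run(values) for run, values in runs.items()}
medians = {run: s["median"] for run, s in summaries.items()}
flagged = flag_outliers(medians)
print(flagged)  # [('S2', 'F2')]
```

A robust rule based on the median is used here so that a single extreme run does not inflate the spread estimate and mask itself; in practice the choice of metric and cutoff would depend on the experiment.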

This approach will assist researchers in evaluating the performance of the MS system and in identifying outliers or technical artifacts that may confound the biological interpretation of the results, so that they can troubleshoot potential problems and inform downstream experiments and analyses.