Optimization and Complementing of Proteomic Consistency Metrics Reporting Pipeline

Jiahao Huang (Mentor: Dr. Shaojun Tang, Innovation Center for Biomedical Informatics, Georgetown University)

August 28, 2018, 2:00pm, Room 1300, Harris Building

Mass spectrometry is one of the fundamental tools of proteomic research. Its major use is protein identification, which helps characterize the expressed proteome. Although mass spectrometry was not developed to quantify ions, the quantification capabilities of its detectors provide significant analytical potential. Today, mass spectrometry is used to compare protein levels between specific groups, for example, cancer (diseased) vs. control (non-diseased). However, factors unrelated to the main research question, such as system performance and technical variability, can bias the results or conclusions. To analyze these factors and understand their impact, several proteomic metrics were built for monitoring LC-MS/MS performance, and previous developers on this project constructed an R Markdown-based reporting pipeline. This pipeline automatically generates a self-contained, user-friendly report from an input metrics file, allowing researchers to assess the consistency of proteomics experiments within the Clinical Proteomic Tumor Analysis Consortium (CPTAC).

To provide a more comprehensive and well-organized report, the original script has been optimized and improved. The new version incorporates two additional proteomic metrics files that supply further information, such as peptide identification and protein identification. The overall structure of the report was changed from ten separate parts to four major sections with embedded detailed descriptions. Four plots related to MS/MS spectra were removed as redundant. Six additional analysis plots were added to visualize distinct peptides, peptide redundancy, LC retention time, and identified proteins for each analytical sample and fraction. A PCA, along with potential batch-effect removal by ComBat, was also embedded in the script for future use. Because the report is output as HTML, some interactive and user-friendly features were added.
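
As a rough illustration of why PCA is paired with batch-effect removal, the sketch below projects a tiny made-up metrics table onto its first principal component before and after a batch adjustment. It is not the pipeline's code: the pipeline is written in R Markdown, and Python is used here only for illustration. The data values, the batch labels, and the function names (center_by_batch, first_pc_scores, batch_gap) are all hypothetical. The per-batch mean centering is a deliberately simplified stand-in for ComBat, which additionally adjusts scale and uses empirical Bayes shrinkage.

```python
import math


def center_by_batch(rows, batches):
    """Subtract each batch's per-metric mean.

    A crude stand-in for ComBat (assumption): it only removes
    additive batch offsets.
    """
    n_cols = len(rows[0])
    sums, counts = {}, {}
    for row, batch in zip(rows, batches):
        acc = sums.setdefault(batch, [0.0] * n_cols)
        for j, value in enumerate(row):
            acc[j] += value
        counts[batch] = counts.get(batch, 0) + 1
    return [
        [value - sums[batch][j] / counts[batch] for j, value in enumerate(row)]
        for row, batch in zip(rows, batches)
    ]


def first_pc_scores(rows, iterations=200):
    """Project samples onto the first principal component.

    Uses power iteration on the sample covariance matrix.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    x = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [
        [sum(xi[a] * xi[b] for xi in x) / (n - 1) for b in range(p)]
        for a in range(p)
    ]
    vec = [1.0] * p
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break  # degenerate data: no variance left to explain
        vec = [v / norm for v in nxt]
    return [sum(xi[j] * vec[j] for j in range(p)) for xi in x]


def batch_gap(scores, batches):
    """Absolute difference between the two batches' mean PC1 scores."""
    labels = sorted(set(batches))
    batch_means = []
    for label in labels:
        vals = [s for s, b in zip(scores, batches) if b == label]
        batch_means.append(sum(vals) / len(vals))
    return abs(batch_means[0] - batch_means[1])


# Hypothetical metrics: six samples, two metrics, two batches with an offset.
rows = [
    [10.0, 5.0], [11.0, 5.5], [12.0, 6.2],
    [20.0, 9.0], [21.0, 9.4], [22.0, 10.3],
]
batches = ["A", "A", "A", "B", "B", "B"]

gap_before = batch_gap(first_pc_scores(rows), batches)
gap_after = batch_gap(first_pc_scores(center_by_batch(rows, batches)), batches)
print("PC1 batch gap before adjustment:", gap_before)
print("PC1 batch gap after adjustment:", gap_after)
```

In this toy table the first principal component mostly separates the two batches before adjustment, and that separation disappears once the batch means are removed. In an R-based pipeline, the analogous steps would more likely rely on an existing PCA routine and the ComBat implementation rather than hand-written code.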

Tagged: Summer 2018