Improved retention-time estimation for glycopeptide extracted-ion chromatograms

Posted in Internship Presentation

Yixin Wu

Mentor: Dr. Nathan Edwards, Department of Biochemistry and Molecular & Cellular Biology, Georgetown University

Date/Time: August 27, 2019 at 2pm

Location: Room 1300, Harris Building

Protein glycosylation is an enzymatic reaction that attaches glycans to proteins. It is one of the most significant and fundamental post-translational modifications (PTMs) in nature. Both the glycans and their attachment sites on proteins vary widely, so glycosylation greatly increases the diversity of protein structures and functions. Mass spectrometry measures the mass-to-charge ratio (m/z) of ions and is a common method for glycoprotein analysis. Glycoprotein structures are complex, however, so quantifying site-specific protein glycosylation requires better analytical tools and data-processing support for mass spectrometry methods.

Our data come from the GPTwiki database, which contains glycopeptide identification spectra with crudely estimated retention times (RTs). Accurate retention times are crucial to the success of targeted LC-MS glycopeptide quantitation workflows. We use precursor extracted ion chromatograms (XICs). Each XIC shows the signal intensity at the median precursor m/z of all identified spectra, recorded as a function of retention time.
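The extraction step can be sketched as below. This sketch assumes that each MS1 scan is available as a retention time plus a list of (m/z, intensity) peaks. The description does not say how wide a window around the median m/z is used, so the 10 ppm tolerance and the function name `extract_xic` are illustrative assumptions.

```python
from statistics import median

def extract_xic(ms1_scans, target_mz, tol_ppm=10.0):
    """Sum the intensity within +/- tol_ppm of target_mz in every MS1 scan.

    ms1_scans: list of (retention_time, [(mz, intensity), ...]) tuples
    (a hypothetical input format). Returns parallel lists of RTs and intensities.
    """
    tol = target_mz * tol_ppm / 1e6  # convert the ppm tolerance to an absolute m/z window
    rts, intensities = [], []
    for rt, peaks in ms1_scans:
        total = sum(inten for mz, inten in peaks if abs(mz - target_mz) <= tol)
        rts.append(rt)
        intensities.append(total)
    return rts, intensities

# Usage: the target m/z is the median precursor m/z of the identified spectra.
scans = [
    (10.0, [(800.0, 5.0), (800.004, 3.0), (900.0, 7.0)]),
    (10.5, [(800.001, 20.0)]),
    (11.0, [(799.9, 4.0)]),
]
target = median([800.0, 800.002, 800.001])
rts, intensities = extract_xic(scans, target)
```

Summing every peak inside the window is a simple choice. Taking the single most intense peak per scan would be an equally plausible alternative.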

To automate this process, we developed a Python script that determines the start and end of the elution peak from the initial RT value. Using the maximum intensity and the mean and standard deviation of the signal, we fit a Gaussian shape to each XIC to better estimate the centroid and intensity of the peptide's elution peak. We then use fit-quality metrics such as the correlation coefficient to decide whether to accept the fit.
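A minimal sketch of the peak-finding and fitting steps follows, in plain Python. It is not the project's script. The hill-climbing rule, the 5% boundary cutoff (`frac`), the acceptance threshold `min_r=0.9`, and all function names are hypothetical. In line with the description, the sketch takes the Gaussian height from the maximum intensity and takes its centre and width from the intensity-weighted mean and standard deviation. It then scores the fit with the Pearson correlation between the observed and modelled intensities.

```python
import math

def find_peak_bounds(rts, intensities, initial_rt, frac=0.05):
    """Return (start, end) scan indices of the peak nearest initial_rt.

    Hypothetical rule: hill-climb from the scan closest to initial_rt to a
    local maximum, then extend left and right while intensity stays above
    frac * apex intensity.
    """
    i = min(range(len(rts)), key=lambda k: abs(rts[k] - initial_rt))
    while True:  # climb toward the local apex
        if i > 0 and intensities[i - 1] > intensities[i]:
            i -= 1
        elif i < len(rts) - 1 and intensities[i + 1] > intensities[i]:
            i += 1
        else:
            break
    cutoff = frac * intensities[i]
    start = end = i
    while start > 0 and intensities[start - 1] > cutoff:
        start -= 1
    while end < len(rts) - 1 and intensities[end + 1] > cutoff:
        end += 1
    return start, end

def gaussian_from_moments(rts, intensities):
    """Height = max intensity; centre and width = intensity-weighted mean and SD."""
    total = sum(intensities)
    mu = sum(t * y for t, y in zip(rts, intensities)) / total
    var = sum(y * (t - mu) ** 2 for t, y in zip(rts, intensities)) / total
    return max(intensities), mu, math.sqrt(var)

def pearson_r(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def fit_xic_peak(rts, intensities, initial_rt, min_r=0.9):
    """Fit a Gaussian to the peak near initial_rt; None if no usable peak is found."""
    start, end = find_peak_bounds(rts, intensities, initial_rt)
    t, y = rts[start:end + 1], intensities[start:end + 1]
    if len(t) < 3:
        return None  # too few scans above the cutoff around the initial RT
    height, mu, sigma = gaussian_from_moments(t, y)
    model = [height * math.exp(-0.5 * ((ti - mu) / sigma) ** 2) for ti in t]
    r = pearson_r(y, model)
    return {"centroid": mu, "height": height, "sigma": sigma,
            "r": r, "accepted": r >= min_r}
```

The moments are computed only inside the detected window, so the tails are truncated and the estimated width comes out slightly narrower than the true one. A least-squares refinement (for example with SciPy's `curve_fit`) could start from these moment estimates.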

In our project, we applied this script to 86 extracted ion chromatograms (XICs) from GPTwiki. Of these, 55% contained no usable data, 12% had a poor initial RT value, 17% produced a good fit, and 16% could be fit but with poor quality metrics. The extreme cases suggest a new project direction: detecting outliers in RT estimates so that poor or problematic data can be eliminated. We also applied this technique to the XICs of internal standard peptides to establish normalized retention times.
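One way to turn the fitted centroids of internal standard peptides into normalized retention times is a linear calibration. This approach regresses the standards' reference values on their observed centroids and then applies the fitted line to every other peptide. The sketch below works under that assumption. The linear model, the reference values, and the function names are hypothetical and are not taken from the project.

```python
def fit_rt_normalization(observed, reference):
    """Ordinary least-squares line mapping observed RTs to reference (normalized) values."""
    n = len(observed)
    mx = sum(observed) / n
    my = sum(reference) / n
    sxx = sum((x - mx) ** 2 for x in observed)
    sxy = sum((x - mx) * (y - my) for x, y in zip(observed, reference))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

def normalize_rt(rt, slope, intercept):
    """Map an observed RT (e.g. a fitted Gaussian centroid) onto the normalized scale."""
    return slope * rt + intercept

# Usage with made-up standards observed at 10, 20 and 30 min.
slope, intercept = fit_rt_normalization([10.0, 20.0, 30.0], [0.0, 50.0, 100.0])
print(normalize_rt(25.0, slope, intercept))  # 75.0
```

A straight line is the simplest choice. If the gradient is not linear, a piecewise or higher-order fit through more standards may be needed.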