Internship Presentations

Extracting Transitions from GPTwiki Glycopeptide Database

Yujian Long

Mentor: Dr. Nathan Edwards, Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center.

Date/Time: August 25, 2020 at 4:40pm

Abstract: Targeted LC-MS and data-independent analysis (DIA) quantitation workflows for glycoproteins require the careful selection of multiple transitions, consisting of the precursor and product ion m/z values and their charge-states, for each glycopeptide to be quantified. Good glycopeptide transitions should be readily observable with consistent intensity, and should be highly specific for the glycopeptide, thereby reducing the chance of false-positive matches. Putative glycopeptide transitions were extracted from the GPTwiki glycopeptide transition database, and various strategies were applied to determine good transitions for each glycopeptide.

We used the “rdflib” Python module to execute a SPARQL query on the GPTwiki triple-store, extracting 15540 putative glycopeptide transitions from 1664 glycopeptides, with an average of 3 observed intensities per transition. We then developed a Python program to determine glycopeptide transition specificity based on precursor m/z, product m/z, and normalized retention-time. For each transition, we counted the number of other transitions with similar precursor m/z, product m/z, and normalized retention-time. Non-specific transitions were studied to identify the glycopeptide fragments most likely to result in a non-specific transition. To study the consistency of transition intensity, we experimented with a variety of intensity summary statistics, eventually settling on the median and the standard deviation of observed intensities as suitable metrics for transition consistency.
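The intensity summary described above can be sketched in a few lines of Python. This is only an illustration, not the program used in the project: the function name `summarize_intensities` is hypothetical, and the choices to use the sample standard deviation and to report 0.0 for a transition with a single observation are assumptions.

```python
from statistics import median, stdev


def summarize_intensities(intensities):
    """Return (median, standard deviation) of one transition's observed intensities.

    Assumption: the sample standard deviation is used, and a transition with
    fewer than two observations is given a standard deviation of 0.0.
    """
    values = list(intensities)
    if not values:
        raise ValueError("a transition needs at least one observed intensity")
    spread = stdev(values) if len(values) > 1 else 0.0
    return median(values), spread


# Three observed intensities for one hypothetical transition.
mid, spread = summarize_intensities([100.0, 200.0, 300.0])
print(mid, spread)
```

A transition whose standard deviation is small relative to its median would be considered more consistent under this kind of summary.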

We tested transition specificity using a precursor m/z tolerance of ±30 Da, a product m/z tolerance of ±0.05 Da, and a normalized retention-time tolerance of ±3 min, which match the parameters of typical DIA workflows. We found that 82.5% of transitions are specific for their glycopeptide, while 97.9% of transitions are specific for their glycopeptide’s peptide substrate sequence. Among glycopeptide fragments, Y[pN] was found to be the most enriched in glycopeptide non-specific transitions; this is consistent with the observation that the fragment’s m/z value varies only with the peptide sequence and not with the N-glycan structure. Because they carry no glycan information, transitions with this fragment should not be selected as high-quality transitions, and a filter was applied to exclude transitions with the Y[pN] fragment. Another analysis of transitions with glycopeptide sequence non-specificity indicates that the sequence variation in IGHG N-glycosylated peptides leads to non-specificity for glycopeptides from IGHG 1-4. Transitions were then filtered for consistent observation and intensity, resulting in the selection of at least 4 high-quality transitions for 61.4% of the glycopeptides in GPTwiki.
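A minimal sketch of the specificity test, using the tolerances stated above, might look like the following. It is not the project's actual program: the `Transition` fields, the function `count_conflicts`, and the example values are hypothetical, and it assumes that a transition is non-specific when at least one transition from a different glycopeptide (or a different peptide, depending on `key`) falls within all three tolerances.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    glycopeptide: str    # hypothetical glycopeptide identifier
    peptide: str         # peptide substrate sequence
    precursor_mz: float
    product_mz: float
    nrt: float           # normalized retention time, in minutes


def count_conflicts(transitions, prec_tol=30.0, prod_tol=0.05, nrt_tol=3.0,
                    key=lambda t: t.glycopeptide):
    """For each transition (in input order), count transitions with a
    different key that lie within all three tolerances."""
    # Sort by product m/z so only a narrow window needs to be scanned.
    ordered = sorted(transitions, key=lambda t: t.product_mz)
    product_mzs = [t.product_mz for t in ordered]
    counts = []
    for t in transitions:
        lo = bisect_left(product_mzs, t.product_mz - prod_tol)
        hi = bisect_right(product_mzs, t.product_mz + prod_tol)
        counts.append(sum(
            1 for other in ordered[lo:hi]
            if key(other) != key(t)
            and abs(other.precursor_mz - t.precursor_mz) <= prec_tol
            and abs(other.nrt - t.nrt) <= nrt_tol
        ))
    return counts


# Made-up example values, for illustration only.
example = [
    Transition("GP1", "PEPA", 1000.0, 500.00, 20.0),
    Transition("GP2", "PEPA", 1010.0, 500.03, 21.0),
    Transition("GP3", "PEPB", 1200.0, 500.01, 20.5),
    Transition("GP4", "PEPB", 1005.0, 700.00, 20.0),
]
by_glycopeptide = count_conflicts(example)
by_peptide = count_conflicts(example, key=lambda t: t.peptide)
print(by_glycopeptide, by_peptide)
```

A transition with a count of zero is specific at the chosen level. Passing a peptide-level `key` corresponds to the peptide-sequence specificity figure, since transitions from different glycoforms of the same peptide no longer count against each other.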

Summer 2020