Designing a Framework for an MS/MS Library to be Utilized for Compound Identification

Meth Jayatilake (Mentor: Dr. Amrita Cheema, Department of Oncology and Biochemistry and Molecular & Cellular Biology; Director of Metabolomics Shared Resource, Georgetown University Medical Center)

August 28, 2018, 2:00pm, Room 1300, Harris Building

Identification of metabolites is a challenge in the field of untargeted metabolomics. Tandem mass spectrometry (MS/MS) can be used for identification by comparing the spectrum of a pure standard compound with that of the sample for a given metabolite (m/z). The current workflow in untargeted metabolomics relies on manual comparison of these spectra. An MS/MS database would instead enable compound identification by matching spectra with peak- and pattern-matching algorithms rather than by manual evaluation. Although online libraries already exist, the information they provide can be limited, the peaks detected can vary between vendors and instruments, and, more importantly, pattern matching against them is still a manual process. We therefore designed a database structure that is interoperable between vendors by using the netCDF exchange data format as its input.

For each compound run, the MySQL database stored associated information, including the empirical formula, molecular mass, and monoisotopic mass for each ionization mode. Each compound was also linked to an HMDB (Human Metabolome Database) ID to keep chemical information consistent. The netCDF file for each metabolite (netCDF being a universally readable format for raw MS/MS data) also underwent peak picking with the XCMS package for R, and the m/z value, intensity, and retention time of every peak for every compound were stored in the database.
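The storage layer described above can be sketched as two tables: one row per standard compound and one row per picked peak. This is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for MySQL so it runs without a server. The table and column names, the `add_compound` helper, treating "molecular mass" as the average mass, and storing only the [M+H]+ and [M-H]- m/z values (neutral monoisotopic mass plus or minus a proton mass of 1.007276 u) are all assumptions for illustration, not the project's actual schema.

```python
import sqlite3

# Assumption: only [M+H]+ (positive mode) and [M-H]- (negative mode) are stored.
PROTON_MASS = 1.007276  # proton mass in u (Da)

# Hypothetical schema; table and column names are illustrative only.
SCHEMA = """
CREATE TABLE compound (
    id             INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    hmdb_id        TEXT UNIQUE,        -- HMDB accession, if known
    formula        TEXT NOT NULL,      -- empirical formula
    molecular_mass REAL NOT NULL,      -- assumed: average molecular mass
    mono_mass      REAL NOT NULL,      -- neutral monoisotopic mass
    mz_pos         REAL NOT NULL,      -- [M+H]+ monoisotopic m/z
    mz_neg         REAL NOT NULL       -- [M-H]- monoisotopic m/z
);
CREATE TABLE peak (
    id          INTEGER PRIMARY KEY,
    compound_id INTEGER NOT NULL REFERENCES compound(id),
    mode        TEXT NOT NULL CHECK (mode IN ('pos', 'neg')),
    mz          REAL NOT NULL,
    intensity   REAL NOT NULL,
    rt          REAL NOT NULL          -- retention time in seconds
);
"""


def create_library(path=":memory:"):
    """Open (or create) a library database and build the hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_compound(conn, name, hmdb_id, formula, molecular_mass, mono_mass, peaks):
    """Insert one standard and its picked peaks.

    `peaks` is a list of (mode, mz, intensity, rt) tuples, e.g. taken from
    a peak-picking step such as XCMS.
    """
    cur = conn.execute(
        "INSERT INTO compound (name, hmdb_id, formula, molecular_mass, "
        "mono_mass, mz_pos, mz_neg) VALUES (?, ?, ?, ?, ?, ?, ?)",
        (name, hmdb_id, formula, molecular_mass, mono_mass,
         mono_mass + PROTON_MASS, mono_mass - PROTON_MASS),
    )
    compound_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO peak (compound_id, mode, mz, intensity, rt) "
        "VALUES (?, ?, ?, ?, ?)",
        [(compound_id, mode, mz, inten, rt) for mode, mz, inten, rt in peaks],
    )
    conn.commit()
    return compound_id
```

Keeping peaks in their own table, keyed by compound and ionization mode, means a query can fetch every peak of a candidate compound with a single `WHERE compound_id = ? AND mode = ?` filter; the SQL is plain enough to port to MySQL with minor type changes.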

Querying the database requires the positive- and/or negative-ionization-mode netCDF files together with the respective monoisotopic masses. Each file first underwent the same R peak-picking service that was used to store compounds in the library. Next, library parent ions within a mass tolerance of the query value were selected. For each selected compound, a retention time window based on the parent ion was created to determine which peaks truly belonged to the compound and should be compared. Finally, the query peaks were compared with the compound's highest-intensity daughter-ion peaks to determine a match.
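The query steps can be sketched as a few functions operating on peak lists that have already been picked (for example by XCMS). This is a minimal sketch under stated assumptions: the 10 ppm mass tolerance, the ±6 s retention time half-width, the top-5 daughter-ion cutoff, the 0.6 minimum score, and every function name are hypothetical choices, and the score used here (the fraction of library daughter ions found in the query window) is one simple matching rule, not necessarily the one the project used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    mz: float
    intensity: float
    rt: float  # retention time, seconds


@dataclass
class LibraryCompound:
    name: str
    parent_mz: float  # monoisotopic m/z for the relevant ionization mode
    peaks: list       # list of Peak picked from the pure standard


def within_ppm(a, b, ppm):
    """True if a is within `ppm` parts per million of b."""
    return abs(a - b) <= b * ppm * 1e-6


def select_candidates(library, query_mz, ppm=10.0):
    """Step 1: library compounds whose parent m/z is close to the query mass."""
    return [c for c in library if within_ppm(query_mz, c.parent_mz, ppm)]


def peaks_in_rt_window(peaks, parent_mz, ppm=10.0, rt_half_width=6.0):
    """Step 2: keep only peaks co-eluting with the most intense parent-ion peak."""
    parents = [p for p in peaks if within_ppm(p.mz, parent_mz, ppm)]
    if not parents:
        return []
    apex = max(parents, key=lambda p: p.intensity)
    return [p for p in peaks if abs(p.rt - apex.rt) <= rt_half_width]


def top_daughters(peaks, parent_mz, n=5, ppm=10.0):
    """The n most intense fragment peaks, excluding the parent ion itself."""
    fragments = [p for p in peaks if not within_ppm(p.mz, parent_mz, ppm)]
    return sorted(fragments, key=lambda p: p.intensity, reverse=True)[:n]


def match_score(query_peaks, compound, n=5, ppm=10.0, rt_half_width=6.0):
    """Step 3: fraction of the compound's top-n daughter ions seen in the query."""
    lib_window = peaks_in_rt_window(compound.peaks, compound.parent_mz, ppm, rt_half_width)
    daughters = top_daughters(lib_window, compound.parent_mz, n, ppm)
    query_window = peaks_in_rt_window(query_peaks, compound.parent_mz, ppm, rt_half_width)
    if not daughters:
        return 0.0
    hits = sum(any(within_ppm(q.mz, d.mz, ppm) for q in query_window) for d in daughters)
    return hits / len(daughters)


def search(library, query_peaks, query_mz, min_score=0.6, **kw):
    """Rank candidate compounds by score, keeping those at or above min_score."""
    ppm = kw.get("ppm", 10.0)
    scored = [(c.name, match_score(query_peaks, c, **kw))
              for c in select_candidates(library, query_mz, ppm)]
    return sorted([s for s in scored if s[1] >= min_score],
                  key=lambda s: s[1], reverse=True)
```

Because each retention time window is anchored on the parent ion observed in the same run, the library and query windows can sit at different absolute retention times; that is the property that lets peak selection tolerate drift between instruments.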

Although precautions such as maintaining proper instrument parameters can minimize inconsistencies, variance can still be observed between instruments. The retention time window is a feature that other MS/MS libraries lack, and it allows our workflow to select peaks reliably even across instruments. This project also created a framework for lab-specific MS/MS libraries. Building these libraries from data each lab has acquired will allow more accurate identification of properly curated metabolites, which will benefit metabolomics researchers. Finally, the workflow we developed can be adapted to any vendor-specific MS/MS library.

Tagged: Summer 2018