Internship Presentations

LIME, A High-Throughput Metabolite Annotation Tool

Bowen Yang

Mentors: Dr. Tytus Mak, Mass Spectrometry Data Center, National Institute of Standards and Technology; and Dr. Evagelia C Laiakis, Department of Oncology, Georgetown University Medical Center

Date/Time: August 23rd, 2022 at 2:40pm.

Abstract: As a field primarily concerned with the compounds present in organisms, metabolomics is regarded as the most phenotype-reflective of the omics disciplines. One of the most challenging phases of metabolomics analysis is the annotation and identification of compounds. Although many databases exist and together contain a huge number of compounds, they overlap only partially and share no uniform naming standard, which makes manual large-scale searching highly time-consuming and error-prone.

Thus, a high-throughput metabolite annotation tool with a friendly graphical user interface (GUI), the Large-scale Identification of Metabolomics Engine (LIME), was developed to annotate post-processed liquid chromatography-mass spectrometry (LC–MS) based metabolomic data. The software aims to generate putative identities for all observed features and to visualize global summary statistics: the percentage of annotations by data source and adduct type, and the distributions of masses and ppm errors. The LIME database includes 286,268 unique chemical structures (ignoring stereochemistry and charge), integrated from three mainstream metabolite databases: Human Metabolome Database (HMDB)[1], Chemical Entities of Biological Interest (ChEBI)[2], and LIPID MAPS[3]. InChIKey, a fixed-length identifier derived by hashing the IUPAC International Chemical Identifier (InChI), serves as the unique identifier in LIME.
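The merging of several databases under one identifier can be illustrated with a minimal sketch. It relies on one known property of standard InChIKeys: the first 14-character block is a hash of the connectivity layers, so stereoisomers of the same skeleton share it. Whether LIME collapses stereochemistry and charge in exactly this way is not stated in the abstract, so the grouping rule here is an assumption. The function names (`skeleton_key`, `merge_sources`) and the record layout are hypothetical, and the keys are synthetic placeholders, not real compounds.

```python
from collections import defaultdict


def skeleton_key(inchikey):
    """Return the first 14-character block of an InChIKey (connectivity hash)."""
    first_block, _, _ = inchikey.partition("-")
    if len(first_block) != 14:
        raise ValueError(f"not a valid InChIKey: {inchikey!r}")
    return first_block


def merge_sources(records):
    """Group (source, name, inchikey) records by connectivity block.

    Assumption: entries that differ only in the later InChIKey blocks
    are treated as the same structure for annotation purposes.
    """
    merged = defaultdict(lambda: {"sources": set(), "names": set()})
    for source, name, inchikey in records:
        entry = merged[skeleton_key(inchikey)]
        entry["sources"].add(source)
        entry["names"].add(name)
    return dict(merged)


# Synthetic placeholder keys (NOT real compounds), shaped like InChIKeys.
RECORDS = [
    ("HMDB", "compound X, form 1", "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),
    ("ChEBI", "compound X, form 2", "AAAAAAAAAAAAAA-CCCCCCCCSA-N"),
    ("LIPID MAPS", "lipid Y", "DDDDDDDDDDDDDD-EEEEEEEESA-N"),
]

MERGED = merge_sources(RECORDS)
for key, entry in MERGED.items():
    print(key, sorted(entry["sources"]), sorted(entry["names"]))
```

In this toy input the two forms of compound X collapse into a single entry that records both source databases, which is the kind of bookkeeping needed to report the percentage of annotations per data source.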

The performance of LIME is demonstrated on a dataset from a previously reported study that collected urine samples from mice exposed to γ radiation[4]. Post-processing of the raw chromatographic data produced 8,305 positive-mode and 3,577 negative-mode spectral features. With a mass tolerance of 20 ppm and only protonation and deprotonation adducts considered, LIME annotated 79.2% of the positive-mode and 85.4% of the negative-mode MS1 spectral features in this dataset. The resulting putative identities can help researchers narrow down candidates and complete metabolite identification, so that they can observe compounds present at disproportionate levels and uncover potential biological meaning.
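The matching step described above (a 20 ppm tolerance with only protonation and deprotonation adducts) can be sketched as follows. This is an illustrative sketch under stated assumptions, not LIME's actual implementation: the function names, the adduct table, and the two-entry compound list are hypothetical. The ppm error is taken as (observed m/z - theoretical m/z) / theoretical m/z × 10^6, and the monoisotopic masses are those of the formulas C3H7NO2 and C6H12O6.

```python
from bisect import bisect_left, bisect_right

PROTON_MASS = 1.007276466  # Da

# Hypothetical adduct table: mode -> (label, m/z shift for a singly charged ion)
ADDUCTS = {
    "positive": ("[M+H]+", +PROTON_MASS),
    "negative": ("[M-H]-", -PROTON_MASS),
}


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def annotate(observed_mz, mode, compounds, tol_ppm=20.0):
    """Return putative matches for one feature.

    compounds: list of (monoisotopic_mass, name), sorted by mass.
    """
    label, shift = ADDUCTS[mode]
    masses = [mass for mass, _ in compounds]
    t = tol_ppm * 1e-6
    # |obs - theo| / theo <= t  is equivalent to  obs/(1+t) <= theo <= obs/(1-t);
    # subtracting the adduct shift converts the m/z window to neutral mass.
    low = observed_mz / (1 + t) - shift
    high = observed_mz / (1 - t) - shift
    window = compounds[bisect_left(masses, low):bisect_right(masses, high)]
    return [(name, label, ppm_error(observed_mz, mass + shift))
            for mass, name in window]


# Toy database: several isomers share each formula, so a hit is only putative.
COMPOUNDS = [
    (89.047678, "C3H7NO2 (e.g. alanine)"),
    (180.063388, "C6H12O6 (e.g. a hexose)"),
]

for name, adduct, err in annotate(181.0707, "positive", COMPOUNDS):
    print(f"{name} {adduct} {err:+.2f} ppm")
```

Because a mass match alone cannot distinguish isomers, each hit is a candidate rather than a confirmed identity, which is why the results are described as putative. Sorting the database once and using binary search keeps the per-feature lookup cheap when thousands of features are annotated.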

[1] Wishart DS, Feunang YD, Marcu A, et al. HMDB 4.0: the human metabolome database for 2018. Nucleic Acids Res. 2018;46(D1):D608-D617. doi:10.1093/nar/gkx1089
[2] Hastings J, Owen G, Dekker A, et al. ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Res. 2016;44(D1):D1214-D1219. doi:10.1093/nar/gkv1031
[3] Fahy E, Subramaniam S, Murphy RC, et al. Update of the LIPID MAPS comprehensive classification system for lipids. J Lipid Res. 2009;50 Suppl(Suppl):S9-S14. doi:10.1194/jlr.R800095-JLR200
[4] Laiakis EC, Hyduke DR, Fornace AJ. Comparison of mouse urinary metabolic profiles after exposure to the inflammatory stressors γ radiation and lipopolysaccharide. Radiat Res. 2012;177(2):187-199. doi:10.1667/rr2771.1

Summer 2022