ESTABLISHING GLYCAN ONTOLOGY - ALGORITHMS FOR GLYCAN COMPARISON AND VISUALIZATION

Bioinformatics Internship Presentation

Avery Wang (Mentor: Dr. Nathan Edwards, Department of Biochemistry and Molecular & Cellular Biology, Georgetown University)

August 29, 2017, 2:30pm, Room 341, Basic Science

Although the central dogma of biology is defined primarily by DNA, RNA, and protein, many other cellular processes contribute to the complexity of biological systems. Glycosylation is one such process: the attachment of glycans, which are composed of monosaccharides, to proteins and lipids influences protein-protein and protein-cell interactions and thereby affects many cellular processes.

Glycans are linked monosaccharides categorized by their attachment site and structure. N-glycans attach to asparagine (Asn, N) residues at the so-called N-glycosylation motif (NX[S/T]), while O-glycans attach to serine (Ser, S) or threonine (Thr, T) residues. Glycan chains vary in length and structure; they can be branched or linear and can consist of different sugar molecules. Because glycans are constructed by enzymatic rather than templated cellular processes, their potential structures are complex, and analytical techniques cannot fully elucidate every aspect of a glycan structure. As a result, manuscripts and other resources describing glycan structures often leave aspects of the structure unspecified. Recent efforts to formalize glycan descriptions capture this ambiguity explicitly, but lack a strategy for navigating a glycan’s descriptions at varying levels of specificity. A hierarchy of glycan descriptions, like those found in established ontologies, would allow relatedness and/or function to be mapped across parent/child glycans based on their descriptions.
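As an illustration of the N-glycosylation motif described above, the following minimal Python sketch scans a protein sequence for candidate N-X-[S/T] sites. It is not part of the project's software. The function name `find_sequons` is hypothetical, and the pattern assumes the common convention that X is any residue except proline, which the NX[S/T] notation alone does not state.

```python
import re
from typing import List

# N-X-[S/T] motif; assumes X may be any residue except proline (P).
# The lookahead makes the match zero-width so overlapping motifs are all found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> List[int]:
    """Return 1-based positions of Asn residues that begin an N-X-[S/T] motif."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    # Toy sequence, not a real protein.
    print(find_sequons("MNGTANPSNNSS"))
```

A matching motif marks only a candidate site; whether a glycan is actually attached there cannot be read from the sequence alone.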

This project develops software tools and implements algorithms for comparing and organizing glycan descriptions. The algorithms use existing glycan data structures implemented in both the PyGly and GlyPy Python modules developed by the Edwards lab. An initial algorithm was developed to directly compare two glycans with known topologies. This algorithm takes into account the possibility of unspecified information, such as unspecified linkages or anomeric configuration. An additional algorithm was added to address topologically undetermined glycans. With the basic framework for glycan comparison in place, an algorithm was developed to incrementally insert a glycan description into a progressively constructed ontological tree. Additional visualization tools were developed to display and confirm the generated relationships between glycan descriptions; the current tools allow navigation across varying levels of specificity.
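The comparison and insertion steps described above might be sketched roughly as follows. This is an illustrative sketch only, not the PyGly or GlyPy API and not the project's actual algorithm. The classes `Mono` and `Node` and the functions `subsumes` and `insert` are hypothetical names. The sketch assumes both glycans have known topologies, represents unspecified details as `None`, and does not cover topologically undetermined glycans.

```python
from dataclasses import dataclass, field
from itertools import permutations
from typing import List, Optional, Tuple


@dataclass
class Mono:
    """Hypothetical monosaccharide node; None means 'unspecified'."""
    name: str
    anomer: Optional[str] = None
    # Each entry: (linkage position on this residue, child residue).
    children: List[Tuple[Optional[int], "Mono"]] = field(default_factory=list)


def subsumes(general: Mono, specific: Mono) -> bool:
    """True if `specific` agrees with `general` wherever `general` is specified."""
    if general.name != specific.name:
        return False
    if general.anomer is not None and general.anomer != specific.anomer:
        return False
    if len(general.children) != len(specific.children):
        return False
    # Try every pairing of branches; acceptable for the small fan-out of glycans.
    for perm in permutations(specific.children):
        if all((gpos is None or gpos == spos) and subsumes(gchild, schild)
               for (gpos, gchild), (spos, schild) in zip(general.children, perm)):
            return True
    return False


@dataclass
class Node:
    """Hypothetical hierarchy node; glycan is None only for the artificial root."""
    glycan: Optional[Mono]
    children: List["Node"] = field(default_factory=list)


def insert(node: Node, glycan: Mono) -> None:
    """Place `glycan` below the most specific description that subsumes it."""
    for child in node.children:
        if subsumes(child.glycan, glycan):
            if subsumes(glycan, child.glycan):
                return  # equivalent description already present
            insert(child, glycan)  # descend into the first more general child
            return
    new = Node(glycan)
    # Re-parent existing siblings that the new description generalizes.
    new.children = [c for c in node.children if subsumes(glycan, c.glycan)]
    node.children = [c for c in node.children if not subsumes(glycan, c.glycan)]
    node.children.append(new)


if __name__ == "__main__":
    vague = Mono("GlcNAc", children=[(None, Mono("Man"))])
    exact = Mono("GlcNAc", anomer="b", children=[(4, Mono("Man", anomer="b"))])
    root = Node(None)
    insert(root, exact)
    insert(root, vague)  # the vaguer description becomes the parent of the exact one
    print(root.children[0].glycan is vague)
```

Descending into only the first matching child keeps the result a tree. A description that is subsumed by several unrelated parents would need a directed acyclic graph instead, which this sketch does not attempt.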