Glycan structure parsing, alignment, and visualization


Wenjin Zhang (Mentor: Dr. Nathan Edwards, Department of Biochemistry and Molecular & Cellular Biology, Georgetown University)

August 28, 2018, 2:00pm, Room 1300, Harris Building

Glycosylation of proteins and lipids is important for many cellular processes, mediating protein-protein and protein-cell surface interactions. For example, the glycans on bacterial, viral, or even cancer cell surfaces can be used as targets for drug design. Glycosylation also affects the stability and folding of proteins. The informatics of describing and analyzing glycan structures poses many challenges, from parsing the monosaccharide descriptions that form the foundation of glycan descriptions, to aligning monosaccharide motifs against previously described structures, to visualizing the relationships between related glycan structures. Working with glycan structures is challenging because, unlike proteins and nucleic acids, glycans are not linear molecules but branched ones. Furthermore, glycan synthesis is an enzyme-driven process that does not rely on templates, so the set of potential and observable glycan structures is poorly understood. Lastly, experimental technologies for the analysis of glycan structures usually provide an incomplete characterization of glycan stereochemistry and other molecular details.

This project developed a parser for monosaccharides described in the WURCS2.0 glycan sequence format and integrated it into the PyGly Python package developed in the Edwards lab. The WURCS2.0 format is used by the glycan structure database GlyTouCan (glytoucan.org), and the lack of a reliable parser was a significant problem in the PyGly package. A Python program compared the glycan structures produced by the existing GlycoCT format parser with those produced by the new WURCS2.0 format parser, checking correctness on a set of a few thousand human glycan structures. Like protein domains, glycans have structural motifs that drive the specificity of glycan-protein binding, and aligning glycan motifs to glycan structures can be used to categorize and organize glycan structures by their biological function. The equality aligner used to check the WURCS2.0 parser was adapted to solve the motif alignment problem and to reverse engineer the semantics of motif association in GlyTouCan. A web application for visualizing motif-glycan associations was developed to verify correct alignments. Finally, a JavaScript-based widget for visualizing the relationships between similar glycan structures was developed using the JavaScript libraries d3.js and viz.js, enabling interactive examination of structure relationships from a static, precomputed relationship graph.
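To make the difference between equality checking and motif alignment concrete, here is a minimal Python sketch on a toy tree model. It is not the PyGly API: the `Node` class and the functions `equal`, `motif_at`, and `motif_anywhere` are hypothetical names invented for illustration, and the model deliberately ignores linkages, anomers, and substituents, which a real aligner would have to compare. Both a root-anchored match and a match anywhere in the structure are shown, since the abstract does not say which semantics GlyTouCan applies.

```python
from dataclasses import dataclass, field
from itertools import permutations


@dataclass
class Node:
    """Toy monosaccharide residue (hypothetical model, not PyGly's)."""
    name: str
    children: list = field(default_factory=list)


def equal(a, b):
    """True if two rooted trees are identical up to the order of branches."""
    if a.name != b.name or len(a.children) != len(b.children):
        return False
    # Try every pairing of branches; glycan branching is small enough
    # that brute force is acceptable for a sketch.
    return any(
        all(equal(x, y) for x, y in zip(a.children, perm))
        for perm in permutations(b.children)
    )


def motif_at(motif, node):
    """True if the motif matches with its root placed on this node.

    Unlike equality, the node may carry extra branches the motif lacks.
    """
    if motif.name != node.name or len(motif.children) > len(node.children):
        return False
    return any(
        all(motif_at(m, n) for m, n in zip(motif.children, perm))
        for perm in permutations(node.children, len(motif.children))
    )


def motif_anywhere(motif, glycan):
    """True if the motif matches at the root or at any descendant."""
    return motif_at(motif, glycan) or any(
        motif_anywhere(motif, child) for child in glycan.children
    )


# Simplified N-glycan core: GlcNAc-GlcNAc-Man with two Man branches.
core = Node("GlcNAc", [Node("GlcNAc", [Node("Man", [Node("Man"), Node("Man")])])])
# Illustrative motif: a branched trimannosyl fragment.
motif = Node("Man", [Node("Man"), Node("Man")])

print(motif_at(motif, core))        # root-anchored match
print(motif_anywhere(motif, core))  # match anywhere in the structure
```

The design point is that equality requires the same number of branches at every residue, while motif alignment only requires that each motif branch be mapped to a distinct branch of the glycan, so the same recursive comparison can serve both purposes with one relaxed condition.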

Together, these glycan structure parsing, alignment, and visualization tools will ultimately be integrated into the recently developed GlyGen glycomics portal.