Regulation of proprotein convertase activity by substrate glycosylation and related genomic variants

Bioinformatics Internship Presentation

Laila Al-washahi (Mentor: Dr. Rado Goldman, Department of Oncology, Georgetown University)

August 26th, 2:00-2:20 pm, Room 1300, Harris Building.

INTRODUCTION: Processing of proteins by proprotein convertases (PCs) is important in regulating vital biological pathways. However, glycosylation within or closely adjacent to active convertase processing motifs may disrupt PC activity. As a result, glycosylation is emerging as an important co-regulator of convertase activity. The purpose of this project is to identify physiologically relevant PC substrates in the human proteome, glycosylation sites in their proximity, and SNPs affecting these glycosylation/convertase sites.

METHOD: In this project, a proteome-wide analysis of potential variants regulating O-glycosylation and N-glycosylation near motifs for PC processing (potentially active PC sites) was carried out in a series of computational steps. First, a Python script was written to extract from UniProt the cleavage motifs, information on annotated glycosylation sites, and variants that create additional glycosylation sites or abolish existing ones. A second script was then written to extract potential glycosylation sites (N and O) and human SNPs in close proximity (approximately 10 AA) to the cleavage motif. The dataset was refined further by including information on non-annotated glycosylation sites, published O-glycosylation sites, and human nsSNVs, while selecting proteins with signal peptide sequences indicative of passage of the potential PC substrates through the secretory pathway, where most PCs are active. The analysis was done first on the whole proteome and then on a list of 100 experimentally validated convertase substrates. The resulting lists of substrates and SNPs were annotated with information from the literature and existing SNP databases.
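
The proximity step described above can be illustrated with a minimal sketch. It is not the project's actual code. The motif pattern (a minimal furin-like R-X-[K/R]-R), the N-glycosylation sequon pattern (N-X-S/T, X not proline), and all function names are assumptions chosen for illustration; only the window of about 10 residues comes from the description above. O-glycosylation sites have no simple sequence motif and are not covered here.

```python
import re

# Assumed patterns, not taken from the project's scripts.
# Lookaheads are used so that overlapping matches are all reported.
PC_MOTIF = re.compile(r"(?=(R.[KR]R))")    # minimal R-X-[K/R]-R motif
N_SEQUON = re.compile(r"(?=(N[^P][ST]))")  # N-X-S/T sequon, X != Pro
WINDOW = 10                                # residues ("approximately 10 AA")


def find_sites(seq, pattern):
    """Return 1-based start positions of all matches of `pattern` in `seq`."""
    return [m.start() + 1 for m in pattern.finditer(seq)]


def sequons_near_motifs(seq, window=WINDOW):
    """For each candidate cleavage motif, list sequon Asn positions within
    `window` residues of the last motif residue (the bond after it is cut)."""
    sequons = find_sites(seq, N_SEQUON)
    results = []
    for motif_start in find_sites(seq, PC_MOTIF):
        cleavage_pos = motif_start + 3  # fourth residue of the 4-residue motif
        nearby = [n for n in sequons if abs(n - cleavage_pos) <= window]
        results.append((cleavage_pos, nearby))
    return results


if __name__ == "__main__":
    # Toy sequence: one sequon close to the motif, one far away.
    toy = "AAANGTAAARAKRS" + "A" * 15 + "NAS"
    for cleavage_pos, nearby in sequons_near_motifs(toy):
        print(cleavage_pos, nearby)
```

In this sketch the distance is measured from the asparagine of the sequon to the last residue of the motif; other conventions (for example, distance to the nearest motif residue) are equally plausible.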

RESULTS: The whole-proteome analysis showed that there are 201,610 glycosylation sites (both annotated and non-annotated) in close proximity to convertase cleavage motifs. In addition, 2,699 SNPs create additional glycosylation sites or abolish existing ones, and 34,291 SNPs abolish a convertase motif and/or create a glycosylation site at the convertase motif. In the set of validated substrates, by contrast, there are 6,725 glycosylation sites and 81 SNPs affecting glycosylation sites, but only a few (<2%) of these SNPs (disease-linked) are associated with active cleavage motifs. Annotation of the substrates and SNPs using SNP databases revealed some cases in which we can test our hypothesis that SNPs/variants affecting glycosylation sites could play an important role in the loss of convertase processing activity. In such cases, a defect in the substrate's function is observed together with a SNP that creates a glycosylation site proximal to the active cleavage site; the new glycan would hinder convertase processing, so the substrate is not transformed into its functional active form (e.g., substrate: O75888, SNP: VAR_052587).
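
As an illustration of how a single amino-acid variant can be classified as creating or abolishing an N-glycosylation sequon, the following is a minimal sketch under stated assumptions. The sequon pattern (N-X-S/T, X not proline) and the function names are hypothetical and are not the project's code; the toy sequences are invented and do not correspond to any real protein or variant.

```python
import re

# Assumed N-glycosylation sequon: N-X-S/T with X != Pro (lookahead allows overlaps).
N_SEQUON = re.compile(r"(?=(N[^P][ST]))")


def sequon_positions(seq):
    """Return the set of 1-based Asn positions that start a sequon."""
    return {m.start() + 1 for m in N_SEQUON.finditer(seq)}


def apply_substitution(seq, pos, ref, alt):
    """Apply a single-residue substitution at 1-based `pos`, checking `ref`."""
    if seq[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + alt + seq[pos:]


def sequon_changes(seq, pos, ref, alt):
    """Return (gained, lost) sequon positions caused by the substitution."""
    before = sequon_positions(seq)
    after = sequon_positions(apply_substitution(seq, pos, ref, alt))
    return sorted(after - before), sorted(before - after)


if __name__ == "__main__":
    print(sequon_changes("AAANGAAAA", 6, "A", "T"))  # substitution completes N-G-T
    print(sequon_changes("AAANGTAAA", 4, "N", "D"))  # substitution removes the Asn
```

Combining the gained or lost positions with the distance to a candidate cleavage motif would give the kind of variant list summarized above.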