Kallisto Isoform-Level Analysis of Platelet Transcript Data: A Rapid, Scalable RNA-Seq Workflow

Jacob Youkilis

Mentor: Dr. Rasika Mathias, Division of Allergy and Clinical Immunology, Johns Hopkins University School of Medicine

Date/Time: August 23rd, 2022 at 3:20pm.

Background

Previous studies have documented the role of platelet aggregation in cardiovascular disease progression. WGS-based GWAS have provided direct evidence that genetic loci determine platelet aggregation, and RNA-Seq of the platelet transcriptome has revealed expression quantitative trait loci (eQTLs) for these GWAS loci. The GeneSTAR program at Johns Hopkins University has generated RNA-Seq data on platelet samples from N=297 study subjects, and these data were previously examined at the gene level. This project uses the existing GeneSTAR platelet data to expand upon the lab's prior work by quantifying expression at the transcript level, summarizing the added value of transcript-level quantification, and exploring sex-based differences in platelet transcript expression.

Methods

In this project, I explored the tool Kallisto for isoform-level quantification of RNA-Seq platelet data from GeneSTAR. The goal was to investigate sex-specific differences in paired-end platelet sequencing data for genes involved in platelet aggregation, a cornerstone mechanism of cardiovascular disease progression. R and standard packages for data manipulation and visualization were used to format, organize, and analyze the raw output generated by Kallisto. Downstream analysis drew on information from previous experiments to develop statistically sound methods for addressing the research question.
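The formatting step described above can be illustrated with a minimal sketch. The project itself used R; Python is used here purely for illustration. The column names follow Kallisto's standard abundance.tsv output (target_id, length, eff_length, est_counts, tpm), but the transcript IDs, gene names, values, and helper functions are all hypothetical and are not taken from the GeneSTAR data.

```python
import csv
import io
from collections import defaultdict

# Toy stand-in for one sample's abundance.tsv (all values are made up).
ROWS = [
    ["target_id", "length", "eff_length", "est_counts", "tpm"],
    ["tx1", "1500", "1320.5", "120.0", "35.2"],
    ["tx2", "900", "720.5", "30.0", "16.1"],
    ["tx3", "2100", "1920.5", "0.0", "0.0"],
]
ABUNDANCE_TSV = "\n".join("\t".join(row) for row in ROWS) + "\n"

# Hypothetical transcript-to-gene map; in practice this would come from
# the annotation used to build the Kallisto index.
TX2GENE = {"tx1": "GENE_A", "tx2": "GENE_A", "tx3": "GENE_B"}


def read_abundance(handle):
    """Return {transcript_id: TPM} from a tab-separated abundance table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["target_id"]: float(row["tpm"]) for row in reader}


def transcripts_per_gene(tpm_by_transcript, tx2gene):
    """Group quantified transcript IDs under their parent gene."""
    grouped = defaultdict(list)
    for transcript in tpm_by_transcript:
        gene = tx2gene.get(transcript)
        if gene is not None:
            grouped[gene].append(transcript)
    return dict(grouped)


def fraction_multi_transcript(genes_of_interest, grouped):
    """Share of the listed genes that have more than one transcript."""
    if not genes_of_interest:
        return 0.0
    multi = sum(1 for g in genes_of_interest if len(grouped.get(g, [])) > 1)
    return multi / len(genes_of_interest)


tpm = read_abundance(io.StringIO(ABUNDANCE_TSV))
grouped = transcripts_per_gene(tpm, TX2GENE)
share = fraction_multi_transcript(["GENE_A", "GENE_B"], grouped)
print(grouped, share)
```

With real data, the in-memory table would be replaced by each sample's abundance.tsv file, and the gene list by the previously identified platelet aggregation genes.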

Results

Kallisto performed transcript quantification quickly and effectively, which is particularly useful for analyzing isoform-level differences in paired-end sequencing data. It also enables analysis of previously identified genes of interest, helping to interpret variability in transcript data across numerous samples with respect to metadata factors such as sex. Overall, we found that 94.36% of previously identified genes associated with platelet aggregation had more than one corresponding transcript. We also found 50 transcripts, belonging to 39 unique genes, to be differentially expressed by sex at an FDR-adjusted q-value below 0.05.
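The FDR thresholding reported above can be sketched as follows. The abstract does not say which FDR procedure was used, so the Benjamini-Hochberg adjustment here is an assumption, and the p-values, transcript IDs, and gene names are invented for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay
    # monotone (each is the minimum of its own and all larger ranks).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-transcript p-values from a sex-difference test.
transcripts = ["tx1", "tx2", "tx3", "tx4"]
genes = {"tx1": "GENE_A", "tx2": "GENE_A", "tx3": "GENE_B", "tx4": "GENE_C"}
p_values = [0.001, 0.02, 0.04, 0.5]

q_values = benjamini_hochberg(p_values)
significant = [t for t, q in zip(transcripts, q_values) if q < 0.05]
unique_genes = {genes[t] for t in significant}
print(len(significant), "transcripts in", len(unique_genes), "genes")
```

Counting unique parent genes among the significant transcripts mirrors how a transcript-level result (50 transcripts) can be summarized at the gene level (39 genes).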

Conclusions/Future Directions

For research on disease phenotypes using paired-end sequencing data, Kallisto is a valuable quantification tool: variability among transcript quantities is easier to examine than in traditional workflows, which rely on alignment (a computationally expensive step in RNA-Seq data analysis) and on gene-level quantification. Further exploration of previously identified eQTLs within the transcript data, together with the integration of platelet proteomic data, may be a valuable complement for gaining a more complete understanding of platelet aggregation.

Tagged: Summer 2022