Internship Presentations

In-silico Purification Benchmarking Analysis and Automated Framework Development

Rujuta Shinde

Mentors: Rebecca Fuchs and Sean Cho, Arcus Biosciences.

Date/Time: August 23rd, 2024 at 3:00pm.

Abstract: Arcus Biosciences is a biopharmaceutical company dedicated to developing innovative immunotherapies. Its research often analyzes tumor samples using bulk transcriptomic data to uncover gene expression patterns that can provide insights into the tumor microenvironment. Bulk RNA sequencing captures signal from the many cell types in a sample, but because it measures their combined expression, the identity and state of the individual cellular components are obscured. Single-cell RNA sequencing offers a more detailed perspective, but its high cost limits its accessibility in research. Additionally, thousands of publicly available bulk RNA-seq datasets could be further mined for insights using novel computational techniques.

Deconvolution tools use algorithms to infer cellular characteristics, such as cell type composition, from bulk RNA-seq data. Several publicly available tools have been benchmarked for their accuracy in estimating cell type proportions from tumor samples. Some tools also offer in-silico purification, which generates a gene expression profile for each identified cell type. However, in-silico purification capabilities are relatively new and offered by few tools, and we have not found an unbiased benchmarking publication comparing them.
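Reference-based deconvolution is often framed as a linear mixture: a bulk expression vector b is modeled as S·w, where S is a genes × cell types signature matrix of reference expression and w holds non-negative cell type weights. The sketch below illustrates only that generic idea, solving a non-negative least-squares fit by projected gradient descent; the benchmarked tools use their own, generally more elaborate, statistical models, and in-silico purification goes further by also estimating each cell type's expression in each sample. The function name `deconvolve` and the toy matrix are hypothetical.

```python
def deconvolve(signature, bulk, n_iter=2000):
    """Estimate cell type proportions from one bulk profile (illustrative sketch).

    signature: genes x cell types reference matrix (list of rows).
    bulk: expression of the same genes in the bulk sample.
    Solves min ||S w - b||^2 subject to w >= 0 by projected gradient
    descent, then rescales w to sum to 1.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # 1 / ||S||_F^2 never exceeds 1 / (largest eigenvalue of S^T S),
    # so it is a safe step size: the objective does not increase.
    step = 1.0 / sum(x * x for row in signature for x in row)
    w = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [
            sum(signature[g][k] * w[k] for k in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        # Gradient of 0.5 * ||S w - b||^2 is S^T (S w - b).
        grad = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step, then project onto the non-negative orthant.
        w = [max(0.0, w[k] - step * grad[k]) for k in range(n_types)]
    total = sum(w)
    return [x / total for x in w] if total > 0 else w


# Toy example: 3 genes, 2 cell types; the bulk sample is 30% type A, 70% type B.
signature = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
bulk = [3.0, 7.0, 5.0]
proportions = deconvolve(signature, bulk)
```

On this noise-free toy input the fit recovers the mixing fractions; real bulk data fit the linear model only approximately.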

To integrate in-silico purification data into Arcus’s research, we aimed to evaluate the accuracy of in-silico purified expression profiles, identify the best available tool, and establish a framework for testing new tools in the future.

First, we developed a comprehensive benchmarking and ad hoc analysis framework named ‘rushdeconv.’ This command-line framework lets users systematically compare the performance of different deconvolution tools on standardized datasets, or run individual tools on new datasets for exploratory research and specific project needs. The benchmarked tools include BayesPrism, Blade, CDSeq, xCell, and quantiseq; of these, BayesPrism, Blade, and CDSeq offer in-silico purification, outputting a gene expression profile for each cell type alongside cell type proportions.
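The internals of rushdeconv are not described here. As a hypothetical sketch of one way such a framework could give heterogeneous tools a common interface, each tool can be wrapped as a function returning cell type proportions and, only for tools with in-silico purification, per-cell-type expression. All names below (`DeconvResult`, `register`, `run_tools`, `uniform_baseline`) are illustrative assumptions, not rushdeconv's API, and a real wrapper would call the underlying tool instead of returning fixed fractions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical input shape: sample -> gene -> expression value.
Bulk = Dict[str, Dict[str, float]]


@dataclass
class DeconvResult:
    """Common output shape for every wrapped tool (hypothetical)."""
    # sample -> cell type -> estimated fraction
    proportions: Dict[str, Dict[str, float]]
    # sample -> cell type -> gene -> purified expression; None for tools
    # that only estimate proportions.
    expression: Optional[Dict[str, Dict[str, Dict[str, float]]]] = None

    @property
    def purified(self) -> bool:
        return self.expression is not None


ToolFn = Callable[[Bulk, List[str]], DeconvResult]
REGISTRY: Dict[str, ToolFn] = {}


def register(name: str) -> Callable[[ToolFn], ToolFn]:
    """Decorator that adds a tool wrapper to the registry under `name`."""
    def wrap(fn: ToolFn) -> ToolFn:
        REGISTRY[name] = fn
        return fn
    return wrap


@register("uniform_baseline")
def uniform_baseline(bulk: Bulk, cell_types: List[str]) -> DeconvResult:
    # Toy stand-in for a real wrapper: every cell type gets an equal fraction.
    frac = 1.0 / len(cell_types)
    return DeconvResult({s: {ct: frac for ct in cell_types} for s in bulk})


def run_tools(bulk: Bulk, cell_types: List[str],
              names: Optional[List[str]] = None) -> Dict[str, DeconvResult]:
    """Run the selected tools (default: all registered) on the same input."""
    names = names or sorted(REGISTRY)
    return {name: REGISTRY[name](bulk, cell_types) for name in names}


bulk = {"sample1": {"CD3E": 12.0, "MS4A1": 3.0}}
results = run_tools(bulk, ["T cell", "B cell"])
```

With this design, adding a tool to the comparison means writing one wrapper and registering it; downstream benchmarking code that consumes `run_tools` output, and checks `purified` before scoring expression profiles, does not change.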

For benchmarking, five publicly available scRNA-seq datasets were aggregated into pseudobulk samples, for which the true cell type proportions and per-cell-type expression profiles are known from the single-cell annotations. A correlation analysis between these ground-truth values and each tool's estimates was used to assess performance on both cell type proportions and gene expression profiles. BayesPrism demonstrated the most accurate in-silico purification, with a user-friendly interface and fast runtime. The results also indicated that the accuracy of these tools is strongly influenced by the dataset used.
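The pseudobulk benchmarking step can be sketched in a few lines, assuming annotated single cells given as (sample, cell type, gene counts) records: summing counts per sample yields the pseudobulk, while cell counts and per-type sums yield the ground truth. The abstract does not specify the correlation measure, so Pearson correlation over all pooled sample and cell type pairs is an assumption here (Spearman, or per-cell-type correlation, are common alternatives). The function names and the estimated proportions in the usage lines are hypothetical, invented for illustration, and not results from any tool.

```python
from collections import Counter, defaultdict
from math import sqrt


def make_pseudobulk(cells):
    """cells: iterable of (sample, cell_type, {gene: count}) records.

    Returns pseudobulk counts (sample -> gene), true proportions
    (sample -> cell type -> fraction of cells), and true per-cell-type
    profiles (sample -> cell type -> gene -> summed counts).
    """
    bulk = defaultdict(lambda: defaultdict(float))
    type_counts = defaultdict(Counter)
    profiles = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    for sample, cell_type, counts in cells:
        type_counts[sample][cell_type] += 1
        for gene, value in counts.items():
            bulk[sample][gene] += value
            profiles[sample][cell_type][gene] += value
    proportions = {
        s: {ct: n / sum(c.values()) for ct, n in c.items()}
        for s, c in type_counts.items()
    }
    return bulk, proportions, profiles


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


cells = [
    ("s1", "T", {"CD3E": 5, "MS4A1": 0}),
    ("s1", "T", {"CD3E": 4, "MS4A1": 1}),
    ("s1", "B", {"CD3E": 0, "MS4A1": 6}),
    ("s2", "T", {"CD3E": 3, "MS4A1": 0}),
    ("s2", "B", {"CD3E": 1, "MS4A1": 7}),
    ("s2", "B", {"CD3E": 0, "MS4A1": 5}),
]
bulk, truth, profiles = make_pseudobulk(cells)

# Hypothetical tool output, invented for illustration only.
estimated = {"s1": {"T": 0.6, "B": 0.4}, "s2": {"T": 0.3, "B": 0.7}}
keys = [(s, ct) for s in sorted(truth) for ct in sorted(truth[s])]
r = pearson([truth[s][ct] for s, ct in keys],
            [estimated[s][ct] for s, ct in keys])
```

The same `pearson` helper can compare a purified expression profile with the matching entry in `profiles`; because Pearson correlation is scale-invariant, summing rather than averaging counts per cell type does not change that score.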
