Recent advancements in RNA-sequencing (RNA-Seq) have led to impactful
technological breakthroughs in RNA-Seq quantification at high
resolution, including single-cell RNA-Seq and spatial transcriptomics.
However, bulk RNA-Seq (RNA-Seq quantification in which gene expression
is averaged across all cell types in a tissue collected without spatial
resolution) continues to play an important role in transcriptomics for
several reasons, including but not limited to:
High volume of publicly available datasets (a good starting point
for initial hypothesis testing)
Cost-effectiveness
It has been well standardized over the last decade, so its molecular
and analysis methods are robust for the scope of the technique.
Bulk RNA-Seq as a foundational block to transcriptomics
analysis
For data scientists seeking to analyze data from the latest
transcriptomics approaches (e.g., single-cell or spatial), bulk RNA-Seq
analysis methods are foundational building blocks. For example,
familiarity with bulk RNA-Seq methods is critical for the efficient
analysis of cell-type-specific pseudobulk data derived from single-cell
experiments. Furthermore, many enrichment analysis methods and
visualization approaches are shared with or built on classical methods
from bulk RNA-Seq analysis. Thus, familiarity with bulk transcriptomics
is a critical skill for any transcriptomics analysis.
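To illustrate how pseudobulk data relate to bulk data, the minimal sketch below sums raw counts over all cells that share a sample and cell type, giving one bulk-like expression profile per group. The toy records, the gene names, and the `pseudobulk` helper are hypothetical and exist only for illustration; real analyses typically rely on dedicated single-cell tooling rather than plain Python.

```python
from collections import defaultdict

# Hypothetical toy data: one record per cell, with raw counts per gene.
cells = [
    {"sample": "S1", "cell_type": "T cell", "counts": {"GeneA": 3, "GeneB": 0}},
    {"sample": "S1", "cell_type": "T cell", "counts": {"GeneA": 1, "GeneB": 2}},
    {"sample": "S1", "cell_type": "B cell", "counts": {"GeneA": 0, "GeneB": 5}},
    {"sample": "S2", "cell_type": "T cell", "counts": {"GeneA": 4, "GeneB": 1}},
]


def pseudobulk(cells):
    """Sum raw counts over all cells sharing a (sample, cell type) pair."""
    totals = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        key = (cell["sample"], cell["cell_type"])
        for gene, count in cell["counts"].items():
            totals[key][gene] += count
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(genes) for key, genes in totals.items()}


profiles = pseudobulk(cells)
for (sample, cell_type), genes in sorted(profiles.items()):
    print(sample, cell_type, genes)
```

Each resulting profile can then be treated much like a bulk RNA-Seq sample, which is why the bulk methods carry over to this kind of data.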
Prerequisites
Workshop participants are strongly encouraged to have introductory-level
familiarity with Unix and R.
Software Carpentry provides excellent introductory-level material,
which can be found at the following links:
If you have already attended one of the previous U-BDS workshops on
R and Unix programming, then you should meet the prerequisites. If
you have not, we strongly encourage you to go through the materials
at the links above before attending.
Scope of the workshop
This workshop will cover the foundational topics and methods of bulk
RNA-Seq analysis, including:
Introduction to raw data management and data exploration
Secondary analysis with nf-core/rnaseq
Tertiary analysis topics: quality control, data normalization and
differential gene expression analysis with DESeq2, gene annotation,
and gene enrichment analysis (gene ontology and gene set enrichment
analysis)
Fundamentals of data visualization for transcriptomics
Authors
Austyn Trull
Bharat Mishra, PhD
Lara Ianov, PhD
A.T., B.M. and L.I. designed workshop content and initial outline.
A.T. contributed to installation instructions, data management, and
secondary analysis materials. B.M. contributed to Docker container
creation and tertiary analysis materials (including DEG analysis, GSEA,
GO, visualization, etc.). L.I. supervised all work, reviewed all content,
and managed website design and deployment.
Nilesh Kumar, PhD and Luke Potter, PhD also provided additional edits
and revisions to source material.
Additional credits
In addition to U-BDS’s best practices and code written by U-BDS,
sections of the teaching material for this workshop (parts of the
tertiary analysis) contain materials that have been adapted or modified
from the following sources (we thank the curators and maintainers of all
of these resources for their wonderful contributions in compiling best
practices and easy-to-follow training guides for beginners):
We would also like to thank the following groups for support:
UAB’s Research Computing (HPC resources and workshop logistics with
resources)
nf-core community (rnaseq pipeline)
We would also like to thank the authors of the dataset that we use
in our workshop:
Koch CM, Chiu SF, Akbarpour M, Bharat A, Ridge KM, Bartom ET, Winter
DR. A Beginner’s Guide to Analysis of RNA Sequencing Data. Am J Respir
Cell Mol Biol. 2018 Aug;59(2):145-157. doi: 10.1165/rcmb.2017-0430TR.
PMID: 29624415; PMCID: PMC6096346.
Lastly, we would also like to thank Kristen Coutinho and the UAB
Informatics Club for the dataset suggestion and for preliminary
discussions about this workshop.