In addition to U-BDS’s best practices and code written by U-BDS, sections of the teaching material for this workshop (especially tertiary analysis), contains materials which have been adapted or modified from the following sources (we thank the curators and maintainers of all of these resources for their wonderful contributions, compiling the best practices, and easy to follow training guides for beginners):
Tertiary analysis can be long and complex and is heavily dependent on
study design. This workshop focuses on standard tertiary analysis tasks
which are split across 4 parts covering: quality and control, data
normalization and differential gene expression analysis with
DESeq2
, gene annotation, gene enrichment analysis
(gene-ontology and gene set enrichment analysis), and the fundamentals
of data visualization for transcriptomics.
# Set the seed so our results are reproducible:
set.seed(2020)
# Required packages
library(tximport)
library(DESeq2)
library(Glimma)
library(vsn)
# Mouse annotation package we'll use for gene identifier conversion
library(biomaRt)
# We will need them for data handling
library(magrittr)
library(ggrepel)
library(dplyr)
library(tidyverse)
library(readr)
# plotting
library(ggplot2)
library(ComplexHeatmap)
library(RColorBrewer)
data
and results
directorydir.create("./data", recursive = TRUE)
dir.create("./results", recursive = TRUE)
DESeqDataSet
In the DESeq2
package, the core object for storing read
counts and intermediate calculations during RNA-Seq analysis is the
DESeqDataSet
. This object, often referred to as dds in code
examples, is derived from the RangedSummarizedExperiment
class within the SummarizedExperiment
package.
The “Ranged” aspect of DESeqDataSet
signifies that each
row of read count data can be linked to specific genomic regions, such
as gene exons. This connection allows for seamless integration with
other Bioconductor tools, enabling tasks like identifying
ChIP-seq
peaks near differentially expressed genes.
Crucially, a DESeqDataSet
object requires a
design
formula, which defines the variables considered in
the statistical model. This formula typically consists of a tilde
(~
) followed by variables separated by plus signs. While
the design can be modified after object creation, any changes
necessitate repeating the upstream analytical steps of the DESeq2
pipeline, as changes in the design
influences the
dispersion estimates and log2 fold change calculations.
We can import input data by 4 ways to construct a
DESeqDataSet
, depending on what pipeline was used upstream
of DESeq2 to generated counts or estimated counts:
. From transcript abundance
files and
tximport
. From a count matrix
. From htseq-count
files
. From a SummarizedExperiment
object
Given that we have implemented salmon
quantification in
secondary analysis, the recommended approach for tertiary analysis is to
import the salmon
transcript quantification to
DESeq2
, and then quantify gene-level data in
DESeq2
using tximport
. This same approach can
also be applied to other transcript abundance quantifiers such as
kallisto
Employing salmon
or kallisto
for transcript
abundance estimation offers several key benefits:
Correction for gene length variations: These methods account for potential changes in gene length across samples, which can arise from differential isoform usage.
Efficiency: As discussed in the secondary analysis portion,
salmon
and kallisto
are notably faster and
require less computational resources compared to alignment-based
approaches that involve generating and storing BAM files. This is a
significant advantage for large-scale studies or when analyzing datasets
with limited resources.
Increased sensitivity: Unlike traditional methods that discard ambiguously aligning fragments, these tools can leverage information from reads aligning to multiple homologous genes. This increases the sensitivity of transcript quantification, particularly for genes with high sequence similarity.
With that being said and as previously discussed in secondary analysis, there can be study design needs where standard splice aware aligners may be more beneficial or complimentary.
Note : For this workshop, we will import the transcript
abundance (quant.sf
) file created by salmon
in
the secondary analysis pipeline (nf-core
) using
tximport
method. The files are provided to you in the
/data
folder
Note: For simplicity and organization, we have renamed the
.sf
files by ${sample_names}_quant.sf
. Please
practice caution while updating the file names in your own research
projects to avoid sample mix-up (tip: changing file names does not alter
the md5sum of the file. Thus, this checker can be applied to ensure
there has been no sample mix-ups.
Locate all the transcript abundance file and prepare them to be
imported to DESeq2
.
tx2gene <- read.table("./data/tx2gene.tsv", sep = '\t', header = FALSE)
head(tx2gene)
## V1 V2 V3
## 1 ENSMUST00000193812.2 ENSMUSG00000102693.2 4933401J01Rik
## 2 ENSMUST00000082908.3 ENSMUSG00000064842.3 Gm26206
## 3 ENSMUST00000162897.2 ENSMUSG00000051951.6 Xkr4
## 4 ENSMUST00000159265.2 ENSMUSG00000051951.6 Xkr4
## 5 ENSMUST00000070533.5 ENSMUSG00000051951.6 Xkr4
## 6 ENSMUST00000192857.2 ENSMUSG00000102851.2 Gm18956
# Importing quant.sf file from secondary outputs within data:
myFiles <- dir("./data", ".sf$", full.names = TRUE)
myFiles
## [1] "./data/N01_AM_Naive_quant.sf" "./data/N02_AM_Naive_quant.sf" "./data/N03_AM_Naive_quant.sf" "./data/N04_AM_Naive_quant.sf"
## [5] "./data/R05_AM_Allo24h_quant.sf" "./data/R06_AM_Allo24h_quant.sf" "./data/R07_AM_Allo24h_quant.sf" "./data/R08_AM_Allo24h_quant.sf"
# Adding names for columns:
myFiles_names <- c()
for (i in myFiles) {
result <- gsub("_quant.sf","",i)
result <- basename(result)
myFiles_names[result] <- i
}
all(file.exists(myFiles_names))
## [1] TRUE
# Making a log of the col names from full names:
Log <- as.matrix(myFiles_names)
write.table(Log, file = "./results/Sample_names_tximport.txt",
quote = FALSE, col.names = FALSE, sep = "\t")
tximport
We will now import the transcript-level abundances and quantify them into gene-level counts.
Note that we explicitly set the ignoreTxVersion
parameter to FALSE
despite it being the default value. This
serves as a reminder to ensure that the parameter has been set correctly
according to the reference source. In this canse, we set it to
FALSE
given that we used a GENCODE reference, which
contains versions associated to each feature which can be seen in the
tx2gene
object.
# First, let's take a peak at the available parameters for the tximport functions
?tximport
# with sensible parameters set, run tximport:
txi <- tximport(myFiles_names,
type = "salmon",
tx2gene = tx2gene,
txOut = FALSE,
ignoreTxVersion=FALSE)
## reading in files with read_tsv
## 1 2 3 4 5 6 7 8
## summarizing abundance
## summarizing counts
## summarizing length
names(txi)
## [1] "abundance" "counts" "length" "countsFromAbundance"
head(txi$counts, 5)
## N01_AM_Naive N02_AM_Naive N03_AM_Naive N04_AM_Naive R05_AM_Allo24h R06_AM_Allo24h R07_AM_Allo24h R08_AM_Allo24h
## ENSMUSG00000000001.5 3831 3765 3395 2841 5403.000 4993 4155.000 4122
## ENSMUSG00000000003.16 0 0 0 0 0.000 0 0.000 0
## ENSMUSG00000000028.16 180 192 173 90 1326.001 1378 1093.999 861
## ENSMUSG00000000031.18 0 0 0 0 0.000 0 0.000 0
## ENSMUSG00000000037.18 4 3 9 3 12.000 11 7.000 10
head(txi$abundance, 5)
## N01_AM_Naive N02_AM_Naive N03_AM_Naive N04_AM_Naive R05_AM_Allo24h R06_AM_Allo24h R07_AM_Allo24h R08_AM_Allo24h
## ENSMUSG00000000001.5 98.216401 106.299723 99.267634 95.303118 95.846128 94.673074 92.201307 89.722317
## ENSMUSG00000000003.16 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
## ENSMUSG00000000028.16 6.839288 8.727377 8.015467 4.422736 34.996297 38.631371 36.942847 29.781577
## ENSMUSG00000000031.18 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
## ENSMUSG00000000037.18 0.647132 0.120524 0.343255 0.192976 0.125222 0.149528 0.176068 0.305835
A sample metadata file should contain all relevant metadata known for the samples in the study. This typically includes at a minimum the experimental grouping. However, it should also include additional factors when they are a part of the study (e.g.: sex, age, batch, time-points etc.)
For this particular study, the sample metadata was acquired from the public repository containing the raw data:
# import file containing sample metadata
colData <- read.csv("./data/ColData.csv", header=TRUE, row.names=1)
colData
## Condition Group Time Genotype Sex Age
## N01_AM_Naive Naive Naive Naive Cx3cr1gfp/+B6 Male 14W
## N02_AM_Naive Naive Naive Naive Cx3cr1gfp/+B6 Male 14W
## N03_AM_Naive Naive Naive Naive Cx3cr1gfp/+B6 Male 14W
## N04_AM_Naive Naive Naive Naive Cx3cr1gfp/+B6 Male 14W
## R05_AM_Allo24h Transplant Transplant_24hr 24hr Cx3cr1gfp/+B6 Male 14W
## R06_AM_Allo24h Transplant Transplant_24hr 24hr Cx3cr1gfp/+B6 Male 14W
## R07_AM_Allo24h Transplant Transplant_24hr 24hr Cx3cr1gfp/+B6 Male 14W
## R08_AM_Allo24h Transplant Transplant_24hr 24hr Cx3cr1gfp/+B6 Male 14W
str(colData)
## 'data.frame': 8 obs. of 6 variables:
## $ Condition: chr "Naive" "Naive" "Naive" "Naive" ...
## $ Group : chr "Naive" "Naive" "Naive" "Naive" ...
## $ Time : chr "Naive" "Naive" "Naive" "Naive" ...
## $ Genotype : chr "Cx3cr1gfp/+B6" "Cx3cr1gfp/+B6" "Cx3cr1gfp/+B6" "Cx3cr1gfp/+B6" ...
## $ Sex : chr "Male" "Male" "Male" "Male" ...
## $ Age : chr "14W" "14W" "14W" "14W" ...
# change the condition, group, and time to factor
colData$Condition <- as.factor(colData$Condition)
colData$Group <- as.factor(colData$Group)
colData$Time <- as.factor(colData$Time)
colData
## Condition Group Time Genotype Sex Age
## N01_AM_Naive Naive Naive Naive Cx3cr1gfp/+B6 Male 14W
## N02_AM_Naive Naive Naive Naive Cx3cr1gfp/+B6 Male 14W
## N03_AM_Naive Naive Naive Naive Cx3cr1gfp/+B6 Male 14W
## N04_AM_Naive Naive Naive Naive Cx3cr1gfp/+B6 Male 14W
## R05_AM_Allo24h Transplant Transplant_24hr 24hr Cx3cr1gfp/+B6 Male 14W
## R06_AM_Allo24h Transplant Transplant_24hr 24hr Cx3cr1gfp/+B6 Male 14W
## R07_AM_Allo24h Transplant Transplant_24hr 24hr Cx3cr1gfp/+B6 Male 14W
## R08_AM_Allo24h Transplant Transplant_24hr 24hr Cx3cr1gfp/+B6 Male 14W
The DESeqDataSet
class, derived from
RangedSummarizedExperiment
, serves as the central data
container in DESeq2
for storing input data, intermediate
calculations, and differential expression analysis results. It enforces
the use of non-negative integer values in the counts matrix, the first
element in its assay list. Moreover, a design
formula
defining the experimental setup is mandatory.
Constructor functions facilitate the creation of
DESeqDataSet
objects from diverse sources:
. DESeqDataSet
: Accepts a
RangedSummarizedExperiment
object.
. DESeqDataSetFromMatrix
: Constructs from a matrix of
counts
.
. DESeqDataSetFromHTSeqCount
: Creates from
HTSeq
count files generated by the Python package.
. DESeqDataSetFromTximport
: Builds from a list object
returned by the tximport
function.
The design
parameter plays a pivotal role in modeling
samples based on the experimental design. In our workshop dataset, where
only the condition varies, a simple ~ Condition
model
suffices.
However, DESeq2
accommodates complex designs involving
batch correction, interactions, and time-series analysis. Refer to the
“additional_resources” section of this workshop and the
DESeq2
vignette for more elaborate design examples.
# tximport dds generation
dds <- DESeqDataSetFromTximport(txi,
colData = colData,
design = ~ Condition) # update the design as needed
## using counts and average transcript lengths from tximport
# dds
head(counts(dds), 5)
## N01_AM_Naive N02_AM_Naive N03_AM_Naive N04_AM_Naive R05_AM_Allo24h R06_AM_Allo24h R07_AM_Allo24h R08_AM_Allo24h
## ENSMUSG00000000001.5 3831 3765 3395 2841 5403 4993 4155 4122
## ENSMUSG00000000003.16 0 0 0 0 0 0 0 0
## ENSMUSG00000000028.16 180 192 173 90 1326 1378 1094 861
## ENSMUSG00000000031.18 0 0 0 0 0 0 0 0
## ENSMUSG00000000037.18 4 3 9 3 12 11 7 10
By default, R
automatically assigns a reference level to
factors based on alphabetical order. If you don’t specify a different
reference level (e.g., indicating the control
group) when
using DESeq2
functions, comparisons will be made based on
this alphabetical order.
To address this, you have two options:
Explicitly define the comparison: Use the contrast argument in the results function to specify the exact comparison you want to make. This overrides the default reference level.
Change the factor levels: Explicitly set the factor levels to
determine the reference level. This change will be reflected in the
results names after running DESeq
or
nbinomWaldTest
/nbinomLRT
.
Below we demonstrate option #2:
# see current levels:
dds$Condition
## [1] Naive Naive Naive Naive Transplant Transplant Transplant Transplant
## Levels: Naive Transplant
In the output above, we can see that Naive
is already
set as the reference level by alphabetical order. However, it is still
best practice to include the code chunk in your analysis and not rely on
alphabetical order as it can easily change across experiments.
# set reference to Naive
dds$Condition <- relevel(dds$Condition, ref = "Naive")
dds$Condition
## [1] Naive Naive Naive Naive Transplant Transplant Transplant Transplant
## Levels: Naive Transplant
Exploratory analysis is an essential step in RNA-Seq
data analysis for quality control and understanding the underlying
patterns. It can reveal issues like quality problems, sample swaps, or
contamination, while also highlighting the most prominent trends in the
dataset.
In this section, we’ll delve into two common approaches for
exploratory analysis of RNA-Seq data: clustering
and
principal component analysis (PCA)
. These methods aren’t
exclusive to RNA-Seq but are widely applicable. However, certain aspects
of count data require specific considerations when applying these
techniques.
Before proceeding, it’s important to establish a threshold for
gene expression
detectability. A simple criterion we’ll use
here is to consider a gene as non-detectable (or extremely low
abundance) if its total count across all samples doesn’t exceed 5. This
ensures we focus on genes with sufficient data for meaningful
analysis.
#----- Counts Pre-filtering based on rowMeans -------
message(paste0("Number of genes before pre-filtering: ", nrow(counts(dds))))
## Number of genes before pre-filtering: 55891
# here we do rowMeans, other approaches are rowSums or min. per sample
keep <- rowMeans(counts(dds)) >= 5
dds <- dds[keep,]
message(paste0("Number of genes after filtering: ", nrow(counts(dds))))
## Number of genes after filtering: 15486
Note that there has been a notable drop in the number of genes present in this dataset.
Discussion Question
Given the drop from ~55k genes to 15k genes. Is this expected for this species (mouse) and tissue type (lung-derived cells)? How many genes are typically expressed in mammalian genomes?
Challenge: Filter the dds object based to include genes with 5+ reads in at least half the samples
test_dds <- DESeqDataSetFromTximport(txi,
colData = colData,
design = ~ Condition)
## using counts and average transcript lengths from tximport
keep <- rowSums(counts(test_dds) >= 5) >= 4
test_dds <- test_dds[keep,]
nrow(counts(test_dds))
## [1] 15688
Library size
refers to the total number of reads
assigned to genes for a given sample. Comparing raw
read counts
directly between samples with different library
sizes can lead to incorrect conclusions about differential gene
expression.
Normalization
by library size adjusts the read counts to
make them comparable across samples, removing the technical bias
introduced by varying sequencing depths. Before proceeding with
downstream analysis, it is crucial to compare the library sizes across
all samples to identify potential outliers or samples with significantly
different sequencing depths.
# Add in the sum of all counts
dds$libSize <- colSums(counts(dds))
# Plot the libSize by using pipe %>%
# to extract the colData, turn it into a regular
# data frame then send to ggplot:
libsize_plot <- colData(dds) %>%
as.data.frame() %>%
ggplot(aes(x = rownames(colData), y = libSize / 1e6, fill = Condition)) +
geom_bar(stat = "identity") + theme_bw() +
labs(x = "Sample", y = "Total count in millions") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
# let's take a look at the plot:
libsize_plot
# reminder on how to save figures:
png("./results/libsize_plot.png")
plot(libsize_plot)
dev.off()
## png
## 2
Based on the figure above, we can see that there are differences in the raw counts across the samples. This is expected, and in the later sections the data will be normalized to account for this technical artifact.
The DESeq2
package provides two main approaches for
transforming RNA-Seq count data: the variance stabilizing transformation
(vst
) and the regularized log transformation
(rlog
). Both methods aim to produce transformed data on the
log2 scale that is normalized for library size or other normalization
factors.
. Variance Stabilizing Transformation (vst
) The
vst
is based on the concept of transforming the data such
that the variance becomes independent of the mean expression level. This
approach is useful because RNA-Seq data often exhibits higher variance
for lowly expressed genes compared to highly expressed genes.
. Regularized Log Transformation (rlog) The rlog
is an
alternative approach that transforms the count data to the
log2
scale while minimizing differences between samples for
rows (genes) with small counts. It incorporates a prior on the sample
differences, which acts as a regularization or shrinkage step.
Both vst
and rlog
produce transformed data
on the log2
scale, normalized for library size
or other factors. The choice between the two transformations may depend
on the specific characteristics of the dataset, such as the range of
library sizes or the presence of lowly expressed genes. Both methods
contrast with traditional log2
transformation (even when
adjusted for library size by normTransform
) in that they
reduce the high variance inherent in lowly expressed genes.
The vst
and rlog
functions in
DESeq2
have an argument blind that determines whether the
transformation should be blind to the sample information specified by
the design formula. When blind
is set to TRUE
(the default), the functions will re-estimate the dispersion using only
an intercept, ensuring that the transformation is unbiased by any
information about the experimental groups.
If blind
is set to FALSE
, the functions
will use the already estimated dispersion to perform the
transformations. If dispersion are not already estimated, they will be
calculated using the current design
formula. It is
important to note that even when blind
is set to
FALSE
, the transformation primarily uses the fitted
dispersion estimates from the mean-dispersion trend line, which reflects
the global dependence of dispersion on the mean for the entire
experiment. Therefore, setting blind
to FALSE
still largely avoids using specific information about which samples
belong to which experimental groups during the transformation
# Raw counts
meanSdPlot(assay(dds), ranks = FALSE)
# Traditional log2(n + 1) transformation
ntd <- normTransform(dds)
## using 'avgTxLength' from assays(dds), correcting for library size
ntd
## class: DESeqTransform
## dim: 15486 8
## metadata(1): version
## assays(1): ''
## rownames(15486): ENSMUSG00000000001.5 ENSMUSG00000000028.16 ... ENSMUSG00002076763.1 ENSMUSG00002076916.1
## rowData names(0):
## colnames(8): N01_AM_Naive N02_AM_Naive ... R07_AM_Allo24h R08_AM_Allo24h
## colData names(7): Condition Group ... Age libSize
meanSdPlot(assay(ntd))
Variance decreases as the mean increases.
# Variance stabilized transformation
vsd <- vst(dds, blind=FALSE)
## using 'avgTxLength' from assays(dds), correcting for library size
vsd
## class: DESeqTransform
## dim: 15486 8
## metadata(1): version
## assays(1): ''
## rownames(15486): ENSMUSG00000000001.5 ENSMUSG00000000028.16 ... ENSMUSG00002076763.1 ENSMUSG00002076916.1
## rowData names(4): baseMean baseVar allZero dispFit
## colnames(8): N01_AM_Naive N02_AM_Naive ... R07_AM_Allo24h R08_AM_Allo24h
## colData names(7): Condition Group ... Age libSize
meanSdPlot(assay(vsd))
Variance is more consistent across all genes.
Principal component analysis (PCA) is a dimensionality reduction technique that projects samples into a lower-dimensional space. This reduced representation can be used for visualization or as input for other analytical methods. PCA is unsupervised, meaning it does not incorporate external information about the samples, such as treatment conditions.
In the plot below, we represent the samples in a two-dimensional principal component space. For each dimension, we indicate the fraction of the total variance represented by that component. By definition, the first principal component (PC1) always captures more variance than subsequent components. The fraction of explained variance measures how much of the ‘signal’ in the data is retained when projecting samples from the original high-dimensional space to the low-dimensional space for visualization.
Principal component (PC) analysis plot displaying our 8 samples along PC1 and PC2, indicates that there ~69% of variance is explained in PC1 and that ~7% is explained in PC2.
pcaData <- DESeq2::plotPCA(vsd, intgroup = "Condition",
returnData = TRUE, ntop = length(vsd))
## using ntop=15486 top features by variance
percentVar <- round(100 * attr(pcaData, "percentVar"))
ggplot(pcaData, aes(x = PC1, y = PC2)) +
geom_point(aes(color = Condition), size = 5) +
xlab(paste0("PC1: ", percentVar[1], "% variance")) +
ylab(paste0("PC2: ", percentVar[2], "% variance")) +
coord_fixed() +
theme(text = element_text(size=20)) +
theme_bw(base_size = 16)
Discussion Question
What is your interpratation from the PCA plot above. Does it overall display variance that can be biologically relevant?
Challenge: Add names of samples on PCA plot
ggplot(pcaData, aes(x = PC1, y = PC2)) +
geom_point(aes(color = Condition), size = 5) +
coord_fixed() +
theme_minimal() +
xlab(paste0("PC1: ", percentVar[1], "% variance")) +
ylab(paste0("PC2: ", percentVar[2], "% variance")) +
theme(text = element_text(size=20)) +
theme_bw(base_size = 16) +
geom_text_repel(data = pcaData,
mapping = aes(label = name),
size = 3,
fontface = 'bold.italic',
color = 'black',
box.padding = unit(0.2, "lines"),
point.padding = unit(0.2, "lines"))
Euclidean distance
is a measure of the straight-line
distance between two points. In the context of sample clustering, it can
be used to assess the similarity of gene expression patterns between
samples. Longer Euclidean distances
indicate greater
differences in expression.
One straightforward approach to cluster
samples based on
their expression patterns is to calculate the Euclidean distance between
all possible sample pairs. These distances can then be visually
represented using both a branching dendrogram
and a
heatmap
, where color intensity corresponds to the magnitude
of the distance.
From this, we infer that all samples are clustered based on the
Groups
.
# dist computes distance with Euclidean method
sampleDists <- dist(t(assay(vsd)))
colors <- colorRampPalette(brewer.pal(9, "Blues"))(255)
ComplexHeatmap::Heatmap(
as.matrix(sampleDists),
col = colors,
name = "Euclidean\ndistance",
cluster_rows = hclust(sampleDists),
cluster_columns = hclust(sampleDists),
bottom_annotation = columnAnnotation(
condition = vsd$Condition,
time = vsd$Time,
col = list(condition = c(Naive = "blue", Transplant = "brown"),
time = c("Naive" = "yellow", "24hr" = "forestgreen"))
))
Often it is useful to look at interactive plots to directly explore different experimental factors or get insights from someone without coding experience.(particularly useful when there are covariate etc.)
Some useful tools for interactive exploratory data analysis for RNA-Seq are Glimma and iSEE
While we will not cover them at this time, we encourage trainees to explore the resources above in the future.
We will save the analyzed datasets for other parts of this workshop.
saveRDS(dds, file = "./results/dds.rds")
saveRDS(vsd, file = "./results/vsd.rds")
saveRDS(colData, file = "./results/colData.rds")
saveRDS(txi, file = "./results/txi.rds")
R’s sessionInfo()
captures the version of all packages
loaded in the current environment. You may save this to an external file
with the following command:
writeLines(capture.output(sessionInfo()), "./results/sessionInfo.txt")
In this case, we are displaying it as part of our lesson:
sessionInfo()
## R version 4.3.3 (2024-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] grid stats4 stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] RColorBrewer_1.1-3 ComplexHeatmap_2.18.0 lubridate_1.9.3 forcats_1.0.0
## [5] stringr_1.5.1 purrr_1.0.2 readr_2.1.5 tidyr_1.3.1
## [9] tibble_3.2.1 tidyverse_2.0.0 dplyr_1.1.4 ggrepel_0.9.5
## [13] ggplot2_3.5.0 magrittr_2.0.3 biomaRt_2.58.2 vsn_3.70.0
## [17] Glimma_2.12.0 DESeq2_1.42.1 SummarizedExperiment_1.32.0 Biobase_2.62.0
## [21] MatrixGenerics_1.14.0 matrixStats_1.3.0 GenomicRanges_1.54.1 GenomeInfoDb_1.38.8
## [25] IRanges_2.36.0 S4Vectors_0.40.2 BiocGenerics_0.48.1 tximport_1.30.0
##
## loaded via a namespace (and not attached):
## [1] DBI_1.2.2 bitops_1.0-7 rlang_1.1.3 clue_0.3-65 GetoptLong_1.0.5
## [6] compiler_4.3.3 RSQLite_2.3.6 png_0.1-8 vctrs_0.6.5 pkgconfig_2.0.3
## [11] shape_1.4.6.1 crayon_1.5.2 fastmap_1.1.1 dbplyr_2.5.0 XVector_0.42.0
## [16] labeling_0.4.3 fontawesome_0.5.2 utf8_1.2.4 rmarkdown_2.26 tzdb_0.4.0
## [21] preprocessCore_1.64.0 bit_4.0.5 xfun_0.43 zlibbioc_1.48.2 cachem_1.0.8
## [26] jsonlite_1.8.8 progress_1.2.3 blob_1.2.4 highr_0.10 DelayedArray_0.28.0
## [31] BiocParallel_1.36.0 prettyunits_1.2.0 parallel_4.3.3 cluster_2.1.6 R6_2.5.1
## [36] stringi_1.8.3 bslib_0.7.0 limma_3.58.1 jquerylib_0.1.4 Rcpp_1.0.12
## [41] iterators_1.0.14 knitr_1.46 timechange_0.3.0 Matrix_1.6-5 tidyselect_1.2.1
## [46] rstudioapi_0.16.0 abind_1.4-5 yaml_2.3.8 doParallel_1.0.17 codetools_0.2-20
## [51] affy_1.80.0 curl_5.2.1 lattice_0.22-6 withr_3.0.0 KEGGREST_1.42.0
## [56] evaluate_0.23 BiocFileCache_2.10.2 xml2_1.3.6 circlize_0.4.16 Biostrings_2.70.3
## [61] filelock_1.0.3 pillar_1.9.0 affyio_1.72.0 BiocManager_1.30.22 foreach_1.5.2
## [66] generics_0.1.3 vroom_1.6.5 RCurl_1.98-1.14 hms_1.1.3 munsell_0.5.1
## [71] scales_1.3.0 glue_1.7.0 tools_4.3.3 hexbin_1.28.3 locfit_1.5-9.9
## [76] XML_3.99-0.16.1 AnnotationDbi_1.64.1 edgeR_4.0.16 colorspace_2.1-0 GenomeInfoDbData_1.2.11
## [81] cli_3.6.2 rappdirs_0.3.3 fansi_1.0.6 S4Arrays_1.2.1 gtable_0.3.5
## [86] sass_0.4.9 digest_0.6.35 SparseArray_1.2.4 farver_2.1.1 rjson_0.2.21
## [91] htmlwidgets_1.6.4 memoise_2.0.1 htmltools_0.5.8.1 lifecycle_1.0.4 httr_1.4.7
## [96] GlobalOptions_0.1.2 statmod_1.5.0 bit64_4.0.5