To participate in the workshop, you will need access to the software as described below. In addition, you will need an up-to-date web browser.


If you have questions about, or run into issues with, any of the instructions below, please attend the office hours dedicated to installation troubleshooting. For the exact time and Zoom link, please visit: https://u-bds.github.io/2024-07-09-uab/installation.html


Cheaha Account Registration

The Cheaha supercomputer is a resource offered by Research Computing to all members of UAB and is a very useful tool for analyzing large datasets.

Please follow the documentation produced by UAB’s Research Computing Team available here in order to create an account to access Cheaha.

Once your Cheaha account is successfully created, test that you can use the Interactive File System and Terminal available on UAB Research Computing’s OnDemand Application. To do so, follow the steps below:

  1. Log in to the OnDemand application at the following URL: https://rc.uab.edu/pun/sys/dashboard/
  2. In the upper left, click the Files button
  3. In the dropdown, click the link that says /scratch/<blazer_id>/ where blazer_id will be your blazer id
  4. In the new window that appears, click the Open in Terminal button in the upper right of the page

Completing these steps confirms that your account has been set up correctly.
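
As an optional sanity check once the terminal has opened, the sketch below confirms that a scratch directory exists and is writable. The helper name check_scratch is hypothetical (it is not a Cheaha command), and the usage line assumes that the USER_SCRATCH variable used later in these instructions points at /scratch/<blazer_id>.

```shell
# check_scratch DIR: hypothetical helper, not part of Cheaha.
# Succeeds only if DIR is an existing, writable directory.
check_scratch() {
    dir="${1:-}"
    if [ -z "$dir" ]; then
        echo "no directory given" >&2
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "directory not writable: $dir" >&2
        return 1
    fi
    echo "ok: $dir"
}
```

In the Cheaha terminal it could be called as check_scratch "$USER_SCRATCH". Any message other than the ok line suggests the account setup is worth raising at the office hours.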

Environment Setup

To set up the environment for running the pipeline on Cheaha, we will use a tool called Anaconda. Anaconda is a package manager that allows for quick and easy installation of tools. We will use it to install the tools needed for the initial parts of the workshop.

  1. Log in to the UAB Research Computing OnDemand Application at the following URL: https://rc.uab.edu/pun/sys/dashboard/
  2. In the upper left, click the Files button
  3. In the dropdown, click the link that says /scratch/<blazer_id>/ where blazer_id will be your blazer id
  4. In the new window that appears, click the Open in Terminal button in the upper right of the page
  5. Cheaha comes with many tools and software packages pre-installed as ‘modules’. Anaconda is one of these, and we can load it by running the command below in the terminal window that appeared:
    • module load Anaconda3
  6. Anaconda works by creating isolated environments to install packages into. In order to create one of these environments and install Nextflow and nf-core, use the below command:
    • conda create -p $USER_SCRATCH/conda_envs/rnaseq_workshop python=3.12 bioconda::nf-core bioconda::nextflow

    • Type y when prompted (Proceed ([y]/n)?).

    • This command will create a new conda environment called rnaseq_workshop in your scratch space on Cheaha, in a sub-directory called conda_envs. It is important to give your environments intuitive names to help you remember their purpose.

  7. We now need to activate the newly created ‘rnaseq_workshop’ environment so that the packages installed in it become available.
    • conda activate $USER_SCRATCH/conda_envs/rnaseq_workshop
  8. The environment should now be created and the packages installed. To leave the environment, deactivate it:
    • conda deactivate
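
As an optional check, the sketch below reports whether the tools installed above can be found on the PATH. It is meant to be run while the environment is active (after step 7 and before step 8). The helper name check_tools is hypothetical and is not part of conda, Nextflow, or nf-core.

```shell
# check_tools NAME...: hypothetical helper; reports which commands are on PATH.
# Returns non-zero if any of the named commands cannot be found.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

With the environment active, check_tools nextflow nf-core should report both as found. Running nextflow -version and nf-core --version afterwards prints the installed versions.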

Globus Account Registration

Globus is a web-based tool that can be used for large data transfers between different locations.

As a member of UAB, you are able to log in using your blazer id. To log in to Globus with your UAB account, follow the steps below:

  1. Go to the Globus home page.
  2. Click ‘Login’ in the upper right of the page.
  3. In the center of the page, click the drop down and search for ‘University of Alabama at Birmingham’ and click ‘Continue’.
  4. On the next page, enter your blazer id and password for your UAB account.

Download the Data to Cheaha

The data being used for the workshop comes from the paper below:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6096346/

For this workshop, the data has been pre-downloaded and packaged into a Globus endpoint for ease of transfer. The link for the Globus endpoint can be found here.

If you have never logged in to Globus, please see the instructions here.

In order to download the data, perform the following steps:

  1. Log in to Globus
  2. Click on ‘File Manager’ in the side bar
  3. In the left pane, click the ‘Collection’ text box
  4. Search for ‘Cheaha cluster on-campus (UAB DMZ)’ and select it
    • If you are off campus, use the ‘Cheaha cluster off-campus (UAB DMZ)’ collection instead
  5. In the ‘Path’ text box type /scratch/your_blazerid where your_blazerid is your blazer id
  6. Towards the middle, click the ‘New Folder’ button and type rnaseq_workshop.
    • This will create a new folder named ‘rnaseq_workshop’ in your scratch space
  7. Double click the ‘rnaseq_workshop’ folder to go inside of it
  8. In the right pane, click the ‘Collection’ text box
  9. Search for ‘rnaseq_workshop_data’
  10. In the right pane, click the checkboxes beside the ‘input’ and ‘results’ folders that have appeared
  11. In the middle, click the ‘Transfer or Sync to…’ button.

The folders will now be transferred to your scratch space on Cheaha. Following this process will match the directory structure that will be used for the first portion of the workshop.
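
Once the transfer has finished, the resulting layout can be checked from the Cheaha terminal. The sketch below is one way to do that. The helper name check_layout is hypothetical, and the usage line assumes that the USER_SCRATCH variable points at your scratch space.

```shell
# check_layout BASE: hypothetical helper; verifies that BASE contains the
# 'input' and 'results' folders selected in the transfer steps above.
check_layout() {
    base="${1:-}"
    status=0
    for sub in input results; do
        if [ -d "$base/$sub" ]; then
            echo "present: $sub"
        else
            echo "absent: $sub" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, check_layout "$USER_SCRATCH/rnaseq_workshop" should list both folders as present. A folder reported as absent may simply mean the transfer is still in progress.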

Download the Data to local computer

For tertiary analysis, the class will be taught using your own local computers, instead of Cheaha.

Please download the data and the teaching_templates folders from the following Box link: https://uab.box.com/s/arbfr7dm5nhe62p5dycau5g9c2xzcoth

Note: please save the folders above to a location on your computer that is easy to find, such as a sub-folder on your Desktop. This directory path will be your working directory in the R session (for a reminder on how to set your working directory in class, please see the following brief overview: https://www.learn-r.org/r-tutorial/setwd-r.php).

Installing R and RStudio

R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R, we use RStudio.

Please follow the instructions at this location:

https://carpentries.github.io/workshop-template/install_instructions/#r-1

Installing R dependencies

R has a variety of existing and pre-built packages that ease the burden of analysis. We will be using a few of these packages to complete the lesson taught at this workshop.

Please only complete this step AFTER successfully installing R and RStudio above.

  1. Open RStudio
  2. In the central panel (known as the Source Code Panel), type the below command and press the Enter key
install.packages(c("BiocManager", "remotes"))
  3. When you are prompted with Update all/some/none? [a/s/n]: , be sure to type a and press the Enter key.
  4. Repeat steps 2-3 with the following commands, making sure to run each command one at a time:
BiocManager::install(c("DESeq2", "tximport","vsn","apeglm", "tidyverse", "SummarizedExperiment", "vidger"))
BiocManager::install(c("biomaRt","clusterProfiler","msigdbr","Glimma","simplifyEnrichment"))
BiocManager::install(c("enrichplot", "kableExtra","ggrepel","ggpubr","ComplexHeatmap","ComplexUpset","EnhancedVolcano","RColorBrewer", "hexbin", "cowplot","gplots", "ggplot2"))
install.packages(c("gprofiler2", "ggridges", "rmarkdown"))
  5. Once installation has completed for all packages, type the commands below to make sure the packages installed correctly:
library("DESeq2")
library("tximport")
library("vsn")
library("apeglm")
library("tidyverse")
library("SummarizedExperiment")
library("vidger")
library("biomaRt")
library("clusterProfiler")
library("msigdbr")
library("Glimma")
library("simplifyEnrichment")
library("enrichplot")
library("kableExtra")
library("ggrepel")
library("ggpubr")
library("ComplexHeatmap")
library("ComplexUpset")
library("EnhancedVolcano")
library("RColorBrewer")
library("hexbin")
library("cowplot")
library("gplots")
library("ggplot2")
library("gprofiler2")
library("ggridges")
library("rmarkdown")
  6. Run these commands as in the steps above: highlight one line at a time and click the ‘Run’ button for each.
  7. If everything worked correctly, the ‘Console’ panel in your RStudio window will show each command echoed back with no error messages. Some packages print startup messages when loaded, which is expected.
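
For those who prefer a terminal, the same check can be sketched outside RStudio. The example below assumes that the Rscript program that ships with R is on your PATH, which is not always the case (notably on Windows). The helper name check_r_packages and the RSCRIPT override variable are hypothetical. It is an alternative to the steps above, not a replacement for them.

```shell
# check_r_packages PKG...: hypothetical helper. Uses Rscript (assumed to be on
# PATH; override with the RSCRIPT variable) to test whether each package loads.
check_r_packages() {
    rscript="${RSCRIPT:-Rscript}"
    if ! command -v "$rscript" >/dev/null 2>&1; then
        echo "Rscript not found: $rscript" >&2
        return 2
    fi
    failed=0
    for pkg in "$@"; do
        # requireNamespace() returns FALSE when a package is absent;
        # stopifnot() turns that into an error, so Rscript exits non-zero.
        if "$rscript" -e "stopifnot(requireNamespace('$pkg', quietly = TRUE))" >/dev/null 2>&1; then
            echo "ok: $pkg"
        else
            echo "could not load: $pkg" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

For example, check_r_packages DESeq2 tximport tidyverse prints one line per package. Any package reported as not loadable can be reinstalled with the matching command from step 4.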