This is a list of links to resources that have essential information on RNA-seq and the recommended R packages for RNA-seq analysis.
Contributed by Dr. Lara Ianov, UAB
When RNA-seq alignment and mapping are discussed, there are generally two primary approaches: standard splice-aware alignment to the genome (e.g., STAR, HISAT2), and quasi-mapping/pseudoalignment, which uses k-mer based counting methods to map fragments to the transcriptome (e.g., Salmon, kallisto). STAR (Dobin et al., 2013) is usually my choice for a splice-aware aligner, and Salmon (Patro, Duggal, Love, Irizarry, & Kingsford, 2017) for quasi-mapping. The choice between the two generally depends on the specific long-term goal of a study. These methods also differ significantly in computational efficiency, but that is beyond the scope of this synopsis.
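To make the k-mer idea concrete, here is a minimal toy sketch in Python. It is not how Salmon or kallisto are actually implemented; the transcript sequences, the k-mer length, and all function names are hypothetical and chosen only for illustration. It shows the core intuition: a read is assigned to the set of transcripts that contain all of its k-mers, without computing a base-by-base alignment.

```python
from collections import defaultdict

K = 5  # toy k-mer length (assumption); real tools use much longer k-mers

# Hypothetical toy transcriptome: two isoforms that share their first 7 bases
TRANSCRIPTS = {
    "tx1": "ACGTACGTTAGC",
    "tx2": "ACGTACGGGCTA",
}


def kmers(seq, k=K):
    """Return every overlapping substring of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(transcripts, k=K):
    """Map each k-mer to the set of transcripts that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def compatible_transcripts(read, index, k=K):
    """Intersect the transcript sets of all k-mers in the read."""
    compatible = None
    for kmer in kmers(read, k):
        hits = index.get(kmer, set())
        compatible = set(hits) if compatible is None else compatible & hits
        if not compatible:
            break  # some k-mer matches no remaining candidate transcript
    return compatible or set()


INDEX = build_index(TRANSCRIPTS)
shared = compatible_transcripts("ACGTACG", INDEX)  # lies in the shared region
unique = compatible_transcripts("CGTTAGC", INDEX)  # lies in the tx1-specific end
print(sorted(shared), sorted(unique))
```

A read from the shared region stays compatible with both isoforms, while a read from the isoform-specific end resolves to a single transcript. Real tools then use a statistical model to distribute ambiguous reads among compatible transcripts, which this sketch does not attempt.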
While some benchmarking studies and community-based benchmarking efforts have pointed to better performance from quasi-mapping/pseudoalignment approaches, others have indicated that all modern methods perform similarly for mRNAs and highly abundant genes (Smith, 2016; Wu, Yao, Ho, Lambowitz, & Wilke, 2018). Thus, if the goal of the study is standard mRNA-seq, differences among the methods are unlikely to result in a large loss of information, if any.
However, beyond standard gene-level mRNA-seq quantification, here is an overview of some of the key points where choosing one approach over another may have a large effect on the interpretation of the results:
DESeq2 is an R package used for differential analysis on count data from high-throughput sequencing assays. It allows for quantitative analysis focused on strength rather than the mere presence of differential expression.
The required input for DESeq2 is a matrix of raw (un-normalized) gene- or transcript-level counts.
Here is the paper that will give an overall introduction to DESeq2. It explains how DESeq2 uses shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates.
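As a rough intuition for what shrinkage of fold changes means, the Python sketch below uses a textbook normal-normal model: the observed log fold change is pulled toward zero, and the pull is stronger when the estimate is noisier. This is a simplified illustration only, not DESeq2's actual estimator, which is more involved; the function name, the prior width, and the example numbers are all hypothetical.

```python
def shrink_lfc(observed_lfc, standard_error, prior_sd=1.0):
    """Posterior mean of a log fold change under a zero-centered normal prior.

    Assumes observed_lfc ~ Normal(true_lfc, standard_error^2) and
    true_lfc ~ Normal(0, prior_sd^2). Illustrative model only.
    """
    weight = prior_sd ** 2 / (prior_sd ** 2 + standard_error ** 2)
    return weight * observed_lfc


# Two hypothetical genes with the same observed log2 fold change of 2.0
well_measured = shrink_lfc(2.0, standard_error=0.2)  # precise estimate
noisy = shrink_lfc(2.0, standard_error=2.0)          # imprecise, e.g. low counts

print(round(well_measured, 3), round(noisy, 3))
```

The precisely measured gene keeps most of its fold change, while the noisy gene is shrunk strongly toward zero. This is the sense in which shrinkage makes estimates more stable and easier to compare across genes.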
Instructions on how to install DESeq2 in R (for example, from within RStudio) can be found here.
After installing DESeq2, you can work through the vignette, which can be found here.
Here is Mike Love’s guide for DESeq2. This is an additional resource that explains how to use the package and demonstrates useful workflows.
Please be aware that there are many other approaches for estimating differential expression that may be appropriate, or more appropriate (e.g., time course analysis), for your project. We recommend consulting a knowledgeable data scientist if you are unsure whether DESeq2 is a good option for your experimental design.
Below are links to vignettes/tutorials for R packages that are helpful for pathway analysis. There are many other tools and analytical approaches, and we intend to add to this section in the future.
gprofiler2
The gprofiler2 package provides a tool set for functional enrichment analysis and visualization. It is primarily used to visualize gene lists, convert gene/protein/SNP identifiers to numerous namespaces, and map orthologous genes across species. Here is the paper that will give an overall introduction to pathway analysis using the gprofiler2 package.
Here is the link to the vignette, which will give you clear instructions on how to use the package.
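For readers new to functional enrichment, the Python sketch below illustrates the general idea behind over-representation analysis using a one-sided hypergeometric test. It assumes that test and uses hypothetical gene counts; it is not the gprofiler2 API, and it omits multiple-testing correction, so consult the vignette for the statistics the package actually applies.

```python
from math import comb


def enrichment_p_value(n_background, n_in_set, n_query, n_overlap):
    """One-sided hypergeometric p-value, P(X >= n_overlap).

    n_background: genes in the background universe
    n_in_set:     background genes annotated to the pathway
    n_query:      genes in the query list
    n_overlap:    query genes annotated to the pathway
    """
    total = comb(n_background, n_query)
    upper = min(n_in_set, n_query)
    tail = sum(
        comb(n_in_set, i) * comb(n_background - n_in_set, n_query - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20 background genes, 5 in the pathway,
# 5 genes in the query list, 3 of which fall in the pathway
p = enrichment_p_value(n_background=20, n_in_set=5, n_query=5, n_overlap=3)
print(p)
```

A small p-value suggests that the pathway is represented in the query list more often than expected by chance. In practice many pathways are tested at once, so a correction for multiple testing is needed before interpreting the results.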
GAGE
Generally Applicable Gene-set Enrichment (GAGE) is a method for gene set or pathway analysis. The gage package can be used on microarray or RNA-seq data for routine and advanced gene set analyses. Here is the link to the Bioconductor webpage, which provides background information on the package and instructions on how to install it in R.
Here is the link to the vignette.