Transcriptome pipelines steps and reports

Here, we outline the pipeline steps and reports for the Transcriptome pipeline, encompassing RNA-seq, RNA-seq with UMI, MARS-seq, and DESeq2 from counts matrix pipelines.

Analysis pipeline steps

The pipeline:

Trims adapter sequences
Runs FastQC on the trimmed sequences for quality control of the samples, in parallel with the steps that follow
Maps reads to the selected reference genome
Adds UMI and gene information to the reads
Quantifies gene expression by counting reads
Counts UMI’s for cases of PCR bias
Detects Differentially Expressed (DE) genes for a model with a single factor

Steps 4 and 6 are performed only for MARS-Seq and RNA-seq with UMI

Steps 7 is performed only if DESeq2 is selected

Steps 1-6 are not performed for DESeq2 from counts matrix pipeline

Pipeline report

Upon completion of the analysis, you will be sent an email with links to the results report.

The report includes several sections:

Sequencing and Mapping QC
1. Figure 1 - Plots the average quality of each base across all reads. Qualities of 30 (predicted error rate 1:1000) and above are good
2. Figure 2 - Histogram showing the number of reads for each sample in the raw data
3. Figure 3 - Histogram showing the percentage of reads discarded after trimming the adapters (after removing adapters, short, polyA/T and low quality reads are discarded by the pipeline). No figure presented since the percentage of reads discarded after trimming for all samples is lower than 1%.
4. Figure 4 - Histogram with the number of reads for each sample in each step of the pipeline
5. Figure 5 - Plots sequence coverage on and near gene regions
Exploratory Analysis
1. Figure 6 - Heatmap plotting the fraction of reads from the genes with the most counts
2. Figure 7 - Heatmap of Pearson correlation between samples according to gene expression values
3. Figure 8 - Clustering dendrogram of the samples according to gene expression
4. Figure 9 - PCA analysis
  
  Histogram of % explained variability for each PC component
  
  PCA plot of PC1 vs PC2
  
  PCA plot of PC1 vs PC3
Differential Expression Analysis (this section exists only if you run the DESeq2 analysis) - A table with the number of differentially expressed genes (DE) in each category (up/down) for the different contrasts. In addition, links for p-value distribution, volcano plots and heatmaps, as well as a table of the DE genes with dot plots of their expression values are also provided
Bioinformatics Pipeline Methods - Description of pipeline methods.
Links to additional results - Links for downloading tables with raw, normalized counts, log normalized values (rld), and statistical data of contrasts. In cases of models with batches, “combat” values calculated (instead of rld) using the “sva” package, providing batch corrected normalized log2 count values.

Note that only Figure 2 from Step 1, as well as Steps 2–5, will appear in the DESeq2 counts matrix report.

Output folders for RNA-seq pipeline

0_concatenating_fastq

1_cutadapt

2_fastqc

3_mapping

4_reports

Log directory

Output folders for MARS-seq and RNA-seq with-UMI pipelines

1_combined_fastq

2_cutadapt

3_fastqc

4_mapping

5_move_umi

6_count_reads

7_mark_dup

8_dedup_counts

9_umi_counts

10_reports

Log directory

Output folders for DESeq2 from counts matrix pipeline

<report_directory>

Log file

Annotation file

For counts of the reads per gene, we use annotation files (gtf format) from “Ensembl” or “GENCODE”. In MARS-seq analysis, we extend the 3’ UTR exon away from the transcript on the DNA and extend or cut the 3’ UTR exon towards the 5’ direction on the mRNA.

Examples of reports

RNA-seq example

Mars-seq example

RNA-seq with UMI example

DESeq2 from counts matrix example

Note: This example analysis demonstrates a good starting point, and not necessarily an end result.