RNA Sequencing - Metrics Explained

RNA and DNA analysis methods have improved dramatically in the last decade. One of the most widely used methods across disciplines is RNA sequencing (RNA-seq), which detects and quantifies messenger RNAs (mRNAs) in either pooled tissue or cell samples. Single-cell RNA sequencing (scRNA-seq) measures the mRNAs present in individual cells and has become a standard technique for characterizing cellular responses[1]. scRNA-seq is especially useful for preclinical oncology studies that compare transcriptomic differences between normal and malignant tissue or that analyze intratumoral heterogeneity[2].
scRNA-seq begins with the isolation of viable single cells from peripheral blood or tissue, which are then processed for mRNA isolation. This mRNA is reverse transcribed into complementary DNA (cDNA) that is then amplified and sequenced. Specialized bioinformatics tools are then used to analyze and display the data, with parameters that preserve the single-cell origin of the amplified cDNA.
scRNA-seq datasets are dense because the full complement of transcripts is quantified for each cell. Results can be reported in different ways, and transcripts per million (TPM) is the most common method used in the literature today[3]. TPM is calculated in three steps: divide each gene's read count by the gene's length in kilobases to give reads per kilobase (RPK); sum all RPK values in the sample and divide that sum by one million to give a per-million scaling factor; then divide each gene's RPK by the scaling factor to give TPM. Reads per kilobase million (RPKM) is an alternate metric that applies the same two normalizations in the opposite order: read counts are first divided by the total number of reads in the sample, expressed in millions, and the result is then divided by gene length in kilobases. RPKM is applied to single-end RNA-seq measurements, for which each read corresponds to one mRNA fragment. Fragments per kilobase million (FPKM) is the equivalent metric for paired-end RNA-seq, where two reads can come from the same fragment; FPKM counts fragments rather than reads, so a fragment is not counted twice. One reason TPM is used more widely is that the TPM values in every sample sum to the same total (one million), so a gene's TPM reflects its share of the sample's transcripts after accounting for gene length and can be compared across samples. The sum of normalized values for RPKM and FPKM varies from sample to sample, so those values cannot be compared in the same way.
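The arithmetic behind these metrics can be sketched in a few lines of Python. This is a minimal illustration rather than a production pipeline: the function names, gene lengths, and read counts are hypothetical. It assumes the usual definitions, in which TPM is each gene's RPK divided by the sample's summed RPK per million, and RPKM is each gene's reads per million divided by gene length in kilobases.

```python
def rpk(counts, lengths_bp):
    """Reads per kilobase: raw read count divided by gene length in kilobases."""
    return [count / (length / 1000) for count, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: normalize for gene length first, then depth."""
    rpk_values = rpk(counts, lengths_bp)
    per_million = sum(rpk_values) / 1e6  # per-million scaling factor
    return [value / per_million for value in rpk_values]


def rpkm(counts, lengths_bp):
    """Reads per kilobase million: normalize for depth first, then gene length."""
    per_million = sum(counts) / 1e6  # total reads in the sample, in millions
    return [(count / per_million) / (length / 1000)
            for count, length in zip(counts, lengths_bp)]


# Hypothetical toy data: three genes measured in two samples.
lengths_bp = [2000, 4000, 1000]
sample_a = [10, 20, 5]
sample_b = [30, 60, 2]

tpm_a, tpm_b = tpm(sample_a, lengths_bp), tpm(sample_b, lengths_bp)
rpkm_a, rpkm_b = rpkm(sample_a, lengths_bp), rpkm(sample_b, lengths_bp)

# TPM totals are one million in each sample; RPKM totals differ between samples.
print("TPM sums: ", round(sum(tpm_a)), round(sum(tpm_b)))
print("RPKM sums:", round(sum(rpkm_a)), round(sum(rpkm_b)))
```

For paired-end data, the same `rpkm` function would give FPKM if the inputs were fragment counts instead of read counts.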
By understanding how RNA-seq data is acquired and analyzed, investigators can draw valid conclusions from complex datasets. Given the data-heavy nature of this method, it is typically recommended to work with experts in RNA-seq. Not only will this ensure that the experiment is designed appropriately, but it will also provide greater confidence in the findings and in how they are applied toward future experiments.
[1] Tang F, Barbacioru C, Wang Y, Nordman E, Lee C, Xu N, et al. 2009. mRNA-seq whole-transcriptome analysis of a single cell. Nat. Methods. 6:377–82.
[2] Patel AP, Tirosh I, Trombetta JJ, Shalek AK, Gillespie SM, Wakimoto H, et al. 2014. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science. 344:1396–401.
[3] https://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained/.