Guest Commentary: RNA-seq in preclinical research and drug discovery--digging deep for insights
Transcriptomics, the study of all RNA molecules in a cell or tissue at a given moment, provides valuable insights into the causes of disease and the mechanisms by which cells respond to it. Transcriptome research is also an important tool in drug discovery, preclinical testing of potential therapeutics and research into the mechanisms underlying the development of drug resistance. RNA sequencing, or RNA-seq, is a powerful next-generation sequencing (NGS) method that provides a sensitive, comprehensive picture of the transcriptome of tissues, organs and single cells.
What can RNA-seq tell us that DNA sequencing cannot?
Most of the cells in our body contain the same genomic DNA, or genome: the basic instructions that guide everything that our cells do, from growing and dividing to responding to pathogens or drug treatments. And while genome sequencing can provide key information about risk factors for disease and reveal actionable information for known disease-associated variants and druggable targets, it doesn’t inform us about the real-time, in-the-moment status of our cells as they respond to their changing environments. For that, we need to examine the transcriptome, and RNA-seq offers the most comprehensive transcriptome analysis currently available (Rao et al.; Bonder et al.; Yang et al.).
The transcriptome contains many types of RNA molecules, each with different functions, and changes constantly in response to all kinds of stimuli, such as toxins, pathogens, pharmaceuticals, injury and signals from other cells. The most frequently studied class of RNA is messenger RNAs, or mRNAs, which code for proteins. It is the expression of mRNAs and their precursor molecules (pre-mRNAs) that we are typically referring to when discussing gene expression.
Another rapidly expanding class of RNAs with significant biological function is the noncoding RNAs (ncRNAs), a diverse assortment of RNA molecules that do not code for protein, yet have many essential roles in the cell, including critical regulatory tasks that influence cell behavior, gene expression, mRNA stability, translation and chromatin stability. These ncRNAs are transcribed across the genome, and include long noncoding RNAs, or lncRNAs; microRNAs, or miRNAs; circular RNAs; intronic RNAs; and small nucleolar RNAs, or snoRNAs (reviewed in Matsui and Corey, 2017).
What can RNA-seq tell us that other gene expression analysis platforms cannot?
There are several sensitive methods for investigating gene expression on a large scale, including gene expression microarrays and other systems for multiplexed RNA analysis. Indeed, microarrays are widely used for rapid, high-throughput screening of differential gene expression for biological, medical and clinical research, as well as drug screening, and the collective data from these studies are available in several public databases (e.g., EMBL-EBI Expression Atlas, GTEx and Gene Expression Omnibus).
Microarrays and other multiplex analysis systems depend upon the direct hybridization of specific probes to known target RNAs (or cDNA copies of the RNA) in the RNA sample, and the results are provided as a numerical count of each probe-specific RNA or RNA segment that is represented in the sample. While these methods are powerful tools for studying the expression of specific, known RNAs in response to a stimulus or treatment, they offer little opportunity for discovery of novel transcripts, alternative isoforms or of unanticipated but potentially important changes in the expression of noncoding transcripts.
RNA-seq, in contrast, offers the opportunity to interrogate the entire transcriptome at a given moment at single-base resolution without prior knowledge of genes that might be impacted by a particular treatment or condition, thus providing a more comprehensive, less hypothesis-driven view of the transcriptome.
These benefits are clearly shown in a study by Rao et al., in which data obtained by RNA-seq and microarrays were compared in a toxicogenomic study in rats. Compared to the microarrays, the RNA-seq data yielded additional differentially expressed genes (DEGs) that enriched the authors' knowledge of key pathways affected by test compounds, and also revealed additional relevant pathways. The authors also identified differentially expressed ncRNAs that provided additional mechanistic clarity. They concluded that RNA-seq is likely to generate more insight into mechanisms of toxicity than microarrays due to its wider dynamic range, greater sensitivity and ability to identify a larger number of transcripts.
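To make the idea of a differentially expressed gene concrete, the following is a minimal sketch of one simple way to compare expression between two samples: normalize raw read counts to counts per million (CPM), then compute a log2 fold change per gene. It is not the method used by Rao et al.; real studies rely on replicates and dedicated statistical packages (for example, DESeq2 or edgeR). The gene names, counts and the threshold of 1.0 are hypothetical values chosen only for illustration.

```python
import math

def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}

def log2_fold_changes(control, treated, pseudocount=1.0):
    """Return log2(treated / control) per gene, computed on CPM values.

    The pseudocount avoids taking the log of zero for unexpressed genes.
    """
    cpm_c = counts_per_million(control)
    cpm_t = counts_per_million(treated)
    return {
        gene: math.log2((cpm_t[gene] + pseudocount) / (cpm_c[gene] + pseudocount))
        for gene in control
    }

# Hypothetical raw read counts (invented for illustration only).
control = {"geneA": 500, "geneB": 400, "geneC": 100}
treated = {"geneA": 500, "geneB": 100, "geneC": 400}

lfc = log2_fold_changes(control, treated)

# Flag genes whose expression changes at least twofold in either direction
# (|log2 fold change| >= 1.0 is an assumed, commonly used cutoff).
candidates = sorted(g for g, v in lfc.items() if abs(v) >= 1.0)
print(candidates)  # geneB (down) and geneC (up) pass; geneA is unchanged
```

A fold change alone says nothing about statistical significance, which is why production pipelines model count variability across biological replicates rather than comparing two samples directly.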
One of the challenges of transcriptome research and drug discovery is that biological tissues contain numerous cell types, each with its own expression profile. Single-cell RNA-seq (scRNA-seq) enables the analysis of gene expression at the single-cell level (Kulkarni et al., 2019; Yang et al., 2020). Venema and Voskuil (2019) provide an elegant example of how scRNA-seq enabled the identification of druggable, disease-associated genes that are overexpressed in specific cells in the blood and intestines of Crohn’s disease patients; such resolution is not attainable with standard bulk RNA sequencing or with microarray analysis.
How can the identification of ncRNAs and alternative transcripts enhance drug discovery?
Noncoding RNAs (ncRNAs) and alternatively spliced transcripts of disease-associated genes are gaining momentum as potential drug targets (reviewed in Matsui and Corey 2017 and Zhao 2019, respectively). Although only about 2 to 3 percent of the human genome consists of protein-coding sequence (collectively called the exome), many other RNAs, including ncRNAs, also impact gene expression in healthy and diseased tissues, making them potential targets for drug discovery. Indeed, there are ongoing clinical studies for compounds that target ncRNAs or pre-mRNAs implicated in several diseases, including Duchenne muscular dystrophy (DMD), spinal muscular atrophy (SMA), hematological malignancies and solid tumors; see Matsui and Corey 2017.
When an mRNA molecule is transcribed from its respective gene as a pre-mRNA, it contains exons—the segments that code for protein domains or for regulatory regions in the mature mRNA—and introns, which are removed during splicing. As reviewed in Zhao 2019, it has become increasingly apparent that many—perhaps even most—genes in the human genome yield multiple, alternatively spliced mRNAs containing various combinations of exons, and that these alternative transcripts lead to proteins with diverse functional domains.
Thus, alternative splicing has significant implications for drug development, identification of biomarkers, and our understanding of drug resistance—especially as many drugs target specific protein domains. Examples of diseases caused by mis-spliced mRNAs include neurodegenerative disorders, cancer and tumor progression, immune diseases, cardiovascular disease, and metabolic disease; examples of mis-spliced genes include SRSF1, BCL2L1, Cyclin D1, KLF6 and VEGF. As described in Zhao et al., RNA-seq has enabled the identification of drugs that are specifically targeted to disease-causing splice variants in SMA, DMD and other diseases.
What are the different types of RNA-seq workflows, and when is each most appropriate?
There are several types of RNA-seq workflows (described briefly below), but each begins with the isolation of RNA from the sample, followed by construction of an RNA-seq library for sequencing on an NGS instrument.
If you are primarily interested in the expression of protein-coding genes under a specific set of conditions, then a workflow that enriches for mRNA is appropriate. These methods work best with high-quality RNA and depend on the fact that most eukaryotic mRNA molecules have a 3’ poly(A) tail that can be used to selectively enrich for mRNA using beads coated with stretches of complementary thymine bases (oligo dT). The final sequencing data is then enriched for protein-coding transcripts.
If you are seeking a more comprehensive view of the transcriptome, including noncoding RNAs and pre-mRNAs, then select a whole-transcriptome sequencing (WTS) workflow; this is also a good choice for degraded RNA samples. The WTS workflow typically includes a ribodepletion step to remove the highly abundant ribosomal RNA (rRNA), which provides no useful information for most gene expression studies.
In addition to the methods above, there are also many workflows for scRNA-seq, as well as options for custom-enriched libraries to enable more focused sequencing of specific transcripts. Such targeted workflows make efficient use of sequencing capacity, because reads are concentrated on the transcripts of interest. Conversely, it is also possible to selectively deplete highly expressed transcripts that may make it difficult to detect the expression of lowly expressed genes; one example is the targeted depletion of globin transcripts from blood-derived samples.
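The workflow choices described above can be summarized as a small decision helper. This is only a sketch of the reasoning in this section, not a laboratory protocol: the `Study` fields, the function name `suggest_workflow` and the order in which the questions are asked are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Study:
    """Hypothetical description of an experiment (field names are illustrative)."""
    protein_coding_only: bool      # interested only in mRNA expression?
    rna_degraded: bool = False     # is the input RNA of low quality?
    single_cell: bool = False      # is per-cell resolution required?
    targeted_panel: bool = False   # focusing on a defined set of transcripts?
    blood_derived: bool = False    # is the sample derived from blood?

def suggest_workflow(study):
    """Return (workflow, notes) following the guidance in the text above."""
    if study.single_cell:
        workflow = "scRNA-seq"
    elif study.targeted_panel:
        workflow = "custom-enriched (targeted) library"
    elif study.protein_coding_only and not study.rna_degraded:
        # Poly(A) selection works best with high-quality RNA.
        workflow = "mRNA enrichment via poly(A) selection"
    else:
        # Covers noncoding RNAs and pre-mRNAs, and tolerates degraded input.
        workflow = "whole-transcriptome sequencing with ribodepletion"

    notes = []
    if study.blood_derived:
        notes.append("consider depleting globin transcripts")
    return workflow, notes

workflow, notes = suggest_workflow(
    Study(protein_coding_only=True, rna_degraded=True, blood_derived=True)
)
print(workflow, notes)
```

In practice these considerations interact (for example, sample input amount and budget also matter), so a helper like this is best read as a checklist rather than a rule.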
Of the existing methods for examining the transcriptome, RNA-seq offers the most comprehensive approach to studying differential gene expression and changing levels of noncoding RNAs, at both the single-cell and bulk sequencing levels. RNA-seq thus offers a powerful arsenal of investigative tools for understanding the molecular basis of disease and for the discovery and testing of new treatment options.