Transcriptomics in the Time of Corona

Research scientists the world over are answering the challenges posed by the COVID-19 pandemic, working overtime to fully characterize the biology of the novel SARS-CoV-2 virus and the human response to infection. These efforts are essential to create diagnostic tests, find treatments among existing drugs and new molecules, develop vaccines for long-term mitigation, understand transmission patterns, and prepare the world to live alongside another killer bug.

The epicenter of this battle has become New York City, which has borne the brunt of much of the sickness and death experienced in the US. New York-based researchers are also leading the charge to discover our way out of the crisis. Two recent manuscripts from Columbia and Weill Cornell, published on the open-access preprint server bioRxiv, are worthy of specific attention. Both studies rely on detailed analyses of transcriptomics data, but to very different ends. One aims to help repurpose known drugs against COVID-19-related targets based on historical gene expression datasets. The other looks to develop a diagnostic and to better understand patterns of transmission and mechanisms of disease susceptibility from RNA sequencing of patient samples. Reading them side by side illustrates the versatility of gene expression analysis in the urgent and multi-pronged effort to turn the tide on COVID-19.

The first paper, from the Goldstein group at Columbia University Irving Medical Center, mined the literature and publicly available transcriptomics datasets, then analyzed the gene expression data to prioritize potential drug targets based on host (that’s us) susceptibility to infection. The authors chose to focus on the regulators of expression of genes known to mediate SARS-CoV-2 infection, including ACE2, TMPRSS2, and CatB/L. From the literature, they identified numerous studies in which drugs altered the expression of these potential infection regulators. They then performed a series of analyses, each examining a different aspect of how existing drugs affect the potential targets, and linked the findings across otherwise disparate datasets. These analyses may be imperfect, relying in some cases on data from cancer cell lines, but the study takes care to examine the findings from indirect models in the context of more relevant datasets, such as those derived from lung tissue.

One striking observation was the high proportion of drugs impacting TMPRSS2 that worked through androgen or estrogen receptors. This analysis raises a compelling hypothesis about why we see gender discrepancies in COVID-19 infection rates and severity, with men faring less well than women. TMPRSS2 is especially interesting among the potential targets because mouse studies suggest that hitting this gene will be tolerated by the host. All the targets analyzed in this study have become the subject of numerous drug repurposing efforts, in silico, in vitro, and in vivo. We can only hope that one or more therapies prove safe and effective, and soon.

The second study comes from Chris Mason’s team at Weill Cornell Medical School, along with an extended group of collaborators. This work included 338 confirmed or suspected COVID-19 patient samples and 86 environmental samples. The primary effort was to develop a diagnostic test for viral infection that is fast and reliable, and that can be deployed in numerous settings outside of a well-equipped medical testing facility. The authors reported a colorimetric assay (one where you can literally see the result in a test tube) based on a molecular biology technique called LAMP (loop-mediated isothermal amplification). LAMP is useful here because it works at a single temperature, thus obviating the need for special equipment, and is fast (~30 minutes). The test was validated against both qPCR and RNA-seq, and showed high specificity and sensitivity, which increased with viral load. This is an important contribution to our collective effort to make reliable testing available everywhere and to everyone, which ultimately presents one of the best chances (and biggest challenges) for emerging from our homes and re-engaging as a society.
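For readers less familiar with the two metrics, here is a minimal sketch of how sensitivity and specificity are computed when a new test is scored against a reference method. The function name and the eight sample calls are hypothetical and purely illustrative; they are not taken from the preprint, and treating qPCR as the reference is an assumption made for this example.

```python
def sensitivity_specificity(reference, test):
    """Score test calls against reference calls (True = positive)."""
    pairs = list(zip(reference, test))
    tp = sum(r and t for r, t in pairs)          # both positive
    fn = sum(r and not t for r, t in pairs)      # test missed a positive
    tn = sum(not r and not t for r, t in pairs)  # both negative
    fp = sum(not r and t for r, t in pairs)      # test flagged a negative
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one reference positive and one negative")
    sensitivity = tp / (tp + fn)  # fraction of true positives detected
    specificity = tn / (tn + fp)  # fraction of true negatives cleared
    return sensitivity, specificity


# Hypothetical calls for eight samples (not data from the study).
qpcr_calls = [True, True, True, True, False, False, False, False]
lamp_calls = [True, True, True, False, False, False, False, True]

sens, spec = sensitivity_specificity(qpcr_calls, lamp_calls)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Sensitivity depends only on the reference-positive samples, which is why it can rise with viral load: samples carrying more virus are less likely to be missed.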

Because Mason and colleagues profiled the entire transcriptome using total RNA sequencing, they had a rich dataset that allowed them to accomplish several additional goals beyond validating the LAMP diagnostic: exploring the molecular mechanisms underlying clinical outcome; measuring viral genomic variation and reconstructing phylogeny; and screening the environment for the presence of the virus in the subway system. One key observation was a strong association between taking ACE inhibitor (ACEI, angiotensin-converting enzyme inhibitor) medication and SARS-CoV-2 infection. Another result, from the phylogenetic analysis, confirmed a predominance of European-origin viral strains, but with a large subgroup specific to New York. The paper presents numerous other findings from a large observational cohort.

Taken together, these two studies demonstrate the value of aggregated, curated, and accessible biomedical datasets. As the various clinical and research communities battling COVID-19 continue to generate and contribute data, especially paired real-world evidence and molecular profiling, our arsenal for fighting this disease and future pandemics will become that much stronger.