Thiopurines are a class of chemotherapy drugs used in the treatment of acute lymphoblastic leukemia (ALL) and inflammatory bowel disease (IBD), and are toxic to cells. Previous observations noted an association between decreased activity of the gene TPMT and increased toxicity. However, the endogenous role of TPMT and its molecular processes were unknown. We performed data fusion to construct a predictive model of this genetic system. The model integrated proprietary patient data from gene expression profiling and genotype-phenotype correlations, as well as public information sources such as gene annotations, the Gene Ontology semantic structure, and protein-protein interactions. The model enabled gene prioritization and gene network prediction, which suggested a role for TPMT in oxidoreductive processes and in regulating cell redox capacity. In vitro studies confirmed a difference in oxidative toxicity between HepG2 cells transfected with TPMT and controls, validating the model’s general prediction. Additional predicted gene associations are under investigation as biomarkers of TPMT-mediated thiopurine response.
Genialis supported the development of cancer diagnostic biochips by applying various machine learning methods to disease and control data sets from medium and large probe sets. We surveyed the performance of support vector machines, neural networks, random forests, and gradient boosting, as well as logistic regression. In addition to providing the customer with a ranked prediction set for immediate lab validation, we performed a detailed examination of model performance across different methods and attribute optimization. The final trusted results were reached by model stacking, which combined the individual models into a super-model with superior prediction accuracy while minimizing the risk of overfitting.
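As a rough illustration of model stacking in general (not the project's actual pipeline), the sketch below trains two simple base models, collects their out-of-fold predictions, and fits a logistic regression meta-learner on those predictions. Everything here is an assumption made for illustration: the synthetic data, the helper names `fit_stack` and `predict_stack`, and the choice of a nearest-centroid scorer and logistic regression as stand-ins for the methods surveyed above.

```python
import math
import random

def sigmoid(z):
    # Clamp the argument so math.exp never overflows.
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

class LogisticRegression:
    """Plain SGD logistic regression; also reused as the meta-learner."""
    def __init__(self, lr=0.1, epochs=100):
        self.lr, self.epochs = lr, epochs

    def fit(self, X, y):
        self.w, self.b = [0.0] * len(X[0]), 0.0
        for _ in range(self.epochs):
            for x, t in zip(X, y):
                err = self.predict_proba(x) - t
                self.w = [w - self.lr * err * xi for w, xi in zip(self.w, x)]
                self.b -= self.lr * err
        return self

    def predict_proba(self, x):
        return sigmoid(sum(w * xi for w, xi in zip(self.w, x)) + self.b)

class NearestCentroid:
    """Scores a sample by how much closer it is to the class-1 centroid."""
    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, x):
        d0, d1 = (sum((a - b) ** 2 for a, b in zip(x, self.centroids[k])) for k in (0, 1))
        return sigmoid(d0 - d1)

def fit_stack(base_classes, X, y, k=5):
    """Fit the meta-learner on out-of-fold base predictions, then refit bases on all data."""
    n = len(X)
    oof = [[0.0] * len(base_classes) for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        for m, cls in enumerate(base_classes):
            model = cls().fit([X[i] for i in train], [y[i] for i in train])
            for i in held_out:
                oof[i][m] = model.predict_proba(X[i])
    meta = LogisticRegression().fit(oof, y)
    bases = [cls().fit(X, y) for cls in base_classes]
    return bases, meta

def predict_stack(bases, meta, x):
    return meta.predict_proba([b.predict_proba(x) for b in bases])

def make_data(n, rng):
    """Synthetic two-class data (an assumption, not project data)."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Features 0 and 1 carry signal (mean shifts by 2 for class 1); 2 and 3 are noise.
        X.append([rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0),
                  rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y

rng = random.Random(0)
X_train, y_train = make_data(200, rng)
X_test, y_test = make_data(100, rng)
bases, meta = fit_stack([LogisticRegression, NearestCentroid], X_train, y_train)
predictions = [predict_stack(bases, meta, x) >= 0.5 for x in X_test]
test_accuracy = sum(p == bool(t) for p, t in zip(predictions, y_test)) / len(y_test)
print(f"stacked test accuracy: {test_accuracy:.2f}")
```

Training the meta-learner only on out-of-fold predictions is the usual guard against overfitting in a stack: the meta-learner never sees a base prediction for a sample that the base model was trained on.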
We identified novel genetic risk factors for two disease indications, ulcerative colitis and rheumatoid arthritis, in a genetically distinct North Indian cohort. Both disease datasets contained microarray genotype data from disease and control patients, together with genome-wide association study (GWAS) results. The goal was to discover novel genetic risk factors that the GWAS had not revealed. We compared the GWAS ranking of genetic risk factors with the rankings produced by three supervised machine learning methods: random forest, support vector machine, and neural network. Models were trained on the microarray genotype data with genetic risk factors as explanatory variables and the phenotype (disease or control) as the dependent target variable. For each model, we estimated the importance of each genetic risk factor. Variable importance measures how much a risk factor contributes to the correct prediction of the target variable. Finally, genetic risk factors were ranked by their importance, revealing novel insights from the ML models that GWAS alone had overlooked.
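One common way to estimate variable importance is permutation importance: shuffle one variable at a time and record how much prediction accuracy drops. The sketch below is a minimal illustration under stated assumptions, not the measure or models used in the project. The genotype data are synthetic, the SNP names are hypothetical, and a nearest-centroid classifier stands in for the random forest, support vector machine, and neural network models.

```python
import random

def fit_centroids(X, y):
    """Per-class mean genotype vectors (a deliberately simple stand-in model)."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    # Assign the label of the closest centroid (squared Euclidean distance).
    return min(sorted(centroids),
               key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])))

def accuracy(centroids, X, y):
    return sum(predict(centroids, x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(centroids, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when a single column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        total_drop = 0.0
        for _ in range(n_repeats):
            column = [x[j] for x in X]
            rng.shuffle(column)
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, column)]
            total_drop += baseline - accuracy(centroids, X_perm, y)
        importances.append(total_drop / n_repeats)
    return importances

# Synthetic genotypes coded as 0/1/2 allele counts (an assumption for illustration).
rng = random.Random(42)
snp_names = ["snp_0", "snp_1", "snp_2"]  # hypothetical identifiers
X, y = [], []
for _ in range(200):
    genotype = [rng.choice([0, 1, 2]) for _ in snp_names]
    X.append(genotype)
    y.append(1 if genotype[0] >= 1 else 0)  # phenotype driven by snp_0 only

model = fit_centroids(X, y)
# For brevity importance is measured on the training data; held-out data is preferable.
importances = permutation_importance(model, X, y)
ranking = sorted(zip(snp_names, importances), key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A ranking produced this way can then be set side by side with a GWAS ranking to flag risk factors that score highly in the model but not in the association study.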
U-BIOPRED (Unbiased BIOmarkers in PREDiction of respiratory disease outcomes) was a multi-country, multi-year research project that used patient medical information and tissue samples to learn more about different types of asthma and so ensure better diagnosis and treatment for each person. Genialis partnered with Boehringer Ingelheim to develop a web application for identifying gene signatures associated with the characteristics of the patient cohort. Using patients’ microarray data from the U-BIOPRED database, our web app provided access to hundreds of expression profiles and more than a thousand demographic and clinical variables for each patient. We built a data import function for the tranSMART database and a custom ontology to accommodate the project-specific metadata. The visualization suite included graphical modules for everything from exploring individual gene expression patterns to stratifying patients by metadata parameters using unsupervised clustering.
The Cergentis R&D team engaged Genialis to design and develop ATLAS (Analysis of TLA Sequencing), a proprietary algorithm and visualization tool that rapidly identifies causal variants, structural anomalies, and transgenic rearrangements from… We integrated ATLAS into the Genialis platform to support upstream data processing and downstream visualization and interpretation for Cergentis customers. Genialis deployed the software at Novartis to identify high-value transgenic Chinese Hamster Ovary clones for biologics production. The ATLAS application is being further extended to oncology companion diagnostics for analyzing whole-gene variation landscapes.
The Genialis platform supports a bicoastal consortium of top research labs studying the epigenetics of aging. In addition to providing standard analysis workflows for RNA-, ChIP- and ATAC-seq, Genialis works with our partners to develop and automate new pipelines and QC tools, standardize analyses across diverse datasets, and apply machine learning approaches for integrative data mining of the hosted data. We also lead the effort to identify, curate and onboard relevant datasets from the public domain to augment the user-generated content on the platform. A key goal of the project is to identify potential points of intervention to stop, slow, or reverse cellular senescence. We have additionally developed visual applications that integrate gene expression and genome regulatory data with cell line dependency data to enable rapid identification of potential drug targets.
Genialis works with the RNA Regulatory Networks Laboratory at the Francis Crick Institute to develop a comprehensive data repository and analysis software for protein-RNA interaction and associated gene expression data. Protein-RNA interactions represent one of the most crucial yet understudied aspects of the way our cells regulate gene activity, and are implicated in myriad diseases and developmental processes, e.g. cancer, motor neuron disease, Fragile X syndrome and ataxia. This project aims to elucidate the dynamics and evolution of protein-RNA complexes, and to build models to predict targets for novel therapies and disease diagnostics. We have created algorithms and a web-based user interface for managing, analyzing, exploring and sharing CLIP- and RNA-seq data. Further, we developed a custom AI-based quality control module to automatically detect and report on pass-fail characteristics of these data.