Integrating multi-omics data with biological knowledge by Transformer-based deep learning

Abstract

Omics data analysis, powered by machine learning, has significantly improved cancer diagnosis and prognosis. However, most machine learning methods treat each gene as an independent feature and fail to integrate experimentally acquired gene regulation and pathway information. The benefit of utilising this information grows in the era of multi-omics, because gene regulation is the key mechanism that links different omic layers together. Here, we present an interpretable deep learning model, DeePathNet, which uses cancer-specific pathway information for both single-omics and multi-omics data analysis. DeePathNet leverages the Transformer, a deep learning architecture that originated in natural language processing, to model complex interactions between pathways from omics data. The self-attention computed in the Transformer module allows DeePathNet to learn pathway encodings that achieve superior predictive performance and interpretability. Techniques such as dropout layers are also integrated into DeePathNet to improve its generalisability to unseen data. Moreover, DeePathNet supports any number of omics layers and can handle missing values. Using multiple evaluation metrics, we demonstrate that DeePathNet robustly outperforms traditional methods for predicting drug response and cancer type on four publicly available datasets, namely COSMIC Cell Lines, Genomics of Drug Sensitivity in Cancer (GDSC), Cancer Cell Line Encyclopedia (CCLE) and Cancer Therapeutics Response Portal (CTRP). DeePathNet also provides reliable model interpretation, potentially enabling biomarker discoveries at the pathway level. Using the Transformer, DeePathNet is the first method that supports multi-omics data analysis, integrates cancer pathway knowledge into modelling, and provides pathway-level model explanation.
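
The idea of treating pathways as tokens and letting self-attention model their interactions can be illustrated with a minimal sketch. This is not the DeePathNet implementation: the function names, the mean-pooling encoder, the toy values and the two unnamed omics layers are all assumptions made for illustration, and the learned query, key and value projections of a real Transformer are omitted. Missing values are represented as None and simply skipped when a pathway token is built.

```python
import math

def encode_pathways(sample, pathways, n_layers):
    """Build one token per pathway by averaging the observed values of its
    genes in each omics layer (hypothetical encoder, not the paper's)."""
    tokens = []
    for genes in pathways.values():
        vectors = [sample[g] for g in genes if g in sample]
        token = []
        for layer in range(n_layers):
            observed = [v[layer] for v in vectors if v[layer] is not None]
            # Fall back to 0.0 when every value in this layer is missing.
            token.append(sum(observed) / len(observed) if observed else 0.0)
        tokens.append(token)
    return tokens

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention without learned
    projections: queries, keys and values are the tokens themselves."""
    d = len(tokens[0])
    scores = [
        [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in tokens]
        for query in tokens
    ]
    weights = [softmax(row) for row in scores]
    outputs = [
        [sum(w * value[c] for w, value in zip(row, tokens)) for c in range(d)]
        for row in weights
    ]
    return outputs, weights

# Toy input: gene -> values in two hypothetical omics layers (None = missing).
sample = {
    "TP53": [0.2, None],
    "MDM2": [0.4, 1.0],
    "KRAS": [1.5, 0.5],
    "BRAF": [None, 0.3],
}
# Toy pathway definitions; real models would use curated cancer pathways.
pathways = {
    "p53 signalling": ["TP53", "MDM2"],
    "MAPK signalling": ["KRAS", "BRAF"],
}

tokens = encode_pathways(sample, pathways, n_layers=2)
outputs, weights = self_attention(tokens)

for name, row in zip(pathways, weights):
    print(name, [round(w, 3) for w in row])
```

Each row of `weights` shows how strongly one pathway attends to every pathway, which is the kind of quantity a pathway-level explanation could be derived from.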

Date
Nov 25, 2021
Location
Virtual/Australia