Machine learning for multi-omics data integration in cancer

Abstract

Multi-omics data analysis is an important aspect of cancer molecular biology studies and has led to ground-breaking discoveries. Many efforts have been made to develop machine learning methods that automatically integrate omics data. Here, we review machine learning tools categorised as either general-purpose or task-specific, covering both supervised and unsupervised learning for integrative analysis of multi-omics data. We benchmark the performance of five machine learning approaches using data from the Cancer Cell Line Encyclopedia, reporting prediction accuracy on cancer type prediction and mean absolute error on drug response prediction, and evaluating runtime efficiency. This review provides recommendations to researchers regarding suitable machine learning method selection for their specific applications. It should also promote the development of novel machine learning methodologies for data integration, which will be essential for drug discovery, clinical trial design and personalised treatments.

Type
Publication
In iScience
Zhaoxiang (Simon) Cai