Research

Publications

10 peer-reviewed publications, 5 as first or co-first author, cited over 900 times in total.

2025

2025 · Cancer Discovery

Federated deep learning enables cancer subtyping by proteomics

First author

Zhaoxiang Cai, Emma L Boys, Zaynab Noor, …, Roger R Reddel

The proteome provides unique insights into disease biology beyond the genome and transcriptome. However, the sharing of raw proteomic data across institutions is hindered by privacy concerns and data volume. Here, we present a federated deep learning framework for cancer subtyping using mass spectrometry-based proteomic data. By training on distributed datasets without centralized data sharing, our approach achieves performance comparable to centralized training. We demonstrate the utility of this framework by classifying 14 cancer subtypes across 7,500 cancer proteomes from multiple centers. This work introduces the first application of federated deep learning to cancer proteomics, enabling collaborative research while preserving data privacy.

2025 · Nature Communications

Large-scale drug sensitivity, gene dependency, and proteogenomic analyses of telomere maintenance mechanisms in cancer cells

Yangxiu Wu, Zhaoxiang Cai, Daniel Cross, …, Karen L MacKenzie

Telomere maintenance is a hallmark of cancer. Here, we present a large-scale analysis of telomere maintenance mechanisms (TMMs) in 976 cancer cell lines. Integrating proteomic, genomic, and transcriptomic data with drug sensitivity and CRISPR-Cas9 gene essentiality screens, we identify molecular features associated with telomerase activity and Alternative Lengthening of Telomeres (ALT). We discover broad heterogeneity in telomere biology beyond the binary TMM classification and develop multi-omic predictors for TMM status. Our findings reveal potential therapeutic vulnerabilities linked to specific TMMs, providing a resource for developing telomere-targeted cancer therapies.

2025 · Briefings in Bioinformatics

A technical review of multi-omics data integration methods: from classical statistical to deep generative approaches

Ana R Baião, Zhaoxiang Cai, Rebecca C Poulos, …, Emanuel Gonçalves

The integration of multi-omics data is essential for understanding complex biological systems. This review provides a comprehensive overview of multi-omics data integration methods, ranging from classical statistical approaches to state-of-the-art deep generative models. We discuss the challenges associated with high dimensionality, heterogeneity, and missing data, and highlight the potential of Variational Autoencoders (VAEs) and other deep learning techniques for data imputation, augmentation, and joint embedding. The review also covers emerging trends such as foundation models and contrastive learning in the context of multi-omics integration.

2025 · Stem Cell Reports

MIXL1 activation in endoderm differentiation of human induced pluripotent stem cells

Pierre Osteil, Sarah Withey, Nicole Santucci, …, Patrick P L Tam

MIXL1 plays a critical role in endoderm differentiation. Here, we demonstrate that MIXL1 activation is a key determinant of the efficiency of definitive endoderm generation from human induced pluripotent stem cells (hiPSCs). By modulating MIXL1 expression, we show that lineage propensity can be re-wired, enhancing the differentiation potential toward endoderm lineages. This work provides insights for optimizing stem cell differentiation protocols for regenerative medicine applications.

2024

2024 · Cancer Research Communications

DeePathNet: a transformer-based deep learning model integrating multi-omic data with cancer pathways

First author

Zhaoxiang Cai, Rebecca C Poulos, Jianmin Liu, Qing Zhong

DeePathNet is a transformer-based deep learning model that integrates multi-omic data with biological pathway information. By embedding pathway knowledge directly into the model architecture, DeePathNet improves the interpretability and performance of cancer subtype classification and drug response prediction. We demonstrate the utility of DeePathNet on large-scale datasets, highlighting its ability to identify pathway-level biomarkers and mechanisms of action.

2024 · Nature Communications

Synthetic augmentation of cancer cell line multi-omic datasets using unsupervised deep learning

First author

Zhaoxiang Cai, Samuel Apolinário, Ana R Baião, …, Emanuel Gonçalves

We introduce MOSA (Multi-Omic Synthetic Augmentation), an unsupervised deep learning model for integrating and augmenting cancer multi-omics data. By leveraging variational autoencoders, MOSA generates synthetic multi-omic profiles that expand the effective sample size of cancer datasets, enabling the discovery of new biomarkers and drug targets. We demonstrate that MOSA-augmented data improves the power of association studies and clustering analyses, providing a valuable resource for the cancer research community.

2022

2022 · Cancer Cell

Pan-cancer proteomic map of 949 human cell lines reveals principles of cancer vulnerabilities

Co-first author

Zhaoxiang Cai, Emanuel Gonçalves, Rebecca C Poulos, …, Roger R Reddel

The proteome provides unique insights into disease biology beyond the genome and transcriptome. A lack of large proteomic datasets has restricted the identification of new cancer biomarkers. Here, proteomes of 949 cancer cell lines across 28 tissue types are analyzed by mass spectrometry. Deploying a workflow to quantify 8,498 proteins, these data capture evidence of cell-type and post-transcriptional modifications. Integrating multi-omics, drug response, and CRISPR-Cas9 gene essentiality screens with a deep learning-based pipeline reveals thousands of protein biomarkers of cancer vulnerabilities that are not significant at the transcript level. The power of the proteome to predict drug response is very similar to that of the transcriptome. Further, random downsampling to only 1,500 proteins has limited impact on predictive power, consistent with protein networks being highly connected and co-regulated. This pan-cancer proteomic map (ProCan-DepMapSanger) is a comprehensive resource available at https://cellmodelpassports.sanger.ac.uk.

2022 · iScience

Machine learning for multi-omics data integration in cancer

First author

Zhaoxiang Cai, Rebecca C Poulos, Jia Liu, Qing Zhong

Multi-omics data analysis is an important aspect of cancer molecular biology studies and has led to ground-breaking discoveries. Many efforts have been made to develop machine learning methods that automatically integrate omics data. Here, we review machine learning tools categorised as either general-purpose or task-specific, covering both supervised and unsupervised learning for integrative analysis of multi-omics data. We benchmark the performance of five machine learning approaches using data from the Cancer Cell Line Encyclopedia, reporting prediction accuracy on cancer type prediction and mean absolute error on drug response prediction, and evaluating runtime efficiency. This review provides recommendations to researchers regarding suitable machine learning method selection for their specific applications. It should also promote the development of novel machine learning methodologies for data integration, which will be essential for drug discovery, clinical trial design and personalised treatments.

2022 · Proteomics

Opportunities for pharmacoproteomics in biomarker discovery

Rebecca C Poulos, Zhaoxiang Cai, Phillip J Robinson, Roger R Reddel, Qing Zhong

Proteomic data are a uniquely valuable resource for drug response prediction and biomarker discovery because most drugs interact directly with proteins in target cells rather than with DNA or RNA. Recent advances in mass spectrometry and associated processing methods have enabled the generation of large-scale proteomic datasets. Here we review the significant opportunities that currently exist to combine large-scale proteomic data with drug-related research, a field termed pharmacoproteomics. We describe successful applications of drug response prediction using molecular data, with an emphasis on oncology. We focus on technical advances in data-independent acquisition mass spectrometry (DIA-MS) that can facilitate the discovery of protein biomarkers for drug responses, alongside the increased availability of big biomedical data. We spotlight new opportunities for machine learning in pharmacoproteomics, driven by the combination of these large datasets and improved high-performance computing. Finally, we explore the value of pre-clinical models for pharmacoproteomic studies and the accompanying challenges of clinical validation. We propose that pharmacoproteomics offers the potential for novel discovery and innovation within the cancer landscape.

2015

2015 · arXiv

HetFHMM: A novel approach to infer tumor heterogeneity using factorial Hidden Markov model

Gholamreza Haffari, Zhaoxiang Cai, Mohammad S Rahman, Ann E Nicholson

Cancer arises from successive rounds of mutations, which generate tumor cells with different genomic variations, i.e., clones. To assess drug responsiveness and guide therapy, it is necessary to identify the clones in a tumor sample accurately. Many methods have been developed to infer tumor heterogeneity, either by computing cellular prevalence and tumor phylogeny or by predicting the genotypes of mutations. All of these methods suffer from problems such as inaccurate computation of clonal frequencies and discarding clone-specific genotypes. In this paper, we propose a method called HetFHMM that infers tumor heterogeneity by predicting clone-specific genotypes and cellular prevalence. To infer clone-specific genotypes, we consider the presence of multiple mutations at any genomic location. We also tested our model on different simulated data. The results show that HetFHMM outperforms recent methods for inferring tumor heterogeneity. HetFHMM is therefore a novel approach in tumor heterogeneity research.

2014

2014 · Analytical Chemistry

Barcode-Like Paper Sensor for Smartphone Diagnostics: An Application of Blood Typing

Liyun Guan, Junfei Tian, Rong Cao, Miaosi Li, Zhaoxiang Cai, Wei Shen

This study introduced a barcode-like design into a paper-based blood typing device by integrating it with smartphone-based technology. Presenting a paper-based blood typing assay in a barcode-like pattern significantly enhanced the adaptability of the assay to smartphone technology. The fabrication of this device involved the use of a printing technique to define hydrophilic bar channels, which were treated with Anti-A, -B, and -D antibodies, respectively. These channels were then used to perform blood typing assays by introducing a blood sample. Blood type can be visually identified from the eluting lengths in the bar channels. A smartphone-based analytical application was designed to read the bar channels, analogous to scanning a barcode, interpret this information, and then report results to users. The proposed paper-based blood typing device is rapidly read by smartphones and easy for the user to operate. We envisage that the adaptation of paper-based devices to the widely accepted smartphone technology will increase the capability of paper-based diagnostics with rapid assay result interpretation, data storage, and transmission.