Skip to main content

This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

ArXiv logoLink to ArXiv
[Preprint]. 2024 Oct 31:arXiv:2410.23595v1. [Version 1]

Disentangling Interpretable Factors with Supervised Independent Subspace Principal Component Analysis

Jiayu Su, David A Knowles, Raul Rabadan
PMCID: PMC11581103  PMID: 39575118

Abstract

The success of machine learning models relies heavily on effectively representing high-dimensional data. However, ensuring data representations capture human-understandable concepts remains difficult, often requiring the incorporation of prior knowledge and decomposition of data into multiple subspaces. Traditional linear methods fall short in modeling more than one space, while more expressive deep learning approaches lack interpretability. Here, we introduce Supervised Independent Subspace Principal Component Analysis ($\texttt{sisPCA}$), a PCA extension designed for multi-subspace learning. Leveraging the Hilbert-Schmidt Independence Criterion (HSIC), $\texttt{sisPCA}$ incorporates supervision and simultaneously ensures subspace disentanglement. We demonstrate $\texttt{sisPCA}$'s connections with autoencoders and regularized linear regression and showcase its ability to identify and separate hidden data structures through extensive applications, including breast cancer diagnosis from image features, learning aging-associated DNA methylation changes, and single-cell analysis of malaria infection. Our results reveal distinct functional pathways associated with malaria colonization, underscoring the essentiality of explainable representation in high-dimensional data analysis.

Full Text Availability

The license terms selected by the author(s) for this preprint version do not permit archiving in PMC. The full text is available from the preprint server.

10 pages and 6 figures in the main text; To be published in the Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS 2024)


Articles from ArXiv are provided here courtesy of arXiv

RESOURCES