Author manuscript; available in PMC: 2021 Oct 19.
Published in final edited form as: Nat Biotechnol. 2021 Apr 19;39(8):1000–1007. doi: 10.1038/s41587-021-00867-x

Figure 1. Overview of the online iNMF algorithm.

a, Schematic of integrative nonnegative matrix factorization (iNMF): the input single-cell datasets are jointly decomposed into shared (W) and dataset-specific (V_i) metagenes and corresponding “metagene expression levels” or cell factor loadings (H_i). These metagenes and cell factor loadings provide a quantitative definition of cell identity and how it varies across biological settings. b-d, Three different scenarios in which online learning can be used for single-cell data integration. (b) Scenario 1: the single-cell datasets are large but fully observed. Online iNMF processes the data in random mini-batches, enabling memory usage and/or disk storage independent of dataset size. Each cell may be used multiple times in different epochs of training to update the metagenes. (c) Scenario 2: the datasets arrive sequentially, and online iNMF processes the datasets as they arrive, using each cell to update the metagenes exactly once. (d) Scenario 3: online iNMF is performed as in scenario 1 or scenario 2 to learn W and V_i. Then cell factor loadings for the newly arriving dataset are calculated using the shared metagenes (W) learned from previously processed datasets. The new dataset is not used to update the metagenes.
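
Scenario 3 in panel d can be illustrated with a minimal sketch: the shared metagenes W are held fixed and only the nonnegative cell factor loadings of the new dataset are estimated. The code below is an assumption-laden illustration and not the solver used in the paper. It substitutes Lee-Seung multiplicative updates for whatever nonnegative least squares routine the authors use, it assumes the layout X_new (cells × genes) ≈ H (cells × K) · W (K × genes), and the function names, matrix sizes and synthetic data are all hypothetical.

```python
import numpy as np


def project_onto_shared_metagenes(X_new, W, n_iter=200, eps=1e-10, seed=0):
    """Estimate nonnegative loadings H such that X_new ~ H @ W, with W fixed.

    Assumed shapes: X_new is (cells x genes), W is (K x genes).
    Uses multiplicative updates (a stand-in solver, not the paper's method).
    """
    rng = np.random.default_rng(seed)
    n_cells, k = X_new.shape[0], W.shape[0]
    H = rng.random((n_cells, k)) + 0.1  # strictly positive starting point
    XWt = X_new @ W.T                   # (cells x K), constant across iterations
    WWt = W @ W.T                       # (K x K), constant across iterations
    for _ in range(n_iter):
        # Multiplicative update keeps H nonnegative; W is never modified.
        H *= XWt / (H @ WWt + eps)
    return H


def reconstruction_error(X, H, W):
    """Squared Frobenius norm of the residual X - H @ W."""
    return float(np.linalg.norm(X - H @ W) ** 2)


# Synthetic stand-ins: W plays the role of metagenes learned earlier
# (scenario 1 or 2); X_new plays the role of the newly arriving dataset.
rng = np.random.default_rng(42)
n_genes, k = 50, 4
W = rng.random((k, n_genes))
H_true = rng.random((30, k))
X_new = H_true @ W

H_est = project_onto_shared_metagenes(X_new, W)
```

Because W is read but never written, the new cells are placed in the factor space defined by the previously processed datasets, which mirrors the caption's statement that the new dataset is not used to update the metagenes.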