Author manuscript; available in PMC: 2024 Mar 25.

Published in final edited form as: Nat Protoc. 2023 Nov 21;18(12):3690–3731. doi: 10.1038/s41596-023-00892-x

Table 2 | Key parameters for CoGAPS/PyCoGAPS and guidance on setting their values

Parameter	Description	Guide to Setting
path	Path to the input data	If providing a path rather than a data object, make sure the data are log-normalized
result_file	Name of the output result file (.h5ad)	Give this a descriptive name based on your data and run, such as PDACresult_50kiterations.h5ad
Standard parameters
nPatterns	Number of patterns CoGAPS will learn	The optimal number of patterns varies with the data, and finding it may require several runs with different values to see which features are learned. We recommend starting with a value that reflects the number of experimental conditions, cell types and/or biological processes expected in the data, as well as any technical batches present
nIterations	Number of iterations of each phase of the algorithm	A high number of iterations (e.g., 50,000) is recommended because it leads to better convergence. However, more iterations greatly increase runtime, so we encourage users to try several values to observe this tradeoff and choose an appropriate one
useSparseOptimization	Speeds up computation on sparse data	Set to true if the data are sparse, i.e., if roughly >80% of the entries are zero
Run parameters
nThreads	Maximum number of threads to run on. Allows the underlying algorithm to run on multiple threads and has no effect on the mathematics of the algorithm	The best number of threads depends on factors such as hardware and data size, so try several values and observe how each affects the estimated runtime. This setting is separate from the distributed CoGAPS parallelization mechanism, which parallelizes the computation in a different way
transposeData	Whether to transpose the data matrix before running CoGAPS	Set to true if the data are stored in samples × genes format (CoGAPS expects genes × samples by default)
Distributed parameters
distributed	Whether to run distributed CoGAPS and, if so, how to partition the data	Recommended in most cases for single-cell analysis. Set to ‘genome-wide’ for parallelization across genes, or ‘single-cell’ for parallelization across cells
nSets	Number of sets to break the data into	For distributed ‘genome-wide’ runs, choose a value that leaves at least 2,000 genes per set. For distributed ‘single-cell’ runs, make sure this value still gives every set sufficient representation of all cell types in the data
minNS	Minimum number of individual set contributions a cluster must contain	Avoid setting this value too high: although a higher value increases robustness, it may cause rare phenomena or cell types to be missed
maxNS	Maximum number of individual set contributions a cluster can contain	Modifying this parameter matters only for highly correlated processes
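The data-preparation guidance for path, useSparseOptimization and transposeData can be checked before launching a run. The sketch below is a minimal NumPy illustration, not part of CoGAPS or PyCoGAPS: the helper names are hypothetical, log2(x + 1) stands in for whatever log-normalization your pipeline actually uses, and the 80% zero cutoff is the rough threshold given in the table.

```python
import numpy as np

# Rough cutoff from the table: sparse optimization helps when >~80% of entries are zero.
SPARSE_ZERO_FRACTION = 0.8


def log_normalize(counts: np.ndarray) -> np.ndarray:
    """Apply log2(x + 1) to a non-negative count matrix.

    Assumption: a plain log2(x + 1) transform; many pipelines first
    normalize for library size, which is omitted here.
    """
    if np.any(counts < 0):
        raise ValueError("counts must be non-negative before log transformation")
    return np.log2(counts + 1.0)


def zero_fraction(matrix: np.ndarray) -> float:
    """Fraction of matrix entries that are exactly zero."""
    return float(np.count_nonzero(matrix == 0)) / matrix.size


def suggest_run_flags(matrix: np.ndarray, genes_on_rows: bool) -> dict:
    """Suggest useSparseOptimization and transposeData values (hypothetical helper)."""
    return {
        # Sparse optimization is worthwhile only when most entries are zero.
        "useSparseOptimization": zero_fraction(matrix) > SPARSE_ZERO_FRACTION,
        # CoGAPS expects genes x samples, so samples on rows means transpose.
        "transposeData": not genes_on_rows,
    }


if __name__ == "__main__":
    # Toy samples x genes matrix: 50 samples, 100 genes, 1% non-zero entries.
    counts = np.zeros((50, 100))
    counts[:5, :10] = 3.0
    data = log_normalize(counts)
    print(suggest_run_flags(data, genes_on_rows=False))
```

Because log2(0 + 1) = 0, the transform leaves zeros in place, so the sparsity check gives the same answer before or after log-normalization.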
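The arithmetic behind the nSets guidance for distributed runs can be sketched as follows. The helpers are hypothetical and assume the data are split into sets of roughly equal size; the per-set cell threshold for ‘single-cell’ runs is a value you would choose yourself, since the table gives no number.

```python
from collections import Counter

# From the table: in 'genome-wide' mode, keep at least 2,000 genes per set.
MIN_GENES_PER_SET = 2000


def max_genome_wide_sets(n_genes: int, min_genes_per_set: int = MIN_GENES_PER_SET) -> int:
    """Largest nSets that keeps every gene set at or above the minimum size.

    Assumes genes are split into sets of roughly equal size.
    """
    if n_genes < min_genes_per_set:
        raise ValueError("too few genes to keep the minimum set size")
    return n_genes // min_genes_per_set


def underrepresented_cell_types(labels, n_sets: int, min_cells_per_set: int) -> list:
    """Cell types whose expected cells per set (count / n_sets) fall below a threshold.

    min_cells_per_set is a hypothetical, user-chosen threshold; the
    expected count assumes cells are spread evenly across sets.
    """
    counts = Counter(labels)
    return sorted(t for t, c in counts.items() if c / n_sets < min_cells_per_set)


if __name__ == "__main__":
    print(max_genome_wide_sets(20000))
    labels = ["T cell"] * 500 + ["B cell"] * 200 + ["rare"] * 30
    print(underrepresented_cell_types(labels, n_sets=5, min_cells_per_set=10))
```

A random partition can still leave an individual set short of a rare cell type, so the expected count is only a first check; lowering nSets is the direct way to raise it.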
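Because nPatterns and nIterations usually need several trial runs, it can help to enumerate those runs up front and give each a descriptive result_file name in the spirit of PDACresult_50kiterations.h5ad. This hypothetical sketch only builds parameter dictionaries keyed by the table's parameter names; it does not call CoGAPS, and the naming scheme is an illustration.

```python
from itertools import product


def iteration_label(n_iterations: int) -> str:
    """Format 50000 as '50k'; leave counts that are not multiples of 1,000 as-is."""
    if n_iterations % 1000 == 0:
        return f"{n_iterations // 1000}k"
    return str(n_iterations)


def build_run_grid(dataset: str, n_patterns_values, n_iterations_values) -> list:
    """Return one parameter dict per (nPatterns, nIterations) combination."""
    runs = []
    for n_patterns, n_iterations in product(n_patterns_values, n_iterations_values):
        runs.append({
            "nPatterns": n_patterns,
            "nIterations": n_iterations,
            # Descriptive output name encoding the dataset and run settings.
            "result_file": (
                f"{dataset}result_{n_patterns}patterns_"
                f"{iteration_label(n_iterations)}iterations.h5ad"
            ),
        })
    return runs


if __name__ == "__main__":
    for run in build_run_grid("PDAC", [5, 10], [10000, 50000]):
        print(run["result_file"])
```

Starting with a short iteration count across several pattern numbers, then rerunning the most promising values at a higher count, is one way to manage the runtime tradeoff described for nIterations.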