Author manuscript; available in PMC: 2024 Mar 25.
Published in final edited form as: Nat Protoc. 2023 Nov 21;18(12):3690–3731. doi: 10.1038/s41596-023-00892-x

Table 2 | Key parameters for CoGAPS/PyCoGAPS and guidance on setting their values

Parameter | Description | Guide to setting
path | Path to data | Make sure the data are log-normalized if providing a path rather than a data object
result_file | Name of the result .h5ad file to output | Give this a descriptive name based on your data and run, such as PDACresult_50kiterations.h5ad
Standard parameters
nPatterns | Number of patterns CoGAPS will learn | The optimal number of patterns varies with your data, and finding it may require several runs with different values to examine the learned features. We recommend starting with a value that reflects the number of experimental conditions, cell types and/or biological processes expected in your data, as well as any technical batches present
nIterations | Number of iterations of each phase of the algorithm | A higher number of iterations (e.g., 50,000) is recommended, as it leads to better convergence. However, more iterations greatly increase runtime, so we encourage users to try several values, observe the tradeoff and determine the appropriate value
useSparseOptimization | Speeds up performance with sparse data | Set to true if using sparse data, i.e., if roughly >80% of the data is zero
Run parameters
nThreads | Maximum number of threads to run on. Allows the underlying algorithm to run on multiple threads and has no effect on the mathematics of the algorithm | The precise number of threads to use depends on many factors, such as hardware and data size. The best approach is to try different values and see how each affects the estimated runtime. This is separate from the distributed CoGAPS parallelization mechanism, which sets up multithreaded computing in a different way
transposeData | Whether to transpose the data matrix before running CoGAPS | Set to true if the data are stored in samples × genes format (CoGAPS defaults to genes × samples format)
Distributed parameters
distributed | Whether to run distributed | Recommended in most cases for single-cell analysis. Set to ‘genome-wide’ for parallelization across genes, or ‘single-cell’ for parallelization across cells
nSets | Number of sets to break data into | With distributed set to ‘genome-wide’, choose a value that leaves no fewer than 2,000 genes per set. With distributed set to ‘single-cell’, make sure the value leaves sufficient representation of all cell types in the data
minNS | Minimum number of individual set contributions a cluster must contain | Be cautious about setting this value too high: increasing robustness may also cause rare phenomena or cells to be missed
maxNS | Maximum number of individual set contributions a cluster can contain | Modifying this parameter matters only for highly correlated processes
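
Two of the rules of thumb in the table above are numeric and can be checked before a run: the roughly >80% zero threshold for useSparseOptimization and the 2,000 genes-per-set floor for nSets in ‘genome-wide’ mode. The following is a minimal Python sketch of those checks. The helper names (zero_fraction, max_genome_wide_sets, suggest_params) are hypothetical and are not part of CoGAPS or PyCoGAPS; the sketch only builds a plain dictionary keyed by the parameter names in the table, and it assumes the data are split evenly across sets.

```python
# Illustrative helpers only: none of these functions belong to CoGAPS or PyCoGAPS.

SPARSITY_THRESHOLD = 0.80  # table: "roughly >80% of data is zero"
MIN_GENES_PER_SET = 2000   # table: genome-wide floor on genes per set


def zero_fraction(matrix):
    """Return the fraction of entries in a 2-D list of numbers that are zero."""
    total = sum(len(row) for row in matrix)
    if total == 0:
        raise ValueError("matrix has no entries")
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total


def max_genome_wide_sets(n_genes, min_genes_per_set=MIN_GENES_PER_SET):
    """Largest nSets for which an even split keeps >= min_genes_per_set genes per set."""
    return max(1, n_genes // min_genes_per_set)


def suggest_params(matrix, genes_in_rows=True, n_sets=None):
    """Map table parameter names to suggested values for a given matrix.

    n_sets is the requested number of genome-wide sets; None means no distribution.
    """
    n_genes = len(matrix) if genes_in_rows else len(matrix[0])
    params = {
        "useSparseOptimization": zero_fraction(matrix) > SPARSITY_THRESHOLD,
        # CoGAPS defaults to genes x samples, so transpose only otherwise.
        "transposeData": not genes_in_rows,
    }
    if n_sets is not None:
        allowed = max_genome_wide_sets(n_genes)
        if allowed >= 2:  # too few genes to split without breaking the floor
            params["distributed"] = "genome-wide"
            params["nSets"] = min(n_sets, allowed)
    return params


toy = [[0, 0, 0, 0, 5], [0, 0, 0, 0, 0]]  # 2 genes x 5 samples, 90% zeros
print(suggest_params(toy))
```

With the toy matrix, the sketch flags the data as sparse and leaves distribution off, because two genes cannot be split into sets of 2,000. The remaining parameters (nPatterns, nIterations, nThreads, minNS, maxNS) depend on the biology and hardware, so they are deliberately left out of the heuristic.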