Author manuscript; available in PMC: 2025 Jul 28.
Published in final edited form as: J Open Source Softw. 2024 Jul 26;9(99):10.21105/joss.06389. doi: 10.21105/joss.06389

SSN2: The next generation of spatial stream network modeling in R

Michael Dumelle 1, Erin E Peterson 2, Jay M Ver Hoef 3, Alan Pearse 4, Daniel J Isaak 5
PMCID: PMC11346377  NIHMSID: NIHMS2017548  PMID: 39193024

Summary

The SSN2 R package provides tools for spatial statistical modeling, parameter estimation, and prediction on stream (river) networks. SSN2 is the successor to the SSN R package (Ver Hoef, Peterson, Clifford, & Shah, 2014), which was archived alongside broader changes in the R-spatial ecosystem (Nowosad, 2023) that included 1) the retirement of rgdal (Bivand, Keitt, & Rowlingson, 2021), rgeos (Bivand & Rundel, 2020), and maptools (Bivand & Lewin-Koh, 2021) and 2) the lack of active development of sp (Bivand, Pebesma, & Gómez-Rubio, 2013). SSN2 maintains compatibility with the input data file structures used by the SSN R package but leverages modern R-spatial tools like sf (Pebesma, 2018). SSN2 also provides many useful features that were not available in the SSN R package, including new modeling and helper functions, enhanced fitting algorithms, and simplified syntax consistent with other R generic functions.

Statement of Need

Streams provide vital aquatic services that sustain wildlife, provide drinking and irrigation water, and support recreational and cultural activities. Data are often collected at various locations on a stream network and used to characterize spatial patterns in stream phenomena. For example, a manager may need to know how the amount of a hazardous chemical changes throughout a stream network to inform mitigation efforts. Comprehensive formulations of spatial stream network (SSN) models are provided by Ver Hoef & Peterson (2010), Peterson & Ver Hoef (2010), and Ver Hoef et al. (2014). The SSN2 R package is designed to help users fit SSN models to their stream network data.

SSN models use a spatial statistical modeling framework (e.g., Cressie, 1993) to describe unique and complex dependencies on a stream network resulting from a branching network structure, directional water flow, and differences in flow volume. These SSN models relate a continuous or discrete response variable to one or more explanatory variables, a spatially independent random error term, and up to three spatially dependent random error terms: tail-up random errors, tail-down random errors, and Euclidean random errors. Tail-up random errors restrict spatial dependence to flow-connected sites (i.e., water flows from an upstream to a downstream site) and incorporate spatial weights through an additive function to describe the branching network between sites. Tail-down random errors describe spatial dependence between both flow-connected and flow-unconnected sites (i.e., sites that share a common downstream junction but not flow), but spatial weights are not required. Euclidean random errors describe spatial dependence between sites based on straight-line distance and are governed by factors not confined to the stream network, such as regional geology. The variances and the length-scales of spatial dependence in the tail-up, tail-down, and Euclidean random errors are controlled by separate variance (i.e., partial sill) and range parameters, respectively, while the spatially independent variance (i.e., nugget) is controlled by another separate variance parameter. In this paper, we show how to use the SSN2 R package to fit SSN models, inspect SSN models, and use SSN models to make predictions at unobserved locations on a stream network.
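For a continuous response, the structure described above can be sketched in symbols as follows. The notation is an assumption introduced here for illustration and does not appear in the original text: y is the response vector, X the matrix of explanatory variables with coefficients beta, the tau terms are the tail-up (tu), tail-down (td), and Euclidean (eu) random errors, and epsilon is the spatially independent error. The random error terms are assumed to be mutually independent, and each correlation matrix R depends on its own range parameter.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta}
  + \boldsymbol{\tau}_{tu} + \boldsymbol{\tau}_{td} + \boldsymbol{\tau}_{eu}
  + \boldsymbol{\epsilon}, \\
\operatorname{Cov}(\mathbf{y}) &=
  \sigma^2_{tu}\mathbf{R}_{tu} + \sigma^2_{td}\mathbf{R}_{td}
  + \sigma^2_{eu}\mathbf{R}_{eu} + \sigma^2_{ie}\mathbf{I}.
\end{aligned}
```

Here the three partial sills are \sigma^2_{tu}, \sigma^2_{td}, and \sigma^2_{eu}, and the nugget is \sigma^2_{ie}.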

Package Overview

The streams, observation, and prediction datasets must be pre-processed before fitting SSN models and making predictions at unobserved locations using SSN2. Previously, the STARS toolset for ArcGIS Desktop versions 9.3x - 10.8x (Peterson & Ver Hoef, 2014) or the openSTARS R package (Kattwinkel, Szöcs, Peterson, & Schäfer, 2020) was used to generate the spatial information required for model fitting and prediction. However, both have recently been retired and replaced by the SSNbler R package (Peterson, Dumelle, Pearse, Teleki, & Ver Hoef, 2024), a new, R-based version of the STARS tools. SSNbler is currently available on GitHub, will soon be available on CRAN, and contains several useful resources that guide users through these pre-processing steps. Pre-processing with SSNbler, STARS, or openSTARS ends with the creation of a .ssn folder, which is non-proprietary. Files residing in the .ssn folder are read into R using ssn_import() from SSN2 and placed into a list structure called an SSN object, which contains all the spatial, topological, and attribute information needed to leverage the modeling tools in SSN2.

SSN2 is first installed from CRAN:

install.packages("SSN2")

Then, SSN2 is loaded into an R session:

library(SSN2)

The SSN2 package comes with an example .ssn folder called MiddleFork04.ssn that represents water temperatures recorded from a stream network in the Middle Fork of the Salmon River in Idaho, USA during 2004.

Several functions in SSN2 for reading and writing data directly manipulate the .ssn folder. To avoid directly manipulating the MiddleFork04.ssn data installed alongside SSN2, MiddleFork04.ssn is instead copied into a temporary directory and the relevant path to this directory stored:

copy_lsn_to_temp()
path <- file.path(tempdir(), "MiddleFork04.ssn")

The copy_lsn_to_temp() function is only needed when working with MiddleFork04.ssn; in general, path should point to the permanent location of your own .ssn folder. After specifying path, the stream reaches, observed sites, and prediction sites (pred1km) are imported and then visualized (Figure 1):

Figure 1: Middle Fork 2004 stream networks. Observed sites are represented by brown, closed circles at various locations throughout the stream network. Prediction sites are represented by blue, closed triangles and are spaced one kilometer apart.

mf04p <- ssn_import(path, predpts = "pred1km")
library(ggplot2)
ggplot() +
  geom_sf(data = mf04p$edges) +
  geom_sf(data = mf04p$preds$pred1km, pch = 17, color = "blue") +
  geom_sf(data = mf04p$obs, color = "brown", size = 2) +
  theme_bw()

Prior to statistical modeling, hydrologic distance matrices are created (Ver Hoef & Peterson, 2010):

ssn_create_distmat(mf04p, predpts = "pred1km", overwrite = TRUE)

Of particular interest here is summer mean stream temperature (Summer_mn) in degrees Celsius, which will be modeled as a function of elevation (ELEV_DEM) and watershed-averaged precipitation (AREAWTMAP) with exponential, spherical, and Gaussian structures for the tail-up, tail-down, and Euclidean errors, respectively, and a nugget effect (by default). Using ssn_lm(), the model is fit:

ssn_mod <- ssn_lm(
  formula = Summer_mn ~ ELEV_DEM + AREAWTMAP,
  ssn.object = mf04p,
  tailup_type = "exponential",
  taildown_type = "spherical",
  euclid_type = "gaussian",
  additive = "afvArea"
)

The additive argument represents an “additive function value (AFV)” variable that captures branching in the stream network and is required when modeling the tail-up covariance. Cumulative watershed area is commonly used to derive the additive function value (here, afvArea represents cumulative watershed area), but other variables like flow can be used (if every line feature in the edges dataset contains a non-null value). Ver Hoef & Peterson (2010) provide further details regarding additive function values.
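To make the additive function value more concrete, the following base R sketch computes AFVs for a small, hypothetical five-reach network. It does not use SSN2 or the MiddleFork04 data. The network layout, the area values, and the names edges, afv_of(), and tailup_weight() are all assumptions made for illustration, and the construction shown (each reach's share of the area entering its confluence, multiplied downstream to the outlet) is one common way of deriving AFVs rather than a description of SSN2 internals.

```r
# Hypothetical five-reach network (not MiddleFork04): reaches A and B join to
# form C, reaches C and D join to form E, and E is the outlet.
edges <- data.frame(
  id   = c("A", "B", "C", "D", "E"),
  down = c("C", "C", "E", "E", NA),  # reach immediately downstream
  area = c(10, 30, 45, 15, 65),      # cumulative watershed area (made-up values)
  stringsAsFactors = FALSE
)

# Proportional influence: each reach's share of the area entering its confluence.
inflow <- tapply(edges$area, edges$down, sum)
edges$prop <- ifelse(
  is.na(edges$down), 1,
  edges$area / as.numeric(inflow[edges$down])
)

# AFV: product of proportional influences from a reach down to the outlet.
afv_of <- function(id) {
  reach <- edges[edges$id == id, ]
  if (is.na(reach$down)) return(1)  # the outlet reach has AFV 1
  reach$prop * afv_of(reach$down)
}
edges$afv <- vapply(edges$id, afv_of, numeric(1), USE.NAMES = FALSE)

# Tail-up spatial weight between two flow-connected reaches:
# the square root of the ratio of upstream to downstream AFV.
tailup_weight <- function(upstream, downstream) {
  sqrt(edges$afv[edges$id == upstream] / edges$afv[edges$id == downstream])
}

print(edges)
tailup_weight("A", "C")  # 0.5
```

In this toy network the AFVs of the reaches entering a confluence sum to the AFV of the reach leaving it (0.1875 + 0.5625 = 0.75 at the A-B confluence), which is the sense in which the function is additive.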

The ssn_lm() function is designed to be similar in syntax and structure to the lm() function in base R for fitting nonspatial linear models. Additionally, SSN2 accommodates various S3 methods for commonly-used R generic functions that operate on model objects. For example, the generic function summary() is used to summarize the fitted model:

[Images in original: summary() call and its console output]

SSN2 methods for the tidy(), glance(), and augment() generic functions from the broom R package (Robinson, Hayes, & Couch, 2021) are used to inspect the fitted model and provide diagnostics:

[Images in original: tidy(), glance(), and augment() calls and their console output]

Specific generic helper functions (e.g., coef(), AIC(), residuals()) can be used to obtain the same quantities returned by tidy(), glance(), and augment():

[Image in original: helper function calls and their console output]
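The original calls were rendered as images and cannot be recovered from this text. The sketch below shows what such calls might look like, assuming the model fitted earlier and assuming that tidy(), glance(), and augment() are available once SSN2 is attached. The object names tidy_out, glance_out, and aug_out are introduced here for illustration. The setup lines repeat the earlier steps only so that the block runs on its own.

```r
library(SSN2)

# Rebuild the objects from the preceding steps so that this block runs on its own.
copy_lsn_to_temp()
path <- file.path(tempdir(), "MiddleFork04.ssn")
mf04p <- ssn_import(path, predpts = "pred1km")
ssn_create_distmat(mf04p, predpts = "pred1km", overwrite = TRUE)
ssn_mod <- ssn_lm(
  formula = Summer_mn ~ ELEV_DEM + AREAWTMAP,
  ssn.object = mf04p,
  tailup_type = "exponential",
  taildown_type = "spherical",
  euclid_type = "gaussian",
  additive = "afvArea"
)

# broom-style summaries of the fitted model
tidy_out <- tidy(ssn_mod, conf.int = TRUE)  # fixed-effect estimates with intervals
glance_out <- glance(ssn_mod)               # one-row summary of model fit
aug_out <- augment(ssn_mod)                 # observed data plus fitted values and diagnostics

# Generic helper functions that return the same kinds of quantities
coef(ssn_mod)             # compare with tidy_out$estimate
AIC(ssn_mod)              # compare with glance_out$AIC
head(residuals(ssn_mod))  # compare with aug_out$.resid
```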

Spatial prediction (i.e., Kriging) at the unobserved sites is performed using the generic functions predict() or augment():

[Image in original: prediction call and its console output]
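The prediction call was also rendered as an image, yet the plotting code that follows relies on an object named aug_pred. A plausible reconstruction, offered as a sketch rather than the authors' exact code, is shown here. It assumes that newdata accepts the name of the prediction dataset ("pred1km") and that interval = "prediction" requests prediction intervals. The name preds is introduced for illustration, and the setup lines repeat the earlier steps only so that the block runs on its own.

```r
library(SSN2)

# Rebuild the objects from the preceding steps so that this block runs on its own.
copy_lsn_to_temp()
path <- file.path(tempdir(), "MiddleFork04.ssn")
mf04p <- ssn_import(path, predpts = "pred1km")
ssn_create_distmat(mf04p, predpts = "pred1km", overwrite = TRUE)
ssn_mod <- ssn_lm(
  formula = Summer_mn ~ ELEV_DEM + AREAWTMAP,
  ssn.object = mf04p,
  tailup_type = "exponential",
  taildown_type = "spherical",
  euclid_type = "gaussian",
  additive = "afvArea"
)

# Predictions returned alongside the prediction-site data and geometry
aug_pred <- augment(ssn_mod, newdata = "pred1km", interval = "prediction")
head(aug_pred[, c(".fitted", ".lower", ".upper")])

# The same predictions without the accompanying data
preds <- predict(ssn_mod, newdata = "pred1km", interval = "prediction")
head(preds)
```

Because augment() keeps the site geometry with the predictions, its output can be passed directly to geom_sf(), whereas the output of predict() would first need to be joined back to the prediction sites.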

Here, .fitted are the predictions, .lower are the lower bounds of 95% prediction intervals, and .upper are the upper bounds of 95% prediction intervals. Utilizing augment() makes the prediction output straightforward to visualize:

ggplot() +
  geom_sf(data = mf04p$edges) +
  geom_sf(data = aug_pred, aes(color = .fitted), size = 2) +
  scale_color_viridis_c(name = "Pred.", option = "H") +
  theme_bw()

Spatial generalized linear models for binary, count, proportion, and skewed data (Ver Hoef et al., 2024) are applied to stream networks via the ssn_glm() function. ssn_lm() and ssn_glm() also accommodate several advanced features, which include nonspatial random effects as in lme4 (Bates, Mächler, Bolker, & Walker, 2015) and nlme (Pinheiro & Bates, 2006), Euclidean anisotropy (Zimmerman & Ver Hoef, 2024), and more. In addition to modeling, simulating data on a stream network is performed via ssn_simulate().
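As a sketch of how ssn_glm() might be called for count data, the block below fits a Poisson model. It rests on several assumptions not stated in the text: that the MiddleFork04 observations include a count variable named C16, that family = "poisson" is an accepted value, and that "epa" and "mariah" are valid tail-up and tail-down covariance types. The name ssn_gmod is introduced for illustration, and the setup lines repeat the earlier steps only so that the block runs on its own.

```r
library(SSN2)

# Rebuild the SSN object and distance matrices so that this block runs on its own.
copy_lsn_to_temp()
path <- file.path(tempdir(), "MiddleFork04.ssn")
mf04p <- ssn_import(path, predpts = "pred1km")
ssn_create_distmat(mf04p, predpts = "pred1km", overwrite = TRUE)

# Poisson spatial GLM for a count response (C16 is assumed to be a count column)
ssn_gmod <- ssn_glm(
  formula = C16 ~ ELEV_DEM + AREAWTMAP,
  family = "poisson",
  ssn.object = mf04p,
  tailup_type = "epa",
  taildown_type = "mariah",
  additive = "afvArea"
)

summary(ssn_gmod)
```

The call mirrors ssn_lm() apart from the family argument, so the same generic functions used above for the linear model should apply to the fitted object.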

Discussion

SSN models are valuable tools for statistical analysis of data collected on stream networks and help improve inference about vital stream ecosystems. These models have been employed to better understand and manage water quality (McManus et al., 2020; Scown, McManus, Carson Jr, & Nietch, 2017), ecosystem metabolism (Rodríguez-Castillo, Estévez, González-Ferreras, & Barquín, 2019), and climate change impacts on freshwater ecosystems (Isaak, Wenger, et al., 2017; Ruesch et al., 2012), as well as generate aquatic population estimates (Isaak, Ver Hoef, Peterson, Horan, & Nagel, 2017), inform conservation planning (Rodríguez-González et al., 2019; Sharma, Dubey, Johnson, Rawal, & Sivakumar, 2021), and assess restoration activities (Fuller, Leinenbach, Detenbeck, Labiosa, & Isaak, 2022), among other applications. The breadth and applicability of SSN models are further enhanced by data aggregation tools like the National Hydrography Dataset (McKay et al., 2012), National Stream Internet Project (Nagel, Peterson, Isaak, Ver Hoef, & Horan, 2015), and StreamCat (Hill, Weber, Leibowitz, Olsen, & Thornbrugh, 2016).

There are several spatial modeling packages in R, including geoR (Ribeiro Jr et al., 2022), gstat (Pebesma, 2004), FRK (Sainsbury-Dale, Zammit-Mangion, & Cressie, 2024), fields (Nychka, Furrer, Paige, & Sain, 2021), R-INLA (Lindgren & Rue, 2015), and spmodel (Dumelle, Higham, & Ver Hoef, 2023), among others. However, these packages do not account for the unique spatial relationships found in data collected on stream networks. The rtop (Skoien et al., 2014), VAST (Charsley et al., 2023), and SSN2 R packages can be used to describe spatial stream network data in R, but SSN2 is unique: it not only provides representations of stream network data in R but also provides an extensive suite of functions for model fitting, diagnostics, and spatial prediction that integrate with the popular “tidy” framework (Kuhn & Silge, 2022; Wickham et al., 2019). To learn more about SSN2, visit the CRAN webpage at https://CRAN.R-project.org/package=SSN2.

Figure 2: Predicted Middle Fork 2004 mean summer temperatures (Celsius) spaced one kilometer apart. As expected, temperature is predicted to be lower in areas of higher elevation.

Acknowledgements

Figures were created using ggplot2 (Wickham, 2016) and the viridis color palettes (Garnier et al., 2024).

We sincerely thank the editor and reviewers for their helpful feedback, which greatly improved both the software and the manuscript.

The views expressed in this manuscript are those of the authors and do not necessarily represent the views or policies of USEPA, NOAA, or USFS. Any mention of trade names, products, or services does not imply an endorsement by the U.S. government, USEPA, NOAA, or USFS. USEPA, NOAA, or USFS do not endorse any commercial products, services or enterprises.

References

  1. Bates D, Mächler M, Bolker B, & Walker S (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. doi: 10.18637/jss.v067.i01
  2. Bivand R, Keitt T, & Rowlingson B (2021). rgdal: Bindings for the ‘geospatial’ data abstraction library. Retrieved from https://CRAN.R-project.org/package=rgdal
  3. Bivand R, & Lewin-Koh N (2021). maptools: Tools for handling spatial objects. Retrieved from https://CRAN.R-project.org/package=maptools
  4. Bivand R, Pebesma E, & Gómez-Rubio V (2013). Applied spatial data analysis with R. Springer, NY. doi: 10.1007/978-1-4614-7618-4
  5. Bivand R, & Rundel C (2020). rgeos: Interface to geometry engine - open source (‘GEOS’). Retrieved from https://CRAN.R-project.org/package=rgeos
  6. Charsley AR, Grüss A, Thorson JT, Rudd MB, Crow SK, David B, Williams EK, et al. (2023). Catchment-scale stream network spatio-temporal models, applied to the freshwater stages of a diadromous fish species, longfin eel (Anguilla dieffenbachii). Fisheries Research, 259, 106583. doi: 10.1016/j.fishres.2022.106583
  7. Cressie N (1993). Statistics for spatial data (revised edition). Wiley: Hoboken, NJ. doi: 10.1002/9781119115151
  8. Dumelle M, Higham M, & Ver Hoef JM (2023). spmodel: Spatial statistical modeling and prediction in R. PLOS ONE, 18(3), 1–32. doi: 10.1371/journal.pone.0282524
  9. Fuller MR, Leinenbach P, Detenbeck NE, Labiosa R, & Isaak DJ (2022). Riparian vegetation shade restoration and loss effects on recent and future stream temperatures. Restoration Ecology, 30(7), e13626. doi: 10.1111/rec.13626
  10. Garnier Simon, Ross Noam, Rudis Robert, Camargo, et al. (2024). viridis(Lite) - colorblind-friendly color maps for R. doi: 10.5281/zenodo.4679423
  11. Hill RA, Weber MH, Leibowitz SG, Olsen AR, & Thornbrugh DJ (2016). The stream-catchment (StreamCat) dataset: A database of watershed metrics for the conterminous United States. JAWRA Journal of the American Water Resources Association, 52(1), 120–128. doi: 10.1111/1752-1688.12372
  12. Isaak DJ, Ver Hoef JM, Peterson EE, Horan DL, & Nagel DE (2017). Scalable population estimates using spatial-stream-network (SSN) models, fish density surveys, and national geospatial database frameworks for streams. Canadian Journal of Fisheries and Aquatic Sciences, 74(2), 147–156. doi: 10.1139/cjfas-2016-0247
  13. Isaak DJ, Wenger SJ, Peterson EE, Ver Hoef JM, Nagel DE, Luce CH, Hostetler SW, et al. (2017). The NorWeST summer stream temperature model and scenarios for the western US: A crowd-sourced database and new geospatial tools foster a user community and predict broad climate warming of rivers and streams. Water Resources Research, 53(11), 9181–9205. doi: 10.1002/2017WR020969
  14. Kattwinkel M, Szöcs E, Peterson EE, & Schäfer RB (2020). Preparing GIS data for analysis of stream monitoring data: The R package openSTARS. PLOS ONE, 15(9), e0239237. doi: 10.1371/journal.pone.0239237
  15. Kuhn M, & Silge J (2022). Tidy modeling with R. O’Reilly Media, Inc.
  16. Lindgren F, & Rue H (2015). Bayesian spatial modelling with R-INLA. Journal of Statistical Software, 63(19). doi: 10.18637/jss.v063.i19
  17. McKay L, Bondelid T, Dewald T, Johnston J, Moore R, & Reah A (2012). NHD-Plus version 2: User guide. Retrieved from http://www.horizon-systems.com/NHDPlus/NHDPlusV2_home.php
  18. McManus MG, D’Amico E, Smith EM, Polinsky R, Ackerman J, & Tyler K (2020). Variation in stream network relationships and geospatial predictions of watershed conductivity. Freshwater Science, 39(4), 704–721. doi: 10.1086/710340
  19. Nagel D, Peterson EE, Isaak DJ, Ver Hoef JM, & Horan D (2015). National stream internet protocol and user guide. US Forest Service, Rocky Mountain Research Station Air, Water, and Aquatic Environments Program. Retrieved from https://research.fs.usda.gov/sites/default/files/2023-03/rmrs-nationalstreaminternetprotocolanduserguide.pdf
  20. Nowosad J (2023, June 4). Upcoming Changes to Popular R Packages for Spatial Data: What You Need to Do. Retrieved from https://geocompx.org//post/2023/rgdal-retirement
  21. Nychka D, Furrer R, Paige J, & Sain S (2021). fields: Tools for spatial data. Boulder, CO, USA: University Corporation for Atmospheric Research. doi: 10.32614/CRAN.package.fields
  22. Pebesma E (2004). Multivariable geostatistics in S: The gstat package. Computers & Geosciences, 30, 683–691. doi: 10.1016/j.cageo.2004.03.012
  23. Pebesma E (2018). Simple Features for R: Standardized Support for Spatial Vector Data. The R Journal, 10(1), 439–446. doi: 10.32614/RJ-2018-009
  24. Peterson EE, Dumelle M, Pearse A, Teleki D, & Ver Hoef JM (2024). SSNbler: Assemble SSN objects in R. Retrieved from https://github.com/pet221/SSNbler
  25. Peterson EE, & Ver Hoef JM (2010). A mixed-model moving-average approach to geostatistical modeling in stream networks. Ecology, 91(3), 644–651. doi: 10.1890/08-1668.1
  26. Peterson EE, & Ver Hoef JM (2014). STARS: An ArcGIS toolset used to calculate the spatial information needed to fit spatial statistical models to stream network data. Journal of Statistical Software, 56, 1–17. doi: 10.18637/jss.v056.i02
  27. Pinheiro J, & Bates D (2006). Mixed-effects models in S and S-PLUS. Springer Science & Business Media: New York, NY.
  28. Ribeiro PJ Jr, Diggle P, Christensen O, Schlather M, Bivand R, & Ripley B (2022). geoR: Analysis of geostatistical data. doi: 10.32614/CRAN.package.geoR
  29. Robinson D, Hayes A, & Couch S (2021). broom: Convert statistical objects into tidy tibbles. doi: 10.32614/CRAN.package.broom
  30. Rodríguez-Castillo T, Estévez E, González-Ferreras AM, & Barquín J (2019). Estimating ecosystem metabolism to entire river networks. Ecosystems, 22, 892–911. doi: 10.1007/s10021-018-0311-8
  31. Rodríguez-González PM, García C, Albuquerque A, Monteiro-Henriques T, Faria C, Guimarães JB, Mendonça D, et al. (2019). A spatial stream-network approach assists in managing the remnant genetic diversity of riparian forests. Scientific Reports, 9(1), 6741. doi: 10.1038/s41598-019-43132-7
  32. Ruesch AS, Torgersen CE, Lawler JJ, Olden JD, Peterson EE, Volk CJ, & Lawrence DJ (2012). Projected climate-induced habitat loss for salmonids in the John Day River Network, Oregon, USA. Conservation Biology, 26(5), 873–882. doi: 10.1111/j.1523-1739.2012.01897.x
  33. Sainsbury-Dale M, Zammit-Mangion A, & Cressie N (2024). Modeling big, heterogeneous, non-gaussian spatial and spatio-temporal data using FRK. Journal of Statistical Software, 108, 1–39. doi: 10.18637/jss.v108.i10
  34. Scown MW, McManus MG, Carson JH Jr, & Nietch CT (2017). Improving predictive models of in-stream phosphorus concentration based on nationally-available spatial data coverages. Journal of the American Water Resources Association, 53(4), 944–960. doi: 10.1111/1752-1688.12543
  35. Sharma A, Dubey VK, Johnson JA, Rawal YK, & Sivakumar K (2021). Dendritic prioritization through spatial stream network modeling informs targeted management of Himalayan riverscapes under brown trout invasion. Journal of Applied Ecology, 58(11), 2415–2426. doi: 10.1111/1365-2664.13997
  36. Skoien JO, Bloschl G, Laaha G, Pebesma E, Parajka J, & Viglione A (2014). rtop: An R package for interpolation of data with a variable spatial support, with an example from river networks. Computers & Geosciences. doi: 10.1016/j.cageo.2014.02.009
  37. Ver Hoef JM, Blagg E, Dumelle M, Dixon PM, Zimmerman DL, & Conn PB (2024). Marginal inference for hierarchical generalized linear mixed models with patterned covariance matrices using the laplace approximation. Environmetrics. doi: 10.1002/env.2872
  38. Ver Hoef JM, & Peterson EE (2010). A moving average approach for spatial statistical models of stream networks. Journal of the American Statistical Association, 105(489), 6–18. doi: 10.1198/jasa.2009.ap08248
  39. Ver Hoef JM, Peterson EE, Clifford D, & Shah R (2014). SSN: An R package for spatial statistical modeling on stream networks. Journal of Statistical Software, 56, 1–45. doi: 10.18637/jss.v056.i03
  40. Wickham H (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag; New York. doi: 10.1007/978-0-387-98141-3
  41. Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, et al. (2019). Welcome to the Tidyverse. Journal of Open Source Software, 4(43), 1686. doi: 10.21105/joss.01686
  42. Zimmerman DL, & Ver Hoef JM (2024). Spatial linear models for environmental data. CRC Press.
