PLOS One. 2021 Oct 29;16(10):e0259156. doi: 10.1371/journal.pone.0259156

Comprehensive marine substrate classification applied to Canada’s Pacific shelf

Edward J Gregr 1,2,*, Dana R Haggarty 3,4, Sarah C Davies 3, Cole Fields 5, Joanne Lessard 3
Editor: Judi Hewitt
PMCID: PMC8555849  PMID: 34714844

Abstract

Maps of bottom type are essential to the management of marine resources and biodiversity because of their foundational role in characterizing species’ habitats. They are also urgently needed as countries work to define marine protected areas. Current approaches are time consuming, focus largely on grain size, and tend to overlook shallow waters. Our random forest classification of almost 200,000 observations of bottom type is a timely alternative, providing maps of coastal substrate at a combination of resolution and extents not previously achieved. We correlated the observations with depth, depth-derivatives, and estimates of energy to predict marine substrate at 100 m resolution for Canada’s Pacific shelf, a study area of over 135,000 km2. We built five regional models with the same data at 20 m resolution. In addition to standard tests of model fit, we used three independent data sets to test model predictions. We also tested for regional, depth, and resolution effects. We guided our analysis by asking: 1) does weighting for prevalence improve model predictions? 2) does model resolution influence model performance? And 3) is model performance influenced by depth? All our models fit the build data well with true skill statistic (TSS) scores ranging from 0.56 to 0.64. Weighting models with class prevalence improved fit and the correspondence with known spatial features. Class-based metrics showed differences across both resolutions and spatial regions, indicating non-stationarity across these spatial categories. Predictive power was lower (TSS from 0.10 to 0.36) based on independent data evaluation. Model performance was also a function of depth and resolution, illustrating the challenge of accurately representing heterogeneity. Our work shows the value of regional analyses to assessing model stationarity and how independent data evaluation and the use of error metrics can improve understanding of model performance and sampling bias.

Introduction

Coastal management depends on understanding how marine species are distributed. Species distributions are central to protected area design, vulnerability assessments, ecosystem-based fisheries management, and other marine spatial planning activities such as aquaculture siting and oil spill response. Species distribution (or habitat suitability) models predict these distributions by mapping relationships with environmental predictors. Here, we take on the challenge of mapping marine substrates, a key determinant of habitat for benthic species [1, 2].

The emergence of multibeam (MB) echosounder acoustics in the early 2000s [3] revolutionized how oceanographers and marine geologists view the sea floor [e.g., 4]. Prior to MB swath scanners, sea floor characteristics (depth and substrate) were based on point observations. Today, in addition to supporting sub-meter bathymetric models, acoustic backscatter (BS) intensity, typically collected at the same time as the bathymetry, is being used to derive sediment classifications [e.g., 5].

The allure of acoustic data to estimate sediment composition is strong because swath sensors can cover large areas at high resolution and are increasingly efficient with depth. However, acoustic surveys of shallow coastal areas are expensive and time consuming, particularly for complex coastlines [6]. In Pacific Canada about 20% of the Exclusive Economic Zone has been mapped with MB acoustics, and these data are particularly lacking in shallow, intertidal waters (Peter Wills, Fisheries & Oceans Canada, Hydrographic Services, personal communication). BS classification also faces ongoing technical challenges including standardized signal calibration, data rectification, and ground-truthing [7], and including BS data in correlative substrate models may not be useful [see 8, 9]. Thus, unlike predictors of ocean dynamics or chemistry (available from remote sensing or ocean circulation models), ecologically relevant descriptions of bottom type continue to elude researchers due to both sampling and analytic challenges.

As countries continue to increase their protection of marine spaces [10, 11], effective management will require this knowledge gap to be filled efficiently and quantitatively [12]. Given that a suitable MB BS substrate layer for the Canadian Pacific shelf remains decades away, we used a random forest classification of available observations to develop spatial layers at two resolutions (20 m and 100 m). Random forest models are seen as reliable [e.g., 13] because of an insensitivity to overfitting the data [14, 15] and the ability to accommodate a variety of relationships between observations and predictors [14, 16]. We extended an earlier random forest model of rocky reefs [8] to include Mixed, Sand, and Mud substrates, and added energy predictors (i.e., wave fetch, bottom current, and tides) and additional bathymetric derivatives. Our four substrate classes were defined to reflect ecological function and to allow different data sources to be combined (see S1 Table in S1 File for descriptions of the substrate observations and Gregr et al. [6] for details on the class derivation). We used observations of bottom type from multiple sources allowing us to consider sample bias and independence. We evaluated model performance (both model fit and predictive power) across resolutions, geographic regions, and depths. We present our results using a collection of diverse and interpretable metrics.

Objectives

Our main objective was to build a comprehensive, ecologically-relevant coastwide map of marine substrate to support predictions of quality habitat for benthic species, and other applications. The importance of such predictions to marine spatial planning makes timeliness an additional objective, and necessitates using the best available data. We approached this objective by building a suite of models extending from the high water line to the continental shelf. We defined classes to capture all substrate types (although with low class precision), making them more broadly ecologically relevant than grain size or single substrate models. Our methods are transparent and reproducible, allowing refinement and updates as required, an advantage given ongoing data collection.

To assess the reliability of our models, we examined how class prevalence, sampling bias, model extents and resolution, and depth interact to influence model performance. Specifically, we asked the following questions:

  1. Can weighting classes by observation prevalence improve model predictions?

    The effect of class prevalence on classification models has been well described [e.g., 17, 18], and recent work [8] confirms the random forest algorithm favors the over-sampled class [19]. This challenge is also a significant area of research in the machine learning community [19], where well-balanced classes are encouraged. Finding little on this topic in the marine substrate classification literature, we tested the effect of class prevalence using two parallel sets of models with and without class-size weighting.

  2. What are the effects of model extents and resolution?

    Physical and ecological processes can differ across regions, showing variable parameterization over space or time [20], but such non-stationarity is rarely detected or interpreted. We were therefore interested in whether a single coastwide model would perform similarly across large, physiographically distinct regions.

  3. Do our models perform differently by depth? And if so, are these differences correlated with model resolution?

    Our collective experience based on over 50 years of surveying substrate on the BC Coast suggests that substrate heterogeneity (spatial variability) decreases with depth. We therefore predicted that higher resolution models would perform better than coarser resolution models in shallower waters.

Challenges

Ecological relevance

To be relevant as a predictor for habitat suitability models, substrate classifications need to include the full range of substrate types to support the diversity of benthic organisms. Recently, automated, machine-learning approaches to BS classifications have been used to predict particle size [13, 21, 22] as part of the European nature information system (EUNIS) soft sediment class. Automated classifications integrating hard and soft substrates [e.g., 6, 21, 23] are less common in the literature, likely in part because more classes tend to decrease model fit [22, 24]. While methods are available to derive comprehensive classifications with many classes [4, 25], these are labor and data intensive and have thus only been applied to local extents. Fortunately, while a representation of all bottom types is necessary to maximize relevance, the number of classes need not be large, since habitat models and ecological analyses typically don’t require and often cannot accommodate detailed classifications [e.g., 26]. We therefore limited our classification to four ecologically distinct classes: Rock, Mixed, Sand, and unconsolidated Mud. This has the advantage of allowing multiple sources of substrate observations to be combined [e.g., 6], and facilitates reproducibility compared to methods more closely tied to particular data types.

To be relevant as a habitat predictor, substrate classifications also need to be comprehensive across space—from the high water line to the shelf break. The exclusion of the coastal zone [commonly called the white strip because of the absence of data—6] is a chronic problem despite this being both the most productive region of the ocean and the most impacted by human activities [27, 28]. We addressed this challenge by including observations for the entire depth range, from the intertidal to the shelf edge.

Resolution

Developing relevant substrate maps is also challenged by high local substrate variability. When substrate varies at the scale of meters (a common feature, especially in shallow waters), the spatial heterogeneity of a substrate grid (i.e., raster) will depend on the resolution used because each pixel assumes homogeneity. Thus, a 100 x 100 m2 model will show less variability, and potentially a different distribution of substrate classes, than a 20 x 20 m2 model of the same area because point observations must be aggregated to the target resolution. This aggregation can reduce class accuracy, and limits the representation of variability to a single resolution. This suggests that model performance will increase with model resolution when point observations are used for validation. This assertion is supported by contrasting the performance of recent random forest grain size models built at different resolutions [24, 29, 30]. We examine the question of resolution by comparing our 100 m models to our 20 m models, the finest resolution achievable across the large spatial extents of our study area.

Observations and predictors

Substrate observations tend to be spatially patchy and biased towards different bottom types and depths according to sampling method. For example, Lawrence et al. [12] described the challenge of observational sampling when hard substrates are covered with a veneer of soft sediment, while data collected for safe navigation is often limited to shallower waters. Predictors can contain both sampling errors (e.g., poorly reconciled bathymetric track lines) and edge-effects (e.g., bathymetric derivatives generated by excluding terrestrial elevations [8, 31]). While spatial artefacts in the predicted layer can help identify systematic errors in the predictors, errors in the dependent data are harder to identify. This makes the degree of contextual overlap between observational data sets, which determines their shared biases, particularly salient when testing predictive power. Understanding how biases in the data used to build models compare to the data used to test their predictive power can provide insight into the limits of model complexity and improve understanding of model performance and scaling [32].

Model performance

For acoustic data collected with remote sensors, tree-based classifiers such as random forest models are now the most common method applied [e.g., 13, 21]. Similar statistical methods are used in ecological studies to classify species observations into predictions of suitable habitat [e.g., 20, 33], and both applications rely on correlations with environmental predictors. However, because ecological observations are patchy, spatial predictions of habitat suitability rely on the continuous distribution of predictors to make habitat maps [20]. Thus, maps derived from observations rely on the strength and stationarity of the predicted relationships. This is why tests of predictive power (as opposed to simply model fit) are essential to evaluating point-based classifications of habitat suitability, and are adapted here.

Evaluating model performance requires appropriate metrics and testing data, and when maps are based on functional relationships (as in the case of point-based models), the consideration of process stationarity [a common but generally false assumption– 32, 34]. The application of performance metrics has evolved little in over twenty years of predictive modeling [35], with many studies continuing to report Cohen’s Kappa as a measure of model quality despite its well-described shortcomings [36, 37]. While alternatives continue to appear in the literature [e.g., 38–40], adoption of these improved metrics has been slow, likely because papers with equations tend to be poorly cited by many practitioners [41].

There is also a persistent misconception about how to interpret model performance given the testing data. The majority of models are tested using cross-validation (the splitting of a set of observations into training and testing partitions), a process described as internal validation [42] or tests of model fit. To test model predictive power [35] (also called forecast skill [e.g., 32], external evaluation [42], or model transfer [43]), independent data are required [39, 43–45]. While independent data collected for purpose are desirable [e.g., 39], the use of opportunistic data can serve a similar purpose, while also illustrating important differences among data contexts [e.g., 32].

Evaluating the effect of spatial sampling patterns on predictions based on aspatial correlative relationships [20] is also a significant challenge. Examining residuals is recommended for assessing spatial autocorrelation and model stationarity, but we found no guidance on using residuals to evaluate categorical predictions. Other work has examined the spatial variability of model error by calculating performance metrics at a number of small, randomly positioned sites [46]. This approach was not feasible for our study because we could not control for the effect of sampling density on model performance. Instead, we approached this challenge by testing for model stationarity across regions and depths.

Methods

We applied a random forest classification [14, 16, 47] to a collection of substrate observations (Table 1, the build data) to create predictive models of our four substrate classes (Rock, Mixed, Sand, and Mud) based on a suite of geophysical predictors (Table 2). We built a 100 m (100 x 100 m2) coastwide model for the Canadian Pacific continental shelf. To improve the quality of habitat models for species close to shore and extend the predictions across the white strip, we then built 5 regional, nearshore models at 20 m resolution (20 x 20 m2) within the extents of the coastwide model (Fig 1). Our regions included the sheltered, largely muddy Strait of Georgia (SOG), the exposed West Coast of Vancouver Island (WCVI), the oceanographically distinct islands of Haida Gwaii (HG), the North Central Coast (NCC) with its deep fjords and inlets and a large exposed coastline, and the transitional Queen Charlotte Strait (QCS) region containing a mix of sheltered and exposed areas, fjords and inlets.

Table 1. Contents of the build data set showing number of observations (total and by substrate class) available for model development (build) and independent model evaluation.

Role | Type | Source | N | Rock | Mixed | Sand | Mud
Build | Grab | CHS | 127,770 | 58,899 | 13,688 | 34,753 | 20,430
Build | Grab | NRCan | 8,938 | 0 | 0 | 4,241 | 4,697
Build | Dive | DFO | 44,809 | 21,250 | 7,626 | 13,464 | 2,469
Build | ROV | DFO | 10,856 | 4,073 | 2,205 | 3,200 | 1,378
Build | Marsh | CHS | 5,214 | 0 | 0 | 0 | 5,214
Build | Totals | | 197,587 | 84,222 | 23,519 | 55,658 | 34,188
Evaluation | Dive | DFO | 4,974 | 2,892 | 543 | 974 | 565
Evaluation | Camera | DFO | 2,143 | 421 | 491 | 654 | 577
Evaluation | ROV | DFO | 6,064 | 1,477 | 1,479 | 633 | 2,475
Evaluation | Totals | | 13,181 | 4,790 | 2,513 | 2,261 | 3,617

Data types included Grab and Dive samples, observations from drop Cameras and remotely operated vehicles (ROV), and chart annotations of Marsh. Data were sourced from the Canadian Hydrographic Service (CHS), Natural Resources Canada (NRCan), and Fisheries and Oceans Canada (DFO).

Table 2. Predictors used to classify the observational data and the data sources from which they were derived for each study area.

Predictor | Study area | Source | Native resolution
Depth, Slope, Slope (std. dev.), Curvature, Rugosity, Broad BPI (A), Medium BPI, Fine BPI | Coastwide (B) | Carignan et al. [48] elevation model | 3 arc-seconds (~90 x 90 m2)
(same terrain predictors) | Coastwide (B) | Gregr [49] elevation model | 100 x 100 m2
(same terrain predictors) | Regional | Davies et al. [31] elevation models | 20 x 20 m2
Tidal speed, Ocean circulation | Coastwide and Regional | Mean summer conditions averaged from a regional circulation model [50] | 3 x 3 km2
Tidal speed, Ocean circulation | Regional (SOG only) | Mean summer conditions averaged from a local circulation model [51] | 440 x 500 m2
Fetch | Regional | Sum of fetch based on Gregr [52] | 50 m

A. Benthic Positioning Index. See S2 Table in S1 File for details.

B. The two Coastwide elevation models were combined by Nephin et al. [33] into a single 100 m bathymetry from which the derivatives were calculated.

Each predictor was generated for both the Coastwide and Regional study areas at their respective resolutions, but often from different source data.

Fig 1. Study area.

The spatial extents of the six models developed in this analysis. The 100 m coastwide model covers the Canadian Pacific continental shelf. The five 20 m regional models extend from the high water line to as far as 5 km seaward from the 50 m depth contour, the limit of the bathymetry and derived predictor variables. See text for details.

For each of the six extents, we built models with and without class weighting to test the effect of class prevalence. This collection of models allowed us to examine the relative performance of models across resolutions, regions, and depths.

We tested model fit by partitioning the build data into training and testing partitions. We tested the predictive power of our models using three independent data sets, collected separately. Our predictor variables (Table 2) included a commonly used suite of geomorphic predictors derived from bathymetry, and several measures of energy. Each of these data sets is described in the following sections.

Model build data

We assembled a coast-wide data set of 197,587 observations from Natural Resources Canada (NRCan), the Canadian Hydrographic Service (CHS) and Fisheries and Oceans Canada (DFO) data holdings to build the models (Table 1). The observations are broadly distributed across Canada’s Pacific shelf with high concentrations of points near shore (S1 Fig in S1 File). CHS collects grab samples as part of their regular hydrographic surveys; these represent the largest component of our build data set, with sampling biased towards shallow waters and rocky substrate because of the CHS’s mandate to chart navigable waterways. We therefore included NRCan grab data and marsh locations mapped by CHS to increase the prevalence of the soft bottom type classes in our build data. NRCan core samples are biased towards unconsolidated substrates. Direct observations of bottom type were acquired from DFO shellfish stock assessment dive surveys and remotely operated vehicle (ROV) surveys of rockfish habitat. Observations were re-classified to the four bottom type classes used in this analysis following Gregr et al. [6]. Details on these data are provided in S1 Table in S1 File.

Predictor data

Environmental predictors were selected to include a combination of benthic terrain features derived from bathymetry [e.g., 8, 53] and measures of energy [e.g., 33, 54]. To support our analysis across two spatial resolutions, we derived the same terrain features from a 100 m coastwide bathymetry and our 20 m regional bathymetries. These included slope, curvature, rugosity, the standard deviation of slope, and three bathymetric positioning indices (BPIs) with increasing neighborhood sizes (S2 Table in S1 File) to capture both small benthic features and larger trends in terrain. Energy was represented using tidal currents and broad-scale circulation derived from ocean current models [50, 51]. Fetch, a proxy for wind-wave exposure [55], was included in the 20 m model but not the 100 m model, as the shallowest accurate prediction from the 100 m model was expected to be deeper than shoaling depth. Additional details on the derivation of these predictors are provided in S2 Table in S1 File. We did not explore questions of predictor independence or variable selection.
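
Although predictor preparation was done in ArcGIS (see Model development and comparisons), equivalent terrain derivatives can be sketched in R with the terra package. In the minimal sketch below, the input file name and window sizes are illustrative assumptions only; the actual BPI neighborhood sizes and derivative definitions are given in S2 Table in S1 File, and rugosity is approximated here with terra's built-in 'roughness' measure.

    library(terra)

    bathy <- rast("bathy_20m.tif")   # hypothetical bathymetry grid (m, positive down)

    # Slope (degrees) and a simple 3x3 rugosity analogue
    slope <- terrain(bathy, v = "slope", unit = "degrees")
    rugos <- terrain(bathy, v = "roughness")

    # Standard deviation of slope within a 5x5 window
    slope_sd <- focal(slope, w = 5, fun = sd, na.rm = TRUE)

    # Bathymetric position index: cell depth minus the mean depth of a
    # surrounding window; larger windows yield the medium and broad BPIs
    bpi <- function(r, n) r - focal(r, w = n, fun = mean, na.rm = TRUE)
    bpi_fine   <- bpi(bathy, 5)
    bpi_medium <- bpi(bathy, 25)
    bpi_broad  <- bpi(bathy, 101)

    predictors <- c(bathy, slope, slope_sd, rugos, bpi_fine, bpi_medium, bpi_broad)
    names(predictors) <- c("depth", "slope", "slope_sd", "rugosity",
                           "bpi_fine", "bpi_medium", "bpi_broad")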

Independent evaluation data

Our independent data were collected at random locations by DFO as part of the Benthic Habitat Mapping Project, intended to define nearshore habitat and species assemblages. Surveys were done using dive transects, drop cameras, and remotely operated vehicles (ROV). The dive and ROV data were collected using similar methods to the build data, but at different times and often by different observers. The datasets included depth and consistently coded substrate classes, making it easy to reclassify them to our four substrate classes (S3 Table in S1 File). Observations were collected within quadrats on transects. We aggregated the observations from each quadrat by 20 m grid cells, assigning the mode substrate observation to the center point of a cell. These observation points were then used to extract predictor data from the 20 m and 100 m raster stacks.
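
As a minimal sketch of this aggregation step (assuming a data frame obs with projected coordinates x and y in meters and a substrate column; the column names and coordinate system are hypothetical), the modal class per 20 m cell and the subsequent predictor extraction could look like:

    library(dplyr)
    library(terra)

    cell <- 20   # m; the regional raster resolution

    obs_cells <- obs |>
      mutate(cx = floor(x / cell) * cell + cell / 2,   # snap to cell centers
             cy = floor(y / cell) * cell + cell / 2) |>
      group_by(cx, cy) |>
      summarise(substrate = names(which.max(table(substrate))),  # modal class
                .groups = "drop")

    # Extract predictor values at the cell-center points (BC Albers assumed)
    pts  <- vect(obs_cells, geom = c("cx", "cy"), crs = "EPSG:3005")
    vals <- extract(predictors, pts)   # 'predictors' from the sketch above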

Independent observations from the dive data included 1077 transects surveyed between 2013 and 2018. Data were collected following Davies et al. [56] in shallow areas along the coast, ranging from -5 to 19 m depth. Points above chart datum were surveyed by divers at high tide. Data were prepared in the same way as the dive component of the Build data (S1 Table in S1 File).

Drop camera observations were collected following Davies et al. [56] using a GoPro camera deployed off the side of small boats during the dive surveys (2014 to 2018). We obtained still photos from 889 locations at depths from 16 to 60 m. These data were intended to extend the dive observations of substrate into deeper waters. Uncertainty in the positional accuracy of the images increases with depth due to deflection of the drop camera from the boat position. We minimized this uncertainty by removing locations where the difference between recorded depth and bathymetry exceeded 50 m.

ROV observations for Haida Gwaii and the North Central Coast regions were extracted from video imagery along 366 transects in depths from 33 to 675 m collected between 2013 and 2015. Observations were recorded for 10 second increments of video and aggregated by mode into the 20 m bins as described above.

Model development and comparisons

All observational and predictor data were prepared using ArcGIS [57] and then imported into R, where we joined all observational data with predictors scaled to coastwide (100 x 100 m2) and regional (20 x 20 m2) grids (i.e., rasters) before analysis. Predictors were not transformed or assessed for correlation since the random forest approach is largely robust to non-normal, correlated predictors [14, 16]. Where multiple observations occurred in the same raster cell, the predictor values were duplicated to preserve the observational sample size. This was more common for the coastwide model.

We built our models using the ranger package [58]. While a variety of random forest packages are available, ranger is the only one to effectively support the weighting of classes based on prevalence. We used the same number of trees (1000) and test fraction (0.6) for all models, set variable importance to use the Gini index, and the internal cross-validation to sample with replacement. All analyses were done in R [59], using a number of packages for the data analysis and presentation of results (see S2 Analytical Methods in S1 File).

We randomly split the build data into training (67%) and testing (33%) partitions. We used the training partition to build the models and the testing partition to assess model fit and explore the effects of depth and model resolution. We tested all models using the same testing partition to provide a baseline for assessing how model predictive power (based on independent data evaluation, IDE) may be influenced by sampling bias. We compared the weighted and non-weighted models to illustrate the effects of class prevalence. We did not use cross-validation as this is done internally as part of the random forest process, and is reflected in the out-of-bag (OOB) error [16]. We used the independent data to examine sample bias and test for stationarity. We evaluated model performance using a suite of comprehensive and interpretable metrics (see next section).

We weighted classes according to their prevalence in the training data (1 - Nclass / Ntotal) and compared class performance using both the build testing sample and the independent data sets across all models. We tested whether any observed differences were influenced by model resolution, region, and depth.
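
A minimal sketch of this fitting step, assuming the assembled build data sit in a data frame build with a factor response substrate (hypothetical names), using the 67/33 split, the weighting formula above, and the ranger settings described earlier (1000 trees, Gini importance, sampling with replacement):

    library(ranger)

    set.seed(42)   # illustrative seed
    idx   <- sample(nrow(build), size = round(0.67 * nrow(build)))
    train <- build[idx, ]
    test  <- build[-idx, ]

    # Class weights: 1 - Nclass / Ntotal, in factor-level order
    prev <- table(train$substrate) / nrow(train)
    wts  <- 1 - as.numeric(prev)

    rf <- ranger(substrate ~ ., data = train,   # all other columns as predictors
                 num.trees     = 1000,
                 importance    = "impurity",    # Gini index
                 class.weights = wts,           # omit for the unweighted models
                 replace       = TRUE)          # bootstrap with replacement

    pred <- predict(rf, data = test)$predictions
    cm   <- table(observed = test$substrate, predicted = pred)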

To test for stationarity, we compared how the coastwide model performed against each of five regions. We tested our hypothesis that model performance is correlated with depth by comparing model fit and predictive performance across depth classes. We classified depths following Gregr et al. [6], who provided an ecological rationale for dividing coastal waters into Intertidal, 0–5 m, 5–10 m, 10–20 m, and 20–50 m zones. To these we added three deeper zones (50–100 m, 100–200 m, and 200+ m) for a more complete comparison of depths.
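
Assuming a positive-down depth column in meters (hypothetical name), assigning these zones is a single cut():

    # Depth zones following Gregr et al. [6], plus the three deeper zones;
    # depths at or above chart datum (<= 0) fall in the Intertidal zone
    breaks <- c(-Inf, 0, 5, 10, 20, 50, 100, 200, Inf)
    labels <- c("Intertidal", "0-5 m", "5-10 m", "10-20 m",
                "20-50 m", "50-100 m", "100-200 m", "200+ m")
    test$depth_zone <- cut(test$depth, breaks = breaks, labels = labels)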

Measuring model performance

Best practice now calls for using more than one metric [44], as no single accuracy metric can serve all assessment objectives, and different measures can imply different conclusions [60]. To ensure our assessments of model performance were comprehensive, we explored metrics developed across the disciplines of image classification [61, 62], habitat suitability modelling [35, 42], and weather forecasting [63, 64]. We report metrics describing both model accuracy and model error. Accuracy measures both how much better model predictions are than a random guess [42], and the observed agreement between predictions and a test dataset [62]. For better-than-random, we used the True Skill Statistic (TSS) instead of Kappa, which has been shown to have limited utility as a performance metric [36, 37]. We used Overall Accuracy [61] and True Negative Rate [TNR, 42] to provide information on correctly predicted positives and negatives respectively (aggregated across classes). We assessed by-class accuracy using TNR, and User and Producer accuracies (see S1 File). We used measures of model error based on the work of Pontius and colleagues [e.g., 62]. These include Quantity error, which measures the deviance in the frequency of observations and predictions; Exchange error, defined as a swapping between two categories; and Shift error, the remaining error that cannot be attributed to either Exchange or Quantity. We report these error metrics aggregated across classes. Our combination of accuracy and error assessment provides a more complete picture of model performance than is commonly reported. Finally, we derived Imbalance as an integrated measure of prevalence in a multi-class data set (see S1 File).
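
The aggregated metrics can all be computed from a single confusion matrix. The sketch below assumes rows are observed and columns are predicted classes; it uses the multi-category Peirce (Hanssen-Kuipers) skill score from the weather forecasting literature for the TSS, and the Quantity/Exchange/Shift decomposition of Pontius and colleagues [62]. Whether these match the exact formulations in S1 File is an assumption.

    # Aggregated performance metrics from a confusion matrix
    # (cm: rows = observed, columns = predicted; see the fitting sketch above)
    perf_metrics <- function(cm) {
      p <- as.matrix(cm) / sum(cm)    # joint proportions
      obs_marg  <- rowSums(p)
      pred_marg <- colSums(p)

      accuracy <- sum(diag(p))        # overall agreement
      tss <- (accuracy - sum(obs_marg * pred_marg)) / (1 - sum(obs_marg^2))

      # Pontius-style components of total disagreement
      quantity <- sum(abs(obs_marg - pred_marg)) / 2
      exchange <- 2 * sum(pmin(p, t(p))[upper.tri(p)])   # paired class swaps
      shift    <- (1 - accuracy) - quantity - exchange

      c(TSS = tss, Accuracy = accuracy,
        Quantity = quantity, Exchange = exchange, Shift = shift)
    }

    perf_metrics(cm)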

To complement the quantitative assessment, we examined the spatial agreement of our predictions with two well-known areas of the Pacific coast, Pacific Rim National Park Reserve in the WCVI region, and English Bay, part of the urban coast of Greater Vancouver in the SOG region. This qualitative comparison adds valuable information on how location influences the assessment of the substrate predictions by allowing the patterns produced by different models to be compared.

Results

Model development and class weighting

All models, whether weighted or not, fit the build data comparably well (TSS values ranging from 0.56 to 0.64), with no notable differences between the coastwide and the regional models in the aggregated metrics (Table 3). The effect of weighting is more apparent in the error assessment, where the Quantity errors of all non-weighted models (0.07 to 0.10) were about twice those of the corresponding weighted models (0.03 to 0.06) (Table 3). The majority of model error came from Exchange between classes. The reduction in Quantity error achieved by weighting tended to be offset by a corresponding increase in Exchange error. This explains why the aggregate metrics TSS and Accuracy were largely unchanged by weighting (Table 3).

Table 3. Aggregated build metrics for all six models, comparing the weighted and unweighted random forest results for each model.

Model | N | Imbalance | OOB | TSS | Accuracy | TNR | Quantity | Exchange | Shift
Coast (weighted) | 66,056 | 0.17 | 0.30 | 0.57 | 0.70 | 0.86 | 0.03 | 0.25 | 0.02
Coast (unweighted) | | | 0.29 | 0.59 | 0.70 | 0.85 | 0.07 | 0.20 | 0.02
HG (weighted) | 9,191 | 0.17 | 0.26 | 0.64 | 0.74 | 0.88 | 0.06 | 0.19 | 0.02
HG (unweighted) | | | 0.25 | 0.64 | 0.74 | 0.87 | 0.08 | 0.17 | 0.01
NCC (weighted) | 22,189 | 0.22 | 0.28 | 0.57 | 0.72 | 0.83 | 0.05 | 0.21 | 0.02
NCC (unweighted) | | | 0.28 | 0.58 | 0.72 | 0.80 | 0.10 | 0.16 | 0.02
QCS (weighted) | 4,383 | 0.20 | 0.29 | 0.56 | 0.70 | 0.82 | 0.06 | 0.22 | 0.02
QCS (unweighted) | | | 0.29 | 0.57 | 0.71 | 0.79 | 0.11 | 0.16 | 0.02
SOG (weighted) | 14,399 | 0.15 | 0.30 | 0.60 | 0.71 | 0.87 | 0.06 | 0.23 | 0.01
SOG (unweighted) | | | 0.30 | 0.60 | 0.71 | 0.86 | 0.08 | 0.20 | 0.02
WCVI (weighted) | 9,523 | 0.22 | 0.24 | 0.63 | 0.76 | 0.87 | 0.03 | 0.21 | 0.00
WCVI (unweighted) | | | 0.24 | 0.64 | 0.76 | 0.83 | 0.08 | 0.16 | 0.01

Sample size (N) and Imbalance characterize the observational data. Out of Bag (OOB) values show the mean prediction error from the random forest internal cross-validation. The True Skill Statistic (TSS) measures how model performance exceeds random after correcting for chance and prevalence. Overall Accuracy, True Negative Rate (TNR), Quantity, Exchange and Shift provide an assessment of model error. See (S1 File) for details on the metrics.

Across classes, Rock and Mud had the highest User and Producer accuracies and the Mixed class had the lowest across all models, regardless of weighting (Fig 2). The effect of weighting can be seen when the class-based metrics are compared: weighting shifted User Accuracy away from Rock to the other classes, particularly Mixed and Mud, for all models (Fig 2). There was also a corresponding increase in the Producer Accuracy of the Rock class with weighting at the expense of the other classes, though this was more variable across models.

Fig 2. Heat maps of model performance.

Producer Accuracy, User Accuracy, and True Negative Rate (TNR) for each substrate class, shown for weighted (left column) and non-weighted (right column) models. The color shading within each row reflects the underlying values from high (red) to low (blue) and is included to emphasize differences.

The reduced Quantity error from class weighting is also evident when the prevalence of the build testing partition was compared to model predictions (Fig 3). The over-prediction of the Rock class by the non-weighted models and the consistent under-prediction of the Mixed class were clearly mitigated by class weighting, which also aligned the prevalence of the Sand and Mud classes more closely to the observed values. More importantly, these changes in prediction prevalence were evident in small but significant shifts in the spatial distribution of the classes, with the weighted model producing a pattern that more accurately reflected known nearshore substrate in our two test locations (Fig 4). The predictions from the weighted model produced less Rock and more Sand on known beaches in Pacific Rim. Changes in English Bay were less apparent, but a shift from Rock to Mixed is evident. In deeper waters of Pacific Rim, the weighted model predicted less Sand and more Mixed substrates, although bathymetric artefacts were enhanced.

Fig 3. Comparison of class prevalence.

Observed class prevalence in the build testing partition (orange) compared to predictions from the weighted (yellow) and unweighted (blue) random forest models across regions. Weighting tends to yield class prevalence closer to that observed in the build data (note that training and testing partitions have the same prevalence).

Fig 4. Predictions in regional assessment areas.

Predictions from the 100 m coastwide (top row), 20 m, no-weight (middle row), and 20 m weighted (bottom row) models for the Pacific Rim National Park (left column) and Greater Vancouver (right column) assessment areas. The series of images (from top to bottom) shows how increased resolution and weighting for prevalence help mitigate the bias towards rocky substrate.

Variable importance (Fig 5) differed notably across models, suggesting the dominant processes differed across regions. Our indicators of ocean dynamics (circulation and tidal) were the top two predictors for the coastwide model, followed by bathymetry. Slope, broad- and medium-BPI, and the standard deviation of slope provided almost equal contributions. In contrast, bathymetry was the dominant predictor for all regional models except HG, where bathymetry was second to fetch. Fetch was also important in the WCVI region, the other region we presumed would be strongly influenced by exposure. Fetch, broad-BPI, and tidal flow rounded out the top four variables for the regions. Despite such rankings, the contributions of predictors can be very similar, particularly among those contributing least. For example, in the NCC region the three least influential predictors have virtually equal model contribution scores (Fig 5).

Fig 5. Heat map of variable importance across models.

Variable importance, defined as the proportion of each predictor’s contribution to the model, is shown relative to the predictor with the highest contribution (p/pmax). The color-shading within each row reflects the underlying numbers from high (red) to low (blue) and is included to make the differences in the values more apparent. For the coastwide model, "NA" indicates that Fetch was not used as a predictor.

Model resolution

The most significant predictors for the 100 m coastwide model were related to ocean energy, with a native resolution of 3 km. In contrast, the regional 20 m models were closely tied to bathymetry and potential wave energy. Yet comparing the 100 m and 20 m models shows little difference in either the aggregated or class-based metrics between resolutions (Table 3 and Fig 2, respectively). However, the differences in mapped predictions are dramatic, with the 100 m model showing an over-prediction of nearshore Rock in both focal areas despite class weighting (Fig 4). The mapped predictions from the 100 m model are also more homogeneous than those of the 20 m models (Fig 4). While this reduced visible artefacts, it also highlights the inability of the coarser model to represent local substrate heterogeneity.

Performance across depths

The 100 m model shows a clear trend of increasing TSS with deeper water compared to the 20 m models based on the testing partition, although results were variable across regions (Fig 6A). Specifically, the trend is clear for HG and NCC, absent for SOG, and uneven for WCVI and QCS. This pattern is driven by the higher Accuracy of the 20 m models in the intertidal and 0–5 m depth zones (Fig 7A) across all regions (except the SOG intertidal). The 100 m model has a consistently higher TNR, particularly in the 0–5 m depth zone (Fig 6B). The corresponding error assessment for the two resolutions also shows decreasing error with depth for the 100 m model across most regions (Fig 7A) but generally increasing with depth for the regional models (Fig 7B). All models show a tendency towards increased Quantity error with depth but most of the error is from an Exchange between classes.

Fig 6. Differences in model fit by region, across depths and resolutions.

The difference in (A) the True Skill Statistic (TSS) and (B) the True Negative Rate (TNR) between the 20 and 100 m models across depth ranges, shown by region. Scores are based on model fit to the build testing partition. Values below 0 indicate a higher score by the 20 m model. This shows how regional model performance is generally better across all depths and regions, except for SOG, and identifies possible sampling biases in the 0–5 and 200+ depth ranges.

Fig 7. Error assessment of model fit by region, depth and resolution.

Accuracy and error metrics for (A) the 100 m coastwide and (B) 20 m regional models shown across depth zones for each region, based on model fit to the build testing partition. Accuracy tends to increase with depth in the 100 m model and decrease with depth in the 20 m models, but the trends are noisy.

The role of resolution in the correlation between model performance and depth is further supported by the IDE (see below) and is also apparent in the mapped predictions (Figs 4 and 8). The tendency of the 100 m model to predict more contiguous classes and over-predict Rock near shore is evident in all three of our test regions (Figs 4E, 4F and 8B). However, the 100 m model also captures known physiographic features in deeper waters, in particular the canyons in Queen Charlotte Sound and the shelf edge not identified by the 20 m models (Fig 8).

Fig 8. Mapped predictions for different resolutions.

Predictions for portions of the HG and NCC 20 m regional models (A) are shown for comparison with the 100 m coastwide model (B). Note the detail provided by the 20 m models near shore where the 100 m model predicts largely Rock. In contrast, the 100 m model identifies known features at depth (e.g., Moresby Canyon) not captured by the 20 m models. The regional models are shown using unclipped predictor variables to allow model comparison in deeper waters. See text for details.

Independent data evaluation

The independent data were not consistently distributed across regions: the Dive data were distributed most broadly, while the ROV data were limited to HG and NCC (Table 4 and S2 Fig in S1 File). Not unexpectedly, model forecast skill was generally lower and more variable than model fit. The TSS for the 100 m coastwide model varied considerably (0.10 to 0.24) across the three independent data sets (Table 4); however, all regional models had higher TSS scores than their corresponding coastwide model. The regional TSS scores were notably better for the Dive and ROV data. The pattern in Accuracy scores generally followed the TSS scores with some exceptions, showing the importance of accounting for chance in model performance. The TNR scores were highest for the ROV data followed by the Camera data, and notably lower for the Dive data (Table 4). Errors (Quantity, Exchange, and Shift) were highly variable across both regions and independent data sets.

Table 4. Performance of each random forest model against each independent data set (IDS) (see Table 3 for description of metrics).

IDS | Model | N | Imbalance | TSS | Accuracy | TNR | Quantity | Exchange | Shift
Dive | Coast | 3,666 | 0.22 | 0.20 | 0.52 | 0.57 | 0.32 | 0.14 | 0.02
Dive | HG | 1,479 | 0.23 | 0.32 | 0.58 | 0.75 | 0.08 | 0.31 | 0.03
Dive | NCC | 2,217 | 0.30 | 0.33 | 0.65 | 0.57 | 0.19 | 0.14 | 0.03
Dive | WCVI | 166 | 0.36 | 0.36 | 0.69 | 0.56 | 0.17 | 0.11 | 0.04
Dive | QCS | 549 | 0.27 | 0.28 | 0.60 | 0.64 | 0.16 | 0.18 | 0.06
Dive | SOG | 551 | 0.14 | 0.26 | 0.47 | 0.76 | 0.20 | 0.26 | 0.06
Camera | Coast | 2,047 | 0.06 | 0.10 | 0.22 | 0.80 | 0.70 | 0.06 | 0.02
Camera | HG | 818 | 0.10 | 0.15 | 0.34 | 0.79 | 0.30 | 0.26 | 0.10
Camera | NCC | 580 | 0.06 | 0.20 | 0.39 | 0.76 | 0.34 | 0.17 | 0.10
Camera | WCVI | 139 | 0.20 | 0.12 | 0.23 | 0.87 | 0.58 | 0.19 | 0.01
Camera | QCS | 410 | 0.10 | 0.12 | 0.34 | 0.73 | 0.41 | 0.15 | 0.10
Camera | SOG | 196 | 0.21 | 0.11 | 0.23 | 0.86 | 0.58 | 0.12 | 0.07
ROV | Coast | 6,059 | 0.15 | 0.24 | 0.42 | 0.83 | 0.30 | 0.23 | 0.04
ROV | HG | 1,762 | 0.14 | 0.32 | 0.39 | 0.87 | 0.48 | 0.11 | 0.02
ROV | NCC | 3,909 | 0.18 | 0.27 | 0.49 | 0.79 | 0.14 | 0.35 | 0.03

Overall, the Dive and Camera data were predicted with the highest and lowest Accuracy respectively. Accuracy, while variable across regions, was inversely correlated with Imbalance (Table 4). The lower TNR predicted at the Dive observations (compared to the other independent data) is reflected in a correspondingly lower Quantity error (Table 4) which dominates the error component of most models, showing that much of the misclassification is due to errors in prediction prevalence.

Examining the aggregated scores adjusted to their no-information baselines (Fig 9) provided a clearer representation of relative predictive power. The regional models generally outperformed the coastwide model for regions (HG, NCC) with larger samples of independent data. For the other regions the models predicted the Dive data better than the coastwide model, but accurate predictions of the Camera data were variable. The ROV data (limited to the HG and NCC regions) were predicted with the most consistent Accuracy and TNR scores while the Dive data were consistently predicted with both the highest Accuracy and lowest TNR. These differences are due in part to sample size, but may also reflect some spatial bias in the sampling.

Fig 9. Aggregated accuracy metrics of predictive power for each independent data set by region.

Accuracies are shown as the difference from the no-information baseline: positive values indicate performance better than random and negative values indicate performance worse than random. The no-information baseline for TSS is always 0.0 because it integrates across classes. However in our error matrices with four classes (S4 Table in S1 File), the baseline for the true positive rate (TPR) is 0.25, and for the true negative rate (TNR) it is 0.75. Missing bars show either a difference of 0 indicating performance no better than random (HG) or missing data (ROV for QCS and SOG).

The influence of resolution on the correlation between depth and model performance is also apparent when the predictive power of the coastwide 100 m model is compared to the 20 m models for the regions with sufficient independent data (Fig 10). The IDE of the coastwide model (Fig 10A) shows a clear increase in predictive power with depth for both the Dive and Camera data, while the opposite pattern is evident in both the HG and NCC regions (Fig 10B and 10C). The IDE of the ROV data are more equivocal, illustrating differences between independent data sets and emphasizing the need to understand the different data collection methods and biases.

Fig 10. Error assessment of predictive power by depth zone.

Accuracy and error metrics for the predictive power of the (A) Coastwide, (B) Haida Gwaii, and (C) North Central Coast models for each independent data set (depth zones and regions not shown had insufficient sample sizes).

Discussion

Our results show how class weighting to address sample imbalance can lead to both numerical and spatial improvements in model performance. We also confirm the existence of regional non-stationarity, and show that model reliability depends on depth, resolution, and substrate class, and potentially the uniqueness (in terms of predictors) of the location in question. This means reliability will vary across the coast, with different resolutions and substrate classes being relatively more or less reliable in different locations. Understanding these differences will improve the confidence that can be placed in these and similar models. It will also inform their contribution to predictions of habitat suitability, and help guide future data collection and model refinement.

Depth and resolution

All our models fit the build data well; however, the predictive power of the regional 20 m models was universally better than that of the coastwide model. The consistently higher TNR of the 100 m model can be attributed to its large, homogeneous Rock predictions in shallower waters (Fig 4), showing that model performance was dependent on depth and resolution. This confirms our initial belief that the 100 m model would perform better in deeper waters and the 20 m models better in shallow waters. Our qualitative assessment of the mapped predictions agrees with the numerical analysis. Specifically, the 100 m prediction of large contiguous areas of Rock substrate nearshore and the failure of the 20 m model to capture sediments associated with known features (e.g., canyons) in deeper waters (Figs 4 and 8) illustrate how coarser spatial resolutions are unable to represent finer scale heterogeneity in substrate, while the higher resolution needed to capture that heterogeneity can miss larger geomorphic features. This supports the decision to limit the 20 m models to shallower depths, and shows that mapping the different scales of heterogeneity will require multiple resolutions, the integration of which would be best captured using object-based approaches [e.g., 6, 12] where representation is not resolution-dependent.

Our results also support the view that substrate heterogeneity generally decreases with depth, and suggest that resolution-based differences in performance across depths (Figs 6 and 7) are at least in part influenced by the true heterogeneity in different depth classes. For example, the higher accuracy of the 20 m models in shallower water reflects their ability to capture nearshore heterogeneity. Similarly, the more consistent fit of the 100 m model across depth classes in the SOG region can be explained by characteristics of the region (a relatively shallow marginal sea dominated by mud in deeper waters [65], with less exposure to wind-wave energy than other regions) that combine to minimize differences across depths. This emphasizes the importance of considering process stationarity (see following section). Nevertheless, the correlation of predictive power with depth (Fig 7) suggests that the 20 m models will be more reliable for nearshore studies to about 50 m depth, while the 100 m model would be more reliable in deeper areas.

To assess whether sampling density contributed to model performance, we looked for patterns relating model accuracy to the density of observations (not shown). We found sample density was highly correlated with depth (as expected given the sampling context of much of the build data; see S1 Table in S1 File), making it impossible to disentangle the effect of density from depth with our observations. While understanding the role of sample density would provide important information for the design of sampling programs, accuracy is likely to be maximized when sample density is the same as or finer than the analytic resolution. Ideally, the choice of resolution would include an explicit rationale to ensure the resulting product is not misinterpreted by drawing inferences at inappropriate resolutions.

Analytic resolution also influences process stationarity because processes are scale-dependent [66]. This means coarser models will better represent more averaged conditions (which by definition have less variability and correspondingly higher stationarity). Thus, our observed differences between the 20 m and 100 m models are also due, in part, to the different processes captured by the different resolutions.

Process stationarity

Correctly representing driving processes is central to predictive power [67]. However, the reliability of such representations across a seascape depends in part on the assumption of stationarity, a typically tenuous assumption, particularly across larger spatial extents [68]. In this analysis, both class-based results (Fig 2) and variable importance (Fig 5) showed strong evidence for non-stationary processes across regions, while non-stationarity across both regions and depths is suggested by differences in aggregated metrics of model fit (Figs 6 and 7) and error assessment (Fig 9).

There are obvious reasons for non-stationarity across regions. For example, the SOG is strongly influenced by sediments from the Fraser River [65] while exposed regions are more influenced by wind-wave energy. Elsewhere, the unique characterization of variable importance on the NCC may reflect the competing dominance of tidal energy in channels and wind-wave energy (fetch) in inlets and exposed areas creating within-region differences. Finally, predictions of sand in exposed coastal areas of the WCVI may be made more difficult because the local processes responsible differ from an otherwise strong association between Rock and high fetch. Other (unrepresented) factors such as freshwater input and upland geology will also differ both within and across regions. Given that these processes play out on a crenulated coastline carved up by deep narrow fjords, it would not be surprising if the distribution of substrate types also depended on local processes and geological history.

Such local processes cannot be teased out by classification models. Instead, they are generalized across the model domain according to the prevalence and spatial distribution of the observations. We argue that model reliability is higher in places where predictor values are closer to the center of their ranges, thereby avoiding boundary conditions. This means oceanographic uniqueness will likely be correlated with poor model quality.

Bias and accuracy

Models of substrate, habitat and climate are regularly built on other models. Thus, any layers used in this way (e.g., bathymetries, current models, remotely sensed primary production) will have their own artefacts, uncertainties and limitations. Careful consideration of data sampling and processing steps such as averaging, aggregation, or interpolation methods is therefore warranted.

In contrast to their consistent validation against the build test data, the predictive power of our models varied across the independent data sets, a clear indication of the differences in data context. Such differences are a function of both data collection and preparation. For example, the occurrence of Dive data deeper than 50 m depth (as implied by the 100 m model, Fig 10A) is an artefact of large pixels in the coastal zone over-generalizing depths. This effect is also evident in the ROV data, which are typically not collected in waters shallower than 20 m. This misallocation of the data is not due to positional inaccuracy of the observations (we screened the depths of the independent data observations for agreement with the 20 m bathymetry) but rather a function of the resolution of the modelled bathymetry. Specifically, since a 100 m raster cell often covers a wide range of actual depths, particularly in steep topography, any spatially associated observation would be assigned to the single raster value regardless of the observation depth. Thus, what can appear to be a positional inaccuracy or bad data can actually be a function of analytic resolution.

This suggests that building models with a compilation of data from different sources can reduce (or average) sampling bias. This also implies that a model built with data from a single sampling context may have lower predictive power. We suggest that using build data compiled from several different sampling contexts will improve model performance because the diversity of biases will force the classifications to be more general, much like the generalization of processes described above regarding stationarity.

Model reliability is also influenced by the number and nature of the classes used in the classification. For example, Rock, Sand, and Mud are all more definitive than our Mixed class which, by definition, included a variety of heterogeneous classes (e.g., sand and cobble, gravelly sand, boulders on silt). With this diversity (i.e., a lack of independence), any method would be hard-pressed to reliably describe such a class. It is therefore not unexpected that Mixed was less well predicted than the more independent classes (see S4 Table in S1 File). However, its consistently high TNR (Fig 2) shows it was rarely predicted in error, perhaps in part due to its relatively small sample size (Fig 3 and S3 Table in S1 File). We suggest such a class is useful to the overall classification because providing a home for less definitive observations reduces the misclassification of the other classes, and provides insight into local heterogeneity. While an alternative is to have more classes, this can exacerbate the prevalence problem and lead to reduced accuracy [22, 24].

Spatially, we have shown that examining maps of model predictions is also critical to understanding model accuracy. Our qualitative visual assessment identified both poor spatial accuracy in mapped predictions and spatial artefacts, neither of which are apparent in the performance metrics. In addition to non-stationarity and analytic resolution (discussed above), the accuracy of predictors must also be considered. For example, the ocean current model we used (for all models except the 20 m SOG) had a native resolution of 3 km. While the related predictors (tidal and ocean current energy) were the most important in the 100 m model, the interpolation and resampling to 20 m may have served to obscure rather than enhance nearshore variability, despite still helping explain the broader pattern in the observations. Similarly, the artefacts evident in both the 20 and 100 m bathymetric models, themselves the result of sampling bias [e.g., 31], serve as a caution against fitting models too closely to a modelled predictor. These insights suggest we have reached the limits of what can be achieved with our existing predictor data, and that higher resolution predictors that more closely match the analytic resolution are needed to improve model accuracy.

Recognizing artefacts is critical. We suggest all predictor layers be examined carefully before modelling. Our experience shows that artefacts (e.g., in depth) are more clearly visible when examining derivatives. If artefacts are observed, they can often be mitigated by smoothing (S3 Fig in S1 File). Another example relates to the importance of including terrestrial elevations, which can improve bathymetric derivatives (i.e., slope and rugosity) [31] and improve the representation of steep-sided rocky shorelines in coastal systems [8]. In the end, we can only develop our models with the best information available. However, it is useful to understand the limits of our predictors, and to explicitly communicate how these limitations should inform model interpretation.

Measuring performance

Our study shows the value of using both accuracy and error assessment metrics, and of comparing performance across spatial subsets (e.g., regions and depths) and individual classes. The error assessment provided insight into tests of predictive power, showing differences across independent data sets (Table 4) and how errors can be associated with resolution (Table 3). We also found differences in how accuracy metrics respond to class weighting, with the TNR more responsive than Overall Accuracy (Table 3), corroborating the observation by Allouche et al. [17] that prevalence has a greater influence on TNR than Accuracy. Class-based metrics also showed the effect of class weighting on model performance (Fig 2) and contributed to our assessment of spatial non-stationarity (Table 4 and Fig 9) by identifying differences in model performance across regions and depths.

By using well known focal areas our qualitative spatial assessment uncovered important differences between numerical and mapped performance not otherwise apparent. Other, more detailed spatial assessments are possible, but in our case they would be complicated by spatial sampling bias (e.g., the rocky bias of the training data, and the shallow bias of the independent dive data), sample density, and questions of spatial-autocorrelation. Such analyses would therefore be most effective with purpose-collected data.

Next steps

Our illustration of how depth and resolution influence predictive performance challenges the feasibility of producing a gridded coastwide substrate map at a single resolution, pointing to the need for an object-based framework to integrate substrate class polygons developed at multiple resolutions. Such efforts could extend existing object-based efforts in the region [6], and take advantage of the work done testing the performance of predictors across resolutions, as has been done in some random forest BS classifications [21].

Independent data collected with dedicated surveys would not only support tests of spatial error, but could also be used to test the suggestion that more typical parts of the coast will be better predicted than more unique areas. Such data could also be used to assess sub-regional differences in process. Improving predictor resolution would also help improve classification in shallower waters. For example, the interpolated fetch used in the 20 m regional models could be replaced with a higher resolution fetch product, and all models would benefit from a coastwide ocean circulation model with sub-kilometer resolution.

Methodologically, we showed that class weighting affords a benefit similar to a two-step procedure in which the dominant class is modelled separately from the three more balanced classes and the results are then combined [see 69]. Despite the imbalance in our build data, the mapped predictions were more balanced (S4 Fig in S1 File) and more consistent with the known characteristics of these regions [65] and our collective experience with the study area. However, our models did predict the independent Dive data (which, like the build data, were biased towards Rock substrate) with greater accuracy than the more balanced Camera data. This suggests that differences in prevalence between build and test data can influence estimates of predictive power. While we found no guidance on whether model accuracy improves as sample prevalence approaches true prevalence, this highlights the need for further research on the role of imbalance, and on how to trade off the accuracy gained from a balanced training sample against an accurate representation of real-world prevalence. Such studies are critical given the challenges imbalanced data pose to random forest models [19].
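As a hedged sketch of what such weighting can look like using the ranger package [58]: the inverse-prevalence scheme and object names below are assumptions for illustration, not necessarily the exact weighting applied in this study.

    library(ranger)

    # 'train' is a hypothetical data frame with a factor response
    # 'substrate' (Rock, Mixed, Sand, Mud) and predictor columns
    w <- 1 / table(train$substrate)    # rarer classes get larger weights
    w <- as.numeric(w / sum(w))        # normalize; order follows factor levels

    fit <- ranger(substrate ~ ., data = train,
                  num.trees     = 500,
                  class.weights = w)   # cost-sensitive splitting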

Disentangling the effect of sample size from its spatial distribution is also an understudied challenge. Questions include whether model accuracy is higher in areas with higher sampling density, whether accuracy is better served by a balanced training sample or by one that more closely corresponds to real-world prevalence, and whether accuracy is maximized by matching the resolution of the analysis to the sampling resolution. Answers to these questions would improve sampling design and support the development of an object-based approach to integrate features from different resolutions.

Investigating how well models predict more heterogeneous (e.g., Mixed) classes could also help guide model refinement, including the definition of more discrete classes [as proposed, for example, by 6]. To understand the interaction between sampling density and depth, estimates of spatial variability in the training data [e.g., 70] or of spatial uncertainty in model predictions [e.g., 33] could also provide insights (see the sketch below). Such methods may, however, be more relevant to the ecological models produced using the substrate layer developed here.
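A minimal sketch of the uncertainty-mapping idea, using a probability forest in ranger [58]; the data objects and the uncertainty measure are illustrative assumptions:

    library(ranger)

    # Fit a probability forest on a hypothetical training data frame with
    # a factor response 'substrate' and predictor columns
    pfit <- ranger(substrate ~ ., data = train, probability = TRUE)

    # Predicted class probabilities for a data frame of grid-cell predictors
    p <- predict(pfit, data = grid_cells)$predictions

    # One simple per-cell uncertainty measure: 1 minus the winning class
    # probability (0 = certain; up to 0.75 for four classes)
    uncertainty <- 1 - apply(p, 1, max)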

Despite the significant challenges of classifying BS data collected during bathymetric surveys, meter-scale bathymetry is becoming available for an increasingly large portion of the coast, particularly in deeper waters. These data could be used directly in our analysis at local scales to produce higher resolution outputs and support cross-scale analyses. More detailed, reliable, local MB classifications calibrated to BS data [e.g., 29] would also be invaluable as independent data for testing the substrate predictions developed here.

Conclusions

We have produced a set of comprehensive, coastwide maps of marine substrate at resolutions appropriate to nearshore and coastwide analyses (e.g., S5 Fig in S1 File). Compared to the 670,000 km2 classified by Stephens and Diesing [30], our 135,000 km2 study area is about one-fifth the size but 25 times better resolved by cell area (100 m vs. 500 m grids). Further enhanced by our 20 m regional models, this contribution is one of the best-resolved national classifications produced to date. Our spatial assessment shows that our 20 m regional models are suitable for shallower (< 50 m depth) coastal regions, while our 100 m model is more suitable for deeper, more homogeneous areas of the shelf. Although higher resolution (e.g., meter-scale) models are feasible, they will require higher resolution predictors and will likely have to be limited to regional or sub-regional areas to manage the challenge of stationarity.

We expect model reliability will be highest in more typical, well-sampled areas of the coast, where predictor values are closer to the coastwide or regional average. Predictions in more unique, under-represented areas will be less reliable. Understanding the relationship between resolution and representable features will help users assess the reliability of the mapped predictions.

Our tests of predictive power suggest that building models with data compiled from diverse sampling contexts may improve predictions by integrating the sampling biases into the models. They also emphasize the importance of distinguishing predictive power from model fit.

Our analysis is one of the few to predict substrate classes from a diverse set of observations over large spatial extents at ecologically relevant scales. Our predictive models are also the first to be evaluated using both accuracy and error metrics, illustrating the benefits of comprehensive model assessment. Our maps will contribute to marine spatial planning initiatives in Pacific Canada, and our methods may be useful in other jurisdictions where substrate maps are required.

Supporting information

S1 File

(DOCX)

Acknowledgments

Matt Grinnell created the fetch points in Hecate Strait and Queen Charlotte Sound. Input and support from Jessica Finney greatly improved the analysis presented in this paper. We are grateful to CHS for providing their substrate observations, and Peter Wills in particular for insights into characteristics of the CHS data and for making the data from field sheets available. Collegial reviews of an earlier draft by Cooper Stacey and Beatrice Proudfoot, and formal reviews by Gary Greene and David Bowden greatly improved the coherence and readability of this paper.

Data Availability

The data and associated project code are available on GitHub at https://github.com/ejgregr/substrate_model.

Funding Statement

Dr. Gregr is the principal of SciTech Environmental Consulting. His funding for this study was provided by Fisheries and Oceans Canada on a contractual basis. The funder did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Brown CJ, Smith SJ, Lawton P, Anderson JT. Benthic habitat mapping: A review of progress towards improved understanding of the spatial ecology of the seafloor using acoustic techniques. Estuar Coast Shelf Sci. 2011;92(3):502–20.
  • 2. Kostylev VE, Todd BJ, Fader GB, Courtney R, Cameron GD, Pickrill RA. Benthic habitat mapping on the Scotian Shelf based on multibeam bathymetry, surficial geology and sea floor photographs. Mar Ecol Prog Ser. 2001;219:121–37.
  • 3. Brown CJ, Blondel P. Developments in the application of multibeam sonar backscatter for seafloor habitat mapping. Applied Acoustics. 2009;70(10):1242–7.
  • 4. Greene HG, Bizzarro JJ, O’Connell VM, Brylinsky C. Construction of digital potential marine benthic habitat maps using a coded classification scheme and its application. In: Todd BJ, Greene HG, editors. Mapping the seafloor for habitat characterization. Geol. Assoc. Can. Spec. Pap. 47; 2007. pp. 141–155.
  • 5. Stephens D, Diesing M. A comparison of supervised classification methods for the prediction of substrate type using multibeam acoustic and legacy grain-size data. PLoS One. 2014;9(4):e93950. doi: 10.1371/journal.pone.0093950
  • 6. Gregr EJ, Lessard J, Harper J. A spatial framework for representing nearshore ecosystems. Prog Oceanogr. 2013;115:189–201.
  • 7. Lamarche G, Lurton X. Recommendations for improved and coherent acquisition and processing of backscatter data from seafloor-mapping sonars. Marine Geophysical Research. 2018;39(1–2):5–22.
  • 8. Haggarty DR, Yamanaka L. Evaluating Rockfish Conservation Areas in southern British Columbia, Canada using a Random Forest model of rocky reef habitat. Estuar Coast Shelf Sci. 2018;208:191–204.
  • 9. Lucieer V, Hill NA, Barrett NS, Nichol S. Do marine substrates ‘look’ and ‘sound’ the same? Supervised classification of multibeam acoustic data using autonomous underwater vehicle images. Estuar Coast Shelf Sci. 2013;117:94–106.
  • 10. CBD. Strategic Plan for Biodiversity 2011–2020, including Aichi Biodiversity Targets. Convention on Biological Diversity. 21 January 2020.
  • 11. UK. Global Ocean Alliance: 30 countries are now calling for greater ocean protection: Department for Environment, Food & Rural Affairs; 2020 [updated 3 October 2020; cited 30 Nov 2020]. Available from: https://www.gov.uk/government/news/global-ocean-alliance-30-countries-are-now-calling-for-greater-ocean-protection.
  • 12. Lawrence E, Hayes KR, Lucieer VL, Nichol SL, Dambacher JM, Hill NA, et al. Mapping habitats and developing baselines in offshore marine reserves with little prior knowledge: a critical evaluation of a new approach. PLoS One. 2015;10(10):e0141051. doi: 10.1371/journal.pone.0141051
  • 13. Diesing M, Stephens D. A multi-model ensemble approach to seabed mapping. J Sea Res. 2015;100:62–9.
  • 14. Breiman L. Random forests. Machine Learning. 2001;45(1):5–32.
  • 15. Hastie T, Tibshirani R, Friedman J. Random forests. In: The elements of statistical learning. Springer; 2009. pp. 587–604.
  • 16. Cutler DR, Edwards TC Jr, Beard KH, Cutler A, Hess KT, Gibson J, et al. Random forests for classification in ecology. Ecology. 2007;88(11):2783–92. doi: 10.1890/07-0539.1
  • 17. Allouche O, Tsoar A, Kadmon R. Assessing the accuracy of species distribution models: Prevalence, kappa and the true skill statistic (TSS). J Appl Ecol. 2006;43:1223–32. doi: 10.1111/j.1365-2664.2006.01214.x
  • 18. Manel S, Williams HC, Ormerod SJ. Evaluating presence–absence models in ecology: the need to account for prevalence. J Appl Ecol. 2001;38(5):921–31.
  • 19. Johnson JM, Khoshgoftaar TM. Survey on deep learning with class imbalance. Journal of Big Data. 2019;6(1):27.
  • 20. Elith J, Leathwick JR. Species distribution models: ecological explanation and prediction across space and time. Annual Review of Ecology, Evolution, and Systematics. 2009;40:677–97.
  • 21. Misiuk B, Lecours V, Bell T. A multiscale approach to mapping seabed sediments. PLoS One. 2018;13(2):e0193647. doi: 10.1371/journal.pone.0193647
  • 22. Porskamp P, Rattray A, Young M, Ierodiaconou D. Multiscale and hierarchical classification for benthic habitat mapping. Geosciences. 2018;8(4):119.
  • 23. Pelletier D, Selmaoui-Folcher N, Bockel T, Schohn T. A regionally scalable habitat typology for assessing benthic habitats and fish communities: Application to New Caledonia reefs and lagoons. Ecology and Evolution. 2020. doi: 10.1002/ece3.6405
  • 24. Turner JA, Babcock RC, Hovey R, Kendrick GA. Can single classifiers be as useful as model ensembles to produce benthic seabed substratum maps? Estuar Coast Shelf Sci. 2018;204:149–63.
  • 25. Vasquez M, Chacón DM, Tempera F, O’Keeffe E, Galparsoro I, Alonso JS, et al. Broad-scale mapping of seafloor habitats in the north-east Atlantic using existing environmental data. J Sea Res. 2015;100:120–32.
  • 26. Haggarty DR, Shurin JB, Yamanaka KL. Assessing population recovery inside British Columbia’s Rockfish Conservation Areas with a remotely operated vehicle. Fisheries Research. 2016;183:165–79.
  • 27. Costanza R, d’Arge R, De Groot R, Farber S, Grasso M, Hannon B, et al. The value of the world’s ecosystem services and natural capital. Ecol Econ. 1998;25(1):3–15.
  • 28. Spalding MD, Fox HE, Allen GR, Davidson N, Ferdaña ZA, Finlayson M, et al. Marine Ecoregions of the World: A Bioregionalization of Coastal and Shelf Areas. Bioscience. 2007;57(7):573–83.
  • 29. Herkül K, Peterson A, Paekivi S. Applying multibeam sonar and mathematical modeling for mapping seabed substrate and biota of offshore shallows. Estuar Coast Shelf Sci. 2017;192:57–71.
  • 30. Stephens D, Diesing M. Towards quantitative spatial models of seabed sediment composition. PLoS One. 2015;10(11):e0142502. doi: 10.1371/journal.pone.0142502
  • 31. Davies SC, Gregr EJ, Bureau D, Wills P. Coastal digital elevation models integrating ocean bathymetry and land topography for marine ecological analyses in Pacific Canadian waters. Can. Tech. Rep. Fish. Aquat. Sci. 3321. 2019; vi + 38 p.
  • 32. Gregr EJ, Palacios DM, Thompson A, Chan KM. Why less complexity produces better forecasts: An independent data evaluation of kelp habitat models. Ecography. 2018. doi: 10.1111/ecog.03470
  • 33. Nephin J, Gregr EJ, St. Germain C, Fields C, Finney JL. Development of a species distribution modelling framework and its application to twelve species on Canada’s Pacific coast. Can. Sci. Advis. Sec. Res. Doc. 2020/004. 2020; xii + 107 p.
  • 34. Comber A, Brunsdon C, Charlton M, Harris P. Geographically weighted correspondence matrices for local error reporting and change analyses: mapping the spatial distribution of errors and change. Remote Sensing Letters. 2017;8(3):234–43.
  • 35. Guisan A, Zimmermann NE. Predictive habitat distribution models in ecology. Ecol Model. 2000;135:147–86.
  • 36. Foody GM. Explaining the unsuitability of the kappa coefficient in the assessment and comparison of the accuracy of thematic maps obtained by image classification. Remote Sens Environ. 2020;239:111630.
  • 37. Pontius RGJ, Millones M. Death to Kappa: birth of quantity disagreement and allocation disagreement for accuracy assessment. Int J Remote Sens. 2011;32(15):4407–29.
  • 38. Lawson CR, Hodgson JA, Wilson RJ, Richards SA. Prevalence, thresholds and the performance of presence–absence models. Methods in Ecology and Evolution. 2014;5(1):54–64.
  • 39. Roberts DR, Bahn V, Ciuti S, Boyce MS, Elith J, Guillera-Arroita G, et al. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography. 2017;40(8):913–29.
  • 40. Elith J, Ferrier S, Huettmann F, Leathwick J. The evaluation strip: A new and robust method for plotting predicted responses from species distribution models. Ecol Model. 2005;186:280–9.
  • 41. Fawcett TW, Higginson AD. Heavy use of equations impedes communication among biologists. Proceedings of the National Academy of Science. 2012;109(29):11735–9. doi: 10.1073/pnas.1205259109
  • 42. Guisan A, Thuiller W, Zimmermann NE. Habitat suitability and distribution models: with applications in R. Cambridge University Press; 2017.
  • 43. Yates KL, Bouchet PJ, Caley MJ, Mengersen K, Randin CF, Parnell S, et al. Outstanding challenges in the transferability of ecological models. Trends Ecol Evol. 2018;33(10):790–802. doi: 10.1016/j.tree.2018.08.001
  • 44. Araújo MB, Anderson RP, Barbosa AM, Beale CM, Dormann CF, Early R, et al. Standards for distribution models in biodiversity assessments. Science Advances. 2019;5(1):eaat4858. doi: 10.1126/sciadv.aat4858
  • 45. Zhang C, Chen Y, Xu B, Xue Y, Ren Y. Temporal transferability of marine distribution models in a multispecies context. Ecol Indicators. 2020;117:106649.
  • 46. Foody GM. Local characterization of thematic classification accuracy through spatially constrained confusion matrices. Int J Remote Sens. 2005;26(6):1217–28.
  • 47. Prasad AM, Iverson LR, Liaw A. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems. 2006;9(2):181–99.
  • 48. Carignan K, Eakins B, Love M, Sutherland M, McLean S. Bathymetric digital elevation model of British Columbia, Canada: procedures, data sources, and analysis. NOAA National Geophysical Data Center (NGDC). 2013.
  • 49. Gregr EJ. BC_EEZ_100m: A 100 m raster of the Canadian Pacific exclusive economic zone. SciTech Environmental Consulting. Vancouver, BC. 2012.
  • 50. Masson D, Fine I. Modeling seasonal to interannual ocean variability of coastal British Columbia. Journal of Geophysical Research: Oceans. 2012;117(C10):C10019. doi: 10.1029/2012jc008151
  • 51. Soontiens N, Allen SE, Latornell D, Le Souëf K, Machuca I, Paquin J-P, et al. Storm surges in the Strait of Georgia simulated with a regional model. Atmosphere-Ocean. 2016;54(1):1–21.
  • 52. Gregr EJ. Fetch Geometry Calculator Version 1.0 – User Guide. SciTech Environmental Consulting. Vancouver, BC. 2014.
  • 53. Dunn DC, Halpin PN. Rugosity-based regional modeling of hard-bottom habitat. Mar Ecol Prog Ser. 2009;377:1–11.
  • 54. Gregr EJ, Gryba R, Li M, Alidina H, Kostylev V, Hannah CG. A benthic habitat template for Pacific Canada’s continental shelf. Can. Tech. Rep. Hydrogr. Ocean Sci. 312. 2016; vii + 37 p.
  • 55. Lessard J, Campbell A. Describing northern abalone, Haliotis kamtschatkana, habitat: focusing rebuilding efforts in British Columbia, Canada. J Shellfish Res. 2007;26(3):677–86.
  • 56. Davies SC, Bureau D, Lessard J, Taylor S, Gillespie GE. Benthic habitat mapping surveys of eastern Haida Gwaii and the North Coast of British Columbia, 2013–2015. Can. Tech. Rep. Fish. Aquat. Sci. 3278. 2018; vi + 24 p.
  • 57. ESRI. ArcGIS, version 10.4. 2019.
  • 58. Wright MN, Ziegler A. ranger: A fast implementation of random forests for high dimensional data in C++ and R. Journal of Statistical Software. 2017;77(1):1–17. doi: 10.18637/jss.v077.i01
  • 59. R Core Team. R: A language and environment for statistical computing. 2018. https://www.R-project.org/.
  • 60. Stehman SV. Selecting and interpreting measures of thematic classification accuracy. Remote Sens Environ. 1997;62(1):77–89.
  • 61. Congalton RG. A review of assessing the accuracy of classifications of remotely sensed data. Remote Sens Environ. 1991;37(1):35–46.
  • 62. Pontius RGJ, Santacruz A. Quantity, exchange, and shift components of difference in a square contingency table. Int J Remote Sens. 2014;35(21):7543–54.
  • 63. Jolliffe IT, Stephenson DB. Forecast verification: a practitioner’s guide in atmospheric science. John Wiley & Sons; 2012.
  • 64. Peirce CS. The numerical measure of the success of predictions. Science. 1884;4(93):453–4. doi: 10.1126/science.ns-4.93.453-a
  • 65. Thomson RE. Oceanography of the British Columbia coast. Can. Spec. Publ. Fish. Aquat. Sci. 56. 1981; 291 p.
  • 66. Wiens JA. Spatial scaling in ecology. Funct Ecol. 1989;3:385–97.
  • 67. Austin MP. Spatial prediction of species distribution: an interface between ecological theory and statistical modelling. Ecol Model. 2002;157:101–18.
  • 68. Scales KL, Hazen EL, Jacox MG, Edwards CA, Boustany AM, Oliver MJ, et al. Scale of inference: on the sensitivity of habitat models for wide-ranging marine predators to the resolution of environmental data. Ecography. 2017;40(1):210–20.
  • 69. Misiuk B, Diesing M, Aitken A, Brown CJ, Edinger EN, Bell T. A spatially explicit comparison of quantitative and categorical modelling approaches for mapping seabed sediments using Random Forest. Geosciences. 2019;9(6):254.
  • 70. Mitchell PJ, Downie A-L, Diesing M. How good is my map? A tool for semi-automated thematic mapping and spatially explicit confidence assessment. Environ Model Software. 2018;108:111–22.

Decision Letter 0

Judi Hewitt

6 Apr 2021

PONE-D-20-40989

Comprehensive marine substrate classification of Canada’s Pacific shelf

PLOS ONE

Dear Dr. Gregr,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The reviewers find a lack of clarity in many areas, in particular around the maps that may be derived from the model and how they would compare to present maps.  Reviewer 2 offers a way to bring clarity to the presentation of the manuscript that would help readers.

Please submit your revised manuscript by May 21 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Judi Hewitt

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially sensitive information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

3. Thank you for stating the following in the Competing Interests section:

[The authors declare no competing interests.].   

We note that one or more of the authors are employed by a commercial company: SciTech Environmental Consulting

  1. Please provide an amended Funding Statement declaring this commercial affiliation, as well as a statement regarding the Role of Funders in your study. If the funding organization did not play a role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript and only provided financial support in the form of authors' salaries and/or research materials, please review your statements relating to the author contributions, and ensure you have specifically and accurately indicated the role(s) that these authors had in your study. You can update author roles in the Author Contributions section of the online submission form.

Please also include the following statement within your amended Funding Statement.

“The funder provided support in the form of salaries for authors [insert relevant initials], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.”

If your commercial affiliation did play a role in your study, please state and explain this role within your updated Funding Statement.

2. Please also provide an updated Competing Interests Statement declaring this commercial affiliation along with any other relevant declarations relating to employment, consultancy, patents, products in development, or marketed products, etc.  

Within your Competing Interests Statement, please confirm that this commercial affiliation does not alter your adherence to all PLOS ONE policies on sharing data and materials by including the following statement: "This does not alter our adherence to  PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests) . If this adherence statement is not accurate and  there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.

Please include both an updated Funding Statement and Competing Interests Statement in your cover letter. We will change the online submission form on your behalf.

Please know it is PLOS ONE policy for corresponding authors to declare, on behalf of all authors, all potential competing interests for the purposes of transparency. PLOS defines a competing interest as anything that interferes with, or could reasonably be perceived as interfering with, the full and objective presentation, peer review, editorial decision-making, or publication of research or non-research articles submitted to one of the journals. Competing interests can be financial or non-financial, professional, or personal. Competing interests can arise in relationship to an organization or another person. Please follow this link to our website for more details on competing interests: http://journals.plos.org/plosone/s/competing-interests

4. We note that Figures 1, 4, 8, S3 and Striking Image in your submission contain map (Fig. 1) / satellite (Fig. 4, 8, S3 and Striking Image) images which may be copyrighted. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For these reasons, we cannot publish previously copyrighted maps or satellite images created using proprietary data, such as Google software (Google Maps, Street View, and Earth). For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission:

  1. You may seek permission from the original copyright holder of Figures 1, 4, 8, S3 and Striking Image to publish the content specifically under the CC BY 4.0 license. 

We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission.

In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

  2. If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

The following resources for replacing copyrighted map figures may be helpful:

USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/

The Gateway to Astronaut Photography of Earth (public domain): http://eol.jsc.nasa.gov/sseop/clickmap/

Maps at the CIA (public domain): https://www.cia.gov/library/publications/the-world-factbook/index.html and https://www.cia.gov/library/publications/cia-maps-publications/index.html

NASA Earth Observatory (public domain): http://earthobservatory.nasa.gov/

Landsat: http://landsat.visibleearth.nasa.gov/

USGS EROS (Earth Resources Observatory and Science (EROS) Center) (public domain): http://eros.usgs.gov/#

Natural Earth (public domain): http://www.naturalearthdata.com/


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This is a fairly well written manuscript that describes the construction and use of a statistical model based on random forest classification using bottom type data to map substrate along Canada’s Pacific margin. While the authors extensively describe their modelling development and provide a convincing argument for the model’s validity, I found that a good comparison of the model with detailed marine benthic habitat maps that exist for the region was missing. Unfortunately, I am not qualified to fully evaluate the statistical approaches described in the manuscript, so focused on the practical aspects of constructing substrate maps.

While I think that the model could be very helpful in making a first approximation of substrate distribution when good multibeam bathymetric data is not available, but sediment samples and other seafloor data is at hand, I am concerned that it could also be misleading. Generally, I found the modeling concept sound and the authors do a good job at testing the model’s applicability. I will leave it to those better qualified than me to evaluate the statistic.

I have a few comments to make about the text. While the text is generally clear there are some areas that I found in need of further explanation or clarification. I have added these few comments in the pdf. However, a few points need to be addressed here. While the manuscript is well referenced, and the references cited appear appropriate and comprehensive, the citation format is mixed (i.e., numbered as per the journal’s format in places while in other places not numbered). In addition, I have provided below some references that the authors may find useful, especially in regard to their discussion of “mixed” substrate types and that might be useful in validating the model.

The term “mixed substrate” needs to be better explained and defined when first used. The authors use backscatter as data to identify substrate types but do not fully explain how some “soft” substrate clasts such as gravel, cobble and pebbles can form a hard substrate type as a gravel-pebble-cobble pavement, which is quite common in the offshore BC region.

One confusing problem appears in the comparison of the 20 m and 100 m resolution models where a submarine canyon is not identified in the 20 m model. I am not sure why this is the case. If a distinct change in depth occurs with the presence of a canyon, why would the model not detect this? Is it because the canyon is too large? If so, what about gullies and small features, would they also not be identified in the 20 m model?

My recommendation is that the manuscript be published after minor modifications and clarification of points raised in this review are addressed. I would be especially interested in further validation of the model using available published Canadian marine benthic habitat maps for the Southern Georgia Strait region (see Greene and Barrie, editors, 2011). This would show how well the authors’ model fits with comprehensive habitat maps based on MBES data interpretations.

Suggested pertinent references:

Greene, H.G., Yoklavich, M.M., Starr, R., O’Connell, V.M., Wakefield, W.W., Sullivan, D.L. MacRea, J.E. and Cailliet, G.M., 1999. A classification scheme for deep-water seafloor habitats: Oceanographica ACTA, v. 22, n. 6, p. 663-678.

Greene, H.G., Yoklavich, M.M., O’Connell, V.M., Starr, R.M., Wakefield, W.W. and Cailliet, G.M., 2000, Mapping and classification of deep seafloor habitats: ICES paper CM 2000/T:08, 11 p.

Greene, H.G., Bizzarro, J.J., Tilden, J.E., Lopez, H.L., and Erdey, M.D., 2005. The benefits and pitfalls of geographic information systems in marine benthic habitat mapping: In Wright, D.J. and Scholz, A.J., (Eds.), Place Matters. Oregon State University Press, Portland, OR, 34-46.

Greene, H.G., Bizzarro, J.J., O’Connell, V.M., and Brylinsky, C.K., 2007. Construction of digital potential marine benthic habitat maps using a coded classification scheme and its application: In Todd, B.J., and Greene, H.G. (Eds.), Mapping the Seafloor for Habitat Characterization, Canadian Geological Association Special Paper 47, 141-155.

Reviewer #2: PONE-D-20-40989 review

Overall comments

This manuscript describes an interesting and potentially very useful initiative that represents a lot of work and will make a good paper. At present, however, I find its presentation is unclear, with too much blurring between the conventional sections of introduction, methods, results, and discussion. I also have some questions about how the analyses were performed and the inferences made from the results.

The aim of this study is to generate reliable maps of seabed substrate type for the Pacific coast and continental shelf of Canada. The authors compile ca. 200,000 point records of seabed substrate type and use a tree-based machine-learning method, Random Forest (RF), to develop correlations between the observations and gridded layers for depth, depth-derivatives, and seabed wave energy. These correlations are then used to predict substrate type as a continuous layer across the region. The authors then evaluate these predictions by both cross-validation and against a set of independent seabed observations.

The main components of interest in the study, therefore, are: (1) the provenance and spatial distribution of the point observations; (2) the provenance and accuracy of the predictor variables; (3) the provenance and spatial distribution of the independent test data; (4) the modelling methods used, and (5) the resulting maps, with the final mapped classifications being the most important.

What we actually get in the ms is a lot of detail and talk about how well the modelling method has been applied but not much detail, or clarity, on what, to me, are the main points of interest (1), (2), (3), and (5). I found the sequence of sections and content confusing and difficult to follow, with elements of introduction, methods, results, and discussion jumbled up together throughout. As a simple example, why not start the Methods with a description of the study area instead of a statement about the modelling method used and how good it is? The important, and useful thing about the study, by my reading at least, is that it brings together all available point sample data for the region and uses them to generate a classification of the entire Pacific continental shelf of Canada. The methods are important but there is already plenty of published information available about how different modelling methods and evaluation metrics compare, and at present in this ms over-emphasis of these details eclipses appreciation of the main achievement of the study: the compilation of the input data set and the new maps of substrate type generated from them.

The Introduction stretches to 8 pages, includes much material that would be more appropriate in the Discussion, and does not, to me at least, seem to follow a logical progression. The Introduction should provide a concise background to the study: why and where it was undertaken, background and issues associated with the area of research, methods available, and what the specific objectives of this study are.

The statements given at the start of the Results section are an example of one of my main issues with the way the study has been presented: it places all the emphasis on the modelling methods and very little on the input data compilation, particularly in terms of spatial distribution, the credibility of the predictor variables, or, most importantly, the utility of the final outputs. It reads more as a methods paper than an attempt to produce a useful resource for environmental management (which, of course, is what it actually is).

Specific comments

Abstract

The abstract is a concise summary of the study; summarising background, aim, study area, data and methods used, results, and conclusions. If the body of the manuscript followed this simple, logical, and interpretable structure, this would be a very nice paper. As it stands, I find the subsequent sections muddled, over-long, and confusing.

Line 28-30: “Predictive power was lower … when models were evaluated with independent data sets, emphasising how this is different from model fit”. This kind of statement seems a bit disingenuous to me, suggesting that ‘model fit’ by cross-validation using subsets of the training data and performance against fully independent survey data are of equal value in assessing the utility of a model. The real test of any model, particularly those designed to inform environmental management decisions, is how well its predictions match reality, in the form of independent observations. Perhaps no need to include “emphasizing how this is different from model fit” (it tells us nothing after all) but add more explanation in the discussion about how the models performed against independent data.

Lines 37-38: “This understanding relies on models of habitat suitability …”. This seems a sweeping statement to me when “This’ covers all aspects of marine ecosystem management.

Lines 38-39: “The credibility of which depends in large part on the accuracy of the underlying environmental predictors.” Yes, this is very true but I would observe that the final layers you develop here are based on the same techniques (RF) and thus carry the same issues of uncertainty associated with the predictor variables.

Line 46: need to reference Random Forest at first use.

Line 51: “mobilizing available observations …”: how do you ‘mobilize’ observations?

Lines 56-57: Meaning of this sentence is unclear to me: at this stage, the reader has no idea what is meant, in this context, by “weighting for prevalence”, and presumably the meaning is ‘use of diverse evaluation metrics’, and what does ‘qualitative assessment’ refer to here?

Line 61: Prior to MBES, seabed characteristics were derived from empirical point observations, which were often accurate and, for older lead-line records, included physical samples of the seabed. The 'inference' element comes when continuous bathymetry layers are created. For most of the world's oceans and seas, this is still the case.

Line 69: Need to be more concise with language: as written here, the meaning is “comprehensive surveys are particularly expensive and time consuming for less developed countries …”. The time and expense are the same, whatever the economic status of the country; it’s just the affordability that differs. And the final clause of the sentence doesn’t match its subject (i.e. “comprehensive acoustic surveys … can take decades to completely map.”)

Line 74: How much, approximately of the shelf here has been mapped?

##: “… a diversity of metrics …”. Has no meaning; just tell us what you used.

Methods

Lines 221-223: should be in the Introduction.

Line 226: “nested” needs to be defined, i.e., nested within what? I assume within the ‘coastwide’ model but this is not explicit in your sentence.

Line 227: “paired models”, meaning what? Without clear explanation these terms are meaningless to the reader.

Lines 219-240: These first three paragraphs of the Methods are also a prime example of what I struggle with in the presentation of this study. They give a condensed summary, an abstract in effect, of what the study did but without any detail. This level of explanation would work well in the Introduction but here, in the Methods, it’s just confusing. For instance, the most fundamental aspect of the study is the input dataset of substrate type observations: this is the first thing the reader needs to be told about, in detail, to be able to understand what the subsequent models are working on and thus assess whether the resulting maps make sense. At present, the input data appear almost as an afterthought, with just a passing reference to Table 1 and no explanation of the data provenance, spatial distribution, or reliability.

Lines 259-260: But how was the weighting done: more weight to higher prevalence? Need explicit methods descriptions.

Lines 270-272: There is not enough detail on how this partition into training and test partitions was done. For spatial data, the way in which test data are selected can have strong influence on subsequent evaluation metrics. Were the test data selected at random, or in spatial bands, or by a more sophisticated spatially disaggregated method? Also, the wording here and later implies that only one iteration of each model was generated, all using the same partition of training and test data. If that is what was done, explanation is needed as to why k-fold cross validation (multiple iterations of each model, each iteration using a different split of the input data between training and test) was not conducted.

Line 275 and onwards: “Addressing our objectives”. Why is this a subsection in the Methods? Too much of the text here should really be in the Introduction or Discussion, not here in the Methods.

Line 280: “We weighted classes according to their prevalence”. Again, how? Weighted up, or down, and by what proportion?

Line 285-286: The input variables of this study, both response and predictor, are relatively very simple, representing primarily (entirely?) physical factors. I am not convinced by the argument that the physical process factors here should be expected to be non-stationary. I suspect differences in the density (‘prevalence’) and reliability of input response and predictor data will have a more important influence on outcomes than non-stationarity of processes.

Line 306 and onwards: “Model evaluation”. Again, by my reading of it, far too much wordage that should be (or already is) covered in the introduction or discussion. I might be jaded but much of this reads like rehashed material from textbooks. The point is, however, that I am used to working with these kinds of data and this kind of modelling method, and the further I read here, the more I find myself confused as to what was done and why.

Line 347: “Model build data”. At last! But there is no detail given about the spatial distribution of these data. For interpretation of the results, I would argue that it is essential to show the reader maps showing how these input data are distributed in space.

Line 379 and onwards: “Independent evaluation data”. How did you decide on which data to include in the ‘build’ set and which in the ‘independent’ set? Both sets include DFO Dive and ROV data, so how do these differ from the cross-validation test data withheld from the training dataset? If the two set of data are actually just arbitrary subsets from the same sources, the independence of the ‘independent’ dataset would be questionable. Again, needs clearer explanation of basic details.

Results

As with the methods, I find the sequence of sections here to be unintuitive, and the content to mix results with discussion material.

Lines 404-407: This paragraph is discussion material.

Line 533: Ah ha. Here, at last, we have more detail about the independent data but still, I would say, not enough to assess their utility. For instance, N = nearly 2,500 ROV ‘mud’ observations for the coastwide model domain but if each observation represents records of substrate type at 20 m intervals along seabed transects each one of which might be one or more km long (at 50 records per km), these data are likely to be strongly clumped in space. If you have not taken measures to account for this spatial clumping of the data, the resulting metrics of performance are likely to be unreliable and probably inflated. We need to see how these records are distributed in space to be able to assess whether the results are useful or not.

Results in general: I would have found it much more useful and interpretable to have included both cross-validation and independent test scores in the same table, simplified down to just one or two example metrics: all the rest could go into the Supplementary Material. Also, a question: where can the final map outputs be found? If the aim was to generate mapped predictions for use in environmental management, the outputs need to be accessible.

Discussion

I found this section to read better than the others. I have not made detailed notes but I would make the same observation about the inferences around stationarity: given the imbalances in the spatial distribution and provenance of your sample and test data, can you really be sure that the differences in model performance you see among regions is attributable to non-stationarity in environmental process rather than artefacts in your input data?

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: H. Gary Greene

Reviewer #2: Yes: David Bowden

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: PONE-D-20-40989_reviewer.pdf

PLoS One. 2021 Oct 29;16(10):e0259156. doi: 10.1371/journal.pone.0259156.r002

Author response to Decision Letter 0


1 Jul 2021

Comprehensive marine substrate classification applied to Canada’s Pacific shelf

Manuscript # PONE-D-20-40989

Response to reviewers

2021/06/27

Summary

The manuscript has been extensively re-organized largely in response to Reviewer #2’s concerns about clarity of presentation. The details of this re-organization are described in specific responses below. In addition, we have reviewed the entire paper for unnecessary jargon and clarity, removed redundancies, and added definitions where required. For example, we found 25 occurrences of the word ‘context’, which we have now restricted to the meaning we defined (sampling context). We replaced the remaining occurrences with more precise terminology.

All comments from Reviewer #1 were addressed and are described in the Specific comments section below. Reviewer #2 provided more general comments, which motivated more significant changes to the paper. In the following sections, we have italicized all reviewer comments and indented our response below them.

General comments (Reviewer #2)

“the presentation is unclear, with too much blurring between the conventional sections of introduction, methods, results, and discussion”.

“I found the sequence of sections and content confusing and difficult to follow, with elements of introduction, methods, results, and discussion jumbled up together throughout. As a simple example, why not start the Methods with a description of the study area instead of a statement about the modelling method used and how good it is? “

“The Introduction stretches to 8 pages, includes much material that would be more appropriate in the Discussion, and does not, to me at least, seem to follow a logical progression. The Introduction should provide a concise background to the study; why and where it was undertaken, background and issues associated with the area of research, methods available, and what the specific objectives of this study.”

We improved the clarity of the paper by re-organizing the text into a more standard format. We moved discussion text from the Results to the Discussion, and swept up other scattered introductory material into the Introduction. While the Introduction still runs to 8 pages, it has a more logical flow and contains more relevant background information. This re-organization also removed much of the redundancy in the paper.

We have re-written the challenges section to make it more concise, and to directly address several reviewer comments. We have more clearly integrated the classification work by Greene and colleagues in the section on Ecological relevance. We have replaced the Methods section ‘Addressing the challenges’ with ‘Model development and comparisons’.

“The main components of interest in the study are: (1) the provenance and spatial distribution of the point observations; (2) the provenance and accuracy of the predictor variables; (3) the provenance and spatial distribution of the independent test data; (4) the modelling methods used, and (5) the resulting maps, with the final mapped classifications being the most important. … What we actually get in the ms is a lot of detail … about how well the modelling method has been applied but not much detail or clarity on … the main points of interest (1), (2), (3), and (5).”

“The methods are important but there is already plenty of published information available about how different modelling methods and evaluation metrics compare, and at present in this ms over-emphasis of these details eclipses appreciation of the main achievement of the study: the compilation of the input data set and the new maps of substrate type generated.”

“The statements given at the start of the Results section are an example of one of my main issues with the way the study has been presented: it places all the emphasis on the modelling methods and very little on the input data compilation, particularly in terms of spatial distribution, the credibility of the predictor variables, or, most importantly, the utility of the final outputs. It reads more as a methods paper than an attempt to produce a useful resource for environmental management (which, of course, is what it actually is).”

We are very pleased the reviewer sees value in our resulting maps. However, we believe our work has broader utility than simply informing the substrate classification of the Canadian Pacific shelf, and we therefore worked hard to fully describe the methods we applied (and to provide code) so they can be applied in other regions facing the same challenges and objectives. We thus see the paper as an application of novel methods to a particular case study, and have reflected this in the revised title “Comprehensive marine substrate classification applied to Canada’s Pacific shelf.”

We included new introductory paragraphs where we now more clearly state our objectives, and have elevated the descriptions of all the data sources to the front of the Methods section. We also added a figure to the supplemental materials to more clearly describe the spatial distribution of the substrate observations.

We agree with the reviewer that “there is already plenty of published information available about how different modelling methods and evaluation metrics compare”, but point out that we are 1) presenting a novel application of random forest modelling, and 2) advancing the important step of model validation through an assessment of methodological challenges using a novel assemblage of metrics curated from three disciplines. We think that others faced with a need for broad-scale substrate maps will find value in the methods developed here to classify data with coarser resolution and wider geographic extents (compared to the methods more commonly applied to small-extent, high-resolution data).

We have de-emphasized the accuracy of the predictor variables in the Introduction.

“I also have some questions about how the analyses were performed and the inferences made from the results.”

These are elaborated on in the Specific Comments below.

Specific Comments

Reviewer #1: Comments received via marked-up PDF.

[1] line 48: All references checked for citation style. Thank you.

[2] line 48: “What is mixed? Does this mean soft and hard like defined by Greene et al. (1999; 2007) where they use induration (soft, hard, mixed hard & soft) to describe mixed? I see you define mixed later on in the manuscript but it should be defined when first used. Also, see Greene & Barrie”.

We have added a sentence emphasizing that our classes are ecologically rather than geologically derived. The sentence also refers the reader to both the descriptions of the source data in Table S1 and to the reference with the ecological rationale for the classes.

[3] line 71: Suggested edit made.

[4] lines 111/112: “Perhaps in the EUNIS system this is correct but BS goes further than just grain size prediction and can be interpreted to apply to packing, sorting, and density. Other uses of backscatter are for mapping facies changes in soft sediment as well as fractured and deformed bedrock in hard substrates.”

We have re-phrased the first sentence to make it clear we were referring to the application of backscatter (BS) classification within the EUNIS system.

[5] lines 113/114: In response to our assertion that classifications that integrate hard and soft substrates are rare in the literature, the reviewer stated “classifications that discuss the relationship of hard and soft substrates in a mixed substrate category has been discussed and defined in Greene et al. (1999; 2007).”

We appreciate the reminder of existing comprehensive classifications and have included the appropriate reference in this section.

[6] line 120: The definition of Mud has now been included earlier in the paper (see line 48 comment above) and includes ‘unconsolidated’ as a qualifier.

[7] line 150: Edit made. Thank you.

[8] line 165: Reviewer comment: “Why restrict elevation variations to landslides, why not to moraines and other irregular morphologic features that have distinct elevation changes? Also, not all landslides have distinct elevation changes, for example debris flows, or mudflows are very subtle topographic features. Sediment sampling by itself is not the best way to map marine benthic habitats, other process interpretations need to be applied.”

We thank the reviewer for their thoughts on morphological features. For clarity, and to avoid confusion with the word “landslide”, we have replaced the term “land-side” with “terrestrial” throughout the manuscript. “Landslides” are not part of this analysis.

[9] lines 248 to 250: Reference formats are inconsistent.

We thank the reviewer for noting the incorrect format and absence of the tabled references. This has been corrected.

[10] line 262: “What does this mean? Rocky reef is not a geologic term but more of a maritime term and as such is confusing. I like to think of a reef being biological and a shoal or bank composed of rock being a habitat feature comprised of rock and not calcium carbonate or silica.”

We appreciate the semantic complexities that emerge when synthesizing work from different disciplines, and thank the reviewer for sharing their perspective.

We have revised the sentence for clarity by removing the reference to rocky reefs.

[11] line 359: The reference has been corrected. Thank you.

[12] line 376: The reference has been corrected. Thank you.

[13] line 419: “This seems reasonable and not surprising as rock and mud are generally pretty stable substrates if consolidated.”

This comment reflects the effectiveness of the methods for these two classes. No change.

[14] line 601: “This term appears out of place as in geology structure relates to folds and faults and other such structures and in biology it refers to organisms. I suspect what is meant here is bottom geomorphology as that is what the scale or size of a feature is that is being identified.”

We thank the reviewer for pointing out the potential confusion of the phrase ‘bottom structure’. We have changed this to ‘substrate’ and added the qualifier ‘geomorphic’ at the end of the sentence to clearly express our meaning. These clarifications enabled further improvements to the text in the subsequent sentence.

[15] line 606: “I would encourage some caution here as substrate heterogeneity may become more uniform with depth along some continental slopes but in areas that are heavily gullied and incised by submarine canyons this is not always the case. Increased substrate heterogeneity is especially prevalent along tectonically active margins such as the Queen Charlotte transform margin of Canada.”

We appreciate the reviewer’s insight into the regional effects of plate tectonics. Our assertion here is a reflection of the available substrate data and the analysis. The inclusion of the qualifier ‘generally’ was intended to acknowledge that the observed trend may not be the case everywhere. No change.

[16] lines 610/613: “I see where the authors are going with this but do not agree that the coastline in the Strait of Georgia is more homogeneous than the outer coast, especially since the region has been tectonically deformed and more so altered by glaciation.”

We have removed our observational claim regarding the homogeneity of the coastline, and emphasized the energetic differences between this and the other regions.

[17] line 631: The reference has been corrected. Thank you.

[18] line 636: The reference has been added. Thank you.

[19] line 638: What happened to Reference #63, I did not see it cited before this reference.

Reference #63 is cited above on line 612.

[20] line 644: “What about tidal exchanges. I would suspect that the boundary conditions of the SOG would cause localized scouring and deposition even though Fraser River sedimentation is active.”

We agree that localized scouring is likely in areas of high energy and mention the importance of such local processes later in the paragraph. However, this does not invalidate the acknowledged role of sedimentation in the SOG. We have added a reference to support this assertion, and re-phrased the start of the paragraph to make our point more clearly.

[21] line 652: ”I would expect that geologic history plays a role as well.”

We have rephrased this sentence to refer to “local processes and geological history”.

[22] line 675/677: “Mixed class should be defined up front when you first discuss it.”

Addressed as part of reviewer’s comment on line 48 (above).

[23] line 683: “Not sure what this means. Do you refer to expert interpretations of say MBES data or something else. This appears to be a critical point as this would allow some validation of your model.”

We have revised the phrase ‘mapped predictions’ to ‘maps of model predictions’ to improve the clarity of this statement.

[24] line 700: The confusion between land-side and landslide has been addressed as part of the reviewer’s comment on line 165.

[25] line 703/704: “Such as what type of terrestrial data? Are you referring to LiDAR, satellite imagery, or what?”

We have changed ‘terrestrial data’ to ‘terrestrial elevations’ to more clearly make this point, and largely re-written the middle part of this paragraph for clarity.

[26] line 709: “Seems to be out-of-order.”

This comment refers to reference [63] appearing after [51]. We note that at this point in the manuscript, citations may not appear in order as many were cited earlier.

[27] line 740/742: “It would seem to me that applying you model to the habitat maps that were published by Greene and Barrie (2011) would be helpful in validating you work. The MBES data, sediment samples in the form of grabs and cores, bottom photos and other data used in the construction of their marine benthic habitat maps are readily available and could be inputted into your model.”

We considered this validation; however, we chose not to complicate an already lengthy analysis, because the Greene and Barrie (2011) analysis covers a very small portion of our study area, and because earlier work (Gregr et al. 2013) used a simpler sediment classification and showed reasonable agreement. No change.

[28] line 756: “Not sure what exactly "boundary class" means here.”

We thank the reviewer for noticing this introduced jargon. We have removed the phrase “boundary class” and use the more explicit ‘heterogeneous class’ along with an example. We have also made additional clarifications regarding the relevant next step in the rest of the paragraph.

[29] line 766/767: “From multibeam echosounder data distinct geomorphology can be machined ID and refined by expert knowledge. This should be an approach that is useful to you.”

We are aware of this method and describe it in the introduction, where we also note that a dependence on multibeam echosounder data is a hindrance to comprehensive classification for many jurisdictions. This lack of comprehensive multibeam coverage was a key motivation for our work. No change.

Reviewer #2: PONE-D-20-40989 review

Abstract

The abstract is a concise summary of the study; summarizing background, aim, study area, data and methods used, results, and conclusions. If the body of the manuscript followed this simple, logical, and interpretable structure, this would be a very nice paper. As it stands, I find the subsequent sections muddled, over-long, and confusing.

We have revised the manuscript as suggested by the reviewer; the details are described in the responses to the following comments.

[30] line 28-30: “Predictive power was lower … when models were evaluated with independent data sets, emphasizing how this is different from model fit”. This kind of statement seems a bit disingenuous to me, suggesting that ‘model fit’ by cross-validation using subsets of the training data and performance against fully independent survey data are of equal value in assessing the utility of a model. The real test of any model, particularly those designed to inform environmental management decisions, is how well its predictions match reality, in the form of independent observations. Perhaps no need to include “emphasizing how this is different from model fit” (it tells us nothing after all) but add more explanation in the discussion about how the models performed against independent data.

We have removed the sentence fragment in question from the abstract, and have revised the text in response to various comments below to emphasize the difference between model fit and model predictive power, as well as clarifying the methods applied.

[31] lines 37-38: “This understanding relies on models of habitat suitability …”. This seems a sweeping statement to me when “This’ covers all aspects of marine ecosystem management.

We have re-written the introduction for clarity and removed this phrase.

[32] lines 38-39: “The credibility of which depends in large part on the accuracy of the underlying environmental predictors.” Yes, this is very true but I would observe that the final layers you develop here are based on the same techniques (RF) and thus carry the same issues of uncertainty associated with the predictor variables.

Our revised introduction now clearly describes how our work fits within the constellation of models we believe are necessary to support coastal resource management.

[33] line 46: need to reference Random Forest at first use.

Done by moving the relevant line up from the first paragraph of the Discussion.

[34] line 51: “mobilizing available observations …”: how do you ‘mobilize’ observations?

We have replaced ‘mobilize’ with the phrase ‘making use of’.

[35] lines 56-57: Meaning of this sentence is unclear to me: at this stage, the reader has no idea what is meant, in this context, by “weighting for prevalence”, and presumably the meaning is ‘use of diverse evaluation metrics’, and what does ‘qualitative assessment’ refer to here?

We have removed the closing paragraph of the introduction containing this sentence, which was intended to foreshadow some of the more technical results.

[36] line 61: Prior to MBES, seabed characteristics were derived from empirical point observations, which were often accurate and, for older lead-line records, included physical samples of the seabed. The 'inference' element comes when continuous bathymetry layers are created. For most of the world's oceans and seas, this is still the case.

We have replaced the phrase ‘inferred from’ with ‘based on’.

[37] line 69: Need to be more concise with language: as written here, the meaning is “comprehensive surveys are particularly expensive and time consuming for less developed countries …”. The time and expense are the same, whatever the economic status of the country, its just the affordability that differs. And the final clause of the sentence doesn’t match its subject (i.e. “comprehensive acoustic surveys … can take decades to completely map.”

We have revised the relevant sentences, and made some additional edits to this section for clarity. Thank you.

[38] line 74: How much, approximately of the shelf here has been mapped?

We have added an estimate of the amount of multibeam mapping completed.

[39]: “… a diversity of metrics …”. Has no meaning; just tell us what you used.

We could not find this phrase in the manuscript. However, we do refer to ‘diverse and interpretable metrics’ at the end of the first Introduction section. We disagree with the reviewer that this has no meaning. We believe this sentence has value in foreshadowing the importance of metric selection, and believe this is preferable to listing the many metrics used and the rationale for their selection.

However, the re-organization of the manuscript prompted by various other comments from this reviewer has made the relevant section in the Methods more apparent and accessible.

[40] lines 221-223: should be in the Introduction.

We agree and have moved the sentence.

[41] line 226: “nested” needs to be defined, i.e., nested within what? I assume within the ‘coastwide’ model but this is not explicit in your sentence.

We removed the word ‘nested’ and revised the sentence for clarity.

[42] line 227: “paired models”, meaning what? Without clear explanation these terms are meaningless to the reader.

We have re-phrased for clarity, replacing “paired” with the more explicit “with and without class weights”.

[43] lines 219-240: These first three paragraphs of the Methods are also a prime example of what I struggle with in the presentation of this study. They give a condensed summary, an abstract in effect, of what the study did but without any detail. This level of explanation would work well in the Introduction but here, in the Methods, it’s just confusing. For instance, the most fundamental aspect of the study is the input dataset of substrate type observations: this is the first thing the reader needs to be told about, in detail, to be able to understand what the subsequent models are working on and thus assess whether the resulting maps make sense. At present, the input data appear almost as an afterthought, with just a passing reference to Table 1 and no explanation of the data provenance, spatial distribution, or reliability.

The intent of the preamble in the Methods section is to give the reader an overview of what is to come. The overall analysis has many aspects, and we believe this overview provides considerable value as a guide for the reader.

However, in response to the reviewer’s comments, we have limited this to one paragraph, which also now references the data overviews (Table 1 and Table 2) in the first sentence.

We follow this paragraph with the three sections that detail each of the source data sets (Model build data, Predictor data, and Independent evaluation data). Each of these sections refers to associated tables in the supplemental materials containing the details about the observations the reviewer seeks: Table S1 describes the observations and their preparation for this analysis in some detail, while Table S2 describes the sources of the predictor data. We welcome the reviewer’s comments on the sufficiency of these tables.

We thank the reviewer for prompting this valuable re-organization.

[44] lines 259-260: But how was the weighting done: more weight to higher prevalence? Need explicit methods descriptions.

We have simplified the paragraph to focus on the need for weighting and explicitly described how the class weights were calculated.
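
For illustration, a minimal sketch of inverse-prevalence class weighting for a random forest classifier (shown here with scikit-learn in Python; the authors' actual implementation is in their GitHub repository and may differ in detail, and the variable names X_train and y_train are hypothetical):

    # Weight each class by the inverse of its prevalence so that rare
    # substrate classes are not swamped by common ones during training.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def inverse_prevalence_weights(y):
        classes, counts = np.unique(y, return_counts=True)
        prevalence = counts / counts.sum()
        return {c: 1.0 / p for c, p in zip(classes, prevalence)}

    # weights = inverse_prevalence_weights(y_train)
    # model = RandomForestClassifier(n_estimators=500, class_weight=weights)
    # model.fit(X_train, y_train)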

[45] lines 270-272: There is not enough detail on how this partition into training and test partitions was done. For spatial data, the way in which test data are selected can have strong influence on subsequent evaluation metrics. Were the test data selected at random, or in spatial bands, or by a more sophisticated spatially disaggregated method? Also, the wording here and later implies that only one iteration of each model was generated, all using the same partition of training and test data. If that is what was done, explanation is needed as to why k-fold cross validation (multiple iterations of each model, each iteration using a different split of the input data between training and test) was not conducted.

We have reorganized this section so that the data partitioning is more clearly described, as is the use of the independent data. The reviewer’s questions regarding model validation are now addressed immediately after this paragraph, in the model evaluation section (see response to comment [49] below).
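
For readers weighing the reviewer's question, a minimal sketch of the stratified k-fold cross-validation alternative the reviewer describes (illustrative only, assuming scikit-learn and NumPy arrays X and y; this is not the authors' code):

    # Fit one model per fold and score it on the held-out fold, so
    # every observation is used for testing exactly once.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import balanced_accuracy_score
    from sklearn.model_selection import StratifiedKFold

    def kfold_scores(X, y, k=5):
        scores = []
        folds = StratifiedKFold(n_splits=k, shuffle=True, random_state=42)
        for train_idx, test_idx in folds.split(X, y):
            model = RandomForestClassifier(n_estimators=500)
            model.fit(X[train_idx], y[train_idx])
            scores.append(balanced_accuracy_score(
                y[test_idx], model.predict(X[test_idx])))
        return scores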

[46] line 275 and onwards: “Addressing our objectives”. Why is this a subsection in the Methods? Too much of the text here should really be in the Introduction or Discussion, not here in the Methods.

We agree with the reviewer’s assessment and have moved much of this material to the Introduction, where it now provides additional background on each of our stated objectives. This allowed us to consolidate several redundant paragraphs. We have renamed this much-shortened section ‘Model development and comparisons’.

[47] line 280: “We weighted classes according to their prevalence”. Again, how? Weighted up, or down, and by what proportion?

We now explicitly describe class weighting in the new Model development section.

[48] line 285-286: The input variables of this study, both response and predictor, are relatively very simple, representing primarily (entirely?) physical factors. I am not convinced by the argument that the physical process factors here should be expected to be non-stationary. I suspect differences in the density (‘prevalence’) and reliability of input response and predictor data will have a more important influence on outcomes than non-stationarity of processes.

It’s unclear to us why the nature of the predictor and response variables (simple and physical) is relevant to the question of process stationarity. Nevertheless, we removed this paragraph as part of our response to reviewer’s comment [46] above.

We also address the part of this comment regarding the density of points with our response to comment [54] below, with the addition of a new supplemental figure showing the random distribution of the build data. And while the predictor data sets are likely to contain artefacts, as we discuss, there is no reason to think these could be responsible for the results we found in terms of regional responses, which we draw on for our conclusions about model stationarity.

Further information is provided in response to comment [57] below. No changes.

[49] line 306 and onwards: “Model evaluation”. Again, by my reading of it, far too much wordage that should be (or already is) covered in the introduction or discussion. I might be jaded but much of this reads like rehashed material from textbooks. The point is, however, that I am used to working with these kinds of data and this kind of modelling method, and the further I read here, the more I find myself confused as to what was done and why.

This section is central to the comprehensive assessment of model performance and as such forms a critical part of this contribution. While we understand and appreciate the reviewer’s perspective that this information is not novel, we believe we have synthesized this important issue in a way that makes it more accessible to less experienced practitioners.

However, we have carefully reviewed and re-written the section for clarity, removing redundancies and moving some text up to the introduction.
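
As one concrete example of the kind of interpretable, class-based metric this section curates, the true skill statistic (TSS) can be computed per class from a multi-class confusion matrix in one-vs-rest fashion. The sketch below is illustrative only (Python with scikit-learn; the paper reports a broader suite of metrics, and this is not the authors' code):

    # TSS = sensitivity + specificity - 1, computed one-vs-rest for
    # each substrate class from the confusion matrix (rows = observed,
    # columns = predicted).
    from sklearn.metrics import confusion_matrix

    def per_class_tss(y_true, y_pred, labels):
        cm = confusion_matrix(y_true, y_pred, labels=labels)
        tss = {}
        for i, label in enumerate(labels):
            tp = cm[i, i]
            fn = cm[i, :].sum() - tp
            fp = cm[:, i].sum() - tp
            tn = cm.sum() - tp - fn - fp
            sens = tp / (tp + fn) if (tp + fn) else 0.0
            spec = tn / (tn + fp) if (tn + fp) else 0.0
            tss[label] = sens + spec - 1.0
        return tss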

[50] line 347: “Model build data”. At last! But there is no detail given about the spatial distribution of these data. For interpretation of the results, I would argue that it is essential to show the reader maps showing how these input data are distributed in space.

We have moved the section on build data near the top of the methods section, and added a supplemental figure showing the spatial distribution of the sampling data.

[51] line 379 and onwards: “Independent evaluation data”. How did you decide on which data to include in the ‘build’ set and which in the ‘independent’ set? Both sets include DFO Dive and ROV data, so how do these differ from the cross-validation test data withheld from the training dataset? If the two set of data are actually just arbitrary subsets from the same sources, the independence of the ‘independent’ dataset would be questionable. Again, needs clearer explanation of basic details.

We have clarified that while the build and independent evaluation data were collected using similar methods, they were collected at different times, often by different observers, and for a different purpose.

[52] As with the methods, I find the sequence of sections here [Results] to be unintuitive, and the content to mix results with discussion material.

We reviewed the Results section and moved all discussion extending beyond one sentence to the Discussion section. Individual sentences interpreting the results were retained to allow us to convey the reasons for the results more clearly.

[53] lines 404-407: This paragraph is discussion material.

We have moved the paragraph to the beginning of the discussion section.

[54] line 533: Ah ha. Here, at last, we have more detail about the independent data but still, I would say, not enough to assess their utility. For instance, N = nearly 2,500 ROV ‘mud’ observations for the coastwide model domain but if each observation represents records of substrate type at 20 m intervals along seabed transects each one of which might be one or more km long (at 50 records per km), these data are likely to be strongly clumped in space. If you have not taken measures to account for this spatial clumping of the data, the resulting metrics of performance are likely to be unreliable and probably inflated. We need to see how these records are distributed in space to be able to assess whether the results are useful or not.

The reviewer is correct that transect sampling can lead to pseudo-replication and patterning at the scale of metres. Our description (in Table S1) of how the observations were collected and prepared for this analysis includes details of how we aggregated the various transect data to address this concern, which is already partially mitigated by the resolutions we used in our analysis. We also note that because our analysis is largely based on the relative comparison of different models, the question of inflated performance metrics is not relevant.

To address the question of sampling distribution across the study area (as distinct from the pseudo-replication concern discussed above) we have added a supplemental figure (now S1) showing the spatial distribution of the sampling across the study area.
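
One common way to perform the kind of transect aggregation described in Table S1 is to snap observations to the analysis grid and keep the modal class per cell. A sketch under assumed column names (x, y, substrate); the authors' actual data preparation is documented in Table S1:

    # Collapse clumped transect points to one modal observation per
    # raster cell, reducing spatial pseudo-replication along transects.
    import numpy as np
    import pandas as pd

    def aggregate_to_cells(df, cell_size=100.0):
        df = df.assign(cell_x=np.floor(df["x"] / cell_size),
                       cell_y=np.floor(df["y"] / cell_size))
        return (df.groupby(["cell_x", "cell_y"])["substrate"]
                  .agg(lambda s: s.mode().iloc[0])
                  .reset_index())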

[55] Results in general: I would have found if much more useful and interpretable to have included both cross-validation and independent test scores in the same table, simplified down to just one or two example metrics: all the rest could go into the Supplementary Material.

While we appreciate that the volume of results presented is significant, our goal is not only to present the resulting substrate maps, but to examine ways in which the reliability of these maps can be assessed – something not previously considered in predictive models of substrate. In our view, this is essential to advance the usefulness of such models. This goal would be undermined by reducing the metrics presented to just one or two.

We suggest that a comparison of cross-validation (Table 4) and independent data evaluation (Table 5) can be easily achieved by comparing individual columns in these two tables. No changes made.

[56] Also, a question: where can the final map outputs be found? If the aim was to generate mapped predictions for use in environmental management, the outputs need to be accessible.

The maps will be available as georeferenced TIF files. We believe this will have higher utility to potential users than figures. We have, however, added the striking figure (showing the coastwide, 100 m classification) as an example of the results in the supplemental materials (Figure S5).
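
For end users, loading such a georeferenced TIF is straightforward; a minimal sketch using the rasterio Python library (the file name here is hypothetical):

    import rasterio

    # Read the classified substrate raster; band 1 holds the class codes.
    with rasterio.open("substrate_100m.tif") as src:
        classes = src.read(1)
        print(src.crs, src.res)  # coordinate system and cell size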

[57] I found this section [Discussion] to read better than the others. I have not made detailed notes but I would make the same observation about the inferences around stationarity: given the imbalances in the spatial distribution and provenance of your sample and test data, can you really be sure that the differences in model performance you see among regions is attributable to non-stationarity in environmental process rather than artefacts in your input data?

First, there are no imbalances in the regional distribution or provenance of the build data used to develop or test the models (see the new Figure S1 and the relevant supplementary tables on data provenance). And while there are differences in sample density across depths, the models are not depth-specific, so these differences cannot be causing any such artefacts in the resulting models.

In our view, the evidence for non-stationarity is overwhelming – starting with Fig. 2, which shows the differences in model performance across classes. If the same processes were at play in all regions, we would expect much more similar results across the regional models. This is supported by the highly variable importance of the different predictors in each region (Fig. 5). We note that this part of the analysis uses a coastwide data set, with no difference in spatial pattern across the regions (new Fig. S1).

We suggest that this result is not, in fact, surprising given that, as we describe in our Introduction and elsewhere, it is now increasingly clear that stationarity is more the exception than the rule. No changes made.

Attachment

Submitted filename: Substrate_PLOS_ReviewerResponse v3.docx

Decision Letter 1

Judi Hewitt

9 Aug 2021

PONE-D-20-40989R1

Comprehensive marine substrate classification applied to Canada’s Pacific shelf

PLOS ONE

Dear Dr. Gregr,

Thank you for submitting your manuscript to PLOS ONE. It is obvious that you have met most of the reviewers' suggestions. A final review does suggest some ways in which the manuscript could be improved, which we would like you to consider.

Please submit your revised manuscript by Sep 23 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Judi Hewitt

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: (No Response)

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: Thank you for addressing my earlier review comments and questions. I find the revised manuscript to have a much clearer logical flow and makes an interesting study more understandable. I still find the Introduction to be unnecessarily long, containing some material that I suggest could readily be condensed by referencing existing studies. The length of the Introduction is probably something for the editor to decide at this point but there are also a few points I find rather condescending, as currently worded. For instance, in the 'Model performance' section, I find dismissive generalisations such as "... metrics have evolved little ... with most studies continuing to report Cohen's Kappa ...", "adoption of improved metrics has been glacial ...", and "There is a persistent misconception about how to interpret model performance..." to be overly generalised (I am not a geologist but Kappa is very rarely used in relation to predictive model performance in the literature I am familiar with), didactic, and unnecessary.

I also still find the presentation of so many performance metrics to be more confusing than useful, for the most part. Indeed, while trying to interpret the results here I reflected that this illustrates one very good reason why a more refined set of metrics is "commonly provided" in most published studies; practicality of interpretation. With the slightly revised focus in the title, however, (i.e., the method taking priority over the application) there is an argument for inclusion of more metrics.

A few minor comments:

116-117: As worded here, I don't see why it would follow that "higher resolution models would perform better in shallow waters" (which I interpreted as meaning that a high resolution model would work better in shallow water than it would in deeper water). I would guess the intended meaning might be worded as "higher resolution models would perform better than coarser resolution models in shallower waters."?

Line 226: "class weights" is unexplained, as yet, and therefore uninterpretable here.

Table 2: first column rows are not aligned with others? And "DEMs" is not defined.

Line 263: "cross-walked" is a term I've not seen before and only makes sense once you go to Table S1.

Table 3: Caption does not include "imbalance".

Line 474: Why "not unexpectedly"? If we think our models perform well, why would we not 'expect' them to perform equally well against independent data. Suggest there is no need for this in the sentence and, if retained, the expectation should be supported by references (there are a few recent papers on this subject).

Stationarity section: Now the input data are more fully described, particularly with the sample distribution map figure, this argument is better supported.

lines 590 onwards: Yes, I strongly agree with the points made in this section.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Decision Letter 2

Judi Hewitt

14 Oct 2021

Comprehensive marine substrate classification applied to Canada’s Pacific shelf

PONE-D-20-40989R2

Dear Dr. Gregr,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Judi Hewitt

Academic Editor

PLOS ONE

Acceptance letter

Judi Hewitt

21 Oct 2021

PONE-D-20-40989R2

Comprehensive marine substrate classification applied to Canada’s Pacific shelf

Dear Dr. Gregr:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Judi Hewitt

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File

    (DOCX)

    Attachment

    Submitted filename: PONE-D-20-40989_reviewer.pdf

    Attachment

    Submitted filename: Substrate_PLOS_ReviewerResponse v3.docx

    Attachment

    Submitted filename: substrate_paper_ReviewerResponse_r2.docx

    Data Availability Statement

    The data and associated project code have been uploaded to GitHub. The URL is https://github.com/ejgregr/substrate_model.

