PLOS One. 2021 Aug 11;16(8):e0255119. doi: 10.1371/journal.pone.0255119

Assessing the geographic specificity of pH prediction by classification and regression trees

Jacob Egelberg 1,*, Nina Pena 2, Rachel Rivera 2, Christina Andruk 3
Editor: João Canário4
PMCID: PMC8357141  PMID: 34379630

Abstract

Soil pH affects a wide range of critical biogeochemical processes that dictate plant growth and diversity. Previous literature has established the capacity of classification and regression trees (CARTs) to predict soil pH, but limitations of CARTs in this context have not been fully explored. The current study collected soil pH, climatic, and topographic data from 100 locations across New York’s Temperate Deciduous Forests (in the United States of America) to investigate the extrapolative capacity of a previously developed CART model as compared to novel CART and random forest (RF) models. Results showed that the previously developed CART underperformed in terms of predictive accuracy (RRMSE = 14.53%) when compared to a novel tree (RRMSE = 9.33%), and that a novel random forest outperformed both models (RRMSE = 8.88%), though its predictions did not differ significantly from the novel tree (p = 0.26). The most important predictors for model construction were climatic factors. These findings confirm existing reports that CART models are constrained by the spatial autocorrelation of geographic data and encourage the restricted application of relevant machine learning models to regions from which training data was collected. They also contradict previous literature implying that random forests should meaningfully boost the predictive accuracy of CARTs in the context of soil pH.

Introduction

Soil pH mediates provisioning and regulating service availability

Soil pH, the concentration of hydrogen ions in a sample of soil, affects many critical biogeochemical processes. These biogeochemical processes drive plant growth and diversity, which are supporting ecosystem services that sustain provisioning services (food, water, lumber, and fuel output) and regulating services (air quality, climate, erosion, and water purification) [1, 2].

Soil pH affects the metabolic quotient qCO2, which measures organic substrate uptake by soil microbes [3]. In low pH soils, higher metabolic quotients indicate increased substrate utilization by microbes (due to more costly maintenance of internal pH) and reduced carbon availability for plant growth [4, 5]. Conversely, more neutral pH soils experience less substrate depletion and are characterized by higher plant biomass [4]. Plant biomass is further impacted by the pH-dependent leaching of dissolved organic carbon (DOC) and dissolved organic nitrogen (DON) [6, 7]. DOC and DON influence growth elements such as soil nutrient retention and turnover, soil structure, moisture retention and availability, degradation of pollutants, carbon sequestration, and soil resilience [8]. Soil pH also affects the growth of microbial and fungal communities that regenerate plant-limiting soil nutrients, as well as the efficiency of extracellular microbial enzymes [9–14].

Factors affecting soil pH

Soil pH is moderated by an array of ecological factors that are defined by local climates and topographies. Climatic factors include annual temperature range, warmest quarter precipitation, average annual precipitation, and average annual temperature [15, 16]. Topographic factors include elevation, slope, topographic wetness index, valley depth, channel network, ruggedness, aspect, moisture, silt content, carbon contents, plan curvature, profile curvature, stream power index, length-slope factor, volumetric water content, and parent materials [15–20].

The relevance of these factors to soil pH varies by geographic region. In southwestern China, differences in elevation explain at most 0.02% (R² = 0.0002) of the variance in soil pH [15], whereas they explain up to 25% (R² = 0.25) on the Tibetan Plateau [17] and 9% (R² = 0.09) in the Tiantong National Forest [19]. Similarly, slope in southwestern China can explain only 2% (R² = 0.02) of fluctuations in soil pH, whereas in northeast Colorado it accounts for approximately 50% (R² = ~0.5) of fluctuations in pH [15, 20]. Similar disparities are observable for every climatic and topographic factor.

pH prediction by machine learning

Because of the major role soil pH plays in shaping plant growth and diversity, an awareness of soil pH is crucial to determine the potential benefits of ecosystem provisioning and regulating services. In this respect, a model capable of accurately predicting soil pH from easily accessible data would provide a much-needed scientific tool for later studies. This model could be utilized for improving current understandings of environmental homeostasis and the identification of ecosystems rich with resources required for agricultural production.

Due to the multivariable and nonlinear factors influencing soil pH, recent research has turned to machine learning techniques for developing pH prediction models [21–24]. Specifically, classification and regression trees (CARTs) have been applied to ecological datasets [15, 22, 25–27]. CARTs are decision trees that utilize a series of (typically binary) data splits to predict categories or values [28].
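To make the idea of a binary data split concrete, the following is a minimal sketch of how a single regression split can be chosen by minimizing the summed squared error of the two resulting groups. It is an illustration only and is not the procedure of any particular package: the function names (`sse`, `best_split`) and the toy precipitation and pH values are hypothetical, and real implementations add steps such as pruning and surrogate splits.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y, minsplit=20):
    """Return (threshold, sse_after_split) for the best binary split of y on x.

    Returns None when the node holds fewer than `minsplit` observations
    or when x has no two distinct values to split between.
    """
    if len(y) < minsplit:
        return None
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # cannot place a threshold between identical values
        threshold = (lo + hi) / 2
        left = [yy for _, yy in pairs[:i]]
        right = [yy for _, yy in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical toy data: precipitation (mm) and soil pH, not from the study.
toy_precip = [900, 950, 1000, 1050, 1200, 1250]
toy_ph = [6.5, 6.5, 6.5, 6.5, 5.5, 5.5]
split = best_split(toy_precip, toy_ph, minsplit=2)
```

In this toy case the chosen threshold falls midway between 1050 and 1200 because that split separates the two pH groups perfectly. With the default `minsplit=20`, the same six observations would not be split at all.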

However, traditional limitations of CARTs, such as overfitting, have not been explored in the context of pH prediction. Neither have random forests (RFs), which are known to improve upon the predictive capacity of CART models [21]. RFs comprise many discrete CARTs and improve predictions by averaging those of the individual regression trees. Further, RFs counter overfitting by merging traditional bootstrapping with elements of randomization during tree construction [22, 24, 29].

Aim of the current study

We hypothesized that (a) Zhang et al.’s CART model would exhibit geographic specificity as a result of overfitting and (b) pH prediction by a novel RF would be more accurate than pH prediction by a novel CART. Regarding (b), we expected the accuracy of a novel CART’s pH predictions to approximate that of Zhang et al.’s CART model in their region of study and exhibit a similar % RRMSE of 6.9.

Our study objectives were to (a) test for geographic specificity by applying a CART model that was developed by Zhang et al. [15] in southwestern China’s humid subtropical hilly regions to data from New York’s Temperate Deciduous Forests, (b) compare the usefulness of CART and RF models at predicting soil pH, and (c) better understand the climatic and topographic factors that influence soil pH at different sites.

Materials and methods

Study area

The study area was located in New York State in the United States of America and spanned approximately 65,000 km² across four forested New York State subregions: Hudson Valley, Saratoga, Central Leatherstocking, and Finger Lakes (Fig 1). We randomly selected 25 state parks within the study area. Soil samples were collected at four random locations within each park, resulting in 100 total pH measurements. Locations for soil pH testing were determined by partitioning each park into 10 equally sized subsections and selecting four at random [30]. Soil pH across the study area ranged between five and seven (S2 File).
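The selection scheme described above (10 subsections per park, four drawn at random without replacement) can be sketched as follows. This is an illustrative sketch, not the procedure the authors ran: the park names, the seed, and the function name `choose_sampling_subsections` are placeholders.

```python
import random


def choose_sampling_subsections(park_names, n_subsections=10, n_samples=4, seed=0):
    """Pick `n_samples` distinct subsections (numbered 1..n_subsections) per park."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return {
        park: sorted(rng.sample(range(1, n_subsections + 1), n_samples))
        for park in park_names
    }


# Placeholder names standing in for the 25 state parks.
parks = [f"park_{i:02d}" for i in range(1, 26)]
plan = choose_sampling_subsections(parks)
total_locations = sum(len(subsections) for subsections in plan.values())
```

With 25 parks and four subsections each, the plan yields the 100 sampling locations reported in the text.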

Fig 1. Study area.

Points indicate state parks where samples were collected. Republished from [31] under a CC BY license, with permission from ZeeMaps, original copyright 2005.

Soil pH testing

For pH measurement, approximately 100 g of surface soil was collected from each sampling location and diluted with 130 mL of deionized H2O. The mixture was vigorously inverted and rotated before being left to sit for 10 minutes until solutes sufficiently dissolved. pH was assessed with four-squared plastic pH test strips [32]. This protocol was adapted from [33].

Plastic pH test strips were an appropriate tool for pH measurement in the current context. Previous literature reports that, in moderately acidic solutions, four-squared pH test strips exhibit positive predictive and negative predictive values greater than 95% and exceed 90% sensitivity and specificity [34]. Soil pH values observed in this study fall within this moderately acidic range, reaching five at the lowest and seven at the highest (S2 File).

Topographic and climatic data

Topographic and climatic data for pH testing locations were gathered from gridded 90 m × 90 m Digital Elevation Models (DEM) and the WorldClim database, respectively, using the System for Automated Geoscientific Analysis (SAGA) version 7.6.2 [35–38]. Computational restrictions required that model grids be clipped to +0.001° and -0.001° of their original size in the longitudinal and latitudinal directions prior to basic terrain analysis of sampling locations. Topographic factors analyzed include elevation, slope, topographic wetness index (TWI), valley depth, channel network, and terrain ruggedness index (TRI). Climatic parameters analyzed include annual temperature range (ATR), precipitation of the warmest quarter (PWQ), mean annual precipitation (MAP), and mean annual temperature (MAT). Factors were selected according to those in Zhang et al. [15].

Model parameters

Regression trees were generated with the R package ‘rpart’ [39, 40]. To counter overfitting, trees were pruned with 10-fold cross validation (xval) and data partitioning was ceased for nodes containing fewer than 20 observations (minsplit). Random forests were generated with the R package ‘randomForest’ [39, 41]. In accordance with existing literature [42], the following parameters were set for model optimization: 1000 regression trees comprised the random forest (ntree); a minimum of 1 observation (nodesize) in each node was required for data splits; a maximum of 36 nodes (maxnodes) was permitted to constitute the forest (maxnodes = Quantity of Observations / 3); the quantity of predictors randomly sampled from during tree construction (mtry) was tuned to minimize the model MSE; and the seed was generated randomly (seed = 27137). R code is available on GitHub [43].

Statistical analysis

Spearman correlation tests were performed to determine the strength of monotonic associations between pH and each predictor [44]. P values produced by Spearman correlations were Bonferroni-corrected to adjust for multiple hypothesis testing [45, 46] and set the global type I error rate at 0.05 (α = 0.05). A correlation was considered strong if |ρ|>0.7, moderate if 0.5<|ρ|<0.7, weak if 0.3<|ρ|<0.5, and nonexistent or very weak if 0<|ρ|<0.3 [47]. Two-tailed Wilcoxon Rank Sum tests were employed to quantify statistically significant differences (α = 0.05) between predicted pH values and measured pH values [48, 49]. Relative Root Mean Square Error (RRMSE) values were used to compare CART and RF model accuracies. RRMSE values were calculated as the square root of the average squared difference between actual pH values (y) and predicted pH values (ŷ), divided by the average of the actual pH values (ȳ) [1]. RRMSE was interpreted according to previous literature [50]. Specifically, prediction accuracy was considered “excellent” when RRMSE<10%, “good” when 10%<RRMSE<20%, “fair” when 20%<RRMSE<30%, and “poor” when 30%<RRMSE.

$$\mathrm{RRMSE} = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}}{\bar{y}}$$ [1]
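The RRMSE definition and its interpretation bands can be sketched in a few lines. This is a minimal sketch rather than the authors' R code: the function names are hypothetical, and because the text states the bands with strict inequalities, assigning the exact boundary values (10%, 20%, 30%) to the higher-error band is an assumption made here.

```python
from math import sqrt


def rrmse(actual, predicted):
    """Root mean square error divided by the mean of the actual values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")
    n = len(actual)
    rmse = sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    return rmse / (sum(actual) / n)


def accuracy_label(rrmse_percent):
    """Map an RRMSE expressed in percent to the bands quoted in the text.

    Boundary values are assigned to the higher-error band (an assumption).
    """
    if rrmse_percent < 10:
        return "excellent"
    if rrmse_percent < 20:
        return "good"
    if rrmse_percent < 30:
        return "fair"
    return "poor"
```

Multiplying the return value of `rrmse` by 100 gives the percentage form used when results are reported, so that, for example, a value of 14.53 falls in the "good" band and 9.33 in the "excellent" band.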

Topographic and climatic variable importance to pH prediction was measured by CART Variable Importance (CVI) for CARTs and % IncMSE for RFs. CVI was assessed for a predictor by summing the goodness of split (GOS) measures for each split for which it was the primary predictor (PP) and goodness of split multiplied by adjusted agreement (AA) for each split for which it was a surrogate predictor (SP) [2] [51]. Percent IncMSE was calculated for each predictor by subtracting MSE0 from MSEj, dividing by MSE0, and multiplying by 100 [3], where MSE0 is the MSE of the RF model and MSEj is the MSE of the RF model after the random permuting of predictor values [52].

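$$\mathrm{CART\ Variable\ Importance} = \sum \mathrm{GOS}_{PP} + \sum \left(\mathrm{GOS}_{SP} \times AA\right)$$ [2]
$$\%\mathrm{IncMSE} = \frac{\mathrm{MSE}_j - \mathrm{MSE}_0}{\mathrm{MSE}_0} \times 100$$ [3]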

Larger relative CVI and % IncMSE values indicate greater variable importance to pH prediction.
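The % IncMSE calculation can be illustrated with a short sketch that follows the formula as stated in the text: permute one predictor, recompute the MSE, and express the change relative to the unpermuted MSE. This is not a reproduction of the internals of the ‘randomForest’ package; the function names, the dictionary-based row format, and the seed are assumptions made for illustration.

```python
import random


def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def percent_inc_mse(mse_0, mse_j):
    """Percent increase of the permuted MSE (mse_j) over the baseline MSE (mse_0)."""
    return (mse_j - mse_0) / mse_0 * 100


def permuted_mse(predict, rows, actual, column, seed=0):
    """MSE of `predict` after randomly permuting one predictor column.

    `rows` is a list of dicts mapping predictor names to values, and
    `predict` is any function that maps one such dict to a prediction.
    """
    rng = random.Random(seed)
    shuffled = [row[column] for row in rows]
    rng.shuffle(shuffled)
    permuted_rows = [dict(row, **{column: value}) for row, value in zip(rows, shuffled)]
    return mse(actual, [predict(row) for row in permuted_rows])
```

A predictor that the model never uses yields a % IncMSE of zero, since permuting it leaves every prediction unchanged. Values near or below zero, such as the one reported for valley depth, therefore suggest a predictor that contributes little.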

Model generation and statistical analysis were conducted in R version 4.0.0 [39]. Predictions by Zhang et al.’s CART model were calculated manually in Microsoft Excel.

Results

Descriptive statistics

Sampling location elevation ranged from 2m above sea level to 547m and averaged 190.5m, with land sloping between 0° and 44.7°. Precipitation and temperature measurements were similarly variable; precipitation measured between 870mm and 1292mm and temperature between 4.4°C and 11.36°C. Additional statistics are available in Table 1 and S2 File.

Table 1. Descriptive statistics for topographic and climatic factors across study area.

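| Category | Factor | Minimum | Maximum | Mean | Standard Deviation |
|---|---|---|---|---|---|
| Topographic | Elevation | 2 | 547 | 190.5 | 140.2 |
| Topographic | Slope | 0 | 44.7 | 5.8 | 5.9 |
| Topographic | TWI | 4.46 | 11.34 | 7.6 | 1.1 |
| Topographic | Valley depth | 0 | 52 | 7.6 | 9.6 |
| Topographic | Channel network | 0 | 546.48 | 182.1 | 140.5 |
| Topographic | TRI | 0 | 46.77 | 6.7 | 6.1 |
| Climatic | ATR | 35.1 | 41 | 36.8 | 1.2 |
| Climatic | PWQ | 256 | 335 | 295.7 | 19.7 |
| Climatic | MAP | 870 | 1292 | 1090.6 | 111.7 |
| Climatic | MAT | 4.4 | 11.36 | 8.8 | 1.7 |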

Abbreviations defined as follows: topographic wetness index (TWI), terrain ruggedness index (TRI), annual temperature range (ATR), precipitation of the warmest quarter (PWQ), mean annual precipitation (MAP), and mean annual temperature (MAT). Reference S1 File for a definition of each factor.

Spearman correlation tests provided the strength of the monotonic association between climatic and topographic variables and soil pH (Table 2). Bonferroni-adjusted p values were referenced to determine significance. There were moderate, significant, and negative associations between Mean Annual Precipitation (MAP) and soil pH (ρ = -0.51; p<0.001), and Precipitation of the Warmest Quarter (PWQ) and soil pH (ρ = -0.51; p<0.001). Weak significant associations were observed between soil pH and Slope (ρ = -0.31; p = 0.017) and soil pH and Terrain Ruggedness Index (TRI) (ρ = -0.33; p = 0.008). All other associations were nonexistent or very weak.

Table 2. Spearman correlations between factors and soil pH.

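| Category | Factor | ρ | p value | Adjusted p value |
|---|---|---|---|---|
| Topographic | Elevation | 0.2110 | 0.0351 | 0.3511 |
| Topographic | Slope | -0.3096 | 0.0017 | 0.0172* |
| Topographic | TWI | 0.2943 | 0.0029 | 0.0295* |
| Topographic | Valley depth | -0.1644 | 0.1020 | 1.0000 |
| Topographic | Channel network | 0.2224 | 0.0262 | 0.2617 |
| Topographic | TRI | -0.3296 | 0.0008 | 0.0081* |
| Climatic | ATR | -0.1864 | 0.0633 | 0.6335 |
| Climatic | PWQ | -0.5081 | 0.0000 | 0.0000* |
| Climatic | MAP | -0.5052 | 0.0000 | 0.0000* |
| Climatic | MAT | -0.2430 | 0.0149 | 0.1486 |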

Abbreviations defined as follows: topographic wetness index (TWI), terrain ruggedness index (TRI), annual temperature range (ATR), precipitation of the warmest quarter (PWQ), mean annual precipitation (MAP), and mean annual temperature (MAT).

*Statistically significant. Adjusted p values are Bonferroni-corrected.
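The Bonferroni adjustment and the correlation-strength bands can be checked against Table 2 with a short sketch. The ρ and raw p values below are copied from the table as printed (four decimals), so the adjusted values agree with the published column only up to rounding. The function names are hypothetical, and the assignment of exact boundary values (0.3, 0.5, 0.7) to the lower band is an assumption.

```python
def bonferroni(p_values):
    """Multiply each p value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def correlation_strength(rho):
    """Label |rho| using the bands quoted in the statistical analysis section."""
    r = abs(rho)
    if r > 0.7:
        return "strong"
    if r > 0.5:
        return "moderate"
    if r > 0.3:
        return "weak"
    return "nonexistent or very weak"


# (rho, raw p value) per factor, as printed in Table 2.
table2 = {
    "Elevation": (0.2110, 0.0351),
    "Slope": (-0.3096, 0.0017),
    "TWI": (0.2943, 0.0029),
    "Valley depth": (-0.1644, 0.1020),
    "Channel network": (0.2224, 0.0262),
    "TRI": (-0.3296, 0.0008),
    "ATR": (-0.1864, 0.0633),
    "PWQ": (-0.5081, 0.0000),
    "MAP": (-0.5052, 0.0000),
    "MAT": (-0.2430, 0.0149),
}

adjusted = dict(zip(table2, bonferroni([p for _, p in table2.values()])))
significant = [name for name, p in adjusted.items() if p < 0.05]
strengths = {name: correlation_strength(rho) for name, (rho, _) in table2.items()}
```

Under these assumptions the significant factors are Slope, TWI, TRI, PWQ, and MAP, matching the asterisks in the table. TWI is significant yet falls in the "nonexistent or very weak" band because its |ρ| is below 0.3.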

Zhang et al.’s pH prediction CART model

In 2019, Zhang et al. developed a pH prediction CART model with topographic and climatic data collected from a hilly region of southwestern China [15]. The current study adapted Zhang et al.’s model to data collected within New York’s Temperate Deciduous Forests to assess the geographic specificity of its pH predictions.

Zhang et al.’s model uniformly predicted New York Temperate Deciduous Forest soil pH to be 5.43. For all sampling locations, factor values determinative of predicted pH in Zhang et al.’s model measured consistently below or above its split cutoff values, despite variation within factor data. At every location, ATR exceeded 28.85 degrees Celsius, elevation fell below 1297 meters, and channel network values were below 867.38. Nevertheless, Zhang et al.’s CART was ‘good’ at predicting Temperate Deciduous Forest soil pH (% RRMSE = 14.53), although its predictions differed significantly from observed pH (p<0.001) and it experienced an increase in error relative to its estimations in southwestern China (% RRMSE = 14.53 as compared to % RRMSE = 6.9).
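The reason for the uniform prediction can be verified directly from the cutoffs quoted above and the minimum and maximum values in Table 1. The sketch below does not reproduce the structure of Zhang et al.’s tree, which is not given here. It only checks whether the New York data fall entirely on one side of each quoted cutoff. The function name and the convention that values equal to a cutoff count as "above" are assumptions.

```python
# Split cutoffs quoted in the text for Zhang et al.'s tree.
cutoffs = {"ATR": 28.85, "Elevation": 1297, "Channel network": 867.38}

# (minimum, maximum) observed in the New York data, from Table 1.
observed_range = {
    "ATR": (35.1, 41),
    "Elevation": (2, 547),
    "Channel network": (0, 546.48),
}


def side_of_cutoff(lo, hi, cutoff):
    """Report whether the whole observed range lies on one side of a cutoff."""
    if lo >= cutoff:
        return "above"
    if hi < cutoff:
        return "below"
    return "straddles"


sides = {
    name: side_of_cutoff(*observed_range[name], cutoffs[name])
    for name in cutoffs
}
```

Because no range straddles its cutoff, every observation follows the same path through these splits and reaches the same leaf, which is consistent with the single predicted value of 5.43.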

Novel pH prediction models and factor importances

For alternative Temperate Deciduous Forest pH estimations, novel CART and RF models were generated. The novel CART model (Fig 2) ‘excellently’ predicted soil pH (% RRMSE = 9.33) and predictions did not differ significantly from actual data (p = 0.77).

Fig 2. Novel CART model predicting Temperate Deciduous Forest soil pH.

See Table 1 for descriptions of all variables.

The novel RF model also ‘excellently’ predicted Temperate Deciduous Forest soil pH (Fig 3). The RF model’s pH estimations yielded an RRMSE approximately 5% lower, in relative terms, than that of the CART model (% RRMSE = 8.88 as compared to % RRMSE = 9.33) and its predictions did not differ significantly from observed pH (p = 0.07) or CART predictions (p = 0.26).
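As a check on the reported difference between the two models, the relative reduction in RRMSE works out as follows from the two values given in the text:

```latex
\frac{9.33 - 8.88}{9.33} = \frac{0.45}{9.33} \approx 0.048 \approx 5\%
```

The 5% figure is thus a relative reduction; in absolute terms the two RRMSE values differ by 0.45 percentage points.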

Fig 3. Predicted pH by the random forest for all sampling locations.

Republished from [31] under a CC BY license, with permission from ZeeMaps, original copyright 2005.

Plotting the mean squared error (MSE) of the RF model as individual decision trees were recursively added to the ’forest’ demonstrates that n = 1000 trees was sufficient to minimize model MSE (Fig 4).

Fig 4. Random forest model MSE by number of trees in the model (31).

For both novel models, climatic factors were more important to model construction than topographic factors. The two most important factors for CART and RF model construction were mean annual temperature (CVI = 7.5, % IncMSE = 0.092) and mean annual precipitation (CVI = 7.21, % IncMSE = 0.052). The fourth most important factor for each model was also a climatic factor, precipitation of the warmest quarter (CVI = 4.8, % IncMSE = 0.041). Additional factor importances are available in Table 3.

Table 3. Factor importances to CART and RF model construction.

| Factor | CVI | CVI rank | % IncMSE | % IncMSE rank |
|---|---|---|---|---|
| MAT | 7.5 | 1 | 0.092 | 1 |
| MAP | 7.21 | 2 | 0.052 | 2 |
| PWQ | 4.80 | 4 | 0.041 | 4 |
| ATR | 5.31 | 3 | 0.029 | 6 |
| Elevation | 3.95 | 5/6 | 0.045 | 3 |
| Channel network | 3.95 | 5/6 | 0.024 | 5 |
| TRI | 0.42 | 7 | 0.004 | 9 |
| TWI | N/A | N/A | 0.008 | 7 |
| Slope | N/A | N/A | 0.005 | 8 |
| Valley depth | N/A | N/A | -0.001 | 10 |

Abbreviations defined as follows: topographic wetness index (TWI), terrain ruggedness index (TRI), annual temperature range (ATR), precipitation of the warmest quarter (PWQ), mean annual precipitation (MAP), and mean annual temperature (MAT). Reference S1 File for a definition of each factor.

Ranks order variable importance from most important (1) to least important (10).

Discussion

The current study investigated the extrapolative capacity of a previously developed CART model as compared to novel CART and random forest models for soil pH prediction. We found that a model developed with data from southwestern China had higher predictive error when applied in our study region, supporting our first hypothesis regarding geographic specificity of CART models. A random forest model was not significantly more accurate at predicting soil pH than a CART model, disagreeing with our second hypothesis regarding the superiority of random forest models.

Geographic specificity

Previously, Zhang et al. [15] developed a classification and regression tree for pH prediction with data from southwestern China. In the current study, the geographic specificity of Zhang et al.’s model was assessed via its application to data from New York’s Temperate Deciduous Forests. Relying on this data, the model suffered an increase in predictive error from 6.9% RRMSE (in southwestern China) to 14.53% RRMSE (in New York), demonstrating its limited pH prediction ability in an alternative geographic region. When a novel CART model was constructed using Temperate Deciduous Forest data, it predicted pH more accurately than Zhang et al.’s model (% RRMSE = 9.33), demonstrating that CART pH predictions are most accurate in the geographic area from which their training data is sourced. As expected, the novel CART model predicted Temperate Deciduous Forest soil pH with an accuracy approximately equal to that of Zhang et al.’s CART model in southwestern China. The novel CART % RRMSE in New York, 9.3, is nearly equal to Zhang et al.’s CART % RRMSE in China, 6.9.

Our approach of testing geography was novel and necessary to explore the limitations of soil pH model extrapolation. The failure of Zhang et al.’s model to transfer to a novel geographic region may be due to spatial autocorrelation between factors affecting soil pH. Existing literature documents that attributes close to one another in geographic space and time are similar to one another in value [53]. As a result, the distribution of ecological data in a region is different from the distribution in another and predictive models trained in one learn to fit its distribution specifically [54–56]. Research seeking to improve the extrapolative and interpolative abilities of ecological models has turned to accounting for spatial autocorrelation for this reason; specifically, in the fields of biodiversity conservation [57] and pedometrics [58–61]. Future research can apply these concepts to soil pH prediction by further testing the limitations of current models and identifying general rules that increase the likelihood of successful model application to new regions.

CART vs. random forest models

We report that random forest models are not significantly more accurate at predicting soil pH than CART models. This disagrees with previous research identifying weaknesses of CART modelling relative to random forests [21, 55, 56, 62]. CART models are susceptible to overfitting training data and fail to maximize predictive accuracy because of unavoidable skewness in training data. Random forests, on the other hand, not only incorporate randomization and bootstrapping into the construction of individual classification and regression trees, but also aggregate the output of many individual trees. These characteristics are thought to counter overfitting and generally improve accuracy [22, 24, 29, 63–67]. The current study disagrees with these findings in the context of soil pH prediction. We report that random forest models are not meaningfully more accurate at soil pH prediction from topographic and climatic factors than individual classification and regression trees for our study region. The RF % RRMSE, 8.88, is approximately equal to the CART % RRMSE, 9.33, and model predictions did not differ significantly from one another (p = 0.26).

Factors influencing soil pH

The second most important factor for both CART and RF model construction was the climatic factor mean annual precipitation (MAP). MAP demonstrated relatively high and statistically significant correlations with soil pH (ρ = -0.51, p<0.001). Precipitation of the warmest quarter (PWQ) was the fourth most important factor to CART and RF model construction and also exhibited a high and significant correlation with pH (ρ = -0.51, p<0.001).

These findings contrast with those of Zhang et al., who found that annual temperature range (ATR), topographic wetness index (TWI), and Melton ruggedness number were most important to pH. These factors may have been more important in Zhang et al.’s study because they analyzed a hilly region with much greater variability in elevation and slope than in the current study. The standard deviations of elevation and slope observed here were 140 m and 5.9°, less than the 254 m and 8° observed by Zhang et al.

This interpretation is supported by previous literature. While global soil pH is highly influenced by precipitation [68], regional heterogeneities can enhance the influence of other factors. In some regions, south-facing slopes tend to exhibit more basic soil pH than north-facing slopes [69] and in others slope direction alone has no significant relationship to pH [70]. The strength of the association between elevation and pH also varies by region, ranging from r = -0.3 in a broadleaf forest [19] to r = -0.5 in a subtropical rainforest [17]. To this effect, Zhang et al. reported a correlation between soil pH and elevation with r = -0.014 [15]. It is probable that the unique topographical characteristics of Zhang et al.’s studied region, including its varied elevation and slope, are responsible for their CART model’s emphasis on ATR, TWI and Melton ruggedness.

In future model creation, swapping factors that exhibited weak or nonexistent correlations to soil pH and that only minimally contributed to model construction with alternative topographic or climatic factors could improve predictive accuracy [24]. Previous literature has utilized linear regression [71] or Boruta all-relevant variable selection for this purpose [54, 72]. Model construction from the minimal-optimal set of variables reduces overfitting and increases interpretability [54].

Applications

Provisioning and regulating services are the most important ecosystem services for security, the basic materials for a good life, and health [1]. These services are provided for by supporting services, such as plant growth and diversity, that are controlled by soil pH [2]. In this way, the ability to accurately predict global soil pH would expand the scientific knowledge of natural environmental regulation and facilitate an increase in the efficiency of agricultural production. These impacts could improve global living standards by countering climate change and mitigating world hunger.

Regarding climate change, soil carbon-sequestration by plant roots has been proposed as a mechanism for extracting CO2 from the atmosphere [73]. However, the efficiency of root-mediated carbon storage varies by plant species [74] whose optimal growth is influenced by soil pH [75]. In this context, soil pH prediction models could be leveraged to identify ideal growth regions for the large-scale breeding of carbon-sequestering plant species.

Regarding world hunger, carbon-sequestration in soil is also expected to improve crop yields by replenishing historically depleted organic matter that is needed for plant growth [76]. Therefore, pH prediction models could inform agricultural workers about their ability to seed carbon-sequestering plant species on their land, replenish their soil’s carbon content, and improve their crop yields. Further, rice growth is highly pH-dependent [77] and many developing regions rely on rice to provide a large portion of their populations’ average caloric intake [78]. Soil pH prediction models could be used to discern areas naturally conducive to rice growth and improve production. These concepts apply equally to other nourishing plant species.

As shown here, the accuracy of pH prediction, and the ability to leverage pH for combating climate change and world hunger, is influenced by the geographic specificity of predictive models. Therefore, to fully realize the potential of natural soils, our study encourages researchers to restrict the extrapolation of predictive models to the regions from which their training data was sourced.

Taken together, future research should seek to identify a combination of predictors that explain a larger fraction of pH variance, to construct alternative CART and RF pH prediction models for additional geographic regions, and to leverage these improved models for the seeding of carbon-sequestering and nutrient-providing plant species.

Conclusions

Previous literature has established that CART modelling can be applied to soil pH prediction. The current study sought to address the extrapolative capacity of Zhang et al.’s pH prediction CART model in the Temperate Deciduous Forest relative to novel methods of pH prediction in this region. Results indicated that Zhang et al.’s CART model experienced a reduction in predictive accuracy when applied to data from the Temperate Deciduous Forest and was outperformed by novel CART and RF models. We report that pH prediction models are most accurate when applied to their training data’s geographic region, that RF modeling provides no notable advantage over CART modeling in the realm of soil pH prediction, and that climatic factors are useful for model construction.

Supporting information

S1 File. Topographic and climatic factor information.

(XLSX)

S2 File. Topographic and climatic factor and soil pH data.

(XLSX)

Acknowledgments

The authors appreciate the help of Mr. Jeffrey Wuebber who has provided guidance for us throughout our early scientific careers. Thank you to Mr. Egelberg and Mr. and Mrs. Pena for assisting with travel and pH data collection.

Data Availability

All relevant data are within the manuscript and its Supporting Information files.

Funding Statement

The authors received no specific funding for this work.

References

  • 1.Millenium Ecosystem Assesment. Ecosystems and Human Well-being: Synthesis. Washington DC: Island Press; 2005. [Google Scholar]
  • 2.Dahlgren RA. Biogeochemical processes in soils and ecosystems: From landscape to molecular scale. Journal of Geochemical Exploration. 2006;88(1–3):186–9. [Google Scholar]
  • 3.Anderson T, Domsch K. Carbon link between microbial biomass and soil organic matter. In: Megusar F, Gantar M, editors. Perspectives in Microbial Ecology. Ljubljana: Slovene Society for Microbiology; 1986. [Google Scholar]
  • 4.Blagodatskaya EV, Anderson T-H. Interactive effects of pH and substrate quality on the fungal-to-bacterial ratio and qCO2 of microbial communities in forest soils. Soil Biology and Biochemistry. 1998;30(10–11):1269–74. [Google Scholar]
  • 5.Anderson T-H. Microbial eco-physiological indicators to asses soil quality. Agriculture, Ecosystems & Environment. 2003;98(1–3):285–93. [Google Scholar]
  • 6.Andersson S, Nilsson SI, Saetre P. Leaching of dissolved organic carbon (DOC) and dissolved organic nitrogen (DON) in mor humus as aected by temperature and pH. Soil Biology and Biochemistry. 2000;32(1):1–10. [Google Scholar]
  • 7.Curtin D, Campbell CA, Jalil A. Effects of acidity on mineralization: pH-dependence of organic matter mineralization in weakly acidic soils. Soil Biology and Biochemistry. 1998;30(1):57–64. [Google Scholar]
  • 8.Edwards G. Measuring and assessing soils: Government of Western Australia Department of Primary Industries and Regional Development: Agriculture and Food; 2019. [Available from: https://www.agric.wa.gov.au/measuring-and-assessing-soils/what-soil-organic-carbon. [Google Scholar]
  • 9.Rousk J, Brookes PC, Baath E. Contrasting soil pH effects on fungal and bacterial growth suggest functional redundancy in carbon mineralization. Appl Environ Microbiol. 2009;75(6):1589–96. doi: 10.1128/AEM.02775-08 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Pawar RM. The Effect of Soil pH on Bioremediation of Polycyclic Aromatic Hydrocarbons (PAHS). Journal of Bioremediation & Biodegradation. 2015;06(03). [Google Scholar]
  • 11.Sinsabaugh RL, Lauber CL, Weintraub MN, Ahmed B, Allison SD, Crenshaw C, et al. Stoichiometry of soil enzyme activity at global scale. Ecol Lett. 2008;11(11):1252–64. doi: 10.1111/j.1461-0248.2008.01245.x [DOI] [PubMed] [Google Scholar]
  • 12.Turner BL. Variation in pH optima of hydrolytic enzyme activities in tropical rain forest soils. Appl Environ Microbiol. 2010;76(19):6485–93. doi: 10.1128/AEM.00560-10 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Collins SL, Sinsabaugh RL, Crenshaw C, Green L, Porras-Alfaro A, Stursova M, et al. Pulse dynamics and microbial processes in aridland ecosystems. Journal of Ecology. 2008;96(3):413–20. [Google Scholar]
  • 14.Singh BK, Walker A. Microbial degradation of organophosphorus compounds. FEMS Microbiol Rev. 2006;30(3):428–71. doi: 10.1111/j.1574-6976.2006.00018.x [DOI] [PubMed] [Google Scholar]
  • 15.Zhang YY, Wu W, Liu H. Factors affecting variations of soil pH in different horizons in hilly regions. PLoS One. 2019;14(6):e0218563. doi: 10.1371/journal.pone.0218563 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Ji C-J, Yang Y-H, Han W-X, He Y-F, Smith J, Smith P. Climatic and Edaphic Controls on Soil pH in Alpine Grasslands on the Tibetan Plateau, China: A Quantitative Analysis. Pedosphere. 2014;24(1):39–44. [Google Scholar]
  • 17.Chen Z-S, Hsieh C-F, Jiang F-Y, Hsieh T-H, Sun I-F. Relations of soil properties to topography and vegetation in a subtropical rain forest in southern Taiwan. Plant Ecology. 1997;132:229–41. [Google Scholar]
  • 18.Reuter HI, Lado LR, Hengl T, Montanarella L. Continental-scale Digital Soil Mapping Using European Soil Profile Data: Soil pH. Hamburg contributions to physical geography and landscape ecology. Hamburg: University of Hamburg; 2008. p. 91–102. [Google Scholar]
  • 19.Li X, Chang SX, Liu J, Zheng Z, Wang X. Topography-soil relationships in a hilly evergreen broadleaf forest in subtropical China. Journal of Soils and Sediments. 2016;17(4):1101–15. [Google Scholar]
  • 20.Moore I, Gessler P, Nielsen G, Peterson G. Soil Attribute Prediction Using Terrain Analysis. Soil Science Society of America Journal. 1993;57(2):443–7. [Google Scholar]
  • 21.Wiesmeier M, Barthold F, Blank B, Kögel-Knabner I. Digital mapping of soil organic matter stocks using Random Forest modeling in a semi-arid steppe ecosystem. Plant and Soil. 2010;340(1–2):7–24. [Google Scholar]
  • 22.Prasad AM, Iverson LR, Liaw A. Newer Classification and Regression Tree Techniques: Bagging and Random Forests for Ecological Prediction. Ecosystems. 2006;9(2):181–99. [Google Scholar]
  • 23.Stoorvogel JJ, Kempen B, Heuvelink GBM, de Bruin S. Implementation and evaluation of existing knowledge for digital soil mapping in Senegal. Geoderma. 2009;149(1–2):161–70. [Google Scholar]
  • 24.Padarian J, Minasny B, McBratney AB. Machine learning and soil sciences: a review aided by machine learning tools. Soil. 2020;6(1):35–52. [Google Scholar]
  • 25.Bui E, Henderson B, Viergever K. Using knowledge discovery with data mining from the Australian Soil Resource Information System database to inform soil carbon mapping in Australia. Global Biogeochemical Cycles. 2009;23(4):n/a-n/a. [Google Scholar]
  • 26.Bui EN, Henderson BL, Viergever K. Knowledge discovery from models of soil properties developed through data mining. Ecological Modelling. 2006;191(3–4):431–46. [Google Scholar]
  • 27.Henderson BL, Bui EN, Moran CJ, Simon DAP. Australia-wide predictions of soil properties using decision trees. Geoderma. 2005;124(3–4):383–98. [Google Scholar]
  • 28.Wilkinson L. Tree Structure Data Analysis: AID, CHAID, and CART. Sawtooth/SYSTAT Joint Software Conference; Sun Valley, Idaho: 1992. [Google Scholar]
  • 29.Breiman L. Random Forests. Machine Learning. 2001;45:5–32. [Google Scholar]
  • 30.Haahr M. RANDOM.ORG: True Random Number Service [Available from: https://www.random.org.
  • 31.ZeeMaps. Create and publish interactive maps 2005 [Available from: https://www.zeemaps.com/.
  • 32.LabRat Supplies. PH STRIPS [Available from: https://www.labrat-supplies.com/collections/ph-strips.
  • 33.Cole-Parmer. Testing the pH of Soil Samples: Cole-Parmer; n.d. [Available from: https://www.coleparmer.com/tech-article/soil-samples-ph-testing.
  • 34.Metheny N, Gunn E, Rubbelke C, Quillen T, Ezekiel U, Meert K. Effect of pH Test-Strip Characteristics on Accuracy of Readings. Critical Care Nurse. 2017;37(3):50–8. doi: 10.4037/ccn2017199 [DOI] [PubMed] [Google Scholar]
  • 35.Hole-filled seamless SRTM data V4 [Internet]. SRTM 90m DEM Digital Elevation Database. 2008 [cited July 12 2020]. Available from: http://srtm.csi.cgiar.org.
  • 36.Conrad O, Bechtel B, Bock M, Dietrich H, Fischer E, Gerlitz L, et al. System for Automated Geoscientific Analyses (SAGA) v. 2.1.4. Geoscientific Model Development. 2015;8:1991–2007. [Google Scholar]
  • 37.Farr TG, Rosen PA, Caro E, Crippen R, Duren R, Hensley S, et al. The Shuttle Radar Topography Mission. Reviews of Geophysics. 2007;45(2). [Google Scholar]
  • 38.Fick SE, Hijmans RJ. WorldClim 2: new 1km spatial resolution climate surfaces for global land areas. International Journal of Climatology. 2017;37(12):4302–15. [Google Scholar]
  • 39.R Core Team. R: A language and environment for statistical computing Vienna: R Foundation for Statistical Computing; 2020. [Available from: https://www.R-project.org/. [Google Scholar]
  • 40.Therneau T, Atkinson B. rpart: Recursive Partitioning and Regression Trees. 2019. [Google Scholar]
  • 41.Package ’randomForest’ [Internet]. The Comprehensive R Archive Network: Contributed Packages. 2002 [cited July 9, 2020]. Available from: https://cran.r-project.org/web/packages/randomForest/.
  • 42.Scornet E, Coeurjolly J-F, Leclercq-Samson A. Tuning parameters in random forests. ESAIM: Proceedings and Surveys. 2017;60:144–62. [Google Scholar]
  • 43.pH Prediction by Machine Learning [Internet]. 2021. Available from: https://github.com/Jake1Egelberg/pH-Prediction-by-Machine-Learning.
  • 44.Schober P, Boer C, Schwarte LA. Correlation Coefficients: Appropriate Use and Interpretation. Anesth Analg. 2018;126(5):1763–8. doi: 10.1213/ANE.0000000000002864 [DOI] [PubMed] [Google Scholar]
  • 45.Jafari M, Ansari-Pour N. Why, When and How to Adjust Your P Values? Cell J. 2019;20(4):604–7. doi: 10.22074/cellj.2019.5992 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Schober P, Vetter T. Adjustments for Multiple Testing in Medical Research. Anesth Analg. 2020;130(1):99. doi: 10.1213/ANE.0000000000004545 [DOI] [PubMed] [Google Scholar]
  • 47.Mindrila D, Balentyne P. Scatterplots and Correlation [Available from: https://www.westga.edu/academics/research/vrc/assets/docs/scatterplots_and_correlation_notes.pdf.
  • 48.Schober P, Vetter TR. Two-Sample Unpaired t Tests in Medical Research. Anesth Analg. 2019;129(4):911. doi: 10.1213/ANE.0000000000004373 [DOI] [PubMed] [Google Scholar]
  • 49.Schober P, Vetter T. Nonparametric Statistical Methods in Medical Research. Anesth Analg. 2020;131(6):1862–3. doi: 10.1213/ANE.0000000000005101 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Li M-F, Tang X-P, Wu W, Liu H-B. General models for estimating daily global solar radiation for different solar radiation zones in mainland China. Energy Conversion and Management. 2013;70:139–48. [Google Scholar]
  • 51.An Introduction to Recursive Partitioning Using the RPART Routines [Internet]. The Comprehensive R Archive Network: Contributed Packages. 2019 [cited July 4 2020]. Available from: https://CRAN.R-project.org/package=rpart.
  • 52.Freeman P. Random Forest: Variable Importance 2019. [Available from: http://stat.cmu.edu/summer/cmsacamp/Week_04_Tuesday/RF_Var_Imp.Rmd. [Google Scholar]
  • 53.Yates KL, Bouchet PJ, Caley MJ, Mengersen K, Randin CF, Parnell S, et al. Outstanding Challenges in the Transferability of Ecological Models. Trends in Ecology & Evolution. 2018;33(10):790–802. doi: 10.1016/j.tree.2018.08.001 [DOI] [PubMed] [Google Scholar]
  • 54.Keskin H, Grunwald S, Harris WG. Digital mapping of soil carbon fractions with machine learning. Geoderma. 2019;339:40–58. [Google Scholar]
  • 55.Jiang Z, Knight J. Learning Spatial Decision Tree For Geographical Classification: A Summary of Results. ACM SIGSPATIAL GIS; 2012; Redondo Beach, CA. p. 390–3. [Google Scholar]
  • 56.Li X, Claramunt C. A Spatial Entropy-Based Decision Tree for Classification of Geographical Information. Transactions in GIS. 2006;10(3):451–67. [Google Scholar]
  • 57.Fitzpatrick MC, Hargrove WW. The projection of species distribution models and the problem of non-analog climate. Biodiversity and Conservation. 2009;18(8):2255–61. [Google Scholar]
  • 58.Simbahan GC, Dobermann A, Goovaerts P, Ping J, Haddix ML. Fine-resolution mapping of soil organic carbon based on multivariate secondary data. Geoderma. 2006;132(3–4):471–89. [Google Scholar]
  • 59.Vasques GM, Grunwald S, Comerford NB, Sickman JO. Regional modelling of soil carbon at multiple depths within a subtropical watershed. Geoderma. 2010;156(3–4):326–36. [Google Scholar]
  • 60.Mishra U, Torn MS, Masanet E, Ogle SM. Improving regional soil carbon inventories: Combining the IPCC carbon inventory method with regression kriging. Geoderma. 2012;189–190:288–95. [Google Scholar]
  • 61.Sun W, Minasny B, McBratney A. Analysis and prediction of soil properties using local regression-kriging. Geoderma. 2012;171–172:16–23. [Google Scholar]
  • 62.Liaw A, Wiener M. Classification and Regression by randomForest. R News. 2002:18–22. [Google Scholar]
  • 63.Heung B, Ho HC, Zhang J, Knudby A, Bulmer CE, Schmidt MG. An overview and comparison of machine-learning techniques for classification purposes in digital soil mapping. Geoderma. 2016;265:62–77. [Google Scholar]
  • 64.Ließ M, Glaser B, Huwe B. Uncertainty in the spatial prediction of soil texture. Geoderma. 2012;170:70–9. [Google Scholar]
  • 65.Guo P-T, Li M-F, Luo W, Tang Q-F, Liu Z-W, Lin Z-M. Digital mapping of soil organic matter for rubber plantation at regional scale: An application of random forest plus residuals kriging approach. Geoderma. 2015;237–238:49–59. [Google Scholar]
  • 66.Tziachris P, Aschonitis V, Chatzistathis T, Papadopoulou M, Doukas ID. Comparing Machine Learning Models and Hybrid Geostatistical Methods Using Environmental and Soil Covariates for Soil pH Prediction. ISPRS International Journal of Geo-Information. 2020;9(4). [Google Scholar]
  • 67.Chen S, Liang Z, Webster R, Zhang G, Zhou Y, Teng H, et al. A high-resolution map of soil pH in China made by hybrid modelling of sparse soil data and environmental covariates and its implications for pollution. Sci Total Environ. 2019;655:273–83. doi: 10.1016/j.scitotenv.2018.11.230 [DOI] [PubMed] [Google Scholar]
  • 68.Slessarev EW, Lin Y, Bingham NL, Johnson JE, Dai Y, Schimel JP, et al. Water balance creates a threshold in soil pH at the global scale. Nature. 2016;540(7634):567–9. doi: 10.1038/nature20139 [DOI] [PubMed] [Google Scholar]
  • 69.Seibert J, Stendahl J, Sørensen R. Topographical Influences on Soil Properties in Boreal Forests. Geoderma. 2007;141:139–48. [Google Scholar]
  • 70.Tamene GM, Adiss HK, Alemu MY. Effect of Slope Aspect and Land Use Types on Selected Soil Physicochemical Properties in North Western Ethiopian Highlands. Applied and Environmental Soil Science. 2020;2020:8463259. [Google Scholar]
  • 71.Wang B, Waters C, Orgill S, Cowie A, Clark A, Li Liu D, et al. Estimating soil organic carbon stocks using different modelling techniques in the semi-arid rangelands of eastern Australia. Ecological Indicators. 2018;88:425–38. [Google Scholar]
  • 72.Xiong X, Grunwald S, Myers DB, Kim J, Harris WG, Comerford NB. Holistic environmental soil-landscape modeling of soil organic carbon. Environmental Modelling & Software. 2014;57:202–15. [Google Scholar]
  • 73.Kell DB. Large-scale sequestration of atmospheric carbon via plant roots in natural and agricultural ecosystems: why and how. Philosophical Transactions of the Royal Society B: Biological Sciences. 2012;367(1595):1589–97. doi: 10.1098/rstb.2011.0244 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Mandal RA, Jha PK, Dutta IC, Thapa U, Karmacharya SB. Carbon Sequestration in Tropical and Subtropical Plant Species in Collaborative and Community Forests of Nepal. Advances in Ecology. 2016;2016:1529703. [Google Scholar]
  • 75.Neina D. The Role of Soil pH in Plant Nutrition and Soil Remediation. Applied and Environmental Soil Science. 2019;2019:1–9. [Google Scholar]
  • 76.Lal R. Soil Carbon Sequestration Impacts on Global Climate Change and Food Security. Science. 2004;304(5677):1623–7. doi: 10.1126/science.1097396 [DOI] [PubMed] [Google Scholar]
  • 77.Abdul Halim N, Abdullah R, Karsani S, Osman N, Panhwar Q, Ishak C. Influence of Soil Amendments on the Growth and Yield of Rice in Acidic Soil. Agronomy. 2018;8(9):165. [Google Scholar]
  • 78.Elert E. Rice by the numbers: A good grain. Nature. 2014;514(7524):S50–S1. doi: 10.1038/514s50a [DOI] [PubMed] [Google Scholar]

Decision Letter 0

João Canário

27 Apr 2021

PONE-D-21-10198

Assessing the geographic specificity of pH prediction by classification and regression trees

PLOS ONE

Dear Dr. Egelberg,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The submitted manuscript has some merit, but I totally agree with both reviewers that this work needs major revisions before being considered for publication in PLOS ONE. I'm mainly concerned about the short description of the state of the art (Introduction) and the methods section. Referee 2 highlighted this point in the revisions.

Another aspect is that I have some difficulty identifying the scientific novelty of this work and its future applications. The authors should clearly highlight this aspect.

Please submit your revised manuscript by Jun 11 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

João Canário, PhD

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that Figures 1 and 3 in your submission contain map images which may be copyrighted.

All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For these reasons, we cannot publish previously copyrighted maps or satellite images created using proprietary data, such as Google software (Google Maps, Street View, and Earth). For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (a) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (b) remove the figures from your submission:

a. You may seek permission from the original copyright holder of Figures 1 and 3 to publish the content specifically under the CC BY 4.0 license. 

We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission.

In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

b. If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

The following resources for replacing copyrighted map figures may be helpful:

USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/

The Gateway to Astronaut Photography of Earth (public domain): http://eol.jsc.nasa.gov/sseop/clickmap/

Maps at the CIA (public domain): https://www.cia.gov/library/publications/the-world-factbook/index.html and https://www.cia.gov/library/publications/cia-maps-publications/index.html

NASA Earth Observatory (public domain): http://earthobservatory.nasa.gov/

Landsat: http://landsat.visibleearth.nasa.gov/

USGS EROS (Earth Resources Observatory and Science (EROS) Center) (public domain): http://eros.usgs.gov/#

Natural Earth (public domain): http://www.naturalearthdata.com/

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript entitled “Assessing the geographic specificity of pH prediction by classification and regression trees” by Egelberg et al. showed the influence of different geographic conditions on the prediction of pH by CARTs (classification and regression trees), their limitations by machine learning and the use of new CARTs and random forest (RF) models to compare with previous ones applied in different regions. This study is based on a previous one conducted by Zhang et al. (2019), and the authors applied the CART model developed by Zhang et al. and compared it with new ones developed specifically for this study.

General comments:

The manuscript is well written, the introduction is short but contains all the necessary information for the current study. The goals of the study are simple and well designated, very understandable and in agreement with what the authors want to understand beyond what is already known. Material and methods section is well designed with necessary detail. The results are well presented and divided into appropriate key points. The discussion section is in accordance with the presented goals and results from the study and the conclusions are well described, according to the discussion of the obtained results and aligned with the hypotheses presented in the goals and tested with the results.

With a short revision (indicated below in the specific comments and considered as minor revision) in order to improve the manuscript, I recommend this research for publication.

Specific comments:

Material and Methods:

Line 103: Caption of Fig. 1: “Created” should be “created”.

Line 107: Is dH2O deionized water? If yes, please indicate this in the text; “…vigorously shaken…” for how long? And how? Please indicate!

Line 143: Please describe each of the parameters in the equation (1).

Line 148: Where it says (2), (44), indicate as (2, 44). Are both references right?

Results:

Lines 167-173: you should indicate the meaning of all abbreviations used in the table and not only of those discussed. You can add this information in the end of the table as notes.

Line 177: According to Table 3, the percentage of 26.6 is actually 26.4 (see Table 3). Please correct accordingly in the text.

Line 205: Since the manuscript only has 3 figures, I think that you can use the figure in S2 in the manuscript too, as Fig. 4.

Reviewer #2: Comments on the manuscript PONE-D-21-10198

The manuscript “Assessing the geographic specificity of pH prediction by classification and regression trees” by Egelberg et al. intends to evaluate the suitability of machine learning on the prediction of pH soil, by replicating the approach of Zhang et al. (2019) using CART model and by complementing it with random forest models.

The manuscript has a good structure and it is easy to read. However, I do find some weaknesses that should be revised to improve the paper. My major concern is related to the overall lack of detail in the manuscript. The hypothesis should be clearly indicated. The discussion in particular lacks an integration of data obtained, whereas the authors provide general information without discussing their data. Other comments are also indicated.

I suggest a major revision of the manuscript.

Aims

Lines 91-93 should precede the aims. Can the authors give more detail on the accuracy of RF prediction?

Study area

Is there any information that could be added to describe pH in the soils sampled?

Soil pH testing

Can the authors provide a reference for the procedure adopted?

The use of pH strips does not provide sufficient accuracy for the measurements, which could benefit the model approach used by the authors.

Topographic and climatic data

Table 1 is merely informative and in part replicates tables 2 and 3. It could be in supporting information since it does not provide substantial information relevant to study.

Statistical analysis

Did the authors check the normality of the data?

Lines 167-168: to assess ‘the strength of the linear association between variables and soil pH’ using Pearson correlation the authors have to indicate if variables did present normal distribution.

Discussion

Lines 219-221: This is a conclusion drawn from your work. Please, revise the start of your discussion.

The discussion lacks the support of the data obtained.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Aug 11;16(8):e0255119. doi: 10.1371/journal.pone.0255119.r002

Author response to Decision Letter 0


26 May 2021

RESPONSE TO EDITOR:

EDITOR COMMENT: The submitted manuscript has some merit, but I totally agree with both reviewers that this work needs major revisions before being considered for publication in PLOS ONE. I'm mainly concerned about the short description of the state of the art (Introduction) and the methods section. Referee 2 highlighted this point in the revisions. Another aspect is that I have some difficulty identifying the scientific novelty of this work and its future applications. The authors should clearly highlight this aspect.

RESPONSE: Thank you for coordinating the review process. We have revised the manuscript and addressed the Reviewers’ comments. To address your recommendations, we have elaborated on the importance of our research in the Discussion section.

General Comments:

COMMENT: Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming.

RESPONSE: We have checked that all formatting and naming requirements have been met.

COMMENT: We note that Figures 1 and 3 in your submission contain map images which may be copyrighted. We require you to either (a) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (b) remove the figures from your submission:

RESPONSE: We have obtained permission from the copyright holder, ZeeMaps, to publish their images. A PLOS ONE Content Permission Form has been uploaded.

RESPONSE TO REVIEWER 1:

REVIEWER COMMENT: The manuscript is well written, the introduction is short but contains all the necessary information for the current study. The goals of the study are simple and well designated, very understandable and in agreement with what the authors want to understand beyond what is already known. Material and methods section is well designed with necessary detail. The results are well presented and divided into appropriate key points. The discussion section is in accordance with the presented goals and results from the study and the conclusions are well described, according to the discussion of the obtained results and aligned with the hypotheses presented in the goals and tested with the results. With a short revision (indicated below in the specific comments and considered as minor revision) in order to improve the manuscript, I recommend this research for publication.

RESPONSE: We thank the reviewer for their kind words. We have revised our manuscript according to their recommendations below.

REVIEWER COMMENT: Line 103: Caption of Fig. 1: “Created” should be “created”.

RESPONSE: We have revised the caption of Fig. 1 to read “Study area. Points indicate state parks where samples were collected. Created with ZeeMaps.”

REVIEWER COMMENT: Line 107: Is dH2O deionized water? If yes, please indicate this in the text; “…vigorously shaken…” for how long? And how? Please indicate!

RESPONSE: We specified this line to read “deionized H2O.” We replaced the phrasing of “vigorously shaken” with a more specific description: “The mixture was vigorously inverted and rotated before being let to sit for 10 minutes…”

REVIEWER COMMENT: Please describe each of the parameters in the equation (1).

RESPONSE: We added definitions of y, ŷ, and ȳ, as well as an in-text description of the equation.
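
As an illustration of the quantities named in this response, the following is a minimal Python sketch of a relative root mean square error (RRMSE). It assumes the common form in which the root mean square error between observed values y and predictions ŷ is divided by the observed mean ȳ and expressed as a percentage. Equation (1) is not reproduced in this letter, so that form, the function name `rrmse`, and the pH values below are assumptions made for illustration, not the authors' code or data.

```python
import math


def rrmse(observed, predicted):
    """Relative RMSE in percent: RMSE divided by the mean observed value.

    Assumed form (hypothetical, for illustration):
        RRMSE = 100 * sqrt(mean((y - y_hat)^2)) / y_bar
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    n = len(observed)
    # Mean squared difference between each observation y and its prediction y_hat.
    mse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)) / n
    # Mean of the observed values (y_bar).
    y_bar = sum(observed) / n
    return 100.0 * math.sqrt(mse) / y_bar


# Hypothetical pH readings, not data from the study.
observed = [5.0, 5.5, 6.0, 6.5]
predicted = [5.5, 5.5, 6.0, 6.0]
print(round(rrmse(observed, predicted), 2))
```

Dividing by the observed mean makes the error unitless, which is what allows percentages from different study regions to be placed side by side.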

REVIEWER COMMENT: Line 148: Where it says (2), (44), indicate as (2, 44). Are both references right?

RESPONSE: We redid notation to clearly distinguish between references to a citation and references to an equation. References to an equation are now labeled and in brackets.

REVIEWER COMMENT: Lines 167-173: you should indicate the meaning of all abbreviations used in the table and not only of those discussed. You can add this information in the end of the table as notes.

RESPONSE: We added all abbreviations in the subsection Topographic and climatic data.

REVIEWER COMMENT: Line 177: According to Table 3, the percentage of 26.6 is actually 26.4 (see Table 3). Please correct accordingly in the text.

RESPONSE: We reconstructed Table 3 with data from Spearman correlations rather than Pearson correlations. We performed this step to accommodate Reviewer 2’s methodological recommendations. Updated values in the table match those in the text, but the specific sentence identified here was removed because it is not appropriate to square Spearman’s rho.
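
The distinction between the two coefficients can be sketched in a few lines of Python. Spearman's rho is the Pearson correlation computed on ranks, so it describes agreement in rank order rather than a linear fit to the raw values. The helper names and the predictor and pH values below are hypothetical and chosen only to show a monotonic but nonlinear relationship; they are not taken from the study.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical predictor and response with a strictly decreasing, nonlinear link.
predictor = [1.0, 2.0, 3.0, 4.0, 5.0]
response = [math.exp(-v) for v in predictor]

print("Pearson r:   ", round(pearson(predictor, response), 3))
print("Spearman rho:", round(spearman(predictor, response), 3))
```

In this toy case the rank order is perfectly reversed, so rho reaches its limit while Pearson's r does not, because the raw values do not fall on a straight line.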

REVIEWER COMMENT: Line 205: Since the manuscript only has 3 figures, I think that you can use the figure in S2 in the manuscript too, as Fig. 4.

RESPONSE: We have added Figure S2 to the manuscript as Figure 4.

RESPONSE TO REVIEWER 2:

REVIEWER COMMENT: The manuscript “Assessing the geographic specificity of pH prediction by classification and regression trees” by Egelberg et al. intends to evaluate the suitability of machine learning on the prediction of pH soil, by replicating the approach of Zhang et al. (2019) using CART model and by complementing it with random forest models.

The manuscript has a good structure and it is easy to read. However, I do find some weaknesses that should be revised to improve the paper. My major concern is related to the overall lack of detail in the manuscript. The hypothesis should be clearly indicated.

RESPONSE: See subsection Aim of the current study in the Introduction for our hypotheses.

REVIEWER COMMENT: The discussion in particular lacks an integration of data obtained, whereas the authors provide general information without discussing their data.

RESPONSE: Data from the Results was incorporated into the Discussion to support each claim made.

REVIEWER COMMENT: Other comments are also indicated. I suggest a major revision of the manuscript.

RESPONSE: We thank the reviewer for their effort and attention to detail. We have revised our manuscript according to their recommendations below.

REVIEWER COMMENT: Lines 91-93 should precede the aims. Can the authors give more detail on the accuracy of RF prediction?

RESPONSE: We believe the logical flow of the paper is better maintained by stating the study objectives prior to our expectations regarding those objectives. As such, we have kept lines 91-93 before the aims. This decision aligns with the recommendation of Schober et al. in “Clear Study Aims and Hypotheses in a Research Paper.”

However, we updated our hypotheses to provide a more specific estimate regarding the accuracy of RF prediction. Because we believed that the RF would be more accurate than the CART, and that our CART would demonstrate an accuracy in our region similar to Zhang et al.’s 6.9% RRMSE in their region, we hypothesized that our RF would yield an RRMSE below 6.9%.
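For readers unfamiliar with the metric, the following is a minimal sketch of how a relative RMSE might be computed. It assumes the common definition RRMSE = 100 × RMSE / mean(observed); this part of the review history does not state the formula, so that definition, the function name rrmse_percent, and the pH values are illustrative assumptions rather than the authors' code or data.

```python
import math

def rrmse_percent(observed, predicted):
    """Relative RMSE as a percentage of the mean observed value.

    Assumed definition: 100 * sqrt(mean squared error) / mean(observed).
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return 100.0 * math.sqrt(mse) / (sum(observed) / len(observed))

# Hypothetical pH values, for illustration only.
observed_ph = [5.0, 5.5, 6.0, 4.5]
predicted_ph = [5.2, 5.1, 6.3, 4.6]
print(f"RRMSE = {rrmse_percent(observed_ph, predicted_ph):.2f}%")
```

Under this definition, the stated hypothesis corresponds to the random forest's value falling below 6.9.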

REVIEWER COMMENT: Is there any information that could be added to describe pH in the soils sampled?

RESPONSE: We added a statement describing the observed pH range in our study region.

REVIEWER COMMENT: Can the authors provide a reference for the procedure adopted?

RESPONSE: We added a reference from which we derived our protocol.

REVIEWER COMMENT: The use of pH strips does not provide sufficient accuracy for the measurements, which could benefit the model approach used by the authors.

RESPONSE: We updated the description of our methodology with additional details and citations to support its robustness. Specifically, we added that we used four-squared pH test strips for pH testing. Previous literature has found these types of test strips to exhibit greater than 95% positive and negative predictive power and greater than 90% sensitivity and specificity in moderately acidic pH. Considering that our observed pH range falls within this moderately acidic category, four-squared plastic test strips were appropriate in the current context.

REVIEWER COMMENT: Table 1 is merely informative and in part replicates tables 2 and 3. It could be in supporting information since it does not provide substantial information relevant to study.

RESPONSE: We moved Table 1 to supporting information.

REVIEWER COMMENT: Did the authors check the normality of the data?

RESPONSE: We did not check for normality; however, all sample sizes exceed 30. By the central limit theorem, this implies that the sampling distribution of our sample means for all factors is approximately normal and t-tests are appropriate. We applied Levene’s test for homogeneity of variance and found variances to differ between samples. As such, we applied a two-tailed Welch’s t-test to our data. We added citations to our manuscript to support our analysis.
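As an illustration of the test named in this response, the sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom for two samples with unequal variances. The function name welch_t and the sample values are hypothetical; this is not the authors' analysis, which was carried out in R.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    na, nb = len(a), len(b)
    # Squared standard error contributed by each sample (sample variance / n).
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df

# Hypothetical samples with visibly different spreads.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

The two-tailed p-value would then be read from a t distribution with the (generally non-integer) degrees of freedom returned here.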

REVIEWER COMMENT: Lines 167-168: to assess ‘the strength of the linear association between variables and soil pH’ using Pearson correlation the authors have to indicate if variables did present normal distribution.

RESPONSE: We recalculated Pearson correlations with the Spearman correlation, a rank-based method that does not assume normality. This recalculation altered the strength and significance of the relationship between some factors and soil pH, but our initial conclusions remain largely intact. Changes have been clearly noted in the revised manuscript. We thank the reviewer for bringing this assumption of the Pearson correlation to our attention and have added a citation that discusses the appropriate use of correlation coefficients to support our analysis and as a resource for future readers.

Further, though this was not requested by the reviewer, in our recalculation of correlational significance we adjusted our p-values with the Bonferroni correction. Because we performed multiple statistical tests on different characteristics of the same samples, the global probability of a type I error is inflated. To counter this, we multiplied p-values by the number of tests performed, as suggested by Bonferroni. This is mathematically equivalent to dividing the significance threshold (α) by the number of tests. We added references to support this decision. We did not perform this step in the original manuscript because we became aware of the Bonferroni correction only after our initial submission.
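The equivalence described above can be sketched as follows. The p-values are hypothetical, the function names are illustrative, and capping adjusted values at 1 is a common convention assumed here rather than something stated in the response.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def reject_with_adjusted_p(p_values, alpha=0.05):
    """Compare Bonferroni-adjusted p-values with the original alpha."""
    return [p_adj <= alpha for p_adj in bonferroni_adjust(p_values)]

def reject_with_adjusted_alpha(p_values, alpha=0.05):
    """Compare raw p-values with alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

# Hypothetical raw p-values from four tests on the same samples.
raw = [0.001, 0.02, 0.04, 0.3]
print(bonferroni_adjust(raw))
print(reject_with_adjusted_p(raw))
print(reject_with_adjusted_alpha(raw))
```

Both decision rules flag the same tests as significant; only the scale on which the comparison is reported differs.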

REVIEWER COMMENT: Lines 219-221: This is a conclusion drawn from your work. Please, revise the start of your discussion.

RESPONSE: We agree that this is a conclusion drawn from our work, but we dispute that this is problematic to include in this section. We believe it is important to address the accuracy of our hypotheses upfront in the Discussion and we believe that our conclusions are relevant to our discussion of the results.

To clarify the flow of our Discussion, we have split it into subsections. Each of these subsections interprets a portion of our results in the context of our initial hypotheses. We have also expanded the interpretation of our results in the context of previous literature.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

João Canário

22 Jun 2021

PONE-D-21-10198R1

Assessing the geographic specificity of pH prediction by classification and regression trees

PLOS ONE

Dear Dr. Egelberg,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

After a careful reading of the reviewers' comments (mainly Reviewer 2), I tend to agree that the manuscript needs some clarification before being accepted for publication in PLOS ONE. While Reviewer 2's comments 1 and 4 are a question of style, I tend to corroborate both comments and suggest a revision accordingly.

Comments 2 and 3 from the same reviewer are more delicate and need clarification. I agree with the reviewer that the normality of the data should be checked. Also, the Pearson and Spearman correlations issue should be clarified.

Please submit your revised manuscript by Aug 06 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

João Canário, PhD

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: taking into account that the authors made all the changes proposed and that the manuscript was improved with the help of the comments of both reviewers, I considered that this paper can be published in the present form.

Reviewer #2: REVIEWER 2 COMMENTS TO THE AUTHOR’S RESPONSE:

The manuscript “Assessing the geographic specificity of pH prediction by classification and regression trees” by Egelberg et al. was revised according to the previous reviewer 2’ comments. However, some details need to be clarified. Consequently, I suggest the author to revise the manuscript in some critical issues, in particular points 1, 2 and 4, as follows:

1- AUTHOR’S RESPONSE: We believe the logical flow of the paper is better maintained by stating the study objectives prior to our expectations regarding those objectives. As such, we have kept lines 91-93 before the aims. This decision aligns with the recommendation of Schober et al. in “Clear Study Aims and Hypotheses in a Research Paper.”

Reviewer comment: I strongly disagree with the logical argument provided by the authors. Contrarily to what is stated by the authors, one can only postulate objectives after elaborating a hypothesis, which is completely different from a hypothesis elaborated after the definition of objectives. Therefore, lines 93-97 of the revised manuscript should precede the definition of the aims of the study. Ultimately, the response given by the authors corroborates my observation.

2- AUTHOR’S RESPONSE: We did not check for normality; however, all sample sizes exceed 30. By the central limit theorem, this implies that the sampling distribution of our sample means for all factors is approximately normal and t-tests are appropriate. We applied Levene’s test for homogeneity of variance and found variances to differ between samples. As such, we applied a two-tailed Welch’s t-test to our data. We added citations to our manuscript to support our analysis.

Reviewer comment: The CLT has some disadvantages and is a common misconception if we take a large number of samples their distribution will be (close to) normal. Not everything is a mean. Therefore, it should be used very carefully when dealing with environmental samples. As an example, if you consider two vertical profiles, e.g. for pH, between 1 to 10-cm depth and one presents the highest value at the surface (1-cm) and other presents the same highest value at 6-cm, their distributions are completely different despite having the same average value. This is even more complicated when such variable depends on other parameters of the soils, as it is the case of this work when looking into a spatial distribution. Thus, normality should be checked using the appropriate statistical test and indicated in the Methods for the sake of the reader withdraw its own conclusions.

3- AUTHOR’S RESPONSE: We recalculated Pearson correlations with the Spearman correlation, a rank-based method that does not assume normality.

Reviewer comment: Please be aware that Pearson and Spearman correlations have different assumptions, so I assume the authors meant to say they have recalculated correlations.

4- AUTHOR’S RESPONSE: We agree that this is a conclusion drawn from our work, but we dispute that this is problematic to include in this section. We believe it is important to address the accuracy of our hypotheses upfront in the Discussion and we believe that our conclusions are relevant to our discussion of the results.

Reviewer comment: I do not understand or agree with this logic, as was earlier expressed for the objectives and hypothesis. One can discuss their findings and subsequently draw the conclusions, but to discuss results based on conclusions is not logical and is scientifically incorrect.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Aug 11;16(8):e0255119. doi: 10.1371/journal.pone.0255119.r004

Author response to Decision Letter 1


6 Jul 2021

RESPONSE TO EDITOR:

EDITOR COMMENT: After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. After a careful reading of the reviewers' comments (mainly Reviewer 2), I tend to agree that the manuscript needs some clarification before being accepted for publication in PLOS ONE. While Reviewer 2's comments 1 and 4 are a question of style, I tend to corroborate both comments and suggest a revision accordingly. Comments 2 and 3 from the same reviewer are more delicate and need clarification. I agree with the reviewer that the normality of the data should be checked. Also, the Pearson and Spearman correlations issue should be clarified.

RESPONSE: Thank you for your comments. We have addressed your recommendations with our responses to Reviewer 2’s input.

GENERAL COMMENTS:

COMMENT: Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

RESPONSE: We have reviewed our reference list and confirm that it is complete and correct. It contains no retracted papers.

RESPONSE TO REVIEWER 1:

REVIEWER COMMENT: Taking into account that the authors made all the changes proposed and that the manuscript was improved with the help of the comments of both reviewers, I considered that this paper can be published in the present form.

RESPONSE: We thank the reviewer for their time and consideration.

RESPONSE TO REVIEWER 2:

REVIEWER COMMENT: The manuscript “Assessing the geographic specificity of pH prediction by classification and regression trees” by Egelberg et al. was revised according to the previous reviewer 2’ comments. However, some details need to be clarified. Consequently, I suggest the author to revise the manuscript in some critical issues, in particular points 1, 2 and 4, as follows.

RESPONSE: We thank the reviewer for their additional input and address each recommendation below.

REVIEWER COMMENT: I strongly disagree with the logical argument provided by the authors. Contrarily to what is stated by the authors, one can only postulate objectives after elaborating a hypothesis, which is completely different from a hypothesis elaborated after the definition of objectives. Therefore, lines 93-97 of the revised manuscript should precede the definition of the aims of the study. Ultimately, the response given by the authors corroborates my observation.

RESPONSE: We reformatted our objectives and hypothesis so that statement of the objectives follows statement of the hypotheses. We trust the reviewer’s judgement that this improves clarity.

REVIEWER COMMENT: The CLT has some disadvantages and is a common misconception if we take a large number of samples their distribution will be (close to) normal. Not everything is a mean. Therefore, it should be used very carefully when dealing with environmental samples. As an example, if you consider two vertical profiles, e.g. for pH, between 1 to 10-cm depth and one presents the highest value at the surface (1-cm) and other presents the same highest value at 6-cm, their distributions are completely different despite having the same average value. This is even more complicated when such variable depends on other parameters of the soils, as it is the case of this work when looking into a spatial distribution. Thus, normality should be checked using the appropriate statistical test and indicated in the Methods for the sake of the reader withdraw its own conclusions.

RESPONSE: To test for normality, we plotted histograms of predicted and actual pH data with the hist() function in R. CART and random forest-predicted pH values appear normally distributed; however, sampled pH and Zhang et al.’s predicted pH values do not.

To err on the side of caution and avoid an improper analysis, we recalculated significance with non-parametric Wilcoxon Rank Sum tests that do not assume a normal sampling distribution of sample means. The recalculation of significance did not alter our results or conclusions.
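As a rough illustration of the rank-based test named above, the sketch below computes the Wilcoxon rank sum (Mann-Whitney U) statistic with a two-sided p-value from the normal approximation. It is an assumption-laden simplification: it applies no tie or continuity correction, the approximation is crude for very small samples, and statistical software (including R's wilcox.test) may use exact methods, so its p-values can differ. The function names and data are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def rank_sum_test(a, b):
    """Return (U, z, two-sided p) using the normal approximation."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    w = sum(ranks[:n1])               # rank sum of the first sample
    u = w - n1 * (n1 + 1) / 2         # Mann-Whitney U for the first sample
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, z, p

# Hypothetical observed and predicted pH values.
u_stat, z_stat, p_val = rank_sum_test([4.8, 5.1, 5.6, 6.0], [5.0, 5.2, 5.5, 5.9])
print(f"U = {u_stat}, z = {z_stat:.3f}, p = {p_val:.3f}")
```

Because the test operates on ranks rather than raw values, it does not rely on the normality assumption that the reviewer questioned.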

REVIEWER COMMENT: Please be aware that Pearson and Spearman correlations have different assumptions, so I assume the authors meant to say they have recalculated correlations.

RESPONSE: We agree that the Pearson and Spearman correlations measure fundamentally different phenomena and intended to state that we replaced calculations of the Pearson correlation with calculations of the Spearman correlation in our analysis. We altered language in our paper to account for these divergent assumptions. Specifically, we removed R² values, replaced r values produced by the Pearson correlation with rho values produced by the Spearman correlation, and reworded our statistical analysis section to emphasize that Spearman correlations measure the strength of monotonic relationships (as opposed to Pearson correlations, which are only valid for linear relationships).
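The distinction between linear and monotonic association can be illustrated with a small sketch. The data are synthetic (an exponential curve, which is monotonic but not linear), and the function names are illustrative; nothing here is taken from the study's data.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Synthetic, strictly increasing but non-linear relationship.
x = [1, 2, 3, 4, 5, 6]
y = [math.exp(v) for v in x]
print(f"Pearson r    = {pearson(x, y):.3f}")
print(f"Spearman rho = {spearman(x, y):.3f}")
```

Here rho reaches its maximum because the ordering of the two variables agrees perfectly, while r stays well below 1 because the points do not fall on a straight line.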

REVIEWER COMMENT: I do not understand or agree with this logic, as was earlier expressed for the objectives and hypothesis. One can discuss their findings and subsequently draw the conclusions, but to discuss results based on conclusions is not logical and is scientifically incorrect.

RESPONSE: We have reworded the introduction to our Discussion in accordance with the reviewer’s comments.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 2

João Canário

12 Jul 2021

Assessing the geographic specificity of pH prediction by classification and regression trees

PONE-D-21-10198R2

Dear Dr. Egelberg,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

João Canário, PhD

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Acceptance letter

João Canário

16 Jul 2021

PONE-D-21-10198R2

Assessing the geographic specificity of pH prediction by classification and regression trees

Dear Dr. Egelberg:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. João Canário

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File. Topographic and climatic factor information.

    (XLSX)

    S2 File. Topographic and climatic factor and soil pH data.

    (XLSX)

    Attachment

    Submitted filename: Response to Reviewers.docx

    Attachment

    Submitted filename: Response to Reviewers.docx

    Data Availability Statement

    All relevant data are within the manuscript and its Supporting Information files.


    Articles from PLoS ONE are provided here courtesy of PLOS
