Taxonomic affinity, habitat and seed mass strongly predict seed desiccation response: a boosted regression trees analysis based on 17 539 species

Sarah V Wyse; John B Dickie

doi:10.1093/aob/mcx128

Ann Bot. 2017 Dec 18;121(1):71–83. doi: 10.1093/aob/mcx128

PMCID: PMC5786232 PMID: 29267906

Abstract

Background and Aims

Seed desiccation response plays an important role in plant regeneration ecology, and has significant implications for species conservation. The majority of seed plants produce desiccation-tolerant (orthodox) seeds, whilst comparatively few produce desiccation-sensitive (recalcitrant) seeds that are unable to survive dehydration, and which cannot be conserved in traditional seed banks. This study develops a set of models to predict seed desiccation response in unstudied species.

Methods

Taxonomy, trait, location and climate data were compiled to form a global data set of 17 539 species. Three boosted regression trees models were then developed to predict species’ seed desiccation responses based on habitat and trait information for the species, and the seed desiccation responses of close relatives (either members of the same genus, family or order, depending on the model). Ten-fold cross-validation was used to test model predictive success. The utility of the models was then demonstrated by predicting seed desiccation response for two floras: Ecuador, and Britain and Ireland.

Key Results

The three models had varying success rates for identifying the desiccation-sensitive species: 89 % for the genus-level model, 79 % for the family-level model and 60 % for the order-level model. The most important predictor variables were the seed desiccation responses of a species’ relatives, seed mass and annual precipitation. It is predicted that 10 % of seed plants from Ecuador and 1.2 % of those from Britain and Ireland produce desiccation-sensitive seeds. Due to data availability, prediction accuracy is likely to be higher for the British and Irish flora, where it is estimated that a desiccation-sensitive species had a 96.7 % chance of being correctly identified, compared with 80.8 % in the Ecuador flora.

Conclusions

These models can utilize existing data to predict species’ likely seed desiccation responses, providing a gap-filling tool for global studies of plant traits, as well as critical decision-making support for plant conservation activities.

Keywords: Boosted regression trees, plant development and life-history traits, recalcitrant seeds, seed desiccation sensitivity, seed functional traits, seed storage behaviour

INTRODUCTION

Regeneration is an important component of plant ecology; however, it is often overlooked and excluded from trait-based and community analyses (Larson and Funk, 2016). The ability of seeds to survive desiccation is one important seed trait, both ecologically (Tweddle et al., 2003) and with respect to seed conservation (Roberts, 1973; Berjak and Pammenter, 2008). According to their responses to desiccation, seeds can be divided into two broad groups: desiccation-tolerant (‘orthodox’; Roberts, 1973) seeds, which can survive drying to low moisture contents (below ~7 %); and desiccation-sensitive (‘recalcitrant’; Roberts, 1973) seeds, which cannot. The majority of seed plants produce desiccation-tolerant seeds, which undergo pre-maturation drying and may survive for considerable periods of time in storage or soil seed banks, depending on species (Berjak and Pammenter, 2008). Species that produce desiccation-sensitive seeds, in contrast, are likely to only represent ~8 % of the seed plant flora (Wyse and Dickie, 2017). Such seeds are shed metabolically active, are generally comparatively quick to germinate, and have typical lifespans in the order of days or months, to a maximum of 1–2 years for some temperate species (Berjak and Pammenter, 2008).

Studies relating to seed desiccation sensitivity are predominantly located in the seed science literature, where discussion of the trait is often restricted to the practicalities of ex situ seed banking (but see Tweddle et al., 2003; Daws et al., 2005; Wyse and Dickie, 2017). However, the responses of seeds to desiccation have wider implications, are of global importance, and should be incorporated into studies of plant ecology and conservation. The species that produce desiccation-sensitive seeds are most commonly mature-phase forest trees from moist, tropical habitats (Tweddle et al., 2003; Wyse and Dickie, 2017), many of which dominate their forest canopies, constituting a high proportion of the plant biomass in their ecosystems. In addition, many of these species are also threatened by overexploitation, habitat loss and forest fragmentation (Berjak and Pammenter, 2008; Walck et al., 2011). Such species therefore represent important foci for conservation efforts, including the international in and ex situ targets for threatened plant species conservation (CBD, 2012). However, desiccation-sensitive seeds are unable to be conserved ex situ using traditional seed banking techniques, and can also hamper revegetation and habitat-restoration efforts (Cole et al., 2011). Additionally, given that desiccation-sensitive seeds are ‘extremely’ sensitive to environmental stress (Joët et al., 2016), and the species are typically mature-phase forest trees with comparatively slow life histories (Tweddle et al., 2003), species that produce desiccation-sensitive seeds may be less resilient to environmental change, particularly the increases in drought that are forecast under climate change scenarios for many areas (IPCC, 2014). Therefore, the identification of species that produce desiccation-sensitive seeds should form a key component of assessments of forest resilience to climate change (Tweddle et al., 2003; Wyse and Dickie, 2017). 
Finally, seed desiccation response also informs other regeneration traits, such as seed dormancy, germination rate and soil seed bank persistence (Tweddle et al., 2003; Daws et al., 2005; Murdoch, 2014).

With the growing availability of plant trait and distribution data in global databases such as the TRY trait database (Kattge et al., 2011) and the Global Biodiversity Information Facility (GBIF; www.gbif.org), large-scale studies can be undertaken to investigate global patterns in plant traits and plant evolution (e.g. Díaz et al., 2016). However, given the inherent sparsity of trait data, statistical imputation methods, such as the gap-filling method recently developed by Schrodt et al. (2015) for continuous trait data, are required if we are to make full use of available data sets to further our understanding of plant and seed traits, trait evolution and the resilience of plant communities. Previous research has predicted the global incidence of the desiccation-sensitive seed trait (Wyse and Dickie, 2017), and advanced identification of the habitats (Tweddle et al., 2003; Wyse and Dickie, 2017) and major plant groups (Wyse and Dickie, 2017) in which the trait is most likely to occur. These broad findings can provide insight into the potential incidence of the trait in different habitats or taxa; however, they are less suitable for providing further understanding of the probability that an individual species possesses the trait. Such predictions may be made using models (Daws et al., 2006) or keys (Hong and Ellis, 1996) that use seed morphology, such as the seed coat ratio, seed mass and seed moisture content, to predict the probability that a seed is desiccation-sensitive. Once seeds are collected and these traits measured, these models allow a reasonably accurate prediction of whether a species’ seeds are desiccation-tolerant or -sensitive (e.g. Daws et al. [2006] model success: 79 % of species with desiccation-sensitive seeds [n = 29]; 89 % of species with desiccation-tolerant seeds [n = 75]). However, the use of such models is obviously predicated on seed collection and measurements having been undertaken for the species of interest. 
There is therefore a need to be able to utilize current knowledge to predict the likelihood that individual unstudied species produce desiccation-sensitive seeds to contribute to global studies of plant and seed traits, and also to aid in conservation decision-making.

There is mounting evidence that, rather than falling into the binary categories of ‘desiccation-sensitive’ and ‘desiccation-tolerant’, species may more correctly lie along a spectrum of seed desiccation response (Walters, 2015), which also includes ‘intermediate’ species that can be dried to a certain extent, but not as much as fully desiccation-tolerant species (Ellis et al., 1990). Indeed, the position of individual species’ seed samples may vary along that spectrum, depending on maturity and post-harvest treatment (Walters, 2015). Nevertheless, the binary distinction (or, more accurately, the end of the spectrum to which a species is closest) has substantial value at the global scale for understanding regeneration ecology and planning ex situ seed conservation activities, for devising propagation techniques for threatened species, and for predictive modelling of the effects of climate change on vegetation, through species regeneration by seed (Berjak and Pammenter, 2008; Joët et al., 2016; Wyse and Dickie, 2017). Thus, for the purpose of this study, we have retained that simple classification.

To enable prediction of seed desiccation response in unstudied species, this study develops boosted regression trees (BRT) models (Elith et al., 2008) that utilize existing knowledge contained within the Royal Botanic Gardens, Kew’s Seed Information Database (SID) (Royal Botanic Gardens Kew, 2016) to predict the probability that a species produces desiccation-sensitive seeds. This will enable nuanced predictions to be made at the level of individual species, which take into account the species’ traits and habitat, as well as the trait values of related species. Such models will also provide detailed information on correlates of this trait, at a scale orders of magnitude larger than previous work. Our models are based on seed-desiccation response data for 17 539 species, alongside climate variables and commonly recorded plant traits, and we test their predictive ability using 10-fold cross-validation. Additionally, one advantage of the BRT approach is that predictions can be made despite missing predictor variables and so we also use cross-validation to test the utility of the models in the absence of sets of predictor variables. We then present an example workflow to compile existing data and use the BRT models to predict seed desiccation response values for lists of species, demonstrating the use of this workflow with two freely available species lists for Britain and Ireland, and Ecuador.

MATERIALS AND METHODS

Input data

Data on seed desiccation response were compiled from the SID (Royal Botanic Gardens Kew, 2016). These data are publicly available from the Royal Botanic Gardens, Kew (http://data.kew.org/sid/) or via the TRY trait database (Kattge et al., 2011). Within the SID, seed desiccation response is divided into three broad categories: recalcitrant, intermediate and orthodox. However, intermediate species represent just 0.76 % of the SID data (134 accepted species), which was insufficient to include this category in our models. Consequently, these species were excluded from our dataset and are not predicted by our models. For the purposes of our modelling, we considered species to fall into one of two categories: those that produce desiccation-sensitive (recalcitrant) seeds, and those that produce desiccation-tolerant (orthodox) seeds. To ensure a consistent taxonomy, all binomials were matched against accepted names in The Plant List (2016). Binomials not found in this resource were then matched against Tropicos (http://www.tropicos.org), whilst any not occurring in either list were excluded from the analysis. Potential synonymies were checked using The Plant List (2016). Genera were assigned to families following The Plant List (2016) or Tropicos where appropriate, and families were assigned to orders following the Angiosperm Phylogeny Group (Stevens, 2013). The compiled dataset totalled 668 desiccation-sensitive species and 16 871 desiccation-tolerant species, collectively representing 3929 genera, 280 families and 52 orders.

Predictor variables used in the model included species traits and habitat variables. Trait variables were two of the most commonly recorded plant traits, seed mass and woodiness (see the TRY database; Kattge et al., 2011), as well as seed dispersal mode. Seed mass data were obtained from the SID, and were available for 83 % of the species in our dataset. The woodiness database used was that compiled by Zanne et al. (2014) and is available through the Dryad data repository (Zanne et al., 2013). In this database species were classified as woody, herbaceous or variable, the definition of a woody species being one that has a prominent above-ground stem that persists through time and changing environmental conditions, whilst herbaceous species were those that lack such a stem. Using this database, we were able to obtain woodiness data for 86 % of the species in our dataset. Seed dispersal data were also obtained from the SID; however, these data were only available for 10 % of the species in the dataset. We then imputed values of this trait for other species where information was available for their congeners; the imputed values were the most common dispersal mode among the species’ congeners for which data were available. Using this method, we were able to increase the proportion of species with dispersal mode data to 63 % of the total species in the dataset. Predominant dispersal mode was recorded as one of five potential categories: animal (31 % of species); wind (42 %); water (7 %); unassisted or methods originating from parent plant (18 %); or combination (2 %). The combination category represented any combination of the other categories, most commonly wind plus animal (1 % of species with dispersal mode values). To investigate whether this imputation was appropriate, we determined the proportion of species in each genus in the dispersal mode dataset that possessed the most common dispersal mode for their genus.
If dispersal mode was completely random with respect to genus, one would expect the mean proportion of species with the most common dispersal mode for their genus to be around 0.5 (as determined by simulation), whilst if dispersal mode was strongly related to genus one would expect this proportion to be around 1. For the genera containing at least ten species with known dispersal mode in the SID dispersal dataset (61 genera in total), the mean proportion of species with the most common dispersal mode for their genus was 0.9, with a median value of 0.97.
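The congener-based imputation and the accompanying check can be illustrated with a minimal Python sketch. This is not the authors' code (their workflow is an R script); the function names, the tie-breaking rule and the toy records with placeholder genus names are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict

def modal_mode_by_genus(records):
    """Map each genus to its most common known dispersal mode.

    `records` holds (species, genus, mode) tuples; mode is None when unknown.
    Ties go to the mode seen first, an arbitrary choice for this sketch.
    """
    counts = defaultdict(Counter)
    for _species, genus, mode in records:
        if mode is not None:
            counts[genus][mode] += 1
    return {genus: c.most_common(1)[0][0] for genus, c in counts.items()}

def impute_dispersal(records):
    """Fill unknown modes with the genus-level modal mode, where one exists."""
    modal = modal_mode_by_genus(records)
    return {species: mode if mode is not None else modal.get(genus)
            for species, genus, mode in records}

def modal_agreement(records, min_species=10):
    """Per genus, the proportion of known species sharing the modal mode."""
    counts = defaultdict(Counter)
    for _species, genus, mode in records:
        if mode is not None:
            counts[genus][mode] += 1
    return {genus: c.most_common(1)[0][1] / sum(c.values())
            for genus, c in counts.items() if sum(c.values()) >= min_species}

# Toy records with placeholder names (not real data).
records = [
    ("a1", "GenusA", "animal"), ("a2", "GenusA", "animal"), ("a3", "GenusA", None),
    ("b1", "GenusB", "wind"), ("b2", "GenusB", "wind"), ("b3", "GenusB", "animal"),
    ("c1", "GenusC", None),
]
imputed = impute_dispersal(records)
agreement = modal_agreement(records, min_species=2)
```

In this toy example species a3 receives the modal mode of GenusA, whereas c1 stays unknown because no congener has data. Averaging the values returned by `modal_agreement` over genera gives a statistic analogous to the mean proportion reported above.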

To obtain climate variables describing each species’ habitat, we extracted temperature, rainfall and altitude data from the WorldClim global climate database (Hijmans et al., 2005) for the locations at which we had species occurrence records. Seventy-eight percent of the seed desiccation response data within our dataset (all of which were from the SID) come originally from research on collections of seeds in long-term storage in the Millennium Seed Bank at the Royal Botanic Gardens, Kew, and for these species the geographical locations used were those at which the seeds had been collected. For species without these seed collection location data we downloaded location information, where available, from the Global Biodiversity Information Facility (GBIF; www.gbif.org) using the dismo package (Hijmans et al., 2015) in R v. 3.2.2 (R Core Team, 2015). We downloaded a maximum of 100 records per species. Using data obtained from WorldClim for each location, we then calculated mean values per species for the variables annual precipitation, minimum monthly precipitation, mean annual temperature, diurnal temperature range and altitude. In order to reduce the potential effects of outliers within the location data obtained from GBIF, mean climate variables for these species were calculated from the data values that fell within 95 % confidence limits. Using these methods, we were able to obtain climate data for 94 % of the species in our dataset.
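One possible reading of the outlier-trimming step is sketched below in Python. The text does not state how the 95 % limits were computed, so this sketch assumes they are the per-species mean ± 1.96 standard deviations of the occurrence values; the function name and the example values are hypothetical.

```python
from statistics import mean, stdev

def trimmed_species_mean(values, z=1.96):
    """Mean of the values lying within mean +/- z standard deviations.

    Assumption: '95 % confidence limits' is read here as mean +/- 1.96 SD
    of the raw occurrence values; the source does not specify the method.
    """
    values = list(values)
    if len(values) < 3:          # too few records to judge outliers
        return mean(values)
    centre, spread = mean(values), stdev(values)
    kept = [v for v in values if abs(v - centre) <= z * spread]
    return mean(kept)

# Hypothetical annual precipitation values (mm) at 20 occurrence points,
# one of which is a probable georeferencing outlier.
precip = [1000.0] * 19 + [9000.0]
species_mean = trimmed_species_mean(precip)
```

With these toy values the single extreme record is excluded, so the species-level mean is 1000 mm rather than the untrimmed 1400 mm.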

Given that seed desiccation response is relatively conserved at low taxonomic levels (Wyse and Dickie, 2017), we expected that the trait values of close relatives should be a good predictor of a species’ seed desiccation response. To account for this prior knowledge we computed a variable that represented the proportion of a species’ close relatives that produce desiccation-sensitive seeds. For the majority of species, the closest relatives within the dataset were from the same genus (89 % of species), but for others data were only available for species within the same family (10.7 %) or order (0.3 %), as is likely to be the case for any species lists for which trait predictions need to be made. Given that the probability of a species producing desiccation-sensitive seeds is likely to have a different relationship with the trait values of its relatives depending on how closely related those relatives are, we computed three versions of the variable (genus-level, family-level and order-level) for each species where possible. For a given species X, these variables were calculated as follows: the genus-level variable was the proportion of species within species X’s genus known to produce desiccation-sensitive seeds, excluding data for species X (able to be calculated for 89 % of species); the family-level variable was the mean proportion of species known to produce desiccation-sensitive seeds across the data-containing genera within the same family as species X, excluding any data for species X or its congeners (able to be calculated for 99 % of species); and finally the order-level variable was the mean proportion of species known to produce desiccation-sensitive seeds across the data-containing genera within the same order as species X, excluding any data for species X or its confamilial species (able to be calculated for 98 % of species).
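The leave-one-out construction of the relatives variables can be sketched in Python as follows. The data structure, function names and placeholder taxa are assumptions for illustration; only the genus-level and family-level variables are shown, and the order-level variable would follow the family-level pattern with confamilial species excluded.

```python
from collections import defaultdict

def genus_level(species, data):
    """Proportion of desiccation-sensitive congeners, excluding `species`.

    `data` maps species -> (genus, family, is_sensitive).
    Returns None when no congener has data.
    """
    genus = data[species][0]
    others = [ds for sp, (g, _f, ds) in data.items()
              if g == genus and sp != species]
    return sum(others) / len(others) if others else None

def family_level(species, data):
    """Mean, across the other genera in the family, of each genus's
    proportion of desiccation-sensitive species (congeners excluded)."""
    genus, family, _ds = data[species]
    by_genus = defaultdict(list)
    for _sp, (g, f, ds) in data.items():
        if f == family and g != genus:
            by_genus[g].append(ds)
    if not by_genus:
        return None
    return sum(sum(v) / len(v) for v in by_genus.values()) / len(by_genus)

# Toy data with placeholder taxa (not real observations).
data = {
    "a1": ("GenusA", "FamilyX", True),
    "a2": ("GenusA", "FamilyX", True),
    "a3": ("GenusA", "FamilyX", False),
    "b1": ("GenusB", "FamilyX", False),
    "b2": ("GenusB", "FamilyX", False),
    "c1": ("GenusC", "FamilyX", True),
}
```

For species a1, the genus-level value is 0.5 (one of its two congeners is desiccation-sensitive) and the family-level value is also 0.5 (the mean of 0 for GenusB and 1 for GenusC). Species c1 has no congeners with data, so only its family-level value can be calculated.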

Boosted regression trees models

Boosted regression trees (BRT) is a statistical learning technique that combines the methods of regression trees and boosting to fit numerous individual regression trees in a forward, step-wise manner before combining them into a final predictive model (De’ath, 2007; Elith et al., 2008; Buston and Elith, 2011). Such models typically have predictive power that exceeds that of most traditional modelling techniques (Elith et al., 2008). The method is becoming increasingly common within the ecological and botanical literature, and authors have used BRT to investigate a range of questions, from predicting tree species abundances from remote sensing data (van Ewijk et al., 2014) to determining the factors that best explain the invasion success of exotic species (Carboni et al., 2016). Alongside its predictive performance, BRT has numerous advantages over more conventional regression techniques. The BRT method can accept different types of predictor and response variables, is robust to outliers, can accommodate missing values in predictor variables, is able to fit non-linear relationships, and will automatically handle any interaction effects between predictor variables (Elith et al., 2008). Given the variety of variable types within our model and the presence of missing values within many of the predictor variables, a BRT approach is a logical choice for our analyses.

We used BRT to model the binary variable of seed desiccation response as a function of the functional trait (seed mass, woodiness and dispersal mode), climate (annual precipitation, minimum monthly precipitation, mean annual temperature, diurnal temperature range and altitude) and related-species (proportion of desiccation-sensitive relatives at the genus, family or order level) variables. We produced three BRT models, one using each version of the variable representing the proportion of a species’ close relatives with desiccation-sensitive seeds (hereafter referred to as the genus-level, family-level and order-level models). Before performing our BRT analysis, we first determined the optimum combination of the three key user-set model parameters for our data: the learning rate (shrinkage parameter; lr), the number of trees (nt) and the tree complexity (interaction depth; tc) (Elith et al., 2008). These optimized values were as follows: lr = 0.005, nt = 3 500, tc = 3 for the genus-level model; lr = 0.005, nt = 14 400, tc = 3 for the family-level model; and lr = 0.005, nt = 6 900, tc = 8 for the order-level model. Finally, we estimated variable importance for each model using the formulae developed by Friedman (2001) as implemented in the gbm package (Ridgeway and contributions from others, 2015), which returns values for each variable that reflect the reduction in model squared error that can be attributed to that variable, scaled so values for all variables sum to 100 (Elith et al., 2008). All BRT analyses were conducted in R v. 3.2.2 (R Core Team, 2015) using the dismo (Hijmans et al., 2015) and gbm (Ridgeway and contributions from others, 2015) packages.
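To make the roles of the learning rate and the number of trees concrete, the following Python sketch implements a deliberately simplified form of boosting for a binary response. It is not the gbm algorithm used in the study: trees are restricted to single splits (equivalent to tc = 1), leaf values are plain mean residuals, and there is no subsampling or handling of missing values. All names and the toy data are hypothetical.

```python
import math

def sigmoid(f):
    return 1.0 / (1.0 + math.exp(-f))

def fit_stump(X, r):
    """Least-squares single-split tree fitted to residuals r.
    Returns (feature index, threshold, left value, right value)."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((ri - lv) ** 2 for ri in left)
                   + sum((ri - rv) ** 2 for ri in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    return best[1:]

def fit_boosted_stumps(X, y, learning_rate=0.1, n_trees=100):
    """Forward stage-wise fitting on the log-odds scale."""
    prior = sum(y) / len(y)
    f0 = math.log(prior / (1 - prior))
    F = [f0] * len(y)
    stumps = []
    for _ in range(n_trees):
        # Residuals are the negative gradient of the Bernoulli deviance.
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        j, t, lv, rv = fit_stump(X, residuals)
        stumps.append((j, t, lv, rv))
        # Each tree's contribution is shrunk by the learning rate.
        F = [fi + learning_rate * (lv if row[j] <= t else rv)
             for row, fi in zip(X, F)]
    return f0, learning_rate, stumps

def predict_proba(model, row):
    f0, lr, stumps = model
    f = f0 + lr * sum(lv if row[j] <= t else rv for j, t, lv, rv in stumps)
    return sigmoid(f)

# Toy predictors: [log10 seed mass, annual precipitation]; 1 = sensitive.
X = [[0.5, 600.0], [1.0, 800.0], [1.2, 2000.0], [1.5, 700.0],
     [3.0, 2100.0], [3.4, 1900.0], [3.8, 2500.0], [4.0, 900.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_boosted_stumps(X, y, learning_rate=0.1, n_trees=100)
```

A smaller learning rate shrinks each tree's contribution, so more trees are needed to reach the same fit; this trade-off is why lr and nt are tuned together.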

Model validation

When predicting the value of a data point, the model returns the probability that the species has desiccation-sensitive seeds. We used a threshold value of 0.5 to distinguish between predicted desiccation-sensitive and desiccation-tolerant species. To assess model performance, we performed 10-fold cross-validation by randomly dividing the data into ten groups of approximately even size and building ten models in turn, each model using a different group as a test dataset with the remaining nine groups as the training dataset. We then assessed the predictive power of each model by comparing predicted with observed values of the test datasets. Because stochastic differences among different randomly assigned training and test datasets are likely to result in differences in model performance, we undertook 100 iterations of these analyses, thereby producing 1000 models in total, each with different test and training datasets. Species were randomly assigned to one of the ten groups in each iteration.
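The validation procedure can be outlined with a short Python sketch. The fold assignment, the pooling of held-out predictions within an iteration, and the treatment of a probability of exactly 0.5 as desiccation-sensitive are assumptions of this sketch; `fit` and `predict` are placeholders for any model, and the midpoint classifier below is a toy stand-in rather than a BRT.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Randomly split indices 0..n-1 into k groups of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def class_success_rates(y_true, probs, threshold=0.5):
    """Fraction correctly predicted, separately for DS (1) and DT (0)."""
    ds = [p >= threshold for y, p in zip(y_true, probs) if y == 1]
    dt = [p < threshold for y, p in zip(y_true, probs) if y == 0]
    return sum(ds) / len(ds), sum(dt) / len(dt)

def cross_validate(X, y, fit, predict, k=10, seed=0):
    """One iteration of k-fold cross-validation.
    fit(X, y) -> model; predict(model, x) -> probability of DS."""
    probs = [None] * len(y)
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test_idx:
            probs[i] = predict(model, X[i])
    return class_success_rates(y, probs)

# Toy stand-in model: split at the midpoint between the class means.
def fit_midpoint(X, y):
    m1 = sum(x for x, yi in zip(X, y) if yi == 1) / sum(y)
    m0 = sum(x for x, yi in zip(X, y) if yi == 0) / (len(y) - sum(y))
    return (m0 + m1) / 2

def predict_midpoint(model, x):
    return 1.0 if x > model else 0.0

X = list(range(1, 11)) + list(range(101, 111))
y = [0] * 10 + [1] * 10
# Repeating with different seeds mimics the repeated iterations in the text.
rates = [cross_validate(X, y, fit_midpoint, predict_midpoint, seed=s)
         for s in range(5)]
```

Reporting the two rates separately matters because the classes are highly unbalanced; an overall success rate would be dominated by the desiccation-tolerant species.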

To test the utility of the models when data are only available for certain predictor variables, we performed a further 100 iterations of 10-fold cross-validation, where models were built using the training datasets as previously, but where all data were removed for certain predictor variables in the test sets. For this cross-validation we used four different groups of available predictor variables for the test datasets, representing the most important predictor variables in the model: (1) all variables except seed mass, the variable most likely to be absent from a potential dataset of unknown species; (2) the climate variables and the variable representing the proportion of a species’ close relatives that produce desiccation-sensitive seeds; (3) seed mass and the variable representing the proportion of a species’ close relatives that produce desiccation-sensitive seeds; and (4) the variable representing the proportion of a species’ close relatives that produce desiccation-sensitive seeds alone.

Using the models to predict seed desiccation response

We then developed a workflow (Fig. 1; R script provided as Supplementary Data) to make predictions for lists of species with unknown trait values. We used two freely available example species lists: a list of accepted plant names for Britain and Ireland in the PLANTATT database (last revised 2008), available from the Biological Records Centre (http://www.brc.ac.uk/biblio/plantatt-attributes-british-and-irish-plants-spreadsheet), and a catalogue of the vascular plants of Ecuador, available from Tropicos (http://www.tropicos.org/Project/CE). These two lists represent two diverse situations. The first is a small, temperate flora that is likely to have a low incidence of species with desiccation-sensitive seeds (Tweddle et al., 2003; Wyse and Dickie, 2017), and which is comparatively well-studied so should have a high proportion of species and genera with known trait values; the second is a considerably more diverse tropical flora, which is likely to have a much higher incidence of species with desiccation-sensitive seeds (Tweddle et al., 2003; Wyse and Dickie, 2017), and which should have a higher proportion of species where trait values are known only for other species related at the family or order level. For both lists we excluded known introduced species, resulting in lists of 1378 seed plant species for Britain and Ireland and 15 313 for Ecuador. For these species lists, we had no additional input information other than taxonomic name, and so the R script provided attempts to acquire as many variables as possible for each species from the SID datasets (Royal Botanic Gardens Kew, 2016) and the woodiness dataset compiled by Zanne et al. (2014), which is available through the Dryad data repository (Zanne et al., 2013), and uses the GBIF (www.gbif.org) and the WorldClim global climate database (Hijmans et al., 2005) to estimate habitat variables.

RESULTS

The genus-level model had the highest predictive power: averaging across the 1000 models we performed for cross-validation (CV), this model successfully predicted the seed desiccation response for 99 % (95 % CI 99–100 %) of the test species that had been excluded from building the model (Table 1). However, given the low percentage of desiccation-sensitive species within the dataset (3.8 %), such a high total success rate should be easily achieved. The most important consideration, therefore, is the ability of the model to successfully predict the few desiccation-sensitive species in the dataset. We therefore report model success rates separately for the desiccation-sensitive and desiccation-tolerant species. On average, the genus-level models successfully predicted desiccation response for 89 % of the test desiccation-sensitive species (Table 1). When considering the desiccation-tolerant species alone, the model correctly predicted the trait values for 99.7 % of the test species (Table 1). The family-level model had a similar success rate to the genus-level model for predicting desiccation-tolerant species (Table 1); however, the ability of the family-level model to successfully predict the seed desiccation response for the desiccation-sensitive species was lower than that of the genus-level model, at 79 %. Finally, the order-level model had the lowest predictive success of the three models for the desiccation-sensitive species, successfully predicting on average 60 % of the test species (Table 1). This model had similar success rates for predicting the desiccation-tolerant species to the other two models (Table 1). 
When combining the three BRT models to predict the test species, by using the model based on the lowest taxonomic level possible for each of the 17 539 species, mean success rates across the CV iterations were 99.1 % (95 % CI 98.6–99.5 %) for all species, 99.7 % (95 % CI 99.4–99.9 %) for desiccation-tolerant species and 84.9 % (95 % CI 76.2–92.8 %) for desiccation-sensitive species.
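Two points in this paragraph lend themselves to a brief Python illustration: selecting the model at the lowest taxonomic level for which a relatives variable exists, and the effect of class imbalance on overall success. The function names are hypothetical, and the class-weighted average is only an approximate consistency check on the reported means, not the authors' calculation.

```python
def lowest_available_level(genus_prop, family_prop, order_prop):
    """Pick the finest taxonomic level whose relatives variable is known.
    A value of 0.0 counts as known; only None means 'no data'."""
    for level, value in (("genus", genus_prop),
                         ("family", family_prop),
                         ("order", order_prop)):
        if value is not None:
            return level
    return None

def overall_success(rate_ds, rate_dt, n_ds, n_dt):
    """Class-weighted overall success rate."""
    return (rate_ds * n_ds + rate_dt * n_dt) / (n_ds + n_dt)

N_DS, N_DT = 668, 16871   # species counts given in Materials and Methods

# A trivial rule that labels every species desiccation-tolerant.
baseline = overall_success(0.0, 1.0, N_DS, N_DT)

# Weighted average of the reported per-class rates for the combined models.
combined = overall_success(0.849, 0.997, N_DS, N_DT)
```

Labelling every species desiccation-tolerant would already achieve about 96.2 % overall success, which is why the per-class rates are the more informative measure. The weighted average of the reported class rates, about 99.1 %, agrees with the overall figure given for the combined models.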

Table 1.

Mean success rates (with 95 % confidence intervals) for the predictions of seed desiccation response trait values made using three boosted regression trees models. Predictions were made on test species using models developed from training data for 100 iterations of 10-fold cross-validation. Prediction success rates are reported separately for species with desiccation-sensitive (DS) and desiccation-tolerant (DT) seeds. All models were built on training datasets containing all available predictor variables, whilst predictions were made using test sets with different subsets of predictor variables

Predictor variables	Genus-level model, DS (%)	Genus-level model, DT (%)	Family-level model, DS (%)	Family-level model, DT (%)	Order-level model, DS (%)	Order-level model, DT (%)
All	89 (80–96)	99.8 (99.4–99.9)	79 (69–88)	99.7 (99.4–99.9)	60 (48–72)	99.3 (98.9–99.7)
All except SM	91 (83–98)	99.6 (99.3–99.9)	79 (69–88)	99.6 (99.2–99.9)	51 (38–64)	98.7 (98.0–99.2)
R + C only	91 (82–98)	99.7 (99.3–99.9)	74 (64–84)	99.7 (99.5–99.9)	18 (9–28)	99.6 (99.3–99.9)
R + SM only	86 (76–95)	99.7 (99.3–99.7)	74 (63–84)	99.7 (99.3–99.9)	33 (19–47)	99.1 (98.4–99.6)
R only	87 (75–98)	99.7 (99.3–99.9)	73 (62–82)	99.6 (99.3–99.9)	13 (0–26)	98.3 (96.2–100)

All, all available predictor variables included; All except SM, all predictor variables excluding seed mass; R + C only, the variable representing the proportion of a species’ close relatives that produce desiccation-sensitive seeds, and the climate variables; R + SM only, the variable representing the proportion of a species’ close relatives that produce desiccation-sensitive seeds, and seed mass; R only, only the variable representing the proportion of a species’ close relatives that produce desiccation-sensitive seeds.

For the genus-level model, the most important variable for predicting the seed desiccation response of a species, by a considerable margin, was the proportion of its congeners that produce desiccation-sensitive seeds (Fig. 2A). This variable had a relative importance of 91 %, and was positively associated with the probability of seed desiccation-sensitivity: the higher the proportion of a species’ close relatives that produced desiccation-sensitive seed, the higher the probability that the species would also produce such seeds (Fig. 2B). The remaining variables in the model all had relative influences of <3 %, with seed mass the next most important variable at 2.7 %, followed by minimum monthly precipitation (2.1 %; Fig. 2A). The mean proportion of desiccation-sensitive species across the genera in a species’ family was the most important variable in the family-level model, with a relative importance of 71 %, followed by seed mass (12 %), annual precipitation (5 %) and minimum monthly precipitation (4 %) (Fig. 3). The partial response curves (Fig. 3) illustrate the effect of each predictor variable on the response after accounting for the effects of the other predictors, and for seed mass (Fig. 3C) the partial response curve indicates that heavier seeds are more likely to be desiccation-sensitive. These curves also suggest that species are more likely to produce desiccation-sensitive seeds if they grow in areas where annual rainfall is between 1300 and 3300 mm per year, with the highest probabilities in areas with rainfall between 1900 and 2700 mm per year (Fig. 3D). For the order-level model, seed mass was the most important predictor variable (relative importance 29 %; Fig. 4A, B), followed by the mean proportion of desiccation-sensitive species across the genera in a species’ order (20 %; Fig. 4A, C), and annual precipitation (18 %; Fig. 4A, D). 
In this model, minimum precipitation (relative importance 7 %) had a positive relationship with the probability that a species produces desiccation-sensitive seeds (Fig. 4E), while a species was also more likely to produce desiccation-sensitive seeds if it was woody (Fig. 4F; woodiness relative importance 7 %), and was from a habitat with a mean annual temperature around 30 °C (Fig. 4G; mean annual temperature relative importance 5 %). Finally, in this model wind-dispersed seeds were less likely to be desiccation-sensitive (Fig. 4H; dispersal mode relative importance 5 %).

Fig. 2. — The relative importance of the predictor variables in the genus-level boosted regression trees analysis scaled to sum to 100 (A) and the partial response curve showing the marginal effect of the most important predictor variable on the probability of a species having desiccation-sensitive seed in the genus-level boosted regression tree model (B). ‘Proportion d.s. relatives’ is the proportion of a species’ congeners known to produce desiccation-sensitive seeds. This is the only predictor variable with relative importance >5 %.

Fig. 3. The relative importance of the predictor variables in the family-level boosted regression trees analysis scaled to sum to 100 (A) and the partial response curves showing the marginal effects of the variables on the probability of a species having desiccation-sensitive seed in the family-level boosted regression tree model (B–D). Partial response curves are shown only for variables with relative importance >5 %. ‘Proportion d.s. relatives’ is the mean proportion of species known to produce desiccation-sensitive seeds across the data-containing genera within the same family as a species, excluding any data for that species or its congeners. Note the differences in y-axis scales.

Fig. 4. The relative importance of the predictor variables in the order-level boosted regression trees analysis scaled to sum to 100 (A), and the partial response curves showing the marginal effects of the variables on the probability of a species having desiccation-sensitive seed in the order-level boosted regression tree model (B–H). Partial response curves are shown only for variables with relative importance >5 %. ‘Proportion d.s. relatives’ is the mean proportion of species known to produce desiccation-sensitive seeds across the data-containing genera within the same order as a species, excluding any data for that species or its confamilials. Woodiness categories in (F): H, herbaceous; W, woody. Dispersal mode categories in (H): A, animal; Wa, water; Wi, wind; U, unassisted or methods originating from parent plant; C, any combination of the other categories. Note the differences in y-axis scales.
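
The ‘Proportion d.s. relatives’ predictors defined in the captions to Figs 2–4 can be illustrated with a minimal Python sketch. It is not the authors' R script: the record layout, the function names and the toy taxa are hypothetical and serve only to show the exclusion rules (the focal species is excluded at the genus level, and the focal genus at the family level). The order-level analogue would follow the same pattern while excluding the focal family.

```python
from collections import defaultdict

# Hypothetical records: (species, genus, family, is_desiccation_sensitive)
RECORDS = [
    ("Aa one", "Aa", "Famx", True),
    ("Aa two", "Aa", "Famx", True),
    ("Aa three", "Aa", "Famx", False),
    ("Bb one", "Bb", "Famx", False),
    ("Bb two", "Bb", "Famx", False),
    ("Cc one", "Cc", "Famx", True),
]


def genus_level_proportion(species, genus, records):
    """Proportion of congeners known to be desiccation-sensitive.

    The focal species itself is excluded. Returns None when no
    congener has a known response.
    """
    congeners = [ds for sp, g, _, ds in records if g == genus and sp != species]
    if not congeners:
        return None
    return sum(congeners) / len(congeners)


def family_level_proportion(genus, family, records):
    """Mean, across data-containing genera of the family, of the per-genus
    proportion of desiccation-sensitive species.

    The focal genus is excluded. Returns None when no other genus in the
    family has data.
    """
    by_genus = defaultdict(list)
    for _, g, f, ds in records:
        if f == family and g != genus:
            by_genus[g].append(ds)
    if not by_genus:
        return None
    per_genus = [sum(values) / len(values) for values in by_genus.values()]
    return sum(per_genus) / len(per_genus)
```

Averaging per-genus proportions, rather than pooling all species in the family, stops one large, well-studied genus from dominating the family-level value.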

Both the genus-level and the family-level model maintained good predictive success when key predictor variables were removed from the model (Table 1). For the genus-level model, even removing all variables apart from the proportion of a species’ congeners that produce desiccation-sensitive seeds only reduced the model’s ability to correctly predict the desiccation-sensitive species to 87 % (Table 1). For this model, it would appear that the inclusion of seed mass as a predictor may have slightly reduced the model’s predictive power, potentially through over-fitting. When this variable was removed but all others were included, mean predictive success for desiccation-sensitive test species increased from 89 % to 91 % (Table 1). However, the difference between these models is not significant, as there was considerable overlap in the 95 % confidence intervals for the model success rates. Removing predictor variables caused a larger reduction in predictive success for the family-level model: removing all variables except the mean proportion of desiccation-sensitive species across the genera in a species’ family reduced mean predictive success from 79 % to 73 % for desiccation-sensitive test species (Table 1). Removing seed mass alone did not change predictive success rates for this model (Table 1). In contrast, the predictor variables in the order-level model contributed more evenly than those in the genus-level and family-level models, which were dominated by the variable representing the proportion of a species’ relatives with desiccation-sensitive seeds (Figs 2–4). For the order-level model, therefore, removing predictor variables had a much larger effect on model performance than in the other two models (Table 1). For this model, seed mass was the most important predictor variable, yet may be the variable most likely to be unavailable for an unknown species. When this predictor variable, but no others, was removed, the model’s predictive performance decreased from 60 % to 51 % for the desiccation-sensitive species (Table 1). Removing additional variables caused the predictive success for the desiccation-sensitive species to decrease even further: to 18 % when only the climate variables and the variable representing the proportion of a species’ relatives with desiccation-sensitive seeds were included, and to 13 % when this latter variable alone was used (Table 1). For all models and predictor variable combinations, the predictive success for the desiccation-tolerant species was >98 % (Table 1).
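
The predictive success rates reported above are per-class rates: the share of desiccation-sensitive test species, and separately of desiccation-tolerant test species, that were classified correctly. The Python sketch below shows one way such rates could be computed from held-out predictions. The function name is hypothetical, and the 0.5 classification threshold is an assumption made here for illustration, chosen to match the probability cut-off used for the flora predictions.

```python
def class_success_rates(observed_sensitive, predicted_prob, threshold=0.5):
    """Return (success rate for sensitive species, success rate for tolerant species).

    observed_sensitive: iterable of bool, True if the species is known to be
        desiccation-sensitive.
    predicted_prob: iterable of float, predicted probability of sensitivity.
    threshold: assumed cut-off; a probability >= threshold counts as
        'predicted sensitive'.
    A rate is None when the corresponding class has no test species.
    """
    sens_total = sens_correct = tol_total = tol_correct = 0
    for is_sensitive, prob in zip(observed_sensitive, predicted_prob):
        predicted_sensitive = prob >= threshold
        if is_sensitive:
            sens_total += 1
            sens_correct += int(predicted_sensitive)
        else:
            tol_total += 1
            tol_correct += int(not predicted_sensitive)
    sens_rate = sens_correct / sens_total if sens_total else None
    tol_rate = tol_correct / tol_total if tol_total else None
    return sens_rate, tol_rate
```

Reporting the two classes separately matters because desiccation-sensitive species are rare: a model that labelled every species tolerant would score highly on overall accuracy while detecting no sensitive species at all.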

Using the workflow outlined in Fig. 1 and the R script provided as Supplementary Data, we were able to use a combination of the three BRT models to form predictions of the probabilities that the species in our example lists produce desiccation-sensitive seeds. For the Britain and Ireland species list we were able to match 1375 species names to accepted names in The Plant List (2016). Of these 1375 species, 1047 had known seed desiccation response trait values recorded in the SID. For the remaining species, sufficient data were available in the SID for us to use the genus-level model to predict the probability of a species producing desiccation-sensitive seeds for 252 species, whilst the family-level model was required for 75 species and the order-level model for the final species (Fig. 5A). For the 328 species predicted by the models, the datasets we drew on provided seed mass data for 58 % of the species, and temperature, rainfall and altitude data for 92 % of the species. Woodiness could be determined or imputed for 90 % of the species and dispersal mode for 62 % of the species. Collectively the models and data suggest that, of the 1375 British and Irish species for which we were able to compile known trait values or form predictions with the BRT models, 1.2 % had a probability of at least 0.5 of producing desiccation-sensitive seeds. Based on the data availability for the British and Irish flora and the results of our CV analyses for the different models and for the models with various predictor variables removed, we estimated that a desiccation-tolerant species in this species list had a 99.9 % chance of having its trait value correctly predicted, while a desiccation-sensitive species had a 96.7 % chance.
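
The choice between a known trait value and the genus-, family- or order-level model follows the closest taxonomic level at which relatives with known responses exist. The Python sketch below illustrates that selection rule only. It is not the Supplementary R script, and the dictionary layout, function name and toy taxa are hypothetical.

```python
# Hypothetical reference records standing in for species with known responses.
KNOWN = [
    {"species": "Aa one", "genus": "Aa", "family": "Famx", "order": "Ordx"},
    {"species": "Bb one", "genus": "Bb", "family": "Famy", "order": "Ordx"},
]


def choose_route(focal, known_records):
    """Return 'known', 'genus', 'family', 'order' or None for a focal species.

    'known' means the species itself has a recorded response, so no model is
    needed. Otherwise the closest level with at least one known relative is
    returned. None means no relative with data exists even at order level.
    """
    if any(rec["species"] == focal["species"] for rec in known_records):
        return "known"
    for level in ("genus", "family", "order"):
        if any(rec[level] == focal[level] for rec in known_records):
            return level
    return None
```

Checking the levels from genus upwards means that the model with the strongest taxonomic signal is always preferred when the data allow it.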

Fig. 5. Beanplots (Kampstra, 2008) showing the density (i.e. frequency distribution) of the probabilities of species producing desiccation-sensitive seeds for species with known trait values (species present in the SID; value 0 or 1) and species with predicted probabilities for (A) a list of plant names for Britain and Ireland (http://www.brc.ac.uk/biblio/plantatt-attributes-british-and-irish-plants-spreadsheet) and (B) a catalogue of the vascular plants of Ecuador (http://www.tropicos.org/Project/CE). Predictions were made using the genus-, family- or order-level boosted regression trees model, depending on whether the closest relatives to a species with known values of the desiccation-sensitive seed trait in the SID were related at the genus-, family- or order-level (see Fig. 1 for the workflow used to develop these predictions). The species beans represent known trait data rather than predicted values. See Table 1 for likely model success rates for the predicted values (genus, family and order beans).

For the catalogue of vascular plants of Ecuador, we were able to match 13 925 of the taxa to accepted species names in The Plant List (2016). In contrast to the Britain and Ireland list, only 735 of these species had known seed desiccation response trait values in the SID, whilst we were able to use the genus-level model to predict seed desiccation response trait values for 5885 species, the family-level model for 6956 species and the order-level model for the remaining 349 species (Fig. 5B). For the Ecuador species predicted by the models, the datasets we drew on provided temperature, rainfall and altitude data for 87 % of species and seed mass data for just 3.2 % of species. Woodiness was determined or imputed for 73 % of species and dispersal mode for 51 % of species. As expected, the Ecuador species list had a higher known and predicted incidence of species with desiccation-sensitive seeds compared with the Britain and Ireland list, with 10 % of all species most likely to produce seeds that are desiccation-sensitive (known trait values or predicted probability of at least 0.5). For the genus- and family-level model predictions, most predicted probabilities were around either 0 or 1; i.e. there was a high probability that a species produces desiccation-tolerant or desiccation-sensitive seeds, respectively. Although the majority of the order-level predictions also returned probability values around 0, indicating a high chance of desiccation-tolerant seeds, a comparatively high proportion of the predictions fell between 0.2 and 0.8 (21 % of predictions, compared with 8 % for the family-level model and 1.5 % for the genus-level model), with only 0.6 % of the predicted probabilities being >0.9 (compared with 9 % for the genus-level model and 5 % for the family-level model). 
Because the data availability for the Ecuador flora was lower than that for the British and Irish flora, our estimated likely success rates per species were lower than those for the British and Irish flora, especially for any potentially desiccation-sensitive species on the list. We estimated that a desiccation-tolerant species in the Ecuador species list had a 99.7 % chance of having its trait value correctly predicted, while a desiccation-sensitive species only had an 80.8 % chance on average. Thus, our overall predicted incidence of desiccation sensitivity within the Ecuador flora (10 % of species) may slightly underestimate the true figure, while the value for the British and Irish flora (1.2 %) should be more accurate.
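
One simple way to see why a flora's expected success rate depends on how its species are spread across the prediction routes is a weighted mean of per-route success rates. The Python sketch below is a deliberate simplification and is not presented as the calculation used in this study. It assumes that species with known trait values count as always correct, that desiccation-sensitive species are spread across routes in the same proportions as all species, and it ignores the effect of missing predictor variables, which the estimates above do take into account. The function name and dictionary keys are hypothetical.

```python
def expected_success(counts, success_rates):
    """Weighted mean success rate across prediction routes.

    counts: dict mapping route -> number of species handled by that route.
    success_rates: dict mapping route -> probability (0-1) that a species
        handled by that route is classified correctly.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no species to average over")
    return sum(counts[route] * success_rates[route] for route in counts) / total


# Route counts for the Ecuador list, taken from the text above.
ECUADOR_COUNTS = {"known": 735, "genus": 5885, "family": 6956, "order": 349}

# Headline full-model success rates for desiccation-sensitive species;
# the 1.0 for 'known' species is an assumption of this sketch.
SENSITIVE_RATES = {"known": 1.0, "genus": 0.89, "family": 0.79, "order": 0.60}

ecuador_estimate = expected_success(ECUADOR_COUNTS, SENSITIVE_RATES)
```

Because this sketch uses the full-model rates, its result should be read as illustrative only, and it need not agree with the 80.8 % estimate reported for Ecuador, which reflects the predictor data actually available for each species.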

DISCUSSION

Correlates of seed desiccation response

The most important predictor variable in our genus-level and family-level models, and the second-most important variable in the order-level model, was the variable representing the proportion of a species’ relatives with desiccation-sensitive seeds. This finding contradicts previous reviews stating that there is no taxonomic basis for the desiccation-sensitive seed trait (e.g. Berjak and Pammenter, 2001), and suggests that seed responses to desiccation are conserved at the lower taxonomic levels, as suggested by the analyses of Wyse and Dickie (2017). However, it is also evident that the strength of the association between the trait and taxonomy is reduced at higher taxonomic levels, supporting the suggestion by multiple authors that the trait is one that has arisen multiple times in numerous plant lineages (Dickie and Pritchard, 2002; Berjak and Pammenter, 2008). Both Dickie and Pritchard (2002) and Berjak and Pammenter (2008) suggest that it may only require a single gene loss for seed desiccation sensitivity to arise from desiccation tolerance, although the genetic mechanisms underlying seed desiccation tolerance are not well understood and, likewise, the ease of desiccation sensitivity reverting to desiccation tolerance is not known. Our models also support previous work showing that seeds with greater mass were more likely to be desiccation-sensitive (Hong and Ellis, 1996; Daws et al., 2005, 2006; Ellis et al., 2007; Lima et al., 2014), and likewise that species from warmer, wetter habitats are more likely to produce desiccation-sensitive seeds (Tweddle et al., 2003; Wyse and Dickie, 2017). However, given the relationships previously described between seed mass, temperature, precipitation and net primary productivity (Moles et al., 2005), it is difficult to establish causation and there may be multiple variables acting in concert.

Consistent with previous work (Dickie and Pritchard, 2002; Tweddle et al., 2003; Daws et al., 2005, 2006), seed mass was an important predictor in all three models, especially the order-level model, whereby larger seeds were more likely to be desiccation-sensitive. Although there is likely a strong relationship between seed mass and taxonomy (Daws et al., 2005), the high importance of both seed mass and the variable representing the proportion of a species’ relatives with desiccation-sensitive seeds suggests that, within a taxon, seed mass is likely to vary predictably between species with desiccation-sensitive compared with desiccation-tolerant seeds. This result is supported by the physiological model of Daws et al. (2006), which, using seed mass and seed coat ratio, was able to successfully predict the different seed desiccation responses of species of Acer. The association between large seeds and desiccation sensitivity has been explained as a result of the influence of seed size on the rate of water loss, as smaller seeds will lose water more rapidly than larger seeds (Daws et al., 2005). However, Tweddle et al. (2003) also postulate that the relationship may not be causal, but might simply result from the ecological conditions that favour rapidly germinating desiccation-sensitive seeds (i.e. mature-phase moist forest) also selecting for large seeds.

Our model results with respect to the relationship between climate variables and the likelihood of seed desiccation sensitivity were consistent with those of Tweddle et al. (2003) and Wyse and Dickie (2017), supporting the finding that species from tropical moist forests are those most likely to produce desiccation-sensitive seeds. The correlation between the incidence of seed desiccation sensitivity and warm, wet habitats has been explained in detail previously (e.g. Daws et al., 2006; Berjak and Pammenter, 2008): in conditions of low abiotic stress, biotic stresses can be of greater adaptive significance and desiccation-sensitive seeds and their associated rapid germination (Daws et al., 2005) may afford species higher chances of regenerative success where seedling-stage competition, as well as risks of seed predation and fungal attack, is high. However, despite the strong patterns in the incidence of seed desiccation sensitivity that have been found amongst habitat types, habitat variables such as precipitation were not as strong predictors in our models as the species-specific variables of seed mass and the incidence of desiccation-sensitive seeds amongst a species’ relatives. Given the strength of the relationship between seed mass and seed desiccation response (Daws et al., 2006), the high importance of seed mass in our models was expected, and likewise it could be expected that this seed-specific trait should be a more important predictor variable than the more general habitat variables. The finding that the incidence of seed desiccation sensitivity amongst a species’ relatives was a stronger predictor than the climate variables implies either that a species’ evolutionary history may be more important than its habitat when predicting its seed desiccation response, or that the climate variables we include do not fully capture the abiotic dimensions of a species’ niche. 
Most likely, both explanations are partially correct, and it is almost certain that there are important abiotic variables that we have not included here. Although the climate variables we used capture the overarching dimensions of a species’ environment, potentially important aspects of a species’ microclimate are not included in our models, such as soil conditions (including propensity to drought or water-logging), aspect and exposure, and also the successional stage at which a species occurs. Additionally, precipitation at the time of seed dispersal may be more important than the precipitation measures included here, yet phenological data are likely to be unavailable for many unstudied species and so we intentionally did not include this variable in the models.

A considerable majority of the species known to produce desiccation-sensitive seeds are woody plants (Wyse and Dickie, 2017). Within our dataset, we were able to compile woodiness data for 15 086 species, 56 % of which were herbaceous. Of these herbaceous species, just 0.35 % produced desiccation-sensitive seeds, compared with 9.5 % of the known woody species. To explain the rarity of herbaceous species with desiccation-sensitive seeds, Tweddle et al. (2003) suggested that seed desiccation sensitivity may be incompatible with the life history strategies of annual and biennial species, and indeed all the herbaceous species in our dataset with desiccation-sensitive seeds were perennial geophytes. However, the seed desiccation responses of perennial herbaceous species have received very little research attention, and there are few available data for many taxonomic groups, such as the commercially important Zingiberaceae. Despite the apparent pattern between woodiness and seed desiccation response in our data, woodiness was the weakest predictor variable in our genus- and family-level models, contributing very little information beyond that provided by the other predictors (Figs 2 and 3). This is likely due to the strong relationship between taxonomy and woodiness, which is especially pronounced at the genus level, with most genera monomorphic for the trait (Fitzjohn et al., 2014). Unlike in the genus- and family-level models, woodiness had a relative importance of 7 % in the order-level model, where there was a greater likelihood of a woody species producing desiccation-sensitive seeds than herbaceous or variable species, after accounting for the other predictor variables. The increased importance of woodiness as a predictor variable in this higher-level model is expected, given that, like the seed desiccation response, the woodiness trait is less conserved at higher taxonomic levels, as determined from the data of Zanne et al. (2013, 2014).

Finally, Tweddle et al. (2003) suggest that dispersal method should be considered when investigating the ecology of seed desiccation response, particularly when considering species in more seasonal environments. However, to our knowledge ours is the first study to explore potential relationships between seed desiccation response and dispersal method. Dispersal mode was not a strong predictor of seed desiccation response, especially for the genus-level and family-level models, although this may relate in part to the limited availability of data for this trait. For the order-level model, dispersal method did have a minor contribution to the model (relative importance 5 %; Fig. 4), whereby a seed had a higher probability of being desiccation-sensitive if it was animal- or water-dispersed and a lower probability if it was wind-dispersed, after accounting for the other variables in the model. When considering the incidence of seed desiccation sensitivity in our dataset among dispersal methods, there were higher frequencies of desiccation-sensitive seeds among the animal-dispersed (9.2 % of species) and water-dispersed (6.0 % of species) species than in the dataset as a whole (3.8 % of species), and likewise a lower frequency of desiccation-sensitive seeds among those that were wind-dispersed (1.7 %). Desiccation-sensitive seeds must avoid fatal water loss both during and after dispersal, and it follows that the likelihood of a seed drying during dispersal would differ among dispersal methods. Wind dispersal may be associated with a higher propensity for seed water loss than either animal or water dispersal. Additionally, given the comparatively high seed mass of desiccation-sensitive seeds observed here and in previous studies (e.g. Daws et al., 2005), there may also be developmental constraints that render the seed physiology associated with wind dispersal less compatible with desiccation sensitivity, or vice versa, in some lineages. However, despite these broad patterns, 15 % of the species with desiccation-sensitive seeds in our dataset (with known dispersal method) were wind-dispersed, most notably members of the Dipterocarpaceae.
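
The group-wise incidences quoted above (for example, the percentage of desiccation-sensitive species among animal-, water- and wind-dispersed species) are simple per-category proportions. A minimal Python sketch of that tabulation follows. The function name and input layout are hypothetical, and no values from our dataset are reproduced.

```python
from collections import Counter


def incidence_by_group(records):
    """Percentage of desiccation-sensitive species within each category.

    records: iterable of (category, is_sensitive) pairs, e.g. a dispersal
        mode or woodiness class paired with a bool.
    Returns a dict mapping category -> percentage (0-100).
    """
    totals = Counter()
    sensitive = Counter()
    for category, is_sensitive in records:
        totals[category] += 1
        if is_sensitive:
            sensitive[category] += 1
    return {cat: 100.0 * sensitive[cat] / totals[cat] for cat in totals}
```

These raw incidences differ from the partial response curves in Fig. 4, which show the effect of a category after accounting for the other predictors.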

Utility of the models

Our models provide the ability to use an extensive existing dataset to predict the probability that an unknown species will produce desiccation-sensitive seeds. The models can be used to provide an informative tool for conservation managers, especially those working in tropical habitats, where the incidence of seed desiccation sensitivity is highest. Our intent here is for the models to provide decision-making support prior to seed collection, although they may also help guide restoration efforts.

Our cross-validation results indicated that the three BRT models all had good success at predicting seed desiccation response, especially the genus- and family-level models; however, success rates for the order-level model were strongly dependent on the availability of predictor variables. One advantage of the BRT technique is its continued utility when some predictors are missing (Elith et al., 2008); however, the cross-validation results suggest that, despite predictions being possible in the absence of numerous predictor variables, in the case of the order-level model they are not necessarily always advisable. More specifically, our findings show that if seed desiccation response data are only available for members of the same order, rather than the same genus or family as the unknown species of interest, habitat and trait data must be included in the model for reliable predictions to be made. The cross-validation analyses we present can provide an idea of likely predictive success and should be considered before predictions are made, and certainly when interpreting model results. These analyses also highlight the importance of further data acquisition; increasing the extent and completeness of the data on which the BRT models are based should improve their predictive power, and it is clear that increasing the coverage of species for which the seed desiccation response is known will considerably improve prediction accuracy by allowing proportional increases in the use of the genus- or family-level models and a decrease in the use of the less powerful order-level model. This highlights the important role of global resources such as the TRY, SID and GBIF databases for conservation management, and the need for continued and increased data-sharing through such resources.

The predictions of seed desiccation response we made here for the two contrasting example seed plant floras, that of Britain and Ireland and that of Ecuador, indicate the increased certainty in classifying the likely seed desiccation response of an unknown species that can be associated with a better-studied flora. For the Britain and Ireland flora, we were able to use the genus-level model to predict trait values for 77 % of the unknown species, and the family-level model for 23 % of species. In contrast, the family-level model was used most commonly to predict trait values of the Ecuador flora (53 % of species), whilst the order-level model was required for 3 % of the species (Fig. 5). It is evident that use of the order-level model results in higher numbers of predictions around 0.5, where the classification of a species as either most likely desiccation-sensitive or desiccation-tolerant is less certain. As a result, the likely prediction accuracy was much higher for the Britain and Ireland flora, especially in terms of the ability to detect potential desiccation-sensitive species. This is not only a facet of the relative proportions in which the different models were used for the two floras, but is also due to the availability of trait and habitat predictor variables for the species. Despite these likely differences in predictive accuracy between the two species lists, the clear disparity in the incidence of seed desiccation sensitivity between the two regions (1.2 % of the Britain and Ireland flora, 10 % of the Ecuador flora) was as expected given the habitat types in the regions (Olson et al., 2001; Wyse and Dickie, 2017).

Finally, these BRT models are not intended to replace the existing physiological model developed by Daws et al. (2006); rather these two models could be used in conjunction to predict the likely seed desiccation response before (our models) and after (Daws et al., 2006) seed collection. By using traits measured directly from a seed of interest, the Daws et al. (2006) model remains the most appropriate when measurements of seed mass and, crucially, the seed coat ratio are able to be undertaken. However, both types of models had similar predictive success rates: when using the most appropriate model per species based on data availability for related taxa, our BRT models correctly predicted seed desiccation response for 99.7 % of the desiccation-tolerant species and 85 % of the desiccation-sensitive species (determined through cross-validation), while the Daws et al. (2006) model was successful for 89 % of the desiccation-tolerant and 79 % of the desiccation-sensitive species on which their model was based. The use of these two contrasting approaches in combination, where possible, may provide the most robust predictions of seed desiccation response, especially in situations where the predicted probabilities returned by one model type are around 0.5.

Conclusions

Knowledge of a species’ likely seed desiccation response is vital for decision-making support for in and ex situ plant conservation efforts, especially in the world’s most biodiverse habitats where the incidence of seed desiccation sensitivity is also at its highest. To this end, we have presented a methodological approach and set of models that attempt to integrate existing available data to develop predictions of seed desiccation response for seed plant species globally. These models and methods are intended for use prior to seed collection, and are able to provide robust predictions of species’ likely seed desiccation trait values. The nature of this work also highlights the considerable value of global databases such as the TRY database, the Royal Botanic Gardens, Kew’s SID and the GBIF. It is evident that the principal means of improving these models and their predictive success is through the generation of new data, as well as further compilation and sharing of existing data.

SUPPLEMENTARY DATA

Supplementary data are available online at www.aob.oxfordjournals.org and consist of the following: R script data.

ACKNOWLEDGEMENTS

The Royal Botanic Gardens (RBG), Kew, is part funded by Grant in Aid from the UK Department for Environment, Food and Rural Affairs. We would like to thank our colleagues in the Conservation Science department at RBG Kew for their extensive testing of the model and the valuable feedback they provided; especially Kate Hardwick, Tim Pearce and Michael Way. We would also like to thank Udayangani Liu for database support. We are grateful to Richard Ellis, Alberto Teixido and an anonymous reviewer for their constructive comments on an earlier version of the manuscript.

LITERATURE CITED

Berjak P, Pammenter NW. 2001. Seed recalcitrance – current perspectives. South African Journal of Botany 67:79–89. [Google Scholar]
Berjak P, Pammenter NW. 2008. From Avicennia to Zizania: seed recalcitrance in perspective. Annals of Botany 101:213–228. [DOI] [PMC free article] [PubMed] [Google Scholar]
Buston PM, Elith J. 2011. Determinants of reproductive success in dominant pairs of clownfish: a boosted regression tree analysis. Journal of Animal Ecology 80:528–538. [DOI] [PubMed] [Google Scholar]
Carboni M, Munkemuller T, Lavergne S et al. 2016. What it takes to invade grassland ecosystems: traits, introduction history and filtering processes. Ecology Letters 19:219–229. [DOI] [PMC free article] [PubMed] [Google Scholar]
CBD 2012. Global strategy for plant conservation: 2011–2020. Richmond, UK: Botanic Gardens Conservation International. [Google Scholar]
Cole RJ, Holl KD, Keene CL, Zahawi RA. 2011. Direct seeding of late-successional trees to restore tropical montane forest. Forest Ecology and Management 261:1590–1597. [Google Scholar]
Daws MI, Garwood NC, Pritchard HW. 2005. Traits of recalcitrant seeds in a semi-deciduous tropical forest in Panama: some ecological implications. Functional Ecology 19:874–885. [Google Scholar]
Daws MI, Garwood NC, Pritchard HW. 2006. Prediction of desiccation sensitivity in seeds of woody species: a probabilistic model based on two seed traits and 104 species. Annals of Botany 97:667–674. [DOI] [PMC free article] [PubMed] [Google Scholar]
De’ath G. 2007. Boosted trees for ecological modelling and prediction. Ecology 88:243–251. [DOI] [PubMed] [Google Scholar]
Díaz S, Kattge J, Cornelissen JHC et al. 2016. The global spectrum of plant form and function. Nature 529:167–171. [DOI] [PubMed] [Google Scholar]
Dickie JB, Pritchard HW. 2002. Systematic and evolutionary aspects of desiccation tolerance in seeds. In: Black M, Pritchard HW eds. Desiccation and survival in plants: drying without dying. Wallingford, UK: CAB International. [Google Scholar]
Elith J, Leathwick JR, Hastie T. 2008. A working guide to boosted regression trees. Journal of Animal Ecology 77:802–813. [DOI] [PubMed] [Google Scholar]
Ellis RH, Hong TD, Roberts EH. 1990. An intermediate category of seed storage behaviour? 1. Coffee. Journal of Experimental Botany 41:1167–1174. [Google Scholar]
Ellis RH, Mai-Hong T, Hong TD et al. 2007. Comparative analysis by protocol and key of seed storage behaviour of sixty Vietnamese tree species. Seed Science and Technology 35:460–476. [Google Scholar]
van Ewijk KY, Randin CF, Treitz PM, Scott NA. 2014. Predicting fine-scale tree species abundance patterns using biotic variables derived from LiDAR and high spatial resolution imagery. Remote Sensing of Environment 150:120–131. [Google Scholar]
Fitzjohn RG, Pennell MW, Zanne AE, Stevens PF, Tank DC, Cornwell WK. 2014. How much of the world is woody?Journal of Ecology 102:1266–1272. [Google Scholar]
Friedman JH. 2001. Greedy function approximation: a gradient boosting machine. Annals of Statistics 29:1189–1232. [Google Scholar]
Hijmans RJ, Cameron SE, Parra JL, Jones PG, Jarvis A. 2005. Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology 25:1965–1978. [Google Scholar]
Hijmans RJ, Phillips S, Leathwick J, Elith J. 2015. dismo: species distribution modeling. R package version 1.0–12. http://CRAN.R-project.org/package=dismo. [Google Scholar]
Hong TD, Ellis RH. 1996. Ex situ biodiversity conservation by seed storage: multiple-criteria keys to estimate seed storage behaviour. Seed Science and Technology 25:157–161. [Google Scholar]
IPCC 2014. Climate change 2014: synthesis report. Contribution of working groups I, II and III to the fifth assessment report of the Intergovernmental Panel on Climate Change. Geneva: IPCC. [Google Scholar]
Joët T, Ourcival J-M, Capelli M, Dussert S, Morin X. 2016. Explanatory ecological factors for the persistence of desiccation-sensitive seeds in transient soil seed banks: Quercus ilex as a case study. Annals of Botany 117:165–176. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kampstra P. 2008. Beanplot: a boxplot alternative for visual comparison of distributions. Journal of Statistical Software 28 Code Snippet 1:1–9.27774042 [Google Scholar]
Kattge J, Diaz S, Lavorel S et al. 2011. TRY—a global database of plant traits. Global Change Biology 17:2905–2935. [Google Scholar]
Larson JE, Funk JL. 2016. Regeneration: an overlooked aspect of trait-based plant community assembly models. Journal of Ecology 104:1284–1298. [Google Scholar]
Lima MdJ, Hong TD, Arruda YMBC, Mendes AMS, Ellis RH. 2014. Classification of seed storage behaviour of 67 Amazonian tree species. Seed Science and Technology 42:363–392. [Google Scholar]
Moles AT, Ackerly DD, Webb CO et al. 2005. Factors that shape seed mass evolution. Proceedings of the National Academy of Sciences of the USA 102:10540–10544. [DOI] [PMC free article] [PubMed] [Google Scholar]
Murdoch AJ. 2014. Seed dormancy. In: Gallagher RS. ed. Seeds: the ecology of regeneration in plant communities, 3rd edn UK: CAB International. [Google Scholar]
Olson DM, Dinerstein E, Wikramanayake ED et al. 2001. Terrestrial ecoregions of the world: a new map of life on Earth. Bioscience 51:933–938.
R Core Team 2015. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; https://www.R-project.org/.
Ridgeway G, contributions from others 2015. gbm: Generalized Boosted Regression Models. R package version 2.1.1. http://CRAN.R-project.org/package=gbm.
Roberts EH. 1973. Predicting the storage life of seeds. Seed Science and Technology 1:499–514.
Royal Botanic Gardens, Kew 2016. Seed Information Database (SID), Version 7.1. London: Royal Botanic Gardens, Kew; http://data.kew.org/sid/.
Schrodt F, Kattge J, Hanhuai S et al. 2015. BHPMF – a hierarchical Bayesian approach to gap-filling and trait prediction for macroecology and functional biogeography. Global Ecology and Biogeography 24:1510–1521.
Stevens PF. 2013. Angiosperm phylogeny website, Version 13, September 2013. http://www.mobot.org/MOBOT/research/APweb/.
The Plant List 2016. Version 1.1. http://www.theplantlist.org/ (19 January 2016).
Tweddle JC, Dickie JB, Baskin CC, Baskin JM. 2003. Ecological aspects of seed desiccation sensitivity. Journal of Ecology 91:294–304.
Walck JL, Hidayati S, Dixon KW, Thompson K, Poschlod P. 2011. Climate change and plant regeneration from seed. Global Change Biology 17:2145–2161.
Walters C. 2015. Orthodoxy, recalcitrance and in-between: describing variation in seed storage characteristics using threshold responses to water loss. Planta 242:397–406.
Wyse SV, Dickie JB. 2017. Predicting the global incidence of seed desiccation sensitivity. Journal of Ecology 105:1082–1093.
Zanne AE, Tank DC, Cornwell WK et al. 2013. Data from: Three keys to the radiation of angiosperms into freezing environments. Dryad Digital Repository. http://dx.doi.org/10.5061/dryad.63q27.2.
Zanne AE, Tank DC, Cornwell WK et al. 2014. Three keys to the radiation of angiosperms into freezing environments. Nature 506:89–92.
