Abstract
The earliest widely accepted presence of humans in America dates to approximately 17.5 cal kyr BP, at the end of the Last Glacial Maximum (LGM). Among other evidence, this presence is attested by stone tools and associated cut-marks and other bone surface modifications (BSM), interpreted as the result of the consumption of animals by humans. Claims of an older human presence in the continent have been made based on the proposed anthropogenic modification of faunal remains; however, these have been controversial due to the highly subjective nature of the interpretations. Here, we employ advanced deep learning algorithms to objectively increase the accuracy of BSM identification on bones. With several models that exhibit BSM classification accuracies greater than 94%, we use ensemble learning techniques to robustly classify a selected sample of BSM from the approximately 30 kyr BP site of Arroyo del Vizcaíno, Uruguay. Our results confidently show the presence of cut-marks imparted by stone tools on bones at the site. This result supports an earlier presence of humans in the American continent, expanding additional genetic and archaeological evidence of a human LGM and pre-LGM presence in the continent.
Keywords: artificial intelligence, archaeology, megafauna, Pleistocene, Xenarthra, human arrival in America
1. Introduction
The earliest presence of humans in America has long been controversial, with a widely shared interpretation that no convincing evidence existed prior to 17.5 kyr BP, before the end of the Last Glacial Maximum (LGM) [1–4]. A discovery made a few years ago, consisting of the association of broken stones with some remains of a proboscidean [5,6], could potentially place the oldest evidence of humans in this continent at 130 kyr BP; however, despite the analysis of bone micro-residues and the evidence of use-wear and impact marks, it is still a matter of controversy. In addition, an increasing number of sites vie for a human presence in the continent older than the widely accepted one. Sites such as Gault [7], Meadowcroft Rockshelter [8], Cactus Hill [9], Miles Point [10] or Bluefish Caves [11] in North America and Monte Verde [12] or Santa Elina [13] in South America provide important cultural evidence of human presence during or immediately after the LGM. Furthermore, the recent discovery of thousands of artefacts and bones in Chiquihuite Cave, Mexico, dated to 26.5–19 kyr BP, constitutes, together with some of the Brazilian sites in Serra da Capivara, the best evidence for a human presence during the LGM and even earlier in America. This early occupation of the continent may be even older, as suggested by Bayesian age modelling of the oldest archaeological record and by genetic evidence [14,15]. Adding to this, compelling evidence of a pre-LGM human presence in America could potentially be found at Arroyo del Vizcaíno [16–19] (hereafter, AdV) in southern Uruguay (34°37′2.92″ S, 56°2′32.54″ W) (figure 1).
Figure 1.
AdV site. (a) Panoramic view of the bonebed in its westernmost area; the grid used to reference the elements measures 1 m. (b) Close-up of the fossils in situ. (c) Close-up of the partially excavated area. Photographs by Martín Batallés. (d) Stratigraphic sequence of the site showing the provenance of the fossils and the radiocarbon dates obtained. (Online version in colour.)
This site, located at the bottom of a stream, has yielded a collection of over 2000 remains of South American Pleistocene megafauna (mostly belonging to the giant sloth Lestodon armatus), while many thousands are estimated yet to be extracted [16]. The fossils are densely packed in an approximately 70 cm thick level that fines upwards from a muddy sandy gravel (bed 2a) to a muddy sand (bed 2b) [9] (figure 1; electronic supplementary material). A total of 12 radiocarbon dates, which cluster at around 30 14C kyr BP [16,17] (over 33 cal kyr BP), have been obtained from purified and non-purified collagen as well as from wood in different laboratories and at different times. The potential existence of cut-marks on the AdV bones would support an earlier human presence in America, just before the LGM [18]. Indeed, the dates obtained and the possible human modifications are from the very same objects (i.e. the same marked bones), thus suggesting a very strong association between the agency and the dates [16,17]. Furthermore, the presence of some lithic elements found with the fossil remains would suggest a direct functional association. This evidence includes flakes made of silicified sandstone and a small piece of translucent silcrete identified as a scraper with micropolish consistent with usage on dry hide [16]. For a summary of the findings at the site and its dating, see electronic supplementary material.
In turn, this association emphasizes the importance of correctly identifying the bone surface modification (BSM) that would support such an early human presence in the American continent. In fact, cut-marks in the fossil record have long been used for the identification of specific butchering behaviours by early Pleistocene hominins and their carcass acquisition strategies in the earliest stages of the evolution of Homo [20]. Claims have been made about the Pliocene antiquity of carcass processing using purported cut-marked bones in Dikika (Ethiopia) [21] or in the Plio-Pleistocene boundary in Quranwala (India) [22]. The claims made based on these discoveries remain controversial [23], as the purported cut-marked fossils from both sites also bear clear evidence of trampling or sedimentary abrasion [24]. Similarly, the AdV site also contains abundant conspicuous evidence of trampling and/or sedimentary abrasion [16].
To overcome the inherent subjectivity with which BSM have traditionally been interpreted [25], here we will apply artificial intelligence methods, which objectively retrieve information from images and classify them through deep learning (DL) techniques [26]. This is done through the use of convolutional neural networks (CNN). CNN are a series of multi-layer perceptrons that transform graphic information with the aid of sequential or parallel layers of weights and activation methods. Each layer focuses on specific image features and the bottom of the network integrates all the information producing a deep multivariate understanding of all the features that identifies any specific object.
CNNs have successfully shown that their accuracy can be substantially higher than that of human experts when identifying testing image datasets of BSM (91% versus 63%) [27]. CNN have also shown great resolution at classifying cut-marks made on fleshed versus those made on defleshed bones, regardless of the types of tools used [28]. Cut-marks, like other taphonomic entities, are subjected to morphological evolution through a palimpsest of multiple-agent processes and an interplay between stasis and change [29–31]. In this regard, CNN have successfully been used to detect morphing of the properties of cut-marks when exposed to biostratinomic abrasive processes [32]. In these experiments, cut-marks have been differentiated from trampling marks even when subjected to a high degree of modification by trampling itself, with more than 80% of morphed cut-marks correctly identified. In a training set involving several hundreds of BSM, including cut-marks, tooth-marks and trampling marks, CNN have yielded accuracies as high as 92% of correct classification on the testing sets [33].
This high accuracy of CNN in discriminating cut-marks from trampling marks (and other BSM), when using small [27] and large [28] training datasets, constitutes the best objective tool available for identifying cut-marks in the archaeological record. Given that some of the claims for the earliest presence of humans in the Americas are based partially or mostly on the interpretation of BSM on bones [5,11,18], here we will apply this DL method to the ichnological evidence from the approximately 30 kyr BP AdV site, where a sample of the bone assemblage exhibits both trampling marks and purported cut-marks. Here, we will test this assumption and, by extension, the evidence of human manipulation of bones at the site. If confirmed, this would add more data to the position of a pre-LGM human presence in the continent and would require more in-depth studies in similar contexts.
2. Classification of cut-marks
All the models applied to the experimentally controlled sample achieved an accuracy in the classification of testing datasets ranging between 94.3 and 97.9% (table 1). The best model to classify the experimental BSMs was InceptionV3, with an accuracy of 96.3% and a loss of 0.14 on the testing set, and a balanced accuracy of 83% of all BSM types. Such a high accuracy and low loss made this CNN model the best objective classifier of BSMs currently available. For this reason, we used it here for classifying the AdV marks and as a reference for the ensemble learning classification. The models presented here provided higher accuracy than the previous version of VGG16, which was earlier the most successful model used for the classification of BSMs [33]. In these models, tooth-marks and cut-marks were very accurately classified. The greater misclassification impacted trampling marks, several of which were classified preferentially as tooth-marks by all the models. This means that (i) when misclassified, the models are highly unlikely to classify trampling marks as cut-marks; (ii) when classified as trampling mark, the models are highly confident in that classification; and (iii) it is very unlikely that the models classify a cut-mark as any other type of mark.
Table 1.
Accuracy, loss and F1-scores on the testing sets of the five models used for DL and ensemble learning classification.
| model | accuracy | loss | F1-score |
|---|---|---|---|
| VGG16 | 0.963 | 0.142 | 0.75 |
| ResNet50 | 0.979 | 0.18 | 0.70 |
| Densenet201 | 0.943 | 0.169 | 0.73 |
| Jason2 | 0.985 | 0.2 | 0.61 |
| InceptionV3 | 0.963 | 0.146 | 0.83 |
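The difference between overall accuracy and class-balanced measures matters here because the experimental sample is dominated by cut-marks. The following minimal sketch, which is not the study's evaluation code, computes accuracy, balanced accuracy (mean per-class recall) and per-class F1 from a confusion matrix. The confusion matrix is hypothetical and its counts are invented purely for illustration.

```python
# Hypothetical confusion matrix (rows = true class, columns = predicted class).
# The counts are invented for illustration; they are NOT the study's results.
CLASSES = ["cut-mark", "tooth-mark", "trampling"]
CONFUSION = [
    [145, 1, 0],   # true cut-marks
    [1, 30, 1],    # true tooth-marks
    [0, 7, 12],    # true trampling marks (mostly confused with tooth-marks)
]


def accuracy(cm):
    """Fraction of all marks placed on the diagonal (correctly classified)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_recall(cm):
    """Recall of each class: correct predictions divided by true members."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


def balanced_accuracy(cm):
    """Unweighted mean of the per-class recalls."""
    recalls = per_class_recall(cm)
    return sum(recalls) / len(recalls)


def f1_scores(cm):
    """Per-class F1 = 2TP / (2TP + FP + FN)."""
    n = len(cm)
    scores = []
    for i in range(n):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(n)) - tp
        fn = sum(cm[i]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


if __name__ == "__main__":
    print("accuracy:", round(accuracy(CONFUSION), 3))
    print("balanced accuracy:", round(balanced_accuracy(CONFUSION), 3))
    for name, f1 in zip(CLASSES, f1_scores(CONFUSION)):
        print(name, "F1:", round(f1, 3))
```

With an imbalanced sample of this kind, a model can reach a high overall accuracy while its balanced accuracy is noticeably lower, because errors concentrate in the smallest class.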
Among 20 marks from the AdV site (figure 2), human experts previously interpreted seven marks as trampling and thirteen marks as cut-marks [16,18] (figure 2; electronic supplementary material, figures S1–S4). The CNN classifier disagrees with several of the experts' decisions. The ensemble analysis identified eight marks as cut-marks. The InceptionV3 model, with the highest balanced accuracy, also identified the same eight marks as cut-marks without any ambiguity (four of them with a probability greater than 90% and the other two marks with probability greater than 75%). The InceptionV3 and the ensemble analyses also mostly coincided in the classification of the remaining BSMs. Some of the marks reported by the excavators as trampling were classified by the machine as tooth-marks (supported by both DL methods). This does not necessarily mean that they are tooth-marks, since they also look very similar to biochemical marks caused by bioerosion; however, the machine has not yet been trained to identify bioerosion, and the closest mark type with which it associates these BSMs is tooth-marks (as human experts would do). The Grad-CAM algorithm shows that the machine focuses on the internal microstriation features and the shoulder of the grooves to classify the marks correctly (figure 3 and table 2).
Figure 2.
Lestodon ribs (a) CAV 451 and (b) CAV 452 with cut-marks. The insets show the marks under a magnification of 30 x. The scale bars measure 5 cm. Photographs by Martín Batallés. (Online version in colour.)
Figure 3.

Heat-map made with the Grad-CAM algorithm showing the most discriminating features that the InceptionV3 model is using to classify marks with accuracy. Notice how the internal microstriations and associated features (a) as well as the shoulder modifications (b) are used to classify Old-451-2 and Old-451-3 as cut-marks with high probability (table 2). (Online version in colour.)
Table 2.
Classification of the selected BSM from Arroyo del Vizcaíno according to ensemble learning and to deep learning (using InceptionV3). Probabilities of classification of the latter according to BSM are displayed.
| marks | ensemble learning classification | InceptionV3 tooth-mark | InceptionV3 cut-mark | InceptionV3 trampling | InceptionV3 classification |
|---|---|---|---|---|---|
| CAV 453-1 | trampling | 0.061 | 0.021 | 0.917 | trampling |
| CAV 453-2 | trampling | 0.05 | 0.002 | 0.947 | trampling |
| CAV 457 | trampling | 0.453 | 0.056 | 0.489 | trampling |
| CAV 459 | cut-mark | 0.155 | 0.532 | 0.311 | cut-mark |
| CAV 489 | trampling | 0.135 | 0.257 | 0.607 | trampling |
| CAV 1417 | trampling | 0.129 | 0.165 | 0.705 | trampling |
| Old 451-1ᵃ | cut-mark | 0.002 | 0.974 | 0.023 | cut-mark |
| Old 451-2ᵃ | cut-mark | 6.14 × 10−5 | 0.999 | 2.90 × 10−6 | cut-mark |
| Old 451-3ᵃ | cut-mark | 0.009 | 0.987 | 0.003 | cut-mark |
| CAV 451-1 | cut-mark | 0.207 | 0.782 | 0.009 | cut-mark |
| CAV 451-2 | cut-mark | 6.10 × 10−6 | 0.999 | 1.68 × 10−7 | cut-mark |
| CAV 451-3 | cut-mark | 0.077 | 0.762 | 0.016 | cut-mark |
| CAV 457-1 | cut-mark | 0.279 | 0.681 | 0.102 | cut-mark |
| CAV 452-1 | cut-mark | 0.09 | 0.904 | 0.005 | cut-mark |
| CAV 452-2 | cut-mark | 0.201 | 0.778 | 0.021 | cut-mark |
| CAV 452-3 | cut-mark | 0.173 | 0.613 | 0.213 | cut-mark |
| T1 | trampling | 0.184 | 0.191 | 0.624 | trampling |
| T2 | tooth-mark | 0.445 | 0.257 | 0.297 | tooth-mark |
| T3 | trampling | 0.254 | 0.102 | 0.642 | trampling |
| T4 | trampling | 0.158 | 0.407 | 0.434 | trampling |
| T5 | tooth-mark | 0.127 | 0.021 | 0.851 | trampling |
| T6 | tooth-mark | 0.37 | 0.321 | 0.308 | tooth-mark |
| T7 | tooth-mark | 0.401 | 0.054 | 0.543 | trampling |
ᵃ Given that, when magnified to 30×, the images of the archaeological BSM do not capture most of the mark, a lower-magnification (20×) version of the same marks (451-1-2-3) was also used (Old 451-2-3). This was done because these three marks displayed the highest probability estimates for being classified as anthropogenic cut-marks.
Further analysis of the anatomical distribution of the cut-marks on the different bones shows spatial associations with muscle attachment areas, which could provide information regarding behavioural aspects of their possible human makers. The cut-marks identified on ribs CAV 451 and CAV 452 are located on the external surface of the curved rim of the shafts, on the insertion of the intercostal and serratus muscles. On the rib specimen CAV 457, the cut-mark is located in the neck, close to the capitulum, a region where the ligaments of the articulation with the vertebrae are inserted. Lastly, the cut-mark identified on CAV 459 (a glyptodont radius) is located on the lateral surface, at the proximal end of the radial ridge, an area of forelimb muscle insertion.
3. The AdV site in the context of other genetic and archaeological evidence
The accuracy of the CNN models in experimental mark identification, and the high probability with which they identify some of the selected marks from AdV, support the view that this site's modified bones could potentially constitute some of the oldest direct evidence of humans in South America and, therefore, add more sound data to the position of a pre-LGM human presence in the Americas as a whole.
Apart from the claims that humans (belonging to a yet undetermined species) were present [1] in America 130 kyr BP, the present DL classification method applied to the AdV site BSM sample potentially provides more objective and unambiguous evidence of the early human presence in America, dating to over 30 cal kyr BP. Additional support for such an early human presence in that geographic area of the continent comes from a series of sites discovered in the Serra da Capivara (Brazil) [15,34–37]. Although most of the evidence found in these sites clusters in the LGM deposits, some sites show archaeological materials in the lower layers dated to slightly older than 26 500 cal BP. Evidence of the human presence comes not only from the dated in situ artefacts but also from the use-wear found on some of them. In addition, in the LGM deposit of Santa Elina, the spatial association of stone tools and bones includes anthropogenic modifications of sloth (Glossotherium) skeletal elements, as in the case of AdV [13].
The finding of human-modified bones at AdV should encourage the search for this kind of evidence (as well as others) in other LGM and pre-LGM sites. The results presented here, although they do not necessarily support the claim that human presence in America extends back to the Middle Pleistocene [5], provide compelling evidence both that humans were directly engaged in megafaunal butchery (potentially supporting previous interpretations of the human impact on the megafaunal extinction at the end of the Pleistocene in America [38,39]) and that humans were present prior to the LGM. Along the same evidential path is the recent discovery at Chiquihuite Cave (Mexico), dated to the early LGM, which provides unambiguous evidence that humans were already established in Mesoamerica during that time [14], although wear analysis would additionally reinforce the anthropogenic attribution of the artefacts.
The relatively recent discovery of two infant burials at Upward Sun River [40] (Alaska, USA) dated to 11.5 kyr BP yielded genetic information about the founding population of Native Americans. The DNA data showed that Native Americans split from East Asians around 36 kyr BP with continuous gene flow, eventually interrupted [41] at 25 kyr BP. It is not known where the divergence between this founding Native American population (i.e. Beringians) and the eastern Asians took place. One possible location is in northeast Asia; another is Beringia. The documented gene flow between both populations could have happened in both locations. The interruption of the gene flow seems coincident with the beginning of the LGM. This grants more support to the Beringian locus than the Asian locus for the genetic divergence documented around 36 kyr BP. The subsequent genetic flow among Beringians and Northern Native Americans (NNA) and Southern Native Americans (SNA) also suggests this clade originally evolved in Beringia. From there, a rapid dispersal and diversification followed as people moved south [2]. Part of this initial spreading movement may have geographically diverged towards the east of North America and the southern part of the continent. The split between NNA and SNA [2] is inferred to have happened after 20 kyr BP. During the LGM, the Beringian founding population must have remained isolated from the already-diverged NNA population, as documented in the similar genetic information of the Trail Creek DNA, extracted from another infant 750 km away from Upward Sun River [2]. After the LGM, the NNA population went north and replaced the original Beringian population. The SNA population spread all the way to South America quickly. The SNA also show ‘complex admixture events between earlier-established populations’ [2]. This suggests that some pre-SNA people may have reached South America earlier.
DNA extracted from 11 kyr BP remains from Lagoa Santa (Brazil) shows gene flow between this SNA individual and an outgroup denominated UpopA (Unsampled population A), which split off the NNA/SNA clade at some time between 30 and 22 kyr BP, at the same time that Beringians split from Siberians. This has been interpreted as an indication that ‘there were multiple splits that took place in Beringia close in time’ [2]. The UpopA people could have been the first population to reach South America and could have been responsible for the meagre and controversial archaeological record in that part of the continent. Australasian genetic components have also been documented in the Lagoa Santa sample but not in North America, suggesting an earlier population that could have reached South America before the NNA/SNA split; that is, earlier than 20 kyr BP. The AdV site formed slightly earlier than this date and falls chronologically close to the splitting date of UpopA from the other American populations. After the NNA/SNA split, it has been inferred that the SNA population dispersed over South America in less than one millennium [2]. The same timing could have applied to UpopA in their earlier dispersal. The new genetic information is unveiling the formation of the SNA genome, presumably through the presence of humans in that part of the continent at dates prior to those documented archaeologically.
Additional support for these interpretations stemming from the analysis of prehistoric DNA can be found in the genomics of modern populations in the region. Genetic data from modern Amazonian people indicate they originated from a Native American founding population with an Australasian substrate, linked to indigenous Australians and New Guineans, among others [42]. This Australasian component cannot be found in the North American modern and ancient populations. This suggests that populations from Central and South America provide the strongest evidence of deriving from an ancient first American population. Skoglund et al. [42] do not time the arrival of this population with Australasian ancestry (referred to as population Y) to the continent, but argue that ‘an open question is when and how Population Y ancestry reached South America. There are several archaeological sites in the Americas that are substantially older than Clovis sites. Since one Clovis site is now known from ancient DNA analysis to have included an individual of entirely First American ancestry, an interesting hypothesis is that Population Y ancestry may have been prevalent in the individuals of some of these earlier sites. Regardless, our results suggest that at least two different ancestry streams penetrated south of the Late Pleistocene ice sheets, perhaps taking different routes or arriving at different times from a structured Beringian or Northeast Asian source, or reflecting more longstanding gene flow. The genetic data allow us to say with confidence that Population Y ancestry arrived south of the ice sheets anciently: the fact that the geographically diverse Andamanese, Australian and New Guinean populations are all similarly related to this source suggests that the population is no longer extant, and the absence of long-range admixture linkage disequilibrium suggests that the population mixture did not occur in the last few thousand years’ [42].
It should therefore not be surprising that the earliest taphonomic modification of bones by humans and some of the implements used, as documented in the AdV site, may be attributed to this founding period of human presence in South America.
4. Conclusion
One positive contribution of AdV to the debate of the early human presence in the Americas is that the anthropogenic evidence at the site is in the form of cut-marks on bones instead of just artefacts. In this case, with cut-marks being objectively assessed by computerized methods, the interpretation of human agency has the same value as the presence of artefacts in other contexts.
In summary, AdV contains the earliest objective taphonomic evidence of modification of megafaunal carcasses in South America and one of the oldest in the whole American continent, attesting to a pre-LGM presence of humans there. This provides archaeological support for the ancient occupation of South America inferred from the genomics of modern Amazonian populations. It also provides indirect support for the old dates reported from some of the Serra da Capivara sites in Brazil. The use of artificial intelligence methods to unravel agency in archaeofaunas in the continent should be extended to other pre-LGM sites to detect an even older threshold of taphonomically supported evidence of human presence in America.
5. Methods
(a) Description of the sample
(i) Experimental bone surface modifications
The present study has used the experimental dataset reported by Domínguez-Rodrigo et al. [33]. It consists of 657 marks distributed as follows: 488 cut-marks from single-flake experiments, 106 tooth-marks from experimental work with lions and wolves, plus 63 marks from trampling experiments. For tooth-marks, the lion BSM sample was obtained from the experiment reported by Gidna et al. [43], carried out with a group of semi-captive lions from the Cabárceno (Cantabria) reserve in Spain and from the modern Olduvai Carnivore Site (OCS)(Tanzania) reported by Arriaza et al. [44]. The wolf tooth-mark sample was obtained from a collection of long bones consumed exclusively by wolves in the reserve of El Hosquillo (Cuenca, Spain).
Trampling marks were created by using four types of sediments: fine-grained (0.06–0.2 mm), medium-grained (0.2–0.6 mm) and coarse-grained (0.6–2.0 mm) sand, as well as a combination of the previous sand types over a clay substratum and granular gravel (>2.0 mm). These marks were selected from the trampling experiment reported by [45] and [32], and they include all the variety of abrasive sediment particles (other than large pebble gravel grains ranging between 4–6 mm) potentially creating trampling marks in natural settings.
For the cut-mark sample, a set of cow long limb bones were butchered with 22 simple (i.e. non-retouched) flint flakes. Each stone tool was used only 20 times to keep control of edge sharpness. A more detailed description of the experimental study and sample for the three types of BSM can be found in the original published work [33].
Images of BSM were produced with a binocular microscope (Optika) at 30× following a standardized protocol and using the same light intensity and angle. The resulting image data bank was used for analysis through the CNN models described below. All images were transformed into black and white during image processing in the Keras platform, using bidimensional matrices for standardization and centring, and they were reshaped to the same dimensions (80 × 400 pixels). The Keras library was used with the TensorFlow backend.
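The preprocessing described above can be illustrated with a minimal plain-Python sketch. It is not the study's Keras pipeline: the luma weights used for the grey-scale conversion and the per-image standardization are assumptions, the function names are hypothetical, and resizing to 80 × 400 pixels is omitted for brevity.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples into a 2D matrix of grey levels.

    Assumption: ITU-R BT.601 luma weights; the source does not state
    which conversion was used.
    """
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in rgb_image
    ]


def standardize(gray):
    """Centre a 2D matrix on zero mean and scale it to unit variance.

    Assumption: statistics are computed per image.
    """
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    std = variance ** 0.5 or 1.0  # avoid dividing by zero on flat images
    return [[(p - mean) / std for p in row] for row in gray]


# Tiny hypothetical 2 x 2 colour image used only to exercise the functions.
example_rgb = [
    [(255, 255, 255), (0, 0, 0)],
    [(128, 64, 32), (10, 200, 90)],
]
standardized = standardize(to_grayscale(example_rgb))

if __name__ == "__main__":
    for row in standardized:
        print([round(value, 3) for value in row])
```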
(ii) Archaeological marks
A total of 20 marks from the archaeological assemblage of AdV were selected for analysis. Nine of these marks (from bones CAV 451, 452, 453 and 459) have already been analysed [16]. Additional marks obtained subsequently after continuing excavation were also added to the sample. Thirteen marks were interpreted by analysts as cut-marks, six of them from two Lestodon ribs (CAV 451 and 452; figure 2; electronic supplementary material, figures S1–S3). Seven additional marks were interpreted as trampling marks by the excavators (electronic supplementary material, figure S4). The reason to select two different types of marks was to assess whether the human analysts followed criteria that were useful in discriminating marks and to be compared with the performance of the machine. The criteria used in the identification of these marks followed standardized methods of mark micromorphology, in which cut-marks were identified when marks exhibited V-shaped grooves with internal microstriations (situated on the walls), and additional associated features such as shoulder effect, parallel shallow striations running partially along the main groove, asymmetry in the orientation of the groove axis and shoulder effects, preferentially associated with one shoulder or more extensively impacting one shoulder because of the angulation of the tool stroke. Trampling marks were interpreted as those marks exhibiting a shallow wide open section, with or without microstriations. If occurring, microstriations mostly occur on the floor of the mark instead of the walls. Trampling marks also exhibit limited to no shoulder effect and occur in clusters with random orientation. Additionally, features that were considered typical of trampling marks were the presence of shallower oblique striations crossing the main groove and the widespread occurrence of inconspicuous microabrasion caused by the smaller sedimentary particles. This protocol is fully described in [45].
The marks from the AdV sample were observed and photographed using an Olympus SZ61 stereomicroscope under a magnification of 30×. Helicon Focus software was used to make a complete in-focus image.
(b) Description of the deep learning models
In the prior experimental study, several model architectures were tested: Alexnet, VGG16, two VGG-module-based models (Jason1, Jason2), ResNet50 and InceptionV3 [33,46,47]. A selection of these models (VGG16, Jason2, ResNet50, InceptionV3) was made and a new one (Densenet201) was added for the present analysis. The VGG16 architecture was a winner of the ImageNet international competition in 2014 [48–50]. ResNet50 is a residual neural network and was the winner of the ImageNet competition in 2015. It incorporates parallel computation through the skip-connection method, which mitigates the vanishing gradient problem [50].
A redundancy in the code of the previous modelling was limiting the models' effectiveness; its removal, together with a switch of dependencies (including TensorFlow 2.4.1 and Keras 2.4.2), enabled an improvement over previous classification models. For all the architectures used, the activation function for each layer was a rectified linear unit (ReLU). The last fully connected layer of the network used a ‘softmax’ activation. The loss function selected was categorical cross-entropy, which is adequate for multi-class comparisons. Cross-entropy measures the distance between the true probability distribution and the predicted one [51]. The ‘SGD’ optimizer was selected for all models because it provided better results than alternative optimizers. The learning rate was set to 1e-3. Accuracy was the metric selected for the compilation process. Computation was carried out on a GPU workstation (HP Z6).
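As an illustration of the quantities named above, the following minimal sketch computes a softmax output, the categorical cross-entropy against a one-hot label, and one plain stochastic gradient descent update with the reported learning rate of 1e-3. It is a didactic stand-in rather than the Keras code used in the study; the logits, weights and gradients are hypothetical values.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to one."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


def sgd_step(weights, grads, learning_rate=1e-3):
    """One plain SGD update: w <- w - learning_rate * gradient."""
    return [w - learning_rate * g for w, g in zip(weights, grads)]


# Hypothetical scores for the three classes (tooth-mark, cut-mark, trampling).
example_logits = [0.1, 2.0, 1.0]
example_probs = softmax(example_logits)
example_label = [0, 1, 0]  # the true class is "cut-mark" in this toy case
example_loss = categorical_cross_entropy(example_label, example_probs)

if __name__ == "__main__":
    print("probabilities:", [round(p, 3) for p in example_probs])
    print("loss:", round(example_loss, 3))
    print("updated weights:", sgd_step([1.0, -0.5], [2.0, -4.0]))
```

The loss is zero only when the model assigns probability one to the true class, which is why a low testing loss indicates confident as well as correct classifications.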
Data augmentation was used to avoid overfitting and to artificially enlarge the sample. This method is highly recommended for small sample sizes since it increases the heuristics of the neural architecture [26]. In the experimental study, samples were augmented via random transformations of the original images involving shifts in width and height (20%), in shear and zoom range (20%), and also including horizontal flipping, as well as a rotation range of 40°.
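The augmentation ranges listed above can be sketched as a random parameter sampler. This is an assumption-laden illustration, not the study's code: the class and function names are hypothetical, and the reading of each percentage (shifts and shear as fractions in [-0.2, 0.2], zoom as a factor in [0.8, 1.2], rotation in [-40°, 40°], flipping with probability 0.5) is an interpretation of the text. The image warping itself is left out.

```python
import random
from dataclasses import dataclass


@dataclass
class AugmentationParams:
    """One randomly drawn set of transformation parameters (hypothetical)."""
    width_shift: float      # fraction of image width
    height_shift: float     # fraction of image height
    shear: float            # shear intensity
    zoom: float             # multiplicative zoom factor
    rotation_deg: float     # rotation angle in degrees
    horizontal_flip: bool   # whether to mirror the image


def sample_augmentation(rng):
    """Draw one parameter set within the ranges reported in the text."""
    return AugmentationParams(
        width_shift=rng.uniform(-0.2, 0.2),
        height_shift=rng.uniform(-0.2, 0.2),
        shear=rng.uniform(-0.2, 0.2),
        zoom=rng.uniform(0.8, 1.2),
        rotation_deg=rng.uniform(-40.0, 40.0),
        horizontal_flip=rng.random() < 0.5,
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the draws are repeatable
    for _ in range(3):
        print(sample_augmentation(rng))
```

Each training image would receive a fresh draw per epoch, so the network rarely sees exactly the same picture twice.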
The architectures were trained on 70% of the original image set. The resulting model was subsequently tested against the 30% remaining sample, which was not used during the training. Training and testing were performed through mini-batch kernels (size = 32). Weight update was made using a backpropagation process for 100 epochs.
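A minimal sketch of the 70/30 partition and of mini-batching with a batch size of 32 follows. The shuffling, the seed and the rounding rule are assumptions (the text does not say whether the split was stratified by mark type), and the function names are hypothetical.

```python
import random


def train_test_split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and split them into training and testing sets.

    Assumptions: a simple non-stratified shuffle and rounding to the
    nearest integer for the training-set size.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


def mini_batches(indices, batch_size=32):
    """Yield consecutive chunks of at most batch_size indices."""
    for start in range(0, len(indices), batch_size):
        yield indices[start:start + batch_size]


# 657 is the size of the experimental sample reported in the text.
train_idx, test_idx = train_test_split_indices(657)
batches = list(mini_batches(train_idx))

if __name__ == "__main__":
    print("training marks:", len(train_idx))
    print("testing marks:", len(test_idx))
    print("mini-batches per epoch:", len(batches))
```

Under these assumptions the testing marks never enter a training batch, which is what makes the reported testing accuracy an out-of-sample estimate.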
Two complementary methods were used here for the classification of the BSM in the AdV site: the model yielding the highest general and balanced accuracies on the testing sets and ensemble learning. The latter consists of using the weights of a group of learners to generate a meta-classification through either majority (classification) or averaging (regression) of results. This is done through three major methods: blending, stacking and majority voting. Blending and stacking, especially the latter, are probably the most widely used ensemble learning methods. Stacking consists of generating a multiple layer structure of estimators with a hierarchy of base learners at the bottom and one or more meta-learners on top (each in successive layers). Blending is a similar approach since it also uses the base learners as the new-feature estimators. Instead of using a k-fold training dataset, blending uses a separate holdout set. Base learners are trained on the training dataset. Predictions are made both on the holdout set and the testing sets. A third very different approach is based on majority voting. This approach to ensemble classification is based on the selection of a series of baseline estimators and their categorical outputs are submitted to majority voting. The categories predicted by most estimators determine the outcome of the meta-prediction. This combined estimation is on average better than using estimators independently because it tends to reduce estimation variance. Here, we used two major methods: stacking and majority voting. Stacking was used to model the group performance on the training/testing experimental sets. Majority voting was used to obtain a common classification of the archaeological BSM. 
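Majority voting, as described above, can be sketched in a few lines. The votes below are hypothetical, and returning no decision on a tie is an assumption, since the text does not describe how ties were handled.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by most base models, or None on a tie.

    Assumption: ties are left undecided; the source does not specify a
    tie-breaking rule.
    """
    counts = Counter(predictions)
    top = max(counts.values())
    winners = [label for label, count in counts.items() if count == top]
    return winners[0] if len(winners) == 1 else None


# Hypothetical categorical outputs of five base models for a single mark.
example_votes = ["cut-mark", "cut-mark", "trampling", "cut-mark", "tooth-mark"]

if __name__ == "__main__":
    print(majority_vote(example_votes))
```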
Instead of accepting the results at face value (given that all models correctly classified the testing set with an accuracy greater than 95%), we applied an extremely conservative approach and accepted marks as anthropogenic only when the group decision of the models (i.e. ensemble learning) coincided with that of the most accurate single model (InceptionV3).
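This acceptance rule amounts to a logical conjunction, which the following short Python sketch makes explicit. The function name and the label "cut" are hypothetical placeholders for the anthropogenic class.

```python
def accept_as_anthropogenic(ensemble_label, best_model_label, target="cut"):
    """Accept a mark only when the ensemble and the best single model both predict the target class."""
    return ensemble_label == target and best_model_label == target

# Hypothetical (ensemble, best single model) decisions for three marks.
decisions = [("cut", "cut"), ("cut", "trampling"), ("trampling", "cut")]
accepted = [accept_as_anthropogenic(e, b) for e, b in decisions]
```

Only the first mark is accepted; any disagreement between the two decisions leads to rejection, which is what makes the rule conservative.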
The probability of classification was calibrated. Calibration, which consists of turning classification scores into class probabilities, is important for assessing confidence in the results. It proceeds by matching the predicted probabilities with the expected distribution of probabilities for each group label. Although using calibrated probabilities did not modify the original classification decisions, it provided more reliable probability estimates for the classification of each archaeological mark.
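To give a concrete idea of what calibration does, the sketch below implements histogram binning, one simple and well-known calibration technique. It is shown purely as an illustration and is not claimed to be the procedure applied in this study. The scores, labels and number of bins are invented for the example.

```python
def fit_histogram_binning(scores, labels, n_bins=5):
    """Return, for each equal-width score bin, the observed fraction of positive labels."""
    totals = [0] * n_bins
    positives = [0] * n_bins
    for score, label in zip(scores, labels):
        b = min(int(score * n_bins), n_bins - 1)  # a score of 1.0 falls in the last bin
        totals[b] += 1
        positives[b] += label
    # Empty bins fall back to the bin midpoint, leaving the score roughly unchanged.
    return [positives[b] / totals[b] if totals[b] else (b + 0.5) / n_bins
            for b in range(n_bins)]

def calibrate(score, bin_rates):
    """Map a raw score in [0, 1] to the positive rate observed in its bin."""
    n_bins = len(bin_rates)
    return bin_rates[min(int(score * n_bins), n_bins - 1)]

# Hypothetical held-out scores for the positive class and their true labels (1 = positive).
scores = [0.95, 0.92, 0.97, 0.91, 0.10, 0.15]
labels = [1, 1, 1, 0, 0, 0]
rates = fit_histogram_binning(scores, labels)
calibrated = calibrate(0.99, rates)
```

Here a raw score of 0.99 is mapped to 0.75, because only three of the four held-out marks scored above 0.8 were truly positive: the calibrated value reflects the observed frequency rather than the raw network output.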
A gradient-weighted class activation mapping (Grad-CAM) algorithm was used to detect the microscopic features that influenced BSM discrimination [52]. This method uses the gradients of the predicted class with respect to the last convolutional feature map to weight its activations and build a heat-map, which is overlaid on the original BSM image. The Grad-CAM algorithm thus highlights the areas of the marks that are most important for the prediction and classification of the image.
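The core computation of Grad-CAM can be sketched in plain Python, assuming that the activations of the last convolutional layer and the gradients of the class score with respect to them have already been extracted from the network. The tiny 2 × 2 feature maps below are invented, and the upsampling and overlay steps are omitted.

```python
def grad_cam(activations, gradients):
    """Combine feature maps A[k][i][j] with gradients dY/dA[k][i][j] into a coarse heat-map."""
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weights: global average of the gradients over each feature map.
    weights = [sum(map(sum, g)) / (height * width) for g in gradients]
    heatmap = [[0.0] * width for _ in range(height)]
    for w, fmap in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += w * fmap[i][j]
    # ReLU keeps only the locations that increase the score of the predicted class.
    return [[max(0.0, v) for v in row] for row in heatmap]

# Two hypothetical 2 x 2 feature maps and their gradients.
activations = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(activations, gradients)
```

In practice the resulting map has the spatial resolution of the last convolutional layer and is resized to the dimensions of the input image before being overlaid on it.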
Data accessibility
All data are given either in the main text or in the electronic supplementary material [53].
Authors' contributions
M.D.-R.: conceptualization, methodology, resources, software, supervision, writing-original draft, writing-review and editing; E.B.: methodology and software; L.V.: formal analysis, investigation, methodology, resources, validation, writing-review and editing; P.S.T.: investigation, methodology, writing-review and editing; M.J.M.: methodology, resources, writing-review and editing; R.A.F.: conceptualization, funding acquisition, investigation, supervision, writing-original draft, writing-review and editing.
All authors gave final approval for publication and agreed to be held accountable for the work performed therein.
Competing interests
The authors declare no competing interests.
Funding
We thank the Spanish Ministry of Education and Science for funding this research (grant no. HAR2017-82463-C4-1-P). Also, financial support was provided by a grant by the Comisión Sectorial de Investigación Científica, UdelaR, Uruguay (CSIC I + D 2018 no. 355), a grant by the Espacio Interdisciplinario, UdelaR, Uruguay (Núcleo interdisciplinario de estudios cuaternarios 2019), and a National Geographic Explorer Grant (grant no. 178431) to R.A.F.
References
- 1. Clark P, Dyke AS, Shakun JD, Carlson AE, Clark J, Wohlfarth B, Mitrovica JX, Hostetler SW, McCabe AM. 2009. The last glacial maximum. Science 325, 710-714. (doi:10.1126/science.1172873)
- 2. Moreno-Mayar JV, et al. 2018. Early human dispersals within the Americas. Science 362, eaav2621. (doi:10.1126/science.aav2621)
- 3. Prates L, Politis GG, Perez SI. 2020. Rapid radiation of humans in South America after the last glacial maximum: a radiocarbon-based study. PLoS ONE 15, e0236023. (doi:10.1371/journal.pone.0236023)
- 4. Becerra-Valdivia L, Higham T. 2020. The timing and effect of the earliest human arrivals in North America. Nature 584, 93-97. (doi:10.1038/s41586-020-2491-6)
- 5. Holen SR, et al. 2017. A 130,000-year-old archaeological site in southern California, USA. Nature 544, 479-483. (doi:10.1038/nature22065)
- 6. Bordes L, Hayes E, Fullagar R, Deméré T. 2020. Raman and optical microscopy of bone micro-residues on cobbles from the Cerutti mastodon site. J. Archaeol. Sci. 34, 102656. (doi:10.1016/j.jasrep.2020.102656)
- 7. Williams TJ, et al. 2018. Evidence of an early projectile point technology in North America at the Gault Site, Texas, USA. Sci. Adv. 4, eaar5954. (doi:10.1126/sciadv.aar5954)
- 8. Adovasio JM, Gunn JD, Donahue J, Stuckenrath R. 1978. Meadowcroft Rockshelter, 1977: an overview. Am. Antiq. 43, 632-651. (doi:10.2307/279496)
- 9. Macphail RI, McAvoy JM. 2008. A micromorphological analysis of stratigraphic integrity and site formation at Cactus Hill, an Early Paleoindian and hypothesized pre-Clovis occupation in south-central Virginia, USA. Geoarchaeology 23, 675-694. (doi:10.1002/gea.20234)
- 10. Lowery DL, O'Neal MA, Wah JS, Wagner DP, Stanford DJ. 2010. Late Pleistocene upland stratigraphy of the western Delmarva Peninsula, USA. Quat. Sci. Rev. 29, 1472-1480. (doi:10.1016/j.quascirev.2010.03.007)
- 11. Bourgeon L, Burke A, Higham T. 2017. Earliest human presence in North America dated to the Last Glacial Maximum: new radiocarbon dates from Bluefish Caves, Canada. PLoS ONE 12, e0169486. (doi:10.1371/journal.pone.0169486)
- 12. Dillehay TD, et al. 2015. Correction: new archaeological evidence for an early human presence at Monte Verde, Chile. PLoS ONE 10, e0145471. (doi:10.1371/journal.pone.0145471)
- 13. Vialou D, Benabdelhadi M, Feathers J, Fontugne M, Vialou AV. 2017. Peopling South America's centre: the late Pleistocene site of Santa Elina. Antiquity 91, 865-884. (doi:10.15184/aqy.2017.101)
- 14. Ardelean CF, et al. 2020. Evidence of human occupation in Mexico around the Last Glacial Maximum. Nature 584, 87-92. (doi:10.1038/s41586-020-2509-0)
- 15. Boëda E, et al. 2021. 24.0 kyr cal BP stone artefact from Vale da Pedra Furada, Piauí, Brazil: techno-functional analysis. PLoS ONE 16, e0247965. (doi:10.1371/journal.pone.0247965)
- 16. Fariña RA, et al. 2014. Arroyo del Vizcaíno, Uruguay: a fossil-rich 30-ka-old megafaunal locality with cut-marked bones. Proc. R. Soc. B 281, 20132211. (doi:10.1098/rspb.2013.2211)
- 17. Fariña RA, Tambusso PS, Varela L, Czerwonogora A, Di Giacomo M, Musso M, Bracco R, Gascue A. 2014. Among others, cut-marks are archaeological evidence: reply to ‘Archaeological evidences are still missing: a comment on Fariña et al. Arroyo del Vizcaíno Site, Uruguay’ by Suárez et al. Proc. R. Soc. B 281, 20141637. (doi:10.1098/rspb.2014.1637)
- 18. Fariña RA. 2015. Bone surface modifications, reasonable certainty, and human antiquity in the Americas: the case of the Arroyo del Vizcaíno Site. Am. Antiq. 80, 193-200. (doi:10.7183/0002-7316.79.4.193)
- 19. Fariña RA, Tambusso PS, Varela L, Velazco S, Di Giacomo M, Gascue A. 2021. Arroyo del Vizcaíno: strengths and weaknesses of a very old archaeological/paleontological site in Uruguay, South America. In New Discoveries in the American Paleolithic: The Pre-16,000 Archaeological Record, Conference Proceedings (eds Jefferson GT, Holen S, Holen K). San Diego, CA: Sunbelt Publications Inc.
- 20. Domínguez-Rodrigo M, Barba R, Egeland CP. 2007. Deconstructing Olduvai: a taphonomic study of the Bed I sites. Dordrecht, The Netherlands: Springer Science & Business Media.
- 21. McPherron SP, Alemseged Z, Marean CW, Wynn JG, Reed D, Geraads D, Bobe R, Béarat HA. 2010. Evidence for stone-tool-assisted consumption of animal tissues before 3.39 million years ago at Dikika, Ethiopia. Nature 466, 857-860. (doi:10.1038/nature09248)
- 22. Dambricourt Malassé A, et al. 2016. Intentional cut-marks on bovid from the Quranwala zone, 2.6 Ma, Siwalik Frontal Range, northwestern India. C. R. Palevol. 15, 317-339. (doi:10.1016/j.crpv.2015.09.019)
- 23. Domínguez-Rodrigo M, Pickering TR, Bunn HT. 2010. Configurational approach to identifying the earliest hominin butchers. Proc. Natl Acad. Sci. USA 107, 20 929-20 934. (doi:10.1073/pnas.1013711107)
- 24. Domínguez-Rodrigo M, Alcalá L. 2016. 3.3-Million-year-old stone tools and butchery traces? More evidence needed. PaleoAnthropology 2016, 46-53.
- 25. Domínguez-Rodrigo M, et al. 2017. Use and abuse of cut-mark analyses: the Rorschach effect. J. Archaeol. Sci. 86, 14-23. (doi:10.1016/j.jas.2017.08.001)
- 26. Chollet F. 2017. Deep learning with Python. Shelter Island, NY: Manning Publications Company.
- 27. Byeon W, Domínguez-Rodrigo M, Arampatzis G, Baquedano E, Yravedra J, Maté-González MA, Koumoutsakos P. 2019. Automated identification and deep classification of cut-marks on bones and its paleoanthropological implications. J. Comput. Sci. 32, 36-43. (doi:10.1016/j.jocs.2019.02.005)
- 28. Cifuentes-Alcobendas G, Domínguez-Rodrigo M. 2019. Deep learning and taphonomy: high accuracy in the classification of cut-marks made on fleshed and defleshed bones using convolutional neural networks. Sci. Rep. 9, 18933. (doi:10.1038/s41598-019-55439-6)
- 29. Fernández López SR. 2000. Temas de tafonomía. See https://eprints.ucm.es/id/eprint/22003.
- 30. Fernández López SR. 1998. Tafonomía y fosilización. In Tratado de Paleontología, vol. I (ed. B Meléndez), pp. 51–107, 438–441. Madrid, Spain: Consejo Superior de Investigaciones Científicas.
- 31. Fernández López SR. 2006. Taphonomic alteration and evolutionary taphonomy. J. Taphonomy 4, 111-142.
- 32. Pizarro-Monzo M, Domínguez-Rodrigo M. 2020. Dynamic modification of cut-marks by trampling: temporal assessment through the use of mixed-effect regressions and deep learning methods. Archaeol. Anthropol. Sci. 12, 1-3. (doi:10.1007/s12520-019-00966-6)
- 33. Domínguez-Rodrigo M, Cifuentes-Alcobendas G, Jiménez-García B, Abellán N, Pizarro-Monzo M, Organista E, Baquedano E. 2020. Artificial intelligence provides greater accuracy in the classification of modern and ancient bone surface modifications. Sci. Rep. 10, 18862. (doi:10.1038/s41598-020-75994-7)
- 34. Boëda E, et al. 2014. A new late Pleistocene archaeological sequence in South America: the Vale da Pedra Furada (Piauí, Brazil). Antiquity 88, 927-941. (doi:10.1017/S0003598X00050845)
- 35. Boëda E, et al. 2016. New data on a Pleistocene archaeological sequence in South America: Toca do Sítio do Meio, Piauí, Brazil. PaleoAmerica 2, 286-302. (doi:10.1080/20555563.2016.1237828)
- 36. Lahaye C, et al. 2015. New insights into a late-Pleistocene human occupation in America: the Vale da Pedra Furada complete chronological study. Quat. Geochronol. 30, 445-451. (doi:10.1016/j.quageo.2015.03.009)
- 37. Lahaye C, et al. 2013. Human occupation in South America by 20,000 BC: the Toca da Tira Peia site, Piauí, Brazil. J. Archaeol. Sci. 40, 2840-2847. (doi:10.1016/j.jas.2013.02.019)
- 38. Fariña RA, Vizcaíno SF, De Iuliis G. 2013. Megafauna: giant beasts of Pleistocene South America. Bloomington, IN: Indiana University Press.
- 39. Prates L, Perez SI. 2021. Late Pleistocene South American megafaunal extinctions associated with rise of Fishtail points and human population. Nat. Commun. 12, 1-11. (doi:10.1038/s41467-021-22506-4)
- 40. Potter BA, Irish JD, Reuther JD, McKinney HJ. 2014. New insights into Eastern Beringian mortuary behavior: a terminal Pleistocene double infant burial at Upward Sun River. Proc. Natl Acad. Sci. USA 111, 17 060-17 065. (doi:10.1073/pnas.1413131111)
- 41. Moreno-Mayar JV, et al. 2018. Terminal Pleistocene Alaskan genome reveals first founding population of Native Americans. Nature 553, 203-207. (doi:10.1038/nature25173)
- 42. Skoglund P, Mallick S, Bortolini M, Chennagiri N, Hünemeier T, Petzl-Erler ML, Salzano FM, Patterson N, Reich D. 2015. Genetic evidence for two founding populations of the Americas. Nature 525, 104-108. (doi:10.1038/nature14895)
- 43. Gidna A, Yravedra J, Domínguez-Rodrigo M. 2013. A cautionary note on the use of captive carnivores to model wild predator behavior: a comparison of bone modification patterns on long bones by captive and wild lions. J. Archaeol. Sci. 40, 1903-1910. (doi:10.1016/j.jas.2012.11.023)
- 44. Arriaza MC, Domínguez-Rodrigo M, Yravedra J, Baquedano E. 2016. Lions as bone accumulators? Paleontological and ecological implications of a modern bone assemblage from Olduvai Gorge. PLoS ONE 11, e0153797. (doi:10.1371/journal.pone.0153797)
- 45. Domínguez-Rodrigo M, de Juana S, Galán AB, Rodríguez M. 2009. A new protocol to differentiate trampling marks from butchery cut-marks. J. Archaeol. Sci. 36, 2643-2654. (doi:10.1016/j.jas.2009.07.017)
- 46. Jiménez-García B, Aznarte J, Abellán N, Baquedano E, Domínguez-Rodrigo M. 2020. Deep learning improves taphonomic resolution: high accuracy in differentiating tooth marks made by lions and jaguars. J. R. Soc. Interface 17, 20200446. (doi:10.1098/rsif.2020.0446)
- 47. Abellán N, Jiménez-García B, Aznarte J, Baquedano E, Domínguez-Rodrigo M. 2021. Deep learning classification of tooth scores made by different carnivores: achieving high accuracy when comparing African carnivore taxa and testing the hominin shift in the balance of power. Archaeol. Anthropol. Sci. 13, 31. (doi:10.1007/s12520-021-01273-9)
- 48. Simonyan K, Zisserman A. 2014. Very deep convolutional networks for large-scale image recognition. arXiv.
- 49. Simonyan K, Zisserman A. 2014. Two-stream convolutional networks for action recognition in videos. In Advances in neural information processing systems 27 (eds Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ), pp. 568-576. Red Hook, NY: Curran Associates.
- 50. He K, Zhang X, Ren S, Sun J. 2016. Deep residual learning for image recognition. In Proc. of the IEEE conf. on computer vision and pattern recognition, pp. 770-778. New York, NY: IEEE.
- 51. Goodfellow I, Bengio Y, Courville A. 2016. Deep learning. Cambridge, MA: MIT Press.
- 52. Selvaraju RR, et al. 2017. Grad-CAM: visual explanations from deep networks via gradient-based localization. In Proc. of the IEEE int. conf. on computer vision, pp. 618-626. New York, NY: IEEE.
- 53. Domínguez-Rodrigo M, Baquedano E, Varela L, Tambusso PS, Melián MJ, Fariña RA. 2021. Deep classification of cut-marks on bones from Arroyo del Vizcaíno (Uruguay). FigShare.