Editor’s note:
On 16 August 2023, Science Advances published a Research Article “Societies of strangers do not speak less complex languages” by O. Shcherbakova et al. (1). On 8 November 2023, an Editorial Expression of Concern alerted readers that the authors had notified the journal about a problem with their analyses (2). The authors have corrected the paper as described in the Erratum (3). These changes have addressed concerns about the integrity of the paper; therefore, Science Advances has removed the Editorial Expression of Concern. The journal has posted this notification in its place to indicate the editors’ confidence in the Research Article’s data and conclusions. We thank the authors for bringing these issues to our attention.
—H. Holden Thorp
Editor-in-Chief, Science Advances
References
1. O. Shcherbakova et al., Societies of strangers do not speak less complex languages. Sci. Adv. 9, eadf7704 (2023). DOI: 10.1126/sciadv.adf7704
2. H. H. Thorp, Editorial expression of concern. Sci. Adv. 9, eadm8238 (2023). DOI: 10.1126/sciadv.adm8238
3. Editor’s note and erratum for the Research Article: “Societies of strangers do not speak less complex languages” by Shcherbakova et al., Sci. Adv. 10, eadm6113 (2024).
Erratum:
The original version of the Research Article, “Societies of strangers do not speak less complex languages” by Shcherbakova et al., contained errors in the reported analyses.
In the analyses reported, 23 of 1314 languages were incorrectly coded as having “0” L1 speakers and 1194 of 1314 languages as having “0” L2 speakers (and consequently a proportion of “0” L2 speakers), despite the data being missing in these languages. The numbers of L1 speakers and L2 speakers were wrongly calculated because of the following:
1. The wrong data file was selected from the demographic data from Ethnologue (1), i.e., Table_of_Languages.tab (which lacks L2_User data) was used instead of Table_of_LICs.tab (which contains L2_Users).
2. There was an error in the code, which failed to exclude languages with missing values for All_Users and L1_Users columns.
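The coding error in point 2 amounts to reading a blank field as zero. The following minimal Python sketch shows the safer behavior under stated assumptions; it is not the authors' code. Only `All_Users` and `L1_Users` are column names taken from the text, while the tab-separated layout, the `ISO_639` column, the sample rows, and the helper names are hypothetical.

```python
import csv
import io


def parse_count(field):
    """Return an int for a populated field, or None when the value is missing."""
    field = field.strip()
    return int(field) if field else None  # a blank stays missing, never 0


def complete_rows(tsv_text, required=("All_Users", "L1_Users")):
    """Keep only rows whose required columns all hold a value."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        counts = {col: parse_count(row[col]) for col in required}
        if all(value is not None for value in counts.values()):
            kept.append({**row, **counts})
    return kept


# Hypothetical sample: "bbb" has no L1_Users value, "ccc" has a true zero.
sample = (
    "ISO_639\tAll_Users\tL1_Users\n"
    "aaa\t1200\t1000\n"
    "bbb\t500\t\n"
    "ccc\t0\t0\n"
)
rows = complete_rows(sample)
print([r["ISO_639"] for r in rows])  # ['aaa', 'ccc']
```

Keeping missing values as `None` rather than `0` lets a reported zero ("ccc") stay in the sample while the incomplete row ("bbb") is excluded.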
To correct these errors, the authors have done the following:
1. For the languages with available data on the number of L2 speakers (n = 120) in the appropriate file (i.e., Table_of_LICs.tab), the authors tested whether the proportion of L2 speakers is negatively correlated with grammatical complexity. As in the original paper, the authors found no support for a correlation either with fusion or with informativity. This applies to models with L2 proportion as the only predictor and to the models incorporating the combination of L2 proportion with the number of L1 speakers, and the interaction between these two predictors.
2. To address the problem of missing data in the main analyses on the larger sample, the authors used the appropriate file (i.e., Table_of_LICs.tab) and corrected the code to include only languages for which the data in All_Users and L1_Users columns was available. This reduced the sample from 1314 to 1291 languages. The authors reran their main analyses with all predictors on these data, except for the “proportion of L2 speakers.” To replace the “proportion of L2 speakers” in the large sample, the authors added a new predictor variable, “Vehicularity.” This variable has been argued to be a reliable indicator of whether a language is expected to have L2 speakers (2). This variable was constructed following the approach described in (2) based on the Expanded Graded Intergenerational Disruption Scale (EGIDS) available in the Ethnologue (1). The EGIDS scale reflects how endangered a language is: level 0 stands for “International”, and level 10 stands for “Extinct”. If a language has a low EGIDS level, such as 0 (“International”), 1 (“National”), 2 (“Provincial”), or 3 (“Wider Communication”), the language is considered “vehicular,” i.e., expected to have L2 speakers. In contrast, languages of level 4 (“Educational”) and higher (“Developing,” “Vigorous,” “Threatened,” “Shifting,” “Moribund,” “Nearly Extinct,” “Dormant,” and “Extinct”) are not likely to be used by L2 speakers.
The authors modeled the effects of the previously used predictors (the number of L1 speakers, “L1”; the number of linguistic neighbors, “Neighbors”; the presence/absence of official status, “Official”; and whether the language is used in education, “Education”) and Vehicularity on the reduced sample of 1291 languages. These results confirmed that none of the predictors are negatively correlated with measures of grammatical complexity in the full models (see corrected Fig. 2).
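The Vehicularity coding described above reduces to a threshold on the EGIDS level. The sketch below illustrates that rule in Python and is not the authors' implementation. It assumes that EGIDS values arrive as strings and that sublevels carry a letter suffix (for example "6a" or "8b"); the function and constant names are hypothetical.

```python
import re

# Levels 0-3 ("International" to "Wider Communication") count as vehicular.
VEHICULAR_MAX_LEVEL = 3


def egids_level(code):
    """Extract the numeric level from an EGIDS code such as "3", "6a", or "8b"."""
    match = re.match(r"\s*(\d+)", str(code))
    if match is None:
        raise ValueError(f"unrecognised EGIDS code: {code!r}")
    level = int(match.group(1))
    if not 0 <= level <= 10:
        raise ValueError(f"EGIDS level out of range: {code!r}")
    return level


def is_vehicular(code):
    """True when the language is expected to have L2 speakers (EGIDS 0-3)."""
    return egids_level(code) <= VEHICULAR_MAX_LEVEL


examples = ["0", "3", "4", "6a", "10"]
print({code: is_vehicular(code) for code in examples})
```

Under this rule, level 4 ("Educational") is the first level coded as non-vehicular.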
Below is a summary of data and methodological changes made to the paper.
Data changes
• The authors tested the proportion of L2 speakers as a predictor of grammatical complexity in a sample of 120 languages.
• The authors tested Vehicularity as a predictor of grammatical complexity (as well as previously used predictors) on the sample of 1291 languages.
Methodological changes
• During code revision, the authors found that the varcov.spatial function in the geoR R package (3) calculated spatial distances in two-dimensional Euclidean space rather than as the intended great-circle distance. As a result, distances across the antimeridian were not being accounted for. The authors have informed the R package maintainers of this problem and have altered their approach to use great-circle distance. This change had minimal impact on the calculation of geographic covariance because languages separated by distances approaching or exceeding 1000 km are modeled to have negligible impact on each other (see fig. S1).
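The antimeridian problem can be illustrated by comparing a planar distance on raw longitude/latitude values with a great-circle distance. This Python sketch is not the authors' code and does not reproduce geoR. It uses the haversine formula as one common way to compute great-circle distance, and the Earth radius, the two coordinate pairs, and the function names are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def great_circle_km(lon1, lat1, lon2, lat2):
    """Haversine great-circle distance in kilometres between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def planar_degrees(lon1, lat1, lon2, lat2):
    """Euclidean distance on raw coordinates, in degrees (ignores wrap-around)."""
    return math.hypot(lon2 - lon1, lat2 - lat1)


# Two hypothetical points on the equator, 2 degrees apart across the antimeridian.
west = (179.0, 0.0)
east = (-179.0, 0.0)

print(round(great_circle_km(*west, *east), 1))  # about 222.4 km
print(planar_degrees(*west, *east))             # 358.0 degrees
```

The planar calculation places the two points almost a full circle apart, whereas the great-circle distance correctly treats them as close neighbors.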
Consistent with the authors’ initial findings, the corrected analyses found that the correlation between grammatical complexity and sociodemographic predictors was either absent or weak.
Similarly, the authors again found weak positive effects of the number of L1 speakers on both fusion and informativity. In contrast to previous findings where “Official” and “Education” were not correlated with fusion, the authors’ corrected analyses suggest a weak positive effect of these variables on fusion. The new predictor, “Vehicularity,” also shows a low to moderate positive correlation with measures of grammatical complexity (see corrected Fig. 2).
The authors’ original paper reported a weak negative correlation between the morphological complexity score calculated in (4) and the number of L1 speakers [while applying the same 10% feature coverage threshold as in (4)]. This correlation disappeared on the smaller sample after raising the threshold to 35%. The authors used this analysis to argue in the original paper that Lupyan and Dale’s (4) results might have resulted from the data sparseness in the World Atlas of Language Structures (5) and the use of a low threshold for minimum feature coverage. In this correction, the authors reran these analyses after excluding missing data for the L1_Users and All_Users columns. This led to a sample reduction from 621 to 448 languages for the 10% threshold and from 90 to 77 languages for the 35% threshold. In these corrected analyses, the authors now see no evidence for any correlation under either threshold of feature coverage (see corrected table S4). Only the model with the 10% threshold and control for spatiophylogenetic effects on a sample of 441 languages shows a weak negative correlation (coefficient estimate: −0.02; 95% credible interval: −0.05 to 0).
The paper’s core findings and conclusions hold: The authors find no evidence for a negative correlation between grammatical complexity and sociodemographic predictors associated with societies of strangers.
All code for the analysis, including the version history of the original study and the reanalysis, can be found at Zenodo (https://zenodo.org/records/10420654) and at https://github.com/OlenaShcherbakova/Sociodemographic_factors_complexity.
The following parts of the original paper have been updated to incorporate the corrected data discussed above:
• The text in the subsections “Languages and societies sample,” “Sociodemographic variables of exotericity,” “Spatiophylogenetic modeling,” “Results,” “Discussion,” “Sociodemographic variables,” and the caption of Fig. 2.
• Figures 1 to 4.
• Tables 1 and 2.
• Tables S2 to S4. The Supplementary Materials PDF has been replaced.
Citation: Editor’s note and erratum for the Research Article: “Societies of strangers do not speak less complex languages” by Shcherbakova et al., Sci. Adv. 10, eadm6113 (2024).
REFERENCES AND NOTES
1. D. M. Eberhard, G. F. Simons, C. D. Fennig, Ethnologue Global Dataset (SIL International, ed. 24, 2021).
2. A. Koplenig, Language structure is influenced by the number of speakers but seemingly not by the proportion of non-native speakers. R. Soc. Open Sci. 6, 181274 (2019).
3. P. J. Ribeiro, P. J. Diggle, M. Schlather, R. Bivand, B. Ripley, geoR: Analysis of geostatistical data (2020); https://CRAN.R-project.org/package=geoR.
4. G. Lupyan, R. Dale, Language structure is partly determined by social structure. PLOS ONE 5, e8559 (2010). https://doi.org/10.1371/journal.pone.0008559
5. M. Dryer, M. Haspelmath, The World Atlas of Language Structures Online (Max Planck Institute for Evolutionary Anthropology, 2013); http://wals.info.
