PLOS One. 2021 Feb 1;16(2):e0245263. doi: 10.1371/journal.pone.0245263

Comparing phonological and orthographic networks: A multiplex analysis

Pablo Lara-Martínez 1, Bibiana Obregón-Quintana 1, Cesar F Reyes-Manzano 2, Irene López-Rodríguez 2, Lev Guzmán-Vargas 2,*
Editor: Diego Raphael Amancio 3
PMCID: PMC7850493  PMID: 33524013

Abstract

The complexity of natural language can be explored by means of multiplex analyses at different scales, from single words to groups of words or sentence levels. Here, we plan to investigate a multiplex word-level network, which comprises an orthographic and a phonological network defined in terms of distance similarity. We systematically compare basic structural network properties to determine similarities and differences between the two layers, as well as their combination in a multiplex configuration. As a natural extension of our work, we plan to evaluate the preservation of the structural network properties and information-based quantities from the following perspectives: (i) presence of similarities across 12 natural languages from 4 linguistic families (Romance, Germanic, Slavic and Uralic), (ii) increase of the corpus size from $10^4$ to $50 \times 10^3$ words, and (iii) robustness of the networks. Our preliminary findings reinforce the idea of common organizational properties among natural languages. Once concluded, the study will contribute to the characterization of similarities and differences in the orthographic and phonological perspectives of language networks at the word level.

Introduction

Many studies focused on the complexity of natural language have pointed out that language is the manifestation of different levels of complex organization [1–4], ranging from semantics [5] to syntax [6, 7] or even emotional components [8]. Of particular interest are the applications of network science to language organization, where these levels of complexity may be explored by means of single-layer [9, 10] and multilayer graphs [11, 12]. A number of studies have reported emergent organizational properties in language based on associations of semantics, orthographic similarities [13] and phonetics [14, 15]. In many of these networks, the connectivity (the number of neighbors of a given node) is found to follow a distribution with a tail, which can be short or long. For instance, Arbesman et al. [16] reported that for phonological networks, the degree distribution can be well described by a truncated power law for several languages. For orthographic networks, Trautwein et al. [13] reported that the distribution of connectivities for the mental lexicon of elementary-level students has a power-law tail and that the network exhibits the small-world property. Despite the variety of characterizations of language from the network perspective, only a limited number of studies have incorporated the multilayer aspects of language. Here, we consider a bi-layer approach to the analysis of orthographic and phonological language networks. Our procedure is based on the mapping of words into a two-layer network where nodes are words and where a connection is defined when an appropriate distance-similarity condition is met. In general, the distance similarity between two strings, A and B, can be defined as the minimum number of edit operations needed to transform A into B. In our study we will consider the Damerau-Levenshtein (DL) distance as a proxy for the similarity between two words.
It is recognized that, for many natural languages, there is no one-to-one (biunivocal) correspondence between how a word is spelled and how it is pronounced; that is, there is no biunivocal correspondence between graphemes and phonemes. This mismatch is most likely to be observed in particular situations such as homography (one letter corresponds to two phonemes), digraphy (two letters correspond to one phoneme, or vice versa) and heterography (one phoneme corresponds to two or more letters), among others.

When comparing orthographic and phonological networks, an important question is whether the local and global connectivity patterns exhibit similarities, and what kinds of differences can be identified, more specifically in the context of psycholinguistic studies. Such studies suggest that the mechanisms acting on cognitive processes, such as word recognition and retrieval, differ markedly from those associated with the orthographic organization.

Proposed hypothesis and research plan

Our study is based on the premise that the network representations of both syntax and phonological networks capture the most representative features of each network. In this sense, different questions can be asked. Our study focuses on the following three research questions:

  • What are the characteristics of multiplex orthographic-phonological language networks?

  • Would the connectivity patterns from orthographic and phonological networks reveal similarities and differences between them?

  • How does orthographic structure vary in relation to phonological patterns across several natural languages?

There is enough evidence that phonological and grammatical networks exhibit common properties and differences. We shall focus on evaluating properties both locally and globally to show the differences between the layers while quantifying them in a bi-layer (multiplex) network. To strengthen our study, we initially intend to carry out the analysis in four natural languages (Spanish, English, German and Russian) using a corpus of $10^4$ words. The plan for a secondary stage contemplates two considerations: (i) increasing the corpus size from $10^4$ to $50 \times 10^3$ words and (ii) expanding the analysis to 12 languages belonging to 4 different linguistic families (Germanic, Romance, Slavic and Uralic).

Data analytic and proposed analyses

Methods

The study of complex networks has incorporated the analysis of systems for which multiplex modelling is more suitable. In these cases, nodes are located in layers, with connections among them, and the nodes are common to all layer networks. A number of real-world and simulated multilayer networks have been studied in contexts such as finance and economics [17–19], social systems [20, 21], synchronization [22] and linguistics [12].

In this study, we plan to analyze the multiplex language network, which consists of an orthographic network and a phonological network (see Fig 1 for a schematic representation). For the orthographic network, we construct a word-level network $G^{[O]} = (V^{[O]}, E^{[O]})$, where nodes are words and a link between two nodes is defined if the DL distance, described later, is less than or equal to a threshold value $\ell$. Similarly, a phonological network $G^{[P]} = (V^{[P]}, E^{[P]})$ is constructed, where the nodes represent words transcribed into the International Phonetic Alphabet (IPA) and edges are defined if the DL distance is less than or equal to a given threshold $\ell$. To generate a multiplex language network at the word level, the orthographic and phonological networks are combined to form a two-layer network, denoted by $G_L^{[\alpha]} = (V^{[\alpha]}, E^{[\alpha]})$, with $\alpha = O, P$. Here, the adjacency matrix of the multiplex network is given by $a_{ij}^{[\alpha]}$, where $a_{ij}^{[\alpha]} = 1$ indicates that there is a link between node (word) $i$ and node (word) $j$ at layer $\alpha$. More formally, the adjacency matrix associated with each layer is defined as $a_{ij}^{[\alpha]} = \Theta(\ell - d(w_i^{[\alpha]}, w_j^{[\alpha]})) - \delta_{ij}$, where $\Theta(\cdot)$ represents the Heaviside function, $\delta_{ij}$ is the Kronecker delta (which removes self-links) and $d(w_i^{[\alpha]}, w_j^{[\alpha]})$ is the DL distance between word $i$ and word $j$ at layer $\alpha$.
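
The thresholding rule for the adjacency matrix can be illustrated with a minimal sketch. It assumes that the pairwise DL distances of each layer are already available as a square matrix; the helper name `layer_adjacency`, the parameter name `ell` for the threshold and the toy distance values are hypothetical and are not taken from the authors' code.

```python
def layer_adjacency(dist, ell):
    """Return a_ij = Theta(ell - d_ij) - delta_ij for one layer.

    Assumption: Theta(0) = 1, so d_ij <= ell produces a link; subtracting
    delta_ij cancels the self-link that d_ii = 0 would otherwise create.
    """
    n = len(dist)
    return [[(1 if dist[i][j] <= ell else 0) - (1 if i == j else 0)
             for j in range(n)] for i in range(n)]

# Hypothetical pairwise distances for three words in each layer
dist_orth = [[0, 1, 3],
             [1, 0, 2],
             [3, 2, 0]]
dist_phon = [[0, 2, 3],
             [2, 0, 1],
             [3, 1, 0]]

# Multiplex network: one adjacency matrix per layer over the same node set
multiplex = {
    "O": layer_adjacency(dist_orth, ell=1),
    "P": layer_adjacency(dist_phon, ell=1),
}
```

In this toy case the first two words are linked only in the orthographic layer and the last two only in the phonological layer, which is the kind of layer difference the multiplex representation is meant to expose.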

Fig 1. Construction of the multiplex language network.

Schematic illustration of the construction of a multiplex language network for English based on orthographic-distance and phonological-distance similarity networks. In the orthographic and phonological layers, nodes are words and there is a link if the Damerau-Levenshtein (DL) distance is smaller than a given threshold $\ell$. Notice that words in the phonological layer were transcribed into the International Phonetic Alphabet before the DL distance was calculated.

Regarding the distance condition between two words, as mentioned in the Introduction, the distance similarity between two strings A and B can be defined as the minimum number of edit operations needed to transform A into B. These operations are: (1) substitute a character in A with a different character, (2) insert a character into A, (3) delete a character of A, and (4) transpose two adjacent characters of A. The Damerau-Levenshtein (DL) distance is then defined as the length of the optimal edit sequence. By comparison, the Levenshtein distance is the length of the shortest sequence of substitutions, insertions, and deletions only needed to transform string A into string B. In our analysis, a link between two words is defined when their DL distance does not exceed a threshold value.
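
The four edit operations can be made concrete with a short dynamic-programming sketch. The text does not state which variant of the DL distance is used, so the code below assumes the restricted "optimal string alignment" variant, in which a transposed pair of characters is not edited again; the function name `dl_distance` is illustrative.

```python
def dl_distance(a, b):
    """Damerau-Levenshtein distance (optimal string alignment variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i          # delete all characters of a[:i]
    for j in range(cols):
        d[0][j] = j          # insert all characters of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                # transposition of two adjacent characters
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

For example, `dl_distance("form", "from")` is 1 because a single transposition suffices, whereas the plain Levenshtein distance between the same strings is 2.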

Databases

The corpora were constructed from written texts (books) freely available at Project Gutenberg (www.gutenberg.org). The texts were pre-processed to remove function words, stop words and any punctuation marks or symbols. The titles of the texts and the resulting corpora are described in https://doi.org/10.6084/m9.figshare.12735380.v4 [23]. The final corpora contain $10^4$ words for each of the four languages, together with their transliteration into the International Phonetic Alphabet (obtained with the epitran library under Python version 3.6.8).
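
A rough sketch of this kind of pre-processing is given below. It is not the authors' pipeline: the stop-word list is a tiny hypothetical placeholder, keeping only the first occurrence of each word is an assumption (so that each word becomes a single node), and the IPA transliteration step is left out.

```python
import re

# Hypothetical, deliberately tiny stop-word list, for illustration only
STOP_WORDS = {"the", "a", "of", "and", "to", "in"}

def build_word_list(text, stop_words=STOP_WORDS, max_words=10_000):
    """Lowercase the text, keep alphabetic tokens only, drop stop words,
    and return unique words in order of first appearance."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # letters only
    seen, words = set(), []
    for tok in tokens:
        if tok in stop_words or tok in seen:
            continue
        seen.add(tok)
        words.append(tok)
        if len(words) == max_words:
            break
    return words

words = build_word_list("The cat and the hat: a tale of the cat, in 3 parts.")
```

The `max_words` cap mirrors the fixed corpus size used in the study; how the real corpora were truncated or sampled is not specified in the text.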

Topological properties of single-layer and multiplex networks

Our initial analysis focuses on the basic topological characteristics of the two individual networks, and then proceeds to investigate similarities and differences between the two layers. The single-layer measures (for a network with N nodes) in a multiplex network that have been initially evaluated are [24]:

  • Density. The density of a layer $\alpha$, $\rho^{[\alpha]}$, is given as:
    $$\rho^{[\alpha]} = \frac{2 m^{[\alpha]}}{N(N-1)}, \tag{1}$$
    where $m^{[\alpha]}$ is the number of actual connections within the layer $\alpha$.
  • Degree distribution. The degree $k_i^{[\alpha]}$ of a node $i$ is the number of links outgoing (or incoming) to that node,
    $$k_i^{[\alpha]} = \sum_{j=1}^{N} a_{ij}^{[\alpha]}. \tag{2}$$
    The degree distribution for layer $\alpha$ is then defined as the fraction of nodes in the network with degree $k$,
    $$P^{[\alpha]}(k) = \frac{n_k^{[\alpha]}}{N}, \tag{3}$$
    where $n_k^{[\alpha]}$ is the number of nodes with degree $k$.
  • Clustering Coefficient. Measures the degree of transitivity in connectivity among the nearest neighbors of a node $i$ within the layer $\alpha$. $C_i^{[\alpha]}$ is calculated as [25],
    $$C_i^{[\alpha]} = \frac{2 E_i^{[\alpha]}}{k_i^{[\alpha]} \left(k_i^{[\alpha]} - 1\right)}, \tag{4}$$
    where $E_i^{[\alpha]}$ is the number of links between the $k_i^{[\alpha]}$ neighbors of the node $i$ within the layer $\alpha$.
  • Average Nearest-Neighbor Degree. Measures the average of the neighbors of a node [25]. The $\bar{k}_{nn,i}^{[\alpha]}$ is calculated as:
    $$\bar{k}_{nn,i}^{[\alpha]} = \frac{1}{k_i^{[\alpha]}} \sum_{j=1}^{N} a_{ij}^{[\alpha]} k_j^{[\alpha]}. \tag{5}$$
  • Modularity. Given $c_i^{[\alpha]}$ the community associated to the node $i$ within the layer $\alpha$, where $c_i^{[\alpha]} \in \{1, 2, \ldots, P\}$, with $P$ a natural number. The modularity, $Q^{[\alpha]}$, of a given layer $\alpha$ is given by [24]:
    $$Q^{[\alpha]} = \frac{1}{2 m^{[\alpha]}} \sum_{ij} \left( a_{ij}^{[\alpha]} - \frac{k_i^{[\alpha]} k_j^{[\alpha]}}{2 m^{[\alpha]}} \right) \delta\left(c_i^{[\alpha]}, c_j^{[\alpha]}\right), \tag{6}$$
    where $\delta$ is the Kronecker delta. We use the Louvain algorithm [26] to perform a greedy optimization of the modularity.
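
The single-layer measures listed above can be sketched in a few lines of plain Python. The sketch assumes an undirected layer with at least one link, stored as a dictionary of neighbor sets, and it takes the community assignment as given rather than optimizing it with the Louvain algorithm; the function name `layer_metrics` and the toy graph are illustrative.

```python
def layer_metrics(adj, community):
    """Evaluate Eqs. (1)-(6) for one layer.

    adj: dict node -> set of neighbours (undirected, no self-links)
    community: dict node -> community label (assumed to be given)
    """
    n = len(adj)
    degree = {i: len(nbrs) for i, nbrs in adj.items()}             # Eq. (2)
    m = sum(degree.values()) // 2                                   # number of links
    density = 2 * m / (n * (n - 1))                                 # Eq. (1)

    counts = {}
    for k in degree.values():
        counts[k] = counts.get(k, 0) + 1
    degree_dist = {k: c / n for k, c in counts.items()}            # Eq. (3)

    clustering, knn = {}, {}
    for i, nbrs in adj.items():
        k = degree[i]
        # E_i: links among the neighbours of i, each pair counted once
        e_i = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        clustering[i] = 2 * e_i / (k * (k - 1)) if k > 1 else 0.0   # Eq. (4)
        knn[i] = sum(degree[j] for j in nbrs) / k if k > 0 else 0.0  # Eq. (5)

    q = 0.0
    for i in adj:                                                    # Eq. (6)
        for j in adj:
            if community[i] == community[j]:
                a_ij = 1 if j in adj[i] else 0
                q += a_ij - degree[i] * degree[j] / (2 * m)
    q /= 2 * m

    return {"density": density, "degree": degree,
            "degree_distribution": degree_dist,
            "clustering": clustering, "knn": knn, "modularity": q}

# Toy layer: two triangles joined by one link, with one community per triangle
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {i: set() for i in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
result = layer_metrics(adj, community)
```

The double loop in the modularity sum scales quadratically with the number of nodes, so for corpora of the size considered here a sparse or library-based implementation would be preferable; the sketch only mirrors the formulas term by term.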

To gain further insight, we plan to characterize structural network properties and information-based quantities from the following perspectives: (i) presence of similarities across 4 linguistic families (Romance, Germanic, Slavic and Uralic), (ii) increase of the corpus size from $10^4$ to $50 \times 10^3$ words, and (iii) robustness of the networks. Regarding (i), we will analyze to what extent the topological properties of the single-layer and multiplex networks exhibit similarities and differences, quantified by means of correlation measures and information-theory-based metrics, for 12 natural languages which belong to 4 linguistic families. To reinforce the characterization of the grouping patterns of nodes in the network, we will consider multilayer community detection algorithms [27] to determine the presence of clusters across layers. These procedures will help us understand the local and global network properties of the orthographic-phonological variations across several languages. With respect to (ii), we plan to increase the size of the corpus to $50 \times 10^3$ words for all languages in our study. The results for this size will allow us to assess the validity of our preliminary results for $10^4$ words, and to evaluate the concordance of our findings with previous results. Concerning (iii), the robustness of the single-layer and multiplex networks will be evaluated by means of two well-recognized strategies: random removal of a fraction of nodes and edges, and directed attacks [28]. Moreover, randomized versions of the networks will also be considered in order to repeat all the calculations in our study.
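
The two node-removal strategies mentioned for the robustness analysis can be sketched as follows. The text does not say how robustness will be quantified, so the sketch assumes a common choice, the fraction of nodes left in the largest connected component; it also assumes that a directed attack removes the highest-degree nodes first, ranked once on the intact network. All function names are hypothetical.

```python
import random

def largest_component_fraction(adj, removed):
    """Fraction of the original nodes that belong to the largest connected
    component once the nodes in `removed` are deleted."""
    alive = set(adj) - set(removed)
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:                      # depth-first traversal
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb in alive and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best / len(adj)

def random_failure(adj, fraction, seed=0):
    """Pick a random fraction of the nodes (reproducible through `seed`)."""
    rng = random.Random(seed)
    return rng.sample(sorted(adj), round(fraction * len(adj)))

def targeted_attack(adj, fraction):
    """Pick the highest-degree nodes, ranked on the intact network."""
    ranked = sorted(adj, key=lambda node: len(adj[node]), reverse=True)
    return ranked[:round(fraction * len(adj))]

# Toy network: a star with hub 0 and nine leaves
star = {0: set(range(1, 10))}
star.update({leaf: {0} for leaf in range(1, 10)})
after_attack = largest_component_fraction(star, targeted_attack(star, 0.1))
```

On the star, removing the single hub already fragments the network into isolated nodes, whereas a random removal of the same size most likely hits a leaf and leaves the rest connected; edge removal would follow the same pattern with links in place of nodes.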

Initial analyses for 4 natural languages and $10^4$ words

We have started our analyses with 4 languages (Spanish, English, German and Russian), each with a corpus containing $10^4$ words. Table 1 summarizes the results of the calculations of the basic structural properties of the orthographic network, the phonological network and the multiplex one. These preliminary results for the topological features indicate that there are common properties at local and global scales. Interestingly, the average clustering for Spanish in the phonological layer with $\ell = 2$ is concordant with the value reported for phonological networks [16], where the authors used a different corpus and assumed an edge between words if they differ by a single phoneme or sound segment. To better understand the connectivity patterns in both layers, we constructed the degree distributions for threshold values of the DL distance ranging from 1 to 3. Fig 2 shows the degree distributions of $G^{[O]}$ and $G^{[P]}$ for Spanish, German, English and Russian and DL distances from 1 to 3. It is visually apparent that, for the 4 languages, as the DL distance threshold increases, the distributions change from an approximately exponential regime ($\ell = 1$) to a combination of exponential and power-law behavior ($\ell \geq 2$). It is likely that the best fit would be obtained by means of a truncated power-law function, which has been suggested to fit phonological networks [16]. In our initial estimation of the best fit of the distributions, we only consider the power-law behavior at the tails, $P(k) \sim k^{-\gamma}$, where $\gamma$ is an exponent which characterizes the connectivities. For instance, for the phonological layer with $\ell = 3$, the estimated $\gamma$-exponent of the power-law degree distribution for Spanish (1.12) is concordant with the value reported in [16].
Additional tests are needed to obtain a better description of the distributions, and of the behavior of the other topological metrics as a function of the degree.
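
One simple way to estimate a tail exponent of the form $P(k) \sim k^{-\gamma}$ is a least-squares line on log-log axes. The text does not describe the fitting procedure actually used, so the sketch below is only an assumption for illustration; the function name `fit_tail_exponent` and the cutoff `k_min` are hypothetical, and maximum-likelihood estimators are often preferred for heavy-tailed data.

```python
import math

def fit_tail_exponent(degree_dist, k_min=1):
    """Least-squares slope of log P(k) versus log k for k >= k_min.

    degree_dist: dict degree -> probability P(k)
    Returns gamma such that P(k) ~ k**(-gamma) on the fitted range.
    Needs at least two distinct degrees with nonzero probability.
    """
    pts = [(math.log(k), math.log(p)) for k, p in degree_dist.items()
           if k >= k_min and k > 0 and p > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return -sxy / sxx

# Synthetic check: an exact power law with exponent 2 over k = 1..50
raw = {k: k ** -2.0 for k in range(1, 51)}
norm = sum(raw.values())
synthetic = {k: p / norm for k, p in raw.items()}
gamma = fit_tail_exponent(synthetic)
```

On the synthetic distribution the recovered exponent is 2 up to rounding error; on empirical distributions the estimate depends on the choice of `k_min` and on the exponential regime at small degrees, which is one reason further tests are needed.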

Table 1. The basic topological network quantities are listed for the orthographic ($G^{[O]}$) and phonological ($G^{[P]}$) networks.

| Language | Metric | $G^{[O]}$, $\ell = 1$ | $G^{[O]}$, $\ell = 2$ | $G^{[O]}$, $\ell = 3$ | $G^{[P]}$, $\ell = 1$ | $G^{[P]}$, $\ell = 2$ | $G^{[P]}$, $\ell = 3$ |
|---|---|---|---|---|---|---|---|
| English | Density ($\times 10^{-3}$) | 0.60 | 1.98 | 10.23 | 0.86 | 3.18 | 15.42 |
| English | Average degree $\bar{k}$ | 2.03 | 13.74 | 91.53 | 1.24 | 3.41 | 16.78 |
| English | Clustering $\bar{c}$ | 0.12 | 0.22 | 0.29 | 0.03 | 0.12 | 0.21 |
| English | Nearest neighbor $\bar{k}_{nn}$ | 2.51 | 19.31 | 137.66 | 0.46 | 4.57 | 24.92 |
| English | Maximum modularity $Q$ | 0.91 | 0.55 | 0.35 | 0.99 | 0.77 | 0.49 |
| English | Fit exponent $\gamma$ | 2.57 ± 0.57 | 1.31 ± 0.25 | 1.02 ± 0.16 | 2.08 ± 0.56 | 1.22 ± 0.24 | 0.96 ± 0.18 |
| English | Average cluster size | 4.26 | 13.76 | 58.10 | 5.24 | 17.35 | 60.07 |
| German | Density ($\times 10^{-3}$) | 0.59 | 1.05 | 4.29 | 0.81 | 1.61 | 5.70 |
| German | Average degree $\bar{k}$ | 1.47 | 5.73 | 32.23 | 1.93 | 8.53 | 42.24 |
| German | Clustering $\bar{c}$ | 0.10 | 0.22 | 0.27 | 0.14 | 0.22 | 0.29 |
| German | Nearest neighbor $\bar{k}_{nn}$ | 1.72 | 7.77 | 48.76 | 2.35 | 11.36 | 61.48 |
| German | Maximum modularity $Q$ | 0.98 | 0.67 | 0.45 | 0.94 | 0.64 | 0.47 |
| German | Fit exponent $\gamma$ | 3.14 ± 0.48 | 1.82 ± 0.42 | 1.20 ± 0.21 | 2.47 ± 0.47 | 1.57 ± 0.37 | 1.15 ± 0.12 |
| German | Average cluster size | 2.94 | 8.05 | 21.83 | 3.68 | 8.56 | 23.49 |
| Russian | Density ($\times 10^{-3}$) | 0.64 | 0.65 | 2.68 | 0.74 | 0.65 | 2.16 |
| Russian | Average degree $\bar{k}$ | 1.22 | 3.73 | 22.27 | 1.24 | 3.42 | 16.78 |
| Russian | Clustering $\bar{c}$ | 0.06 | 0.19 | 0.25 | 0.07 | 0.19 | 0.25 |
| Russian | Nearest neighbor $\bar{k}_{nn}$ | 1.35 | 5.06 | 33.92 | 1.37 | 4.57 | 24.92 |
| Russian | Maximum modularity $Q$ | 0.99 | 0.74 | 0.43 | 0.99 | 0.78 | 0.5 |
| Russian | Fit exponent $\gamma$ | 4.17 ± 0.50 | 2.09 ± 0.35 | 1.31 ± 0.25 | 3.64 ± 0.34 | 2 ± 0.26 | 1.39 ± 0.22 |
| Russian | Average cluster size | 2.40 | 6.18 | 25.52 | 2.40 | 5.24 | 17.31 |
| Spanish | Density ($\times 10^{-3}$) | 0.52 | 0.86 | 4.14 | 0.52 | 1.11 | 5.59 |
| Spanish | Average degree $\bar{k}$ | 1.41 | 5.87 | 37.36 | 1.68 | 8.07 | 51.53 |
| Spanish | Clustering $\bar{c}$ | 0.06 | 0.21 | 0.27 | 0.09 | 0.23 | 0.28 |
| Spanish | Nearest neighbor $\bar{k}_{nn}$ | 1.66 | 8.18 | 56.69 | 2.01 | 11.28 | 77.20 |
| Spanish | Maximum modularity $Q$ | 0.98 | 0.65 | 0.41 | 0.95 | 0.59 | 0.39 |
| Spanish | Fit exponent $\gamma$ | 3.26 ± 0.53 | 1.90 ± 0.46 | 1.19 ± 0.31 | 2.98 ± 0.49 | 1.77 ± 0.43 | 1.12 ± 0.30 |
| Spanish | Average cluster size | 2.87 | 10.21 | 60.49 | 3.36 | 12.72 | 83.75 |

Notes. Topological metrics of the orthographic network and the phonological network. Here we present the average values of the degree ($k_i$), clustering ($c_i$) and nearest-neighbor degree ($k_{nn,i}$). We observe that the density, $\bar{k}$, $\bar{c}$ and $\bar{k}_{nn}$ increase with the threshold $\ell$ for the four languages and the two layers, with some similarities, as occurs for $\bar{c}$ in both layers at distances $\ell = 2, 3$. The modularity and the average cluster size exhibit opposite trends: while the modularity decreases as $\ell$ increases, the average cluster size increases because a larger number of nodes tend to be connected to a giant component.

Fig 2. Degree distributions for phonological and orthographic networks and several DL distance thresholds.

a) Phonological (English). b) Phonological (Spanish). c) Orthographic (English). d) Orthographic (Spanish). For a better comparison of the data, the insets of each plot show the corresponding degree distribution for normalized degrees, where k* = max(log(k)).

Proposed timeline

The proposed study requires at most 3 months to complete (starting Dec. 1, 2020). We plan to build the corpora of 12 new languages and to enlarge the existing ones to $50 \times 10^3$ words. This stage is planned to conclude in one month, after which we will immediately carry out the pre-processing for the transliteration of all the corpora into the International Phonetic Alphabet. Then we will proceed with the calculation of the metrics of the orthographic, phonological and multiplex networks. Finally, we plan to finish data interpretation and draft the final manuscript in the following two months.

Supporting information

S1 File

(TXT)

Data Availability

All corpora used in this study are available from the https://doi.org/10.6084/m9.figshare.12735380.v4 database.

Funding Statement

This work was partially supported by programs EDI and COFAA from Instituto Politécnico Nacional and Consejo Nacional de Ciencia y Tecnología, México. No additional external funding was received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Solé RV, Corominas-Murtra B, Valverde S, Steels L. Language networks: Their structure, function, and evolution. Complexity. 2010;15(6):20–26.
  • 2. Amato R, Lacasa L, Díaz-Guilera A, Baronchelli A. The dynamics of norm change in the cultural evolution of language. Proceedings of the National Academy of Sciences. 2018;115(33):8260–8265. doi: 10.1073/pnas.1721059115
  • 3. Hernández-Gómez C, Basurto-Flores R, Obregón-Quintana B, Guzmán-Vargas L. Evaluating the Irregularity of Natural Languages. Entropy. 2017;19(10):521. doi: 10.3390/e19100521
  • 4. Ferrer-i Cancho R, Bentz C, Seguin C. Optimal coding and the origins of Zipfian laws. Journal of Quantitative Linguistics. 2020; p. 1–30.
  • 5. Seoane LF, Solé R. The morphospace of language networks. Scientific Reports. 2018;8(1):1–14.
  • 6. Corominas-Murtra B, Sànchez Fibla M, Valverde S, Solé R. Chromatic transitions in the emergence of syntax networks. Royal Society Open Science. 2018;5(12):181286. doi: 10.1098/rsos.181286
  • 7. Jiang J, Yu W, Liu H. Does scale-free syntactic network emerge in second language learning? Frontiers in Psychology. 2019;10:925.
  • 8. Garcia D, Garas A, Schweitzer F. Positive words carry less information than negative words. EPJ Data Science. 2012;1(1):3. doi: 10.1140/epjds3
  • 9. de Arruda HF, Marinho VQ, Costa LdF, Amancio DR. Paragraph-based representation of texts: A complex networks approach. Information Processing & Management. 2019;56(3):479–494. doi: 10.1016/j.ipm.2018.12.008
  • 10. Baeza-Blancas E, Obregón-Quintana B, Hernández-Gómez C, Gómez-Meléndez D, Aguilar-Velázquez D, Liebovitch LS, et al. Recurrence networks in natural languages. Entropy. 2019;21(5):517. doi: 10.3390/e21050517
  • 11. Stella M. Multiplex networks quantify robustness of the mental lexicon to catastrophic concept failures, aphasic degradation and ageing. Physica A: Statistical Mechanics and its Applications. 2020; p. 124382.
  • 12. Martinčić-Ipšić S, Margan D, Meštrović A. Multilayer network of language: A unified framework for structural analysis of linguistic subsystems. Physica A: Statistical Mechanics and its Applications. 2016;457:117–128. doi: 10.1016/j.physa.2016.03.082
  • 13. Trautwein J, Schroeder S. Orthographic Networks in the Developing Mental Lexicon. Insights From Graph Theory and Implications for the Study of Language Processing. Frontiers in Psychology. 2018;9:2252. doi: 10.3389/fpsyg.2018.02252
  • 14. Arbesman S, Strogatz SH, Vitevitch MS. Comparative Analysis of Networks of Phonologically Similar Words in English and Spanish. Entropy. 2010;12(3):327. doi: 10.3390/e12030327
  • 15. Vitevitch MS, Luce PA. Phonological neighborhood effects in spoken word perception and production. Annual Review of Linguistics. 2016;2:75–94.
  • 16. Arbesman S, Strogatz SH, Vitevitch MS. The structure of phonological networks across multiple languages. International Journal of Bifurcation and Chaos. 2010;20(03):679–685. doi: 10.1142/S021812741002596X
  • 17. Barigozzi M, Fagiolo G, Garlaschelli D. Multinetwork of international trade: A commodity-specific analysis. Physical Review E. 2010;81(4):046104. doi: 10.1103/PhysRevE.81.046104
  • 18. Bargigli L, Di Iasio G, Infante L, Lillo F, Pierobon F. The multiplex structure of interbank networks. Quantitative Finance. 2015;15(4):673–691. doi: 10.1080/14697688.2014.968356
  • 19. Poledna S, Molina-Borboa JL, Martínez-Jaramillo S, Van Der Leij M, Thurner S. The multi-layer network nature of systemic risk and its implications for the costs of financial crises. Journal of Financial Stability. 2015;20:70–81. doi: 10.1016/j.jfs.2015.08.001
  • 20. Gomez S, Diaz-Guilera A, Gomez-Gardenes J, Perez-Vicente CJ, Moreno Y, Arenas A. Diffusion dynamics on multiplex networks. Physical Review Letters. 2013;110(2):028701. doi: 10.1103/PhysRevLett.110.028701
  • 21. Gómez-Gardenes J, Reinares I, Arenas A, Floría LM. Evolution of cooperation in multiplex networks. Scientific Reports. 2012;2:620. doi: 10.1038/srep00620
  • 22. Nicosia V, Valencia M, Chavez M, Díaz-Guilera A, Latora V. Remote synchronization reveals network symmetries and functional modules. Physical Review Letters. 2013;110(17):174102. doi: 10.1103/PhysRevLett.110.174102
  • 23. Lara-Martínez PA, Obregón-Quintana B, Reyes-Manzano F, López-Rodríguez I, Guzmán-Vargas L. Data Comparing phonetic and orthographic networks: A multiplex analysis; 2020. Available from: https://figshare.com/articles/dataset/Table_Comparing_phonetic_and_orthographic_networks_a_multiplex_analysis_pdf/12735380.
  • 24. Bianconi G. Multilayer networks: structure and function. Oxford University Press; 2018.
  • 25. Newman MEJ. Networks: An Introduction. Oxford University Press; 2010.
  • 26. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment. 2008;2008(10):P10008. doi: 10.1088/1742-5468/2008/10/P10008
  • 27. De Bacco C, Power EA, Larremore DB, Moore C. Community detection, link prediction, and layer interdependence in multilayer networks. Phys Rev E. 2017;95:042317. doi: 10.1103/PhysRevE.95.042317
  • 28. Barabási AL, Stanley HE. Fractal Concepts in Surface Growth. Cambridge University Press; 1995.

Decision Letter 0

Diego Raphael Amancio

11 Nov 2020

PONE-D-20-23977

Comparing phonological and orthographic networks: a multiplex analysis

PLOS ONE

Dear Dr. Guzmán-Vargas,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please notice that only comments that are applicable to a Registered Report Protocol should be addressed. You can consider additional suggestions when preparing a Registered Report.

Please submit your revised manuscript by Dec 10 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Diego Raphael Amancio

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

3.  Thank you for stating the following in the Acknowledgments Section of your manuscript:

"This work was partially supported by COFAA-IPN, EDI-IPN, and Conacyt-Mexico.".

i) We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

ii) Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

 "NO".

 iii) Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

4. Thank you for stating the following in your Competing Interests section: 

"NO".

i) Please complete your Competing Interests on the online submission form to state any Competing Interests. If you have no competing interests, please state "The authors have declared that no competing interests exist.", as detailed online in our guide for authors at http://journals.plos.org/plosone/s/submit-now

 ii) This information should be included in your cover letter; we will change the online submission form on your behalf.

5. PLOS requires an ORCID iD for the corresponding author in Editorial Manager on papers submitted after December 6th, 2016. Please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. Please see the following video for instructions on linking an ORCID iD to your Editorial Manager account: https://www.youtube.com/watch?v=_xcclfuvtxQ

Additional Editor Comments (if provided):

The authors should address only the comments that are applicable for a Registered Report Protocol.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Does the manuscript provide a valid rationale for the proposed study, with clearly identified and justified research questions?

The research question outlined is expected to address a valid academic problem or topic and contribute to the base of knowledge in the field.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Is the protocol technically sound and planned in a manner that will lead to a meaningful outcome and allow testing the stated hypotheses?

The manuscript should describe the methods in sufficient detail to prevent undisclosed flexibility in the experimental procedure or analysis pipeline, including sufficient outcome-neutral conditions (e.g. necessary controls, absence of floor or ceiling effects) to test the proposed hypotheses and a statistical power analysis where applicable. As there may be aspects of the methodology and analysis which can only be refined once the work is undertaken, authors should outline potential assumptions and explicitly describe what aspects of the proposed analyses, if any, are exploratory.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Is the methodology feasible and described in sufficient detail to allow the work to be replicable?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors described where all data underlying the findings will be made available when the study is complete?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception, at the time of publication. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above and, if applicable, provide comments about issues authors must address before this protocol can be accepted for publication. You may also include additional comments for the author, including concerns about research or publication ethics.

You may also provide optional suggestions and comments to authors that they might find helpful in planning their study.

(Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The study proposed by the authors in this report is interesting and could yield interesting results. The authors do a good job in motivating their project. The manuscript is well written barring some minor typos (see below).

The authors may consider adopting some multilayer community detection algorithms like the ones described in [1] for example to find more interesting clusters across layers.

Typo on line 79 on page 3 - A & B should be V & W.

Reference:

[1] De Bacco, Caterina, et al. "Community detection, link prediction, and layer interdependence in multilayer networks." Physical Review E 95.4 (2017): 042317.

Reviewer #2: The manuscript ‘Comparing phonological and orthographic networks: a multiplex analysis’ by Pablo Lara-Martinez, et al. presents several graph-theoretic statistics and metrics of networks built from orthographic and phonological corpora of words for four languages (English, German, Russian, and Spanish). The authors assign words to nodes and use the Damerau-Levenshtein distance to create edges between nodes depending on their distance. Several networks are created with different distance cutoffs, and the statistics are listed for each network. The authors also propose a timeline to expand the size of the corpora and repeat the experiments, but this part of the analysis is not included in the manuscript.
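The construction summarized above (words as nodes, edges set by a Damerau-Levenshtein cutoff) can be illustrated with a minimal sketch. It is not the authors' code: it assumes the restricted "optimal string alignment" variant of the distance, assumes that two words are linked when their distance is at most the cutoff, and uses hypothetical names (`osa_distance`, `build_edges`) and a toy word list.

```python
from itertools import combinations


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts insertions, deletions, substitutions and transpositions of
    adjacent characters. Assumption: this variant stands in for the DL
    distance named in the manuscript.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i leading characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j leading characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def build_edges(words, cutoff=1):
    """Return the set of word pairs whose distance is at most `cutoff`.

    Assumption: "at most" is one reading of the cutoff; an "exactly equal"
    rule would only change the comparison below.
    """
    unique = sorted(set(words))
    return {
        (u, v)
        for u, v in combinations(unique, 2)
        if osa_distance(u, v) <= cutoff
    }


if __name__ == "__main__":
    toy_words = ["cat", "act", "cut", "dog"]  # hypothetical toy corpus
    print(sorted(build_edges(toy_words, cutoff=1)))
```

The all-pairs loop is quadratic in the number of words, which is consistent with the remark further down that computing these distances for a large corpus is not trivial.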

The main issue with this manuscript is that it does not include a discussion or interpretation of the results. Even though the methods are described and results are listed, there are no scientific claims, there is no analysis of the results, there is no real comparison to relevant work, and there is no discussion of the importance of the results or their significance to the field.

The manuscript is unsuitable for publication as a scientific article in its present form but could be published if a discussion section is included.

I also have minor suggestions:

• Make the networks created with the different distance cutoffs publicly available. Calculating the DL distance is not necessarily trivial and this dataset would be useful to other researchers, who can use it to compare their work or methods to the current ones.

• In the text, define DL as ‘Damerau-Levenshtein’ before using the acronym.

• I did not identify any connection in the results listed between the phonological and the orthographic networks other than 1 to 1, so I don’t think this is rigorously speaking multiplex.

• The caption of Fig. 1 states that the illustration is for Spanish but the words are in English in the actual figure.

• Although it is understandable, V should be defined for completeness before Eq. 3.

• M is used in the definition of the degree of a node right before Eq. 3, but M is never defined. From inspection it seems like M is the multiplex layer, but this should be stated clearly to make it easier for the reader.

• I did not understand the definition of the modularity (Eq. 6). It says that the partition includes elements C_1, …, C_M. In this case M is subscript. C is defined as the clustering coefficient in Eq. 4. The set S is not really defined. So this needs more work.

• The results in Fig. 2 might not be apples to apples because the highest degree depends on the DL distance cutoff. I suggest normalizing the horizontal axis by the highest degree in each network.

• I suggest removing Fig. 3 and mentions of specific start dates and due dates. This is not really part of a scientific paper.
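The normalization suggested for Fig. 2 can be sketched as follows. This is only an illustration under stated assumptions: the function name `normalized_degree_distribution` is hypothetical, the network is given as a plain list of undirected edges, and isolated words (nodes with no edges) are left out of the count.

```python
from collections import Counter


def normalized_degree_distribution(edges):
    """Map k / k_max to the fraction of nodes having degree k.

    `edges` is an iterable of undirected (u, v) pairs. Nodes that appear in
    no edge are not counted (an assumption of this sketch).
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    if not degree:
        return {}
    k_max = max(degree.values())       # highest degree in this network
    counts = Counter(degree.values())  # number of nodes per degree value
    n = len(degree)
    return {k / k_max: c / n for k, c in sorted(counts.items())}


if __name__ == "__main__":
    toy_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]  # toy data
    for x, p in normalized_degree_distribution(toy_edges).items():
        print(f"k/k_max = {x:.2f}  P = {p:.2f}")
```

Because every network is rescaled by its own highest degree, the horizontal axis runs from 0 to 1 for each distance cutoff, which makes curves for different cutoffs easier to compare.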

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Feb 1;16(2):e0245263. doi: 10.1371/journal.pone.0245263.r002

Author response to Decision Letter 0


25 Nov 2020

Reviewer #1

“The study proposed by the authors in this report is interesting and could yield interesting results. The authors do a good job in motivating their project. The manuscript is well written barring some minor typos (see below). The authors may consider adopting some multilayer community detection algorithms like the ones described in [1] for example to find more interesting clusters across layers.

Typo on line 79 on page 3 - A & B should be V & W.

Reference:

[1] De Bacco, Caterina, et al. "Community detection, link prediction, and layer interdependence in multilayer networks." Physical Review E 95.4 (2017): 042317.”

Response: Thank you for your comments. We have added the application of the suggested multilayer community detection method to the revised version of our Protocol (Lines 129-131), which will enrich our discussion of the potential grouping of nodes (words) in the context of the two-layer network. The typo was corrected.

Reviewer #2

1) Make the networks created with the different distance cutoffs publicly available. Calculating the DL distance is not necessarily trivial and this dataset would be useful to other researchers, who can use it to compare their work or methods to the current ones.

Response: Thank you for your suggestion. We have added the corresponding networks (with 10⁴ nodes) for each corpus to our dataset in the Figshare repository. We plan to update the dataset once we have concluded our study.

2) In the text, define DL as ‘Damerau-Levenshtein’ before using the acronym.

Response: Corrected in the revised version (Line 23).

3) I did not identify any connection in the results listed between the phonological and the orthographic networks other than 1 to 1, so I don’t think this is rigorously speaking multiplex.

Response: We plan to address the multiplex features in the extended paper.

4) The caption of Fig. 1 states that the illustration is for Spanish but the words are in English in the actual figure.

Response: The error was corrected.

5) Although it is understandable, V should be defined for completeness before Eq. 3.

Response: We have clarified the network notation and the meaning of the symbols (see Lines 99-121).

6) M is used in the definition of the degree of a node right before Eq. 3, but M is never defined. From inspection it seems like M is the multiplex layer, but this should be stated clearly to make it easier for the reader.

Response: We have corrected the description of the network’s notation (see Lines 99-121).

7) I did not understand the definition of the modularity (Eq. 6). It says that the partition includes elements C_1, …, C_M. In this case M is subscript. C is defined as the clustering coefficient in Eq. 4. The set S is not really defined. So this needs more work.

Response: Thank you for your comment. In the revised version of the manuscript we have clarified our notation regarding the modularity definition (see Lines 117-121).

8) The results in Fig. 2 might not be apples to apples because the highest degree depends on the DL distance cutoff. I suggest normalizing the horizontal axis by the highest degree in each network.

Response: We have added the suggested normalization of the horizontal axis in the new version of Fig. 2.

9) I suggest removing Fig. 3 and mentions of specific start dates and due dates. This is not really part of a scientific paper.

Response: Fig. 3 was removed. The dates in the main text were updated.

Attachment

Submitted filename: ResponseToReviewers.pdf

Decision Letter 1

Diego Raphael Amancio

28 Dec 2020

Comparing phonological and orthographic networks: a multiplex analysis

PONE-D-20-23977R1

Dear Dr. Guzmán-Vargas,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Diego Raphael Amancio

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Does the manuscript provide a valid rationale for the proposed study, with clearly identified and justified research questions?

The research question outlined is expected to address a valid academic problem or topic and contribute to the base of knowledge in the field.

Reviewer #2: Yes

**********

2. Is the protocol technically sound and planned in a manner that will lead to a meaningful outcome and allow testing the stated hypotheses?

The manuscript should describe the methods in sufficient detail to prevent undisclosed flexibility in the experimental procedure or analysis pipeline, including sufficient outcome-neutral conditions (e.g. necessary controls, absence of floor or ceiling effects) to test the proposed hypotheses and a statistical power analysis where applicable. As there may be aspects of the methodology and analysis which can only be refined once the work is undertaken, authors should outline potential assumptions and explicitly describe what aspects of the proposed analyses, if any, are exploratory.

Reviewer #2: Yes

**********

3. Is the methodology feasible and described in sufficient detail to allow the work to be replicable?

Reviewer #2: Yes

**********

4. Have the authors described where all data underlying the findings will be made available when the study is complete?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception, at the time of publication. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above and, if applicable, provide comments about issues authors must address before this protocol can be accepted for publication. You may also include additional comments for the author, including concerns about research or publication ethics.

You may also provide optional suggestions and comments to authors that they might find helpful in planning their study.

(Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: The authors satisfactorily addressed in their revised manuscript the points raised in the first round of reviews and I recommend publication of the manuscript.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Acceptance letter

Diego Raphael Amancio

7 Jan 2021

PONE-D-20-23977R1

Comparing phonological and orthographic networks: a multiplex analysis

Dear Dr. Guzmán-Vargas:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Diego Raphael Amancio

Academic Editor

PLOS ONE


Articles from PLoS ONE are provided here courtesy of PLOS
