Abstract
A multidisciplinary approach to malaria research remains important to the national malaria control program in Benin. To unravel the mechanisms underlying the establishment of research collaboration ties in malaria research in Benin, we model the complexity and dynamics of the country's malaria research co-authorship network using Exponential Random Graph Models (ERGMs) and Temporal ERGMs (TERGMs). The network contains co-authorship information from January 1996 to December 2016. We fit ERGMs and TERGMs to the network as a function of nodal, dyadic, and structural terms, accounting for important principles of graph theory such as homophily and structural equivalence. The final ERGM and TERGM showed that collaboration ties in malaria research in Benin are driven by homophily on the type of affiliation and by membership in a research group. Our study is one of a few to take a model-based approach to the study of co-authorship networks.
Introduction
The field of malaria research has seen a tremendous increase over the last 20 years in the number of published materials and the number of researchers involved. This increase in scientific collaboration enabled research that led to important milestones, such as the successful implementation and sustainability of entomological surveillance of malaria for more than six years since 2008 [1], an annual decrease of 5.2% in the incidence of malaria, and an annual decrease of 5.3% in malaria-related deaths [2] in 2012. Despite these achievements, malaria remains a major public health concern, as it is the leading reason for hospital visits in Benin. Among other reasons, the lack of information on how research collaborations are established limits the discovery of new knowledge and its translation into public health policies.
Scientific collaboration can be studied via co-authorship analysis. Based on graph theory, the analysis of a co-authorship network describes the relationships and interactions among authors within a scientific research field. Many co-authorship studies have investigated such interactions among researchers within specific scientific disciplines such as biomedical research, physics, computer science [3], information systems [4], Chagas disease research [5], and even reproductive biology [6]. Other studies have applied network analysis to identify active groups driving the research effort in a specific country across a scientific discipline [7,8]. Despite the variety of co-authorship studies, most existing studies are limited in terms of the methods used, which are mostly descriptive in nature. Since the publication of the first network analysis paper, tremendous progress has been made in developing new methods to model co-authorship networks, including methods that model network evolution over time. Despite such developments, very few applications of these methods are reported in the literature. In particular, methods for studying long-term co-authorship interactions are available but remain rarely applied to modeling the complexity of such networks. In this study, we propose to model the complex structure of the malaria collaborative network and its dynamics using advanced statistical models for network data. Compared with mathematical models of network data, statistical models are well suited to modeling the nodal, dyadic, and structural variables that influence collaboration tie formation. They can also be fit to the observed network and enable model assessment [9].
Co-authorship networks are dynamic in nature. New authors can appear in the network while old authors may disappear, and new collaborations may be initiated while old ones may cease to exist [10]. With the development of new network analysis methods [9], the dynamics of tie formation can be modeled. For this reason, our approach consists of finding static and temporal structural patterns to better explain the evolution of tie formation in the scientific malaria research community in Benin.
We hypothesized that tie formation in the malaria co-authorship network (i) depends on certain characteristics of the authors, and (ii) is determined by collaboration type and/or membership in a certain research community or class. To test these hypotheses, we applied models from the Exponential Random Graph family, also referred to as p* models, to our malaria co-authorship network. The purpose of this approach to network modeling is to unveil the structural patterns driving collaboration tie formation in the malaria co-authorship network in Benin.
Methods
Data
Our research used secondary data collected through a systematic literature search. The data collection was carried out on papers indexed in Thomson's Institute for Scientific Information Web of Science (WOS) (formerly known as the Web of Knowledge). The search was conducted using combinations of malaria-related MeSH terms including “malaria”, “Anopheles”, “Plasmodium” and “vector”. We restricted the search to the period from 1996 to 2016 and to “Benin” for country. We further screened the papers in order to select only those published by Beninese authors, or papers published on malaria involving at least one author affiliated with a Beninese research institution. No restriction was placed upon the document types. We first queried WOS using each term independently, and then combined the terms so that the query returned the maximum number of results. The full citation information, containing the authors’ names, their institutional affiliations, the year of publication, and the number of times the document was cited, was recorded as a bibliographic corpus in text format. After a second screening, only documents that met the inclusion criteria listed above and that were published between January 1, 1996 and December 31, 2016 were included in this study.
Text Mining and Network generation
From each bibliographic text file, we constructed a corpus of the published documents using Tethne v0.8 [11], a Python library for parsing bibliographic data. Using NetworkX [12], another Python library, we generated an undirected multigraph co-authorship network containing parallel edges. Vertices were defined by several attributes including name, affiliation, city, country, number of publications, and total number of times cited. Edges also had attributes associated with them, such as a unique identifier, the number of times a pair of authors was cited, and the number of publications of a pair of authors. We normalized and disambiguated the information collected, such as researchers’ names, research center denominations, and any other information that appeared ambiguous.
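As a minimal sketch of the construction described above, the snippet below builds the parallel-edge (multigraph) edge list from a toy corpus and collapses it into a weighted edge list. It uses only the Python standard library as a stand-in for the NetworkX multigraph; the `papers` data and both function names are hypothetical and are not taken from the study's code.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each paper is the list of its (disambiguated) author IDs.
papers = [
    ["A", "B", "C"],
    ["A", "B"],
    ["C", "D"],
]

def coauthor_multigraph(papers):
    """Return one undirected edge per co-authoring pair per paper (parallel edges kept)."""
    edges = []
    for authors in papers:
        # sorted() gives every undirected pair a single canonical orientation
        for u, v in combinations(sorted(set(authors)), 2):
            edges.append((u, v))
    return edges

def collapse_to_weighted(edges):
    """Merge parallel edges; the weight is the number of joint papers."""
    return Counter(edges)

multi_edges = coauthor_multigraph(papers)
weighted = collapse_to_weighted(multi_edges)
```

Here the pair A and B co-authors two papers, so it contributes two parallel edges to the multigraph and a single edge of weight 2 to the weighted graph.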
Author Name Disambiguation
One common challenge in collecting bibliometric data is the matching problem: multiple names can refer to the same author. A well-known approach to solving this issue is Author Name Disambiguation (AND). While many AND methods have been reported in the literature [13], we used a fuzzy-matching machine learning approach. We used Dedupe, a Python library, to disambiguate authors’ names and assign a unique identification number to each author. We manually annotated 10% of the names and then trained the algorithm to automatically disambiguate the remaining entries. Dedupe is interactive and adjusts further annotations as the disambiguation process evolves. Dedupe is based on the work of Bilenko [14] and was developed by Forest Gregg and Derek Eder. We evaluated our fuzzy-matching AND method by computing precision and recall metrics.
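The evaluation step can be illustrated with a small sketch. It assumes pairwise precision and recall, that is, metrics computed over pairs of name records assigned to the same author; the text does not state which variant was used, so this is an assumption. The record names, labels, and function names are hypothetical, and the code does not call Dedupe.

```python
from itertools import combinations

def same_author_pairs(labels):
    """All unordered record pairs that a labelling assigns to the same author."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(labels), 2)
        if labels[a] == labels[b]
    }

def pairwise_precision_recall(predicted, truth):
    """Precision and recall of predicted same-author pairs against annotated pairs."""
    pred_pairs = same_author_pairs(predicted)
    true_pairs = same_author_pairs(truth)
    tp = len(pred_pairs & true_pairs)
    precision = tp / len(pred_pairs) if pred_pairs else 1.0
    recall = tp / len(true_pairs) if true_pairs else 1.0
    return precision, recall

# Hypothetical name records mapped to author IDs (manual annotation vs. algorithm output).
truth = {"Doe J": 1, "Doe, John": 1, "J. Doe": 1, "Roe A": 2, "Roe, Ann": 2}
predicted = {"Doe J": 10, "Doe, John": 10, "J. Doe": 11, "Roe A": 12, "Roe, Ann": 12}

precision, recall = pairwise_precision_recall(predicted, truth)
```

In this toy case every predicted match is correct (precision 1.0), but the algorithm misses two of the four true pairs (recall 0.5), the same pattern of high precision and lower recall that a conservative matcher tends to produce.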
Stochastic Block Model
Blockmodeling is a statistical method for identifying, in a given network, clusters or classes of authors that share structural characteristics [15,16]. Each such cluster forms a position, and the units within a cluster have the same or similar connection patterns. Given a graph $G = (V, E)$ with adjacency matrix $Y$, the block model defined by Kolaczyk and Csárdi [17] specifies that, for two distinct nodes $i, j \in V$, each element $y_{ij}$ of $Y$ is conditional on the class labels $q$ and $r$ of the vertices $i$ and $j$. The model has the form:
$$P_\theta(Y = y) = \frac{1}{\kappa} \exp\left\{ \sum_{q,r} \theta_{qr} L_{qr}(y) \right\} \tag{1}$$
where $L_{qr}(y)$ is the number of edges in the observed graph $y$ connecting vertices of classes $q$ and $r$, $\theta_{qr}$ is the corresponding parameter, and $\kappa$ is a normalization constant defined as $\kappa = \sum_{y} \exp\left\{ \sum_{q,r} \theta_{qr} L_{qr}(y) \right\}$, with the outer sum taken over all possible graphs on $V$.
The stochastic block model (SBM) originated from the idea that equivalent units can be grouped together. There are three definitions of equivalence: structural, automorphic, and regular [10]. In practice, the differences between types of equivalence tend to blur when stochastic block modeling is applied to real networks. Here, we used the SBM as a model-based clustering technique. After fitting the SBM, we extracted the posterior probabilities of class membership and assigned each vertex to a class based on the maximum a posteriori criterion. Class membership was added to the network as an additional nodal attribute. The R package mixer was used to fit the SBM. Mixer uses the Integrated Classification Likelihood (ICL) criterion to select the number of classes fit to the observed network.
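The maximum a posteriori assignment and the block edge counts that enter the block model can be sketched as follows. This is an illustration in plain Python, not the mixer implementation; the posterior probabilities, the edge list, and the function names are hypothetical.

```python
def map_classes(posterior):
    """Assign each vertex to the class with the highest posterior probability.

    posterior[v] is a list of class-membership probabilities for vertex v.
    """
    return {v: max(range(len(p)), key=p.__getitem__) for v, p in posterior.items()}

def block_edge_counts(edges, classes):
    """Count observed edges joining each unordered pair of classes (q, r)."""
    counts = {}
    for u, v in edges:
        key = tuple(sorted((classes[u], classes[v])))
        counts[key] = counts.get(key, 0) + 1
    return counts

# Hypothetical posterior class probabilities for four authors and two classes.
posterior = {"a": [0.9, 0.1], "b": [0.8, 0.2], "c": [0.3, 0.7], "d": [0.1, 0.9]}
edges = [("a", "b"), ("c", "d"), ("b", "c")]

classes = map_classes(posterior)
block_counts = block_edge_counts(edges, classes)
```

The resulting `classes` dictionary is what would be attached to the network as the additional nodal attribute, and `block_counts` plays the role of the edge counts between classes.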
Exponential Random Graph Model
Also referred to as p* models, Exponential Random Graph Models (ERGMs) are probability models for networks designed in analogy to Generalized Linear Models (GLMs) [17]. ERGMs have gained increasing interest, especially for modeling social networks. Robins et al. [18] provide an accessible introduction to ERGMs as well as a general framework for ERGM construction, which we closely followed here. We used ERGMs to investigate how local processes affect collaboration tie formation between authors in our network. We modeled the network ties, the dependent variable, as a function of nodal and dyadic attributes (covariates) such as the number of times an author was cited, the number of publications, the number of collaborators, and the collaboration type, as well as community membership as determined by the SBM. Given a random graph $G = (V, E)$, for two distinct nodes $i, j \in V$, we define a random binary variable $Y_{ij}$ such that $Y_{ij} = 1$ if there is an edge $e \in E$ between $i$ and $j$, and $Y_{ij} = 0$ otherwise. Since co-authorship networks are by definition undirected, $Y_{ij} = Y_{ji}$ and the matrix $Y = [Y_{ij}]$ represents the random adjacency matrix for $G$. The general formulation of the ERGM is therefore:
$$P_\theta(Y = y) = \frac{1}{\kappa} \exp\left\{ \sum_{H} \theta_H \, g_H(y) \right\} \tag{2}$$
where each $H$ is a configuration, that is, a set of possible edges among a subset of the vertices in $G$, and $g_H(y) = \prod_{y_{ij} \in H} y_{ij}$ is the network statistic corresponding to the configuration $H$; $g_H(y) = 1$ if the configuration is observed in the network $y$, and is 0 otherwise. $\theta_H$ is the parameter corresponding to the configuration $H$ (and is non-zero only if all pairs of variables in $H$ are assumed to be conditionally dependent); $\kappa$ is a normalization constant defined as $\kappa = \sum_{y} \exp\left\{ \sum_{H} \theta_H \, g_H(y) \right\}$, with the outer sum taken over all possible graphs on $V$.
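To make the role of the normalization constant concrete, the sketch below enumerates every graph on three vertices and computes exact ERGM probabilities for a model with an edge statistic and a triangle statistic. It is a brute-force illustration that is only feasible for tiny graphs and is unrelated to how the ergm package estimates models; the function names are hypothetical, and the value −2.76 is borrowed from the edge-only model purely as an example.

```python
from itertools import combinations, product
from math import exp

nodes = [0, 1, 2]
dyads = list(combinations(nodes, 2))  # the three possible undirected edges

def stats(y):
    """Network statistics (edge count, triangle count) for a 3-node graph."""
    n_edges = sum(y)
    n_triangles = 1 if n_edges == 3 else 0
    return (n_edges, n_triangles)

def ergm_distribution(theta):
    """Exact ERGM probabilities obtained by enumerating every graph on the dyads."""
    graphs = list(product([0, 1], repeat=len(dyads)))
    weights = [exp(sum(t * g for t, g in zip(theta, stats(y)))) for y in graphs]
    kappa = sum(weights)  # normalization constant
    return {y: w / kappa for y, w in zip(graphs, weights)}

theta_edge = -2.76  # example value only
dist = ergm_distribution((theta_edge, 0.0))

# Marginal probability that the first dyad (0, 1) carries an edge.
p_edge = sum(p for y, p in dist.items() if y[0] == 1)
```

With the triangle parameter set to zero, the model reduces to independent ties, and the tie probability equals the inverse logit of the edge parameter. For a network of 1,792 vertices the same enumeration is impossible, which is why such models are estimated by simulation-based methods.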
In order to obtain the best model, several models containing nodal, dyadic, and structural terms were fit to the observed network data. The first model we fit is a naive model containing only the ERGM “edge” term. This model is simply the Bernoulli random graph model [19]. We then fit another model containing only nodal and/or dyadic terms. Third, we fit a structural model containing only high-order terms representing network statistics such as triangles, k-stars, the geometrically weighted edge-wise shared partner distribution, and others [17,18]. Ideally, we expect the best model to contain nodal and dyadic covariates as well as high-order ERGM terms. The model log-likelihood, Akaike’s Information Criterion (AIC), and the Bayesian Information Criterion (BIC) were used to select the best model. After checking model diagnostics whenever necessary, we finally evaluated the best model (lowest AIC or BIC and highest likelihood) by assessing its goodness-of-fit to the observed network. We expected each model to converge within a maximum of 1,000 iterations. The R package ergm was used to fit the models.
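The selection criteria can be computed directly from a model's log-likelihood. The sketch below uses the standard definitions of AIC and BIC and, as an assumption, treats each of the possible dyads among the 1,792 authors as one observation for the BIC penalty. The log-likelihood and degrees of freedom are those reported for the edge-only model in Table 1; the function names are hypothetical.

```python
from math import comb, log

def aic(loglik, k):
    """Akaike's Information Criterion for a model with k parameters."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n_obs):
    """Bayesian Information Criterion with n_obs observations."""
    return k * log(n_obs) - 2 * loglik

n_vertices = 1792
n_dyads = comb(n_vertices, 2)  # assumption: one observation per undirected dyad

# Edge-only model (Model 1 in Table 1): log-likelihood -362633 with 1 parameter.
aic_model1 = aic(-362633, 1)
bic_model1 = bic(-362633, 1, n_dyads)
```

Under this dyad-count assumption, the computed values agree with the AIC and BIC reported for Model 1 after rounding, which suggests, but does not prove, that the software used the same convention.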
Temporal Exponential Random Graph Model
The Temporal Exponential Random Graph Model (TERGM) is an extension of the ERGM proposed by Hanneke, Fu, and Xing [20]. The TERGM was designed to account for inter-temporal dependence in longitudinally collected network data. The TERGM was applied to each co-authorship network following the work of Leifeld, Cranmer, and Desmarais [21], to which we refer the reader for a full description of the model.
Each network was subset into different temporal snapshots. In general, when the temporal network is overly dense or sparse in early or later time periods, the TERGM tends to fit different time spans differently [21]. To avoid this issue, the cumulative network was subset in a way that balanced the number of edges across the years. This strategy improved the robustness and convergence of our models. We modeled the network ties, the dependent variable, as a function of nodal and dyadic variables, together with dyadic stability and delayed reciprocity memory terms. To check whether there is a linear trend in collaboration tie formation, we also included a linear time covariate in the model. We accounted for network structural predictors and homophily on the type of collaboration. The model log-likelihood, Akaike’s Information Criterion (AIC), and the Bayesian Information Criterion (BIC) were used to select the best model (final model), corresponding to the lowest AIC or BIC and the highest log-likelihood.
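The idea behind the dyadic stability term can be sketched by counting the dyads whose state, tie or non-tie, is unchanged between two consecutive snapshots. This is an illustration of the concept only; the exact coding of the memory term inside btergm may differ (for instance by a rescaling), and the snapshots and function name below are hypothetical.

```python
from itertools import combinations

def dyadic_stability(nodes, edges_prev, edges_now):
    """Count dyads whose state (tie or non-tie) is identical in both snapshots."""
    prev = {frozenset(e) for e in edges_prev}
    now = {frozenset(e) for e in edges_now}
    return sum(
        (frozenset(d) in prev) == (frozenset(d) in now)
        for d in combinations(nodes, 2)
    )

# Hypothetical consecutive snapshots over the same four authors.
nodes = ["a", "b", "c", "d"]
snapshot_prev = [("a", "b"), ("b", "c")]
snapshot_now = [("a", "b"), ("c", "d")]

stable_dyads = dyadic_stability(nodes, snapshot_prev, snapshot_now)
```

Of the six dyads in this toy example, four keep their state: one tie persists, three non-ties persist, one tie dissolves, and one new tie forms. A positive coefficient on such a statistic indicates that dyads tend to keep their previous state.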
To evaluate the extent to which the final model captures the endogenous properties and processes of the observed network, we checked model diagnostics, assessing the within-sample and out-of-sample goodness-of-fit. For the out-of-sample goodness-of-fit, we estimated the model on the first network snapshots, leaving out the last network snapshot in the series. We simulated 1,000 networks from the model and assessed how well the simulated networks predicted the left-out network. All models were fit using the Markov Chain Monte Carlo Maximum Likelihood Estimation (MCMC-MLE) for TERGMs implemented in the btergm R package.
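Out-of-sample tie prediction of this kind is commonly summarized by the area under the ROC curve. A minimal sketch follows, assuming each dyad of the left-out snapshot has a predicted tie score (for example, the fraction of simulated networks in which the tie appears) and an observed label. The scores, labels, and function name are hypothetical, and the computation uses the rank-sum identity rather than btergm's own routines.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a true tie outranks a true non-tie.

    Ties in score between a positive and a negative count one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted tie scores and observed ties for five dyads.
scores = [0.9, 0.8, 0.35, 0.3, 0.1]
labels = [1, 0, 1, 0, 0]

auc = roc_auc(scores, labels)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect ranking of ties above non-ties. Because co-authorship networks are sparse, the precision-recall curve is a useful complement to the ROC curve.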
Analysis and computation tools
All analyses were performed using R [22], an open source software environment for statistical data analysis. We used the R package igraph [23] for network manipulation. Given the size of our network and the expected long computation times, all analyses were parallelized whenever possible. All computations were conducted on a 64-core, 256 GB EXXACT server.
Results
Bibliographic search and network characteristics
The final query set returned 685 records. After screening, 424 documents met the selection criteria. On average, there were 10.67 authors per published document. After author name disambiguation, we identified 1,792 unique authors with a precision of 99.87% and a recall of 95.46%. The generated multigraph co-authorship network therefore contained 1,792 vertices (authors) and 116,388 parallel edges (collaborations), which converted to a weighted graph of 1,792 vertices and 95,787 edges. This network is characterized by a giant component containing 94% of all the vertices in the network, a shortest-path distance of three between pairs of vertices, a diameter of 10, and a high clustering coefficient (transitivity) of 0.964. It is also characterized by a density of 0.0596, which represents the baseline probability of tie formation in the network [24].
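Two of the descriptive statistics above follow from simple counting, as the sketch below shows for the density and the global clustering coefficient (transitivity). The density function is applied to the reported vertex and edge counts; the small adjacency structure used for transitivity is a hypothetical toy graph, and the function names are illustrative rather than those of igraph.

```python
from itertools import combinations

def density(n_vertices, n_edges):
    """Share of possible undirected dyads that carry an edge."""
    return 2 * n_edges / (n_vertices * (n_vertices - 1))

def transitivity(adj):
    """Global clustering coefficient: closed two-paths over all two-paths."""
    closed = total = 0
    for v, nbrs in adj.items():
        for a, b in combinations(sorted(nbrs), 2):
            total += 1
            closed += b in adj[a]
    return closed / total if total else 0.0

# Density from the vertex and edge counts reported for the weighted graph.
observed_density = density(1792, 95787)

# Hypothetical toy graph: a triangle (1, 2, 3) with a pendant vertex 4 attached to 3.
toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
toy_transitivity = transitivity(toy)
```

The density computed from the reported counts is close to the reported 0.0596. A transitivity near 1, as observed in the malaria network, is expected when every multi-author paper contributes a fully connected group of co-authors.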
Stochastic Block Model
The ICL plot in Figure 1 shows that the SBM fit the malaria co-authorship network with 39 classes, with anywhere from 30 to 39 classes being reasonable. The degree distribution of the fitted SBM (blue curve) provides a decent description of the observed distribution (yellow histogram). In the inter/intra class probabilities network, the vertices correspond to the 39 classes detected by the SBM. The vertex sizes are proportional to the number of authors assigned to each class. Each vertex is further broken down into a pie chart, with each portion reflecting the relative proportion of the types of collaboration. Yellow represents authors with international affiliations, orange represents regional authors who are affiliated with African institutions other than Beninese institutions, and green represents authors affiliated with Beninese research institutions. In general, we observe a dominance of international and regional researchers over national researchers across all detected clusters.
Figure 1. Summary of the goodness-of-fit of the SBM analysis on the malaria co-authorship network.
Exponential Random Graph Model
Table 1 summarizes the results of the different models we fit to the observed network. Model 1 is analogous to the null model in a typical Generalized Linear Model (GLM). The probability of any two authors establishing a collaboration tie is therefore expressed as the inverse logit of the edge coefficient. The inverse logit of a coefficient $x$ is defined as $\operatorname{logit}^{-1}(x) = 1 / (1 + e^{-x})$. The conditional log-odds for a collaboration between authors in the network is –2.76. The associated probability of any two authors establishing a collaboration tie is therefore 5.96%, which is the same as the density of the malaria co-authorship network. Since our network is characterized by a high transitivity, we modeled the triangle ERGM term along with the edge term in model 2. We see some improvement in model performance, with a significant, positive, but small triangle effect on collaboration tie formation (Coefficient = 0.08, p < 0.001).
Table 1. ERGM of the malaria co-authorship network.
| | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|
| | Estimate (SE) | Estimate (SE) | Estimate (SE) | Estimate (SE) |
| Network structural predictor | | | | |
| Intercept (edge) | –2.76 (0.00)*** | –5.00 (0.01)*** | –7.98 (0.02)*** | –8.22 (0.02)*** |
| Triangle | – | 0.08 (0.00)*** | – | – |
| Number of collaborations | – | – | 0.02 (0.00)*** | 0.01 (0.00)*** |
| Number of publications | – | – | 0.12 (0.00)*** | 0.13 (0.00)*** |
| Number of times cited | – | – | –0.01 (0.00)*** | –0.01 (0.00)*** |
| Homophily on cluster assignment | – | – | 5.58 (0.02)*** | 5.68 (0.02)*** |
| Homophily on collaboration type | – | – | 0.46 (0.01)*** | 0.61 (0.00)*** |
| Factor attribute effect (collaboration type) | | | | |
| International | – | – | – | REF |
| National | – | – | – | –0.32 (0.02)*** |
| Regional | – | – | – | 0.58 (0.01)*** |
| Number of iterations | 6 | 18 | 8 | 9 |
| Akaike’s Information Criterion (AIC) | 725268 | 660444 | 220964 | 217026 |
| Bayesian Information Criterion (BIC) | 725280 | 660469 | 221038 | 217125 |
| Model Log Likelihood | –362633 (df = 1) | –330220 (df = 2) | –110475.9 (df = 6) | –108505.2 (df = 8) |
REF = reference, SE = Standard Error, df = degrees of freedom
***p < .001 **p < .01 *p < .05
In model 3, we describe the co-authorship network as a function of the number of collaborations, the number of publications, and the number of citations of authors inside the network. We also adjust for homophily on cluster assignment from the SBM and on the collaboration type. Compared to models 1 and 2, model 3 is a substantial improvement (see AIC and BIC in Table 1). The edge effect has decreased (Coefficient = –7.98, p < 0.001), with the associated conditional probability (given all other terms in the model) equal to 0.03%. We observed a small but significant positive effect of the number of collaborators and the number of publications on the odds of collaboration tie formation between any two authors. A one-unit increase in the number of collaborators increases the odds of a collaboration tie by 2%, while a one-unit increase in the number of publications increases the odds of establishing a collaboration tie by 12.75%. On the other hand, model 3 found a very small but significant negative effect of the number of times an author was cited on the odds of collaboration tie formation. A one-unit increase in the number of citations of a given author was associated with a 1% decrease in the odds of collaboration between two authors, conditional on all the other terms in the model.
The process underlying the malaria co-authorship network appears to be driven by homophily on cluster assignment, that is, membership in a specific research community, and by the type of collaboration. The conditional probability of two authors collaborating, adjusted for homophily on their membership in a research community, is estimated at 8.32%, compared to the baseline probability of 0.03% given all other terms in model 3. Adjusted for the collaboration type, the same probability is estimated at 0.05%, conditional on all other terms in the model. The overall conditional probability adjusting for all terms in model 3 is estimated at 14.06%, which is much greater than the 5.96% estimated from model 1.
In model 4, we introduced factor attributes on the collaboration type in order to investigate the likelihood that researchers affiliated with Beninese institutions establish international and regional (African) collaboration ties. While model 4 slightly improved upon model 3, it displays only minor changes in the coefficients of the terms it has in common with model 3. Overall, compared to researchers with international research affiliations, researchers affiliated with Beninese research institutions have on average a 27.4% decrease in the odds of establishing collaboration ties (odds ratio exp(–0.32) ≈ 0.73). On the other hand, researchers affiliated with other African research institutions have a 78.6% increase in the odds of establishing a collaboration tie compared to researchers affiliated with international research institutions. In other words, in model 4, the probability for researchers affiliated with international institutions to establish a collaboration tie is estimated at 14.19%, that of researchers affiliated with Beninese institutions is 10.72%, and that of researchers affiliated with African institutions other than Beninese institutions is 22.79%.
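The probabilities quoted for model 4 can be reproduced from the coefficients in Table 1. The sketch below assumes that each coefficient enters the linear predictor once, that is, each covariate is set to one unit and both homophily terms are active; this appears to match the reported figures but is an inference from the numbers, not a documented procedure. The variable and function names are hypothetical.

```python
from math import exp

def inv_logit(x):
    """Inverse logit: maps log-odds to a probability."""
    return 1.0 / (1.0 + exp(-x))

def odds_change(coef):
    """Relative change in the odds for a one-unit increase in a covariate."""
    return exp(coef) - 1.0

# Model 4 coefficients from Table 1.
base_terms = {
    "edge": -8.22,
    "collaborations": 0.01,
    "publications": 0.13,
    "times_cited": -0.01,
    "homophily_cluster": 5.68,
    "homophily_type": 0.61,
}
type_effect = {"international": 0.0, "national": -0.32, "regional": 0.58}

eta_reference = sum(base_terms.values())
probabilities = {
    kind: inv_logit(eta_reference + effect) for kind, effect in type_effect.items()
}
regional_odds_change = odds_change(type_effect["regional"])
```

Because the international category is the reference level, its effect is zero and its probability is the inverse logit of the summed common terms alone.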
Neither the structural models containing high-order ERGM terms nor the models containing the dyadic attribute terms converged within the maximum of 1,000 iterations, making estimates from these models unreliable. For this reason, we do not present the results from these models in Table 1. The inability of the models containing structural terms to converge also makes it impossible for us to assess model degeneracy as recommended by Handcock et al. [25]. The wide range of the degree distribution of our co-authorship network makes it difficult to assess model fit accordingly. In general, however, model 4 fits the observed network poorly despite the highly significant estimates obtained. This suggests that something other than the terms included in this model is driving the structure of the network, possibly additional attributes that our final model did not control for.
Temporal Exponential Random Graph Model
The observed cumulative network was subset into seven snapshots representing the following time spans: 1996–2006, 2007–2009, 2010–2011, 2012–2013, 2014, 2015, and 2016. Figure 2 displays the topological structure of the snapshots at the different time steps.
Figure 2. Topological structure of the different snapshots of the malaria co-authorship network.
Table 2 summarizes the results of the different temporal models we fit to the observed network. Models 1, 2, and 3 are equivalent to a pooled ERGM across the 7 different time points (Fig. 2). The null model of the TERGM (model 1) suggests that the baseline log-odds for collaboration tie formation between authors in the network is –4.66. This coefficient is equivalent to a baseline probability of 0.9% for any two authors in the network to establish a stable collaboration tie. This probability is considerably lower than the 5.96% baseline probability of collaboration tie establishment estimated by the ERGM.
Table 2. Temporal ERGM of the malaria co-authorship network.
| | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|
| | Estimate (SE) | Estimate (SE) | Estimate (SE) | Estimate (SE) |
| Network structural predictor | | | | |
| Intercept (edge) | –4.66 (0.00)*** | –10.14 (0.02)*** | –10.45 (0.02)*** | –8.65 (0.05)*** |
| Number of collaborations | – | 0.03 (0.00)*** | 0.03 (0.00)*** | 0.03 (0.00)*** |
| Number of times cited | – | –0.03 (0.00)*** | –0.02 (0.00)*** | –0.03 (0.00)*** |
| Number of publications | – | 0.45 (0.00)*** | 0.46 (0.00)*** | 0.45 (0.00)*** |
| Homophily on cluster assignment | – | 4.96 (0.02)*** | 5.06 (0.02)*** | 4.79 (0.02)*** |
| Homophily on collaboration type | – | 0.44 (0.01)*** | 0.56 (0.01)*** | 0.54 (0.01)*** |
| Factor attribute effect (collaboration type) | | | | |
| International | – | – | REF | REF |
| National | – | – | –0.10 (0.02)*** | 0.01 (0.02) |
| Regional | – | – | 0.55 (0.01)*** | 0.60 (0.01)*** |
| Temporal dependencies | | | | |
| Dyadic stability | – | – | – | 1.07 (0.01)*** |
| Linear trends | – | – | – | –0.18 (0.01)*** |
| Akaike’s Information Criterion (AIC) | 94681198 | 93740511 | 93737596 | 67005816 |
| Bayesian Information Criterion (BIC) | 94681230 | 93740624 | 93737742 | 67005991 |
| Model Log Likelihood | –47340597 | –46870248 | –46868789 | –33502897 |
REF = reference, SE = Standard Error
***p < .001 **p < .01 *p < .05
Model 2 of the TERGM describes the co-authorship network as a function of the number of collaborations, the number of publications, and the number of citations of authors inside the network. It is also adjusted for homophily on cluster assignment from the SBM and on the collaboration type. Compared to model 1, model 2 is a slight improvement (see AIC and BIC in Table 2). The edge effect has decreased (Coefficient = –10.14, p < 0.001), with the associated conditional probability (given all other terms in the model) equal to 0.004%. We observed a relatively large, significant positive effect of homophily on cluster assignment on the odds of collaboration tie formation between any two authors. Adjusting for the other variables in model 2, the log-odds of collaboration are 4.96 higher for authors of the same research group or community than for authors that belong to different research groups (an odds ratio of exp(4.96) ≈ 143). The effects of the other attributes in model 2 are minor. When we adjust for the attribute effect on the collaboration type, we obtain model 3, which is slightly better than model 2. Relative to model 2, the edge effect decreases further, accompanied by an even stronger effect of homophily on the cluster assignment of the authors in the network (Coefficient = 5.06, p < 0.001).
After introducing temporal dependency terms, we obtained model 4, which is a substantial improvement over models 1, 2, and 3. Model 4 confirms that the process underlying the malaria co-authorship network is driven by homophily on cluster assignment, that is, membership in a specific research community, and by the type of collaboration. Model 4 suggests that the baseline conditional probability of any two authors collaborating is estimated at 0.02% given all other terms in the model. The coefficient associated with the dyadic stability term is 1.07, meaning that the odds that existing and non-existing collaboration ties at one time point remain the same at the next time point increased on average by 65.7%. In other words, the odds that new collaboration ties and non-ties occur from one time point to another is 34.3%. In addition, the TERGM showed that the probability of sustainable collaboration tie formation among international researchers is 12.13% versus 12.24% for researchers affiliated with national institutions (p > 0.05). However, this probability significantly increases to 20.26% for researchers affiliated with African research institutions other than those in Benin. These probabilities confirm the results from the final ERGM with respect to the higher probability of tie formation between researchers affiliated with African institutions other than Beninese institutions. Neither the structural temporal models containing high-order TERGM terms nor the models containing the dyadic attribute terms converged within the maximum of 1,000 iterations, making estimates from these models unreliable.
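As a worked check, the probabilities reported for international and national researchers appear to follow from summing the model 4 coefficients in Table 2, under the assumption that every covariate, including the dyadic stability and linear trend terms, is set to one unit. This is an inference from the reported numbers rather than a documented procedure; $\eta$ denotes the resulting linear predictor.

$$
\begin{aligned}
\eta_{\text{international}} &= -8.65 + 0.03 - 0.03 + 0.45 + 4.79 + 0.54 + 1.07 - 0.18 = -1.98, \\
p_{\text{international}} &= \frac{1}{1 + e^{1.98}} \approx 0.1213, \\
\eta_{\text{national}} &= -1.98 + 0.01 = -1.97, \\
p_{\text{national}} &= \frac{1}{1 + e^{1.97}} \approx 0.1224 .
\end{aligned}
$$

The two values differ only through the non-significant national coefficient of 0.01, which is why the international and national probabilities are nearly identical.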
Figure 3 presents the goodness-of-fit assessment for TERGM model 4. This model, which contains temporal dependencies, fits the observed malaria co-authorship network better than the final ERGM (model 4). While the first five subfigures compare the distribution of endogenous network statistics between the observed network and the simulated ones, the last subfigure presents the Receiver Operating Characteristic (ROC) and precision-recall (PR) curves. The ROC curve for model 4 is depicted by the dark red curve, compared to the ROC curve of a random graph depicted by the light red curve. Similarly, the blue curve represents the PR curve of model 4 [21]. The final TERGM (model 4) outperformed the random null model, with an Area Under the Curve (AUC) value estimated at 79.98%.
Figure 3. Goodness-of-fit assessment for the final malaria TERGM Model 4 with temporal dependencies.
Discussion and Conclusion
This study shows that collaboration ties in malaria research in Benin are driven by homophily on the type of affiliation (national, international, or regional) and on membership in a research group or cluster, thereby supporting our second hypothesis. It shows the dominance of the Beninese malaria research arena by international and regional players, and further demonstrates the lower likelihood of local Beninese researchers establishing international collaboration ties compared to regional researchers. The ERGM revealed that the number of publications and the number of collaborations are associated with a higher likelihood of establishing collaboration ties, whereas the number of citations is associated with a slightly lower likelihood, supporting our first hypothesis that tie formation depends on authors' characteristics. We further evaluated how each factor affects the likelihood of collaboration tie formation.
Our first approach to modeling our network relied on the use of the SBM. In addition to serving as a model-based clustering method, the SBM identified a large clique of mainly international researchers with little or no collaboration with other research groups. The overwhelming dominance of regional and international players in the network is consistent with previous observations by Onyancha and Maluleka [26], who found a much higher likelihood of Sub-Saharan African countries collaborating with non-African states.
It is worth noting that many of the studies on co-authorship network analysis are descriptive in nature. This study is one of a few to model a co-authorship network using advanced statistical models. The ERGM is the leading approach to modeling networks [27]. Kronegger et al. [28] investigated collaboration in Slovenian scientific communities using data from four different disciplines. While their methodological approach is consistent with ours, the main difference lies in their application of the Stochastic Actor-Oriented Model (SAOM) to the dynamics of their co-authorship networks. Since the SAOM is an actor-oriented modeling method and we are interested in tie prediction here, we relied instead on a tie-oriented approach by applying the TERGM to our network data.
Although the use of statistical models for networks is justified in this research, the size of our network prevented the fitting of complex models including dyadic and high-order ERGM terms. In a recent paper, Schmid and Desmarais [27] acknowledged the difficulty of fitting ERGMs to networks whose size is of the order of 1,000 vertices. In fact, the main limitation of statistical models for networks resides in their cost, as they tend to be computationally intensive and expensive in terms of CPU time and memory usage.
Our results suggest that renewed malaria research funding has appealed to research groups all around the world, hence the explosion in the number of publications and research collaborations. As the disease continues to be a main public health concern in the Republic of Benin, it is essential to consolidate the knowledge generated from the numerous studies on the disease and to reinforce the different communities involved in the research effort. In addition, there is an urgent need to reinforce the malaria research network in Benin by continuously supporting and stabilizing the identified key brokers and most productive authors, and by promoting junior scientists in the field. However, we observed a tendency of international researchers to collaborate only among themselves. Despite the rise in scientific collaboration between advanced and developing nations [29], this observation may limit effective and sustainable technology transfer in Benin. It is possible that some of the isolated cliques within the network have top-notch research capabilities and skills that researchers affiliated with Beninese institutions could acquire, should the research groups be more inclusive. We therefore recommend that policies be designed, at the international, regional, and country levels, to diversify research groups operating in Sub-Saharan African countries. Such policies will ultimately enable effective technology transfer and multidisciplinarity, and promote junior African researchers in Africa and, particularly, in Benin.
References
- 1. Martin C Akogbéto, Rock Y. Aïkpon, Roseric Azondékon, Gil G. Padonou, Razaki A. Ossè, Fiacre R. Agossa, Raymond Beach, Michel Sèzonlin. Six years of experience in entomological surveillance of indoor residual spraying against malaria transmission in Benin: Lessons learned, challenges and outlooks. Malaria Journal. 2015 Dec;14(1). doi: 10.1186/s12936-015-0757-5.
- 2. World Health Organization. World malaria report 2010. Geneva: World Health Organization; 2012.
- 3. Newman M. E. J. Coauthorship networks and patterns of scientific collaboration. Proceedings of the National Academy of Sciences. 2004 Apr;101(Supplement 1):5200–5205. doi: 10.1073/pnas.0307545100.
- 4. Sally Cunningham, Stuart Dillon. Authorship patterns in information systems. Scientometrics. 1997;39(1):19–27.
- 5. Gregorio González-Alcaide, Jinseo Park, Charles Huamani, Joaquin Gascón, José Manuel Ramos. Scientific authorships and collaboration network analysis on Chagas disease: Papers indexed in PubMed (1940-2009). Revista do Instituto de Medicina Tropical de Sao Paulo. 2012;54(4):219–228. doi: 10.1590/s0036-46652012000400007.
- 6. Gregorio González-Alcaide, Rafael Aleixandre-Benavent, Carolina Navarro-Molina, Juan Carlos Valderrama-Zurián. Coauthorship networks and institutional collaboration patterns in reproductive biology. Fertility and Sterility. 2008 Oct;90(4):941–956. doi: 10.1016/j.fertnstert.2007.07.1378.
- 7. Ghafouri H.B, Mohammadhassanzadeh H, Shokraneh F, Vakilian M, Farahmand S. Social network analysis of Iranian researchers on emergency medicine: A sociogram analysis. Emergency Medicine Journal. 2014;31(8):619–624. doi: 10.1136/emermed-2012-201781.
- 8. Reza Yousefi-Nooraie, Marjan Akbari-Kamrani, Robert A Hanneman, Arash Etemadi. Association between co-authorship network and scientific productivity and impact indicators in academic medical research centers: A case study in Iran. Health Research Policy and Systems. 2008 Dec;6(1). doi: 10.1186/1478-4505-6-9.
- 9. Eric D. Kolaczyk. Statistical Analysis of Network Data: Methods and Models. Springer Series in Statistics. New York; [London]: Springer; 2009.
- 10. Mali F, Kronegger L, Doreian P, Ferligoj A. Dynamic Scientific Co-Authorship Networks. Understanding Complex Systems. 2012.
- 11. Erick Peirson, Aaron Baker, Ramki Subramanian, Abhishek Singh, Yogananda Yalugoti. Tethne. 2016;v0.8.
- 12. Daniel A Schult, Swart P. Exploring network structure, dynamics, and function using NetworkX. 2008;volume 2008:11–16.
- 13. Anderson A Ferreira, Marcos André Goncalves, Alberto HF Laender. A brief survey of automatic methods for author name disambiguation. ACM SIGMOD Record. 2012;41(2):15–26.
- 14. Mikhail Yuryevich Bilenko. Learnable Similarity Functions and Their Application to Record Linkage and Clustering. PhD thesis. Austin, TX, USA: University of Texas at Austin; 2006.
- 15. François Lorrain, Harrison C. White. Structural equivalence of individuals in social networks. The Journal of Mathematical Sociology. 1971 Jan;1(1):49–80.
- 16. Patrick Doreian, Vladimir Batagelj, Anuska Ferligoj. Generalized Blockmodeling. Cambridge: Cambridge University Press; 2004.
- 17. Eric D Kolaczyk, Gábor Csárdi. Statistical Analysis of Network Data with R. volume 65. Springer; 2014.
- 18. Garry Robins, Pip Pattison, Yuval Kalish, Dean Lusher. An introduction to exponential random graph (p*) models for social networks. Social Networks. 2007;29(2):173–191.
- 19. Paul Erdös, Alfréd Rényi. On random graphs, I. Publicationes Mathematicae (Debrecen). 1959;6:290–297.
- 20. Steve Hanneke, Wenjie Fu, Eric P Xing. Discrete temporal models of social networks. Electronic Journal of Statistics. 2010;4:585–605.
- 21. Philip Leifeld, Skyler J Cranmer, Bruce A Desmarais. Temporal Exponential Random Graph Models with xergm: Estimation and Bootstrap Confidence Intervals. Journal of Statistical Software. 2015.
- 22. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2013 2014.
- 23. Gabor Csardi, Tamas Nepusz. The igraph software package for complex network research. InterJournal, Complex Systems. 2006;1695(5):1–9.
- 24. Roseric Azondekon, Zachary James Harper, Fiacre Rodrigue Agossa, Charles Michael Welzig, Susan McRoy. Scientific authorship and collaboration network analysis on malaria research in Benin: Papers indexed in the web of science (1996-2016). Global Health Research and Policy. 2018 Dec;3(1). doi: 10.1186/s41256-018-0067-x.
- 25. Mark S Handcock, Garry Robins, Tom AB Snijders, Jim Moody, Julian Besag. Assessing degeneracy in statistical models of social networks. Technical report. Citeseer; 2003.
- 26. Omwoyo Bosire Onyancha, Jan Resenga Maluleka. Knowledge production through collaborative research in sub-Saharan Africa: How much do countries contribute to each other’s knowledge output and citation impact? Scientometrics. 2011 May;87(2):315–336.
- 27. Christian S Schmid, Bruce A Desmarais. Exponential Random Graph Models with Big Networks: Maximum Pseudolikelihood Estimation and the Parametric Bootstrap. arXiv preprint arXiv:1708.02598. 2017.
- 28. Luka Kronegger, Franc Mali, Anuska Ferligoj, Patrick Doreian. Collaboration structures in Slovenian scientific communities. Scientometrics. 2012 Feb;90(2):631–647.
- 29. Caroline S Wagner, Irene Brahmakulam, Brian Jackson, Anny Wong, Tatsuro Yoda. Science and technology collaboration: Building capability in developing countries. Technical report, RAND CORP SANTA MONICA CA; 2001.

