PLOS One. 2025 Jun 17;20(6):e0320939. doi: 10.1371/journal.pone.0320939

A quantitative geospatial analysis of the risk that Boko Haram will target a school

Lirika Sola 1, Youdinghuan Chen 1, V S Subrahmanian 1,*
Editor: Jessica Leight 2
PMCID: PMC12173403  PMID: 40526739

Abstract

We provide a novel quantitative geospatial analysis of school attacks perpetrated by Boko Haram in Nigeria. Boko Haram uses such attacks to kidnap boys (for potential use as child soldiers and suicide bombers) and girls (for potential use as domestic servants, sex slaves, and suicide bombers). First, we build a novel geospatially tagged dataset spanning almost 14 years (July 2009 to April 2023) that covers not only Boko Haram attacks on schools (our dependent variable) but also 15 independent variables (or features) describing other attacks by Boko Haram, the locations of security installations, and the socioeconomic and geospatial characteristics of the regions around these schools. Second, we develop a univariate statistical analysis of these data, showing strong links between attacks on schools and three broad factors: Security presence in and around a school, Boko Haram Activity in the area around a school, and the Socioeconomic characteristics of the region around a school. Third, we train several predictive machine learning models and assess their predictive efficacy. The results show that some of these models can accurately quantify the likelihood that a school will be at risk of a Boko Haram attack. They also shed light on the features that are most important for such predictions. We then analyze the learned decision trees to identify conditions on the independent variables that help predict Boko Haram attacks on schools. Fourth, we use these decision trees to formulate multivariate hypotheses that we investigate further from a statistical perspective. We find that Security presence near schools, Boko Haram Activity in a region, and the Socioeconomic factors characterizing the region a school is in are all significant predictors of attacks. We conclude with a policy recommendation.

Introduction

On April 15, 2014, the world woke up to the horrifying news that Boko Haram terrorists had kidnapped 276 schoolgirls from the town of Chibok. Nine years later, Amnesty International reported that, as of April 2023, 98 of these girls were still being held by Boko Haram [1]. In another devastating attack in 2018, 110 schoolgirls were kidnapped from a school in Dapchi in Yobe State [2], though most were subsequently released. In December 2020, over 300 schoolboys were abducted from their school in Kankara in Katsina State [3]. More recently, in March 2024, more than 300 schoolchildren were reportedly kidnapped by Boko Haram in the town of Kuriga in Kaduna State [4]. Just one day earlier, another group of children had reportedly been abducted by Boko Haram [5].

With a name commonly translated as “[Western] Education is forbidden”, the group carried out over 4300 attacks during the almost 14-year period (July 1, 2009 to April 30, 2023) that this study encompasses [6]. Of these attacks, 76 targeted schools. Boko Haram’s attacks have taken a devastating toll, resulting in over 39,000 casualties and forcibly displacing over 2.4 million people who once called the Lake Chad region home [7]. An estimated 15 million people have been adversely affected by the insurgency and the resulting counter-terrorism operations [8]. All of this carnage has been caused by a group that, in 2022, the US Office of the Director of National Intelligence (ODNI) estimated to have only around 1,000 members [9].

Boko Haram’s attacks on schools serve a multitude of purposes. On the one hand, opposition to Western education is a major plank of Boko Haram’s raison d’être, so attacks on schools can be viewed as a way to further this ideological goal. On the other hand, such attacks often go hand in hand with the kidnapping of the targeted school’s students. Kidnapped students, both boys and girls, are often used as child soldiers, spies, and suicide bombers, frequently with horrific outcomes for the children involved. Girls are also subjected to horrific abuse in the form of extreme domestic servitude and sexual slavery. Yet another possible reason for kidnappings is ransom. Even when some of the kidnapped children are released, the fate that awaits them is not always pleasant: because of the stigma of association with Boko Haram, they are often shunned by the communities they grew up in, as well as by others. Protecting the youngest members of society is an imperative for all nation-states.

The goal of this paper is to develop a data-driven, quantitative assessment of the risk that a school in Nigeria will be attacked by Boko Haram. We frequently refer to such attacks as “school attacks”. We hope that our results, findings, and policy recommendation can help the Government of Nigeria increase the security of schools.

Our model is data-driven, combining statistical and machine learning methods. We do not claim to have invented new statistical or machine learning techniques. The contributions of this paper are our new dataset; the findings that predictively link Boko Haram’s attacks on schools to the triad of Security Presence, (Boko Haram) Activity, and Socioeconomic factors; and the recommendation that we make to the Nigerian Government and the international community. This triad, together with a geospatial (rural vs. urban) factor, is based on the following hypotheses:

  1. Hypothesis 1 (Activity). Boko Haram is more likely to target schools in regions where it is carrying out other operations. This hypothesis is based on the Routine Activity Approach, which suggests that schools close to regions where Boko Haram is active are more likely targets. This increased risk is due to the offender’s familiarity with the area, ease of access, and a potentially reduced risk of being intercepted [10]. Additionally, detailed studies of criminal geography [11] show that “juveniles do not commit property crimes in their immediate home areas to avoid recognition”. At the same time, criminals do not go too far from their home areas, as they prefer to stay in areas they know well; [12] found that “Distances between home and crime-site were short”. [13] showed that for improvised explosive device (IED) attacks in Baghdad during the Iraq war, the weapons caches facilitating those attacks were located neither too near to nor too far from the attack sites. Based on these findings, we use Boko Haram’s other types of attacks as a set of features that serve as a proxy for the group’s location.

  2. Hypothesis 2 (Security Presence). Boko Haram is less likely to target schools in regions with a heavy security presence. This hypothesis is driven by past work in criminology showing that the presence of police stations in high-crime locations is linked to subsequent reductions in crime in those locations [14]. The hypothesis also aligns with Deterrence Theory, which states that a visible security presence deters criminal activity by increasing offenders’ perceived risk of detection and subsequent punishment [15,16]. A related study comparing crime before and after the closure of police stations in the German state of Baden–Württemberg found similar results [17]. Likewise, a study of property crime in Buenos Aires [18] “found that the commission of crimes increases exponentially as the distance from the nearest police station increases”. And even in Nigeria (albeit in the southwest of the country), [19] found that crime increased “as distance from police stations increased”. Hence, our machine learning models include features related to security presence.

  3. Hypothesis 3 (Socioeconomic). Boko Haram is more likely to target socioeconomically weaker areas than wealthier areas. This hypothesis is based on the theory of Social Disorganization, which suggests that areas burdened with socioeconomic problems such as unemployment, poverty, and inequality often struggle to maintain strong social controls and a united community. This leads to social disorganization, in which the community cannot uphold standards that prevent behavior such as crime or terrorism [20,21]. Hence, our machine learning models include features related to the socioeconomic status of the region that a school is in.

  4. Hypothesis 4 (Geospatial). Boko Haram is more likely to target rural areas than urban ones. This hypothesis is based on the idea that rural regions, being more geographically spread out, are harder to defend than urban areas. A counter-argument could be that urban areas offer richer pickings for criminals. This hypothesis is inspired by works such as [22], which investigated the relationship between urbanization and crime in cities including Rio de Janeiro, Karachi, and Lagos. It motivates some of the features used by our machine learning models.

We emphasize that the above are hypotheses, not statements of fact. To formally evaluate these hypotheses and to build machine learning models and quantitative risk scores, we assembled a dataset spanning almost 14 years (July 1, 2009 to April 30, 2023). The dataset combines information from a wide variety of open sources and comprises sub-datasets on (i) school attacks carried out by Boko Haram (our dependent variable), (ii) locations of schools in Nigeria, (iii) whether wards in Nigeria are rural or urban, (iv) other Boko Haram activity in Nigeria, (v) locations of security installations such as police stations and military bases in Nigeria, and (vi) socioeconomic characterizations of the wards in Nigeria. A ward is a geographic administrative unit in Nigeria: the country is divided into 774 Local Government Areas (LGAs), each of which is further divided into 10–20 wards. The dataset is available as part of the Supplementary Materials, and we describe it in greater detail later in the paper.

After the initial statistical evaluation of the hypotheses referenced above, we provide a machine-learning characterization of the risk of a Boko Haram attack on every school in Nigeria. We use cross-validation to identify the best such machine learning models. Because decision trees [23,24] generate predictive rules that are relatively easy to explain and understand, we use some of the decision trees learned from our study to identify additional hypotheses which are subsequently evaluated statistically.

We draw the following conclusions:

  1. Ablation testing suggests that Presence of Security Forces is the single most important factor that mitigates attacks on schools by Boko Haram. Simply put, the more security installations there are near a school, the less likely it is to be attacked by Boko Haram. We therefore recommend that the Government of Nigeria and international donors direct security budgets toward creating new security installations in high-threat areas of Nigeria. Fig 1 shows our new country-wide risk map for the schools in Nigeria, while Fig 2 zooms in on a small part of Nigeria and shows a detailed risk map.

  2. Unsurprisingly, Other Boko Haram Activity in a region is positively correlated with attacks on schools in that region.

  3. Perhaps more surprisingly, the wealth of a region is not clearly inversely correlated with school attacks. Two of our socioeconomic indicators show this, but a third shows the reverse trend. We discuss this seeming anomaly and offer some potential explanations for it.

  4. Finally, we present seven complex multivariate hypotheses about which schools are attacked, each derived from our decision trees and each statistically validated.

Fig 1. Predictive Risk of School Attacks by Boko Haram Across All of Nigeria.


These predictive risks are probabilities computed using the best-performing model (AdaBoost, as presented later in the paper) to predict which schools are most likely to be attacked. Maps were visualized using GRID3’s Operational Wards data under the Creative Commons Attribution License (CC BY 4.0). https://data.grid3.org/datasets/GRID3::grid3-nga-operational-wards-v1-0/about.

Fig 2. Predictive Risk of School Attacks by Boko Haram in Northeastern Nigeria.


These predictive risks are probabilities computed using the best-performing model (AdaBoost, as presented later in the paper) to predict which schools are most likely to be attacked. Maps were visualized using GRID3’s Operational Wards data under the Creative Commons Attribution License (CC BY 4.0). https://data.grid3.org/datasets/GRID3::grid3-nga-operational-wards-v1-0/about.

Last but not least, we present comprehensive maps of our predicted risk scores for school attacks by Boko Haram. Such risk maps will need to be updated over time as new Boko Haram activity, new security installations, and new socioeconomic conditions change the security landscape in Nigeria.

Boko Haram: A brief history

To set the context for this study, we provide a brief overview of the history of Boko Haram. Readers interested in a comprehensive history of Boko Haram can consult more detailed studies such as [25–28].

According to the International Monetary Fund (IMF) [29], Nigeria had a per-capita gross domestic product (GDP) of $1,760 in 2023, placing the country’s wealth in the upper quartile across African nations. However, this wealth is unequally distributed. The south of the country has greater wealth, in part because of the presence of oil in the Niger Delta [30]. As a result, the northern part of the nation has experienced greater food insecurity [31] and unemployment than the south. For instance, [32] states that “High unemployment has been blamed for civil unrest in Nigeria, in some cases leading to a revolution e.g. Boko Haram crisis in the Northern part of the country.” Similarly, [33] states that “Structural unemployment and widespread poverty are believed to be the basis for the activities of miscreants such as militant youth in the Niger Delta and the present deadly Boko Haram in northern Nigeria upsetting the seemingly peaceful and stable political situation.”

A religious divide between the largely Muslim north and the largely Christian south of the country has also played a role in the conflict. Since the late 1990s, Sharia law has been introduced in several northern states, leading to “national and sub-regional terrorism with crucial ramifications for national development as well as national security” [34].

It was in this climate that Boko Haram was founded in 2002 in Maiduguri, in the Northeast of the country, by a cleric, Mohammed Yusuf. Over the next several years, Boko Haram’s violent activities gradually increased and included attacks on security forces, attacks on Christian populations, and kidnappings. As recounted vividly in [35], the conflict between Boko Haram and the Nigerian state peaked in 2008–2009. Nigerian security forces carried out Operation Flush in 2008, which led to a spate of retaliatory attacks by Boko Haram in 2009, killing over 700 people. This was the beginning of what is called the Boko Haram Uprising. Nigerian forces responded vigorously, killing over 1000 people and executing the group’s founder, Mohammed Yusuf, in custody. This was followed by a “decade of terror” [27]. During the 2009–2020 period, Boko Haram, under its new leader, Abubakar Shekau, seized territory and carried out dramatic attacks on numerous targets, including both Nigeria’s Police Headquarters and the UN Headquarters in Abuja in 2011. Over the next few years, churches, schools, colleges, police stations, and army barracks were attacked viciously by Boko Haram. On April 15, 2014, tax day in the US, the world woke up to the news that Boko Haram had kidnapped 276 schoolgirls from Chibok, leading to the #BringBackOurGirls campaign [36,37], which was supported by numerous celebrities, including then U.S. First Lady Michelle Obama. Shortly thereafter, the organization used one of the victims of the Chibok attack as a suicide bomber [38]. Boko Haram has repeatedly used girls as suicide bombers in subsequent years [39].

Despite a spate of horrific attacks, only a few of which are listed above, the Nigerian government claimed in December 2015 that Boko Haram had been defeated [40]. Nigeria’s Information Minister repeated a similar claim in October 2019 [41], and such claims have been made several times since, despite evidence that Boko Haram has consistently managed to carry out attacks. Boko Haram remains active to this day, as evidenced by attacks such as one in Maiduguri in November 2023 that reportedly killed 40 people [42] and a January 2024 attack that reportedly killed 14 people in Yobe State [43].

Child kidnappings

Child kidnapping has been used as a strategy by Boko Haram since the beginning of the insurgency in 2009. Kidnapped girls are forced into a combination of domestic and sexual servitude [44]. They are also used as spies, fighters [45], and suicide bombers [46]. Children of both sexes are often starved, physically abused, and forced to torture and kill civilians [47]. Beyond the horrific attacks that abducted children are made to carry out, the victims of kidnappings clearly face an extreme form of child abuse with profound mental health and other consequences [48,49]. To make matters worse, efforts to reintegrate Boko Haram’s male fighters (including boys who were abducted and trained to become fighters) into society through programs such as Operation Safe Corridors [8,50] have not been very successful; in fact, as stated by [8], Nigerian communities often do not want to take back and reintegrate fighters, as well as women and children who were themselves victims of Boko Haram [51].

Materials and methods

We now describe (i) how we created our Boko Haram dataset, (ii) our statistical analysis of the univariate hypotheses posed in the Introduction, (iii) our machine learning analysis of our Boko Haram data, and (iv) our statistical analysis of several multivariate hypotheses inspired by the machine learning analysis in (iii). Finally, (v), we show the risk maps we have generated for Nigerian schools.

Our data

To explore a set of statistically testable hypotheses and to build the machine learning models underlying our quantitative risk scores, we created a dataset spanning almost 14 years (July 1, 2009 to April 30, 2023). The dataset combines information from a wide variety of open sources, as noted in Table 1, and captures the following sub-datasets: (i) school attacks carried out by Boko Haram (our dependent variables), (ii) locations of schools in Nigeria, (iii) whether wards in Nigeria are rural or urban, (iv) other Boko Haram activity in the vicinity of schools, (v) locations of security installations in Nigeria, and (vi) socioeconomic characterizations of the wards in Nigeria. Our use of reliable sources such as ACLED [6] (for the dependent variable), GRID3 [52] (for school data, police stations, and socioeconomic risk factors), and OpenStreetMap [53] (for military installations) also eliminated the need for imputation methods. We discuss our combined dataset in further detail below.

Table 1. Summary of data sources and key attributes.

Data | Source | Number of records | Categories
--- | --- | --- | ---
Dependent variable | ACLED | 76 | School attacks
Boko Haram activity | ACLED | 4224 | Arson, looting, kidnapping, etc.
School data | GRID3 | 103,064 | Primary, mixed, others, secondary, tertiary, pre-primary
Security installations | OSM & GRID3 | 920 | Police & military
Risk factors | GRID3 | 36 | Communications, exposure, socioeconomic

Boko Haram’s Attacks, 2009–2023

Fig 3a shows the number of attacks (other than school attacks) carried out by Boko Haram in our dataset, and Fig 3b shows the number of attacks on schools. Attributing an attack to Boko Haram can be challenging; we relied on the widely used and highly respected ACLED dataset [6] for this purpose. ACLED records over 4300 Boko Haram attacks during the 2009–2023 period, of which 76 targeted schools. The bulk of these attacks occurred during the height of the Boko Haram insurgency (2012–2014), but the numbers since then still indicate several attacks per year. Although ACLED records no attacks targeting schools in Nigeria in 2016 or 2019 (Fig 3b), Boko Haram remained active in the country during these years (Fig 3a). Though Boko Haram has a presence and carries out attacks in Cameroon, Chad, Niger, and Nigeria, the overwhelming majority of school attacks are in Nigeria. We therefore limited our study to school attacks in Nigeria.

Fig 3. Attacks by Boko Haram, July 2009–April 2023.


(a) shows attacks not including school attacks. (b) shows attacks on schools. Numbers only run through the end of April 2023. These charts are based on ACLED data [6].

We also investigated the months in which Boko Haram is most likely to carry out an attack or a school attack (Fig 4). Fig 4a covers non-school attacks, while Fig 4b covers school attacks. We see an interesting difference: while non-school attacks are roughly uniformly distributed over the 12 months of the year, school attacks occur less frequently in January, August, and November. Although school holidays vary from one institution to another, the academic year in Nigeria typically runs from September to July, which may account for the lower frequency of school attacks observed in August. Moreover, the school year is divided into three terms, and most schools have midterm breaks and/or a winter break from mid-December to mid-January.

Fig 4. Month-by-Month Attacks by Boko Haram, July 2009–April 2023.


(a) shows attacks not including school attacks. (b) shows attacks on schools. 2023 numbers only run through the end of April 2023. These charts are based on ACLED data [6]. Maps were visualized using GRID3’s Operational Wards data under the Creative Commons Attribution License (CC BY 4.0). https://data.grid3.org/datasets/GRID3::grid3-nga-operational-wards-v1-0/about.

Fig 5 shows the locations of school attacks in Nigeria. The map shows that most attacks happen in Northeastern Nigeria in the Lake Chad region with a smaller number of attacks in other northern states.

Fig 5. Month-by-Month Locations of School Attacks by Boko Haram, July 2009–April 2023.


Blue dots show locations of schools that were never attacked. Dots with other colors show the month when a school was attacked. These charts are based on ACLED data [6]. Maps were visualized using GRID3’s Operational Wards data under the Creative Commons Attribution License (CC BY 4.0). https://data.grid3.org/datasets/GRID3::grid3-nga-operational-wards-v1-0/about.

School data

Our dataset includes information about 103,064 schools in Nigeria including both primary and secondary schools, but not including universities and colleges. The data was obtained from the GRID3 Data Hub [52]. Each school, serving as the unit of analysis, has a unique ID, a latitude and longitude showing the location of the school, and the ward in which the school is located. In addition, each school has the following fields:

  • Education type consisting of 4 possible types: formal, religious, informal, and integrated.

  • Management type consisting of 13 possible types: public, private, faith-based, private faith-based, public NGO funded, schools funded based on a public-private partnership, public faith based, NGO funded, unknown, private NGO funded, faith-based NGO-funded, state government-funded, and federal government-funded.

  • Subtypes consisting of 12 possible types: standard, primary, pre-primary, nursery, aggregate, adult education, mixed, others, tertiary, senior, junior, and university. In our study, adult and university-education instances were excluded as these do not involve schools.

  • Category types with 6 possible values: primary, mixed, others, secondary, tertiary, pre-primary.

  • Type of education imparted: formal vs. informal vs. religious education.

  • Type of management of the school: public vs. private.

  • Category to which the school belongs: primary vs. secondary vs. “mixed” (which includes both).

Security installations

We obtained information on the location of security installations in Nigeria using two methods.

First, we again used the GRID3 Data Hub to extract the locations of 802 police stations in the country [54]. For each police station, we captured an ID, the latitude and longitude of the station, and the name of the state and the specific region within the state where the station is located. This dataset may not cover all police stations in the country. To enrich our analysis and obtain a more comprehensive picture of security installations, we incorporated data on Nigerian military sites, including military installations, checkpoints, barracks, airfields, and training areas, obtained via the Overpass API [55]. Overpass is a read-only API that can be queried for diverse forms of information from OpenStreetMap data. From this source, we extracted the locations of 128 military facilities. After removing duplicate records, we obtained data on 920 security locations throughout the country. Fig 6 shows where these security installations are located. The queries used to extract the military installations via the Overpass API can be found in S6 Appendix F.

Fig 6. Location of Nigerian security installations.


The blue stars represent security installations, the majority of which are situated in the southwestern states. These charts are based on OpenStreetMap [53] and GRID3 data [52]. Maps were visualized using GRID3’s Wards data under the Creative Commons Attribution License (CC BY 4.0). https://data.grid3.org/datasets/GRID3::grid3-nga-operational-wards-v1-0/about.

Socioeconomic Data

We gathered proxies for socioeconomic conditions in each school’s ward from the geospatial maps provided by the GRID3 Data Hub [52]. In particular, we obtained three types of “risk scores”, which assess the percentage of individuals in households within a ward who face socioeconomic disadvantages. The scores range from 1 to 5, with 1 indicating the lowest risk and 5 the highest. The scores are derived by consolidating multiple data sources, including household and population surveys and data from agencies such as USAID, the United Nations, the World Bank, USGS, and WorldPop. The predefined ward-level scale of risk categories also precluded the presence of outliers in the data, thereby eliminating the need for outlier handling. The score categories are defined as follows:

  • Communication Risk Score [56]: This metric is based on the level of communications access that the population in the ward has. It is based on the percentage of members of the ward with access to radios, TVs, the Internet, and other sources of news. Fig 7 presents a color-coded map that illustrates the communication risk scores at the ward level across Nigeria.

  • Exposure Risk Score [57]: This metric is an aggregate that captures indicators such as population density, proximity between households, water, sanitation, and hygiene. Fig 8 presents a color-coded map that illustrates the exposure risk scores at the ward level across Nigeria.

  • Socioeconomic Risk Score [58]: This metric captures the overall level of socioeconomic risk. Fig 9 presents a color-coded map that illustrates the socioeconomic risk scores at the ward level across Nigeria.

Fig 7. Color-coded map of communication risk scores across Nigerian wards.


Maps were visualized using GRID3’s Operational Wards data under the Creative Commons Attribution License (CC BY 4.0). https://data.grid3.org/datasets/GRID3::grid3-nga-operational-wards-v1-0/about.

Fig 8. Color-coded map of exposure risk scores across Nigerian wards.


Maps were visualized using GRID3’s Operational Wards data under the Creative Commons Attribution License (CC BY 4.0). https://data.grid3.org/datasets/GRID3::grid3-nga-operational-wards-v1-0/about.

Fig 9. Color-coded map of socioeconomic risk scores across Nigerian wards.


Maps were visualized using GRID3’s Operational Wards data under the Creative Commons Attribution License (CC BY 4.0). https://data.grid3.org/datasets/GRID3::grid3-nga-operational-wards-v1-0/about.

We can think of these “risk scores” as proxies for wealth: areas with low scores (close to 1) are typically wealthier than areas with higher scores.

Other derivative data

For each attack location, we found the five nearest security installations using Shapely and GeoPandas together with a Ball-tree nearest-neighbor search [59] and the haversine distance metric [60], which is designed for latitude–longitude data. The same method was used to find the five nearest attacks to each school.
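The nearest-neighbor step can be illustrated with a small Python sketch. It is not the authors’ code: it computes haversine distances directly and keeps the closest points by brute force, which returns the same neighbors as a Ball-tree query (for example, scikit-learn’s `BallTree` with `metric="haversine"`) but without the speed-up that matters at the scale of 103,064 schools. The mean Earth radius of 6,371 km and all coordinates below are assumptions chosen for illustration.

```python
import math

# Assumed mean Earth radius; the choice slightly affects absolute distances.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def k_nearest(point, candidates, k=5):
    """Return the k candidates closest to `point` as (distance_km, index) pairs.

    Brute force: one haversine evaluation per candidate, then a sort.
    A Ball tree with a haversine metric finds the same neighbors with
    far fewer distance evaluations on large inputs.
    """
    lat, lon = point
    scored = [(haversine_km(lat, lon, c_lat, c_lon), i)
              for i, (c_lat, c_lon) in enumerate(candidates)]
    return sorted(scored)[:k]


# Hypothetical coordinates (latitude, longitude) for illustration only.
school = (11.85, 13.16)
installations = [(11.84, 13.15), (12.00, 13.50), (9.06, 7.49), (11.90, 13.20)]
nearest = k_nearest(school, installations, k=2)
print([i for _, i in nearest])  # indices of the two closest installations
```

With five or more candidates, the default `k=5` reproduces the “five nearest” setting described above; with fewer candidates, the function simply returns all of them in order of distance.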

S1 Appendix A in the Supplementary Material contains a list of all the independent variables (i.e. features) we associated with each school.

Statistical inference of school attacks in relation to social attributes

In this section, we present the results of our first round of statistical analysis. Below, we examine the relationships between school attacks and several variables (e.g. the socioeconomic risk factors and wealth described in Hypotheses 1–4). Given a school, we examine whether a school attack happened within k ∈ {1, 2, 3, 5, 10} km of the school. Our aim is to investigate how the quality of predictions of whether a school is at risk of attack changes as we expand the radius that someone might consider threatening to the school.

Several papers (e.g. [13]) measure spatial prediction error as the distance between the location where an event is predicted to occur and its true location. The choice of k captures this in a binary setting: when our algorithms say that a school S will be attacked, will some school within the circle of radius k km around S be attacked? If so, we say the prediction is correct. This is similar to the notion of distance-based accuracy [61] used in predicting the locations of individuals or events in social media.

This framing has a natural interpretation from a defense perspective. Imagine being a parent whose child goes to school S. If a school within 2 km of school S was attacked, would you consider your child’s school S to be safe? Probably not. What if the nearest school attacked was 5 km away from your child’s school? Would that make you feel safer? From a defense perspective, the rationale for studying proximate attacks at different distances is to account for the possibility of target selection based on characteristics we do not know about. Perhaps school S′ is 2 km away from school S, and the only reason S′ was selected for an attack was that the leader of the attacking party that day knew the neighborhood around S′ slightly better than the neighborhood around S. Or perhaps the group of attackers was closer to S′ on that specific day. As we cannot systematically gather such data, we consider a school S to be attacked if any school within a distance k of S was attacked, for k ∈ {1, 2, 3, 5, 10} km, regardless of how many other schools lie within a distance k of S.
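The labeling rule can be sketched as follows, again as a hypothetical Python illustration rather than the authors’ implementation. A school receives label 1 (attacked) for radius k if at least one recorded school-attack location lies within k km of it, and 0 otherwise; the haversine helper, the 6,371 km Earth radius, and the coordinates are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

# Radii (km) for the dependent variable, as listed in the text.
RADII_KM = (1, 2, 3, 5, 10)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def label_schools(schools, school_attacks, k_km):
    """Label each school 1 if any school attack lies within k_km of it, else 0.

    The label ignores how many other schools share the same k-km disc,
    matching the "regardless of how many other schools" rule in the text.
    """
    labels = []
    for s_lat, s_lon in schools:
        hit = any(haversine_km(s_lat, s_lon, a_lat, a_lon) <= k_km
                  for a_lat, a_lon in school_attacks)
        labels.append(int(hit))
    return labels


# Hypothetical data: three schools on one meridian near a single attack site.
schools = [(11.85, 13.16), (11.88, 13.16), (12.30, 13.16)]
school_attacks = [(11.86, 13.16)]
labels_by_k = {k: label_schools(schools, school_attacks, k) for k in RADII_KM}
for k, labels in labels_by_k.items():
    print(f"k = {k} km: {labels}")
```

In this toy example the first school is about 1.1 km from the attack and the second about 2.2 km, so both switch from label 0 to label 1 as k grows, which is exactly the widening of the “threat window” discussed above.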

At the 1 km threshold, our experiments will demonstrate a lack of ability to accurately predict if a school within 1 km of a given school S will be attacked. But for the 2, 3, 5, and 10 km levels, our experiments show that our best predictive model performs well.

Fig 10 represents a visualization of assigning the dependent variable to Nigerian schools when k = 5 km. These are the dependent variables we study in this paper.

Fig 10. Schools labeled A and B were identified as having experienced attacks by Boko Haram, due to the incidents occurring within a 5 km radius of their premises.


Hypothesis 1. For a given k ∈ {1, 2, 3, 5, 10}, we define “Class 1” to be the set of all schools s that experienced a school attack within a distance of k kilometers (km) of s. “Class 0” is the set of all schools that did not experience a school attack within k km, i.e. the complement of Class 1. We hypothesize that the total number of attacks (not just school attacks) by Boko Haram differs significantly between Class 1 and Class 0.

Recall that Class 1, the quantity we aim to predict, is the set of schools s that experienced a school attack nearby, i.e. within k ∈ {1, 2, 3, 5, 10} km. The geospatial granularity of the prediction is finer when k is small. The independent variables we examined are the total numbers of attacks within a distance d ∈ {5, 10, 25, 50} km of the school, which serve as a proxy for the presence of Boko Haram in the region near the school.
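The activity features (attacks within d km of a school) can be computed with a spatial index rather than an all-pairs loop. The sketch below assumes scikit-learn's `BallTree` with the haversine metric, which expects [latitude, longitude] in radians and measures distance on the unit sphere, so each radius in km is divided by an assumed mean Earth radius of 6371.0088 km. The helper name and input layout are hypothetical.

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def attack_counts_within(schools_deg, attacks_deg, radii_km=(5, 10, 25, 50)):
    """Count attacks within each radius d (km) of every school.

    schools_deg, attacks_deg: arrays of shape (n, 2) holding (lat, lon) in degrees.
    Returns {d: integer array with one count per school}.
    """
    # The haversine metric works on [lat, lon] in radians and returns unit-sphere
    # distances, so km radii are converted by dividing by the Earth radius.
    tree = BallTree(np.radians(attacks_deg), metric="haversine")
    school_rad = np.radians(schools_deg)
    return {d: tree.query_radius(school_rad, r=d / EARTH_RADIUS_KM, count_only=True)
            for d in radii_km}


# Hypothetical example: one school and five attacks due north of it,
# at roughly 1.1, 6.7, 22.2, 44.5 and 111.2 km.
school = np.array([[11.85, 13.16]])
attacks = np.array([[11.86, 13.16], [11.91, 13.16], [12.05, 13.16],
                    [12.25, 13.16], [12.85, 13.16]])
counts = attack_counts_within(school, attacks)
# counts[5] -> [1], counts[10] -> [2], counts[25] -> [3], counts[50] -> [4]
```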

We formally compare the total number of attacks between the two classes using a linear model-based approach (two-sided Welch’s t-test) [62]. We report the mean difference between the two classes, along with its 99% confidence interval (CI) and P-value, summarized in a forest plot. We then repeat the analysis for all d ∈ {5, 10, 25, 50} km. When the 99% CI does not contain the null value of 0 (i.e. no difference), the difference is statistically significant at the 0.01 level (and hence also at P < 0.05). To address multiple hypothesis testing while accounting for the correlation structure induced by the shared radii across the analyses, we corrected each raw P-value using the false discovery rate (FDR) method [63]. S10 Appendix J provides a detailed description of the statistical methods used in this paper.
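To make the testing procedure concrete, here is a sketch of the per-comparison statistics and of the two multiple-testing corrections used in this section (FDR here, Bonferroni for Hypotheses 2 and 3). It assumes each class is given as an array of per-school values, uses SciPy's Welch test (`ttest_ind` with `equal_var=False`), and builds the CI from the Welch-Satterthwaite degrees of freedom. `bh_fdr` implements the standard Benjamini-Hochberg step-up procedure, which we assume is what the cited FDR method refers to. All helper names are hypothetical.

```python
import numpy as np
from scipy import stats


def welch_diff_ci(class1, class0, level=0.99):
    """Mean difference (Class 1 - Class 0), its Welch CI, and the two-sided Welch P-value."""
    x1, x0 = np.asarray(class1, float), np.asarray(class0, float)
    n1, n0 = x1.size, x0.size
    v1, v0 = x1.var(ddof=1) / n1, x0.var(ddof=1) / n0  # squared standard errors
    diff = x1.mean() - x0.mean()
    se = np.sqrt(v1 + v0)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v0) ** 2 / (v1 ** 2 / (n1 - 1) + v0 ** 2 / (n0 - 1))
    t_crit = stats.t.ppf(0.5 + level / 2, df)
    p = stats.ttest_ind(x1, x0, equal_var=False).pvalue
    return diff, (diff - t_crit * se, diff + t_crit * se), p


def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted P-values (step-up, made monotone, capped at 1)."""
    p = np.asarray(pvals, float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity by taking running minima from the largest P-value down.
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out


def bonferroni(pvals):
    """Bonferroni-adjusted P-values: multiply by the number of tests, cap at 1."""
    p = np.asarray(pvals, float)
    return np.minimum(p * p.size, 1.0)


# Toy example with made-up attack counts for the two classes.
diff, (lo, hi), p = welch_diff_ci([5, 6, 7, 8, 9], [1, 2, 3, 4, 5])
adjusted = bh_fdr([0.01, 0.04, 0.03, 0.005])  # approximately [0.02, 0.04, 0.04, 0.02]
```

Because the CI and the P-value come from the same t distribution, a 99% CI that excludes 0 corresponds exactly to a raw two-sided P < 0.01.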

Results and Analysis of Hypothesis 1. Fig 11 shows the results (Forest Plots, FDR-corrected P–values) of our analysis for each of the dependent variables based on k. Based on Fig 11, we can conclude that:

Fig 11. Hypothesis 1: Difference between means of “Class 1” (a school attack happens within k km of a given school) vs. “Class 0” (no school attack happens within k km of a given school) for k ∈ {1, 2, 3, 5, 10}, with the total number of other attacks within d ∈ {5, 10, 25, 50} km of the school as the independent variables.


The exact numerical values of the point estimates and 99% CI can be found in Table S4, S5 Appendix E.

  • There is a significantly higher number of attacks in Class 1 across all values of d and all the 5 dependent variables (i.e. values of k).

  • There is consistency in direction, i.e. the difference between the means of Class 1 and Class 0 increases as d and k increase.

  • All results are statistically significant, with FDR < 0.05 in all cases.

  • For all values of k, there is a step-wise, upward trend in the mean difference as d increases, supporting greater certainty in the total number of attacks.

  • As k increases (e.g. 1–3 km), the total number of attacks tends to decrease.

Finding 1. In short, we can conclude that the Activity hypothesis is correct: the more Boko Haram activity (other attacks) occurs in a region, the more likely it is that school attacks will occur in that region.

Our second hypothesis investigates whether proximity to security installations (e.g. police stations, military bases) is linked to whether a school was attacked. To test this, we computed the distance from each school s ∈ S to its 1st, 2nd, ..., 5th closest security installation. We asked the question: between Class 1 (a school attack happens within k km of a given school) and Class 0 (no school attack happens within k km of a given school), is there a statistically significant difference in the distance to the 1st, 2nd, ..., 5th closest security installation?
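The distance features used in this hypothesis can be sketched with NumPy as a brute-force haversine distance matrix followed by a row-wise sort. Assumptions: coordinates are (latitude, longitude) in degrees, the Earth is treated as a sphere of radius 6371.0088 km, and the function name is hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius (spherical model)


def nearest_installation_distances(schools_deg, installations_deg, n_nearest=5):
    """Distances (km) from each school to its 1st, ..., n-th closest installation.

    Inputs are arrays of shape (n, 2) holding (lat, lon) in degrees. Returns an
    array of shape (n_schools, n_nearest), nearest installation first.
    """
    s = np.radians(np.asarray(schools_deg, dtype=float))[:, None, :]        # (n_s, 1, 2)
    q = np.radians(np.asarray(installations_deg, dtype=float))[None, :, :]  # (1, n_q, 2)
    dlat = q[..., 0] - s[..., 0]
    dlon = q[..., 1] - s[..., 1]
    a = np.sin(dlat / 2) ** 2 + np.cos(s[..., 0]) * np.cos(q[..., 0]) * np.sin(dlon / 2) ** 2
    dist = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))                       # (n_s, n_q)
    # Sort each school's row and keep the n_nearest smallest distances.
    return np.sort(dist, axis=1)[:, :n_nearest]


# Hypothetical example: six installations due north of one school, in shuffled order.
school = [[10.0, 10.0]]
installations = [[10.03, 10.0], [10.06, 10.0], [10.01, 10.0],
                 [10.05, 10.0], [10.02, 10.0], [10.04, 10.0]]
d = nearest_installation_distances(school, installations)
# d[0] is approximately [1.11, 2.22, 3.34, 4.45, 5.56]
```

The full matrix holds one entry per school-installation pair, so for the complete school dataset it would be computed in chunks of schools or replaced by a k-nearest-neighbour query on a spatial index.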

Hypothesis 2. Consider the difference between the means of Class 1 (a school attack happens within k km of a given school) and Class 0 (no school attack happens within k km of a given school) for k ∈ {1, 2, 3, 5, 10}, where the variables are the distances from each school s ∈ S to its 1st, 2nd, ..., 5th closest security installation. We hypothesize that there is a statistically significant difference between Classes 1 and 0 in the distance to the 1st, 2nd, ..., 5th closest security installation.

We use the same statistical approach as for Hypothesis 1 and repeat the analysis for all km variations. Unlike Hypothesis 1, we address multiple hypothesis testing using the Bonferroni correction instead of the FDR method because the distances from the school to the 1st, 2nd, ..., 5th closest security installations are conditionally independent. Fig 12 shows the forest plots and P-values (raw and Bonferroni-corrected) for each dependent variable k ∈ {1, 2, 3, 5, 10} km.

Fig 12. Hypothesis 2: Difference between means of “Class 1” (a school attack happens within k km of a given school) vs. “Class 0” (no school attack happens within k km of a given school) for k ∈ {1, 2, 3, 5, 10}, considering the distance from each school s ∈ S to its 1st, 2nd, ..., 5th closest security installation.


Results and Analysis of Hypothesis 2. Fig 12 shows the results (Forest Plots, raw and Bonferroni-corrected P-values) of our analysis for each of the dependent variables k ∈ {1, 2, 3, 5, 10} km. We can conclude from this figure that:

  1. In all variations k ∈ {1, 2, 3, 5, 10} km of the dependent variable, Class 1 (Attack) is significantly farther away from a security installation than Class 0 (No Attack).

  2. The magnitude of the difference increases steadily from the 1st to the 5th closest security installation. One point to note is that, when k ∈ {5, 10} km, the estimates for the 1st and 2nd closest security installations were very close.

  3. At higher km variations, the mean-difference estimates for the 1st, ..., 5th closest installations become more similar to one another and less distinguishable.

Finding 2. We therefore conclude with high statistical significance that the closer a school is to security installations, the less likely it is to be attacked.

Our third hypothesis is whether Boko Haram would be more likely to target socioeconomically weaker areas than wealthier areas. This hypothesis is based on the theory of Social Disorganization, which suggests that areas burdened with socioeconomic factors such as unemployment, poverty, and inequality often struggle to maintain strong social controls and a united community. This leads to social disorganization, where the community cannot uphold standards that prevent behavior such as crime or terrorism [20,21].

Hypothesis 3. The difference between the means of “Class 1” (a school attack happens within k km of a given school) and “Class 0” (no school attack happens within k km of a given school), for k ∈ {1, 2, 3, 5, 10}, increases as the Socioeconomic Risk Score of the region of each school decreases (i.e. the school is in a richer neighborhood).

We used the same statistical method including the Bonferroni correction of raw P-values as in Hypothesis 2. Fig 13 shows the forest plots that summarize the finding.

Fig 13. Hypothesis 3: The mean difference between “Class 1” (a school attack happens within k km of a given school) and “Class 0” (no school attack happens within k km of a given school) for k ∈ {1, 2, 3, 5, 10} increases as the Socioeconomic Risk Score of the region of each school decreases (i.e. the school is in a wealthier neighborhood).


Numerical values associated with each point estimate and 99% confidence interval can be found in Table S6, S5 Appendix E.

Results and Analysis of Hypothesis 3. We were surprised to find a positive relationship between school attacks and the Exposure Risk, but a negative relationship between school attacks and each of the Socioeconomic Risk and the Communication Risk (Fig 13). Specifically, we found that:

Finding 3.

  1. The direction of the differences is consistent across all km variations of the dependent variable.

  2. There is a negative association between school attacks and both the Socioeconomic Risk Score and the Communication Risk Score.

  3. There is a positive association between school attacks and Exposure Risk Score.

  4. These results held for all k ∈ {2, 3, 5, 10} km settings but not for the 1 km variation. In the k ∈ {2, 3, 5, 10} km settings, all risk scores individually showed highly significant differences between Classes 1 and 0. The results remain statistically significant after the Bonferroni P-value correction, a highly conservative approach to multiple hypothesis testing.

The testing of Hypothesis 3 yields more nuanced results than the previous hypotheses. We find that the mean differences between Class 1 and Class 0 with respect to both the Socioeconomic Risk Score and Communication Risk Score are negative. In other words, these risk scores are low for Class 1 and high for Class 0. This finding supports the notion that schools with a higher probability of attack tend to be located in wealthier areas (i.e. areas with low values of these risk scores). In contrast, we also find that the difference between the means of Class 1 and Class 0 in Exposure Risk scores is positive, suggesting that regions with poor sanitation and water are at greater risk of school attacks. One possibility might be that regions with poor sanitation and water tend to be in rural areas.

Our analysis of socioeconomic and communication risks suggests that Boko Haram targets both wealthy areas, perhaps to obtain lucrative ransom payments from kidnappings, and poor areas, perhaps to obtain child soldiers, domestic workers, and/or sex slaves. There is evidence supporting the hypothesis that Boko Haram generates revenue through kidnapping and ransom [64]. See also the assertion in [65], attributed to an unspecified 2021 National Geographic article, that ransoms were paid for 103 of the 276 girls kidnapped in Chibok, although the Nigerian Government at the time officially denied making ransom payments to Boko Haram in exchange for the release of the Chibok girls [66].

Finally, we hypothesize that Boko Haram targets schools in rural areas more frequently than schools in urban areas, where urban and rural areas are defined by population. A wide body of literature has examined whether terrorist groups prefer urban or rural targets. For instance, [67] argues that “urban locations make attacks against civilian targets more likely, whereas rural areas increase the likelihood of attacks against the police and governmental targets.” The question of urban vs. rural locations has not been studied in the context of schools, which is why we chose to study it below. [67] also argues that urban locations are attractive because of the density of targets and because terrorists can move with relative anonymity, whereas in a rural region outsiders may stand out. As population density seems linked to this hypothesis, we also investigated a secondary hypothesis, based on the idea that urban areas can be further classified into urban centers (areas of > 50,000 people) and urban clusters (areas of > 2,500 and < 50,000 people) [68]. We expected that the less densely populated region would be targeted more frequently. Our hypotheses can be stated more formally as two questions:

Hypothesis 4. (a) Does the frequency of school attacks differ between Rural vs. Urban (defined based on census-based population)?

(b) Considering non-rural schools only, does the frequency of school attacks differ between Urban Center and Urban Clusters?

To formally address Hypothesis 4(a), we use the data of all schools to set up a 2×2 contingency table with respect to the categories of interest and compute the odds ratio (OR) and 99% CI. We also calculated statistical significance using the Chi-squared test with Yates’ Continuity Correction [69]. The results are shown in Table 2. Appendix G in the Supplementary Material shows additional statistics for the 2×2 contingency table that go beyond odds ratios. These include PPV, NPV, Sensitivity, and Specificity.
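A sketch of the 2×2 analysis applied to the counts in Table 2. It computes the sample odds ratio, a 99% CI using Woolf's log-scale (normal approximation) interval, and the Yates-corrected chi-squared P-value via SciPy's `chi2_contingency`. The CI method is our assumption, since the text does not state which interval was used, so the bounds need not match Table 3 exactly. The helper name is hypothetical.

```python
import math

import numpy as np
from scipy.stats import chi2_contingency, norm


def odds_ratio_2x2(a, b, c, d, level=0.99):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]].

    Rows: Attack = 1 / Attack = 0; columns: group of interest / reference group.
    Returns (OR, (lower, upper), P): Woolf's log-scale CI and the P-value of the
    chi-squared test with Yates' continuity correction.
    """
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    z = norm.ppf(0.5 + level / 2)
    ci = (math.exp(math.log(or_hat) - z * se_log),
          math.exp(math.log(or_hat) + z * se_log))
    _, p, _, _ = chi2_contingency(np.array([[a, b], [c, d]]), correction=True)
    return or_hat, ci, p


# Counts from Table 2: (A) Urban vs. Rural, (B) Urban Center vs. Urban Cluster.
urban_vs_rural = odds_ratio_2x2(2386, 210, 71895, 28573)
centers_vs_clusters = odds_ratio_2x2(2145, 241, 34929, 36966)
```

With these counts, the point estimates round to 4.52 and 9.42, matching the odds ratios reported in Table 3.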

Table 2. Contingency table setups for the hypotheses.

(A) All schools, n = 103,064
              Urban              Rural
Attack = 1    2,386 (3.21%)      210 (0.73%)
Attack = 0    71,895 (96.79%)    28,573 (99.27%)

(B) Urban schools only, n = 74,281
              Urban Center       Urban Cluster
Attack = 1    2,145 (5.79%)      241 (0.65%)
Attack = 0    34,929 (94.21%)    36,966 (99.35%)

Next, we restrict our data to a subset containing non-rural schools (Table 2B). We compare the proportion of school attacks in the more densely populated “urban centers” to the less densely populated “urban clusters” using the same statistical approach.

Results and Analysis of Hypothesis 4. Table 3 shows the results of our analysis. We find that the odds of a school attack in urban regions are 4.52 times those in rural regions. Limiting the analysis to schools in urban regions, the odds of a school attack in urban centers are 9.42 times those in urban clusters. Table S7 in S7 Appendix G provides additional metrics for a more comprehensive evaluation of the analysis.

Table 3. Association analysis of school attacks and socioeconomic wealth.

Comparison                                            Odds Ratio   99% CI, lower   99% CI, upper   P-value
Urban vs. Rural (all schools included)                4.52         3.92            5.23            < 5 × 10^−115
Urban Centers vs. Urban Clusters (Rural excluded)     9.42         8.23            10.81           < 1 × 10^−6

Finding 4. These results suggest that Boko Haram targets schools in more densely populated areas (schools in urban centers are more likely to be targeted than schools in urban clusters, which, in turn, are more likely to be targeted than schools in rural areas).

Machine learning based analysis

In this section, we describe the results of machine learning-based analysis of whether Boko Haram would target a school. We note that our goal in this paper is not to develop new machine learning techniques, but to adapt them to gain insights about the risk that schools in Nigeria face. We report the results of two experiments:

  1. How well can we predict the schools that Boko Haram will attack as we vary the granularity of prediction with k ∈ {1, 2, 3, 5, 10} km?

  2. Which features are the most impactful in making these predictions? The answer to this latter question sheds light on the factors that are most important in determining where Boko Haram targets its school attacks.

Predictive Performance. Our data consisted of triples (s, f_s, dv_s), where s is a school, f_s is the feature vector of length 15 associated with s, and dv_s is a vector of 5 dependent variables. The feature vector f_s consists of the 15 features shown in S1 Appendix A Table S1. The vector dv_s contains one dependent variable for each of k = 1, 2, 3, 5, 10 km. The value of the dependent variable is set to 1 for a school s if Boko Haram attacked a school within k km of s; otherwise, it is set to 0.

Machine learning classifiers fall into two broad categories: those based on deep learning and more traditional ones. We selected six of the most well-known traditional machine learning classifiers that have performed well in many other settings. The six traditional classifiers were Random Forest, Decision Trees, AdaBoost, Logistic Regression, Linear SVM, and Gaussian Naive Bayes. For instance, Random Forest classifiers have outperformed many other machine learning classifiers in many settings. On the deep learning side, we also tried Multi-Layer Perceptrons and Deep Neural Networks—the first because it is a foundational one and the second because it is extremely popular in the literature today. As a third deep learning model, we also tried a time-based Long Short-Term Memory (LSTM) neural network based model—but the results in this case (see S9 Appendix I) were truly abysmal and hence are not reported in the main body of the paper.

The features used by these classifiers are shown in S1 Appendix A. These features were selected on the basis of social science theories from criminology, conflict studies, and sociology. The rationale for features related to the distance to the nearest security installations is the social science theory that the presence of police stations deters crime [15,16]. The rationale for features related to the rural vs. urban nature of crime is that urban crime may be more prevalent in some cases [22] and rural crime may be more prevalent in others [67]. The rationale for features based on the socioeconomic characteristics of a region is the idea that poor neighborhoods (e.g. in the USA) are more likely to experience crime than rich ones [20,21].

All our machine learning performance results were obtained via a standard 10-fold cross-validation protocol in which training/validation was done on 9 folds and testing on the 10th (holdout) fold. This was repeated 10 times for each classifier, varying the holdout fold in each iteration and training/validating on the remaining 9 folds. Table 4 shows the Precision, Recall, F1 Score, and Area Under the Receiver Operating Characteristic Curve (AUC) for the best-performing classifier. AUC and F1 Score are single-number performance metrics; F1 is the harmonic mean of Precision and Recall. For all k’s, AdaBoost generated the best results in both F1 Score and AUC. S2 Appendix B shows the detailed breakdown of the predictive performance of all the classifiers tested for all k’s.
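The evaluation protocol can be sketched with scikit-learn's `cross_validate`. This is an illustrative sketch under our own assumptions, not the authors' pipeline: default hyperparameters, stratified and shuffled folds, feature scaling for the two linear models, and synthetic data from `make_classification` standing in for the 15 school features. The inner validation step (tuning on the 9 training folds) is omitted for brevity, and the names `CLASSIFIERS` and `compare_classifiers` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# The six traditional classifiers named in the text, with default settings.
CLASSIFIERS = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Linear SVM": make_pipeline(StandardScaler(), LinearSVC()),
    "Gaussian Naive Bayes": GaussianNB(),
}


def compare_classifiers(X, y, n_splits=10, seed=0):
    """Mean k-fold CV precision, recall, F1 and ROC AUC for each classifier."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scoring = ["precision", "recall", "f1", "roc_auc"]
    results = {}
    for name, clf in CLASSIFIERS.items():
        scores = cross_validate(clf, X, y, cv=cv, scoring=scoring)
        results[name] = {m: scores[f"test_{m}"].mean() for m in scoring}
    return results


# Synthetic, imbalanced stand-in for one dependent variable (e.g. a single k).
X, y = make_classification(n_samples=600, n_features=15, n_informative=6,
                           weights=[0.8], random_state=0)
results = compare_classifiers(X, y)
best = max(results, key=lambda name: results[name]["f1"])
```

In practice the same comparison would be run once per dependent variable k, reusing the same feature matrix with a different label vector.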

Table 4. Summary of performance of the best machine learning classifier (AdaBoost).

Dependent variable (k) Precision Recall F1 Score AUC
1 km. 0.83 0.62 0.71 0.81
2 km. 0.92 0.91 0.91 0.95
3 km. 0.94 0.92 0.93 0.96
5 km. 0.96 0.97 0.97 0.98
10 km. 0.99 0.98 0.98 0.99

The predictive performance results achieved by AdaBoost, our best classifier, are shown in Table 4. The results show that predictive performance is high when k ≥ 2: both precision and recall are above 90%. This means that when our AdaBoost models for k ≥ 2 predict that a school will be attacked, the probability that a school within k km of it will in fact be attacked is over 91%. Moreover, recall is also high for k ≥ 2, meaning that our models correctly identify most of the schools that were in fact attacked.

Throughout this analysis, we see that as k increases from 1 to 10, all performance metrics improve. When k = 1, both precision and recall drop, suggesting that predicting at the 1 km spatial granularity is much more challenging than at a granularity of 2 km or more. Yet precision is still 83%, meaning that when AdaBoost predicts at k = 1 that a school will be attacked, the prediction is correct 83% of the time. Recall, however, drops dramatically to 62% for k = 1, compared to at least 91% for k ≥ 2.
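As a worked check on Table 4, F1 is the harmonic mean of precision P and recall R. Using the rounded k = 1 values (so agreement is only to two decimals):

```latex
F_1 = \frac{2PR}{P + R}, \qquad
F_1^{(k=1)} = \frac{2 \times 0.83 \times 0.62}{0.83 + 0.62} = \frac{1.0292}{1.45} \approx 0.71
```

Because the harmonic mean is pulled toward the smaller of its two arguments, the low recall at k = 1 keeps F1 well below the precision of 0.83.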

The performance metrics are also illustrated in Fig 14, where the AdaBoost Receiver Operating Characteristic (ROC) curves for all k’s are plotted. These curves demonstrate the trade-off between the true positive rate and the false positive rate across various thresholds. S8 Appendix H contains the confusion matrices of the AdaBoost model across all k’s. As k decreases, the model’s ability to predict positive instances also decreases, suggesting that it performs better on a larger spatial scale. To ensure reproducibility, it is important to note that the model was run using version 1.1.2 of the Scikit-Learn library.

Fig 14. Receiver operating characteristic curve for AdaBoost.


Finding the Most Impactful Features. To identify the features that have the biggest impact on predictive accuracy, we used standard ablation testing. In ablation testing, we drop one feature at a time from the overall set of 15 features considered (cf. Table S1 in S1 Appendix A) and measure the reduction in predictive performance (F1 score). The most important feature is the one whose removal leads to the greatest reduction in predictive performance. After identifying and dropping the most important feature, we recompute the drop in predictive performance for each remaining feature to find the second most important feature. This process is repeated to find the third most important feature, and so forth. Table 5 shows the three most important features (by rank) used by our best-performing model (AdaBoost) as we vary k. The rows of this table show the features and the columns show k. For instance, the row “Dist. to 3rd closest security installation” indicates that this is the most important feature for the dependent variable with k = 5 km and the second most important feature for the k = 2 km dependent variable.
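The iterative ablation procedure described above is a greedy backward elimination. A sketch with scikit-learn, assuming AdaBoost with default settings, cross-validated F1 as the performance measure, and synthetic data; the helper names are hypothetical, and the demo uses fewer folds and features only to keep it fast.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def cv_f1(X, y, cols, n_splits=10, seed=0):
    """Mean cross-validated F1 of AdaBoost trained only on the given columns."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    clf = AdaBoostClassifier(random_state=seed)
    return cross_val_score(clf, X[:, cols], y, cv=cv, scoring="f1").mean()


def greedy_ablation_ranking(X, y, names, top=3, n_splits=10):
    """Rank features by repeated drop-one ablation (greedy backward elimination).

    Each round drops each remaining feature in turn, records the loss in F1
    relative to the current feature set, then permanently removes the feature
    whose removal hurts F1 most; that feature receives the next rank.
    """
    assert top < X.shape[1], "at least one feature must remain"
    remaining = list(range(X.shape[1]))
    ranking = []
    for _ in range(top):
        base = cv_f1(X, y, remaining, n_splits)
        drops = {j: base - cv_f1(X, y, [c for c in remaining if c != j], n_splits)
                 for j in remaining}
        best = max(drops, key=drops.get)
        ranking.append((names[best], drops[best]))
        remaining.remove(best)
    return ranking


# Hypothetical synthetic data with generic feature names.
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           n_redundant=0, random_state=0)
names = [f"feature_{i}" for i in range(X.shape[1])]
ranking = greedy_ablation_ranking(X, y, names, top=2, n_splits=5)
```

Recomputing every drop after each removal is what distinguishes this from a single pass of leave-one-out ablation, at the cost of roughly `top` times as many model fits.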

Table 5. Ranks of the importance of features (rows) as we vary the dependent variable k (columns). Determined using ablation testing with our best-performing classifier, AdaBoost.

Feature name k = 10 k=5 k=3 k=2 k=1
Attacks within 10 kms. of school 1
Dist. to closest security installation 2 3 1 3
Dist. to 2nd closest security installation 2 3 1
Dist. to 3rd closest security installation 1 2
Dist. to 4th closest security installation 3
Dist. to 5th closest security installation 3 2 1 2

Finding 5. The distances to nearby security installations form the most important type of variable when predicting which schools will be attacked. The exact ranks of individual features across the different variations k ∈ {1, 2, 3, 5, 10} km of the dependent variable are perhaps less important. What is important is that the features describing the proximity of the school to the nearest, 2nd, 3rd, 4th, and 5th nearest security installations consistently rank highly (in the top 3) across many variations of k. This reinforces the findings for Hypothesis 2, which we explored earlier, showing that the presence of security forces near schools is a powerful deterrent to Boko Haram attacks on schools. That said, it would be interesting for future work to understand why the ranks turn out the way they do for different values of k.

Multivariate Decision Trees. In addition to the high-quality predictions generated by AdaBoost, our Decision Tree classifiers also performed well, with F1 scores of 0.38, 0.79, 0.85, 0.78, and 0.81 for the k = 1, 2, 3, 5, 10 km dependent variables, respectively. Except for the k = 1 case (where, as we have seen, our predictive ability is weak), these F1 scores are all over 0.75 and hence quite strong. A notable advantage of decision trees is that they are easily explainable. Fig 15 shows part of our decision tree for the k = 1 dependent variable. Showing the entire decision tree is challenging because the fonts become too small to read, so we focus on the parts of the tree that are relevant to our claims. The bold blue line shows the following hypothesis. Let S be the set of schools that satisfy all of the following conditions: the total number of attacks within 5 km of the school exceeds 7.5, the distance to the 4th closest security installation exceeds 35.431 km, the distance to the closest security installation is less than 2.973 km, and the socioeconomic score of the ward to which the school belongs is less than 2.004. There were 37 schools satisfying this condition, 30 of which experienced Boko Haram attacks. This leads us to formulate the hypothesis that schools in S (schools satisfying this logical condition) are far more likely to experience a kidnapping attack by Boko Haram than schools in S̄ (all other schools). Similar hypotheses can be generated from all of the decision trees that we have learned. S3 Appendix C presents all the decision trees we extracted for the k = 2, 3, 5, 10 km cases.
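Root-to-leaf paths like the one described above can be printed directly from a fitted scikit-learn tree with `export_text`. In this sketch the data are synthetic and the feature names are hypothetical labels chosen to echo the conditions above, so the printed thresholds will not match the paper's.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical feature names mirroring the conditions discussed in the text.
FEATURES = [
    "attacks_within_5km",
    "dist_1st_security_km",
    "dist_4th_security_km",
    "socioeconomic_risk",
]

# Synthetic, imbalanced stand-in data (about 10% positives).
X, y = make_classification(n_samples=500, n_features=4, n_informative=3,
                           n_redundant=0, weights=[0.9], random_state=0)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

# Each root-to-leaf path in the printout is a conjunction of threshold tests,
# i.e. a candidate multivariate hypothesis; show_weights adds per-leaf class counts.
rules = export_text(tree, feature_names=FEATURES, show_weights=True)
print(rules)
```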

Fig 15. Relevant Part of the Decision Tree Extracted for k=1 Dependent Variable.


Multivariate machine learning inspired statistical inferences of Boko Haram school attacks

Though AdaBoost is the best-performing predictive model, Decision Trees provide explanations that are easy to interpret. We therefore examined the decision trees for the k = 1 and k = 2 cases. Fig 15 shows the decision tree we derived for the k = 1 case. Even though predicting which schools will be attacked is least effective when k = 1, as shown in Table 4, this decision tree still allows us to formulate some interesting hypotheses, and, as we show below, these hypotheses receive strong statistical support.

In this section, we discuss new hypotheses based on the decision tree models described in the previous section. We then formally address these hypotheses using a maximum likelihood-based statistical approach.

The paths we select in our decision trees are highlighted in the hypotheses below. For example, the path highlighted in blue in Fig 15 reflects a condition which, when satisfied by a school, leads to a 30/32 = 93.75% probability that the school will experience an attack. In contrast, the unconditional probability of a school being attacked is 76/103,064 ≈ 7.37 × 10^−4, far less than 1%. Thus, even the decision trees for the k = 1 case can shed important light on conditions that, when satisfied by a school, suggest a high likelihood of an attack on that school. S3 Appendix C shows the decision trees for the k = 2, 3, 5, 10 cases, which contain similarly interesting paths.
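The comparison above (the attack rate on one path versus the unconditional base rate) can be automated by scoring every leaf of a fitted tree. A hedged sketch, assuming scikit-learn's `DecisionTreeClassifier.apply` to map each school to its leaf; the helper name and the synthetic data are hypothetical. Rates computed on the training data are optimistic, so paths found this way are best treated as hypotheses to be tested formally, as is done below.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def leaf_attack_rates(tree, X, y):
    """(leaf id, sample count, positive rate) for each leaf, highest rate first."""
    leaves = tree.apply(X)  # index of the leaf reached by each sample
    rows = []
    for leaf in np.unique(leaves):
        mask = leaves == leaf
        rows.append((int(leaf), int(mask.sum()), float(y[mask].mean())))
    return sorted(rows, key=lambda r: r[2], reverse=True)


# Synthetic stand-in with a low base rate of positives (about 5%).
X, y = make_classification(n_samples=2000, n_features=5, n_informative=3,
                           n_redundant=0, weights=[0.95], random_state=1)
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=20, random_state=0).fit(X, y)
base_rate = y.mean()
rows = leaf_attack_rates(tree, X, y)
for leaf, n, rate in rows[:3]:
    print(f"leaf {leaf}: n={n}, attack rate={rate:.3f}, lift={rate / base_rate:.1f}x")
```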

We first investigate the hypotheses inspired by the example in Fig 15.

Hypothesis 5. Let S be the set of schools that satisfy the following conjunctive (logical and) condition. Each school s ∈ S must satisfy all of the following:

  1. at least 7.5 attacks within a 5 km radius of the school and

  2. the distance of the 4th closest security installation to s is over 35.431 km and

  3. the distance to the nearest security installation to s is 2.973 km or less and

  4. the socioeconomic risk score for the school is less than 2.004.

Let S̄ be the set of all other schools (i.e. those that do not satisfy the above condition). We hypothesize that schools in S are far more likely than schools in S̄ to experience a Boko Haram kidnapping attack within a 1 km distance.
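Membership in S is a conjunction of four threshold tests, after which the contingency table for this hypothesis is a cross-tabulation of membership in S against the k = 1 km attack label. A NumPy sketch, assuming the four per-school features are already available as arrays; the function and argument names are hypothetical.

```python
import numpy as np


def hypothesis5_mask(attacks_5km, dist_1st_km, dist_4th_km, socio_risk):
    """Boolean mask for set S in Hypothesis 5 (conjunction of four threshold tests)."""
    return (
        (attacks_5km >= 7.5)         # condition 1: at least 7.5 attacks within 5 km
        & (dist_4th_km > 35.431)     # condition 2: 4th closest installation over 35.431 km away
        & (dist_1st_km <= 2.973)     # condition 3: nearest installation within 2.973 km
        & (socio_risk < 2.004)       # condition 4: socioeconomic risk score below 2.004
    )


def contingency_2x2(in_s, attacked):
    """Counts [[attack & S, attack & not S], [no attack & S, no attack & not S]]."""
    in_s, attacked = np.asarray(in_s, bool), np.asarray(attacked, bool)
    return np.array([
        [np.sum(attacked & in_s), np.sum(attacked & ~in_s)],
        [np.sum(~attacked & in_s), np.sum(~attacked & ~in_s)],
    ])


# Hypothetical four-school example.
attacks_5km = np.array([10, 10, 3, 20])
dist_1st_km = np.array([1.0, 1.0, 1.0, 5.0])
dist_4th_km = np.array([40.0, 40.0, 40.0, 40.0])
socio_risk = np.array([1.5, 1.5, 1.5, 1.5])
attacked = np.array([1, 1, 0, 1])
in_s = hypothesis5_mask(attacks_5km, dist_1st_km, dist_4th_km, socio_risk)  # [True, True, False, False]
table = contingency_2x2(in_s, attacked)  # [[2, 1], [0, 1]]
```

Because attack counts are integers, the split "at least 7.5" is equivalent to "more than 7".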

There were 37 schools in S, 30 of which experienced attacks. Table 6 shows the contingency table for testing the hypothesis. We use the odds ratio (OR) to quantify how many times higher the odds of a school attack are in S than in S̄.

Table 6. Example of contingency table setup for testing Hypothesis 5.

              Set S           Set S̄
Attack = 1    30 (81.08%)     117 (0.16%)
Attack = 0    7 (18.92%)      71,990 (99.84%)
  • OR > 1: there was an enrichment of school attacks in set S.

  • OR < 1: there was a depletion of school attacks in set S (or, equivalently, an enrichment of school attacks in set S̄).

  • OR = 1 (i.e. the null value) indicates no difference between sets S and S̄.

As before, we used the P-values from the Chi-squared test with Yates’ continuity correction [69] to determine statistical significance.

We similarly set up a contingency table for each of the remaining six decision-tree hypotheses. Table 7 shows that in general:

Table 7. Results of statistical evaluation of the decision tree hypotheses.

k (km)  Hyp. No.  Attacks in Set S: Count (%)  Attacks in Set S̄: Count (%)  Odds Ratio  99% CI, lower  99% CI, upper  Chi-squared P-value
1       5         30 (81.08)                   117 (0.16)                    2632.5      1122.2         8192.0         0.0002
1       6         30 (93.75)                   117 (0.16)                    7766.5      2409.4         4.5 × 10^15    0.0002
2       7         50 (89.29)                   358 (0.50)                    1616.9      685.9          4300.3         0.0002
2       8         50 (94.34)                   358 (0.50)                    3197.0      1050.3         16384.0        0.0002
2       9         122 (82.99)                  286 (0.40)                    1234.1      753.6          2015.4         0.0002
2       10        118 (93.65)                  290 (0.40)                    3320.5      1789.3         8192.0         0.0002
2       11        118 (95.93)                  290 (0.40)                    5658.4      2409.4         16384.0        0.0002
  • all OR point estimates are far above the null value of 1

  • none of the 99% confidence intervals (CIs) contain the null value of 1

  • all chi-squared P-values are well below 10^−3

These results serve as strong evidence to support Hypothesis 5. Furthermore, additional metrics that enhance the understanding of DT-based statistical inference studies such as PPV, NPV, Sensitivity, and Specificity, are detailed in Table S8, found in S7 Appendix G.

Hypothesis 6. Let S be the set of schools that satisfy the following logical condition. Each school s ∈ S must satisfy all of the following:

  1. between 7.5 and 80.5 attacks occurred within a 5 km radius of the school and

  2. the distance of the 4th closest security installation to s is over 35.431 km and

  3. the distance to the nearest security installation to s is between 0.6 and 2.973 km and

  4. the socioeconomic risk score for the school is less than 2.004.

Our hypothesis is that schools in S are far more likely to experience a kidnapping attack by Boko Haram than schools in S̄.

Notably, Hypothesis 6 differs slightly from Hypothesis 5 in two respects: condition (1) now also has an upper bound (80.5 attacks), and condition (3) now also has a lower bound (0.6 km). Formally testing this hypothesis, we find that the odds of attack in set S are over 7,700 times those in set S̄ (99% CI: 2409.4 to 4.5 × 10^15), with a highly significant chi-squared P-value of 0.0002.

Table 8 contains some additional hypotheses suggested by the decision tree for k = 2. We do not discuss these in detail here for space reasons. S11 Appendix K provides a comprehensive description and interpretation of these hypotheses for k=2, as detailed in Table 7.

Table 8. In this table, each hypothesis consists of 3–5 preconditions whose conjunction (logical and) implies a school attack.

Hyp. No. Precondition No. Parameter Requirement
Hyp. 7 1 Number of attacks within 5 km radius Between 7.5 and 80.5
2 Number of attacks within 25 km radius 17.5
3 Distance to nearest security installation 2.73 km
Hyp. 8 1 Number of attacks within 5 km radius Between 7.5 and 80.5
2 Number of attacks within 25 km radius 17.5
3 Distance to nearest security installation 2.73 km
4 Communication risk score for the school 1.022
Hyp. 9 1 Number of attacks within 5 km radius > 80.5
2 Socioeconomic risk for the school 2.283
3 Number of attacks within 25 km radius > 88.5
Hyp. 10 1 Number of attacks within 5 km radius > 80.5
2 Socioeconomic risk for the school 2.283
3 Number of attacks within 25 km radius > 88.5
4 Distance to 4th nearest security installation > 116.098 km
Hyp. 11 1 Number of attacks within 5 km radius > 80.5
2 Socioeconomic risk for the school 2.283
3 Number of attacks within 25 km radius > 88.5
4 Distance to the 4th nearest security installation > 116.098 km
5 Number of attacks within 10 km radius > 84.5

Interpretation: Let S be the set of schools that satisfy the above conditions, i.e. each school s ∈ S meets all the preconditions of a given hypothesis. Our hypothesis is that schools in S are far more likely to experience a kidnapping attack by Boko Haram than schools in S̄.

Taken together, our statistical evidence supports the conclusion that schools satisfying the conditions to be in S are far more likely to experience attacks by Boko Haram than schools that do not.

Limitations

Our work has several limitations.

First, like many open source projects, our effort is limited by the data that we were able to collect. The ACLED dataset [6], though excellent and widely used, may have missed some school attacks and other types of attacks by Boko Haram. Likewise, the GRID3 Data Hub includes data about school locations, security installations, and socioeconomic characteristics for most of Africa, but may also be incomplete. We have tried to augment GRID3 with data from another widely used system, OpenStreetMap, to the extent possible.

Second, it is possible that some attacks on schools have not been reported, e.g. if local organizations were able to quickly raise the alarm and stop the attack and/or if local leaders were able to quickly negotiate a resolution of the situation with Boko Haram. Such attacks may not be included in the ACLED data.

Third, we predict the risk of attack on individual schools. Our prior book [27] predicts when an attack will occur on a given type of target, such as security installations, transportation targets, and, of course, schools. This paper, in contrast, focuses on predicting the risk of attack on specific targets, i.e., specific schools, rather than on schools in general. Ideally, we would like to predict both the time and the location of a school attack simultaneously. However, simultaneous prediction at such fine granularity is challenging. An important direction for future work would be to extend the results here to predict the approximate time frames of attacks on specific schools. For now, this paper estimates which schools are at risk, and the methods in [27] could serve as a starting point for predicting when attacks might happen.

Fourth, Boko Haram is constantly evolving its tactics. This requires that the data be updated periodically and the models retrained. Fortunately, updating the data (e.g., on a monthly or quarterly basis) is not challenging, because the data sources used in this paper are typically updated with a delay of only a few weeks. This allows the risk maps to be updated continually.

Fifth, machine learning algorithms can overfit the data. One reason we conducted both a rigorous statistical analysis and a machine learning analysis was to see whether the two sets of techniques yield similar results. Our most significant finding is that the distance from a school to the nearest security installation is linked to the risk of an attack on that school by Boko Haram. This finding is validated by both sets of techniques, so we are confident in it.

Sixth, there is the question of explainability. We are able to explain many findings quite well, but some of the decision tree findings are harder to explain. For instance, in Hypothesis 5, the condition used to define the sets S and S̄ depends on the distance from the school to both the nearest and the 4th nearest security installation. This may look a little odd. We can think of it as expressing a range. The nearest security installation should be very close (less than 0.6 km in the case of Hypothesis 5), and at least two other security installations (i.e., the 2nd and 3rd closest ones) should lie within 35.431 km. This suggests that the exact locations of the 2nd and 3rd closest security installations do not matter as long as they are within the desired distance.
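The k-th-nearest distances and the radius counts used in these conditions are simple to compute. The minimal Python sketch below uses the haversine great-circle distance (see [60]) on a spherical Earth with radius 6371 km. It expresses a Hypothesis 5-style condition, assuming the form "nearest < 0.6 km and 4th nearest ≤ 35.431 km" that matches the reading in the paragraph above. The function names and coordinates are hypothetical. Because the distances are sorted, the 4th-nearest bound automatically places the 2nd and 3rd nearest installations within the same distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def kth_nearest_km(school, installations, k):
    """Distance from `school` to its k-th nearest installation (k = 1 is the nearest)."""
    return sorted(haversine_km(school, s) for s in installations)[k - 1]

def attacks_within_km(school, attacks, radius_km):
    """Number of attack locations within `radius_km` of `school`."""
    return sum(1 for a in attacks if haversine_km(school, a) <= radius_km)

def hyp5_style_condition(school, installations, near_km=0.6, fourth_km=35.431):
    """Hypothetical reading of the Hypothesis 5 range: a very close nearest
    installation and a 4th nearest within `fourth_km` (hence 2nd and 3rd too)."""
    return (kth_nearest_km(school, installations, 1) < near_km
            and kth_nearest_km(school, installations, 4) <= fourth_km)

# Hypothetical school and installations due north of it (about 0.33, 11, 22, 33 km away).
school = (11.80, 13.10)
installations = [(11.803, 13.10), (11.90, 13.10), (12.00, 13.10), (12.10, 13.10)]
```

Under these assumptions, `hyp5_style_condition(school, installations)` holds. Moving the farthest installation to about 44 km away would violate the 4th-nearest bound.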

Finally, there is the question of ethics. All the data used in this study were ethically sourced, and no human data or personally identifying information (PII) whatsoever is used. The unit of study, school attacks, is at the school level, not at the level of individuals. Our results show that the absence of security installations close to schools is linked to an increased probability of attacks by Boko Haram on schools, which suggests increasing security presence near schools. This finding has been validated both statistically and through the ablation tests applied to our machine learning models, so we consider it robust. Please note that we are not suggesting any security presence inside schools, only within a few kilometers of schools, which does not seem overly intrusive.

Conclusion and recommendation

Since the start of the Boko Haram insurgency about 15 years ago, the group has carried out numerous deadly attacks, including assassinations, attacks on markets, attacks on government buildings and security installations, and even an attack on the UN headquarters in Abuja. Despite numerous efforts to deter these attacks, the group remains active as of March 2024.

Protecting children is a strong imperative for society. The impact of school attacks on the boys and girls who are kidnapped is horrific. Those who are abducted suffer the horrors of domestic servitude, sexual slavery, torture, enrolment as suicide bombers and child soldiers, and other forms of abuse. Even those who escape these abductions are left with a deep reluctance to return to school for fear of kidnapping. Without a quality education, they may be doomed to a life of poverty.

In this work, we present a data-driven investigation of Boko Haram’s attacks on schools that combines rigorous statistical profiling with machine learning analytics. Notably, we assemble a novel dataset spanning almost 14 years (available as part of our Supplementary Material). Using these data, we apply both statistical inference and machine learning to characterize Boko Haram’s school attacks in several novel respects.

First, we develop a set of univariate hypotheses based on a triad of factors. For a school to be vulnerable, (i) there must be Boko Haram Activity in the vicinity of the school, (ii) Security Presence in the area around the school must be weak, and (iii) the Socioeconomic conditions prevalent in the vicinity of the school must be poor. Our statistical analyses validate the first two of these hypotheses, but our findings on the third are less straightforward. Wealthy areas are targeted (perhaps to obtain ransom payments), but so are poor areas (perhaps to capture sex slaves and child soldiers).

Second, we apply advanced machine learning algorithms to predict which schools are at risk of attack by Boko Haram. Our algorithms perform well. If a prediction is counted as correct when a school within 2 km of the considered school is attacked, both precision and recall exceed 90%.
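Precision and recall with a spatial tolerance can be computed as in the minimal Python sketch below. It assumes one possible reading of the 2 km criterion. A flagged school is a correct prediction if any attacked school lies within 2 km of it. An attacked school is recalled if any flagged school lies within 2 km of it. The function names and coordinates are hypothetical, and the exact matching rule behind the reported figures may differ.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points in degrees (sphere, R = 6371 km)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def has_match(point, others, tol_km):
    """True if any point in `others` lies within `tol_km` of `point`."""
    return any(haversine_km(point, o) <= tol_km for o in others)

def tolerant_precision_recall(flagged, attacked, tol_km=2.0):
    """Precision and recall in which a match within `tol_km` counts as a hit."""
    precision = (sum(has_match(f, attacked, tol_km) for f in flagged) / len(flagged)
                 if flagged else 0.0)
    recall = (sum(has_match(a, flagged, tol_km) for a in attacked) / len(attacked)
              if attacked else 0.0)
    return precision, recall

# Hypothetical coordinates: the second flagged school is about 1.1 km from an attacked one.
flagged = [(10.0, 12.0), (10.01, 12.0), (11.0, 12.0)]
attacked = [(10.0, 12.0), (12.0, 12.0)]
```

Here two of the three flagged schools have an attacked school within 2 km, and one of the two attacked schools has a flagged school within 2 km. Tightening `tol_km` below about 1.1 km drops the second flagged school from the hits.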

Third, through ablation tests, we find that the distance to nearby security installations is the single most important risk factor for a school attack. In particular, this finding suggests that increasing security presence around vulnerable schools will likely have a deterrent effect. Fourth, our machine learning models motivate the design of seven hypotheses that can be tested statistically. We formally articulate and validate these hypotheses. They link a diverse set of independent variables relating to the activity of Boko Haram, security presence, and the socioeconomic conditions of a given region in Nigeria.
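An ablation test removes one feature at a time, retrains the model, and records how much performance drops. The minimal Python sketch below illustrates the idea. A nearest-centroid classifier stands in for AdaBoost so that the example runs with the standard library alone, and the feature names and toy records are hypothetical. A large drop when a feature is removed marks that feature as important.

```python
from statistics import mean

def nearest_centroid_accuracy(train, test, features, label="attacked"):
    """Fit a nearest-centroid classifier on `features`; return accuracy on `test`."""
    centroids = {}
    for cls in sorted({row[label] for row in train}):
        rows = [r for r in train if r[label] == cls]
        centroids[cls] = {f: mean(r[f] for r in rows) for f in features}

    def predict(row):
        # Squared Euclidean distance to each class centroid over the kept features;
        # ties go to the first class in sorted order.
        return min(centroids, key=lambda c: sum((row[f] - centroids[c][f]) ** 2
                                                for f in features))

    return sum(predict(r) == r[label] for r in test) / len(test)

def ablation(train, test, features):
    """Return baseline accuracy and the accuracy drop when each feature is removed."""
    baseline = nearest_centroid_accuracy(train, test, features)
    drops = {}
    for f in features:
        kept = [g for g in features if g != f]
        drops[f] = baseline - nearest_centroid_accuracy(train, test, kept)
    return baseline, drops

# Toy records: distance to the nearest security installation separates the classes; "noise" does not.
train = [
    {"dist_km": 1, "noise": 2, "attacked": False},
    {"dist_km": 2, "noise": 8, "attacked": False},
    {"dist_km": 3, "noise": 5, "attacked": False},
    {"dist_km": 25, "noise": 3, "attacked": True},
    {"dist_km": 30, "noise": 7, "attacked": True},
    {"dist_km": 20, "noise": 5, "attacked": True},
]
test = [
    {"dist_km": 4, "noise": 9, "attacked": False},
    {"dist_km": 1.5, "noise": 1, "attacked": False},
    {"dist_km": 22, "noise": 2, "attacked": True},
    {"dist_km": 28, "noise": 9, "attacked": True},
]
```

Calling `ablation(train, test, ["dist_km", "noise"])` on this toy data shows accuracy falling to chance only when `dist_km` is removed.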

Finally, a major contribution of our work is the geospatial mapping of school vulnerability (Figs 1 and 2). Fig 1 displays the risk of school attacks for each and every school in Nigeria, while Fig 2 focuses on parts of Northeast Nigeria and shows the risk of school attacks there.

Recommendation. A major recommendation of this paper, for both international donors and the Nigerian government, is a commitment to reducing the distance between vulnerable schools and nearby security installations. Our findings show clearly that the presence of security installations near schools is the single biggest factor in whether schools are attacked by Boko Haram. This needs to be an urgent priority for the Nigerian government.

International donors and Nigeria’s security partners should also take note. For example, a January 2024 U.S. State Department report [70] mentions millions of dollars of aid to Nigeria and hundreds of millions of dollars of weapons sales. As argued in a Foreign Policy article [71], such weapons sales may not solve Nigeria’s security problems. We recommend that at least some of this financial aid focus, over time, on creating police stations near high-risk schools such as those shown in Figs 1 and 2.

Because of widespread allegations of abuse by Nigerian security forces, there is also a need for improved training and oversight of such forces.

The results generated using our best-performing algorithm, AdaBoost, show that the single biggest factor involved in school attacks is the absence of security installations near many vulnerable schools. The results therefore suggest that reducing the number of school attacks by Boko Haram requires a greater security presence in Nigeria, both in the Northeast of the country and in some Northern states such as Kano State. Foreign aid and domestic investments intended to strengthen security in these regions must be accompanied by detailed analyses of exactly where such security installations should be located. The aim is to maximize the protection that these schools and schoolchildren deserve. Specifically, the results in this paper provide a risk score for each school. These scores can be used, perhaps in conjunction with facility location algorithms [72], to identify the best locations for such security installations. The creation of these new security installations must be accompanied by highly visible patrols whose routes vary daily so that a deterrent effect can be achieved. Until such interventions are carried out with the necessary force, a stable situation will be hard to reach.
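Per-school risk scores could feed a facility location method in several ways. The minimal Python sketch below shows one of them: it greedily picks k new sites from a candidate list so as to minimize the risk-weighted distance from each school to its nearest installation. This is a greedy heuristic for the weighted k-median problem. It is not the method of [72] or of this paper. Planar (x, y) coordinates in km, the candidate sites, the risk values, and the function names are all hypothetical simplifications.

```python
import math

def risk_weighted_cost(schools, sites):
    """Sum of risk * distance (km) from each school to its nearest site.

    `schools` is a list of ((x, y), risk) pairs; `sites` is a non-empty list of (x, y).
    """
    return sum(risk * min(math.dist(xy, s) for s in sites) for xy, risk in schools)

def greedy_new_sites(schools, candidates, existing, k):
    """Greedily add up to k candidate sites, each time choosing the one that most
    reduces the risk-weighted cost given the installations already in place."""
    chosen = []
    for _ in range(k):
        remaining = [c for c in candidates if c not in chosen]
        if not remaining:
            break
        best = min(remaining,
                   key=lambda c: risk_weighted_cost(schools, existing + chosen + [c]))
        chosen.append(best)
    return chosen

# Hypothetical layout: two high-risk schools far from the only existing installation.
schools = [((10, 0), 0.9), ((10, 1), 0.8), ((0, 10), 0.1)]
existing = [(0, 0)]
candidates = [(10, 0), (0, 10), (5, 5)]
```

With one new site, the greedy step chooses the candidate next to the two high-risk schools. A second site then goes to the remaining low-risk school. Greedy selection is not guaranteed to be optimal, and exact or approximation algorithms from the facility location literature could replace it.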

Supporting information

S1 Appendix A. Independent features calculated for every school record and their description.

(PDF)

S2 Appendix B. Performance Metrics of Various Classifiers Across Different Dataset Variations.

(PDF)

S3 Appendix C. Decision Tree Graphs.

(PDF)

S4 Appendix D. Ward’s Wealth and Residential Area Type.

(PDF)

S5 Appendix E. Supplementary Tables for the Forest Plots.

(PDF)

S6 Appendix F. Overpass API Queries for Military Installations.

(PDF)

S7 Appendix G. Additional Metrics for ML-Inspired Statistical Analyses.

(PDF)

S8 Appendix H. AdaBoost Confusion Matrices.

(PDF)

S9 Appendix I. Time-Series Analysis.

(PDF)

S10 Appendix J. Description of Detailed Statistical Methods.

(PDF)

S11 Appendix K. Additional Decision Tree-Inspired Statistical Analysis.

(PDF)

S1 Data. Hypotheses Test Data.

(ZIP)

S2 Data. ML Data.

(ZIP)


Acknowledgments

We thank Dan Byman, Aaron Mannes, Chiara Pulice, and Marco Postiglione for detailed comments on earlier versions of this manuscript.

Data Availability

All relevant data are within the paper and its Supporting Information files.

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1. Amnesty International. Nigeria: Nine years after Chibok girls’ abducted, authorities failing to protect children. 2023.
  • 2. Wikipedia. Dapchi schoolgirls kidnapping. [cited 2024 Mar 30]. Available from: https://en.wikipedia.org/w/index.php?title=Dapchi_schoolgirls_kidnapping&oldid=1210701087.
  • 3. Al Jazeera. ‘Bring back our boys’: Northern Nigerians protest kidnappings. Boko Haram News. 2020.
  • 4. Carter S, Reals T. Witnesses in Nigeria say hundreds of children kidnapped in second mass-abduction in less than a week. CBS News. 2024.
  • 5. Carter S. Nigeria media report mass abduction of girls by Boko Haram or other Islamic militants near northern border. 2024.
  • 6. Armed Conflict Location & Event Data Project (ACLED). Available from: https://acleddata.com/about-acled/.
  • 7. UNHCR Global Appeal 2018-2019 - full report; 2018.
  • 8. Felbab-Brown V. In Nigeria, we don’t want them back: Amnesty, Defectors’ Programs, leniency measures, informal reconciliation, and punitive responses to Boko Haram; 2018.
  • 9. National Counterterrorism Center Terrorist Groups — Boko Haram [Internet]. [cited 2025 Feb 1]. Available from: https://www.dni.gov/nctc/terroristgroups/bokoharam.html.
  • 10. Cohen L, Felson M. Change and crime rate trends: a routine activity approach. Am Sociolog Rev. 1979; 44(4):588–608.
  • 11. Phillips PD. Characteristics and typology of the journey to crime. In: Crime: A spatial perspective. p. 167–180. Columbia University Press; 1980.
  • 12. Laukkanen M, Santtila P, Jern P, Sandnabba K. Predicting offender home location in urban burglary series. Forensic Sci Int. 2008; 176(2–3):224–35. doi: 10.1016/j.forsciint.2007.09.011
  • 13. Shakarian P, Subrahmanian V, Sapino M. GAPs: Geospatial abduction problems. ACM Trans Intell Syst Technol. 2011; 3(1):1–27.
  • 14. Braga A, Papachristos A, Hureau D. The effects of hot spots policing on crime: An updated systematic review and meta-analysis. Justice Q. 2014; 31(4):633–63.
  • 15. Nagin DS. Deterrence in the twenty-first century. Crime and Justice. 2013; 42(1):199–263.
  • 16. Loughran TA, Paternoster R, Piquero AR, Pogarsky G. On ambiguity in perceptions of risk: Implications for criminal decision making and deterrence. Criminology. 2011; 49(4):1029–61. doi: 10.1111/j.1745-9125.2011.00251.x
  • 17. Blesse S, Diegmann A. The place-based effects of police stations on crime: Evidence from station closures. J Public Econ. 2022; 207:104605. doi: 10.1016/j.jpubeco.2022.104605
  • 18. Fondevila G, Vilalta-Perdomo C, Galindo Pérez MC, Cafferata FG. Crime deterrent effect of police stations. Appl Geography. 2021; 134:102518. doi: 10.1016/j.apgeog.2021.102518
  • 19. Ajala OA, Owabumoye BR. Influence of police stations’ location on crime incidence in developing countries like Nigeria. IJSSS. 2018; 10(2):132. doi: 10.1504/ijsss.2018.092537
  • 20. Bursik RJ Jr. Social disorganization and theories of crime and delinquency: Problems and prospects. Criminology. 1988; 26(4):519–52. doi: 10.1111/j.1745-9125.1988.tb00854.x
  • 21. Bhorat H, Lilenstein A, Monnakgotla J, Thornton A, Van der Zee K. The socioeconomic determinants of crime in South Africa: An empirical assessment; 2017.
  • 22. Brennan-Galvin E. Crime and violence in an urbanizing world. J Int Affairs. 2002: 123–145.
  • 23. Safavian SR, Landgrebe D. A survey of decision tree classifier methodology. IEEE Trans Syst Man Cybern. 1991; 21(3):660–74. doi: 10.1109/21.97458
  • 24. Kotsiantis SB. Decision trees: A recent overview. Artif Intell Rev. 2011; 39(4):261–83. doi: 10.1007/s10462-011-9272-4
  • 25. Backstrom J. Boko Haram: The history of an African Jihadist movement. Democracy Secur. 2019; 15(1):107–10. doi: 10.1080/17419166.2019.1578057
  • 26. Kendhammer B, McCain C. Boko Haram. Ohio University Press; 2018. Available from: https://www.bibliovault.org/BV.book.epl?ISBN=9780821423516.
  • 27. Subrahmanian V, Pulice C, Brown J, Bonen-Clark J. A machine learning based model of Boko Haram. 2021; 1:1–100. doi: 10.1000/xyz123
  • 28. Walker A. What is Boko Haram?; 2012.
  • 29. World Economic Outlook (October 2023) - GDP per capita, current prices; 2023.
  • 30. Campbell J. Why Nigeria’s North South Distinction Is Important; 2011.
  • 31. Ayinde I, Otekunrin O, Akinbode S, Otekunrin O. Food security in Nigeria: Impetus for growth and development. J Agric Econ. 2020; 6:808–20.
  • 32. Kayode A, Arome A, Silas A. The rising rate of unemployment in Nigeria: the socioeconomic and political implications. Global Business Econ Res J. 2014; 3(1).
  • 33. Aiyedogbon J, Ohwofasa B. Poverty and youth unemployment in Nigeria, 1987–2011. Int J Business Soc Sci. 2012; 3(20).
  • 34. Nwozor A, Olowojolu K, Adedire S, Iseolorunkanmi J. How has political sharia fared in Nigeria? Peace Rev. 2021; 33(1):115–23.
  • 35. Kerins PM, Mouaha-Bell SV. Boko Haram’s Rise and the Multinational Response. PhD thesis, Monterey, CA; Naval Postgraduate School, 2018.
  • 36. Carter C, Olson C. #BringBackOurGirls: Digital communities supporting real-world change and influencing mainstream media agendas. Feminist Media Studies. 2016; 16(5):772–87.
  • 37. Ofori-Parku SS, Moscato D. Hashtag activism as a form of political action: A qualitative analysis of the #BringBackOurGirls Campaign in Nigerian, UK, and US Press. Int J Commun. 2018; 12:23.
  • 38. Galehan J. Instruments of violence: Female suicide bombers of Boko Haram. Int J Law Crime Justice. 2019; 58:113–23. doi: 10.1016/j.ijlcj.2019.04.001
  • 39. Markovic V. Suicide squad: Boko Haram’s use of the female suicide bomber. Women & Criminal Justice. 2019; 29(4–5):283–302. doi: 10.1080/08974454.2019.1629153
  • 40. Nigeria Boko Haram: Militants ‘technically defeated’ Buhari; 2015.
  • 41. Council on Foreign Relations. Boko Haram is back in the media spotlight, but it was never really gone; 2019.
  • 42. Reuters. Voice of America. Boko Haram Suspected in Attacks That Kill at Least 40 in Nigeria, Police Say. 2023 [cited 2024 Mar 30]. Available from: https://www.voanews.com/a/boko-haram-suspected-in-attacks-that-kill-at-least-40-in-nigeria-police-say/7337624.html.
  • 43. Suspected insurgents kill 14 in northeast Nigeria, residents say. Reuters. 2024 Jan 5 [cited 2024 Mar 30]. Available from: https://www.reuters.com/world/africa/suspected-insurgents-kill-14-northeast-nigeria-residents-say-2024-01-05/.
  • 44. Zenn J, Pearson E. Women, gender and the evolving tactics of Boko Haram. J Terrorism Res. 2014.
  • 45. Abdu A, Shehu S. The implication of Boko Haram insurgency on women and girls in North east Nigeria. J Public Adm Soc Welfare Res. 2019; 4(1):9–21.
  • 46. Okolie-Osemene J, Okolie-Osemene RI. Nigerian women and the trends of kidnapping in the era of Boko Haram insurgency: Patterns and evolution. Small Wars Insurgencies. 2019; 30(6–7):1151–68. doi: 10.1080/09592318.2019.1652011
  • 47. Yakubu M. Child insurgents in West Africa: The Boko Haram example in Nigeria, Chad and Cameroon. African J Governance Dev. 2016; 5(2):34–49.
  • 48. Amusan L, Ejoke UP. The psychological trauma inflicted by Boko Haram insurgency in the North Eastern Nigeria. Aggression Violent Behav. 2017; 36:52–9. doi: 10.1016/j.avb.2017.07.001
  • 49. Adam Yakasai B, Ayinla H, Bashir Yakasai H. Psychological impact of kidnapping on mental health and well-being of abductees: A study of abducted school children in Kaduna State, Nigeria. Act Sci Wom Hea. 2022:08–19. doi: 10.31080/aswh.2022.04.0443
  • 50. Omotuyi S. Operation safe corridor: The missing components in Nigeria’s deradicalisation programme as an effective counterterrorism strategy in northeast. African J Terrorism Insurgency Res. 2022; 3(1):97.
  • 51. Sarfati A, Donnelly P. Protection dilemmas arising from the reintegration of former combatants and the impact of the terrorist designation; 2022.
  • 52. GRID3 NGA - Data Resources [Internet]. [cited 2024 Mar 30]. Available from: https://data.grid3.org/maps/381d1249defa4986b8fe1205c13f5999/explore.
  • 53. OpenStreetMap contributors. Planet dump retrieved from https://planet.osm.org [Internet]. [cited 2025 Feb 1]. Available from: https://www.openstreetmap.org.
  • 54. GRID3 NGA - Other POI: Police Stations [Internet]. [cited 2025 Jan 21]. Available from: https://data.grid3.org/datasets/GRID3::grid3-nga-other-poi-police-stations/about.
  • 55. Overpass Turbo [Internet]. [cited 2024 Mar 30]. Available from: http://overpass-turbo.eu/index.html.
  • 56. GRID3 NGA - Communications Access Risk Score per State [Internet]. [cited 2024 Mar 30]. Available from: https://cdn.arcgis.com/home/item.html?id=f7a7eaf2100549d1818ad3c42f983dd2.
  • 57. GRID3 NGA - Risk Index: Exposure per Ward [Internet]. [cited 2024 Mar 30]. Available from: https://www.arcgis.com/home/item.html?id=8b3317d1b395405a8374bbf9843f9aa0.
  • 58. GRID3 NGA - Socioeconomic Vulnerability Risk Score per Ward [Internet]. [cited 2024 Mar 30]. Available from: https://m.arcgis.com/home/item.html?id=288136ef843e453fb90d7bc84261fa10.
  • 59. Nearest neighbor analysis with large datasets [Internet]. [cited 2024 Mar 30]. Available from: https://autogis-site.readthedocs.io/en/2019/notebooks/L3/nearest-neighbor-faster.html.
  • 60. Maria E, Budiman E, , Taruk M. Measure distance locating nearest public facilities using Haversine and Euclidean Methods. J Phys: Conf Ser. 2020; 1450(1):012080. doi: 10.1088/1742-6596/1450/1/012080
  • 61. Zheng X, Han J, Sun A. A survey of location prediction on Twitter. IEEE Trans Knowl Data Eng. 2018; 30(9):1652–71. doi: 10.1109/tkde.2018.2807840
  • 62. Mishra P, Singh U, Pandey CM, Mishra P, Pandey G. Application of student’s t-test, analysis of variance, and covariance. Ann Card Anaesth. 2019; 22(4):407–11. doi: 10.4103/aca.ACA_94_19
  • 63. Benjamini Y. Discovering the false discovery rate. J R Stat Soc Ser B: Stat Methodol. 2010; 72(4):405–16. doi: 10.1111/j.1467-9868.2010.00746.x
  • 64. Adisa W. Transnational organized crime, terrorist financing and Boko Haram insurgency in Nigeria. J Terrorism Stud.; 3(1):1.
  • 65. Simeon OA. Kidnapping economy and increasing insecurity: Rethinking Nigeria security effectiveness. Int J Soc Sci Manag Rev. 2024; 4(4).
  • 66. Burke J. Nigeria denies paying ransom and freeing Boko Haram leaders for Chibok girls. The Guardian; 2016.
  • 67. Hinkkainen K, Pickering S. Strategic risk of terrorist targets in urban vs. rural locations. 2013; 1:379–99.
  • 68. US Census Bureau. Census urban and rural classification and urban area criteria; 2010.
  • 69. Yates F. Contingency tables involving small numbers and the χ² test. J R Stat Soc Ser B: Stat Methodol. 1934; 1(2):217–35. doi: 10.2307/2983604
  • 70. U.S. Security Cooperation with Nigeria [Internet]. [cited 2024 Mar 13]. Available from: https://www.state.gov/u-s-security-cooperation-with-nigeria/.
  • 71. Kwelum C, Obasanjo I. More weapons won’t solve Nigeria’s security crisis. Foreign Policy. 2022.
  • 72. Celik Turkoglu D, Erol Genevois M. A comparative survey of service facility location problems. Ann Oper Res. 2019; 292(1):399–468. doi: 10.1007/s10479-019-03385-x

Decision Letter 0

Steve Zimmerman

20 Dec 2024

PONE-D-24-14163: A Quantitative Geospatial Analysis of the Risk that Boko Haram Will Target a School (PLOS ONE)

Dear Dr. Subrahmanian,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please accept our apologies for the delay in issuing an editorial decision. Unfortunately, we had to search for a new Academic Editor several times after you submitted your manuscript.

The manuscript has now been evaluated by six reviewers, and their comments are available below. This is a large number of reviewers. However, we have had difficulty securing reviewers with the relevant expertise to assess the topic and methods of your study. Although all the reviewers have brought their own perspectives to bear, and all offer constructive criticism, please pay particular attention to the comments made by reviewers 2 and 5. Could you please revise the manuscript to carefully address the concerns raised?

Please submit your revised manuscript by Feb 02 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Steve Zimmerman, PhD

Senior Editor, PLOS One

Journal Requirements:

1. When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. We note that Figures 1,2 and 5 in your submission contain [map/satellite] images which may be copyrighted. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For these reasons, we cannot publish previously copyrighted maps or satellite images created using proprietary data, such as Google software (Google Maps, Street View, and Earth). For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission:

1. You may seek permission from the original copyright holder of Figures 1,2 and 5 to publish the content specifically under the CC BY 4.0 license.

We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission.

In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

2. If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

The following resources for replacing copyrighted map figures may be helpful:

USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/

The Gateway to Astronaut Photography of Earth (public domain): http://eol.jsc.nasa.gov/sseop/clickmap/

Maps at the CIA (public domain): https://www.cia.gov/library/publications/the-world-factbook/index.html and https://www.cia.gov/library/publications/cia-maps-publications/index.html

NASA Earth Observatory (public domain): http://earthobservatory.nasa.gov/

Landsat: http://landsat.visibleearth.nasa.gov/

USGS EROS (Earth Resources Observatory and Science (EROS) Center) (public domain): http://eros.usgs.gov/#

Natural Earth (public domain): http://www.naturalearthdata.com/


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: Partly

Reviewer #6: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: Yes

Reviewer #6: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: Yes

Reviewer #6: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: No

Reviewer #6: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The article is interesting and could be used for public awareness. The article needs further improvement. My comments are:

1. Citation is missing in 'A similar claim was repeated by Nigeria’s Information Minister in October 2019()'.

2. Give a predictive AI-based solution for the problem, mentioning how long it might take to reach a stable situation.

Reviewer #2: In the manuscript, utilizing a geospatially tagged data set, the coauthors identified factors affecting attacks on schools, implemented machine learning models to quantify the likelihood that a school will be at risk of a Boko Haram attack and finally concluded with a policy recommendation. The work presented is dedicated to an interesting and important topic. My concerns are as follows:

1. The order of Figures 3a and 3b is wrong. Also, in the caption “2023 numbers only run …”, is “2023” redundant?

2. In the section Socio-Economic Data, there is no clear definition of “risk score”. How is it derived from the data? Is it valid?

3. Are Figures 6 and 8 switched? Figures 6, 7, and 8 have low resolution.

4. In the sentence below Table 1, does Table 1B summarize the non-urban schools or the urban schools?

5. In the machine learning-based analysis section, it seems that the models were fitted only on the training set, with no validation or test set. This may lead to overfitting, and prediction accuracy on the training set is not a valid measure of how well the models predict future events.

6. In line 446, “that” appears twice.

7. Figure 9 shows results for k = 1. How do these results connect with those in Table 3, where the models for k = 1 gave the worst performance? A brief discussion would be welcome.

8. In the section “Multivariate Machine Learning Inspired Statistical Inferences of Boko Haram School Attacks”, using odds ratios alone to assess the hypotheses is not sufficiently justified. Other measures, such as sensitivity, specificity, PPV, NPV, accuracy, error rate, …, could be provided in addition to the OR.

9. Figures 1 and 2 display the risk of school attacks. How were these risks estimated? Which method was used?

Reviewer #3: The research article presents a quantitative geospatial analysis of the risk factors associated with Boko Haram's targeting of schools in Nigeria over nearly 14 years, utilizing statistical inference and machine learning techniques. It identifies three main vulnerability factors: proximity to Boko Haram activity, weak security presence, and socioeconomic conditions. The study finds that both wealthy and poor areas are targeted for different reasons—wealthy areas for ransom and poor areas for capturing sex slaves and child soldiers. Advanced machine learning models, particularly the AdaBoost classifier, achieved an F1 score of 0.85 in predicting attacks within 2km of schools, with key predictive features including distance to security installations and previous attacks. The findings underscore the critical role of security presence in deterring attacks and highlight the complex interplay between socioeconomic factors and risk. Overall, this research provides valuable insights for policymakers and security forces, suggesting that enhanced protection strategies could be developed based on the identified risk factors and predictive models. However, there are some concerns, listed below:

1. The authors should try to incorporate time series analysis or recurrent neural networks to better capture the temporal dynamics of conflict evolution over time.

2. The authors should remove the words “first ever” from the abstract and conclusion.

3. Clarify the rationale behind choosing specific machine learning algorithms.

4. Provide more details on data preprocessing and feature selection methods.

5. Explain how the 2km threshold for predictions was determined.

6. All the tables should include axis labels.

7. It would be helpful to provide visualizations (e.g., ROC curves, confusion matrices) to illustrate model performance.

8. Address potential biases in the dataset and discuss their implications.

9. Provide a separate methods section on statistical methods and analysis.

Reviewer #4: More details about the data sources, particularly the socioeconomic data and the locations of security installations, would improve the manuscript. Reproducibility depends on transparency about data reliability and potential biases in the data sources.

Describe any imputation methods that were employed and explain how missing data were handled. It is also necessary to address the handling of outliers, especially in the socioeconomic and activity factors.

The manuscript would benefit from a brief description of how insights from the AdaBoost model and decision trees could be applied in a practical setting, even if the research highlights the significance of features in prediction.

By employing quantitative techniques to examine the risk factors associated with Boko Haram's attacks on schools, this paper tackles a pressing and important subject. It makes a significant contribution to the fields of geospatial analysis and counterterrorism studies.

Despite their strength, machine learning models have drawbacks. Discuss possible drawbacks, such as the model's tendency to overfit historical data, particularly in light of Boko Haram's changing strategies, and how well it might generalize to future attacks or to other regions.

Since predictive models can impact real-world security decisions, address any ethical considerations related to this research, such as the potential consequences of model inaccuracies.

Reviewer #5: The paper “A Quantitative Geospatial Analysis of the Risk that Boko Haram Will Target a School”, submitted to PLOSOne, makes a number of interesting contributions that are empirically, theoretically, and practically relevant to the study of conflict and international development. The paper develops a one-of-a-kind dataset on Boko Haram (BH) attacks with substantial contextual information. It is a step up from the widely used ACLED. It then develops a set of spatial measures to examine hypotheses about mechanisms that may be involved in BH attacks. There is reasonable observational evidence that the level of general BH activity in the area and proximity to security (e.g., police or military facilities) are associated with the likelihood of attack on a school. If the patterns of association have a causal underpinning, then the authors point to useful policy recommendations.

Overall, the paper may be acceptable for publication in PLOSOne with substantial revisions. I offer the following suggestions in the hope that they help the authors revise the paper for resubmission.

The general points to consider are:

1. The paper could use careful editing for language.

2. The story gets lost in a relatively large number of tests and separate analyses that are not fully described. The authors might consider slimming down the manuscript, concentrate on solid narrative development, and move “nice to have but not essential” testing to an integrated Supplementary file.

3. The main measure is a bit confusing. I have a hard time visualizing what it means to measure school attacks within 10 km against BH activity within 50 km. What if there is more than one school within that 10 km radius and one of them is attacked? I think a visualization of the method would go a long way to better supporting the findings.

The remainder of my comments are in order of presentation and mix both larger issues with some smaller ones.

1. Pg. 2, lines 16-17, is a very powerful statement. It would punch even harder if it ended with something like "...all of this impact arises from a group with an estimated N number of members/fighters." Bring home the point that asymmetry can still produce huge harm.

2. Pg. 2, lines 43-53. This is small, but I would hang the activity hypothesis first on "routine activities theory" with a secondary pointer to "journey-to-crime". The basic idea is that areas where BH are operating have established some type of (operational) routine. Targets that fall near or within that routine are easier to accommodate than ones that are outside of the routine.

@article{cohen1979rat,
author = {Cohen, L. E. and Felson, M.},
title = {Social change and crime rate trends: A routine activity approach},
journal = {Am Sociol Rev},
volume = {44},
pages = {588-608},
DOI = {10.2307/2094589},
year = {1979},
type = {Journal Article}
}

3. Pg. 3, lines 61-62. If you want an explanation for "why" this might be, you could argue that the police stations serve as a deterrent where BH fighters perceive a greater likelihood of getting caught if operating near those spaces.

@article{RN5221,
author = {Nagin, Daniel S.},
title = {Deterrence in the Twenty-First Century},
journal = {Crime and Justice},
volume = {42},
pages = {199-263},
DOI = {10.1086/670398},
year = {2013},
type = {Journal Article}
}

@article{RN2912,
author = {Loughran, Thomas A. and Paternoster, Raymond and Piquero, Alex R. and Pogarsky, Greg},
title = {On Ambiguity in Perceptions of Risk: Implications for Criminal Decision Making and Deterrence},
journal = {Criminology},
volume = {49},
number = {4},
pages = {1029-1061},
DOI = {10.1111/j.1745-9125.2011.00251.x},
year = {2011},
type = {Journal Article}
}

4. Pg. 3, lines 65-66. This probably directly relates to so-called social disorganization theory in criminology. A key citation is (though there are thousands of others too):

@article{RN3950,
author = {Bursik Jr., Robert J.},
title = {Social Disorganization and Theories of Crime and Delinquency: Problems and Prospects},
journal = {Criminology},
volume = {26},
number = {4},
pages = {519-552},
DOI = {10.1111/j.1745-9125.1988.tb00854.x},
year = {1988},
type = {Journal Article}
}

5. Pg. 3, line 68. Hypothesis 4 is not particularly clear to me. I think the authors are anticipating their results.

6. Pg. 6, lines 201-202. There are a few years with no recorded attacks. Is this because there were no attacks, or because of other potential issues with data acquisition?

7. Pg. 6, lines 210-211. When are schools on break? “While non-school attacks are more or less uniformly distributed over the 12 months of the year, school attacks occur less frequently in January, August, and December.”

8. Pp. 6-8. Overall, a tabular summary of the data might be a more efficient use of space.

9. Pg. 8, lines 276-277. The core measure used in the paper needs a better explanation. It is not clear whether this is a measure that the authors have developed themselves or one that has a source in the literature. There are a couple of ways to understand the statement “a school attacked happened within k ={1,2,3,5,10} km”. This could mean that there are, say, 10 students from school s. They weren't kidnapped directly from school s but at a location k kilometers away from the location of s. Or, there is a school s_i and a school s_j, 4.3 km apart. If school s_i is attacked, then school s_j is also considered attacked within the 5 km distance band of school s_i. What is the correct interpretation?

10. Pg. 8, lines 300-301 and Figure 6. The functional form of the points as you move from the k = 1 to the k = 10 panel is very regular, basically moving from exponential-like to linear. This suggests to me that something fundamental about the geometry of the measurement units is driving the pattern. It would be very useful to simulate a null model here to know how this measure behaves when there are no correlations in the data. For example, one could assume that BH events follow a 2D Poisson process, and that attacked and non-attacked schools and security installations are also Poisson-distributed points. Then investigate how the measures change as discs of radius k are compared.

11. Pg. 13, lines 463-466. Finding 5. It seems unlikely for such extreme events, but is it possible that there is reporting bias correlated with proximity to police stations? That is, attacks farther from formal law enforcement may rely on alternative means of resolution (such as families, clan groups, or local traditional leaders) and therefore are never reported to a formal source (news or police).

12. Pg. 14, lines 494-509 (and Table 4 on pg. 13). It is hard to intuit the rank order of the distance to the nth-closest security installation. It feels like there is some interaction between the closeness of security installations and the area size implied by k, but there’s no rhyme or reason to it. Sure, it is OK to basically say this variable has predictive power, but it would also be nice to have a better understanding of why, in a behavioral (and geographic) sense, certain orderings matter at one scale and not another. For example, why would the 3rd, 4th, and 5th closest be important at k = 5, but not the 1st and 2nd? Some explanation seems needed.

13. Pp. 13-14. The statistical testing associated with the decision tree analyses could use better explanation.

Reviewer #6: Overall, this is a very well-written manuscript that does a great job of spatially analyzing Boko Haram's attacks on schools. I have only two concerns about improving the manuscript.

First, the argument for hypothesis #3 is quite weak and the author(s) need to bolster the rationale for it.

The other addition I would suggest is for the author(s) to explicitly state what the unit of analysis is (i.e., the areal unit). Is it the school? The ward?

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Linchen He

Reviewer #3: No

Reviewer #4: No

Reviewer #5: No

Reviewer #6: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2025 Jun 17;20(6):e0320939. doi: 10.1371/journal.pone.0320939.r003

Author response to Decision Letter 1


5 Feb 2025

We would like to thank all the reviewers for their insightful comments. We found the comments extremely helpful in improving the framing of the main results, the analysis, presentation of the paper, and embedding it within the context of related work in criminology and sociology. Thanks so much! The Comment Number (X.Y) indicates that we are addressing the Y’th comment made by reviewer #X. Thanks again – and we hope these changes address your concerns!

REFEREE #1:

1.0. Thanks for your kind words.

1.1. Thank you for flagging this. It has been fixed.

1.2. Thanks for your kind words. We have included a paragraph on this in the Conclusion. Thanks for the suggestion.

REFEREE #2:

2.0. Thanks for the kind words.

2.1. Thank you for pointing this out! We have now fixed it.

2.2. Thank you for this comment. These “Risk Scores” were not created by us but were provided by the GRID3 Hub. We have updated the Socio-Economic data section to provide some more information about it. Thanks!

2.3. Figures 6 and 8 are in the correct order. We have updated the figures for better resolution. Thanks for the helpful comment.

2.4. Thank you for pointing this out. Table 1B summarizes urban schools. The typo has been fixed in the manuscript.

2.5. We should have explained this better in the initial version of the paper. All the ML-based analysis involves a standard 10-fold cross-validation analysis, with training/validation done on 9 folds and testing on the 10th, held-out fold. So, the reported performance is measured on held-out data rather than on the training set, which guards against overfitting. We have added a few sentences on this in the “Predictive Performance” paragraph of the “Machine Learning Based Analysis” section. We are glad you asked about this, as we should have made this clear early on. Please also see the paragraph in the new “Limitations” section (just before the Conclusion). Our prior book on Boko Haram shows how to predict WHEN different types of attacks will occur. This work focuses specifically on WHERE. Predicting both WHEN and WHERE at the same time is too difficult because of the paucity of data: 76 school attacks over 14 years (so about 5.4 school attacks per year on average).

2.6. Fixed, thank you.

2.7. We have added a paragraph at the beginning of the “Multivariate Machine Learning Inspired Statistical Inferences of Boko Haram School Attacks” section to make this connection. We have also significantly expanded the third paragraph of that section. Thanks for the suggestion.

2.8. Thanks for the excellent suggestion. We have now computed Positive Predictive Values (PPV), Negative Predictive Values (NPV), Sensitivity, and Specificity as additional evaluation metrics for the relevant analyses. As the reviewer kindly suggested, we believe these additional metrics may provide further insight into our machine learning models and statistical inference. We now include them as Supplemental Tables 14 and 15 in Appendix G.

2.9. The risks shown in Figures 1 and 2 are based on the probabilities returned by the best performing ML model, namely AdaBoost. We have updated the captions of Figures 1 and 2 to reflect this. Thanks for the suggestion.
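Response 2.8, like Reviewer #2's eighth comment, turns on summary statistics of a 2x2 table (condition or prediction positive vs. negative, school attacked vs. not). The sketch below is a minimal illustration, assuming such a table of counts, of how the odds ratio and the additional metrics (sensitivity, specificity, PPV, NPV, accuracy, error rate) relate to one another. The `TwoByTwo` class, the `summarize` function, and the counts in the usage note are hypothetical; they are not the authors' code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical 2x2 counts; 'positive' = condition met or model predicts an attack."""
    tp: int  # positive, attack occurred
    fp: int  # positive, no attack
    fn: int  # negative, attack occurred
    tn: int  # negative, no attack


def _ratio(num: float, den: float) -> float:
    # Return NaN instead of raising ZeroDivisionError when a margin is empty.
    return num / den if den else float("nan")


def summarize(t: TwoByTwo) -> dict:
    """Compute the odds ratio plus the confusion-matrix metrics named in comment 8."""
    total = t.tp + t.fp + t.fn + t.tn
    return {
        "sensitivity": _ratio(t.tp, t.tp + t.fn),  # true positive rate (recall)
        "specificity": _ratio(t.tn, t.tn + t.fp),  # true negative rate
        "ppv": _ratio(t.tp, t.tp + t.fp),          # positive predictive value (precision)
        "npv": _ratio(t.tn, t.tn + t.fn),          # negative predictive value
        "accuracy": _ratio(t.tp + t.tn, total),
        "error_rate": _ratio(t.fp + t.fn, total),
        # Odds ratio (tp*tn)/(fp*fn); NaN when fp or fn is zero.
        "odds_ratio": _ratio(t.tp * t.tn, t.fp * t.fn),
    }
```

For example, `summarize(TwoByTwo(tp=30, fp=70, fn=20, tn=880))` gives a sensitivity of 0.6, a PPV of 0.3, an accuracy of 0.91, and an odds ratio of about 18.9. With rare outcomes such as school attacks, high accuracy can coexist with a low PPV, which is one reason to report several of these metrics alongside the odds ratio.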

REFEREE #3:

3.0. Thanks so much for your kind and encouraging words.

3.1. Thank you for your comment. A time-series analysis was performed on the time-series version of the dataset using LSTM-based models. However, due to the huge imbalance in the dataset (only 76 of over 103,000 schools were attacked), the LSTM failed to capture the temporal dynamics well, overfitting in favor of the “No Conflicts” class. For more details on the performance metrics, please refer to Appendix I.

3.2. Thank you for your comment; it has been fixed in the manuscript.

3.3. We have added a paragraph at the beginning of the section on “Machine Learning-based Analysis”. Thanks for the suggestion!

3.4. Please note that all the features selected are based on theory from the social science literature. We have added a paragraph at the beginning of the section on “Machine Learning-based Analysis”. Thanks!

3.5. We wanted to investigate how predictions of whether a school is at risk of attack vary in quality as we expand the window of what someone might consider threatening to the school. We investigated how predictive performance changes when the threshold is set to 1, 2, 3, 5, and 10 km. At the 1 km threshold, our experiments show that we cannot reliably predict whether a school within 1 km of a given school S will be attacked. But at the 2 km, 3 km, 5 km, and 10 km levels, our best predictive model performs well. We have added a discussion on this toward the beginning of the “Statistical Inference of School Attacks in Relation to Social Attributes” section.

3.6. Thank you for your comment; the plots have been updated with axis labels.

3.7. Thank you for your comment. The ROC curves have been added on page 17 (Figure 14), and the confusion matrices are in Appendix H of the supplementary material. For full transparency and reproducibility: when running the AdaBoost model to gather this data, we observed minor variations in precision and recall across the different dataset variations (k = 1-10). These variations were no more than 2% from the values initially reported and did not affect the originally reported F1-scores. These changes are attributed to updates in sklearn's AdaBoostClassifier module. Consequently, we have specified the version of sklearn used in the paper to ensure clarity and consistency, and we have updated Table 4 and Appendix B.

3.8. We have added a new section called “Limitations” and addressed this point there. Thanks for the suggestion.

3.9. Thanks. We have added a new Appendix (appendix J) with details on the statistical methods used.
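Responses 2.5 and 3.1 together describe 10-fold cross-validation on a dataset in which only 76 of more than 103,000 schools were attacked. With so few positives, a purely random split could leave some held-out folds with very few attacked schools. The sketch below shows one common remedy, stratified fold assignment, so that each held-out fold receives a near-equal share of attacked schools. This is an assumption for illustration: the responses do not say whether the folds were stratified, and the function names are hypothetical (scikit-learn's `StratifiedKFold` offers the same idea as a library class).

```python
import random


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign each index to a fold so every fold gets a near-equal share of each class.

    Indices of each class are shuffled, then dealt out round-robin across the folds.
    """
    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for rank, i in enumerate(idx):
            fold_of[i] = rank % n_folds
    return fold_of


def train_test_splits(labels, n_folds=10, seed=0):
    """Yield (train_idx, test_idx): train on n_folds - 1 folds, test on the held-out fold."""
    fold_of = stratified_folds(labels, n_folds, seed)
    for k in range(n_folds):
        test = [i for i, f in enumerate(fold_of) if f == k]
        train = [i for i, f in enumerate(fold_of) if f != k]
        yield train, test
```

With 76 positive labels (and, say, 10,000 hypothetical negatives), each held-out fold receives 7 or 8 attacked schools, and every school appears in exactly one test fold, so the out-of-fold predictions cover the whole dataset once.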

REFEREE #4:

4.1. Thank you for your comment; we have updated the security installations section to address it. We have also added more information about the socioeconomic data (but as this data was created by GRID3, not by us, we are limited to publicly available information about it). We have created and included new maps showing the locations of security installations (Figure 6) and the socioeconomic characteristics of regions in Nigeria (Figures 7-9).

4.2. Thank you for your comment. We updated the Data section (lines 211-215) and the Socio-Economic Data section (lines 288-295) to address it.

4.3. We have added a paragraph at the end of the Conclusions that explains this.

4.4. Thanks so much for the kind words. We appreciate it.

4.5. We have added 2 new paragraphs in a new section called “Limitations”. Thanks for the suggestion.

4.6. Thanks for raising this point. We have added a new paragraph on ethics in the Limitations section.

REFEREE #5:

5.0. Thanks so much for your kind comments and excellent suggestions for improvement.

5.1. Thanks. We have carefully proofread the paper. In addition, we had the paper read by a couple of external people (native English speakers). Hopefully the grammatical errors and typos have now been fixed.

5.2. Thank you for your comment. We have slimmed down the manuscript by moving a portion of our Decision Tree-based statistical analysis to Appendix K. In its place, we summarize these hypotheses in the manuscript in tabular form (Table 8).

5.3. We have updated page 11 with a new visual representation (Figure 10) of how the dependent variables are assigned to the schools, along with further clarification of the process.

5.4. We have added a note to this effect in the second paragraph of the Introduction. Thanks for the great suggestion!

5.5. Thank you for your comment. The manuscript has been updated to incorporate the theory.

5.6. Thank you for your comment. The manuscript has been updated to incorporate the theory.

5.7. Thank you for your comment. The manuscript has been updated to incorporate the theory.

5.8. We have added some rationale for this hypothesis based on the social science literature just before Hypothesis 4. Thanks for pointing it out.

5.9. Thank you for your comment. There were no issues with the data acquisition from ACLED. We also cross-referenced the group’s activity on schools with other online sources. We have now updated the section with more detail (lines 224-226).

5.10. Thank you for this helpful comment. We have extended the Boko Haram section, particularly lines 235-239 to address the issue you have raised.

5.11. The tabular summary has been added on page 6. Thanks for the suggestion!

5.12. Thanks. We are not claiming this is a new measure. It is simply a way of framing an error-measure in terms of a binary classification problem. It is similar to something called distance-based accuracy. We have added a better explanation in two new paragraphs introduced before Hypothesis 1. Thanks for pointing out the lack of clarity.

5.13. We thank the reviewer for this comment. In this paper, we treated each radius as an independent analysis rather than as interdependent entities. Thus, considering all radii together with Poisson methods was not applicable under this approach. Instead, we believe a linear model-based approach is more suitable, as it has greater generalizability. This is also why we adjusted our raw p-values with the Bonferroni method. We hope this is OK!

5.14. This could certainly happen – but we are not aware of any instances where this did happen. We have added a paragraph on this in our new Limitations section. Thanks for the excellent suggestion.

5.15. This is an interesting point, and the exact answer is not clear. We spent some time digging into this, but couldn’t come up with anything conclusive. But regardless of the actual importance rank, all of these features related to proximity to security installations are clearly important. We have added a sentence about the importance of this Table in Finding 5. Thanks.

5.16. We have added Appendices G and J, which provide more information on the odds ratio-based testing as well as additional statistics (beyond the odds ratio). Thanks for the suggestion!
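Reviewer #5's comments 3 and 9 and responses 3.5 and 5.12 concern how the binary dependent variable is assigned at each radius k. One reading consistent with response 3.5 is that a school is labeled positive at radius k if at least one recorded school attack lies within k km of it. The sketch below implements that reading with great-circle (haversine) distances. The function names, the mean Earth radius constant, and the coordinates in the usage note are assumptions for illustration, not the authors' pipeline.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed value for this sketch)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def label_schools(schools, attacks, k_km):
    """Label each school 1 if any recorded school attack lies within k_km of it, else 0.

    schools and attacks are sequences of (lat, lon) tuples in degrees.
    """
    return [
        int(any(haversine_km(s_lat, s_lon, a_lat, a_lon) <= k_km for a_lat, a_lon in attacks))
        for s_lat, s_lon in schools
    ]
```

For hypothetical points 0.03° of latitude apart (about 3.3 km), `label_schools([(11.85, 13.16), (11.88, 13.16)], [(11.85, 13.16)], 3)` returns `[1, 0]`, while a 5 km radius returns `[1, 1]`. Under this reading, a single attack can make several nearby schools positive, which is the situation Reviewer #5 describes in comment 9. The brute-force loop compares every school with every attack; for roughly 103,000 schools, a spatial index such as scikit-learn's `BallTree` with `metric="haversine"` would be a more practical choice.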

REFEREE #6:

6.0. Thanks so much for your very kind words.

6.1. We have now added some supporting rationale before Hypothesis 3. Thanks for the suggestion!

6.2. Thank you for your comment; the clarification was added to the School Data section (line 246).

Attachment

Submitted filename: RESPONSE TO REVIEWERS.pdf

pone.0320939.s014.pdf (120.7KB, pdf)

Decision Letter 1

Jessica Leight

27 Feb 2025

A Quantitative Geospatial Analysis of the Risk that Boko Haram Will Target a School

PONE-D-24-14163R1

Dear Dr. Subrahmanian,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information’ link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Jessica Leight, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

Reviewer #5: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Partly

Reviewer #5: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

Reviewer #5: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

Reviewer #5: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

Reviewer #5: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: (No Response)

Reviewer #5: The authors have addressed all of my concerns. I believe this paper is an important contribution to the literature.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Reviewer #5: No

**********

Acceptance letter

Jessica Leight

PONE-D-24-14163R1

PLOS ONE

Dear Dr. Subrahmanian,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Jessica Leight

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Appendix A. Independent features calculated for every school record and their description.

    (PDF)

    pone.0320939.s001.pdf (82KB, pdf)
    S2 Appendix B. Performance Metrics of Various Classifiers Across Different Dataset Variations.

    (PDF)

    pone.0320939.s002.pdf (123.3KB, pdf)
    S3 Appendix C. Decision Tree Graphs.

    (PDF)

    pone.0320939.s003.pdf (1.3MB, pdf)
    S4 Appendix D. Ward’s Wealth and Residential Area Type.

    (PDF)

    pone.0320939.s004.pdf (70.9KB, pdf)
    S5 Appendix E. Supplementary Tables for the Forest Plots.

    (PDF)

    pone.0320939.s005.pdf (138KB, pdf)
    S6 Appendix F. Overpass API Queries for Military Installations.

    (PDF)

    pone.0320939.s006.pdf (109.1KB, pdf)
    S7 Appendix G. Additional Metrics for ML-Inspired Statistical Analyses.

    (PDF)

    pone.0320939.s007.pdf (69.8KB, pdf)
    S8 Appendix H. AdaBoost Confusion Matrices.

    (PDF)

    pone.0320939.s008.pdf (83KB, pdf)
    S9 Appendix I. Time-Series Analysis.

    (PDF)

    pone.0320939.s009.pdf (86.5KB, pdf)
    S10 Appendix J. Description of Detailed Statistical Methods.

    (PDF)

    pone.0320939.s010.pdf (105KB, pdf)
    S11 Appendix K. Additional Decision Tree-Inspired Statistical Analysis.

    (PDF)

    pone.0320939.s011.pdf (126.2KB, pdf)
    S1 Data. Hypotheses Test Data.

    (ZIP)

    pone.0320939.s012.zip (84.5MB, zip)
    S2 Data. ML Data.

    (ZIP)

    pone.0320939.s013.zip (61.1MB, zip)
    Attachment

    Submitted filename: RESPONSE TO REVIEWERS.pdf

    pone.0320939.s014.pdf (120.7KB, pdf)

    Data Availability Statement

    All relevant data are within the paper and its Supporting Information files.


