Version Changes
Revised. Amendments from Version 2
This third version includes the following amendments: we have corrected typos in the text, clarified how we modelled the populations, changed Figure 8 so that it also shows wavelength units, and improved the bibliography with two new references. We sincerely appreciate the comments made by the reviewers during this process.
Abstract
Despite the global efforts made in the fight against malaria, the disease is resurging. One of the main causes is the resistance that Anopheles mosquitoes, vectors of the disease, have developed to insecticides. Anopheles mosquitoes must survive for at least 10 days before they can transmit malaria. Therefore, to evaluate and improve malaria vector control interventions, it is imperative to monitor and accurately estimate the age distribution of mosquito populations as well as their population sizes. Here, we demonstrate a machine-learning based approach that uses mid-infrared spectra of mosquitoes to characterise simultaneously both age and species identity of females of the African malaria vector species Anopheles gambiae and An. arabiensis, using laboratory colonies. Mid-infrared spectroscopy-based prediction of mosquito age structures was statistically indistinguishable from true modelled distributions. The accuracy of classifying mosquitoes by species was 82.6%. The method has a negligible cost per mosquito, does not require highly trained personnel, and is rapid, so it can be easily applied in both laboratory and field settings. Our results indicate this method is a promising alternative to current mosquito species and age-grading approaches, with further improvements to accuracy and expansion for use with wild mosquito vectors possible through collection of larger mid-infrared spectroscopy data sets.
Keywords: Malaria, Anopheles gambiae, Anopheles arabiensis, Vector control, Machine learning, Mid-infrared spectroscopy
Between 2000 and 2015, insecticide-based control interventions targeting mosquito vectors averted an estimated 537 million malaria cases 1. Nevertheless, malaria still kills hundreds of thousands of people each year (445,000 in 2016), mainly in sub-Saharan Africa 2. Additionally, there is concern that progress may have stalled after more than a decade of success in global malaria control 2. Of major concern is the increase in insecticide resistance among mosquito populations throughout Africa 3, which is degrading the lethality and effectiveness of vector control tools, notably indoor residual spraying (IRS) and long-lasting insecticide-treated nets (LLINs), which have been the cornerstones of malaria control in the past decades 4. Indeed, much of the effectiveness of LLINs and IRS comes from community-wide reductions in vector population size, not merely from preventing people from getting bitten 5.
Female mosquito survival is an important biological determinant of malaria transmission intensity 6, 7. This is because malaria parasites (Plasmodium spp.) require more than 10 days of incubation inside female mosquito vectors (the extrinsic incubation period, EIP) before they become infectious 8– 11. While there is uncertainty about mosquito survival in the field, crude estimates suggest the median lifespan of African malaria vectors is 7–10 days 12. Thus, only relatively old mosquitoes can transmit the parasite 13. As a result, even minor reductions in mosquito survival can have exponential impacts on pathogen transmission 10, 14. Consequently, accurate, high-resolution estimates of both mosquito abundance and longevity are essential for assessing the impact of vector control measures.
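The "exponential impact" of survival can be made concrete with a textbook simplification that is not a model used in this study: if the daily survival probability s is constant, the probability that a mosquito lives through an n-day EIP is s to the power n. The sketch below applies this to the two daily survival values used later in the age-structure modelling (0.91 and 0.82), with an illustrative 10-day EIP.

```python
"""Probability that a mosquito survives the extrinsic incubation period (EIP).

Assumes a constant daily survival probability s, so surviving n consecutive
days has probability s**n (a textbook simplification, not this study's model).
"""


def prob_survive_eip(daily_survival: float, eip_days: int) -> float:
    """Return the probability of surviving `eip_days` consecutive days."""
    if not 0.0 <= daily_survival <= 1.0:
        raise ValueError("daily_survival must be a probability")
    return daily_survival ** eip_days


if __name__ == "__main__":
    eip = 10  # illustrative EIP length in days
    for s in (0.91, 0.82):
        p = prob_survive_eip(s, eip)
        print(f"s = {s:.2f}: P(survive {eip} days) = {p:.3f}")
```

Under these assumptions, 0.91^10 is about 0.39 while 0.82^10 is about 0.14: a roughly 10% drop in daily survival removes almost two-thirds of the mosquitoes that would otherwise live long enough to become infectious.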
Despite the crucial importance of mosquito demography to vector control, there are few reliable tools for rapid, high-throughput monitoring of mosquito survival in the wild. Conventionally, mosquito age has been approximated by classifying females (the only sex that transmits malaria) into groups based on their reproductive status as assessed through observation of their ovarian tracheoles 15. This widely-employed technique distinguishes females who have not yet laid eggs (nulliparous) from those that have laid at least one egg batch (parous), with the latter group assumed to be older than the former because the gonotrophic cycle between blood feeding and oviposition takes ~ 4 days. While useful for approximating general patterns of survival 16, this method is crude and cannot distinguish between females who have laid eggs only once or multiple times. Alternatively, more refined methods have been developed to estimate the number of gonotrophic cycles a female mosquito has gone through based on follicular relics or dilatations formed during each oviposition 17, although the conversion between gonotrophic cycles and actual age is imprecise (especially now that LLINs are limiting regular access to blood-meals) 18. While an improvement on the simple parity classification method, this approach is extremely technically demanding and time-consuming 19. Additionally, it is unsuitable for analysis of the large sample sizes necessary for estimating mosquito population structure 20.
Given these problems with ovary-based assessment 21, there has been significant investigation of alternative, molecular-based approaches to estimate mosquito age. These methods include: counting cuticle rings representing daily growth layers of the mosquito skeletal apodemes 22, chromatographic analysis of cuticular hydrocarbon chains 23, assessment of pteridines using fluorescence techniques 24, transcriptomic profiling 25, and mass spectrometric analysis of mosquito protein expression 26. However, thus far their limited accuracy, high cost, and/or need for highly trained users suggest that they may not be suitable for application in the field.
In addition to age, identification of mosquito species is crucial for estimation of malaria transmission dynamics. In Africa, the bulk of malaria transmission is carried out by members of the Anopheles gambiae sensu lato and Anopheles funestus sensu lato species complexes 27. The An. gambiae s.l. complex includes several morphologically identical sibling species that can only be distinguished by molecular analysis 28– 30. Despite being morphologically identical, members of this group vary significantly in behaviour, transmission potential, and response to vector control measures 31. For example, two major vectors in the An. gambiae s.l. group, An. arabiensis and An. gambiae, can differ in their propensity to enter and rest in houses, their host species choice, breeding conditions, resistance to insecticides, and tolerance to dry climates 6, 32, 33. Currently, An. gambiae s.l. species are best distinguished by polymerase chain reaction (PCR) methods 30, 34– 36, which are time-consuming and still relatively costly, and can thus only be carried out on a subsample of the mosquitoes collected during the routine entomological surveillance conducted by many agencies in Africa. Alternative techniques, such as isoenzyme electrophoresis 37 or chromatography of cuticular components 24, have been developed, but these are also very laborious and have weak discriminatory power 38.
Non-PCR-based methods often rely on structural and chemical differences in the cuticle to discriminate insects according to their species and other traits. In particular, near-infrared spectroscopy (NIRS) has been evaluated as a general strategy for examining insects, since it does not require reagents and holds promise as a fast, practical, non-destructive, and cost-effective method for entomological surveillance. Results obtained to date have shown that the chemical composition of mosquitoes and other insects changes not only between species 39– 41 but also with age 40, 42– 45, with resistance to insecticides 46, and in the presence of an infectious agent 47, 48. While promising, the typical NIRS approach has certain drawbacks. Because it employs the most energetic portion of the infrared spectrum, its absorption bands are generated by two indirect processes: overtones (a vibration excited at a multiple of the fundamental frequency) and combinations (two or more fundamental vibrations excited simultaneously). Both processes are much less probable than the absorption of light by fundamental vibrations, so their absorption bands are broad and weak. As a result, the NIR spectrum of a mosquito, formed by the overlap of dozens of these bands, consists of a few features standing out against a background of continuous absorption 49. In addition, most NIRS analyses use a dispersive method to collect the absorption spectra of insects, so the reflectivity of the sample is not controlled and the intensity of the spectral bands depends on how the mosquito is placed in the spectrometer. Finally, the results are normally analysed using Partial Least Squares (PLS) regression, which is prone to over-fitting (i.e. producing a model that corresponds too closely to a particular set of data and may therefore fail to predict future observations reliably) 50. This problem commonly arises when the number of samples is relatively small and the number of variables is large.
Here we tested whether these limitations can be overcome by shifting the measurement range from the near-infrared (25,000–4,000 cm⁻¹) to the mid-infrared region (4,000–400 cm⁻¹), employing an attenuated total reflectance (ATR) device to measure the mosquitoes, and modelling the results with supervised machine learning. The mid-infrared absorption spectrum of a mosquito contains a set of discrete, well-delineated bands that depend on the fundamental vibrations of the molecules present in the cuticle. These bands provide a wealth of information not available in the near-infrared range, where it is not possible to resolve the contributions of the different biochemical components of the mosquito to the spectrum, or their variation among mosquitoes with different attributes, as shown in Aedes aegypti and the biting midge Culicoides sonorensis 51, 52. However, since the mid-infrared spectral bands are affected in non-trivial ways by the development of a mosquito and the changing composition of its cuticle, it is not possible to predict traits by simply monitoring changes in band intensities 51.
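The spectral ranges above are quoted in wavenumbers; converting them to wavelengths only requires the definition that wavenumber is the reciprocal of wavelength (1 cm = 10,000 μm). The small helper below is a sketch for that conversion, with a hypothetical function name.

```python
def wavenumber_to_wavelength_um(wavenumber_cm1: float) -> float:
    """Convert a wavenumber in cm^-1 to a wavelength in micrometres.

    Since 1 cm = 10,000 um, lambda[um] = 10,000 / wavenumber[cm^-1].
    """
    if wavenumber_cm1 <= 0:
        raise ValueError("wavenumber must be positive")
    return 1e4 / wavenumber_cm1


if __name__ == "__main__":
    # Boundaries of the spectral ranges quoted in the text.
    for label, nu in [("NIR upper limit", 25_000),
                      ("NIR/MIR boundary", 4_000),
                      ("MIR lower limit", 400)]:
        print(f"{label}: {nu} cm^-1 = {wavenumber_to_wavelength_um(nu):g} um")
```

The near-infrared range used in earlier work therefore spans 0.4–2.5 μm, and the mid-infrared range used here spans 2.5–25 μm.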
Here, we show that supervised machine learning 53 allows the age and species of two major malaria vectors, An. arabiensis and An. gambiae, to be determined from the information contained in their mid-infrared spectra. This is possible because machine learning, unlike standard statistical approaches, can recognise complex relationships between the spectra and these traits (mosquito species and mosquito age) and disentangle them from irrelevant variation 54– 56. Using this approach, we are able to reconstruct simulated age distributions of mosquito populations with unprecedented reliability. The technique we propose here is time-efficient (an analysis takes less than one minute per mosquito) and economical, and requires neither reagents nor highly trained operators. It also represents a novel approach to the analysis of insects using spectroscopic techniques, solving some previous drawbacks and accelerating progress towards the establishment of infrared spectroscopy as a routine approach for mosquito surveillance and the evaluation of interventions.
Methods
Mosquito rearing, blood feeding, and processing
Anopheles gambiae s.s. (Kisumu strain) and An. arabiensis (Ifakara strain) mosquitoes were reared at the University of Glasgow under standard insectary conditions of 27 ± 1°C, 70% relative humidity, and a 12-hr light: 12-hr dark cycle. Anopheles gambiae s.s. (Kisumu strain) mosquitoes were provided by Hilary Ranson (Liverpool School of Tropical Medicine). The An. arabiensis (Ifakara strain) colony was initially established in 2008 at the Ifakara Health Institute with individuals from Sagamaganga village (Kilombero District, Morogoro Region, Tanzania) 57 and, after a few generations, was reared at the University of Glasgow. Larvae were fed ad libitum on fish pellets (Tetra Pond Pellets, Tetra GmbH, Herrenteich 78, D49324). Pupae were collected from the larval trays and moved into a cage for emergence. Mosquitoes were assigned to the age category “Day 0” on their day of emergence from pupa to adult. Upon emergence, adults were fed ad libitum on a 5% glucose solution supplemented with 0.05% (w/v) 4-aminobenzoic acid (PABA).
To produce mosquitoes of the same age but in different physiological conditions, cages of same-age mosquitoes (i.e. cages to which pupae were added on the same day) were fed human blood through membrane feeders on different days after emergence. An oviposition cup was then introduced 2 days after a blood meal to allow egg laying. Mosquitoes in three physiological conditions were collected: mosquitoes that had just received a blood meal (blood fed), mosquitoes that had developed eggs after receiving a blood meal two days before collection (gravid), and mosquitoes that had received a blood meal four days before collection and had the chance to lay eggs in an oviposition cup for two consecutive nights (sugar fed). A blood meal was offered to each cage every 6 days, so mosquitoes living 6 or more days after their first blood meal underwent multiple gonotrophic cycles.
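The feeding schedule above implies a simple relationship between a mosquito's age at collection, the day of its first blood meal, and its number of gonotrophic cycles. The hypothetical helper below makes that arithmetic explicit. It assumes, beyond what the text states, that every female feeds each time blood is offered and that collections happen exactly 0, 2, or 4 days after the most recent blood meal.

```python
"""Hypothetical helper describing the blood-feeding schedule in the text.

Assumptions (not the authors' code): every female feeds whenever blood is
offered, blood is offered every 6 days from `first_feed_day`, and collection
happens 0, 2 or 4 days after the most recent blood meal.
"""
from typing import Tuple

FEED_INTERVAL_DAYS = 6
STATE_BY_DAYS_SINCE_MEAL = {0: "blood fed", 2: "gravid", 4: "sugar fed"}


def feeding_history(age_days: int, first_feed_day: int) -> Tuple[int, str]:
    """Return (number of blood meals taken, physiological state) at collection."""
    if age_days < first_feed_day:
        return 0, "sugar fed (no blood meal yet)"
    days_since_first = age_days - first_feed_day
    n_meals = days_since_first // FEED_INTERVAL_DAYS + 1
    days_since_last = days_since_first % FEED_INTERVAL_DAYS
    state = STATE_BY_DAYS_SINCE_MEAL.get(days_since_last, "not a collection point")
    return n_meals, state


if __name__ == "__main__":
    for age in (1, 3, 5, 7, 9, 11):
        print(age, feeding_history(age, first_feed_day=3))
```

For example, a female first fed on day 3 and collected on day 11 has taken two blood meals and is gravid, consistent with the statement that females living 6 or more days after their first blood meal undergo multiple gonotrophic cycles.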
Human blood was obtained from the Glasgow and West of Scotland Blood Transfusion Service. Ethical approval for the supply and use of human blood was obtained from the Scottish National Blood Transfusion Service committee for governance of blood and tissue samples for non-therapeutic use, and Donor Research (submission Reference No 18~15). Whole blood from donors of any blood group was provided in Citrate-Phosphate-Dextrose-Adenine (CPD-A) anticoagulant/preservative. Fresh blood was obtained on a weekly basis.
Upon collection, mosquitoes were transferred into a cup and killed by placing a piece of cotton soaked in chloroform on top of the cup for 30 minutes. Dead mosquitoes were then transferred into a tube over a layer of cotton and silica gel desiccant, and the tube was immediately stored at 4°C. Since drying in silica gel takes one day for An. gambiae and two days for An. arabiensis, specimens of both species were stored for at least three days before measurement.
Spectral data acquisition
Dried specimens were laid on their sides on the ATR diamond so that the surface of the diamond was mainly covered by the insect’s head and thorax, to avoid measuring the contents of the abdomen as far as possible (Figure 1). The wings and legs were not removed and were used to help position the mosquito. Pressure was then applied with the anvil of the ATR accessory, and the spectrum was measured using a dry-air-purged Bruker Vertex 70 spectrometer (Bruker Corporation, Billerica, Massachusetts, USA) equipped with a Globar source, a deuterated L-alanine-doped triglycine sulphate (DLaTGS) detector, a potassium bromide (KBr) beamsplitter, and a diamond ATR accessory (Bruker Platinum ATR Unit A225). Final spectra were obtained by averaging 16 scans taken at room temperature between 400 and 4,000 cm⁻¹ at 1 cm⁻¹ resolution. Mosquito spectra with low intensity or significant atmospheric intrusion (Figure 2) were discarded automatically using Loco Mosquito 5.0, a custom program written in Python 3.6 (see Software availability section). This program identified unsuitable spectra by measuring the average absorbance of the plateau in the mosquito spectrum between 400 and 500 cm⁻¹ and the smoothness of the region between 3,500 and 3,900 cm⁻¹ (to detect intrusion of water and CO₂ spectra).
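The two quality checks described above can be sketched as follows. This is not the code of Loco Mosquito: the thresholds are hypothetical placeholders, and "smoothness" is approximated here by the mean absolute second difference of the absorbance, which rises when sharp atmospheric lines intrude.

```python
"""Sketch of automatic spectrum quality control in the spirit of the text.

Check 1: mean absorbance of the 400-500 cm^-1 plateau (signal intensity).
Check 2: roughness of the 3,500-3,900 cm^-1 region (atmospheric intrusion).
Thresholds and the roughness metric are hypothetical placeholders.
"""
import numpy as np

MIN_PLATEAU_ABSORBANCE = 0.1  # hypothetical threshold
MAX_ROUGHNESS = 0.002         # hypothetical threshold


def passes_quality_control(wavenumbers: np.ndarray, absorbance: np.ndarray) -> bool:
    """Return True if a spectrum passes both checks."""
    plateau = (wavenumbers >= 400) & (wavenumbers <= 500)
    if absorbance[plateau].mean() < MIN_PLATEAU_ABSORBANCE:
        return False  # too little signal, e.g. mosquito displaced on the crystal

    window = (wavenumbers >= 3500) & (wavenumbers <= 3900)
    # Mean absolute second difference: near zero for a smooth baseline,
    # large when narrow water-vapour lines are superimposed on it.
    roughness = np.abs(np.diff(absorbance[window], n=2)).mean()
    return bool(roughness <= MAX_ROUGHNESS)
```

In practice the thresholds would have to be calibrated on spectra labelled by eye as acceptable or not, such as the examples in Figure 2.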
Figure 1. Best position of the mosquito on the ATR crystal.
The correct way to place a mosquito on the ATR crystal (left) is to cover the surface with the head and thorax. The wrong way (right) is to centre the abdomen on the crystal.
Figure 2. Common experimental errors during the measurement of the infrared spectrum of a mosquito using ATR-FTIR spectroscopy.
Above, blue: spectrum with a significant atmospheric intrusion. Centre, green: An. gambiae mosquito with high water content. Below, red: spectrum with poorly defined features due to low intensity, caused by the displacement of the mosquito during the measurement. All spectra are compared to a correct spectrum of a mosquito, shown with a black dashed line.
Machine-learning analysis
A supervised machine-learning approach was used to map the 17 pre-selected wavenumbers (see Spectroscopic method subsection in Results) to mosquito species (either An. gambiae or An. arabiensis) and to mosquito age. In both cases, a classification approach was used. The age classes selected were mosquito ages 1, 3, 5, 7, 9, 11, and 15 days, which allowed acceptable per-age accuracy while improving on the current binary cut-off of 4 days based on oviposition (assuming no pre-gravid behaviour). These age classes were chosen as a compromise between the granularity of the predictions and model performance.
Mosquito species and age were modelled separately to increase accuracy. To identify the algorithms most suited to identifying mosquito species and age class, we first compared the baseline performance of k nearest neighbours (kNN), logistic regression (LR), support vector machines (SVM), random forests (RF), and gradient boosted trees (XGB) using 5-fold cross-validation (Figure 3). This range of parametric (LR, SVM) and non-parametric (kNN, RF, XGB) models offers different data representation schemes based on Euclidean distance (kNN), linear relationships (LR, SVM), and ensembles of decision trees (RF, XGB). For species and age class identification, XGB and LR, respectively, were then selected for further optimisation. The full dataset, comprising 2,536 mosquito spectra (details in Table 1) and their corresponding species or age labels, was sampled at random to generate a hold-out validation set stratified by age class for each species (see below). The remaining samples were then repeatedly (10 rounds) split into random stratified training and test sets (10 folds). For model optimisation, each training fold was further split into random stratified 70% and 30% subsets; algorithms were trained with a broad range of parameter combinations, and the best settings for each training set were retained.
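A baseline comparison of this kind can be sketched with scikit-learn. The data below are synthetic stand-ins with 17 features per sample, mimicking the 17 selected wavenumbers; feature standardisation for the distance- and margin-based models is our own choice, and XGBoost is left out to keep the sketch dependency-light (an `xgboost.XGBClassifier` could be added to the dictionary in the same way).

```python
"""Baseline comparison of classifiers with 5-fold cross-validation (sketch).

Synthetic stand-in data: 17 features per sample, mimicking 17 wavenumbers.
"""
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=17, n_informative=8,
                           n_classes=2, random_state=0)

models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {name: cross_val_score(model, X, y, cv=cv, scoring="accuracy")
          for name, model in models.items()}

for name, s in scores.items():
    print(f"{name}: mean accuracy {s.mean():.3f} (sd {s.std():.3f})")
```

Using the same `StratifiedKFold` object for every model means all algorithms are scored on identical folds, so differences in mean accuracy are not driven by different data splits.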
Figure 3.
Comparison of the baseline pre-optimisation performance of 5 supervised machine learning algorithms for the prediction of mosquito species (A), An. gambiae age (B), and An. arabiensis age (C). Each classifier was run with 5-fold cross-validation on a random stratified training subset representing 70% of the full dataset and tested against the remaining 30%. No model parameter optimisation was performed at this stage (please see selected models post-optimisation in Figure 13, Figure 14, and Figure 16). KNN, k Nearest Neighbours; LR, logistic regression; SVM, support vector machines; RF, random forests; XGB, gradient boosted trees with XGBoost.
Table 1. Number of mosquitoes of each species and status that have been measured.
Anopheles arabiensis
Age (days) | 1 | 3 | 5 | 7 | 9 | 11 | 12 | 13 | 15 | 17 | Totals |
---|---|---|---|---|---|---|---|---|---|---|---|
Gravid | 0 | 57 | 61 | 41 | 66 | 70 | 52 | 90 | 33 | 80 | 550 |
Sugar-fed | 42 | 43 | 65 | 67 | 84 | 67 | 0 | 39 | 41 | 16 | 464 |
Totals | 42 | 100 | 126 | 108 | 150 | 137 | 52 | 129 | 74 | 96 | 1014 |

Anopheles gambiae
Age (days) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | Totals |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Gravid | 0 | 0 | 0 | 47 | 45 | 51 | 43 | 40 | 37 | 39 | 35 | 89 | 34 | 0 | 45 | 61 | 39 | 605 |
Sugar-fed | 160 | 63 | 65 | 62 | 54 | 52 | 59 | 53 | 53 | 44 | 41 | 32 | 27 | 44 | 44 | 24 | 40 | 917 |
Totals | 160 | 63 | 65 | 109 | 99 | 103 | 102 | 93 | 90 | 83 | 76 | 121 | 61 | 44 | 89 | 85 | 79 | 1522 |
Each optimised model’s accuracy was then calculated against the corresponding test set. The 100 resulting trained models were ranked by accuracy, and the best 10 were retained and their predictions bagged to evaluate the predicted labels (age or species) against the true labels. All machine learning was performed in Python 3.6 using scikit-learn 0.19 and XGBoost 0.82, with plotting done in seaborn 0.9.
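The training scheme described in this subsection (a hold-out set, 10 rounds of stratified 10-fold splits, an inner 70%/30% split for tuning, ranking of the 100 models, and bagging of the best 10) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the data are synthetic, logistic regression is tuned over a small hypothetical grid of `C` values, and "bagging" is implemented as a majority vote of the retained models on the hold-out set.

```python
"""Sketch: 100 models from repeated stratified splits, keep the 10 best, bag."""
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (GridSearchCV, RepeatedStratifiedKFold,
                                     StratifiedShuffleSplit, train_test_split)

X, y = make_classification(n_samples=800, n_features=17, n_informative=8,
                           n_classes=3, n_clusters_per_class=1, random_state=1)

# Hold-out validation set, never used for training or model selection.
X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.2,
                                              stratify=y, random_state=1)

outer = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=1)  # 100 splits
inner = StratifiedShuffleSplit(n_splits=1, test_size=0.3, random_state=1)   # 70/30 tuning
param_grid = {"C": [0.1, 1.0, 10.0]}  # hypothetical grid

trained = []
for train_idx, test_idx in outer.split(X_dev, y_dev):
    search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=inner)
    search.fit(X_dev[train_idx], y_dev[train_idx])  # refits best C on the full fold
    accuracy = search.score(X_dev[test_idx], y_dev[test_idx])
    trained.append((accuracy, search.best_estimator_))

# Rank the 100 models by test accuracy and keep the 10 best.
trained.sort(key=lambda pair: pair[0], reverse=True)
best_models = [model for _, model in trained[:10]]

# Majority vote of the retained models on the hold-out set (ties -> lowest label).
votes = np.stack([m.predict(X_val) for m in best_models])  # shape (10, n_val)
bagged = np.array([np.bincount(col).argmax() for col in votes.T])
print(f"bagged hold-out accuracy: {(bagged == y_val).mean():.3f}")
```

Because the 10 models are chosen by their test-fold accuracy, those test scores are optimistically biased; the untouched hold-out set gives the fairer estimate of the bagged ensemble.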
Age-structure modelling
To illustrate the utility of our approach for field-based surveys of Anopheles populations, and to assess whether it could be used to measure the impact of vector control interventions in the field, we simulated age structures of An. gambiae and An. arabiensis using a simple age-structured population model in which age is measured in days. Specifically, the number of mosquitoes surviving from age $t$ to $t+1$ was modelled as a binomial process, $N_{t+1} \sim \mathrm{Binomial}(N_t, s)$, where $N_t$ is the number of mosquitoes alive at age $t$ and $s$ is the probability of daily survival. Daily survival rates were taken from the literature: $s = 0.91$ for An. gambiae 58 and $s = 0.82$ for An. arabiensis 16. For the age structure of the populations under a theoretical intervention regime, we assumed that the intervention quadruples the mortality rate of both species from day 3 onwards. This emulates a scenario in which mosquitoes encounter an insecticide-treated bednet for the first time at day 3, when they start feeding.
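The binomial model above can be simulated in a few lines. This sketch makes three assumptions that the text does not specify: an initial cohort of 10,000 newly emerged females, a 15-day horizon, and "quadrupled mortality" interpreted as multiplying the daily death probability (1 − s) by four for transitions starting at age 3 or later.

```python
"""Binomial age-structure simulation, N_{t+1} ~ Binomial(N_t, s) (sketch)."""
import numpy as np


def simulate_age_structure(s, n0=10_000, max_age=15, intervention_start=None, rng=None):
    """Return the number of survivors at each age 0..max_age (days).

    If `intervention_start` is given, the daily mortality (1 - s) is
    quadrupled for transitions from age t to t + 1 with t >= intervention_start.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.zeros(max_age + 1, dtype=int)
    counts[0] = n0
    for t in range(max_age):
        s_t = s
        if intervention_start is not None and t >= intervention_start:
            s_t = max(0.0, 1.0 - 4.0 * (1.0 - s))  # quadrupled daily mortality
        counts[t + 1] = rng.binomial(counts[t], s_t)
    return counts


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    for species, s in [("An. gambiae", 0.91), ("An. arabiensis", 0.82)]:
        for label, start in [("baseline", None), ("intervention", 3)]:
            counts = simulate_age_structure(s, intervention_start=start, rng=rng)
            proportions = counts / counts.sum()  # age mix used for sampling spectra
            print(species, label, np.round(proportions[:8], 3))
```

Under this interpretation, daily survival during the intervention drops to 0.64 for An. gambiae and 0.28 for An. arabiensis, and the resulting proportions per age class are what a sampling step like the one described next would use.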
Each age class was generated by sampling the full dataset in the proportions calculated from the simulated age-structured populations above. To generalise our discrete model predictions to an exponentially decreasing age structure, a continuous half-logistic probability density function was then fitted to the true and predicted discrete age distributions:

$$f(t; \mu, \sigma) = \frac{2\, e^{-(t-\mu)/\sigma}}{\sigma \left(1 + e^{-(t-\mu)/\sigma}\right)^{2}}, \qquad t \ge \mu,$$

where $\mu$ is a location parameter and $\sigma > 0$ is a scale parameter.
The half-logistic distribution is well suited to fitting survival data 59, 60. Age distributions were compared using the two-sample Kolmogorov-Smirnov test, a two-sided test of the null hypothesis that two independent samples are drawn from the same continuous distribution.
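The fitting and comparison steps can be sketched with SciPy, whose `halflogistic` distribution uses the same location/scale form. The "true" and "predicted" ages below are synthetic stand-ins, rounded to whole days to mimic discrete age classes. Note that ties introduced by discretisation make the Kolmogorov-Smirnov test, which assumes continuous data, somewhat conservative.

```python
"""Fit a half-logistic distribution to ages and compare two age samples (sketch)."""
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-ins for true and predicted ages (in days).
true_ages = np.round(stats.halflogistic.rvs(loc=1, scale=3, size=500, random_state=rng))
pred_ages = np.round(stats.halflogistic.rvs(loc=1, scale=3, size=500, random_state=rng))

# Maximum-likelihood estimates of location and scale for each sample.
loc_true, scale_true = stats.halflogistic.fit(true_ages)
loc_pred, scale_pred = stats.halflogistic.fit(pred_ages)

# Two-sample Kolmogorov-Smirnov test (two-sided by default).
ks_stat, p_value = stats.ks_2samp(true_ages, pred_ages)

print(f"true fit: loc={loc_true:.2f}, scale={scale_true:.2f}")
print(f"predicted fit: loc={loc_pred:.2f}, scale={scale_pred:.2f}")
print(f"KS statistic={ks_stat:.3f}, p={p_value:.3f}")
```

A large p-value means the test finds no evidence that the two age distributions differ; it does not by itself prove they are identical.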
Estimation of the light penetration distance in a mosquito
The depth of light penetration for ATR measurements depends on the wavelength $\lambda$ and angle of incidence $\theta$ of the light, and on the refractive indices of the mosquito, $n_2$, and the ATR crystal, $n_1$:

$$d_p = \frac{\lambda}{2\pi n_1 \sqrt{\sin^2\theta - (n_2/n_1)^2}}$$
We took the incidence angle to be θ = 45°, according to the specifications of the ATR accessory 61; the refractive index of diamond from its Sellmeier equation 62, 63 (λ in μm); and the refractive index of the mosquito from a Cauchy equation, n(λ) = A + B/λ², with A = 1.517 and B = 8.80·10⁻³ μm² for insect chitin 64. The results for the MIR region are shown in Figure 4.
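A numerical sketch of this calculation is given below. Its assumptions should be read carefully: it uses the standard 1/e penetration-depth expression for ATR, θ = 45°, the chitin Cauchy coefficients quoted above, and diamond Sellmeier coefficients as commonly tabulated (B₁ = 0.3306, C₁ = 0.1750 μm, B₂ = 4.3356, C₂ = 0.1060 μm), which may not be the exact form used for Figure 4. Both dispersion formulas are extrapolated into the mid-infrared, which is itself an approximation.

```python
"""ATR penetration depth across the mid-infrared (sketch under stated assumptions)."""
import numpy as np


def n_diamond(lam_um):
    """Sellmeier equation for diamond (lambda in um); coefficients are an assumption."""
    lam2 = np.asarray(lam_um, dtype=float) ** 2
    return np.sqrt(1 + 0.3306 * lam2 / (lam2 - 0.1750 ** 2)
                   + 4.3356 * lam2 / (lam2 - 0.1060 ** 2))


def n_chitin(lam_um, a=1.517, b=8.80e-3):
    """Cauchy equation n = A + B / lambda^2 for insect chitin (lambda in um)."""
    return a + b / np.asarray(lam_um, dtype=float) ** 2


def penetration_depth_um(wavenumber_cm1, theta_deg=45.0):
    """1/e depth d_p = lambda / (2 pi n1 sqrt(sin^2 theta - (n2/n1)^2)), in um."""
    lam = 1e4 / np.asarray(wavenumber_cm1, dtype=float)  # wavelength in um
    n1, n2 = n_diamond(lam), n_chitin(lam)
    theta = np.radians(theta_deg)
    return lam / (2 * np.pi * n1 * np.sqrt(np.sin(theta) ** 2 - (n2 / n1) ** 2))


if __name__ == "__main__":
    wavenumbers = np.array([4000, 3000, 2000, 1000, 400])
    for nu, dp in zip(wavenumbers, penetration_depth_um(wavenumbers)):
        print(f"{int(nu):5d} cm^-1: d_p = {dp:.2f} um")
```

With these inputs the 1/e depth grows from about 0.5 μm at 4,000 cm⁻¹ to about 5.5 μm at 400 cm⁻¹. The larger depths quoted in the Results may reflect a different depth criterion, so this sketch should not be read as reproducing Figure 4.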
Figure 4. Estimated depth of penetration of the ATR evanescent wave in the mosquito sample.
Results
Mosquitoes preparation
A ‘field-friendly’ protocol to kill and store mosquitoes for infrared (IR) spectroscopy was established. In brief, laboratory-reared female An. gambiae and An. arabiensis mosquitoes of different ages and physiological states were killed by exposure to chloroform for 30 minutes. As chloroform evaporates and does not interact with the mosquito cuticle, the IR spectra were not affected by this chemical (Figure 5). This method, which has been used before 40, 43, 45, is more practical in the field than killing mosquitoes with CO₂ or by freezing them at -20°C. Dead mosquitoes were then stored in 20 ml transport tubes with silica gel to dry them out 65. Removing water from the samples is essential, as it uncovers parts of the IR spectrum that would otherwise be hidden by the intense IR absorption of water (Figure 6). Water IR absorption bands disappeared from An. gambiae and An. arabiensis mosquitoes after storage with silica gel at 4°C for one and two days, respectively (longer for An. arabiensis because of its higher body water content) 66. In addition, this drying method prevented the mosquitoes from decomposing for more than 10 days (Figure 7). Alternative drying methods, such as desiccating specimens in an oven at 80°C, were shown to affect the IR spectra, disrupting especially the peaks associated with lipids (Figure 6), and were therefore not used.
Figure 5. Mid-infrared absorption spectra of a typical mosquito ( An. gambiae, gravid, 9 days old, top) and liquid chloroform (bottom).
Note the absence of the chloroform signal in the insect spectrum: chloroform rapidly evaporates from the sample and leaves no MIR-detectable signal.
Figure 6. Mid-infrared absorption spectra of a recently killed mosquito (blue), a mosquito dried in a vial with silica (green) and in an oven at 80°C for 60 minutes (pink).
All mosquitoes were An. gambiae, sugar-fed and 11 days old. A clear loss of detail can be observed in the oven dried sample due to heating.
Figure 7. Effect of the storage time on the averaged mid-infrared spectra of 30 sugar-fed 17-day-old An. gambiae mosquitoes.
3 days (blue), 6 days (green), and 11 days (pink) after collection.
Spectroscopic method
The far- (30–400 cm⁻¹), mid- (400–4,000 cm⁻¹), and near-infrared (4,000–10,000 cm⁻¹) regions of mosquito spectra were compared (Figure 8). The far- and near-infrared regions were essentially featureless in dried mosquitoes, unlike previously published NIR spectra 40, 43, 45, 47, 67, which show the intense signals of liquid water when specimens were not dried (Figure 9). In contrast, the mid-infrared region showed a large number of well-defined, intense peaks that are easily identifiable as coming from the chemical components of the cuticle (Table 2). Three IR spectral sampling techniques were investigated: diffuse reflectance, transmission, and attenuated total reflectance (ATR; see Spectral data acquisition in Methods). ATR spectroscopy produced the best-defined and most reproducible spectra in the mid-IR region (Figure 10). ATR also allowed the measurement of different parts of the mosquito body (e.g., head or abdomen), which have slightly different IR spectra (Figure 11). It also had a superior signal-to-noise ratio, allowing spectra to be acquired in 45 seconds. Raw spectral data are available as Underlying data 68.
Figure 8. Typical near- (left, blue), mid- (centre, green), and far-infrared (right, pink) spectra of an An. gambiae mosquito.
The near-infrared spectrum was collected using diffuse reflectance infrared spectroscopy, while the mid- and far-infrared spectra were obtained using ATR.
Figure 9. Near-infrared diffuse reflectance spectra of water (blue), an undried An. gambiae mosquito (green), a dried mosquito (pink), and chitin (red).
Table 2. Assignment of the selected wavenumbers shown in Figure 12 51, 69.
Wavenumber (cm⁻¹) | Bond | Assigned constituents |
---|---|---|
3856 | * | |
3400 | O-H | Mosquito moisture |
3276 | N-H | Chitin, proteins |
2923 | CH₂ | Proteins, waxes |
2859 | CH₂ | Proteins, waxes |
1901 | * | |
1746 | C=O | Proteins, waxes |
1636 | C=O | Proteins, chitin |
1539 | O=C-N | Proteins, chitin |
1457 | C-CH₃ | Wax, proteins |
1307 | C-N | Proteins, chitin |
1154 | C-O-C | Chitin, waxes |
1076 | C-O | Chitin |
1027 | C-O | Chitin |
880 | * | |
526 | C-C | Proteins, chitin |
401 | * | |
* Wavenumbers selected as indicators of overall spectral intensity and offset.
Figure 10. Typical ATR (blue, scaled Abs x32), diffuse reflectance (pink), and transmission (green) mid-infrared spectra of a mosquito.
The transmission spectrum was taken using ZnSe windows. Its vertical offset is due to the reflection of a part of the light because of the difficulty in controlling the angle of the cell windows with the mosquito inside. The ATR, diffuse reflectance, and transmission spectra are the result of an average of 16, 120, and 80 scans, respectively.
Figure 11. Mid-infrared absorption spectra of the head and thorax (blue) and abdomen (green) of a sugar-fed, 17-day-old, An. gambiae mosquito.
It was estimated that, with the ATR sampling technique in the mid-IR, light penetrates the sample by 3–10 μm at wavenumbers down to about 1,000 cm⁻¹, and by up to 22 μm between 1,000 and 400 cm⁻¹ (see Estimation of the light penetration distance in a mosquito in Methods). As the cuticle of a mosquito is approximately 2–5 μm thick 22, 70, the measured spectra encompass the outer shell and part of the interior of the insect. As the cuticle is mainly composed of chitin, proteins, and lipids, spectra of these substances were individually compared with whole-mosquito spectra (Figure 12) to assign the main vibrational modes of the mosquito cuticular constituents to each component (Table 2). As the cuticular chemical composition is known to change with species and age 71, 72, so too do the relative magnitudes of these vibrational bands. To quantify this change, 17 wavenumbers in the MIR spectrum were selected, corresponding to 13 well-defined vibrational absorption peaks (contributed in different proportions by the three main constituents) and 4 troughs (which provide information on spectrum intensity and offset). These 17 wavenumbers were then used to train the machine-learning models (see below).
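Turning a full spectrum into the 17-value feature vector can be sketched as below. The assumption is that each spectrum is a one-dimensional absorbance array on a known wavenumber grid; values are linearly interpolated in case the grid does not fall exactly on the selected wavenumbers, and the grid may be in either increasing or decreasing order. The function name is hypothetical.

```python
"""Extract absorbances at the 17 selected wavenumbers listed in Table 2 (sketch)."""
import numpy as np

# 13 peaks plus 4 troughs (3856, 1901, 880 and 401 cm^-1), as in Table 2.
SELECTED_WAVENUMBERS = np.array([3856, 3400, 3276, 2923, 2859, 1901, 1746, 1636, 1539,
                                 1457, 1307, 1154, 1076, 1027, 880, 526, 401],
                                dtype=float)


def extract_features(wavenumbers: np.ndarray, absorbance: np.ndarray) -> np.ndarray:
    """Return the absorbance at each selected wavenumber (17 values)."""
    order = np.argsort(wavenumbers)  # np.interp requires an increasing x-grid
    return np.interp(SELECTED_WAVENUMBERS, wavenumbers[order], absorbance[order])
```

Stacking the output for every mosquito gives an n × 17 matrix, which is the shape of input the classifiers described in Methods work with.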
Figure 12. Typical mid-infrared spectrum of an Anopheles mosquito.
Shown are an An. gambiae mosquito (gravid, 9 days old, blue) and its main chemical constituents: wax (arachidyl dodecanoate, green), chitin (from shrimp shells, red), and protein (collagen from bovine Achilles tendon, pink). The wavenumbers selected for machine learning are indicated with grey lines (Table 2).
Mosquito species determination
To develop a MIRS-based method to determine the age and species of An. gambiae and An. arabiensis, mosquitoes were reared under laboratory conditions (see Mosquito rearing, blood feeding, and processing in Methods) and collected at ages ranging from 1 to 17 days. To capture part of the variability typical of the wild, females in a range of physiological states were included in the analysis: those that had just taken a blood meal (blood fed), those with developed eggs in the abdomen (gravid), and those that had laid eggs but had not yet blood-fed again (sugar fed). Depending on their age, mosquitoes had undergone either single or multiple gonotrophic cycles. In most cases, over 40 mosquitoes per age and physiological condition were analysed for each species (Table 1).
A total of 1,522 An. gambiae and 1,014 An. arabiensis spectra from different ages and physiological conditions were used to train supervised machine-learning models (see Machine-learning analysis in Methods). Five algorithms were tested on the dataset to predict mosquito species (Figure 3A). This initial comparison identified logistic regression (LR) as the most accurate approach. We generated 100 bootstrapped models, trained on and tested against different subsets of the data, which, when aggregated (bagged), predicted the species identity of An. gambiae and An. arabiensis with 76.8% and 76.6% accuracy, respectively (Figure 13A). To increase the accuracy of the prediction while retaining the stability and generalisability afforded by bagging, we selected the 10 best of these models, which achieved 82.6% accuracy (Figure 13B). These results demonstrate that the MIRS signal is indicative of mosquito species and can be used to distinguish between species more quickly and cheaply than standard PCR methods, although currently with lower accuracy.
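The confusion matrices reported here are row-normalised: each row is divided by the number of mosquitoes of that true class, so the diagonal gives the proportion of each species classified correctly. The sketch below shows that computation on a handful of toy labels (AG for An. gambiae, AR for An. arabiensis), normalising by hand so that it does not depend on newer scikit-learn options.

```python
"""Row-normalised confusion matrix and overall accuracy (toy labels only)."""
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array(["AG", "AG", "AG", "AG", "AR", "AR", "AR", "AR", "AR"])
y_pred = np.array(["AG", "AG", "AG", "AR", "AR", "AR", "AG", "AR", "AR"])

labels = ["AG", "AR"]
counts = confusion_matrix(y_true, y_pred, labels=labels)  # rows: true, cols: predicted
proportions = counts / counts.sum(axis=1, keepdims=True)   # per-class proportions
overall_accuracy = np.trace(counts) / counts.sum()

print(proportions)
print(f"overall accuracy: {overall_accuracy:.3f}")
```

Here the per-species accuracies are 0.75 and 0.80 while the overall accuracy is 7/9 (about 0.78): overall accuracy weights each class by its sample size, which is why it can differ from the diagonal values when classes are unbalanced, as with the 1,522 and 1,014 spectra in this study.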
Figure 13. Prediction of mosquito species using mid-infrared spectra.
( A) Violin plots of the distribution of per species prediction accuracies of 100 models trained on different random stratified subsets (70/30 splits) of the data. The red line shows model prediction accuracy under chance alone (i.e. in the absence of learning). ( B) Confusion matrix showing the proportion of accurate (diagonal) classification of mosquitoes as either An. gambiae (AG) or An. arabiensis (AR) using the 10 best logistic regression models (n = 2,536).
Mosquito age determination
After the development of the species-prediction model, a similar supervised machine-learning approach was used to model the chronological age of each mosquito species. Mosquitoes were screened every second day after emerging as adults, and the models were trained on the same set of 17 wavenumbers as above. The LR model again performed best for both species in correctly mapping wavenumber intensities to mosquito age ( Figure 3B and C). To train, optimise, and validate the models, an age-structured validation set was first partitioned from the full dataset and retained for later use in the population models (see below). The remaining samples were then randomly split into stratified 70%/30% training and test sets for model tuning. The accuracy in predicting each chronological age varied over the mosquito lifespan and between species, ranging on average from 15% to 97% for An. gambiae and from 10% to 100% for An. arabiensis ( Figure 14). As in previous studies 40, 43, the chronological age of young and old mosquitoes was generally predicted more accurately than intermediate ages, although there were some differences between species. These results suggest that the MIRS-based approach developed here can predict the chronological age of each species from 1 to 15 days old, and can also provide the confidence of prediction for each age class. Furthermore, we observed a trade-off between the granularity of the prediction and its accuracy: models trained on daily scans (not shown) performed worse than models trained on scans taken 2 or 3 days apart, suggesting that the ageing of the mosquito cuticle varies between individuals and that the features used to train the models overlap between consecutive age classes ( Figure 15).
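The per-age accuracies quoted above correspond to the diagonal of a row-normalised confusion matrix, as shown in Figure 14B and D. A minimal sketch of that calculation in plain Python follows; the function names are hypothetical and the code is not taken from the authors' repository.

```python
from collections import Counter

def normalised_confusion(true_labels, predicted_labels, classes):
    """Row-normalised confusion matrix.

    Entry [i][j] is the proportion of mosquitoes whose true age class is classes[i]
    that were predicted as classes[j]. Predictions outside `classes` are ignored, and
    a row with no mosquitoes is left as zeros.
    """
    counts = Counter(zip(true_labels, predicted_labels))
    matrix = []
    for t in classes:
        row = [counts[(t, p)] for p in classes]
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

def per_class_accuracy(matrix, classes):
    """Diagonal of the row-normalised matrix: the accuracy for each age class."""
    return {c: matrix[i][i] for i, c in enumerate(classes)}

# Toy example with three age classes (days)
ages = [1, 3, 5]
true_ages = [1, 1, 3, 3, 5, 5]
pred_ages = [1, 1, 3, 5, 5, 3]
m = normalised_confusion(true_ages, pred_ages, ages)
print(per_class_accuracy(m, ages))  # {1: 1.0, 3: 0.5, 5: 0.5}
```

Off-diagonal mass concentrated next to the diagonal, as in the toy rows for days 3 and 5, is the pattern expected when features overlap between consecutive age classes.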
Figure 14.
Prediction of An. gambiae ( A– B) and An. arabiensis ( C– D) age class using mid-infrared spectra. ( A, C) Violin plots of the distribution of per age class prediction accuracies of 100 optimised models. ( B, D) Confusion matrices showing the proportion of accurate (diagonal) classification of mosquitoes as either 1, 3, 5, 7, 9, 11, or 15 days old using the 10 best logistic regression models trained on repeated stratified random subsets using 70% of all mosquitoes sampled, and tested on the remaining 30% (n = 681 for An. gambiae and n = 737 for An. arabiensis).
Figure 15. Box-and-whisker plot of the measured absorbance at each wavenumber and for each age, for all mosquitoes ( An. arabiensis and An. gambiae).
The orange lines represent the median absorbance of each age and wavenumber, the limits of the boxes correspond to the interquartile range (IQR) and the whiskers show the lowest datum still within 1.5 IQR of the lower quartile, and the highest datum still within 1.5 IQR of the upper quartile.
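The box and whisker definitions in the Figure 15 legend can be made concrete with a short sketch using only the Python standard library. We assume the quartiles are computed by linear interpolation between data points (`method="inclusive"` in `statistics.quantiles`, which we believe matches NumPy's default percentile method used by common plotting libraries); this is an assumption about how the figure was drawn, and the function name is hypothetical.

```python
import statistics

def tukey_box_stats(values, whis=1.5):
    """Median, quartiles, whiskers and outliers as described in the Figure 15 legend.

    Requires at least two values (a statistics.quantiles requirement).
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),   # lowest datum still within 1.5 IQR of Q1
        "whisker_high": max(inside),  # highest datum still within 1.5 IQR of Q3
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

# e.g. absorbance values at one wavenumber for one age class (toy numbers)
print(tukey_box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

For the toy data the quartiles are 3.25 and 7.75, so the fences sit at -3.5 and 14.5, the whiskers end at 1 and 9, and 100 is drawn as an outlier.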
Predicting mosquito age structure
To monitor the efficacy of vector control interventions in the field, accurately describing the age distribution ( i.e., the summary demographic age structure of the local vector species population) is more important than knowing the age of any individual mosquito 46. Consequently, we tested how well the above age prediction models developed for An. gambiae and An. arabiensis could reconstruct the known age distribution of mosquito populations. Mosquito populations reflecting anticipated changes in mortality were simulated under two scenarios: natural mortality, and increased mortality due to a hypothetical vector control intervention.
Unlike our training dataset, but as expected of natural mosquito populations, field sampling would not yield age-balanced samples, but rather progressively smaller samples in older age classes. Furthermore, it would be highly desirable to use our models to measure the impact of vector control interventions on mosquito population age structures. However, because no datasets describing the true age structure of a mosquito population exist, the age structures of An. gambiae and An. arabiensis were modelled based on their reported average daily mortality 16, 58, assuming an intervention that increased the mortality of adult females four-fold after a first blood meal (~3 days after adult emergence).
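Written out, the population model described in this section is a simple geometric recurrence. In the sketch below, $p$ is the daily survival rate given in the Figure 16 legend (0.91 for An. gambiae, 0.82 for An. arabiensis). Counting days from emergence ($t = 0$) and applying the four-fold mortality to every day after day 3 are our reading of "after a first blood meal (~3 days after adult emergence)", not details confirmed by the text.

```latex
\begin{aligned}
N_0 &= 1000, \qquad N_t = s_t \, N_{t-1} \quad (t = 1, 2, \dots),\\
s_t^{\mathrm{untreated}} &= p \;\Longrightarrow\; N_t = N_0 \, p^{t},\\
s_t^{\mathrm{treated}} &=
\begin{cases}
  p, & t \le 3,\\
  1 - 4(1 - p), & t > 3.
\end{cases}
\end{aligned}
```

With $p = 0.91$ the post-intervention daily survival is $1 - 4(0.09) = 0.64$; with $p = 0.82$ it is $1 - 4(0.18) = 0.28$, so the An. arabiensis population declines much faster once the intervention takes effect.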
In these simulations, a starting population of 1,000 female mosquitoes was used, and the population on each subsequent day was calculated as a proportion of the previous day's population, using survival rates for each species estimated from field studies ( Figure 16A,D) 16, 58. The resulting age-structured populations were then used to randomly sample mosquitoes, in the corresponding proportions, for each age class used in our MIRS-based prediction models ( Figure 16B–F, grey bars; n = 122 for An. gambiae and n = 42 for An. arabiensis). The models trained above were then used to predict age classes from the MIRS of this age-structured sample ( Figure 16B–F, orange bars).
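The simulation and sampling steps can be sketched in dependency-free Python. This is an illustration, not the authors' implementation: the function names, the pool of held-out spectra (`spectra_by_age`, a hypothetical `{age: [spectrum, ...]}` mapping), day 0 as emergence and the "every day after `intervention_day`" rule are all assumptions. Because per-class sizes are rounded, they may not sum exactly to the requested total.

```python
import random

def simulate_population(p, days, n0=1000.0, intervention_day=None, mortality_factor=4.0):
    """Expected number of females alive on days 0..days (day 0 = emergence, assumed).

    Daily survival is p; if intervention_day is given, daily mortality (1 - p) is
    multiplied by mortality_factor on every day after intervention_day.
    """
    counts = [n0]
    for t in range(1, days + 1):
        s = p
        if intervention_day is not None and t > intervention_day:
            s = max(0.0, 1.0 - mortality_factor * (1.0 - p))
        counts.append(counts[-1] * s)
    return counts

def class_sample_sizes(counts, age_classes, total):
    """Split a survey of `total` mosquitoes across age classes in proportion to counts."""
    weights = [counts[a] for a in age_classes]
    z = sum(weights)
    return {a: round(total * w / z) for a, w in zip(age_classes, weights)}

def draw_survey(spectra_by_age, sizes, seed=0):
    """Draw, without replacement, the requested number of spectra for each age class."""
    rng = random.Random(seed)
    survey = []
    for age, k in sizes.items():
        pool = spectra_by_age[age]
        survey.extend((age, s) for s in rng.sample(pool, min(k, len(pool))))
    return survey

# Untreated vs treated An. gambiae (p = 0.91), sampled at odd-day age classes
untreated = simulate_population(0.91, 15)
treated = simulate_population(0.91, 15, intervention_day=3)
print(class_sample_sizes(untreated, [1, 3, 5, 7, 9, 11, 15], 130))
print(class_sample_sizes(treated, [1, 3, 5, 7, 9, 11, 15], 122))
```

The survey returned by `draw_survey` would then be passed to the trained age models, and the predicted class frequencies compared with the sampled ones.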
Figure 16. Reconstruction of the age structure of simulated populations of An. gambiae and An. arabiensis mosquitoes sampled from simulated pre- and post-treatment populations.
Population age structures of An. gambiae ( A– C) and An. arabiensis ( D– F) were generated using an age-structured population model assuming survival rates of 0.91 ( An. gambiae, A) or 0.82 ( An. arabiensis, D), under two common scenarios: naive untreated populations (blue lines), and populations in which a simulated vector control program resulted in 4x daily mortality of mosquitoes after day 3 (see Age-structure modelling in Methods for details). The proportions of each age class were extracted from those simulated populations ( A, D) and used to build datasets that are representative of a field-sampled population survey (grey bars in B, C, E, and F). The resulting age-structured dataset was then used as the test set for our age-predicting machine-learning models (see Figure 3) and compared with the predicted age structure generated from those models (orange bars in B, C, E, and F). Finally, we fitted a continuous probability distribution to the true (grey curve) and predicted (orange curve) age structures to better generalise our discrete model predictions to an exponentially decreasing age structure. Population distributions were compared using a 2-sample Kolmogorov-Smirnov test (KS_2samp), reported in the y-axis labels. A - Relative proportion of each age class in a simulated population of An. gambiae. B - Estimation of age structure of simulated population from ( A) using best models from Figure 3B for An. gambiae (n = 130). C - Estimation of age structure of simulated population post-intervention from ( A) using best models from Figure 3B for An. gambiae (n = 122). D - Relative proportion of each age class in a simulated population of An. arabiensis. E - Estimation of age structure of simulated population from ( D) using best models from Figure 3D for An. arabiensis (n = 42). F - Estimation of age structure of simulated population post-intervention from ( D) using best models from Figure 3D for An. arabiensis (n = 45).
To test the ability of those models to reconstruct the age structure of the true population from our predicted age-class frequencies, the age structures of the predicted ( Figure 16B–F, orange bars) and true sampled populations ( Figure 16B–F, grey bars) were modelled with the best-fit half-logistic distribution for each species (grey and orange curves in Figure 16B,C and E,F; see also Age-structure modelling in Methods). The true and predicted age distributions were statistically indistinguishable (Kolmogorov-Smirnov 2-sample test (KS test), p = 1 and p = 0.99 for An. gambiae pre- and post-intervention, respectively; p = 0.75 and p = 0.30 for An. arabiensis pre- and post-intervention, respectively). This shows that the algorithm can reconstruct the age structure with good accuracy. Furthermore, our models detected a shift in mosquito age structure consistent with the simulated impacts of the interventions (sampled from true population: KS test p < 0.0001 for An. gambiae and p = 0.004 for An. arabiensis; predicted population: p < 0.0001 for An. gambiae and p = 0.1 for An. arabiensis), suggesting that this MIRS-based approach holds promise for robust measurement and estimation of the age structure of mosquito vector populations.
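The comparison above (fit a half-logistic distribution to an age sample, then compare two age samples with a two-sample KS test) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: the half-logistic location is fixed at 0 and its scale is found by a simple grid search, and the p-value uses the asymptotic Kolmogorov distribution, whereas library routines such as `scipy.stats.ks_2samp` can compute exact p-values for small samples (and `scipy.stats.halflogistic` provides the distribution itself). All function names are hypothetical.

```python
import math

def half_logistic_logpdf(x, scale):
    """Log-density of a half-logistic distribution with location 0 (x >= 0)."""
    z = x / scale
    return math.log(2.0 / scale) - z - 2.0 * math.log1p(math.exp(-z))

def fit_half_logistic_scale(ages, lo=0.1, hi=100.0, steps=2000):
    """Maximum-likelihood scale, found by brute-force search over a log-spaced grid."""
    best_scale, best_ll = lo, -math.inf
    for i in range(steps + 1):
        scale = lo * (hi / lo) ** (i / steps)
        ll = sum(half_logistic_logpdf(a, scale) for a in ages)
        if ll > best_ll:
            best_scale, best_ll = scale, ll
    return best_scale

def ks_2samp_stat(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs.

    Tied values (common with discrete age classes) are stepped over together.
    """
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def ks_asymptotic_pvalue(d, n, m, terms=100):
    """Asymptotic (Kolmogorov-distribution) p-value; only approximate for small samples."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here and the true value is ~1
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2) for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))
```

Given a list of true sampled ages and a list of predicted ages, `ks_asymptotic_pvalue(ks_2samp_stat(true_ages, predicted_ages), len(true_ages), len(predicted_ages))` returns a p-value in which large values indicate no detectable difference between the two age distributions.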
Discussion
We developed a straightforward, inexpensive, and rapid method to determine the age and species of large numbers of An. gambiae s.l. mosquitoes ( An. gambiae s.s. and An. arabiensis). Based on supervised machine-learning analysis of their mid-infrared spectra, this method facilitates prediction of mosquito species distribution and survival, two tasks critical to implementing and assessing malaria control strategies. An advantage of this approach over the most widely used current technique, which is based on dissection, is that it can determine the whole age distribution of a mosquito population from the day of emergence up to two weeks of age. Although the accuracy of age prediction in the “mid-range” of the mosquito lifespan was not high, by determining the age structure of a population this method could accurately estimate the proportion of mosquitoes within the older, most epidemiologically important age classes that are responsible for malaria transmission.
The use of the mid-infrared spectral region provides some advantages over techniques using the near-infrared. Foremost, the amounts of different biochemical components can be quantified independently, because their vibrational bands appear at different wavenumbers. Furthermore, MIRS bands are more intense and much better defined. In contrast, the near-infrared spectrum of a mosquito is composed of a few weak signals ( Figure 8) that are typically dominated by the much stronger vibrational overtone and combination bands of water ( Figure 9) 46, whose contribution is likely to depend more on the mosquito's physiological state and environmental conditions than on other mosquito traits, such as species and age.
We have shown that the variation of MIR spectra with mosquito age can be exploited by a machine-learning algorithm to predict chronological age, and ultimately to reconstruct the population age structures of two important malaria vector species under simulated conditions of changing mortality risk due to vector control. Our algorithms accurately reconstructed the age structures of both An. arabiensis and An. gambiae, and also detected shifts in mosquito age structure consistent with the simulated impacts of interventions. The ability of this technique to predict the age structure of a population suggests that it could constitute an efficient tool for monitoring the efficacy of vector control interventions. Future work will use larger training datasets for supervised machine learning, including field samples from different ecological conditions. The ecological variability of field samples has limited the use of NIRS for age prediction in wild mosquito populations 45. While the accuracy of MIRS-based approaches may also decline when moving from laboratory-reared to field mosquitoes, we predict that this method will be more robust because of the specific information content and high signal clarity of MIRS spectra. Additional improvements are anticipated from increasing the size and variability of the training set on which mosquito age predictions are validated. This will also facilitate the use of alternative machine-learning techniques such as neural networks 73, which may yield even higher accuracy and repeatability.
We have shown that MIRS can discriminate between morphologically identical An. gambiae s.l. species with ~83% accuracy. While the accuracy of MIRS species prediction is not yet comparable to that of PCR, further work including a larger training set and field samples is expected to increase it. In addition, the inclusion of other species of the An. gambiae s.l. complex will be necessary before this technique can be applied in the field. However, these laboratory-based results, which included mosquitoes of different ages, physiological conditions, and cohorts, suggest that, despite variation in ecological and life-history traits, MIR spectra contain a species-specific signature that the machine-learning algorithm can detect. Indeed, mass-spectrometry studies have shown that different species in the An. gambiae s.l. complex differ quantitatively in the hydrocarbon composition of their cuticle 72, which will affect the MIR spectra.
The biochemical signature obtained by MIRS from the mosquito cuticle provided information on both mosquito species and age. It may therefore be possible to obtain further information on other mosquito traits that alter cuticular composition. Recently, a new insecticide resistance mechanism has been discovered in An. gambiae, which relies on increased cuticle thickness that in turn reduces insecticide uptake 73. While this mechanism has been detected by electron microscopy, there are no other methods to measure this new trait, which could have profound epidemiological consequences. In the future, MIRS calibrations that include mosquitoes with cuticular resistance may be able to identify this insecticide-resistance trait. In addition, infection with the Plasmodium malaria parasite might be detectable by MIRS. Pathogen infection is known to alter mosquito physiology and could directly or indirectly modify cuticular composition. For example, infrared spectroscopy methods have recently been developed to detect Zika virus 47 and the bacterial endosymbiont Wolbachia 52, 67 in Aedes aegypti, the vector of dengue and Zika, as well as malaria infection in mosquitoes 41, 74.
The accuracy, speed, and generalisability of the MIRS approach presented here show that this tool holds promise for evaluating vector control interventions and as a triage method when large numbers of specimens (>500–1,000) need to be processed rapidly. The inclusion of new species, larger sample sizes, and field samples from variable ecological conditions is a prerequisite for applying this technique. It is worth noting that a portable FTIR MIR spectrometer costs ~$20,000–25,000, which is comparable to the quantitative PCR machines used for species determination and/or insecticide resistance monitoring. However, unlike PCR analysis, no additional ongoing reagent or running costs are incurred once the core equipment is installed. Thus, this approach could be particularly valuable in resource-limited settings.
The MIRS method presented here provides rapid and accurate information on Anopheles species (82.6% accuracy) and reliably characterises mosquito age distributions. However, these results were obtained by training machine-learning models on a relatively modest number of mosquitoes (2,536). In future work, it will be possible to generate much larger MIRS datasets and thus train more sophisticated predictive models. Such larger datasets will lend themselves to analysis by more powerful “big data” approaches, including deep learning methods, which would be expected to improve accuracy considerably beyond this proof-of-principle study. Furthermore, the technique applied to malaria vectors here could also be extended to the vectors of other diseases such as Zika, dengue, Lyme disease, leishmaniasis, or filariasis. In light of these opportunities, we recommend that this method be prioritised for further evaluation.
Data availability
Underlying data
Enlighten: Research Data: Prediction of malaria mosquito species and population age structure using mid-infrared spectroscopy and supervised machine learning. https://doi.org/10.5525/gla.researchdata.688 68
This project contains the following underlying data:
DataMosquitoes.zip (zip file containing underlying spectra data)
Software availability
Source code: https://github.com/SimonAB/Gonzalez-Jimenez_MIRS/tree/v1.0
Archived source code: http://doi.org/10.5281/zenodo.2609356 75
Licence: GNU General Public License v3.0
Acknowledgements
We would like to thank Dorothy Armstrong and Elizabeth Peat for assistance with mosquito rearing and maintenance. We would also like to thank Hilary Ranson for providing the Kisumu colony.
Funding Statement
This work was supported by the Wellcome Trust through an Intermediate Fellowship in Public Health and Tropical Medicine to FO [102350]. This work was also supported by The Engineering and Physical Sciences Research Council (EPSRC) [EP/J009733/1, EP/K034995/1, EP/N508792/1, and EP/N007417/1] and Medical Research Council (MRC) [MR/P025501/1]. FB is supported by an AXA Research Fund fellowship [14-AXA-PDOC-130] and a European Molecular Biology Organization (EMBO) Long Term fellowship [43-2014]. MV is funded under the MRC/Department for International Development Concordat agreement, which is part of the EU EDCTP2 programme [MR/N015320/1].
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
[version 3; peer review: 2 approved]
References
- 1. Bhatt S, Weiss DJ, Cameron E, et al. : The effect of malaria control on Plasmodium falciparum in Africa between 2000 and 2015. Nature. 2015;526(7572):207–211. 10.1038/nature15535
- 2. WHO: World Malaria Report. 2017.
- 3. Hemingway J, Ranson H, Magill A, et al. : Averting a malaria disaster: will insecticide resistance derail malaria control? Lancet. 2016;387(10029):1785–1788. 10.1016/S0140-6736(15)00417-1
- 4. Protopopoff N, Mosha JF, Lukole E, et al. : Effectiveness of a long-lasting piperonyl butoxide-treated insecticidal net and indoor residual spray interventions, separately and together, against malaria transmitted by pyrethroid-resistant mosquitoes: a cluster, randomised controlled, two-by-two factorial design trial. Lancet. 2018;391(10130):1577–1588. 10.1016/S0140-6736(18)30427-6
- 5. Hawley WA, Phillips-Howard PA, ter Kuile FO, et al. : Community-wide effects of permethrin-treated bed nets on child mortality and malaria morbidity in western Kenya. Am J Trop Med Hyg. 2003;68(4 Suppl):121–127. 10.4269/ajtmh.2003.68.121
- 6. Pates H, Curtis C: Mosquito behavior and vector control. Annu Rev Entomol. 2005;50:53–70. 10.1146/annurev.ento.50.071803.130439
- 7. Viana M, Hughes A, Matthiopoulos J, et al. : Delayed mortality effects cut the malaria transmission potential of insecticide-resistant mosquitoes. Proc Natl Acad Sci U S A. 2016;113(32):8975–8980. 10.1073/pnas.1603431113
- 8. Beier JC: Malaria parasite development in mosquitoes. Annu Rev Entomol. 1998;43:519–43. 10.1146/annurev.ento.43.1.519
- 9. Ohm JR, Baldini F, Barreaux P, et al. : Rethinking the extrinsic incubation period of malaria parasites. Parasit Vectors. 2018;11(1):178. 10.1186/s13071-018-2761-4
- 10. Smith DL, McKenzie FE: Statics and dynamics of malaria infection in Anopheles mosquitoes. Malar J. 2004;3:13. 10.1186/1475-2875-3-13
- 11. Brady OJ, Godfray HC, Tatem AJ, et al. : Vectorial capacity and vector control: reconsidering sensitivity to parameters for malaria elimination. Trans R Soc Trop Med Hyg. 2016;110(2):107–117. 10.1093/trstmh/trv113
- 12. Gillies MT, Wilkes TJ: A study of the age-composition of populations of Anopheles gambiae Giles and A. funestus Giles in North-Eastern Tanzania. Bull Entomol Res. 1965;56(2):237–262. 10.1017/S0007485300056339
- 13. Macdonald G: Epidemiological basis of malaria control. Bull World Health Organ. 1956;15(3–5):613–626.
- 14. Macdonald G: The Epidemiology and Control of Malaria. Oxford University Press, 1957.
- 15. Detinova TS: Age-grouping methods in Diptera of medical importance with special reference to some vectors of malaria. Monogr Ser World Health Organ. 1962;47:13–191.
- 16. Charlwood JD, Smith T, Billingsley PF, et al. : Survival and infection probabilities of anthropophagic anophelines from an area of high prevalence of Plasmodium falciparum in humans. Bull Entomol Res. 1997;87(5):445–453. 10.1017/S0007485300041304
- 17. Polovodova VP: Age changes in ovaries of Anopheles and methods of determination of age composition in mosquito populations. Med Parazitol i Parazit Bolezn. 1941;10(9):387–395.
- 18. Yakob L, Yan G: Modeling the effects of integrating larval habitat source reduction and insecticide treated nets for malaria control. PLoS One. 2009;4(9):e6921. 10.1371/journal.pone.0006921
- 19. Anagonou R, Agossa F, Azondékon R, et al. : Application of Polovodova's method for the determination of physiological age and relationship between the level of parity and infectivity of Plasmodium falciparum in Anopheles gambiae s.s, south-eastern Benin. Parasit Vectors. 2015;8:117. 10.1186/s13071-015-0731-7
- 20. Hoc TQ, Charlwood JD: Age determination of Aedes cantans using the ovarian oil injection technique. Med Vet Entomol. 1990;4(2):227–33. 10.1111/j.1365-2915.1990.tb00281.x
- 21. Joy TK, Jeffrey Gutierrez EH, Ernst K, et al. : Aging field collected Aedes aegypti to determine their capacity for dengue transmission in the southwestern United States. PLoS One. 2012;7(10):e46946. 10.1371/journal.pone.0046946
- 22. Schlein Y, Gratz NG: Determination of the age of some anopheline mosquitos by daily growth layers of skeletal apodemes. Bull World Health Organ. 1973;49(4):371–375.
- 23. Gerade BB, Lee SH, Scott TW, et al. : Field validation of Aedes aegypti (Diptera: Culicidae) age estimation by analysis of cuticular hydrocarbons. J Med Entomol. 2004;41(2):231–238. 10.1603/0022-2585-41.2.231
- 24. Wu D, Lehane MJ: Pteridine fluorescence for age determination of Anopheles mosquitoes. Med Vet Entomol. 1999;13(1):48–52. 10.1046/j.1365-2915.1999.00144.x
- 25. Cook PE, Hugo LE, Iturbe-Ormaetxe I, et al. : The use of transcriptional profiles to predict adult mosquito age under field conditions. Proc Natl Acad Sci U S A. 2006;103(48):18060–18065. 10.1073/pnas.0604875103
- 26. Sikulu MT, Monkman J, Dave KA, et al. : Mass spectrometry identification of age-associated proteins from the malaria mosquitoes Anopheles gambiae s.s. and Anopheles stephensi. Data Brief. 2015;4:461–467. 10.1016/j.dib.2015.07.007
- 27. Sinka ME, Bangs MJ, Manguin S, et al. : The dominant Anopheles vectors of human malaria in the Asia-Pacific region: occurrence data, distribution maps and bionomic précis. Parasit Vectors. 2011;4:89. 10.1186/1756-3305-4-89
- 28. Koekemoer LL, Kamau L, Hunt RH, et al. : A cocktail polymerase chain reaction assay to identify members of the Anopheles funestus (Diptera: Culicidae) group. Am J Trop Med Hyg. 2002;66(6):804–811. 10.4269/ajtmh.2002.66.804
- 29. Cohuet A, Simard F, Toto JC, et al. : Species identification within the Anopheles funestus group of malaria vectors in Cameroon and evidence for a new species. Am J Trop Med Hyg. 2003;69(2):200–205. 10.4269/ajtmh.2003.69.200
- 30. Santolamazza F, Mancini E, Simard F, et al. : Insertion polymorphisms of SINE200 retrotransposons within speciation islands of Anopheles gambiae molecular forms. Malar J. 2008;7:163. 10.1186/1475-2875-7-163
- 31. Braack L, Hunt R, Koekemoer LL, et al. : Biting behaviour of African malaria vectors: 1. where do the main vector species bite on the human body? Parasit Vectors. 2015;8:76. 10.1186/s13071-015-0677-9
- 32. Lyimo IN, Ferguson HM: Ecological and evolutionary determinants of host species choice in mosquito vectors. Trends Parasitol. 2009;25(4):189–196. 10.1016/j.pt.2009.01.005
- 33. Lehmann T, Diabate A: The molecular forms of Anopheles gambiae: a phenotypic perspective. Infect Genet Evol. 2008;8(5):737–746. 10.1016/j.meegid.2008.06.003
- 34. Bass C, Williamson MS, Wilding CS, et al. : Identification of the main malaria vectors in the Anopheles gambiae species complex using a TaqMan real-time PCR assay. Malar J. 2007;6:155. 10.1186/1475-2875-6-155
- 35. Favia G, Lanfrancotti A, Spanos L, et al. : Molecular characterization of ribosomal DNA polymorphisms discriminating among chromosomal forms of Anopheles gambiae s.s. Insect Mol Biol. 2001;10(1):19–23. 10.1046/j.1365-2583.2001.00236.x
- 36. Fanello C, Santolamazza F, Della Torre A: Simultaneous identification of species and molecular forms of the Anopheles gambiae complex by PCR-RFLP. Med Vet Entomol. 2002;16(4):461–464. 10.1046/j.1365-2915.2002.00393.x
- 37. Cooseman M, Smits A, Roelants P: Intraspecific isozyme polymorphism of Anopheles gambiae in relation to environment, behavior, and malaria transmission in southwestern Burkina Faso. Am J Trop Med Hyg. 1998;58(1):70–74. 10.4269/ajtmh.1998.58.70
- 38. Al Ahmed AM, Badjah-Hadj-Ahmed AY, Al Othman ZA, et al. : Identification of wild collected mosquito vectors of diseases using gas chromatography-mass spectrometry in Jazan Province, Saudi Arabia. J Mass Spectrom. 2013;48(11):1170–1177. 10.1002/jms.3282
- 39. Pickering CL, Hands JR, Fullwood LM, et al. : Rapid discrimination of maggots utilising ATR-FTIR spectroscopy. Forensic Sci Int. 2015;249:189–196. 10.1016/j.forsciint.2015.01.036
- 40. Mayagaya VS, Michel K, Benedict MQ, et al. : Non-destructive determination of age and species of Anopheles gambiae s.l. using near-infrared spectroscopy. Am J Trop Med Hyg. 2009;81(4):622–630. 10.4269/ajtmh.2009.09-0192
- 41. Barbosa TM, de Lima LAS, Dos Santos MCD, et al. : A novel use of infra-red spectroscopy (NIRS and ATR-FTIR) coupled with variable selection algorithms for the identification of insect species (Diptera: Sarcophagidae) of medico-legal relevance. Acta Trop. 2018;185:1–12. 10.1016/j.actatropica.2018.04.025
- 42. Perez-Mendoza J, Dowell FE, Broce AB, et al. : Chronological age-grading of house flies by using near-infrared spectroscopy. J Med Entomol. 2002;39(3):499–508. 10.1603/0022-2585-39.3.499
- 43. Sikulu M, Killeen GF, Hugo LE, et al. : Near-infrared spectroscopy as a complementary age grading and species identification tool for African malaria vectors. Parasit Vectors. 2010;3:49. 10.1186/1756-3305-3-49
- 44. Ntamatungiro AJ, Mayagaya VS, Rieben S, et al. : The influence of physiological status on age prediction of Anopheles arabiensis using near infra-red spectroscopy. Parasit Vectors. 2013;6(1):298. 10.1186/1756-3305-6-298
- 45. Krajacich BJ, Meyers JI, Alout H, et al. : Analysis of near infrared spectra for age-grading of wild populations of Anopheles gambiae. Parasit Vectors. 2017;10(1):552. 10.1186/s13071-017-2501-1
- 46. Lambert B, Sikulu-Lord MT, Mayagaya VS, et al. : Monitoring the Age of Mosquito Populations Using Near-Infrared Spectroscopy. Sci Rep. 2018;8(1):5274. 10.1038/s41598-018-22712-z
- 47. Fernandes JN, Dos Santos LMB, Chouin-Carneiro T, et al. : Rapid, noninvasive detection of Zika virus in Aedes aegypti mosquitoes by near-infrared spectroscopy. Sci Adv. 2018;4(5):eaat0496. 10.1126/sciadv.aat0496
- 48. Esperança PM, Blagborough AM, Da DF, et al. : Detection of Plasmodium berghei infected Anopheles stephensi using near-infrared spectroscopy. Parasit Vectors. 2018;11(1):377. 10.1186/s13071-018-2960-z
- 49. Lin-Vien D, Colthup NB, Fateley WG, et al. : The Handbook of Infrared and Raman Characteristic Frequencies of Organic Molecules. Academic Press Inc., 1991.
- 50. Deng BC, Yun YH, Liang YZ, et al. : A new strategy to prevent over-fitting in partial least squares models based on model population analysis. Anal Chim Acta. 2015;880:32–41. 10.1016/j.aca.2015.04.045
- 51. Peiris KH, Drolet BS, Cohnstaedt LW, et al. : Infrared Absorption Characteristics of Culicoides sonorensis in Relation to Insect Age. Am J Agric Sci Technol. 2014;2(2):49–61. 10.7726/ajast.2014.1006
- 52. Khoshmanesh A, Christensen D, Perez-Guaita D, et al. : Screening of Wolbachia Endosymbiont Infection in Aedes aegypti Mosquitoes Using Attenuated Total Reflection Mid-Infrared Spectroscopy. Anal Chem. 2017;89(10):5285–5293. 10.1021/acs.analchem.6b04827
- 53. Hastie T, Tibshirani R, Friedman J: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2009. 10.1007/978-0-387-84858-7
- 54. Babayan SA, Sinclair A, Duprez JS, et al. : Chronic helminth infection burden differentially affects haematopoietic cell development while ageing selectively impairs adaptive responses to infection. Sci Rep. 2018;8(1):3802. 10.1038/s41598-018-22083-5
- 55. Babayan SA, Liu W, Hamilton G, et al. : The Immune and Non-Immune Pathways That Drive Chronic Gastrointestinal Helminth Burdens in the Wild. Front Immunol. 2018;9:56. 10.3389/fimmu.2018.00056
- 56. Borchers MR, Chang YM, Proudfoot KL, et al. : Machine-learning-based calving prediction from activity, lying, and ruminating behaviors in dairy cattle. J Dairy Sci. 2017;100(7):5664–5674. 10.3168/jds.2016-11526
- 57. Lyimo IN, Haydon DT, Russell TL, et al. : The impact of host species and vector control measures on the fitness of African malaria vectors. Proc Biol Sci. 2013;280(1754):20122823. 10.1098/rspb.2012.2823
- 58. Molineaux L, Gramiccia G: The Garki Project. Research on the epidemiology and control of malaria in the Sudan savanna of West Africa. World Health Organization, 1980.
- 59. Bennett S: Log-Logistic Regression Models for Survival Data. Appl Stat. 1983;32(2):165. 10.2307/2347295
- 60. Collett D: Modelling Survival Data in Medical Research. CRC Press, 2003.
- 61. Bruker Optik GmbH: Platinum ATR Unit A 225 User Instructions. 2011.
- 62. Sellmeier W: Zur Erklärung der abnormen Farbenfolge im Spectrum einiger Substanzen. Ann der Phys und Chemie. 1871;219(6):272–282. 10.1002/andp.18712190612
- 63. Thomas ME, Tropf WJ: Optical Properties of Diamond. In: Klocek P, ed. Proceedings of SPIE. 1994;144–151. 10.1117/12.187336
- 64. Leertouwer HL, Wilts BD, Stavenga DG: Refractive index and dispersion of butterfly chitin and bird keratin measured by polarizing interference microscopy. Opt Express. 2011;19(24):24061–6. 10.1364/OE.19.024061
- 65. Dowell FE, Noutcha AE, Michel K: Short report: The effect of preservation methods on predicting mosquito age by near infrared spectroscopy. Am J Trop Med Hyg. 2011;85(6):1093–6. 10.4269/ajtmh.2011.11-0438
- 66. Gray EM, Bradley TJ: Physiology of desiccation resistance in Anopheles gambiae and Anopheles arabiensis. Am J Trop Med Hyg. 2005;73(3):553–9. 10.4269/ajtmh.2005.73.553
- 67. Sikulu-Lord MT, Milali MP, Henry M, et al. : Near-Infrared Spectroscopy, a Rapid Method for Predicting the Age of Male and Female Wild-Type and Wolbachia Infected Aedes aegypti. PLoS Negl Trop Dis. 2016;10(10):e0005040. 10.1371/journal.pntd.0005040
- 68. Gonzalez Jimenez M, Babayan S, Khazaeli P, et al. : Prediction of malaria mosquito species and population age structure using mid-infrared spectroscopy and supervised machine learning. University of Glasgow; 2018. 10.5525/gla.researchdata.688
- 69. Cárdenas G, Cabrera G, Taboada E, et al. : Chitin characterization by SEM, FTIR, XRD, and 13C cross polarization/mass angle spinning NMR. J Appl Polym Sci. 2004;93(4):1876–1885. 10.1002/app.20647
- 70. Yahouédo GA, Chandre F, Rossignol M, et al. : Contributions of cuticle permeability and enzyme detoxification to pyrethroid resistance in the major malaria vector Anopheles gambiae. Sci Rep. 2017;7(1):11091. 10.1038/s41598-017-11357-z
- 71. Suarez E, Nguyen HP, Ortiz IP, et al. : Matrix-assisted laser desorption/ionization-mass spectrometry of cuticular lipid profiles can differentiate sex, age, and mating status of Anopheles gambiae mosquitoes. Anal Chim Acta. 2011;706(1):157–163. 10.1016/j.aca.2011.08.033
- 72. Wood O, Hanrahan S, Coetzee M, et al. : Cuticle thickening associated with pyrethroid resistance in the major malaria vector Anopheles funestus. Parasit Vectors. 2010;3(1):67. 10.1186/1756-3305-3-67
- 73. LeCun Y, Bengio Y, Hinton G: Deep learning. Nature. 2015;521(7553):436–444. 10.1038/nature14539
- 74. Maia MF, Kapulu M, Muthui M, et al. : Detection of Plasmodium falciparum infected Anopheles gambiae using near-infrared spectroscopy. Malar J. 2019;18(1):85. 10.1186/s12936-019-2719-9
- 75. Babayan S, Gonzalez M: SimonAB/Gonzalez-Jimenez_MIRS: First public release (Version v1.0). Zenodo. 2019. 10.5281/zenodo.2609356