Abstract
Arthropods play a dominant role in natural and human-modified terrestrial ecosystem dynamics. Spatially-explicit arthropod population time-series data are crucial for statistical or mathematical models of these dynamics and assessment of their veterinary, medical, agricultural, and ecological impacts. Such data have been collected world-wide for over a century, but remain scattered and largely inaccessible. In particular, with the ever-present and growing threat of arthropod pests and vectors of infectious diseases, there are numerous historical and ongoing surveillance efforts, but the data are not reported in consistent formats and typically lack sufficient metadata to make reuse and re-analysis possible. Here, we present the first-ever minimum information standard for arthropod abundance, Minimum Information for Reusable Arthropod Abundance Data (MIReAD). Developed with broad stakeholder collaboration, it balances sufficiency for reuse with the practicality of preparing the data for submission. It is designed to optimize data (re)usability from the “FAIR,” (Findable, Accessible, Interoperable, and Reusable) principles of public data archiving (PDA). This standard will facilitate data unification across research initiatives and communities dedicated to surveillance for detection and control of vector-borne diseases and pests.
Subject terms: Population dynamics, Ecological epidemiology, Research data
Introduction
Arthropods play a dominant role in the dynamics of practically all natural and human-modified terrestrial ecosystems1–3 and have significant economic and health effects. For example, certain insects provide significant economic benefits (e.g. pollination) exceeding $57 billion a year to the United States alone4. Invasive insects, however, cost an estimated $70 billion per year globally5, and insect pests may reduce agricultural harvests by up to 16%, with an equal amount of further losses of harvested goods6. Particularly noteworthy is a subset of arthropods that are disease vectors, transmitting pathogens to and between animals as well as plants. Vector-borne diseases cause billions of dollars in crop and livestock losses every year7–9. In humans, vector-borne diseases account for more than 17% of all infectious diseases (e.g. malaria, Chagas disease, dengue, leishmaniasis, Zika, West Nile, Lyme disease, and sleeping sickness), with hundreds of thousands of deaths, hundreds of millions of cases, and billions of people at risk annually10,11.
The current economic and health burden of arthropod pests, exacerbated by invasive species and the uncertain effects of climate change12,13, has driven significant research programs and data collection efforts. These include crop pest, mosquito, and tick survey and reporting initiatives14–18, citizen science projects19–21, and digitization of museum specimen data22,23, all yielding a rich and growing trove of field-based data spanning multiple spatial and temporal scales. The monitoring of arthropod abundance (e.g. Fig. 1) in different disciplines (e.g., biodiversity research, pest-control assessment, vector-borne disease monitoring, and pollination research) has similar objectives (to quantify the abundance, phenology and geographical ranges of target arthropod species) and entails similar techniques. However, the data produced by these various efforts are often neither reusable nor comparable to similar data, because they are typically not recorded in a standard format (e.g. Darwin Core) or lack adequate metadata. In contrast, journal-mandated deposition of data from high-throughput technologies (e.g. NCBI and GenBank), data and code sharing, and other practices that improve the transparency and reusability of research results are increasing rapidly across the sciences24–29. Furthering these advances through standardization and public archiving of arthropod abundance data can bring significant benefits, including (1) supporting empirical parameterization and validation of mathematical models (e.g. of pest or disease emergence and spread), (2) validating model predictions, (3) reducing the duplication of expensive empirical research, and (4) revealing new patterns and questions through meta-analyses11,30–33. This will also lead to substantial public benefit through improved human, animal, plant, and ecosystem health, and reduced economic costs.
One of the key impediments to the re-use of these data is the lack of adequate metadata or data descriptors (i.e. data about the data)34–37. In general, for data to be most valuable to the scientific community, they should meet the FAIR Principles (they should be Findable, Accessible, Interoperable and Reusable), which delineate the key components of good data management and stewardship practices38,39. Data are Findable and Accessible when they are archived and freely downloadable from an online public data repository that is indexed and easily searchable. Interoperability and Reusability describe the ease with which humans or computer programs can understand the data (e.g. via metadata) and explore or re-use them across a variety of non-proprietary platforms. Even when data are available, metadata for arthropod abundance data are often absent or not readily interpretable, limiting their reusability at a fundamental level.
Results
A minimum information standard for arthropod abundance data
Here, we present a Minimum Information for Reusable Arthropod Abundance Data (MIReAD) standard for reporting primarily longitudinal (repeated, temporally explicit) field-based collections of arthropods. ‘MIReAD’ also evokes ‘Myriad,’ a countless or extremely great number. Abundance is measured and reported in different ways, and MIReAD fields have been designed to allow researchers to capture this complexity. Examples 1–4 (available on Figshare40) show how to report such different types of abundances. However, we do not encourage the reporting of relative population abundances, since these are not raw data as such but derived values. Excluding them will probably lead to some loss of information, but we argue that the reporting of raw abundance or occurrence data is non-negotiable if these data are to be reused. For example, incorrect statistical methods for aggregating data, such as taking the arithmetic mean of skewed abundance data across samples or replicates, are not uncommon, and we wish to discourage such practices. As with standards developed in other biological disciplines41–46, this standard is “minimum” because it defines the minimal information required to understand and reuse a dataset without consulting any further persons, text, materials, or methods47. MIReAD is designed to facilitate the data archiving efforts of publishers and field researchers. It is not a data model [the explicit definition of data field names and data formats (e.g., for dates and GPS locations)] and therefore does not define controlled vocabularies or specific field titles, but it should be easy to understand and interpret by the wider scientific community47.
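The point about aggregating skewed counts can be illustrated with a minimal sketch. The trap counts below are invented purely for illustration and are not taken from any dataset; the sketch only shows how a single derived value can hide the structure that the raw counts retain.

```python
import statistics

# Hypothetical raw counts from six replicate traps on one night
# (invented for illustration; one trap caught far more than the others).
raw_counts = [0, 1, 0, 2, 1, 96]

# The arithmetic mean is dominated by the single large catch.
arithmetic_mean = statistics.mean(raw_counts)

# The median of the same counts gives a very different summary.
median = statistics.median(raw_counts)

print(f"mean={arithmetic_mean:.2f}, median={median}")
```

A re-user who receives only the mean cannot recover the median, the zeros, or the outlier, whereas a re-user who receives the six raw counts can compute any summary appropriate to their analysis.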
MIReAD is separated into two components, metadata and data. For each component, we provide a description of the information that should be included, recommendations for how to make that information as useful as possible, and examples. The metadata component (Online-only Table 1) includes information for the origin of the data set (e.g. study information and licensing for usage). The second component (Online-only Table 2) lists and describes specific data fields that should be included in data collection sheets. We also provide recommendations and examples to demonstrate how these recommendations can be implemented. MIReAD was designed to match the data that are generally collected by academic researchers and surveillance initiatives, and can serve as a checklist for important information that needs to be recorded but is often unintentionally omitted (e.g. Fig. 2a). By adhering to MIReAD standards, omissions and ambiguity can be avoided even if the data are shared in different formats (Fig. 2b,c). Finally, we identify common problems likely to be encountered across all the MIReAD metadata and data fields, and data quality standards that can be employed to avoid confusion (Box 1).
Online-only Table 1.
Field | Details | Recommendations | Examples |
---|---|---|---|
Contact details | A name, person, authority, etc. that may be contacted with enquiries about the data. | Include investigator ORCID(s), email address, website (if institutional) if possible. | Kurt Vandegrift, orcid.org/0000-0002-5690-3300, kurtvandegrift@gmail.com; State University Agricultural Extension, John Smith (jsmith@StateU.edu), www.StateU.edu/AgriculturalExtension/ |
General description of the experiment/collection set | A short description of the study objectives, sampling design, and hypotheses. Used to aid in browsing multiple studies. A short title and long form name might be helpful. | Useful things to indicate are: random sampling or continuous monitoring in fixed locations; general time frames and location; general description of where data are from; subsampling details, if relevant; rationale for trap placement, experimental design, etc., if relevant. | “Monitoring of major pests on cucumber, sweet pepper and tomato under net-house conditions in Punjab, India” “Pennsylvania Ixodes scapularis weekly abundance” “Continuous (weekly) monitoring of tick numbers attached to white-footed mice in fixed locations in Pennsylvania, USA (12 sites). 2003-present.” “Long term aphid emergence monitoring using continuous suction traps” |
Citations | Reference to related publications, digital if possible (e.g. DOI(s) or PMID(s)). | | “A web-based relational database for monitoring and analyzing mosquito population dynamics Sucaet Y, Van Hemert J, Tucker B, Bartholomay L.” “PMID: 18714883” “Horiuchi, Kaho, Kosei Hashimoto, and Fumio Hayashi. Cantharidin world in air: Spatiotemporal distributions of flying canthariphilous insects in the forest interior. Entomological Science (2018).” |
Species Identification Method | A description of method of species identification. Particularly important for cryptic species complexes. | Providing information on the veracity of the identification is encouraged, such as a reference to the exact identification key or method. | “Morphological” “Genotyped, using method of Smith et al. 2014, PMID: 18714883” “Morphological: used keys of Doe (1958)” “High confidence morphological ID by Jane Doe” |
Not present vs zero information | Indication of what gaps, zeros, NA, etc. mean. | It is imperative, especially for population surveys, to distinguish between a species that was not found when the collection method would be expected to find it (confirmed absence) and a species that was not looked for (e.g. a trap failure). Preferably, a zero indicates a species was looked for and not found, and an NA indicates a species was not looked for, a trap failure, etc. Blank values are highly discouraged. Authors might also assert a definition of absence. | “Zero indicates a species was looked for and not found. NA represents a trap failure” “When a species is indicated as absent, it was expected due to extensive sampling and found in other places during the study but not found here.” |
GPS information | If raw GPS data are obfuscated in any way, a statement on the manner by which this occurred should be given. The ellipsoid, geodetic datum, or spatial reference system (SRS), if known, can also be identified. GPS unit accuracy could also be provided. | The highest resolution data (e.g. trap-level, specific GPS location) are the most useful. It is hoped that no data obfuscation occurs. Some common GPS point obfuscations that, if utilized, should be noted: aggregation (making the areal unit larger, e.g. increasing pixel size in a raster and generating new centroid coordinates); reducing precision of location (reducing GPS decimal points); dithering (‘moving’ GPS points via adding a degree of ‘error’ – distance – to the X,Y). | “Raw GPS points have been provided. GPS unit accuracy +/− 8 m” “GPS locations have had precision reduced by truncating to 3 decimals” “Points were dithered by displacing points randomly by X distance” “WGS84 datum used and data was aggregated to 1 km pixels” |
Data usage information | The data reuse policy for your data. Please provide a Creative Commons license identification. See https://creativecommons.org for more information. | For data to be FAIR, they must be Reusable. We therefore recommend that data be provided as “CC0” (data are made available for any use without restriction or particular requirements on the part of users) or “CC BY 4.0” (data are made available for any use provided that attribution is appropriately given for the sources of data used, in the manner specified by the owner, e.g. citation). | “CC0” or “CC BY 4.0” |
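The GPS obfuscation methods named in the table (precision reduction and dithering) can be sketched as follows. This is a minimal illustration under stated assumptions, not a recommended procedure: the function names are hypothetical, and the metres-per-degree constant is a rough spherical approximation introduced here for the example.

```python
import math
import random

# Rough conversion used only for this sketch (assumption: spherical Earth).
METERS_PER_DEGREE_LAT = 111_320.0


def truncate_coordinate(value, decimals=3):
    """Reduce precision by truncating (not rounding) to a fixed number of decimals."""
    factor = 10 ** decimals
    return math.trunc(value * factor) / factor


def dither_point(lat, lon, max_distance_m, rng):
    """Displace a point by a random distance (up to max_distance_m) in a random direction."""
    distance = rng.uniform(0.0, max_distance_m)
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    dlat = distance * math.cos(bearing) / METERS_PER_DEGREE_LAT
    dlon = distance * math.sin(bearing) / (
        METERS_PER_DEGREE_LAT * math.cos(math.radians(lat))
    )
    return lat + dlat, lon + dlon


# Hypothetical trap location.
lat, lon = 40.69712, -74.01549

truncated = (truncate_coordinate(lat), truncate_coordinate(lon))
dithered = dither_point(lat, lon, max_distance_m=500.0, rng=random.Random(1))

print("truncated:", truncated)
print("dithered:", dithered)
```

Both outputs look like ordinary coordinates, which is why the metadata statement matters: without it, a re-user cannot tell whether a point is a raw trap location, a truncated one, or one displaced by up to 500 m.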
Online-only Table 2.
Field(s) | Details | Recommendations | Examples |
---|---|---|---|
Start Time (for collection) | Start time of the data sample collection, e.g. the trap was set… | Be as specific as practically possible. Any unambiguous format is acceptable. However, do not use two-digit year abbreviations. If relevant, provide the time zone in the field or in the header; a 24 hour clock is preferred, but it should be made unambiguous which time format is being used. | “2012-04-27” “July 26, 2017” “2017-Jul-26” “2017-July-26 Morning” “2017-Jul-26 20:00 GMT” |
End Time (for collection) | End time of the data sample collection, e.g. the trap was collected… | See above. If data collection is instantaneous (e.g. a tick drag), End Time may be the same as Start Time. | See above. |
Location | The geographical location of sample collection. | As detailed as possible. Latitude and longitude if possible, with specified accuracy. Providing both a GPS point field (decimalized GPS points are preferred) and a geographical name field is preferred. Note that providing only location names is highly discouraged, as they change over time and can be ambiguous. Both Place / Trap names and GPS fields can be provided. If obfuscation was used, it should be indicated in the Metadata (Online-only Table 1). Splitting latitude and longitude into two columns further reduces ambiguity. | “Kukar Maikiya, Jigawa State, Nigeria” and “40.697” and “−74.015” “40.697” and “−74.015” |
Collection method | Sampling apparatus (e.g. trap type, observation method) | | “CDC light trap” “Tick drag” “Quadrat count” “BG Sentinel Trap” “Pitfall trap” “Sticky trap” “Larval dip” “Johnson suction trap” “Lindgren Funnel Trap” |
Collection attractants | The attractant/lures used to attract insects to a trap or collection | Please be as specific as possible. For example, a cow used as bait should be referred to as a ‘cow’ and not ‘animal’, or specify the attractant used, such as ‘CO2’. Company names, in particular for chemical attractants, are helpful. Please explicitly state if no attractant was used. | “None” “Carbon dioxide” “UV light” “Biogents Sweetscent Mosquito Lure” “Human” “Russel IPM: CAT-QLURE-CB” |
Collection area | The spatial extent (area or volume) of the sample. | If relevant (e.g., when collection method is transect or quadrat), in units of area or volume, the spatial coverage of the sampling unit. Note this field would not typically be used for passive collections from fixed traps. | “100 m^2” “1 liter” “1 ha” “10 m^3” |
Taxonomy | Classification of sample collected. | Scientific genus and species preferred. Avoid abbreviation. | “Chortoicetes terminifera” “Aedes aegypti” “Anopheles gambiae sensu stricto” “Chrysodeixis argentifera” |
Unit(s) of measurement and observation | Description of exactly what was observed, the unit for the field “Value,” below. For counts, should indicate life stage, sex, etc. Unit measures can be encoded into value field header. Consider multiple unit fields (e.g. separate fields for sex and stage). See Fig. 2. | Do not abbreviate. Coded data key should be provided in field name (e.g. “1 = species present; 0 = species absent”) | “Number of individuals per m^2” “Adult Females” “Males and Females” and “Nymphs” |
Value | The numerical amount or result from the sample collection. Often this will be a quantity of observed individuals. Unit measures can be encoded into value field header. See Fig. 2. | Units should be provided in a separate field or in the header. | “0” “23” “Yes” “Not present” |
Additional sample information field(s) | This could be more than one field and should be used when more information is required to understand the experiment, for example experimental variables, sub-locations, plant host cultivar/species, etc. Some users may report wind speeds, temperatures, elevations etc. This could also include data on disposition of any voucher specimen. | Do not abbreviate. For voucher specimens, it should be made known, first, that a specimen is in fact a voucher and, second, whether different pieces of the voucher are stored separately (e.g., insect on pin, leg in freezer) or at different institutions. | “Forest” vs “Field” “Winter” vs “Summer” “Inside” vs “Outside” “200 meters above sea level” “Cowrie” vs “Leichhardt” vs “Surf” (Soybean cultivars) |
Sample Name | A human readable sample name. May exist solely for the benefit of the depositor in organizing their data, using their own internal naming conventions etc. May also be used to tie related observations together. This field may be useful for updating records once deposited in a repository. | Naming convention is not restricted, but any encoded metadata should be revealed in the other data fields. For example, you may name a sample ‘Aphid1_StickyTrap_Jan4,’ but you will still have “Sticky Trap” listed in a Collection Method field, and “Jan 4, 2017” in the date field. | “Trap1_Night1” “Armyworm_2” “00004” “Jan08_animal_4,” “ABC_123_4b” |
Figure 2b provides an annotated example. Field names in bold should be considered required. Remaining fields are optional or depend on the complexity of the experimental design.
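As a minimal sketch of how the data fields above might look in a collection sheet, the following writes three long-format records to CSV. The column names, site name, and values are hypothetical (MIReAD does not mandate field titles); the sketch only illustrates repeating shared details in every row and distinguishing a zero (looked for, not found) from NA (trap failure).

```python
import csv
import io

# Hypothetical column names: MIReAD does not prescribe field titles.
FIELDS = [
    "sample_name", "start_time", "end_time", "location_name",
    "latitude", "longitude", "collection_method", "collection_attractant",
    "taxonomy", "unit_of_observation", "value",
]


def make_record(sample_name, taxonomy, value):
    """Build one long-format row; shared details are repeated in every row."""
    return {
        "sample_name": sample_name,
        "start_time": "2017-07-26 20:00 GMT",
        "end_time": "2017-07-27 06:00 GMT",
        "location_name": "Example site (hypothetical)",
        "latitude": "40.697",
        "longitude": "-74.015",
        "collection_method": "CDC light trap",
        "collection_attractant": "Carbon dioxide",
        "taxonomy": taxonomy,
        "unit_of_observation": "Number of adult females",
        "value": value,
    }


records = [
    make_record("Trap1_Night1", "Aedes aegypti", "23"),
    # Zero: the species was looked for and not found.
    make_record("Trap1_Night1", "Anopheles gambiae sensu stricto", "0"),
    # NA: the trap failed, so nothing can be said about abundance.
    make_record("Trap2_Night1", "Aedes aegypti", "NA"),
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()
print(csv_text)
```

Every row can be interpreted in isolation, at the cost of repetition, which is the trade-off the standard accepts.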
Box 1 Data quality standards.
No abbreviations. Abbreviations (including in column names) are ambiguous, with the exception of measurement units (e.g. centigrade and meters).
No external legend/key files. While repetitive, all data should be explicitly given within the data table. Separate files mapping ID numbers to GPS locations, full species names, etc. should be avoided. In addition, rich metadata are essential for good data discovery and reuse.
Unambiguous dates. Because of country-level differences in date formats, dates should be reported unambiguously, with four-digit years and with months given alphabetically rather than numerically (e.g. 4-Jun-2017 or Nov 12, 2015), or by using the ISO 8601 date format (YYYY-MM-DD, e.g. 2019-01-27).
Machine-readable file formats. Data should be provided in non-proprietary machine-readable formats such as comma-separated text files. PDFs and multiple spreadsheets in the same document should be avoided.
No font styling or subsection headings. Formatting (color, bold, italics, subscripts, sheet tab names, etc.) should not be required for understanding the data. Subsection headings should not be required to understand data; every line of data should be interpretable in isolation from any other line of data.
Highest precision possible. Data should be provided at the highest temporal, spatial, numerical, and taxonomic resolution available. If location (e.g., geographical coordinate) data need to be presented at a lower resolution than available for privacy reasons, this should be made clear in the submission in Study Information (Resource Metadata; Online-only Table 1).
Language. Once data are ready to be deposited/submitted, all fields and data should preferably be written in English. This will allow researchers and data curators worldwide to understand and reuse the data. Use of other languages is better than not publishing data. Please avoid introducing data reuse barriers through incomplete translation. For example, avoid non-English field names in an English-language submission.
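Some of the standards in Box 1 lend themselves to simple automated checks before submission. The sketch below is one possible approach and not part of MIReAD itself: the function and field names are hypothetical, and as a simplification it accepts only ISO 8601 dates, so other unambiguous formats permitted by Box 1 (e.g. 4-Jun-2017) would be flagged for manual review.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_iso_date(text):
    """Return True only for a valid calendar date written as YYYY-MM-DD."""
    if not ISO_DATE.fullmatch(text):
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


def check_row(row, date_fields=("start_date", "end_date")):
    """Return a list of human-readable problems found in one data row."""
    problems = []
    for field, value in row.items():
        if value is None or value.strip() == "":
            # Blank cells are ambiguous: confirmed absence or not sampled?
            problems.append(f"{field}: blank value; state 0 or NA explicitly")
        elif field in date_fields and not is_iso_date(value.strip()):
            problems.append(f"{field}: '{value}' is not YYYY-MM-DD; check for ambiguity")
    return problems


# Hypothetical rows for illustration.
good = {"start_date": "2019-01-27", "end_date": "2019-01-28", "value": "0"}
bad = {"start_date": "27/01/19", "end_date": "2019-02-30", "value": ""}

print(check_row(good))
for problem in check_row(bad):
    print(problem)
```

Checks of this kind catch only mechanical problems; whether a zero truly means confirmed absence still has to be stated in the metadata.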
Examples
Below we describe three scenarios that illustrate MIReAD-compliant data (corresponding to examples 1–4 in Figshare40). Researchers can use these data sheets as a basis for formatting their own data. In these examples, note that all data meet the data quality standards of Box 1: they are adequately described, have labeled columns, etc., to eliminate ambiguity (even if the data appear repetitive; for example, the sex and life stage are repeated in every row). Examples 1 and 2 should be sufficient for most data generators. Examples 3–4 demonstrate more complex data collection scenarios.
Long-format trapping data
Each row captures count data for a single species’ occurrence in a given sampling event. This illustrates an example of the most common mosquito collection protocol (MIReAD_example_1.csv40). Also see Fig. 2b.
Wide format trapping data
Each row captures count data from a given sampling event. Each taxonomic group is reported in a separate column. An ‘additional sample information’ field, ‘sub-location,’ has been added to describe the various locations around the village where collections were made (MIReAD_example_2.csv40). This illustrates an example of adult mosquito populations that have been tracked over time and in specific locations. Also see Fig. 2c.
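The long and wide layouts carry the same information, and one can be derived from the other. The following is a minimal sketch of a wide-to-long conversion; the function name, column names, and values are hypothetical and do not reproduce the Figshare example files.

```python
def wide_to_long(wide_rows, taxon_columns, unit):
    """Turn one row per sampling event into one row per taxon per event."""
    long_rows = []
    for row in wide_rows:
        # Everything that is not a taxon column describes the sampling event.
        shared = {k: v for k, v in row.items() if k not in taxon_columns}
        for taxon in taxon_columns:
            long_row = dict(shared)
            long_row["taxonomy"] = taxon
            long_row["unit_of_observation"] = unit
            long_row["value"] = row[taxon]
            long_rows.append(long_row)
    return long_rows


# Hypothetical wide-format data: one column per taxon.
taxa = ["Aedes aegypti", "Anopheles gambiae sensu stricto"]
wide_rows = [
    {"start_date": "2017-07-26", "end_date": "2017-07-27", "sub_location": "Inside",
     "Aedes aegypti": "23", "Anopheles gambiae sensu stricto": "0"},
    {"start_date": "2017-07-26", "end_date": "2017-07-27", "sub_location": "Outside",
     "Aedes aegypti": "NA", "Anopheles gambiae sensu stricto": "NA"},
]

long_rows = wide_to_long(wide_rows, taxa, unit="Number of adult females")
for long_row in long_rows:
    print(long_row)
```

Note that zeros and NA values pass through unchanged, so the distinction between confirmed absence and trap failure survives the conversion.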
Complex trapping data scenarios
Tick surveillance was performed using tick drags and flags and collections of ectoparasites from trapped mice. The tick drags/flags report three life stages independently (adult, larvae, and nymph) (MIReAD_example_3.csv40). Larvae are only identified to the genus, while adults and nymphs are identified to the species. Observations of different life stages and sexes are preferably documented in separate records. A Sample Name is used to help link these records (but is not strictly necessary). The mouse survey uses an additional sample information field to record the sex of the trapped mouse from which the parasites were collected (MIReAD_example_4.csv40).
Discussion
MIReAD as the path to FAIR data principles
We designed MIReAD to balance the burden that standards place on data generators against the need for guidelines sufficient to ensure at least minimal reusability31,41. At one extreme is a perfectly formatted and reusable dataset with all necessary metadata in a consistent format, which places a high burden on the dataset generator; at the other is a dataset that is unusable, or not reusable, because of missing or incomplete metadata. MIReAD allows for relatively easy standardization, as it ensures that all necessary collection metadata are present in an unambiguous manner. By not mandating any particular field name, field order, or controlled vocabulary terms, we expect MIReAD to gain traction more readily than more rigorous (and thus more onerous) data models, for which the lack of minimum standards in data is often a first and major hurdle. In striking this balance, we note that MIReAD focuses on capturing information on ‘what’ was done, rather than ‘why’. We acknowledge that for some use cases this may hinder reusability48, but for the majority of cases, where the results of the original data can be interpreted without understanding the rationale, providing data in MIReAD format will be sufficient for data reuse.
Like all minimum standards, MIReAD aims only at ensuring data ‘Reusability’. Ultimately, however, this will promote the implementation of data models and controlled vocabularies (e.g., the Darwin Core49). Data models enable ‘Interoperability’, and in turn facilitate structured databases, public repositories, and the development of data analysis tools47,50. Deposition in open databases makes data ‘Findable’ and ‘Accessible’51–53. MIReAD-compliant data contain sufficient information for established aggregators/databases such as VectorBase and SCAN (Symbiota Collections of Arthropods Network54) to process and store the data in a standardized data model [e.g., Darwin Core, a widely used universal data standard that supports opportunistic observation and collection data (occurrence core) as well as presence/absence and abundance data collected using strict and documented methodology (event core)49], and ultimately facilitate data transfer to even more comprehensive biodiversity databases. For example, GBIF contains over one billion species occurrence records from thousands of environmental, ecological, and natural resource investigations, including research on Arthropoda in numerous ecological and monitoring projects, allowing for the study of changes and trends in populations53. Indeed, in Tables 1 and 2, we provide an example of the mapping of data fields from MIReAD to Darwin Core and GBIF. In this way, MIReAD opens the door to FAIR data and more sophisticated methods to integrate data across many scales.
Table 1.
MIReAD Field | Corresponding GBIF metadata fields |
---|---|
Contact details | contact |
General description of the experiment/collection set | designDescription; sampling |
Citations | citation |
Species Identification Method | designDescription |
Not present vs zero information | samplingDescription |
GPS obfuscation information | geographicDescription; geodeticDatum |
Data usage information | intellectualRights |
See GBIF63 for more information.
Table 2.
MIReAD Field | Corresponding Darwin Core fields |
---|---|
Start Time (for collection) | eventTime |
End Time (for collection) | eventTime |
Location | A number of fields under Location See: http://rs.tdwg.org/dwc/terms/#location |
Collection method | samplingProtocol |
Collection attractants | samplingProtocol |
Collection area | samplingEffort |
Taxonomy | A number of fields under Taxon See http://rs.tdwg.org/dwc/terms/#taxon |
Unit(s) of measurement and observation | sampleSizeUnit |
Value | sampleSizeValue |
Additional sample information | fieldNotes; eventRemarks |
SampleID | eventID |
Sample Name | e.g. fieldNumber for individual observations; see also SampleID above for the field names for the complex samples |
See Wieczorek et al.64 for more information.
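A field mapping of this kind can be applied mechanically. The sketch below encodes part of Table 2 as a dictionary and converts one record; it is an illustration under stated assumptions, not a Darwin Core export tool. The function name, the input column headers, and the choice to join values with a separator when two MIReAD fields map to the same term are all assumptions made for this example, and Location and Taxonomy are skipped because they map to groups of terms rather than a single one.

```python
# Partial mapping taken from Table 2; keys are hypothetical column headers.
MIREAD_TO_DWC = {
    "Start Time": "eventTime",
    "End Time": "eventTime",
    "Collection method": "samplingProtocol",
    "Collection attractants": "samplingProtocol",
    "Collection area": "samplingEffort",
    "Unit(s) of measurement and observation": "sampleSizeUnit",
    "Value": "sampleSizeValue",
    "Additional sample information": "fieldNotes",
}


def to_darwin_core(record, separator=" | "):
    """Rename mapped fields; join values when several fields share one term."""
    converted = {}
    for field, value in record.items():
        term = MIREAD_TO_DWC.get(field)
        if term is None:
            # Location, Taxonomy, etc. need field-specific handling.
            continue
        if term in converted:
            converted[term] = converted[term] + separator + value
        else:
            converted[term] = value
    return converted


# Hypothetical MIReAD-style record.
record = {
    "Start Time": "2017-07-26 20:00 GMT",
    "End Time": "2017-07-27 06:00 GMT",
    "Collection method": "CDC light trap",
    "Collection attractants": "Carbon dioxide",
    "Taxonomy": "Aedes aegypti",
    "Unit(s) of measurement and observation": "Number of adult females",
    "Value": "23",
}

dwc = to_darwin_core(record)
for term, value in dwc.items():
    print(f"{term}: {value}")
```

Because the mapping is many-to-one in places, a real conversion would need an agreed convention for combining values, which is exactly the kind of decision a full data model makes and a minimum standard leaves open.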
Benefits to field researchers
It is essential that the benefits of a minimal data standard extend not just to data re-users, but also to the researchers who collect and generate data in the first place. MIReAD provides a framework for data preparation that can help scientists achieve recognized professional merit for sharing data, such as increased citation rates, academic recognition, opportunities for co-authorship, and new collaborations [sensu Roche et al.31]. Large, deposited data sets can now themselves be standalone, citable “data papers” (e.g.55–57) or even depositions without any traditional manuscript (as an authored ‘digital product’ with persistent identifiers, such as a DOI), if desired. Data sets are increasingly recognized as valuable research outputs that count towards academic recognition and professional advancement (e.g. grants, interviews, and tenure). For example, several funders (e.g. the United States National Science Foundation and the Swiss National Science Foundation) have adopted or are in the process of adopting the Declaration on Research Assessment (DORA)58, offering further opportunities for data generators to gain recognition and publication credit for their work59. Also, an increasing number of funders are mandating public data access, and detailed data management plans are often required even at the grant proposal stage. Reporting data according to MIReAD will therefore provide a basis for stipulating archival formats. We also note that by storing data in MIReAD format, data generators can ensure that their data contain all the necessary metadata for their own internal use. As time passes, research staff, sampling protocols, and sampling locations change, and thus the recording of minimal information ensures long-term reusability of data.
Furthermore, many data generators are also data users. Developing analyses that rely on standardized fields can facilitate the development of generalized analytical tools that can be easily extended to datasets beyond those that were collected by a single individual or lab. In this way, they can enable extensions of work that would otherwise not happen, such as comparisons of population dynamics in different locations or assessments of interspecies interactions. Adopting MIReAD can, therefore, both help data generators reap the benefits of sharing data they have collected and enable them to more readily leverage data collected by others.
Further MIReAD applications and extensions
The creation of minimum information standards for these types of databases facilitates analyses of data at scales that cannot be attained by a single individual or lab group. Linking records to additional information also extends the utility of these data to address population level questions. For example, a well-populated database presents opportunities to investigate interactions between populations of different species of arthropod that overlap in geography but may be of interest individually to different realms of research. As a case in point, in the northeastern USA, Agrilus planipennis, the Emerald Ash Borer, is a highly destructive invasive insect, monitored closely by both state and federal agencies for management60. Interestingly, Emerald Ash Borers are creating new habitats for carpenter bees, a species interaction that can be tracked and anticipated using large scale arthropod data.
Another example of the utility of linked data is for disease vectors. Data on insecticide resistance linked with time and place would be valuable for coordinating control strategies within and between nations and communities. Presence/absence data on infection levels would be helpful for tracking and investigating disease outbreaks and dynamics. Standardization of these data would be particularly useful for pathogens that infect multiple vectors and hosts and would facilitate a “One Health” approach. Other important vector phenotypes that contribute to control and transmission such as pathogen susceptibility, biting preferences, and breeding behaviours could be measured over time and space.
Indeed, MIReAD would be usable for any arthropod abundance data collection effort, not just those concerning medical, veterinary, and agricultural pests or invasive arthropods. We note that this standard is applicable not only to abundance measurements, but could easily be extended to any other kind of routinely sampled time-series field data. For example, in addition to aphid abundance, plant pathogen (such as mosaic virus) infection and insecticide resistance statuses of the aphids could be reported in MIReAD format. Note that MIReAD can also be used for cross-sectional data (i.e. a non-continuous, one-time sampling effort) by reporting data from the single collection period with a single Start Time and End Time (Online-only Table 2).
Disseminating MIReAD
Many data generators are already storing or sharing data in a manner that would be consistent with MIReAD (e.g. on MosquitoNet or NEON), but we call on data generators, authors, reviewers, editors, journals, research infrastructures (e.g. data repositories) and funders to embrace MIReAD as a standard to facilitate FAIR data use and compliance for arthropod abundance data. We propose that workshops, outreach at conferences and meetings, and interfacing with data repositories, societies and organizations (e.g. SpeciesLink, the American Mosquito Control Association, MosquitoNet, Symbiota, VectorNet, and VectorBiTE), and journal editors will be the best way to spread the adoption of this standard.
Conclusion
We present MIReAD as a minimum information standard for representing arthropod abundance data. MIReAD will facilitate collation and analyses of data at scales that cannot be attained by a single individual or lab in order to address key questions across temporal and spatial scales, such as within and across-year phenology of abundance of target arthropod taxa over large geographical areas. This is particularly important given the pressing need to understand and predict the population dynamics of harmful (e.g., disease vectors and pests) as well as beneficial (e.g., pollinators, bio-control agents) arthropods in natural and human modified landscapes. This is the first step for achieving the broad benefits of FAIR data for arthropod abundance.
Acknowledgements
The seeds of this effort were planted in 2016 at a meeting of VectorBiTE, which is a cross-disciplinary research coordination network (RCN) for disease vectors. Samuel S.C. Rund, Matthew Watts, Kurt Vandegrift, Naveed Heydari, Cynthia Lord, Michael Johansson, Samraat Pawar, and Sadie J. Ryan received travel funding from NIH grant 1R01AI122284-01 and BBSRC grant BB/N013573/1 as part of the joint [NIH-NSF-USDA-BBSRC] Ecology and Evolution of Infectious Diseases program. Samuel S.C. Rund was funded by the Royal Society (NF140517). Samuel S.C. Rund, Daniel Lawson, Robert M. MacCallum, Sarah A. Kelly, Gloria I. Giraldo-Calderón and Scott J. Emrich were supported by the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN272201400029C (VectorBase Bioinformatics Resource Center). Kurt Vandegrift was funded by the National Science Foundation Ecology and Evolution of Infectious Diseases program (1619072). Naveed Heydari and Sadie J. Ryan were funded by the National Science Foundation (NSF DEB EEID 1518681). Sadie J. Ryan was additionally funded by NIH 1R01AI136035-01, and CDC grant 1U01CK000510-01: Southeastern Regional Center of Excellence in Vector-Borne Diseases: the Gateway Program. This publication was supported by the Cooperative Agreement Number above from the Centers for Disease Control and Prevention. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the Centers for Disease Control and Prevention. Jennifer M. Zaspel was funded by the National Science Foundation Division of Biological Infrastructure (NSF 1561448, NSF 1601957).
Online-only Table
Author Contributions
The project was conceptualized by Samuel S.C. Rund, Lauren Cator and Samraat Pawar. The original draft was prepared by Samuel S.C. Rund, Michael A. Johansson, Naveed Heydari, Kurt Vandegrift, Matthew Watts, and Samraat Pawar. All the authors contributed to reviewing and editing the manuscript.
Data Availability
No novel data were generated for this report. We encourage readers to view the datasets that inspired and informed our work at www.vectorbase.org, www.gbif.org, www.vectorbyte.org, and in our other publication14.
Competing Interests
The authors declare no competing interests.
Footnotes
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Seastedt TR, Crossley DA. The influence of arthropods on ecosystems. Bioscience. 1984;34:157–161. doi: 10.2307/1309750.
- 2.Moore JC, Walter DE, Hunt HW. Arthropod regulation of micro- and mesobiota in below-ground detrital food webs. Annu. Rev. Entomol. 1988;33:419–439. doi: 10.1146/annurev.en.33.010188.002223.
- 3.Whiles MR, Charlton RE. The ecological significance of tallgrass prairie arthropods. Annu. Rev. Entomol. 2006;51:387–412. doi: 10.1146/annurev.ento.51.110104.151136.
- 4.Losey JE, Vaughan M. The economic value of ecological services provided by insects. Bioscience. 2006;56:311–323. doi: 10.1641/0006-3568(2006)56[311:TEVOES]2.0.CO;2.
- 5.Bradshaw CJA, et al. Massive yet grossly underestimated global costs of invasive insects. Nat. Commun. 2016;7:12986. doi: 10.1038/ncomms12986.
- 6.Bebber DP, Ramotowski MAT, Gurr SJ. Crop pests and pathogens move polewards in a warming world. Nat. Clim. Chang. 2013;3:985–988. doi: 10.1038/nclimate1990.
- 7.Sparling, P. F., Hamburg, M. A., Relman, D. A., Choffnes, E. R. & Mack, A. Vector-Borne Diseases: Understanding the Environmental, Human Health, and Ecological Connections, Workshop Summary. Forum on Microbial Threats: Board on Global Health. p. 1–40, (National Academies Press, 2008).
- 8.Minjauw, B. & McLeod, A. Tick-borne diseases and poverty: the impact of ticks and tick-borne diseases on the livelihoods of small-scale and marginal livestock owners in India and eastern and southern Africa. 8, (Centre for Tropical Veterinary Medicine, 2003).
- 9.Van den Bossche P, de La Rocque S, Hendrickx G, Bouyer J. A changing environment and the epidemiology of tsetse-transmitted livestock trypanosomiasis. Trends Parasitol. 2010;26:236–243. doi: 10.1016/j.pt.2010.02.010.
- 10.World Health Organization. Vector-borne diseases, http://www.who.int/news-room/fact-sheets/detail/vector-borne-diseases (2017).
- 11.Golding N, et al. Integrating vector control across diseases. BMC Med. 2015;13:249. doi: 10.1186/s12916-015-0491-4.
- 12.Elbers ARW, Koenraadt CJM, Meiswinkel R. Mosquitoes and Culicoides biting midges: vector range and the influence of climate change. Rev. Sci. Tech. 2015;34:123–137. doi: 10.20506/rst.34.1.2349.
- 13.Sakai AK, et al. The population biology of invasive species. Annu. Rev. Ecol. Syst. 2001;32:305–332. doi: 10.1146/annurev.ecolsys.32.081501.114037.
- 14.Rund SSC, Moise IK, Beier JC, Martinez ME. Rescuing troves of data to tackle emerging mosquito-borne diseases. J. Am. Mosq. Control Assoc. 2019;35:75–83. doi: 10.2987/18-6781.1.
- 15.Foley, D. H., Maloney, F. A. Jr., Harrison, F. J., Wilkerson, R. C. & Rueda, L. M. Online spatial database of US Army Public Health Command Region-West mosquito surveillance records: 1947–2009. US Army Med. Dep. J. Jul–Sep, 29–36 (2011).
- 16.Hutchinson ML, Strohecker MD, Simmons TW, Kyle AD, Helwig MW. Prevalence rates of Borrelia burgdorferi (Spirochaetales: Spirochaetaceae), Anaplasma phagocytophilum (Rickettsiales: Anaplasmataceae), and Babesia microti (Piroplasmida: Babesiidae) in host-seeking Ixodes scapularis (Acari: Ixodidae) from Pennsylvania. J. Med. Entomol. 2015;52:693–698. doi: 10.1093/jme/tjv037.
- 17.Magarey RD, et al. Risk maps for targeting exotic plant pest detection programs in the United States: US risk maps for exotic plant pest detection. EPPO Bulletin. 2011;41:46–56. doi: 10.1111/j.1365-2338.2011.02437.x.
- 18.Wilson BE, Beuzelin JM, VanWeelden MT, Reagan TE, Way MO. Monitoring Mexican rice borer (Lepidoptera: Crambidae) populations in sugarcane and rice with conventional and electronic pheromone traps. J. Econ. Entomol. 2017;110:150–156. doi: 10.1093/jee/tow264.
- 19.Chandler M, et al. Contribution of citizen science towards international biodiversity monitoring. Biol. Conserv. 2017;213:280–294. doi: 10.1016/j.biocon.2016.09.004.
- 20.Kampen H, et al. Approaches to passive mosquito surveillance in the EU. Parasit. Vectors. 2015;8:9. doi: 10.1186/s13071-014-0604-5.
- 21.Suprayitno, N., Narakusumo, R. P., von Rintelen, T., Hendrich, L. & Balke, M. Taxonomy and biogeography without frontiers - WhatsApp, Facebook and smartphone digital photography let citizen scientists in more remote localities step out of the dark. Biodivers. Data J. e19938 (2017).
- 22.Seltmann KC, et al. LepNet: The Lepidoptera of North America Network. Zootaxa. 2017;4247:73–77. doi: 10.11646/zootaxa.4247.1.10.
- 23.Short AEZ, Dikow T, Moreau CS. Entomological collections in the age of Big Data. Annu. Rev. Entomol. 2018;63:513–530. doi: 10.1146/annurev-ento-031616-035536.
- 24.Horton R. (Comment) Offline: What is medicine’s 5 sigma? The Lancet. 2015;385:1380. doi: 10.1016/S0140-6736(15)60696-1.
- 25.Nakagawa S, Parker TH. Replicating research in ecology and evolution: feasibility, incentives, and the cost-benefit conundrum. BMC Biol. 2015;13:88. doi: 10.1186/s12915-015-0196-3.
- 26.Nosek BA, et al. Promoting an open research culture. Science. 2015;348:1422–1425. doi: 10.1126/science.aab2374.
- 27.Parker TH, et al. Transparency in ecology and evolution: Real problems, real solutions. Trends Ecol. Evol. 2016;31:711–719. doi: 10.1016/j.tree.2016.07.002.
- 28.Smaldino PE, McElreath R. The natural selection of bad science. R. Soc. Open Sci. 2016;3:160384. doi: 10.1098/rsos.160384.
- 29.Ihle M, Winney IS, Krystalli A, Croucher M. Striving for transparent and credible research: Practical guidelines for behavioral ecologists. Behav. Ecol. 2017;28:348–354. doi: 10.1093/beheco/arx003.
- 30.Poisot T, Mounce R, Gravel D. Moving toward a sustainable ecological science: don’t let data go to waste! Ideas in Ecology and Evolution. 2013;6:11–19. doi: 10.4033/iee.2013.6b.14.f.
- 31.Roche DG, et al. Troubleshooting public data archiving: Suggestions to increase participation. PLoS Biol. 2014;12:e1001779. doi: 10.1371/journal.pbio.1001779.
- 32.Culley TM. The frontier of data discoverability: Why we need to share our data. Appl. Plant. Sci. 2017;5:1700111. doi: 10.3732/apps.1700111.
- 33.Gerstner K, et al. Will your paper be used in a meta‐analysis? Make the reach of your research broader and longer lasting. Methods Ecol. Evol. 2017;8:777–784. doi: 10.1111/2041-210X.12758.
- 34.Ioannidis JPA, et al. Repeatability of published microarray gene expression analyses. Nat. Genet. 2009;41:149–155. doi: 10.1038/ng.295.
- 35.Gilbert KJ, et al. Recommendations for utilizing and reporting population genetic analyses: the reproducibility of genetic clustering using the program STRUCTURE. Mol. Ecol. 2012;21:4925–4930. doi: 10.1111/j.1365-294X.2012.05754.x.
- 36.Roche DG, Kruuk LEB, Lanfear R, Binning SA. Public data archiving in ecology and evolution: How well are we doing? PLoS Biol. 2015;13:e1002295. doi: 10.1371/journal.pbio.1002295.
- 37.Renaut S, Budden AE, Gravel D, Poisot T, Peres-Neto P. Management, archiving, and sharing for biologists and the role of research institutions in the technology-oriented age. Bioscience. 2018;68:400–411. doi: 10.1093/biosci/biy038.
- 38.Wilkinson MD, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data. 2016;3:160018. doi: 10.1038/sdata.2016.18.
- 39.Wilkinson MD, et al. A design framework and exemplar metrics for FAIRness. Sci. Data. 2018;5:180118. doi: 10.1038/sdata.2018.118.
- 40.Rund, S. S. C. et al. Example Minimum Information for Reusable Arthropod Abundance Data (MIReAD) files. figshare, 10.6084/m9.figshare.c.4248320 (2019).
- 41.Taylor CF, et al. The minimum information about a proteomics experiment (MIAPE) Nat. Biotechnol. 2007;25:887–893. doi: 10.1038/nbt1329.
- 42.Yilmaz P, et al. Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. Nat. Biotechnol. 2011;29:415–420. doi: 10.1038/nbt.1823.
- 43.Lourenço A, et al. Minimum information about a biofilm experiment (MIABiE): standards for reporting experiments and data on sessile microbial communities living at interfaces. Pathog. Dis. 2014;70:250–256. doi: 10.1111/2049-632X.12146.
- 44.Brazma A, et al. Minimum information about a microarray experiment (MIAME)—toward standards for microarray data. Nat. Genet. 2001;29:365–371. doi: 10.1038/ng1201-365.
- 45.Bustin SA, et al. The MIQE guidelines: minimum information for publication of quantitative real-time PCR experiments. Clin. Chem. 2009;55:611–622. doi: 10.1373/clinchem.2008.112797.
- 46.York WS, et al. MIRAGE: the minimum information required for a glycomics experiment. Glycobiology. 2014;24:402–406. doi: 10.1093/glycob/cwu018.
- 47.Taylor CF, et al. Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project. Nat. Biotechnol. 2008;26:889–896. doi: 10.1038/nbt.1411.
- 48.Kelly-Hope LA, McKenzie FE. The multiplicity of malaria transmission: a review of entomological inoculation rate measurements and methods across sub-Saharan Africa. Malaria J. 2009;8:19. doi: 10.1186/1475-2875-8-19.
- 49.Wieczorek J, et al. Darwin Core: an evolving community-developed biodiversity data standard. PLoS One. 2012;7:e29715. doi: 10.1371/journal.pone.0029715.
- 50.Giraldo-Calderón GI, et al. VectorBase: an updated bioinformatics resource for invertebrate vectors and other organisms related with human diseases. Nucleic Acids Res. 2015;43:D707–13. doi: 10.1093/nar/gku1117.
- 51.Benson DA, et al. GenBank. Nucleic Acids Res. 2013;41:D36–42. doi: 10.1093/nar/gks1195.
- 52.Berman H, Henrick K, Nakamura H. Announcing the worldwide Protein Data Bank. Nat. Struct. Biol. 2003;10:980. doi: 10.1038/nsb1203-980.
- 53.GBIF: The Global Biodiversity Information Facility. What is GBIF?, https://www.gbif.org/what-is-gbif (2018).
- 54.Heinrich, P. L., Gilbert, E., Cobb, N. S. & Franz, N. Symbiota collections of arthropods network (SCAN): A data portal built to visualize, manipulate, and export species occurrences, http://openknowledge.nau.edu/2258/ (2015).
- 55.Perryman SAM, et al. The electronic Rothamsted Archive (e-RA), an online resource for data from the Rothamsted long-term experiments. Sci. Data. 2018;5:180072. doi: 10.1038/sdata.2018.72.
- 56.Gossner MM, et al. A summary of eight traits of Coleoptera, Hemiptera, Orthoptera and Araneae, occurring in grasslands in Germany. Sci. Data. 2015;2:150013. doi: 10.1038/sdata.2015.13.
- 57.Hedefalk F, Svensson P, Harrie L. Spatiotemporal historical datasets at micro-level for geocoded individuals in five Swedish parishes, 1813-1914. Sci. Data. 2017;4:170046. doi: 10.1038/sdata.2017.46.
- 58.The American Society for Cell Biology. San Francisco Declaration on Research Assessment, http://www.ascb.org/wp-content/uploads/2017/07/sfdora.pdf (2012).
- 59.Chavan V, Penev L. The data paper: a mechanism to incentivize data publishing in biodiversity science. BMC Bioinformatics. 2011;12(Suppl 15):S2. doi: 10.1186/1471-2105-12-S15-S2.
- 60.Abell KJ, Bauer LS, Duan JJ, Van Driesche R. Long-term monitoring of the introduced emerald ash borer (Coleoptera: Buprestidae) egg parasitoid, Oobius agrili (Hymenoptera: Encyrtidae), in Michigan, USA and evaluation of a newly developed monitoring technique. Biol. Control. 2014;79:36–42. doi: 10.1016/j.biocontrol.2014.08.002.
- 61.Dunphy, B. M., Rowley, W. A. & Bartholomay, L.C. A taxonomic checklist of the mosquitoes of Iowa. J. Am. Mosq. Control Assoc. 30, 119–121 (2014).
- 62.Sucaet, Y., Van Hemert, J., Tucker, B. & Bartholomay, L. A web-based relational database for monitoring and analyzing mosquito population dynamics. J. Med. Entomol. 45, 775–784 (2008).
- 63.Ó Tuama, E., Braak, K. & Remsen, D. GBIF Metadata Profile – How-to Guide, https://github.com/gbif/ipt/wiki/GMPHowToGuide (2011).
- 64.Wieczorek, J., Döring, M., De Giovanni, R., Robertson, T. & Vieglais, D. Darwin Core Terms: A quick reference guide, http://rs.tdwg.org/dwc/terms/index.htm (2018).