STAR Protocols. 2026 Feb 26;7(1):104408. doi: 10.1016/j.xpro.2026.104408

Protocol for geographical mapping of pesticide contamination

Yabi Huang 1, Zijian Li 1,2,3
PMCID: PMC12964003  PMID: 41758645

Summary

Pesticide residues in the environment threaten agricultural and human health. Here, we present a framework for mapping and comparing pesticide contamination levels in specific environmental compartments across regions. We describe steps for using pesticide residue concentration data to assign each region a score based on the deviation of its pesticide concentrations from the overall central tendency. Comparing scores among regions can help environmental agencies identify regions with high contamination scores and implement timely control strategies.

For complete details on the use and execution of this protocol, please refer to Huang and Li.1

Subject areas: Health Sciences, Environmental sciences, Earth sciences

Highlights

  • Steps for collecting and cleaning global freshwater pesticide concentration data

  • Procedure for handling non-detects and reshaping data for scoring analysis

  • Instructions for calculating and mapping contamination scores for countries


Publisher’s note: Undertaking any experimental protocol requires adherence to local institutional guidelines for laboratory safety and ethics.


Before you begin

Background

Pesticides are widely used in agriculture.2 They can enter various environmental compartments and cause adverse effects on non-target organisms and human health.3,4,5 Numerous studies have reported pesticide residues in various areas.6,7,8 However, because sampling times and pesticide types vary among studies, individual pesticide concentrations fail to adequately represent the contamination level of a specific area during a given time period. Other approaches to assessing pesticide pollution include scoring index systems such as the Pollution Index (PI)9 and the Pesticide Toxicity Index (PTI).10 These indices are typically defined as ratios of measured concentrations to water quality standards or toxicity benchmarks, so their applicability is limited by the availability of threshold values. Therefore, we introduce a scoring approach based on the deviation of measured concentrations from their central tendency to assess and compare pesticide contamination levels across regions.11 This approach relies solely on measured concentration data and requires no threshold values.

Innovation

Unlike previous index systems, which rely on regulatory thresholds or toxicity benchmarks, this protocol uses deviations from the central tendency of measured concentrations to characterize relative contamination levels. It calculates pesticide contamination scores for specific regions within a given period, highlights contamination hotspots, and enables ranking of regions by contamination score. This approach offers a data-driven tool applicable in regions where regulatory benchmarks are limited. The framework can also be adapted to other environmental pollutants, such as pharmaceuticals or industrial chemicals.

Preparation for software

Timing: 1–2 h

Before starting the data collection and computational tasks, ensure that the following necessary software and tools are installed and ready for use:

Ensure that you have access to licensed versions if necessary.

  • 3.
    Install R packages.
    • a.
      Install readxl: This package is used to import Excel files that contain collected data into R.
      >install.packages("readxl")
    • b.
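      Install tidyr: This package is used for data tidying, making data easier to analyze and visualize.
      >install.packages("tidyr")
    • c.
      Install dplyr: This package is used for data manipulation and cleaning of structured data frames.
      >install.packages("dplyr")
    • d.
      Install writexl: This package is used to export the processed dataset from R to an Excel file (see step 14d).
      >install.packages("writexl")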
  • 4.

    Download and prepare the relevant map data (e.g., shapefiles or satellite images). The following websites can be useful resources for geographical and environmental data: http://www.naturalearthdata.com and http://www.fao.org/landwater/databases-and-software/geonetwork/en/.

Recommended timing

The total time required for this method largely depends on the scale of data collection. As the data volume, the number of locations, or the number of pesticides grows, the time needed for subsequent data cleaning, processing, and analysis also increases. The timing given for each step is based on an example analysis of pesticide contamination in global surface freshwater from 2010 to 2023.

Key resources table

REAGENT or RESOURCE | SOURCE | IDENTIFIER

Deposited data
Dataset on Global Water Pesticide Residues (2010–2023) | Huang and Li1 | https://doi.org/10.1016/j.isci.2025.112861

Software and algorithms
Microsoft Excel 2010 | Microsoft | https://excel.cloud.microsoft/
R (Version: 4.5.2) | R Core Team | https://www.r-project.org/; RRID: SCR_001905
RStudio (Version: 2025.09.2+418) | RStudio Team | http://www.rstudio.com/; RRID: SCR_000432
readxl (Version: 1.4.5) | R Core Team | https://cran.r-project.org/web/packages/readxl/index.html
dplyr (Version: 1.1.4) | R Core Team | https://cran.r-project.org/web/packages/dplyr/index.html
tidyr (Version: 1.3.2) | R Core Team | https://cran.r-project.org/web/packages/tidyr/index.html
writexl (Version: 1.5.4) | R Core Team | https://cran.r-project.org/web/packages/writexl/index.html
ArcGIS (Version: 10.8) | Environmental Systems Research Institute | https://www.arcgis.com/index.html; RRID: SCR_011081
QGIS (Version: 3.44) | QGIS project | https://qgis.org/community/organisation/

Step-by-step method details

The main steps of the method are shown in Figure 1.

Figure 1. Detailed instructions for the method

Collecting data

Timing: 1–2 months, depending on the data volume

Here, we describe the steps for a systematic literature search and data collection procedure to identify and record peer-reviewed studies reporting pesticide concentrations in freshwater.

  • 1.
    Define search strategy and data sources.
    • a.
      Use academic databases (e.g., Web of Science, PubMed) and official environmental monitoring program websites as primary sources.
    • b.
      Use the following keyword groups combined with AND operators: (“freshwater” OR “river” OR “lake” OR “groundwater” OR “well”) AND (“pesticide” OR “insecticide” OR “herbicide” OR “fungicide”) AND (“residuals” OR “pollution” OR “concentration”).
    • c.
      Limit searches to publications dated 2010–2023.

Note: Further expand the search strategy by screening relevant references cited in the retrieved results.

  • 2.
    Export search results and retrieve full texts.
    • a.
      Export citations (RIS/CSV) and download full texts or data files where available.
    • b.
      Use reference managers (e.g., Mendeley, EndNote) to organize results.
  • 3.
    Select studies.
    • a.
      Perform title/abstract screening followed by full-text review. Include studies that meet all of the following inclusion criteria.
    • b.
      Inclusion criteria:
      • i.
        Published in peer-reviewed journals.
      • ii.
        Sampling years from 2010 to 2023.
      • iii.
        Record pesticide concentrations in surface freshwater or groundwater (rivers, lakes, wells).
      • iv.
        Provide numeric pesticide concentration data (not only figures).
      • v.
        Provide sampling metadata (location and year).
    • c.
      Exclusion Criteria: Exclude reviews, meta-analyses, predictive-only studies, and analyses based solely on national monitoring summaries without primary concentration data.
  • 4.

    Extract concentration and metadata. Store the relevant information in a spreadsheet.

Note: Concentration information includes the pesticide name, average concentration, and units. Metadata include the sampling location name and coordinates, the sampling year, and the reported limit of detection (LOD) or limit of quantification (LOQ), if available.

Note: If the average pesticide concentration is not provided, median, minimum, or maximum concentration values can be used as alternatives. If the detected concentration of a pesticide is below the LOD or LOQ, the LOD or LOQ value may be used instead. This conservative approach is commonly used in environmental studies12 to retain non-detected data while avoiding underestimation of contamination levels.

  • 5.

    Compile the dataset and save it in Excel format.
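
The keyword groups in step 1b can also be assembled programmatically, which helps keep the search string identical across databases. The following is a minimal sketch in base R; the helper name `or_group` and the variable names are illustrative assumptions, and the exact query syntax accepted by each database may differ.

```r
# Keyword groups taken from step 1b.
water_terms     <- c("freshwater", "river", "lake", "groundwater", "well")
pesticide_terms <- c("pesticide", "insecticide", "herbicide", "fungicide")
outcome_terms   <- c("residuals", "pollution", "concentration")

# Hypothetical helper: quote each term and join the group with OR.
or_group <- function(terms) {
  paste0("(", paste0('"', terms, '"', collapse = " OR "), ")")
}

# Join the three groups with AND to obtain a single search string.
query <- paste(or_group(water_terms),
               or_group(pesticide_terms),
               or_group(outcome_terms),
               sep = " AND ")
print(query)
```

The date restriction in step 1c (2010–2023) is usually set through each database's own filter rather than in the query string.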

Preprocessing the dataset

Timing: 1 day

These steps aim to enhance data consistency and improve the efficiency of subsequent computations.

  • 6.

    Arrange the pesticides in descending order of detection frequency and sort the literature entries by continent and country name.

  • 7.
    Handle non-detected and missing values.
    • a.
      Replace concentrations below LOD or LOQ with the reported LOD or LOQ value, respectively.
    • b.
      Exclude records lacking both concentration information and LOD/LOQ.
  • 8.

    Convert all concentration units to μg/L to facilitate calculation.

Note: For example, divide concentration values reported in ng/L by 1000, and multiply values reported in mg/L by 1000.

  • 9.

    Assign unique identifiers to pesticides. For each pesticide, assign its unique Chemical Abstracts Service Registry Number (CAS No.) by searching the pesticide name on the official website (https://www.cas.org/cas-data/cas-registry).

Note: The column headers in the Excel spreadsheet are as follows: Literature No, Country Name, Continent Name, Sampling Year, Sampling Region, Latitude and Longitude, Sample Size, Pesticide Name, CAS No, Mean Concentration, LOD (Limit of Detection), LOQ (Limit of Quantification). That is, the Excel spreadsheet is in long format.
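
Steps 7 and 8 can be carried out in Excel, but they are also easy to script. The sketch below uses base R on a toy table; the column names (`Conc`, `Unit`, `LOD`, `LOQ`), the use of `NA` to mark non-detects, and all values are assumptions made for illustration. It also assumes that LOD and LOQ are reported in the same unit as the concentration on that row.

```r
# Toy records; names and values are illustrative assumptions, not protocol data.
raw <- data.frame(
  Pesticide = c("Atrazine", "Atrazine", "Diuron", "Diuron"),
  Conc      = c(250, NA, NA, NA),          # NA marks a non-detect or missing value
  Unit      = c("ng/L", "ug/L", "mg/L", "ug/L"),
  LOD       = c(NA, 0.05, NA, NA),
  LOQ       = c(NA, NA, 0.002, NA)
)

# Step 7a: when no concentration is available, fall back to the LOD, then the LOQ.
raw$Value <- ifelse(is.na(raw$Conc),
                    ifelse(is.na(raw$LOD), raw$LOQ, raw$LOD),
                    raw$Conc)

# Step 7b: drop records with neither a concentration nor an LOD/LOQ.
clean <- raw[!is.na(raw$Value), ]

# Step 8: convert everything to ug/L (ng/L divided by 1000, mg/L multiplied by 1000).
to_ugL <- c("ng/L" = 1 / 1000, "ug/L" = 1, "mg/L" = 1000)
clean$Conc_ugL <- clean$Value * unname(to_ugL[clean$Unit])

print(clean[, c("Pesticide", "Conc_ugL")])
```

A unit that is not listed in `to_ugL` yields `NA` in `Conc_ugL`, which makes unexpected units easy to spot before scoring.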

Indications and choice of spatial unit

Timing: 0.5 h

Here, we present several key considerations to guide the selection of appropriate spatial units when applying this protocol, with the goal of ensuring the interpretability and comparability of contamination scores.

Note: Country-level application can be treated as an illustrative or exploratory use case. However, for geographically large or environmentally diverse countries, such as Canada, China and the USA, the country-level aggregation may obscure substantial heterogeneity in environmental conditions and monitoring coverage within the country. Country-level contamination scores for these countries may be misleading if interpreted as representative of a uniform exposure context. We therefore recommend applying this protocol to spatial units that are as homogeneous as possible, such as sub-national administrative regions (e.g., states or provinces), river basins, or uniform grid cells, depending on data availability and study objectives. The choice of spatial unit is critical and should be carefully considered prior to implementation. Here, we provide several indications that could help to decide whether to use aggregated data at the national level or more subdivided regional data.

  • 10.

    Evaluate the country area.

Note: For example, for countries with an area greater than 2.7 million km², assessments are recommended at the provincial or state level, as substantial intra-national heterogeneity is expected. This threshold was selected based on a visible discontinuity in the global distribution of country areas, below which country sizes decrease sharply. Users may adjust this threshold based on their specific research objectives and data availability.

  • 11.

    Evaluate the number and distribution of sampling sites.

Note: When pesticide sampling sites are clustered in specific regions rather than evenly distributed across the country, it is recommended to assess subnational pollution scores.

  • 12.

    Consider the study interest.

Note: While this protocol provides recommended indications for selecting spatial units, the final choice may also depend on specific research objectives. Users are encouraged to justify their chosen spatial scale and to explicitly discuss potential biases associated with spatial aggregation.

Note: This subsection could serve as a spatial normalization strategy, aiming to improve the comparability of contamination scores across regions by reducing scale-induced bias.
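
The area criterion in step 10 can be applied as a simple screening rule. The sketch below is only an illustration in base R: the country labels and areas are placeholders, the column names are assumptions, and the 2.7 million km² threshold is the adjustable example value from the note to step 10. The criteria in steps 11 and 12 still require case-by-case judgment.

```r
# Placeholder spatial units; labels and areas are illustrative assumptions.
units <- data.frame(
  Country  = c("Country A", "Country B"),
  Area_km2 = c(9.5e6, 3.0e5)
)

# Example threshold from step 10; adjust to the study objectives.
area_threshold_km2 <- 2.7e6

# Suggest a sub-national assessment for countries above the threshold.
units$SuggestedScale <- ifelse(units$Area_km2 > area_threshold_km2,
                               "sub-national", "national")
print(units)
```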

Constructing the computation framework

Timing: 1 h

These steps convert the tabular data from long to wide format to facilitate subsequent table calculations.

  • 13.

    Generate unique identifiers for each record. For each row of the dataset, create a unique Code by combining the literature reference number, country name, and CAS number.

Note: This “Code” will serve as the primary key for further processing.

  • 14.
    Construct the dataset in R.
    • a.
      Import the Excel dataset.
      >library(readxl)
      >Data <- read_excel("File name.xlsx", sheet = "xxx")
    • b.
      Select relevant columns.
      >Data <- Data[, c("Code", "CAS", "Concent")]
    • c.
      Reshape the dataset from long to wide format.
      >library(dplyr)
      >library(tidyr)
      >Data <- Data %>%
      pivot_wider(names_from = CAS, values_from = Concent)
    • d.
      Export processed dataset.
      >library(writexl)
      >write_xlsx(Data, path = "Sea-Outcome.xlsx")
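
To make steps 13 and 14 concrete, the following sketch builds the Code key and reshapes a three-row toy table. It assumes the tidyr package is installed; the column name `LitNo`, the country labels, and the concentrations are illustrative assumptions, while `Code`, `CAS`, and `Concent` follow the column names used in step 14.

```r
library(tidyr)

# Toy long-format records (values are illustrative).
long <- data.frame(
  LitNo   = c(1, 1, 2),
  Country = c("CountryA", "CountryA", "CountryB"),
  CAS     = c("1912-24-9", "330-54-1", "1912-24-9"),
  Concent = c(0.25, 0.05, 2)
)

# Step 13: combine literature number, country name, and CAS number into Code.
long$Code <- paste(long$LitNo, long$Country, long$CAS, sep = "_")

# Each Code should occur only once before reshaping.
stopifnot(!anyDuplicated(long$Code))

# Step 14b-c: keep the relevant columns and pivot from long to wide.
wide <- pivot_wider(long[, c("Code", "CAS", "Concent")],
                    names_from = CAS, values_from = Concent)
print(wide)
```

Because Code, as defined in step 13, includes the CAS number, each row of the wide table holds one measured value and `NA` in the other pesticide columns; column-wise summaries therefore need `na.rm = TRUE`.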

Calculating the regional contamination scores

Timing: 4–5 h

This step involves the core computational procedure for deriving regional contamination scores based on the previously preprocessed dataset, enabling quantitative comparison of contamination levels across different spatial units.

  • 15.

    Calculate the global central tendency. The overall average pesticide concentration is computed across all sampling sites worldwide, weighted by the number of sites:

$$\overline{\log C_n} = \frac{1}{Q}\sum_{q=1}^{Q}\log C_{n,q}$$

where $Q$ is the total number of sampling sites globally and $\log C_{n,q}$ refers to the log-transformed concentration of pesticide $n$ at sampling site $q$ of any country/region.

  • 16.

    Calculate the contamination score (S):

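$$S = \underbrace{\frac{1}{M}\sum_{m=1}^{M}\Bigg\{\underbrace{\frac{1}{N}\sum_{n=1}^{N}\Big[\log C_{n,m} - \overline{\log C_n}\Big]}_{\text{Pesticides}}\Bigg\}}_{\text{Sites}}, \quad M \geq 1,\ N \geq 1$$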

where $\log C_{n,m}$ denotes the log10-transformed concentration of pesticide $n$ at sampling site $m$ of the country/region, $\overline{\log C_n}$ represents the global central tendency of the log-transformed concentration of pesticide $n$, $N$ is the number of pesticides averaged at a site, and $M$ is the number of sampling sites in the country/region. Pesticide concentration data typically follow skewed distributions and span several orders of magnitude across sampling sites and regions. Therefore, logarithmic transformation is applied in the equations to reduce the influence of extreme values and to stabilize variance, which is a common practice in environmental concentration analyses.13 However, the transformation may also compress differences in pesticide contamination levels among regions.

  • 17.

    Compile the pollution score data of countries into a table.

Note: To illustrate the calculation process, we provide a simplified numeric example implemented in the Supplementary Excel Material, where the global central tendency and regional contamination scores are calculated step by step using standard Excel functions.

Note: The contamination scores are used to compare relative concentration levels among spatial units based on available measurements. However, the results should be interpreted cautiously, combined with the sampling-site maps to reduce scale-induced bias.
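
As a complement to the Excel example, steps 15 and 16 can be sketched in base R. The toy table below is invented for illustration, and the column names (`Region`, `Site`, `CAS`, `Conc`) are assumptions. The sketch also assumes that the global mean for each pesticide is taken over the sites where that pesticide was measured, and that each site is averaged over the pesticides measured there.

```r
# Toy long-format data in ug/L (illustrative values chosen as powers of 10).
toy <- data.frame(
  Region = c("A", "A", "A", "B", "B"),
  Site   = c("A1", "A1", "A2", "B1", "B1"),
  CAS    = c("P1", "P2", "P1", "P1", "P2"),
  Conc   = c(1, 0.1, 10, 0.1, 0.001)
)

# Step 15: global central tendency of log10 concentration for each pesticide.
toy$LogC       <- log10(toy$Conc)
toy$GlobalMean <- ave(toy$LogC, toy$CAS, FUN = mean)

# Step 16, inner bracket: deviation of each measurement from the global mean.
toy$Deviation <- toy$LogC - toy$GlobalMean

# Average over pesticides within each site, then over sites within each region.
site_scores <- aggregate(Deviation ~ Region + Site, data = toy, FUN = mean)
scores      <- aggregate(Deviation ~ Region, data = site_scores, FUN = mean)
names(scores)[2] <- "S"

print(scores)  # Region A: S = 0.75; Region B: S = -1
```

In this toy case, Region A lies above the global central tendency (positive S) and Region B below it (negative S). Step 17 then amounts to saving `scores` as the regional score table.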

Mapping the contamination

Timing: 1 day

Here, we briefly describe the method for visualizing contamination scores and geographic locations of sampling sites.

  • 18.

    Download the spatial boundary data.

Note: Spatial boundary shapefiles (e.g., country, province, or county) can be obtained from the ArcGIS Hub.

  • 19.

    Import spatial boundary data into ArcGIS as a polygon layer.

  • 20.

    Import the regional contamination score table into ArcGIS.

CRITICAL: Check that the names of the spatial units in the contamination score table match those in the spatial boundary dataset.

  • 21.

    Join attribute data and generate contamination maps.

Note: Apply a graduated color or classified symbol system to visualize contamination scores across spatial units.

  • 22.

    Import the sampling site latitude and longitude data as a point layer.

  • 23.

    Overlay the sampling site layer onto the contamination map to visualize spatial coverage.
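
The name check flagged as CRITICAL after step 20 can be done in R before the score table is imported into ArcGIS. The sketch below uses base R; the score values, the boundary names, and the lookup table are illustrative assumptions, and the actual names depend on the boundary dataset that was downloaded.

```r
# Score table (illustrative values) and names taken from a boundary attribute table.
scores <- data.frame(
  Country = c("United States", "Viet Nam", "Brazil"),
  S       = c(0.2, -0.1, 0.4)
)
boundary_names <- c("United States of America", "Vietnam", "Brazil")

# Names in the score table that have no exact match in the boundary data.
unmatched <- setdiff(scores$Country, boundary_names)
print(unmatched)

# Hypothetical lookup used to harmonize the mismatching names.
lookup <- c("United States" = "United States of America",
            "Viet Nam"      = "Vietnam")
scores$JoinName <- ifelse(scores$Country %in% names(lookup),
                          lookup[scores$Country],
                          scores$Country)

# After harmonization, every name should be found in the boundary data.
stopifnot(length(setdiff(scores$JoinName, boundary_names)) == 0)
```

The `JoinName` column can then serve as the join field in step 21.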

Expected outcomes

By following this protocol, researchers will obtain a dimensionless pesticide contamination score (S) for each spatial unit (e.g., country, province/state, watershed, or grid cell), depending on the chosen scale. These scores highlight regions with elevated pesticide residues relative to the global average and enable cross-regional ranking. The scoring results thus indicate contamination hotspots, that is, regions where measured pesticide concentrations are markedly higher than elsewhere.

The ArcGIS mapping further delivers visual outputs, including country-level contamination scores and point-level sampling site distributions. These maps can be used to illustrate spatial patterns of pesticide contamination, identify clusters of high exposure, and provide intuitive support for environmental management and policy decisions.

Limitations

Firstly, the pesticide contamination scores calculated by this protocol can be influenced by the quantity of collected concentration data and the spatial balance of sampling sites. When the dataset is large and sampling sites are evenly distributed, the scores can represent a country's relative contamination level well. However, when data are limited or sampling sites are unevenly distributed, countries are not recommended as the spatial unit. For large or heterogeneous countries, users are encouraged to apply this protocol at sub-national scales (e.g., states, provinces, watersheds) to reduce spatial aggregation bias. Secondly, since this method is based solely on pesticide concentrations without considering safety thresholds, the resulting scores reflect only relative contamination levels and cannot indicate the potential risk to human health or the environment. In other words, a high pesticide contamination score for a country does not necessarily imply that its pesticide residues pose a risk; such risk must be assessed separately.

Troubleshooting

Problem 1

Insufficient concentration data for certain countries. [See Step 1].

Some countries have very few published studies or monitoring data, which may reduce the representativeness of the contamination scores.

Potential solution

Search additional literature, official monitoring reports, or contact local agencies to supplement the dataset. If still limited, mark these countries and interpret their scores with caution.

Problem 2

Errors during data reshaping from long to wide format in R. [See Step 14].

The pivoting process may fail if the dataset contains missing, duplicated, or inconsistent Code identifiers.

Potential solution

Ensure each row has a unique Code, clean and standardize column names, and check for missing values before constructing the computation framework. Perform small test runs before reshaping the full dataset.
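
The checks described above can be scripted before reshaping. The following base R sketch runs on a toy table with one duplicated and one missing Code; the values are illustrative, and the column names follow step 14.

```r
# Toy table containing a duplicated Code and a missing Code (illustrative values).
Data <- data.frame(
  Code    = c("1_CountryA_1912-24-9", "1_CountryA_1912-24-9",
              "2_CountryB_330-54-1", NA),
  CAS     = c("1912-24-9", "1912-24-9", "330-54-1", "330-54-1"),
  Concent = c(0.25, 0.30, 0.05, 1.2)
)

# Standardize column names by removing stray whitespace.
names(Data) <- trimws(names(Data))

# Codes that occur more than once, and the number of missing Codes.
dup_codes <- unique(Data$Code[duplicated(Data$Code) & !is.na(Data$Code)])
n_missing <- sum(is.na(Data$Code))

# Rows to review manually before running the pivot in step 14c.
review <- Data[is.na(Data$Code) | Data$Code %in% dup_codes, ]
print(review)
```

Whether duplicated records should be merged, averaged, or corrected is a judgment call that depends on why the duplication occurred.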

Problem 3

Influence of extreme concentration values on contamination scores. [See Step 16].

Extremely high or low pesticide concentration values may strongly influence the calculated contamination scores. These extreme values can therefore bias the relative comparison among regions.

Potential solution

Examine pesticide-level statistics and consider sensitivity analyses to assess the influence of extreme values on contamination scores.
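
One possible way to examine pesticide-level statistics is to flag extreme values in log space and compare summaries with and without them. The sketch below uses base R and the common 1.5 × IQR rule, which is an assumption chosen for illustration rather than part of the protocol; the data and column names are also illustrative.

```r
# Toy concentrations (ug/L) for a single pesticide, with one very high value.
obs <- data.frame(
  CAS  = rep("P1", 5),
  Conc = c(0.1, 1, 1, 10, 1e5)
)
obs$LogC <- log10(obs$Conc)

# Quartiles of the log10 concentration for each pesticide.
q1  <- ave(obs$LogC, obs$CAS, FUN = function(x) quantile(x, 0.25))
q3  <- ave(obs$LogC, obs$CAS, FUN = function(x) quantile(x, 0.75))
iqr <- q3 - q1

# Flag values outside the 1.5 * IQR fences (assumed rule).
obs$Extreme <- obs$LogC < q1 - 1.5 * iqr | obs$LogC > q3 + 1.5 * iqr

# Sensitivity check: mean log10 concentration with and without flagged values.
mean_all      <- mean(obs$LogC)
mean_filtered <- mean(obs$LogC[!obs$Extreme])
print(c(all = mean_all, filtered = mean_filtered))
```

If the two means differ substantially, the contamination scores can be recomputed without the flagged records to see whether the regional ranking changes.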

Problem 4

Uneven spatial distribution of sampling sites. [See Step 17].

Sampling sites may be concentrated in specific regions while other areas have few or no data points, which can bias country-level contamination scores.

Potential solution

Conduct sensitivity analyses to evaluate the effect of uneven sampling. Clearly report limitations in under-sampled regions.

Problem 5

Variation in sampling years. [See Step 17].

Pesticide concentration data span multiple years, and temporal differences may affect the comparability of contamination scores. Some countries may have older or sporadic data, leading to bias in the relative contamination assessment.

Potential solution

Consider performing sensitivity analyses to evaluate whether including only recent data (e.g., past 5 years) significantly changes the results. When possible, assign temporal weights or highlight older datasets to inform interpretation.
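
A temporal sensitivity analysis of this kind could be sketched as follows in base R. The function name `score_regions`, the column names, the toy values, and the 2019 cutoff (the last five years of a 2010–2023 dataset) are assumptions made for illustration; the scoring logic follows steps 15 and 16.

```r
# Hypothetical helper: regional scores from long-format data (steps 15 and 16).
score_regions <- function(df) {
  df$LogC <- log10(df$Conc)
  df$Dev  <- df$LogC - ave(df$LogC, df$CAS, FUN = mean)
  site    <- aggregate(Dev ~ Region + Site, data = df, FUN = mean)
  out     <- aggregate(Dev ~ Region, data = site, FUN = mean)
  setNames(out$Dev, out$Region)
}

# Toy data: Region A has one old, very high measurement (illustrative values).
toy <- data.frame(
  Region = c("A", "A", "B", "B"),
  Site   = c("A1", "A2", "B1", "B2"),
  Year   = c(2012, 2022, 2021, 2023),
  CAS    = "P1",
  Conc   = c(100, 1, 1, 1)
)

all_years <- score_regions(toy)
recent    <- score_regions(toy[toy$Year >= 2019, ])

# Change in score when only recent data are used.
shift <- recent - all_years[names(recent)]
print(shift)
```

Here the score of Region A falls and that of Region B rises once the 2012 record is excluded, which signals that the all-years ranking is driven by older data.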

Resource availability

Lead contact

Further information and requests for resources should be directed to and will be fulfilled by the lead contact, Zijian Li (lizijian3@mail.sysu.edu.cn).

Technical contact

Technical questions on executing this protocol should be directed to and will be answered by the technical contact, Zijian Li (lizijian3@mail.sysu.edu.cn).

Materials availability

This study did not generate new unique reagents.

Data and code availability

The article includes the code needed for the protocol, and any further questions can be directed to the lead contact.

Acknowledgments

This work was financially supported by the National Natural Science Foundation of China (32472598) and the Shenzhen Science and Technology Program (JCYJ20250604174437049).

Author contributions

Y.H., writing – review and editing, writing – original draft, methodology, and data curation; Z.L., writing – review and editing, writing – original draft, methodology, funding acquisition, data curation, and conceptualization.

Declaration of interests

The authors declare no competing interests.

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.xpro.2026.104408.

Supplemental information

Table S1. Example dataset and stepwise calculation of global central tendency and country-level contamination scores, related to step 17

(A) Example dataset. (B) Calculation of global central tendency. (C) Country-level contamination score calculation.

mmc1.xlsx (13.2KB, xlsx)

References

  • 1. Huang Y., Li Z. Global mapping of freshwater contamination by pesticides and implications for agriculture and water resource protection. iScience. 2025;28 doi: 10.1016/j.isci.2025.112861.
  • 2. FAO. Pesticides Use, Pesticides Trade and Pesticides Indicators: Global, Regional and Country Trends, 1990–2020. FAO; 2022.
  • 3. Ahmad M.F., Ahmad F.A., Alsayegh A.A., Zeyaullah M., Alshahrani A.M., Muzammil K., Saati A.A., Wahab S., Elbendary E.Y., Kambal N., et al. Pesticides impacts on human health and the environment with their mechanisms of action and possible countermeasures. Heliyon. 2024;10 doi: 10.1016/j.heliyon.2024.e29128.
  • 4. Tang F.H.M., Lenzen M., McBratney A., Maggi F. Risk of pesticide pollution at the global scale. Nat. Geosci. 2021;14:206–210. doi: 10.1038/s41561-021-00712-5.
  • 5. Zhou W., Li M., Achal V. A comprehensive review on environmental and human health impacts of chemical pesticide usage. Emerg. Contam. 2025;11 doi: 10.1016/j.emcon.2024.100410.
  • 6. Huang Y., Li Z. Assessing pesticides in the atmosphere: A global study on pollution, human health effects, monitoring network and regulatory performance. Environ. Int. 2024;187 doi: 10.1016/j.envint.2024.108653.
  • 7. Syafrudin M., Kristanti R.A., Yuniarto A., Hadibarata T., Rhee J., Al-onazi W.A., Algarni T.S., Almarri A.H., Al-Mohaimeed A.M. Pesticides in Drinking Water—A Review. Int. J. Environ. Res. Public Health. 2021;18:468. doi: 10.3390/ijerph18020468.
  • 8. Shekhar C., Khosya R., Thakur K., Mahajan D., Kumar R., Kumar S., Sharma A.K. A systematic review of pesticide exposure, associated risks, and longterm human health impacts. Toxicol. Rep. 2024;13 doi: 10.1016/j.toxrep.2024.101840.
  • 9. Siddique S., Chaudhry M.N., Ahmad S.R., Nazir R., Zhao Z., Javed R., Alghamdi H.A., Mahmood A. Ecological and human health hazards; integrated risk assessment of organochlorine pesticides (OCPs) from the Chenab River, Pakistan. Sci. Total Environ. 2023;882 doi: 10.1016/j.scitotenv.2023.163504.
  • 10. Nowell L.H., Norman J.E., Moran P.W., Martin J.D., Stone W.W. Pesticide Toxicity Index—A tool for assessing potential toxicity of pesticide mixtures to freshwater aquatic organisms. Sci. Total Environ. 2014;476–477:144–157. doi: 10.1016/j.scitotenv.2013.12.088.
  • 11. Zhang X., Li Z. Investigating industrial PAH air pollution in relation to population exposure in major countries: A scoring approach. J. Environ. Manage. 2023;338 doi: 10.1016/j.jenvman.2023.117801.
  • 12. Helsel D.R. Statistics for Censored Environmental Data Using Minitab and R. Vol. 77. John Wiley & Sons; 2011.
  • 13. Limpert E., Stahel W.A., Abbt M. Log-normal Distributions across the Sciences: Keys and Clues. Bioscience. 2001;51:341–352. doi: 10.1641/0006-3568(2001)051[0341:LNDATS]2.0.CO;2.


Articles from STAR Protocols are provided here courtesy of Elsevier
