Abstract

Freshwater cyanobacterial harmful algal blooms (cyanoHABs) are a worldwide problem resulting in substantial economic losses, due to harm to drinking water supplies, commercial fishing, wildlife, property values, recreation, and tourism. Moreover, toxins produced from some cyanoHABs threaten human and animal health. Climate warming can affect the distribution of cyanoHABs, where rising temperatures facilitate more intense blooms and a greater distribution of cyanoHABs in inland freshwater. Nutrient runoff from adjacent watersheds is also a major driver of cyanoHAB formation. While some of the physicochemical factors behind cyanoHAB dynamics are known, there are still major gaps in our understanding of the conditions that trigger and sustain cyanoHABs over time. In this perspective, we suggest that sufficient data sets, as well as machine learning (ML) and artificial intelligence (AI) tools, are available to build a comprehensive model of cyanoHAB dynamics based on integrated environmental/climate, nutrient/water chemistry, and cyanoHAB microbiome and ‘omics data to identify key factors contributing to HAB formation, intensity, and toxicity. By taking a holistic approach to the analysis of all available data, including the rapidly growing number of biological data sets, we can provide the foundational knowledge needed to address the increasing threat of cyanoHABs to the security of our water resources.
1. Introduction
Climate change is causing a rise in the appearance and intensity of harmful algal blooms (HABs) in coastal and inland waters and in areas of the world where they have never been seen before.1−5 HABs affect aquatic ecosystems directly by producing toxins that affect other aquatic species, as well as birds, other animals, and humans, and indirectly by generating large hypoxic areas at the HAB site.6 HABs also result in economic losses of several billions of dollars annually in the United States alone, from harm to fishing, recreation, and tourism industries and to drinking water supplies.7 For example, in August 2014, the toxic cyanobacterial (cyano)HAB in Lake Erie caused by Microcystis aeruginosa forced shutdown of the water supply to over 500 000 residents of Toledo, Ohio, for >2 days8 (Figure 1). This cyanoHAB was subsequently attributed to a complex interaction of biological and environmental factors.9
Figure 1.

(top) HABs are a common occurrence in western Lake Erie (image used with permission from U.S. Geological Survey). (bottom left, right) CyanoHABs can be toxic to wildlife and to humans and pets who use the water for recreation (images licensed from Dreamstime, https://www.dreamstime.com/).
Extensive research of freshwater blooms has resulted in the characterization of independent parameters (e.g., abundance of phosphorus, nitrogen, iron, water temperature) influencing HAB onset.10−13 A better understanding of HABs can enable the development of more predictive signatures for forecasting and mitigating toxic HAB formation; however, the complexity of bloom dynamics and the difficulties of reproducing a bloom in a lab setting14,15 complicate the integration of HAB-associated measurements into a model explaining bloom formation and toxin release.10,16,17 It is our position that machine learning (ML) and artificial intelligence (AI) based frameworks are well-suited for the integration and analysis of biological and environmental data from naturally occurring HABs and that such an approach would greatly improve our ability to predict cyanoHABs, enable decision-makers to implement mitigation strategies, and provide a means to discover underlying mechanisms.18,19
ML and AI technologies have proved to be transformative for the analysis and interpretation of big science data.20 Notably, remote sensing coupled with ML technology can predict near-term bloom onset; however, the associated predictive uncertainties increase over longer timeframes.16 Other ML-based models predict individual aspects of cyanobacterial HABs.21−25 However, these models address only the intensity of a HAB, e.g., predictions based on chlorophyll content, and do not address other key properties, such as toxin titer, hypoxia, and basification. In the future, as continued monitoring and studies of cyanoHABs in a collaborative research environment produce an abundance of physical, chemical, and biological data sets, we posit that explainable ML will become essential for comprehensive analyses of the factors leading to cyanoHAB formation and toxin release, and that these analyses can inform the development of decision support tools to secure water systems and protect public health.
2. Biological Aspects of HABs
The biological causal agents of HABs belong to a range of taxonomic groups; to date (April 2023) the Intergovernmental Oceanographic Commission of the United Nations Educational, Scientific, and Cultural Organization (IOC-UNESCO) Taxonomic Reference List of Harmful Micro Algae contains 203 strains belonging to diatoms, haptophytes, dinoflagellates, raphidophyceans, dictyochophyceans, pelagophyceans, and cyanobacteria.26 This diversity explains the widespread occurrence of HABs at the global scale, underscored also by the variety of toxins and biological agents and the adverse effects within a given water ecosystem.27 For example, members of the genus Alexandrium (marine dinoflagellates) may have the capacity to produce saxitoxins, spirolides, goniodomins, unknown lytic molecules, etc.,28 which affect the fauna of coastal areas. Conversely, a type of toxin, microcystin, can be produced by certain cyanobacteria of the genera Anabaena, Microcystis, and Planktothrix29 in freshwater lakes and reservoirs, and even on land after irrigation with contaminated water.30 Among these, Microcystis is the most well-studied, as reflected in the number of publications (Figure 2) on toxic cyanobacterial genera identified by Skulberg et al.31 and Cronberg et al.32
Figure 2.
Number of publications on toxic cyanobacterial genera31,32 that exist in the Web of Science database as of April 2023.
In the United States, Microcystis spp. are some of the most studied cyanoHAB-producing freshwater cyanobacteria.33 The factors that induce microcystin production are varied and not entirely understood; these include nutrient status,34 trace metal availability,35 and abundance of reactive oxygen species (ROS),36 among others. Conversely, microcystins may participate in a variety of metabolic roles, such as in ROS protection under high light and oxidative conditions,37 as a moderate siderophore (chelating iron)38 or iron shuttle39 during iron-limiting conditions, and as a defensive molecule against grazers and niche competitors.40,41 Microcystins are endotoxins and stay within the cellular cytoplasm until population senescence and cellular breakage. After release, they can linger in aquatic systems for weeks.42 While abiotic factors, such as pH and UV light, influence the degradation of microcystins in the environment,43 interestingly there are microcystin-degrading microbes within a post-HAB community that can assist in faster toxin breakdown.44 Moreover, cyanobacteria are known to form symbiotic relationships with higher organisms and other bacteria in their marine ecosystems.45 In addition, local environmental conditions may drive differences in bacterial community composition in freshwater ecosystems.21,46 Increased knowledge about community composition at cyanoHAB sites may provide insight into the identification of sentinel species that may help predict cyanoHAB formation, toxicity, or other dynamics.
Therefore, while axenic cultures have long been considered the standard for research, it has become evident that a holistic approach is necessary to model and characterize the HAB dynamics found in both natural and laboratory settings.47 For example, Pound et al.48 were able to attribute the observed experimental variability in Microcystis spp. microcosm studies to the activity of Microcystis-infecting phages—an effect which would have otherwise been overlooked and unreported. The microbiome associated with HAB strains certainly plays a role in the progression of the bloom, as observed in Harsha Lake (Ohio) transcriptomes, where a relationship between nutrient availability and recycling and the bacterial population dynamics was observed, as well as for a cyanophage surge and the corresponding Microcystis decline.49 Besides their role in nutrient availability, heterotrophic bacteria can also complement the metabolic capabilities of cyanoHAB strains, as suggested in phylosymbiosis studies where Roseomonas and Rhodobacter showed high cophylogeny with Microcystis.50 Consequently, studying the microbiome associated with Microcystis HABs can further provide insights for modeling the mechanisms used for its efficient competition for nutrients and light as well as for designing community-based mitigation strategies after toxin release. By evaluating HAB data sets independently, and within the context of their physiology and ecosystem, we can gain a better understanding of causal and functional parameters of HABs, as shown in the review by Dick et al.,51 which can improve toxicity prediction and the effectiveness of mitigation strategies.
3. Data Resources
Data on HABs has been gathered since 1954.52 Several countries and regions have set up monitoring and reporting systems that are accessible online, which has led to the accumulation of large data sets and several publications (Figure 3). However, the data sets are independently curated, use different instrumentation and data storage formats, and have different levels of spatiotemporal resolution. Therefore, substantial data engineering is required for these data sets to be fused and holistically analyzed. We note that most research on HABs has utilized primarily physical and chemical measurements. In Table 1, we present examples of large data sources that are, or can be, leveraged to understand the ecology of HABs.
Figure 3.
CyanoHAB data ecosystem depicting cyanoHAB data sources and sampling approaches. These include grab samples or fixed/buoy telemetry sampling directly from the lake, weather conditions from telemetry stations on land, and satellite remote sensing of HABs. By analysis of the data collectively it will be possible to build a model that will ultimately predict cyanoHAB occurrence, persistence, and toxicity and inform decision-makers.
Table 1. Examples of Large Data Resources That Can Be Synergized for a Better Understanding of Algal Blooms from the Top Four Countries/Regions to Publish Data on HABs.
| source | description | URL |
|---|---|---|
| NOAA | National Oceanic and Atmospheric Administration: Several programs and services collecting physical, chemical, and biochemical data are available through the National Centers for Environmental Information. | https://www.ncei.noaa.gov |
| NASA | National Aeronautics and Space Administration: NASA provides access and support for the Cyanobacterial Assessment Network (CyAN), including mobile app development and deployment; bloom intensities are quantified using the Cyanobacterial Index (CI). | https://www.nccs.nasa.gov/services/climate-data-services |
| EPA | United States Environmental Protection Agency: EPA communicates essential information regarding algal blooms and coordinates with state environmental agencies to gather and disseminate data on algal blooms. | https://www.epa.gov/water-research/harmful-algal-blooms-monitoring-and-remote-sensing-research |
| USGS | United States Geological Survey: USGS leads and partners with several research institutions on algal bloom research and provides data sets from the Great Lakes through the Great Lakes Science Center. | https://www.usgs.gov/centers/great-lakes-science-center/data |
| GLOS | Great Lakes Observing System: GLOS is a multi-institutional program focused on monitoring the Great Lakes. | https://glos.org/ |
| NEON | National Ecological Observatory Network: NEON is a program supported by the National Science Foundation and operated by Battelle; several data sets are provided on water and benthic physical, chemical and biological parameters. | https://data.neonscience.org/data-products/explore |
| HABSOS | Harmful Algal BloomS Observing System: HABSOS is a primary observation program run by NOAA, to track HABs in Florida and Texas. | https://habsos.noaa.gov/ |
| NETLAKE | Networking Lake Observatories in Europe: This is a multi-institutional program collecting data on lakes in Europe, Ireland, and North America (excluding the Great Lakes). | https://www.dkit.ie/netlake/netlake-resources/netlake-metadatabase |
| HAEDAT | Harmful Algal Event Database: HAEDAT is a harmful algal bloom event metadatabase maintained by the International Oceanographic Data and Information Exchange. Records are available from 1985. | http://haedat.iode.org/ |
| Aquastat, FAO | United Nations Food and Agriculture Organization: This organization provides global water quality data. | https://wbwaterdata.org/organization/about/aquastat |
| Waterbase, Europe | Data is provided by the European Environmental Agency on water quality and biology. | https://www.eea.europa.eu/data-and-maps/data/waterbase-water-quality-icm; https://www.eea.europa.eu/data-and-maps/data/waterbase-biology |
| LAGOS-US/NE | Lake multiscaled geospatial and temporal database: This is a multi-institutional project that collects morphometric, water quality, and algal bloom data in the United States (except the Great Lakes region). | https://lagoslakes.org/products/ |
| NCBI SRA | National Center for Biotechnology Information Sequence Read Archive: This is a repository for sequencing data, not specific to HABs, but several thousand HAB related data files are available. | https://www.ncbi.nlm.nih.gov/sra |
| NCBI Refseq | National Center for Biotechnology Information Reference sequences: This is a curated and well-annotated set of nucleic and amino acid sequences. | https://www.ncbi.nlm.nih.gov/refseq/ |
| JGI | Joint Genome Institute: This is a U.S. Department of Energy center that offers standardized nucleic acid data from projects around the world. | https://jgi.doe.gov/ |
| CyanometDB | CyanometDB is a detailed secondary metabolite database for cyanobacteria. | https://comptox.epa.gov/dashboard/chemical-lists/CYANOMETDB |
| CyanoPATH | CyanoPATH is a database of biological pathways specifically associated with harmful algal blooms. | http://www.csbg-jlu.info/CyanoPATH/ |
| Water Data Online, Australia | This is a metadatabase on water physical parameters from water bodies across 6000 stations in Australia. | https://researchdata.edu.au/water-online/1369955 |
| CNEMC, China | China National Environmental Monitoring Center: This gathers data on water quality and algal blooms in China. | https://szzdjc.cnemc.cn:8070/GJZ/Business/Publish/Main.html |
| NIES, Japan | National Institute of Environmental Studies: This gathers data on water quality and algal blooms in Japan. | https://tenbou.nies.go.jp/gis/ |
| WQQDB, Germany | Water Quality and Quantity DataBase: This gathers data on water quality and morphometry in Germany. | https://www.ufz.de/record/dmp/archive/7754/de/ |
3.1. Physical Data
Physical parameters such as weather (temperature and precipitation), water pH, water temperature, and water turbidity (often reported as transparency in terms of Secchi depth) can influence several characteristics of freshwater ecosystems. It is not only their measured value but also the temporal trends that can be influential. For example, sustained elevations in water temperature can favor the growth of the cyanoHAB forming Microcystis spp. while inhibiting others. Heavy and sustained precipitation can lead to hydrological changes, which, in turn, can change chemical parameters as well as increase water turbidity. Water turbidity affects light penetration, and this, in turn, can regulate toxin production.53 Physical data is obtained either on site at water bodies of interest, through remote sensing and satellite imagery, or at regional weather offices. Organizations in several countries have data portals (Table 1) to national repositories for water quality data. In the United States, several hydrological systems that supply drinking water and accept wastewater are monitored using inline sensors and have dedicated data streams available from NOAA, U.S. Geological Survey (USGS), and the U.S. Environmental Protection Agency’s (EPA) Central Data Exchange (CDX). We note that several independent studies have been carried out with similar data acquisition that are found only through literature meta-analyses.54−57 Gaps and corruptions in physical data exist, most likely due to instrument malfunctions. Even though these are infrequent, so are HABs, from a data acquisition perspective. Therefore, it is important to consider developing advanced gap-filling techniques to prepare the existing data for ML projects. In many cases, there are redundant and proxy data sources that can be used to fill or correct data points. The time resolution of data can also vary widely, and this can make it difficult to implement time-series-based predictive models.
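As a minimal sketch of the simplest form of gap filling mentioned above, the following Python example linearly interpolates short runs of missing sensor readings and leaves longer outages untouched. The function name `fill_gaps`, the `max_gap` cutoff, and the example temperatures are illustrative assumptions rather than part of any cited workflow; more advanced approaches (for example, drawing on redundant or proxy data sources) would replace the interpolation step.

```python
from typing import List, Optional


def fill_gaps(values: List[Optional[float]], max_gap: int = 3) -> List[Optional[float]]:
    """Linearly interpolate runs of missing values (None) no longer than max_gap.

    Longer runs, and runs touching either end of the series, are left as None
    so that downstream models can treat them explicitly.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i                       # first missing index of this run
        while i < n and filled[i] is None:
            i += 1
        end = i                         # first valid index after the run (or n)
        gap = end - start
        if start == 0 or end == n or gap > max_gap:
            continue                    # no anchor on one side, or outage too long
        left, right = filled[start - 1], filled[end]
        step = (right - left) / (gap + 1)
        for k in range(gap):
            filled[start + k] = left + step * (k + 1)
    return filled


# Hypothetical hourly water temperatures (deg C) with sensor dropouts.
water_temp = [20.0, None, None, 23.0, None, None, None, None, 25.0, None]
print(fill_gaps(water_temp, max_gap=3))
# [20.0, 21.0, 22.0, 23.0, None, None, None, None, 25.0, None]
```

Capping the interpolated gap length is a deliberate choice in this sketch: because blooms are themselves rare events, smoothing over a long outage could erase exactly the signal a model needs to learn.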
3.2. Chemical Data
Chemical parameters describe the form and abundance of micro- and macronutrients, ultimately influencing their bioavailability. Commonly measured chemical parameters include total nitrogen (N), total phosphorus (P), dissolved oxygen (DO), and alkalinity (pH). Such chemical data is available in public databases like the Environmental Research Division Data Access Program (ERDDAP) and CDX, hosted by NOAA and EPA, respectively. Because different bloom forming strains react uniquely to specific compositions of N and P molecular species and these reactions are not fully understood,58 we suggest delineation of the different N (e.g., nitrate, ammonia, organic nitrogen) and P (e.g., soluble reactive phosphorus, particulate phosphorus, dissolved inorganic phosphorus, and dissolved organic phosphorus) species present in the ecosystem of interest, especially if ML is to be leveraged to gather predictive insights. However, the collection of chemical species data is labor intensive, and therefore, in many cases, chemical species data is collected only during algal blooms or a few days in advance. There can also be time lags between the acquisition and the processing of the sample for chemical speciation. This can lead to systematic gaps in the data in the context of global data acquisition. In the future, if more data on chemical speciation can be collected and made publicly available in a timely manner, this may greatly add to insight on bloom dynamics and ecological outcomes (e.g., duration of sustained biofilms, toxin release, biological community succession) that can be learned using ML approaches.
3.3. Biological Data
Several types of data are available and cover a wide range of sensitivities and specificities with regard to identification of microorganisms in a water sample. Image-based analyses such as microscopy have been used in several programs to monitor areas prone to algal blooms.59 The current state of the art is to gather molecular data in the form of environmental nucleic acid populations, as well as using enzyme linked immunosorbent assays (ELISAs) to identify biological entities and their toxic products, respectively. Molecular characterization of HABs is becoming increasingly reliant on ‘omics technologies such as genomics, transcriptomics, and proteomics, but these have been used by only 2.6% of all publications on HABs and report mostly on Microcystis dominated blooms.
There are over 45 000 depositions in the NCBI Sequence Read Archive (NCBI SRA) with “freshwater” and “lake” as keywords (search query: lake [Text Word] AND freshwater [Text Word]), with over 6200 samples from the Great Lakes. Most of these reads are 16S rRNA polymerase chain reaction (PCR) amplicons, deep-sequenced using Illumina next generation sequencing platforms. This data is commonly used as a fingerprint for classification of organisms into operational taxonomic units (OTUs). In environmental samples, this is a useful tool to determine the presence and abundance of different species, which is a common measure of biodiversity. One can also use bioinformatics tools such as PICRUSt2 to approximate biological function potential.60 We collectively refer to these activities as low resolution metagenomics (LRM). LRM data is being collected much more frequently from important watersheds such as the Great Lakes, given the spate of novel bloom characteristics that have been identified in recent years.9,61 Deeper characterization of environmental samples, using high resolution metagenomics (HRM) and metatranscriptomics (MT), is becoming more common and is mainly focused on the identification of biosynthetic gene clusters (BGCs) that are responsible for the formation of toxins in cyanobacteria.62 However, we note that the processes for such characterizations are much more expensive and less standardized63,64 than those for LRM. We speculate that continued reduction in the cost of DNA sequencing and advancement of direct RNA sequencing methods will lead to even greater adoption of HRM and MT by researchers as they provide much more detailed information about a sample. For example, they allow resolution of species (narrower OTUs than genera), more accurately measure genetic variation and its linkage to phenotype variation,51 and offer clues on the functional status of genes and regulation of their activity. Furthermore, they allow detection of molecular signatures for viruses, which do not have ribosomes but are abundant in the environment and can have both beneficial and pathogenic functions.65 Lastly, novel genes can be identified using HRM and MT approaches.
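To make the link between OTU tables and biodiversity concrete, the sketch below converts per-OTU read counts into relative abundances and computes the Shannon diversity index, one commonly used summary statistic. The choice of the Shannon index, the function names, and the read counts are assumptions made for illustration; the text above does not prescribe a particular diversity measure.

```python
import math
from typing import Dict


def relative_abundance(otu_counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw read counts per OTU into fractions of the sample total."""
    total = sum(otu_counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {otu: count / total for otu, count in otu_counts.items()}


def shannon_index(otu_counts: Dict[str, int]) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    props = relative_abundance(otu_counts)
    return -sum(p * math.log(p) for p in props.values() if p > 0)


# Hypothetical genus-level read counts before and during a bloom.
pre_bloom = {"Microcystis": 250, "Synechococcus": 250, "Roseomonas": 250, "Rhodobacter": 250}
bloom = {"Microcystis": 970, "Synechococcus": 10, "Roseomonas": 10, "Rhodobacter": 10}

print(round(shannon_index(pre_bloom), 3))  # evenly mixed community: ln(4)
print(round(shannon_index(bloom), 3))      # community dominated by one genus: lower H
```

A drop in such an index over successive samples is the kind of low-cost LRM-derived feature that could be passed to an ML model alongside physical and chemical measurements.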
Although LRM data is currently much more abundant, ML projects on HABs reported in the literature have focused on HRM data.66 The usefulness of LRM data for ML with respect to lake ecology remains undetermined, although successful applications have been demonstrated with predictions of soil productivity.67,68 We therefore suggest greater adoption of LRM data usage in ML of HAB dynamics will be beneficial in the near future, while in the distant future, affordable HRM and MT capabilities will increase the predictive power of ML models for HABs.
3.4. Bloom Occurrence Data
The occurrence of blooms has been tracked historically by community observations. Several national and state-level citizen monitoring programs are in place, and these offer cross-examinable information on bloom occurrence. Some such programs are accessible through the Harmful Algae website (https://hab.whoi.edu/regions-resources/citizen-science-programs) hosted by the Woods Hole Oceanographic Institution in the United States. In Sweden, studies led by the Stockholm (https://www.su.se/english/research/research-projects/the-citizen-science-algae-project) and Gothenburg (https://www.gu.se/en/ocean/ocean-research/marine-citizen-science) universities are also leveraging citizen science for HAB reporting. Recently, satellite imagery and remote sensing have become more popular for monitoring bloom occurrence. The Landsat and Sentinel satellite platforms are used by multiple agencies such as NASA, NOAA, and the USGS to record bloom occurrence in the United States. Global occurrences of algal blooms are available on the Harmful Algal Event Database (HAEDAT) (see Table 1).
4. Spatiotemporal Scale and Resolution
Data on water bodies in which HABs have been documented span vast spatial scales and are collected at varying frequencies. For example, on one end of the spectrum, automated sampling at water bodies provides high temporal resolution (minute-resolved) but the data represents the vicinity (a few meters around the sampling instruments, microscale). At the other end, satellites can provide hyperspectral imagery over large swaths of the Earth (synoptic-planetary scale). For instance, the Great Lakes Observing System (GLOS) focuses on monitoring the Great Lakes and collates physical, chemical, and biological data on per minute, hourly, and daily bases, while satellite imagery for the same region is available on a weekly basis via the NOAA Moderate Resolution Imaging Spectroradiometer (MODIS) database. The distributions of data, organized by the temporal resolution, spatial scale, and data type, are presented in Table 2.
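One practical consequence of these mismatched resolutions is that data streams must be brought to a common time step before they can be fused. The following sketch, under the assumption of a daily target resolution, averages minute-resolved buoy readings by calendar day and pairs them with sparser satellite observations. The function names, variable names, and all values are hypothetical and do not reflect the actual GLOS or MODIS data formats.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple


def daily_means(readings: Iterable[Tuple[datetime, float]]) -> Dict[date, float]:
    """Aggregate high-frequency sensor readings to one mean value per calendar day."""
    buckets = defaultdict(list)
    for timestamp, value in readings:
        buckets[timestamp.date()].append(value)
    return {day: mean(values) for day, values in buckets.items()}


def join_on_day(
    buoy: Dict[date, float], satellite: Dict[date, float]
) -> List[Tuple[date, Optional[float], float]]:
    """Pair each satellite observation with the buoy mean for the same day.

    Days without buoy coverage get None so the gap stays visible downstream.
    """
    return [(day, buoy.get(day), satellite[day]) for day in sorted(satellite)]


# Hypothetical minute-resolved buoy chlorophyll readings (ug/L).
buoy_readings = [
    (datetime(2023, 8, 1, 0, 0), 10.0),
    (datetime(2023, 8, 1, 0, 1), 12.0),
    (datetime(2023, 8, 8, 12, 0), 30.0),
]
# Hypothetical weekly satellite bloom index values for the same location.
satellite_index = {date(2023, 8, 1): 0.02, date(2023, 8, 8): 0.05, date(2023, 8, 15): 0.04}

for row in join_on_day(daily_means(buoy_readings), satellite_index):
    print(row)
```

The spatial mismatch (a few meters around a buoy versus a satellite pixel) is not addressed here and would need its own treatment.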
Table 2. Data Availability at Different Meteorological Spatial Scales and Temporal Resolutions (a)
(a) Gray shading indicates that data is available at the given spatial scale and temporal resolution. Example data sources cited are the following: GLOS Seagull, sponsored by the Great Lakes Observing System (GLOS) (https://glos.org/priorities/seagull/); the Joint Genome Institute Genome Portal, sponsored by the U.S. Department of Energy (DOE) (https://jgi.doe.gov); NOAA Climate Data Online (CDO) (https://www.ncei.noaa.gov/cdo-web/datasets); the Lake Erie Sensor Network (http://lees.geo.msu.edu/sensor_net/); the Level-1 and Atmosphere & Distribution System, Distributed Active Archive Center (LAADS DAAC), sponsored by NASA (https://ladsweb.modaps.eosdis.nasa.gov); and the U.S. Environmental Protection Agency (EPA) Water Quality Portal (https://www.waterqualitydata.us).
5. Machine Learning Methods Applied to HABs
5.1. Summary of ML Tools Developed to Analyze and Forecast Algal Blooms
Classical hydroecological models can efficiently simulate geophysical processes.69 In a review of HAB modeling in the face of climate change,70 the authors present a comprehensive summary of the use of these process-based HAB models and make several recommendations for how the field can move forward by using them to explicitly represent key physical and biological factors in HAB development. Satellite-based remote sensing methods such as the Cyanobacteria Index (CI-cyano) algorithm, a spectral shape based algorithm,71 are also used for the detection of cyanobacteria in water bodies. Recently, in a novel approach of combining multiple data sources, Mishra et al.72 compared the correspondence between cyanoHAB information from field and satellite observations in an effort to evaluate CI-cyano and its ability to detect cyanobacteria from satellite images. With the use of microcystin toxin concentrations as a proxy for cyanobacteria levels in the field, the CI-cyano bloom product generated from satellite data was evaluated and demonstrated to have about 84% accuracy for detecting cyanoHABs. Such examples of data fusion, however, are rare and conventional models are inherently limited to single or related data sources, which makes it difficult to describe the HAB formation process holistically, thus resulting in limited applicability. Additionally, phytoplankton dynamics are regulated by complex interactions and nonlinear mechanisms, all of which are not known a priori and hence cannot be integrated into process-based models.
Here, we focus on the use of ML and AI tools in HAB modeling as a complement to traditional modeling techniques. Over the last few decades, ML methods have increasingly been used for HAB detection and forecasting due to their inherent ability to handle very large amounts of data.
In principle, the ML models focus mainly on the relationship mapping between inputs and outputs of a system rather than complex process mechanisms. By learning from a large mass of historical data that has included the dynamic evolution process (e.g., environmental conditions, water quality, remote sensing data, and HAB growth), the highly nonlinear relationships may be accurately modeled. Thus, the data sources relevant to HAB formation and growth are distinct and varied. ML models particularly excel at dealing with multisource data which make them especially relevant in this case. Additionally, the HAB models need to address, separately or sequentially, the phenomena of HAB formation and HAB growth. The choice of ML algorithm is contingent on both the data set(s) being used to train the models and the modeling objective, i.e., HAB formation and/or growth. Therefore, a wide variety of ML algorithms have been used to build models for HAB/cyanoHAB prediction and forecasting, such as artificial neural networks (ANNs),73−81 deep convolutional neural networks (CNNs),82 support vector machines (SVMs),81,83−86 and tree-based methods such as random forests (RF)87−89 and gradient boosting.90,91 Recknagel et al.,73,74 who pioneered the use of ANNs as early as 1996, showed that the ANN approach is a successful method of modeling such complex and nonlinear phenomena as algal blooms in freshwater systems with different environmental conditions. When validated on test data, their models showed good agreement for predictions of the occurrence of specific algae species in four disparate freshwater systems and were able to realistically predict timing, magnitudes, and succession of several algae species, such as Microcystis, Oscillatoria, and Phormidium.
Evolutionary algorithms such as genetic programming have also been successfully applied to HABs.77,92,93 Recknagel et al.94 compared ANN and genetic programming techniques to model freshwater algal blooms in Lake Kasumigaura in Japan and found that, while ANNs allow seven-days-ahead predictions of timing and magnitudes of algal blooms with reasonable accuracy, models explicitly synthesized by genetic algorithms (GAs) perform better in seven-days-ahead predictions of algal blooms. In addition, GAs provide more transparency and insight as compared to black-box ANN models.
Almuhtaram et al.95 trained four different unsupervised ML algorithms, namely, local outlier factor (LOF), one-class SVM, elliptic envelope, and isolation forest (iForest), on data collected at four buoys in Lake Erie from 2014 to 2019 to detect anomalies in phycocyanin fluorescence data without the need for corresponding cell counts or biovolume. Predictions were validated using remote sensing data. They found that the one-class SVM and elliptic envelope perform best for detecting potential cyanoHABs using only fluorescence data sets, with accuracies greater than 90%.
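The general idea of label-free anomaly detection on a fluorescence series can be illustrated with a much simpler rule than the four algorithms used in that study. The sketch below flags readings whose robust z-score, based on the median and the median absolute deviation (MAD), exceeds a threshold. This is a swapped-in stand-in chosen because it needs only the Python standard library; it is not LOF, one-class SVM, elliptic envelope, or isolation forest, and the threshold of 3.5 and the data are assumptions.

```python
from statistics import median
from typing import List


def mad_anomalies(values: List[float], threshold: float = 3.5) -> List[bool]:
    """Flag points whose robust z-score (median/MAD based) exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No measurable spread around the median, so nothing can be scored.
        return [False] * len(values)
    # 0.6745 rescales the MAD so scores are comparable to standard z-scores
    # when the bulk of the data is roughly normal.
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


# Hypothetical phycocyanin fluorescence readings (relative units) from one buoy.
phycocyanin = [1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 9.5, 1.1, 0.9, 1.0]
flags = mad_anomalies(phycocyanin)
print([i for i, flagged in enumerate(flags) if flagged])  # [6]
```

As in the study described above, flagged points would still need to be validated against an independent source, such as remote sensing data, before being treated as potential cyanoHABs.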
Heddam et al.96 built multiple ML models including ANNs, extreme learning machine (ELM), random forest regression (RFR), and random vector functional link (RVFL) for modeling cyanobacteria at two rivers located in the United States using only water quality variables. They found that good predictive accuracy was obtained using the RFR model, while the other models, i.e., ANN, RVFL, and ELM, failed to provide a good estimation of the cyanobacteria concentrations.
On the remote sensing side, ML algorithms have been used extensively with high accuracy to predict patterns in algal blooms via satellite image classification and characterization.96−99 However, such studies are more often used to predict saltwater HABs. Recently, long–short-term-memory (LSTM) networks, which are a variety of recurrent neural networks (RNNs), have been used for time series analysis for HAB proxies in inland water systems.100 RNNs enable the classification and characterization of temporal signals, while LSTMs are capable of learning long-term dependencies, especially in temporal information. Ai et al.101 built classification and regression models for short-term prediction of algal blooms in the Lake Erie area. They compiled a large data set consisting of the chlorophyll a index and a combination of riverine (the Maumee and Detroit Rivers) and meteorological features and used it to train ML-based classification and regression models for 10 day scale bloom predictions. Their analysis found that nitrogen loads, time, water levels, soluble reactive phosphorus load, and solar irradiance are the most important features for cyanoHAB control. In their models, they considered both long- and short-term nitrogen loads. In addition, LSTM-based neural network architecture was used to model the temporal behavior of four short-term features (N, solar irradiance, and two water levels), which was then used as an input to the RF-based classification model. This is an effective approach to address the issue of making predictions when actual feature measurements are not available going forward in time (a practical scenario for most real-life situations). However, current studies are limited to short-term forecasts only.
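Whatever model is used, multi-day-ahead forecasting first requires the time series to be framed as supervised examples. The sketch below shows one common framing, assuming a univariate daily series: each input is a window of recent values and each target is the value a fixed horizon later. It does not reproduce the LSTM or random forest models of Ai et al.; the function name `make_windows`, the window length, and the synthetic series are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], n_lags: int, horizon: int
) -> Tuple[List[List[float]], List[float]]:
    """Build (X, y) pairs for horizon-step-ahead prediction.

    Each row of X holds n_lags consecutive observations ending at time t;
    the matching y is the observation at time t + horizon.
    """
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(n_lags - 1, len(series) - horizon):
        X.append(list(series[t - n_lags + 1 : t + 1]))
        y.append(series[t + horizon])
    return X, y


# Synthetic 20-day chlorophyll a index series (values are placeholders).
chl_index = [float(day) for day in range(20)]
X, y = make_windows(chl_index, n_lags=5, horizon=10)
print(len(X))      # 6 training examples
print(X[0], y[0])  # [0.0, 1.0, 2.0, 3.0, 4.0] 14.0
```

When such windows are split into training and test sets, the split should be chronological so that no test target precedes a training input.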
Yuan et al.102 developed an AI-chip-based algae monitoring system for real-time algae species classification and HAB prediction. The system performs on-site image processing and classification using a low-power edge AI device and was tested on a data set of 11 250 algae images containing the 25 most common HAB classes in Hong Kong subtropical waters with 99.87% test accuracy. Lee et al.103 developed a CNN model using eight water quality variables and four weather variables to predict the concentration of chlorophyll a in four major Korean rivers. In addition, Deep SHAP, a high-speed approximation algorithm for computing SHAP (SHapley Additive exPlanations)103−105 values in deep learning models, was applied to aid policy decision-making and to identify the influence of the variables affecting chlorophyll a. They envision that their monitoring system can predict HAB spread, identify variable influences to aid decision-makers, and effectively implement preemptive responses, thus reducing economic losses and preserving aquatic ecosystems.
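Deep SHAP approximates Shapley values for deep networks; for a model with only a few inputs, the same quantities can be computed exactly by enumerating feature coalitions. The sketch below does this for a toy function. It is not the method of Lee et al.; `toy_model`, the feature names, and the baseline values are hypothetical, and features outside a coalition are simply set to a fixed baseline, which is one common simplification.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, Set


def shapley_values(
    model: Callable[[Dict[str, float]], float],
    instance: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values of `model` at `instance` relative to `baseline`."""
    names = list(instance)
    n = len(names)

    def value(coalition: Set[str]) -> float:
        # Features in the coalition take the instance value, the rest the baseline.
        x = {k: instance[k] if k in coalition else baseline[k] for k in names}
        return model(x)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {name}) - value(members))
        phi[name] = total
    return phi


def toy_model(x: Dict[str, float]) -> float:
    # Hypothetical response with an interaction term; not a fitted model.
    return 2.0 * x["temperature"] + x["phosphorus"] + 0.5 * x["temperature"] * x["phosphorus"]


instance = {"temperature": 2.0, "phosphorus": 3.0}
baseline = {"temperature": 0.0, "phosphorus": 0.0}
phi = shapley_values(toy_model, instance, baseline)
print(phi)
```

The attributions sum to the difference between the model output at the instance and at the baseline, and the interaction term is split evenly between the two features. Exact enumeration scales exponentially with the number of features, which is why approximations such as Deep SHAP are used for larger models.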
In summary, a wide variety of traditional ML methods such as unsupervised clustering, decision trees, and neural networks have been applied to forecast HABs and have yielded limited but valuable insights. The focus has naturally been on HAB forecasting and less so on developing a holistic understanding of all the features that contribute to HAB formation and using this knowledge to drive prediction. This is primarily due to the complex nature of the varied environmental, chemical, and biological factors that affect HABs in addition to the geographical diversity of water bodies vulnerable to HABs. Also, a wide variety of ML algorithms have been investigated and trained on varied data sets from different sites. Due to the heterogeneity inherent in the data, different algorithms are found to work better for different data sets. Additionally, most studies have focused on relatively small data sets and are restricted to single data sources.
Thus far, there is a shortage of comprehensive HAB ML models trained on multisource data such as physical, chemical, and biological data to achieve holistic predictions of HAB formation and toxin production. Gaussian process regression and random forest regression algorithms are particularly suited to such endeavors owing to the in-built routines for uncertainty predictions and feature importance analysis. Iterative refinements of such models in conjunction with a relative feature analysis will allow quantification of factors/features that play a dominant role. The most relevant features identified in each of the subgroups of physical, chemical, and biological data could then be combined to construct a final model that provides the best trade-offs between predictive power and model complexity. Explanation-based analyses, SHAP104,105 or partial dependence plots (PDPs),106 performed on the final model could lead to valuable insights into both the role of individual factors and their interaction effects leading to HAB growth and toxin production. In their work detailing process-based models for HAB prediction, Ralston and Moore70 suggest the use of ensemble approaches which consider multiple model scenarios to quantify how different choices of key input factors, and the model formulation itself, affect the model predictions and address the uncertainty inherent to long-term projections of HAB response. These approaches may also be applied to ML-based models. Considerable work has focused on the development of AI-based monitoring and prediction systems that can prove very useful; however, their nature necessitates that a distinct system be developed for each specific site. Combined efforts to develop a generalized AI system that can account for all possible data sources and may be tweaked to make them applicable to different sites are the current need.
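As an illustration of the explanation-based analyses mentioned above, the following minimal sketch computes a one-dimensional partial dependence curve by hand: the feature of interest is forced to each grid value in turn, and the model output is averaged over the remaining data. The stand-in model `toy_bloom_model`, the feature names, and the data rows are hypothetical; a real analysis would use a trained model and observed data.

```python
from typing import Callable, Dict, List, Sequence


def partial_dependence(
    model: Callable[[Dict[str, float]], float],
    rows: Sequence[Dict[str, float]],
    feature: str,
    grid: Sequence[float],
) -> List[float]:
    """Average model output over `rows` with `feature` forced to each grid value."""
    curve = []
    for g in grid:
        # Overwrite only the feature of interest; other features keep observed values.
        preds = [model({**row, feature: g}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve


def toy_bloom_model(x: Dict[str, float]) -> float:
    # Hypothetical stand-in for a trained regression model.
    return 0.25 * x["water_temp_c"] + 2.0 * x["srp_mg_l"]


rows = [
    {"water_temp_c": 10.0, "srp_mg_l": 0.5},
    {"water_temp_c": 20.0, "srp_mg_l": 1.5},
]
grid = [10.0, 20.0, 30.0]
pd_curve = partial_dependence(toy_bloom_model, rows, "water_temp_c", grid)
for g, p in zip(grid, pd_curve):
    print(f"water_temp_c={g:.0f}: mean prediction {p:.2f}")
```

Because the stand-in model is linear, the curve is a straight line; for a random forest or Gaussian process model, the shape of the curve is what reveals thresholds or saturation in the response.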
Finally, study of the HAB microbiome in the context of HAB physicochemical factors over time will enable a predictive understanding of HABs, to identify bloom triggers and key environmental and chemical factors that maintain bloom activity and toxin production and release. We recognize that while cyanoHABs are dominated by cyanobacteria, there may be other actors, notably viruses, other bacteria, and eukaryotic microbes present in the water samples. Existing sequence analysis tools, e.g., read aligners such as Bowtie2 (ref 107) and de novo assemblers such as SPAdes,108 will need additional tuning and computational scaling to detect the presence and abundance of fragment-based features that are routinely filtered out in conventional meta-‘omics pipelines.109,110
Figure 4 outlines a suggested workflow to build predictive and interpretative ML models for HABs that may be used by domain experts and decision-makers.
Figure 4.
Suggested HAB ML/AI workflow for the data collection, integration, and development of a HAB ML/AI model to enable the development and understanding of an integrated picture of the cyanoHAB dynamic ecosystem. Multisource HAB data that spans the different meteorological spatial scales and temporal resolutions (as defined in section 4) should be curated and used to train, test, and validate ML models. The model should then be analyzed to enable extraction of key insights, in terms of the relative feature importance quantification and generation of design maps for cyanoHAB formation. The model development and analysis should be conducted iteratively to ensure a robust model. Insights and predictions obtained from such an adaptive framework will then be interpreted by domain experts and decision-makers.
5.2. Computing Resources and Needs
Due to the multitemporal, multispatial nature of remote-sensing (RS) data, its volume is very large with an exponential growth rate and increasing degree of diversity and complexity. For example, as of the writing of this perspective, there were an estimated 1.49 million files, each approximately 6–7 GB in size, available through the NASA Earthdata portal, whereas tabulated textual data files with millions of data points can be stored in a text file of the same (6–7 GB) size. In addition, RS data consists of extensive metadata such as image details to describe the basic size and type information on image data, map details that indicate the geographical location of the data, such as latitudes and longitudes of the image, and projection details that include structured geographic projection parameters which vary with different projection methods. Once a model and analysis techniques are in place, real-time analysis of incoming data needs to be implemented, which requires efficient preprocessing techniques that will transform the data into standardized geo-referenced data. The preprocessing stage then includes data storage, memory loading, transmission, processing, and analysis which can prove quite challenging. Unless the challenges are met efficiently, the big data cannot be fully exploited. Parallel input/output (I/O) interfaces and physical storage techniques and system architectures need to be developed that can match the expected data access patterns of RS data. Networks with sufficient bandwidths will be required to enable high-speed communication of high-volume data. Several deep learning models exist for learning from geospatial data, and these are able to take advantage of graphics processing unit (GPU) computing. However, they require GPUs with at least 4 GB of memory to perform satisfactorily. 
Even with GPU acceleration, deep learning models often need to be trained on high performance computing systems supporting gigaflop computations.111 There may also be benefits from hardware optimization for unique data and machine learning needs.112,113
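The scale implied by the file counts quoted above can be checked with simple arithmetic: 1.49 million files at 6–7 GB each comes to roughly 8.9–10.4 PB in decimal units. The sketch below reproduces that estimate and, purely for illustration, the time needed to move it over a network link. The link speed is a hypothetical value chosen for the example and is not from the text.

```python
N_FILES = 1.49e6           # estimated file count quoted in the text
GB_PER_FILE = (6.0, 7.0)   # approximate per-file size range quoted in the text
GB_PER_PB = 1e6            # decimal units: 1 PB = 10^6 GB
SECONDS_PER_DAY = 86400


def archive_size_pb(n_files: float, gb_per_file: float) -> float:
    """Total archive size in petabytes."""
    return n_files * gb_per_file / GB_PER_PB


def transfer_days(size_pb: float, gbit_per_s: float) -> float:
    """Days needed to move `size_pb` at a sustained rate of `gbit_per_s`."""
    size_gbit = size_pb * GB_PER_PB * 8  # 8 bits per byte
    return size_gbit / gbit_per_s / SECONDS_PER_DAY


low_pb, high_pb = (archive_size_pb(N_FILES, s) for s in GB_PER_FILE)
print(f"Archive size: {low_pb:.2f} to {high_pb:.2f} PB")

LINK_GBIT_PER_S = 10.0  # hypothetical sustained throughput, for illustration only
print(
    f"Transfer time at {LINK_GBIT_PER_S:.0f} Gbit/s: "
    f"{transfer_days(low_pb, LINK_GBIT_PER_S):.0f} to "
    f"{transfer_days(high_pb, LINK_GBIT_PER_S):.0f} days"
)
```

Even under this optimistic assumption of an uninterrupted link, a full transfer would take months, which is one reason analysis is usually moved to where the data are stored or restricted to spatial and temporal subsets.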
Due to the systematic and largely automated collection methods described above, data on HABs and the water bodies they affect almost always exist in tabular form, facilitating ingestion by data analysis tools. However, biological data, such as HRM data, needs to be meticulously cleaned, denoised, and processed through specialized meta-‘omics software. HRM data processing computation needs to scale with the number of samples and the sequencing depth of each sample. Memory needs can approach the terabyte scale,114 and therefore, the computations commonly use compute clusters with >32 central processing units (CPUs) and >4 GB of random access memory per CPU. GPU computing115 and edge computing116 are widely used due to the inherent single instruction multiple data (SIMD) nature of the computation.117 Download and storage of the data can also require petabyte-scale systems such as those hosted by the National Microbiome Data Collaborative.118
6. Current Limitations and Challenges
HAB events are naturally complex ecological events triggered and dictated by a variety of biotic and abiotic factors and consequently produce a range of data indicators.119 While ML is a powerful tool that is especially suited to the prediction of HAB formation from complex data, it frequently encounters gaps and defects in the data,120 which make its implementation challenging. For example, improved sensing technologies and new platforms for data access are being developed,121 but, being new, they may not always provide enough historical data for HAB predictions. Similarly, detailed ground truth data sets with confirmed positive and negative HAB events are necessary but remain insufficiently available, especially for freshwater systems (HAEDAT,122 Environmental Working Group, EWG123). Moreover, to assign a HAB/non-HAB label to a collected data set, thresholds for measured parameters must be specified, which will also vary with time and location.
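The dependence of HAB/non-HAB labels on site-specific thresholds can be illustrated with a short sketch. The site names and threshold values below are placeholders invented for the example; they are not regulatory limits or values from any cited study. The rule used here, that a sample is labeled HAB if any available measurement meets its threshold, is likewise only one possible choice.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Thresholds:
    chlorophyll_a_ug_l: float
    microcystin_ug_l: float


# Placeholder thresholds for two hypothetical sites (illustrative values only).
SITE_THRESHOLDS: Dict[str, Thresholds] = {
    "site_A": Thresholds(chlorophyll_a_ug_l=20.0, microcystin_ug_l=1.0),
    "site_B": Thresholds(chlorophyll_a_ug_l=40.0, microcystin_ug_l=4.0),
}


def label_sample(
    site: str,
    chlorophyll_a: Optional[float] = None,
    microcystin: Optional[float] = None,
) -> Optional[bool]:
    """Return True (HAB), False (non-HAB), or None if nothing was measured."""
    limits = SITE_THRESHOLDS[site]
    checks = []
    if chlorophyll_a is not None:
        checks.append(chlorophyll_a >= limits.chlorophyll_a_ug_l)
    if microcystin is not None:
        checks.append(microcystin >= limits.microcystin_ug_l)
    if not checks:
        return None  # no measurement available, so no label can be assigned
    return any(checks)


# The same chlorophyll a reading yields different labels at different sites.
print(label_sample("site_A", chlorophyll_a=25.0))
print(label_sample("site_B", chlorophyll_a=25.0))
print(label_sample("site_B"))
```

Keeping the thresholds in an explicit table, rather than hard-coding them in the labeling logic, makes it easier to record which definition of a HAB was used when labels from different studies are compared.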
Detection of cyanoHABs in water samples collected in the field uses traditional microbiological methods, as well as a variety of fluorometric, analytical, and molecular methods.124,125 Satellite sensing of HABs most frequently uses chlorophyll a as a proxy for HAB presence.126 Mishra et al.72 addressed how well satellite measurements of cyanoHABs corresponded to field measurements of cyanoHAB presence/absence in a recent study where they demonstrated highly accurate matching of the CIcyano algorithm based on satellite data, and microcystin (MC) measurements from field samples of the same locations, as confirmation of cyanoHAB presence. This report also presented an excellent discussion of the complexities and challenges in matching up satellite sensor data with field data to determine cyanoHAB presence/absence. Satellite sensing of cyanoHABs has also been shown to correspond to state-reported cyanoHAB events.127 However, we found that such satellite sensing data are only available at the time resolution of a few days to weeks, depending on which satellite is used. Therefore, ground truth labels at the hour and day time resolution remain a challenge to obtain and must be arbitrated on the basis of chlorophyll, phycocyanin, cell count, or biovolume quantitation in the field.
While it is possible to build accurate prediction models, since these are event-based predictions, sufficient data to build a baseline no-HAB model is also required. In addition, factors such as cloud cover affect the gathering of remote sensing data based on reflectance methods, and estimated HAB proxy concentrations are often inaccurate in shallow water. While large amounts of data are available, this data is in essence sparse, and consequently, ML techniques to handle sparse data need to be employed to tackle such problems. Due to a variety of causes like missing values and inconsistencies in data, as well as data aggregation from a variety of sources, the incomplete data problem spans various domains. Human-induced errors or sensor failures due to technical or environmental issues also contribute to the missing data problem. To address this challenge, in the context of cyanobacteria monitoring via satellites, Luthra128 trained a supervised classification model using ML algorithms such as random forest and k-nearest neighbors to classify data into different trophic states to predict the missing data of a particular lake at a given timestamp.
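The idea of filling a missing record by classification can be sketched with a small k-nearest-neighbors example. This is not the implementation used in the cited work; the features, the trophic state labels attached to them, and all numerical values are invented for illustration, and the features are left unscaled for brevity, although scaling would normally be needed when features have different units.

```python
from collections import Counter
from math import dist
from typing import Sequence, Tuple

Sample = Tuple[Sequence[float], str]


def knn_predict(train: Sequence[Sample], query: Sequence[float], k: int = 3) -> str:
    """Majority label among the k training points nearest to `query`."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training samples")
    # Sort labeled samples by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled records: (surface temperature in C, chlorophyll a proxy).
labeled = [
    ([12.0, 2.0], "oligotrophic"),
    ([13.0, 3.0], "oligotrophic"),
    ([14.0, 2.5], "oligotrophic"),
    ([22.0, 30.0], "eutrophic"),
    ([24.0, 35.0], "eutrophic"),
    ([23.0, 28.0], "eutrophic"),
]

# A timestamp whose trophic state label is missing but whose features are known.
missing_record = [21.0, 27.0]
print(knn_predict(labeled, missing_record, k=3))
```

A random forest classifier could be substituted for the nearest-neighbor vote without changing the overall workflow of training on labeled timestamps and predicting the unlabeled ones.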
From the data aspect, HAB modeling needs to overcome some unique challenges. Depending on the differences in the harmful effects of the HAB being considered, the definitions of HABs implied in the literature are subjective. The diversity in lake monitoring metrics (concentration of microcystin, surface scum), the types of harm considered (fish mortality, ecosystem degradation, effect on human health), and the direct and indirect connections between the two suggest that what constitutes a HAB is not straightforward, and conclusions across studies using different metrics may not be immediately comparable. HABs occur all over the world in unique and diverse, open and connected ecosystems; hence a general solution to the HAB modeling problem is not feasible. HAB data is collected over different time scales (weeks, months, years, decades) and is consequently both a big-data problem (remote sensing data) and a multiple-small-data problem (water monitoring data). A unified, chronological, standardized database with data collated using standardized metrics is an essential first step in a holistic HAB prediction model.
It is also interesting to note that, while data sets on environmental variables, water quality, remote sensing, etc. have been heavily employed to build ML models, there is a conspicuous lack of models trained on genomics data. The generation of genomics data is inherently time and labor intensive, and it is likely that the objectives of personnel engaged in their collection do not extend to the application of ML and AI for HAB modeling. In addition, the use of such data would necessitate the availability of extensive domain knowledge, which may not be accessible to researchers currently engaged in the modeling of HAB formation and growth.
7. Research Needs/Future Research Questions
We offer the following observations and recommendations:
1. A predictive understanding of freshwater HAB formation and toxin production is essential to improve the security of freshwater systems and protect public health. The complexity and the data volume necessitate an AI approach that will bring together disparate data types under a unified model in order to identify relationships and make predictions that can inform decision-makers.
2. Data on physical and chemical features of freshwater systems are widely publicly available, at least for some regions of the United States where HABs are a regular occurrence. However, biological data on the HAB environment, such as ‘omics-related data, are scarcer but are becoming increasingly available in public ‘omics repositories and in publications.
3. Fortunately, the data science and ML tools are ready to assist us in learning from these data sets. In the future, the ML tools that we develop, especially those focused on biological data, can be applied in conjunction with other data to provide a more holistic picture of cyanoHAB dynamics. Moreover, the ML tools that we develop will provide an improved understanding of the cyanoHAB lifecycle and the role of other microbes in cyanoHAB formation and toxicity.
4. In other fields, such as materials science, ML tools can be applied to accelerate materials design, by reducing experimental time. In HAB science, ML tools may help other areas of the United States and the world where data has not been collected as extensively as for areas such as the Great Lakes, by identifying key parameters that are predictive of HABs. Monitoring programs can then be developed to focus only on those parameters.
5. Finally, the importance of data set curation cannot be overstated. With large, complex, disparate data sets, a critical first step is to ensure that the data being entered is reliable. Often data curation requires domain expertise. Without a solid data foundation, interpretation of any learned relationships may not be possible. We posit, therefore, that there may be more value in tracing back to original data from published research and then reprocessing the data with statistical rigor including, for example, data standardization and input uncertainty quantification. Meta-analyses of data collection techniques and associated effect sizes may add further value in data fusion by increasing numerical stability and reducing the need for data gap filling.
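As one illustration of the reprocessing step recommended in item 5, the sketch below standardizes a measured variable to zero mean and unit variance and rescales the reported measurement uncertainties by the same factor. The variable and its values are hypothetical, and the sketch assumes the mean and standard deviation can be treated as fixed constants, ignoring their own sampling uncertainty.

```python
from statistics import mean, stdev
from typing import List, Tuple


def standardize(
    values: List[float], uncertainties: List[float]
) -> Tuple[List[float], List[float]]:
    """Z-score `values` and rescale their measurement uncertainties to match."""
    if len(values) != len(uncertainties):
        raise ValueError("values and uncertainties must have the same length")
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation
    if sd == 0:
        raise ValueError("cannot standardize a constant series")
    z = [(v - mu) / sd for v in values]
    # Subtracting a constant leaves an uncertainty unchanged; dividing by sd scales it.
    z_unc = [u / sd for u in uncertainties]
    return z, z_unc


# Hypothetical nutrient concentrations with reported measurement uncertainties.
concentration = [2.0, 4.0, 6.0]
reported_unc = [0.2, 0.2, 0.4]
z, z_unc = standardize(concentration, reported_unc)
print(z)
print(z_unc)
```

Carrying the rescaled uncertainties alongside the standardized values lets downstream models weight or propagate input uncertainty instead of discarding it during data fusion.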
8. Conclusion
As climate change progresses, we can expect to see an increased occurrence and intensity of HABs impacting our water resources. In this perspective, we describe limitations and proposed improvements to approaches that can achieve effective, high-fidelity predictions of cyanoHAB formation and toxicity through innovative, data-enabled analyses that use available biological parameters in conjunction with physical, chemical, and geospatial parameters as predictors that dictate causality of cyanoHAB formation and toxicity. To the best of our knowledge, bringing multiple disparate data sets together via ML would exceed the state of the art and be a critical advancement for cyanoHAB forecasting.
By application of ML to integrated data sets, we promote a predictive understanding of the dynamic cyanoHAB ecosystem. This may be approached via the use of standalone ML models or in conjunction with process-based models, thus combining state-of-the-art ML and AI infrastructure with domain expertise. Moreover, we expect the new models could be tailored to expedite HAB monitoring and forecasting broadly to enhance the security of freshwater systems and protect public health in the many regions of the United States and globally with intense cyanoHABs.
Acknowledgments
We gratefully acknowledge the insight we received from many fruitful discussions with Thomas Bridgeman, University of Toledo, and George Bullerjahn, Bowling Green State University. LANL is operated by Triad National Security, LLC, for the National Nuclear Security Administration of the U.S. Department of Energy (Contract No. 89233218CNA000001). We are grateful to LANL Visual Designer Jacob Hassett for his artistry, which made this perspective come to life.
Biographies

Babetta Marrone is a Scientist, Laboratory Fellow, and AAAS Fellow in the Microbial and Biome Sciences group (Bioscience Division) at Los Alamos National Laboratory. Her research career has spanned biomedicine, bioforensics, biosecurity, and bioenergy. Her current research interests are focused on sustainable biomanufacturing of biofuels and bioproducts, especially from microalgae. She is specifically interested in development of AI/ML tools for optimizing biomanufacturing processes and for accelerating biological solutions applied to health and environmental science and climate resilience. She received a B.A. from Hampshire College, a Ph.D. from Rutgers University, and conducted postdoctoral research at the University of Wisconsin.

Shounak Banerjee received his Bachelor of Technology degree in Biotechnology from West Bengal University of Technology in 2011 and his Ph.D. in Biology from Rensselaer Polytechnic Institute in 2016. He has worked at the intersection of computational structural and molecular biology for 12 years. He specializes in a broad range of topics, including engineering fluorescent proteins, using machine learning to mine valuable genetic traits and developing high throughput screening systems for bio-mimetic materials. His current research interests are in the fields of bio-inspired and bio-mimetic material engineering, photobiology and photochemistry, and machine learning on multi-‘omics data.

Anjana Talapatra is a Scientist in the Materials Science and Technology Division at Los Alamos National Laboratory. Her research involves combining high-throughput Density Functional Theory (DFT) and Machine-Learning methods to accelerate the search for new materials. These techniques find applications in disparate fields ranging from structural to opto-electronic materials to bio-materials. She received her Ph.D. in 2015 from the Mechanical Engineering Department at Texas A&M University followed by post-doctoral research appointments in the Materials Science & Engineering Department at Texas A&M and LANL.

Raul Gonzalez is a Scientist and First Line Manager in the Microbial and Biome Sciences group (Bioscience Division) at Los Alamos National Laboratory. After receiving his Ph.D. in Plant Biology from Arizona State University, he did two postdoctoral periods at Michigan State University and LANL. He leverages synthetic and molecular biology techniques towards engineering photosynthetic microbes (mainly cyanobacteria), with a focus on developing platforms for biomanufacturing of fuel and renewable chemicals.
Author Present Address
§ GE Research, Niskayuna, NY 12309, USA
Author Contributions
All authors conceived the perspective and contributed equally to writing the original draft, review, and editing. A.T., S.B., C.R.G.E., and G.P. were responsible for gathering the data, for data curation, and for developing the formal analysis approaches. B.L.M., S.B., G.P., and C.R.G.E. acquired funding, and B.L.M. leads the project that supported this work. CRediT: Babetta L Marrone conceptualization, funding acquisition, project administration, resources, supervision, writing-original draft, writing-review & editing; Shounak Banerjee conceptualization, data curation, formal analysis, funding acquisition, resources, writing-original draft, writing-review & editing; Anjana Talapatra conceptualization, data curation, formal analysis, resources, writing-original draft, writing-review & editing; C. Raul GonzalezEsquer conceptualization, data curation, formal analysis, resources, writing-original draft, writing-review & editing; Ghanshyam Pilania conceptualization, funding acquisition, writing-original draft, writing-review & editing.
Funding for all authors was provided from internal support from Los Alamos National Laboratory’s Directed Research and Development (LDRD) program, under Project No. 20230279ER.
The authors declare no competing financial interest.
References
- Moore S. K.; Trainer V. L.; Mantua N. J.; Parker M. S.; Laws E. A.; Backer L. C.; Fleming L. E. Impacts of climate variability and future climate change on harmful algal blooms and human health. Environmental Health 2008, 7 (Suppl 2), S4. 10.1186/1476-069X-7-S2-S4.
- Wells M. L.; Trainer V. L.; Smayda T. J.; Karlson B. S. O.; Trick C. G.; Kudela R. M.; Ishikawa A.; Bernard S.; Wulff A.; Anderson D. M.; Cochlan W. P. Harmful algal blooms and climate change: Learning from the past and present to forecast the future. Harmful Algae 2015, 49, 68–93. 10.1016/j.hal.2015.07.009.
- Ho J. C.; Michalak A. M.; Pahlevan N. Widespread global increase in intense lake phytoplankton blooms since the 1980s. Nature 2019, 574 (7780), 667–670. 10.1038/s41586-019-1648-7.
- Anderson D. M.; Fachon E.; Pickart R. S.; Lin P.; Fischer A. D.; Richlen M. L.; Uva V.; Brosnahan M. L.; McRaven L.; Bahr F.; Lefebvre K.; Grebmeier J. M.; Danielson S. L.; Lyu Y.; Fukai Y. Evidence for massive and recurrent toxic blooms of Alexandrium catenella in the Alaskan Arctic. Proc. Natl. Acad. Sci. U. S. A. 2021, 118 (41), e2107387118. 10.1073/pnas.2107387118.
- Dai Y.; Yang S.; Zhao D.; et al. Coastal phytoplankton blooms expand and intensify in the 21st century. Nature 2023, 615, 280–284. 10.1038/s41586-023-05760-y.
- Turley B. D.; Karnauskas M.; Campbell M. D.; Hanisko D. S.; Kelble C. R. Relationships between blooms of Karenia brevis and hypoxia across the West Florida Shelf. Harmful Algae 2022, 114, 102223. 10.1016/j.hal.2022.102223.
- Kudela R. M.; et al. Harmful Algal Blooms. A Scientific Summary for Policy Makers; IOC/INF-1320; IOC/UNESCO: Paris, 2015.
- Wuebbles D. L.; Cardinale B.; Cherkauer K.; Davidson-Arnott R.; Hellmann J.; Infante D.; Johnson L.; de Loe R.; Lofgren B.; Packman A.; Seleglenieks F.; Sharma A.; Sohngen B.; Tiboris M.; Vimont D.; Wilson R.; Kunkel K.; Ballinger A. An Assessment of the Impacts of Climate Change on the Great Lakes; Environmental Law & Policy Center: 2019.
- Steffen M. M.; Davis T. W.; McKay R. M. L.; Bullerjahn G. S.; Krausfeldt L. E.; Stough J. M. A.; Neitzey M. L.; Gilbert N. E.; Boyer G. L.; Johengen T. H.; Gossiaux D. C.; Burtner A. M.; Palladino D.; Rowe M. D.; Dick G. J.; Meyer K. A.; Levy S.; Boone B. E.; Stumpf R. P.; Wilhelm S. W.; et al. Ecophysiological examination of the Lake Erie Microcystis bloom in 2014: Linkages between biology and the water supply shutdown of Toledo, OH. Environ. Sci. Technol. 2017, 51 (12), 6745–6755. 10.1021/acs.est.7b00856.
- Bullerjahn G. S.; McKay R. M.; Davis T. W.; Baker D. B.; Boyer G. L.; D’Anglada L. V.; Doucette G. J.; Ho J. C.; Irwin E. G.; Kling C. L.; Kudela R. M.; Kurmayer R.; Michalak A. M.; Ortiz J. D.; Otten T. G.; Paerl H. W.; Qin B.; Sohngen B. L.; Stumpf R. P.; Wilhelm S. W.; Visser P. M. Global solutions to regional problems: Collecting global expertise to address the problem of harmful cyanobacterial blooms. A Lake Erie case study. Harmful Algae 2016, 54, 223–238. 10.1016/j.hal.2016.01.003.
- Matson P. G.; Boyer G. L.; Bridgeman T. B.; Bullerjahn G. S.; Kane D. D.; McKay R. M. L.; McKindles K. M.; Raymond H. A.; Snyder B. K.; Stumpf R. P.; Davis T. W. Physical drivers facilitating a toxigenic cyanobacterial bloom in a major Great Lakes tributary. Limnology and Oceanography 2020, 65 (12), 2866–2882. 10.1002/lno.11558.
- Barnard M. A.; Chaffin J. D.; Plaas H. E.; Boyer G. L.; Wei B.; Wilhelm S. W.; Rossignol K. L.; Braddy J. S.; Bullerjahn G. S.; Bridgeman T. B.; Davis T. W.; Wei J.; Bu M.; Paerl H. W. Roles of nutrient limitation on Western Lake Erie CyanoHAB toxin production. Toxins 2021, 13 (1), 47. 10.3390/toxins13010047.
- Hoke A. K.; Reynoso G.; Smith M. R.; Gardner M. I.; Lockwood D. J.; Gilbert N. E.; Wilhelm S. W.; Becker I. R.; Brennan G. J.; Crider K. E.; Farnan S. R.; Mendoza V.; Poole A. C.; Zimmerman Z. P.; Utz L. K.; Wurch L. L.; Steffen M. M. Genomic signatures of Lake Erie bacteria suggest interaction in the Microcystis phycosphere. PLoS One 2021, 16 (9), e0257017. 10.1371/journal.pone.0257017.
- Burford M. A.; Carey C. C.; Hamilton D. P.; Huisman J.; Paerl H. W.; Wood S. A.; Wulff A. Perspective: Advancing the research agenda for improving understanding of cyanobacteria in a future of global change. Harmful Algae 2020, 91, 101601. 10.1016/j.hal.2019.04.004.
- Xiao M.; Hamilton D. P.; O’Brien K. R.; Adams M. P.; Willis A.; Burford M. A. Are laboratory growth rate experiments relevant to explaining bloom-forming cyanobacteria distributions at global scale? Harmful Algae 2020, 92, 101732. 10.1016/j.hal.2019.101732.
- Schmale D. G.; Ault A. P.; Saad W.; Scott D. T.; Westrick J. A. Perspectives on Harmful Algal Blooms (HABs) and the cyberbiosecurity of freshwater systems. Frontiers in Bioengineering and Biotechnology 2019, 7, 128. 10.3389/fbioe.2019.00128.
- Wilhelm S. W.; Bullerjahn G. S.; McKay R. M. L. The Complicated and Confusing Ecology of Microcystis Blooms. MBio 2020, 11 (3), e00529-20. 10.1128/mBio.00529-20.
- Zhong S.; Zhang K.; Bagheri M.; Burken J. G.; Gu A.; Li B.; Ma X.; Marrone B. L.; Ren Z. J.; Schrier J.; Shi W.; Tan H.; Wang T.; Wang X.; Wong B. M.; Xiao X.; Yu X.; Zhu J.-J.; Zhang H. Machine Learning: New ideas and tools in environmental science and engineering. Environ. Sci. Technol. 2021, 55 (19), 12741–12754. 10.1021/acs.est.1c01339.
- Treuer G.; Kirchhoff C.; Lemos M. C.; McGrath F. Challenges of managing harmful algal blooms in US drinking water systems. Nature Sustainability 2021, 4 (11), 958–964. 10.1038/s41893-021-00770-y.
- Pion-Tonachini L.; Bouchard K.; Martin H. G.; Peisert S.; Holtz W. B.; Aswani A.; Dwivedi D.; Wainwright H.; Pilania G.; Nachman B.; Marrone B. L.; Falco N.; Prabhat; Arnold D.; Wolf-Yadlin A.; Powers S.; Climer S.; Jackson Q.; Carlson T.; Brown J. B.; et al. (2021). Learning from learning machines: A new generation of AI technology to meet the needs of science. arXiv (Computer Science.Machine Learning), November 27, 2021, 2111.13786. https://arxiv.org/abs/2111.13786.
- Tromas N.; Fortin N.; Bedrani L.; Terrat Y.; Cardoso P.; Bird D.; Greer C. W.; Shapiro B. J. Characterising and predicting cyanobacterial blooms in an 8-year amplicon sequencing time course. ISME Journal 2017, 11 (8), 1746–1763. 10.1038/ismej.2017.58.
- Zhao C. S.; Shao N. F.; Yang S. T.; Ren H.; Ge Y. R.; Feng P.; Dong B. E.; Zhao Y. Predicting cyanobacteria bloom occurrence in lakes and reservoirs before blooms occur. Science of the Total Environment 2019, 670, 837–848. 10.1016/j.scitotenv.2019.03.161.
- Kim S.; Kim S.; Mehrotra R.; Sharma A. Predicting cyanobacteria occurrence using climatological and environmental controls. Water Res. 2020, 175, 115639. 10.1016/j.watres.2020.115639.
- Myer M. H.; Urquhart E.; Schaeffer B. A.; Johnston J. M. Spatio-temporal modeling for forecasting high-risk freshwater Cyanobacterial Harmful Algal Blooms in Florida. Frontiers in Environmental Science 2020, 8, 581091. 10.3389/fenvs.2020.581091.
- Ahn J. M.; Kim J.; Park L. J.; Jeon J.; Jong J.; Min J.-H.; Kang T. Predicting Cyanobacterial Harmful Algal Blooms (CyanoHABs) in a regulated river using a revised EFDC model. Water 2021, 13 (4), 439. 10.3390/w13040439.
- Lundholm N., Churro C., Fraga S., Hoppenrath M., Iwataki M., Larsen J., Mertens K., Moestrup Ø., Zingone A., Eds. IOC-UNESCO Taxonomic Reference List of Harmful Micro Algae, 2009. https://www.marinespecies.org/Hab (accessed 2023-07-05).
- Hallegraeff G. M. Global Harmful Algal Bloom: Status Report 2021; IOC/INF1399; UNESCO: 2021.
- Long M.; Krock B.; Castrec J.; Tillmann U. Unknown extracellular and bioactive metabolites of the Genus Alexandrium: A review of overlooked toxins. Toxins 2021, 13 (12), 905. 10.3390/toxins13120905.
- Dittmann E.; Börner T. Genetic contributions to the risk assessment of microcystin in the environment. Toxicol. Appl. Pharmacol. 2005, 203 (3), 192–200. 10.1016/j.taap.2004.06.008. [DOI] [PubMed] [Google Scholar]
- Codd G. A.; Metcalf J. S.; Beattie K. A. Retention of Microcystis aeruginosa and microcystin by salad lettuce (Lactuca sativa) after spray irrigation with water containing cyanobacteria. Toxicon 1999, 37 (8), 1181–1185. 10.1016/S0041-0101(98)00244-X. [DOI] [PubMed] [Google Scholar]
- Skulberg O. M.; Carmichael W. W.; Codd G. A.; Skulberg R.. Taxonomy of toxic Cyanophyceae (cyanobacteria). Algal Toxins in Seafood and Drinking Water; Falconer I. R., Ed.; Academic Press: 1993; pp 145–164. [Google Scholar]
- Cronberg G.; Carpenter E. J.; Carmichael W. W.. Taxonomy of harmful cyanobacteria. In Manual on Harmful Marine Microalgae; Hallegraeff G. M., Anderson D. M., Cembella A. D., Eds.; UNESCO Publishing: 2003; pp 523–562. [Google Scholar]
- Erdner D. L.; Dyble J.; Parsons M. L.; Stevens R. C.; Hubbard K. A.; Wrabel M. L.; Moore S. K.; Lefebvre K. A.; Anderson D. M.; Bienfang P.; Bidigare R. R.; Parker M. S.; Moeller P.; Brand L. E.; Trainer V. L. Centers for Oceans and Human Health: a unified approach to the challenge of harmful algal blooms. Environmental Health 2008, 7 (Suppl 2), S2. 10.1186/1476-069X-7-S2-S2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Davis T. W.; Berry D. L.; Boyer G. L.; Gobler C. J. The effects of temperature and nutrients on the growth and dynamics of toxic and non-toxic strains of Microcystis during cyanobacteria blooms. Harmful Algae 2009, 8 (5), 715–725. 10.1016/j.hal.2009.02.004. [DOI] [Google Scholar]
- Sunda W. G. Trace Metals and Harmful Algal Blooms. Ecological Studies 2006, 189, 203–214. 10.1007/978-3-540-32210-8_16. [DOI] [Google Scholar]
- Diaz J. M.; Plummer S. Production of extracellular reactive oxygen species by phytoplankton: past and future directions. Journal of Plankton Research 2018, 40 (6), 655–666. 10.1093/plankt/fby039. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zilliges Y.; Kehr J.-C.; Meissner S.; Ishida K.; Mikkat S.; Hagemann M.; Kaplan A.; Börner T.; Dittmann E. The Cyanobacterial hepatotoxin microcystin binds to proteins and increases the fitness of Microcystis under oxidative stress conditions. PLoS One 2011, 6 (3), e17615. 10.1371/journal.pone.0017615. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ceballos-Laita L.; Marcuello C.; Lostao A.; Calvo-Begueria L.; Velazquez-Campoy A.; Bes M. T.; Fillat M. F.; Peleato M.-L. Microcystin-LR binds iron, and iron promotes self-assembly. Environ. Sci. Technol. 2017, 51 (9), 4841–4850. 10.1021/acs.est.6b05939. [DOI] [PubMed] [Google Scholar]
- Klein A. R.; Baldwin D. S.; Silvester E. Proton and iron binding by the cyanobacterial toxin microcystin-LR. Environ. Sci. Technol. 2013, 47 (10), 5178–5184. 10.1021/es400464e. [DOI] [PubMed] [Google Scholar]
- DeMott W. R.; Zhang Q.-X.; Carmichael W. W. Effects of toxic cyanobacteria and purified toxins on the survival and feeding of a copepod and three species of Daphnia. Limnology and Oceanography 1991, 36 (7), 1346–1357. 10.4319/lo.1991.36.7.1346. [DOI] [Google Scholar]
- Holland A.; Kinnear S. Interpreting the possible ecological role(s) of cyanotoxins: Compounds for competitive advantage and/or physiological aide? Marine Drugs 2013, 11 (7), 2239–2258. 10.3390/md11072239. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jones G. J.; Bourne D. G.; Blakeley R. L.; Doelle H. Degradation of the cyanobacterial hepatotoxin microcystin by aquatic bacteria. Natural Toxins 1994, 2 (4), 228. 10.1002/nt.2620020412. [DOI] [PubMed] [Google Scholar]
- Schmidt J. R.; Wilhelm S. W.; Boyer G. L. The fate of microcystins in the environment and challenges for monitoring. Toxins 2014, 6 (12), 3354–3387. 10.3390/toxins6123354. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Christoffersen K.; Lyck S.; Winding A. Microbial activity and bacterial community structure during degradation of microcystins. Aquatic Microbial Ecology 2002, 27, 125–136. 10.3354/ame027125. [DOI] [Google Scholar]
- Mutalipassi M.; Riccio G.; Mazzella V.; Galasso C.; Somma E.; Chiarore A.; de Pascale D.; Zupo V. Symbioses of cyanobacteria in marine environments: Ecological insights and biotechnological perspectives. Marine Drugs 2021, 19 (4), 227. 10.3390/md19040227. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gautam A.; Lear G.; Lewis G. D. Time after time: detecting annual patterns in stream bacterial biofilm communities. Environmental Microbiology 2022, 24 (5), 2502–2515. 10.1111/1462-2920.16017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pound H. L.; Martin R. M.; Sheik C. S.; Steffen M. M.; Newell S. E.; Dick G. J.; McKay R. M. L.; Bullerjahn G. S.; Wilhelm S. W. Environmental studies of Cyanobacterial Harmful Algal Blooms should include interactions with the dynamic microbiome. Environ. Sci. Technol. 2021, 55, 12776–12779. 10.1021/acs.est.1c04207. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pound H. L.; Martin R. M.; Zepernick B. N.; Christopher C. J.; Howard S. M.; Castro H. F.; Campagna S. R.; Boyer G. L.; Bullerjahn G. S.; Chaffin J. D.; Wilhelm S. W. Changes in microbiome activity and sporadic viral infection help explain observed variability in microcosm studies. Frontiers in Microbiology 2022, 13, 809989. 10.3389/fmicb.2022.809989. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang K.; Mou X.; Cao H.; Struewing I.; Allen J.; Lu J. Co-occurring microorganisms regulate the succession of cyanobacterial harmful algal blooms. Environ. Pollut. 2021, 288, 117682. 10.1016/j.envpol.2021.117682. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pérez-Carrascal O. M.; Tromas N.; Terrat Y.; Moreno E.; Giani A.; Corrêa Braga Marques L.; Fortin N.; Shapiro B. J. Single-colony sequencing reveals microbe-by-microbiome phylosymbiosis between the cyanobacterium Microcystis and its associated bacteria. Microbiome 2021, 9 (1), 94. 10.1186/s40168-021-01140-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dick G. J.; Duhaime M. B.; Evans J. T.; Errera R. M.; Godwin C. M.; Kharbush J. J.; Nitschky H. S.; Powers M. A.; Vanderploeg H. A.; Schmidt K. C.; Smith D. J.; Yancey C. E.; Zwiers C. C.; Denef V. J. The genetic and ecophysiological diversity of Microcystis. Environ. Microbiol. 2021, 23, 7278–7313. 10.1111/1462-2920.15615. [DOI] [PubMed] [Google Scholar]
- Bolin R.; Abbott D. Studies on the Marine Climate and Phytoplankton of the Central Coastal Area of California, 1954–1960; California Cooperative Oceanic Fisheries Investigations: 1963.
- Briand E.; Yéprémian C.; Humbert J. F.; Quiblier C. Competition between microcystin- and non-microcystin-producing Planktothrix agardhii (cyanobacteria) strains under different environmental conditions. Environmental Microbiology 2008, 10 (12), 3337–3348. 10.1111/j.1462-2920.2008.01730.x. [DOI] [PubMed] [Google Scholar]
- Platt T.; Fuentes-Yaco C.; Frank K. T. Spring algal bloom and larval fish survival. Nature 2003, 423 (6938), 398–399. 10.1038/423398b. [DOI] [PubMed] [Google Scholar]
- Oehrle S.; Rodriguez-Matos M.; Cartamil M.; Zavala C.; Rein K. S. Toxin composition of the 2016 Microcystis aeruginosa bloom in the St. Lucie Estuary, Florida. Toxicon 2017, 138, 169–172. 10.1016/j.toxicon.2017.09.005. [DOI] [PubMed] [Google Scholar]
- Kramer B. J.; Davis T. W.; Meyer K. A.; Rosen B. H.; Goleski J. A.; Dick G. J.; Oh G.; Gobler C. J. Nitrogen limitation, toxin synthesis potential, and toxicity of cyanobacterial populations in Lake Okeechobee and the St. Lucie River Estuary, Florida, during the 2016 state of emergency event. PLoS One 2018, 13 (5), e0196278. 10.1371/journal.pone.0196278. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhang T.; Hu H.; Ma X.; Zhang Y. Long-term spatiotemporal variation and environmental driving forces analyses of algal blooms in Taihu Lake based on multi-source satellite and land observations. Water 2020, 12 (4), 1035. 10.3390/w12041035. [DOI] [Google Scholar]
- Jankowiak J.; Hattenrath Lehmann T.; Kramer B. J.; Ladds M.; Gobler C. J. Deciphering the effects of nitrogen, phosphorus, and temperature on cyanobacterial bloom intensification, diversity, and toxicity in western Lake Erie. Limnology and Oceanography 2019, 64 (3), 1347–1370. 10.1002/lno.11120. [DOI] [Google Scholar]
- Kenitz K. M.; Anderson C. R.; Carter M. L.; Eggleston E.; Seech K.; Shipe R.; Smith J.; Orenstein E. C.; Franks P. J. S.; Jaffe J. S.; Barton A. D. Environmental and ecological drivers of harmful algal blooms revealed by automated underwater microscopy. Limnology and Oceanography 2023, 68 (3), 598–615. 10.1002/lno.12297. [DOI] [Google Scholar]
- Douglas G. M.; Maffei V. J.; Zaneveld J. R.; Yurgel S. N.; Brown J. R.; Taylor C. M.; Huttenhower C.; Langille M. G. I. PICRUSt2 for prediction of metagenome functions. Nat. Biotechnol. 2020, 38 (6), 685–688. 10.1038/s41587-020-0548-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yancey C. E.; Smith D. J.; Den Uyl P. A.; Mohamed O. G.; Yu F.; Ruberg S. A.; Chaffin J. D.; Goodwin K. D.; Tripathi A.; Sherman D. H.; Dick G. J. Metagenomic and metatranscriptomic insights into population diversity of Microcystis blooms: Spatial and temporal dynamics of mcy genotypes, including a partial operon that can be abundant and expressed. Appl. Environ. Microbiol. 2022, 88 (9), e02464-21. 10.1128/aem.02464-21. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yancey C. E.; Yu F.; Tripathi A.; Sherman D. H.; Dick G. J. Expression of Microcystis biosynthetic gene clusters in natural populations suggests temporally dynamic synthesis of novel and known secondary metabolites in western Lake Erie. Appl. Environ. Microbiol. 2023, 89 (5), e02092-22. 10.1128/aem.02092-22. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shakya M.; Lo C.-C.; Chain P. S. G. Advances and Challenges in Metatranscriptomic Analysis. Frontiers in Genetics 2019, 10, 904. 10.3389/fgene.2019.00904. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wood-Charlson E. M.; Anubhav; Auberry D.; Blanco H.; Borkum M. I.; Corilo Y. E.; Davenport K. W.; Deshpande S.; Devarakonda R.; Drake M.; Duncan W. D.; Flynn M. C.; Hays D.; Hu B.; Huntemann M.; Li P.-E.; Lipton M.; Lo C.-C.; Millard D.; Eloe-Fadrosh E. A.; et al. The National Microbiome Data Collaborative: enabling microbiome science. Nature Reviews Microbiology 2020, 18 (6), 313–314. 10.1038/s41579-020-0377-0. [DOI] [PubMed] [Google Scholar]
- Roossinck M. J. The good viruses: viral mutualistic symbioses. Nature Reviews Microbiology 2011, 9 (2), 99–108. 10.1038/nrmicro2491. [DOI] [PubMed] [Google Scholar]
- Hennon G. M. M.; Dyhrman S. T. Progress and promise of omics for predicting the impacts of climate change on harmful algal blooms. Harmful Algae 2020, 91, 101587. 10.1016/j.hal.2019.03.005. [DOI] [PubMed] [Google Scholar]
- Yuan J.; Wen T.; Zhang H.; Zhao M.; Penton C. R.; Thomashow L. S.; Shen Q. Predicting disease occurrence with high accuracy based on soil macroecological patterns of Fusarium wilt. ISME Journal 2020, 14 (12), 2936–2950. 10.1038/s41396-020-0720-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- McElhinney J. M. W. R.; Catacutan M. K.; Mawart A.; Hasan A.; Dias J. Interfacing machine learning and microbial omics: A promising means to address environmental challenges. Frontiers in Microbiology 2022, 13, 851450. 10.3389/fmicb.2022.851450. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hellweger F. L. Models predict planned phosphorus load reduction will make Lake Erie more toxic. Science 2022, 376, 1001–1005. 10.1126/science.abm6791. [DOI] [PubMed] [Google Scholar]
- Ralston D. K.; Moore S. K. Modeling harmful algal blooms in a changing climate. Harmful Algae 2020, 91, 101729. 10.1016/j.hal.2019.101729. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wynne T. T.; Stumpf R. P.; Tomlinson M. C.; Warner R. A.; Tester P. A.; Dyble J.; Fahnenstiel G. L. Relating spectral shape to cyanobacterial blooms in the Laurentian Great Lakes. International Journal of Remote Sensing 2008, 29 (12), 3665–3672. 10.1080/01431160802007640. [DOI] [Google Scholar]
- Mishra S.; Stumpf R. P.; Schaeffer B.; Werdell P. J.; Loftin K. A.; Meredith A. Evaluation of a satellite-based cyanobacteria bloom detection algorithm using field-measured microcystin data. Science of The Total Environment 2021, 774, 145462. 10.1016/j.scitotenv.2021.145462. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Recknagel F.; French M.; Harkonen P.; Yabunaka K.-I. Artificial neural network approach for modelling and prediction of algal blooms. Ecological Modelling 1997, 96 (1–3), 11–28. 10.1016/S0304-3800(96)00049-X. [DOI] [Google Scholar]
- Recknagel F.; Orr P. T.; Bartkow M.; Swanepoel A.; Cao H. Early warning of limit-exceeding concentrations of cyanobacteria and cyanotoxins in drinking water reservoirs by inferential modelling. Harmful Algae 2017, 69, 18–27. 10.1016/j.hal.2017.09.003. [DOI] [PubMed] [Google Scholar]
- Lee J. H. W.; Huang Y.; Dickman M.; Jayawardena A. W. Neural network modelling of coastal algal blooms. Ecological Modelling 2003, 159 (2–3), 179–201. 10.1016/S0304-3800(02)00281-8. [DOI] [Google Scholar]
- Muttil N.; Chau K.-W. Machine-learning paradigms for selecting ecologically significant input variables. Engineering Applications of Artificial Intelligence 2007, 20 (6), 735–744. 10.1016/j.engappai.2006.11.016. [DOI] [Google Scholar]
- Sivapragasam C.; Muttil N.; Muthukumar S.; Arun V. M. Prediction of algal blooms using genetic programming. Mar. Pollut. Bull. 2010, 60 (10), 1849–1855. 10.1016/j.marpolbul.2010.05.020. [DOI] [PubMed] [Google Scholar]
- Chang N.-B.; Bai K.; Chen C.-F. Integrating multisensor satellite data merging and image reconstruction in support of machine learning for better water quality management. Journal of Environmental Management 2017, 201, 227–240. 10.1016/j.jenvman.2017.06.045. [DOI] [PubMed] [Google Scholar]
- Tian W.; Liao Z.; Zhang J. An optimization of artificial neural network model for predicting chlorophyll dynamics. Ecological Modelling 2017, 364, 42–52. 10.1016/j.ecolmodel.2017.09.013. [DOI] [Google Scholar]
- Kim S. M.; Shin J.; Baek S.; Ryu J.-H. U-Net convolutional neural network model for deep Red Tide learning using GOCI. Journal of Coastal Research 2019, 90 (sp1), 302. 10.2112/SI90-038.1. [DOI] [Google Scholar]
- Park Y.; Lee H. K.; Shin J.-K.; Chon K.; Kim S.; Cho K. H.; Kim J. H.; Baek S.-S. A machine learning approach for early warning of cyanobacterial bloom outbreaks in a freshwater reservoir. Journal of Environmental Management 2021, 288, 112415. 10.1016/j.jenvman.2021.112415. [DOI] [PubMed] [Google Scholar]
- Pyo J.; Duan H.; Baek S.; Kim M. S.; Jeon T.; Kwon Y. S.; Lee H.; Cho K. H. A convolutional neural network regression for quantifying cyanobacteria using hyperspectral imagery. Remote Sensing of Environment 2019, 233, 111350. 10.1016/j.rse.2019.111350. [DOI] [Google Scholar]
- Liu Z.; Wang X.; Cui L.; Lian X.; Xu J. Research on water bloom prediction based on least squares support vector machine. 2009 WRI World Congress on Computer Science and Information Engineering; IEEE: 2009; pp 764–768. 10.1109/CSIE.2009.476. [DOI] [Google Scholar]
- Xie Z.; Lou I.; Ung W. K.; Mok K. M. Freshwater algal bloom prediction by support vector machine in Macau storage reservoirs. Mathematical Problems in Engineering 2012, 2012, 1–12. 10.1155/2012/397473. [DOI] [Google Scholar]
- Dai C.; Tan Q.; Lu W. T.; Liu Y.; Guo H. C. Identification of optimal water transfer schemes for restoration of a eutrophic lake: An integrated simulation-optimization method. Ecological Engineering 2016, 95, 409–421. 10.1016/j.ecoleng.2016.06.080. [DOI] [Google Scholar]
- Mamun M.; Kim J.-J.; Alam M. A.; An K.-G. Prediction of algal chlorophyll-a and water clarity in monsoon-region reservoir using machine learning approaches. Water 2020, 12 (1), 30. 10.3390/w12010030. [DOI] [Google Scholar]
- Segura A. M.; Piccini C.; Nogueira L.; Alcántara I.; Calliari D.; Kruk C. Increased sampled volume improves Microcystis aeruginosa complex (MAC) colonies detection and prediction using Random Forests. Ecological Indicators 2017, 79, 347–354. 10.1016/j.ecolind.2017.04.047. [DOI] [Google Scholar]
- Zeng Q.; Liu Y.; Zhao H.; Sun M.; Li X. Comparison of models for predicting the changes in phytoplankton community composition in the receiving water system of an inter-basin water transfer project. Environ. Pollut. 2017, 223, 676–684. 10.1016/j.envpol.2017.02.001. [DOI] [PubMed] [Google Scholar]
- Yñiguez A. T.; Ottong Z. J. Predicting fish kills and toxic blooms in an intensive mariculture site in the Philippines using a machine learning model. Science of The Total Environment 2020, 707, 136173. 10.1016/j.scitotenv.2019.136173. [DOI] [PubMed] [Google Scholar]
- Lin S.; Novitski L. N.; Qi J.; Stevenson R. J. Landsat TM/ETM+ and machine-learning algorithms for limnological studies and algal bloom management of inland lakes. Journal of Applied Remote Sensing 2018, 12 (02), 1. 10.1117/1.JRS.12.026003. [DOI] [Google Scholar]
- Cao Z.; Ma R.; Duan H.; Pahlevan N.; Melack J.; Shen M.; Xue K. A machine learning approach to estimate chlorophyll-a from Landsat-8 measurements in inland lakes. Remote Sensing of Environment 2020, 248, 111974. 10.1016/j.rse.2020.111974. [DOI] [Google Scholar]
- Muttil N.; Chau K. W. Neural network and genetic programming for modelling coastal algal blooms. International Journal of Environment and Pollution 2006, 28 (3-4), 223. 10.1504/IJEP.2006.011208. [DOI] [Google Scholar]
- Daghighi A. Harmful Algae Bloom Prediction Model for Western Lake Erie Using Stepwise Multiple Regression and Genetic Programming. M.S. Thesis, Cleveland State University, 2017. [Google Scholar]
- Recknagel F.; Bobbin J.; Whigham P.; Wilson H. Comparative application of artificial neural networks and genetic algorithms for multivariate time-series modelling of algal blooms in freshwater lakes. Journal of Hydroinformatics 2002, 4 (2), 125–133. 10.2166/hydro.2002.0013. [DOI] [Google Scholar]
- Almuhtaram H.; Zamyadi A.; Hofmann R. Machine learning for anomaly detection in cyanobacterial fluorescence signals. Water Res. 2021, 197, 117073. 10.1016/j.watres.2021.117073. [DOI] [PubMed] [Google Scholar]
- Heddam S.; Yaseen Z. M.; Falah M. W.; Goliatt L.; Tan M. L.; Sa’adi Z.; Ahmadianfar I.; Saggi M.; Bhatia A.; Samui P. Cyanobacteria blue-green algae prediction enhancement using hybrid machine learning-based gamma test variable selection and empirical wavelet transform. Environmental Science and Pollution Research 2022, 29 (51), 77157–77187. 10.1007/s11356-022-21201-1. [DOI] [PubMed] [Google Scholar]
- Tanaka A.; Kishino M.; Doerffer R.; Schiller H.; Oishi T.; Kubota T. Development of a neural network algorithm for retrieving concentrations of chlorophyll, suspended matter and yellow substance from radiance data of the ocean color and temperature scanner. Journal of Oceanography 2004, 60 (3), 519–530. 10.1023/B:JOCE.0000038345.99050.c0. [DOI] [Google Scholar]
- Ioannou I.; Gilerson A.; Gross B.; Moshary F.; Ahmed S. Deriving ocean color products using neural networks. Remote Sensing of Environment 2013, 134, 78–91. 10.1016/j.rse.2013.02.015. [DOI] [Google Scholar]
- DeLancey E. R.; Simms J. F.; Mahdianpari M.; Brisco B.; Mahoney C.; Kariyeva J. Comparing deep learning and shallow learning for large-scale wetland classification in Alberta, Canada. Remote Sensing 2020, 12 (1), 2. 10.3390/rs12010002. [DOI] [Google Scholar]
- Sagan V.; Peterson K. T.; Maimaitijiang M.; Sidike P.; Sloan J.; Greeling B. A.; Maalouf S.; Adams C. Monitoring inland water quality using remote sensing: potential and limitations of spectral indices, bio-optical simulations, machine learning, and cloud computing. Earth-Science Reviews 2020, 205, 103187. 10.1016/j.earscirev.2020.103187. [DOI] [Google Scholar]
- Ai H.; Zhang K.; Sun J.; Zhang H. Short-term Lake Erie algal bloom prediction by classification and regression models. Water Res. 2023, 232, 119710. 10.1016/j.watres.2023.119710. [DOI] [PubMed] [Google Scholar]
- Yuan A.; Wang B.; Li J.; Lee J. H. W. A low-cost edge AI-chip-based system for real-time algae species classification and HAB prediction. Water Res. 2023, 233, 119727. 10.1016/j.watres.2023.119727. [DOI] [PubMed] [Google Scholar]
- Lee D.; Kim M.; Lee B.; Chae S.; Kwon S.; Kang S. Integrated explainable deep learning prediction of harmful algal blooms. Technological Forecasting and Social Change 2022, 185, 122046. 10.1016/j.techfore.2022.122046. [DOI] [Google Scholar]
- Lundberg S.; Lee S.-I. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems 30 (NIPS 2017); Neural Information Processing Systems Foundation, Inc.: 2017. [Google Scholar]
- Lundberg S. M.; Erion G. G.; Lee S.-I. Consistent individualized feature attribution for tree ensembles. arXiv (Computer Science.Machine Learning), March 17, 2019, 1802.03888, ver. 3. https://arxiv.org/abs/1802.03888.
- Cutler D. R.; Edwards T. C.; Beard K. H.; Cutler A.; Hess K. T.; Gibson J.; Lawler J. J. Random forests for classification in ecology. Ecology 2007, 88 (11), 2783–2792. 10.1890/07-0539.1. [DOI] [PubMed] [Google Scholar]
- Langmead B.; Salzberg S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 2012, 9 (4), 357–359. 10.1038/nmeth.1923. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bankevich A.; Nurk S.; Antipov D.; Gurevich A. A.; Dvorkin M.; Kulikov A. S.; Lesin V. M.; Nikolenko S. I.; Pham S.; Prjibelski A. D.; Pyshkin A. V.; Sirotkin A. V.; Vyahhi N.; Tesler G.; Alekseyev M. A.; Pevzner P. A. SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology 2012, 19 (5), 455–477. 10.1089/cmb.2012.0021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sangiovanni M.; Granata I.; Thind A.; Guarracino M. R. From trash to treasure: detecting unexpected contamination in unmapped NGS data. BMC Bioinformatics 2019, 20 (S4), 168. 10.1186/s12859-019-2684-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Moreira F.-C.; Sarquis D. P.; Santana de Souza J. E.; de Souza Avelar D.; Thomaz Araújo T. M.; Khayat A. S.; Batista dos Santos S. E.; Pimentel de Assumpção P. Treasures from trash in cancer research. Oncotarget 2022, 13, 1246–1257. 10.18632/oncotarget.28308. [DOI] [PMC free article] [PubMed] [Google Scholar]
- García-Martín E.; Faviola Rodrigues C.; Riley G.; Grahn H. Estimation of energy consumption in machine learning. Journal of Parallel and Distributed Computing 2019, 134, 75–88. 10.1016/j.jpdc.2019.07.007. [DOI] [Google Scholar]
- Sze V.; Chen Y.-H.; Emer J.; Suleiman A.; Zhang Z. Hardware for machine learning: Challenges and opportunities. 2017 IEEE Custom Integrated Circuits Conference (CICC), Austin, TX, USA; IEEE: 2017; pp 1–8. 10.1109/CICC.2017.7993626. [DOI] [Google Scholar]
- Armstrong M. P. High Performance Computing for geospatial applications: A prospective view. In High Performance Computing for Geospatial Applications. Geotechnologies and the Environment; Tang W., Wang S., Eds.; Springer: 2020; Vol. 23, p 271. 10.1007/978-3-030-47998-5_15. [DOI] [Google Scholar]
- Tremblay J.; Schreiber L.; Greer C. W. High-resolution shotgun metagenomics: The more data, the better? Briefings in Bioinformatics 2022, 23 (6), bbac443. 10.1093/bib/bbac443. [DOI] [PubMed] [Google Scholar]
- Su X.; Xu J.; Ning K. Parallel-META: efficient metagenomic data analysis based on high-performance computation. BMC Systems Biology 2012, 6 (S1), S16. 10.1186/1752-0509-6-S1-S16. [DOI] [PMC free article] [PubMed] [Google Scholar]
- D’Agostino D.; Morganti L.; Corni E.; Cesini D.; Merelli I. Combining Edge and Cloud computing for low-power, cost-effective metagenomics analysis. Future Generation Computer Systems 2019, 90, 79–85. 10.1016/j.future.2018.07.036. [DOI] [Google Scholar]
- Yelick K.; Buluç A.; Awan M.; Azad A.; Brock B.; Egan R.; Ekanayake S.; Ellis M.; Georganas E.; Guidi G.; Hofmeyr S.; Selvitopi O.; Teodoropol C.; Oliker L. The parallelism motifs of genomic data analysis. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 2020, 378 (2166), 20190394. 10.1098/rsta.2019.0394. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hu B.; Canon S.; Eloe-Fadrosh E. A.; Anubhav; Babinski M.; Corilo Y.; Davenport K.; Duncan W. D.; Fagnan K.; Flynn M.; Foster B.; Hays D.; Huntemann M.; Jackson E. K. P.; Kelliher J.; Li P.-E.; Lo C.-C.; Mans D.; McCue L. A.; Mouncey N.; Mungall C. J.; Piehowski P. D.; Purvine S. O.; Smith M.; Varghese N. J.; Winston D.; Xu Y.; Chain P. S. G. Challenges in bioinformatics workflows for processing microbiome omics data at scale. Front. Bioinform. 2022, 1, 826370. 10.3389/fbinf.2021.826370. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sellner K. G.; Doucette G. J.; Kirkpatrick G. J. Harmful algal blooms: causes, impacts and detection. J. Ind. Microbiol. Biotechnol. 2003, 30, 383–406. 10.1007/s10295-003-0074-9. [DOI] [PubMed] [Google Scholar]
- Giest S.; Samuels A. ‘For good measure’: data gaps in a big data world. Policy Sci. 2020, 53, 559–569. 10.1007/s11077-020-09384-1. [DOI] [Google Scholar]
- Zainurin S. N.; Wan Ismail W. Z.; Mahamud S. N. I.; Ismail I.; Jamaludin J.; Ariffin K. N. Z.; Wan Ahmad Kamil W. M. Advancements in monitoring water quality based on various sensing methods: A systematic review. Int. J. Environ. Res. Public Health 2022, 19, 14080. 10.3390/ijerph192114080. [DOI] [PMC free article] [PubMed] [Google Scholar]
- IOC-UNESCO. The Harmful Algal Event Database (HAEDAT), October 8, 2023. https://ipt.iobis.org/hab/resource?r=haedat&v=3.16
- News Reports of Algae Blooms, 2010 to Present. Environmental Working Group (EWG). https://www.ewg.org/interactive-maps/algal_blooms/map.
- Chaffin J. D.; Bratton J. F.; Verhamme E. M.; Bair H. B.; Beecher A. A.; Binding C. E.; Birbeck J. A.; Bridgeman T. B.; Chang X.; Crossman J.; Currie W. J. S.; Davis T. W.; Dick G. J.; Drouillard K. G.; Errera R. M.; Frenken T.; MacIsaac H. J.; McClure A.; McKay R. M.; Reitz L. A.; Zhou X.; et al. The Lake Erie HABs Grab: A binational collaboration to characterize the western basin cyanobacterial harmful algal blooms at an unprecedented high-resolution spatial scale. Harmful Algae 2021, 108, 102080. 10.1016/j.hal.2021.102080. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Saleem F.; Jiang J. L.; Atrache R.; Paschos A.; Edge T. A.; Schellhorn H. E. Cyanobacterial algal bloom monitoring: Molecular methods and technologies for freshwater ecosystems. Microorganisms 2023, 11, 851. 10.3390/microorganisms11040851. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Khan R. M.; Salehi B.; Mahdianpari M.; Mohammadimanesh F.; Mountrakis G.; Quackenbush L. J. A meta-analysis on Harmful Algal Bloom (HAB) detection and monitoring: A remote sensing perspective. Remote Sens. 2021, 13, 4347. 10.3390/rs13214347. [DOI] [Google Scholar]
- Whitman P.; Schaeffer B.; Salls W.; Coffer M.; Mishra S.; Seegers B.; Loftin K.; Stumpf R.; Werdell P. J. A validation of satellite derived cyanobacteria detections with state reported events and recreation advisories across U.S. lakes. Harmful Algae 2022, 115, 102191. 10.1016/j.hal.2022.102191. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Luthra P. Comparison of machine learning techniques to predict missing cyanobacteria data and trophic states of lakes. M.S. Thesis, University of Georgia, 2017. http://getd.libs.uga.edu/pdfs/luthra_priyanka_201712_ms.pdf. [Google Scholar]




