Skip to main content
NIHPA Author Manuscripts logoLink to NIHPA Author Manuscripts
. Author manuscript; available in PMC: 2025 May 9.
Published in final edited form as: Curr Biol. 2025 Apr 14;35(9):2175–2182.e3. doi: 10.1016/j.cub.2025.03.035

Delayed flowering phenology of red-flowering plants in response to hummingbird migration

Patrick F McKenzie 1,*, Andrea E Berardi 1,2, Robin Hopkins 1
PMCID: PMC12062948  NIHMSID: NIHMS2074900  PMID: 40233751

Summary

The radiation of angiosperms is marked by a phenomenal diversity of floral size, shape, color, scent, and reward.14 The multi-dimensional response to selection to optimize pollination has generated correlated suites of these floral traits across distantly related species, known as ‘pollination syndromes’.59 The ability to test the broad utility of pollination syndromes and expand upon the generalities of these syndromes is constrained by limited trait data, creating a need for new approaches that can integrate vast, unstructured records from community science platforms. Here, we compile the largest North American flower color dataset to date, using GPT-4 with Vision to classify color in over 11,000 species across more than 1.6 million iNaturalist observations. We discover that red- and orange-flowering species (classic “hummingbird pollination” colors) bloom later in eastern North America compared to other colors, corresponding to the arrival of migratory hummingbirds. Our findings reveal how seasonal flowering phenology, in addition to floral color and morphology, can contribute to the hummingbird pollination syndrome in regions where these pollinators are migratory. Our results highlight phenology as an underappreciated dimension of pollination syndromes and underscore the utility of integrating artificial intelligence with community science data. The potential breadth of analysis offered by community science datasets combined with emerging data extraction techniques could accelerate discoveries about the evolutionary and ecological drivers of biological diversity.

Graphical Abstract

graphic file with name nihms-2074900-f0001.jpg

Results

Pollination syndromes describe convergent evolution of suites of floral traits, such as flower color, shape, scent, and nectar production, in response to pollinator-driven selection.2,5,7 While pollination syndromes are not strict “rules,” they provide a valuable framework for understanding how major selective forces shape reproductive strategies in flowering plants.8 For example, hummingbird-pollinated flowers (ornithophilous / trochilophilous) often share a distinctive combination of bright red or orange coloration, tubular forms, exserted stamens, and abundant nectar.2,5,7,9

Tests of pollination syndromes at broad geographic scales have been constrained by limited trait data.6,1012 While existing resources like the TRY Plant Trait Database are helpful, their coverage remains incomplete (e.g. only ~2,500 North American species with recorded flower color in the TRY dataset).13,14 Meanwhile, community science platforms such as iNaturalist are amassing orders of magnitude more observations, albeit with challenges in data curation and standardization.15 Here, we harness community science data using an advanced computer vision model to classify flower color for over 11,000 species. Leveraging this dataset, we test the hypothesis that seasonal flowering phenology may be part of the hummingbird pollination syndrome in eastern North America. Because ruby-throated hummingbirds (Archilochus colubris [Linnaeus, 1758]) migrate seasonally to and from this region, we predict that red- and orange-flowering species occurring there delay blooming relative to other flower colors until these pollinators are present, thus aligning their reproductive timing with hummingbird availability.5,9,1618

Generation of a North American flower color database.

To test our hypothesis, we created a database of flower color for 11,729 North American flowering plant species by combining observation data from the community-science website iNaturalist and GPT-4 with Vision (GPT-4V) (pipeline summarized in Figure S1).19 This database includes 1,674,908 color-labeled observations, each of which is annotated as being in flower when observed and is associated with a time, location, and research-grade species identification. This final dataset was filtered from a total of 1,763,821 observations of angiosperms from iNaturalist, including 13,378 species, that were associated with the “flowering” phenology label as of October 11, 2023. GPT-4V was able to label the color of 88% of the species with only 1,637 species labeled as “NAN” or “UNKNOWN” flower color. With these species-level labels we were able to map flower color to 95% of the observations in our iNaturalist dataset. The cost of using GPT-4V for this task at this time was under 100 USD.

The size of our dataset is substantially larger than the TRY Plant Trait flower color dataset, which contains flower color for only 2,527 species (2,035 species overlapping with our dataset) in our study area, whereas we generated flower color calls for ~11K species. Our dataset is also a more accurate assessment of categorical color as evaluated by independent human labeling of a sampled dataset. Specifically, out of a validation dataset of 250 species with a mix of flower colors, the three authors agreed with each other on 90% of the color labels (mean = 225, standard deviation = 3.60). The authors agreed with the GPT-4V color labels on 87% of species (mean = 217, standard deviation = 3) and agreed with the TRY color labels on only 81.6% of the species (mean = 204, standard deviation = 5.57). Furthermore, the authors visually inspected all 294 species labeled with red flowers from GPT-4V and were able to validate 85.4% of the species color assignments, representing over 95% of the red-labeled observations from iNaturalist (38,840 out of 40,742 observations). The mismatch between the percentage of species correctly labeled and the percentage of corresponding observations correctly labeled suggests that less-abundant species were more likely to be mislabeled.

The GPT-4V-labeled dataset had over 1 million more observations and almost 10,000 more species than the TRY-labeled dataset, and more accurate color labels. Our query of iNaturalist also retrieved 163,693 observations for hummingbirds and 408,406 observations of bumblebees in North America for downstream analysis. These observations are similarly associated with time and location data.

Patterns of flowering plants across North America.

Our sweeping GPT-4V-labeled dataset of flowering plants reveals general patterns in phenology across North America (Figure 1). Nearly 50% of both the unique species observed and total observations in the dataset belong to either the white or yellow color categories (Figure 1B). Red flowers only make up approximately 2.5% of the species and observations. Combining all colors, the average flowering time is consistent across all latitudes; however, the variance around the mean decreases dramatically with increasing latitude (Figure 1A). Lower latitudes tend to have a bimodal flowering pattern with a burst of blooming earlier in the year and a second burst late in the year, while flowering at higher latitudes is confined to a shorter season.

Figure 1: Summary of iNaturalist observation data.

Figure 1:

(A) Overall flowering phenology across all colors in the dataset with mean indicated by central black boxes, distribution indicated by gray violin plots, and 10th and 90th percentiles indicated by dashed black line. While the average flowering time is similar across latitudes, the variance of flowering times throughout the year decreases with increasing latitudes. (B) The flower color frequencies in the North American iNaturalist dataset show similar patterns when broken down by species (left) and number of observations (right). Colors clockwise from the black indicator arrow are: red, orange, brown/maroon, blue, green, yellow, white, purple, pink. The corresponding table can be found as Table S1. (C) Patterns of flowering plant and hummingbird observations in the eastern U.S. in week 11 of the year (top maps and chart) versus week 21 (bottom maps and chart). The maps show gray dots for every observation of flowering plants (left map) and hummingbirds (right map). In week 11, flowers are blooming throughout the eastern U.S., but hummingbirds have not yet arrived. By week 21, hummingbirds have migrated up the eastern U.S. The pie charts show a corresponding increase in the abundance of red and orange flower observations in the eastern U.S. by week 21 compared to week 11. Order of colors is the same as in Figure 1B. See also Table S1.

As day of the year progresses from winter into spring, the number of flower observations increases, particularly in northern latitudes (Video S1). This general pattern of increasing observation data is also true for the two pollinators, bumblebees and hummingbirds, for which we also obtained data from iNaturalist. Observations of hummingbirds are strikingly absent from the eastern United States in the first weeks of the year and increase northward over time after the first three months of the year. The ruby-throated hummingbird, Archilochus colubris [Linnaeus, 1758], is the only species of hummingbird native to the eastern United States (with the exception of rare vagrants).18 They winter in Central America and then migrate northward from March through May. In contrast, hummingbirds are present along the west coast of the United States year-round, and they are largely absent from the middle of the United States.

Red- and orange-flowering plants bloom later than other colors.

Flowering time of red- and orange-flowering plants lags behind that of other colors in the eastern half of the U.S. There, red- and orange-flowering plants are conspicuously absent from the observation dataset in early spring, which corresponds to a period of time when hummingbirds are also not observed in this area. For example, in week 11 of the year, when hummingbirds have not yet migrated through the eastern U.S., red- and orange-flowering observations are scarce. In contrast, by week 21 when hummingbird observations are abundant, red and orange flowers are present and near their overall frequency in the dataset (Figure 1C).

The difference in flowering time can be visualized by tracking the northern edge (latitude of the northernmost 80th percentile of flowering times) of flowering for plants of each flower color (Figure 2A). In the eastern U.S. (longitudes −96° to −59°), red- and orange-flowering plants start flowering 20–50 days after all other colors (white, blue, purple/pink, brown/maroon, green, and yellow). The lag begins about 20 days into the year, when flowers start blooming in the southeastern U.S., and the northern range of red- and orange-flowering plants catches up to the other colors around day 120. Because occurrences with white flowers are the most frequent category in our dataset and are concordant with other non-red/orange colors in their phenological advance northward, and because white flowers appeal to a broad range of pollinators, we proceed by specifically comparing red and orange against white flowers (although see Figures 2A, S2, and S3).

Figure 2: Demonstration of the late appearance of red and orange flowers in eastern North America.

Figure 2:

(A) Latitude of the northern edge (80th percentile of observations) of flowering time for different colors by day of year in the eastern U.S. Red and orange lag behind the other colors (shown in gray, with thicker gray line being the average of all other colors and thick black line representing white flowers). (B) Normalized abundances of red (in red) and white (in white) flowers for latitudinal sections of the eastern U.S. (latitudes 24–34°, 34–44°, and 44–54°) across the day of the year. (C) Consolidated results from MaxEnt maps corresponding to the difference in red and white flowering distributions in sliding windows through the year. The color of each pixel corresponds to the number of days by which white flower occupancy is inferred to occur before red flower occupancy for that pixel. Blue pixels indicate where white flowers occur before red flowers. Pixels that are pure white indicate no time lag; in the southern portions of the map this could represent pixels where both colors are present on the first day of the year, or in more northern regions this could represent pixels where neither color passes the probability threshold for “presence.” See also Figure S3, Video S1, Video S2, and Video S3.

The pattern of delayed flowering in red and orange flowers is robust to overall abundance. There are more white flower observations than red flower observations; to ensure patterns of flower time observations are not confounded with patterns of overall abundance we also compare red and white flowering abundance normalized to total abundance for each region of the study area (Figure 2B). As expected, the flowering season in the northern latitudes begins later than in the southern latitudes, as reflected in rightward movement of the histograms for the northern latitudes. In the eastern U.S., white flowers start blooming earlier than red flowers (the observation histograms for white flowers extend farther to the lower values on the left compared to red flowers), particularly at middle and upper latitudes (latitudes 34° to 54°).

We quantified the extent of the flowering delay for red-flowered species across North America by using a species distribution model approach to infer the spatial patterns of flowering time of red- and white-flowering observations only. The difference between red and white flowering times is extensive (heatmap shading in Figure 2C; Video S2). On average, white flowers are present 25.5 days (median=31.0, std=18.0) before red flowers broadly across the eastern U.S. In Mexico, red flowers appear to lead white flowers; however, there was very limited data associated with the high-elevation regions where this pattern was prominent, and so we expect much higher variance. For example, because of limited sampling white flower presence in some areas never passed the probability threshold that we used to determine occupancy (these cells take the darkest red color in Figure 2C).

Pollinator presence is associated with flower color phenology differences.

Red flowering time tracked the observations of hummingbirds but not bumblebees, many species of which are floral generalists. We found a striking overlap between the distributions of red flowers and hummingbirds, as well as between white flowers and bumblebees (Figure 3A, Figure S2, Video S3). The latitude range for white flowers and for bumblebees advances northward earlier in the spring (beginning around days 20–40) than for red flowers and hummingbirds, both of which advance steeply later in the spring (beginning around days 60–80) (Figure S3).

Figure 3: Correspondence of pollinators and red vs. white flowers.

Figure 3:

(A) The middle 50% latitude occupancy of red flowers, white flowers, hummingbirds, and bumblebees by day of year within the study area. On the top row, red flowers and hummingbirds are layered and white flowers and bumblebees are layered. The bottom row highlights pollinator mismatch, with bumblebees layered on red flowers and hummingbirds layered on white flowers. The average percent overlap across all days is reported in the bottom-right corner of each plot. (B) Importance assigned to each variable in MaxEnt distribution models for inferring the niche of red flowers and white flowers in 15-day overlapping windows through the year. Variables used for niche inference were seven environmental variables, as well as the inferred hummingbird distribution for that window and the inferred bumblebee distribution from that window. Black dividing lines indicate divisions between seasons. Darker red indicates stronger inferred importance for predicting flowering distribution. See also Figure S2 and Video S3.

We confirmed these observations by quantifying the degree to which pollinator presence can predict the presence of each flower color. We found that for most of the year, hummingbird distribution is the most important predictor variable for red flower occurrence (Figure 3B top panel). In contrast, for most of the year, and most importantly during the expansion northward in the springtime, bumblebee distribution is the most important predictor variable for white flower occurrence (Figure 3B bottom panel).

Discussion

We have created a novel pipeline using community science data alongside GPT-4V, a flexible artificial intelligence tool, to compare the flowering phenologies of different angiosperm flower colors. We predicted that the timing of seasonal hummingbird migration would constrain the appearance of red and orange flowers in eastern North America. Using our new dataset, we demonstrate that red- and orange-flowering plants in this region bloom later than those of other colors, and we show that flowering of these individuals coincides closely with the seasonal arrival of hummingbirds. Our results expand our understanding of the scope of plant-pollinator interactions, suggesting constraints on flowering phenology across red- and orange-flowering plants by the movements of their hummingbird pollinators.

We compiled a large dataset of over 1 million iNaturalist observations from more than 10,000 North American flowering plant species. Each observation in this dataset included a species identification, geographic coordinates, flowering status, and assigned flower color. Our use of GPT-4V for categorizing flower colors was highly successful, achieving accuracy comparable to inter-author consistency and outperforming a well-known plant trait database. By capitalizing on this novel methodology, our study reveals biological patterns that deepen our understanding of plant-pollinator interactions.

Previous work has suggested that pollinator availability could shape where and when flowers of different colors occur. There are biogeographical patterns in some flower colors: for example, white and yellow flowers are the predominant phenotype in the arctic, where most pollinators are bees and flies.9,20,21 There are time-of-day patterns for some cases: for example, some species only open at night when their crepuscular (i.e., active during twilight) or nocturnal pollinators are active.22 There is also support for correspondence between flowering phenology of limited sets of species and their specialist pollinators, like a previous case study tying flowering phenology to hummingbird presence for an Andean plant species, and research linking bat migration in southwestern North America to the flowering of Agave.23,24 However, until now it has been unknown whether seasonal flowering phenology at very broad geographic and taxonomic scales might be influenced by pollinator behavior and thus be included in pollination syndromes.

The lag between red/orange and the other colors is consistent with convergent evolution of seasonal flowering phenology driven by hummingbird pollination in eastern North America, extending beyond the known physical floral traits (red, tubular flowers with exserted stamens) of a “hummingbird flower.” To maximize simplicity, we did not consider underlying taxonomic identities of the plants in our characterization of the flowering time lag. Still, we observe that among the 43,882 observations of red- or orange-flowering plant species in eastern North America, there are 19 families represented by at least 500 observations, suggesting that the phenomenon is spread across distantly related taxa and could reflect convergent evolution.

Our findings motivate extensive future research exploring nuances in these patterns of biological diversity. For example, future work could explicitly test for phylogenetic conservatism using established phylogenetic comparative methods.25 Additionally, next steps could seek to explore the mechanisms underlying this phenomenon, characterizing within- vs. between-species patterns in flowering time, as well as looking at the contributions of common vs. rare species and native vs. introduced species to the lag in flowering. Each color category might encompass multiple modes of pollination, and screening out wind-pollinated taxa would further isolate the signals in phenology specifically driven by pollinator availability.

With climate change causing hummingbirds to arrive earlier at northern latitudes, flowering phenology may come under strong directional selection to maintain synchrony with hummingbird presence, both during the short blooming seasons at northern latitudes but also at lower latitudes, where we observe a bimodal distribution of flower abundance (Figure 1A).18,26 Plant/pollinator synchrony is relevant for other regions and systems too; while our analysis focuses on a single example of phenology-related constraints, future work could examine other seasonally constrained pollinators, explore different geographic regions, or investigate additional physical traits associated with pollination syndromes (e.g., floral shape instead of floral color). We propose a directional influence (i.e., that hummingbird migration constrains flowering time) since red- and orange-flowering species have distinct phenologies compared with other colors; however, especially in a changing climate, we imagine that the phenology of these hummingbird flowers should influence the success of their pollinators as well, and future research could explore this further.

While our methodological approach offers significant advantages, we acknowledge the limitations inherent in using community science data. Longstanding methods for precisely measuring floral color – e.g., pigment extraction, spectral reflectance analysis, and standardized photography – are essential for achieving fine-grained accuracy but limit the scale at which floral color data can be analyzed.27,28 By using community science data together with computer vision, we traded some precision for quantity, enabling us to construct a dataset quickly and at minimal cost, while still maintaining high accuracy at the level of coarse color categories. To our knowledge, this is the most comprehensive flower color dataset ever assembled for a geographic region of this size. We also recognize known biases associated with iNaturalist data, such as observations being concentrated in urban areas and showier flowers tending to be disproportionately represented.29 However, since we are primarily examining broad, latitudinal flowering patterns within each color group, we do not expect these biases to significantly impact our straightforward results here. Future research might incorporate herbarium specimens, which exhibit less taxonomic and geographic bias than iNaturalist data.30 Finally, we focused on comparing red- and white-flowering plants because their colors are easily distinguished, and because they tend to attract different pollinators (birds vs. generalist insects). The phenology of white flowers corresponded well with bumblebees and with that of other colors in our analysis of North American plants (Figure S2).

Previous studies have explored machine learning approaches for extracting color and other information from plant image data, including from community science datasets.3135 However, these methods have been limited by the need for extensive model training and manual identification of the flowers within images, which restricts their scalability. In contrast, large multimodal models (LMMs) like GPT-4V offer the advantage of generality due to their pre-training on diverse datasets, eliminating the need for specialized training and manual annotation. This generality enabled us to perform extensive sampling, enhancing the statistical power of our analysis. Targeted studies focusing on flower color in a more constrained set of taxa or using photos with consistent composition and exposure might achieve finer-grained characterization. In such cases, dedicated models specifically trained for these purposes could better capture subtle differences in floral color, with biases that are easier to characterize and account for.

Future work at the intersection of phenology and community science data could further scale up our data extraction approach to include more observations and/or additional traits. In creating our dataset, we used a representative photo from each species to assign color to the entire species, which excludes within-species variation. Reducing the task from labeling each observation to labeling each species saved money, keeping the API use cost under 100 USD, and was also necessitated by the rate limits of the GPT-4V preview model. As computer vision models become more affordable and allow higher-throughput analysis, future approaches might instead query each individual observation. This would account for possible color polymorphism, a source of uncertainty in our current dataset. It would also eliminate the dependence on manually annotated “flowering” phenology through iNaturalist, as recent work has shown that computer vision could directly determine whether the plant in each observation is flowering.36

By presenting a method for analyzing large-scale phenotypic patterns in community science data, we demonstrate an efficient method to test a range of hypotheses across diverse biological systems. We anticipate that our iNaturalist-to-GPT-4V pipeline can be generalized to study other traits, like vegetative characteristics or presence/absence of flowers, as well as questions in non-plant taxa. Harnessing emerging technologies alongside massive community science datasets like iNaturalist could unlock unprecedented insights into phenotypic diversity, transforming our understanding of global patterns of biological variation.

STAR Methods

Experimental model and study participant details

Study area.

We limited our study area to the contiguous lower United States, northern Mexico, and southern Canada with a latitude/longitude-bounded box ranging from −130° to −59° in longitude and from 24° to 54° in latitude.

iNaturalist observation data.

We used the iNaturalist data export tool (https://www.inaturalist.org/observations/export) to collect data from our study area for all flowering plants (Angiospermae). We filtered the export to only include observations that were labeled as being in flower, had open geoprivacy, had photos, and whose identifications were marked as research grade. We included hummingbirds as red-flower-associated pollinators, compared against bumblebees as generalist pollinators. For this analysis, we also exported all observations of each of those groups (family Trochilidae and genus Bombus, respectively) that had photos, had open geoprivacy, and were research grade.

TRY Plant Trait Database color data.

To help benchmark our flower color inferences with GTP-4V (see below), we accessed all non-restricted flower color data through the TRY Plant Trait Database. To do this, we queried public data for the traits “flower color” (ID=207) and “corolla color” (ID=3866). Like our GPT-4V-labeled data, the TRY data was associated with species. Because it is a global dataset, we trimmed down the dataset to only those species present in the iNaturalist dataset from our study area.

Method Details

GPT-4V flower color data.

To maximize the size of our dataset, we employed a new approach for bulk labeling of flower color, using a computer vision model from generative AI. Our approach was to assign a flower color for each of the species in the dataset and then to project these species-assigned colors back onto the full dataset of observation data (see Figure S1). To do this, we: 1) used the iNaturalist API to match each species to a representative “default” photo from the iNaturalist website, and 2) used a cutting-edge general computer vision model to assign each image with a categorical label describing the color of the flower. We interacted with the iNaturalist API in Python through the client “pyinaturalist.”37 We queried each species individually using the “get_taxa” function and saved the returned “default_photo” url alongside each species into a two-column dataframe – this dataframe had species names in the first column and a link to each representative photo in the second column.

For the computer vision step, we used the OpenAI API, which allows us to programmatically interact with OpenAI’s suite of generative AI models in Python. We used the “GPT-4 with Vision” model, which in late 2023 – early 2024 was available in a rate-limited preview capacity, by designating the model choice as “gpt-4-vision-preview”. The GPT models are designed to be very general, and the vision model can interpret basic features of both text and photographs. For each query, we passed GPT-4V the url link to the representative species photo along with the following instructions:

“Please adhere to very specific formatting in your response: three words separated onto three lines (one word per line). The first line should indicate ‘YES’ or ‘NO’ to answer whether there is a flower present. The second line should be one word from the following list, to best describe the flower color in the photo: [‘BLUE’, ‘BROWN’, ‘GREEN’, ‘ORANGE’, ‘PINK’, ‘PURPLE’, ‘RED’, ‘MAROON’, ‘WHITE’, ‘YELLOW’,’UNKNOWN’,’NAN’]. The flowers might not match these categories perfectly. Do the best you can. If in doubt, please be conservative and choose ‘unknown’. The third line should indicate your assessment of the subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that the choice of color assignment seems highly subjective.”

We saved the GPT-4V-labeled results in a dataframe for further analysis, in which each row corresponds to a different species (Table S2).

Validation.

We conducted a validation analysis to ensure reliability in categorical color label assignments, focusing on the 250 species with the highest abundance that had flower color labels available in both the GPT-4V and TRY datasets. Each of the three authors manually scored flower color for all 250 species from the following pool of colors: blue, yellow, green, white, purple / pink / violet, orange, red, maroon / brown, and black. We averaged pairwise comparisons among the three authors with the GPT-4V and TRY labels. Given that multiple possible colors are associated with some species in the TRY data (i.e., color polymorphism), we considered it a match if any of the TRY colors for a species corresponded with our assigned color (e.g., if a species had both “red” and “white” color labels in the TRY data and our validation consensus for this species was “white,” we marked the TRY label as correct). We also manually validated all 294 species labeled as red (Table S3).

Quantification and Statistical Analysis

We analyzed our data using a combination of R and Python notebooks.37,38 For the purposes of the analysis, we combined the color labels of pink/purple and brown/maroon.

Comparing northern edges of each flower color.

We subset the data to eastern (longitudes −96° to −59°) North America. In this region, we recorded the latitude of the northern edge of the distribution of each flower color across the ordinal dates of the year, using a sliding window of 25 days. Specifically, for each color category and for each 25-day window, we extracted all observations from our dataset with their corresponding latitudes, and we recorded the value 80th percentile using the R builtin quantile function. This gives us the northern latitude edge of flowering for each flower color for each 25-day period.

Comparing red vs. white flower abundance.

With the datasets of just red and white flower colors we summarize trends in flowering time by region for each color. We compared the relative abundance of observations of each color in different parts of North America. To do this, we subset the data to the eastern U.S. (longitudes −96° to −59°), and we further subset the data by latitude into 3 rows (latitudes 24° to 34°, 34° to 44°, 44° to 54°). Within each grid cell we created histograms showing the normalized abundance of each color by ordinal day. This demonstrates the different flowering distributions of the two colors over space and through time.

Quantifying lag in flowering time across the landscape.

We used niche modeling in sliding windows to determine occupancy of flowering plants across the landscape for each flower color through time. Distribution modeling is often used to infer the broad distribution of where a species or group of species can or could occur given some observations of the species and patterns of environmental variables. Instead of inferring occupancy of a species, we inferred occupancy of a flower of a particular color. Separately for red and for white, we subset the “flowering” observations in overlapping 15-day windows, sliding one day for each window. For each data subset, we used MaxEnt species distribution modeling program (version 3.4.4) to incorporate the observation data along with seven WorldClim 2.0 (10-minute) environmental variables (annual mean temperature, maximum temperature of warmest month, minimum temperature of the coldest month, annual precipitation, precipitation of the wettest month, precipitation of the driest month, and elevation [from the Shuttle Radar Topography Mission, accessed through WorldClim]) to predict, cell by cell, the probability of occupancy of that particular flower color during that particular window.39,40 The study area was represented by a 180×426 matrix.

Once we estimated occupancy of flowering, we calculated the extent to which red and white flowers bloomed at different times. To do this, we identified the first day of flowering for each color for each pixel as the point at which the MaxEnt-output value of that pixel was greater than 0.5. With two matrices of “first day of flowering” values for red and white flowers separately, we then subtracted the white flower matrix from the red flower matrix to get a matrix for which each pixel represented the number of days by which white flowering time led that for red flowers. When graphed, this represents the difference in flowering phenology of red and white flowers across space. We calculated the mean, median, and standard deviation of cells of this difference matrix in the eastern U.S. to quantify the extent to which white flowers lead red flowers.

Pollinator associations – hummingbirds and bumblebees.

We tested the hypothesis that flowering time of red flowered plants is constrained in the eastern United States because this is where ruby-throated hummingbirds are seasonal migrants. We investigated how well the presence of two different pollinators – hummingbirds, which are strongly associated with pollinating red flowers, and bumblebees, which are generalist pollinators that usually avoid red flowers – predicted the phenology of red and white flowers.9,16,17,41 For our study area, we exported all research-grade iNaturalist observations for bumblebees (genus Bombus) and hummingbirds (family Trochilidae). We used the pollinator data for two analyses: comparing the 25% (trailing edge) and 75% (leading edge) latitudes of red and white flower colors with the pollinators, and for niche modeling where the pollinator distributions were used as predictors.

For the first analysis, we subset the data to look at the eastern United States, where hummingbirds are widespread residents only during the late spring and summer. We used a sliding window each day of the year (starting edge of the window) and we calculated the 25% percentile latitude and the 75% percentile latitude. We repeated this across red flowers, white flowers, hummingbirds, and bumblebees, so that for any day of the year we have an estimate of latitude range. We then compared the distributions of flowering across latitude for these different flower colors and pollinators.

For the second analysis, we used MaxEnt for niche modeling. We first inferred distributions for hummingbirds and bumblebees separately for each day of the year, using 15-day sliding windows, linear models, and the seven predictor variables used for the flowering analysis above. Once we had these results, we then turned to red and white flowers and again ran MaxEnt for each day of the year. For these runs, we used 15-day sliding windows, linear models, and the previous seven “environmental” predictors, but we also included two more predictor variables for each run: the hummingbird and bumblebee distributions from the same window of the year. After these runs finished, we examined variable importance to determine whether environmental variables, hummingbird distributions, or bumblebee distributions were important in explaining red vs. white flower distributions for each day of the year.

Supplementary Material

Supplemental text
Supplemental Table1
Supplemental Table3

Table S3: Manual validation of species assigned “red” labels, related to STAR Methods.

Supplemental Table2

Table S2: Full dataset of species paired with GPT-4V color labels, related to STAR Methods.

VideoS1

Video S1: Occurrences of all flowers, hummingbirds, and red flowers in eastern North America, related to Figure 2.

Download video file (14.8MB, mpg)
Video S2

Video S2: Modeled distributions of red vs. white flowers in North America, related to Figure 2.

Download video file (2MB, mpg)
Videos3

Video S3: Modeled distributions of red flowers vs. hummingbirds vs. white flowers in North America, related to Figure 3.

Download video file (3MB, mpg)
REAGENT or RESOURCE SOURCE IDENTIFIER
Deposited data
iNaturalist observations iNaturalist.org https://www.inaturalist.org/observations/export [Exported to Zenodo for this paper: https://doi.org/10.5281/zenodo.14885028]
TRY flower color data TRY Plant Trait Database13,14 https://www.try-db.org/TryWeb/Home.php [Exported to Zenodo for this paper: https://doi.org/10.5281/zenodo.14885028]
WorldClim Fick and Hijmans39 https://worldclim.org/data/worldclim21.html
Software and algorithms
Custom scripts This paper Zenodo: https://doi.org/10.5281/zenodo.14885028
GPT4-Vision OpenAI et al.19 https://platform.openai.com/
Maxent v3.4.4 Phillips et al.40 https://biodiversityinformatics.amnh.org/open_source/maxent/
Python v3.10.5 Python Software Foundation37 https://www.python.org
R v4.2.0 R Core Team38 https://www.r-project.org

Acknowledgements

We are grateful to the community of iNaturalist users, curators, and staff for generating and maintaining the data underlying this project. Thanks to the Robin Hopkins Lab and the Deren Eaton Lab for feedback on early versions of this project. R. H. was funded by NIH NIGMS-1R35GM142742–01, NSF IOS-19061133, and NSF DEB-1844906.

Footnotes

Declaration of Interests

The authors declare no competing interests.

Declaration of generative AI and AI-assisted technologies in the writing process

During the preparation of this work the authors used GPT-4 with Vision in order to extract data. After using this tool/service, the authors reviewed the content as needed and take full responsibility for the content of the publication.

References

  • 1.Niet T. van der, and Johnson SD (2012). Phylogenetic evidence for pollinator-driven diversification of angiosperms. Trends in Ecology & Evolution 27, 353–361. 10.1016/j.tree.2012.02.002. [DOI] [PubMed] [Google Scholar]
  • 2.Wessinger CA, Rausher MD, and Hileman LC (2019). Adaptation to hummingbird pollination is associated with reduced diversification in Penstemon. Evolution Letters 3, 521–533. 10.1002/evl3.130. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Johnson SD (2010). The pollination niche and its role in the diversification and maintenance of the southern African flora. Philosophical Transactions of the Royal Society B: Biological Sciences 365, 499–516. 10.1098/rstb.2009.0243. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Pellmyr O (1992). Evolution of insect pollination and angiosperm diversification. Trends in Ecology & Evolution 7, 46–49. 10.1016/0169-5347(92)90105-K. [DOI] [PubMed] [Google Scholar]
  • 5.Fenster CB, Armbruster WS, Wilson P, Dudash MR, and Thomson JD (2004). Pollination Syndromes and Floral Specialization. Annual Review of Ecology, Evolution, and Systematics 35, 375–403. [Google Scholar]
  • 6.Ollerton J, Alarcón R, Waser NM, Price MV, Watts S, Cranmer L, Hingston A, Peter CI, and Rotenberry J (2009). A global test of the pollination syndrome hypothesis. Annals of Botany 103, 1471–1480. 10.1093/aob/mcp031. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Martén-Rodríguez S, Almarales-Castro A, and Fenster CB (2009). Evaluation of pollination syndromes in Antillean Gesneriaceae: evidence for bat, hummingbird and generalized flowers. Journal of Ecology 97, 348–359. 10.1111/j.1365-2745.2008.01465.x. [DOI] [Google Scholar]
  • 8.Dellinger AS (2020). Pollination syndromes in the 21st century: where do we stand and where may we go? New Phytologist 228, 1193–1213. 10.1111/nph.16793. [DOI] [PubMed] [Google Scholar]
  • 9.Willmer P (2011). Pollination and Floral Ecology (Princeton University Press; ). [Google Scholar]
  • 10.Faegri K, and Pijl LVD (2013). Principles of Pollination Ecology (Elsevier; ). [Google Scholar]
  • 11.Vogel S (1954). Blütenbiologische Typen als Elemente der Sippengliederung : dargestellt anhand der Flora Südafrikas (G. Fischer; ). [Google Scholar]
  • 12.van der Pijl L (1960). Ecological Aspects of Flower Evolution. I. Phyletic Evolution. Evolution 14, 403–416. 10.2307/2405990. [DOI] [Google Scholar]
  • 13.Kattge J, Bönisch G, Díaz S, Lavorel S, Prentice IC, Leadley P, Tautenhahn S, Werner GDA, Aakala T, Abedi M, et al. (2020). TRY plant trait database – enhanced coverage and open access. Global Change Biology 26, 119–188. 10.1111/gcb.14904. [DOI] [PubMed] [Google Scholar]
  • 14.Kattge J, Díaz S, Lavorel S, Prentice IC, Leadley P, Bönisch G, Garnier E, Westoby M, Reich PB, Wright IJ, et al. (2011). TRY – a global database of plant traits. Global Change Biology 17, 2905–2935. 10.1111/j.1365-2486.2011.02451.x. [DOI] [Google Scholar]
  • 15.Rapacciuolo G, Young A, and Johnson R (2021). Deriving indicators of biodiversity change from unstructured community-contributed data. Oikos 130, 1225–1239. 10.1111/oik.08215. [DOI] [Google Scholar]
  • 16.de Camargo MGG, Lunau K, Batalha MA, Brings S, de Brito VLG, and Morellato LPC (2019). How flower colour signals allure bees and hummingbirds: a community-level test of the bee avoidance hypothesis. New Phytologist 222, 1112–1122. 10.1111/nph.15594. [DOI] [PubMed] [Google Scholar]
  • 17.Bergamo PJ, Rech AR, Brito VLG, and Sazima M (2016). Flower colour and visitation rates of ostus arabicus support the ‘bee avoidance’ hypothesis for red-reflecting hummingbird-pollinated flowers. Functional Ecology 30, 710–720. 10.1111/1365-2435.12537. [DOI] [Google Scholar]
  • 18.Courter JR, Johnson RJ, Bridges WC, and Hubbard KG (2013). Assessing Migration of Ruby-Throated Hummingbirds (Archilochus colubris) at Broad Spatial and Temporal Scales. The Auk 130, 107–117. 10.1525/auk.2012.12058. [DOI] [Google Scholar]
  • 19.OpenAI, Achiam J, Adler S, Agarwal S, Ahmad L, Akkaya I, Aleman FL, Almeida D, Altenschmidt J, Altman S, et al. (2024). GPT-4 Technical Report. Preprint at arXiv, 10.48550/arXiv.2303.08774 [DOI] [Google Scholar]
  • 20.Kevan PG (1972). Insect Pollination of High Arctic Flowers. Journal of Ecology 60, 831–847. 10.2307/2258569. [DOI] [Google Scholar]
  • 21.Kevan PG, Chittka L, and Dyer AG (2001). Limits to the salience of ultraviolet: lessons from colour vision in bees and birds. Journal of Experimental Biology 204, 2571–2580. 10.1242/jeb.204.14.2571. [DOI] [PubMed] [Google Scholar]
  • 22.van Doorn WG, and Kamdee C (2014). Flower opening and closure: an update. Journal of Experimental Botany 65, 5749–5757. 10.1093/jxb/eru327. [DOI] [PubMed] [Google Scholar]
  • 23.Boehm MMA, Guevara-Apaza D, Jankowski JE, and Cronk QCB (2022). Floral phenology of an Andean bellflower and pollination by buff-tailed sicklebill hummingbird. Ecology and Evolution 12, e8988. 10.1002/ece3.8988. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Trejo-Salazar R-E, Gámez N, Escalona-Prado E, Scheinvar E, Medellín RA, Moreno-Letelier A, Aguirre-Planter E, and Eguiarte LE (2023). Historical, temporal, and geographic dynamism of the interaction between Agave and Leptonycteris nectar-feeding bats. American Journal of Botany 110, e16222. 10.1002/ajb2.16222. [DOI] [PubMed] [Google Scholar]
  • 25.Revell LJ, Johnson MA II, Schulte JA, Kolbe JJ, and Losos JB (2007). A PHYLOGENETIC TEST FOR ADAPTIVE CONVERGENCE IN ROCK-DWELLING LIZARDS. Evolution 61, 2898–2912. 10.1111/j.1558-5646.2007.00225.x. [DOI] [PubMed] [Google Scholar]
  • 26.Wang W, Du J, He Z, Miao C, Wu J, Ma D, and Zhao P (2024). Pollinator peaking earlier than flowering is more detrimental to plant fecundity. Science of The Total Environment 917, 170458. 10.1016/j.scitotenv.2024.170458. [DOI] [PubMed] [Google Scholar]
  • 27.Laitly A, Callaghan CT, Delhey K, and Cornwell WK (2021). Is color data from citizen science photographs reliable for biodiversity research? Ecology and Evolution 11, 4071–4083. 10.1002/ece3.7307. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Leighton GRM, Hugo PS, Roulin A, and Amar A (2016). Just Google it: assessing the use of Google Images to describe geographical variation in visible traits of organisms. Methods in Ecology and Evolution 7, 1060–1070. 10.1111/2041-210X.12562. [DOI] [Google Scholar]
  • 29.White E, Soltis PS, Soltis DE, and Guralnick R (2023). Quantifying error in occurrence data: Comparing the data quality of iNaturalist and digitized herbarium specimen data in flowering plant families of the southeastern United States. PLOS ONE 18, e0295298. 10.1371/journal.pone.0295298. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Eckert I, Bruneau A, Metsger DA, Joly S, Dickinson TA, and Pollock LJ (2024). Herbarium collections remain essential in the age of community science. Nat Commun 15, 7586. 10.1038/s41467-024-51899-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Luong Y, Gasca-Herrera A, Misiewicz TM, and Carter BE (2023). A pipeline for the rapid collection of color data from photographs. Applications in Plant Sciences 11, e11546. 10.1002/aps3.11546. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Majumder S, and Mason CM (2024). A machine learning approach to study plant functional trait divergence. Applications in Plant Sciences n/a, e11576. 10.1002/aps3.11576. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Perez-Udell RA, Udell AT, and Chang S-M (2023). An automated pipeline for supervised classification of petal color from citizen science photographs. Applications in Plant Sciences 11, e11505. 10.1002/aps3.11505. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Hussein BR, Malik OA, Ong W-H, and Slik JWF (2022). Applications of computer vision and machine learning techniques for digitized herbarium specimens: A systematic literature review. Ecological Informatics 69, 101641. 10.1016/j.ecoinf.2022.101641. [DOI] [Google Scholar]
  • 35.Ott T, Palm C, Vogt R, and Oberprieler C (2020). GinJinn: An object-detection pipeline for automated feature extraction from herbarium specimens. Applications in Plant Sciences 8, e11351. 10.1002/aps3.11351. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Dinnage R, Grady E, Neal N, Deck J, Denny E, Walls R, Seltzer C, Guralnick R, and Li D (2024). PhenoVision: A framework for automating and delivering research-ready plant phenology data from field images. Preprint at bioRxiv, 10.1101/2024.10.10.617505 [DOI] [Google Scholar]
  • 37.Van Rossum G, and Drake FL (2009). Python 3 Reference Manual (CreateSpace; ). [Google Scholar]
  • 38.R Core Team (2021). R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing; ). [Google Scholar]
  • 39.Fick SE, and Hijmans RJ (2017). WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37, 4302–4315. 10.1002/joc.5086. [DOI] [Google Scholar]
  • 40.Phillips SJ, Anderson RP, and Schapire RE (2006). Maximum entropy modeling of species geographic distributions. Ecological Modelling 190, 231–259. 10.1016/j.ecolmodel.2005.03.026. [DOI] [Google Scholar]
  • 41.Lunau K, Papiorek S, Eltz T, and Sazima M (2011). Avoidance of achromatic colours by bees provides a private niche for hummingbirds. Journal of Experimental Biology 214, 1607–1612. 10.1242/jeb.052688. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplemental text
Supplemental Table1
Supplemental Table3

Table S3: Manual validation of species assigned “red” labels, related to STAR Methods.

Supplemental Table2

Table S2: Full dataset of species paired with GPT-4V color labels, related to STAR Methods.

VideoS1

Video S1: Occurrences of all flowers, hummingbirds, and red flowers in eastern North America, related to Figure 2.

Download video file (14.8MB, mpg)
Video S2

Video S2: Modeled distributions of red vs. white flowers in North America, related to Figure 2.

Download video file (2MB, mpg)
Videos3

Video S3: Modeled distributions of red flowers vs. hummingbirds vs. white flowers in North America, related to Figure 3.

Download video file (3MB, mpg)

RESOURCES