Author manuscript; available in PMC: 2025 Feb 21.
Published in final edited form as: Science. 2024 Nov 1;386(6721):eadp7710. doi: 10.1126/science.adp7710

A molecular mechanism for bright color variation in parrots

Roberto Arbore 1,2,3,*,#, Soraia Barbosa 1,2,#, Jindřich Brejcha 4,#, Yohey Ogawa 3, Yu Liu 3, Michaël P J Nicolaï 5, Paulo Pereira 1,2,6, Stephen J Sabatino 1,2, Alison Cloutier 7, Emily Shui Kei Poon 7, Cristiana I Marques 1,2,6, Pedro Andrade 1,2, Gerben Debruyn 5, Sandra Afonso 1,2, Rita Afonso 1,2,6, Shatadru Ghosh Roy 8, Uri Abdu 8, Ricardo J Lopes 1,2,9,10, Peter Mojzeš 11, Petr Maršík 12, Simon Yung Wa Sin 7, Michael A White 13,14, Pedro M Araújo 1,2,15,*, Joseph C Corbo 3,*, Miguel Carneiro 1,2,*
PMCID: PMC7617403  EMSID: EMS202157  PMID: 39480920

Abstract

Parrots produce stunning plumage colors through unique pigments called psittacofulvins. However, the mechanism underlying their ability to generate a spectrum of vibrant yellows, reds, and greens remains enigmatic. Here, we uncover a unifying chemical basis for a wide range of parrot plumage colors, which result from the selective deposition of red aldehyde- and yellow carboxyl-containing psittacofulvin molecules in developing feathers. Through genetic mapping, biochemical assays, and single-cell genomics, we identified a critical player in this process, the aldehyde dehydrogenase ALDH3A2, which oxidizes aldehyde psittacofulvins into carboxyl forms in late-differentiating keratinocytes during feather development. The simplicity of the underlying molecular mechanism — in which a single enzyme influences the balance of red and yellow pigments — offers an explanation for the exceptional evolutionary lability of parrot coloration.


Colors play a vital role in ecological adaptation and communication in the natural world (1). Among animals, birds stand out for their wide range of striking hues, color patterns, and iridescence. Through their plumage colors, birds interact with their environment and convey crucial information about individual and species identity, health status, sexual attractiveness, and social dominance (1–6). Bird colors are thus frequent targets of both natural and sexual selection (7, 8). Despite decades of study, understanding the selective and ecological pressures underlying the adaptive function of coloration in nature (9), as well as the physiological and metabolic processes presumably linking color ornaments to fitness, remains a challenge (1, 10). Investigating the molecular mechanisms controlling color variation is one promising approach to shed light on these enduring questions.

Parrots are renowned for their vibrant plumage (11, 12). Their feathers differ dramatically among species in hue, saturation, and overall patterning across the body (Fig. 1), and likely subserve a variety of signaling and non-signaling functions (13–18). The rapid and dynamic evolution of parrot coloration (Fig. 1) largely results from the differential deposition, during feather growth, of psittacofulvins (12, 19–21), a class of polyene pigments that are uniquely found in these birds and generate their fiery reds and luminous yellows (19, 22). When combined with blue coloration that arises from light scattering by nanostructural features of the feather, yellow psittacofulvin coloration also gives rise to vivid green hues (Fig. 1). Unlike carotenoids, which also produce bright yellow and red colors in many bird species and need to be acquired through diet (23–25), psittacofulvins are endogenously synthesized by parrots (22). A polyketide synthase (PKS) has been identified as essential for psittacofulvin biosynthesis in domesticated mutants lacking psittacofulvin-based pigmentation (26–28), but the mechanisms governing the striking color variation found in parrots remain unknown.

Fig. 1. Psittacofulvin-based coloration diversity and evolution in parrots.


Phylogeny of all 354 species of parrots showing whether species display: 1) only yellow/green hues (yellow circles), 2) only red hues (red circles), 3) simultaneously yellow/green and red hues (orange circles), or 4) no psittacofulvin-based hues (grey circles, i.e., color is produced by other mechanisms with no contribution from psittacofulvins). We note that green and yellow color patches were considered together since the green color is a combination of blue structural coloration and yellow psittacofulvin pigments. Lineages including several species of identical phenotype are collapsed. The phylogeny demonstrates the high number of evolutionary shifts among parrots expressing yellow, green, and red. Across the bottom is a compilation of photographs showcasing the diversity of parrot plumage coloration. From left to right: golden parakeet (Guaruba guarouba, CC BY-SA 3.0 Rodrigo Menezes), budgerigar (Melopsittacus undulatus, ML619272371, Robert Hynson), rosy-faced lovebird (Agapornis roseicollis, ML215809581, Niall Perrins), galah (Eolophus roseicapilla, CC BY-SA 2.0 Jim Bendon), scarlet macaw (Ara macao, ©Milan Kořínek, www.biolib.cz/en), and red lory (Eos bornea, CC BY-SA 3.0 Doug Janson).

In this study, we identify a simple molecular mechanism that explains how psittacofulvins are biochemically modified to produce yellow-to-red and green hues in parrots. The discovery of this mechanism provides an explanation for a broad spectrum of phenotypic variation that characterizes one of the most brilliantly colored animal groups in the natural world.

A shared chemical basis for yellow-to-red psittacofulvin coloration in parrots

Psittacofulvins were first identified in studies of red parrot feathers as non-carotenoid pigment molecules consisting of an extended polyene chain with a single aldehyde end group (Fig. 2A) (19, 22). Subsequent research postulated that variation in the hue of psittacofulvin-based colors could arise from multiple factors (22, 26, 29); however, the precise chemical and physical mechanisms responsible for color variation in parrots remain unclear. To explore these questions, we conducted comprehensive chemical analyses of red, orange, yellow, and green feathers across representatives of all parrot superfamilies (seven species), spanning >50-80 million years of evolution (30).

Fig. 2. Chemical analyses of psittacofulvin pigmentation.


(A) Schematic representation of a tail feather of a scarlet macaw (Ara macao). Psittacofulvins are deposited within the keratin matrix of both the ramus and barbules of feathers (inset). Psittacofulvins are linear polyenes with various carbon chain lengths (C16 examples shown) and with distinct terminal groups (aldehyde or carboxyl groups). (B) The plot illustrates the magnitude of the shifts in the positions of the two primary Raman bands characteristic of psittacofulvins (both y-axes). The dashed line at 0 represents the average spectrum. The variation in these Raman bands is shown for yellow, green, and red feathers of the studied parrot species. From left to right: budgerigar (Melopsittacus undulatus), Pesquet’s parrot (Psittrichas fulgidus), rosy-faced lovebird (Agapornis roseicollis), scarlet macaw (Ara macao), galah (Eolophus roseicapilla), cockatiel (Nymphicus hollandicus), and kea (Nestor notabilis). (C) Chromatograms showing positive ions with the exact molecular masses corresponding to psittacofulvins in the carboxyl (green) and aldehyde (magenta) forms, detected by mass spectrometry (HRAM-QTOF). Peak 1 corresponds to C14 carboxylic acid, 2 to C14 aldehyde, 3 to C16 carboxylic acid, 4 to C16 aldehyde, 5 to C18 carboxylic acid, and 6 to C18 aldehyde. The mass spectrometry peaks correspond to the UV-detected absorbance peaks (UHPLC UV/VIS) shown below; the slight shift is caused by the delay between the UV/VIS detection and the HRAM-QTOF molecular mass detection. The UHPLC UV/VIS chromatogram (black) shows absorbance peaks detected at 421 nm, which is close to the average maximum absorbance wavelength of psittacofulvins. The plots at the bottom show the maximum absorbance shifts between the carboxyl and aldehyde forms of psittacofulvins with different carbon chain lengths. The absorbance spectra have been normalized such that the maximum intensity = 1. (D) Differences in psittacofulvin content in red, yellow, and green feathers of parrot species. The total amount of psittacofulvins or the relative amount of each type was quantified as the area under the peak in the exact mass spectrum relative to the baseline. The upper row shows the total amount of psittacofulvins, the middle row shows the relative abundance of psittacofulvins of different lengths, and the lower row shows the ratio of aldehyde (magenta) and carboxylic forms (green). The order of the species is the same as in panel (B).

We utilized confocal Raman microscopy to examine differences in the vibrational spectra of pigment molecules in situ. These analyses revealed a consistent tendency of yellow/green hues to be shifted towards higher wavenumbers compared to red hues, regardless of the species analyzed (Fig. 2B; figs. S1 and S2), aligning with previous studies (21, 29, 31). We also confirmed that green color patches result from a combination of blue structural coloration and yellow psittacofulvin pigments, and therefore exhibit similar Raman spectra to those produced by yellow patches. These analyses further revealed that yellow/green and red hues share a common structural fingerprint across species and point towards a general mechanism underlying psittacofulvin-based color differences among parrots.

To gain further insights into pigment composition in parrot feathers, we performed ultra-high-performance liquid chromatography (UHPLC) coupled with UV/VIS detection and high-resolution, accurate-mass (HRAM) quadrupole time-of-flight (Q-TOF) mass spectrometry. Our chemical analyses confirmed the existence of psittacofulvins with varying polyene chain lengths (14, 16, or 18 carbons; herein referred to as C14, C16, and C18) and two different types of functional groups: aldehydes and carboxylic acids (Fig. 2C; figs. S3 and S4). These various molecular forms were identified in all species and feather tracts. We found that both the number of conjugated double bonds and the identity of the end group influence the wavelength of maximal absorbance of the psittacofulvin molecule (Fig. 2C). As with carotenoid pigments (24), the addition of a single double bond to the conjugated system of a psittacofulvin red-shifts its peak absorbance by 16-20 nm. Similarly, as reported for retinoids (32), switching from a carboxylic acid to an aldehyde end group red-shifts the peak absorbance by 17-24 nm.

In our analyses, red and orange feathers were strongly enriched in aldehyde psittacofulvins, whereas yellow and green feathers contained a higher proportion of carboxyl psittacofulvins (F = 43.8, P = 8.94 × 10⁻¹⁶; Fig. 2D and table S1). This pattern was consistently observed across all major parrot lineages (Fig. 2D and fig. S5). Intensely red-colored patches also exhibited a tendency for a higher proportion of molecules with longer carbon chain lengths, as well as a higher concentration of pigments (e.g., macaw, kea, and Pesquet’s parrot; Fig. 2D). However, the same patterns were not evident in less saturated red or orange patches (e.g., cockatiel and lovebird). Taken together, these findings suggest that the ratio of carboxylic acid- to aldehyde-containing psittacofulvins plays a major role in determining the hue of a feather and that this mechanism is conserved across divergent lineages of parrots.
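As a minimal sketch of how such proportions follow from integrated peak areas, the snippet below computes the carboxyl fraction of the total psittacofulvin signal for one feather sample. The function name `carboxyl_fraction` and all peak areas are hypothetical and chosen only for illustration; they are not measurements from this study.

```python
def carboxyl_fraction(peak_areas):
    """Return the fraction of total psittacofulvin signal in carboxyl forms.

    peak_areas maps (chain_length, end_group) -> integrated peak area,
    where end_group is "aldehyde" or "carboxyl".
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    carboxyl = sum(area for (_, group), area in peak_areas.items()
                   if group == "carboxyl")
    return carboxyl / total


# Hypothetical peak areas (arbitrary units), NOT data from the study.
red_feather = {
    ("C14", "carboxyl"): 5.0, ("C14", "aldehyde"): 20.0,
    ("C16", "carboxyl"): 10.0, ("C16", "aldehyde"): 40.0,
    ("C18", "carboxyl"): 5.0, ("C18", "aldehyde"): 20.0,
}
yellow_feather = {
    ("C14", "carboxyl"): 30.0, ("C14", "aldehyde"): 5.0,
    ("C16", "carboxyl"): 40.0, ("C16", "aldehyde"): 10.0,
    ("C18", "carboxyl"): 10.0, ("C18", "aldehyde"): 5.0,
}

print(f"red feather carboxyl fraction:    {carboxyl_fraction(red_feather):.2f}")
print(f"yellow feather carboxyl fraction: {carboxyl_fraction(yellow_feather):.2f}")
```

Under these invented inputs, the yellow sample carries most of its signal in carboxyl forms and the red sample in aldehyde forms, which is the qualitative pattern described above.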

The genetic and transcriptomic bases of psittacofulvin coloration

Heritable differences in psittacofulvin coloration are largely fixed among parrot species. The absence of phenotypic variation within populations complicates efforts to study the genetic basis of parrot coloration, which has been exclusively analyzed in domesticated species carrying mutations that result in a complete loss of psittacofulvin pigmentation (26, 28). To uncover molecular mechanisms underlying the evolution and diversification of psittacofulvin-based colors in nature, we took advantage of a naturally occurring intraspecific polymorphism present in the dusky lory (Pseudeos fuscata). This species is native to New Guinea where two color morphs expressing red and yellow pigmentation coexist and interbreed in sympatry (Fig. 3A) (33), offering a rare opportunity to conduct genetic mapping and investigate how parrot colors evolve in the wild. The red and yellow morphs differ in the content of psittacofulvin forms as described above for the other parrot species (Fig. 3B and figs. S5 and S6). Pedigree and phenotypic data gathered from 20 breeding couples show that this polymorphism is inherited largely as a binary trait, indicative of a simple genetic architecture, with yellow being dominant over red (table S2).

Fig. 3. The genetic and transcriptomic bases of psittacofulvin colors.


(A) Images of red and yellow morphs of the dusky lory (Pseudeos fuscata). Photo credits: David Hosking/Minden Pictures. (B) Differences in pigment composition between feathers of red and yellow morphs. The left panel shows the relative ratio of aldehydes (magenta) and carboxylic acids (green), and the right panel shows the relative abundance of psittacofulvins of different chain lengths. (C) Genetic mapping of the color polymorphism. The top panel summarizes the genome-wide association analysis using whole-genome resequencing data. Each dot represents the −log10 transformation of Wald test P-values for each variant. The horizontal red line indicates the Bonferroni-corrected genome-wide significance (P = 1.16 × 10⁻⁸; −log10(P) = 7.94) based on the total number of tests (n = 4,303,897). The bottom panel is a zoomed-in view of the region of association shown on top. The protein-coding genes contained within the represented genomic interval are shown at the top of the panel. (D) Patterns of gene expression of ALDH3A2 between dusky lory morphs. The left panel shows RNA-seq normalized raw read counts (circles) from regenerating feather follicles from red (n = 3, left) and yellow (n = 3, right) birds, with colored boxes illustrating the range of read counts for the respective color morph. The right panel shows the proportion of full-length Iso-seq transcripts (n = 152) linked to the red and yellow alleles in heterozygous individuals (n = 3). (E) Differential expression of ALDH3A2 between red feathers from the forehead region (d) versus green feathers from the back (a), chest (b), and head (c) regions of rosy-faced lovebirds (n = 8 for each region; Welch’s test followed by Games-Howell post-hoc test; * adj-P < 0.05; ** adj-P < 0.01; *** adj-P < 0.001; **** adj-P < 0.0001). The sampled regions are indicated on the illustration at the bottom left of the graph.

We assembled and annotated a draft reference genome of the dusky lory (tables S3 to S5) and resequenced the genomes of 57 individuals representing both color morphs (red n = 35; yellow n = 22; mean depth = 11.5 ± 4.8×; table S6). Among the 4,303,897 variants we examined via association tests, only three exceeded the genome-wide significance threshold for explaining the color phenotype (Fig. 3C). The three variants spanned a small interval of 284 bp (scaffold_13:6,288,645-6,288,929), and one SNP (scaffold_13:6,288,712T>C) showed a markedly stronger association with color (P = 6 × 10⁻¹⁴). Both alleles of the top associated variant were present in multiple haplotype backgrounds and exhibited a rapid decay of haplotype homozygosity (EHH < 0.5, ~1.2 kb and ~2 kb on each side; fig. S7). These patterns imply that color alleles at this locus have been co-existing and recombining within dusky lory wild populations for many generations. Based on patterns of nucleotide variation around the candidate region, we inferred the color polymorphism to be ~1,000,000 years old (table S7), ranging from ~460,000 to ~1,940,000 years, depending on assumptions about generation time, mutation rate, and recombination rate (34).
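The genome-wide significance threshold used above can be checked with a short calculation. The sketch below assumes a family-wise error rate of 0.05, which is not stated explicitly in this section but reproduces the threshold reported in the Fig. 3C legend; the helper `is_genome_wide_significant` is a hypothetical name introduced here.

```python
import math

ALPHA = 0.05          # assumed family-wise error rate (not stated in this section)
N_TESTS = 4_303_897   # number of variants tested, as reported in the text

# Bonferroni correction: divide the error rate by the number of tests.
threshold = ALPHA / N_TESTS
neg_log10_threshold = -math.log10(threshold)


def is_genome_wide_significant(p_value, cutoff=threshold):
    """Return True when a P-value falls below the Bonferroni cutoff."""
    return p_value < cutoff


print(f"Bonferroni threshold: {threshold:.3g}")
print(f"-log10(threshold):    {neg_log10_threshold:.3f}")
print("Lead SNP (P = 6e-14) significant:", is_genome_wide_significant(6e-14))
```

The computed cutoff agrees with the reported value of 1.16 × 10⁻⁸ to the precision given, and the lead SNP clears it by several orders of magnitude.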

The candidate interval lies in a non-coding region between ALDH3A2 and SLC47A1, immediately downstream of the last exon of ALDH3A2 (Fig. 3C). SLC47A1 encodes the Solute Carrier Family 47 Member 1, a transporter involved in excreting endogenous and exogenous electrolytes through urine and bile (35). ALDH3A2 encodes the Aldehyde Dehydrogenase 3 Family Member A2 (also known as FALDH), a ubiquitously expressed enzyme responsible for catalyzing the oxidation of medium- and long-chain fatty aldehydes (preferentially acting on C14-C18 substrates) to the corresponding carboxylic acids (36). In light of our pigment analyses, ALDH3A2 is a strong candidate gene for explaining the differences in psittacofulvin coloration between dusky lory color morphs.

To investigate whether color differences between the dusky lory color morphs could arise from differences in gene expression, we performed bulk RNA-seq analysis and generated full-length transcriptomes via PacBio long-read sequencing of regenerating feather follicles derived from both red (n = 3) and yellow (n = 3) individuals (table S8). Our decision to examine growing feathers was guided by the discovery that parrots neither circulate psittacofulvins in the bloodstream nor accumulate them in the liver (22), implying that pigment biosynthesis occurs locally in the integument during feather development. We identified 33 genes with significant differential expression, but none were contained within the candidate scaffold (data S1). Considering the positional information provided by our genetic mapping, we examined the two genes flanking the associated variants. SLC47A1 displayed negligible expression in regenerating feather follicles. For ALDH3A2, we found that the three protein-coding isoforms detected by Iso-seq (fig. S8) are expressed at similar levels in red and yellow birds. Although we detected a subtle, but statistically non-significant, trend toward higher ALDH3A2 expression in yellow individuals by RNA-seq (Fig. 3D; left), quantitative polymerase chain reaction (qPCR) analyses failed to confirm this difference (fig. S9).

Differences in the expression of ALDH3A2 between color morphs may be obscured by its ubiquitous expression (see below), insufficient statistical power due to low sample size, or by the fact that all three yellow individuals under investigation were heterozygous for the red and yellow alleles at the candidate locus. To further explore ALDH3A2 expression, we compared the relative abundance of red and yellow transcripts in each of the heterozygous individuals using Iso-seq data. This approach is particularly sensitive to minor differences in expression given that in heterozygous birds both alleles are exposed to the same trans-acting regulatory environment in the nucleus. We found an imbalance in the expression of the two alleles, with a higher percentage of reads corresponding to the yellow allele (71% yellow vs. 29% red, χ² test, P = 0.03; Fig. 3D; right), suggesting the existence of underlying cis-regulatory differences favoring higher expression of this allele. This finding is consistent with the hypothesis that ALDH3A2 encodes an enzyme that converts aldehyde psittacofulvins into carboxyl forms and fits the expectations of our pigment analysis (Fig. 3B), which showed that dusky lory individuals with yellow plumage contained a higher proportion of carboxyl psittacofulvins in their feathers.
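The logic of an allelic-imbalance test can be illustrated with a χ² goodness-of-fit test against the null expectation of equal expression from both alleles. The sketch below is a generic one-degree-of-freedom version with invented read counts; the exact test design behind the reported P-value is not described in this section, so the example does not attempt to reproduce it. The function name `chi_square_equal_expression` is hypothetical.

```python
import math


def chi_square_equal_expression(yellow_reads, red_reads):
    """Chi-square goodness-of-fit test of a 50:50 allelic ratio (1 d.f.).

    Returns (statistic, p_value). For one degree of freedom the upper-tail
    probability of the chi-square distribution equals erfc(sqrt(x / 2)).
    """
    total = yellow_reads + red_reads
    if total == 0:
        raise ValueError("at least one read is required")
    expected = total / 2
    statistic = ((yellow_reads - expected) ** 2
                 + (red_reads - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical read counts for one heterozygous bird, NOT data from the study.
stat, p = chi_square_equal_expression(yellow_reads=15, red_reads=5)
print(f"chi-square = {stat:.2f}, P = {p:.4f}")
```

Because both alleles in a heterozygote share the same cellular environment, a departure from 50:50 in such a test points to cis-acting differences rather than trans-acting ones.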

The above hypothesis implies that differences in the ratio of carboxyl- to aldehyde-containing psittacofulvins across feathers should positively correlate with ALDH3A2 expression levels. To test this idea, we used rosy-faced lovebirds (Agapornis roseicollis), a species that displays both green (i.e., yellow psittacofulvin-containing) and red plumage patches (Fig. 1). Using bulk RNA-seq, we analyzed gene expression in regenerating feather follicles of eight rosy-faced lovebirds and four plumage patches: three green patches (from back, chest, and head) and one red feather patch (from head) (table S8). Our initial chemical analyses indicate that these patches differ in the ratio of carboxyl- to aldehyde-containing psittacofulvins, with the green patches showing a higher ratio than the red (Fig. 2D). We found that ALDH3A2 ranks among the top differentially expressed genes in all pairwise comparisons between green and red patches (fig. S10), with 4-6-fold higher expression in the three green feather patches compared to the red (Fig. 3E). Differential expression of ALDH3A2 between color patches was confirmed by qPCR (fig. S9). These findings are consistent with the hypothesis that yellow/green feathers express higher levels of ALDH3A2, which converts red aldehyde psittacofulvins into the corresponding yellow carboxyl forms. These data further support the conclusion that ALDH3A2 is a strong candidate gene for mediating color variation in parrots.

ALDH3A2 is expressed at higher levels in late differentiating keratinocytes

To investigate the expression of ALDH3A2 in the context of feather development, we next studied gene expression at the cellular level. We generated single-cell RNA sequencing (scRNA-seq, n = 2) data from regenerating feather follicles of budgerigars (table S9), a parrot species that expresses yellow psittacofulvin-based coloration (Figs. 1 and 2) and can be maintained for extended periods under laboratory conditions required for optimization of single-cell analysis protocols. We analyzed a total of 6,262 cells, which aggregated in 10 clusters representing the major cell types of regenerating feather follicles (Fig. 4A and fig. S11) (34). Distally located epithelial and pulp cells undergo apoptosis and keratinization during feather regeneration (37), hence we expect these cell populations to be underrepresented, and for more immature, proximally placed cells to be correspondingly overrepresented in our single-cell dataset. We found that ALDH3A2 exhibited widespread expression across all cell types (fig. S12), as expected for a gene critical for cellular metabolism (38).

Fig. 4. ALDH3A2 expression during feather development.


(A) scRNA-seq analyses of budgerigar regenerating feather follicles (t-SNE projection): annotation of 6,262 cells clustered by gene expression profiles into 10 major clusters. Plots of selected marker genes supporting the annotation are reported for each cluster in fig. S11. (B) Expression of keratin 17-like (KRT17L; ENSMUNG00000017214.1) in keratinocyte clusters (t-SNE projection). (C) scRNA-seq analyses of keratinocytes (n = 2,753 cells; UMAP projections). Left: heatmap of average expression levels of five cell cycle genes defining a sub-cluster of dividing keratinocytes (i.e., follicle proliferation zone). Middle: heatmap of average expression levels of five genes defining late differentiating keratinocytes (fig. S12) (34). Right: heatmap of ALDH3A2 expression. (D) Analyses of keratinocyte differentiation. Left: branching trajectory reflecting differentiation from dividing cells in the proliferative zone towards cells forming specialized structures in the follicle (i.e., the marginal, axial, and barbule plates). The color indicates the distance of each cell (pseudotime) from the root node in the proliferative zone (solid arrow): blue = early cells; yellow = late differentiating cells. The red line shows the trajectory leading to a sub-population of keratinocytes with the highest expression of ALDH3A2, likely axial plate cells (34). Right: normalized gene expression of KRT17L (all keratinocytes), CDK1 (proliferating keratinocytes), SCEL (late differentiating keratinocytes), and ALDH3A2 in the cells along the trajectory (n = 859 cells). ALDH3A2 expression is enriched towards late differentiating keratinocytes.

The deposition of psittacofulvin pigments within the keratin matrix of feathers suggests that keratinocytes might be involved in their metabolism. In a subset of the cell clusters identified using scRNA-seq, we noticed a strong enrichment of various keratin genes such as keratin 17-like (KRT17L), a type I alpha keratin that is known to have ubiquitous expression in feather keratinocytes (39) (Fig. 4B). In regenerating feather follicles, keratinocyte differentiation proceeds along the proximal-distal axis of growth, from the proliferative zone to the distal tip of the feather, and radially, during the formation of barb ridges where they organize into the marginal, axial, and barbule plates (40). A closer examination of cell-cycle marker genes in our scRNA-seq data revealed a population of dividing keratinocytes with elevated expression of these genes, likely corresponding to the proliferative zone at the base of the feather follicle (Fig. 4C and fig. S12) (34). Additionally, gene expression patterns suggested a progression of keratinocyte lineages from actively dividing cells to non-cycling late differentiating keratinocytes, likely positioned towards the distal tip of the feather and enriched in expression of SCEL (41) (a precursor to the cornified envelope of terminally differentiated keratinocytes) and of the epidermal differentiation complex gene EDQM3 (42) (Fig. 4C and fig. S12). By explicitly modeling the progress of keratinocyte differentiation along a branching trajectory rooted at the putative follicle proliferative zone (34), we found a progressive increase in ALDH3A2 expression towards late differentiating cells, with maximum expression in a population of cells expressing NCAM1 (43) and likely representing axial plate keratinocytes (Fig. 4D and fig. S12). Axial plate keratinocytes are located in the midline of each developing barb ridge and will later undergo apoptosis, enabling the flanking barbule plate cells to open and form a feather barb (37). 
The increased expression of ALDH3A2 in this lineage of late-differentiating keratinocytes suggests that they serve as the primary site for red-to-yellow (aldehyde-to-carboxyl) psittacofulvin conversion in parrots during feather development.

A candidate causal mutation resides in a late differentiating keratinocyte-specific open chromatin region

Next, we sought to identify the specific mutation responsible for regulating ALDH3A2 expression in the dusky lory. The yellow morph is genetically dominant over the red morph, establishing a clear expectation for the genotypes associated with causative mutations. Apart from the three significant non-coding variants from the genetic mapping analyses (Fig. 3C), we did not identify any structural variants within 100 kb of the candidate interval that followed the expected inheritance pattern, nor did we identify additional small deletions or point mutations. The lead variant identified in the association mapping analysis (scaffold_13:6,288,712T>C), located in a non-coding region 42 bp downstream of the longest ALDH3A2 transcript, exhibited the anticipated genotypes in nearly all individuals (98%). One yellow individual deviated from the expected pattern. However, the two remaining significant variants (scaffold_13:6,288,645C>T and 6,288,929A>G) did not match the expected genotypes based on phenotype in five individuals, including the same yellow individual that carried a red haplotype in homozygosity across the entire region. Additionally, by screening breeding couples that produced offspring of both color morphs, we were able to exclude these two mutations as potential causative factors in one of the pedigrees (table S10). Thus, the lead variant from the association mapping analyses emerges as the sole candidate causal mutation to explain the color polymorphism. The mismatched genotype in one individual may be attributable to genetic heterogeneity or epistatic interactions with unknown genetic factors located elsewhere in the genome.

We then hypothesized that the candidate non-coding variant might be involved in the regulation of ALDH3A2 expression due to its occurrence within a cis-regulatory element (i.e., enhancer/promoter). To test this hypothesis, we assessed genome-wide chromatin accessibility in regenerating feather follicles of budgerigars using a single-nucleus transposase-accessible chromatin sequencing assay (snATAC-seq, n = 2; table S9). This method identifies open chromatin regions expected to be enriched for regulatory elements. An annotation based on open chromatin profiles at the promoters of cell-type markers in 1,700 nuclei validated the existence of the primary cell types previously identified via scRNA-seq (Fig. 5A and fig. S13) (34). The only difference compared to the scRNA-seq data is that leukocytes are now collapsed in a single cluster. Among keratinocytes (defined by openness at the promoter region of several keratin genes, including KRT17L; Fig. 5B and fig. S14), we observed significant differential accessibility at the promoter regions of differentiation genes such as SCEL and EDQM3 in cells likely corresponding to late differentiating keratinocytes as identified in our scRNA-seq analyses (SCEL: Log2FC = 2.7, P = 3 × 10⁻²⁹; EDQM3: Log2FC = 3.6, P = 2 × 10⁻²⁶; Fig. 5C and fig. S14). While the promoter region of ALDH3A2 is broadly open across multiple cell types, as expected given this gene’s ubiquitous expression, there is an additional open chromatin region upstream of the promoter that is specific to keratinocytes (Fig. 5D). Furthermore, there is another region of open chromatin (budgerigar genome, chr13:8,487,599-8,488,513) immediately downstream of ALDH3A2 that is specific to the late differentiating keratinocyte cluster (Log2FC = 3.40, P = 5.97 × 10⁻¹⁴; Fig. 5, C and D). Most strikingly, the homologous region in the dusky lory genome includes our candidate causal variant.
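For readers less familiar with log2 fold changes, the reported values can be converted to linear fold differences with a one-line calculation. The sketch below only re-expresses the Log2FC values quoted above; the dictionary keys are labels introduced here for convenience.

```python
# Log2 fold changes in accessibility, as quoted in the text above.
log2_fold_changes = {
    "SCEL promoter": 2.7,
    "EDQM3 promoter": 3.6,
    "ALDH3A2 downstream peak": 3.40,
}

# A log2 fold change of x corresponds to a 2**x-fold difference.
fold_changes = {name: 2 ** value for name, value in log2_fold_changes.items()}

for name, fold in fold_changes.items():
    print(f"{name}: about {fold:.1f}-fold more accessible")
```

On a linear scale, the region downstream of ALDH3A2 is therefore roughly an order of magnitude more accessible in the late differentiating keratinocyte cluster than in the comparison cells.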

Fig. 5. A regulatory region overlaps the candidate causal mutation.


(A) snATAC-seq analyses (t-SNE projection): annotation of 1,700 cells into nine major clusters based on chromatin accessibility profiles. (B) Accessibility at the Keratin 17-like promoter in the keratinocyte clusters (t-SNE projection; gene ID: ENSMUNG00000017214.1). (C) snATAC-seq analyses of keratinocytes (t-SNE projections). Left: heatmap of averaged DNA accessibility at the promoters of five genes identified in the scRNA-seq analyses as defining late differentiating keratinocytes (Fig. 4C and figs. S12 and S14) (34). Right: heatmap of DNA accessibility at the ATAC peak identified downstream of ALDH3A2 and corresponding to a late differentiating keratinocyte-specific regulatory element. (D) Chromatin accessibility at the ALDH3A2 locus for different cell types: normalized transposase cut site counts per cluster smoothed over 400 bp windows. The grey area highlights the region shown in (E). (E) Characterization of the regulatory element downstream of ALDH3A2 in budgerigar. Top: predicted nucleotide contribution for chromatin accessibility (per-nucleotide averaged contribution score from three independently trained models). Bottom: annotation of predicted TF binding sites enriched in late differentiating keratinocytes. The red box highlights the region shown in (G). (F) Sequence logos for 14 representative motifs (chosen from 10 motif sub-families) among the top 57 predicted TF binding sites which showed the greatest change in position-weight matrix (PWM) score between C and T nucleotides (shown on the right, rounded to one significant digit). (G) Top: per-nucleotide evolutionary conservation (phyloP scores) across 363 bird genomes projected to the budgerigar sequence. Only positive scores, indicating slower evolution than expected, are reported. The dashed line represents the non-coding genome-wide top 5th percentile. Bottom: per-nucleotide evolutionary conservation across 100 parrot genomes. 
Nucleotide symbols at the same position are scaled according to their frequency. The height of the stacked symbols describes the information content at each position in the alignment.

To further investigate whether the region that contains the candidate causal variant might be capable of cis-regulatory activity in keratinocytes, we conducted an enrichment analysis of transcription factor (TF) binding sites within the accessible chromatin regions of the regenerating feather follicles (table S11). Several of these motifs appeared to contribute to chromatin openness in the region harboring the causal variant, as revealed by modeling the higher-order syntax of TF binding motifs of late differentiating keratinocytes using a convolutional neural network (Fig. 5E). The same method also predicted similar chromatin accessibility profiles for the ALDH3A2 locus in the budgerigar, the dusky lory, and several other parrots (fig. S15). For example, 33 bp upstream of the candidate mutation, we identified a consensus binding site motif for the RUNX family of pioneer transcription factors with a strong predicted contribution to chromatin accessibility (Fig. 5E). Since none of the motifs enriched in late differentiating keratinocytes overlapped with the candidate causal variant, we used a comprehensive collection of TF binding motifs to search for motifs that do overlap and compiled a list of those that show differential predicted binding affinity between the T (yellow) and C (red) dusky lory alleles (Fig. 5F and fig. S16; data S2). This catalog provides a manageable list of candidate TFs for future functional testing.
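The idea of comparing predicted binding affinity between two alleles can be sketched with a log-odds position-weight matrix (PWM) score. The example below is a generic illustration, not the motif-scanning pipeline used in the study: the 4-bp motif, the uniform background frequency, and the two short sequences are all invented, and `pwm_score` is a hypothetical helper.

```python
import math

BACKGROUND = 0.25  # assumed uniform background nucleotide frequency

# Hypothetical position probability matrix for a 4-bp motif (one dict per position).
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]


def pwm_score(site, motif=MOTIF, background=BACKGROUND):
    """Sum of per-position log2 odds (motif probability / background)."""
    if len(site) != len(motif):
        raise ValueError("site length must match motif length")
    return sum(math.log2(column[base] / background)
               for column, base in zip(motif, site))


# Two invented sequences differing only at the second position (C versus T).
site_with_c = "ACTG"
site_with_t = "ATTG"

delta = pwm_score(site_with_c) - pwm_score(site_with_t)
print(f"score with C: {pwm_score(site_with_c):.2f}")
print(f"score with T: {pwm_score(site_with_t):.2f}")
print(f"difference:   {delta:.2f}")
```

A large score difference at a single position, as in this toy motif, is the kind of signal used to shortlist TFs whose binding might be altered by a point mutation.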

We next investigated evolutionary conservation at individual sites across the candidate region. Alignment of the genomes of 363 diverse bird species revealed only low-to-moderate phylogenetic conservation in the region (Fig. 5G and fig. S15), with the nucleotide position homologous to the candidate causative mutation in the dusky lory (budgerigar: chr13:8,487,876; dusky lory: scaffold_13:6,288,712) corresponding to a local conservation maximum. Importantly, the alignment of this region among 100 parrot genomes revealed universal conservation of cytosine at the candidate causative position and strong conservation in the flanking nucleotides (Fig. 5G). This observation suggests that strong purifying selection and ancestral functional constraints have acted upon the locus throughout parrot evolution, a pattern consistent with the preservation of a TF binding site. Collectively, our results suggest a mechanism whereby a point mutation alters the binding of a yet to be identified TF within a cell type-specific enhancer in parrots. This alteration is likely to lead to changes in allelic expression of ALDH3A2 during feather development in the dusky lory.

The enzyme encoded by ALDH3A2 oxidizes aldehyde psittacofulvins to carboxyl forms

Our genetic and chemical results indicate that a substantial portion of the spectrum of parrot plumage colors can be attributed to the ratio of carboxyl- to aldehyde-containing psittacofulvins deposited during feather development. To investigate the role of ALDH3A2 in psittacofulvin biosynthesis, we used baker’s yeast (Saccharomyces cerevisiae) to assay psittacofulvin production upon transfection with the avian polyketide synthase (PKS) (26), with and without ALDH3A2. We introduced a construct containing PKS into wild-type (WT) yeast and two genetically modified strains (Fig. 6A). In one strain (designated Δhfd1 + PKS), we knocked out HFD1, the yeast ortholog of avian ALDH3A2 (44). In the second modified strain, we replaced HFD1 with ALDH3A2 from the dusky lory (Δhfd1 + PKS + ALDH3A2).

Fig. 6. The role of aldehyde dehydrogenase activity in psittacofulvin biosynthesis.

(A-C) Analyses of yeast pigment extracts. A wild-type yeast strain (WT) was transformed to express PKS (WT + PKS). Two additional strains expressing PKS were engineered by knocking out HFD1, the yeast homolog of ALDH3A2 (Δhfd1 + PKS), and by knocking out HFD1 and knocking in the dusky lory ALDH3A2 (Δhfd1 + PKS + ALDH3A2). (A) UHPLC spectra of yeast pigment extracts. Extracts for all the PKS-expressing strains contained varying amounts of chemically distinct psittacofulvins represented by three main absorbance peaks (1-3). Expression of ALDH3A2 (strain: Δhfd1 + PKS + ALDH3A2) restored the WT chromatogram (WT + PKS). (B) Absorption spectra of the main UHPLC peaks: peak 1 (carboxyl psittacofulvin), peaks 2a and 2b (alcohol psittacofulvins), and peak 3 (aldehyde psittacofulvin). (C) Chromatographic separation of peaks 2a and 2b. (D) Proposed model of psittacofulvin biosynthesis. After priming with an acetyl unit, PKS acts cyclically by adding malonyl units to extend the polyketide chain which is then reductively released as an aldehyde. Aldehyde psittacofulvin products are then converted into the carboxyl form by ALDH3A2. Tuning of ALDH3A2 enzymatic activity (e.g., by modulation of its expression) from ‘low’ to ‘high’ is sufficient to explain the production of red-to-yellow psittacofulvins in parrots.

Through liquid chromatography (UHPLC) and mass spectrometry analyses (HRAM Q-TOF) of pigment extracts from the yeast strains (Fig. 6, A to C; figs. S17 and S18), we identified three major peaks that stand out in their UV/VIS absorbance intensity and have chromatographic, spectrophotometric (45), and mass-spectroscopic (19, 26) characteristics corresponding to psittacofulvins or their derivatives (Fig. 6, A and B). Peaks 1 and 3 closely resembled the pigments present in parrot feathers, and based on their maximum absorbance (402 nm and 418 nm) and molecular masses (243.1385 m/z and 227.1436 m/z), we determined them to be C16 psittacofulvins with either a carboxyl (peak 1) or an aldehyde group (peak 3). Peak 2 exhibited absorbance spectra resembling those of alcohols produced by the chemical reduction of psittacofulvins (19, 45). Further analysis revealed that peak 2 splits into two closely eluting peaks, 2a and 2b (Fig. 6C), both displaying identical chromophores (Fig. 6B) and exhibiting molecular masses of 245.1537 m/z (C16H20O2) and 231.1742 m/z (C16H22O), respectively (fig. S18). Several chemical structures are consistent with each of these molecular weights. We propose that peak 2a may arise from early chain termination before the formation of the seventh double bond. Peak 2b appears to have a terminal hydroxyl group (Fig. 6B), likely formed from the rapid conversion of the corresponding aldehyde via the action of an endogenous yeast enzyme, as previously observed for other metabolic pathways in yeast strains lacking HFD1 (44).
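As a worked check of the two molecular formulas quoted above, the sketch below computes the expected m/z from standard monoisotopic masses. It assumes that the reported values correspond to singly protonated ions, [M+H]+, which is an assumption of this example consistent with positive-mode detection rather than a statement from the text.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
PROTON = 1.00727647  # proton mass; the [M+H]+ adduct is an assumption here


def parse_formula(formula):
    """Turn e.g. 'C16H20O2' into {'C': 16, 'H': 20, 'O': 2}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def mz_protonated(formula):
    """Expected m/z of the singly charged [M+H]+ ion."""
    neutral = sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())
    return neutral + PROTON


# Formulas and observed values as given in the text for peaks 2a and 2b.
observed = {"C16H20O2": 245.1537, "C16H22O": 231.1742}
for formula, mz_obs in observed.items():
    mz_calc = mz_protonated(formula)
    print(formula, round(mz_calc, 4), round(mz_obs - mz_calc, 4))
```

Under this assumption both calculated values fall well within the ± 0.005 Da extraction window used for the chromatograms.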

Upon deleting HFD1 (Δhfd1 + PKS), we observed a significant alteration in pigment composition compared to the chromatogram of WT + PKS (Fig. 6A). This change was characterized by a noticeable decrease in the absorption intensity of peak 1 (the carboxyl form) and the concomitant increase in the absorption intensity of peaks 2 (i.e., alcohol forms) and 3 (i.e., aldehyde form). Strikingly, the effect of HFD1 deletion was entirely reversed by knocking ALDH3A2 into the locus (Fig. 6A). These results suggest that the ALDH3A2 enzyme converts the aldehyde end group of psittacofulvins into carboxylic acid and that the orthologous yeast enzyme encoded by HFD1 possesses the same biochemical activity.

Collectively, these findings support a model wherein red aldehyde-containing psittacofulvins are the primary products of PKS (possibly released by the action of an unknown thioesterase) which are then subjected to enzymatic modification by ALDH3A2, resulting in the formation of yellow carboxyl forms (Fig. 6D). Our proposed model implies that modulation of ALDH3A2 expression levels is sufficient to explain a large portion of the observed color variation among parrot species.

Discussion

Evolutionary innovations often act as catalysts of biological diversification. This study investigates the biochemical and genetic basis of a unique pigmentary system that evolved exclusively in parrots and which drives the vivid hues ornamenting their plumage. Our feather pigment analysis uncovered a strong correlation between the relative proportion of chemically distinct psittacofulvin molecules and color differences. This finding implies that psittacofulvin-based color variation — which has evolved numerous times independently in parrots — has a common chemical basis across divergent lineages of parrots.

Through a combination of genetic mapping, transcriptomics, and functional experimentation, we further implicated the ALDH3A2 enzyme in psittacofulvin-driven color shifts. We show that ALDH3A2 underlies color variation in a rare instance of a parrot species that is phenotypically variable for yellow and red plumage coloration in the wild and further show that ALDH3A2 expression is also correlated with color differences between plumage patches in another species. The color transition in both systems thus appears to be a direct outcome of changes in aldehyde metabolism by modulation of ALDH3A2 expression that yield chemically distinct psittacofulvin pigments, which is consistent with the expectations based on our analysis of pigment composition. Although multiple genetic factors likely determine the overall color phenotype of a parrot species, our results show that substantial color shifts in psittacofulvin pigmentation can be accomplished via subtle changes in enzymatic activity. This simplicity may explain why evolutionary transitions from yellow/green to red hues, and vice versa, are so common in the parrot lineage.

ALDH3A2 is also notable for its role in vital cellular functions. As demonstrated by our yeast experiments, ALDH3A2 is deeply conserved across the tree of life (38, 44, 46). Our findings therefore provide an example of an ancient gene being co-opted for a new function, which was likely enabled by cis-regulatory changes exerting their effects on specific cell types and thereby minimizing potential pleiotropic consequences. Considering that the spectrum of hues observed across parrot species appears often to be attributable to the selective deposition of carboxyl- vs. aldehyde-containing psittacofulvins in feathers, it is highly plausible that ALDH3A2, or other enzymes with aldehyde dehydrogenase activity, are involved in such color differences in a wide range of parrot species.

The adaptive significance of parrot colors remains poorly understood (12), despite associations with predation risk, oxidative stress, feather degradation, and condition signaling in mate choice (14–18, 47). Our findings, together with what is known about the role of aldehyde dehydrogenases in the detoxification of reactive compounds that accumulate within cells from lipid metabolism or dietary sources (48, 49), lead to several hypotheses regarding the potential signaling information conveyed by psittacofulvin-based vibrant colors. For example, these colors could potentially indicate an individual’s detoxification ability or state, as previously suggested for red carotenoid coloration (48), or could serve as indicators of physiological performance through their association with vital metabolic processes and ancient pathways for lipid metabolism (10). However, these explanations likely fail to account for the existence of the dusky lory color polymorphism. In fact, co-existing color morphs in nature are extremely rare among parrots. While the ecological drivers behind this polymorphism remain largely unknown, polymorphisms in other highly gregarious species in several taxonomic groups are often linked to social status signaling and other behavioral differences, or exploitation of alternative ecological niches (49, 50). Overall, this study provides insights into the origins of evolutionary novelties and opens avenues for investigating the adaptive significance of color displays more broadly.

Materials and Methods

Ethics statement

Experimental procedures involving animals were conducted following Directive 2010/63/EU on the protection of animals for scientific purposes and were approved by the Animal Welfare and Ethics Body of CIBIO/BIOPOLIS (2018_01). Experiments on the rosy-faced lovebirds were approved by the University of Hong Kong Committee on the Use of Live Animals in Teaching and Research (CULATR) (4751-18) and the Department of Health ((18-675) in DH/SHS/8/2/3 Pt. 17).

Phylogenetic analysis of psittacofulvin colors throughout parrot evolution

To investigate the diversity of psittacofulvin-based pigmentation, the presence of various color mechanisms (i.e., psittacofulvin-based, melanin-based, and structural) was assessed in all 354 parrot species. To accomplish this, 314 specimens of 237 species (67% of described species from all parrot families and genera) were manually inspected from the collections of the Royal Belgian Institute of Natural Sciences, the Royal Museum for Central Africa, and the Harvard Museum of Comparative Zoology. The remaining species were scored using species descriptions from the Handbook of the Birds of the World and online photographs. Yellow psittacofulvins were coded as present when, in either male or female, at least one patch of plumage was either yellow or green (the latter being a combination of structural blue with yellow psittacofulvins). Red psittacofulvins were coded as present when, in either male or female, at least one patch of plumage was red, pink, or orange, all of which contain elevated levels of aldehyde psittacofulvins (see chemical analyses of galah [Eolophus roseicapilla] and cockatiel [Nymphicus hollandicus]). Other mechanisms such as structural coloration (blue colors) and melanin (brown, black, and grey colors) were scored as “other mechanisms”. These scores were plotted on the parrot phylogeny (pruned from Jetz et al. (78)). For visual representation, lineages including several species of identical phenotype were collapsed.

General procedures for tissue and feather acquisition

Blood or feathers of dusky lories (Pseudeos fuscata) of both morphs were donated by licensed owners of this species in captivity. Whole blood was collected into a heparin-free capillary using a sterile needle, and growing feather follicles were obtained by inducing feather regeneration by plucking old feathers and sampling follicles 10 to 12 days later. In some cases, full-grown feathers were plucked and used for DNA isolation. Samples for DNA analysis were stored in 96% ethanol and samples used for RNA were stored in RNAlater (Thermo Fisher Scientific) immediately after their acquisition and then transferred to a -80°C freezer until RNA extraction.

Feathers for pigment analysis of the following species were also obtained from captive individuals from zoos or licensed breeders in Portugal and the Czech Republic: budgerigars (Melopsittacus undulatus), cockatiel (Nymphicus hollandicus), dusky lory (Pseudeos fuscata), galah (Eolophus roseicapilla), kea (Nestor notabilis), Pesquet’s parrot (Psittrichas fulgidus), rosy-faced lovebird (Agapornis roseicollis), and scarlet macaw (Ara macao). Feathers were kept in a black plastic bag at room temperature to protect them from light.

Raman spectroscopy of parrot feathers

For vibrational spectroscopy, intact feathers were measured using a confocal Raman microscope (WITec alpha300 RSA, Oxford Instruments) equipped with 5× Achroplan, NA= 0.15, and 50× EC Epiplan-Neofluar, NA = 0.55 (Zeiss) objectives. A 532 nm laser excitation with a power of approximately 0.5 mW at the focal plane was used. The data was analyzed using WITec Project FIVE Plus (v6) software (WITec, Oxford Instruments). The analyses included cosmic ray removal, background subtraction, cropping of the spectral edges affected by detector margins, spectral unmixing with the True Component Analysis tool, and averaging of the mean spectrum by summarizing multiple measurements to optimize the signal-to-noise ratio. For each color, a whole-feather sample was measured at multiple randomly chosen 180 × 180 µm patches to capture spatial variation in pigment content. To confirm the psittacofulvin Raman spectra, the obtained spectra were compared with previously published data (21, 31).

Statistical analyses of the Raman spectroscopy data were carried out using multivariate procedures based on singular value decomposition (SVD), as detailed previously (79). The analysis involved a comparison of individual Raman spectra (n = 132). Through the SVD process, Raman spectra were deconstructed into a set of orthonormal abstract functions known as subspectra (Sj). These subspectra are characterized by weights that represent their importance (singular values, Wj) and coefficients of linear combinations (Vij). The coefficients of the linear combination, known as factor scores, and the subspectra (factor loadings) obtained from the subsequent SVD analysis of the corrected datasets were used to visually represent the spectral variability of the Raman datasets. Typically, the factor scores Vi1 of the first subspectrum S1 are found to be proportional to the intensity of the Raman signal of the ith sample. The factor scores Vi2 of the second subspectrum S2 can be interpreted as a measure of the primary spectral differences reflecting the chemical composition of the samples. In the case of samples composed of only two spectrally different compounds, properly rescaled Vi2 factor scores could be used for visualizing relative fractions of the compounds present in the sample. This analytical process is illustrated in Fig. 2B and figs. S2 and S6. Custom Matlab scripts, leveraging the Matlab svd function (based on the LAPACK library), were used for SVD-based corrections, analysis, and visualization of the results.
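The decomposition described above can be sketched on synthetic data. The example below uses Python with NumPy rather than the custom Matlab scripts used in the study, and the two Gaussian "component spectra", the mixing fractions, and the noise level are all hypothetical. It only illustrates why, for two-component mixtures, the second factor scores track the relative fraction of the two compounds.

```python
import numpy as np

rng = np.random.default_rng(0)
axis = np.linspace(0.0, 1.0, 200)  # arbitrary spectral axis (not real Raman shifts)

# Two synthetic, hypothetical component spectra (single Gaussian bands).
comp_a = np.exp(-((axis - 0.40) / 0.03) ** 2)
comp_b = np.exp(-((axis - 0.60) / 0.03) ** 2)

# Each simulated sample mixes the two components in a known fraction.
fractions = np.linspace(0.0, 1.0, 12)
spectra = np.array([f * comp_a + (1.0 - f) * comp_b for f in fractions])
spectra += rng.normal(scale=0.005, size=spectra.shape)

# Rows are samples, columns are spectral points, so that
# spectra = V @ diag(W) @ S, with the subspectra S_j as rows of S.
V, W, S = np.linalg.svd(spectra, full_matrices=False)

scores_1 = V[:, 0]  # factor scores of the first subspectrum
scores_2 = V[:, 1]  # factor scores of the second subspectrum

# The sign of a singular vector is arbitrary, so compare absolute correlation.
r = np.corrcoef(scores_2, fractions)[0, 1]
print(W[:3].round(2), round(abs(r), 3))
```

In this toy case only two singular values rise above the noise, and the second factor scores are almost perfectly linear in the mixing fraction, which is the property used to visualize relative pigment fractions.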

Ultra-high-performance liquid chromatography and mass spectrometry of parrot feathers

Before pigment extraction, surface lipids were removed by washing the feathers consecutively in detergent, ethanol, and hexane. After drying, the vane was separated from the rachis, weighed on an analytical balance, and minced with micro-scissors. Pigments were extracted at 95°C for one hour in a 1 ml solution of 2% HCl in pyridine. Samples were then evaporated to dryness, partitioned between 1 ml of methyl tert-butyl ether (MTBE) and 1 ml of water, centrifuged (4,755 × g, 10 min, 4°C), and the organic phase was washed again with 1 ml of 2% acetic acid. The collected MTBE fractions containing crude pigments were dried under a stream of nitrogen, dissolved in methanol, and frozen at -18°C for one hour to get rid of potential protein and fatty contaminants. After centrifugation (24,400 × g, 10 min, 4°C), the supernatant was analyzed.

Characteristic absorbance spectra of psittacofulvin pigments were determined using an ultra-high-performance liquid chromatography (UHPLC) system Dionex Ultimate 3000 (Thermo Fisher Scientific), coupled with a photodiode array detector (PDA). The separation was performed by two methods, which differed with respect to the column stationary phase used: 1) Phenyl-Hexyl (Kinetex Phenyl-Hexyl 1.7 μm column, 100 × 2.1 mm, with pre-column; Phenomenex) for the separation of main polyene classes according to chain length and terminal groups (aldehyde/carboxyl), and 2) C30 (RP-Aqueous C30 Develosil 5 μm column, 4.6 × 250 mm, with pre-column; Nomura Chemical) for more specific separation of particular forms (e.g., stereoisomers, see the comparison of the chromatograms in fig. S19). In both methods, 0.2% (v/v) formic acid (solvent A) and methanol (solvent B) were used as a mobile phase. The gradient elution for method 1 (Phenyl-Hexyl column), with a constant flow rate of 0.3 ml/min, started at 5% B (0-5 min), then was increased to 50% B (in 2.2 min) and finally ramped to 100% B (8-15 min) with equilibration under initial conditions (5% B) for 5 min. For the method using the C30 Develosil column, the gradient started at 90% B (0-1 min), then rose to 100% B (15-22 min) followed by equilibration (22.5-28 min). The flow rate in this method was 1.2 ml/min. The column temperature was set to 40°C and the injection volume of the sample was 5 μL in both methods. Absorbance spectra were collected in a range from 250 nm to 700 nm (collection rate of 5 Hz).

The identity of the individual psittacofulvins was subsequently confirmed by using ultra-high resolution accurate mass (HRAM) Q-TOF mass spectrometry (IMPACT II, Bruker Daltonik). Ions were detected in positive mode using atmospheric pressure chemical ionization (APCI, both separation methods) and electrospray ionization (ESI, only for Phenyl-Hexyl phase separation) and recorded at a resolution of > 60,000 and an acquisition rate of 1 Hz. Fragmentation MS2 spectra were generated at a collision energy of 20 eV. Detailed settings of the mass spectrometer are given in table S12. Data acquisition and processing were performed using oTof Control (v4.0), HyStar (v3.2), and DataAnalysis (v4.3) software (Bruker Daltonik). Signals of the particular compounds were monitored as extracted chromatograms of corresponding m/z (± 0.005 Da). Pigment identity was determined through a comprehensive approach, combining comparison with published data (19, 22, 26), expected molecular formulas based on exact masses and isotopic patterns, analysis of fragmentation spectra from MS2 experiments, and evaluation of absorption spectra. The relative amounts of individual polyketide types within each sample were expressed as a sum of the areas of all chromatographic peaks in extracted chromatograms measured in positive APCI mode with the separation method using the C30 Develosil column.

To compare differences in psittacofulvin content among feathers of varying colors, a statistical analysis of our UHPLC/HRAM-QTOF data was conducted using linear mixed models. This analysis was performed using the ‘lmer’ function from the ‘lmerTest’ package (80) in R (v4.2) (51). The choice of linear mixed models was motivated by the need to account for multiple measurements taken from feathers of individual species, collectively amounting to a total of 28 samples. The analysis focused on general differences in psittacofulvin content across parrots, rather than delving into the effects of phylogenetic relationships or differences among species after accounting for phylogenetic effects. To address this, individual species were treated as a random effect in the model. The response variable was the amount of psittacofulvin, quantified as the area under the peak in the exact mass spectrum relative to the baseline. Each psittacofulvin type is characterized by distinctive peaks in the mass spectrum. Color, length of the polyene chain, and the functional ending group were considered as explanatory fixed effects in our analysis. The same analyses were performed using red and yellow feathers of 11 individuals of the dusky lory (six red and five yellow).

Genome assembly and annotation

The individual chosen to generate a draft reference genome sequence of the dusky lory was a yellow female. This bird, belonging to a breeding couple known for producing offspring of both color morphs, is expected to be heterozygous for the red and yellow alleles. Whole blood was collected from this female as described above but, in this case, immediately snap-frozen in liquid nitrogen. This blood sample was stored at -80°C until DNA extraction. High molecular weight genomic DNA was extracted from 5 µl of blood using a modified salt-based protocol (81). DNA quantity and integrity were assessed using a NanoDrop instrument, Qubit dsDNA BR Assay Kit (Thermo Fisher Scientific), and Agilent Genomic DNA ScreenTape (Agilent Technologies). The average fragment length of the extracted DNA was estimated to be above 60 kb. Genomic DNA purity was improved by implementing a cleaning step with AMPure XP magnetic beads (Beckman Coulter) at a 3X ratio (beads/volume).

Pacific Biosciences (PacBio) HiFi sequencing libraries were prepared from 7 µg of input genomic DNA. DNA was sheared with the Megaruptor® 3 (Diagenode) and purified using AMPure PB magnetic beads (Pacific Biosciences) for size selection. Size distributions at all quality-control steps were determined using the Femto Pulse System (Agilent). The library insert size was within the optimal size range. A total of 10 µl of library was prepared using PacBio SMRTbell prep kit 3.0, and SMRTbell templates were annealed using Sequel II Bind Kit 3.2. Sequel II DNA internal control complex 3.2 was added to control for any instrument-related issues. Sequencing was performed using the Sequel II Sequencing Kit 2.0 and SMRT cell 8M Tray. Each SMRT cell was run with a 30 h movie time on the PacBio Sequel IIe sequencing platform (Pacific Biosciences) by Macrogen (Seoul, South Korea). The subsequent steps followed the PacBio Sample Net-Shared Protocol, which is available at https://www.pacb.com/.

A total of 1,161,349 HiFi long reads with an average length of 17,270 bp were generated that correspond to ~17X coverage of the expected size of a bird genome (~1.2 Gb). Read and genome statistics are available in table S3. Contig assembly was carried out with Hifiasm (v0.18.2-r467) (52, 53) and primary assembly contigs (n = 1091, N50 = 5.8 Mb, longest = 24.2 Mb) were scaffolded into a pseudochromosome assembly using the homology-based scaffolding algorithm of RagTag (v2.1.0) (54) and the budgerigar bMelUnd1 genome assembly as reference (GCF_012275295.1). The final assembly was 1,214,639,153 bp in length and consisted of 574 scaffolds (N50 = 107.5 Mb, longest = 164.7 Mb). For some analyses to detect structural variants (see below), the two haplotypes (red and yellow) containing ALDH3A2, resulting from the diploid assembly and identified using BLAST searches, were compared to each other.
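The coverage figure quoted above follows directly from the read count, mean read length, and assumed genome size. The sketch below reproduces that arithmetic and also shows how an N50 value is defined; the contig lengths in the N50 example are hypothetical toy values, not the actual assembly.

```python
def mean_coverage(n_reads, mean_read_length, genome_size):
    """Average sequencing depth: total sequenced bases / genome size."""
    return n_reads * mean_read_length / genome_size


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no sequence lengths given")


# Values from the text: 1,161,349 reads, 17,270 bp mean length, ~1.2 Gb genome.
coverage = mean_coverage(1_161_349, 17_270, 1.2e9)
print(round(coverage, 1))  # 16.7, i.e. ~17X

# Hypothetical contig lengths, only to illustrate the N50 definition.
toy_contigs = [50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 40
```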

Before gene annotation, the reference genome was repeat-masked using WindowMasker (v1.0.0) (82). Gene annotation was then performed using MAKER (v3.01.04) (55) in an iterative process largely following Card et al. (56). It consisted of three runs of MAKER:

1) In the initial MAKER run, gene models were constructed enabling the “est2genome” and “protein2genome” options and employing multiple forms of gene evidence. First, a de novo transcriptome assembly generated using Trinity (83) was utilized as EST evidence. This assembly was generated using the RNA-sequencing data of regenerating feather follicles (see below) and the data for the six individuals were combined. Second, protein sequences from three other bird species, including a parrot (budgerigar - GCF_012275295.1, zebra finch - GCF_003957565.2, chicken - GCF_016699485.2), were incorporated as protein homology evidence. The gene models produced by this first run were subsequently employed to train the gene prediction software SNAP (downloaded: 2023.03.02) (84) and Augustus (v3.5.0) (85). For SNAP, the training was restricted to gene models with a length of 50 or more amino acids and a maximum annotation edit distance score of 0.25. As for Augustus, genomic regions containing the transcript models along with 1 kb of upstream and downstream sequences were extracted. To train Augustus gene prediction parameters, BUSCO (Benchmarking Universal Single-Copy Orthologs) (v5.4.5) (86) was used in the genome and “--long” modes while the training was based on a list of 8,338 single-copy orthologs from the avian lineage database (aves_odb10).

2) In the second run, the gene models from the first MAKER run, together with the EST and protein evidence, were used as input, this time with the “est2genome” and “protein2genome” options disabled. Under these settings, SNAP and Augustus gene prediction parameters were used to produce gene models, supported by the empirical EST and protein data. After this second run, the same training process of SNAP and Augustus described above was performed with the resulting gene models.

3) The third run of MAKER was carried out employing the same options as in the second run. The output of this run was considered the final annotation. Protein sequences were used to annotate the gene models based on homology BLAST searches (minimum e-value threshold set to 1 × 10⁻⁵) against the curated protein database of UniProt/SwissProt (accessed on 2023.03.21). The final annotation consisted of 22,296 gene models with a mean length of 14,394 bp (see table S4 for additional statistics).

Finally, completeness analyses of the assembly and annotation were performed through BUSCO by using the aves_odb10 gene list. These analyses indicate that our genome assembly and annotation are 95.5% and 87.2% complete, respectively (table S5).

Whole-genome resequencing using Illumina reads

Genome-wide polymorphism data for genetic mapping was generated by whole-genome Illumina sequencing of 57 dusky lories (35 red and 22 yellow; table S6). Genomic DNA from blood and feathers was extracted using a modified salt-based protocol (81) and the QIAamp DNA Micro Kit (Qiagen), respectively. Lysates were treated with RNase A (Roche) for RNA removal. Following DNA isolation, DNA quality and purity were assessed using spectrophotometry (NanoDrop) and fluorometric quantitation (Qubit dsDNA BR Assay Kit, Thermo Fisher Scientific).

An Illumina sequencing library was then produced for each of the 57 individuals. For blood samples, for which DNA quality was higher and in larger quantities, sequencing libraries were prepared using the TruSeq DNA PCR-free Library Preparation Kit (Illumina). For feather follicle samples, sequencing libraries were prepared using Illumina’s PCR-based Nextera XT Preparation Kit. The quality of each library was assessed by its size distribution, evaluated on an Agilent 2200 TapeStation using an HS D5000 ScreenTape (Agilent Technologies), and by its molarity, calculated using a KAPA qPCR library quantification kit (Roche). Libraries were sequenced using 150 bp paired-end reads on an Illumina instrument (table S6). The PCR-based libraries produced a higher proportion of duplicates due to lower DNA quantities of the samples (table S6). Thus, some of the samples were sequenced more deeply.

Before data analyses, read quality was inspected with FastQC (v0.11.8) (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). The sequencing reads were mapped to the dusky lory draft genome assembly produced in this study with BWA-MEM (v0.7.17-r1188) (57) using default parameters. Read duplicates were flagged for downstream analysis using the Picard (v3.0.0) function MarkDuplicates (http://broadinstitute.github.io/picard). Read group information was added to each BAM using the Picard function AddOrReplaceReadGroups. Sequencing and mapping summary statistics were computed using SAMtools (v1.11) (87).

Variant discovery, genotype calling, and functional annotation of alleles

SNP and small indel variants were identified using GATK (v4.2.6.1) and associated functions (58). Briefly, gVCF files were generated for each individual using the HaplotypeCaller function requiring a minimum mapping quality of 30, were combined into a single file using the function CombineGVCFs, and variants were predicted using the function GenotypeGVCFs. Two files containing either SNPs or indels were generated using SelectVariants and filtered with VariantFiltration using specific filters for each type of marker following GATK documentation: 1) SNPs - FisherStrand (FS) > 60.0, QualByDepth (QD) < 2.0, StrandOddsRatio (SOR) > 3.0, RMSMappingQuality (MQ) < 30.0, MappingQualityRankSumTest (MQRankSum) < -12.5, ReadPosRankSumTest (ReadPosRankSum) < -8.0, variant quality (QUAL) < 30.0, and excess heterozygosity (ExcessHet) > 54.69; 2) Indels - FisherStrand (FS) > 200.0, QualByDepth (QD) < 2.0, ReadPosRankSumTest (ReadPosRankSum) < -20.0, variant quality (QUAL) < 30.0, and excess heterozygosity (ExcessHet) > 54.69.
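The SNP hard-filter thresholds listed above amount to a set of independent tests, any one of which flags a variant. The following sketch expresses that logic as a plain predicate. It is not GATK code; the annotation dictionaries are hypothetical, and skipping annotations that are absent from a record is an assumption of this example.

```python
# Thresholds exactly as listed in the text; a SNP failing ANY test is flagged.
SNP_FILTERS = {
    "FS": lambda v: v > 60.0,
    "QD": lambda v: v < 2.0,
    "SOR": lambda v: v > 3.0,
    "MQ": lambda v: v < 30.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
    "QUAL": lambda v: v < 30.0,
    "ExcessHet": lambda v: v > 54.69,
}


def failed_filters(annotations):
    """Names of the filters a variant fails.

    Annotations missing from the record are skipped (assumption of this sketch).
    """
    return [
        name
        for name, fails in SNP_FILTERS.items()
        if name in annotations and fails(annotations[name])
    ]


# Hypothetical annotation values for two SNPs.
good_snp = {"FS": 1.2, "QD": 18.0, "SOR": 0.7, "MQ": 60.0,
            "MQRankSum": 0.1, "ReadPosRankSum": 0.3, "QUAL": 900.0, "ExcessHet": 3.0}
bad_snp = dict(good_snp, FS=75.0, QD=1.5)

print(failed_filters(good_snp))  # []
print(failed_filters(bad_snp))   # ['FS', 'QD']
```

The indel thresholds could be handled the same way with a second dictionary.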

Before the genome-wide association analysis, SNP and indel variants were combined into a single vcf file with GATK’s function MergeVcfs, and the following filtering steps were applied at the level of genotype and/or variant using vcftools (v0.1.16) (88). The genotype of each individual was coded as ‘missing data’ if its quality was below 20 (--minGQ 20), or if its coverage was below 4X or higher than 58X (i.e., twice the average coverage of the individual with the highest coverage) (--minDP 4; --maxDP 58). Following genotype filtering steps, variants with 50% or more missing data in individual genotypes (--max-missing-count 57) were finally removed, yielding a total of 5,169,694 variants.
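The genotype-level and site-level filters described above can be summarized in a short sketch. The cutoffs (genotype quality 20, depth 4 to 58, removal at 50% or more missing genotypes) are taken from the text; the data layout and function names are hypothetical, and the code does not reproduce how vcftools counts missing data internally.

```python
MIN_GQ, MIN_DP, MAX_DP = 20, 4, 58  # cutoffs quoted in the text


def mask_genotype(gt, gq, dp):
    """Return the genotype, or None (missing) if it fails quality or depth limits."""
    if gt is None or gq < MIN_GQ or dp < MIN_DP or dp > MAX_DP:
        return None
    return gt


def keep_variant(calls, max_missing_fraction=0.5):
    """calls: one (genotype, GQ, depth) tuple per individual.

    The site is dropped when the fraction of missing genotypes reaches the cutoff.
    """
    masked = [mask_genotype(*call) for call in calls]
    missing = sum(g is None for g in masked)
    return missing / len(masked) < max_missing_fraction


# Hypothetical site genotyped in four individuals.
site_ok = [("0/1", 40, 12), ("0/0", 35, 9), ("1/1", 50, 20), ("0/1", 10, 15)]
site_bad = [("0/1", 40, 12), ("0/0", 35, 2), ("1/1", 50, 90), ("0/1", 10, 15)]

print(keep_variant(site_ok))   # True: one of four genotypes masked
print(keep_variant(site_bad))  # False: three of four genotypes masked
```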

The variants retained after filtering, both SNPs and indels, were annotated for potential protein-coding impact (nonsynonymous, frame-shift, splicing, and STOP mutations) using the genetic variant annotation and effect prediction toolbox SnpEff (v4.3t) (89).

Genetic mapping

Before conducting the genome-wide association analysis, the software Beagle (v5.1) (59, 60) was used to impute missing genotypes and to phase the genotype data. Association analyses for each marker were run with GEMMA (v0.98.1) (61) using a linear mixed model and coding color morph as a binary trait (red versus yellow). To minimize potential confounding effects of population stratification and relatedness between individuals, a kinship matrix estimated using the centered relatedness matrix option (-gk 1) in GEMMA was incorporated in the model as a random effect. The first three principal component axes from a principal components analysis (PCA) estimated using PLINK (v1.90b6.26) (90) were also incorporated in the model as covariates. To avoid the influence of variant clusters in estimating the relatedness matrix and PCA, linkage-disequilibrium variant pruning was performed using PLINK with an r² of 0.3 and a window size of 200 kb. P-values for the association tests were calculated using both the Wald test and the likelihood ratio test for allele frequency differences. Variants with a minor allele frequency lower than 10% were not considered. A Bonferroni-corrected significance threshold was used to determine the variants significantly associated with the color morph (P = 0.05/4,303,897 = 1.16 × 10⁻⁸; -log10 P = 7.93). In parallel, the same association analysis was repeated by mapping the reads and calling variants to a high-quality genome assembly from the closest relative available, the budgerigar (bMelUnd1), but the results remained qualitatively unaltered. Manhattan plots summarizing the associations were generated with the qqman package (91).
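The Bonferroni threshold follows from dividing the nominal significance level by the number of variants tested. The short calculation below uses the two numbers given in the text (0.05 and 4,303,897) and reports the threshold both as a P-value and on the -log10 scale used in Manhattan plots.

```python
import math

alpha = 0.05          # nominal significance level
n_tests = 4_303_897   # number of variants tested, from the text

threshold = alpha / n_tests
neg_log10 = -math.log10(threshold)

print(f"{threshold:.2e}")   # 1.16e-08
print(round(neg_log10, 2))  # 7.93
```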

Structural variant detection

Structural rearrangements, such as copy number variation, large deletions, large insertions, inversions, and translocations, were identified using both the Illumina short-read whole-genome resequencing data of the two morphs and the long PacBio reads of a yellow individual heterozygous for the red and yellow alleles, which was sequenced for the genome assembly. The candidate region and a 100 kb interval surrounding the top significant variant were considered. Several approaches that differ in their ability to identify different types of rearrangements were used. First, DELLY (v1.1.6) (92) and LUMPY (v0.2.13) (93) with default parameters were applied to the Illumina short-read data. These methodologies for variant detection integrate information on paired-end alignments, split-read alignment, and read-depth. Second, read alignments across the candidate region for both the short- and long-read data were visually inspected using the Integrative Genomics Viewer (IGV; v2.16.1) (94). Third, the red and yellow haplotypes resulting from the diploid genome assembly produced using Hifiasm, and containing our candidate region, were manually aligned in BioEdit (v7.7) (77) and visually screened for rearrangements. Finally, to visualize sequence similarity and structural variation in a more automated manner, the red and yellow contigs were also compared using a dot plot in the YASS web server (95).

Haplotype analysis

Haplotypes were phased using Beagle (see above). These phased haplotypes were used to estimate allele-specific decay of homozygosity with increasing distance from the candidate causative mutation (scaffold_13:6,288,712). The extended haplotype homozygosity statistic (EHH) was calculated using the R package rehh (v3.1.0) (96). Haplotype tables were also generated by plotting reference and alternative alleles against the contig containing the red allele from the draft genome assembly. Only biallelic variants were considered for these analyses.
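For readers unfamiliar with EHH, the statistic can be illustrated with a minimal sketch. It assumes the standard definition (the probability that two randomly chosen chromosomes carrying the core allele are identical over the interval from the core to a given marker) and is not the rehh implementation; the 0/1 haplotype encoding and function name are hypothetical.

```python
from collections import Counter
from math import comb
from typing import Sequence


def ehh(haplotypes: Sequence[Sequence[int]], core: int, allele: int, upto: int) -> float:
    """EHH among chromosomes carrying `allele` at index `core`, extended to index `upto`."""
    lo, hi = sorted((core, upto))
    # Extended haplotypes of carrier chromosomes over the interval [lo, hi].
    carriers = [tuple(h[lo:hi + 1]) for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return float("nan")  # undefined with fewer than two carriers
    # Count pairs of identical extended haplotypes and divide by all pairs.
    identical_pairs = sum(comb(c, 2) for c in Counter(carriers).values())
    return identical_pairs / comb(n, 2)
```

EHH starts at 1 at the core position and decays as recombination and mutation break up the haplotypes; a slower decay for one allele is the signature examined in allele-specific comparisons.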

Allele age estimation

The age of the color polymorphism in dusky lories was estimated using runtc (97), a method that calculates the first coalescence time between chromosomes carrying a focal allele and chromosomes without the allele. Phased haplotypes were inferred using Beagle (as mentioned earlier). Several parameters were considered in the analysis (table S7). First, a mutation rate of 9.2 × 10⁻⁹ was assumed, based on empirical estimates for the blue-throated macaw (Ara glaucogularis) (98), which is the closest relative to the dusky lory with an empirically estimated mutation rate. Additionally, lower (5.6 × 10⁻⁹) and higher (1.4 × 10⁻⁸) mutation rates were used, matching the confidence interval of the blue-throated macaw estimate from the same study. Second, a recombination rate of 1.5 × 10⁻⁸ was used based on estimates obtained from pedigrees of the zebra finch (99). A lower recombination rate of 1.0 × 10⁻⁸, similar to that observed in humans (100), and a higher recombination rate of 3.0 × 10⁻⁸, comparable to pedigree estimates available for chickens (101), were also evaluated. Third, effective population sizes (Ne) were estimated under a simple mutation-drift expectation: θ = 4Neμ (102), where θ is the average θw across all 100 kb nonoverlapping windows across the genome and μ is the neutral mutation rate per site per generation. The software ANGSD (v0.934) (103) was used to estimate θw. Considering the range of mutation rates mentioned earlier, and an average genome-wide estimate of θw = 0.0028, Ne estimates of 51,325, 76,418, and 124,575 were obtained. Finally, generation time, which is the average time between two consecutive generations within a species, was used to convert runtc estimates into years. Dusky lories reach sexual maturity at 2 to 3 years of age. Therefore, generation times of four, five, and six years were used. Allele age estimates were obtained for all combinations of parameters described above for mutation rate, recombination rate, Ne, and generation time.
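The relationship θ = 4Neμ and the conversion from generations to years can be made concrete with a short sketch. It uses the rounded θw and mutation rates quoted above, so the results land near, but not exactly on, the reported Ne values (51,325, 76,418, and 124,575), which were presumably computed from unrounded inputs. Function names are illustrative.

```python
THETA_W = 0.0028  # genome-wide average quoted in the text (rounded)


def effective_population_size(theta: float, mu: float) -> float:
    """Solve theta = 4 * Ne * mu for Ne."""
    return theta / (4.0 * mu)


def generations_to_years(generations: float, generation_time: float) -> float:
    """Convert an age in generations to years for a given generation time."""
    return generations * generation_time


# Higher, central, and lower mutation rates from the text.
for mu in (1.4e-8, 9.2e-9, 5.6e-9):
    ne = effective_population_size(THETA_W, mu)
    print(f"mu = {mu:.1e}  ->  Ne is approximately {ne:,.0f}")
```

Because Ne scales inversely with μ, the highest mutation rate yields the smallest Ne, which matches the ordering of the three estimates reported in the text.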

Transcriptomics in the dusky lory

Bulk RNA-sequencing

RNA-seq data of regenerating feather follicles of dusky lories was generated from six samples, three red and three yellow (table S8). Feather follicle regeneration was induced by plucking a small number of chest feathers on a psittacofulvin-pigmented patch. The feather growth was checked daily, and as soon as the regenerating feathers emerged from the skin, the follicles were plucked, snap-frozen in liquid nitrogen, and stored at -80°C until RNA extraction. Total RNA was isolated using the RNeasy Plus Mini Kit (Qiagen). Following extraction, the integrity of the RNA was measured using a TapeStation RNA ScreenTape (Agilent Technologies), and RNA concentration was measured using a Qubit RNA BR assay kit (Thermo Fisher Scientific). Production of Illumina strand-specific RNA-seq libraries was performed using the TruSeq Stranded mRNA kit according to the manufacturer’s instructions. A total of ~587 million paired-end reads (2 × 150 bp) with an average of ~97 million reads per individual (range = 58,767,478-151,224,228) were generated (table S8).

Differential gene expression analysis

Prior to differential expression analyses, bulk RNA-sequencing raw reads were trimmed using TrimGalore (v0.6.6) (https://github.com/FelixKrueger/TrimGalore) and Cutadapt (v4.0) (104), filtering for a minimum Phred score threshold of 20 at the 3’ end of the reads, using a stringency parameter of 1 during adaptor removal, and a minimum read length of 36 bp. Trimmed reads were then mapped to the dusky lory draft reference genome using HISAT2 (v2.2.1) (62). Differential expression analyses were carried out using the R package DESeq2 (v1.36.0) (63), with count data for each individual and transcript generated with the featureCounts function of the Rsubread package (v1.22.2) (64, 65) against the annotation file. Differentially expressed transcripts between the three red and three yellow biological replicates were considered as those with Benjamini-Hochberg adjusted P-values, corrected for false discovery rate, lower than 0.1.
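The Benjamini-Hochberg adjustment used for the significance cutoff can be sketched as follows. This is a minimal stand-alone version of the adjustment step only; it does not reproduce other parts of the DESeq2 workflow (for example, its independent filtering), and the function name is illustrative.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
significant = [p < 0.1 for p in padj]  # the 0.1 cutoff described in the text
```

In this toy input, the first three tests pass the 0.1 threshold and the fourth does not.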

Sequencing of full-length transcripts using PacBio HiFi Iso-seq

Isoform Sequencing (Iso-seq) data was generated for regenerating feather follicles of six dusky lories, three red and three yellow (table S8). Total RNA was isolated using the RNeasy Plus Mini kit (Qiagen) following the manufacturer’s protocol. RNA degradation and contamination were monitored using an RNA ScreenTape with the Agilent 2200 TapeStation (Agilent Technologies) and RNA concentration was measured using Quant-iT™ RiboGreen™ RNA Assay Kit in Victor Nivo (PerkinElmer).

Iso-seq libraries were constructed according to the PacBio Iso-seq protocol for SMRTbell prep kit 3.0 (Pacific Biosciences). Sequencing primer annealing and polymerase binding to the SMRTbell templates were performed using Sequel II binding kit 3.1. Libraries were sequenced in SMRT cells 8M for 24 hours using a PacBio Sequel IIe (Pacific Biosciences) sequencing platform at Macrogen (Seoul, South Korea). The subsequent steps were performed according to the PacBio Sample Net-Shared Protocol, which is available at https://www.pacb.com/. An average of 1.3 million reads per individual (range = 752,828 to 2,084,623) was generated (table S8).

Analysis of transcript isoforms

The demultiplexed PacBio circular consensus sequences (CCS) generated HiFi reads, with a predicted accuracy ≥Q20, were first processed using the IsoSeq (v4.0.0, https://isoseq.how/) lima tool (v2.7.1) for 5’ and 3’ primer removal with parameters (--iso-seq; --peek-guess). The polyA+ tails and artificial concatemers were trimmed and removed using the IsoSeq (v4.0.0) refine tool (--require-polya). Clustering was performed using the partial order alignment (POA) algorithm using the IsoSeq3 cluster tool (--use-qvs). Clusters were subsequently converted to fasta using BamTools (v2.5.1) with the convert tool (-format fasta). For the identification of common isoforms in red and yellow regenerating feather follicles, transcripts were mapped to the dusky lory draft reference genome using Minimap2 (v2.24) (66). A general feature format (GFF) file with the reference isoforms of ALDH3A2 was generated using TransDecoder (v5.5.0, http://transdecoder.sourceforge.net).

Measurement of allelic imbalance in the dusky lory

To evaluate the relative expression of the dusky lory red and yellow ALDH3A2 alleles in heterozygous individuals, an analysis was conducted based on the Iso-seq data obtained from the regenerating feather follicles of three ALDH3A2 heterozygotes with yellow pigmentation. As part of the genome-wide association study detailed above, the genomes of these individuals were sequenced, enabling us to classify each transcript as either red or yellow based on variants in the coding region that are linked through the same haplotype to the alleles of the candidate causal variant. To obtain count data for each reference isoform, the output bams from the IsoSeq refine tool were merged using the SAMtools merge tool into a single yellow pool, converted to fasta format using BamTools, and then mapped to the reference genome using Minimap2. Finally, the occurrence of each transcript type was quantified on IGV, and the proportion of red and yellow transcripts was calculated from the number of full-length transcripts.

Transcriptomics in the rosy-faced lovebird

Bulk RNA-sequencing

RNA-seq data of regenerating feather follicles was generated for eight rosy-faced lovebirds (Agapornis roseicollis) obtained from red forehead feather follicles and three additional body regions expressing green color (i.e., yellow psittacofulvin-containing patches): back feathers, chest feathers, and head feathers (table S8). To collect feather follicle samples, feathers were plucked from the focal body regions and the follicles were collected as they emerged from the skin, which took 10-15 days. The feather follicles were stored in RNAlater at 4°C overnight and then at -80°C until RNA extraction. RNA was extracted from each sample using TRIzol (Thermo Fisher Scientific). The quantity of RNA was assessed using the Qubit RNA High Sensitivity Assay Kit (Thermo Fisher Scientific). The quality of the total RNA was assessed using the High Sensitivity RNA ScreenTape for the Tapestation (Agilent Technologies). mRNA library preparation was conducted using poly-A selection, and the libraries were sequenced on a NovaSeq instrument aiming for 6 Gb data per sample. mRNA library preparation and sequencing were performed by Novogene (Hong Kong). A total of ~1,432 million paired-end reads (2 × 150 bp) with an average of ~45 million reads per plumage patch (range = 40,205,062-58,758,880) were generated (table S8).

Differential gene expression analysis

Fastp (v0.23.2) (105) was used for adapter removal of the RNA-seq data, and to trim bases with quality < 20 and filter reads < 25 bp in length following trimming. SortMeRNA (v4.3.4) (106) was used to filter reads mapping to ribosomal RNAs (rRNAs) using the SILVA database release 138.1 of large and small subunit rRNAs (107). Quality metrics of the sequencing libraries were assessed with FastQC (v0.11.9).

Processed reads for each RNA-seq library were aligned to a chromosome-level A. roseicollis genome assembly using STAR (v2.7.9a) (67) to first identify sample-specific splice sites and subsequently map each library to the reference using the combined splice site information across all libraries. RSEM (v1.3.3) (68) was used to quantify transcript abundances from aligned RNA-seq data, and edgeR (v3.36.0) (69) was used to normalize count data across samples using the default weighted trimmed mean of M-values (TMM) method. Genes were filtered from each contrast if they occurred with < 1 CPM (count per million) in more than eight samples (of the 16 total for each pairwise contrast). Differential expression was assessed using the linear model fitting procedure implemented in limma (v3.50.3) (108), with the empirical Bayes procedure and using empirical quality weights. The false discovery rate due to multiple testing was controlled using a significance threshold of P < 0.05 with adjusted P-values calculated using the Benjamini-Hochberg procedure. The differential expression of ALDH3A2 based on the RNA-seq data was investigated on both fragments per kilobase per million mapped fragments (FPKM) and transcripts per million mapped reads (TPM) as gene expression metrics. For FPKM, we ran a one-way analysis of variance (ANOVA) followed by Tukey HSD post-hoc test. For TPM, we used Welch’s one-way test followed by Games-Howell post-hoc test to relax the assumption of homogeneity of variances since Levene’s test for TPM data indicated a significant departure from homogeneity (F = 3.025, d.f. = 3, P = 0.046).
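The expression filter (genes dropped when below 1 CPM in more than eight of the 16 samples in a contrast) can be illustrated with a small sketch. It computes CPM from raw library sizes for simplicity, whereas edgeR can also use normalized effective library sizes; the function names are hypothetical.

```python
from typing import List, Sequence


def cpm(counts: Sequence[float], library_sizes: Sequence[float]) -> List[float]:
    """Counts per million for one gene across samples."""
    return [c / n * 1e6 for c, n in zip(counts, library_sizes)]


def keep_gene(
    counts: Sequence[float],
    library_sizes: Sequence[float],
    min_cpm: float = 1.0,
    max_low_samples: int = 8,
) -> bool:
    """Drop a gene if it falls below `min_cpm` in more than `max_low_samples` samples."""
    n_low = sum(value < min_cpm for value in cpm(counts, library_sizes))
    return n_low <= max_low_samples
```

With 16 samples per contrast, a gene expressed at 1 CPM or more in at least eight samples is retained, so a gene expressed in only one of the two compared patches still passes the filter.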

Quantitative polymerase chain reaction (qPCR)

Quantitative polymerase chain reaction (qPCR) was performed on the RNA extracted from two red and two yellow growing feather samples of dusky lory, previously used for RNA-seq (see above). ALDH3A2 transcripts were amplified using primers Pf_ALDH_qPCR_F and Pf_ALDH_qPCR_R (table S13). As reference, the housekeeping gene glyceraldehyde-3-phosphate dehydrogenase (GAPDH) was amplified using primers Pf_GAPDH_qPCR_F and Pf_GAPDH_qPCR_R (table S13). Real-time qPCR reactions were carried out on a QuantStudio 5 Real-Time PCR System (Applied Biosystems). Reactions were performed using a total volume of 10 µL, which contained 100 ng of cDNA template, 0.2 µM each primer, and 1× SYBR® Green JumpStart™ Taq ReadyMix™ ABI StepOne Plus (Applied Biosystems). The thermal cycling conditions were as follows: 95°C for 30 s, followed by 40 cycles of 95°C for 5 s, 62°C for 20 s, and 65°C for 10 s. The melting curve was recorded after 40 cycles to verify primer specificity by heating from 65°C to 95°C. Each sample was run in duplicate. The gene expression of ALDH3A2 was normalized to that of GAPDH. For the rosy-faced lovebirds, qPCR was performed on the RNA extracted from green back feathers, green head feathers, and red forehead feathers of eight rosy-faced lovebirds previously analyzed (see above). The primers used for ALDH3A2 were Ar_ALDH_qPCR_F and Ar_ALDH_qPCR_R, and those used for GAPDH were Ar_GAPDH_qPCR_F and Ar_GAPDH_qPCR_R (table S13). Real-time qPCR reactions were carried out on a CFX96 system (Bio-Rad). The reactions were prepared using iTaq Universal SYBR Green Supermix (Bio-Rad), with 1 µL of cDNA template added to each qPCR reaction (total volume of 10 µL). Each sample was run in duplicate. The thermal cycling conditions included an initial denaturation at 95°C for 30 s, followed by 35 cycles of denaturation at 95°C for 5 s, annealing at 60°C for 30 s, and melt curve analysis. The gene expression of ALDH3A2 was normalized to that of GAPDH.
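The text does not state the formula used for normalization. One common convention, shown here purely as an assumption, is the ΔCt approach, in which the mean threshold cycle (Ct) of the reference gene is subtracted from that of the target gene and expression is reported as 2^(-ΔCt). The sketch assumes duplicate wells are averaged and that primer efficiencies are close to 100%; the function name is hypothetical.

```python
from statistics import mean
from typing import Sequence


def relative_expression(ct_target: Sequence[float], ct_reference: Sequence[float]) -> float:
    """Target expression relative to the reference gene, as 2 ** -(delta Ct).

    Assumption: technical replicates are averaged and amplification
    efficiency is approximately 100% for both primer pairs.
    """
    delta_ct = mean(ct_target) - mean(ct_reference)
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values for duplicate wells of one sample.
aldh3a2_vs_gapdh = relative_expression([24.0, 24.0], [22.0, 22.0])
```

A target that crosses the threshold two cycles later than the reference is expressed at one quarter of the reference level under these assumptions.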

Tissue preparation and production of libraries for scRNA-seq and snATAC-seq

Two male budgerigars (Melopsittacus undulatus) were used for the single-cell RNA-seq and single-nuclei ATAC-seq experiments. These animals were obtained from aviaries of licensed breeders in Portugal and were kept in indoor cages (1.2 × 0.58 × 0.5 m) with ad libitum access to water and seeds (Versele-Laga, Manitoba, and Domus Molinari). Nine days before each experiment, feathers from the chest were plucked to induce feather follicle regeneration. For each experiment, 15 regenerating feathers were plucked, immediately placed on ice in CMF-HBSS (Hanks’ Balanced Salt Solution without calcium and magnesium, Gibco), and dissected within one hour in the same conditions. For the dissections, each feather was cut longitudinally to open the outer sheath, and the inner tissues (dermis and epidermis) were transferred into cold CMF-HBSS medium until all feathers were dissected.

For the scRNA-seq experiment, the pooled tissues were dissociated in 2 mg/ml dispase solution in DMEM (Gibco) for 40 min at 37°C at 300 rpm. After 10 minutes of incubation, 30 µl of liberase solution (Roche) in DMEM (10 mg/ml) was added and the mixture was passed through a pipette tip several times every 10 minutes. The mixture was then removed by centrifugation at 400 × g at 4°C and the partially dissociated tissues were incubated in 0.05% Trypsin/EDTA (Gibco) for 10 minutes at 37°C with mixing at 300 rpm and occasionally passed through a pipette tip. The digestion was arrested by addition of 10% FBS (Gibco) in DMEM and the mixture was treated with DNase I (Roche) at 37°C for 5 min at 300 rpm. The cells were then washed two times in 0.4% BSA (Sigma-Aldrich) in HBSS (with calcium and magnesium), filtered twice through a 40 µm-mesh filter (Falcon), treated with a commercial kit for dead cell removal following the manufacturer's recommendations (Dead Cell Removal Kit, Miltenyi Biotec), resuspended in 0.4% BSA in HBSS, and counted in a hemocytometer after trypan blue staining. The resuspended cells were split into two fractions and partitioned and barcoded separately using a 10X Genomics Chromium instrument following the manufacturer's protocol (Chromium Next GEM Single Cell 3ʹ Reagent Kits v3.1 Dual Index, Rev C) and targeting the recovery of 10,000 cells per reaction.

For the snATAC-seq experiment, the cells were dissociated as described before and then incubated on ice for 6 minutes in lysis buffer: 10 mM Tris-HCl pH 7.4, 10 mM NaCl, 3 mM MgCl2, 1% BSA, 0.1% Tween-20, 0.1% IGEPAL CA 630 (Sigma-Aldrich), 0.01% Digitonin (Thermo Fisher Scientific). The nuclei were then washed in lysis buffer without IGEPAL and Digitonin, filtered with a 40 µm Flowmi cell strainer (Scienceware), and resuspended in 15 µl diluted Nuclei Buffer (10X Genomics). The nuclei were then stained with Propidium Iodide and Hoechst 33342 (Sigma-Aldrich) and counted with a hemocytometer using a fluorescence microscope (Leica). The nuclei were then split into two fractions, and partitioned and barcoded separately using a 10X Genomics Chromium instrument following the manufacturer's protocol (Chromium Next GEM Single Cell ATAC Reagent Kits v2 User Guide, Rev B) and targeting the recovery of 2,000 nuclei per reaction.

Following quantification and quality control (LightCycler qPCR, Agilent TapeStation), the libraries were sequenced on an Illumina instrument at Macrogen (Seoul, South Korea) and demultiplexed using the Cell Ranger Fastq pipeline (10X Genomics). Summary statistics for the single-cell libraries are given in table S9.

Analyses of the scRNA-seq data

The two scRNA-seq libraries were pre-processed individually using the count pipeline in Cell Ranger (v7.0.1; 10X Genomics; https://github.com/10XGenomics/cellranger). Briefly, the reads were aligned to the budgerigar genome (assembly bMelUnd1.mat.Z; Ensembl annotation release 108) using the splicing-aware aligner STAR (67). Uniquely mapped reads within the start and end coordinates of each gene were considered for UMI counts, and cells were distinguished from empty droplets using an algorithm based on the EmptyDrops method (109), as implemented in Cell Ranger. After filtering, 3,197 cells for scRNAseq_Lib1 and 3,966 cells for scRNAseq_Lib2 were retrieved (table S9). The two datasets were then normalized by sequencing depth per cell and merged using the Cell Ranger function aggr, enabling batch (i.e., library) correction. After retaining cells with UMI counts > 1,000 and < 25,000, a total of 6,262 cells were included in downstream analyses.

Dimensionality reduction of the gene expression matrix by Principal Components Analysis (PCA, n = 20 PCs) followed by k-means clustering (n = 10 clusters) were performed as implemented in the reanalyze function in Cell Ranger. Second-level clustering and differentiation trajectory analyses were performed using Monocle 3 (70, 71). Briefly, the filtered aggregated libraries were preprocessed (PCA, n = 35 PCs) and dimensionality reduction (UMAP) and clustering (method = louvain) were performed as implemented in Monocle 3. After inspection of the UMAP projection, 2,753 keratinocytes (i.e., cells showing enriched expression of several keratin genes, see below) were selected from the dataset and subjected to second-level clustering (method = louvain, nearest neighbors = 300). A trajectory modeling the relationship between cells as a sequence of gene expression changes was calculated using the learn_graph function with default parameters (minimal branch length = 13). Progress of keratinocyte differentiation along the trajectory was calculated as the distance of each cell from the root node of the graph (i.e., pseudotime), by manually assigning the root at the base of the proliferating keratinocyte population (see below). Independent sub-clustering of keratinocytes was performed with Cell Ranger as described before by increasing k-means clustering resolution (n = 15 clusters) on the full dataset without additional filtering. These analyses grouped peripheral keratinocytes (in the UMAP projection) within a single cluster, which we termed “late differentiating keratinocytes” (see below). Differential gene expression analyses between clusters were conducted by testing the log2 fold-change of each gene’s mean expression in each cluster relative to all other cells in the dataset using an exact negative binomial test, adjusting for false discovery rate with Benjamini-Hochberg correction, and using the publicly available software Loupe Browser (v6.4.0, 10X Genomics).
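The notion of pseudotime as the distance of each cell from a manually chosen root node can be illustrated on a toy graph. This is a simplified sketch, not the Monocle 3 implementation: it computes shortest-path distances with Dijkstra's algorithm on a small hand-built graph, and the node names, edge weights, and function name are all hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# Adjacency list: node -> list of (neighbor, edge length).
Graph = Dict[str, List[Tuple[str, float]]]


def pseudotime(graph: Graph, root: str) -> Dict[str, float]:
    """Shortest-path distance from `root` to every reachable node (Dijkstra)."""
    dist: Dict[str, float] = {root: 0.0}
    heap: List[Tuple[float, str]] = [(0.0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Toy trajectory: a proliferating root that branches into two fates.
toy: Graph = {
    "root": [("a", 1.0)],
    "a": [("root", 1.0), ("b", 2.0), ("c", 0.5)],
    "b": [("a", 2.0)],
    "c": [("a", 0.5)],
}
pt = pseudotime(toy, "root")
```

Cells near the tips of a branch receive the largest pseudotime values, which is why the choice of root (here, the base of the proliferating keratinocyte population) determines the direction of the inferred differentiation.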

The annotation of the budgerigar genome used for single-cell analyses lacked the official gene symbols. The gene symbols reported throughout the text were obtained from the corresponding top blastp hit of the longest budgerigar protein isoform of each gene against the chicken protein database (assembly bGalGal1.mat.broiler.GRCg7b; Ensembl annotation release 108). When unavailable, the gene symbols were obtained from the zebra finch (Taeniopygia guttata) or the common canary (Serinus canaria) protein databases using the same procedure (bTaeGut1_v1.p and SCA1; Ensembl annotation release 108).

Annotation of the scRNA-seq data

Cluster annotation was performed by inspecting the top differentially expressed genes in each cluster and by visual inspection of UMAP and t-SNE plots. Cells were annotated into 10 clusters: (1) erythrocytes, n = 1,803; (2) leukocytes 1, n = 194; (3) leukocytes 2, n = 474; (4) endothelial cells, n = 429; (5) melanocytes, n = 331; (6) mesenchymal cells 1, n = 147; (7) mesenchymal cells 2, n = 124; (8) keratinocytes 1, n = 1,295; (9) keratinocytes 2, n = 1,182; and (10) keratinocytes 3, n = 283. The latter three clusters showed enriched expression of several keratin genes (e.g., keratin 17-like, keratin 5, keratin 13-like, keratin 6A) and were collectively labeled as keratinocytes (fig. S12). Lists of the top differentially expressed genes per cluster and t-SNE plots of selected gene markers supporting cluster annotation are provided in data S3 and fig. S11.

The second-level clustering of keratinocytes obtained from Monocle 3 identified six clusters possibly representing cells progressing on distinct differentiation trajectories along the proximal-distal axis of feather growth (data S4; fig. S12). A subpopulation showed enriched expression of several markers of cell division (S/G2-phase, and M-phase markers; retrieved from the R package Seurat (110)) and likely represents cells of the proliferation zone at the base of the follicle (111). Distal to the proliferative zone is the ramogenic zone, where the feather epithelium invaginates to form barb ridges. One population of basal keratinocytes (COL17A1-positive) showed enrichment in expression of the marginal plate marker Sonic Hedgehog (SHH) (40) and likely represents marginal plate cells flanking each barb ridge. Within each barb ridge, suprabasal keratinocytes organize into the axial and barbule plates. In our dataset, a subpopulation of suprabasal (COL17A1-negative) cells showed enriched expression of several feather keratins (e.g., feather keratin 1, feather beta keratin-like, feather keratin B4-like, and barbule specific keratin 1) and likely represents barbule plate cells. The remaining suprabasal keratinocytes (COL17A1-negative) were positive for NCAM1, a marker gene for marginal and axial plate cells, and likely represent axial plate cells (43). This cell type showed the highest enrichment in ALDH3A2 expression and is known to express PKS (26), the enzyme responsible for psittacofulvin biosynthesis, based on in situ hybridization in growing feathers. However, we detected only marginal expression of PKS in our dataset, possibly due to a low number of transcripts per cell and/or a pattern of expression mostly restricted to cells at very late stages of differentiation (i.e., toward the distal tip of the feather) that might have been lost during cell dissociation.

Several genes showed similar patterns of enriched expression towards the extremities of each trajectory and likely represent markers for cells entering the terminal differentiation program (data S4). Among these are SCEL (a precursor to the cornified envelope of terminally differentiated keratinocytes (41)), EDQM3 (alias NKX2-3, a transcription factor belonging to the epidermal differentiation complex and associated with the regulation of keratinocyte differentiation (42)), the alpha keratins keratin 15-like and keratin 6B, and CMBL (carboxymethylenebutenolidase homolog). These genes were selected as markers to support the annotation of the snATAC-seq data (see below). t-SNE plots of selected gene markers supporting keratinocyte annotation are provided in fig. S12.

Analyses of the snATAC-seq data

The two snATAC-seq libraries were pre-processed individually using the count pipeline in Cell Ranger ATAC (v2.1.0; 10X Genomics; https://github.com/10XGenomics/cellranger-atac) (72). First, the reads were aligned to the budgerigar genome (assembly bMelUnd1.mat.Z; Ensembl annotation release 108) using a method based on the BWA-MEM algorithm, as implemented in Cell Ranger ATAC. Preliminary barcode quality control and filtering were performed using the count pipeline. Briefly, the analysis defines a global set of open chromatin genomic regions (i.e., peaks) from the collection of high-quality aligned read pairs (i.e., fragments) pooled from all barcodes in the library. Next, barcodes showing an enrichment of fragments overlapping called peaks, relative to the rest of the genome, are selected by fitting a mixture model of two negative binomial distributions for signal and background noise and retained as nuclei-associated barcodes. The analyses retrieved 1,479 and 1,306 barcodes in snATACseq_Lib1 and snATACseq_Lib2, respectively. The two datasets were then normalized by sequencing depth per barcode and merged using the Cell Ranger ATAC function aggr. On the aggregated dataset, additional filters were applied using the createArrowFiles function in ArchR (v1.0.2) (112): a genome-wide tile matrix with insertion counts on 500 bp non-overlapping windows was calculated on the combined unfiltered libraries, and barcodes with transcription start site (TSS) enrichment scores less than 3 and those with fewer than 1,000 fragments were excluded. A total of 1,700 nuclei were included in downstream secondary analyses. Normalization and dimensionality reduction of the peak-barcode matrix by Latent Semantic Analysis (LSA; n = 20 PCs) and spherical k-means clustering were performed as implemented in the reanalyze function in Cell Ranger ATAC.

Annotation of the snATAC-seq data

Annotation of the clusters was performed using Loupe Browser (v6.4.0, 10X Genomics) (72). To identify cluster-specific marker genes, the analysis was focused on differential accessibility analyses within called peaks at promoter regions (i.e., peaks that fall 1,000 bp upstream or 100 bp downstream from a gene’s transcription start site) using a Poisson generalized linear model with per cell depth as a covariate. The log2 fold-change of each gene’s promoter accessibility in each cluster was tested relative to all other cells in the dataset after correcting for multiple testing. Based on visual inspection of the data and cross-reference with our scRNA-seq analyses, the nuclei were annotated into nine clusters, as identified by the k-means method: (1) erythrocytes, n = 229; (2) leukocytes, n = 87; (3) endothelial cells, n = 263; (4) melanocytes, n = 70; (5) mesenchymal cells 1, n = 129; (6) mesenchymal cells 2, n = 189; (7) keratinocytes A, n = 288; (8) keratinocytes B, n = 303; and (9) keratinocytes C, n = 142. t-SNE and violin plots of selected gene markers supporting cluster annotation are provided in figs. S13 and S14. Lists of genes with top differentially accessible gene promoters per cluster are provided in data S5. The latter three clusters showed enriched promoter accessibility of several keratin genes in agreement with our scRNA-seq results (e.g., keratin 17-like, keratin 5, keratin 13-like, keratin 6A; figs. S12 and S14), and were collectively labeled as keratinocytes. Keratinocyte cluster C showed enriched accessibility at the promoter of genes previously identified as markers of late differentiating keratinocytes in our scRNA-seq analyses: SCEL, EDQM3, keratin 15-like, keratin 6B, and CMBL (see above). Accordingly, cells belonging to keratinocyte cluster C were annotated as late differentiating keratinocytes (fig. S14).
To further characterize the dataset, keratinocyte cluster C was split manually into two subclusters on the basis of the examination of UMAP/t-SNE plots (keratinocytes C1, n = 74; keratinocytes C2, n = 68) and differential accessibility analyses were performed between the four keratinocyte clusters (i.e., A, B, C1, and C2) (data S6). Keratinocyte cluster A showed enriched accessibility at the promoter of Sonic Hedgehog (SHH), likely representing basal keratinocytes eventually differentiating into marginal plate cells (40). Keratinocyte cluster B showed enriched accessibility at the promoter of keratin-12, possibly representing suprabasal differentiating keratinocytes. Keratinocyte cluster C1 was characterized by enriched accessibility at the promoter of the α-keratin KRT75. In chicken, KRT75 is the causative gene underlying the frizzle feather phenotype and is expressed in barb ridge cells of embryonic and regenerating feathers in the region that will develop into the ramus (113). Finally, keratinocyte cluster C2 showed enriched accessibility at the promoter of feather keratin 1-like and likely represents late differentiating barbule cells. t-SNE plots of selected gene markers supporting cluster annotation are provided in fig. S14.
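The promoter definition used for the differential accessibility analysis (1,000 bp upstream to 100 bp downstream of the transcription start site) depends on gene strand. The sketch below shows one way to express it with half-open coordinates; it assumes that any overlap between a peak and the window counts, which the text does not specify, and the function names are hypothetical.

```python
from typing import Tuple


def promoter_window(
    tss: int, strand: str, upstream: int = 1000, downstream: int = 100
) -> Tuple[int, int]:
    """Half-open [start, end) promoter interval around a TSS, respecting strand."""
    if strand == "+":
        return max(0, tss - upstream), tss + downstream
    if strand == "-":
        # On the minus strand, "upstream" lies at higher coordinates.
        return max(0, tss - downstream), tss + upstream
    raise ValueError(f"unknown strand: {strand!r}")


def peak_in_promoter(peak_start: int, peak_end: int, tss: int, strand: str) -> bool:
    """True if the half-open peak interval overlaps the promoter window (assumed rule)."""
    start, end = promoter_window(tss, strand)
    return peak_start < end and peak_end > start
```

For a plus-strand gene with a TSS at position 5,000, the window spans 4,000 to 5,100; for a minus-strand gene at the same TSS, it spans 4,900 to 6,000.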

ChromBPNet model training and ATAC-seq signal prediction

A convolutional neural network method, ChromBPNet (v0.1.3, https://github.com/kundajelab/chrombpnet), was utilized to explain the relationship between genomic sequence and base-resolution Tn5 transposase cut sites in ATAC-seq. Briefly, ChromBPNet evaluates both Tn5 sequence bias and sequence rules of accessibility in the empirical ATAC-seq data, which possesses both features. During the training step, ChromBPNet corrects the bias by simultaneously passing sequence information through two distinct models: i) the “Frozen Bias Model” that learns to identify the Tn5 sequence bias from the ATAC-seq signal in closed chromatin background regions, and ii) the “TF Model” that comprises the sequence rules of accessibility in open chromatin regions. A combination of these two models is then used to correctly predict sequence accessibility, regressing out the effect of the Tn5 bias from the ATAC-seq profiles. After the training step, downstream interpretations are applied only to the Tn5-bias factorized model, which contains the unbiased sequence rules that explain chromatin accessibility. The developer’s tutorial (https://github.com/kundajelab/chrombpnet) was followed to train/evaluate the model, generate base-pair resolution nucleotide contribution scores, and predict the ATAC-seq signals in the dusky lory genome. ChromBPNet model inputs are 2,114 bp genomic sequences centered on ATAC-seq peaks, GC-matched control background peaks, and pseudo bulk ATAC-seq reads. Autosomal and sex chromosomes (chr1 to chr30, chrW, and chrZ) were split into training, validation, and test sets based on the percentage of nucleotide content in each set in the standard ENCODE cross-validation folds for human chromosomes. The exception was chromosome 13, where the candidate locus is located, which was assigned to the test set in all models. To test the stability of ChromBPNet models, three models were trained with shuffled training, validation, and test sets.
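The chromosome-level split with a fixed held-out chromosome can be sketched as follows. This is a simplification: the split described above matches the nucleotide-content proportions of the ENCODE folds, whereas this sketch assigns chromosomes by count using hypothetical fractions, and it is not part of the ChromBPNet package. The function name and default fractions are assumptions.

```python
import random
from typing import Dict, List


def split_chromosomes(
    chroms: List[str],
    held_out: str = "chr13",   # always placed in the test set
    frac_valid: float = 0.1,   # hypothetical fraction of chromosomes
    frac_test: float = 0.1,    # hypothetical fraction of chromosomes
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Shuffle chromosomes into train/valid/test, forcing `held_out` into test."""
    rng = random.Random(seed)
    pool = [c for c in chroms if c != held_out]
    rng.shuffle(pool)
    n_test = round(len(pool) * frac_test)
    n_valid = max(1, round(len(pool) * frac_valid))
    return {
        "test": [held_out] + pool[:n_test],
        "valid": pool[n_test:n_test + n_valid],
        "train": pool[n_test + n_valid:],
    }


chromosomes = [f"chr{i}" for i in range(1, 31)] + ["chrW", "chrZ"]
# Three shuffled folds, mirroring the three models trained for stability.
folds = [split_chromosomes(chromosomes, seed=s) for s in range(3)]
```

Keeping the chromosome that carries the candidate locus out of training in every fold means that predictions at the locus are always made on sequence the model has not seen.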

The pooled coverage of late differentiating keratinocytes was prepared using subset-bam (v1.1.0), followed by the removal of PCR duplicates using the Picard (v2.26.2) function MarkDuplicates with the following options: --REMOVE_DUPLICATES true, --BARCODE_TAG CR. The resultant pseudo bulk ATAC-seq reads were then used to define open chromatin regions with MACS2 (v2.2.4; https://github.com/macs3-project/MACS) (114) with the following options: --gsize 1.1e9, --nomodel, --nolambda, --shift -75, --extsize 150, --keep-dup all, and --p 0.01. GC-matched background genomic regions were obtained using the ChromBPNet command chrombpnet prep nonpeaks with the stride of 500 bp bins.
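The effect of the --shift -75 and --extsize 150 options can be sketched as follows. With these values, each read is replaced by a 150 bp window centered on its 5' end, which approximates the Tn5 cut site. This is an illustrative reimplementation of the coordinate arithmetic only, not MACS2 code; the function name and the half-open coordinate convention are assumptions.

```python
def shifted_fragment(five_prime, strand, shift=-75, extsize=150):
    """Return the (start, end) interval covered after shifting and extending a read.

    The 5' end is moved by `shift` along the read's own 5'->3' direction
    (a negative value moves it upstream) and then extended by `extsize`
    in the 5'->3' direction.
    """
    if strand == "+":
        start = five_prime + shift
        end = start + extsize
    elif strand == "-":
        # on the minus strand, 5'->3' runs toward smaller coordinates
        end = five_prime - shift
        start = end - extsize
    else:
        raise ValueError("strand must be '+' or '-'")
    return start, end

plus_window = shifted_fragment(1000, "+")
minus_window = shifted_fragment(1000, "-")
```

Under these assumptions both strands give the same window around position 1000, so the pileup is centered on cut sites rather than on fragment midpoints.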

A custom Tn5 bias model was prepared to represent the Tn5 sequence bias in the budgerigar late differentiating keratinocytes. The Tn5 bias model architecture followed ChromBPNet defaults. After the bias model training, sequence subsets with positive contribution scores in the model were clustered and aggregated into contribution score matrices using DeepLIFT (https://github.com/kundajelab/deeplift) and TF-MoDISco (https://github.com/kundajelab/tfmodisco). The resultant contribution score matrices were then compared to the known Tn5 sequence bias motifs and transcription factor motif consensus to examine whether the model learned only Tn5 sequence bias and no other grammar rules, particularly motif-driven rules. In the motif assessment, we confirmed that all the most significantly enriched motifs were the Tn5 sequence bias motifs for both profile and count importance scores. The above analysis used the ChromBPNet command chrombpnet bias pipeline with a bias threshold factor of 0.5.

After the Tn5 bias model training, the bias factorized ChromBPNet model (the combined TF and Frozen models) was trained using the chrombpnet pipeline command with default parameters and model architecture. The model performance was then assessed by comparing counts correlations of observed ATAC-seq read counts to ChromBPNet predictions in each peak region. Each model was also evaluated by the stability of the performance metrics and the stability of the transcription factor motifs generated from contribution scores.

The hypothetical sequence contribution scores in the candidate causal variant region were obtained using the command chrombpnet contribs_bw with the 2,114 bp peak centered on the summit defined by the MACS2 peak calling as described in the previous section. The contribution scores from the three independently trained models were averaged per nucleotide and visualized using the ggseqlogo library (v0.1) (115) in R. The command chrombpnet pred_bw was used to generate the prediction track of the chromatin accessibility of the late differentiating keratinocytes in the genomes of budgerigar, dusky lory and six additional parrot species with chromosome-level genome assemblies available on NCBI: Amazona aestiva (GCA_017639355.1), Amazona ochrocephala (GCA_039720435.1), Ara ararauna (GCA_028858755.1), Lathamus discolor (GCA_037157625.1), Psittacula echo (GCA_963264785.1), and Strigops habroptila (GCF_004027225.2). The target sequence downstream of ALDH3A2 could not be identified in the chromosome-level genome assembly of Myiopsitta monachus possibly due to mis-assembly at the locus. The predicted signals of 2,114 bp peaks contained within the ALDH3A2 locus were averaged with 100 bp bins. The prediction tracks from the three models were averaged using WiggleTools (v1.2.11) (116) and visualized using IGV.
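The two averaging steps described above (per-nucleotide averaging across the three models, and averaging of the predicted signal in 100 bp bins) can be sketched in plain Python. The study used ChromBPNet output tracks and WiggleTools for this; the functions below are hypothetical stand-ins that operate on lists of numbers.

```python
def average_tracks(tracks):
    """Average several equally long per-nucleotide tracks position by position."""
    length = len(tracks[0])
    if any(len(track) != length for track in tracks):
        raise ValueError("all tracks must have the same length")
    return [sum(values) / len(tracks) for values in zip(*tracks)]

def bin_means(signal, bin_size=100):
    """Average a per-nucleotide signal in consecutive bins; the last bin may be shorter."""
    means = []
    for start in range(0, len(signal), bin_size):
        window = signal[start:start + bin_size]
        means.append(sum(window) / len(window))
    return means

# toy example: three model outputs over a 2,114 bp peak
model_tracks = [[1.0] * 2114, [2.0] * 2114, [3.0] * 2114]
mean_track = average_tracks(model_tracks)
binned = bin_means(mean_track)
```

Because 2,114 is not a multiple of 100, this sketch keeps a final partial bin of 14 bp; how the original analysis handled the remainder is not stated.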

De novo motif discovery

To find cell type-specific open chromatin regions (i.e., peaks), two pseudo-replicates were created from the subset snATAC-seq data, consisting of the nine clusters: erythrocytes, leukocytes, endothelial cells, melanocytes, mesenchymal cells 1, mesenchymal cells 2, keratinocytes A, keratinocytes B, keratinocytes C. The replicates were generated with the ArchR (v1.0.2) (117) function addGroupCoverages, and a union of 200 bp peaks was generated using the addReproduciblePeakSet function. A differential accessibility test was performed with the function getMarkerFeatures with the following parameters: normBy = nFrags, bias = c(“TSSEnrichment”, “log10(nFrags)”), and testMethod = wilcoxon. The peaks were filtered with the function getMarkers: cutOff = “FDR <= 0.01 & Log2FC >= 1”. The resultant peak sets were defined as differentially accessible peaks.
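The filtering criterion applied to the differential accessibility results can be written out explicitly. The sketch below is not ArchR code; the record type, field names, and example values are hypothetical and only illustrate the rule FDR <= 0.01 and Log2FC >= 1.

```python
from typing import NamedTuple

class PeakTest(NamedTuple):
    name: str
    fdr: float
    log2fc: float

def filter_markers(results, max_fdr=0.01, min_log2fc=1.0):
    """Keep peaks that pass both the FDR and the fold-change thresholds."""
    return [r for r in results if r.fdr <= max_fdr and r.log2fc >= min_log2fc]

# made-up test results for three peaks
results = [
    PeakTest("peak_a", fdr=0.001, log2fc=2.3),   # passes both thresholds
    PeakTest("peak_b", fdr=0.001, log2fc=0.4),   # fold change too small
    PeakTest("peak_c", fdr=0.20, log2fc=3.0),    # not significant
]
markers = filter_markers(results)
```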

The statistically significant motifs were discovered de novo by HOMER (v4.11) (73) with ~50,000 random genome background regions that match the GC-content distribution of the input sequences. The findMotifsGenome.pl command was run on 6,668 late differentiating keratinocyte-specific 200 bp peaks. The region corresponding to the ATAC peak downstream of ALDH3A2 (chr13:8,487,599-8,488,513) harbored several late differentiating keratinocyte-enriched binding motifs of TFs known to play crucial roles in epithelial development and keratinocyte terminal differentiation (Fig. 5e, table S10): CEBPE, associated with terminal differentiation in human keratinocytes (118); RUNX2, associated with follicle development and control of epidermal thickness in mice (119); TFAP2A, involved in the upregulation of barrier genes in terminally differentiating human keratinocytes (120); GRHL2, a member of the highly conserved Grainy head-like gene family regulating formation and maintenance of the integument across metazoans (121); and p53, involved in the control of apoptosis in mouse hair follicles (122).

Identification of transcription factor binding sites

Potential transcription factor binding sites overlapping the causal variant were identified using the scan_sequences function from the universalmotif library (v1.12.4, https://bioconductor.org/packages/universalmotif) in R (v4.1.0). A window of +/- 40 bp surrounding the alleles of the candidate causal position was scanned with the transcription factor binding models (i.e., position weight matrices, PWMs) retrieved from the HOCOMOCO (v11) (123) database. The nucleotide sequences with positive log-odds scores were defined as transcription factor binding sites, and the difference in PWM score between C and T alleles was calculated (data S2).
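The allele comparison described above can be illustrated with a minimal log-odds scan. This sketch does not use universalmotif or HOCOMOCO: the 4 bp motif, the pseudocount, the uniform background, and the toy sequences are all hypothetical, and only the forward strand is scanned.

```python
import math

def log_odds_matrix(ppm, background=0.25, pseudocount=0.01):
    """Convert position probability rows (dicts over A, C, G, T) to log2-odds scores."""
    return [
        {b: math.log2(((row[b] + pseudocount) / (1 + 4 * pseudocount)) / background)
         for b in "ACGT"}
        for row in ppm
    ]

def best_overlapping_score(seq, pwm, variant_index):
    """Best forward-strand score among motif placements that cover the variant."""
    width = len(pwm)
    first = max(0, variant_index - width + 1)
    last = min(variant_index, len(seq) - width)
    scores = [
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(first, last + 1)
    ]
    return max(scores)

def allele_score_difference(ref_seq, alt_seq, pwm, variant_index):
    """Score difference (reference minus alternative allele) at the variant."""
    return (best_overlapping_score(ref_seq, pwm, variant_index)
            - best_overlapping_score(alt_seq, pwm, variant_index))

# hypothetical motif that prefers the sequence ACCT
toy_ppm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
toy_pwm = log_odds_matrix(toy_ppm)
c_allele = "GGACCTGG"   # variant at index 4 is C
t_allele = "GGACTTGG"   # variant at index 4 is T
difference = allele_score_difference(c_allele, t_allele, toy_pwm, variant_index=4)
```

In this toy case the C allele completes the preferred motif, so the difference is positive; a negative value would indicate a motif favored by the T allele.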

A total of 57 unique motifs for which the normalized PWM score difference between C and T alleles was > |0.3| were subjected to pairwise similarity analysis using the motifSimilarity function from the PWMEnrich package, R (v4.30.0; https://bioconductor.org/packages/PWMEnrich). The pairwise motif similarity matrix was then utilized to create a cluster dendrogram using the ComplexHeatmap package (v2.10.0; https://bioconductor.org/packages/release/bioc/html/ComplexHeatmap.html) with default settings (fig. S16).

Sequence conservation analyses in whole-genome alignments of birds

Evolutionary conservation analyses at the ALDH3A2 locus among bird species were conducted by calculating per-nucleotide phyloP scores (74). A positive phyloP score indicates slower than expected evolution compared to genome-wide expectations. The scores were generated using the halPhyloP (v2.1) wrapper included in the whole genome aligner progressive cactus (75). The phyloP wrapper was run on a previously generated HAL genome alignment file made with assemblies from 363 avian species (76), the substitution model “b10k model_363_macros.mod” available at https://genome-asia.ucsc.edu/, and Melopsittacus undulatus as a reference genome. To evaluate the magnitude of the phyloP scores for the sites of interest relative to the rest of the genome, the top 5th percentile of phyloP scores across all non-exonic regions was calculated using custom Python scripts. To investigate conservation at the locus across parrots, the homologous sequences to a 300 bp fragment around the candidate causal variant in the dusky lory genome were extracted from 98 parrot genomes retrieved from the NCBI genome repository (124) and aligned using BioEdit (v7.2.5). In addition, the homologous sequence of the dusky lory sister species, Pseudeos cardinalis, was obtained by PCR followed by Sanger sequencing from DNA extracted from feather follicles and using primers designed based on the dusky lory reference genome. The final alignment consisted of 100 parrot species. To visualize sequence conservation among parrots in the region, we generated a 25 bp sequence logo flanking the candidate causative mutation using the ggseqlogo library (v0.1).
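The percentile comparison can be sketched as follows. The custom scripts used in the study are not reproduced here; this is a hypothetical stand-in that finds the score separating the top 5% of values from the rest and checks whether a site of interest reaches it.

```python
def top_percent_threshold(scores, top_percent=5):
    """Return the smallest score that still belongs to the top `top_percent` of values."""
    if not scores:
        raise ValueError("scores must not be empty")
    ranked = sorted(scores, reverse=True)
    # number of values in the top fraction, at least one
    n_top = max(1, len(ranked) * top_percent // 100)
    return ranked[n_top - 1]

def in_top_percent(score, scores, top_percent=5):
    """True if `score` reaches the top-percent threshold of the background scores."""
    return score >= top_percent_threshold(scores, top_percent)

# made-up background of non-exonic phyloP scores: 1.0, 2.0, ..., 100.0
background_scores = [float(i) for i in range(1, 101)]
threshold = top_percent_threshold(background_scores)
```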

Yeast strains preparation

The yeast strains used in the experiments were derived from Saccharomyces cerevisiae strain BJ5464-NpgA (WT hereafter) obtained from Prof. Nancy Da Silva at the University of California, Irvine. This strain is constructed from strain BJ5464 (MATa, ura3-52, trp1, leu2Δ1, his3Δ200, pep4::HIS3, prb1Δ1.6R, can1, GAL) and contains a chromosomal integration of a phosphopantetheinyl transferase (NpgA) domain, under the control of the ADH2 promoter, necessary to convert PKS apoenzymes to active holoenzymes for psittacofulvin biosynthesis, after transformation with a PKS expression plasmid (see below) (26, 125). The yeast gene HFD1 (SGD: S000004716), orthologous to the mammalian ALDH3A2 (44), was knocked out from the WT strain by homologous recombination of a KanMX cassette with flanks targeting HFD1 (strain: Δhfd1). The KanMX cassette was amplified from the pBAC690-TEF plasmid (126) using PCR primers with 5’ tails homologous to HFD1 (HFD1_KanMX_F; HFD1_KanMX_R; table S13) and the PCR products were purified using the Monarch PCR & DNA Cleanup Kit (New England Biolabs). Then, 1 ml of an overnight WT strain culture in YPD was resuspended in 10 µl of 100 mM LiAc and the cells were incubated for 15 minutes at 30°C, transformed by incubation for 30 minutes at 30°C in transformation mix (25 µl PCR reaction, 5 µl salmon sperm DNA, 240 µl 60% PEG 3350, 36 µl 1M LiAc, 9 µl water) followed by a 30-minute heat shock at 42°C, resuspended in 200 µl YPD, incubated overnight at RT, and plated on kanamycin selective medium (YPD + kanamycin). The genomic insertion of the KanMX cassette at the HFD1 locus was then confirmed by screening yeast colonies by PCR with specific primers (HFD1_KO_5_F; HFD1_KO_5_R; table S13) and agarose gel electrophoresis.

With the same procedure, a constitutively expressed dusky lory ALDH3A2 construct was inserted (TEF2 promoter, ALDH3A2 CDS, ADH1 terminator), followed by the KanMX cassette, into the HFD1 locus in the WT strain, thereby simultaneously knocking out HFD1 and knocking in the dusky lory ALDH3A2 sequence. The procedure was repeated twice independently with the ALDH3A2 510 amino acid transcript isoform CDS (strain: Δhfd1 + ALDH3A2-510aa) and with the ALDH3A2 488 amino acid transcript isoform CDS (strain: Δhfd1 + ALDH3A2-488aa). Since the two ALDH3A2-expressing strains yielded similar results in the following experiments, we focused our discussion in the main text on the Δhfd1 + ALDH3A2-510aa strain (hereafter and in the main text referred to as Δhfd1 + ALDH3A2). The constructs for transformation were amplified by PCR from plasmids assembled using the NEBuilder® HiFi DNA Assembly Cloning Kit (New England Biolabs), following the manufacturer’s protocol. Briefly, the backbone from the plasmid pBAC690-TEF and the ALDH3A2 510aa transcript isoform CDS were amplified from a custom plasmid (Twist Bioscience) with primers with overlapping 5’ overhangs (ALDH_Vector_F and ALDH_Vector_R; ALDH_510_F and ALDH_510_R/ALDH_488_R; table S13). The PCR products were digested with methylation-sensitive restriction enzyme DpnI (New England Biolabs) to remove any untransformed plasmids from the solution and assembled into a vector as before. The vector was cloned into E. coli competent cells following the NEBuilder® HiFi DNA Assembly Chemical Transformation Protocol (New England Biolabs) and the plasmids were purified with the Monarch Plasmid Miniprep Kit (New England Biolabs). The construct was amplified with PCR primers with 5’ tails homologous to HFD1 (ALDH_HFD1_KI_F; ALDH_HFD1_KI_R; table S13) and used to transform yeast WT cells as before. The genomic insertion of the construct was confirmed by screening yeast colonies by PCR (primers HFD1_KO_5_F and ALDH_Vector_R; table S13) and agarose gel electrophoresis.

PKS expression plasmid construction and transformation

The four yeast strains (WT, Δhfd1, Δhfd1 + ALDH3A2, Δhfd1 + ALDH3A2-488aa) were then transformed with an expression plasmid carrying an inducible chicken polyketide synthase (PKS; LOC420486 [Gallus gallus]) (26) construct (ADH2 promoter, PKS CDS, CYC1 terminator) and the URA3 auxotrophic marker, or with the plasmid lacking the PKS construct as control, resulting in eight strains used for the experiments (see below). We opted for the use of an expression construct carrying the PKS sequence from chicken, since in a previous study (26), yeast strains expressing chicken PKS yielded higher polyketide concentrations compared to yeast strains expressing PKS from budgerigar. Briefly, the expression plasmid was assembled as before using the NEBuilder® HiFi DNA Assembly Cloning Kit (New England Biolabs) from the plasmid pXP842 (127) and two custom plasmids jointly carrying the full PKS CDS (XM_015282063.4) (Twist Bioscience) amplified by PCR with primers with overlapping 5’ overhangs (PKS_Vector_F; PKS_Vector_R and PKS_Fragment1_F; PKS_Fragment1_R and PKS_Fragment2_F; PKS_Fragment2_R; table S13). The assembled vector was then cloned in E. coli and the plasmids were purified with the Monarch Plasmid Miniprep Kit (New England Biolabs). For transformation, each of the four yeast strains was cultured to mid-log phase in 50 ml YPD medium, 1.5 ml of each culture was centrifuged for 3 minutes at 300 x g, the cell pellets were mixed with 3 µl of either plasmid (pXP842-PKS+ or pXP842-PKS-), resuspended in 100 µl of transformation mix (0.2M LiAc, 26.5% PEG3350, 0.1M DTT in Millipore H2O), incubated for 30 minutes at 42°C, and plated on URA-dropout synthetic media.

Yeast cultures

For each experiment, the four yeast strains and respective controls (WT, WT + PKS, Δhfd1, Δhfd1 + PKS, Δhfd1 + ALDH3A2, Δhfd1 + ALDH3A2 + PKS, Δhfd1 + ALDH3A2-488aa, Δhfd1 + ALDH3A2-488aa + PKS) were cultured in 3 ml of URA-dropout media overnight at 30°C to start 200 ml cultures in 10%-dextrose YPD with initial OD600 = 0.02. The cultures were then incubated on an orbital shaker for 72 hours at 30°C. Then, the cells were harvested by centrifugation at 3,000 x g for 5 minutes at 4°C and washed twice with 10 ml of molecular-grade water on ice followed by centrifugation as before. The liquid was discarded, and the yeast pellets were stored at -80°C until processing.
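The volume of overnight culture needed to start a 200 ml culture at OD600 = 0.02 follows from the dilution relation C1 × V1 = C2 × V2. The sketch below assumes that optical density scales linearly with cell concentration over the relevant range; the overnight OD600 of 4.0 is a made-up example value, not a measurement from the study.

```python
def inoculum_volume_ml(overnight_od, target_od=0.02, final_volume_ml=200.0):
    """Volume of overnight culture (ml) that gives target_od in final_volume_ml."""
    if overnight_od <= target_od:
        raise ValueError("overnight culture must be denser than the target")
    return target_od * final_volume_ml / overnight_od

# hypothetical overnight culture measured at OD600 = 4.0
volume = inoculum_volume_ml(4.0)
```

Under this assumption, 1 ml of the 3 ml overnight culture would be sufficient to seed each 200 ml culture.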

Pigment extractions from yeast

The pigment extraction protocol was modified from the procedure described in Cooke et al. (26). For each yeast strain, pigments were extracted from 1 gram of cell pellet (wet weight) in two separate extractions with the following procedure. For each extraction, 0.5 g of pellet was washed once in 1 ml of HPLC-grade methanol on ice in 2 ml tubes and centrifuged for 1 min at 3,000 x g at 4°C. The liquid was discarded, and 1 ml of glass beads and 1 ml of HPLC-grade methanol were added to the tubes. The samples were then homogenized twice for 40 seconds using a tissue lyser at maximum speed, placed on ice for 2 minutes, and homogenized again two times as before. The samples were then centrifuged at 16,000 x g for 10 minutes at 4°C and 500 µl of supernatant from each replicate extraction was mixed and centrifuged again as before. Then, 600 µl of supernatant from each sample was dried under a stream of nitrogen and the pigment residues were resuspended by vortexing for 10 seconds in 500 µl of 2% acetic acid in ethyl-acetate and 500 µl of molecular-grade water. The samples were centrifuged at 16,000 x g for 10 minutes at 4°C until phase separation was complete and 350 µl of the upper phase was transferred to a 1.5 ml tube and dried under a stream of nitrogen. The extracts were then resuspended by vortexing for 10 seconds in 100 µl of mobile phase A (see below) and centrifuged at 16,000 x g for 2 minutes at 4°C. For each sample, 30 µl of the solution was immediately analyzed by HPLC.

HPLC analyses of yeast extracts

The HPLC analyses were performed on an Agilent 1100 series HPLC system with the following components: G1316A, G1315A, G1313A, G1312A, and G1379A. The columns used were a Kinetex 5 µm Phenyl-Hexyl 100 Å, LC Column 250 x 4.6 mm (Phenomenex, Torrance, CA, USA) or a YMC 5 µm Carotenoid HPLC Column 250 x 4.6 mm (P.J. Cobert Associates, Inc., St. Louis, MO, USA) at 40°C with a constant flow rate of 1 ml/min. Mobile phase A was milliQ water with 0.2% formic acid and mobile phase B was methanol. One example gradient condition was isocratic at 5% B for 1 minute, gradient from 5% B to 50% B from 1 to 5 minutes, gradient from 50% B to 100% B from 5 to 12 minutes, isocratic at 100% B from 12 to 22 minutes, gradient from 100% B to 5% B from 22 to 25 minutes, and isocratic at 5% B from 25 to 30 minutes.
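The example gradient above is a piecewise-linear program and can be encoded as a list of breakpoints. The following sketch returns the percentage of mobile phase B at any time during the 30-minute run; the function and variable names are illustrative and are not taken from the instrument software.

```python
# (time in minutes, % mobile phase B) breakpoints taken from the example gradient
GRADIENT = [
    (0.0, 5.0),
    (1.0, 5.0),
    (5.0, 50.0),
    (12.0, 100.0),
    (22.0, 100.0),
    (25.0, 5.0),
    (30.0, 5.0),
]

def percent_b(time_min):
    """Linearly interpolate % B between the surrounding breakpoints."""
    if not GRADIENT[0][0] <= time_min <= GRADIENT[-1][0]:
        raise ValueError("time is outside the 0 to 30 minute run")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= time_min <= t1:
            return b0 + (b1 - b0) * (time_min - t0) / (t1 - t0)

mid_first_ramp = percent_b(3.0)     # halfway through the 5% to 50% ramp
mid_second_ramp = percent_b(8.5)    # halfway through the 50% to 100% ramp
```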

LC/MS analysis of yeast extracts

The yeast extracts prepared for HPLC analysis were dissolved in 80 µL of LC/MS-grade methanol by vortexing (1 min) and then centrifuged at 24,400 × g, for 10 min at 4°C. Supernatants (60 µl) were then pipetted into conical 250 µl glass inserts and applied to UHPLC/HRAM-QTOF. The instrumentation was identical to the analysis of psittacofulvins in feathers.

Each extract was analyzed by three separation methods that differed in stationary phase and mobile phase composition. Method 1 was identical to method 1 of parrot feather analysis (Phenyl-Hexyl column, mobile phase A – 0.2% formic acid, mobile phase B – 100% methanol). Methods 2 and 3 both used an Accucore C30 column (2 µm, 150 x 2.1 mm, Thermo Fisher Scientific, Waltham, MA, USA). The mobile phases consisted of 0.2% formic acid (phase A) combined with either 100% methanol (method 2) or acetonitrile (method 3) as mobile phase B. All other chromatographic parameters (elution gradient, flow rate, column temperature, injection volume) were the same as in method 1 of parrot feather analysis. The yeast pigments were detected by ESI ionization in positive mode with a resolution of > 60,000 and an acquisition rate of 1 Hz. MS2 spectra of selected peaks were measured in purified fractions prepared by manual isolation using methods 2 and 3. The collision energy used for fragmentation was 20 eV. Absorbance measurement parameters were the same as for parrot feather analysis (table S12).

Chemical analyses of parrot feathers

For in situ vibrational spectroscopy, parrot feathers were measured using a confocal Raman microscope (WITec alpha300 RSA, Oxford Instruments). Spectra were treated using WITec Project FIVE Plus software (Oxford Instruments). Statistical analyses of the Raman spectroscopy data were carried out using multivariate procedures based on singular value decomposition (SVD), as detailed previously (34). Psittacofulvin pigments were extracted from the feathers using a pyridine-based protocol. Characteristic absorbance spectra of the pigments were determined using an ultra-high-performance liquid chromatography (UHPLC) system (Dionex Ultimate 3000, Thermo Fisher Scientific), coupled with a photodiode array detector (PDA). The identity of psittacofulvins was confirmed by ultra-high resolution accurate mass (HRAM) quadrupole time-of-flight (Q-TOF) mass spectrometry (IMPACT II, Bruker Daltonik). Data acquisition and processing were performed using oTof Control, HyStar, and DataAnalysis software (Bruker Daltonik). To compare differences in psittacofulvin content among feathers of varying colors, a statistical analysis of our UHPLC-PDA/HRAM-QTOF data was conducted using linear mixed models in R (51).

Genetic mapping in the dusky lory

A draft reference genome sequence of the dusky lory was produced using Pacific Biosciences (PacBio) HiFi sequencing reads. Contig assembly was carried out with Hifiasm (52, 53) and contigs were scaffolded into a pseudochromosome assembly using the homology-based scaffolding algorithm of RagTag (54) and the budgerigar bMelUnd1 genome assembly as reference (GCF_012275295.1). Gene annotation was performed using iterative runs of MAKER (55) following Card et al. (56). Genome-wide polymorphism data for genetic mapping was generated by whole-genome Illumina sequencing of 57 dusky lories (35 red and 22 yellow). The sequencing reads were mapped to the dusky lory draft genome assembly with BWA-MEM (57) and SNP and small indel variants were identified using GATK (58). The software Beagle (59, 60) was used to impute missing genotypes and to phase the genotype data and association analyses were run with GEMMA (61).

Transcriptomics

Illumina strand-specific RNA-seq libraries of regenerating feather follicles of dusky lories were generated from six samples, three red and three yellow. Trimmed reads were mapped to the dusky lory draft reference genome using HISAT2 (62). Differential expression analyses were carried out using DESeq2 (63) and Rsubread (64, 65). Isoform Sequencing (Iso-seq) data using the PacBio Iso-seq protocol were generated for regenerating feather follicles from the same six individuals. The generated HiFi reads were processed using the IsoSeq (https://isoseq.how/) lima tool and were mapped to the dusky lory draft reference genome using Minimap2 (66). The relative expression of the dusky lory red and yellow ALDH3A2 alleles in heterozygous individuals was obtained by counting the occurrence of each transcript type in each individual. RNA-seq data of regenerating feather follicles were generated for eight rosy-faced lovebirds (Agapornis roseicollis) obtained from red forehead feather follicles and three additional body regions expressing green color (i.e., yellow psittacofulvin-containing patches): back feathers, chest feathers, and head feathers. Trimmed reads for each RNA-seq library were aligned to a chromosome-level A. roseicollis genome assembly using STAR (67). RSEM (68) was used to quantify transcript abundances from aligned RNA-seq data, and edgeR (69) was used to normalize count data across samples using the default weighted trimmed mean of M-values (TMM) method. ALDH3A2 expression was further validated by qPCR.

scRNA-seq and snATAC-seq

Single-cell RNA-seq and single-nuclei ATAC-seq experiments were conducted on regenerating feather follicles of budgerigars. Following tissue dissociation, the resuspended cells and nuclei were barcoded using a 10X Genomics Chromium instrument. The libraries were sequenced on an Illumina instrument and demultiplexed using the CellRanger Fastq pipeline (10X Genomics).

The scRNA-seq libraries (n = 2) were pre-processed using the count pipeline in Cell Ranger (10X Genomics). The reads were aligned to the budgerigar genome (assembly bMelUnd1.mat.Z; Ensembl annotation release 108) using a method based on the splicing-aware aligner STAR (67). Dimensionality reduction of the gene expression matrix of the aggregated libraries was performed as implemented in the reanalyze function in Cell Ranger. Second-level clustering and differentiation trajectory analyses of keratinocytes were performed using Monocle 3 (70, 71). Independent sub-clustering of keratinocytes was performed with Cell Ranger by increasing k-means clustering resolution. Differential gene expression analyses between clusters were conducted using an exact negative binomial test using the publicly available software Loupe Browser (10X Genomics). Cluster annotation was performed by inspecting each cluster’s top differentially expressed genes and visual inspection of UMAP and t-SNE plots.

The snATAC-seq libraries (n = 2) were pre-processed using the count pipeline in Cell Ranger ATAC (10X Genomics) (72). The reads were aligned to the budgerigar genome using a method based on the BWA-MEM algorithm implemented in Cell Ranger ATAC. Normalization and dimensionality reduction of the peak-barcode matrix of the aggregated libraries were performed as implemented in the reanalyze function in Cell Ranger. Annotation of the clusters was performed using Loupe Browser. To identify cluster-specific marker genes, the analysis was focused on differential accessibility analyses within called peaks at promoter regions. A convolutional neural network method, ChromBPNet (https://github.com/kundajelab/chrombpnet), was utilized to explain the relationship between genomic sequence and base-resolution Tn5 transposase cut sites in snATAC-seq empirical data and to predict chromatin accessibility of the late differentiating keratinocytes in the dusky lory genome. The cell type-specific enriched transcription factor binding motifs were discovered de novo by HOMER (73).

Identification of transcription factor binding sites

Potential transcription factor binding sites overlapping the causal variant were identified using the scan_sequences function from the universalmotif library (https://bioconductor.org/packages/universalmotif) in R (v4.1.0). Motifs with differential predicted transcription factor binding affinity between dusky lory alleles (C vs. T) were grouped into 10 sub-families of related motifs with the PWMEnrich package, R (v4.30.0); (https://bioconductor.org/packages/PWMEnrich).

Sequence conservation analyses

Evolutionary conservation analyses at the ALDH3A2 locus among birds were conducted by calculating per-nucleotide phyloP scores (74). The scores were generated using the halPhyloP wrapper included in the whole genome aligner progressive cactus (75). The phyloP wrapper was run on a previously generated HAL genome alignment file incorporating sequences from 363 avian species (76). Conservation analyses among parrots were conducted by extracting homologous sequences surrounding the candidate causal variant in the dusky lory from 100 parrot genomes and aligning them using BioEdit (77).

Biochemical assays in yeast

The yeast strains used in the experiments were derived from Saccharomyces cerevisiae strain BJ5464-NpgA. The yeast gene HFD1 (SGD: S000004716), orthologous to the mammalian ALDH3A2 (44), was knocked out from the WT strain by homologous recombination. With the same procedure, a constitutively expressed dusky lory ALDH3A2 construct was inserted into the HFD1 locus in the WT strain, thereby simultaneously knocking out HFD1 and knocking in the dusky lory ALDH3A2 sequence. The resulting yeast strains were then transformed with an expression plasmid carrying an inducible chicken polyketide synthase (PKS; LOC420486 [Gallus gallus]) (26) construct. The yeast strains and respective controls were cultured, and the cells were harvested by centrifugation. The pigment extraction protocol was modified from the procedure described in Cooke et al. (26). The resulting extracts were subjected to HPLC and UHPLC/HRAM-QTOF analyses.

Supplementary Material

Supp. data S1
Supp. data S2
Supp. data S3
Supp. data S4
Supp. data S5
Supp. data S6
Supplemental material

Introduction

Coloration is an important trait in ecological adaptation and communication among animals, particularly birds, which use their diverse plumage for multiple purposes, such as camouflage or social signaling. Parrots, known for their vivid coloration, display a wide gamut of hues including yellows, oranges, reds, and greens. This vibrant palette is primarily due to the deposition of psittacofulvins during feather growth, a unique class of pigments found exclusively in parrots.

Rationale

Psittacofulvins are polyene pigments that are endogenously synthesized to produce bright yellow, orange, and red colors. Combined with blue hues produced by feather nanostructures, yellow psittacofulvins are also essential for producing green colors. While previous studies in domesticated species identified a polyketide synthase required for psittacofulvin biosynthesis, the mechanisms by which parrots diversify their color palette were previously unknown. The present study elucidates how psittacofulvins are biochemically modified to produce the broad spectrum of colors observed in wild species of parrots and identifies the chemical and genetic bases of these color variations.

Results

By combining spectroscopy, chromatography, and mass spectrometry analyses of feathers from various species, we uncovered a common chemical basis for yellow-to-red color variation in parrots. We found that the oxidation state of the psittacofulvin ‘end group’ plays a key role in color shifts, with the tuning of color from yellow to red correlating with the ratio of carboxyl to aldehyde end group in psittacofulvin molecules: red feathers have large amounts of aldehyde psittacofulvins, while yellow and green feathers have higher levels of carboxyl psittacofulvins. To explore the genetic basis of these color differences, we studied the dusky lory, which occurs in two varieties in wild populations: yellow and red. Genetic mapping identified a genomic region associated with color variation, containing a candidate point mutation in a non-coding region downstream of the ALDH3A2 gene, which encodes an enzyme that catalyzes the oxidation of fatty aldehydes to carboxylic acids. Single-cell RNA sequencing and chromatin accessibility assays in regenerating feather follicles confirmed that ALDH3A2 is expressed at higher levels in late differentiating keratinocytes, cells crucial for psittacofulvin metabolism, and reveal that the candidate causal mutation in the dusky lory lies within an open chromatin region specific to these keratinocytes, suggesting that the causal variant affects the activity of an enhancer which controls levels of ALDH3A2 expression in a cell type-specific manner. Transcriptomic analyses in dusky lory and another parrot species (rosy-faced lovebirds) indicate that regenerating feather patches enriched for yellow psittacofulvins have higher levels of ALDH3A2 than patches enriched for red psittacofulvins. Yeast assays confirmed that the ALDH3A2 enzyme is capable of converting red psittacofulvins into yellow pigments as predicted by the genetic results.

Conclusion

This study identifies ALDH3A2 as a key enzyme in the biochemical pathway responsible for color variation in parrots. These results reveal insights into the molecular mechanisms underlying one of the most visually striking adaptations in the natural world and lay the groundwork for future studies aimed at understanding how bright colors evolve in the wild.

The molecular bases of bright color variation in parrots.

Yellow-to-red color variation in parrot feathers is due to differences in the concentration of yellow carboxyl and red aldehyde psittacofulvin pigments. Through a combination of genetic and biochemical techniques, we identify aldehyde dehydrogenase 3 family member A2 (ALDH3A2) as a key enzyme regulating the balance of aldehyde to carboxyl pigments in parrots.

Acknowledgments

We thank Thomas Cooke for his advice on psittacofulvin biosynthetic assays. We thank the following breeders who provided samples and feathers for this study: Carlos Filipe Macedo, Rui Sousa, Gonçalo George, Ricardo Clemente, Roman Vašata, Milan Hřebík from Pilsen Zoo, Magdaléna Žohová from Laguna rescue center, and Ladislav Žoha from KPEP. We thank Ibanidis Lda (Versele-Laga distributor), ZooService Lda (Manitoba distributor), and Papa d’Ovo (Domus Molinari distributor) as sponsoring partners of food, husbandry cages, and accessories supplies. We thank Nancy Da Silva for the yeast strains. We thank Milan Kořínek, Niall Perrins, Robert Hynson and the Macaulay Library at Cornell Lab of Ornithology for permission to use parrot photos. We thank Connie Myers for advice in single-cell wet lab practices. We thank Joana C. Carvalho for producing the graphical abstract. We thank the CCMR staff, Ellen Sai Nam Lo, Tat Sing Ngai, and Mei Ying Wu for animal care and husbandry, and the research computing facilities offered by the Information Technology Services, the University of Hong Kong.

Funding

European Research Council under the European Union’s Horizon 2020 research and innovation program; grant agreement No. 101000504 (MC)

Portuguese Foundation for Science and Technology (FCT, https://www.fct.pt) research fellowships SFRH/BD/147030/2019 and PD/BD/128492/2017 in the scope of the Biodiversity, Genetics and Evolution (BIODIV) PhD program (CIM, PP)

Portuguese Foundation for Science and Technology (FCT, https://www.fct.pt) research contracts CEECINST/00014/2018/CP1512/CT0002, 2020.01405.CEECIND/CP1601/CT0011, and 2020.01494.CEECIND (MC, PA, PMA)

Portuguese National Funds (Transitory Norm contract [DL57/2016/CP1440/CT0006]) (RJL)

Portuguese Foundation for Science and Technology (FCT, https://www.fct.pt); R&D Project in All Scientific Domains 2022.06261.PTDC (RJL)

FWO (Fonds voor Wetenschappelijk Onderzoek Vlaanderen) travel grant (MN)

Universiteit Gent BOF grant (MN)

BAEF (Belgian American Educational Foundation) fellowship (MN)

Seed Fund for Basic Research; University of Hong Kong (SJS)

Research Infrastructure METROFOOD-CZ supported by the Ministry of Education, Youth, and Sports of the Czech Republic; project No. LM2023064 (PMa)

Footnotes

Author contributions

Conceptualization: MC, JCC, PMA

Methodology: JB, JCC, MC, MW, SB, RAr, ESKP, SYWS

Investigation: CIM, GD, JB, MPJN, PA, PMa, PMo, RAf, RAr, SA, SB, SJS, YL, YO, AC, ESKP, SYWS

Visualization: CIM, GD, JB, MPJN, RAr, SB, AC

Experimental procedures on animals: RJL, ESKP, SYWS

Sampling and animal husbandry: JB, PMA, RJL, SGR, UA, ESKP, SYWS

Funding acquisition: JCC, MC

Supervision: JCC, MC

Writing – original draft: JB, JCC, MC, RAr, SB

Writing – review & editing: All authors

Competing interests

RAr, SB, JB, PMA, JCC, and MC are inventors on a pending patent application related to the technology described in this work. The remaining authors declare no conflict of interest.

Data and materials availability

PacBio and Illumina reads are available in NCBI SRA under BioProject PRJNA986688. Gene annotation, the draft reference genome, and the raw Raman and HPLC data are available in a Dryad repository (10.5061/dryad.18931zd5c).

References and Notes

  • 1. Cuthill IC, Allen WL, Arbuckle K, Caspers B, Chaplin G, Hauber ME, Hill GE, Jablonski NG, Jiggins CD, Kelber A, Mappes J, et al. The biology of color. Science. 2017;357:eaan0221. doi: 10.1126/science.aan0221.
  • 2. Svensson PA, Wong BBM. Carotenoid-based signals in behavioural ecology: a review. Behaviour. 2011;148:131–189.
  • 3. Amundsen T. Why are female birds ornamented? Trends Ecol Evol. 2000;15:149–155. doi: 10.1016/s0169-5347(99)01800-5.
  • 4. Hill GE. In: Bird Coloration, Volume 2: Function and Evolution. Hill GE, McGraw KJ, editors. Harvard University Press; Cambridge, MA: 2006. Female choice for ornamental coloration; pp. 137–200.
  • 5. Terrill RS, Shultz AJ. Feather function and the evolution of birds. Biological Reviews. 2023;98:540–566. doi: 10.1111/brv.12918.
  • 6. Mason NA, Bowie RCK. Plumage patterns: Ecological functions, evolutionary origins, and advances in quantification. Auk. 2020;137:ukaa060.
  • 7. Dunn PO, Armenta JK, Whittingham LA. Natural and sexual selection act on different axes of variation in avian plumage color. Sci Adv. 2015;1:e1400155. doi: 10.1126/sciadv.1400155.
  • 8. Fox HM, Vevers G. The Nature of Animal Colors. Macmillan; New York: 1960.
  • 9. Barrett RDH, Laurent S, Mallarino R, Pfeifer SP, Xu CCY, Foll M, Wakamatsu K, Duke-Cohan JS, Jensen JD, Hoekstra HE. Linking a mutation to survival in wild mice. Science. 2019;363:499–504. doi: 10.1126/science.aav3824.
  • 10. Weaver RJ, Koch RE, Hill GE. What maintains signal honesty in animal colour displays used in mate choice? Philosophical Transactions of the Royal Society B: Biological Sciences. 2017;372:20160343. doi: 10.1098/rstb.2016.0343.
  • 11. Nemesio A. Color production and evolution in parrots. International Journal of Ornithology. 2001;4:75–102.
  • 12. Berg ML, Bennett ATD. The evolution of plumage colouration in parrots: a review. Emu - Austral Ornithology. 2010;110:10–20.
  • 13. Arnold KE, Owens IPF, Marshall NJ. Fluorescent Signaling in Parrots. Science. 2002;295:92. doi: 10.1126/science.295.5552.92.
  • 14. Masello JF, Pagnossin ML, Lubjuhn T, Quillfeldt P. Ornamental non-carotenoid red feathers of wild burrowing parrots. Ecol Res. 2004;19:421–432.
  • 15. Masello JF, Quillfeldt P. Body size, body condition and ornamental feathers of Burrowing Parrots: variation between years and sexes, assortative mating and influences on breeding success. Emu - Austral Ornithology. 2003;103:149–161.
  • 16. Burtt EH, Schroeder MR, Smith LA, Sroka JE, McGraw KJ. Colourful parrot feathers resist bacterial degradation. Biol Lett. 2011;7:214–216. doi: 10.1098/rsbl.2010.0716.
  • 17. Carballo L, Delhey K, Valcu M, Kempenaers B. Body size and climate as predictors of plumage colouration and sexual dichromatism in parrots. J Evol Biol. 2020;33:1543–1557. doi: 10.1111/jeb.13690.
  • 18. Heinsohn R, Legge S, Endler JA. Extreme Reversed Sexual Dichromatism in a Bird Without Sex Role Reversal. Science. 2005;309:617–619. doi: 10.1126/science.1112774.
  • 19. Stradi R, Pini E, Celentano G. The chemical structure of the pigments in Ara macao plumage. Comp Biochem Physiol B Biochem Mol Biol. 2001;130:57–63. doi: 10.1016/s1096-4959(01)00402-x.
  • 20. Krukenberg CFW. Die Federfarbstoffe der Psittaciden. Vergleichend-physiologische Studien Reihe 2, Abtlg. 1882;2:29–36.
  • 21. Veronelli M, Zerbi G, Stradi R. In situ resonance Raman spectra of carotenoids in bird’s feathers. Journal of Raman Spectroscopy. 1995;26:683–692.
  • 22. McGraw KJ, Nogare MC. Distribution of unique red feather pigments in parrots. Biol Lett. 2005;1:38–43. doi: 10.1098/rsbl.2004.0269.
  • 23. Toomey MB, Lopes RJ, Araújo PM, Johnson JD, Gazda MA, Afonso S, Mota PG, Koch RE, Hill GE, Corbo JC, Carneiro M. High-density lipoprotein receptor SCARB1 is required for carotenoid coloration in birds. Proceedings of the National Academy of Sciences. 2017;114:5219–5224. doi: 10.1073/pnas.1700751114.
  • 24. Toomey MB, Marques CI, Araújo PM, Huang D, Zhong S, Liu Y, Schreiner GD, Myers CA, Pereira P, Afonso S, Andrade P, et al. A mechanism for red coloration in vertebrates. Current Biology. 2022;32:4201–4214.e12. doi: 10.1016/j.cub.2022.08.013.
  • 25. Gazda MA, Araújo PM, Lopes RJ, Toomey MB, Andrade P, Afonso S, Marques C, Nunes L, Pereira P, Trigo S, Hill GE, et al. A genetic mechanism for sexual dichromatism in birds. Science. 2020;368:1270–1274. doi: 10.1126/science.aba0803.
  • 26. Cooke TF, Fischer CR, Wu P, Jiang TX, Xie KT, Kuo J, Doctorov E, Zehnder A, Khosla C, Chuong CM, Bustamante CD. Genetic Mapping and Biochemical Basis of Yellow Feather Pigmentation in Budgerigars. Cell. 2017;171:427–439. doi: 10.1016/j.cell.2017.08.016.
  • 27. Mundy NI. Colouration Genetics: Pretty Polymorphic Parrots. Current Biology. 2018;28:R113–R114. doi: 10.1016/j.cub.2017.12.045.
  • 28. Ke F, van der Zwan H, Poon ESK, Cloutier A, Van den Abeele D, van der Sluis R, Sin SYW. Convergent evolution of parrot plumage coloration. PNAS Nexus. 2024;3:pgae107. doi: 10.1093/pnasnexus/pgae107.
  • 29. Barnsley JE, Tay EJ, Gordon KC, Thomas DB. Frequency dispersion reveals chromophore diversity and colour-tuning mechanism in parrot feathers. R Soc Open Sci. 2018;5:172010. doi: 10.1098/rsos.172010.
  • 30. Wright TF, Schirtzinger EE, Matsumoto T, Eberhard JR, Graves GR, Sanchez JJ, Capelli S, Müller H, Scharpegge J, Chambers GK, Fleischer RC. A multilocus molecular phylogeny of the parrots (Psittaciformes): Support for a Gondwanan origin during the Cretaceous. Mol Biol Evol. 2008;25:2141–2156. doi: 10.1093/molbev/msn160.
  • 31. Tay EJ, Barnsley JE, Thomas DB, Gordon KC. Elucidating the resonance Raman spectra of psittacofulvins. Spectrochim Acta A Mol Biomol Spectrosc. 2021;262:120146. doi: 10.1016/j.saa.2021.120146.
  • 32. Kane MA, Napoli JL. Quantification of Endogenous Retinoids. 2010:1–54. doi: 10.1007/978-1-60327-325-1_1.
  • 33. Collar N, Boesman PFD. Birds of the World. Cornell Lab of Ornithology; 2020. Dusky Lory (Pseudeos fuscata).
  • 34. Materials and methods are available as supplementary materials.
  • 35. Otsuka M, Matsumoto T, Morimoto R, Arioka S, Omote H, Moriyama Y. A human transporter protein that mediates the final excretion step for toxic organic cations. Proceedings of the National Academy of Sciences. 2005;102:17923–17928. doi: 10.1073/pnas.0506483102.
  • 36. Kelson TL, Secor McVoy JR, Rizzo WB. Human liver fatty aldehyde dehydrogenase: microsomal localization, purification, and biochemical characterization. Biochimica et Biophysica Acta (BBA) - General Subjects. 1997;1335:99–110. doi: 10.1016/s0304-4165(96)00126-2.
  • 37. Chang C-H, Yu M, Wu P, Jiang T-X, Yu H-S, Widelitz RB, Chuong C-M. Sculpting skin appendages out of epidermal layers via temporally and spatially regulated apoptotic events. Journal of Investigative Dermatology. 2004;122:1348–1355. doi: 10.1111/j.0022-202X.2004.22611.x.
  • 38. Rogers GR, Markova NG, De Laurenzi V, Rizzo WB, Compton JG. Genomic organization and expression of the human fatty aldehyde dehydrogenase gene (FALDH). Genomics. 1997;39:127–135. doi: 10.1006/geno.1996.4501.
  • 39. Yue Z, Jiang TX, Wu P, Widelitz RB, Chuong CM. Sprouty/FGF signaling regulates the proximal–distal feather morphology and the size of dermal papillae. Dev Biol. 2012;372:45–54. doi: 10.1016/j.ydbio.2012.09.004.
  • 40. Yu M, Wu P, Widelitz RB, Chuong C-M. The morphogenesis of feathers. Nature. 2002;420:308–312. doi: 10.1038/nature01196.
  • 41. Champliaud M-F, Burgeson RE, Jin W, Baden HP, Olson PF. cDNA Cloning and characterization of Sciellin, a LIM domain protein of the keratinocyte cornified envelope. Journal of Biological Chemistry. 1998;273:31547–31554. doi: 10.1074/jbc.273.47.31547.
  • 42. Strasser B, Mlitz V, Hermann M, Rice RH, Eigenheer RA, Alibardi L, Tschachler E, Eckhart L. Evolutionary origin and diversification of epidermal barrier proteins in Amniotes. Mol Biol Evol. 2014;31:3194–3205. doi: 10.1093/molbev/msu251.
  • 43. Chuong CM, Edelman GM. Expression of cell-adhesion molecules in embryonic induction. II. Morphogenesis of adult feathers. J Cell Biol. 1985;101:1027–1043. doi: 10.1083/jcb.101.3.1027.
  • 44. Nakahara K, Ohkuni A, Kitamura T, Abe K, Naganuma T, Ohno Y, Zoeller RA, Kihara A. The Sjögren-Larsson syndrome gene encodes a hexadecenal dehydrogenase of the sphingosine 1-phosphate degradation pathway. Mol Cell. 2012;46:461–471. doi: 10.1016/j.molcel.2012.04.033.
  • 45. Adamec F, Greco JA, LaFountain AM, Magdaong NM, Fuciman M, Birge RR, Polívka T, Frank HA. Spectroscopic investigation of a brightly colored psittacofulvin pigment from parrot feathers. Chem Phys Lett. 2016;648:195–199.
  • 46. Lin Z, Carney G, Rizzo WB. Genomic organization, expression, and alternate splicing of the mouse fatty aldehyde dehydrogenase gene. Mol Genet Metab. 2000;71:496–505. doi: 10.1006/mgme.2000.3084.
  • 47. Morelli R, Loscalzo R, Stradi R, Bertelli A, Falchi M. Evaluation of the antioxidant activity of new carotenoid-like compounds by electron paramagnetic resonance. Drugs Exp Clin Res. 2003;29:95–100.
  • 48. Mundy NI, Stapley J, Bennison C, Tucker R, Twyman H, Kim KW, Burke T, Birkhead TR, Andersson S, Slate J. Red carotenoid coloration in the zebra finch is controlled by a cytochrome P450 gene cluster. Current Biology. 2016;26:1435–1440. doi: 10.1016/j.cub.2016.04.047.
  • 49. Roulin A. The evolution, maintenance and adaptive function of genetic colour polymorphism in birds. Biol Rev Camb Philos Soc. 2004;79:815–848. doi: 10.1017/s1464793104006487.
  • 50. Pryke SR. Fiery red heads: Female dominance among head color morphs in the Gouldian finch. Behavioral Ecology. 2007;18:621–627.
  • 51. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; Vienna: 2022. https://www.R-project.org.
  • 52. Cheng H, Jarvis ED, Fedrigo O, Koepfli K-P, Urban L, Gemmell NJ, Li H. Haplotype-resolved assembly of diploid genomes without parental data. Nat Biotechnol. 2022;40:1332–1335. doi: 10.1038/s41587-022-01261-x.
  • 53. Cheng H, Concepcion GT, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 2021;18:170–175. doi: 10.1038/s41592-020-01056-5.
  • 54. Alonge M, Lebeigle L, Kirsche M, Jenike K, Ou S, Aganezov S, Wang X, Lippman ZB, Schatz MC, Soyk S. Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing. Genome Biol. 2022;23:258. doi: 10.1186/s13059-022-02823-7.
  • 55. Holt C, Yandell M. MAKER2: An annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics. 2011;12:491. doi: 10.1186/1471-2105-12-491.
  • 56. Card DC, Adams RH, Schield DR, Perry BW, Corbin AB, Pasquesi GIM, Row K, Van Kleeck MJ, Daza JM, Booth W, Montgomery CE, et al. Genomic basis of convergent island phenotypes in Boa constrictors. Genome Biol Evol. 2019;11:3123–3143. doi: 10.1093/gbe/evz226.
  • 57. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
  • 58. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA. The genome analysis toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–1303. doi: 10.1101/gr.107524.110.
  • 59. Browning BL, Zhou Y, Browning SR. A one-penny imputed genome from next-generation reference panels. The American Journal of Human Genetics. 2018;103:338–348. doi: 10.1016/j.ajhg.2018.07.015.
  • 60. Browning BL, Tian X, Zhou Y, Browning SR. Fast two-stage phasing of large-scale sequence data. The American Journal of Human Genetics. 2021;108:1880–1890. doi: 10.1016/j.ajhg.2021.08.005.
  • 61. Zhou X, Stephens M. Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 2012;44:821–824. doi: 10.1038/ng.2310.
  • 62. Kim D, Langmead B, Salzberg SL. HISAT: A fast spliced aligner with low memory requirements. Nat Methods. 2015;12:357–360. doi: 10.1038/nmeth.3317.
  • 63. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi: 10.1186/s13059-014-0550-8.
  • 64. Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014;30:923–930. doi: 10.1093/bioinformatics/btt656.
  • 65. Liao Y, Smyth GK, Shi W. The R package Rsubread is easier, faster, cheaper and better for alignment and quantification of RNA sequencing reads. Nucleic Acids Res. 2019;47:e47. doi: 10.1093/nar/gkz114.
  • 66. Li H. Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–3100. doi: 10.1093/bioinformatics/bty191.
  • 67. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
  • 68. Li B, Dewey CN. RSEM: Accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011;12:323. doi: 10.1186/1471-2105-12-323.
  • 69. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–140. doi: 10.1093/bioinformatics/btp616.
  • 70. Cao J, Spielmann M, Qiu X, Huang X, Ibrahim DM, Hill AJ, Zhang F, Mundlos S, Christiansen L, Steemers FJ, Trapnell C, et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature. 2019;566:496–502. doi: 10.1038/s41586-019-0969-x.
  • 71. Trapnell C, Cacchiarelli D, Grimsby J, Pokharel P, Li S, Morse M, Lennon NJ, Livak KJ, Mikkelsen TS, Rinn JL. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol. 2014;32:381–386. doi: 10.1038/nbt.2859.
  • 72. Satpathy AT, Granja JM, Yost KE, Qi Y, Meschi F, McDermott GP, Olsen BN, Mumbach MR, Pierce SE, Corces MR, Shah P, et al. Massively parallel single-cell chromatin landscapes of human immune cell development and intratumoral T cell exhaustion. Nat Biotechnol. 2019;37:925–936. doi: 10.1038/s41587-019-0206-z.
  • 73. Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, Singh H, Glass CK. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell. 2010;38:576–589. doi: 10.1016/j.molcel.2010.05.004.
  • 74. Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 2010;20:110–121. doi: 10.1101/gr.097857.109.
  • 75. Armstrong J, Hickey G, Diekhans M, Fiddes IT, Novak AM, Deran A, Fang Q, Xie D, Feng S, Stiller J, Genereux D, et al. Progressive Cactus is a multiple-genome aligner for the thousand-genome era. Nature. 2020;587:246–251. doi: 10.1038/s41586-020-2871-y.
  • 76. Jarvis ED, Mirarab S, Aberer AJ, Li B, Houde P, Li C, Ho SYW, Faircloth BC, Nabholz B, Howard JT, Suh A, et al. Phylogenomic analyses data of the avian phylogenomics project. Gigascience. 2015;4:4. doi: 10.1186/s13742-014-0038-1.
  • 77. Hall TA. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser. 1999;41:95–98.
  • 78. Jetz W, Thomas GH, Joy JB, Hartmann K, Mooers AO. The global diversity of birds in space and time. Nature. 2012;491:444–448. doi: 10.1038/nature11631.
  • 79. Palacký J, Mojzeš P, Bok J. SVD-based method for intensity normalization, background correction and solvent subtraction in Raman spectroscopy exploiting the properties of water stretching vibrations. Journal of Raman Spectroscopy. 2011;42:1528–1539.
  • 80. Kuznetsova A, Brockhoff PB, Christensen RHB. lmerTest Package: Tests in linear mixed effects models. J Stat Softw. 2017;82:1–26.
  • 81. Enbody ED, Sprehn CG, Abzhanov A, Bi H, Dobreva MP, Osborne OG, Rubin C-J, Grant PR, Grant BR, Andersson L. A multispecies BCO2 beak color polymorphism in the Darwin’s finch radiation. Current Biology. 2021;31:5597–5604.e7. doi: 10.1016/j.cub.2021.09.085.
  • 82. Morgulis A, Gertz EM, Schäffer AA, Agarwala R. WindowMasker: window-based masker for sequenced genomes. Bioinformatics. 2006;22:134–141. doi: 10.1093/bioinformatics/bti774.
  • 83. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, Chen Z, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29:644–652. doi: 10.1038/nbt.1883.
  • 84. Korf I. Gene finding in novel genomes. BMC Bioinformatics. 2004;5:59. doi: 10.1186/1471-2105-5-59.
  • 85. Stanke M, Waack S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics. 2003;19:ii215–ii225. doi: 10.1093/bioinformatics/btg1080.
  • 86. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212. doi: 10.1093/bioinformatics/btv351.
  • 87. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
  • 88. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, et al. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–2158. doi: 10.1093/bioinformatics/btr330.
  • 89. Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin). 2012;6:80–92. doi: 10.4161/fly.19695.
  • 90. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, Sham PC. PLINK: A tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics. 2007;81:559–575. doi: 10.1086/519795.
  • 91. Turner SD. qqman: an R package for visualizing GWAS results using Q-Q and manhattan plots. J Open Source Softw. 2018;3:731.
  • 92. Rausch T, Zichner T, Schlattl A, Stütz AM, Benes V, Korbel JO. DELLY: Structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics. 2012;28:i333–i339. doi: 10.1093/bioinformatics/bts378.
  • 93. Layer RM, Chiang C, Quinlan AR, Hall IM. LUMPY: A probabilistic framework for structural variant discovery. Genome Biol. 2014;15:R84. doi: 10.1186/gb-2014-15-6-r84.
  • 94. Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP. Integrative genomics viewer. Nat Biotechnol. 2011;29:24–26. doi: 10.1038/nbt.1754.
  • 95. Noe L, Kucherov G. YASS: enhancing the sensitivity of DNA similarity search. Nucleic Acids Res. 2005;33:W540–W543. doi: 10.1093/nar/gki478.
  • 96. Gautier M, Vitalis R. rehh: an R package to detect footprints of selection in genome-wide SNP data from haplotype structure. Bioinformatics. 2012;28:1176–1177. doi: 10.1093/bioinformatics/bts115.
  • 97. Platt A, Pivirotto A, Knoblauch J, Hey J. An estimator of first coalescent time reveals selection on young variants and large heterogeneity in rare allele ages among human populations. PLoS Genet. 2019;15:e1008340. doi: 10.1371/journal.pgen.1008340.
  • 98. Bergeron LA, Besenbacher S, Zheng J, Li P, Bertelsen MF, Quintard B, Hoffman JI, Li Z, Leger J, Shao C, Stiller J, et al. Evolution of the germline mutation rate across vertebrates. Nature. 2023;615:285–291. doi: 10.1038/s41586-023-05752-y.
  • 99. Backström N, Forstmeier W, Schielzeth H, Mellenius H, Nam K, Bolund E, Webster MT, Öst T, Schneider M, Kempenaers B, Ellegren H. The recombination landscape of the zebra finch Taeniopygia guttata genome. Genome Res. 2010;20:485–495. doi: 10.1101/gr.101410.109.
  • 100. Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G, Shlien A, et al. A high-resolution recombination map of the human genome. Nat Genet. 2002;31:241–247. doi: 10.1038/ng917.
  • 101. Groenen MAM, Wahlberg P, Foglio M, Cheng HH, Megens H-J, Crooijmans RPMA, Besnier F, Lathrop M, Muir WM, Wong GK-S, Gut I, et al. A high-density SNP-based linkage map of the chicken genome reveals sequence features correlated with recombination rate. Genome Res. 2009;19:510–519. doi: 10.1101/gr.086538.108.
  • 102. Watterson GA. On the number of segregating sites in genetical models without recombination. Theor Popul Biol. 1975;7:256–276. doi: 10.1016/0040-5809(75)90020-9.
  • 103. Korneliussen TS, Albrechtsen A, Nielsen R. ANGSD: Analysis of Next Generation Sequencing Data. BMC Bioinformatics. 2014;15:356. doi: 10.1186/s12859-014-0356-4.
  • 104. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 2011;17:10–12.
  • 105. Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34:i884–i890. doi: 10.1093/bioinformatics/bty560.
  • 106. Kopylova E, Noé L, Touzet H. SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data. Bioinformatics. 2012;28:3211–3217. doi: 10.1093/bioinformatics/bts611.
  • 107. Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glöckner FO. The SILVA ribosomal RNA gene database project: Improved data processing and web-based tools. Nucleic Acids Res. 2013;41:D590–D596. doi: 10.1093/nar/gks1219.
  • 108. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43:e47. doi: 10.1093/nar/gkv007.
  • 109. Lun ATL, Riesenfeld S, Andrews T, Dao TP, Gomes T, Marioni JC. EmptyDrops: distinguishing cells from empty droplets in droplet-based single-cell RNA sequencing data. Genome Biol. 2019;20:63. doi: 10.1186/s13059-019-1662-y.
  • 110. Satija R, Farrell JA, Gennert D, Schier AF, Regev A. Spatial reconstruction of single-cell gene expression data. Nat Biotechnol. 2015;33:495–502. doi: 10.1038/nbt.3192.
  • 111. Chen C-F, Foley J, Tang P-C, Li A, Jiang TX, Wu P, Widelitz RB, Chuong CM. Development, regeneration, and evolution of feathers. Annu Rev Anim Biosci. 2015;3:169–195. doi: 10.1146/annurev-animal-022513-114127.
  • 112. Granja JM, Corces MR, Pierce SE, Bagdatli ST, Choudhry H, Chang HY, Greenleaf WJ. ArchR is a scalable software package for integrative single-cell chromatin accessibility analysis. Nat Genet. 2021;53:403–411. doi: 10.1038/s41588-021-00790-6.
  • 113. Ng CS, Wu P, Foley J, Foley A, McDonald M-L, Juan W-T, Huang C-J, Lai Y-T, Lo W-S, Chen C-F, Leal SM, et al. The chicken frizzle feather is due to an α-Keratin (KRT75) mutation that causes a defective rachis. PLoS Genet. 2012;8:e1002748. doi: 10.1371/journal.pgen.1002748.
  • 114. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, Liu XS. Model-based Analysis of ChIP-Seq (MACS). Genome Biol. 2008;9:R137. doi: 10.1186/gb-2008-9-9-r137.
  • 115. Wagih O. ggseqlogo: a versatile R package for drawing sequence logos. Bioinformatics. 2017;33:3645–3647. doi: 10.1093/bioinformatics/btx469.
  • 116. Zerbino DR, Johnson N, Juettemann T, Wilder SP, Flicek P. WiggleTools: parallel processing of large collections of genome-wide datasets for visualization and statistical analysis. Bioinformatics. 2014;30:1008–1009. doi: 10.1093/bioinformatics/btt737.
  • 117. Granja JM, Corces MR, Pierce SE, Bagdatli ST, Choudhry H, Chang HY, Greenleaf WJ. Author Correction: ArchR is a scalable software package for integrative single-cell chromatin accessibility analysis. Nat Genet. 2021;53:935. doi: 10.1038/s41588-021-00790-6.
  • 118. Solé-Boldo L, Raddatz G, Gutekunst J, Gilliam O, Bormann F, Liberio MS, Hasche D, Antonopoulos W, Mallm J, Lonsdorf AS, Rodríguez-Paredes M, et al. Differentiation-related epigenomic changes define clinically distinct keratinocyte cancer subclasses. Mol Syst Biol. 2022;18:e11073. doi: 10.15252/msb.202211073.
  • 119. Glotzer DJ, Zelzer E, Olsen BR. Impaired skin and hair follicle development in Runx2 deficient mice. Dev Biol. 2008;315:459–473. doi: 10.1016/j.ydbio.2008.01.005.
  • 120. Smits JPH, Qu J, Pardow F, van den Brink NJM, Rodijk-Olthuis D, van Vlijmen-Willems IMJJ, van Heeringen SJ, Zeeuwen PLJM, Schalkwijk J, Zhou H, van den Bogaard EH. The aryl hydrocarbon receptor regulates epidermal differentiation through transient activation of TFAP2A. Journal of Investigative Dermatology. 2024. doi: 10.1016/j.jid.2024.01.030. In press.
  • 121. Boglev Y, Wilanowski T, Caddy J, Parekh V, Auden A, Darido C, Hislop NR, Cangkrama M, Ting SB, Jane SM. The unique and cooperative roles of the Grainy head-like transcription factors in epidermal development reflect unexpected target gene specificity. Dev Biol. 2011;349:512–522. doi: 10.1016/j.ydbio.2010.11.011.
  • 122. Botchkarev VA, Komarova EA, Siebenhaar F, Botchkareva NV, Sharov AA, Komarov PG, Maurer M, Gudkov AV, Gilchrest BA. p53 Involvement in the control of murine hair follicle regression. Am J Pathol. 2001;158:1913–1919. doi: 10.1016/S0002-9440(10)64659-7.
  • 123. Kulakovskiy IV, Vorontsov IE, Yevshin IS, Sharipov RN, Fedorova AD, Rumynskiy EI, Medvedeva YA, Magana-Mora A, Bajic VB, Papatsenko DA, Kolpakov FA, et al. HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis. Nucleic Acids Res. 2018;46:D252–D259. doi: 10.1093/nar/gkx1106.
  • 124. Hains T, Pirro S, O’Neill K, Valez J, Speed N, Clubb S, Oleksyk T, Bates J, Hackett S. The complete genome sequences of 94 species of parrots (Psittaciformes, Aves). Biodiversity Genomes. 2022. doi: 10.56179/001c.40338.
  • 125. Ma SM, Li JW-H, Choi JW, Zhou H, Lee KKM, Moorthie VA, Xie X, Kealey JT, Da Silva NA, Vederas JC, Tang Y. Complete reconstitution of a highly reducing iterative polyketide synthase. Science. 2009;326:589–592. doi: 10.1126/science.1175602.
  • 126. Roy B, Granas D, Bragg F, Cher JAY, White MA, Stormo GD. Autoregulation of yeast ribosomal proteins discovered by efficient search for feedback regulation. Commun Biol. 2020;3:761. doi: 10.1038/s42003-020-01494-z.
  • 127. Shen MWY, Fang F, Sandmeyer S, Da Silva NA. Development and characterization of a vector set with regulated promoters for systematic metabolic engineering in Saccharomyces cerevisiae. Yeast. 2012;29:495–503. doi: 10.1002/yea.2930.
  • 128. Ji L, Guo W. Single-cell RNA sequencing highlights the roles of C1QB and NKG7 in the pancreatic islet immune microenvironment in type 1 diabetes mellitus. Pharmacol Res. 2023;187:106588. doi: 10.1016/j.phrs.2022.106588.
  • 129. Roh HC, Kumari M, Taleb S, Tenen D, Jacobs C, Lyubetskaya A, Tsai LT-Y, Rosen ED. Adipocytes fail to maintain cellular identity during obesity due to reduced PPARγ activity and elevated TGFβ-SMAD signaling. Mol Metab. 2020;42:101086. doi: 10.1016/j.molmet.2020.101086.
  • 130.Nishimura M, Nishie W, Shirafuji Y, Shinkuma S, Natsuga K, Nakamura H, Sawamura D, Iwatsuki K, Shimizu H. Extracellular cleavage of collagen XVII is essential for correct cutaneous basement membrane formation. Hum Mol Genet. 2016;25:328–339. doi: 10.1093/hmg/ddv478. [DOI] [PubMed] [Google Scholar]
  • 131.Cohen E, Johnson C, Redmond CJ, Nair RR, Coulombe PA. Revisiting the significance of keratin expression in complex epithelia. J Cell Sci. 2022;135 doi: 10.1242/jcs.260594. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 132.Wegner J, Loser K, Apsite G, Nischt R, Eckes B, Krieg T, Werner S, Sorokin L. Laminin α5 in the keratinocyte basement membrane is required for epidermal–dermal intercommunication. Matrix Biology. 2016;56:24–41. doi: 10.1016/j.matbio.2016.05.001. [DOI] [PubMed] [Google Scholar]
  • 133.Wu P, Yan J, Lai Y-C, Ng CS, Li A, Jiang X, Elsey RM, Widelitz R, Bajpai R, Li W-H, Chuong C-M. Multiple regulatory modules are required for scale-to-feather conversion. Mol Biol Evol. 2018;35:417–430. doi: 10.1093/molbev/msx295. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 134.Ting-Berreth SA, Chuong C-M. Sonic hedgehog in feather morphogenesis: Induction of mesenchymal condensation and association with cell death. Developmental Dynamics. 1996;207:157–170. doi: 10.1002/(SICI)1097-0177(199610)207:2<157::AID-AJA4>3.0.CO;2-G.
  • 135.Tirosh I, Izar B, Prakadan SM, Wadsworth MH, Treacy D, Trombetta JJ, Rotem A, Rodman C, Lian C, Murphy G, Fallahi-Sichani M, et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science. 2016;352:189–196. doi: 10.1126/science.aad0501.
  • 136.Kowata K, Nakaoka M, Nishio K, Fukao A, Satoh A, Ogoshi M, Takahashi S, Tsudzuki M, Takeuchi S. Identification of a feather β-keratin gene exclusively expressed in pennaceous barbule cells of contour feathers in chicken. Gene. 2014;542:23–28. doi: 10.1016/j.gene.2014.03.027.
  • 137.Grikscheit K, Grosse R. Formins at the Junction. Trends Biochem Sci. 2016;41:148–159. doi: 10.1016/j.tibs.2015.12.002.
  • 138.Irvine AD, Corden LD, Swensson O, Swensson B, Moore JE, Frazer DG, Smith FJD, Knowlton RG, Christophers E, Rochels R, Uitto J, et al. Mutations in cornea-specific keratin K3 or K12 genes cause Meesmann’s corneal dystrophy. Nat Genet. 1997;16:184–187. doi: 10.1038/ng0697-184.
  • 139.Foitzik K, Krause K, Nixon AJ, Ford CA, Ohnemus U, Pearson AJ, Paus R. Prolactin and its receptor are expressed in murine hair follicle epithelium, show hair cycle-dependent expression, and induce catagen. Am J Pathol. 2003;162:1611–1621. doi: 10.1016/S0002-9440(10)64295-2.

Associated Data

Supplementary Materials

Supp. data S1
Supp. data S2
Supp. data S3
Supp. data S4
Supp. data S5
Supp. data S6
Supplemental material

Data Availability Statement

PacBio and Illumina reads are available in the NCBI SRA under BioProject PRJNA986688. Gene annotation, the draft reference genome, and the raw Raman and HPLC data are available in a Dryad repository (doi: 10.5061/dryad.18931zd5c).
