Genome Biology and Evolution. 2025 Feb 14;17(3):evaf024. doi: 10.1093/gbe/evaf024

A Broad Genome Survey Reveals Widespread Presence of Secretoglobin Genes in Squamate and Archosaur Reptiles that Flowered into Diversity in Mammals

Robert C Karn 1,2,b, Christina M Laukaitis 3,4,5
Editor: Federico Hoffmann
PMCID: PMC11884772  PMID: 39949088

Abstract

Secretoglobins (SCGBs) are a superfamily of small, dimeric, cytokine-like proteins found originally in the reproductive tracts and airways of mammals. Most SCGB research has focused on respiratory diseases in humans and laboratory animal models but knowledge of their biological functions is sparse. We report here a broad survey of Scgbs, the genes that encode SCGBs, in animal genomes. We tested the view that they are uniquely mammalian in origin and distribution, hoping that understanding their distribution would shed light on their evolutionary history and perhaps point to putative biological functions. Rather than being uniquely mammalian, we found many different SCGBs in turtles, crocodilians, lizards, and birds, suggesting they existed in the Carboniferous Period (∼320 MYA) when the sauropsids evolved in the amniote lineage. We identified no SCGBs in amphibians or fishes, suggesting that this characteristic originated in an amniote ancestor. Amniotes include sauropsid and synapsid lineages, and three subfamilies of SCGBs (SCGB2A, SCGB3A, and SCGB1C) are found in both sauropsid and synapsid lineages. Uteroglobin (SCGB1A), the first identified SCGB protein, is uniquely mammalian, having appeared in monotremes. The SCGB subfamilies including androgen-binding proteins (SCGB1B and SCGB2B) are first seen in metatherians. This complex distribution suggests that there is an as-yet-undiscovered basic function of SCGBs shared by all amniotes.

Keywords: secretoglobin, comparative genomics, amniotes, evolutionary history


Significance

This work is significant because it shows that secretoglobins (SCGBs) exist in animals other than mammals. We establish that the evolutionary history of SCGBs began by the Carboniferous Period (∼320 MYA). Based on its evolutionary history, we propose that there is a basic biological function of SCGBs shared by all amniotes.

Introduction

The secretoglobin (SCGB) protein family was named 25 years ago to recognize structural similarities between a variety of small, soluble, cytokine-like proteins related to uteroglobin (UG; Mukherjee and Chilton 2000). At that time, it was already recognized that none of the SCGB proteins under study had a clearly defined function, a problem that continues, with only a few exceptions, to the present day (Mukherjee and Chilton 2000; Chung et al. 2017). Moreover, most of the work on family members has focused on understanding the connections with human disease and thus was conducted in common laboratory animal models, e.g. rabbits, mice, and rats, leading to an incomplete and myopic view of their potential functions. At the close of the meeting where the SCGB protein family was named, Dorothy Gail said, “A most important goal is clarifying the biological role(s) of UG in normal human tissues and organs and in human disease. Despite the fact that the protein is 50-70% conserved among species and binds to other molecules such as retinoids, phospholipids, and calcium, its biological function remains to be defined. A deeper understanding of taxonomy, structural modifications, biochemistry, and tissue specificity of all UG family members is needed” (Gail 2000).

Here, we address the lack of taxonomic diversity in our knowledge of SCGBs by surveying nonmammalian animal genomes for the presence of Scgb genes, as well as ensuring that we have included representatives of all four mammalian superorders. In so doing, it is our hope that comparisons of the kinds of SCGBs we found in various taxa and their evolutionary niches will eventually reveal clues to the potential functions of SCGBs in many organisms.

SCGBs are defined by their protein structures (globins that do not contain heme) and the fact that they are secreted in their dimeric form. The primary structure of SCGB protein monomers is a single polypeptide of nearly 100 amino acids in length. Their tertiary structure is a four-helix bundle in a boomerang configuration called the UG fold (Callebaut et al. 2000). They vary in quaternary structure. The best known of these, UG, has a single gene encoding a monomer that associates to form homodimers. This contrasts with other SCGBs that are encoded by two genes, one for each of the monomers making up the heterodimer. Thus, the structure of the secreted UG is a homodimer stabilized by disulfide bridging in an antiparallel configuration (Callebaut et al. 2000), while the other members of this superfamily that have been studied in this respect are disulfide-bridged heterodimers composed of monomers encoded by two different genes. RYD5 is a poorly understood SCGB that may also exist as a monomer or homodimer.

In addition to UG, other SCGBs also bear names conferred on them before the advent of the SCGB nomenclature, e.g. androgen-binding protein (ABP; ABPA and ABPBG), Lipophilin (LIPPA and LIPPB), and Mammaglobin (MGBA and MGBB) among others with acronyms varying in the literature.

While many Scgb genes are expressed in secretory epithelial tissues, no clear function is known for any. Many are dysregulated in disease states and have thus been studied in a variety of medical settings, including lung and respiratory tract disease, kidney disease, reproduction, inflammation, and cancer. With the intention of identifying UG function, two laboratories produced targeted disruptions of different regions of the gene encoding UG in mouse (Stripp et al. 1996; Zhang et al. 1997; reviewed in Mukherjee et al. 2007; Chung et al. 2017). The results of the two studies differed substantially in terms of the overall health of the resulting Ug−/− genotype animals and the effects on specific organs. That result emphasizes the importance of evaluating knockout progeny for a variety of potentially subtle manifestations; alternatively, the presence of numerous closely related genes with similar functions may have allowed compensation by another gene or genes that are upregulated as the result of loss of the target gene (Chen et al. 2010; Smith et al. 2014).

Previous studies of ABP (a heterodimer composed of ABPA [SCGB1B] and ABPBG [SCGB2B] monomers) have focused exclusively on mammals, especially on rodents. In the mouse, salivary ABPs mediate assortative mate selection based on subspecies recognition that potentially limits gene exchange between subspecies where they meet (Laukaitis et al. 1997; Talley et al. 2001). There is also evidence that ABP constitutes a system of incipient reinforcement across the European hybrid zone where house mouse subspecies make secondary contact (Bímová et al. 2005; Laukaitis et al. 2008; Laukaitis and Karn 2012; Karn and Laukaitis 2014). This pheromonal function makes ABP unique among SCGBs in that there is a relatively well-understood function, at least for a few of the many mouse Abpa and Abpbg genes. Whether Abp genes in other species share this role is not known.

During the nearly 25 years since the SCGB superfamily was named, this field of study has contracted dramatically with many of the earlier researchers retiring or expiring (Karn et al. 2021). Therefore, our focus in this comparative genomics study was to take a longer view of this field, asking what animals in addition to mammals might have Scgb genes. We extended from SCGBs already reported in various medically relevant primates and rodents (humans, mice, and rats) to seek new SCGB-like proteins in additional mammalian and nonmammalian taxa. What might we learn from establishing an evolutionary history of genes that encode these proteins in mammals, their early relatives and other nonmammalian organisms that might have some version of them? Would a broad enough comparison of the taxa that have them with those that apparently do not give us clues to the biological functions of one or more of them?

Our primary aim in this project was to identify Scgb genes in the widest-possible variety of animals to answer the following key questions:

  1. Were SCGBs mammalian novelties or were SCGBs present at more basal nodes?

  2. Are nonmammalian SCGBs similar to or different from those described in mammals?

  3. Are there any Scgb genes confined to mammals or to nonmammals?

  4. Can we infer an evolutionary history of the SCGBs?

Results

We interrogated genomes of mammalian and nonmammalian animals for Scgb genes, including larger genome groups (e.g. vertebrates [taxid: 7742] and individual phyla of invertebrates [e.g. arthropoda; taxid: 6656]), as well as smaller genome groups (e.g. birds [taxid: 8782] and turtles [taxid: 8459]), with psi-BLAST searches on the NCBI website and BLAT searches on the UCSC Genome Browser. Scgb genes (BLAT) and/or SCGB proteins (psi-BLAST) of three mammals of superorder Euarchontoglires: human (Homo sapiens), mouse (Mus musculus), and rat (Rattus norvegicus) were used as queries and then searches were repeated using newly identified Scgb/SCGB sequences as queries. We manually curated all identified gene sequences by (i) delineating the three Scgb exons and manually searching for a frequently missed third exon; (ii) translating the predicted gene into a protein sequence to ensure correct frame usage and likely gene/pseudogene status; (iii) creating a protein phylogeny with the translated protein sequence (Fig. 1) to correct the naming applied from automated gene-finding algorithms; and (iv) interrogating each protein for the presence of core SCGB protein features (Fig. 2). We also included as queries additional SCGB sequences found in mammals from the five major A to E SCGB subfamilies (Ni et al. 2000; Laukaitis et al. 2003; Laukaitis and Karn 2005).
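Curation step (ii), translating a predicted gene to confirm frame usage and gene/pseudogene status, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, the exon strings are toy sequences rather than a real Scgb gene, and the exons are assumed to contain coding nucleotides only. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds):
    """Translate a coding sequence codon by codon; unknown codons become 'X'."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def check_spliced_exons(exons):
    """Join the coding parts of the exons and report simple frame diagnostics."""
    cds = "".join(exons).upper()
    protein = translate(cds)
    return {
        "protein": protein,
        "length_multiple_of_3": len(cds) % 3 == 0,
        "starts_with_met": protein.startswith("M"),
        "ends_with_stop": protein.endswith("*"),
        # A stop codon before the end suggests a wrong frame or a pseudogene.
        "internal_stop": "*" in protein[:-1],
    }


# Toy exons (hypothetical, NOT a real Scgb gene): ATG GCC | AAA TGT | TGA
report = check_spliced_exons(["ATGGCC", "AAATGT", "TGA"])
print(report["protein"])        # MAKC*
print(report["internal_stop"])  # False
```

An internal stop or a coding length that is not a multiple of three would flag a sequence for closer manual inspection, for example for a missed third exon.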

Fig. 1.

A midpoint-rooted protein phylogeny of the SCGBs found in amniotes in this study. Titles to each major clade (e.g. Ryd5/SCGB1) appear on the left in their clade colors. In some titles, the capital letters A to E followed by Ni et al. and their family number denote the major clades in previous mammal phylogenies of SCGBs (Ni et al. 2000; Laukaitis et al. 2003; Laukaitis and Karn 2005) with the clades and subclades color coded. To save room, those with identical suffixes throughout (e.g. 1C for Ryd5/SCGB1C) have their suffixes omitted and only those with variation in suffixes (e.g. both 3A1 and 3A2 SCGB3A) have them shown. In addition to the five subfamilies identified by Ni et al. (2000), we labeled three other clades NOVEL, one with SCGB3A/UGRP1 in mammals and turtles, another a large clade containing SCGB1C/Ryd5 and a third, much smaller wall lizard clade. Importantly, the SCGB1C clade does not correspond with the C subfamily because the three NOVEL families were not initially designated by Ni et al. (2000) but were derived later by Jackson et al. (2011). Four clades of SCGB2A are denoted with their common group names (turtles, crocodilians, early birds, and squamates). The fifth clade is identified as mammals. The small clade of sauropsid reptiles that correspond to the squamates are lizards. Although snakes make up a large group of squamates, we found no SCGBs in snakes. The SCGB2A early bird paralogs, the crocodilian paralogs, and the turtle paralogs represent the archosaurs. Bootstrap values 50 and above are shown in black.

Fig. 2.

Validation tests comparing putative SCGB protein sequences of UG/SCGB1A a) and SCGB3A turtles b). In each panel, the sequence is shown as a WebLOGO with the signal peptide and coding sequence of the secreted protein delineated by an arrowhead below the x axis. nt counts are in letters at every decimal, and the ½-Cys residues are numbered below them. We assessed the presence and positions of the ½-Cys residues for the disulfide bridging required for dimerization of two monomers in an antiparallel configuration. To test the secreted protein sequences for the UG fold structure (i.e. a four-helix bundle in the SCGB boomerang configuration), we used Phyre2 threading analysis. An example secondary structural report from the Phyre2 analysis is shown for each sequence (platypus in a and Chinese pond turtle in b) with confidence keys at the C-terminal ends. A thumbnail image representing the secondary structure of each animal appears at the bottom of each panel. We also assessed the presence or absence of the six conserved residues F6, L13, Y21, F28, M41, and I63 (rabbit UG) in the sequences: the residue at position 28 is a plus symbol and the others are asterisks.

Our initial searches of invertebrate genomes yielded no Scgb gene sequences so we turned to genome searches of the Cyclostomi (aka Cyclostomata; taxid: 1476529) including the hagfish (taxid: 7761) and lampreys (taxid: 7745 and 7746) as well as ascidians (aka sea squirts; taxid: 7713) and lancelets (taxid: 7735; formerly amphioxi, taxid: 7736) that provide evolutionary insight into the origins of vertebrates. When none of those yielded any Scgb sequences, we queried the genomes of fishes (taxid: 7898) and amphibians (taxid: 8292), which also failed to yield Scgb sequences.

Only when we began searching genomes of amniotes did we find Scgb genes. We identified 123 Scgb gene sequences in 48 genomes of mammals and nonmammals (Table 1). Overall, we identified 18 genes in marsupials, 4 in monotremes, 10 in birds, 22 in reptiles, 16 in Laurasiatherians, 15 in Afrotherians, 11 in Xenarthrans, and 27 in Euarchontoglires (Table 1; shown in Fig. 1). Additionally, we identified Scgb genes in 35 bird species (supplementary file S1, Supplementary Material online) and additional placental mammals. To avoid crowding in the phylogeny (Fig. 1), we used only five neoavian bird species and included only a few representatives of each superorder.
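As a quick arithmetic check, the per-group counts quoted above can be tallied against the stated total of 123. The short Python sketch below uses only the numbers given in the text; the dictionary and variable names are illustrative.

```python
# Per-group Scgb gene counts as quoted in the text for Table 1.
genes_per_group = {
    "marsupials": 18,
    "monotremes": 4,
    "birds": 10,
    "reptiles": 22,
    "Laurasiatheria": 16,
    "Afrotheria": 15,
    "Xenarthra": 11,
    "Euarchontoglires": 27,
}

sauropsid_groups = ("birds", "reptiles")
sauropsid_total = sum(genes_per_group[g] for g in sauropsid_groups)
mammal_total = sum(n for g, n in genes_per_group.items() if g not in sauropsid_groups)
total = sauropsid_total + mammal_total

print(sauropsid_total, mammal_total, total)  # 32 91 123
```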

Table 1.

SCGBs from different families identified in sauropsid and synapsid amniotes

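Columns: Class | Super/order | Genus species | Common name | gene counts by subfamily (eight columns, named below in three nomenclatures)
Ni nomenclature: Group C | Novel | Novel | Novel | Group D | Group A | Group E | Group B
Jackson nomenclature: SCGB2A | SCGB1C | SCGB3A | Wall lizard | SCGB1D | SCGB1A | SCGB2B | SCGB1B
Example: MGB | RYD5 | UGRP1 | - | LPP | UG | ABPBG | ABPA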
Reptilia Squamata Podarcis muralis Common wall lizard 2
Reptilia Squamata Podarcis raffonei Aeolian wall lizard 2
Reptilia Squamata Podarcis lilfordi Lilford's wall lizard 1 1
Reptilia Squamata Hemicordylus capensis Cape Cliff Lizard 1
Reptilia Squamata Eublepharis macularius Leopard gecko 1
Reptilia Testudines Gopherus evgoodei Goode's thornscrub tortoise 1
Reptilia Testudines Gopherus flavomarginatus Bolson tortoise 1
Reptilia Testudines Caretta caretta Loggerhead sea turtle 1
Reptilia Testudines Chelonia mydas Green sea turtle 1
Reptilia Testudines Dermochelys coriacea Leatherback sea turtle 1
Reptilia Testudines Mauremys reevesii Chinese pond turtle 1
Reptilia Testudines Malaclemys terrapin pileata Diamondback terrapin 1 1
Reptilia Testudines Macrochelys suwanniensis Suwannee snapping turtle 1
Reptilia Testudines Trachemys scripta elegans Red-eared slider 1
Reptilia Crocodylia Alligator mississippiensis American alligator 1 1
Reptilia Crocodylia Crocodylus porosus Saltwater crocodile 1 1
Aves Palaeognathae Apteryx rowi Brown kiwi 1
Aves Palaeognathae Dromaius novaehollandiae Emu 1
Aves Palaeognathae Struthio camelus australis Ostrich 1
Aves Galloanserae Gallus gallus Chicken 1
Aves Galloanserae Phasianus colchicus Pheasant 1
Aves Neoaves Accipiter gentilis Northern goshawk 1
Aves Neoaves Aquila chrysaetos chrysaetos Golden eagle 1
Aves Neoaves Falco cherrug Saker falcon 1
Aves Neoaves Athene cunicularia Burrowing owl 1
Aves Neoaves Lagopus muta Ptarmigan 1
Mammalia Monotremata Ornithorhynchus anatinus Platypus 1 1
Mammalia Monotremata Tachyglossus aculeatus Echidna 1 1
Mammalia Metatheria Antechinus flavipes Yellow-footed antechinus 1
Mammalia Metatheria Phascolarctos cinereus Koala 1 1
Mammalia Metatheria Vombatus ursinus Common wombat 1 1
Mammalia Metatheria Dromiciops gliroides Monito del monte opossum 1 1
Mammalia Metatheria Trichosurus vulpecula Brushtail opossum 1 1
Mammalia Metatheria Gracilinanus agilis Agile gracile opossum 1 1
Mammalia Metatheria Monodelphis domestica Domestic opossum 1 1 2
Mammalia Metatheria Sarcophilus harrisii Tasmanian devil 1 1 1
Mammalia Afrotheria Trichechus manatus latirostris Florida manatee 1 1
Mammalia Afrotheria Echinops telfairi Tenrec 1 1
Mammalia Afrotheria Orycteropus afer afer Aardvark 1 1
Mammalia Afrotheria Elephas maximus indicus Asiatic elephant 1 1 1
Mammalia Afrotheria Loxodonta africana African elephant 1 1 1
Mammalia Afrotheria Elephantulus edwardii Cape elephant shrew 1 1
Mammalia Afrotheria Chrysochloris asiatica Cape golden mole 1
Mammalia Xenarthra Choloepus didactylus Linnaeus two-toed sloth 2 1 1 2
Mammalia Xenarthra Dasypus novemcinctus Nine-banded armadillo 1 1 2 1
Mammalia Laurasiatheria Pteropus alecto Black flying fox 1 1 1 1
Mammalia Laurasiatheria Camelus dromedarius Arabian camel 1 1 1 1 1 1 1
Mammalia Laurasiatheria Equus asinus Horse 1 1 1 1 1
Mammalia Euarchontoglires Homo sapiens Human 2 1 2 2 1
Mammalia Euarchontoglires Rattus norvegicus Brown rat 1 1 2 2 1 2* 1
Mammalia Euarchontoglires Mus musculus House mouse 1 1 2 1 1 2* 1*

*More Abp paralogs than listed have been identified in mouse and rat.

Are There Scgb Genes in the Genomes of Animals Other Than Placental Mammals?

Figure 1 is a midpoint-rooted protein phylogeny built with the sequences we found in nonmammals, monotremes, marsupials, and representatives from the four placental mammalian superorders. Where possible, the major subfamilies were identified with the original five A to E lettered SCGB nomenclatures (Ni et al. 2000). Two large subfamilies (SCGB1C and SCGB3A) and a wall lizard clade are labeled as “NOVEL” because they were not recognized in Ni et al. (2000) but fit the nomenclature of Jackson et al. (2011).

Two of the novel groups (Ryd5/SCGB1C and SCGB3A) appear at the top of the phylogeny. Ryd5/SCGB1C is one of the most diverse, having birds, crocodilians, monotremes, marsupials, and placental mammals in its subclades. SCGB3A has two different clades, one with placental mammals and one with turtles. Below those are the members of SCGB2A, clade C in Ni et al. (2000), including the original human Mammaglobin members MGBA 2A1 and MGBA 2A2 in one subclade that is shared with 2A paralogs of other placental mammals. SCGB2A also has subclades of turtles, crocodilians, early birds, and squamates.

The set of groups below SCGB2A includes SCGB1D, which is family D in Ni et al. (2000), including the original human Lipophilin members LPPA/SCGB1D1 and LPPB/SCGB1D2, as well as paralogs of other placental mammals. SCGB1D shares a node with ABPA/SCGB1B (Ni family B) and a small clade of wall lizards that have their NCBI sequences labeled 1D but seem to have more affinity to the ABPA paralogs. Therefore, we labeled the wall lizards “NOVEL” but retained the 1D labels of their sequences on NCBI. Below those three groups is a UG/SCGB1A1 clade originally found in placental mammals but obviously including other placentals as well as marsupials and monotremes. The clade at the bottom of the phylogeny is ABPBG/SCGB2B, family E in Ni et al. (2000).

Our work shows that the SCGBs are not restricted to mammals but also appear in numerous taxa of reptiles and birds. Thus, it is clear that they were present at nodes more basal than the placental mammals where they were originally identified. We also found that there are Scgb genes in marsupial mammals and monotremes as well as in the nonmammals listed above (Table 1). Interestingly, our search strategy did not identify Scgb genes in the genomes of chordates at deeper nodes (the Cyclostomi, i.e. hagfish and lampreys, or the ascidians and lancelets) nor in amphibians or fishes, suggesting that Scgb genes and their SCGB proteins arose in amniotes.

Surprisingly, we found no Scgb genes in genomes from Tuatara, or any snakes, even when reptile SCGBs (e.g. common wall lizard or crocodile) were psi-blasted on “lizard and snake” (taxid: 8504), on “snake” alone (taxid: 8570), on “colubrid snake” alone, or on individual snake genomes (e.g. two different rattlesnake species). Apparently, not all amniotes have these genes and proteins, although we may have missed some of them.

Data Validation

Each new Scgb/SCGB sequence that we found had to be considered putative unless and until its membership among the SCGBs could be validated by analysis of its gene and protein structures. As noted above, the term “secretoglobin” is defined by structural characteristics, and we tested the new sequences found in other genomes using the following characteristics:

  1. Their gene structures are composed of three exons separated by two introns with a phase 1 intron between their first and second exons and a phase 0 intron between their second and third exons.

  2. The lengths of their exons and cDNAs are similar to those reported by Laukaitis et al. (2003).

  3. The amino acid sequences produced from their second exons result in the secondary structure of a four-helix bundle with the boomerang shape of the UG fold (Callebaut et al. 2000).

  4. The amino acid sequences of the secreted proteins have the ½-Cys residues in the proximal and distal positions to produce the disulfide bridges at the ends of the secreted dimer, a feature required for the antiparallel bonding with another SCGB monomer.

  5. Their secreted proteins have the key residues necessary to bind ligands as shown in other SCGBs, e.g. conserved residues the same or similar to F6, L13, Y21, F28, M41, and I63, numbered from the N-terminus of the rabbit UG sequence with the signal peptide removed (Mornon et al. 1980; Laukaitis et al. 2003). In our experience, the starting positions in most SCGB sequences were indeed at position 6 and thus so were their WebLOGOs; however, there were exceptions such as SCGB1C. In those cases, other landmarks such as the ubiquitous K42 were helpful guides. We also note that the conserved positions could be occupied by other similar residues (e.g. F6 might be L6, I6, or V6; F28 might be Y28).
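The residue-level criteria in the list above (the ½-Cys positions and the six conserved ligand-binding residues) lend themselves to a simple positional check. The following Python sketch is a hedged illustration, not the authors' procedure: the function names and the toy sequence are hypothetical, the input is assumed to be a mature protein already numbered like rabbit UG (no offsets or indels), and only the substitutions named in the text (L/I/V at position 6, Y at position 28) are accepted as "similar".

```python
# Conserved ligand-binding positions, numbered on mature rabbit UG (signal peptide removed).
# Alternatives at positions 6 and 28 are those named in the text; accepting only the
# exact residue at the other positions is a simplifying assumption of this sketch.
ALLOWED = {6: "FLIV", 13: "L", 21: "Y", 28: "FY", 41: "M", 63: "I"}


def check_conserved(mature_seq, allowed=ALLOWED):
    """Return {position: (residue, ok)} using 1-based positions on the mature protein."""
    result = {}
    for pos, residues in allowed.items():
        residue = mature_seq[pos - 1] if pos <= len(mature_seq) else "-"
        result[pos] = (residue, residue in residues)
    return result


def cysteine_positions(mature_seq):
    """1-based positions of all Cys residues, to inspect proximal and distal candidates."""
    return [i + 1 for i, aa in enumerate(mature_seq) if aa == "C"]


# Toy mature sequence (hypothetical): poly-Ala with the residues of interest placed by hand.
toy = list("A" * 70)
for pos, aa in {3: "C", 6: "L", 13: "L", 21: "Y", 28: "F", 41: "M", 63: "I", 68: "C"}.items():
    toy[pos - 1] = aa
toy = "".join(toy)

print(check_conserved(toy))
print(cysteine_positions(toy))  # [3, 68] in this toy example (arbitrary placements)
```

For sequences whose numbering is shifted, as noted for SCGB1C, the positions would first have to be realigned, for instance using a landmark such as K42, before a check of this kind is meaningful.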

Supplementary file S2, Supplementary Material online shows an exon/cDNA/gene analysis of two quite different Scgb paralogs from Fig. 1, platypus UG and Chinese pond turtle Scgb3A, that illustrates how the first steps of our data validation process work. Both genes have three exons making up their cDNAs. Exon 1 of the platypus UG has the usual 61 nucleotides (nts), while the Chinese pond turtle Scgb3A has the less common 55 nt exon 1; however, both paralogs have the phase 1 intron a and the phase 0 intron b (criterion 1 above). Their exon sizes and cDNA sizes are within the usual ranges for these small cytokine-like proteins (criterion 2 above). With few exceptions, this has proven to be the case with the exon/intron sizes of the genes in the lettered subfamilies of Ni et al. (2000) and the novel SCGB1C subfamily. However, the novel mammal and turtle Scgb3A genes have experienced some deviations in exon/intron locations that probably explain their missing helices and consequent deranged secondary and tertiary structures.
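The intron-phase criterion follows from simple arithmetic: the phase of an intron is the number of coding nucleotides upstream of it, modulo 3. The Python sketch below works through the two exon 1 lengths given above (61 and 55 nt). It assumes those lengths count coding nucleotides from the start codon; the exon 2 and exon 3 lengths are hypothetical placeholders chosen only to make the example complete, and the function name is illustrative.

```python
def intron_phases(coding_exon_lengths):
    """Phase of each intron = coding nucleotides upstream of it, modulo 3."""
    phases = []
    upstream = 0
    for length in coding_exon_lengths[:-1]:  # no intron follows the last exon
        upstream += length
        phases.append(upstream % 3)
    return phases


# Exon 1 lengths are from the text; exon 2 and exon 3 lengths are hypothetical placeholders.
EXON2_NT = 188
EXON3_NT = 30

platypus_ug = intron_phases([61, EXON2_NT, EXON3_NT])
pond_turtle_scgb3a = intron_phases([55, EXON2_NT, EXON3_NT])

print(platypus_ug)         # [1, 0]
print(pond_turtle_scgb3a)  # [1, 0]
```

Both 61 and 55 leave a remainder of 1 when divided by 3, which is why the two exon 1 sizes are compatible with the same phase 1 first intron.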

The rest of the criteria above are illustrated in Fig. 2 that compares the two groups that represent these two animals: the UG subfamily (Fig. 2a) and the SCGB3A turtle subfamily (Fig. 2b). The UG LOGO display shows the proximal ½-Cys26 and the distal ½-Cys92 residues required to form a dimer with another, similar, monomer with the same ½-Cys residues, and to stabilize two subunits in an antiparallel configuration. The secondary structure analysis below the LOGO predicts four alpha-helices as expected for four-helix bundles in SCGBs. The nine threading models below the secondary structure analysis show that each is a four-helix bundle in a boomerang-style configuration of the UG fold. In contrast, Fig. 2b suggests that the sequences in the SCGB3A turtle subfamily do not meet the criteria for SCGB membership because (i) the secreted protein lacks both the proximal and distal ½-Cys residues needed to form and stabilize an antiparallel configuration with a similar monomer; (ii) the secondary structure analysis of one of the sequences is lacking the first two helices seen in UG and only predicts two other helices too far forward to be the helices numbered 3 and 4; and (iii) the six threading models below the secondary structure analysis do not have enough helix structure to qualify for the four-helix bundles required for a UG fold.

The structural analyses of the members of the five lettered subfamilies of Ni et al. (2000) can be found in supplementary file S2, Supplementary Material online and are summarized here. Subfamily A (UG) has been described above but also appears with more species as (UG/SCGB1A) in supplementary file S2, Supplementary Material online. Subfamilies B (ABPA/SCGB1B), C (SCGB2A in mammals, birds and reptiles), D (SCGB1D in mammals and lizards), the questionable wall lizard clade, and E (ABPBG/SCGB2B) meet all the SCGB structural criteria as did subfamily A. The two novel subfamilies, SCGB1C (RYD5; many diverse species) and SCGB3A (UGRP1; mammals and turtles), failed to meet one or more of the criteria, summarized as follows. The WebLOGOs of both of these putative protein sequences suggest that they lack the proximal and distal ½-Cys residues required to pair two SCGB monomers to make either homodimers or heterodimers. This situation raises an additional question: If they cannot dimerize, should they be considered to be SCGBs?

The secondary and tertiary structural analyses gave different results in SCGB1Cs and SCGB3As. The turtle and mammal SCGB3As have secondary and tertiary structures lacking some of the required helices and, consequently, they lack the UG fold with the requisite four-helix bundle in boomerang form. In contrast, all of the 27 SCGB1C sequences have the UG fold with the possible exception of two species, the Cape golden mole and the black flying fox that have the helices but not in a bundle with the boomerang form. The SCGB1C WebLOGO has the expected six conserved residues for ligand binding, but the SCGB3A WebLOGO does not. Resolving the issues that arise in the case of these novel SCGBs is beyond the scope of this project; however, the questions that have arisen need eventually to be addressed.

Comparing Mammalian and Nonmammalian SCGBs

With the exception of the novel SCGB1C and SCGB3A sequences and the questionable wall lizard clade, the newly identified mammalian and nonmammalian SCGB sequences generally fit within the five originally established subfamilies (Ni et al. 2000; Laukaitis et al. 2003; Laukaitis and Karn 2005). The two exceptions, RYD5 (SCGB1C1) and UGRP1 (SCGB3A), were not included in the original nomenclature or mapping work (Ni et al. 2000) nor mentioned in the proceedings of the 2000 meeting (Mukherjee and Chilton 2000), but they are acknowledged in a more complicated later nomenclature that retained some of these lettered subfamilies and introduced new subfamilies (Jackson et al. 2011). It was not a goal of our project to revise SCGB nomenclature, but we note significant discrepancies between the phylogeny we report and those of Jackson et al. (2011). To minimize confusion, we use common names and relate them to the varying nomenclatures as appropriate (Table 1).

A Summary of Evolutionary Distribution of SCGBs

UG (SCGB1A) is the only member of subfamily A and its representatives are found only in mammals, including monotremes, marsupials, and placentals. Similarly, SCGBs in subfamilies B and E, ABPA (SCGB1B) and ABPBG (SCGB2B), respectively, are found only in marsupial and placental mammals. Members of subfamily C (SCGB2A) are found in birds and reptiles as well as placental mammals (Table 1). Excepting the wall lizard clade that is probably misnamed 1D, subfamily D (SCGB1D) members are found only in placental mammals. SCGB2As are not found in monotremes or marsupials, and unlike in the well-studied placental mammals commonly used in research, there is not a strict co-occurrence between members of the two SCGB2A and SCGB1D subfamilies. The novel SCGB1C/RYD5 subfamily is likewise widely spread evolutionarily and is found in monotremes, marsupials, crocodilians, neoavian birds, and placental mammals. The other novel SCGB subfamily, including the Scgb3A/UGRP1 genes identified in turtles and mammals, poses a particular problem because they lack the UG fold and other data validation criteria. The Scgb3A/UGRP1 genes likely form a unique class of SCGB-derived molecules with a function distinct from the majority of SCGBs.

We note that while SCGBs are present in the amniote ancestor, there appear to have been selective eliminations from many lineages. For example, SCGB1C is found in all major lineages with elimination from squamates, turtles, and palaeognathes. Other SCGB subfamilies have other patterns.

Discussion

Structure vs. Function: How to Define the SCGBs?

A first principle of biology involves the interplay of structure and function and yet, to date, the SCGBs have been defined by their structure alone (see the five criteria under Data Validation in Results). These small, dimeric, cytokine-like proteins were originally found in the reproductive tracts (rabbit uterus) and airways of mammals and so it is not surprising that most SCGB research has focused on respiratory diseases in humans and laboratory animal models. But what about other possible biological functions? Thus far, only a putative pheromonal function for mouse ABP (Laukaitis et al. 1997; Talley et al. 2001), with the possibility of providing reinforcement on the hybrid zone in Europe (Bímová et al. 2005, 2011; reviewed in Laukaitis and Karn 2012), has withstood the test of time.

SCGBs have been shown to bind small ligands such as progesterone (Beato and Baier 1975) and male steroid hormones (Heyns and De Moor 1977; Karn 1994), but not estradiol (Karn 1998). The amino acid residues involved in this process have been well defined and the requirement for dimer formation to effect binding is well established (Callebaut et al. 2000) but the requirement of binding such ligands for a specific biological function(s) is still not understood. The real gap in this knowledge, though, is that, in all the ligand-binding studies, defined ligands (e.g. progesterone and testosterone) were provided in vitro. Thus, there are no studies aimed at identifying ligands bound in vivo.

What the structural studies do tell us is that all SCGBs share the characteristics of being small with a simple four-helix bundle (Brändén and Tooze 1991) that ensures a high degree of external hydrophilicity and, in the dimer form, allows binding small hydrophobic ligands in the cavity formed by two monomers (Callebaut et al. 2000). A plethora of observations suggest that these dimerized proteins are secreted (as the name indicates) in exocrine fluids and may protect biological surfaces by coating them (Dominguez 1995).

Physiological Studies of SCGBs

SCGBs have been primarily identified in glandular and epithelial tissue of mammals, including respiratory epithelium (nose and lung), lacrimal glands, skin, prostate, breast, uterus, and colon. They have been minimally recognized and studied in nonmammals. The primary function of an Scgb gene or its SCGB protein product is seldom known, but the effect of disease (inflammation, polyps, and cancer) on gene expression is often used as a marker for function.

A variety of SCGBs have been found in respiratory epithelium in mammals. The best-studied family member, SCGB1A1 (aka Club Cell Secretory Protein, CC10, CC17, and UG) is expressed in secretory lung epithelial cells (Mukherjee and Chilton 2000; Mukherjee et al. 2007; Martinu et al. 2023). This protein is highly abundant in healthy lung and its level changes with diseases such as asthma, emphysema, and fibrosis (Martinu et al. 2023). Similarly, SCGB1C1 (aka RYD5) is upregulated in mouse lungs after treatment with immune stimulators and exposure to asthma-stimulating allergens (Kim et al. 2020). Having low levels of the protein may increase susceptibility to upper respiratory infections in human athletes (Orysiak et al. 2016). SCGB1C1 was first found in nasal epithelium and is at higher levels in people with nasal polyps (Lu et al. 2011), although no expression-controlling SNPs correlate with nasal polyp risk (Özdaş et al. 2015). SCGB3A2 (aka UGRP1) is also found in club cells of lung epithelium where it has been shown to have anti-inflammatory and anti-fibrotic properties (Kimura et al. 2022).

SCGBs are also important in a variety of mammalian glandular tissues. SCGB2A2 and SCGB1D1 are expressed in the lacrimal glands of the eyes and are dysregulated in allergic and dry eye states (Dominguez 1995; Lehrer et al. 1998; Leonardi et al. 2014). SCGB1D2, SCGB2A1, and SCGB2A2 are downregulated in inflammatory skin disease (Coates et al. 2019). SCGB1D members have also been used as a marker to identify milk adulteration (Fan et al. 2023; Ji et al. 2023). UG may also serve as a marker of kidney injury (García-García et al. 2016).

Some other SCGBs influence mammalian reproduction. For example, UG has been shown to be induced by increasing progesterone levels with pregnancy in rabbits (Mukherjee et al. 2007). More recently, proteomic evaluation has identified UG in mare cervical plugs that are thought to block microbial entry into the pregnant uterus (Loux et al. 2017). UG is known to be secreted into prostatic fluid (Folk 1980; Peri et al. 1993), and when present with transglutaminase, covalently binds sperm surface proteins (Manjunath et al. 1984) to reduce sperm motility (Luconi et al. 2000). This contrasts with SCGB1D, which has been identified in bull sperm by proteomics where it is positively associated with sperm motility (Boe-Hansen et al. 2015) although not with viability after freezing (Gomes et al. 2020).

SCGBs have attracted a great deal of attention in human oncology. SCGB2A1 and SCGB2A2 are dysregulated in ovarian (Bignotti et al. 2013) and breast cancers (Zehentner and Carter 2004). SCGB2A1 has also been studied as a colorectal cancer marker (Munakata et al. 2014) and as a marker and treatment target in breast cancer (Milosevic et al. 2023). SCGB1D2 has been studied as a tumor marker for pancreatic cancer (Taniuchi et al. 2018). SCGB1C1 is also significantly increased in ovarian cancer (Bignotti et al. 2013). One study shows that SCGB2A1 is highly expressed in endometrial cancer (Li et al. 2020) while another shows that low expression suggests a poor outcome in the same cancer (Zhang et al. 2023). Finally, SCGB3A1 has been shown to be a tumor suppressor in a variety of cancers (Kimura et al. 2022).

Seeking SCGBs in a Broader Biological Context

In an attempt to discover new SCGBs, especially in nonmammals, we carried out an extensive study of the genomes available at the time of this writing. To determine whether the SCGBs are mammalian novelties, we began by probing genomes as far back as the ancestors of vertebrates, seeking evolutionary insight into the origins of SCGBs in vertebrates and/or other animals. We also interrogated the phyla of invertebrates (e.g. coelenterates, sponges, flatworms, annelids, molluscs, echinoderms, cnidaria, and arthropods) but did not identify new Scgb genes, suggesting that they likely were derived in vertebrates and possibly only amniotes. However, we present these negative findings with caution because these small sequences might be lost due to technical issues with genome builds (see Karn et al. 2021). Nonetheless, we note that genome sequencing is constantly improving, and we found many new Scgbs/SCGBs in the nonmammalian vertebrates presented below.

Our major finding of the project answers the first three questions posed in the Introduction:

  1. Were SCGBs mammalian novelties or did some appear in nonmammals? Our results show that SCGBs are present in crocodilians, reptiles (including turtles and lizards), and birds.

  2. Are nonmammalian SCGBs similar to or different from those described in mammals? Most of the SCGBs we found in nonmammals cluster with the five SCGB families described previously (Ni et al. 2000). Others fall into the novel SCGB1C and SCGB3A subfamilies and the wall lizard clade that were not described in Ni et al. (2000). Of these, 1C and 3A were included in a later phylogeny (Jackson et al. 2011). With the possible exception of the wall lizards, no new SCGB subfamilies were identified that do not have members in mammals. In fact, four of the SCGB subfamilies represented in mammals were present in a recognizable form as early as the amniote ancestor.

  3. Are there any Scgb genes confined to mammals or to nonmammals? Three SCGB families, SCGB1A/UG, SCGB1B/ABPA, and SCGB2B/ABPBG, are found only in mammals and are not represented in nonmammals. Thus, these are mammalian novelties.

  4. Inferring an evolutionary history of the SCGBs

SCGB representatives are identified in numerous amniotes, suggesting that they were derived from amphibian ancestors during the Carboniferous Period. The amniote lineage includes two groups, the sauropsids (including all reptiles and birds) and the synapsids whose only living representatives are the mammals (Fig. 3; Tizard 2023). The basal sauropsid node differentiated ∼320 million years ago, in the Carboniferous Period of the Paleozoic Era, and the archosaur node is dated to the Middle Triassic Period (about 246 million to 229 million years ago). The synapsids were also present in the Carboniferous Period, appearing about 359 million to 299 million years ago.

Fig. 3.

A phylogeny of amniotes with the presence of SCGB subfamilies indicated by colored branches. a) SCGB1C (novel), SCGB2A (subfamily C), and SCGB3A (novel) subfamilies are identified in both sauropsids and mammals and so must have been present in a common amniote ancestor. The novel wall lizard clade is found only in squamates. b) SCGB1A (UG/subfamily A) representatives are present in all mammalian taxa, while SCGB1B (ABPA/subfamily B), and SCGB2B (ABPBG/subfamily E) first appear in metatherians and SCGB1D (subfamily D) is found only in placentals.

Where do the wall lizards fit into this picture? They were not included in the Ni et al. (2000) phylogeny nor in the Jackson et al. (2011) addition of SCGB1C and SCGB3A, so they appear to have been named SCGB1D by a sequence prediction algorithm on the NCBI genome browser. If, as we propose, they are something other than any previously identified SCGB, even perhaps being a unique SCGB, where did they come from? Given that they fall between SCGB1D and ABPA/SCGB1B on the Fig. 1 phylogeny, our best guess is that their affinity to either one is due to convergent evolution. Our argument for this proposal is that, the wall lizard clade aside, both SCGB1D and ABPA/SCGB1B are uniquely mammalian. Perhaps convergent evolution is in part responsible for the mediocre bootstrap values (65 overall for the 1D wall lizard-ABPA/1B and 83 for wall lizard-ABPA/1B).

A study of the Fig. 1 and Fig. 3 phylogenies suggests that there are three candidates for an ancestral SCGB: SCGB2A, SCGB3A, and SCGB1C. In different combinations, these have been found in squamate (lizard), archosaur (crocodile and/or avian), and testudine (turtle) members of the sauropsid branch, with basal nodes occurring in the Carboniferous Period. While our data do not allow definitively designating any one of these SCGBs as the oldest, clades of turtles, crocodilians, squamates, and early birds in Fig. 1 suggest that the SCGB2As are most thoroughly represented in the sauropsid branch. In contrast, we did not find SCGB1Cs in lizards or turtles, and among birds we found them only in neoavians. Finally, we note that while lizards represent squamates, we were not able to find Scgb genes in the snakes that make up a large share of squamate membership nor any in tuatara (Sphenodon punctatus), the last living species of a closely related order (Rhynchocephalia).

In contrast to the sauropsid SCGBs, UG/SCGB1A, ABPA/SCGB1B, and ABPBG/SCGB2B are not candidates for the ancestral SCGBs because they are found in numerous monotremes, marsupials, and placental mammals in the synapsid branch, suggesting their origin in an ancestor of metatherian and eutherian mammals. The time estimate for divergence of the monotreme line from other mammalian lines is uncertain but probably falls between 163 and 220 million years ago (Hedges & Kumar 2009); thus, UG appeared substantially later than the other subfamilies. Moreover, therian mammals with common characters split into marsupials and eutherians (placental mammals) about 160 MYA (Luo et al. 2011), so UG/SCGB1A is much younger than the SCGB2A, SCGB3A, and SCGB1C families. We found the ABPA (SCGB1B) and ABPBG (SCGB2B) families only in marsupial and placental mammals although we cannot rule out earlier loss in the mammalian lineage. On the other hand, they may occur in nonmammalian lineages that we did not find.

This complex distribution of SCGBs suggests that there is an as-yet-undiscovered basic function of SCGBs shared by all amniotes. We hope that extending our knowledge to nonmammal vertebrates will eventually lure more researchers into the SCGB area of interest and that more research on expression in this wider group of organisms may begin to reveal new functional information about SCGBs.

Materials and Methods

Data Mining Scgb Genes From a Phylogenetically Diverse Group of Animals

We used both psi-BLAST and the BLAT tool on the UCSC genome browser to identify Scgb sequences in genomes in which they had not previously been found and/or published. We also searched iteratively, requerying with sequences from closely related species as these were identified. Those Scgb sequences sometimes had been predicted algorithmically but were not documented in the literature. Many of them were embedded in longer sequences, much or most of which were not Scgb material and/or were missing sequence, often the small third exons of some Scgbs (e.g. Abpa sequences). Some algorithmic hits also had been named with Scgb or other nomenclature, so we tested all those against the known Scgbs published for human, mouse, and rat genes. We also tested our conclusion about the Scgb designation we intended to apply by running psi-BLAST searches of the sequences in question back against the genomes of the three placental mammals mentioned above.

Predicted gene sequences (Alioto 2012) are available from GenBank (https://www.ncbi.nlm.nih.gov/genbank/) for many organisms, and we have accessed 86 new Scgb sequences in a variety of mammalian and nonmammalian vertebrates. In some instances, the predicted sequences include the first, second, and sometimes the third exons consistent with Scgbs confirmed in other mammals. We used psi-BLAST and, in some cases, the Clustered nr program (https://ncbiinsights.ncbi.nlm.nih.gov/2022/05/02/clusterednr_1/) on the NCBI website to search for Scgbs in their genomes in GenBank. Clustered nr is the standard NCBI nr database clustered with each sequence within 90% identity and 90% length to other members of the cluster and the search runs against a single representative sequence for each cluster. The representative is used as a title for the cluster and can be used to fetch all the other members. Clustered nr is smaller and more compact for searching than other nr programs and its results have more taxonomic depth than standard nr results.

We searched other genome sources including the UCSC Genome Browser (https://genome.ucsc.edu/cgi-bin/hgGateway; Clawson et al. 2023) and Zoonomia (https://zoonomiaproject.org/the-mammal-tree-view/), which has 240 mammal species represented. This was derived from a Cactus genome alignment made without reference to any single genome. Thus, it includes both regions shared across eutherian mammals and regions unique to specific lineages. We selected a species from the “mammalian tree” at the link above and downloaded its NCBI assembly and searched it with the BLAST tool, using known Scgb sequences from various related species as queries.

To find Scgb genes in mammals that lack previously described sequences, we conducted BLAT searches (https://genome.ucsc.edu/cgi-bin/hgGateway?hgsid=2068453854_Xs4H52hjvBamvVaVAQa87r5Cwjba; Kent 2002) on the UCSC browser using known sequences from other related taxa as the queries. The queries included cDNAs and/or individual exons of Scgb genes (blastn) from more-or-less closely related mammals and other vertebrates that had already been identified. Alternatively, the queries were complete or partial protein sequences used in psi-BLAST searches (Altschul et al. 1997). This included the Clustered nr program in a search run against a single representative sequence for each cluster (MMseqs2; see Steinegger and Söding 2017). Where chromosomal placement was possible, we numbered the Abp genes starting proximal to the centromere and proceeding distally (see Karn et al. 2021 for examples in the genus Mus). In the case of assignment to fragments identified with various characters, we were not able to use such a numbering system.

Validating Scgb nt and SCGB Amino Acid Sequences

Scgb genes are composed of three exons separated by two introns and always have a phase 1 intron between their first and second exons and a phase 0 intron between their second and third exons (for phase determination, see Nguyen et al. 2006). The first exon in most Scgbs contains almost all of the signal peptide and is easy to find because of the ATG start codon, the second codon (usually AAG or a variation on AAX) and the GT intron donor splice site. It usually is 61 bp long with very little variation, probably because it encodes an evolutionarily constrained signal peptide. In cases where the exon–intron boundaries of the second and third exons were questionable, we aligned the new sequence with established sequences of the most closely related taxa. Concatenating the three exons and translating them usually was sufficient to determine whether the gene could be expressed or was likely to be a pseudogene.
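The phase and translation expectations described above can be illustrated with a minimal Python sketch. It assumes each exon string holds coding sequence only (exon 1 starting at the ATG, exon 3 ending at the stop codon); the function names and the toy sequences are hypothetical and are not the tools or data used in this study.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def intron_phases(exon_lengths):
    """Phase of each intron = number of coding bases before it, modulo 3."""
    phases, total = [], 0
    for length in exon_lengths[:-1]:
        total += length
        phases.append(total % 3)
    return phases


def translate(cds):
    """Translate complete codons; unknown codons become 'X'."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X")
                   for i in range(0, len(cds) - 2, 3))


def assess(exons):
    """Concatenate exons and report simple indicators of an intact reading frame."""
    cds = "".join(exons).upper()
    protein = translate(cds)
    return {
        "phases": intron_phases([len(e) for e in exons]),
        "starts_with_atg": cds.startswith("ATG"),
        "length_multiple_of_3": len(cds) % 3 == 0,
        "premature_stop": "*" in protein[:-1],
        "ends_with_stop": protein.endswith("*"),
    }


# Toy exons (made up): lengths 7, 5, and 6 give intron phases 1 and 0.
print(assess(["ATGAAGC", "TGCTG", "GGATAA"]))
# One extra nt in the third exon breaks the reading frame.
print(assess(["ATGAAGC", "TGCTG", "GGGATAA"]))
```

A 61-bp first exon gives phase 1 for the first intron because 61 leaves a remainder of 1 when divided by 3; the second intron is phase 0 only when the first two exons together sum to a multiple of 3.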

We evaluated these Scgb genes for accuracy and, in some cases, had to correct them either because they (i) missed the third exon, especially the small exon in some Scgbs, and/or (ii) translated inappropriate nts (e.g. nt upstream of the start codon; the gt donor or ag acceptor splice sites flanking an exon sequence; and/or additional intron sequence). We assembled the curated exons into cDNAs and translated them into putative SCGB proteins, which allowed us to identify the gene as either intact (i.e. potentially expressed) or as a pseudogene (i.e. if it had either a disruption in the coding region and/or a noncanonical splice site; Emes et al. 2004; Laukaitis et al. 2008; Karn et al. 2021). While experimental confirmation was precluded by the wide survey of taxa and lack of DNA availability, we could curate the sequences we obtained by evaluating them for missing or extra nts as described in supplementary file S2, Supplementary Material online.
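The splice-site criterion used in this curation can be sketched as follows. This is an illustrative Python example under stated assumptions, not the procedure in supplementary file S2: exon coordinates are taken to be 0-based and end-exclusive on the coding strand, the helper names (`splice_sites`, `classify`, `layout`) are hypothetical, and the toy sequences are made up.

```python
def splice_sites(genomic, exons):
    """Return (donor, acceptor) dinucleotides for each intron.

    `exons` holds (start, end) pairs, 0-based and end-exclusive.
    """
    genomic = genomic.upper()
    sites = []
    for (_, end), (next_start, _) in zip(exons, exons[1:]):
        donor = genomic[end:end + 2]                   # first two intron bases
        acceptor = genomic[next_start - 2:next_start]  # last two intron bases
        sites.append((donor, acceptor))
    return sites


def classify(genomic, exons, coding_intact):
    """Label a gene 'intact' only if every intron is GT...AG and the
    coding region is undisrupted; otherwise label it 'pseudogene'."""
    canonical = all(d == "GT" and a == "AG"
                    for d, a in splice_sites(genomic, exons))
    return "intact" if canonical and coding_intact else "pseudogene"


def layout(parts):
    """Join alternating exon/intron strings; return sequence and exon coordinates."""
    seq, coords, pos = "", [], 0
    for i, part in enumerate(parts):
        if i % 2 == 0:  # even positions are exons
            coords.append((pos, pos + len(part)))
        seq += part
        pos += len(part)
    return seq, coords


# Toy gene: exon, intron, exon, intron, exon.
genomic, exons = layout(["ATGAAGC", "GTAAGTTTCAG", "TGCTG", "GTGAGCCTAG", "GGATAA"])
print(splice_sites(genomic, exons))
print(classify(genomic, exons, coding_intact=True))
```

The `coding_intact` flag stands in for the separate check of the translated reading frame, so that the two criteria named in the text (coding disruption and noncanonical splice sites) remain visibly distinct.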

In the validation process, we referred to the table of nt lengths of exon and intron regions of Abpa and Abpbg genes in Laukaitis et al. (2003), along with the characteristic of intron phase: Scgb genes always have a phase 1 intron between their first and second exons and a phase 0 intron between their second and third exons (for phase determination, see Nguyen et al. 2006). We most often found deviations from the size and phase expectations in the third exon of Scgb genes, where they had either one too many or one too few nts and/or a phase other than 0. In the curation process, we used various members of the Sequence Manipulation Suite at https://www.bioinformatics.org/ (Stothard 2000), including Filter (https://www.bioinformatics.org/sms2/filter_dna.html), Reverse Complement (https://www.bioinformatics.org/sms/rev_comp.html), and Translate (https://www.bioinformatics.org/sms2/translate.html). We also used Clustal Omega (https://www.ebi.ac.uk/jdispatcher/msa/clustalo) to make alignments of both nt sequences and amino acid sequences.

Data Analysis

In order for us to conclude that a putative gene sequence, especially one found in nonmammals, actually represents an Scgb sequence, we had to use biochemical characteristics of SCGBs that have been established by the research community (see for example Callebaut et al. 2000). The paucity of information on functions of SCGBs precludes using function to determine what is a real Scgb/SCGB.

We assigned exons and introns to the verified and/or corrected DNA sequences of the Scgbs of numerous vertebrate taxa by aligning them with the exon and intron sequences of known Scgbs in the same subfamily. The donor and acceptor splice sites were identified, and the exons were assembled into putative mRNAs and translated in silico. From the translations, we identified each gene as either a potentially expressed gene or as a pseudogene if it had either a disruption in the coding region and/or a noncanonical splice site (Emes et al. 2004; Laukaitis et al. 2008; Karn et al. 2021). We used SignalP 6.0 (https://services.healthtech.dtu.dk/services/SignalP-6.0/; Teufel et al. 2022) to determine signal-peptide cleavage points when removing those to produce the sequences of the secreted proteins.

Clustal Omega alignments allowed us to identify gaps in sequences that were to be further analyzed in WebLOGO where all the sequence lengths must be equal for that analysis (https://weblogo.berkeley.edu/logo.cgi; Schneider and Stephens 1990; Crooks et al. 2004). In applying WebLOGO, we included the signal peptides since they provide an additional feature for comparing Scgb sequences. Key amino acids most often conserved among the sequences in a group, especially ½-Cys and Gly, stand out in contrast to those more often substituted in secondary structure, i.e. alpha-helices in the case of SCGBs.
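To illustrate why a sequence logo needs equal-length (gapped) sequences and how conserved positions stand out, the following Python sketch computes per-column information content in bits, in the spirit of Schneider and Stephens (1990). It is a simplified illustration, not WebLOGO itself: the small-sample correction is omitted, gap characters are simply ignored when counting residues (an assumption), and the function name and toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_information(aligned, alphabet_size=20):
    """Information content (bits) of each alignment column.

    Uses log2(alphabet_size) minus the Shannon entropy of the observed
    residue frequencies; no small-sample correction is applied.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    max_bits = math.log2(alphabet_size)
    info = []
    for column in zip(*aligned):
        residues = [r for r in column if r != "-"]  # ignore gaps
        if not residues:
            info.append(0.0)
            continue
        n = len(residues)
        counts = Counter(residues)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        info.append(max_bits - entropy)
    return info


# Toy gapped alignment (made up): columns 1, 3, and 4 are invariant.
toy = ["MKCG", "MRCG", "MK-G"]
for position, bits in enumerate(column_information(toy), start=1):
    print(position, round(bits, 2))
```

Invariant columns, such as a conserved Cys or Gly, reach the maximum of log2(20) bits, whereas columns with substitutions score lower.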

We used Phyre2 threading on the Phyre2 web portal for protein modeling, prediction, and analysis (Kelley et al. 2015) to verify the boomerang-shaped four-helix bundles characteristic of the UG fold (Callebaut et al. 2000). Another feature specific to SCGBs is six conserved sites that we used for verification.

We used MAFFT to align the SCGB sequences and IQ-TREE (http://www.iqtree.org; Trifinopoulos et al. 2016) to build maximum likelihood phylogenetic trees that were visualized with FigTree v1.4.3 (http://tree.bio.ed.ac.uk/software/figtree/). Bootstrap values (1,000 repetitions) were obtained with the IQ-TREE ultrafast bootstrap approximation. Figure 3 was produced using the Interactive Tree of Life (http://itol.embl.de).

Supplementary Material

evaf024_Supplementary_Data

Acknowledgments

We thank the Carl R. Woese Institute for Genomic Biology (IGB) at the University of Illinois for research infrastructure.

Contributor Information

Robert C Karn, Gene Networks in Neural and Developmental Plasticity, Institute for Genomic Biology, University of Illinois, Urbana, IL, USA; Department of Biomedical & Translational Sciences, Carle Illinois College of Medicine, University of Illinois, Urbana, IL, USA.

Christina M Laukaitis, Department of Biomedical & Translational Sciences, Carle Illinois College of Medicine, University of Illinois, Urbana, IL, USA; Environmental Impact on Reproductive Health, Regenerative Biology and Tissue Engineering, Institute for Genomic Biology, University of Illinois, Urbana, IL, USA; Department of Genetics, Carle Health, Urbana, IL, USA.

Supplementary Material

Supplementary material is available at Genome Biology and Evolution online.

Author Contributions

R.C.K. and C.M.L. conceived of the project. R.C.K. mined the Scgb sequence data and identified the exons and introns in the paralog structures, built the phylogenies, and performed the WebLOGO and Phyre2 analyses. C.M.L. researched and drafted the material on the medical uses of human SCGBs and on the evolutionary histories of major amniote groups, creating Fig. 3. Both authors assessed the evolutionary implications and participated in writing the manuscript.

Data Availability

All sequence data are released into GenBank and their accession numbers are listed in supplementary file S1a, Supplementary Material online.

Literature Cited

  1. Alioto T. Gene prediction. In: Anisimova M, editor. Evolutionary genomics. Methods in molecular biology. Vol. 855. Totowa, NJ: Humana Press; 2012. p. 175–201.
  2. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997:25(17):3389–3402. 10.1093/nar/25.17.3389.
  3. Beato M, Baier R. Binding of progesterone to the proteins of the uterine luminal fluid. Identification of uteroglobin as the binding protein. Biochim Biophys Acta. 1975:392(2):346–356. 10.1016/0304-4165(75)90016-1.
  4. Bignotti E, Tassi RA, Calza S, Ravaggi A, Rossi E, Donzelli C, Todeschini P, Romani C, Bandiera E, Zanotti L, et al. Secretoglobin expression in ovarian carcinoma: lipophilin B gene upregulation as an independent marker of better prognosis. J Transl Med. 2013:11(1):162. 10.1186/1479-5876-11-162.
  5. Bímová B, Karn RC, Pialek J. The role of salivary androgen-binding protein in reproductive isolation between two subspecies of house mouse: Mus musculus musculus and Mus musculus domesticus. Biol J Linn Soc Lond. 2005:84(3):349–361. 10.1111/j.1095-8312.2005.00439.x.
  6. Bímová BV, Macholán M, Baird SJE, Munclinger P, Dufková P, Laukaitis CM, Karn RC, Luzynski K, Tucker PK, Piálek J. Reinforcement selection acting on the European house mouse hybrid zone. Mol Ecol. 2011:20(11):2403–2424. 10.1111/j.1365-294X.2011.05106.x.
  7. Boe-Hansen GB, Rego JPA, Crisp JM, Moura AA, Nouwens AS, Li Y, Venus B, Burns BM, McGowan MR. Seminal plasma proteins and their relationship with percentage of morphologically normal sperm in 2-year-old Brahman (Bos indicus) bulls. Anim Reprod Sci. 2015:162:20–30. 10.1016/j.anireprosci.2015.09.003.
  8. Brändén C-I, Tooze J. Introduction to protein structure. New York: Garland Pub; 1991.
  9. Callebaut I, Poupon A, Bally R, Demaret J, Housset D, Delettré J, Hossenlopp P, Mornon J. The uteroglobin fold. Ann N Y Acad Sci. 2000:923(1):90–112. 10.1111/j.1749-6632.2000.tb05522.x.
  10. Chen X, Shu S, Schwartz LC, Sun C, Kapur J, Bayliss DA. Homeostatic regulation of synaptic excitability: tonic GABA(A) receptor currents replace I(h) in cortical pyramidal neurons of HCN1 knock-out mice. J Neurosci. 2010:30(7):2611–2622. 10.1523/JNEUROSCI.3771-09.2010.
  11. Chung AG, Belone PM, Bimova BV, Karn RC, Laukaitis CM. Studies of an androgen-binding protein knockout corroborate a role for salivary ABP in mouse communication. Genetics. 2017:205(4):1517–1527. 10.1534/genetics.116.194571.
  12. Clawson H, Lee BT, Raney BJ, Barber GP, Casper J, Diekhans M, Fischer C, Gonzalez JN, Hinrichs AS, Lee CM, et al. GenArk: towards a million UCSC genome browsers. Genome Biol. 2023:24(1):217. 10.1186/s13059-023-03057-x.
  13. Coates M, Mariottoni P, Corcoran DL, Kirshner HF, Jaleel T, Brown DA, Brooks SR, Murray J, Morasso MI, MacLeod AS. The skin transcriptome in hidradenitis suppurativa uncovers an antimicrobial and sweat gland gene signature which has distinct overlap with wounded skin. PLoS One. 2019:14(5):e0216249. 10.1371/journal.pone.0216249.
  14. Crooks GE, Hon G, Chandonia J-M, Brenner SE. WebLogo: a sequence logo generator. Genome Res. 2004:14(6):1188–1190. 10.1101/gr.849004.
  15. Dominguez P. Cloning of a Syrian hamster cDNA related to sexual dimorphism: establishment of a new family of proteins. FEBS Lett. 1995:376(3):257–261. 10.1016/0014-5793(95)01294-4.
  16. Emes RD, Riley MC, Laukaitis CM, Goodstadt L, Karn RC, Ponting CP. Comparative evolutionary genomics of androgen-binding protein genes. Genome Res. 2004:14(8):1516–1529. 10.1101/gr.2540304.
  17. Fan R, Xie S, Wang S, Yu Z, Sun X, Du Q, Yang Y, Han R. Identification markers of goat milk adulterated with bovine milk based on proteomics and metabolomics. Food Chem X. 2023:17:100601. 10.1016/j.fochx.2023.100601.
  18. Folk JE. Transglutaminases. Annu Rev Biochem. 1980:49(1):517–531. 10.1146/annurev.bi.49.070180.002505.
  19. Gail DB. Closing remarks and future directions. Ann N Y Acad Sci. 2000:923(1):355–356. 10.1111/j.1749-6632.2000.tb05550.x.
  20. García-García PM, Martín-Izquierdo E, de Basoa CM-F, Jarque-López A, Pérez-Suárez G, Rivero-González A, González-Posadas JM, Macía-Heras M, García-Nieto VM, Navarro-Gónzález JF. Urinary Clara cell protein in kidney transplant patients: a preliminary study. Transplant Proc. 2016:48(9):2884–2887. 10.1016/j.transproceed.2016.09.022.
  21. Gomes FP, Park R, Viana AG, Fernandez-Costa C, Topper E, Kaya A, Memili E, Yates JR, Moura AA. Protein signatures of seminal plasma from bulls with contrasting frozen-thawed sperm viability. Sci Rep. 2020:10(1):14661. 10.1038/s41598-020-71015-9.
  22. Hedges SB, Kumar S, editors. The timetree of life. New York: Oxford University Press; 2009. p. 459–461. ISBN 978-0-19-953503-3.
  23. Heyns W, De Moor P. Prostatic binding protein. A steroid-binding protein secreted by rat prostate. Eur J Biochem. 1977:78(1):221–230. 10.1111/j.1432-1033.1977.tb11733.x.
  24. Jackson BC, Thompson DC, Wright MW, McAndrews M, Bernard A, Nebert DW, Vasiliou V. Update of the human secretoglobin (SCGB) gene superfamily and an example of “evolutionary bloom” of androgen-binding protein genes within the mouse Scgb gene superfamily. Hum Genomics. 2011:5(6):691–702. 10.1186/1479-7364-5-6-691.
  25. Ji Z, Zhang J, Deng C, Hu Z, Du Q, Guo T, Wang J, Fan R, Han R, Yang Y. Identification of mare milk adulteration with cow milk by liquid chromatography-high resolution mass spectrometry based on proteomics and metabolomics approaches. Food Chem. 2023:405:134901. 10.1016/j.foodchem.2022.134901.
  26. Karn RC. The mouse salivary androgen-binding protein (ABP) alpha subunit closely resembles chain 1 of the cat allergen Fel dI. Biochem Genet. 1994:32(7-8):271–277. 10.1007/BF00555830.
  27. Karn RC. Steroid binding by mouse salivary proteins. Biochem Genet. 1998:36(3/4):105–117. 10.1023/A:1018708404789.
  28. Karn RC, Laukaitis CM. Selection shaped the evolution of mouse androgen-binding protein (ABP) function and promoted the duplication of Abp genes. Biochem Soc Trans. 2014:42(4):851–860. 10.1042/BST20140042.
  29. Karn RC, Yazdanifar G, Pezer Ž, Boursot P, Laukaitis CM. Androgen-binding protein (Abp) evolutionary history: has positive selection caused fixation of different paralogs in different taxa of the genus Mus? Genome Biol Evol. 2021:13(10):evab220. 10.1093/gbe/evab220.
  30. Kelley LA, Mezulis S, Yates CM, Wass MN, Sternberg MJE. The Phyre2 web portal for protein modeling, prediction and analysis. Nat Protoc. 2015:10(6):845–858. 10.1038/nprot.2015.053.
  31. Kent WJ. BLAT–the BLAST-like alignment tool. Genome Res. 2002:12(4):656–664. 10.1101/gr.229202.
  32. Kim S-D, Kang SA, Kim Y-W, Yu HS, Cho K-S, Roh H-J. Screening and functional pathway analysis of pulmonary genes associated with suppression of allergic airway inflammation by adipose stem cell-derived extracellular vesicles. Stem Cells Int. 2020:2020:5684250. 10.1155/2020/5684250.
  33. Kimura S, Yokoyama S, Pilon AL, Kurotani R. Emerging role of an immunomodulatory protein secretoglobin 3A2 in human diseases. Pharmacol Ther. 2022:236:108112. 10.1016/j.pharmthera.2022.108112.
  34. Laukaitis CM, Critser ES, Karn RC. Salivary androgen-binding protein (Abp) mediates sexual isolation in Mus musculus. Evolution. 1997:51(6):2000–2005. 10.2307/2411020.
  35. Laukaitis CM, Dlouhy SR, Karn RC. The mouse salivary androgen-binding protein (ABP) gene cluster on chromosomes 7: characterization and evolutionary relationships. Mamm Genome. 2003:14(10):679–691. 10.1007/s00335-003-2291-y.
  36. Laukaitis CM, Heger A, Blakley TD, Munclinger P, Ponting CP, Karn RC. Rapid bursts of androgen-binding protein (Abp) gene duplication occurred independently in diverse mammals. BMC Evol Biol. 2008:8(1):46. 10.1186/1471-2148-8-46.
  37. Laukaitis CM, Karn RC. Evolution of the secretoglobins: a genomic and proteomic view. Biol J Linn Soc Lond. 2005:84(3):493–501. 10.1111/j.1095-8312.2005.00450.x.
  38. Laukaitis CM, Karn RC. Recognition of subspecies status mediated by androgen-binding protein (ABP) in the evolution of incipient reinforcement on the European house mouse hybrid zone. In: Macholan M, Munclinger P, Baird SJ, Pialek J, editors. Evolution of the house mouse. Cambridge studies in morphology and molecules: new paradigms in evolutionary biology. Cambridge, UK: Cambridge University Press; 2012. p. 150–190.
  39. Lehrer RI, Xu G, Abduragimov A, Dinh NN, Qu XD, Martin D, Glasgow BJ. Lipophilin, a novel heterodimeric protein of human tears. FEBS Lett. 1998:432(3):163–167. 10.1016/S0014-5793(98)00852-7.
  40. Leonardi A, Palmigiano A, Mazzola EA, Messina A, Milazzo EMS, Bortolotti M, Garozzo D. Identification of human tear fluid biomarkers in vernal keratoconjunctivitis using iTRAQ quantitative proteomics. Allergy. 2014:69(2):254–260. 10.1111/all.12331.
  41. Li J, Xu W, Zhu Y. Mammaglobin B may be a prognostic biomarker of uterine corpus endometrial cancer. Oncol Lett. 2020:20(5):255. 10.3892/ol.2020.12118.
  42. Loux SC, Scoggin KE, Troedsson MHT, Squires EL, Ball BA. Characterization of the cervical mucus plug in mares. Reproduction. 2017:153(2):197–210. 10.1530/REP-16-0396.
  43. Lu X, Wang N, Long X-B, You X-J, Cui Y-H, Liu Z. The cytokine-driven regulation of secretoglobins in normal human upper airway and their expression, particularly that of uteroglobin-related protein 1, in chronic rhinosinusitis. Respir Res. 2011:12(1):28. 10.1186/1465-9921-12-28.
  44. Luconi M, Muratori M, Maggi M, Pecchioli P, Peri A, Mancini M, Filimberti E, Forti G, Baldi E. Uteroglobin and transglutaminase modulate human sperm functions. J Androl. 2000:21(5):676–688. 10.1002/j.1939-4640.2000.tb02136.x.
  45. Luo Z-X, Yuan C-X, Meng Q-J, Ji Q. A Jurassic eutherian mammal and divergence of marsupials and placentals. Nature. 2011:476(7361):442–445. 10.1038/nature10291.
  46. Manjunath R, Chung SI, Mukherjee AB. Crosslinking of uteroglobin by transglutaminase. Biochem Biophys Res Commun. 1984:121(1):400–407. 10.1016/0006-291X(84)90736-8.
  47. Martinu T, Todd JL, Gelman AE, Guerra S, Palmer SM. Club cell secretory protein in lung disease: emerging concepts and potential therapeutics. Annu Rev Med. 2023:74(1):427–441. 10.1146/annurev-med-042921-123443.
  48. Milosevic B, Stojanovic B, Cvetkovic A, Jovanovic I, Spasic M, Stojanovic MD, Stankovic V, Sekulic M, Stojanovic BS, Zdravkovic N, et al. The enigma of mammaglobin: redefining the biomarker paradigm in breast carcinoma. Int J Mol Sci. 2023:24(17):13407. 10.3390/ijms241713407.
  49. Mornon JP, Fridlansky F, Bally R, Milgrom E. X-ray crystallographic analysis of a progesterone-binding protein. The C2221 crystal form of oxidized uteroglobin at 2.2 A resolution. J Mol Biol. 1980:137(4):415–429. 10.1016/0022-2836(80)90166-7.
  50. Mukherjee AB, Chilton BS. Annals of the New York academy of sciences. Vol. 923. New York: New York Academy of Sciences; 2000. p. 1–358.
  51. Mukherjee AB, Zhang Z, Chilton BS. Uteroglobin: a steroid-inducible immunomodulatory protein that founded the secretoglobin superfamily. Endocr Rev. 2007:28(7):707–725. 10.1210/er.2007-0018.
  52. Munakata K, Uemura M, Takemasa I, Ozaki M, Konno M, Nishimura J, Hata T, Mizushima T, Haraguchi N, Noura S, et al. SCGB2A1 is a novel prognostic marker for colorectal cancer associated with chemoresistance and radioresistance. Int J Oncol. 2014:44(5):1521–1528. 10.3892/ijo.2014.2316.
  53. Nguyen HD, Yoshihama M, Kenmochi N. Phase distribution of spliceosomal introns: implications for intron origin. BMC Evol Biol. 2006:6(1):69. 10.1186/1471-2148-6-69.
  54. Ni J, Kalff-Suske M, Gentz R, Schageman J, Beato M, Klug J. All human genes of the uteroglobin family are localized on chromosome 11q12.2 and form a dense cluster. Ann N Y Acad Sci. 2000:923(1):25–42. 10.1111/j.1749-6632.2000.tb05517.x.
  55. Orysiak J, Malczewska-Lenczowska J, Bik-Multanowski M. Expression of SCGB1C1 gene as a potential marker of susceptibility to upper respiratory tract infections in elite athletes—a pilot study. Biol Sport. 2016:33(2):107–110. 10.5604/20831862.1196510.
  56. Özdaş  S, İzbirak  A, Özdaş  T, Özcan  KM, Erbek  SS, Köseoğlu  S, Dere  H. Single-nucleotide polymorphisms on the RYD5 gene in nasal polyposis. DNA Cell Biol. 2015:34(10):633–642. 10.1089/dna.2015.2897. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Peri  A, Cordella-Miele  E, Miele  L, Mukherjee  AB. Tissue-specific expression of the gene coding for human Clara cell 10-kD protein, a phospholipase A2-inhibitory protein. J Clin Invest. 1993:92(5):2099–2109. 10.1172/JCI116810. [DOI] [PMC free article] [PubMed] [Google Scholar]
  58. Schneider  TD, Stephens  RM. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 1990:18(20):6097–6100. 10.1093/nar/18.20.6097. [DOI] [PMC free article] [PubMed] [Google Scholar]
  59. Smith  ML, Souza  FGO, Bruce  KS, Strang  CE, Morley  BJ, Keyser  KT. Acetylcholine receptors in the retinas of the a7 nicotinic acetylcholine receptor knockout mouse. Mol Vis. 2014:20:1328–1356. http://www.molvis.org/molvis/v20/1328. [PMC free article] [PubMed] [Google Scholar]
  60. Steinegger  M, Söding  J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol. 2017:35(11):1026–1028. 10.1038/nbt.3988. [DOI] [PubMed] [Google Scholar]
  61. Stothard  P. The sequence manipulation suite: JavaScript programs for analyzing and formatting protein and DNA sequences. Biotechniques. 2000:28(6):1102,1104. 10.2144/00286ir01. [DOI] [PubMed] [Google Scholar]
  62. Stripp  BR, Lund  J, Mango  GW, Doyen  KC, Johnston  C, Hultenby  K, Nord  M, Whitsett  JA. Clara cell secretory protein: a determinant of PCB bioaccumulation in mammals. Am J Physiol. 1996:271(4\ Pt\ 1):L656–L664. 10.1152/ajplung.1996.271.4.L656. [DOI] [PubMed] [Google Scholar]
  63. Talley  HM, Laukaitis  CM, Karn  RC. Female preference for male saliva: implications for sexual isolation of Mus musculus subspecies. Evolution. 2001:55(3):631–634. 10.1554/0014-3820(2001)055[0631:FPFMSI]2.0.CO;2. [DOI] [PubMed] [Google Scholar]
  64. Taniuchi  K, Tsuboi  M, Sakaguchi  M, Saibara  T. Measurement of serum PODXL concentration for detection of pancreatic cancer. Onco Targets Ther. 2018:11:1433–1445. 10.2147/OTT.S155367. [DOI] [PMC free article] [PubMed] [Google Scholar]
  65. Teufel  F, Almagro Armenteros  JJ, Johansen  AR, Gíslason  MH, Pihl  SI, Tsirigos  KD, Winther  O, Brunak  S, von Heijne  G, Nielsen  H. SignalP 6.0 predicts all five types of signal peptides using protein language models. Nat Biotechnol. 2022:40(7):1023–1025. 10.1038/s41587-021-01156-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  66. Tizard  IR. Comparative mammalian immunology: the evolution and diversity of the immune systems of mammals. London, United Kingdom, San Diego, CA: Academic Press; 2023. [Google Scholar]
  67. Trifinopoulos  J, Nguyen  LT, von Haeseler  A, Minh  BQ. W-IQ-TREE: a fast online phylogenetic tool for maximum likelihood analysis. Nucleic Acids Res. 2016:44(W1):W232–W235. 10.1093/nar/gkw256. [DOI] [PMC free article] [PubMed] [Google Scholar]
  68. Zehentner  BK, Carter  D. Mammaglobin: a candidate diagnostic marker for breast cancer. Clin Biochem. 2004:37(4):249–257. 10.1016/j.clinbiochem.2003.11.005. [DOI] [PubMed] [Google Scholar]
  69. Zhang  Y, Li  L, Ke  X-P, Liu  P. The identification of a PTEN-associated gene signature for the prediction of prognosis and planning of therapeutic strategy in endometrial cancer. Transl Cancer Res. 2023:12(12):3409–3424. 10.21037/tcr-23-1436. [DOI] [PMC free article] [PubMed] [Google Scholar]
  70. Zhang  Z, Kundu  GC, Yuan  CJ, Ward  JM, Lee  EJ, DeMayo  F, Westphal  H, Mukherjee  AB. Severe fibronectin-deposit renal glomerular disease in mice lacking uteroglobin. Science. 1997:276(5317):1408–1412. 10.1126/science.276.5317.1408. [DOI] [PubMed] [Google Scholar]

Supplementary Materials

evaf024_Supplementary_Data

Data Availability Statement

All sequence data have been released into GenBank, and their accession numbers are listed in supplementary file S1a, Supplementary Material online.


Articles from Genome Biology and Evolution are provided here courtesy of Oxford University Press
