editorial
Genetics. 2021 Mar 1;217(3):iyab023. doi: 10.1093/genetics/iyab023

The Descent of Databases

Howard D Lipshitz
PMCID: PMC8045696  PMID: 33789353

Following Thomas Hunt Morgan's discovery of the white mutant in 1910 (Morgan 1910), new Drosophila melanogaster mutants were discovered at a rapid pace in the Columbia University “Fly Room.” By 1923, there were more than 400 mutants—at the time referred to as “races”—that represented more than 160 genes. It, therefore, became necessary to produce the first—admittedly primitive—genetic “databases,” which appeared as a series of publications by the Carnegie Institution of Washington, starting in 1916 with 36 X-linked genes (Morgan and Bridges 1916), followed in 1919 by 39 genes on the second chromosome (Bridges and Morgan 1919) and then, in 1923, by 91 genes on the third chromosome (Bridges and Morgan 1923). Each publication listed the known genes and described their discovery, allelomorphs (subsequently shortened to “alleles”), recombination map position, phenotypes, and genetic interactions, often with multiple accompanying figures and tables. A dozen years later, Bridges reported eight genes on the small, fourth chromosome (Bridges 1935a). Many of these early-identified genes are now well known. To name a few, on the X: fused (first allele identified in 1912), Bar (1913), Notch (1913); on the second: vestigial (1910), blistered (1911), dachs (1912), morula (1913), patched (1913), Star (1916); on the third: Deformed (1913), rough (1913), spineless (1914), bithorax (1915), Delta (1918), hairy (1918); on the fourth: eyeless (1914) and cubitus interruptus (1930).

The Carnegie publications were authored by Morgan together with Calvin Bridges who, as described in an earlier editorial (Lipshitz 2021), while a Ph.D. student with Morgan, proved that genes are on chromosomes (Bridges 1914, 1916a, 1916b). Subsequently, a Carnegie Institution research grant to Morgan, valued in 1915 at $3,600 per annum (equivalent to roughly $90,000 today), was used in its entirety for the research fellow salaries of Bridges and another of Morgan's former Ph.D. students, Alfred Sturtevant (Allen 1978). When Morgan and these former students moved to Caltech in 1928, the Carnegie grant having increased to $12,575 per annum (equivalent to about $190,000 today), he was able to continue Bridges' position as a research fellow. Bridges never attained—or desired—professorial status; despite this he was elected to the National Academy of Sciences in 1936.

Bridges pioneered two key innovations, both of which are taken for granted by researchers today: cataloging and systematizing genetic information (referred to here loosely as “database development”) and establishment of a stock center for mutant strains. Apart from authoring the abovementioned publications, Bridges started, with Milislav Demerec of the Department of Genetics of the Carnegie Institution (which later became the Cold Spring Harbor Laboratory), the Drosophila Information Service (DIS), the goals of which were to provide the community with catalogs of genes, reports of new mutants and methods, and bibliographies of the Drosophila literature (Bridges and Demerec 1934). DIS also presented lists of mutant fly lines available to the research community worldwide from a stock center that Bridges was “largely instrumental” (Morgan 1940) in establishing at Caltech. Notably, DIS specifically wished to share unpublished material, stating on the cover “This is not a publication––Unpublished material presented in this circular must not be used in publications without the specific permission of the author” and in the preface that “An appreciable share of credit for the fine accomplishments in Drosophila genetics is due to the broadmindedness of the original Drosophila workers who established the policy of a free exchange of material and information among all actively interested in Drosophila research. This policy has proved to be a great stimulus for the use of Drosophila material in genetic research and is directly responsible for many important contributions.” Sharing unpublished but useful information and mutant strains with the entire research community fit squarely in the Morgan tradition of openness.

Publication of DIS continues today—Volume 103 appeared in 2020—although many of its initial roles have been superseded by online databases (about which, more below). Thirty-five years ago, on the initiative of Thom Kaufman, the Caltech stock center moved to Indiana University, where it has grown and prospered. In 1986, 1,675 stocks were transferred from Pasadena to the Bloomington Drosophila Stock Center (https://bdsc.indiana.edu/); today, it houses over 75,000 fly lines and each year sends more than 200,000 of these worldwide. Bridges would have been delighted!

As an aside, Bridges made many other innovations, including the introduction of binocular microscopes to Drosophila research (up until then hand lenses had been used—Morgan insisted on continuing to do so!); synthetic medium (agar-cornmeal-molasses-yeast, replacing bananas); temperature-controlled incubators; the Drosophila nomenclature that is still in use today; and specially manufactured glass milk bottles with a square base to optimize the amount of medium in each container and to save space in the incubators. As amusingly related by Morgan, this last-mentioned innovation failed (Morgan 1940); however, even in this case, Bridges was ahead of his time: disposable square-base plastic bottles are now in common use for fly stocks and crosses.

Hermann Muller, another of Morgan's fly room proteges, and the second model organism geneticist after Morgan (1933) to win the Nobel Prize in Physiology or Medicine (in 1946, for his Drosophila experiments showing that ionizing radiation causes genetic lesions), noted that Bridges' “gathering together and systematization of the multitudinous extant material on Drosophila mutations and technique, in the ‘Drosophila Information Service’, constituted the invaluable work upon which he was still engaged at the time of his death” (Muller 1939). When he died at age 49 in 1938, Bridges had, in fact, been working on what was to be the first comprehensive encyclopedia of all known Drosophila genes brought together alphabetically in a single publication. It was subsequently completed and edited by Katherine Brehme and appeared in 1944, again under the auspices of the Carnegie Institution (Bridges and Brehme 1944). It also included plates of Bridges' polytene chromosome maps correlated with the genetic maps of each chromosome, which he had spent his final years studying and drawing; they were reproduced from his original publications (Bridges 1935b, 1938). These maps, aligned with high-magnification photographic montages of the polytene chromosomes (Lefevre 1976), remain in use today.

As another aside for those interested in the history of genetics, it was Bridges' hypothesis in that 1935 paper, that polytene chromosome doublets might represent tandem gene duplications upon which evolution could act to produce new gene functions, that led Ed Lewis to begin his studies of genes that map to such doublets: Star-asteroid (the 21E1-2 doublet), white-apricot (3C2-3), Stubble-stubbloid (89B4-5), and the bithorax series of “pseudo-alleles” (89E1-2). It was only later that Lewis realized the significance of the last of these as master regulators of development; his initial interest was in the nature of genes and their evolution (Lipshitz 2007). Fifty years after publishing his paper on the Star-asteroid doublet (Lewis 1945), in which he invented the cis-trans test for position effect, Lewis received the third Nobel Prize awarded for research on Drosophila, together with Christiane Nüsslein-Volhard and Eric Wieschaus.

A quarter century after Bridges-Brehme appeared, Dan Lindsley and Ed Grell published an updated version of the encyclopedia of fly genes (Lindsley and Grell 1968), which was fondly referred to as “the red book” because of the color of its cover, with tongue-in-cheek reference to a “little red book” that was widely consulted at the time. The Drosophila red book appeared a few years before the first fly genes were molecularly cloned in David Hogness' laboratory at Stanford University (Wensink et al. 1974; Glover et al. 1975), triggering a revolution in our understanding of gene and chromosome structure, the mechanisms by which genes control development in space and time, and serving as the spark that ignited genome projects two decades later (Burtis et al. 2003).

In 1992, Lindsley, together with Georgianna Zimm, updated the encyclopedia to include molecular information (Lindsley and Zimm 1992). Its cover was now red and white rather than uniformly red, and the title no longer referred to “The Mutants” (Bridges-Brehme) or “Genetic Variations” (Lindsley-Grell) but, instead, to “The Genome” of Drosophila melanogaster. By the time the tome appeared it was already abundantly clear that the rate of growth of new information was so great—for example, over 60,000 scientific papers on Drosophila had been published—that any further “hard copy” publication of this type would be too unwieldy as well as out of date long before it appeared.

Fortunately, the invention of personal computers, the internet and the worldwide web, as well as Moore’s Law (Moore 1965), enabled the burgeoning growth, storage and dissemination of genetic and molecular information. FlyBase was conceptualized in 1989 and formally established in 1992, with the goal of providing a computerized, online, searchable database of genetic, phenotypic and molecular information on Drosophila (Ashburner and Drysdale 1994). Coming full circle to Bridges and his successors as editors of DIS, the original bibliography on FlyBase was produced by scanning and using optical character recognition of the bibliographies that had appeared in DIS (T.C. Kaufman, personal communication). Also in the late 1980s, a C. elegans Database (ACeDB) was established (Eeckman and Durbin 1995), later replaced by WormBase (Stein et al. 2001). Databases to support research on additional genetic model organisms were introduced roughly simultaneously with those for worms and flies, including for budding yeast (Cherry et al. 1997, 1998), Arabidopsis (Flanders et al. 1998; Huala et al. 2001), mouse (Blake et al. 1997), zebrafish (Westerfield et al. 1999), and fission yeast (Wood et al. 2012).

Paralleling the introduction of the databases was the initiation of model organism genome projects, which released physical maps, assemblies and, in due course, sequences for S. cerevisiae (Cherry et al. 1997), C. elegans (C. elegans Sequencing Consortium 1998), D. melanogaster (Adams et al. 2000) and mouse (Waterston et al. 2002). As they became available, the annotated genomes and the sequence information were added to the online databases. Apart from revolutionizing the way that model organism research is conducted, these projects provided proof-of-principle for the sequencing and assembly methods applied to the human genome. Thus, model organism research led the way in terms of experimental methods as well as in computational analysis, database design and implementation.

Morgan noted that the funding from the Carnegie Institution was essential to the success of the first three decades of Drosophila research (Morgan 1940), predating the establishment of the National Institutes of Health (NIH) and the National Science Foundation (NSF) after the Second World War. Development of the model organism databases was funded with federal support from one or more of the NIH, NSF and the Medical Research Council (UK). Several also received additional support from private foundations.

Stock centers and online databases remain the lifeblood of model organism genetic and genomic research. The databases are very heavily used. For example, in 2020, using the Google Analytics definition of “Users” (unique IP addresses) and “Sessions” (a series of page views by a unique IP address), more than 800,000 users logged on to the Saccharomyces Genome Database (SGD) for over 1.7 million sessions (J.M. Cherry, personal communication); more than 600,000 users logged on to WormBase for over 1.3 million sessions (T.W. Harris and P.W. Sternberg, personal communication); and more than 500,000 users logged on to FlyBase for over 1.6 million sessions (J. Goodman and T.C. Kaufman, personal communication). Moves to implement large reductions in federal funding for these databases have caused uncertainty about their sustainability (Oliver et al. 2016; The Alliance for Genome Resources Consortium 2019) and are both myopic and, potentially, destructive of the research engine that underlies and underpins advances in understanding of human development and disease—namely, fundamental genetic, molecular and genomic research on model organisms.

GENETICS and G3: Genes|Genomes|Genetics, published by the Genetics Society of America, have a history of commitment to the model organism databases. The journals also recognize the significance of these databases for the research communities that depend on them. For example, GENETICS and G3 pioneered the establishment of links from the databases to papers published in the two journals (Rangarajan et al. 2011). In further recognition of the importance of databases and the associated software and computational tools that promote genetic research, GENETICS recently announced a new section “Computational Resources, Software and Databases.” This section aims to publish high-quality papers describing databases, knowledgebases, software and computational resources used to query, analyze and integrate genetic, genomic or population data. Topics of interest include descriptions of databases that harbor key data in standardized form; knowledgebases that compile curated data and results; methods for obtaining, transforming and integrating data; methods of querying data; new software that helps visualize data; and computational workflows and pipelines. We hope that the research community will support this exciting section of GENETICS by submitting their best work for publication.

Conflicts of interest

None declared.

Literature cited

  1. Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, et al. 2000. The genome sequence of Drosophila melanogaster. Science. 287:2185–2195.
  2. Allen GE. 1978. Thomas Hunt Morgan: The Man and His Science. Princeton: Princeton University Press.
  3. Ashburner M, Drysdale R. 1994. FlyBase – The Drosophila genetic database. Development. 120:2077–2079.
  4. Blake JA, Richardson JE, Davisson MT, Eppig JT, Mouse Genome Informatics Group. 1997. The Mouse Genome Database (MGD). A comprehensive public resource of genetic, phenotypic and genomic data. Nucleic Acids Res. 25:85–91.
  5. Bridges CB. 1914. Direct proof through non-disjunction that the sex-linked genes of Drosophila are borne by the X-chromosome. Science. 40:107–109.
  6. Bridges CB. 1916a. Non-disjunction as proof of the chromosome theory of heredity. Genetics. 1:1–52.
  7. Bridges CB. 1916b. Non-disjunction as proof of the chromosome theory of heredity (concluded). Genetics. 1:107–163.
  8. Bridges CB. 1935a. The mutants and linkage data of chromosome four of Drosophila melanogaster. J Biol (Moscow). 4:401–420.
  9. Bridges CB. 1935b. Salivary chromosome maps––with a key to the banding of the chromosomes of Drosophila melanogaster. J Hered. 26:60–64.
  10. Bridges CB. 1938. A revised map of the salivary gland X-chromosome of Drosophila melanogaster. J Hered. 29:11–13.
  11. Bridges CB, Brehme KS. 1944. The Mutants of Drosophila melanogaster. Washington D.C: Carnegie Institution of Washington Publication 552.
  12. Bridges CB, Demerec M, editors. 1934. Drosophila Information Service, Vol. 1. Washington D.C. and Cold Spring Harbor N.Y: Carnegie Institution of Washington and Cold Spring Harbor Laboratory.
  13. Bridges CB, Morgan TH. 1919. Contributions to the Genetics of Drosophila melanogaster. II The Second Chromosome Group of Mutant Characters. Washington D.C: Carnegie Institution of Washington Publication 278, p. 123–304.
  14. Bridges CB, Morgan TH. 1923. The Third-Chromosome Group of Mutant Characters of Drosophila melanogaster. Washington D.C: Carnegie Institution of Washington Publication 327.
  15. Burtis KC, Hawley RS, Lipshitz HD. 2003. The 2003 Thomas Hunt Morgan Medal; David S. Hogness. Genetics. 164:1243–1245.
  16. C. elegans Sequencing Consortium. 1998. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science. 282:2012–2018.
  17. Cherry JM, Adler C, Ball C, Chervitz SA, Dwight SS, et al. 1998. SGD: Saccharomyces Genome Database. Nucleic Acids Res. 26:73–79.
  18. Cherry JM, Ball C, Weng S, Juvik G, Schmidt R, et al. 1997. Genetic and physical maps of Saccharomyces cerevisiae. Nature. 387:67–73.
  19. Eeckman FH, Durbin R. 1995. ACeDB and macace. Methods Cell Biol. 48:583–605.
  20. Flanders DJ, Weng S, Petel FX, Cherry JM. 1998. AtDB, the Arabidopsis thaliana database, and graphical-web-display of progress by the Arabidopsis Genome Initiative. Nucleic Acids Res. 26:80–84.
  21. Glover DM, White RL, Finnegan DJ, Hogness DS. 1975. Characterization of six cloned DNAs from Drosophila melanogaster, including one that contains the genes for rRNA. Cell. 5:149–157.
  22. Huala E, Dickerman AW, Garcia-Hernandez M, Weems D, Reiser L, et al. 2001. The Arabidopsis Information Resource (TAIR): a comprehensive database and web-based information retrieval, analysis, and visualization system for a model plant. Nucleic Acids Res. 29:102–105.
  23. Lefevre G. 1976. A photographic representation and interpretation of the polytene chromosomes of Drosophila melanogaster salivary glands, Vol. 1a. In: Ashburner M, Novitski E, editors. The Genetics and Biology of Drosophila. London: Academic Press, p. 31–66.
  24. Lewis EB. 1945. The relation of repeats to position effect in Drosophila melanogaster. Genetics. 30:137–166.
  25. Lindsley DL, Grell EH. 1968. Genetic Variations of Drosophila melanogaster. Washington D.C: Carnegie Institution of Washington Publication 627.
  26. Lindsley DL, Zimm GG. 1992. The Genome of Drosophila melanogaster. New York: Academic Press.
  27. Lipshitz HD. 2007. Genes, Development and Cancer. The Life and Work of Edward B. Lewis. Dordrecht: Springer.
  28. Lipshitz HD. 2021. The origin of GENETICS. Genetics. 217:iyaa024.
  29. Moore GE. 1965. Cramming more components onto integrated circuits. Electronics. 38:114–117.
  30. Morgan TH. 1910. Sex limited inheritance in Drosophila. Science. 32:120–122.
  31. Morgan TH. 1940. Biographical Memoir of Calvin Blackman Bridges 1889–1938. Biographical Memoirs of the National Academy of Sciences of the United States of America. 22:29–48.
  32. Morgan TH, Bridges CB. 1916. Sex-Linked Inheritance in Drosophila. Washington D.C: Carnegie Institution of Washington Publication 237.
  33. Muller HJ. 1939. Obituary: Dr. Calvin B. Bridges. Nature. 143:191–192.
  34. Oliver SG, Lock A, Harris MA, Nurse P, Wood V. 2016. Model organism databases: essential resources that need the support of both funders and users. BMC Biol. 14:49.
  35. Rangarajan A, Schedl T, Yook K, Chan J, Haenel S, et al. 2011. Toward an interactive article: integrating journals and biological databases. BMC Bioinformatics. 12:175.
  36. Stein L, Sternberg P, Durbin R, Thierry-Mieg J, Spieth J. 2001. WormBase: network access to the genome and biology of Caenorhabditis elegans. Nucleic Acids Res. 29:82–86.
  37. The Alliance for Genome Resources Consortium. 2019. The Alliance of Genome Resources: Building a Modern Data Ecosystem for Model Organism Databases. Genetics. 213:1189–1196.
  38. Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Mouse Genome Sequencing Consortium, et al. 2002. Initial sequencing and comparative analysis of the mouse genome. Nature. 420:520–562.
  39. Wensink PC, Finnegan DJ, Donelson JE, Hogness DS. 1974. A system for mapping DNA sequences in the chromosomes of Drosophila melanogaster. Cell. 3:315–325.
  40. Westerfield M, Doerry E, Kirkpatrick AE, Douglas SA. 1999. Zebrafish informatics and the ZFIN database. Methods Cell Biol. 60:339–355.
  41. Wood V, Harris MA, McDowall MD, Rutherford K, Vaughan BW, et al. 2012. PomBase: a comprehensive online resource for fission yeast. Nucleic Acids Res. 40:D695–699.

Articles from Genetics are provided here courtesy of Oxford University Press
