Proceedings of the National Academy of Sciences of the United States of America. 2008 Sep 17;105(38):14251–14253. doi: 10.1073/pnas.0808284105

Profile of David Haussler

Philip Downey
PMCID: PMC2567157  PMID: 18799747

Sequencing the human genome was the final grand scientific achievement of the 20th century. One scientist who played a major part in organizing and analyzing the three billion base pairs of DNA that make up our genome is David Haussler, who was elected to the National Academy of Sciences in 2006. His Inaugural Article, on mathematically modeling the evolution of genomes, is published in this issue (1).

Haussler is currently a professor of biomolecular engineering at the University of California, Santa Cruz (UCSC), and a Howard Hughes Medical Institute investigator, but his route from his southern California roots to scientific prominence included a detour. He began in the humanities before settling on mathematics, and his knack for combining mathematics with his interests in biology and genomics has led him to apply math to biology throughout his career.

A Restive Youth

Haussler grew up in Los Angeles, where his father, an engineer, encouraged David's and his brother Mark's interest in science. David, however, did not follow a straight path to science. During high school he was more interested in art and psychology and, after graduating, enrolled in the Academy of Art in San Francisco in 1971, where he studied painting for three months. He soon transferred to tiny, offbeat Immaculate Heart College (IHC) in Hollywood, where he studied gestalt therapy in the hope of becoming a practicing psychologist.

During this restless time, his brother helped him find his calling. After two years at IHC, David left in 1973 to major in mathematics at Connecticut College (New London, CT).

“I think the turning point came when I went to work in my brother's lab. He's 12 years older and was a biochemist at the University of Arizona,” Haussler recalls. “He said, ‘You want a summer job? Come to my lab and I'll teach you how to do science.’ He gave me Lehninger's book on biochemistry and said, ‘Read this first.’ I read the text and worked in his lab and it was a dream summer.”

“By the end of it we had measured the levels of the hormonal form of vitamin D in the human bloodstream for the first time, and we published a paper in Science—my first publication,” Haussler recalls. “Although my job was the lab work, I also ended up doing a key step in the analysis for the paper, because I was the math major. It was really a foundational experience for me and a harbinger of things to come” (2).

Figure: David Haussler.

After completing his bachelor's degree in 1975, Haussler received a master's degree in applied mathematics in 1979 from California Polytechnic State University at San Luis Obispo. He then moved to the University of Colorado (Boulder, CO), where he obtained his Ph.D. in computer science in 1982. In 2005, he won the Classic Paper Award from the American Association for Artificial Intelligence (AAAI) for an earlier paper on learning algorithms (3).

He has also won the Dickson Prize in science from Carnegie Mellon University (Pittsburgh, PA) and the Association for Computing Machinery/AAAI Allen Newell Award, and he is a Fellow of the California Academy of Sciences, the American Academy of Arts and Sciences, and the American Association for the Advancement of Science. He is 54 and married, with two children in college.

Haussler's doctoral thesis was in pure mathematics, covering formal language theory and the theory of computation, including Turing machines. That may seem far removed from the human genome, but Haussler explains that this abstract theory of machines describes how anything that is computable can be computed, using very simple rules, on strings written over finite alphabets, “so it could be DNA in one incarnation, and something quite different in another” (4).

Sequencing Pioneers

While at Boulder in the early 1980s, Haussler befriended fellow graduate students Gene Myers and Gary Stormo, whom he met in a seminar led by his supervisor, Andrzej Ehrenfeucht. The three of them worked toward sequencing DNA and making sense of genomes.

Since their student days, the three have each had an enormous impact on genomics: Stormo developed fundamental methods of recognizing sequence motifs and other patterns in genomic data, Myers led the bioinformatics team at Celera, the private company that sequenced the human genome, and Haussler's group provided bioinformatics for the public, government-funded Human Genome Project.

“In the 1980s, we were analyzing small snippets of the E. coli genome that were available and genomes of bacteriophages like phi-X 174. These represented the first products of the early sequencing efforts, driven by the introduction of recombinant DNA methodologies,” Haussler says. “We didn't have much data to work with back then.” But that didn't stop them from preparing the techniques that could be used to analyze the large quantities of DNA to come (5).

Later that decade, Haussler moved among computer science, artificial intelligence, and statistics. “I was interested in how brain-like algorithms could be built and what their limitations and strengths were,” he says. “I wanted to know what was theoretically learnable in a very general sense.”

Even while working in mathematics, Haussler kept his eye on genomics, a young field that had only begun taking shape in the late 1970s. When more gene sequence data became available in the 1990s, he returned to the field, developing statistical models and algorithms that were later used in major genome projects (6).

“It was a long way from bacteriophage genomes to our first whole animal genome,” Haussler says. “Of course, the ultimate project was the human genome. We were recruited into the project because they wanted experts to find the genes in the DNA. We had developed a methodology for this using hidden Markov models.”
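To give a flavor of how a hidden Markov model can label DNA, the sketch below decodes a sequence with the Viterbi algorithm under a toy two-state model ("intergenic" vs. "gene"). The states, probabilities, and function name are invented for illustration; they are not the structure or parameters of the gene-finding model Haussler's group actually built (6), which was far richer.

```python
import math
from collections import Counter  # not needed here; kept minimal imports below

# Hypothetical toy model: two hidden states with made-up parameters.
STATES = ("intergenic", "gene")
START = {"intergenic": 0.9, "gene": 0.1}
TRANS = {
    "intergenic": {"intergenic": 0.9, "gene": 0.1},
    "gene": {"intergenic": 0.1, "gene": 0.9},
}
# Assumed emissions: in this toy, "gene" regions are GC-rich.
EMIT = {
    "intergenic": {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2},
    "gene": {"A": 0.15, "T": 0.15, "G": 0.35, "C": 0.35},
}


def viterbi(seq):
    """Return the most probable hidden-state path for seq (log-space Viterbi)."""
    if not seq:
        return []
    # best[s]: log-probability of the best path ending in state s
    best = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[i][s]: best predecessor of state s at position i + 1
    for base in seq[1:]:
        pointers, new_best = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[p] + math.log(TRANS[p][s]))
            pointers[s] = prev
            new_best[s] = (best[prev] + math.log(TRANS[prev][s])
                           + math.log(EMIT[s][base]))
        back.append(pointers)
        best = new_best
    # Trace back from the most probable final state.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    print(viterbi("ATATATATAT" + "GCGCGCGCGCGCGCGC" + "ATATATATAT"))
```

The decoder returns one label per base; with these invented parameters, long GC-rich stretches tend to be labeled "gene". Working in log space avoids numerical underflow on long sequences.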

He officially joined the public Human Genome Project in 1999. “When we got there, the public project had just tiny snippets of DNA scattered all over the genome in GenBank files without any cohesive map or assembly to pull them together,” Haussler recalls. There were genetic maps of the genome measured in centimorgans, radiation hybrid maps, and physical maps made from restriction enzyme digest data obtained from the thousands of artificial chromosomes, each approximately 150,000 base pairs long, that the project was sequencing.

He remembers that the project's original plans for assembling the draft genome data were not working and had to be reinvented.

“These maps were mutually inconsistent in places, the data were noisy, and trying to overlay all the sequence data—both genomic DNA snippets and cDNA sequences made from mRNAs—was just a huge jigsaw puzzle,” he says. “We couldn't even start to find the genes until we had built long stretches of continuous DNA.”

“Jim Kent, of my group, stepped in at the last minute and saved the day by writing an amazing assembly program that we call GigAssembler,” Haussler says (7). The program operates on billions of bases, integrating data from 13 sources, including the different maps and information on RNA transcripts.

Haussler says that without Kent, who wrote 20,000 lines of code in just a few months, the public project would not have caught up with Gene Myers' team at Celera, which was well funded and boasted plenty of computational power.

“We ended up doing our first assembly on 100 desktop machines that the UCSC chancellor and dean of engineering hastily purchased for us. It was unbelievable.”

After many tries, long nights, and working weekends in the spring of 2000, Haussler says, the public project was able to pull its data together in time for the unveiling of the genome by Craig Venter and Francis Collins at the White House on June 26, 2000.

Genomic Search Engine

For Haussler, the most exciting day of his scientific career came the next week, on July 7, when UCSC put the human genome sequence onto the Internet.

“That day we put out half a trillion bytes of information from the campus of UCSC, dwarfing all previous records. You could download the human genome, for free, without restriction, from Santa Cruz that day. This was humanity's first real glimpse at its own recipe.”

Figure: David Haussler's wet lab.

“It's not Jurassic Park, and it never will be, but it's a start,” he concludes.

Since then, Haussler has been assembling and scanning other genomes to find the recipes of different living animal species, and using the results to determine how the DNA of various organisms differs from that of modern humans. His team has helped decipher the mouse, chimpanzee, macaque, fruit fly, chicken, and rat genomes. All data from those projects have been put online and are publicly available through the UCSC Genome Browser.


With the browser, a simple query on, for example, the name of a gene tells anyone the chromosomal region where it occurs, replete with information about introns, exons, cross-species comparisons, and other data.

“We're logging over 200,000 hits per day on our site, not counting the several mirror sites that now exist,” Haussler says. “Usage has steadily climbed as we have put on more powerful search and interactive capabilities for access to an increasing variety of high-throughput genomic data” (8).

Along the way, the researchers have found ultra-conserved regions of the human genome that have remained unchanged for hundreds of millions of years, along with an RNA gene, expressed in early development of the brain's neocortex, that has changed dramatically only recently and could play a role in human uniqueness (9–11).

Haussler's current research is focused on the evolution of complete genomes. He wants to take the trees of life that have been created by comparisons of single genes and do the same with entire genomes, studying the changes that made species what they are, or were (12).

DNA Evolution

In his Inaugural Article, he and his collaborators present a mathematical framework for inferring deletion, insertion, duplication, and rearrangement of segments of bases, and speciation events, to reconstruct the genome of the common ancestor for a set of species.

“Our project is very ambitious, in that we want to understand these events in the evolution of our genome at all levels, from the very macro level of large scale chromosomal rearrangements down to the level of changes in individual bases. Even a single base change, such as A to C, can have enormous evolutionary consequences,” he says.

Interactions between different kinds of evolutionary operations are more complex than single-base mutations, Haussler says. “Segments of DNA can get duplicated and rearranged, and then other segments of DNA containing the original segments and their duplicate copies can themselves get duplicated and rearranged. The material involved in the second event is a mosaic created from material from the previous duplications. This process can continue to arbitrary depth.”
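The nesting Haussler describes can be illustrated with a toy simulation in which each element of a list is labeled with the ancestral segment it descends from. After a duplication and a rearrangement, a second duplication copies a region that mixes original and previously copied material, producing the kind of mosaic he mentions. The function names and the list-of-labels representation are hypothetical simplifications for illustration; they are not the infinite sites model of the Inaugural Article (1), and inversions here ignore strand orientation.

```python
from collections import Counter


def duplicate(genome, start, end, insert_at):
    """Copy genome[start:end] and insert the copy at index insert_at."""
    segment = genome[start:end]
    return genome[:insert_at] + segment + genome[insert_at:]


def invert(genome, start, end):
    """Reverse the order of genome[start:end] (a simple rearrangement).

    Simplification: real inversions also flip strand orientation, which this
    toy representation does not track.
    """
    return genome[:start] + genome[start:end][::-1] + genome[end:]


# Ancestral genome: each label names one distinct ancestral segment.
ancestor = ["a0", "a1", "a2", "a3", "a4", "a5"]

# Event 1: duplicate segments a1 a2 and place the copy after a4.
g1 = duplicate(ancestor, 1, 3, 5)
# g1: a0 a1 a2 a3 a4 a1 a2 a5

# Event 2: invert a region that contains the new copy.
g2 = invert(g1, 4, 7)
# g2: a0 a1 a2 a3 a2 a1 a4 a5

# Event 3: duplicate a region mixing original and copied material (a mosaic).
g3 = duplicate(g2, 2, 6, 8)
# g3: a0 a1 a2 a3 a2 a1 a4 a5 a2 a3 a2 a1

if __name__ == "__main__":
    print(g3)
    print(Counter(g3))  # a2 now occurs in four copies with different histories
```

Even in this small example, the four copies of segment a2 arrived by different routes, which is why reconstructing the order of events requires reasoning over the whole history rather than over each copy in isolation.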

He notes that tracing these changes along different lineages of living species that share a common ancestor requires complete genomic sequences for the living species. By applying the ideas introduced in his Inaugural Article, Haussler hopes to discern the genomic changes that led to the placental mammals, including changes that were specific to primates, apes, and humans.

“The only hope for understanding the molecular evolution of life is to understand it as a network of genomes, as a continuum of DNA that has duplicated and branched out,” Haussler says. “In particular, the molecular unity between animals is embodied in a tree of animal genomes, descended from a common ancestral genome.”

Haussler says that, by reconstructing ancestral genes and genomes, we can understand the commonalities that exist among all living and extinct species, which is ultimately the only way to understand life and the diversity of animals that we see today.

Footnotes

This is a Profile of a recently elected member of the National Academy of Sciences to accompany the member's Inaugural Article on pages 14254–14261 of volume 105.

References

  • 1. Ma J, et al. The infinite sites model of genome evolution. Proc Natl Acad Sci USA. 2008;105:14254–14261. doi: 10.1073/pnas.0805217105.
  • 2. Brumbaugh P, Haussler D, Bressler R, Haussler M. Radioreceptor assay for 1 alpha,25-dihydroxyvitamin D3. Science. 1974;183:1089–1091. doi: 10.1126/science.183.4129.1089.
  • 3. Haussler D. Quantifying inductive bias: AI learning algorithms and Valiant's learning framework. Artif Intell. 1988;36:177–221.
  • 4. Haussler D, Zeiger P. Very special languages and representations of recursively enumerable languages. Inf Control. 1980;47:201–211.
  • 5. Clift B, Haussler D, McConnell R, Schneider TD, Stormo GD. Sequence landscapes. Nucleic Acids Res. 1986;14:141–158. doi: 10.1093/nar/14.1.141.
  • 6. Krogh A, Mian IS, Haussler D. A hidden Markov model that finds genes in E. coli DNA. Nucleic Acids Res. 1994;22:4768–4778. doi: 10.1093/nar/22.22.4768.
  • 7. Kent WJ, Haussler D. Assembly of the working draft of the human genome with GigAssembler. Genome Res. 2001;11:1541–1548. doi: 10.1101/gr.183201.
  • 8. Karolchik D, et al. The UCSC genome browser database: 2008 update. Nucleic Acids Res. 2008;36(Database issue):D773–D779. doi: 10.1093/nar/gkm966.
  • 9. Bejerano G, et al. Ultraconserved elements in the human genome. Science. 2004;304:1321–1325. doi: 10.1126/science.1098119.
  • 10. Katzman S, et al. Human genome ultraconserved elements are ultraselected. Science. 2007;317:915. doi: 10.1126/science.1142430.
  • 11. Pollard KS, et al. A rapidly evolving region in the human genome is an RNA gene expressed during early neocortical development. Nature. 2006;443:167–172. doi: 10.1038/nature05113.
  • 12. Blanchette M, Green ED, Miller W, Haussler D. Reconstructing large regions of an ancestral mammalian genome in silico. Genome Res. 2004;14:2412–2423. doi: 10.1101/gr.2800104.
