Genomics, proteomics, vaccinology, transgenics, stem cell research: advances in all these areas stand critically on the shoulders of tissue culture, our ability to cultivate an organism's living cells in plastic dishes. Decades of painstaking cell gardening and nutritional trial and error laid the groundwork for the several thousand human primary cell explants and immortal tumor lines available to modern biotechnology. Now, the 50-year-old problem of cell line misidentification from cell contamination, mislabeling, or, in some cases, conscious deceit has a brand-new tool for cell and individual validation: a composite short tandem repeat (STR, also called genomic microsatellite) genotype signature (1). This latest advance in cell identification technology is the most powerful forensic approach yet for dispensing with the embarrassing, expensive, and maddening cell contamination that occurs in biomedical laboratories.
The extent of inadvertent cell line contamination is enormous. During the 1970s and 1980s, as many as one in three cell lines deposited in cell culture repositories were imposters, one cell line overtaking or masquerading as another. The most notorious culprit was a cervical carcinoma line, HeLa, established by George Gey at the Johns Hopkins Medical School in 1951 from a 31-year-old mother of four, Henrietta Lacks (2) (Fig. 1). HeLa cells were unlike other primary cervical cancer explants in that they grew prolifically in culture, perhaps too aggressively. In the years that followed, nearly every basic cancer research laboratory grew HeLa cells and attempted to repeat primary tumor cell explantation from other people's cancer cells. But too frequently, as vividly documented in Michael Gold's popular book, A Conspiracy of Cells (3), the new tumor cells were mysteriously replaced by ubiquitous HeLa cells. Stanley Gartler, subject editor of the report in this issue of PNAS (1), first unveiled the hoary deception at a cell culture conference in Bedford, PA, in 1966. Gartler was struck that the first 18 established human cell lines he tested expressed a G6PD-A allozyme genotype, an allele restricted to African Americans, even though the origin labeled on most cell lines was tumors from Caucasians (4). HeLa cells were African American in origin, G6PD-A, ubiquitous in cancer cytology labs, and fully capable of infiltrating slower, plodding primary cell cultures. Gartler opined that HeLa was overtaking these cells surreptitiously, a conclusion that would undermine the significance of research reports using the cell contaminants.
Over the next 15 years, a charismatic, if vitriolic, cytogenetic crusader named Walter Nelson-Rees unmasked scores of human cell lines by identifying three highly rearranged unique “marker” chromosomes that confirm HeLa cell contamination as a backup to the G6PD-A genotype. He exposed HeLa contamination in over 40 different human cultures, all labeled as something else (3, 5–8). The cell culture community had a very large problem.
Human emotions were on edge, red faces were appearing in the most prestigious laboratories, and discussions of the problem rapidly lost any semblance of civility. The cost of cell line mix-ups, both monetary and to science, is considerable. Hundreds of scientific reports based on fraudulent cell lines were published, and tainted research, estimated in value well in excess of 10 million dollars, was discredited. Each incident of cell contamination had a lead researcher's name attached, and all were branded with Nelson-Rees' “scarlet letter,” even if they had not actually caused the mix-up. Careers were derailed, epithets were slung, and science stumbled. The cell culture community learned from this sorry episode but is still looking for a better fix. Stan Gartler migrated back to genetic studies; Walter Nelson-Rees retired abruptly in 1981, a casualty of the U.S. National Cancer Institute's retreat from Richard Nixon's celebrated War on Cancer, but not before publishing several very specific hit lists of notorious cell contaminants in Science magazine (5–8). Even so, cell contamination continues into the 21st century. Last year, it was reported that 18% of 252 new cell cultures deposited at a German cell line repository were contaminated by another cell line (9).
Given the critical importance of cell line integrity for vaccinology, for tumor research, and for new developing research (see below), the cell biologist should welcome a cheap verifiable technology to avoid these costly mistakes. The report by Masters et al. (1) in this issue goes a long way in this direction.
The HeLa era, championed by Gartler, Nelson-Rees, and their colleagues, opened our eyes to the breadth of the problem. However, we wondered whether cell lines other than HeLa were also getting mixed up. To help uncover these, my own group developed the “allozyme genetic signature” assay for cells, which determines the composite genotype of seven polymorphic enzyme loci (10, 11). By combining genotypes at multiple polymorphic allozyme loci, we could develop a nearly unique genetic signature for identifying an individual cell line.
In addition, the statistical probability of a chance match of any two individual genotypes could be estimated, because that probability is equivalent to the frequency of the composite genotype in the population. That frequency is computed (by what forensic experts now call the “product rule”; refs. 12 and 13) as the product of the included allozyme genotype frequencies. Thus, if a cell line's genotype frequencies at seven distinct allozyme loci were 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, and 0.7, then the population frequency of the cell's composite genotype, and the likelihood of a chance match for that genotype, would be 0.1 × 0.2 × 0.3 × 0.4 × 0.5 × 0.6 × 0.7 ≈ 0.0005. Empirically, the genotype frequencies of tested human cell lines had an average value of 0.004 and ranged from 0.02 to <10⁻⁴ (11). The allozyme approach seemed robust in most cases but suffered when multiple cell lines were considered. Although the likelihood of a match was about 0.004 (the average composite genotype frequency) for two cell lines, there was a 50% chance of a match when 21 cell lines were genotyped and compared with each other (11, 14). This imprecision in designing a unique cell line genetic profile was assuaged somewhat by supplementing the allozyme signature with karyology and HLA typing, but until the dawn of DNA technologies in the 1980s, the resolution was limited and subject to some uncertainty because of statistical creep (the growing likelihood of a chance match) with multiple tests.
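The product rule and the "statistical creep" described above can be sketched in a few lines of Python. This is a simplified illustration, not the calculation used in refs. 11 and 14: the function names are hypothetical, and the second function assumes that every pair of cell lines matches independently with the same probability (here the average composite genotype frequency of 0.004), which the cited analyses do not assume.

```python
from math import comb, prod


def composite_frequency(locus_frequencies):
    """Product rule: multiply the per-locus genotype frequencies."""
    return prod(locus_frequencies)


def prob_any_chance_match(n_lines, pair_match_prob):
    """Probability that at least one pair among n_lines matches by chance.

    Simplifying assumption (not from the source): all pairwise
    comparisons are independent and share one match probability.
    """
    n_pairs = comb(n_lines, 2)  # number of distinct pairwise comparisons
    return 1.0 - (1.0 - pair_match_prob) ** n_pairs


# The seven-locus example from the text: product is about 0.0005.
example = composite_frequency([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7])
print(f"composite genotype frequency: {example:.6f}")

# 21 lines give 210 pairwise comparisons, so a per-pair match
# probability of 0.004 compounds to more than one chance in two.
print(f"P(any match among 21 lines): {prob_any_chance_match(21, 0.004):.2f}")
```

Under these simplifying assumptions the 21-line figure comes out at roughly 0.57, the same order as the 50% quoted above; the difference reflects the crude independence assumption rather than a correction to the published value.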
Multilocus DNA fingerprinting was proposed as an improved cell contamination monitor in 1990 by Dennis Gilbert et al. (15). This technology, originally introduced by Alec Jeffreys, was the first major advance in DNA individual identification in forensic cases (16, 17). Gilbert et al. reported human cell line genetic variation in 235 restriction fragments resolved in a “bar-code”-like pattern of Southern blots hybridized with human minisatellite probe 33.6. The computed population frequencies of the derived DNA fingerprints had a median of 2.9 × 10⁻¹⁷ and ranged between 2.4 × 10⁻²¹ and 6.6 × 10⁻¹⁵. These statistics were great, but there was a downside. Producing reproducible DNA fingerprints requires a fastidious technology not easily performed outside of experienced molecular biology labs. Further, the technique was so sensitive that it revealed slight differences (0.3–3% pairwise differences) even among cell lines known to be the same, largely because of allele “dropout” or loss, likely a consequence of cell culture-derived aneuploid evolution. Cell lines from different individuals commonly differed in 72–84% of their restriction fragments, whereas cell lines from the same person showed 0.3–2.9% fragment differences, a distinction sufficient to discriminate intraspecies cell contamination.
This issue's report (1) improves appreciably on previous genetic identification for cell lines. Six unlinked autosomal STR loci and one X-linked STR locus are used to determine individual genotypes that are the most distinctive yet. STRs are short (2–4 bp) repeated, or “stutter,” nucleotide sequences that are abundant in vertebrate genomes, estimated at about 100,000 loci randomly dispersed in the human genome. Because of their repeat structure, their mutation rate (10⁻² to 10⁻⁴ mutations per meiosis) is higher than that of most coding gene sequences. The higher mutation rate results in an accumulation of 8–20 alleles at each locus. This high level of polymorphism translates to near-ideal genetic signatures for individual identification. Typical composite STR genotypes occur infrequently, on the order of 10⁻⁸ or lower, effectively guaranteeing genetic uniqueness. The authors used a multiplex format allowing assessment of all seven loci in a single run after PCR amplification. The STR loci selected are “tetranucleotide repeats,” because alleles of these show single peaks on the Perkin–Elmer–Applied Biosystems DNA sequencing assessment machines, unlike the more common “dinucleotide” repeats, which produce shadow bands for each allele that can confuse multiplex allele scoring. The six autosomal loci are mapped to different human chromosomes, assuring that linkage disequilibrium among them would be minimized. As a consequence, each STR locus genotype frequency would be statistically independent of the other loci. This independence allows the composite genotype frequency, and thus the likelihood of a chance match, to be estimated by using the forensic “product rule” multiplication (12, 13).
Masters et al. (1) validate their method empirically by testing 20 cell lines they had requested from five international cell repositories and five cancer centers. Included in their analysis were 131 cell lines that were known to involve previous mix-ups and 127 that were not. Unrelated cell lines share an average of 20% of their alleles, with a range of 0–60% allele identity. Among the known mix-ups (264 pairwise combinations), they found that 99% had over 70% allele identity. On the basis of this frequency distribution [figure 2 in their report (1)], they suggest 80% allele identity as an empirical cut-off, above which any two cell lines would be pronounced a likely match. Because nearly all of the unique cell line STR profiles have less than 70% matching, the cut-off seems appropriate.
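A pairwise allele-identity comparison with an 80% cut-off might be sketched as follows. The exact formula used by Masters et al. is not given here, so this sketch assumes one plausible definition: alleles shared at each locus divided by all distinct alleles seen at that locus in either profile, summed over loci. The locus names, allele values (repeat counts), and function names are all hypothetical.

```python
def allele_identity(profile_a, profile_b):
    """Fraction of alleles shared between two STR profiles.

    Assumed definition (not taken from the source): shared alleles
    divided by all distinct alleles, pooled over the loci typed in both.
    """
    shared = 0
    total = 0
    for locus in profile_a.keys() & profile_b.keys():
        a, b = set(profile_a[locus]), set(profile_b[locus])
        shared += len(a & b)
        total += len(a | b)
    return shared / total if total else 0.0


def is_likely_match(profile_a, profile_b, cutoff=0.80):
    """Apply the empirical cut-off: identity above it suggests a match."""
    return allele_identity(profile_a, profile_b) > cutoff


# Hypothetical profiles: locus name -> alleles (repeat counts).
line_a = {"locus1": (11, 12), "locus2": (9,), "locus3": (14, 17), "locus4": (8, 10)}
# Same origin as line_a, but one allele lost at locus3.
line_b = {"locus1": (11, 12), "locus2": (9,), "locus3": (14,), "locus4": (8, 10)}
# An unrelated line sharing only a couple of alleles.
line_c = {"locus1": (10, 13), "locus2": (9,), "locus3": (15, 16), "locus4": (8, 12)}

print(is_likely_match(line_a, line_b))  # 6 of 7 alleles shared
print(is_likely_match(line_a, line_c))  # 2 of 12 alleles shared
```

With this definition a single allele loss still leaves the pair well above the cut-off, while the unrelated pair falls far below it, which is the separation the empirical threshold relies on.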
The STR DNA profile is particularly robust because it maximizes genetic informativeness by using several loci that show average heterozygosities of 79.1–87.8%. Among cells from the same individual, there were three categories of rare differences: (i) quantitative allele differences; (ii) allele loss; and (iii) allele gain. The first two would be expected and are explicable by the developing aneuploidy common in cell cultures, particularly neoplastic cells. An allele gain producing three or more alleles at an STR locus would signal cell contamination, in which a new cell genotype has somehow been introduced into the culture. Understanding these aspects makes the new technologies particularly alluring and less prone to misinterpretation.
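The allele loss and allele gain categories above lend themselves to a simple per-locus screen against a reference profile. The following is a minimal sketch under stated assumptions: profiles are given as sets of called alleles only (so quantitative peak-height differences are not modeled), and the function names, locus names, and flag wording are hypothetical.

```python
def compare_locus(reference_alleles, sample_alleles):
    """Flag differences at one STR locus relative to a reference profile."""
    ref, obs = set(reference_alleles), set(sample_alleles)
    flags = []
    if ref - obs:
        # Expected with aneuploidy that develops in culture.
        flags.append("allele loss")
    if obs - ref:
        flags.append("allele gain")
    if len(obs) >= 3:
        # Three or more alleles at one locus points to a second genotype.
        flags.append("three or more alleles: possible contamination")
    return flags


def screen_profile(reference, sample):
    """Return only the loci (typed in both profiles) that raise flags."""
    report = {}
    for locus in sorted(reference.keys() & sample.keys()):
        flags = compare_locus(reference[locus], sample[locus])
        if flags:
            report[locus] = flags
    return report


# Hypothetical reference and later-passage profiles of one cell line.
reference = {"locus1": (11, 12), "locus2": (9, 13), "locus3": (14, 17)}
sample = {"locus1": (11,), "locus2": (9, 13, 15), "locus3": (14, 17)}

for locus, flags in screen_profile(reference, sample).items():
    print(locus, flags)
```

Here locus1 is flagged for allele loss only, locus2 for an allele gain that yields three alleles, and locus3 is left out because it is unchanged.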
The technical advances are timely and important for quality control of available cell lines. The American Type Culture Collection lists 4,000 cell lines in its repository; over 2,000 are human. It seems critical that the integrity of these and other culture collections be vigorously monitored. As more and more human disease gene discovery cohorts are assembled by using hundreds, even thousands, of lymphoblastoid cell lines (e.g., for our AIDS cohorts, my own laboratory has established over 7,000 human cell lines), vigilance in avoiding all mix-ups is paramount (18).
As cell biology research on animal cell lines expands, their genetic integrity should also be maintained. This is particularly important, because genetic resources of rare and endangered species are increasingly banked as fibroblast cell lines (19). Fortunately, STR gene maps of numerous domestic animal species developed as a part of comparative gene mapping projects can be applied to DNA profiling of cell lines from domestic and related species (20). As nuclear transfer methodologies and mammalian species cloning advance, primary fibroblast cell culture applications, which can serve as nuclear clone donors, would certainly be expanded. Increased cell culture applications increase the likelihood of cell mix-ups and the need for DNA profile-based genetic validation.
The STR methods offered by Masters et al. (1) are powerful, relatively inexpensive (<$200 per test), and commercially available at least for human cells. The authors recommend that science journal editors set genetic validation/characterization as a criterion for publication, as has been in force for the cell culture primary techniques journal In Vitro. This is a sound concept, and both editors and referees of such papers should undertake a renewed vigilance in avoiding a rerun of this sorry episode in the history of cell technology. It matters not so critically which genetic tools one might use to identify the cells, but with STRs and a few million single nucleotide polymorphisms available for profiling (21), there seems little need to continue to work with cells whose origins are suspect.
Footnotes
See companion article on page 8012.
References
1. Masters J R, Thomson J A, Daly-Burns B, Reid Y A, Dirks W G, Packer P, Toji L H, Ohno T, Tanabe H, Arlett C F, et al. Proc Natl Acad Sci USA. 2001;98:8012–8017. doi: 10.1073/pnas.121616198. (First Published June 19, 2001)
2. Gey G O, Coffman W D, Kubicek M T. Cancer Res. 1952;12:264–265.
3. Gold M. A Conspiracy Of Cells, One Woman's Immortal Legacy and The Medical Scandal It Caused. Albany, NY: State Univ. of New York Press; 1986.
4. Gartler S. Nature (London) 1968;217:750–751. doi: 10.1038/217750a0.
5. Nelson-Rees W A, Flandermeyer R R, Hawthorne P K. Science. 1974;184:1093–1096. doi: 10.1126/science.184.4141.1093.
6. Nelson-Rees W A, Flandermeyer R R. Science. 1976;191:96–98. doi: 10.1126/science.1246601.
7. Nelson-Rees W A, Daniels D W, Flandermeyer R R. Science. 1981;212:446–452. doi: 10.1126/science.6451928.
8. Nelson-Rees W A, Hunter L, Darlington G J, O'Brien S J. Cytogenet Cell Genet. 1980;27:216–231. doi: 10.1159/000131490.
9. MacLeod R A F, Dirks W G, Matsuo Y, Kaufmann M, Milch H, Drexler H G. Int J Cancer. 1999;83:555–563. doi: 10.1002/(sici)1097-0215(19991112)83:4<555::aid-ijc19>3.0.co;2-2.
10. O'Brien S J, Kleiner G, Olson R, Shannon J. Science. 1977;195:1345–1348. doi: 10.1126/science.841332.
11. O'Brien S J, Shannon J E, Gail M H. In Vitro. 1980;16:119–135. doi: 10.1007/BF02831503.
12. National Research Council Technology in Forensic Science. The Evaluation of Forensic DNA Evidence. Washington, DC: National Academy Press; 1996.
13. Budowle B, Monson K L, Chakraborty R. Int J Legal Med. 1996;108:173–176. doi: 10.1007/BF01369786.
14. Gail M H, Weiss G H, Mante N, O'Brien S J. J Appl Prob. 1979;16:242–251.
15. Gilbert D A, Reid Y A, Gail M H, Pee D, White C, Hay R J, O'Brien S J. Am J Hum Genet. 1990;47:499–514.
16. Jeffreys A J, Wilson V, Thein S L. Nature (London) 1985;314:67–73. doi: 10.1038/314067a0.
17. Jeffreys A J, Wilson V, Thein S L. Nature (London) 1985;316:76–79. doi: 10.1038/316076a0.
18. O'Brien S J, Nelson G W, Winkler C A, Smith M W. Annu Rev Genet. 2000;34:563–591. doi: 10.1146/annurev.genet.34.1.563.
19. Ryder O A, McLaren A, Brenner S, Zhang Y-P, Benirschke R. Science. 2000;288:275–277. doi: 10.1126/science.288.5464.275.
20. O'Brien S J, Menotti-Raymond M, Murphy W J, Nash W G, Wienberg J, Stanyon R, Copeland N G, Jenkins N A, Womack J E, Marshall-Graves J A. Science. 1999;286:458–481. doi: 10.1126/science.286.5439.458.
21. Sachidanandam R, Weissman D, Schmidt S C, Kakol J M, Stein L D, Marth G, Sherry S, Mullikin J C, Mortimore B J, Willey D L, et al. Nature (London) 2001;409:928–933. doi: 10.1038/35057149.