Although genomewide association studies are increasingly popular, they present formidable logistical and technical challenges. The primary challenge lies in selecting a disease or trait suitable for analysis; an analysis is more likely to succeed when the phenotype of interest can be diagnosed or measured sensitively and specifically. Such studies also require very large sample series, involving thousands of case subjects and control subjects. Assembling them usually requires collaboration among groups that were previously competitors, which is itself a substantial obstacle. In the first stage, single-nucleotide polymorphisms (SNPs) across the genome are genotyped, almost exclusively on chip-based products made by one of two companies, Illumina or Affymetrix. The SNP content of these products differs, but recent advances allow ungenotyped SNPs to be imputed from those that have been genotyped, which makes it easier for groups that used different platforms to collaborate and compare results. Second, the SNP data are subjected to quality control and cleaning: for example, confirming that the sex inferred from X- and Y-chromosome genotypes matches the reported sex of each sample, measuring how well the samples are matched as a group, and identifying individual outliers (all on the basis of general patterns of genetic variability). This step allows the removal of samples from subjects whose ancestry differs markedly from that of the rest of the cohort, and adjustment for any systematic differences between or within cohorts. Third, each SNP that passes quality control and cleaning is tested for association with the disease or trait. Shown is a Manhattan plot, the display typically used in genomewide association studies, which plots the negative log (base 10) of each SNP's P value against its chromosomal position. Because so many statistical tests are performed, the potential for false positive results is high.
Therefore, depending on the study design, genomewide statistical significance is set at P values of approximately 1.0×10⁻⁸ or less at this stage of the analysis. The genetic models most often tested are dominant, recessive, genotypic, allelic, and additive; the additive model, which assumes that carrying one risk allele confers a risk intermediate between carrying none and carrying two, is tested most frequently. Fourth, SNPs or loci are selected for replication in an independent sample set, ideally at least as large as the sample used in the genomewide association analysis. Loci may be selected on the basis of statistical significance alone or of statistical significance combined with biologic plausibility; depending on the initial study design and the resources available, as few as 10 or as many as 20,000 SNPs may be carried forward. Fifth, replication experiments yield some combination of three outcomes: a selected locus shows clear and unequivocal association with disease, shows no association signal at all, or shows an association that is not strong enough to pass a predetermined statistical threshold. Sixth, additional genotyping in further independent replication cohorts determines whether an association with disease is genuine. Seventh, data mining at each unequivocally associated locus identifies the transcripts in and around it and maps all known genetic variation within the region. The locus is then fine-mapped by combining deep resequencing, to discover new variants, with genotyping of previously untyped variants, to determine which are most significantly associated with disease. Finally, further analysis of the region seeks to identify the critical variants, the pathologically relevant gene, and the likely biologic effect.
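The quality-control step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure used in any particular study. The sex check relies only on X-chromosome heterozygosity: males carry a single X, so outside the pseudoautosomal regions their X genotypes should be almost entirely homozygous. The Y-chromosome evidence mentioned in the legend is omitted here. The cutoffs `MALE_MAX_X_HET` and `FEMALE_MIN_X_HET` are hypothetical values. Outlier detection based on general patterns of genetic variability is often done with principal-component analysis; this sketch substitutes a simpler, commonly used screen that flags samples with unusual genome-wide heterozygosity rates. All function names are illustrative.

```python
from statistics import mean, stdev

# Hypothetical cutoffs for this sketch only. A male's X genotypes should be
# almost all homozygous; a female's should show clearly nonzero heterozygosity.
MALE_MAX_X_HET = 0.05
FEMALE_MIN_X_HET = 0.15


def x_heterozygosity(x_genotypes):
    """Fraction of called X-chromosome genotypes (e.g. "AG") that are heterozygous.

    None marks a missing call and is ignored.
    """
    called = [g for g in x_genotypes if g is not None]
    if not called:
        return None
    return sum(1 for g in called if g[0] != g[1]) / len(called)


def infer_sex(x_genotypes):
    """Return "M", "F", or None when the X heterozygosity rate is ambiguous."""
    het = x_heterozygosity(x_genotypes)
    if het is None:
        return None
    if het <= MALE_MAX_X_HET:
        return "M"
    if het >= FEMALE_MIN_X_HET:
        return "F"
    return None


def sex_check(samples):
    """Flag samples whose genotyped sex is ambiguous or differs from the reported sex.

    samples: iterable of (sample_id, reported_sex, x_genotypes).
    """
    return [sid for sid, reported, geno in samples if infer_sex(geno) != reported]


def heterozygosity_outliers(het_rates, n_sd=3.0):
    """Flag samples whose genome-wide heterozygosity lies more than n_sd SDs from the mean."""
    values = list(het_rates.values())
    mu, sd = mean(values), stdev(values)
    return [sid for sid, h in het_rates.items() if abs(h - mu) > n_sd * sd]


if __name__ == "__main__":
    samples = [  # hypothetical samples
        ("s1", "M", ["AA", "GG", "CC", "TT", "AA"]),
        ("s2", "F", ["AG", "GG", "CT", "TT", "AA"]),
        ("s3", "F", ["AA", "GG", "CC", "TT", "AA"]),  # reported female, X looks male
    ]
    print(sex_check(samples))  # ['s3']
```

Samples flagged by either check would typically be reviewed or removed before any association testing.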
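The Manhattan plot and the genomewide threshold described above reduce to simple arithmetic on per-SNP P values. Each SNP is placed on a single genome-wide x axis, with chromosomes laid end to end, and its y value is −log10(P). A cutoff on the order of 10⁻⁸ is consistent with a Bonferroni correction: at α = 0.05, a threshold of 1.0×10⁻⁸ corresponds to about 5 million independent tests (0.05 / 5,000,000). This decomposition is only an illustration, because the legend does not say how its threshold was derived. The record layout, integer chromosome labels, and function names below are assumptions of this sketch.

```python
import math

# Value quoted in the legend; real studies set it according to their design.
GENOMEWIDE_THRESHOLD = 1.0e-8


def bonferroni_threshold(alpha, n_tests):
    """Per-test P value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def manhattan_points(results):
    """Turn (snp_id, chromosome, position, p_value) records into Manhattan-plot points.

    Chromosomes are integers here. Each chromosome is shifted by the summed
    largest observed positions of the chromosomes before it, so all SNPs share
    one x axis; the y value is -log10(P).
    """
    offsets, running = {}, 0
    for chrom in sorted({r[1] for r in results}):
        offsets[chrom] = running
        running += max(r[2] for r in results if r[1] == chrom)
    ordered = sorted(results, key=lambda r: (r[1], r[2]))
    return [(snp, offsets[chrom] + pos, -math.log10(p)) for snp, chrom, pos, p in ordered]


def genomewide_hits(results, threshold=GENOMEWIDE_THRESHOLD):
    """SNP IDs whose P value meets the genomewide significance threshold."""
    return [snp for snp, _chrom, _pos, p in results if p <= threshold]


if __name__ == "__main__":
    toy = [  # hypothetical SNPs and P values
        ("snp1", 1, 1_000, 0.2),
        ("snp2", 1, 5_000, 3e-9),
        ("snp3", 2, 2_000, 0.04),
    ]
    print(manhattan_points(toy))
    print(genomewide_hits(toy))  # ['snp2']
    print(bonferroni_threshold(0.05, 5_000_000))  # about 1e-08
```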
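The five genetic models named above can all be tested from a SNP's genotype counts in cases and controls. This sketch assumes the counts are given as the numbers of subjects carrying 0, 1, and 2 copies of the risk allele. The genotypic test is a 2×3 Pearson chi-square with 2 degrees of freedom. The allelic, dominant, and recessive tests are 2×2 Pearson chi-squares with 1 degree of freedom: allele counts, carriers against noncarriers, and two copies against fewer. The additive model is implemented as the Cochran–Armitage trend test with weights 0, 1, 2, which is one standard way to test it; a regression on allele dosage is a common alternative. The sketch applies no continuity correction and assumes every expected count is nonzero. Function names are illustrative.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms exist for 1 and 2 df."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("only 1 or 2 degrees of freedom are supported here")


def pearson_chi2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2 x k table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def trend_chi2(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend statistic (1 df); weights 0/1/2 encode the additive model."""
    r_tot, s_tot = sum(cases), sum(controls)
    n = r_tot + s_tot
    col = [r + s for r, s in zip(cases, controls)]
    t = sum(w * (r * s_tot - s * r_tot) for w, r, s in zip(weights, cases, controls))
    var = (r_tot * s_tot / n) * (
        n * sum(w * w * c for w, c in zip(weights, col))
        - sum(w * c for w, c in zip(weights, col)) ** 2
    )
    return t * t / var


def association_tests(cases, controls):
    """Return {model: (chi-square, P)} for counts of 0, 1, 2 risk alleles."""
    c0, c1, c2 = cases
    k0, k1, k2 = controls
    stats = {
        "genotypic": (pearson_chi2([list(cases), list(controls)]), 2),
        "allelic": (pearson_chi2([[2 * c0 + c1, c1 + 2 * c2],
                                  [2 * k0 + k1, k1 + 2 * k2]]), 1),
        "dominant": (pearson_chi2([[c0, c1 + c2], [k0, k1 + k2]]), 1),
        "recessive": (pearson_chi2([[c0 + c1, c2], [k0 + k1, k2]]), 1),
        "additive": (trend_chi2(cases, controls), 1),
    }
    return {model: (x, chi2_sf(x, df)) for model, (x, df) in stats.items()}


if __name__ == "__main__":
    # Hypothetical counts of subjects with 0, 1, 2 risk alleles.
    results = association_tests((10, 40, 50), (50, 40, 10))
    print(round(results["additive"][0], 2))  # 53.33
```

Running every SNP that passes quality control through a function like this produces the P values that a Manhattan plot displays.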
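The three replication outcomes described in the fifth step can be expressed as a small decision rule. The legend does not specify the rule, so every detail here is an assumption. The predetermined threshold is taken to be a Bonferroni correction over the SNPs carried forward. A locus counts as having "no signal" when its replication P value exceeds α or its effect points in the opposite direction from the discovery sample. The input layout and the outcome labels are hypothetical.

```python
def classify_replication(snps, alpha=0.05):
    """Sort replicated loci into the three outcomes described in the fifth step.

    snps maps a SNP ID to (discovery_odds_ratio, replication_odds_ratio,
    replication_p). The replication threshold is a Bonferroni correction over
    the number of SNPs carried forward; this rule, and treating P > alpha or a
    reversed direction of effect as "no signal", are assumptions of the sketch.
    """
    threshold = alpha / len(snps)
    outcomes = {}
    for snp, (or_discovery, or_replication, p_replication) in snps.items():
        same_direction = (or_discovery > 1) == (or_replication > 1)
        if same_direction and p_replication <= threshold:
            outcomes[snp] = "replicated"
        elif not same_direction or p_replication > alpha:
            outcomes[snp] = "no signal"
        else:
            outcomes[snp] = "below threshold"  # candidate for further genotyping
    return outcomes


if __name__ == "__main__":
    carried_forward = {  # hypothetical odds ratios and P values
        "snpA": (1.3, 1.25, 0.001),
        "snpB": (1.2, 0.95, 0.4),
        "snpC": (1.4, 1.2, 0.03),
        "snpD": (0.8, 0.85, 0.2),
    }
    print(classify_replication(carried_forward))
```

Loci in the "below threshold" category are natural candidates for the additional genotyping in independent cohorts described in the sixth step.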