2009 Apr;39(4):247–266. doi: 10.1111/j.1365-2362.2009.02125.x

Table 2.

Rationale for inclusion of topics in the STREGA recommendations

Columns: Specific issue in genetic association studies; Rationale for inclusion in STREGA; Item(s) in STREGA; Specific suggestions for reporting
Main areas of special interest (See also main text)
Genotyping errors (misclassification of exposure)
  Rationale: Non-differential genotyping errors will usually bias associations towards the null [65,66]. When there are systematic differences in genotyping according to outcome status (differential error), bias in any direction may occur.
  Item(s) in STREGA:
    8(b): Describe laboratory methods, including source and storage of DNA, genotyping methods and platforms (including the allele calling algorithm used, and its version), error rates and call rates. State the laboratory/centre where genotyping was performed. Describe comparability of laboratory methods if there is more than one group. Specify whether genotypes were assigned using all of the data from the study simultaneously or in smaller batches.
    13(a): Report numbers of individuals in whom genotyping was attempted and numbers of individuals in whom genotyping was successful.
  Specific suggestions for reporting: Factors affecting the potential extent of misclassification (information bias) of genotype include the types and quality of samples, timing of collection and the method used for genotyping [18,61,136]. When high throughput platforms are used, it is important to report not only the platform used but also the allele calling algorithm and its version. Different calling algorithms have different strengths and weaknesses ([130] and supplementary information in [85]). For example, some of the currently used algorithms are notably less accurate in assigning genotypes to single nucleotide polymorphisms with low minor allele frequencies (< 0.10) than to single nucleotide polymorphisms with higher minor allele frequencies [129]. Algorithms are continually being improved. Reporting the allele calling algorithm and its version will help readers to interpret reported results, and it is critical for reproducing the results of the study given the same intermediate output files summarizing intensity of hybridization. For some high throughput platforms, the user may choose to assign genotypes using all of the data from the study simultaneously, or in smaller batches, such as by plate ([68,137] and supplementary information [85]). This choice can affect both the overall call rate and the robustness of the calls. For case-control studies, whether genotyping was done blind to case-control status should be reported, along with the reason for this decision.
Population stratification (confounding by ethnic origin)
  Rationale: When study sub-populations differ both in allele (or genotype) frequencies and disease risks, then confounding will occur, if these sub-populations are unevenly distributed across exposure groups (or between cases and controls).
  Item(s) in STREGA:
    12(h): Describe any methods used to assess or address population stratification.
  Specific suggestions for reporting: In view of the debate about the potential implications of population stratification for the validity of genetic association studies, transparent reporting of the methods used, or stating that none was used, to address this potential problem is important for allowing the empirical evidence to accrue. Ethnicity information should be presented (see for example Winker [138]), as should genetic markers or other variables likely to be associated with population stratification. Details of case-family control designs should be provided, if they are used. As several methods of adjusting for population stratification have been proposed [84], explicit documentation of the methods is needed.
Modelling haplotype variation
  Rationale: In designs considered in this article, haplotypes have to be inferred because of lack of available family information. There are diverse methods for inferring haplotypes.
  Item(s) in STREGA:
    12(g): Describe any methods used for inferring genotypes or haplotypes.
  Specific suggestions for reporting: When discrete ‘windows’ are used to summarize haplotypes, variation in the definition of these may complicate comparisons across studies, as results may be sensitive to choice of windows. Related ‘imputation’ strategies are also in use [85,91,139]. It is important to give details on haplotype inference and, when possible, uncertainty. Additional considerations for reporting include the strategy for dealing with rare haplotypes, window size and construction (if used) and choice of software.
Hardy–Weinberg equilibrium (HWE)
  Rationale: Departure from Hardy–Weinberg equilibrium may indicate errors or peculiarities in the data [128]. Empirical assessments have found that 20%–69% of genetic associations were reported with some indication about conformity with Hardy–Weinberg equilibrium, and that among some of these, there were limitations or errors in its assessment [128].
  Item(s) in STREGA:
    12(f): State whether Hardy–Weinberg equilibrium was considered and, if so, how.
  Specific suggestions for reporting: Any statistical tests or measures should be described, as should any procedure to allow for deviations from Hardy–Weinberg equilibrium in evaluating genetic associations [131].
Replication
  Rationale: Publications that present and synthesize data from several studies in a single report are becoming more common.
  Item(s) in STREGA:
    3: State if the study is the first report of a genetic association, a replication effort, or both.
  Specific suggestions for reporting: The selected criteria for claiming successful replication should also be explicitly documented.
Additional issues
Selection of participants
  Rationale: Selection bias may occur if (i) genetic associations are investigated in one or more subsets of participants (sub-samples) from a particular study; or (ii) there is differential non-participation in groups being compared; or (iii) there are differential genotyping call rates in groups being compared.
  Item(s) in STREGA:
    6(a): Give information on the criteria and methods for selection of subsets of participants from a larger study, when relevant.
    13(a): Report numbers of individuals in whom genotyping was attempted and numbers of individuals in whom genotyping was successful.
  Specific suggestions for reporting: Inclusion and exclusion criteria, sources and methods of selection of sub-samples should be specified, stating whether these were based on a priori or post-hoc considerations.
Rationale for choice of genes and variants investigated
  Rationale: Without an explicit rationale, it is difficult to judge the potential for selective reporting of study results. There is strong empirical evidence from randomised controlled trials that reporting of trial outcomes is frequently incomplete and biased in favour of statistically significant findings [140–142]. Some evidence is also available in pharmacogenetics [143].
  Item(s) in STREGA:
    7(b): Clearly define genetic exposures (genetic variants) using a widely-used nomenclature system. Identify variables likely to be associated with population stratification (confounding by ethnic origin).
  Specific suggestions for reporting: The scientific background and rationale for investigating the genes and variants should be reported. For genome-wide association studies, it is important to specify what initial testing platforms were used and how gene variants are selected for further testing in subsequent stages. This may involve statistical considerations (for example, selection of P-value threshold), functional or other biological considerations, fine mapping choices, or other approaches that need to be specified. Guidelines for human gene nomenclature have been published by the Human Gene Nomenclature Committee [144,145]. Standard reference numbers for nucleotide sequence variations, largely but not only SNPs, are provided in dbSNP, the National Center for Biotechnology Information’s database of genetic variation [146]. For variations not listed in dbSNP that can be described relative to a specified version, guidelines have been proposed [147,148].
Treatment effects in studies of quantitative traits
  Rationale: A study of a quantitative variable may be compromised when the trait is subjected to the effects of a treatment, for example, the study of a lipid-related trait for which several individuals are taking lipid-lowering medication. Without appropriate correction, this can lead to bias in estimating the effect and loss of power.
  Item(s) in STREGA:
    9(b): For quantitative outcome variables, specify if any investigation of potential bias resulting from pharmacotherapy was undertaken. If relevant, describe the nature and magnitude of the potential bias, and explain what approach was used to deal with this.
    11: If applicable, describe how effects of treatment were dealt with.
  Specific suggestions for reporting: Several methods of adjusting for treatment effects have been proposed [149]. As the approach to deal with treatment effects may have an important impact on both the power of the study and the interpretation of the results, explicit documentation of the selected strategy is needed.
Statistical methods
  Rationale: Analysis methods should be transparent and replicable, and genetic association studies are often performed using specialized software.
  Item(s) in STREGA:
    12(a): State software version used and options (or settings) chosen.
Relatedness
  Rationale: The methods of analysis used in family-based studies are different from those used in studies that are based on unrelated cases and controls. Moreover, even in the studies that are based on apparently unrelated cases and controls, some individuals may have some connection and may be (distant) relatives, and this is particularly common in small, isolated populations, for example, Iceland. This may need to be probed with appropriate methods and adjusted for in the analysis of the data.
  Item(s) in STREGA:
    12(j): Describe any methods used to address and correct for relatedness among subjects.
  Specific suggestions for reporting: For the great majority of studies in which samples are drawn from large, non-isolated populations, relatedness is typically negligible and results would not be altered depending on whether relatedness is taken into account. This may not be the case in isolated populations or those with considerable inbreeding. If investigators have assessed for relatedness, they should state the method used [150–152] and how the results are corrected for identified relatedness.
Reporting of descriptive and outcome data
  Rationale: The synthesis of findings across studies depends on the availability of sufficiently detailed data.
  Item(s) in STREGA:
    14(a): Consider giving information by genotype.
    15: Cohort study – Report outcomes (phenotypes) for each genotype category over time. Case-control study – Report numbers in each genotype category. Cross-sectional study – Report outcomes (phenotypes) for each genotype category.
Volume of data
  Rationale: The key problem is of possible false-positive results and selective reporting of these. Type I errors are particularly relevant to the conduct of genome-wide association studies. A large search among hundreds of thousands of genetic variants can be expected by chance alone to find thousands of false positive results (odds ratios significantly different from 1.0).
  Item(s) in STREGA:
    12(i): Describe any methods used to address multiple comparisons or to control risk of false positive findings.
    16(d): Report results of any adjustments for multiple comparisons.
    17(b): If numerous genetic exposures (genetic variants) were examined, summarize results from all analyses undertaken.
    17(c): If detailed results are available elsewhere, state how they can be accessed.
  Specific suggestions for reporting: Genome-wide association studies collect information on a very large number of genetic variants concomitantly. Initiatives to make the entire database transparent and available online may supply a definitive solution to the problem of selective reporting [7]. Availability of raw data may help interested investigators reproduce the published analyses and also pursue additional analyses. A potential drawback of public data availability is that investigators using the data second-hand may not be aware of limitations or other problems that were originally encountered, unless these are also transparently reported. In this regard, collaboration of the data users with the original investigators may be beneficial. Issues of consent and confidentiality [153,154] may also complicate what data can be shared, and how. It would be useful for published reports to specify not only what data can be accessed and where, but also briefly mention the procedure. For articles that have used publicly available data, it would be useful to clarify whether the original investigators were also involved and if so, how. The volume of data analysed should also be considered in the interpretation of findings. Examples of methods of summarizing results include giving distribution of P-values (frequentist statistics), distribution of effect sizes and specifying false discovery rates.

STREGA, STrengthening the Reporting of Genetic Association studies; SNP, single nucleotide polymorphism.
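
As an illustration of the kind of assessment that item 12(f) asks authors to describe, the following is a minimal sketch of one common way to check conformity with Hardy–Weinberg equilibrium: a Pearson chi-square goodness-of-fit test on genotype counts at a biallelic variant. The function name `hwe_chi_square` and the example counts are hypothetical and not taken from the STREGA statement. The chi-square approximation can be unreliable when expected counts are small, in which case an exact test may be preferable.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test of Hardy-Weinberg proportions (1 df).

    Arguments are observed genotype counts at a biallelic variant:
    homozygous A, heterozygous, homozygous B. Returns (chi2, p_value).
    Hypothetical helper for illustration only.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotyped individual is required")

    # Allele frequency of A estimated from the observed genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    # Expected counts under Hardy-Weinberg proportions p^2, 2pq, q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)

    # Cells with zero expected count (monomorphic variant) are skipped.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Upper tail of a chi-square distribution with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    chi2, p_value = hwe_chi_square(298, 489, 213)
    print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A report following item 12(f) would state which test was used (for example, chi-square or exact), in which group it was applied, and what threshold, if any, led to the exclusion of variants.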
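
The "Volume of data" row notes that a search among hundreds of thousands of variants can be expected to yield thousands of false positives by chance alone. The sketch below works through that arithmetic and shows two widely used ways of addressing multiple comparisons (items 12(i) and 16(d)): a Bonferroni-corrected threshold and the Benjamini-Hochberg step-up procedure for controlling the false discovery rate. The figures of 500,000 variants and a 0.05 significance level are illustrative assumptions, as are all function names; STREGA does not prescribe any particular method.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of nominally significant results if every null is true."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test P-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values, q=0.05):
    """Return the indices of tests rejected at false discovery rate q.

    Step-up procedure: find the largest rank k such that the k-th smallest
    P-value is <= k * q / m, then reject the k smallest P-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])


if __name__ == "__main__":
    n_variants = 500_000  # assumed number of variants tested
    print(expected_false_positives(n_variants))  # about 25,000
    print(bonferroni_threshold(n_variants))      # about 1e-07

    # Made-up P-values, for illustration only.
    p_values = [0.04, 0.001, 0.03, 0.5]
    print(benjamini_hochberg(p_values))
```

Whichever approach is chosen, items 12(i) and 16(d) ask that both the method and the adjusted results be reported, so that readers can judge how the volume of data was taken into account.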