. 2020 Dec 14;20(Suppl 4):292. doi: 10.1186/s12911-020-01270-3

Table 4.

An example of two previous mIDA case studies annotated using the ATTEST checklist

Columns: Item, No., Recommendation, Page No. in Study (1) [20], Page No. in Study (2) [29]
Objectives
 Background/rationale
1 Explain the scientific background and rationale for the study being reported in one or two sentences
Study (1): Page 1, section “Abstract”, paragraph 1, lines 1–7
Study (2): Page 1, section “Abstract”, paragraph 1, lines 1–4
 Prespecified hypotheses
2 State prespecified hypotheses in one or two sentences
Study (1): Page 2, section “Introduction”, paragraph 3, lines 1–2
Study (2): N/A
Study design: data source selection, variable selection, and data integration
 Data source
3a Describe the time coverage
Study (1):
  FCDS: Page 2, section “Data source and case selection”, paragraph 1, line 2
  BRFSS: Page 2, section “Data source and case selection”, paragraph 1, line 6
  2000 U.S. census data: Page 2, section “Data source and case selection”, paragraph 1, line 7
Study (2):
  FCDS: Page 4, section “Data sources”, paragraph 1, line 11
  BRFSS: N/A
  United States Census Bureau: Page 4, section “Data sources”, paragraph 1, line 23
  ATSDR: N/A
  County Health Rankings & Roadmaps: N/A
3b Describe the geographic coverage
Study (1):
  FCDS: Page 2, section “Data source and case selection”, paragraph 1, lines 4–5
  BRFSS: N/A
  2000 U.S. census data: N/A
Study (2):
  FCDS: Page 4, section “Data sources”, paragraph 1, lines 12–14
  BRFSS: Page 10, section “Result”, paragraph 2, lines 7–8
  United States Census Bureau: N/A
  ATSDR: N/A
  County Health Rankings & Roadmaps: N/A
3c Describe the sample size
Study (1):
  FCDS: Page 2, section “Data source and case selection”, paragraph 2, line 7
  BRFSS: N/A
  2000 U.S. census data: N/A
Study (2):
  FCDS: Page 4, section “Data sources”, paragraph 2, lines 6–7
  BRFSS: N/A
  United States Census Bureau: N/A
  ATSDR: N/A
  County Health Rankings & Roadmaps: N/A
3d Describe the demographic distribution
Study (1):
  FCDS: Page 2, Table 1
  BRFSS: N/A
  2000 U.S. census data: N/A
Study (2): N/A
3e Describe the cohort criteria
Study (1):
  FCDS: Page 2, section “Data source and case selection”, paragraph 2, lines 1–5
  BRFSS: N/A
  2000 U.S. census data: N/A
Study (2):
  FCDS: Page 4, section “Data sources”, paragraph 2, lines 1–6
  BRFSS: N/A
  United States Census Bureau: N/A
  ATSDR: N/A
  County Health Rankings & Roadmaps: N/A
3f Describe the sources of bias
Study (1): N/A
Study (2): N/A
3g Describe the data collection approach
Study (1): N/A
Study (2):
  FCDS: N/A
  BRFSS: Page 4, section “Data sources”, paragraph 2, lines 6–7
  United States Census Bureau: N/A
  ATSDR: N/A
  County Health Rankings & Roadmaps: N/A
 Dependent variable
4a State the variable definition and variable type (e.g., primary outcome variable, secondary outcome variable)
Study (1): Survival time: Page 2, section “Variable definitions”, lines 1–3
Study (2): Cancer survival: Page 4, section “Data integration use case: The multi-level integrative data analysis of Cancer survival”, paragraph 1, lines 1–2
4b State the data source of the dependent variable
Study (1): Survival time: Page 2, section “Data source and case selection”, paragraph 1, line 2
Study (2): Cancer survival: Page 4, section “Data sources”, paragraph 1, lines 9–14
4c State the data type (e.g., numerical, categorical, date-time) of the dependent variable
Study (1): Survival time: Page 2, section “Variable definitions”, paragraph 1, line 1
Study (2): Cancer survival: N/A
4d State descriptive statistics (e.g., min, max, median, value range, percentile) of the dependent variable
Study (1): Survival time: Page 4, Table 1
Study (2): Cancer survival: N/A
4e State the NIMHD domain and levels of the dependent variable
Study (1): Survival time: Page 2, section “Data source and case selection”, paragraph 1, lines 1–2
Study (2): Cancer survival: Page 4, section “Data sources”, paragraph 2, line 15
 Independent variable
5a State the variable definition and variable type (e.g., primary predictor, secondary predictor)
Study (1):
  Socioeconomic status: Page 2, section “Variable definitions”, paragraph 3, lines 1–2
  Individual smoking: Page 2, section “Data source and case selection”, paragraph 2, lines 1–2
  Regional smoking: Page 3, section “Data source and case selection”, paragraph 2, lines 4–6
Study (2):
  Demographic variables: Page 5, Table 1
  Smoking status: Page 10, section “The ontology for Cancer research variables (OCRV)”, paragraph 2, lines 13–27
  Marital status: Page 14, section “Type 4: Queries that generate results based on the knowledge encoded in ontology”, paragraph 2, lines 7–10
  Insurance payer: Page 5, Table 1
  Residency: Page 5, Table 1
  Age at diagnosis: Page 5, Table 1
  Year of diagnosis: Page 5, Table 1
  Tumor stage: Page 5, Table 1
  Tumor type: Page 5, Table 1
  Treatment procedure: Page 5, Table 1
  Census Tract SVI: Page 14, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, lines 5–16
  Census tract high school completion rates: Page 5, Table 1
  Census tract family poverty rates: Page 5, Table 1
  Census tract rurality status: Page 4, section “Data integration use case: The multi-level integrative data analysis of Cancer survival”, paragraph 1, lines 8–11
  County adult mental and physical health status: Page 5, Table 1
  County density of primary care physicians: Page 5, Table 1
  County smoking rate: Page 10, section “The ontology for Cancer research variables (OCRV)”, paragraph 2
  County alcohol consumption rate: Page 5, Table 1
5b State the data type (e.g., numerical, categorical) of the independent variable
Study (1):
  Socioeconomic status: Page 2, section “Variable definitions”, paragraph 3, lines 9–10
  Individual smoking: Page 2, section “Data source and case selection”, paragraph 2, lines 2–3
  Regional smoking: Page 3, section “Data source and case selection”, paragraph 2, lines 4–6
Study (2):
  Demographic variables: N/A
  Smoking status: Page 13, Table 3
  Marital status: Page 14, section “Type 4: Queries that generate results based on the knowledge encoded in ontology”, paragraph 2, lines 7–10
  Insurance payer: N/A
  Residency: N/A
  Age at diagnosis: Page 16, Fig. 6
  Year of diagnosis: Page 16, Fig. 6
  Tumor stage: N/A
  Tumor type: Page 4, section “Data sources”, paragraph 2, lines 1–6
  Treatment procedure: Page 5, Table 1
  Census Tract SVI: Page 14, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, lines 5–16
  Census tract high school completion rates: N/A
  Census tract family poverty rates: N/A
  Census tract rurality status: N/A
  County adult mental and physical health status: N/A
  County density of primary care physicians: N/A
  County smoking rate: Page 10, section “The ontology for Cancer research variables (OCRV)”, paragraph 2
  County alcohol consumption rate: N/A
5c State the data source of the independent variable
Study (1):
  Socioeconomic status: Page 2, section “Data source and case selection”, paragraph 1, lines 6–7
  Individual smoking: Page 2, section “Data source and case selection”, paragraph 1, lines 1–2
  Regional smoking: Page 2, section “Data source and case selection”, paragraph 1, lines 7–10
Study (2): Page 5, Table 1
5d State descriptive statistics (e.g., min, max, median, value range, percentile) of the independent variable
Study (1): Page 4, Table 1
Study (2): N/A
5e State the NIMHD domain and levels of the independent variable
Study (1):
  Socioeconomic status: Page 2, section “Data source and case selection”, paragraph 1, line 6
  Individual smoking: Page 2, section “Data source and case selection”, paragraph 2, line 1
  Regional smoking: Page 3, section “Data source and case selection”, paragraph 2, lines 4–6
Study (2): Page 5, Table 1
 Controlled variable
6a State each controlled variable and its variable type (e.g., numerical, categorical)
Study (1):
  Age of diagnosis: Page 2, section “Variable definitions”, paragraph 1, lines 10–13
  Anatomic site: Page 2, section “Variable definitions”, paragraph 1, lines 2–9
  Race-ethnicity: Page 4, Table 1
  Marital status: Page 4, Table 1
  Insurance: Page 4, Table 1
  Year of diagnosis: Page 4, Table 1
  Gender: Page 4, Table 1
  Stage of diagnosis: Page 4, Table 1
  Treatment: Page 4, Table 1
Study (2): N/A
6b State the data source of the controlled variable
Study (1): Page 2, section “Data source and case selection”, paragraph 1, line 2ᵃ
Study (2): N/A
6c State descriptive statistics (e.g., min, max, median, value range, percentile) of the controlled variable
Study (1): Page 2, section “Data source and case selection”, paragraph 1, line 2ᵃ
Study (2): N/A
6d State the NIMHD domain and levels of the controlled variable
Study (1): Page 2, section “Data source and case selection”, paragraph 1, lines 1–5ᵃ
Study (2): N/A
 Missing data
7a For each data source, describe any required or expected variables that are not present
Study (1): N/A
Study (2): N/A
7b For each variable, describe the method used to handle missing data
Study (1): N/A
Study (2): N/A
7c For each variable, describe the missing rate
Study (1): N/A
Study (2): N/A
 Data processing
9a Data extraction: for each variable, describe how the raw data source is processed to extract the variable
Study (1): N/A
Study (2):
  Demographic variables: Page 15, Fig. 5
  Age at diagnosis: Page 16, Fig. 6
  Census Tract SVI: Page 16, Fig. 7
  County smoking rate: Page 17, Fig. 8
  Marital status: Page 18, Fig. 9
9b Data cleaning: for each variable, describe the method used to detect and correct (or remove) incorrect records, missing values, or outliers
Study (1): N/A
Study (2): N/A
 Integration strategy
10 Describe the integration strategy for each variable: (1) integration with variables from the same level, (2) integration with variables from different levels, and (3) creation of additional computed elements
Study (1):
  Socioeconomic status: Page 2, section “Variable definitions”, paragraph 3, lines 6–7
  Regional smoking: Page 2, section “Variable definitions”, paragraph 2, lines 4–5
Study (2):
  Demographic variables: Page 15, Fig. 5
  Age at diagnosis: Page 16, Fig. 6
  Census Tract SVI: Page 16, Fig. 7
  County smoking rate: Page 17, Fig. 8
  Marital status: Page 18, Fig. 9
  Census tract high school completion rates: Page 15, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, lines 1–3
  Census tract family poverty rates: Page 15, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, lines 1–3
  Census tract rurality status: Page 15, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, lines 1–3
  County adult mental and physical health status: Page 15, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, lines 1–3
  County density of primary care physicians: Page 15, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, lines 1–3
  County alcohol consumption rate: Page 15, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, lines 1–3
 Integration algorithms
11 For each variable, describe the algorithm used to integrate it with variables from other data sources
Study (1): N/A
Study (2):
  Demographic variables: Page 15, Fig. 5
  Age at diagnosis: Page 16, Fig. 6
  Census Tract SVI: Page 16, Fig. 7
  County smoking rate: Page 17, Fig. 8
  Marital status: Page 18, Fig. 9
  Census tract high school completion rates: Page 15, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, lines 1–3
  Census tract family poverty rates: Page 15, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, lines 1–3
  Census tract rurality status: Page 15, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, lines 1–3
  County adult mental and physical health status: Page 15, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, lines 1–3
  County density of primary care physicians: Page 15, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, lines 1–3
  County alcohol consumption rate: Page 15, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, lines 1–3
 Variable validation
12 For each variable, describe the data validation rule; the rule should identify both the variable and the validation algorithms
Study (1): N/A
Study (2): Demographic variables: Page 19, section “Data quality and consistency checks of the source data using the ontology”
 Integrated variable
13 Describe the variable after integration and its basic descriptive statistics (e.g., min, max, median, value range, percentile)
Study (1): N/A
Study (2): Page 18, Table 4

FCDS: Florida Cancer Data System

ATSDR: Agency for Toxic Substances and Disease Registry

BRFSS: Behavioral Risk Factor Surveillance System

ᵃ If an item is reported in the same place for all variables or data sources, the page/section/table information can be listed once. For the integration-related items, only variables that have this information are presented (N/A entries are not shown in the table).