2017 Aug 10;187(4):817–827. doi: 10.1093/aje/kwx287

Table 1.

Baseline Characteristics Before and After Multiply Imputing Missing Values Among Clinical Practice Research Datalink Participants (United Kingdom, 2001–2014) Before Applying the Inclusion/Exclusion Criteria for “Justification for the Use of Statins in Prevention: an Intervention Trial Evaluating Rosuvastatin” (Multiple Countries, 2003–2008)

| Variable^a | Original CPRD Population (n = 1,438,355): % | Original CPRD Population (n = 1,438,355): Median (IQR) | Missing, % | After Multiple Imputation^b,c: % | After Multiple Imputation^b,c: Median (IQR) |
|---|---|---|---|---|---|
| Age, years | | 63 (57–72) | 0 | | 63 (57–72) |
| Male sex | 57.6 | | 0 | 57.6 | |
| BMI^d | | 26.8 (23.9–30.3) | 57.6 | | 26.7 (23.6–30.1) |
| Current smoker | 21.0 | | 35.2 | 18.0 | |
| hs-CRP, mg/L | | 4.0 (2.0–7.0) | 88.2 | | 3.8 (2.0–6.8) |
| DBP, mm Hg | | 80 (74–87) | 23.3 | | 80 (74–87) |
| SBP, mm Hg | | 139 (128–149) | 23.3 | | 138 (127–149) |
| HDL-C, mg/dL | | 54 (46–66) | 65.1 | | 56 (46–68) |
| LDL-C, mg/dL | | 135 (112–159) | 70.0 | | 135 (112–159) |
| Triglycerides, mg/dL | | 124 (91–169) | 68.2 | | 121 (89–168) |
| Total cholesterol, mg/dL | | 218 (193–244) | 57.8 | | 219 (193–245) |
| Glucose, mmol/L | | 5.1 (4.8–5.6) | 87.1 | | 5.3 (4.6–6.0) |
| Serum creatinine, mg/dL | | 1.0 (0.8–1.1) | 45.6 | | 1.0 (0.8–1.1) |

Abbreviations: BMI, body mass index; CPRD, Clinical Practice Research Datalink; DBP, diastolic blood pressure; HDL-C, high-density lipoprotein cholesterol; hs-CRP, high-sensitivity C-reactive protein; IQR, interquartile range; JUPITER, Justification for the Use of Statins in Prevention: an Intervention Trial Evaluating Rosuvastatin; LDL-C, low-density lipoprotein cholesterol; SBP, systolic blood pressure.

a The variables chosen for multiple imputation and for the model of selection into the trial were chosen for their high relevance to cardiovascular disease. Several laboratory tests that were used in the exclusion criteria of JUPITER were not selected here because they had substantial missingness in the CPRD and were not strong predictors of cardiovascular disease.

b The distributions of the variables were very similar across the 20 imputed data sets. Thus, we presented the results from a randomly selected imputed data set in the table.

c The imputation model included the following variables: age, male sex, BMI, weight within 2 years prior to the index date, weight recorded more than 2 years before the index date, tobacco smoking status within 2 years prior to the index date, smoking status recorded more than 2 years before the index date, diagnosis related to smoking, smoking cessation prescriptions, diagnosis of alcohol abuse, DBP, SBP, LDL-C, HDL-C, log-transformed triglycerides, total cholesterol, log-transformed hs-CRP, glucose, serum creatinine, use of any nonsteroidal antiinflammatory drugs, use of angiotensin-converting enzyme inhibitors, use of angiotensin II receptor blockers, use of beta blockers, use of loop or nonloop diuretics, use of calcium channel blockers, diagnosis of renal disease, calendar year, and number of general practitioner encounters. In addition, occurrence of major cardiovascular disease and rosuvastatin initiation were prospectively assessed within 2 years after the index date and were included in the multiple imputation model to improve the performance of imputation.

d Weight (kg)/height (m)^2.
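
The quantities reported in Table 1 can be illustrated with a minimal Python sketch. It computes BMI as weight (kg)/height (m)^2, summarizes a variable as percent missing plus median (IQR), and fills missing values of a right-skewed variable on the log scale before back-transforming. The function names (`bmi`, `summarize`, `impute_on_log_scale`) and all example values are hypothetical. The imputation step is a deliberately simplified stand-in, not the authors' method: it draws from a normal distribution fitted to the observed log values and ignores every covariate listed for the imputation model. It is meant only to show why imputing on the log scale keeps the imputed values positive.

```python
import math
import random
from statistics import mean, quantiles, stdev
from typing import Dict, List, Optional, Sequence


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def summarize(values: Sequence[Optional[float]]) -> Dict[str, float]:
    """Percent missing, plus median and quartiles of the observed values."""
    observed = [v for v in values if v is not None]
    q1, med, q3 = quantiles(observed, n=4, method="inclusive")
    pct_missing = 100.0 * (len(values) - len(observed)) / len(values)
    return {"pct_missing": pct_missing, "q1": q1, "median": med, "q3": q3}


def impute_on_log_scale(
    values: Sequence[Optional[float]], m: int = 20, seed: int = 0
) -> List[List[float]]:
    """Return m completed copies of `values`.

    ASSUMPTION: simplified stand-in for a real imputation model. Missing
    entries are drawn from a normal distribution fitted to the observed
    log values, then exponentiated, so every imputed value is positive.
    """
    rng = random.Random(seed)
    logs = [math.log(v) for v in values if v is not None]
    mu, sigma = mean(logs), stdev(logs)
    return [
        [v if v is not None else math.exp(rng.gauss(mu, sigma)) for v in values]
        for _ in range(m)
    ]


if __name__ == "__main__":
    # Hypothetical hs-CRP-like values (mg/L); None marks a missing measurement.
    crp = [4.0, 2.0, None, 7.0, None, 3.8, 1.5]
    completed = impute_on_log_scale(crp, m=20, seed=1)
    # Report one randomly chosen completed data set.
    chosen = random.Random(1).choice(completed)
    print(summarize(crp))
    print(summarize(chosen))
    print(round(bmi(80.0, 1.80), 1))
```

A real analysis would use a full multiple-imputation procedure that conditions on the other variables and propagates parameter uncertainty; the sketch only mirrors the layout of the table, with one summary before imputation and one after.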