2018 Nov 20;15(11):e1002695. doi: 10.1371/journal.pmed.1002695

Table 1. Predictors considered, and how they are represented in CPRD and in our models.

| Category | Predictor | Representation in CPRD | Representation in models | In QA / In QA+ / In T |
| --- | --- | --- | --- | --- |
| Demographics | Age | Year of birth | Computed based on mid-year of year of birth | * * * |
| | Sex | Binary variable (male/female) | Binary variable | * * * |
| | Ethnicity | Ethnicity (categorical value) | Categorical variable | * * * |
| Lifestyle and family history | Socioeconomic status | Index of Multiple Deprivation | Numeric variable on a scale of 1 to 5 | * * * |
| | BMI | Weight measurement recorded repeatedly in various clinic visits | BMI based on height and most recent recorded weight | * * * |
| | Smoking status | Current tobacco use, in terms of number of cigars/cigarettes per day (recorded repeatedly) | Categorical variable for latest status: non-smoker, ex-smoker, light smoker (less than 10 cigarettes/day), moderate smoker (10–20 cigarettes/day), heavy smoker (more than 20 cigarettes/day), smoker (amount not recorded) | * * * |
| | Alcohol intake | Current alcohol consumption, in terms of units of alcohol per day (recorded repeatedly) | Categorical variable for latest status: non-drinker, ex-drinker, trivial (less than 1 unit/week), light (1–2 units/week), moderate (3–6 units/week), heavy (7–9 units/week), very heavy (more than 9 units/week), drinker (amount not recorded) | * * * |
| | Family history of chronic disease | Binary variable (yes/no) | Binary variable (yes/no) | * * * |
| | Strategic health authority (region) | Categorical variable | Categorical variable | * * * |
| | Marital status | Categorical variable | Categorical variable | * * |
| Use of care | Previous emergency admissions | Read Code and date of event | Number of occurrences during last year | * * * |
| | | | Time since last occurrence (in days) | * |
| | Prior GP visits (consultations) | Read Code and date of event | Number of occurrences during last year | * * |
| | | | Time since last occurrence (in days) | * |
| | | | Total duration spent in GP visits (minutes) | * |
| Clinical diagnoses (comorbidities) | Diabetes, atrial fibrillation, cardiovascular disease, congestive cardiac failure, venous thromboembolism, cancer, asthma or COPD, epilepsy, falls, manic depression or schizophrenia, chronic renal disease, chronic liver disease or pancreatitis, valvular heart disease, treated hypertension, rheumatoid arthritis or SLE, depression (QOF definition) | Read Code and date of entry | One separate binary variable for each disease, 16 variables in total | * * |
| | | | Time since first diagnosis (in days); 1 separate variable for each disease, 16 variables in total | * |
| | Arthritis, connective tissue disease, hemiplegia, HIV/AIDS, hyperlipidaemia, learning disability, obesity, osteoporosis, peripheral arterial disease, peptic ulcer disease, substance abuse | Read Code and date of entry | One separate binary variable for each disease, 11 variables in total | * |
| | | | Time since first diagnosis (in days); 1 separate variable for each disease, 11 variables in total | * |
| Clinical measures and laboratory tests | Systolic blood pressure, haemoglobin, cholesterol/HDL, liver function test (γ-GT, aspartate aminotransferase, or bilirubin), platelets, ESR | Numeric value for result and date of measurement | Binary (yes/no) variable for if recorded; 1 variable per test | * * |
| | | | Numeric variable for most recent result; 1 variable per test | * * * |
| | | | Binary variable for abnormal result; 1 variable per test | * * * |
| | | | Time since the latest result (in days); 1 variable per test | * |
| Prescriptions | Statin, NSAID, anticoagulant, corticosteroid, antidepressant, antipsychotic | Date of prescription if applicable | Binary (yes/no) variable for if prescription exists | * * * |

γ-GT, γ-glutamyl transferase; COPD, chronic obstructive pulmonary disease; CPRD, Clinical Practice Research Datalink; ESR, erythrocyte sedimentation rate; GP, general practice; HDL, high-density lipoprotein; NSAID, non-steroidal anti-inflammatory drug; QOF, Quality and Outcomes Framework; SLE, systemic lupus erythematosus.
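
Several of the model representations in Table 1 are simple derivations from dated CPRD records. The following Python sketch illustrates how a few of them might be computed; it is not the authors' code. The function names are hypothetical, and several details are assumptions the table does not specify: "mid-year" is taken as 1 July, "during last year" as the 365 days up to and including the index date, and the smoking categories are applied only to current smokers, since non-smoker and ex-smoker status cannot be inferred from a daily count alone.

```python
from datetime import date
from typing import Optional, Sequence, Tuple


def age_in_years(year_of_birth: int, index_date: date) -> float:
    """Age at index_date, using 1 July of the birth year as the birth date.

    Assumption: the table says only "mid-year of year of birth".
    """
    assumed_birth = date(year_of_birth, 7, 1)
    return (index_date - assumed_birth).days / 365.25


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index from the most recent weight (kg) and height (m)."""
    return weight_kg / height_m ** 2


def smoking_category(cigarettes_per_day: Optional[float]) -> str:
    """Category for a current smoker, using the cut-offs listed in Table 1.

    None means the amount was not recorded.
    """
    if cigarettes_per_day is None:
        return "smoker (amount not recorded)"
    if cigarettes_per_day < 10:
        return "light smoker"
    if cigarettes_per_day <= 20:
        return "moderate smoker"
    return "heavy smoker"


def use_of_care_features(
    event_dates: Sequence[date], index_date: date
) -> Tuple[int, Optional[int]]:
    """Return (occurrences in the last year, days since last occurrence).

    Assumption: the window is the 365 days up to and including index_date.
    The second value is None when there is no prior event.
    """
    prior = [d for d in event_dates if d <= index_date]
    in_last_year = sum(1 for d in prior if (index_date - d).days <= 365)
    days_since_last = (index_date - max(prior)).days if prior else None
    return in_last_year, days_since_last
```

The same pattern as `use_of_care_features` would apply to the other "time since" variables in the table (first diagnosis, latest test result), with `min` in place of `max` for time since first diagnosis.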