Author manuscript; available in PMC: 2021 Oct 1.
Published in final edited form as: Transl Stroke Res. 2020 Jan 8;11(5):861–870. doi: 10.1007/s12975-020-00780-6

Recruiting Control Participants into Stroke Biomarker Studies

Matthew A Edwardson 1, Stephen J Fernandez 2
PMCID: PMC7340578  NIHMSID: NIHMS1548520  PMID: 31912324

Summary

The number of scientists using -omics technologies to investigate biomarkers with the potential to gauge risk and aid in the diagnosis, treatment, and prognosis of stroke continues to rise, yet there are few resources to aid investigators in recruiting control participants. In this review, we describe two major strategies for matching control participants to a stroke cohort (propensity score matching and one-to-one matching), along with statistical approaches to gauge the balance between groups. We then explore the advantages and disadvantages of traditional recruitment methods, including approaching spouses of enrolled stroke participants, direct recruitment from clinics, community outreach events, approaching retirement communities, and buying samples from a third-party vendor. Newer methods to identify controls by screening the electronic health record and using an online screening questionnaire are also described. Finally, we cover compensation for control participants and special considerations. We hope this review will serve as a roadmap that allows investigators to tailor their control recruitment strategy to the research question at hand and the local research environment. While this review focuses on blood-based biomarker studies, the principles apply to investigators studying a broad range of biological materials.

Keywords: Biomarkers, Stroke, Matched Groups, Research Subject Recruitment, Propensity Scores, Electronic Health Record

Introduction

Biomarkers are biological signatures used to indicate the presence of some other biological phenomenon in health and disease. In the context of stroke, biomarkers are useful both clinically and in research to gauge stroke risk [1] and aid in diagnosis [2, 3], prognosis [4], and directing treatment decisions [5]. Some examples of widely used clinical biomarkers include blood low density lipoprotein (LDL) level as a risk factor for stroke [6], diffusion-weighted MRI to diagnose ischemic stroke [7], and the degree of stenosis on carotid imaging to guide treatment decisions for carotid atherosclerotic disease [8, 9]. There are also many stroke biomarkers under evaluation in research such as plasma NT-proBNP, a biomarker of cardiac myocyte dysfunction linked to an increased risk of ischemic stroke [1]. NT-proBNP is currently under study to help direct the decision on whether to start apixaban (an anticoagulant drug) or aspirin for secondary stroke prevention [5].

In developing stroke biomarkers, non-stroke controls are often necessary to help prove that the biomarker is truly signaling the biological phenomenon of interest. This is most important in diagnostic studies trying to differentiate between stroke and non-stroke physiology. Controls can also be helpful in prognostic studies and in studies assessing stroke risk. For example, in the Biomarkers of Stroke Recovery Study in which we developed the recruitment strategies described in this article, our goal was to identify plasma biomarkers associated with good and poor motor recovery after stroke. Based on our preliminary study findings [10], several microRNAs differentiate between good and poor motor recovery, but without controls we were left with many questions. Were these findings due to increased tissue injury in the poor recovery group or an enhanced neural repair response in the good recovery group? Non-stroke controls could help answer this question. How confident are we in the results? In the current state of science where many biomarker studies cannot be reproduced [11], proving that biomarker levels are significantly different from non-stroke controls adds a level of scientific rigor that substantially raises confidence in the reproducibility of the findings. Similar reasoning led investigators in a study assessing metabolite biomarkers of stroke risk to add a validation cohort comparing metabolite levels in stroke patients and controls [12]. Of course, not all biomarker studies require controls. In studies using a biomarker to direct treatment for secondary stroke prevention, such as the NT-proBNP example above, non-stroke controls add no value. Thus, while non-stroke controls are not necessary for all stroke-related biomarker studies, they are required for diagnostic studies, and add substantial rigor to many other investigations.

In this review we provide practical guidance on how to approach control recruitment for stroke biomarker studies. The first half introduces strategies to match stroke participants with non-stroke controls and describes statistical tests to gauge match fidelity. The latter half covers recruitment strategies, including both traditional and more modern techniques using the electronic health record (EHR) and online screening questionnaires.

Matching Strategies - Propensity Score and One-to-One Matching

Once the decision has been made to recruit a control group, the next step is to determine how closely to match the stroke participants and non-stroke controls. In some studies matching is unnecessary, for example in randomized studies, or in acute stroke diagnostic studies where it is more important for the controls to share the same presenting symptoms than to be matched for other potential confounders. For most other investigations, however, matching is important to control for potential confounders between the stroke and control populations. If the entire stroke cohort has already been recruited, one can perform a regression assessing each potentially confounding variable, determine which variables explain significant variance in the outcome of interest, and then recruit controls matched only for those variables. Most investigators, however, will wish to enroll controls prospectively alongside stroke patients. In this latter case the confounding variables are unknown, and it is wise to match for age, sex, race, and major cardiovascular comorbidities, including hypertension, hyperlipidemia, diabetes, atrial fibrillation, and smoking. Ignoring these variables may bias results. For example, older patients recover less well from stroke than younger patients [13], and there are known sex differences in gene expression for many neurological and cardiovascular diseases [14]. In addition, Black individuals have roughly twice the stroke incidence of white individuals [15, 16], for reasons that are likely multifactorial but could include genetics [17] and socioeconomic status [18]. Researchers may also need to match for variables beyond demographics and cardiovascular comorbidities. For example, because statins have been shown to stabilize carotid plaques [19], an investigator studying a new lipid biomarker for carotid atherosclerotic disease may wish to match for statin use. Once the important potential confounders are identified, there are two primary methods to choose from to obtain well-matched groups: propensity score matching and one-to-one matching.

Traditionally, investigators have used propensity scoring to match a pool of possible control participants to a stroke cohort based on demographic features and cardiovascular comorbidities [20, 21]. Most investigators employing this strategy either already have a pool of control biological samples on hand or prospectively recruit both stroke and control participants during the study period. Propensity scoring typically requires a larger pool of controls than stroke patients (at least 50% larger). Once all participants are enrolled, a propensity score is commonly estimated using logistic regression, in which group status (stroke or control) is regressed on baseline characteristics. The fitted model assigns each participant a propensity score between 0 and 1, the estimated probability that the participant belongs to the stroke group. In the final step, each stroke participant is matched with a control participant, commonly using a greedy matching algorithm in which each stroke participant in turn is paired with the available control whose propensity score is closest, regardless of whether that control would be a better match for a stroke participant not yet matched. Optimal matching is another common method; it minimizes the total difference in propensity scores across all matched stroke-control pairs. To achieve good results, the control cohort as a whole needs to be similar to the stroke cohort with regard to the matched variables, and the propensity scores need to span the same range. If the ranges differ between the stroke and control groups, there may be only a narrow overlapping range in the middle, which decreases the number of participants that can be included in the analysis and may bias the study toward a subgroup that is not representative of the overall sample. Several statistical packages can perform propensity score matching, including SPSS, SAS, Stata, and the MatchIt package in R [22]. A full discussion of propensity score matching is beyond the scope of this review, but the reader is referred to other articles focused on this topic [23, 20, 24].
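The estimation and matching steps described above can be sketched in a few lines of dependency-free Python. This is a minimal illustration under stated assumptions, not the workflow of the packages named above: the propensity model is a plain, unregularized logistic regression fitted by batch gradient descent, the covariates and their scaling (age divided by 100, 0/1 indicators for sex and comorbidities) are hypothetical, and optimal matching is done by brute force over all pairings, which is only feasible for a handful of participants (dedicated tools such as MatchIt solve the same problem efficiently). All function names are illustrative.

```python
import itertools
import math


def propensity(w, x):
    """Estimated probability (0 to 1) that a participant with covariates x is in the stroke group."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=3000):
    """Regress group status (1 = stroke, 0 = control) on baseline covariates.

    Plain batch gradient descent on the logistic log-likelihood; w[0] is the intercept.
    """
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = propensity(w, xi) - yi  # predicted minus observed
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def greedy_match(stroke_ps, control_ps):
    """Each stroke participant, in order, takes the closest control not yet used.

    Assumes at least as many controls as stroke participants.
    """
    available = set(range(len(control_ps)))
    pairs = {}
    for i, ps in enumerate(stroke_ps):
        j = min(available, key=lambda k: abs(control_ps[k] - ps))
        pairs[i] = j
        available.remove(j)
    return pairs


def optimal_match(stroke_ps, control_ps):
    """Pairing that minimises the total |score difference| (brute force: tiny samples only)."""
    def total(perm):
        return sum(abs(control_ps[j] - s) for s, j in zip(stroke_ps, perm))
    best = min(itertools.permutations(range(len(control_ps)), len(stroke_ps)), key=total)
    return dict(enumerate(best))


def common_support(stroke_ps, control_ps):
    """Range of propensity scores shared by both groups."""
    return max(min(stroke_ps), min(control_ps)), min(max(stroke_ps), max(control_ps))


if __name__ == "__main__":
    # Hypothetical covariates: [age / 100, male, diabetes, hypertension]
    stroke = [[0.80, 1, 1, 1], [0.50, 0, 1, 1], [0.90, 1, 0, 1], [0.60, 0, 0, 0]]
    controls = [[0.70, 1, 1, 1], [0.40, 0, 0, 1], [0.90, 1, 0, 0],
                [0.50, 0, 1, 1], [0.20, 1, 0, 0], [0.60, 0, 0, 0]]
    w = fit_logistic(stroke + controls, [1] * len(stroke) + [0] * len(controls))
    s_ps = [propensity(w, x) for x in stroke]
    c_ps = [propensity(w, x) for x in controls]
    print("common support:", common_support(s_ps, c_ps))
    print("greedy: ", greedy_match(s_ps, c_ps))
    print("optimal:", optimal_match(s_ps, c_ps))
```

The difference between the two matching rules shows up even with two participants: for stroke scores 0.50 and 0.45 and control scores 0.48 and 0.90, greedy matching gives the first stroke participant the 0.48 control and leaves the second with a 0.45 gap (total 0.47), whereas optimal matching swaps the pairs for a smaller total difference (0.43).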

In one-to-one matching, the investigator attempts to identify and enroll control participants who are perfect or near-perfect matches for each stroke participant with regard to demographic features and other potential confounders. In the past, the time and resources required made such a strategy impractical. In the digital age, however, one can leverage the large amount of information contained in the electronic health record (EHR), as well as the Internet via web-based questionnaires, to screen for well-matched controls (see separate sections below for details). Some investigators may prefer one-to-one matching over propensity scores because fewer control participants need to be enrolled and the controls are likely to be closer matches for each respective stroke patient, particularly in small biomarker studies. For example, the odds of finding a 75-year-old Asian male with a history of diabetes, atrial fibrillation, and smoking among a sample of 50 prospectively enrolled controls for the purpose of propensity score matching are exceedingly low, whereas one-to-one matching typically identifies many such individuals for potential enrollment. Drawbacks to one-to-one matching include the need for a modest investment in programming support and for a large EHR system with many patient encounters. The advantages and disadvantages of propensity score matching and one-to-one matching are described in Table 1.

Table 1.

Comparison of propensity score matching and one-to-one matching

Study design best suited for matching technique
• Propensity score matching: Large biomarker studies, particularly those where a large pool of control samples is already available, but can also be used in studies where controls are recruited prospectively
• One-to-one matching: Small to medium biomarker studies in which controls are recruited prospectively

Total number of enrolled controls necessary to achieve good matching
• Propensity score matching: At least 50% more than the study cohort
• One-to-one matching: Equal to the study cohort

Ability to closely match cohorts for potential confounders
• Propensity score matching: Yes, though may be ill-suited for small biomarker studies without a large pool of controls to draw from; risk that propensity scores may span different ranges in the stroke and control groups, which may lead to inclusion of a non-representative sample
• One-to-one matching: Yes, though perfectly matching for age, race, sex, and > 2–3 comorbidities becomes challenging; the investigator will need to offset additional or missing comorbidities in the non-perfect matches by enrolling future controls who lack/have the comorbidity

Computer programming investment
• Propensity score matching: Little to none; regression and matching algorithms can be run from most biostatistical software packages
• One-to-one matching: Modest upfront investment to establish algorithms to query the electronic health record; minimal ongoing support cost to run queries

Electronic health record size requirement
• Propensity score matching: None
• One-to-one matching: ~500,000 or more patient encounters

Statistical Tests to Gauge Balance between Stroke and Control Cohorts

Determining the best method to gauge match fidelity between clinical and control cohort baseline variables remains controversial, but we aim to clarify the subject through specific examples. Traditionally, investigators have compared each matched variable between two cohorts using hypothesis testing (such as t-tests) and reported P values. However, this approach is problematic because whether the P value falls below 0.05 depends heavily on the sample size and not just on the difference in variable prevalence between groups [25]. Moreover, from a statistical perspective, t-tests are inappropriate for comparing dichotomous variables. Consider, for example, a stroke and control population matched for the variable diabetes (Table 2). Most investigators would consider a 25 percentage-point difference in rates of diabetes between groups (50% vs. 25%) unacceptably large. Yet for a small cohort of 20 per group, the P value from a Student's t-test is > 0.05 (P = 0.11), suggesting a non-significant difference between groups. Moving to a larger cohort of 200 per group with the same 25 percentage-point difference suggests a highly significant difference between groups (P = 1.7 × 10⁻⁷).
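The sample-size effect can be reproduced directly. The minimal sketch below codes diabetes as 0/1 for each participant and computes Student's two-sample t statistic with pooled variance for the Table 2 prevalences (50% vs. 25%) at 20 and at 200 participants per group. Converting the statistic to a P value requires the t distribution (for example, scipy.stats.ttest_ind), which Python's standard library does not provide, so only the statistic is shown; the helper names are hypothetical.

```python
import math
import statistics


def pooled_t(x, y):
    """Student's two-sample t statistic (equal-variance, pooled)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se


def indicator_cohort(n, prevalence):
    """0/1 list in which round(n * prevalence) participants have the variable."""
    k = round(n * prevalence)
    return [1] * k + [0] * (n - k)


for n in (20, 200):
    t = pooled_t(indicator_cohort(n, 0.50), indicator_cohort(n, 0.25))
    print(f"n = {n} per group: t = {t:.2f}")
# n = 20 per group: t = 1.65
# n = 200 per group: t = 5.33
```

The prevalence difference is identical in both runs; only n changes, yet the t statistic grows by a factor of roughly √10, which is what moves the P value from 0.11 to 1.7 × 10⁻⁷.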

Table 2.

Hypothesis testing versus standardized difference calculations for comparing baseline characteristic prevalence from two hypothetical matched stroke and control cohorts

| Baseline characteristic | Cohort 1 Stroke (n = 20) | Cohort 1 Control (n = 20) | Cohort 1 P valueᵃ | Cohort 1 Standardized differenceᵇ | Cohort 2 Stroke (n = 200) | Cohort 2 Control (n = 200) | Cohort 2 P valueᵃ | Cohort 2 Standardized differenceᵇ |
|---|---|---|---|---|---|---|---|---|
| Diabetes | 50% | 25% | 0.11 | 0.53 | 50% | 25% | 1.65 × 10⁻⁷ | 0.53 |
| Hyperlipidemia | 50% | 45% | 0.76 | 0.10 | 50% | 45% | 0.32 | 0.10 |
| Hypertension | 90% | 85% | 0.64 | 0.15 | 90% | 85% | 0.13 | 0.15 |
| Smoking | 15% | 10% | 0.64 | 0.15 | 15% | 10% | 0.13 | 0.15 |

ᵃ Student's t-test. A P value < 0.05 suggests a significant difference between baseline characteristics.

ᵇ A standardized difference ≥ 0.1 suggests an imbalance between baseline characteristics.

To overcome the effect of sample size, statisticians have encouraged reporting the standardized difference between groups as opposed to P values [26, 23, 22]. The formula for calculating the standardized difference (d) between dichotomous variables is

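$$ d = \frac{\hat{p}_{\text{stroke}} - \hat{p}_{\text{control}}}{\sqrt{\dfrac{\hat{p}_{\text{stroke}}\,(1 - \hat{p}_{\text{stroke}}) + \hat{p}_{\text{control}}\,(1 - \hat{p}_{\text{control}})}{2}}} $$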

where $\hat{p}_{\text{stroke}}$ and $\hat{p}_{\text{control}}$ are the prevalence of the variable in the stroke and control populations, respectively [26]. Some have proposed a standardized difference < 0.1 as an acceptably small difference between matched variables [27, 28]. Returning to our prior example of a 25 percentage-point difference in diabetes rates between groups, the standardized difference is 0.53 regardless of sample size, confirming our initial intuition that the difference was unacceptably large.
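A minimal Python version of the standardized difference formula, applied to the four prevalence pairs in Table 2, follows. The function name is illustrative, and the 0.1 flag threshold is the one proposed in the text rather than a setting from any particular package. Because the formula contains no sample-size term, the same values apply to both the n = 20 and n = 200 cohorts.

```python
import math


def standardized_difference(p_stroke, p_control):
    """Standardized difference for a dichotomous variable; inputs are proportions (0-1)."""
    pooled_sd = math.sqrt((p_stroke * (1 - p_stroke) + p_control * (1 - p_control)) / 2)
    return (p_stroke - p_control) / pooled_sd


# Prevalences (stroke, control) from Table 2
table2 = {
    "Diabetes": (0.50, 0.25),
    "Hyperlipidemia": (0.50, 0.45),
    "Hypertension": (0.90, 0.85),
    "Smoking": (0.15, 0.10),
}

for name, (p_s, p_c) in table2.items():
    d = standardized_difference(p_s, p_c)
    status = "imbalanced" if abs(d) >= 0.1 else "balanced"
    print(f"{name:<15} d = {d:.2f} ({status})")
```

The printed values (0.53, 0.10, 0.15, 0.15) reproduce the standardized-difference columns of Table 2; hyperlipidemia sits just above the threshold at about 0.1003.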

While adopting the standardized difference has advantages over hypothesis testing, challenges remain. Consider, for example, three other matched variables: hyperlipidemia (HLD), hypertension (HTN), and smoking (Table 2). A hypothetical stroke and control cohort have 50% and 45% rates of HLD; 90% and 85% rates of HTN; and 15% and 10% rates of smoking, respectively. All three risk factors differ by 5 percentage points between groups, but the standardized difference is 0.10 for HLD and 0.15 for HTN and smoking. Thus, the standardized difference is least stringent when a variable is present in 50% of the population and becomes more stringent as the characteristic becomes either over- or underrepresented in the group (Fig 1). In practice, this means that for small biomarker studies with n ≤ 20 per group, a single participant with an additional or missing variable in the stroke or control cohort yields a standardized difference ≥ 0.1: one participant shifts prevalence by at least 5 percentage points, and even at 50% prevalence a 5-point difference gives d ≈ 0.10. In studies matching for multiple baseline demographic and comorbidity variables, a standardized difference < 0.1 for every variable may therefore be unattainable.
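The n ≤ 20 claim can be checked with a short sweep. The sketch below, with hypothetical helper names, computes the standardized difference produced when the stroke and control groups (n participants each) differ by exactly one participant with the variable, for every possible count. For n = 20 the smallest value, reached near 50% prevalence, is still just above 0.1, and values climb toward the extremes; for n = 200 a single participant matters far less, except when the variable is very rare or nearly universal.

```python
import math


def standardized_difference(p1, p2):
    """Standardized difference between two proportions."""
    return (p1 - p2) / math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)


def one_participant_gap(n):
    """d for each count k when one group has k participants with the variable and the other k - 1."""
    return {k: standardized_difference(k / n, (k - 1) / n) for k in range(1, n + 1)}


for n in (20, 200):
    d = one_participant_gap(n)
    k_min = min(d, key=d.get)
    print(f"n = {n}: smallest d = {d[k_min]:.3f} at {k_min}/{n}; "
          f"largest d = {max(d.values()):.3f}")
```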

Figure 1.


Standardized difference (black, left y-axis) and P-values (gray, right y-axis) for a hypothetical matching variable with a 5 percentage-point difference in prevalence between the control and stroke cohorts. A standardized difference < 0.1 or a P-value > 0.05 is generally reported as evidence of good matching between groups. The P-value, however, changes dramatically with sample size, such that a small study with 20 participants produces much larger P-values than a study with 200 participants. In contrast, for the same difference in prevalence between groups, the standardized difference remains the same whether there are 20 or 200 participants. Thus, the standardized difference is the preferred method for reporting match fidelity between variables. With either method, variables that are either over- or underrepresented in the population require more stringent matching.

If reporting P values from hypothesis tests lacks statistical rigor, and obtaining a standardized difference < 0.1 for all variables is too stringent, what should an investigator report? We propose reporting the standardized difference with a goal of < 0.1 for all variables, while accepting that some will fall outside this range, particularly in studies with small sample sizes. Ultimately, the degree of difference in each variable between the stroke and control cohorts that could bias a particular study is unknown, and it will be up to the reader to gauge whether the groups are well matched, irrespective of the data reporting method.

Recruitment Strategies

Here, after first discussing ethical considerations, we describe several methods to identify and enroll control participants into stroke biomarker studies along with their advantages and disadvantages (summarized in Table 3). While these were devised from the perspective of a practicing physician engaged in clinical research, the strategies could also be carried out by non-clinician investigators. In the latter case it would be helpful to team up with a practicing physician engaged in stroke-related research to explore all control recruitment options. Many of the non-clinician investigators at our institution, for example, use our registry among other strategies to identify appropriate controls.

Table 3.

Advantages and disadvantages of various control recruitment strategies

Traditional recruitment strategies

Spouses, relatives, and friends of enrolled stroke participants
Advantages:
• Already invested in research
• If all spouses enroll, the groups are well matched for sex
• Exposed to the same diet / environmental risks as the stroke patient
Disadvantages:
• Tend to have fewer cardiovascular comorbidities

Stroke mimics
Advantages:
• Undergo the same stressor as the stroke patient (acute hospitalization)
• May be critical in acute stroke diagnostic biomarker studies
Disadvantages:
• Predominantly female
• Different vascular risk factors
• Seizure physiology may confound stroke biomarker studies focused on treatment or prognosis

Clinics
Advantages:
• Patients and their spouses / relatives have an established relationship with the provider and are therefore more likely to enroll as controls
Disadvantages:
• Need to be careful that other comorbidities (dementia, Parkinson's, etc.) will not confound results
• Can be challenging when recruiting from a clinic other than the home department

Community outreach events
Advantages:
• Great for targeting a particular demographic that is otherwise challenging to enroll
Disadvantages:
• May require added effort on the part of the investigator (travel, putting together stroke educational materials, etc.)

Retirement communities
Advantages:
• Good way to screen a large number of potential controls
Disadvantages:
• Requires gaining the trust of the facility manager
• Many investigators may wish to exclude controls who are not able to live independently

Purchasing samples from a third-party vendor
Advantages:
• Samples can be obtained quickly
Disadvantages:
• Cost prohibitive
• May introduce significant bias, since sample processing and handling likely differ from those of the stroke cohort

Modern recruitment strategies

Screening the electronic health record
Advantages:
• Allows good one-to-one matching of multiple demographic and comorbidity features simultaneously
• More targeted strategy that mitigates some of the effort required by the traditional recruitment strategies above
Disadvantages:
• Sometimes difficult to find a large number of matches for minority populations
• Requires programming investment
• Still a fair amount of effort for research staff; ~10–20% of contacted matches end up enrolling

Online screening questionnaire
Advantages:
• Can be coupled with most of the traditional recruitment approaches above
• Allows the researcher to gauge the fidelity of the match before proceeding to full enrollment
• Hyperlink can be emailed in bulk to hospital employees and to EHR health portal accounts to reach large numbers of potential controls
Disadvantages:
• May add an unnecessary layer of work if one plans to use propensity score matching
• Requires a small programming investment

Ethical Considerations

All clinical research involving human subjects requires approval from the appropriate institutional ethics committee prior to initiation, and there are additional requirements for some of the EHR and online questionnaire-related recruitment strategies discussed in this article. It is important to describe to the ethics committee which recruitment strategies will be employed. For example, the investigator should describe whether they plan to recruit spouses, relatives, friends, and coworkers; recruit from clinics, community events, and retirement communities; and create a registry that can be used for future studies. These disclosures allow the ethics committee to provide feedback and, in the process, help protect the investigator and the institution in the case of an untoward event or lawsuit. Screening the EHR and using online questionnaires to identify potential matches (described below) require the investigator to review protected health information (PHI) before obtaining informed consent, because having thousands of people sign informed consent forms when only a small fraction will ultimately engage in study procedures is not practical. Ethics committees in most countries will allow this once the investigator obtains a special waiver of informed consent for screening PHI, as long as informed consent is obtained from participants who go on to engage in study procedures.

Recruiting Spouses, Relatives, and Friends of Enrolled Stroke Patients

Spouses, relatives, and friends of enrolled stroke patients are excellent sources of potential control participants. They are already invested to a certain degree through the participation of their friend or loved one. In addition, many wish there were more they could do to help, even if indirectly, in the aftermath of a stroke. By participating in clinical research, family members and friends gain an outlet for these feelings, knowing that their participation could improve patient care in the future. Spouses tend to be similar in age, and if the spouse of every stroke patient enrolled as a control, the groups would be closely matched for sex. What is more, spouses and neighborhood friends generally eat the same diet and are exposed to the same environmental risks, features that are difficult to control for when recruiting an unrelated cohort of controls. One limitation of recruiting family members and friends is that they are often healthier than the stroke patient, with fewer cardiovascular comorbidities. Additional recruitment strategies are therefore required to obtain well-matched groups.

Recruiting Stroke Mimics

Stroke mimics are appealing as control participants and are best suited for studies focused on acute stroke diagnostic biomarkers. Some examples of stroke mimics include seizure, multiple sclerosis, migraine, and conversion disorder [29]. By nature, strokes are stressful events, and using stroke mimics as controls provides exposure to a similarly stressful initial presentation and hospitalization. The investigator may already be following the patient with the stroke mimic clinically, so an established relationship makes the patient more likely to enroll as a control. Enrolling stroke mimics may be critical in studies assessing diagnostic biomarkers for acute stroke [2, 30]. The reason is that some mimics, including seizure and multiple sclerosis, may lead to a change in physiology [31–33] that must be differentiated from acute stroke in order to make the diagnostic test clinically useful. This issue is important enough that recruiting mimics may take precedence over matching for demographics and cardiovascular comorbidities in acute stroke diagnostic studies. Recruiting mimics as controls has a few drawbacks. Stroke mimics tend to be younger, are more often female, and have different cardiovascular comorbidities, including lower rates of atrial fibrillation [34–36, 29].

Recruitment from Clinics

Clinics are another potential source of controls, but success largely depends on the relationship of the principal investigator and research coordinators with the clinic physicians and staff. As the home department, the Neurology clinic will be the most natural fit for most stroke investigators. However, enrolling patients with neurologic disease is problematic, as the other disease process may confound the results of a stroke-related study. One way around this is to approach not the patients themselves but the spouses and relatives who accompany them to Neurology clinic. Cardiology and Vascular Surgery clinics are also potentially fruitful, because their patients share many cardiovascular comorbidities with the study cohort. Recruiting from a department that is not one's primary affiliation is more challenging, but these relationships can be forged through collaborative efforts.

From a logistics standpoint, one should begin by recruiting from a clinic whose demographic features roughly parallel those of the stroke cohort. For example, a dementia or movement disorders clinic will yield more age-appropriate controls than a headache clinic. It generally works best for the clinic provider to ask the patient or spouse, at the end of the encounter, whether they would be willing to talk to research staff about possible study participation. Research staff can then enter the exam room and approach them for screening or enrollment. A word of caution for investigators employing a one-to-one matching strategy: searching through Cardiology and Vascular Surgery clinic schedules to identify perfectly matched controls prospectively is generally very low yield. If an investigator wishes to use one-to-one matching in a clinic setting, the best approach is to quickly screen many potential controls using an online questionnaire on an iPad or similar device. The researcher can then follow up with the potential control by phone once the match is confirmed.

Community Outreach Events

Health fairs and other community outreach events can be very helpful for targeting a particular demographic. For example, an investigator may find that their stroke cohort is composed of equal numbers of white and Black participants, yet the control cohort is mostly white due to the demographics in the vicinity of the Neurology clinic. To bolster the number of Black control participants, the investigator may wish to set up a booth at a health fair in a predominantly Black community, providing education on recognizing the signs and symptoms of stroke while seeking new controls. In the United States, Comprehensive Stroke Centers are required to provide events to educate the public on stroke biannually. These events are perfect opportunities to both educate the public and seek out potential controls from the community.

Retirement Communities

Because stroke is a disease of aging, retirement communities are a rich source of controls. One must be careful, however, to gauge the functional status of the residents before targeting a particular facility. Because most stroke-related studies enroll patients who were previously functionally independent, independent living facilities are often the most appropriate source of controls. The investigator should approach the manager of the facility for permission to distribute recruitment materials. Gaining the trust of these individuals can be challenging, and it often helps to have someone on the inside, such as a resident who previously enrolled in your study or another investigator who has worked with the facility, who can lobby on your behalf. An efficient way to screen a retirement community is to have the manager email an online screening questionnaire to all residents. Alternatively, the researcher can approach the facility like a community outreach event, providing stroke education to the residents while also trying to recruit control participants.

Obtaining Control Samples from Third Party Vendors

We recommend against using third party vendors to obtain control samples of biological materials for two main reasons. Most importantly, the cost of these samples can be prohibitive, especially for an investigator-initiated study. Each matched demographic feature or comorbidity tends to ratchet up the cost, and soon the study is untenable. The second downside is sample handling and processing. Any time samples are acquired by two different labs there is opportunity to introduce significant bias. Simply using a different gauge phlebotomy needle, for instance, can sometimes alter study results [37].

Screening the Electronic Health Record for One-to-One Matches

Countries with a nationalized health care system and those with consolidated regional health care systems using a common EHR have a wealth of data to identify very closely matched controls for biomarker studies. The EHR for MedStar Health in the mid-Atlantic region of the U.S., for example, has data from over 4 million patient encounters per year, and over 4 million distinct patients. One-to-one matching will be most effective in similar EHR systems with a large number of potential matches available.

Querying the EHR to find matched controls requires writing a computer program, followed by a small amount of ongoing programming support to run the matching algorithm for each stroke patient. In our case, we wrote a program in SAS 9.4 (SAS Institute, Cary, NC) that accesses the Cerner MedConnect system (Cerner Corp., Kansas City, MO). Demographic features including age, sex, and race were queried using the PERSON table in Cerner. To match each comorbidity using ICD-9 and ICD-10 codes, we linked the DIAGNOSIS table in Cerner to the NOMENCLATURE table by the Nomenclature ID. We also used the PROBLEM tables in Cerner to look up various phrases synonymous with the comorbidity, for example, “HTN”, “HYPERTENSION ESSENTIAL”, and “BENIGN ESSENTIAL HYPERTENSION”. We added algorithms for exclusion criteria, such as stroke, using “Not In” subqueries. Finally, we limited potential matches to zip codes within a 5–10 mile radius of the hospital to reduce logistical challenges. An example of the SAS code used to identify matches for a 79-year-old Black male stroke patient with a history of hypertension, hyperlipidemia, diabetes, coronary artery disease, smoking, and statin use can be found in Appendix 3 in the online supplementary material.
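The query logic described above can be sketched with Python's built-in sqlite3 module. This is an assumption-laden illustration, not the authors' SAS program and not the Cerner schema: the tables person, nomenclature, and diagnosis are hypothetical simplified stand-ins named after the tables mentioned in the text, age is stored directly rather than computed from a birth date, the free-text PROBLEM-table search is omitted, and the ICD code prefixes in the demo (I10/401 for hypertension, E11/250 for diabetes, I63/434 for ischemic stroke) are examples only. Each required comorbidity becomes an EXISTS subquery, and exclusions use a NOT IN subquery, mirroring the approach in the text.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the EHR tables named in the text.
SCHEMA = """
CREATE TABLE person (person_id INTEGER PRIMARY KEY, age INTEGER, sex TEXT, race TEXT, zip TEXT);
CREATE TABLE nomenclature (nomenclature_id INTEGER PRIMARY KEY, code TEXT);
CREATE TABLE diagnosis (person_id INTEGER, nomenclature_id INTEGER);
"""

CODE_JOIN = ("SELECT d.person_id FROM diagnosis d JOIN nomenclature n "
             "ON d.nomenclature_id = n.nomenclature_id WHERE ")


def like_any(prefixes):
    """SQL fragment and parameters matching any ICD code that starts with one of the prefixes."""
    return " OR ".join(["n.code LIKE ?"] * len(prefixes)), [p + "%" for p in prefixes]


def find_matches(conn, age, sex, race, comorbidities, exclusions, zips, age_window=2):
    """Return person_ids matching demographics, every comorbidity, and none of the exclusions.

    comorbidities / exclusions are lists of ICD prefix lists, e.g. [["I10", "401"]].
    """
    sql = ["SELECT p.person_id FROM person p WHERE p.sex = ? AND p.race = ?",
           "AND p.age BETWEEN ? AND ?",
           "AND p.zip IN (" + ",".join(["?"] * len(zips)) + ")"]
    params = [sex, race, age - age_window, age + age_window, *zips]
    for prefixes in comorbidities:  # one EXISTS subquery per required comorbidity
        clause, args = like_any(prefixes)
        sql.append(f"AND EXISTS ({CODE_JOIN}d.person_id = p.person_id AND ({clause}))")
        params += args
    if exclusions:  # "Not In" subquery for exclusion diagnoses such as prior stroke
        clause, args = like_any([p for group in exclusions for p in group])
        sql.append(f"AND p.person_id NOT IN ({CODE_JOIN}{clause})")
        params += args
    sql.append("ORDER BY p.person_id")
    return [row[0] for row in conn.execute(" ".join(sql), params)]


def demo_connection():
    """Tiny in-memory database with made-up patients for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?, ?)", [
        (1, 79, "M", "Black", "20010"), (2, 80, "M", "Black", "20010"),
        (3, 78, "M", "Black", "20010"), (4, 79, "F", "Black", "20010"),
        (5, 79, "M", "Black", "99999")])
    conn.executemany("INSERT INTO nomenclature VALUES (?, ?)",
                     [(1, "I10"), (2, "E11.9"), (3, "I63.9"), (4, "401.9")])
    conn.executemany("INSERT INTO diagnosis VALUES (?, ?)", [
        (1, 1), (1, 2), (2, 4), (2, 2), (3, 1), (3, 2), (3, 3),
        (4, 1), (4, 2), (5, 1), (5, 2)])
    return conn


if __name__ == "__main__":
    hits = find_matches(demo_connection(), age=79, sex="M", race="Black",
                        comorbidities=[["I10", "401"], ["E11", "250"]],
                        exclusions=[["I63", "434"]], zips=["20010"])
    print(hits)  # [1, 2]: person 3 has a stroke code, 4 differs by sex, 5 by zip code
```

Listing both ICD-9 and ICD-10 prefixes for each comorbidity lets one query cover records coded under either system, as person 2 (coded 401.9) shows.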

The computer program must be modified slightly for each individual and some iterative trial and error is required to obtain the desired number of matches. It is not practical to identify one-to-one matches for all stroke patients simultaneously because the query for each match can take from 30 minutes up to several hours to run. Matching for age, race, and sex is straightforward and often returns 100 or more matches. Adding comorbidities can quickly narrow this list. If a stroke patient has a long list of comorbidities, we would recommend limiting to the 3–4 most important comorbidities to generate a list of at least 10 matches. Finding enough matches for underrepresented minority populations may also be challenging, requiring more flexibility on exact matching of comorbidities or age range. Matching for medications, such as statin use, is also possible, but further limits the number of matches.
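The iterative trial and error described above can be framed as a simple relaxation loop. In this hypothetical sketch the candidate pool is already in memory as dictionaries (field names are illustrative), comorbidities are ranked by the investigator from most to least important and capped at the top four, and a comorbidity counts as matched only when the candidate and stroke patient agree on it (both have it or both lack it). The loop first widens the age window and then drops the lowest-ranked comorbidity until at least `target` matches are found; that ordering is an assumption, and an investigator might reasonably relax criteria in a different order.

```python
def matches(candidates, patient, comorbidities, age_window):
    """Candidates with the same sex and race, age within the window, and identical comorbidity status."""
    return [c for c in candidates
            if c["sex"] == patient["sex"] and c["race"] == patient["race"]
            and abs(c["age"] - patient["age"]) <= age_window
            and all(c[m] == patient[m] for m in comorbidities)]


def relax_until(candidates, patient, ranked_comorbidities, target=10,
                max_comorbidities=4, max_age_window=5):
    """Relax criteria step by step until at least `target` candidates match.

    Returns (matches, comorbidities still required, age window used).
    """
    ranked = ranked_comorbidities[:max_comorbidities]  # keep the 3-4 most important
    for n_keep in range(len(ranked), -1, -1):
        for window in range(max_age_window + 1):
            found = matches(candidates, patient, ranked[:n_keep], window)
            if len(found) >= target:
                return found, ranked[:n_keep], window
    return found, [], max_age_window  # best effort: demographics only, widest window
```

Returning the criteria actually used alongside the matches makes it easy to record, for each stroke patient, how far the match had to be loosened.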

Once a list of matches is generated, some detective work is still required before initiating contact. Research staff need to look up each match in the EHR to confirm that they meet the study inclusion and exclusion criteria; one may find, for example, that a match has terminal cancer and should be excluded. The next step is to send a recruitment letter to each carefully vetted match, explaining that they match a previously enrolled stroke patient and describing the nature of the study (see Appendix 1 of the online supplementary material for an example recruitment letter). We follow up a week later with a phone call. About 10–20% of those we contact with this method enroll in the study, and about a third of enrollees respond to the mailer alone. During the phone call, it often helps to explain exactly which of the prospective control's comorbidities match those of the stroke patient; this convinces them of their unique qualifications and motivates them to participate. If research staff work through the entire list without enrolling a matched control, the next step is to rerun the algorithm, possibly expanding the geographic catchment area, reducing the number of comorbidities that must match, or widening the age range.
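
The reported 10–20% enrollment rate gives a rough sense of how long a match list needs to be. The sketch below assumes, as a simplification not stated in the text, that each contacted match decides independently with the same probability; the helper names and the 90% target are hypothetical.

```python
import math

def prob_at_least_one(enroll_rate, n_contacted):
    """Chance that at least one of n contacted matches enrolls, assuming each
    decides independently with the same probability."""
    return 1 - (1 - enroll_rate) ** n_contacted

def contacts_needed(enroll_rate, target_prob=0.9):
    """Smallest list length giving at least target_prob chance of one enrollee."""
    return math.ceil(math.log(1 - target_prob) / math.log(1 - enroll_rate))

for rate in (0.10, 0.20):
    print(f"{rate:.0%} yield: P(>=1 of 10) = {prob_at_least_one(rate, 10):.2f}, "
          f"contacts for 90% = {contacts_needed(rate)}")
# → 10% yield: P(>=1 of 10) = 0.65, contacts for 90% = 22
# → 20% yield: P(>=1 of 10) = 0.89, contacts for 90% = 11
```

Under these assumptions, a list of 10 matches succeeds roughly two-thirds to nine-tenths of the time, which is consistent with occasionally needing to rerun the algorithm with relaxed criteria.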

In practice, one-to-one matching has allowed us to enroll perfect matches for the majority of our stroke patients. The remainder are near-perfect matches, within ±2 years of age and differing by one comorbidity. When a control has a comorbidity that their stroke patient lacks, we attempt to offset it by prioritizing a future control who lacks that comorbidity even though their paired stroke patient has it; when a control is missing a comorbidity, we do the reverse. This makes the future control a less perfect match for their paired stroke patient, but it balances the comorbidities across the cohorts and brings the standardized difference into the desired range.
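
To see how this offsetting works numerically, the sketch below computes the standardized difference for a binary covariate, d = (p_a − p_b) / sqrt([p_a(1 − p_a) + p_b(1 − p_b)] / 2), where p_a and p_b are the prevalences in the stroke and control groups; a commonly used balance threshold is |d| < 0.1 [26]. The cohort counts are hypothetical, chosen only to illustrate the mechanism, and `std_diff_binary` is an illustrative helper name.

```python
import math

def std_diff_binary(n_with_a, n_a, n_with_b, n_b):
    """Standardized difference in the prevalence of a binary covariate
    between group a (e.g. stroke) and group b (e.g. controls)."""
    p_a, p_b = n_with_a / n_a, n_with_b / n_b
    pooled_sd = math.sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2)
    if pooled_sd == 0:  # both prevalences are 0 or 1
        return 0.0 if p_a == p_b else math.copysign(math.inf, p_a - p_b)
    return (p_a - p_b) / pooled_sd

# Hypothetical: after 20 pairs, 11 stroke patients but only 10 controls have
# diabetes because one near-match control lacked it.
before = std_diff_binary(11, 20, 10, 20)
# Pair 21: the stroke patient lacks diabetes, but we prioritize a control who has it.
after = std_diff_binary(11, 21, 11, 21)
print(f"{before:.3f} -> {after:.3f}")  # → 0.100 -> 0.000
```

Here d starts just above 0.1 (about 0.1003) and returns to 0 once the offsetting control is enrolled, even though pair 21 itself is mismatched on diabetes.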

Web-based Questionnaire to Screen for Matches

Using a web-based questionnaire is another way to leverage technology to screen large numbers of potential control participants. The questionnaire requires programming up front, but once in place it needs no ongoing programming support. It is simply a web form where a prospective control enters their demographic information and answers a few questions about their past medical history related to the study's inclusion and exclusion criteria (see the example in Appendix 2 of the online supplementary material or https://researchdata.medstar.net/redcap/surveys/?s=AXDWMT4K3M). Responses are sent to a secure, encrypted database (for example, a REDCap project or an SQL database) accessible only to study investigators, who can then manually determine whether the prospective control is a suitable match for a previously enrolled stroke patient.
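
The match decision itself is made manually, but a small helper could shortlist candidate pairings for investigators to review. The sketch below is hypothetical: `Profile`, `rank_stroke_matches`, the field names, and the tolerances (±2 years, at most one mismatched comorbidity, echoing the near-perfect matches described earlier) are assumptions, not part of the study's REDCap setup.

```python
from typing import FrozenSet, NamedTuple

class Profile(NamedTuple):
    pid: str
    age: int
    sex: str
    race: str
    comorbidities: FrozenSet[str]

def rank_stroke_matches(response, stroke_patients, age_window=2, max_mismatches=1):
    """List (stroke patient id, mismatched comorbidities) for every stroke
    patient the respondent could pair with, exact matches first."""
    candidates = []
    for sp in stroke_patients:
        if sp.sex != response.sex or sp.race != response.race:
            continue
        if abs(sp.age - response.age) > age_window:
            continue
        # Symmetric difference: comorbidities present in one but not the other.
        mismatched = sp.comorbidities ^ response.comorbidities
        if len(mismatched) <= max_mismatches:
            candidates.append((sp.pid, sorted(mismatched)))
    return sorted(candidates, key=lambda c: (len(c[1]), c[0]))

stroke_cohort = [
    Profile("S01", 79, "M", "Black", frozenset({"HTN", "HLD", "DM"})),
    Profile("S02", 66, "F", "White", frozenset({"HTN"})),
    Profile("S03", 80, "M", "Black", frozenset({"HTN", "HLD"})),
]
respondent = Profile("R17", 78, "M", "Black", frozenset({"HTN", "HLD"}))
print(rank_stroke_matches(respondent, stroke_cohort))
# → [('S03', []), ('S01', ['DM'])]
```

Unlike the EHR inclusion queries, this comparison counts a comorbidity in either direction as a mismatch, so a respondent with an extra condition is ranked below an exact match.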

Once in place, the screening questionnaire can be deployed in many environments, including retirement communities, clinics, and community outreach events. Other strategies that can be very effective include blast emails to all hospital employees and sending the questionnaire hyperlink to all patients registered with an individualized health portal through the EHR. These blast strategies can generate hundreds if not thousands of responses, but their success largely depends on lobbying hospital administration. Many hospitals are reluctant to email study solicitations directly and may relegate the notice to an email newsletter in which several other studies compete for the reader’s attention. To help overcome this limitation, investigators could use a computer algorithm (as described in the ‘Screening the Electronic Health Record for One-to-One Matches’ section above) to target emails to individuals who meet specific inclusion and exclusion criteria. For institutions engaged in ongoing stroke biomarker studies, the data collected can also serve as a registry of future controls.

Compensation

In our experience, most controls participate not for the money but out of a genuine desire to help future stroke patients. In countries where compensation for study participation is rare, this altruism should be enough to reach recruitment goals. In countries where research compensation is more common, however, a small amount of compensation facilitates recruitment. Around $50–100 (USD) per blood draw, in cash or gift cards, is usually effective. Smaller amounts may not motivate participation, whereas much larger amounts can become coercive, leading participants to accept risks they would not otherwise deem acceptable in order to receive payment. Compensation should at least cover the costs of participating, such as parking, travel, and breakfast after a fasting blood draw.

Special Considerations

Certain populations are more difficult to recruit and may require extra effort. Many black individuals are wary of enrolling in studies for a variety of reasons [38, 39], including past exploitation in the name of biomedical research; the Tuskegee syphilis study [40] and the immortal HeLa cell line [41] are two prime examples. As a result, an investigator may need to screen twice as many black individuals to enroll the same number of control participants [42, 43]. Men are also more difficult to recruit than women [44, 45]. With the EHR screening technique described above, we find that many women contact us directly after receiving the mailer, whereas men often need the additional persuasion of a follow-up phone call. To address recruitment challenges in these populations, investigators may need to develop at least one specialized recruitment strategy, such as screening at health fairs in black communities or at a veterans’ home with a large population of men.

A final consideration involves interactions between variables the investigator is trying to match across the stroke and control cohorts, such as hyperlipidemia and statin use. Patients with acute stroke are typically started on high-dose statins for secondary stroke prevention based on the results of the SPARCL trial [46], even in the absence of hyperlipidemia. As a result, an investigator may end up with a stroke cohort in which > 95% are on statins but fewer than half have hyperlipidemia. The problem arises when trying to identify matched controls: in the absence of stroke, there are very few indications for statin use without comorbid hyperlipidemia. The investigator must therefore decide which variable is more important, because it will be impossible to find enough controls on statins without hyperlipidemia to match the two populations perfectly. Fortunately, we have not encountered other such interactions among the common cardiovascular comorbidities that limit matching fidelity.

Conclusions

Propensity score matching and one-to-one matching are both good options for enrolling matched control groups into stroke biomarker studies. Propensity score matching requires a larger pool of enrolled controls, but may be more practical for investigators who lack access to a large EHR or who already have many control samples in hand. One-to-one matching is a more targeted approach, but requires computer programming support.

Supplementary Material

12975_2020_780_MOESM1_ESM

Acknowledgements

We would like to acknowledge research coordinators Jamal Smith, Juby Mathews, and Margot Giannetti, who helped develop the screening materials and carried out many of the recruitment strategies described in the article.

Funding

MAE received research support from the National Institute of Neurological Disorders and Stroke (1U10NS086513) and from the National Center for Advancing Translational Science (KL2TR001432 and UL1TR001409). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Footnotes

Compliance with Ethical Standards

The methods described in this article were developed during recruitment of control participants into the Biomarkers of Stroke Recovery Study (Georgetown IRB #2015-0288). Informed consent was obtained from all individual participants included in the study, and a waiver of informed consent was obtained to perform the screening techniques using the EHR and online questionnaire.

Conflict of Interest

The authors declare no conflicts of interest.

Publisher's Disclaimer: This Author Accepted Manuscript is a PDF file of an unedited peer-reviewed manuscript that has been accepted for publication but has not been copyedited or corrected. The official version of record that is published in the journal is kept up to date and so may therefore differ from this version.

References

1. Lind L, Siegbahn A, Lindahl B, Stenemo M, Sundstrom J, Arnlov J. Discovery of New Risk Markers for Ischemic Stroke Using a Novel Targeted Proteomics Chip. Stroke; a journal of cerebral circulation. 2015.
2. Sheth SA, Iavarone AT, Liebeskind DS, Won SJ, Swanson RA. Targeted Lipid Profiling Discovers Plasma Biomarkers of Acute Brain Injury. PloS one. 2015;10(6):e0129735.
3. Tang Y, Xu H, Du X, Lit L, Walker W, Lu A et al. Gene expression in blood changes rapidly in neutrophils and monocytes after ischemic stroke in humans: a microarray study. Journal of cerebral blood flow and metabolism : official journal of the International Society of Cerebral Blood Flow and Metabolism. 2006;26(8):1089–102.
4. De Marchis GM, Dankowski T, Konig IR, Fladt J, Fluri F, Gensicke H et al. A novel biomarker-based prognostic score in acute ischemic stroke: The CoRisk score. Neurology. 2019;92(13):e1517–e25.
5. Kamel H, Longstreth WT Jr., Tirschwell DL, Kronmal RA, Broderick JP, Palesch YY et al. The AtRial Cardiopathy and Antithrombotic Drugs In prevention After cryptogenic stroke randomized trial: Rationale and methods. International journal of stroke : official journal of the International Stroke Society. 2019;14(2):207–14.
6. Amarenco P, Labreuche J, Lavallee P, Touboul PJ. Statins in stroke prevention and carotid atherosclerosis: systematic review and up-to-date meta-analysis. Stroke; a journal of cerebral circulation. 2004;35(12):2902–9.
7. Lutsep HL, Albers GW, DeCrespigny A, Kamat GN, Marks MP, Moseley ME. Clinical utility of diffusion-weighted magnetic resonance imaging in the assessment of ischemic stroke. Annals of neurology. 1997;41(5):574–80.
8. Randomised trial of endarterectomy for recently symptomatic carotid stenosis: final results of the MRC European Carotid Surgery Trial (ECST). Lancet. 1998;351(9113):1379–87.
9. Barnett HJ, Taylor DW, Eliasziw M, Fox AJ, Ferguson GG, Haynes RB et al. Benefit of carotid endarterectomy in patients with symptomatic moderate or severe stenosis. North American Symptomatic Carotid Endarterectomy Trial Collaborators. The New England journal of medicine. 1998;339(20):1415–25.
10. Edwardson MA, Zhong X, Fiandaca MS, Federoff HJ, Cheema AK, Dromerick AW. Plasma microRNA markers of upper limb recovery following human stroke. Sci Rep. 2018;8(1):12558.
11. Ioannidis JP, Khoury MJ. Improving validation practices in “omics” research. Science. 2011;334(6060):1230–2.
12. Sun D, Tiedt S, Yu B, Jian X, Gottesman RF, Mosley TH et al. A prospective study of serum metabolites and risk of ischemic stroke. Neurology. 2019;92(16):e1890–e8.
13. Edwardson MA, Ding L, Park C, Lane CJ, Nelsen MA, Wolf SL et al. Reduced Upper Limb Recovery in Subcortical Stroke Patients With Small Prior Radiographic Stroke. Front Neurol. 2019;10:454.
14. Sharma S, Eghbali M. Influence of sex differences on microRNA gene regulation in disease. Biol Sex Differ. 2014;5(1):3.
15. Sacco RL, Boden-Albala B, Gan R, Chen X, Kargman DE, Shea S et al. Stroke incidence among white, black, and Hispanic residents of an urban community: the Northern Manhattan Stroke Study. Am J Epidemiol. 1998;147(3):259–68.
16. Kleindorfer DO, Khoury J, Moomaw CJ, Alwell K, Woo D, Flaherty ML et al. Stroke incidence is decreasing in whites but not in blacks: a population-based estimate of temporal trends in stroke incidence from the Greater Cincinnati/Northern Kentucky Stroke Study. Stroke; a journal of cerebral circulation. 2010;41(7):1326–31.
17. Carty CL, Keene KL, Cheng YC, Meschia JF, Chen WM, Nalls M et al. Meta-Analysis of Genome-Wide Association Studies Identifies Genetic Risk Factors for Stroke in African Americans. Stroke; a journal of cerebral circulation. 2015;46(8):2063–8.
18. Kleindorfer DO, Lindsell C, Broderick J, Flaherty ML, Woo D, Alwell K et al. Impact of socioeconomic status on stroke incidence: a population-based study. Annals of neurology. 2006;60(4):480–4.
19. Merwick A, Albers GW, Arsava EM, Ay H, Calvet D, Coutts SB et al. Reduction in early stroke risk in carotid stenosis with transient ischemic attack associated with statin treatment. Stroke; a journal of cerebral circulation. 2013;44(10):2814–20.
20. Deb S, Austin PC, Tu JV, Ko DT, Mazer CD, Kiss A et al. A Review of Propensity-Score Methods and Their Use in Cardiovascular Research. The Canadian journal of cardiology. 2016;32(2):259–65.
21. Dykstra-Aiello C, Jickling GC, Ander BP, Shroff N, Zhan X, Liu D et al. Altered Expression of Long Noncoding RNAs in Blood After Ischemic Stroke and Proximity to Putative Stroke Risk Loci. Stroke; a journal of cerebral circulation. 2016;47(12):2896–903.
22. Ho DE, Imai K, King G, Stuart EA. Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. Political Analysis. 2007;15(3):199–236.
23. Austin PC. Primer on statistical interpretation or methods report card on propensity-score matching in the cardiology literature from 2004 to 2006: a systematic review. Circ Cardiovasc Qual Outcomes. 2008;1(1):62–7.
24. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41–55.
25. Palesch YY. Some common misperceptions about P values. Stroke; a journal of cerebral circulation. 2014;45(12):e244–6.
26. Austin PC. Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples. Statistics in medicine. 2009;28(25):3083–107.
27. Heijboer RRO, Lubberts B, Guss D, Johnson AH, Moon DK, DiGiovanni CW. Venous Thromboembolism and Bleeding Adverse Events in Lower Leg, Ankle, and Foot Orthopaedic Surgery with and without Anticoagulants. J Bone Joint Surg Am. 2019;101(6):539–46.
28. Normand ST, Landrum MB, Guadagnoli E, Ayanian JZ, Ryan TJ, Cleary PD et al. Validating recommendations for coronary angiography following acute myocardial infarction in the elderly: a matched analysis using propensity scores. Journal of clinical epidemiology. 2001;54(4):387–98.
29. Merino JG, Luby M, Benson RT, Davis LA, Hsia AW, Latour LL et al. Predictors of acute stroke mimics in 8187 patients referred to a stroke service. Journal of stroke and cerebrovascular diseases : the official journal of National Stroke Association. 2013;22(8):e397–403.
30. Sharma R, Macy S, Richardson K, Lokhnygina Y, Laskowitz DT. A blood-based biomarker panel to detect acute stroke. Journal of stroke and cerebrovascular diseases : the official journal of National Stroke Association. 2014;23(5):910–8.
31. Lee SY, Choi YC, Kim JH, Kim WJ. Serum neuron-specific enolase level as a biomarker in differential diagnosis of seizure and syncope. Journal of neurology. 2010;257(10):1708–12.
32. Selmaj I, Cichalewska M, Namiecinska M, Galazka G, Horzelski W, Selmaj KW et al. Global exosome transcriptome profiling reveals biomarkers for multiple sclerosis. Annals of neurology. 2017;81(5):703–17.
33. Sun J, Cheng W, Liu L, Tao S, Xia Z, Qi L et al. Identification of serum miRNAs differentially expressed in human epilepsy at seizure onset and post-seizure. Mol Med Rep. 2016;14(6):5318–24.
34. Gargalas S, Weeks R, Khan-Bourne N, Shotbolt P, Simblett S, Ashraf L et al. Incidence and outcome of functional stroke mimics admitted to a hyperacute stroke unit. Journal of neurology, neurosurgery, and psychiatry. 2017;88(1):2–6.
35. Geisler F, Ali SF, Ebinger M, Kunz A, Rozanski M, Waldschmidt C et al. Evaluation of a score for the prehospital distinction between cerebrovascular disease and stroke mimic patients. International journal of stroke : official journal of the International Stroke Society. 2018:1747493018806194.
36. Lewandowski C, Mays-Wilson K, Miller J, Penstone P, Miller DJ, Bakoulas K et al. Safety and outcomes in stroke mimics after intravenous tissue plasminogen activator administration: a single-center experience. Journal of stroke and cerebrovascular diseases : the official journal of National Stroke Association. 2015;24(1):48–52.
37. Markowitz J, Abrams Z, Jacob NK, Zhang X, Hassani JN, Latchana N et al. MicroRNA profiling of patient plasma for clinical trials using bioinformatics and biostatistical approaches. Onco Targets Ther. 2016;9:5931–41.
38. Drake BF, Boyd D, Carter K, Gehlert S, Thompson VS. Barriers and Strategies to Participation in Tissue Research Among African-American Men. J Cancer Educ. 2017;32(1):51–8.
39. Freimuth VS, Quinn SC, Thomas SB, Cole G, Zook E, Duncan T. African Americans’ views on research and the Tuskegee Syphilis Study. Soc Sci Med. 2001;52(5):797–808.
40. Corbie-Smith G. The continuing legacy of the Tuskegee Syphilis Study: considerations for clinical investigation. Am J Med Sci. 1999;317(1):5–8.
41. Skloot R. The Immortal Life of Henrietta Lacks. Crown; 2010.
42. Zhou Y, Elashoff D, Kremen S, Teng E, Karlawish J, Grill JD. African Americans are less likely to enroll in preclinical Alzheimer’s disease clinical trials. Alzheimers Dement (N Y). 2017;3(1):57–64.
43. Braunstein JB, Sherber NS, Schulman SP, Ding EL, Powe NR. Race, medical researcher distrust, perceived harm, and willingness to participate in cardiovascular prevention trials. Medicine (Baltimore). 2008;87(1):1–9.
44. Cottler LB, Zipp JF, Robins LN, Spitznagel EL. Difficult-to-recruit respondents and their effect on prevalence estimates in an epidemiologic survey. Am J Epidemiol. 1987;125(2):329–39.
45. Jancey J, Howat P, Lee A, Clarke A, Shilton T, Fisher J et al. Effective recruitment and retention of older adults in physical activity research: PALS study. Am J Health Behav. 2006;30(6):626–35.
46. Amarenco P, Bogousslavsky J, Callahan A 3rd, Goldstein LB, Hennerici M, Rudolph AE et al. High-dose atorvastatin after stroke or transient ischemic attack. The New England journal of medicine. 2006;355(6):549–59.
