Abstract
Background:
Use of routine data sources within clinical research is increasing and is endorsed by the National Institute for Health Research to increase trial efficiencies; however, there is limited evidence for its use in clinical trials, especially in relation to self-harm. One source of routine data, Hospital Episode Statistics, is collated and distributed by NHS Digital and contains details of admissions, outpatient, and Accident and Emergency attendances provided periodically by English National Health Service hospitals. We explored the reliability and accuracy of Hospital Episode Statistics, compared to data collected directly from hospital records, to assess whether it would provide a complete, accurate, and reliable means of acquiring hospital attendances for self-harm – the primary outcome for the SHIFT (Self-Harm Intervention: Family Therapy) trial evaluating Family Therapy for adolescents following self-harm.
Methods:
Participant identifiers were linked to Hospital Episode Statistics Accident and Emergency, and Admissions data, and episodes combined to describe participants’ complete hospital attendance. Attendance data were initially compared to data previously gathered by trial researchers from pre-identified hospitals. Final comparison was conducted of subsequent attendances collected through Hospital Episode Statistics and researcher follow-up. Consideration was given to linkage rates; number and proportion of attendances retrieved; reliability of Accident and Emergency, and Admissions data; percentage of self-harm episodes recorded and coded appropriately; and percentage of required data items retrieved.
Results:
Participants were first linked to Hospital Episode Statistics with an acceptable match rate of 95%, identifying a total of 341 complete hospital attendances, compared to 139 reported by the researchers at the time. The proportion of Hospital Episode Statistics Accident and Emergency episodes that could not be classified in relation to self-harm (75%) was more than double that of admitted episodes (34.9%); of overall attendances, 18% were classified as self-harm related and 20% as not related, while ambiguity or insufficient information meant 62% were unclassified. Of 39 self-harm-related attendances reported by the researchers, Hospital Episode Statistics identified 24 (62%) as self-harm related while 15 (38%) were unclassified. Based on final data received, 1490 complete hospital attendances were identified, and comparison to researcher follow-up found Hospital Episode Statistics underestimated the number of self-harm attendances by 37.2% (95% confidence interval 32.6%–41.9%).
Conclusion:
Advantages of routine data collection via NHS Digital included the acquisition of more comprehensive and timely trial outcome data, identifying more than double the number of hospital attendances identified by researchers. Disadvantages included ambiguity in the classification of self-harm relatedness. Our resulting primary outcome data collection strategy used routine data to identify hospital attendances, supplemented by targeted researcher data collection for attendances requiring further self-harm classification.
Keywords: Self-harm, young people, child and adolescent mental health, NHS Digital, Hospital Episode Statistics, data collection, randomised controlled trial, routine data
Background
Routine data and research
Large quantities of electronic health data are routinely collected in numerous administrative and clinical databases. In addition to diagnosis-specific clinical databases,1 general clinical databases exist, including Hospital Episode Statistics and the Clinical Practice Research Datalink, which contain general clinical information provided by National Health Service (NHS) hospitals in England and by a sample of UK general practices, respectively. Healthcare records are used primarily within the NHS to inform the direct care of patients; however, extensive secondary use of anonymised data is made to inform commissioning and clinical audit, public health monitoring and management, and research.2
The use of routine data for clinical epidemiological and observational studies is common, describing population characteristics, identifying risk factors, comparing outcomes, and assessing variations across providers.3 Clinical trials tend to rely on data designed, generated, and collected specifically for trial purposes. Including routinely collected data in trials is of growing interest, as are fully pragmatic trials implemented using everyday clinical data,4 to reduce cost, maximise efficient trial design, and improve external validity. The use of routinely collected patient-level data is endorsed by the National Institute for Health Research in the United Kingdom to increase efficiency by informing feasibility and sample size calculations, identifying eligible patients, and supporting follow-up data collection.
There is limited evidence on the use of routine data within clinical trials. Williams et al.5 assessed the feasibility, utility, and resource implications of using electronically captured routine data from various sources and, despite potential benefits and cost reductions, confirmed concerns around data validity, difficulties identifying, accessing, and extracting data, and a lack of uniformity. Cook and Collins3 advised that the completeness and accuracy of data sources must be considered, and that findings should be replicated using other sources to verify results. For instance, following replication of a large clinical trial, Barry et al.6 reported that, depending on outcome type, routinely collected data could be used for cardiovascular outcome detection in countries with unified health systems.
Routine data and self-harm
Self-harm in adolescents is a major public health issue. Globally, suicide is the second commonest cause of death in 10–24 year olds.7 Around 20,000–30,000 adolescents present to hospital in England each year having self-harmed.8 Research in this field using Hospital Episode Statistics admissions data includes retrospective analyses, time series analyses, and cohort studies.9–13 NHS Digital, the national provider of Hospital Episode Statistics, publishes National Statistics, and hospital admissions caused by ‘intentional’ self-harm have been a previous topic of interest.14 Hospital Episode Statistics also include outpatient and Accident and Emergency attendances; however, self-harm research utilising these datasets has been far more limited.
Emergency hospital admission for self-harm is a key indicator in the Health Profiles 2013 Indicator guide,15 but excludes Accident and Emergency presentations not resulting in admission. The Public Health Outcomes Framework16 includes Accident and Emergency attendances for self-harm, stating however that Hospital Episode Statistics need further development to support the indicator. Polling et al.17 reported that it was possible to identify presentations for self-harm by combining trust-level electronic health records with Hospital Episode Statistics, validated against audit data. Thomas et al.18 concluded that general practitioner reporting of suicide and self-harm using Read Codes through the Clinical Practice Research Datalink was unreliable, identifying only 68.5% of self-harm recorded in Hospital Episode Statistics admissions. In the Avon Longitudinal Study of Parents and Children,19 self-harm episodes were identified in Hospital Episode Statistics for only 3% of 417 participants reporting a history of self-harm, and of 41 individuals with an admission identifying self-harm, 66% had no corresponding self-harm Accident and Emergency record, highlighting the unreliability of Accident and Emergency data.
Furthermore, Accident and Emergency Hospital Episode Statistics have been reported to underestimate self-harm rates by up to 60% compared to local data;20 admissions were more reliable but underestimated self-harm rates due to presentations not resulting in admission.
Rationale, aims, and objectives
We aimed to assess whether Hospital Episode Statistics would provide a complete, accurate, and reliable means of acquiring primary outcome and safety data for the SHIFT (Self-Harm Intervention: Family Therapy) trial.21 Collection of the primary endpoint, self-harm leading to hospital attendance, was resource intensive requiring researcher visits to hospitals across England to interrogate local medical records, with data missed for attendances outside identified catchment areas. Should routine data prove reliable, benefits would be regular, England-wide data retrieval, avoidance of biased data collection due to more frequent visits to some hospitals, and more efficient use of researcher resources.
We aimed to compare attendances reported via Hospital Episode Statistics to those collected directly by researchers, with consideration given to linkage; identification of attendances; reliability of Accident and Emergency, and admissions data; self-harm categorisation; and data quality and completeness.
Methods
The SHIFT trial
SHIFT21 was a pragmatic, phase III, multicentre, individually randomised, controlled trial comparing clinical and cost-effectiveness of systemic Family Therapy to Treatment-as-Usual in adolescents following self-harm. Systemic Family Therapy is a psychological treatment aiming to reduce distress and conflict by changing communication, relationships, and roles in family members; in SHIFT, it consisted of ∼8 sessions over ∼6 months. Eight hundred and thirty-two young people, aged 11–17, and their primary carers were recruited from 41 Child and Adolescent Mental Health Services between April 2010 and December 2013, and followed for 18 months.
The primary outcome, self-harm leading to hospital attendance, defined self-harm as any form of non-fatal self-poisoning or self-injury regardless of motivation or intent. This provided an objective measure, not reliant on self-report (prone to recall and response bias), which could be quantified from hospital records even if participant contact was lost.
Original methods to obtain hospital attendances involved manual searches of local medical records by trial researchers. Hospitals with Accident and Emergency departments were ‘mapped’ to recruiting services to ensure appropriate catchment area coverage, acknowledging that ‘out-of-area’ presentations would not be detected. Thereafter, local approval was obtained for researcher data collection, with regular visits required to access up-to-date data from each hospital.
Hospital Episode Statistics
Approval for Accident and Emergency, and admissions Hospital Episode Statistics was obtained from NHS Digital following review of appropriate participant consent. To identify trial participants, NHS number, date of birth, gender, and postcode were provided. Data items requested matched, where possible, data collected via researchers, including dates of attendance, admission, and discharge; presenting hospital; patient group, diagnosis, and cause; and treatment. Final year data are available following an annual refresh in November for the preceding financial year; hence, ‘provisional’ datasets were provided for the most recent data years to ensure data were as up to date as possible.
Pilot downloads were obtained in April and August 2012 containing data for 487 of the 832 SHIFT participants recruited at that time, providing data for our initial comparison with researcher-collected data. Further data were subsequently received over the duration of the trial for the full cohort of 832 participants. Final comparison to researcher-collected data is referred to as full cohort follow-up.
Data cleaning and derivation of complete hospital attendances
To compare Hospital Episode Statistics to researcher-collected data, episodes falling outside of participants’ 18-month follow-up were removed. The primary interest was emergency hospital attendances; therefore, planned follow-up attendances, elective and maternity events were removed.
Each row of admissions data corresponds to a finished consultant episode, and episodes taking place over a continuous period of time describe a continuous in-patient spell detailing the patient pathway.22 Episodes (rows) across the datasets can therefore describe a patient’s complete hospital attendance, from initial presentation to final discharge, with multiple episodes where patients attend Accident and Emergency and are then admitted, receive care under multiple consultants, or are transferred between hospitals. Episodes were linked to obtain a participant’s complete hospital attendance (Accident and Emergency attendances and continuous in-patient spells) (Supplementary Table S1).22,23
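To make the episode-to-attendance derivation concrete, the sketch below chains admitted patient care episodes into continuous in-patient spells and links them to Accident and Emergency attendances. The column names (pid, arrival_date, epi_start, epi_end) and the same-day/next-day linkage rule are illustrative assumptions for this example, not the NHS Digital spell and linkage methodology cited above.22,23

```python
# Illustrative sketch only: column names and the same-day/next-day linkage rule
# are assumptions for this example, not the published HES spell methodology.
import pandas as pd

def build_spells(apc: pd.DataFrame) -> pd.DataFrame:
    """Chain finished consultant episodes (one row each) into continuous
    in-patient spells per participant."""
    apc = apc.sort_values(["pid", "epi_start", "epi_end"]).reset_index(drop=True)
    # A new spell starts when the participant changes or an episode begins
    # after the previous episode ends (i.e. the stay was not continuous).
    new_spell = (apc["pid"] != apc["pid"].shift()) | (apc["epi_start"] > apc["epi_end"].shift())
    apc["spell_id"] = new_spell.cumsum()
    return apc.groupby("spell_id", as_index=False).agg(
        pid=("pid", "first"),
        admitted=("epi_start", "min"),
        discharged=("epi_end", "max"),
    )

def link_attendances(ae: pd.DataFrame, spells: pd.DataFrame) -> pd.DataFrame:
    """Join A&E attendances to in-patient spells for the same participant when
    the admission begins on the day of, or the day after, the A&E arrival,
    giving one row per complete attendance (A&E only or A&E plus admission).
    Spells with no matching A&E row are the 'admission only' attendances."""
    pairs = ae.merge(spells, on="pid", how="left")
    gap_days = (pairs["admitted"] - pairs["arrival_date"]).dt.days
    pairs["admitted_from_ae"] = gap_days.between(0, 1)
    # Keep the linked spell where one exists, otherwise the A&E-only row.
    pairs = pairs.sort_values("admitted_from_ae", ascending=False)
    return pairs.drop_duplicates(subset=["pid", "arrival_date"], keep="first")
```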
Identifying self-harm-related attendances
Accident and Emergency episodes use ‘patient group’ to indicate the reason for attendance, together with two broad diagnosis codes. Of eight patient groups, ‘Deliberate self-harm’ identifies self-harm; however, where patient group does not provide a clear reason (i.e. ‘not known’), further diagnosis codes may not rule out self-harm, for example, ‘Poisoning (including overdose)’ could include deliberate or accidental poisoning.
For Admissions, up to 20 diagnoses are provided per episode using International Statistical Classification of Diseases and Related Health Problems 10th Edition (ICD-10) codes. Codes X60-X84 identify Intentional self-harm, and further codes such as Y10-Y34 (event of undetermined intent) and S00-T79 (injury, poisoning, and certain other consequences of external causes) have potential to identify self-harm.
We classified attendances as self-harm related, possibly self-harm related, and not self-harm related (Supplementary Table S2). Classification was conservative and uncertainty resulted in unclassified attendances, including those with a mix of ‘possible’ and non-self-harm codes, ‘possible’ self-harm codes only, or non-informative codes (Figure 1).
Figure 1.
Classification for self-harm relatedness.
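The snippet below illustrates the conservative classification logic described above. The precise rule set applied in the trial is given in Supplementary Table S2, so the code ranges and patient-group handling here are an approximation based on this description only.

```python
# Approximation of the conservative classification described in the text;
# the trial's exact rules are in Supplementary Table S2.
from typing import Optional

def classify_icd10(code: str) -> str:
    """Classify a single ICD-10 diagnosis code for self-harm relatedness."""
    letter, number = code[0].upper(), int(code[1:3])
    if letter == "X" and 60 <= number <= 84:                # X60-X84: intentional self-harm
        return "self-harm"
    if letter == "Y" and 10 <= number <= 34:                # Y10-Y34: undetermined intent
        return "possible"
    if letter == "S" or (letter == "T" and number <= 79):   # S00-T79: injury, poisoning, etc.
        return "possible"
    return "non-self-harm"

def classify_attendance(patient_group: Optional[str], icd10_codes: list) -> str:
    """Conservative attendance-level classification: 'possible' codes alone, or a
    mix of 'possible' and non-self-harm codes, leave the attendance unclassified."""
    labels = {classify_icd10(code) for code in icd10_codes}
    if patient_group == "Deliberate self-harm" or "self-harm" in labels:
        return "self-harm"
    if "possible" in labels:
        return "unclassified"   # cannot rule self-harm in or out
    if labels or patient_group not in (None, "Not known", "Other than above", "Other accident"):
        return "non-self-harm"
    return "unclassified"       # no informative patient group or diagnosis codes
```

For example, an attendance whose only admission diagnosis falls in Y10-Y34 (undetermined intent) would return ‘unclassified’ and be flagged for researcher follow-up.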
Results
Initial comparison
Results first relate to pilot Hospital Episode Statistics informing our initial comparison to researcher data, at which time 487 participants had been recruited to the trial.
Linkage
A high linkage rate was achieved, with 465/487 (95%) participants matched with varying levels of certainty (Table 1). The most reliable linkage, at step 1, matched on all identifiers, while the least reliable, at step 8, matched on NHS number only. Linkage at steps 5 and 8 was not considered acceptable and episodes for four participants were discarded. For unmatched or inadequately linked participants, identifiers were further queried to rule out data completion or entry errors.
Table 1.
Initial comparison: linkage of SHIFT participants to Hospital Episode Statistics.
| Step | Records matched | NHS number | DoB | Sex | Postcode | Acceptable match |
|---|---|---|---|---|---|---|
| 1 | 376 (77.2%) | ✓ | ✓ | ✓ | ✓ | Yes |
| 2 | 35 (7.2%) | ✓ | ✓ | ✓ | | Yes – adequate. Due to potential changes in postcode over time |
| 3 | 6 (1.2%) | ✓ | Partial | ✓ | ✓ | Yes – adequate. Identified errors in the DOB we held for 5/6 participants |
| 4 | 0 | ✓ | Partial | ✓ | | Yes – adequate |
| 5 | 3 (0.6%) | ✓ | | ✓ | | No. Incorrect NHS number held within SHIFT for 1 participant |
| 6 | 44 (9.0%)a | | ✓ | ✓ | ✓ | Yes – with further checks. Largely due to missing NHS number at time of linkage; when the NHS number was provided, there was largely a partial match on NHS number identifying errors in the NHS number we held |
| 7 | 0 | | ✓ | ✓ | ✓b | Yes – with further checks |
| 8 | 1 (0.2%) | ✓ | | | | No. Incorrect NHS number held within SHIFT |
| Unmatched | 22 (4.5%) | | | | | No |
| Total | 487 | | | | | |
NHS: National Health Service; SHIFT: Self-Harm Intervention: Family Therapy.
aOne participant linked to three different identifiers in step 6; the correct identifier was identified after querying the NHS number at site and identifying common hospital attendances in both the researcher-collected data and one of the three Hospital Episode Statistics records.
bPostcode in the ignore list (communal establishments such as hospitals, prisons, army barracks).
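For illustration, stepwise deterministic linkage of the kind summarised in Table 1 can be sketched as follows. The identifier sets mirror the table (steps using a partial date of birth are simplified to the full date, and steps with no matches are omitted); this is an assumption-laden sketch rather than NHS Digital’s actual linkage implementation.

```python
# Sketch of stepwise deterministic matching in the spirit of Table 1;
# not NHS Digital's linkage algorithm. Exact matching only is assumed.
import pandas as pd

MATCH_STEPS = {
    1: ["nhs_number", "dob", "sex", "postcode"],
    2: ["nhs_number", "dob", "sex"],
    5: ["nhs_number", "sex"],
    6: ["dob", "sex", "postcode"],
    8: ["nhs_number"],
}

def stepwise_link(trial: pd.DataFrame, hes: pd.DataFrame) -> pd.DataFrame:
    """Assign each trial participant the first (most stringent) step at which
    their identifiers match a HES record; unmatched participants remain NA."""
    trial = trial.copy()
    trial["match_step"] = pd.NA
    for step, keys in sorted(MATCH_STEPS.items()):
        unmatched = trial["match_step"].isna()
        # Drop HES rows with missing identifiers so NA keys cannot create matches.
        candidates = trial.loc[unmatched, keys].merge(
            hes[keys].dropna().drop_duplicates(), on=keys, how="left", indicator=True
        )
        hits = candidates["_merge"].eq("both").to_numpy()
        trial.loc[trial.index[unmatched][hits], "match_step"] = step
    return trial
```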
Hospital Episode Statistics data cleaning and derivation of complete hospital attendances
A total of 1897 episodes were received from April 2009 to May 2012 (Figure 2), with 516 episodes occurring within 18-month follow-up. After removing non-emergency episodes, 458 remained: 332 Accident and Emergency, and 126 admission episodes. Linkage of episodes resulted in 341 complete hospital attendances: 222 (65%) Accident and Emergency attendances resulting in discharge, 98 (29%) Accident and Emergency attendances resulting in admission, and 21 (6%) emergency admissions, that is, via the general practitioner.
Figure 2.
Initial comparison: linkage of Hospital Episode Statistics episodes to form complete hospital attendances.
Self-harm classification
Classification identified 17% of episodes and 18% of complete hospital attendances as self-harm related (Table 2); 64% of episodes and 62% of complete hospital attendances could not be classified. Three quarters of Accident and Emergency episodes were unclassified, whereas ICD-10 coding of admissions ensured a far higher classification rate, with 35% unclassified. Low classification of Accident and Emergency episodes was attributed to the large proportion (299, 90%) reported under uninformative patient groups ‘Other than above’, ‘Other accident’, or ‘Not known’, and while diagnosis codes identified some episodes as not self-harm related, they could not identify self-harm.
Table 2.
Initial comparison – self-harm classification of Hospital Episode Statistics: episodes and complete hospital attendances.
| Self-harm classification | Emergency-related episodes: Accident and Emergency (n = 332) | Emergency-related episodes: Admission (n = 126) | Emergency-related episodes: Total (n = 458) | Emergency-related complete hospital attendances: Accident and Emergency (n = 222) | Emergency-related complete hospital attendances: Accident and Emergency, and Admission (n = 98) | Emergency-related complete hospital attendances: Admission (n = 21) | Emergency-related complete hospital attendances: Total (n = 341) |
|---|---|---|---|---|---|---|---|
| Self-harm | 29 (8.7%) | 49 (38.9%) | 78 (17.0%) | 10 (4.5%) | 51 (52.0%) | 0 (0.0%) | 61 (17.9%) |
| Non-self-harm | 54 (16.3%) | 33 (26.2%) | 87 (19.0%) | 37 (16.7%) | 19 (19.4%) | 13 (61.9%) | 69 (20.2%) |
| Unclassified | 249 (75.0%) | 44 (34.9%) | 293 (64.0%) | 175 (78.8%) | 28 (28.6%) | 8 (38.1%) | 211 (61.9%) |
Researcher comparison
Identification of attendances
Three hundred and forty-four complete hospital attendances were reported via either source, with 40% reported via both methods (Table 3). Researchers reported less than half of all attendances; Hospital Episode Statistics reported all but three.
Table 3.
Initial comparison: identification of hospital attendances within Hospital Episode Statistics and as reported by the researcher.
| HESa | Researcher: attendance reported | Researcher: attendance not reported, hospital searchedb | Researcher: attendance not reported, hospital not searchedb | Total |
|---|---|---|---|---|
| Attendance reported | 136 (39.5%) | 25 (7.3%) | 180 (52.3%) | 341 (99.1%) |
| Attendance not reported | 3 (0.9%) | NA | NA | 3 (0.9%) |
| Total | 139 (40.4%) | 205 (59.6%) | | 344 (100%) |
aHES: Hospital Episode Statistics.
bHospital searched or not post attendance.
Most attendances not reported by the researcher were in hospitals the researcher had not visited since the participant’s attendance. There were 138 (40%) in hospitals yet to be visited, either because the first visit was yet to be arranged (82, 21%), approval was not yet granted (27, 8%), or a visit had not been planned (29, 8%); a further 42 (12%) occurred after hospital records had been searched during a researcher visit. There were, however, 25 (7.3%) attendances not reported by the researcher following a visit to the corresponding hospital. These comprised a disproportionate number of admission-only attendances (11, 52%), suggesting researchers had greater difficulty finding admissions which did not follow an Accident and Emergency attendance, that is, emergency, elective, or GP referrals. By trial end, 10 were subsequently found and reported by the researcher, 13 were not requested as self-harm classification was not required, and two were not reported despite further searches.
Three Accident and Emergency attendances were reported by the researcher but not within Hospital Episode Statistics, with no clear reason why: the participants had been reliably linked, other attendances were reported for two of the participants, and the attendances occurred in different hospitals for which other attendances were reported.
Self-harm comparison
There were no conflicting self-harm classifications (Table 4). Researchers obtained enough information for all 139 reported attendances, whereas over half (211, 62%) could not be classified from Hospital Episode Statistics alone. Combining data, 89 unclassified attendances had already been reported by the researcher, of which 15 (17%) were self-harm related. Of 39 self-harm attendances reported by the researchers, Hospital Episode Statistics identified 24 (62%) as self-harm related while 15 (38%) were unclassified.
Table 4.
Self-harm classification of attendances reported within Hospital Episode Statistics and by the researcher – initial comparison and full cohort follow-up.
| HESa | Researcher: Self-harm | Researcher: Non-self-harm | Researcher: Not known | Researcher: Attendance not reportedb | Total |
|---|---|---|---|---|---|
| Initial comparison | | | | | |
| Self-harm | 24 (7.0%) | 0 | 0 | 37 (10.8%) | 61 (17.7%) |
| Non-self-harm | 0 | 23 (6.7%) | 0 | 46 (13.4%) | 69 (20.1%) |
| Unclassified | 15 (4.4%) | 74 (21.5%) | 0 | 122 (35.5%) | 211 (61.3%) |
| Attendance not reported | 0 | 3 (0.9%) | 0 | NA | 3 (0.9%) |
| Total | 39 (11.3%) | 100 (29.1%) | 0 | 205 (59.6%) | 344 (100%) |
| Full cohort follow-up | | | | | |
| Self-harm | 83 (5.5%) | 3 (0.2%) | 0 | 186 (12.3%)b | 272 (18.0%) |
| Non-self-harm | 1 (0.1%) | 61 (4.0%) | 0 | 352 (23.3%)b | 414 (27.4%) |
| Unclassified | 129 (8.5%) | 505 (33.4%) | 9 (0.6%) | 161 (10.6%) | 804 (53.1%) |
| Attendance not reported | 3 (0.2%) | 19 (1.3%) | 1 (0.1%) | NA | 23 (1.5%) |
| Total | 216 (14.3%) | 588 (38.9%) | 10 (0.7%) | 699 (46.2%) | 1513 (100%) |
aHES: Hospital Episode Statistics.
bAttendances not reported by the researcher during full cohort follow-up were largely not expected, as a change in process following the initial comparison meant researchers only identified attendances as directed by Hospital Episode Statistics data where more information was required to enable classification.
Researcher follow-up
Researchers were tasked with following up the 122 unclassified attendances, and 90% were subsequently identified. Of the 12 (10%) remaining attendances not identified, one occurred out of area, two in primary care trusts, and four in minor injury or walk-in centres; researchers could not search for these attendances as they fell outside our R&D-approved trusts. No record of the remaining five could be found despite further researcher visits.
Full cohort follow-up
Further Hospital Episode Statistics were obtained on four occasions throughout the trial for the full cohort of 832 participants. By the final download in May 2015, 1490 complete hospital attendances had been reported within participants’ 18-month follow-up (Table 4). For 686 (46%), Hospital Episode Statistics provided enough information to establish self-harm relatedness, while 804 (54%) required researcher follow-up, with 80% subsequently identified.
Many attendances not identified by the researchers were in minor injury or walk-in centres (40%, 64/161); a further 51 (32%) were in hospitals where the researcher had not identified any reported attendances; the remaining 46 (29%) were in hospitals where the researcher had successfully identified other reported attendances (6% of attendances requiring researcher follow-up).
Remaining unclassified attendances were clinically reviewed based on diagnosis and ICD-10 codes, outcome, duration of admission, and treatment. Clinical review classed 43% (69/161) as not self-harm related, largely based on admission codes for other mental health conditions and diagnoses including contusions/abrasions, dislocation/fractures, or sprain/ligament injury.
At the end of the trial, 23 (2%) attendances were reported by the researchers but not within Hospital Episode Statistics; 14 were not reported despite successful linkage and data coverage for those participants (the remainder could be accounted for, for example, by lack of linkage); all but one were Accident and Emergency attendances.
There were conflicting self-harm classifications for 4/791 (<1%) attendances reported via both sources, with three self-harm related according to Hospital Episode Statistics and one from the researcher; all related to poisoning or involved alcohol.
Of all 1490 complete hospital attendances reported from Hospital Episode Statistics, 272 (18.3%) were clearly self-harm related (Table 4), and 129 unclassified attendances were identified by the researcher as self-harm related, resulting in 401 self-harm attendances (26.9%). There remained 161 unclassified attendances without researcher follow-up; assuming the same proportion for self-harm as unclassified attendances with follow-up (20.1%, 129/643), we estimate a further 32 self-harm attendances. Therefore, of 1490 attendances, a total of 433 (29.1%) self-harm attendances is assumed, of which 161 were not identified as self-harm related by Hospital Episode Statistics, comprising 10.8% of all attendances and 37.2% of self-harm related attendances. We therefore estimate Hospital Episode Statistics underestimate the overall proportion of self-harm hospital attendances in this population by 10.8% (95% confidence interval (CI) 9.3%–12.5%), and the overall number of self-harm hospital attendances by 37.2% (95% CI 32.6%–41.9%).
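The confidence interval method is not stated above; as a check, the figures can be approximately reproduced assuming an exact (Clopper-Pearson) binomial interval, as sketched below.

```python
# Sketch reproducing the reported intervals under the assumption of an exact
# (Clopper-Pearson) binomial CI; the paper does not state the method used.
from scipy.stats import beta

def exact_ci(k: int, n: int, alpha: float = 0.05):
    """Clopper-Pearson exact confidence interval for a binomial proportion k/n."""
    lower = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lower, upper

missed = 161            # self-harm attendances not identified as such in HES
assumed_total = 433     # assumed total self-harm attendances
all_attendances = 1490  # all complete hospital attendances

print(exact_ci(missed, all_attendances))  # proportion of all attendances, approx. (0.093, 0.125)
print(exact_ci(missed, assumed_total))    # proportion of self-harm attendances, approx. (0.326, 0.419)
```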
Discussion
We compared hospital attendances reported from Hospital Episode Statistics to those identified directly from hospital records by researchers, to assess whether routine data provided a complete, accurate, and reliable outcome dataset.
Main findings
Researcher data collection
Researcher access to data from multiple hospital trusts was a barrier from the outset. Over 2 years into recruitment, we had obtained approvals from 19/30 Trusts, with researchers having accessed data from 16. Difficulties were due to the identification of suitable ‘local collaborators’ to facilitate access, and to variable Trust opinions on the requirement for full local approval despite the study being ‘data collection only’ and classified by the main Research Ethics Committee as exempt from Site Specific Assessment.
Researcher data collection was immensely resource intensive, requiring regular visits to numerous Trusts, with uncertainty concerning potential attendances at Trusts not included in the mapping and approvals process. The majority allowed access and review of electronic records, and one was willing to link and provide electronic data periodically; however, others required manual searches of paper records. It was often not possible for researchers to access both Accident and Emergency, and admissions data to provide the whole patient pathway, and our findings suggest greater difficulty collecting admissions which were not via Accident and Emergency.
Hospital Episode Statistics comparison
Trial participants were reliably linked to Hospital Episode Statistics (95%) during the first download, with improved linkage (99%) over successive downloads.
Hospital Episode Statistics were superior for identification of the number, date, and location of hospital attendances. Researchers identified less than half of all attendances in our original comparison due to difficulties obtaining approvals and visiting trusts to manually collect attendances on a regular basis. Attendances not reported by researchers mainly included those in minor injury and walk-in centres, and hospitals outside our catchment areas. Excluding these, and despite researcher visits having taken place at the relevant hospital, 25 (7%) attendances reported in Hospital Episode Statistics were not identified at site in the initial comparison, and 46 (6%) of those requiring researcher follow-up for self-harm classification were not identified in the full cohort follow-up.
Similar to recent literature,19,20 we found greater reliability and data quality in admissions than in Accident and Emergency Hospital Episode Statistics for self-harm presentations. Over double the proportion of Accident and Emergency episodes were unclassified in relation to self-harm (75%) compared to admitted episodes (35%). The initial comparison identified 211 (62%) unclassified attendances, improving to 54% by full cohort follow-up.
In our full cohort follow-up, researchers verified 505/643 (79%) unclassified attendances as not self-harm related. Clinical review classified a further 69/161 (43%) without researcher data as not self-harm related, suggesting our classification was potentially over-conservative. However, 20% of unclassified attendances were identified as self-harm related by the researcher and would have been missed based on Hospital Episode Statistics alone.
Initial comparison found 38.5% (95% CI 23.4%–55.4%, N = 39) of researcher-reported self-harm attendances were unclassified from Hospital Episode Statistics; in our full cohort follow-up, relying on definitive self-harm identifiers in Hospital Episode Statistics underestimated the number of self-harm attendances by 37.2% (95% CI 32.6%–41.9%, N = 433). We found Hospital Episode Statistics to be more reliable than reported by the Multicentre self-harm study,20 which found underestimation of the overall rate of self-harm by up to 60%. This higher rate of underestimation may be due to differences in study design as, unlike SHIFT, patients’ study data and Hospital Episode Statistics were not directly linked, nor were Accident and Emergency and admissions data linked; thus, only overall rates of self-harm were compared, for Accident and Emergency, and admissions separately.
Implications of findings
Using Hospital Episode Statistics did not negate the need for researcher follow-up due to the high proportion of unclassified attendances and underestimation of self-harm. Hospital Episode Statistics reliably identified overall hospital attendances, vastly improving the completeness and timeliness of safety reporting and ensuring efficient targeting of researcher data collection to obtain self-harm classification where unclear. Our findings suggest that routine data can provide an efficient method to identify outcome events involving hospital attendance and admission; however, depending on the nature of the event, additional follow-up may be required to provide further detail and validate specific events. Implications are relevant across routine data sources and disease areas, and further support studies which propose supplementing routine data with appropriate source documentation for focused events6 and which recommend replication of findings using multiple data sources.3
Lessons learnt
Obtaining pilot Hospital Episode Statistics proved vital to determine the additional fields required to derive complete hospital attendances and to assess the reliability of the data to inform subsequent data collection. Initial analysis removed non-emergency attendances; however, these were retrospectively included, and follow-up attendances provided additional information. Greater consideration was required for attendances in NHS walk-in centres and minor injury units as these often fell outside of trusts with approval to collect data, had more sparsely populated data, did not meet the primary outcome, and have known coverage issues. With the growth in walk-in centres and minor injury units, researchers must be clear on the type of attendances they investigate. This is complex as service patterns differ by area, with potential for confusion in interpretation.
Data manipulation following each download was an intensive task: linking episodes to create complete hospital attendances, cleaning data, and assigning provider codes and self-harm classification. Data were provided cumulatively; therefore, the most recent years of data were ‘provisional’ and also required comparison to previous data to identify additional or updated records. Provider codes were often available at Trust rather than hospital level, particularly for Accident and Emergency, walk-in centre, and minor injury unit attendances; as such, researchers could not always be informed exactly where to search, and on two occasions primary care trusts were present in the dataset.
Recommendations for trialists
- Consider the data provider’s consent requirements, and liaise during trial set-up when developing participant documentation.
- Incorporate a pilot download to ascertain whether requested fields are sufficient and to understand the data, their quality, and analysis requirements.
- Ensure additional data sources are available to verify routine data, or that there is sufficient prior research to confirm completeness and reliability.
- Consider the frequency of data receipt, the time-lag of available data, and the data processing undertaken by the provider.
- Routine data may provide efficiencies in case identification and targeting of researcher resource where populations are mobile, follow-up is over an extended period of time, frequent follow-up is required, or the sample size is large. If a study is restricted to a limited area, then routine data may be overly complex compared to traditional trial data collection methods.
- Consider the scope of data providers; local clinical administrative datasets may exist regionally and different data sources nationally. Trials covering multiple regions or countries need to assess the reliability of all sources and the similarities and differences between them.
Strengths and limitations
Our research included an adolescent cohort with a known history of self-harm, individual consent for Hospital Episode Statistics linkage, and researcher-collected data providing a comparator. We linked Accident and Emergency, and admissions episodes, subsequently identifying self-harm-related presentations and those which could not be classified. We further explored self-harm attendances not reported as such in Hospital Episode Statistics through researcher follow-up. Our findings build on research which relied on participant self-report from a wider population,19 on the Multicentre self-harm study20 comparing overall self-harm rates between unlinked study data and anonymous Hospital Episode Statistics, and on research in other disease areas reporting on the use of routine data to identify endpoints in trials.6
A limiting factor of our initial comparison was the relatively few self-harm attendances reported by the researcher. However, the lack of data overall highlighted that researcher data collection alone could not provide a complete means of acquiring data, thus the need to use routine Hospital Episode Statistics to identify attendances and target researcher follow-up to reliably classify attendances.
Both Hospital Episode Statistics and researcher data collection were limited to England. Although this was a large national multicentre trial, attendances outside of England were not expected and were unlikely to impact on results. Similar data sources do exist within the United Kingdom through the Secure Anonymised Information Linkage databank in Wales and the Information Services Division of NHS National Services Scotland, both of which use ICD-10 for medical coding of admissions.
Conclusion
Advantages of using routine data to obtain primary outcome data far outweighed disadvantages in this trial and a change to our method of data collection was instigated. Our resulting strategy allowed for accurate, complete, and timely identification of hospital attendances via routine data with further targeted data collection through researcher site visits for attendances requiring supplementary information to determine self-harm classification.
Supplementary Material
Supplemental material, 751381_supp_mat for Routine hospital data – is it good enough for trials? An example using England’s Hospital Episode Statistics in the SHIFT trial of Family Therapy vs. Treatment as Usual in adolescents following self-harm by Alexandra Wright-Hughes, Elizabeth Graham, David Cottrell and Amanda Farrin in Clinical Trials
Acknowledgments
The authors would like to thank the trial researchers in Leeds, Manchester, and London, staff from the Comprehensive Local Research Networks and local Mental Health Research Networks; and staff in the Leeds Clinical Trials Research Unit for their commitment to SHIFT and for all their help in data collection. The authors would also like to thank and acknowledge NHS Digital for providing Hospital Episode Statistics throughout the SHIFT trial, and Garry Coleman at NHS Digital for supporting this project and facilitating access to data.
Footnotes
Authors’ contributions: All authors contributed to design of the SHIFT trial, the analysis plan for the comparison of Hospital Episode Statistics and researcher data, and were involved in the interpretation of results. All authors also contributed to the writing of this article and read and approved the final article. Further to this contribution, D.C. is the lead grant holder and contributed to the acquisition of Hospital Episode Statistics. A.W.H. is the trial statistician and contributed to the acquisition of Hospital Episode Statistics, conducted the analysis, and drafted this article. L.G. is the trial manager and led on the acquisition of Hospital Episode Statistics. A.F. is a co-holder of the grant and the statistical guarantor.
Declaration of conflicting interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical approval: Leeds (West) Research Ethics Committee, 23/04/2009, REC ref: 09/H1307/20
Funding: This article presents independent research, the SHIFT trial, funded by the National Institute for Health Research under its Health Technology Assessment Programme (Reference Number 07/33/01). The views and opinions expressed therein are those of the authors and do not necessarily reflect those of the Health Technology Assessment Programme, the National Institute for Health Research, NHS or the Department of Health.
ORCID iD: Alexandra Wright-Hughes
https://orcid.org/0000-0001-8839-6756
Trial registration number: ISRCTN59793150. 26 January 2009.
References
- 1. Black N, Barker M, Payne M. Cross sectional survey of multicentre clinical databases in the United Kingdom. BMJ 2004; 328: 1478.
- 2. Parliamentary Office of Science and Technology, Houses of Parliament. Big data and public health. POSTNOTE No. 474, July 2014, http://researchbriefings.files.parliament.uk/documents/POST-PN-474/POST-PN-474.pdf.
- 3. Cook J, Collins GS. The rise of big clinical databases. Br J Surg 2015; 102: e93–e101.
- 4. Van Staa TP, Goldacre B, Gulliford M, et al. Pragmatic randomised trials using routine electronic health records: putting them to the test. BMJ 2012; 344: e55.
- 5. Williams JG, Cheung WY, Cohen DR, et al. Can randomised trials rely on existing electronic data? A feasibility study to explore the value of routine data in health technology assessment. Health Technol Assess 2002; 7: iii, v–x, 1–117.
- 6. Barry SJ, Dinnett E, Kean S, et al. Are routinely collected NHS administrative records suitable for endpoint identification in clinical trials? Evidence from the West of Scotland Coronary Prevention Study. PLoS ONE 2013; 8: e75379.
- 7. Hawton K, Saunders KE, O’Connor RC. Self-harm and suicide in adolescents. Lancet 2012; 379: 2373–2382.
- 8. Hawton K, Bergen H, Casey D, et al. Self-harm in England: a tale of three cities. Multicentre study of self-harm. Soc Psychiatry Psychiatr Epidemiol 2007; 42: 513–521.
- 9. Wilkinson S, Taylor G, Templeton L, et al. Admissions to hospital for deliberate self-harm in England 1995–2000: an analysis of hospital episode statistics. J Public Health Med 2002; 24: 179–183.
- 10. Gunnell D, Hawton K, Ho D, et al. Hospital admissions for self harm after discharge from psychiatric inpatient care: cohort study. BMJ 2008; 337: a2278.
- 11. Gunnell D, Metcalfe C, While D, et al. Impact of national policy initiatives on fatal and non-fatal self-harm after psychiatric hospital discharge: time series analysis. Br J Psychiatry 2012; 201: 233–238.
- 12. Singhal A, Ross J, Seminog O, et al. Risk of self-harm and suicide in people with specific psychiatric and physical disorders: comparisons between disorders using English national record linkage. J R Soc Med 2014; 107: 194–204.
- 13. Herbert A, Gilbert R, González-Izquierdo A, et al. Violence, self-harm and drug or alcohol misuse in adolescents admitted to hospitals in England for injury: a retrospective cohort study. BMJ Open 2015; 5: e006079.
- 14. NHS Digital. Provisional Monthly Hospital Episode Statistics for Admitted Patient Care, Outpatients and Accident and Emergency Data – April 2015 to August 2015: Topic of Interest – Intentional Self-harm. 2015, https://digital.nhs.uk/catalogue/PUB19222.
- 15. Public Health England. The Indicator Guide: Health Profiles 2013. London: Public Health England, 2013, http://webarchive.nationalarchives.gov.uk/20170106085344/http://www.apho.org.uk/resource/view.aspx?RID=116454 (accessed 11 November 2017).
- 16. Department of Health. Public health outcomes framework: improving outcomes and supporting transparency. Part 2: summary technical specifications of public health indicators, https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/382115/PHOF_Part_2_Technical_Specifications_Autumn_2014_refresh_02.12.2014_FINAL.pdf (2014, accessed 13 November 2017).
- 17. Polling C, Tulloch A, Banerjee S, et al. Using routine clinical and administrative data to produce a dataset of attendances at Emergency Departments following self-harm. BMC Emerg Med 2015; 15: 15.
- 18. Thomas KH, Davies N, Metcalfe C, et al. Validation of suicide and self-harm records in the Clinical Practice Research Datalink. Br J Clin Pharmacol 2013; 76: 145–157.
- 19. Mars B, Cornish R, Heron J, et al. Using data linkage to investigate inconsistent reporting of self-harm and questionnaire non-response. Arch Suicide Res 2016; 20: 113–141.
- 20. Clements C, Turnbull P, Hawton K, et al. Rates of self-harm presenting to general hospitals: a comparison of data from the Multicentre Study of Self-Harm in England and Hospital Episode Statistics. BMJ Open 2016; 6: e009749.
- 21. Wright-Hughes A, Graham E, Farrin A, et al. Self-Harm Intervention: Family Therapy (SHIFT), a study protocol for a randomised controlled trial of family therapy versus treatment as usual for young people seen after a second or subsequent episode of self-harm. Trials 2015; 16: 501.
- 22. Health & Social Care Information Centre. Methodology to create provider and CIP spells from HES APC data, http://www.hscic.gov.uk/media/11859/Provider-Spells-Methodology/pdf/Spells_Methodology.pdf (2014, accessed 13 November 2017).
- 23. Hospital Episode Statistics (HES). Accident and Emergency Attendances in England – 2009–2010, experimental statistics: Hospital Episode Statistics Accident and Emergency to Admitted Patient Care linkage methodology, http://www.hscic.gov.uk/catalogue/PUB02563/acci-emer-atte-eng-09-10-link-meth.pdf (2011, accessed 11 November 2017).