Wellcome Open Research. 2023 Feb 1;8:47. [Version 1] doi: 10.12688/wellcomeopenres.18720.1

Linkage of the Avon Longitudinal Study of Parents and Children (ALSPAC) to Avon & Somerset Police regional police records

Alison Teyhan 1,a, Rosie Cornish 1,2, Andy Boyd 3, Richard Thomas 3, Mark Mumme 1, Amy Dillon 4, Iain Brennan 5, Adrian Brown 6, Anna Ferrante 6, John Macleod 1,3,7,8
PMCID: PMC10403744  PMID: 37546715

Abstract

This data note describes a new resource for crime-related research: the Avon Longitudinal Study of Parents and Children (ALSPAC) linked to regional police records. The police data were provided by Avon & Somerset Police (A&SP), whose area of responsibility contains the ALSPAC recruitment area. In total, ALSPAC had permission to link to crime records for 12,662 of the ‘study children’ (now adults, who were born in the early 1990s). The linkage took place in two stages. In Stage 1, the ALSPAC Data Linkage Team established the linkage by applying deterministic and probabilistic methods to personal identifiers common to both the ALSPAC participant database and A&SP records. In Stage 2, A&SP extracted attribute data on the matched individuals, removed personal identifiers and securely shared the de-identified records with ALSPAC. The police data extraction took place in July 2021, when the participants were in their late 20s/early 30s. This data note contains details on the resulting linked police records available. In brief, electronic police records were available from 2007 onwards. In total, 1757 participants (14%) linked to at least one police record for a charge, offence ‘taken into consideration’, caution, or another out of court disposal. Linked participants had a total of 6413 records relating to 6283 offences. Almost three quarters of the linked participants were male. The most common offence types were violence against the person (22% of records), drug offences (19%), theft (17%) and public order offences (11%). This data note also details important issues that researchers using the local police data should be aware of, including the importance of defining an appropriate denominator, completeness, and biases affecting police records.

Keywords: ALSPAC, birth cohort, police data, linkage, crime

Background

A public health approach to tackling crime means a focus on populations rather than individuals, proactive prevention, and the tackling of upstream risk factors 1, 2 . To be successful, this approach relies on suitable data to produce a strong evidence base that can inform the design and delivery of effective interventions 1 . Police records alone cannot be used for this purpose as they do not contain data relating to an individual’s exposure to potential risk factors for perpetrating crime. However, the linkage of police records to longitudinal cohort study data has the potential to create a data resource that could be used to study both the antecedents and consequences of involvement with the criminal justice system. This is because many longitudinal birth cohort studies have detailed measures of the lives of their participants, and often their families, peers, and wider contexts, across the life course.

One such study is the Avon Longitudinal Study of Parents and Children (ALSPAC). It began in the early 1990s and the study children are now adults. ALSPAC has already established linkages to participants’ routinely collected electronic health, education, and geographic records. With regard to criminality records, ALSPAC had planned to link to the Police National Computer (PNC), which is a large centralised administrative database maintained by the Ministry of Justice (MoJ) that was started in 1974 and contains information about police cautions and court convictions in England and Wales 3 . A pilot linkage based on an anonymised extract of PNC data was achieved in 2013 4 ; however, this did not progress to a full linkage. A finding of the pilot study was that the majority of offences committed by the ALSPAC participants (86%) took place in the policing area local to ALSPAC (Avon and Somerset). It was therefore decided that pursuing linkage to local police records would be a more targeted yet equally valid approach.

The linkage of ALSPAC to Avon and Somerset Police (A&SP) records is the focus of this data note. The aims are: 1. to detail the linkage process, 2. to describe the police data available, 3. to highlight important considerations and limitations of the police data.

Materials and methods

Data sources

Avon Longitudinal Study of Parents and Children (ALSPAC). ALSPAC began with the recruitment of pregnant women who had an expected due date between April 1991 and December 1992 and who lived in a defined area in and around the city of Bristol, UK. The precise geographical catchment is described elsewhere 5 and broadly matches what are the present-day counties of Bristol, North Somerset and South Gloucestershire. There were 13,988 study children alive at one year of age. An additional 718 children, who met the original study eligibility criteria, but whose mothers had not joined the study during pregnancy, were recruited by age 18 years. Full details on ALSPAC are given in the cohort profiles 5, 6 and the study website contains details of all the data available through a fully searchable data dictionary and variable search tool.

Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee (ALEC) and the Local Research Ethics Committees. Study involvement of the index participants (the children born in 1991–92) was based on parental approval until the children reached adulthood. Informed consent for the use of data collected via questionnaires and clinics was obtained from participants following the recommendations of ALEC at the time. When participants reached age 18 years they were sent ‘fair processing’ materials that invited them to continue to take part in ALSPAC and informed them about ALSPAC’s intention to link to their routine health and administrative records, including any criminality records, and gave a clear means to opt out. Where practicable ( e.g. when attending a study assessment visit), participants were also able to explicitly consent. The original materials referred to linkage to PNC records. Therefore, before the linkage to regional police records held by A&SP, an update to the fair processing materials was provided to participants, using both online and postal materials, with a clear means to opt out. At the time of the linkage (July 2021) we had permission to link to the criminality records of 12,662 participants (comprising 5,055 who had explicitly opted-in to criminality linkage and 7,607 who had received the opt-out linkage form and did not respond). Participants who opted out of linkage to criminality records (4% to date), or who did not receive fair processing materials, were not included in the linkage to A&SP records.

Avon and Somerset Police. A&SP are responsible for law enforcement in the four counties that replaced the now abolished county of Avon (Bristol, Bath and North-East Somerset, North Somerset, and South Gloucestershire), plus the county of Somerset. The A&SP area (population 1.73 million; area 4784 km²) therefore includes the full ALSPAC recruitment area and some neighbouring areas.

Offences committed in the Avon and Somerset area, and which come to the attention of the police, are recorded by A&SP in their database. Offences in other areas of the country, or abroad, are not recorded in their database. The numbers of crimes recorded annually by A&SP, and every other police force in England and Wales, are available online 7 .

A&SP have used the NicheRMS365 cloud platform as their record management system since September 2015. Older electronic records (from the Guardian system that predated Niche) have been migrated to the Niche platform. Pre-2007, a large proportion of records were held on a system called CMU2 and these only held an electronic reference for the offence, with the majority of the information being held in a paper format. Some of these paper records are still held in the force archive and are researched as part of the police review process (detailed in next paragraph) and, if relevant for retention or if they add value to the record, the force’s Retention, Review and Disposal (RRD) Team can scan and upload them to Niche.

The Management of Police Information (MoPI) Code of Practice gives guidance in relation to the review, retention and disposal of policing information and records. Police records must be regularly reviewed to ensure that they remain necessary for a policing purpose, are accurate, adequate and up-to-date, and are kept for no longer than is necessary. Details of the review process are available online. In brief, all offences are categorised into one of three MoPI groups. Group 1 offences are the most serious, and Group 2 covers sexual, violent and serious offences not included in Group 1. The MoPI guidance is that Group 1 and 2 records are reviewed after a 10 year clear period ( i.e. a 10 year period in which the offender has not come to the police’s notice again). Group 1 and 2 records can only be deleted from the police database if that is deemed appropriate after a manual review process conducted by the RRD Team. If, for example, the subject is deemed to pose a high risk of harm, records would be retained and reviewed after a further 10 year clear period. Group 3 covers all other offences. These records are reviewed after an initial six year clear period and then, if retained, reviewed again every subsequent five year clear period. These records are currently manually reviewed and deleted as appropriate, but this could be automated for suitable Group 3 records in the future. All pre-2007 A&SP records relating to MoPI Group 3 offences have been manually reviewed and disposed of. Note that where a person is linked to multiple offences, the most serious offence determines the review category for all offences.
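
The review schedule described above can be summarised in a short Python sketch. It is illustrative only: the function names are hypothetical, and it models just the clear-period lengths and the "most serious offence governs" rule, not the manual decisions taken by the RRD Team.

```python
def review_group_for_person(offence_groups):
    """Return the MoPI group that governs review of all of a person's records.

    Group 1 is the most serious, so the lowest group number wins.
    """
    if not offence_groups:
        raise ValueError("person has no recorded offences")
    return min(offence_groups)


def clear_period_years(mopi_group, previously_reviewed=False):
    """Length of the clear period (in years) before the next review."""
    if mopi_group in (1, 2):
        return 10  # reviewed after each 10 year clear period
    if mopi_group == 3:
        # six years initially, then every five years if the record is retained
        return 5 if previously_reviewed else 6
    raise ValueError(f"unknown MoPI group: {mopi_group}")


# A person with two Group 3 offences and one Group 2 offence is reviewed as Group 2.
group = review_group_for_person([3, 2, 3])
print(group, clear_period_years(group))
```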

Linkage methodology

Data Processing Agreements for the transfer of A&SP data to ALSPAC were finalised in June 2020, and the police data were extracted in July 2021. The linkage took place in two stages, detailed below. All data processing was conducted by the three data managers in the ALSPAC Data Linkage Team, all of whom were individually security cleared by A&SP prior to the commencement of this project. All data processing took place within the ALSPAC Data Safe Haven, which is accredited to the ISO27001 information security standard.

Stage 1: Using personal identifiers to establish matches

As there is no strong, persistent identifier common to both ALSPAC and the A&SP dataset, a number of personal data items available in both datasets were used to determine which individuals in ALSPAC had an A&SP record. A&SP sent ALSPAC the forename, surname, date of birth (DoB), sex and full current and historical address(es) of all individuals held in their database who were born between 1st January 1991 and 31st January 1993 (the date range in which the vast majority of ALSPAC study children were born). No information about these individuals was sent other than these identifiers and a unique record ID (‘offender_id’). Comparable identifiers were extracted from the ALSPAC participant database for the 12,662 participants for whom ALSPAC had permission to link to criminal record data.

A combination of deterministic and probabilistic record linkage methods was used to maximise linkage coverage and minimise false matches. Firstly, a deterministic match was completed using forename, surname, and DoB (i.e. these identifiers needed to be identical for there to be a ‘match’). This yielded 1876 matches to unique A&SP ‘offender_ids’. Postcode was then used to create a match strength variable (Table 1). Postcode was not included as a mandatory matching variable because ALSPAC participants with a criminality record were considered less likely to be actively engaged in the study (as both criminal involvement 8, 9 and ALSPAC participation 4, 10 are associated with social position), meaning the address information ALSPAC held for them at the time of linkage may have been out of date.

Table 1. Deterministic match criteria and number of matches.

| Matching criteria | Match strength | Number of matches |
|---|---|---|
| Forename, surname, DoB, full postcode¹ | 1 | 956 |
| Forename, surname, DoB, first part of postcode¹ | 2 | 403 |
| Forename, surname, DoB | 3 | 517 |
| TOTAL | | 1876 |

¹ A postcode in the UK consists of two alphanumeric codes: the first part identifies the post town, and the second part relates to a few addresses within that post town (usually a group of around 15).
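
The deterministic step and the match strength grading in Table 1 can be sketched as follows. This is an illustration, not the code used by the ALSPAC Data Linkage Team: the dictionary keys and function names are hypothetical, each record is assumed to carry a single postcode (the police extract actually held current and historical addresses), and postcodes are assumed to be stored with a space between the two parts.

```python
def first_part(postcode):
    """Return the first part of a postcode, or '' if none is available."""
    parts = postcode.split()
    return parts[0] if parts else ""


def match_strength(alspac, police):
    """Return 1, 2 or 3 for a deterministic match, or None if there is no match.

    Both arguments are dicts with the keys forename, surname, dob and postcode.
    """
    # Forename, surname and DoB must be identical for any match.
    if any(alspac[k] != police[k] for k in ("forename", "surname", "dob")):
        return None
    a_pc, p_pc = alspac["postcode"], police["postcode"]
    if a_pc and a_pc == p_pc:
        return 1  # full postcode also agrees
    if first_part(a_pc) and first_part(a_pc) == first_part(p_pc):
        return 2  # only the first part of the postcode agrees
    return 3      # names and DoB only


# Entirely made-up example records.
a = {"forename": "SAM", "surname": "SMITH", "dob": "1992-01-01", "postcode": "BS8 1TH"}
print(match_strength(a, {**a, "postcode": "BS8 2XX"}))
```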

Probabilistic linkage was then applied. This approach uses conditional probabilities to compute likelihood estimates for each field. Each record comparison involves a comparison of every field, and the sum of the weights determined by each field comparison (using the likelihood estimate) provides an overall score. Record comparisons that score over a defined threshold are designated a match. This probabilistic approach, along with the use of similarity comparisons, tolerates variations such as full and short-form names, errors in postcode details, and typographical errors. The probabilistic matching procedure was conducted using the default settings in the LinXmart record linkage software (Version 1.8.3) developed by the Centre for Data Linkage at Curtin University, Australia 11 . LinXmart is able to perform linkage across event-level datasets, based on user-defined matching of demographic information using a probabilistic approach. (The authors used LinXmart free of charge based on our collaboration with Curtin University, but it is also more widely available as a commercial product. There are various open-access alternatives, including Splink.) This method yielded 2292 matches to unique A&SP ‘offender_ids’. This included all 1876 linked using the deterministic linkage process plus an additional 416 that were matched through probabilistic linkage and passed two manual review stages (detailed in the next paragraph). Because 19 individuals each linked to two offender_ids (described below), these 2292 offender_ids correspond to 2273 ALSPAC individuals who had at least one record in the A&SP dataset (Figure 1).
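
To make the scoring idea concrete, the sketch below shows a generic weight-based comparison in the style of the Fellegi-Sunter model. It is not LinXmart's implementation: the m and u probabilities, the threshold, and all names are hypothetical values chosen for illustration, and similarity (partial agreement) comparisons are left out.

```python
import math

# Hypothetical probabilities per field:
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELD_PROBS = {
    "forename": (0.95, 0.01),
    "surname": (0.95, 0.005),
    "dob": (0.98, 0.001),
    "postcode": (0.70, 0.0005),
}


def field_weight(field, agrees):
    """Log-likelihood-ratio weight for one field comparison."""
    m, u = FIELD_PROBS[field]
    if agrees:
        return math.log2(m / u)            # positive weight for agreement
    return math.log2((1 - m) / (1 - u))    # negative weight for disagreement


def record_score(agreements):
    """Sum the field weights; `agreements` maps field name to True/False."""
    return sum(field_weight(f, a) for f, a in agreements.items())


def is_match(agreements, threshold=15.0):
    """Designate a match when the score is over the (hypothetical) threshold."""
    return record_score(agreements) > threshold


# Names and DoB agree but the postcode is out of date.
pair = {"forename": True, "surname": True, "dob": True, "postcode": False}
print(round(record_score(pair), 2), is_match(pair))
```

Because each field contributes its own weight, a pair can still score highly when one field disagrees, which is the flexibility that a purely deterministic rule lacks.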

Figure 1. Flow chart of linkage of ALSPAC participants to Offender IDs.

The first manual review of the links achieved with probabilistic methods was a ‘twin check’; as twins share a number of common identifiers, they are at high risk of creating a false link. A list of twins (from ALSPAC administrative data) was used to create a flag to highlight individuals to be checked. These were required to match the A&SP identifiers on both gender and the first two characters of forename. This resulted in 37 offender_ids being removed. The second manual review used the LinXmart generated metric that indicates matching confidence between records. Those in the bottom 10% (lowest confidence) were selected for manual review and the following rules were derived that required these records to have:

  • A match on at least forename AND full date of birth

  • OR a surname AND full date of birth match

  • OR full postcode match

This resulted in an additional 34 individuals being removed from the linkage file. These individuals were heavily concentrated towards the lowest confidence scores. A few 'true' links also had some of the lowest confidence scores, indicating that the manual review was a worthwhile exercise (as opposed to setting a fixed/hard confidence threshold).
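
The two manual review checks can be written down as simple predicates. The sketch below is illustrative: the function and key names are hypothetical, and comparing forename prefixes case-insensitively is an assumption rather than something stated in the text.

```python
def passes_twin_check(alspac, police):
    """Twin check: sex and the first two characters of the forename must agree."""
    same_sex = alspac["sex"] == police["sex"]
    same_prefix = alspac["forename"][:2].upper() == police["forename"][:2].upper()
    return same_sex and same_prefix


def passes_low_confidence_review(forename_match, surname_match,
                                 dob_match, full_postcode_match):
    """Rules applied to links in the lowest 10% of matching confidence."""
    return (
        (forename_match and dob_match)
        or (surname_match and dob_match)
        or full_postcode_match
    )


# A low-confidence link with a matching surname and DoB is kept.
print(passes_low_confidence_review(False, True, True, False))
```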

Duplicate checks were then performed. These identified three offender_ids matching to the same individual. These were removed, as none of these records had DoB or postcode matches, and the names alone were not deemed to have sufficient distinguishing power to confirm a match. Further, a duplicate check was run on the offender_id field and four duplicates were identified. Only the offender_id with a strong match (based on postcode) was retained. In total, 39 offender IDs were removed during the second manual review and de-duplication step.

It was also found that 19 ALSPAC individuals each linked to two offender_ids ( i.e. the police had marked an individual as two different people in their database, but they were the same person according to the ALSPAC database). In these cases, records belonging to both offender_ids were retained and linked to the same ALSPAC individual.
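
The relationship between the number of offender_ids and the number of ALSPAC individuals can be checked with a few lines of Python. The link table below uses made-up IDs and the helper name is hypothetical; only the counts in the final lines come from the text.

```python
def count_individuals(links):
    """Count distinct ALSPAC IDs in a list of (alspac_id, offender_id) pairs."""
    return len({alspac_id for alspac_id, _ in links})


# Toy link table: person "A2" appears under two different offender_ids.
links = [("A1", "P10"), ("A2", "P11"), ("A2", "P12")]
print(len(links), count_individuals(links))  # 3 2

# The same bookkeeping with the reported counts.
offender_ids = 1876 + 416          # deterministic plus additional probabilistic links
individuals = offender_ids - 19    # 19 people each hold two offender_ids
print(offender_ids, individuals)   # 2292 2273
```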

At the end of the linkage process, all personal identifiers provided by A&SP were securely destroyed in line with ALSPAC’s ISO27001 certified processes. This left an ID match variable (ALSPAC ID to A&SP offender_ids) and linkage quality variables.

Stage 2: Extracting attribute data

A&SP extracted 11,681 de-identified police event records related to the 2,273 individuals matched in Stage 1 and securely transferred these records to ALSPAC. In this event-based dataset, each row is a record that corresponds to a crime occurrence for an individual. A&SP records include the disposal outcome(s) for each crime. Of the disposal outcome types available in the police dataset, ALSPAC has an ethico-legal basis (set through the participant fair processing) to link to records with the following types: charges, crimes ‘taken into consideration’ (TICs), cautions, and other out-of-court disposals (penalty notices, drug warnings, and community resolutions). The threshold of evidence needed for an individual to be charged is high 12 , and the majority will go on to face trial in court. Conviction rates are high for many offences, but do vary by offence type 13 . TICs are crimes taken into consideration at the time of sentencing for another crime. The individual may volunteer these offences, or they may be asked by the police if they accept them. In either case, the individual must formally admit guilt to the additional crime(s) while under caution. To be issued with an out-of-court disposal, an individual must admit they are guilty of the offence and be eligible in terms of previous recorded offending (these disposals are designed to be used in situations of low-level offending). Notably, and in contrast to the PNC, A&SP do not routinely record conviction data.

The process to identify which A&SP records had an eligible disposal type was complicated by the fact that the police record many details of crimes at an offence level rather than at a person (offender) level. A&SP provided a variable which states how many offenders were involved in each crime; this enabled identification of ‘group crimes’, i.e. a single recorded crime that was alleged to have been perpetrated by multiple offenders. The following outcome variables were then used to determine the disposal type:

  • For crimes involving one offender: the main outcome variable, currentclassificationhooutcom, was used. This is a 22-category variable that gives the Home Office outcome code for each offence 14 . This is an offence-level variable, but for offences that involve only one offender, it is in effect an individual-level variable. Records were linked to ALSPAC if currentclassificationhooutcom was OC1 (charged), OC2-3 (cautioned), OC4 (taken into consideration), OC6 (penalty notice for disorder), OC7 (cannabis warning), or OC8 (community resolution).

  • For crimes involving multiple offenders (‘group crimes’): currentclassificationhooutcom cannot be used because it is an offence-level variable, meaning everyone with a record for that offence is assigned the same outcome (the most serious outcome of the group). Instead, a secondary outcome variable, offenderclassificationconcat, was used. This is an individual-level, concatenated variable which lists several terms (up to six) for each individual (e.g. ‘suspect; arrested; charged’). Note that prior to September 2015 (when police recording software changed to Niche), the term ‘prosecuted’ was used in the concatenated variable to cover TICs and all out-of-court disposals (cautions, penalty notices, drug warnings, and community resolutions). Records were linked to ALSPAC if offenderclassificationconcat contained at least one of the following terms (which relate to OC 1-4 and 6-8): charged, TIC, cautioned, adult conditional caution, postal requisition, reported for summons, cannabis warning, penalty notice for disorder, community resolution, prosecuted.
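
The selection logic in the two bullet points above can be sketched in Python. Several details are assumptions made for illustration: the key `outcome_code` is a hypothetical field holding only the bare Home Office outcome code (the real variable holds a code and description), the terms in the concatenated variable are assumed to be separated by semicolons as in the quoted example, and each term is compared as a whole after trimming and lower-casing.

```python
ELIGIBLE_OUTCOME_CODES = {"OC1", "OC2", "OC3", "OC4", "OC6", "OC7", "OC8"}

ELIGIBLE_TERMS = {
    "charged", "tic", "cautioned", "adult conditional caution",
    "postal requisition", "reported for summons", "cannabis warning",
    "penalty notice for disorder", "community resolution", "prosecuted",
}


def is_eligible(record):
    """Decide whether a police record has an eligible disposal type."""
    if record["offendercount"] == 1:
        # Single offender: the offence-level outcome applies to the individual.
        return record["outcome_code"] in ELIGIBLE_OUTCOME_CODES
    # Group crime: use the individual-level concatenated terms instead.
    terms = {t.strip().lower() for t in record["offenderclassificationconcat"].split(";")}
    return bool(terms & ELIGIBLE_TERMS)


group_record = {
    "offendercount": 3,
    "outcome_code": "OC1",  # most serious outcome in the group, so not used here
    "offenderclassificationconcat": "suspect; arrested; charged",
}
print(is_eligible(group_record))
```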

All other records were deleted (these include records where the individual had been eliminated from enquiries, or where there was insufficient evidence to proceed). This resulted in a final sample of 6413 police records ( Figure 2). The 6413 records relate to 1757 individuals and 6283 separate offences (an example of the data structure is shown in Figure 3).

Figure 2. Flow chart of linkage of police events records to ALSPAC participants.

Figure 3. Relationship between individuals, crime records and offences.

The A&S Police dataset

Data provided by A&SP

The A&SP data set contains 19 variables. These comprise administrative variables, date variables that specify when an offence took place and when it was reported to police, type and severity of offence variables, disposal type variables, flag variables, and variables related to Magistrates’ Court appearances. Table 2 lists the variable names along with a brief description, a summary of the missing data, and a note as to whether they are available to researchers.

Table 2. Variables in the dataset provided by A&S police.

| Variable type | Variable name | Description | % missing (100%=6413) | Available for researchers? |
|---|---|---|---|---|
| Administrative | occurrence_id | ID of the crime | 0% | Yes¹ |
| Administrative | offendercount | How many offenders were involved in the crime | 0% | Yes |
| Date | occurrencecreateddate | System generated, triggered by a 111/999 call about an occurrence that the officer later declares a crime, or similar. | 0% | No (however, age available)² |
| Date | occurrencereporteddate | Automatically entered when the crime occurrence is created (generated from STORM 1 and pushed to Niche 2). | 0% | No (however, age available)² |
| Date | occurrencefromdate | Date of the offence, person reported via 111/999 or any other way | 0.1% | No (however, age available)² |
| Type/severity of offence | currentoffencegroup | 12 category variable giving type of offence | 0% | Yes |
| Type/severity of offence | currentoffencehocode | Offence Home Office code | 0% | No³ |
| Type/severity of offence | currentoffencedescription | Offence description | 0% | No³ |
| Type/severity of offence | scorexmultiplier | Crime severity score | 0% | Yes |
| Disposal type | currentclassificationhooutcom | Offence-level. Home Office outcome code and description | 0% | No |
| Disposal type | offenderclassificationconcat | Individual-level. String variable with up to 6 terms. This has been split into 6 separate variables. | 0% | No |
| Flag | domesticabuseindicator | Crime involved domestic abuse (no/yes) | 0% | Yes |
| Flag | knifecrimeindicator | Crime involved a knife (no/yes) | 0% | Yes |
| Flag | drugsflagged | Crime involved drugs (no/yes) | 0% | Yes |
| Flag | alcohol | Crime involved alcohol | 98.2% | No⁴ |
| Flag | currentsubstanceusedbyoffend | Offender affected by: alcohol; alcohol and drugs; drugs; not affected; not known. This flag started being used in the mid-2000s but has since fallen into disuse. Not mandatory field. | 95.2% (99.0% if not known category is treated as missing) | No⁴ |
| Magistrates’ Court | casefileid | ID of Magistrates’ court case | 87.9% | Yes¹ |
| Magistrates’ Court | casefilecreateddateandtime | Date of court case | 88.2% (3.0% of those with a casefileid) | No (however, age available)² |
| Magistrates’ Court | verdict | Verdict of Magistrates’ court case (Not guilty; guilty) | 92.5% (37.9% of those with a casefileid) | No⁴ |

¹ A pseudonymised version of these variables is available.

² Age in months has been derived for each of the date variables.

³ The Home Office code, and corresponding description, variables are not available to researchers due to a large number of codes having small numbers of records. However, researchers can specify an aggregated variable; this will be available provided numbers in each grouping are adequate.

⁴ These variables will not be released due to a high proportion of missing data.

With regard to the date variables, for over half (53%) of the records occurrencefromdate is equal to occurrencereporteddate (i.e. the crime was reported on the same day that it occurred). For the records with non-matching dates, the difference ranges from one day to several years: 45% of these records have a difference of only one day (meaning the crime was reported the day after it was thought to have occurred), 73% have a difference of <10 days, and 7% have a difference of over a year. In general, short discrepancies between the date a crime occurs and the date it is reported to police can arise when a person is not aware of the exact date of the offence (e.g. a house was burgled while the occupants were on holiday). Longer discrepancies can arise from historical sexual assaults, or from a catalogue of domestic abuse incidents being reported in one report.

There are four ‘flag’ variables which specify if a crime involved domestic abuse, knife crime, drugs, or alcohol. There is an additional variable which specifies whether an offender was using drugs and/or alcohol ( currentsubstanceusedbyoffend) but this has very high levels of missing data as it is no longer used by the police in their reporting.

The nature of the offence is given by: currentoffencehocode (the Home Office code for the offence 15 ); currentoffencedescription (a detailed categorical variable which describes these codes); and currentoffencegroup (a categorical variable which assigns each of the offences to one of 12 broader offence groups). For example, currentoffencedescription describes a code as ‘possession of cannabis’ and currentoffencegroup assigns that offence to the ‘drug offences’ category.

The variable scorexmultiplier indicates each offence’s severity. These scores are used by A&SP to monitor the harm arising from crimes as opposed to just measuring crime volume, enabling them to identify the most high-risk offenders and most vulnerable victims. These scores are not used by the courts. The scorexmultiplier value is derived from the ‘harm score’ for the offence, increased by a ‘multiplier’ if relevant. Each Home Office offence code has a corresponding harm score, ranging from 0.01 to 100. Offences with a harm score <3 include intent to supply class A drugs (harm score of 0.8), wounding with intent to do serious bodily harm (1.45), and rape (2.9). No offences have a harm score between 3 and 8. Crimes with scores ≥8 include conspiring to traffic a person into the UK for exploitation (8), causing or inciting child pornography (10), manslaughter (30), use of noxious substance in terrorism offence (50), and murder (100). These harm scores are increased by a multiplier if the following factors are present: +30% for domestic abuse related, +50% for hate related, +5% for drug related, +10% if there is a firearm tag, and +30% if there is a safeguarding children tag. If more than one of these factors is present, the multipliers are cumulative and applied in the order listed. Note that ALSPAC has not been provided with variables that specify which multipliers were used in the calculation of each crime’s scorexmultiplier value.
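
The derivation of scorexmultiplier can be illustrated with a short Python sketch. The text states that multipliers are cumulative and applied in the order listed; the sketch assumes this means each uplift compounds on the running score, which is an interpretation rather than a documented A&SP formula. The factor names and the function name are hypothetical.

```python
# Uplifts in the order listed in the text (factor names are hypothetical labels).
MULTIPLIERS = [
    ("domestic_abuse", 0.30),
    ("hate", 0.50),
    ("drug", 0.05),
    ("firearm", 0.10),
    ("safeguarding_children", 0.30),
]


def score_x_multiplier(harm_score, factors):
    """Apply each relevant uplift in turn to the offence's harm score.

    Assumes the uplifts compound multiplicatively.
    """
    score = harm_score
    for name, uplift in MULTIPLIERS:
        if name in factors:
            score *= 1 + uplift
    return score


# Wounding with intent (harm score 1.45) flagged as domestic abuse related.
print(round(score_x_multiplier(1.45, {"domestic_abuse"}), 3))  # 1.885
```

Under this reading, an offence with the maximum harm score of 100 and any uplift would exceed 100, which would be consistent with the observed range of scorexmultiplier extending above 100.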

The final three variables relate to Magistrates’ Court appearances (from November 2015 only): casefileid is the ID for that court appearance, casefilecreateddateandtime gives the date of the court case, and verdict states whether the defendant was found guilty or not guilty (this variable has high levels of missing data: this information is generated by the local Crown Prosecution Service, not the police, and the data flow between them can be poor).

Changes made to A&SP data by ALSPAC

The ALSPAC Data Linkage managers made changes to some of the police data to prevent disclosure of ALSPAC participants’ identities during research use. The changes are:

•   The occurrence_id and casefileid variables have been pseudonymised but retain equivalent functionality.

•   The date variables will not be released. Instead, the age of the participant on each of these dates has been calculated (in months) using their date of birth. Month and year of offence will be available.

•   The original outcomes variables ( currentclassificationhooutcom and offenderclassificationconcat) will not be released. A binary variable has been derived (participant has a police record, yes or no). This binary variable ensures all ALSPAC participants with a record are treated equally (as details on type of disposal are not available for individuals involved in group crimes prior to September 2015, as described above).

•   The variables that describe the nature of the offence in detail ( currentoffencehocode and currentoffencedescription) will not be released to researchers in their original format as they have many categories with small cell counts. The offence group variable ( currentoffencegroup) will be available. If required, researchers can discuss with the ALSPAC Data Linkage Team options for grouping the Home Office codes in a different way to that available in the currentoffencegroup variable.

•   The scorexmultiplier and offendercount variables will be aggregated at the upper end due to small numbers of records with high scores.

•   Variables with high levels of missing data will not be released.
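
As an illustration of how an age-in-months variable can replace a date, the sketch below counts completed months between a date of birth and an event date. The exact derivation used by ALSPAC is not described in this note, so the completed-months convention and the function name are assumptions, and the dates are made up.

```python
from datetime import date


def age_in_months(dob, event_date):
    """Completed months between date of birth and an event date."""
    months = (event_date.year - dob.year) * 12 + (event_date.month - dob.month)
    if event_date.day < dob.day:
        months -= 1  # the current month has not yet been completed
    return months


# A made-up participant born on 15 March 1992, with an offence on 14 March 2010.
print(age_in_months(date(1992, 3, 15), date(2010, 3, 14)))  # 215
```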

Brief summary of the police data available

This section gives a brief overview of the police data, including the time period that the records cover, and the numbers of offences by type of offence and sex. Researchers requiring more detailed information in order to determine if these data are suitable for their research purposes should contact the ALSPAC Data Linkage Team.

Of those in the ALSPAC sample with an A&SP record, 73% are male. Over three quarters of records are for an offence involving only one person. The years 2009–2010 saw the largest number of offences (when the participants would have been in their late teens). This peak was driven largely by males’ offending ( Figure 4): females have considerably fewer records, and the distribution of their records by year of offence is flatter. Most of the individuals with a record have a small number of records: 47% have one, and 18% have two (range 1 to >150, median 2). For those with more than one record, the time difference between first and last offence ranges from 0 (i.e. all offences took place on same day) to several years (median 3.9 years). Note that the police data are left and right censored (few records available pre-2007 as they were not in electronic format, and no records for offences that were reported after the linkage to ALSPAC occurred in July 2021). In terms of crime severity, the scorexmultiplier variable has a range of 0.01 to over 100, with most records having a relatively low score (74% of records have a severity score of ≤0.2).

Figure 4. Distribution of age at offence by sex.

Figure 4a shows males and Figure 4b shows females. Note for Figure 4: due to small numbers of offences at the youngest and oldest ages, any offences below the age of 14.5 are included in the 14.5 group, and any offences over the age of 29.5 are included in the 29.5 group.

The 6413 A&SP records linked to ALSPAC cover a wide range of offence groups. The most common groups were violence against the person (22% of records), drug offences (19%), theft (17%) and public order offences (11%) ( Table 3). All offences were more common in males than females. There are similarities and differences in the distribution of offences by sex. For example, violent crime accounts for a similar percentage of the crimes committed by males and females (around 22–25%). In contrast, thefts make up just 12% of male crimes but 39% of female crimes.

Table 3. Summary of number of police records, by offence group and sex.

Offence group | Overall: N records (%) (100%=6413) | Males: N records (%) (100%=5255) | Females: N records (%) (100%=1158)
Arson and criminal damage | 807 (12.7) | 732 (13.9) | 75 (6.5)
Burglary | 466 (7.3) | 451 (8.6) | 15 (1.3)
Drug offences | 1237 (19.3) | 1095 (20.8) | 142 (12.3)
Fraud ¹ | 44 (0.7) | - | -
Miscellaneous crimes against society | 157 (2.5) | 131 (2.5) | 26 (2.3)
Possession of weapons | 85 (1.3) | 77 (1.5) | 8 (0.7)
Public order offences | 683 (10.7) | 572 (10.9) | 111 (9.6)
Robbery | 102 (1.6) | 88 (1.7) | 14 (1.2)
Sexual offences ¹ | 45 (0.7) | - | n<5
Theft | 1077 (16.8) | 622 (11.8) | 455 (39.3)
Vehicle offences | 277 (4.3) | 268 (5.1) | 9 (0.8)
Violence against the person | 1433 (22.4) | 1141 (21.7) | 292 (25.2)

¹ Cell counts suppressed to prevent calculation of the small cell count for sexual offence records for females.
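The percentages in Table 3 are column percentages: each count is divided by the total number of records for that sex. The short sketch below reproduces the theft row from the published counts and contrasts it with a row percentage. The dictionary and function names are illustrative only.

```python
# Counts copied from Table 3 (these are records, not people).
TOTALS = {"overall": 6413, "males": 5255, "females": 1158}
THEFT = {"overall": 1077, "males": 622, "females": 455}


def share(count: int, total: int) -> float:
    """Percentage of `total` accounted for by `count`, to one decimal place."""
    return round(100 * count / total, 1)


# Column percentages: theft as a share of each group's records.
theft_share = {group: share(THEFT[group], TOTALS[group]) for group in TOTALS}

# Row percentage: share of all theft records that belong to females.
female_share_of_theft = share(THEFT["females"], THEFT["overall"])
```

The two views answer different questions. Theft is a much larger share of female records than of male records, yet males still account for the majority of theft records, which is consistent with the statement that all offence groups were more common in males.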

Points to note

General points on regional police records

Police forces only hold records of crimes committed in their area. Therefore, a lack of an A&SP record does not mean an individual does not have a police record elsewhere. Further, not all crimes are reported to, or recorded by, the police. An additional consideration is that police forces are only able to retain records if there is a justification for doing so. As per MoPI rules, many older records for Category 3 offences where the individual was not involved in any further crime will likely have been deleted and will therefore not have been included in this linkage. Additionally, A&SP used paper records pre-2007 and the majority of these were not transferred to an electronic form. We cannot quantify the extent of deleted records, but we do know that in the pilot linkage of ALSPAC to the PNC there were several pre-2007 records (predominantly in 2005 and 2006) (see Figure 2 in reference 4), meaning we can be sure that some ALSPAC participants did have police records pre-2007. Overall, this means linkage to A&SP records as a way of measuring offending in the ALSPAC cohort will underestimate the total amount of crime committed by this group.

Regional police records do not routinely include data on convictions, and it is important that this is made clear when describing the police data and interpreting findings. While conviction rates can be very high, they do vary by offence type 13. The age of criminal responsibility in England is ten years; children below this age cannot be arrested, charged or cautioned if they break the law 16. The UK has no statute of limitations for indictable (either-way) and indictable-only offences; for summary offences it is generally six months, although there are exceptions. (The term ‘statute of limitations’ refers to the maximum time limit after an event that legal proceedings can be initiated: after the time limit has passed, a person cannot be prosecuted regardless of the evidence against them.) It is common to see offences, particularly sexual offences, prosecuted many years after the offence took place.

It is important for researchers using police data to be aware that there are several sources of bias. These include bias in terms of whose criminal behaviour is detected by the police, and the disposal type they are given. Examples of this include the disproportionate use of Stop and Search on Black, Asian and Minority Ethnic communities 17 and variations in the rate of reporting of crime across communities and demographic groups 18. Bias may also be introduced through the data linkage process if participants with a criminal record are, in general, less active in ALSPAC, resulting in their identifier information (e.g. current name and address) held by the study being out of date; this is likely to be true since levels of participation in ALSPAC are lower among individuals from more deprived backgrounds 10 and deprivation is associated with increased involvement in crime 8, 9. It is also known that linkage error can be differential with respect to particular socio-demographic characteristics (e.g. non-traditional UK names may be at increased risk of being incorrectly entered into official records) and, finally, missed matches can occur when linking to crime records in particular due to the use of ‘fake’ identifiers. Notwithstanding these limitations, police data have been, and continue to be, a useful, population-level indicator of criminal behaviour.

Defining an appropriate denominator

As the A&SP records only cover crimes committed in A&S, it is important for researchers to be able to identify who was living in this area so that an appropriate denominator can be defined. Flags have been derived that denote whether an individual was living in A&S on each of their birthdays (this is based on the contact address ALSPAC held for that child’s family at each time point and is unlikely to be completely accurate). At age 10 (the youngest age someone can have a police record in England), almost 90% of the ALSPAC sample for whom there is permission to link to crime data had an address in A&S. This proportion declined only slightly through adolescence but then dropped to 76% by age 24 and 66% by age 28. Overall, over 60% of the sample had an ALSPAC recorded contact address in A&S for every birthday from age 10 through to 28 years.
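One way the residence flags could be used to define a denominator is sketched below. The data structures, identifiers and function names are hypothetical and do not reflect the actual ALSPAC variable names. The sketch shows two possible denominators: participants resident in A&S at a given birthday, and participants resident at every birthday in an age range.

```python
# Hypothetical residence flags: participant -> {age at birthday: address in A&S?}
residence = {
    "p1": {10: True, 11: True, 12: True},
    "p2": {10: True, 11: False, 12: False},
    "p3": {10: False, 11: False, 12: True},
}

# Hypothetical (participant, age at offence) pairs with a linked police record.
has_record_at_age = {("p1", 11), ("p3", 12)}


def resident_at(age):
    """Participants flagged as living in A&S on the given birthday."""
    return {pid for pid, flags in residence.items() if flags.get(age, False)}


def resident_throughout(start_age, end_age):
    """Participants flagged as living in A&S on every birthday in the range."""
    ages = range(start_age, end_age + 1)
    return {
        pid
        for pid, flags in residence.items()
        if all(flags.get(age, False) for age in ages)
    }


def prevalence_at(age):
    """Proportion of A&S residents at this age with a linked record at this age."""
    denominator = resident_at(age)
    numerator = {pid for pid in denominator if (pid, age) in has_record_at_age}
    return len(numerator) / len(denominator) if denominator else float("nan")
```

The stricter "resident throughout" definition gives a smaller but cleaner denominator, while the age-specific definition retains more participants. Since the flags are based on contact addresses, either choice is an approximation.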

ALSPAC data availability for those with a police record

Participants with an A&SP record have lower response rates to questionnaires and lower attendance rates at clinics than those with no A&SP record (i.e. they have more missing ALSPAC-collected data). This is true at all ages and for most questionnaire types [including mother, partner, child-based (completed by the mother about the child), and child-completed]. Of those participants eligible for crime data linkage (n=12,662), for the vast majority ALSPAC also has permission to link to their health and education records. However, those with a crime record are much less likely to have actively consented to data linkage (given active opt-in consent was only collected where practicable, and this is tied to active study participation) and are much more likely to be non-responders. This emphasises the importance of using opt-out linkage permission approaches and including non-responders in any analyses using linkage data where possible 19.

Acknowledgements

Achieving this linkage has been a team effort and the authors would like to sincerely thank the A&S Police staff who supported this linkage, prepared the police data set for us, and answered our many queries.

We are extremely grateful to all the families who took part in ALSPAC, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists and nurses.

Funding Statement

This work was supported by Wellcome [086118, https://doi.org/10.35802/086118]; the UK Medical Research Council and Wellcome Trust [217065, https://doi.org/10.35802/217065] and the University of Bristol provide core support for ALSPAC; the Medical Research Council [MC_PC_17210; to AB, AT, RT and JM]; ESRC [ES/T014393/1; grant awarded to AT, RC, IB and JM]; and Avon and Somerset Police.

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

[version 1; peer review: 2 approved, 1 approved with reservations]

Data availability

If you require further information about the A&SP data, please contact the ALSPAC Data Linkage Team (alspac-linkage@bristol.ac.uk).

ALSPAC data access is through a system of managed open access. The steps below highlight how to apply for access to the data included in this data note and all other ALSPAC data:

i. Please read the ALSPAC access policy which describes the process of accessing the data and samples in detail, and outlines the costs associated with doing so.

ii. You may also find it useful to browse our fully searchable research proposals database, which lists all research projects that have been approved since April 2011.

iii. Please submit your research proposal for consideration by the ALSPAC Executive Committee. You will receive a response within 10 working days to advise you whether your proposal has been approved.

References

  • 1. Christmas H, Srivastava J: Public health approaches in policing: a discussion paper. Public Health England, College of Policing. 2019.
  • 2. National Police Chiefs' Council: Policing, Health and Social Care consensus: working together to protect and prevent harm to vulnerable people. 2018.
  • 3. Ministry of Justice, ADR UK: Data First: An Introductory User Guide. Harnessing the potential of linked administrative data for the justice system. Version 7.0. 2022.
  • 4. Boyd A, Teyhan A, Cornish RP, et al.: The potential for linking cohort participants to official criminal records: a pilot study using the Avon Longitudinal Study of Parents and Children (ALSPAC) [version 2; peer review: 2 approved]. Wellcome Open Res. 2022;5:271. doi: 10.12688/wellcomeopenres.16328.2
  • 5. Boyd A, Golding J, Macleod J, et al.: Cohort Profile: the 'children of the 90s'--the index offspring of the Avon Longitudinal Study of Parents and Children. Int J Epidemiol. 2013;42(1):111–27. doi: 10.1093/ije/dys064
  • 6. Fraser A, Macdonald-Wallis C, Tilling K, et al.: Cohort Profile: the Avon Longitudinal Study of Parents and Children: ALSPAC mothers cohort. Int J Epidemiol. 2013;42(1):97–110. doi: 10.1093/ije/dys066
  • 7. Office for National Statistics: Crime in England and Wales: Police Force Area data tables. 2022.
  • 8. Piotrowska PJ, Stride CB, Croft SE, et al.: Socioeconomic status and antisocial behaviour among children and adolescents: a systematic review and meta-analysis. Clin Psychol Rev. 2015;35:47–55. doi: 10.1016/j.cpr.2014.11.003
  • 9. Office for National Statistics: The education and social care background of young people who interact with the criminal justice system. 2022.
  • 10. Cornish RP, Macleod J, Boyd A, et al.: Factors associated with participation over time in the Avon Longitudinal Study of Parents and Children: a study using linked education and primary care data. Int J Epidemiol. 2020;50(1):293–302. doi: 10.1093/ije/dyaa192
  • 11. Boyd JH, Randall SM, Brown AP, et al.: Population Data Centre Profiles: Centre for Data Linkage. Int J Popul Data Sci. 2020;4(2):1139. doi: 10.23889/ijpds.v4i2.1139
  • 12. Crown Prosecution Service: The Code for Crown Prosecutors. 2018.
  • 13. Crown Prosecution Service: Prosecution Data Tables Year Ending June 2022. 2022.
  • 14. Home Office: Police Recorded Crime and Outcomes: Open Data Tables User Guide. 2016.
  • 15. Home Office: Counting rules notifiable offences and notifiable reported incidents list 2022 to 2023. London: Home Office; 2022.
  • 16. McGuinness T: The age of criminal responsibility. House of Commons Library. Briefing Paper Number 7687. 2016.
  • 17. Avon and Somerset Criminal Justice Board: Identifying disproportionality in the Avon and Somerset Criminal Justice System. 2022.
  • 18. Buil-Gil D, Medina J, Shlomo N: Measuring the dark figure of crime in geographic areas: Small area estimation from the Crime Survey for England and Wales. Br J Criminol. 2020;61(2):364–88. doi: 10.1093/bjc/azaa067
  • 19. Boyd A: Understanding Population Data for Inclusive Longitudinal Research. Bristol: University of Bristol; 2021.
Wellcome Open Res. 2023 Aug 4. doi: 10.21956/wellcomeopenres.20758.r59614

Reviewer response for version 1

Shawn Bushway 2, Nicholas Goldrosen 1

This paper describes the valuable task of linking police-recorded offending data from Avon and Somerset (A&S) Police to the broader Avon Longitudinal Study of Parents and Children (ALSPAC) dataset. Administrative data linkages are an increasingly important area of research attention in criminology, such as with the Ministry of Justice’s Data First project [1]. The inclusion of official arrest data in a rich longitudinal dataset like ALSPAC will allow for important new explorations of the role of arrest and justice involvement in the life-course [2]. The linkage process is carefully done and well-described, which will help other UK researchers undertaking administrative data linkages with police data.

While the authors clearly acknowledge many of the linkage’s limitations, some of these deserve closer treatment. 86% of offending in the pilot PNC linkage occurred in the A&S police area -- it is unclear from the pilot linkage paper or this paper whether the non-A&S offending systematically differed from that in A&S [3]. The authors could use the PNC pilot linkage to address whether the offences in the A&S area were similar in age of onset, crime type, and crime severity to the offences recorded by other police forces. This information would be important in understanding criminality that crosses geographic boundaries, such as involvement in county lines or other organised crime [4].

More importantly, the routine deletion of group 3 offences from local police systems undercuts the ability of this data to show how early arrests for minor events could lead directly to future involvement with the police. This concern is over and above the authors’ observation that this purging, “will underestimate the total amount of crime committed by this group.” The linkage was performed in 2021, when the cohort was aged 29-31 years old, meaning that group 3 offences occurring under the age of 23-25 are vastly underrepresented. The average onset of serious delinquency in other studies has been between 11 and 14 years old; a large portion of desistance occurs in the early 20s [5,6]. Hence, not only the frequency but also the prevalence of offending will be underestimated, as some participants’ offences will likely be wholly excluded from the linked police data. One way to quantify this underrepresentation might be using the pilot PNC linkage, as PNC data is not purged over time. What proportion of participants who do have a police record stemming from the A&S police area in the PNC linkage show no records, or substantially fewer records, in the A&S linked data?

Additionally, given the importance of self-reported data on criminal offending — and comparisons of this data with official records [7] — the inclusion of some descriptive statistics on self-reported versus officially-recorded crime in the ALSPAC cohort would be helpful. For example, how many ALSPAC participants that self-report criminal justice involvement and did not withdraw consent for linkage have no A&S police record? This answer might also help estimate the impact of purged group 3 offences or offences occurring outside A&S on the linked criminal record data.

Such a comparison would also place this research into the proper context of other papers that compare self-reported and officially-reported arrests in life-course criminology as a field. Existing studies of this type in the United Kingdom, such as the Cambridge Study in Delinquent Development and the Edinburgh Study of Youth Transitions and Crime, are few [8,9]; the Cambridge study is quite old, with a cohort born in the 1960s. Examples in the U.S. are also rare, and are either dated [10] or limited to youth who are already deeply involved with the justice system [11]. The inclusion of police-recorded crime data in a longitudinal study of more recent vintage and from a new geographical area is an important development to understanding the onset and persistence of, and desistance from, offending over the life-course [12] in a world where the discovery of large cohort differences in criminal justice involvement [13-18] is raising serious questions about the supposed constancy of the vaunted age-crime curve [19].

The inclusion of crime harm in the dataset is another valuable element. Harm indices have become a widely-used tool in both criminological research and practice [20]. The exact origin of the A&S harm score, though, is unclear. Some other crime harm indices have relied on public sentiment [21] or on sentencing guidelines [22], but it is not clear if the A&S scores reflect either of these. It would be helpful to include information on how the A&S harm scores were calculated by police.

In sum, this paper is a clear and well-written description of a thorough and valuable data linkage project. For criminologists, the inclusion of police data in the ALSPAC dataset makes it an excellent resource — we look forward to additional papers using this important new resource to explore the linkages between arrest and other life events [23] as well as descriptive questions about how the prevalence of police involvement varies between birth cohorts.

[1] Ministry of Justice (2020), Data First. https://www.gov.uk/guidance/ministry-of-justice-data-first

[2] Kirk, D. and Wakefield, S. (2018). Collateral Consequences of Punishment: A Critical Review and Path Forward. Annual Review of Criminology 1(1), 171-194.

[3] Boyd, A., Teyhan, A., Cornish, R.P., et al. The potential for linking cohort participants to official criminal records: a pilot study using the Avon Longitudinal Study of Parents and Children (ALSPAC) [version 2; peer review: 2 approved]. Wellcome Open Research 5(271).

[4] E.g., McLean, R., Robinson, G., and Densley, J.A. (2020). County lines: criminal networks and evolving drug markets in Britain. Springer.

[5] Farrington, D.P. (1986). Age and crime. In Tonry, M. and Morris, N. (eds.) Crime and Justice, vol 7. Chicago: University of Chicago Press, 189–250.

[6] Moffitt, T.E. (1993). Adolescence-limited and life-course-persistent antisocial behavior: a developmental taxonomy. Psychological Review 100(4), 674–701.

[7] Thornberry, T.P. and Krohn, M.D. (2003). Comparison of Self-Report and Official Data for Measuring Crime. Measurement Problems in Criminal Justice Research: Workshop Summary. Washington, DC: The National Academies Press.

[8] Farrington, D.P., Jolliffe, D., and Coid, J.W. (2021) Cohort profile: the Cambridge study in delinquent development (CSDD).  Journal of Developmental and Life Course Criminology 7, 278–291

[9] Smith, D.J., McVie, S., Woodward, R., Shute, J., and McAra, L. (2001). The Edinburgh Study of Youth Transitions and Crime: Key findings at ages 12 and 13. The Edinburgh Study of Youth Transitions and Crime,  Research Digest no. 1.

[10] Krohn M. D., Lizotte A. J., Phillips M., Thornberry T. P., Bell K. A. (2013). Explaining systematic bias in self-reported measures: Factors that affect the under- and over-reporting of self-reported arrests. Justice Quarterly, 30, 501-528.

[11] Piquero, A. R., Schubert, C. A., and Brame, R. (2014). Comparing Official and Self-report Records of Offending across Gender and Race/Ethnicity in a Longitudinal Study of Serious Youthful Offenders.  Journal of Research in Crime and Delinquency 51(4), 526–556.

[12] E.g., Stewart, A., Dennison, S., Allard, T., et al. (2015). Administrative data linkage as a tool for developmental and life-course criminology: The Queensland Linkage Project.  Australian & New Zealand Journal of Criminology 48(3), 409–428.

[13] Sampson, R.J. and Smith, L.A. (2021). Rethinking Criminal Propensity and Character: Cohort Inequalities and the Power of Social Change.  Crime and Justice 50, 13-76.

[14] Neil, R. and Sampson, R.J. (2021). The Birth Lottery of History: Arrest over the Life Course of Multiple Cohorts Coming of Age, 1995-2018. American Journal of Sociology 126(5), 1127–1178.

[15] Kim, J., Bushway, S., and Tsao, H. (2016). Identifying Classes of Explanations for the Crime Drop: Period and Cohort Explanations in New York State. Journal of Quantitative Criminology 32, 357-375.

[16] Shen, Y., Bushway, S., Sorensen, L., and Smith, H. (2020). Locking Up My Generation: Cohort Differences in Prison Spells Over the Life Course. Criminology 58(4), 645-677.

[17] Bjerk, D. and Bushway, S. (2022). The Long-term Incarceration Consequences of Coming-of-Age in a Crime Boom. Journal of Quantitative Criminology.

[18] Spelman, W. (2022). How cohorts changed crime rates, 1980-2016. Journal of Quantitative Criminology 38(3), 637-671.

[19] Britt, C.L. (1992) Constancy and Change in the U.S. Age Distribution of Crime: A Test of the ‘Invariance Hypothesis.’ Journal of Quantitative Criminology 8(2), 175-87.

[20] van Ruitenburg, T. and Ruiter, S. (2023). The adoption of a crime harm index: A scoping literature review. Police Practice and Research 24(4), 423-445.

[21] Wolfgang, M., Figlio, R. M., Tracy, P. E., and Singer, S. I. (1985).  The National Survey of Crime Severity. U.S. Dept. of Justice, Bureau of Justice Statistics.

[22] Sherman, L., Neyroud, P., and Neyroud, E. (2016). The Cambridge crime harm index: measuring total harm from crime based on sentencing guidelines.  Policing: A Journal of Policy and Practice 10(3), 171–183.

[23] Doherty E.E., Cwick, J.M., Green, K.M., and Ensminger, M.E. (2016). Examining the Consequences of the ‘Prevalent Life Events’ of Arrest and Incarceration among an Urban African-American Cohort. Justice Quarterly 33(6), 970-999.

Are sufficient details of methods and materials provided to allow replication by others?

Yes

Is the rationale for creating the dataset(s) clearly described?

Partly

Are the datasets clearly presented in a useable and accessible format?

Yes

Are the protocols appropriate and is the work technically sound?

Yes

Reviewer Expertise:

Criminology

We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

References

  • 1. Ministry of Justice: Data First. UK Government Ministry of Justice. 2022.
  • 2. Collateral Consequences of Punishment: A Critical Review and Path Forward. Annual Review of Criminology. 2018;1(1):171-194. doi: 10.1146/annurev-criminol-032317-092045
  • 3. The potential for linking cohort participants to official criminal records: a pilot study using the Avon Longitudinal Study of Parents and Children (ALSPAC). Wellcome Open Res. 2020;5:271. doi: 10.12688/wellcomeopenres.16328.2
  • 4. County Lines. doi: 10.1007/978-3-030-33362-1
  • 5. Age and Crime. Crime and Justice. 1986;7:189-250. doi: 10.1086/449114
  • 6. Adolescence-limited and life-course-persistent antisocial behavior: A developmental taxonomy. Psychological Review. 1993;100(4):674-701. doi: 10.1037/0033-295X.100.4.674
  • 7. Comparison of Self-Report and Official Data for Measuring Crime. In: Measurement Problems in Criminal Justice Research: Workshop Summary. Washington, DC: The National Academies Press, National Academies of Sciences, Engineering, and Medicine. 2003:43-94. doi: 10.17226/10581
  • 8. Cohort Profile: The Cambridge Study in Delinquent Development (CSDD). Journal of Developmental and Life-Course Criminology. 2021;7(2):278-291. doi: 10.1007/s40865-021-00162-y
  • 9. The Edinburgh Study of Youth Transitions and Crime: Key Findings at Ages 12 and 13. Centre for Law and Society, University of Edinburgh. 2001.
  • 10. Explaining Systematic Bias in Self-Reported Measures: Factors that Affect the Under- and Over-Reporting of Self-Reported Arrests. Justice Quarterly. 2013;30(3):501-528. doi: 10.1080/07418825.2011.606226
  • 11. Comparing Official and Self-report Records of Offending across Gender and Race/Ethnicity in a Longitudinal Study of Serious Youthful Offenders. Journal of Research in Crime and Delinquency. 2014;51(4):526-556. doi: 10.1177/0022427813520445
  • 12. Administrative data linkage as a tool for developmental and life-course criminology: The Queensland Linkage Project. Australian & New Zealand Journal of Criminology. 2015;48(3):409-428. doi: 10.1177/0004865815589830
  • 13. Rethinking Criminal Propensity and Character: Cohort Inequalities and the Power of Social Change. Crime and Justice. 2021;50(1):13-76. doi: 10.1086/716005
  • 14. The Birth Lottery of History: Arrest over the Life Course of Multiple Cohorts Coming of Age, 1995–2018. American Journal of Sociology. 2021;126(5):1127-1178. doi: 10.1086/714062
  • 15. Identifying Classes of Explanations for Crime Drop: Period and Cohort Effects for New York State. Journal of Quantitative Criminology. 2016;32(3):357-375. doi: 10.1007/s10940-015-9274-5
  • 16. Locking up my generation: Cohort differences in prison spells over the life course. Criminology. 2020;58(4):645-677. doi: 10.1111/1745-9125.12256
  • 17. The Long-Term Incarceration Consequences of Coming-of-Age in a Crime Boom. Journal of Quantitative Criminology. 2022. doi: 10.1007/s10940-022-09559-4
  • 18. How Cohorts Changed Crime Rates, 1980–2016. Journal of Quantitative Criminology. 2022;38(3):637-671. doi: 10.1007/s10940-021-09508-7
  • 19. Constancy and change in the U.S. age distribution of crime: A test of the “invariance hypothesis”. Journal of Quantitative Criminology. 1992;8(2):175-187. doi: 10.1007/BF01066743
  • 20. The adoption of a crime harm index: A scoping literature review. Police Practice and Research. 2023;24(4):423-445. doi: 10.1080/15614263.2022.2125873
  • 21. National Survey of Crime Severity. Bureau of Justice Statistics (BJS), University of Pennsylvania. 1985.
  • 22. The Cambridge Crime Harm Index: Measuring Total Harm from Crime Based on Sentencing Guidelines. Policing. 2016;10(3):171-183. doi: 10.1093/police/paw003
  • 23. Examining the Consequences of the "Prevalent Life Events" of Arrest and Incarceration among an Urban African-American Cohort. Justice Q. 2016;33(6):970-999. doi: 10.1080/07418825.2015.1016089
Wellcome Open Res. 2023 Jun 29. doi: 10.21956/wellcomeopenres.20758.r59630

Reviewer response for version 1

Stacy Tzoumakis 1,2

In this Data Note, the authors provide detailed information on the linkage of regional police records to the participants of the Avon Longitudinal Study of Parents and Children. This will be an invaluable resource to answer questions about early life risk and protective factors linked to criminal justice involvement in adolescence and adulthood. They provide clear and concise information on the procedures and record linkage methods. The deterministic and probabilistic record linkage methods used were appropriate. The authors also acknowledge the limitations and difficulties with conducting record linkage with police data such as the use of aliases.

Many record linkage studies can provide a rate of false-positive linkages (e.g., n/1000 persons or %) in the data to better demonstrate the accuracy of the linkage. If it is possible to do so, the authors should consider adding this information. Moreover, the authors should consider adding the number of records and prevalence for the domestic abuse indicator (perhaps to Table 3?) as this will surely be of interest to potential collaborators.

A few other minor comments:

  1. In the Abstract, the authors state: “In total, ALSPAC had permission to link to crime records for 12,662 of the ‘study children’ (now adults, who were born in the early 1990s).” They should consider providing the denominator of the study children (n~14,000) or the % to demonstrate that this was relatively high.

  2. In the first sentence of the background, the authors refer to upstream risk factors. It might be good to provide some examples or define this for the readership.

  3. On p. 3 the authors state that “pursuing linkage to local police records would be a more targeted yet equally valid approach”. I would consider rewording “equally valid” as they acknowledge later that using this data source does not account for convictions. It has other strengths such as being a broader measure and an indicator of earlier contact with the CJS.

  4. On p. 5, it would be good to provide the total number of twins in the ALSPAC data for context.

  5. On p. 6, consider providing the n to specify what consists of ‘a few’ in this sentence: “A few 'true' links also had some of the lowest confidence scores…”

  6. On p. 6, in Stage 2: Extracting attribute data, the authors should provide percentages or ranges to describe what is meant by the ‘majority’ and ‘high’ in these two sentences: “majority will go on to face trial in court”; “Conviction rates are high for many offences…”

  7. It was great to see the brief summary of the data include numbers by sex, which already revealed important differences. I was surprised to see that violence against the person had the highest prevalence, especially for females. Violence is usually the least common crime type. Traffic and other less serious offences are typically highest in population studies. It would be helpful to have a more detailed description of what specific offence types (common assault? acts intended to cause injury?) were included in this offence group to unpack this finding.

  8. On page 8, consider providing some numbers to show what is meant by ‘very high’: “While conviction rates can be very high…”

Are sufficient details of methods and materials provided to allow replication by others?

Yes

Is the rationale for creating the dataset(s) clearly described?

Yes

Are the datasets clearly presented in a useable and accessible format?

Yes

Are the protocols appropriate and is the work technically sound?

Yes

Reviewer Expertise:

Criminology, child maltreatment, child development, administrative record linkage

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Wellcome Open Res. 2023 Jun 27. doi: 10.21956/wellcomeopenres.20758.r58111

Reviewer response for version 1

Stephanie D'Souza 1

The data note reviewed details the linkage of ALSPAC and regional police records. There is increasing use of administrative records, beyond just the public health space, and into areas of social sciences. This development has made significant contributions to our understanding of key social outcomes. Linkage to prospective surveys has been identified as a significant avenue for future research (e.g. see Milne et al. 2022, Annual Review of Developmental Psychology); therefore, the resource described in this data note can certainly provide significant value.

In general, the methodological processes appear appropriate and have been outlined clearly. Importantly, clear measures have been taken to prevent the possibility of identification due to linkage. Caveats with the data have also been addressed.

I do have two questions about some of the methodological processes that could be clarified in the text where appropriate:

  1. Does the linkage method allow for phonetic similarity between names? (e.g. Li and Lee)

  2. Regarding the manual review of links - does this mean links that met the three criteria specified on page 5 were retained? I’m concerned about the full postcode match being the only criterion for retention in that case, especially given that individuals may have moved and not updated their information - particularly those who are likely to have a police record. What % of links met this criterion?
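
To illustrate what question 1 is asking, the sketch below shows one common form of phonetic matching, American Soundex, under which "Li" and "Lee" receive the same code. This is an illustrative assumption only: the data note does not state whether the linkage used any phonetic encoding, and the function name `soundex` and the sample names are hypothetical.

```python
# Minimal American Soundex sketch (illustrative; not the ALSPAC/A&SP method).
# Consonants map to digit classes; vowels and y carry no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code of a name ('' if no letters)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    encoded = first.upper()
    prev = CODES.get(first, "")  # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are skipped and do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so equal codes around it both count
    return (encoded + "000")[:4]  # pad with zeros, truncate to 4 characters


# Names that sound alike share a code and could be treated as agreeing.
print(soundex("Li"), soundex("Lee"))         # L000 L000
print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

A linkage that compared such codes instead of (or alongside) exact spellings would count "Li" and "Lee" as agreeing on surname, at the cost of also grouping some names that are genuinely different.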

Finally, regarding the disposal outcome types, the authors note that conviction data aren’t available and that conviction rates can differ by offence type. Are there published statistics on the average conviction rate, or the range in conviction rates by offence type, that could be specified here?

Are sufficient details of methods and materials provided to allow replication by others?

Partly

Is the rationale for creating the dataset(s) clearly described?

Yes

Are the datasets clearly presented in a useable and accessible format?

Yes

Are the protocols appropriate and is the work technically sound?

Yes

Reviewer Expertise:

Longitudinal research; administrative data; maternal mental health; child development

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

References

  • 1. Milne et al.: Use of Population-Level Administrative Data in Developmental Science. Annu Rev Dev Psychol. 2022;4(1):447-468. doi: 10.1146/annurev-devpsych-120920-023709

Associated Data


    Data Availability Statement

    If you require further information about the A&SP data, please contact the ALSPAC Data Linkage Team (alspac-linkage@bristol.ac.uk).

    ALSPAC data access is through a system of managed open access. The steps below highlight how to apply for access to the data included in this data note and all other ALSPAC data:

    i. Please read the ALSPAC access policy which describes the process of accessing the data and samples in detail, and outlines the costs associated with doing so.

    ii. You may also find it useful to browse our fully searchable research proposals database, which lists all research projects that have been approved since April 2011.

    iii. Please submit your research proposal for consideration by the ALSPAC Executive Committee. You will receive a response within 10 working days to advise you whether your proposal has been approved.


    Articles from Wellcome Open Research are provided here courtesy of The Wellcome Trust
