Social History of Medicine. 2021 Oct 8;35(4):1116–1139. doi: 10.1093/shm/hkab037

Older rationales and other challenges in handling causes of death in historical individual-level databases: the case of Copenhagen, 1880–1881

Barbara Revuelta-Eugercios, Helene Castenbrandt, Anne Løkke
PMCID: PMC9949592  PMID: 36844659

Summary

Large-scale historical databases featuring individual-level causes of death offer the potential for longitudinal studies of health and illnesses. There is, however, a risk that the transformation of the primary sources into ‘data’ may strip them of the very qualities required for proper medical historical analysis. Based on a pilot study of all 11,100 deaths registered in Copenhagen in 1880–1881, we identify, analyse and discuss the challenges of transcribing and coding cause of death sources into a database. The results will guide us in building Link-Lives, a database featuring close to all nine million Danish deaths from 1787 to 1968. The main challenge is how to accommodate different older medical rationales in one classification system. Our key finding is that multi-coding with more than one version of the ICD system (e.g. ICD-1893 and ICD-10) can be used as a novel method to systematically handle historical causes of death over time.

Keywords: historical cause of death, large-scale historical individual-level population databases, ICD-10 classification, older rationales in causes of death, multiple coding as method


Registered causes of death have changed over time in a number of ways. This has prompted animated discussions among medical historians and historical demographers as to whether it is at all possible to use registered causes of death as primary sources to establish trustworthy knowledge about health and illnesses in the past. In recent decades, however, researchers from both disciplines have come a long way in developing methods to understand and handle the entanglement of changing registration practices, changing comprehensions, changing diagnosis systems and changing illness panoramas.

Recently, however, the building of large-scale historical population databases featuring reconstructed life courses of every single individual in large populations over hundreds of years,1 aims to open new avenues for research in many other disciplines. This development is not only very promising; it also brings to the fore all the methodological problems of handling the historicity of causes of death on an entirely new scale. As social and health scientists with no historical training (whether in medical history, historical demography or otherwise) use these data, the value of their results is heavily dependent on choices made by the medical historians and historical demographers who build the databases. This concerns all the standard methodological reflections which define the work of all historians, such as how, why, when and by whom the sources were produced and how that influences the information given. Similarly, what are the biases and errors produced by transcription? How do different categorisation criteria, classification systems and the application of different levels of medical historical expertise influence the resulting data? If the answers to these questions are historically well founded, well described and transparent in the data structure, such large-scale databases may provide real research breakthroughs for many disciplines; if not, they may simply be a waste of time and money.

For the authors of this article, this challenge is immediate. We are currently building Link-Lives, an individual-level historical population database consisting of life courses and family relations for (nearly) everyone who lived in Denmark from 1787 to 1968, the year in which a modern civil registration system was introduced in Denmark. In total, the database is to capture nine million individuals. We are building basal life courses and family relations by linking information about any given person across different censuses and baptism, marriage and burial registers. We will subsequently add information to these basal life courses from death certificates and, later, information from a range of other sources, such as patient records, conscription records and autobiographies. If we succeed in our aims, our Danish data will be useful for longitudinal studies with life-course and multigenerational perspectives, and be open for in-depth transnational comparisons with places that also have large-scale individual-level historical population databases.

One of our greatest challenges right now is the choice of classification system for handling causes of death. Classification systems have been used since the nineteenth century to categorise the many thousands of individual causes of death into groups according to the medical scientific rationales of the time in order to derive useful cause of death statistics. The classification system for such statistics used worldwide today is called ICD-10.

Historical research projects based on smaller samples of causes of death have in recent decades most often constructed their own classification systems, tailor-made to their specific research questions. However, the whole purpose of large-scale historical population databases is that the same data can be used many times over by many different research projects to study a range of research questions. The applied classification system must therefore be able to accommodate very different uses. Furthermore, it is an obstacle for comparative research if every large-scale database uses its own classification system. Therefore, international cooperation and agreement are required among developers of historical population databases.

In the last couple of years, some historical population databases have chosen to use ICD-10.2 Using ICD-10 is appealing because it seemingly makes sense of historical data by connecting them to present-day medical science and to modern register data. However, this methodological decision has deep implications for our understanding of health and diseases in the past. And it is a decision most medical historians are highly sceptical about. ICD-10 is not made to handle the many historical causes of death which today are not considered possible causes, despite making perfect sense in the medical science rationales of their time. However, rather than just accepting or rejecting the use of ICD-10 for classifying historical causes of death data, we decided to explore its implications further.

Aim and Structure

The overall aim of this article is to identify, describe, analyse and discuss the difficulties and challenges associated with the handling of individual-level historical cause of death information in the construction of large-scale historical population databases, designed to be used by researchers from many different scientific disciplines, as well as for longitudinal studies and transnational comparison.

This article is structured such that we first present the two main primary sources of historical causes of death in Copenhagen: the burial register and death certificates. Secondly, we compare the quality and potential of the information in these two sources by linking individuals in the burial register to their corresponding death certificates. Thirdly, we compare the results of coding the same data using two different classification systems: the main chapters of ICD-10 and a coding system developed by Bernabeu, Ramiro, Sanz and Robles for nineteenth-century causes of death (hereafter the BeRaSaRo-system).3 We use the coding both to compare the two source groups and to compare the classification systems. The aim of the latter is to discover what actually happens to our handling of causes of death with only older rationales while using ICD-10, rather than using a system made to be more sensitive to older rationales in historical cause of death information.

Link-Lives will soon have to handle individual causes of death for nine million individuals over 200 years. As the pilot study has been designed to elicit experience in preparation for the main project, it is based on a limited sample: all deaths in Copenhagen in the two years 1880 and 1881, a total of 11,100 deaths. We chose to work with two years of data in order to build volume while still keeping the workload manageable (and indeed affordable) for a pilot study.

As background, we present medical historical arguments for the importance of understanding and preserving information held by historical causes of death, while recognising that these causes are not meaningful as causes of death in present-day medical science. This is followed by an introduction of ICD-10 and the BeRaSaRo-system.

Background: Older Rationales in Historical Causes of Death

Many medical historians and historical demographers have debated the merits and problems of using historical cause of death registration as a source for the history of health and illness. Twenty years ago, confrontations between those who doubted the feasibility of the whole effort and those taking a more carefree leap into counting and conclusions often became agitated, as evidenced in the special issues of Historical Methods (1996), Continuity and Change (1997) and the Journal of the History of Medicine (1999). Simultaneously, however, and increasingly during the subsequent decades, studies carefully built their case for using registered cause of death as sources to throw light on specific research questions, typically about a limited subject matter such as the emergence and disappearance of a specific diagnosis or the history of the stabilisation of symptoms into one recognisable illness. These studies typically navigated the concrete methodological challenges of the historicity of causes of death. They explicitly presented, analysed and discussed the steps taken in using this specific cause of death information, to make sure the argumentation and conclusions were trustworthy for the specific research question about a given place and time.4 Questions addressed have often been: how have the registration procedures changed? By whom was the cause of death decided? And when and how has a specific diagnosis changed? Most of the studies agreed that very few illnesses have had stable identities over hundreds of years (though smallpox is one). Most illnesses have been part of larger complexes of symptoms, such as gangrene, plethora, fever, and typhus in the eighteenth century, which during the nineteenth and twentieth centuries were split into different, more singular diagnoses, each with a described aetiology and prognosis. 
Because newer diagnoses are typically more specific than older ones, most diagnoses are bound up in relatively narrow timespans and cannot be directly or fully translated into diagnoses further forward in time. The older diagnoses simply do not feature the information necessary to decide to which more modern diagnosis they might belong. For example, if a nineteenth-century death certificate only gives the word gangrene, there is not enough information about what caused the gangrene and where in the body it was situated for us to place it in ICD-10 with a useful code; instead it has to be categorised as being ‘ill-defined’. However, causes of death given within older medical rationales, such as gangrene, do bear information, which would get lost if categorised as ‘ill-defined’. Gangrene is a much more informative cause of death than nothing, even if we have no way of knowing whether it was caused by diabetes, a crushed leg or something entirely different.
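The gangrene example can be pictured as a record that keeps the original wording next to several parallel codings, in the spirit of the multi-coding proposed in the Summary. The following is only a minimal sketch under stated assumptions: the class name, the field names and the label ‘historical-system’ are hypothetical and do not describe the Link-Lives data model; only the ICD-10 outcome ‘ill-defined’ is taken from the text above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CauseOfDeathRecord:
    """One registered cause of death, kept verbatim and coded in parallel."""

    original_text: str  # transcription exactly as written in the source
    codings: Dict[str, str] = field(default_factory=dict)  # system -> category

    def add_coding(self, system: str, category: str) -> None:
        """Attach (or overwrite) the category assigned under one system."""
        self.codings[system] = category

    def category_in(self, system: str) -> Optional[str]:
        """Return the category under a system, or None if not yet coded."""
        return self.codings.get(system)


record = CauseOfDeathRecord(original_text="gangrene")
record.add_coding("ICD-10", "ill-defined")  # outcome described in the text
record.add_coding("historical-system", "gangrene")  # hypothetical label
```

Because the original wording is never overwritten, a later classification can be added to the same record without re-transcribing the source.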

Another example is from Løkke’s studies of infant deaths: in Copenhagen, 60 per cent of dead infants during the years 1850–1860 died from three causes: newborn, convulsions and atrophia infantilis (Figure 1). Although physicians decided the cause of death, none of these causes is a valid term for a cause of death within medical science today. Since they cannot be allocated specific codes in ICD-10, they have to be coded as ‘ill-defined’, a classification unhelpful to the understanding of historical infant death. For medical historians, these three causes do bear valuable information, however. That newborn, convulsions and atrophia infantilis were the most frequent causes of death for infants tells us that, in the medical thinking of the time, infants were thought to be so fragile that they could die without suffering from an actual illness at all. ‘Newborn’ was a cause of death in itself until 1876 when it was changed to congenital weakness. Slightly older infants were thought to be able to succumb to convulsions, despite being seemingly healthy just moments before their deaths. Atrophia indicated that infants could waste away simply because it was difficult to make them thrive; again, without an illness being involved.5 However, in the late nineteenth century, these older rationales faded away as more and more infant deaths were recorded as having been caused by an illness (Figure 1). Further investigation shows that the use of these older rationales declined before the aetiology of infant diarrhoea, which had become the most common new cause of death, was understood. The more frequent use of this new diagnosis was followed by massive research investment that developed a more effective prevention of infant diarrhoea, an important factor in the decrease in infant mortality.6 This is an example of the type of analysis we want to ensure we accommodate in our large-scale database.

Fig. 1. Number of infant deaths per 100 live births in Copenhagen according to cause of death, 1850–1919. Source: The Danish published cause of death statistics. See Løkke, Døden i Barndommen, 58.

The ICD-Classification Systems for Causes of Death

ICD-10 is, as previously mentioned, the classification system used today for making sense of health data, including cause of death statistics. It is negotiated and accepted internationally by medical experts in their fields and used by 117 countries in their mortality statistics.7

ICD-10 is the current version of the International Classification of Diseases which, since 1948, has been maintained by the World Health Organization (WHO). International cooperation using a common classification system, however, goes back to the work of William Farr in the middle of the nineteenth century. Subsequently, in 1893, Jacques Bertillon compiled a classification designed for international adoption, which among medical historians and statisticians is often referred to as ICD-1893 or ICD-0.8

ICD-10 was first published in 1990. It has 22 major classification chapters and approximately 15,000 minor codes.9 There are appealing features for using ICD-10 with historical data: it is easily accessible online and a huge organisation is occupied with producing educational material such as textbooks, online training systems, courses, etc. It is thus feasible for historians everywhere to teach themselves ICD-10 and easy to find expert coders among today’s health professionals.

The only major drawback for the use of ICD-10 in historical cause of death databases used to be that, although it is designed to be the best possible reflection of the medical rationales of our time, it is of course not made to handle historical diagnoses and older medical rationales. However, yet another drawback is looming. ICD-10 is in itself on the threshold of becoming a period piece. It has been relatively stable since its implementation, but its successor, ICD-11, has now been agreed upon.10 In a few years’ time, therefore, ICD-10 will become the tenth historical ICD-classification system to no longer be updated in line with present-day standard medical science. This poses a new challenge for historical databases that have been using ICD-10 precisely because it is so up to date.

ICD-11 will also prompt a demand from present-day health professionals for the production of translation tools simply to convert ICD-10 to ICD-11, but WHO found it impossible to achieve this between ICD-9 and ICD-10. At the launch of ICD-10, WHO stated that: ‘It is not possible to convert ICD-9 data sets into ICD-10 data sets or vice versa’. To ease the problems caused, WHO recommended that ‘at the time of a change in revision of the ICD, WHO Member States perform a “bridge-coding” exercise in which all or a sample of deaths are dual coded according to both the outgoing and incoming revisions. In this way, comparability ratios can be calculated for each cause’.11
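The bridge-coding exercise quoted above can be illustrated with a small calculation. This is a sketch only, assuming the conventional definition of a comparability ratio (deaths assigned to a cause under the incoming revision divided by deaths assigned to it under the outgoing revision) and assuming that both revisions have been grouped into the same cause labels. The function name and the labels "A" and "B" are hypothetical, and the sample data are invented for illustration.

```python
from collections import Counter


def comparability_ratios(dual_coded):
    """Compute one comparability ratio per cause from dual-coded deaths.

    dual_coded: iterable of (outgoing_cause, incoming_cause) pairs,
    one pair per death, each death coded under both revisions.
    """
    pairs = list(dual_coded)
    outgoing = Counter(old for old, _ in pairs)
    incoming = Counter(new for _, new in pairs)
    # A Counter returns 0 for causes that never occur under the incoming revision.
    return {cause: incoming[cause] / n for cause, n in outgoing.items() if n > 0}


# Invented sample: five deaths, one of which moves from cause A to cause B.
sample = [("A", "A"), ("A", "A"), ("A", "B"), ("B", "B"), ("B", "B")]
ratios = comparability_ratios(sample)
```

A ratio below 1 means the incoming revision assigns fewer deaths to that cause than the outgoing one did; a ratio above 1 means it assigns more.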

There is no doubt that a simple conversion of ICD-10 to ICD-11 will be as impossible as it was between earlier ICD versions. But we may be on the cusp of getting access to a tool to understand and better describe the differences between ICD versions, thanks to Professor of Translational Disease Systems Biology, Søren Brunak, and his group, who are developing a novel kind of ‘mapping’ of ICD-8 onto ICD-10. They call the endeavour ‘mapping’ because it involves, among other things, differences flagged by a score system, and imaging of the changes in both main chapters and minor codes. This may result in analytical methods of inspirational value for handling the same challenge in much older data too.

The BeRaSaRo-Classification System

To assess whether more information is lost from historical data when coded using ICD-10 than if coded using a classification system made to catch historical medical rationales, we had to create or choose a system tailor-made to fit late-nineteenth-century Western medical science. A major challenge for international comparisons is that most research projects have used their own historical classification coding systems. We therefore decided against using the Danish national classification system used in the late nineteenth century for the publication of the official cause of death statistics.12 This system otherwise appealed to us because it was tailor-made to accommodate Danish diagnoses of the time. Instead, we chose a system explicitly made for international comparison of historical data: the BeRaSaRo-classification was both explicitly designed for nineteenth-century causes of death and adequately explained, making it possible for international colleagues to use.13 The BeRaSaRo-classification was further attractive to us because the research interest of its authors was fairly broad, namely, the change of illness panorama during the great decline of mortality in late-nineteenth- and early-twentieth-century Europe.

Bernabeu, Ramiro et al. focused their system on separating infectious diseases from all other causes of death as the two main categories. As subcategories within these two groups, they accommodated the categories from Bertillon’s international 1893 classification, mentioned above. They constructed the infectious/non-infectious dichotomy based on state-of-the-art infectious disease medicine in 2003. The BeRaSaRo-classification thus operates with two simultaneous rationales: the 2003 rationale of infectious disease medicine and the late nineteenth-century medical rationale reflected in the individual causes of death of the Spanish primary sources sorted into Bertillon’s classification. This double rationale is built into every single code which, as we will discuss later, proved a challenge in our coding when we started using the BeRaSaRo-classification in 2017.

The Sources for Individual Causes of Death

The two main primary sources for individuals’ cause of death data in nineteenth-century Copenhagen are, as already mentioned, the burial register and death certificates.

The system of issuing death certificates was implemented by law in Copenhagen in 1832.14 Every dead body had to be examined by a physician, who was to issue a death certificate and indicate cause of death. The next of kin normally had to pay the physician for this, but poor people received the certificate free of charge. The aim was to prevent people who only appeared to be dead from being buried alive, something widely debated in the press at the time. The law prohibited the burial of a corpse if a death certificate was not presented to the officials in charge of the burial.15 As a result, since 1832, the cause of death of very close to every person in Copenhagen was recorded by physicians via death certificates.

The physician completed a pre-printed death certificate template.16 Since 1848, the main disease as well as the acute cause of death was to be recorded and, from 1875, more details about the disease and the deceased person were required.17

We have not yet found direct evidence of how and by whom the death certificate was transferred from the physician to the burial authorities during the nineteenth century. If the certificate was issued to the next of kin for them to submit to the burial authorities, it may have influenced what the physicians chose to write as the cause of death. In 1901, however, it is clear that it was usually the next of kin who submitted the certificate, as the Danish Association of Physicians in this year asked the Ministry of Justice to make the death certificates less understandable to the public. The argument was that physicians were tempted to shy away from writing alcoholism, venereal disease or other ‘dishonourable’ causes of death. A glued envelope was not enough to keep it secret, the Association stated,18 so it is reasonable to assume that the number of not-so-honourable causes of death recorded was smaller than those actually diagnosed by the physicians.

All original Danish death certificates are meant to be preserved at the Danish National Archives, and the collection is fairly, but not fully, complete. However, the certificates are single sheets of paper, so there is no simple way to check which are missing.19

Official aggregate cause of death statistics for Copenhagen have been published since 1832, based on death certificates. Provincial towns were included in 1860 and the rural districts in 1920.20 Also based on the death certificates is the database entitled ‘the Danish Cause of Death Register’ (Dødsårsagsregister) containing individual cause of death data for all deceased individuals in Denmark since 1943.21

Burial registers have a much longer history in Denmark than death certificates. The keeping of church records containing burial registers has been mandatory by law in Denmark since 1646.22 Depending on the local clergyman, these may or may not have included causes of death. In Copenhagen, however, it was made mandatory in 1664 for the clergyman who performed the burial to ask the next of kin how the deceased had died, so he could submit weekly lists of the number of burials with information on causes of death. The lists were supposed to be published in print every week and, from the late eighteenth century at least, they were.23

The burial registration procedure in Copenhagen became more complicated during the nineteenth century. A shortage of space at the old parish graveyards inside the city ramparts prompted the establishment of new large cemeteries outside the central city serving multiple churches and parishes. Thus, it became unclear whether the registration of a burial should be recorded in the parish burial register, the burial register for the new cemeteries or both. In 1861, the municipal authorities decreed that all non-military burials in Copenhagen, regardless of which graveyard or cemetery was used, should be written into one central register. From 1880, all graveyards and cemeteries in Copenhagen were managed together by a new department, the Copenhagen Burial Authorities (Kjøbenhavns Begravelsesvæsen).

The Copenhagen Burial Register 1861–1942 consists of bound volumes of pre-printed pages consisting of six, later eight, burials per page. The burials are registered semi-chronologically according to month of burial, and each burial is numbered consecutively, making it evident when pages are missing. The bound volumes are preserved in their entirety and stored at the Copenhagen City Archive (Københavns Stadsarkiv). Scanned images of the full original collection are available online.24

Contents of the Two Sources

Much of the information requested on the death certificates and the burial registers is similar, but as the two types of document were made for different purposes, they are organised differently.

The pre-printed form for each record in the Copenhagen Burial Register is a short narrative text, with space left blank to be filled in by the clerks of the Copenhagen Burial Authorities. No instructions are included, but the handwritten contents were filled in systematically and in the same way for all records in our sample; the handwriting also seems to suggest only two or three clerks were involved. Below is an example record from the 1881 Burial Register, translated into English. The text in bold type shows the handwritten contents; non-bold type indicates the pre-printed text:

No 813

Sunday 20 Febr 12 o’clock is buried

Jens, son of journeyman carpenter Christrup,

2/12 years of age. Residence Slotsg. 29

Unmarried. Married. Widowhood. Died 16 Febr.

of convulsions, from the home Sct Johns Parish to Vestre25

The cost of the funeral services appeared beneath each entry. In this example, a minimum burial package was selected, at a cost of 1 Danish krone.

The record is obviously written in a kind of shorthand developed for the Copenhagen Burial Register. The parish names and the names of the cemeteries are shortened, as are the street names, and the clerk chose not to underline Jens’ civil status. However, the information is unmistakable: an infant of two months of age was clearly going to be unmarried, and the names of cemeteries and streets, when located in Copenhagen, refer to finite lists. So for individuals with residence in Copenhagen, and who were also buried in Copenhagen, the information is quite clear. For those in the Copenhagen Burial Register who had residence and/or a burial place outside the city, the recorded information relating to places tends to be less precise.

The death certificates are one-page forms consisting of named fields, along with elaborate instructions for the certifying physician to help produce a legally valid document, even if he had not routinised the task.

We do not know for sure whether the information in the two sources is interdependent. We will look closer into this below. However, a simple field-to-field comparison shows that each source has information that is not included in the other. Thus, the two sources combined give fuller information about an individual than each source does independently.

Transcription Methods

Transcription of primary sources for use in historical population databases should of course be subject to all the same considerations as other types of historical primary source editing and publishing. Research based on such databases will naturally be flawed if the transcribed data is not a trustworthy reproduction of the original, so it is vital that the transcription principles are documented and accessible to users. The recognised standards among historical demographers remain those drafted by the ‘founding fathers’ of the Umeå historical population database back in 1973: transcribed data should be true to its source and able to be tracked back to the original source, and all relevant information in the original source should be included in the data transcription. All processing of data should be research-oriented, allowing for micro-historic research as well as for large-scale cohort studies.26 However, adhering to these principles in practice takes a lot of careful planning, and for economic and other reasons it will often be tempting for transcription projects to cut corners in ways that may seem insignificant beforehand but may have unintended negative consequences later.

The Copenhagen burial records are currently being transcribed by a large crowdsourcing project organised by the Copenhagen City Archive (Københavns Stadsarkiv). Transcriptions for the years 1861–1912 are already accessible online, and 1913–1942 will follow in the next few years. The transcription project has been carefully planned to fulfil the above principles. Using an online interface, volunteers carry out the transcriptions. The effort is supervised and quality-checked by so-called ‘super-users’, who are also mainly volunteers. The online interface provides transcribers with scanned facsimile images of the original records alongside fields to fill in.27 All fields in the original records are transcribed, except the fields concerning burial arrangements and payment thereof. These can, however, be added later. The system contains features to flag fields where there is no information to transcribe, as well as features for the volunteers to use when they are uncertain of the transcription, prompting the assistance of a super-user.

The Copenhagen City Archive designed the input fields after interviews with genealogists and volunteers and discussions with researchers in history and onomastics. The aims were to optimise the volunteer transcription experience, the genealogists’ user experience as well as the quality of data for medical, social and historical research. The volunteers preferred drop-down menus to always having to transcribe letter by letter, for example, so standard spellings were built into drop-down lists in cases where the Archive judged that this would help the volunteers without harming the quality of the resulting data.

For the field capturing causes of death, the Archive chose a dual approach: as well as a drop-down list, an open field was provided, in which volunteers were asked to type a new value if the exact one did not appear in the drop-down list. Newly input values were then used to update the drop-down menu, so the options have been steadily growing. The first version was based on the classification used in Denmark to compile the official aggregate cause of death statistics in the 1930s. This first list contained 700 diagnostic expressions. Since then, the drop-down list has grown to more than 10,000.28 The list groups different spellings of the same word—for example, ‘Rachitis (Rakitis, Rhachitis)’, ‘Convulsiones (Konvulsioner)’—to be selected in one mouse click. Synonyms for the same illness, however, are listed as separate choices: ‘Engelsk Syge’ (Danish for rachitis), ‘Kramper’ (Danish for convulsions). Also listed as separate choices are causes with a descriptive attachment, such as ‘Nephritis chronica’.29
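The grouping rule described above (spelling variants share one entry, while synonyms and causes with descriptive attachments remain separate entries) can be sketched as a simple lookup. This is an illustration only, not the Archive’s implementation: the function and variable names are hypothetical, the case-insensitive matching is an assumption, and the only data used are the examples quoted in the paragraph above.

```python
# Spelling variants quoted in the text, grouped under one list entry each.
VARIANT_GROUPS = {
    "Rachitis": ["Rakitis", "Rhachitis"],
    "Convulsiones": ["Konvulsioner"],
}


def build_lookup(groups):
    """Map every known spelling (case-folded) to its list entry."""
    lookup = {}
    for entry, variants in groups.items():
        for spelling in [entry, *variants]:
            lookup[spelling.casefold()] = entry
    return lookup


LOOKUP = build_lookup(VARIANT_GROUPS)


def list_entry(transcribed):
    """Return the list entry for a transcribed cause of death.

    Values that are not a known spelling variant are kept as typed,
    so synonyms and qualified diagnoses stay separate entries.
    """
    cleaned = transcribed.strip()
    return LOOKUP.get(cleaned.casefold(), cleaned)
```

Under this sketch, ‘Rhachitis’ resolves to the entry ‘Rachitis’, whereas ‘Engelsk Syge’, ‘Kramper’ and ‘Nephritis chronica’ are returned unchanged and therefore remain distinct choices.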

In our work with the records, we have experienced this transcription project as being very successful. It is progressing quickly and the resulting data is systematic, clean and easy to handle. It is easy to check with the facsimile if something appears unusual or erroneous.30 As a result, it will soon be possible to access transcribed and digitised individual causes of death from the Copenhagen Burial Register for all people buried in Copenhagen from 1861 to 1942. Added to that is the existing Cause of Death Register, which includes all deaths in all of Denmark from 1943 to the present day.31 Soon, we will have continuous individual cause of death information for all deaths in Copenhagen since 1861: more than 150 years of data. This is highly promising for future research.

However, for research based on causes of death from the Copenhagen Burial Register, it is important to know more about the registers. Who supplied the cause of death to the Burial Register? Did the clerk copy the cause from the death certificate? The normative sources are not explicit about this and even if they were, we could not be sure whether the rules were followed. We proceeded then to compare the cause of death for each individual in both the Copenhagen Burial Register and the death certificates.

As no transcriptions exist for the full collection of death certificates in Copenhagen for any year, we had to do the transcribing ourselves as part of our pilot study. Our research team photographed the death certificates for the city of Copenhagen 1880 and 1881 on site at the Danish National Archives32 before paid student assistants transcribed these off site. The contents of each of the death certificate fields were transcribed, and fields were created for attributes that were frequently mentioned, even if they had no designated field name on the certificate (such as parental information and legitimacy status for children). We provided the transcribers with lists of streets, physicians’ names and causes of death, but we made no drop-down menus. We instructed the students to transcribe the information exactly as it was written on each certificate and to indicate illegible letters with a question mark. These raw transcriptions were revised by trained and highly motivated student-interns and by Revuelta-Eugercios and Løkke. The cause of death fields, especially, required heavy revision. From this, we learned that to generate high-quality transcriptions of causes of death, the transcribers need a lot of training, which is much better carried out in a huge transcription project like the Copenhagen Burial Register Project processing 80 years at a time, instead of setting up a smaller project covering only 2 years.

Number of Deaths in Copenhagen

One way to assess the completeness of the sources is to compare the number of transcribed records with the number of deaths in the published statistics. The total number of deaths in Copenhagen during the 2 years 1880 and 1881 was close to 11,100, according to both Copenhagen Burial Register and the death certificates. The number of deaths in the official cause of death statistics as well as in the Vital Statistics publications was, however, higher (Table 1).

Table 1.

Number of deaths in Copenhagen according to different sources, 1880 and 1881 (stillbirths included)

| | Burial records | Death certificates | Published aggr. Cause of Death Statistics (a) | Published aggr. Vital Statistics (b) |
| --- | --- | --- | --- | --- |
| Number of deaths | 11,073 | 11,105 | 11,594 | 12,014 |
| Intended criteria for inclusion | Municipality of Copenhagen is place of burial | Municipality of Copenhagen is residence while living | Municipality of Copenhagen is residence while living | Municipality of Copenhagen is place of death |

(a) 1880: 5,876 deaths + 216 stillbirths; 1881: 5,277 + 225 stillbirths. Døds-Aarsagerne i Staden Kjøbenhavn, de øvrige Kjøbstæder […]1880–84. Statistisk Tabelværk, Nr.4, Rk. 4, Litra A, (1886) pp. 7, 29.

(b) 1880: 6,363 inclusive 283 stillbirths; 1881: 5,651 inclusive 285 stillbirths. Vielser, Fødsler og Dødsfald i Aarene 1880–1884, Statistisk Tabelværk, Nr. 5, Rk. 4, Litra A, (1886) pp. XVII, 254, 260.

Some differences are to be expected, since the criteria for which deaths counted as Copenhagen deaths were not identical. In 1880 and 1881, the rule was that the Copenhagen Burial Register should include all individuals buried in the city except for individuals buried at the military, Catholic or Jewish cemeteries. This rule was changed in 1887, after which all individuals who died in Copenhagen and all those buried in Copenhagen were to be included in the Burial Register.33 However, we see some individuals with a place of death in Copenhagen but buried outside the city, and some buried at military cemeteries in the Copenhagen Burial Register for 1880 and 1881, before this rule applied.

The original death certificates archived in the Danish National Archives and the published cause of death statistics are both expected to be categorised by residence of the deceased while living. The vital statistics should be derived according to place of death. However, late nineteenth-century statisticians were already complaining that when place of death and place of residence were not the same, the reporting practices lacked consistency. For our purposes, we concluded that we needed to keep track of individuals with missing death documentation throughout our Link-Lives database.

Linkage Methods

To compare the information in the two source groups, we linked the 11,073 individuals in the 1880–1881 Copenhagen Burial Registers to their death certificates when it was possible to find a match. We found a corresponding death certificate for 9,616 of the burials (87 per cent of cases). We used automatic rule-based linkage almost exclusively. We prepared our two files for linkage by removing superfluous typographical space signs in attributes, removing record duplicates within each file and adding columns with codes for standardised attributes of age, gender and marital status. We did not standardise names, occupations or addresses.
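The preparation steps described above can be sketched roughly as follows. This is a minimal illustration only, not the code used in the project: the field names (`name`, `gender`) and the small gender code table are assumptions made for the example.

```python
import re

# Hypothetical mapping from raw gender strings to standard codes
# (an assumption for this sketch, not the project's actual code list).
GENDER_CODES = {"m": "M", "mand": "M", "male": "M",
                "k": "F", "kvinde": "F", "female": "F"}


def clean_whitespace(value):
    """Trim the value and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", value).strip()


def prepare(records):
    """Clean whitespace, drop duplicate records and add a coded gender column."""
    seen = set()
    prepared = []
    for record in records:
        cleaned = {field: clean_whitespace(text) for field, text in record.items()}
        key = tuple(sorted(cleaned.items()))
        if key in seen:  # duplicate within the same file
            continue
        seen.add(key)
        # The transcribed value is kept; the standard code goes into an added column.
        raw_gender = cleaned.get("gender", "").lower()
        cleaned["gender_code"] = GENDER_CODES.get(raw_gender, "")
        prepared.append(cleaned)
    return prepared


rows = [
    {"name": " Ane  Marie ", "gender": "Kvinde"},
    {"name": "Ane Marie", "gender": "Kvinde"},
]
prepared_rows = prepare(rows)
```

In this sketch the two example rows differ only in spacing, so they collapse into one record once the superfluous spaces are removed. Names are otherwise left unstandardised, in line with the approach described above.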

Readers with no experience in using databases for historical analysis may also wish to know that database programmes can return sorted subsets of the data, based on rules formulated by historians. Using algorithms, we can also compare whether the descriptive attributes for entities in a pair of files are the same, a little different or very different. We used four different strategies to establish the matches (Table 2).

Table 2.

The four strategies used to link a burial record to a death certificate for a given individual

| Strategy | Description | Burial records (No.) | Burial records (%) |
| --- | --- | --- | --- |
| 1 | Automatic link, blocking by gender and death date, + Table 3 rules: using Jaro–Winkler similarity for name, surname, address, occupation + age | 8,421 | 76.1 |
| 2 | Manual link if possible of the remaining unmatched same death date, same gender subsets from the first step | 871 | 7.9 |
| 3 | Automatic link, as strategy 1, but blocking only by death date | 101 | 0.9 |
| 4 | Automatic link, as strategy 1, but blocking by gender, month and year of death | 223 | 2.0 |
| | Burial records matched to a death certificate, total | 9,616 | 86.8 |
| | Burial records not matched to a death certificate, total | 1,457 | 13.2 |
| | All burial records | 11,073 | 100.0 |

Strategy 1 matched 76 per cent of the 11,073 individual burial records to their corresponding death certificates. We used date of death and gender as blocking criteria, to minimise the number of comparisons that needed to be made between the two files. This process involves having the programme sort the individuals into small groups containing the same gender and death date. Then we asked the programme to compare each subset of individuals in the two files of the same gender who died on the same day.
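For readers who want to see what blocking amounts to in practice, the following is a minimal sketch under stated assumptions. The record layout (dictionaries with `gender` and `death_date` fields) and the function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import product


def block_key(record):
    """Blocking key for strategy 1: same gender and same date of death."""
    return (record["gender"], record["death_date"])


def candidate_pairs(burials, certificates, key=block_key):
    """Yield (burial, certificate) pairs that fall into the same block."""
    blocks = defaultdict(lambda: ([], []))
    for burial in burials:
        blocks[key(burial)][0].append(burial)
    for certificate in certificates:
        blocks[key(certificate)][1].append(certificate)
    for burial_group, certificate_group in blocks.values():
        # Only records inside the same block are ever compared with each other.
        yield from product(burial_group, certificate_group)


burials = [
    {"id": "b1", "gender": "F", "death_date": "1880-01-02"},
    {"id": "b2", "gender": "M", "death_date": "1880-01-02"},
]
certificates = [
    {"id": "c1", "gender": "F", "death_date": "1880-01-02"},
    {"id": "c2", "gender": "F", "death_date": "1880-01-03"},
]
pairs = [(b["id"], c["id"]) for b, c in candidate_pairs(burials, certificates)]
```

Without blocking, every burial record would have to be compared with every death certificate. With blocking, the example above produces a single candidate pair instead of four.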

We compared string similarity in names, surnames, addresses and occupations, applying the Jaro–Winkler string metric,34 as well as computing age differences. The Jaro–Winkler string metric assesses the similarity of two words and returns a value between 0 (zero) and 1 (one). Zero denotes completely different words and one indicates an exact match. It takes into account how many characters are different between the two words, the edit distance and the length of the word, and it places most emphasis on the similarity in the first part of the word. To give an example, the Jaro–Winkler similarity for the names ‘Catarine’ and ‘Catharina’ is 0.974 and for ‘Kathrine’ and ‘Catharina’ it is 0.806.
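The metric can be sketched in a few lines. The version below follows the commonly described form of Jaro–Winkler similarity, with a prefix weight of 0.1 and a maximum prefix length of 4. These parameter values, and the function names, are assumptions, as the software and settings used in the study are not stated here.

```python
def jaro(s1, s2):
    """Jaro similarity: 0.0 for no common characters, 1.0 for identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only if they are close to each other in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, char in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == char:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order in the two strings.
    out_of_order = 0
    j = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_weight=0.1, max_prefix=4):
    """Jaro similarity with a bonus for a shared prefix of up to max_prefix characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_weight * (1 - base)


score = jaro_winkler("Kathrine", "Catharina")
```

With these settings the sketch gives a score of about 0.806 for ‘Kathrine’ and ‘Catharina’, in line with the value quoted above. Scores from other implementations or parameter choices may differ slightly.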

We used six sequential rules to evaluate whether a given pair of records with the same date of death and gender could be a potential match (Table 3). The first rule identified the most conservative matches, where the Jaro–Winkler similarity score of first name, surname and address was equal to or higher than 0.8 and the age in years was exactly the same (rule 1, Table 3). Those records that did not fulfil the criteria for classification under the first rule were subjected to a second rule, with looser constraints.

Table 3.

The rule sequence used to link a burial record to a death certificate for a given individual

RULE jaro_name jaro_surname age years jaro_address jaro_occ1
1 ≥0.8 ≥0.8 = =
2 ≥0.8 ≥0.8 = =
3 ≥0.8 ≥0.8 =
4 ≥0.8 ≥0.8
5 ≥0.8 =
6 ≥0.8 =

The minimum criteria on which two records could be considered a match, with both the same death date and the same gender, were the match of a surname and the exact age (rule 6, Table 3). As the rule sequence made the first rules more reliable, we always preferred the candidate pair with the lowest rule number. If the same ranking rule resulted in more than one match, we went to strategy 2 (Table 2).
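The logic of the rule sequence can be sketched as follows. Only rule 1 and rule 6 are written out, because these are the two rules defined in the running text; the intermediate rules would be added to the list in the same way, following Table 3. The field names and the return values are assumptions made for this illustration.

```python
THRESHOLD = 0.8  # Jaro–Winkler threshold used in the rules


def rule_1(comparison):
    """Most conservative rule: name, surname and address similar, age identical."""
    return (comparison["jaro_name"] >= THRESHOLD
            and comparison["jaro_surname"] >= THRESHOLD
            and comparison["jaro_address"] >= THRESHOLD
            and comparison["age_equal"])


def rule_6(comparison):
    """Minimum criteria: surname similar and age identical."""
    return comparison["jaro_surname"] >= THRESHOLD and comparison["age_equal"]


# Ordered from most to least reliable; rules 2 to 5 are omitted in this sketch.
RULES = [(1, rule_1), (6, rule_6)]


def rule_number(comparison):
    """Return the number of the first rule the pair satisfies, or None."""
    for number, rule in RULES:
        if rule(comparison):
            return number
    return None


def choose_link(comparisons):
    """Pick the candidate with the lowest rule number; ties go to manual review."""
    ranked = [(rule_number(c), c["certificate_id"]) for c in comparisons]
    ranked = [(number, cert) for number, cert in ranked if number is not None]
    if not ranked:
        return None
    best = min(number for number, _ in ranked)
    winners = [cert for number, cert in ranked if number == best]
    if len(winners) > 1:
        return ("manual review", best)
    return (winners[0], best)


candidates = [
    {"certificate_id": "c1", "jaro_name": 0.95, "jaro_surname": 0.90,
     "jaro_address": 0.85, "age_equal": True},
    {"certificate_id": "c2", "jaro_name": 0.40, "jaro_surname": 0.90,
     "jaro_address": 0.10, "age_equal": True},
]
link = choose_link(candidates)
```

Here both candidates satisfy the minimum criteria, but only the first also satisfies rule 1, so it is preferred. Had both satisfied the same rule, the case would have been passed on for manual checking.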

Strategy 2: we manually checked all subsets of the same death date and the same gender, which had not already been paired by strategy 1. We assigned the link if there was no other competing candidate with as low a rule number. Using this strategy, we matched an additional 7.9 per cent of the 11,073 burial records to a death certificate (Table 2).

The application of strategies 3 and 4 accounted for a final 2.9 per cent of our matches. They replicated the steps taken in strategy 1 for those burials still unmatched after applying strategies 1 and 2 (Table 2). We did this by applying slightly looser blocking criteria, in order to find potential matches not previously identified because of inconsistencies in gender and date reporting. If gender or day of death were misreported, strategy 1 would not have been able to compare the true potential records.
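The effect of loosening the blocking criteria can be illustrated with a small sketch. It assumes, purely for the example, that dates are stored as `YYYY-MM-DD` strings and that records are dictionaries with `gender` and `death_date` fields. Falling into the same block only makes a pair eligible for comparison; the matching rules still have to be satisfied.

```python
def key_strategy_1(record):
    """Strategy 1: same gender and same full date of death."""
    return (record["gender"], record["death_date"])


def key_strategy_3(record):
    """Strategy 3: same full date of death, gender ignored."""
    return (record["death_date"],)


def key_strategy_4(record):
    """Strategy 4: same gender and same year and month of death, day ignored."""
    return (record["gender"], record["death_date"][:7])  # 'YYYY-MM'


def first_strategy_sharing_block(burial, certificate):
    """Return the first strategy whose blocking key groups the two records together."""
    strategies = ((1, key_strategy_1), (3, key_strategy_3), (4, key_strategy_4))
    for number, key in strategies:
        if key(burial) == key(certificate):
            return number
    return None


burial = {"gender": "F", "death_date": "1880-03-14"}
wrong_gender = {"gender": "M", "death_date": "1880-03-14"}
wrong_day = {"gender": "F", "death_date": "1880-03-15"}
```

In this example a certificate with a misreported gender is never compared with the burial record under strategy 1, but becomes a candidate under strategy 3. A certificate with a misreported day of death becomes a candidate only under strategy 4.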

The rules to identify likely pairs within a given blocking set were strict enough to capture most real matches. But they were not overly conservative, as we saw when we manually reviewed the unmatched subsets for strategy 2. On the one hand, the blocking clusters reduced the numbers of individuals in each source group to a maximum of a dozen, so the possibility of having very similar candidates was reduced. On the other hand, by first assigning the more clear-cut cases, we could accept lower levels of matching for the remaining cases where we would still expect to see matches. This was confirmed when we reviewed the unmatched records in strategy 2: when we manually examined burials without a corresponding death certificate for a given date and gender, none of the individuals in these same-gender/same-date-of-death groups had even remotely similar names or characteristics to those unmatched individuals. So the 0.8 Jaro–Winkler threshold was relatively precise in identifying versions of a similar person. Furthermore, that precision did not come at the expense of missing many pairs wrongly discarded by very strict rules.

We are therefore relatively sure that we have found nearly all the death certificates where there is a match to be found in the two files. There is, however, one exception: unbaptised infants and stillbirths. There are often so few personal characteristics provided for them in either or both of the two sources, that it is almost impossible to assign a link where there are two or more stillbirths/unbaptised infants registered with the same day of death.

Burial Records with and without a Matching Death Certificate

The linkage showed that 87 per cent of the burial records matched a corresponding death certificate and 13 per cent did not. We sought to discover what characterised the individuals in the 13 per cent group compared to those in the 87 per cent group. In the analysis that follows, we first look at the deceased’s place of residence, then age, gender and civil status.

We expected the match rate to be lower for individuals who were not resident in Copenhagen while living, because another health district was responsible for archiving their death certificate, as mentioned before. Therefore, we distributed the burial records according to place of residence. We found that 91 per cent of individuals with residence within the municipality of Copenhagen had a matching death certificate. This means that nine per cent of the death certificates are missing that should have been archived under the Copenhagen City Health District (Table 4).

Table 4.

Linkage rates for burial record to death certificate according to place of residence while living

| Residence of deceased | Number of burial records | Certificate not found | Certificate found | Percentage matched with certificate |
| --- | --- | --- | --- | --- |
| All | 11,073 | 1,457 | 9,616 | 86.8 |
| Municipality of Copenhagen | 10,378 | 926 | 9,452 | 91.1 |
| - Christianshavn | 665 | 46 | 619 | 93.1 |
| - Nørrebro | 2,759 | 209 | 2,550 | 92.4 |
| - Østerbro | 255 | 20 | 235 | 92.2 |
| - Indre By | 5,130 | 415 | 4,715 | 91.9 |
| - Vesterbro | 807 | 97 | 710 | 88.0 |
| - Hospital as residence | 399 | 31 | 368 | 92.2 |
| - Institution as residence | 363 | 108 | 255 | 70.2 |
| Neighbouring municipalities | 272 | 221 | 51 | 18.8 |
| - Frederiksberg | 133 | 112 | 21 | 15.8 |
| - Amager | 66 | 59 | 7 | 10.6 |
| - Bispebjerg | 31 | 23 | 8 | 25.8 |
| - Other neighbouring districts | 42 | 27 | 15 | 35.7 |
| Residence further away | 290 | 235 | 55 | 19.0 |
| Unknown residence | 133 | 75 | 58 | 43.6 |

A total of 695 burial records (6 per cent) stated the residence of the deceased, while living, to be in neighbouring districts, further away or unknown. We would not expect to find their death certificates archived under Copenhagen. However, we found approximately a fifth of their certificates archived there anyway, with only some of these marked as duplicates. That means that we may also expect to find certificates for people with residence in Copenhagen archived under one of the 208 other Danish health districts.

We also looked for differences in linkage rates according to age, gender and civil status. The only systematic difference found was for infants under 1 year of age, including stillbirths. We tested this in a multivariate model of the probability of a burial record having a corresponding death certificate: we found that the differences according to gender and civil status were not statistically significant, while there was a 20 per cent lower probability of finding death certificates for infants under 1 year of age when compared with the age range 15–60 years. This confirms our impression from handling the burial records manually, that the names of very small infants were, more often than was the case in the other age groups, too incomplete to allow a match with a death certificate.

In summary, we could not match burial records to death certificates for 9 per cent of deceased individuals who were resident in Copenhagen while living. The only systematic bias, when looking for age, gender and civil status among those missing, is that infants of less than 1 year of age were more often than others impossible to match. It must, however, be expected that individuals who lived in Copenhagen and died in Copenhagen, but were buried outside Copenhagen, had, more often than others, their death certificate incorrectly archived under the health district in which they were buried. If this is the case, there may be a bias towards people who had recently migrated to Copenhagen. It is possible to test this, but we have not had the resources to do so in this study.

Personal Information Compared

The matching of a burial record with the death certificate of the same person allows us to compare how well the two sources capture information for the same individual (Table 5).

Table 5.

Number of cases with missing information of the matched sample of burial records and death certificates

| Variable missing | Number of burial records | Number of death certificates | Burial records with this variable missing (%) | Death certificates with this variable missing (%) |
| --- | --- | --- | --- | --- |
| Age | 536 | 279 | 5.6 | 2.9 |
| Civil status | 581 | 2,161 | 6.0 | 22.5 |
| Gender | 3 | 22 | 0.0 | 0.2 |
| Occupation | 1,752 | 2,999 | 18.2 | 31.2 |
| Address | 91 | 193 | 0.9 | 2.0 |
| All burial records with matching certificate | 9,616 | 9,616 | | |

We see that both sources capture age, gender and address quite well, age being a little better in the death certificates. Neither is very successful at capturing occupation, but the burial records record occupation more often than the death certificates. Civil status data is also captured better in the burial records.

It is, however, important to bear in mind that the unmatched burial records most likely miss more variables than the matched, because if too many variables are missing, links are impossible to establish. Still, we see that the Copenhagen Burial Register provides more complete personal information than do the death certificates, but the two source groups together provide fuller personal information than one source group on its own.

Classification Coding of Causes of Death

The standard method of handling huge volumes of individual-level information in large-scale population databases is to carry out the initial systematising analysis by extracting all unique expressions (words or word strings) of each attribute, and then coding these unique expressions with classification codes in new columns in classification tables. All classification is thus done on unique expressions, not on the individual persons. The database containing the persons is afterwards automatically coded by adding columns based on the master coding in the classification tables. This is a highly time-efficient method: although the number of individuals in such databases can amount to millions, the number of unique expressions of occupations or causes of death, for example, will most often be in the hundreds or thousands, thus making it feasible to manually access these one by one.

Because all classification and interpretation is allocated into additional columns in the database, this method also keeps all original information about a given person in its original form, including original spelling errors. It is the added columns based on the classification tables which, in a transparent and reversible way, bring standard spelling and classification codes into the database.

In our study of Copenhagen for the years 1880 and 1881, we had c.11,100 individual deceased persons. But there were only 6,000 unique expressions of causes of death from the death certificates and 1,545 from the burial records. We then merged the 6,000 and the 1,545 expressions into an alphabetically sorted list in a table to manually find different spellings of the same word or word string. In a new column, we applied a single standard spelling of similar words (e.g. ‘bronchitis’ for ‘bronkitis’, and for all other spelling variations of bronchitis including misspellings) and a standard placing of all descriptive words of the condition after the cause (so ‘acut diarhea’ in the added column became ‘diarrhoea acut’). Such lists are often called ‘synonym lists’ in data analysis, but ours may more accurately be called ‘standard spelling lists’, as we chose to keep synonyms separate as unique expressions. Similarly, we kept both Latin and Danish words for the same condition, because the language used can provide useful information about both the deceased and the physician. For example, illegitimate infants in this study more often had a Danish-language cause of death than those infants born within wedlock, whose death causes were more often given in Latin. This process revealed a new total of 1,123 unique standardly spelled expressions of causes of death.
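The workflow described above, coding unique expressions once and then propagating the result to every person, can be sketched as follows. The column names are hypothetical and the codes are placeholders rather than real classification codes; only the two spelling examples are taken from the text.

```python
def unique_expressions(records, field="cause"):
    """Return the sorted list of distinct raw expressions found in one attribute."""
    return sorted({record[field] for record in records})


def apply_classification(records, classification_table, field="cause"):
    """Add standard spelling and code columns; the raw expression is left untouched."""
    coded = []
    for record in records:
        entry = classification_table.get(record[field], {})
        coded.append({
            **record,
            "standard_spelling": entry.get("standard_spelling", ""),
            "code": entry.get("code", ""),
        })
    return coded


records = [
    {"person_id": 1, "cause": "bronkitis"},
    {"person_id": 2, "cause": "acut diarhea"},
    {"person_id": 3, "cause": "bronkitis"},
]

# Filled in by hand, once per unique expression. The codes are placeholders.
classification_table = {
    "bronkitis": {"standard_spelling": "bronchitis", "code": "CODE-A"},
    "acut diarhea": {"standard_spelling": "diarrhoea acut", "code": "CODE-B"},
}

expressions_to_code = unique_expressions(records)
coded_records = apply_classification(records, classification_table)
```

Three persons yield only two expressions to be assessed manually, and the original spelling remains available in the `cause` column next to the added columns.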

The next step was to code these 1,123 standardly spelled expressions in two more additional columns, in our two chosen cause of death coding systems: the ICD-10 (the main 22 chapters) and the BeRaSaRo-classification. Our student interns carried out the first coding. They were highly motivated and trained hard, but they were not confident that they had the required knowledge of historical diagnostics. As a result, Castenbrandt and Løkke, both historians specialising in late nineteenth-century medicine and health in Scandinavia, finished the work.

We found that this method, in extracting all unique expressions of causes of death from the database, establishing a standard spelling list and applying different coding systems to it, has the potential for high-quality historical analysis for large-scale populations. Nevertheless, the foundations have to be laid by specialist medical historians, prepared to invest the necessary time to actually do the work involved in defining standard spellings and applying the classification coding(s) to the lists.

While coding with the BeRaSaRo-classification, we found that the early Bertillon international classification, which formed part of it, was very close to the late nineteenth century Danish national classification system. This is the one we had abstained from using, as mentioned before, because we wanted a coding system better suited to international comparison. Even if the Danish government did not acknowledge the emerging international cooperation around cause of death classifications at the time, it was clearly influenced by the Bertillon classification. So we found that because Bertillon’s classification was set up for international cooperation in several languages, and as it was influential beyond the nations affiliated to it, the system applied in its original 1893 form was very accommodating for the Danish data. It seems, therefore, that Bertillon’s ICD-1893 might also facilitate international comparison of data from the late nineteenth century by medical historians and historical demographers of today.

Nevertheless, the infectious/non-infectious dichotomy built into the BeRaSaRo-classification proved to cause unforeseen difficulties. The problem was that what was state of the art in infectious medicine in 2017, when we did the coding, was not exactly the same as it was in 2003, when the BeRaSaRo-classification was conceived. It was therefore not evident that the late nineteenth-century diagnoses should in 2017 be categorised as infectious/non-infectious, in the way that had seemed evident in 2003. For example, ‘stomach ulcer’ has acquired an element of infection in its aetiology since 2003. We adapted the classification to 2017-infectious medicine and finished the coding, but we learnt that we could avoid such issues by keeping each categorisation rationale in separate classification system codes placed in separate columns. In this case, one column for Bertillon’s ICD-1893 classification and one for the 2003-infectious/non-infectious classification. In this way, we could just have added a third column with a 2017 infectious/non-infectious classification.
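The idea of keeping each rationale in its own column can be sketched as follows. The column names and the cell values are illustrative assumptions only, chosen to mirror the stomach ulcer example; they are not taken from the actual classification tables.

```python
def add_coding_column(classification_table, column, coding):
    """Return a copy of the table with one extra coding column added."""
    return {
        expression: {**codes, column: coding.get(expression, "")}
        for expression, codes in classification_table.items()
    }


# One row per unique expression, one column per classification rationale.
table = {
    "stomach ulcer": {
        "icd_1893": "PLACEHOLDER-1",          # placeholder, not a real code
        "infectious_2003": "non-infectious",  # illustrative value
    },
}

# A later rationale is added as a new column; earlier columns are not overwritten.
table_2017 = add_coding_column(
    table, "infectious_2017", {"stomach ulcer": "infection involved"}
)
```

Because the earlier columns stay as they were, an analysis based on the 2003 rationale can still be reproduced after the 2017 column has been added.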

Causes of Death Compared

As mentioned above, a death certificate issued by a physician was submitted to the burial authorities for a burial to be scheduled. So it is to be expected that the cause of death, at least to some extent, was copied by the clerks of the Copenhagen Burial Office from the death certificate to the Burial Register. But as these two sources were kept for different reasons, we thought there may be differences between them. We therefore made a comparison to explore differences and similarities in the causes of death for all individuals for whom we were able to match a burial record to a death certificate.

Generally, the burial records mirror the information in the death certificates, but they tend to be less specific. Two or more causes of death per record are less frequent in the burial records. Only 3 per cent of the deaths in the burial records have two expressions of causes of death, compared to 19 per cent in the death certificates. For example, a 58-year-old man was recorded as having died from tumor cerebri as well as tuberculosis universal on his death certificate, but only tumor cerebri in his burial record. Another 58-year-old man was registered as having died from hanging (hængning), which was then clarified with two other causes: suicide (suicidum) and insanity (sindssygdom); he was registered only with ‘suicidum’ in the burial record.

Single causes of death also tend to be less specific in the burial records. They were generally written with only one word (72 per cent of the cases), compared to just 33 per cent of death certificates featuring one-word causes.

To identify different ways in which the burial records are less specific than the death certificates, we looked in more detail into the causes we coded in ICD-10 as neoplasms (cancers and tumours). There are 583 cases of neoplasms in the burial records and 497 of these are linked to death certificates. In these 497 comparable cases, 215 causes of death were written only as cancer (kræft) or tumour (tumor) in the burial record. However, most of these cases (83 per cent) have a clarification in the death certificate, referring to the type of cancer and its location, such as cancer abdominal. For researchers interested in tumours, it may thus be useful to transcribe the death certificates in spite of the burial records having already been transcribed.

To find out the frequency of crucial differences in the causes of death in the two sources, we examined those who had died aged 60 or older. A total of 1,971 of the burial records for this group had a matching death certificate. Of these, 1,609 (82 per cent) had exactly the same first cause of death expression in both sources. A further 100 had a simpler description of a more complex diagnosis: the burial record had, for example, bronchitis instead of bronchitis capillary. A total of 87 per cent thus had the same, or a simpler version of the same, cause of death in both sources.

Some of the differences that we see in the remaining 13 per cent of the cases were caused by the use of a synonym (Danish instead of Latin or vernacular language instead of medical language), such as ‘hjertesygdom’ (heart disease) for ‘morbus cordis’. We therefore checked in how many cases our coding, with the ICD-10 main chapter code, differed between the burial records and the death certificates of the same individual. We found that in 151 cases of the 1,971 elderly people (7.6 per cent), the main ICD-10 chapter was different. Among these, 53 were in another chapter, because the burial record used only the second cause of death recorded on the death certificate. The remaining 98 death causes (5 per cent) of the elderly have, however, real differences. Among these are cases with different rationales. The term ‘old age’ (alderdom) was more commonly used as a cause of death in the burial records than in the death certificates. If old age was stated in the death certificate, it was often registered as ‘weakness in old age’ rather than just ‘old age’. There are also examples where the cause of death in the death certificate was changed from a specific illness as cause to ‘old age’ in the burial record. For example, a 76-year-old woman who died of cancer, according to her death certificate, was registered with the cause ‘old age’ in the burial record, and an 88-year-old woman’s death cause changed from ‘diarrhoea’ to ‘old age’.

Among infants under one year of age, we found cases with different rationales in the burial records compared to their death certificates. In the burial records, 397 cases give stillbirth as the cause of death. Of these, 298 could be traced to their death certificates. Of these 298 cases, as many as 197 had a more descriptive cause of death in the death certificate, such as ‘died during birth’, ‘premature birth’ or ‘prolonged birth’. This may be explained by the different purposes of the two types of registration: for the burial authorities, it was important to know whether an infant was a stillbirth or a live birth, because different instructions applied as to how the burial ritual should be performed for each.35 The cause of death in the death certificates was meant to monitor birth assistance. For researchers of today interested in the complications of birth, it will be crucial to have access to the causes of death given in the death certificates and not just from the burial records.

Suicide as a cause of death seems to have been deliberately removed in some cases, mostly from the burial records but not exclusively so. Overall, 57 suicide cases were found in the two sources. Most of these appear as suicide in the death certificates, but there are eight cases where suicide was noted in the burial records with no indication of suicide in the corresponding death certificates. However, as many as 39 of the cases noted as suicide in the death certificates are not noted as such in the burial records. In these cases the term ‘suicide’ is generally not mentioned; instead, the way the person died is noted, such as ‘hanging’ or ‘drowned’. For example, the death of a 39-year-old female cigar maker was listed on her death certificate as from ‘suicide, likely poisoning’, but became ‘poisoning’ in the burial record. Another case, with suicide as the only cause on the death certificate, was that of a 53-year-old man who worked as a delivery worker. His death was noted as an ‘unfortunate event’ in the burial record. This might indicate that it continued to be preferable not to have one’s suicide recorded in the burial records, even if, as of 1866, suicide was no longer a criminal offence.36 Perhaps, then, family and relatives influenced what elements of the cause of death from the death certificate were registered in the burial record.

In summary, while personal information given for individuals is more detailed in the burial records, the causes of death are set out much more fully in the death certificates. For more detailed analyses of causes of death, then, it will be worth analysing both the burial records and the death certificates in tandem. However, as the transcription of death certificates is in its very early days, the Copenhagen Burial Register is a valuable and trustworthy starting point to generate relevant samples for transcription for specific research projects.

ICD-10 and BeRaSaRo-Classification Compared

To estimate the proportion of late nineteenth century deaths registered with causes of death no longer perceived to be possible causes of death in modern health sciences, we coded the same burial records with both the ICD-10 and the BeRaSaRo-classification. We chose the burial records for this, as they tended to have a larger proportion of older rationales than the death certificates. Thus, the burial records should give us the maximum magnitude of this challenge. We distributed the deaths according to age, since Løkke had earlier studied this phenomenon only in infancy and old age.

Table 6 shows the result first for ICD-10, where a total of 83.4 per cent of the deaths can be placed meaningfully in a main chapter, but 16.6 per cent of the cases bear causes of death which are ill-defined according to ICD-10 (chapter 18), a chapter which includes all kinds of ill-defined causes of death. A share of 16.6 per cent is seemingly not prohibitively large. But when the same cases are coded with the BeRaSaRo-classification, and the five most frequent causes of death with older rationales are shown separately, just 4.2 per cent remain ill-defined.

Table 6.

ICD-10 coding compared with BeRaSaRo coding of all Copenhagen burial records 1880 and 1881 (n = 11,073)

Number coded in this category per 100 burial records
| Age in years at death | <1 | 1–14 | 15–60 | 60+ | Total |
| --- | --- | --- | --- | --- | --- |
| Coded by ICD-10 coding system | | | | | |
| Ill-defined = chapter 18 | 34.2 | 7.9 | 5.1 | 12.6 | 16.6 |
| All other main chapters | 65.8 | 92.1 | 94.9 | 87.4 | 83.4 |
| Sum | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| Coded by the BeRaSaRo-classification | | | | | |
| Convulsions | 10.9 | 2.9 | 0.3 | 0.0 | 3.9 |
| Atrophia infantilis/atrophia | 16.1 | 2.1 | 0.1 | 0.0 | 5.3 |
| Old age | 0.0 | 0.0 | 0.0 | 7.8 | 1.7 |
| Sudden death with unknown cause | 2.4 | 0.4 | 1.8 | 2.5 | 1.8 |
| Marasmus senilis | 0.0 | 0.0 | 0.0 | 0.8 | 0.2 |
| Sum of these | 29.4 | 5.4 | 2.2 | 11.1 | 12.9 |
| Ill-defined | 6.0 | 2.4 | 2.3 | 3.0 | 4.2 |
| All other causes of death | 64.6 | 92.2 | 95.5 | 85.9 | 82.9 |
| Sum | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |

However, when analysed according to age, there are huge differences in the frequency of older rationales. Again using ICD-10, as much as a third of infant deaths are categorised as chapter 18: ill-defined causes. For all other age groups, the percentage of ill-defined deaths in ICD-10 is much lower. For the elderly, it is 13 per cent, for children aged 1–14 years 8 per cent, and for the age range 15–60 years just 5 per cent (Table 6). Coded with the BeRaSaRo-classification, the percentage of ill-defined deaths among infants is as little as 6 per cent, down from 34 per cent using ICD-10.

For the 1880s Copenhagen data, then, the problem of how to handle older rationales is much more critical for research in infant deaths and also, but to a lesser extent, among the elderly than for other age groups.

Discussion

The point of departure for this article was a range of recent developments in the handling of historical cause of death registration as a source for health and illnesses in the past. The animated fights of 20 years ago between proponents (mostly historical demographers) and sceptics (mostly medical historians) have subsided, as more researchers from both disciplines have performed careful source analysis into changing registration practices, changing comprehensions, changing diagnosis systems and changing illness panoramas.

Right now, however, the building of more and larger historical population databases containing individual cause of death information makes it all the more important to remember the scepticism of medical historians, as the databases provide easy access to historical cause of death data to researchers from disciplines without historical training. At the same time, building the databases gives medical historians and historical demographers the chance to cooperate in building their expert knowledge into the databases, to guide new users in fruitful ways to handle methodological challenges. The challenge remains how to do that in practice. Identifying the range of challenges involved when building such databases, and possible best practices, has been the aim of this study.

In terms of the transcription of sources, it is evident that this should be done as accurately as in other scholarly primary source publications. There are no key disagreements here, only a mass of practical issues to be faced, solved and paid for. Users of the databases must have access to the original cause of death expressions of the historical primary sources and there must be an accurate labelling as to which are original expressions and which are later interpretations. This study has shown that large transcription projects have the potential to be more cost effective in delivering high quality transcriptions than smaller projects can ever be.

The study also looked into the possible gains of transcribing both the death certificate and burial record for each given individual. The result was that there is consistency, but not complete equivalence, between the causes of death given in the two sources. The death certificates tend to describe the causes of death more fully, while the personal information is more complete in the burial records. Therefore, using both simultaneously offers richer data for research.

The main challenge, however, is the choice of classification system for historical cause of death information. Here no agreement on best practice exists: not in medical history, not in historical demography and not in today’s health sciences, where the problem presents itself with every new ICD version. In our study, we coded the same cause of death information with both the ICD-10 and the BeRaSaRo-classification, in order to understand the actual differences between the two systems in the interpretation of historical causes of death with rationales in older medical science.

Our coding with the BeRaSaRo-classification taught us that even an accurately described, coherent coding system, tailor-made for handling nineteenth-century causes of death, is not robust enough for long-lasting use if it combines both nineteenth-century and present-day medical rationales in every single code. This is because present-day medical science very soon becomes obsolete, as the field develops so quickly. Therefore, if we want to apply a classification referring to medical rationales from different time periods, it must be done in separate columns in the database, as in the double coding suggested by WHO in updating ICD-9 to ICD-10. However, we also learned that the Spanish version of the Bertillon international classification, often called ICD-1893, was very close to the Danish classification system used in the 1880s, so it was unexpectedly easy to make synonym lists simultaneously in Latin, Danish, Spanish and English.

Our coding showed that the BeRaSaRo-classification was much better at keeping causes of death with older rationales out of the ill-defined category than ICD-10 was. But it also showed that for our 1880s Copenhagen sample, the frequency of older rationales is higher for infant deaths and for the elderly than for other age groups. It must, however, be expected that older rationales appear more frequently for the other age groups in older data and in Danish rural districts, where death certificates were issued by authorised laymen.

The main result of our pilot coding was that a double coding with both the main chapters of ICD-10 and the original Bertillon ICD-1893 classification is a promising way forward. Longitudinal databases may want to double-, triple- or multi-code with a range of older versions of the ICD system. Medical science and medical statisticians since 1893 have built ICD-0, 1, 2, 3 etc., complete with translations to many languages. These historical coding systems are thus accessible at the national libraries in most countries and therefore easy to share internationally among medical historians and historical demographers of today.

The double/multiple coding approach also provides a solution to the problems with ICD-11 encountered by historical population databases coded using ICD-10: they do not, mercifully, have to change their coding but “just” need to add the ICD-11 coding in a new separate column while keeping the ICD-10 coding intact.
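The separate-column principle can be illustrated with a minimal sketch in Python, using the standard-library `sqlite3` module. It is not the Link-Lives schema: the table name `death`, the column names, the helper `add_classification` and all bracketed values are hypothetical placeholders chosen for illustration. The sketch only shows that a further classification can be added as a new column while the original expression and the earlier codings stay untouched.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table with one column per classification (hypothetical layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE death (
            death_id INTEGER PRIMARY KEY,
            original_expression TEXT NOT NULL,  -- verbatim transcription of the source
            code_icd1893 TEXT,                  -- interpretation layer 1
            code_icd10_chapter TEXT             -- interpretation layer 2
        )
        """
    )
    # Placeholder values only; no real cause of death or code is implied.
    conn.execute(
        "INSERT INTO death VALUES (?, ?, ?, ?)",
        (1, "<transcribed expression>", "<ICD-1893 code>", "<ICD-10 chapter>"),
    )
    return conn


def add_classification(conn, column, codes):
    """Add a new coding column and fill it, leaving existing columns unchanged."""
    if not column.isidentifier():
        raise ValueError(f"unsafe column name: {column!r}")
    conn.execute(f"ALTER TABLE death ADD COLUMN {column} TEXT")
    conn.executemany(
        f"UPDATE death SET {column} = ? WHERE death_id = ?",
        [(code, death_id) for death_id, code in codes.items()],
    )


conn = build_demo_db()
query_old = (
    "SELECT original_expression, code_icd1893, code_icd10_chapter "
    "FROM death WHERE death_id = 1"
)
before = conn.execute(query_old).fetchone()

# A later classification is added as its own column.
add_classification(conn, "code_icd11", {1: "<ICD-11 code>"})

after = conn.execute(query_old).fetchone()
new_code = conn.execute(
    "SELECT code_icd11 FROM death WHERE death_id = 1"
).fetchone()[0]
```

In this layout each classification is an independent layer of interpretation on top of the transcribed expression, so any of them can be queried separately or together.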

This coding strategy will thus be robust enough to withstand future changes in medical science rationales. It will also permit international cooperation among those medical historians who need an alternative to navigating multiple project-specific coding systems, something that has become a necessary, time-consuming and cost-inefficient norm over the past few decades.

Conclusion

Large-scale historical life-course population databases, with individual-level cause of death information, offer the potential to allow longitudinal studies of health and illnesses, of interest not only for medical historians and historical demographers, but also for health and social scientists who study health and illnesses across multiple generations.

The use of ICD-10 to classify historical causes of death in such databases, however, has inspired fear, not least among the article’s authors, that the hard-learned methodological lessons in handling historical causes of death will be lost, and thus research based on such databases will be less trustworthy than state-of-the-art research in the field. The risk is that once classified using ICD-10, the transformation of the primary sources into ‘data’ will strip them of the very qualities required for proper historical analysis.

The main result of our study is that instead of hindering proper historical analysis, applying standard data analysis methods actually offers new possibilities to handle the complexity of changing registration practices, changing comprehensions, changing diagnosis systems and changing illness panoramas. Databases offer tools to maintain, document and keep track both of the original historical expressions and the many layers of interpretations and classifications. ICD-10 can therefore be applied without losing the possibility of adding further classification systems, to be used separately or together. Accordingly, the ICD-system, in its full historical existence (ICD-1893, ICD-1, ICD-2, etc.), can be used as a novel method to systematically handle changing diagnoses for causes of death over time. All versions of the ICD-system thus together provide a meta-classification system for large-scale historical individual-level population databases. In this way, medical historical knowledge will also contribute to extending the time frame backwards for scientific disciplines traditionally not interested in history, but in the interaction of human society and human biology.

Acknowledgements

The authors want to thank our colleagues in the SHIP network (Studying the History of Health in Port Cities) for fruitful discussions of earlier versions of this article. Extra warm thanks go to Isabelle Devos, Angelique Janssens and the anonymous referees for careful reading and valuable comments.

Footnotes

1

Among them are HISDI-Mad for Madrid (CSIC); Norwegian Historical Population Register (University of Tromsø); the Pop-Link and Popum databases (CEDAR Umeå University); Scania Economic and Demographic Database (Lund University); Digitising Scotland (Edinburgh University); Skye and Kilmarnock (University of Cambridge); Tasmania (Monash University); Amsterdam Causes of Death (Radboud University); Historical Sample of the Netherlands (Institute of Social History, IISG); Alghero Database (Università degli Studi di Sassari).

2

Among the historical population databases working with ICD-10 are CEDAR, Digitising Scotland, Connecticut Valley and Tasmania Dataset.

3

Josep Bernabeu-Mestre et al., ‘El Análisis Histórico De La Mortalidad Por Causas: Problemas Y Soluciones’, Revista de Demografía Histórica XXI, 2003, I, 167–93.

4

Among them are Irvine Loudon, The Tragedy of Childbed Fever (Oxford; New York: Oxford University Press, 2000); Robert Woods, Death before Birth: Fetal Health and Mortality in Historical Perspective (Oxford: Oxford University Press, 2009); Alice Reid et al., ‘‘A Confession of Ignorance’: Deaths from Old Age and Deciphering Cause-of-Death Statistics in Scotland, 1855–1949’, The History of the Family, 2015, 20, 320–44; Phillip Roberts, ‘El Significado De “Parálisis General Del Demente” En Victoria, Australia; 1886–1906’, Asclepio, 2014, 66, 1–12; Anne Løkke, Døden i Barndommen. Spædbarnsdødelighed Og Moderniseringsprocesser i Danmark 1800 til 1920 (København: Gyldendal, 1998); Anne Løkke, ‘Infancy and Old Age as Causes of Death’, in Jørgen Povlsen, Signe Mellemgaard and Ning De Coninck-Smith, eds, Childhood and Old Age—Equals or Opposites? (Odense: Syddansk Universitetsforlag, 1999).

5

Løkke, Døden i Barndommen, 55–69.

6

Anne Katrine Kleberg Hansen, Henrik Hertz and Anne Løkke, Børneafdelingen: Syge Børn og Børnesygdomme på Rigshospitalet 1910–2010 (København: Lægeforeningens Forlag, 2010), 21–24.

7

International Classification of Diseases (ICD) Information Sheet, http://www.who.int/classifications/icd/factsheet/en/ (accessed 12 January 2021).

8

Ibid.

9

Ibid., ICD-10 Version: 2016, https://icd.who.int/browse10/2016/en#/ (accessed 12 January 2021).

10

International Statistical Classification of Diseases https://www.who.int/standards/classifications/classification-of-diseases (accessed 12 January 2021).

11

http://www.who.int/classifications/help/icdfaq/en/ (accessed 16 December 2019).

12

Døds-Aarsagerne i Staden Kjøbenhavn, de øvrige Kjøbstæder og de 6 saakaldte Handelspladser i Fem-Aaret 1880-84 Statistisk Tabelværk, Nr.4, Rk. 4, Litra A.

13

Bernabeu-Mestre et al., ‘El Análisis Histórico de la Mortalidad por Causas’.

14

Børre Johansson, Den Danske Sygdoms- og Dødsaarsagsstatistik: Med Et Afsnit Om Pneumonistatistik (København: Munksgaard, 1946), 54–56. Death certificates without cause of death were implemented in Copenhagen in 1829. The 1832 law included all of Denmark, but only in Copenhagen was it mandatory that a physician issue the certificate and provide cause of death. This was due to the very few physicians residing outside Copenhagen. In provincial towns with a residing physician, the rules for Copenhagen applied. In rural districts and towns without a physician, two laymen were authorised to issue the certificates.

15

Ibid., 55.

16

Communication by Sundhedscollegiet (Royal Board of Health) in Adressecomptoirets Efterretninger April 1836.

17

Johansson, Den Danske Sygdoms- og Dødsaarsagsstatistik, 163–164; Axel Holck, Dansk Statistiks Historie 1800–1850 Saerlig Med Hensyn Til Den Officielle Statistiks Udvikling (København: Statens Statistiske Bureau, 1901), 181.

18

Johansson, Den Danske Sygdoms- og Dødsaarsagsstatistik, 177.

19

Rigsarkivet, Daisy søg dødsattester: See the 208 archival references here: https://www.sa.dk/daisy/arkivskaber_eller_arkivserie_liste?d=1&e=2016&c=d%C3%B8dsattester

20

Johansson, Den Danske Sygdoms- og Dødsaarsagsstatistik, 66.

22

Holck, Dansk Statistiks Historie, 161.

23

Johansson, Den Danske Sygdoms- og Dødsaarsagsstatistik, 36.

24

Københavns Stadsarkiv search “begravelser”. Visited 12 January 2021 here https://www.kbharkiv.dk/sog-i-arkivet/kilder-pa-nettet/begravelser/assistens-kirkegard-og-begravelsesprotokoller.

25

Københavns Stadsarkiv, archival reference: https://kbharkiv.dk/permalink/post/1-39702

26

Johansson, E. and Åkerman, S., ‘Faktaunderlag för Forskning. Planering av en Demografisk Databas’, Historisk Tidskrift, 1973, 3, 406–14, 406.

28

Information supplied by archivist Signe Trolle Gronemann 2020.

29

Københavns Stadsarkiv search “søg i indtastede kilder”. Visited here 12 January 2021 https://www.kbharkiv.dk/sog-i-arkivet/sog-i-arkivet.

30

Ibid.

32

Rigsarkivet, Sundhedsstyrelsen, Dødsattester, København (1840–1980).

33

Københavns Stadsarkiv search “Begravelser i hele København 1861-1942”, visited 12 January 2021 https://www.kbharkiv.dk/sog-i-arkivet/kilder-pa-nettet/begravelser-i-hele-kobenhavn-fra-1861-og-frem.

34

William E. Winkler, ‘Overview of Record Linkage and Current Research Directions’, in Research Report Series, (Statistics #2006-2) (Washington, DC 20233: Statistical Research Division, U.S. Census Bureau, 2006).

35

Anne Løkke, ‘Statistical, Legal, Religious and Medical Definitions of Stillborn Infants in Denmark 1683–2012’, in Gaëlle Clavandier et al., eds, Morts Avant De Naître. La Mort Périnatale (Dead before Being Born. About Perinatal Death), Collection "Perspectives Historiques" (Tours, France: Academic Press François-Rabelais (University of Tours), 2018), 83–97, 92–93.

36

Emil Engelbrektsen, “At Forebygge den Overhaandstagende Hang til Selvmord”. Den juridiske behandling af selvmord i Danmark 1683–1866 (Unpublished Master’s thesis, University of Copenhagen, 2017).

Funding

This work was supported by Slots-og Kulturstyrelsen, Kulturministeriet [grant number FPK.2018-0018]; Innovation Fund Denmark, Grand Solutions [grant number 8088-00034A]; and Carlsbergfondet, ‘Semper Ardens’ Research Project [grant number CF18-1116].


Articles from Social History of Medicine are provided here courtesy of Oxford University Press
