GigaByte. 2022 Mar 15;2022:gigabyte45. doi: 10.46471/gigabyte.45

Sepsis-3 criteria in AmsterdamUMCdb: open-source code implementation

Tom Edinburgh 1,*, Stephen J Eglen 1, Patrick Thoral 2, Paul Elbers 2, Ari Ercole 3
PMCID: PMC9650242  PMID: 36824503

Abstract

Sepsis is a major healthcare problem with substantial mortality and a common reason for admission to the intensive care unit (ICU). For this reason, the management of sepsis is an important area of ICU research. A number of large-scale, freely-accessible ICU databases are available for observational research and the robust identification of septic patients in such data sets is crucial for research purposes, particularly for comparative studies between critical care sub-populations which may vary around the world. However, data structures are poorly standardised due to inevitable variances in clinical electronic health record system vendor and implementation as well as research database design choices. Robust and well-documented cohort selection (such as patients with sepsis) is crucial for reproducible research. In this work, we operationalise the Sepsis-3 definition on the AmsterdamUMCdb, a recently published large European ICU database, publishing open-access code for wider use by critical care researchers.

Statement of need

Sepsis is defined as life-threatening organ dysfunction resulting from a dysregulated host response to infection [1] and is a primary cause of critical illness and mortality. Early identification and treatment, which may include complex organ support on the intensive care unit (ICU), are crucial for survival [2]. Historically, it has been difficult to quantify both its incidence and mortality rate within the ICU, as it is a heterogeneous syndrome characterised by wide variation in infectious agent, infection site, treatment history and host response. Multiple organ dysfunction in septic patients follows from post-infection dysregulation in the immunology, biochemistry and physiology of the patient, and this in turn leads to morbidity and mortality. The third consensus definition of sepsis, by the Sepsis-3 Task Force [1], recommended a revised definition better aligned to this concept, explicitly incorporating mortality in order to ameliorate previous limitations and to allow for greater consistency in operationalising the definition criteria across different centres [3]. The Sepsis-3 clinical criteria are an acute increase in Sequential Organ Failure Assessment (SOFA) [4] of at least 2 points, accompanying a suspected or documented infection, with the criteria for septic shock further requiring both use of vasopressors and a lactate level >2 mmol/L.

Sepsis-3

The third consensus definition of sepsis, by the Sepsis-3 Task Force [1], recommended a revised definition to address and ameliorate previous limitations and to allow for greater consistency in operationalising the definition criteria across different centres [3]. Nevertheless, there is not unanimous agreement within the intensive care community on the utility of the new definition [5, 6]. The Sepsis-3 clinical criteria are an acute increase in Sequential Organ Failure Assessment (SOFA) [4] of at least 2 points, accompanying a suspected or documented infection, with the criteria for septic shock further requiring both use of vasopressors and a lactate level >2 mmol/L. SOFA measures the severity of organ dysfunction across the six domains of the respiratory, neurological, cardiovascular, liver, coagulation and renal systems.

Amsterdam UMC database

The most seriously ill patients with sepsis are treated on ICUs, which are perhaps the most data-dense clinical environments. Continuous multimodality monitoring, together with clinical expertise, forms the bedrock of patient care. However, the need to carefully balance the potential research benefits against patient privacy, ethics and legal concerns has limited the number of openly-accessible, de-identified large-scale databases of critical care patients to a small handful, such as Medical Information Mart for Intensive Care (MIMIC) [7] and eICU [8]. Differences in ICU demographics, resources, admission criteria and treatment strategies across different countries restrict the ability to generalise knowledge from these databases to other ICU populations, and so comparative studies are crucial. Robust and well-documented cohort definition (sepsis, in the case considered here) is crucial to such reproducible large-scale observational data research. However, the lack of a uniform standard for data collection across different vendors and implementations of electronic health records hampers the re-use of models and code, which are therefore necessarily database-specific.

Amsterdam University Medical Centers Database (AmsterdamUMCdb) [9] is a new, freely-accessible European ICU database, released in collaboration with the Society of Critical Care Medicine (SCCM) and the European Society of Intensive Care Medicine (ESICM). Compliant with both the U.S. Health Insurance Portability and Accountability Act (HIPAA) [10] and the European General Data Protection Regulation (GDPR) [11] through iterative risk-based patient de-identification, this database contains close to 1 billion data points from 20,109 critically ill patients admitted to Amsterdam UMC between 2003 and 2016. The database consists of patients admitted both to ICU and to the ‘medium care unit’ (MCU) in Amsterdam UMC. This data, comprised of seven comma-separated value tables, is combined from multiple systems in a ‘data lake’ structure linked through anonymised identifiers. AmsterdamUMCdb has already been the focus of several multidisciplinary research events, including two ESICM datathons [12] and a Neural Information Processing Systems (NeurIPS) privacy challenge [13].

Implementation

Sepsis-3 in Amsterdam UMC database

We provide a single script that computes the following: daily SOFA scores (individual components and total score) for each admission, antibiotic escalation on a daily basis, and finally sepsis/septic shock episodes (where one ‘day’ corresponds to each 24 h period after admission). Our definition of each SOFA component score follows the AmsterdamUMCdb SOFA script, and we extend this computation from just the 24 h period post admission to a longer time period spanning multiple ‘days’. This time period may be specified near the start of our script. Where no SOFA scores were available prior to ICU admission, the SOFA components were assumed to be zero, as per the Sepsis-3 recommendation. However, where at least three SOFA components were missing, no sepsis classification (positive or negative) was made for that admission/day.
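As a rough illustration of the 24 h binning described above, the following minimal sketch maps a measurement time to its ‘day’ index and keeps one score per day. The function names, the use of hours as the time unit and the choice of the maximum as the daily summary are assumptions made for illustration; they are not taken from the accompanying script.

```python
import math
from collections import defaultdict

HOURS_PER_DAY = 24


def sofa_day(hours_from_admission):
    """Return the index of the 24 h period containing a measurement.

    Day 0 covers [0, 24) h after admission; negative indices denote
    periods before admission (e.g. -1 covers [-24, 0) h).
    """
    return math.floor(hours_from_admission / HOURS_PER_DAY)


def daily_worst(scored_measurements):
    """Summarise (hours_from_admission, score) pairs as one score per day.

    Assumption (not from the paper): the daily value is the highest,
    i.e. worst, component score observed within the 24 h period.
    """
    worst = defaultdict(int)
    for hours, score in scored_measurements:
        day = sofa_day(hours)
        worst[day] = max(worst[day], score)
    return dict(worst)


if __name__ == "__main__":
    # Hypothetical component scores at -3 h, 2 h, 10 h and 30 h.
    print(daily_worst([(-3, 1), (2, 2), (10, 3), (30, 1)]))
```

Floor division keeps pre-admission measurements in negative ‘days’, which matches the negative ‘time’ values shown in the output tables.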

Following [3], we define infection by an increase in the maximum rank of any antibiotics administered (or the number of antibiotics of maximum rank), according to the classification proposed by [14], where at least one antibiotic was given intravenously. This operationalisation of Sepsis-3 with a suspected infection identified only by antibiotic escalation is reliant on clinical judgement rather than a confirmed infection, which is a limitation of this approach. In our implementation, we have explicitly identified and disregarded routine or prophylactic administration of certain antibiotics within the standard procedure in Amsterdam UMC. For example, Amsterdam UMC practises selective digestive decontamination, which involves administering cefotaxime on admission (16 doses over four days) to everyone expected to stay at least one or two days. In Amsterdam UMC, cefotaxime is exchanged for ceftriaxone upon a suspected infection within this time period, so we have disregarded cefotaxime from the antibiotic escalation within the first four days.
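The escalation rule above can be sketched as a comparison between two consecutive 24 h periods. This is a simplified illustration under stated assumptions: the integer ranks are hypothetical stand-ins for the classification in [14], the intravenous requirement is applied to the current period, a first-ever administration is treated as an escalation, and the cefotaxime exclusion during the first four days is not modelled.

```python
def is_escalation(previous, current):
    """Decide whether antibiotic therapy escalated between two periods.

    previous, current: lists of (rank, intravenous) tuples, one tuple per
    antibiotic given in the previous and current 24 h period. Ranks are
    hypothetical integers where a higher value means a higher rank.
    """
    # Require at least one intravenous antibiotic in the current period.
    if not current or not any(intravenous for _, intravenous in current):
        return False

    current_max = max(rank for rank, _ in current)
    if not previous:
        # Assumption: starting antibiotics counts as an escalation.
        return True

    previous_max = max(rank for rank, _ in previous)
    if current_max != previous_max:
        # Escalation only if the maximum rank has increased.
        return current_max > previous_max

    # Same maximum rank: compare how many antibiotics share that rank.
    n_current = sum(1 for rank, _ in current if rank == current_max)
    n_previous = sum(1 for rank, _ in previous if rank == previous_max)
    return n_current > n_previous


if __name__ == "__main__":
    print(is_escalation([(1, True)], [(2, True)]))
    print(is_escalation([(2, True)], [(1, True)]))
```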

A sepsis episode within a 24 h period was then defined as an increase in total SOFA score of at least two points between the previous and current, previous and subsequent, or current and subsequent 24 h periods, alongside an antibiotic escalation within that 24 h period. Finally, antibiotic use that accompanied admission after elective surgery was assumed to be prophylactic and as such was not classified as sepsis either on that 24 h period or the subsequent 24 h period. Any subsequent 24 h period that met the Sepsis-3 definition was however identified as a sepsis episode. Septic shock was defined as a subset of sepsis episodes with a cardiovascular SOFA score of at least 3 (i.e. using vasopressors) and a maximum lactate of at least 2 mmol/L. We assumed that vasopressors were administered if required to maintain a mean arterial pressure of at least 65 mmHg, assuming adequate fluid administration.
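The episode definition above can be sketched for a single admission as follows. This is an illustrative reading of the text, not the accompanying script: the function names are hypothetical, the input lists are assumed to hold consecutive 24 h periods, and the SOFA score before the first listed period is assumed to be zero.

```python
def sepsis_episodes(sofa_total, escalation, prophylaxis):
    """Flag sepsis episodes per 24 h period for one admission.

    sofa_total: total SOFA score for consecutive 24 h periods.
    escalation: True where antibiotics were escalated in that period.
    prophylaxis: True where the antibiotic use was deemed prophylactic.
    """
    flags = []
    for i, current in enumerate(sofa_total):
        # Assumption: baseline SOFA before the first period is zero.
        previous = sofa_total[i - 1] if i > 0 else 0
        rises = [current - previous]
        if i + 1 < len(sofa_total):
            following = sofa_total[i + 1]
            rises += [following - previous, following - current]

        # Prophylactic use rules out sepsis on that day and the next.
        blocked = prophylaxis[i] or (i > 0 and prophylaxis[i - 1])
        flags.append(bool(escalation[i]) and not blocked and max(rises) >= 2)
    return flags


def septic_shock(sepsis_episode, cardiovascular_sofa, max_lactate):
    """Septic shock: a sepsis episode with cardiovascular SOFA >= 3
    and a maximum lactate of at least 2 mmol/L."""
    return bool(sepsis_episode) and cardiovascular_sofa >= 3 and max_lactate >= 2.0


if __name__ == "__main__":
    # Rows for admissionid 2 in the example output tables (days -2, -1, 0).
    print(sepsis_episodes([0, 1, 6], [False, True, False], [False, False, False]))
```

With the example rows for admissionid 2, only the middle period is flagged: the SOFA score rises from 1 to 6 into the following period and antibiotics are escalated in that period.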

The accompanying AmsterdamUMCdb GitHub repository [15] contains descriptions of the data structure and instructions for querying the database, as well as Python scripts for extracting and checking crucial concepts, such as primary admission diagnosis and severity scores within the first 24 h. Noting that sepsis at admission is rarely documented consistently, the definition of sepsis in the AmsterdamUMCdb scripts is given by one of the following criteria:

  • sepsis at admission flagged in the admission form by the attending clinician

  • the admission diagnosis, medical or surgical, is considered a severe infection, e.g., gastrointestinal perforation, cholangitis, meningitis

  • non-prophylactic use of antibiotics after surgery

  • use of antibiotics and cultures drawn within 6 h of admission.

These criteria are generally less consistent than Sepsis-3. Among the 20,091 unique first admissions to ICU for which a diagnosis of sepsis within the 24 h period before or the 24 h period after admission could be made via the Sepsis-3 definition, the sensitivity of the current criteria relative to Sepsis-3 in the first 24 h is poor (Table 1). Furthermore, the previous script is designed with admission in mind only, and does not identify sepsis episodes or septic shock outside of the first 24 h period.

Table 1.

Confusion matrix for the current AmsterdamUMCdb sepsis definition, compared to Sepsis-3.

Unique first admissions
                 Current True   Current False
Sepsis-3 True    2114           4319
Sepsis-3 False   838            12,820

All admissions
                 Current True   Current False
Sepsis-3 True    2410           5145
Sepsis-3 False   996            14,533

There are 25 admissions in total that have insufficient data for a Sepsis-3 diagnosis. Sensitivity for first admissions only is 0.33 and specificity is 0.94. Sensitivity for all admissions is 0.32 and specificity is 0.94.
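The reported sensitivity and specificity can be recomputed from the counts in Table 1, treating Sepsis-3 as the reference and the current AmsterdamUMCdb definition as the test. The helper name below is illustrative only.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: Sepsis-3 True and current True;   fn: Sepsis-3 True and current False
    fp: Sepsis-3 False and current True;  tn: Sepsis-3 False and current False
    """
    return tp / (tp + fn), tn / (tn + fp)


# Counts taken from Table 1.
first_admissions = sensitivity_specificity(tp=2114, fn=4319, fp=838, tn=12820)
all_admissions = sensitivity_specificity(tp=2410, fn=5145, fp=996, tn=14533)

if __name__ == "__main__":
    for label, (sens, spec) in [("first admissions", first_admissions),
                                ("all admissions", all_admissions)]:
        print(f"{label}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```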

To keep our code agnostic of choice of database management system, we work directly with the underlying tables in comma-separated value (CSV) format. The outputs of this script are two additional CSV files, each of a similar size to the base ‘admissions’ table (<10 MB): one containing all SOFA scores for each admission/day, and the other containing, for each admission/day, the total SOFA score and binary indicators of antibiotic escalation, prophylaxis, infection, sepsis episodes and septic shock. These output tables are described in Tables 2 and 3, and further details about the implementation are documented within the code.

Table 2.

Example SOFA score output, containing the SOFA component and total scores.

admissionid time sofa_respiration_score sofa_coagulation_score sofa_liver_score sofa_cardiovascular_score sofa_cns_score sofa_renal_score sofa_total_score
0 −1 NaN 0 NaN NaN NaN 0 0
0 0 3 0 NaN 1 0 0 4
0 1 2 1 NaN 2 NaN 0 5
1 −1 NaN 1 NaN NaN NaN 0 1
1 0 2 0 NaN 2 0 0 4
1 1 NaN NaN NaN 0 NaN 0 0
2 −2 NaN 0 0 NaN NaN 0 0
2 −1 NaN 1 NaN NaN NaN NaN 1
2 0 2 0 NaN 4 0 0 6
3 −3 NaN 0 NaN NaN NaN 1 1
3 0 2 0 NaN 0 NaN 1 3

The column ‘time’ denotes the ‘day’ of admission, which is the 24 h period after the ICU/MCU admission. A negative ‘time’ indicates data prior to ICU admission (i.e. a partial SOFA score from when the patient was in a general ward prior to transfer to ICU). NaN indicates that this SOFA component was not measured or could not be calculated from the data available. The total SOFA score is the sum of the components, with NaN values replaced by 0, as per [1, 4].
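As a small worked example of the total score rule, the sketch below sums the six components with missing values counted as zero, using two rows from Table 2. The dictionary keys are shortened, hypothetical versions of the column names, and the missing-component count is shown only for illustration; how the accompanying script applies its missing-data rule is not reproduced here.

```python
import math

COMPONENTS = ("respiration", "coagulation", "liver",
              "cardiovascular", "cns", "renal")


def sofa_total(row):
    """Sum the six SOFA components, treating NaN (missing) as 0."""
    return sum(0 if math.isnan(row[name]) else row[name] for name in COMPONENTS)


def n_missing(row):
    """Count how many of the six components are NaN."""
    return sum(1 for name in COMPONENTS if math.isnan(row[name]))


# admissionid 0, time 0 and time 1 from Table 2.
day0 = {"respiration": 3, "coagulation": 0, "liver": math.nan,
        "cardiovascular": 1, "cns": 0, "renal": 0}
day1 = {"respiration": 2, "coagulation": 1, "liver": math.nan,
        "cardiovascular": 2, "cns": math.nan, "renal": 0}

if __name__ == "__main__":
    print(sofa_total(day0), n_missing(day0))
    print(sofa_total(day1), n_missing(day1))
```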

Table 3.

Example sepsis table output, containing also the total SOFA score, infection status, prophylactic use of antibiotics and septic shock.

admissionid time sofa_total_score antibiotic_escalation prophylaxis infection sepsis_episode septic_shock
0 −1 0 True True False False False
0 0 4 False False False False False
0 1 5 NaN False False False False
1 −1 1 True True False False False
1 0 4 False False False False False
1 1 0 NaN False False False False
2 −2 0 NaN False False False False
2 −1 1 True False True True False
2 0 6 False False False False False
3 −3 1 NaN False False False False
3 −1 0 True False True True False
3 0 3 False False False False False

The column ‘time’ denotes the ‘day’ of admission, which is the 24 h period after the ICU/MCU admission. NaN in the column ‘antibiotic_escalation’ indicates that this 24 h period occurs before any antibiotics were first administered. Antibiotic escalation in elective post-operative admissions was assumed to be a prophylactic increase in antibiotic administration, rather than an antibiotic escalation due to an infection. Post-operative patients are also likely to have a high SOFA score because of surgery. A sepsis episode is defined as antibiotic escalation accompanied by an increase in SOFA score of 2 or more. This increase in SOFA can be over the previous and current ‘days’, the current and subsequent ‘days’, or the previous and subsequent ‘days’. For the first sepsis episode in this table (admissionid 2, time −1), the increase in SOFA from 1 to 6 from day −1 to day 0 is accompanied by an antibiotic escalation on day −1.

Availability of source code and requirements

  • Project name: Sepsis-3 in AmsterdamUMCdb

  • Project home page: https://github.com/tedinburgh/sepsis3-amsterdamumcdb

  • Operating system(s): Platform independent

  • Programming language: Python 3.7.9

  • Other requirements: Python modules – numpy 1.20.3 or higher, pandas 1.2.5 or higher, re 2.2.1 or higher, amsterdamumcdb (installation via/described in [15])

  • License: MIT License

  • RRID:SCR_022042.

Acknowledgements

We would like to thank the reviewers, Chris Armit and Tom Pollard, for their useful comments on the manuscript and about the code in the open peer review for GigaByte.

Funding Statement

TE is funded by Engineering and Physical Sciences Research Council (EPSRC) National Productivity Investment Fund (NPIF) EP/S515334/1, reference 2089662. The funding body had no input in the study design, data collection, analysis or manuscript.

Data availability

The dataset supporting the results of this article (AmsterdamUMCdb) is freely-accessible. Although de-identified, it still contains detailed information regarding the clinical care of patients, so must be treated with appropriate care and respect and cannot be shared without going through a formal application process. Access to the database is on request through [16], under moderate conditions, including completion of a standard training course for handling de-identified clinical data. Snapshots of the GitHub repositories and forms and documentation for applying for access to the data are available via the GigaScience GigaDB repository [17]. Gaining access to AmsterdamUMCdb requires the following steps:

  • (1) Users must first complete the Data or Specimens Only Research (DSOR) course from CITI (https://about.citiprogram.org/).

  • (2) Users must then submit a signed copy of the AmsterdamUMCdb application form (see forms in GigaDB [17]).

  • (3) Once the application form has been approved, users must create an account on EASY (https://easy.dans.knaw.nl/ui/home), complete their user profile and request download permission for the dataset on EASY.

  • (4) Once registered, users should then contact DANS (Data Archiving and Networked Services), who should send a link to the AmsterdamUMCdb archive.

Declarations

List of abbreviations

Amsterdam University Medical Centers database: AmsterdamUMCdb; comma-separated value: CSV; European Society of Intensive Care Medicine: ESICM; General Data Protection Regulation: GDPR; Health Insurance Portability and Accountability Act: HIPAA; intensive care unit: ICU; Medical Information Mart for Intensive Care: MIMIC; medium care unit: MCU; Neural Information Processing Systems: NeurIPS; Sequential Organ Failure Assessment: SOFA; Society of Critical Care Medicine: SCCM; Structured Query Language: SQL.

Ethical approval

Ethical approval for the data collection, de-identification and governance is described in [9]. No additional ethical approval was required for this manuscript.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

TE, SE and AE contributed conceptualisation. PE and PT were responsible for data curation and project administration. TE and PT developed software. SE and AE provided supervision. TE and AE drafted the manuscript. All authors read and approved the final version of the manuscript.

References

  • 1. Singer M, Deutschman CS, Seymour CW et al. The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA, 2016; 315(8): 801–810.
  • 2. Evans L, Rhodes A, Alhazzani W et al. Surviving sepsis campaign: international guidelines for management of sepsis and septic shock 2021. Intensive Care Med., 2021; 47(11): 1181–1247.
  • 3. Shah AD, MacCallum NS, Harris S et al. Descriptors of sepsis using the Sepsis-3 criteria: a cohort study in critical care units within the U.K. National Institute for Health Research Critical Care Health Informatics Collaborative. Crit. Care Med., 2021; 49(11): 1883–1894. doi: 10.1097/CCM.0000000000005169.
  • 4. Vincent JL, Moreno R, Takala J et al. The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. On behalf of the working group on sepsis-related problems of the European Society of Intensive Care Medicine. Intensive Care Med., 1996; 22(7): 707–710.
  • 5. Sinha S, Ray B. Sepsis-3: How useful is the new definition? J. Anaesthesiol. Clin. Pharmacol., 2018; 34(4): 542–543.
  • 6. Sartelli M, Kluger Y, Ansaloni L et al. Raising concerns about the Sepsis-3 definitions. World J. Emerg. Surg., 2018; 13: 6.
  • 7. Johnson AEW, Pollard TJ, Shen L et al. MIMIC-III, a freely accessible critical care database. Sci. Data, 2016; 3: 160035.
  • 8. Pollard TJ, Johnson AEW, Raffa JD et al. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Sci. Data, 2018; 5: 180178.
  • 9. Thoral PJ, Peppink JM, Driessen RH et al. Sharing ICU patient data responsibly under the Society of Critical Care Medicine/European Society of Intensive Care Medicine Joint Data Science Collaboration: The Amsterdam University Medical Centers Database (AmsterdamUMCdb) example. Crit. Care Med., 2021; 49(6): e563–e577.
  • 10. Public Law 104-191. Health Insurance Portability and Accountability Act. 1996; https://www.govinfo.gov/app/details/PLAW-104publ191. Accessed on 22 September 2021.
  • 11. Regulation (EU) 2016/679. General Data Protection Regulation. 2016; https://eur-lex.europa.eu/eli/reg/2016/679/oj. Accessed on 22 September 2021.
  • 12. ESICM. 3rd Critical Care Datathon. 2021; https://www.esicm.org/events/datathon-2021/. Accessed on 22 September 2021.
  • 13. Jordon J, Jarrett D, Yoon J et al. Hide-and-Seek Privacy Challenge. arXiv. 2020 July; doi: 10.48550/arXiv.2007.12087.
  • 14. Braykov NP, Morgan DJ, Schweizer ML et al. Assessment of empirical antibiotic therapy optimisation in six hospitals: an observational cohort study. Lancet Infect. Dis., 2014; 14(12): 1220–1227.
  • 15. Amsterdam UMC. AmsterdamUMCdb GitHub repository. 2021; https://github.com/AmsterdamUMC/AmsterdamUMCdb. Accessed on 3 November 2021.
  • 16. Amsterdam UMC. Amsterdam Medical Data Science home page. 2021; https://www.amsterdammedicaldatascience.nl. Accessed on 3 November 2021.
  • 17. Edinburgh T, Eglen SJ, Thoral P et al. Supporting data for “Sepsis-3 criteria in AmsterdamUMCdb: open-source code implementation”. GigaScience Database. 2022; doi: 10.5524/102204.

Article Submission

Tom Edinburgh

Assign Handling Editor

Editor: Scott Edmunds

Editor Assess MS

Editor: Scott Edmunds

Curator Assess MS

Editor: Chris Armit

Review MS

Editor: Chris James Armit

Reviewer name and names of any other individuals who aided in the review: Chris Armit
Do you understand and agree to our policy of having open and named reviews, and having your review included with the published manuscript. (If no, please inform the editor that you cannot review this manuscript.) Yes
Is the language of sufficient quality? Yes
Please add additional comments on language quality to clarify if needed A minor point is that the contents of the Restricted Access AmsterdamUMCdb tabular data files are in Dutch, and so researchers may have to familiarise themselves with some of the terms that are used in the data files e.g. Neurochirurgie = Neurosurgery, Vaatchirurgie = Vascular Surgery, Verloskunde = Midwifery etc.
Is there a clear statement of need explaining what problems the software is designed to solve and who the target audience is? Yes
Additional Comments This Technical Release manuscript describes Sepsis-3 criteria in AmsterdamUMCdb. Sepsis-3 is the third consensus definition of sepsis, which includes a revised definition of sepsis based on a scoring system that utilises Sequential Organ Failure Assessment (SOFA). SOFA measures the severity of organ dysfunction across six domains including respiratory, neurological, cardiovascular, liver, coagulation and renal systems. AmsterdamUMCdb is a freely accessible intensive care database with de-identified health data relating to intensive care unit admissions, and includes demographics, vital signs, laboratory tests and medications.
Is the source code available, and has an appropriate Open Source Initiative license (https://opensource.org/licenses) been assigned to the code? Yes
Additional Comments The manuscript is well written, and the software is publicly available from GitHub (https://github.com/tedinburgh/sepsis3-amsterdamumcdb), where it has been ascribed an OSI-approved MIT license.
As Open Source Software are there guidelines on how to contribute, report issues or seek support on the code? Yes
Additional Comments The GitHub archive was created by Dr Tom Edinburgh, who is the contact author for this manuscript.
Is the code executable? Yes
Additional Comments
Is installation/deployment sufficiently outlined in the paper and documentation, and does it proceed as outlined? Yes
Additional Comments
Is the documentation provided clear and user friendly? Yes
Additional Comments The AmsterdamUMCdb Test Data is Restricted Access. I outline below the various steps that were needed to access the Test Data.
  • To gain access to AmsterdamUMCdb, one must first complete the Data or Specimens Only Research (DSOR) course from CITI.
  • Reviewers must then submit a signed copy of application form arfeula_v1.6.pdf.
  • This application form and the link to the DSOR course are available from the following link: https://www.amsterdammedicaldatascience.nl/#amsterdamumcdb
  • This application must be counter-signed by an intensivist, i.e. a health professional who specialises in the care of critically ill patients, and who is the named reference on the application form.
Once the application form has been approved, reviewers must do the following:
  • Create an account on EASY using the same institutional e-mail address as on their form.
  • Complete their user profile in EASY, including organisation and department.
  • Request download permission for the dataset on EASY.
Once registered with EASY, reviewers will be granted access to the AmsterdamUMCdb Data File that is archived in EASY: https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:130980
There is currently a single file archived on this link. This single file is a PDF file that states the following:
  • Your request has been granted, the dataset has been archived outside EASY due to the size.
  • Please contact DANS at info@dans.knaw.nl for delivery of the dataset to you. Preferably mention the citation of the dataset: Elbers, Dr. P.W.G. (Amsterdam UMC) (2019): AmsterdamUMCdb v1.0.2. DANS. https://doi.org/10.17026/dans-22u-f8vd
Reviewers should then contact DANS (Data Archiving and Networked Services), who should send a SURFfilesender link to the 8.3GB AmsterdamUMCdb-v1.0.2.zip archive. This link is active for 3 weeks. The zip file contains seven tabular data files (CSV file format) that I list below.
  • admissions.csv
  • drugitems.csv
  • freetextitems.csv
  • listitems.csv
  • numericitems.csv
  • procedureorderitems.csv
  • processitems.csv
Is there a clearly-stated list of dependencies, and is the core functionality of the software documented to a satisfactory level? Yes
Additional Comments
Have any claims of performance been sufficiently tested and compared to other commonly-used packages? Not applicable
Additional Comments
Are there (ideally real world) examples demonstrating use of the software? Yes
Additional Comments
Is automated testing used or are there manual steps described so that the functionality of the software can be verified? Yes
Additional Comments
Any Additional Overall Comments to the Author This Technical Release manuscript describes Sepsis-3 criteria in AmsterdamUMCdb. Sepsis-3 is the third consensus definition of sepsis, which includes a revised definition of sepsis based on a scoring system that utilises Sequential Organ Failure Assessment (SOFA). SOFA measures the severity of organ dysfunction across six domains including respiratory, neurological, cardiovascular, liver, coagulation and renal systems. AmsterdamUMCdb is a freely accessible intensive care database with de-identified health data relating to intensive care unit admissions, and includes demographics, vital signs, laboratory tests and medications. The manuscript is well written, and the software is publicly available from GitHub (https://github.com/tedinburgh/sepsis3-amsterdamumcdb), where it has been ascribed an OSI-approved MIT license.
The AmsterdamUMCdb Test Data is Restricted Access. I am grateful to the authors for providing me with access to this very significant data resource. I outline below the various steps that were needed to access the Test Data.
  • To gain access to AmsterdamUMCdb, one must first complete the Data or Specimens Only Research (DSOR) course from CITI.
  • Reviewers must then submit a signed copy of application form arfeula_v1.6.pdf.
  • This application form and the link to the DSOR course are available from the following link: https://www.amsterdammedicaldatascience.nl/#amsterdamumcdb
  • This application must be counter-signed by an intensivist, i.e. a health professional who specialises in the care of critically ill patients, and who is the named reference on the application form.
Once the application form has been approved, reviewers must do the following:
  • Create an account on EASY using the same institutional e-mail address as on their form.
  • Complete their user profile in EASY, including organisation and department.
  • Request download permission for the dataset on EASY.
Once registered with EASY, reviewers will be granted access to the AmsterdamUMCdb Data File that is archived in EASY: https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:130980
There is currently a single file archived on this link. This single file is a PDF file that states the following:
  • Your request has been granted, the dataset has been archived outside EASY due to the size.
  • Please contact DANS at info@dans.knaw.nl for delivery of the dataset to you. Preferably mention the citation of the dataset: Elbers, Dr. P.W.G. (Amsterdam UMC) (2019): AmsterdamUMCdb v1.0.2. DANS. https://doi.org/10.17026/dans-22u-f8vd
Reviewers should then contact DANS (Data Archiving and Networked Services), who should send a SURFfilesender link to the 8.3GB AmsterdamUMCdb-v1.0.2.zip archive. This link is active for 3 weeks. The zip file contains seven tabular data files (CSV file format) that I list below.
  • admissions.csv
  • drugitems.csv
  • freetextitems.csv
  • listitems.csv
  • numericitems.csv
  • procedureorderitems.csv
  • processitems.csv
A minor point is that the contents of the tabular data files are in Dutch, and so researchers may have to familiarise themselves with some of the terms that are used in the data files e.g. Neurochirurgie = Neurosurgery, Vaatchirurgie = Vascular Surgery, Verloskunde = Midwifery etc.
I was wondering whether it would be possible to streamline the application process for AmsterdamUMCdb? For example, as the AmsterdamUMCdb dataset is archived outside of EASY, is it necessary to register with EASY? It may be swifter for a researcher if this step was omitted and they were invited to contact DANS at an earlier step in the process.
Recommendation Accept

Review MS

Editor: Tom Pollard

Reviewer name and names of any other individuals who aided in the review: Tom Pollard
Do you understand and agree to our policy of having open and named reviews, and having your review included with the published manuscript. (If no, please inform the editor that you cannot review this manuscript.) Yes
Is the language of sufficient quality? Yes
Please add additional comments on language quality to clarify if needed Overall the paper is clear and well written.
Is there a clear statement of need explaining what problems the software is designed to solve and who the target audience is? Yes
Additional Comments
Is the source code available, and has an appropriate Open Source Initiative license (https://opensource.org/licenses) been assigned to the code? Yes
Additional Comments - The code is currently in a personal GitHub repo. The authors could consider using a service like Zenodo to assign a persistent URL to the code, which could be used in place of the github URL. This may help to ensure the code is persistently available (e.g. in the case that the author decides to move the code from GitHub to a new location).
As Open Source Software are there guidelines on how to contribute, report issues or seek support on the code? No
Additional Comments - It would be helpful for the paper to describe how the user community will be supported. How would the authors like community members to report bugs, fix bugs, contribute improvements, etc? - The authors could briefly discuss this in the paper, and perhaps add guidelines to the repository (e.g. in a CONTRIBUTING.md file: https://github.com/github/docs/blob/main/CONTRIBUTING.md).
Is the code executable? Yes
Additional Comments - It would be great if an example or synthetic version of the AmsterdamUMCdb was provided in the repository. At its simplest, this could comprise empty tables. This would allow users to easily test the code against the sample data. It would also support development of a testing framework.
Is installation/deployment sufficiently outlined in the paper and documentation, and does it proceed as outlined? Yes
Additional Comments
Is the documentation provided clear and user friendly? No
Additional Comments The paper is well written. The code itself would benefit from significant work to improve readability and usability (for example, with the use of functions and docstrings). I have outlined some suggestions below:
- The following link gives a nice example of a well-formatted script: https://www.annasyme.com/docs/python_structure.html. There are many reasons why this structure is helpful (e.g. readability; facilitates import and reuse of code; avoids hardcoding of arguments; etc).
- Following standard style guidelines improves readability and makes it easier to identify bugs (e.g. PEP8: https://www.python.org/dev/peps/pep-0008/). Installing a formatter allows the style to be consistently applied across the code.
- Refactoring the code to use functions, classes, and modules, as appropriate, would help readability and enable unit testing. https://github.com/tedinburgh/sepsis3-amsterdamumcdb/blob/main/concepts/sepsis3/reason_for_admission.py is >1300 lines with a single function.
- Instead of hardcoded variables like "../../data/additional_files/", these could be handled as arguments with defaults (e.g. with argparse).
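The argparse suggestion above might look something like the following minimal sketch. It is not taken from the repository: the option name `--additional-files-path` and the function names are hypothetical, and only the default path string comes from the review comment.

```python
import argparse
from pathlib import Path

# The default mirrors the hardcoded path quoted in the review comment.
DEFAULT_ADDITIONAL_FILES = Path("../../data/additional_files/")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose only option replaces the hardcoded path."""
    parser = argparse.ArgumentParser(
        description="Illustrative command-line interface (hypothetical)."
    )
    # Hypothetical option name; the default keeps the current behaviour.
    parser.add_argument(
        "--additional-files-path",
        type=Path,
        default=DEFAULT_ADDITIONAL_FILES,
        help="Directory containing the additional files.",
    )
    return parser


def resolve_additional_files_path(argv=None) -> Path:
    """Return the directory to use, taken from argv or the default."""
    args = build_parser().parse_args(argv)
    return args.additional_files_path
```

Calling `resolve_additional_files_path([])` falls back to the default, while passing `["--additional-files-path", "some/dir"]` overrides it, so existing behaviour would be preserved for users who supply no arguments.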
Is there a clearly-stated list of dependencies, and is the core functionality of the software documented to a satisfactory level? No
Additional Comments - The dependencies are fairly clear, but the code would benefit from refactoring for readability as discussed. - Presumably the database has some kind of version control? Is there a plan to keep the code in sync with the database versions?
Have any claims of performance been sufficiently tested and compared to other commonly-used packages? No
Additional Comments The paper could be improved by providing a more in-depth analysis of the sepsis cohort that can be extracted using this code, e.g. how do characteristics of patients with sepsis-3 differ from those without? Although not necessary, these characteristics could be compared to similar cohorts extracted from the other datasets mentioned in the paper.
Are there (ideally real world) examples demonstrating use of the software? Yes
Additional Comments The benefits of making this code available to the research community are clear.
Is automated testing used or are there manual steps described so that the functionality of the software can be verified? No
Additional Comments - The code does not include a testing framework, and the current implementation is not well suited to testing. - If refactoring the code as discussed above, then the authors could consider adding some simple unit tests at the same time.
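As a rough illustration of what a simple unit test could look like once the logic lives in small functions, the sketch below tests a hypothetical helper, `meets_sepsis3`, which is not part of the repository. It encodes only the criterion stated in the paper (a SOFA increase of at least 2 points accompanying a suspected or documented infection) and ignores all timing and data-extraction details.

```python
import unittest


def meets_sepsis3(sofa_increase: int, suspected_infection: bool) -> bool:
    """Hypothetical helper: SOFA increase >= 2 together with suspected infection."""
    return bool(suspected_infection) and sofa_increase >= 2


class TestMeetsSepsis3(unittest.TestCase):
    def test_threshold_with_infection(self):
        # An increase of exactly 2 points with suspected infection qualifies.
        self.assertTrue(meets_sepsis3(2, True))

    def test_below_threshold(self):
        # An increase of 1 point does not qualify, even with infection.
        self.assertFalse(meets_sepsis3(1, True))

    def test_no_suspected_infection(self):
        # A large SOFA increase alone does not qualify.
        self.assertFalse(meets_sepsis3(5, False))
```

Saved in a file such as `test_sepsis3.py` (a hypothetical name), these tests could be run with `python -m unittest`.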
Any Additional Overall Comments to the Author
# Summary
The paper describes implementation of open source python scripts for computing sepsis-3 for patients in AmsterdamUMCdb, a publicly accessible critical care database. The paper is well written and the efforts are well motivated. The code would benefit from a significant clean up.
# Strengths
- Many analyses that use AmsterdamUMCdb are likely to benefit from open source code for computation of Sepsis-3.
- The paper is well written and demonstrates that significant thought has gone into the implementation, taking into account both clinical and technical perspectives.
- The code is shared with an appropriate license, in a public, version controlled repository.
# Suggestions for improvement
- The code would benefit from a fairly significant clean up. Currently, in my opinion it does not follow standard practice for Python and as a result it is tricky to read, reuse, and iteratively improve.
Recommendation Major Revisions

Editor Decision

Editor: Scott Edmunds
GigaByte. 2022 Mar 15;2022:gigabyte45.

Major Revision

Tom Edinburgh

Assess Revision

Editor: Scott Edmunds

Re-Review MS

Editor: Chris James Armit

Comments on revised manuscript This Technical Release manuscript describes Sepsis-3 criteria in AmsterdamUMCdb. I thank the authors for addressing my comments and for updating the linked GitHub repository. The authors have explained that streamlining access to AmsterdamUMCdb is unfortunately outwith their control, hence the need for a user to gain access via EASY registration. This is completely understandable and I thank the authors for their explanation. I recommend this Technical Release manuscript for publication in GigaByte.

Editor Decision

Editor: Scott Edmunds

Final Data Preparation

Editor: Chris Armit

Editor Decision

Editor: Scott Edmunds

Accept

Editor: Scott Edmunds

Comments to the Author Thanks for updating the LaTeX file.

Export to Production

Editor: Scott Edmunds

Associated Data

    Data Availability Statement

    The dataset supporting the results of this article (AmsterdamUMCdb) is freely accessible. Although de-identified, it still contains detailed information regarding the clinical care of patients, so it must be treated with appropriate care and respect and cannot be shared without going through a formal application process. Access to the database is on request through [16], under moderate conditions, including completion of a standard training course for handling de-identified clinical data. Snapshots of the GitHub repositories and forms and documentation for applying for access to the data are available via the GigaScience GigaDB repository [17]. Gaining access to AmsterdamUMCdb requires the following steps:

    • (1) Users must first complete the Data or Specimens Only Research (DSOR) course from CITI (https://about.citiprogram.org/).

    • (2) Users must then submit a signed copy of the AmsterdamUMCdb application form (see forms in GigaDB [17]).

    • (3) Once the application form has been approved, users must create an account on EASY (https://easy.dans.knaw.nl/ui/home), complete their user profile and request download permission for the dataset on EASY.

    • (4) Once registered, users should then contact DANS (Data Archiving and Networked Services), who should send a link to the AmsterdamUMCdb archive.


    Articles from GigaByte are provided here courtesy of Gigascience Press
