PLOS One. 2021 Oct 13;16(10):e0258537. doi: 10.1371/journal.pone.0258537

Development of algorithms for identifying patients with Crohn’s disease in the Japanese health insurance claims database

Hiromu Morikubo 1,2,3, Taku Kobayashi 1,*, Tomohiro Fukuda 1,2, Takayoshi Nagahama 4, Tadakazu Hisamatsu 3, Toshifumi Hibi 1
Editor: Valérie Pittet5
PMCID: PMC8513890  PMID: 34644342

Abstract

Background

Real-world big data studies using health insurance claims databases require extraction algorithms to accurately identify the target population and outcomes. However, no algorithm for Crohn’s disease (CD) has yet been validated. In this study, we aimed to develop an algorithm for identifying CD using the claims data of the Japanese insurance system.

Methods

A single-center retrospective study was conducted to develop a CD extraction algorithm from insurance claims data. Patients visiting the Kitasato University Kitasato Institute Hospital between January 2015 and February 2019 were enrolled, and data were extracted according to inclusion criteria combining the Tenth Revision of the International Statistical Classification of Diseases and Related Health Problems (ICD-10) diagnosis codes with or without prescription or surgical codes. One hundred cases that met each inclusion criterion were randomly sampled, and positive predictive values (PPVs) were calculated against the diagnosis in the medical chart. Of all cases, 20% were reviewed in duplicate, and the inter-observer agreement (kappa) was also calculated.

Results

Of the 82,898 patients enrolled, 255 cases were extracted by diagnosis code alone, 197 by the combination of diagnosis and prescription codes, and 197 by the combination of diagnosis codes and prescription or surgical codes. The PPV for confirmed CD cases was 83% with diagnosis codes alone, but improved to 97% when these were combined with prescription codes. The inter-observer agreement (weighted kappa) was 0.9903.

Conclusions

A single ICD-10 code alone was insufficient to define CD; however, the algorithm that combined diagnosis codes with prescription codes showed a sufficiently high PPV and will enable outcome-based research on CD using the Japanese claims database.

Introduction

Crohn’s disease (CD) is a chronic inflammatory bowel disease (IBD) of unknown etiology [1]. Recent progress in the treatment of IBD has been remarkable, and many new drugs have been launched following randomized controlled trials (RCTs) [2]. At the same time, multiple clinical questions have arisen about how to adapt the increased treatment options to better suit patients’ needs in clinical practice. Consequently, the importance of observational studies, as well as RCTs, is being reevaluated [3]. In fact, it has been demonstrated that RCTs represent only a small proportion of patients with IBD in real-world practice [4]. In this respect, large-scale observational studies are also needed.

The incidence and prevalence of CD are higher in Western countries [5] and are also increasing in Asian countries, including Japan [6]. Its prevalence is 1.51–322/100,000 in Western countries [5] and 55.6/100,000 in Japan [7]. When conducting real-world observational studies requiring a large number of patients in Japan, it is often difficult to obtain a sufficient sample size from a single or small number of institutions. For diseases with low prevalence, the claims database can therefore be a useful tool for conducting large-scale real-world observational studies [8, 9]. In fact, various epidemiological studies using the claims database have been successfully conducted [10–12], and the usefulness of these databases has also been proven in IBD [8, 13–15]. However, it is important to note that the diagnosis in the claims database may not always reflect the final medical diagnosis made in clinical practice, and validation studies are therefore necessary for each disease [16–19]. Furthermore, in Japan, the validity of the Tenth Revision of the International Statistical Classification of Diseases and Related Health Problems (ICD-10) codes registered in the claims data for CD has not been evaluated. Thus, the reliability of claims database studies on CD using ICD-10 codes has not yet been confirmed. Therefore, the purpose of this study was to develop an algorithm for identifying CD using the claims data of the Japanese insurance system.

Materials and methods

Study design

This was a retrospective cross-sectional validation study that reviewed health insurance claims data and medical records. Patients who met the inclusion criteria and those who did not were randomly selected from the claims data of patients who visited Kitasato University Kitasato Institute Hospital (Tokyo, Japan) and filed for insurance reimbursement. The medical records of these patients were reviewed to evaluate the validity of the inclusion criteria. Case selection, random sampling, and statistical analyses were conducted in collaboration with the Japan Medical Data Center (JMDC) Corporation (Tokyo, Japan). The flow of the review process is shown in Fig 1.

Fig 1. Study design and data flow, cohort setting.

A total of 82,898 patients were enrolled during the study period. Of these, 255 and 197 patients who met IC-A and IC-B, respectively, were extracted. Patients selected for IC-C (n = 197) were excluded from the subsequent analyses because their number was the same as for IC-B. JMDC; Japan Medical Data Center, ICD-10; Tenth Revision of the International Statistical Classification of Diseases and Related Health Problems, IC; Inclusion criteria, PPV; positive predictive value, NPV; negative predictive value.

Japanese health administrative data

Japan has a universal health insurance system, which covers almost all citizens, as they are obliged to join one of the systems according to their occupation and age [20]. At the end of each month, each medical provider files a set of reimbursement invoices to the insurance payer via the review organization. For this reason, medical institutions register all processes, drugs, procedures, and devices that are subject to reimbursement according to the Ministry of Health, Labor and Welfare’s standard codes, and this registration information is managed as the Japanese claims database [20, 21].

Setting

Kitasato University Kitasato Institute Hospital (Tokyo, Japan) is affiliated with Kitasato University and houses the Center for Advanced IBD Research and Treatment. The hospital has 329 beds and received 865 outpatients and 163 inpatients per day in FY2019.

Inclusion criteria

Patients who visited the hospital between January 2015 and February 2019 were included in the first sampling of the claims data. The observation period was set as the maximum period for which insurance claims data were available at the study site. According to the inclusion criteria listed below, the cases were divided into those that met the inclusion criteria and those that did not (Table 1). Age-stratified random sampling was then performed to select 100 cases for each inclusion criterion from the cases that met it and 200 cases from those that did not. The cases extracted using each inclusion criterion (IC-A/B/C) were defined as cohorts (Cohort-A/B/C).

Table 1. Inclusion criteria, and details of the confirmed diagnosis.

| Category | Label | Criteria |
| --- | --- | --- |
| Inclusion criteria | A | Patients with a confirmed ICD-10 diagnostic code of CD (K50), without a confirmed ICD-10 diagnostic code of ulcerative colitis (K51) or Behçet’s disease (M35) in the same month |
| Inclusion criteria | B | A + prescription codes for CD in the same month |
| Inclusion criteria | C | A + prescription or surgical codes for CD in the same month |
| Details of the confirmed diagnosis | a | Confirmed diagnosis at own institution |
| Details of the confirmed diagnosis | b | Diagnosed by an IBD specialist or gastroenterologist in another hospital |
| Details of the confirmed diagnosis | c | Diagnosed by a primary care physician (with a description of the findings supporting the diagnosis) |
| Details of the confirmed diagnosis | d | Diagnosed by a primary care physician (without a description of the findings supporting the diagnosis) |

IBD; Inflammatory bowel disease, ICD-10; Tenth Revision of the International Statistical Classification of Diseases and Related Health Problems, CD; Crohn’s disease

Inclusion criteria A (IC-A; diagnostic code alone): Patients with a confirmed ICD-10 diagnostic code of CD (K50) (S1 Table) but without a confirmed ICD-10 diagnostic code of ulcerative colitis or Behçet’s disease in the same month.

Inclusion criteria B (IC-B; diagnostic and prescription codes): Cases fulfilling IC-A and with prescription codes (S2 Table) in the same month as the diagnostic codes.

Inclusion criteria C (IC-C; diagnostic, prescription, and surgical codes): Cases fulfilling IC-A with prescription codes or surgical codes (S2 Table) in the same month as the diagnostic codes.
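The three criteria can be read as a rule applied to each patient and calendar month. The following is a minimal sketch of that reading, not the authors' implementation. The record layout (`MonthlyClaim`), its field names, and the rule that a patient enters a cohort if any single month qualifies are assumptions made for illustration. The actual prescription and surgical code lists are in S2 Table and are represented here only as boolean flags.

```python
from dataclasses import dataclass, field


@dataclass
class MonthlyClaim:
    """Hypothetical record: all claims of one patient in one calendar month."""
    patient_id: str
    month: str                                      # e.g. "2018-04"
    icd10_codes: set = field(default_factory=set)   # confirmed diagnosis codes
    has_cd_prescription: bool = False               # any S2 Table prescription code
    has_cd_surgery: bool = False                    # any S2 Table surgical code


def has_prefix(codes, prefix):
    """True if any ICD-10 code starts with the prefix (K50.1 matches K50)."""
    return any(code.startswith(prefix) for code in codes)


def meets_ic_a(claim):
    # IC-A: CD code (K50) without UC (K51) or Behcet's disease (M35) in the same month
    return (has_prefix(claim.icd10_codes, "K50")
            and not has_prefix(claim.icd10_codes, "K51")
            and not has_prefix(claim.icd10_codes, "M35"))


def meets_ic_b(claim):
    # IC-B: IC-A plus a CD prescription code in the same month
    return meets_ic_a(claim) and claim.has_cd_prescription


def meets_ic_c(claim):
    # IC-C: IC-A plus a CD prescription OR surgical code in the same month
    return meets_ic_a(claim) and (claim.has_cd_prescription or claim.has_cd_surgery)


def classify(claims):
    """Patient IDs meeting each criterion in at least one month (an assumption)."""
    cohorts = {"A": set(), "B": set(), "C": set()}
    for claim in claims:
        if meets_ic_a(claim):
            cohorts["A"].add(claim.patient_id)
        if meets_ic_b(claim):
            cohorts["B"].add(claim.patient_id)
        if meets_ic_c(claim):
            cohorts["C"].add(claim.patient_id)
    return cohorts
```

By construction, every patient in cohort B is also in cohort C, and every patient in cohort C is also in cohort A, so equal counts for B and C imply that the two cohorts contain the same patients.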

Reviewing process

A medical chart review was independently performed by two gastroenterologists working at Kitasato University Kitasato Institute Hospital (chart reviewers with at least 5 years of clinical experience and training in IBD practice at a specialist center). The reviewers classified cases into three categories (confirmed diagnosis, suspected diagnosis, and negative) based on the gold standard defined by the national guidelines [1], as described in the section below. If the two reviewers disagreed, the final decision was made after (1) discussion between the two reviewers or (2) consultation with a third reviewer (a gastroenterologist and IBD specialist).

Gold standard and data collection of clinical information

The following data were collected for each randomly sampled case at the time when the inclusion criteria were met: age, sex, age of onset, disease type (Montreal classification), previous surgery (intestinal/anal), medications for CD, laboratory findings, examination results (upper and lower endoscopy, histopathology, small bowel radiography, magnetic resonance enterography, and intestinal ultrasound findings), discharge summary, referral letter, and registration of intractable disease application. The gold standard was based on the national guidelines of the Japanese Society of Gastroenterology [1]. The details of cases with confirmed diagnoses were categorized as follows: a) diagnosed, or diagnosis confirmed, at our own institution; b) diagnosed only by an IBD specialist or gastroenterologist in another hospital; c) diagnosed only by a primary care physician (with a description of the findings supporting the diagnosis); and d) diagnosed only by a primary care physician (without a description of the findings supporting the diagnosis).

Assessment of validity

For each inclusion criterion, validity was assessed for confirmed and suspected diagnoses. A 2 × 2 contingency table was created, and validity was assessed mainly by the positive predictive value (PPV). The sensitivity, specificity, and negative predictive value (NPV) were also calculated. To examine inter-rater reliability, 20% (120/600) of the cases were independently reviewed by two chart reviewers per case. To examine intra-rater reliability, another 20% of the cases were reviewed twice by one chart reviewer at a two-week interval.

Statistical analysis

All statistical analyses were performed using STATA/S v. 15.1 (Stata Corporation, College Station, Texas, USA). Continuous variables were expressed as the median (interquartile range, IQR) or mean ± standard deviation (SD). Categorical variables were expressed as counts and percentages (%). A 2 × 2 contingency table was created to calculate the sensitivity, specificity, PPV, and NPV. Inter- and intra-rater reliability was assessed using kappa, weighted kappa, and Gwet’s AC1.

The sample size was set at 100 for cases that met the inclusion criteria and 200 for cases that did not. If the 95% confidence interval for PPV was set to within ±0.1, the required number of cases that met the inclusion criteria was 100. Since the prevalence of CD is 55.6/100,000 in Japan [7], approximately 370,000 cases that did not meet the inclusion criteria were required to detect the exact sensitivity and specificity. However, to ensure feasibility, only 200 cases were selected.
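The arithmetic behind these figures can be illustrated as follows. This is a sketch under stated assumptions: the text does not say which formula was used, so the common normal-approximation formula for a proportion is used here, with the worst case p = 0.5. The function name `n_for_proportion` is hypothetical. The second part only shows how few CD cases a sample of criterion-negative patients is expected to contain at the reported prevalence.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def n_for_proportion(p, half_width, z=Z_95):
    """Sample size so that the normal-approximation CI half-width is <= half_width."""
    return math.ceil(z * z * p * (1.0 - p) / (half_width ** 2))


# Worst case p = 0.5 maximizes p * (1 - p); the result is close to the 100 used.
worst_case_n = n_for_proportion(0.5, 0.1)

# Expected number of CD patients in a sample drawn at the national prevalence.
prevalence = 55.6 / 100_000
expected_cd_in_200 = 200 * prevalence
expected_cd_in_370k = 370_000 * prevalence

print(worst_case_n)
print(round(expected_cd_in_200, 3), round(expected_cd_in_370k, 1))
```

Under these assumptions, a sample of 200 is expected to contain far fewer than one CD patient, whereas about 370,000 would contain roughly 200, which is why sensitivity cannot be estimated precisely from the 200 sampled cases.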

Ethical considerations

The study was conducted in accordance with the Declaration of Helsinki and Good Clinical Practice guidelines. The Research Ethics Committee of Kitasato University Kitasato Institute Hospital approved the study protocol and all necessary documents (approval number: 19047). The study used data already recorded, and the ethics committee approved a waiver of informed consent.

Results

Case extraction and medical record review

A total of 82,898 patients who visited Kitasato University Kitasato Institute Hospital during the study period were enrolled, and 255 and 197 cases that met IC-A and IC-B, respectively, were extracted. Although 197 cases were selected for IC-C, they were excluded from later analyses because the number of cases that met IC-C was the same as for IC-B (Fig 1). In Cohort-A, the PPV was 83.0% for confirmed diagnosis only and 90.0% for confirmed and suspected diagnosis; in Cohort-B, the PPV was 97.0% for confirmed diagnosis only and 100.0% for confirmed and suspected diagnosis (Table 2; the 2 × 2 tables are shown in S3 Table).

Table 2. Assessment of validity for each cohort.

| Cohort | Diagnosis | TP | TN | FP | FN | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A | Confirmed | 83 | 200 | 17 | 0 | 1.000 (0.957–1.000) | 0.922 (0.878–0.954) | 0.830 (0.742–0.898) | 1.000 (0.982–1.000) |
| A | Confirmed & suspected | 90 | 200 | 10 | 0 | 1.000 (0.960–1.000) | 0.952 (0.914–0.977) | 0.900 (0.824–0.951) | 1.000 (0.982–1.000) |
| B | Confirmed | 97 | 200 | 3 | 0 | 1.000 (0.963–1.000) | 0.985 (0.957–0.997) | 0.970 (0.915–0.994) | 1.000 (0.982–1.000) |
| B | Confirmed & suspected | 100 | 200 | 0 | 0 | 1.000 (0.964–1.000) | 1.000 (0.982–1.000) | 1.000 (0.964–1.000) | 1.000 (0.982–1.000) |

*CD; Crohn’s disease, TP; true-positive, TN; true-negative, FP; false-positive, FN; false-negative, PPV; positive predictive value, NPV; negative predictive value

In Cohort-A, the positive predictive value (PPV) was 0.830 for confirmed and 0.900 for confirmed and suspected Crohn’s disease (CD) cases. In Cohort-B, the PPV was 0.970 for confirmed and 1.000 for confirmed and suspected CD cases. The negative predictive value (NPV) is 1.000 because there are no false-negative cases.
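The point estimates in Table 2 follow directly from the four cell counts. The sketch below recomputes them and attaches exact (Clopper-Pearson) binomial intervals. That the reported intervals are exact binomial intervals is an assumption. It is consistent with the one-sided cases in Table 2, since for 100/100 the exact lower bound is 0.025^(1/100) ≈ 0.964. The helper names are hypothetical, and the bounds are found by simple bisection using only the standard library.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def _bisect(f, target, lo=0.0, hi=1.0, iters=60):
    """Solve f(p) = target for a function f that decreases on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid   # f is still too large, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x successes out of n trials."""
    # Lower bound: P(X >= x | p) = alpha/2, i.e. cdf(x - 1; p) = 1 - alpha/2
    lower = 0.0 if x == 0 else _bisect(lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper bound: P(X <= x | p) = alpha/2
    upper = 1.0 if x == n else _bisect(lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


def diagnostics(tp, tn, fp, fn):
    """Return (numerator, denominator) for each accuracy measure."""
    return {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "PPV": (tp, tp + fp),
        "NPV": (tn, tn + fn),
    }


# Cohort-A, confirmed diagnosis only (counts taken from Table 2)
for name, (x, n) in diagnostics(tp=83, tn=200, fp=17, fn=0).items():
    lo, hi = exact_ci(x, n)
    print(f"{name}: {x / n:.3f} ({lo:.3f}-{hi:.3f})")
```

Swapping in the counts from the other rows of Table 2 gives the corresponding estimates for Cohort-B and for confirmed plus suspected diagnoses.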

The characteristics of the patients who were diagnosed as confirmed and suspected cases in each cohort are shown in Table 3. In Cohort-A, 90 CD patients were diagnosed as confirmed and suspected cases [mean age 43.7±14.0, 62 males (68.9%)]; in Cohort-B (n = 100), the mean age was 44.3±14.7 and 71 patients (71.0%) were male. In Cohort-A, 62% of the patients had CD confirmed by our medical records, 20% by an IBD specialist or gastroenterologist in another hospital, and 1% by primary care physicians without a description of the findings supporting the diagnosis (Fig 2). Of the remaining cases, 7% were considered to have suspected diagnoses and 10% were declared negative for CD, based on our medical records. Cases that were declared negative for CD included infectious enterocolitis (n = 4), intestinal Behçet’s disease (n = 2), drug-induced enterocolitis (n = 1), intestinal tuberculosis (n = 1), unspecified intestinal stenosis (n = 1), and cirrhosis (n = 1). In Cohort-B, 74% of the patients had CD confirmed by our medical records, 23% by an IBD specialist or gastroenterologist in another hospital, and 3% were considered to have suspected diagnoses. No cases were declared negative for CD in Cohort-B.

Table 3. Baseline characteristics of each cohort.

| Characteristic | Cohort A (N = 90) | Cohort B (N = 100) |
| --- | --- | --- |
| Age (mean ± SD, years) | 43.65±13.99 | 44.33±14.66 |
| Male, n (%) | 62 (68.9%) | 71 (71.0%) |
| Age at diagnosis (mean ± SD, years) | 28.08±12.48 | 28.64±12.69 |
| Disease duration (mean ± SD, years) | 10.52±9.13 | 11.46±10.80 |
| Montreal Age at diagnosis, n (%) | | |
| A1 (<16 years) | 6 (6.7%) | 4 (4.0%) |
| A2 (17–40 years) | 68 (75.6%) | 76 (76.0%) |
| A3 (>40 years) | 13 (14.4%) | 18 (18.0%) |
| unknown | 3 (3.3%) | 2 (2.0%) |
| Montreal Location, n (%) | | |
| L1 (ileal) | 19 (21.1%) | 21 (21.0%) |
| L2 (colonic) | 18 (20.0%) | 15 (15.0%) |
| L3 (ileo-colonic) | 49 (54.4%) | 63 (63.0%) |
| + isolated L4 (upper) | 7 (7.8%) | 11 (11.0%) |
| unknown | 2 (2.2%) | 1 (1.0%) |
| Montreal Behavior, n (%) | | |
| B1 (non-stricturing, non-penetrating) | 46 (51.1%) | 43 (43.0%) |
| B2 (stricturing) | 29 (32.2%) | 27 (27.0%) |
| B3 (penetrating) | 13 (14.4%) | 30 (30.0%) |
| + perianal disease | 29 (32.2%) | 43 (43.0%) |
| unknown | 2 (2.2%) | 0 (0.0%) |
| Prior history of surgery, n (%) | | |
| intestine | 25 (27.7%) | 36 (36.0%) |
| peri-anal | 13 (14.4%) | 21 (21.0%) |
| Intractable disease registration, n (%) | 72 (80.0%) | 92 (92.0%) |
| Review result (confirmed / suspected) | 83/7 | 97/3 |
| Treatment | | |
| Mesalazine, n (%) | 71 (78.9%) | 80 (80.0%) |
| Immunomodulator, n (%) | 44 (48.9%) | 51 (51.0%) |
| Elemental diet, n (%) | 22 (24.4%) | 26 (26.0%) |
| Corticosteroid, n (%) | 12 (13.3%) | 13 (13.0%) |
| Corticosteroid: prednisolone, n (%) | 6 (6.7%) | 6 (6.0%) |
| Corticosteroid: budesonide, n (%) | 6 (6.7%) | 7 (7.0%) |
| Biologics, n (%) | 37 (41.1%) | 46 (46.0%) |
| Biologics: infliximab, n (%) | 20 (22.2%) | 27 (27.0%) |
| Biologics: infliximab BS, n (%) | 1 (1.1%) | 1 (1.0%) |
| Biologics: adalimumab, n (%) | 16 (17.8%) | 18 (18.0%) |
| Biologics: ustekinumab, n (%) | 1 (1.1%) | 1 (1.0%) |
| Biologics: vedolizumab, n (%) | 0 (0.0%) | 0 (0.0%) |

*SD; standard deviation.

Fig 2. Details of medical chart review.

(A) Details of medical chart review for Cohort-A. 83% of cases were confirmed as Crohn’s disease (CD) (confirmed diagnosis at own institution or at another hospital). 7% were considered suspected diagnoses. Cases negative for CD (10%) included infectious enterocolitis (n = 4), intestinal Behçet’s disease (n = 2), drug-induced enterocolitis (n = 1), intestinal tuberculosis (n = 1), unspecified intestinal stenosis (n = 1), and cirrhosis (n = 1). (B) Details of medical chart review for Cohort-B. 97% of cases were confirmed CD. 3% were considered suspected diagnoses. None of the cases were negative for CD. *a; Confirmed diagnosis at own institution, b; Diagnosed by an IBD specialist or gastroenterologist in another hospital, c; Diagnosed by a primary care physician (with a description of the findings supporting the diagnosis), d; Diagnosed by a primary care physician (without a description of the findings supporting the diagnosis), CD; Crohn’s disease.

Inter- and intra-rater reliability

The inter- and intra-rater reliability are shown in Table 4. The weighted kappa coefficient was 0.9903 for inter-rater reliability and 0.9948 for intra-rater reliability, suggesting that the diagnoses derived by medical record review were reliable.

Table 4. Inter- and intra-rater reliability of the medical chart review.

| | Kappa (95% CI) | Weighted Kappa (95% CI) | Gwet’s AC1 (95% CI) |
| --- | --- | --- | --- |
| Inter-rater reliability | 0.9634 (0.9136–1.0000) | 0.9903 (0.9768–1.0000) | 0.9784 (0.9481–1.0000) |
| Intra-rater reliability | 0.9816 (0.9457–1.0000) | 0.9948 (0.9845–1.0000) | 0.9892 (0.9678–1.0000) |
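For readers unfamiliar with these agreement statistics, the following sketch shows how Cohen's kappa and a weighted kappa can be computed for the three ordered review categories. The ratings below are invented for illustration and are not the study data. The use of linear disagreement weights is an assumption, since the weighting scheme is not stated. Gwet's AC1 is not covered here.

```python
from collections import Counter

CATEGORIES = ["confirmed", "suspected", "negative"]  # ordered review categories


def cohen_kappa(r1, r2, categories=CATEGORIES, weighted=False):
    """Cohen's kappa; with weighted=True, linear disagreement weights are used."""
    n = len(r1)
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}

    def disagreement(i, j):
        if weighted:
            return abs(i - j) / (k - 1)   # linear weights (assumed scheme)
        return 0.0 if i == j else 1.0

    pairs = Counter((idx[a], idx[b]) for a, b in zip(r1, r2))
    m1 = Counter(idx[a] for a in r1)      # marginal counts, rater 1
    m2 = Counter(idx[b] for b in r2)      # marginal counts, rater 2

    observed = sum(disagreement(i, j) * c for (i, j), c in pairs.items()) / n
    expected = sum(disagreement(i, j) * m1[i] * m2[j]
                   for i in range(k) for j in range(k)) / (n * n)
    return 1.0 - observed / expected


# Hypothetical ratings of ten charts by two reviewers (not study data)
rater1 = ["confirmed"] * 8 + ["suspected", "negative"]
rater2 = ["confirmed"] * 8 + ["negative", "negative"]

kappa = cohen_kappa(rater1, rater2)
kappa_weighted = cohen_kappa(rater1, rater2, weighted=True)
print(round(kappa, 4), round(kappa_weighted, 4))
```

The weighted statistic penalizes a confirmed versus negative disagreement more than a suspected versus negative one, which is why it tends to be higher than the unweighted kappa when most disagreements are between adjacent categories.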

Discussion

In this study, we developed, for the first time, algorithms to extract CD cases from the Japanese claims database by assessing the accuracy of claim codes against medical chart review.

For a disease with a low prevalence, such as CD, it is difficult to secure a sufficient number of cases from a single center. Murakami et al. reviewed the number of CD cases from various facilities, and the maximum number of cases at a single specialist center was approximately 320 [7], which is small compared with the 70,700 cases in Japan as a whole [22]. Another issue is that large-scale observational studies are usually conducted in specialist centers and rarely include the numerous non-specialized facilities, and thus may not reflect real-world practice. A large-scale study utilizing big data is therefore necessary to examine populations representing real-world practice. The insurance claims database is a useful resource and has been used in several important studies [23, 24].

Since Japan has a universal health insurance system and almost all citizens are enrolled, the Japanese claims database is a very useful resource for real-world data in database studies. In addition to the databases owned by the government (National Database), commercial databases from private companies are also available (JMDC, Medical Data Vision), which are under contract to different insurance payers, and which are used to conduct database research and to support hospital management by analyzing medical costs. The National Database is a public database that contains data supporting more than 1 billion claims, as well as data and information on specific legal health checkups and guidance [20]. The Diagnosis Procedure Combination (DPC) database holds medical information of inpatients from 1,730 DPC-registered hospitals captured in 2018. The JMDC database is a commercial database that contains claims data for up to 7.3 million insured individuals, which represents approximately 6.1% of the Japanese population between 2005 and April 2020 and includes some salaried employees and their families. A previous study extracted 150 CD cases treated with biologic agents from this database [15]. The Medical Data Vision database is a commercial database that contains data on about 29.8 million patients who received treatment from approximately 400 DPC hospitals in Japan between April 2008 and October 2019. According to a previous database study, about 75,000 CD and ulcerative colitis cases were registered [25]. The validation of our present study is based on claims filed from the medical provider independently of payers; therefore, it is expected to be applicable in any of the claims databases.

There have been many studies on other diseases using the various databases mentioned above. Some utilized prior validation studies [26, 27], but others did not [11, 28]. However, it is possible that a lack of validated algorithms may significantly reduce the reliability of each database study. It is, therefore, extremely important to develop a validated algorithm to extract target diseases from the relevant databases [29].

In fact, in a previous study, the PPV was often remarkably low (60%) for extractions with only a single disease code, while an acceptable PPV (82–91%) was achieved by using repeated detection of the disease code as the extraction protocol [30]. In this study, we found that extraction by diagnostic codes alone (Cohort-A) resulted in the inclusion of other diseases, such as infectious enterocolitis and Behçet’s disease, which suggests that extraction from claims data by ICD-10 code alone is not sufficient. The PPV for confirmed CD cases in Cohort-A was 83%, which did not reach the target PPV of 85% or higher that other studies have generally set [31]. However, Cohort-B, in which prescription codes were added, showed a markedly improved PPV of 97.0%. This is comparable to the PPVs for other diseases in Japan [19, 32, 33] and is therefore considered acceptable for general extraction algorithms. The number of cases extracted by IC-B and by IC-C, which additionally allowed surgical codes, was the same (n = 197). In other words, most cases that underwent a CD-related procedure or surgery were likely to receive the prescription code for CD at the same time, showing that there was little value in adding the procedure or surgery code.

Some algorithms have been used to extract IBD from other claims databases, such as the algorithm for the Korean National database, which achieved a PPV of approximately 98% by combining the ICD-10 codes, treatment with the incurable disease application code, and the number of hospital visits for IBD [34]. CD is one of the diseases included in the Intractable Disease Registry by the Ministry of Health, Labor and Welfare. However, a certain proportion of patients (20.0% of Cohort-A and 8.0% of Cohort-B) were not registered in the registry. This means that the Intractable Disease Registry may not necessarily reflect the real world.

Other algorithms that combine the Ninth Revision of the International Statistical Classification of Diseases and Related Health Problems (ICD-9) codes with the number of visits and hospitalizations have also reported good PPVs: 81–91% from the Veterans Affairs Health Care System and 94–98% from the Canadian claims database. Ananthakrishnan et al. [35] also reported a PPV of 98% by combining the ICD-9 codes, medical record information, and the complications of IBD for claims data from two tertiary referral hospitals. The results of this study are comparable to those of these studies.

We confirmed the extracted patient population in two additional ways. Inter- and intra-rater agreements of the chart review results were confirmed to ensure the reliability of the validation. In addition, the characteristics of the validated cohort in our study, in terms of sex ratio, Montreal classification, prior history of surgery, and treatment, were similar to those previously reported from other specialist centers [35–38].

This study has several limitations. First, although the algorithm developed in this study successfully demonstrated excellent PPV, it is important to note that the study was conducted at a single specialist center, where the prior probability of CD patients among all patients is likely to be much higher than that in the non-specialist centers. Therefore, it is possible that PPV was overestimated compared to real-world clinical practice. In addition, it is also unclear whether cases extracted from the claims database using our algorithm would represent real-world practice in the entire patient population. Further studies to validate our algorithm are warranted from various types of facilities, including non-specialist general hospitals and private clinics.
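The dependence of PPV on the prior probability of disease follows from Bayes' theorem. The short sketch below illustrates it. The sensitivity, specificity, and both prevalence settings are hypothetical values chosen only to show the direction of the effect; they are not estimates from this study.

```python
def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """PPV by Bayes' theorem for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical algorithm performance (not study estimates)
SENSITIVITY = 0.95
SPECIFICITY = 0.9995

# Hypothetical settings: a specialist center versus the general population
for label, prev in (("specialist center", 0.003), ("general population", 0.000556)):
    ppv = ppv_from_prevalence(SENSITIVITY, SPECIFICITY, prev)
    print(f"{label}: prevalence {prev:.6f}, PPV {ppv:.3f}")
```

With the same sensitivity and specificity, the PPV falls as prevalence falls, which is the mechanism behind the possible overestimation noted in the first limitation.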

Second, it is possible that IC-B inappropriately excludes true CD patients who have stopped medication and are no longer receiving prescriptions. Using the PPVs in this study, the numbers of true CD patients among those who met IC-A and IC-B before random sampling were estimated to be 211 (255 × 83%) and 191 (197 × 97%), respectively. In other words, IC-B might have excluded approximately 20 patients who had had no prescription for several years. If the aim of a study requires such patients to be included, IC-A should be used, with caution owing to its lower PPV. However, considering the disease behavior of CD, it is very rare for all treatment to be discontinued for several years once it has been prescribed.
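The estimate above is a simple product of the number of extracted cases and the PPV for confirmed CD. A minimal sketch of the arithmetic follows. Rounding down to whole patients is an assumption made here because it reproduces the reported figures.

```python
# Cases extracted before random sampling, and PPV (%) for confirmed CD
extracted = {"IC-A": 255, "IC-B": 197}
ppv_percent = {"IC-A": 83, "IC-B": 97}

# Integer arithmetic avoids floating-point surprises; // rounds down
estimated_true = {
    name: extracted[name] * ppv_percent[name] // 100 for name in extracted
}
possibly_excluded = estimated_true["IC-A"] - estimated_true["IC-B"]

print(estimated_true)
print(possibly_excluded)
```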

Third, although the sensitivity (100%), specificity (92–100%), and NPV (100%) shown in our study were excellent, the number of cases not meeting the inclusion criteria that would be needed to calculate these parameters accurately was approximately 370,000, given the actual prevalence of CD (55.6/100,000), and thus our sample size (200 cases) is too small. Therefore, the accuracy of these parameters cannot be assured, and the 2 × 2 tables with adjusted weights are also based on assumptions (S3 Table). However, PPV is generally considered the most important parameter for developing extraction algorithms. Moreover, the sample size required for the calculation of PPV is reported to be much smaller [39, 40]. Therefore, our algorithm is still likely to help appropriately define CD cases from the large-scale claims database.
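One way to see why the unweighted specificity depends on the sampling design is to scale the sampled counts back to the strata they were drawn from. The sketch below does this for Cohort-A (confirmed diagnosis). It is an illustration under loud assumptions: the weighting scheme shown is one plausible reading of "adjusted weights" and is not taken from S3 Table, and the function name is hypothetical. It assumes the 100 sampled criterion-positive cases represent all 255 cases meeting IC-A and the 200 sampled criterion-negative cases represent the remaining enrolled patients.

```python
def weighted_two_by_two(tp, fp, fn, tn, n_pos_population, n_neg_population):
    """Scale sampled 2x2 counts back to the strata they were drawn from."""
    w_pos = n_pos_population / (tp + fp)   # weight per sampled criterion-positive case
    w_neg = n_neg_population / (tn + fn)   # weight per sampled criterion-negative case
    return {
        "TP": tp * w_pos,
        "FP": fp * w_pos,
        "FN": fn * w_neg,
        "TN": tn * w_neg,
    }


total_enrolled = 82_898
met_ic_a = 255

# Sampled counts for Cohort-A, confirmed diagnosis (Table 2)
table = weighted_two_by_two(
    tp=83, fp=17, fn=0, tn=200,
    n_pos_population=met_ic_a,
    n_neg_population=total_enrolled - met_ic_a,
)
weighted_specificity = table["TN"] / (table["TN"] + table["FP"])

print({cell: round(value, 2) for cell, value in table.items()})
print(round(weighted_specificity, 4))
```

Because no false-negative case was observed among only 200 sampled negatives, the weighted sensitivity remains 100% by construction, which is exactly the estimate the authors caution cannot be assured.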

In conclusion, this study established an algorithm to extract CD from the Japanese claims database and will be of importance in future large-scale real-world studies using the claims database.

Supporting information

S1 Table. ICD-10 diagnostic code.

(XLSX)

S2 Table. Prescription codes and surgical codes for this study.

(XLSX)

S3 Table. 2 × 2 contingency tables for inclusion criteria and validation criteria.

The number listed is the actual number of validated cases, and the number in parentheses is the assumed number of cases in the entire Kitasato University Kitasato Institute Hospital, calculated based on the prevalence derived from all cases extracted in this study (82,898).

(XLSX)

S1 File

(DOCX)

Acknowledgments

The authors are grateful to Hiroki Kiyohara, Yuki Watanabe (Center for Advanced IBD Research and Treatment, Kitasato University Kitasato Institute Hospital), Takashi Tanaka, Katsuhiko Nagai (Japan Medical Data Center Co., Ltd.) for their assistance in this study.

Data Availability

All files are available from the GitHub database (https://github.com/HiromuMorikubo/pone2021).

Funding Statement

This study was funded by JMDC Inc. The funder provided support in the form of salaries for TN, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. There was no additional external funding received for this study. The specific roles of these authors are articulated in the author contributions section.

References

1. Matsuoka K., Kobayashi T., Ueno F., Matsui T., Hirai F., Inoue N., et al., Evidence-based clinical practice guidelines for inflammatory bowel disease. J Gastroenterol, 2018. 53(3): p. 305–353.
2. Feagan B.G., Sandborn W.J., Gasink C., Jacobstein D., Lang Y., Friedman J.R., et al., Ustekinumab as Induction and Maintenance Therapy for Crohn’s Disease. N Engl J Med, 2016. 375(20): p. 1946–1960. doi: 10.1056/NEJMoa1602773
3. Dahabreh I.J. and Kent D.M., Can the learning health care system be educated with observational data? JAMA, 2014. 312: p. 129–130. doi: 10.1001/jama.2014.4364
4. Ha C., Ullman T.A., Siegel C.A., and Kornbluth A., Patients enrolled in randomized controlled trials do not represent the inflammatory bowel disease patient population. Clin Gastroenterol Hepatol, 2012. 10(9): p. 1002–7; quiz e78. doi: 10.1016/j.cgh.2012.02.004
5. Ng S.C., Shi H.Y., Hamidi N., Underwood F.E., Tang W., Benchimol E.I., et al., Worldwide incidence and prevalence of inflammatory bowel disease in the 21st century: a systematic review of population-based studies. The Lancet, 2017. 390(10114): p. 2769–2778. doi: 10.1016/S0140-6736(17)32448-0
6. Vegh Z., Kurti Z., and Lakatos P.L., Epidemiology of inflammatory bowel diseases from west to east. J Dig Dis, 2017. 18(2): p. 92–98. doi: 10.1111/1751-2980.12449
7. Murakami Y., Nishiwaki Y., Oba M.S., Asakura K., Ohfuji S., Fukushima W., et al., Estimated prevalence of ulcerative colitis and Crohn’s disease in Japan in 2014: an analysis of a nationwide survey. J Gastroenterol, 2019. 54(12): p. 1070–1077. doi: 10.1007/s00535-019-01603-8
8. Kosa F., Kunovszki P., Borsi A., Ilias A., Palatka K., Szamosi T., et al., Anti-TNF dose escalation and drug sustainability in Crohn’s disease: Data from the nationwide administrative database in Hungary. Dig Liver Dis, 2020. 52(3): p. 274–280. doi: 10.1016/j.dld.2019.09.020
9. Bergmann M.M., Hernandez V., Bernigau W., Boeing H., Chan S.S., Luben R., et al., No association of alcohol use and the risk of ulcerative colitis or Crohn’s disease: data from a European Prospective cohort study (EPIC). Eur J Clin Nutr, 2017. 71(4): p. 512–518. doi: 10.1038/ejcn.2016.271
10. Sruamsiri R., Iwasaki K., Tang W., and Mahlich J., Persistence rates and medical costs of biological therapies for psoriasis treatment in Japan: a real-world data study using a claims database. BMC Dermatol, 2018. 18(1): p. 5. doi: 10.1186/s12895-018-0074-0
11. Uno S., Goto R., Suzuki K., Iwasaki K., Takeshima T., and Ohtsu T., Current treatment patterns and medical costs for multiple myeloma in Japan: a cross-sectional analysis of a health insurance claims database. J Med Econ, 2020. 23(2): p. 166–173. doi: 10.1080/13696998.2019.1686870
12. Sato H., Yokomichi H., Takahashi K., Tominaga K., Mizusawa T., Kimura N., et al., Epidemiological analysis of achalasia in Japan using a large-scale claims database. J Gastroenterol, 2019. 54(7): p. 621–627. doi: 10.1007/s00535-018-01544-8
13. Schwartz D.A., Tagarro I., Carmen Diez M., and Sandborn W.J., Prevalence of Fistulizing Crohn’s Disease in the United States: Estimate From a Systematic Literature Review Attempt and Population-Based Database Analysis. Inflamm Bowel Dis, 2019. 25(11): p. 1773–1779. doi: 10.1093/ibd/izz056
14. Kobayashi T., Udagawa E., Uda A., Hibi T., and Hisamatsu T., Impact of immunomodulator use on treatment persistence in patients with ulcerative colitis: A claims database analysis. J Gastroenterol Hepatol, 2020. 35(2): p. 225–232. doi: 10.1111/jgh.14825
15. Yokoyama K., Yamazaki K., Katafuchi M., and Ferchichi S., A Retrospective Claims Database Study on Drug Utilization in Japanese Patients with Crohn’s Disease Treated with Adalimumab or Infliximab. Adv Ther, 2016. 33(11): p. 1947–1963. doi: 10.1007/s12325-016-0406-6
16. Eindhoven D.C., van Staveren L.N., van Erkelens J.A., Ikkersheim D.E., Cannegieter S.C., Umans V., et al., Nationwide claims data validated for quality assessments in acute myocardial infarction in the Netherlands. Neth Heart J, 2018. 26(1): p. 13–20. doi: 10.1007/s12471-017-1055-3
17. Langner I., Ohlmeier C., Haug U., Hense H.W., Czwikla J., and Zeeb H., Implementation of an algorithm for the identification of breast cancer deaths in German health insurance claims data: a validation study based on a record linkage with administrative mortality data. BMJ Open, 2019. 9(7): p. e026834. doi: 10.1136/bmjopen-2018-026834
18. Nakayama T., Imanaka Y., Okuno Y., Kato G., Kuroda T., Goto R., et al., Analysis of the evidence-practice gap to facilitate proper medical care for the elderly: investigation, using databases, of utilization measures for National Database of Health Insurance Claims and Specific Health Checkups of Japan (NDB). Environ Health Prev Med, 2017. 22(1): p. 51. doi: 10.1186/s12199-017-0644-5
19. Ando T., Ooba N., Mochizuki M., Koide D., Kimura K., Lee S.L., et al., Positive predictive value of ICD-10 codes for acute myocardial infarction in Japan: a validation study at a single center. BMC Health Serv Res, 2018. 18(1): p. 895. doi: 10.1186/s12913-018-3727-0
  • 20.Ministry of Health, Labour and Welfare. https://www.mhlw.go.jp/english/index.html.
  • 21.Ikegami N., Yoo B.-K., Hashimoto H., Matsumoto M., Ogata H., Babazono A., et al., Japanese universal health coverage: evolution, achievements, and challenges. The Lancet, 2011. 378(9796): p. 1106–1115. [DOI] [PubMed] [Google Scholar]
  • 22.Japan Intractable Disease Information Center.; http://www.nanbyou.or.jp.
  • 23.Clayton J.L., Jones S.G., Dunn J.R., Schaffner W., and Jones T.F., Enhancing Lyme Disease Surveillance by Using Administrative Claims Data, Tennessee, USA. Emerg Infect Dis, 2015. 21(9): p. 1632–4. doi: 10.3201/eid2109.150344 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Paller A.S., Siegfried E.C., Vekeman F., Gadkari A., Kaur M., Mallya U.G., et al., Treatment patterns of pediatric patients with atopic dermatitis: A claims data analysis. J Am Acad Dermatol, 2020. 82(3): p. 651–660. doi: 10.1016/j.jaad.2019.07.105 [DOI] [PubMed] [Google Scholar]
  • 25.Kobayashi T., Uda A., Udagawa E., and Hibi T., Lack of Increased Risk of Lymphoma by Thiopurines or Biologics in Japanese Patients with Inflammatory Bowel Disease: A Large-Scale Administrative Database Analysis. J Crohns Colitis, 2020. 14(5): p. 617–623. doi: 10.1093/ecco-jcc/jjz204 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Yamana H., Moriwaki M., Horiguchi H., Kodan M., Fushimi K., and Yasunaga H., Validity of diagnoses, procedures, and laboratory data in Japanese administrative data. J Epidemiol, 2017. 27(10): p. 476–482. doi: 10.1016/j.je.2016.09.009 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Hara K., Tomio J., Svensson T., Ohkuma R., Svensson A.K., and Yamazaki T., Association measures of claims-based algorithms for common chronic conditions were assessed using regularly collected data in Japan. J Clin Epidemiol, 2018. 99: p. 84–95. doi: 10.1016/j.jclinepi.2018.03.004 [DOI] [PubMed] [Google Scholar]
  • 28.Izumi K., Morimoto K., Hasegawa N., Uchimura K., Kawatsu L., Ato M., et al., Epidemiology of Adults and Children Treated for Nontuberculous Mycobacterial Pulmonary Disease in Japan. Ann Am Thorac Soc, 2019. 16(3): p. 341–347. doi: 10.1513/AnnalsATS.201806-366OC [DOI] [PubMed] [Google Scholar]
  • 29.Benchimol E.I., Manuel D.G., To T., Griffiths A.M., Rabeneck L., and Guttmann A., Development and use of reporting guidelines for assessing the quality of validation studies of health administrative data. J Clin Epidemiol, 2011. 64(8): p. 821–9. doi: 10.1016/j.jclinepi.2010.10.006 [DOI] [PubMed] [Google Scholar]
  • 30.Hou J.K., Tan M., Stidham R.W., Colozzi J., Adams D., El-Serag H., et al., Accuracy of diagnostic codes for identifying patients with ulcerative colitis and Crohn’s disease in the Veterans Affairs Health Care System. Dig Dis Sci, 2014. 59(10): p. 2406–10. doi: 10.1007/s10620-014-3174-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Lacasse Y., Daigle J., Martin S., and Maltais F., Validity of chronic obstructive pulmonary disease diagnoses in a large administrative database. Can Respir J, 2012. 19: p. e5–9. doi: 10.1155/2012/260374 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Sato I., Yagata H., and Ohashi Y., The Accuracy of Japanese Claims Data in Identifying Breast Cancer Cases. Biol Pharm Bull, 2015. 38(1): p. 53–57. doi: 10.1248/bpb.b14-00543 [DOI] [PubMed] [Google Scholar]
  • 33.Ooba N., Setoguchi S., Ando T., Sato T., Yamaguchi T., Mochizuki M., et al., Claims-based definition of death in Japanese claims database: validity and implications. PLoS One, 2013. 8(5): p. e66116. doi: 10.1371/journal.pone.0066116 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Lee C.K., Ha H.J., Oh S.J., Kim J.W., Lee J.K., Kim H.S., et al., Nationwide validation study of diagnostic algorithms for inflammatory bowel disease in Korean National Health Insurance Service database. J Gastroenterol Hepatol, 2020. 35(5): p. 760–768. doi: 10.1111/jgh.14855 [DOI] [PubMed] [Google Scholar]
  • 35.Ananthakrishnan A.N., Cai T., Savova G., Cheng S.C., Chen P., Perez R.G., et al., Improving case definition of Crohn’s disease and ulcerative colitis in electronic medical records using natural language processing: a novel informatics approach. Inflamm Bowel Dis, 2013. 19(7): p. 1411–20. doi: 10.1097/MIB.0b013e31828133fd [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Yasukawa S., Matsui T., Yano Y., Sato Y., Takada Y., Kishi M., et al., Crohn’s disease-specific mortality: a 30-year cohort study at a tertiary referral center in Japan. J Gastroenterol, 2019. 54(1): p. 42–52. doi: 10.1007/s00535-018-1482-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Huang S., Li L., Ben-Horin S., Mao R., Lin S., Qiu Y., et al., Mucosal Healing Is Associated With the Reduced Disabling Disease in Crohn’s Disease. Clin Transl Gastroenterol, 2019. 10(3): p. e00015. doi: 10.14309/ctg.0000000000000015 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Peyrin-Biroulet L., Loftus E.V. Jr., Colombel J.F., and Sandborn W.J., The natural history of adult Crohn’s disease in population-based cohorts. Am J Gastroenterol, 2010. 105(2): p. 289–97. doi: 10.1038/ajg.2009.579 [DOI] [PubMed] [Google Scholar]
  • 39.Cutrona S.L., Toh S., Iyer A., Foy S., Cavagnaro E., Forrow S., et al., Design for validation of acute myocardial infarction cases in Mini-Sentinel. Pharmacoepidemiol Drug Saf, 2012. 21 Suppl 1: p. 274–81. doi: 10.1002/pds.2314 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Semins M.J., Trock B.J., and Matlaga B.R., Validity of administrative coding in identifying patients with upper urinary tract calculi. J Urol, 2010. 184(1): p. 190–2. doi: 10.1016/j.juro.2010.03.011 [DOI] [PubMed] [Google Scholar]

Decision Letter 0

Valérie Pittet

1 Jul 2021

PONE-D-21-09822

Development of algorithms for identifying patients with Crohn’s disease in the Japanese Health Insurance Claims Database

PLOS ONE

Dear Dr. Kobayashi,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Aug 15 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Valérie Pittet, PhD

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf.

2. Please note that PLOS ONE has guidelines on software sharing (https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-software). Accordingly, we encourage you to make the code for the algorithm described in your manuscript publicly available, if it has not been published in full previously.

3. Thank you for stating the following financial disclosure:

“This work was partly supported by JMDC Inc.. JMDC Inc. helped study design, data collection from claims data. There was no additional external funding received for this study. JMDC Inc. URL https://www.jmdc.co.jp/en/index”

At this time, please address the following queries:

a) Please clarify the sources of funding (financial or material support) for your study. List the grants or organizations that supported your study, including funding received from your institution.

b) State what role the funders took in the study. If the funders had no role in your study, please state: “The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

c) If any authors received a salary from any of your funders, please state which authors and which funders.

d) If you did not receive any funding for this study, please state: “The authors received no specific funding for this work.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

4. Your ethics statement should only appear in the Methods section of your manuscript. If your ethics statement is written in any section besides the Methods, please delete it from any other section.

Additional Editor Comments (if provided):


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This study investigates disease identification algorithms and their validation, with the aim of enabling descriptive statistics and analyses of an intractable disease, Crohn's disease (CD), using claims data. It is therefore a valuable study for analyzing large numbers of cases that cannot easily be obtained in an RCT. On the other hand, I think some of the findings you have introduced should be summarized and discussed in more detail. Please check the comments listed below:

(P6, L95) How does this study deal with patients who visited the facility, or were diagnosed with and treated for CD, before January 2015 and continued to visit after 2015? What are the reasons for setting this target period? If a patient was heavily prescribed before 2015 but received no prescription after January 2015, the patient would seem to meet inclusion criteria A but not inclusion criteria B under the established protocol of this study. From the viewpoint of the pathological condition, however, wouldn't it be appropriate to treat such a case as meeting inclusion criteria B? In CD, it is unlikely that the disease code "CD" will be repeatedly assigned to and removed from the same patient, but doctors often change the prescription status for the same CD patient during the disease course. The authors' view needs to be clarified, because whether patients fall under inclusion criteria B, and hence the results, will vary greatly depending on the study period and the interpretation of the medication history.

(P8, L124) Although registration information from the intractable disease application is used to confirm whether a case is really CD, it may be possible to judge this to some extent by checking for the legal number (first 2 digits) of the public funder number for patients with specified intractable diseases in the claims data (listed at "KO" code, f). Have you considered using the presence or absence of a public funder number in formulating inclusion criteria A?

(P10, L148) The number of cases that do not meet the inclusion criteria is 200; isn't this too small? Why did you choose 200 cases? Because the number is so small, the validity of the finding that there were no CD patients among the cases that did not meet the criteria could be questioned. If you wish to claim high sensitivity and high specificity, I think something is needed to make that claim more persuasive.

(S2 Table) The prescription codes you list (e.g., "2399009F1149", "2399009F2030") are individual drug codes, not codes used for insurance claims. It appears impossible to determine medication status from claims data using these codes directly.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Oct 13;16(10):e0258537. doi: 10.1371/journal.pone.0258537.r002

Author response to Decision Letter 0


4 Aug 2021

We truly appreciate the positive comments and insightful criticisms of the reviewers, especially because adding an accurate interpretation of the inclusion period has greatly strengthened our study.

Clearly the novelty and timeliness of this work were acknowledged. In a competitive field, we believe that PLOS ONE is the appropriate forum for publication of our algorithm for extracting CD patients from insurance databases and our evaluation of its accuracy. Changes in the revised manuscript are denoted in red text for ease of review.

Reviewer #1: This study investigates disease identification algorithms and their validation, with the aim of enabling descriptive statistics and analyses of an intractable disease, Crohn's disease (CD), using claims data. It is therefore a valuable study for analyzing large numbers of cases that cannot easily be obtained in an RCT. On the other hand, I think some of the findings you have introduced should be summarized and discussed in more detail. Please check the comments listed below:

We truly appreciate the favorable feedback. The responses to each comment have been highlighted.

(P6, L95) How does this study deal with patients who visited the facility, or were diagnosed with and treated for CD, before January 2015 and continued to visit after 2015? What are the reasons for setting this target period?

We agree with the reviewer. The observation period was set from 2015 to 2019 to maximize the dataset and thereby reduce the bias the reviewer describes. In Japan, insurance claims data must be stored for at least three years, so data older than that may not be available. Fortunately, when we initiated this study in 2019, our facility had retained claims data back to 2015, which allowed us to use the longest observation period available. An additional explanation has been added to the Methods section.

“The observation period was set as the maximum period for which insurance claims data were available at the study site.” Please confirm (Page 6, Lines 97-98).

If a patient was heavily prescribed before 2015 but received no prescription after January 2015, the patient would seem to meet inclusion criteria A but not inclusion criteria B under the established protocol of this study. From the viewpoint of the pathological condition, however, wouldn't it be appropriate to treat such a case as meeting inclusion criteria B? In CD, it is unlikely that the disease code "CD" will be repeatedly assigned to and removed from the same patient, but doctors often change the prescription status for the same CD patient during the disease course. The authors' view needs to be clarified, because whether patients fall under inclusion criteria B, and hence the results, will vary greatly depending on the study period and the interpretation of the medication history.

As the reviewer notes, patients with CD who received prescriptions before the study period but none during it would be excluded by inclusion criteria B (IC-B). Applying the PPVs from this study, the number of true CD patients is estimated at 211 for IC-A (PPV: 83%) and 191 for IC-B (PPV: 97%), which means that about 20 true CD patients may be missed by IC-B. How many of these met IC-B before the study period is unknown. However, given the disease behavior of Crohn's disease, treatment is rarely discontinued for several years once it has been started. We expect that such cases are often very mild CD or not CD at all. An additional explanation has been added to the Discussion section.
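The scenario raised by the reviewer can be illustrated with a minimal sketch of how a fixed observation window interacts with the two inclusion criteria. This is not the study's extraction code: the `Claim` record, the `classify` function, and the drug code `DRUG_X` are hypothetical, and `K50` (the ICD-10 category for Crohn's disease) stands in for the diagnosis codes actually listed in S1 Table.

```python
from dataclasses import dataclass
from datetime import date

# Observation window taken from the Methods (January 2015 to February 2019).
STUDY_START = date(2015, 1, 1)
STUDY_END = date(2019, 2, 28)

# Hypothetical placeholder; the real prescription codes are listed in S2 Table.
CD_DRUG_CODES = frozenset({"DRUG_X"})


@dataclass(frozen=True)
class Claim:
    service_date: date
    kind: str  # "diagnosis" or "prescription" (illustrative labels)
    code: str


def classify(claims, dx_prefix="K50", drug_codes=CD_DRUG_CODES):
    """Return IC-A / IC-B status using only claims inside the study window."""
    in_window = [c for c in claims if STUDY_START <= c.service_date <= STUDY_END]
    has_dx = any(c.kind == "diagnosis" and c.code.startswith(dx_prefix) for c in in_window)
    has_rx = any(c.kind == "prescription" and c.code in drug_codes for c in in_window)
    return {"IC-A": has_dx, "IC-B": has_dx and has_rx}


# Treated in 2014, then seen in 2016 with a diagnosis code but no prescription.
patient = [
    Claim(date(2014, 6, 1), "prescription", "DRUG_X"),
    Claim(date(2016, 3, 15), "diagnosis", "K50.9"),
]
result = classify(patient)
print(result)
```

For this hypothetical patient the sketch reports that IC-A is met and IC-B is not, because the only prescription falls outside the window.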

“Second, it is possible that IC-B may inappropriately exclude the true CD patients who have stopped medications and are no longer prescribed. Using the PPV in this study, the numbers of patients who met the IC-A and B before random sampling were estimated to be 211 (PPV 83%) and 191 (PPV 97%), respectively. In other words, IC-B might have excluded approximately 20 patients who had no prescription for several years. If the aim of the study requires to extract such patients together, IC-A should be used with caution to its low PPV. However, considering the disease behavior of CD, it is very rare that a whole set of treatment is discontinued for several years once it has been prescribed.” Please confirm (Page 20, Lines 279-285).
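The estimates of 211, 191, and approximately 20 follow from multiplying the number of extracted cases (255 for IC-A and 197 for IC-B, as reported in the Results) by the corresponding PPV. The short sketch below reproduces that arithmetic; truncating to whole patients is an assumption made here because it matches the reported figures, and the function name is illustrative.

```python
def estimated_true_cases(extracted: int, ppv_percent: int) -> int:
    """Extracted cases multiplied by PPV, truncated to whole patients."""
    return extracted * ppv_percent // 100


ic_a_true = estimated_true_cases(255, 83)  # diagnosis code alone
ic_b_true = estimated_true_cases(197, 97)  # diagnosis plus prescription codes
possibly_missed = ic_a_true - ic_b_true

print(ic_a_true, ic_b_true, possibly_missed)
```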

(P8, L124) Although registration information from the intractable disease application is used to confirm whether a case is really CD, it may be possible to judge this to some extent by checking for the legal number (first 2 digits) of the public funder number for patients with specified intractable diseases in the claims data (listed at "KO" code, f). Have you considered using the presence or absence of a public funder number in formulating inclusion criteria A?

Thank you for the suggestion. This study examined the registration rate for intractable disease applications among patients diagnosed with CD (Table 3), and a certain proportion had not completed an application (20.0% of cohort A and 8.0% of cohort B were not registered). Conversely, patients with KO codes are very unlikely to fail inclusion criteria B, because registration requires either moderate or severe disease or a certain level of annual medical expense. In other words, adding the intractable disease (KO) code to the inclusion criteria and using it in place of IC-B would not increase accuracy but would miss some CD patients.

This has been added to the Discussion section.

“CD is one of the diseases included in the Intractable Disease Registry by the Ministry of Health, Labor and Welfare. However, a certain proportion of patients (20.0% of cohort A and 8.0% of cohort B) were not registered in the registry. This means that Intractable Disease Registry may not necessarily reflect the real world.” (Page 18, Lines 255-258)
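The check suggested by the reviewer, testing whether the first two digits (the legal number) of a public funder number indicate intractable disease coverage, could be sketched as follows. The function name, the record layout, and the example numbers are hypothetical, and the prefix value "54" is an assumption used only for illustration; it should be verified against the current public payer code table before any real use.

```python
def has_funder_prefix(public_funder_numbers, legal_number):
    """True if any public funder number starts with the given 2-digit legal number."""
    return any(str(n).strip()[:2] == legal_number for n in public_funder_numbers)


# Assumed legal number for intractable disease coverage (illustrative only).
INTRACTABLE_LEGAL_NUMBER = "54"

registered = has_funder_prefix(["54136015"], INTRACTABLE_LEGAL_NUMBER)
unregistered = has_funder_prefix(["12345678", ""], INTRACTABLE_LEGAL_NUMBER)

print(registered, unregistered)
```

As the response above points out, a flag of this kind would miss the patients who never registered (20.0% of cohort A and 8.0% of cohort B), so it is better suited to a supplementary check than to a replacement for IC-B.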

(P10, L148) The number of cases that do not meet the inclusion criteria is 200; isn't this too small? Why did you choose 200 cases? Because the number is so small, the validity of the finding that there were no CD patients among the cases that did not meet the criteria could be questioned. If you wish to claim high sensitivity and high specificity, I think something is needed to make that claim more persuasive.

Thank you for this important comment. We agree with the reviewer that 200 cases not meeting the inclusion criteria are too few for an accurate assessment of sensitivity and specificity. However, as stated in the Methods section, the required number calculated from the prevalence of Crohn's disease is approximately 370,000, which is not feasible. Because the main purpose of the algorithm in this study is to accurately extract Crohn's disease patients from insurance claims data, we believe that PPV is the most important measure.

We revised the discussion section.

“the sample size considered sufficient to accurately calculate these parameters was calculated as 37,000, considering the actual prevalence of CD (55.6/100,000), and thus our sample size (200 cases) is too small”

(Page 20, Lines 287-289)
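A rough calculation shows why 200 cases cannot support sensitivity or specificity estimates at this prevalence. The sketch below is not the authors' sample size calculation; it only uses the quoted prevalence of 55.6 per 100,000 as a proxy to compute how many CD cases a random sample would be expected to contain, and the function names are illustrative.

```python
import math

PREVALENCE_PER_100K = 55.6  # CD prevalence quoted in the revised Discussion


def expected_cases(sample_size: int) -> float:
    """Expected number of CD patients in a random sample of the given size."""
    return sample_size * PREVALENCE_PER_100K / 100_000


def sample_size_for_expected_cases(target_cases: int) -> int:
    """Smallest sample size whose expected number of CD patients reaches the target."""
    return math.ceil(target_cases * 100_000 / PREVALENCE_PER_100K)


in_200 = expected_cases(200)
for_one_case = sample_size_for_expected_cases(1)

print(round(in_200, 4), for_one_case)
```

Under this assumption, a sample of 200 is expected to contain about 0.11 CD patients, and roughly 1,800 patients would be needed to expect even one, so finding no CD among 200 non-criteria cases carries little information.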

(S2 Table) The prescription codes you list (e.g., "2399009F1149", "2399009F2030") are individual drug codes, not codes used for insurance claims. It appears impossible to determine medication status from claims data using these codes directly.

Thank you for this important comment. We apologize for the confusion caused by the code list. We have revised the table to list the insurance claims codes instead of the individual drug codes (S2 Table).

In summary, on behalf of my co-authors I would like to again thank the Editors and Reviewers for their time and helpful comments. I look forward to further comments and I would be happy to provide any further clarifications.

Most sincerely,

Taku Kobayashi, MD., PhD.

Vice Director and Associate Professor

Center for Advanced IBD Research and Treatment

Kitasato University Kitasato Institute Hospital

5-9-1 Shirokane, Minato-ku, Tokyo 108-8642

+81-3-3444-6161

Email: kobataku@insti.kitasato-u.ac.jp

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Valérie Pittet

30 Sep 2021

Development of algorithms for identifying patients with Crohn’s disease in the Japanese Health Insurance Claims Database

PONE-D-21-09822R1

Dear Dr. Kobayashi,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Valérie Pittet, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Valérie Pittet

5 Oct 2021

PONE-D-21-09822R1

Development of algorithms for identifying patients with Crohn’s disease in the Japanese Health Insurance Claims Database

Dear Dr. Kobayashi:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

PD Dr. Valérie Pittet

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. ICD-10 diagnostic code.

    (XLSX)

    S2 Table. Prescription codes and surgical codes for this study.

    (XLSX)

    S3 Table. A 2 × 2 contingency tables for inclusion criteria and validation criteria.

    The number listed is the actual number of validated cases, and the number in parentheses is the assumed number of cases in the entire Kitasato Research Institute Hospital, calculated based on the prevalence calculated from all cases extracted in this study (82,898).

    (XLSX)

    S1 File

    (DOCX)

    Attachment

    Submitted filename: Response to Reviewers.docx

    Data Availability Statement

    All files are available from the GitHub database (https://github.com/HiromuMorikubo/pone2021).


    Articles from PLoS ONE are provided here courtesy of PLOS
