Table 5.
Inflammatory bowel disease
Country/Region | Database | Area of research | Sample size | Design, statistical methods and 3V | Application | Reference
South Korea | Korean Health Insurance Review and Assessment Service (HIRA) | UC | 11233 | Nationwide retrospective cohort study; comparator: general population; 3V: Volume, Velocity and Variety | Incidence and clinical impact of perianal disease in UC | Song et al[97], 2018
Taiwan, China | Taiwan National Health Insurance Database (NHID) | IBD | 38039 | Nationwide retrospective cohort study comparing IBD patients with the general population to derive SIR; hospital-based nested case-control study; 3V: Volume, Velocity and Variety | Association between IBD and herpes zoster infection | Chang et al[98], 2018
Sweden | Swedish Patient Registry | UC | 63711 | Nationwide retrospective cohort study; 3V: Volume, Velocity and Variety | Association between appendectomy and UC | Myrelid et al[99], 2017
Sweden | Swedish Medical Birth Register (child-mother link); Swedish Multigeneration Register (child-father link); Swedish Prescribed Drug Register; National Patient Register | IBD | 827239 children born between 2006 and 2013 | Nationwide prospective population-based register study; 3V: Volume, Velocity and Variety | Association between maternal exposure to antibiotics during pregnancy and very early onset IBD | Ortqvist et al[72], 2019
United States | NCBI Gene Expression Omnibus (GEO) | IBD | n.a. | Signature inversion study; 3V: Volume, Velocity and Variety | Topiramate as a potential therapeutic agent against IBD | Dudley et al[70], 2011
United States | n.a. | IBD | 1585 | Retrospective cohort study; natural language processing; 3V: Volume, Velocity and Variety | Association between arthralgia and biologics (anti-TNF vs vedolizumab) | Cai et al[20], 2018
n.a. | International IBD Genetics Consortium's Immunochip project | IBD | 53279 | Machine learning algorithm; 3V: Volume, Velocity and Variety | Predictors of IBD | Wei et al[64], 2013
United States | n.a. | IBD | 575 colonoscopy reports | Retrospective cohort study; natural language processing; 3V: Volume, Velocity and Variety | Differentiation of surveillance from non-surveillance colonoscopy | Hou et al[100], 2013
United States | n.a. | IBD | 1080 | Retrospective cohort study; Random Forest machine learning algorithm | Prediction of IBD remission in thiopurine users | Waljee et al[66], 2017
United States | n.a. | IBD | 20368 | Retrospective cohort study; Random Forest machine learning algorithm | Prediction of hospitalization and outpatient steroid use | Waljee et al[65], 2017
n.a. | Phase 3 clinical trial data | IBD | 491 | Retrospective cohort study; Random Forest machine learning algorithm; 3V: Volume, Velocity and Variety | Prediction of steroid-free endoscopic remission with vedolizumab in UC | Waljee et al[67], 2018
This list is not exhaustive; it provides a few distinct examples of how Big Data analysis can generate high-quality research outputs in gastroenterology and hepatology. 3V: Volume/velocity/variety; UC: Ulcerative colitis; IBD: Inflammatory bowel disease; SIR: Standardized incidence ratio; anti-TNF: anti-tumour necrosis factor.