Published in final edited form as: Cancer Epidemiol Biomarkers Prev. 2020 Feb 17;29(4):816–822. doi: 10.1158/1055-9965.EPI-19-0873

Leveraging Digital Data to Inform and Improve Quality Cancer Care

Tina Hernandez-Boussard 1,2,3, Douglas W Blayney 1,4, James D Brooks 4,5

Abstract

Background:

Efficient capture of routine clinical care and patient outcomes is needed at the population level, as is evidence on important treatment-related side effects and their effects on well-being and clinical outcomes. The increasing availability of electronic health records (EHRs) offers new opportunities to generate population-level, patient-centered evidence on oncological care that can better guide treatment decisions and patient-valued care.

Methods:

This study included patients seeking care at an academic medical center between 2008 and 2018. Digital data sources were combined to address the missingness, inaccuracy, and noise common to EHR data. Clinical concepts were identified and extracted from unstructured EHR data using natural language processing (NLP) and machine/deep learning techniques. All models were trained, tested, and validated on independent data samples using standard metrics.

Results:

We provide use cases for using EHR data to assess guideline adherence and quality measurements among cancer patients. Pretreatment assessment was evaluated through guideline adherence and quality metrics for cancer staging. Our studies in perioperative quality focused on medications administered and guideline adherence. Patient outcomes included treatment-related side effects and patient-reported outcomes.

Conclusions:

Advanced technologies applied to EHRs present opportunities to advance population-level quality assessment, to learn from routinely collected clinical data for personalized treatment guidelines, and to augment epidemiological and population health studies. The effective use of digital data can inform patient-valued care, quality initiatives and policy guidelines.

Impact:

A comprehensive set of health data analyzed with advanced technologies results in a unique resource that facilitates wide-ranging, innovative, and impactful research on prostate cancer. This work demonstrates new ways to use EHRs and technology to advance epidemiological studies and benefit oncological care.

Keywords: Electronic Health Records, Quality Measurement, Natural Language Processing, Machine Learning, Oncology

INTRODUCTION

Digital technology and a focus on the quality and value of care hold promise to transform the health care delivery system. Over twenty years ago, two startling reports from the Institute of Medicine (IOM) spotlighted patient safety and the quality of care delivery.1,2 Recommendations included the creation of evidence-based guidelines and quality measures and the use of electronic data collection systems. Federal agencies and oncology organizations responded, and these efforts spurred health care system improvement through the development and implementation of quality measurement tools and systems.3–7 However, the tools and systems developed reflected the data available for ‘high volume’ clinical analytics at that time – mainly insurance claims data. Claims data generally include encounter-level information on diagnoses, treatments, and billing, yet are limited in their capture of patient outcomes (particularly patient-reported outcomes), clinical decisions, patient values, and secondary diagnoses/problem lists.8,9 These data limitations resulted in an abundance of quality measures focused on processes of care,10 which have subsequently been linked to clinician burnout,11 and too few measures actually linked to improvements in patient outcomes, which matter most to patients and their caregivers, as well as to health care purchasers.12

Advances in informatics and the digitization of information further promised a positive transformation of health care. Initial adoption of informatics for patient care was slow,13 but the rapid uptake of tools and software for clinical care followed passage of the HITECH Act of 2009, which provided monetary incentives to implement electronic health record (EHR) systems.14 As a result, over 90% of US hospitals had a functioning EHR system by 2017.15 The EHR captures “real world” care: care that is longitudinal, multidisciplinary, occurs across settings, and is delivered to “all comers” – including patients who may not have been eligible for or included in randomized clinical trials (RCTs).16 EHRs provide a plethora of data to analyze, explore, and learn from to evaluate healthcare delivery from a different angle – an angle that includes patient values, shared decision-making, and patient-reported outcomes. However, because most EHR data exist as unstructured text, advanced biomedical informatics methodologies are needed to extract and organize this wealth of data so that it can be used to investigate healthcare delivery.8,17

Our laboratory has focused on using guidelines and evidence to measure the quality of care delivered, based on information extracted from electronic data systems, as suggested by the IOM reports. We have developed tools and methods to analyze the granular, longitudinal data available in EHRs. Our work takes advantage of new opportunities to improve the quality of healthcare delivery, to learn from routinely collected clinical data for personalized treatment guidelines, and to augment epidemiological and population health studies. We recognize that EHR data present several notable advantages over previously available data types (e.g., insurance claims data), including large population sizes, low research costs, and opportunities to access medical histories and track disease onset and progression. This opens possibilities for conducting relatively low-cost and time-efficient studies in routine clinical settings in which large, population-based research would otherwise be difficult or impossible to conduct. Here, we highlight opportunities for using EHR data to evaluate the quality of healthcare delivery among cancer patients. First, we describe an infrastructure for capturing and linking EHR data from a large population of patients in an academic setting. Second, we describe the types of tools and software needed to leverage a main component of EHRs – unstructured clinical narrative text. Third, we provide examples of using EHRs to assess pretreatment, treatment, and post-treatment aspects of care, based on endorsed or recommended quality measures that are rarely described in the literature. Last, we discuss how biomedical informatics can be used for population and epidemiological studies in cancer research.

MATERIALS AND METHODS

Construction of a data warehouse for patient-centered outcomes

Data Sources

Patients were identified in a clinical data warehouse.18,19 In brief, data were collected from Stanford Health Care (SHC), a tertiary-care academic medical center that uses the Epic EHR system (Epic Systems, Verona, WI), and managed in an EHR-based relational database, the clinical data warehouse. The clinical data warehouse contains structured data, including diagnosis and procedure codes, drug exposures, and laboratory results, as well as unstructured data, including discharge summaries, progress notes, and pathology and radiology reports. Structured data elements are mapped to standardized terminologies including RxNorm, SNOMED, International Classification of Diseases (ICD) 9 or 10 codes, and Current Procedural Terminology (CPT). The cohort included all patients seeking treatment at SHC between 2005 and 2018. Cancer patients were linked to the SHC cancer registry and to the California Cancer Registry (CCR) to gather additional information on tumors, treatments not administered at SHC, cancer recurrence, and survival. The CCR contains structured data about diagnosis, histology, cancer stage, treatment, and outcomes across multiple tumor types, incorporating data from health care organizations across California. We matched patients to CCR records using the name and demographic details of the EHR cohort (first name, last name, middle name, date of birth, and social security number when available). Patients were excluded if they had fewer than two clinical visits, as they were likely seeking second opinions rather than receiving treatment at our site. All studies received approval from the institution’s Institutional Review Board (IRB) and were conducted in accordance with recognized ethical guidelines.
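
As an illustration of this kind of deterministic record linkage, the minimal sketch below matches an EHR patient to registry candidates on social security number when available and otherwise on name plus date of birth. The record fields, normalization, and matching rules are simplified assumptions for illustration, not the actual warehouse schema or linkage algorithm.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass(frozen=True)
    class PatientRecord:
        # Illustrative fields only; not the actual warehouse schema.
        first_name: str
        last_name: str
        middle_name: Optional[str]
        date_of_birth: str          # ISO format, e.g. "1950-07-21"
        ssn: Optional[str] = None   # used only when available

    def normalize(name: Optional[str]) -> str:
        """Case-fold and trim so trivial formatting differences do not block a match."""
        return (name or "").strip().lower()

    def is_match(ehr: PatientRecord, ccr: PatientRecord) -> bool:
        """Deterministic match: SSN when both records have one, otherwise name plus date of birth."""
        if ehr.ssn and ccr.ssn:
            return ehr.ssn == ccr.ssn
        return (
            normalize(ehr.first_name) == normalize(ccr.first_name)
            and normalize(ehr.last_name) == normalize(ccr.last_name)
            and ehr.date_of_birth == ccr.date_of_birth
        )

    # Link one EHR patient to candidate registry records.
    ehr_patient = PatientRecord("John", "Doe", "Q", "1950-07-21")
    ccr_candidates = [PatientRecord("john", "doe", None, "1950-07-21")]
    matches = [c for c in ccr_candidates if is_match(ehr_patient, c)]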

Certain patient cohorts were also analyzed in the Veterans Health Administration (VHA). In the VHA cohort, data were obtained from the VA Corporate Data Warehouse (CDW), a national repository of data from several VA clinical and administrative systems, for the years 2009 through 2015. In the VHA, medication information was obtained using both the Bar Code Medication Administration data and the Decision Support System National Data Extract pharmacy dataset.20,21

Data Mining

To fully leverage the vast amounts of data in EHRs, we developed the infrastructure to capture and merge large heterogeneous datasets, developed methods to transform these disparate data into knowledge, and then used this knowledge to improve individual health and well-being: working with policy makers and stakeholders to inform guidelines, and working with clinicians to gain insights into pressing questions in clinical care and to understand how best to bring discoveries to the point of care. We have summarized our approach as the CAPTIVE infrastructure, which comprises three processes: Capture, Transform, and Improve (Figure 1).

Figure 1.

Our CAPTIVE infrastructure combines heterogeneous data sources and uses state-of-the-art technology to transform data into knowledge for care improvement.

Capture.

Patient cohorts are first identified in the EHR, which provides granular information on individuals’ healthcare encounters. In our database, EHRs are merged with other data sources, including RCTs,22,23 patient surveys,24 and data registries.18 This comprehensive set of knowledge resources can exponentially amplify the value of each linked semantic layer and helps address the missingness and noise often found in EHR data.

Transform.

The merged data are then transformed into knowledge through different algorithms, mappings, and validation series. We have demonstrated the feasibility of our data-mining workflow to extract accurate, clinically meaningful information from the EHR.25–28 The key to data extraction is to transform patient encounters into a retrospective longitudinal record for each patient and to identify cohorts of interest, known as clinical phenotyping,17 using structured and unstructured data. The custom extractors we develop range in complexity based on the types of data and the analytic methods required to identify and pull each variable at high fidelity. For example, we have developed several natural language processing (NLP) pipelines to populate our database with patient outcomes from unstructured EHR data, using traditional rule-based approaches27–29 and machine learning or deep learning approaches such as weighted neural word embeddings,26,27 which weight terms by their TF-IDF scores for term/document pairs and generate sentence-level vector representations of clinical notes. These algorithms accurately identify clinician documentation of patient outcomes, often focusing on patient-centered outcomes, and have high performance (F1 scores of 0.87–0.94).
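
A minimal sketch of the TF-IDF-weighted embedding idea is shown below: each note is represented as the TF-IDF-weighted average of its word vectors. The toy notes and random stand-in embeddings are assumptions for illustration; the published pipelines use pretrained clinical word embeddings and additional preprocessing.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer

    # Toy clinical notes; a pretrained embedding lookup (word -> vector) is assumed
    # to exist and is replaced here by a small random stand-in keyed by vocabulary word.
    notes = [
        "patient reports urinary incontinence after prostatectomy",
        "no incontinence reported at follow up visit",
    ]
    rng = np.random.default_rng(0)
    embedding_dim = 50

    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(notes)              # shape: (n_notes, n_terms)
    vocab = vectorizer.get_feature_names_out()
    word_vectors = {w: rng.standard_normal(embedding_dim) for w in vocab}

    def note_vector(note_index: int) -> np.ndarray:
        """TF-IDF-weighted average of word embeddings for one note."""
        row = tfidf[note_index].toarray().ravel()
        total_weight = row.sum()
        vec = np.zeros(embedding_dim)
        for term_index, weight in enumerate(row):
            if weight > 0:
                vec += weight * word_vectors[vocab[term_index]]
        return vec / total_weight if total_weight else vec

    # Sentence-level vectors that could feed a downstream outcome classifier.
    note_matrix = np.vstack([note_vector(i) for i in range(len(notes))])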

Improve.

The ultimate goal of our system is to learn from the data routinely collected in EHRs and to bring that evidence to the point of care. We focus on questions related to guideline adherence,30 patient-centered care,8 comparative-effectiveness analysis,31 and decision support.32 The application of our research at the point of care provides opportunities to improve patient care and patient outcomes.

RESULTS

Applications of CAPTIVE in assessing quality

We use our CAPTIVE system to assess the quality of care delivery using quality measures that have either been endorsed by federal agencies or have been proposed by clinical societies. Here, we report on our efforts that focus on quality measures that are unavailable in claims data yet are identified as important to both the patient and clinician.

Pretreatment Assessments

Receipt of Radionuclide Bone Scan for Staging.

The National Comprehensive Cancer Network (NCCN) and the American Urological Association (AUA) have set guidelines for obtaining radionuclide bone scans for clinical staging to better inform treatment decisions. These guidelines recommend that patients with advanced-stage and local/regional high-risk cancer receive a bone scan for staging purposes and that low-risk patients not receive a bone scan prior to treatment.33,34 Clinical features needed to appropriately classify patients into low- and high-risk categories are embedded in multiple data sources and scattered throughout EHRs. High-risk patients were defined by a combination of overall clinical stage, Grade Group (Gleason score), and pretreatment PSA values. Overall clinical stage was identified in two separate structured sources, the CCR and the EHR, and PSA values were identified from laboratory values in the EHR and the CCR. Next, we developed algorithms to assess adherence to guideline recommendations using several data sources in the EHR, including unstructured text (Figure 2). One particular challenge was that bone scans were often obtained at outside facilities and therefore were not identifiable from structured data in the EHR, such as a radiology report. Outside-facility bone scans were instead captured using NLP algorithms that scan physicians’ notes for documentation of the scans. This work demonstrated the utility of gathering multiple data sources, captured in diverse formats and settings, to assess both overuse and underuse of bone scans for cancer staging among prostate cancer patients. This pipeline can be implemented at the point of care, for example by providing reminders to providers to perform a test such as a bone scan. In addition, by gathering and presenting all of the information needed to guide bone scan decisions, these methods can be used to assess the need for, and performance of, guidelines (including those based solely on expert opinion) in special populations and clinical decision gray zones where evidence of guideline effectiveness is lacking.
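
A minimal sketch of this kind of adherence logic appears below, assuming simplified NCCN-style risk cutoffs and a boolean flag indicating whether any bone scan documentation (structured or NLP-derived) was found; the thresholds and rules are illustrative rather than the exact published criteria.

    def risk_group(clinical_t: str, grade_group: int, psa: float) -> str:
        """Coarse risk grouping from stage, Grade Group, and PSA; cutoffs are
        simplified, illustrative approximations of the guideline criteria."""
        stage = clinical_t.upper()
        if stage.startswith(("T3", "T4")) or grade_group >= 4 or psa > 20:
            return "high"
        if stage in {"T1A", "T1B", "T1C", "T2A"} and grade_group == 1 and psa < 10:
            return "low"
        return "intermediate"

    def bone_scan_adherent(risk: str, scan_documented: bool) -> bool:
        """Guideline-concordant use: a scan is expected for high-risk patients and
        avoided for low-risk patients; intermediate risk is treated as adherent here."""
        if risk == "high":
            return scan_documented
        if risk == "low":
            return not scan_documented
        return True

    # scan_documented would combine structured CPT/radiology evidence with the
    # NLP-detected mentions of outside-facility scans described above.
    print(bone_scan_adherent(risk_group("T3a", 4, 25.0), scan_documented=True))   # True
    print(bone_scan_adherent(risk_group("T1c", 1, 5.2), scan_documented=True))    # False (overuse)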

Figure 2.

Guideline adherence based on data extracted from billing codes and radiology reports in the health care system (CPT + radiology), augmented by reports of bone scans within providers’ unstructured text (NLP). Percentage of patients undergoing a bone scan, stratified by risk group according to the NCCN and AUA guidelines.35

Digital Rectal Examination for Prostate Cancer Clinical Staging.

The majority of prostate cancer cases are localized at diagnosis. A digital rectal examination (DRE) is used for clinical staging and pretreatment assessment and can suggest additional diagnostic imaging in patients with locally advanced disease.36 DRE performance is identified as an important quality metric in prostate cancer care,10,37 and most clinical guidelines include the DRE as part of a comprehensive pretreatment assessment.38,39 However, DRE results are often neither systematically recorded nor included in claims datasets. Therefore, we developed a rule-based NLP pipeline that uses routinely collected clinical text to automatically assess pretreatment DRE documentation.27 This NLP pipeline can accurately identify DRE documentation in the EHR (95% precision and 90% recall).27 In our system, 72% of prostate cancer patients had documentation of a DRE prior to initiation of therapy, and rates of documentation improved from 67% in 2005 to 87% in 2017. Of the documented DREs, over 70% were performed within the 6 months prior to treatment, as required for quality metric adherence. This pipeline can open new opportunities for scalable and automated extraction of other quality measures from unstructured clinical data.40
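
The sketch below conveys the flavor of such rule-based detection, pairing a DRE keyword pattern with a simple negation window; the patterns are illustrative assumptions, and the published pipeline uses a substantially richer rule set with section and context handling.

    import re

    # Hypothetical patterns for illustration only.
    DRE_PATTERN = re.compile(r"\b(digital rectal exam(?:ination)?|DRE)\b", re.IGNORECASE)
    NEGATION_PATTERN = re.compile(r"\b(no|not|without|declined|deferred|refused)\b[^.]{0,40}$",
                                  re.IGNORECASE)

    def dre_documented(note_text: str) -> bool:
        """Return True if the note contains a non-negated mention of a DRE."""
        for match in DRE_PATTERN.finditer(note_text):
            preceding = note_text[:match.start()][-60:]   # small window before the mention
            if not NEGATION_PATTERN.search(preceding):
                return True
        return False

    print(dre_documented("Digital rectal examination: prostate firm, no nodules."))  # True
    print(dre_documented("Patient declined digital rectal exam today."))             # False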

Treatment Assessments

Anesthesia Type.

The type of anesthesia administered during operative procedures can influence postoperative outcomes, particularly pain.41 However, this information is not readily available to most clinicians and researchers because it is captured and stored as unstructured data in the EHR. Despite evidence that the type of anesthetic influences postoperative pain, there are few quality metrics supporting best practices. Using a rule-based NLP pipeline we developed, we can accurately classify different types of anesthesia (general, local, and regional) based on features within the free text of operative notes (precision 0.88, recall 0.77).42 Using our algorithms, we found that regional anesthesia was associated with better pain scores compared with general and local anesthesia.43 This work provides population-level evidence on the use of different anesthesia types and their associations with pain, information that can guide clinical guidelines and quality metric development.
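
As a rough illustration of this type of classification, keyword rules like the following can flag anesthesia types mentioned in an operative note; the keyword lists are assumptions for illustration and are not the rules of the published classifier.

    import re

    ANESTHESIA_RULES = {
        "general": re.compile(r"\b(general anesthesia|endotracheal|GETA|LMA)\b", re.IGNORECASE),
        "regional": re.compile(r"\b(spinal|epidural|nerve block|regional anesthesia)\b", re.IGNORECASE),
        "local": re.compile(r"\b(local anesthesia|local infiltration)\b", re.IGNORECASE),
    }

    def classify_anesthesia(operative_note: str) -> list:
        """Return every anesthesia type whose keywords appear in the operative note."""
        return [label for label, pattern in ANESTHESIA_RULES.items() if pattern.search(operative_note)]

    print(classify_anesthesia("Patient underwent GETA; an epidural was placed for analgesia."))
    # ['general', 'regional']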

Multimodal Analgesia.

Regimens using multiple agents that target different pain-relieving mechanisms (“multimodal analgesia”) are associated with improved pain control and reduced postoperative opioid consumption.44,45 Current pain management guidelines recommend multimodal analgesia for postoperative pain.46,47 Using our EHR pipeline, we evaluated patients undergoing common surgeries associated with high pain, including thoracotomy and mastectomy, at Stanford University and the VHA from 2008 to 2015.20 Prescription and medication details are well captured in EHRs as structured data, and both EHR systems link prescription medications to RxNorm, which provides normalized drug names, including both generic and brand names.48 The models were developed and validated independently at Stanford University and then applied to the VHA dataset for external validation. While a majority of patients received a multimodal pain regimen at discharge, 20% were discharged with opioids alone (Figure 3). Moreover, the multimodal regimen at discharge was associated with lower pain levels at follow-up and lower all-cause readmissions compared with the opioid-only regimen, substantiating guideline recommendations for postoperative pain management in a real-world setting.20,21
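
A simplified sketch of classifying a discharge regimen as opioid-only versus multimodal is given below; the drug lists are illustrative, and a production pipeline would map RxNorm ingredient codes rather than match drug name strings.

    # Illustrative drug classes only.
    OPIOIDS = {"oxycodone", "hydrocodone", "morphine", "hydromorphone", "tramadol"}
    NON_OPIOID_ANALGESICS = {"acetaminophen", "ibuprofen", "naproxen", "gabapentin", "celecoxib"}

    def discharge_modality(discharge_meds) -> str:
        """Classify a discharge analgesic regimen as multimodal, opioid only, non-opioid, or none."""
        meds = {m.lower() for m in discharge_meds}
        has_opioid = bool(meds & OPIOIDS)
        has_non_opioid = bool(meds & NON_OPIOID_ANALGESICS)
        if has_opioid and has_non_opioid:
            return "multimodal"
        if has_opioid:
            return "opioid only"
        if has_non_opioid:
            return "non-opioid"
        return "no analgesic"

    print(discharge_modality({"Oxycodone", "Acetaminophen"}))  # multimodal
    print(discharge_modality({"Hydrocodone"}))                 # opioid only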

Figure 3.

Distribution of discharge drug modality in two diverse healthcare settings, 2008–2015.

Post-treatment Assessments

Global Mental and Physical Health.

Post-treatment assessments, particularly patient-reported outcomes (PROs), are difficult to capture and often missing from clinical research.10 Using our CAPTIVE system, the assessment of PROs is possible through systematic collection of the PROMIS Global survey at the Stanford Cancer Institute.49 The surveys were deployed into routine clinical workflows for oncology outpatients as follows: at the time of clinic appointments, patients were given a paper survey, which was transcribed directly into the EHR by a medical assistant. In May 2013, this process was supplemented by an electronic one, in which patients could access the survey through the EHR patient portal prior to an appointment. Approximately 75% of patients at the academic cancer center were enrolled in the EHR patient portal and could receive electronic reminders to complete a survey. If no survey was completed electronically, paper surveys were available at the time of the visit. We assessed 11,657 PROMIS surveys from breast (4,199) and prostate (2,118) cancer patients. Survey collection varied across important demographic and clinical subgroups; older patients and those with advanced disease completed disproportionately fewer surveys. Similarly, global mental and physical health varied by patient race and stage at diagnosis, with non-white patients and those with advanced disease scoring significantly lower in both global physical and mental health than their counterparts.49 We are now correlating these direct measures of PROs with other clinical outcomes, including disease status and treatment complications. However, our results also highlight shortcomings of survey-based assessments of PROs, since important populations can be missed. These findings identify areas for improvement within our center, where additional resources might be needed to improve survey capture.

Treatment-Related Side Effects.

Similar to PROs, treatment-related side effects are difficult to capture, and studies on these outcomes are often limited to costly, prospective, survey-based ascertainment. The CAPTIVE system allows automatic surveillance of clinical narrative text for treatment-related side effects. We demonstrate the opportunities our system facilitates through the investigation of urinary incontinence (UI), erectile dysfunction (ED), and bowel dysfunction (BD) following treatment for localized prostate cancer. First, we developed a rule-based NLP system to identify clinical documentation of UI following prostatectomy, with which we identified post-treatment improvements in the prevalence of UI and ED.8,28 Building on this system, we applied machine-learning methods to the clinical narrative text, which improved the accuracy of our algorithms in identifying positive and negated mentions of UI and BD, as well as documented discussions of UI and BD risk (F1 score of 0.86),26 all of which are recommended quality metrics for prostate cancer care. To assess the concordance between clinician and patient reporting of UI, we next used our CAPTIVE system to compare UI documentation in clinical narrative text with patients’ reporting of UI via a patient survey, the EPIC-26 (Expanded Prostate Cancer Index Composite),50 collected in a subset of our patients.51 Across all time points, the Cohen’s kappa agreement between EPIC-26 responses and the EHR was moderate (p<0.001). This level of agreement between patient surveys and provider notes suggests that our methods facilitate unbiased measurement of important PROs from routinely collected information in EHR clinician notes.52
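
For reference, the agreement statistic itself can be computed as in the small example below, which uses hypothetical paired labels rather than study data.

    from sklearn.metrics import cohen_kappa_score

    # Hypothetical paired labels: 1 = urinary incontinence documented/reported, 0 = not.
    ehr_nlp_labels = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]   # from clinician notes via NLP
    survey_labels  = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]   # from EPIC-26 responses

    kappa = cohen_kappa_score(ehr_nlp_labels, survey_labels)
    print(f"Cohen's kappa: {kappa:.2f}")   # 0.60 on this toy example (moderate agreement)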

DISCUSSION

A biomedical informatics approach to population epidemiology fills an unmet need in cancer research to improve upon quality measurement and begin to capture and report the features of delivered care that matter most to both the clinician and patient. We developed our CAPTIVE system based on routinely collected clinical information to assess important quality aspects of oncology care. Our system fuses the EHR with other digital data streams, transforms the raw data into knowledge, and uses this knowledge to improve and guide clinical care. Such an approach can enable a learning healthcare system, where information gathered from previous patients and encounters can be used to guide and improve clinical care. The extensive availability of EHR data offers unique and promising opportunities for the application of advanced biomedical informatics techniques to improve and guide clinical oncology.

The ability to capture patient-centered outcomes and patient symptoms or treatment-related side effects at a population level opens new paradigms for patient-valued care.

For each patient, health information such as the number and type of comorbid conditions, as well as socioeconomic, geographic, and other features that affect health care interactions, can be used to contextualize the health care trajectory. Novel methods that gather patient-centered outcomes outside of traditional surveys, such as direct capture of documented patient-clinician conversations, provide opportunities to overcome important biases. The capture of these outcomes through convenience surveys may be biased, as we have previously shown for PROMIS survey completion rates.49 High survey completion rates can be associated with a high number of appointments (suggesting patients were given more opportunities to complete at least one survey), which may bias responses toward sicker and higher-acuity patients.53 Furthermore, racial bias in survey completion rates has been well documented.54,55 This may reflect patient and/or staff behaviors and emphasizes the importance of efforts targeting minority groups in PRO initiatives.

The unbiased capture of patient-centered outcomes across populations is essential to provide precision care to oncology patients. Ultimately, the interface of disease and treatment features with patient-specific context can be used to personalize treatment decisions that incorporate patient values and aspects of care that are often difficult to capture from structured data, such as insurance claims. Such an approach is particularly important for patients with multiple available treatment options – such as in localized prostate and breast cancers, where patients must balance the risks and benefits of different treatments.

To successfully leverage the abundance of data held within an EHR system for epidemiological studies, advanced biomedical informatics tools are needed. Fortunately, computer science and engineering methodologies applied to medical data have progressed rapidly, and new technologies are emerging at a rapid pace. In our work, we apply a broad range of techniques and methodologies to fully utilize information stored in the EHRs.8,26–28,35 Our algorithms are developed by a multidisciplinary team, in which clinicians, epidemiologists, quality experts, and informaticians work closely together to identify the clinical terms commonly used to describe a concept in the medical record and to identify where and how these terms are stored in the EHR. When clinical concepts are more subjective (e.g., patient-reported outcomes such as pain, fatigue, and nausea) or when severity of illness is of interest, machine learning and deep learning tools may be required.56,57 While big data and EHRs provide new avenues of clinical data for research and population-based studies, these data are of little use if the appropriate tools and software are not developed within a multidisciplinary team.

As machine learning algorithms are adopted into routine clinical care, scientific rigor and generalizability are of the highest importance. Concerns are arising regarding the bias and equity of data-driven algorithms in healthcare,58 particularly because they are often trained on historical data that may be biased or include only non-representative populations, similar to what has been found for older clinical trials.16,59,60 Algorithms developed to predict patient outcomes in a predominantly white, affluent population may not produce accurate results in a different patient population. Furthermore, bias in training data could further accentuate health disparities.61 While the development of site-specific tools unlocks data within the EHR, these tools lack rigor and reproducibility if they are not extensively validated both internally and externally. Validation goes well beyond evaluating for problems of overfitting. Rather, it requires assessments of performance in a completely separate health care system, where the terminologies used by providers in clinical documentation might confound NLP algorithms developed in another system.62 Our use of VHA data to validate our SHC-derived multimodal analgesia findings is a useful example. External validation helps ensure that prediction models are applicable to diverse populations and represent general populations rather than the select patients who might be seen only at an academic facility. However, external validation is difficult due to resource, technology, privacy, and incentive limitations. To manage these shortcomings, transparency is needed on the data used to train and validate the models and on how representative the training data are of the broader population. This information can help prioritize research agendas, highlight populations under-represented in this wave of medical informatics, and inform policy around equitable healthcare development and practice.

A core aspect of a biomedical informatics approach to population-based cancer research is the ability to fuse diverse data to create a “tapestry” of information that can be used to learn from and predict patient trajectories.63 Registry data, such as the CCR and its associated Surveillance, Epidemiology, and End Results (SEER) data,64 provide the backbone of population epidemiology studies. However, these data provide only a skeleton of patient outcomes and have limited clinical information and knowledge about treatment decisions. By linking registry data with EHRs, the gaps in information can be filled and new knowledge generated. In our work, we are able to analyze population-level oncology data with added information on biomarkers, social determinants of health, quality of care, and patient-reported outcomes. By further linking these data to patient surveys, patient-generated data, and other environmental and social factors, an unprecedented tapestry of the patient’s journey through cancer care can be obtained, in which precision oncology can flourish and shared decision-making is facilitated.

In conclusion, advanced technologies applied to routinely collected EHR clinical data present an opportunity to advance population-level quality assessment. We have developed our CAPTIVE system to efficiently and accurately assess the quality of healthcare delivery for oncology patients. We focus on clinical guidelines and endorsed quality metrics that in the past have been underreported due to limited data availability. While we provide examples of potential use cases for our system, many more opportunities exist for knowledge discovery, improved patient outcomes, and the development of a learning healthcare system. The pipeline we have developed can be shared across systems and provides the groundwork for novel informatics-based epidemiological studies.

ACKNOWLEDGEMENTS

Funding

Tina M Hernandez-Boussard was awarded grant number R01HS024096 from the Agency for Healthcare Research and Quality. Tina M Hernandez-Boussard was awarded grant number R01CA183962 from the National Cancer Institute of the National Institutes of Health. The content of this work is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or of the Agency for Healthcare Research and Quality.

Footnotes

Conflicts of Interest: The authors declare no conflicts of interest.

Presentations at Meetings: A portion of this work was presented at the AACR Modernizing Population Sciences in the Digital Age meeting, February 21, 2019.

REFERENCES

1. Kohn L, Corrigan J, Donaldson M. To err is human: building a safer health system. USA: Institute of Medicine; 1999.
2. Simone JV, Hewitt M. Ensuring quality cancer care. National Academies Press; 1999.
3. Jacobson JO, Neuss MN, McNiff KK, et al. Improvement in oncology practice performance through voluntary participation in the Quality Oncology Practice Initiative. J Clin Oncol. 2008;26(11):1893–1898.
4. Neuss MN, Desch CE, McNiff KK, et al. A process for measuring the quality of cancer care: the Quality Oncology Practice Initiative. J Clin Oncol. 2005;23(25):6233–6239.
5. Agency for Healthcare Research and Quality. Patient Safety Indicators, PSI. Version 4.1b ed. Rockville, MD: Agency for Healthcare Research and Quality; 2011.
6. National Quality Forum. NQF-endorsed standards. http://www.qualityforum.org/Measures_List.aspx. Published 2010. Accessed December 10, 2016.
7. Weeks J. Outcomes assessment in the NCCN: 1998 update. National Comprehensive Cancer Network. Oncology (Williston Park). 1999;13(5A):69–71.
8. Hernandez-Boussard T, Tamang S, Blayney D, Brooks J, Shah N. New Paradigms for Patient-Centered Outcomes Research in Electronic Medical Records: An Example of Detecting Urinary Incontinence Following Prostatectomy. EGEMS (Wash DC). 2016;4(3):1231.
9. Elixhauser A, Steiner C, Harris DR, Coffey RM. Comorbidity measures for use with administrative data. Med Care. 1998;36(1):8–27.
10. Gori D, Dulal R, Blayney DW, et al. Utilization of Prostate Cancer Quality Metrics for Research and Quality Improvement: A Structured Review. Jt Comm J Qual Patient Saf. 2019;45(3):217–226.
11. Shanafelt TD, Dyrbye LN, West CP. Addressing Physician Burnout: The Way Forward. JAMA. 2017;317(9):901–902.
12. Rubin HR, Pronovost P, Diette GB. The advantages and disadvantages of process-based measures of health care quality. Int J Qual Health Care. 2001;13(6):469–474.
13. Shortliffe EH, Tang PC, Detmer DE. Patient records and computers. Ann Intern Med. 1991;115(12):979–981.
14. Blumenthal D. Launching HITECH. N Engl J Med. 2010;362(5):382–385.
15. Adler-Milstein J, Jha AK. HITECH Act Drove Large Gains In Hospital Electronic Health Record Adoption. Health Aff (Millwood). 2017;36(8):1416–1422.
16. Kennedy-Martin T, Curtis S, Faries D, Robinson S, Johnston J. A literature review on the representativeness of randomized controlled trial samples and implications for the external validity of trial results. Trials. 2015;16:495.
17. Banda JM, Seneviratne M, Hernandez-Boussard T, Shah NH. Advances in Electronic Phenotyping: From Rule-Based Definitions to Machine Learning Models. Annu Rev Biomed Data Sci. 2018;1:53–68.
18. Seneviratne MG, Seto T, Blayney DW, Brooks JD, Hernandez-Boussard T. Architecture and Implementation of a Clinical Research Data Warehouse for Prostate Cancer. EGEMS (Wash DC). 2018;6(1):13.
19. Thompson CA, Kurian AW, Luft HS. Linking electronic health records to better understand breast cancer patient pathways within and between two health systems. EGEMS (Wash DC). 2015;3(1):1127.
20. Desai K, Carroll I, Asch SM, et al. Utilization and effectiveness of multimodal discharge analgesia for postoperative pain management. J Surg Res. 2018;228:160–169.
21. Hernandez-Boussard T, Graham LA, Carroll I, et al. Perioperative opioid use and pain-related outcomes in the Veterans Health Administration. Am J Surg. 2019.
22. Hah J, Mackey SC, Schmidt P, et al. Effect of Perioperative Gabapentin on Postoperative Pain Resolution and Opioid Cessation in a Mixed Surgical Cohort: A Randomized Clinical Trial. JAMA Surg. 2018;153(4):303–311.
23. Hah JM, Sharifzadeh Y, Wang BM, et al. Factors Associated with Opioid Use in a Cohort of Patients Presenting for Surgery. Pain Res Treat. 2015;2015:829696.
24. Sturgeon JA, Darnall BD, Kao MC, Mackey SC. Physical and psychological correlates of fatigue and physical function: a Collaborative Health Outcomes Information Registry (CHOIR) study. J Pain. 2015;16(3):291–298.e291.
25. Hernandez-Boussard T, Kourdis P, Dulal R, et al. A natural language processing algorithm to measure quality prostate cancer care. J Clin Oncol. 2017;35(8):232.
26. Banerjee I, Li K, Seneviratne M, et al. Weakly supervised natural language processing for assessing patient-centered outcome following prostate cancer treatment. JAMIA Open. 2019;2(1):150–159.
27. Bozkurt S, Park JI, Kan KM, et al. An Automated Feature Engineering for Digital Rectal Examination Documentation using Natural Language Processing. AMIA Annu Symp Proc. 2018;2018:288–294.
28. Hernandez-Boussard T, Kourdis PD, Seto T, et al. Mining Electronic Health Records to Extract Patient-Centered Outcomes Following Prostate Cancer Treatment. AMIA Annu Symp Proc. 2017;2017:876–882.
29. Tamang SR, Hernandez-Boussard T, Ross EG, Gaskin G, Patel MI, Shah NH. Enhanced Quality Measurement Event Detection: An Application to Physician Reporting. EGEMS (Wash DC). 2017;5(1):5.
30. Magnani CJ, Li K, Seto T, et al. PSA Testing Use and Prostate Cancer Diagnostic Stage After the 2012 U.S. Preventive Services Task Force Guideline Changes. J Natl Compr Canc Netw. 2019;17(7):795–803.
31. Vorhies JS, Hernandez-Boussard T, Alamin T. Treatment of Degenerative Lumbar Spondylolisthesis With Fusion or Decompression Alone Results in Similar Rates of Reoperation at 5 Years. Clin Spine Surg. 2018;31(1):E74–E79.
32. Goodnough LT, Maggio P, Hadhazy E, et al. Restrictive blood transfusion practices are associated with improved patient outcomes. Transfusion. 2014;54(10 Pt 2):2753–2759.
33. Holmes JA, Bensen JT, Mohler JL, Song L, Mishel MH, Chen RC. Quality of care received and patient-reported regret in prostate cancer: Analysis of a population-based prospective cohort. Cancer. 2017;123(1):138–143.
34. Carlson RW, Allred DC, Anderson BO, et al. Breast cancer. Clinical practice guidelines in oncology. J Natl Compr Canc Netw. 2009;7(2):122–192.
35. Coquet J, Bozkurt S, Kan KM, et al. Comparison of orthogonal NLP methods for clinical phenotyping and assessment of bone scan utilization among prostate cancer patients. J Biomed Inform. 2019;94:103184.
36. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2019. CA Cancer J Clin. 2019;69(1):7–34.
37. Litwin MS, Steinberg M, Malin J, et al. Prostate Cancer Patient Outcomes and Choice of Providers: Development of an Infrastructure for Quality Assessment. Santa Monica, CA: RAND Corporation; 2000.
38. Mohler JL, Armstrong AJ, Bahnson RR, et al. Prostate Cancer, Version 1.2016. J Natl Compr Canc Netw. 2016;14(1):19–30.
39. Thompson I, Thrasher JB, Aus G, et al. Guideline for the management of clinically localized prostate cancer: 2007 update. J Urol. 2007;177(6):2106–2131.
40. Bozkurt S, Kan KM, Ferrari MK, et al. Is it possible to automatically assess pretreatment digital rectal examination documentation using natural language processing? A single-centre retrospective study. BMJ Open. 2019;9(7):e027182.
41. Ruppert V, Leurs LJ, Rieger J, et al. Risk-adapted outcome after endovascular aortic aneurysm repair: analysis of anesthesia types based on EUROSTAR data. J Endovasc Ther. 2007;14(1):12–22.
42. Nastasi AJ, Bozkurt S, Manjrekar M, Curtin C, Hernandez-Boussard T. A Rule-Based Natural Language Processing Pipeline for Anesthesia Classification from EHR Notes. Paper presented at: Academic Surgical Congress; September 2018; Houston, TX.
43. Chin KK, Carroll I, Desai K, et al. Integrating Adjuvant Analgesics into Perioperative Pain Practice: Results from an Academic Medical Center. Pain Med. 2019.
44. Maund E, McDaid C, Rice S, Wright K, Jenkins B, Woolacott N. Paracetamol and selective and non-selective non-steroidal anti-inflammatory drugs for the reduction in morphine-related side-effects after major surgery: a systematic review. Br J Anaesth. 2011;106(3):292–297.
45. Ong CK, Seymour RA, Lirk P, Merry AF. Combining paracetamol (acetaminophen) with nonsteroidal antiinflammatory drugs: a qualitative systematic review of analgesic efficacy for acute postoperative pain. Anesth Analg. 2010;110(4):1170–1179.
46. Dowell D, Haegerich TM, Chou R. CDC Guideline for Prescribing Opioids for Chronic Pain – United States, 2016. JAMA. 2016;315(15):1624–1645.
47. American Society of Anesthesiologists Task Force on Acute Pain Management. Practice guidelines for acute pain management in the perioperative setting: an updated report by the American Society of Anesthesiologists Task Force on Acute Pain Management. Anesthesiology. 2012;116(2):248–273.
48. Hernandez P, Podchiyska T, Weber S, Ferris T, Lowe H. Automated mapping of pharmacy orders from two electronic health record systems to RxNorm within the STRIDE clinical data warehouse. AMIA Annu Symp Proc. 2009;2009:244–248.
49. Seneviratne MG, Bozkurt S, Patel MI, et al. Distribution of global health measures from routinely collected PROMIS surveys in patients with breast cancer or prostate cancer. Cancer. 2019;125(6):943–951.
50. Wei JT, Dunn RL, Litwin MS, Sandler HM, Sanda MG. Development and validation of the expanded prostate cancer index composite (EPIC) for comprehensive assessment of health-related quality of life in men with prostate cancer. Urology. 2000;56(6):899–905.
51. Gori D, Banerjee I, Chung BI, et al. Extracting Patient-Centered Outcomes from Clinical Notes in Electronic Health Records: Assessment of Urinary Incontinence After Radical Prostatectomy. EGEMS (Wash DC). 2019;7(1):43.
52. Gori D, Banerjee I, Chung BI, et al. Extracting patient-centered outcomes from clinical notes in electronic health records: assessment of urinary incontinence after radical prostatectomy. EGEMS (Wash DC). 2019 (in press).
53. Weiskopf NG, Rusanov A, Weng C. Sick patients have more data: the non-random completeness of electronic health records. AMIA Annu Symp Proc. 2013;2013:1472–1477.
54. Jackson R, Chambless LE, Yang K, et al. Differences between respondents and nonrespondents in a multicenter community-based study vary by gender ethnicity. The Atherosclerosis Risk in Communities (ARIC) Study Investigators. J Clin Epidemiol. 1996;49(12):1441–1446.
55. Richiardi L, Boffetta P, Merletti F. Analysis of nonresponse bias in a population-based case-control study on lung cancer. J Clin Epidemiol. 2002;55(10):1033–1040.
56. Purushotham S, Meng C, Che Z, Liu Y. Benchmarking deep learning models on large healthcare datasets. J Biomed Inform. 2018;83:112–134.
57. Gensheimer MF, Henry AS, Wood DJ, et al. Automated Survival Prediction in Metastatic Cancer Patients Using High-Dimensional Electronic Medical Record Data. J Natl Cancer Inst. 2018.
58. Gianfrancesco MA, Tamang S, Yazdany J, Schmajuk G. Potential Biases in Machine Learning Algorithms Using Electronic Health Record Data. JAMA Intern Med. 2018;178(11):1544–1547.
59. Gijsberts CM, Groenewegen KA, Hoefer IE, et al. Race/Ethnic Differences in the Associations of the Framingham Risk Factors with Carotid IMT and Cardiovascular Events. PLoS One. 2015;10(7):e0132321.
60. Ferryman K, Pitcan M. Fairness in Precision Medicine. Data & Society; 2018.
61. Char DS, Shah NH, Magnus D. Implementing Machine Learning in Health Care - Addressing Ethical Challenges. N Engl J Med. 2018;378(11):981–983.
62. Hernandez-Boussard T, Monda KL, Crespo BC, Riskin D. Real world evidence in cardiovascular medicine: assuring data validity in electronic health record-based studies. J Am Med Inform Assoc. 2019.
63. Weber GM, Mandl KD, Kohane IS. Finding the missing link for big biomedical data. JAMA. 2014;311(24):2479–2480.
64. Howlader N, Noone A, Krapcho M. SEER stat fact sheets: prostate cancer. National Cancer Institute; 2017.
