PLOS Digital Health
. 2022 Nov 1;1(11):e0000036. doi: 10.1371/journal.pdig.0000036

From months to minutes: Creating Hyperion, a novel data management system expediting data insights for oncology research and patient care

Eric Snyder, Thomas Rivers, Lisa Smith, Scott Paoni, Scott Cunliffe, Arpan Patel, Erika Ramsdale*
Editor: Rutwik Shah
PMCID: PMC9931228  PMID: 36812590

Abstract

Here we describe the design and implementation of a novel data management platform for an academic cancer center which meets the needs of multiple stakeholders. A small, cross-functional technical team identified key challenges to creating a broad data management and access software solution: lowering the technical skill floor, reducing cost, enhancing user autonomy, optimizing data governance, and reimagining technical team structures in academia. The Hyperion data management platform was designed to meet these challenges in addition to usual considerations of data quality, security, access, stability, and scalability. Implemented between May 2019 and December 2020 at the Wilmot Cancer Institute, Hyperion includes a sophisticated custom validation and interface engine to process data from multiple sources, storing it in a database. Graphical user interfaces and custom wizards permit users to directly interact with data across operational, clinical, research, and administrative contexts. The use of multi-threaded processing, open-source programming languages, and automated system tasks (normally requiring technical expertise) minimizes costs. An integrated ticketing system and active stakeholder committee support data governance and project management. A co-directed, cross-functional team with flattened hierarchy and integration of industry software management practices enhances problem solving and responsiveness to user needs. Access to validated, organized, and current data is critical to the functioning of multiple domains in medicine. Although there are downsides to developing in-house customized software, we describe a successful implementation of custom data management software in an academic cancer center.

Author summary

Ensuring timely access to accurate data is critical for the functioning of a cancer center. Despite overlapping data needs, data are often fragmented and sequestered across multiple systems (such as the electronic health record, state and federal registries, and research databases), creating high barriers to data access for clinicians, researchers, administrators, quality officers, and patients. The creation of integrated data systems also faces technical, leadership, cost, and human resource barriers, among others. The University of Rochester Wilmot Cancer Institute (WCI) hired a small team of individuals with both technical and clinical expertise to develop a custom data management software platform addressing five challenges: lowering the skill level required to maintain the system, reducing costs, allowing users to access data autonomously, optimizing data security and utilization, and shifting technological team structure to encourage rapid innovation. We describe how this platform, Hyperion, was successfully designed, developed, and implemented at WCI. We offer an overview of the data architecture, provide insight into the design elements that address our identified challenges, and discuss the performance of the system in terms of cost, speed, and user engagement.

Background and significance

Academic cancer centers, particularly those embracing the Learning Healthcare Systems (LHS) model to dynamically generate and utilize high-quality evidence for patient decision making [1], require integration and maintenance of data systems offering intuitive access and manipulation of valid, ordered, and up-to-date knowledge informing clinical operations, clinical decision support, and research. Electronic health record (EHR) systems offer neither the data curation nor the user experience required to fully meet these needs. EHR systems were primarily designed to improve billing and revenue capture, requiring very different design decisions which often result in clunky, burdensome, and disorganized systems from the perspectives of many end-users. Moreover, useful data are not exclusively stored in a single location like the EHR, but across dozens of databases utilizing disparate (and often incompatible) technologies. Addressing the data needs of an academic cancer center introduces many challenges, including recruitment and retention of the appropriate technical expertise while adhering to already thin financial budgets. [2] Technical difficulties include seamless integration of multiple data sources, enhancement of user buy-in for the data system (including mitigation of technology burnout), and rapidly changing technical and data landscapes. [2] Leadership challenges implicate the dominant paradigm of vertical, clinician-centric decision-making: current organizational leadership structures may be ill-suited to devising technical data solutions requiring systems thinking and rapid adaptation/iteration. Importing organizational processes, systems thinking approaches, and technical domain expertise from other industries could help academic cancer centers around the country surmount many issues impeding data utilization.

Attempts to optimally balance data currency, access, validation, and integrity in healthcare typically involve data or research warehouses. [3,4] Given the disparate nature of the data sets to be merged, as well as the heightened security and privacy concerns involved in storing patient data, barriers include large capital outlays and the need to accommodate potentially competing design aims such as efficiency, timeliness of reports, user experience, data validity, and accuracy. [5] Different health systems, or even different groups within an individual health system, may prioritize different design aims, such that a one-size-fits-all technical solution is inadvisable and may obviate the option of purchasing a ready-made software solution. Even if a ready-made solution is available, the shifting data landscape could quickly make it obsolete; maintenance of data architecture, not only its initial build and deployment, requires significant ongoing technical skill and time. Throughout the processes of data architecture design and implementation, ongoing user feedback is critical, and clinical domain expertise is required at every step to maximize utility, comprehensiveness, and validity.

This paper discusses a novel design and successful implementation of a sophisticated data architecture to address the data needs of an academic cancer center. Although each center has individualized needs embedded in idiosyncratic circumstances, a few principles may be derived to guide other centers hoping to implement and maintain a customized data architecture that users can employ confidently, productively, and adaptively to facilitate rapid answers to quality and research questions and ultimately to improve patient outcomes at the point of care.

Materials and methods

Wilmot cancer institute

The James P. Wilmot Cancer Institute (WCI) at the University of Rochester Medical Center (URMC, a large tertiary care academic medical center) is the largest cancer care provider in upstate New York, with a catchment area of 3.4 million people across a 27-county region. Prior to mid-2019, patient and research data were distributed across multiple unconnected systems, including the EHR (Epic), Research Electronic Data Capture (REDCap) [6] databases, clinical trial management software, individually maintained disease databases, laboratory information management systems, shared resource databases, and billing applications. Additionally, useful data outside of the institution existed in a variety of formats, including behind web portals (e.g., clinicaltrials.gov), [7] in private company databases (e.g., externally performed molecular tumor testing and nonprofit organization public health databases), and within various externally maintained registries (e.g., New York State cancer registry). Gathering, merging, and validating data were tedious, time- and resource-intensive procedures performed largely manually, resulting in high expenditures for human abstractors and significant delays in implementing data insights. Data users lacked the ability to interact with real-time data; while static reports could be generated, the reliance on manual effort led to outdated, delayed reports. Intuitive and interactive data visualization was unavailable, limiting data exploration necessary to develop a research question or protocol, review clinical data, probe quality issues, or refine operations.

In 2019, a small informatics team was assembled to address WCI’s data challenges, consisting of two software architects with expertise in custom healthcare software, a business intelligence software developer, a project manager, and a clinician with expertise in data science and quantitative research methods. The initial primary aim was to improve data availability to WCI faculty and staff, but in collaboration with the data architects it was clear that other aims should be equally prioritized, including data security, access to near-real-time data, data validity, flexibility in integrating data sources, and a platform for interacting with and visualizing data. Between May 2019 and December 2020, a complete data management and analytics platform was built, iterated, and deployed for WCI users: the Hyperion Centralized Medical Analytics (Hyperion) platform. The design and development process was conducted using Project Management Body of Knowledge (PMBOK) best practices under the guidance of the project manager, including scoping, planning, identifying and meeting with stakeholders for each design phase, and communicating requested changes to the development team. [8]

Medical informatics challenges

Although well-maintained data warehouses are critical to the functioning of academic institutions, they do not address all informatics needs. All data management solutions share the common goal of consolidating data and ensuring its quality (ensuring, for example, data validity, availability, and completeness). In addition to meeting this goal for WCI, we sought to overcome five main challenges with the implementation of Hyperion. These challenges, abstracted from our iterative design process including input from all stakeholders, are distilled principles which could be considered across diverse implementations rather than a comprehensive description of discrete implementation challenges we faced.

Challenge 1: Lowering the skill floor

Setting up a data warehouse and maintaining complex interfaces, dashboards, and ad-hoc reporting often require significant time and large teams of information technology professionals. In one instance, developing a single-purpose, straightforward data management system to study antimicrobial resistance required two years and 4000 person-hours among four skilled personnel. [9] The personnel costs of creating such systems become the primary challenge in implementation. [10] Beyond deployment, data management systems must respond to constantly updating data sets and sources, as well as updated user needs; for optimal functioning, maintenance typically requires continued involvement of highly educated (and high-cost) professionals such as software developers and data architects to design changes, as well as a team of programmers to directly implement changes to the software code. Budgets for these activities can quickly become unsustainable. Resilient software design, automation of projected maintenance activities, and creation of interfaces to translate programming and auditing activities could potentially “lower the skill floor,” allowing a smaller team of non-experts to substitute for many of the activities of much larger (and more costly) information technology teams.

Challenge 2: Reducing the cost of technology

The technical architecture to achieve data storage and processing can be very expensive, whether it exists on-premise or in cloud-based systems. On-premise solutions have high start-up costs for processing and storage hardware. Operating a cloud-based system similar in size to Hyperion would be expected to cost about $35,000 per month, using Amazon Web Services as a benchmark. [11]

Challenge 3: Enhancing user autonomy

Data users exist on a spectrum of familiarity and sophistication with data manipulation, ranging from users who want simple, intuitive visual summaries to those capable of sophisticated analysis of raw data. A data system needs to be accessible and usable across the spectrum of potential users within WCI, ideally without the need for manual creation of curated datasets and visuals. Users should be able to autonomously access the data and tools they need to answer their own queries as much as possible.

Challenge 4: Optimizing data governance

Patient data are subject to laws and policies (such as provisions within the Health Insurance Portability and Accountability Act [HIPAA]) enforcing high scrutiny and standards to maintain patient privacy rights. Integration of data governance policies and activities into the data management platform would facilitate adherence to the strict confidentiality guidelines of the health system. Additionally, a well-designed ticketing system can help users clarify their requests, ensure appropriate data usage, streamline ticket approval and completion, and permit ongoing user needs assessment.

Challenge 5: Changing technological team structure

The culture of academic medical centers often promotes top-down decision-making, prioritizing a particular paradigm centered around clinicians. Although clinicians bring invaluable perspective to the design and usage of data systems, they are not typically trained in (or necessarily aware of) the specific technical skillset required to design sophisticated data management systems. Leadership structures with clinicians or administrators at the “top” can hinder technical teams, limiting their autonomy to implement technical best practice decisions, and delaying even simple architectural and programming tasks. Flattening the decisional hierarchy and implementing a transdisciplinary team approach could optimize functioning and increase development speed, simulating the pace of industry teams.

Results

The custom-built data management solution for WCI, Hyperion, consists of a back-end relational Structured Query Language (SQL) database linked to a front-end interface platform accommodating multiple user types and tasks, including database administration, ticketing, reporting, and user dashboards. Table 1 compares Hyperion features to those of the most commonly available clinical data management software packages. Table 2 summarizes how Hyperion’s design addresses the five identified challenges above.

Table 1. Comparison of features: Hyperion versus other commercially available, commonly used clinical data management software systems.

Feature Hyperion Most-Used CDMs*
Custom Security X X
User Defined Roles X X
Security on all Individual Data Elements X  
Dozens of Built In Integrations (EHR/CTMS/XML/State APIS) X  
Clinical Trial Data Integrated X  
Clinical Operational Data Integrated X  
Outside Vendor Data Integration X  
Built in Analytics Packages X X
Ability to Use Outside Analytics (Tableau, QLIK, etc) X X
Ability to Embed Outside Analytics (Tableau, QLIK, etc) X  
Built in Learning Management Systems X  
Built in Data Governance Systems X  
Built in Document Management Systems X  
Automated Emailing of Reports X X
Federated Data X X
Plug and Play Custom Application Support (Write your own app) X  
Change Management Integration X  
Full System Auditing X X
Social Determinants of Health Integration X  
Built in Ticketing System X  
GeoSpatial Analysis Tools X  
Web Accessible/Mobile Accessible X X
Custom Data Fields X X

*CDM = Clinical Data Management software

Table 2. Design elements supporting identified challenges.

Challenge Design Element
Lowering Skill Floor Configuring a new data source for import is automated via a point-and-click graphical user interface (GUI)
Custom wizards embedded in GUI walk administrators through setup of data interfaces, new user profiles, and access privileges
Automatic parsing of imported database schema to inform wizard setup
Custom code to facilitate import, automatic renaming, and validation of EHR data
Automated dashboards for visually monitoring data validity and metrics
Custom wizard walks users through resolving data validation issues identified by HDM
Reducing Technology Cost Use of multi-threaded processing
Use of open-source programming languages for all code
Automatically matches addresses to national database using a “certainty percentage”, allowing users to specify a threshold and limit manual review
Enhancing User Autonomy Secure sandboxing with curated datasets
Customizable dashboards (see Table 3)
Custom geospatial software supporting user-specified map overlays (CANVAS)
User tracking of submitted tickets enabled in Hyperion
Optimizing Data Governance Centralized data governance model with decentralized execution by users (e.g. in sandboxes or dashboards) where feasible
All requests linked to user-generated tickets
Data governance policy embedded in Hyperion, which hard-stops review and signatures by users when updated, and stores documents with signatures
Robust Data Governance Committee which meets monthly and includes all key stakeholders
Pre-specified criteria and processes for approval, scoping, and management of tickets
Changing Team Structure Informatics team operates internally with flattened decisional hierarchy
Transdisciplinary team including technical, management, and clinical expertise
Formal authority for interaction with other institutional structures resides within a co-directed role (one technical and one clinical lead)
Recruitment and hiring processes value and emphasize diversity of identity, perspective, background, and training/skillset

Hyperion data manager

Hyperion’s “nervous system” is the Hyperion Data Manager (HDM), an interface and validation engine built utilizing the Python programming language. Via a front-end system utilizing a Flask-based graphical user interface (GUI), approved users access point-and-click tasks to import data sources, initiate and manage extract, transform, and load (ETL) procedures, create sandboxes, copy table structures, re-run interfaces, and check for errors. A built-in custom wizard permits users to set up data interfaces without any programming knowledge for most data sources; bidirectional interfaces can be easily configured for all common data formats and models (HL7 FHIR, XML, API etc.).

For integrating any new data source, HDM reads the database schema for the new data source and presents approved administrative users with a field listing. Via the GUI, users can select what to import into the Hyperion platform, create table names, rename data elements, and select a data update interval. HDM creates a scheduled interface pull at the selected interval and can accommodate near-real-time intervals. HDM can read in the most utilized database technologies such as Microsoft’s SQL Server and Oracle, and it also supports token authentication, which is utilized by multiple governmental data sources and common medical database software products such as REDCap. For EHR data, HDM has custom code to simplify and semantically define tables and interpret names for commonly used fields. It translates field names into more readable form based on clinical naming conventions and uploads data directly at the specified intervals. This streamlines table design and reduces time delay for downstream needs including ad-hoc report development, sandbox creation, and full-scale applications. As all data tables are normalized and explicitly defined within our system, data reusability across the system is also facilitated.
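The schema-reading step can be illustrated with a minimal sketch (HDM’s actual implementation is not published; table and field names here are hypothetical, and SQLite stands in for the production database engines):

```python
import sqlite3

def list_importable_fields(conn, table):
    """Read a table's schema and return (name, type) pairs,
    analogous to the field listing HDM presents to administrators."""
    cursor = conn.execute(f"PRAGMA table_info({table})")
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    return [(row[1], row[2]) for row in cursor.fetchall()]

# Hypothetical source table standing in for an external data source
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pt_enc (pt_id INTEGER, enc_dt TEXT, dx_cd TEXT)")
fields = list_importable_fields(conn, "pt_enc")
```

From a listing like this, a wizard can offer per-field import, renaming, and scheduling choices without the user writing any SQL.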

Ease of use combined with robust procedures to ensure data quality has permitted rapid integration of multiple data sources for a variety of clinical, quality, operational, and research purposes (Fig 1). To increase oversight of data validity and currency, HDM incorporates a complete validation and metrics monitoring package for data uploaded to Hyperion. HDM stores and displays metrics including number of new records per interface, timing and number of updates, and deletions or modifications to records since the prior interface update. Automated dashboards allow non-expert users to monitor metrics (Fig 2); the dashboard displays are interactive, allowing point-and-click activities (e.g., adding or removing metrics to the display) and hovering over data points to get more information (see Figs A-C in S1 Text for further examples).

Fig 1. Current architecture of Wilmot Cancer Institute (WCI) data warehouse.

Fig 1

Fig 2. HDM monitoring dashboard.

Fig 2

HDM’s validation routines regularly evaluate for data consistency issues such as field and data type mismatches and extreme shifts in table sizes indicating mass data deletion or insertion. Some validation routines are specifically applied to certain data sources. For example, correct address information is critical for analyses or visualizations involving geospatial elements (Fig 3). Incoming address data are validated against a database of extant addresses, and a “certainty percentage” is calculated which permits a user-defined accuracy threshold for certifying the address as correct. Validation routines are linked to a wizard and programmers are not needed for review of their output. The wizard allows users to examine items or processes flagged as potentially incorrect by the validation routines, and it presents them with a point-and-click menu of options (e.g., flag for further review, manually correct item, or approve upload).
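The “certainty percentage” mechanism can be sketched as follows (illustrative only: HDM’s matching logic is not published, and a simple string-similarity ratio stands in for whatever production matcher is used against the reference address database):

```python
from difflib import SequenceMatcher

def address_certainty(incoming, canonical):
    """Return a 0-100 'certainty percentage' comparing an incoming
    address against a canonical address from a reference database."""
    a = " ".join(incoming.upper().split())
    b = " ".join(canonical.upper().split())
    return round(SequenceMatcher(None, a, b).ratio() * 100, 1)

def certify(incoming, candidates, threshold=90.0):
    """Pick the best candidate; auto-certify above the user-defined
    threshold, otherwise flag the record for manual review."""
    best = max(candidates, key=lambda c: address_certainty(incoming, c))
    score = address_certainty(incoming, best)
    status = "certified" if score >= threshold else "flag_for_review"
    return best, score, status

best, score, status = certify("601 elmwood ave",
                              ["601 ELMWOOD AVE", "901 ELM ST"])
```

The user-defined threshold is what limits manual review: only records below it are routed to the wizard for human inspection.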

Fig 3. CANcer Visual Analytic System (CANVAS) visual.

Fig 3

Map created using Leaflet with data from OpenStreetMap contributors.

To improve both computational and cost efficiency, HDM uses multi-threaded best practice approaches to parallelize processes and increase processing speed. Job duration calculations allow for efficient scheduling of multiple processor cores prior to task execution. Utilizing the combination of scheduling plus multi-threaded processing increased speed by 50–77% compared to non-optimized methods. Cost efficiency is further optimized using open-source technology (e.g., Python/Flask) for all code. Hyperion imports millions of rows of data every few hours with a total implementation cost of $1,500 for hardware and software (in addition to already existing enterprise licensure costs). This design also facilitates future implementation of analytic pipelines, including machine learning pipelines.
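The scheduling-plus-parallelism approach can be sketched with Python’s standard library (a simplified illustration; job names and the longest-first heuristic are assumptions, not HDM’s actual scheduler):

```python
from concurrent.futures import ThreadPoolExecutor

def run_interfaces(jobs, workers=4):
    """Order interface jobs longest-first using their estimated
    durations (a common load-balancing heuristic), then execute
    them in parallel. `jobs` is a list of
    (name, estimated_seconds, callable) tuples."""
    ordered = sorted(jobs, key=lambda j: j[1], reverse=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(job): name for name, _, job in ordered}
        return {name: f.result() for f, name in futures.items()}

# Hypothetical interface jobs with duration estimates in seconds
results = run_interfaces([
    ("ehr_pull", 120, lambda: "ok"),
    ("redcap_pull", 30, lambda: "ok"),
    ("registry_pull", 60, lambda: "ok"),
])
```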

HDM sandboxes offer researchers temporary, partitioned access to curated datasets. Researchers can autonomously handle and analyze their data within a secure environment. Access is completely separated from other Hyperion architecture, and access privileges have an associated, pre-specified timeframe. Upon lapse of access privileges, HDM will terminate access and archive and lock the data.

Hyperion front end

The front-end platform also utilizes open-source programming technology (HTML5 and JavaScript) to limit cost and align with common programming skillsets. Via a secure web portal, each user has access to a curated set of dashboards and activities based upon their access profile. Dashboards are highly interactive and customized to user groups based on iterative feedback. Users can view and securely sign documents (such as the data governance policy) and submit requests and ideas via an integrated ticketing system.

Hyperion administration and security activities are accessible via the web portal and HTML5 interface for approved users; point-and-click tools enable addition of new users and assignment of access privileges. The system has been tested and functions with all major web browsers. System administrators can view application utilization data, including granular data by user and by clicks. In conjunction with real-time data monitoring, this allows all data use to be precisely tracked and audited.

Hyperion’s analytics-processing framework enables real-time analytics across any data element in the system, including those accessed via Application Programming Interfaces (APIs, e.g., imaging, pathology, and molecular/genomic data). Process efficiencies as described above permit rapid turnaround of ad-hoc reports, and scheduled reports are automatically generated and delivered at set intervals. A key principle guiding design of the user experience, however, prioritizes user autonomy in data access, analysis, and visualization. To this end, multiple customized dashboards permit users to directly interact with curated datasets and visualizations; Table 3 gives examples of some dashboards developed for various user groups in WCI.

Table 3. Example user dashboards.

See supplementary material for sample visualizations.

Dashboard Content Users
Physician - Individual physician panel data
    ○ Demographics
    ○ RVU metrics
    ○ Referral patterns
    ○ Common medications prescribed
    ○ Clinical trial accruals
- Physicians (access to own data only)
Clinical Trials - Clinical trial accrual metrics, in total, by disease groups, and by trial
- Color coding for quick visual inspection of trials above or below accrual targets
- Clinical Trials Office
- Investigators
Nursing - Patient and appointment numbers (clinic + infusion):
    ○ Total
    ○ By location
    ○ By day of week and hour of day
    ○ By regimen, with acuity score
- Appointment time metrics
    ○ Average time per appointment
    ○ Percent on-time appointments
    ○ Length of appointment by hour of day and day of week
- Patient demographics
- Numerous point-and-click data filters available
- Nurses
- Nursing managers
- Infusion center staff
- Clinical operations staff
Shared Resources - Usage tracking (rates and hours) for all WCI shared resources (e.g. biostatistics, genomics)
- Resource membership list tracking
- Shared resource leadership
- Administrators
Pharmacy - Tracks timing of antineoplastic therapy order process steps and lab draws
- Visual displays, by provider, of where process may be impacting therapy delivery time
- Pharmacists and pharmacy staff
CANVAS - Interactive map of catchment area, with configurable map overlays and animations using point-and-click functionality - All Hyperion users
Quality - Mortality rates with demographic breakdown
- Visual and numeric tracking of hospital admissions and antineoplastic therapy administration within 14 and 30 days of death overall and by disease grouping
- Readmission data
- Clinical decision tracking (therapy on-pathway, off-pathway, clinical trial)
- Point-and-click filter functionality
- Leadership
- Quality officers

Cross-referencing and validation of patient addresses (described above) as well as integrated data crosswalks between various geographic levels (e.g., zip code and census tract levels) facilitate geospatial visualization and analysis at any geographic level specified by the user. These capabilities enhance the ability to examine data stratified by area within the WCI catchment area, map changes over time, or link to other datasets to analyze disparities across the region (e.g., nutritional access or socioeconomic disadvantage). Hyperion’s CANcer Visual Analytic System (CANVAS) allows users to create regional map overlays for data elements such as patient visits, diagnostic categories, and clinical trial accruals which can be toggled on/off or superimposed (Fig 3). Animations permit direct visualization of changes over time.
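A geographic crosswalk of this kind can be sketched as a re-aggregation step (the crosswalk fragment and even-split weighting below are hypothetical; production crosswalks such as HUD’s ZIP-to-tract files weight by population or address share):

```python
from collections import defaultdict

# Hypothetical crosswalk fragment: ZIP code -> census tracts it spans
ZIP_TO_TRACT = {
    "14620": ["36055008100", "36055008200"],
    "14642": ["36055009300"],
}

def counts_by_tract(zip_counts):
    """Re-aggregate ZIP-level patient counts to census-tract level,
    splitting each ZIP's count evenly across its tracts."""
    tract_counts = defaultdict(float)
    for zip_code, n in zip_counts.items():
        tracts = ZIP_TO_TRACT[zip_code]
        for t in tracts:
            tract_counts[t] += n / len(tracts)
    return dict(tract_counts)

result = counts_by_tract({"14620": 10, "14642": 4})
```

Once counts exist at the user-specified geographic level, they can be joined to other tract- or ZIP-level datasets for disparity analyses or rendered as map overlays.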

Data governance

Data governance processes follow a centralized data governance model with a decentralized execution standard: this allows centralized control and authority to reside with the informatics team but permits the individual users and groups to be able to execute queries and analytics on curated sandboxed data sets or dashboards. This method of governance ensures data reliability and consistency via a single data baseline, validated daily.

Upon initial login, Hyperion users are presented with the current WCI data governance policy, which they must electronically read and sign prior to interacting further with the system (Fig 4). The integrated governance platform requires new signatures at login when the policy is updated. All signatures are encrypted and stored with the documents in Hyperion. The platform can support multiple documents with multiple versions, allowing for customized documentation for different user types as required.
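The re-signature trigger can be sketched as binding each stored signature to a specific policy version (illustrative only: the key, signing scheme, and function names are assumptions, not Hyperion’s actual encryption design):

```python
import hashlib
import hmac

SERVER_KEY = b"hypothetical-server-secret"  # illustrative only

def sign_policy(user, policy_text):
    """Produce a keyed signature binding a user to a specific
    policy version, identified by its content hash."""
    policy_hash = hashlib.sha256(policy_text.encode()).hexdigest()
    return hmac.new(SERVER_KEY, f"{user}:{policy_hash}".encode(),
                    hashlib.sha256).hexdigest()

def needs_resign(user, stored_sig, current_policy):
    """At login, require a new signature when the stored signature
    no longer matches the current policy version."""
    return not hmac.compare_digest(stored_sig,
                                   sign_policy(user, current_policy))

sig = sign_policy("user1", "policy v1")
unchanged = needs_resign("user1", sig, "policy v1")  # policy unchanged
updated = needs_resign("user1", sig, "policy v2")    # policy updated
```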

Fig 4. Custom electronic signature platform presenting the data governance policy at login.

Fig 4

Hyperion has an integrated custom ticketing platform for users to submit requests for additional development, reporting, or other needs. This is the primary method for initial communication of needs to the WCI Informatics Team and the Data Governance Committee. This method limits user cognitive load, streamlines the development process, and facilitates automated tracking and reporting of requests. The ticketing platform collects all relevant information from users and populates a document for committee discussion, triage, and project management. Tickets are automatically routed to individuals on the governing committee for initial approval; if more information is needed, the committee member can submit questions back to the system via embedded email links. For items with limited time and scope requirements, the team can independently triage the request and place it appropriately within the project queue. Larger requests (e.g., >40 person-hours, high complexity, and/or multiple stakeholder groups affected) are flagged for committee discussion. Users can track the progress and actions on their individual tickets, including the ability to view key details entered by the informatics team (such as committee approvals, relevant dates, and requests for further information).
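The pre-specified routing criteria reduce to a simple decision rule, sketched here (field names are hypothetical; the >40 person-hour and multi-stakeholder thresholds follow the criteria described above):

```python
def triage(ticket):
    """Route a ticket per pre-specified criteria: large requests
    (>40 estimated person-hours or multiple stakeholder groups
    affected) go to committee discussion; the rest enter the
    team's project queue directly."""
    if ticket["person_hours"] > 40 or len(ticket["stakeholder_groups"]) > 1:
        return "committee_discussion"
    return "project_queue"

small = triage({"person_hours": 8,
                "stakeholder_groups": ["nursing"]})
large = triage({"person_hours": 120,
                "stakeholder_groups": ["quality", "pharmacy"]})
```

Encoding the rule in the ticketing platform itself is what lets small requests bypass the monthly committee meeting without losing the audit trail.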

The Data Governance Committee consists of the WCI Informatics team and representatives from clinical operations, quality, administration, and research, all of whom have separate leadership roles within their respective domains. The Committee meets monthly and determines overall mission and priorities, discusses and triages large project requests, and reviews data usage and security.

Security

Hyperion is only accessible via a secure networked computer or institutional virtual private network (VPN). Hyperion’s custom security module augments the institution’s Active Directory (AD) authentication and allows for user auditing and access control down to the individual application level. This permits user access to be customized by role and job function, and allows for the capture of full details on each user action, which Hyperion logs and monitors. Secure sandboxes are provided to users for data analytics purposes, without the ability to download data; this method of viewing and analyzing Hyperion reports is strongly encouraged. Otherwise, requested data documents are sent via either end-to-end encrypted email or secure file transfer protocol (SFTP). All documents able to be downloaded by the user include meta tags to identify the source of download, the individual user, and date/time stamps. In the event of a data consistency issue, the document can be compared to the logged SQL audit calls stored within Hyperion to ensure data was not altered. Request fulfillment for patient-identifiable data meets all institutional data security policies, including review by the Data Governance Committee.

Usage metrics

From January 2020 through December 2021, Hyperion processed >41 billion records. More than 174 million records are currently stored, and 791 million records have been updated since January 2020. Hyperion currently has 148 unique users (52% physicians, 21% nursing, 17% IT, 10% administrators), who accessed an average of 27 pages per day through December 2021. Table 4 shows the most accessed dashboards. There are currently 25 customized real-time updating dashboards in Hyperion, as well as 13 automated reports distributed throughout the week to various departments and individuals via automated email. The average turnaround time for an ad-hoc reporting request is 3 days; the average time to deploy a new dashboard is about 4 weeks.

Table 4. Most accessed dashboards, January 2020 – December 2021.

Application Name Total Visits
Clinical Trials Dashboard 4690
Physician Dashboard 2821
CANVAS 1604
Nursing Dashboard 1586
Shared Resources 1098
Referrals 601

Discussion

Creating and maintaining complex, secure, and high-volume data warehouses, processing and assembling data views, and interfacing new data sources represent significant challenges for academic healthcare organizations, even those with adequate information technology (IT) staffing and budget. In typical practice, the team (or vendor) that creates the data warehousing software is distinct from the team tasked with maintaining it and supporting data access and visualization activities. Even after deployment of the data warehousing software, basic IT support for data warehousing typically includes interface developers, application developers, business analysts, Business Intelligence developers, security professionals, and help desk staff. Iterative adaptation of the software to meet changing data needs can be challenging or impossible, leading to the accrual of “workarounds” and technical debt.

Within WCI, we assembled a small, transdisciplinary team to develop a customized, adaptable, and scalable data management approach, supported by senior leadership and enterprise IT structures. Beginning with the definition and elaboration of key challenges we wished to address, we designed and built a modular and scalable software package addressing data storage, validation, and processing needs as well as data monitoring, access, and visualization. These structures were designed to permit rapid iteration and adaptation by a small but highly skilled technical team when needed, but to allow basic administration and continued maintenance to be performed by non-technical staff. The Hyperion database architecture, Hyperion Data Manager, security module, and governance modules were designed and deployed over a 6-month timeline, and the entire package can be maintained by a single full-time equivalent (FTE). The modular, extensible approach significantly reduces enhancement and update times. Limiting the continuous need for high-level technical skillsets to maintain software and data integrity frees a smaller team to work “at the top of their cognitive skillsets”, iterating solutions in response to user feedback and creatively generating new solutions to complex data problems in WCI. This approach maximizes cost-effectiveness in addition to overall efficiency. Furthermore, Hyperion aligns with FAIR data principles (findability, accessibility, interoperability, and reusability) [12].

Hyperion has high user uptake, with many faculty members, staff, administrators, and others logging in daily to independently access data for clinical, operational, administrative, and research purposes. Although much of the data in Hyperion originates within the institutional EHR, it has previously been sequestered within individual patient records and only slightly more accessible to automated extraction methods than paper charts. Hyperion makes validated, curated, organized, near-real-time datasets automatically accessible and easily explorable via an interactive suite of analysis and visualization tools. Quality initiatives, program development, physician decision-making, clinical trial planning and management, research (grant applications and peer-reviewed analyses), clinical operations management, and more are all supported within a single management platform.

In addition to design and build of the software elements, implementation of Hyperion required careful consideration and design of support structures to address data governance, security, and project management. Intrinsic software elements such as the data governance policy tracking, security audits, and the sophisticated ticketing system are embedded in a set of processes overseen by a supporting committee structure consisting of key stakeholders and meeting at least monthly. Users are not competing with requests external to WCI for attention, and two-way communication is streamlined and efficient due to the embedded nature of the informatics team.

Beyond embedding a responsive team within and among the software users, the composition and functioning of the team are substantial revisions to the usual model of academic IT teams. A co-directed model shares formal authority for external interactions between a technical expert with extensive background in healthcare IT, and a practicing clinician with training in data science and informatics. Internally, the team is structured with a flattened decisional hierarchy, a deliberate emphasis on diversity of opinion and perspective, and a rapidly iterative approach to problem solving adapted from industry. Projects are managed through a combination of agile-based approaches and more traditional project management philosophies such as milestone phase gating. Although this structure is not able to completely shield the team from institutional bureaucracy, it has helped to create and sustain space for innovation and creativity within a traditionally cautious and even inert system.

There are several potential disadvantages to our strategy for addressing data needs within an academic cancer center. Academic medical centers may not be able or willing to support in-house software development for patient data, relying instead on outsourced software to guarantee robustness, security, and technical support. In the current market, the technical skillsets required to achieve customized software solutions for managing patient data are rare and expensive, and achieving buy-in to budget for these positions may be very challenging. We have mitigated personnel costs with a smaller team composed of individuals with cross-functional skillsets, but this poses its own difficulty: there is limited redundancy within the system to accommodate team member absences or departures. Although the software design offloads much of the technical maintenance, allowing it to be done by nontechnical personnel, the sustainability of the software still relies on a core highly skilled technical team. However, stable teams are also required to internally support many vendor products within the technical ecosystem.

In summary, Hyperion has surmounted large challenges in working with healthcare data to merge, organize, validate, and package data for use in multiple applications, including interactive user dashboards. Additional design considerations included lowering the skill floor for interaction with and maintenance of the software, reducing costs, and encouraging user autonomy. Development of data governance and other support structures, as well as discussions about team functioning and structure, occurred in parallel with the software build. Future work includes turning our attention to further supporting data analytics, including machine learning (ML). An ML pipeline is being developed to allow users to explore their data using advanced techniques while also receiving hands-on education about the potential and pitfalls of these methods. Novel data visualization methods, including augmented and virtual reality (AR/VR), are also in initial development and testing. Hyperion provides a flexible, reliable, and scalable foundation for the development of responsive and innovative applications to support the mission and goals of an academic cancer center.

Supporting information

S1 Text

Fig A. Nursing dashboard. Fig B. Example of Provider Dashboard landing page, with interactive features (hover-over pop-up text and ability to click on bar graphs to “drill down” on data). Fig C. Clinical Trial dashboard main page, with interactive features.

(DOCX)

Acknowledgments

We acknowledge the continued support of the Wilmot Cancer Institute’s leadership in encouraging innovation, as well as the support of the physicians, nurses, and staff.

Data Availability

All data are in the manuscript and/or supporting information files.

Funding Statement

ER is supported by the National Cancer Institute (K08CA248721) and the National Institute on Aging (R03AG067977). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Institute of Medicine Roundtable on Evidence-Based Medicine. In: Olsen L, Aisner D, McGinnis JM, editors. The Learning Healthcare System: Workshop Summary. Washington (DC): National Academies Press (US); 2007.
  • 2. Chaudhry B, Wang J, Wu S, Maglione M, Mojica W, Roth E, et al. Systematic review: impact of health information technology on quality, efficiency, and costs of medical care. Ann Intern Med. 2006;144(10):742–52. Epub 20060411. doi: 10.7326/0003-4819-144-10-200605160-00125
  • 3. Lyman JA, Scully K, Harrison JH Jr. The development of health care data warehouses to support data mining. Clin Lab Med. 2008;28(1):55–71, vi. doi: 10.1016/j.cll.2007.10.003
  • 4. Sheta OE-S, Eldeen AN. Building a health care data warehouse for cancer diseases. arXiv preprint arXiv:1211.4371. 2012.
  • 5. Ewen EF, Medsker CE, Dusterhoft LE. Data warehousing in an integrated health system: building the business case. Proceedings of the 1st ACM international workshop on Data warehousing and OLAP; Washington, D.C., USA: Association for Computing Machinery; 1998. p. 47–53.
  • 6. Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research electronic data capture (REDCap)—a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009;42(2):377–81. Epub 20080930. doi: 10.1016/j.jbi.2008.08.010; PubMed Central PMCID: PMC2700030.
  • 7. ClinicalTrials.gov. U.S. National Library of Medicine; [July 11, 2022]. Available from: https://clinicaltrials.gov/.
  • 8. PMBOK Guide, 6th ed. Newtown Square, PA: Project Management Institute, Inc.; 2017.
  • 9. Wisniewski MF, Kieszkowski P, Zagorski BM, Trick WE, Sommers M, Weinstein RA. Development of a clinical data warehouse for hospital infection control. J Am Med Inform Assoc. 2003;10(5):454–62. Epub 20030604. doi: 10.1197/jamia.M1299; PubMed Central PMCID: PMC212782.
  • 10. Rubin DL, Desser TS. A data warehouse for integrating radiologic and pathologic data. J Am Coll Radiol. 2008;5(3):210–7. doi: 10.1016/j.jacr.2007.09.004
  • 11. Amazon Web Services. AWS Pricing Calculator. Accessed November 12, 2021, from https://calculator.aws/.
  • 12. Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data. 2016;3(1):160018. doi: 10.1038/sdata.2016.18
PLOS Digit Health. doi: 10.1371/journal.pdig.0000036.r001

Decision Letter 0

Laura Sbaffi, Rutwik Shah

15 Jun 2022

PDIG-D-22-00095

From months to minutes: creating Hyperion, a novel data management system expediting data insights for oncology research and patient care

PLOS Digital Health

Dear Dr. Ramsdale,

Thank you for submitting your manuscript to PLOS Digital Health. Based on reviewer comments and questions, we feel that it has merit but does not fully meet PLOS Digital Health's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by 7/11/2022. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at digitalhealth@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pdig/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Rutwik Shah, MD

Guest Editor

PLOS Digital Health

Journal Requirements:

1. Please amend your detailed Financial Disclosure statement. This is published with the article. It must therefore be completed in full sentences and contain the exact wording you wish to be published.

a. State the initials, alongside each funding source, of each author to receive each grant.

b. State what role the funders took in the study. If the funders had no role in your study, please state: “The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

c. If any authors received a salary from any of your funders, please state which authors and which funders.

2. Please ensure that the funders and grant numbers match between the Financial Disclosure field and the Funding Information tab in your submission form. Note that the funders must be provided in the same order in both places as well.

3. Please update your Competing Interests statement. If you have no competing interests to declare, please state: “The authors have declared that no competing interests exist.”

4. Please provide a complete Data Availability Statement in the submission form, ensuring you include all necessary access information or a reason for why you are unable to make your data freely accessible. If your research concerns only data provided within your submission, please write "All data are in the manuscript and/or supporting information files" as your Data Availability Statement.

5. Please provide separate figure files in .tif or .eps format and ensure that all files are under our size limit of 10MB.

For more information about how to convert your figure files please see our guidelines:

https://journals.plos.org/digitalhealth/s/figures

6. All figures and supporting information files will be published under the Creative Commons Attribution License (creativecommons.org/licenses/by/4.0/). Authors retain ownership of the copyright for their article and are responsible for third-party content used in the article.

Figure 3: please (a) provide a direct link to the base layer of the map used and ensure this is also included in the figure legend; (b) provide a link to the terms of use / license information for the base layer. We cannot publish proprietary or copyrighted maps (e.g. Google Maps, Mapquest) and the terms of use for your map base layer must be compatible with our CC-BY 4.0 license.

If your map was obtained from a copyrighted source please amend the figure so that the base map used is from an openly available source. Alternatively, please provide explicit written permission from the copyright holder granting you the right to publish the material under our CC-BY 4.0 license.

Please note that the following CC BY licenses are compatible with PLOS license: CC BY 4.0, CC BY 2.0 and CC BY 3.0, meanwhile such licenses as CC BY-ND 3.0 and others are not compatible due to additional restrictions.

If you are unsure whether you can use a map or not, please do reach out and we will be able to help you. The following websites are good examples of where you can source open access or public domain maps:

* U.S. Geological Survey (USGS) - All maps are in the public domain. (http://www.usgs.gov)

* PlaniGlobe - All maps are published under a Creative Commons license so please cite “PlaniGlobe, http://www.planiglobe.com, CC BY 2.0” in the image credit after the caption. (http://www.planiglobe.com/?lang=enl)

* Natural Earth - All maps are public domain. (http://www.naturalearthdata.com/about/terms-of-use/)

Please upload any written confirmation as an 'Other' file type. It must clarify that the copyright holder understands and agrees to the terms of the CC BY 4.0 license; general permission forms that do not specify permission to publish under the CC BY 4.0 will not be accepted. Note that uploading an email confirmation is acceptable.

7. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article's retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments (if provided):


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Does this manuscript meet PLOS Digital Health’s publication criteria? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes

--------------------

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #2: N/A

Reviewer #3: N/A

--------------------

3. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

--------------------

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS Digital Health does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

--------------------

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: General comments

=============

This paper highlights the design and implementation of an in-house customized data management software in an academic medical center involving a small, cross-functional team. This is an excellent practical project and seems to work well for a specific context. However, for digital health and health informatics, the methods and measures are unclear and don’t have adequate rigor. I am also unsure about the generalizability of the study findings

Specific comments

=============

Major comments

---------------------

1. Some of the challenges that have been identified are very specific to a context. It would be helpful to address the generalizability of these challenges to diverse cancer settings (academic and community centers). Also, what methods were used to determine these challenges? How did the authors ascertain that these were the “only” challenges?

2. Also, how were the design elements for each challenge developed. Replicability of these findings is difficult, and this study seems more like a solution to a contextual problem.

3. It is also unclear if any user feedback on ease of use was measured. Also, were the end-users involved in design and development of Hyperion? Please include more details. The authors have included usage metrics and these metrics would often cloud the overall usability and end-user preferences.

4. The authors make a case for a successful implementation of custom data management software but haven’t measured any implementation outcomes.

5. It is not clear what were the novel design ideas in data architecture.

6. It would be helpful to also provide a rationale on why an in-house customized data management software was required and provide some generalizable knowledge on how academic medical centers should take decisions for or against the development of in-house customized data management software. Much of the limitation section highlights all the issues on why academic medical centers are unwilling to support in-house software development for patient data and it could be that this is a “unique” situation. The generalizability of this study findings is what I am struggling most. This is an excellent practical project with great value to a specific context. However, it lacks rigor in methods, measures, and generalizability

Minor Comments

---------------------

1. Figure 5 is titled most accessed pages by end users through October 2021. Instead of merely presenting the pages from 1-6 ( clinical trial dashboard- Referrals), it would be helpful if the number of times each of these pages have been accessed per a specific time period or the average number for a defined time period.

Reviewer #2: This paper describes the design and implementation of a data management platform denominated Hyperion for an academic cancer center. The authors present all the features and benefits of using this interface in their center. The authors discuss the limitations of Hyperion in the discussion section and future work which looks promising.

I only have minor comments below.

References could be added to several statements in the manuscript. Regarding related work, the authors only mention one institution (reference 4). This could be extended to include more examples since the authors refer to “Some institutions”.

It would be interesting to visualize graphical examples of tasks or dashboards which are possible with Hyperion as described on the paper, as supplementary material.

Please check the references in the manuscript and include those currently after the dot before the final dot, examples in “while adhering to already thin financial budgets.[2]” and “skilled personnel.[5]”.

The first sentence in Data Governance section should be divided into at least 2 sentences.

In Usage metrics specify the start and end dates, instead of ‘through October 2021’, so that the reader understands the referred time period. Figure 5 legend should also include this information.

In Figure 1 please add the acronym meaning for your center WCI.

Figure 2 is missing y and x axis legends.

Reviewer #3: Hyperion seems to be a very useful system, bringing together data within a single cancer center. However the paper should be improved significantly. I have a number of suggestions:

Please follow the submission guidelines of the journal (https://journals.plos.org/digitalhealth/s/submission-guidelines):

- Abstract is conceptually divided into three sections (Background, Methodology/Principal Findings, and Conclusions/Significance), do not apply these distinct headings to the Abstract within the article file.

- Include a 150-200 word author summary

- Word count: according to the submission guidelines, there is no limit of 4000 as mentioned in your paper

The background information is quite limited. What are state-of-the-art Clinical Data Management (CDM) systems? How does Hyperion compare to these? It might be good to include a table with features, comparing current CDM systems.

The number of references (7) is very small. Sites/software such as REDCap, OnCore, clinicaltrials.gov should be referenced, using a scientific publication (e.g. REDCap: https://www.sciencedirect.com/science/article/pii/S1532046408001226).

It's surprising that a paper about a clinical data management platform does not discuss the FAIR Guiding Principles (https://www.nature.com/articles/sdata201618). How does Hyperion handle these four aspects?

The Hyperion front-end works with HTML5 and JavaScript. Was it tested in all major browsers (Chrome/Firefox/Edge/Safari/Opera)?

Probably the most difficult task in a custom-made CDM system is security. I miss the details on security in this paper. For example, does Hyperion support two-factor authentication?

Is there a way to access imaging data (e.g. DICOM) through Hyperion, for example in the Physician Dashboard? The same for genomics data, digital pathology data, etc.: does Hyperion contain a link to these raw data?

A screenshot of one of the other dashboards than just CANVAS would be useful, for example the Clinical Trials Dashboard or Physician Dashboard (dummy data could be used), since these are most accessed?

--------------------

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Marta Fernandes

Reviewer #3: Yes: Tim Hulsen

--------------------

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLOS Digit Health. doi: 10.1371/journal.pdig.0000036.r003

Decision Letter 1

Laura Sbaffi, Rutwik Shah

4 Sep 2022

From months to minutes: creating Hyperion, a novel data management system expediting data insights for oncology research and patient care

PDIG-D-22-00095R1

Dear Dr. Ramsdale,

We are pleased to inform you that your manuscript 'From months to minutes: creating Hyperion, a novel data management system expediting data insights for oncology research and patient care' has been provisionally accepted for publication in PLOS Digital Health.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email from a member of our team. 

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact digitalhealth@plos.org.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Digital Health.

Best regards,

Rutwik Shah, MD

Guest Editor

PLOS Digital Health
