Cancer Informatics. 2011 Aug 31;10:217–226. doi: 10.4137/CIN.S7845

Multicenter Breast Cancer Collaborative Registry

Simon Sherman 1,10, Oleg Shats 1,10, Elizabeth Fleissner 1, George Bascom 2, Kevin Yiee 3, Mehmet Copur 4, Kate Crow 5, James Rooney 6, Zubeena Mateen 7, Marsha A Ketcham 1, Jianmin Feng 1, Alexander Sherman 1, Michael Gleason 1, Leo Kinarsky 1,10, Edibaldo Silva-Lopez 8, James Edney 8, Elizabeth Reed 8, Ann Berger 9, Kenneth Cowan 1
PMCID: PMC3169352  PMID: 21918596

Abstract

The Breast Cancer Collaborative Registry (BCCR) is a multicenter web-based system that efficiently collects and manages a variety of data on breast cancer (BC) patients and BC survivors. This registry is designed as a multi-tier web application that utilizes Java Servlet/JSP technology and has an Oracle 11g database as a back-end. The BCCR questionnaires incorporate standards accepted in breast cancer research and healthcare. By harmonizing the controlled vocabulary with the NCI Thesaurus (NCIt) or Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT), the BCCR provides a standardized approach to data collection and reporting. The BCCR has recently been certified by the National Cancer Institute’s Center for Biomedical Informatics and Information Technology (NCI CBIIT) as a cancer Biomedical Informatics Grid (caBIG®) Bronze Compatible product.

The BCCR is aimed at facilitating rapid and uniform collection of critical information and biological samples to be used in developing diagnostic, prevention, treatment, and survivorship strategies against breast cancer. Currently, seven cancer institutions participate in the BCCR, which contains data on almost 900 subjects (BC patients and survivors, as well as individuals at high risk of developing BC).

Keywords: biomedical informatics, breast cancer, registry, caBIG® bronze compatible system

Introduction

Breast cancer (BC) is one of the most common cancers in American women, with over 192,000 estimated newly diagnosed cases of invasive BC and about 40,000 BC-related deaths in 2009.1 Incidence and mortality differ along racial lines.1

Small BC tumors are often asymptomatic, whereas larger tumors commonly present as a painless, palpable mass.1 Other, less common symptoms include breast pain and physical changes to the breast or nipple. The majority of patients with invasive BC are diagnosed with early-stage disease, and only 6% of patients present with metastatic disease.2

Declines in BC incidence since the year 2000 are attributed to fewer women choosing hormone replacement therapy,3 as recommended by the Women’s Health Initiative,4 and to decreased screening,5 which would otherwise detect more small, early-stage cancers. Invasive BC survival rates have improved, with a 5-year survival rate of 89% and a 15-year rate of 75%; however, these rates drop significantly with increasing stage or tumor size at diagnosis.1

Breast cancer survivors make up the largest diagnostic group among the 11.7 million Americans who are living after a diagnosis of cancer. Twenty-two percent of all survivors, including 41% of all female survivors, have a history of BC.6 Despite this prevalence, there is limited information about the long-term effects in BC survivors and the impact a diagnosis of BC has on the functioning and well-being of the survivor’s family members. Individual patients differ in the importance they place on the risks and benefits of adjuvant therapy. Quality of life (QOL) needs to be evaluated to examine the impact of acute and long-term side effects of adjuvant therapies.

Although many risk factors for BC have been identified, BC etiology remains to be elucidated. Female gender and increasing age are the two strongest risk factors for BC. Hereditary factors, such as mutations in the BRCA1 and BRCA2 genes that account for 5% to 10% of BC cases,7–9 mutations in other susceptibility genes such as TP53,10 CHEK2,11 ATM,12 or PTEN,13 and family history play significant roles in a woman’s BC risk profile. It is well known that some BC cases exhibit familial clustering; however, the extent to which family history and genetic background contribute to BC initiation has been poorly studied. BRCA1/2 remain the most frequent genes of interest in BC genetics research. Studies on BRCA1/2 mutation carriers suggest that the influence of BRCA1/2 on BC risk is modified by both non-genetic and genetic factors.14 Some carriers have little chance of a BC diagnosis, while others are diagnosed early in adulthood.7,15 Investigations of BC often focus on potential modifiers or regulators of BRCA1/2 or their downstream targets in these pathways. Additional studies are needed to identify other genes that are uniquely associated with BC initiation and progression. A woman’s personal history of BC is also a concern, because she has two to six times the risk of being diagnosed with a second primary cancer in the same or contralateral breast.16 Lifestyle factors such as having no children or having them later in life, use of oral contraceptives, hormone replacement therapy, or alcohol, little or no breastfeeding, and being overweight are all associated with an increased risk of BC development.1

Surveillance efforts and prospective identification of at-risk patients are useful for gathering clinical data and biospecimens that can be used to determine BC risk factors and to develop comprehensive prevention strategies and effective treatment regimens. Registered patients also provide a wealth of potential diagnostic and prognostic information on environmental, nutritional, social, familial, and demographic factors in addition to clinical, biochemical, genetic, and epidemiological data. Most cases of BC are sporadic,17 and most of the aforementioned risk factors account for only a small portion of a patient’s risk profile. Because of the complexity of the disease, and the increasing evidence that BC itself is a set of distinct morbidities,18 thorough analysis requires detailed data sets with large sample sizes from a representative patient population. Although BC has a high incidence rate, even the largest single-center studies lack the statistical power to study prospective factors that occur relatively infrequently within the BC patient subpopulation. Multi-center collaboration is required not only for sample size considerations, but also to avoid regional effects specific to a local population and to leverage varying levels of expertise in BC epidemiology, genetics, biology, pathology, early detection, and patient care. These issues can be addressed by deploying an advanced information technology (IT) infrastructure that allows clinicians from various sites (centers) to input and analyze BC data in a standard, convenient, and secure manner. A foundation of standardized data collection and patient registration is essential for building a system capable of advancing the field of BC research. Consistent, well-documented data models are required to develop databases and methods capable of validating and analyzing BC clinical and biological data.

To address all of these challenges, the Breast Cancer Collaborative Registry (BCCR) was developed and implemented as a web-based system at the UNMC Eppley Cancer Center (ECC). The BCCR is designed as a regional/national breast cancer resource that aims to facilitate rapid and uniform collection of critical information and biological samples to be used in developing new strategies for the prevention and treatment of breast cancer and improving QOL for BC survivors. The BCCR collects a variety of information, including demographic, medical and family details, dietary and environmental exposure history, as well as biospecimen samples from breast cancer patients, women at increased risk for breast cancer (those with one or more first-degree relatives or at least two second-degree relatives with a history of cancer), and BC survivors.

This paper describes the biomedical informatics aspects of the BCCR development and organizational procedures that support BCCR activities.

Methods

The BCCR is a multi-tier web application that utilizes Java Servlet/JSP technology and has an Oracle 11g database as a back-end. The Pancreatic Cancer Collaborative Registry (PCCR)19 has been used as a foundation for the development of the BCCR. Adoption of the PCCR’s methodology allowed us to design the BCCR as a cancer Biomedical Informatics Grid (caBIG®) bronze compatible system.20 The caBIG® is a voluntary community of biomedical researchers and institutions whose goals are to: connect scientists and practitioners through a shareable and interoperable infrastructure; develop standard rules and a common language for easily sharing information; and build or adapt tools for collecting, analyzing, integrating, and disseminating information associated with cancer research and care.21 The caBIG® compatibility guidelines22 specify the compatibility requirements for software tools in four categories: programming and messaging interfaces, vocabularies, common data elements, and information models. The BCCR has satisfied caBIG® bronze compatibility requirements in all four categories and has been certified by the NCI CBIIT as a caBIG® Bronze Compatible product.

Programming interface

BCCR end-users utilize a web interface for data entry and management. To allow third-party applications to access the BCCR’s data directly and to satisfy the caBIG® bronze compatibility requirements, a set of application programming interfaces (APIs) has been developed. This set of APIs consists of methods for both retrieving data from and inserting data into the BCCR.
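The paper does not publish the method names, transport, or data format of the BCCR APIs. Purely to illustrate the shape of a data-access layer offering the two kinds of operation described (inserting and retrieving data), here is a minimal in-memory sketch in Python. The BCCR itself is a Java application, and `RegistryApi`, `SubjectRecord`, and every method name below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SubjectRecord:
    """Hypothetical, simplified view of one registry subject."""
    subject_code: str
    center_code: str
    answers: Dict[str, str] = field(default_factory=dict)


class RegistryApi:
    """In-memory stand-in for a registry data-access API (hypothetical names).

    Only the two kinds of operation named in the text are modeled:
    inserting data into the registry and retrieving data from it.
    """

    def __init__(self) -> None:
        self._records: Dict[str, SubjectRecord] = {}

    def insert_record(self, record: SubjectRecord) -> None:
        # Refuse duplicate subject codes so an insert never silently overwrites data.
        if record.subject_code in self._records:
            raise ValueError(f"subject {record.subject_code} already exists")
        self._records[record.subject_code] = record

    def get_record(self, subject_code: str) -> Optional[SubjectRecord]:
        # Returns None when the subject code is unknown.
        return self._records.get(subject_code)

    def find_by_center(self, center_code: str) -> List[SubjectRecord]:
        # Insertion order is preserved because dicts are ordered (Python 3.7+).
        return [r for r in self._records.values() if r.center_code == center_code]
```

In a real deployment, these calls would sit behind an authenticated network interface rather than an in-process object.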

Vocabulary

The following controlled terminologies have been implemented in both the BCCR front-end and its metadata: NCI Thesaurus (NCIt)23 and Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT).24 These publicly accessible controlled vocabularies meet all caBIG® Bronze requirements.

Data elements

The Bronze requirements for caBIG® compatibility include “the availability of data element descriptions for every scientific data type that the system exposes”.22 BCCR data element descriptors (metadata) have been constructed from the aforementioned vocabularies using the NCI CBIIT Data Standards Registry and Repository (caDSR) common data elements convention25 and are available in an electronic format.
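As a rough illustration of what a machine-readable data element descriptor can hold, the sketch below pairs an element's identifier, version, definition, and datatype with its permissible values. Each permissible value is linked to a controlled-vocabulary concept code. The structure is only loosely modeled on caDSR common data elements. The public ID, definition text, and concept codes are placeholders, not real caDSR or NCIt identifiers.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class PermissibleValue:
    value: str          # value stored in the database
    meaning: str        # human-readable meaning of the value
    concept_code: str   # controlled-vocabulary concept code (placeholder here)


@dataclass(frozen=True)
class DataElement:
    """Hypothetical descriptor loosely modeled on a caDSR-style common data element."""
    public_id: str
    version: str
    name: str
    definition: str
    datatype: str
    permissible_values: Tuple[PermissibleValue, ...]

    def is_permissible(self, value: str) -> bool:
        # Exact, case-sensitive match against the enumerated values.
        return any(pv.value == value for pv in self.permissible_values)

    def to_json(self) -> str:
        # asdict() recurses into the nested PermissibleValue dataclasses,
        # giving an electronic format that other tools can consume.
        return json.dumps(asdict(self), indent=2)


# Illustrative element; all identifiers are placeholders.
laterality = DataElement(
    public_id="HYPOTHETICAL-0001",
    version="1.0",
    name="Breast Laterality",
    definition="Side of the body on which the breast tumor is located.",
    datatype="CHARACTER",
    permissible_values=(
        PermissibleValue("Left", "Left breast", "PLACEHOLDER-L"),
        PermissibleValue("Right", "Right breast", "PLACEHOLDER-R"),
    ),
)
```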

Information model

The BCCR data is stored in a relational database (Oracle 11g). All Protected Health Information (PHI) is stored encrypted in the database and requires encryption/decryption functions with a pass phrase in order to insert or select data. A simplified entity-relationship diagram (ERD) is presented in Figure 1.
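The text states that PHI is encrypted and that a pass phrase is required to insert or select it, but does not name the algorithm or the database functions used. The sketch below shows the general idea at the application level under stated assumptions:

- A key is derived from the pass phrase with PBKDF2 (`hashlib.pbkdf2_hmac`).
- PHI fields are encrypted with AES-GCM from the third-party `cryptography` package, which must be installed.

This is not a description of the BCCR's Oracle-side implementation.

```python
import hashlib
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

SALT_BYTES = 16
NONCE_BYTES = 12               # 96-bit nonce, the usual size for AES-GCM
PBKDF2_ITERATIONS = 600_000    # assumption; tune to hardware and policy


def _derive_key(pass_phrase: str, salt: bytes) -> bytes:
    # Stretch the pass phrase into a 256-bit AES key.
    return hashlib.pbkdf2_hmac(
        "sha256", pass_phrase.encode("utf-8"), salt, PBKDF2_ITERATIONS, dklen=32
    )


def encrypt_phi(plaintext: str, pass_phrase: str) -> bytes:
    """Encrypt one PHI value; the result is what would be stored in the column."""
    salt = os.urandom(SALT_BYTES)
    nonce = os.urandom(NONCE_BYTES)
    key = _derive_key(pass_phrase, salt)
    ciphertext = AESGCM(key).encrypt(nonce, plaintext.encode("utf-8"), None)
    # Salt and nonce are not secret; store them alongside the ciphertext.
    return salt + nonce + ciphertext


def decrypt_phi(blob: bytes, pass_phrase: str) -> str:
    """Decrypt a stored PHI value; raises InvalidTag if the pass phrase is wrong."""
    salt = blob[:SALT_BYTES]
    nonce = blob[SALT_BYTES:SALT_BYTES + NONCE_BYTES]
    ciphertext = blob[SALT_BYTES + NONCE_BYTES:]
    key = _derive_key(pass_phrase, salt)
    return AESGCM(key).decrypt(nonce, ciphertext, None).decode("utf-8")
```

AES-GCM authenticates as well as encrypts, so a wrong pass phrase or a modified stored value fails loudly instead of returning garbage.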

Figure 1. Simplified ERD of the BCCR.

Organizational model

The BCCR utilizes a confederation model, which ensures that each institution participates in the registry voluntarily, retains all rights to its own data, and has equal representation in the registry’s steering committee. A confederation encourages any interested center, regardless of its size or location, to participate in database development and utilization. The data collected at any location can be used by other participants only after the required permissions have been obtained and with the corresponding references and acknowledgements.

Standardization

To satisfy the needs of different centers and create the foundation for future integration with other related data sources, we defined and established the criteria for standardization of collection forms and identified research questions that had to be addressed. All terms of the BCCR controlled vocabulary are explicitly enumerated and described by an unambiguous definition drawn from the NCIt or SNOMED-CT. The data elements of the BCCR vocabulary have been defined based on the caDSR convention and, when possible, were mapped to the caDSR.

The BCCR questionnaires have been designed and developed to collect comprehensive data related to the diagnosis, treatment and follow-up of BC patients, as well as information pertaining to demographics and survivorship. Existing questionnaires that are well established and recognized in the cancer research community, such as the SF-36v2 Health Survey to measure QOL26 and the NCI Quick Food Scan questionnaire27 for dietary habits, have been implemented in the BCCR. The American Cancer Society’s (ACS) examples of moderate versus vigorous physical activity from its guidelines for cancer prevention28 were used to create the physical activity form. Sleep habits are assessed using the Pittsburgh Sleep Quality Index.29

According to the BCCR rules, information on personal, demographic, lifestyle, physical activity, dietary habits, family history, women’s health, genetics data, symptoms, QOL, and medical history may be provided by a subject; whereas medical information on diagnostic studies, pathology/staging, treatment, surgeries, biospecimens, and survival can be provided only by clinical personnel. The Core Data Set categories included in this registry are described below.
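The rule above, which separates self-reportable categories from categories reserved for clinical personnel, is easy to enforce in code. The sketch below is a minimal version. The category strings are taken from the text. The assumption that clinical personnel may also enter subject-reportable data comes from the clinician row of Table 1. The function and enum names are hypothetical.

```python
from enum import Enum


class Provider(Enum):
    SUBJECT = "subject"
    CLINICAL = "clinical personnel"


# Categories the text says a subject may provide.
SUBJECT_CATEGORIES = frozenset({
    "personal", "demographic", "lifestyle", "physical activity",
    "dietary habits", "family history", "women's health", "genetics",
    "symptoms", "QOL", "medical history",
})

# Categories the text reserves for clinical personnel.
CLINICAL_ONLY_CATEGORIES = frozenset({
    "diagnostic studies", "pathology/staging", "treatment",
    "surgeries", "biospecimens", "survival",
})


def may_submit(provider: Provider, category: str) -> bool:
    """Return True if this kind of provider may submit data in this category."""
    if category in CLINICAL_ONLY_CATEGORIES:
        return provider is Provider.CLINICAL
    if category in SUBJECT_CATEGORIES:
        # Assumption: clinical personnel may also enter subject-reportable
        # data (e.g., on a subject's behalf), as Table 1 suggests.
        return True
    raise KeyError(f"unknown category: {category}")
```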

Personal data collect patient identification information including medical record number, names, date of birth, and contact information. All of this information is stored encrypted in the database.

Demographic data capture the self-reported information on race and ethnicity of the subject, their city/state/country of birth, current marital status, income and education level, as well as religious preference. These data also include history of employment and toxic exposures.

Lifestyle data include information on tobacco usage as well as alcohol, coffee and caffeinated beverage habits. For this purpose, the lifestyle page is designed with descriptions of various tobacco and alcohol products and drop-down boxes to enter the amount of each product used (e.g. the number of cigarettes per day). The age at which the subject started and stopped using each specific product is also captured.

Physical activity data include current physical activity levels and sleep habits. This page also prompts subjects to recall their physical activities over their lifetime. In order to do this, subjects are asked how active they were a year ago, at age 20, and, if applicable, at age 50. The page has descriptions of various moderate and vigorous activities and drop-down boxes to enter the duration or time spent in the activity.

Dietary habits data are collected based on the revised edition of the NCI Quick Food Scan questionnaire. Subjects are asked to complete a table regarding how often they ate certain fatty foods and if they feel their diet was high in fat, medium in fat, or low in fat. They are also asked if they have changed their eating habits since their BC diagnosis.

Family data provide information on birth status (e.g. a single birth, twin, or one of a multiple birth), as well as ancestry. The subjects are asked to provide information on first- and second-degree relatives with a history of cancer and other major diseases.

The quality of life (QOL) questionnaire utilizes the SF-36v2 Health Survey, which measures dimensions including physical, social/family, emotional, and functional well-being. Physical well-being addresses disease symptoms and side effects of therapy, while functional well-being applies to the patient’s ability to perform their role at work and home. Social or family well-being focuses on communication with family members, the support and closeness subjects feel with their friends and family, and their satisfaction with their sex life. Emotional well-being covers a wide range of psychological effects of the disease and its therapy. The additional concerns address medical changes such as pain, shortness of breath, arm swelling, and change in weight, as well as emotional changes regarding sexuality.

Medical history collects the subject’s self-reported history of their height/weight, performance of breast exams, any lifestyle modifications or interventions they have employed, their menstruation history and reproductive history. Subjects are asked to address pre-existing or concurrent health problems and include the specific details of some of these problems.

Therapy history includes the self-reported history of the patient’s BC and the therapy received. Details of these treatments are entered by the clinician. Subjects are asked about follow-up care and monitoring, as well as about their use of complementary/alternative therapies, including stress reduction techniques, use of diet and nutritional supplements, and any traditional or ethnic remedies used.

Medical changes provide information on functional or medical changes after BC therapy. Details regarding any experience with lymphedema following surgery, as well as additional medical and functional changes concerning urinary difficulties, hypothyroidism, cardiac problems, secondary cancers, and osteoporosis problems are addressed. Subjects are also asked if they have developed arthritis, scleroderma, or specific effects from radiation therapy. Endocrine symptoms are measured using the self-reporting FACT-ES scale,30 which is specific to endocrine/menopausal symptoms. Additional details regarding the subject’s experience with hot flashes, and infertility/reproductive issues are also asked on this page. Fatigue is measured using the self-reporting FACIT-F scale.30 Questions include feelings regarding tiredness, weakness, and energy level, as well as cognitive issues that address their ability to start and finish things. Additional details regarding the subject’s experiences with cognitive problems (mental or thought processes) are further addressed on this page.

Medical data are entered by the clinician and include listings of the tests/procedures completed at the time of diagnosis, including where and when the test was done. Biopsy and histology/pathology details, as well as the stage of disease at the time of diagnosis and the clinical stage of disease are entered. Details regarding the affected breast, quadrant, surgical margins, and lymph nodes or metastatic sites involved are included, as well as data on biochemical markers and details of hereditary breast cancer.

Treatment data include details regarding dosage administration, therapy start and stop dates, best responses/outcomes, and schedules of any hormonal therapy, immunotherapy, radiation therapy, chemotherapy, or any other therapies the subject has received.

Surgery form includes details regarding the subject’s mastectomy. The outcome of the surgery is evaluated by asking if there was a recurrence of disease, the site of recurrence and the number of months to the first recurrence. Clinicians are also asked if there was any infection after surgery and if the subject had reconstructive surgery.

Follow-up information is entered by a clinician or coordinator. It includes data on the disease’s progression or relapse, as well as any updates to the QOL, medical, treatment, and family history. The vital status of the subject is updated on this ‘Follow-up’ form including the date and cause of death if relevant.

Administrative data include: (i) date when questionnaire is submitted; (ii) current status of the questionnaire; (iii) registering institution code; (iv) clinician’s ID; and (v) subject’s identification code—an automatically generated number that can be used to re-identify the subject when data are de-identified (as permitted by HIPAA regulations).
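One way to realize an automatically generated subject identification code that permits re-identification of de-identified data is a random code held in an access-restricted linkage table. The sketch below takes that approach. A random code is used because HIPAA de-identification rules expect a re-identification code not to be derived from the individual's information. The class, field names, and the "BC-" prefix are hypothetical.

```python
import secrets
from typing import Dict


class SubjectCodeRegistry:
    """Hypothetical linkage table between medical record numbers and subject codes.

    The mapping itself links to PHI, so in practice it would be stored
    encrypted and access-restricted; only the code travels with de-identified data.
    """

    def __init__(self) -> None:
        self._code_by_mrn: Dict[str, str] = {}
        self._mrn_by_code: Dict[str, str] = {}

    def code_for(self, mrn: str) -> str:
        # Reuse an existing code so repeat submissions map to the same subject.
        if mrn in self._code_by_mrn:
            return self._code_by_mrn[mrn]
        while True:
            # Random, not derived from any patient attribute.
            code = "BC-" + secrets.token_hex(4).upper()
            if code not in self._mrn_by_code:
                break
        self._code_by_mrn[mrn] = code
        self._mrn_by_code[code] = mrn
        return code

    def re_identify(self, code: str) -> str:
        # Only authorized users should ever reach this call.
        return self._mrn_by_code[code]


# Hypothetical set of direct identifiers removed on export.
PHI_FIELDS = frozenset({"mrn", "name", "date_of_birth", "address", "phone"})


def de_identify(row: Dict[str, str], codes: SubjectCodeRegistry) -> Dict[str, str]:
    """Drop PHI fields from a row and attach the subject identification code."""
    clean = {k: v for k, v in row.items() if k not in PHI_FIELDS}
    clean["subject_code"] = codes.code_for(row["mrn"])
    return clean
```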

BCCR user interface

The BCCR public website (Fig. 2) can be accessed at http://bccr.unmc.edu/.

Figure 2. The BCCR website.

The BCCR user interface is compatible with all major web browsers, such as Microsoft Internet Explorer 6+, Mozilla Firefox 3+, Opera, and Safari. The user interface and data collection forms were designed to eliminate ambiguity and to improve the accuracy and ease of data collection by providing a pre-defined selection of choices whenever possible. The BCCR system includes validation components that prevent users from entering erroneous information. The BCCR developers regularly collect feedback from end users and evaluate the system’s interface for further improvements. Figure 3 presents an example of the BCCR user interface.
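The paper does not list the BCCR's validation rules. As an example of the kind of check such components perform, the sketch below validates one tobacco entry from the lifestyle page. It confirms that the amount comes from a pre-defined choice list and that start and stop ages are consistent with each other and with the subject's current age. The choice labels and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical drop-down choices for cigarettes per day.
CIGARETTES_PER_DAY_CHOICES = ("None", "1-10", "11-20", "21-30", "More than 30")


@dataclass
class TobaccoEntry:
    amount: str
    age_started: Optional[int]
    age_stopped: Optional[int]   # None means the subject still uses the product
    current_age: int


def validate_tobacco(entry: TobaccoEntry) -> List[str]:
    """Return human-readable errors; an empty list means the entry is valid."""
    errors: List[str] = []
    if entry.amount not in CIGARETTES_PER_DAY_CHOICES:
        errors.append(f"amount must be one of {CIGARETTES_PER_DAY_CHOICES}")
    if entry.amount != "None" and entry.age_started is None:
        errors.append("age started is required when an amount is given")
    if entry.age_started is not None and not 0 < entry.age_started <= entry.current_age:
        errors.append("age started must be between 1 and the current age")
    if entry.age_stopped is not None:
        if entry.age_started is None or entry.age_stopped < entry.age_started:
            errors.append("age stopped cannot precede age started")
        elif entry.age_stopped > entry.current_age:
            errors.append("age stopped cannot exceed the current age")
    return errors
```

Returning all errors at once, rather than stopping at the first error, lets the form highlight every problem field in a single pass.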

Figure 3. Example of the BCCR user interface.

To improve portability, convenience and ease of use, a separate interface for Apple’s iPads has been created. Initial testing indicated that the use of iPads helps subject enrollment by creating a “cool factor” and removing a barrier between a patient and clinical personnel. Currently, about 97% of the approached subjects sign the consent forms, and about 63% of them complete the questionnaires. Our preliminary data suggest that the use of Apple iPads as a mobile interface to the BCCR has increased the questionnaires’ return rate by approximately 10%.

Integration with caTissue

The caTissue Suite,31 which is the tissue bank repository tool developed under the caBIG® umbrella, has been adopted and integrated with the BCCR to collect and manage the biospecimen data in a standard and efficient way. It is used to track the collection, storage and distribution of specimens and provides quality assurance for all of these activities. The participating centers are able to either submit biospecimen data into the central repository or maintain their own installation of caTissue and store biospecimen data locally.

BCCR security

The BCCR security implementation reflects current IT practice and ensures compliance with the electronic information security standards mandated by the federal Health Insurance Portability and Accountability Act (HIPAA). The BCCR protects the patients’ personal information by following the recommendations of the Healthcare Information and Management Systems Society (HIMSS) Privacy & Security Toolkit.32 The BCCR web application that supports data collection is accessible only to authorized users, and the patients’ PHI is stored encrypted. The system uses secure web server communication. It supports Secure Sockets Layer (SSL), an Internet encryption method that encrypts data in both directions along the entire route between a user’s computer and the server. It also supports Hypertext Transfer Protocol Secure (HTTPS), the communications standard used to transfer pages securely on the Web.
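On the client side, talking to an HTTPS endpoint safely mostly means verifying the server certificate and hostname and refusing outdated protocol versions. The sketch below builds such a context with Python's standard `ssl` module. The TLS 1.2 floor is an assumption reflecting current practice, not a documented BCCR setting. Modern HTTPS runs over TLS, the successor to SSL.

```python
import ssl


def make_client_tls_context() -> ssl.SSLContext:
    """Client-side TLS settings for connecting to an HTTPS web application.

    create_default_context() already enables certificate verification
    (CERT_REQUIRED) and hostname checking; this additionally refuses
    protocol versions older than TLS 1.2 (an assumed policy).
    """
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

The context can be passed to standard-library clients, for example `urllib.request.urlopen("https://bccr.unmc.edu/", context=make_client_tls_context())`. Such a call requires network access, and the connection fails if the server cannot present a valid certificate for that hostname.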

Each authorized user must have a unique electronic signature, a combination of a user name and a password. Each user has an appropriate level of access to data. The user roles and types of authority are described in Table 1.

Table 1.

BCCR user roles and their authority.

Role Authority
Subject/patient Can enter/update personal, demographic, lifestyle, symptoms, QOL, family and medical history data
Lab technician Can enter/update biospecimen data only
Clinician All of the above + Enter medical data, retrieve and edit existing cases of his/her patients
Coordinator All of the above for the assigned clinicians
Center manager All of the above + Retrieve and edit cases of the patients of the center/institution
System coordinator All of the above + Retrieve and edit all cases, activate/suspend users, assign user authorities
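The cumulative roles in Table 1 can be encoded as an ordered enumeration plus a scope check that decides whose cases a user may retrieve and edit. The minimal sketch below follows the table's wording. The role names mirror the table. Treating lab technicians as having no whole-case access, and limiting subjects to their own record, are assumptions. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import FrozenSet


class Role(IntEnum):
    # Values follow the row order of Table 1; later roles build on earlier ones.
    SUBJECT = 1
    LAB_TECHNICIAN = 2
    CLINICIAN = 3
    COORDINATOR = 4
    CENTER_MANAGER = 5
    SYSTEM_COORDINATOR = 6


@dataclass(frozen=True)
class User:
    user_id: str
    role: Role
    center: str
    assigned_clinicians: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Case:
    subject_user_id: str
    clinician_id: str
    center: str


def can_enter_medical_data(user: User) -> bool:
    # "Clinician: All of the above + Enter medical data", inherited upward.
    return user.role >= Role.CLINICIAN


def can_manage_users(user: User) -> bool:
    # Only the system coordinator activates/suspends users and assigns authorities.
    return user.role is Role.SYSTEM_COORDINATOR


def can_edit_case(user: User, case: Case) -> bool:
    """Scope of retrieve/edit rights over a whole case, following Table 1."""
    if user.role is Role.SYSTEM_COORDINATOR:
        return True                                   # all cases
    if user.role is Role.CENTER_MANAGER:
        return case.center == user.center             # patients of the center
    if user.role is Role.COORDINATOR:
        return case.clinician_id in user.assigned_clinicians
    if user.role is Role.CLINICIAN:
        return case.clinician_id == user.user_id      # his/her own patients
    if user.role is Role.SUBJECT:
        return case.subject_user_id == user.user_id   # assumption: own record only
    return False  # lab technicians edit biospecimen data only
```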

IRB and subject recruitment

Each BCCR participating center is required to obtain approval from its own Institutional Review Board (IRB). The BCCR provides standard protocol templates and privacy assurances for informed consent procedures, formulated to detail the use of web-based tools. A template of common protocol statements includes: (i) methods and procedures applied to human subjects; (ii) data storage and confidentiality; (iii) potential risk assessment for human subjects; (iv) risk classification; (v) protection against the potential risks for human subjects; (vi) potential benefit assessment for human subjects; (vii) potential benefits to society; and (viii) alternatives to participation. A template for a common informed consent form includes the following HIPAA-mandated information: (i) a specific description of the information to be used or disclosed; (ii) the person or entity to whom disclosure will be made; (iii) the purpose of the use or disclosure; (iv) an expiration date or event for use of the information; (v) an explanation of how authorization may be revoked; and (vi) any restrictions placed on the subject’s access to the information with access granted upon completion of the research. All participating investigators are able to use these standardized statements to assist them with their IRB applications.

All BCCR participating researchers and clinicians are required to complete the computer-based training course on the Protection of Human Research Subjects. All information gathered in the BCCR should be compliant with IRB approvals at participating sites that are monitored by each center’s IRB. The BCCR coordinator opens new accounts and enables data entry into the BCCR only after receiving the documented proof of IRB protocol approval. In order to enter data into the BCCR, a copy of the consent form for each subject must be submitted to the BCCR coordinator. Under the informed consent process, study participants have been asked to voluntarily participate in the BCCR. The potential participants are asked about their willingness to share the information they provided in the BCCR with research collaborators. The information the participants provide is collected for research purposes only. The subjects are informed in the consent that their PHI will be encrypted and that the web-based registry is accessible to authorized users only. Identifiers will never be released in order to protect participant confidentiality. Every effort is made to ensure confidentiality by means of data encryption, password authentication of users, electronic firewalls and locked storage facilities, de-identification of PHI, audit trails, a disaster prevention and recovery plan, and security measures for back-up.

Participants have also been informed that they may revoke the authorization to use and share their PHI at any time by contacting the principal investigator in writing. If they revoke the authorization, they can no longer participate in the research studies and the use or sharing of future PHI will be stopped, but the PHI that has already been collected may still be used.
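The revocation rule has a simple operational reading. Records collected before a subject's revocation date stay usable, and anything collected afterwards is excluded. The minimal filter below implements that reading. Treating records dated on the revocation day itself as excluded is an assumption, as are all names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Mapping


@dataclass(frozen=True)
class PhiRecord:
    subject_code: str
    collected_on: date
    payload: str


def usable_records(
    records: Iterable[PhiRecord], revocations: Mapping[str, date]
) -> List[PhiRecord]:
    """Keep records of non-revoking subjects, and pre-revocation records of the rest.

    Assumption: a record collected on the revocation date itself is excluded.
    """
    usable: List[PhiRecord] = []
    for record in records:
        revoked_on = revocations.get(record.subject_code)
        if revoked_on is None or record.collected_on < revoked_on:
            usable.append(record)
    return usable
```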

Data use agreements

The BCCR Steering Committee has been formed to review and prioritize scientific projects and to oversee all studies utilizing the BCCR data. The committee consists of an appointed member from each participating institution, external advisors including a patient advocate, and the BCCR coordinator.

Quality control

Quality of the collected data is ensured by the standardization of the collection forms that have been developed with predetermined selection choices to assist with the accuracy and ease of completion. Extensive computer-based procedures assessing the accuracy of the submitted data and multi-layer control (patient, clinician, center manager, system administrator) also ensure the quality and completeness of gathered information. The comprehensive training materials, manuals defining vocabulary used in the BCCR and user manuals with a defined set of procedures and lines of responsibilities for each level of participants were distributed to the centers. To guarantee the consistency and reliability of data collection across the participating centers, the BCCR coordinator continuously provides educational training sessions and audits submitted data, whereas individual center managers review the data submitted from their respective centers.
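One computer-based check that supports the center-level review described above is a completeness report: the share of required fields actually filled in, aggregated per center. The sketch below is a minimal version. The field names and the rule that an empty string counts as missing are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Optional, Sequence


def completeness_by_center(
    submissions: Iterable[Mapping[str, Optional[str]]],
    required_fields: Sequence[str],
) -> Dict[str, float]:
    """Fraction of required fields that are filled in, aggregated per center.

    A field counts as missing if it is absent, None, or an empty string.
    """
    filled: Dict[str, int] = defaultdict(int)
    expected: Dict[str, int] = defaultdict(int)
    for submission in submissions:
        center = submission["center"]
        expected[center] += len(required_fields)
        filled[center] += sum(
            1 for f in required_fields if submission.get(f) not in (None, "")
        )
    return {center: filled[center] / expected[center] for center in expected}
```

A center manager could run this over their own center's submissions. The coordinator could compare centers to decide where additional training is needed.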

Results and Discussions

The BCCR was initiated as a Nebraska-based regional breast cancer data repository and has evolved into a multicenter, multistate breast cancer collaborative registry. The BCCR is implemented as a bank of information with the ability to reference the institution and investigator where additional files/samples are located. The BCCR maintains an audit trail of all data entries to protect the authenticity, integrity and confidentiality of the data. The collaborators and representatives from the centers using the BCCR system have developed organizational and operational procedures, standardized IRB applications and common consent forms, as well as bylaws for stakeholders participating in multi-center collaborations.
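The paper does not describe how the BCCR audit trail is stored. The sketch below shows one common design: an append-only change log recording who changed which field, when, and from what value to what value. It adds hash chaining, a swapped-in tamper-evidence technique not mentioned in the text, so that silently editing a past entry becomes detectable. All names are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class AuditEntry:
    timestamp: str
    user_id: str
    subject_code: str
    field_name: str
    old_value: Optional[str]
    new_value: Optional[str]
    prev_hash: str
    entry_hash: str


class AuditTrail:
    """Append-only change log in which each entry hashes its predecessor."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self._entries: List[AuditEntry] = []

    @staticmethod
    def _digest(fields: dict) -> str:
        # Canonical JSON (sorted keys) so the same fields always hash identically.
        return hashlib.sha256(json.dumps(fields, sort_keys=True).encode("utf-8")).hexdigest()

    def record(self, user_id: str, subject_code: str, field_name: str,
               old_value: Optional[str], new_value: Optional[str]) -> AuditEntry:
        prev_hash = self._entries[-1].entry_hash if self._entries else self.GENESIS
        fields = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user_id": user_id, "subject_code": subject_code,
            "field_name": field_name, "old_value": old_value,
            "new_value": new_value, "prev_hash": prev_hash,
        }
        entry = AuditEntry(entry_hash=self._digest(fields), **fields)
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; altering any stored entry breaks the chain."""
        prev_hash = self.GENESIS
        for entry in self._entries:
            fields = asdict(entry)
            stored_hash = fields.pop("entry_hash")
            if fields["prev_hash"] != prev_hash or self._digest(fields) != stored_hash:
                return False
            prev_hash = stored_hash
        return True

    def entries(self) -> List[AuditEntry]:
        return list(self._entries)
```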

At the present time, seven cancer centers from three states participate in the BCCR: University of Nebraska Medical Center, Good Samaritan Hospital/Cancer Center (Kearney, NE), St. Elizabeth’s Regional Medical Center (Lincoln, NE), Saint Francis Medical Center/Cancer Treatment Center (Grand Island, NE), Penrose Cancer Center (Colorado Springs, CO), Holyoke Medical Center (Holyoke, MA), and St. Vincent Hospital Center for Cancer Services (Worcester, MA). Data on 875 subjects had been collected in the BCCR as of July 20, 2011 (see Table 2).

Table 2.

BCCR enrollment.

Category Number of subjects
By race
American Indian or Alaska Native 4
Asian 5
Black or African American 50
Native Hawaiian or other Pacific Islander 1
White 779
Multiracial 10
Unknown/not reported 26
Total 875
By ethnicity
Hispanic 16
Not Hispanic 639
Unknown/not reported 220
Total 875
By gender
Female 868
Male 7
Total 875
By subject type
BC patients 816
High-risk subjects 59
Total 875
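A consistency check of the kind used in computerized data auditing can be applied to Table 2 itself. The sketch below confirms that every breakdown sums to the reported total of 875 and converts the counts to percentages rounded to one decimal place. The counts are copied from Table 2. The function name is hypothetical.

```python
from typing import Dict

# Counts from Table 2.
ENROLLMENT: Dict[str, Dict[str, int]] = {
    "race": {
        "American Indian or Alaska Native": 4,
        "Asian": 5,
        "Black or African American": 50,
        "Native Hawaiian or other Pacific Islander": 1,
        "White": 779,
        "Multiracial": 10,
        "Unknown/not reported": 26,
    },
    "ethnicity": {"Hispanic": 16, "Not Hispanic": 639, "Unknown/not reported": 220},
    "gender": {"Female": 868, "Male": 7},
    "subject type": {"BC patients": 816, "High-risk subjects": 59},
}
TOTAL = 875


def check_and_summarize(
    table: Dict[str, Dict[str, int]], total: int
) -> Dict[str, Dict[str, float]]:
    """Confirm each breakdown sums to the total; return percentages (1 decimal)."""
    summary: Dict[str, Dict[str, float]] = {}
    for breakdown, counts in table.items():
        subtotal = sum(counts.values())
        if subtotal != total:
            raise ValueError(f"{breakdown}: subtotal {subtotal} != total {total}")
        summary[breakdown] = {k: round(100 * v / total, 1) for k, v in counts.items()}
    return summary
```

For example, White subjects make up 89.0% of enrollment, and BC patients make up 93.3%.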

Recently, the BCCR has been certified by the NCI Center for Biomedical Informatics and Information Technology as a caBIG® Bronze Compatible product. The BCCR serves as an effective medical informatics platform for collaborative efforts by a diverse group of researchers from multiple institutions with expertise in oncology, pathology, epidemiology, genetics, nutrition, and biomedical computing. The development and utilization of the BCCR provide participating investigators with the following benefits: (i) an effective and secure web-based system for standardized data collection; (ii) computerized data auditing and data quality control; (iii) the ability to access larger data sets from collaborating centers for data mining and analysis; and (iv) enhanced collaboration with other breast cancer researchers and clinicians from different institutions.

In the near future, the BCCR will be virtually integrated with other cancer- and health-related databases developed at the UNMC Eppley Cancer Center, into a comprehensive cancer and health data resource equipped with an advanced data reporting and data mining system.

Acknowledgments

The development of the BCCR was partially supported by a grant from the National Cancer Institute (CA10595 “Expansion of Breast Cancer Resource of Eppley Cancer Center on NCCCP sites”, PI: Kenneth Cowan).

Footnotes

Disclosures

Author(s) have provided signed confirmations to the publisher of their compliance with all applicable legal and ethical obligations in respect to declaration of conflicts of interest, funding, authorship and contributorship, and compliance with ethical requirements in respect to treatment of human and animal test subjects. If this article contains identifiable human subject(s) author(s) were required to supply signed patient consent prior to publication. Author(s) have confirmed that the published article is unique and not under consideration nor published by any other publication and that they have consent to reproduce any copyrighted material. The peer reviewers declared no conflicts of interest.

References

