Journal of the American Medical Informatics Association : JAMIA. 1998 Sep-Oct;5(5):432–440. doi: 10.1136/jamia.1998.0050432

A Case Study of the Evolving Software Architecture for the FDA Generic Drug Application Process

Kip Canfield 1, Michele Ritondo 1, Richard Sponaugle 1
PMCID: PMC61324  PMID: 9760391

Abstract

The primary goal of this project was to develop a software architecture to support the Food and Drug Administration (FDA) generic drug application process by making it more efficient and effective. The secondary goal was to produce a scalable, modular, and flexible architecture that could be generalized to other contexts in interorganizational health care communications. The system described here shows improvements over the old system for the generic drug application process for most of the defined design objectives. The modular, flexible design that produced this new system offers lessons for the general design of distributed health care information systems and points the way to robust application frameworks that will allow practical development and maintenance of a distributed infrastructure.


This paper presents a case study of the software system for submission of drug development information (including clinical trial and manufacturing information) that is used for drug approval by the Food and Drug Administration (FDA). The automated system has been implemented in the Office of Generic Drugs for submission of bioequivalence and chemistry application materials. This program is now officially optional in that sponsor companies can choose to submit materials using this automated system or may continue to use their current paper-based system only. The FDA hopes to phase out the paper system in the future.

The primary goal of this project was to develop a software architecture to support the FDA drug application process by making it more efficient and effective. The secondary goal was to produce a scalable, modular, and flexible architecture that could be generalized to other contexts in health care communications.

Kazman et al.1 define software architecture as “a high-level configuration of components that compose the system, and the connections that coordinate the activities of those components.” Garlan and Shaw2 state that software architecture issues “include gross organization and global control structure; protocols for communication, synchronization, and data access; assignment of functionality to design elements; physical distribution; composition of design elements; scaling and performance; and selection among design alternatives.” In general, software architecture is an emerging discipline in software engineering that seeks to understand the principles of software design modularity that can lead to software reuse, portability, and maintainability in complex systems. This project created a software architecture for a process that links drug companies to the FDA.

Background

The FDA drug application process is the way that sponsor drug companies (drug developers and manufacturers) request approval of a drug product from the FDA. The process requires interorganizational data flows. In a typical scenario, a sponsor company will contract with one or more contract research organizations (CROs) to conduct portions of the clinical studies in the drug evaluation process. All the resulting data must be submitted to and organized by the sponsor and then submitted to the FDA. In addition to the CROs' data, the sponsor has a large amount of other drug data that also have to be submitted to the FDA. The data include bioequivalence and chemistry, manufacturing, and controls data. The application process is a fairly complex system that usually requires the coordination of more than three organizations with information systems. Any candidate architecture for supporting this process not only must develop standards for information representation at any particular organization in the process, but must also handle the data flows between organizations.

Interorganizational systems are systems that involve resources shared between multiple organizations with no shared leadership or ownership.3 The information systems literature identifies several barriers to interorganizational systems, including connectivity/interoperability,4 competition/trust,5 and security.6 The connectivity/interoperability barrier is the major barrier that a software architecture must address. Interoperability problems can be reduced through universal standards and new cross-platform technologies. A software architecture must be sensitive to the other important barriers and include technologies and standards to reduce them. For example, standards, programs, and processes must respect privacy of proprietary information, support proven encryption standards, and not incur excessive technologic risk.

The immediate motivation for this project was the inefficiency and ineffectiveness of the current approaches to processing drug applications. The common ones are paper submission and various computer-assisted new drug application (CANDA) approaches. Paper submission is an inefficient process with high redundancy and error rates. Contract research organizations typically submit data to sponsors on paper (sometimes in part electronically and in arbitrary formats). Sponsors must integrate those data with their own data and create documents for submission to the FDA. The FDA must internally handle large quantities of paper with resulting problems of difficult access and data transfer. The strengths of the paper-based system are that it is familiar to all players (lower training costs) and is very flexible, in that people (rather than computers) can best handle arbitrary changes. It will be argued below, however, that the new software system has much higher overall efficiency and effectiveness. The various forms of CANDA are more difficult to characterize, because the term covers many different approaches. This is the major problem with CANDA approaches—the fact that they are not standards-based.7 Sponsors may automate their applications in fairly arbitrary ways, which require each contract research organization and the FDA to adapt to many systems. There have been calls from the FDA for so-called network CANDAs, which means that the submitted data should be able to reside on the FDA network. The system described here is the first comprehensive standards-based effort in that spirit.

Design Objectives

We developed ten design objectives that characterize the general goals of efficiency and effectiveness for the FDA application process, and then designed a software architecture to address these goals. Each node of this process is an organization—either a contract research organization, a sponsor, or the FDA. We divided the ten design objectives into the flow-processing, data-processing, and cost/complexity categories shown in Table 1.

Table 1.

Design Objectives for the Electronic Submission Program

Flow Processing:
  • Less transfer delay between nodes (internode delay)

  • Higher density of communication with low cost

  • Less misunderstanding between nodes

Data Processing:
  • Less processing delay at all nodes (intranode)

  • Less redundancy of effort

  • Greater use of automated tools

  • Lower error rates for both content and completeness

Cost/Complexity:
  • Lower infrastructure development cost

  • Lower maintenance cost

  • Greater flexibility in handling changing requirements

System Description

The system is called the Electronic Submission Program, which is part of a larger strategy for electronic regulatory submission and review at the FDA. The University of Maryland has developed the program for the FDA. The system is best described in terms of its components: three data standards, three software programs, and multiple data flows.

Data Standards

The Data File Standard

The data file standard is the first of three data document standards. Each contract research organization uses a standard data file format to submit data to the sponsor. We developed this simple, text-based standard on the model of typical data files used for statistical analysis of data. A sample data file is shown in Figure 1. The names and ordering of the most common columns are given by the standard, and additional columns can be added at the end and described in the standard comment at the top of the file. Since this is plain text, it is easily e-mailed or otherwise transferred electronically to another node in the system. Since the data file format is typical of those that statistical packages require, no further transformation of the data is required at either the sponsor or the FDA. In fact, libraries of statistical procedures can be developed that leverage the standard data file format. Time-consuming data preparation steps are in this way eliminated.

Figure 1.

Sample dissolution data file. Ellipses indicate omitted data.
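
As a rough illustration of why a plain-text, column-oriented data file needs no further transformation before analysis, the sketch below reads such a file with the Python standard library. The comment marker (`#`), the tab delimiter, and the column names are assumptions made for this example only; the actual layout is defined by the published standard and is not reproduced here.

```python
import csv
import io

# Hypothetical sample. The comment syntax, delimiter, and column names are
# illustrative assumptions, not the published data file standard.
SAMPLE = """\
# Dissolution data, simulated values
# Extra column: APPARATUS = dissolution apparatus type
UNIT\tTIME_MIN\tPCT_DISSOLVED\tAPPARATUS
1\t10\t42.5\tpaddle
1\t20\t78.0\tpaddle
2\t10\t40.1\tpaddle
"""


def read_data_file(text):
    """Return (comment_lines, rows) from a commented, tab-delimited data file."""
    comments, body = [], []
    for line in text.splitlines():
        if line.startswith("#"):
            # Leading comment block: describes the file and any extra columns.
            comments.append(line[1:].strip())
        elif line.strip():
            body.append(line)
    # The first non-comment line is the header row naming the columns.
    reader = csv.DictReader(io.StringIO("\n".join(body)), delimiter="\t")
    return comments, list(reader)


comments, rows = read_data_file(SAMPLE)
```

Each row comes back as a dictionary keyed by column name, so any extra columns appended at the end of the file are carried along without changes to the reader.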

The Electronic Submission Document Standard

The central text document of the submission is the electronic submission document (ESD). This is a highly structured plain-text document that is designed to support automated data entry into a database at the FDA. It allows the FDA to keep a structured database of all application information for any drug. This has been impossible under the paper and CANDA submission processes.

The ESD is a hierarchically structured file with simple markup. The data files are referenced in the document by filename, but the data themselves are not repeated there. Figure 2 shows a fragment of the ESD with simulated data. Figure 3 shows the complete hierarchic structure, or outline, of the document for bioequivalence, which is geared toward getting data on the clinical studies. Complete information on all the standards for bioequivalence and chemistry, manufacturing, and controls is available on the project Web site at http://mundos.ifsm.umbc.edu/fda_eva.

Figure 2.

The electronic submission document (ESD). Ellipses indicate omitted data.

Figure 3.

The hierarchic structure of the electronic submission document.

The simple markup used in the ESD is designed to reduce the barriers to participation in this system faced by sponsors and contract research organizations. More sophisticated markup languages such as SGML (Standard Generalized Markup Language) and XML (Extensible Markup Language) are more powerful and functional, but expertise in SGML is not common among drug company personnel and XML was not available at the beginning of this project. HTML (Hypertext Markup Language) is not appropriate for this task, since it does not allow custom content tags. The simple markup defined for this project is easily (and automatically) converted to a standard such as SGML or XML, and we plan to change the markup to XML in the near future. Before the end of 1998, XML browsers should be available from the major vendors, and familiarity with these technologies should become routine in corporate information technology shops. The document-type definition feature of SGML and XML would be particularly useful in this architecture for parsing submission documents to verify syntactic correctness or for entry of data into databases. As the capabilities of the participants increase, this architecture will allow evolution to include more complex standards such as SGML or XML without compromising the basic conceptual design.
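
To make the claim of automatic conversion concrete, here is a minimal sketch that turns a hierarchically structured plain-text document into XML. The input syntax (two-space indentation with `name: value` lines) and all section names are hypothetical stand-ins, since the actual ESD markup is not shown here; only the general idea of mapping an outline onto nested elements is illustrated.

```python
import xml.etree.ElementTree as ET

# Hypothetical outline syntax, invented for this example. It is NOT the
# actual ESD markup; it only mimics a hierarchical plain-text document.
SAMPLE_OUTLINE = """\
submission:
  sponsor: Example Pharma
  study:
    type: bioequivalence
    data_file: study1.dat
"""


def outline_to_xml(text, indent=2):
    """Convert an indented 'name: value' outline into an ElementTree element."""
    root = None
    stack = []  # stack[d] is the most recent element seen at depth d
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = (len(line) - len(line.lstrip(" "))) // indent
        tag, _, value = line.strip().partition(":")
        # Reject skipped levels and a second top-level entry.
        if depth > len(stack) or (depth == 0 and root is not None):
            raise ValueError(f"unexpected nesting: {line!r}")
        if depth == 0:
            elem = root = ET.Element(tag)
        else:
            elem = ET.SubElement(stack[depth - 1], tag)
        elem.text = value.strip() or None
        del stack[depth:]  # close any deeper sections
        stack.append(elem)
    return root


root = outline_to_xml(SAMPLE_OUTLINE)
xml_text = ET.tostring(root, encoding="unicode")
```

Because the conversion only depends on nesting and names, the same document content could later be checked against a document-type definition once one exists.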

HL7 (Health Level 7) is a messaging syntax. This project is interested in representing persistent content independent of an HL7 message. The documents submitted under the Electronic Submission Program become legal documents. The Kona Proposal8 summarizes the differences between HL7 and persistent documents. This is especially important for the ESD. “Observational Report, Unsolicited” messages in HL7 can easily be adapted for the data files.

The Companion Document Standard

The companion document (CDOC) standard is not a text document and must be submitted in portable document format. The standard simply requires standard headings in this document. The sponsor can enter any relevant information under the standard headings (there are also open-ended headings so that anything can be entered). The CDOC is designed for narrative information that will not be entered into the FDA database but will be useful to the reviewer. The standard headings make it easier for reviewers to look up information in the document. In addition, the CDOC offers a way to handle change gracefully and flexibly in the system. If additional information becomes needed in the ESD, it can be added temporarily to the CDOC until it is incorporated into the ESD on a scheduled update. The CDOC is very easy to update and has no negative impact on the process or software development of the programs that read the structured documents.

Software Programs

The Entry and Validation Application

The entry and validation application (EVA) is the first of three software programs to support the system. Since structured documents are not easy for direct data entry (see Figure 2), the EVA program offers a typical windowed, forms-based environment for data entry. Figure 3 provides a graphic view of the ESD hierarchy that is presented to the user. The user clicks on any node to access the relevant forms. Figure 4 shows a typical form from the data entry interface.

Figure 4.

The entry and validation application (EVA) interface.

The purpose of the EVA is merely to simplify data entry, and the program is not required. Only the ESD, the CDOC, and data files are submitted to the FDA. Sponsors may develop their own data entry programs, or software vendors may offer commercial programs to fill this need. The ESD becomes simply a report in the information system of the sponsor, who may engineer their information systems as they see fit. The additional function of the EVA is to validate ESDs. The EVA program reads and writes the ESD format and can be used to parse, or validate the structure of, the ESD. If a sponsor develops its own data entry software, the EVA can be used to validate the output. Under an XML implementation, its document-type definition feature would be used to validate the document structure.
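
The structural validation described above can be pictured as a comparison between a parsed document and a required outline. The following is a minimal sketch under stated assumptions: the document is represented as nested dictionaries, and the outline and section names are hypothetical. It is not the EVA's actual validation logic.

```python
# Hypothetical required outline: each key is a section name, each value is
# the outline of its required subsections (empty dict = leaf section).
REQUIRED_OUTLINE = {
    "sponsor": {},
    "study": {"type": {}, "data_file": {}},
}


def validate(doc, outline, path=""):
    """Return a list of structural problems; an empty list means valid."""
    problems = []
    for key, sub in outline.items():
        here = f"{path}/{key}"
        if key not in doc:
            problems.append(f"missing section {here}")
        elif sub:
            if isinstance(doc[key], dict):
                problems.extend(validate(doc[key], sub, here))
            else:
                problems.append(f"section {here} should contain subsections")
    for key in doc:
        if key not in outline:
            problems.append(f"unexpected section {path}/{key}")
    return problems


# A document produced by some third-party data entry program (simulated).
candidate = {"sponsor": "Example Pharma", "study": {"type": "bioequivalence"}}
report = validate(candidate, REQUIRED_OUTLINE)
```

A validator of this shape is independent of how the document was produced, which is the property that lets sponsors use their own data entry software and still check the output.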

The FDA Database

We developed a database for the FDA to hold the information from the ESDs. It is a relational database design implemented on an Oracle database and centrally administered at the FDA. All information from the ESD is automatically entered into this database with a parser—a program that reads the ESD and inserts the data into the database. The information in the data files can be entered optionally. The data files and ESD are retained in standard read-only directories on a secure local area network drive.

This database resource has many functions at the FDA. It allows retrieval of other and past application data for decision support and management analysis. For example, a reviewer is able to request (through an ad hoc query interface) information from other applications for a certain ingredient or manufacturing process. For management, the FDA is able to track statistics on applications in much finer detail. As a regulatory agency the FDA can respond much more quickly if a certain drug substance develops problems detected through the field-monitoring process.
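
The load-then-query pattern described in this section can be sketched as follows. This is an illustration only: it uses Python's built-in SQLite module in place of the Oracle database, and the two-table schema, identifiers, and ingredient names are hypothetical rather than the FDA's actual design.

```python
import sqlite3

# In-memory SQLite stands in for the production database (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE application (app_id TEXT PRIMARY KEY, sponsor TEXT);
CREATE TABLE ingredient (
    app_id TEXT REFERENCES application(app_id),
    name   TEXT
);
""")


def load_application(conn, app_id, sponsor, ingredients):
    """Insert one parsed submission; the 'with' block commits or rolls back."""
    with conn:
        conn.execute("INSERT INTO application VALUES (?, ?)", (app_id, sponsor))
        conn.executemany(
            "INSERT INTO ingredient VALUES (?, ?)",
            [(app_id, name) for name in ingredients],
        )


def applications_using(conn, ingredient):
    """Cross-application query: which applications list this ingredient?"""
    cur = conn.execute(
        "SELECT a.app_id FROM application a "
        "JOIN ingredient i ON a.app_id = i.app_id "
        "WHERE i.name = ? ORDER BY a.app_id",
        (ingredient,),
    )
    return [row[0] for row in cur]


# Simulated submissions.
load_application(conn, "APP-001", "Sponsor A", ["lactose", "magnesium stearate"])
load_application(conn, "APP-002", "Sponsor B", ["lactose"])
```

Calling `applications_using(conn, "lactose")` returns both simulated applications, which is the kind of lookup a reviewer or field-monitoring follow-up would need.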

The Reviewer Tools

Once the data are in a database, they can feed other applications and reports. Reviewer productivity can be significantly enhanced with such tools. Almost all clerical work is done by the system and not by the reviewer. The reviewer need not engage in data format translation, copying tables and data into a report, or data entry for statistical or graphing applications. According to interview-based assessments by reviewers, this kind of clerical work takes up more than half of reviewers' time.

We have developed two basic tools, but it is important to note that this architecture allows easier proliferation of such tools. The first tool we developed is a reporting application that uses the database to create a template review document. This document holds all the information and tables that the reviewer would normally have to enter by hand. Reviewers update the template with the substance of their reviews. The second tool is for data exploration. The data are programmatically obtained from the database or data files, or both, and put into a spreadsheet-based application using Open DataBase Connectivity drivers9 that allow the reviewer to do what-if analysis on the data, recalculate parameters, and easily graph data. A typical screen from this interface is shown in Figure 5.

Figure 5.

A spreadsheet interface from the reviewer tool.
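
The template review document produced by the first reviewer tool can be thought of as a fixed layout filled in from stored data. The sketch below shows that idea with Python's standard `string.Template`; the layout, field names, and values are invented for illustration and do not reflect the FDA's actual review format.

```python
from string import Template

# Hypothetical review layout; field names are assumptions for this example.
REVIEW_TEMPLATE = Template("""\
REVIEW OF APPLICATION $app_id
Sponsor: $sponsor

Dissolution summary
$table

Reviewer comments:
""")


def make_review(app_id, sponsor, rows):
    """Fill the template with sponsor data; the reviewer adds only comments."""
    table = "\n".join(f"{t:>8} min  {pct:6.1f} %" for t, pct in rows)
    return REVIEW_TEMPLATE.substitute(app_id=app_id, sponsor=sponsor, table=table)


# Simulated values that would normally come from the database.
draft = make_review("APP-001", "Sponsor A", [(10, 42.5), (20, 78.0)])
```

The draft ends at the comments heading, leaving the reviewer to supply the substance of the review.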

Data Flows

The data flows in this system do not require any special network transport; e-mail and diskette are currently the most common. The contract research organizations create the data files using a text editor (according to the published standard) or the EVA tools. The EVA contains spreadsheet-like components that enforce the required structure and output the text data files. Typically, the contract research organization then e-mails the data files to the sponsor. Security is the responsibility of the sponsor. It is easily accomplished using widely available public key encryption technologies. The sponsor must at this point integrate the data files into their ESD. The data files have standard names according to a published convention. When the sponsor has completed the application, the data files, the ESD, and the CDOC are submitted to the FDA. This would be done optimally via secure e-mail or another network transaction, but it is currently handled by diskettes sent by the U.S. postal service or a courier service because of security and legal concerns at the FDA. We expect this situation to be resolved in the near future, since a digital signature standard has recently been approved for the FDA.

On reaching the FDA, the files that make up the application are vetted against a checklist. This leads to either acceptance (filing) of the application or rejection (due to incompleteness or some other deficiency). This is still a manual process, but the architecture allows easy migration to automated filing. Once filed, the application enters a queue for review and is randomly assigned a reviewer. The ESD and the data files remain on the network as the persistent and official drug application. The reviewer receives the drug application, does the evaluation, and writes the review. The reviewer receives the files (both submitted and derived) over the network. Notice that this process allows for significant internode and intranode savings of time and effort through automation. The ESD is also parsed and the data are inserted into the database, which allows cross-application queries. Table 2 shows the data flows between the contract research organizations, sponsors, and the FDA.

Table 2.

Flow of Data Through the System, from Contract Research Organizations (CROs) to Sponsors and the Food and Drug Administration (FDA)

CRO Tasks:
  • Data file preparation

Documents Transferred to Sponsor:
  • Data files

Sponsor Tasks:
  • ESD preparation
  • Data file preparation and incorporation
  • CDOC preparation

Documents Transferred to FDA:
  • ESDs
  • Data files
  • CDOCs

FDA Tasks:
  • ESD import to the database
  • ESD and data file import to the reviewer tool
  • ESD, data file, and CDOC storage on LAN
  • Review of application

Note: FDA indicates Food and Drug Administration; ESD, electronic submission document; CDOC, companion document; LAN, local area network.
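
The checklist vetting that precedes filing is described as manual but open to automation. One possible shape for an automated completeness check is sketched below. The file names, the default ESD and CDOC names, and the rule set are all hypothetical; the real filing checklist is not specified in this paper.

```python
def filing_problems(submitted, referenced_data_files,
                    esd_name="submission.esd", cdoc_name="companion.pdf"):
    """Return completeness problems for a submission (empty list = fileable).

    submitted: set of file names received from the sponsor.
    referenced_data_files: data file names referenced inside the ESD.
    The default ESD and CDOC names are placeholders, not a real convention.
    """
    problems = []
    if esd_name not in submitted:
        problems.append(f"missing ESD: {esd_name}")
    if cdoc_name not in submitted:
        problems.append(f"missing CDOC: {cdoc_name}")
    # Every data file the ESD refers to by name must accompany it.
    for name in sorted(set(referenced_data_files) - set(submitted)):
        problems.append(f"missing data file: {name}")
    return problems


# Simulated incomplete submission.
result = filing_problems(
    {"submission.esd", "study1.dat"},
    ["study1.dat", "study2.dat"],
)
```

An empty result would correspond to acceptance for filing; anything else gives the sponsor a specific list of deficiencies.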

Status Report

The system described here is an official and operational system from the FDA's Office of Generic Drugs. It has been in operation for about one year and in pilot testing for about three years. Approximately 25 sponsors and contract research organizations have submitted applications using the system. Approximately 15 reviewers have used it for operational reviews. The three scenarios below will show the performance of this system in specific contexts.

Reviewer/Statistician Consultation

Reviewers for the FDA often need to consult with staff statisticians for a particular review. In the paper-based system, the reviewer requests a consultation with the statistician by phone, e-mail, or memo, and the statistician must deal with paper copies of data or arbitrary electronic formats. In the current Electronic Submission Program architecture, the reviewer e-mails the data files in the standard format to the statistician or shares them on the network. This reduces clerical work by both the statistician and the reviewer and satisfies the data-processing design objectives. The time of both statisticians and reviewers is expensive, and the time they spend on such clerical work as rekeying and formatting is a waste of their expertise. Based on our interviews, most reviewers who have used the Electronic Submission Program report subjective productivity increases of about 50 percent. The program also eliminates the need for a physical document room and the shuttling of large paper jackets around the FDA. Both the statistician and the reviewer can leverage the constant format of the data files to develop libraries of statistical procedures that will work for every submission. Prior to the introduction of this architecture, statistical procedures had to be modified arbitrarily to accept different data file formats.
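
A reusable statistical procedure of the kind mentioned here might look like the following sketch. It assumes rows have already been read into dictionaries keyed by column name, and the column names `TIME_MIN` and `PCT_DISSOLVED` are hypothetical placeholders rather than names taken from the standard.

```python
from collections import defaultdict
from statistics import mean


def mean_by_time(rows, time_col="TIME_MIN", value_col="PCT_DISSOLVED"):
    """Mean of value_col at each time point, keyed by time in ascending order.

    Column names are illustrative defaults; a fixed file format is what lets
    one procedure like this serve every submission unchanged.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[int(row[time_col])].append(float(row[value_col]))
    return {t: mean(values) for t, values in sorted(groups.items())}


# Simulated rows, as they might come from a standard-format data file.
rows = [
    {"TIME_MIN": "10", "PCT_DISSOLVED": "40"},
    {"TIME_MIN": "10", "PCT_DISSOLVED": "50"},
    {"TIME_MIN": "20", "PCT_DISSOLVED": "80"},
]
summary = mean_by_time(rows)
```

With arbitrary file formats, the column lookup in this function is exactly the part that would have to be rewritten for each sponsor.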

Collaboration Between Sponsors and Contract Research Organizations

The contract research organizations must send the data files to the sponsors for inclusion in the sponsors' drug applications. Under the paper system, the sponsor must in many cases incorporate paper copies of data from the contract research organization into their application. For the ESD-based system, the contract research organizations prepare the data files according to the standard format. The contract research organization then e-mails, mails, or electronically transfers the data files to the sponsor using the naming convention specified in the standard. The benefits of the new system are largely efficiencies from standards-based communications that allow automation of processing at each node of the workflow (relevant to the flow-processing design objectives).

Some contract research organizations play a much more prominent role in the application process, handling all aspects of the bioequivalence study and needing to use EVA to produce an ESD, whereas other contract research organizations submit only the data files to the sponsor. The problem with this is that EVA does not explicitly contain “groupware” or collaboration features10 that would simplify the filling out of EVA forms by more than one organization. For example, sponsors do not want to share formulation and dissolution information with the contract research organizations, even when the research organizations perform all other data preparation for the submission. We are working on extensions of the current system to solve this problem. The architecture of the Electronic Submission Program does not require global control over the system (except for the initial setting of standards) and does not require that separate organizations share any parts of their information systems. This is important for interorganizational systems and reduces the cost of maintenance and change (relevant to the cost/complexity design objectives).

Reporting

Reporting is made much easier at all nodes using this system. This is primarily because the standards for the reports reduce uncertainty in communications and allow for automated tool development. Scenarios for both sponsor-generated drug application reports and reviewer reports follow for chemistry, manufacturing, and controls data.

Reviewers must generate, at the end of the review process, a report that contains both reviewer-generated and sponsor-generated data. The paper-based system requires the reviewer to photocopy relevant parts of the jacket, rekey data in reviewer-generated tables, and integrate all this with their narrative report. The Electronic Submission Program, in contrast, requires the reviewer to generate the review template document from a software tool and add their own narrative report information. Since all data come into the FDA as machine-readable data in a standard format, they are available to the reviewer software tools. These tools, in addition to aiding in the analysis of sponsor data, allow for the automatic generation of review template documents that contain all the required sponsor data in a standard review document format. The reviewers must add only their own review comments and analysis to this document. Most clerical work is eliminated and faster review times should result.

The reviewer report is also of use to the sponsors. The sponsors can generate the same reviewer report document and see exactly what the reviewers see. This was the function most requested by sponsors. The system reduces the uncertainties of nonstandard communications. The sponsors can also use this report to check the quality of their submission and to obtain management sign-offs. Both these benefits satisfy the data-processing design objectives.

Discussion

This paper is a detailed case study of a real software architecture for interorganizational health care data exchange. Case studies have been identified as an important need in the emerging study of software architecture.11 The design objectives of this project were satisfied by using a modular and flexible architecture. The design is modular because it does not require different companies to share any parts of their information systems. The design is flexible because it allows a rapid rate of change. For example, the evolution from the proprietary markup language of the ESD to a standard such as XML is relatively easy.

Related Projects

There are other initiatives for electronic submission of regulatory information to the FDA for the purpose of streamlining the review and approval process. None of them really qualifies as an architecture, since each one is geared to presenting a document to the FDA rather than integrating the process at all nodes. The Multiagency Electronic Regulatory Submission (MERS) project (see http://www.mers.pharmasoft.se/news.html) is an SGML-based international initiative to specify a standard for a drug application submission to regulatory authorities in Europe and the United States. It does not include workflow or database interfaces, but it is a carefully developed SGML standard that could be integrated with the ESD in the future. Its developers have produced a chemistry, manufacturing, and controls document-type definition for new drug submissions.

The Drug Application Methodology with Optical Storage (DAMOS) project (see http://www.damos.org/) has many similarities to the current FDA electronic regulatory submission and review initiative. It uses various types of data files to allow both structured and unstructured data standards. It started at about the same time (1990) as the FDA initiative, and a prototype form was operational at about the same time (1993). It is an example of parallel development in response to a similar set of requirements. The FDA project includes the regulatory agency, whereas the DAMOS project is an industry initiative. The DAMOS project has released version 3.0 of its specification, which is available on the DAMOS Web site. The FDA project uses a markup standard to allow exchange of structured information like the MERS project and defines specialized data file types like the DAMOS project. The FDA project adds specification for tools and procedures at each node (sponsor, contract research organization, and the FDA) for software-supported workflow. This distinguishes the establishment of standards from an architecture to support those standards.

Future Work

The basic architecture described here is applicable to other areas that require interorganizational data flows. Other regulatory environments are obvious candidates. We are looking into the applicability of this general architecture to Minimum Data Set (MDS) reporting to the Health Care Financing Administration (HCFA) by long-term care institutions. This context requires clinical institutions to submit standards-based MDS patient data to HCFA. Currently, the data have not been used optimally because of workflow problems. Applied to this problem, the approach described here would allow incoming MDS data to be put into an active database that performs monitoring and data mining functions continuously with standard and ad hoc reporting. Standard formats and applications for workflow would reduce the effort to acquire data and increase the efficiency of use. This architecture could also be used for some basic reporting functions of multicenter distributed clinical trials. Generally, when different organizations must exchange data, a modular system will work because it allows autonomy and control at each node. Completely integrated application systems that must operate across organizational boundaries are difficult to implement for both technical and political reasons.

We intend to develop this architecture into an application framework, that is, a “reusable, ‘semi-complete’ application that can be specialized to produce common applications.”12 Frameworks are typically domain-specific and offer a blackbox interface and extensibility components that add functionality. Frameworks are currently the most effective tools for the reuse of design and software. A significant requirement in the development of a basic conceptual architecture is that it allows graceful evolution and incorporation of new technologies. This architecture is now evolving from a simple interchange system to a distributed application framework.

This work was supported by contract 223-95-3003 from the Food and Drug Administration to the University of Maryland, Baltimore School of Pharmacy and the Department of Information Systems.

References

  • 1. Kazman R, Abowd G, Bass L, Clements P. Scenario-based analysis of software architecture. IEEE Software. Nov 1996: 47-55.
  • 2. Garlan D, Shaw M. An Introduction to Software Architecture: Advances in Software Engineering and Knowledge Engineering. Vol. 1. River Edge, N. J.: World Scientific Publishing Co., 1993: 1-39.
  • 3. Slyke C, Prescott M, Kittner M. Overcoming Barriers to Distributed Interorganizational Systems. Proceedings of the 30th Annual Hawaii International Conference on System Sciences. Vol. IV. Washington, D.C.: IEEE Computer Society Press, 1996: 69-78.
  • 4. Hardwick M, Spooner D, Rando T, Morris K. Sharing manufacturing information in virtual enterprises. Communications of the Association for Computing Machinery (CACM). Feb 1996;39: 46-54.
  • 5. Kumar K, Dissel H. Sustainable collaboration: managing conflict and cooperation in interorganizational systems. MIS Quarterly 1996;20: 279-300.
  • 6. Neuman P. Risk in digital commerce. Communications of the Association for Computing Machinery. Jan 1996;39: 154.
  • 7. Williams J, Canfield K, Ritondo M. IT Lessons Learned from the Food and Drug Administration's CANDA Program. Journal of Failures and Lessons Learned on Information Technology Management. 1997;1: 39-47.
  • 8. HL7 SGML Special Interest Group: The Kona Architecture Proposal. July 7, 1997. Medical Center Information Systems, Duke University Medical Center Web Site. Available at: http://www.mcis.duke.edu/standards/HL7/sigs/sgml/kona.htm.
  • 9. Blobel B, Holena M. Comparing Middleware Concepts for Advanced Healthcare System Architectures. Int J Med Inform. Sep 1997;46: 69-85.
  • 10. Miller P, Nadkarni PM, Kidd KK, et al. Internet-based support for bioscience research: a collaborative genome center for human chromosome 12. J Am Med Inform Assoc. 1995;2: 351-64.
  • 11. Bass L, Clements P, Kazman R. Software Architecture in Practice. Reading, Mass.: Addison-Wesley, 1997.
  • 12. Fayad M, Schmidt D. Object-oriented application frameworks. Communications of the Association for Computing Machinery. Oct 1997;40: 32-8.

Articles from Journal of the American Medical Informatics Association : JAMIA are provided here courtesy of Oxford University Press
