AMIA Annual Symposium Proceedings. 2017 Feb 10;2016:705–714.

Feasibility of Extracting Key Elements from ClinicalTrials.gov to Support Clinicians’ Patient Care Decisions

Heejun Kim 1, Jiantao Bian 2, Javed Mostafa 1, Siddhartha Jonnalagadda 3, Guilherme Del Fiol 2
PMCID: PMC5333330  PMID: 28269867

Abstract

Motivation: Clinicians need up-to-date evidence from high quality clinical trials to support clinical decisions. However, applying evidence from the primary literature requires significant effort.

Objective: To examine the feasibility of automatically extracting key clinical trial information from ClinicalTrials.gov.

Methods: We assessed the coverage of ClinicalTrials.gov for high quality clinical studies that are indexed in PubMed. Using 140 random ClinicalTrials.gov records, we developed and tested rules for the automatic extraction of key information.

Results: The rate of high quality clinical trial registration in ClinicalTrials.gov increased from 0.2% in 2005 to 17% in 2015. Trials reporting results increased from 3% in 2005 to 19% in 2015. The accuracy of the automatic extraction algorithm for 10 trial attributes was 90% on average. Future research is needed to improve the algorithm accuracy and to design information displays to optimally present trial information to clinicians.

Introduction

Evidence Based Medicine (EBM) is the application of the best and most up-to-date scientific evidence from rigorous clinical studies to guide the care of individual patients. To practice EBM, clinicians need to retrieve, appraise, and integrate the latest research findings into decisions for a particular patient1. However, clinicians often cannot process the large volume of recent research results within a typically busy patient care environment. As a result, over 60% of the clinical questions that clinicians raise go unanswered because of the time the search task would require2. Systematic reviews are a common approach to synthesizing the evidence on a specific clinical topic, but the process is slow and expensive. As a result, many clinical topics have no systematic reviews, and a large percentage of published reviews are outdated3. Hence, solutions are needed to help clinicians use the results of original clinical studies in the care of their patients.

To improve the efficiency of information seeking, substantial progress has been made on the evidence retrieval process. Several approaches are available to help clinicians identify high quality and high impact clinical studies4-8. However, less attention has been dedicated to helping clinicians quickly judge the relevance of a clinical study to a particular patient and grasp the gist of the study findings. To improve the search process, clinicians are advised to formulate their questions according to four components: patient population, intervention, comparison, and the outcome of interest (PICO)9. While several information retrieval studies have used the PICO framework, most prior work has focused on the search process10-12. Preliminary studies have demonstrated the feasibility of automatically extracting core clinical trial characteristics and results from narrative text such as journal article abstracts13,14. Several studies have also extracted PICO elements from narrative text15-18. More recently, the increased use of clinical trial registries that record semi-structured trial information provides a unique opportunity for improving the information extraction process19.

The overall goal of the present study is to assess the feasibility of automatically extracting PICO elements of clinical trials from ClinicalTrials.gov. Specifically, we aimed to (1) assess the coverage (i.e., the rate of high quality clinical trials indexed in PubMed that have been registered in ClinicalTrials.gov) of ClinicalTrials.gov for high quality trials to examine its comprehensiveness as a data source; and (2) develop and assess the accuracy of an algorithm to extract PICO elements from ClinicalTrials.gov that overcomes unexpected patterns of registration. Both coverage and extraction algorithm accuracy are important for the feasibility of extracting PICO information from ClinicalTrials.gov. The resulting algorithm can be used as a component of tools that support clinicians’ decision making.

Background

ClinicalTrials.gov

ClinicalTrials.gov is an online registry of clinical research trials established by the US National Library of Medicine (NLM) in 1999 to increase transparency in clinical research. As of February 2016, ClinicalTrials.gov archives over 209,000 trials from 192 countries. Since 2007, trials covered by the Food and Drug Administration Amendments Act (FDAAA) are required to be updated within one year of completion20 with basic summary results data21. A ClinicalTrials.gov record contains detailed information about a clinical trial in semi-structured XML format. Trial records include information such as conditions, eligibility, sample size, study methods, study arms, and study outcomes. In the present study, we leverage the semi-structured information in ClinicalTrials.gov to automatically extract PICO elements.

Links between Medline and ClinicalTrials.gov

When registering a trial on ClinicalTrials.gov, a unique identifier (“NCT”) is assigned automatically by the registry. The NCT identifier is also recorded in the Medline citation of articles that report the results of a registered trial, establishing a link between Medline citations and ClinicalTrials.gov records. The International Committee of Medical Journal Editors (ICMJE) has required that all clinical trials be publicly registered before manuscripts are submitted and accepted for publication. ClinicalTrials.gov provides two types of XML files: general information about the trial methods and the trial results. These files can be retrieved through a RESTful (Representational State Transfer) Web service (https://clinicaltrials.gov/ct2/resources/download), which was used in our study method.

Huser and Cimino examined links between ClinicalTrials.gov records and PubMed citations for trials completed between 2006 and 2013 [24]. They found that 28% of the trial records had at least one linked article. The study also found that 27% of the trials completed by 2009 had basic results. In our study, we looked at the reverse direction, i.e., links from clinical trial citations in PubMed to ClinicalTrials.gov. For clinical decision support purposes, it is more reasonable to start the information seeking process by searching for relevant high quality trials in PubMed than by searching for trial records in ClinicalTrials.gov.

Previous studies extracting PICO elements

In a systematic review, Jonnalagadda et al. found four studies that automatically extracted PICO elements from biomedical publications22. Huang et al.16,17 used a naïve Bayes classifier, Boudin et al.15 used a combination of multiple classifiers (random forest, naïve Bayes, support vector machines, and multi-layer perceptron), and Demner-Fushman and Lin18 used a rule-based approach to detect PICO elements in medical abstracts. Boudin et al.15 used features such as Medical Subject Headings (MeSH) semantic types, word overlap with the article title, and number of punctuation marks, while Huang et al.16,17 used the highest-frequency terms. The average precision over PICO elements in these studies ranged from 0.63 to 0.84.

Method

The study method consisted of: (1) retrieving citations of recent high quality trials from PubMed and examining coverage of ClinicalTrials.gov for high quality trials; (2) retrieving ClinicalTrials.gov records for the trials retrieved in Step 1; (3) selecting 40 random ClinicalTrials.gov records from the set in Step 2 and manually inspecting the PICO elements from the records of these trials; (4) writing a set of rules to extract PICO elements from trial records; and (5) evaluating the accuracy of the rules with an independent set of 100 trials.

Retrieving citations of recent high quality trials from PubMed

To identify high quality clinical studies in PubMed, we used the machine learning classifier developed by Kilicoglu et al.4 This classifier uses a naïve Bayes algorithm with features such as word frequency, semantic features, and citation metadata from PubMed; its precision is 83%. By applying the classifier, we retrieved 178,302 citations of high quality studies published in high impact journals between December 2005 and December 2015. Next, we used the E-utilities API (http://www.ncbi.nlm.nih.gov/books/NBK25501/) to retrieve full PubMed citations in XML format. A Python script was used to extract the ClinicalTrials.gov identifier (NCT) from the <AccessionNumber> tag where the value of the parent <DataBankName> tag was “ClinicalTrials.gov” (Figure 1).

Figure 1. Fragment of a Medline citation in XML format retrieved using the NCBI E-utilities. The squares highlight the relevant nodes for the extraction of ClinicalTrials.gov identifiers.

For instance, the following URL can be used to retrieve the PubMed citation in XML format for the article with PubMed identifier 24725238: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=24725238&report=xml. The coverage of ClinicalTrials.gov for high quality trials was calculated by dividing the number of PubMed citations with a link to ClinicalTrials.gov by the total number of high quality citations. Annual coverage rates were obtained for the 2005 to 2015 period.
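The extraction step described above can be sketched as follows. This is a minimal, illustrative example using Python's standard library; the embedded citation fragment mirrors the structure shown in Figure 1, with an NCT value taken from the text, and is not a complete PubMed record.

```python
import xml.etree.ElementTree as ET

# Minimal PubMed citation fragment (structure as in Figure 1; content abridged).
CITATION_XML = """
<PubmedArticle>
  <DataBankList>
    <DataBank>
      <DataBankName>ClinicalTrials.gov</DataBankName>
      <AccessionNumberList>
        <AccessionNumber>NCT01768286</AccessionNumber>
      </AccessionNumberList>
    </DataBank>
  </DataBankList>
</PubmedArticle>
"""

def extract_nct_ids(citation_xml):
    """Return NCT identifiers from DataBank nodes named ClinicalTrials.gov."""
    root = ET.fromstring(citation_xml)
    ids = []
    for bank in root.iter("DataBank"):
        if bank.findtext("DataBankName", "") == "ClinicalTrials.gov":
            ids += [n.text for n in bank.iter("AccessionNumber")]
    return ids

print(extract_nct_ids(CITATION_XML))  # ['NCT01768286']
```

Coverage then follows directly: the number of citations for which this function returns at least one identifier, divided by the total number of high quality citations.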

Retrieving ClinicalTrials.gov records

We used the ClinicalTrials.gov Web service API to retrieve trial records in XML format. For each trial, we retrieved two types of XML files using a Python script: general information about the trial methods, and the trial results. For instance, the methods record for the trial with identifier “NCT01768286” was retrieved using: http://clinicaltrials.gov/ct2/show/NCT01768286/?displayxml=true. The trial results record was retrieved with: https://clinicaltrials.gov/ct2/show/NCT01768286/?resultsxml=true (Figure 2). Because there is a temporal gap between registering a trial’s methods and reporting its results, some registered trials do not yet include results. Only completed trials with basic summary results were selected for the extraction study. We also calculated the percentage of those ClinicalTrials.gov records that reported trial results.
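The two download URLs quoted above can be generated with a small helper. This is a hypothetical convenience function, not part of the authors' script; it only concatenates the endpoint patterns given in the text.

```python
def trial_urls(nct_id):
    """Build the two legacy ct2 download URLs quoted in the text
    for a given NCT identifier (methods record and results record)."""
    base = f"https://clinicaltrials.gov/ct2/show/{nct_id}"
    return {
        "methods": f"{base}/?displayxml=true",
        "results": f"{base}/?resultsxml=true",
    }

print(trial_urls("NCT01768286")["methods"])
# https://clinicaltrials.gov/ct2/show/NCT01768286/?displayxml=true
```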

Figure 2. Example of a trial results record from ClinicalTrials.gov. The highlighted areas show the primary outcome, study arm intervention, unit of measurement, and result for the primary outcome.

Selecting ClinicalTrials.gov records

From the trial records retrieved in the previous step, we selected a random set of 40 records (development set) out of 463 trials completed in 2014 to develop a set of rules for PICO extraction. An independent random set of 100 records (evaluation set) was selected from the same 2014 set for evaluating the accuracy of the algorithm.
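A disjoint development/evaluation split of this kind can be sketched as below. The paper does not describe its sampling procedure in detail; this is an assumed implementation (the seed and shuffle strategy are illustrative), with only the set sizes (40 and 100, from a pool of 463) taken from the text.

```python
import random

def split_dev_eval(nct_ids, n_dev=40, n_eval=100, seed=0):
    """Draw disjoint development and evaluation samples of trial records.
    Sizes follow the paper; the seed is arbitrary and for reproducibility only."""
    rng = random.Random(seed)
    shuffled = list(nct_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_dev], shuffled[n_dev:n_dev + n_eval]

# 463 placeholder identifiers standing in for the trials completed in 2014.
pool = [f"NCT{i:08d}" for i in range(463)]
dev, evaluation = split_dev_eval(pool)
print(len(dev), len(evaluation))  # 40 100
```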

PICO extraction rules

One of the authors (HK), assisted by two co-authors (GDF, JM), manually reviewed the 40 trial records in the development set. Based on this review, rules were developed to automatically extract PICO elements from the trial records. We referred to the ClinicalTrials.gov XML schema (https://clinicaltrials.gov/ct2/html/images/info/public.xsd) to guide rule development and to understand how ClinicalTrials.gov users are instructed on registering trials.

Rules were designed to extract 10 trial attributes associated with PICO elements (Table 1). The rules are essentially a combination of XPATH expressions. For example, from the <condition> tag (Case 1, Table 2), <minimum_age> tag (Case 2, Table 2), and <enrollment> tag (Case 3, Table 2), we extracted data elements related to the study population, i.e., the main condition of interest in study participants, the minimum participant age, and the total enrollment.

Table 1.

Trial attributes extracted by the PICO extraction algorithm for each PICO element.

Population: main condition of interest in study participants; minimum and maximum participant age; total enrollment
Intervention and comparison: number of study arms; study arm interventions
Outcome: number of participants who started the trial; number of participants who completed the trial; primary outcome; results for the primary outcome

Table 2.

Examples of PICO elements successfully extracted from clinical trial records in ClinicalTrials.gov and corresponding XPATH.

Case Identifier (NCT) Element type Sample XML node Pseudo code with XPATH Extracted result
1 00006237 Main condition of interest in study participants <condition>Melanoma (Skin)</condition> //clinical_study/condition Melanoma (Skin)
2 00006237 Minimum participant age <minimum_age>18 Years</minimum_age> //clinical_study/eligibility/minimum_age 18
3 00006237 Total enrollment <enrollment type="Actual">432</enrollment> //clinical_study/enrollment[@type="Actual"] 432
4 00081731 Study arm intervention <group group_id="O1"><title>Optimal Medical Therapy</title></group> //clinical_study/clinical_results/outcome_list/outcome[type="Primary"]/group_list/group/title Optimal Medical Therapy
5 00081731 Number of participants who completed the trial <participant_flow>…<group group_id="P1"><title>Optimal Medical Therapy</title></group>…<participants group_id="P1" count="472"/>…</participant_flow> //clinical_study/clinical_results/participant_flow/period_list/period/milestone_list/milestone[title="COMPLETED"]/participants_list/participants/@count 472
6 00081731 Results for the primary outcome <measure>…<measurement group_id="O1" value="20"/>…</measure> //clinical_study/clinical_results/outcome_list/outcome[type="Primary"]//measurement/@value 20 (Cardiovascular or Renal Death)
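Cases 1-3 in Table 2 can be reproduced with the limited XPath subset in Python's standard library. This is a sketch, not the authors' rule set: the record below is condensed to the three tags of interest, with values taken from Table 2.

```python
import xml.etree.ElementTree as ET

# Condensed methods record; tag names follow the public ClinicalTrials.gov
# schema, values are those of Cases 1-3 in Table 2.
TRIAL_XML = """
<clinical_study>
  <condition>Melanoma (Skin)</condition>
  <eligibility><minimum_age>18 Years</minimum_age></eligibility>
  <enrollment type="Actual">432</enrollment>
</clinical_study>
"""

def extract_population(xml_text):
    """Extract the population-related attributes of Cases 1-3 (Table 2)."""
    root = ET.fromstring(xml_text)
    min_age = root.findtext("eligibility/minimum_age", "")
    return {
        "condition": root.findtext("condition"),
        # Keep only the numeric part, as in Table 2 ("18 Years" -> "18").
        "minimum_age": min_age.split()[0] if min_age else None,
        "enrollment": root.findtext("enrollment[@type='Actual']"),
    }

print(extract_population(TRIAL_XML))
```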

The rules retrieved PICO elements scattered across the two XML files and merged them based on the study arm intervention. For instance, the number of participants who completed the trial is only available from the <participants> tag (Case 5, Table 2) under the <participant_flow> tag. However, the study arm design often changes, so the <title> tag (Case 4, Table 2) under the <outcome_list> tag needs to be checked to verify the final study arm intervention. When there is discrepant information (e.g., the arm design planned before the trial versus the actual arm intervention), the latest information is selected.

We retrieved outcome elements from the <clinical_results> tag. The description of the primary outcome measure was extracted from the <title> tag under the <outcome> tag. The actual outcome measurements were retrieved from the value attribute of the <measurement> tag, as in Case 6 (Table 2). The algorithm produces output in XML format according to the XML schema available at https://sites.google.com/site/automaticpicoextraction/xml-schema. The complete rule set is available at https://sites.google.com/site/automaticpicoextraction/.
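The pairing of an outcome measurement with its study arm via the group_id attribute (Cases 4 and 6, Table 2) can be sketched as follows. The results record below is a condensed, assumed fragment modeled on Table 2; real results records contain additional nesting.

```python
import xml.etree.ElementTree as ET

# Condensed results record modeled on Cases 4 and 6 in Table 2.
RESULTS_XML = """
<clinical_study><clinical_results><outcome_list>
  <outcome>
    <type>Primary</type>
    <title>Cardiovascular or Renal Death</title>
    <group_list>
      <group group_id="O1"><title>Optimal Medical Therapy</title></group>
    </group_list>
    <measure><measurement_list>
      <measurement group_id="O1" value="20"/>
    </measurement_list></measure>
  </outcome>
</outcome_list></clinical_results></clinical_study>
"""

def extract_primary_outcomes(xml_text):
    """Pair each primary-outcome measurement with its study arm via group_id."""
    root = ET.fromstring(xml_text)
    rows = []
    for outcome in root.iter("outcome"):
        if outcome.findtext("type") != "Primary":
            continue
        arms = {g.get("group_id"): g.findtext("title")
                for g in outcome.iter("group")}
        for m in outcome.iter("measurement"):
            rows.append({"outcome": outcome.findtext("title"),
                         "arm": arms.get(m.get("group_id")),
                         "value": m.get("value")})
    return rows

print(extract_primary_outcomes(RESULTS_XML))
```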

Evaluation

Two of the co-authors (JB, GDF) rated the algorithm output for the 10 attributes described above. To determine the accuracy of each extracted attribute, the raters compared the algorithm output with the original trial record on the ClinicalTrials.gov Web site. An attribute was considered accurate when the attribute value in the algorithm output was identical to the value that could be found manually in ClinicalTrials.gov. After an initial round of calibration with 20 trials, Cohen’s Kappa inter-rater agreement was 0.51. Disagreements were resolved by consensus between the two raters. For instance, in one case there was disagreement on the successful extraction of study arm interventions: the algorithm correctly identified three out of five arm interventions. One rater considered the attribute accurate because three arms were extracted correctly, while the other considered it inaccurate due to incompleteness. We agreed that the algorithm is expected to extract all arms correctly for an attribute to count as accurate. With the calibrated evaluation criteria, one of the evaluators (JB) continued the evaluation on the remaining 80 trials. For one attribute (number of participants who completed the trial), 9 trials had no data in ClinicalTrials.gov; these were excluded from the evaluation. In total, 991 trial attributes were evaluated.
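For reference, Cohen's Kappa for two raters on a binary accurate/inaccurate judgment can be computed as below. This sketch is not the authors' evaluation code; the rating vectors in the usage line are invented.

```python
def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters giving binary labels (0/1).
    Assumes chance agreement pe < 1 (i.e., ratings are not constant)."""
    n = len(r1)
    po = sum(a == b for a, b in zip(r1, r2)) / n     # observed agreement
    p1, p2 = sum(r1) / n, sum(r2) / n                # each rater's rate of 1s
    pe = p1 * p2 + (1 - p1) * (1 - p2)               # chance agreement
    return (po - pe) / (1 - pe)

# Invented example: raters agree on 3 of 4 attributes.
print(round(cohens_kappa([1, 1, 0, 0], [1, 1, 1, 0]), 2))
```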

Results

ClinicalTrials.gov coverage

Out of 178,302 high quality clinical studies indexed in PubMed between 2005 and 2015, 15,360 (8.6%) had a record in ClinicalTrials.gov. Among those registered trials, 3,496 (22.7%) trials had basic results. The coverage of ClinicalTrials.gov for high quality clinical studies has increased from 0.2% in 2005 to 17% in 2015 (Figure 3). The percentage of trials in ClinicalTrials.gov that included results has increased from 3% in 2005 to 19% in 2015. The rate of coverage for trial results peaked (28%) in 2012, but decreased afterwards.

Figure 3. ClinicalTrials.gov coverage for high quality studies and percentage of ClinicalTrials.gov trial records with results.

Algorithm accuracy

Out of 991 trial attributes extracted from 100 trial records, 893 (90%) were accurately extracted. Overall, the population-related elements (94%) and the intervention- and comparison-related elements (97%) were easier to extract than the outcome-related elements (83%). Extraction of the total enrollment achieved the highest accuracy (100%), while extraction of the number of participants who started the trial showed the lowest accuracy (70%). Table 3 provides the accuracy for each trial attribute.

Table 3.

Accuracy of the PICO extraction algorithm according to 10 trial attributes for 100 trial records.

Main condition of interest in study participants: 77 (77%)
Minimum participant age: 99 (99%)
Maximum participant age: 99 (99%)
Total enrollment: 100 (100%)
Number of study arms: 96 (96%)
Study arm interventions: 97 (97%)
Number of participants who started the trial: 70 (70%)
Number of participants who completed the trial: 72 (79%, 72/91)
Primary outcome: 97 (97%)
Results for the primary outcome: 86 (86%)
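The overall accuracy reported in the text follows directly from the per-attribute counts in Table 3 (nine attributes evaluated on 100 records each, plus one attribute evaluated on 91):

```python
# Per-attribute correct counts from Table 3.
correct = {
    "condition": 77, "min_age": 99, "max_age": 99, "enrollment": 100,
    "n_arms": 96, "arm_interventions": 97, "started": 70,
    "completed": 72, "primary_outcome": 97, "primary_results": 86,
}
# Denominator is 100 per attribute, except "completed" (91 evaluable trials).
evaluated = {k: 100 for k in correct}
evaluated["completed"] = 91

total_correct = sum(correct.values())           # 893
total_eval = sum(evaluated.values())            # 991
print(round(100 * total_correct / total_eval))  # 90
```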

Discussion

In this study, we aimed to assess the coverage of ClinicalTrials.gov for high quality trials indexed in PubMed and to examine the feasibility of extracting PICO elements from ClinicalTrials.gov records. To our knowledge, this is the first attempt to examine the feasibility of extracting PICO elements from ClinicalTrials.gov to support clinicians’ information needs. According to our analysis, although the overall coverage of ClinicalTrials.gov is still relatively low, the coverage has steadily increased since 2005 and clinical trials with higher impact should be more likely to be registered due to regulatory and publication requirements. For trials that are registered, most PICO elements could be accurately extracted from the randomly selected 100 ClinicalTrials.gov records using simple XPATH-based rules. Our proposed method could be used to enable alternate information displays to summarize evidence from clinical trials in a way that reduces clinicians’ cognitive effort. For example, in a study by Slager et al., physicians rated an information display structured according to PICO elements more favorably when compared with PubMed narrative abstracts23. The algorithm developed in the present study can be used to support the implementation of such structured displays.

Several studies have attempted to extract PICO elements with machine learning algorithms and/or simple rules15-18. Previous approaches were assessed in terms of precision (63-84%) because PICO elements were extracted from narrative text and the methods were therefore framed as information retrieval problems. In contrast, our approach extracts PICO elements from semi-structured data, with only one extraction target per data element, so we measured accuracy (90%). As a result, direct comparisons with previous studies are difficult. The level of extraction also differed among studies: three15-17 of those studies classified PICO elements at the sentence level, while one study18 extracted PICO elements at the object level, as ours does. Overall, our rule-based approach performed fairly well, most likely because it operates on semi-structured registry data, whereas previous approaches had to extract information from narrative abstracts. However, unlike ours, previous methods do not depend on trial registration in ClinicalTrials.gov and work with any clinical trial published in PubMed. An optimal solution could combine both methods, i.e., PICO extraction from ClinicalTrials.gov whenever a trial record is available and extraction from PubMed otherwise.

Extracting the number of participants enrolled in a clinical trial was the easiest task, since the actual sample size was consistently recorded in ClinicalTrials.gov. Overall, elements related to the population and the study arm interventions were also easy to extract. Extracting the main condition of participants, the number of participants who started or completed the trial, and the results for the primary outcome required careful handling of unexpected registration patterns. After accounting for most of the variation in our sample, the algorithm accuracy for these challenging cases was fairly high (70% to 86%).

The main challenges in applying the proposed approach are the low rate of high quality trials registered on ClinicalTrials.gov and the variation in trial registration format and content. The coverage of ClinicalTrials.gov (11.2%) and the rate of trials with results (21.3%) were fairly low in 2014; however, both have increased gradually, likely due to increased regulatory pressure. Moreover, some systematic delays are expected in registering summary results, as trials covered by the Food and Drug Administration Amendments Act (FDAAA) are required to be updated within one year of completion20. Given such delays, PICO elements from ClinicalTrials.gov will not be available in a timely manner for recent trials. An analysis of common algorithm errors is described below.

Algorithm error analysis

First, the rules extracted only one condition even when multiple conditions for study participants were registered in ClinicalTrials.gov. This error is fairly easy to correct, so higher accuracy is expected in the next phase of development. However, in some cases the registered information was incomplete. For instance, “Opioid-Induced Constipation (OIC)” was the original condition registered in ClinicalTrials.gov and was extracted correctly by the rules, but our evaluator (JB) found “Non-cancer-related Pain” as an additional condition by reading other tags (Case 7, Table 4).
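The first part of this error (keeping only one of several registered <condition> tags) can be corrected by collecting all matches rather than the first, as sketched below with an invented two-condition record. Note that this fix does not address truly incomplete registrations such as Case 7, where the second condition appears only in free text.

```python
import xml.etree.ElementTree as ET

# Hypothetical record with two registered <condition> entries.
XML = """
<clinical_study>
  <condition>Opioid-Induced Constipation (OIC)</condition>
  <condition>Non-cancer-related Pain</condition>
</clinical_study>
"""

def all_conditions(xml_text):
    """Collect every <condition> tag instead of only the first."""
    root = ET.fromstring(xml_text)
    return [c.text for c in root.findall("condition")]

print(all_conditions(XML))
```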

Table 4.

Examples of challenging cases for extracting PICO elements from clinical trial records in ClinicalTrials.gov.

Case NCT identifier Element type XML node for challenging case Number of similar cases Type of error
7 01336205 Main condition of interest in study participants <condition>Opioid-Induced Constipation (OIC)</condition> 2 Incomplete input by ClinicalTrials.gov user
<brief_title>Assessment of Long-term Safety in Patients With Non-cancer-related Pain and Opioid-induced Constipation</brief_title>
8 00924638 Results for the primary outcome <category><sub_title>Month 138 (N=79)</sub_title><measurement_list><measurement group_id="O1" value="748.1" lower_limit="614.3" upper_limit="911.1"/> 6 Time series outcome measurements
<category><sub_title>Month 186 (N=98)</sub_title><measurement_list><measurement group_id="O1" value="353.1" lower_limit="284.0" upper_limit="439.1"/>
<category><sub_title>Month 246 (N=85)</sub_title><measurement_list><measurement group_id="O1" value="317.3" lower_limit="247.4" upper_limit="407.1"/>
9 00883246 Outcome <participant_flow>…<group group_id="P1"><title>Atherectomy</title> 6 Study arm changed without updating related information
<outcome_list>…<group group_id="O1"><title>Claudicant Subgroup</title>

Second, there were several time series measurements in the results for the primary outcome, but the rules could not extract them properly. In Case 8 (Table 4), 11 measurements were registered in the time series, but the rules extracted only the first; only three of the measurements are shown in Table 4 for illustration. The corresponding enrollment for each study arm at each measurement point was recorded in the <period_list> tag. The revised rules should process these multiple measurements.
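Processing all time points rather than the first can be sketched as follows; the fragment is condensed from Case 8 in Table 4, where each <category> carries one time point in its <sub_title>.

```python
import xml.etree.ElementTree as ET

# Condensed from Case 8 (Table 4): one <category> per time point.
TS_XML = """
<measure>
  <category><sub_title>Month 138 (N=79)</sub_title>
    <measurement_list><measurement group_id="O1" value="748.1"
      lower_limit="614.3" upper_limit="911.1"/></measurement_list></category>
  <category><sub_title>Month 186 (N=98)</sub_title>
    <measurement_list><measurement group_id="O1" value="353.1"
      lower_limit="284.0" upper_limit="439.1"/></measurement_list></category>
</measure>
"""

def time_series(xml_text):
    """Return every (time point, value) pair instead of only the first."""
    root = ET.fromstring(xml_text)
    points = []
    for cat in root.findall("category"):
        for m in cat.iter("measurement"):
            points.append((cat.findtext("sub_title"), m.get("value")))
    return points

print(time_series(TS_XML))
```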

Finally, outcome-related attributes, such as the number of participants who started or completed the trial and the results for the primary outcome, were the most challenging to extract from ClinicalTrials.gov. As mentioned earlier, there are two types of XML files in ClinicalTrials.gov: general information about the trial methods, and the trial results. There is a temporal gap between the inputs for the two files: the former is usually entered before the trial, and the latter after. During this period, the arm design can change for several reasons, so the two XML files can contain different arm information. Moreover, several tags contain arm-related information, such as the <participant_flow> tag and the <outcome_list> tag. Unfortunately, users of ClinicalTrials.gov often input these data inconsistently, which makes study arm related information challenging to extract.

As in Case 9 (Table 4), there were six cases in which the values of the tags for study arm interventions were changed completely. At the beginning of the study, all participants were to be treated with directional atherectomy. However, in the actual study the primary patency rate was measured in patients treated for Claudication RCC 1-3. Since the number of participants who completed the trial is only recorded in the <participant_flow> tag and the final intervention design is only recorded in the <outcome_list> tag, there is no way to match the study arm intervention with the number of participants who completed the trial. ClinicalTrials.gov should consider alerting its users when they register a study arm intervention that differs from the study plan so that they can update the outdated information based on the changed study design. There were also nine cases in which the arm names were slightly changed. For instance, in trial NCT00211887, the arm name “Interferon Beta 1a” was changed to an acronym, “IFB-1a.” The next phase of development should incorporate advanced text extraction techniques, such as named-entity recognition (NER), and medical semantic networks such as the Unified Medical Language System (UMLS)25 to overcome the current limitations of the rules.
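Short of full NER or UMLS normalization, a lightweight fallback for slightly changed arm names is layered string matching: exact, then case-insensitive, then fuzzy. This is an assumed heuristic, not part of the published rules; note it would still miss acronym rewrites like “IFB-1a,” which is precisely where the semantic methods cited above are needed.

```python
import difflib

def match_arm(name, candidates, cutoff=0.8):
    """Match a results-section arm title against participant-flow arm titles:
    exact match, then case-insensitive, then fuzzy (difflib ratio >= cutoff).
    Returns the matched candidate or None."""
    if name in candidates:
        return name
    lowered = {c.lower(): c for c in candidates}
    if name.lower() in lowered:
        return lowered[name.lower()]
    hits = difflib.get_close_matches(name.lower(), list(lowered),
                                     n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None

print(match_arm("Interferon Beta-1a", ["Interferon Beta 1a"]))
```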

Limitations

This study had several limitations. First, a small sample (n=100) of trial records was used to test the rules for the automatic extraction of PICO elements from ClinicalTrials.gov. Future studies are needed to test the generalizability of the handcrafted rules on a larger dataset. Second, the study was limited to ClinicalTrials.gov. Other large clinical trial repositories could be used to increase coverage, but different methods will be needed since there is no standard format for trial registration across registries. Third, our findings are specific to high quality clinical trials published in high impact journals. Still, these are the trials most likely to be useful for clinical decision support. Last, we did not evaluate the usefulness and readability of the extracted PICO elements. Future studies are needed to design such displays and to assess the usefulness and readability of the extracted information for patient care decision-making. Another important step is to map the extracted information to standard controlled vocabularies whenever possible to increase the utility of the extracted PICO elements.

Conclusion

Our study found that the coverage of ClinicalTrials.gov for high quality trials published in high impact journals, although relatively low, has increased continuously over the last ten years. Our rule-based method was able to automatically and accurately extract the majority of PICO elements from a sample of 100 clinical trial records. The study suggests that using ClinicalTrials.gov to extract key clinical trial information in support of patient care decision-making is feasible. Future work includes assessing and tuning the algorithm on a larger sample, expanding to other clinical trial repositories, developing structured information displays according to PICO elements, and assessing the usefulness of such displays for supporting patient care decisions.

Acknowledgement

This study was supported by grant numbers 1R01LM011416-01 and R00LM011389 from the National Library of Medicine (NLM).

References

  • 1.Sackett DL, Rosenberg WM, Gray JA, Haynes RB, Richardson WS. Evidence based medicine: What it is and what it isn't. BMJ. 1996;312(7023):71–72. doi: 10.1136/bmj.312.7023.71. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Del Fiol G, Workman TE, Gorman PN. Clinical questions raised by clinicians at the point of care: A systematic review. JAMA internal medicine. 2014;174(5):710–718. doi: 10.1001/jamainternmed.2014.368. [DOI] [PubMed] [Google Scholar]
  • 3.Bastian H, Glasziou P, Chalmers I. Seventy-five trials and eleven systematic reviews a day: How will we ever keep up? PLoS medicine. 2010;7(9):e1000326. doi: 10.1371/journal.pmed.1000326. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Kilicoglu H, Demner-Fushman D, Rindflesch TC, Wilczynski NL, Haynes RB. Towards automatic recognition of scientifically rigorous clinical research evidence. Journal of the American Medical Informatics Association. 2009;16(1):25–31. doi: 10.1197/jamia.M2996. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Wilczynski NL, McKibbon KA, Walter SD, Garg AX, Haynes RB. MEDLINE clinical queries are robust when searching in recent publishing years. J Am Med Inform Assoc. 2013;20(2):363–368. doi: 10.1136/amiajnl-2012-001075. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Bernstam EV, Herskovic JR, Aphinyanaphongs Y, Aliferis CF, Sriram MG, Hersh WR. Using citation data to improve retrieval from MEDLINE. J Am Med Inform Assoc. 2006;13(1):96–105. doi: 10.1197/jamia.M1909. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Aphinyanaphongs Y, Tsamardinos I, Statnikov A, Hardin D, Aliferis CF. Text categorization models for high-quality article retrieval in internal medicine. Journal of the American Medical Informatics Association. 2005;12(2):207–216. doi: 10.1197/jamia.M1641. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Hauser SE, Demner-Fushman D, Jacobs JL, Humphrey SM, Ford G, Thoma GR. Using wireless handheld computers to seek information at the point of care: An evaluation by clinicians. J Am Med Inform Assoc. 2007;14(6):807–815. doi: 10.1197/jamia.M2424. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Richardson WS, Wilson MC, Nishikawa J, Hayward RS. The well-built clinical question: A key to evidence-based decisions. ACP J Club. 1995;123(3):A12–3. [PubMed] [Google Scholar]
  • 10.Akatov AK, Zueva VS, Dmitrenko OA. A new approach to establishing the set of phages for typing methicillin-resistant staphylococcus aureus. J Chemother. 1991;3(5):275–278. doi: 10.1080/1120009x.1991.11739105. [DOI] [PubMed] [Google Scholar]
  • 11.Schardt C, Adams MB, Owens T, Keitz S, Fontelo P. Utilization of the PICO framework to improve searching PubMed for clinical questions. BMC Med Inform Decis Mak. 2007;7:16. doi: 10.1186/1472-6947-7-16. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Hoogendam A, de Vries Robbe PF, Overbeke AJ. Comparing patient characteristics, type of intervention, control, and outcome (PICO) queries with unguided searching: A randomized controlled crossover trial. J Med Libr Assoc. 2012;100(2):121–126. doi: 10.3163/1536-5050.100.2.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Summerscales RL, Argamon S, Bai S, Huperff J, Schwartzff A. Automatic summarization of results from clinical trials. 2011:372–377. [Google Scholar]
  • 14.Kiritchenko S, Bruijn B, Carini S, Martin J, Sim I. ExaCT: Automatic extraction of clinical trial characteristics from journal publications. BMC medical informatics and decision making. 2010;10(1):1. doi: 10.1186/1472-6947-10-56. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Boudin F, Nie J, Bartlett JC, Grad R, Pluye P, Dawes M. Combining classifiers for robust PICO element detection. BMC medical informatics and decision making. 2010;10(1):1. doi: 10.1186/1472-6947-10-29. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Huang K, Liu CC, Yang S, et al. Classification of PICO elements by text features systematically extracted from PubMed abstracts. 2011:279–283. [Google Scholar]
  • 17.Huang K, Chiang I, Xiao F, Liao C, Liu CC, Wong J. PICO element detection in medical text without metadata: Are first sentences enough. J Biomed Inform. 2013;46(5):940–946. doi: 10.1016/j.jbi.2013.07.009. [DOI] [PubMed] [Google Scholar]
  • 18.Demner-Fushman D, Lin J. Answering clinical questions with knowledge-based and statistical techniques. Computational Linguistics. 2007;33(1):63–103. [Google Scholar]
  • 19.Huser V, Cimino JJ. Evaluating adherence to the international committee of medical journal editors’ policy of mandatory, timely clinical trial registration. J Am Med Inform Assoc. 2013;20(e1):e169–74. doi: 10.1136/amiajnl-2012-001501. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Kishore R, Tabor E. Overview of the FDA amendments act of 2007: Its effect on the drug development landscape. Drug Inf J. 2010;44(4):469–475. [Google Scholar]
  • 21.Prayle AP, Hurley MN, Smyth AR. Compliance with mandatory reporting of clinical trial results on ClinicalTrials.gov: Cross sectional study. BMJ. 2012;344:d7373. doi: 10.1136/bmj.d7373. [DOI] [PubMed] [Google Scholar]
  • 22.Jonnalagadda SR, Goyal P, Huffman MD. Automating data extraction in systematic reviews: A systematic review. Systematic reviews. 2015;4(1):1. doi: 10.1186/s13643-015-0066-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Slager LS, Weir C, Kim H, Mostafa J, Fiol GD. Alternative information display of clinical research to support clinical decision making. A formative evaluation. 2015 [Google Scholar]
  • 24.Huser V, Cimino JJ. Linking ClinicalTrials.gov and PubMed to track results of interventional human clinical trials. PloS one. 2013;8(7):e68409. doi: 10.1371/journal.pone.0068409. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Bodenreider O. The unified medical language system (UMLS): Integrating biomedical terminology. Nucleic Acids Res. 2004;32(Database issue):D267–70. doi: 10.1093/nar/gkh061. [DOI] [PMC free article] [PubMed] [Google Scholar]
