Abstract
Background
Critical thyroid nodule features are contained in unstructured ultrasound (US) reports. The Thyroid Imaging, Reporting, and Data System (TI-RADS) uses five key features to risk stratify nodules and recommend appropriate intervention. This study aims to analyze the quality of US reporting and the potential benefit of Natural Language Processing (NLP) systems in efficiently capturing TI-RADS features from text reports.
Materials and Method
This retrospective study used free-text thyroid US reports from an academic center (A) and community hospital (B). Physicians created “gold standard” annotations by manually extracting TI-RADS features and clinical recommendations from reports to determine how often they were included. Similar annotations were created using an automated NLP system and compared to the gold standard.
Results
282 reports contained 409 nodules at least 1-cm in maximum diameter. The gold standard identified three nodules (0.7%) which contained enough information to calculate a complete TI-RADS score. Shape was described most often (92.7% of nodules) while margins were described least often (11.0%). A median of two TI-RADS features was reported per nodule. The NLP system captured echogenicity and margins significantly less often than the gold standard (relative reductions of 27.5% and 58.9%, respectively). 108 nodule reports (26.4%) included clinical management recommendations, which were included more often at site A than B (33.9% vs. 17.0%, p<0.05).
Conclusions
These results suggest a gap between current US reporting styles and those needed to implement TI-RADS and achieve NLP accuracy. Synoptic reporting should prompt more complete thyroid US reporting, improved recommendations for intervention, and better NLP performance.
Keywords: Natural language processing, thyroid, ultrasound, TI-RADS
INTRODUCTION
Thyroid nodules are very common in the adult population. Using high-resolution ultrasound (US), nodules can be detected incidentally in up to 67% of North American adults. However, the vast majority (95%) of thyroid nodules are benign.1,2 Providers thus need an approach to risk stratify nodules to determine appropriate work-up and intervention.
US plays a key role in both detecting nodules and assessing critical features which may indicate a nodule’s risk of malignancy.3 In 2017, the American College of Radiology (ACR) released its Thyroid Imaging, Reporting and Data System (TI-RADS). This system, akin to the Breast Imaging, Reporting and Data System (BI-RADS) for mammography assessment, includes 1) a standardized lexicon for radiologists to describe critical nodule features and 2) a scoring algorithm indicating appropriate clinical recommendations for intervention.4-6 In agreement with other professional societies, TI-RADS recommends limiting routine biopsies to thyroid nodules at least 1-cm in maximum diameter.
Currently, thyroid nodule information is entered into US reports within electronic health records (EHR) by dictation, direct entry, or speech recognition applications. These reports exist in a free-text, narrative form of varying style and structure, such that manually extracting critical data and applying TI-RADS to each nodule becomes time-consuming, costly, and prone to error and variability.7 Furthermore, the sheer volume of data contained within the EHR necessitates a higher level of processing capability.
For this precise purpose, Natural Language Processing (NLP) systems have risen in popularity over the last few decades within the biomedical domain. “Natural language” describes the communication and language used by human beings. This language is distinct from, for example, a programming language used by a computer or an artificial language produced by a translating system. NLP describes the automated extraction of data from text and was historically developed as an application of artificial intelligence.7,8 Outside the medical domain, NLP applications include simple features like text recognition (e.g. a search function) or personal assistant devices. In healthcare, NLP has played a large role in information extraction from EHR text documents, which are essentially unstructured data in the form of natural language. In theory, training an NLP system on reports organized around a point-based algorithm like TI-RADS requires less customized programming than training it on completely unstructured reports. Therefore, synoptic style reporting like TI-RADS offers new opportunities to incorporate NLP systems into the EHR for research or clinical decision support.
While TI-RADS is a helpful tool, few studies have investigated whether the reporting style used by radiologists contains sufficient data to consistently apply TI-RADS scoring.6 Even fewer studies have examined the role of NLP in extracting critical nodule features from thyroid nodule US reports. Therefore, the goals of this study are two-fold: 1) to characterize current thyroid nodule US reporting styles to evaluate consistency with TI-RADS and 2) to evaluate the potential benefit of TI-RADS for extracting thyroid nodule features using NLP.
METHODS
Study Population
The Institutional Review Board approved this retrospective study and waived the need for informed consent. All radiology reports used in this study were obtained from an existing, retrospectively collected dataset of 273 thyroid US reports from an academic site (A) and a community hospital center (B) from 2007 to 2013 (Figure 1). The initial database consisted of a computer-generated random 10% sample of thyroid US reports from adult patients (≥ 18 years old) with diagnosis codes for thyroid nodular disease. Any reports describing patients who previously had their entire thyroid removed (i.e. post-total thyroidectomy) or patients with a personal history of thyroid cancer were excluded (n = 26). In agreement with TI-RADS scoring recommendations, nodules smaller than 1-cm in maximum diameter were excluded (n = 220). The final population consisted of 409 total nodules for analysis.
Figure 1 – Thyroid US Report Study Population.

This flowchart illustrates the selection process for the thyroid US reports and thyroid nodules included in the study. Reports were collected from site A, an academic center, and site B, a community hospital, from 2007 to 2013. Nodules < 1-cm in maximum diameter were excluded.
Data Annotation and NLP Extraction
To evaluate US reporting of nodule features, manual and NLP annotations of each report were created based on the five TI-RADS categories. NLP data extraction was performed using clinical Text Analysis and Knowledge Extraction System (cTAKES), and performance of this NLP system is reported elsewhere.9,10 Manual annotation was performed using the Extensible Human Oracle Suite of Tools (eHOST).11 Two independent physician annotators created nodule annotations with attributes relevant to TI-RADS categories (Table 1). Any reports using ambiguous language to describe echogenicity, for example “heterogenous”, were categorized as “mixed” since they did not match any TI-RADS descriptions. Inter-rater disagreements were reviewed and adjudicated by a third physician rater to create a gold standard annotation.
TABLE 1 –
Thyroid Nodule Features and Clinical Recommendations
| Attribute | Lexicon |
|---|---|
| Shape | Taller-than-wide, wider-than-tall* |
| Echogenic Foci | Present/absent |
| Echogenicity | Hyperechoic, hypoechoic, anechoic, isoechoic, mixed** |
| Margins | Smooth, irregular |
| Composition | Solid, cystic, mixed |
| Clinical Recommendation | Examples |
| Not Provided | No recommendation present |
| Present | |
| No follow-up | “benign” or “likely benign” |
| Follow-up | “… may be clinically amenable to biopsy.” |
| | “… consider US follow-up in the future.” |
| Biopsy | “… biopsy strongly recommended to rule out malignancy.” |
*Calculated using nodule (height:width) ratio. No report explicitly described a nodule as “taller-than-wide” or “wider-than-tall”.
**“mixed” terminology indicated ambiguous language and was assigned a default score of 1.
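To illustrate how a lexicon such as the one in Table 1 could drive automated extraction, the following is a minimal keyword-matching sketch in Python. It is not the cTAKES pipeline used in this study: the regular expressions, the function name `extract_features`, and the naive negation handling are assumptions made purely for illustration.

```python
import re

# Hypothetical keyword patterns loosely based on the Table 1 lexicon.
# This is NOT the cTAKES configuration used in the study.
LEXICON = {
    "echogenicity": {
        "hyperechoic": r"\bhyperechoic\b",
        "hypoechoic": r"\bhypoechoic\b",
        "anechoic": r"\banechoic\b",
        "isoechoic": r"\bisoechoic\b",
        # Ambiguous wording is mapped to "mixed", as in the manual annotation.
        "mixed": r"\bheterogen(?:e)?ous\b",
    },
    "margins": {
        "smooth": r"\bsmooth\b",
        "irregular": r"\birregular\b",
    },
}


def extract_features(text):
    """Return a dict of TI-RADS-style features found in one nodule description.

    A value of None means the feature was not mentioned (i.e. unreported).
    """
    text = text.lower()
    found = {}
    for feature, patterns in LEXICON.items():
        hits = [label for label, pat in patterns.items() if re.search(pat, text)]
        if not hits:
            found[feature] = None
        elif len(hits) == 1:
            found[feature] = hits[0]
        else:
            found[feature] = "ambiguous"  # conflicting terms in the same text

    # Composition: "mixed" when both solid and cystic components are named.
    solid = re.search(r"\bsolid\b", text) is not None
    cystic = re.search(r"\bcyst(?:ic)?\b", text) is not None
    if solid and cystic:
        found["composition"] = "mixed"
    elif solid:
        found["composition"] = "solid"
    elif cystic:
        found["composition"] = "cystic"
    else:
        found["composition"] = None

    # Echogenic foci: very naive negation check ("no"/"without" directly before).
    foci = re.search(r"\b(no |without )?(?:micro|macro)?calcifications?\b", text)
    if foci is None:
        found["echogenic_foci"] = None
    else:
        found["echogenic_foci"] = "absent" if foci.group(1) else "present"
    return found


report = "1.4 cm solid hypoechoic nodule with smooth margins and no calcifications."
print(extract_features(report))
```

A matcher this simple would miss paraphrases and longer-range negation (for example, "no suspicious calcifications"), which is one reason free-text variability is difficult for automated extraction.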
By adapting methods from Griffin et al., reports were manually analyzed for clinical recommendations for intervention.6 Intervention categories consisted of recommendation not provided, “no follow-up”, “follow-up”, or “immediate biopsy” and were determined by the specific radiologist language used in US reports (Table 1). The interventions indicated by retrospectively calculated TI-RADS scores were then compared to the recommendations included in each report to explore whether radiologist reports reflected recommendations based on TI-RADS category.
TI-RADS Scoring
The 2017 ACR white paper contains a detailed description of TI-RADS scoring.5 Briefly, five categories of critical thyroid nodule features (composition, echogenicity, shape, margins, and echogenic foci) were retrospectively assigned point scores from features available in reports. Features with a higher risk of malignancy receive more points. For example, under the composition category, a solid nodule receives two points while a cystic nodule receives none. A nodule’s TI-RADS score (i.e. risk category) is determined by its summed points across all five categories.
Any absent features were assigned a default score of zero. Any nodules with ambiguous or “mixed” echogenicity received a score of one in that category.
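The scoring rules described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the scoring code used in the study. The composition points and the default of one point for “mixed” echogenicity come from the text; the remaining point values follow the 2017 ACR TI-RADS chart as recalled here and should be verified against the white paper. The single point assigned to “present” echogenic foci is a placeholder assumption, since the ACR chart assigns one to three points depending on the type of focus. The names `POINTS`, `shape_from_dimensions`, and `score_nodule` are hypothetical.

```python
# Point tables keyed by the Table 1 lexicon. Values not stated in the text
# are recalled from the 2017 ACR chart and should be checked before use.
POINTS = {
    "composition": {"cystic": 0, "mixed": 1, "solid": 2},
    "echogenicity": {
        "anechoic": 0,
        "hyperechoic": 1,
        "isoechoic": 1,
        "mixed": 1,  # ambiguous wording: default score of one (per the text)
        "hypoechoic": 2,
    },
    "shape": {"wider-than-tall": 0, "taller-than-wide": 3},
    "margins": {"smooth": 0, "irregular": 2},
    # ASSUMPTION: placeholder value; the ACR chart gives 1-3 points by foci type.
    "echogenic_foci": {"absent": 0, "present": 1},
}


def shape_from_dimensions(height_cm, width_cm):
    """Derive shape from the height:width ratio, as in the Table 1 footnote."""
    return "taller-than-wide" if height_cm > width_cm else "wider-than-tall"


def score_nodule(features):
    """Sum points over the five categories; unreported features score zero."""
    total, missing = 0, []
    for category, table in POINTS.items():
        value = features.get(category)
        if value is None:
            missing.append(category)  # absent feature: default score of zero
            continue
        total += table[value]
    return {"total": total, "missing": missing, "complete": not missing}


nodule = {"composition": "solid", "echogenicity": "mixed"}
print(score_nodule(nodule))
```

Returning the list of missing categories alongside the total makes it explicit when a score rests on default values rather than on reported findings.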
Statistical Analysis
Reporting style and quality were compared in multiple analyses. The gold standard was first used to identify nodules and the reporting frequency of TI-RADS features. This was then compared against the NLP system’s performance on the same task. Only the gold standard was used to compare reporting styles and clinical recommendations between sites A and B.
Statistical analyses were performed using STATA Special Edition 16.0. Fisher’s exact test was used to compare categorical variables, while the χ² test was used to compare ordinal variables, namely TI-RADS scores. P-values < 0.05 were considered statistically significant.
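For readers who wish to reproduce a categorical comparison without STATA, a two-sided Fisher’s exact test on a 2×2 table can be sketched with the Python standard library alone. This is an illustrative implementation under the usual hypergeometric definition, not the authors’ analysis code; the function name `fisher_exact_two_sided` is hypothetical, and small numerical differences from STATA output are possible.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table (with the same
    margins) that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, row1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    p_obs = prob(a)
    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Example layout: rows are annotation sources, columns are reported / not reported.
# Counts for margins: 45 of 409 (gold standard) vs. 17 of 376 (NLP).
p_margins = fisher_exact_two_sided(45, 409 - 45, 17, 376 - 17)
print(f"margins: p = {p_margins:.4f}")
```

The same function can be applied to any of the reported-versus-unreported comparisons between annotation sources or between sites, and the result checked against the tabulated p-values.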
RESULTS
Study Population
Overall inter-rater agreement in the gold standard annotation was 90.5%. Raters agreed most on nodule dimensions (95.3%) and least on less frequently mentioned features like calcifications (74.3%) and borders (54.2%). Of the 153 thyroid US reports from site A, the gold standard identified 227 thyroid nodules. The 120 reports from site B contained 182 nodules. Of the total 409 nodules, the NLP system captured 376 nodules (91.9%).
Evaluation of Reported Features between Annotators
Table 2 shows the frequency of reported TI-RADS features per nodule using gold standard annotations in comparison to NLP performance. The most frequently reported features include shape (92.7%) and composition (53.1%). Margins were reported least often (11.0%). Most reports did not mention echogenicity (61.8%) or echogenic foci (82.1%).
TABLE 2 –
Comparison of NLP Output to Gold Standard Annotations for TI-RADS Features
| | Gold Standard (n = 409) | NLP (n = 376) | p-value |
|---|---|---|---|
| Frequency of Reported Fields | N (%) | N (%) | |
| Shape | 379 (92.7) | 349 (92.8) | 1.000 |
| Echogenic Foci | 73 (17.9) | 58 (15.4) | 0.39 |
| Echogenicity | 156 (38.2) | 104 (27.7) | 0.002 |
| Margins | 45 (11.0) | 17 (4.52) | 0.001 |
| Composition | 217 (53.1) | 174 (46.3) | 0.06 |
NLP = Natural Language Processing
When comparing the NLP system output against the gold standard, similar trends were seen in which features were more or less likely to be reported. Capture of echogenicity (38.2% vs. 27.7%, p = 0.002) and margins (11.0% vs. 4.52%, p = 0.001) was significantly lower with the NLP system than with the gold standard. Notably, the NLP system identified a description of margins for only 17 of the 376 nodules it captured.
Figure 2 illustrates the total number of missing TI-RADS fields per nodule. Overall, a median of two of the five critical features was reported per nodule. According to the gold standard, only three of the 409 nodules (0.7%) had enough detail to calculate a complete TI-RADS score. The NLP system did not identify any complete nodule descriptions. Compared to the gold standard, the NLP system showed significantly more nodules missing three or more features (79.9% vs. 66.94%, p < 0.001).
Figure 2 – Characteristics of Total Missing TI-RADS Features Among Individual Nodule Reports.
Both gold standard annotations and NLP output contain almost no nodules with full reporting such that a complete TI-RADS score can be calculated. A majority of reports are missing three of the five TI-RADS features. Compared to the Gold Standard, the NLP System identified significantly fewer reports missing two features and more reports missing four features. TI-RADS = Thyroid Imaging, Reporting, and Data System
Reporting Style Analysis between Sites
Using the gold standard, the frequency of reported nodule features was compared between site A versus site B (Table 3). Site A reported 644 features for 227 nodules, while site B reported 531 features for 182 nodules. Overall, there was no difference in which site was more likely to document nodule features (p = 0.46). The order of features in how frequently they were reported was identical between sites.
TABLE 3 –
Comparison of Reporting Sites in Reporting TI-RADS
| | Site A (n = 227) | Site B (n = 182) | p-value |
|---|---|---|---|
| Frequency of Reported Fields | N (%) | N (%) | |
| Shape | 209 (92.1) | 170 (93.4) | 0.70 |
| Echogenic Foci | 39 (17.2) | 34 (18.7) | 0.70 |
| Echogenicity | 92 (40.5) | 64 (35.2) | 0.31 |
| Margins | 24 (10.6) | 21 (11.5) | 0.75 |
| Composition | 127 (56.0) | 90 (49.5) | 0.20 |
Clinical Recommendations for Intervention
Clinical recommendations at site A and site B are compared in Table 4. Overall, radiologists at both sites included clinical recommendations for a minority of nodules (33.9% at site A vs. 17.0% at site B), with site B being significantly less likely to include recommendations than site A (p < 0.001). At site A, 45 of the 77 nodules (58.4%) with radiologist-reported clinical recommendations disagreed with the interventions indicated by retrospective TI-RADS scores, which were determined from the gold standard (Figure 3). Similarly, among the 31 nodules with recommendations at site B, the rate of disagreement was 54.8% (p = 0.83). Manually calculated TI-RADS scores suggested intervention more often than radiology reports at both site A (36.4% of recommendations) and site B (45.2%, p = 0.51). 93.9% of these recommendations were made for nodules which were indicated for biopsy by TI-RADS.
TABLE 4 –
Radiologist Clinical Recommendations for Diagnostic Intervention and Consistency with TI-RADS between Reporting Sites
| | Site A (n = 227) | Site B (n = 182) | p-value |
|---|---|---|---|
| Clinical Recommendations | N (%) | N (%) | |
| Present | 77 (33.9) | 31 (17.0) | <0.001 |
| No follow-up | 14 (6.17) | 2 (1.10) | |
| Follow-up | 35 (15.4) | 18 (9.89) | |
| Biopsy | 28 (12.3) | 11 (6.04) | |
| Agree with TI-RADS Recommendations | 32 (41.6) | 14 (45.2) | 0.830 |
| Disagree with TI-RADS Recommendations | 45 (58.4) | 17 (54.8) | |
| Not Provided | 150 (66.1) | 151 (83.0) | |
Figure 3 – Consistency of Clinical Recommendations with Retrospective TI-RADS Scores by Reporting Site.

Fewer than half of the nodule reports containing clinical recommendations at both sites agreed with the interventions indicated by retrospectively calculated TI-RADS scores. Among the reports which disagreed, radiologists tended to make less aggressive recommendations. No significant differences were found. TI-RADS = Thyroid Imaging, Reporting, and Data System
DISCUSSION
In our assessment of current thyroid US reporting style, we found significant underreporting of critical thyroid nodule features. A full TI-RADS score could not be calculated for nearly all nodules (99.3%). In the setting of incomplete TI-RADS scores, radiologists tended to provide recommendations for clinical intervention that were inconsistent with manually calculated TI-RADS scores or did not provide clear, actionable recommendations for further workup. Our results support the need for more explicit reporting by radiologists to provide definitive clinical decision-making support for ordering providers. Better implementation of TI-RADS reporting will improve management guidance and also allow NLP systems to more accurately and efficiently extract critical nodule features.
The role of NLP systems in automated feature extraction has been demonstrated in a number of other biomedical areas. These include a wide variety of applications ranging from other qualitatively descriptive reports (e.g. radiology, pathology, and echocardiography) to extracting diagnoses from emergency department discharge summaries.12-16 Of note, within radiology, NLP systems have been useful in applying BI-RADS to breast mammography reports with an accuracy of up to 96.6%.14 However, numerous barriers still exist to the application of NLP in the clinical domain. First, the usefulness of NLP systems relies heavily on the quantity and quality of training sets. Within our study, margins were the least reported feature; they were reported for only 45 nodules, of which NLP captured 17. Such a small sample size complicates how we interpret statistical significance between the gold standard and NLP. In terms of quality, a similar study by Castro et al. showed NLP accuracy decreases with increasing complexity of reports. Another study suggests ambiguous reporting language used by radiologists results in poor interrater agreement and NLP system capture rate.9 While TI-RADS includes a standardized lexicon for each feature, our results show a mismatch between radiologist reporting style and the proposed TI-RADS standard. The variation in style of US thyroid nodule descriptions poses a significant barrier to both manual and automated TI-RADS feature extraction.13
Our results primarily support the findings in other literature highlighting the omission of critical nodule features, whether intentional or unintentional, as a dominant issue not only in the United States but also worldwide.17-20 A study by Griffin et al. compared the quality of US reports before and after radiologists implemented a structured US reporting template based on the TI-RADS lexicon.6 In the free-text report analysis, they found shape was missing in 100% of nodule reports. Per their methods, only the explicit shape descriptions “wider-than-tall” or “taller-than-wide” from the TI-RADS lexicon were recorded. We felt nodule dimensions would be sufficient to determine shape instead. Even after controlling for this feature, at least 92% of nodules had incomplete TI-RADS descriptions in their study. Similar to our work, they found unreported features were most often margins (missing in 92.0% of nodules) and echogenic foci (86.0%). Applying other risk stratification systems, such as the American Thyroid Association (ATA) or National Comprehensive Cancer Network (NCCN) guidelines, yielded similar results. A study by Jiang et al. found fewer than 15% of US reports for thyroid nodules met criteria for high-quality reports according to the ATA and NCCN guidelines.21 Inman et al. found fewer than half of 11 critical features were included in Canadian US reports.20 Both studies incorporated additional features, like lymph node description and vascularity, which are recommended by numerous committees including the ATA and ACR. These features are not included in TI-RADS reporting but also play an important role in risk stratifying for nodule malignancy. Notably, Inman et al. also found a higher number of reported elements was associated with a shorter surgical wait time following initial US. Not only do longer wait times cause psychological burden for patients, but this finding also suggests the quality of initial US may impact how thyroid nodules are triaged for biopsy or treatment.
Inclusion of clinical recommendations and institutional differences in reporting have been investigated by other projects as well. Karkada et al. demonstrate similarly low rates (46%) of thyroid US reports including explicit recommendations from radiologists.22 Interpretation of “unclassified” nodules varied significantly between readers. Even in nodules with recommendations, our study results support this finding where radiologist interpretations of nodules disagreed with those determined by TI-RADS for 56% of nodules. Uncertainty in management of these unclassified nodules poses significant safety risks to patients and further consumes limited resources in our healthcare system. Regarding institutional differences in reporting, Inman et al. suggest academic centers tend to include more features in thyroid US reports than community hospitals do.20 This finding differs from our results. It is possible the additional elements used in their study may be more complicated features which require highly specialized radiologist evaluation. We found radiologists at the academic center in our study tended to report clinical interventions more often. This was not investigated by Karkada et al. or Inman et al. We speculate radiologists at academic centers may be more familiar with treatment guidelines since they perform FNA biopsies. Alternatively, they may simply have more experience due to the high-volume nature of these institutions. As our study only involved a single academic institution, the generalizability of this observation remains unclear at this time.
Regardless of reporting site, our clinical experience suggests radiologists very likely tend to omit features that are not present. A missing nodule feature raises ambiguity over whether it was evaluated and not seen or not evaluated at all. Because expertise in thyroid nodule US varies between radiologists, technicians, and ordering physicians, documenters should explicitly report pertinent negatives to avoid potential error and misinterpretation.17,22,23 For this purpose, many experts argue for the adoption of synoptic or templated reporting in radiology reports.6,14,18,22,24-26 A structured template with an established lexicon for each critical feature would reduce ambiguity within reports and ensure each feature is described. A prime example is provided by Griffin et al. After introducing a TI-RADS template with a required field for radiologist recommendations, the rates of feature description for all five categories improved to 100%. Furthermore, agreement between radiologists and ACR TI-RADS recommendations increased to 94%.6
Limitations of this study include its retrospective nature and the use of US reports from before the release of TI-RADS. Nonetheless, the reporting style remains consistent with more recent reports such as those used in Griffin et al., and we feel the reporting style at both institutions in this study is largely unchanged since the time these US reports were created. We also recognize that, by assigning default scores of zero to any missing features, we made assumptions for nearly all nodules when calculating TI-RADS scores. This scoring differs from that used by Griffin et al., who assigned scores of two and one to composition and echogenicity, respectively, if they could not be determined from text reports.6 From our clinical experience, default scores of zero were felt to be more appropriate on the assumption that radiologists are more likely to comment on high-risk features than to report normal or low-risk findings. Our results show these estimated scores still tend to suggest more aggressive interventions than those recommended by radiologists, primarily among nodules which were indicated for biopsy according to TI-RADS. One explanation may be how we chose to interpret and categorize specific radiologist language, which is another example illustrating the need for more precise language. Additionally, many reports only describe nodule features if they had changed significantly from previous reports. It is possible radiologists made less aggressive recommendations given the knowledge that some nodules were stable in comparison to old reports. Obtaining this information, as well as correlating US features with final pathology (where available), would be an important goal for future studies.
In conclusion, current thyroid nodule US reports do not contain enough information to consistently apply TI-RADS to nodule management. We suggest this be addressed by the implementation of synoptic reporting. Adopting the TI-RADS scoring system should include synoptic reporting such that nodule features, even when normal or negative, are still reported. Application of a synoptic TI-RADS reporting style will not only increase the quality of US reports but also allow for improved accuracy of NLP systems for research or clinical decision support.
ACKNOWLEDGEMENTS
Principal Investigator D.F.S. was responsible for conceiving the original research idea and overseeing the project. All authors participated in data analysis, discussed the results, and contributed to the final manuscript.
FUNDING
This work was supported by the National Institutes of Health [grant number 2T35DK062709-14], and the University of Wisconsin School of Medicine and Public Health.
Footnotes
DISCLOSURE
The authors report no proprietary or commercial interest in any product mentioned or concept discussed in this article. Dr. Schneider is an Associate Editor for the Journal of Surgical Research; as such, he was excluded from the entire peer-review and editorial process for this manuscript.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
REFERENCES
1. Tan GH. Thyroid Incidentalomas: Management Approaches to Nonpalpable Nodules Discovered Incidentally on Thyroid Imaging. Ann Intern Med. 1997;126(3):226. doi: 10.7326/0003-4819-126-3-199702010-00009
2. Ezzat S, Sarti DA, Cain DR, Braunstein GD. Thyroid Incidentalomas: Prevalence by Palpation and Ultrasonography. Arch Intern Med. 1994;154(16):1838–1840. doi: 10.1001/archinte.1994.00420160075010
3. Haugen BR, Alexander EK, Bible KC, et al. 2015 American Thyroid Association Management Guidelines for Adult Patients with Thyroid Nodules and Differentiated Thyroid Cancer: The American Thyroid Association Guidelines Task Force on Thyroid Nodules and Differentiated Thyroid Cancer. Thyroid. 2016;26(1):1–133. doi: 10.1089/thy.2015.0020
4. Grant EG, Tessler FN, Hoang JK, et al. Thyroid Ultrasound Reporting Lexicon: White Paper of the ACR Thyroid Imaging, Reporting and Data System (TIRADS) Committee. J Am Coll Radiol. 2015;12(12):1272–1279. doi: 10.1016/j.jacr.2015.07.011
5. Tessler FN, Middleton WD, Grant EG, et al. ACR Thyroid Imaging, Reporting and Data System (TI-RADS): White Paper of the ACR TI-RADS Committee. J Am Coll Radiol. 2017;14(5):587–595. doi: 10.1016/j.jacr.2017.01.046
6. Griffin AS, Mitsky J, Rawal U, Bronner AJ, Tessler FN, Hoang JK. Improved Quality of Thyroid Ultrasound Reports After Implementation of the ACR Thyroid Imaging Reporting and Data System Nodule Lexicon and Risk Stratification System. J Am Coll Radiol. 2018;15(5):743–748. doi: 10.1016/j.jacr.2018.01.024
7. Meystre SM, Savova GK, Kipper-Schuler KC, Hurdle JF. Extracting Information from Textual Documents in the Electronic Health Record: A Review of Recent Research. Yearb Med Inform. 2008;17(01):128–144. doi: 10.1055/s-0038-1638592
8. Chapman WW, Nadkarni PM, Hirschman L, D’Avolio LW, Savova GK, Uzuner O. Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions. J Am Med Inform Assoc JAMIA. 2011;18(5):540–543. doi: 10.1136/amiajnl-2011-000465
9. Dedhia PH, Chen KJ, Imbus JR, Schneider DF. Ambiguous and Incomplete: Natural Language Processing Reveals Problematic Reporting Styles in Thyroid Ultrasound Reports.
10. Savova GK, Masanz JJ, Ogren PV, et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. J Am Med Inform Assoc. 2010;17(5):507–513. doi: 10.1136/jamia.2009.001560
11. South B, Shen S, Leng J, Forbush T, DuVall S, Chapman W. A Prototype Tool Set to Support Machine-Assisted Annotation. In: BioNLP: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing. Montréal, Canada: Association for Computational Linguistics; 2012:130–139. https://www.aclweb.org/anthology/W12-2416. Accessed July 29, 2019.
12. Patterson BW, Jacobsohn GC, Shah MN, et al. Development and validation of a pragmatic natural language processing approach to identifying falls in older adults in the emergency department. BMC Med Inform Decis Mak. 2019;19. doi: 10.1186/s12911-019-0843-7
13. Castro SM, Tseytlin E, Medvedeva O, et al. Automated annotation and classification of BI-RADS assessment from radiology reports. J Biomed Inform. 2017;69:177–187. doi: 10.1016/j.jbi.2017.04.011
14. Sippo DA, Warden GI, Andriole KP, et al. Automated Extraction of BI-RADS Final Assessment Categories from Radiology Reports with Natural Language Processing. J Digit Imaging. 2013;26(5):989–994. doi: 10.1007/s10278-013-9616-5
15. Liu K, Mitchell KJ, Chapman WW, Crowley RS. Automating Tissue Bank Annotation from Pathology Reports – Comparison to a Gold Standard Expert Annotation Set. AMIA Annu Symp Proc. 2005;2005:460–464.
16. Xu H, Anderson K, Grann VR, Friedman C. Facilitating Cancer Research using Natural Language Processing of Pathology Reports. 565–72 2004.
17. Symonds CJ, Seal P, Ghaznavi S, Cheung WY, Paschke R. Thyroid nodule ultrasound reports in routine clinical practice provide insufficient information to estimate risk of malignancy. Endocrine. 2018;61(2):303–307. doi: 10.1007/s12020-018-1634-0
18. Gamme G, Parrington T, Wiebe E, et al. The utility of thyroid ultrasonography in the management of thyroid nodules. Can J Surg. 2017;60(2):134–139. doi: 10.1503/cjs.010316
19. Qadan L, Ahmed A, Kapila K. Thyroid Ultrasound Reports: Deficiencies and Recommendations. Med Princ Pract. 2019;28(3):280–283. doi: 10.1159/000497789
20. Inman A, Liu K, Ong K, et al. Completeness of ultrasound reporting impacts time to biopsy for benign and malignant thyroid nodules. Am J Surg. 2017;213(5):931–935. doi: 10.1016/j.amjsurg.2017.03.030
21. Jiang L, Lee CY, Sloan DA, Randle RW. Variation in the Quality of Thyroid Nodule Evaluations Before Surgical Referral. J Surg Res. 2019;244:9–14. doi: 10.1016/j.jss.2019.06.024
22. Karkada M, Costa AF, Imran SA, et al. Incomplete Thyroid Ultrasound Reports for Patients With Thyroid Nodules: Implications Regarding Risk Assessment and Management. Am J Roentgenol. 2018;211(6):1348–1353. doi: 10.2214/AJR.18.20056
23. Su HK, Dos Reis LL, Lupo MA, et al. Striving Toward Standardization of Reporting of Ultrasound Features of Thyroid Nodules and Lymph Nodes: A Multidisciplinary Consensus Statement. Thyroid. 2014;24(9):1341–1349. doi: 10.1089/thy.2014.0110
24. Naik SS, Hanbidge A, Wilson SR. Radiology Reports: Examining Radiologist and Clinician Preferences Regarding Style and Content. Am J Roentgenol. 2001;176(3):591–598. doi: 10.2214/ajr.176.3.1760591
25. Berlin L. Pitfalls of the Vague Radiology Report. 2000:8.
26. Barbosa F, Maciel LMZ, Vieira EM, de Azevedo Marques PM, Elias J, Muglia VF. Radiological Reports: A Comparison between the Transmission Efficiency of Information in Free Text and in Structured Reports. Clinics. 2010;65(1):15–21. doi: 10.1590/S1807-59322010000100004

