JCO Clin Cancer Inform. 2022 May 27;6:e2100129. doi: 10.1200/CCI.21.00129

Machine Learning Approach to Facilitate Knowledge Synthesis at the Intersection of Liver Cancer, Epidemiology, and Health Disparities Research

Travis C Hyams,1 Ling Luo,2 Brionna Hair,1 Kyubum Lee,3 Zhiyong Lu,2 Daniela Seminara1
PMCID: PMC9225668  PMID: 35623021

PURPOSE

Liver cancer is a global challenge, and disparities exist across multiple domains and throughout the disease continuum. However, liver cancer's global epidemiology and etiology are shifting, and the literature is rapidly evolving, presenting a challenge to the synthesis of knowledge needed to identify areas of research need and to develop research agendas focused on disparities. Machine learning (ML) techniques can be used to semiautomate the literature review process and improve efficiency. In this study, we detail our approach and provide practical benchmarks for the development of a ML approach to classify literature and extract data at the intersection of three fields: liver cancer, health disparities, and epidemiology.

METHODS

We performed a six-phase process comprising training (I), validating (II), confirming (III), and performing error analysis (IV) for a ML classifier. We then developed an extraction model (V) and applied it (VI) to the liver cancer literature identified through PubMed. We present precision, recall, F1, and accuracy metrics for the classifier and extraction models as appropriate for each phase of the process. We also provide the results of the application of our extraction model.

RESULTS

With limited training data, we achieved a high degree of accuracy for both our classifier and for the extraction model for liver cancer disparities research literature performed using epidemiologic methods. The disparities concept was the most challenging to accurately classify, and concepts that appeared infrequently in our data set were the most difficult to extract.

CONCLUSION

We provide a roadmap for using ML to classify and extract comprehensive information on multidisciplinary literature. Our technique can be adapted and modified for other cancers or diseases where disparities persist.

INTRODUCTION

Liver Cancer Disparities

Liver cancer is a global public health challenge with more than 900,000 cases diagnosed and approximately 830,000 deaths in 2020.1,2 There are multiple established risk factors including viral, environmental, and behavior-related factors, and the etiology of the disease varies globally.3,4 Additionally, significant disparities persist across population factors such as sex,1,5,6 race and ethnicity,7,8 geography,2,6,7,9 socioeconomic status,5,10 and throughout the disease continuum from prevention to treatment and mortality.5-8,10-12

CONTEXT

  • Key Objective

  • Liver cancer occurs globally, and significant disparities exist across disease domains and populations. It is necessary to synthesize and analyze relevant literature on liver cancer to ensure that strategic priorities address disparities. In this study, we present the methods for developing a machine learning (ML) approach to synthesize literature at the intersection of liver cancer, epidemiology, and health disparities.

  • Knowledge Generated

  • It is possible to achieve a high degree of accuracy for a ML model to analyze multidisciplinary literature with relatively few training abstracts.

  • Relevance

  • More efficient ways to synthesize multidisciplinary literature are needed, and our methods can be adapted and applied to other cancers or diseases where health disparities exist. The model metrics that we present in this study can be used to benchmark the success of similarly designed ML approaches.

Effective knowledge synthesis of multidisciplinary literature is essential to guide the development of a research agenda addressing disparities in liver cancer. However, it is challenging to efficiently perform knowledge synthesis in this area given the rapidly evolving literature and the numerous etiological, geographical, methodological, and disease continuum domains of interest.

Machine Learning Approaches to Systematic Review

Advances in machine learning (ML) capabilities allow for more efficient use of researcher time and semiautomation of the literature review process.13 ML approaches can be used for literature reviews in two ways: (1) text classification and (2) data extraction.13 In this study, we detail our approach to developing a ML method to classify literature and extract data at the intersection of three interrelated fields: liver cancer, health disparities, and epidemiology. The overarching goal of this study was to develop a database that can be used to efficiently extract, evaluate, and synthesize literature at the interface of these areas. The approach we outline serves as a case study for performing a similar process for literature that bridges multiple disciplines. This methodology can be modified and applied to other cancers or diseases where disparities are prevalent. In this study, we also provide practical benchmarks and metrics to gauge the success of similarly designed studies.

METHODS

A visual depiction of our methodology can be found in Figure 1. We have labeled each phase in the process: (I) training, (II) validation, (III) confirmation, (IV) error analysis, (V) extraction, and (VI) application.

FIG 1.

Study process diagram. BiLSTM, bidirectional long short-term memory; C, classifier; CNN, convolutional neural network; CRF, conditional random field; Ensemble-ALL, all listed deep learning techniques; ML, machine learning; R, round.

RESULTS

Phase I: Training the ML Classifier

The purpose of the training phase was to develop and optimize a ML model able to sort articles into two categories: relevant and irrelevant. We began by developing an initial PubMed search strategy that targeted the concepts liver cancer, health disparities, and epidemiology (Data Supplement). Our broad search criteria were designed to identify all relevant articles but could not exclude a high proportion of articles that did not meet our criteria. For all searches, we automatically removed editorials, letters to the editor, non–data-based reviews, and commentaries.
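To make the retrieval step concrete, below is a minimal sketch of a programmatic PubMed search using Biopython's Entrez E-utilities wrapper. The query string is a hypothetical stand-in, not the study's actual search strategy (which is in the Data Supplement); the publication-type exclusions mirror the filters described above.

```python
# Illustrative PubMed retrieval via NCBI E-utilities (Biopython).
# The query below is a hypothetical stand-in for the study's actual
# search strategy (see the Data Supplement).
from Bio import Entrez

Entrez.email = "your.name@example.org"  # required by NCBI; placeholder

query = (
    '("liver neoplasms"[MeSH Terms] OR "liver cancer"[Title/Abstract]) '
    'AND (disparit*[Title/Abstract]) '
    'AND (epidemiolog*[Title/Abstract]) '
    'NOT (editorial[Publication Type] OR letter[Publication Type] '
    'OR comment[Publication Type])'
)

handle = Entrez.esearch(db="pubmed", term=query, retmax=10000)
record = Entrez.read(handle)
handle.close()

pmids = record["IdList"]  # PubMed IDs matching the query
print(f"Retrieved {len(pmids)} PMIDs")
```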

We conducted training of the ML classifier in 11 iterative rounds of manual curation, and we used curated articles from the current and all previous rounds to train each new classifier for the next round. We conducted rounds iteratively to constantly evaluate the model at each round and ensure that we only curated enough articles to reach a stable model performance. Criteria for abstract relevancy can be found in the Data Supplement. Abstracts must have fulfilled all three criteria (liver cancer, epidemiology, and health disparities) to be considered relevant. In the following paragraphs, we describe our process in detail.

During round 1 (R1), we conducted the PubMed search and manually classified an initial, random subset of abstracts (n = 673) as relevant or irrelevant. We then used the lists of curated PubMed IDs (PMIDs) to train the initial ML classifier (C1) with the LitSuggest system developed by the National Library of Medicine.14 This classifier uses an ensemble of ML techniques, including support vector machine, k-nearest neighbor, random forest, and perceptron, together with a logistic regression classifier that aggregates and reports their output as a single model. In subsequent references, this ensemble of traditional ML techniques will be referred to as LitSuggest.
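LitSuggest's internals are not reproduced here, but the general pattern, several traditional learners whose outputs feed a logistic regression aggregator, can be sketched with scikit-learn's stacking API. This is an illustrative reconstruction under our own feature choices (TF-IDF), not LitSuggest's actual implementation.

```python
# Rough scikit-learn analogue of a LitSuggest-style ensemble: four
# traditional learners aggregated by logistic regression.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

base_learners = [
    ("svm", LinearSVC()),
    ("knn", KNeighborsClassifier()),
    ("rf", RandomForestClassifier()),
    ("perceptron", Perceptron()),
]

# Logistic regression combines the base-learner outputs into a single
# relevance probability per abstract.
classifier = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    StackingClassifier(estimators=base_learners,
                       final_estimator=LogisticRegression()),
)

# abstracts: list of abstract strings; labels: 1 = relevant, 0 = irrelevant
# classifier.fit(abstracts, labels)
# scores = classifier.predict_proba(new_abstracts)[:, 1]
```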

We then searched PubMed again, selected another random, unique subset of PMIDs (R2), and applied the initial ML model (C1, trained using the first round of curated abstracts) to the R2 test set. The C1 classifier assigned a relevance score (range: 0-1) to each abstract in the test set corresponding to the likelihood that it was relevant. The relevance score was calculated using an ensemble logistic regression technique, and we considered a model score of ≥ 0.50 to indicate relevance. We then manually classified abstracts in R2 as relevant or irrelevant to verify the results of the C1 model. We repeated this process for R3-R11, folding the previous round's test set into the training data for each new classifier (C2-C10), until we determined that the metrics had stabilized adequately when all models (C1-10) were applied to the same manually curated set of articles (n = 207). The cumulative PMIDs from the 11 rounds (n = 718 positive; n = 1,381 negative) constituted our initial training set, for a total of 2,099 abstracts.
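In code terms, the iterative curation loop looks like the sketch below. The round data (r1_texts, r2_texts, ...) and the manually_curate step are hypothetical stand-ins for the PubMed subsets and the human review described above.

```python
# Condensed sketch of rounds R2-R11: score a new random subset with the
# latest classifier, verify its calls by hand, then fold the verified
# round into the training data for the next classifier.
THRESHOLD = 0.50  # model score >= 0.50 was considered relevant

train_texts, train_labels = list(r1_texts), list(r1_labels)  # R1 seed set

for round_texts in [r2_texts, r3_texts]:  # ... continuing through R11
    classifier.fit(train_texts, train_labels)            # C1, C2, ...
    scores = classifier.predict_proba(round_texts)[:, 1]
    predicted = (scores >= THRESHOLD).astype(int)        # model's call

    verified = manually_curate(round_texts, predicted)   # human review

    train_texts.extend(round_texts)
    train_labels.extend(verified)
```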

Phase I Training Results

Generally, the precision and recall of the classifier increased with each addition of the data from the previous round until classifier round 8 (C8). The results of each classifier round (C1-10) on the same test set of PMIDs are summarized in Table 1. We achieved maximum accuracy of approximately 91% and F1 score of 67% in rounds C8-10.

TABLE 1.

Results of Each Training Round (C1-10) Against the Same Test Set (n = 207)


Phase II: Validating the ML Classifier

To evaluate the robustness of our classifier and compare several ML techniques, we conducted two rounds of validation. For validation, we used the 2,099 classified PMIDs identified in phase I in addition to 764 negative abstracts randomly selected, on any topic, from the whole PubMed database. Including random negative articles improves model performance by ensuring that the classifier can adequately classify articles outside the scope of our initial targeted searches.

The database of 2,863 PMIDs (718 positive, 1,381 negative, and 764 random negative) was used for two distinct validation analyses. Each validation analysis was treated separately and used all 2,863 articles: 80% were randomly selected to train the classifier, and the trained classifiers were then applied to the remaining 20% of abstracts (n = 574). For this phase, we trained the classifier using LitSuggest in addition to advanced deep learning techniques that use neural networks, including convolutional neural network (CNN),15 bidirectional long short-term memory (BiLSTM),16 transformer,17 and BioBERT,18 as well as an averaging ensemble19 of the LitSuggest model and all listed deep learning techniques (Ensemble-ALL). The Ensemble-ALL method averaged the prediction scores of the individual models to assign its own prediction score (> 0.50 = relevant). To justify the ML approach, we also applied the search strategy (Data Supplement) to the validation sets to compare the query-based method's performance with the ML approach.
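The averaging step of Ensemble-ALL is simple to express. Below is a minimal sketch, assuming hypothetical lists texts and labels of abstracts and 0/1 relevance labels, and assuming each fitted model exposes a hypothetical predict_scores method returning a relevance score in [0, 1] per abstract.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 80%/20% split used within each validation analysis (illustrative).
train_texts, val_texts, train_y, val_y = train_test_split(
    texts, labels, test_size=0.20, random_state=0
)

def ensemble_all_predict(models, abstracts, threshold=0.50):
    """Average per-model relevance scores; mean > 0.50 = relevant."""
    # `models` holds the fitted LitSuggest, CNN, BiLSTM, transformer, and
    # BioBERT models; `predict_scores` is a hypothetical common interface.
    scores = np.vstack([m.predict_scores(abstracts) for m in models])
    mean_scores = scores.mean(axis=0)            # averaging ensemble
    return mean_scores, mean_scores > threshold  # scores and relevance calls
```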

Phase II Validation Results

The validation of the ML classifier showed adequate and stable results across the two validation sets, and the classifier performed better than the query-based retrieval strategy. A summary of the performance of each model is provided in Table 2. Precision scores ranged from 69.32% (BiLSTM, V1) to 83.87% (Transformer, V2). Recall scores ranged from 70.92% (Transformer, V1) to 87.94% (CNN, V1). F1 scores ranged from 76.34% (Transformer, V1) to 83.27% (Ensemble-ALL, V1). Accuracy scores ranged from 87.28% (BiLSTM, V1) to 91.81% (Ensemble-ALL, V1). These validation tests show that the Ensemble-ALL model performed the best overall on the accuracy and F1 metrics. We selected the Ensemble-ALL method because its high F1 score indicates a strong balance between precision and recall.

TABLE 2.

Results of the Two Independent Validation Analyses Using 80% of the Data Set as Training and 20% as Validation


Phase III: Confirming the Classifier

The purpose of the confirmation phase was to determine whether the classifier could identify studies fulfilling all three of our criteria when applied to all studies in PubMed identified by the broad liver cancer portion of our search strategy. We used liver cancer as the encompassing initial search concept because we determined that it had the clearest boundaries. To form our test set, we classified 150,886 abstracts from the search using the Ensemble-ALL model and then randomly sampled approximately 103 abstracts in each 0.1 block of relevance scores assigned by the model (total n = 1,033 abstracts). We then manually classified these articles to calculate the model metrics for Ensemble-ALL in addition to the LitSuggest and single neural network models for comparison.
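The score-stratified sampling can be reproduced in a few lines of pandas. The sketch below assumes a hypothetical DataFrame named scored, with columns "pmid" and "score" holding the Ensemble-ALL relevance scores for all 150,886 abstracts.

```python
import numpy as np
import pandas as pd

# Group scored abstracts into 0.1-wide relevance-score bins and draw
# ~103 at random from each, yielding the manually reviewable test set.
bins = np.arange(0.0, 1.1, 0.1)
scored["bin"] = pd.cut(scored["score"], bins=bins, include_lowest=True)

sample = (
    scored.groupby("bin", observed=True)
    .apply(lambda g: g.sample(n=min(103, len(g)), random_state=0))
    .reset_index(drop=True)
)
print(len(sample))  # ~1,033 abstracts for manual classification
```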

Phase III Confirmation Results

We determined that the Ensemble-ALL model performed well when applied broadly to the liver cancer literature. The model performed with a precision of 74.30%, recall of 89.59%, F1 score of 81.23%, and accuracy of 82.28% (Table 3). The Ensemble-ALL model performed most accurately at the highest (0.9-1.0; 96.67%) and lowest (0.0-0.1; 100%) model-assigned relevance scores (Data Supplement). Over 94% of articles fell within these two score ranges. The model performed most poorly in the 0.5-0.6 relevancy score range, with an accuracy of 54.62% (n = 657; 0.44% of all classified abstracts).

TABLE 3.

Results of the Machine Learning Classifier Confirmation


Phase IV: Error Analysis

To qualitatively evaluate the performance of the ML classifier, we performed an error analysis using the false-positive (n = 137) and false-negative (n = 41) abstracts from the confirmation phase. A reviewer (T.C.H.) qualitatively evaluated the false-positive and false-negative abstracts. The reviewer determined the inclusion domain that was missing (liver cancer, disparities, or epidemiology) from false-positive results and the importance of false-negative articles to the overall data set by evaluating the disparities domain of the erroneously classified abstracts.

Phase IV Error Analysis Results

Of the 137 false-positive results, most abstracts were missing the disparities component (n = 119, 86.9%). Only 12 (8.8%) abstracts were not focused on liver cancer, and six (4.4%) abstracts did not use epidemiologic methods. Twenty (48.8%) of the 41 false-negative results included disparities by age, 31 (75.6%) by sex, seven (17.1%) by geography, and one (2.4%) by race or ethnicity. None of the other disparities categories (education, income, social status, disability, or sexual orientation) were represented in the false-negative data set. Nineteen of the 41 (46.3%) false-negative results included one or more of these disparities categories in their analytic models, 20 (48.8%) stratified results by one or more of these categories, and two (4.9%) focused on a single disparities population.

Phase V: Extraction of Abstract Data

The purpose of phase V was to apply ML to extract concepts from our relevant data set and classify PMIDs into topic areas. We conducted the classification task using a named entity recognition (NER) analysis. We used the NER analysis for classification to efficiently identify combinations of topics that a simple classifier model would have missed. Using the codebook contained in the Data Supplement, we highlighted relevant words pertaining to a prespecified set of codes in 60 randomly selected, relevant abstracts using the TeamTat tool developed by the National Library of Medicine.20 Codes contained within the codebook covered etiological, care continuum, methodological, and disparities domains. This initial set of 60 coded abstracts was then used to train a data extraction model.

Like many NER tasks, we modeled the coded extraction task as a sequence labeling problem and used a bidirectional long short-term memory model with a conditional random field layer (BiLSTM-CRF) to extract these codes.21 The initial model was applied to a test set of 50 abstracts. We manually corrected the machine-assigned codes and calculated model metrics for each individual code and for the overall model. Because our overall goal was classification into topic areas according to the machine-assigned codes, these metrics were calculated by assigning each abstract a binary value (yes/no) for each code. A true positive was assigned when both the machine and the human assigned the code to an abstract ≥ 1 time; a false positive, when the machine assigned the code ≥ 1 time in the abstract but the human did not; a false negative, when the machine did not assign the code but the human did ≥ 1 time; and a true negative, when neither the machine nor the human assigned the code to the abstract.

We repeated this process a total of six times with 50 randomly selected, relevant articles per test set (n = 310 PMIDs), each round adding the previous test set to further train the extraction model for the next round. In the seventh round, we aimed to improve the metrics for concepts with < 20% F1 score (environmental factors and interventions) by purposively sampling 20 studies containing these concepts. We then annotated these studies to better train the model to identify these concepts, applied the new model to 20 studies, and calculated model metrics specifically for the intervention and environment concepts.
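The abstract-level binarization rules described above translate directly into code. A minimal sketch follows, assuming hypothetical dicts machine_codes and human_codes that map each PMID to the set of codes assigned to that abstract.

```python
# Collapse span-level NER output to abstract-level binary labels: a code
# counts as present if it was assigned >= 1 time in the abstract.
def abstract_level_metrics(machine_codes, human_codes, code):
    tp = fp = fn = tn = 0
    for pmid in human_codes:
        machine = code in machine_codes.get(pmid, set())
        human = code in human_codes[pmid]
        if machine and human:
            tp += 1        # both assigned the code >= 1 time
        elif machine and not human:
            fp += 1        # machine assigned it, human did not
        elif not machine and human:
            fn += 1        # human assigned it, machine did not
        else:
            tn += 1        # neither assigned it

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```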

Phase V Extraction Results

The results for the overall model for each test set are summarized in Table 4. In R6, the model performed with a precision of 94.79%, a recall of 92.79%, and an F1 score of 93.78%. The detailed results of the extraction model for individual concept codes in R6 can be found in the Data Supplement. For disparities concepts, the ML extraction performed with F1 scores ranging from 72.73% for age disparities to 100% for income and gender disparities. The model performed adequately in determining that studies focused on hepatocellular carcinoma (F1 = 98.36%) and other types of liver cancer (F1 = 100%). For study design, the model achieved an excellent F1 score for meta-analyses (100%) and adequate scores for cohort studies (82.05%) and case-control studies (90.91%). For risk factor and etiological domains, the model also performed well on some concepts, such as behavioral risk factors (F1 = 100%), genetic risk factors (F1 = 100%), and viral risk factors (F1 = 97.14%). After the targeted seventh round, the model performed adequately for intervention studies (F1 = 66.67%) and very well for environmental risk factors (F1 = 100%).

TABLE 4.

Results of the Data Extraction R2-R6


Phase VI: Application of the Classifier and Data Extraction Model

We applied the classifier to the same data set of abstracts (n = 150,886) used in the confirmation phase, identified using the liver cancer-only search strategy. We then applied the data extraction model to the final data set of relevant articles (n = 2,835). We calculated descriptive statistics for individual coding domains using 0 = no and 1 = yes for whether each concept appeared in the abstract ≥ 1 time.
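The tallying step amounts to building a 0/1 indicator matrix and taking column means. A sketch follows, assuming a hypothetical dict named extracted that maps each relevant PMID to the set of codes the extraction model assigned to that abstract.

```python
import pandas as pd

# Build a PMID x concept indicator matrix: 1 if the extraction model
# assigned the code >= 1 time in the abstract, else 0.
all_codes = sorted({c for codes in extracted.values() for c in codes})
indicator = pd.DataFrame(
    [{code: int(code in codes) for code in all_codes}
     for codes in extracted.values()],
    index=list(extracted.keys()),
)

# Column means give the prevalence of each concept across the data set.
prevalence = indicator.mean().sort_values(ascending=False) * 100
print(prevalence.round(1))  # e.g., HCC ~77.3%, sex disparities ~49.6%
```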

Phase VI Application Results

The detailed results of the application of the classifier and data extraction model are presented in Figure 2. The data extraction model determined that most studies in our data set (2,191 of 2,835; 77.3%) focused on hepatocellular carcinoma, whereas 5.9% focused on other types of liver cancer (cholangiocarcinoma, hepatoblastoma, and angiosarcoma). The most common disparity focus was sex (49.6%), and the least common was education (1.3%). For study design, cohort studies were the most common (41.5%), and interventions were the least common (1.3%). Most studies focused on risk assessment (46.1%) and outcomes (49.5%), whereas the fewest studies focused on quality of life (3.7%).

FIG 2.

Number of relevant studies containing each coding concept. HCC, hepatocellular carcinoma; LC, liver cancer.

DISCUSSION

In this study, we present a detailed process for developing and testing a ML classifier and data extraction model at the intersection of three interrelated fields. We provide evidence that a level of performance can be achieved for classifying multidisciplinary cancer literature that is similar to other, single-field, cancer literature classification studies.22 With 2,099 training abstracts, our classifier misclassified only approximately 9% of articles and performed much better than the PubMed query–based method. Qualitatively, the most challenging concept for the classifier to recognize was disparities because of this concept's multifaceted nature and the numerous ways that investigators describe disparities. Targeted inclusion of abstracts that describe diverse disparities domains and analytic methods may improve metrics further. The model performed extremely well at evaluating studies using epidemiologic methods and those focused on liver cancers. Investigators will still need to conduct hands-on annotation to ensure that the data set is comprehensive for articles that fall in the middle of a classifier's inclusion range, primarily the 0.5-0.6 range.

We also show that it is possible to achieve a high level of performance extracting broadly applicable concepts such as study design, disparities type, and care continuum and more specific topical concepts such as risk factors with relatively few article annotations (n = 350, approximately 12.3% of positively classified articles). For domains that appear infrequently, and, therefore, perform poorly, we show that targeted inclusion in the extraction data set can ensure that the extraction model is performing adequately for these concepts.

Our study is not without limitations that must be discussed when interpreting results and when using this study as a model for future work. First, to ensure that the human annotation task was feasible, we chose to work with abstracts rather than full-text articles. It is possible, although unlikely, that some disparities comparisons or references to primary liver cancers are found only within full-text articles. This limitation is important to consider when interpreting any study using abstracts instead of full-text articles because full-text articles are generally more comprehensive23 but are not always available without additional investigator input. Second, human annotation tasks are inherently subjective in nature, so the classifier, extraction model, and final data set that we have developed are subject to the coding rules that we set. We mitigate this by clearly presenting our codebook and inclusion criteria so that other teams can interpret the results and make modifications for their own study. Finally, this investigation focused solely on abstracts contained within PubMed, although relevant literature can likely be found across multiple publication databases. However, the process that we used is flexible and can be modified to include other databases that are relevant to a study.

As the research enterprise continues to grow24 and research that bridges multiple disciplines becomes more common,25 it is increasingly challenging for investigators to maintain up-to-date bibliometric databases pertaining to their area of work. Furthermore, it is imperative that strategic planners, research agencies, and policy developers move toward more efficient methods of analyzing and reviewing multidisciplinary literature to orient current and future research agendas. Researchers have weighed the pros, cons, and diverse uses of employing ML for the review process.26,27 Others have analyzed the performance of these processes and the time that could be saved by automating parts of the review process, with mixed success.28-30 We add to this literature by presenting a roadmap for investigators to use similar processes and techniques and by providing practical benchmark metrics at each step of development. We also show that it is possible to conduct this work at the intersection of multiple fields. Finally, our methods are flexible and can be adapted to other cancer sites or diseases where disparities persist.

ACKNOWLEDGMENT

We would like to thank Barbara Brandys at the National Institutes of Health (NIH) Library for her assistance in creating the PubMed searches that we used in this manuscript.


DISCLAIMER

This is a US Government work. There are no restrictions on its use. The opinions expressed by the authors are their own, and this material should not be interpreted as representing the official viewpoint of the US Department of Health and Human Services, the National Institutes of Health, or the National Cancer Institute.

SUPPORT

Supported by the Intramural Research Programs of the National Institutes of Health, National Library of Medicine.

AUTHOR CONTRIBUTIONS

Conception and design: Travis C. Hyams, Ling Luo, Brionna Hair, Kyubum Lee, Zhiyong Lu, Daniela Seminara

Financial support: Travis C. Hyams, Ling Luo

Administrative support: Daniela Seminara

Collection and assembly of data: Travis C. Hyams, Ling Luo, Brionna Hair, Kyubum Lee, Daniela Seminara

Data analysis and interpretation: Travis C. Hyams, Ling Luo, Zhiyong Lu, Daniela Seminara

Manuscript writing: All authors

Final approval of manuscript: All authors

Accountable for all aspects of the work: All authors

AUTHORS' DISCLOSURES OF POTENTIAL CONFLICTS OF INTEREST

The following represents disclosure information provided by authors of this manuscript. All relationships are considered compensated unless otherwise noted. Relationships are self-held unless noted. I = Immediate Family Member, Inst = My Institution. Relationships may not relate to the subject matter of this manuscript. For more information about ASCO's conflict of interest policy, please refer to www.asco.org/rwc or ascopubs.org/cci/author-center.

Open Payments is a public database containing information reported by companies about payments made to US-licensed physicians (Open Payments).

Kyubum Lee

Employment: Amgen

Stock and Other Ownership Interests: Amgen

Travel, Accommodations, Expenses: Amgen

Zhiyong Lu

Patents, Royalties, Other Intellectual Property: I have received royalties from my patent named “Method and System of Building Hospital-Scale Chest X-Ray Database for Entity Extraction and Weakly-Supervised Classification and Localization of Common Thorax Diseases”

No other potential conflicts of interest were reported.

REFERENCES

1. Sung H, Ferlay J, Siegel RL, et al: Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 71:209-249, 2021
2. Dasgupta P, Henshaw C, Youlden DR, et al: Global trends in incidence rates of primary adult liver cancers: A systematic review and meta-analysis. Front Oncol 10:171, 2020
3. Petrick JL, Florio AA, Znaor A, et al: International trends in hepatocellular carcinoma incidence, 1978-2012. Int J Cancer 147:317-330, 2020
4. McGlynn KA, Petrick JL, El-Serag HB: Epidemiology of hepatocellular carcinoma. Hepatology 73:4-13, 2021 (suppl 1)
5. Wang S, Sun H, Xie Z, et al: Improved survival of patients with hepatocellular carcinoma and disparities by age, race, and socioeconomic status by decade, 1983-2012. Oncotarget 7:59820-59833, 2016
6. Wong MCS, Jiang JY, Goggins WB, et al: International incidence and mortality trends of liver cancer: A global profile. Sci Rep 7:45846, 2017
7. Islami F, Miller KD, Siegel RL, et al: Disparities in liver cancer occurrence in the United States by race/ethnicity and state. CA Cancer J Clin 67:273-289, 2017
8. Li J, Hansen BE, Peppelenbosch MP, et al: Factors associated with ethnical disparity in overall survival for patients with hepatocellular carcinoma. Oncotarget 8:15193-15204, 2017
9. McGlynn KA, London WT: The global epidemiology of hepatocellular carcinoma, present and future. Clin Liver Dis 15:223-243, vii-x, 2011
10. Singal AG, Yopp A, Skinner SC, et al: Utilization of hepatocellular carcinoma surveillance among American patients: A systematic review. J Gen Intern Med 27:861-867, 2012
11. Singal AG, Li X, Tiro J, et al: Racial, social, and clinical determinants of hepatocellular carcinoma surveillance. Am J Med 128:90.e1-7, 2015
12. Xu L, Kim Y, Spolverato G, et al: Racial disparities in treatment and survival of patients with hepatocellular carcinoma in the United States. Hepatobiliary Surg Nutr 5:43-52, 2016
13. Marshall IJ, Wallace BC: Toward systematic review automation: A practical guide to using machine learning tools in research synthesis. Syst Rev 8:163, 2019
14. Allot A, Lee K, Chen Q, et al: LitSuggest: A web-based system for literature recommendation and curation using machine learning. Nucleic Acids Res 49:W352-W358, 2021
15. Kim Y: Convolutional neural networks for sentence classification. arXiv:1408.5882, 2014. http://arxiv.org/abs/1408.5882
16. Hochreiter S, Schmidhuber J: Long short-term memory. Neural Comput 9:1735-1780, 1997. https://dl.acm.org/doi/10.1162/neco.1997.9.8.1735
17. Vaswani A, Shazeer N, Parmar N, et al: Attention is all you need. arXiv:1706.03762, 2017. https://arxiv.org/pdf/1706.03762.pdf
18. Lee J, Yoon W, Kim S, et al: BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36:1234-1240, 2020
19. Zhou Z-H: Ensemble Methods: Foundations and Algorithms. Boca Raton, FL, CRC Press, 2012
20. Islamaj R, Kwon D, Kim S, et al: TeamTat: A collaborative text annotation tool. Nucleic Acids Res 48:W5-W11, 2020
21. Luo L, Yang Z, Yang P, et al: An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition. Bioinformatics 34:1381-1388, 2018
22. Bao Y, Deng Z, Wang Y, et al: Using machine learning and natural language processing to review and classify the medical literature on cancer susceptibility genes. JCO Clin Cancer Inform 3:1-9, 2019
23. Westergaard D, Stærfeldt H-H, Tønsberg C, et al: A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts. PLoS Comput Biol 14:e1005962, 2018
24. Landhuis E: Scientific literature: Information overload. Nature 535:457-458, 2016
25. Van Noorden R: Interdisciplinary research by the numbers. Nature 525:306-307, 2015
26. Baclic O, Tunis M, Young K, et al: Challenges and opportunities for public health made possible by advances in natural language processing. Can Commun Dis Rep 46:161-168, 2020
27. van Dinter R, Tekinerdogan B, Catal C: Automation of systematic literature reviews: A systematic literature review. Inf Softw Technol 136:106589, 2021
28. Gates A, Guitard S, Pillay J, et al: Performance and usability of machine learning for screening in systematic reviews: A comparative evaluation of three tools. Syst Rev 8:278, 2019
29. Reddy SM, Patel S, Weyrich M, et al: Comparison of a traditional systematic review approach with review-of-reviews and semi-automation as strategies to update the evidence. Syst Rev 9:243, 2020
30. Wang M: Does artificial intelligence really benefit reviewers with reduced workload? A mixed-methods usability study on systematic review software, 2020. doi:10.17615/r9yk-k096
