Abstract
Identifying valuable information within the extensive texts recorded in natural language is a significant challenge in many disciplines. Named Entity Recognition (NER), one of the critical technologies in text data processing and mining, has become a current research hotspot. To review the progress of NER accurately and objectively, this paper employs bibliometric methods and analyzes 1300 NER-related documents obtained from the Web of Science database using the CiteSpace software. Firstly, statistical analysis is performed on the retrieved literature and journals to explore the distribution characteristics of the literature. Secondly, the core authors in the field of NER, the development of the technology in different countries, and the leading institutions are examined by analyzing publication counts and cooperation network maps. Finally, the research frontiers, development tracks, and research hotspots of the field are explored from a scientific point of view, and five research frontiers and seven research hotspots are discussed in depth. This paper explores the progress of NER research from both macro and micro perspectives. It aims to assist researchers in quickly grasping relevant information and offers constructive ideas and suggestions to promote the development of NER.
Keywords: Named entity recognition, CiteSpace, Natural language processing, Bibliometrics
Highlights
• A bibliometric approach to analyzing hotspots and frontiers in the field of NER.
• Citation burst analysis reveals the mainstream approaches of NER in different periods.
• Analysis of cooperation and publication output among authors, institutions, and countries.
• Statistical methods reveal the distribution characteristics of the literature.
1. Introduction
With the arrival of the information era, text data in various fields has grown exponentially. A large amount of valuable professional information is contained in semi-structured or unstructured text recorded in natural language. How to mine this information from massive text data has become a research hotspot in various fields. Manually extracting information from these data is usually time-consuming and error-prone, so methods that extract information from text using artificial intelligence technology have emerged.
The concept of named entity (NE) was first used at the Sixth Message Understanding Conference (MUC-6) [1], where the main entity categories of concern were people, organizations, places, time expressions, etc. (the general domain). In a specific subject field, NE refers to the objects of concern in that field. For example, in biology it refers to proteins, genes, diseases [2,3], and so on; in chemistry it refers to compounds, solvents [4,5], and so on. NER is a crucial and essential task in text mining, which aims to identify the types and boundaries of NEs. For a given text, the NER model outputs one or more triples of the form ⟨s, e, t⟩, where s denotes the beginning index position of the entity, e denotes the terminating index position of the entity, and t indicates the entity type. Its principal structure is shown in Fig. 1. Given the text "Xiao Li lives in Beijing, the capital of China.", the NER model outputs three triples: one for "Xiao Li" with entity type "Person", whose s and e mark the start and end positions of the mention, and two for "Beijing" and "China" with the corresponding location type. NER not only plays a crucial role in text mining but is also essential in natural language processing (NLP) applications such as information retrieval [6], automatic text summarization [7], question answering systems [8], machine translation [9], and knowledge graphs [10].
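As a minimal illustration of this triple representation (the character indices and the "Location" label below are our own illustrative choices, not taken from the original figure), the example sentence can be annotated as follows:

```python
# Illustrative sketch of the <start, end, type> triple representation of NER output.
# Indices are 0-based character positions with an inclusive end index (an assumption
# made here for illustration; other conventions, e.g. token-level spans, are common).
text = "Xiao Li lives in Beijing, the capital of China."

triples = [
    (0, 6, "Person"),      # "Xiao Li"
    (17, 23, "Location"),  # "Beijing"
    (41, 45, "Location"),  # "China"
]

for start, end, entity_type in triples:
    mention = text[start:end + 1]  # Python slices are end-exclusive, hence +1
    print(f"({start}, {end}, {entity_type}) -> {mention}")
```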
Fig. 1.
Structural schematic diagram of NER.
Early NER methods can be divided into two categories: rule-based methods [11,12] and machine learning-based methods [13,14]. In the rule-based approach, experts manually create rule templates, and entities are then obtained by matching against the rule templates or a lexicon. Although this approach has high recognition accuracy, it incurs high labor costs and generalizes poorly. Some well-known rule-based NER systems include Lasie-II [15], NetOwl [16], Facile [17], and FASTUS [18]; these systems mainly rely on manually crafted semantic and grammatical rules to recognize entities. Machine learning-based methods transform the NER task into a sequence labeling task, in which a large-scale annotated corpus is used to train an annotation model that tags each token in the text. The automatically tagged sequence is then decoded according to the tagging scheme and assembled into NEs composed of several characters in the text [19]. Machine learning algorithms can be divided into three categories according to whether the training dataset is labeled: supervised learning, semi-supervised learning, and unsupervised learning. Common machine learning algorithms include the Hidden Markov Model (HMM) [20], the Maximum Entropy model (MENE) [21], the Support Vector Machine (SVM) [22], the Decision Tree (DT) [23], and the Conditional Random Field (CRF) [24]. In recent years, deep learning has proved to be an effective strategy for extracting feature representations directly from text data and has made breakthroughs in the field of NER. Compared with methods based on statistical learning, deep learning makes it easier to discover hidden features owing to its multi-layer nonlinearity [25].
Although NER has been developing for decades, there are few reviews in this field. In 2013, Marrero et al. [26] conducted an in-depth discussion on the application, evaluation methods, and different definitions of named entities of NER, with special emphasis on the research mainstream of machine learning-based and rule-based NER technology at that time. With the rise of deep learning technology, the NER field has experienced significant changes. Goyal et al. [27] provided a comprehensive overview of the development status of NER and classification technology and explored diverse technical paths from rule-based methods to unsupervised learning. Nasar et al. [28] conducted an extensive review of methods for NER and relationship extraction, highlighting the advantages of hybrid and joint models based on deep learning. Their research revealed the significant contribution of deep learning technology in improving recognition accuracy and processing complex entity relationships. Li et al. [29] focused on introducing the NER method based on deep learning by subdividing NER technology into a distributed representation of the input, context encoder, and label decoder. They not only demonstrate the impact of deep learning techniques in standardizing model structures but also systematically classify existing work. Previous research has deeply explored various aspects of NER technology, contributing important insights to the development of the field. However, most of them focus on evaluating specific technologies or methods, and this focused perspective rarely touches on the macro development trends of NER research. Likewise, there is a relative lack of comprehensive assessments on the evolution of research hotspots and cutting-edge technologies.
Bibliometrics can quantitatively analyze the advanced trends and research hotspots of the field based on published literature [30], and it is also an objective and scientific analysis method. Through this method, researchers can explore important topics and their interrelationships in the research field and deeply understand the process of knowledge sharing and diffusion, thus providing valuable insights for future research directions and policy formulation. For example, Yu and Pan [31] applied bibliometric methods to deeply explore and analyze the knowledge development process in the research field of Technique for Order Preference by Similarity to an Ideal Solution. Through a comprehensive survey of key literature transmission paths in citation networks, this paper reveals the knowledge diffusion model and its development trajectory over time in this field. Furthermore, it delves into the intricate knowledge structure and specialized research topics within the research community of this field. Yu et al. [32] analyzed literature related to intuitionistic fuzzy set theory through bibliometrics, which provided a macro perspective on the evolution of research in this field and vividly demonstrated the evolution of topics in this field. The knowledge diffusion path in this field was explored through the main path analysis of global and critical paths. These studies demonstrate the powerful application capabilities of bibliometrics in the field of scientific research and highlight its value as a scientific research tool. CiteSpace is a bibliometrics visualization software based on Java language [33], providing powerful tools to objectively reveal development trends and research hotspots in the scientific field. The software can analyze and visualize citation relationships, co-citation networks, and keyword co-occurrence networks in documents, thereby directly displaying mainstream research directions and key issues in the field [34]. At the same time, through detailed visual display, CiteSpace can depict the evolution of the knowledge structure and the interaction of the research community in the NER field, providing a basis for further research. This is particularly important for NER, a multifaceted and rapidly developing field because valuable information and trends from a large amount of academic literature need to be extracted. In this context, this study is different from the previous analysis that mainly focused on specific technologies or methods and adopts the method of bibliometrics to analyze relevant documents in the Web of Science database. It not only discusses the overall trend of NER research, key research hotspots, and how they evolve over time from a macro perspective but also focuses on the research frontiers and related research hotspots in this field through in-depth analysis of relevant literature. Through in-depth mining and visual display of literature data, the broad layout of the research network in the NER field is depicted, including the distribution of leading institutions and countries/regions, providing a clear and objective perspective for research in this field. It aims to help researchers quickly grasp research frontiers and hotspots and provide constructive ideas and suggestions for promoting the development of NER.
The remainder of the paper is organized as follows: Section 2 introduces the data sources and research methodology, and Section 3 analyzes the number of published papers, research directions, and the distribution of journals. Section 4 analyzes the number of articles published by authors, institutions, and countries and their cooperation. Section 5 explores the research frontiers in the field of NER through the analysis of co-cited literature. Section 6 explores the research hotspots of NER by analyzing keywords. Finally, Section 7 summarizes the paper's results and presents ideas for further development.
2. Data source and methods
2.1. Data source
The data used in this study are obtained from the Science Citation Index Expanded (SCI-Expanded) and Social Sciences Citation Index (SSCI) databases in the Web of Science core collection. SCI-Expanded and SSCI contain many authoritative publications and the most extensive data, so the data obtained from these databases are sufficiently convincing. In recent years, NER technology has undergone a transformation from initial exploration to rapid development; in particular, the rise of deep learning has brought significant progress and change to the field, and the availability of large amounts of text data has provided powerful support for its development. Before 2000, the number of NER research papers was relatively small, although this early work remains significant for the development of the technology and its theory. Considering the increasing application of deep learning in NER research since 2010, we chose 2000 as the start of the analysis period, aiming to capture the transition from traditional methods to deep learning methods. Therefore, we used "TS=Named entity recognition" as the search formula (search time: May 2023) and selected the period from 2000 to 2023 to analyze the trajectory of rapid progress and key changes in NER technology during this period. The literature type was restricted to articles. The retrieval returned 1913 records, and the data were cleaned manually. After removing irrelevant literature, 1300 papers were finally retained as the research data source.
2.2. Research methods
The research uses bibliometrics and visualization methods based on relevant literature data by drawing “author cooperation”, “institutional cooperation”, “literature co-occurrence”, “literature clustering”, “citation burst”, “keyword co-occurrence”, and “keyword clustering” network maps, which intuitively and scientifically display the characteristics of documents and the development trend and research frontier of NER. The research framework of this paper is shown in Fig. 2.
Fig. 2.
Research framework diagram of the article.
3. Literature distribution characteristics
3.1. Analysis of annual publications
To some degree, the yearly publication quantity in this field can indicate its developmental progress. The numbers of papers published per year in the 1215 literature records obtained were statistically sorted, and a trend chart of the annual number of papers published over the past 22 years was drawn, as shown in Fig. 3. As the complete count of papers published in 2023 is unavailable, that year's publications are excluded from the figure. In 2005, the publication count reached a short-term peak, possibly related to the 2004 ACE evaluation; the ACE program aims to extract the entities mentioned in natural language data, the relationships between these entities, and their participation in events [35]. At present, NER models based on deep learning have become mainstream and have achieved good results, so the NER field has developed rapidly. Price's law of literature growth states that at the early stage of a field's birth, the growth of the number of related documents is unstable; when the field is in a period of rapid development, the number of documents grows exponentially; and when the field is mature, the number of documents grows relatively slowly. An exponential curve is used to fit the number of publications, and the parameters of the curve are obtained by the least squares method. The resulting curve-fitting formula for the number of publications is shown in Eq. (1).
(1) $N(t) = a e^{bt}$

where $N(t)$ is the annual number of publications, $t$ is the year, and $a$ and $b$ are the parameters obtained by least-squares fitting. The goodness of fit can be judged by the coefficient of determination $R^2$: the closer $R^2$ is to 1, the more reliable the fit. The $R^2$ of the fitted curve is close to 1, indicating that the fitting reliability is high. The red line in Fig. 3 is the fitting curve. It can be seen that the number of publications in the NER field has increased exponentially since 2018, so NER is in a period of rapid development.
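As a minimal sketch of this fitting procedure (the publication counts below are placeholder values, not the actual data of this study), the exponential model and $R^2$ can be computed with standard least-squares tools:

```python
# Hypothetical sketch of fitting an exponential growth curve y = a * exp(b * t)
# to annual publication counts and computing the coefficient of determination R^2.
# The counts below are made-up placeholders, not the study's real data.
import numpy as np
from scipy.optimize import curve_fit

years = np.arange(2000, 2023)
counts = np.array([3, 4, 5, 7, 9, 14, 12, 15, 18, 20, 24, 28, 33, 39,
                   46, 55, 66, 80, 100, 128, 160, 200, 250], dtype=float)

def exp_model(t, a, b):
    # t is measured in years since 2000 to keep the exponent numerically small
    return a * np.exp(b * t)

t = years - 2000
params, _ = curve_fit(exp_model, t, counts, p0=(1.0, 0.1))
a, b = params

residuals = counts - exp_model(t, a, b)
r_squared = 1 - np.sum(residuals**2) / np.sum((counts - counts.mean())**2)
print(f"a = {a:.2f}, b = {b:.3f}, R^2 = {r_squared:.3f}")
```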
Fig. 3.
Annual publication numbers and their trend chart.
3.2. Research directions and journals distribution
Analyzing the research directions of NER can assist in understanding the background knowledge and basic disciplines involved in this technology. Using the "analyze results" function provided by the WOS database, the number of papers in each research direction involved in NER was obtained, as shown in Fig. 4. Among them, Computer Science Information Systems had the largest number of papers, with 471. The second direction was Computer Science Artificial Intelligence, with 443 papers, followed closely by Computer Science Interdisciplinary Applications with 329 papers. The analysis of research directions shows that NER involves artificial intelligence, medicine, computer science, electrical and electronics engineering, biology, biochemistry, and other fields. The distribution of journals can, to some extent, reflect the trend of NER research and the subject areas involved. According to the publication titles, NER-related studies have been published in 507 journals; the journals with the highest numbers of articles and their subject areas are summarized in Table 1. NER studies are mainly published in IEEE ACCESS, JOURNAL OF BIOMEDICAL INFORMATICS, BMC BIOINFORMATICS, and other journals. From the perspective of the journals' fields, NER research is mainly related to computer science, medicine, biology, chemistry, and other disciplines, consistent with the research direction analysis results. The rapid development of NER technology in specific fields such as biology, medicine, and chemistry may be related to the large number of labeled databases, high-quality labels, wide data coverage, and high application value in these fields, all of which are conducive to the development of NER.
Fig. 4.
Number of papers included in different research directions.
Table 1.
Number of papers in each journal and its subject fields.
Publication titles | Number of publications | Discipline domain |
---|---|---|
IEEE ACCESS | 95 | Computer Science Information Systems, Engineering Electrical Electronic, Telecommunications |
JOURNAL OF BIOMEDICAL INFORMATICS | 84 | Computer Science Interdisciplinary Applications, Medical Informatics |
BMC BIOINFORMATICS | 79 | Biochemical Research Methods, Biotechnology Applied Microbiology, Mathematical Computational Biology |
APPLIED SCIENCES BASEL | 65 | Chemistry Multidisciplinary, Engineering Multidisciplinary, Materials Science Multidisciplinary, Physics Applied |
BIOINFORMATICS | 48 | Biochemical Research Methods, Biotechnology Applied Microbiology, Computer Science Interdisciplinary Applications, Mathematical Computational Biology |
BMC MEDICAL INFORMATICS AND DECISION MAKING | 46 | Medical Informatics |
DATABASE THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION | 42 | Mathematical Computational Biology |
4. Analysis of cooperation
4.1. Analysis of author collaboration network
In order to track and analyze the annual research dynamics and development trends in the NER field more accurately and ensure the timeliness of the analysis, the time-slicing unit in CiteSpace was set to one year, meaning that the retrieved documents are divided by year. The node type "Author" was selected to analyze the number of documents published by each author and the cooperation between authors. In order to focus on the core authors who are highly active and influential in the field of NER and on their cooperation patterns, and to provide a targeted and clear visual basis for analysis, the node label "Threshold" was set to 5, so that labels are displayed only for authors with more than five published articles. With these settings, the visualized knowledge map of the author cooperation network reveals the core researchers and cooperation networks in this field, as shown in Fig. 5.
Fig. 5.
Author cooperation network map.
In Fig. 5, the number of articles published by an author is represented by the node's size: a larger node indicates that the author has published more articles. The thickness of the line between nodes indicates the strength of cooperation between authors; the thicker the line, the more frequent the collaboration, and conversely, the thinner the line, the less the collaboration. The node color indicates when the author published papers, with warm colors indicating more recent publications. The node sizes and lines in the author, institution, and country cooperation network maps have the same meaning. In the early stage, "Munoz, R" and "Li, YP" appeared with high frequency, the links between nodes were dense, and the authors cooperated closely. In the middle period, "Ananiadou, S" and "Xu, H" appeared more frequently, and the cooperation between authors became closer. Recently, "Lin, HF" and "Qiu, QJ" have appeared more frequently, but cooperation among authors has been relatively reduced. Table 2 lists the authors with eight or more publications. Among them, "Ananiadou, S" has published the most papers and is mainly engaged in research on Mathematical Computational Biology and Biotechnology Applied Microbiology [36,37]. The second is "Xu, H", who published 13 papers and is mainly engaged in research on Health Care Sciences and Services and Medical Informatics [38,39]. This is closely followed by "Lin, HF" with 11 publications, mainly engaged in Computer Science and other research work [40,41]. The recent larger node is "Qiu, QJ". This author published a first article in 2019 and has published eight articles so far, mainly in astronomy and astrophysics, geology, and other areas [42,43], indicating that the author has recently paid close attention to the field of NER. The core authors in a field can be determined by Price's law, as shown in Eq. (2) [44].
(2) $M = 0.749\sqrt{N_{\max}}$

where $M$ is the threshold for judging core authors by their number of published papers (authors whose paper count exceeds this value are core authors), and $N_{\max}$ is the highest number of papers published by a single author in this field. The number of core authors was 63, and they published 310 articles, accounting for 23.85 % of the total literature, which is far lower than the conclusion of Price's law that half of the papers are produced by highly productive authors. This indicates that the scale of cooperation between authors is relatively small and no core cluster has formed; therefore, cooperation between authors or author teams should be strengthened.
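As a minimal arithmetic check of Eq. (2) under the data in Table 2 (where the most productive author has 15 papers), the core-author threshold works out as follows:

```python
# Sketch of the Price's law core-author threshold, assuming the standard form
# M = 0.749 * sqrt(N_max); N_max = 15 is taken from Table 2 (Ananiadou, Sophia).
import math

n_max = 15
m = 0.749 * math.sqrt(n_max)
print(f"threshold M = {m:.2f}")  # ~2.90, so authors with 3 or more papers count as core authors
```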
Table 2.
Authors with eight or more publications and their year of first publication.
Author | Year of first publication | Number of published papers |
---|---|---|
Ananiadou, Sophia | 2008 | 15 |
Xu, Hua | 2014 | 13 |
Lin, Hongfei | 2007 | 11 |
Tang, Buzhou | 2014 | 9 |
Yang, Zhihao | 2008 | 8 |
Zhang, Yaoyun | 2016 | 8 |
Wu, Yonghui | 2016 | 8 |
Qiu, Qinjun | 2019 | 8 |
4.2. Analysis of national and institutional cooperation network
NER technology differs between languages, and the difficulties of NER vary across languages. For example, comparing English and Chinese, the first problem Chinese NER faces is correctly segmenting the words in the text: English words have obvious boundaries, while Chinese word boundaries are difficult to determine. Analyzing cooperation among countries can promote the exchange of NER technology across languages and the development of the technology as a whole. Yu et al. [45] explored the evolution of collaboration in the analytic hierarchy process research field through bibliometric methods, revealing the dynamic changes in collaboration between countries/regions and institutions and how these collaborations promote the sharing and diffusion of knowledge in that field. Using this research as a reference, this article uses CiteSpace to analyze national cooperation relationships. To balance the information density and clarity of the map and to focus on countries with strong cooperative influence in the NER field, the node label "Threshold" is set to 12, so that labels are displayed only for countries with more than 12 publications. The national cooperation network knowledge graph drawn in this way (Fig. 6), together with the top ten countries by number of published articles and their betweenness centrality values (Table 3), intuitively shows the current status and characteristics of cooperation between countries in the NER field. Betweenness centrality is a measure of a node's centrality in a network; it is based on the number of shortest paths between all pairs of other vertices that pass through that node [46] (see the illustrative sketch after this paragraph). The more data that passes through a node and the more frequent the data transmission, the greater the node's influence in the network graph and the more critical its position. Fig. 6 and Table 3 can be used to understand the strength of cooperation between countries and the development status of NER technology in each country. Among the retrieved data, PEOPLES R CHINA published the most papers, reaching 563, with a betweenness centrality of 0.32, indicating that NER technology is developing rapidly in China and attracting a high degree of attention. The second is the USA, with 521 publications and a betweenness centrality of 0.30, followed by ENGLAND, with 88 publications and a betweenness centrality of 0.27. Beyond these countries, the contributions of others such as Germany, India, and Japan cannot be ignored; although their publication counts are relatively small, they have made important contributions to specific areas of NER technology, such as multilingual recognition and cross-domain applications. This demonstrates the diversity and extensive collaboration in global research on NER technologies. The number of articles published by a country reflects the development of NER technology in the language used in that country. From the number of articles published in each country, it can be seen that research on NER in Chinese, English, and Arabic is significantly active. At the same time, NER technology in other languages, such as Spanish, French, and German, is also developing rapidly.
These languages show their uniqueness in the process of word embedding, and the integration of this feature in the pre-trained language model (PLM) helps enrich the model's understanding of the language, allowing the model to learn more features. These developments highlight the potential and wide range of applications of NER technology in adapting to global multilingual environments.
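As a small, self-contained sketch of how the betweenness centrality reported in Table 3 is typically computed (the toy cooperation graph below is invented for illustration and is not the actual country network):

```python
# Toy example of betweenness centrality on a made-up cooperation graph,
# using networkx; node names and edges are illustrative only.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("China", "USA"), ("China", "England"), ("USA", "England"),
    ("England", "Germany"), ("Germany", "Japan"), ("China", "India"),
])

# Fraction of all-pairs shortest paths that pass through each node (normalized).
centrality = nx.betweenness_centrality(G, normalized=True)
for country, value in sorted(centrality.items(), key=lambda x: -x[1]):
    print(f"{country}: {value:.2f}")
```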
Fig. 6.
Map of national cooperation networks.
Table 3.
Top 10 countries and institutions with published papers and their betweenness centrality.
Country | Number of publications | Betweenness centrality | Organization | Number of publications | Betweenness centrality |
---|---|---|---|---|---|
PEOPLES R CHINA | 563 | 0.32 | Chinese Acad Sci | 34 | 0.19 |
USA | 251 | 0.36 | Harbin Inst Technol | 29 | 0.08 |
ENGLAND | 88 | 0.27 | Dalian Univ Technol | 25 | 0.01 |
SOUTH KOREA | 85 | 0.02 | Natl Univ Def Technol | 24 | 0.03 |
INDIA | 66 | 0.07 | Univ Manchester | 22 | 0.08 |
SPAIN | 63 | 0.14 | Univ Cambridge | 14 | 0.01 |
GERMANY | 44 | 0.07 | Wuhan Univ | 14 | 0.01 |
AUSTRALIA | 37 | 0.06 | Peking Univ | 13 | 0.07 |
JAPAN | 35 | 0.06 | Korea Univ | 12 | 0.05 |
ITALY | 28 | 0.06 | Cent South Univ | 12 | 0.01 |
Analyzing the cooperation between institutions can help identify the leading institutions and mainstream research topics in the field. In order to focus on leading institutions with more than six publications in the field of NER and to simplify the network map so that these major research centers and their cooperation patterns stand out, the node label "Threshold" is set to 6. With this setting, we can more clearly identify and analyze the active institutions in the NER field and their cooperation networks, and the institutional cooperation network map is drawn (as shown in Fig. 7). Table 3 lists the top ten institutions and their betweenness centrality. It can be seen from Fig. 7 and Table 3 that the institution with the largest number of publications is Chinese Acad Sci (Chinese Academy of Sciences), with 34 publications and a betweenness centrality of 0.19, indicating that the Chinese Academy of Sciences has a significant academic influence in the field of NER. The second-largest number of publications belongs to Harbin Inst Technol (Harbin Institute of Technology), with 29 publications and a betweenness centrality of 0.08; the third is Dalian Univ Technol (Dalian University of Technology), with 25 publications and a betweenness centrality of 0.01. It can be seen that most institutions cooperate closely. However, most of these institutions are colleges and universities, cooperation between universities and enterprises is limited, and the number of papers published by enterprises is relatively small. Therefore, collaboration and exchange between universities and enterprises should be strengthened to promote the deeper development of NER and its application at the enterprise level.
Fig. 7.
Institutional cooperation network map.
5. Literature analysis
5.1. Literature co-citation analysis
Literature co-citation refers to two or more documents being cited together by one or more later documents. Literature with many co-citations in a field is vital or core literature. Analyzing literature co-citation in the area of NER can reveal the mainstream models and application fields of NER technology at various stages. Table 4 lists the top 10 co-cited documents and their years of publication, which are of great significance to the development of NER technology. CiteSpace is used to visualize the co-citation of literature. Based on a preliminary analysis of the frequency distribution of the data, and in order to balance the level of detail of the map with the readability of the overview while accurately highlighting the widely cited and influential literature in the NER field, the node label "Threshold" is set to 6, so that labels are displayed for documents cited more than six times. The document co-citation network knowledge graph drawn in this way (shown in Fig. 8) highlights the key documents that have promoted progress in this field. The figure contains 1188 nodes and 4940 links. The size of a node represents the document's co-citation frequency: the larger the node, the higher the co-citation frequency. The color of a node's annual rings indicates the year of publication, with cool colors representing earlier publication years and warm colors representing later ones. The lines between nodes show the closeness of the relationship between two documents.
Table 4.
Top 10 co-cited literature and co-citation frequency.
Co-cited literature | Frequency | Year of publication |
---|---|---|
Devlin J, 2019, ARXIV, V0, P0 | 381 | 2019 |
Lample G., 2016, ACL, V0, PP260 | 195 | 2016 |
Vaswani A, 2017, ADV NEUR IN, V30, P0 | 169 | 2017 |
Ma XZ, 2016, ACL, VOL 1, P1064 | 140 | 2016 |
Lee J, 2020, BIOINFORMATICS, V36, P1234 | 110 | 2020 |
Habibi M, 2017, BIOINFORMATICS, V33, P137 | 91 | 2017 |
Zhang Y, 2018, (ACL), VOL 1, P1554 | 87 | 2018 |
Radford A., 2018, P 2018 CNAM CHAPT, V0, P0 | 84 | 2018 |
Bojanowski P., 2017, T ASSOC COMPUT LING, V5, P135 | 80 | 2017 |
Li J, 2022, IEEE T KNOWL DATA EN, V34, P50 | 69 | 2022 |
Fig. 8.
Co-cited literature network map.
Fig. 8 and Table 4 show that the largest node is "Devlin J. (2019)", with 381 citations. This reflects the profound impact and breakthrough progress of pre-trained models in NLP research and practical applications, especially BERT (Bidirectional Encoder Representations from Transformers) (Devlin et al., 2019). This trend marks a major shift from traditional rule-based and statistical methods to deep learning-based and large-scale pre-trained models, opening a new chapter in the field of NLP. The BERT model is based on the bidirectional encoding structure of the Transformer [47]. Its first pre-training task randomly masks some words in the input text and then predicts the masked words, so that the model learns the meaning of words in context. Its second task, "next sentence prediction", predicts whether two input text segments are consecutive, so that the model learns the relationship between sentences. The second node is "Lample G. (2016)", with 195 citations. This paper [48] proposes two NER models: a bidirectional LSTM combined with CRF to capture long-term dependencies in text, and a transition-based method that uses supervised and unsupervised word representations. The paper made extensive use of character-level information in the NER task for the first time, providing new ideas for the later processing of morphologically complex languages (such as compound words in English). The third node is "Vaswani A. (2017)". This document [47] proposes the Transformer model, whose innovative attention mechanisms (Self-Attention and Multi-Head Attention) mark an important turning point in the field of NLP. Compared with traditional convolutional neural networks (CNN) and recurrent neural networks (RNN), the Transformer greatly improves processing efficiency through parallelism and, at the same time, captures long-distance dependencies more effectively by attending to different parts of the text. It is the cornerstone of models such as BERT and provides new methods and technical paths for solving complex NLP tasks. "Lee J (2020)" is the node with the highest citation frequency among recent publications, with 110 citations. This document [49] introduces BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), a BERT model optimized specifically for biomedical text mining tasks. BioBERT significantly improves performance on tasks such as biomedical NER and relation extraction, thanks to its pre-training on a large amount of biomedical text. This work demonstrates how domain-specific pre-training can further improve the application of the BERT model in professional fields and opens a new path for domain-specific NLP. The thickness of the purple rings of a node represents its betweenness centrality: the thicker the rings, the higher the betweenness centrality. The node with the highest betweenness centrality is "Leaman R (2015)". This work [50] combines two independent machine learning models with large differences, such as different feature sets and CRF parameters, to create a chemical named entity recognizer. The model's innovative nature led to significant advances in NER tasks in the field of chemistry.
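As a minimal illustration of BERT's first pre-training task (masked word prediction) described above, a hedged sketch using the Hugging Face transformers library and the public bert-base-uncased checkpoint (our choice for illustration, not a model used in this study) might look like this:

```python
# Illustrative sketch of BERT's masked language modeling objective, using the
# Hugging Face transformers pipeline and the public bert-base-uncased checkpoint.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The model predicts the word hidden behind [MASK] from its bidirectional context.
for prediction in fill_mask("Beijing is the [MASK] of China."):
    print(f"{prediction['token_str']!r}: {prediction['score']:.3f}")
```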
In addition to the high-frequency co-cited literature mentioned above, other sources also play an equally important role in developing NER. For example, Sang and De Meulder [51] provide a standardized evaluation framework and dataset for NER research, which has profoundly impacted standardizing NER assessment and advancing research in this field. Collobert et al. [52] propose an end-to-end deep learning approach for handling NLP multi-tasks, such as NER and part-of-speech annotation. This approach uses a unified neural network model, reduces the reliance on task-specific feature engineering, and lays the foundation for a multi-task NLP solution. Dai and Le [53] propose a learning strategy that combines unsupervised pre-training and supervised fine-tuning, which can effectively use large amounts of unlabeled data to improve the model's performance on specific tasks. This approach inspired later PLMs such as BERT. In addition, many other studies have made key contributions to the development of NER, including exploring different algorithms and models, applying NER in various languages and domains, and innovative approaches when dealing with complex entity types. Together, these studies have advanced NER technology, making it a key component of the field of NLP.
5.2. Cluster analysis of co-cited literature
Further clustering analysis of the co-cited literature can reveal the research frontiers in the field of NER. The Log-Likelihood Ratio (LLR) algorithm is selected as the clustering algorithm, which can effectively reflect the relationship between events. Since the main idea of a document can be roughly summarized by its abstract, the cluster labels are extracted from abstracts. After clustering, 16 clusters are obtained, and the 10 largest clusters are selected for display. The resulting clustering network map of co-cited literature in the NER field is shown in Fig. 9. The modularity Q is greater than 0.3, meaning that the network clustering structure is significant; the larger the Q value, the better the clusters obtained from the network. For the mean silhouette S, it is generally considered that clustering with S greater than 0.5 is reasonable and with S greater than 0.7 is convincing. It can be seen that the nodes match their own clusters well and other clusters poorly, so the clustering effect is good. Table 5 lists the size, mean year, and labels of each cluster together with their log-likelihood ratios, with larger values indicating more representative labels. The largest cluster is "#0 single-task model", with a size of 172, a silhouette value of 0.885, and a mean publication year of 2014. It is followed by "#1 bert model", with a size of 109, a silhouette value of 0.783, and a mean year of 2018. The third cluster is "#2 protein name", with a size of 100, a silhouette value of 0.925, and a mean year of 2003. The emergence of the "#2 protein name" cluster highlights the importance and value of NER technology in specific fields, especially the biomedical field. Identifying protein names is of great importance in biomedical research because proteins are key to understanding biological processes, disease mechanisms, and drug effects. In biomedical literature, clinical reports, and research papers, proteins are a common entity type, and their accurate identification and classification are crucial for information retrieval, data mining, disease diagnosis, and biological research. With the development of NLP technology, the application of NER has extended to various fields. In addition to biomedicine, the identification of compounds and elements in chemistry, of institution and product names in finance, and of component and process terminology in manufacturing have all become important directions of NER research. The development of NER in these fields not only promotes informatization and intelligence in the related fields but also provides technical support for extracting and managing professional knowledge.
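For reference, and assuming CiteSpace follows the standard definitions (an assumption on our part, since the software's internals are not described in this paper), the two indicators discussed above are commonly defined as:

```latex
% Newman's modularity Q over clusters c, where e_{cc} is the fraction of edges
% inside cluster c and a_c is the fraction of edge ends attached to cluster c:
Q = \sum_{c} \left( e_{cc} - a_c^{2} \right)

% Silhouette of a node i, with a(i) the mean distance to nodes in its own cluster
% and b(i) the smallest mean distance to nodes in any other cluster; the cluster
% silhouette S is the average of s(i) over its members:
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}
```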
Fig. 9.
Co-cited literature clustering network map.
Table 5.
Co-cited literature clustering labels and their size.
Cluster ID | Size | Silhouette | Mean(year) | Top Terms (log-likelihood ratio, p-level) |
---|---|---|---|---|
0 | 172 | 0.885 | 2014 | single-task model (1793.76, 1.0E-4) |
1 | 109 | 0.783 | 2018 | bert model (1420.22, 1.0E-4) |
2 | 100 | 0.925 | 2003 | protein name (459.62, 1.0E-4) |
3 | 96 | 0.847 | 2018 | Chinese ner (1989.18, 1.0E-4) |
4 | 92 | 0.817 | 2017 | electronic medical record (1053.37, 1.0E-4) |
5 | 63 | 0.894 | 2007 | metabolite name (204.68, 1.0E-4) |
6 | 60 | 0.907 | 2019 | nested ner (1707.52, 1.0E-4) |
7 | 52 | 0.95 | 2011 | word representation feature (677.39, 1.0E-4) |
8 | 52 | 0.869 | 2018 | joint entity (1959.86, 1.0E-4) |
9 | 52 | 0.993 | 2001 | Spanish text (210.45, 1.0E-4) |
The “#0 single-task model”, “#6 nested ner”, “#7 word representation feature”, and “#8 joint entity” in the cluster marks the key progress and research frontier of NER technology. “#1 bert model” highlights the widespread use of PLM models such as BERT in NER tasks, demonstrating the core application of advanced models in improving text recognition and processing. “#3 Chinese ner” and “#9 Arabic ner” demonstrate the special challenges and advances that represent NER technology in different language structures. In addition, the “#2 protein name”, the “#4 electronic medical record”, and the “#5 metabolite name” reveal the unique applications and development trends of NER technology in various professional fields.
The largest cluster, "#0 single-task model", indicates that research on single-task models occupies an important position in the development of NER. This type of model focuses on performing one specific task, such as identifying a particular type of entity in text. This focus allows the model to learn the characteristics of the specific task more deeply, thereby improving performance on that task. Compared with multi-task models, the structure of single-task models is usually simpler and easier to design and implement. This simplified design helps researchers concentrate on improving the model's performance on a specific task without worrying about the complex trade-offs among multiple interacting tasks. The concise structure and single focus of single-task models also enhance the interpretability of their decision-making in NER, which makes them an important subject of theoretical research and facilitates their in-depth analysis in empirical research. However, the main limitations of single-task models are that they often cannot handle, or generalize effectively to, task types different from the training task, and because they are designed and trained for one specific task, they cannot fully exploit the potential connections or common features between different tasks. While useful for in-depth learning of a specific task, this focus ignores the importance of connections between tasks for understanding text; for example, in the NER task, contextual information or syntactic structure learned from related tasks may be helpful. Although multi-task and more complex models have gradually come to dominate as technology has advanced, single-task models still hold an important place in the history and development of the NER field. The top 10 most frequently cited documents in this cluster are shown in Table 6. In addition to Lample G. (2016) and Ma XZ (2016), Chiu and Nichols [54] proposed a NER model that combines bidirectional LSTM and CNN. This hybrid architecture effectively integrates the long-term dependency capture capability of the BiLSTM and the character-level feature extraction capability of the CNN, making the model more effective in handling morphological variation and spelling errors, which is crucial to the NER task. This work promoted the application of deep learning in the field of NER, provided a new direction and benchmark for subsequent research, and demonstrated the effectiveness of deep learning for complex NLP tasks. Peters et al. [55] proposed ELMo (Embeddings from Language Models), a deep pre-trained language model based on BiLSTM. The key innovation of ELMo is the introduction of deep contextualized word embeddings, which generate dynamic representations for the same word in different contexts. This innovation greatly improved the performance of a variety of NLP tasks, including NER, sentiment analysis, and question-answering systems. The emergence of ELMo profoundly influenced subsequent NLP research, paving the way for more advanced PLMs such as BERT and GPT. The research frontier is the seed of scientific and technological innovation, which is of great significance to scientific research and economic development [56].
Through careful analysis of the recent clusters and the co-cited literature, we summarize the research frontiers of NER technology below, aiming to reveal the latest scientific explorations and technological breakthroughs in this field.
Table 6.
Top 10 co-cited documents in the largest cluster.
Frequency | Centrality | Label | Author | Year | Source |
---|---|---|---|---|---|
195 | 0.05 | Lample G. (2016) | Lample G. | 2016 | ACL |
140 | 0.09 | Ma XZ (2016) | Ma XZ | 2016 | ACL (54TH) |
64 | 0 | Peters ME (2018) | Peters ME | 2018 | MNLP 2018 |
61 | 0.01 | Chiu J.P.C. (2016) | Chiu J.P.C. | 2016 | T ASSOC COMPUT LING |
60 | 0.03 | Abadi M. (2015) | Abadi M. | 2015 | ARXIV160304467 |
43 | 0.28 | Leaman R (2015) | Leaman R | 2015 | J CHEMINFORMATICS |
34 | 0.02 | Krallinger M (2015) | Krallinger M | 2015 | J CHEMINFORMATICS |
33 | 0.01 | Leaman R (2016) | Leaman R | 2016 | BIOINFORMATICS |
31 | 0.01 | Manning C.D. (2014) | Manning C.D. | 2014 | P C EMP METH NAT LAN |
30 | 0.04 | Crichton G (2017) | Crichton G | 2017 | BMC BIOINFORMATICS |
5.2.1. Pre-trained language model (PLM)
PLMs play a key role in the development of the NLP field. These models learn the basic structure and patterns of language through pre-training on large amounts of text data, thereby coming to understand natural language. BERT, as one of the representative PLMs, emerged as a prominent label in the literature clustering, which indicates that PLMs are a research frontier of NER at this stage. BERT is the first deeply bidirectional, unsupervised language representation, pre-trained using only a large text corpus while combining the context on both sides of each token [57]. BERT can then be fine-tuned with an additional output layer and applied to various downstream tasks, including NER. In addition to the flexibility of fine-tuning, BERT also handles rare or new words well, which is particularly important in domain-specific NER tasks (such as medicine and law). In addition, the multilingual version of BERT supports multilingual and cross-lingual NER tasks. Although BERT performs well in many respects, it still faces challenges such as large computing resource requirements when processing long texts, poor model interpretability, and a large demand for fine-tuning data. These challenges have inspired various improvements to BERT. For example, RoBERTa, proposed by Liu et al. [58], is an improved version of the BERT model that optimizes performance by increasing the amount of training data, using larger batches, and extending training time. The model removes the next sentence prediction task and introduces dynamic masking, allowing it to handle longer text sequences and improve its understanding of complex structures. Although RoBERTa improves performance, its large model size may make deployment more difficult, especially in resource-limited environments, and it may face overfitting problems on small-scale datasets. Lan et al. [59] proposed ALBERT, an optimized BERT model, to address the memory limitations and long training times that arise when scaling up the model. ALBERT uses two techniques: factorized embedding parameterization, which reduces the model's size by decomposing the vocabulary embedding matrix, and cross-layer parameter sharing, which prevents the number of parameters from growing with network depth. Like RoBERTa, ALBERT also removes the next sentence prediction task. These innovations enable ALBERT to significantly reduce model size and training time while maintaining performance similar to BERT's; ALBERT is therefore suitable for resource-limited settings and provides valuable ideas for optimizing large-scale PLMs. However, although the parameter-sharing mechanism reduces model size, it may also limit the model's ability to capture complex features, and, like other models, the interpretability of ALBERT remains a challenge. Recently, many researchers have fine-tuned BERT and its improved variants or added other structures on top of them to perform NER tasks. Agrawal et al. [60] adopted a transfer learning method to deal with the challenge of nested NER, applying joint label modeling together with strategies such as fine-tuning, pre-training, and BERT-based language models. Chen et al. [61] proposed a method based on the ALBERT model to extract entities from steel e-commerce data. Li et al. [62] pre-trained BERT on an unlabeled Chinese clinical record corpus and obtained a large pre-trained BERT model for Chinese clinical texts.
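As a minimal, hedged sketch (not the cited authors' code) of the "additional output layer" idea described above, a BERT encoder can be topped with a token-classification head using the Hugging Face transformers library; the model name and the label set below are our own illustrative assumptions:

```python
# Illustrative sketch of BERT-based token classification for NER with transformers.
# The checkpoint name and tag set are assumptions chosen for illustration only.
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]  # hypothetical BIO tag set
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(labels)
)  # adds a randomly initialized token-classification head on top of BERT

inputs = tokenizer("Xiao Li lives in Beijing", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits              # shape: (1, seq_len, num_labels)
pred_ids = logits.argmax(dim=-1)[0].tolist()     # predicted label id per token
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
print(list(zip(tokens, [labels[i] for i in pred_ids])))
# Without fine-tuning on annotated NER data these predictions are meaningless;
# in practice the head is trained with cross-entropy over gold BIO labels.
```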
In addition to BERT, there is a series of other advanced PLMs. For example, the GPT (Generative Pre-trained Transformer) series of models developed by OpenAI are unsupervised language representations based on a deep self-attention mechanism, with the Transformer as the underlying architecture. Unlike BERT, which focuses on improving the accuracy and depth of language understanding, GPT [63] performs better on generation tasks. At the same time, GPT uses a unidirectional (forward) attention mechanism, in contrast to BERT's bidirectional attention, which makes the model architecture relatively simple. In addition, because of its generative nature, GPT is more flexible in dealing with open-ended problems or generative tasks. Although GPT performs well in many respects, it still faces challenges in terms of high computational resource requirements when processing long texts, poor model interpretability, and large fine-tuning data requirements. For example, GPT-3 [64] can process longer text sequences, which helps the model understand texts with more complex structure, but its parameter scale is huge (175 billion parameters), which may make deployment difficult. To address these challenges, researchers have been exploring ways to improve the efficiency of GPT models, for example by optimizing the model architecture, reducing the number of parameters, and adopting more efficient training techniques. XLNet [65] is an advanced PLM jointly developed by CMU and Google Brain. It is the first deep bidirectional language representation model that combines autoregressive and autoencoding techniques. XLNet uses only large text corpora for pre-training and combines the context information of each token. Like GPT, although XLNet performs well in many respects, it still faces challenges in computing resource requirements, handling long texts, and fine-tuning. In addition to the models introduced above, there are many other advanced models, such as Transformer-XL [66], ERNIE [67], and ELECTRA [68]. At this stage, PLMs, while achieving significant NLP capabilities, also face the challenges of high computing resource demand and environmental impact, as well as problems of unfair or inaccurate output caused by bias in the training data. In addition, the lack of interpretability and limited generalization ability of these models are also major problems. Many researchers are using PLMs such as GPT [69,70] and XLNet [71,72] to achieve advanced performance on NER tasks.
5.2.2. Cross-language NER and cross-domain NER
The development of NER technology in various countries and the document clustering labels show that NER has become a global research hotspot, and its applications and research span multiple fields and languages. The development of cross-lingual NER makes it possible to achieve efficient NER for low-resource languages that lack large amounts of training data by drawing on the data and models of high-resource languages. It reduces the need to develop and train models for each language separately, saving time and resources. At the same time, it can promote cultural exchange, assist in analyzing and processing multilingual documents, and support information extraction and data analysis on a global scale. However, cross-lingual NER research also faces a series of challenges. For example, languages differ in grammar, vocabulary, and cultural background, and different models perform differently in each language, so finding a universal model suitable for multiple languages is a challenge. On the other hand, entity recognition often depends on context, and texts in different languages may have different contextual structures and cultural meanings, which greatly increases the difficulty of understanding the text when only a small amount of data is available for fine-tuning. In addition, the consistency and quality of annotations also affect model training and evaluation, since datasets in different languages may differ in entity definitions and annotation standards. For these difficulties, researchers have proposed some solutions. For example, Google released Multilingual BERT (M-BERT), which handles natural language understanding tasks in multiple languages rather than a single language by pre-training a model with the same architecture as BERT on a large amount of multilingual text. Facebook AI proposed XLM-R (Cross-lingual Language Model RoBERTa) [73] for cross-lingual NLP tasks based on the RoBERTa architecture. The model is pre-trained on texts in more than 100 languages and uses a self-supervised learning method that does not rely on parallel corpora, so it can better handle the differences between languages and improve performance across many languages. In addition to improving PLMs, other researchers have realized cross-lingual NER through different methods. For example, Keung et al. [74] added a language-adversarial task when fine-tuning multilingual BERT, successfully improving the model's performance in zero-resource cross-lingual settings; however, the paper did not delve into the potential limitations of adversarial training, such as its adaptability to different language combinations or more complex language scenarios. Feng et al. [75] proposed three strategies to improve NER performance on low-resource datasets: transferring knowledge from high-resource languages, expanding dictionaries, and integrating cross-language universal word-level entity type features into neural network architectures. Although many scholars have explored cross-lingual NER, some difficulties remain unresolved. For example, existing models generalize poorly to new languages that differ significantly from the training data.
In addition, the differences in grammar structure, vocabulary usage, and cultural background between different languages still challenge the model's adaptability. Future research may focus on developing general models that are more adaptable to different languages and cultures to address these challenges. This includes exploring effective ways to transfer knowledge from resource-rich languages to languages with fewer resources and using unsupervised learning techniques to solve the problem of insufficient annotation data, thereby improving the performance of NLP-related tasks. The cross-lingual NER task proposes solutions for entity recognition in specific languages or cultural backgrounds and has profound implications for developing NLP.
Cross-domain NER involves identifying and classifying entities in multiple fields (such as healthcare, law, finance, etc.) [76]. Unlike traditional NER, cross-domain NER aims to develop a universal model that can adapt to text characteristics and entity categories in different fields. Research on cross-domain NER technology can more accurately extract key information from texts in different fields and provide support for various complex NLP applications. The difficulties in implementing cross-domain NER tasks include the model's ability to understand domain-specific knowledge in different fields, where the text may contain unique entity types and specialized terminology. Moreover, there may be significant differences in text style and structure in different fields, which poses a challenge to the model's generalization ability. Furthermore, some domains may lack sufficient annotated data to train effective NER models. Jia et al. [77] combine the transfer learning method and use cross-domain language models as a bridge to perform cross-domain and cross-task knowledge transfer, thereby solving problems such as resource limitations and domain adaptability in cross-domain NER tasks. Chen et al. [78] alleviate the problem of data scarcity in cross-domain NER tasks by using data augmentation methods such as pseudo-annotated data and data synthesis. Brack et al. [79] process data from different scientific fields simultaneously through multi-task learning methods, thereby improving the model's generalization ability. In addition to the above methods, other researchers [80,81] have also addressed the challenges faced by cross-domain NER tasks using different methods. Although there have been many studies and solutions for cross-domain NER tasks, there are still some challenges. For example, adaptability to highly specialized fields, changes within the field (new entity type terms may appear in some fields over time), small sample learning, etc. Future research may need to explore more efficient few-shot learning methods, develop more flexible model architectures, and improve domain adaptation techniques to improve the problem.
5.2.3. Nested NER and fine-grained NER
Nested named entities refer to entities that can contain or be embedded within another entity. For example, in the entity "Peking University", "Peking University" itself is an organizational entity, and the "Peking" contained within it is a geographical location entity; this indicates that the same text fragment can be classified into multiple entity types. Traditional flat NER methods cannot recognize overlapping or nested entities, yet medical literature, legal documents, scientific papers, and other texts often contain such complex entity structures. Katiyar and Cardie [82] pointed out that nested NE is quite common: 17 % of entities in the GENIA corpus are embedded in another entity; in the ACE corpus, 30 % of sentences contain nested entities. The development of nested NER technology can enable deeper analysis and understanding of text, and nested NER models have better understanding and processing capabilities when handling highly structured or specialized text. The complexity of nested entities makes nested NER more challenging than traditional NER and of greater practical value, so nested NER has become an emerging topic in NER tasks [83]. In addition, determining the exact boundaries of each entity in nested entities is also a challenge, especially when the entities overlap and the context is ambiguous. Furthermore, when dealing with nested entities, the structure of the model is often more complex, which may result in higher computational complexity and lower processing efficiency. These challenges require continuous research and innovation in model design, data processing, and algorithm optimization to improve the performance and applicability of nested NER technology. From the perspective of model structure, the mainstream methods for nested NER include early rule-based methods [84], which rely on the post-processing of rules like traditional NER methods. Layer-based approaches [85] treat nested NER as multiple traditional NER tasks and identify nested entities layer by layer. Span-based approaches [86] address entity boundary ambiguity by computing representations for all candidate spans of the sequence and then classifying them through locally normalized scores. Hypergraph-based approaches [87] use hypergraphs to represent the nested structure of entities in sentences and can represent and process complex entity relationships. Transition-based approaches [88], inspired by transition-based parsers, process nested entities through sequential operations and are suitable for long sentences with complex structures. Recently, many scholars have addressed the nested NER problem by further improving these mainstream methods. Geng et al. [89] proposed a novel planar sentence representation and a bidirectional two-dimensional recursive operation, effectively solving the semantic dependency and entity boundary ambiguity problems in nested NER. This method can reduce the complexity of the model and improve the accuracy of entity recognition, but it may depend on high-quality annotated data. Cui and Joe [90] proposed a pyramid hierarchical model based on a multi-head adjacent attention mechanism, which is used to fuse information from two adjacent inputs and better model the dependency relationship between entity spans. Chen et al.
[91] improved the accuracy of entity boundary recognition and semantic dependency construction in nested NER by proposing a controlled attention mechanism, allowing the model to focus more effectively on task-related semantic features, thereby improving the model's performance and robustness. Although many researchers have addressed some of the difficulties in nested NER through various methods, some challenges still need to be addressed. For example, existing models still underperform when dealing with extremely complex nested structures, such as multiply nested or cross-nested entities. Secondly, many models perform well in specific fields, but their performance may decrease when applied to different types of text or across domains. Meanwhile, effectively identifying nested entities in low-resource languages is also a challenge. Therefore, future research on nested NER technology may focus on the following points: using weakly supervised and transfer learning techniques to reduce dependence on large amounts of annotated data and improve the model's adaptability in different fields; exploring nested NER methods that combine multimodal data such as text, images, and audio, as well as cross-language nested NER; and developing more efficient and lightweight nested NER models to meet real-time and large-scale data processing needs.
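To make the span-based idea mentioned above concrete, the following sketch (our illustration only; the encoder output, dimensions, and entity types are placeholders) enumerates all candidate spans up to a maximum length and classifies each span independently, so that nested or overlapping spans such as "Peking University" and "Peking" can each receive their own label:

```python
# Minimal sketch of the span-based idea for nested NER: enumerate all candidate
# spans and classify each one independently, so nested/overlapping entities can
# both be predicted. Encoder, scorer, and label inventory are illustrative placeholders.
import torch
import torch.nn as nn

class SpanClassifier(nn.Module):
    def __init__(self, hidden: int, num_types: int, max_span_len: int = 8):
        super().__init__()
        self.max_span_len = max_span_len
        # Score a span from its start and end token representations.
        self.scorer = nn.Linear(2 * hidden, num_types)  # last type = "not an entity"

    def forward(self, token_reprs: torch.Tensor):
        # token_reprs: (seq_len, hidden) from any encoder (BiLSTM, BERT, ...)
        seq_len = token_reprs.size(0)
        spans, scores = [], []
        for i in range(seq_len):
            for j in range(i, min(seq_len, i + self.max_span_len)):
                spans.append((i, j))
                scores.append(self.scorer(torch.cat([token_reprs[i], token_reprs[j]])))
        return spans, torch.stack(scores)   # one type distribution per candidate span

# Usage sketch: "Peking University" and the nested "Peking" are separate spans,
# so each can be assigned its own entity type.
reprs = torch.randn(6, 128)                 # pretend encoder output for 6 tokens
spans, scores = SpanClassifier(128, num_types=4)(reprs)
```

Because every span is scored separately, this formulation naturally handles nesting, at the cost of a number of candidates that grows with sentence length and the maximum span width.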
Fine-grained NER aims to identify and classify entities in text using more detailed and specific categories. Compared with traditional NER, fine-grained NER focuses on deeper and more specific entity categories. For example, it not only identifies an entity as an organization but further distinguishes whether it is a government agency, an educational institution, a commercial company, etc. This requires the model to understand the context more deeply in order to accurately classify close or similar entity types. At the same time, it also faces problems such as the growing number of entity categories, blurred entity boundaries, and fine-grained feature recognition. These challenges require that a fine-grained NER model not only have strong language understanding ability but also be able to handle complex entity relationships and category segmentation. Rodríguez et al. [92] effectively address challenges such as blurred entity boundaries, inaccurate category recognition, and complex context interpretation in fine-grained NER by combining advanced text encoding technology, BiLSTM, CRF, and name-focused attention mechanisms. Wan et al. [93] propose a span-based multimodal attention network, which introduces a closed-loop mechanism to simulate human behavior and simultaneously and deeply mine the multimodal information existing in the text (span cell tag sequences and context information) to capture the fine-grained interaction characteristics between them, thus improving model performance. Wang et al. [94] use the distance-supervision method, combined with flexible knowledge base matching and ontology-guided multi-type disambiguation technology, to effectively deal with the fine-grained NER problem in the chemical field. The performance of fine-grained NER can be improved through advanced disambiguation technology, by combining NER with entity linking to enhance the understanding and classification of complex entities, and also by utilizing data synthesis, transfer learning, and other technologies to address data scarcity and imbalance.
5.2.4. Multimodal NER
Multimodal NER is a technology that combines text with information from other modalities (such as images, videos, and sounds) for entity recognition. It analyzes the text's language features and uses information from other modalities to assist in the recognition and understanding of entities. For example, a multimodal NER system might combine visual cues in images with image description text to identify specific entities or objects in images. When the context information is ambiguous, multimodal NER can more accurately identify entities that are difficult to determine from the text alone by combining multimodal data. In addition, when dealing with text containing complex scenes (such as social media content), multimodal information helps to better interpret and understand entities. When carrying out multimodal NER, the first problem is the high cost of multimodal data acquisition and annotation. Secondly, different modal data (such as text and images) may differ greatly in feature representation, scale, and type, and how to effectively fuse such heterogeneous data is a challenge. In addition, there may be noise and inconsistency between different modalities, which may affect model performance. Yu et al. [95] realize the effective fusion of text and visual information by combining a unified multimodal transformer and an auxiliary entity span detection module. This alleviates the problems of visual bias and modal interaction, thus improving the entity recognition rate in social media posts. Zhang et al. [96] proposed a multimodal graph fusion method to improve the effect of entity recognition in social media posts. By creating a graph structure that fuses text and visual objects, this method realizes deep semantic interaction within and across modalities and effectively integrates context and cross-modal content. The span-based multimodal variational autoencoder proposed by Zhou et al. [97] addresses the difficulty of obtaining and labeling datasets as well as the noise problem: the reliance on large amounts of labeled data is reduced through semi-supervised learning, and the noise in the data is effectively handled through variational autoencoders. On this basis, future multimodal NER research will further explore cross-domain and cross-language adaptability to improve the generalization ability of models in different environments. At the same time, it will also focus on developing multimodal NER systems that adapt to real-time and dynamic environments, such as social media analysis and real-time news processing, to meet the growing demand for real-time data processing.
5.2.5. Few-shot NER
In order to solve the problem of limited annotated data and high annotation costs in specific fields or low-resource languages and to improve the generalization ability of models when facing new entity types and different fields, few-shot NER technology came into being. Developing few-shot NER technology can reduce the model's dependency on a large amount of labeled data, thus lowering the cost associated with collecting and labeling vast amounts of data. In addition, few-shot NER allows the model to quickly adapt to new domains, which is particularly important in dynamic environments. Driven by the need to learn from small amounts of data, new methods, model architectures, and algorithms such as meta-learning and transfer learning have been developed. How to enable the model to learn from a small number of samples and effectively generalize to unseen entities, how to mitigate the negative impact of noise (such as labeling errors) in a small number of samples on model performance, and how to deal with domain texts with different language styles and entity types are the primary challenges in few-shot NER. Wang et al. [98] introduced a data augmentation method to improve few-shot NER, which enhances model generalization and training effects by changing the prompt order. Chen et al. [99] propose a self-describing network that learns extensive knowledge through pre-training and then transfers it to few-shot NER tasks. This approach uses universal concept descriptions to automatically map new entity types and identify entities adaptively. Das et al. [100] propose a contrastive learning method that optimizes the distributional differences between labeled entity representations and uses Gaussian embeddings to model the distributions of entities. In this way, the model can more effectively capture label dependencies and avoid the overfitting problem of previous methods in dealing with O (non-entity) tags, thus improving the model's performance in few-shot NER. Chen et al. [101] propose a prompt-based metric learning framework, which effectively addresses label scarcity and overfitting by combining label-aware prompts and metric learning. In addition to the above methods, some researchers [102,103] explored different methods to improve the performance of NER tasks with few samples. In the future, the development of few-shot learning may require more sophisticated and effective methods to encode and use context information to improve the recognition of complex entities. At the same time, better data augmentation techniques and semi-supervised learning methods can be used to expand the training dataset according to the data situation, and lightweight models and computational efficiency optimizations can be developed, especially for resource-constrained environments.
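A common metric-learning formulation of this setting, sketched below purely for illustration (the embeddings, label ids, and dimensions are synthetic placeholders rather than any of the cited systems), builds one prototype per entity type from a handful of labeled support tokens and assigns each query token to its nearest prototype:

```python
# Minimal sketch of a prototypical (metric-learning) approach to few-shot NER:
# build one prototype vector per entity type from a handful of support tokens,
# then label query tokens by nearest prototype. Embeddings here are random
# placeholders standing in for encoder outputs.
import torch

def prototypes(support_embs: torch.Tensor, support_labels: torch.Tensor):
    """Average the support-token embeddings of each class into a prototype."""
    return {c.item(): support_embs[support_labels == c].mean(dim=0)
            for c in support_labels.unique()}

def classify(query_embs: torch.Tensor, protos: dict):
    """Assign each query token the class of its closest prototype."""
    classes = list(protos)
    proto_mat = torch.stack([protos[c] for c in classes])      # (C, hidden)
    dists = torch.cdist(query_embs, proto_mat)                  # (Q, C)
    return [classes[i] for i in dists.argmin(dim=1).tolist()]

support_embs = torch.randn(10, 64)            # 10 labeled support tokens
support_labels = torch.tensor([0, 0, 1, 1, 1, 2, 2, 0, 1, 2])  # e.g. O / PER / LOC
query_embs = torch.randn(5, 64)
print(classify(query_embs, prototypes(support_embs, support_labels)))
```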
5.3. Analysis of burst literature
Important literature in the development of a research field can be discovered through the burst-detection function in CiteSpace software, which finds the literature with the strongest citation bursts. The literature has temporal characteristics, and its burst and fading times can be obtained through this software to trace the hotspot evolution and development track of the field. Burst literature refers to literature cited a large number of times within a certain period [104]. The top 25 documents with the strongest citation bursts in the NER field are shown in Fig. 10. The blue line represents the time axis, the red line represents the period of the burst literature, and the two ends of the red line represent the start and end times of each burst. The development of each period can be seen in this figure. For example, Kazama et al. [105] explored the application of SVM in biomedical NER; during this period, NER methods were mainly based on machine learning. Rocktaschel et al. [106] proposed an integration method that combines dictionary-based and grammar-based methods, effectively improving the accuracy and efficiency of extracting chemical entities from chemical texts; this method is of particular importance for specific application scenarios. Mikolov et al. [107] proposed two models for computing word representations, the Continuous Bag-of-Words (CBOW) and Skip-gram models, which significantly simplify word vector representation and improve its computational efficiency. This document is one of the important works that pushed the combination of deep learning and NLP tasks into the mainstream. Huang et al. [108] were the first to apply BiLSTM-CRF models to sequence labeling tasks such as part-of-speech (POS) tagging, chunking, and NER, significantly improving task performance and reducing the dependence on word embeddings. This laid a foundation for the further application of deep learning in the NER field.
Fig. 10.
Top 25 documents with the strongest citation bursts.
6. Keyword analysis
6.1. Keyword co-occurrence analysis
Keywords in the literature are usually a condensation of an article's core content and can reflect its central ideas. Keyword co-occurrence refers to the number of times the same pair of keywords appears within a group of documents, and the closeness of the relationship between keywords is studied by counting these co-occurrences. Cluster analysis can group keywords with strong homogeneity into one category according to the affinity between keywords, making the cohesion between keywords within the same category stronger than that between keywords in other categories [109]. Yu et al. [110] explored the knowledge structure in the field of Preference Ranking Organization Method for Enrichment Evaluations through the analysis of co-word networks, revealing the dynamic changes of the core themes and research directions in this field. This demonstrates the effectiveness of co-word network analysis in identifying and tracking the development trend of knowledge in a subject area. Drawing on this method, this study performs keyword co-occurrence analysis to explore research hotspots in the field of NER in depth. The node type is selected as "Keyword", and other parameters remain at their default settings. In order to make the map focus on keywords that appear frequently across many documents and are important and representative, to optimize the information density of the map, and to ensure that important research trends and hotspots are presented, we set the threshold to 26 after preliminary data exploration. This setting emphasizes the core and widely followed research topics in the field of NER. The resulting keyword co-occurrence network map is shown in Fig. 11. The number of nodes is 634, and the number of lines between nodes is 2907. The node size represents the number of keyword co-occurrences: the larger the node, the higher the number of co-occurrences. The color of a node's annual ring represents the year in which the keyword co-occurred, with cool colors representing earlier years and warm colors representing later years, and the lines between nodes reflect the closeness of the two keywords. The demarcation value between high- and low-frequency words is calculated by Eq. (3), proposed by Donohue [111].
(3) \( T = \frac{-1 + \sqrt{1 + 8I_1}}{2} \)
where \( I_1 \) is the number of keywords with a frequency of one, and \( T \) is the dividing frequency between high- and low-frequency keywords. The number of keywords with a frequency of 1 calculated by CiteSpace is 395, and \( T \) is obtained accordingly from Eq. (3). Hence, keywords with a co-occurrence frequency greater than 29 are high-frequency keywords. The betweenness centrality, frequency, and year of first appearance of the high-frequency keywords are obtained after processing keywords with low correlation and merging similar ones, as shown in Table 7.
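For reference, the threshold can be reproduced directly from Eq. (3); the short snippet below is only an illustration of the arithmetic, using the 395 single-occurrence keywords reported above:

```python
# Illustration of Eq. (3): the Donohue threshold separating high- and
# low-frequency keywords, computed from the number of keywords that occur once.
import math

def donohue_threshold(i1: int) -> float:
    """T = (-1 + sqrt(1 + 8 * I1)) / 2, where I1 is the count of frequency-1 words."""
    return (-1 + math.sqrt(1 + 8 * i1)) / 2

print(donohue_threshold(395))   # threshold for the 395 single-occurrence keywords reported above
```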
Fig. 11.
Keyword co-occurrence network map.
Table 7.
High-frequency co-occurrence keywords.
Keyword | Centrality | Year of first appearance | Frequency |
---|---|---|---|
named-entity recognition | 0.28 | 2004 | 497 |
natural language processing | 0.18 | 2002 | 213 |
extraction | 0.19 | 2002 | 189 |
text mining | 0.19 | 2005 | 136 |
deep learning model | 0.08 | 2017 | 133 |
neural network | 0.08 | 2018 | 87 |
machine learning | 0.11 | 2006 | 77 |
conditional random field | 0.16 | 2006 | 68 |
task analysis | 0.05 | 2019 | 57 |
database | 0.06 | 2005 | 56 |
relation extraction | 0.04 | 2015 | 55 |
attention | 0.03 | 2019 | 39 |
gene | 0.05 | 2004 | 36 |
transfer learning | 0.01 | 2020 | 35 |
biomedical text mining | 0.03 | 2006 | 35 |
electronic health records | 0.02 | 2013 | 34 |
feature extraction | 0.03 | 2019 | 34 |
biomedical named entity recognition | 0.06 | 2004 | 34 |
classification | 0.03 | 2007 | 33 |
word embedding | 0.03 | 2017 | 32 |
sequence labeling | 0.03 | 2017 | 30 |
It can be seen from Fig. 11 and Table 7 that, apart from the topic term itself, keywords with high co-occurrence frequency in the NER field include "Natural language processing", "Extraction", "Text mining", etc., which are fields and tasks closely related to NER. "Deep learning model", "Neural network", "Machine learning", etc., are methods or techniques used to conduct NER research. Recent high-frequency keywords include "Task analysis", "Attention", "Transfer learning", and "Feature extraction", which show that most NER research at this stage is based on these technologies. In addition to the high-frequency keywords, attention should also be paid to the keywords "Multi-task learning" and "Adversarial training".
6.2. Keyword cluster analysis
After further clustering the keywords, 10 clusters are obtained, and the keyword clustering network map is shown in Fig. 12. Table 8 lists the size, silhouette value, mean year, and label of each cluster. The largest cluster is "# adversarial learning", with a cluster size of 94, a silhouette value of 0.711, and a mean year of 2019, indicating that this strategy is widely used in recent NER technologies and is an effective means of improving the performance of NER models. The second cluster is "# social media", with a cluster size of 90, a silhouette value of 0.695, and a mean year of 2017. With the development of the Internet, more and more studies focus on NER in social media [112], which is challenging due to the informal language and high noise of such texts. The third cluster, "# biomedical literature", has a cluster size of 90, a silhouette value of 0.769, and a mean year of 2011, indicating that NER for biological texts is a long-term research hotspot. Based on the keyword co-occurrence and keyword clustering analyses, the research hotspots in the field of NER can be summarized as follows.
Fig. 12.
Keyword clustering network map.
Table 8.
Keyword clustering labels and their size.
Cluster ID | Size | Silhouette | Mean year | Top Terms (Log-Likelihood Ratio, P-Level) |
---|---|---|---|---|
0 | 94 | 0.711 | 2019 | adversarial learning (510.39, 1.0E-4) |
1 | 90 | 0.695 | 2017 | social media (485.51, 1.0E-4) |
2 | 90 | 0.769 | 2011 | biomedical literature (1151.48, 1.0E-4) |
3 | 60 | 0.644 | 2011 | active learning (461.07, 1.0E-4) |
4 | 59 | 0.749 | 2013 | protein-protein interaction information extraction (253.17, 1.0E-4) |
5 | 49 | 0.81 | 2013 | natural language processing method (218.87, 1.0E-4) |
6 | 35 | 0.812 | 2017 | using lexical disambiguation (234.06, 1.0E-4) |
7 | 32 | 0.764 | 2014 | data augmentation (187.01, 1.0E-4) |
8 | 19 | 0.896 | 2011 | question answering (175.99, 1.0E-4) |
9 | 10 | 0.958 | 2011 | mining chemical document (60.6, 1.0E-4) |
6.2.1. Attention network
The attention mechanism was first proposed in computer vision; it enables neural networks to pay more attention to the valuable information in the input and reduce attention to irrelevant information, much as people looking at a picture tend to focus on the content they are interested in. Generally speaking, the attention mechanism involves two steps: calculating the attention distribution over the input information and calculating the context vector according to that distribution [113]. The attention mechanism enables the model to focus on key parts of the input data to better understand the context. In NER, it is important to understand the context around an entity because it helps distinguish between entity and non-entity terms. When dealing with long-distance dependency problems, the attention mechanism can effectively capture the dependencies between distant words, which is particularly important for identifying entities that span multiple words. At the same time, the attention scores provide a way to explain model decisions, showing which parts of the input the model focuses on most when identifying entities. Combining the attention mechanism with other deep learning technologies (such as LSTMs and RNNs) can improve the model's overall performance. In research on attention mechanisms, the main challenges include how to allocate attention accurately and efficiently to the key parts of the input, especially when there are many candidate positions to attend to, and, when dealing with long sequences, how to maintain an effective distribution of attention and avoid focusing on distracting or irrelevant parts. In addition, when using the multi-head attention mechanism, optimizing the role of each head and integrating their outputs to improve overall performance are key issues. These challenges highlight the complexity and nuances of applying attention mechanisms to practical problems, which require in-depth research and innovative methods. Bahdanau et al. [114] used attention in NLP tasks for the first time, extracting important information in sentences by giving different weights to words. With the attention mechanism demonstrating superior performance in NLP tasks, various attention mechanisms have been proposed to enhance this capability. Xu et al. [115] introduced hard attention, which focuses on specific parts of the input sequence to improve computing efficiency and model interpretability, although it may face certain challenges during training and risks missing other important information. Vaswani et al. [47] proposed the Transformer architecture, which is completely based on the self-attention mechanism and effectively handles long-distance dependencies while achieving higher parallelism. In addition, a multi-head attention mechanism is used, allowing the model to focus on different parts of the sequence with different "heads" simultaneously, thereby capturing various relationships and patterns in the data. Beyond the above attention mechanisms, many researchers improve the accuracy of NER tasks by exploring other attention mechanisms or integrating attention mechanisms into their models. Zhang et al. [116] proposed a part-of-speech attention mechanism to obtain the contribution weight of part-of-speech information to entity recognition. Lin et al.
[117] applied the attention mechanism to character- and word-level information, respectively, and proposed a neural network model that relies on hierarchical attention to achieve sequence tagging. Xu et al. [118] proposed an attention-based neural network architecture, which alleviates the context dependency problem by using document-level global information obtained from document representations produced by a pre-trained bidirectional language model with neural attention. Although the attention mechanism has been widely used, the transparency and interpretability of its decision-making process still need to be improved. In addition, using or improving the attention mechanism to deal with long texts more effectively also requires further effort. At the same time, integrating different types of attention mechanisms and exploiting their respective advantages to improve model performance is worth exploring.
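The core computation behind these mechanisms can be summarized by scaled dot-product attention; the sketch below is a generic illustration (tensor sizes and the toy sentence length are placeholders, not drawn from the cited models) of how each token weighs all other tokens and aggregates their representations:

```python
# Minimal sketch of scaled dot-product attention as used in Transformer-style
# NER encoders: each token weighs every other token and aggregates their values,
# which is how long-distance context around an entity is captured.
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (seq_len, d) for a single head
    scores = q @ k.transpose(0, 1) / (q.size(-1) ** 0.5)  # token-to-token affinities
    weights = F.softmax(scores, dim=-1)                    # attention distribution per token
    return weights @ v, weights                            # context vectors + weights

tokens = torch.randn(7, 32)        # pretend embeddings for a 7-token sentence
ctx, attn = scaled_dot_product_attention(tokens, tokens, tokens)
# attn[i] shows which positions token i relies on, e.g. the words around an entity.
```

Inspecting the resulting weight matrix is also the usual basis for the interpretability claims mentioned above, since row i shows which positions token i attended to.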
6.2.2. Multi-task joint learning
The core idea of multi-task joint learning is to train a model to perform multiple related tasks simultaneously rather than train a model for each task separately. The main advantage of this method is that it can enable the model to learn shared representations and features from multiple tasks, thus improving the performance of each task and the overall efficiency of the model. In NER, joint entity relationship extraction, as the representative of multi-task joint learning, allows the model to simultaneously identify entities in the text and judge the relationship between these entities, thus improving the accuracy and efficiency of information extraction and contributing to a deeper understanding and analysis of text content. Joint entity relationship extraction [119] refers to the joint process of entity recognition and relationship extraction. The joint learning method considers the potential dependency between the two tasks, thus using rich contextual information [120]. In the traditional way of extracting triple groups (entity 1, relationship, entity 2), NER and relationship extraction are executed independently, called the pipeline model [121]. The pipeline method is simple and flexible, but it ignores problems such as low-level interaction and error propagation [122]. The joint model is usually more efficient than sub-step processing because it reduces the complexity of the processing process. However, building a complex model that can simultaneously handle entity recognition and relationship extraction is necessary when extracting joint entity relationships. At the same time, data in this area is relatively scarce in many fields. Moreover, it becomes more difficult to deal with text with complex structures and diversified entities or relationship types. Zhao et al. [123] proposed a method based on a heterogeneous graph neural network, which can accurately capture the dependencies of entities and their relationships by representing iterative fusion technology and effectively dealing with entities and relationships in long texts. Chen et al. [124] used the location-aware attention mechanism and relationship embedding method to solve the problem of overlapping triples in joint entity and relationship extraction. This method improves the model's ability to deal with complex relationships by accurately identifying the location of entities and enhancing the understanding of relationships between entities. Wan et al. [93] conducted an in-depth analysis of multimodal information in the text, such as span cell label sequences and context information, and developed a multimodal attention network and a modal attention enhancement module to jointly model this information, aiming to capture the fine-grained interaction characteristics between entities and their relationships. In addition to the methods used in the above papers, some other researchers [[125], [126], [127]] use different strategies to improve the performance of the joint entity relationship extraction model. In addition to joint entity relationship extraction, the NER task is usually combined with the POS tagging task [128]. Combining NER with POS tagging can significantly enhance the model's understanding of words' grammatical roles and boundaries and improve the accuracy and precision of entity recognition in the context. This combination uses part of speech information to optimize entity identification and classification, especially when identifying entities in complex sentence structures. 
Combining NER and semantic role labeling can significantly improve the model's entity recognition ability in complex contexts. An in-depth understanding of entities' semantic roles and contextual relationships can enhance the understanding of sentence meaning and the accurate recognition of relationships between entities. In addition to the above combinations, NER has also been combined with syntactic dependency analysis, sentiment analysis, and text classification to improve the accuracy of each task. Although multi-task joint learning can bring significant improvements, it also faces a series of challenges. For example, regarding task relevance, not all tasks contribute to joint learning, and incorrect task combinations may lead to performance degradation. There may also be conflicts between different tasks, and the optimization of one task may negatively affect another. Balancing the various tasks is also a key issue: it is necessary to ensure that no single task dominates the learning process. Future developments in multi-task joint learning may focus on improving the effectiveness of task combinations, such as developing advanced algorithms that can automatically adjust the learning process based on the relevance and complementarity of tasks, optimizing resource usage through dynamic resource allocation mechanisms, and exploring adversarial training and regularization techniques to enhance the generalization and robustness of the model.
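As an illustration of the shared-encoder idea behind such joint models (a sketch under our own assumptions; the BiLSTM encoder, tag counts, and random batch are placeholders rather than any cited architecture), two task heads can be trained on top of one representation so that both losses update the shared parameters:

```python
# Minimal sketch of multi-task joint learning: one shared encoder feeds two task
# heads (here NER tagging and POS tagging), and their losses are combined so the
# shared representation benefits both tasks. Dimensions and tag counts are illustrative.
import torch
import torch.nn as nn

class JointTagger(nn.Module):
    def __init__(self, vocab=5000, dim=128, ner_tags=9, pos_tags=17):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)  # shared
        self.ner_head = nn.Linear(2 * dim, ner_tags)   # task-specific heads
        self.pos_head = nn.Linear(2 * dim, pos_tags)

    def forward(self, token_ids):
        h, _ = self.encoder(self.embed(token_ids))
        return self.ner_head(h), self.pos_head(h)

model = JointTagger()
tokens = torch.randint(0, 5000, (2, 12))                  # a toy batch of 2 sentences
ner_logits, pos_logits = model(tokens)
ner_gold = torch.randint(0, 9, (2, 12))
pos_gold = torch.randint(0, 17, (2, 12))
loss = nn.CrossEntropyLoss()(ner_logits.view(-1, 9), ner_gold.view(-1)) \
     + nn.CrossEntropyLoss()(pos_logits.view(-1, 17), pos_gold.view(-1))
loss.backward()                                            # both tasks update the shared encoder
```

Weighting or scheduling the two loss terms is exactly where the task-balancing issues discussed above arise.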
6.2.3. Transfer learning
In most fields, labeled data is often insufficient and annotation costs are high. Transfer learning can alleviate the problem of insufficient labeled data to some extent. Transfer learning [129] refers to applying the knowledge learned in a source task domain to machine learning tasks in a target domain, and it is an effective method for low-resource corpora and cross-domain learning. In the NLP field, there are generally two types of transfer learning. One is feature-based transfer, mainly represented by word2vec [107]. The other is fine-tuning, in which the whole pre-trained model is carried over to the downstream task and fine-tuned, mainly represented by BERT. At this stage, the focus of transfer learning is mainly on fine-tuning. However, the following points need to be considered when conducting transfer learning. One core issue is how to effectively apply the knowledge learned from one field to another: the two fields may differ significantly in data distribution, feature space, or task objectives, making direct transfer ineffective. In addition, when the difference between the source domain and the target domain is too large, negative transfer may occur; that is, the knowledge of the source domain may not only be unhelpful but may even harm performance in the target domain. Determining which features are shared between the source and target domains and which are domain-specific is also a key point. Peng et al. [130] employ a language model based on BiLSTM as part of their transfer learning approach. This model is initially trained to extract features and structures from a large corpus of unlabeled text data, and the learned linguistic patterns are then adapted and applied to the NER task. Gligic et al. [131] pre-trained the model on a large number of unlabeled electronic health records to capture rich linguistic features and context information and then applied these embeddings to the network architecture; through this transfer, the information in a large amount of unlabeled data can be effectively used to improve the model's performance. Yu et al. [132] used BERT, pre-trained on large-scale text corpora, to obtain deep, context-based bidirectional language representations for the medical domain and then used the output of BERT as the input of the subsequent neural network model. Yao et al. [133] adopted transfer and active learning methods to address the scarcity of labeled data by transferring knowledge learned from public source datasets to fine-grained mechanical NER. Transfer learning is a rapidly developing field, and future research may continue to explore how to transfer knowledge more effectively between different fields or modalities, such as transferring from text to images or across various types of datasets. Additionally, developing techniques that can automatically identify the optimal transfer strategy could reduce the need for manual adjustment and extensive experimentation.
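The two transfer styles mentioned above differ mainly in which parameters are updated; the following sketch (assuming the HuggingFace transformers library, with the checkpoint name, label count, and learning rates chosen purely for illustration) contrasts freezing the pre-trained encoder (feature-based transfer) with fine-tuning the whole model:

```python
# Minimal sketch contrasting feature-based transfer (frozen encoder, train only
# the task head) with full fine-tuning (update everything, usually with a small
# encoder learning rate). Checkpoint, label count, and learning rates are illustrative.
import torch
from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained("bert-base-cased", num_labels=9)

# Feature-based transfer: freeze the pre-trained encoder, train the head only.
for param in model.base_model.parameters():
    param.requires_grad = False
head_only_optimizer = torch.optim.AdamW(model.classifier.parameters(), lr=1e-3)

# Fine-tuning: unfreeze and update all weights, with a smaller encoder learning rate.
for param in model.base_model.parameters():
    param.requires_grad = True
full_optimizer = torch.optim.AdamW(
    [
        {"params": model.base_model.parameters(), "lr": 2e-5},
        {"params": model.classifier.parameters(), "lr": 1e-3},
    ]
)
```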
6.2.4. Adversarial training
Adversarial training [134] is a technology used to enhance the robustness of machine learning models, especially deep learning models. It trains the model by introducing and using adversarial samples to improve the model's resistance to small, deliberately created perturbations of the input data. By training models to deal with diverse inputs, adversarial training can enhance their ability to generalize to unseen data. In addition, adversarial training increases the diversity of training data by creating adversarial examples, and in some cases it not only improves the robustness of the model but also improves its performance. For security-sensitive applications, such as fraud detection, adversarial training can improve the stability of the model in the face of malicious manipulation. While adversarial training can bring excellent performance, other factors must be considered. For example, adversarial training incurs an additional computational burden in generating and training on adversarial examples. At the same time, a balance must be found between enhancing robustness and maintaining high performance, since excessive adversarial training may degrade model performance on conventional data. In addition, if the adversarial sample generation method is too simple or too different from the real data distribution, it may cause the model to overfit the adversarial samples. In NER, adversarial training not only improves the robustness and generalization ability of the model but also helps the model learn to recognize semantically complex or ambiguous entities, such as entities with ambiguity or context dependence. Inspired by this technology, some studies improve the performance and robustness of NER through adversarial training. For example, the study in [135] proposed a cross-domain adversarial learning method that guides the model to learn the information shared between two tasks (NER in Chinese electronic medical record text and in online medical consultation text) by incorporating an adversarial mechanism into multi-task learning, thus significantly improving the recognition performance and robustness of the model on complex and diverse text data. Park et al. [136] performed NER in the automotive field by combining adversarial training and multi-task learning: adversarial training trained the model to recognize terms from both the general and automotive domains, thereby avoiding overfitting to a single domain, while multi-task learning was applied to handle NER and word-spacing prediction tasks simultaneously. Wang et al. [137] increased the adaptability of the model to data by adding perturbations to the key variables of the model; the purpose of this adversarial training is to improve the generalization ability and robustness of the model, reduce the risk of overfitting, and improve the model's performance in processing diversified input data. In the field of NER, adversarial training has shown its potential and effectiveness. Future research and development may focus on using adversarial training to improve the capability of NER systems in dealing with complex entity structures (such as nested entities) and cross-domain and cross-language adaptability. In addition, it may also focus on improving the NER system's ability to deal with low-resource languages and informal text, enhancing system security, and resisting adversarial attacks.
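One widely used recipe for applying this idea to text is to perturb the embedding layer along the gradient direction (an FGM-style adversarial step); the sketch below is only an illustration with a toy tagger, not the procedure of the studies cited above:

```python
# Minimal sketch of FGM-style adversarial training: perturb the embedding matrix
# along the gradient direction, add the loss on the perturbed input to the clean
# loss, then restore the embeddings. The toy tagger below is illustrative only.
import torch
import torch.nn as nn

tagger = nn.Sequential(nn.Embedding(1000, 64), nn.Linear(64, 5))   # toy NER tagger
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(tagger.parameters())
tokens = torch.randint(0, 1000, (4, 10))
labels = torch.randint(0, 5, (4, 10))

emb = tagger[0].weight
opt.zero_grad()
clean_loss = loss_fn(tagger(tokens).view(-1, 5), labels.view(-1))
clean_loss.backward()                                   # gradients, incl. on embeddings

epsilon = 1e-2
delta = epsilon * emb.grad / (emb.grad.norm() + 1e-12)  # worst-case perturbation
emb.data.add_(delta)                                    # attack the embeddings
adv_loss = loss_fn(tagger(tokens).view(-1, 5), labels.view(-1))
adv_loss.backward()                                     # accumulate adversarial gradients
emb.data.sub_(delta)                                    # restore original embeddings
opt.step()                                              # update with clean + adversarial signal
```

Training on the clean and perturbed losses together is what trades the extra computation mentioned above for robustness to small input perturbations.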
6.2.5. Deep active learning
Active learning [138] is a method devoted to obtaining as much performance gain as possible from less labeled data. Specifically, it iteratively selects appropriate samples from the unlabeled dataset for labeling in order to reduce labeling costs. However, classical active learning methods struggle with high-dimensional data [139]. Deep learning performs excellently in processing high-dimensional data and extracting features, while active learning can effectively reduce the annotation cost. Therefore, combining deep learning with active learning provides an effective way to train efficient models when data annotation resources are limited. In NER, the main advantage of deep active learning is that it can significantly reduce the need for high-quality annotated data, thereby reducing annotation costs and time. At the same time, the accuracy and adaptability of the model are improved by selecting the most informative samples for model improvement, especially in data-scarce or domain-specific scenarios. However, when performing active learning, choosing the unlabeled samples that will best improve the model's performance requires a precise sample selection strategy. At the same time, high-quality annotation may still rely on experts in specific fields, especially in professional domains. In addition, processing a large amount of unlabeled data to determine its information content may lead to higher computing costs. In recent years, many scholars have achieved excellent results in the field of NER by combining deep learning and active learning. For example, Agrawal et al. [140] used a least-confidence sampling strategy based on uncertainty to solve the sample selection problem. This strategy considers the uncertainty of the model's most likely label for each instance and calculates the number of uncertain words in the sentence; at the same time, the corpus is used to label the selected samples directly. Li et al. [141] combined uncertainty- and diversity-based sampling with the BERT-BiLSTM-CRF model to alleviate the problem of insufficient annotated data; uncertainty sampling selects instances with uncertain labels, while diversity sampling increases data diversity by selecting instances with large contextual differences. Radmard et al. [142] proposed a subsequence-based active learning method to improve the efficiency of sample selection in the NER task. This method not only considers the uncertainty of the whole sentence but also focuses on subsequences within the sentence, allowing the querying and annotation of subsequences with high uncertainty. In the field of NER, future deep active learning will focus more on further developing subsequence-based annotation methods to improve the utilization of annotation data and reduce the labor and time costs required for annotation. At the same time, it will explore integrating the latest deep learning models (such as PLMs) and technologies into the deep active learning framework and developing more accurate sample selection algorithms.
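The least-confidence criterion mentioned above can be sketched in a few lines; the example below is purely illustrative (the stand-in model, tag count, and sentence pool are ours) and scores each unlabeled sentence by its weakest token-level decision before sending the lowest-scoring ones for annotation:

```python
# Minimal sketch of uncertainty-based (least-confidence) sample selection for
# deep active learning: score each unlabeled sentence by the model's confidence
# in its most likely tags and send the least confident ones to annotators.
# `model(sent)` is assumed to return per-token tag probabilities; data is illustrative.
import torch

def least_confidence(model, unlabeled_sentences, budget=2):
    scores = []
    for idx, sent in enumerate(unlabeled_sentences):
        probs = model(sent)                                   # (seq_len, num_tags), softmaxed
        confidence = probs.max(dim=-1).values.min().item()    # weakest token decision
        scores.append((confidence, idx))
    scores.sort()                                  # least confident first
    return [idx for _, idx in scores[:budget]]     # indices to send for annotation

# Toy stand-in for a trained tagger: random tag distributions per token.
fake_model = lambda sent: torch.softmax(torch.randn(len(sent.split()), 5), dim=-1)
pool = ["Xiao Li lives in Beijing", "The protein binds the receptor", "Stock prices fell"]
print(least_confidence(fake_model, pool))
```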
6.2.6. Federated learning
Federated learning [143] is a distributed machine learning method whose core is to allow multiple devices or servers to collaborate on learning from data while protecting data privacy and security. Compared with traditional machine learning methods, federated learning can not only protect data privacy and reduce dependence on centralized data storage but also enable distributed collaborative learning across multiple devices. Driven by privacy protection and data security concerns, the application of federated learning in NER has gradually become a research hotspot. For NER applications involving sensitive data, such as healthcare, financial services, or legal documents, federated learning provides a way to protect personal privacy and sensitive information while still allowing learning from this data. In addition, sharing models rather than data between different institutions or fields allows the NER system to learn from a wider range of data, thereby improving the model's generalization ability and accuracy. However, when conducting federated learning, it is necessary to ensure data privacy and security during distributed data processing, solve the heterogeneity of data distributions across different clients, improve communication efficiency during model updates, and reduce bandwidth and resource consumption. Developing federated learning suitable for NER requires careful consideration of data distribution, model design, privacy protection, communication optimization, and other aspects. Wu et al. [144] use knowledge distillation to achieve communication-efficient federated learning, using smaller mentee models and larger mentor models that learn from each other. The small models perform personalized learning on their respective clients while reducing communication costs through a dynamic gradient method based on singular value decomposition. Wang et al. [145] proposed a cross-platform data distillation and processing method for heterogeneous label sets to train global NER models. This method combines a sequence-to-sequence NER framework with prompt-tuning technology to reduce communication costs and improves the identification of unlabeled entity types through the distillation of pseudo-complete annotations (data containing all possible entity type annotations, not just the entity types already present in the local dataset). Ge et al. [146] divided the model into private modules that focus on local characteristics and shared modules that capture general knowledge across platforms, and then shared model gradients rather than original data among the various medical platforms to maintain privacy, thereby improving the generalization ability and accuracy of the NER model. Research on federated learning in NER may need to explore encryption and privacy protection technologies further, such as homomorphic encryption and differential privacy, while optimizing algorithms and processing strategies to deal with efficiency problems in large-scale heterogeneous data environments, especially for real-time updates and dynamically changing medical data.
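The parameter-sharing step at the heart of such systems is typically a FedAvg-style weighted average; the sketch below (our illustration, with toy weight tensors standing in for client NER models) shows how only parameters, never raw text, travel to the server:

```python
# Minimal sketch of the FedAvg idea behind federated NER: each client trains on
# its private data and only model weights (never raw text) are averaged on the
# server. Client training is elided; the weight structures are illustrative.
import copy
import torch

def federated_average(client_state_dicts, client_sizes):
    """Weighted average of client model parameters, proportional to local data size."""
    total = sum(client_sizes)
    avg = copy.deepcopy(client_state_dicts[0])
    for key in avg:
        avg[key] = sum(sd[key] * (n / total)
                       for sd, n in zip(client_state_dicts, client_sizes))
    return avg

# Two hospitals with differently sized private corpora share only parameters:
client_a = {"encoder.weight": torch.ones(2, 2)}
client_b = {"encoder.weight": torch.zeros(2, 2)}
global_weights = federated_average([client_a, client_b], client_sizes=[300, 100])
print(global_weights["encoder.weight"])   # 0.75 everywhere: closer to the larger client
```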
6.2.7. Distance-supervision and weakly-supervised learning
Distance-supervision and weakly-supervised learning are two machine learning paradigms that aim to solve the problem of scarcity of annotated data. The distance-supervised learning method automatically labels training data using existing knowledge bases or external resources. In NER, the application of distance-supervised learning significantly improves efficiency and professionalism. For example, using existing knowledge bases in the medical field, medical terms in text data can be automatically annotated. This method not only reduces the need for manual annotation but also automatically identifies and accurately annotates entities in specific fields (such as medical and legal) through a professional knowledge base. In addition, distance-supervised learning can generate a large amount of diverse training data, thereby improving the model's generalization ability. There are several key points when applying distant supervised learning in NER. First, processes that rely on automated annotation in existing knowledge bases may generate erroneous or inaccurate labels, introducing noise that impacts model performance. Secondly, distance-supervised learning may have difficulty correctly handling entity ambiguities in context during automatic annotation, especially when the same words or phrases represent different entities in different contexts. In addition, since the knowledge base may cover some entity categories more extensively than others, it may lead to category imbalance in the data set, further affecting the model's ability to identify rare entity categories. Li et al. [147] proposed a self-training framework for category rebalancing. By designing flexible category thresholds and using hybrid pseudo-labeling technology, the category imbalance problem of NER under distance supervised learning is improved. Zhou et al. [148] proposed a distance-supervised learning NER method, which uses an external knowledge base to generate labels automatically and a reliability-based learning strategy to reduce false negative samples generated by incomplete labels. Meng et al. [149] automatically generate training data by matching entity mentions in the original text with entity types in the knowledge base. A noise-robust learning scheme is proposed to solve the problem of incomplete and noisy labels, including a new loss function and steps to extract noisy labels. Although distance-supervised learning has made significant progress in the field of NER, there are still some challenges. For example, for higher-level semantic understanding, distant supervised learning usually relies on surface-level text matching and simple rules, which makes it difficult to handle complex semantic understanding and reasoning tasks. Moreover, in an environment of dynamically changing data sources and updated knowledge bases, the real-time learning and adaptability of distance-supervised learning models need improvement.
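The automatic annotation step described above can be illustrated with simple dictionary projection (a sketch only; the tiny knowledge base, entity types, and sentence are ours), which converts knowledge-base entries into noisy BIO labels without any manual annotation:

```python
# Minimal sketch of distant supervision for NER: automatically project entity
# mentions from a knowledge base / dictionary onto raw text to produce (noisy)
# BIO labels without manual annotation. The tiny dictionary is illustrative.
def distant_label(tokens, knowledge_base):
    labels = ["O"] * len(tokens)
    for entity, etype in knowledge_base.items():
        ent_tokens = entity.split()
        for i in range(len(tokens) - len(ent_tokens) + 1):
            if tokens[i:i + len(ent_tokens)] == ent_tokens:
                labels[i] = f"B-{etype}"
                for j in range(i + 1, i + len(ent_tokens)):
                    labels[j] = f"I-{etype}"
    return labels

kb = {"Beijing": "LOC", "aspirin": "CHEM", "Peking University": "ORG"}
tokens = "Xiao Li studies at Peking University in Beijing".split()
print(list(zip(tokens, distant_label(tokens, kb))))
```

The noise discussed above arises exactly here: mentions missing from the knowledge base stay labeled O, and ambiguous strings are labeled regardless of context.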
Weakly-supervised learning is a method of training machine learning models using incomplete, inaccurate, or inconsistent labeled data. This approach often relies on heuristic rules, labeling functions, or the integration of multiple imperfect labeling sources. In NER, weakly-supervised learning can effectively utilize a large amount of unlabeled or partially labeled data and reduce the reliance on manual labeling. This method enables the model to adapt quickly to new entity types and changing domains by integrating information from external knowledge bases, rules, or heuristic algorithms, and it is especially suitable for specialized or low-resource language scenarios. Although weakly-supervised learning reduces the requirement for large amounts of annotated data, data diversity and coverage must be ensured to avoid model bias and overfitting. At the same time, weakly-supervised data may contain erroneous or inconsistent labels, requiring effective noise-handling mechanisms, such as noise filtering or correction strategies. Fries et al. [150] used medical ontologies as the source of annotation heuristic rules and adopted a weakly-supervised learning method to train a medical entity classifier, correcting label noise by modeling the accuracy of each ontology and rule to improve model performance. Zhang et al. [151] addressed the problems of insufficient label coverage and text noise by combining category descriptions, keywords, and network structure analysis and by using weakly-supervised learning methods with hierarchical structure information within the text. In addition, they implemented a self-training strategy, which effectively enhanced the model's ability to handle complex, multi-faceted tasks, including improving the processing of labeled data. To address the difficulties of weakly-supervised learning in existing NER, the following points may become the focus in the future. First, the model's adaptability across multiple languages and domains should be improved, and the data diversity brought by globalization should be addressed by developing cross-language and cross-domain transfer learning technologies. Second, attention should be paid to data augmentation and synthetic data generation, using technologies such as generative adversarial networks to make up for the lack of labeled data and enhance the robustness of the model. Third, the handling of noisy labels should be improved by adopting more accurate noise detection and correction mechanisms to optimize the model's performance in complex data environments.
In addition to the above methods or strategies, other advanced technologies and methods have received increasing attention in NER research. For example, self-supervised learning utilizes large amounts of unlabeled data to learn useful feature representations. As an effective pre-training strategy, this method has been proven to significantly improve the performance of NER. In few-shot learning scenarios, meta-learning demonstrates its ability to quickly adapt to new tasks. By learning how to learn efficiently, meta-learning enables models to quickly adapt to new tasks or environments with limited data, which is especially important in fields where data is scarce. Additionally, incremental learning is another important approach that allows models to gradually adapt as they receive new data or face new tasks without completely retraining each time. This strategy is particularly effective in dealing with changing data environments because it keeps the model flexible and adaptable. Model compression and distillation techniques are becoming increasingly important in the NER field. These techniques reduce the size of large models, making them more suitable for environments with limited computing resources while maintaining or improving model performance. In summary, NER research is constantly evolving towards more efficient, adaptable, and resource-efficient directions, and these advanced methods and strategies manifest this trend.
7. Summary and prospect
This paper uses the literature in the field of NER obtained from the Web of Science core collection database as the data source. The following conclusions are drawn using CiteSpace software to comprehensively analyze NER's research status, existing achievements, research cooperation, research frontiers, and hotspots from macro and micro perspectives. The superior performance of deep learning in NER research has made the field of NER develop rapidly. According to the trend of the number of documents issued, the NER field is in a period of rapid development at this stage. From the perspective of research directions and journal distribution, NER research mainly involves computer science, medicine, biology, chemistry, and other disciplines, which shows that NER has interdisciplinary and cross-field common components. This multidisciplinary intersection has brought new application scenarios and research perspectives for developing NER technology, such as precise identification of biomedical named entities and entity identification of compound reactants. From the perspective of the cooperation between authors and the number of papers published, the core authors in the early stage of NER development include Munoz, R, Li, YP, and other authors, and the cooperation between those authors is close. In the mid-term, with Ananiadou, S, Xu, H, and other authors as the main body, the cooperation has become more intimate. Recent highly productive authors include Lin, HF, Qiu, QJ, and others. There are 63 prolific authors in the field, but the cooperative relationship between authors needs to be strengthened. This reminds us that deepening academic exchanges and cooperation not only contributes to knowledge sharing but also stimulates new creativity and technology integration, further promoting the innovation and development of NER research. From the perspective of the number of publications and cooperation among countries, countries with a higher volume of publications include PEOPLES R CHINA, the USA, ENGLAND, etc. The cooperation between countries is close. The number of publications in a country reflects, to a certain extent, the development level of NER technology in the language used in that country. The NER technology research in languages such as Chinese, English, and Arabic is significantly active. Meanwhile, we have also observed that other languages, such as Spanish, French, and German, are rapidly developing in NER technology. The cooperation between these countries shows the important role of international cooperation in promoting the global development of NER technology. In addition, the active research on multilingual NER technology reflects the urgent need to process multilingual information in the context of globalization. Encouraging international cooperation can accelerate technological progress and help promote information understanding and exchange in different languages and cultural backgrounds. From the perspective of inter-institutional cooperation and publication volume, the institutions with higher publication volume include Chinese Acad Sci, Harbin Inst Technol, and Dalian Univ Technol. The cooperation between institutions is mainly focused on the cooperation between universities, with less cooperation between schools and enterprises and less publication by enterprises. In the future, exploring and promoting cooperation models between universities and enterprises is expected to bring new opportunities for the application and industrialization of NER technology. 
The business community's demand for practical application of NER technology can provide rich practical scenarios for academic research. At the same time, the latest research results from academia can help companies solve technical challenges and rapidly transform and apply technology.
Based on the co-citation frequency of the literature, the mainstream model BERT proposed by Devlin J. (2019) has made significant contributions to the development of NER. The paper by Lample G. (2016) was the first to make extensive use of character-level information in NER tasks; this innovation provided new ideas for the later processing of morphologically complex languages (such as compound words in English). Vaswani A. (2017) proposed the Transformer architecture, and its innovative attention mechanism marked an important turning point in the field of NLP. In addition to the highly co-cited literature mentioned above, other literature provides important methods and strategies for developing NER. The literature cluster analysis concludes that the research frontiers of NER include PLMs, cross-language and cross-domain NER, nested and fine-grained NER, multimodal NER, few-shot NER, etc. Pre-trained language models such as BERT and its variants significantly improve the machine's ability to understand natural language by leveraging large amounts of text data to learn the deep features of the language. This progress not only brings a qualitative leap to the NER task but also provides new tools and methods for the entire field of natural language processing, especially in processing language context and understanding complex relationships. The progress of cross-language and cross-domain NER technology enables machines to better transfer and apply knowledge between different languages and fields, breaking down the barriers of language and professional knowledge and bringing new possibilities for global information sharing and knowledge management. Nested NER focuses on identifying mutually contained or overlapping entities, such as simultaneously annotating diseases and their related symptoms in the medical literature. Fine-grained NER strives to distinguish nuanced entity categories, such as further subdividing "organizations" into "non-profit organizations," "government agencies," etc. At the same time, fine-grained entity recognition and classification will promote the construction of richer and more accurate knowledge graphs and provide basic support for developing the semantic web, intelligent search, and other technologies. The development of multimodal NER enables machines to understand and process data containing non-text information such as images and sounds more comprehensively, providing a new perspective for social media analysis, multimedia content management, and other applications. Research on few-shot NER directly addresses the problem of data scarcity, enabling NER technology to quickly adapt to new fields or low-resource languages. With the in-depth development of these technologies, they are expected to have an important impact on intelligent search, personalized recommendation, intelligent assistants, and other fields. Pre-trained models, cross-language and cross-domain NER, nested and fine-grained NER, multimodal NER, and few-shot NER technologies have made significant progress in the field of NER in recent years, greatly expanding the machine's ability to understand natural language and its scope of application, especially in understanding language context, processing complex relationships, and handling multilingual information. However, there are still areas to be explored.
Future research areas include, but are not limited to, the following. For pre-trained models, further work is needed on performance with long texts, computational efficiency, and model interpretability, for example by exploring innovative model compression methods and interpretability mechanisms. For cross-language and cross-domain NER, researchers should study how to handle low-resource languages with complex structure and variable syntax through new deep learning methods, and how to better adapt and transfer models to different professional fields, especially highly specialized ones. The main challenges for nested and fine-grained NER include accurately identifying and classifying intricately intertwined fine-grained entities in text, and understanding and parsing the subtle relationships between entities. For multimodal NER, more diversified information fusion mechanisms could be explored, focusing on how to fully exploit and fuse information from multiple modalities such as text, images, and speech to improve the accuracy and robustness of entity recognition. In addition, the challenge of few-shot NER lies in using technologies such as transfer learning and meta-learning to achieve rapid adaptation and improved generalization under limited annotated data. In-depth exploration of these research directions will not only fill current technology gaps but also greatly promote the development of NER in both theory and practice, bringing new breakthroughs and innovations to the field of natural language processing.
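A common way to handle the nested entities mentioned above is span-based recognition: enumerate candidate spans and classify each one independently, so that overlapping entities do not compete for a single tag per token. The sketch below shows only that enumeration step under stated assumptions; the label set, the span-width limit, and the score_span classifier are hypothetical placeholders rather than components of any system surveyed here.

```python
# Minimal sketch of span-based nested NER: enumerate all spans up to a
# maximum width and keep those accepted by a (placeholder) classifier.
# `LABELS`, `max_width`, and `score_span` are hypothetical stand-ins.
from typing import List, Tuple

LABELS = ["Disease", "Symptom", "O"]  # illustrative label set

def score_span(tokens: List[str], start: int, end: int) -> str:
    """Placeholder classifier: a real model would embed the span
    (e.g. with a pre-trained encoder) and predict its label."""
    span = " ".join(tokens[start:end])
    return "Disease" if span in ("lung cancer", "cancer") else "O"

def extract_nested_entities(tokens: List[str], max_width: int = 6) -> List[Tuple[int, int, str]]:
    entities = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_width, len(tokens)) + 1):
            label = score_span(tokens, start, end)
            if label != "O":
                # Spans may overlap or nest; nothing forces them to be disjoint.
                entities.append((start, end, label))
    return entities

# "lung cancer" and the nested "cancer" are both returned.
print(extract_nested_entities("the patient has lung cancer".split()))
```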
From the keyword map analysis, the current research hotspots of NER include attention networks, multi-task joint learning, transfer learning, adversarial training, deep active learning, federated learning, distant supervision, weakly supervised learning, and other methods or strategies. These methods have greatly improved models' understanding of complex contexts and their entity recognition accuracy. Among them, the attention mechanism enhances the model's focus on key information in the input data and has become a key factor in improving model performance, especially for understanding complex contexts and handling long-distance dependencies. Multi-task joint learning shows the potential to improve generalization and learning efficiency by processing multiple related tasks in parallel within the same model. Transfer learning, especially fine-tuning pre-trained models, performs well under low-resource conditions and can significantly reduce the reliance on large amounts of labeled data. Adversarial training enhances model robustness by introducing adversarial samples, helping the model remain stable in the face of small perturbations in the input data. Deep active learning effectively reduces the amount of annotation needed by intelligently selecting the most informative samples to label, which is especially suitable for scenarios with scarce annotation resources. Federated learning improves the model jointly through collaborative training across multiple devices or servers while preserving data privacy, which is particularly important when processing sensitive data. Distant supervision and weakly supervised learning mitigate the shortage of annotated data by training models with existing knowledge bases or imperfectly labeled data. These advances bring new possibilities to NER in practical applications, yet challenges remain: how to accurately transfer knowledge learned in one domain to another, how to handle differences in data distribution, and how to optimize models to adapt to new tasks. In addition, applications in low-resource fields such as mechanical engineering and agricultural science face unique challenges, including but not limited to the accurate identification of professional terms and complex entities and the adaptability of models to domain-specific language patterns. For example, documents in mechanical engineering may be full of professional terms, technical parameters, CAD drawings, and so on; the NER system must not only be accurate and robust but also adapt to the specific terms and expressions of various mechanical subfields. Strategies such as adversarial training, deep active learning, federated learning, distant supervision, and weakly supervised learning show the potential to improve model robustness, reduce labeling requirements, and protect data privacy. However, how to effectively integrate these strategies to solve the specific problems of NER and improve model accuracy and efficiency still needs further exploration.
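Because the attention mechanism recurs throughout these hotspots, a small worked example may help make the idea concrete. The following sketch computes plain scaled dot-product attention with NumPy; the toy dimensions and random inputs are illustrative assumptions and do not correspond to any model analyzed in this review.

```python
# Minimal sketch of scaled dot-product attention:
#   Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
# Toy sizes and random inputs are for illustration only.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_k, d_v = 5, 8, 8          # 5 tokens, 8-dimensional keys/values
Q = rng.normal(size=(seq_len, d_k))  # queries, one per token
K = rng.normal(size=(seq_len, d_k))  # keys
V = rng.normal(size=(seq_len, d_v))  # values

scores = Q @ K.T / np.sqrt(d_k)                    # pairwise relevance scores
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
context = weights @ V                              # weighted mix of values

print(weights.round(2))   # how strongly each token attends to every other token
print(context.shape)      # (5, 8): one context vector per token
```

The weight matrix makes explicit which tokens the model focuses on when building each token's representation, which is the property the hotspot literature exploits for long-distance dependencies.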
Although this article attempts to analyze the NER field comprehensively, it still has certain limitations. First, during the literature search, “Named Entity Recognition” was selected as the primary search keyword, which ensured that we could effectively locate a wide range of literature directly related to NER. However, this search strategy may not fully cover all research in the field, especially studies that use different terms or keywords to describe similar concepts. In addition, interdisciplinary research or emerging technology applications may be published under different keywords, so some less common keywords were not captured in this analysis. Future research could adopt broader search strategies, including more keywords and terms, to cover the literature in this field as comprehensively as possible.
Data availability
The data collected and analyzed during this study are contained in this published article, and the data used to support the findings of this review are listed in the references at the end of the article.
CRediT authorship contribution statement
Jun Yang: Writing – original draft, Software, Formal analysis, Data curation, Conceptualization. Taihua Zhang: Writing – review & editing, Supervision. Chieh-Yuan Tsai: Writing – review & editing, Supervision. Yao Lu: Writing – review & editing, Supervision. Liguo Yao: Writing – review & editing, Supervision, Conceptualization.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by Guizhou Provincial Basic Research Program (Natural Science) (Grant No. Qiankehejichu-ZK[2022]General 320), National Natural Science Foundation (Grant No. 72061006) and Academic New Seedling Foundation Project of Guizhou Normal University (Grant No. Qianshixinmiao-[2021]A30).
Contributor Information
Jun Yang, Email: juny@gznu.edu.cn.
Taihua Zhang, Email: zhangth542@gznu.edu.cn.
Chieh-Yuan Tsai, Email: cytsai@saturn.yzu.edu.tw.
Yao Lu, Email: yao.lu@gznu.edu.cn.
Liguo Yao, Email: lgyao@gznu.edu.cn.
References
- 1.Chinchor N.A. Proceedings of the Sixth Message Understanding Conference (MUC-6) 1995. Named entity task definition; pp. 317–332. [Google Scholar]
- 2.Dang T.H., Le H.Q., Nguyen T.M., Vu S.T. D3NER: biomedical named entity recognition using CRF-biLSTM improved with fine-tuned embeddings of various linguistic information. Bioinformatics. 2018;34:3539–3546. doi: 10.1093/bioinformatics/bty356. [DOI] [PubMed] [Google Scholar]
- 3.Karaa W.B., Alkhammash E.H., Bchir A. Drug disease relation extraction from biomedical literature using NLP and machine learning. Mobile Inf. Syst. 2021:2021. doi: 10.1155/2021/9958410. [DOI] [Google Scholar]
- 4.Hemati W., Mehler A. LSTMVoter: chemical named entity recognition using a conglomerate of sequence labeling tools. J. Cheminf. 2019;11 doi: 10.1186/s13321-018-0327-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Usie A., Alves R., Solsona F., Vazquez M., Valencia A. CheNER: chemical named entity recognizer. Bioinformatics. 2014;30:1039–1040. doi: 10.1093/bioinformatics/btt639. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Antony J.B., Mahalakshmi G.S. Content-based information retrieval by named entity recognition and verb semantic role labelling. J. Univers. Comput. Sci. 2015;21:1830–1848. [Google Scholar]
- 7.Khademi M.E., Fakhredanesh M. Persian automatic text summarization based on named entity recognition. Iranian Journal of Science and Technology-Transactions of Electrical Engineering. 2020 doi: 10.1007/s40998-020-00352-2. [DOI] [Google Scholar]
- 8.Guan F.M., Tezuka T. 2022 Ieee Symposium Series on Computational Intelligence (Ssci) 2022. A medical Q&A system with entity linking and intent recognition; pp. 820–829. [DOI] [Google Scholar]
- 9.Li Z., Qu D., Xie C.J., Zhang W.L., Li Y.X. Language model pre-training method in machine translation based on named entity recognition. Int. J. Artif. Intell. Tool. 2020;29 doi: 10.1142/S0218213020400217. [DOI] [Google Scholar]
- 10.Wang L., Jiang J.C., Song J.W., Liu J. A weakly-supervised method for named entity recognition of agricultural knowledge graph. Intelligent Automation and Soft Computing. 2023;37:833–848. doi: 10.32604/iasc.2023.036402. [DOI] [Google Scholar]
- 11.Hanisch D., Fundel K., Mevissen H.T., Zimmer R., Fluck J. ProMiner: rule-based protein and gene entity recognition. BMC Bioinf. 2005;6 doi: 10.1186/1471-2105-6-S1-S14. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Quimbaya A.P., Munera A.S., Rivera R.A.G., Rodriguez J.C.D., Velandia O.M.M., Pena A.A.G., Labbe C. International Conference on Enterprise Information Systems/International Conference on Project Management/International Conference on Health And Social Care Information Systems and Technologies. CENTERIS/PROJMAN/HCIST; 2016, 2016. Named entity recognition over electronic health records through a combined dictionary-based approach; pp. 55–61. [DOI] [Google Scholar]
- 13.McNamee P., Mayfield J. 2002. Entity Extraction without Language-specific Resources. [Google Scholar]
- 14.McCallum A., Li W. Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons. 2003. pp. 188–191. [Google Scholar]
- 15.Humphreys K., Gaizauskas R., Azzam S., Huyck C., Mitchell B., Cunningham H., Wilks Y. 1998. University of Sheffield: Description of the LaSIE-II System as Used for MUC-7. [Google Scholar]
- 16.Krupka G.R., Hausman K. 1998. IsoQuest Inc.: Description of the NetOwl™ Extractor System as Used for MUC-7. [Google Scholar]
- 17.Black W.J., Rinaldi F., Mowatt D. 1998. FACILE: Description of the NE System Used for MUC-7. [Google Scholar]
- 18.Appelt D.E., Hobbs J.R., Bear J., Israel D., Kameyama M., Kehler A., Martin D., Myers K., Tyson M. 1995. SRI International FASTUS SystemMUC-6 Test Results and Analysis. [Google Scholar]
- 19.Liu P., Guo Y.M., Wang F.L., Li G.H. Chinese named entity recognition: the state of the art. Neurocomputing. 2022;473:37–53. doi: 10.1016/j.neucom.2021.10.101. [DOI] [Google Scholar]
- 20.Eddy S.R. Hidden Markov models. Curr. Opin. Struct. Biol. 1996;6:361–365. doi: 10.1016/S0959-440X(96)80056-X. [DOI] [PubMed] [Google Scholar]
- 21.Kapur J.N. John Wiley & Sons; 1989. Maximum-entropy Models in Science and Engineering. [Google Scholar]
- 22.Hearst M.A., Dumais S.T., Osuna E., Platt J., Scholkopf B. Support vector machines. IEEE Intell. Syst. Their Appl. 1998;13:18–28. doi: 10.1109/5254.708428. [DOI] [Google Scholar]
- 23.Rokach L., Maimon O. Top-down induction of decision trees classifiers - a survey. IEEE Trans. Syst. Man Cybern. C Appl. Rev. 2005;35:476–487. doi: 10.1109/TSMCC.2004.843247. [DOI] [Google Scholar]
- 24.Lafferty J., McCallum A., Pereira F.C.N. 2001. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. [Google Scholar]
- 25.Mi B.G., Fan Y. A review: development of named entity recognition (NER) technology for aeronautical information intelligence. Artif. Intell. Rev. 2023;56:1515–1542. doi: 10.1007/s10462-022-10197-2. [DOI] [Google Scholar]
- 26.Marrero M., Urbano J., Sanchez-Cuadrado S., Morato J., Gomez-Berbis J.M. Named entity recognition: fallacies, challenges and opportunities. Comput. Stand. Interfac. 2013;35:482–489. doi: 10.1016/j.csi.2012.09.004. [DOI] [Google Scholar]
- 27.Goyal A., Gupta V., Kumar M. Recent named entity recognition and classification techniques: a systematic review. Computer Science Review. 2018;29:21–43. doi: 10.1016/j.cosrev.2018.06.001. [DOI] [Google Scholar]
- 28.Nasar Z., Jaffry S.W., Malik M.K. Named entity recognition and relation extraction: state-of-the-art. ACM Comput. Surv. 2021;54 doi: 10.1145/3445965. [DOI] [Google Scholar]
- 29.Li J., Sun A.X., Han J.L., Li C.L. A survey on deep learning for named entity recognition. IEEE Trans. Knowl. Data Eng. 2022;34:50–70. doi: 10.1109/TKDE.2020.2981314. [DOI] [Google Scholar]
- 30.Li Y.X., Wang Y., Rui X., Li Y.X., Li Y., Wang H.Z., Zuo J., Tong Y.D. Sources of atmospheric pollution: a bibliometric analysis. Scientometrics. 2017;112:1025–1045. doi: 10.1007/s11192-017-2421-z. [DOI] [Google Scholar]
- 31.Yu D.J., Pan T.X. Tracing knowledge diffusion of TOPSIS: a historical perspective from citation network. Expert Syst. Appl. 2021;168 doi: 10.1016/j.eswa.2020.114238. [DOI] [Google Scholar]
- 32.Yu D.J., Sheng L.B., Xu Z.S. Analysis of evolutionary process in intuitionistic fuzzy set theory: a dynamic perspective. Inf. Sci. 2022;601:175–188. doi: 10.1016/j.ins.2022.04.019. [DOI] [Google Scholar]
- 33.Chen C., Leydesdorff L. Patterns of connections and movements in dual-map overlays: a new method of publication portfolio analysis. J. Assoc. Inf. Sci. Technol. 2014;65:334–351. [Google Scholar]
- 34.Chen C.M., Hu Z.G., Liu S.B., Tseng H. Emerging trends in regenerative medicine: a scientometric analysis in CiteSpace. Expet Opin. Biol. Ther. 2012;12:593–608. doi: 10.1517/14712598.2012.674507. [DOI] [PubMed] [Google Scholar]
- 35.Doddington G.R., Mitchell A., Przybocki M.A., Ramshaw L.A., Strassel S., Weischedel R.M. International Conference on Language Resources and Evaluation. 2004. The automatic content extraction (ACE) program – tasks, data, and evaluation. [Google Scholar]
- 36.Cao J.R., van Veen E.M., Peek N., Renehan A.G., Ananiadou S. Ieee Journal of Biomedical and Health Informatics. 2023. A novel automated approach to mutation-cancer relation extraction by incorporating heterogeneous knowledge; pp. 1096–1105. 27. [DOI] [PubMed] [Google Scholar]
- 37.Espinosa K., Georgiadis P., Christopoulou F., Ju M.Z., Miwa M., Ananiadou S. Comparing neural models for nested and overlapping biomedical event detection. BMC Bioinf. 2022;23 doi: 10.1186/s12859-022-04746-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Wei Q., Zhang Y.Y., Amith M., Lin R., Lapeyrolerie J., Tao C., Xu H. Recognizing software names in biomedical literature using machine learning. Health Inf. J. 2020;26:21–33. doi: 10.1177/1460458219869490. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Jiang M., Chen Y.K., Liu M., Rosenbloom S.T., Mani S., Denny J.C., Xu H. A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries. J. Am. Med. Inf. Assoc. 2011;18:601–606. doi: 10.1136/amiajnl-2011-000163. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Zhang T.X., Lin H.F., Ren Y.Q., Yang Z.H., Wang J., Zhang S.W., Xu B., Duan X.D. Adversarial transfer network with bilinear attention for the detection of adverse drug reactions from social media. Appl. Soft Comput. 2021;106 doi: 10.1016/j.asoc.2021.107358. [DOI] [Google Scholar]
- 41.Zhang T.X., Lin H.F., Ren Y.Q., Yang Z.H., Wang J., Duan X.D., Xu B. Identifying adverse drug reaction entities from social media with adversarial transfer learning model. Neurocomputing. 2021;453:254–262. doi: 10.1016/j.neucom.2021.05.007. [DOI] [Google Scholar]
- 42.Lv X., Xie Z., Xu D.X., Jin X.G., Ma K., Tao L.F., Qiu Q.J., Pan Y.S. Chinese named entity recognition in the geoscience domain based on BERT. Earth Space Sci. 2022;9 doi: 10.1029/2021EA002166. [DOI] [Google Scholar]
- 43.Qiu Q.J., Xie Z., Wu L., Tao L.F., Li W.J. BiLSTM-CRF for geological named entity recognition from the geoscience literature. EARTH SCIENCE INFORMATICS. 2019;12:565–579. doi: 10.1007/s12145-019-00390-3. [DOI] [Google Scholar]
- 44.Price D.J. de Solla. Little Science, Big Science. Columbia University Press; New York: 1963. [Google Scholar]
- 45.Yu D.J., Kou G., Xu Z.S., Shi S.S. Analysis of collaboration evolution in AHP research: 1982-2018. Int. J. Inf. Technol. Decis. Making. 2021;20:7–36. doi: 10.1142/S0219622020500406. [DOI] [Google Scholar]
- 46.Qian G. Scientometric sorting by importance for literatures on life cycle assessments and some related methodological discussions. Int. J. Life Cycle Assess. 2014;19:1462–1467. doi: 10.1007/s11367-014-0747-9. [DOI] [Google Scholar]
- 47.Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A.N., Kaiser L., Polosukhin I. Attention is all you need. Advances in Neural Information Processing Systems. vol. 30. NIPS; 2017. [Google Scholar]
- 48.Lample G., Ballesteros M., Subramanian S., Kawakami K., Dyer C. Neural architectures for named entity recognition. North American Chapter of the Association for Computational Linguistics; 2016. [Google Scholar]
- 49.Lee J., Yoon W., Kim S., Kim D., Kim S., So C.H., Kang J. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020;36:1234–1240. doi: 10.1093/bioinformatics/btz682. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Leaman R., Wei C.H., Lu Z.Y. tmChem: a high performance approach for chemical named entity recognition and normalization. J. Cheminf. 2015;7 doi: 10.1186/1758-2946-7-S1-S3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Sang E.F., De Meulder F.J.a.p.c. 2003. Introduction to the CoNLL-2003 Shared Task: Language-independent Named Entity Recognition. [Google Scholar]
- 52.Collobert R., Weston J., Bottou L., Karlen M., Kavukcuoglu K., Kuksa P. vol. 12. 2011. pp. 2493–2537. (Natural Language Processing (Almost) from Scratch). [Google Scholar]
- 53.Dai A.M., Le Q.V. 2015. Semi-supervised Sequence Learning; p. 28. [Google Scholar]
- 54.Chiu J.P.C., Nichols E. Named entity recognition with bidirectional LSTM-CNNs. Transactions of the association for computational linguistics. 2016;4:357–370. [Google Scholar]
- 55.Peters M.E., Neumann M., Iyyer M., Gardner M., Clark C., Lee K., Zettlemoyer L. Deep contextualized word representations. Association for Computational Linguistics. 2018:2227–2237. doi: 10.18653/v1/N18-1202. [DOI] [Google Scholar]
- 56.Yu D.J., Yan Z.P. Combining machine learning and main path analysis to identify research front: from the perspective of science-technology linkage. Scientometrics. 2022;127:4251–4274. doi: 10.1007/s11192-022-04443-1. [DOI] [Google Scholar]
- 57.Devlin J., Chang M.W., Lee K., Toutanova K., Assoc Computat L. vol. 1. 2019. BERT: pre-training of deep bidirectional transformers for language understanding; pp. 4171–4186. (2019 Conference of The North American Chapter of The Association for Computational Linguistics: Human Language Technologies (Naacl Hlt 2019)). [Google Scholar]
- 58.Liu Y., Ott M., Goyal N., Du J., Joshi M., Chen D., Levy O., Lewis M., Zettlemoyer L., Stoyanov V. arXiv Preprint arXiv:1907.11692. 2019. Roberta: a robustly optimized bert pretraining approach. [Google Scholar]
- 59.Lan Z., Chen M., Goodman S., Gimpel K., Sharma P., Soricut R. Albert: a lite bert for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942. 2019. [Google Scholar]
- 60.Agrawal A., Tripathi S., Vardhan M., Sihag V., Choudhary G., Dragoni N. BERT-based transfer-learning approach for nested named-entity recognition using joint labeling. APPLIED SCIENCES-BASEL. 2022;12 doi: 10.3390/app12030976. [DOI] [Google Scholar]
- 61.Chen M.J., Luo X., Shen H.L., Huang Z.Y., Peng Q.J. A novel named entity recognition scheme for steel E-commerce platforms using a lite BERT. CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES. 2021;129:47–63. doi: 10.32604/cmes.2021.017491. [DOI] [Google Scholar]
- 62.Li X.Y., Zhang H., Zhou X.H. Chinese clinical named entity recognition with variant neural structures based on BERT methods. J. Biomed. Inf. 2020:107. doi: 10.1016/j.jbi.2020.103422. [DOI] [PubMed] [Google Scholar]
- 63.Radford A., Narasimhan K., Salimans T., Sutskever I. 2018. Improving Language Understanding by Generative Pre-training. [Google Scholar]
- 64.Brown T., Mann B., Ryder N., Subbiah M., Kaplan J.D., Dhariwal P., Neelakantan A., Shyam P., Sastry G., Askell A., Agarwal S., Herbert-Voss A., Krueger G., Henighan T., Child R., Ramesh A., Ziegler D., Wu J., Winter C., Hesse C., Chen M., Sigler E., Litwin M., Gray S., Chess B., Clark J., Berner C., McCandlish S., Radford A., Sutskever I., Amodei D. Language models are few-shot learners. vol. 33. 2020. pp. 1877–1901. [Google Scholar]
- 65.Yang Z., Dai Z., Yang Y., Carbonell J., Salakhutdinov R.R., Q.V.J.A.i.n.i.p.s. Le . 2019. Xlnet: Generalized Autoregressive Pretraining for Language Understanding; p. 32. [Google Scholar]
- 66.Dai Z., Yang Z., Yang Y., Carbonell J., Le Q.V., Salakhutdinov R.J.a.p.a. 2019. Transformer-xl: Attentive Language Models beyond a Fixed-Length Context. [Google Scholar]
- 67.Sun Y., Wang S., Li Y., Feng S., Chen X., Zhang H., Tian X., Zhu D., Tian H., Wu H.J.a.p.a. 2019. Ernie: Enhanced Representation through Knowledge Integration. [Google Scholar]
- 68.Clark K., Luong M.-T., Le Q.V., Manning C.D. 2020. Electra: Pre-training Text Encoders as Discriminators rather than Generators. [Google Scholar]
- 69.Wang S., Sun X., Li X., Ouyang R., Wu F., Zhang T., Li J., Wang G.J.a.p.a. 2023. Gpt-ner: Named Entity Recognition via Large Language Models. [Google Scholar]
- 70.Covas E.J.a.p.a. 2023. Named Entity Recognition Using GPT for Identifying Comparable Companies. [Google Scholar]
- 71.Yan R., Jiang X., Dang D.J.N.P.L. vol. 53. 2021. pp. 3339–3356. (Named Entity Recognition by Using XLNet-BiLSTM-CRF). [Google Scholar]
- 72.Yang D., Wan F., Zhang Y. 2022 4th International Conference on Advances in Computer Technology, Information Science and Communications (CTISC) IEEE; 2022. Named entity recognition in XLNet cyberspace security domain based on dictionary embedding; pp. 1–5. [Google Scholar]
- 73.Conneau A., Khandelwal K., Goyal N., Chaudhary V., Wenzek G., Guzmán F., Grave E., Ott M., Zettlemoyer L., Stoyanov V.J.a.p.a. 2019. Unsupervised Cross-Lingual Representation Learning at Scale. [Google Scholar]
- 74.Keung P., Lu Y., Bhardwaj V.J.a.p.a. 2019. Adversarial Learning with Contextual Embeddings for Zero-Resource Cross-Lingual Classification and NER. [Google Scholar]
- 75.Feng X., Feng X., Qin B., Feng Z., Liu T. IJCAI; 2018. Improving Low Resource Named Entity Recognition Using Cross-Lingual Knowledge Transfer; pp. 4071–4077. [Google Scholar]
- 76.Liu Z., Xu Y., Yu T., Dai W., Ji Z., Cahyawijaya S., Madotto A., Fung P. Proceedings of the AAAI Conference on Artificial Intelligence. 2021. Crossner: evaluating cross-domain named entity recognition; pp. 13452–13460. [Google Scholar]
- 77.Jia C., Liang X., Zhang Y. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. Cross-domain NER using cross-domain language modeling; pp. 2464–2474. [Google Scholar]
- 78.Chen S., Aguilar G., Neves L., Solorio T.J.a.p.a. 2021. Data Augmentation for Cross-Domain Named Entity Recognition. [Google Scholar]
- 79.Brack A., Hoppe A., Buschermöhle P., Ewerth R. Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries. 2022. Cross-domain multi-task learning for sequential sentence classification in research papers; pp. 1–13. [Google Scholar]
- 80.Liu Z.H., Xu Y., Yu T.Z., Dai W.L., Ji Z.W., Cahyawijaya S., Madotto A., Fung P., Assoc I. Thirty-Fifth Aaai Conference On Artificial Intelligence, Thirty-Third Conference On Innovative Applications Of Artificial Intelligence And The Eleventh Symposium On Educational Advances In Artificial Intelligence. 2021. Advancement artificial, CrossNER: evaluating cross-domain named entity recognition; pp. 13452–13460. [Google Scholar]
- 81.Peng Q., Zheng C.M., Cai Y., Wang T., Xie H.R., Li Q. Unsupervised cross-domain named entity recognition using entity-aware adversarial training. Neural Network. 2021;138:68–77. doi: 10.1016/j.neunet.2020.12.027. [DOI] [PubMed] [Google Scholar]
- 82.Katiyar A., Cardie C. Nested named entity recognition revisited. North American Chapter of the Association for Computational Linguistics; 2018. [Google Scholar]
- 83.Wang Y., Tong H.H., Zhu Z.Y., Li Y. Nested named entity recognition: a survey. ACM Trans. Knowl. Discov. Data. 2022;16 doi: 10.1145/3522593. [DOI] [Google Scholar]
- 84.Shen D., Zhang J., Zhou G., Su J., Tan C.L. BioNLP@ACL. 2003. Effective adaptation of hidden Markov model-based named entity recognizer for biomedical domain. [Google Scholar]
- 85.Ju M., Miwa M., Ananiadou S. North American Chapter of the Association for Computational Linguistics. 2018. A neural layered model for nested named entity recognition. [Google Scholar]
- 86.Xu M.B., Jiang H., Watcharawittayakul S. vol. 1. 2017. A local detection approach for named entity recognition and mention detection; pp. 1237–1247. (Proceedings of The 55th Annual Meeting Of The Association For Computational Linguistics (Acl 2017)). [DOI] [Google Scholar]
- 87.Lu W., Roth D. Conference On Empirical Methods in Natural Language Processing. 2015. Joint mention extraction and classification with mention hypergraphs. [Google Scholar]
- 88.Wang B.L., Lu W., Wang Y., Jin H.X., Assoc Computat L. 2018 Conference On Empirical Methods in Natural Language Processing (Emnlp 2018) 2018. A neural transition-based model for nested mention recognition; pp. 1011–1017. [Google Scholar]
- 89.Geng R.S., Chen Y.P., Huang R.Z., Qin Y.B., Zheng Q.H. Planarized sentence representation for nested named entity recognition. Inf. Process. Manag. 2023;60 doi: 10.1016/j.ipm.2023.103352. [DOI] [Google Scholar]
- 90.Cui S.M., Joe I. vol. 35. Neural Computing & Applications; 2023. pp. 2561–2574. (A Multi-Head Adjacent Attention-Based Pyramid Layered Model for Nested Named Entity Recognition). [DOI] [Google Scholar]
- 91.Chen Y.P., Huang R., Pan L.J., Huang R.Z., Zheng Q.H., Chen P. A controlled attention for nested named entity recognition. Cognitive Computation. 2023;15:132–145. doi: 10.1007/s12559-023-10112-z. [DOI] [Google Scholar]
- 92.Rodríguez A.J.C., Castro D.C., García S.H. vol. 193. 2022. (Noun-based Attention Mechanism for Fine-Grained Named Entity Recognition). [Google Scholar]
- 93.Wan Q., Wei L., Zhao S., Liu J.J.K.-B.S. A span-based multi-modal attention network for joint entity-relation extraction. 2023;262 [Google Scholar]
- 94.Wang X., Hu V., Song X., Garg S., Xiao J., Han J. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021. ChemNER: fine-grained chemistry named entity recognition with ontology-guided distant supervision. [Google Scholar]
- 95.Yu J., Jiang J., Yang L., Xia R. Improving Multimodal Named Entity Recognition via Entity Span Detection with Unified Multimodal Transformer. Association for Computational Linguistics; 2020. [Google Scholar]
- 96.Zhang D., Wei S., Li S., Wu H., Zhu Q., Zhou G. Proceedings of the AAAI Conference on Artificial Intelligence. 2021. Multi-modal graph fusion for named entity recognition with targeted visual guidance; pp. 14347–14355. [Google Scholar]
- 97.Zhou B., Zhang Y., Song K., Guo W., Zhao G., Wang H., Yuan X. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 2022. A span-based multimodal variational autoencoder for semi-supervised multimodal named entity recognition; pp. 6293–6302. [Google Scholar]
- 98.Wang H., Cheng L., Zhang W., Soh D.W., Bing L.J.a.p.a. 2023. Enhancing Few-Shot NER with Prompt Ordering Based Data Augmentation. [Google Scholar]
- 99.Chen J., Liu Q., Lin H., Han X., Sun L.J.a.p.a. 2022. Few-shot Named Entity Recognition with Self-Describing Networks. [Google Scholar]
- 100.Das S.S.S., Katiyar A., Passonneau R.J., Zhang R.J.a.p.a. 2021. CONTaiNER: Few-Shot Named Entity Recognition via Contrastive Learning. [Google Scholar]
- 101.Chen Y., Zheng Y., Yang Z.J.a.p.a. 2022. Prompt-Based Metric Learning for Few-Shot NER. [Google Scholar]
- 102.Lee D.-H., Kadakia A., Tan K., Agarwal M., Feng X., Shibuya T., Mitani R., Sekiya T., Pujara J., Ren X.J.a.p.a. 2021. Good Examples Make a Faster Learner: Simple Demonstration-Based Learning for Low-Resource NER. [Google Scholar]
- 103.Shen Y., Tan Z., Wu S., Zhang W., Zhang R., Xi Y., Lu W., Zhuang Y.J.a.p.a. 2023. PromptNER: Prompt Locating and Typing for Named Entity Recognition. [Google Scholar]
- 104.Yu D.J., Xu Z.S., Pedrycz W., Wang W.R. Information sciences 1968-2016: a retrospective analysis with text mining and bibliometric. Inf. Sci. 2017;418:619–634. doi: 10.1016/j.ins.2017.08.031. [DOI] [Google Scholar]
- 105.Kazama J.i., Makino T., Ohta Y., Tsujii J. ACL Workshop on Natural Language Processing in the Biomedical Domain. 2002. Tuning support vector machines for biomedical named entity recognition. [Google Scholar]
- 106.Rocktaschel T., Weidlich M., Leser U. ChemSpot: a hybrid system for chemical named entity recognition. Bioinformatics. 2012;28:1633–1640. doi: 10.1093/bioinformatics/bts183. [DOI] [PubMed] [Google Scholar]
- 107.Mikolov T., Chen K., Corrado G., Dean J. 2013. Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781. [Google Scholar]
- 108.Huang Z., Xu W., Yu K. Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991. 2015.
- 109.Fraley C., Raftery A.E. Model-based clustering, discriminant analysis, and density estimation. J. Am. Stat. Assoc. 2002;97:611–631. doi: 10.1198/016214502760047131. [DOI] [Google Scholar]
- 110.Yu D.J., Liu Y., Xu Z.S. Analysis of knowledge evolution in PROMETHEE: a longitudinal and dynamic perspective. Inf. Sci. 2023;642 doi: 10.1016/j.ins.2023.119151. [DOI] [Google Scholar]
- 111.Donohue J.C. 1973. Understanding Scientific Literatures: A Bibliometric Approach. [Google Scholar]
- 112.He H.F., Sun X. THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE. 2017. Aaai, A unified model for cross-domain and semi-supervised named entity recognition in Chinese social media; pp. 3216–3222. [Google Scholar]
- 113.Niu Z.Y., Zhong G.Q., Yu H. A review on the attention mechanism of deep learning. Neurocomputing. 2021;452:48–62. doi: 10.1016/j.neucom.2021.03.091. [DOI] [Google Scholar]
- 114.Bahdanau D., Cho K., Bengio Y.J.C. 2014. Neural Machine Translation by Jointly Learning to Align and Translate. abs/1409.0473. [Google Scholar]
- 115.Xu K., Ba J.L., Kiros R., Cho K., Courville A., Salakhutdinov R., Zemel R.S., Bengio Y. Show, attend and tell: neural image caption generation with visual attention. INTERNATIONAL CONFERENCE ON MACHINE LEARNING. 2015;37:2048–2057. [Google Scholar]
- 116.Zhang S., Sheng Y., Gao J.F., Chen J.H., Huang J.J., Lin S.F. COMPUTER SUPPORTED COOPERATIVE WORK AND SOCIAL COMPUTING. CHINESECSCW; 2019, 2019. A multi-domain named entity recognition method based on part-of-speech attention mechanism; pp. 631–644. [DOI] [Google Scholar]
- 117.Lin J.C.W., Shao Y.N., Djenouri Y., Yun U. ASRNN: a recurrent neural network with an attention model for sequence labeling. Knowl. Base Syst. 2021;212 doi: 10.1016/j.knosys.2020.106548. [DOI] [Google Scholar]
- 118.Xu G.H., Wang C.Y., He X.F. Improving clinical named entity recognition with global neural attention. WEB AND BIG DATA (APWEB-WAIM 2018), PT. 2018;II:264–279. doi: 10.1007/978-3-319-96893-3_20. [DOI] [Google Scholar]
- 119.Zhuang C., Jin X., Zhu W., Liu J.W., Bai L., Cheng X.Q. Deep learning based relation extract-ion: a survey. Chinese Journal of Informatics. 2019;33:1–18. [Google Scholar]
- 120.Geng Z.Q., Zhang Y.H., Han Y.M. Joint entity and relation extraction model based on rich semantics. Neurocomputing. 2021;429:132–140. doi: 10.1016/j.neucom.2020.12.037. [DOI] [Google Scholar]
- 121.Zhang Z.Y., Shu X.B., Liu T.W., Fang Z., Li Q.G. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) 2020. Ieee, joint entity linking and relation extraction with neural networks for knowledge base population. [Google Scholar]
- 122.Li Q., Ji H. vol. 1. 2014. Incremental joint extraction of entity mentions and relations; pp. 402–412. (PROCEEDINGS OF THE 52ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS,). [DOI] [Google Scholar]
- 123.Zhao K., Xu H., Cheng Y., Li X., Gao K.J.K.-B.S. Representation iterative fusion based on heterogeneous graph neural network for joint entity and relation extraction. 2021;219 [Google Scholar]
- 124.Chen T., Zhou L., Wang N., Chen X.J.A.S.C. vol. 119. 2022. (Joint Entity and Relation Extraction with Position-Aware Attention and Relation Embedding). [Google Scholar]
- 125.Yan Z., Yang S., Liu W., Tu K.J.a.p.a. 2023. Joint Entity and Relation Extraction with Span Pruning and Hypergraph Neural Networks. [Google Scholar]
- 126.Bekoulis G., Deleu J., Demeester T., Develder C. Joint entity recognition and relation extraction as a multi-head selection problem. 2018;114:34–45. [Google Scholar]
- 127.Luan Y., Wadden D., He L., Shah A., Ostendorf M., Hajishirzi H.J.a.p.a. 2019. A General Framework for Information Extraction Using Dynamic Span Graphs. [Google Scholar]
- 128.Nguyen D.Q. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 2021. PhoNLP: a joint multi-task learning model for Vietnamese part-of-speech tagging, named entity recognition and dependency parsing; pp. 1–7. [Google Scholar]
- 129.Pan S.J., Yang Q.A. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2010;22:1345–1359. doi: 10.1109/TKDE.2009.191. [DOI] [Google Scholar]
- 130.Peng D., Wang Y., Liu C., Chen Z.J.I.S.F., TL-NER . vol. 22. 2020. pp. 1291–1304. (A Transfer Learning Model for Chinese Named Entity Recognition). [Google Scholar]
- 131.Gligic L., Kormilitzin A., Goldberg P., Nevado-Holgado A.J.N.N. Named entity recognition in electronic health records using transfer learning bootstrapped. Neural Network. 2020;121:132–139. doi: 10.1016/j.neunet.2019.08.032. [DOI] [PubMed] [Google Scholar]
- 132.Yu Y.Q., Wang Y.Z., Mua J.Q., Li W., Jiao S.T., Wang Z., Lv P., Zhu Y.Q. Chinese mineral named entity recognition based on BERT model. Expert Syst. Appl. 2022;206 doi: 10.1016/j.eswa.2022.117727. [DOI] [Google Scholar]
- 133.Yao L.G., Huang H.S., Wang K.W., Chen S.H., Xiong Q.Q. Fine-grained mechanical Chinese named entity recognition based on ALBERT-AttBiLSTM-CRF and transfer learning. SYMMETRY-BASEL. 2020;12 doi: 10.3390/sym12121986. [DOI] [Google Scholar]
- 134.Goodfellow I.J., Shlens J., Szegedy C.J.a.p.a. 2014. Explaining and Harnessing Adversarial Examples. [Google Scholar]
- 135.Wen G.H., Chen H.H., Li H.H., Hu Y., Li Y.H., Wang C.J. Cross domains adversarial learning for Chinese named entity recognition for online medical consultation. J. Biomed. Inf. 2020:112. doi: 10.1016/j.jbi.2020.103608. [DOI] [PubMed] [Google Scholar]
- 136.Park C., Jeong S., Kim J. vol. 225. 2023. (ADMit: Improving NER in Automotive Domain with Domain Adversarial Training and Multi-Task Learning). [Google Scholar]
- 137.Wang J., Xu W., Fu X., Xu G., Wu Y.J.K.-B.S. vol. 197. 2020. (ASTRAL: Adversarial Trained LSTM-CNN for Named Entity Recognition). [Google Scholar]
- 138.Ren P.Z., Xiao Y., Chang X.J., Huang P.Y., Li Z.H., Gupta B.B., Chen X.J., Wang X. A survey of deep active learning. ACM Comput. Surv. 2022;54 doi: 10.1145/3472291. [DOI] [Google Scholar]
- 139.Tong S. Active Learning: Theory and Applications. 2001. [Google Scholar]
- 140.Agrawal A., Tripathi S., Vardhan M. vol. 10. 2021. pp. 113–128. (Active Learning Approach Using a Modified Least Confidence Sampling Strategy for Named Entity Recognition). [Google Scholar]
- 141.Li W., Du Y.J., Li X.Y., Chen X.L., Xie C.Z., Li H., Li X.L. UD_BBC: named entity recognition in social network combined BERT-BiLSTM-CRF with active learning. Eng. Appl. Artif. Intell. 2022:116. doi: 10.1016/j.engappai.2022.105460. [DOI] [Google Scholar]
- 142.Radmard P., Fathullah Y., Lipani A. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) 2021. Subsequence based deep active learning for named entity recognition; pp. 4310–4321. [Google Scholar]
- 143.Konečný J., McMahan H.B., Yu F.X., Richtárik P., Suresh A.T., Bacon D.J.a.p.a. 2016. Federated Learning: Strategies for Improving Communication Efficiency. [Google Scholar]
- 144.Wu C., Wu F., Lyu L., Huang Y., Xie X.J.N.c. vol. 13. 2022. p. 2032. (Communication-efficient Federated Learning via Knowledge Distillation). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 145.Wang R., Yu T., Wu J., Zhao H., Kim S., Zhang R., Mitra S., Henao R. Findings of the Association for Computational Linguistics. vol. 2023. ACL; 2023. Federated domain adaptation for named entity recognition via distilling with heterogeneous tag sets; pp. 7449–7463. [Google Scholar]
- 146.Ge S., Wu F., Wu C., Qi T., Huang Y., Xie X.J.a.p.a. 2020. Fedner: Privacy-Preserving Medical Named Entity Recognition with Federated Learning. [Google Scholar]
- 147.Li Q., Xie T., Peng P., Wang H., Wang G. Findings of the Association for Computational Linguistics. vol. 2023. ACL; 2023. A class-rebalancing self-training framework for distantly-supervised named entity recognition; pp. 11054–11068. [Google Scholar]
- 148.Zhou K., Li Y., Li Q.J.a.p.a. 2022. Distantly Supervised Named Entity Recognition via Confidence-Based Multi-Class Positive and Unlabeled Learning. [Google Scholar]
- 149.Meng Y., Zhang Y., Huang J., Wang X., Zhang Y., Ji H., Han J.J.a.p.a. 2021. Distantly-supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training. [Google Scholar]
- 150.Fries J.A., Steinberg E., Khattar S., Fleming S.L., Posada J., Callahan A., Shah N.H.J.N.c. vol. 12. 2021. p. 2017. (Ontology-driven Weak Supervision for Clinical Entity Classification in Electronic Health Records). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 151.Zhang Y., Jin B., Chen X., Shen Y., Zhang Y., Meng Y., Han J.J.a.p.a. 2023. Weakly Supervised Multi-Label Classification of Full-Text Scientific Papers. [Google Scholar]