Abstract
Extracting entities and relations from unstructured text has attracted increasing attention in recent years. Existing work has achieved considerable results, yet it remains difficult to handle entity overlap and exposure bias. To address cascading errors, exposure bias, and entity overlap in existing entity relation extraction approaches, we propose a joint entity relation extraction model (SMHS) based on a span-level multi-head selection mechanism, which transforms entity relation extraction into a span-level multi-head selection problem. Our model uses a span-tagger and span embedding to construct span semantic vectors, applies an LSTM and multi-head self-attention for span feature extraction, uses a multi-head selection mechanism for span-level relation decoding, and introduces a span classification task for multi-task learning, decoding relation triples in a single stage. Experiments on the classic English dataset NYT and the publicly available Chinese relation extraction dataset DuIE 2.0 show that this method outperforms the baseline methods, verifying its effectiveness. Source code and data are published at https://github.com/Beno-waxgourd/NLP.git.
Introduction
Relation Extraction (RE) [1], one of the important subtasks of information extraction, aims to identify entities and the relations between them from unstructured text and to form triples of the form (subject, relation, object). The traditional pipeline approach divides entity relation extraction into two subtasks: Named Entity Recognition (NER) and Relation Classification (RC). However, because these two subtasks are performed independently, the intrinsic connections and dependencies between them are overlooked, and the approach is prone to entity overlap and exposure bias. The exposure bias problem arises from the inconsistent distribution between the gold labels used during model training and the model-generated labels used during prediction. Entity overlap refers to identical entities shared between different relational triples of a sentence, as shown in Table 1.
Table 1. Examples of entity overlap.
| Entity Overlap Type | Text | Triplets |
|---|---|---|
| Normal | Obama was born in the United States. | (Obama, birthplace, the United States) |
| SEO(Single Entity Overlap) | Obama was born in the United States and graduated from Harvard University. | (Obama, birthplace, the United States) (Obama, graduated from, Harvard University) |
| EPO(Entity Pair Overlap) | Washington D.C. is the capital of the United States. | (the United States, contains, Washington D.C.) (the United States, capital city, Washington D.C.) |
Unlike pipelined approaches, joint extraction identifies entities and relations with a single model by exploiting the close interaction between them. However, it remains difficult to solve the entity overlap and exposure bias problems simultaneously.
This paper proposes a joint relation extraction model (SMHS) based on a span-level multi-head selection mechanism to address the difficulty of simultaneously solving entity overlap and exposure bias in relation extraction. The model treats relation extraction as a span-level multi-head relation selection problem and is divided into two parts: encoding and decoding. The encoding layer uses a span-tagger and span embedding to construct span-level semantic vectors and uses an LSTM and multi-head self-attention for deep span feature extraction; the decoding layer uses a span classifier and a multi-head selection mechanism for span classification and relation decoding. The span vectors constructed by the model are independent of each other, which naturally solves the problem of nested entities. Combined with the span classification task, span type information is indirectly incorporated by sharing the span vectors, which strengthens the type constraints on span relations. Multi-head relation selection between span vectors directly decodes the spans and the relations between them, realizing single-step decoding of relation triples and thereby solving entity overlap and exposure bias at the same time.
The model was tested on the classic English dataset NYT and the authoritative Chinese public dataset DuIE 2.0. The results show that the model improves significantly over the baseline models and performs well on SEO and EPO cases, demonstrating that it effectively addresses entity overlap, error accumulation, and exposure bias.
Related work
Currently, the mainstream modeling approaches for joint extraction fall into four main types: (a) Modeling entity relation extraction as end-to-end table filling. Gupta et al. [2] proposed a relation extraction method based on table filling. However, this method lets entities and entity relations share parameters within one model while still extracting them separately, which generates redundant information and has high computational complexity. (b) Modeling entity relation extraction as sequence labeling. Zheng et al. [3] first unified entity identification and relation identification into a single sequence-labeling task. This approach achieves single-step extraction but cannot overcome the entity overlap problem, since it can assign only one label to each token during annotation. Dai et al. [4] built on this with a multi-round labeling scheme to solve the entity overlap problem. Such labeling-based approaches require the manual design of an ingenious annotation scheme. (c) Modeling entity relation extraction with an encoder-decoder structure [5, 6], which can cope with the entity overlap problem at acceptable computational complexity but struggles with exposure bias and nested entities. (d) Modeling entity relation extraction as a relation mapping between subject and object entities. Wei et al. [7] proposed a cascaded extraction framework that first extracts candidate subject entities possibly involved in target relations and then labels the corresponding relations and object entities for each extracted subject. This approach solves the entity overlap problem but fails to address exposure bias.
In addition, Miwa et al. [8] attempted a joint extraction method based on shared parameters and used syntactic dependency trees and LSTMs to encode entities; however, it still produces entity redundancy and cannot solve the entity overlap problem. Bekoulis et al. [9] viewed relation extraction as a multi-head selection problem and used label embeddings to incorporate label features, but they decode token-level relations, which require the entity boundaries identified by the recognition module to decode entity relations and therefore suffer from error accumulation. Katiyar et al. [10] used a pointer network to decode entity relations; however, entities are sometimes omitted, and entity boundary information is not fully utilized, which can distract the attention mechanism. Li et al. [11] transformed relation extraction into a multi-turn dialogue task, which captures hierarchical dependencies well; the designed questions encode important prior relation information that helps entity relation extraction, and a reinforcement learning mechanism is used to alleviate error accumulation. Dixit et al. [12] use span alignment and span filtering to generate candidate spans, but the final span relation classification generates redundancy, which degrades accuracy. Fu et al. [13] proposed an end-to-end joint extraction model based on graph convolutional networks (GCNs), which accounts for the interaction between named entities and relations and extracts relations through a relation-weighted GCN. Wang et al. [14] proposed a handshake-tagging, token-linking approach that decodes entity relation triples in a single step but still needs to combine the recognition results of multiple tokens for relation decoding. Sui et al. [15] treated joint entity and relation extraction as a set prediction problem, used a non-autoregressive decoder to avoid decoding triples in a fixed order, and adopted a bipartite matching loss function, but the method is limited by a hyperparameter fixing the number of triples in advance. Zheng et al. [16] designed an extraction method using a dynamic sliding window with three mapping strategies for combining and arranging tokens and tiling them into a span sequence; however, entities extracted under the same relation label are matched by a nearest-matching heuristic, which reduces accuracy.
In general, most mainstream relation extraction models can hardly solve entity overlap and exposure bias simultaneously, while the few models that do solve both have corresponding drawbacks, such as not using entity type information or allowing little interaction between entities. There is thus still room to explore the entity overlap and exposure bias problems in extraction models.
Model
This section describes the model in three parts: the span-tagger, the encoding, and the decoding. The overall framework of the model is shown in Fig 1.
Fig 1. The framework of SMHS.
Span-tagger
This model designs a method called the span-tagger, which combines the position indexes of the first and last tokens of a span into an index tuple to tag the span. By enumerating spans, all candidate entities are covered by the enumerated spans, so the problem of nested entities can be solved. The essence of the span-tagger is to record the mapping between a tagged span and its position in the span embedding matrix, that is, to map the positions of the original span tokens to a position in the span embedding matrix. For example, in "Obama was born in the United States", "Obama" is tagged as (0, 0) with span position index 0, "Obama was" is tagged as (0, 1) with span position index 1, and so on. The tagging sequence of spans uses three different strategies for position mapping: the same-start mapping strategy, the same-end mapping strategy, and the same-length mapping strategy. See Fig 2 for details.
Fig 2. Example of span-tagger.
(Window length is 4.) The values in the matrix are the tag values of the spans, and their positions are the span indexes. The specific position index in the span embedding matrix differs depending on the span mapping strategy. Same-start mapping strategy: fix the starting point of the span in the text and change the span length by moving the ending point until the span length reaches the maximum window length; spans with the same starting point are trained together, as shown in the first row of the figure. The same-end mapping strategy is the opposite: tag the span by fixing its endpoint and moving its starting point; spans with the same endpoint are trained together, as shown in the second row of the figure. Same-length mapping strategy: traverse the text with a fixed window length, then increase the window length and traverse again until the maximum window length is reached; spans of the same length are trained together, as shown in the third row of the figure.
Suppose the length of the input sequence is n; since sentences are split into individual characters, the number of tokens is also n. Given the temporal and semantic nature of text, each token can only form a span together with tokens at preceding positions. As the text sequence grows, the number of spans grows rapidly if all spans are enumerated, producing a large number of useless spans that reduce accuracy and consume considerable computational resources. Therefore, a window length w is set to filter out useless spans that exceed the window length; the number m of all spans in the token sequence is then:
(1)
Construct a mapping D that transforms the position index tuple of the head and tail tokens of a span into the index of the corresponding row of the span embedding matrix. Assume that the position index of the head token of a span is i and the position index of the tail token is j. The resulting token position index tuple is (i, j), which corresponds to the k-th row of the span embedding lookup table; then we have:
(2)
A span is an arrangement of several consecutive tokens. All spans within the window length starting at each character are enumerated, so candidate entities and relations are guaranteed to lie among the enumerated spans, which addresses entity nesting and error accumulation. The span-tagger allows a token position index tuple to be mapped quickly to the index of the span it forms, so that the span semantic vectors constructed later can be assembled into a span embedding matrix according to the span-tagger's results, facilitating the subsequent span-level multi-head relation selection.
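To make the span enumeration and the (i, j) → k mapping concrete, the sketch below enumerates all spans of length at most w and builds the mapping D under the same-start strategy. The function name, the exact ordering, and the closed-form count in the comment are our assumptions for illustration, not the released implementation.

```python
# Illustrative sketch of the span-tagger (same-start mapping strategy).
# Only the idea (enumerate spans within window w, map the head/tail index
# tuple (i, j) to a span index k) comes from the paper; names are assumed.
def build_span_mapping(n: int, w: int):
    """Enumerate spans of length <= w for a sequence of n tokens and return
    D: dict mapping (i, j) head/tail token indexes -> span index k,
    ordered so that spans sharing the same start token are adjacent."""
    D = {}
    k = 0
    for i in range(n):                     # fix the start token
        for j in range(i, min(i + w, n)):  # move the end token up to the window length
            D[(i, j)] = k
            k += 1
    return D

# For n >= w this enumeration yields m = n*w - w*(w-1)/2 spans,
# which is the count we assume Eq 1 expresses.
D = build_span_mapping(n=7, w=4)
print(len(D))  # 7*4 - 4*3//2 = 22 spans
```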
Encoding
Token representation
A character-level BERT (Bidirectional Encoder Representations from Transformers) [17–19] language model is used to encode the tokenized sentence, extracting contextual semantic information for each character and overall semantic information for the sentence. Assuming the input sentence is represented as S = [s1, s2, s3, …, sn], the vectors produced by BERT encoding are:
(3)
Where H denotes the sequence of token vectors generated after each token is encoded by BERT, and the sentence vector is generated after the whole sentence is encoded by BERT; n is the length of the sequence, and d is the dimension of the BERT hidden state.
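As an illustration of this token-representation step, the sketch below obtains per-token vectors and a sentence vector from a pre-trained BERT. The use of the HuggingFace `transformers` API, the checkpoint name, and taking the pooled [CLS] output as the sentence vector are our assumptions, not details given in the paper.

```python
# Hypothetical sketch: obtaining token vectors H and a sentence vector with BERT.
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")  # assumed checkpoint
bert = BertModel.from_pretrained("bert-base-cased")

sentence = "Obama was born in the United States"
inputs = tokenizer(sentence, return_tensors="pt", max_length=128, truncation=True)

with torch.no_grad():
    outputs = bert(**inputs)

H = outputs.last_hidden_state          # (1, n, d): one d-dimensional vector per token
sentence_vec = outputs.pooler_output   # (1, d): assumed sentence-level vector
```

For the Chinese DuIE 2.0 experiments, a 768-dimensional character-level Chinese BERT would be loaded analogously.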
Span representation
The construction of the span semantic vectors and the span embedding matrix is the core of the model. Previous span-based models extracted span features indirectly. In contrast, this model constructs each span semantic vector by average-pooling the token vectors that make up the span and concatenating the result with the sentence vector, which extracts span features directly.
Suppose the window length is w and a span runs from the head token with position index i to the tail token with position index j. The semantic vector of this span is calculated as follows:
(4)
(5)
(6)
where the parameter matrix and bias vector are learned during training; k is the position in the span embedding matrix of the span formed by the head token with position index i and the tail token with position index j, and xk denotes the semantic vector of that span.
The process of Eqs 4 to 6 is repeated to construct all span semantic vectors, which are then assembled in order into the span embedding matrix according to the mapping produced by the span-tagger. The result is the span embedding matrix composed of all span semantic vectors.
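Since Eqs 4–6 are not reproduced above, the sketch below gives one plausible reading of the described construction: average-pool the token vectors within a span, concatenate the sentence vector, and apply a learned linear map. The layer shapes, the absence of a nonlinearity, and all names are assumptions.

```python
import torch
import torch.nn as nn

class SpanEmbedder(nn.Module):
    """Hypothetical span-embedding layer following the textual description of
    Eqs 4-6: average-pool the token vectors of a span, concatenate the
    sentence vector, and project with a learned linear transformation."""

    def __init__(self, hidden_dim: int, span_dim: int):
        super().__init__()
        self.proj = nn.Linear(2 * hidden_dim, span_dim)  # parameter matrix and bias

    def forward(self, H, sentence_vec, span_index):
        # H: (n, d) token vectors; sentence_vec: (d,); span_index: dict (i, j) -> k
        spans = [None] * len(span_index)
        for (i, j), k in span_index.items():
            pooled = H[i:j + 1].mean(dim=0)                   # average pooling over the span (Eq 4)
            combined = torch.cat([pooled, sentence_vec], -1)  # splice with the sentence vector (Eq 5)
            spans[k] = self.proj(combined)                    # span semantic vector x_k (Eq 6)
        return torch.stack(spans)                             # span embedding matrix: (m, span_dim)
```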
Span feature extraction
An LSTM and a multi-head self-attention mechanism are used to enhance the information interaction between spans and to extract deep span features.
The LSTM [20] controls the input content and the content stored in the memory cell through input, forget, and output gates, forming a memory of previously seen information, which effectively mitigates the exploding and vanishing gradient problems. Encoding the span embedding matrix with an LSTM effectively enhances the information interaction between spans and captures the dependencies between them.
The attention mechanism can selectively focus on the important information in a text. Multi-head self-attention [21] is a variant of attention in which Q (query), K (key), and V (value) are identical. Multiple heads extract different groups of information from the input in parallel and concatenate them, jointly attending to information from different representation subspaces at different positions, which helps the model capture more span feature information.
Assume P = [p1, p2, p3, …, pm] denotes the span vectors after LSTM encoding; then we have:
(7)
Assume T = [t1, t2, t3, …, tm] denotes the span vectors after encoding by the multi-head attention mechanism, and z parallel heads are used to capture the features; then the i-th head is computed as follows:
(8)
(9)
where the parameter matrices are trainable.
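A minimal sketch of this feature-extraction step, stacking an LSTM and PyTorch's built-in multi-head self-attention over the span embedding matrix, is given below; whether the LSTM is bidirectional, the number of heads, and the dimensions are assumptions.

```python
import torch.nn as nn

class SpanFeatureExtractor(nn.Module):
    """Assumed reading of Eqs 7-9: an LSTM over the span sequence followed by
    multi-head self-attention (Q = K = V) to fuse features across spans."""

    def __init__(self, span_dim: int, num_heads: int = 8):
        super().__init__()
        # span_dim must be divisible by num_heads
        self.lstm = nn.LSTM(span_dim, span_dim, batch_first=True)                 # P = LSTM(X), Eq 7
        self.attn = nn.MultiheadAttention(span_dim, num_heads, batch_first=True)  # Eqs 8-9

    def forward(self, X):
        # X: (batch, m, span_dim) span embedding matrix
        P, _ = self.lstm(X)
        T, _ = self.attn(P, P, P)  # self-attention: query = key = value
        return T                   # (batch, m, span_dim) span features
```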
Decoding
The decoding layer is responsible for two prediction subtasks, namely relation extraction (RE) and span classification (SC), as shown in Fig 3 and described below. Fig 4 shows the flattened form of the matrix on the left side of Fig 3.
Fig 3. Example of decoding process (window length is 4).
On the left is the decoding of relation extraction, and on the right is the decoding of span classification. In the three-dimensional matrix on the left, the rows are the subject indexes, the columns are the object indexes, and the depth axis is the relation index. A cell with value 1 represents the triple of subject, object, and relation corresponding to its coordinates. For example, the triples in the figure are (the United States, contains, Washington D.C.) and (the United States, capital city, Washington D.C.).
Fig 4. Flattened matrix of relation extraction of Fig 3.

Relation extraction
The model uses a span-level multi-head selection mechanism [9] for relation extraction, which performs multi-head relation selection directly between spans without a separate entity recognition step and extracts relation triples in a single step.
Assume R = {r1, r2, …, rl} denotes the set of relations, where l is the number of relations. The score of the k-th relation between the span pair (ti, tj) is calculated as follows:
(10)
where the parameters are specific to the k-th relation. Next, the probability that span ti is selected as the head of tj with relation rk is calculated as:
(11)
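Since the exact form of Eqs 10–11 is not reproduced above, and the mechanism is stated to follow the multi-head selection of Bekoulis et al. [9], the sketch below uses their additive scoring formulation with a per-relation sigmoid as an assumed instantiation.

```python
import torch
import torch.nn as nn

class MultiHeadSelection(nn.Module):
    """Assumed span-level multi-head selection, modelled on Bekoulis et al. [9]:
    for every ordered span pair (t_i, t_j) and relation r_k, compute a score
    and squash it with a sigmoid (our reading of Eqs 10-11)."""

    def __init__(self, span_dim: int, num_relations: int, selection_dim: int = 128):
        super().__init__()
        self.U = nn.Linear(span_dim, selection_dim, bias=False)
        self.W = nn.Linear(span_dim, selection_dim, bias=True)
        self.V = nn.Linear(selection_dim, num_relations, bias=False)

    def forward(self, T):
        # T: (batch, m, span_dim) span features
        head = self.U(T).unsqueeze(2)             # (batch, m, 1, sel)
        tail = self.W(T).unsqueeze(1)             # (batch, 1, m, sel)
        scores = self.V(torch.tanh(head + tail))  # (batch, m, m, num_relations), Eq 10
        return torch.sigmoid(scores)              # probability per span pair and relation, Eq 11
```

At inference, a triple (span_i, r_k, span_j) would be emitted whenever its probability exceeds a threshold (e.g. 0.5), which is what allows one span to participate in several triples and thus handle SEO and EPO cases.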
Span classification
Entity information, especially category information, has been shown to improve the effectiveness of relation extraction models [22], and entity category constraints exist in the relation recognition process. For example, the relation "singer" can only hold between entities with the types "figure" and "song". To make use of entity types and entity type constraints, this method introduces them indirectly through shared span encodings, performing multi-task learning with a span type classification task that assists the training of the span relation extraction task. Multi-task learning [23] improves the generalization and robustness of a model by exploiting the interaction between tasks and the task-specific information carried through the shared encoding.
Assume I = {i1, i2, …, ig} is the set of span types, where g is the number of span types. The score of span tj for the k-th type is calculated as follows:
(12)
where the parameters are specific to the k-th type. Next, the probability that span tj belongs to the k-th type is calculated as:
(13)
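Eqs 12–13 describe a per-span type classifier over the shared span features; a plausible minimal form, shown below, is a linear scoring layer followed by a softmax (the single linear layer is our assumption).

```python
import torch.nn as nn

class SpanClassifier(nn.Module):
    """Assumed span-type classifier for Eqs 12-13: linear scores over the
    shared span features, normalized with a softmax over the g span types."""

    def __init__(self, span_dim: int, num_types: int):
        super().__init__()
        self.scorer = nn.Linear(span_dim, num_types)  # per-type score, Eq 12

    def forward(self, T):
        # T: (batch, m, span_dim) shared span features
        logits = self.scorer(T)                       # (batch, m, num_types)
        return logits.softmax(dim=-1)                 # P(type_k | t_j), Eq 13
```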
Loss function
The loss function for relation extraction is defined as follows:
(14)
The loss function for span classification is defined as follows:
(15)
The final loss function is defined as follows:
(16)
where α and β are loss weights used to balance the two tasks. Our model is implemented in PyTorch, and the network weights are optimized with the Adam [24] optimizer.
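As Eqs 14–16 are not reproduced above, the sketch below shows one common instantiation consistent with the description: binary cross-entropy for the multi-label relation selection, cross-entropy for span classification, and the weighted sum L = αL_RE + βL_SC; the choice of BCE/CE is our assumption.

```python
import torch.nn as nn

bce = nn.BCELoss()          # relation selection: multi-label per span pair (assumed, Eq 14)
ce = nn.CrossEntropyLoss()  # span classification: single label per span (assumed, Eq 15)

def total_loss(rel_probs, rel_gold, type_logits, type_gold, alpha=3.0, beta=1.0):
    """Weighted multi-task loss L = alpha * L_RE + beta * L_SC (our reading of Eq 16).

    rel_probs:   (batch, m, m, num_relations) sigmoid outputs of multi-head selection
    rel_gold:    same shape, 0/1 relation labels
    type_logits: (batch, m, num_types) raw span-type scores (before the Eq 13 softmax)
    type_gold:   (batch, m) gold span-type indices
    """
    loss_re = bce(rel_probs, rel_gold.float())
    loss_sc = ce(type_logits.view(-1, type_logits.size(-1)), type_gold.view(-1))
    return alpha * loss_re + beta * loss_sc
```

Training would then step an optimizer such as `torch.optim.Adam(model.parameters(), lr=2e-5)`, matching the Adam optimizer and learning rate reported in the settings.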
Experiments
Dataset
For convenient comparison of our model with previous work, we follow the popular choice of datasets:
(a) NYT: a dataset widely used in relation extraction tasks, generated by aligning the relations in Freebase with the New York Times (NYT) corpus through distant supervision. It contains more than 100,000 triples, 50,000 English texts, 24 relation types, and 1 entity type.
(b) DuIE 2.0 (https://aistudio.baidu.com/aistudio/competition/detail/46/0/datasets): the largest schema-based Chinese relation extraction dataset in the industry, released by Baidu. It contains more than 430,000 triples, 210,000 Chinese sentences, and 48 predefined relation types (43 simple and 5 complex). The sentences come from Baidu Encyclopedia, Baidu Tieba, and Baidu feed texts, and the annotations are produced through manual annotation and distant supervision.
Since enumerating spans and the relations between them occupies a large amount of storage space, texts longer than 62 characters are removed from the DuIE 2.0 training set and texts containing entities longer than 15 characters are removed from its test set, forming new training and test sets; the NYT dataset is not modified. The distribution of the new training and test sets matches that of the original dataset. To ensure fairness, all comparison experiments in this article are conducted on the new datasets. The statistics of the datasets are shown in Table 2.
Table 2. Dataset information.
| Dataset | Split | Normal | SEO | EPO | Texts | Triples | Max length of text | Max length of entity |
|---|---|---|---|---|---|---|---|---|
| NYT | Train | 41109 | 8918 | 15606 | 56195 | 112936 | 378 | 9 |
| NYT | Test | 3266 | 1297 | 978 | 5000 | 10142 | 137 | 7 |
| DuIE | Train | 59464 | 26999 | 3338 | 89801 | 144858 | 61 | 15 |
| DuIE | Test | 7208 | 3257 | 395 | 10860 | 17565 | 76 | 15 |
Setting
The GPUs used in this experiment are four NVIDIA TITAN Xp cards, the operating system is CentOS 7.9, the programming language is Python 3.7, and the deep learning framework is PyTorch 1.8. The pre-trained models are bert-base-cased and a 768-dimensional Chinese character-level BERT. In the English model, because the texts are long, setting a larger window length would incur an unaffordable cost in time and space, so the window length is set to 6 and the maximum sequence length to 128. In the Chinese model, the sliding window length is 17 and the maximum sequence length is 64. The number of attention heads is the same as in the pre-trained model, the mapping strategy is the same-start mapping, the learning rate is 0.00002, the number of epochs is 20, and α:β = 3:1.
Evaluation
In this paper, precision, recall, and F1 are used to evaluate the experimental results, with a strict criterion: an extracted result is considered correct only when the subject entity, subject entity type, object entity, object entity type, and relation type are all judged correctly.
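Under this strict criterion, a predicted triple counts as correct only if every element matches the gold annotation exactly. The helper below illustrates the resulting set-based computation of precision, recall, and F1; the tuple layout is an assumption.

```python
def strict_prf(pred_triples, gold_triples):
    """Strict evaluation: a prediction is correct only if subject entity,
    subject type, relation, object entity, and object type all match exactly.
    Triples are assumed to be hashable tuples such as
    (subject, subject_type, relation, object, object_type)."""
    pred, gold = set(pred_triples), set(gold_triples)
    correct = len(pred & gold)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```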
Experimental results and analysis
To verify the effectiveness of the model, two groups of experiments are designed in this paper: validation experiments and hyperparameter comparison experiments. Five baselines were selected for the validation experiments, and the hyperparameter comparison experiments vary the mapping strategy, window length, and other hyperparameters of the model.
Validation experiments
The comparative baseline models used in this paper are: (a) Novel-Tagging [3], a model that transforms the two tasks of relation extraction and entity extraction into a unified sequence annotation task. (b) GraphRel [13], an end-to-end joint extraction model based on GCNs. (c) CMAN [25], a deep cross-modal attention network model over label information and entity feature information. (d) TP-Linker [14], a single-stage joint extraction model based on token-pair linking. (e) SLM [16], an entity relation extraction model based on span labeling.
All the above models use the same BERT pre-training model for token encoding of sentences. The experimental results are shown in Table 3.
Table 3. Comparison of experimental results.
| Model | NYT Normal | NYT SEO | NYT EPO | NYT Prec. | NYT Rec. | NYT F1 | DuIE Normal | DuIE SEO | DuIE EPO | DuIE Prec. | DuIE Rec. | DuIE F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Novel-Tagging | - | - | - | 32.8 | 30.6 | 31.7 | 54.0 | 42.1 | 31.5 | 75.8 | 36.7 | 49.4 |
| GraphRel | 69.6 | 51.2 | 58.2 | 63.9 | 60.0 | 61.9 | - | - | - | 52.2 | 25.8 | 34.5 |
| CMAN | - | - | - | - | - | - | 74.5 | 69.3 | 66.1 | 77.3 | 68.1 | 72.4 |
| TP-Linker | 90.1 | 93.4 | 94.0 | 91.4 | 92.6 | 92.0 | 74.8 | 73.1 | 73.0 | 78.4 | 70.5 | 74.2 |
| SLM | 81.3 | 72.6 | 75.4 | 86.1 | 75.1 | 80.2 | 68.1 | 63.4 | 68.2 | 71.3 | 61.1 | 65.8 |
| SMHS | 89.2 | 87.3 | 88.4 | 97.5 | 82.0 | 89.1 | 77.3 | 77.0 | 76.6 | 80.9 | 73.8 | 77.2 |
The default hyperparameter values are: mapping = start, window_length = 6 (NYT) and 15 (DuIE), α:β = 3:1.
Table 3 shows that the proposed SMHS method achieves state-of-the-art precision, recall, and F1 scores. The first four baseline models all decode from token vectors and suffer from unclear entity boundary recognition. Among them, Novel-Tagging cannot overcome overlapping and nested entities, resulting in a low recall; GraphRel ignores the correlation between the two tasks, and the predicted relations between entity pairs composed of multiple words may conflict, so its performance is poor; although CMAN combines label information in a cross-modal way, it does not fundamentally solve overlapping entities and exposure bias, so its F1 on the DuIE dataset is 5.2% lower than that of our model. Compared with TP-Linker, another single-stage decoding model, our model performs slightly worse on NYT because we cannot set a window long enough to cover all entities during training; on the DuIE dataset, however, we achieve full coverage of entities, and precision, recall, and F1 increase by 2.5%, 3.3%, and 3.0%, respectively. This is because TP-Linker classifies based on a semantic vector formed from only the head and tail tokens, so the other tokens within a span do not participate, resulting in insufficient fusion of semantic information and no use of type information. Although SLM uses a span-labeling method, entities extracted under the same relation label are matched by nearest matching, so its F1 on the two datasets is 8.9% and 11.4% lower than that of our model, and its SEO scores are 13.0% and 13.6% lower.
In contrast, the model in this paper incorporates all the token vectors and the sentence vector that constitute each span, fully exploiting token and sentence information, and performs well on SEO and EPO data. The experimental results show that the proposed model can effectively solve the entity overlap and exposure bias problems.
Hyperparameter comparison experiment
This section mainly studies the influence of different span index mapping strategies, different maximum window lengths and different loss function weights on the experimental results. All experiments in this section are conducted under the premise of a fixed learning rate.
(a) Mapping strategies. Table 4 shows that the same-start mapping strategy obtains better results when span tagging is performed.
Table 4. Comparison experiment results of mapping strategy.
| Dataset | Mapping Strategy | Prec. | Rec. | F1 |
|---|---|---|---|---|
| NYT | Start | 95.7 | 80.5 | 87.5 |
| NYT | End | 94.7 | 81.2 | 87.4 |
| NYT | Length | 89.1 | 74.9 | 81.4 |
| DuIE | Start | 78.9 | 73.0 | 75.8 |
| DuIE | End | 80.3 | 69.3 | 74.4 |
| DuIE | Length | 80.4 | 59.0 | 68.1 |
Other hyperparameter values: window_length = 6 (NYT) and 15 (DuIE), α:β = 1:1.
(b) Window length. The maximum entity lengths in the datasets are given in Table 2. As shown in Table 5, when the window length is smaller than the maximum entity length, the span-tagger cannot extract all entities, so the results are affected. When the window length exceeds the entity length, the model covers all entities in the sample, and further increasing the window length increases the model's score.
Table 5. Comparison experiment results of window length.
| Dataset | Window Length | Number of Entities | Coverage of Entities | Prec. | Rec. | F1 |
|---|---|---|---|---|---|---|
| NYT | 3 | 200814 | 0.9687 | 96.5 | 71.1 | 81.9 |
| NYT | 6 | 207206 | 0.9995 | 95.7 | 80.5 | 87.5 |
| NYT | 9 | 207292 | 1 | - | - | - |
| DuIE | 13 | 225130 | 0.9903 | 77.7 | 71.6 | 74.5 |
| DuIE | 15 | 226614 | 0.9968 | 78.9 | 73.0 | 75.8 |
| DuIE | 17 | 227332 | 1 | 78.9 | 73.9 | 76.3 |
Other hyperparameter values: mapping = start, α:β=1:1.
(c) α and β. Table 6 shows that if the weight of relation extraction in the loss function is appropriately increased and the weight of span classification is appropriately reduced, the model achieves higher scores.
Table 6. Comparison experiment results of loss weights α and β.
| Dataset | Ratio of α to β | Prec. | Rec. | F1 |
|---|---|---|---|---|
| NYT | 1:1 | 95.7 | 80.5 | 87.5 |
| NYT | 3:1 | 97.5 | 82.0 | 89.1 |
| NYT | 1:3 | 97.3 | 76.6 | 85.7 |
| DuIE | 1:1 | 78.9 | 73.0 | 75.8 |
| DuIE | 3:1 | 80.9 | 73.8 | 77.2 |
| DuIE | 1:3 | 81.9 | 69.5 | 75.1 |
Other hyperparameter values: mapping = start, window_length = 6(NYT) and 15(DuIE).
Ablation experiments
To further explore the influence of the design choices, ablation experiments were conducted: removing the span classification task (Span-Classify), removing the multi-head selection mechanism, and using a shared token representation instead of the span representation. The results are shown in Table 7.
Table 7. Results of ablation experiments.
| Dataset | Research Content | Prec. | Rec. | F1 |
|---|---|---|---|---|
| NYT | Remove Span-Classify | 92.3 | 77.1 | 84.0 |
| NYT | Remove Multi-head Selection | 86.1 | 75.1 | 80.2 |
| NYT | Use Shared-Token | 95.3 | 78.6 | 86.1 |
| NYT | SMHS | 97.5 | 82.0 | 89.1 |
| DuIE | Remove Span-Classify | 77.0 | 69.4 | 73.0 |
| DuIE | Remove Multi-head Selection | 71.3 | 61.1 | 65.8 |
| DuIE | Use Shared-Token | 79.1 | 70.7 | 74.7 |
| DuIE | SMHS | 80.9 | 73.8 | 77.2 |
The default hyperparameter values are: mapping = start, window_length = 6(NYT) and 15(DuIE), α:β=3:1.
The results show that removing Span-Classify decreases the F1 value, indicating that the span type information introduced by the span type classification task is effective for the relation extraction model: each relation can only be formed by spans of specific types, and the span classification task introduces this type information. Removing Multi-head Selection also decreases the F1 value, which indicates that the three-dimensional matrix design is effective and strengthens the connection between the elements of a triple. Using Shared-Token instead of the span representation decreases the F1 value, indicating that directly sharing span vectors is more effective. The ablation experiments show that all modules designed to improve relation extraction in the model are effective.
Conclusion
In this paper, a new span-based multi-head selection model is proposed. It constructs span vector representations via a span-tagger and span embedding, uses an LSTM and multi-head self-attention for span feature extraction, uses a multi-head selection mechanism to achieve single-step relation extraction, and introduces span classification as an auxiliary task that provides span type information and span type constraints for relation extraction. Compared with various baseline models, it obtains the best F1 scores on the English relation extraction dataset NYT and the Chinese relation extraction dataset DuIE 2.0. Although the method effectively addresses entity overlap, error accumulation, and exposure bias, there is still room for exploration: span embedding consumes a large amount of memory, relation selection has high computational complexity, which makes long texts impossible to model, the construction of span vector representations is somewhat coarse, and the way span type information and constraints are introduced is indirect. Unfortunately, due to constraints of the experimental environment, we could not determine the window length at which the model's performance peaks. In other words, there is still room for improvement in our model. Future work will focus on constructing span vectors in a more refined and efficient way, so as to reduce time and space complexity without sacrificing accuracy, and on introducing type information and type constraints more directly and explicitly.
Supporting information
(ZIP)
Data Availability
All relevant data are within the paper and its Supporting information files.
Funding Statement
This study was supported by the "13th Five-Year Plan" Science and Technology Project (JJKH20200677KJ) of the Department of Education of Jilin Province, China.
References
- 1. Bach N, Badaskar S. A review of relation extraction. Literature review for Language and Statistics II. 2007;2:1–15.
- 2. Gupta P, Schütze H, Andrassy B. Table Filling Multi-Task Recurrent Neural Network for Joint Entity and Relation Extraction. In: Proceedings of the 26th International Conference on Computational Linguistics (COLING 2016); 2016.
- 3. Zheng S, Wang F, Bao H, Hao Y, Zhou P, Xu B. Joint extraction of entities and relations based on a novel tagging scheme. arXiv preprint arXiv:1706.05075. 2017.
- 4. Dai D, Xiao X, Lyu Y, Dou S, She Q, Wang H. Joint extraction of entities and overlapping relations using position-attentive sequence labeling. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 33; 2019. p. 6300–6308.
- 5. Zeng D, Zhang H, Liu Q. CopyMTL: Copy Mechanism for Joint Extraction of Entities and Relations with Multi-Task Learning. Proceedings of the AAAI Conference on Artificial Intelligence. 2020;34(5):9507–9514. doi: 10.1609/aaai.v34i05.6495
- 6. Zhang RH, Liu Q, Fan AX, Ji H, Kurohashi S. Minimize Exposure Bias of Seq2Seq Models in Joint Entity and Relation Extraction. 2020.
- 7. Wei Z, Su J, Wang Y, Tian Y, Chang Y. A novel cascade binary tagging framework for relational triple extraction. arXiv preprint arXiv:1909.03227. 2019.
- 8. Miwa M, Bansal M. End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); 2016.
- 9. Bekoulis G, Deleu J, Demeester T, Develder C. Joint entity recognition and relation extraction as a multi-head selection problem. Expert Systems with Applications. 2018;114:34–45.
- 10. Katiyar A, Cardie C. Going out on a limb: Joint extraction of entity mentions and relations without dependency trees. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); 2017. p. 917–928.
- 11. Li X, Yin F, Sun Z, Li X, Yuan A, Chai D, et al. Entity-Relation Extraction as Multi-Turn Question Answering; 2019.
- 12. Dixit K, Al-Onaizan Y. Span-level model for relation extraction. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics; 2019. p. 5308–5314.
- 13. Fu TJ, Ma WY. GraphRel: Modeling Text as Relational Graphs for Joint Entity and Relation Extraction. In: ACL; 2019.
- 14. Wang Y, Yu B, Zhang Y, Liu T, Sun L. TPLinker: Single-stage Joint Extraction of Entities and Relations Through Token Pair Linking. 2020.
- 15. Sui D, Chen Y, Liu K, Zhao J, Liu S. Joint Entity and Relation Extraction with Set Prediction Networks. 2020.
- 16. Zheng Z, Han D, Zhao H. Joint Extraction of Entities and Relations Model for Single-Step Span-Labeling. Computer Engineering and Applications. 2022; p. 1–11.
- 17. Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2018.
- 18. Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. CoRR. 2018;abs/1810.04805.
- 19. Cui Y, Che W, Liu T, Qin B, Yang Z, Wang S, et al. Pre-Training with Whole Word Masking for Chinese BERT. arXiv preprint arXiv:1906.08101. 2019.
- 20. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation. 1997.
- 21. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. In: Advances in Neural Information Processing Systems; 2017. p. 5998–6008.
- 22. Peng H, Gao T, Han X, Lin Y, Li P, Liu Z, et al. Learning from context or names? An empirical study on neural relation extraction. arXiv preprint arXiv:2010.01923. 2020.
- 23. Caruana R. Multitask Learning. Machine Learning. 1997.
- 24. Kingma D, Ba J. Adam: A Method for Stochastic Optimization. Computer Science. 2014.
- 25. Zhao S, Hu M, Cai Z, Liu F. Modeling Dense Cross-Modal Interactions for Joint Entity-Relation Extraction. In: IJCAI; 2020.



