. 2022 Jul 22;17(7):e0268278. doi: 10.1371/journal.pone.0268278

Improving extractive document summarization with sentence centrality

Shuai Gong 1, Zhenfang Zhu 1,*, Jiangtao Qi 1, Chunling Tong 1, Qiang Lu 2, Wenqing Wu 1
Editor: Sanda Martinčić-Ipšić3
PMCID: PMC9307201  PMID: 35867732

Abstract

Extractive document summarization (EDS) is usually treated as a sequence labeling task, which extracts sentences from a document one by one to form a summary. However, extracting sentences separately ignores the relationship between sentences and the document. One solution is to use sentence position information to enhance sentence representation, but this causes the sentence-leading bias problem, especially in news datasets. In this paper, we propose a novel sentence centrality for the EDS task to address these two problems. The sentence centrality is based on directed graphs and reflects both the sentence-document relationship and the position of the sentence in the document. We implicitly strengthen the relevance of sentences to documents by using sentence centrality to enhance sentence representation. Notably, we replace the sentence position information with sentence centrality to reduce sentence-leading bias without degrading model performance. Experiments on the CNN/Daily Mail dataset show that EDS models with sentence centrality improve significantly over baseline models.

Introduction

Automatic document summarization aims to produce a concise summary of a document while preserving its crucial information. Existing summarization methods can be divided into two categories: abstractive and extractive methods. Abstractive methods generate a summary word by word from scratch, and can introduce new words that do not appear in the document [1]. Extractive methods, on the other hand, form a summary by selecting text fragments from the original document. Compared with abstractive methods, extractive methods are inclined to generate semantically and grammatically correct sentences [2, 3].

In recent years, extractive document summarization (EDS) based on neural networks has achieved great success [4–6]. However, it faces a challenge in modeling the sentence-document hierarchical structure. Previous approaches to this problem fall into two categories: (1) constructing hierarchical structures to represent documents and sentences separately; (2) using certain sentence-document information to enhance the representation of sentences.

There is much excellent work based on the first approach. For example, Zhang et al. [7] proposed a hierarchical Transformer [8] called HIBERT to strengthen the relationship between sentences and documents. Xu et al. [9] applied the self-attention scores in the sentence-level Transformer to measure the importance of sentences. Jia et al. [10] employed a hierarchical attention mechanism to establish inter-sentence relations. Although hierarchical models effectively capture the sentence-document relationship, their complex architectures and heavy computational requirements limit their practical use. The second approach usually uses the position of the sentence in the document to enhance the sentence representation. This method is simple and effective but causes the sentence-leading bias problem: the extractive summarizer tends to select the leading sentences of the document. Sentence-leading bias makes the model rely excessively on sentence position information rather than semantic information when selecting sentences [11, 12]. In this paper, we replace the sentence position information with sentence centrality information.

Sentence centrality is usually based on undirected graphs and is widely used in unsupervised extractive summarization to identify salient sentences in a document [13, 14]. In this setting, a document is represented as a graph in which each node is a sentence and edge weights are given by sentence similarity. The centrality of a sentence can then be measured simply by computing its node's degree. This method is illustrated in Fig 1(a): the number in each node represents the position of the sentence in the document, and the node size indicates the sentence centrality score. The centrality of the third sentence depends on all other sentences. Although sentence centrality based on undirected graphs can reflect the relationship between a sentence and the document, it does not include sentence position information, which has been shown to play an essential role in the EDS task [11]. Zheng and Lapata [14] construct directed graphs to compute sentence centrality; their work shows that, for a sentence, similarity with the preceding content damages its centrality. Inspired by their work, we calculate the centrality score of a sentence based only on the similarity between the sentence and its following content. Our approach to calculating sentence centrality is presented in Fig 1(b).

Fig 1. The sentence centrality based on the undirected graph (a) and the directed graph (b).


Figure (a) shows the conventional method for computing sentence centrality, and (b) is our method. In our method, we calculate the centrality of a sentence not from its similarity to all sentences in the document but only from its similarity to the sentences that follow it.

Previous work considered sentence centrality as a signal to measure the importance of sentences [13–15]. Different from their work, we regard sentence centrality as an intrinsic property of the sentence, just like the sentence's position in the document. Therefore, we use sentence centrality to enhance the sentence representation.

Following the intuition mentioned above, sentence centrality is no longer restricted to unsupervised extractive methods. We develop two methods that apply sentence centrality to enhance sentence representations: (1) embedding the sentence centrality directly into the sentence representation output by the encoder; (2) updating the sentence representation indirectly via a Graph Attention Network (GAT) [16]. We build two models to implement these two ideas. We first construct our sentence centrality-enhanced EDS model based on BERT. The model contains a BERT encoder and a summarization-layer classifier to select sentences. Notably, in the summarization layer we use only a simple linear classifier, without additional components such as an Inter-sentence Transformer [17] or an RNN. We then build sentence centrality into the heterogeneous graph extractive summarization model [18]. In the heterogeneous graph neural network, the sentence representation is updated by the attention mechanism; we extend the edge features with sentence centrality and use them to modify the GAT. The experimental results show that the performance of both models improves significantly with sentence centrality. Finally, we analyze the position distribution of sentence centrality in the document and explain why sentence centrality information is practical.

The contributions of our work are as follows:

  1. We propose a novel sentence centrality for EDS task and two approaches to use sentence centrality to enhance sentence representation. With the help of the sentence centrality, the relationship between sentences and documents is implicitly strengthened, thus improving the performance of the extractive summarization.

  2. We propose to replace sentence position information with sentence centrality, which can reduce the sentence-leading bias in the news dataset caused by position information.

The remainder of this article is arranged as follows. We introduce some related topics on the EDS in the section Related Work. In the Method section, we define the EDS and then introduce our sentence centrality-enhanced extractive summarization models. We present the training details, parameter settings and experimental results in the Experiment section. In the Discussion section, we discuss why sentence centrality works. Finally, we conclude our paper in the Conclusion section.

Related work

To make the paper self-contained, we will introduce some related topics on the EDS and the sentence centrality-based summarization methods.

Extractive document summarization

The EDS task aims to extract sentences from the original document to form a summary. An encoder first encodes each sentence to obtain a sentence vector. The sentence vector is then passed through a classification layer to determine whether the sentence should be included in the summary. Nallapati et al. [2] and Zhou et al. [19] choose recurrent neural networks (RNNs) for sentence encoding, while Wang et al. [20] use the Transformer [8]. BERT and other pre-trained language models [21] also perform well in the EDS task. Graph neural networks have also received extensive attention: Yasunaga et al. [22] apply a graph neural network to multi-document summarization, and Wang et al. [18] propose using a heterogeneous graph for the EDS task.

Although these methods are effective, they mostly rely on sentence position information to enhance sentence representation. We introduce sentence centrality information in the model and remove sentence position information, which improves model performance and does not cause sentence-leading bias.

Sentence centrality-based summarization methods

Sentence centrality is often used to measure the importance of a sentence in unsupervised EDS tasks. In this setting, a document is represented as a graph, with nodes representing sentences and edges weighted according to sentence similarity. TextRank [13] calculates similarity from word co-occurrence; LexRank [15] incorporates TF-IDF values into the edge weights; Zheng and Lapata [14] use BERT to measure sentence similarities.

There are three key differences between our sentence centrality and previous methods. (1) We calculate the centrality of a sentence using only the similarity between that sentence and the content that follows it, rather than the similarity to all other content. (2) The centrality of a sentence is treated as an intrinsic property of the sentence and document, rather than merely a measure of sentence importance; we therefore use sentence centrality to strengthen the sentence representation. (3) We apply sentence centrality to supervised EDS.

Sentence embeddings for extractive document summarization

An essential step in the extractive document summarization task is obtaining sentence embeddings. Traditional methods construct sentence vectors by weighting and averaging word vectors. Kedzie et al. [23] average the word embeddings of a sentence to obtain its embedding; this approach treats each word as having the same effect on the sentence and ignores the specificity of particular words. Nallapati et al. [2] apply an RNN to compute the hidden state at each word position sequentially, based on the current word embedding and previous hidden states, and then use the average-pooled, concatenated hidden states as sentence embeddings. Unlike a simple average of word embeddings, Nallapati et al. [2] thereby take word order into account.

Traditional sentence embedding methods are simple and effective. However, extractive document summarization is a document-level task, and the relationship between sentences and documents needs to be considered when obtaining sentence embeddings. Most works [3, 10, 20, 21] strengthen sentence embeddings using the position information of sentences in the document.

Different from their work, we use sentence centrality information to enhance sentence representations. Compared with sentence position information, our methods achieve performance improvements while reducing the sentence-leading bias problem.

Method

We define the problem of EDS as follows. Given a single document d that contains n sentences, d = {s1, s2, …, sn}, where si = {wi1, wi2, …, wim} is the i-th sentence in the document and wij is the j-th word in the i-th sentence. EDS can be seen as a sequence labeling task [5], which means that every sentence in the document is assigned a label yi ∈ {0, 1} to suggest whether the sentence should be included in the summaries.

We introduce sentence centrality into the EDS task. The sentence centrality is used to enhance sentence representation in two ways. One is embedded directly into the sentence representation output by the encoder, and the other is to update the sentence representation indirectly via Graph Attention Network (GAT) [16]. In this section, we will first introduce the computation of the sentence centrality and then present our sentence centrality-enhanced EDS models.

Calculation of sentence centrality

The first step in calculating sentence centrality is to obtain sentence representations. We use BERT [24] as the sentence encoder. BERT is a highly effective model based on deep bidirectional Transformers that has achieved state-of-the-art performance on many NLP tasks. BERT is fine-tuned with contrastive learning, following Gao et al. [25]:

\ell_i = -\log \frac{e^{\mathrm{sim}(h_i, h_i^+)/t}}{\sum_{j=1}^{n} e^{\mathrm{sim}(h_i, h_j^+)/t}}, \quad (1)

where h_i and h_i^+ are different vector representations of the same sentence, \mathrm{sim}(h_i, h_i^+) is the cosine similarity \frac{h_i^\top h_i^+}{\|h_i\| \cdot \|h_i^+\|}, and t is a temperature hyperparameter. We feed each sentence to the encoder twice, applying random dropout, to obtain h_i and h_i^+. After obtaining representations {sv_1, sv_2, \ldots, sv_n} for the sentences {s_1, s_2, \ldots, s_n} in document d, we calculate the centrality of sentence s_i as follows. First, we use paired dot products to compute the similarity matrix E_i for sentence s_i:

E_{ij} = (sv_i)^\top sv_j, \quad (i \le n,\; j > i). \quad (2)

Then, we calculate the centrality of sentence s_i (denoted SC_i) by averaging the elements of E_i:

SC_i = \frac{1}{n-i-1} \sum_{j=i+1}^{n} E_{ij}. \quad (3)

Through Eqs (2) and (3), we obtain the sentence centralities {SC_1, SC_2, \ldots, SC_{n-1}} for the sentences {s_1, s_2, \ldots, s_{n-1}}. Note that the centrality of the last sentence in the document is not calculated this way; instead, we average the other n − 1 sentences' centralities to obtain SC_n. This may seem counter-intuitive, since the last sentence should intuitively summarize the article and therefore receive a high centrality score. In fact, because of the particularity of news datasets, the last sentence carries less information than one might expect [20]. So far, we have obtained the centrality of all sentences in one document: SC_d = {SC_1, SC_2, \ldots, SC_n}. We normalize SC_d as follows:

\widetilde{SC}_i = \frac{SC_i - \mathrm{Min}(SC_d)}{\mathrm{Max}(SC_d) - \mathrm{Min}(SC_d)}. \quad (4)

The centrality of all sentences in a document is ultimately expressed as:

SC_d = \{\widetilde{SC}_1, \widetilde{SC}_2, \ldots, \widetilde{SC}_n\}. \quad (5)
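The centrality computation of Eqs (2)–(4) is simple enough to sketch directly. The following is an illustrative numpy sketch, not the authors' released code; it uses a plain mean over the following sentences as the averaging step and assumes the sentence vectors have already been produced by the fine-tuned BERT encoder:

```python
import numpy as np

def sentence_centrality(sv):
    """Directed-graph sentence centrality, in the spirit of Eqs (2)-(4).

    sv: (n, d) array of sentence vectors for one document, n >= 2.
    Returns the min-max normalized centrality scores, shape (n,).
    """
    n = sv.shape[0]
    E = sv @ sv.T                      # pairwise dot products, Eq (2)
    sc = np.empty(n)
    for i in range(n - 1):
        # Eq (3): average similarity of s_i to the sentences that FOLLOW it.
        sc[i] = E[i, i + 1:].mean()
    # The last sentence has no following content, so its centrality is
    # taken as the mean of the other sentences' centralities.
    sc[n - 1] = sc[:n - 1].mean()
    # Eq (4): min-max normalization over the document.
    return (sc - sc.min()) / (sc.max() - sc.min())
```

On a toy document, the sentence most similar to its following content receives score 1 and the least central sentence receives score 0 after normalization.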

Sentence Centrality-enhanced EDS models

We build two models to implement the two previously mentioned methods for enhancing sentence representations. The first model is based on BERT and realizes the first approach: directly embedding sentence centrality into the sentence representation. The second model is based on the heterogeneous graph neural network EDS model (HSG) of Wang et al. [18] and realizes the second approach: modifying the attention mechanism with sentence centrality and thereby indirectly enhancing the sentence representation through attention. The second approach demonstrates the idea that sentence centrality is a special property of sentences, because the sentence representation is updated according to its centrality.

Sentence centrality-enhanced EDS model based on BERT

We first build our sentence centrality-enhanced EDS model based on BERT. An overview of this model is presented in Fig 2.

Fig 2. Sentence centrality-enhanced EDS model based on BERT.


EmbSCi is the centrality embedding of sentence si, which is directly embedded in the sentence representation generated by BERT. The sentence position embedding is replaced by sentence centrality embedding.

The model contains a BERT encoder and a summarization layer classifier. BERT is applied to obtain a contextual representation of each word for each sentence in the input document:

[u_{11}, u_{12}, \ldots, u_{nm}] = \mathrm{BERT}([w_{11}, w_{12}, \ldots, w_{nm}]), \quad (6)

where u_{ij} is the contextual representation of w_{ij}. The sentence representation is obtained by weighted pooling:

a_{ij} = W (u_{ij})^\top, \quad (7)
sv_i = \frac{1}{n} \sum_{j=1}^{n} a_{ij} u_{ij}. \quad (8)

In this way, we obtain the vector representation for each sentence in the document. Then, we obtain the sentence centrality embedding (EmbSCi) by mapping the normalized scalar sentence centrality to the multi-dimensional embedding space:

EmbSC_i = W_{sc} \widetilde{SC}_i, \quad (9)

where Wsc is a weight matrix with the weights set to 1. EmbSCi is the centrality embedding of sentence si, which has the same dimension as the sentence embedding. The final vector representation of sentence si in the document is represented as:

h_i = sv_i + EmbSC_i, \quad (10)

where svi is the vector representation of the sentence si output by BERT.

In the summarization layer, we only use a simple classifier and do not use other methods such as Inter-sentence Transformer [17], RNN [2] to enhance the model. The simple classifier only adds a linear layer on the final sentence vector representation and use a sigmoid function to get the predicted score:

\hat{Y}_i = \sigma(W_0 h_i + b), \quad (11)

where \sigma is the sigmoid function, W_0 is a trainable weight matrix, and b is a bias term. The loss of the model is the binary cross-entropy of the prediction \hat{Y}_i against the gold label Y_i.
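Putting Eqs (7)–(11) together, the whole summarization head can be sketched in a few lines. This is a hypothetical numpy sketch: the weight names and shapes (W, W_sc, W0, b) stand in for the learned parameters and are not the authors' implementation:

```python
import numpy as np

def score_sentences(U, sc_norm, W, W_sc, W0, b):
    """Sentence scoring head in the spirit of Eqs (7)-(11).

    U: list of (m_i, d) arrays of contextual word vectors per sentence
       (the BERT outputs of Eq (6)); sc_norm: (n,) normalized centralities;
    W, W_sc, W0: (d,) weight vectors (hypothetical shapes); b: scalar bias.
    """
    scores = []
    for i, u in enumerate(U):
        a = u @ W                            # Eq (7): per-word pooling weights
        sv = (a[:, None] * u).mean(axis=0)   # Eq (8): weighted pooling
        emb_sc = W_sc * sc_norm[i]           # Eq (9): centrality embedding
        h = sv + emb_sc                      # Eq (10): centrality-enhanced vector
        # Eq (11): a single linear layer plus sigmoid gives the selection score.
        scores.append(1.0 / (1.0 + np.exp(-(h @ W0 + b))))
    return np.array(scores)
```

At inference time, the highest-scoring sentences are selected to form the summary.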

Sentence centrality-enhanced EDS model based on HSG

Heterogeneous summarization graph (HSG) [18] is an extractive summarization model based on heterogeneous graph neural networks that achieves the best performance among architectures without pre-trained contextualized encoders. The model contains two kinds of nodes: word nodes and sentence nodes. Word nodes are linked to the sentence nodes containing them, with edges weighted by the words' TF-IDF values. We build sentence centrality on this model for two reasons: (1) it verifies that sentence centrality is equally valid in an architecture without a pre-trained contextualized encoder; (2) it serves the purpose of indirectly enhancing the sentence representation by modifying the attention mechanism.

Our modified HSG model is presented in Fig 3. The word embeddings are obtained by a word encoder; here, we use 300-dimensional GloVe [26] embeddings for each word in the sentence. We first use Convolutional Neural Networks (CNN) [27] with different kernel sizes to capture local n-gram features for each sentence, and then obtain sentence-level features using a bidirectional Long Short-Term Memory network (BiLSTM) [28]. We use graph attention networks (GAT) [16] to update the representations of the semantic nodes. The GAT layer is modified by infusing the scalar edge weights e_{ij}, which are mapped to a multi-dimensional embedding space. The weight of edge e_{ij} is the sum of the sentence centrality and the TF-IDF value of the word, because the edge connects nodes of different types. The modified GAT layer is designed as follows:

e_{ij} = \mathrm{TFIDF}_i + \widetilde{SC}_j, \quad (12)
z_{ij} = \mathrm{LeakyReLU}\left(W_a [W_q h_i; W_k h_j; e_{ij}]\right), \quad (13)
\alpha_{ij} = \frac{\exp(z_{ij})}{\sum_{l \in N_i} \exp(z_{il})}, \quad (14)
u_i = \sigma\left(\sum_{j \in N_i} \alpha_{ij} W_v h_j\right), \quad (15)

where h_i is the hidden state of the input node and \alpha_{ij} is the attention weight between h_i and h_j. A residual connection is used to avoid vanishing gradients after several iterations. The final sentence representation is:

h_i' = h_i + u_i. \quad (16)
Fig 3. Sentence centrality-enhanced EDS model based on HSG.


Given a constructed graph G with word node features X_w and sentence node features X_s, the sentence nodes are updated from their neighboring word nodes via the GAT above and a feed-forward (FFN) layer:

U_s^1 = \mathrm{GAT}(H_s^0, H_w^0, H_w^0), \quad (17)
H_s^1 = \mathrm{FFN}(U_s^1 + H_s^0), \quad (18)

where H_w^0 = X_w, H_s^0 = X_s, and \mathrm{GAT}(H_s^0, H_w^0, H_w^0) denotes that H_s^0 is used as the attention query while H_w^0 is used as the key and value. The updated sentence representations are then fed into the sentence selector module, where we perform node classification for sentences and use the cross-entropy loss as the training objective for the whole system.
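One centrality-modified GAT update (Eqs (12)–(16)) can be sketched as follows. For simplicity, this numpy sketch keeps the edge feature e_ij as a scalar appended to the attention input rather than mapping it to a multi-dimensional embedding, and all weight shapes are assumptions rather than the authors' configuration:

```python
import numpy as np

def modified_gat_update(Hs, Hw, tfidf, sc_norm, Wq, Wk, Wv, Wa):
    """Centrality-modified GAT layer updating sentence nodes from word nodes.

    Hs: (S, d) sentence-node states (queries); Hw: (W, d) word-node states
    (keys/values); tfidf: (S, W) TF-IDF edge weights; sc_norm: (S,)
    normalized sentence centralities. Hypothetical weight shapes:
    Wq, Wk, Wv: (d, d); Wa: (2d + 1,), whose last slot takes the edge feature.
    """
    S, W = tfidf.shape
    q, k, v = Hs @ Wq, Hw @ Wk, Hw @ Wv
    e = tfidf + sc_norm[:, None]            # Eq (12): TF-IDF + centrality
    z = np.empty((S, W))
    for i in range(S):
        for j in range(W):
            # Eq (13): LeakyReLU over [query; key; edge feature].
            x = np.concatenate([q[i], k[j], [e[i, j]]]) @ Wa
            z[i, j] = x if x > 0 else 0.2 * x
    alpha = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)   # Eq (14)
    u = 1.0 / (1.0 + np.exp(-(alpha @ v)))                     # Eq (15)
    return Hs + u                           # Eq (16): residual connection
```

In the full model this update runs with multiple attention heads and alternates with the FFN layer of Eq (18).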

Experiment

Dataset

We conduct our experiments on the CNN/Daily Mail [29] and XSum [30] datasets.

CNN/Daily Mail is a well-known news dataset for single document extractive summarization, which is split into three parts by Hermann et al. [24] for training, validation, and testing. The splits contain 90,266/1,220/1,093 CNN documents and 196,961/12,148/10,397 Daily Mail documents. We process the dataset by the Stanford CoreNLP toolkit [31] following methods in See et al. [32].

XSum is a one-sentence summary dataset designed to answer the question "What is the article about?". We conduct experiments on this dataset to study whether sentence centrality-enhanced EDS models remain effective on datasets with short summaries.

We use the XSum dataset only for ablation experiments, as published extractive results on this dataset are scarce and insufficient to support a full model comparison.

Implementation details

We limit the sentence length to 50 words when calculating sentence centrality. Both models are trained on a single GPU (GeForce RTX 3080).

Sentence centrality-enhanced EDS model based on BERT

The model is implemented with the 'bert-base-uncased' version of BERT, available at https://github.com/huggingface/pytorch-pretrained-BERT. The model is trained for 40,000 steps; the best result on the validation set occurs at step 37,000. The Adam algorithm is applied to optimize the loss function. The learning rate schedule follows Vaswani et al. [33], with warm-up over the first 10,000 steps:

lr = 2e^{-3} \cdot \min\left(\mathrm{step}^{-0.5},\; \mathrm{step} \cdot \mathrm{warmup}^{-1.5}\right) \quad (19)

We score the sentences and then select the top-3 sentences with the highest scores as the summaries.

Sentence centrality-enhanced EDS model based on HSG

The word nodes are initialized with d_emb = 300, while sentence nodes use d_s = 128. The edge features e_{ij} have dimension 50. Each GAT layer has eight heads with hidden state d_h = 64. We train the model with a batch size of 32 for 20 epochs and use the Adam algorithm [34] to optimize the loss function with a learning rate of 5e−4. In the decoding stage, we choose the three sentences with the highest scores as the document summary.

Trigram blocking

In the prediction phase of both models, we use Trigram Blocking [35] for decoding, a simple and practical approach to reducing redundancy. When selecting sentences to form a summary, it skips sentences that have trigram overlap with previously selected sentences. Surprisingly, this simple de-duplication method brings a remarkable performance improvement.
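Trigram Blocking can be sketched in a few lines of plain Python; the function name and whitespace tokenization here are our own simplifications of the decoding step:

```python
def trigram_blocking(ranked_sentences, k=3):
    """Greedily select up to k sentences, skipping any sentence whose
    trigrams overlap a previously selected one.

    ranked_sentences: sentence strings sorted by predicted score, best first.
    """
    def trigrams(sent):
        toks = sent.lower().split()
        return {tuple(toks[i:i + 3]) for i in range(len(toks) - 2)}

    summary, seen = [], set()
    for sent in ranked_sentences:
        tri = trigrams(sent)
        if tri & seen:                  # trigram overlap -> redundant, skip
            continue
        summary.append(sent)
        seen |= tri
        if len(summary) == k:
            break
    return summary
```

A sentence that repeats any trigram of an already-chosen sentence is dropped, so near-duplicates never co-occur in the summary.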

Baselines and comparisons

We compare our models with the following solid baselines for text summarization:

  • LEAD-3: The method takes the first three sentences of the document as a summary.

  • HSG [18]: An extractive method based on the heterogeneous graph neural network. This method constructs the document as a heterogeneous graph. The graph contains two different types of nodes: sentence nodes and word nodes. Information can be passed between the nodes.

  • JECS [36]: A hybrid method. The method firstly selects sentences and then compresses each sentence by removing unnecessary words.

  • LSTMPN [37]: An extractive model based on LSTM and pointer network.

  • LongformerExt [38]: An extractive model based on Long Transformer. This method enables the complete input of sentences and documents to the encoder.

  • BERTSUMEXT [17]: A method based on the pretrained model BERT. The model encodes sentences by BERT and uses Inter-sentence Transformer to capture the document-level information further.

  • PNBERT [39]: An extractive model based on BERT and pointer network.

  • BERTRL [39]: The method encodes sentences by BERT and uses reinforcement learning to solve the problem of inconsistency between training and evaluation objectives.

  • HIBERTM [7]: An extractive model based on BERT. The model proposed a hierarchical transformer to strengthen the relationship between sentences and documents.

Results

We test our models on the CNN/Daily Mail dataset. ROUGE [40] scores measure summarization quality; the definition of the ROUGE scores is presented in S1 Appendix. The results of our BERT-based EDS model are presented in Table 1. The experimental results show a slight performance improvement of our sentence centrality-enhanced EDS model over BERTSUMEXT. BERTSUMEXT uses an Inter-sentence Transformer to strengthen the sentence-document relationship, while we use only sentence centrality, which suggests that sentence centrality is effective in strengthening the relationship between sentences and documents.

Table 1. The results of sentence centrality-enhanced EDS model based on BERT.

Model ROUGE-1 ROUGE-2 ROUGE-L
LEAD-3 40.34 17.70 36.57
PNBERT 42.69 19.60 38.85
BERTRL 42.76 19.87 39.11
HIBERTM 42.37 19.95 38.83
Longformer-Ext 43.00 20.20 39.30
BERTSUMEXT 43.25 20.24 39.63
SCBERT 43.32 20.30 39.72

ROUGE scores measure the summarization quality. ROUGE-1, ROUGE-2, ROUGE-L are used for reporting the unigram, bigram, and longest common subsequence overlap with reference summaries. The first part presents the LEAD-3 baseline model. The second block shows the results of sentence-level extractors for comparison. SCBERT is our sentence centrality-enhanced EDS model based on BERT.

The experimental results of our model based on the heterogeneous graph neural network are shown in Table 2. The results show that our model outperforms all models without pre-trained encoders on ROUGE-1, ROUGE-2, and ROUGE-L.

Table 2. The results of sentence centrality-enhanced EDS model based on HSG.

Model ROUGE-1 ROUGE-2 ROUGE-L
LEAD-3 40.34 17.70 36.57
JECS 41.70 18.50 37.90
LSTM+PN 41.85 18.93 38.13
HSG 42.31 19.51 38.74
HSG + Tri-Blocking 42.95 19.76 39.23
SCHSG 43.01 19.98 39.39

The second block in the table shows the EDS models without the pre-trained encoder for comparison. The third block highlights the results of our model.

Ablation study

We performed ablation experiments to discuss the effects of the sentence centrality and sentence position on model performance. Experiments are conducted on CNN/Daily Mail and XSum. The models are presented as follows.

  • SCES: Extractive summarizer with the sentence centrality. We build our extractive summarization model based on BERT. We discard the position embedding of sentences, and the sentence centrality embedding is applied instead.

  • SCPES: Extractive summarizer with both sentence centrality and position information. In this model, we do not discard the sentence position information; it is embedded into the sentence representation together with the centrality information.

  • POSES: Extractive summarizer with the sentence position information. In this model, we use only the sentence position information to enhance the sentence representation.

All the models in Table 3 are based on the pre-trained language model BERT, where the SCES model is exactly our SCBERT in Table 1. The various configurations of experimental parameters for the SCES model are the same as for the SCBERT model, except that the datasets used are different.

Table 3. Performance difference caused by different sentence information.

Model ROUGE-1 ROUGE-2 ROUGE-L
CNN/Daily Mail
POSES 43.23 20.23 39.60
SCPES 43.27 20.24 39.68
SCES 43.32 20.30 39.72
XSum
POSES 23.67 4.60 17.89
SCPES 23.72 4.62 17.92
SCES 23.76 4.62 17.93

SCES is an extractive summarizer with the sentence centrality, SCPES is an extractive summarizer with the sentence and position information, POSES is an extractive summarizer with the sentence position information.

Table 3 shows the performance difference caused by the sentence position information and the sentence centrality. We can see that SCES performs well on news datasets CNN/Daily Mail and XSum. Combining the advantages of sentence centrality in reducing sentence-leading bias (we discuss it in the section Discussion) and experimental results, we can conclude that sentence centrality may be a better choice than sentence position information in the EDS task.

Discussion

We argue that the effectiveness of sentence centrality is dataset-dependent. In news datasets, sentence position information can cause sentence-leading bias, which limits model performance. This problem is mitigated when sentence centrality replaces sentence position information.

We conduct an analysis driven by one question: why is sentence centrality a better choice than sentence position information in EDS tasks, especially on news datasets? According to the definition of sentence centrality, sentences with higher centrality are more relevant to the document. Based on this, we calculated the distribution of the top-3 sentence centrality scores across positions in the document.

Fig 4 shows the distribution of the top-3 sentence centrality scores across positions. Sentences with high centrality scores tend to be located near the front of the document, especially among the first three sentences, which explains why the Lead-3 baseline is so strong. Sentence position information is a simplification of centrality, because it cannot recognize the importance of sentences that have high centrality scores but are located far from the first sentence. Another significant disadvantage of positional information is that it is only valid on particular datasets, such as news datasets.

Fig 4. The distribution of top-3 sentence centrality scores in different positions.


Sentences with high centrality scores tend to be located in front of the document.

Fig 5 shows the proportion of sentences extracted by different models in different positions in the test set. We used a greedy algorithm that is similar to Nallapati et al. [2] to obtain an ORACLE summary for each document. The algorithm generates an ORACLE consisting of multiple sentences by maximizing the ROUGE-2 score against the gold summary. For the sentences in the document, the ones in ORACLE will be marked with the label 1, and the others will be marked with the label 0. ORACLE summary is often used to train extractive models in extractive summarization task, because it represents the extraction upper bound. For comparison, we constructed the sentence centrality on the BERTSUM model with the sentence position information removed. Experimental results show that our model reduces the number of sentences in the front position and increases the number of sentences in the back position when forming summaries. A reason is that sentences at the front of the document but with lower centrality have a reduced impact on the model. Compared to models that use sentence position information, our model’s outputs are more similar to ORACLE summaries.
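The greedy ORACLE labeling described above can be sketched as follows. Here a simple bigram-recall proxy replaces the full ROUGE-2 implementation, so this is an illustration of the greedy scheme rather than the exact algorithm of Nallapati et al. [2]:

```python
def greedy_oracle(doc_sents, gold_summary, max_sents=3):
    """Greedily pick the sentences that most improve a bigram-recall proxy
    for ROUGE-2 against the gold summary; return 0/1 labels per sentence.
    """
    def bigrams(text):
        toks = text.lower().split()
        return {tuple(toks[i:i + 2]) for i in range(len(toks) - 1)}

    gold = bigrams(gold_summary)

    def recall(selected):
        covered = set()
        for i in selected:
            covered |= bigrams(doc_sents[i])
        return len(covered & gold) / max(len(gold), 1)

    selected = []
    while len(selected) < max_sents:
        base = recall(selected)
        best_i, best_gain = None, 0.0
        for i in range(len(doc_sents)):
            if i not in selected:
                gain = recall(selected + [i]) - base
                if gain > best_gain:
                    best_i, best_gain = i, gain
        if best_i is None:              # no sentence improves the score
            break
        selected.append(best_i)
    return [1 if i in selected else 0 for i in range(len(doc_sents))]
```

Sentences labeled 1 form the ORACLE summary used as the training signal; selection stops early once no remaining sentence improves the score.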

Fig 5. Proportion of sentences extracted by different models in different positions.


BERTSUM is the BERT-based extractive summarization model with the sentence position. Oracle is the summary generated by the greedy algorithm.

Conclusion

In this paper, we presented how sentence centrality can be usefully applied in two ways for improving extractive summarization performance. We introduced a novel way to calculate sentence centrality and proposed two approaches to applying sentence centrality to enhance sentence representation: (1) directly embedding sentence centrality into the sentence representation; (2) modifying the attention mechanism through sentence centrality. We revealed that the positional information of a sentence can be replaced by its centrality without introducing sentence-leading bias. In future work, we will continue to explore three points about sentence centrality. First, the way we map scalar sentence centrality to a multi-dimensional space is straightforward. How to effectively model sentence centrality is worth exploring. Second, we will explore whether sentence centrality is also practical in other tasks, such as sentiment analysis, automatic question answering, etc. Finally, it would be useful to know how the proposed model performs with other similar node-local measures, such as the selectivity measure, which is also one of our future works.

Supporting information

S1 Appendix. Definition of ROUGE scores.

(PDF)

Acknowledgments

We would like to thank Professor Zhenfang Zhu for his guidance and support, who is also the corresponding author of the manuscript. We thank our NLP group for helpful discussion and valuable feedback on our paper. We also thank the reviewers for their patient and constructive review.

Data Availability

All relevant data are within the article and its Supporting information files.

Funding Statement

This study was funded by a grant from the National Social Science Fund of China (19BYY076) to ZZ.

References

  • 1. Chan HP, King I. A condense-then-select strategy for text summarization. Knowl-Based Syst. 2021;227: 107235. doi: 10.1016/j.knosys.2021.107235
  • 2. Nallapati R, Zhai F, Zhou B. SummaRuNNer: A recurrent neural network based sequence model for extractive summarization of documents. Proc AAAI Conf Artif Intell. 2017;31. Available: https://ojs.aaai.org/index.php/AAAI/article/view/10958
  • 3. Dong Y, Shen Y, Crawford E, van Hoof H, Cheung JCK. BanditSum: Extractive Summarization as a Contextual Bandit. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium: Association for Computational Linguistics; 2018. pp. 3739–3748.
  • 4. Liu Y, Lapata M. Text Summarization with Pretrained Encoders. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Hong Kong, China: Association for Computational Linguistics; 2019. pp. 3730–3740.
  • 5. Jia R, Cao Y, Fang F, Zhou Y, Fang Z, Liu Y, et al. Deep Differential Amplifier for Extractive Summarization. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Online: Association for Computational Linguistics; 2021. pp. 366–376.
  • 6. Liu Y, Lapata M. Hierarchical Transformers for Multi-Document Summarization. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics; 2019. pp. 5070–5081.
  • 7. Zhang X, Wei F, Zhou M. HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics; 2019. pp. 5059–5069.
  • 8. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is All you Need. Adv Neural Inf Process Syst. 2017;30. Available: https://papers.nips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
  • 9. Xu S, Zhang X, Wu Y, Wei F, Zhou M. Unsupervised Extractive Summarization by Pre-training Hierarchical Transformers. Findings of the Association for Computational Linguistics: EMNLP 2020. Online: Association for Computational Linguistics; 2020. pp. 1784–1795. doi: 10.18653/v1/2020.findings-emnlp.161
  • 10. Jia R, Cao Y, Shi H, Fang F, Yin P, Wang S. Flexible Non-Autoregressive Extractive Summarization with Threshold: How to Extract a Non-Fixed Number of Summary Sentences.
  • 11. Zhong M, Wang D, Liu P, Qiu X, Huang X. A Closer Look at Data Bias in Neural Extractive Summarization Models. Proceedings of the 2nd Workshop on New Frontiers in Summarization. Hong Kong, China: Association for Computational Linguistics; 2019. pp. 80–89.
  • 12. Xing L, Xiao W, Carenini G. Demoting the Lead Bias in News Summarization via Alternating Adversarial Learning. arXiv:2105.14241 [cs]. 2021 [cited 15 Jul 2021]. Available: http://arxiv.org/abs/2105.14241
  • 13. Mihalcea R, Tarau P. TextRank: Bringing Order into Text. Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing. Barcelona, Spain: Association for Computational Linguistics; 2004. pp. 404–411. Available: https://www.aclweb.org/anthology/W04-3252
  • 14. Zheng H, Lapata M. Sentence Centrality Revisited for Unsupervised Summarization. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics; 2019. pp. 6236–6247.
  • 15. Erkan G, Radev DR. LexRank: Graph-based Lexical Centrality as Salience in Text Summarization. J Artif Intell Res. 2004;22: 457–479. doi: 10.1613/jair.1523
  • 16. Veličković P, Cucurull G, Casanova A, Romero A, Liò P, Bengio Y. Graph Attention Networks. arXiv:1710.10903 [cs, stat]. 2018 [cited 25 Jun 2021]. Available: http://arxiv.org/abs/1710.10903
  • 17. Liu Y. Fine-tune BERT for Extractive Summarization. arXiv:1903.10318 [cs]. 2019 [cited 25 Jun 2021]. Available: http://arxiv.org/abs/1903.10318
  • 18. Wang D, Liu P, Zheng Y, Qiu X, Huang X. Heterogeneous Graph Neural Networks for Extractive Document Summarization. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics; 2020. pp. 6209–6219.
  • 19. Zhou Q, Yang N, Wei F, Huang S, Zhou M, Zhao T. Neural Document Summarization by Jointly Learning to Score and Select Sentences. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Melbourne, Australia: Association for Computational Linguistics; 2018. pp. 654–663.
  • 20. Wang D, Liu P, Zhong M, Fu J, Qiu X, Huang X. Exploring Domain Shift in Extractive Text Summarization. arXiv:1908.11664 [cs]. 2019 [cited 25 Jun 2021]. Available: http://arxiv.org/abs/1908.11664
  • 21. Xu J, Gan Z, Cheng Y, Liu J. Discourse-Aware Neural Extractive Text Summarization. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics; 2020. pp. 5021–5031.
  • 22. Yasunaga M, Zhang R, Meelu K, Pareek A, Srinivasan K, Radev D. Graph-based Neural Multi-Document Summarization. Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017). Vancouver, Canada: Association for Computational Linguistics; 2017. pp. 452–462.
  • 23. Kedzie C, McKeown K, Daumé III H. Content Selection in Deep Learning Models of Summarization. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium: Association for Computational Linguistics; 2018. pp. 1818–1828.
  • 24. Devlin J, Chang M-W, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis, Minnesota: Association for Computational Linguistics; 2019. pp. 4171–4186.
  • 25. Gao T, Yao X, Chen D. SimCSE: Simple Contrastive Learning of Sentence Embeddings. arXiv:2104.08821 [cs]. 2021 [cited 19 Aug 2021]. Available: http://arxiv.org/abs/2104.08821
  • 26. Pennington J, Socher R, Manning C. GloVe: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar: Association for Computational Linguistics; 2014. pp. 1532–1543.
  • 27. LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86: 2278–2324. doi: 10.1109/5.726791
  • 28. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9: 1735–1780. doi: 10.1162/neco.1997.9.8.1735
  • 29. Hermann KM, Kocisky T, Grefenstette E, Espeholt L, Kay W, Suleyman M, et al. Teaching Machines to Read and Comprehend.
  • 30. Narayan S, Cohen SB, Lapata M. Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization. arXiv:1808.08745 [cs]. 2018 [cited 22 Mar 2022]. Available: http://arxiv.org/abs/1808.08745
  • 31. Manning C, Surdeanu M, Bauer J, Finkel J, Bethard S, McClosky D. The Stanford CoreNLP Natural Language Processing Toolkit. Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Baltimore, Maryland: Association for Computational Linguistics; 2014. pp. 55–60.
  • 32. See A, Liu PJ, Manning CD. Get To The Point: Summarization with Pointer-Generator Networks. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vancouver, Canada: Association for Computational Linguistics; 2017. pp. 1073–1083.
  • 33. Ba JL, Kiros JR, Hinton GE. Layer Normalization. arXiv:1607.06450 [cs, stat]. 2016 [cited 25 Jun 2021]. Available: http://arxiv.org/abs/1607.06450
  • 34. Kingma DP, Ba J. Adam: A Method for Stochastic Optimization. arXiv:1412.6980 [cs]. 2017 [cited 25 Jun 2021]. Available: http://arxiv.org/abs/1412.6980
  • 35. Paulus R, Xiong C, Socher R. A Deep Reinforced Model for Abstractive Summarization. 2018.
  • 36. Xu J, Durrett G. Neural Extractive Text Summarization with Syntactic Compression. arXiv:1902.00863 [cs]. 2019 [cited 14 Jul 2021]. Available: http://arxiv.org/abs/1902.00863
  • 37. Zhang X, Lapata M, Wei F, Zhou M. Neural Latent Extractive Document Summarization. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium: Association for Computational Linguistics; 2018. pp. 779–784.
  • 38. Beltagy I, Peters ME, Cohan A. Longformer: The Long-Document Transformer. arXiv:2004.05150 [cs]. 2020 [cited 14 Jul 2021]. Available: http://arxiv.org/abs/2004.05150
  • 39. Zhong M, Liu P, Wang D, Qiu X, Huang X. Searching for Effective Neural Extractive Summarization: What Works and What’s Next. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics; 2019. pp. 1049–1058.
  • 40. Lin C-Y. ROUGE: A Package for Automatic Evaluation of Summaries. Text Summarization Branches Out. Barcelona, Spain: Association for Computational Linguistics; 2004. pp. 74–81. Available: https://aclanthology.org/W04-1013

Decision Letter 0

Sanda Martinčić-Ipšić

23 Mar 2022

PONE-D-22-05139: Improving Extractive Document Summarization with Sentence Centrality (PLOS ONE)

Dear Dr. Gong,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by May 07 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Sanda Martinčić-Ipšić, PhD

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Additional Editor Comments :

Please address the reviewers' comments adequately.

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript “Improving Extractive Document Summarization with Sentence Centrality” highlights a way to enhance sentence representation for extractive document summarization, which in turn boosts the performance of existing EDS techniques. The paper is well written, the contributions are clearly laid out and explained, and the methodology used is sound. Still, some issues are cracking through. The following is a list of items which should be addressed:

1. In the introduction, the manuscript references findings of Zheng and Lapata that “similarity with the previous content will damage centrality”. In the method proposed here, forward edges are removed and only backward edges are considered, but this seems confusing. Why is this the right approach if this tends to damage centrality score?

2. Acronym EDS in line 10 is not defined – please add full text of the meaning.

3. The structure of the paper is missing at the end of Introduction.

4. While being sufficient example of a graph, figure 1 does not convey idea of degree centrality in a meaningful way. Maybe node size or colour should be varied based on centrality score to reinforce the idea that some sentences are more important than the other.

5. In the approach involving heterogeneous graph neural network, edge features were extended with sentence centrality. If sentence centrality is a node characteristic, why is it used to enhance edge features?

6. In the equation 6, indices in LHS appear to be duplicated. Equation 7 contains similar problem. Although it may be obvious to knowledgeable reader, terms of the equation 11 should still be defined for rigour, and it should be done for all other equations as well. For example, in equation (1) hi and hi+ are not defined.

7. It is not quite clear how is centrality embedding (EmbSCi) is obtained. Specifically, what are the terms on the RHS of the equation 9? If SCi is centrality of sentence i, what is the meaning of the exponents 1 to emb? Furthermore, equation 9 defines EmbSCi as a set of terms SCi, but this seems unusual w.r.t embeddings usually being vectors.

8. Please define HSG in line 168.

9. Please add short explanation of Trigram Blocking.

10. How does table 3 relate to table 1 and table 2? Are these various configurations of input data to SCBERT and SCHSG models from previous tables? If it is indeed so, please make sure it is more clear from the text to prevent any misunderstandings.

11. Please cite everything which is not original work presented in the paper, e.g. ROUGE, bert-base-uncased model, and some other instances.

12. Please in the appendix define used ROUGE scores.

13. “Analysis” section contains reference to “ORACLE summary”, please elaborate some more on what the ORACLE is, how does it work, and cite the relevant paper.

14. Consider renaming Analysis Chapter into Discussion and expand it.

15. Axes on figure 3 should be appropriately labelled so that the figure stands on its own.

16. It would be useful to know how the proposed model performs with other similar node-local measures such as selectivity measure. It might be useful as a basis for future work.

17. Introduction contains abbreviation “EDT” which is never elaborated or mentioned again. This seems like a very minor typographical error, and presumably was meant to say "EDS". In the same vein, “Ablation Study” section contains sentence “The results show that the experimental performance on ROUGE-L, ROUGE-2, ROUGE-L”. Is the first ROUGE-L in line 244 ROUGE-1?

18. On several places there is a blank after comma missing.

Reviewer #2: ### Overview and general recommendation

The paper addresses the problem of extractive document summarization. It uses sentence centrality information to enhance sentence representation. This information should reflect the sentence-document relationship and the sentence position information as well. The sentence representation is enhanced in two ways: one is embedded directly into the sentence representation output and the other updates the sentence representation indirectly via a graph attention network.

Because of advances in abstractive summarization, the task of extractive document summarization is probably no longer very challenging or interesting for the research field. The paper provides a good method for comparing the impact of sentence position vs. sentence centrality. However, one of the conclusions is not well supported by the experimental results, but overall, this is a very well-written paper with sound methodology, ablation studies, and a well-formulated research question.

### Major comments

- According to Table 3, the results of different sentence information don't support the conclusion that sentence centrality is a better choice than sentence position. Given that ROUGE is a poor metric, the difference of approximately 0.1 is insignificant.

- More ideas for future research could be added to the conclusion.

- The related research section could be expanded as well, e.g., adding a section on sentence embeddings, since the paper works on that level of representation.

- Some typos should be fixed, and some sentences could be rewritten to make them more clear; Take a look at minor remarks

- There is no link to the code repository.

### Minor comments

- Typo in the caption of Fig. 1, “documentt”

- Line 176: “using” → “use”

- Table 3: “summarizers” → “summarizer”

- Fig. 2: the first sentence should be corrected

- Line 244: the first “ROUGE-L” should be “ROUGE-1”

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Jul 22;17(7):e0268278. doi: 10.1371/journal.pone.0268278.r002

Author response to Decision Letter 0


18 Apr 2022

We greatly appreciate the reviewers taking the time to provide constructive comments and helpful suggestions. There is no doubt that the suggestions have significantly raised the quality of the manuscript and have enabled us to improve it. Each suggested revision and comment brought forward by the reviewers was accurately incorporated and considered. We have carefully addressed all the reviewers' concerns. Please see our replies. Changes highlighted in red have been made accordingly in the revised manuscript.

Comments to the Author

________________________________________

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

Response: Thanks for your review. We added experiments to verify the effectiveness of our method. We conducted experiments on the XSum dataset. The results demonstrate the superiority of sentence centrality compared to positional information.

________________________________________

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #2: Yes

Response: Thank you for agreeing that our analysis is appropriate and rigorous.

________________________________________

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Response: Thank you for your approval of our data.

________________________________________

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Response: Thank you for your approval.

________________________________________

Review Comments to the Author

________________________________________

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript “Improving Extractive Document Summarization with Sentence Centrality” highlights a way to enhance sentence representation for extractive document summarization, which in turn boosts the performance of existing EDS techniques. The paper is well written, the contributions are clearly laid out and explained, and the methodology used is sound. Still, some issues are cracking through. The following is a list of items which should be addressed:

1. In the introduction, the manuscript references findings of Zheng and Lapata that “similarity with the previous content will damage centrality”. In the method proposed here, forward edges are removed and only backward edges are considered, but this seems confusing. Why is this the right approach if this tends to damage centrality score?

Response: We appreciate the reviewer for asking questions about the details of sentence centrality. According to the original paper of Zheng and Lapata, the centrality score of s_i based on the directed graph can be defined as follows:

centrality(s_i) = λ_1 Σ_{j<i} e_ij + λ_2 Σ_{j>i} e_ij,

where the optimal λ_1 tends to be negative.

In our paper, we do not calculate the similarity between a sentence and the content that precedes it, which means that we set the weight λ_1 of the forward-looking directed edges to 0.

We think the descriptions of “forward-looking” and “forward” made our point unclear, so we revised this part of the content to convey our idea clearly, in lines 43-45.

The description “Inspired by their work, we remove the forward edges of sentences on directed graphs and calculate the sentence centrality based only on the weights of backward edges” was modified to “Inspired by their work, we calculate the centrality score of a sentence based only on the similarity between the sentence and its following content”.
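To make the definition above concrete, here is an illustrative sketch of the directed centrality score, assuming a precomputed pairwise sentence-similarity matrix. The function name and interface are ours, not from the paper; setting λ_1 = 0 reproduces the backward-only variant described in the response.

```python
import numpy as np

def directed_centrality(sim: np.ndarray,
                        lambda1: float = 0.0,
                        lambda2: float = 1.0) -> np.ndarray:
    """centrality(s_i) = lambda1 * sum_{j<i} e_ij + lambda2 * sum_{j>i} e_ij.

    sim is an (n, n) matrix of pairwise sentence similarities e_ij.
    With lambda1 = 0, only similarity between a sentence and the content
    that follows it contributes, as in the paper's formulation."""
    n = sim.shape[0]
    centrality = np.zeros(n)
    for i in range(n):
        forward = sim[i, :i].sum()     # edges to preceding sentences (j < i)
        backward = sim[i, i + 1:].sum()  # edges to following sentences (j > i)
        centrality[i] = lambda1 * forward + lambda2 * backward
    return centrality
```

With λ_1 = 0, a sentence early in the document that is similar to much of what follows receives a high score, while the last sentence always scores 0, which is how the measure encodes position without an explicit position feature.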

________________________________________

2. Acronym EDS in line 10 is not defined – please add full text of the meaning.

Response: We appreciate the reviewer for his/her careful review. We have added the full text of the meaning of EDS in line 10.

________________________________________

3. The structure of the paper is missing at the end of Introduction.

Response: We thank the reviewer for reminding us to describe the structure of our paper, and there is no doubt that this suggestion makes the paper more readable. We have added a description of the paper structure in lines 76-81.

________________________________________

4. While being sufficient example of a graph, figure 1 does not convey idea of degree centrality in a meaningful way. Maybe node size or colour should be varied based on centrality score to reinforce the idea that some sentences are more important than the other.

Response: We thank the reviewer for the helpful suggestions on our figure 1. We have updated figure 1: we used different colors to indicate different sentences and used the size of the nodes to indicate the centrality scores. We also increased the number of sentence nodes in order to convey our ideas more clearly.

________________________________________

5. In the approach involving heterogeneous graph neural network, edge features were extended with sentence centrality. If sentence centrality is a node characteristic, why is it used to enhance edge features?

Response: We are grateful to the reviewer for questions about our method.

We use sentence centrality to enhance edge features in order to modify the graph attention (GAT) layer. In the heterogeneous graph neural network, the sentence representations are updated with their neighbor word nodes via a GAT layer and a feed-forward (FFN) layer. The GAT layer is modified by infusing the scalar edge weights e_ij (described in equation 13), which are mapped to the multi-dimensional embedding space. The weights of the edge e_ij are the sum of the sentence centrality and the TF-IDF value of the words, because the types of nodes connected by the edge are different.

We added equations and a textual explanation to describe how sentence representations are updated by GAT and FFN in equation 17, equation 18 and lines 228-236. We also do a textual explanation of why we use sentence centrality to enhance edge features in lines 216-221.

________________________________________

6. In the equation 6, indices in LHS appear to be duplicated. Equation 7 contains similar problem. Although it may be obvious to knowledgeable reader, terms of the equation 11 should still be defined for rigour, and it should be done for all other equations as well. For example, in equation (1) hi and hi+ are not defined.

Response: We appreciate the reviewer for his/her careful review and we feel sorry for our lack of rigour.

We have corrected equation 6 and equation 7. For terms in the equations that are not strictly defined, we have carefully checked.

For equation 1, we defined h_i and h_i^+, and explained the meaning of sim(h_i,h_i^+) in lines 150-152.

For equation 6, we explained the meaning of the term u_ij in line 184.

For equation 9 and equation 10, we modified these two equations to make the meaning of EmbSC_i clearer. We defined each term in lines 187-192.

For equation 11, we defined each term of the equation and explained the meaning in lines 198-200.

________________________________________

7. It is not quite clear how the centrality embedding (EmbSCi) is obtained. Specifically, what are the terms on the RHS of the equation 9? If SCi is centrality of sentence i, what is the meaning of the exponents 1 to emb? Furthermore, equation 9 defines EmbSCi as a set of terms SCi, but this seems unusual w.r.t. embeddings usually being vectors.

Response: We appreciate the reviewer for his/her careful review. Our definition of equation 9 was not rigorous, leaving our point unclear. We feel sorry for this. We modified equation 9. Now equation 9 is:

EmbSC_i = W_sc(SC_i),

where W_sc is a weight matrix with the weights set to 1. EmbSC_i is the centrality embedding of sentence s_i, which has the same dimension as the sentence embedding.

EmbSC_i is obtained by mapping the normalized scalar sentence centrality to the multi-dimensional embedding space. The RHS of the new equation 9, W_sc(SC_i), means that SC_i is mapped to a higher dimensional space by W_sc. The RHS of our previous equation 9 was trying to convey the same meaning as the new equation 9, but we are sorry that we did not make it clearer. The exponents 1 to emb meant that we map sentence centrality to the emb-dimensional space in the previous equation 9.
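The mapping in the new equation 9 can be sketched in NumPy as follows. This is an illustrative sketch: the function name is ours, and the weight matrix is fixed to ones as described in the response, whereas in a trained model such a projection could be learnable.

```python
import numpy as np

def centrality_embedding(sc: np.ndarray, emb_dim: int) -> np.ndarray:
    """EmbSC_i = W_sc(SC_i): map each normalized scalar centrality SC_i
    to an emb_dim-dimensional vector via a weight matrix W_sc whose
    weights are set to 1, so the result has the same dimension as the
    sentence embedding and can be added to it."""
    w_sc = np.ones((1, emb_dim))   # weight matrix with all weights set to 1
    # (n_sents, 1) @ (1, emb_dim) -> (n_sents, emb_dim)
    return sc[:, None] @ w_sc
```

With all-ones weights, each centrality scalar is simply broadcast across the embedding dimension; summing this vector with the sentence embedding shifts every coordinate by the sentence's centrality.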

________________________________________

8. Please define HSG in line 168.

Response: We thank the reviewer for reminding us to define HSG. We added the full meaning of HSG in lines 202-204, and we present our sentence centrality-enhanced extractive document summarization model based on HSG in Fig 3.

________________________________________

9. Please add short explanation of Trigram Blocking.

Response: We thank the reviewer for reminding us to add a short explanation of Trigram Blocking. We added it in lines 270-273.
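For reference, Trigram Blocking, as commonly used in extractive summarizers, skips a candidate sentence during selection if it shares a word trigram with the summary built so far, reducing redundancy. A minimal sketch (function names are ours, for illustration only):

```python
def trigrams(tokens):
    """Return the set of word trigrams in a token list."""
    return {tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)}

def select_with_trigram_blocking(ranked_sentences, max_sents):
    """Greedily pick top-ranked sentences, skipping any candidate that
    shares a trigram with the sentences already selected."""
    summary, seen = [], set()
    for sent in ranked_sentences:
        tg = trigrams(sent.lower().split())
        if tg & seen:
            continue  # blocked: repeats a trigram already in the summary
        summary.append(sent)
        seen |= tg
        if len(summary) == max_sents:
            break
    return summary

ranked = ["the cat sat on the mat",
          "the cat sat on a rug",      # blocked: shares "the cat sat"
          "dogs bark loudly at night"]
print(select_with_trigram_blocking(ranked, 2))
```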

________________________________________

10. How does table 3 relate to table 1 and table 2? Are these various configurations of input data to SCBERT and SCHSG models from previous tables? If it is indeed so, please make sure it is clearer from the text to prevent any misunderstandings.

Response: We thank the reviewer for his/her careful review. All the models in Table 3 are based on the pre-trained language model BERT, where the SCES model is exactly our SCBERT in Table 1. The various configurations of experimental parameters for the SCES model are the same as for the SCBERT model, except that the datasets used are different. We added the relevant description in lines 324-327.

________________________________________

11. Please cite everything which is not original work presented in the paper, e.g. ROUGE, bert-base-uncased model, and some other instances.

Response: We thank the reviewer for his/her careful review. We checked our paper carefully and added citations to the work that needed to be cited.

Line 17, we added a reference to transformer.

Line 213, we added a reference to Convolutional Neural Network.

Line 216, we added a reference to Bidirectional Long Short-Term Memory.

Line 239, we added references to CNN/Daily Mail and XSum datasets.

Line 297, we added references to ROUGE.

“bert-base-uncased model” is published in https://github.com/huggingface/pytorch-pretrained-BERT, we put the URL in lines 256-257.

________________________________________

12. Please in the appendix define used ROUGE scores.

Response: We thank the reviewer for reminding us to add the definition of ROUGE scores. We defined the used ROUGE scores in S1 Appendix.
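For readers without access to the appendix, the recall-oriented ROUGE-N score counts n-gram overlap between a candidate summary and a reference. A simplified illustrative sketch (not the official ROUGE toolkit, which also supports stemming and other options):

```python
from collections import Counter

def rouge_n_recall(candidate, reference, n):
    """ROUGE-N recall: number of overlapping n-grams between candidate
    and reference, divided by the number of n-grams in the reference."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    total = sum(ref.values())
    return overlap / total if total else 0.0

print(rouge_n_recall("the cat sat", "the cat sat on the mat", 1))  # 0.5
```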

________________________________________

13. “Analysis” section contains reference to “ORACLE summary”, please elaborate some more on what the ORACLE is, how does it work, and cite the relevant paper.

Response: We thank the reviewer for his/her rigorous review. We elaborated on the ORACLE summary in lines 353-359, including what ORACLE is and how it works. We also cited the relevant paper.

________________________________________

14. Consider renaming Analysis Chapter into Discussion and expand it.

Response: We thank the reviewer for the suggestions on the structure of our article.

We have renamed the Analysis chapter to Discussion and expanded it. In this part, we added content on ORACLE according to comment 13, and we discuss why sentence centrality is a better choice than sentence position information.

________________________________________

15. Axes on figure 3 should be appropriately labelled to make the figure stand on its own.

Response: We thank the reviewer for his/her careful review. Since we added a heterogeneous graph model graph, the original figure 3 is now figure 4. The axes on figure 4 are now appropriately labeled.

________________________________________

16. It would be useful to know how the proposed model performs with other similar node-local measures such as selectivity measure. It might be useful as a basis for future work.

Response: We thank the reviewer for his/her constructive suggestion. This suggestion makes us realize that our exploration of sentence centrality needs to go further. We have written this suggestion into future work. Thanks again for the constructive suggestion.

________________________________________

17. Introduction contains abbreviation “EDT” which is never elaborated or mentioned again. This seems like a very minor typographical error, and presumably was meant to say "EDS". In the same vein, “Ablation Study” section contains sentence “The results show that the experimental performance on ROUGE-L, ROUGE-2, ROUGE-L”. Is the first ROUGE-L in line 244 ROUGE-1?

Response: We appreciate the reviewer for his/her careful review. “EDT” is a typographical error; we have corrected it in line 41. The first “ROUGE-L” is corrected to “ROUGE-1” in line 306.

________________________________________

18. On several places there is a blank after comma missing.

Response: We appreciate the reviewer for his/her careful review. We checked our paper carefully and added the missing spaces after commas.

We again thank the reviewer for taking the time to review our article. The comments reflect the reviewer's rigorous academic attitude, and there is no doubt that they have made our manuscript more rigorous and clearer.

________________________________________

Reviewer #2: ### Overview and general recommendation

The paper addresses the problem of extractive document summarization. It uses sentence centrality information to enhance sentence representation. This information should reflect the sentence-document relationship and the sentence position information as well. The sentence representation is enhanced in two ways: one is embedded directly into the sentence representation output and the other updates the sentence representation indirectly via a graph attention network.

Because of advances in abstractive summarization, the task of extractive document summarization is probably no longer very challenging or interesting for the research field. The paper provides a good method for comparing the impact of sentence position vs. sentence centrality. However, one of the conclusions is not well supported by the experimental results, but overall, this is a very well-written paper with sound methodology, ablation studies, and a well-formulated research question.

### Major comments

- According to Table 3, the results of different sentence information don't support the conclusion that sentence centrality is a better choice than sentence position. Given that ROUGE is a poor metric, the difference of approximately 0.1 is insignificant.

Response: We thank the reviewer for pointing out a potential limitation in our study.

We agree with the reviewer that the superiority of sentence centrality cannot be demonstrated only from the ROUGE scores.

In the extractive summarization task, position information is usually used to enhance sentence representation. Although doing so will improve model extraction performance significantly, it will cause sentence-leading bias, especially in news datasets. We present this phenomenon in lines 23-30.

We replaced the sentence position information with sentence centrality to reduce sentence-leading bias without causing model performance degradation, as can be seen in Fig 5 and Table 3. We also added experiments on the news dataset XSum. The experimental results show that replacing sentence position information with sentence centrality does not degrade model performance.

Before reaching the conclusion "sentence centrality is a better choice than sentence position", we added a description of the advantages of sentence centrality in reducing sentence-leading bias, discussed in the Discussion section.

Combining the advantages of sentence centrality in reducing sentence-leading bias with the experimental results in Table 3, we can conclude that sentence centrality information has certain advantages over sentence position information in the extractive summarization task.

We are grateful to the reviewer for his/her constructive comments, which made our logic more rigorous and greatly improved the quality of our paper.

________________________________________

- More ideas for future research could be added to the conclusion.

Response: We thank the reviewer for reminding us to expand our future work. We added more ideas for future research, including exploring whether sentence centrality is also effective in other tasks; these are presented in lines 374-381.

________________________________________

- The related research section could be expanded as well, e.g., adding a section on sentence embeddings, since the paper works on that level of representation.

Response: We thank the reviewer for his/her constructive suggestion. Adding a section on sentence embeddings will make our article more rigorous. We have expanded our related work in lines 113-131.

________________________________________

- Some typos should be fixed, and some sentences could be rewritten to make them clearer. Take a look at the minor remarks.

Response: We thank the reviewer for his/her careful review. We have carefully checked our article and corrected errors. We present the modification details in the ### Minor comments section.

________________________________________

- There is no link to the code repository.

Response: We are pleased that the reviewer is interested in our work.

The code is released at https://github.com/GongShuai8210/SCES.

________________________________________

### Minor comments

- Typo in the caption of Fig. 1, “documentt”

- Line 176: “using” → “use”

- Table 3: “summarizers” → “summarizer”

- Fig. 2: the first sentence should be corrected

- Line 244: the first “ROUGE-L” should be “ROUGE-1”

Response: We thank the reviewer for taking the time to review our article carefully. We have corrected the typo now.

- Typo in the caption of Fig. 1, “documentt” → “document”

- Line 213: “using” has been corrected to “use”

- Table 3: “summarizers” has been corrected to “summarizer”

- Fig. 2: the first sentence has been corrected to “EmbSC_i is the centrality embedding of sentence s_i, which is directly embedded in the sentence representation generated by BERT”.

- Line 306: the first “ROUGE-L” has been corrected to “ROUGE-1”.

We again thank the reviewer for his/her careful review and constructive suggestions. There is no doubt that the reviewer's suggestions improve the quality of our article.

Attachment

Submitted filename: Response to Reviewers.pdf

Decision Letter 1

Sanda Martinčić-Ipšić

27 Apr 2022

Improving Extractive Document Summarization with Sentence Centrality

PONE-D-22-05139R1

Dear Dr. Gong,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Sanda Martinčić-Ipšić, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

I have reviewed the revised manuscript, the responses to reviewer comments, and the availability of data and software. The authors have addressed all reviewer comments and improved the quality of the manuscript. I am pleased to report that the current manuscript revision adequately addresses all issues and meets the required PLOS ONE criteria.

Reviewers' comments:

Acceptance letter

Sanda Martinčić-Ipšić

14 Jul 2022

PONE-D-22-05139R1

Improving Extractive Document Summarization with Sentence Centrality

Dear Dr. Zhu:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Sanda Martinčić-Ipšić

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Appendix. Definition of ROUGE scores.

    (PDF)

    Attachment

    Submitted filename: Response to Reviewers.pdf

    Data Availability Statement

    All relevant data are within the article and its Supporting information files.


    Articles from PLoS ONE are provided here courtesy of PLOS
