NLP4NLP+5: The Deep (R)evolution in Speech and Language Processing

Joseph Mariani; Gil Francopoulo; Patrick Paroubek; Frédéric Vernier

doi:10.3389/frma.2022.863126

2022 Jul 27;7:863126. doi: 10.3389/frma.2022.863126

PMCID: PMC9363593 PMID: 35965665

Abstract

This paper analyzes the changes in the fields of speech and natural language processing over the past 5 years (2016–2020). It continues a series of two papers that we published in 2019 on the analysis of the NLP4NLP corpus, which contained articles published in 34 major conferences and journals in the field of speech and natural language processing over a period of 50 years (1965–2015), and which was analyzed with methods developed in the field of NLP, hence its name. The extended NLP4NLP+5 corpus now covers 55 years and comprises close to 90,000 documents [+30% compared with NLP4NLP: as many articles were published in the single year 2020 as over the first 25 years (1965–1989)], 67,000 authors (+40%), 590,000 references (+80%), and approximately 380 million words (+40%). The analyses are conducted globally or comparatively among sources, and also in relation to the general scientific literature, with a focus on the past 5 years. They identify profound changes in research topics, as well as the emergence of a new generation of authors and the appearance of new publications around artificial intelligence, neural networks, machine learning, and word embedding.

Keywords: speech processing, natural language processing, artificial intelligence, neural networks, machine learning, research metrics, text mining

Introduction

Preliminary Remarks

The global aim of this series of studies was to investigate the speech and natural language processing (SNLP) research area through the related scientific publications, using a set of NLP tools, in line with the growing interest in scientometrics in SNLP [refer to Banchs, 2012; Jurafsky, 2016; Atanassova et al., 2019; Goh and Lepage, 2019; Mohammad, 2020a,b,c; Wang et al., 2020; Sharma et al., 2021 and many more] and in various other domains such as economics (Muñoz-Céspedes et al., 2021), finance (Daudert and Ahmadi, 2019), or disinformation (Monogarova et al., 2021).

The first results of these studies were presented in two companion papers, published in the first special issue “Mining Scientific Papers Volume I: NLP-enhanced Bibliometrics” of the Frontiers in Research Metrics and Analytics journal; one on production, collaboration, and citation: “The NLP4NLP Corpus (I): 50 Years of Publication, Collaboration and Citation in Speech and Language Processing” (Mariani et al., 2019a) and a second one on the evolution of research topics over time, innovation, use of language resources and reuse of papers and plagiarism within and across publications: “The NLP4NLP Corpus (II): 50 Years of Research in Speech and Language Processing” (Mariani et al., 2019b).

We now extend this corpus by considering the articles published in the same 34 sources over the past 5 years (2016–2020). During this period, we observed a growing interest in machine-learning approaches to processing speech and natural language, and we wanted to examine how this was reflected in the scientific literature. We therefore analyze these augmented data to identify the changes that happened during this period, in terms of both scientific topics and research community. We report the results of this new study in a single article covering papers and authors' production and citation within these sources, submitted to the second special issue “Mining Scientific Papers Volume II: Knowledge Discovery and Data Exploitation” of the Frontiers in Research Metrics and Analytics journal. We invite the reader to refer to the previous extensive articles for more insight into the data used and the methods developed. In addition, we studied here the more than 1 million references in total, to measure the possible influence of neighboring disciplines outside the NLP4NLP sources.

The NLP4NLP Speech and Natural Language Processing Corpus

The NLP4NLP corpus¹ (Mariani et al., 2019a) contained papers from 34 conferences and journals on natural language processing (NLP) and spoken language processing (SLP) (Table 1) published over 50 years (1965–2015), gathering about 68,000 articles and 270 million words from about 50,000 different authors, and about 325,000 references. Although it gives a good picture of the international research of the SNLP community, many papers related to this field, including important seminal papers, may have been published in other venues. Given the uncertainty about the existence of a proper review process, we included neither the content of workshops nor that of publications such as arXiv², a popular non-peer-reviewed, free distribution service and open-access archive created in 1991 and now maintained at Cornell University. It should be noted that conferences may be held annually or every 2 years (in even or odd years), and may also be organized jointly in the same year.

Table 1.

The NLP4NLP+5 corpus of conferences (24) and journals (10).

Short name	# docs	Format	Long name	Language	Access to content	Period	# events
acl	6,713	Conference	Association for Computational Linguistics conference series	English	Open access^*	1979–2020	42
acmtslp	82	Journal	ACM Transaction on Speech and Language Processing	English	Private access	2004–2013	10
alta	361	Conference	Australasian Language Technology Association conference series	English	Open access^*	2003–2019	17
anlp	278	Conference	Applied Natural Language Processing	English	Open access^*	1983–2000	6
cath	927	Journal	Computers and the Humanities	English	Private access	1966–2004	39
cl	905	Journal	American Journal of Computational Linguistics	English	Open access^*	1980–2020	41
coling	5,091	Conference	Conference on Computational Linguistics	English	Open access^*	1965–2020	24
conll	1,124	Conference	Computational Natural Language Learning	English	Open access^*	1997–2020	23
csal	1,111	Journal	Computer Speech and Language	English	Private access	1986–2020	34
eacl	1,139	Conference	European Chapter of the ACL conference series	English	Open access^*	1983–2017	15
emnlp	4,588	Conference	Empirical methods in natural language processing	English	Open access^*	1996–2020	25
hlt	2,219	Conference	Human Language Technology	English	Open access^*	1986–2015	19
icassps	10,971	Conference	IEEE International Conference on Acoustics, Speech and Signal Processing - Speech Track	English	Private access	1990–2020	31
ijcnlp	2,047	Conference	International Joint Conference on NLP	English	Open access^*	2005–2019	8
inlg	495	Conference	International Conference on Natural Language Generation	English	Open access^*	1996–2020	12
isca	22,778	Conference	International Speech Communication Association conference series	English	Open access	1987–2020	33
jep	739	Conference	Journées d'Etudes sur la Parole	French	Open access^*	2002–2020	8
lre	490	Journal	Language Resources and Evaluation	English	Private access	2005–2020	16
lrec	6,920	Conference	Language Resources and Evaluation Conference	English	Open access^*	1998–2020	12
ltc	793	Conference	Language and Technology Conference	English	Private access	1995–2019	9
modulad	232	Journal	Le Monde des Utilisateurs de L'Analyse des Données	French	Open access	1988–2010	23
mts	906	Conference	Machine Translation Summit	English	Open access^*	1987–2019	17
muc	149	Conference	Message Understanding Conference	English	Open access^*	1991–1998	5
naacl	2,175	Conference	North American Chapter of the ACL conference series	English	Open access^*	2000–2019	14
paclic	1,352	Conference	Pacific Asia Conference on Language, Information and Computation	English	Open access^*	1995–2018	23
ranlp	521	Conference	Recent Advances in Natural Language Processing	English	Open access^*	2009–2019	4
sem	1,089	Conference	Lexical and Computational Semantics / Semantic Evaluation	English	Open access^*	2001–2020	13
speechc	1,087	Journal	Speech Communication	English	Private access	1982–2020	39
tacl	307	Journal	Transactions of the Association for Computational Linguistics	English	Open access^*	2013–2020	8
tal	222	Journal	Revue Traitement Automatique du Langage	French	Open access	2006–2020	15
taln	1,250	Conference	Traitement Automatique du Langage Naturel	French	Open access^*	1997–2020	24
taslp	7,387	Journal	IEEE/ACM Transactions on Audio, Speech, and Language Processing	English	Private access	1975–2020	46
tipster	105	Conference	Tipster Defense Advanced Research Projects Agency (DARPA) text program	English	Open access^*	1993–1998	3
trec	2,199	Conference	Text Retrieval Conference	English	Open access	1992–2020	29
Total incl. duplicates	88,752					1965–2020	687
Total excl. duplicates	85,138					1965–2020	667

Research area	Sources	# Docs
NLP-oriented	acl, alta, anlp, cath, cl, coling, conll, eacl, emnlp, hlt, ijcnlp, inlg, lre, lrec, ltc, mts, muc, naacl, paclic, ranlp, sem, tacl, tal, taln, tipster, trec	40,751
Speech-oriented	acmtslp, csal, icassps, isca, jep, lre, lrec, ltc, mts, speechc, taslp	53,264

Rank 2020	Name	#Papers	Previous rank 2015	Previous #papers	Delta	Delta %
1	Shrikanth S. Narayanan	453	1	358	95	27%
2	Hermann Ney	388	2	343	45	13%
3	John H. L. Hansen	354	3	299	55	18%
4	Haizhou Li	350	4	257	93	36%
5	Satoshi Nakamura	263	7	205	58	28%
6	Chin Hui P. Lee	261	5	218	43	20%
7	Alex Waibel	234	6	207	27	13%
8	Mark J. F. Gales	230	8	195	35	18%
9	James R. Glass	214	25	142	72	51%
10	Yang Liu	209	19	148	61	41%
11	Lin Shan Lee	204	9	193	11	6%
12	Li Deng	201	10	192	9	5%

Rank	Name	#papers
1	Graham Neubig	109
2	Shrikanth S. Narayanan	103
3	Haizhou Li	100
4	Yue Zhang	99
5	Björn W. Schuller	91
6	Dong Yu	83
7	Iryna Gurevych	80
8	Junichi Yamagishi	80
9	Shinji Watanabe	78
10	James R. Glass	77
11	Helen M. Meng	72
12	Pushpak Bhattacharyya	71

#Papers	#Authors	Author name
28	1	W. Nick Campbell
26	1	Jerome R. Bellegarda
24	2	Ellen M. Voorhees, Olivier Ferret
21	1	Ralph Grishman
20	1	Takayuki Arai
18	2	Mark A. Johnson, Rathinavelu Chengalvarayan
17	3	Beth M. Sundheim, Douglas B. Paul, Kenneth C. Litkowski
16	3	Jerry R. Hobbs, Oi Yee Kwong, Steven M. Kay
15	1	Donna Harman
14	4	Dominique Desbois, John Makhoul, Patrick Saint-Dizier, Sadaoki Furui
13	4	Eckhard Bick, Paul S. Jacobs, Rens Bod, Robert C. Moore
12	11	David S. Pallett, Harvey F. Silverman, Jen Tzung Chien, Jörg Tiedemann, Lynette Hirschman, Marius A. Pasca, Martin Kay, Reinhard Rapp, Stephen Tomlinson, Ted Pedersen, Yorick Wilks
11	10	Dekang Lin, Eduard H. Hovy, Hagai Aronowitz, Michael Schiehlen, Philip Rose, Philippe Blache, Roger K. Moore, Shunichi Ishihara, Stephanie Seneff, Tomek Strzalkowski
10	11	Aravind K. Joshi, Hermann Ney, Hugo Van Hamme, Joshua T. Goodman, Karen Spärck Jones, Kenneth Ward Church, Kuldip K. Paliwal, Mark Hepple, Mark A. Huckvale, Mark Jan Nederhof, Olov Engwall
9	31	…
8	25	…
7	51	…
6	90	…
5	124	…
4	224	…
3	447	…
2	1,088	…
1	4,667	…
0	60,193	…

Name	#Co-authors	Rank 2020	Previous rank 2015	Previous #co-authors	New co-authors 2015–2020
Shrikanth S. Narayanan	403	1	1	299	104
Haizhou Li	355	2	3	252	103
Satoshi Nakamura	292	3	4	234	58
Björn W. Schuller	291	4	39	135	156
Yang Liu	290	5	12	178	112
Hermann Ney	288	6	2	254	34
Sanjeev Khudanpur	284	7	8	193	91
Khalid Choukri	253	8	15	177	76
Ming Zhou	246	9	71	115	131
Chin Hui P. Lee	241	10	7	194	47
Dong Yu	241	10	187	82	159
Alan W. Black	238	12	25	149	89

Name	#Collaborations 2020	Rank 2020	Previous rank 2015	#Collaborations 2015	New collaborations 2015–2020
Shrikanth S. Narayanan	1,411	1	1	1,035	376
Haizhou Li	1,288	2	2	899	389
Hermann Ney	1,026	3	3	890	136
Satoshi Nakamura	861	4	4	672	189
Björn W. Schuller	841	5	26	408	433
Helen M. Meng	717	6	46	337	380
Dong Yu	716	7	63	293	423
Chin Hui P. Lee	710	8	6	544	166
Junichi Yamagishi	685	9	48	332	353
Ming Zhou	680	10	57	315	365
Alex Waibel	679	11	5	580	99
Bin Ma	670	12	10	503	167

Rank	Name	#Co-authors
1	Graham Neubig	193
1	Björn W. Schuller	193
3	Yue Zhang	187
4	Dong Yu	175
4	Yu Zhang	175
6	Haizhou Li	161
7	Kongaik Lee	158
8	Shrikanth S. Narayanan	154
9	Ming Zhou	151
10	Shinji Watanabe	145
10	Jan Hajic	145
12	Yang Liu	143

Closeness centrality					Degree centrality				Betweenness centrality
Rank 2020	Previous rank 2015	Author's name	Harmonic centrality	Norm on first	Rank 2020	Previous rank 2015	Author's name	Index and norm on first	Rank 2020	Previous rank 2015	Author's name	Index	Norm on first
1	8	Sanjeev Khudanpur	17863.281	1	1	1	Shrikanth S Narayanan	1	1	1	Shrikanth S Narayanan	44717979	1
2	5	Haizhou Li	17782.575	0.995	2	3	Haizhou Li	0.881	2	2	Haizhou Li	34084103	0.762
3	2	Shrikanth S Narayanan	17709.094	0.991	3	4	Satoshi Nakamura	0.725	3	8	Yang Liu	32048199	0.717
4	1	Mari Ostendorf	17565.169	0.983	4	41	Björn W Schuller	0.722	4	3	Satoshi Nakamura	28679912	0.641
5	3	Chin Hui P Lee	17454.696	0.977	5	12	Yang Liu	0.72	5	4	Chin Hui P Lee	25895571	0.579
6	6	Julia B Hirschberg	17449.533	0.977	6	2	Hermann Ney	0.715	6	28	Laurent Besacier	25076596	0.561
7	15	Yang Liu	17442.071	0.976	7	8	Sanjeev Khudanpur	0.705	7	11	Alan W Black	23527696	0.526
8	11	Alan W Black	17409.874	0.975	8	15	Khalid Choukri	0.628	8	10	Khalid Choukri	22889904	0.512
9	4	Hermann Ney	17272.551	0.967	9	14	Ming Zhou	0.61	9	18	Sanjeev Khudanpur	21917631	0.49
10	115	Dong Yu	17249.284	0.966	10	7	Chin Hui P Lee	0.598	10	5	Hermann Ney	21262259	0.475
					10	187	Dong Yu	0.598
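
The closeness column of the table above reports harmonic centrality values together with a "norm on first" ratio. As a minimal sketch, assuming that harmonic centrality is the sum of reciprocal shortest-path distances from an author to every other reachable author in an unweighted co-authorship graph, and that "norm on first" divides each score by the top score, the computation might look as follows. The toy graph and the function names are hypothetical and are not the authors' implementation.

```python
from collections import deque


def harmonic_centrality(graph):
    """Return, for each node, the sum of 1/d over all other reachable nodes.

    graph: dict mapping an author to the set of their co-authors
    (undirected and unweighted; a hypothetical representation).
    """
    scores = {}
    for source in graph:
        # Breadth-first search gives shortest path lengths from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        scores[source] = sum(1.0 / d for d in dist.values() if d > 0)
    return scores


def norm_on_first(scores):
    """Divide every score by the highest score (assumed meaning of the column)."""
    top = max(scores.values())
    if top == 0:
        return {name: 0.0 for name in scores}
    return {name: value / top for name, value in scores.items()}


# Hypothetical four-author co-authorship graph.
toy_graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B"},
}
scores = harmonic_centrality(toy_graph)
normalized = norm_on_first(scores)
```

In this toy graph, author B is directly connected to everyone and therefore ranks first, with a normalized value of 1.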

Rank	Name	Harmonic centrality	Norm on first
1	Dong Yu	7205.507	1
2	Yu Zhang	7109.654	0.987
3	Graham Neubig	7103.21	0.986
4	Yue Zhang	7012.758	0.973
5	Sanjeev Khudanpur	6908.953	0.959
6	Heng Ji	6897.558	0.957
7	Shinji Watanabe	6881.992	0.955
8	Xin Wang	6836.757	0.949
9	Mark A. Hasegawa Johnson	6811.851	0.945
10	Lukás Burget	6732.778	0.934

Rank	Name	Index	Norm on first
1	Yue Zhang	12633450	1
2	Graham Neubig	12539019	0.993
3	Dong Yu	10394169	0.823
4	Yu Zhang	9117498	0.722
5	Shrikanth S. Narayanan	8093016	0.641
6	Laurent Besacier	7640198	0.605
7	Yang Liu	6931507	0.549
8	Shinji Watanabe	6751311	0.534
9	Haizhou Li	6233480	0.493
10	Xin Wang	6096768	0.483

	Number	Percentage 2020	Previous % 2015
Papers never referenced	31,603	37	44
Papers never referenced (aside self ref)	40,111	47	54
Authors never referenced	23,850	36	42
Authors never referenced (aside self ref)	25,281	38	44

Rank 2020	Previous rank 2015	Name	#Citations	Nb of papers written by this author	Ratio #citations/nb of papers written by this author	Percentage of self-citations
1	3	Christopher D. Manning	13,195	152	86.809	2.145
2	1	Hermann Ney	7,109	388	18.322	16.205
3	>20	Christopher Dyer	5,372	114	47.123	3.984
4	>20	Richard Socher	5,175	37	139.865	1.198
5	2	Franz Josef Och	5,041	42	120.024	1.825
6	5	Dan Klein	4,945	130	38.038	6.249
7	4	Philipp Koehn	4,726	59	80.102	2.412
8	>20	Noah A. Smith	4,648	160	29.05	6.713
9	7	Andreas Stolcke	4,532	145	31.255	6.355
10	6	Michael John Collins	4,256	69	61.681	3.195
11	>20	Kenton Lee	4,251	21	202.429	0.729
12	>20	Luke S. Zettlemoyer	4,158	92	45.196	5.075
13	9	Salim Roukos	4,132	71	58.197	1.5
14	18	Daniel Jurafsky	4,056	118	34.373	2.342
15	>20	Kristina Toutanova	4,055	47	86.277	0.764
16	>20	Sanjeev Khudanpur	4,051	135	30.007	6.492
17	>20	Daniel Povey	3,796	112	33.893	7.929
18	16	Li Deng	3,672	201	18.269	14.842
19	>20	Dong Yu	3,653	177	20.638	10.895
20	>20	Mirella Lapata	3,578	138	25.928	6.987

Number of written papers	Name	#as first author	% as first author	#as last author	% as last author	#as sole author	% as sole author	Rank citations	#self-citations	Ratio of #self-citations/ number of written papers	#external citations	Ratio of #external citations/number of written papers
453	Shrikanth S. Narayanan	13	3	388	86	0	0	>20	782	1.726	2,129	4.7
388	Hermann Ney	27	7	325	84	10	3	2	1,152	2.969	5,957	15.353
354	John H. L. Hansen	29	8	283	80	3	1	>20	779	2.201	1,076	3.04
350	Haizhou Li	13	4	256	73	2	1	>20	490	1.4	1,623	4.637
263	Satoshi Nakamura	17	6	190	72	1	0	>20	160	0.608	648	2.464
261	Chin Hui P. Lee	14	5	207	79	5	2	>20	577	2.211	2,852	10.927
234	Alex Waibel	13	6	199	85	2	1	>20	262	1.12	2,048	8.752
230	Mark J. F. Gales	31	13	105	46	9	4	>20	638	2.774	2,923	12.709
214	James R. Glass	11	5	152	71	1	0	>20	428	2	2,084	9.738
209	Yang Liu	48	23	83	40	3	1	>20	240	1.148	2,080	9.952
204	Lin Shan Lee	10	5	189	93	0	0	>20	328	1.608	656	3.216
201	Li Deng	57	28	73	36	6	3	18	545	2.711	3,127	15.557
197	Hervé Bourlard	10	5	141	72	3	2	>20	277	1.406	940	4.772
195	Mari Ostendorf	29	15	100	51	5	3	>20	309	1.585	2,136	10.954
195	Tatsuya Kawahara	33	17	110	56	0	0	>20	248	1.272	708	3.631
192	Björn W. Schuller	40	21	105	55	0	0	>20	511	2.661	1,583	8.245
188	Keikichi Hirose	28	15	95	51	1	1	>20	140	0.745	330	1.755
183	Frank K. Soong	9	5	78	43	0	0	>20	208	1.137	1,240	6.776
182	Kiyohiro Shikano	1	1	142	78	0	0	>20	276	1.516	1,161	6.379
180	Timothy Baldwin	21	12	115	64	4	2	>20	216	1.2	1,160	6.444

Rank	Name	#Citations	#Papers written by this author	Ratio #citations/#papers written by this author	Percentage of self-citations
1	Christopher D. Manning	9,148	152	60.184	0.875
2	Richard Socher	4,404	37	119.027	0.749
3	Kenton Lee	4,250	21	202.381	0.729
4	Christopher Dyer	3,881	114	34.044	3.015
5	Luke S. Zettlemoyer	3,640	92	39.565	3.407
6	Sanjeev Khudanpur	3,168	135	23.467	5.966
7	Kristina Toutanova	3,154	47	67.106	0.254
8	Noah A. Smith	3,115	160	19.469	4.687
9	Ming Wei Chang	2,990	31	96.452	1.204
10	Daniel Povey	2,852	112	25.464	6.872
11	Jacob Devlin	2,836	20	141.8	0.353
12	Jeffrey Pennington	2,586	2	1293	0
13	Percy Liang	2,312	56	41.286	3.287
14	Dong Yu	2,238	177	12.644	6.702
15	Tomáš Mikolov	2,232	18	124	0.314
16	Yoshua Bengio	2,170	47	46.17	2.074
17	Mirella Lapata	2,106	138	15.261	7.123
18	Daniel Jurafsky	2,002	118	16.966	1.049
19	Eduard H. Hovy	1,970	168	11.726	2.69
20	Yoav Goldberg	1,860	72	25.833	2.527

Rank 2020	Rank 2015	Title	Authors	Source	Year	#Citations 2020	#Citations 2015
1	1	BLEU: a Method for Automatic Evaluation of Machine Translation	Kishore A. Papineni, Salim Roukos, Todd R. Ward, Wei Jing Zhu	acl	2002	3,020	1514
2	>20	Glove: Global Vectors for Word Representation	Jeffrey Pennington, Richard Socher, Christopher D. Manning	emnlp	2014	2,590
3	0	BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding	Jacob Devlin, Ming Wei Chang, Kenton Lee, Kristina Toutanova	naacl	2019	2,468
4	2	Building a Large Annotated Corpus of English: The Penn Treebank	Mitchell P. Marcus, Beatrice Santorini, Mary Ann Marcinkiewicz	cl	1993	1,610	1145
5	3	Moses: Open Source Toolkit for Statistical Machine Translation	Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Christopher Dyer, Ondrej Bojar, Alexandra Constantin, Evan Herbst	acl	2007	1,380	860
6	5	SRILM - an extensible language modeling toolkit	Andreas Stolcke	isca	2002	1,319	831
7	>20	Front-End Factor Analysis for Speaker Verification	Najim Dehak, Patrick J. Kenny, Réda Dehak, Pierre Dumouchel, Pierre Ouellet	taslp	2011	1,170
8	0	Deep Contextualized Word Representations	Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke S. Zettlemoyer	naacl	2018	1,166
9	4	A Systematic Comparison of Various Statistical Alignment Models	Franz Josef Och, Hermann Ney	cl	2003	1,079	855
10	6	Statistical Phrase-Based Translation	Philipp Koehn, Franz Josef Och, Daniel Marcu	hlt, naacl	2003	1,038	829
11	7	The Mathematics of Statistical Machine Translation: Parameter Estimation	Peter E. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, Robert L. Mercer	cl	1993	978	820
12	0	Effective Approaches to Attention-based Neural Machine Translation	Thang Luong, Hieu Pham, Christopher D. Manning	emnlp	2015	907
13	8	Minimum Error Rate Training in Statistical Machine Translation	Franz Josef Och	acl	2003	879	726
14	>20	Convolutional Neural Networks for Sentence Classification	Yoon Chul Kim	emnlp	2014	862
15	0	Neural Machine Translation of Rare Words with Subword Units	Rico Sennrich, Barry Haddow, Alexandra Birch	acl	2016	836
16	>20	Wordnet: A Lexical Database For English	George A. Miller	hlt	1992	814
17	>20	Spoken Language Translation	Hwee Tou Ng	emnlp	1997	774
18	15	Europarl: A Parallel Corpus for Statistical Machine Translation	Philipp Koehn	mts	2005	760	472
19	10	Suppression of acoustic noise in speech using spectral subtraction	Steven F. Boll	taslp	1979	728	566
20	13	Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator	Yariv Ephraim, David Malah	taslp	1984	708	488

Rank 2020	Previous Rank 2015	Name	h-5 Index	h-5 Median	Previous h-5 index	Previous h-5 median
1	1	Meeting of the Association for Computational Linguistics (ACL)	157	275	65	99
2	2	Conference on Empirical Methods in Natural Language Processing (EMNLP)	132	235	56	81
3	5	Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (HLT-NAACL)	105	195	48	71
4	3	IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)	96	143	54	73
5	6	Conference of the International Speech Communication Association (INTERSPEECH)	89	150	39	70
6	8	International Conference on Computational Linguistics (COLING)	64	103	38	59
7	4	IEEE/ACM Transactions on Audio, Speech, and Language Processing	60	87	51	78
8		Transactions of the Association for Computational Linguistics (TACL)	59	136
9	7	International Conference on Language Resources and Evaluation (LREC)	53	81	38	64
10	15	International Workshop on Semantic Evaluation (SEMEVAL)	52	93	23	41
10	16	Conference of the European Chapter of the Association for Computational Linguistics (EACL)	52	98	21	34
12	20	Workshop on Machine Translation (WMT)	47	74	18	24
13	13	Conference on Computational Natural Language Learning (CoNLL)	43	77	24	36
14	10	Computer Speech & Language (CSL)	34	49	32	51
14	19	Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)	34	51	18	27
16		IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)	33	52
16	18	IEEE Spoken Language Technology Workshop (SLT)	33	58	18	28
18	12	Computational Linguistics (CL)	30	48	31	40
18	17	International Joint Conference on Natural Language Processing (IJCNLP)	30	48	20	27
20	11	Speech Communication	28	49	32	49
21		Workshop on Representation Learning for NLP	27	72
22		Biomedical Natural Language Processing	26	37
23		Workshop on Innovative Use of NLP for Building Educational Applications	25	34
24	14	Language Resources and Evaluation (LRE)	24	36	23	42
24		Odyssey: The Speaker and Language Recognition Workshop	24	45
24		International Conference on Natural Language Generation (INLG)	24	35
27		Natural Language Engineering	23	48
28		IEEE International Conference on Semantic Computing	22	31
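
The table above lists an h-5 index and an h-5 median per venue. As a minimal sketch, assuming the usual definitions (the h-5 index is the largest number h such that h articles published in the last 5 years have at least h citations each, and the h-5 median is the median citation count of those h articles), both values can be derived from a list of per-article citation counts. The function names and the toy counts below are hypothetical.

```python
from statistics import median


def h_index(citations):
    """Largest h such that h articles have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def h_median(citations):
    """Median citation count of the articles that form the h-core."""
    ranked = sorted(citations, reverse=True)
    core = ranked[:h_index(citations)]
    return median(core) if core else 0


# Hypothetical citation counts for six articles of one venue,
# all assumed to fall inside the 5-year window.
toy_counts = [25, 8, 5, 4, 3, 1]
venue_h = h_index(toy_counts)        # 4: four articles have at least 4 citations
venue_h_median = h_median(toy_counts)  # 6.5: median of [25, 8, 5, 4]
```

Restricting the input to articles from the relevant 5-year window is left to the caller in this sketch.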

Rank	Term	Variants of all sorts	#Occurrences	Frequency	#Existences	Presence	Occurrences/ existences	Previous Rank	Delta Ranking
1	Dataset	Data-set, data-sets, datasets	240,691	0.00758	24,288	0.28969	9.91	11	10
2	Annotation	Annotations	187,175	0.00589	19,942	0.23786	9.39	4	2
3	SR	ASR, ASRs, Automatic Speech Recognition, Speech Recognition, automatic speech recognition, speech recognition	179,579	0.00566	25,916	0.30911	6.93	2	−1
4	LM	LMs, Language Model, Language Models, language model, language models	164,944	0.00519	19,139	0.22828	8.62	3	−1
5	HMM	HMMs, Hidden Markov Model, Hidden Markov Models, Hidden Markov model, Hidden Markov models, hidden Markov Model, hidden Markov Models, hidden Markov model, hidden Markov models	155,335	0.00489	17,131	0.20433	9.07	1	−4
6	Embedding	Embeddings	145,844	0.00459	11,804	0.14079	12.36	29	23
7	Classifier	Classifiers	143,885	0.00453	18,540	0.22114	7.76	6	−1
8	POS	POSs, Part Of Speech, Part of Speech, Part-Of-Speech, Part-of-Speech, Parts Of Speech, Parts of Speech, Pos, part of speech, part-of-speech, parts of speech, parts-of-speech	135,022	0.00425	18,946	0.22598	7.13	5	−3
9	NP	NPs, noun phrase, noun phrases	111,726	0.00352	12,139	0.14479	9.20	7	−2
10	Parser	parsers	107,678	0.00339	12,071	0.14398	8.92	8	−2
11	Neural network	ANN, ANNs, Artificial Neural Network, Artificial Neural Networks, NN, NNs, Neural Network, Neural Networks, NeuralNet, NeuralNets, neural net, neural nets, neural networks	97,039	0.00306	18,724	0.22333	5.18	17	6
12	Metric	Metrics	95,056	0.00299	20,451	0.24393	4.65	18	6
13	Segmentation	Segmentations	94,888	0.00299	14,033	0.16738	6.76	9	−4
14	SNR	SNRs, Signal Noise Ratio, Signal Noise Ratios, signal noise ratio, signal noise ratios	90,820	0.00286	8,517	0.10159	10.66	10	−4
15	MT	MTs, Machine Translation, Machine Translations, machine translation, machine translations	88,790	0.0028	13,603	0.16225	6.53	15	0
16	Parsing	Parsings	75,189	0.00237	12,551	0.1497	5.99	13	−3
17	DNN	DNNs, Deep Neural Network, Deep Neural Networks, deep neural network, deep neural networks	74,921	0.00236	5,740	0.06846	13.05	63	46
18	GMM	GMMs, Gaussian Mixture Model, Gaussian Mixture Models, Gaussian mixture model, Gaussian mixture models	74,820	0.00236	8,203	0.09784	9.12	14	−4
19	ngram	n-gram, n-grams, ngrams	73,159	0.0023	11,285	0.1346	6.48	21	2
20	Semantic		70,186	0.00221	16,697	0.19915	4.20	12	−8
21	Decoder	Decoders	69,385	0.00219	10,274	0.12254	6.75	71	50
22	WER	WERs, Wer, word error rate, word error rates	69,297	0.00218	8,547	0.10194	8.11	20	−2
23	LSTM		68,445	0.00216	7,090	0.08457	9.65	145	122
24	SVM	SVMs, Support Vector Machine, Support Vector Machines, support vector machine, support vector machines	67,610	0.00213	9,005	0.10741	7.51	19	−5
25	Iteration	Iterations	65,686	0.00207	15,372	0.18335	4.27	16	−9
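
The columns of the table above can be read as follows: occurrences count every appearance of a term, existences count the documents that contain it, and the last ratio divides the former by the latter (for Dataset, 240,691 / 24,288 ≈ 9.91). The sketch below computes these quantities on a toy corpus. It assumes that presence is the share of documents containing the term and that frequency is the share of all counted term occurrences; both denominators are assumptions, as are the function name and the input format (documents already reduced to lists of normalized terms, with variants merged).

```python
from collections import Counter


def term_statistics(documents):
    """Compute per-term statistics from documents given as lists of terms.

    documents: list of lists of already-normalized terms (hypothetical input;
    term extraction and variant merging are assumed to be done upstream).
    """
    occurrences = Counter()
    existences = Counter()
    for terms in documents:
        occurrences.update(terms)      # every appearance counts
        existences.update(set(terms))  # each document counts once per term
    total_terms = sum(occurrences.values())
    n_docs = len(documents)
    stats = {}
    for term, occ in occurrences.items():
        exi = existences[term]
        stats[term] = {
            "occurrences": occ,
            "frequency": occ / total_terms,  # assumed denominator
            "existences": exi,
            "presence": exi / n_docs,        # assumed denominator
            "occ_per_existence": occ / exi,
        }
    return stats


# Hypothetical corpus of four short documents.
toy_docs = [
    ["dataset", "dataset", "embedding"],
    ["dataset", "hmm"],
    ["hmm", "hmm", "hmm"],
    ["embedding"],
]
stats = term_statistics(toy_docs)
```

A high occurrences/existences ratio, as for DNN or Embedding in the table, indicates a term that is used intensively in the documents where it appears.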

Rank	Term	Observed 2018	Term	Observed 2019	Term	Predicted 2020	Term	Observed 2020
1	Dataset	0.019411	Dataset	0.019293	Embedding	0.020756	Dataset	0.020833
2	Embedding	0.012028	Embedding	0.018099	Dataset	0.017509	Embedding	0.015237
3	Annotation	0.008888	Encoder	0.009572	Encoder	0.011884	BERT	0.01076
4	LSTM	0.008571	LSTM	0.008271	BERT	0.008609	Annotation	0.009168
5	DNN	0.006005	Decoder	0.007093	Decoder	0.008261	Encoder	0.009156
6	SR	0.005689	LM	0.006079	Classifier	0.007376	LM	0.006342
7	RNN	0.005585	Metric	0.005929	LM	0.006825	Transformer	0.006299
8	Encoder	0.005373	BERT	0.005745	Metric	0.006738	SR	0.006232
9	Classifier	0.005365	SR	0.005388	LSTM	0.006276	Metric	0.00604
10	Neural network	0.005334	Annotation	0.005326	Transformer	0.004887	LSTM	0.005866

Predicted in 2015						Observed in 2020
Prediction 2016	Prediction 2017	Prediction 2018	Prediction 2019	Prediction 2020	Rank	Observation 2016	Observation 2017	Observation 2018	Observation 2019	Observation 2020
Dataset	Dataset	Dataset	Dataset	Dataset	1	Dataset	Dataset	Dataset	Dataset	Dataset
DNN	DNN	DNN	DNN	DNN	2	Annotation	Embedding	Embedding	Embedding	Embedding
Annotation	Neural network	Neural network	Neural network	Neural network	3	DNN	DNN	Annotation	Encoder	BERT
POS	SR	RNN	RNN	RNN	4	embedding	LSTM	LSTM	LSTM	Annotation
Neural network	Classifier	POS	Parser	Parser	5	SR	SR	DNN	Decoder	Encoder
Classifier	LM	Parser	SR	SR	6	LSTM	RNN	SR	LM	LM
Parser	POS	Annotation	LM	Metric	7	POS	Annotation	RNN	Metric	Transformer
SR	RNN	Classifier	Classifier	POS	8	Classifier	Neural network	Encoder	BERT	SR
LM	parser	SR	Metric	Parsing	9	Neural network	Classifier	Classifier	SR	Metric
HMM	HMM	Metric	POS	Classifier	10	RNN	LM	Neural network	Annotation	LSTM
Correctly predicted in the top 10						7	7	8	4	3
Correctly predicted at the right rank						1	1	1	1	1
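
The two summary rows of the table above count, for each year, how many predicted terms appear anywhere in the observed top 10 and how many appear at exactly the predicted rank. A minimal sketch of that count, using the 2016 and 2020 columns of the table as input, is given below. The function name is hypothetical, and the case-insensitive comparison is an assumption made because the table mixes "embedding" and "Embedding".

```python
def prediction_hits(predicted, observed):
    """Return (terms found in both top lists, terms found at the same rank)."""
    pred = [term.lower() for term in predicted]
    obs = [term.lower() for term in observed]
    in_top = len(set(pred) & set(obs))
    at_rank = sum(p == o for p, o in zip(pred, obs))
    return in_top, at_rank


# Columns "Prediction 2016" and "Observation 2016" of the table.
predicted_2016 = ["Dataset", "DNN", "Annotation", "POS", "Neural network",
                  "Classifier", "Parser", "SR", "LM", "HMM"]
observed_2016 = ["Dataset", "Annotation", "DNN", "embedding", "SR",
                 "LSTM", "POS", "Classifier", "Neural network", "RNN"]

# Columns "Prediction 2020" and "Observation 2020" of the table.
predicted_2020 = ["Dataset", "DNN", "Neural network", "RNN", "Parser",
                  "SR", "Metric", "POS", "Parsing", "Classifier"]
observed_2020 = ["Dataset", "Embedding", "BERT", "Annotation", "Encoder",
                 "LM", "Transformer", "SR", "Metric", "LSTM"]

hits_2016 = prediction_hits(predicted_2016, observed_2016)  # (7, 1)
hits_2020 = prediction_hits(predicted_2020, observed_2020)  # (3, 1)
```

The results match the summary rows for these two years: 7 and 3 terms in the top 10, and a single term (Dataset) at the right rank in both cases.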

Rank 2020	Previous Rank 2015	Term	Variants of all sorts	Event when the term appeared	Authors who introduced the term	Documents	Archive #occurrences	Archive frequency	Archive #existence	Archive presence	Archive rank occurrences	Archive rank existences	Archive occurrence/existence ratio	#Occurrences of the term in 2020 (by other people than the inventors)	#Existences in 2020 (by other people than the inventors)	Frequency in 2020	Presence in 2020
1	1	Dataset	Data-set, data-sets, datasets	1966	Laurence Urdang	cath1966-3	240,691	0.0076	24,288	0.290	1	2	9.91	59,794	4,313	0.0224	0.795
2	30	Embedding	Embeddings	1967	Aravind K. Joshi, Danuta Hiz, Jane J. Robinson, Steven I. Laszlo	C67-1007 C67-1010 C67-1015	145,845	0.0046	11,804	0.141	6	25	12.36	37,346	3,193	0.0140	0.588
3	2	Metric	Metrics	1965	A Andreyewsky	C65-1002	95,056	0.0030	20,451	0.244	12	4	4.65	14,352	2,915	0.0054	0.537
4	7	Neural network	ANN, ANNs, Artificial Neural Network, Artificial Neural Networks, NN, NNs, Neural Network, Neural Networks, NeuralNet, NeuralNets, neural net, neural nets, neural networks	1972	P J. Brown	cath1972-21	97,031	0.0031	18,716	0.223	11	8	5.18	9,190	2,623	0.0034	0.483
5	>200	Encoder	Encoders	1968	Raymond F. Erickson	cath1968-2	62,324	0.0020	6,874	0.082	28	74	9.07	21,444	2,350	0.0080	0.433
6	6	Annotation	Annotations	1967	Kenneth Janda, Martin Kay	cath1967-12 cath1967-8	187,175	0.0059	19,942	0.238	2	5	9.39	21,751	2,160	0.0081	0.398
7	67	Hyperparameter	hyperparam, hyperparameters	1989	G Demoment	taslp1989-131	22,593	0.0007	7,900	0.094	104	58	2.86	5,232	2,110	0.0020	0.389
8	9	LM	LMs, Language Model, Language Models, language model, language models	1965	Sheldon Klein	C65-1014	164,564	0.0052	19,080	0.228	4	6	8.62	14,850	1,977	0.0056	0.364
9	14	NLP	Natural Language Processing	1965	Denis M. Manelski, Gilbert K. Krulee	C65-1018	46,094	0.0015	14,243	0.170	40	14	3.24	6,978	1,946	0.0026	0.359
10	146	LSTM		1999	Felix A. Gers, Fred Cummins, Juergen Schmidhuber	e99_93	68,445	0.0022	7,090	0.085	23	70	9.65	13,767	1,934	0.0051	0.356
12	3	Subset	Sub set, sub sets, sub-set, sub-sets, subsets	1965	Denis M. Manelski, E. D. Pendergraft, Gilbert K. Krulee, Itiroo Sakai, N. Dale, Wojciech Skalmowski	C65-1006 C65-1018 C65-1021 C65-1025	65,243	0.0021	24,171	0.288	26	29	2.70	5,239	1,913	0.0020	0.353
14	4	Classifier	Classifiers	1967	Aravind K. Joshi, Danuta Hiz	C67-1007	143,885	0.0045	18,540	0.221	7	13	7.76	11,125	1,847	0.0042	0.340
24	5	SR	ASR, ASRs, Automatic Speech Recognition, Speech Recognition, automatic speech recognition, speech recognition	1965	Denis M. Manelski, Dániel Várga, Gilbert K. Krulee, Makoto Nagao, Toshiyuki Sakai	C65-1018 C65-1022 C65-1029	179,579	0.0056	25,916	0.309	3	1	6.93	14,630	1,423	0.0055	0.262
27	10	Optimization	Optimization, optimisations, optimizations	1967	Ellis B. Page	C67-1032	48,412	0.0015	15,221	0.182	36	13	3.18	3,514	1,356	0.0013	0.250
33	8	POS	POSs, Part Of Speech, Part of Speech, Part-Of-Speech, Part-of-Speech, Parts Of Speech, Parts of Speech, Pos, part of speech, part-of-speech, parts of speech, parts-of-speech	1965	Denis M. Manelski, Dániel Várga, Gilbert K. Krulee, Makoto Nagao, Toshiyuki Sakai	C65-1018 C65-1022 C65-1029	135,022	0.0042	18,946	0.226	8	14	7.13	7,278	1,158	0.0027	0.213

Rank	Overall	NLP	Speech
1	Speech recognition	Semantic	Speech recognition
2	Subset	NP	Spectral
3	Semantic	Syntactic	HMM
4	LM	POS	Filtering
5	Filtering	Parsing	Subset
6	POS	Subset	Acoustics
7	HMM	Parser	Gaussian
8	Iteration	Lexical	Fourier
9	Spectral	Machine translation	Acoustic
10	Metric	Annotation	Linear

Rank	Overall	NLP	Speech
1	Lawrence R. Rabiner	Ralph Grishman	Lawrence R. Rabiner
2	Hermann Ney	Jun'Ichi Tsujii	Shrikanth S. Narayanan
3	Shrikanth S. Narayanan	Kathleen R. Mckeown	John H. L. Hansen
4	John H. L. Hansen	Aravind K. Joshi	Hermann Ney
5	Chin Hui P. Lee	Christopher D. Manning	Chin Hui P. Lee
6	Haizhou Li	Mark A. Johnson	Haizhou Li
7	Mark J. F. Gales	Noah A. Smith	Mark J. F. Gales
8	Mari Ostendorf	Ralph M. Weischedel	Li Deng
9	Li Deng	Eduard H. Hovy	Hervé Bourlard
10	Alex Waibel	Timothy Baldwin	Frank K. Soong

Rank 2020	Rank 2015	Name	Type	# Existences	# Occurrences	First author	First corpora	First year	Last year
1	3	Wikipedia	NLPCorpus	6,348	36695	Ana Licuanan, Jinxi Xu, Ralph M. Weischedel	trec	2003	2020
2	1	WordNet	NLPLexicon	5,803	37654	Kenji Sakamoto, Kouichi Yamaguchi, Toshio Akabane, Yoshiji Fujimoto	isca	1990	2020
3	>10	BLEU	NLPSpecification	4,595	42311	Ludovic Lebart	modulad	2001	2020
4	2	Timit	NLPCorpus	3,982	15984	Andrej Ljolje, Benjamin Chigier, David Goodine, David S. Pallett, Erik Urdang, Fileno Alleva, Francine R. Chen, George R. Doddington, Hong C. Leung, Hsiao Wuen Hon, James L. Hieronymus, James R. Glass, Jan Robin Rohlicek, Jeff Shrager, Jeffrey N. Marcus, John Dowding, John F. Pitrelli, John S. Garofolo, Joseph H. Polifroni, Judith R. Spitz, Julia B. Hirschberg, Kai Fu Lee, L. G. Miller, Mari Ostendorf, Mark Liberman, Meiyuh Hwang, Michael D. Riley, Michael S. Phillips, Robert Weide, Stephanie Seneff, Stephen E. Levinson, Vassilios V. Digalakis, Victor W. Zue	hlt, isca, taslp	1989	2020
5	4	Penn Treebank	NLPCorpus	2,786	10,622	Beatrice Santorini, David M. Magerman, Eric Brill, Mitchell P. Marcus	hlt	1990	2020
6	>10	Word2Vec	NLPTool	2,536	8,245	Allan Hanbury, Amir Globerson, Angelina Ivanova, Baobao Chang, Bin Gao, Bing Qin, Bo Tang, Brigitte Grau, Bruno Martins, Bryan Rink, Carina Silberer, Carlos Guestrin, Carmen Banea, Chengqing Zong, Christopher D. Manning, Chuchu Huang, Claire Cardie, Cícero Nogueira Dos Santos, Cícero Nogueira Dos Santos, D. Song, Dakun Zhang, Daniel Zeman, Daniel P. Flickinger, Danqi Chen, David B. Bracewell, Daxiang Dong, Deniz Yuret, Di Chen, Dianhai Yu, Dimitri Kartsaklis, Dmitrijs Milajevs, Duyu Tang, Emanuela Boros, Enhong Chen, Fabin Shi, Fei Tian, Filip Ginter, Furu Wei, Georgiana Dinu, Germán Kruszewski, Guang Chen, Guoxin Cui, Haifeng Wang, Haiyang Wu, Hal Daumé Iii, Hanjun Dai, Heike Adel, Hinrich Schütze, Hu Junfeng, Hua Wu, Idan Szpektor, Ido Dagan, Ignacio Cano, Ion Androutsopoulos, Ivan Titov, Jacob Goldberger, Jan Hajic, Janyce M. Wiebe, Jason Weston, Jeffrey Pennington, Jenna Kanerva, Jiajun Zhang, Jiang Bian, Jiang Guo, Jianlin Feng, Jianwen Zhang, Johan Bos, Johannes Bjerva, John Pavlopoulos, Jordan Boyd Graber, João Filgueiras, João Palotti, Juhani Luotolahti, Jun Zhao, Jun Cheng Guo, Kai Hakala, Kang Liu, Karen Livescu, Kazuma Hashimoto, Keith Adams, Kevin Gimpel, Leonardo Claudino, Li Dong, Liheng Xu, Linda Anderson, Liumingjing Xiao, Maira Gatti, Makoto Miwa, Malvina Nissim, Maosong Sun, Marc Tomlinson, Marco Baroni, Marco Kuhlmann, Marek Rei, Mark Dredze, Matthew Purver, Mehrnoosh Sadrzadeh, Michael Mohler, Miguel B. Almeida, Mikhail Kozhevnikov, Ming Zhou, Mirella Lapata, Mo Yu, Mohit Bansal, Mohit Iyyer, Mu Li, Mário J. Silva, Nan Yang, Navid Rekabsaz, Nianwen Xue, Olivier Ferret, Omer Levy, Oren Melamud, P. Zhang, Peng Hsuan Li, Peter Enns, Philip Resnik, Pontus Stenetorp, Qinlong Wang, Rada F. Mihalcea, Regina Barzilay, Richard Socher, Rob Van Der Goot, Romaric Besançon, Rui Zhang, Sameer Singh, Sanda Maria Harabagiu, Shaoda He, Shizhu He, Shujie Liu, Silvio Amir, Stephan Oepen, Sumit Chopra, Suwisa Kaewphan, Tao Ge, Tao Li, Tatsuya Izuha, Ted Briscoe, Tie Yan Liu, Ting Liu, Travis R. Goodwin, Wanxiang Che, Wei He, Weiran Xu, Wen Ting Wang, Wenzhe Pei, Xiaobo Hao, Xiaoguang Hu, Xiaojun Zou, Xiaolei Liu, Xiaozhao Zhao, Xingxing Zhang, Xinxiong Chen, Xueke Xu, Xueqi Cheng, Yang Liu, Yi Zhang, Yoav Goldberg, Yonatan Belinkov, Yongqiang Chen, Yoon Chul Kim, Yoshimasa Tsuruoka, Yuanyuan Qi, Yuanzhe Zhang, Yuchen Zhang, Yue Liu, Yusuke Miyao, Yuta Tsuboi, Zhen Wang, Zheng Chen, Zhenjun Tang, Zhiqiang Toh, Zhiyuan Liu	acl, coling, conll, eacl, emnlp, lrec, sem, tacl, trec	2014	2020
7	5	Praat	NLPTool	2,123	4,359	Carlos Gussenhoven, Toni C. M. Rietveld	isca	1997	2020
8	>10	MATLAB	NLPTool	1,915	2,842	Demosthenis Stavrinides, Michael D. Zoltowski	taslp	1989	2020
9	>10	GloVe	NLPTool	1,863	6,686	Christopher D. Manning, Jeffrey Pennington, Richard Socher	emnlp	2014	2020
10	>10	AnCora	NLPCorpus	1,694	3,233	Barbara J. Grosz, Jaime G. Carbonell, Mitchell P. Marcus, Ralph M. Weischedel, Raymond Perrault, Robert Wilensky, Wendy G. Lehnert	hlt	1989	2020

Year	# Existences	# Documents	Top10 cited resources
2000	1,923	2,118	Timit, WordNet, RST, HPSG, Penn Treebank, AnCora, EAGLES, ATIS, LFG, Pronunciation Dictionary
2001	1,283	1,551	WordNet, Timit, Penn Treebank, NOISEX, ATIS, SENSEVAL, HPSG, MATLAB, Maximum Likelihood Linear Regression, TDT
2002	2,200	2,074	WordNet, Timit, Penn Treebank, MATLAB, HPSG, British National Corpus, AnCora, Praat, EAGLES, BAF
2003	2,085	1,991	Timit, WordNet, Penn Treebank, BLEU, BAF, AQUAINT, Pronunciation Dictionary, British National Corpus, HPSG, TAG
2004	3,633	2,695	WordNet, Timit, Penn Treebank, AnCora, Praat, BLEU, British National Corpus, FrameNet, AQUAINT, EuroWordNet
2005	3,453	2,416	WordNet, Timit, BLEU, Penn Treebank, Praat, AQUAINT, GIZA++, MATLAB, Pronunciation Dictionary, ICSI
2006	5,681	3,101	WordNet, Timit, BLEU, Penn Treebank, AnCora, Praat, PropBank, AQUAINT, FrameNet, MATLAB
2007	4,910	2,663	WordNet, BLEU, Timit, Penn Treebank, GIZA++, Praat, MATLAB, SRILM, GALE, Wikipedia
2008	6,582	3,208	WordNet, BLEU, Wikipedia, Timit, Penn Treebank, Praat, AnCora, PropBank, GALE, FrameNet
2009	6,067	2,919	WordNet, BLEU, Wikipedia, Timit, Penn Treebank, Praat, SRILM, GALE, Europarl, GIZA++
2010	8,782	3,547	WordNet, Wikipedia, BLEU, Penn Treebank, Timit, AnCora, GIZA++, MATLAB, Europarl, FrameNet
2011	6,105	2,864	Wikipedia, WordNet, BLEU, Timit, Penn Treebank, GIZA++, MATLAB, SRILM, Praat, Weka
2012	10,097	3,663	Wikipedia, WordNet, BLEU, Timit, Penn Treebank, Praat, Europarl, AnCora, GIZA++, MATLAB
2013	8,874	3,342	Wikipedia, WordNet, BLEU, Timit, Penn Treebank, SRILM, Weka, GIZA++, MATLAB, Praat
2014	10,793	3,663	Wikipedia, WordNet, BLEU, Timit, Penn Treebank, Praat, AnCora, MATLAB, Weka, SRILM
2015	9,932	3,568	Wikipedia, WordNet, Word2Vec, BLEU, Timit, SemEval, MATLAB, Penn Treebank, Praat, Weka
2016	11,303	3,814	Wikipedia, Word2Vec, WordNet, BLEU, Timit, Praat, Penn Treebank, MATLAB, AnCora, Europarl
2017	7,915	3,042	Wikipedia, Word2Vec, BLEU, WordNet, Timit, GloVe, Praat, MATLAB, Penn Treebank, Keras
2018	13,295	4,482	Wikipedia, Word2Vec, BLEU, GloVe, WordNet, Seq2seq, Timit, Penn Treebank, ROUGE, CoreNLP
2019	13,461	5,003	Wikipedia, BLEU, GloVe, Word2Vec, Seq2seq, WordNet, ROUGE, Timit, Penn Treebank, SQuAD
2020	15,652	5,426	Wikipedia, BLEU, GloVe, Word2Vec, Seq2seq, RoBERTa, WordNet, ROUGE, LibriSpeech, SQuAD

Data	Impact factor	Evaluation	Impact factor	Tools	Impact factor
Wikipedia	6,348	BLEU	4,595	Word2Vec	2,536
WordNet	5,803	ROUGE	1,335	Praat	2,123
Timit	3,982			MATLAB	1,915
Penn Treebank	2,786			GloVe	1,863
AnCora	1,694			SRILM	1,375
Europarl	1,405			GIZA++	1,314
SemEval	1,257			Weka	1,220
FrameNet	1,202			Seq2seq	1,162
CoNLL	1,091

	Source is quoted	Source is not quoted
At least one author in both papers	Self-Reuse	Self-Plagiarism
No author in common	Reuse	Plagiarism
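
The two-by-two grid above can be read as a simple decision rule. The following is a minimal Python sketch of that rule, assuming each paper is represented by a set of author names and that a boolean flag says whether the borrowing paper quotes its source. The function name `classify_overlap` and its parameters are hypothetical and only serve as an illustration; they are not part of the NLP4NLP tooling.

```python
def classify_overlap(borrowing_authors, source_authors, source_is_quoted):
    """Return the category from the grid above for one pair of papers.

    borrowing_authors, source_authors: iterables of author names (assumed
        to be already normalized so that identical people compare equal).
    source_is_quoted: True if the borrowing paper cites the source paper.
    """
    # "At least one author in both papers" is a non-empty set intersection.
    shared_author = bool(set(borrowing_authors) & set(source_authors))

    if shared_author:
        return "Self-Reuse" if source_is_quoted else "Self-Plagiarism"
    return "Reuse" if source_is_quoted else "Plagiarism"
```

For example, `classify_overlap({"A. Author"}, {"A. Author", "B. Author"}, True)` falls in the "Self-Reuse" cell, while two papers with disjoint author sets and no citation fall in the "Plagiarism" cell.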