Abstract
Scientific publications have become the currency of Academia, hence the concept of ‘publish or perish’. But this has consequences: the amount of existing literature and its rate of proliferation have reached the point where keeping pace is simply impossible. If this is true in general, it becomes a major issue in interdisciplinary fields such as bioethics, where knowing the state of the art in more than one discipline is a concrete necessity. If we accept the idea of building new science on an exhaustive comprehension of existing knowledge, a radical change is needed. Smart iterative search strategies, frequency analysis and text mining, the techniques described in this paper, cannot be a long-term solution, but they might serve as a useful coping strategy.
Keywords: Publications' proliferation, Text mining, Search strategies, Information extraction, Topic tracking, Information science, Information systems management, Information technology, Content analysis, Data mining, Knowledge representation, Information management
1. Introduction
Scientific publication has played a central role in modern science since its very beginning: any result, no matter how good, has little to no value if it is not made public, available to peers to be analysed, discussed, questioned and perhaps also used as a foundation or an instrument for further research. This is the original reason behind scientific publications. Born as a way to disseminate scientific news to a small audience of interested experts (Gotti, 2006), over the last 350 years publishing has evolved into a complex and organized system, characterized by a detailed set of rules (theoretically) developed to guarantee the scientific quality of publications. The publication system, paralleled by bibliometrics, has improperly become a tool for evaluating careers, departments, research projects and so on, often only from a quantitative point of view. The problems generated by purely quantitative approaches to research evaluation are recognised by the Leiden Manifesto for research metrics, one of the most influential documents outlining a set of principles and best practices for scientometrics. Its first principle states that “quantitative evaluation should support qualitative, expert assessment”: support, not replace (Hicks et al., 2015).
This shift of scope, from conveying knowledge to evaluating knowledge, has reshaped the way scientists write publications: the more, the better; the higher the impact factor, the better. Much has been written on this topic, discussing and contesting many different aspects, from cross-field bibliometric comparisons (Bornmann et al., 2008) to the very idea of capturing “quality” through the number and impact factor of publications (Lüscher and Thomas, 2018; Bornmann and Haunschild, 2017; Callaway, 2016). This paper does not intend to deal further with the political aspects of the issue; rather, it offers a set of instruments to manage the biggest consequence of the “publish or perish” rule resulting from this shift of scope: the (over)proliferation of publications and the impossibility of keeping pace with new literature.
This is not mere theoretical speculation: even without considering the phenomena of retracted literature (Brainard and Jia, 2018) and predatory publishing (Bohannon, 2013; Sorokowski et al., 2017), the existence of a relevant body of non-relevant literature (the pun is hard to avoid) is a fact that anyone undertaking, for instance, a systematic review can verify for themselves.
In short, the problem can be outlined as follows: scientific publications are the currency of Academia, hence scientists publish a lot, and not always relevant things (where “a relevant thing” means adding at least a single brick to the building of science) (Binswanger, 2014). Still, once a piece of literature is published, it becomes part of the corpus of knowledge on a certain topic and cannot simply be ignored a priori. This overproliferation has led to the development of a set of “coping strategies” that try to reduce the time needed to keep pace with growing amounts of literature. But a coping strategy is rarely a solution, and often, as I will discuss later, it implies some degree of bias.
2. The case of bioethics
If this process is true in general, it has a special impact on every interdisciplinary field, including bioethics (Eriksson and Helgesson, 2017). Every researcher needs to be knowledgeable about a field in order to contribute to its growth, and bioethics by necessity includes input from many different disciplines, considering more than one perspective on the same phenomenon. For example, bioethically relevant literature on organ donation, a topic that has gathered vast interest over the last 65 years, comes from transplantation medicine, economics, philosophy, law and so on. As a further example, consider some simple queries on PubMed (see Table 1).
Table 1.
PubMed queries. The first query displays the number of new papers indexed in PubMed in a 10-day interval. The second displays the number of new papers indexed in the same 10-day interval mentioning “cancer” in title or abstract. The third displays the number of new papers indexed in the same 10-day interval with “breast neoplasms” as a MeSH indexing term.
Query | Results |
---|---|
("2018/11/01"[PDAT]: "2018/11/10"[PDAT]) | 64857 |
(("2018/11/01"[PDAT]: "2018/11/10"[PDAT])) AND cancer[Title/Abstract] | 6349 |
(("2018/11/01"[PDAT]: "2018/11/10"[PDAT])) AND breast neoplasms[MeSH Terms] | 87 |
Repeating the queries on different dates, but with the same ten-day interval length, does not lead to big differences in numbers: every ten days there are around sixty thousand newly indexed publications, six thousand on the broad topic “cancer” and about one hundred indexed with the specific MeSH term “breast neoplasms”. MeSH, or “Medical Subject Headings”, is a thesaurus created and maintained by the United States National Library of Medicine with the aim of reducing ambiguity in categorizing medical literature. It is organized in 16 categories, further divided into subcategories, resulting in a hierarchical tree structure.
If instead of looking at the last ten days we consider ten years, the scenario becomes truly overwhelming. Figure 1 displays the number of publications per year found on Web of Science with the query TS=("end of life") AND PY=(2007–2017).
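The Table 1 queries can be regenerated for any ten-day window. The sketch below builds the query strings and the corresponding request URLs for NCBI's public E-utilities `esearch` service, where `rettype=count` asks only for the number of hits. The helper names are hypothetical, the code only prints URLs rather than fetching them, and counts obtained today may differ from Table 1 because indexing continues over time.

```python
from datetime import date, timedelta
from typing import Optional
from urllib.parse import urlencode

# Public NCBI E-utilities "esearch" endpoint; this sketch only builds URLs
# and never sends a request.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pdat_window(start: date, days: int = 10) -> str:
    """PubMed publication-date range [PDAT] covering `days` calendar days."""
    end = start + timedelta(days=days - 1)  # both ends of the range are inclusive
    return f'("{start:%Y/%m/%d}"[PDAT]: "{end:%Y/%m/%d}"[PDAT])'


def build_query(start: date, clause: Optional[str] = None, days: int = 10) -> str:
    """Date window alone, or ANDed with a field-restricted clause (as in Table 1)."""
    window = pdat_window(start, days)
    return f"({window}) AND {clause}" if clause else window


def esearch_url(term: str) -> str:
    """URL asking esearch to return only the number of matching records."""
    params = {"db": "pubmed", "term": term, "rettype": "count"}
    return f"{ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    start = date(2018, 11, 1)
    for clause in (None, "cancer[Title/Abstract]", "breast neoplasms[MeSH Terms]"):
        print(esearch_url(build_query(start, clause)))
```

Changing `start` shifts the window while keeping its length fixed, which is the repetition described above.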
Figure 1.
Number of publications per year indexed on Web of Science mentioning “end of life” as TS (topic), with PY (publication year) between 2007 and 2017.
This is a corpus of 18224 papers that we can (arbitrarily) consider “recent”, growing at a steady pace. For other topics typically of interest to bioethicists, for instance “abortion” (n: 19264) or “informed consent” (n: 20862), the amount of literature is of the same order of magnitude.
3. The indexing issue
A precise indexing strategy, assigning unique descriptors to define the topic of a publication, is a functional tool to allow researchers to narrow down a query to include only what is really relevant for a specific research question. In this sense, with more than 25000 terms organized in a hierarchical structure, MeSH indexing is a fundamental instrument to retrieve medical literature discussing very specific topics (Lowe and Octo Barnett, 1994). Its effectiveness, tested, developed and consolidated since 1960, is due to four factors: it is comprehensive, unambiguous, clear and widely accepted.
Unfortunately, there is no such thing as a comprehensive, unambiguous, clear and widely accepted indexing system for bioethical literature. For example, in a systematic review assessing the different methodologies applied in empirical ethics, the authors were able to identify four main methodological categories, which is completely understandable and acceptable in a pluralistic and interdisciplinary field. The surprising (and disorienting) finding is that each one of these four categories includes a plethora of synonymic or semi-synonymic methodologies: 4 in ‘Dialogical Processes’, 3 in ‘Combination of Dialogical and Consultative Processes’, 22 in ‘Consultative Processes’ and 7 in ‘Neither Clearly Dialogical Nor Consultative’ (Davies et al., 2015).
Ambiguity and synonymity are two sides of the same coin, depending on the plastic nature of language and on its not always rigorous use. The notion of “justice” is a good example in this sense. Beauchamp and Childress, for instance, identified at least six families of theories, all of them using the same term, “justice”, and all of them grounded in the same Aristotelian formal principle (“equals must be treated equally”) but resulting in very different material principles, ranging from utilitarian frameworks (justice as maximization of social utility) to Madison Powers and Ruth Faden's well-being theory (justice as guaranteeing every individual the functioning of the six core dimensions of well-being) (Beauchamp and Childress, 2013, 253).
The problem becomes even more relevant in interdisciplinary research. The word “ontology”, for instance, is used by both philosophers and computer scientists, but while philosophers understand it as the metaphysical study of Being in itself, for computer scientists an ontology is “an explicit specification of a conceptualization”, or better, “a specification of a representational vocabulary for a shared domain of discourse – definitions of classes, relations, functions and other objects” (Gruber, 1993; Breitman et al., 2007). When philosophers meet computer scientists and discuss ontologies, preliminary terminological clarifications are of paramount importance.
In a context in which the same word can have different meanings, conceptual clarification is essential, and better indexing can help provide it. Developing a MeSH-like indexing system for bioethics that aims to categorize topics and reduce ambiguity would surely be an indispensable and daunting enterprise that the scholarly community should seriously consider. Nevertheless, the whole effort would likely take several years to reach consensus and implementation, years that will likely see continuous growth in uncategorised literature, or rather, in literature categorised with the current ambiguous systems. A “BeSH tree” would be an instrument for the future, but not a solution for the present.
4. Selection bias
The issue of bias in information retrieval is well known and well debated in the literature, and several authors have proposed taxonomies for the different kinds of bias that can affect research (Song et al., 2010; Booth et al., 2016).
Selection bias is the kind of systematic error that can have the most detrimental impact on a literature review. It is conventionally understood as a form of bias in which “a reviewer selects primary research studies that support his/her prior beliefs” (Booth et al., 2016, 19). In a broader sense, selection bias implies a purposeful selection of the literature included in a study, either a posteriori (as in the aforementioned definition) or even a priori, selecting the sources of information to be used.
Numbers indicate that it is simply impossible to keep pace and read everything on a specific topic, even a quite narrow one; moreover, the lack of an indexing system means that there is no tool to safely filter what is really relevant from what is not. So we face a question: assuming that we consider it ethical to have both a granular understanding and an overall view of the field we want to work in, how can we reduce the amount of non-relevant literature to deal with, without wasting too much time and without losing relevant information? Three pseudo-solutions are usually employed, and all of them are biased to some degree:
- “The newer, the better”. Even if somewhat valid in STEM (Science, Technology, Engineering and Mathematics), this approach is not viable in bioethics. There is no need to embrace a conservative standpoint to recognize a simple fact: plenty of fundamental bioethical literature is “old”, or at least older than ten years. The WMA Declaration of Sydney (1968), the Harvard Report (1968), and the President's Commission report on the Protection of Human Subjects in Biomedical and Behavioral Research (1981) are three clear examples;
- “The most cited, the better” is flawed in principle: it starts a positive feedback loop that marginalizes articles that might be relevant but for some reason did not receive an initial burst of citations at publication (“reputation echo chamber”) (Kim et al., 2017);
- “Follow a specific tradition/approach” is flawed in principle as well: the consequence is the loss of a global perspective on the field (“heritage echo chamber”).
5. Smart iterative search strategies
These three distinct but closely related problems (publication proliferation, poor indexing and selection bias) have a possible common remedy in the application of Smart Iterative Search Strategies (SISS). The overarching idea is quite simple: text mining software can analyse more data than a person can; thus, if properly set up and “fed”, it can reduce selection bias and cope with poor indexing.
Interactive query expansion and interactive query formulation have already been discussed in the literature from a theoretical point of view, and have been successfully applied in different contexts (Liu and Wacholder, 2017; Haunschild et al., 2016): according to Efthimiadis, expanding an initial query with related terms (hierarchically, in the context of MeSH-like trees, or by similarity, using a thesaurus) leads to high user satisfaction in information retrieval (Efthimiadis, 2000). Wacholder, in a more recent review, described the cognitive process of iterative query formulation (QF), intended as an information retrieval activity in which “the information seeker has input from the results of previous searches (from the same session). Basic QF is at the core of iterative QF but the process is modulated by the additional entities and increased complexity of the flow of information” (Wacholder, 2011).
In Wacholder's description, iterative QF and basic QF are presented as activities heavily dependent on the user, who is responsible for crafting the initial search strategy, reviewing the results, and eventually deciding how to modify the initial query. SISS is a set of techniques that aims to offer a practical tool to automate some of these steps. In short, it is a way to analyse large amounts of text in order to refine the initial query, including relevant keywords and yielding more relevant and comprehensive results. Moreover, being based on the application of an algorithm, it is per se less prone to the selection bias that a user could introduce when selecting relevant terms for the expansion of the query.
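Query expansion of the kind Efthimiadis describes can be sketched in a few lines. The minimal illustration below “explodes” a term into all of its narrower terms (PubMed does this by default for MeSH descriptors), adds synonyms for each, and joins the result into a single OR group. The hierarchy and synonym ring are hypothetical toy data, not extracts from MeSH or any real thesaurus, and the function names are illustrative.

```python
from typing import Dict, List

# Toy resources (hypothetical): a term -> narrower-terms hierarchy and a
# synonym ring. Neither is taken from MeSH or any other real thesaurus.
HIERARCHY: Dict[str, List[str]] = {
    "consent": ["informed consent", "presumed consent"],
    "presumed consent": ["opt-out"],
}
SYNONYMS: Dict[str, List[str]] = {
    "informed consent": ["consent, informed"],
    "opt-out": ["opt out", "opting out"],
}


def narrower(term: str) -> List[str]:
    """All descendants of `term`, depth first (i.e. 'exploding' the term)."""
    result: List[str] = []
    for child in HIERARCHY.get(term, []):
        result.append(child)
        result.extend(narrower(child))
    return result


def expand(term: str) -> List[str]:
    """The term, its descendants and their synonyms, without duplicates."""
    expanded: List[str] = []
    for t in [term] + narrower(term):
        for variant in [t] + SYNONYMS.get(t, []):
            if variant not in expanded:
                expanded.append(variant)
    return expanded


def to_boolean(terms: List[str]) -> str:
    """Quoted phrases joined into a single OR group."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


if __name__ == "__main__":
    print(to_boolean(expand("presumed consent")))
```

With the toy data, the call above prints `("presumed consent" OR "opt-out" OR "opt out" OR "opting out")`.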
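One step of the loop that lends itself to automation is proposing candidate terms for the next iteration. The sketch below assumes the retrieved abstracts are available as plain strings. It counts the words that are neither stopwords nor already in the query and returns the most frequent ones. It illustrates the idea only and is not the tool used in this paper (the case study below relies on Voyant Tools). The user still decides which suggestions to accept.

```python
import re
from collections import Counter
from typing import Iterable, List, Set


def tokenize(text: str) -> List[str]:
    """Lower-cased words of two or more letters (hyphenated words kept whole)."""
    return re.findall(r"[a-z][a-z-]+", text.lower())


def suggest_terms(abstracts: Iterable[str], query_terms: Set[str],
                  stopwords: Set[str], k: int = 10) -> List[str]:
    """Most frequent words in the retrieved abstracts that are not yet in the query."""
    counts: Counter = Counter()
    for abstract in abstracts:
        counts.update(w for w in tokenize(abstract)
                      if w not in stopwords and w not in query_terms)
    return [word for word, _ in counts.most_common(k)]


if __name__ == "__main__":
    abstracts = [
        "Gene therapy versus enhancement: a moral distction.",
        "Enhancement and therapy in reproduction ethics.",
        "Germline editing, reproduction and therapy.",
    ]
    print(suggest_terms(abstracts, {"enhancement", "gene", "ethics"},
                        {"and", "in", "versus"}, k=2))
```

Accepted terms would then be added to the query, the new results exported, and the loop repeated.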
Text mining (and computational linguistics in general) has entered the spotlight, being widely used for security purposes, biomedical applications, market analysis and tracking political discourse (Gupta and Lehal, 2009), and there is increasing consensus on its application as an instrument to speed up systematic reviews (or, at least, to keep pace with published literature) (Ananiadou et al., 2009; Thomas et al., 2011; O'Mara-Eves et al., 2015). Moreover, thanks to open source web-based instruments such as Voyant Tools (‘About - Voyant Tools Help’ 2018), these techniques have become easily and widely available. Voyant Tools, originally conceived to “enhance reading through lightweight text analytics such as word frequency lists, frequency distribution plots” (Klein et al., 2015), is among the oldest and most widely used tools supporting interactive exploration of large linguistic corpora.
Nevertheless, since computational linguistics is a field of its own, with a growing body of literature and techniques of increasing complexity, approaching it from the point of view of a bioethicist with little to no formal training in computer science might seem daunting. But it is not: some of these techniques are rather easy to integrate into one's everyday research workflow.
Let us assume, as a case study, that we are interested in the ethical aspects of human genetic enhancement. A quick search for TITLE-ABS-KEY(human AND enhancement AND ("gene" OR "genes" OR genet*) AND ethic*) on Scopus yields a considerable but not enormous number of results (n: 688), providing a good test case. Scopus (like many other databases) allows us to sort the results chronologically and export their abstracts, which can then be fed into Voyant Tools (see Table 2).
Table 2.
Voyant Tools frequency analysis, first 30 results. The first column is the word's frequency rank in the corpus, the second is the word, the third is its number of occurrences.
# | Word | Count |
---|---|---|
1 | genetic | 2644 |
2 | human | 1901 |
3 | enhancement | 1790 |
4 | ethics | 1096 |
5 | gene | 986 |
6 | genetics | 867 |
7 | therapy | 779 |
8 | research | 730 |
9 | medical | 718 |
10 | inward | 688 |
11 | social | 673 |
12 | health | 622 |
13 | ethical | 608 |
14 | humans | 592 |
15 | engineering | 538 |
16 | reproduction | 532 |
17 | moral | 465 |
18 | approach | 423 |
19 | biomedical | 410 |
20 | technology | 339 |
21 | life | 311 |
22 | public | 296 |
23 | cell | 282 |
24 | bioethics | 281 |
25 | risk | 279 |
26 | germ | 278 |
27 | care | 272 |
28 | rights | 248 |
29 | policy | 239 |
30 | reproductive | 238 |
It is interesting, at this point, to examine the relative frequencies and the sparkline trends of the words in the corpus, looking for other concepts emerging beneath the surface of the initial query and for how often they have been mentioned at different points in time. After adding some custom stopwords (s2.0, eid, https, http, md5, partnerid, record.uri, www.scopus.com, doi, article, journal, keyword, index, author, abstract) to get rid of the noise introduced by the export, we learn, without any prior knowledge of the field, that the question is considered “medical”, or at least related to health, strictly connected with social issues, and related to reproduction.
A final detail: as is well known in the literature, there is a stark contrast between the concepts of enhancement and therapy, often presented and discussed as opposites, embedding different moral values and implying different moral duties (The President’s Council on Bioethics, 2003). Voyant Tools can show how the frequency of a word varies across different segments of the document, and it allows the use of wildcards (e.g., enhance* or therap*). Comparing the variation in frequency of these two clusters of words over time, we can clearly see that between the late ’90s and the early 2000s (segment 2) the concept of therapy became less discussed in this field (see Figure 2).
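The stopword step described above can be reproduced outside Voyant Tools. The minimal sketch below assumes the exported records are concatenated into one string. It uses a tokenizer that keeps dotted and hyphenated tokens such as `record.uri` or `s2.0` whole, so that the export-noise stopwords listed above can actually match, and reports each word's count and its share of the retained tokens. The small general stoplist is an assumption; Voyant Tools applies its own lists, so its figures may differ.

```python
import re
from collections import Counter
from typing import List, Set, Tuple

# Export noise listed in the text, plus a tiny general stoplist (an assumption).
EXPORT_NOISE = {"s2.0", "eid", "https", "http", "md5", "partnerid", "record.uri",
                "www.scopus.com", "doi", "article", "journal", "keyword",
                "index", "author", "abstract"}
GENERAL = {"the", "of", "and", "in", "to", "a", "is", "for", "on", "with"}

# Runs of letters/digits, optionally joined by '.' or '-' ("record.uri", "s2.0").
TOKEN = re.compile(r"[a-z0-9]+(?:[.-][a-z0-9]+)*")


def word_frequencies(text: str, stopwords: Set[str]) -> List[Tuple[str, int, float]]:
    """(word, count, share of retained tokens), most frequent first."""
    kept = [w for w in TOKEN.findall(text.lower()) if w not in stopwords]
    counts = Counter(kept)
    return [(w, c, c / len(kept)) for w, c in counts.most_common()]


if __name__ == "__main__":
    sample = "Genetic enhancement and genetic therapy. doi record.uri www.scopus.com"
    for word, count, share in word_frequencies(sample, EXPORT_NOISE | GENERAL)[:30]:
        print(f"{word:15s} {count:5d} {share:.3f}")
```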
Figure 2.
“Therapy” (purple line) vs. “Enhancement” (blue line) in genetic enhancement literature. Results from the Scopus query have been downloaded as text and divided by year of publication. Each segment is one year. Each word has been truncated with a wildcard (i.e., therap* and enhance*) in order to show the relative frequency of the semantic groups in the segments.
This comes as no surprise to anyone familiar with the content of the report cited above. But it is of great value to see the change in the debate quantitatively, without having to read the texts. In the same way, it is possible to compare the trends of any word or cluster of words.
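The comparison in Figure 2 amounts to counting, for each segment (here, one publication year), the tokens matching each wildcard and dividing by the segment's total number of tokens. The sketch below assumes the records are `(year, text)` pairs and treats a trailing `*` as “any word starting with this stem”. It approximates the idea and is not Voyant Tools' implementation. Note that `enhance*` does not match “enhancing”, one reason wildcard choices deserve care.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

WORD = re.compile(r"\w+")


def wildcard(pattern: str) -> re.Pattern:
    """'therap*' -> any word starting with 'therap'; no '*' -> exact word."""
    stem = re.escape(pattern.rstrip("*"))
    tail = r"\w*" if pattern.endswith("*") else ""
    return re.compile(rf"\b{stem}{tail}\b", re.IGNORECASE)


def relative_trends(records: Iterable[Tuple[int, str]],
                    patterns: List[str]) -> Dict[int, Dict[str, float]]:
    """For each year: matches of each pattern divided by that year's token count."""
    compiled = {p: wildcard(p) for p in patterns}
    tokens: Dict[int, int] = defaultdict(int)
    hits: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for year, text in records:
        tokens[year] += len(WORD.findall(text))
        for p, rx in compiled.items():
            hits[year][p] += len(rx.findall(text))
    return {year: {p: (hits[year][p] / tokens[year] if tokens[year] else 0.0)
                   for p in patterns}
            for year in sorted(tokens)}


if __name__ == "__main__":
    records = [(1998, "Gene therapy and therapeutic trials"),
               (2003, "Enhancement, enhancing and therapy")]
    print(relative_trends(records, ["therap*", "enhance*"]))
```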
At this point, after understanding the general trends in the field and identifying an interesting question, the process can be iterated, refining the query, exporting a (smaller) number of abstracts and exploring them with the same text mining techniques (see Figure 3).
Figure 3.
The five steps of Smart Iterative Search Strategies, from the definition of a preliminary query to its refinement through frequency analysis.
6. Full text frequency analysis and text mining
If frequency analysis is helpful in the preliminary phase of a new research project, or when approaching a new topic, it can also be really useful at a later phase, when the relevant literature for a specific topic or project has already been identified, is available in full text, and has to be assessed. In this case there are two traditional approaches, and again both are biased to some degree:
- “First in, first out, and read everything”: feasible, but the risk is ending up with a massive amount of disconnected notes, precise on each single paper but lacking an overall picture of the concepts discussed and of their evolution over time;
- “Read the abstract first, then read the paper only if the abstract seems relevant”: there is a considerable risk of arbitrarily missing relevant studies just because the abstract is not appealing enough.
But what if we could have both a general and a granular understanding of the literature in our corpus, seeing at the same time the big overarching trends and the small but fundamental details? If for the latter it is (still) indispensable to allocate some quality time to the pleasure of reading, for the former a solution is provided by applying frequency analysis techniques to full-text articles.
An interesting case study in this sense is offered by the literature on organ donation: a broad, widely discussed topic with a lot of literature coming from different fields and ranging from theoretical positions to empirical studies. A recent report on the influence of consent models, donor registries and family decision on organ donation rates, prepared for the Swiss Federal Office of Public Health (Christen et al., 2018), was an excellent opportunity to test the system. After defining a precise and comprehensive search strategy by means of SISS, we downloaded all the retrieved papers and fed them into MaxQDA (Woolf and Silver, 2017), a program designed for qualitative research and coding that recently introduced some easy and useful functions for frequency analysis.
The first pass was plain frequency analysis, which can be performed on single words or on pairs/triplets of words (see Table 3). This type of analysis can help us look for emergent concepts and define further exploration strategies. For instance, in this case we had a first intuition that opt-out systems are much more discussed (caveat: discussed does not mean favoured!) than opt-in systems.
Table 3.
MaxQDA frequency analysis, pairs of words, first 10 results.
Word combination | Frequency (#) | Frequency (%) | Present in documents (#) | Present in documents (%) |
---|---|---|---|---|
organ donation | 1648 | 1.66 | 67 | 98.53 |
organ donor | 438 | 0.44 | 49 | 72.06 |
presume consent | 412 | 0.42 | 45 | 66.18 |
opt-out system | 377 | 0.38 | 50 | 73.53 |
opt out | 297 | 0.3 | 45 | 66.18 |
donation rate | 279 | 0.28 | 41 | 60.29 |
http www | 274 | 0.28 | 49 | 72.06 |
their organ | 230 | 0.23 | 42 | 61.76 |
potential donor | 223 | 0.22 | 43 | 63.24 |
does not | 190 | 0.19 | 45 | 66.18 |
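A word-pair table like Table 3 can be approximated as follows, assuming each document is a plain-text string. The percentage columns here are computed against all word pairs in the corpus and against the number of documents, respectively; how MaxQDA defines its percentages is not stated above, so treat this as an assumption. Table 3 also suggests that MaxQDA lemmatized the words (“presume consent”), which this sketch does not do.

```python
import re
from collections import Counter
from typing import List, Tuple

WORD = re.compile(r"[a-z]+(?:-[a-z]+)*")  # keeps "opt-out" as one token


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Contiguous word n-grams joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_table(docs: List[str], n: int = 2,
                top: int = 10) -> List[Tuple[str, int, float, int, float]]:
    """Rows of (n-gram, count, % of all n-grams, documents containing it, % of documents)."""
    freq: Counter = Counter()
    present: Counter = Counter()
    total = 0
    for text in docs:
        grams = ngrams(WORD.findall(text.lower()), n)
        freq.update(grams)
        present.update(set(grams))  # each document counted once per n-gram
        total += len(grams)
    return [
        (gram, count, 100 * count / total, present[gram], 100 * present[gram] / len(docs))
        for gram, count in freq.most_common(top)
    ]


if __name__ == "__main__":
    docs = ["Organ donation and organ donation rates",
            "Presumed consent increases organ donation"]
    for row in ngram_table(docs, n=2, top=3):
        print(row)
```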
Dictionary-based frequency analysis is an evolution of frequency analysis: we might know, from familiarity with the field or from a preliminary frequency analysis, that some concepts can be expressed in more than one way (e.g., “opt in”, “opt-in” or “opting in”). Dictionary-based frequency analysis addresses this issue: defining a list of synonyms or semi-synonyms allows us to aggregate all the possible variations of a concept and count them together. It is important to keep in mind that compiling such a list is a delicate task and requires some familiarity with the topic and with the lexicon used to discuss it. For example, failing to include the word “boyfriend” among the synonyms and semi-synonyms of “partner” will introduce another source of bias.
The results, i.e., the overall frequency of the words of a dictionary, can be shown aggregated for an entire corpus or for each single paper. While the first view is of great utility in cases like opt-in vs opt-out, the second is extremely useful (especially combined with the basic filtering and sorting tools of Excel or similar software) to identify at a glance the literature that is likely to be most important for understanding a specific problem in a given corpus.
Table 4 is an example of the results obtainable with dictionary-based frequency analysis. First, a query on shared decision making in young hemato-oncologic patients was defined by means of SISS; then all the literature was retrieved; then a dictionary was built for each relevant category (autonomy, responsibility, patient, physician, nurse, family); finally, the dictionaries were used for the frequency analysis. From the data we know, for instance, that the concept of responsibility is more debated than autonomy, and that the role of physicians is less debated than the role of families, but more than the role of nurses.
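Dictionary-based counting can be sketched with one regular expression per category. The dictionaries below are hypothetical and deliberately tiny; as stressed above, real ones require familiarity with the lexicon of the field. The function returns per-paper counts plus a TOTAL row, matching the two views just described. Variants are matched as whole phrases, case-insensitively.

```python
import re
from typing import Dict, List

# Hypothetical dictionaries: each category lists its synonyms/variants.
DICTIONARIES: Dict[str, List[str]] = {
    "opt-in": ["opt in", "opt-in", "opting in"],
    "opt-out": ["opt out", "opt-out", "opting out", "presumed consent"],
}


def compile_dictionary(variants: List[str]) -> re.Pattern:
    """One case-insensitive, whole-phrase pattern matching any variant."""
    # Longest variants first, so the longer one wins when two start at the same position.
    ordered = sorted(variants, key=len, reverse=True)
    return re.compile("|".join(rf"\b{re.escape(v)}\b" for v in ordered), re.IGNORECASE)


def dictionary_counts(docs: Dict[str, str],
                      dictionaries: Dict[str, List[str]]) -> Dict[str, Dict[str, int]]:
    """Per-document counts for each category, plus an aggregated TOTAL row."""
    patterns = {cat: compile_dictionary(v) for cat, v in dictionaries.items()}
    table = {
        name: {cat: len(rx.findall(text)) for cat, rx in patterns.items()}
        for name, text in docs.items()
    }
    table["TOTAL"] = {cat: sum(row[cat] for row in table.values()) for cat in patterns}
    return table


if __name__ == "__main__":
    docs = {"p1": "Opt-out systems and presumed consent. Opting in is rare.",
            "p2": "An opt in register."}
    for name, row in dictionary_counts(docs, DICTIONARIES).items():
        print(name, row)
```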
Table 4.
MaxQDA frequency analysis, dictionary based, results per paper, first 10 results. The first column identifies the paper. Columns 2–7 display the absolute frequency of words contained in each one of the 6 dictionaries.
Name | autonomy | responsibility | patient | physician | nurse | family |
---|---|---|---|---|---|---|
TOTAL | 175 | 450 | 14964 | 5768 | 2246 | 6084 |
Sainio, Lauri 2003 | 2 | 1 | 238 | 58 | 86 | 39 |
Tang, Lee 2004 | 7 | 1 | 314 | 74 | 17 | 113 |
El Turabi, Abel et al. 2013 | 1 | 1 | 321 | 10 | 2 | 17 |
Shepherd, Woodgate 2011 | 0 | 6 | 14 | 15 | 53 | 186 |
Knopf, Hornung et al. 2008 | 3 | 11 | 124 | 88 | 7 | 17 |
Langbecker, Ekberg et al. 2016 | 0 | 7 | 115 | 29 | 134 | 25 |
Ishibashi, Ueda et al. 2010 | 1 | 1 | 82 | 30 | 49 | 146 |
Cohen, Botti 2015 | 1 | 1 | 299 | 25 | 54 | 33 |
Trarieux-Signol, Bordessoule et al. 2018 | 12 | 1 | 291 | 57 | 0 | 91 |
Carey, Anderson et al. 2012 | 0 | 3 | 130 | 35 | 4 | 25 |
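The “filter and sort” step does not require a spreadsheet. The sketch below takes the per-paper counts from Table 4 (first ten papers only) and ranks papers by one category, dropping those below a minimum count. The function name and threshold are illustrative choices.

```python
from typing import Dict, List, Tuple

# Per-paper counts copied from Table 4 (first ten papers only).
COLUMNS = ("autonomy", "responsibility", "patient", "physician", "nurse", "family")
ROWS = {
    "Sainio, Lauri 2003": (2, 1, 238, 58, 86, 39),
    "Tang, Lee 2004": (7, 1, 314, 74, 17, 113),
    "El Turabi, Abel et al. 2013": (1, 1, 321, 10, 2, 17),
    "Shepherd, Woodgate 2011": (0, 6, 14, 15, 53, 186),
    "Knopf, Hornung et al. 2008": (3, 11, 124, 88, 7, 17),
    "Langbecker, Ekberg et al. 2016": (0, 7, 115, 29, 134, 25),
    "Ishibashi, Ueda et al. 2010": (1, 1, 82, 30, 49, 146),
    "Cohen, Botti 2015": (1, 1, 299, 25, 54, 33),
    "Trarieux-Signol, Bordessoule et al. 2018": (12, 1, 291, 57, 0, 91),
    "Carey, Anderson et al. 2012": (0, 3, 130, 35, 4, 25),
}
TABLE_4: Dict[str, Dict[str, int]] = {
    name: dict(zip(COLUMNS, counts)) for name, counts in ROWS.items()
}


def rank(table: Dict[str, Dict[str, int]], category: str,
         min_count: int = 1) -> List[Tuple[str, int]]:
    """Papers with at least `min_count` hits in `category`, most hits first."""
    rows = [(name, counts[category]) for name, counts in table.items()
            if counts[category] >= min_count]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, hits in rank(TABLE_4, "responsibility", min_count=3):
        print(f"{hits:3d}  {name}")
```

Ranked this way, the papers most likely to discuss responsibility surface first, which is where a close reading could start.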
As a last treat, frequency analysis can be narrowed down further using autocoding, a MaxQDA feature originally developed to speed up qualitative research. In short, autocoding divides the text into sentences, looks for the presence of a word contained in the dictionary in each sentence, and, if one of these words is found, tags that sentence with the name of the category the word belongs to. As a practical example, given a dictionary like “hair = (hair, eyebrow, sideburn, eyelash, moustache, beard, wig)” and a sentence like “This morning I forgot to shave my beard”, this sentence would be autocoded as “hair”. This way it is possible to build a set of subcorpora containing all the sentences that contain a specific set of keywords, such as all the sentences concerning “autonomy”, all those on “patient”, or even all those mentioning both. These subcorpora can then be explored with the same techniques discussed above, revealing for instance the most common concepts associated with “patient autonomy”. Finally, after “mapping” the overarching themes and building a general understanding of the literature, it is time to “go granular” and proceed with a manual content assessment by reading the papers.
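Sentence-level autocoding can be approximated as follows. This is a simplified sketch, not MaxQDA's algorithm. It assumes a naive sentence splitter (a sentence ends at “.”, “!” or “?” followed by whitespace) and single-word dictionary entries matched as whole words, case-insensitively. The hypothetical `subcorpus` helper returns the sentences coded with all of the requested categories, for example both “autonomy” and “patient”.

```python
import re
from typing import Dict, List, Set, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive splitter: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def autocode(text: str, dictionaries: Dict[str, List[str]]) -> List[Tuple[str, Set[str]]]:
    """Tag each sentence with every category whose words occur in it."""
    vocab = {cat: {w.lower() for w in words} for cat, words in dictionaries.items()}
    coded = []
    for sentence in split_sentences(text):
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        coded.append((sentence, {cat for cat, words in vocab.items() if tokens & words}))
    return coded


def subcorpus(coded: List[Tuple[str, Set[str]]], *categories: str) -> List[str]:
    """Sentences coded with all of the requested categories."""
    wanted = set(categories)
    return [sentence for sentence, codes in coded if wanted <= codes]


if __name__ == "__main__":
    dictionaries = {
        "hair": ["hair", "eyebrow", "sideburn", "eyelash", "moustache", "beard", "wig"],
        "autonomy": ["autonomy", "autonomous"],
        "patient": ["patient", "patients"],
    }
    text = ("This morning I forgot to shave my beard. "
            "Patient autonomy matters. Patients were interviewed.")
    coded = autocode(text, dictionaries)
    print(subcorpus(coded, "autonomy", "patient"))
```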
7. Discussion
The (over)proliferation of scientific literature in general is a problem too big not to be acknowledged, and it is hard to overestimate its impact on an interdisciplinary field such as bioethics, where gathering and understanding information coming from different disciplines is fundamental. If we want to ground future science on existing knowledge, we have two possibilities. The first is to dramatically reduce the amount of published literature, decoupling publishers' revenues from the number of papers published, thus removing incentives to publish “noise” (Aguzzi, 2019), and finding better ways than sheer bibliometric indicators to evaluate academic careers (Binswanger, 2014). The second, as already discussed, is to develop and systematically employ comprehensive, unambiguous, clear and widely accepted indexing systems, modelled on MeSH-like taxonomies.
Both are clearly long-term, hard-to-accomplish solutions that need to be discussed and pursued by the scholarly community. Meanwhile, the methodology described here as “Smart Iterative Search Strategies” (SISS) can be a practical way to “cope with the flood”: to define more refined search strategies, explore search results, get a general sense of the literature captured by a query, and ultimately reduce the number of papers to be downloaded and read without incurring any of the three kinds of selection bias described above.
In this context, full text frequency analysis and text mining are complementary techniques, relatively easy and fast to perform, allowing one to build a preliminary map of concepts and topics discussed in a given corpus that can be used to build a general perspective, a starting point for manual assessment of the content. Nevertheless, there is a caveat: it is important to remember that from the point of view of frequency analysis, assuming that we are interested in concepts such as “nose” and “nice”, the sentences “my nose is nice” and “my nose is not nice” are identical: their meaning is opposite, but they both mention the same concepts. Results obtained with frequency analysis are not final results, but powerful hints about what is going on in large bodies of text.
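The caveat can be made concrete with the two example sentences above: counting the concept words gives identical results for both. The snippet below is a minimal illustration, not a recommended analysis step.

```python
import re
from collections import Counter
from typing import Set


def concept_counts(sentence: str, concepts: Set[str]) -> Counter:
    """Count occurrences of the concept words, ignoring everything else."""
    return Counter(w for w in re.findall(r"[a-z]+", sentence.lower()) if w in concepts)


concepts = {"nose", "nice"}
positive = concept_counts("My nose is nice", concepts)
negative = concept_counts("My nose is not nice", concepts)
print(positive == negative)  # True: opposite meanings, same concept profile
```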
8. Conclusion
Are we really able to know and understand all the literature on a certain topic, keeping pace with new publications? Unless the topic is very narrow, the answer is a clear “no”. This is a less than ideal situation, one that risks turning science into an uncoordinated and chaotic effort.
The issue of literature overproliferation could bring the scientific enterprise itself to a critical point, a “point of no return” where there is a dramatic loss of meaning. An instrument originally introduced as a way to convey knowledge has grown too fast in comparison to our ability to extract meaning from it, becoming a source of noise and a huge drain on time. On the one hand, we definitely need to find a way to limit the growth of non-significant literature, or of literature that has purposes other than conveying knowledge. On the other, we need better strategies to navigate large amounts of text in a fast, efficient and unbiased way.
Smart iterative search strategies, full-text frequency analysis and text mining are not a solution, unlike developing and implementing a MeSH-like indexing system for bioethical literature or finding a structural way to “change the currency of Academia”. Nevertheless, if properly employed, they can be a good working strategy to cope with this massive flow of information.
Declarations
Author contribution statement
Giovanni Spitale: Conceived and designed the experiments; Performed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper.
Funding statement
This work was supported by a Krebsforschung Schweiz grant.
Competing interest statement
The author declares no conflict of interest.
Additional information
No additional information is available for this paper.
References
- About - Voyant Tools Help. 2018. https://voyant-tools.org/docs/#!/guide/about Accessed November 27.
- Aguzzi Adriano. “Broken access” publishing corrodes quality. Nature. 2019;570(June):139. doi: 10.1038/d41586-019-01787-2.
- Ananiadou Sophia, Rea Brian, Okazaki Naoaki, Procter Rob, Thomas James. Supporting systematic reviews using text mining. Soc. Sci. Comput. Rev. 2009;27(4):509–523.
- Beauchamp Tom L., Childress James F. Principles of Biomedical Ethics. 7th ed. Oxford University Press; 2013.
- Binswanger Mathias. Excellence by nonsense: the competition for publications in modern science. In: Bartling Sönke, Friesike Sascha, editors. Opening Science: the Evolving Guide on How the Internet Is Changing Research, Collaboration and Scholarly Publishing. Cham: Springer International Publishing; 2014. pp. 49–72.
- Bohannon John. Who’s afraid of peer review? Science. 2013;342(6154):60–65. doi: 10.1126/science.2013.342.6154.342_60.
- Booth Andrew, Sutton Anthea, Papaioannou Diana. Systematic Approaches to a Successful Literature Review. 2nd ed. SAGE; 2016. https://www.researchgate.net/publication/235930866_Systematic_Approaches_to_a_Successful_Literature_Review
- Bornmann Lutz, Haunschild Robin. Does evaluative scientometrics lose its main focus on scientific quality by the new orientation towards societal impact? Scientometrics. 2017;110(2):937–943. doi: 10.1007/s11192-016-2200-2.
- Bornmann Lutz, Mutz Rüdiger, Neuhaus Christoph, Daniel Hans-Dieter. Citation counts for research evaluation: standards of good practice for analyzing bibliometric data and presenting and interpreting results. Ethics Sci. Environ. Polit. 2008;8(1):93–102.
- Brainard Jeffrey, You Jia. What a massive database of retracted papers reveals about science publishing’s “death penalty”. Science | AAAS; Oct. 25, 2018. https://www.sciencemag.org/news/2018/10/what-massive-database-retracted-papers-reveals-about-science-publishing-s-death-penalty
- Breitman Karin Koogan, Casanova Marco Antonio, Truszkowski Walter. Ontology in computer science. In: Semantic Web: Concepts, Technologies and Applications. NASA Monographs in Systems and Software Engineering. London: Springer; 2007. pp. 17–34.
- Callaway Ewen. Beat it, impact factor! Publishing elite turns against controversial metric. Nature News. 2016;535(7611):210. doi: 10.1038/nature.2016.20224.
- Christen Markus, Baumann Holger, Spitale Giovanni. Der Einfluss von Zustimmungsmodellen, Spenderegistern und Angehörigen-Entscheid auf die Organspende. Eine Beurteilung der aktuellen Literatur. Interner Bericht für das Bundesamt für Gesundheit zu Fragen des Hirntods und der Organspende nach Kreislaufstillstand [The influence of consent models, donor registries and family decision on organ donation. An evaluation of the current literature. Internal report for the Swiss Federal Office of Public Health on questions of brain death and organ donation after circulatory arrest]. 2018.
- Davies Rachel, Ives Jonathan, Dunn Michael. A systematic review of empirical bioethics methodologies. BMC Med. Ethics. 2015;16(1):15. doi: 10.1186/s12910-015-0010-3.
- Efthimiadis Efthimis N. Interactive query expansion: a user-based evaluation in a relevance feedback environment. J. Am. Soc. Inf. Sci. 2000;51(11):989–1003.
- Eriksson Stefan, Helgesson Gert. The false academy: predatory publishing in science and bioethics. Med. Healthc. Philos. 2017;20(2):163–170. doi: 10.1007/s11019-016-9740-3.
- Gotti Maurizio. Disseminating early modern science: specialized news discourse in the Philosophical Transactions. In: Brownlees Nicholas, editor. News Discourse in Early Modern Britain: Selected Papers of CHINED 2004. Peter Lang; 2006. pp. 41–70.
- Gruber Thomas R. A translation approach to portable ontology specifications. Knowl. Acquis. 1993;5(2):199–220.
- Gupta Vishal, Lehal Gurpreet S. A survey of text mining techniques and applications. J. Emerg. Technol. Web Intell. 2009;1(1).
- Haunschild Robin, Bornmann Lutz, Marx Werner. Climate change research in view of bibliometrics. PLoS One. 2016;11(7). doi: 10.1371/journal.pone.0160393.
- Hicks Diana, Wouters Paul, Waltman Ludo, de Rijcke Sarah, Rafols Ismael. Bibliometrics: the Leiden Manifesto for research metrics. Nature News. 2015;520(7548):429. doi: 10.1038/520429a.
- Kim Lanu, West Jevin, Stovel Katherine. Echo chambers in science? American Sociological Association (ASA) Annual Meeting, August 2017. 2017. https://jevinwest.org/papers/
- Klein Lauren F., Eisenstein Jacob, Sun Iris. Exploratory thematic analysis for digitized archival collections. Digital Scholarship in the Humanities. Oxford Academic; 2015;30(Suppl_1):i130–41.
- Liu Ying-Hsang, Wacholder Nina. Evaluating the impact of MeSH (medical subject headings) terms on different types of searchers. Inf. Process. Manag. 2017;53(4):851–870.
- Lowe Henry J., Barnett G. Octo. Understanding and using the medical subject headings (MeSH) vocabulary to perform literature searches. J. Am. Med. Assoc. 1994;271(14):1103–1108.
- Lüscher Thomas F. Measuring the unmeasurable: assessing the quality of science and scientists. Eur. Heart J. 2018;39(20):1765–1769. doi: 10.1093/eurheartj/ehy295.
- O’Mara-Eves Alison, Thomas James, McNaught John, Miwa Makoto, Ananiadou Sophia. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst. Rev. 2015;4(1):5. doi: 10.1186/2046-4053-4-5.
- Song F., Parekh S., Hooper L., Loke Y.K., Ryder J., Sutton A.J., Hing C., Kwok C.S., Pang C., Harvey I. Dissemination and publication of research findings: an updated review of related biases. Health Technol. Assess. 2010;14(8). doi: 10.3310/hta14080.
- Sorokowski Piotr, Kulczycki Emanuel, Sorokowska Agnieszka, Pisanski Katarzyna. Predatory journals recruit fake editor. Nature News. 2017;543(7646):481. doi: 10.1038/543481a.
- The President’s Council on Bioethics. Beyond Therapy: Biotechnology and the Pursuit of Happiness. 2003. https://bioethicsarchive.georgetown.edu/pcbe/reports/beyondtherapy/
- Thomas James, McNaught John, Ananiadou Sophia. Applications of text mining within systematic reviews. Res. Synth. Methods. 2011;2(1):1–14. doi: 10.1002/jrsm.27.
- Wacholder Nina. Interactive query formulation. Annu. Rev. Inf. Sci. Technol. 2011;45(1):157–196.
- Woolf Nicholas H., Silver Christina. Qualitative Analysis Using MAXQDA: the Five-Level QDA™ Method. Routledge; 2017.