Abstract
We read with considerable interest the study by Gusenbauer and Haddaway (Gusenbauer and Haddaway, 2020, Research Synthesis Methods, doi:10.1002/jrsm.1378) comparing the systematic search qualities of 28 search systems, including Google Scholar (GS) and PubMed. GS and PubMed are the two most popular free academic search tools in biology and chemistry, with GS being the most widely used search tool in the world. Academics who use GS as their principal system for literature searches may be unaware of the research enumerating five critical features for scientific literature tools, research that greatly influenced Gusenbauer and Haddaway's 2020 study. Using this list of features as the framework for a targeted comparison between just GS and PubMed, we found stark differences that overwhelmingly favor PubMed. In this comment, we show that comparing the characteristics of the two search tools strikingly spotlights features that are particularly useful in one tool but missing in the other. One especially popular feature that appears ubiquitously in GS, but not in PubMed, is the forward citation search found under every citation as a clickable Cited by N link. We seek to improve the PubMed search experience using two approaches. First, we request that PubMed add Cited by N links, making them as omnipresent as the GS links. Second, we created an open-source command-line tool, pmidcite, to be used alongside PubMed; it gives researchers information that helps them choose the next paper to examine, analogous to how GS's Cited by N links guide users. Find pmidcite at https://github.com/dvklopfenstein/pmidcite.
Keywords: bibliometrics, manuscripts as topic, MEDLINE, open access publishing, publications, PubMed, reproducibility, search engine
Highlights.
What is already known?
Google Scholar is the most popular search system in the world.
Until Gusenbauer and Haddaway's 2020 paper, there was not a wide‐ranging, detailed study plus explicit advice for choosing a search system appropriate for systematic searches.
What is new?
A method to augment the list of citations returned from a PubMed query with the citation count and scientific influence data for each citation provided by NIH's Open Citation Collection (iCite).
A targeted comparison of Google Scholar and PubMed against the five search criteria recommended by Boeker et al, with Google Scholar's own documentation providing the description of its search support.
Potential impact for RSM readers outside the authors' field:
Draw attention to Gusenbauer and Haddaway's 27 search criteria, which build on Boeker's five search criteria, and to their evaluation of 28 search tools, so that researchers may re-evaluate their own specialized search systems and search methodology.
Make searches more effective and faster by showing researchers how Boeker's five search criteria for evaluating search systems were applied, using the example of PubMed vs Google Scholar.
1. INTRODUCTION
Modern literature reviews are primarily performed using online search engines. 1 , 2 The two most popular free academic search tools commonly used in health studies are PubMed and Google Scholar (GS). 3 Researchers worldwide are drawn to GS as the most common starting point for literature searches 1 , 3 , 4 , 5 because of its intuitive and familiar search interface, 1 , 6 the forward citation search Cited by N link under every document result, the Cite link under every result for downloading a document's citation to bibliographic management software such as EndNote, high citation counts, immense literature coverage, and researcher profile pages. GS's massive citation counts, reflected in the "N" of its Cited by N links, are due to its highly effective web crawlers and to agreements with publishing houses (Data S3, Figure 13).
But the GS search interface has severe deficiencies that make literature searches laborious and, most importantly, unreproducible, and many researchers are unaware of these drawbacks. 7 For example, search results for a given query are dropped from one month to the next, 3 , 8 , 9 , 10 with no documentation of what has been dropped. Additionally, there is no way to download full search results in bulk, 7 , 8 so researchers must click and click and click to page through up to a maximum of 1000 search results, 10 or 20 results at a time (Data S3, Figure 12). And there is no direct access in GS to a paper's digital object identifier (DOI), a unique standardized persistent identifier.
It is important to call out the features and shortcomings of both PubMed and GS following two recent events. First, the National Institutes of Health (NIH) published a paper on October 11, 2019 announcing the NIH Open Citation Collection (NIH‐OCC), a free public citation database with citation data available for download in bulk. 11 Citation records in the NIH‐OCC database are accessible through a set of web and Application Programming Interface (API) tools, collectively called “iCite.” Second, on November 18, 2019 the U.S. National Library of Medicine (NLM) announced that the new PubMed 12 , 13 was available. 14 Highlights of the new PubMed include: a nimble mobile experience from a single responsive website for all screen sizes including mobile phones, tablets, and desktop computers; faster and more comprehensive search response; and advanced search features that GS simply lacks.
While we argue that PubMed is superior to GS in many ways, there is room to improve the literature search user experience in PubMed. We compare the “forward citation search” implementation in GS to that of PubMed, finding that the PubMed user experience can be improved by adding a GS feature to the PubMed Graphical User Interface (GUI). Alternatively, command‐line users can immediately augment their PubMed search results using the pmidcite scripts and library, which download citation data from the NIH‐OCC database using NIH's “iCite” API.
2. SCIENTIFIC SEARCH INTERFACE REQUIREMENTS
Many researchers are unaware that there is more than one type of search, 8 , 15 , 16 , 17 with each search type oriented toward different user goals. Not all search tools are appropriate for all search types. Three types of search are lookup tasks, exploratory search, and systematic search.
Lookup tasks are the most basic kind of search, usually involving a single query to obtain a well‐defined result. An example of a lookup task is searching for a specific paper by entering its title into the query box. GS excels at returning papers when provided with their title, even if there are errors in the query title text. Fellow researchers have complained that PubMed will sometimes not find a title if it is spelled incorrectly. Gehanno et al found that 100% of the 738 papers in their study were found using GS to search for each paper by entering its title into the query box. 18 From this, they concluded that GS could be used in systematic reviews. 18 This conclusion was quickly disputed by Giustini and Boulos whose paper is titled, “Google Scholar Is Not Enough to Be Used Alone for Systematic Reviews.” 19
Exploratory search and systematic search are useful in evidence synthesis. The goal of exploratory search is the acquisition of new knowledge, and it is considered demanding and potentially time-consuming for the researcher. 16 A researcher doing an exploratory search uses a number of queries to iteratively learn about a subject. The queries begin with a rudimentary understanding of the subject matter and become honed as the researcher's knowledge increases through the search process. 15 Search tools like Google are frequently used in exploratory searches because they are made to be "user friendly" to increase user engagement, which benefits Google by enlarging its market. 8 Google, with its user-friendly interface, nearly always returns search results, but there is no way to know which results are missing. Additionally, GS is not designed for systematic searches, where researchers need control over the selection power of the query results.
Systematic search is profoundly different from exploratory search. The goal of systematic search is to catalyze an objective account of the cumulative state of evidence for a specific research question. An example research question is "What is the best treatment for lupus nephritis that was classified as stage IV on a renal biopsy?" 17 A well-founded question addresses a clinical need where there is uncertainty regarding the effects of different interventions, which may vary in practice. 17 The goal is to understand the costs and benefits of various treatments so that together the doctor and patient can make an appropriate choice for their particular situation. Systematic reviews are exacting evidence syntheses featuring numerous rigorous steps documented in method guidelines 20 with the goal of providing an exhaustive synthesis of a well-studied area of research. 8
Cochrane is one of the organizations that conduct systematic reviews. 8 Steps in a Cochrane systematic review include creating the research question, building a team that includes people who have previously done a systematic review, writing or updating a protocol for the review, and having the protocol reviewed. Only after those steps does the systematic search begin, using search tools to attempt to find all published and unpublished literature that may answer the research question. The first and second authors work independently to remove irrelevant results and, upon completion, compare their findings. Many other steps follow, all of which are reviewed, under the umbrella of data synthesis and specialized plots, data interpretation, and data presentation. Finally, the review is written. To learn more, we recommend researchers read "How to write a Cochrane systematic review." 17
High quality literature searches, both systematic and exploratory, are one of the important elements required for the creation of sound scientific evidence. 21 In late October 2013, Boeker et al recommended that a scientific search interface contain five integrated search criteria. The 2013 Boeker guidance greatly influenced the Gusenbauer study, 8 which expanded the Boeker list from five search criteria to 27 for its evaluation of 28 search tools. The requirements for search interfaces are mandatory not only for structured scientific literature retrieval like systematic reviews, but also for any research that needs to provide a comprehensive literature review. 7 We add "Forward citation search" to the Boeker list so that we can evaluate the extremely popular GS implementation of this feature against the PubMed implementation, and we compare PubMed's and GS's support for the search requirements below using the 2013 foundational Boeker advice 7 :
Reproducible search: A reproducible search is a critical quality measure of a systematic review; a well-documented search process allows others to replicate or update a published synthesis search. A reproducible search means that, given the same query, the search returns the same results plus any new results.
Export: Users should be able to export search results in full.
Search history: Histories are needed to create incremental search changes, which are used to selectively focus the search results.
Search strategy documentation: Documentation that instructs researchers how to create original search queries and how to iteratively develop new queries that build upon previous searches.
Search string builder: A builder supports numerous fields, such as author, title, journal, date, and abstract, as well as clinical query filters for categories including therapy and diagnosis.
Forward citation search: Tools that allow researchers to follow the chain of citing papers.
These six criteria synchronize well with other pertinent principles like "The FAIR Guiding Principles for scientific data management and stewardship," which emphasizes automating the discovery of researchers' work through software algorithms by applying a succinct and measurable set of principles to make the work FAIR (Findable, Accessible, Interoperable, Reusable). 22
The availability of fundamental search elements in the search interfaces of both PubMed and GS is summarized in Figure 1, showing that PubMed's search interface fully implements the five recommended Boeker et al search elements, while GS does not. However, GS's implementation of the forward citation search is much more popular than PubMed's implementation due to its heavy use of the Cited by N link.
FIGURE 1.

Scientific search interface requirements. PubMed fully implements Boeker et al's required list of characteristics for systematic search interfaces, while GS's implementation provides minimal support. But PubMed does not implement the extremely popular Cited by N links seen throughout Google Scholar
The next sections contrast PubMed and GS for each of the search requirements. Where the GS documentation describes how it supports the search interface requirements, that documentation is featured in text boxes. Screenshots were taken of all GS documentation featured in this commentary (Data S3).
2.1. Reproducibility of search results
Repeating a search query in PubMed always produces the same previous content in the results, plus the steadily rising number of hits expected from the increasing coverage of the database over time. But when a query is run month to month in GS, numerous researchers have observed sudden jumps, both rising and falling, with large numbers of previous results lost. 8 , 9 , 23
2.2. Search results can be exported in full
Search results in PubMed, even those not displayed on the screen, may be exported in full up to a maximum of 10 000 results. PubMed export formats include short summaries, files for import into citation software (like EndNote), PMIDs, abstracts, or a comma separated values (csv) file of tabular citation data.
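As a concrete illustration of bulk export, the following minimal sketch uses NCBI's public E-utilities esearch service to download the full PMID list for a query in one request. It is our own example, not part of PubMed or pmidcite, and the query string is an illustrative assumption.

    # Minimal sketch: export a full PubMed result set as PMIDs via NCBI E-utilities.
    # The query is illustrative; retmax=10000 mirrors the export ceiling noted above.
    import requests

    ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    params = {
        "db": "pubmed",
        "term": '"systematic review"[Title] AND 2019[dp]',  # example query (assumption)
        "retmax": 10000,
        "retmode": "json",
    }
    result = requests.get(ESEARCH, params=params, timeout=30).json()["esearchresult"]
    pmids = result["idlist"]
    print(f"{result['count']} total hits; exported {len(pmids)} PMIDs")

Because the PMIDs come back as a machine-readable list, the same result set can be re-exported and compared at any time, which also supports the reproducibility requirement discussed above.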
Searches in GS are limited to 1000 results maximum and cannot be exported in bulk (Box 1), as described in their help documentation (Data S3, Figure 1):
BOX 1. GS Search results 24 .
Can I see more than 1000 search results?
Sorry, we can only show up to 1000 results for any particular search query.
Try a different query to get more results.
How do I get bulk access to records in Google Scholar?
Sorry, we are unable to provide bulk access.
Researchers who write a script to download GS search results programmatically quickly discover that their downloads are halted (Box 2) upon reaching an unspecified limit, and then find this on the help page (Data S3, Figure 1):
BOX 2. GS programmatic bulk export 24 .
I wrote a program to download lots of search results, but you blocked my computer from accessing Google Scholar. Can you raise the limit?
Err, no, please respect our robots.txt when you access Google Scholar using automated software. As the wearers of crawler's shoes and webmaster's hat, we cannot recommend adherence to web standards highly enough.
GS explicitly states in their documentation that they will return a maximum of 1000 results for any search query (Box 1). The number of search results appears at the top of the list as “N result” in PubMed and “About N results” in GS. If “N” is less than 1000 results in GS, researchers may think they can copy and paste all “N” results, 20 citations at a time, by clicking, and clicking and clicking, advancing slowly through the search results. But researchers may be surprised to find that even if “N” is less than 1000 in GS, some of the “N” results may be missing. 9
PubMed can display 10, 20, 50, 100, or 200 results at a time on one page. GS can display 10 or 20 results. Clicking "Show more" in PubMed causes another set of results to be appended to the current results on the screen. The previous result sets remain visible by scrolling up, and pressing the browser's back button moves the view to the previous divider without causing any results to disappear.
The “best match” relevance sort ordering in PubMed, described in a recent freely available peer‐reviewed research article, 25 uses a modern machine learning algorithm that is trained with aggregated user searches. The “best match” algorithm uses dozens of features to sort a list of citations, but its developers find that the most important document features are publication year and past usage. Additionally, recently published papers are given extra elevation in the sort list so that they will not be missed. 25
Users can find GS search algorithm components described in a variety of locations, including a 1999 "technical report" 26 on the website for the Stanford Digital Library Technologies Project, which ended in 2004; major updates reported in a 2011 New York Times "Week in Review" piece; 27 and numerous Google patents.
GS favors highly cited papers and ranks them at the top of the sort list 28 so recent papers are more likely to be many pages back, making them harder to find. It is important to understand GS's sort practices to be able to estimate which results over the 1000 maximum were excluded.
2.3. Search history
PubMed records a history of every user search query and that user history is available as an interactive list where previous queries can be chained together and individual queries can be deleted to simplify the list. The full sequence of queries can be downloaded. GS has no similar search history.
2.4. Search strategy documentation
In addition to a comprehensive user guide, PubMed provides training in the form of tutorials, online training modules, quick tours, classes, and handouts. For further support, PubMed allows users to enter a question by clicking on the feedback link always shown at the bottom right corner of the page to bring up a contact form. A real person usually responds within the next business day or two. PubMed plans to move the feedback link to a "Contact Us" link located at the bottom of each web page now that "The New PubMed" is the default interface.
To access GS's contact form, click on questions like these (Box 3, Data S3, Figures 2 and 3):
BOX 3. GS contact 29 .
I have noticed an error in a court opinion you are providing. What I can do to help fix it?
How do I remove a “Cached” (or “View as HTML”) link from your search results?
2.5. Search string builder
The link to PubMed's advanced search is immediately below the main search query box, making access straightforward and efficient. The PubMed advanced search builder guides the user in building queries using more than 30 search fields, Boolean expressions (formed with AND, OR, and NOT), and the chaining of previous queries from the history. Additionally, users can customize the query entered in the query box.
The Boolean operators AND, OR, and NOT are required as it has been found that ranked retrieval alone, such as that found in GS, is not sufficient for a systematic search requiring high recall. 30 High recall ensures that all the expected matches appear in the search results. 31 These features give researchers the ability to fine‐tune which results are included and which are not. 32
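For illustration only (this query is ours, not taken from the Boeker or Gusenbauer studies), a fielded Boolean query for the lupus nephritis question above might be entered in PubMed as:

    ("lupus nephritis"[Title/Abstract]) AND (drug therapy[MeSH Subheading]) NOT (review[Publication Type])

Each parenthesized clause maps to one of PubMed's search fields, and the Boolean operators determine exactly which records are included or excluded, giving the selection control that GS lacks.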
GS's advanced search offers access to only three fields, "authored," "published," and "dated," compared with PubMed's more than 30 search fields. There is no support for full Boolean search, 8 and no ability to string together previous queries. The GS documentation still describes the link to the advanced search as being located immediately to the right of the main GS search box (Data S3, Figure 6), but the link has been moved to a menu icon in the upper left-hand corner of the GS web page (Data S3, Figure 7).
2.6. Forward citation search
PubMed has a forward citation search which can be accessed by opening the PubMed page for a single chosen article (Data S1, Figure 1). If the paper has citations, scrolling to the bottom of the page will show a “Cited by” section (Data S1, Figure 2, red 3) which lists the total number of citing papers in the section header and shows the first few papers in the section body. The full list of citing papers may be downloaded from PubMed in a variety of formats, including text or comma separated values (csv), by clicking the See all cited by articles link (Data S1, Figure 2, red 3a) and pressing the “Save” button (Data S1, Figure 5). But the web page showing the list of citing papers contains no citation count information for any articles on the page (Data S1, Figure 5). To see the citation count of each of the citing papers, the researcher must click on each citing paper one by one to open the individual paper's web page and scroll down to that paper's “Cited By” section, making choosing the next paper to explore a slow and laborious process (Data S1, Figures 1 and 2). We would like to see Cited by N links ubiquitously featured on all citations in a list (Data S1, Figure 6, red boxes). We rate the Forward citation search feature as “Good” rather than “Better” (Figure 1) because the Cited by N links do not appear (Data S1, Figure 5) in PubMed.
In GS, clicking the Cited by N link of a specific paper opens a web page listing the papers that cite it (Data S1, Figures 3 and 4). Each paper in the list has its own Cited by N link (Data S1, Figure 4, red boxes), making it easier to compare the citing papers appearing in the list. Unlike PubMed, there is no way to download the list of all citing papers in bulk. We rate this feature as "Better," even though it is not possible to compare all search results in a single view, because of the usefulness and popularity of the GS Cited by N link.
PubMed is missing the Cited by N link on each paper in a list of papers, a link that is prominently featured in GS (Data S1, Figure 6), causing researchers to be lured toward GS and away from PubMed despite a grueling literature search experience in GS.
2.7. Scientific search feature summary
The advanced features recommended in 2013 7 for an effective, exhaustive, and reproducible systematic review are fully implemented in PubMed, but they were not implemented by GS in 2013, when Boeker et al did their study, and remain unimplemented in 2020. 7 , 8 , 19
And some GS features have made the search process more onerous. In 2008, GS search results could be displayed with 10 to 300 items per page. 33 Today, the display is restricted to either 10 or 20 items per page (Data S3, Figure 12). Showing search results at a maximum of 20 per page rather than 300 per page makes literature search more time-consuming and labor-intensive and reduces a researcher's ability to visualize the search results as a whole. In 2013, Boeker et al concluded that GS was not ready as a searching tool for tasks where structured retrieval methodology is compulsory. 7 Almost a decade later, GS still cannot be considered for such tasks.
3. COVERAGE OF PUBMED AND GOOGLE SCHOLAR
3.1. The coverage of PubMed
PubMed is a search interface and toolset used to access databases like MEDLINE and PubMed Central (PMC) as well as additional content like books and articles published before the 1960s. Over 30.5 million article records are accessible through the PubMed interface (Figure 2). The databases, MEDLINE and PMC, are separate entities whose combined articles comprise 94% of all of the coverage indexed by PubMed (Data S2). MEDLINE is a highly selective database started in the 1960s. PMC, started in 2000, is an open‐access database for full‐text papers that are free of cost to the reader.
FIGURE 2.

Most of the coverage of PubMed is indexed in the MEDLINE database and the PMC database. The coverage of PubMed is shown on the horizontal axis. The top two bars on the vertical axis show two overlapping databases indexed by PubMed. The bottom orange bar indicates PubMed citations not found in the two major databases. About 88.5% of the ∼30.5 million citations accessible through PubMed are in the MEDLINE database (top blue bars) or are about to be added to it (top green bars). The MEDLINE papers that are free full-text and are also indexed in the PMC database (middle blue and green bars) comprise over 68% of papers indexed in PMC. About 5.5% of PubMed papers are only available in PMC (middle brown bar). Almost all of the remaining 6% of full-text papers (bottom orange bar) are behind a paywall. We queried for and downloaded PubMed count data 34 and created the figure with a script available in pmidcite [Colour figure can be viewed at wileyonlinelibrary.com]
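Counts like those behind the figure can be retrieved directly from PubMed. The sketch below is a minimal example of our own (not the script shipped with pmidcite) that asks E-utilities for record counts using PubMed subset filters; the filter strings are assumptions to check against PubMed's documentation.

    # Minimal sketch: query PubMed record counts for coverage subsets via E-utilities.
    # The subset filters ("all[sb]", "medline[sb]", "pubmed pmc[sb]") are assumptions to verify.
    import requests

    ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

    def pubmed_count(term):
        """Return the number of PubMed records matching term (retmax=0 requests the count only)."""
        params = {"db": "pubmed", "term": term, "retmax": 0, "retmode": "json"}
        return int(requests.get(ESEARCH, params=params, timeout=30).json()["esearchresult"]["count"])

    for label, term in [("All of PubMed", "all[sb]"),
                        ("MEDLINE subset", "medline[sb]"),
                        ("Free full text in PMC", "pubmed pmc[sb]")]:
        print(f"{label:22} {pubmed_count(term):,}")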
3.2. The coverage of Google Scholar
While the exact coverage of GS is not known, it is estimated to exceed that of all other currently available search systems, since GS aims to index all scholarly information that is electronically available. 23 This breadth is a principal reason for its standard-setting citation index, which supplies the number "N" in GS's Cited by N forward citation links. The size and scope of GS remain unknown despite having been the subject of sizable research efforts since its creation. 8 , 28
3.3. Journals covered
The GS documentation instructs researchers who want to know if a specific journal is covered to choose a “statistical sample” of articles published by the journal and search for each paper using its title in the search box (Box 4, Data S3, Figure 4):
BOX 4. GS journal coverage 35 .
Which specific journals do you cover?
Ahem, we index papers, not journals. You should also ask about our coverage of universities, research groups, proteins, seminal breakthroughs, and other dimensions that are of interest to users. All such questions are best answered by searching for a statistical sample of papers that has the property of interest – journal, author, protein, etc. Many coverage comparisons are available if you search for [allintitle:”Google Scholar”], but some of them are more statistically valid than others.
In contrast to the GS approach, researchers can download PubMed's complete list of journals currently indexed in MEDLINE and deposited in PMC by following the journals link found on PubMed's home page. If a journal is on MEDLINE's approved journals list, papers are automatically indexed by PubMed (Data S2).
3.4. Indexing procedure for individual manuscripts
If an individual author manuscript was accepted into a journal that is not on MEDLINE's approved journal list, the requirements to submit the manuscript for deposit into PMC and indexing by PubMed are as follows. The work must be funded by an approved agency, peer‐reviewed, accepted into a journal, and free to access electronically. If these criteria are met, the paper may be submitted to the NIH Manuscript Submission system (NIHMS) for potential indexing in PMC after the paper has been successfully vetted in NIHMS.
In GS, the requirements are that the article must be contained in a pdf file whose contents include a title, a list of authors, and a bibliography, and that the file be uploaded to a website. The effect of GS's regularly crawled data and loose indexing policies is that GS indexes records that are non-academic. For example, the GS policies for "author manuscripts" have resulted in a number of lunch menus stored as pdf files online being indexed as scholarly citations in GS, with various food items listed as authors (Data S3, Figures 10 and 11).
Additionally, some researchers have demonstrated that it is possible to deliberately trick the crawler and inflate the GS citation score. 36
4. FORWARD CITATION SEARCH
The NIH Open Citation Collection (NIH‐OCC), 11 is a free public citation database, which liberates researchers from the constraints of citation data that were previously locked behind barriers, such as citation lists which were not downloadable in bulk. Having full access to citation data could allow researchers to perform more efficient literature searches and analyze publishing trends in biomedicine.
The NIH citation database differs from GS's citation database in coverage, usability, and content. The coverage and content of GS is huge and spans many disciplines, while the coverage of the NIH citation database is currently limited to about 30.5 million manuscripts that were assigned a PubMed ID (PMID). The usability of iCite citations is extremely high because they are accessible for free through the NIH "iCite" web site and downloadable in bulk through the NIH Application Programming Interface (API). Citations cannot be downloaded in bulk from GS.
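As a concrete sketch of pulling a single paper's NIH-OCC citation data through the iCite API (our own minimal example, not pmidcite itself; the response field names reflect our reading of the API and should be verified against its documentation):

    # Minimal sketch: fetch NIH-OCC citation data for one PMID from the iCite API.
    # Field names ("citation_count", "relative_citation_ratio") are assumptions to verify.
    import requests

    ICITE = "https://icite.od.nih.gov/api/pubs"
    paper = requests.get(ICITE, params={"pmids": "25505874"}, timeout=30).json()["data"][0]
    print(paper["pmid"], paper.get("citation_count"), paper.get("relative_citation_ratio"))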
The citation counts in GS will be higher because its index is massive. In our experience, however, the differing citation counts between pmidcite and GS are not a hindrance during exploratory search tasks, because the data needed to decide on the next paper to investigate is how one paper performs relative to another. Even if all citation counts in NIH's "iCite" are scaled down compared to GS, we can still successfully compare the performance of papers relative to one another.
Additionally, in NIH's "iCite," new papers are easy to find and compare, even if they have few citations. Having this data at hand to choose the next paper, even if it is scaled down compared to GS, speeds the exploratory literature search more than having access to the citations that are available in GS but not in other search systems. Once researchers have become familiar with the subject through their exploratory literature search, they may choose to use GS to see what might have been missed.
We have tested the practical usage of a Cited by N link by creating a set of command-line interface (CLI) scripts and a Python library, called pmidcite, which glue PubMed search results and NIH's "iCite" citation data together using PMIDs to provide functionality equivalent to having the Cited by N link. The results were so successful that we hope PubMed can expand this access to all biomedical researchers, even those who do not use a CLI, by adding the Cited by N and N References links (Data S1, Figure 6) to the PubMed GUI as soon as possible.
4.1. NIH's iCite
PubMed does not have a clickable Cited by N link for all the citations, making it difficult to choose the next paper to investigate (Data S1, Figures 5 and 6). But equivalent functionality can be had if, for a selected paper, the researcher downloads from PubMed the full list of citing papers as a list of PMIDs (Data S1, Figure 7) and uploads this PMID list to iCite for analysis (Data S1, Figure 8). The list can then be sorted by Total Citations in the "citations" table (Data S1, Figure 9, red 2) by clicking on the Total Citations column header under the OpenCites table (Data S1, Figure 9, red 3).
But comparing papers using only their numbers of citations is problematic: papers in small niche fields may receive considerably fewer citations than papers in large fields, even though both papers may be of relatively equal scientific influence in their respective communities. The NIH normalizes the number of citations that a paper receives by comparing it to the citation numbers of papers in its co-citation network. This measurement is called the Relative Citation Ratio (RCR) 37 and can be used to sort a list of PMIDs. GS offers only the citation count, and it cannot be used for sorting search results.
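The workflow described above can also be scripted end to end. The sketch below is ours (not the authors' pmidcite code): it fetches the PMIDs citing one paper through E-utilities' pubmed_pubmed_citedin link and ranks them by RCR using iCite; the JSON field names are assumptions to verify against the two APIs' documentation.

    # Sketch: list the papers citing PMID 25505874 and rank them by RCR via iCite.
    import requests

    ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"
    ICITE = "https://icite.od.nih.gov/api/pubs"

    def citing_pmids(pmid):
        """PMIDs of papers citing pmid, via the E-utilities pubmed_pubmed_citedin link."""
        params = {"dbfrom": "pubmed", "db": "pubmed", "id": pmid,
                  "linkname": "pubmed_pubmed_citedin", "retmode": "json"}
        linkset = requests.get(ELINK, params=params, timeout=30).json()["linksets"][0]
        dbs = linkset.get("linksetdbs", [])
        return [str(p) for p in dbs[0]["links"]] if dbs else []

    def icite_records(pmids):
        """Citation counts and RCR values from NIH's iCite (chunk very long PMID lists)."""
        resp = requests.get(ICITE, params={"pmids": ",".join(pmids)}, timeout=30)
        return resp.json()["data"]

    pmids = citing_pmids("25505874")
    citers = icite_records(pmids) if pmids else []
    # Sort by Relative Citation Ratio; very new papers may have no RCR yet.
    for rec in sorted(citers, key=lambda r: r.get("relative_citation_ratio") or 0.0, reverse=True):
        print(rec["pmid"], rec.get("citation_count"), rec.get("relative_citation_ratio"))

Sorting by RCR rather than raw citation count is what lets a niche paper with modest counts surface next to a heavily cited paper from a larger field.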
4.2. pmidcite
Functionality equivalent to having a Cited by N link can be had from the command line, as shown in the following example using a selected paper with a PMID of 25505874. Typing "icite 25505874 --verbose" and pressing Enter causes citation counts to be downloaded from NIH's iCite and a report to be written to the screen or to a file. In the report, the number of citations is seen under the "cit" column for the user-requested paper ("TOP"), for each of its citing papers ("CIT"), and for each of its references ("REF").
The NIH values based on a paper's RCR are also available in pmidcite, but are not shown here. For more information regarding pmidcite and to see options for sorting citing papers from the CLI, see Data S1 and https://github.com/dvklopfenstein/pmidcite.
5. CONCLUSIONS
We hope to raise awareness that there are various types of search, including lookup tasks, exploratory search, and systematic search, and that each search type requires unique search system features. Most researchers use GS, rather than specialized tools like PubMed, as the starting point for their searches because of its intuitive, accessible interface, fast response, and best-in-class coverage. 3 GS excels at simple lookup tasks, like finding a paper by entering its title in the query box. 19
Both GS and PubMed can be used for exploratory searches, but we urge biomedical researchers to use PubMed rather than GS, because PubMed is one of the top recommended primary sources for literature searches of peer-reviewed research in the biomedical sciences and has the search features that GS has lacked since its inception. CLI users, especially, should consider using PubMed with search results annotated using NIH's "iCite" citation data, because this functionality is available immediately through pmidcite.
Searching using the PubMed interface is a satisfying experience, even without the addition of the Cited by N link. But we hope that PubMed will soon add a clickable citation count link to every document entry in the search results list and to each paper listed in the document page sections (similar articles, cited by, references, and suggested reading) so that PubMed GUI users can enjoy the same benefits as CLI users.
GS fails to implement the required search criteria for systematic searches 7 and should not be used as a primary search tool for systematic reviews. 7 However, GS can be used as a secondary source. 8
Finally, we urge researchers to read the Gusenbauer and Haddaway paper to see how their own specialized search tool is or can be evaluated among the 28 extensively used academic search systems in the Gusenbauer study.
CONFLICT OF INTEREST
The authors declare no potential conflict of interests.
AUTHOR CONTRIBUTIONS
D. V. Klopfenstein wrote the manuscript and created the literature search methodology using citation data from NIH's iCite. D. V. Klopfenstein architected, designed, and implemented pmidcite, which allows researchers to use the literature search methodology (Data S1). Will Dampier reviewed the manuscript and provided crucial analytical feedback. All authors reviewed the final manuscript.
Supporting information
Data S1. The Cited by N link, the N References link, and how to use pmidcite.
Data S2. PubMed Coverage.
Data S3. Screenshots of GS containing content used in text boxes.
ACKNOWLEDGMENTS
We thank Carol Berman for her probing questions and evocative comments. We thank Robert Link and Phillip Palmer for sharing their detailed experience and impressions of PubMed and Google Scholar. We thank Phillip Palmer for alerting us that GS's scholarly coverage includes lunch menus authored by food items.
Klopfenstein DV, Dampier W. Commentary to Gusenbauer and Haddaway 2020: Evaluating retrieval qualities of Google Scholar and PubMed . Res Syn Meth. 2021;12:126–135. 10.1002/jrsm.1456
DATA AVAILABILITY STATEMENT
The software library that annotates PubMed search results with citation data downloaded from the NIH Open Citation Collection is openly available at https://github.com/dvklopfenstein/pmidcite.
REFERENCES
1. Hemminger BM, Lu D, Vaughan KTL, Adams SJ. Information seeking behavior of academic scientists. J Am Soc Inf Sci Technol. 2007;58(14):2205-2225. doi:10.1002/asi.20686
2. Duke L. College Libraries and Student Culture: What We Now Know. Chicago, IL: American Library Association; 2011.
3. Nicholas D, Boukacem-Zeghmouri C, Rodríguez-Bravo B, et al. Where and how early career researchers find scholarly information. Learn Publ. 2017;30(1):19-29. doi:10.1002/leap.1087
4. Athukorala K, Hoggan E, Lehtiö A, Ruotsalo T, Jacucci G. Information-seeking behaviors of computer scientists: challenges for electronic literature search tools. Proc Am Soc Inf Sci Technol. 2013;50(1):1-11. doi:10.1002/meet.14505001041
5. Niu X, Hemminger BM. A study of factors that affect the information-seeking behavior of academic scientists. J Am Soc Inf Sci Technol. 2011;63(2):336-353. doi:10.1002/asi.21669
6. Georgas H. Google vs. the library: student preferences and perceptions when doing research using Google and a federated search tool. Portal: Libr Acad. 2013;13(2):165-185. doi:10.1353/pla.2013.0011
7. Boeker M, Vach W, Motschall E. Google Scholar as replacement for systematic literature searches: good relative recall and precision are not enough. BMC Med Res Methodol. 2013;13(1). doi:10.1186/1471-2288-13-131
8. Gusenbauer M, Haddaway NR. Which academic search systems are suitable for systematic reviews or meta-analyses? Evaluating retrieval qualities of Google Scholar, PubMed and 26 other resources. Res Synth Methods. 2019;11:181-217. doi:10.1002/jrsm.1378
9. Bramer WM. Variation in number of hits for complex searches in Google Scholar. J Med Libr Assoc. 2016;104(2):143-145. doi:10.5195/jmla.2016.61
10. Jamali HR, Asadi S. Google and the scholar: the role of Google in scientists' information-seeking behaviour. Online Inf Rev. 2010;34(2):282-294. doi:10.1108/14684521011036990
11. Hutchins BI, Baker KL, Davis MT, et al. The NIH Open Citation Collection: a public access, broad coverage resource. PLoS Biol. 2019;17(10):e3000385. doi:10.1371/journal.pbio.3000385
12. Fiorini N, Lipman DJ, Lu Z. Towards PubMed 2.0. eLife. 2017;6. doi:10.7554/elife.28801
13. Fiorini N, Canese K, Bryzgunov R, et al. PubMed Labs: an experimental system for improving biomedical literature search. Database. 2018;2018. doi:10.1093/database/bay094
14. M C. The New PubMed is Here. 2019. Accessed December 5, 2019.
15. White RW, Roth RA. Exploratory search: beyond the query-response paradigm. Synth Lect Inform Concepts Retrieval Serv. 2009;1(1):1-98. doi:10.2200/s00174ed1v01y200901icr003
16. Athukorala K, Głowacka D, Jacucci G, Oulasvirta A, Vreeken J. Is exploratory search different? A comparison of information search behavior for exploratory and lookup tasks. J Assoc Inf Sci Technol. 2015;67(11):2635-2651. doi:10.1002/asi.23617
17. Henderson LK, Craig JC, Willis NS, Tovey D, Webster AC. How to write a Cochrane systematic review. Nephrol Ther. 2010;15(6):617-624. doi:10.1111/j.1440-1797.2010.01380.x
18. Gehanno JF, Rollin L, Darmoni S. Is the coverage of Google Scholar enough to be used alone for systematic reviews. BMC Med Inform Decis Mak. 2013;13(1). doi:10.1186/1472-6947-13-7
19. Giustini D, Boulos MNK. Google Scholar is not enough to be used alone for systematic reviews. Online J Public Health Inform. 2013;5(2):214. doi:10.5210/ojphi.v5i2.4623
20. Furlan AD, Pennick V, Bombardier C, van Tulder M. 2009 updated method guidelines for systematic reviews in the Cochrane Back Review Group. Spine. 2009;34(18):1929-1941. doi:10.1097/BRS.0b013e3181b1c99f
21. McKeever L, Nguyen V, Peterson SJ, Gomez-Perez S, Braunschweig C. Demystifying the search button. J Parenter Enter Nutr. 2015;39(6):622-635. doi:10.1177/0148607115593791
22. Wilkinson MD, Dumontier M, Aalbersberg IJ, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016;3(1):160018. doi:10.1038/sdata.2016.18
23. Gusenbauer M. Google Scholar to overshadow them all? Comparing the sizes of 12 academic search engines and bibliographic databases. Scientometrics. 2018;118(1):177-214. doi:10.1007/s11192-018-2958-5
24. Google LLC. Google Scholar help page for citation export. https://scholar.google.com/intl/en/scholar/help.html#export; 2020. Accessed January 7, 2020.
25. Fiorini N, Canese K, Starchenko G, et al. Best Match: new relevance search for PubMed. PLoS Biol. 2018;16(8):e2005343. doi:10.1371/journal.pbio.2005343
26. Page L, Brin S, Motwani R, Winograd T. The PageRank Citation Ranking: Bringing Order to the Web. Tech. Rep. 1999-66, Stanford InfoLab; 1999. Previous number = SIDL-WP-1999-0120.
27. Lohr S. Google Schools Its Algorithm. https://www.nytimes.com/2011/03/06/weekinreview/06lohr.html; 2011. Accessed January 13, 2020.
28. Mayr P, Walter AK. An exploratory study of Google Scholar. Online Inf Rev. 2007;31(6):814-830.
29. Google LLC. Google Scholar publishers support web page. https://scholar.google.com/intl/en/scholar/publishers.html#questions; 2020. Accessed January 15, 2020.
30. Karimi S, Pohl S, Scholer F, Cavedon L, Zobel J. Boolean versus ranked querying for biomedical systematic reviews. BMC Med Inform Decis Mak. 2010;10(1). doi:10.1186/1472-6947-10-58
31. Turnbull D. Relevant Search: with Applications for Solr and Elasticsearch. Shelter Island, NY: Manning Publications Co; 2016.
32. Hjørland B. Classical databases and knowledge organization: a case for Boolean retrieval and human decision-making during searches. J Assoc Inf Sci Technol. 2014;66(8):1559-1575. doi:10.1002/asi.23250
33. Falagas ME, Pitsouni EI, Malietzis GA, Pappas G. Comparison of PubMed, Scopus, Web of Science, and Google Scholar: strengths and weaknesses. FASEB J. 2008;22(2):338-342. doi:10.1096/fj.07-9492lsf
34. Sayers EW, Agarwala R, Bolton EE, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2018;47(D1):D23-D28. doi:10.1093/nar/gky1069
35. Google LLC. Google Scholar help page for journal coverage. https://scholar.google.com/intl/en/scholar/help.html#coverage; 2020. Accessed January 9, 2020.
36. López-Cózar ED, Robinson-García N, Torres-Salinas D. The Google Scholar experiment: how to index false papers and manipulate bibliometric indicators. J Assoc Inf Sci Technol. 2013;65(3):446-454. doi:10.1002/asi.23056
37. Hutchins BI, Yuan X, Anderson JM, Santangelo GM. Relative Citation Ratio (RCR): a new metric that uses citation rates to measure influence at the article level. PLoS Biol. 2016;14(9):e1002541. doi:10.1371/journal.pbio.1002541