EMBO Reports. 2018 Nov 13; 19(12): e47260. doi: 10.15252/embr.201847260

Alternative article‐level metrics

The use of alternative metrics in research evaluation

Lutz Bornmann 1, Robin Haunschild 2
PMCID: PMC6280648  PMID: 30425125

Abstract

A growing range of metrics is available as an alternative to citation counts. The crucial questions for research evaluation are what kind of impact these metrics measure and whether it correlates with the quality of research.


Subject Categories: S&S: Careers & Training; S&S: Media & Publishing; S&S: Politics, Policy & Law


Over the past few years, a new group of metrics, known as alternative metrics or altmetrics, has become a topic of interest and research in scientometrics. The name signals an intention to differ from classical bibliometrics by providing an alternative to citation analysis. The term "altmetrics" was first proposed by Jason Priem on Twitter in 2010. Priem and his co‐authors then published the Altmetrics Manifesto, in which they explain what altmetrics involve, how they can be used and what subjects they should focus on (http://altmetrics.org/manifesto). The manifesto also proposes studies to establish whether alternative metrics measure the impact and influence of specific research or merely represent background noise. To this end, the authors suggest correlating various alternative metrics with traditional metrics, such as citations, or with expert opinions. In addition, the manifesto addresses a major drawback of altmetrics, namely their susceptibility to manipulation; for instance, the difference between a good and a bad indicator value based on Twitter often comes down to just a few tweets. Building on the initial manifesto, Bornmann and Haunschild discussed principles to guide research evaluation 1.

Bibliometrics uses citations from academic papers, which gauge the influence of specific articles on other scientific publications. In contrast, altmetrics use various other media to assess the impact of scientific publications. In essence, they involve counting references on social media such as LinkedIn, Twitter and Facebook, as well as in news forums, blogs, online reference managers such as Mendeley and CiteULike, policy‐related documents (such as IPCC and WHO reports), and Wikipedia. In addition, altmetrics also measure views and downloads of scientific publications, along with reviews and recommendations of specific publications (such as on F1000Prime, PubPeer and Publons).

Data sources for alternative metrics

Views and downloads of publications have been used as an alternative to citation counts 2, even before altmetrics were introduced. Views refer to the number of times users have visited a certain URL or clicked on a specific link, and can be further differentiated into abstract, full‐text, HTML, or PDF views of a publication. The COUNTER initiative (http://www.projectcounter.org) has developed standards and protocols for recording and sharing such statistics. The drawbacks of using views and downloads as data sources are that the data are not always available from publishers and that automated downloads and content crawling undermine their use for measuring genuine interest.


The online reference manager Mendeley (http://www.mendeley.com) is another popular data source; here, impact is measured by the number of Mendeley users who have saved a given document in their library. There are two reasons for the popularity of Mendeley: the data are freely available, and Mendeley covers a high proportion of documents, typically more than 80% of the documents in a publication set 3. Mendeley users save publications in their databases before they (possibly) cite them. Mendeley data can therefore be used to measure the impact of publications soon after they appear, which is scarcely possible with citation counts.

F1000Prime (https://f1000.com/prime) is another interesting data source, since it is connected to the oldest and most established instrument of research evaluation: peer review 4. F1000Prime is a post‐publication peer review system in which experts assess publications as "Good", "Very Good" or "Exceptional" (equivalent to scores of 1, 2 or 3 stars) (https://f1000.com/prime/about/whatis/how). The star ratings can be used as alternatives to traditional citations as indications of the quality of publications. However, F1000Prime is currently focused on biology and medicine.

In addition, altmetrics involve the analysis of a broad spectrum of different sources; in principle, all media are searched for references to publications. Some media are used because data providers assume that these represent interest in a certain field, for instance, in the case of policy‐related documents. Other media, notably Facebook and Twitter, are mainly used because the data are easily available. This, however, raises the question as to what kind of impact can be measured if the main criterion is just that it is easily available. Do the references reflect the benefits of publications to science or society, or simply a superficial interest in certain research topics?

Not all references are equal

Alternative metrics providers, such as Altmetric, Plum Analytics or Impact Story, collect references from various sources and count them at the level of individual articles. However, the references to publications are not always unequivocal. For example, references are often made to publications in tweets and blogs without the inclusion of hyperlinks or bibliographical information. This significantly increases the difficulty of recording references and probably results in a systematic underestimation. While altmetrics providers use automated text‐analysis programmes to identify references to publications that lack a direct link, one can assume that these methods do not capture every single reference.
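As an illustration of the kind of text matching involved, the following Python sketch extracts DOI‐like strings from free text such as tweets or blog posts. The regular expression and the function name find_doi_mentions are assumptions for illustration only; the actual matching pipelines of altmetrics providers are more sophisticated and also track links, landing pages and bibliographic metadata.

import re

# A common DOI-style pattern; purely illustrative, not a provider's actual matcher.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")

def find_doi_mentions(text: str) -> list[str]:
    """Return all DOI-like strings found in a piece of free text."""
    return DOI_PATTERN.findall(text)

tweet = "Interesting result on reader counts, see 10.1234/example.5678"
print(find_doi_mentions(tweet))  # ['10.1234/example.5678']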


Altmetrics data at the level of individual publications are primarily of interest to publishers and literature databases, such as Web of Science and Scopus. The numerical data can also be analysed at higher levels of aggregation, such as journals, scientists, universities or countries, or used to compile an index based on data from several sources. For example, Altmetric developed its Attention Score based on altmetrics data and corresponding weightings. References in news media, for example, are scored higher than those from blogs or tweets. To some extent, the score also differentiates within individual sources, for example whether a particular tweet comes from a well‐known or an unknown scientist. Overall, however, the weightings are rather arbitrary: no empirical or conceptual evidence is presented to support the notion that they result in a better index than any other weightings would.
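To make the idea of source‐dependent weighting concrete, here is a minimal Python sketch of a weighted attention index. The source weights are purely illustrative assumptions; they are not the weights used in Altmetric's actual Attention Score.

# Illustrative source weights (assumed values, not Altmetric's real ones).
SOURCE_WEIGHTS = {"news": 8.0, "blog": 5.0, "policy": 3.0, "tweet": 1.0, "facebook": 0.25}

def weighted_attention(mentions: dict[str, int]) -> float:
    """Sum mention counts per source, each multiplied by its assumed weight."""
    return sum(SOURCE_WEIGHTS.get(source, 1.0) * count
               for source, count in mentions.items())

print(weighted_attention({"news": 2, "tweet": 15, "blog": 1}))  # 2*8 + 15*1 + 1*5 = 36.0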

A few publications over the past few years have classified citations based on their context, for example to determine whether a given citation is influential or non‐essential 5. A similar analysis of content would also make sense for an altmetrics weighting. For example, a tweet that only contains a link to a scientific article along with the names of the authors is not indicative of an in‐depth engagement with the article. However, if a tweet provides a brief synopsis of the article or an evaluation of the content (“the method used is ambitious” or “the study cannot be reproduced”), it would indicate a more thorough reading. Given the advances made in automated textual analysis, it could be possible to perform more meaningful impact analyses by including a content‐related weighting.
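A toy Python sketch of such a content‐related weighting is given below: tweets that contain little more than a link and author handles receive a low weight, tweets with additional commentary a higher one. The cut‐off of eight words and the function tweet_weight are arbitrary assumptions for illustration; a production system would rely on proper natural‐language processing.

import re

def tweet_weight(tweet: str) -> float:
    """Assign a higher weight to tweets that go beyond a link plus author names."""
    # Strip URLs and @-handles, then look at how much text remains.
    stripped = re.sub(r"https?://\S+|@\w+", "", tweet)
    return 1.0 if len(stripped.split()) >= 8 else 0.2  # assumed cut-off, not an established rule

print(tweet_weight("New paper by @jasonpriem https://doi.org/10.1234/example"))    # 0.2
print(tweet_weight("The method used is ambitious, but the study may be hard "
                   "to reproduce https://doi.org/10.1234/example"))                # 1.0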

The introduction and popularity of alternative metrics is largely fuelled by the shortcomings of traditional metrics and criticism of how these are used to evaluate research. After publication, a scientific paper needs to be found, read and understood to inspire new research publications or reviews, which can take years depending on the field. Given the length of this pre‐citation process and the necessity to carry out research in order to generate publications that cite a paper, it is evident that alternative metrics probably have a different meaning than citations. Alternative metrics could, for instance, generate impact measurements much closer to the date of publication. Authors and readers can send tweets or write blog posts shortly after a given publication has appeared. Many tweets appear in rapid succession after a study is published, which raises doubts as to whether the author of the tweet has even read (or understood) the study in question. However, in the case of one kind of altmetric, namely policy‐related documents, we found that referencing takes place even later than classic citations.

Altmetric and Plum Analytics have been analysing policy‐related documents for references to scientific publications. This is particularly interesting from a research evaluation perspective because such an analysis provides insights into the impact of scientific publications on political decision‐making. Moreover, many policy documents undergo a quality assurance process similar to peer review for scientific publications. In addition, it can take a long time for scientific findings to appear in policy‐related documents, because these reports (e.g. IPCC or WHO reports) are published significantly less frequently than scientific publications.

Measuring quality

A crucial issue is the relationship between various metrics and scientific quality, which can be gauged using the correlation between alternative metrics and citations. Calculating rank correlations between altmetrics and citations yields large differences in the coefficients 6: the correlations between citations and most sources of altmetrics, such as Twitter, are rather low, which means that these metrics measure something other than traditional citations. It is difficult to say with certainty which dimensions they measure; all one can state is that references to scientific publications on platforms such as Twitter mirror the impact on that particular platform. Such impact analyses can therefore hardly be generalized.
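The rank correlations mentioned above are typically Spearman coefficients. The following Python sketch shows such a calculation on invented toy counts; it does not reproduce any figure from the cited meta‐analysis 6.

from scipy.stats import spearmanr

citations    = [0, 2, 5, 12, 30, 44]   # hypothetical citation counts
tweet_counts = [3, 0, 1, 25,  2,  4]   # hypothetical tweet counts for the same papers

rho, p_value = spearmanr(citations, tweet_counts)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.2f})")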


Moreover, the same alternative metric can vary significantly between different altmetrics providers, in particular when the numbers are aggregated for more than one publication. The main reason is that these providers do not identify scientific publications through a literature database but through easily searchable platforms, usually Twitter or Facebook. For example, the number of readers of publications on Mendeley can differ from the number reported by Altmetric or Plum Analytics. The reason is that many scientific publications have readers on Mendeley but are not referred to on either Twitter or Facebook; the Mendeley readership of these publications is therefore not visible to Altmetric and Plum Analytics.

F1000Prime and reader counts at online reference managers, particularly Mendeley, yield relatively high correlations between altmetrics and citations. One could infer that this fulfils one of the aspirations of altmetrics: to determine the impact of publications on science. These data accrue more rapidly than citation data and correlate well with later citations. To be able to make reliable statements about the impact of research, both citations and saves in online reference managers could be included in an evaluative study. If both metrics fail to produce comparable results, then the reliability of the impact analysis could be called into doubt.

A significant point of criticism against traditional citation‐based assessment is that the impact measurement is restricted to the sphere of research. However, with an increasing obligation to justify the use of public funds for research, interest in gauging the impact of research beyond science is high. In this context, altmetrics providers propose that their results can be used to ascertain “attention” by the public rather than “impact”. This idea is primarily based on the fact that the platforms analysed by these providers are mainly used by people outside the scientific community.


Our research shows that it is possible to measure the impact on (or attention by) specific user groups outside science. We think that these user‐specific impact measurements might become the most interesting application of alternative metrics. Based on self‐descriptions, Twitter or Facebook users can be classified, for example, as scientists or science communicators. Mendeley users are required to state the (academic) category to which they belong: for example, student, professor or librarian. Based on such categorizations, one can directly measure the impact of scientific publications on certain groups. For example, based on data from the Mendeley platform, it is possible to assess the impact of publications on PhD students and to compare the results of different universities. It is also possible to collate various user groups—students, professors and lecturers—within a specific sector and measure the impact of publications. In addition, some platforms also record geographic and subject‐specific data. These data can be analysed along similar lines to the corresponding author address from a publication. For example, data from Mendeley can be used to map user networks to address questions, such as common literature usage between various disciplines or countries.
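As a simple illustration of user‐group‐specific measurement, the Python sketch below aggregates hypothetical Mendeley reader counts by user category. The data are invented; only the kind of category labels mirrors the status information that Mendeley records.

from collections import defaultdict

# (publication, reader category, reader count) -- hypothetical data
readership = [
    ("paper-1", "PhD student", 40),
    ("paper-1", "Professor",    5),
    ("paper-2", "PhD student", 12),
    ("paper-2", "Librarian",    3),
]

readers_by_group: dict[str, int] = defaultdict(int)
for _, category, count in readership:
    readers_by_group[category] += count

print(dict(readers_by_group))  # {'PhD student': 52, 'Professor': 5, 'Librarian': 3}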

The need for normalization

Just as with citations, alternative metrics show subject‐specific differences: publications that appear in multidisciplinary journals such as Nature or Science show the highest levels of activity on the platforms analysed for altmetrics, followed by publications from dedicated biological and medical journals, while the least activity can be observed for publications from the humanities. Because subject‐specific differences in referencing frequency hardly ever have anything to do with the quality of the publications, normalization procedures should be used to obtain subject‐independent impact scores. During the past few years, scientometricians have begun to transfer the normalization procedures established in bibliometrics to alternative metrics. We proposed the Mean Normalized Reader Score (MNRS) and the Mean Discipline Normalized Reader Score (MDNRS), both based on data from Mendeley. In the case of the MNRS, the number of readers of a scientific publication is divided by the average number of readers of publications that were published in the same field and in the same year; MNRS values above 1 mean that the publication in question had an above‐average impact. Fairclough and Thelwall proposed a similar methodology at about the same time 7.
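A minimal Python sketch of the MNRS calculation with invented reader counts: the reader count of a publication is divided by the mean reader count of publications from the same field and publication year.

from statistics import mean

def mnrs(readers: int, field_year_readers: list[int]) -> float:
    """Reader count normalized by the field-and-year average; values above 1 indicate above-average impact."""
    return readers / mean(field_year_readers)

# Hypothetical reference set: reader counts of all same-field publications from the same year
reference_set = [2, 5, 8, 10, 25]   # mean = 10
print(mnrs(25, reference_set))      # 2.5, i.e. above-average impact on Mendeley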

The MDNRS, like the MNRS, is based on the ratio of the number of readers to the discipline average for the same year, but the discipline is determined by the disciplinary category recorded at Mendeley. The MDNRS thus resembles citing‐side normalization in bibliometrics, which refers to the papers in which the citations appear, rather than cited‐side normalization, which refers to the cited papers. Because an MDNRS value of 1 does not indicate an average impact, the values are more difficult to interpret than MNRS values. However, the higher the MDNRS values, the greater the impact these publications have achieved on Mendeley.

Alternative metrics often yield zero values, meaning that the publication in question has not been referred to at all. It therefore makes little sense to carry out discipline‐based normalization for these data using the usual bibliometrics process. That is why the indicators discussed above primarily involve data from Mendeley, where zero values occur much less often. To address this problem, we recently proposed the MHq indicator 8, which is calculated for publication sets instead of for individual publications. The MHq compares the proportion of publications by a given unit—for instance, by an individual university—referenced at least once on a platform, with the proportion of publications referenced at least once in the corresponding discipline and year of publication.
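The following Python sketch illustrates only the comparison underlying the MHq as described here: the share of a unit's publications mentioned at least once on a platform, set against the corresponding share in the matching discipline and publication year. It is not the exact formula from reference 8, and all counts are invented.

def mentioned_share(mention_counts: list[int]) -> float:
    """Proportion of publications with at least one mention."""
    return sum(1 for c in mention_counts if c > 0) / len(mention_counts)

unit_mentions      = [0, 3, 1, 0, 7, 2]        # hypothetical mentions of one university's papers
reference_mentions = [0, 0, 1, 0, 0, 4, 0, 2]  # hypothetical mentions in the same field and year

ratio = mentioned_share(unit_mentions) / mentioned_share(reference_mentions)
print(round(ratio, 2))  # values above 1 indicate an above-average share of mentioned papers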

We applied the MHq indicator to various altmetrics (Twitter, Facebook, news forums, policy‐related documents, Wikipedia and blogs) and observed only a weak correlation with the opinions of experts in a field. We also compared MHq results with the assessments of the societal impact of British universities' research in the UK Research Excellence Framework (REF) and, again, found no appreciable correlation [preprint: 9]. The results can be interpreted to mean that alternative metrics reflect a different dimension of the impact of scientific publications than the assessments of experts do, even if the experts assess societal impact.

Which type of impact?

Indeed, it is not known what type of impact alternative metrics identify. Do they indicate that a publication is being used to inform research, or do they only reflect background noise that tells us very little about the impact of a publication? This is the most important research question in relation to altmetrics, as the answer will decide their future in research evaluation. It is also why this question should be answered separately for every individual data source of alternative metrics. Only if the type of impact is known—and if it is of interest in the field of research evaluation—should the metric be used in evaluations.


For online reference managers such as Mendeley, there is substantial evidence to suggest that they reflect the impact of publications on science. The benefit for research evaluation is that they can be used to measure impact earlier than citations. Another alternative metric of similar interest to research evaluation is publication recommendations (F1000Prime, PubPeer or Publons), whereby experts in a field evaluate publications with reference to their quality, originality or readability. However, there are still at least two obstacles that hinder the use of these recommendations for research evaluation. First, too few publications are currently evaluated by experts for comprehensive coverage; while a recommendation is a positive signal, the lack of a recommendation is therefore not meaningful. Second, recommendations cover only a few subject areas. While F1000Prime includes a relatively large volume of publications in medicine and the life sciences, most other disciplines are missing. The coverage of PubPeer and Publons is currently unknown.

On the whole, most alternative metrics, such as indicators based on Twitter and Facebook, should, for now, not be used to evaluate research. By contrast, normalized reader counts in online reference managers have a greater potential for research evaluation in the future, in particular for early measurement of research impact. In certain fields, such as medicine and the life sciences, it can also be helpful to measure the number of recommended publications along with recommendations for the publications themselves (in F1000Prime). In any case, alternative metrics are rather young and their use for evaluating research was proposed only a few years ago. Aggregated altmetrics should therefore primarily be used to conduct further research within scientometrics to determine whether and how they are able to measure any impact of research. Thus, most altmetrics should be of more interest to scientometricians than to research managers.

Conflict of interest

The authors declare that they have no conflict of interest.

EMBO Reports (2018) 19: e47260

Contributor Information

Lutz Bornmann, Email: bornmann@gv.mpg.de.

Robin Haunschild, Email: R.Haunschild@fkf.mpg.de.

References

1. Bornmann L, Haunschild R (2016) To what extent does the Leiden Manifesto also apply to altmetrics? A discussion of the manifesto against the background of research into altmetrics. Online Info Rev 40: 529–543
2. Kurtz MJ, Bollen J (2010) Usage bibliometrics. Ann Rev Inf Sci Technol 44: 3–64
3. Sugimoto CR, Work S, Larivière V, Haustein S (2017) Scholarly use of social media and altmetrics: a review of the literature. J Am Soc Inf Sci Tec 68: 2037–2062
4. Thelwall M, Kousha K (2015) Web indicators for research evaluation. Part 2: social media metrics. Prof De La Inform 24: 607–620
5. Zhao D, Cappello A, Johnston L (2017) Functions of uni‐ and multi‐citations: implications for weighted citation analysis. J Data Inform Sci 2: 51
6. Bornmann L (2015) Alternative metrics in scientometrics: a meta‐analysis of research into three altmetrics. Scientometrics 103: 1123–1144
7. Fairclough R, Thelwall M (2015) National research impact indicators from Mendeley readers. J Informetr 9: 845–859
8. Bornmann L, Haunschild R (2018) Normalization of zero‐inflated data: an empirical analysis of a new indicator family and its use with altmetrics data. J Informetr 12: 998–1011
9. Bornmann L, Haunschild R, Adams J (2018) Do altmetrics assess societal impact in the same way as case studies? An empirical analysis testing the convergent validity of altmetrics based on data from the UK Research Excellence Framework (REF). arXiv preprint: https://arxiv.org/abs/1807.03977
