Abstract
Bots have become active contributors in maintaining open-source repositories. However, definitions of bot activity in open-source software vary, from a lenient stance that encompasses every non-human contribution to frameworks that only cover contributions from tools with autonomy or human-like traits (i.e., Devbots). Understanding which of those definitions are being used is essential to enable (i) reliable sampling of bots and (ii) fair comparison of their practical impact in, e.g., developers’ productivity. This paper reports on an empirical study composed of both quantitative and qualitative analysis of bot activity. By analysing those two bot definitions in an existing dataset of bot commits, we see that only 10 out of 54 listed tools (mainly dependency management) comply with the characteristics of Devbots. Moreover, five of those Devbots have similar patterns of contributions over 93 projects, such as similar proportions of merged pull-requests and days until issues are closed. Our analysis also reveals that most projects (77%) experiment with more than one bot before deciding to adopt or switch between bots. In fact, a thematic analysis of developers’ comments in those projects reveals factors driving the discussions about Devbot adoption or removal, such as the impact of the generated noise and the needed adaptation in development practices within the project.
Keywords: Software engineering, Software bots, Mining software repositories, Dependency management
Introduction
Bots are becoming prevalent tools in software development environments (Lebeuf, Storey & Zagalsky, 2018; Erlenhov, de Oliveira Neto & Leitner, 2020), particularly when bots support costly software maintenance tasks such as creating pull requests (PRs) (Wessel et al., 2020a), refactoring code (Wyrich & Bogner, 2019) or making code contributions over time (Wessel et al., 2018). Consequently, various studies investigate the impact of adopting a bot in a software development process. Recent work showed that the adoption of bots can have a significant impact on overall project metrics, such as the number of PRs created and closed before and after a bot was introduced (Wessel et al., 2020a).
The underlying challenge such studies face is the difficulty of determining what exactly constitutes a “bot”, and of distinguishing bots from automation in general (a topic that has been studied in the software engineering community at least since the early 2000s1). Two different mindsets appear to be prevalent in existing studies: researchers working on taxonomies or definitions often stress the difference to automation tools, e.g., by requiring bots to have human-like traits such as a name (Erlenhov, de Oliveira Neto & Leitner, 2020), language (Lebeuf et al., 2019), or purpose (Erlenhov et al., 2019), whereas more quantitative studies often take a relatively all-encompassing stance where every contribution that is not made directly by a human developer is considered a bot contribution (Dey et al., 2020b).
Depending on the study goal, such a wide definition may be exactly what is required. For example, Dey et al. (2020b) have proposed an approach to identify bot commits so as to exclude them from studies that target human behaviour. For such a study, whether the author of an excluded contribution is a bot or “just” an automation tool is largely irrelevant. However, for work that specifically targets the study of bot contributions and their effect on human developers, it seems central to more clearly delineate between tools that actually exhibit bot-like characteristics (according to existing taxonomies and classification frameworks) and other automation tools that do not.
Therefore, our goal is to (i) investigate some of the approaches above that classify bots and then (ii) verify whether a clearer distinction between bots and automation tools provides insights about the impact of bot activity in a project. Particularly, we leverage widely used impact measures such as PRs and comments to investigate the activity generated by one or more bots in the same project and also the interaction between humans and those bots (Wessel et al., 2020a; Wessel et al., 2018). Our general hypothesis is that a more refined approach to define and sample bots enables consistent comparison of one or more bots in maintaining the same project and reveals insights about bot activity (e.g., discussion threads between humans and bots) that are tangential to the expected benefits that any automation tool brings (e.g., creating more PRs and commits).
We investigate that hypothesis in an exploratory empirical study with open-source projects following a multi-method methodology composed of both quantitative and qualitative analysis of bot activity. In order to sample bots we rely on two existing studies that characterise bots: the BIMAN dataset which includes bot commits produced automatically by the BIMAN approach proposed by Dey et al. (2020a, 2020b), and the bot users’ personas introduced in our own earlier work (Erlenhov, de Oliveira Neto & Leitner, 2020), which focuses explicitly on how practitioners distinguish bots from automation tools. Below, we summarise our research questions and findings:
RQ1 - How much of the dataset includes automation tools that are, according to a more strict definition, not bots? As a first step, we qualitatively assess a sample of tools from the BIMAN dataset (Dey et al., 2020a) through the lens of bot users’ personas (Erlenhov, de Oliveira Neto & Leitner, 2020). We observe that only 10 of the 54 (18.5%) analysed tools would qualify as bots according to this stricter categorisation (the remaining tools would be considered automation tools without human-like characteristics). Further, with one exception, these bots were all dependency management bots.
RQ2 - Do similar dependency management bots generate contrasting patterns of activity? Are their pull requests often merged by developers? How often do projects use multiple dependency management bots? Based on RQ1 results, we further analyse five dependency management bots from the dataset, and mine their activity (created pull requests and corresponding discussion threads) in 93 projects to perform a temporal analysis comparing patterns of bot activity in those projects. We observe that all five analysed bots exhibit similar behavioural patterns. Further, we observe that many projects experiment with multiple dependency management bots and frequently switch between them.
RQ3 - What factors guide the discussions about adopting, switching, discarding or using dependency management bots in open-source software? Based on the temporal analysis from RQ2, we qualitatively investigated a subset of issues and PRs with discussions about the different features and behaviour of the bots, such as usability aspects that conflict with the project’s development praxis, or the increase or decrease in noise or trust introduced by the bot. Particularly, we map comments about adopting, discarding or replacing a bot to bot traits (e.g., convenience in handling multiple updates) and behaviour (e.g., intrusiveness or autonomy regarding source code changes). Our analysis reveals that open-source software maintainers are hoping for improved software quality when adopting dependency management bots. Common problems discussed when adopting, using, discarding and switching between these bots are usability issues, such as difficulties to understand or explain how the bot works, or challenges related to noise that overloads the maintainers.
The key contribution of this paper to the state of research is two-fold:
Firstly, our work shows that there currently is a dissonance between definitions of bots used by different authors and in different study contexts. Our results related to RQ1 indicate that even datasets such as BIMAN, which have explicitly been created to contain “bot contributions”, may contain many tools that would not satisfy more strict delineations of what a bot is. This implies that future bot researchers should be explicit about what definition of “bot” they are assuming, and ensure that the dataset they use (or their own data generation method) follows the same definition.
Secondly, we conduct an empirical investigation (using a combination of quantitative and qualitative methods) on the subset of tools contained in the BIMAN dataset that are indeed classified as bots even following a more strict delineation. We show that these are mostly very similar (dependency management) tools, and provide insights on how and why developers adopt, discard, or switch between such bots.
The remainder of the paper is structured as follows. In “Related Work”, we introduce related research on bots in software engineering. In “Study Methodology”, we provide a high-level view of our overall methodology, which is followed by a discussion of our main results relating to the three research questions in “Distinguishing Bots and Automation Tools”, “Activity Analysis of Dependency Management Bots” and “What are the Discussed Challenges and Preferences when Adopting, Switching or Discarding Bots?”. Based on these results, we summarise and provide a broader discussion of our findings (and their implications for software engineering research) in “Discussion”, in which we also discuss the threats to validity. Finally, we conclude the paper in “Conclusions”.
Related work
Bots are the latest software engineering trend for how to best utilise the scarce resource “developer time” in software projects. However, the term itself is an umbrella term for several different types of tools used in software engineering. In order to classify these tools, several taxonomies have been presented. Lebeuf et al. (2019) presented an extensive, faceted taxonomy of software bots. Erlenhov et al. (2019) created a more compact taxonomy specifically focusing on bots in software development. A third taxonomy was proposed by Paikari & van der Hoek (2018), with a particular focus on chat bots in software engineering. The different taxonomies offer complementary views to classify and understand bots. For instance, Paikari & van der Hoek (2018) target chatbots, thus including many facets to classify different types of interaction and direction between the bot and a human. In contrast, Lebeuf et al. (2019) define 27 subfacets covering intrinsic, environmental and interaction dimensions to classify bots. Moreover, all those taxonomies are faceted, which allows them to be expanded to accommodate new levels as the field of software bots evolves (Usman et al., 2017). Nonetheless, a limitation common to all three taxonomies is that they lack clear, minimal requirements that a tool would need to fulfil to be considered a bot. In a subsequent study, Erlenhov, de Oliveira Neto & Leitner (2020) turned the question around and investigated developers’ perception of bots as a concept, asking what facets needed to be present in order for developers to view a tool as a bot. Since there was not one definition that all developers could agree on, the authors categorised the tools by introducing three personas based on developers’ impressions. These personas each have a set of minimal requirements that need to be fulfilled in order for them to recognise a tool as a bot: autonomy, chat, and smartness. Each persona’s bots come with different problems and benefits, and affect the project and its developers in different ways.
Research in the last years has explored various different dimensions of software engineering where bots may assist developers, including the automated fixing of functional bugs (Urli et al., 2018), bug triaging (Wessel et al., 2019), creating performance tests (Okanović et al., 2020), or source code refactoring (Wyrich & Bogner, 2019). This proliferation of bots is slowly creating demand for coordination between bots in a project, which has recently started to receive attention by Wessel & Steinmacher (2020) through the design of a “meta-bot”.
Impact of bot adoption
When it comes to adopting tools in the open-source software ecosystem, Lamba et al. (2020) looked at how the usage of a number of tools spread by tracking badges on the projects’ main pages. They found that social exposure, competition, and observability affect adoption. In a recent paper by Wessel et al. (2021), the initial interview study revealed several adoption challenges, such as discoverability and configuration issues. The study then continues to discuss noise and introduces a theory about how certain behaviours of a bot can be perceived as noise. Even though previous work often speculates that the adoption of bots can be transformative for software projects (Erlenhov, de Oliveira Neto & Leitner, 2020), it is still an open research question how exactly bot adoption impacts projects. Previous work from Wessel et al. (2018) studied 44 open-source projects on GitHub and their bot usage. They clustered bots based on what tasks the bots performed and looked at metrics such as the number of commits and comments before and after the introduction of the bots. However, no significant change could be discerned. One reason for this may have been that this study did not sufficiently distinguish between different types of bots, which may be used for very different purposes. Hence, follow-up research (Wessel et al., 2020a) focussed foremost on one specific type of bot, namely code coverage bots (1,190 projects out of 1,194), and found significant changes related to the communication amongst developers as well as in the number of merged and non-merged PRs. This was subsequently investigated further in an interview study (Wessel et al., 2020b). The finding that less discussion takes place is also in line with what Cassee, Vasilescu & Serebrenik (2020) observed when looking at how continuous integration impacted code reviews. Peng et al. (2018) studied how developers worked with Facebook’s mention bot. The study found that the mention bot’s impact on the project was both positive, in that it saved contributors’ effort in identifying proper reviewers, and negative, as it created problems with unbalanced workload for some already more active contributors.
Bot identification
Another area where bot categorisations are directly useful is in the (automated) study of developer activity. Software repository mining studies, such as the work published every year at the MSR conference (https://conf.researchr.org/home/msr-2021), frequently struggle to distinguish between contributions of humans and bots (where the study goal often requires to only include human contributions). Different approaches have recently been proposed to automatically identify bot contributions (Golzadeh et al., 2021b; Dey et al., 2020b), also leading to the BIMAN dataset, i.e., a large dataset of bot contributions (Dey et al., 2020a) which we build upon in our work. One challenge with identifying bot contributions is the presence of “mixed accounts” (Golzadeh et al., 2021a), i.e., accounts that are used by humans and bots in parallel. Mixed accounts require an identification of bot contributions at the level of individual contributions (rather than classifying entire accounts). Cassee et al. (2021) have shown that existing classification models are not suitable to reliably detect mixed accounts. In general, existing approaches are sufficient if the goal is to identify human contributions. However, as a foundation to study the bot contributions themselves (e.g., to assess bot impact), existing work lacks fidelity, in the sense that it does not distinguish between different types of automation tools and bots, nor between different types of bots.
Our study directly connects to these earlier works. We use the categorisation model proposed in our earlier work (Erlenhov, de Oliveira Neto & Leitner, 2020) to further investigate the BIMAN dataset (Dey et al., 2020a), particularly with regards to the question of how many of these automated contributions are actually “bots” in a stricter sense of the word. We further quantitatively as well as qualitatively investigate the (dependency management) bots we identified in the BIMAN dataset, further contributing to the discussion related to the impact of bot adoption on open-source projects.
Study methodology
To address our study goal, we perform a multi-method study combining different elements. First we perform a qualitative assessment of the BIMAN dataset (Dey et al., 2020a) based on criteria for bot classification defined by practitioners (RQ1), followed by a quantitative analysis based on temporal data of the activity of five dependency management bots (RQ2). Lastly, we look closer at specific bot activity within projects by doing a qualitative, thematic analysis of the discussion threads related to bot adoption, discarding and switching. A high-level overview of our methodology can be found in Fig. 1.
We first extract a complete list of unique tools from the BIMAN dataset, which we then rank by usage. The first author of this study then manually categorised the 70 highest-ranked tools (54 unique tools after merging duplicate identities) according to our own classification from earlier research (Erlenhov, de Oliveira Neto & Leitner, 2020). Only 10 tools are classified as bots. Subsequently, we select five of those bots and, for each, sample 50 projects that used it. For these, we use the GitHub API to extract all PRs and issues where the bot was involved (either as issue creator, commenter, or simply being mentioned). This leads to a large database of bot issues and PRs, which we then analyse both quantitatively and qualitatively. Finally, we select a subset of issues that include discussion threads about multiple bots in order to perform a qualitative analysis of the discussion between human contributors of the project.
Since the data of each RQ feeds into the next, more detailed method information is provided directly in “Distinguishing Bots and Automation Tools”, “Activity Analysis of Dependency Management Bots” and “What are the Discussed Challenges and Preferences when Adopting, Switching or Discarding Bots?”, such as the choice of dependency management bots and filtering of issues in our datasets. The data collected and scripts used for analysis can be found in our replication package (Erlenhov, de Oliveira Neto & Leitner, 2021) (https://doi.org/10.5281/zenodo.5567370).
Distinguishing bots and automation tools
We now discuss our first research question, an analysis of whether the existing BIMAN dataset of bots aligns with the bot characteristics listed by practitioners in our previous work. Specifically, we are interested in how much of the dataset includes pure automation tools.
Data collection
We started from the BIMAN dataset, which includes over 13 million commits from 461 authors. We then extracted the authors and sorted them by the number of GitHub organisations adopting each tool, as a proxy of popularity or importance. However, initial analysis showed that the dataset contained duplicate tools (the same tool acting under multiple identities). We resorted to manually merging identities of the first 70 tools in the ordered list, which, after merging, produced a final table consisting of 54 unique tools associated with 89 different authors.
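To make this ranking step concrete, the following is a minimal sketch in Python/pandas of ranking tools by the number of adopting organisations; the input file, column names, and alias map are hypothetical and only illustrate the manual identity merging described above.

```python
import pandas as pd

# Hypothetical flat export of the BIMAN commit data: one row per bot commit,
# with the commit author and the GitHub organisation of the target repository.
commits = pd.read_csv("biman_commits.csv")  # columns: author, organisation, ...

# Hand-curated alias map from the manual merging step, e.g.
# {"dependabot-preview[bot]": "dependabot", "dependabot[bot]": "dependabot"}.
aliases = {}
commits["tool"] = commits["author"].map(aliases).fillna(commits["author"])

# Rank tools by the number of distinct organisations adopting them
# (used as a proxy of popularity or importance).
ranking = (commits.groupby("tool")["organisation"]
                  .nunique()
                  .sort_values(ascending=False))
print(ranking.head(70))  # the top of this list was inspected manually
```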
Analysis and interpretation approach
We analysed these 54 tools manually using the flow-chart to characterise bots proposed in our previous work where we conducted an interview study and a survey with practitioners (Erlenhov, de Oliveira Neto & Leitner, 2020). The flow-chart contains five decision blocks with the goal of deciding if the tool would be considered a bot by any of the three personas modelled in the study: Charlie (a bot communicates via voice or chat), Sam (a bot does something “smart”), and Alex (a bot works autonomously). Furthermore, the classification implicitly assumed that bots would need to be used for a software engineering task.
For our categorisation, we adapted this decision model slightly (see Fig. 2). We added a decision to first check if the tool was actually used for a software engineering task. Further, since the goal of our study is to decide if a tool is a bot or an automation tool, we were less interested in the specific persona and classified all types of bots simply as “DevBots” with no further distinction.
As the BIMAN dataset only contains commit data, we resorted to manually querying additional information (GitHub user profiles, documentation, the tool’s external website, developer comments, etc.) to arrive at a classification decision for each tool. Examples of additional information used in the classification can be found in Figs. 3 (GitHub) and 4 (tool’s external website).
Results
Following the flow-chart we began by investigating whether the tool was actually used in a software development related task ((1) in Fig. 2). Not all tools passed this check—an example of a tool from the dataset that failed this criterion is fs-lms-test-bot. The tool updates repositories with a .learn-file (https://learn.co/lessons/standard-files-in-all-curriculum-lessons) that contains metadata about the project and is added so that participants at a bootcamp style coding school can easily identify what type of repository they are looking at.
Step (2) asks if a tool uses chat or voice. For most tools, this proved difficult to determine, and even for promising candidates (e.g., the JHipster bot (https://github.com/jhipster/jhipster-bot)) we found that the part of the tool that produced the git commits we were observing was unrelated to the chat bot. We concluded that, given our analysis data (git commits), this check is not of high value.
Step (3) asks if the automated tool is initiated by humans. One tool that was considered an automation tool rather than a bot because of this check was the Bors bot (https://bors.tech/), which (despite its name) only becomes active when explicitly triggered by a human developer.
In step (4), we investigated whether the tool produces nontrivial code snippets or analysis. While clearly a judgement call, we did not consider the output of any tool in our sample to be sufficiently complex or “smart” in the spirit of the original classification model.
Step (5) asks if the tool is integrated into existing systems. An example of a tool that failed this check is one of the numerous build helpers, whose only task is to update the code with release versions when someone explicitly initiates this (https://github.com/docker-library/docs/issues/1248).
Finally, the last check in step (6) asks if the tool creates text output in team communication channels. Similar to step 2, this proved difficult to determine, as we did not have access to relevant team communication channels. One tool that did emerge as a bot after this check is the Whitesource bot (https://github.com/apps/whitesource-bolt-for-github), which creates one initial commit and after that communicates via issues.
On the final list of 54 tools, only 10 tools were (clearly) judged as bots according to the persona-oriented classification model. Table 1 lists these bots and a sample of tools that were judged as automation tools. We conclude the following from this classification exercise:
Table 1. Identified bots and a sample of tools evaluated as automation tools.
Name | (1) | (2) | (3) | (4) | (5) | (6) | Evaluation |
---|---|---|---|---|---|---|---|
Whitesource-bot-for-Github | Yes | No | No | No | No | Yes | Bot |
Greenkeeper | Yes | No | No | No | Yes | – | Bot |
Dependabot | Yes | No | No | No | Yes | – | Bot |
Renovate bot | Yes | No | No | No | Yes | – | Bot |
Pyup bot | Yes | No | No | No | Yes | – | Bot |
imgbot | Yes | No | No | No | Yes | – | Bot |
DPE bot | Yes | No | No | No | Yes | – | Bot |
Snyk bot | Yes | No | No | No | Yes | – | Bot |
Depfu | Yes | No | No | No | Yes | – | Bot |
Scala Steward | Yes | No | No | No | Yes | – | Bot |
fs-lms-test-bot | No | – | – | – | – | – | Not related |
Bors | Yes | ? | Yes | – | – | – | Automation |
docker-library-bot | Yes | No | No | No | Yes | – | Automation |
Siteleaf | Yes | No | No | No | No | No | Automation |
JHipster bot | Yes | ? | – | – | – | – | Undetermined |
Only a small fraction (10 of 54, or 18.5%) of the analysed tools clearly qualify as “bots” according to a stricter definition. A large majority are, often fairly conservative, automation tools that have been re-branded as bots, and exhibit little qualitative difference to the kinds of scripts that developers have used for a long time as part of their development, build, and deployment processes.
Interestingly, this includes many tools that are explicitly called “bots” as part of their names, e.g., the Bors bot or docker-library-bot. Hence, researchers who are interested in investigating bots in a stricter sense should not rely on tool names as the primary way to identify bots.
It is evident that the tools that we actually classified as Devbots (e.g., dependabot, renovate, or greenkeeper) are very similar. More specifically, nine out of these ten bots are dependency management bots of some form. In one case (Snyk and Greenkeeper), one bot was acquired by the other in 2020 (https://snyk.io/blog/snyk-partners-with-greenkeeper-to-help-developers-proactively-maintain-dependency-health/).
Activity analysis of dependency management bots
Based on these findings, we now turn towards a more detailed, quantitative investigation of the (dependency management) bots we have identified (RQ2).
Data collection
We collected data on a subset of the bots identified in “Distinguishing Bots and Automation Tools”. Specifically, we selected Dependabot, Greenkeeper, Renovate, Depfu, and Pyup for deeper quantitative analysis. For each bot, we first compiled a list of all projects in the BIMAN dataset (Dey et al., 2020a) that had at least one commit by the selected bot. We sorted these project lists by GitHub watchers, and the first author manually sampled the 50 highest-ranked projects for each bot that matched four inclusion criteria. First, the project needed to contain actual source code and not be a data repository. An example of an excluded project is remoteintech/remote-jobs, which is a list of companies that support remote work. Second, each project had to have more than one issue or PR related to the bot when searching in the issues and PR tabs on GitHub. Third, the project must not already have been included under another name. Examples of such projects are kadirahq/paper-ui, storybooks/react-storybook and storybookjs/storybook, which took up three positions in the ranked list but all point to the same project. Lastly, the project’s main language had to be English, since the comments from selected projects are used for our qualitative analysis in RQ3.
We observed that the resulting lists of bot-using projects were overlapping, leading to 232 unique projects (from a theoretical maximum of 5 * 50 projects). We consequently downloaded all issue and PR data from the launch of each project until 2021-03-31 for all issues where at least one of our bots was mentioned in the issue text or comments, or where at least one of the bots was the author of at least one issue or comment. We downloaded (i) all issue information, (ii) all comments on these issues, and (iii) all merge events related to these issues via the GitHub REST API, and stored the resulting JSON data in a MongoDB database for later processing and analysis. In a last round of filtering we removed all projects that had fewer than 100 issues or PRs, resulting in 93 unique projects. It should be noted that, even though we specifically selected 50 projects for each bot, concrete projects often used several of the studied bots at different points in the project lifetime.
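As an illustration of this collection step, the sketch below uses the GitHub REST API issues endpoint (which returns both issues and PRs) together with pymongo. The keyword list, the filtering heuristic, and the database and collection names are simplified assumptions on our part, and merge events would be fetched from separate endpoints.

```python
import requests
from pymongo import MongoClient

TOKEN = "ghp_..."  # placeholder personal access token
HEADERS = {"Authorization": f"token {TOKEN}",
           "Accept": "application/vnd.github.v3+json"}
BOTS = ["dependabot", "greenkeeper", "renovate", "depfu", "pyup"]
db = MongoClient()["bot_activity"]  # local MongoDB instance

def fetch_all(url, params=None):
    """Yield every item of a paginated GitHub REST API endpoint."""
    page = 1
    while True:
        resp = requests.get(url, headers=HEADERS,
                            params={**(params or {}), "per_page": 100, "page": page})
        resp.raise_for_status()
        items = resp.json()
        if not items:
            return
        yield from items
        page += 1

def collect(repo):  # repo is "owner/name"
    # state=all returns open and closed issues as well as pull requests.
    for issue in fetch_all(f"https://api.github.com/repos/{repo}/issues",
                           {"state": "all"}):
        author = issue["user"]["login"].lower()
        text = f"{issue.get('title') or ''} {issue.get('body') or ''}".lower()
        comments = list(fetch_all(issue["comments_url"]))
        involved = any(b in author or b in text or
                       any(b in (c.get("body") or "").lower() for c in comments)
                       for b in BOTS)
        if involved:  # keep issues/PRs authored by a bot or mentioning one
            issue["repo"] = repo
            db.issues.replace_one({"id": issue["id"]}, issue, upsert=True)
            for c in comments:
                db.comments.replace_one({"id": c["id"]}, c, upsert=True)
```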
Table 2 summarises our sample of bot activity in terms of the number of issues, PRs and comments created by bots or human contributors, as well as the time period comprising the data. In other words, we refer to bot activity as any issue, PR or comment where one of the selected bots was either the author or was mentioned.
Table 2. Number of issues, comments and projects for each bot.
Issue Author | Projects | Issues | Comments | Period | Years |
---|---|---|---|---|---|
Dependabot | 76 | 21,345 | 13,763 | 2017–2021 | 4 |
Depfu | 16 | 1,346 | 1,032 | 2017–2021 | 4 |
Greenkeeper | 34 | 3,015 | 2,273 | 2015–2020 | 5 |
Human | 76 | 1,168 | 30,481 | 2013–2021 | 8 |
Pyup | 22 | 3,075 | 1,690 | 2016–2021 | 5 |
Renovatebot | 39 | 12,209 | 2,825 | 2017–2021 | 4 |
Total | 93 | 42,158 | 52,064 | – | – |
Analysis and interpretation approach
In order to compare the activity of different bots, we analyse the issues and PRs authored by those bots in the selected projects over the years. This allows us to see increasing or decreasing trends in bot usage. Additionally, we analyse how human contributors react to this activity by verifying the proportion of merged PRs that were created by bots and by performing a survival analysis of the issues created by bots. Survival analysis is often used in biology to investigate the expected duration of time until an event occurs (Kaplan & Meier, 1958) and has been used for similar types of analysis in software engineering (Lin, Robles & Serebrenik, 2017; Samoladas, Angelis & Stamelos, 2010). Our survival analysis measures the number of days until an issue is closed. We compare the expected duration of PRs created by bots and those created by humans.
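A minimal sketch of this survival analysis using the lifelines library is shown below; it assumes an issue table with hypothetical columns author, days_open, and closed, where closed is 0 for issues that were still open at the cut-off date (right-censored, see the discussion of censoring further below).

```python
import pandas as pd
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter

# Hypothetical export of our issue data: one row per issue.
issues = pd.read_csv("issues.csv")  # columns: author, days_open, closed

ax = plt.subplot(111)
for author, group in issues.groupby("author"):
    kmf = KaplanMeierFitter()
    # durations = days until the issue was closed (or until the cut-off date);
    # event_observed = 1 if the issue was closed, 0 if right-censored.
    kmf.fit(durations=group["days_open"],
            event_observed=group["closed"],
            label=author)
    kmf.plot_survival_function(ax=ax)

ax.set_xlabel("Days since the issue was created")
ax.set_ylabel("Probability that the issue is still open")
plt.show()
```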
Lastly, we analyse overlapping bot activity by comparing (i) projects using multiple bots, as well as (ii) how the bot activity overlaps over time. In particular, we filter projects in which one or more issues were created by two or more bots over a period of at least one month.
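The overlap computation itself reduces to counting, per project and calendar month, how many distinct bots created issues; a minimal pandas sketch (again with hypothetical column names) follows.

```python
import pandas as pd

issues = pd.read_csv("issues.csv", parse_dates=["created_at"])
bots = ["dependabot", "greenkeeper", "renovate", "depfu", "pyup"]
bot_issues = issues[issues["author"].isin(bots)].copy()
bot_issues["month"] = bot_issues["created_at"].dt.to_period("M")

# Distinct bots that created at least one issue in each project-month.
active = (bot_issues.groupby(["project", "month"])["author"]
                    .nunique()
                    .rename("active_bots")
                    .reset_index())

# Months in which two or more bots were active in parallel in the same project.
overlap = active[active["active_bots"] >= 2]
per_project = overlap.groupby("project")["month"].count()
print(per_project.describe())  # mean/median/SD of overlapping months per project
```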
Results
Figure 5 shows the number of issues and PRs created by each bot over the years. Depfu, Greenkeeper and Pyup follow a similar trend, beginning with an increase in usage followed by a slow decrease. In parallel, both dependabot and renovatebot show an increasing trend in activity. Most of the issues in our dataset were created by dependabot or renovatebot, indicating a prevalence of those bots among the 93 projects in our dataset.
Figure 6 shows the proportion of merged PRs created by each author. Note that roughly half of the PRs created by humans were merged into the projects. This is surprising, as literature reports that PRs created by bots are less likely to be merged than those created by humans, whereas here the proportions are similar (Wyrich et al., 2021). However, recall that our data collection strategy entailed downloading only issues where bots were involved in some way. Hence, even the human-created issues are not necessarily representative of all issues, as they have still been sampled as issues that somehow involve bot activity (even if not as issue creator). Renovatebot was the only author for which most of the PRs were actually merged (76%), whereas depfu had the lowest percentage of merged PRs (17%).
We also compare the status of the issues created by different bots or humans to check whether there are differences in how long it takes to close those issues. Figure 7 shows a survival curve of the created issues. A survival curve reveals the probability p(S) that an event S occurs (i.e., closing an issue) over a period of time. For consistency, we only consider issues that: (i) lasted at least one day, hence avoiding issues closed shortly after creation (e.g., auto-merged dependency updates), (ii) were created before 2021-03-31, or (iii) were closed within 120 days in our dataset.
We use a Kaplan–Meier (KM) curve, which is a non-parametric method to estimate the survival function based on the time period until an event occurs (Kaplan & Meier, 1958). One of the advantages of KM is that it adjusts the estimates for censored events, which occur when information about the analysed subject is unknown due to, e.g., missing information about the subject in the dataset. In our case, censored events are issues that remain open after our limit date (i.e., right-censored)2. For instance, we consider as censored those issues that were not closed but were created within 120 days before our limit date (i.e., our dataset does not include information on whether the issue was eventually closed).
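For reference, the KM estimate of the survival function is the standard product-limit form \(\hat{S}(t) = \prod_{t_i \le t} \left(1 - d_i / n_i\right)\), where the \(t_i\) are the observed closing times, \(d_i\) is the number of issues closed at \(t_i\), and \(n_i\) is the number of issues still open (at risk) just before \(t_i\). Right-censored issues simply leave the risk set at their censoring time without counting as events, which is how the estimator accounts for issues that were still open at our limit date.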
For all issue authors, we see the same pattern in which the issues are most likely to be closed within 5–6 days from the date on which they are created. Dependabot and Renovatebot have more censored events in our dataset because they are also the bots with the most recent activity, such that a large number of issues were opened around our limit date. In particular, there is no clear difference in the number of days within which bot- or human-created issues are closed.
Another interesting question our data can answer is to what extent projects use multiple dependency management bots in an overlapping manner (i.e., at the same time). Intuitively, since the basic functionality of the bots is very similar, this should not be a common occurrence. However, when projects switch between bots, a certain overlap may occur.
Table 3 shows the number of projects that use one or more bots, along with the number of months with overlapping bot activity. It is interesting to observe that most of the projects used two of our investigated bots (58%), even though the number of months in which the bots actually work in parallel in those projects is expectedly small (13%, i.e., 242 of 1,783 months). In other words, the projects used two bots at the same time for 13% of their months. In contrast, the few projects that use four bots are using two or more bots in parallel a majority of the time (56%, i.e., 120 of 213 months). We did not see any project that used all five investigated bots.
Table 3. Number of projects that use one or more bots, the number of project months without and with overlapping bot activity, and summary statistics (mean, median, SD) of overlapping months per project.
Bots used | Projects | Months without overlap | Months with overlap | Mean | Median | SD |
---|---|---|---|---|---|---|
1 | 21 | 610 | – | 0.0 | 0.0 | 0.0 |
2 | 54 | 1,541 | 242 | 4.5 | 2.0 | 8.6 |
3 | 14 | 421 | 81 | 5.8 | 2.5 | 6.3 |
4 | 4 | 93 | 120 | 30.0 | 16.5 | 33.7 |
The descriptive statistics in Table 3 reveal high variance per project, such that there is a great disparity between mean and median. In other words, the overlapping activity varies per project and follows contrasting patterns. We selected a few projects with varied patterns of overlapping bot activity and present them in Fig. 8.
Three of the selected projects indicate that the overlap is specific to transition months. This pattern suggests that developers try out different bots months prior to switching between them (see apollographql/apollo-client, isomorphic-git/isomorphic-git and syncforynab/fintech-to-ynab). Another pattern is a multitude of parallel bot activity, as shown by YetiForceCompany/YetiForceCRM in which 2 or 3 bots are constantly being used in parallel throughout years of development.
Since we did not have access to the projects’ developers for interviews, we cannot analyse the factors behind those different patterns. Nonetheless, the patterns reveal a risk when choosing specific dates and counting month intervals before and after bot contributions. The risk is that static timeframes can hide team learning effects from trying out similar bots before the chosen timeframe, or miss confounding effects of multiple bots being used in parallel within a static time frame.
Our analysis of RQ2 verifies the proportion of activity from different dependency management bots, as well as how this activity is consumed by humans by, e.g., merging PRs or closing issues created by the bots. Overall, we did not detect major contrasting patterns or preferences between the investigated bots. That is, we have observed that the usage and contribution patterns of the investigated dependency management bots were largely similar. The main differences were that: (i) Dependabot and Renovate are more popular than the other bots and are increasingly being used by many projects, and (ii) Renovate has the highest proportion of merged PRs (75%), whereas Depfu has the lowest (17%). Moreover, our survival analysis reveals that most issues are closed within 5 days for all the analysed issues, including those authored by humans in which bots were involved or mentioned in the discussion thread.
Lastly, most projects use 2 or more bots with overlapping activity. However, this overlap varies across projects and indicates different patterns of usage. Based on the findings above, we subset those overlapping months in order to analyse discussion threads and identify which factors drive the developers’ decision to adopt, remove or switch between bots.
What are the discussed challenges and preferences when adopting, switching or discarding bots?
Here we investigate the third research question: What factors guide the discussions about adopting, switching, discarding or using dependency management bots in open-source software? Throughout the section we refer to issue identifiers and corresponding URLs specified in Table A1 in the Appendix.
Data collection
From the dataset used for RQ2, we used the following method to select issues and PRs for our qualitative analysis. First, we identified all issues and PRs that: (i) had two or more bots being mentioned in the comments, or (ii) were created by humans and mention more than one bot in the issue body. In order to include discussion threads about usage of single bots, we manually selected circa 30 issues (six issues per investigated bot). A manual inspection of the issues allowed us to include the discussion threads about the usage of the bot, and remove those about project-specific dependency updates. An example of a PR that was not included is a Dynamoid PR where the discussion is about enabling others to use the bot to update the Dynamoid package dependency in their projects by changing something in the Dynamoid project [Dyn-215].
We further performed snowballing to include issues outside our project sample (e.g., comments such as “see discussion here” that were linked to other issues). Ultimately, the dataset for RQ3 included 109 issues and PRs, included in our replication package (Erlenhov, de Oliveira Neto & Leitner, 2021) (https://doi.org/10.5281/zenodo.5567370). The issues had a mean of 9 (median 7) comments. In total, our analysis is composed of 181 codes extracted from those issues and PRs.
Analysis and interpretation approach
For our theme analysis, we started by capturing the type of conversation that took place in each issue. We used four conversation labels: adopt, use, switch or discard. The most common case was that one issue contained one conversation, but in some cases we found that a single issue contained multiple logical conversations. For example, an issue in HypothesisWork/hypothesis [Hyp-747] started as a conversation about the usage of a bot, but later became a conversation about switching bots after the developer of another bot decided to join the conversation.
In parallel to identifying conversation labels, we performed open and axial coding where we divided the conversations into excerpts of relevant information (codes) and assigned a second category of code labels, named content labels, to build our thematic map. Open coding allowed us to generate and vary the categories used to classify the codes, whereas axial coding enabled sorting of the coded data in new ways by identifying relationships between those categories (e.g., themes and sub-themes) (Stol, Ralph & Fitzgerald, 2016). Consequently, our list of code labels was not fixed in the beginning and changed as we reviewed more discussion threads in our dataset.
We based our initial content labels on the bot-related benefits and challenges identified in our earlier work (Erlenhov, de Oliveira Neto & Leitner, 2020). Then, we iteratively switched between axial and open coding as new sub-themes were identified. In order to agree on a set of code labels, the first and second authors discussed and coded together roughly 10% of the comments in the dataset. Then, the first author coded the remainder of the dataset. However, due to the open and axial coding, new content themes would surface, hence, triggering another round of discussions between the first and second authors to reach a new agreement on the new set of code labels. This process continued until we reached theory saturation, i.e., no new code labels were created as we sorted codes into the categories. The final table of content code labels and corresponding themes is presented in Table 4.
Table 4. Description of each code label used in our qualitative study.
Theme | Sub-theme | Codes | Description of Comments or Issues |
---|---|---|---|
Promote bot | Creator input | 13 | Bot creator joins the discussion thread to clarify information about their bots. |
Promote bot | Company/Project credibility | 9 | Comments regarding whether the bot was developed or sponsored by a reputable company or project. |
Usability | Setup and configuration | 10 | Technical discussions about introducing and maintaining the bot in the project. |
Usability | Uninstall | 7 | Technical discussions about removing the bot and its artefacts from the project. |
Usability | Understanding features | 14 | Comments regarding the comprehensibility of features offered by the bot. |
Usability | Clashes in ways of working | 21 | Discussions about changes in the development process caused by the bot. |
Usability | Bugs | 4 | Comments regarding faults and failures caused by the usage of the bot. |
Noise | Annoyance | 5 | Discussions that mention whether the notifications created by the bots are disruptive. |
Noise | Countermeasures | 10 | Comments suggesting fixes to reduce the notifications created by the bot. |
Noise | Additional work (for resources) | 5 | Discussions about increased workload on project resources caused by the bot (e.g., build time, tests). |
Noise | Additional work (for people) | 12 | Discussions about increased workload on humans maintaining the project caused by the bot. |
Benefits | Improve quality | 10 | Comments about the functional and non-functional improvements caused by the bot. |
Benefits | Handling tasks at scale | 2 | Discussions about enabling development tasks to be performed at higher scales. |
Benefits | Automation of tedious tasks | 1 | Comments regarding the bots automating manual and laborious tasks done by developers. |
Benefits | Information retrieval | 2 | Discussions about improved accessibility and availability of project information. |
Trust | Trustworthy | 7 | Conveys confidence in the bot’s agency. |
Trust | Non-trustworthy | 9 | Conveys unease or suspicion about the bot’s agency. |
Features | Supported features | 19 | Describes features offered by the bot. |
Features | Missing features | 21 | Describes features not offered by the bot. |
In summary, we extracted a number of excerpts, each assigned two code labels: one conversation label to keep the context of the discussion thread and a second label to capture the content of the excerpt. These excerpts were then sorted into themes by content. Each code and its corresponding conversation and content labels are shared in our replication package.
Results
A summary of our themes (content labels) and their relation to the conversation labels is shown in Table 5. Our results show that, of the benefits described by Erlenhov, de Oliveira Neto & Leitner (2020), improved quality was the main driver for (dependency) bot adoption (primarily related to security and bugfixes). We also expected to find cases related to support for handling tasks at scale, since adopting a dependency management bot should in principle also allow projects to handle dependency upgrades more easily. Instead, we found that in many cases projects experienced an increase in the load put on maintainers and resources, especially since the studied bots also introduce significant noise in the form of additional work due to numerous PRs.
Table 5. List of themes and the corresponding number of codes (comments excerpt) associated to each theme and conversation labels.
Themes | Adopt | Use | Switch | Discard | Total |
---|---|---|---|---|---|
Promote bot | 12 | 3 | 7 | 0 | 22 |
Usability | 17 | 22 | 14 | 3 | 56 |
Noise | 12 | 9 | 5 | 6 | 32 |
Feature | 15 | 6 | 18 | 1 | 40 |
Benefits | 15 | 0 | 0 | 0 | 15 |
Trust | 8 | 5 | 2 | 1 | 16 |
Total | 79 | 45 | 46 | 11 | 181 |
“The main driver for this change is to reduce maintenance burden on maintainers, and I really appreciate the effort. However, [redacted]’s comment made me realise that it might have the opposite effect.” -[Dja-2872]
The noise theme was the single theme associated with most coded excerpts related to stopping the use of a bot (i.e., discard). According to our results, the number of PRs generated by the bot is in itself unproblematic, but the bot is perceived to add noise when too many PRs are perceived as irrelevant. However, in some cases the project simply accepted that this is just “how bots work”. In one case, the developer considered the dependency management bot more as a source of information on existing outdated dependencies than actually trusting it to update them [Rea-2673-1]. However, in several cases, the initial load produced by the bot was so large that the projects kept postponing the initial PR for several months, at which point the PR was considered outdated, and the project decided to just discard the bot and start over with a new one [Str-2433].
Our study also reveals multiple countermeasures to overcome bot noise, such as (i) limiting the number of simultaneously open PRs from the same bot, (ii) batching the PRs in a smart way, or (iii) letting the bot auto-merge PRs when certain criteria are fulfilled. Evidently, the first and the second approach require developers to decide which PRs the bot was supposed to open (or how to batch PRs). The third countermeasure is strongly related to trust, both trust in the bot as well as trust in the project’s own quality assurance processes. We observed that bot developers are themselves often careful with automerging. For instance, when Dependabot was acquired by GitHub in May 2019 they removed the auto-merge feature in the bot (https://github.com/dependabot/dependabot-core/issues/1973), instead urging the users to manually verify dependency updates before merging.
“Auto-merge will not be supported in GitHub-native Dependabot for the foreseeable future. We know some of you have built great workflows that rely on auto-merge, but right now, we’re concerned about auto-merge being used to quickly propagate a malicious package across the ecosystem. We recommend always verifying your dependencies before merging them.” -[Dep-1973]
Another common theme in discussions around bot adoption or discarding was usability. Setting up and configuring a bot is not always seen as a quick and easy task, often requiring substantial trial and error. Instead of trying to make sense of the bot’s manual [Rea-2673-2], many projects opted to experiment with different settings until a satisfactory result was achieved.
“Just tried turning on pyup.io and requires.io so we can see what they do :-)” -[Pyt-687]
This also applied once the bot was adopted and the contributors tried to understand what, how, and when the bot functioned. A core feature that developers are particularly interested in is support for collecting everything regarding dependency updates into one bot [Ang-19580], rather than having different bots for, e.g., different languages. Further, many feature discussions are again related to noise reduction.
“Hmm, I hadn’t heard of renovate before, but it claims to have python support and a lot of tools for reducing noise.” -[Pyt-652]
Another common usability-related challenge was that bots may not necessarily fit the workflow of the project well. We observed both cases where the team managed to adapt the bot and cases where the team changed their workflow to accommodate the bot’s requirements [Rea-2673-3], [Cal-16961]. We also identified one case where a bot was outright discarded because it was judged a bad fit for the team’s way of working [Gre-247].
Finally, the last theme we identified was related to bot promotion. In several cases, the bot creator actively markets the bot by “popping into” relevant issue discussions in open-source software projects, nudging the project to give their bot a try. Similarly, once a project decides to adopt a bot, creators sometimes offer direct usability support by explaining or proposing ways to use the bot or by helping with onboarding [Ang-20860].
Our theme analysis reveals that the key factors guiding the discussions about adoption of dependency management bots are usability, benefits and features. In turn, most of the discussion around discarding those types of bots revolved around the noise that the bot generates. Some of those factors, such as noise (Wessel et al., 2021) or the benefits in handling tasks at scale (Erlenhov, de Oliveira Neto & Leitner, 2020), have also been seen in other studies as relevant factors to, respectively, hinder or improve the development workflow.
Discussion
Central to our study is a distinction between automation tools and genuine software development bots (Devbot), as defined in Erlenhov, de Oliveira Neto & Leitner (2020). We now summarise and contextualise our findings from exploring this difference based on the BIMAN dataset (Dey et al., 2020a). We argue that our results have multiple key implications for future research studying Devbots.
Most automation in open-source software projects is not through (human-like) bots, but through automation scripts. Our manual analysis of a sample of 54 widely used tools from the BIMAN dataset showed that only 10 (18.5%) comply with the Devbot definition. This should not be seen as criticism of the dataset, as the remaining 44 tools are certainly not false positives according to its definition (which classified all non-human contributors as “bots”). However, researchers need to be aware that a majority of the tools contained in such a dataset are relatively simple automation scripts that do not exhibit any specific human-like traits, and are not qualitatively different from the kind of scripting developers have been doing for a long time. To support the study of Devbots, new datasets (which may have to be compiled manually, or at least in a semi-automated manner) will be required.
Dependency management is a task where Devbots are indeed common, and there are multiple widely used implementations of dependency management bots. Of the 10 tools which we categorised as Devbots, nine were dependency management bots. Hence, we conclude that dependency management is the one domain where Devbots are indeed widespread and commonly used in open-source software projects. Further, multiple widely-used bots are available serving a very similar purpose. An implication of this finding for researchers is that a study of Devbots from datasets such as BIMAN is really a study of dependency management bots, as these dominate the dataset.
However, we cannot necessarily conclude from our results that dependency management bots are the only Devbots that open-source software projects use—since our study was based on a dataset of code contributions, Devbots that interact with a project in a different manner, e.g., by welcoming newcomers in the issue management system (Dominic et al., 2020), would not emerge in our work by design. Future research will be required to assess the prevalence and impact of such other types of Devbots.
All analysed dependency management bots exhibit similar contribution patterns. When studying the contribution behaviour of five of these dependency management bots (Greenkeeper, Dependabot, Renovate, Pyup, and Depfu) in more detail, we observed that all five bots exhibit comparable behaviour. This indicates that these tools are indeed comparable, not only in terms of functionality but also in how they interact with developers. Consequently, the five bots identified in our research can serve as a valid starting point for future comparative studies.
We have not observed clear differences between bot and human contributions regarding the time until PRs are resolved. This is surprising, as our results do not confirm earlier work (Wyrich et al., 2021), which has observed that developers handle bot contributions with lower priority than human ones. More empirical research will be required to establish whether this discrepancy is due to differences in the sampling strategy, or whether there are indeed certain types of bot PRs that get handled similarly fast as human contributions.
Many open-source software projects experiment with different dependency management bots. However, sustained “co-usage” of multiple dependency management bots is rare. A majority of 72 (77.4%) projects have used (or at least experimented with) two or more dependency management bots during their lifetime. Four projects have experimented with four of our five case study bots. This indicates that projects are not opposed to evaluating alternative bots or switching entirely. Additionally, we have observed that projects sometimes use multiple dependency bots in parallel, although this is not common outside of a “switching phase”. Further research will be required to investigate reasons for the co-usage of multiple dependency management bots.
Open-source software maintainers are hoping for improved software quality when adopting dependency management bots. Common problems when adopting these bots are usability issues, especially related to noise. From a thematic analysis of discussions surrounding the adoption, discarding, or switching of bots we have learned that developers predominantly expect higher code quality when using bots (e.g., related to important security updates being discovered and merged earlier). Surprisingly, developers do not seem to directly expect, nor achieve, higher productivity per se, as adopting a dependency management bot often incurs significant noise. Particularly concerning in this context is that prominent bots such as Dependabot have even reduced their feature set related to handling noise (i.e., auto-merging). This indicates that ongoing research related to the prevention of “bot spam” and bot-induced noise is timely (Wessel & Steinmacher, 2020), and that more research in this direction may be required. This further research will become particularly crucial if bot adoption continues to increase, as developers are currently lacking the tools to systematically deal with a large influx of bot contributions.
Clear bot definitions are crucial to study design. An overarching theme of our results is that, when empirically studying a somewhat “fuzzy” new concept such as bots in software engineering, great care needs to be taken to establish clear definitions of the study subject upfront. It is easy to take an existing dataset such as BIMAN because it uses the same keyword (“bot”) as basis of one’s own research, without realising that it may have been constructed with a different definition in mind. This bears the danger of overgeneralisation, when certain types of bots (e.g., dependency management bots) are studied because they are readily available, but results are implicitly generalised to “all bots”.
Threats to validity
We now discuss the threats to the validity of our research.
Construct validity: Deciding on a reference framework to classify and sample bots is a challenge faced by many bot-related studies, despite the existing taxonomies in literature to support researchers (Erlenhov, de Oliveira Neto & Leitner, 2020; Erlenhov et al., 2019; Lebeuf et al., 2019). We mitigate this limitation by (i) using a bot taxonomy based on input from practitioners using those bots, and (ii) choosing evaluation measures and code labels (e.g., PRs, issues, bot noise, trust) that have been used in previous work (Wessel & Steinmacher, 2020; Wyrich et al., 2021). Therefore, our findings are limited by the characteristics prevalent in such types of bots, i.e., human-like traits such as communication or autonomy. In turn, starting our sample from the BIMAN dataset introduces the risk of missing bots that were not initially included in the dataset. Consequently, the bot activity and factors discussed in RQ2 and RQ3 are limited to our sample of projects using those bots. Future work can use our replication package to analyse a new dataset of issues and PRs mined from projects using other dependency management bots.
Conclusion validity: For RQ1, we quickly noticed that the GitHub projects and tool documentation often miss details, which hindered our classification of bots using the flow-chart. Therefore, there is a risk of false negatives in our sample. For instance, some tools that we did not classify as bots could be bots for, e.g., a Charlie user persona or an Alex persona whose bots use other team communication channels. We mitigate this threat by focusing our analysis on the distinction between true positives (actual bots) and false positives (tools misclassified as bots), such that the false negatives have a smaller impact on our conclusions. The limited availability of tool documentation was also a challenge in the classification done by Dey et al. (2020b), hence motivating an identification based on the tool’s activity patterns instead of qualitative answers.
Moreover, comparing bot and human activity can be misleading, particularly when evaluating the time to merge PRs or close issues, because the expectations for human and bot source code contributions differ. For instance, bots create many more PRs than human contributors, and those bot contributions are mainly dependency updates (Wyrich et al., 2021). We mitigate the risk of comparing such activities by delimiting our entire sample to issues with a similar purpose (e.g., human-created issues include either a bot mention or comments made by dependency update bots) and by reporting results on bot activity per project. Moreover, one threat to our survival analysis is that KM curves are limited in detecting confounding variables in data with more than one stratum (Kaplan & Meier, 1958). We mitigate this risk by using only one stratum (bot authors) in our analysis.
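To make the survival-analysis setup concrete, the sketch below shows how a Kaplan-Meier estimate of “days until an issue is closed” could be computed for a single stratum. It is a minimal illustration assuming Python with pandas and the lifelines library; the column names and toy data are ours, not the schema of our replication package.

```python
# Minimal Kaplan-Meier sketch (illustrative data, not from our dataset).
import pandas as pd
from lifelines import KaplanMeierFitter  # pip install lifelines

# Each row is one bot-authored issue: how long it has been open, and whether
# it was eventually closed (1) or is still open, i.e., right-censored (0).
issues = pd.DataFrame({
    "days_open": [3, 10, 42, 7, 120, 15],
    "closed":    [1, 1, 0, 1, 0, 1],
})

kmf = KaplanMeierFitter()
# A single stratum (bot authors), as in our analysis, avoids confounding across strata.
kmf.fit(durations=issues["days_open"], event_observed=issues["closed"], label="bot-authored")

print(kmf.median_survival_time_)  # median number of days until an issue is closed
kmf.plot_survival_function()      # KM curve with confidence band
```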
Internal validity: During our classification for RQ1, we quickly noticed that GitHub projects and tool documentation often omit details that would allow us to answer some of the questions in the flow-chart (e.g., the first step asks whether the tool uses a chat, which is often hard to answer conclusively without using the tool). This limitation of the manual classification can lead to false negatives. For instance, some tools that we did not classify as bots could still be bots for, e.g., a Charlie user persona, or for an Alex persona whose bots use other team communication channels.
In order to avoid bias during open coding for RQ3, the first and second authors held initial coding sessions until they agreed on a list of code labels. Then, both authors triangulated their coded labels in three different 1-h sessions, held twice a week, until they reached theoretical saturation (i.e., no new themes or sub-themes were found). We mitigate disagreement between coders by (i) using a small, fixed set of labels for the PR conversations and (ii) using definitions from the literature to label the content of discussions. Examples of (ii) are the list of themes related to the benefits of using bots from Erlenhov, de Oliveira Neto & Leitner (2020) and the definition of noise created by bots as proposed by Wessel et al. (2021). Moreover, creating distinct categories of code labels to capture the context of the PR conversation vs the content of the discussion allowed us to relate the discussions to the factors listed in RQ3.
External validity: Our findings are limited to open-source software on GitHub, since we did not collect data from other open-source repositories or from proprietary software. In other words, we analyse the projects and corresponding bot activity based on common praxis in GitHub projects; developers working on proprietary software may frame their discussions around factors that differ from or add to those listed in RQ3, such as standards defined by a company or by regulatory agencies.
Conclusions
Software engineering bots are increasingly becoming a major subject of academic study. However, despite substantial research, the question of what exactly bots are and how they differ from previously existing automation tools still looms large. In this paper, we contributed to this discussion in three ways. Firstly, we manually evaluated a sample of tools from an existing dataset of bot contributions and found that only 10 of 54 tools are qualitatively different from routine automation tools. We further found that dependency management is the one domain where tools that fit our stricter definition of bots are currently in widespread use in open-source software projects. Secondly, we collected GitHub data for a large set of projects that use five of these dependency management bots to investigate how they are used in practice. We found that these tools have relatively similar contribution patterns, and that most projects adopt different dependency management bots during their lifetime. Thirdly, we conducted a thematic analysis of discussions around bot adoption, discarding, and switching, and found that developers adopt dependency management bots to improve code quality. However, they struggle with the noise that is (sometimes) introduced by these tools.
The main implications of our study for future research are the following. Firstly, our results indicate that datasets of automated commits predominantly do not contain genuine, practitioner-perceived bot contributions. Bot researchers should take this into account when analysing such data, and there may be a need for more targeted and curated datasets of bot contributions. Furthermore, researchers should consider that the practitioner-perceived bots such datasets do contain are predominantly dependency management bots. Secondly, our results show that bot noise remains an open issue that practitioners struggle with, and one that warrants further academic study.
Appendix
Table A1. IDs and corresponding URLs to the issues and comments referred to in the text.
Funding Statement
This research has been funded by Chalmers University of Technology Foundation and the Swedish Research Council (VR) under grant number 2018-04127 (Developer-Targeted Performance Engineering for Immersed Release and Software Engineers). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Footnotes
1. For example, the first edition of the Automated Software Engineering conference was held in 1997.
2. Left-censored events are those in which data about the first instance of the event, e.g., the creation of an issue, is missing. We have no left-censored events in our dataset.
Additional Information and Declarations
Competing Interests
Philipp Leitner is an Academic Editor for PeerJ CS.
Author Contributions
Linda Erlenhov conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
Francisco Gomes de Oliveira Neto conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
Philipp Leitner conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, authored or reviewed drafts of the paper, and approved the final draft.
Data Availability
The following information was supplied regarding data availability:
A replication package is available at Zenodo: Linda Erlenhov, Francisco Gomes de Oliveira Neto, & Philipp Leitner. (2021). Dependency Management Bots in Open-Source Systems-Prevalence and Adoption [Data set]. Zenodo. https://doi.org/10.5281/zenodo.4974219.
References
- Cassee et al. (2021). Cassee N, Kitsanelis C, Constantinou E, Serebrenik A. Human, bot or both? A study on the capabilities of classification models on mixed accounts. Proceedings of the 37th International Conference on Software Maintenance and Evolution (ICSME) – New Ideas and Emerging Results; 2021.
- Cassee, Vasilescu & Serebrenik (2020). Cassee N, Vasilescu B, Serebrenik A. The silent helper: the impact of continuous integration on code reviews. SANER 2020 – Proceedings of the 2020 IEEE 27th International Conference on Software Analysis, Evolution, and Reengineering; Piscataway: IEEE; 2020. pp. 423–434.
- Dey et al. (2020a). Dey T, Mousavi S, Ponce E, Fry T, Vasilescu B, Filippova A, Mockus A. A dataset of bot commits. Zenodo; 2020a. doi: 10.5281/zenodo.3610205.
- Dey et al. (2020b). Dey T, Mousavi S, Ponce E, Fry T, Vasilescu B, Filippova A, Mockus A. Detecting and characterizing bots that commit code. Proceedings of the 17th International Conference on Mining Software Repositories, MSR ’20; New York: Association for Computing Machinery; 2020b. pp. 209–219.
- Dominic et al. (2020). Dominic J, Houser J, Steinmacher I, Ritter C, Rodeghero P. Conversational bot for newcomers onboarding to open source projects. Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops, ICSEW’20; New York: Association for Computing Machinery; 2020. pp. 46–50.
- Erlenhov, de Oliveira Neto & Leitner (2020). Erlenhov L, de Oliveira Neto FG, Leitner P. An empirical study of bots in software development: characteristics and challenges from a practitioner’s perspective. Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2020; New York: Association for Computing Machinery; 2020. pp. 445–455.
- Erlenhov, de Oliveira Neto & Leitner (2021). Erlenhov L, de Oliveira Neto FG, Leitner P. Replication package – dependency management bots in open-source systems: prevalence and adoption. Zenodo; 2021. doi: 10.5281/zenodo.5567370.
- Erlenhov et al. (2019). Erlenhov L, de Oliveira Neto FG, Scandariato R, Leitner P. Current and future bots in software development. First Workshop on Bots in Software Engineering (BotSE, ICSE); 2019.
- Golzadeh et al. (2021a). Golzadeh M, Decan A, Constantinou E, Mens T. Identifying bot activity in GitHub pull request and issue comments. Third Workshop on Bots in Software Engineering (BotSE, ICSE); 2021a.
- Golzadeh et al. (2021b). Golzadeh M, Decan A, Legay D, Mens T. A ground-truth dataset and classification model for detecting bots in GitHub issue and PR comments. Journal of Systems and Software. 2021b;175(1):110911. doi: 10.1016/j.jss.2021.110911.
- Kaplan & Meier (1958). Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association. 1958;53(282):457–481. doi: 10.1080/01621459.1958.10501452.
- Lamba et al. (2020). Lamba H, Trockman A, Armanios D, Kästner C, Miller H, Vasilescu B. Heard it through the Gitvine: an empirical study of tool diffusion across the NPM ecosystem. Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering; New York: ACM; 2020. pp. 505–517.
- Lebeuf, Storey & Zagalsky (2018). Lebeuf C, Storey M-A, Zagalsky A. Software bots. IEEE Software. 2018;35(1):18–23. doi: 10.1109/MS.2017.4541027.
- Lebeuf et al. (2019). Lebeuf C, Zagalsky A, Foucault M, Storey M. Defining and classifying software bots: a faceted taxonomy. 2019 IEEE/ACM 1st International Workshop on Bots in Software Engineering (BotSE); Piscataway: IEEE; 2019. pp. 1–6.
- Lin, Robles & Serebrenik (2017). Lin B, Robles G, Serebrenik A. Developer turnover in global, industrial open source projects: insights from applying survival analysis. 2017 IEEE 12th International Conference on Global Software Engineering (ICGSE); Piscataway: IEEE; 2017. pp. 66–75.
- Okanović et al. (2020). Okanović D, Beck S, Merz L, Zorn C, Merino L, van Hoorn A, Beck F. Can a chatbot support software engineers with load testing? Approach and experiences. Proceedings of the ACM/SPEC International Conference on Performance Engineering, ICPE ’20; New York: Association for Computing Machinery; 2020. pp. 120–129.
- Paikari & van der Hoek (2018). Paikari E, van der Hoek A. A framework for understanding chatbots and their future. Proceedings of the 11th International Workshop on Cooperative and Human Aspects of Software Engineering, CHASE’18; New York: Association for Computing Machinery; 2018. pp. 13–16.
- Peng et al. (2018). Peng Z, Yoo J, Xia M, Kim S, Ma X. Exploring how software developers work with mention bot in GitHub. ACM International Conference Proceeding Series; New York: Association for Computing Machinery; 2018. pp. 152–155.
- Samoladas, Angelis & Stamelos (2010). Samoladas I, Angelis L, Stamelos I. Survival analysis on the duration of open source projects. Information and Software Technology. 2010;52(9):902–922. doi: 10.1016/j.infsof.2010.05.001.
- Stol, Ralph & Fitzgerald (2016). Stol K-J, Ralph P, Fitzgerald B. Grounded theory in software engineering research: a critical review and guidelines. Proceedings of the 38th International Conference on Software Engineering; 2016. pp. 120–131.
- Urli et al. (2018). Urli S, Yu Z, Seinturier L, Monperrus M. How to design a program repair bot? Insights from the Repairnator project. 2018 IEEE/ACM 40th International Conference on Software Engineering: Software Engineering in Practice Track (ICSE-SEIP); Piscataway: IEEE; 2018. pp. 95–104.
- Usman et al. (2017). Usman M, Britto R, Börstler J, Mendes E. Taxonomies in software engineering: a systematic mapping study and a revised taxonomy development method. Information and Software Technology. 2017;85(4):43–59. doi: 10.1016/j.infsof.2017.01.006.
- Wessel et al. (2018). Wessel M, de Souza BM, Steinmacher I, Wiese IS, Polato I, Chaves AP, Gerosa MA. The power of bots: characterizing and understanding bots in OSS projects. Proceedings of the ACM on Human-Computer Interaction; 2018.
- Wessel et al. (2020a). Wessel M, Serebrenik A, Wiese I, Steinmacher I, Gerosa MA. Effects of adopting code review bots on pull requests to OSS projects. 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME); Piscataway: IEEE; 2020a. pp. 1–11.
- Wessel et al. (2020b). Wessel M, Serebrenik A, Wiese I, Steinmacher I, Gerosa MA. What to expect from code review bots on GitHub? A survey with OSS maintainers. Proceedings of the 34th Brazilian Symposium on Software Engineering; 2020b. pp. 457–462.
- Wessel & Steinmacher (2020). Wessel M, Steinmacher I. The inconvenient side of software bots on pull requests. Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops, ICSEW’20; New York: Association for Computing Machinery; 2020. pp. 51–55.
- Wessel et al. (2019). Wessel M, Steinmacher I, Wiese I, Gerosa MA. Should I stale or should I close? An analysis of a bot that closes abandoned issues and pull requests. 2019 IEEE/ACM 1st International Workshop on Bots in Software Engineering (BotSE); Piscataway: IEEE; 2019. pp. 38–42.
- Wessel et al. (2021). Wessel M, Wiese I, Steinmacher I, Gerosa MA. Don’t disturb me: challenges of interacting with software bots on open source software projects. Proceedings of the ACM on Human-Computer Interaction. 2021;5(CSCW2):1–21. doi: 10.1145/3476042.
- Wyrich & Bogner (2019). Wyrich M, Bogner J. Towards an autonomous bot for automatic source code refactoring. Proceedings of the 1st International Workshop on Bots in Software Engineering, BotSE ’19; Piscataway: IEEE; 2019. pp. 24–28.
- Wyrich et al. (2021). Wyrich M, Ghit R, Haller T, Müller C. Bots don’t mind waiting, do they? Comparing the interaction with automatically and manually created pull requests. ArXiv; 2021. arXiv:2103.03591.