Humans decompose tasks by trading off utility and computational cost

Carlos G Correa; Mark K Ho; Frederick Callaway; Nathaniel D Daw; Thomas L Griffiths

doi:10.1371/journal.pcbi.1011087

PLoS Comput Biol. 2023 Jun 1;19(6):e1011087.

Editor: Tobias U Hauser

PMCID: PMC10234566 PMID: 37262023

Abstract

Human behavior emerges from planning over elaborate decompositions of tasks into goals, subgoals, and low-level actions. How are these decompositions created and used? Here, we propose and evaluate a normative framework for task decomposition based on the simple idea that people decompose tasks to reduce the overall cost of planning while maintaining task performance. Analyzing 11,117 distinct graph-structured planning tasks, we find that our framework justifies several existing heuristics for task decomposition and makes predictions that can be distinguished from two alternative normative accounts. We report a behavioral study of task decomposition (N = 806) that uses 30 randomly sampled graphs, a larger and more diverse set than that of any previous behavioral study on this topic. We find that human responses are more consistent with our framework for task decomposition than alternative normative accounts and are most consistent with a heuristic—betweenness centrality—that is justified by our approach. Taken together, our results suggest the computational cost of planning is a key principle guiding the intelligent structuring of goal-directed behavior.

Author summary

People routinely solve complex tasks by solving simpler subtasks—that is, they use a task decomposition. For example, to accomplish the task of cooking dinner, you might start by choosing a recipe—and in order to choose a recipe, you might start by opening a cookbook. But how do people identify task decompositions? A longstanding challenge for cognitive science has been to describe, explain, and predict human task decomposition strategies in terms of more fundamental computational principles. To address this challenge, we propose a model that formalizes how specific task decomposition strategies reflect rational trade-offs between the value of a solution and the cost of planning. Our account allows us to rationalize previously identified heuristic strategies, understand existing normative proposals within a unified theoretical framework, and explain human responses in a large-scale experiment.

Introduction

Human thought and action are hierarchically structured: We rarely tackle everyday problems in their entirety and instead routinely decompose problems into more manageable subproblems. For example, you might break down the high-level goal of “cook dinner” into a series of intermediate subgoals such as “choose a recipe,” “get the ingredients from the store,” and “prepare food according to the recipe.” Task decomposition—identifying subproblems and reasoning about them—lies at the heart of human general intelligence. It allows people to tractably solve problems that occur at many different timescales, ranging from everyday tasks such as cooking a meal to more ambitious projects such as completing a Ph.D.

At least two questions arise in the context of human task decomposition. First, how do people use decompositions? Second, how do people decompose tasks to begin with? Existing research provides answers to these two questions, but does so largely by considering each one in isolation. For example, we know that, when given hierarchical structure, people readily use it to bootstrap learning [1, 2] and to organize planning [3, 4]. Separately, studies show how hierarchical structure emerges from graph-theoretic properties of tasks (e.g., “bottleneck” states) [5], latent causal structure in the environment [6, 7], or efficient encoding of optimal behaviors [8]. These accounts provide insights into the function and mechanisms of hierarchically structured, action-guiding representations, but, again, they largely consider the use and the creation of such representations separately.

In this paper we bridge this gap, developing an integrated account of how using a decomposition interacts with the task decomposition process itself. Our proposal is organized around a deceptively simple idea: Task decompositions are learned to facilitate efficient planning (Fig 1). Based on this intuition, we develop a normative framework that specifies how an idealized agent should choose a hierarchical structure for a domain, given the need to balance task performance with the costs of planning. We quantify planning costs straightforwardly as the run-time of a planning algorithm, which means that our framework predicts that task decomposition is the result of interactions between task structure and the algorithm used to plan. Because we quantitatively examine how cognitive costs are balanced with task performance in the style of a resource-rational analysis [9–11], we refer to our framework as resource-rational task decomposition.

Fig 1 — We formalize this in three nested layers of optimization: Action-Level Planning solves for a plan to accomplish a subgoal, which has a computational cost. Subgoal-Level Planning constructs subgoal sequences that maximize reward and minimize computational cost. Task Decomposition selects subgoals based on their value in Subgoal-Level Planning. This figure was adapted from a figure published in [12] (License: CC BY 4.0).

Although much prior research is motivated by the idea that hierarchical task decomposition has the potential to reduce planning costs [8, 13–19], our framework differs from some prominent accounts because we directly incorporate planning costs into the criteria used to choose a task decomposition. By contrast, existing normative accounts typically formulate task decomposition as structure inference with the goal of inferring the hierarchical structure of the environment [6, 7] or sequential behavior [8, 20], which only indirectly connect to the computations involved in planning or their efficiency. Instead, our formal framework extends research that performs task decomposition based on algorithm-specific planning costs—some algorithms previously studied are value iteration [19], random walk search [21], and random sampling of optimal behavior [8]. Generalizing beyond a fixed algorithm, our framework explicitly considers how planning efficiency shapes hierarchical representations, which we use to demonstrate how resource-rational task decompositions change with varied search algorithms. Additionally, normative accounts often have limited ability to explain human behavior without the specification of algorithmic details necessary to be efficient or psychologically plausible. Because our account focuses on efficient use of planning, it is even more critical to spell out these details. We conduct an initial exploration of these issues by examining the capacity of existing heuristics to serve as efficient algorithmic approximations to our normative account.

Using this framework, we conduct a systematic comparison of resource-rational task decomposition with four alternative formal models previously reported in the literature [7, 8, 16, 22] using 11,117 different graph-structured planning tasks. One key insight from this analysis is that our framework can justify several previously proposed heuristics for task decomposition based on graph-theoretic properties (e.g., those that capture the idea of “bottleneck states” [16, 22]). Our framework thus provides a normative justification for these heuristics within a broader framework of resource-rational decision-making. Critically, these connections between our framework and heuristics also demonstrate the existence of efficient approximations to our formal framework. We also show that our framework produces predictions that are distinguishable from previous normative models of task decomposition.

To empirically evaluate this framework, we report results from a pre-registered experiment (N = 806) that uses 30 distinct graph-structured tasks sampled from 1,676 graphs, the subset of the 11,117 graphs that are compatible with our experimental design. To our knowledge, this set of graph-structured tasks is larger and more diverse than that of any other experiment previously reported in the literature on how people decompose tasks. As such, it enables us to draw more general conclusions about task decomposition than previous studies. Across this large stimulus set, we find that our framework provides the best explanation for participant responses among normative accounts, which supports the thesis that people’s hierarchical decomposition of tasks reflects a rational allocation of limited computational resources in service of effective planning and acting.

Results

A formal framework for task decomposition

How should an agent decompose a task? When purely optimizing behavior on a task (e.g., taking a shortest path in a graph), decomposing a task is only worthwhile in some larger context, such as making learning or computation more efficient. The computational efficiency of planning is a critical concern—as one attempts to plan into an increasingly distant future, over a larger state space, or under conditions of greater uncertainty, computation quickly becomes intractable, a challenge termed the curse of dimensionality [23]. In some cases, task decomposition can ameliorate this curse by splitting a task into more manageable subtasks. The difficulty comes in choosing among task decompositions since a bad choice can make the task at hand even more difficult [1]. In this work, we formulate a framework for task decomposition where planning costs directly factor into people’s choices—quite literally, our framework decomposes tasks into subtasks based on the run-time and utility of the plan that results from planning algorithms that solve the subtasks.

To demonstrate the role planning costs play in how people break down tasks, consider the following scenario: After leaving work for the day, you plan to go to the post office to send a letter. Since you rarely navigate directly from your workplace to the post office, you'll have to do some planning. You could determine some efficient way to get from work to the post office, but an alternative is to first get somewhere that is easy to navigate to and also along the way, such as the café you sometimes stop by before work. If it's easier to plan a route from the café to the post office, then you've simplified your problem by breaking it down into subtasks. This example suggests that the way people break tasks down (e.g., navigating to the café first) reflects a trade-off between efficiency (e.g., taking a quick route) and the cost of planning with that task decomposition (e.g., planning via the café is simpler).

We formalize our framework using three nested levels of planning and learning (Fig 1). At the lowest level is action-level planning, where concrete actions are chosen that solve a subtask (e.g., what direction do I walk to get to the café). The next level is subtask-level planning, where a sequence of subtasks is chosen (e.g., first navigating to the café and then to the post office). Finally, the highest level is task decomposition, where a set of subtasks that break up the environment is selected (e.g., setting the café as a possible subgoal across multiple tasks).

A central feature of our framework is the interdependence of choices made at each level: The optimal task decomposition depends on the computations occurring in the subtask-level planner, which depends on the computations that occur in the action-level planner. In particular, we are interested in how different decompositions can be evaluated as better or worse based on the cost of computing good action-level plans for a series of subtasks chosen by the subtask-level planner. In the next few sections, we discuss these different levels and how they relate to one another.

Action-level planning

Action-level planning computes the optimal actions that one should take to reach a subgoal. Here, we focus on deterministic, shortest-path problems. Formally, action-level planning occurs over a task $(S, T, s_{0}, z)$ defined by a set of states, $S$; an initial state, $s_{0} \in S$; a subgoal state, $z \in S$; and valid transitions between states, $T \subseteq S \times S$, so that $s$ can transition to $s'$ when $(s, s') \in T$. The neighbors of $s$ are the states $s'$ that it can transition to, $N(s) = \{ s' \mid (s, s') \in T \}$. We refer to the structure of a task, $(S, T)$, as the task environment or the task graph.

Given an initial state, $s_{0}$, and a subgoal, $z$, action-level planning seeks to find a sequence of states that begins at $s_{0}$ and ends at $z$, which we denote as a plan $\pi = \langle s_{0}, s_{1}, \ldots, z \rangle$. An action-level plan is computed by a planning algorithm, which is a stochastic function that takes in the initial state, valid transitions, and subgoal state and takes a certain amount of time, $t$, to run. Thus, we can think of a planning algorithm Alg as inducing a distribution over plans and run-times given a start and end state, $P_{\mathrm{Alg}}(\pi, t \mid s, z)$. In this work, we considered four different planning algorithms [24, 25], which we summarize below.

We start with a simple algorithm that hardly seems like one—the random walk (RW) algorithm. The algorithm starts at the initial state s₀ and repeatedly transitions to a uniformly sampled neighbor of the current state until it reaches the subgoal state z. Because it does not keep track of previously visited states to inform state transitions, this algorithm can revisit states many times and can result in both path lengths and run-times that are unbounded.
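A random walk can be read as a sampler from $P_{\mathrm{Alg}}(\pi, t \mid s, z)$. The minimal Python sketch below assumes the task graph is stored as a dict mapping each state to a list of its neighbors $N(s)$ and counts run-time $t$ as the number of transitions taken. These conventions, the function name `random_walk`, and the `max_steps` safety cap are illustrative assumptions, not the authors' implementation.

```python
import random


def random_walk(graph, start, goal, rng, max_steps=100_000):
    """Sample one (plan, run-time) pair from a random-walk planner.

    graph maps each state to a list of its neighbors N(s). Run-time t is
    counted as the number of transitions taken (an assumed convention).
    max_steps is an illustrative safety cap, since the walk is unbounded.
    """
    plan = [start]
    state = start
    for t in range(max_steps + 1):
        if state == goal:
            return plan, t
        state = rng.choice(graph[state])  # uniform over neighbors; no memory
        plan.append(state)
    raise RuntimeError("walk did not reach the goal within max_steps")


# Usage: a four-state line graph 0 - 1 - 2 - 3.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
plan, t = random_walk(line, 0, 3, random.Random(0))
```

Because the walk keeps no record of where it has been, the returned plan can contain the same state many times, so both $|\pi|$ and $t$ vary from run to run.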

Depth-first search (DFS) augments RW by keeping track of the states along its current plan—this helps minimize repeated state visits. Because DFS does this, it will sometimes reach a dead-end where it is unable to extend the current plan, so it backtracks to an earlier state to consider an alternative choice among the other neighbors. DFS ensures that resulting plans avoid revisiting states, but still might be suboptimal and repeatedly consider the same states during the search process.
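A minimal Python sketch of this path-checking DFS follows. It assumes an adjacency-list dict, a randomized neighbor order (so the planner is stochastic), and run-time counted as the number of states the search considers; the function name `dfs` is hypothetical.

```python
import random


def dfs(graph, start, goal, rng):
    """Path-checking depth-first search with a randomized neighbor order.

    Only the states on the current candidate plan are tracked, so the
    returned plan never repeats a state, but the search may consider the
    same state many times on different branches. Run-time t counts every
    state the search considers (an assumed convention).
    """
    t = 0

    def extend(plan):
        nonlocal t
        t += 1
        state = plan[-1]
        if state == goal:
            return plan
        neighbors = list(graph[state])
        rng.shuffle(neighbors)
        for nxt in neighbors:
            if nxt not in plan:  # never extend the plan into a cycle
                result = extend(plan + [nxt])
                if result is not None:
                    return result
        return None  # dead end: backtrack to an earlier state

    return extend([start]), t


# Usage: a four-state cycle, where both routes from 0 to 3 take two steps.
square = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
plan, t = dfs(square, 0, 3, random.Random(0))
```

On graphs with more branching, the first complete plan found this way can be longer than necessary, which is why DFS is not guaranteed to be optimal.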

Iterative Deepening Depth-first Search (IDDFS; [26]) consists of depth-limited DFS run to increasing depths until the goal is found; while based on DFS, IDDFS returns optimal paths because it systematically increases the depth limit. IDDFS is conceptually similar to “progressive deepening,” a search strategy proposed by de Groot in seminal studies of chess players [27, 28].
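The sketch below illustrates IDDFS as repeated depth-limited, path-checking DFS. It is a minimal illustration, assuming an adjacency-list dict, a fixed neighbor order (for reproducibility), and run-time counted as every state considered across all iterations; the function name `iddfs` is hypothetical.

```python
def iddfs(graph, start, goal):
    """Iterative deepening DFS: depth-limited, path-checking DFS run with
    depth limits 0, 1, 2, ... until the goal is found.

    Returns a shortest plan and run-time t, the total number of states
    considered across all iterations (an assumed convention). Neighbor
    order is fixed here for reproducibility; shuffling it would make t
    stochastic.
    """
    t = 0

    def limited(plan, depth):
        nonlocal t
        t += 1
        state = plan[-1]
        if state == goal:
            return plan
        if depth == 0:
            return None
        for nxt in graph[state]:
            if nxt not in plan:
                result = limited(plan + [nxt], depth - 1)
                if result is not None:
                    return result
        return None

    # A shortest plan has at most len(graph) - 1 transitions.
    for limit in range(len(graph)):
        result = limited([start], limit)
        if result is not None:
            return result, t
    return None, t  # goal unreachable from start


line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
plan, t = iddfs(line, 0, 3)  # plan == [0, 1, 2, 3], t == 10
```

Because every simple path up to the current limit is explored before the limit grows, the first plan found is a shortest one; the price is that states near the start are considered again in every iteration.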

Breadth-first search (BFS) ensures optimal paths by systematically exploring states in order of increasing distance from the start state s₀. The algorithm does so by considering all neighbors of the start state (which are one step away), then all of their unvisited neighbors (which are two steps away), and so on, successively repeating this process until the goal is encountered. Through this systematic process, BFS is able to guarantee optimal solutions and ensure states will only ever be considered once, making the algorithm run-time linear in the number of states.
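A minimal Python sketch of BFS under the same assumed conventions (adjacency-list dict; run-time counted as states expanded) is shown below; the function name `bfs` is hypothetical.

```python
from collections import deque


def bfs(graph, start, goal):
    """Breadth-first search returning a shortest plan and run-time t.

    Each state enters the frontier at most once, so t (states expanded,
    an assumed convention) never exceeds the number of states. The
    `parent` dict stores every state seen so far: this is BFS's memory cost.
    """
    parent = {start: None}
    frontier = deque([start])
    t = 0
    while frontier:
        state = frontier.popleft()
        t += 1
        if state == goal:
            plan = []
            while state is not None:  # walk parent links back to the start
                plan.append(state)
                state = parent[state]
            return plan[::-1], t
        for nxt in graph[state]:
            if nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    return None, t  # goal unreachable from start


line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
plan, t = bfs(line, 0, 3)  # plan == [0, 1, 2, 3], t == 4
```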

While only noted in passing above, each of these algorithms makes subtle trade-offs between run-times, memory usage, and optimality. Focusing on BFS and IDDFS, the two optimal algorithms, we briefly examine these trade-offs. BFS visits states at most once, but requires remembering every previously visited state; by contrast, IDDFS will revisit states many times (i.e. greater run-time compared to BFS) but only has to track the current candidate plan (i.e. smaller memory use compared to BFS). In effect, IDDFS increases its run-time to avoid the cost of greater memory use. We briefly return to this point below when examining algorithm run-times. Having introduced various search algorithms, we now turn to the role algorithms play in subtask-level planning, where algorithm plans and run-times jointly influence the choice of hierarchical plan.
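As a worked illustration of the run-time side of this trade-off, assume run-time counts every state a search considers and the task graph is a simple path with the start $s_{0}$ at one end and the goal $d$ steps away. With depth limit $k < d$, IDDFS considers only the $k + 1$ states $s_{0}, \ldots, s_{k}$; at limit $k = d$ it considers $d + 1$ states and reaches the goal. BFS instead expands each of those $d + 1$ states once:

$$t_{\mathrm{IDDFS}} = \sum_{k=0}^{d} (k + 1) = \frac{(d + 1)(d + 2)}{2}, \qquad t_{\mathrm{BFS}} = d + 1.$$

For $d = 3$, IDDFS considers 10 states and BFS considers 4, while IDDFS only ever holds the current candidate plan in memory.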

Subtask-level planning

Here, we assume a simplified model of hierarchical planning that involves only a single level above action-level planning, which we call subtask-level planning. Formally, subtask-level planning occurs over a set of subgoals, $Z \subset S$ . Given a set of subgoals, subtask-level planning consists of choosing the best sequence of subgoals that accomplish a larger aim of reaching a goal state $g \in S$ . Each subgoal is then provided to the action-level planner, and the resulting action-level plans are combined into a complete plan to reach the goal state.

The objective of the subtask-level planner is to identify the sequence of subgoals that brings the agent to the goal state while maximizing task rewards and minimizing computational costs. Here, we focus on a domain in which the task is simply to reach the goal state in as few steps as possible. Formally, the task reward associated with executing a plan is then simply the negative number of states in that plan: $R(\pi) = -|\pi|$. The computational cost that we consider is cumulative expected run-time. Thus, we define a subgoal-level reward function when planning to a single subgoal $z$ from a state $s$ using a planning algorithm that induces a distribution over plans and run-times $P_{\mathrm{Alg}}(\pi, t \mid s, z)$ as:

$$R_{\mathrm{Alg}}(s, z) = \sum_{\pi, t} P_{\mathrm{Alg}}(\pi, t \mid s, z) \, \bigl[ R(\pi) - t \bigr] \tag{1}$$

This formulation is analogous to other resource-rational models that jointly optimize task rewards and run-time [29, 30] but applied to the problem of task decomposition.
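Because Eq 1 is an expectation over the planner's stochastic plans and run-times, it can be approximated by averaging $R(\pi) - t$ over sampled runs. The minimal Python sketch below does this for a random-walk planner, assuming a connected graph stored as an adjacency-list dict and run-time counted as the number of transitions; the names `random_walk` and `estimate_subgoal_reward` are hypothetical.

```python
import random
import statistics


def random_walk(graph, start, goal, rng):
    """One sample from P_Alg(pi, t | s, z) for a random-walk planner.

    Assumes a connected graph (adjacency-list dict); t counts transitions.
    """
    plan, state = [start], start
    while state != goal:
        state = rng.choice(graph[state])
        plan.append(state)
    return plan, len(plan) - 1


def estimate_subgoal_reward(graph, s, z, planner, n_samples=2000, seed=0):
    """Monte Carlo estimate of Eq 1: R_Alg(s, z) = E[R(pi) - t].

    R(pi) = -|pi| is minus the number of states in the plan; the sum over
    P_Alg(pi, t | s, z) is replaced by an average over sampled runs.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        plan, t = planner(graph, s, z, rng)
        samples.append(-len(plan) - t)
    return statistics.fmean(samples)


line = {0: [1], 1: [0, 2], 2: [1]}
r = estimate_subgoal_reward(line, 0, 2, random_walk)
```

Any planner with the same `(graph, s, z, rng) -> (plan, t)` interface could be passed in place of `random_walk`; for a deterministic planner, a single sample already gives the exact value.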

Eq 1 defines the rewards for planning towards a single subgoal, but subtask-level planning requires chaining plans together to form a larger plan that efficiently solves the task. This sequential optimization problem can be compactly expressed as a set of recursively defined Bellman equations [23]. Formally, given a task goal $g$, a set of subgoals $Z$, and an algorithm Alg, the optimal subtask-level planning utility for all non-goal states $s$ is then:

$$V_{Z}^{g}(s) = \max_{z \in Z \cup \{g\}} \bigl\{ R_{\mathrm{Alg}}(s, z) + V_{Z}^{g}(z) \bigr\} \tag{2}$$

To ensure this recursive equation terminates, the utility of the goal state g is $V_{Z}^{g} (g) = 0$ . The fixed point of Eq 2 can be used to identify the optimal subtask-level policy [31]. We permit the selection of the goal g as a subgoal to ensure that it is possible for the subtask-level planner to solve the task.
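A minimal Python sketch of Eq 2 is shown below. It assumes the subgoal-level rewards $R_{\mathrm{Alg}}(s, z)$ have already been computed (for example, by sampling as in Eq 1) and are supplied as a dict keyed by `(s, z)`. Because every $R_{\mathrm{Alg}}(s, z)$ is negative (each plan contains at least one state and run-times are non-negative), an optimal subgoal sequence never repeats a subgoal, so a bounded number of Bellman-Ford-style sweeps reaches the fixed point. The function name and the reward values in the usage example are hypothetical.

```python
import math


def subtask_value(R, subgoals, goal, start):
    """Optimal subtask-level utility V_Z^g(start) from Eq 2.

    R maps (s, z) to the subgoal-level reward R_Alg(s, z) from Eq 1.
    V(goal) = 0 terminates the recursion. Since every R_Alg(s, z) is
    negative, no optimal subgoal sequence repeats a subgoal, so
    len(targets) Bellman-Ford-style sweeps reach the fixed point.
    """
    if start == goal:
        return 0.0
    targets = set(subgoals) | {goal}  # the goal is always a valid subgoal
    V = {z: -math.inf for z in targets}
    V[goal] = 0.0
    for _ in range(len(targets)):
        for s in targets - {goal}:
            V[s] = max(R[s, z] + V[z] for z in targets if z != s)
    return max(R[start, z] + V[z] for z in targets if z != start)


# Hypothetical rewards: going straight to the goal is costly to plan,
# but two short hops via subgoal "a" are cheaper overall.
R = {("s0", "g"): -20.0, ("s0", "a"): -6.0, ("a", "g"): -6.0}
value = subtask_value(R, {"a"}, "g", "s0")  # max(-6 - 6, -20) == -12.0
```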

Task decomposition

Having defined action-level planning and subtask-level planning over subgoals, we can now turn to our original motivating question: How should people decompose tasks? In this context, this reduces to the problem of selecting the best set of subgoals $Z$ to plan over. Importantly, we assume that people rely on a common set of subgoals for all the different possible tasks that they might have to accomplish in a given environment. Thus, the value of a task decomposition, $Z$ , is given by the value of the subtask-level plans averaged over the task distribution of an environment, p(s₀, g). That is,

$$V(Z) = \sum_{s_{0}, g} p(s_{0}, g) \, V_{Z}^{g}(s_{0}) \tag{3}$$

The optimal set of subgoals for planning, $Z^{*}$, maximizes this value: $Z^{*} = \arg\max_{Z} V(Z)$.
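The sketch below composes Eqs 1, 2, and 3 for single-subgoal decompositions $Z = \{z\}$, with $p(s_{0}, g)$ uniform over ordered pairs of distinct states. To stay short and deterministic, it assumes a BFS action-level planner with a fixed neighbor order (so the expectation in Eq 1 has a single term) and counts run-time as states expanded; these choices and all function names are illustrative assumptions, not the authors' code.

```python
import math
from collections import deque
from itertools import permutations


def bfs_reward(graph, s, z):
    """R_Alg(s, z) for a BFS planner with a fixed neighbor order.

    The planner is deterministic, so Eq 1's expectation has one term:
    -|plan| - t, with t = number of states expanded (assumed convention).
    """
    parent = {s: None}
    frontier = deque([s])
    t = 0
    while frontier:
        state = frontier.popleft()
        t += 1
        if state == z:
            length = 0
            while state is not None:  # count the states on the plan
                length += 1
                state = parent[state]
            return -length - t
        for nxt in graph[state]:
            if nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    raise ValueError("subgoal unreachable")


def decomposition_value(graph, subgoals):
    """Eq 3 with p(s0, g) uniform over ordered pairs of distinct states."""
    R = {(s, z): bfs_reward(graph, s, z) for s, z in permutations(graph, 2)}
    tasks = list(permutations(graph, 2))
    total = 0.0
    for s0, g in tasks:
        targets = set(subgoals) | {g}
        V = {z: -math.inf for z in targets}  # Eq 2, solved by sweeps
        V[g] = 0.0
        for _ in range(len(targets)):
            for s in targets - {g}:
                V[s] = max(R[s, z] + V[z] for z in targets if z != s)
        total += max(R[s0, z] + V[z] for z in targets if z != s0)
    return total / len(tasks)


def single_subgoal_values(graph):
    """Value of each one-state decomposition Z = {z}; the argmax is Z*
    among single-subgoal decompositions."""
    return {z: decomposition_value(graph, {z}) for z in graph}


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
values = single_subgoal_values(path)
best = max(values, key=values.get)
```

Because the direct route to $g$ is always among the options in Eq 2, adding a subgoal can never lower $V(Z)$ in this sketch; whether it raises it depends on the planner's run-times.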

To summarize, the value of a task decomposition (Eq 3) depends on how a subtask-level planner plans over the decomposed task (Eq 2), which is shaped by the resulting plans and run-time of action-level planning (Eq 1). This model thus captures how several key factors shape task decomposition: the structure of the environment, the distribution of tasks given by an environment, and the algorithm used to plan at the action level.

To provide an intuition for our framework, we explore its predictions in a simple task in Fig 2. The environment is a grid with a single task that requires navigating from the green state to the orange state. Each column in the figure corresponds to a different search algorithm, showing how search costs change without subgoals (Top) and with a subgoal (Bottom; subgoal in blue). Although the task would seem trivial to a person, like walking from one side of a room to another, a critical attribute of these search algorithms is that they have an entirely unstructured representation of the environment, giving them only very local visibility at each state. A more analogous task for a person might be navigating in a place with low visibility, such as a forest or a city in a blackout. Even in this simple task, some search algorithms (BFS and IDDFS) can be used more efficiently when the problem is split at its midpoint (Fig 2d). The random walk is a notable counterexample: using a subgoal makes its search less efficient. This result may initially seem puzzling, but it occurs because a random walk is likely to reach the goal without passing through the subgoal. The gap in run-time between IDDFS and BFS might make IDDFS seem inefficient; however, as noted in the algorithm descriptions above, IDDFS trades increased run-time for decreased memory usage. Although outside the scope of the current manuscript, our formulation can be extended to incorporate other resource costs, such as memory usage, to study how they influence task decomposition. This example demonstrates two characteristics of our framework: the choice of hierarchy critically depends on the algorithm used for search, and hierarchy can have a normative benefit (because it reduces computational costs) even in the absence of learning or generalization.

The formal presentation of our framework considers subgoal choice with intentionally restricted algorithms: brute-force search methods that exclude problem-specific heuristics to accelerate planning. However, the examples in this section (navigating to the post office via the café, navigating in a place with low visibility) likely rely on algorithms that incorporate heuristics, particularly related to spatial navigation. While outside the scope of this manuscript, our framework can flexibly incorporate any search algorithm that can define an algorithmic cost, including those that make use of heuristics. For example, in a previous theoretical study, we applied an early version of our framework to task decomposition in the Tower of Hanoi by using A* Search [25] with an edit distance heuristic [12]. Our framework also considers a constrained set of task decompositions that consist of individual subgoals. In comparison, the influential options framework [15] defines a more general set of hierarchical task decompositions, in which, for instance, subtasks can be defined by the subgoal of reaching any one out of a set of states and subtask completion can be non-deterministic. However, finding such subtasks remains a challenging problem in machine learning, so by focusing on the simpler problem of selecting a single subgoal we are able to make a significant amount of progress in understanding a key component of how humans plan. While not yet fully explored, our formalism can be extended to encompass broader types of hierarchy (and also varied search algorithms). For instance, another theoretical study adapted our framework to support more varied kinds of hierarchical structure by incorporating abstract spatial subgoals in a block construction task [32].

Comparing accounts of task decomposition

A number of existing theories have been proposed for how people decompose tasks. These accounts can be divided into two broad categories: heuristics for decomposition based on graph-theoretic properties of tasks and normative accounts based on the functional role of a decomposition. Our account is normative, so comparison with alternative normative accounts highlights the unique functional consequences of our framework. By comparing our framework to heuristics, we can characterize their predictions relative to normative theories as well as rationalize their use as heuristics for task decomposition. All models are listed and briefly described in Table 1.

Table 1. Descriptions of Normative Algorithms and Heuristics.

Normative Algorithm	Description
RRTD-IDDFS	Resource-Rational Task Decomposition (RRTD) using Iterative-Deepening Depth-First Search (IDDFS) as a search algorithm
RRTD-BFS	RRTD using Breadth-First Search (BFS) as a search algorithm
RRTD-RW	RRTD using a Random Walk (RW) as a search algorithm
Solway et al. (2014) [8]	Identifies partitions of the task into subtasks that minimize the description length of optimal solutions, given that subtask solutions are reused across tasks.
Tomov et al. (2020) [7]	Performs inference over partitions of the task graph into regions based on a prior over hierarchical graphs. Incorporates a preference for tasks to start and end in the same region, and for states in the same region to have similar rewards.
Heuristic	Description
QCut [16, 33]	Partitions the task graph through spectral decomposition of the graph.
Degree Centrality	Chooses subgoals based on Degree Centrality, which is the number of transitions into or out of a state s. For tasks where all state transitions are reversible, Degree Centrality is the number of neighbors $|N(s)|$.
Betweenness Centrality [22]	Chooses subgoals based on Betweenness Centrality, which is how often a state s appears on shortest paths, averaged over all possible start and goal states. Takes into account cases with multiple shortest paths.
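To make the two centrality heuristics in Table 1 concrete, here is a minimal Python sketch for small, connected, undirected graphs stored as adjacency-list dicts. Betweenness counts, for every start/goal pair, the fraction of shortest paths passing through a state, so ties between multiple shortest paths are split by path counts. Averaging over ordered start/goal pairs that exclude the state itself is our assumption about normalization (other implementations may normalize differently), and all function names are hypothetical.

```python
from collections import deque


def bfs_counts(graph, source):
    """Distances and numbers of shortest paths from source (unweighted)."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w], sigma[w] = dist[u] + 1, 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma


def degree_centrality(graph):
    """|N(s)| for an undirected graph given as adjacency lists."""
    return {s: len(nbrs) for s, nbrs in graph.items()}


def betweenness_centrality(graph):
    """Fraction of shortest paths through each state, averaged over
    ordered (start, goal) pairs that exclude the state itself."""
    info = {s: bfs_counts(graph, s) for s in graph}
    n = len(graph)
    scores = {}
    for v in graph:
        dist_v, sigma_v = info[v]
        total = 0.0
        for s in graph:
            dist_s, sigma_s = info[s]
            for t in graph:
                if len({s, t, v}) < 3:
                    continue
                # v lies on a shortest s-t path iff distances add up.
                if dist_s[v] + dist_v[t] == dist_s[t]:
                    total += sigma_s[v] * sigma_v[t] / sigma_s[t]
        scores[v] = total / ((n - 1) * (n - 2))
    return scores


cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
bc = betweenness_centrality(cycle)  # each state lies on half of two tasks' paths
```

This direct triple loop is easy to check by hand but scales cubically with the number of states; it is meant only for graphs as small as those studied here.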


To begin with, one way to compare accounts is to relate them formally. We do so by relating resource-rational task decomposition with a random walk (RRTD-RW) to QCut [16, 33] and Degree Centrality in S1 Appendix. We prove a relationship between RRTD-RW and QCut that connects the two methods through spectral analysis and examine how the relationship between RRTD-RW and Degree Centrality varies based on spectral graph properties.

Another method we can use to compare theories is to compute their subgoal predictions on a fixed set of environments and compare them, qualitatively or quantitatively. This approach has been used in existing studies, but nearly always through qualitative comparison using a small number of hand-picked environments. For example, several published models perform qualitative comparisons to the graphs studied in Solway et al. (2014) [8]. These environments contain states that most models agree should be subgoals [7, 12, 21, 34]: typically one or a few states that connect otherwise disconnected parts of the environment, which makes them “bottleneck states.” Environments with these kinds of bottleneck states robustly elicit hierarchically structured behavior in experiments, but they make it difficult to distinguish among theoretical accounts because the accounts strongly agree on them (see top row of Fig 3).

Fig 3 — State color and size are proportional to model prediction when using the state as a subgoal. (Top) The 10-node, regular graph from Solway et al. (2014) [8]. (Middle, Bottom) Two eight-node graphs selected from the 11,117 included in our analysis.

To perform a large-scale and unbiased comparison of these algorithms, we chose from a structurally rich set of environments: the set of all possible 11,117 simple, undirected, eight-node, connected graphs. We compare subgoal choice for several heuristic theories, several variants of our framework, and variants of the normative accounts proposed by Solway et al. (2014) [8] and Tomov et al. (2020) [7] in Fig 4 (the theories are described in Table 1). Each cell of Fig 4 is the correlation between the subgoal predictions of a pair of theories, averaged across all environments. For simplicity, we assume the task distribution is a uniform distribution over pairs of distinct start and goal states. For the RRTD-based models, the model prediction for a state is the corresponding value of task decomposition when the state is the only possible subgoal.

Fig 4 — For each graph, correlations between two models are computed on the per-state subgoal values, then averaged across the 11,117 simple, connected, undirected, eight-node graphs. We discard correlations when either of the two models predicts a uniform distribution over subgoals because the correlation is not defined in those instances. References: Solway et al. (2014) [8], Tomov et al. (2020) [7].
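The per-graph correlation and averaging procedure described for Fig 4 can be sketched as follows. This is a minimal illustration, assuming each model's predictions for a graph are stored as a dict from state to subgoal value; the function names are hypothetical, and a uniform prediction is detected by exact equality of all values.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists of per-state values."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def mean_correlation(predictions_a, predictions_b):
    """Average per-graph correlation between two models' subgoal values.

    predictions_a/b: lists (one entry per graph) of dicts state -> value.
    Graphs where either model's values are all equal (a uniform prediction)
    are skipped, because the correlation is undefined there.
    """
    rs = []
    for pa, pb in zip(predictions_a, predictions_b):
        states = sorted(pa)
        x = [pa[s] for s in states]
        y = [pb[s] for s in states]
        if len(set(x)) == 1 or len(set(y)) == 1:
            continue
        rs.append(pearson(x, y))
    return sum(rs) / len(rs) if rs else math.nan


model_a = [{0: 1, 1: 2, 2: 3}, {0: 1, 1: 1, 2: 1}, {0: 1, 1: 2, 2: 3}]
model_b = [{0: 2, 1: 4, 2: 6}, {0: 1, 1: 2, 2: 3}, {0: 3, 1: 2, 2: 1}]
r = mean_correlation(model_a, model_b)  # (1.0 + (-1.0)) / 2; graph 2 skipped
```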

We find a few notable clusters of theories. One shows that RRTD-IDDFS is well correlated with subgoal sampling based on Betweenness Centrality, which suggests a possible formal connection between the two algorithms, though we are not aware of an existing proof connecting them. A second prominent cluster shows that RRTD-RW and Degree Centrality are highly correlated and that both are moderately correlated with QCut, consistent with our formal analysis. This relationship between RRTD-RW and Degree Centrality is qualitatively consistent with a published result that relates Degree Centrality to the task decomposition that minimizes a search cost related to RW [21]. The remaining algorithms (RRTD-BFS, Solway et al. (2014) [8], and Tomov et al. (2020) [7]) form singleton clusters, suggesting qualitative differences from the other algorithms.

To better understand these large-scale quantitative patterns, we qualitatively examine some of the normative algorithms and one heuristic. We focus on the subgoal predictions for three graphs, shown in Fig 3. In the top row is a 10-node, regular graph that has been previously studied [7, 8]. In the middle row is a similar graph with critical differences: the graph is asymmetric about the graph bottleneck, and the bottleneck of the graph now corresponds to a single state instead of two connected states. In the bottom row is a graph notably distinct from graphs typically studied because it lacks an obvious hierarchical structure. All algorithms make similar predictions for the graph in the top row—an example of the difficulty in using typically studied graphs to distinguish among algorithms. Now, we look at the algorithms in more detail.

We first examine RRTD-IDDFS (Fig 3a) and Betweenness Centrality (Fig 3b), noting their strong agreement, which is consistent with the large-scale correlation analysis above. These two algorithms prefer the same subgoals in both the middle and bottom graphs. In the middle graph, they prefer the bottleneck state. In the bottom graph, they prefer states that are close to many states; in particular, the two most-preferred states can reach any other state in at most two steps.

Now, we turn to the other normative accounts: Solway et al. (2014) [8] (Fig 3c) and Tomov et al. (2020) [7] (Fig 3d). Both rely on partition-based representations of hierarchical structure where states are partitioned into different groups. Mapping from partitions onto subgoal choices requires a step of translation. In particular, when considering a path that crosses from one group into another there are two natural subgoals that correspond to the boundary between the groups: either the last state in the first group or the first state in the second group. In the context of a task distribution, there are many possible ways to map partitions onto subgoal choices, without clear consensus between the two partition-based accounts that we consider. While these analysis choices have little impact on symmetric graphs (e.g., top row of Fig 3), they are important for asymmetric graphs like the one in the middle row, which has a bottleneck state instead of a bottleneck edge. For simplicity, our implementation of Solway et al. (2014) [8] uses the optimal hierarchy, placing uniform weight over all states at the boundaries between groups of the partition, as can be seen in Fig 3c.

The algorithm from Tomov et al. (2020) [7] introduces other subtleties. It poses task decomposition as inference of hierarchical structure, with two main criteria: 1) that there are neither too few nor too many groups (accomplished via a Chinese Restaurant Process) and 2) that connections within groups are dense while connections between groups are sparse. The latter leads to issues when connection counts do not reflect hierarchical structure, as shown in Fig 3d. At middle, the algorithm prefers partitions that minimize the number of cross-group connections, even when the bottleneck state is not on the boundary between groups. At bottom, the lack of hierarchical structure that can be detected by edge counts leads the algorithm to make diffuse predictions among many possible subgoals.

To close this section, we briefly discuss how our resource-rational account might plausibly be implemented. We outline two broad approaches: people might directly search for task decompositions that maximize Eq 3, or they might approximate the objective through tractable heuristics. The first approach faces a difficulty: finding the optimal task decomposition by brute force is more computationally expensive than simply solving the task. One alternative is to learn the value of task decompositions, relying on the structure shared between tasks and subgoals to make learning efficient. For example, in the domain of strategy selection, one study exploits shared structure to make estimation efficient and incorporates these estimates into decision-theoretic methods that handle their uncertainty [35]. The second approach approximately optimizes the objective with a more tractable heuristic. The results in this section suggest two examples: Betweenness Centrality can approximate RRTD-IDDFS, and Degree Centrality can approximate RRTD-RW. While Degree Centrality is straightforward to compute, Betweenness Centrality is still computationally costly because it requires finding optimal paths for all tasks. Importantly, Betweenness Centrality has a probabilistic formulation, so it can be estimated with analytic error bounds [36]. In this formulation, more central states appear more often in paths sampled from an appropriate distribution (i.e., sample a task, then sample an optimal path uniformly at random). This suggests a trivial memory-based strategy that tracks the occupancy of states visited along paths: when the paths are appropriately sampled, the expected occupancy should be related to Betweenness Centrality. Another approach is to approximate Betweenness Centrality, as in one planning-specific method that analyzes small regions of the environment separately and then pools this information to choose subgoals [22].
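The sampling formulation can be illustrated with a minimal sketch, assuming an unweighted, undirected, connected graph stored as an adjacency dict. A task is drawn uniformly over ordered pairs of distinct states, one optimal path is drawn uniformly at random by backtracking through shortest-path counts, and the intermediate states it visits are tallied. Rescaling the visit rate by the number of ordered pairs recovers unnormalized Betweenness Centrality in expectation. The function names are ours, and this is not the estimator with error bounds from [36].

```python
import random
from collections import deque


def bfs_counts(adjacency, source):
    """Shortest-path distances and shortest-path counts from `source`."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:  # first time v is reached
                dist[v], sigma[v] = dist[u] + 1, 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma


def sample_optimal_path(adjacency, start, goal, rng):
    """Draw one shortest path uniformly at random by backtracking from goal."""
    dist, sigma = bfs_counts(adjacency, start)
    path, v = [goal], goal
    while v != start:
        preds = [u for u in adjacency[v] if dist.get(u) == dist[v] - 1]
        v = rng.choices(preds, weights=[sigma[u] for u in preds])[0]
        path.append(v)
    return path[::-1]


def exact_betweenness(adjacency):
    """Sum over ordered pairs (s, t) of the fraction of shortest paths via w."""
    info = {s: bfs_counts(adjacency, s) for s in adjacency}
    bc = {w: 0.0 for w in adjacency}
    for s, (ds, ss) in info.items():
        for t in adjacency:
            for w in adjacency:
                if len({s, t, w}) < 3:
                    continue
                dw, sw = info[w]
                if ds[w] + dw[t] == ds[t]:
                    bc[w] += ss[w] * sw[t] / ss[t]
    return bc


def sampled_betweenness(adjacency, n_samples, seed=0):
    """Occupancy of intermediate states on sampled optimal paths, rescaled."""
    rng = random.Random(seed)
    states = list(adjacency)
    counts = {w: 0 for w in states}
    for _ in range(n_samples):
        start, goal = rng.sample(states, 2)  # a task: ordered pair of distinct states
        for w in sample_optimal_path(adjacency, start, goal, rng)[1:-1]:
            counts[w] += 1
    n_pairs = len(states) * (len(states) - 1)
    return {w: n_pairs * c / n_samples for w, c in counts.items()}


# A 4-cycle has two optimal paths between opposite states.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(exact_betweenness(cycle))
print(sampled_betweenness(cycle, 20000))
```

Sampling an optimal path uniformly, rather than always taking the first one found, matters for graphs like the 4-cycle: otherwise one of two equally central states would absorb all the occupancy.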

The large-scale comparison of subgoal predictions in Fig 4 demonstrates connections between existing heuristics and our framework for subgoal choice based on search efficiency. These connections suggest a rationale for the efficacy of these heuristics, which may stand in as tractable approximations of our resource-rational framework. Our qualitative comparison in Fig 3 highlights some of the differing predictions among the accounts. But how do these different accounts relate to how people decompose tasks? We turn to this question in the next section.

An empirical test of the framework

To measure people’s task decompositions, we ask research participants to report their subgoal use after they have gained experience navigating in an environment. A previously published experiment by Solway et al. (2014) [8] tested task decomposition using three graph navigation tasks. We developed a similar paradigm but used a set of 30 environments sampled randomly from 1,676 graphs, a subset of the 11,117 used in our large-scale model comparison above. The criteria for this subset were selected to ensure compatibility with the experiment and are detailed in the methods. This set of environments is larger and more diverse than those used in previous studies [8, 37], which allows us to draw broader and more generalizable inferences about the task decomposition process.

Inspired by prior studies [8], we conducted an experiment with two phases: participants were first familiarized with an environment by performing a series of navigation trials (Fig 5a), then answered a series of questions about their subgoal choices (Fig 5c and 5d).

Fig 5 — The depicted graph is the same as the top left graph of Fig 10. (a) An example *navigation trial*. The current state has a green background and the goal state has a yellow background. Only the edges connected to the current state are shown. (b) The interface used to show all graph edges between navigation trials. There is no indication of past or future trials on this screen. State icons are only shown when the cursor is placed on them. (c) An example *implicit subgoal probe*. (d) The final post-task assessment with the *teleportation question*. All icons were designed by OpenMoji and are reproduced here with permission.

Participants gained exposure to the environment by performing 30 navigation trials requiring navigation to a goal state from some initial state. These long trials were randomly selected while ensuring the initial and goal states were not directly connected (Fig 5a). Participants moved from the current state to a neighboring state using numeric keys. The trial ended when the goal state was reached. In simulations and pilot studies, states with high visit rates coincided with the predictions of RRTD-IDDFS and Betweenness Centrality. This made it difficult to dissociate model predictions from an alternative memory-based strategy where frequently visited states are selected as subgoals. As noted in the previous section, this memory-based strategy is related to sampling-based estimation of Betweenness Centrality. To address this confound, we modified the experimental task distribution so that long trials were interleaved with filler trials requiring navigation to a state directly connected to the start state. These filler trials were adaptively selected to increase visits to states besides the most frequently visited one; in pilot studies and simulations, this was sufficient to dissociate visit rate and model predictions.
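The text specifies the constraints on trials but not the procedure used to select them. The sketch below shows one hypothetical way to satisfy those constraints: long trials are drawn from pairs of states that are not directly connected, and each filler trial greedily targets a least-visited state other than the most-visited one. The greedy rule, the visit-count bookkeeping, and the function names are our assumptions, not the procedure used in the experiment.

```python
import random
from collections import Counter


def long_trial_pairs(adjacency):
    """(start, goal) pairs of distinct states that are not directly connected."""
    return [(s, g) for s in adjacency for g in adjacency
            if s != g and g not in adjacency[s]]


def pick_filler_trial(adjacency, visits, rng):
    """Hypothetical greedy rule for a one-step filler trial.

    Chooses a (start, neighbor) trial whose goal has the fewest visits so far,
    never targeting the single most-visited state.
    """
    most_visited = max(adjacency, key=lambda s: visits[s])
    candidates = [(s, g) for s in adjacency for g in adjacency[s] if g != most_visited]
    fewest = min(visits[g] for _, g in candidates)
    return rng.choice([(s, g) for s, g in candidates if visits[g] == fewest])


bridge = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
visits = Counter({2: 10, 3: 10, 0: 1})  # hypothetical visit counts so far
print(len(long_trial_pairs(bridge)))  # 16 of the 30 ordered pairs
print(pick_filler_trial(bridge, visits, random.Random(0)))
```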

A critical methodological difficulty is visually representing the environment in a way that enables rapid learning but does not introduce confounds. To prevent participants from relying on heuristics such as the Euclidean distance between states, states were assigned random locations in a circular layout. The absence of useful heuristics makes the problems participants face more comparable to those solved by brute-force search algorithms, which also lack a heuristic. To encourage model-based reasoning instead of visual search, connections between states were only shown for the current state. So that participants could still learn the connections easily, they were periodically shown the graph with all connections between trials (Fig 5b).
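A random circular layout can be sketched as below. Even spacing around the circle is our assumption; the text states only that states were assigned random locations in a circular layout. The function name is hypothetical.

```python
import math
import random


def random_circular_layout(states, radius=1.0, seed=None):
    """Place states at evenly spaced points on a circle, in random order.

    Because the order is shuffled, screen distance carries no systematic
    information about graph distance.
    """
    rng = random.Random(seed)
    order = list(states)
    rng.shuffle(order)
    n = len(order)
    return {
        s: (radius * math.cos(2 * math.pi * i / n), radius * math.sin(2 * math.pi * i / n))
        for i, s in enumerate(order)
    }


layout = random_circular_layout(range(6), seed=0)
print(layout)
```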

To measure participant subgoal choice comprehensively and reliably, we used both direct and indirect probes, including novel prompts as well as prompts from previous studies [8]. In the context of 10 navigation trials, we first prompted participants “Plan how to get from A to B. Choose a location you would visit along the way,” the implicit subgoal probe (Fig 5c). Then, in the context of the same trials after shuffling, we asked participants “When navigating from A to B, what location would you set as a subgoal? (If none, click on the goal),” the explicit subgoal probe. To ensure familiarity with the concept of a subgoal, the experiment tutorial introduced it in the context of a cross-country road trip. In a final post-task assessment, we asked participants “If you did the task again, which location would you choose to use for instant teleportation?”, the teleportation question (Fig 5d). We asked this question outside the context of any particular navigation trial.

Experiment results

We recruited English-speaking participants in the United States on the Prolific recruiting platform, prescreening to exclude those who had taken part in previous experimental pilots and those with approval ratings below 95%. Of the 952 participants who completed the experiment, 806 (85%) satisfied the pre-registered exclusion criteria requiring efficient performance on the navigation trials: if a participant took 75% more actions than the optimal path (averaged across the last half of long trials), their data were excluded. The number of participants per graph varied after exclusion criteria were applied, but exclusion did not differ significantly across graphs (before exclusion: range 27–34, after exclusion: range 21–30; two-factor χ² test comparing included to excluded participants, p = .985). Participants took an average of 17.17 minutes (SD = 8.02) to complete the experiment.
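The exclusion rule can be written as a short check. This is a sketch under stated assumptions: "averaged" is read as the mean of per-trial proportional excesses (a ratio of summed actions is an alternative reading), the threshold is applied inclusively, and the function names are hypothetical.

```python
def mean_excess_actions(actions_taken, optimal_lengths):
    """Mean proportional excess of actions over the optimal path length."""
    excess = [taken / optimal - 1.0 for taken, optimal in zip(actions_taken, optimal_lengths)]
    return sum(excess) / len(excess)


def exclude_participant(actions_taken, optimal_lengths, threshold=0.75):
    """Apply the criterion to the last half of long trials (in trial order)."""
    half = len(actions_taken) // 2
    return mean_excess_actions(actions_taken[half:], optimal_lengths[half:]) >= threshold


# Last-half trials: 6 actions where 4 were optimal, then 5 where 3 were optimal.
print(exclude_participant([9, 8, 6, 5], [3, 4, 4, 3]))  # False
```

Here the mean excess is (0.5 + 0.667) / 2 ≈ 0.58, below 0.75, so this hypothetical participant is kept even though their first-half trials were inefficient.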

Even though the experimental interface obscured task structure by showing the task states in a random circular layout, participants became more effective from the first to the second half of training: long trials were solved more quickly (from 10.30s (SD = 29.74) to 7.60s (SD = 11.19)), with more efficient solutions (from 36% to 20% more actions than the optimal path; completely optimal solutions increased from 70% to 79%; solutions that included a repeated state decreased from 14% to 9%), and with decreased use of the map (on-screen duration decreased from 9.01s (SD = 20.97) to 2.98s (SD = 12.66); number of hovered states decreased from 5.43 (SD = 8.70) to 1.38 (SD = 3.99); duration of state hovering decreased from 2.52s (SD = 7.23) to 0.60s (SD = 9.04)). To rule out confounding effects due to differences in the complexity and structure of the tasks that participants solved, we related participant behavior on the probes to measures associated with each graph and found no significant relationships (S1 Appendix).

Our findings are organized into two sections. First, we analyze the subgoal probes, demonstrating their internal consistency and relationship to behavior. Then, we use models to predict subgoal probe choice, as well as a subset of choice behavior during navigation.

Subgoal probes are internally consistent and predict behavior

A crucial methodological concern is the validity of the probes for subgoal choice: existing studies have used various types of probes but have not compared them systematically. Based on the average correlation between per-participant and per-graph choice rates, choice on the Explicit and Implicit Probes had high within-probe consistency across participants, while choice on the Teleportation Question had low within-probe consistency (Explicit Probe r = 0.71, Implicit Probe r = 0.63, Teleportation Question r = 0.37). We also evaluated consistency between probes by comparing the per-graph, per-state choice rates. The Explicit and Implicit Probes were well-correlated (r = 0.98, p < .001), though the relationship between the Teleportation Question and the remaining probes was weaker (Teleportation Question and Explicit Probe: r = 0.58, p < .001; Teleportation Question and Implicit Probe: r = 0.58, p < .001). While the Teleportation Question exhibits relatively low self-consistency and cross-probe consistency, its consistency is difficult to compare to that of the other probes, since the Explicit and Implicit Probes were each sampled for 10 different tasks per participant, whereas the Teleportation Question was sampled only once per participant.
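A minimal sketch of the within-probe consistency measure for one graph and one probe, assuming each participant's responses are stored as a list of chosen states. The text does not say whether a participant's own choices are included in the per-graph rates; this sketch includes them, and a leave-one-out variant would exclude them. With a single response per participant, as for the Teleportation Question, each per-participant rate vector has only one nonzero entry. The function names are hypothetical.

```python
import math
from statistics import mean


def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / norm  # undefined (division by zero) if either vector is constant


def choice_rates(choices, states):
    """Fraction of a list of choices that selected each state."""
    return [choices.count(s) / len(choices) for s in states]


def within_probe_consistency(choices_by_participant, states):
    """Average correlation of each participant's per-state choice rates with
    the per-graph rates pooled over all participants (including that one)."""
    pooled = [c for cs in choices_by_participant.values() for c in cs]
    graph_rates = choice_rates(pooled, states)
    return mean([pearson(choice_rates(cs, states), graph_rates)
                 for cs in choices_by_participant.values()])


states = [0, 1, 2, 3, 4, 5]
choices = {"p1": [2, 2, 3, 2], "p2": [2, 3, 3, 2], "p3": [2, 0, 2, 3]}
print(within_probe_consistency(choices, states))
```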

Beyond assessing consistency, it is also crucial to link the probes to participant behavior during navigation, establishing a connection between decision-making and our indirect assessment of it via probes. On the Explicit Probe trials, participants were given the option of choosing the goal instead of a subgoal. On average, participants who chose a subgoal more frequently took shorter paths in the navigation trials (r = −0.29, p < .001; Fig 6), which suggests that the use of subgoals promotes efficiency.

We also briefly examine the relationship between subgoal choice count on Explicit Probe trials and response times during navigation. We were unable to find evidence of a correlation between the subgoal choice count of a participant and their average log-transformed navigation trial duration (r = 0.01, p = .858). In order to understand how subgoal use influences response times, future studies should examine trial-level measures in appropriate experimental designs, an issue we remark on in the discussion.

To further link the probes to behavior, we examine instances where participants performed the same task (matched by start and goal state) in the navigation trials and the probe trials. This allows us to ask whether participants took paths that passed through the states they later identified as subgoals. To simplify the interpretation of the analysis, we focus on navigation trials where the participant’s path was optimal and there were multiple optimal paths between the start and goal. Evaluated over these pairs of matched navigation and probe trials, we found that participants’ choices on probe trials were consistent with their choices among optimal paths (Explicit Probe: 75.4%, Implicit Probe: 70.6%) more often than would be expected by random choice among optimal paths (Explicit Probe: 70.5%, p < .001; Implicit Probe: 65.2%, p < .001; Monte Carlo test). Probe trial choice is also a significant predictor of choice among optimal paths when analyzed using multinomial regression (Explicit Probe: χ²(1) = 54.7, p < .001, Implicit Probe: χ²(1) = 62.5, p < .001).
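The comparison with random choice among optimal paths can be sketched as a Monte Carlo test. Here a probe response counts as consistent with a matched navigation trial if the chosen optimal path passes through the probed state. This coding, the one-sided test, the add-one p-value correction, and the names are our assumptions about details the text does not specify.

```python
import random


def monte_carlo_consistency_test(trials, n_sims=10000, seed=0):
    """One-sided Monte Carlo test of path/subgoal consistency.

    trials: list of (optimal_paths, chosen_path, probe_subgoal), where each
    path is a tuple of states and chosen_path is one of optimal_paths.
    Returns (observed rate, expected rate under random choice, p-value).
    """
    rng = random.Random(seed)
    n = len(trials)
    observed = sum(sg in chosen for _, chosen, sg in trials) / n
    chance = sum(sum(sg in p for p in paths) / len(paths) for paths, _, sg in trials) / n
    hits = 0
    for _ in range(n_sims):
        simulated = sum(sg in rng.choice(paths) for paths, _, sg in trials) / n
        hits += simulated >= observed
    return observed, chance, (hits + 1) / (n_sims + 1)


# Hypothetical data: two optimal paths, the probe names state 1 every time.
paths = [(0, 1, 2), (0, 3, 2)]
trials = [(paths, (0, 1, 2), 1)] * 30 + [(paths, (0, 3, 2), 1)] * 10
print(monte_carlo_consistency_test(trials, n_sims=2000))
```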

In sum, these results suggest the subgoal probes are well-correlated, though to a lesser degree for the Teleportation Question. They also suggest a strong connection between the probes and planning behavior.

Comparing subgoal choice to theories

Having established that participants learned the task, as well as the validity of their probe responses, we now turn to our central claim, namely that subgoal choice is driven by the computational costs of hierarchical planning. Letting participants’ responses to the subgoal probes stand as reasonable proxies of subgoal choice, we relate the predictions of normative accounts and heuristics to participant probe choice across the three probes.

We start by qualitatively examining participant subgoal choice in Fig 7, extending the qualitative analysis given above with two additional graphs, more model predictions (Fig 7b–7g), and behavioral data averaged across tasks, probes, and participants (Fig 7a). Participant data for all graphs are shown in Fig A1 in S1 Appendix. We first note that Betweenness Centrality and RRTD-IDDFS make consistent predictions in these graphs (Fig 7b and 7c), and both are relatively consistent with participant probe choice: as described above, states that are close to many other states are preferred. For brevity, we skip over RRTD-BFS and RRTD-RW in this description. The predictions of Solway et al. (2014) [8] are less consistent with participant probe choices (Fig 7f); as previously described, these predictions are unintuitive because of the difficulty in mapping between partitions and subgoals, particularly when graph bottlenecks correspond to states instead of edges. The predictions of Tomov et al. (2020) [7] are also less consistent with participant probe choices (Fig 7g); as previously described, they do not correspond to intuitive subgoals because the model relies on between-group edges being sparser than within-group edges.

We now quantitatively compare model predictions of participant subgoal choice.

For each model and probe type, we predict participant choices using hierarchical multinomial regression, where standardized model predictions are included as a factor with a fixed and a per-participant random effect. Since the regression analyses have the same effect structure and the underlying theories being compared have no free parameters, we compare the relative ability of factors to predict probe choice through their log likelihood (LL) in Fig 8. We also report the results of likelihood-ratio tests against the null hypothesis of a uniformly random choice model in Table 2. As in the analysis above, we assume the task distribution is uniform over all pairs of distinct states. Among normative theories, we found that the RRTD-IDDFS model best explained behavior as judged by LL. For the Explicit and Implicit Probes, the next best models in sequence were RRTD-BFS, then both Solway et al. (2014) [8] and Tomov et al. (2020) [7] with similar performance, and finally RRTD-RW. For the Teleportation Question, the best models after RRTD-IDDFS were, in order, Solway et al. (2014) [8], Tomov et al. (2020) [7], RRTD-BFS, and finally RRTD-RW. This suggests that, among the normative theories, those based on search costs are most explanatory of subgoal choices.
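A stripped-down sketch of this regression, assuming each probe trial is stored as a list of predictor values (one per candidate state) and the index of the chosen state. It fits only the fixed effect, so the likelihood-ratio test against uniform choice (β = 0) has one degree of freedom; the per-participant random effect is omitted, which presumably accounts for the extra degree of freedom in the χ²(2) statistics of Table 2. The Newton optimizer and all names are our choices, not the authors' software.

```python
import math


def standardize(values):
    """z-score a list of model predictions (population SD)."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return [(v - m) / sd for v in values]


def choice_probs(beta, xs):
    """Softmax choice probabilities over options with predictor values xs."""
    m = max(beta * x for x in xs)
    w = [math.exp(beta * x - m) for x in xs]
    z = sum(w)
    return [wi / z for wi in w]


def log_likelihood(beta, trials):
    """trials: list of (xs, chosen) with xs the predictor for each option."""
    return sum(math.log(choice_probs(beta, xs)[chosen]) for xs, chosen in trials)


def fit_beta(trials, iters=100, tol=1e-10):
    """Newton's method on the concave log-likelihood (fixed effect only).
    Diverges if the predictor perfectly separates the choices, and fails if
    every option in every trial has the same predictor value (zero Hessian)."""
    beta = 0.0
    for _ in range(iters):
        grad = hess = 0.0
        for xs, chosen in trials:
            p = choice_probs(beta, xs)
            ex = sum(pi * x for pi, x in zip(p, xs))
            ex2 = sum(pi * x * x for pi, x in zip(p, xs))
            grad += xs[chosen] - ex
            hess -= ex2 - ex * ex
        step = grad / hess
        beta -= step
        if abs(step) < tol:
            break
    return beta


def lr_test_vs_uniform(trials):
    """beta = 0 is the uniform-choice null, so the test has 1 degree of freedom."""
    beta = fit_beta(trials)
    stat = 2.0 * (log_likelihood(beta, trials) - log_likelihood(0.0, trials))
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))  # chi-squared(1) tail
    return beta, stat, p_value


# Hypothetical trials with three candidate states whose predictors are -1, 0, 1.
trials = [([-1.0, 0.0, 1.0], 2)] * 6 + [([-1.0, 0.0, 1.0], 1)] * 3 + [([-1.0, 0.0, 1.0], 0)]
print(lr_test_vs_uniform(trials))
```

In practice the predictor values would be model predictions passed through `standardize` (or a similar standardization) before fitting.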

Fig 8 — Log likelihood (LL) is relative to the minimum model LL for each probe. Larger values indicate better predictivity. References: Solway et al. (2014) [8], Tomov et al. (2020) [7].

Table 2. Estimated coefficients with standard errors from hierarchical multinomial regression predicting subgoal choice.

Likelihood-ratio test statistics compare regression models to the null hypothesis of sampling subgoals uniformly at random.

Normative Algorithm	Explicit Probe	Implicit Probe	Teleportation Question
RRTD-IDDFS	β = 1.78 (SE = 0.04); χ²(2) = 5183.1, p < .001	β = 1.63 (SE = 0.04); χ²(2) = 3941.2, p < .001	β = 0.73 (SE = 0.06); χ²(1) = 155.7, p < .001
RRTD-BFS	β = 4.98 (SE = 0.10); χ²(2) = 4412.8, p < .001	β = 4.94 (SE = 0.11); χ²(2) = 3402.8, p < .001	β = 1.45 (SE = 0.20); χ²(1) = 55.1, p < .001
RRTD-RW	β = 0.37 (SE = 0.02); χ²(2) = 686.7, p < .001	β = 0.92 (SE = 0.03); χ²(2) = 1822.1, p < .001	β = 0.29 (SE = 0.05); χ²(1) = 39.9, p < .001
Solway et al. (2014) [8]	β = 0.75 (SE = 0.02); χ²(2) = 3929.7, p < .001	β = 0.69 (SE = 0.02); χ²(2) = 3072.7, p < .001	β = 0.37 (SE = 0.03); χ²(1) = 114.3, p < .001
Tomov et al. (2020) [7]	β = 1.09 (SE = 0.03); χ²(2) = 3678.7, p < .001	β = 0.97 (SE = 0.02); χ²(2) = 3056.4, p < .001	β = 0.41 (SE = 0.04); χ²(1) = 102.4, p < .001
Heuristic	Explicit Probe	Implicit Probe	Teleportation Question
QCut	β = −0.14 (SE = 0.01); χ²(2) = 1504.3, p < .001	β = −0.19 (SE = 0.01); χ²(2) = 1236.3, p < .001	β = 0.04 (SE = 0.04); χ²(1) = 1.4, p = .238
Degree Cent. (log)	β = 0.73 (SE = 0.02); χ²(2) = 2534.3, p < .001	β = 0.64 (SE = 0.02); χ²(2) = 1923.3, p < .001	β = 0.45 (SE = 0.04); χ²(1) = 116.0, p < .001
Betweenness Cent. (log)	β = 0.86 (SE = 0.02); χ²(2) = 5598.2, p < .001	β = 0.82 (SE = 0.02); χ²(2) = 4666.1, p < .001	β = 0.58 (SE = 0.03); χ²(1) = 307.3, p < .001
State Occupancy (log)	β = 1.00 (SE = 0.03); χ²(2) = 3222.8, p < .001	β = 0.89 (SE = 0.03); χ²(2) = 2673.3, p < .001	β = 0.39 (SE = 0.04); χ²(1) = 103.3, p < .001


We additionally compare model predictions to participant state occupancy during navigation trials to assess whether people rely on a simple, memory-based strategy to respond to the probes, as described above. For the Explicit and Implicit Probes, every normative theory except RRTD-RW explains participant behavior better than state occupancy does; for the Teleportation Question, only RRTD-IDDFS and Solway et al. (2014) [8] do.

Among the heuristic theories, we found that Betweenness Centrality best explained behavior as judged by LL. For all probes, Degree Centrality was next best, followed by QCut. As above, we compared model predictions to participant state occupancy and found that Betweenness Centrality explains participant behavior better than state occupancy for all three probes, while Degree Centrality does so only for the Teleportation Question.

These results are consistent with the empirical connection between RRTD-IDDFS and Betweenness Centrality found in the large-scale simulation above. Betweenness Centrality further improves on the behavioral fit of RRTD-IDDFS, suggesting that our participants may be using a metric like Betweenness Centrality to approximate resource-rational task decomposition. In contrast, we found that state occupancy was a worse fit to participant behavior than either RRTD-IDDFS or Betweenness Centrality, suggesting that the introduction of filler trials was sufficient to rule out a trivial strategy based on state occupancy. We return to these points in the discussion.

Although the experiment was designed so that only local connections were visible during navigation trials, it was conducted on an online platform, so participants could have kept their own reference to the task structure. In the closing survey, we therefore asked participants whether they used a reference to the task structure besides the interface (“Did you draw or take a picture of the map? If you did, how often did you look at it?”). Participant responses were as follows: 603 participants selected “Did not draw/take picture,” 65 selected “Rarely looked,” 90 selected “Sometimes looked,” and 48 selected “Often looked.” To ensure the above results were not affected, we ran the same analysis in the subset of participants (N = 603) who selected “Did not draw/take picture” and found qualitatively similar results (Fig A2 and Table A1 in S1 Appendix).

In another analysis in S1 Appendix, we tested whether icon identity influenced these results by incorporating the icon used for state presentation into the null choice model. We found minimal influence: as in the above results, adding subgoal predictions was statistically significant for each model, and comparisons based on log likelihood were qualitatively similar.

In a final analysis, we predict participant navigation in the instances where their path was one of several optimal paths. We analyze participant choice as a simple two-stage process: a subgoal is sampled with log probability proportional to model predictions (weighted by a parameter β₁), then an optimal path is sampled with log probability proportional to a free parameter β₂ if it contains the subgoal and 0 otherwise. For each theory of subgoal choice, we optimized this two-stage choice model to maximize the likelihood assigned to observed choices. Because there was a relatively small number of trials per participant, we did not fit random effects for participants. The model results are shown in Fig 9. These results are again consistent with those previously observed: among normative theories RRTD-IDDFS is best, among heuristic theories Betweenness Centrality is best, and Betweenness Centrality is overall the most explanatory. In a supplementary analysis, we also found evidence that participant responses during optimal navigation were slower at both self-reported subgoals and those predicted by models, as detailed in S1 Appendix.
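For one navigation trial, the two-stage model's likelihood can be sketched as below, marginalizing over the unobserved subgoal. The candidate-subgoal set (the keys of `predictions`), the representation of paths as tuples of states, and the coarse grid search used for fitting are our assumptions; the text does not state which optimizer was used.

```python
import math


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def path_log_likelihood(chosen_path, optimal_paths, predictions, beta1, beta2):
    """Log-probability of `chosen_path` under the two-stage choice model.

    predictions: dict mapping each candidate subgoal to its model prediction.
    Stage 1: subgoal s with log-probability proportional to beta1 * prediction.
    Stage 2: optimal path with log-weight beta2 if it contains s, else 0.
    """
    subgoals = list(predictions)
    log_z1 = logsumexp([beta1 * predictions[s] for s in subgoals])
    idx = optimal_paths.index(chosen_path)
    terms = []
    for s in subgoals:
        path_scores = [beta2 * (s in path) for path in optimal_paths]
        log_p_path = path_scores[idx] - logsumexp(path_scores)
        terms.append(beta1 * predictions[s] - log_z1 + log_p_path)
    return logsumexp(terms)  # marginalize over the unobserved subgoal


def fit_by_grid(trials, grid):
    """Pick (beta1, beta2) maximizing the total log-likelihood over a grid.
    trials: list of (chosen_path, optimal_paths, predictions)."""
    def total(params):
        return sum(path_log_likelihood(c, paths, preds, *params)
                   for c, paths, preds in trials)
    return max(((b1, b2) for b1 in grid for b2 in grid), key=total)


paths = [(0, 1, 2), (0, 3, 2)]
peaked = {1: 1.0, 3: 0.0}  # hypothetical model predictions for subgoals 1 and 3
trials = [((0, 1, 2), paths, peaked)] * 4 + [((0, 3, 2), paths, peaked)]
print(fit_by_grid(trials, [0.0, 1.0, 2.0]))
```

When β₂ = 0, the subgoal has no influence and every optimal path is equally likely, which corresponds to choosing uniformly among optimal paths.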

Fig 9 — Log likelihood (LL) is relative to the minimum model LL. Larger values indicate better predictivity. References: Solway et al. (2014) [8], Tomov et al. (2020) [7].

Discussion

In this work, we have proposed a resource-rational framework for task decomposition in which tasks are broken down into subtasks based on planning costs. Our first contribution is a novel formal account of this idea based on a resource-rational analysis [38]. Specifically, our proposal involves three levels of nested optimization: task decomposition identifies a set of subgoals for a given task, subtask-level planning chooses sequences of subgoals to reach a goal, and action-level planning chooses sequences of concrete actions to reach a subgoal. Optimal task decomposition thus depends on both the structure of the environment and the computational resource usage specific to the planning algorithm. We quantitatively compared the predictions of our framework to four heuristic and normative theories proposed in the literature across 11,117 graph-structured tasks. These analyses show that our framework makes predictions that differ from those of other normative accounts and that align with existing heuristics. We argue that this provides a rationalization of these heuristics for task decomposition in terms of resource-rational planning that accounts for computational costs.

To test our framework, we ran a pre-registered, large-scale study using 30 graph-structured environments and 806 participants. This study includes a more diverse set of tasks than any previously reported study in the literature, allowing us to draw more general conclusions about how people form task decompositions. The results reveal that, among normative models, people’s responses most closely align with the predictions of our model. This supports our theory that people engage in a process of resource-rational task decomposition. Among heuristics for task decomposition, one heuristic, Betweenness Centrality, fits behavior better than our framework. Because this heuristic makes predictions similar to those of our framework, people may use it as a tractable approximation to our framework.

Our account, while normative, is not particularly interpretable. An interpretable account of human subgoal choice will require identifying the qualitative patterns that guide it and relating them to the patterns that result from our framework’s sensitivity to search costs. Especially critical are experimental paradigms that yield rich but minimally confounded behavior. Our experiments extend those in the literature, but our results depend heavily on the self-reported subgoals of research participants. While we have already demonstrated relationships between self-reported subgoals and behavior, making more extensive comparisons to behavior is important for future research. For example, although we found a systematic relationship between participants’ responses to the subgoal probes and their previous navigation decisions, these two measurements were taken minutes apart. We chose to separate them to avoid possible measurement effects in which explicitly asking about subgoals may lead people to navigate through the state they identified. However, this separation likely weakens the observed relationship between the probe responses and the navigation decisions. An additional difference is that during navigation trials, participants are still learning the task structure and are shown connections from the current state. By contrast, during the probe trials, participants are unable to see any connections and must rely solely on what they have learned. These subtle differences may weaken the relationship between the two trial types. Both trial types also differ from the formal framework, which assumes perfect knowledge of task structure; because this could influence task decomposition, we note a relevant extension below.

The effect of this difference could be studied by introducing a separate experimental phase in which participants are trained on the task structure directly, outside the context of navigation; the first experiment in Solway et al. (2014) [8] contains a similar phase but had limited navigation trials. Developing experimental techniques to measure subgoal choices in the process of navigation without biasing the planning process is an important direction for future work.

We highlight three limitations of the navigation trials studied in this experiment. The first is that participants are encouraged to plan hierarchically (“It might be helpful to set subgoals” in Fig 5a). While this seems likely to have minimal impact on which subgoals participants choose, the main focus of this manuscript, it may impact whether participants plan hierarchically in the first place. Future studies intending to assess how people choose to plan hierarchically should consider avoiding prompts like this.

Second, though we report analyses of participant response times, these analyses were not pre-registered and our experiment was not designed to assess response times. These findings should be reevaluated in experimental designs appropriate to assess response times. For example, we found that participants were slower to respond at their subgoals, suggesting they were planning at those states. However, our normative framework makes no prediction about when planning occurs. The framework only defines a resource-rational value for subgoals that can be used to simplify planning whether it happens before action or after reaching a subgoal. Future experiments could investigate this further through manipulations to influence when planning is employed, like a timed phase for up-front planning or an incentive for fast plan execution.

A third limitation is that the navigation trials provide limited insight into the algorithms people use to plan. While participants see only local connections during navigation, analogous to the local visibility that search algorithms have, there are a number of reasons why their navigation behavior would be difficult to relate to the choice of search algorithm. The main issue is that planning steps are generally covert and do not correspond in any simple way to the steps of overt behavior given by the plan that is ultimately produced. In future work, such covert planning may be better studied through process-tracing experiments [30, 39], think-aloud protocols, or by investigating neural signatures related to planning and learning [40]. It is possible that some aspects of planning are externalized in the current experiment, via exploration on navigation trials, but this is at best incomplete. For instance, participant behavior improves over the course of navigation trials, suggesting they are performing mental search instead of search via navigation. Also, the experiment restricts single-step movement to neighboring states, whereas algorithms like BFS might plan over states in an order where subsequent states are not neighbors. Identifying the search algorithm participants use is critical for future studies, since our framework predicts that task decomposition is driven by the search algorithm used.
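The point about BFS can be made concrete with a small sketch; the function names and the example graph are ours. In the order in which BFS expands states, consecutive states are often not neighbors, so an expansion sequence of this kind could not be reproduced by single-step moves in the navigation interface.

```python
from collections import deque


def bfs_expansion_order(adjacency, start):
    """States in the order breadth-first search expands them."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        state = queue.popleft()
        order.append(state)
        for nxt in adjacency[state]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


def non_adjacent_steps(adjacency, order):
    """Consecutive expansions that are not one move apart in the graph."""
    return [(a, b) for a, b in zip(order, order[1:]) if b not in adjacency[a]]


# A small tree: 0 branches to 1 and 2, which lead to 3 and 4.
tree = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1], 4: [2]}
order = bfs_expansion_order(tree, 0)
print(order)                            # [0, 1, 2, 3, 4]
print(non_adjacent_steps(tree, order))  # [(1, 2), (2, 3), (3, 4)]
```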

As mentioned in the text, our framework and experiment explicitly focus on the constrained setting of brute-force search. However, other theoretical studies have extended this framework to incorporate heuristic search [12, 32] and abstract subgoals [32]. Future research should continue to explore extensions of this framework to more robustly test the predictions of resource-rational task decomposition. For example, our framework could be used to make predictions about subgoal choice in spatial navigation tasks by incorporating spatial distance heuristics and using heuristic search algorithms like A* search or Iterative-Deepening A* search [25]. Another direction could explore other resource costs like memory use, motivated by the relatively low memory use of IDDFS discussed above. At present, our framework assumes planning for tasks occurs independently, without reusing previous solutions to subtasks. Even without such reuse, our framework still predicts a normative benefit for problem decomposition. However, research has found that people learn hierarchically, exhibiting neural signatures consistent with those predicted by hierarchical reinforcement learning theory [41]. Further research could relax the independence between task solutions by explicitly reusing solutions (as in [8]) or by turning to formulations based on reinforcement learning [42], particularly those designed for a distribution of goal-directed tasks with shared structure as studied in this manuscript [43]. A particularly interesting direction could incorporate model learning (as in the Dyna architecture [44]) across tasks in order to explain the influence of task learning on task decomposition. Further extensions of this framework could build on resource-rational models developed in other domains, like Markov Decision Processes [19] and feedback control [45], and draw inspiration from approaches used to learn action hierarchies in high-dimensional tasks [42, 43].

Our framework is intended to be an idealized treatment of the problem of task decomposition, but an essential next step for this line of research is to understand how people tractably approximate the expensive computations needed to determine search costs, which we discuss above. Our results already make some progress in this direction. Specifically, we found that participant responses were best explained by the Betweenness Centrality model. This model’s predictions are highly correlated with those of the normative RRTD-IDDFS model but require far less computation to produce. This suggests that people may be using Betweenness Centrality as a heuristic to approximate the task decomposition that minimizes planning cost. However, Betweenness Centrality is also expensive to compute, since it requires finding optimal paths between all pairs of states, something our participants are not likely doing. Although the inclusion of filler trials allowed us to rule out one trivial estimation strategy (state occupancy), tractable estimation remains an open question; future research could propose other estimation strategies and experimental manipulations that dissociate their predictions. Identifying even more efficient approximations to resource-rational task decomposition will be essential for a process-level account of human behavior, as well as for advancing a theory of subgoal discovery for problems with larger state spaces.

The human capacity for hierarchically structured thought has proven difficult to formally characterize, despite its intuitive nature and long history of study [13, 28]. In this study we propose a resource-rational framework that motivates and explains the use of hierarchical structure in decision-making: People are modeled as having subgoals that reduce the computational overhead of action-level planning. Our framework departs from and complements other normative proposals in the literature. Most published accounts pose task decomposition as an inference problem: People are modeled as inferring a generative model of the environment [6, 7] or as compressing optimal behavior [8, 20]. We quantitatively relate our framework to existing proposals in a simulation study; in addition, we conduct a large-scale behavioral experiment and find that our framework is effective at predicting human subgoal choice. The work presented here is consistent with other recent efforts within cognitive science to understand how people engage in computationally efficient decision-making [9–11, 38]. It is also complementary to recent work in artificial intelligence that explores the interaction between planning and task representations [19, 42]. Our hope is that future work on human planning and problem-solving will continue to investigate the relationships between computation, representation, and resource-rational decision-making.

Methods

Ethics statement

The following experimental procedures were approved by the institutional review board of Princeton University. In the experiment, participants were shown an electronic consent form providing a written description of the experiment and gave informed consent by clicking a button in lieu of a signature.

Experiment design

To probe for participant subgoal choice, we employed an experiment inspired by those previously published in Solway et al. [8]. In our experiment, participants first navigated on a web-like representation of a graph to learn the graph’s structure (“navigation trials”; Fig 5), then answered a series of task-specific and task-independent questions about subgoal choices (“probe trials”). We then quantitatively analyzed their responses to these questions about subgoal choice. In the Design section, we motivate the choice of various experimental details. Then, in the Procedure section we detail the experimental procedure. The experiment pre-registration is available at https://osf.io/hegf2. Our pre-registered analysis was a comparison of how well RRTD-IDDFS and Solway et al. (2014) [8] could predict participant probe choice using hierarchical multinomial regression, a subset of the comparisons in Fig 8 and Table 2.

Design

The navigation trials were intended to provide participants with an opportunity to learn the structure of the graph. Based on pilot experiments, we ensured participants could only see the graph edges connected to their current state (Fig 5a). Periodically, the graph with all edges was shown to participants (Fig 5b). Importantly, this was done without signaling any information about future tasks. In pilot experiments, these visual choices (minimizing displayed edges during tasks, but showing all edges periodically between tasks) ensured that participants quickly learned the graph structure instead of relying on the visual representation of the graph.

From pilot experiments, we observed that states with high visit rates coincided with the predictions of RRTD-IDDFS and Betweenness Centrality. To address this confound, the experiment had two types of navigation trials: long and filler. Long trials were optimally solved with more than one action (i.e. the start and goal state were not directly connected) and were intended to give participants exposure to the graph structure. Filler trials were optimally solved with one action (i.e. the start and goal state were directly connected) and were adaptively chosen to ensure a balanced visit rate that avoided overlapping predictions with our model. This adaptive procedure selected from all possible filler tasks by 1) excluding tasks where the start or goal state was most-visited, and then by 2) sampling uniformly from the remaining tasks with the greatest sum of visits to the start and goal states. When all tasks had a most-visited state as a start or goal, the first step was skipped; this circumstance was uncommon and dependent on both participant behavior and the structure of the graph. In effect, this procedure increased the number of most-visited states by increasing visits to states that were nearly (but not) most-visited. In simulations and pilot experiments, this procedure was sufficient to dissociate the visit rate and our model predictions.
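The two-step adaptive rule for choosing filler trials can be sketched as follows. This is a minimal illustration under our own assumptions, not the experiment code: the graph is a hypothetical adjacency dictionary, visit counts are a dictionary of per-state totals, every state tied for the maximum count is treated as "most-visited", and the function name `select_filler_task` is hypothetical.

```python
import random


def select_filler_task(adjacency, visits, rng=random):
    """Choose a one-action (filler) navigation task that balances visit counts.

    adjacency: dict mapping each state to the set of its neighbors (undirected).
    visits: dict mapping each state to how often the participant has visited it.
    rng: any object with a ``choice`` method, e.g. random.Random(seed).
    """
    # Filler tasks are optimally solved with one action: start and goal are neighbors.
    tasks = [(s, g) for s in adjacency for g in sorted(adjacency[s])]
    top = max(visits[s] for s in adjacency)
    # Assumption: all states tied for the maximum count count as most-visited.
    most_visited = {s for s in adjacency if visits[s] == top}

    # Step 1: exclude tasks whose start or goal is a most-visited state.
    # If every task touches a most-visited state, this step is skipped.
    candidates = [(s, g) for s, g in tasks
                  if s not in most_visited and g not in most_visited]
    if not candidates:
        candidates = tasks

    # Step 2: sample uniformly among the remaining tasks with the greatest
    # summed visit count of start and goal states.
    best = max(visits[s] + visits[g] for s, g in candidates)
    return rng.choice([(s, g) for s, g in candidates
                       if visits[s] + visits[g] == best])
```

For example, on a four-state path 0–1–2–3 where state 1 has been visited most often, the rule only considers the tasks (2, 3) and (3, 2), raising the visit counts of states that are close to, but not at, the maximum.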

Our probes for subgoal choice included two inspired by prior studies [8]—the implicit probe and teleportation question—and included a novel variant that explicitly asked about subgoal use—the explicit probe. We included these three probes to ensure a reliable and comprehensive measure of subgoal choice. The implicit probe was shown to participants before the explicit probe to avoid biasing participant responses.

Typical visual layouts of graphs often have a relationship to pairwise state distances, which means that a variety of visual heuristics are effective strategies when problem-solving. To avoid these confounding heuristics, we visually represented the graph states in a pseudo-random circular layout (Fig 5). The same circular layout was shown for all trial types, including the periodic display of all graph edges.

Procedure

Each participant was assigned a single graph for the entire experiment, with a fixed circular layout and fixed mapping from nodes to icons. We first introduced them to the experimental interface with an interactive tutorial, showing the visual cues used to mark the current node and goal node and how to navigate along graph edges using the numeric keys of the keyboard. We then introduced the concept of a “subgoal” through the example of a cross-country road trip with a “subgoal” located at the midpoint of the road trip. Participants were asked to describe what they thought “subgoal” meant.

Navigation trials followed this introductory material. Participants completed a few short practice trials, then completed 60 trials alternating between long and filler trial types: 30 long trials were drawn uniformly from those optimally solved with more than one action, and 30 filler trials were adaptively selected from those optimally solved with one action (described in detail above). In navigation trials, participants started from a node and had to navigate to a goal node. They were also prompted to consider the use of subgoals with the message “It might be helpful to set subgoals.” At any point during a navigation trial, participants could only see the edges connected to their current node (Fig 5a). Every four trials (thus, 15 times total) participants were shown all the edges of the graph (Fig 5b). Since a photograph of the graph shown this way could simplify navigation trials, the icons at each node were only shown when the participant hovered over them.

Following navigation trials, we probed for subgoal choice. For all probes, the graph was shown in the same circular layout, but the edges were hidden. Between every two trials, the graph was shown with all edges as mentioned above. In the context of 10 different tasks, we queried for subgoal choice using the implicit probe: “Plan how to get from A to B. Choose a location you would visit along the way.” For this probe, we excluded both the start and goal nodes from the available options. Then, in the context of the same 10 tasks after shuffling, we queried for subgoal choice using the explicit probe: “When navigating from A to B, what location would you set as a subgoal? (If none, click on the goal).” For this probe, we only excluded the start node from the available options. Before each of the task-specific probes, participants also completed a few practice trials. Finally, we asked a single instance of the teleportation question: “If you did the task again, which location would you choose to use for instant teleportation?” For this probe, all nodes were available options.

Finally, participants responded to a multiple-choice survey question: “Did you draw or take a picture of the map? If you did, how often did you look at it?”

Stimuli

The graphs we studied were drawn from the 11,117 simple, connected, undirected, eight-node graphs, restricted to those with enough eligible probe trials for the study design. For a given graph, probe trials were sampled uniformly from tasks that require at least 3 actions to optimally solve, a threshold selected based on model predictions that such tasks often require the use of hierarchy. Requiring each graph to have 10 distinct tasks that take 3 or more actions to optimally solve limited the number of possible graphs to 1,676. The 30 graphs we studied were sampled uniformly at random from these 1,676 graphs (Fig 10). The order of graph nodes in the circular layout, the icon assigned to each node, and the sequence of navigation and probe trials were all sampled pseudo-randomly. We counterbalanced the assignment of participants to graph, circular layout, and trial orderings.
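The eligibility criterion can be checked with breadth-first search distances. The sketch below is illustrative only: it assumes a connected graph stored as a hypothetical adjacency dictionary, and it counts (start, goal) tasks as ordered pairs, which is our assumption because the text does not say whether a task and its reverse count as distinct. Enumerating the 11,117 graphs themselves relies on an external graph collection and is not shown.

```python
from collections import deque
from itertools import permutations


def distances_from(adjacency, source):
    """Number of actions on an optimal path from source to each state (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        s = queue.popleft()
        for n in adjacency[s]:
            if n not in dist:
                dist[n] = dist[s] + 1
                queue.append(n)
    return dist


def long_probe_tasks(adjacency, min_actions=3):
    """Ordered (start, goal) tasks whose optimal solution needs >= min_actions."""
    dist = {s: distances_from(adjacency, s) for s in adjacency}
    return [(s, g) for s, g in permutations(adjacency, 2)
            if dist[s][g] >= min_actions]


def is_eligible(adjacency, n_probe_tasks=10, min_actions=3):
    """True if the (connected) graph has enough long tasks for the probes."""
    return len(long_probe_tasks(adjacency, min_actions)) >= n_probe_tasks
```

Under this counting convention, an eight-state path graph has 30 such tasks and is eligible, whereas the complete eight-state graph has none.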

All icons were designed by OpenMoji, the open-source emoji and icon project. License: CC BY-SA 4.0.

Analyses

Model predictions

We define a model’s predictions over subgoals as proportional to a utility or log probability. We do so because model predictions are primarily used to predict probe choice using multinomial regression. This leads to a natural interpretation of the coefficients from multinomial regression. For a utility, multinomial regression is equivalent to a softmax choice model, so the coefficient can be equivalently interpreted as an inverse temperature. For a log probability, the coefficient w can predict a range of strategies, including random choice for w = 0, probability matching for w = 1, and maximizing based on probability as w → ∞.
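To make the interpretation of the coefficient concrete, the following sketch computes multinomial-logit choice probabilities for a single regressor with coefficient w, so that each option's probability is proportional to exp(w · prediction). It is a simplified illustration, not the fitted mlogit model: it omits the standardization and random effects described below, and the function name `choice_probabilities` is hypothetical.

```python
import math


def choice_probabilities(predictions, w):
    """Multinomial-logit choice probabilities with a single coefficient w.

    predictions: dict mapping each option to a model prediction (a utility or
    a log probability). Each probability is proportional to exp(w * prediction).
    """
    # Subtract the largest scaled prediction for numerical stability.
    top = max(w * x for x in predictions.values())
    weights = {k: math.exp(w * x - top) for k, x in predictions.items()}
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}
```

With log-probability predictions for probabilities 0.5, 0.3, and 0.2, setting w = 0 gives uniform choice (1/3 each), w = 1 returns the original probabilities (probability matching), and a large w such as 50 places nearly all mass on the most probable option.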

We define the task distribution, when applicable to a model, as a uniform distribution over all tasks with distinct start and goal states. The reported analyses have qualitatively similar results when the task distribution matches the experiment’s long trials, which only includes tasks (s₀, g) where the start s₀ and goal g are not neighbors, so (s₀, g) ∉ T.

Hierarchical multinomial regression of choice

We model participant choice among subgoals using hierarchical multinomial regression with the mlogit package in the R programming language. Regression models are fit with 100 draws from the default Halton sequence (parameters halton = NA, R = 100).

For each model, we predict participant choices for each type of probe trial using multinomial regression with model predictions as regressors. Regressors were standardized to be mean-centered with a standard deviation of 1. Since each participant has multiple task-specific probe choices for the explicit and implicit probes, we include random effects for regressors when modeling those probe types. While not explicitly noted above, since the teleportation question was only asked once per participant, prediction of subgoal choice for it was fit without random effects (i.e. non-hierarchical multinomial regression).

For task-specific probes, the set of possible choices available to the model are configured to match those available to participants, as described in the Procedure section. Additionally, the explicit probe instructs participants to select the goal if they did not use subgoals. For RRTD-based models, we model this with the predicted value for the use of no subgoals. This is equivalent to the sum of the reward and negated planning cost of navigating directly to the goal.

Two-stage model of choice among optimal paths

Free parameters β₁ and β₂ were constrained to be greater than or equal to zero. Optimization started from initial parameters β₁ = 1, β₂ = 1. The random choice model selects subgoals uniformly at random, which corresponds to a special case of the two-stage choice model with fixed parameter β₁ = 0.

Resource-rational task decomposition

The model predictions for a state s are the value of a task decomposition (Eq 3) where the state is a single subgoal, $V(Z) = V(\{s\})$.

Random walk

The search algorithm returns a plan π = 〈s₀, s₁, …, z〉 and run-time t = |π| with probability $P_{RW}(\pi, t \mid s, z) = \prod_{i > 0} \frac{1}{|N(s_{i-1})|}$, because each step of the walk moves from $s_{i-1}$ to one of its $|N(s_{i-1})|$ neighbors uniformly at random. As we defined the reward for a plan as R(π) = −|π|, the expected reward over all plans is

$$\begin{aligned} R_{RW}(s, z) &= \sum_{\pi, t} P_{RW}(\pi, t \mid s, z)\,[R(\pi) - t] \\ &= -2 \sum_{\pi, t} P_{RW}(\pi, t \mid s, z)\,|\pi|. \end{aligned}$$

Since a constant multiplier does not affect model predictions, we drop the constant and let $R_{RW}(s, z) = -\sum_{\pi, t} P_{RW}(\pi, t \mid s, z)\,|\pi|$.

The negative expected reward $-R_{RW}(s, z)$ is the expected number of steps until the first visit to z when starting at s, also called the hitting time H(s, z). H(s, z) can be efficiently computed by the recursive equation

$$H(s, z) = 1 + \sum_{s' \in N(s)} \frac{1}{|N(s)|} H(s', z)$$

when s ≠ z, with H(s, s) = 0 otherwise. We thus define $R_{RW}(s, z) = -H(s, z)$.
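Because the recursion is linear in the unknown hitting times, all values H(s, z) for a fixed target z can be obtained by solving one linear system. The sketch below does this with NumPy; the adjacency-dictionary representation and the name `hitting_times` are our own assumptions, and the graph is assumed connected so that the system has a unique solution.

```python
import numpy as np


def hitting_times(adjacency, target):
    """Expected random-walk steps H(s, target) for every state s.

    Solves H(s) - sum_{s' in N(s)} H(s') / |N(s)| = 1 for s != target,
    with H(target) = 0, as a single linear system.
    """
    states = [s for s in adjacency if s != target]
    index = {s: i for i, s in enumerate(states)}
    A = np.eye(len(states))
    b = np.ones(len(states))
    for s in states:
        for n in adjacency[s]:
            if n != target:  # H(target) = 0 contributes nothing
                A[index[s], index[n]] -= 1.0 / len(adjacency[s])
    solution = np.linalg.solve(A, b)
    H = {s: float(solution[index[s]]) for s in states}
    H[target] = 0.0
    return H
```

On the three-state path 0–1–2 with target 2, this gives H(0, 2) = 4 and H(1, 2) = 3, so R_RW(0, 2) = −4.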

Since the use of subgoals will either maintain or increase the number of steps required to reach a goal for a random walk, we note implementation differences that accommodate this for RRTD-RW. Formally stated, H(s, s′) + H(s′, s″) ≥ H(s, s″), with H(s, s′) + H(s′, s″) = H(s, s″) only when every state sequence from s to s″ contains s′. So, states s′ with H(s, s′) + H(s′, s″) > H(s, s″) will only increase the expected number of steps and would not be in the policy over subgoals defined by Eq 2. To avoid this issue, we require the subgoal policy for RRTD-RW to contain at least one subgoal. By the same argument, a second subgoal cannot decrease the expected number of steps, so we can simplify Eq 2 for RRTD-RW to

$$V_{Z}^{g}(s) = \max_{z \in Z} \left\{ -\left[H(s, z) + H(z, g)\right] \right\} \tag{4}$$

or, when the subgoals are a singleton set $Z = \{z\}$, simply $V_{Z}^{g}(s) = -[H(s, z) + H(z, g)]$.
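Eq 4 can be evaluated directly from hitting times. In the self-contained sketch below, `hitting_times_to` solves the hitting-time recursion as a linear system and `rrtd_rw_value` takes the maximum in Eq 4 over a non-empty subgoal set. Both names and the adjacency-dictionary format are hypothetical, and the graph is assumed connected.

```python
import numpy as np


def hitting_times_to(adjacency, target):
    """H(s, target) for all s, from the random-walk hitting-time recursion."""
    states = [s for s in adjacency if s != target]
    index = {s: i for i, s in enumerate(states)}
    A = np.eye(len(states))
    for s in states:
        for n in adjacency[s]:
            if n != target:
                A[index[s], index[n]] -= 1.0 / len(adjacency[s])
    h = np.linalg.solve(A, np.ones(len(states)))
    H = {s: float(h[index[s]]) for s in states}
    H[target] = 0.0
    return H


def rrtd_rw_value(adjacency, start, goal, subgoals):
    """Eq 4: max over subgoals z of -[H(start, z) + H(z, goal)].

    subgoals must be non-empty, matching the requirement that the RRTD-RW
    subgoal policy contains at least one subgoal.
    """
    to_goal = hitting_times_to(adjacency, goal)
    return max(-(hitting_times_to(adjacency, z)[start] + to_goal[z])
               for z in subgoals)
```

This reproduces the inequality above. On the path 0–1–2–3, every route from 0 to 3 passes through states 1 and 2, so either subgoal gives −9 = −H(0, 3). On the four-state cycle 0–1–2–3–0, the subgoal 1 for the task from 0 to 2 gives −6, which is worse than the −4 obtained by heading directly to the goal.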

Depth-first search

The algorithm is recursively defined to take a current state s and plan-so-far π. At each call of the algorithm, it iterates over neighbors of the current state $s' \in N(s)$: if the state s′ is not in the current plan π, then there is a recursive call to the algorithm with state s′ and an updated plan π′ that ends with s′. When there are no unvisited neighbors s′ to consider, the algorithm backtracks to a previous state and continues until it finds one with unvisited neighbors. When the algorithm reaches the subgoal z, it terminates, returning the plan and a run-time based on the number of calls to the algorithm. To avoid bias due to neighbor order, neighbors are randomly shuffled in each call of the algorithm.
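The recursive procedure described above can be sketched as follows. This is an illustration, not the authors' implementation: it follows the text in checking only whether a neighbor is already in the current plan (rather than in a global visited set), and it assumes that the run-time counts every call, including the one that reaches the subgoal. The name `dfs_plan` and the adjacency-dictionary format are hypothetical.

```python
import random


def dfs_plan(adjacency, start, subgoal, rng=random):
    """Randomized depth-first search from start to subgoal.

    Returns (plan, runtime): plan lists the states from start to subgoal,
    and runtime is the number of recursive calls that were made.
    """
    calls = 0

    def visit(state, plan):
        nonlocal calls
        calls += 1
        if state == subgoal:
            return plan
        neighbors = list(adjacency[state])
        rng.shuffle(neighbors)  # avoid bias due to neighbor order
        for n in neighbors:
            if n not in plan:  # only extend to states not already in the plan
                result = visit(n, plan + [n])
                if result is not None:
                    return result
        return None  # no unvisited neighbors: backtrack to the caller

    return visit(start, [start]), calls
```

On the path 0–1–2–3, searching from 0 to 3 returns the plan [0, 1, 2, 3] with a run-time of 4 calls.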

Breadth-first search

The algorithm has a queue of states to visit and tracks all states that have been visited. At each iteration, it visits the next state s from the queue and adds all not-yet-visited neighbors $s^{'} \in N (s)$ to the queue. When it visits the subgoal z, it returns the path to z and a run-time based on the number of iterations that were required. As in DFS, we shuffle the neighbors of the current state at each iteration to avoid bias due to neighbor order.
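A minimal sketch of this breadth-first search follows, under stated assumptions: states are marked as visited when they are added to the queue (so no state is queued twice), the returned path is recovered from recorded parent states, and the run-time is the number of states dequeued. The name `bfs_plan` is hypothetical.

```python
import random
from collections import deque


def bfs_plan(adjacency, start, subgoal, rng=random):
    """Randomized breadth-first search; returns (path, iterations)."""
    parent = {start: None}  # visited states and the state each was reached from
    queue = deque([start])
    iterations = 0
    while queue:
        state = queue.popleft()
        iterations += 1
        if state == subgoal:
            # Walk back through parents to recover the path to the subgoal.
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1], iterations
        neighbors = list(adjacency[state])
        rng.shuffle(neighbors)  # avoid bias due to neighbor order
        for n in neighbors:
            if n not in parent:
                parent[n] = state
                queue.append(n)
    return None, iterations
```

Because states are expanded in order of their distance from the start, the returned path is always an optimal one; only the run-time depends on the shuffled neighbor order.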

Iterative deepening depth-first search

A depth-limited DFS augments a standard DFS by terminating when the current “depth” (i.e. the length of the current plan) exceeds a limit, in addition to terminating when the goal is reached. IDDFS iterates by running depth-limited DFS with incrementally larger depths, starting from a depth limit of 1. The algorithm returns when a depth-limited DFS finds a plan to the goal, counting the run-time as the number of recursive DFS calls across all uses of depth-limited DFS. As in other algorithms, neighbors are shuffled to avoid bias due to order.
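The following sketch shows IDDFS as described, with the run-time accumulated across all depth-limited passes. The assumptions are ours: depth is measured as the number of actions in the current plan, a plan may reach the goal at exactly the depth limit, and the limit stops at the number of states, since a plan without repeated states cannot be longer than that. The name `iddfs_plan` is hypothetical.

```python
import random


def iddfs_plan(adjacency, start, goal, rng=random):
    """Iterative-deepening DFS; returns (plan, total recursive calls)."""
    calls = 0

    def depth_limited(state, plan, limit):
        nonlocal calls
        calls += 1
        if state == goal:
            return plan
        if len(plan) - 1 >= limit:  # plan already has `limit` actions
            return None
        neighbors = list(adjacency[state])
        rng.shuffle(neighbors)  # avoid bias due to neighbor order
        for n in neighbors:
            if n not in plan:
                result = depth_limited(n, plan + [n], limit)
                if result is not None:
                    return result
        return None

    # Depth limits 1, 2, ...; a plan without repeats has fewer actions than states.
    for limit in range(1, len(adjacency) + 1):
        plan = depth_limited(start, [start], limit)
        if plan is not None:
            return plan, calls
    return None, calls
```

On the path 0–1–2–3, searching from 0 to 3 makes 2, 3, and 4 calls at depth limits 1, 2, and 3, for a total run-time of 9 calls.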

Alternative models

Degree centrality, betweenness centrality

Both centrality measures were computed in the Python programming language using the networkx library with all parameters left at their defaults except for endpoints = True for Betweenness Centrality. As computed by networkx, both centrality measures are a fraction—for a given state s, Degree Centrality is proportional to the fraction of states that s is connected to and Betweenness Centrality is the fraction of optimal paths that s is part of. Thus, for both measures, we define the model predictions as the logarithm of these fractions for the reasons noted above.
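The two centrality predictions can be computed with networkx as sketched below. The wrapper name `centrality_predictions` is hypothetical. With `endpoints=True`, every state of a connected graph with at least two states is an endpoint of some optimal path, so its betweenness is positive and the logarithm is defined.

```python
import math

import networkx as nx


def centrality_predictions(G):
    """Log Degree Centrality and log Betweenness Centrality for each state."""
    degree = nx.degree_centrality(G)
    betweenness = nx.betweenness_centrality(G, endpoints=True)
    return ({s: math.log(v) for s, v in degree.items()},
            {s: math.log(v) for s, v in betweenness.items()})
```

For the three-state path `nx.path_graph(3)`, the middle state has the highest value under both measures.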

QCut

This section requires more extensive use of graph theory, so we first explicitly connect the task formalism used in the remainder of the text to graphs before describing QCut. An undirected graph consists of nodes $V$ and edges $\{i, j\} \in E$ between nodes i and j, and we let $n = |V|$ and $m = |E|$. The adjacency matrix A of a graph has entries $A_{ij} = 1$ if $\{i, j\} \in E$ and 0 otherwise. The degree of a node i is $d_i = \sum_j A_{ij} = \sum_j A_{ji}$, and the degree matrix is $D = \mathrm{diag}(d)$. To connect this to the notation used in the rest of the paper, we contextually assume the following relationship between undirected graphs and task environments: For environments with reversible actions (formally, (s, s′) ∈ T ⇔ (s′, s) ∈ T), we let states correspond directly to graph nodes and transitions in the environment (s, s′) ∈ T correspond to graph edges $\{s, s'\} \in E$.

QCut divides the states of a graph into two groups based on the Normalized Cut criterion, which admits an approximate solution based on a spectral decomposition of the graph [16, 33, 46]. The approximate solution to the Normalized Cut criterion is based on the symmetric graph Laplacian $L_{sym} = D^{- \frac{1}{2}} (D - A) D^{- \frac{1}{2}} = I - D^{- \frac{1}{2}} A D^{- \frac{1}{2}}$ .

Following prior work [16, 46], we divide graph nodes into two groups based on the eigenvector v with the second-smallest eigenvalue of $L_{sym}$. This eigenvector is also the best non-trivial one-dimensional embedding of the graph states that minimizes the distance between connected states (see Eq. 10 in [46]). Our implementation partitions states based on whether they are above or below a threshold in this one-dimensional embedding v; a typical threshold is zero or the median. Since states s with corresponding eigenvector entry v_s closest to the eigenvector mean are considered most central, we define the model prediction for a state s as $-v_{s}^{2}$.
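The spectral steps can be sketched with NumPy as follows. This is a minimal illustration, not the code used in our analyses: it takes a dense adjacency matrix of a connected graph, uses the median as the partition threshold (one of the two options mentioned above), and the name `qcut_predictions` is hypothetical. `np.linalg.eigh` returns eigenvalues in ascending order, so column 1 holds the eigenvector for the second-smallest eigenvalue. Its overall sign is arbitrary, which flips the group labels but not the grouping or the predictions $-v_s^2$.

```python
import numpy as np


def qcut_predictions(adjacency_matrix):
    """Spectral Normalized Cut sketch: (two-way partition, -v_s**2 predictions)."""
    A = np.asarray(adjacency_matrix, dtype=float)
    d = A.sum(axis=1)  # node degrees
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    # Symmetric graph Laplacian L_sym = I - D^{-1/2} A D^{-1/2}.
    L_sym = np.eye(len(A)) - d_inv_sqrt @ A @ d_inv_sqrt
    eigenvalues, eigenvectors = np.linalg.eigh(L_sym)  # ascending eigenvalues
    v = eigenvectors[:, 1]  # eigenvector with the second-smallest eigenvalue
    partition = v > np.median(v)  # threshold at the median of the embedding
    return partition, -v ** 2
```

For two triangles joined by a single edge, the partition recovers the two triangles, and the two states on the connecting edge receive the highest predictions.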

Solway et al. (2014)

Solway et al. (2014) [8] propose that people choose hierarchies that most efficiently encode problem-solving behavior. The efficiency of an encoding is quantified through the information-theoretic concept of minimum description length; when applied to encode problem-solving behavior through hierarchical structure, this involves choosing a task decomposition so that solutions have short description length and are composed of subtasks whose solutions can be reused in many tasks. This account takes optimal paths as the behavior to encode and selects hierarchies based on graph partitions.

We now note our implementation details that depart from those of Solway et al. (2014) [8]. To predict behavioral choices, we use the optimal behavioral hierarchy to specify a binary regressor that takes a value of 1 for boundary states (states with at least one neighbor in a different region) and 0 otherwise—we discuss this choice in the main text. Since multiple state sequences can be optimal, random noise is added to graph edges for the purpose of tie-breaking—in our implementation, the description length of behavior is averaged across 10 samplings of these edge weights in order to reduce the effects of noise. In the original publication, optimization over partitions based on model evidence was performed using a genetic algorithm—in our implementation, we enumerate all graph partitions and select those with the highest model evidence. For a given graph, the original article considers all possible partitions—for our analyses over eight-node graphs, we found it necessary to exclude trivial partitions. So, when possible for a graph, we only considered partitions with at least two regions and required that each region contained at least one state that was not a boundary state. The description length of behavior (the “model evidence”) was computed using R code supplied by Dr. Alec Solway.
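The boundary-state regressor and the exclusion of trivial partitions can be sketched as below. This covers only those two implementation details, not the description-length computation itself. Partitions are represented as a hypothetical dictionary from state to region label, the function names are hypothetical, and the "when possible for a graph" fallback is not shown.

```python
def boundary_regressor(adjacency, region_of):
    """1 for states with at least one neighbor in a different region, else 0."""
    return {s: int(any(region_of[n] != region_of[s] for n in adjacency[s]))
            for s in adjacency}


def is_admissible(adjacency, region_of):
    """At least two regions, each with at least one non-boundary state."""
    boundary = boundary_regressor(adjacency, region_of)
    regions = set(region_of.values())
    return len(regions) >= 2 and all(
        any(region_of[s] == r and not boundary[s] for s in adjacency)
        for r in regions)
```

For two triangles joined by one edge and partitioned into the two triangles, only the two states on the connecting edge are boundary states, and the partition is admissible.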

Tomov et al. (2020)

Tomov et al. (2020) [7] propose an account of task decomposition as inference over hierarchical structure. Their generative model of hierarchical structure assumes partitions are drawn from a Chinese Restaurant Process and additionally assumes that edges between states are more likely when the states are in the same region. Their model also incorporates terms related to the task distribution and reward function that we expect to have minimal impact on our results because we do not vary the task distribution and the reward function for our task is a constant.

To incorporate this model, we used the analysis in the “Simulation Two: Bottleneck States” section in the publication [7]. The analysis models participants (N = 40) as sampling from a generative model over hierarchical graphs, then randomly sampling subgoals (three per participant) from the states connected to a “bridge” edge, which connects distinct regions in the hierarchical graph. Notably, pairs of regions are connected by a single “bridge” edge—this is subtly different from standard graph partitions, where different regions can be connected by any number of edges. All parameters were left as reported in the publication [7], with the exception of choice stochasticity ϵ which we made entirely deterministic by setting ϵ = 1.0. To implement the published analyses, we used the publicly available code at https://github.com/tomov/chunking. We define the model prediction as the logarithm of the number of times a subgoal was sampled by this procedure—to avoid issues where a subgoal isn’t sampled due to noise, we add 1 to the subgoal counts before taking the logarithm.

Supporting information

S1 Appendix. Appendix.

Extended data and analyses.

(PDF)


Acknowledgments

We are very grateful to Dr. Alec Solway for providing software to compute the quantities defined in Solway et al. (2014) [8]. We thank Dr. Brendan McKay for making the eight-node graphs we analyze in this study available for download on his website. We thank Dr. Ari Kahn for helpful discussions about behavioral confounds.

Data Availability

The data and code used for analysis are available on GitHub at: https://github.com/cgc/resource-rational-task-decomposition.

Funding Statement

This research was supported by John Templeton Foundation grant 61454 awarded to TLG and NDD (https://www.templeton.org/), U.S. Air Force Office of Scientific Research grant FA 9550-18-1-0077 awarded to TLG (https://www.afrl.af.mil/AFOSR/), and U.S. Army Research Office grant ARO W911NF-16-1-0474 awarded to NDD (https://www.arl.army.mil/who-we-are/directorates/aro/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Botvinick MM, Niv Y, Barto AG. Hierarchically organized behavior and its neural foundations: a reinforcement learning perspective. Cognition. 2009;113(3):262–280. doi: 10.1016/j.cognition.2008.08.011
2. Eckstein MK, Collins AG. Computational evidence for hierarchically structured reinforcement learning in humans. Proceedings of the National Academy of Sciences. 2020;117(47):29381–29389. doi: 10.1073/pnas.1912330117
3. Balaguer J, Spiers H, Hassabis D, Summerfield C. Neural Mechanisms of Hierarchical Planning in a Virtual Subway Network. Neuron. 2016;90(4):893–903. doi: 10.1016/j.neuron.2016.03.037
4. Cushman F, Morris A. Habitual control of goal selection in humans. Proceedings of the National Academy of Sciences. 2015;112(45):13817–13822. doi: 10.1073/pnas.1506367112
5. Stachenfeld KL, Botvinick MM, Gershman SJ. The hippocampus as a predictive map. Nature Neuroscience. 2017;20(11):1643–1653. doi: 10.1038/nn.4650
6. Collins AGE, Frank MJ. Cognitive control over learning: Creating, clustering, and generalizing task-set structure. Psychological Review. 2013;120(1):190–229. doi: 10.1037/a0030852
7. Tomov MS, Yagati S, Kumar A, Yang W, Gershman SJ. Discovery of hierarchical representations for efficient planning. PLOS Computational Biology. 2020;16(4):1–42. doi: 10.1371/journal.pcbi.1007594
8. Solway A, Diuk C, Córdova N, Yee D, Barto AG, Niv Y, et al. Optimal Behavioral Hierarchy. PLOS Computational Biology. 2014;10(8):1–10. doi: 10.1371/journal.pcbi.1003779
9. Lewis RL, Howes A, Singh S. Computational Rationality: Linking Mechanism and Behavior Through Bounded Utility Maximization. Topics in Cognitive Science. 2014;6(2):279–311. doi: 10.1111/tops.12086
10. Gershman SJ, Horvitz EJ, Tenenbaum JB. Computational rationality: A converging paradigm for intelligence in brains, minds, and machines. Science. 2015;349(6245):273–278. doi: 10.1126/science.aac6076
11. Griffiths TL, Lieder F, Goodman ND. Rational Use of Cognitive Resources: Levels of Analysis Between the Computational and the Algorithmic. Topics in Cognitive Science. 2015;7(2):217–229. doi: 10.1111/tops.12142
12. Correa CG, Ho MK, Callaway F, Griffiths TL. Resource-rational task decomposition to minimize planning costs. In: Proceedings of the 42nd Annual Conference of the Cognitive Science Society; 2020.
13. Sacerdoti ED. Planning in a hierarchy of abstraction spaces. Artificial Intelligence. 1974;5(2):115–135. doi: 10.1016/0004-3702(74)90026-5
14. Korf RE. Learning to Solve Problems by Searching for Macro-Operators. Pitman Publishers; 1985.
15. Sutton RS, Precup D, Singh S. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence. 1999;112(1-2):181–211. doi: 10.1016/S0004-3702(99)00052-1
16. Şimşek Ö, Wolfe AP, Barto AG. Identifying useful subgoals in reinforcement learning by local graph partitioning. In: Proceedings of the 22nd International Conference on Machine Learning; 2005.
17. Ramkumar P, Acuna DE, Berniker M, Grafton ST, Turner RS, Kording KP. Chunking as the result of an efficiency computation trade-off. Nature Communications. 2016;7(1):1–11. doi: 10.1038/ncomms12176
18. Huys QJM, Lally N, Faulkner P, Eshel N, Seifritz E, Gershman SJ, et al. Interplay of approximate planning strategies. Proceedings of the National Academy of Sciences. 2015;112(10):3098–3103. doi: 10.1073/pnas.1414219112
19. Jinnai Y, Abel D, Hershkowitz DE, Littman M, Konidaris G. Finding options that minimize planning time. In: Proceedings of the 36th International Conference on Machine Learning; 2019.
20. Maisto D, Donnarumma F, Pezzulo G. Divide et impera: subgoaling reduces the complexity of probabilistic inference and problem solving. Journal of The Royal Society Interface. 2015;12(104):20141335. doi: 10.1098/rsif.2014.1335
21. McNamee D, Wolpert DM, Lengyel M. Efficient state-space modularization for planning: theory, behavioral and neural signatures. In: Advances in Neural Information Processing Systems; 2016.
22. Şimşek Ö, Barto AG. Skill characterization based on betweenness. In: Advances in Neural Information Processing Systems; 2009.
23. Bellman R. Dynamic programming. Princeton University Press; 1957.
24. Russell S, Norvig P. Artificial Intelligence: A Modern Approach. 3rd ed. USA: Prentice Hall Press; 2009.
25. Ghallab M, Nau D, Traverso P. Automated planning and acting. Cambridge University Press; 2016.
26. Korf RE. Depth-first iterative-deepening: An optimal admissible tree search. Artificial Intelligence. 1985;27(1):97–109. doi: 10.1016/0004-3702(85)90084-0
27. De Groot AD. Thought and choice in chess. Mouton Publishers; 1965.
28. Newell A, Simon HA. Human problem solving. Prentice-Hall; 1972.
29. Lieder F, Plunkett D, Hamrick JB, Russell SJ, Hay N, Griffiths T. Algorithm selection by rational metareasoning as a model of human strategy selection. In: Advances in Neural Information Processing Systems; 2014.
30. Callaway F, van Opheusden B, Gul S, Das P, Krueger PM, Lieder F, et al. Rational use of cognitive resources in human planning. Nature Human Behaviour. 2022;6(8):1112–1125. doi: 10.1038/s41562-022-01332-8
31. Puterman ML. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc.; 1994.
32. Binder FJ, Mattar MG, Kirsh D, Fan JE. Visual scoping operations for physical assembly. In: Proceedings of the 43rd Annual Conference of the Cognitive Science Society; 2021.
33. Menache I, Mannor S, Shimkin N. Q-Cut—Dynamic discovery of sub-goals in reinforcement learning. In: European Conference on Machine Learning. Springer; 2002. p. 295–306.
34. Donnarumma F, Maisto D, Pezzulo G. Problem solving as probabilistic inference with subgoaling: explaining human successes and pitfalls in the Tower of Hanoi. PLOS Computational Biology. 2016;12(4):e1004864. doi: 10.1371/journal.pcbi.1004864
35. Lieder F, Griffiths TL. Strategy selection as rational metareasoning. Psychological Review. 2017;124(6):762–794. doi: 10.1037/rev0000075
36. Borassi M, Natale E. KADABRA is an ADaptive Algorithm for Betweenness via Random Approximation. ACM J Exp Algorithmics. 2019;24(1.2):1–35. doi: 10.1145/3284359
37. Huys QJM, Eshel N, O’Nions E, Sheridan L, Dayan P, Roiser JP. Bonsai trees in your head: how the Pavlovian system sculpts goal-directed choices by pruning decision trees. PLOS Computational Biology. 2012;8(3):e1002410. doi: 10.1371/journal.pcbi.1002410
38. Lieder F, Griffiths TL. Resource-rational analysis: Understanding human cognition as the optimal use of limited computational resources. Behavioral and Brain Sciences. 2020;43:e1. doi: 10.1017/S0140525X1900061X
39. Ho MK, Abel D, Correa CG, Littman ML, Cohen JD, Griffiths TL. People construct simplified mental representations to plan. Nature. 2022;606(7912):129–136. doi: 10.1038/s41586-022-04743-9
40. Liu Y, Mattar MG, Behrens TEJ, Daw ND, Dolan RJ. Experience replay is associated with efficient nonlocal learning. Science. 2021;372(6544):eabf1357. doi: 10.1126/science.abf1357
41. Ribas-Fernandes JF, Solway A, Diuk C, McGuire J, Barto A, Niv Y, et al. A Neural Signature of Hierarchical Reinforcement Learning. Neuron. 2011;71(2):370–379. doi: 10.1016/j.neuron.2011.05.042
42. Harb J, Bacon PL, Klissarov M, Precup D. When waiting is not an option: Learning options with a deliberation cost. In: Thirty-Second AAAI Conference on Artificial Intelligence; 2018.
43. Nasiriany S, Pong V, Lin S, Levine S. Planning with goal-conditioned policies. In: Advances in Neural Information Processing Systems; 2019.
44. Sutton RS, Barto AG. Reinforcement learning: An introduction. MIT Press; 2018.
45. Prystawski B, Mohnert F, Toi M, Lieder F. Resource-rational Models of Human Goal Pursuit. Topics in Cognitive Science. 2022;14(3):528–549. doi: 10.1111/tops.12562
46. Shi J, Malik J. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2000;22(8):888–905. doi: 10.1109/34.868688

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1011087.r001

Decision Letter 0

Thomas Serre, Tobias U Hauser

4 Jan 2023

Dear Mr. Correa,

Thank you very much for submitting your manuscript "Humans decompose tasks by trading off utility and computational cost" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

We apologise for the delay in getting back to you - it has taken considerable time to solicit expert comments on this paper. As you can see from the reviews below, all reviewers liked the research question and appreciated the approach you have taken. They also liked the writing of the paper and we believe it will make a significant contribution to our journal. However, all reviewers had several comments and suggestions that we would like you to address in a revision of the paper.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note that, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Tobias U Hauser, PhD

Academic Editor

PLOS Computational Biology

Thomas Serre

Section Editor

PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The manuscript proposes a normative computational account of subgoal formation, which aims to integrate how subgoals are chosen with the ways they are subsequently used, in the class of problems that translate into pathfinding problems on graphs. The authors implement a human behavioural experiment to assess the predictive power of the computational account and conclude that one of the graph search algorithms, one that closely corresponds to an interesting heuristic, best matches human choices of subgoals. In my opinion the paper is very thorough, thoughtful and very successful in integrating computational insight with empirical observation in a rather subtle domain. I think it would be a great addition to the scientific record; however, I believe addressing the following concerns would make it easier for readers to correctly contextualise the presented results:

There are three problem classes defined by the paper: the one described by the mathematical formalism of the model, the one implemented in the experiment, and the one from which the examples in the text are given. I think the text basically treats all three as exactly the same, and I think they are not. I don’t necessarily think that this is a problem - abstracting away some aspects of the experimental setup and addressing a more general set of phenomena in the examples might even be desired - but I think writing very clearly about these differences is important to help the reader see what the results really mean. My first two questions address these differences.

1. The mathematical formalism assumes T to be known, and to be accessed locally (i.e. no searching from the goal backwards). This is fortunate since learning T would almost certainly be entangled with the choice of Z, making the formalism way too complicated. However, in the experiment, people have to learn T from scratch, and aren’t expected to have a noiseless version of it at any point. In the navigation trials, this is compensated by the local information about edges that can be said to correspond to the local access in the model - but there people are making actual moves on the graph, not merely mental ones, as e.g. in the given example of chess. And in the probes, where they presumably are making mental simulations, they presumably not only learned T already (albeit noisily) but also already constructed Z - thus people aren’t really ‘planning’ while constructing the subgoals, but rather ‘exploring’. All this might be perfectly fine; it just gave me pause not to see these points addressed more directly.

2. The text of the manuscript gives various real-world examples of problems, most notably that of physical navigation, choosing landmarks (the café) as subgoals. I think the café example works through a different mechanism: I’d choose it as a subgoal because I already know how to get there, i.e. there is an existing (habitualised) policy I can reuse, and not because it will make planning cheaper now. Habits seem to be a consideration orthogonal to this paper, but maybe I’m failing to see a connection. Furthermore, in tasks like physical navigation people seem to heavily rely on distance-from-goal or directional heuristics, both available due to the existence of an embedding into a feature space. This could result in something like an IDA* search algorithm instead of the ones discussed here. No such embedding is assumed in either the model or the experiment - in fact it’s explicitly avoided. The example of navigating in the dark is closer to the experiment, but not to the model. Board game examples are closer to the model, as they involve planning in symbolic spaces (but one has to ignore heuristic-generating feature embeddings for e.g. chess). Cooking might be the example I find the most fitting for the model.

3. The Discussion mentions in passing that this work is complementary to option discovery. This seems to be an important point to address, and although I myself am not well versed enough in the options literature to be able to tell the exact relationship, I’d be very interested to know how exactly it is complementary. In fact, I’d probably mention this in the introduction as well instead of only at the very end.

4. Relatedly, the learning problem in this paper is formalised similarly to goal-directed reinforcement learning. In particular, this paper looks at defining subgoals in such a setting: https://arxiv.org/abs/2106.01404. It might be worth relating to this literature as well.

5. Why is IDDFS an intuitive choice despite the large computational cost depicted in Fig 1b? It has presumably been proposed previously because of some favorable property; was it its good memory complexity? If so, would it also make sense to look at how subgoals reduce memory costs?
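To illustrate the time/memory trade-off raised in this question, the following is a minimal sketch of iterative-deepening depth-first search (IDDFS) on an unweighted graph stored as an adjacency dict (the input format and function names are our assumptions; this is not the authors' implementation). Each depth-limited pass keeps only the current path in memory, which is the classic advantage of IDDFS, but every new depth limit re-expands all shallower nodes, which shows up as a higher expansion count.

```python
from typing import Dict, List, Optional, Tuple

Graph = Dict[int, List[int]]  # adjacency lists of an unweighted graph


def depth_limited_search(graph: Graph, path: List[int], goal: int,
                         limit: int, expansions: List[int]) -> Optional[List[int]]:
    """DFS up to `limit` more steps, storing only the current path (no visited set)."""
    node = path[-1]
    expansions[0] += 1  # count every node expansion
    if node == goal:
        return list(path)
    if limit == 0:
        return None
    for neighbor in graph[node]:
        if neighbor in path:  # skip cycles along the current path only
            continue
        path.append(neighbor)
        found = depth_limited_search(graph, path, goal, limit - 1, expansions)
        path.pop()
        if found is not None:
            return found
    return None


def iddfs(graph: Graph, start: int, goal: int,
          max_depth: int) -> Tuple[Optional[List[int]], int]:
    """Try depth limits 0, 1, ..., max_depth; return (path or None, total expansions)."""
    expansions = [0]
    for limit in range(max_depth + 1):
        found = depth_limited_search(graph, [start], goal, limit, expansions)
        if found is not None:
            return found, expansions[0]
    return None, expansions[0]


if __name__ == "__main__":
    line_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(iddfs(line_graph, 0, 3, max_depth=5))  # ([0, 1, 2, 3], 10)
```

On the path graph 0-1-2-3, searching from 0 to 3 returns the 4-node solution only after 10 expansions (1 + 2 + 3 + 4), while the memory held at any moment is just the current path.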

6. Do Fig 1c-e show the costs incurred during one particular run of each of these algorithms? If not, why is, e.g., the random walk not symmetric around the start state? These panels are also a bit far from the part of the text that describes them; moving them closer, perhaps into a separate figure, could streamline the reading experience.

7. In the experiment could stimulus salience distort the results in any way? E.g. is the red balloon overrepresented in the choice of subgoals?

8. Would it be possible to make more direct comparisons between the number of steps people make in the navigation trials and the number of steps taken by the algorithms? Would such a comparison be meaningful given that for humans this is a learning phase as well?

9. Eq 3: is the Z that maximises this formula taken here? I assume it is but it isn’t stated explicitly.

10. It is mentioned that the subjects were asked if they drew a map. Did any of them answer yes to this question?

In sum, I find the manuscript to be a valuable contribution, and regard the above issues as mere possible suboptimalities, not disqualifying problems. I look forward to seeing the final version of the paper, hopefully allowing me to understand the work even better.

Reviewer #2: The paper describes behavioral predictions derived from a set of (normative) computational accounts and heuristics for the (resource-rational) decomposition of tasks in a graph-structured environment. For simulations, graph structures are selected such that differences between model predictions about which states should be sub-goals become qualitatively evident. Model predictions are qualitatively and quantitatively (using multinomial regression analyses) compared against human behavior from a large, pre-registered online study (N = 806). The authors report that human behavior in a graph-structured planning task involving explicit and implicit probe questions about sub-goals is most consistent with the use of a task decomposition heuristic (betweenness centrality) and – among the formal accounts – with a resource-rational model performing an iterative-deepening depth-first search on the graph structure.

The paper is well-written, addresses an interesting (and novel) research question and features a variety of well-crafted computational accounts of task decomposition that are motivated by a resource-rational perspective on planning. Predictions of previously considered formal accounts of planning from the literature are pitted against these new algorithms – allowing for quantitative comparisons of the relative goodness of fit to observed human behavior. The novelty and strength of the present computational approach lies in the formalization of three nested levels of planning (action-level planning, subtask-level planning and task decomposition), and their optimization considering computational costs (limited resources).

The idea that humans (and potentially other cognitive systems) engage in resource-rational trade-offs during planning and decomposition of tasks is intriguing, and has far-reaching implications, even beyond the field of cognitive psychology/neuroscience.

While I enjoyed reading the paper and think it would be of much interest to the diverse readership of PLOS Computational Biology, I have a few comments that I would like to see addressed. These are mainly related to a potential confound in the behavioral task design (that should at least be discussed), the presentation/analysis of the behavioral data and the interpretation of the findings. I am very confident that the authors will be able to address my concerns.

Major questions and comments:

1.) I would like to see the human behavior unpacked and explored a bit more:

a. A figure for the observed associations between navigation trial performance and probe behavior could be used to illustrate these findings. This would help readers to get a better sense for the variability of performance across subjects and the data distributions at hand.

b. In Figure 5a it is unclear how many participants performed choices on each of the depicted graphs. Please clarify this, and potentially consider adding a supplementary figure showing the results for the remaining 26 graph structures that were considered in the study.

c. A potential confound that should be controlled for is the varying complexity of the employed graph structures. Is discovery and usage of sub-goals further modulated by measures of graph complexity/minimum description length?

d. Does behavioral performance improve across trials/repeated exposures to planning tasks?

e. Is there evidence that participants learned the structure well enough? What looks like a failure to use normative task decomposition could in fact be a failure to acquire the structure. This learning deficit could be assessed by investigating exploration behavior before setting sub-goals (entropy in cursor/mouse movement, return to previously visited states – over and above the reported control for state occupancy during navigation trials) as a marker of how well the structure has been learned.

f. Relatedly, was there evidence for (overall) longer reaction times on the task (e.g. longer planning duration, or time to complete a trial) for subjects choosing less optimal sub-goals (or no sub-goals at all), which would be expected if there are advantages of using a normative strategy to solve the task?

2.) Does task decomposition in the behavioral experiment occur “naturally”, or is it induced by the task instructions (e.g. “Plan how to get from A to B. Choose a location you would visit along the way,” lines 340-341), which may encourage subjects to decompose the task, i.e. due to demand characteristics of the task? The authors should acknowledge and discuss this potential confound and its implications for the interpretation of the results. Additionally, it would be beneficial to point out that future studies should try to address this confound by using the same task but without explicitly prompting sub-goal use (and therefore task decomposition). This would allow testing whether the results are still aligned with the predictions of normative accounts and heuristics.

3.) It is unclear to me how exactly the tested approximations/heuristics are more psychologically plausible and tractable for humans than the presented normative computations. As also discussed by the authors, betweenness centrality is computationally very demanding, given that all shortest paths of a given graph need to be computed and stored in memory to calculate the importance of each node using this heuristic. Relatedly, I would like to encourage the authors to include a section elaborating on how these heuristics (and potentially the computations used in the normative accounts) might plausibly be implemented by humans/brains (i.e. how cognitively and also how biologically plausible their implementation is). The authors acknowledge that even simpler heuristics than the ones considered here could be used by participants. I think it would be beneficial to elaborate more on what these simpler, more tractable heuristics may be. For example, simple count-based strategies, keeping track of the number of edges of each node and of how often each node occurred in the queries about start and end nodes, seem to be a more tractable approximation (akin to something like a successor representation).
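The count-based alternative suggested in this comment can be made concrete with the successor representation (SR) of a random walk, M = Σ_t γ^t T^t = (I − γT)^(-1), where T is the uniform random-walk transition matrix and γ a discount factor. The sketch below is a hypothetical illustration, not a model from the paper: it assumes an unweighted, undirected graph with no isolated nodes, approximates M with a truncated series in pure Python, and treats `gamma` and `n_terms` as free parameters of our choosing.

```python
from typing import Dict, List

Matrix = List[List[float]]


def random_walk_matrix(graph: Dict[int, List[int]], nodes: List[int]) -> Matrix:
    """T[i][j] = 1/deg(i) if j is a neighbor of i (uniform random walk)."""
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    T = [[0.0] * n for _ in range(n)]
    for v in nodes:
        for w in graph[v]:
            T[index[v]][index[w]] = 1.0 / len(graph[v])
    return T


def successor_representation(T: Matrix, gamma: float, n_terms: int) -> Matrix:
    """Truncated series M = sum_{t=0}^{n_terms} gamma^t T^t, approximating (I - gamma T)^-1."""
    n = len(T)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in identity]
    term = [row[:] for row in identity]  # gamma^t T^t, starting at t = 0
    for _ in range(n_terms):
        term = [[gamma * sum(term[i][k] * T[k][j] for k in range(n))
                 for j in range(n)] for i in range(n)]
        M = [[M[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return M


if __name__ == "__main__":
    graph = {0: [1], 1: [0, 2], 2: [1]}
    T = random_walk_matrix(graph, [0, 1, 2])
    print(successor_representation(T, gamma=0.5, n_terms=50))
```

Entry M[i][j] is the expected discounted number of visits to node j for a random walk started at i (counting t = 0), so each row sums to approximately 1/(1 − γ). Such occupancy statistics are the kind of quantity a learner could accumulate by simple counting during navigation.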

4.) The current study presents a normative account for resource-rational behavior in graph-structured environments, where sub-tasks are well-defined and the state space can be decomposed with reasonable certainty, only by considering the graph structure itself (subjects transition from one state to another, transitions do not depend on skill level of executing the behavioral task at hand, or the time to complete it etc.). To keep up with the general motivation of the study as put forth in the author summary and introduction (“how do people decompose tasks to begin with?”, line 11), the authors could elaborate on the extent to which the presented normative accounts generalize to other, non-graph-structured planning tasks. How are sub-goals identified in the more general case, e.g. in more complex, less discrete tasks that involve more uncertainty about the state space and completion of sub-tasks/achievement of sub-goals?

5.) For resource-rational computations in subtask-level planning to occur, subjects would need to have access to the computational cost of a cognitive process before deciding whether to engage in the computation, or rather not to invest time and cognitive resources. This cost is of course only available after indeed running the very same computation, which sort of seems to defeat the purpose of resource-rational deliberation. It is more of a question out of curiosity, but I assume many readers will have similar thoughts, so it might be beneficial to elaborate (e.g. in the discussion) how the necessary “ingredients” for the resource-rational deliberation are thought to be accrued before engaging in resource-rational task decomposition.

6.) It would be beneficial to present an additional, alternative metric for model comparison that is less dependent than the AIC on the assumption of uninformative (flat) priors and an approximately multivariate Gaussian posterior distribution. I suggest adding another metric like the WAIC, or preferably, cross-validation to assess the predictive accuracy of the models under consideration. Do other metrics produce convergent model comparison results?
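As a concrete reference for the two kinds of metric contrasted here, the following minimal sketch computes AIC (2k − 2 ln L̂, using the in-sample maximized log-likelihood) alongside a k-fold cross-validated log-likelihood (scored only on held-out choices). The model is a deliberately simple, hypothetical categorical choice model with a pseudocount `alpha`, and the interleaved fold scheme and data are illustrative assumptions, not the multinomial regression analysis used in the paper.

```python
import math
from collections import Counter
from typing import List, Sequence


def aic(max_log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion, 2k - 2 ln L_hat (lower is better)."""
    return 2 * n_params - 2 * max_log_likelihood


def fit_categorical(choices: Sequence[int], n_options: int, alpha: float = 1.0) -> List[float]:
    """Choice probabilities from counts plus a pseudocount alpha (alpha=0 gives the MLE)."""
    counts = Counter(choices)
    total = len(choices) + alpha * n_options
    return [(counts[k] + alpha) / total for k in range(n_options)]


def log_likelihood(probs: Sequence[float], choices: Sequence[int]) -> float:
    """Log-probability of the observed choices under fixed choice probabilities."""
    return sum(math.log(probs[c]) for c in choices)


def cv_log_likelihood(choices: Sequence[int], n_options: int,
                      k: int = 5, alpha: float = 1.0) -> float:
    """Summed held-out log-likelihood over k interleaved folds (higher is better)."""
    total = 0.0
    for fold in range(k):
        train = [c for i, c in enumerate(choices) if i % k != fold]
        test = [c for i, c in enumerate(choices) if i % k == fold]
        total += log_likelihood(fit_categorical(train, n_options, alpha), test)
    return total


if __name__ == "__main__":
    data = [0, 0, 1, 0, 2, 0, 1, 0, 0, 1]  # hypothetical choices among 3 options
    mle = fit_categorical(data, 3, alpha=0.0)
    print(aic(log_likelihood(mle, data), n_params=2))  # 2 free probabilities
    print(cv_log_likelihood(data, 3))
```

The pseudocount matters only for cross-validation: without it, an option never chosen in a training fold would receive probability zero and a held-out choice of it would make the log-likelihood undefined.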

Minor questions and comments

(1) It is surprising that predictions based on betweenness centrality seem so closely aligned with predictions of RRTD-IDDFS but not with RRTD-BFS (Fig. 3), given algorithmic work suggesting that betweenness centrality can be efficiently (and probabilistically) approximated using balanced bidirectional breadth-first search (e.g. Borassi & Natale, 2019, https://doi.org/10.1145%2F3284359). The authors could clarify and discuss this.
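Because this comment ties betweenness centrality to breadth-first search, a concrete reference point may help: exact betweenness on an unweighted graph can be computed with Brandes' algorithm, which runs one BFS per source node and then accumulates shortest-path dependencies backwards. The sketch below is a minimal Python version under stated assumptions (undirected graph given as an adjacency dict with each edge listed in both directions, unnormalized scores); it is neither the paper's code nor the KADABRA method of Borassi & Natale, which instead samples shortest paths to approximate these scores.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[int, List[int]]  # undirected: each edge listed in both adjacency lists


def betweenness_centrality(graph: Graph) -> Dict[int, float]:
    """Exact (unnormalized) betweenness via Brandes' algorithm for unweighted graphs."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        # Forward pass: BFS from s counting shortest paths (sigma) and predecessors.
        order: List[int] = []
        preds: Dict[int, List[int]] = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward pass: accumulate each node's dependency on s, farthest first.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair (s, t) was counted from both endpoints.
    return {v: c / 2.0 for v, c in bc.items()}


if __name__ == "__main__":
    star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
    print(betweenness_centrality(star))  # {0: 3.0, 1: 0.0, 2: 0.0, 3: 0.0}
```

Because the outer loop performs a full BFS from every node, the exact computation takes O(nm) time on a graph with n nodes and m edges, which makes concrete both the cost raised in comment 3 above and the motivation for sampling-based approximations.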

(2) The last sentence of the abstract, “Taken together, our results provide new theoretical insight into the computational principles underlying the intelligent structuring of goal-directed behavior.”, seems to overstate the behavioral findings of the study and what the models represent. In my view, the study shows how well predictions of a number of considered normative accounts and heuristics are aligned with human behavior, but does not necessarily represent a proof for the use of these exact computational principles by humans (there could be alternative computational accounts and heuristics that are currently not considered in the model space of the present study – e.g. Dijkstra’s algorithm for discovery of the shortest path between nodes).

(3) I was a bit confused by the fact that the authors indicate the number of all possible unique 8-node graphs (11,117) multiple times throughout the manuscript without mentioning that this was not the number of graphs actually used in the behavioral experiment. It is more informative to learn that the authors further limited this set by ensuring that each graph had 10 distinct tasks with 3+ actions for an optimal solution (lines 532-533), which greatly enhances the scrutiny of the approach.

(4) In which way was the multiple-choice survey question at the end of the experiment used? Did it serve as an exclusion criterion? It would be beneficial to rule out the potentially confounding effects of participants using drawings or pictures of the graph and re-run the behavioral analyses only including subjects who did indeed adhere to the protocol.

(5) The explanations of the toy example task decomposition in the figure caption of Fig. 1 (c-e, page 3) are a bit unclear without reading the section describing how the task was set up in the main manuscript (only at page 6). Please expand the figure caption such that this becomes clearer without having to refer to the main text.

(6) Page 12, lines 354-355: I do not understand the pre-registered exclusion criterion of “no more than 175% of the optimal number of actions”. Is this a typo?

(7) Page 13, line 403: The last sentence before Figure 5 seems to overstate the behavioral findings. I do not think that the presented analyses (of internal consistency) are sufficient to establish validity of the construct (from a test theoretic perspective) – internal consistency is a metric of reliability. Please rephrase this.

(8) Figure 3: Why are there relatively low correlations between RRTD-RW and Q-Cut model predictions, if one is a heuristic approximation of a random walk, while correlations, e.g. between RRTD-IDDFS and betweenness centrality, are much higher? Was an approximation other than rank-one used?

(9) The Github link to data and code used for analysis (https://github.com/cgc/resource-rational-task-decomposition) is currently not working, please make this important information available.

I sign my reviews.

Lennart Luettgau

Reviewer #3: In summary, my view is that this is a well-executed study which makes a significant contribution to the literature on human planning. In particular, the authors are to be commended on their efforts to integrate a variety of hitherto disparate studies within a unified perspective under the framework of resource-rational DM. Furthermore, a more detailed analysis of the hierarchy/bottleneck problem is presented based on the most comprehensive experiment on this topic to date. However, there are a couple of important gaps in the data analysis approach in my view.

Regarding the data analysis and model comparisons: the authors emphasize the algorithm-based approach in contrast to the structure inference approach. I agree with this perspective and find it interesting; however, it seems to me that this suggests an investigation into what planning algorithm is being used by the participants. To put it bluntly, what is the utility of considering a model such as RRTD-IDDFS to predict subgoals if the participants are not using IDDFS to plan? I wonder if the authors could at least provide some perspectives on this if not actually run some model fits/comparisons on choices during the navigation trials.

Related to this, it seems that an immediate computational hypothesis emerging from the normative framework studied here (and the principle of subgoaling more generally) regards the modulation of reaction times. That is, given a task decomposition, then subtask-level planning should occur at a subgoal specifically and this should be reflected in reaction times. More generally, reaction times can be an important behavioural indicator of internal computation and I think it should be somehow addressed in this study.

I think the authors could tune their introduction to the literature a bit better. For example, on the critical idea of relating task decompositions to planning (rather than structure inference), it is said that “…our framework differs from many existing accounts because we directly incorporate planning costs into the criteria used to choose a task decomposition.” I think it should be acknowledged that this idea is not fundamentally new and existing accounts have already considered planning costs in task decompositions computationally e.g. Jinnai et al 2019 (in RL) and McNamee et al 2016 (regarding human planning) (both cited here but there may be others). In particular, the latter considers a random walk search policy and points to log(degree centrality) as a key variable in determining decompositions/subgoals consistent with the modelling results here (see Fig 3 RRTD-RW vs degree centrality (log)). I think the specific computational novelty here is the integrative framework (which generates new results).

Minor comment:

Can the authors speculate on the low consistency of the teleportation probe? As I understand it, this measure is taken once per subject at the end of the experiment, so I would intuitively expect this measure of subgoals to be stable, as opposed to the other measures, which may vary throughout the experiment.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: None

Reviewer #2: No: Link to Github repository was not working (as of 02/12/2022)

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Lennart Luettgau

Reviewer #3: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

References:

Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.

If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

PLoS Comput Biol. 2023 Jun 1;19(6):e1011087. doi: 10.1371/journal.pcbi.1011087.r002

Author response to Decision Letter 0

27 Feb 2023

Attachment

Submitted filename: response-to-reviewers.pdf

(PDF, 338 KB)

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1011087.r003

Decision Letter 1

Thomas Serre, Tobias U Hauser

10 Apr 2023

Dear Mr. Correa,

We are pleased to inform you that your manuscript 'Humans decompose tasks by trading off utility and computational cost' has been provisionally accepted for publication in PLOS Computational Biology. As you can see from the reviewers' comments, all of them were happy with the thorough revisions you have conducted and I congratulate you on this nice contribution to our field.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,

Tobias U Hauser, PhD

Academic Editor

PLOS Computational Biology

Thomas Serre

Section Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors’ responses elaborate on each point I raised in my review of the initial submission, and the modifications of the text and the figures adequately address my questions and concerns. I find the revised manuscript to be significantly improved in terms of clarity and ease of understanding. Consequently, I see no further obstacles to the publication of this work.

Reviewer #2: The authors have addressed all of my concerns and questions through a convincing set of revisions to the manuscript, and by adding supplementary analyses and extending previous ones. I support publication of this highly intriguing and well-written paper and congratulate the authors on an excellent piece of work on resource-rational trade-offs during task decomposition and planning.

I sign my reviews.

Lennart Luettgau

Reviewer #3: Thank you for addressing my comments. I have no further comments and am happy to see this interesting study published.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

Reviewer #1: None

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Lennart Luettgau

Reviewer #3: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1011087.r004

Acceptance letter

Thomas Serre, Tobias U Hauser

8 May 2023

PCOMPBIOL-D-22-01548R1

Humans decompose tasks by trading off utility and computational cost

Dear Dr Correa,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Zsofi Zombor

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 Appendix. Appendix.

Extended data and analyses.

(PDF, 460.3 KB)

Attachment

Submitted filename: response-to-reviewers.pdf (PDF, 338 KB)

Data Availability Statement

The data and code used for analysis are available on GitHub at: https://github.com/cgc/resource-rational-task-decomposition.

1. Botvinick MM, Niv Y, Barto AG. Hierarchically organized behavior and its neural foundations: a reinforcement learning perspective. Cognition. 2009;113(3):262–280. doi: 10.1016/j.cognition.2008.08.011

2. Eckstein MK, Collins AG. Computational evidence for hierarchically structured reinforcement learning in humans. Proceedings of the National Academy of Sciences. 2020;117(47):29381–29389. doi: 10.1073/pnas.1912330117

3. Balaguer J, Spiers H, Hassabis D, Summerfield C. Neural Mechanisms of Hierarchical Planning in a Virtual Subway Network. Neuron. 2016;90(4):893–903. doi: 10.1016/j.neuron.2016.03.037

4. Cushman F, Morris A. Habitual control of goal selection in humans. Proceedings of the National Academy of Sciences. 2015;112(45):13817–13822. doi: 10.1073/pnas.1506367112

5. Stachenfeld KL, Botvinick MM, Gershman SJ. The hippocampus as a predictive map. Nature Neuroscience. 2017;20(11):1643–1653. doi: 10.1038/nn.4650

6. Collins AGE, Frank MJ. Cognitive control over learning: Creating, clustering, and generalizing task-set structure. Psychological Review. 2013;120(1):190–229. doi: 10.1037/a0030852

7. Tomov MS, Yagati S, Kumar A, Yang W, Gershman SJ. Discovery of hierarchical representations for efficient planning. PLOS Computational Biology. 2020;16(4):1–42. doi: 10.1371/journal.pcbi.1007594

8. Solway A, Diuk C, Córdova N, Yee D, Barto AG, Niv Y, et al. Optimal Behavioral Hierarchy. PLOS Computational Biology. 2014;10(8):1–10. doi: 10.1371/journal.pcbi.1003779

9. Lewis RL, Howes A, Singh S. Computational Rationality: Linking Mechanism and Behavior Through Bounded Utility Maximization. Topics in Cognitive Science. 2014;6(2):279–311. doi: 10.1111/tops.12086

10. Gershman SJ, Horvitz EJ, Tenenbaum JB. Computational rationality: A converging paradigm for intelligence in brains, minds, and machines. Science. 2015;349(6245):273–278. doi: 10.1126/science.aac6076

11. Griffiths TL, Lieder F, Goodman ND. Rational Use of Cognitive Resources: Levels of Analysis Between the Computational and the Algorithmic. Topics in Cognitive Science. 2015;7(2):217–229. doi: 10.1111/tops.12142

12. Correa CG, Ho MK, Callaway F, Griffiths TL. Resource-rational task decomposition to minimize planning costs. In: Proceedings of the 42nd Annual Conference of the Cognitive Science Society; 2020.

13. Sacerdoti ED. Planning in a hierarchy of abstraction spaces. Artificial Intelligence. 1974;5(2):115–135. doi: 10.1016/0004-3702(74)90026-5

14. Korf RE. Learning to Solve Problems by Searching for Macro-Operators. Pitman Publishers; 1985.

15. Sutton RS, Precup D, Singh S. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence. 1999;112(1–2):181–211. doi: 10.1016/S0004-3702(99)00052-1

16. Şimşek Ö, Wolfe AP, Barto AG. Identifying useful subgoals in reinforcement learning by local graph partitioning. In: Proceedings of the 22nd International Conference on Machine Learning; 2005.

17. Ramkumar P, Acuna DE, Berniker M, Grafton ST, Turner RS, Kording KP. Chunking as the result of an efficiency computation trade-off. Nature Communications. 2016;7(1):1–11. doi: 10.1038/ncomms12176

18. Huys QJM, Lally N, Faulkner P, Eshel N, Seifritz E, Gershman SJ, et al. Interplay of approximate planning strategies. Proceedings of the National Academy of Sciences. 2015;112(10):3098–3103. doi: 10.1073/pnas.1414219112

19. Jinnai Y, Abel D, Hershkowitz DE, Littman M, Konidaris G. Finding options that minimize planning time. In: Proceedings of the 36th International Conference on Machine Learning; 2019.

20. Maisto D, Donnarumma F, Pezzulo G. Divide et impera: subgoaling reduces the complexity of probabilistic inference and problem solving. Journal of The Royal Society Interface. 2015;12(104):20141335. doi: 10.1098/rsif.2014.1335

21. McNamee D, Wolpert DM, Lengyel M. Efficient state-space modularization for planning: theory, behavioral and neural signatures. In: Advances in Neural Information Processing Systems; 2016.

22. Şimşek Ö, Barto AG. Skill characterization based on betweenness. In: Advances in Neural Information Processing Systems; 2009.

23. Bellman R. Dynamic programming. Princeton University Press; 1957.

24. Russell S, Norvig P. Artificial Intelligence: A Modern Approach. 3rd ed. USA: Prentice Hall Press; 2009.

25. Ghallab M, Nau D, Traverso P. Automated planning and acting. Cambridge University Press; 2016.

26. Korf RE. Depth-first iterative-deepening: An optimal admissible tree search. Artificial Intelligence. 1985;27(1):97–109. doi: 10.1016/0004-3702(85)90084-0

27. De Groot AD. Thought and choice in chess. Mouton Publishers; 1965.

28. Newell A, Simon HA. Human problem solving. Prentice-Hall; 1972.

29. Lieder F, Plunkett D, Hamrick JB, Russell SJ, Hay N, Griffiths T. Algorithm selection by rational metareasoning as a model of human strategy selection. In: Advances in Neural Information Processing Systems; 2014.

30. Callaway F, van Opheusden B, Gul S, Das P, Krueger PM, Lieder F, et al. Rational use of cognitive resources in human planning. Nature Human Behaviour. 2022;6(8):1112–1125. doi: 10.1038/s41562-022-01332-8

31. Puterman ML. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc.; 1994.

32. Binder FJ, Mattar MG, Kirsh D, Fan JE. Visual scoping operations for physical assembly. In: Proceedings of the 43rd Annual Conference of the Cognitive Science Society; 2021.

33. Menache I, Mannor S, Shimkin N. Q-Cut: Dynamic discovery of sub-goals in reinforcement learning. In: European Conference on Machine Learning. Springer; 2002. p. 295–306.

34. Donnarumma F, Maisto D, Pezzulo G. Problem solving as probabilistic inference with subgoaling: explaining human successes and pitfalls in the Tower of Hanoi. PLOS Computational Biology. 2016;12(4):e1004864. doi: 10.1371/journal.pcbi.1004864

35. Lieder F, Griffiths TL. Strategy selection as rational metareasoning. Psychological Review. 2017;124(6):762–794. doi: 10.1037/rev0000075

36. Borassi M, Natale E. KADABRA is an ADaptive Algorithm for Betweenness via Random Approximation. ACM Journal of Experimental Algorithmics. 2019;24(1.2):1–35. doi: 10.1145/3284359

37. Huys QJM, Eshel N, O’Nions E, Sheridan L, Dayan P, Roiser JP. Bonsai trees in your head: how the Pavlovian system sculpts goal-directed choices by pruning decision trees. PLOS Computational Biology. 2012;8(3):e1002410. doi: 10.1371/journal.pcbi.1002410

38. Lieder F, Griffiths TL. Resource-rational analysis: Understanding human cognition as the optimal use of limited computational resources. Behavioral and Brain Sciences. 2020;43:e1. doi: 10.1017/S0140525X1900061X

39. Ho MK, Abel D, Correa CG, Littman ML, Cohen JD, Griffiths TL. People construct simplified mental representations to plan. Nature. 2022;606(7912):129–136. doi: 10.1038/s41586-022-04743-9

40. Liu Y, Mattar MG, Behrens TEJ, Daw ND, Dolan RJ. Experience replay is associated with efficient nonlocal learning. Science. 2021;372(6544):eabf1357. doi: 10.1126/science.abf1357

41. Ribas-Fernandes JF, Solway A, Diuk C, McGuire J, Barto A, Niv Y, et al. A Neural Signature of Hierarchical Reinforcement Learning. Neuron. 2011;71(2):370–379. doi: 10.1016/j.neuron.2011.05.042

42. Harb J, Bacon PL, Klissarov M, Precup D. When waiting is not an option: Learning options with a deliberation cost. In: Thirty-Second AAAI Conference on Artificial Intelligence; 2018.

43. Nasiriany S, Pong V, Lin S, Levine S. Planning with goal-conditioned policies. In: Advances in Neural Information Processing Systems; 2019.

44. Sutton RS, Barto AG. Reinforcement learning: An introduction. MIT Press; 2018.

45. Prystawski B, Mohnert F, Toi M, Lieder F. Resource-rational Models of Human Goal Pursuit. Topics in Cognitive Science. 2022;14(3):528–549. doi: 10.1111/tops.12562

46. Shi J, Malik J. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2000;22(8):888–905. doi: 10.1109/34.868688
