Abstract
Many real-world systems exhibit higher-order interactions beyond pairwise links. Such interactions are modeled by undirected hypergraphs, where edges can connect any number of vertices, but these do not capture the directional nature of many real-world interactions. Directed hypergraphs overcome this limitation by distinguishing source and target sets within each hyperedge, enabling analysis of directional information flow. Here, we provide a framework to characterize the structural organization of directed higher-order networks at their microscale. We extract the fingerprint of a directed hypergraph, capturing the frequency of hyperedges with given source and target sizes, and use this information to compute differences in higher-order connectivity patterns among real-world systems. Then, we investigate the overlap among sources and targets to reveal recurring sets of co-sending and co-receiving nodes. We define reciprocity in hypergraphs using exact, strong, and weak definitions to quantify the extent to which hyperedges are reciprocated. Finally, we extend motif analysis to identify recurring interaction patterns and extract the building blocks of directed hypergraphs. We validate our framework on empirical datasets, including Bitcoin transactions, metabolic networks, and citation data, revealing structural principles behind the organization of real-world systems.
Subject terms: Complex networks, Computational science
Many real-world systems involve higher-order, directional interactions that are naturally modeled by directed hypergraphs. The authors develop a framework that characterizes directed hypergraphs through hyperedge pattern frequencies, frequent co-sending and co-receiving nodes, higher-order reciprocity, and motifs, revealing their microscale organization.
Introduction
Accurately modeling interactions among entities is crucial to understand the properties of many complex systems. Traditional network models focus on pairwise connections between nodes1,2, neglecting the complexities of systems where multiple units interact simultaneously. Such higher-order interactions are prevalent in various domains, including social networks3–5, folksonomies6, ecological systems7, chemical reactions8 including metabolic pathways9, and the brain10,11.
Hypergraphs12 provide a framework for explicitly encoding higher-order interactions, representing them as hyperedges connecting multiple nodes simultaneously. By preserving group-based interactions, they improve our ability to understand the structures and dynamics of systems with many-body interactions13,14. Recently, a variety of measures have been introduced or extended to capture the higher-order organization of complex systems, including centrality15,16, community structure17–19 and motifs20–22. Moreover, new models have made it possible to describe systems’ evolution23–25, and have highlighted the importance of higher-order interactions in shaping emergent behaviors in diffusion26,27, synchronization28–30, spreading31,32 and evolutionary dynamics33.
Most research has so far focused on undirected hypergraphs, which fail to capture the directional nature of many real-world interactions. For example, in a metabolic reaction, a set of reactants transforms into a set of products9. Similarly, in a Bitcoin transaction, multiple source wallets may transfer funds simultaneously to multiple target wallets34. To accurately encode such interactions, models must incorporate directionality into their representations. In this sense, directed hypergraphs enhance modeling by distinguishing between source and target sets in each hyperedge35. Tools to study directed hypergraphs are largely underdeveloped, with notable exceptions in areas such as null models36, synchronization37, overlapping patterns between two hyperedges of limited size38, and some early proposals to define reciprocity39,40.
In this work, we introduce measures and tools to characterize the microscale organization of real-world directed hypergraphs. First, we discuss a decomposition into fundamental interaction types: one-to-one, one-to-many, many-to-one and many-to-many. We analyze empirical data to count the occurrences of each interaction type, and use this information as a signature to compute differences in higher-order connectivity patterns. Then, for each node, we investigate the overlap among its source and target sets, to extract recurring groups of co-senders and co-receivers. By examining this overlap and comparing it against randomized models, we aim to reveal whether certain systems exhibit a more redundant organization, where interactions frequently recur among the same groups, or a more diverse structure with less overlap among participants. Additionally, we propose new, computationally efficient definitions for reciprocity41 for directed hypergraphs, namely exact, strong and weak higher-order reciprocity, designed to capture different patterns of bi-directionality in empirical data. Finally, we extend motif analysis42 to incorporate the directionality of interactions, extracting recurring higher-order and directed subgraphs. Our results suggest the existence of complex mechanisms of feedback and reinforcement in the information flow among system units, where pairwise interactions support the action of groups, and vice versa.
Results
Traditional graph models reduce directed group interactions into a collection of pairwise links, often leading to a loss of important structural information about group organization and dynamics. For instance, reducing a many-to-many interaction, such as SOURCE = {A, B} and TARGET = {D, E}, to a set of directed pairwise links (A → D, A → E, B → D and B → E) fails to capture the collective nature of the interaction, including information about co-sender and co-receiver nodes. Directed hypergraphs preserve both group-based structure and the associated information flow, allowing for a more faithful representation of complex interactions. In such a framework, hyperedge direction is encoded by distinguishing between source and target node sets, which we consider non-empty and disjoint. More formally, we work with finite, simple, directed hypergraphs on node set V, where each hyperedge e ∈ E is an ordered pair e = (S, T) with S, T ⊆ V, S ≠ ∅, T ≠ ∅ and S ∩ T = ∅. This definition yields four canonical directed hyperedge patterns: one-to-one, where a single source node connects to a single target; one-to-many, where one source affects multiple targets; many-to-one, where multiple sources act on a single target; and many-to-many, the most general case, where multiple sources act on multiple targets. The analysis of the interplay and overlap among these building blocks in real-world hypergraphs enables a characterization of their microscale organization. Figure 1 illustrates this taxonomy on a toy directed hypergraph.
Fig. 1. Schematic of a directed hypergraph.

Each interaction encodes a source set of units acting towards a target set of units. We distinguish four types of directed higher-order interactions: one-to-one (black), one-to-many (blue), many-to-one (red), and many-to-many (green).
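As a minimal sketch of the taxonomy above, a directed hyperedge can be stored as a pair of frozen sets and classified by the sizes of its source and target sets. The names `make_hyperedge`, `pattern` and `pairwise_projection` are hypothetical helpers introduced for illustration and are not taken from the authors' library.

```python
from collections import Counter

def make_hyperedge(source, target):
    """Build a directed hyperedge (S, T), enforcing non-empty, disjoint sets."""
    s, t = frozenset(source), frozenset(target)
    if not s or not t:
        raise ValueError("source and target sets must be non-empty")
    if s & t:
        raise ValueError("source and target sets must be disjoint")
    return (s, t)

def pattern(edge):
    """Classify a hyperedge as one-to-one, one-to-many, many-to-one or many-to-many."""
    s, t = edge
    left = "one" if len(s) == 1 else "many"
    right = "one" if len(t) == 1 else "many"
    return f"{left}-to-{right}"

def pairwise_projection(edge):
    """Directed pairwise links obtained by flattening a hyperedge (lossy)."""
    s, t = edge
    return {(u, v) for u in s for v in t}

# Toy hypergraph with one hyperedge of each type (illustrative data only).
toy = [
    make_hyperedge({"A"}, {"B"}),
    make_hyperedge({"A"}, {"B", "C"}),
    make_hyperedge({"B", "C"}, {"D"}),
    make_hyperedge({"A", "B"}, {"D", "E"}),
]
pattern_counts = Counter(pattern(e) for e in toy)
```

The projection of ({A, B}, {D, E}) yields the four links A → D, A → E, B → D and B → E, but the fact that A and B act jointly, and that D and E receive jointly, is no longer recoverable from those links alone.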
We analyzed datasets from multiple domains, including QNA (nodes are users and forum posts are hyperedges), E-MAIL (nodes are users and emails are hyperedges), BITCOIN (nodes are accounts and financial transactions are hyperedges), METABOLIC (nodes are genes and metabolic reactions are hyperedges) and CITATION (nodes are authors and hyperedges are paper citations)40. Each dataset is encoded as a set-indexed adjacency tensor. In particular, we index the distinct source sets S and target sets T observed in the data, and define the set-indexed adjacency tensor A by $A_{S,T} = 1$ if and only if there exists e ∈ E with source set S and target set T (and $A_{S,T} = 0$ otherwise). Whenever $A_{S,T} = 1$, we enforce S ≠ ∅, T ≠ ∅ and S ∩ T = ∅.
Detailed descriptions and summary statistics of each dataset are reported in Supplementary Note 1.
Patterns of directed hyperedges
A natural starting point to characterize directed hypergraphs across domains is investigating the diversity in their patterns of directed hyperedges. For each dataset, we construct a hyperedge signature vector v, which captures the distribution of hyperedges based on the sizes of their source and target sets (see Methods). Such vectors provide a fingerprint for systems based on their higher-order connectivity patterns at the microscale. Figure 2a shows the hyperedge signature vectors for each dataset, considering interactions up to size 6. To emphasize the role of higher-order interactions in the analysis, we do not consider one-to-one interactions. We find that one-to-many interactions dominate the E-MAIL datasets, reflecting the typical structure of email communications. Similarly, in the QNA datasets, many-to-one interactions are prevalent, as these systems involve multiple individuals responding to a question by a single user. In contrast, the METABOLIC and CITATION datasets show high abundances of many-to-many relationships across a variety of source and target set sizes. Finally, the BITCOIN datasets exhibit more varied behavior, with abundant entries for both one-to-many and many-to-many interactions, indicating different interaction types in the network.
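The signature described in this paragraph can be sketched as a count of hyperedges per (source size, target size) pair. The ordering of the entries and the helper names `signature_keys` and `signature_vector` are assumptions made for illustration; the formal definition is the one given in the Methods.

```python
from collections import Counter

def signature_keys(max_total=6, skip_one_to_one=True):
    """Ordered (source size, target size) pairs with total size <= max_total."""
    keys = []
    for s in range(1, max_total):
        for t in range(1, max_total - s + 1):
            if skip_one_to_one and s == 1 and t == 1:
                continue  # one-to-one interactions are left out of the signature
            keys.append((s, t))
    return keys

def signature_vector(hyperedges, max_total=6, skip_one_to_one=True):
    """Count hyperedges for every admissible (|S|, |T|) combination."""
    counts = Counter((len(s), len(t)) for s, t in hyperedges)
    return [counts.get(k, 0) for k in signature_keys(max_total, skip_one_to_one)]

# Illustrative data: two one-to-many, one many-to-one and one one-to-one hyperedge.
edges = [
    ({"a"}, {"b", "c"}),
    ({"a"}, {"d", "e"}),
    ({"b", "c"}, {"a"}),
    ({"a"}, {"b"}),
]
small_signature = signature_vector(edges, max_total=3)
```

With a maximum total size of 6 and one-to-one interactions excluded, the vector has 14 entries, one for each remaining size combination.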
Fig. 2. Hyperedge signature of directed hypergraphs.
a We describe each system with a hyperedge signature vector whose entries encode the count of directed hyperedges with source and target sizes (∣S∣, ∣T∣). We compute statistics using hyperedges with total cardinality at most 6 (i.e., ∣S∣ + ∣T∣ ≤ 6). For visualization, we display each vector as a sequence of histogram panels: one panel for each source size ∣S∣ = i, separated by a small gap; within each panel, bins correspond to target sizes ∣T∣ in increasing order, restricted by i + ∣T∣ ≤ 6. Systems from the same domain share the same color. b Dendrogram resulting from agglomerative clustering applied to the correlation matrix of hyperedge signature vectors for each dataset. Correlation values are color-coded, with high positive correlations in red and high negative correlations in blue.
To further explore structural diversity across different domains, we compute pairwise rank correlations among hyperedge signature vectors using weighted Kendall’s τ and apply hierarchical agglomerative clustering on their correlation matrix. A correlation value close to 1 indicates similar hyperedge structures, 0 suggests no relationship, and − 1 indicates the structures are inversely related. The clustering procedure applied to the systems’ correlation matrix results in a dendrogram that visually represents their hierarchical relationships, highlighting the presence of clusters of directed hypergraphs that share similar connectivity patterns. In Fig. 2b, we show the correlation matrix and the clustering dendrogram. By examining the correlation matrix, we observe a strong correlation within systems from the same domain, indicating highly similar abundance in hyperedge structures. In contrast, systems from different domains exhibit varying degrees of correlation. Specifically, E-MAIL and QNA datasets are inversely correlated, as they display non-overlapping and complementary connectivity patterns: E-MAIL is characterized by one-to-many interactions, whereas QNA primarily involves many-to-one relationships. The METABOLIC and CITATION datasets, which feature many-to-many interactions, are positively correlated and form a distinct cluster. Interestingly, the BITCOIN datasets also display positive correlations with the METABOLIC and CITATION cluster due to a high presence of many-to-many interaction patterns. However, they also exhibit a weaker positive correlation with the E-MAIL datasets, reflecting the presence of one-to-many interactions in BITCOIN.
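To make the comparison of signatures concrete, the sketch below computes a rank correlation matrix between signature vectors. It deliberately uses the plain, unweighted Kendall τ (tau-a) as a stand-in: the analysis above uses the weighted variant, which additionally gives more importance to exchanges among high-ranked entries, and then applies agglomerative clustering, which is not reproduced here. The function names and the toy vectors are assumptions.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Unweighted Kendall tau-a between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1  # tied pairs contribute to neither count
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

def correlation_matrix(signatures):
    """Pairwise tau between named signature vectors, as a nested dict."""
    names = sorted(signatures)
    return {a: {b: kendall_tau(signatures[a], signatures[b]) for b in names}
            for a in names}

# Invented signatures: one dominated by early entries, one by late entries.
toy_signatures = {
    "broadcast_like": [9, 5, 2, 1, 0, 0],
    "gather_like": [0, 0, 1, 2, 5, 9],
}
tau = correlation_matrix(toy_signatures)
```

Two systems whose abundant hyperedge types are complementary, as in the toy vectors, obtain a strongly negative value, which mirrors the inverse correlation reported between E-MAIL and QNA.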
Source and target sets overlap
The degree to which hyperedges share elements provides valuable insights into redundancy, hierarchical structures, and information flow within the system. In general, real-world systems exhibit a high degree of overlap, indicating that recurrent and redundant interactions are a shared feature43. Moreover, hyperedge overlap has been shown to widely impact the dynamics of systems with higher-order interactions44,45.
In directed hypergraphs, nodes are frequently involved in multiple hyperedges, either as part of the source or the target set. In order to characterize and quantify this property, for each node, we measure how much its incident source sets overlap and how much its incident target sets overlap, and we compare with a null model, reporting source and target z-scores (see Methods). In other words, we measure the extent to which nodes engage in interactions with the same set of co-senders or co-receivers. Specifically, an observed overlap associated with a z-score greater than 2 (i.e., statistically significant excess overlap) highlights nodes that tend to participate in structurally redundant interactions where the same groups of nodes frequently co-occur in source or target sets. Conversely, a z-score less than −2 reflects statistically significant lower overlap than expected, suggesting that interactions are more diverse, with hyperedges being more distinct and less likely to share members.
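The following sketch illustrates the kind of computation described above. The exact overlap measure and the null model are specified in the Methods and are not reproduced here: the mean pairwise Jaccard similarity among a node's co-sender sets, the use of the population standard deviation, and all function names are assumptions made purely for illustration.

```python
from itertools import combinations
from statistics import mean, pstdev

def co_member_sets(node, hyperedges, role="source"):
    """Sets of co-senders (role='source') or co-receivers (role='target') of a node."""
    idx = 0 if role == "source" else 1
    sets = []
    for edge in hyperedges:
        members = frozenset(edge[idx])
        if node in members and len(members) > 1:
            sets.append(members - {node})
    return sets

def mean_pairwise_jaccard(sets):
    """Average Jaccard similarity over all pairs of non-empty sets (0.0 if < 2 sets)."""
    sets = [frozenset(s) for s in sets if s]
    if len(sets) < 2:
        return 0.0
    return mean(len(a & b) / len(a | b) for a, b in combinations(sets, 2))

def z_score(observed, null_values):
    """Standardize an observed value against values measured on null-model samples."""
    sigma = pstdev(null_values)
    if sigma == 0:
        return 0.0
    return (observed - mean(null_values)) / sigma

# Illustrative data: node "a" sends twice with {b, c} and once with {d}.
toy_edges = [
    ({"a", "b", "c"}, {"x"}),
    ({"a", "b", "c"}, {"y"}),
    ({"a", "d"}, {"z"}),
]
observed_overlap = mean_pairwise_jaccard(co_member_sets("a", toy_edges))
```

In practice, the null values would come from repeating the same overlap computation on randomized hypergraphs; a resulting z-score above 2 would flag the node as having excess overlap.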
In Fig. 3, we show the distribution of nodes with a given overlap z-score for source and target sets across domains, along with the fraction of nodes in each region of the z-score space. The focus is on positive excess overlap, as statistically significant negative excess overlap is very rare. The CITATION dataset displays significant overlap for both source and target sets, showing that (i) an author tends to preferentially work with known collaborators, and (ii) an author tends to be cited repeatedly alongside similar sets of authors. In the E-MAIL dataset, the excess overlap can be computed only for the target sets, as the source sets always have cardinality 1. The overlap is significantly larger than random, underscoring the hierarchical and broadcast-like nature of email communication. The BITCOIN dataset generally exhibits nodes with high excess overlap in both source and target sets. However, the presence of nodes with excess overlap values lower than zero implies that certain participants in the network engage in interactions that introduce more novelty rather than reinforce existing hyperedges. The METABOLIC dataset follows a similar trend, with half of the nodes displaying significant excess overlap for either source or target sets or both, suggesting that metabolic reactions tend to involve recurring sets of substrates and products and highlighting the modular nature of metabolic networks. QNA data show a lower excess overlap compared with the other systems, indicating that forum respondents are less likely to engage repeatedly with the same set of co-responders. Since the target sets in this dataset always have cardinality 1, the excess overlap can be computed only for source sets.
Fig. 3. Overlap across domains.
a Distribution of node counts within the joint z-score space of source and target overlap, representing how much nodes deviate from null model expectations in both dimensions. b Bar plots quantifying the fraction of nodes exceeding the threshold for either source or target overlaps, nodes exceeding both thresholds, and nodes below both thresholds.
Higher-order reciprocity
Reciprocity is a fundamental property of systems with directed interactions, including social networks46. It traditionally refers to the tendency of the system’s units to mutually exchange information. In directed graphs, reciprocity is defined as
$$r = \frac{L^{\leftrightarrow}}{L},$$
measuring the ratio of the number of bidirectional links ($L^{\leftrightarrow}$) to the total number of links ($L$). This measure is often normalized as
$$\rho_{\mathrm{NM}} = \frac{r - \langle r \rangle_{\mathrm{NM}}}{1 - \langle r \rangle_{\mathrm{NM}}},$$
where r is the observed reciprocity in the network, and 〈r〉NM is the average reciprocity in null model samples41. It rescales the difference between observed and expected reciprocity by the maximum possible deviation, bounding ρNM between −1 and 1. Positive values indicate more reciprocity than expected at random, negative values indicate less, and values near zero suggest consistency with the null model. This normalization allows a more faithful comparison and ranking of reciprocity across systems with different scales and density41. Recognizing its broad importance, recent works have extended reciprocity to hypergraphs, accounting for the complexity of having multiple nodes in both the source and target sets of hyperedges. Among the recent approaches for hypergraph reciprocity, one method decomposes hyperedges into pairwise links39, losing information about group interactions. An alternative approach defines a more complex measure that diverges from the traditional binary definition of reciprocity at the level of single links40. While this approach can capture different nuances, it is computationally expensive and less straightforward to interpret, as it provides a continuous value instead of a simple yes-or-no answer to whether an interaction is reciprocated.
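For the pairwise case introduced at the start of this section, the two quantities can be sketched as follows. The sketch assumes the standard convention in which every directed link that has a reverse counterpart is counted as bidirectional, and the normalization ρNM = (r − 〈r〉NM) / (1 − 〈r〉NM); the function names and the null-model values are placeholders, not measured results.

```python
def graph_reciprocity(links):
    """Fraction of directed links (u, v) whose reverse (v, u) is also present."""
    links = {(u, v) for u, v in links if u != v}  # drop duplicates and self-loops
    if not links:
        raise ValueError("at least one link is required")
    reciprocated = sum(1 for u, v in links if (v, u) in links)
    return reciprocated / len(links)

def normalized_reciprocity(r_observed, r_null_samples):
    """Rescale observed reciprocity by its average over null-model samples."""
    r_null = sum(r_null_samples) / len(r_null_samples)
    if r_null == 1:
        raise ValueError("null reciprocity of 1 leaves no room for deviation")
    return (r_observed - r_null) / (1 - r_null)

# Toy graph: a <-> b is mutual, a -> c and c -> d are not.
toy_links = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
r = graph_reciprocity(toy_links)
rho = normalized_reciprocity(r, [0.2, 0.3])  # placeholder null values
```

Here two of the four links are reciprocated, so r = 0.5, and with an assumed null average of 0.25 the normalized score is positive, indicating more reciprocity than the null expectation.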
Here, we introduce three simple and computationally efficient measures for higher-order reciprocity in directed hypergraphs, capturing different aspects of mutual interactions:
Exact reciprocity occurs when an interaction represented by a hyperedge with a source set h and a target set t is precisely mirrored by another interaction with the source and target sets reversed. Formally, two hyperedges e1 = (h1, t1) and e2 = (h2, t2) are exactly reciprocated if and only if h1 = t2 and t1 = h2. This is the strictest form of reciprocity.
Strong reciprocity relaxes the previous requirement and allows source and target sets to be reversed through a combination of hyperedges, instead of requiring a direct reversal with a single opposite one. Formally, a hyperedge e = (h, t) is strongly reciprocated if there exists a set of hyperedges {e1, e2, …, ek} such that the union of the target sets of e1, …, ek is a superset of the source set h, and the union of the source sets of e1, …, ek is a superset of the target set t.
Weak reciprocity represents the most relaxed form of reciprocity and requires only that at least one node from the target set of a hyperedge appears in the source set of another, and vice versa. Formally, a hyperedge e = (h, t) is weakly reciprocated if there exists another hyperedge e′ = (h′, t′) such that t ∩ h′ ≠ ∅ and h ∩ t′ ≠ ∅.
We summarize our definitions of reciprocity for directed hypergraphs in Fig. 4. More information about the algorithmic aspects of such measures is available in the Methods section.
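The three definitions can be sketched as set operations on hyperedges stored as (source, target) pairs of frozen sets. This is an illustrative sketch, not the authors' implementation. One point is an explicit assumption: for strong reciprocity, the candidate hyperedges are restricted to those that individually send from the target set t back into the source set h, so that exact implies strong and strong implies weak; the precise algorithm is described in the Methods.

```python
def fs(*nodes):
    """Shorthand for building a frozenset of node labels."""
    return frozenset(nodes)

def reverse_partners(edge, edges):
    """Hyperedges whose source meets t and whose target meets h."""
    h, t = edge
    return [(h2, t2) for h2, t2 in edges if h2 & t and t2 & h]

def is_exact(edge, edges):
    h, t = edge
    return (t, h) in edges  # a single hyperedge with the two sets swapped

def is_weak(edge, edges):
    return bool(reverse_partners(edge, edges))

def is_strong(edge, edges):
    h, t = edge
    partners = reverse_partners(edge, edges)
    if not partners:
        return False
    reached = frozenset().union(*(t2 for _, t2 in partners))  # union of targets
    origin = frozenset().union(*(h2 for h2, _ in partners))   # union of sources
    # Unions only grow with more partners, so checking all of them is enough.
    return h <= reached and t <= origin

def reciprocated_fraction(edges, criterion):
    edges = set(edges)
    return sum(criterion(e, edges) for e in edges) / len(edges)

toy = {
    (fs("a", "b"), fs("c", "d")), (fs("c"), fs("a")), (fs("d"), fs("b")),
    (fs("x"), fs("y")), (fs("y"), fs("x")),
    (fs("a"), fs("z")),
    (fs("p", "q"), fs("r")), (fs("r"), fs("p")),
}
scores = {name: reciprocated_fraction(toy, f)
          for name, f in [("exact", is_exact), ("strong", is_strong), ("weak", is_weak)]}
```

In the toy data, ({a, b}, {c, d}) is strongly but not exactly reciprocated, since c → a and d → b jointly reverse it, while ({p, q}, {r}) is only weakly reciprocated because no hyperedge sends back to q. These fractions are raw ratios; the reciprocity score reported in the following analysis additionally normalizes them against a null model.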
Fig. 4. Reciprocity measures for directed hypergraphs.
a Example of a directed hyperedge. b All possible ways in which this hyperedge can be reciprocated according to our definitions. Exact reciprocity: a single hyperedge with source and target sets swapped, represented by reversing the arrow between the same node sets. Strong reciprocity: multiple hyperedges collectively reverse the interaction, possibly involving external nodes. Weak reciprocity: at least one node in the target set reciprocates with one node in the source set, illustrated as a pairwise link with reversed direction. In all panels, shaded areas group nodes involved in each interaction; colors encode the interaction pattern (green many-to-many, red many-to-one, black one-to-one); arrows encode direction from source to target. Gray disks denote nodes in the original hyperedge, and external nodes that appear only in reciprocal interactions are white with a dashed border.
After introducing these definitions, a natural first question is which systems exhibit the highest and lowest levels of reciprocity and how the ranking of systems based on reciprocity changes across different definitions. We address this in Fig. 5a, which shows the normalized ratio of reciprocated hyperedges ρNM (reciprocity score) for each system across varying notions of reciprocity. The reciprocity score induces rankings of the systems, allowing us to observe which systems exhibit stronger tendencies toward mutual exchange of information. By definition, the score tends to increase for each system as we move from stricter definitions of reciprocity (exact) to more relaxed ones (weak). We observe that systems from the same domain tend to show similar levels of reciprocity across definitions, indicating that functional similarities within domains may drive comparable reciprocity patterns. E-MAIL datasets exhibit the highest levels of reciprocity, while BITCOIN datasets consistently show the lowest. Interestingly, while the ranking of systems remains largely stable with varying definitions, the relative distances between the datasets change. For instance, exact reciprocity mostly characterizes E-MAIL datasets, which are positioned far from the other datasets, clustering distinctly at the top of the scale. Strong reciprocity induces three clear clusters of datasets based on their scores: E-MAIL datasets rank the highest by a large margin, while BITCOIN datasets occupy the very low end. In the case of weak reciprocity, the datasets begin to separate along domain lines, spanning the entire spectrum of reciprocity scores. Notably, we observe a reduction in the distance between E-MAIL and datasets from metabolic and citation domains, suggesting a convergence in reciprocity levels as the definition becomes more relaxed. Overall, these patterns highlight how the choice of measure can influence the perceived level of reciprocity within different systems. 
By analyzing how the score evolves across definitions, we gain a more precise understanding of the extent of mutual exchange within each system, from the high reciprocity observed in E-MAIL datasets, where high mutual exchange is clear, to the lower reciprocity in BITCOIN datasets, where reciprocal connections are minimal across all definitions, and to the metabolic datasets, which emerge with high reciprocity under weaker definitions.
Fig. 5. Higher-order reciprocity in real-world hypergraphs.
a Reciprocity score across datasets and reciprocity definitions. Each column corresponds to a distinct notion of higher-order reciprocity, thereby inducing a ranking of the datasets based on their scores. Datasets from the same domains share the same color. Arrows link the datasets across different definitions. b Reciprocity score disaggregated by hyperedge size for each different notion of reciprocity. Trends should be interpreted relative to the null (zero line). To simplify the plots, we aggregate systems from the same domain.
A related question is how the size of hyperedges influences the levels of reciprocity. We address this in Fig. 5b, which reports the reciprocity score as a function of interaction size for the three notions, with deviations interpreted relative to the null baseline. For the E-MAIL datasets, ρNM displays a consistent positive offset for strong and weak reciprocity across all admissible sizes, whereas exact reciprocity exhibits a pronounced signal only at size 2 and is negligible at larger sizes. This suggests that excess reciprocal structure relative to the null is present for most interaction sizes, while exact “mirror” reversals remain largely a dyadic feature. In the METABOLIC datasets, exact reciprocity remains near zero across sizes, with strong and weak deviations showing initial negative values for small sizes, followed by a substantial increase at higher sizes that is more pronounced for the weak definition. Thus, small-set interactions are under-reciprocated relative to the null, whereas large-set interactions become increasingly over-represented. In the CITATION datasets, exact reciprocity is modestly positive at small sizes. Strong reciprocity displays a higher, stable excess across sizes, while weak reciprocity reaches its highest value for the smallest sets and diminishes with size. Citation reciprocity above the null is thus dominated by weak reciprocity in small interactions, with strong reciprocity contributing a more size-invariant offset. For the BITCOIN datasets, scores for all definitions cluster near zero, with a slight increase from exact to strong to weak, and minimal size dependence. This points to weak, size-stable excess reciprocal structure. Finally, the QNA datasets track the BITCOIN pattern, with modest positive offsets that increase from exact to strong to weak definitions, and little systematic variation as size increases.
These findings suggest that weaker notions of reciprocity are valuable in providing insights into the overall reciprocity of systems with larger interactions, capturing a multifaceted view of how reciprocity operates at different strengths and interaction sizes.
Motif analysis in directed hypergraphs
Motif analysis involves counting the frequency of patterns of interactions in connected subgraphs of a given number of nodes. This framework was first introduced by Milo et al.42 to extract the fundamental functional units of complex systems47. Recently, motif analysis has been extended to undirected hypergraphs to capture patterns of interactions with arbitrary size20. Here, we extend such analysis to consider also the direction of the hyperedges involved in the patterns.
First, it is interesting to study the combinatorics of the patterns of directed subhypergraphs. There is no simple closed-form formula for counting the number of possible directed higher-order motifs as a function of their order n, i.e., the number of nodes in the patterns. We can estimate the number of non-isomorphic connected directed hypergraphs in a way similar to ref. 20. Given a set of n nodes, the number of possible directed hyperedges is $3^n - 2 \cdot 2^n + 1$. This expression counts the $3^n$ ways to assign each of the n nodes to one of three disjoint sets (source, target, or neither), subtracts the $2 \cdot 2^n$ invalid assignments with an empty source or an empty target set, and adds back the single assignment in which both are empty, which would otherwise be subtracted twice. Given n nodes, we ensure connectivity by selecting a chain of n − 1 hyperedges and including them in the hypergraph, leaving us with $3^n - 2 \cdot 2^n - n + 2$ remaining possible hyperedges. For each remaining hyperedge, we decide whether to include it or not, resulting in $2^{3^n - 2 \cdot 2^n - n + 2}$ total hypergraphs. Since we are interested in non-isomorphic hypergraphs, we divide this number by n!, the number of ways to label the vertices, providing the lower bound $2^{3^n - 2 \cdot 2^n - n + 2} / n!$. If we ignore the constraints of non-isomorphism and connectivity, we count the number of possible labeled hypergraphs. Since each of the $3^n - 2 \cdot 2^n + 1$ possible hyperedges can either be included or excluded, the total number of labeled hypergraphs is at most $2^{3^n - 2 \cdot 2^n + 1}$. Figure 6 shows the upper and lower bounds on the growth of possible sub-hypergraph patterns as a function of the number of nodes (order) for both the undirected and directed cases. The estimated number of patterns grows super-exponentially, even in the undirected case. In the directed case, the growth is even faster due to the need to consider all possible subdivisions into source and target sets.
Fig. 6. Combinatorics of directed higher-order motifs.

Upper (dashed lines) and lower (solid lines) bounds on the number of higher-order motifs as a function of their order. Blue lines refer to undirected motifs on hypergraphs, red lines refer to the directed case.
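The counting argument for the directed case can be checked numerically with a short sketch. It enumerates all directed hyperedges on n labeled nodes by brute force, compares the count with 3^n − 2 ⋅ 2^n + 1, and returns the two bounds on a log2 scale to avoid overflow. The function names are hypothetical.

```python
from itertools import product
from math import factorial, log2

def n_directed_hyperedges(n):
    """Closed-form count of (source, target) pairs on n labeled nodes."""
    return 3**n - 2 * 2**n + 1

def enumerate_directed_hyperedges(n):
    """Brute force: label each node source, target or absent; keep valid pairs."""
    edges = []
    for labels in product("ST-", repeat=n):
        s = frozenset(i for i, x in enumerate(labels) if x == "S")
        t = frozenset(i for i, x in enumerate(labels) if x == "T")
        if s and t:  # both sets must be non-empty
            edges.append((s, t))
    return edges

def log2_motif_bounds(n):
    """log2 of the lower and upper bounds on the number of directed motifs."""
    m = n_directed_hyperedges(n)
    free = m - (n - 1)  # hyperedges left after fixing a connecting chain
    lower = free - log2(factorial(n))
    upper = float(m)
    return lower, upper

checks = {n: len(enumerate_directed_hyperedges(n)) for n in range(1, 7)}
bounds_order_3 = log2_motif_bounds(3)
```

For n = 3 there are 12 possible directed hyperedges, so the upper bound is 2^12 labeled hypergraphs and the lower bound is 2^10 / 3!.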
To perform motif analysis on real-world directed hypergraphs, we propose an exact algorithm to count the frequency of all connected sub-hypergraph patterns and associate each pattern with a z-score that quantifies its over- or under-representation relative to our proposed configuration model (see Methods).
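To illustrate what counting connected sub-hypergraph patterns involves, the sketch below uses a naive brute-force approach: it visits every subset of nodes of a given order, collects the hyperedges fully contained in it, and groups connected subsets by a relabeling-invariant key. This is not the exact algorithm proposed here, whose details are given in the Methods; it only scales to tiny inputs, it assumes sortable node labels, and it omits both the null-model z-scores and the restriction to patterns with at least one group interaction.

```python
from collections import Counter
from itertools import combinations, permutations

def canonical_form(nodes, edges):
    """Smallest sorted edge list over all relabelings of the nodes."""
    best = None
    for perm in permutations(range(len(nodes))):
        relabel = dict(zip(nodes, perm))
        key = tuple(sorted(
            (tuple(sorted(relabel[u] for u in s)),
             tuple(sorted(relabel[v] for v in t)))
            for s, t in edges))
        if best is None or key < best:
            best = key
    return best

def is_connected(nodes, edges):
    """True if the hyperedges link all nodes into a single component."""
    nodes, seen = set(nodes), set()
    stack = [next(iter(nodes))]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        for s, t in edges:
            members = s | t
            if u in members:
                stack.extend(members - seen)
    return seen == nodes

def count_motifs(hyperedges, order=3):
    """Count isomorphism classes of connected induced sub-hypergraphs."""
    hyperedges = {(frozenset(s), frozenset(t)) for s, t in hyperedges}
    all_nodes = sorted(set().union(*(s | t for s, t in hyperedges)))
    counts = Counter()
    for subset in combinations(all_nodes, order):
        inside = [(s, t) for s, t in hyperedges if (s | t) <= set(subset)]
        if inside and is_connected(subset, inside):
            counts[canonical_form(subset, inside)] += 1
    return counts

toy = [({1}, {2, 3}), ({2}, {1}), ({4}, {5, 6}), ({5}, {4}),
       ({7}, {8}), ({9}, {10, 11})]
motifs = count_motifs(toy, order=3)
```

In the toy data, the node triples {1, 2, 3} and {4, 5, 6} realize the same pattern (a one-to-many hyperedge answered by a one-to-one link), while {9, 10, 11} realizes a different one, so two pattern classes are found with counts 2 and 1.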
Given the intractability of the problem for large sub-hypergraphs, we limit our study of empirical data to patterns involving three and four nodes. Moreover, we focus on patterns that include at least one group interaction. Because the theoretical pattern space is still very large, while the empirical signal is sparse and concentrated, we restrict our analysis to the positive tail of the motif z-score distributions (see Supplementary Note 3) to highlight structural differences across datasets. In particular, Fig. 7 shows the most over-represented patterns of directed higher-order interactions with three and four nodes across different domains. Each domain reveals distinct motifs, characterized by different directed hyperedge types, sizes, densities and patterns of reciprocity. In terms of hyperedge types, E-MAIL and QNA involve abundant patterns with only many-to-one and one-to-many interactions. Other datasets display more diverse patterns, including combinations of one-to-many, many-to-one, and many-to-many interactions (the latter is possible only in motifs with four nodes). Traditional one-to-one interactions are commonly part of abundant patterns in all datasets. The number of interactions in abundant sub-hypergraphs is small in the BITCOIN, METABOLIC and CITATION domains, often involving just one or two hyperedges. In contrast, the E-MAIL and QNA domains tend to be richer in interactions. This observation is reversed when considering the average size of interactions. The relation between the number and the average size of interactions aligns with previous studies on undirected higher-order motifs20. A common pattern in many datasets is the coexistence of group interactions alongside lower-order interactions within the same set of nodes, a phenomenon often referred to as nestedness20 or simpliciality48.
Interestingly, considering the direction of such interactions reveals that they seem to play a role in increasing the overall reciprocity of the patterns, suggesting the existence of a feedback mechanism. This is particularly evident in E-MAIL data. In addition to reciprocity, the direction of lower-order interactions in abundant patterns suggests a reinforcing mechanism where subsets of source and target nodes interact at multiple interaction sizes. These observations are closely connected with the insights discussed in the previous sections about frequent co-sender and co-receiver nodes and higher-order reciprocity.
Fig. 7. Directed higher-order motifs in real-world hypergraphs.
The three most representative directed higher-order motifs of orders three and four from each system. The color of a group interaction encodes its type: one-to-one (black), one-to-many (blue), many-to-one (red), and many-to-many (green). We group statistics of systems within the same domain.
Discussion
Hypergraphs extend traditional network representations by allowing hyperedges to connect multiple nodes simultaneously, enabling the encoding of group interactions ubiquitous in many relational systems. Directed hypergraphs further enhance our modeling abilities by accounting for directionality in group interactions, distinguishing between source and target sets for each hyperedge. This versatile framework can accurately model a range of diverse real-world systems and interactions, including financial transactions, email exchanges, and metabolic reactions.
In this work, we proposed new measures and tools to analyze the structural organization of directed hypergraphs at their microscale. First, we analyzed hyperedge signature vectors to identify the abundance of each hyperedge structure across datasets and identified classes of systems sharing similar higher-order connectivity patterns. Second, we analyzed the excess overlap among source and target sets for each node in each system. The resulting distributions suggest that different domains may follow distinct organizational principles, ranging from redundant to more diverse interaction patterns. Then, we introduced three distinct types of higher-order reciprocity measures: exact, strong, and weak reciprocity. Each definition offers a different perspective on how group interactions can be reciprocated, ranging from strict to more relaxed forms of reciprocal influence, and can be computed efficiently, making it suitable also for the analysis of very large systems. We showed that all systems exhibit reciprocity in broad terms, though different domains are associated with specific patterns and sensitivity to specific reciprocity measures. Lastly, we extended the notion of motifs to directed hypergraphs, capturing recurring patterns of directed interactions. Motif analysis revealed frequent microscale structures and highlighted common organizational principles playing a role in the function and behavior of systems, such as the existence of reinforcing or feedback mechanisms among dyadic and non-dyadic interactions in groups.
Taken together, by considering the nuances related to the directionality of interactions in directed hypergraphs, our research provides a framework to understand higher-order connectivity in directed complex systems, opening up a wide range of potential applications in diverse fields, such as social network analysis, biology, and finance. For instance, the study of multi-party financial transactions as directed higher-order structures may capture more complex patterns of fraudulent activity than traditional graph-based models49. Similarly, directed hypergraphs may enhance the accuracy of existing frameworks in identifying and predicting important genes based on genomic expression relations50. As scalability is a pressing issue in hypergraph algorithms, future work may explore more advanced techniques for detecting motifs in large-scale directed hypergraphs. These include sampling methods already proposed for undirected higher-order motifs51 or the use of parallel algorithms, which may achieve significant speed-ups beyond the implementation of our current Python library, thereby enabling analysis of motifs larger than four nodes. Another interesting avenue for further studies is the analysis of reciprocity in time-evolving hypergraphs, since it can affect mechanisms of group formation4,5 and inform the efficient seeding of information52,53. All in all, our work reveals new structural principles behind the organization of real-world systems, shedding light on the complex interplay between structural patterns and functionality in directed complex systems.
Methods
Hyperedge signature vector construction
For each dataset, we construct a hyperedge signature vector v, where each element represents the count of hyperedges with a specific combination of source set size h and target set size t in the hypergraph. The vector v captures the distribution of hyperedges based on the sizes of their source and target sets, providing a profile of the hypergraph structure.
Formally, we define the vector v as follows:
$$\mathbf{v} = \left( v_{h,t} \right)_{h \geq 1,\; t \geq 1,\; h + t \leq K},$$
where K represents the maximum hyperedge size considered, and each $v_{h,t}$ counts the number of hyperedges with a specific source size h and target size t.
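As an illustration of this construction, the following minimal Python sketch counts hyperedges by source and target size. It is not the implementation shipped with our library: the function names, the `max_size` parameter (standing in for K), the assumption that hyperedge size means source size plus target size, and the representation of a hyperedge as a (source, target) pair are all choices made for this example.

```python
from collections import Counter


def signature_counts(hyperedges, max_size=None):
    """Count hyperedges by (source size h, target size t).

    `hyperedges` is an iterable of (source, target) pairs of node collections.
    `max_size` is a hypothetical cutoff playing the role of K: hyperedges
    with more than `max_size` nodes in total are skipped.
    """
    counts = Counter()
    for source, target in hyperedges:
        h, t = len(set(source)), len(set(target))
        if max_size is not None and h + t > max_size:
            continue
        counts[(h, t)] += 1
    return counts


def signature_vector(counts, max_size):
    """Flatten counts into a vector with one entry per (h, t), h + t <= max_size."""
    keys = [(h, t)
            for h in range(1, max_size)
            for t in range(1, max_size - h + 1)]
    return keys, [counts.get(key, 0) for key in keys]


# Toy directed hypergraph: each hyperedge is (source set, target set).
toy_edges = [({1}, {2}), ({1, 2}, {3}), ({4}, {5}), ({1}, {2, 3, 4})]
toy_counts = signature_counts(toy_edges, max_size=4)
toy_keys, toy_vector = signature_vector(toy_counts, max_size=4)
print(dict(zip(toy_keys, toy_vector)))
```

Vectors built with the same `max_size` share the same ordering of entries, so they can be compared entry by entry across datasets.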
Microcanonical set-swap configuration model for directed hypergraphs
We generate randomized counterparts of directed hypergraphs using a microcanonical set-swap configuration model. This approach is similar to the one proposed by Preti et al.36, and more recently by Kraakman and Stegehuis54. More broadly, our null model fits within the family of configuration- and entropy-based random hypergraph ensembles that preserve degree and edge-size sequences55–58, which have been used in applications such as extracting statistically validated higher-order interactions59 and motifs20. In our framework, each directed hyperedge e ∈ E is specified by two disjoint node sets: a source set s(e) ⊆ V and a target set t(e) ⊆ V, with s(e) ∩ t(e) = ∅. The microcanonical ensemble comprises all simple directed hypergraphs that match the observed node out-/in-degree sequences (counts of appearances in s( ⋅ ) and t( ⋅ ), respectively) and preserve every hyperedge’s cardinalities ∣s(e)∣ and ∣t(e)∣. We forbid duplicated hyperedges.
Starting from the observed H, we repeatedly attempt set-swaps on one side at a time. In a source-side attempt, we select two distinct hyperedges ei ≠ ej uniformly and sample u ∈ s(ei) and v ∈ s(ej). We propose
$$s'(e_i) = \left( s(e_i) \setminus \{u\} \right) \cup \{v\}, \qquad s'(e_j) = \left( s(e_j) \setminus \{v\} \right) \cup \{u\},$$
leaving t(ei) and t(ej) unchanged. The move is accepted only if it respects set semantics on the chosen side (no duplicates), preserves within-edge disjointness (s′(ei) ∩ t(ei) = ∅ and s′(ej) ∩ t(ej) = ∅), and does not create a duplicate hyperedge. Target-side attempts are defined symmetrically. Accepted swaps conserve each node’s in/out degree and all ∣s(e)∣, ∣t(e)∣. Rejected proposals leave the state unchanged.
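The source-side move can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the code used for our experiments: the function name `randomize_sources`, its parameters, and the toy data are hypothetical, the input is assumed to be a simple hypergraph (no duplicated hyperedges) with at least two hyperedges, and nodes are assumed to be sortable.

```python
import random


def randomize_sources(hyperedges, n_attempts, seed=0):
    """Apply `n_attempts` source-side set-swap proposals.

    Returns the randomized hyperedge list and the number of accepted swaps.
    Target sets are never modified by this function.
    """
    rng = random.Random(seed)
    edges = [(frozenset(s), frozenset(t)) for s, t in hyperedges]
    present = set(edges)  # used to reject duplicated hyperedges
    accepted = 0
    for _ in range(n_attempts):
        i, j = rng.sample(range(len(edges)), 2)  # two distinct hyperedges
        (si, ti), (sj, tj) = edges[i], edges[j]
        u = rng.choice(sorted(si))
        v = rng.choice(sorted(sj))
        # Set semantics: the incoming node must not already sit on that side.
        if v in si or u in sj:
            continue
        cand_i = ((si - {u}) | {v}, ti)
        cand_j = ((sj - {v}) | {u}, tj)
        # Within-edge disjointness between new source and old target.
        if cand_i[0] & ti or cand_j[0] & tj:
            continue
        # No duplicated hyperedges after the move.
        if cand_i in present or cand_j in present:
            continue
        present -= {edges[i], edges[j]}
        present |= {cand_i, cand_j}
        edges[i], edges[j] = cand_i, cand_j
        accepted += 1
    return edges, accepted


toy_edges = [({1, 2}, {3}), ({4}, {5, 6}), ({2, 5}, {1}), ({6}, {2}), ({3, 4}, {6})]
randomized, accepted = randomize_sources(toy_edges, n_attempts=200, seed=1)
print(accepted, randomized)
```

Because a node leaves one source set and enters another in every accepted move, out-degrees and source sizes are conserved by construction; the target-side version follows by exchanging the roles of the two sets.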
Hyperedge overlap
For each node, we quantify its hyperedge overlap separately for its participation in target sets (in-hyperedges) and source sets (out-hyperedges). For in-hyperedge overlap, we consider all hyperedges e for which the node is in the target set t(e). Let be the collection of these hyperedges. We define the in-hyperedge overlap as
where ∣t(e)∣ denotes the number of nodes in the target set of hyperedge e.
Similarly, for out-hyperedge overlap, we consider only hyperedges e in which the node is a source (i.e., belongs to s(e)). Let denote this collection, and define
with ∣s(e)∣ being the size of the source set of hyperedge e. These metrics yield a value of 1 when all corresponding hyperedges share an identical set of nodes (i.e., maximal overlap), and decrease as the sets become more diverse.
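To assess statistical significance, we compare the observed overlaps to those computed on an ensemble of randomized networks that preserve key structural properties (e.g., node degrees and hyperedge sizes). For each node i, we compute a z-score that standardizes the observed overlap relative to the null ensemble:
$$z_O(i) = \frac{O(i) - \mu_{\mathrm{null}}(i)}{\sigma_{\mathrm{null}}(i)},$$
where $\mu_{\mathrm{null}}(i)$ and $\sigma_{\mathrm{null}}(i)$ are the mean and standard deviation of the overlap of node i across the null ensemble. We compute this for both Osources and Otargets.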
Algorithms for measuring reciprocity in directed hypergraphs
Below, we outline our proposed algorithms for efficiently measuring reciprocity in directed hypergraphs.
Exact reciprocity. Each hyperedge e = (s, t) is stored in a hash-based dictionary, and for each hyperedge, we search for a reverse hyperedge e′ = (t, s). Since each lookup takes constant time on average, the overall complexity is O(m), where m is the number of hyperedges.
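A minimal Python sketch of this lookup is given below. The function name is hypothetical, and reporting the result as the fraction of reciprocated hyperedges is an assumption made for this example.

```python
def exact_reciprocity(hyperedges):
    """Fraction of hyperedges (s, t) whose exact reverse (t, s) also exists."""
    # Hash-based storage: frozensets make each hyperedge usable as a set key.
    edge_set = {(frozenset(s), frozenset(t)) for s, t in hyperedges}
    if not edge_set:
        return 0.0
    reciprocated = sum((t, s) in edge_set for s, t in edge_set)
    return reciprocated / len(edge_set)


toy_edges = [({1, 2}, {3}), ({3}, {1, 2}), ({1}, {4}), ({4}, {2})]
exact_value = exact_reciprocity(toy_edges)
print(exact_value)  # the first two hyperedges reciprocate each other: 0.5
```

Note that ({1}, {4}) and ({4}, {2}) do not count here, since the reverse hyperedge must match both sets exactly.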
Strong reciprocity. For each hyperedge e = (s, t), we maintain a reachability dictionary that tracks which nodes in the target set t can reach other nodes via multiple hyperedges. We then check whether the source set s is fully covered by the accumulated reachable nodes from the target set t. This involves iterating over each hyperedge, for each target node, accumulating the reachable nodes and then checking if the source set is a subset of this accumulated set. Computing the union of reachable nodes is O(s ⋅ t), where s is the maximum size of source sets and t is the maximum size of target sets. This operation is repeated for all hyperedges, leading to a total complexity of O(m ⋅ s ⋅ t).
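One possible reading of this procedure is sketched below in Python. Several points are assumptions made for illustration only: a node is taken to reach every target of the hyperedges in which it acts as a source, the score is reported as a fraction of hyperedges, and the function name is hypothetical.

```python
from collections import defaultdict


def strong_reciprocity(hyperedges):
    """Fraction of hyperedges whose source set is covered by the nodes
    reachable from their target set (assumed reading of the definition)."""
    edges = [(frozenset(s), frozenset(t)) for s, t in hyperedges]
    # Reachability dictionary: node -> nodes it points to through any hyperedge.
    reach = defaultdict(set)
    for s, t in edges:
        for u in s:
            reach[u] |= t

    def is_reciprocated(s, t):
        covered = set()
        for v in t:  # accumulate what the target nodes can reach
            covered |= reach.get(v, set())
        return s <= covered  # source set fully covered?

    flags = [is_reciprocated(s, t) for s, t in edges]
    return sum(flags) / len(flags) if flags else 0.0


toy_edges = [({1, 2}, {3, 4}), ({3}, {1}), ({4}, {2, 5}), ({5}, {1})]
strong_value = strong_reciprocity(toy_edges)
print(strong_value)  # 0.75
```

In this toy example the hyperedge ({5}, {1}) is the only one that is not reciprocated, because node 1 never points back to node 5.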
Weak reciprocity. First, we construct a dictionary to store all directed node pairs between the source and target sets of each hyperedge. Then, for each hyperedge, we check whether any of its target nodes are linked back to the source nodes via reverse connections in the dictionary. The computational complexity is dominated by the first operation, which is O(m ⋅ s ⋅ t), where s is the maximum size of the source sets and t is the maximum size of the target sets across all hyperedges.
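The two steps can be sketched as follows in Python. The function name and the choice of reporting a fraction of hyperedges are assumptions made for this example.

```python
def weak_reciprocity(hyperedges):
    """Fraction of hyperedges with at least one target node that points
    back to at least one of their source nodes."""
    edges = [(frozenset(s), frozenset(t)) for s, t in hyperedges]
    # Step 1: all directed node pairs (source node, target node).
    pairs = {(u, v) for s, t in edges for u in s for v in t}

    # Step 2: look for any reverse pair (target node, source node).
    def is_reciprocated(s, t):
        return any((v, u) in pairs for u in s for v in t)

    flags = [is_reciprocated(s, t) for s, t in edges]
    return sum(flags) / len(flags) if flags else 0.0


toy_edges = [({1, 2}, {3}), ({3}, {1}), ({4}, {5})]
weak_value = weak_reciprocity(toy_edges)
print(weak_value)
```

Here the hyperedge ({1, 2}, {3}) counts as weakly reciprocated because node 3 points back to node 1, even though node 2 receives nothing in return.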
In practice, executing these algorithms on the real-world datasets used in our experiments requires only a few minutes for all datasets combined, demonstrating the computational efficiency of the proposed methods.
Algorithms for motif analysis in directed hypergraphs
To design efficient algorithms for mining directed higher-order motifs, we extend prior ideas developed for the same problem in undirected hypergraphs51. Our algorithms are efficient enough to count motifs of size 3 and 4 in datasets of reasonable size (comparable to those used in our experiments). However, scaling to larger datasets and motifs of larger size would require more sophisticated approaches, such as sampling algorithms51, which we leave for future work. Further details on the execution times of the algorithms for mining motifs of orders 3 and 4 can be found in Supplementary Note 2.
The algorithm for mining motifs (involving at least one group interaction) of order 3 begins by iterating through each hyperedge in the hypergraph that contains exactly three vertices. For each such hyperedge, it identifies all possible subsets of vertices and checks whether one or more subsets form valid directed hyperedges in the hypergraph. Valid subsets, along with the original hyperedge, define the motif structure involving those three vertices. To ensure consistency in motif identification, the algorithm generates a canonical form of the motif by lexicographically ordering its vertices and edges, which can be computed by sorting the n! possible relabelings. This canonical representation allows motifs with the same structural pattern to be compared and counted, even if they differ in their vertex labels. Each canonical form is stored in a frequency hash map. If the motif has not been encountered before, it is added to the map; if it has, its frequency count is incremented. In the end, the algorithm outputs a distribution of the various motif structures of order 3.
This algorithm operates in linear time with respect to the number of hyperedges of order 3. Specifically, its computational complexity is O(m_3), where m_3 is the number of hyperedges involving exactly three vertices. Each motif construction and comparison is performed in constant time due to the fixed size of the motifs. For more details, refer to the pseudocode in Supplementary Note 2.
The algorithm for mining motifs of order 4 follows a similar approach. First, it iterates over all hyperedges of size 4, counting the motifs involving exactly these 4 nodes. Unlike the previous algorithm, it then iterates over all hyperedges of size 3, performing an additional neighborhood exploration step to identify the fourth node involved in the motif. Each neighboring node is considered during this process. Once the 4 nodes are identified, the algorithm constructs the motif as before. The pseudocode for this algorithm is provided in Supplementary Note 2.
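To make the order-3 procedure concrete, the following Python sketch counts motifs on node triples spanned by at least one hyperedge. It is a simplified illustration, not the pseudocode of Supplementary Note 2: the function names are hypothetical, nodes are assumed to be sortable, the canonical form is obtained by brute force over the 3! relabelings, and each node triple is counted once even if several hyperedges span it (an assumption of this example).

```python
from collections import Counter, defaultdict
from itertools import permutations


def canonical_form(motif_edges, nodes):
    """Smallest relabeled edge list over all len(nodes)! relabelings."""
    best = None
    for perm in permutations(range(len(nodes))):
        relabel = dict(zip(nodes, perm))
        form = tuple(sorted(
            (tuple(sorted(relabel[x] for x in s)),
             tuple(sorted(relabel[x] for x in t)))
            for s, t in motif_edges
        ))
        if best is None or form < best:
            best = form
    return best


def count_order3_motifs(hyperedges):
    """Frequency map: canonical motif -> number of node triples showing it."""
    edges = {(frozenset(s), frozenset(t)) for s, t in hyperedges}
    # Index hyperedges by the set of nodes they span (source plus target).
    by_nodes = defaultdict(list)
    for s, t in edges:
        by_nodes[s | t].append((s, t))
    counts = Counter()
    triples = [node_set for node_set in by_nodes if len(node_set) == 3]
    for triple in triples:
        nodes = sorted(triple)
        motif = list(by_nodes[triple])  # hyperedges on all three nodes
        # Only two-node subsets can host further (dyadic) hyperedges.
        for a, b in ((0, 1), (0, 2), (1, 2)):
            pair = frozenset((nodes[a], nodes[b]))
            motif.extend(by_nodes.get(pair, []))
        counts[canonical_form(motif, nodes)] += 1
    return counts


toy_edges = [({1, 2}, {3}), ({3}, {1}),   # group interaction plus feedback
             ({4, 5}, {6}), ({6}, {5}),   # same pattern on other nodes
             ({7}, {8, 9})]               # isolated one-to-many hyperedge
motif_counts = count_order3_motifs(toy_edges)
for form, freq in motif_counts.items():
    print(freq, form)
```

The first two triples differ only in their node labels, so they map to the same canonical form and are counted together.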
Statistical significance of motifs
To distinguish meaningful, non-random interaction patterns from those that may occur by chance, we use a configuration model as a null model to evaluate the statistical significance of the interaction patterns after computing their frequency in our directed hypergraphs. The configuration model generates randomized versions of the original hypergraph while preserving key properties, such as the in-degree and out-degree sequences, as well as the source and target sizes of the hyperedges36. By comparing the observed frequencies with those found in the randomized networks, we can identify significantly over-represented motifs.
In particular, each motif i is associated with a standardized score zM(i), which quantifies how many standard deviations the observed frequency of the motif differs from its expected value under the configuration model20,42. A larger absolute value of zM(i) indicates a stronger statistical deviation from the null expectation, meaning the motif is significantly over- or under-represented compared to random networks. The score is defined as:
$$z_M(i) = \frac{N^{\mathrm{obs}}_i - \mu^{\mathrm{rand}}_i}{\sigma^{\mathrm{rand}}_i}.$$
Here, $N^{\mathrm{obs}}_i$ denotes the observed frequency of motif i in the empirical hypergraph, and $\mu^{\mathrm{rand}}_i$ and $(\sigma^{\mathrm{rand}}_i)^2$ are the mean and variance of motif i’s frequency across random hypergraphs generated by the configuration model. Following ref. 47, we generate 100 random samples from the configuration model for each hypergraph to estimate the mean and variance of the motif frequencies.
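The standardization step can be sketched in Python as follows. The function name, the dictionary-based inputs, the use of the population standard deviation, and the convention of returning NaN when the null frequencies do not vary are all assumptions made for this example.

```python
from statistics import mean, pstdev


def motif_z_scores(observed, null_samples):
    """z-score of each motif frequency against null-model samples.

    `observed` maps motif -> frequency in the empirical hypergraph;
    `null_samples` is a list of such maps, one per randomized hypergraph.
    """
    motifs = set(observed)
    for sample in null_samples:
        motifs.update(sample)
    scores = {}
    for motif in motifs:
        null = [sample.get(motif, 0) for sample in null_samples]
        mu, sigma = mean(null), pstdev(null)
        n_obs = observed.get(motif, 0)
        # Undefined when the null frequencies have zero variance.
        scores[motif] = (n_obs - mu) / sigma if sigma > 0 else float("nan")
    return scores


toy_observed = {"A": 12, "B": 3}
toy_null = [{"A": 4, "B": 3}, {"A": 6, "B": 3}, {"A": 5, "B": 3}, {"A": 5, "B": 3}]
toy_scores = motif_z_scores(toy_observed, toy_null)
print(toy_scores)
```

In this toy example motif "A" is strongly over-represented, whereas motif "B" has no defined score because its frequency is identical in every null sample.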
Acknowledgements
F.B. acknowledges support from the Air Force Office of Scientific Research under award number FA8655-22-1-7025. F.B. acknowledges support from the Austrian Science Fund (FWF) through project 10.55776/PAT1052824 and project 10.55776/PAT1652425. A.M. acknowledges support from the European Union through Horizon Europe CLOUDSTARS project (101086248).
Author contributions
Q.F.L., A.M., and F.B. designed research; Q.F.L. analyzed data; Q.F.L. and A.V. developed the algorithms and performed the computations; Q.F.L., A.V., A.M, and F.B. analyzed results and wrote the paper.
Peer review
Peer review information: Communications Physics thanks the anonymous reviewers for their contribution to the peer review of this work.
Data availability
Data40 is publicly available and also easily accessible through HGX60.
Code availability
The tools for the analysis of directed hypergraphs presented in this work are available as part of Hypergraphx (HGX)60.
Competing interests
Federico Battiston is an Editorial Board Member for Communications Physics, and one of the editors curating the Focus Collection “Higher-order Interaction Networks 2024”, but was not involved in the editorial review of, or the decision to publish, this article. All other authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Quintino Francesco Lotito, Email: lotitoq@ceu.edu.
Federico Battiston, Email: battistonf@ceu.edu.
Supplementary information
The online version contains supplementary material available at 10.1038/s42005-025-02472-9.
References
- 1. Boccaletti, S., Latora, V., Moreno, Y., Chavez, M. & Hwang, D.-U. Complex networks: Structure and dynamics. Phys. Rep. 424, 175–308 (2006).
- 2. Cimini, G. et al. The statistical physics of real-world networks. Nat. Rev. Phys. 1, 58–71 (2019).
- 3. Patania, A., Petri, G. & Vaccarino, F. The shape of collaborations. EPJ Data Sci. 6, 1–16 (2017).
- 4. Cencetti, G., Battiston, F., Lepri, B. & Karsai, M. Temporal properties of higher-order interactions in social networks. Sci. Rep. 11, 1–10 (2021).
- 5. Iacopini, I., Karsai, M. & Barrat, A. The temporal dynamics of group interactions in higher-order social networks. Nat. Commun. 15, 7391 (2024).
- 6. Ghoshal, G., Zlatić, V. & Caldarelli, G. Random hypergraphs and their applications. Phys. Rev. E 79, 066118 (2009).
- 7. Grilli, J., Barabás, G., Michalska-Smith, M. J. & Allesina, S. Higher-order interactions stabilize dynamics in competitive network models. Nature 548, 210–213 (2017).
- 8. Jost, J. & Mulas, R. Hypergraph laplace operators for chemical reaction networks. Adv. Math. 351, 870–896 (2019).
- 9. Traversa, P., Ferraz de Arruda, G., Vazquez, A. & Moreno, Y. Robustness and complexity of directed and weighted metabolic hypergraphs. Entropy 25, 1537 (2023).
- 10. Petri, G. et al. Homological scaffolds of brain functional networks. J. R. Soc. Interface 11, 20140873 (2014).
- 11. Santoro, A., Battiston, F., Lucas, M., Petri, G. & Amico, E. Higher-order connectomics of human brain function reveals local topological signatures of task decoding, individual identification, and behavior. bioRxiv 2023–12 (2023).
- 12. Berge, C. Graphs and hypergraphs (North-Holland Pub. Co., 1973).
- 13. Battiston, F. et al. Networks beyond pairwise interactions: structure and dynamics. Phys. Rep. 874, 1–92 (2020).
- 14. Battiston, F. et al. The physics of higher-order interactions in complex systems. Nat. Phys. 17, 1093–1098 (2021).
- 15. Benson, A. R. Three hypergraph eigenvector centralities. SIAM J. Math. Data Sci. 1, 293–312 (2019).
- 16. Tudisco, F. & Higham, D. J. Node and edge nonlinear eigenvector centrality for hypergraphs. Commun. Phys. 4, 1–10 (2021).
- 17. Eriksson, A., Edler, D., Rojas, A., de Domenico, M. & Rosvall, M. How choosing random-walk model and network representation matters for flow-based community detection in hypergraphs. Commun. Phys. 4, 1–12 (2021).
- 18. Contisciani, M., Battiston, F. & De Bacco, C. Inference of hyperedges and overlapping communities in hypergraphs. Nat. Commun. 13, 1–10 (2022).
- 19. Ruggeri, N., Contisciani, M., Battiston, F. & De Bacco, C. Community detection in large hypergraphs. Sci. Adv. 9, eadg9159 (2023).
- 20. Lotito, Q. F., Musciotto, F., Montresor, A. & Battiston, F. Higher-order motif analysis in hypergraphs. Commun. Phys. 5, 79 (2022).
- 21. Lee, G., Ko, J. & Shin, K. Hypergraph motifs: concepts, algorithms, and discoveries. Proc. VLDB Endow. 13, 2256–2269 (2020).
- 22. Arregui-García, B., Longa, A., Lotito, Q. F., Meloni, S. & Cencetti, G. Patterns in temporal networks with higher-order egocentric structures. Entropy 26, 256 (2024).
- 23. Petri, G. & Barrat, A. Simplicial activity driven model. Phys. Rev. Lett. 121, 228301 (2018).
- 24. Di Gaetano, L., Battiston, F. & Starnini, M. Percolation and topological properties of temporal higher-order networks. Phys. Rev. Lett. 132, 037401 (2024).
- 25. Gallo, L., Lacasa, L., Latora, V. & Battiston, F. Higher-order correlations reveal complex memory in temporal hypergraphs. Nat. Commun. 15, 4754 (2024).
- 26. Schaub, M. T., Benson, A. R., Horn, P., Lippner, G. & Jadbabaie, A. Random walks on simplicial complexes and the normalized hodge 1-laplacian. SIAM Rev. 62, 353–391 (2020).
- 27. Carletti, T., Battiston, F., Cencetti, G. & Fanelli, D. Random walks on hypergraphs. Phys. Rev. E 101, 022308 (2020).
- 28. Lucas, M., Cencetti, G. & Battiston, F. Multiorder laplacian for synchronization in higher-order networks. Phys. Rev. Res. 2, 033410 (2020).
- 29. Gambuzza, L. V. et al. Stability of synchronization in simplicial complexes. Nat. Commun. 12, 1–13 (2021).
- 30. Zhang, Y., Lucas, M. & Battiston, F. Higher-order interactions shape collective dynamics differently in hypergraphs and simplicial complexes. Nat. Commun. 14, 1605 (2023).
- 31. Iacopini, I., Petri, G., Barrat, A. & Latora, V. Simplicial models of social contagion. Nat. Commun. 10, 1–9 (2019).
- 32. Chowdhary, S., Kumar, A., Cencetti, G., Iacopini, I. & Battiston, F. Simplicial contagion in temporal higher-order networks. J. Phys.: Complex. 2, 035019 (2021).
- 33. Civilini, A., Sadekar, O., Battiston, F., Gómez-Gardeñes, J. & Latora, V. Explosive cooperation in social dilemmas on higher-order networks. Phys. Rev. Lett. 132, 167401 (2024).
- 34. Ranshous, S. et al. Exchange pattern mining in the bitcoin transaction directed hypergraph. In Financial Cryptography and Data Security: FC 2017 International Workshops, WAHC, BITCOIN, VOTING, WTSC, and TA, Sliema, Malta, April 7, 2017, Revised Selected Papers 21, 248–263 (Springer, 2017).
- 35. Gallo, G., Longo, G., Pallottino, S. & Nguyen, S. Directed hypergraphs and applications. Discret. Appl. Math. 42, 177–201 (1993).
- 36. Preti, G., Fazzone, A., Petri, G. & De Francisci Morales, G. Higher-order null models as a lens for social systems. Phys. Rev. X 14, 031032 (2024).
- 37. Gallo, L. et al. Synchronization induced by directed higher-order interactions. Commun. Phys. 5, 263 (2022).
- 38. Moon, H., Kim, H., Kim, S. & Shin, K. Four-set hypergraphlets for characterization of directed hypergraphs. arXiv preprint arXiv:2311.14289 (2023).
- 39. Pearcy, N., Crofts, J. J. & Chuzhanova, N. Hypergraph models of metabolism. Int. J. Biol. Vet. Agric. Food Eng. 8, 752–756 (2014).
- 40. Kim, S., Choe, M., Yoo, J. & Shin, K. Reciprocity in directed hypergraphs: measures, findings, and generators. Data Min. Knowl. Discov. 37, 2330–2388 (2023).
- 41. Garlaschelli, D. & Loffredo, M. I. Patterns of link reciprocity in directed networks. Phys. Rev. Lett. 93, 268701 (2004).
- 42. Milo, R. et al. Network motifs: Simple building blocks of complex networks. Science 298, 824–827 (2002).
- 43. Lee, G., Choe, M. & Shin, K. How do hyperedges overlap in real-world hypergraphs?-patterns, measures, and generators. In Proc. of the web conference, 3396–3407 (ACM, 2021).
- 44. Malizia, F., Lamata-Otín, S., Frasca, M., Latora, V. & Gómez-Gardeñes, J. Hyperedge overlap drives explosive transitions in systems with higher-order interactions. Nat. Commun. 16, 555 (2025).
- 45. Lamata-Otín, S., Malizia, F., Latora, V., Frasca, M. & Gómez-Gardeñes, J. Hyperedge overlap drives synchronizability of systems with higher-order interactions. Phys. Rev. E 111, 034302 (2025).
- 46. Wasserman, S., Faust, K. et al. Social network analysis: Methods and applications (Cambridge University Press, 1994).
- 47. Milo, R. et al. Superfamilies of evolved and designed networks. Science 303, 1538–1542 (2004).
- 48. Landry, N. W., Young, J.-G. & Eikmeier, N. The simpliciality of higher-order networks. EPJ Data Sci. 13, 17 (2024).
- 49. Akoglu, L., Tong, H. & Koutra, D. Graph based anomaly detection and description: a survey. Data Min. Knowl. Discov. 29, 626–688 (2015).
- 50. Feng, S. et al. Hypergraph models of biological networks to identify genes critical to pathogenic viral response. BMC Bioinforma. 22, 287 (2021).
- 51. Lotito, Q. F., Musciotto, F., Battiston, F. & Montresor, A. Exact and sampling methods for mining higher-order motifs in large hypergraphs. Computing 106, 475–494 (2024).
- 52. Mancastroppa, M., Iacopini, I., Petri, G. & Barrat, A. Hyper-cores promote localization and efficient seeding in higher-order processes. Nat. Commun. 14, 6223 (2023).
- 53. Genetti, S., Ribaga, E., Cunegatti, E., Lotito, Q. F. & Iacca, G. Influence maximization in hypergraphs using multi-objective evolutionary algorithms. In International Conference on Parallel Problem Solving from Nature, 217–235 (Springer, 2024).
- 54. Kraakman, Y. J. & Stegehuis, C. Hypercurveball algorithm for sampling hypergraphs with fixed degrees. J. Complex Netw. 13, cnaf007 (2025).
- 55. Chodrow, P. S. Configuration models of random hypergraphs. J. Complex Netw. 8, cnaa018 (2020).
- 56. Barthelemy, M. Class of models for random hypergraphs. Phys. Rev. E 106, 064310 (2022).
- 57. Nakajima, K., Shudo, K. & Masuda, N. Randomizing hypergraphs preserving degree correlation and local clustering. IEEE Trans. Netw. Sci. Eng. 9, 1139–1153 (2021).
- 58. Saracco, F., Petri, G., Lambiotte, R. & Squartini, T. Entropy-based models to randomise real-world hypergraphs. Commun. Phys. 8, 284 (2025).
- 59. Musciotto, F., Battiston, F. & Mantegna, R. N. Detecting informative higher-order interactions in statistically validated hypergraphs. Commun. Phys. 4, 1–9 (2021).
- 60. Lotito, Q. F. et al. Hypergraphx: a library for higher-order network analysis. J. Complex Netw. 11, cnad019 (2023).