. 2024 Feb 8;5(2):100590. doi: 10.1016/j.xinn.2024.100590

A survey on causal inference for recommendation

Huishi Luo 1, Fuzhen Zhuang 1,2,, Ruobing Xie 3, Hengshu Zhu 4, Deqing Wang 5, Zhulin An 6, Yongjun Xu 6
PMCID: PMC10901840  PMID: 38426201

Abstract

Causal inference has recently garnered significant interest among recommender system (RS) researchers due to its ability to dissect cause-and-effect relationships and its broad applicability across multiple fields. It offers a framework to model causality in RSs, such as confounding effects, and to handle counterfactual problems such as offline policy evaluation and data augmentation. Although there are already some valuable surveys on causal recommendation, they typically classify approaches based on the practical issues faced in RS, a classification that may disperse and fragment the unified causal theories. Considering that many RS researchers are unfamiliar with causality, it is necessary yet challenging to comprehensively review relevant studies from a coherent causal theoretical perspective, thereby facilitating a deeper integration of causal inference in RS. This survey provides a systematic review of up-to-date papers in this area from a causal theory standpoint and traces the evolutionary development of RS methods within the same causal strategy. First, we introduce the fundamental concepts of causal inference as the basis of the following review. Subsequently, we propose a novel theory-driven taxonomy, categorizing existing methods by the causal theory employed, namely the potential outcome framework, the structural causal model, and general counterfactuals. The review then delves into the technical details of how existing methods apply causal inference to address particular recommender issues. Finally, we highlight some promising directions for future research in this field. Representative papers and open-source resources will be progressively made available at https://github.com/Chrissie-Law/Causal-Inference-for-Recommendation.

Graphical abstract


Public Summary

  • Causal inference enhances recommendation by modeling cause-effect and answering “what-ifs”.

  • We provide an up-to-date collection and review of causal recommendation methods.

  • All methods can be categorized into a causal-theoretically coherent taxonomy.

  • Evolution of causal methods in recommender systems is traced.

Introduction

Recommender systems (RSs), working as filtering systems to present personalized information to users and alleviate information overload, have been widely deployed in various online applications, including e-commerce, social networks, and multimedia services. Recently, an emerging research direction has attracted increasing attention from RS researchers: the integration of advanced machine learning with a traditional statistics field, namely causal inference. Causal inference1,2 analyzes the relationship between a cause and its effect3 and has a wide range of real-world applications in both academic and industrial domains, such as medicine,4,5,6 climate,7 political science,8,9,10 and online advertising evaluation.11,12,13 Treatment effect estimation is a fundamental problem in causal inference, often applied in policy evaluation. For example, consider pharmaceutical research, where we are interested in the effect of a drug on lifespan. We need to answer a causal question involving a so-called intervention or treatment: what is the probability that a typical patient would survive L years if made to take the drug? A large-scale randomized controlled trial is the gold standard, but it may suffer from high expense and even ethical issues. Therefore, in most cases, we can only estimate the effect from non-randomized observational data, where the correlation between the drug and survival does not imply causation, because factors including age, gender, and severity of the disease may affect the outcomes.

Causality for recommendation has long been used in uplift modeling for policy effect evaluation,14,15 but it was not until the last few years that research tended to focus on applying it to model training. Common recommendation scenarios in practice, including click-through rate (CTR) prediction and post-click metric prediction, can be abstracted into causal problems, and causal inference can be applied at different stages of an entire RS project, such as preliminary data collection,16 representation learning of user and item embeddings,17,18,19 objective optimization,20,21,22 and offline and online policy evaluation.23,24,25 Causal RSs can surpass traditional approaches, primarily due to two key strengths (Figure 1):

  • (1)

    Modeling cause and effect. The majority of current machine-learning systems, including RSs, operate predominantly in a statistical mode,26,27 which focuses on the correlation between variables. However, in practical applications, we care more about causality than correlation, and it is well known that correlation is not causation. For example, a movie recommendation platform records that a female user has finished watching an action movie and concludes that she likes action movies, so it recommends many related action movies. Nevertheless, the user may have watched the movie because of its popularity rather than an inherent preference for action movies. The spurious correlation between user interest and movie genres learned by traditional RSs may therefore degrade the user experience. In contrast, causal RSs can separately learn the causal effects of the user's individual interest and of conformity on the interaction outcome (i.e., watching), thus avoiding incorrectly recommending action movies later. Modeling cause and effect enables causality-based RSs to (1) measure the causal effects of a wide range of bias sources, such as popularity and exposure,28,29,30,31 on user interactions, thus performing effective debiasing, which is currently the most common application of causal inference for recommendation; and (2) better control RSs through decomposition and inference of the causal effects of variables, for example, leveraging the causal effect of a certain bias to improve recommendation accuracy.28

  • (2)

    Answering counterfactual questions. Many RS problems, including data augmentation, out-of-distribution (OOD) generalization, and policy evaluation, are essentially counterfactual problems; that is, the values of some causal variables differ from reality. (1) In the data augmentation problem, counterfactual data, as a significant complement to the observed data,16 need to answer questions such as "What would the user's interaction be if the recommended items had been different?" or "What would the probability of a click be if an item had been recommended to a user who had not been recommended it before?" (2) The OOD problem refers to recommendation that violates the independent and identically distributed (IID) assumption on the interactions between training and testing periods.18 Traditional recommendation may learn false associations between users and items, whereas causal RSs adopt counterfactual reasoning to find invariant variables or causal relationships in the recommendation task and reuse them to generalize when the distribution changes. For example, if a pregnant woman had purchased red high heels before pregnancy, a traditional recommendation system might continue to recommend high heels, but a causal inference system can learn the causal relationship between high heels and pregnancy status through causal tools such as the causal graph. Therefore, when the user's status shifts (identified from the user's behavior, such as purchasing baby products), the causal RS no longer recommends high-heeled shoes but retains the user's preference for the color red and recommends red clothing instead. (3) Uplift modeling estimates the increase, or uplift, in user interactions caused by recommendations, which is essentially a counterfactual question: what would the interactions have been had no recommendation been made?

Figure 1.


Strengths of causal inference for recommendation

It is worth mentioning that there are some existing surveys35,36,37,38 of causal inference for RSs. However, the present study distinguishes itself from these previous works for several reasons.

  • (1)

    Theoretically coherent classification framework from a causal perspective. The aforementioned surveys fall short of providing a comprehensive taxonomy of causal RSs. Specifically, survey 35 only discusses recommendation methods under the potential outcome framework,39 and approaches in surveys 36–38 are mainly classified from application perspectives (i.e., issues of RSs). This application-centric taxonomy, while practical, tends to obscure the underlying theoretical coherence of causal inference methods, as a single causal theory can be applied to various problems. In contrast, our survey adopts a more nuanced and theory-driven classification, involving the Neyman-Rubin potential outcome (PO) framework39,40 and the Pearl structural causal model (SCM) framework.3,41,42 In this paper, causal recommendation algorithms are categorized into three main types: PO based, SCM based, and general counterfactuals based. Both PO-based and SCM-based methods utilize specific causal inference techniques, but the former does not explicitly employ causal structure information. On the other hand, methods based on general counterfactuals draw inspiration from counterfactual concepts but do not employ particular causal inference techniques. The classification framework is illustrated in Figure 2. This taxonomy not only provides a more structured and holistic understanding of existing methods but also preserves the theoretical coherence of the underlying causal frameworks.

  • (2)

    Evolution of causal methods in RSs. We systematically delineate the developmental trajectory of the integration between prevalent causal inference theories and RSs, as illustrated in Figures 4 and 7. Through this intuitive exposition, readers can readily perceive how methodologies within a specific domain have been iteratively proposed and the particular issues they address in their respective evolutions.

  • (3)

    Up-to-date collection and review. Given the growing popularity of this domain, our survey encompasses numerous recent publications beyond those covered in existing surveys.35,36,37,38 We have collected papers related to causal inference-based recommender algorithms from esteemed conference proceedings and journals, and we visualize their statistics concerning publication year and causal inference framework in Figure S1 in the supplemental information.

Figure 2.


Strategies of the causal inference for recommendation

Figure 4.


Evolutionary timeline of propensity score strategies in recommendations

Figure 7.


Separate-learning counterfactual inference

Separate-learning counterfactual inference, a common pattern of SCM-based causal inference for RSs, learns causal effect with a separate structure or multi-task framework and performs counterfactual inference during testing.

This paper provides a comprehensive summary of the work on causal RSs. The core of this survey is organized into three sections, focusing on causal recommendation approaches from the perspective of causal techniques: PO based, SCM based, and general counterfactuals based, respectively. The last section concludes this survey. Due to the limited space in this article, in-depth discussions on related fields, theoretical foundations, and future research directions have been included in the supplemental information. The supplemental information also features comprehensive tables of the research papers reviewed in this study. We highly recommend readers to delve into these sections in the supplemental information for a more thorough understanding, as they contain critical insights and extended analyses that complement the main text.

PO-based methods

Many causal recommendation approaches, especially in early research, have focused on applying the PO framework proposed by Donald B. Rubin.2,39,40 These approaches primarily integrate PO-based causal inference into the optimization functions in traditional deep-learning-based methods or the reward functions in reinforcement-learning-based methods.

Figure 2 illustrates the strategies and objectives concerning the PO framework in the context of RS, categorizing the strategies into two main types: propensity score and causal effect. The former generally leverages estimated propensity scores from causal inference methods to adjust importance weights, while the latter concentrates on the difference between POs under treatment and control (see Equation S1 in supplemental information).

It is essential to clarify that, in this paper, models that estimate causal effects without explicitly utilizing causal structure information are classified as PO based, whereas those explicitly incorporating causal structure information are categorized as SCM based.

Propensity score strategy

Let us consider how a recommendation system operates. Given background variables, denoted as $x \sim \Pr(x)$ and also known as pre-treatment variables or covariates2 (e.g., user and item features, time of day), a recommender policy π acts as a decision-making system. It decides whether to take an active treatment $t \sim \pi(t \mid x)$ (e.g., recommend an item). Following this, the PO, $y \sim \Pr(y \mid x, t)$, often termed the "reward" in the reinforcement learning context (e.g., a click indicator), will be observed.43 For example, in online markets, information such as user profile, consumption history, and products in the cart is treated as the context variables x, according to which the policy π produces a list of recommended items (i.e., treatment t), and the logged reward y can be the click signal, conversions, revenue, etc. The effectiveness of the policy π can be evaluated through its expected reward while running, formulated as:

$R(\pi) \triangleq \iiint y \,\Pr(y \mid x, t)\, \pi(t \mid x)\, \Pr(x)\, \mathrm{d}x\, \mathrm{d}t\, \mathrm{d}y = \mathbb{E}_{\Pr(x)\,\pi(t \mid x)\,\Pr(y \mid x, t)}[y].$ (Equation 1)

To learn the optimal policy

$\pi^{*} \triangleq \mathop{\arg\max}_{\pi \in \Pi} R(\pi),$ (Equation 2)

where Π denotes the policy class, conducting an online A/B test would be the best choice,44,45 but it suffers from high expense. A common substitute is offline evaluation, which calculates an estimator $\hat{R}$ of the reward of a target policy π using logged data $O_{\pi_0}$ collected by a logging policy $\pi_0$.43 However, like many other empirical sciences, offline evaluation is challenged by the problem of missing not at random (MNAR).
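To make the policy value of Equation 1 concrete, the sketch below estimates $R(\pi)$ by on-policy Monte Carlo sampling on a toy model (the context distribution, policy, and reward probabilities are all illustrative assumptions, not from the survey); offline evaluation is needed precisely when such on-policy sampling is unavailable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Toy generative process mirroring Equation 1: x ~ Pr(x), t ~ pi(t|x), y ~ Pr(y|x,t).
x = rng.uniform(size=n)                 # context (user/item features)
pi_t = 0.2 + 0.6 * x                    # target policy: Pr(t=1|x)
t = rng.binomial(1, pi_t)               # sampled treatment (recommend or not)
y = rng.binomial(1, 0.1 + 0.5 * x * t)  # reward, e.g., a click indicator

# On-policy Monte Carlo estimate of R(pi) = E[y]; this requires running pi itself,
# which is exactly what offline evaluation tries to avoid.
R_hat = y.mean()
```

For this toy model the closed-form value is $0.1 + 0.5\,\mathbb{E}[x(0.2 + 0.6x)] = 0.25$, which the sample mean approaches.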

To address this issue, early approaches tended to predict the missing data directly46 but suffered from high bias.24,47 Recently, many researchers have resorted to the propensity score e(X) from causal inference to recover the data distribution. For example, ExpoMF30 first predicts the exposure matrix and then uses the exposures (i.e., propensity scores) to guide the model of the interaction matrix, inspired by the separation between propensity scores and potential outcomes in the PO framework. Similarly, Wang et al.48 propose SERec to integrate social exposure into collaborative filtering. A notable work is by Wang et al.,49 who aim to overcome the confounder issue with the propensity score. They regard correlations among the interacted items as indirect evidence for confounders and propose the deconfounded recommender. They first build an exposure model to estimate the propensity score and then use this exposure model to estimate a substitute for the unobserved confounders, conditional on which the final outcome model (in this work, a rating model based on matrix factorization) is trained. In addition, inspired by earlier studies,50,51 Chen et al.52 propose IOBM (interactional observation-based model) to estimate propensity scores in interactional settings, learning low-dimensional embeddings as a substitute for unobservable confounders. Specifically, IOBM learns individual embeddings to capture the PO information from specific exposure events. Based on the individual embeddings, interactional embeddings, which uncover the hidden relationships among single exposure events and utilize query context information via attention, are learned through a bidirectional LSTM (long short-term memory) model. Recently, the incorporation of contrastive learning53,54 with propensity scores has offered new avenues to address noisy data in RSs. A prominent example is the CCL framework,55 which employs propensity score-based sampling to generate informative positive pairs for contrastive learning tasks.
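The exposure-model step these works share can be illustrated with a deliberately simple surrogate: estimating per-item exposure propensities from empirical exposure frequencies. The data and the frequency-based estimator below are illustrative assumptions, not the model of any cited paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items = 2_000, 50

# Toy exposure log: more popular items are shown to more users.
true_expo = np.linspace(0.05, 0.6, n_items)               # Pr(item exposed to a user)
exposed = rng.binomial(1, true_expo, size=(n_users, n_items))

# Crude per-item propensity estimate: the empirical exposure rate, clipped away
# from zero so that later inverse-propensity weights stay bounded.
p_hat = exposed.mean(axis=0).clip(1e-3, 1.0)
```

Real systems replace the frequency count with a learned exposure model (e.g., logistic regression or matrix factorization over covariates), but the role of `p_hat` in downstream reweighting is the same.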

Propensity-based methods can be further divided into approaches based on inverse propensity score (IPS) and approaches based on doubly robust (DR) (Figure 4). One of the greatest strengths of applying propensity-based methods in RS is their unbiased and model-agnostic nature, allowing for simple deployment either in the objective function for policy evaluation directly or for policy learning indirectly.

MNAR

In this part, we introduce the phenomena and causes of MNAR, explaining the challenges of RSs in causal language so that existing work can be better understood.

Recommendation algorithms often assume missing at random (MAR),56 which may lead to biased prediction and a suboptimal policy.57,58 The MAR condition essentially states that the probability that a PO is missing does not depend on the value of that PO; it can easily be violated in RSs.58 For example, on movie rating websites, movies with high ratings are less likely to be missing than movies with low ratings.59 The MNAR issue has been demonstrated by Marlin and Zemel,58 and it is a phenomenon stemming from selection bias and confounding bias.35,60

Selection bias, or sampling bias, is usually discussed in the prediction task and can be further classified into model selection bias and user self-selection bias.35 For example, a platform may systematically recommend pop music to younger users who are more active on the service regardless of genre preference;21 this is regarded as model selection bias21,61 and can be eliminated by random recommendation. User self-selection bias,62,63 on the contrary, cannot be removed by randomizing recommendations.60 It is caused by the preferential exclusion of samples from the data.62 A typical example is a song RS, in which users usually rate songs they like or dislike and seldom rate songs they feel neutral about.64 Some of the most frequently discussed biases, such as popularity bias29,65 and exposure bias,30,48 lead to model selection bias, while conformity bias28,66 and clickbait bias31 fall under user self-selection bias as a result of user preference.

Confounding bias3,67 arises from a confounder, described in the next section, which affects both the treatment and the outcome, as illustrated in Figure 3A. Alternatively, it can be identified when the probabilistic distribution representing the statistical association is not equivalent to the interventional distribution, i.e., $\Pr(y \mid t) \neq \Pr(y \mid do(t))$.68 A notable example of confounding bias is that a system trained with historical user interactions may over-recommend items that the user used to like, while the user's decision (i.e., outcome) is also affected by those historical interactions.69

Figure 3.


Causal explanation of confounding bias and user self-selection bias

(A) Confounding bias.

(B) User self-selection bias.

Both biases can lead to invalid estimates of causality from the data, and they are not mutually exclusive because selection bias does not explicitly involve causality. Many model selection biases, including popularity bias and exposure bias, are also confounding biases. As for user self-selection bias, the model in Figure 3B shows an illustration of its causal nature in which S is a variable affected by both T (treatment) and Y (outcome), indicating entry into the data pool.62 Therefore, confounding bias is significantly different from user self-selection bias from the causal perspective. Confounding bias originates from common causes, whereas user self-selection bias originates from common outcomes.63 The former stems from the systematic bias introduced during the treatment assignment, while the latter comes from the systematic bias during the collection of units into the sample.60

IPS,57,70,71,72 also named inverse propensity weighting (IPW) or inverse probability of treatment weighting (IPTW), is one of the most popular counterfactual techniques and has inspired many causal inference methods in RS, especially for unbiased learning.50 The propensity score is the probability of receiving the treatment given covariates X, formulated as follows:

$e_{\pi}(X) = \Pr_{\pi}(T = 1 \mid X).$ (Equation 3)

IPS assigns a weight w to each sample:

$w = \frac{t}{e(x)} + \frac{1 - t}{1 - e(x)},$ (Equation 4)

which indicates the inverse probability of receiving the observed treatment or control. The unbiasedness of IPS can be proved.71 More specifically, for reward estimation of a recommendation policy, IPS adjusts the distribution of background features in the logged dataset to align with that observed during online tests of π, formulated as follows:

$\hat{R}_{\mathrm{IPS}}(\pi; O_{\pi_0}) \triangleq \frac{1}{|O_{\pi_0}|} \sum_{k=1}^{|O_{\pi_0}|} \frac{e_{\pi}(X_k)}{e_{\pi_0}(X_k)} \cdot y_k = \frac{1}{|O_{\pi_0}|} \sum_{k=1}^{|O_{\pi_0}|} \frac{\Pr_{\pi}(T = 1 \mid X_k)}{\Pr_{\pi_0}(T = 1 \mid X_k)} \cdot y_k,$ (Equation 5)

where we assume that only positive feedback is taken into account, and $w = e_{\pi}(X)/e_{\pi_0}(X)$ is the ratio of the evaluation and logging policies. Note that, in most applications in RS, IPS is model agnostic and is applied in the training objective function, either directly for policy evaluation or indirectly for policy learning.
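A minimal numerical sketch of Equation 5 (the logging and target policies and the reward model are toy assumptions): the IPS-weighted average of logged rewards recovers the target policy's reward, while the naive average of the log does not.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

x = rng.uniform(size=n)
e_pi0 = 0.8 - 0.6 * x                 # logging policy pi_0: Pr(t=1|x)
e_pi = 0.2 + 0.6 * x                  # target policy pi:   Pr(t=1|x)
t = rng.binomial(1, e_pi0)            # logged treatments drawn from pi_0
y = rng.binomial(1, 0.5 * x * t)      # positive feedback occurs only if treated

naive = y.mean()                      # biased: estimates R(pi_0) = 0.10, not R(pi)
w = e_pi / e_pi0                      # importance weight of Equation 5
ips = (w * y).mean()                  # unbiased for R(pi) = 0.15
```

Here the log is biased toward low-x users (where π0 treats often but π rarely), and the weights $w$ correct exactly that mismatch.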

Many IPS-based recommendation methods focus on debiasing user interaction data, mainly against selection bias.25,73,74,75,76,77,78,79 For example, Schnabel et al.73 present a representative work adopting IPS in RS for the elimination of selection bias, in which the recommendation algorithm is based on matrix factorization and propensity scores are estimated via naive Bayes or logistic regression. Similarly, Saito et al.74 estimate the exposure propensity for each user-item pair, and Sato et al.75 propose the DLCE (debiased learning for the causal effect) model with IPS-based estimators to evaluate unbiased ranking uplift; unbiased IPS-based uplift is also covered by Sato et al.'s earlier work.23 In addition, UR-IPW (user retention modeling with inverse propensity weighting)76 models revisit rate estimation while accounting for selection bias, and another work80 adjusts domain weights based on IPS to reduce domain bias. Although IPS-based methods do not require an explicit analysis of the causal correlations between variables, some works21,81,82 still discuss causal graphs as an excellent guide to accurate modeling. For example, Ding et al.82 leverage a causal graph to explain the risk that unmeasured confounders pose to the accuracy of propensity estimation and propose RD (robust deconfounder) with sensitivity analysis, obtaining bounds on the propensity score to enhance robustness against unmeasured confounders. Li et al.79 construct DENC (de-bias network confounding in recommendation), a causal-graph-based recommendation framework that disentangles three determinants of the outcomes, namely inherent factors, the social network-based confounder, and exposure, and estimates each of them with a specific component.

In addition to debiasing, some IPS-based methods are dedicated to addressing other issues that abound in RS.20,83,84 For example, Mehrotra et al.20 propose an unbiased estimator of user satisfaction based on IPS to jointly optimize for supplier fairness and consumer relevance. Besides, the CBDF (counterfactual bandit with delayed feedback)83 re-weights the observed feedback with importance sampling, which is determined by a survival model to deal with delayed feedback. The CAFL (causal adjustment for feedback loops)84 extends the IPS estimator to break feedback loops.

Despite the unbiasedness of IPS, inaccurate estimation of the unknown propensity e(x), and the resulting high-variance sample weights,85 is the biggest obstacle to realizing it in practice. To alleviate this problem, modified versions of IPS have been proposed to control variance and applied to RS, including self-normalized IPS,73,76 clipped IPS,74,75 reward interaction IPS (RIPS),21 and regularized per-item IPS.86 Self-normalized inverse propensity scoring (SNIPS)87 rescales the original IPS estimate, without any extra parameters, to reduce the high variance, which is

$\hat{R}_{\mathrm{SNIPS}}(\pi; O_{\pi_0}) \triangleq \left( \sum_{k=1}^{|O_{\pi_0}|} \frac{e_{\pi}(X_k)}{e_{\pi_0}(X_k)} \right)^{-1} \sum_{k=1}^{|O_{\pi_0}|} \frac{e_{\pi}(X_k)}{e_{\pi_0}(X_k)} \cdot y_k,$ (Equation 6)

and is introduced to RS by works such as Schnabel et al.73 and Zhang et al.76 to alleviate selection bias. Clipped IPS (CIPS),74,75,88 or capped IPS, tightens the bound of the sample weight by introducing a scalar hyperparameter λCIPS, formulated as

$\hat{R}_{\mathrm{CIPS}}(\pi; O_{\pi_0}) \triangleq \frac{1}{|O_{\pi_0}|} \sum_{k=1}^{|O_{\pi_0}|} \min\left\{ \frac{e_{\pi}(X_k)}{e_{\pi_0}(X_k)}, \lambda_{\mathrm{CIPS}} \right\} \cdot y_k,$ (Equation 7)

which has lower variance but sacrifices unbiasedness. Building on NCIPS,87 which combines SNIPS and CIPS, Gilotte et al.85 propose PieceNCIS and PointNCIS, enhancements that utilize contextual information to refine bias modeling. McInerney et al.21 loosen the SUTVA assumption (stable unit treatment value assumption; see assumption 1 in the supplemental information) and propose RIPS for sequential recommendation, which assumes a causal model in which users interact with a list of items from top to bottom. RIPS uses iterative normalization and lookback to estimate the average reward and achieves a better bias-variance trade-off than IPS. In addition to high variance, violation of the unconfoundedness assumption is another challenge of utilizing IPS in RS; that is, the treatment mechanism is not identifiable89,90 from the observed covariates due to the existence of unobserved ones, which leads to inaccurate propensity score estimates and to disagreement between online and offline evaluations. To address the uncertainty brought by this identifiability issue, the minimax empirical risk formulation91 transforms the challenge into an adversarial game between two recommendation models via duality arguments and relaxations.
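The variance-control variants of Equations 6 and 7 can be sketched on the same kind of toy logged data as before (the policies, reward model, and clipping threshold are illustrative assumptions, not from any cited paper); clipping caps only the largest weights, trading a little bias for variance.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000

x = rng.uniform(size=n)
e_pi0, e_pi = 0.8 - 0.6 * x, 0.2 + 0.6 * x   # logging / target propensities
t = rng.binomial(1, e_pi0)
y = rng.binomial(1, 0.5 * x * t)              # positive feedback only when treated
w = e_pi / e_pi0                              # importance weights (here at most 4)

ips = (w * y).mean()                          # Equation 5: unbiased, high variance
snips = (w * y).sum() / w.sum()               # Equation 6: self-normalized
lam = 2.0                                     # clipping threshold (illustrative)
cips = (np.minimum(w, lam) * y).mean()        # Equation 7: clipped, lower variance
```

Since `min(w, lam) <= w` everywhere, the clipped estimate can only shrink the contribution of high-weight samples, which is the source of its bias.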

More recently, Liu et al.86 propose regularized per-item IPS (RIIPS) with an additional penalty function that constrains the difference in recommended outcomes between the deployed system and the new system so that the explosion of propensity scores can be avoided.

DR

DR47,92,93,94 is another powerful and effective causal method for the MNAR issue. To understand DR, let us consider the two commonly used approaches to mitigate MNAR: the direct method (DM)95 and IPS.96 The former designs a model (linear regression, deep neural network, etc.) to directly learn the missing outcomes from the observed data; it has low variance thanks to supervised learning but suffers from high bias caused by unmet IID assumptions, denoted as96

$\hat{R}_{\mathrm{DM}}(\pi; O_{\pi_0}, \hat{y}(x_k, t)) \triangleq \frac{1}{|O_{\pi_0}|} \sum_{k=1}^{|O_{\pi_0}|} \Pr_{\pi}(t = 1 \mid x_k)\, \hat{y}(x_k, t),$ (Equation 8)

where $\hat{y}(x_k, t)$ is the estimated outcome. The latter, although theoretically unbiased, often causes training losses to oscillate, stemming from the high variance of inverse propensities.97 DR combines the direct method and IPS, taking advantage of both while overcoming their limitations:

$\hat{R}_{\mathrm{DR}}(\pi; O_{\pi_0}, \hat{y}(x_k, t)) \triangleq \hat{R}_{\mathrm{DM}}(\pi; O_{\pi_0}, \hat{y}(x_k, t)) + \frac{1}{|O_{\pi_0}|} \sum_{k=1}^{|O_{\pi_0}|} \frac{e_{\pi}(X_k)}{e_{\pi_0}(X_k)} \left( y_k - \hat{y}(x_k, t) \right).$ (Equation 9)

DR uses the estimated outcomes to decrease the variance of IPS. It is doubly robust in that it remains consistent with the true policy reward if either the propensity scores or the imputed outcomes are accurate for all user-item pairs.47,96 Furthermore, advanced versions such as Switch-DR98 and DRos (doubly robust with optimistic shrinkage)99 have been proposed to further control the variance.
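A minimal sketch of Equations 8 and 9 on toy data (the policies are assumed as before, and the imputation model is deliberately made biased to show the correction term at work): with accurate propensities, the IPS term in Equation 9 cancels the DM bias.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500_000

x = rng.uniform(size=n)
e_pi0, e_pi = 0.8 - 0.6 * x, 0.2 + 0.6 * x
t = rng.binomial(1, e_pi0)
y = rng.binomial(1, 0.5 * x * t)                 # true R(pi) = 0.15 in this toy model

# Deliberately biased imputation: y_hat(x, t=1) = 0.2 for all x, y_hat(x, 0) = 0.
y_hat_treated = 0.2

dm = (e_pi * y_hat_treated).mean()               # Equation 8: biased (about 0.10 here)
w = e_pi / e_pi0
dr = dm + (w * (y - t * y_hat_treated)).mean()   # Equation 9: correction restores ~0.15
```

Swapping the flaw (accurate imputation, wrong propensities) would leave DR consistent as well, which is the "doubly robust" property in miniature.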

Based on the above advantages, DR has found increasingly wide utilization in RSs.47,61,100,101,102,103,104,105,106,107 Wang et al.47 utilize DR for unbiased RS prediction and further propose a joint learning approach that simultaneously learns rating prediction and propensity to guarantee low prediction inaccuracy at inference time. Yuan et al.61 propose a propensity-free DR method to address the issue that samples with low propensity scores are absent from the observed dataset. Zhang et al.100 propose multi-DR, based on a multi-task learning framework, to address selection bias and data sparsity in CVR estimation. Guo et al.101 propose the more robust doubly robust (MRDR) estimator to further reduce the variance caused by inaccurate imputed outcomes in DR while retaining its double robustness. In addition, Kiyohara et al.102 extend the earlier RIPS to the cascade doubly robust estimator, which shares RIPS's user interaction assumption. Xiao et al.104 propose an information bottleneck-based approach to effectively learn the DR estimator for recommendation uplift estimation, aiming at a better trade-off between the bias and variance of propensity scores. Dai et al.105 learn imputation by balancing the variance and bias of the DR loss. More recently, Song et al.106 filter imputation data by examining their mean and variance, in order to reduce poisonous imputations that deviate significantly from the truth and impair debiasing performance.

Causal effect strategy

The most critical and fundamental role of causal inference is to estimate the causal effects from observational data, which has a variety of applications in real-world RSs. Some works are dedicated to estimating and enhancing the treatment effect of a recommender policy on specific customer outcomes, namely uplift.15 In such scenarios, the causal effect is typically implemented as either a direct or indirect optimization goal, aiming to maximize platform benefits. Additionally, treatment effects extend to other application areas in RSs, serving purposes beyond uplift.

It is crucial to highlight that, within the PO framework, the causal relationships between variables are not the focal point when calculating causal effects. Instead, all variables affecting POs, except for the treatment, are treated as covariates.

Causal effect for uplift

Uplift, denoting the causal effect of recommendations, refers to the increase in user interactions purely caused by recommendations. Typical evaluations of RSs regard positive user interactions as a success. However, a subset of these interactions might persist even in the absence of recommendations. This assertion is substantiated by the conclusion of Sharma et al.,108 which indicates that more than 75% of click-throughs would still occur in the absence of recommendations. For marketing campaigns where return on investment (ROI) is paramount, targeting “voluntary buyers”—individuals who would interact with or without any recommendations—is deemed unnecessary. Therefore, the industry regards uplift as a valuable metric for recommendations in expectation of higher rewards.

It is natural to introduce causal concepts such as the average treatment effect (ATE) and conditional ATE (CATE) for uplift modeling, since the definition of uplift is a counterfactual problem consistent with the objective of causal effect estimation.15,109,110 Causal approaches with traditional machine-learning methods for uplift estimation include the two-model approach,14,111 transformed outcome,112 and uplift trees.113,114 In RSs, uplift estimation via online A/B testing suffers from high expense and large fluctuations due to user self-selection bias,25 while uplift estimated offline is bedeviled by a wide variety of biases that can lead to MNAR. To address these issues, a substantial body of literature has emerged. Sato et al.23 utilize SNIPS-based ATE to accomplish offline uplift-based evaluation. Goldenberg et al.115 leverage the retrospective estimation technique, which relies solely on data with positive outcomes for CATE-based uplift modeling, making it especially suited to recommendation scenarios where only the treatment outcomes are observable. Betlei et al.116 learn a model that directly optimizes an upper bound on AUUC (area under the uplift curve), a popular uplift metric based on uplift curves and unified with ATE.109 In addition, CausCF117 extends classical MF to tensor factorization with three dimensions (user, item, and treatment effect) for better uplift performance. CF-MTL (counterfactual entire-space multi-task learning)107 accounts for whether users actively accept the treatment, leading to a more granular classification of users, and then estimates the probability for each user type within a multi-task learning framework.
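The two-model approach mentioned above can be sketched on a toy randomized log with two user segments; segment 1 plays the role of the "voluntary buyers" with zero uplift. All distributions here are illustrative assumptions, not from any cited paper.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400_000

seg = rng.integers(0, 2, size=n)              # user segment (covariate)
t = rng.binomial(1, 0.5, size=n)              # randomized recommendation (treatment)
# Segment 1 interacts often regardless of treatment (zero uplift);
# segment 0 gains +0.20 interaction probability when recommended.
p = 0.10 + 0.30 * seg + 0.20 * t * (1 - seg)
y = rng.binomial(1, p)

# Two-model CATE estimate: fit one outcome model per treatment arm (here just
# per-segment means, since the covariate is discrete) and take the difference.
def arm_mean(s, a):
    m = (seg == s) & (t == a)
    return y[m].mean()

cate = {s: arm_mean(s, 1) - arm_mean(s, 0) for s in (0, 1)}
ate = sum(cate[s] * (seg == s).mean() for s in (0, 1))
```

An uplift-oriented RS would target segment 0 (CATE ≈ 0.20) and skip segment 1, whose interactions persist without recommendation.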

Causal effect beyond uplift

There are some other notable recommendation works involving causal effects.22,65,118,119,120,121 For example, Mehrotra et al.118 adapt a Bayesian model to infer the causal impact of new track releases, which may be an essential consideration in the design of music recommendation platforms. Zhang et al.65 minimize the distance between the traditional attention weights in the recommendation method and the individual treatment effect (ITE) to reflect the true impact of the features on the interactions. Rosenfeld et al.119 and Bonner et al.22 frame causal inference as a domain adaptation problem and leverage ITE with a large sample of biased data and a small sample of unbiased data to eliminate the bias problems, which are described in more detail in section "domain adaptation."

SCM-based methods

Unlike the PO framework, SCM explicitly expresses the causal relationships between variables in a causal graph, constructed from prior knowledge, before analyzing causal effects. This intuitiveness has made it widely popular among researchers in the computing field. In this section, the corresponding strategies are classified according to their causal structures (i.e., collider, mediator, and confounder). We focus on how researchers abstract recommendation issues into causal problems with causal graphs and exploit causal inference tools to cope with them.

Causal recommendation with collider structure

A collider is a node that receives effects from two or more other variables. Colliders exist in RSs; for instance, an item's position in the ranking list is influenced by both user preference and item popularity.

Analyzing the dependency between variables in collider structures contributes to their utilization in RSs. Although A and B are independent, i.e., for all a and b, Pr(A=a|B=b)=Pr(A=a), conditioning on the collider node C introduces a dependence between the node's parents, i.e., for some a, b, and c, Pr(A=a|B=b,C=c)≠Pr(A=a|C=c). To understand this point, consider the most basic example where C=A+B, and A and B are independent variables.89 In this case, given C=10, knowing A=3 means we can immediately calculate that B=7. Thus, A and B are dependent given C=10. This characteristic suggests that in RS issues with a collider structure, knowing the common effect and one of the causes can provide information about the other cause.122
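The C=A+B example can be checked by direct enumeration in a small self-contained sketch:

```python
from itertools import product

# A and B independent and uniform over {0,...,9}; collider C = A + B.
states = [(a, b, a + b) for a, b in product(range(10), repeat=2)]

def p_a3(cond):
    """P(A=3 | cond) by enumeration over the 100 equally likely states."""
    matching = [s for s in states if cond(s)]
    return sum(1 for s in matching if s[0] == 3) / len(matching)

p_marginal = p_a3(lambda s: True)        # P(A=3)          = 0.1
p_given_b  = p_a3(lambda s: s[1] == 7)   # P(A=3 | B=7)    = 0.1  (B alone is uninformative)
p_given_c  = p_a3(lambda s: s[2] == 10)  # P(A=3 | C=10)   = 1/9
p_given_bc = p_a3(lambda s: s[1] == 7 and s[2] == 10)  # P(A=3 | B=7, C=10) = 1.0
```

Marginally, B tells us nothing about A; once we also condition on the collider C, knowing B pins A down completely.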

Although collider structures are prevalent in RSs, they are usually compounded with other causal relationships and treated as other causal structures, so little literature discusses pure colliders. A representative work is DICE (disentangling interest and conformity with causal embedding),66 which is proposed by Zheng et al. and tackles the popularity issue from the user's perspective instead of eliminating popularity bias from the item's perspective. Zheng et al. argue that users' interactions are driven by individual interest as well as users' conformity, the latter being independent of user interest and illustrating the tendency of users to follow others. A causal graph depicting this is shown in Figure 5A. From this point of view, DICE splits user and item embeddings into interest and conformity embeddings, respectively, and learns disentangled representations with conformity-specific and interest-specific data, driven by the colliding effect: if a user interacts with a less popular item, not conforming to the mainstream, it usually indicates that the user is highly interested in the item itself, and vice versa. Further, Ding et al. propose CIGC (causal incremental graph convolution),123 which includes a new operator named CED (colliding effect distillation), to efficiently retrain graph convolution network (GCN)-based recommender models. CED frames the entire incremental training phase as a causal graph (see Figure 5B) and creates a collider St, represented as the pairwise distance, between inactive nodes RIn,t and new data RAc,t. Therefore, the incremental integration data It can update both RAc,t and RIn,t, since conditioning on the collider St opens the path It→RAc,t→St←RIn,t.

Figure 5.


Causal graphs in SCM-based RSs

(A) DICE.

(B) CIGC.

(C) Simple mediator.

(D) Mediator with confounder, where I,Y,K, and U denote the cause, effect, mediator, and confounder variable of the mediator and the outcome, respectively.

(E and F) Causal graphs of the CCF model before and after intervention. U and I are user and item representation, respectively, Y is preference score, and H denotes user interaction history.

(G and H) Causal graphs illustrating the backdoor path.

Causal recommendation with mediator structure

When one variable causes another, it may do so not directly but through a set of mediating variables. For example, an item purchased by your friends increases your purchase probability not only directly through the recommendation that integrates the social network but also indirectly through increased trust in the item.

The key to utilizing the mediator structure is understanding the distinction between the direct and indirect effects of a change in treatment on the outcome, traditionally achieved by conditioning on the mediating variable.89 Specifically, as illustrated in Figure 5C, the total effect (ToE) of I=i on Y is defined as

ToE=Y(I=i,K(I=i))−Y(I=i*,K(I=i*)), (Equation 10)

I=i* refers to the situation where the value of I differs from the reality (i.e., the counterfactual).

ToE can be further decomposed into natural direct effect (NDE) and total indirect effect (TIE). NDE reflects the effect of I on Y through the direct path, i.e., I→Y, while K is set to the value it would take when I=i*:

NDE=Y(I=i,K(I=i*))−Y(I=i*,K(I=i*)). (Equation 11)

TIE is defined as the difference between ToE and NDE, denoted as:

TIE=ToE−NDE=Y(I=i,K(I=i))−Y(I=i,K(I=i*)), (Equation 12)

which represents the effect of I on Y through the indirect path I→K→Y. ToE can also be decomposed into NIE and total direct effect (TDE). NIE represents the effect of I on Y through the mediator, i.e., I→K→Y, while the direct effect I→Y is blocked by setting I to i*, denoted as

NIE=Y(I=i*,K(I=i))−Y(I=i*,K(I=i*)). (Equation 13)

In linear systems, NIE and TIE have the same value, and NDE and TDE have the same value.89,124
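These quantities and identities can be checked with a minimal linear SCM sketch; the coefficients a, b, and c below are hypothetical illustration values:

```python
# Linear toy SCM for the mediator graph I -> K -> Y plus the direct path I -> Y.
a, b, c = 2.0, 3.0, 0.5

def K(i):             # mediator: K = a * I
    return a * i

def Y(i, k):          # outcome: Y = b * I + c * K
    return b * i + c * k

i, i_star = 1.0, 0.0  # factual treatment i and counterfactual i*

ToE = Y(i, K(i)) - Y(i_star, K(i_star))        # total effect: b + c*a = 4
NDE = Y(i, K(i_star)) - Y(i_star, K(i_star))   # natural direct effect: b = 3
TIE = ToE - NDE                                # total indirect effect: c*a = 1
NIE = Y(i_star, K(i)) - Y(i_star, K(i_star))   # natural indirect effect: c*a = 1
TDE = ToE - NIE                                # total direct effect: b = 3
```

As the comments show, in this linear system NIE equals TIE and NDE equals TDE, matching the statement above.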

However, if there are confounders of the mediator and the outcome, as in the case of Wei et al.29 shown in Figure 5D, conditioning on the mediator means conditioning on a collider. This results in indirect dependence passing through the confounder to the outcome and misguiding the calculation of indirect effect. To tackle the problem, we should intervene on the mediator, which involves counterfactuals. The controlled direct effect (CDE) on Y of I is defined as

CDE=Y(do(I=i),do(K=k))−Y(do(I=i*),do(K=k)). (Equation 14)

Here, do(·) denotes the do-operator of do-calculus, and do(I=i) means setting the variable I to the value i. Please refer to section A.2 in the supplemental information for further details. The difference between NDE and CDE is explained in Glymour et al.89

Some works are generally interested in how much of the treatment's causal effect on variable Y is direct and how much is indirect, which is usually explored with the technique of mediation analysis,125,126 similar to SCM but without exogenous variables or the introduction of counterfactuals. For example, in early studies, Choi et al.127 conduct an experiment varying the level of social presence over hundreds of testers and examine the effect of social presence on users' reuse intention and trust through mediation analysis. A similar structure is used to evaluate how electronic word-of-mouth affects user interactions.128 Further, Yin et al.129 aim to separate the direct effects of the change in user behaviors in the tested product from the effects of changes in user behaviors in other products, also known as induced changes. For example, during the A/B test of a new version of the recommendation module, despite the overall sitewide CVR not showing a significant increase, it is important to differentiate the effects on the final outcomes of significant lifts in CTR on the recommendation list and significant decreases in CTR on organic search results. Therefore, they use causal mediation analysis (CMA) under the PO framework to estimate the causal effects of the induced changes and also discuss estimation in the situation where multiple unmeasured, causally dependent mediators exist, with the help of a directed acyclic graph.

Some other works utilize Pearl's counterfactual tools to cope with the mediator structure in order to improve accuracy.29,130,131 Wei et al.29 explore the popularity issue with the SCM framework and formulate the causal graph as Figure 5D shows. In this graph, the probability of interaction Y is influenced by three main factors: user-item matching (K(U,I)→Y), item popularity (I→Y), and user conformity (U→Y), the last two of which are usually ignored by existing models and thus result in a severe Matthew effect. Following this causal graph, Wei et al. propose MACR, a multi-task framework that consists of three modules to jointly learn the effects of U→Y, U&I→K→Y, and I→Y, respectively, during recommender training. MACR estimates the TIE of I on Y in counterfactual inference:

TIE=ToE−NDE (Equation 15)
=Y(U=u,I=i,K=K(U=u,I=i))−Y(U=u,I=i,do(K=K(U=u*,I=i*)))
=Yk(K(U=u,I=i))·σ(Yi(I=i))·σ(Yu(U=u))−Yk(K(U=u*,I=i*))·σ(Yi(I=i))·σ(Yu(U=u))
=ŷk·σ(ŷi)·σ(ŷu)−c·σ(ŷi)·σ(ŷu),

where σ(·) denotes the sigmoid function, and c is a hyperparameter that represents Yk(K(U=u*,I=i*)), the reference value of Yk(K(U=u,I=i)) in the counterfactual world. With counterfactual inference, MACR can rank items without popularity bias by reducing the direct effect from item properties to the ranking score.
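A minimal sketch of this inference rule follows; the module outputs and the choice of c below are hypothetical, not values from the MACR paper. The fused score ŷk·σ(ŷi)·σ(ŷu) is compared against the reference term c·σ(ŷi)·σ(ŷu):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def macr_score(y_k, y_u, y_i, c):
    """TIE-style ranking score: (y_k - c) * sigmoid(y_i) * sigmoid(y_u).
    y_k: matching-module score, y_u / y_i: user / item module scores,
    c: hyperparameter approximating the counterfactual reference matching."""
    return (y_k - c) * sigmoid(y_i) * sigmoid(y_u)

# A very popular item (large y_i) with weak matching vs. a niche item with
# strong matching. With c = 0 (no debiasing) the popular item ranks first;
# subtracting c = 1.0 removes the popularity-driven direct effect.
raw_popular = macr_score(1.05, 1.0, 3.0, 0.0)
raw_niche   = macr_score(3.00, 1.0, -1.0, 0.0)
deb_popular = macr_score(1.05, 1.0, 3.0, 1.0)
deb_niche   = macr_score(3.00, 1.0, -1.0, 1.0)
```

With these illustrative numbers the raw ranking favors the popular item, while the counterfactual (debiased) ranking favors the well-matched niche item.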

The work by Xu et al.130 regards the user interaction history H as a mediator (Figure 5E) and proposes CCF (causal collaborative filtering) to estimate Pr(Y=y|U=u,do(I=i)), where u,i is a user-item pair and y is the preference score for the pair. More specifically, H=fh(U=u) is a database retrieval operation that returns a user’s interaction history from the observational data, I=f0(U=u,H=h) means the recommended item I returned from the already deployed recommendation system based on the user and the user’s interaction history, and Y=f(U=u,I=i) represents the estimation of unbiased user preference on the item. Pr(Y=y|U=u,do(I=i)) adopts the conditional intervention to consider both observed and unobserved (counterfactual) interaction history, as presented in Figure 5F. The derivation result of Pr(Y=y|U=u,do(I=i)) is given here:

Pr(Y=y|U=u,do(I=i))=Pr(Y=y|U=u,do(I=f0(U=u,H=h)))=∑hPr(y|u,h,f0(U=u,H=h))Pr(h|u) (Equation 16)

Note that, if trained only with the observed history h, f(U=u,I=i) would naturally degenerate to the original recommendation model f0(U=u,H=h). Therefore, Xu et al. adopt a heuristic-based approach to generate counterfactual history h.

Causal recommendation with confounder structure

There is a large volume of published studies investigating the confounding structures in recommendation, since many data biases widespread in RSs are, essentially, confounding biases mentioned in the section “propensity score strategy.” Approaches to tackle confounder structures of existing literature can be categorized into four types: with backdoor adjustment, with instrumental variables (IVs), with front-door adjustment, and with deep-learning-based intervention.

The backdoor-based approach

Before introducing the backdoor adjustment approaches, let us briefly review the definitions of backdoor path and backdoor criterion.2

Definition 1 (backdoor path) is that, given a pair of treatment T and outcome variable Y, a path connecting T and Y is a backdoor path for (T,Y) if it satisfies that

  • (1)

    It is not a directed path (it contains an arrow pointing into T); and

  • (2)

    It is not blocked (it has no collider).

A backdoor path helps us identify confounders: a confounder is the central node of a fork on a backdoor path of (T,Y). The following two examples illustrate this.132 In Figure 5G, there is one backdoor path from T to Y, T←A→Y, indicating that A is the confounder. To estimate the effect of T on Y, we should eliminate the confounding bias by either controlling for A to block the backdoor path or running a randomized controlled experiment. Note that T→B←A→Y is blocked by the collider at B and is therefore not a backdoor path. In Figure 5H, we can control for C to close the backdoor path T←B←C→Y. Here, we present the formal definition of the backdoor criterion to deal with confounding effects.

Definition 2 (backdoor criterion) is that, given a pair of treatment T and outcome variable Y, a set of variables X satisfies the backdoor criterion if X contains no descendant of T and blocks all backdoor paths of (T,Y).

Based on the backdoor criterion, we can further derive the backdoor adjustment theorem, which adjusts fewer variables compared to the causal effect rule (definition 4 in supplemental information).

Definition 3 (backdoor adjustment) is that, if a set of variables X satisfies the backdoor criterion for T and Y, the causal effect of T on Y is identifiable and given by the formula

Pr(Y=y|do(T=t))=∑xPr(Y=y|T=t,X=x)Pr(X=x). (Equation 17)

To see what this means in practice, let us look at a concrete example, as presented in Figure 6A. Suppose we need to evaluate the effect of a newly deployed recommendation strategy (T) on users' click behavior (Y) on an online shopping platform. However, the time-varying consuming desire (A) makes it difficult to compare its effect with that of the existing strategy. For example, users might be more willing to spend due to the proximity of holidays, resulting in a seemingly better recommendation effect for the tested policy. However, consuming desire is unmeasurable and thus poses a challenge for do-calculus. Instead, we can control for an observed variable, the number of recent interactions (B), which satisfies the backdoor criterion for (T,Y). Therefore, adjusting for B to block the backdoor path T←A→B→Y will give us the true causal effect of recommendation T on click Y, formulated as

Pr(Y=y|do(T=t))=∑bPr(Y=y|T=t,B=b)Pr(B=b). (Equation 18)
Figure 6.


Causal graphs in backdoor-based approaches

(A) A causal graph representing the relationship between recommendation T, click Y, consuming desire A, and number of recent interactions B. The dotted circle indicates this variable is unobservable.

(B) A causal graph showing the relationships between a sudden shock in traffic Zi, total exposure Ti of the focal product i, product demand Di and Dj, and recommendation click-through Yij of related product j from the focal product i. The focal product i experiences an instantaneous shock Zi in traffic, while the product j recommended shown alongside does not. Vj means direct exposure of product j (e.g., through search or browsing), which is not influenced by recommendation.

(C) The causal graph used in DecRS, where U and I denote user and item representation, D represents the user historical distribution over item groups, G is the group-level user representation, and Y is the prediction score.

(D) The causal graph of PDA, in which U and I denote user and item representation, P is the item popularity, and Y stands for user interactions.
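The adjustment in Equation 18 can be checked on synthetic data. In this simplified sketch (all probabilities hypothetical), the activity level B drives both recommendation exposure T and clicks Y, so stratifying on B recovers the interventional click rate:

```python
import random

random.seed(0)

# Synthetic logs: activity level B confounds recommendation T and click Y.
samples = []
for _ in range(100_000):
    b = random.random() < 0.3                    # highly active user?
    t = random.random() < (0.8 if b else 0.2)    # active users get recommended more
    p_click = 0.1 + (0.4 if t else 0.0) + (0.3 if b else 0.0)
    y = random.random() < p_click
    samples.append((b, t, y))

# Naive estimate P(Y=1 | T=1) over-credits the recommendation, because the
# recommended population is disproportionately the active one.
treated = [(b, t, y) for b, t, y in samples if t]
naive = sum(y for _, _, y in treated) / len(treated)

# Backdoor adjustment: P(Y=1 | do(T=1)) = sum_b P(Y=1 | T=1, B=b) * P(B=b).
adjusted = 0.0
for b_val in (True, False):
    stratum = [y for b, t, y in samples if t and b == b_val]
    p_y_tb = sum(stratum) / len(stratum)
    p_b = sum(1 for b, _, _ in samples if b == b_val) / len(samples)
    adjusted += p_y_tb * p_b
# Ground truth from the generating process: 0.1 + 0.4 + 0.3 * 0.3 = 0.59.
```

The adjusted estimate lands near the true interventional rate 0.59, while the naive conditional rate is visibly inflated.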

Some literature on recommendation issues with confounder structures introduces the theory of the backdoor criterion.31,108,133,134 Huang et al.133 utilize the backdoor criterion to verify whether or not word-of-mouth recommendations can influence users' evaluations of the recommended items. Sharma et al.108 treat an instantaneous shock in direct traffic as an IV to answer, from purely observational data, the counterfactual question of how much interaction activity there would have been on an online shopping website if recommendations were absent, and they apply the backdoor criterion to block the possible unobserved confounding effect between the "exposure" Ti and "click" Yij, as Figure 6B shows. In addition, Tran et al.134 consider personalized job recommendation in disability employment services and present a causality-based method to tackle the problem, in which the covariate set is determined by the backdoor criterion.

A multitude of studies employ backdoor adjustment to block the backdoor path by directly intervening on the treatment variable.19,65,69,135,136,137,138,139,140,141 For example, Wang et al.69 propose the framework called DecRS (deconfounded recommender system) to eliminate bias amplification through intervention on the user representation U, which removes the effect of the historical user distribution over item groups D on U, as Figure 6C shows. Zhang et al.65 propose PDA (popularity-bias deconfounding and adjusting) to eliminate the effect of item popularity P through intervention on the item I (see Figure 6D), denoted as

Pr(Y=y|do(U=u,I=i))=∑pPr(y|u,i,p)Pr(p|do(u,i))=∑pPr(y|u,i,p)Pr(p), (Equation 19)

where U denotes the user representation and Y represents interactions. Pr(y|u,i,p) and Pr(p) are learned separately. It is worth mentioning that PDA can leverage popularity bias to enhance the recommendation performance by adjusting Pr(p) in the inference stage, which can be regarded as counterfactual inference. More recently, Zhang et al.139 address duration bias by identifying duration time as a confounder. Subsequently, they group data samples based on watch time feedback and craft novel duration supervision labels, thereby alleviating the confounding bias.
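A sketch of the inference-stage idea behind PDA follows; the matching scores, popularity values, and the exponent gamma are hypothetical illustration values, not from the paper. Once the matching term and the popularity term are learned separately, the popularity factor can be dropped (deconfounding) or re-injected with predicted future popularity (adjusting):

```python
# Multiplicative score: matching * popularity^gamma, mirroring PDA's
# separately learned Pr(y|u,i,p) and Pr(p) terms.
def pda_score(matching, popularity, gamma):
    return matching * popularity ** gamma

matching   = {"item_a": 0.6, "item_b": 0.5}   # deconfounded user-item matching
future_pop = {"item_a": 0.9, "item_b": 0.3}   # predicted future popularity

# Deconfounded ranking: gamma = 0 removes the popularity term entirely.
pd_scores  = {i: pda_score(matching[i], future_pop[i], 0.0) for i in matching}
# Adjusted ranking: re-inject predicted popularity to follow upcoming trends.
pda_scores = {i: pda_score(matching[i], future_pop[i], 0.5) for i in matching}
```

With gamma = 0 the score reduces to the matching term alone; a positive gamma leverages the predicted popularity as a counterfactual inference knob, as described above.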

From the above elaboration, we can identify a series of works that integrate SCM-based causal inference and RSs following a similar pattern, as shown in Figure 7. They first analyze the causal relationships among the variables relevant to the issue of concern and formulate a causal graph accordingly; after theoretical analysis, a multi-task or separated structure is adopted to learn the causal effects of the variables on the outcome during the training phase. Once training is complete, appropriate variables are selected for intervention during the inference stage (i.e., they are set to counterfactual values directly or indirectly), and the outcome is estimated with applicable causal rules (e.g., backdoor adjustment, TIE) to conduct counterfactual inference.

IV-based approach

The IV method is a powerful approach for learning causal effects with confounders: it can recover them even without controlling for, or collecting data on, the confounders.132 An IV causally influences the outcome only through the treatment (Figure 8A), defined as follows:

Figure 8.


Causal graphs in IV-based and front-door-based approaches

(A) The causal graph of a general setup for IVs, where Z is an IV.

(B) Proxy variables are easier to satisfy compared with the IVs.

(C) The causal graph of IV4Rec, which leverages search queries Z as IVs to decompose treatment T into a causal part Tca and a non-causal part Tno and combines them with different weights. X is a set of unmeasurable confounders, and Y represents users' interactions.

(D) A graphical model representing the front-door path, in which T denotes the treatment, and Y denotes outcomes. Unobserved confounders X exist in the causal effect TY, and K are variables that satisfy the front-door criterion.

(E) The causal graph for illustrating the relationship in the HCR framework. U, user features; X, hidden confounders; I, item features affected by X; Y, post-click interactions; M, click behaviors.

Definition 4 (instrumental variable) is that, given an observed variable Z, covariates X, the treatment T, and the outcome Y, Z is a valid instrumental variable (IV) for the causal effect of T→Y if Z satisfies:142

  • (1) Z⊥̸T|X, i.e., Z and T are dependent given X (relevance); and

  • (2) Z⊥Y|do(T),X, i.e., Z affects Y only through T (exclusion).

Although a popular tool, the IV method finds limited application in RSs because of the difficulty of finding variables that satisfy the IV conditions. As cited above, Sharma et al.108 utilize an instantaneous shock in direct traffic as an IV to evaluate the recommendation effect. Si et al.143 propose a model-agnostic framework called IV4Rec, which effectively decomposes the embedding vectors into two parts: a causal part, indicating a user's personal preference for an item, and a non-causal part, which merely reflects the statistical dependencies between users and items, such as the exposure mechanism and display position. In this framework, users' search behaviors serve as the IV. The causal graph is illustrated in Figure 8C.
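For intuition, the simplest IV estimator, the Wald estimator for a binary instrument, can be sketched on synthetic data with the structure of Figure 8A; all coefficients are hypothetical. The instrument recovers the true effect while a naive regression is biased by the unobserved confounder:

```python
import random

random.seed(1)

# Z -> T -> Y with an unobserved confounder X affecting both T and Y.
# The true causal effect of T on Y is 2.0.
data = []
for _ in range(100_000):
    x = random.gauss(0.0, 1.0)                 # unobserved confounder
    z = 1.0 if random.random() < 0.5 else 0.0  # binary instrument (e.g., a traffic shock)
    t = 1.0 * z + 1.5 * x + random.gauss(0.0, 1.0)
    y = 2.0 * t + 3.0 * x + random.gauss(0.0, 1.0)
    data.append((z, t, y))

def mean(vals):
    vals = list(vals)
    return sum(vals) / len(vals)

# Wald estimator: beta = (E[Y|Z=1] - E[Y|Z=0]) / (E[T|Z=1] - E[T|Z=0]).
beta_iv = ((mean(y for z, t, y in data if z) - mean(y for z, t, y in data if not z))
           / (mean(t for z, t, y in data if z) - mean(t for z, t, y in data if not z)))

# Naive OLS slope cov(T, Y) / var(T) absorbs the confounder's influence.
t_bar = mean(t for _, t, _ in data)
y_bar = mean(y for _, _, y in data)
beta_naive = (mean((t - t_bar) * (y - y_bar) for _, t, y in data)
              / mean((t - t_bar) ** 2 for _, t, _ in data))
```

Here beta_iv lands near the true effect 2.0, whereas beta_naive is pushed upward by the X-induced correlation between T and Y; X is never used by either estimator.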

Considering the stringent conditions often associated with IVs, a recent theoretical advancement144 proposes estimating treatment effects using an auxiliary variable, which requires less restrictive prerequisites than IVs. An example causal diagram for auxiliary variables is shown in Figure 8B, where Z serves as a proxy variable for the unmeasurable confounder. Building on this theory, Zhang et al.145 develop iDCF (identifiable deconfounder), which accounts for users' unmeasured socio-economic status X by employing the user's consumption level as a proxy variable Z. Z can be regarded as a descendant of the unobserved confounder X that is not directly causally associated with either the treatment or the outcome. Furthermore, they leverage iVAE146 to infer the conditional distribution of the latent confounder, thus resolving the non-identification issue encountered in Wang et al.49

The front-door-based approach

The front-door adjustment2 is another popular method for learning causal effects with unobserved confounders, in which we condition on a set of variables K that satisfies the front-door criterion.

Definition 5 (front-door criterion) is that, given a pair of treatment T and outcome variable Y, a set of variables K is said to satisfy the front-door criterion if:

  • (1)

    K intercepts all directed paths from T to Y

  • (2)

    There is no backdoor path from T to K, and

  • (3)

    All backdoor paths from K to Y are blocked by T.

A graph depicting the front-door criterion is shown in Figure 8D. In practice, K is usually the mediator of the causal effect T→Y. With the help of K, the causal effect of T on Y can be calculated as follows:

Definition 6 (front-door adjustment) is that, if K satisfies the front-door criterion relative to (T,Y) and Pr(T,K)>0, then the causal effect of T on Y is given by the formula

Pr(Y|do(T))=∑KPr(Y|do(K))Pr(K|do(T))=∑KPr(K|T)∑T′Pr(Y|T′,K)Pr(T′) (Equation 20)
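The front-door formula can be verified by enumeration on a binary toy SCM (all probabilities hypothetical): computed from observational quantities that never read the hidden confounder X, it matches the true interventional distribution:

```python
from itertools import product

# Binary toy SCM for the front-door graph: an unobserved X confounds T and Y,
# and K mediates every path from T to Y.
pX = {0: 0.5, 1: 0.5}
pT = lambda x: 0.2 + 0.6 * x               # P(T=1 | X=x)
pK = lambda t: 0.1 + 0.7 * t               # P(K=1 | T=t)
pY = lambda k, x: 0.1 + 0.6 * k + 0.2 * x  # P(Y=1 | K=k, X=x)

def bern(p, v):  # P(V=v) for V ~ Bernoulli(p)
    return p if v == 1 else 1.0 - p

joint = {(x, t, k, y): pX[x] * bern(pT(x), t) * bern(pK(t), k) * bern(pY(k, x), y)
         for x, t, k, y in product((0, 1), repeat=4)}

def p(pred):  # probability of an event under the joint distribution
    return sum(pr for s, pr in joint.items() if pred(*s))

def cond(pred, given):  # conditional probability P(pred | given)
    return p(lambda *s: pred(*s) and given(*s)) / p(given)

# Front-door adjustment, observational quantities only (X is never read):
# P(Y=1 | do(T=1)) = sum_k P(k|T=1) * sum_t' P(Y=1|t',k) * P(t').
front_door = sum(
    cond(lambda x, t, k, y, kv=kv: k == kv, lambda x, t, k, y: t == 1)
    * sum(cond(lambda x, t, k, y: y == 1,
               lambda x, t, k, y, tv=tv, kv=kv: t == tv and k == kv)
          * p(lambda x, t, k, y, tv=tv: t == tv)
          for tv in (0, 1))
    for kv in (0, 1))

# Ground truth by intervening on T in the SCM directly:
truth = sum(pX[x] * bern(pK(1), k) * pY(k, x) for x in (0, 1) for k in (0, 1))

# The naive conditional P(Y=1 | T=1) is inflated by the confounder.
naive = cond(lambda x, t, k, y: y == 1, lambda x, t, k, y: t == 1)
```

In this toy model the front-door estimate coincides exactly with the interventional ground truth, while the naive conditional probability overstates the effect.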

Zhu et al.147 propose the HCR (hidden confounder removal) framework to mitigate hidden confounding effects via front-door adjustment. In HCR, the user and item features U and I are treatments, post-click user behaviors Y are the outcome of concern, and the click feedback K acts as the mediator that satisfies the front-door criterion, as Figure 8E shows. However, in real-world recommendation scenarios, confounding bias also affects the estimation of the click feedback, which makes performing the front-door adjustment challenging. In fact, the front-door adjustment, like the IV method, finds little application in RSs because of the lack of eligible variables.

Deconfounded recommender algorithms

Rather than directly introducing causal techniques, some literature expands upon existing recommendation algorithms to deal with confounders, inspired by analyses from the perspective of causal inference. For example, Chaney et al.148 modify several traditional recommendation algorithms to explore the impact of algorithmic confounding, finding that the data-algorithm feedback loop amplifies the homogenization of user behavior without corresponding gains in utility and also amplifies the impact of recommendation systems on item consumption.

Some works integrate reinforcement learning-based RSs with causal inference to tackle the confounding issue. For example, DEMER (deconfounded multi-agent environment reconstruction)149 follows the generative adversarial training framework and treats the hidden confounder as a hidden policy; such a confounder affects both actions and rewards while an agent interacts with the environment and thus obstructs an effective reconstruction of the environment. In Yang et al.,150 user representations U are considered a confounder of the recommendation lists T and users' interactions Y on recommendation lists. To alleviate this confounding bias, CPR (counterfactual personalized ranking framework) builds a recommender simulator to generate new training samples based on the causal graph.

As for session-based RSs (SBRSs), Gupta et al.151 propose the CauSeR (causal session-based recommendation) framework to perform deconfounded training to handle popularity bias. COCO-SBRS (counterfactual collaborative session-based recommender systems)152 adopts a self-supervised approach to pre-train a recommendation model to learn the causalities in SBRSs so as to eliminate confounding bias and make accurate next-item recommendations. In terms of graph neural network (GNN)-based recommendations, Gas et al. infer the unobserved confounders existing in representation learning with the CVAE (conditional variational auto-encoder) model153 and apply it to GNN-based recommendation strategy.154

General counterfactuals-based methods

Some causal recommender approaches are established on the general concept of counterfactuals: worlds that do not exist but can be reasoned about with fundamental laws and human intuition. In this section, we introduce related strategies from the perspective of the recommender issues they try to address, including domain adaptation, data augmentation, fairness, and explanation.

Domain adaptation

RSs are trained and evaluated offline under the supervision of previously collected data, which usually suffer from selection bias and confounding bias. This results in a gap between the training goal and the actual recommendation objective, thus leading to suboptimal recommender algorithms. To address this issue, we hope to evaluate the training policy on unbiased data, collected under a randomized treatment policy. However, uniform data are always expensive and small in scale. To take full advantage of the uniform data, researchers train RSs with a small amount of unbiased data and a large amount of biased data, with the hope of learning the counterfactual distribution of the biased data, which is both a counterfactual problem and a domain adaptation problem.

Rosenfeld et al.119 and Bonner et al.22 train recommender policies using biased and unbiased data and add regularization terms to the loss function so that the distance between the parameters of the two policies, inspired by ITE, is controllable. Bonner et al.22 train an unbiased imputation model to impute the labels of all observed and unobserved events in biased and unbiased data and learn the final CTR model by combining the two datasets with the propensity-free DR method. Further, Liu et al.155 propose KDCRec (knowledge distillation framework for counterfactual recommendation), in which a teacher network fed with unbiased data is used to guide the biased model via four approaches.

Data augmentation

Data augmentation is uncontroversially a counterfactual problem, typically addressing questions such as "What would the user's decision have been if a different item had been exposed?" Therefore, some works aim to integrate counterfactuals into the procedure of data augmentation.

Xiong et al.156 generate new data samples based on users' feature-level preferences for review-based recommendation. To generate more effective samples, they leverage the "minimum" idea in counterfactuals, learning the minimum change of a user's feature-level preference that can "exactly" reverse the user's preference ranking on a given item pair. For example, slightly increasing the price attention of a user who had purchased an iPhone might make a Xiaomi phone more attractive to her; this would be regarded as an effective counterfactual sample. Similarly, CASR (counterfactual data-augmentation sequential recommendation)16 generates counterfactual sequences of items by making minimal adjustments to the user's historical items, such that her currently interacted item is exactly altered.

The CauseRec (counterfactual user sequence synthesis for sequential recommendation) proposed by Zhang et al.157 generates counterfactual data in a different way. It identifies indispensable and dispensable concepts in the historical behavior sequence: the former represent meaningful aspects of the user's interest, while the latter indicate noisy behaviors that are less important in representing user interest. It is therefore reasonable to argue that replacing indispensable concepts in the original user sequence incurs a preference deviation of the original user representation, while replacing the dispensable ones still yields a similar user representation. CauseRec achieves this through contrastive learning. Liu et al.158 focus on the recommendation scenario where users are exposed to decision factor-based persuasion texts, i.e., persuasion factors, and generate new training samples by making simple but reasonable counterfactual assumptions about user behaviors, including the following:

  • (1)

    If a user clicks on an item without the existence of persuasion factors, the user will still be likely to click on it with a matching persuasion factor.

  • (2)

    If a user does not click on an item with the existence of persuasion factors, the user will not click on it when the persuasion factor does not exist.

In recent work, Song et al.152 categorize the factors influencing user interactions in session-based RSs into two types: inner-session causes and outer-session causes, and then generate counterfactual data samples through a novel combination of original inner-session causes and outer-session causes from similar users.

Fairness and explanation

The counterfactual technique is a natural tool for the evaluation of fairness since we can compare the outcome (ratings, recommendation lists, etc.) in the real world and in the counterfactual world in which only users’ sensitive features (e.g., gender and race) are altered.159,160

Definition 7 (counterfactual fairness) is that a recommender model is counterfactually fair if, for any possible user u with features X=x and Z=z,

Pr(Y(do(Z=z))=y|X=x,Z=z)=Pr(Y(do(Z=z*))=y|X=x,Z=z) (Equation 21)

for any value y and any value z* attainable by Z, where Y denotes the PO for user u, Z are users' sensitive features, and X are causally Z-independent features.

Based on counterfactual fairness, Li et al. generate sensitive feature-independent user embeddings through adversarial learning.32 They simultaneously train a predictor to learn the filtered embedding and an adversarial classifier to predict the sensitive features from the learned representation. For reinforcement learning-based recommendation, F-UCB (fair causal bandit)159 picks arms at each round from a subset of arms that all satisfy the counterfactual fairness constraint that users receive similar rewards regardless of their sensitive attributes. Zhu et al.161 contend that directly removing or altering sensitive features will inevitably compromise the quality of recommendations, as these features can influence user interests in a fair way (e.g., racial influences on cultural preferences). To address this issue, their proposed PSF-RS delineates the influence of sensitive features on interaction outcomes into fair and unfair paths. It addresses the path-specific bias by minimally transforming the biased factual world into a hypothetically fair one.

As for explanation, counterfactuals describe a dependency on the external facts that lead to certain outcomes and thus allow researchers to reason about the behavior of a black-box algorithm.162 Literature on counterfactual explanation also resorts to the minimum idea in counterfactuals. For example, Ghazimatin et al.33 present PRINCE (provider-side interpretability with counterfactual evidence) to search for a set of minimal actions performed by the user that, if removed, would change the recommendation to a different item, in a heterogeneous information network with users, items, etc. To understand the point, consider the following example. If a user who has bought an iPhone and followed MacBook receives a recommendation about AirPods and would not have received it if she had not bought iPhone, PRINCE would regard the behavior “purchase of iPhone” as the explanation of the recommendation. Similarly, CountER (counterfactual explainable recommendation)34 seeks the minimum changes of item features that exactly reverse the recommendation decision.
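The minimal-change search behind such explanations can be sketched as brute-force enumeration over action subsets; the scorer and weights below are hypothetical stand-ins, not PRINCE's actual model:

```python
from itertools import combinations

# Hypothetical additive scorer: each past action contributes a fixed weight
# to each candidate item's score.
WEIGHTS = {
    "AirPods": {"buy_iphone": 0.8, "follow_macbook": 0.3},
    "Galaxy":  {"view_android": 0.6},
}

def top1(actions):
    scores = {item: sum(w for act, w in contrib.items() if act in actions)
              for item, contrib in WEIGHTS.items()}
    return max(sorted(scores), key=scores.get)  # sorted() gives a stable tie-break

def minimal_explanation(actions):
    """Smallest set of user actions whose removal flips the top-1 item."""
    original = top1(actions)
    for size in range(1, len(actions) + 1):     # try the smallest subsets first
        for removed in combinations(sorted(actions), size):
            if top1(actions - set(removed)) != original:
                return set(removed)
    return None

user_actions = {"buy_iphone", "follow_macbook", "view_android"}
explanation = minimal_explanation(user_actions)  # -> {"buy_iphone"}
```

Here removing only "buy_iphone" flips the top recommendation away from AirPods, so that single action is returned as the counterfactual explanation, mirroring the iPhone example above.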

The aforementioned studies are predominantly post hoc methods tailored to proprietary machine-learning models, which prevents the explanatory models from leveraging information in the predictive models. Guo et al.163 introduce CounterNet, which integrates the predictive model and the counterfactual explanation generator into an end-to-end framework. Beyond the scope of RSs, additional counterfactual explanation studies may serve as supplementary references.162,164,165,166

Conclusion

In this survey paper, we have summarized the mechanisms and strategies of causal inference for RSs from a theoretical perspective, covering methods based on the PO framework, the SCM framework, and general counterfactuals. The survey outlines the strengths of causal inference for recommendation and employs a theoretically coherent classification framework to review numerous existing causal recommender approaches. We hope this survey will assist researchers in the recommendation field in both innovation and application.

Acknowledgments

This review is supported by the National Key Research and Development Program of China under grant no. 2021ZD0113602, the National Natural Science Foundation of China under grant nos. 62176014 and 62276015, and the Fundamental Research Funds for the Central Universities.

Author contributions

F.Z. conceived and designed the research. H.L. and R.X. wrote and edited the manuscript. H.Z. and D.W. led the research paper collection. Z.A. and Y.X. supervised the review and revised the manuscript. All authors contributed to the article, read it, and approved the submitted version.

Declaration of interests

The authors declare no competing interests.

Lead contact website

The corresponding author website is at https://fuzhenzhuang.github.io.

Published Online: February 8, 2024


Supplemental information

Document S1. Figures S1–S3, Tables S1–S5, and Text S1–S3
Document S2. Article plus supplemental information

References

  • 1.Gelman A. Causality and statistical learning. arXiv. 2011 doi: 10.48550/arXiv.1003.2619. Preprint at. [DOI] [Google Scholar]
  • 2.Imbens G.W., Rubin D.B. Cambridge University Press; 2015. Causal Inference in Statistics, Social, and Biomedical Sciences. [Google Scholar]
  • 3.Pearl J. Cambridge University Press; 2009. Causality. [Google Scholar]
  • 4.Kessler R.C., Bossarte R.M., Luedtke A., et al. Machine learning methods for developing precision treatment rules with observational data. Behav. Res. Ther. 2019;120 doi: 10.1016/j.brat.2019.103412. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Shalit U. Can we learn individual-level treatment policies from clinical data? Biostatistics. 2020;21(2):359–362. doi: 10.1093/biostatistics/kxz043. [DOI] [PubMed] [Google Scholar]
  • 6.Lu X., Meng J., Wang H., et al. DNA replication stress stratifies prognosis and enables exploitable therapeutic vulnerabilities of HBV-associated hepatocellular carcinoma: An in-silico precision oncology strategy. Innovat. Med. 2023;1(1) doi: 10.59717/j.xinn-med.2023.100014. [DOI] [Google Scholar]
  • 7.Tan J., Li N., Wang X., et al. Associations of particulate matter with dementia and mild cognitive impairment in China: a multicenter cross-sectional study. Innovation. 2021;2(3) doi: 10.1016/j.xinn.2021.100147. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Schlotter M., Schwerdt G., Woessmann L. Econometric methods for causal evaluation of education policies and practices: a non-technical guide. SSRN Journal. 2011;19(2):109–137. doi: 10.2139/ssrn.1545152. [DOI] [Google Scholar]
  • 9.Wu L., Wang L., Li N., et al. Modeling the COVID-19 outbreak in China through multi-source information fusion. Innovation. 2020;1(2) doi: 10.1016/j.xinn.2020.100033. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Zhu Z., Chen B., Chen H., et al. Strategy evaluation and optimization with an artificial society toward a Pareto optimum. Innovation. 2022;3(5) doi: 10.1016/j.xinn.2022.100274. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Li S., Vlassis N., Kawale J., et al. Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence. 2016. Matching via dimensionality reduction for estimation of treatment effects in digital marketing campaigns. [Google Scholar]
  • 12.Fong C., Hazlett C., Imai K. Covariate balancing propensity score for a continuous treatment: Application to the efficacy of political advertisements. Ann. Appl. Stat. 2018;12(1):156–177. doi: 10.1214/17-aoas1101. [DOI] [Google Scholar]
  • 13.Zhu X., Ao X., Qin Z., et al. Intelligent financial fraud detection practices in post-pandemic era. Innovation. 2021;2(4) doi: 10.1016/j.xinn.2021.100176. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Radcliffe N. Using control groups to target on predicted lift: Building and assessing uplift model. Direct Marketing Analytics Journal. 2007:14–21. [Google Scholar]
  • 15.Gutierrez P., Gérardy J.-Y. International Conference on Predictive Applications and APIs. 2017. Causal inference and uplift modelling: A review of the literature. [Google Scholar]
  • 16.Wang Z., Zhang J., Xu H., et al. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2021. Counterfactual data-augmented sequential recommendation. [Google Scholar]
  • 17.Liu D., Cheng P., Zhu H., et al. Proceedings of the 15th ACM Conference on Recommender Systems. 2021. Mitigating confounding bias in recommendation via information bottleneck. [Google Scholar]
  • 18.He Y., Wang Z., Cui P., et al. Proceedings of the ACM Web Conference 2022. 2022. CausPref: Causal Preference Learning for Out-of-Distribution Recommendation. [Google Scholar]
  • 19.Wang X., Li Q., Yu D., et al. Causal Disentanglement for Semantic-Aware Intent Learning in Recommendation. IEEE Trans. Knowl. Data Eng. 2023 [Google Scholar]
  • 20.Mehrotra R., McInerney J., Bouchard H., et al. Proceedings of the 27th Acm International Conference on Information and Knowledge Management. 2018. Towards a fair marketplace: Counterfactual evaluation of the trade-off between relevance, fairness & satisfaction in recommendation systems. [Google Scholar]
  • 21.McInerney J., Brost B., Chandar P., et al. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2020. Counterfactual evaluation of slate recommendations with sequential reward interactions. [Google Scholar]
  • 22.Bonner S., Vasile F. Proceedings of the 12th ACM Conference on Recommender Systems. 2018. Causal embeddings for recommendation. [Google Scholar]
  • 23.Sato M., Singh J., Takemori S., et al. Proceedings of the 13th ACM Conference on Recommender Systems. 2019. Uplift-based evaluation and optimization of recommenders. [Google Scholar]
  • 24.Saito Y., Joachims T. Proceedings of the 15th ACM Conference on Recommender Systems. 2021. Counterfactual learning and evaluation for recommender systems: Foundations, implementations, and recent advances. [Google Scholar]
  • 25.Sato M. Proceedings of the 15th ACM Conference on Recommender Systems. 2021. Online Evaluation Methods for the Causal Effect of Recommendations. [Google Scholar]
  • 26.Pearl J. Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. 2018. Theoretical Impediments to Machine Learning With Seven Sparks from the Causal Revolution. [Google Scholar]
  • 27.Xu Y., Wang F., An Z., et al. Artificial intelligence for science—bridging data to wisdom. Innovation. 2023;4(6) doi: 10.1016/j.xinn.2023.100525. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Zhang Y., Feng F., He X., et al. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2021. Causal intervention for leveraging popularity bias in recommendation. [Google Scholar]
  • 29.Wei T., Feng F., Chen J., et al. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2021. Model-agnostic counterfactual reasoning for eliminating popularity bias in recommender system. [Google Scholar]
  • 30.Liang D., Charlin L., McInerney J., et al. Proceedings of the 25th International Conference on World Wide Web. 2016. Modeling user exposure in recommendation. [Google Scholar]
  • 31.Wang W., Feng F., He X., et al. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2021. Clicks can be cheating: Counterfactual recommendation for mitigating clickbait issue. [Google Scholar]
  • 32.Li Y., Chen H., Xu S., et al. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2021. Towards personalized fairness based on causal notion. [Google Scholar]
  • 33.Ghazimatin A., Balalau O., Saha Roy R., et al. Proceedings of the 13th International Conference on Web Search and Data Mining. 2020. PRINCE: Provider-side interpretability with counterfactual explanations in recommender systems. [Google Scholar]
  • 34.Tan J., Xu S., Ge Y., et al. Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 2021. Counterfactual explainable recommendation. [Google Scholar]
  • 35.Wu P., Li H., Deng Y., et al. Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence. 2022. On the Opportunity of Causal Learning in Recommendation Systems: Foundation, Estimation, Prediction and Challenges. [Google Scholar]
  • 36.Gao C., Zheng Y., Wang W., et al. Causal Inference in Recommender Systems: A Survey and Future Directions. ACM Trans. Inf. Syst. 2022 doi: 10.1145/3639048. [DOI] [Google Scholar]
  • 37.Zhu Y., Ma J., Li J. Causal Inference in Recommender Systems: A Survey of Strategies for Bias Mitigation, Explanation, and Generalization. arXiv. 2023 doi: 10.48550/arXiv.2301.00910. Preprint at. [DOI] [Google Scholar]
  • 38.Xu S., Ji J., Li Y., et al. Causal Inference for Recommendation: Foundations, Methods and Applications. arXiv. 2023 doi: 10.48550/arXiv.2301.04016. Preprint at. [DOI] [Google Scholar]
  • 39.Rubin D.B. Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies. J. Educ. Psychol. 1974;66(5):688–701. doi: 10.1037/h0037350. [DOI] [Google Scholar]
  • 40.Splawa-Neyman J., Dabrowska D.M., Speed T.P. On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Stat. Sci. 1990;5:465–472. doi: 10.1214/ss/1177012031. [DOI] [Google Scholar]
  • 41.Pearl J. Causal diagrams for empirical research. Biometrika. 1995;82(4):702–710. doi: 10.1093/biomet/82.4.702. [DOI] [Google Scholar]
  • 42.Pearl J. Morgan Kaufmann; 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. [Google Scholar]
  • 43.Saito Y., Joachims T. International Conference on Machine Learning. 2022. Off-Policy Evaluation for Large Action Spaces via Embeddings. [Google Scholar]
  • 44.Gomez-Uribe C.A., Hunt N. The netflix recommender system: Algorithms, business value, and innovation. ACM Trans. Manag. Inf. Syst. 2015;6(4):1–19. doi: 10.1145/2843948. [DOI] [Google Scholar]
  • 45.Kohavi R., Deng A., Frasca B., et al. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2013. Online controlled experiments at large scale. [Google Scholar]
  • 46.Steck H. Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2010. Training and testing of recommender systems on data missing not at random. [Google Scholar]
  • 47.Wang X., Zhang R., Sun Y., et al. International Conference on Machine Learning. 2019. Doubly robust joint learning for recommendation on data missing not at random. [Google Scholar]
  • 48.Wang M., Zheng X., Yang Y., et al. Proceedings of the AAAI Conference on Artificial Intelligence. 2018. Collaborative filtering with social exposure: A modular approach to social recommendation. [Google Scholar]
  • 49.Wang Y., Liang D., Charlin L., et al. Proceedings of the 14th ACM Conference on Recommender Systems. 2020. Causal inference for recommender systems. [Google Scholar]
  • 50.Joachims T., Swaminathan A., Schnabel T. Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. 2017. Unbiased learning-to-rank with biased feedback. [Google Scholar]
  • 51.Fang Z., Agarwal A., Joachims T. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2019. Intervention harvesting for context-dependent examination-bias estimation. [Google Scholar]
  • 52.Chen M., Liu C., Sun J., et al. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2021. Adapting Interactional Observation Embedding for Counterfactual Learning to Rank. [Google Scholar]
  • 53.Yu J., Yin H., Xia X., et al. Self-supervised learning for recommender systems: A survey. IEEE Trans. Knowl. Data Eng. 2023;36:335–355. doi: 10.1109/tkde.2023.3282907. [DOI] [Google Scholar]
  • 54.Zhou C., Ma J., Zhang J., et al. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2021. Contrastive learning for debiased candidate generation in large-scale recommender systems. [Google Scholar]
  • 55.Zhou G., Huang C., Chen X., et al. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 2023. Contrastive Counterfactual Learning for Causality-aware Interpretable Recommender Systems. [Google Scholar]
  • 56.Rubin D.B. Inference and missing data. Biometrika. 1976;63(3):581–592. [Google Scholar]
  • 57.Little R.J.A., Rubin D.B. John Wiley & Sons; 2019. Statistical Analysis with Missing Data. [Google Scholar]
  • 58.Marlin B.M., Zemel R.S. Proceedings of the Third ACM Conference on Recommender Systems. 2009. Collaborative prediction and ranking with non-random missing data. [Google Scholar]
  • 59.Pradel B., Usunier N., Gallinari P. Proceedings of the Sixth ACM Conference on Recommender Systems. 2012. Ranking with non-random missing ratings: influence of popularity and positivity on evaluation metrics. [Google Scholar]
  • 60.Correa J.D., Tian J., Bareinboim E. Proceedings of the AAAI Conference on Artificial Intelligence. 2019. Identification of causal effects in the presence of selection bias. [Google Scholar]
  • 61.Yuan B., Hsia J.-Y., Yang M.-Y., et al. Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 2019. Improving ad click prediction by considering non-displayed events. [Google Scholar]
  • 62.Bareinboim E., Pearl J. Proceedings of the AAAI Conference on Artificial Intelligence. 2012. Controlling selection bias in causal inference. [Google Scholar]
  • 63.Elwert F., Winship C. Endogenous selection bias: The problem of conditioning on a collider variable. Annu. Rev. Sociol. 2014;40:31–53. doi: 10.1146/annurev-soc-071913-043455. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Saito Y. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2020. Asymmetric tri-training for debiasing missing-not-at-random explicit feedback. [Google Scholar]
  • 65.Zhang J., Chen X., Zhao W.X. Proceedings of the 30th ACM International Conference on Information and Knowledge Management. 2021. Causally attentive collaborative filtering. [Google Scholar]
  • 66.Zheng Y., Gao C., Li X., et al. Proceedings of the Web Conference 2021. 2021. Disentangling user interest and conformity for recommendation with causal embedding. [Google Scholar]
  • 67.Hernán M.A., Hernández-Díaz S., Werler M.M., et al. Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. Am. J. Epidemiol. 2002;155(2):176–184. doi: 10.1093/aje/155.2.176. [DOI] [PubMed] [Google Scholar]
  • 68.Guo R., Cheng L., Li J., et al. A survey of learning causality with data: Problems and methods. ACM Comput. Surv. 2020;53(4):1–37. doi: 10.1145/3397269. [DOI] [Google Scholar]
  • 69.Wang W., Feng F., He X., et al. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2021. Deconfounded recommendation for alleviating bias amplification. [Google Scholar]
  • 70.Horvitz D.G., Thompson D.J. A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 1952;47(260):663–685. doi: 10.1080/01621459.1952.10483446. [DOI] [Google Scholar]
  • 71.Rosenbaum P.R. Model-based direct adjustment. J. Am. Stat. Assoc. 1987;82(398):387–394. doi: 10.1080/01621459.1987.10478441. [DOI] [Google Scholar]
  • 72.Rosenbaum P.R., Rubin D.B. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41–55. doi: 10.1017/cbo9780511810725.016. [DOI] [Google Scholar]
  • 73.Schnabel T., Swaminathan A., Singh A., et al. International Conference on Machine Learning. 2016. Recommendations as treatments: Debiasing learning and evaluation. [Google Scholar]
  • 74.Saito Y., Yaginuma S., Nishino Y., et al. Proceedings of the 13th International Conference on Web Search and Data Mining. 2020. Unbiased recommender learning from missing-not-at-random implicit feedback. [Google Scholar]
  • 75.Sato M., Takemori S., Singh J., et al. Proceedings of the 14th ACM Conference on Recommender Systems. 2020. Unbiased learning for the causal effect of recommendation. [Google Scholar]
  • 76.Zhang Y., Wang D., Li Q., et al. International Joint Conferences on Artificial Intelligence. 2021. User Retention: A Causal Approach with Triple Task Modeling. [Google Scholar]
  • 77.Zhang W., Zhang X., Chen D. Causal neural fuzzy inference modeling of missing data in implicit recommendation system. Knowl. Base Syst. 2021;222 doi: 10.1016/j.knosys.2020.106678. [DOI] [Google Scholar]
  • 78.Wu X., Chen H., Zhao J., et al. 2021. Unbiased Learning to Rank in Feeds Recommendation. [Google Scholar]
  • 79.Li Q., Wang X., Wang Z., et al. Be causal: De-biasing social network confounding in recommendation. ACM Trans. Knowl. Discov. Data. 2023;17(1):1–23. doi: 10.1145/3533725. [DOI] [Google Scholar]
  • 80.Li S., Yao L., Mu S., et al. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2021. Debiasing Learning based Cross-domain Recommendation. [Google Scholar]
  • 81.Christakopoulou K., Traverse M., Potter T., et al. Fourteenth ACM Conference on Recommender Systems. 2020. Deconfounding user satisfaction estimation from response rate bias. [Google Scholar]
  • 82.Ding S., Wu P., Feng F., et al. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2022. Addressing unmeasured confounder for recommendation with sensitivity analysis. [Google Scholar]
  • 83.Zhang X., Jia H., Su H., et al. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2021. Counterfactual reward modification for streaming recommendation with delayed feedback. [Google Scholar]
  • 84.Krauth K., Wang Y., Jordan M.I. Breaking Feedback Loops in Recommender Systems with Causal Inference. arXiv. 2022 doi: 10.48550/arXiv.2207.01616. Preprint at. [DOI] [Google Scholar]
  • 85.Gilotte A., Calauzènes C., Nedelec T., et al. Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. 2018. Offline a/b testing for recommender systems. [Google Scholar]
  • 86.Liu Y., Yen J.-N., Yuan B., et al. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2022. Practical Counterfactual Policy Learning for Top-K Recommendations. [Google Scholar]
  • 87.Swaminathan A., Joachims T. 2015. The Self-Normalized Estimator for Counterfactual Learning. [Google Scholar]
  • 88.Bottou L., Peters J., Quionero-Candela J., et al. Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising. J. Mach. Learn. Res. 2013;14(11) doi: 10.5555/2567709.2567766. [DOI] [Google Scholar]
  • 89.Glymour M., Pearl J., Jewell N.P. John Wiley & Sons; 2016. Causal Inference in Statistics: A Primer. [Google Scholar]
  • 90.Mohan K., Pearl J. Graphical models for processing missing data. J. Am. Stat. Assoc. 2021;116(534):1023–1037. doi: 10.1080/01621459.2021.1874961. [DOI] [Google Scholar]
  • 91.Xu D., Ruan C., Korpeoglu E., et al. Adversarial counterfactual learning and evaluation for recommender system. Adv. Neural Inf. Process. Syst. 2020 [Google Scholar]
  • 92.Funk M.J., Westreich D., Wiesen C., et al. Doubly robust estimation of causal effects. Am. J. Epidemiol. 2011;173(7):761–767. doi: 10.2139/ssrn.2387544. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93.Dudík M., Erhan D., Langford J., et al. Doubly Robust Policy Evaluation and Optimization. Stat. Sci. 2014;29:485–511. doi: 10.1214/14-sts500. [DOI] [Google Scholar]
  • 94.Jiang N., Li L. International Conference on Machine Learning. 2016. Doubly Robust Off-Policy Value Evaluation for Reinforcement Learning. [Google Scholar]
  • 95.Beygelzimer A., Langford J. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2009. The offset tree for learning with partial labels. [Google Scholar]
  • 96.Saito Y., Udagawa T., Kiyohara H., et al. Proceedings of the 15th ACM Conference on Recommender Systems. Association for Computing Machinery; 2021. Evaluating the Robustness of Off-Policy Evaluation. [Google Scholar]
  • 97.Thomas P., Brunskill E. International Conference on Machine Learning. 2016. Data-efficient off-policy policy evaluation for reinforcement learning. [Google Scholar]
  • 98.Wang Y.-X., Agarwal A., Dudík M. International Conference on Machine Learning. 2017. Optimal and adaptive off-policy evaluation in contextual bandits. [Google Scholar]
  • 99.Su Y., Dimakopoulou M., Krishnamurthy A., et al. 2020. Doubly Robust Off-Policy Evaluation with Shrinkage. [Google Scholar]
  • 100.Zhang W., Bao W., Liu X.-Y., et al. Proceedings of the Web Conference 2020. 2020. Large-scale causal approaches to debiasing post-click conversion rate estimation with multi-task learning. [Google Scholar]
  • 101.Guo S., Zou L., Liu Y., et al. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2021. Enhanced doubly robust learning for debiasing post-click conversion rate estimation. [Google Scholar]
  • 102.Kiyohara H., Saito Y., Matsuhiro T., et al. Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining. 2022. Doubly robust off-policy evaluation for ranking policies under the cascade behavior model. [Google Scholar]
  • 103.Mondal A., Majumder A., Chaoji V. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2022. ASPIRE: Air Shipping Recommendation for E-commerce Products via Causal Inference Framework. [Google Scholar]
  • 104.Xiao T., Wang S. Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining. 2022. Towards unbiased and robust causal ranking for recommender systems. [Google Scholar]
  • 105.Dai Q., Li H., Wu P., et al. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2022. A generalized doubly robust learning framework for debiasing post-click conversion rate prediction. [Google Scholar]
  • 106.Song Z., Chen J., Zhou S., et al. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 2023. CDR: Conservative Doubly Robust Learning for Debiased Recommendation. [Google Scholar]
  • 107.Li H., Zheng C., Wu P., et al. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2023. Who should be given incentives? counterfactual optimal treatment regimes learning for recommendation. [Google Scholar]
  • 108.Sharma A., Hofman J.M., Watts D.J. Proceedings of the Sixteenth ACM Conference on Economics and Computatio. 2015. Estimating the causal impact of recommendation systems from observational data. [Google Scholar]
  • 109.Yamane I., Yger F., Atif J., et al. Uplift modeling from separate labels. Neural Information Processing Systems. 2018 [Google Scholar]
  • 110.Zhang W., Li J., Liu L. A unified survey of treatment effect heterogeneity modelling and uplift modelling. ACM Comput. Surv. 2021;54(8):1–36. doi: 10.1145/3466818. [DOI] [Google Scholar]
  • 111.Nassif H., Kuusisto F., Burnside E.S., et al. ILP; 2013. Uplift Modeling with ROC: An SRL Case Study. (late breaking papers) [Google Scholar]
  • 112.Jaskowski M., Jaroszewicz S. ICML Workshop on Clinical Data Analysis. 2012. Uplift modeling for clinical trial data. [Google Scholar]
  • 113.Radcliffe N.J., Surry P.D. White Paper TR-2011-1. Stochastic Solutions; 2011. Real-world uplift modelling with significance-based uplift trees; pp. 1–33. [Google Scholar]
  • 114.Rzepakowski P., Jaroszewicz S. Decision trees for uplift modeling with single and multiple treatments. Knowl. Inf. Syst. 2012;32:303–327. doi: 10.1007/s10115-011-0434-0. [DOI] [Google Scholar]
  • 115.Goldenberg D., Albert J., Bernardi L., et al. Fourteenth ACM Conference on Recommender Systems. 2020. Free lunch! retrospective uplift modeling for dynamic promotions recommendation within roi constraints. [Google Scholar]
  • 116.Betlei A., Diemert E., Amini M.-R. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2021. Uplift modeling with generalization guarantees. [Google Scholar]
  • 117.Xie X., Liu Z., Wu S., et al. Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 2021. CausCF: Causal Collaborative Filtering for Recommendation Effect Estimation. [Google Scholar]
  • 118.Mehrotra R., Bhattacharya P., Lalmas M. Proceedings of the 14th ACM Conference on Recommender Systems. 2020. Inferring the Causal Impact of New Track Releases on Music Recommendation Platforms through Counterfactual Predictions. [Google Scholar]
  • 119.Rosenfeld N., Mansour Y., Yom-Tov E. Proceedings of the 26th International Conference on World Wide Web Companion. 2017. Predicting counterfactuals from large historical data and small randomized trials. [Google Scholar]
  • 120.Yao J., Wang F., Ding X., et al. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2022. Device-cloud Collaborative Recommendation via Meta Controller. [Google Scholar]
  • 121.Tran H.X., Le T.D., Li J., et al. 2022. What is the Most Effective Intervention to Increase Job Retention for this Disabled Worker? [Google Scholar]
  • 122.Zhang Y., Wang W., Wu P., et al. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2022. Causal Recommendation: Progresses and Future Directions. [Google Scholar]
  • 123.Ding S., Feng F., He X., et al. Causal incremental graph convolution for recommender system retraining. IEEE Transact. Neural Networks Learn. Syst. 2022:1–11. doi: 10.1109/tnnls.2022.3156066. [DOI] [PubMed] [Google Scholar]
  • 124.Pearl J. Probabilistic and Causal Inference: The Works of Judea Pearl. Morgan & Claypool; 2022. Direct and indirect effects; pp. 373–392. [Google Scholar]
  • 125.Kenny D.A. Wiley; 1979. Correlation and Causality. [Google Scholar]
  • 126.Baron R.M., Kenny D.A. The moderator--mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. J. Pers. Soc. Psychol. 1986;51(6):1173–1182. doi: 10.1145/3447548.3467395. [DOI] [PubMed] [Google Scholar]
  • 127.Choi J., Lee H.J., Kim Y.C. The influence of social presence on customer intention to reuse online recommender systems: The roles of personalization and product type. Int. J. Electron. Commer. 2011;16(1):129–154. doi: 10.2753/jec1086-4415160105. [DOI] [Google Scholar]
  • 128.Luo C., Luo X.R., Schatzberg L., et al. Impact of informational factors on online recommendation credibility: The moderating role of source credibility. Decis. Support Syst. 2013;56:92–102. doi: 10.1016/j.dss.2013.05.005. [DOI] [Google Scholar]
  • 129.Yin X., Hong L. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2019. The identification and estimation of direct and indirect effects in A/B tests through causal mediation analysis. [Google Scholar]
  • 130.Xu S., Ge Y., Li Y., et al. Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval. 2023. Causal collaborative filtering. [Google Scholar]
  • 131.Gao C., Wang S., Li S., et al. CIRS: Bursting Filter Bubbles by Counterfactual Interactive Recommender System. ACM Trans. Inf. Syst. 2024;42:1–27. doi: 10.1145/3594871. [DOI] [Google Scholar]
  • 132.Pearl J., Mackenzie D. Basic Books; 2018. The Book of Why: The New Science of Cause and Effect. [Google Scholar]
  • 133.Huang J., Cheng X.-Q., Shen H.-W., et al. Proceedings of the Fifth ACM International Conference on Web Search and Data Mining. 2012. Exploring social influence via posterior effect of word-of-mouth recommendations. [Google Scholar]
  • 134.Tran H.X., Le T.D., Li J., et al. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2021. Recommending the Most Effective Intervention to Improve Employment for Job Seekers with Disability. [Google Scholar]
  • 135.He X., Zhang Y., Feng F., et al. Addressing Confounding Feature Issue for Causal Recommendation. ACM Trans. Inf. Syst. 2023;41:1–23. doi: 10.1145/3559757. [DOI] [Google Scholar]
  • 136.Zhan R., Pei C., Su Q., et al. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2022. Deconfounding Duration Bias in Watch-time Prediction for Video Recommendation. [Google Scholar]
  • 137.Rajanala S., Pal A., Singh M., et al. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2022. DeSCoVeR: Debiased Semantic Context Prior for Venue Recommendation. [Google Scholar]
  • 138.Xia Y., Wu J., Yu T., et al. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2023. User-Regulation Deconfounded Conversational Recommender System with Bandit Feedback. [Google Scholar]
  • 139.Zhang Y., Bai Y., Chang J., et al. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 2023. Leveraging Watch-time Feedback for Short-Video Recommendations: A Causal Labeling Framework. [Google Scholar]
  • 140.Yu D., Li Q., Yin H., et al. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 2023. Causality-guided Graph Learning for Session-based Recommendation. [Google Scholar]
  • 141.Tsoumas I., Giannarakis G., Sitokonstantinou V., et al. Proceedings of the AAAI Conference on Artificial Intelligence. 2023. Evaluating digital agriculture recommendations with causal inference. [Google Scholar]
  • 142.Angrist J., Imbens G., Rubin D.B. Identification of causal effects using instrumental variables. J. Am. Stat. Assoc. 1993;91(434):444–455. doi: 10.3386/t0136. [DOI] [Google Scholar]
  • 143.Si Z., Han X., Zhang X., et al. Proceedings of the ACM Web Conference 2022. 2022. A Model-Agnostic Causal Learning Framework for Recommendation Using Search Data.
  • 144.Miao W., Hu W., Ogburn E.L., et al. Identifying effects of multiple treatments in the presence of unmeasured confounding. J. Am. Stat. Assoc. 2023;118(543):1953–1967. doi: 10.1080/01621459.2021.2023551.
  • 145.Zhang Q., Zhang X., Liu Y., et al. Debiasing Recommendation by Learning Identifiable Latent Confounders. arXiv preprint. 2023. doi: 10.48550/arXiv.2302.05052.
  • 146.Khemakhem I., Kingma D., Monti R., et al. International Conference on Artificial Intelligence and Statistics. 2020. Variational Autoencoders and Nonlinear ICA: A Unifying Framework.
  • 147.Zhu X., Zhang Y., Feng F., et al. Mitigating Hidden Confounding Effects for Causal Recommendation. arXiv preprint. 2022. doi: 10.48550/arXiv.2205.07499.
  • 148.Chaney A.J.B., Stewart B.M., Engelhardt B.E. Proceedings of the 12th ACM Conference on Recommender Systems. 2018. How Algorithmic Confounding in Recommendation Systems Increases Homogeneity and Decreases Utility.
  • 149.Shang W., Yu Y., Li Q., et al. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019. Environment Reconstruction with Hidden Confounders for Reinforcement Learning Based Recommendation.
  • 150.Yang M., Dai Q., Dong Z., et al. Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 2021. Top-N Recommendation with Counterfactual User Preference Simulation.
  • 151.Gupta P., Sharma A., Malhotra P., et al. Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 2021. CauSeR: Causal Session-based Recommendations for Handling Popularity Bias.
  • 152.Song W., Wang S., Wang Y., et al. Proceedings of the ACM Web Conference 2023. 2023. A Counterfactual Collaborative Session-based Recommender System.
  • 153.Sohn K., Yan X., Lee H. Neural Information Processing Systems. 2015. Learning Structured Output Representation Using Deep Conditional Generative Models.
  • 154.Gao J., Yang M., Liu Y., et al. Advances in Knowledge Discovery and Data Mining. 2021. Deconfounding Representation Learning Based on User Interactions in Recommendation Systems.
  • 155.Liu D., Cheng P., Dong Z., et al. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2020. A General Knowledge Distillation Framework for Counterfactual Recommendation via Uniform Data.
  • 156.Xiong K., Ye W., Chen X., et al. Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 2021. Counterfactual Review-based Recommendation.
  • 157.Zhang S., Yao D., Zhao Z., et al. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2021. CauseRec: Counterfactual User Sequence Synthesis for Sequential Recommendation.
  • 158.Liu C., Gao C., Yuan Y., et al. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2022. Modeling Persuasion Factor of User Decision for Recommendation.
  • 159.Huang W., Zhang L., Wu X. Proceedings of the AAAI Conference on Artificial Intelligence. 2022. Achieving Counterfactual Fairness for Causal Bandit.
  • 160.Wei T., He J. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2022. Comprehensive Fair Meta-learned Recommender System.
  • 161.Zhu Y., Ma J., Wu L., et al. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2023. Path-Specific Counterfactual Fairness for Recommender Systems.
  • 162.Wachter S., Mittelstadt B., Russell C. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harv. J.L. & Tech. 2017;31:841. doi: 10.2139/ssrn.3063289.
  • 163.Guo H., Nguyen T.H., Yadav A. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2023. CounterNet: End-to-End Training of Prediction Aware Counterfactual Explanations.
  • 164.Joshi S., Koyejo O., Vijitbenjaronk W., et al. Towards Realistic Individual Recourse and Actionable Explanations in Black-Box Decision Making Systems. arXiv preprint. 2019. doi: 10.48550/arXiv.1907.09615.
  • 165.Nemirovsky D., Thiebaut N., Xu Y., et al. Uncertainty in Artificial Intelligence. 2022. CounteRGAN: Generating Counterfactuals for Real-Time Recourse and Interpretability Using Residual GANs.
  • 166.Pawelczyk M., Broelemann K., Kasneci G. Proceedings of The Web Conference 2020. 2020. Learning Model-Agnostic Counterfactual Explanations for Tabular Data.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Document S1. Figures S1–S3, Tables S1–S5, and Text S1–S3
mmc1.pdf (342.2KB, pdf)
Document S2. Article plus supplemental information
mmc2.pdf (5.7MB, pdf)

Articles from The Innovation are provided here courtesy of Elsevier
