Observational Studies. 2025 Jun 25;11(2):209–212. doi: 10.1353/obs.2025.a963649

Review of “A First Course in Causal Inference” by Peng Ding

Nicole E Pashley
PMCID: PMC12959899  PMID: 41788419

Abstract

This is a review of Peng Ding’s textbook “A First Course in Causal Inference.” The book builds causal inference topics up from basics in experiments to complex observational studies. This review discusses the book’s style and content as well as who should use this book.

Keywords: causal inference, book review


These days, if you tell a statistician that you do research in causal inference, they are likely to reply “that’s a hot area!” Many fields have long been interested in addressing causal questions, but the subject has experienced a recent “boom” across many domains. Accompanying that boom is an increased desire to have causal inference courses included in statistics and data science programs, at both the graduate and undergraduate levels. This desire is natural: with the rise of causal inference in a broad array of fields and industries, it is increasingly important that students and researchers understand causal concepts well enough to avoid drawing precarious and dubious causal conclusions. “A First Course in Causal Inference” by Peng Ding is a welcome addition to the textbooks available for teaching an introductory course on causal inference. Unlike many other causal inference textbooks, which are written for applied researchers or econometricians, this book is (to my mind) clearly written with a statistical audience in mind. I would recommend this textbook to advanced undergraduate students or graduate students in statistics (or related fields). Although appendices are provided to review key concepts in probability, inference, and regression, I believe the book would be most appropriate for students who are already familiar with the basics of these concepts.

The textbook focuses on frequentist inference using the potential outcomes framework. The organization walks students from the nuances of causality and the basics of the potential outcomes framework through to advanced techniques for analyzing observational studies. The book is broken into six parts (excluding appendices):

  • Part I provides two chapters of introduction to causality and potential outcomes,

  • Part II provides seven chapters reviewing inference in randomized experiments covering a variety of designs and inference in the Fisherian, Neyman, and linear regression styles,

  • Part III provides six chapters on observational studies, primarily focused on propensity score methods including weighting, matching, and linear regression,

  • Part IV provides five chapters discussing the challenges of conducting causal inference in observational studies including violations of assumptions, sensitivity analyses, and e-values,

  • Part V provides five chapters on instrumental variables covering multiple perspectives and more modern applications such as fuzzy regression discontinuity designs and Mendelian randomization,

  • Part VI provides four chapters on causal mechanisms with post-treatment variables including principal stratification, mediation analysis, and time-varying treatments and confounding.

The book provides a recommendation of how to cover the key topics within a one-semester course, emphasizing which material might be considered “optional.” The decision to cover randomized experiments first, and at length, is in the style of Imbens and Rubin (2015). Even skipping or abbreviating the material on experiments, there is plenty of material in the book to fill a semester. Although it extensively covers the frequentist approach to causal inference with potential outcomes, the book notes a few omitted topics that instructors should be aware of, among them difference-in-differences, synthetic controls and other panel data methodologies, and machine learning approaches. I would note two other omissions: (i) interference or causal inference on networks and (ii) ethics (to be fair, I believe every statistics textbook should include a chapter on this latter topic and most do not).

As an instructor, I find much to like in this book. From the start, I was able to imagine how I could easily use this book to create course material (and, indeed, I am using this book to teach a section on causal inference in a master's in data science course). Likely owing to the origin of the material as lecture notes, the content is organized into short, digestible sections. The presentation provides a good balance of intuition and theory. Easy-to-implement R code examples are provided, which will help students engage in a hands-on manner. The code includes simulations to build intuition for the statistical results presented as well as data examples. The code and datasets are conveniently publicly available online. At the start of the textbook, tables of acronyms and notation are provided – these are items my students commonly ask for. Each chapter provides a set of practice problems, typically a mix of theory and application. The theory questions frequently involve deriving results mentioned in the chapter, giving instructors the opportunity to tailor how much they wish students to tangle with the theory. This mix of questions seems particularly useful in mixed undergraduate and graduate classes, where additional theory problems could be assigned to graduate students but not undergraduates.

Ding leans into the exposition of experiments to build concepts from the ground up, weaving principles from experimentation into more complex approaches for observational studies. For example, the reader is reminded how propensity score stratification is conceptually related to stratified randomized experiments. These connections make the more complicated observational study techniques more approachable and give the reader a sense of cohesion. Moreover, throughout the text, nuanced points are carefully and concisely made, shedding light on issues that are often overlooked or not understood until one is well entrenched in the subject (and sometimes not even then!). For example, the performance of regression methods is carefully demonstrated, as are their connections to different experimental designs. Recent results are referenced throughout the text, giving the reader a good overview of the current state of the literature. Refreshingly, Ding also points out where debates still exist in the literature or where further theory needs to be developed (e.g., criticisms of the doubly robust estimator are explained along with its merits). In addition to using this book as a course reference, I plan to have PhD students who are interested in research in causal inference but new to the field read this book to get up to speed.

There are a few points that instructors interested in using this book should be aware of. First, owing likely to the origin of the material as lecture notes, the book is concise and to the point, which may not suit all audiences. I would also say it is relatively mathematics-forward. For instance, examples are presented to illustrate statistical ideas, with only basic background discussed. This makes the book particularly appropriate for an audience with a quantitative background, but it may be less approachable for others. Second, the textbook could be made more visually appealing, particularly for more junior students. This is a (surprisingly, to me) common complaint I hear from students: the textbook does not have enough pictures, it is just words and math. Although the book does make use of some helpful visuals, I would recommend even more. Third, the textbook uses R for demonstration of ideas. One of my graduate students commented that this was one of their favorite aspects of the book – that the code and simulations made the theoretical ideas quickly apparent. However, if students in the course are not already familiar with R, or the instructor wishes to use a different language, some of the accessibility of the textbook is lost. Despite these notes, I find the book well thought out.

Overall, I am pleased to see “A First Course in Causal Inference” by Peng Ding in press. Ding should be commended on a wonderfully useful textbook. I would heartily recommend it to instructors teaching a causal inference course in a statistics department, and am excited to continue to put it to use myself.

References

  1. Imbens Guido W., Rubin Donald B. Causal inference for statistics, social, and biomedical sciences. Cambridge University Press; 2015.

Articles from Observational Studies are provided here courtesy of University of Pennsylvania Press
