F1000Research. 2025 Jul 25;14:213. Originally published 2025 Feb 14. [Version 2] doi: 10.12688/f1000research.161735.2

A structured dataset of the Federalist Society’s public engagements

Chad M Topaz 1,2,3,a
PMCID: PMC12203041  PMID: 40584889

Version Changes

Revised. Amendments from Version 1

This revised version incorporates clarifications and additions based on two referee reports. First, I have added language to the Missing Data and Limitations section to clarify that missingness in the dataset arises from two distinct sources: (1) structural design choices in how certain event types are documented (e.g., informal gatherings lacking topical labels), and (2) absent content in otherwise expected fields (e.g., speaker biographies that contain only disclaimers). This distinction helps users interpret missingness more accurately. Second, I have added a supplemental file (event_topics.csv) that presents one row per event–topic pair, in response to a suggestion that the original pipe-delimited format may not be optimal for all users, particularly those working with relational databases or conducting topic-level analysis. This file is described in both the manuscript and the project repository. Third, I have clarified the jurisdictional scope of the dataset’s privacy and ethics review by noting that, while the project complies with U.S. data use standards, researchers in other countries (especially in the European Union and the European Free Trade Association) should ensure compliance with local data protection laws. Finally, I have added a bulleted list of other public datasets that may be usefully linked to this one, such as the Federal Judicial Center’s biographical directory and the Supreme Court Database, along with brief descriptions and links. These additions are intended to help researchers extend their analyses without modifying the original data structure.

Abstract

Background

The Federalist Society, a leading conservative legal organization, has played a significant role in shaping the American judiciary for decades. Despite its influence, comprehensive empirical data on the organization remains scarce. We address this gap by systematically documenting 20,205 public events hosted by the Society from 1984 to 2024, with substantive coverage from 2007 onward.

Methods

Following ethical best practices in data collection and ownership, we gathered event metadata—including titles, dates, locations, sponsors, topics, and speakers—via web scraping from the Federalist Society’s archives. The dataset is structured to facilitate analysis of event trends, co-speaking networks, and thematic shifts over time. To ensure data integrity, we performed validation, deduplication, and cleaning, resulting in a well-structured, high-quality dataset.

Conclusions

This dataset provides an empirical foundation for examining the Federalist Society’s role in legal discourse. It is a resource for scholars in law, political science, sociology, gender studies, network science, and computational social science, enabling investigations into key themes of discussion, event participation patterns, demographic diversity, professional backgrounds, and more. Researchers can use it to study institutional influence, legal networks, and the evolution of conservative legal thought.

Keywords: law, courts, judges, judicial appointments, organizational dynamics

Introduction

The Federalist Society for Law and Public Policy Studies, established in 1982, is a prominent conservative and libertarian legal organization with nearly 100,000 members dedicated to reforming the current legal order. 1 It has significantly influenced the American judiciary by advocating for originalism and textualism in constitutional interpretation. 2 The Society has played a pivotal role in the nomination and confirmation of federal judges, including Supreme Court justices. During President Donald Trump’s first term, its leadership collaborated closely with the administration to recommend judicial nominees, 3 resulting in the appointment of 234 federal judges with lifetime tenure under Article III of the U.S. Constitution, including three of the nine current Supreme Court justices. 4

Despite the Society’s well-documented influence on judicial appointments and legal discourse, there is a lack of comprehensive empirical data on it. ProPublica’s Nonprofit Explorer 5 provides some access to financial records through tax disclosures. Amanda Hollis-Brusky’s influential book Ideas with Consequences 6 analyzes the Society’s strategies and ideological reach. Still, most research has relied on qualitative analysis, interviews, or case studies rather than systematic, large-scale data. Empirical studies quantifying the Society’s influence remain scarce, and scholars have had to rely on fragmented sources when attempting to study its role in shaping the judiciary and legal thought. The absence of a structured, machine-readable dataset capturing the Society’s public engagements limits researchers’ ability to assess its operations and goals. Without a comprehensive dataset, it is difficult to rigorously analyze questions about the Society’s organizational reach, its ideological evolution, or its impact on professional pathways within the legal system.

To address this gap, we compiled and structured data on 20,205 public events hosted by the Federalist Society between 1984 and 2024, with substantive coverage beginning in 2007. The dataset contains detailed metadata, including event titles, dates, locations, sponsors, topics, and speakers, and is designed to facilitate quantitative and qualitative research on the Society’s role in shaping legal discourse. By making this data openly available, we aim to provide scholars with a robust empirical foundation for analyzing patterns of legal engagement, speaker networks, and thematic evolution within the Federalist Society’s programming.

Potential research applications

Our dataset provides a foundation for numerous lines of inquiry into the Federalist Society’s structure, influence, and evolution. Below, we outline seven areas of research that could be of interest to legal scholars, political scientists, network scientists, organizational theorists, gender scholars, sociologists, and computational social scientists.

First, researchers could examine how the Federalist Society’s presence has changed over time and across regions. Analyzing the geographic distribution of events could reveal areas of strong or limited activity, while tracking the number of unique speakers per year may indicate whether the network has expanded or remained insular. Shifts in event types and frequency—such as transitions from lectures to panel discussions or from in-person gatherings to virtual forums—may reflect broader trends in legal discourse and engagement strategies.

Second, mapping relationships among speakers could illuminate the structure of the co-speaking network. Identifying frequently recurring figures, as well as speakers who bridge multiple topic areas, would provide insight into key influencers within the organization. Tracking individual trajectories over time may also reveal whether certain speakers later attain judicial or political appointments. Additionally, this dataset enables an examination of professional clustering within the speaker network, revealing whether academics, practitioners, or judges tend to form distinct communities.

Third, event titles and descriptions offer a rich source of information for analyzing the substantive focus of the Federalist Society’s discussions. Using natural language processing, researchers could identify dominant themes, track how these themes evolve, and determine whether particular topics gain prominence following external events such as Supreme Court rulings or shifts in political leadership. Regional variations in topic selection could reveal how different chapters tailor their programming. The dataset also allows for an examination of whether larger panels tend to focus on particularly contentious or consensus-driven topics.

Fourth, the dataset provides an opportunity to study gender representation within the Federalist Society’s speaker network. Many biographical entries include titles (Mr., Ms., Mrs., etc.) or pronouns, which can be ethically leveraged to infer gender representation. Since these biographies are either self-provided or reviewed by the speakers themselves, this method respects individual identity. Researchers could analyze trends in gender balance over time, assess whether women are more or less likely to be repeat speakers, and evaluate how gender diversity varies across different topic areas. Co-speaking relationships could also be examined to determine whether speakers tend to appear alongside others of the same gender.

Fifth, biographical information about speakers’ affiliations makes it possible to assess the balance between academic, judicial, and practitioner speakers. A historical view of how these affiliations shift could reveal whether the Society has become more or less dominated by certain professional backgrounds. Furthermore, tracking career movements—such as transitions from academia to the judiciary—could shed light on the role of Federalist Society events in shaping legal careers.

Sixth, for those interested in statistical modeling, this dataset could help identify factors that predict speaker recurrence. Certain credentials or professional backgrounds may be associated with a higher likelihood of being invited back. The dataset also enables modeling of topic diffusion across the network, identifying patterns in the spread of legal and ideological discourse. The structure of event panels could be analyzed to determine what factors are associated with whether an event features a single speaker or a larger panel discussion.

Finally, this dataset supports investigations into how the Federalist Society’s programming responds to external political and legal developments. Certain topics may become more prevalent following major Supreme Court rulings, elections, or shifts in judicial appointments. Additionally, researchers could analyze whether certain speakers—particularly those who later become federal judges—appear more frequently alongside future nominees, offering insights into the Society’s role in shaping judicial careers.

Overall, researchers may wish to enrich their analyses by linking this dataset to other structured sources that track judicial careers, rulings, political affiliations, or institutional networks. Examples include:

  • Federal Judicial Center – Biographical Directory of Article III Federal Judges. Maintained by the Federal Judicial Center, this directory provides biographical and appointment information for all Article III federal judges since 1789.

  • Supreme Court Database. Hosted by Pennsylvania State University, this database provides detailed case-level and justice-level data on U.S. Supreme Court decisions from 1791 to the present, including votes, issues, and legal provisions.

  • Caselaw Access Project. Run by the Harvard Law School Library Innovation Lab, this project offers structured access to over 40 million pages of U.S. court decisions dating up through at least 2020, including metadata on opinions, courts, and citations.

  • Oyez Project. A multimedia archive created by Cornell’s Legal Information Institute, Justia, and Chicago-Kent College of Law, Oyez provides transcripts, audio, and biographies related to Supreme Court oral arguments and justices.

  • OpenSecrets (Center for Responsive Politics). An independent nonprofit, OpenSecrets provides campaign finance and lobbying data, which can be used to investigate the political affiliations or institutional ties of event speakers who are attorneys, nominees, or donors.

Methods

This dataset documents past public events organized by the Federalist Society. The dataset includes details about event titles, dates, locations, topics, sponsors, event descriptions, speaker names, and speaker biographies. We collected data programmatically through web scraping in order to ensure comprehensive coverage of all events listed on the Federalist Society’s website. We performed data collection on December 4, 2024. Our code is available in our Open Science Framework repository. 7 We carried out scraping and processing using open source tools: RStudio (version 2024.12.0+467) running R (version 4.4.2) with the pbmcapply, rvest, and tidyverse libraries. To optimize performance, we distributed data requests across 23 CPU cores on a Mac Studio M2 Ultra computer, leveraging parallel processing to expedite retrieval. Given the scale of the dataset—several tens of thousands of web pages—parallelization was essential to efficiently process web requests and prevent excessive runtime delays.
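As a rough illustration of the parallel retrieval described above, the following Python sketch fetches a list of pages concurrently. It is not the author's code: the published pipeline is written in R with pbmcapply, which forks processes across CPU cores, whereas this sketch swaps in a thread pool. The helper names `fetch_page` and `fetch_all`, the timeout, and the worker count are all illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_page(url, timeout=30):
    """Download one page and return its decoded HTML (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_all(urls, fetch=fetch_page, workers=8):
    """Fetch every URL concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of `urls`, so page i stays at index i.
        return list(pool.map(fetch, urls))
```

Passing the fetch function as a parameter keeps the sketch testable without network access and makes it easy to add throttling if server load is a concern.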

Event data retrieval

To construct a dataset of public events hosted by the Federalist Society, we systematically collected event listings from their official website’s archive of past events. 8 On the day of data collection, we verified that the archive contained 2,044 pages of event listings, each (except for the last) containing 10 events. We iterated through each of these pages and extracted structured information by identifying relevant content elements. The collected fields included: event title; event date; event location; event topics; event sponsors; event type, e.g., lecture, panel, or seminar; speaker names and biography links; and hyperlink to the event’s dedicated webpage. In total, we collected 20,433 event records.
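The page and record counts above can be cross-checked with simple arithmetic. The sketch below (Python, used here only for illustration) computes the range of record counts that 2,044 pages of at most 10 events could hold and confirms that 20,433 falls inside it. The function name `record_bounds` is hypothetical.

```python
def record_bounds(pages, per_page):
    """Smallest and largest record counts for a paginated archive
    whose final page holds between 1 and per_page items."""
    full = (pages - 1) * per_page
    return full + 1, full + per_page


low, high = record_bounds(pages=2044, per_page=10)
collected = 20433

# The reported total must fall between the two bounds.
assert low <= collected <= high

# Number of events implied for the final listing page.
last_page_count = collected - (2044 - 1) * 10
```

Under these figures, the first 2,043 pages account for 20,430 events, which implies that the final page listed 3.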

Event description retrieval

Each event listed on the Federalist Society’s website has a dedicated webpage containing additional textual descriptions that provide further context about the discussion topics and speakers. Using the 20,433 event URLs gathered in the previous stage, we accessed each event’s page and retrieved its description. If an event page contained multiple descriptive paragraphs, we concatenated them into a single text field. After extraction, we cleaned the data by (1) trimming leading and trailing whitespace, and (2) removing a boilerplate disclaimer about the Federalist Society’s neutrality, which appeared in some event descriptions. Finally, we merged the event descriptions into our main dataset.
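The cleaning steps above might be sketched as follows. This is a Python illustration under stated assumptions, not the author's R implementation: the source does not quote the neutrality disclaimer, so `DISCLAIMER` is a placeholder, and joining paragraphs with a single space is likewise an assumption.

```python
# Placeholder only: the source does not quote the neutrality disclaimer.
DISCLAIMER = "<boilerplate neutrality disclaimer>"


def clean_description(paragraphs, disclaimer=DISCLAIMER):
    """Remove the disclaimer, trim whitespace, and join paragraphs."""
    parts = [p.replace(disclaimer, "").strip() for p in paragraphs]
    # Drop paragraphs that are empty after cleaning, then concatenate
    # the rest into a single text field.
    return " ".join(part for part in parts if part)
```

For example, `clean_description(["  First. ", DISCLAIMER, " Second.\n"])` yields a single trimmed string containing only the two substantive paragraphs.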

Speaker biography retrieval

To enhance event data with speaker details, we extracted biographical information for each individual listed as a speaker. In the previous stage, we collected speaker names and their associated biography URLs. We then accessed each speaker’s biography page and retrieved the full text of their profile. To avoid redundant processing, we identified 6,723 unique speakers among the 20,433 events and we processed each biography only once. If a biography page contained multiple text blocks, we concatenated them into a single entry. As in the previous stage, we performed additional data cleaning by trimming whitespace and removing a standard disclaimer stating that appearance on the site does not “imply any other endorsement or relationship between the person and the Federalist Society.” Finally, we assigned a unique speaker ID to each individual for consistent referencing across events and merged the speakers back into our main dataset.
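One way to identify unique speakers and assign identifiers is sketched below in Python. The source does not state how uniqueness was determined; keying on the biography URL and numbering speakers in order of first appearance are assumptions made for illustration, as is the function name `assign_speaker_ids`.

```python
def assign_speaker_ids(appearances):
    """Map each distinct biography URL to a sequential integer ID.

    appearances: iterable of (speaker_name, bio_url) tuples, one per
    speaker appearance. Returns {bio_url: speaker_id}.
    """
    ids = {}
    for _name, bio_url in appearances:
        # Assumption: a biography URL identifies exactly one person.
        if bio_url not in ids:
            ids[bio_url] = len(ids) + 1
    return ids
```

Because each URL enters the mapping once, the keys of the result also serve as the list of biography pages to fetch, so no page is processed twice.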

Data finalization

During data processing, we identified 385 events that appeared multiple times with the same title and date. A manual review confirmed that these were not scraping errors but genuine duplicate listings on the website. While these events shared identical titles and dates, their full records varied; some contained more abbreviated information than others. To ensure completeness, we consolidated these near-duplicate entries into a single record, retaining the most detailed version of the event description, address, and other key details.
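The consolidation of near-duplicates could be approximated as in the Python sketch below. It is an illustration rather than the author's procedure: treating the longest value of each field as the "most detailed" one is an assumption, and the single `date` field stands in for however the date was stored at this stage.

```python
from collections import defaultdict


def consolidate(listings, key_fields=("event_title", "date")):
    """Merge listings that share a title and date into one record each."""
    groups = defaultdict(list)
    for row in listings:
        groups[tuple(row[f] for f in key_fields)].append(row)

    merged = []
    for rows in groups.values():
        record = {}
        for field in rows[0]:
            # Assumption: the longest value is the most detailed one.
            record[field] = max((r.get(field) or "" for r in rows), key=len)
        # Flag records that were built from two or more listings.
        record["combined_flag"] = len(rows) > 1
        merged.append(record)
    return merged
```

A field-by-field merge of this kind can combine the fuller description from one listing with the fuller address from another, rather than keeping one listing wholesale.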

In the final dataset, we flattened events with multiple speakers, meaning that each row represents one speaker at one event (an event-speaker pair). As a result, the dataset contains 30,898 rows, corresponding to 20,205 unique events and 6,723 unique speakers. The total number of events is lower than the 20,433 originally scraped because merging near-duplicates removed 228 records.
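The flattening step can be pictured with a short Python sketch. The input layout (a `speakers` list per event) and the function name `flatten` are assumptions. Keeping one row for an event with no listed speakers is consistent with Table 1, where the number of rows missing all speaker information equals the number of events affected.

```python
def flatten(events):
    """Return one row per event-speaker pair.

    An event with no listed speakers still yields a single row whose
    speaker_name is None, so no event is dropped.
    """
    rows = []
    for event in events:
        speakers = event["speakers"] or [None]
        for speaker in speakers:
            rows.append({"event_id": event["event_id"], "speaker_name": speaker})
    return rows
```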

The final dataset contains 16 fields:

  • event_id: A unique identifier for the event. [integer]

  • speaker_id: A unique identifier for the speaker. [integer]

  • event_title: The title of the event. [text]

  • sponsors: The sponsors of the event, such as Federalist Society student chapters, professional groups, and topical interest groups. [text]

  • topics: Words or brief phrases describing the focus of the event, such as “Administrative Law & Regulation” or “Federal Courts.” These topics follow the Federalist Society’s own taxonomy. For events with multiple topics, entries are separated by a pipe “|”. [text]

  • event_types: The medium for the event, such as “In-Person,” “Webinar,” or “Live Stream.” For events with multiple types, entries are separated by a pipe “|”. [text]

  • event_link: The web address of the event’s page on the Federalist Society website. [text URL]

  • paragraph_content: Unstructured information provided as part of the event description. [text]

  • speaker_name: The name of the speaker. [text]

  • speaker_bio_link: The web address of the speaker’s biography page on the Federalist Society website. [text URL]

  • bio_content: The speaker’s biography as posted online. [text]

  • year: The year of the event. [integer]

  • month: The month of the event. [integer]

  • day: The day of the event. [integer]

  • address: The location of the event, which could be a physical venue or an online reference. [text]

  • combined_flag: Indicates whether the record resulted from merging two or more near-duplicate listings. [true/false]

To support normalized database use and facilitate topic-level analysis, we also provide a secondary table containing one row per event–topic pair. This table, event_topics.csv, includes two columns: event_id and topic. It allows users to easily perform frequency analysis, build relational models, or conduct topic-based filtering without additional preprocessing. The original topics field in the main dataset uses a pipe-delimited format; this supplemental file provides an alternative structure that adheres to first normal form.
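For users who prefer to derive the event-topic pairs themselves, the Python sketch below splits the pipe-delimited topics field. It is an illustration, not the script that produced event_topics.csv; stripping whitespace around each topic and sorting the output are assumptions. Deduplication is needed because the main table repeats each event once per speaker.

```python
def event_topic_pairs(rows):
    """Return sorted unique (event_id, topic) pairs from event-speaker rows."""
    pairs = set()
    for row in rows:
        for topic in (row.get("topics") or "").split("|"):
            topic = topic.strip()
            if topic:  # skip events with no topical labels
                pairs.add((row["event_id"], topic))
    return sorted(pairs)
```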

Dataset validation

Because this dataset was constructed through web scraping, traditional validation techniques—such as comparing against an external authoritative source—do not directly apply, as no comprehensive structured dataset of Federalist Society events and speakers exists for benchmarking. Instead, we ensured internal consistency, completeness, and accurate extraction of structured information by implementing duplicate resolution and manual verification of sampled records.

Missing data and limitations

This dataset covers events from 1984 to 2024, but only 44 events (0.2%) date from 1984 through 2006. Since the Federalist Society could not have maintained an online event archive in the earliest years, these records must have been added retroactively. There is a sharp increase—from 9 events in 2006 to 357 in 2007—which likely reflects a shift to systematic online record-keeping. Alternatively, it could represent a sudden, significant expansion in the number of events the organization hosted. Further archival research would be needed to determine the precise cause. Regardless, researchers should treat 2007–2024 as the dataset’s effective coverage period.

Because web scraping captures only what is publicly available, any missing event descriptions, speaker biographies, or other metadata on the Federalist Society’s site at the time of collection remain missing in the dataset. For example, some events (e.g., annual conventions, holiday parties, and networking receptions) do not list individual speakers, and some speaker biography pages exist without any content beyond a standard disclaimer.

Table 1 documents the extent of missing data and provides likely explanations. While the table presents all fields with missing data, it is important to distinguish between different types of missingness. In some cases—such as missing topics or sponsors—the absence of data likely reflects a deliberate omission because the field is irrelevant to that type of event (e.g., informal gatherings or social receptions). In other cases—such as missing speaker biographies—the field is expected to contain content, but none was provided. These differences are conceptually meaningful: the former stems from construct-driven exclusion, while the latter represents missing data in the traditional sense.

Table 1. Summary of missing data across key fields in the dataset.

The columns are: variable; number and percentage of missing records; number and percentage of unique events affected; and likely explanation. Gaps reflect structural patterns in event documentation rather than data collection artifacts. For example, networking events and large conventions often lack listed speakers, while some speaker biography pages exist without biographical content beyond a standard disclaimer.

| Variable(s) | Missing records | Missing unique events | Likely explanation |
| --- | --- | --- | --- |
| all speaker info | 2,512 (8.1%) | 2,512 (12.4%) | Events without named speakers, such as networking events or multi-session conferences. Examples include “2011 National Lawyers Convention” and “2019 Summer Social.” |
| bio_content only | 5,588 (18.1%) | 4,358 (21.6%) | Some speakers have a biography link but no biographical text, only a disclaimer stating that “appearance does not imply any other endorsement or relationship between the person and the Federalist Society.” |
| sponsors | 733 (2.4%) | 330 (1.6%) | Some events do not list sponsors, particularly broad-topic panels or private events. |
| topics | 8,161 (26.4%) | 6,075 (30.0%) | Some events do not include topical labels, especially informal gatherings. |
| event_types | 3,233 (10.5%) | 1,553 (7.7%) | Some events do not specify a format (e.g., lecture, panel, seminar). |
| paragraph_content | 5,586 (18.1%) | 4,069 (20.1%) | Some events lack descriptions beyond title and date. |
| address | 7,157 (23.2%) | 4,864 (24.1%) | Online events and some others lack detailed location information. |
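The two kinds of counts in Table 1 (missing rows versus unique events affected) can be computed along the following lines. This Python sketch is illustrative only: it assumes that a missing value appears as an empty string or None, and the function name `missing_summary` is hypothetical.

```python
def missing_summary(rows, field):
    """Count rows and distinct events where `field` is empty or absent."""
    missing_rows = [r for r in rows if not (r.get(field) or "").strip()]
    missing_events = {r["event_id"] for r in missing_rows}
    all_events = {r["event_id"] for r in rows}
    return {
        "records": len(missing_rows),
        "records_pct": round(100 * len(missing_rows) / len(rows), 1),
        "events": len(missing_events),
        "events_pct": round(100 * len(missing_events) / len(all_events), 1),
    }
```

The two percentages use different denominators: all event-speaker rows for the record count, and distinct events for the event count. This is why a single field can show different rates in the two columns.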

For speaker biographies, we further validated the missing cases by randomly sampling 60 speakers (approximately 1% of total) and verifying that in each case, the speaker’s webpage existed but contained no biographical text aside from a disclaimer stating that “appearance does not imply any other endorsement or relationship between the person and the Federalist Society.”

While missing data is often viewed as a technical limitation, in this case, it may reflect meaningful institutional patterns in how the Federalist Society documents its events. For instance:

  • Certain types of events, such as holiday gatherings and networking receptions, rarely list speakers.

  • The absence of speaker biographies could indicate variation in how the organization archives professional credentials or how individuals choose to share information rather than a data collection artifact.

  • Gaps in event topics and sponsors may correspond to real differences across event types rather than accidental omission.

Future research could systematically analyze these patterns to assess how the Federalist Society’s event documentation practices vary over time and across event categories.

Validation steps

To ensure the accuracy of the dataset, we performed the following checks:

  • URL Verification: We confirmed that every link we followed during scraping (event listing pages, individual event pages, and speaker biography pages) resolved to an existing page at the time of data collection.

  • Manual Review: We manually checked a random subset of 300 records (approximately 1% of total) to verify that extracted text fields, such as event titles, descriptions, and speaker biographies, matched the content displayed on the website. We found no discrepancies.

  • Duplicate Handling: As described earlier, we identified and consolidated 385 near-duplicate event listings, preserving the most complete version of each event.
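A reproducible random subset for manual review, like the 300-record sample described above, could be drawn as in this Python sketch. The seed value and the function name `sample_for_review` are assumptions; the source does not say how the sample was drawn.

```python
import random


def sample_for_review(rows, k=300, seed=0):
    """Draw k distinct records for manual checking, reproducibly."""
    # A dedicated Random instance keeps the draw independent of any
    # other use of the global random state.
    return random.Random(seed).sample(rows, k)
```

Fixing the seed lets a second reviewer regenerate exactly the same subset and repeat the comparison against the website.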

In summary, this dataset represents the complete publicly available event record as of December 4, 2024, though it substantively covers the period from 2007 to 2024. While missing data presents some limitations, it also provides insight into how the Federalist Society documents and categorizes its events. More broadly, this dataset enables empirical analysis of the organization’s event structure, speaker networks, and thematic focus, offering a foundation for studying its operations and influence.

Ethical considerations

This study utilizes a dataset compiled from publicly available information sourced from the Federalist Society’s website. The dataset consists of historical records of speaking events, which were systematically extracted, structured, and processed for research purposes. Below, we address key ethical and legal considerations regarding data ownership, informed consent, and compliance with research integrity standards.

Public availability and data collection

The event records included in this study were publicly accessible at the time of collection, requiring no authentication, login credentials, or restricted access. The information was posted by the hosting organization as part of its publicly available event listings. No private, sensitive, or confidential data were accessed or collected.

At the time of data collection, the Federalist Society’s robots.txt file permitted general web crawling but restricted automated access to certain types of URLs, including those containing query parameters related to speakers, locations, topics, and media. Our data collection adhered to these restrictions by scraping only publicly available event listing pages that were not disallowed. We made no attempts to bypass these limitations, and we structured automated requests to avoid undue server load.
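A robots.txt check of the kind described above can be sketched with Python's standard library. The rules in `EXAMPLE_ROBOTS` are hypothetical and are not the Federalist Society's actual file; the URLs and the helper name `allowed` are likewise illustrative.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; not the Federalist Society's file.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /search/
"""


def allowed(url, robots_txt=EXAMPLE_ROBOTS, agent="*"):
    """Return True if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

Note that this parser matches rules by path prefix. Files that restrict query parameters through wildcard patterns may need a parser that interprets those patterns, so the sketch should be treated as a starting point.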

This project complies with data use and privacy standards in the United States. However, researchers in other jurisdictions—particularly in the European Union and the European Free Trade Association, where the General Data Protection Regulation applies—should ensure that their use and storage of the dataset complies with local data protection laws.

Data ownership and generation

The original information in this dataset was spread across thousands of web pages and did not previously exist in structured form. Under U.S. copyright law, factual data is not copyrightable. The Supreme Court’s decision in Feist Publications, Inc. v. Rural Telephone Service Co. 9 established that mere collections of facts lack the originality required for copyright protection. Applying this principle, the underlying event details in this dataset—such as event titles, dates, and speaker names—are considered public domain information. Moreover, no database rights exist under U.S. law that would grant exclusive control over a compilation of publicly available facts.

The dataset used in this study was created through systematic extraction, processing, and reorganization of public information, resulting in a structured resource designed for research purposes. This approach aligns with established academic practices in computational social science and legal research, where publicly available data is routinely collected and analyzed. The organization, cleaning, and structuring of the data constitute an original compilation, distinguishing it from a simple reproduction of raw records. Such methodologies are widely recognized in academic literature as valuable tools for research. Recent work proposes a comprehensive framework for web scraping in social science research, highlighting its role in expanding data access while addressing legal, ethical, and scientific considerations. 10 This work discusses how unstructured web data, when systematically collected and processed, can support rigorous empirical analysis—as long as researchers adhere to best practices in transparency, validity, and ethical responsibility.

Informed consent and privacy considerations

This study does not involve human participants in a way that requires informed consent, as defined by the Declaration of Helsinki 11 and typical institutional research ethics guidelines. The dataset contains only publicly available event information, with no personally sensitive data or private records. No interaction with individuals took place, and no personally identifiable information beyond publicly listed event details was collected.

Legal and ethical compliance

At the time of data collection, the Federalist Society website did not publish any terms of service that restricted the use of publicly available data for research purposes. In the absence of such terms, we identified no contractual restrictions on this use. Additionally:

  • We made no attempt to bypass any technological barriers or access restricted content.

  • The dataset contains no social media data, which would require additional ethical considerations.

  • We present the dataset without any misrepresentation of the original information.

In summary, this research aligns with best practices in open data, transparency, and ethical scholarship.

Funding Statement

The author(s) declared that no grants were involved in supporting this work.

[version 2; peer review: 2 approved]

Data availability

Open Science Framework: A Structured Dataset of the Federalist Society’s Public Engagements https://doi.org/10.17605/OSF.IO/QH672. 7

This project contains the following underlying code files and data:

  • scrapeEventList.R – script to gather basic information about events

  • scrapeParagraphText.R – script to gather event description

  • getBios.R – script to gather biographical information

  • combineFedsoc.R – script to combine and finalize data

  • fedsoc.csv – final data after gathering, structuring, processing, and cleaning

  • event_topics.csv – normalized event-topic table

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

References

  • 1. The Federalist Society: The Federalist Society for Law and Public Policy Studies. Accessed 2025-02-01.
  • 2. Scherer N, Miller B: The Federalist Society’s Influence on the Federal Judiciary. Polit. Res. Q. 2009;62(2):366–378. doi: 10.1177/1065912908317030
  • 3. Hawkins S: Trump’s Dangerous Judicial Legacy. UCLA L. Rev. Discourse. 2019;67:20–45.
  • 4. Federal Judicial Center: Biographical Directory of Article III Federal Judges, 1789–Present.
  • 5. ProPublica: Nonprofit Explorer.
  • 6. Hollis-Brusky A: Ideas with Consequences: The Federalist Society and the Conservative Counterrevolution. Oxford University Press; 2014.
  • 7. Topaz CM: A Structured Data Set of the Federalist Society’s Public Engagements. Open Science Framework. 2024. doi: 10.17605/OSF.IO/QH672
  • 8. The Federalist Society: Past Events Archive. Accessed 2024-12-04.
  • 9. Feist Publications, Inc. v. Rural Telephone Service Co., 499 U.S. 340. 1991.
  • 10. Brown MA, Gruen A, Maldoff G, et al.: Web Scraping for Research: Legal, Ethical, Institutional, and Scientific Considerations. 2024.
  • 11. World Medical Association: Declaration of Helsinki–Ethical Principles for Medical Research Involving Human Participants. Amended 2024.
F1000Res. 2025 Jun 26. doi: 10.5256/f1000research.177808.r373577

Reviewer response for version 1

Malcolm Langford 1

This dataset provides an important insight into a powerful institution in American civic life, especially in the appointment of and influence on American judges. I would recommend indexing and make only the following minor comments.

1. While it is explained that missing data stems from "structural patters in event documentation rather than data collection artifacts", there seems to be qualitative and important differences between the missing categories. The table seems to cobble together missing data that results from construct validity decisions (e.g., "topics", which would be irrelevant for informal gatherings) and data content (e.g., bio_content_only).

2. I am not convinced that most users can generate gendered categories for speakers. This usually requires more advanced computational skills and there is a rich literature and set of methods that concern how to address ambiguous and rare names. I think the author could create a gendered variable.

3. The privacy considerations are US-centric, but the data may be used outside the United States. Storage and use of this dataset would require data privacy applications in many EU/EFTA jurisdictions as it contains personal data. Thus, the text should indicate that this assessment relates solely to US law.

4. There should be a discussion of other relevant datasets to which this dataset could be linked, especially those concerning judgments and judges, and how that might be best done.

5. If space permits, it could be helpful for readers to see some examples of applications of the different methods proposed under the seven motivations. E.g., one example with the scholarly reference.

Are sufficient details of methods and materials provided to allow replication by others?

Yes

Is the rationale for creating the dataset(s) clearly described?

Yes

Are the datasets clearly presented in a useable and accessible format?

Yes

Are the protocols appropriate and is the work technically sound?

Yes

Reviewer Expertise:

Comparative constitutional law; Empirical legal studies; Computational legal studies; Legal profession studies

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

F1000Res. 2025 Jul 21.
Chad Topaz 1

Thank you very much for your thoughtful review and supportive recommendation. I appreciate your engagement with the dataset and your helpful suggestions for improving its clarity, accessibility, and contextualization. Below, I respond to each of your comments.

Differentiating types of missing data. Thank you for highlighting the qualitative differences among the missing fields. I have revised the Missing Data and Limitations section to clarify the distinction between construct-driven omissions (e.g., event topics not listed for informal gatherings) and content-absent fields (e.g., speaker biographies that exist only as disclaimers). This clarification helps contextualize missingness as a function of institutional documentation practices rather than simple data loss.

Gender inference. I appreciate your suggestion to include a gender variable to support gender-based analyses. However, I have chosen not to infer or assign gender in the dataset. Inferring gender from names, titles, or pronouns can result in misclassification, particularly for nonbinary individuals or others whose identities may not be accurately captured by such heuristics. Because these decisions carry ethical weight and the potential for harm, I believe they are best made by individual researchers whose specific analytic goals require them. To support such work, the dataset includes full speaker biographies and clear documentation of their structure and missingness.

Privacy considerations outside the United States. Thank you for raising this important point. I have added a clarifying sentence to the Ethical Considerations section noting that while the dataset complies with data use and privacy standards in the United States, researchers in other jurisdictions (particularly in the European Union and the European Free Trade Association, where the General Data Protection Regulation applies) should ensure that their use and storage of the dataset complies with local data protection laws.

Linkages to other datasets. I appreciate your recommendation to discuss possible linkages with other relevant datasets. I have added a bulleted list at the end of the Potential Research Applications section describing five publicly available resources that researchers may wish to link to this dataset. These include the Federal Judicial Center’s biographical directory, the Supreme Court Database, the Caselaw Access Project, the Oyez Project, and OpenSecrets. Each entry includes a brief description and web address to facilitate access.

Examples of analytic methods. Thank you for the suggestion to provide concrete examples of analytic approaches. In keeping with the scope of this paper as a dataset description, I have chosen not to include specific analyses or methods beyond the use cases already described. My aim is to provide a flexible foundation that enables a wide range of research questions without predefining the analytic approaches researchers should take.

Thank you again for your valuable and thoughtful feedback, which helped strengthen the clarity, usability, and contextual framing of the dataset.

F1000Res. 2025 Mar 5. doi: 10.5256/f1000research.177808.r366981

Reviewer response for version 1

Jonathan Kropko 1

This dataset is a comprehensive collection of events hosted by the Federalist Society from 1984 to 2024 (with better coverage from 2007 onward), along with the speakers, the event titles/types and topics, and the published biographical information about each speaker. It is collected via web scraping of the Federalist Society's website using R (and the pbmcapply, rvest, and tidyverse packages). It was collected in accordance with the restrictions outlined in the website's robots.txt file. The dataset is the first structured dataset of Federalist Society events and speakers.

This work is a significant contribution to quantitative scholarship on the American legal system as it intersects with politics. It may be argued that, because the dataset focuses on only one organization, the scope of the data is too narrow. However, the Federalist Society has an outsized impact on American jurisprudence and is responsible for advancing affiliated individuals to three seats on the current Supreme Court as well as many federal judgeships. The Federalist Society is more than worthy of study in its own right and this dataset will facilitate that study.

I recommend that the work may be indexed as is. However, if the authors will be submitting revisions, there are three areas that may be addressed to strengthen the contribution.

First, while the authors do state that the speaker bio_content often contains information about the speaker's gender, they leave it to the data user to extract this information from pronouns and salutations. I imagine that while some of the users of this dataset are capable of extracting this information properly, many social scientists and legal scholars are not well trained in working with text data and may be held back by this requirement. The authors are highly proficient in data wrangling, and it may be an excellent additional benefit to users if the authors could also construct a speaker_gender column.

Second, the event topics are currently stored in a list format where the topics for each event are separated by a pipe | symbol. Some researchers, however, especially those who will be using the data to do predictive modeling, will likely be adapting the data into a relational database. According to the rules of first normal form, the topics as currently stored are nonatomic and will need to be moved to another data table that has one topic per row along with the event ID. This table would also make it much easier for users to create bar charts of topic frequency, for example. Instead of forcing users to do this, the authors can create this table and make it separately available.

Third, the data contains a unique speaker_ID. However, because many of the speakers are or go on to be federal judges, users may want to link this dataset with a separate dataset that tracks judges, such as https://www.fjc.gov/history/judges/biographical-directory-article-iii-federal-judges-export. A user who wants to merge the data with the FJC data would currently have to do so on the speaker/judge's name, which is a difficult and error-prone method. If the authors can include additional columns for external judge IDs, such as the nid and jid columns in https://www.fjc.gov/sites/default/files/history/judges.xlsx, for example, then it would facilitate a great many more kinds of analyses from the data.

Are sufficient details of methods and materials provided to allow replication by others?

Yes

Is the rationale for creating the dataset(s) clearly described?

Yes

Are the datasets clearly presented in a useable and accessible format?

Yes

Are the protocols appropriate and is the work technically sound?

Yes

Reviewer Expertise:

data science, data engineering, political science

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

F1000Res. 2025 Jul 16.
Chad Topaz 1

Thank you very much for your thoughtful and constructive feedback. I’m especially grateful for the care with which you engaged the structure and potential uses of the dataset. Below, I respond to each of your suggestions.

Gender inference from speaker biographies. I appreciate your suggestion to include a speaker gender column to support gender-related analysis. However, I have chosen not to include such a column in the dataset. Inferring gender from names, titles, or pronouns—while technically feasible—can lead to misclassification, particularly for nonbinary individuals or others whose identities may not be accurately captured by such heuristics. Because these decisions involve ethical considerations and the potential for harm, I believe they are best made by individual researchers whose analytic goals require them.

Normalization of the event topics field. Thank you for this suggestion. I agree that a normalized event–topic table would be useful for users working with relational databases or conducting topic-level analysis. In response, I’ve created a supplemental file that contains one row per event–topic pair. This file makes it easier to perform frequency counts, joins, and other analyses while preserving the original flat file for broader accessibility. I’ve described this supplemental table in the manuscript and included it in the publicly available project repository.
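
As an illustration of the relationship between the two files, the following minimal Python sketch splits a pipe-delimited topics field into one row per event–topic pair and then counts topic frequency. It is not the code used to produce event_topics.csv. The column names `event_id` and `topics` and the sample rows are assumptions made for the example, so check the actual headers in fedsoc.csv before adapting it.

```python
import csv
import io
from collections import Counter


def explode_topics(rows, id_col="event_id", topics_col="topics", sep="|"):
    """Yield one (event_id, topic) pair per topic in a delimited field.

    Column names are hypothetical defaults, not the dataset's documented ones.
    Empty fields produce no pairs, and repeated pairs are emitted only once.
    """
    seen = set()
    for row in rows:
        raw = row.get(topics_col) or ""
        for topic in raw.split(sep):
            topic = topic.strip()  # tolerate stray spaces around the pipe
            pair = (row[id_col], topic)
            if topic and pair not in seen:
                seen.add(pair)
                yield pair


# Made-up sample standing in for the flat file (assumed layout).
sample = io.StringIO(
    "event_id,topics\n"
    "1,Constitution|Federalism\n"
    "2,\n"
    "3,Free Speech | Constitution\n"
)

pairs = list(explode_topics(csv.DictReader(sample)))

# Topic frequency is a simple count over the normalized pairs.
topic_counts = Counter(topic for _, topic in pairs)
```

With the real file, the `io.StringIO` sample would be replaced by an open file handle. Event 2 in the sample has no topics and so contributes no rows, which mirrors how structurally missing topics simply drop out of a normalized table.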

Linking to external judge identifiers. I appreciate the suggestion to include external judge IDs (e.g., nid, jid) to facilitate linkages with the Federal Judicial Center’s biographical directory. However, I’ve chosen not to add those identifiers. Doing so would require matching speaker names and biographical details across datasets. While the result would be useful, the process is susceptible to errors. Because the dataset is designed to reflect only what is available from the Federalist Society’s public archive, I believe it’s important to preserve that boundary. Researchers interested in merging with external sources are welcome and encouraged to do so based on the provided speaker names and biographies.
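
For researchers who attempt such a merge themselves, the sketch below shows one cautious way to generate candidate links from names alone, in Python. Everything in it is an assumption for illustration: the record layouts, the identifiers, the list of stripped titles and suffixes, and the made-up names. It keys on first and last name only, so even a unique match is just a candidate that should be checked against the speaker biography.

```python
import re
import unicodedata
from collections import defaultdict

# Titles and suffixes to ignore when building a key (illustrative, not exhaustive).
_NOISE = r"\b(hon|judge|justice|prof|dr|mr|ms|mrs|jr|sr|ii|iii)\b\.?"


def name_key(name):
    """Reduce a personal name to a crude (first, last) key, or None."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).lower()
    text = re.sub(_NOISE, " ", text)
    tokens = re.findall(r"[a-z]+", text)
    return (tokens[0], tokens[-1]) if len(tokens) >= 2 else None


def link_speakers(speakers, judges):
    """Classify each speaker as a unique, ambiguous, or absent name match.

    Both arguments are iterables of (identifier, full_name) pairs.
    """
    index = defaultdict(list)
    for judge_id, judge_name in judges:
        key = name_key(judge_name)
        if key:
            index[key].append(judge_id)

    links = {}
    for speaker_id, speaker_name in speakers:
        candidates = index.get(name_key(speaker_name), [])
        if len(candidates) == 1:
            status = "unique"
        elif candidates:
            status = "ambiguous"  # several judges share the key: review by hand
        else:
            status = "none"
        links[speaker_id] = (status, list(candidates))
    return links


# Made-up records; real identifiers and names would come from the two datasets.
speakers = [("s1", "Hon. Jane Q. Doe"), ("s2", "John Smith"), ("s3", "Alex Roe")]
judges = [("j1", "Jane Doe"), ("j2", "John A. Smith"), ("j3", "John B. Smith Jr.")]

links = link_speakers(speakers, judges)
```

Keeping ambiguous and unmatched cases explicit, rather than forcing a best guess, makes the error-prone part of the merge visible and leaves the final decision to manual review.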

Thank you again for your constructive and valuable feedback, which helped improve the clarity and usability of the dataset.

