Abstract
Measuring meaning is a central problem in cultural sociology and word embeddings may offer powerful new tools to do so. But like any tool, they are built on theoretical assumptions and impose them in turn. In this paper I theorize the ways in which word embeddings model three core premises of a structural linguistic theory of meaning: that meaning is coherent, relational, and may be analyzed as a static system. In certain ways, word embeddings are vulnerable to the enduring critiques of these premises. In other ways, word embeddings offer novel solutions to these critiques. More broadly, formalizing the study of meaning with word embeddings offers theoretical opportunities to clarify core concepts and debates in cultural sociology, such as the coherence of meaning. Just as network analysis specified the once vague notion of social relations, formalizing meaning with embeddings can push us to specify and reimagine meaning itself.
Measuring, modeling, and understanding how meaning operates are among the most prominent and longstanding endeavors of sociology (e.g., Mohr 1998; Mohr et al. 2020). In recent years, word embedding methods reinvigorated the study of meaning (e.g., Arseniev-Koehler and Foster Forthcoming; Boutyline, Arseniev-Koehler, and Cornell 2020; Charlesworth et al. 2021; Jones et al. 2020; Kozlowski, Taddy, and Evans 2019; Nelson 2021; Stoltz and Taylor 2019, 2020; Taylor and Stoltz 2020; Voyer, Kline, and Danton 2022). These methods computationally model the semantic information of words in large-scale text data. Despite their promise, it remains unclear what kind of meaning word embeddings capture—or whether they capture any meaning at all. To employ these tools rigorously, it is paramount that we clarify what they operationalize. Here, I critically evaluate the possibility that word embeddings operationalize an influential theory of language: structural linguistics.
Word embedding, and in particular the word2vec word embedding algorithm, revolutionized how computers learn and process human language (Mikolov, Chen, et al. 2013; Mikolov, Sutskever, et al. 2013). Indeed, since word2vec was published in 2013 it has been cited over 35,000 times (Mikolov, Sutskever, et al. 2013). Rapid advances in computer science have yielded a tremendous variety of word embedding approaches and strategies to model language more generally (e.g., Devlin et al. 2019). Meanwhile, social scientists have imported word embeddings to analyze text data at scale (e.g., Best and Arseniev-Koehler 2022; Boutyline et al. 2020; Boutyline, Cornell, and Arseniev‐Koehler 2021; Charlesworth et al. 2021; Garg et al. 2018; Haber 2021; W. Hamilton, Leskovec, and Jurafsky 2016; Jones et al. 2020; Kozlowski et al. 2019; Linzhuo, Lingfei, and Evans 2020; Martin-Caughey 2021; Nelson 2021; Rozado 2019; Stoltz and Taylor 2019, 2020; Taylor and Stoltz 2020). For example, these methods are used to capture stereotypes encoded in media language across time, offering a historical lens into stereotypes despite the absence of corresponding survey data. The premises of word embeddings may also be generalized beyond language to model other cultural systems (e.g., Arronte Alvarez and Gómez-Martin 2019; Chuan, Agres, and Herremans 2020). As such work highlights, word embedding offers an exciting new lens to study language—and perhaps other symbolic systems—across time and space.
Researchers employing word embedding methods often note their affinities to a century-old theoretical perspective on language: structural linguistics (e.g., Baroni, Dinu, and Kruszewski 2014; Faruqui et al. 2016; Günther, Rinaldi, and Marelli 2019; Kozlowski et al. 2019). Structural linguistics envisions language as a symbolic system composed of various linguistic units (e.g., words, phonemes, or suffixes). These units are defined by their relationships to other units in the system (rather than by their reference to physical entities in the world). For instance, a word may be characterized by its co-occurrence relationships with other words. Social scientists generalized the premises of structural linguistics to study non-linguistic symbolic systems (e.g., kinship systems; Lévi-Strauss 1963:35) and to theorize culture as a symbolic system more broadly. This intellectual movement is often referred to as semiotic structuralism, French structuralism, or just structuralism.
Here, I critically examine the affinities between the way that word embeddings model words’ semantic information and the structural linguistic perspective on word meaning. I focus on how word embeddings operationalize (or do not operationalize) core premises of structural linguistics. The first major contribution of this paper is to highlight that the extent to which contemporary word embedding methods operationalize structural linguistics depends on the way these methods are used, the specific embedding algorithm used, and even an analyst’s own interpretation of “meaning” in the algorithm.
Structural linguistics (and structuralist approaches to culture more broadly) is both profoundly influential and intensely critiqued across the social sciences (Dosse 1997; Sewell 2005). For instance, critics questioned the plausibility of a coherent symbolic system (e.g., Martin 2010; Swidler 1986), noting that cultural symbols (like words) are often used in strikingly incoherent ways (e.g., Swidler 2013). Given the parallels between word embeddings and structuralism, do word embeddings also model language in a way that is vulnerable to the critiques of structuralism? The second major contribution of this paper is to evaluate the ways in which word embeddings succumb to and overcome the limitations of structural linguistics.
To begin, I first briefly review background information on word embedding methods as they are used in social science. Second, I review structural linguistics, focusing on three of its core premises (that language is relational, coherent, and should be studied as a static system). Third, I critically examine the ways in which word embeddings operationalize each of these three premises. Fourth, I consider critiques of these premises and evaluate whether and how each critique applies to word embeddings. In the discussion, I highlight implications and directions for future sociological research with word embedding methods.
1. A PRIMER ON WORD EMBEDDINGS
Word embeddings are quantitative representations of the semantic information of words; they are computed based on how those words are used in a text dataset (i.e., training data). Examples of text datasets include a corpus of news articles, social media posts, government records, or product reviews. Word embeddings aim to represent words as vectors (i.e., arrays of N numbers) where words that are used in more similar contexts in this training data are assigned more similar vectors. Word-vectors may also be understood graphically—as positions in an N-dimensional space. The dimensionality of the space (and thus all word-vectors) is pre-set by the algorithm or the researcher—often, at a few hundred dimensions (Rong 2014). The information captured by dimensions is latent and often uninterpretable to humans. The dimensions are identified by the word embedding algorithm as it organizes words in space while processing the text data (i.e., trains). Two examples of training algorithms, word2vec (Mikolov, Sutskever, et al. 2013) and GloVe (Pennington, Socher, and Manning 2014), will be detailed shortly. Since vectors locate positions in space, similarity and distance are interchangeable. Words with more similar vector representations are also closer in space. This similarity, or distance, is commonly measured with cosine similarity. The collection of word-vectors may also be referred to as the trained word embedding, a semantic space, or simply a word embedding.
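To make the notion of similarity concrete, consider a minimal sketch of cosine similarity in Python. The vectors below are toy values invented for illustration, not vectors from any trained embedding:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: higher means more similar."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# toy 4-dimensional word-vectors (trained embeddings use a few hundred dimensions)
doctor = np.array([0.8, 0.1, 0.3, 0.5])
nurse = np.array([0.7, 0.2, 0.4, 0.4])
banana = np.array([0.1, 0.9, 0.0, 0.2])

print(cosine_similarity(doctor, nurse))   # high: used in similar contexts
print(cosine_similarity(doctor, banana))  # low: used in dissimilar contexts
```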
A body of research finds that the semantic information captured by well-trained word embeddings corresponds to humans’ own understandings of words (for a review, see Mandera, Keuleers, and Brysbaert 2017). For example, in well-trained word embeddings, the cosine similarities between word-vectors strongly correlate to human-rated similarities between words (e.g., Caliskan, Bryson, and Narayanan 2017; Pennington et al. 2014). Further, while word embeddings are trained to represent words as positions in space (i.e., as vectors), empirical work finds that the space between word-vectors may also capture semantic information. Most famously, the direction to travel in semantic space between the word-vectors “king” and “queen” is often similar to the direction to travel between “man” and “woman” in semantic space. That is, the difference between the locations of “king” and “queen” in semantic space is similar to the difference between “man” and “woman” (Mikolov, Chen, et al. 2013; Mikolov, Sutskever, et al. 2013). The difference may be measured by subtracting the corresponding word-vectors, e.g., “woman” — “man.” The result of this subtraction is a vector that may be interpreted as a latent line in space, pointing to words about women at one pole and words about men at the other pole (see figure 1). In fact, a variety of concepts beyond gender may be encoded as latent dimensions in space (e.g., Grand et al. 2022; Kozlowski et al. 2019; Mikolov, Sutskever, et al. 2013). This property of word embeddings attests to their ability to encode semantic information in nuanced ways. Furthermore, as described next, this property makes word embeddings a remarkably useful social science method.
Figure 1.
Conceptual illustration of a latent dimension in semantic space corresponding to gender (left) and morality (right).
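The vector arithmetic described above can be reproduced in a few lines of code. The sketch below is illustrative: it assumes the gensim library and its downloadable pretrained GloVe vectors ("glove-wiki-gigaword-100"); any well-trained embedding should behave similarly.

```python
import gensim.downloader as api

# load pretrained word-vectors (an assumption for illustration; training
# on one's own corpus is equally common in social science applications)
model = api.load("glove-wiki-gigaword-100")

# "king" - "man" + "woman" should land near "queen"
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# the latent gender direction itself: the difference between two pole vectors
gender_direction = model["woman"] - model["man"]
```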
1.1. Word Embeddings as Social Science Methods
In recent years, word embedding has exploded as an exciting new method in social science (for a review, see Stoltz and Taylor 2021). One popular analytic approach is to deductively identify a latent semantic dimension (e.g., gender, social class, sentiment, or morality) in the embedding space, and then examine how a set of sociologically interesting keywords (e.g., occupational roles or stereotypical traits) are positioned along this dimension (e.g., Kozlowski et al. 2019). For example, researchers can identify a line corresponding to gender (e.g., by subtracting the word-vector “she” from “he”) and then examine how close occupational terms are to the pole about women versus the pole about men (Bolukbasi et al. 2016). Effectively, this approach enables analysts to compute the association of a word (or set of words) with a latent concept in text data. Because this approach is generalizable to a range of key terms and dimensions, it is useful to many sociological domains.
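A minimal sketch of this projection strategy follows, reusing the pretrained vectors from the previous sketch. The pole words and occupation terms are illustrative choices, not a fixed recipe:

```python
import numpy as np
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-100")

def gender_projection(word, pole_a="he", pole_b="she"):
    """Cosine similarity between a word-vector and the pole_b-minus-pole_a line.
    Positive values lean toward pole_b; negative values lean toward pole_a."""
    direction = model[pole_b] - model[pole_a]
    v = model[word]
    return np.dot(v, direction) / (np.linalg.norm(v) * np.linalg.norm(direction))

for occupation in ["nurse", "engineer", "librarian", "carpenter"]:
    print(occupation, round(gender_projection(occupation), 3))
```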
As one example of a sociological application, Jones et al. used this approach to investigate the gendered associations of words about career, family, science, and art, using word embeddings trained across two centuries of books (2020). Their findings suggest that many of the gender stereotypes in these domains have receded in language over time. As another example, Kozlowski et al. used word embeddings to investigate five dimensions of meanings relevant to social class in books across the twentieth century (2019). Their results suggested that the cultural associations of education with social class emerged only in recent decades, while in the earlier part of the 20th century these associations were mediated by meanings of cultivation. As these studies illustrate, word embeddings enable social scientists to investigate cultural phenomena in ways that may be impractical or even impossible using surveys or other traditional social science methods.
More broadly, social scientists often use word embedding methods to investigate the relationships between language, widely held personal meanings (e.g., from survey responses), and broader societal patterns (e.g., demographic trends). For instance, Caliskan et al. illustrated the correspondences between various implicit associations in human participants1 and cosine similarities between corresponding word-vectors (2017; see also Lewis and Lupyan 2020). Further, Garg et al. showed that the way occupations are gendered in word embeddings corresponds tightly (correlations around 0.9) to the proportion of women in these occupations based on Census data, both in present day and across time (2018). Both papers identified these patterns across embeddings trained on a variety of media corpora. The results both validated word embedding methods and illustrated the surprising extent to which our language encodes undesirable biases and inequalities. Given that embedding methods enable social scientists to efficiently leverage historical text data, a stream of social science uses word embeddings to specifically investigate cultural change (e.g., Best and Arseniev-Koehler 2022; Boutyline et al. 2020; Charlesworth, Caliskan, and Banaji 2022; Jones et al. 2020; Kozlowski et al. 2019; Rozado and al-Gharbi 2021; Voyer et al. 2022). Thus far, this work has been largely associative. But it paves the way for future, more causal work on the links between language, culture, and material patterns.
Despite the promise and prevalence of word embeddings for empirical research, sociologists are just beginning to reconcile these methods with sociological notions of meaning and culture (e.g., Arseniev-Koehler and Foster Forthcoming; Kozlowski et al. 2019). To theorize word embedding methods, it is important to first understand how we arrive at a trained word embedding from raw text data. These methodological details are briefly reviewed next.
1.2. Approaches to Compute Word Embeddings Commonly used in Social Science
Social scientists use a wide variety of algorithms to estimate word embeddings from text data. In this paper, I focus on one key algorithmic difference that is most relevant to my theoretical arguments: whether embeddings are trained from text data with count-based approaches or with a machine-learning framework called artificial neural networks (i.e., neural word embeddings) (Baroni et al. 2014).
Count-based approaches begin with a word by word (or word by document) co-occurrence matrix computed from the entire corpus and attempt to reduce the dimensionality of this matrix by finding N latent, lower-dimensional features that encode most of the matrix’s structure. A wide variety of methods may be used for dimensionality reduction. The output from performing dimensionality reduction is a word by N-dimensional matrix: each row is an N-dimensional word-vector. Among the most popular and successful count-based word embedding approaches is GloVe (Pennington et al. 2014). However, since the publication of word2vec (Mikolov, Chen, et al. 2013; Mikolov, Sutskever, et al. 2013), neural word embedding architectures are becoming dominant in computer science for their flexibility and performance on downstream tasks (Baroni et al. 2014; Mandera et al. 2017).2
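To give a sense of the count-based pipeline, here is a deliberately simplified sketch: it builds a raw co-occurrence matrix from a toy corpus and reduces it with a truncated SVD. Practical systems like GloVe count within a fixed context window, reweight the counts, and use far larger corpora and dimensionalities:

```python
import numpy as np
from itertools import combinations

# toy corpus: a list of tokenized sentences
corpus = [["the", "king", "rules", "the", "land"],
          ["the", "queen", "rules", "the", "land"]]

vocab = sorted({w for sent in corpus for w in sent})
index = {w: i for i, w in enumerate(vocab)}

# step 1: a word-by-word co-occurrence matrix (here counted within whole
# sentences; practical systems use a fixed window and reweight, e.g., by PMI)
counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for w1, w2 in combinations(sent, 2):
        counts[index[w1], index[w2]] += 1
        counts[index[w2], index[w1]] += 1

# step 2: truncated SVD as the dimensionality reduction; each row of
# u[:, :n] scaled by s[:n] becomes an n-dimensional word-vector
u, s, vt = np.linalg.svd(counts)
n = 2  # real embeddings keep a few hundred dimensions
word_vectors = u[:, :n] * s[:n]
```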
Neural word embeddings use artificial neural networks to incrementally learn word-vectors from a given corpus as they “read” the text data and attempt to predict missing words in the data. For example, in word2vec3 with Continuous Bag-of-Words (word2vec-CBOW), the model learns word-vectors while attempting to “fill-in” missing words from various sets of contexts in the text (Mikolov, Chen, et al. 2013; Mikolov, Sutskever, et al. 2013). Word2vec-CBOW is iteratively given a context of words (e.g., 10 words) with one word missing, and is tasked with predicting the missing word. To make a prediction, the model first combines the observed context word-vectors to form a single vector representing the context. Second, it predicts the missing word based on the word-vector which is most similar (or closest in space) to this context vector.4 Since word-vectors are typically initialized at random, the model at first tends to predict the missing words incorrectly. Each time it guesses incorrectly, the correct word is revealed, and the model updates the word-vectors to reduce this prediction error (and thus, improve its chances at guessing correctly if it were to see this context again).5 As word2vec adjusts word-vectors across many attempts to predict words from their contexts, the word-vectors begin to better represent words. Upon reaching some predetermined stopping point (e.g., a target level of accuracy), social scientists often stop training and use the most recent word-vectors for downstream analyses. Besides CBOW, word2vec offers a second possible training task, Skip-Gram, which reverses the objective: the goal is to guess the context words around a given target word.
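In practice, social scientists rarely implement word2vec by hand. A minimal training call with the gensim library might look as follows; the toy sentences and parameter values are illustrative only:

```python
from gensim.models import Word2Vec

sentences = [["the", "doctor", "examined", "the", "patient"],
             ["the", "nurse", "examined", "the", "patient"]]

# sg=0 selects CBOW (predict a missing word from its context);
# sg=1 would select Skip-Gram (predict the context around a word)
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=0)

vector = model.wv["doctor"]  # the trained 100-dimensional word-vector
```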
All these variants of neural network based and count-based embeddings produce N-dimensional word-vectors, and they may perform similarly on many linguistic tasks (e.g., Mandera et al. 2017). Still, they offer very distinct mechanistic explanations for learning and processing linguistic information (e.g., Arseniev-Koehler and Foster Forthcoming; Günther et al. 2019; Hollis 2017; Mandera et al. 2017).6 As I will show, these differences have implications for theorizing the kind of meaning that word embeddings operationalize.
Computer scientists developed contemporary word embeddings to enable computers to learn, process, and represent human language—not to operationalize any particular theory of language or linguistic meaning.7 Now, these methods are rapidly gaining traction in social science and serve as a foundation to many language modeling advances in computer science. Therefore, it is crucial that we clarify what they do and do not operationalize. This paper critically evaluates the possibility that word embeddings operationalize structural linguistics.
2. A PRIMER ON STRUCTURAL LINGUISTICS
Structural linguistics is both a theoretical perspective on language and an analytic approach to studying language (Craib 1992:131–48; Joas and Knobl 2009:339–70; Leschziner and Brett 2021). In this paper, I focus on three premises of structural linguistics which were influential in cultural sociology.
A first core premise is that language is relational: it is composed of various signs (e.g., words, suffixes, phonemes) and these signs are defined by their relationship to other signs, rather than by any external reality (Saussure 1983:113). For example, a word is defined by its co-occurrence relationship to all other words—not from the intrinsic properties of the letters or sounds that comprise the word, from dictionary definitions, or by its reference to some external object. This suggests, for example, that if a misspelled word is used similarly to a correctly spelled word, both spelling variants will be understood in the same way. If, however, spelling variants are used in some systematically different way (e.g., British versus American spellings), the variants may be understood as distinct—even when both spellings refer to the same physical object. Envisioning language as relational, rather than rooted in referents, may also be interpreted as seeing language (or any other symbolic system) as self-contained and autonomous (Barthes 1977).
Identifying and understanding the relational structures in language is the core goal of structural linguistics. One well-studied type of relationship is a binary opposition: a structure of meaning where two concepts are defined by their oppositional relationship to one another (Lévi-Strauss 1963:35). In this perspective, for example, we cannot conceive of the concept of “good” without that of “evil” because they form a binary opposition. “Good” is defined by its distinction from “evil” and vice versa. Theoretically, structuralism suggests that binary opposition is a key latent structure of meaning that scaffolds symbolic systems, such as language. Therefore, a common empirical goal in structural linguistics (and structuralist-inspired scholarship more broadly) is to identify binary oppositions (e.g., Alexander and Smith 1993; Barthes 2012; Jones and Smith 2001; Lembo and Martin 2021; Lévi-Strauss 1963:35).
A second core premise of structural linguistics is that underlying the inconsistent ways in which we use words (“parole”), there exists a latent, stable, and coherent linguistic system (“langue”). Structural linguistics focuses on studying langue rather than the varying ways in which we use words (or other linguistic units). Similarly, contemporary work influenced by structural linguistics also often hypothesizes and studies a latent system organizing the disparate ways in which a set of symbols are used (e.g., Barthes 1961; Cerulo 1995; Tavory and Swidler 2009).
Initially, langue was also described as something psychological (i.e., internalized), and generalized beyond any individual language user (thus shared or cultural) (Saussure 1983; see also Stoltz 2019). However, scholarship influenced by structural linguistics remains divided between envisioning symbolic systems as external to individuals’ minds (i.e., in public culture; Lizardo 2017) versus as something internalized or cognitive (e.g., Alexander 2003; Lévi-Strauss 1963). As I will illustrate in this paper, word embedding methods have also inherited this division. They are sometimes used to study meaning in personal culture, and other times they are used to study meaning in public culture.
A third core premise of structural linguistics is the distinction between studying language as a static versus dynamic system. In structuralist jargon, these are referred to as “synchronic” versus “diachronic” analyses. The former considers how the parts within a linguistic system interact at any given point (e.g., what kinds of relationships exist between words or morphemes). The latter focuses on how this system changes and why (e.g., how words’ positions in the system change across time or how new words emerge). Analogously, one can study chess as a static or dynamic system: we can freeze a chess game and describe where the pieces lie on the chess board in relation to one another, or we can describe the movements of pieces across a game (Saussure 1983). Structural linguistics focuses on theoretically understanding (and empirically studying) language as if it were static.
3. OPERATIONALIZING STRUCTURAL LINGUISTICS WITH WORD EMBEDDINGS
Here, I detail the ways in which word embeddings can be used to operationalize each of the three premises of structural linguistics described previously: the focus on language as a relational, coherent, and static system. Where relevant, I distinguish between cognitive and non-cognitive interpretations of these premises which, as noted earlier, are two variants of structuralism. I primarily consider these three premises as they pertain to the meanings of words. However, given that structural linguistics generalizes to linguistic units other than words (e.g., morphemes) and to symbolic systems beyond language, many of the following arguments may be widely generalized as well.
3.1. Modeling Language as a Relational System with Word Embeddings
Word embeddings operationalize language as a relational system in several ways. Most crucially, like many text analysis methods, they rely on the Distributional hypothesis (Firth 1957; Harris 1954). This hypothesis suggests that words may be understood by differences in “the company they keep”—i.e., their co-occurrence relations. It is no accident that this hypothesis is fundamentally relational: it emerged in structural linguistics (Sahlgren 2008), not computer science.
While all word embeddings operationalize the Distributional hypothesis, they do so in radically different ways. Count-based models learn from global patterns of co-occurrence, while neural network models learn from many local contexts. The fact that both approaches work comparably is remarkable, illustrating how global relationships between words can be estimated from many local contexts. Analysts also vary widely in how they define these contexts (e.g., the size of the context window, and how they combine context words into a single context vector). In all cases, the resulting word-vectors are also only defined relationally: they are only interpretable to human analysts because of their relative positions. Word-vectors are not tied to any external referents. Any given word-vector is arbitrary and uninformative outside of its semantic space.
The extent to which word embeddings operationalize a relational theory of meaning depends on the analyst’s interpretation of “meaning” in the Distributional hypothesis. Indeed, the hypothesis is notoriously vague when it comes to the relationship between distributional structure and meaning (Harris 1954:151–57; see also Sahlgren 2008). The concept of meaning is only relevant when the analyst introduces it. In practice, researchers often implicitly interpret meaning in the Distributional hypothesis somewhere along two extremes (Lenci 2008).
At one extreme lies the weakest reading of the hypothesis: a word’s meaning—whatever that might be—correlates to the patterned ways in which it is used in language (e.g., Harris 1954). In this first reading, “meaning” is latent: word embeddings are not necessarily capturing any meaning at all, let alone a relational notion of meaning. Under this interpretation, word embeddings trained on large-scale cultural texts offer proxies for meaning. This use of word embeddings may also be motivated by the intuition that public culture, like media and books, reflects personal culture (e.g., Garg et al. 2018; Kozlowski et al. 2019; Xu et al. 2019).
At the other extreme, in a stronger interpretation of the hypothesis, the meaning of a word emerges from its distributional patterns in natural language (rather than, say, its dictionary definition, the emotions evoked by a word, or a word’s reference to a physical object). This second interpretation is distinctly structuralist, suggesting that words are defined relationally rather than by anything external to the linguistic system. This second interpretation is also more cognitive and causal. It suggests that our meanings of words may be learned from (and influenced by) relational patterns in natural language (Lenci 2008). A midrange (“partly structuralist”) interpretation is that meaning is, at least in part, learned from experiences with natural language.
Under a midrange or strong interpretation of the Distributional hypothesis, word embeddings measure (at least to some extent) the meanings that may be evoked by language. Word embeddings, then, are empirically useful to investigate the information that can be learned from, or reinforced by specific language sources, like children’s books (Lewis et al. 2022) or news reporting on obesity (Arseniev-Koehler and Foster Forthcoming). More generally, this approach to using word embeddings is motivated by the intuition that public culture shapes personal culture.
Finally, empirical analyses using word embeddings often focus on identifying relational structures—primarily, binary oppositions (e.g., Arseniev-Koehler and Foster Forthcoming; Best and Arseniev-Koehler 2022; Boutyline et al. 2020; Caliskan et al. 2017; Garg et al. 2018; Jones et al. 2020; Kozlowski et al. 2019; Nelson 2021; Taylor and Stoltz 2021). For instance, as described in part one, a core approach to measure a concept, like gender, is to identify a line between the word-vectors for two opposing poles (e.g., “woman” and “man” for gender). This measure operationalizes the concept of gender as ranging continuously from one pole to the other. Being closer to one pole implies being farther from the other (e.g., more masculinity implies less femininity). In a similar approach, analysts identify word-vectors corresponding to two poles of an opposition (e.g., “woman” and “man” to represent gender) and then compare the distance of some interesting word to each pole. This second strategy does not assume that more femininity implies less masculinity or that gender is represented as a line in semantic space, but still identifies concepts as oriented by two poles and thus assumes they are relational constructs.
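A minimal sketch of this second, pole-comparison strategy, again assuming gensim and pretrained vectors as in the earlier sketches:

```python
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-100")

# compare a word's similarity to each pole separately, without assuming
# that the concept forms a single line in semantic space
for word in ["nurse", "engineer"]:
    print(word,
          model.similarity(word, "woman"),  # closeness to one pole
          model.similarity(word, "man"))    # closeness to the other pole
```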
3.2. Modeling Language as a Coherent System with Word Embeddings
From the varying ways in which words are used in a corpus, word embeddings aim to abstract a latent, coherent system. This may be thought of as abstracting “langue” from “parole.” The abstraction occurs in several ways. First, word embedding methods commonly used in social science (e.g., word2vec) represent each word as a single word-vector. Thus, the goal is to capture the regularities of each word across all the various contexts in which it appears in the corpus. Modeling each vocabulary word as a single vector assumes there is systematicity across the varied instances of each vocabulary word in the data.
Second, the architecture of word embeddings aims to find latent regularities across vocabulary words, such that a limited number of dimensions can represent all words in a vocabulary. More specifically, words are represented as vectors where each element corresponds to a loading on each of N dimensions, as described in section one. The dimensionality (N) of word-vectors is always far lower than the vocabulary size of the corpus. Often, N is set at a few hundred, while vocabulary sizes often range from tens of thousands to several hundred thousand words depending on the corpus. This gap in sizes assumes that there are latent patterns in the raw co-occurrences of words that can accurately represent a high-dimensional vocabulary. Dimensions are shared and reused across vocabulary words to represent different aspects of their meaning, and so only a limited number of dimensions is needed to model words. In fact, this compression is thought to be critical to the superior performance of word embeddings (e.g., on analogy tests) compared to word-vectors based on raw co-occurrences (Arora et al. 2016). The high performance of word embeddings also attests to the validity of modeling words’ meanings as a coherent system.
As noted in part two, interpretations of langue diverge as to whether it is internal or external to individuals. Similarly, current social science employing word embeddings also comes in a cognitive and non-cognitive flavor. Some work uses word embeddings as cognitive models: to learn about how semantic information might be learned by, represented in, and processed by human minds (e.g., Arseniev-Koehler and Foster Forthcoming; Baroni et al. 2014; Günther et al. 2019). Other work uses embeddings as methods to learn about cultural texts, cultural construction, and broad-scale cultural change (e.g., Garg et al. 2018; Kozlowski et al. 2019). These two approaches to using word embeddings may be thought of as operationalizing cognitive and non-cognitive interpretations of langue. The former aims to abstract langue as a system that is internalized and learned from cultural texts, while the latter aims to abstract langue as a latent system of meaning encoded in a cultural text.
3.3. Modeling Language as a Static System with Word Embeddings
Word embeddings can also model language as a static system of signs, following a structural linguistic perspective. Neural and count-based word embeddings do so differently. As described in section one, count-based embeddings abstract a semantic space from global co-occurrences while neural word embeddings learn a semantic space incrementally, as the by-product of “reading” and predicting data. When social scientists use word-vectors from neural embeddings for empirical analyses, they pause the training8 and extract the word-vectors as a semantic space. Thus, all word embeddings can be used to examine language as a static system, but count-based and neural word embeddings do so in different ways. As I will elaborate on in section 4.4, differences between neural and count-based word embeddings matter when we consider critiques of studying language as something static.
Social scientific analyses using word embeddings vary in the extent to which they focus on language as static or dynamic. While some empirical work using embeddings investigates a cultural phenomenon at a single time point (e.g., Caliskan et al. 2017; Lewis and Lupyan 2020; Nelson 2021), a large body of scholarship uses word embeddings to track words or concepts (e.g., stereotypes) across time (e.g., Best and Arseniev-Koehler 2022; Boutyline et al. 2020; W. L. Hamilton, Leskovec, and Jurafsky 2016; Jones et al. 2020; Kozlowski et al. 2019). The former’s static lens may be considered more characteristically “structuralist.”
4. FOUR CRITIQUES OF STRUCTURAL LINGUISTICS
Here, I consider four critiques of structural linguistics: (1) that meaning may be grounded or embodied, rather than purely relational, (2) that a focus on binary oppositions is reductionistic, (3) that meaning is incoherent, and (4) that language is dynamic. Like the previous section, I focus on these critiques as they relate to language specifically. However, many of these critiques and arguments generalize beyond words and language to other symbols and semiotic systems. After briefly introducing each critique, I argue how it applies (or does not apply) to word embeddings and highlight implications for sociological applications of word embedding methods.
4.1. Critiques of Purely Relational Approaches to Meaning
While structural linguistics theorizes signs as defined by their relationships with one another, other scholarship emphasizes that our understandings of words are linked to concrete referents in the external world: physical objects and events, sensorimotor experiences, and/or emotional experiences (e.g., Barsalou 1999; Lakoff and Johnson 2008; Moseley et al. 2012; Pulvermüller 2013; Quiroga et al. 2005; Smith and Gasser 2005). For example, like word embeddings, humans can know when and how to use the word “summer” next to words like “spring” and “sun.” But in addition, humans can understand when “summer” refers to a specific, upcoming time point. And we can identify other, non-linguistic references to “summer,” such as from a calendar. These examples illustrate that meaning may be grounded in concrete referents (see also Bryson 2008). We may also learn the meanings of “summer” as something about relaxation, joy, sunshine, and warmth, from our experiences in summer months. Hearing or seeing the word may then evoke these same sensations and feelings. Perhaps our meaning of the word “summer” entirely consists of the feelings and sensations evoked by the word. These examples illustrate that meaning may be embodied—that is, linked to our bodily sensations.
Across the disciplines, there is growing consensus that our brains incorporate semantic information from a variety of sources (Davis and Yee 2021; Quiroga et al. 2005), not merely from language. However, exactly how this occurs is less understood. For instance, it is unclear to what extent cultural meaning is linguistic versus embodied/grounded, and how we combine semantic information from distinct sources (e.g., linguistic and sensorimotor). Therefore, this critique highlights that while the meanings of words (and other symbols) are likely relational to some extent, language is far more than just a self-contained system of signs.
4.1.1. Implications of these critiques for word embeddings
The extent to which critiques of a relational approach to meaning apply to word embeddings partly depends on the analysts’ interpretation of “meaning” in word embeddings. As detailed in section 3.1, a weaker reading of the Distributional hypothesis is that a word’s meaning correlates to its relationship to other words in a language. A stronger reading of the Distributional hypothesis is that a word’s meaning is defined by its relationship to other words in a language. This stronger interpretation is also more structuralist, and vulnerable to the critique that linguistic meaning may also be grounded or embodied, rather than merely relational.
Despite critiques of a purely relational notion of meaning, some words do not have concrete referents or are not easily experienced. These words’ meanings are unlikely to be learned from concrete referents or sensorimotor information alone (Borghi et al. 2017). For example, humans know how to use words like “depression” and “royalty,” even though we have not all experienced these concepts. More generally, it is unlikely that we learn the meanings of abstract words (e.g., “epistemic” and “subjective”) from sensorimotor experience or physical objects. These points suggest that meaning is not likely entirely embodied, either. The Distributional hypothesis suggests a mechanism for learning and communicating more abstract concepts: experiences with language and the relational patterns between words (Borghi et al. 2017; Günther et al. 2019).
In fact, the high performance of word embeddings across a range of linguistic tasks (e.g., solving analogies) provides one of the most convincing demonstrations of the Distributional hypothesis, and a relational notion of meaning more generally. Even though word embeddings learn from the distributional patterns of words alone, they learn semantic information that strongly correlates to what humans learn (for a review, see Caliskan and Lewis 2022). This suggests that a mid-range reading of the Distributional hypothesis is warranted: that meaning correlates to distributional patterns of words and meaning can be partly learned from the patterned ways in which words are used (see also Davis and Yee 2021; Lenci 2018).
Even though word embeddings operationalize meaning as purely relational, they are still useful for sociologists to study sensorimotor and emotional information that is encoded into language. For instance, researchers might test how language about the senses is used to make sense of other domains, like descriptions of sexual relationships in terms of sweetness, bitterness, heat, or cold (see also Tavory and Swidler 2009). Dictionaries with sensorimotor information about words may be used to identify words about senses (Lynott et al. 2020). More generally, researchers can test for a range of conceptual metaphors in language (Lakoff and Johnson 1980), such as how semantic information about orientation organizes semantic information about morality (Lakoff and Johnson 2008). Because word embeddings encode rich semantic representations (and are scalable), they can be used to address calls to consider embodied knowledge as part of cultural and cognitive sociology (Ignatow 2007), especially through text analysis (Cerulo 2019; Ignatow 2009, 2016).
Further, a stream of work in computer science aims to develop language models where meaning is both relational (learned from distributional patterns in text data), and learned from extra-linguistic experiences, such as images of what a word represents (Baroni 2016; Bruni, Tran, and Baroni 2014; Goh et al. 2021; Li and Gauthier 2017; Radford et al. 2021; Roy 2005; Vijayakumar, Vedantam, and Parikh 2017). These multimodal word embeddings integrate semantic information derived from text, images, sound, or other formats. Empirically, multimodal embeddings capture slightly different information than what may be learned from text alone (e.g., Vijayakumar et al. 2017). For example, while word2vec learns the word “apple” is closest in space to “apples,” “pear,” “fruit,” and “berry,” a word2vec model also trained on sounds learns “apple” is closest to “bite,” “snack,” “chips,” and “chew” (Vijayakumar et al. 2017). Multimodal embeddings are not yet popular in social science but offer an exciting direction for sociologists to address and overcome critiques of a relational notion of meaning.
4.2. Critiques of Binary Oppositions
The focus on binary oppositions in structural linguistics (and structuralism more broadly) also garners extensive critique (Craib 1992). Binary oppositions are one among many possible forms of meaning. Opposition itself comes in many varieties: hierarchical, continuous, dichotomous, or graded (Geeraerts 2010:87). For instance, we might describe aesthetics as a dichotomous concept (as unattractive versus beautiful), or on a graded scale (unattractive versus plain versus pleasant versus beautiful). Oppositions might be discrete and mutually exclusive, such that having meaning at one pole of the opposition implies the complete lack of meaning at the other pole (e.g., dead versus alive). They may have an evaluative component, such as good versus bad and clean versus dirty. We might also have ensembles of multiple oppositions. For instance, the Western meaning-system for direction consists of two binary oppositions (north/south and east/west) or of three binary oppositions (up/down, left/right, and forward/backward) (Geeraerts 2010:87). Concepts may also be multidimensional, such as the constructions of race and ethnicity. In sum, the structuralist focus on binary oppositions (as opposed to other oppositions or other forms of meaning) may be overly reductionistic.
4.2.1. Implications of these critiques for word embeddings
Limitations of binary oppositions in structural linguistics also directly apply to the large body of work in word embeddings which focuses on studying semantic dimensions. Indeed, analysts have measured a wide variety of concepts as binary oppositions in semantic space, such as gender, age, morality, and size. At the same time, researchers find tremendous variation in the extent to which the resulting measures actually match human-rated perceptions of the concept (e.g., Chen, Peterson, and Griffiths 2017; Grand et al. 2022; Joseph and Morgan 2020). These findings underscore theoretical critiques of the limitations of binary oppositions.
Gender is the canonical case for studying concepts as binary oppositions in word embeddings. But gendered meanings in word embeddings (when measured as a binary opposition) also appear to have an exceptionally tight correspondence to human ratings—unlike, for example, race (Grand et al. 2022; Joseph and Morgan 2020). Perhaps gender is also an outlier in the extent to which it manifests in natural language as an opposition between two poles (see also Ethayarajh, Duvenaud, and Hirst 2019). Indeed, gender is frequently and explicitly denoted in many languages (as masculine vs feminine) with pronouns, suffixes, and other grammatical endings. Alternatively, perhaps human raters conceptualized the concept of gender around two poles, more so than they tended to do for other concepts. It is well-known that gender is pervasively constructed as a binary between men and women (Ridgeway 2011). Although scholarship has not yet resolved why gender is an outlier in these word embedding studies, this pattern does highlight the limitations and nuances of measuring all concepts as binary oppositions in semantic space.
Some word embedding scholarship goes beyond binary oppositions by looking for systems of oppositions and other latent structures in space. Kozlowski et al. studied class as a system of oppositions, focusing on the relationships between five dimensions of class across time (2019). Boutyline et al. investigated gendered stereotypes relevant to education in print media from 1930 to 2009, including the gendered cultural associations of effort and intelligence (effort and intelligence are stereotypically feminine and masculine routes to success, respectively) (2020). Across time, the gendered associations of effort and intelligence became increasingly and synchronously polarized: as the former gained feminine associations, the latter gained masculine associations. These results suggest that these gendered stereotypes changed together across time as a system of oppositions.
Finally, researchers have also begun to investigate information structures beyond oppositions, such as topical regions or clusters of words in semantic space (e.g., Arora et al. 2018; Arseniev-Koehler et al. 2022). Further, as demonstrated by Nelson (2021), because word-vectors can be decomposed and recombined, word embeddings can be used to look at meaning from an intersectional lens where binary oppositions may interact. This can be done, for instance, by combining the word-vectors “woman” and “Black,” and comparing this with the combination of “woman” and “white.” Meanwhile, another body of work investigates how to build word embeddings that can model even more nuanced forms of information, such as hierarchy (Nickel and Kiela 2017). Thus, while early scholarship using word embeddings heavily focused on binary oppositions, emerging scholarship considers other structures that can overcome critiques of binary oppositions.
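A minimal sketch of such an intersectional probe, assuming the pretrained gensim vectors used in earlier sketches and treating vector addition as one simple way to combine meanings:

```python
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-100")

# combine word-vectors into intersectional probes
black_woman = model["black"] + model["woman"]
white_woman = model["white"] + model["woman"]

# compare the nearest neighbors of each combined vector
print(model.similar_by_vector(black_woman, topn=5))
print(model.similar_by_vector(white_woman, topn=5))
```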
4.3. Critiques of Coherence
The coherence posited by structural linguistics (and structuralism more generally) is also one of its most controversial aspects. From this perspective, structuralism envisions meaning (of words or other symbols) as unrealistically logical, systematic, and homogenous (e.g., Bakhtin 1981; DiMaggio 1997; Martin 2010; Sewell 2005:169–72; Swidler 1986, 2013). Scholars also emphasized the necessity of accounting for context (e.g., Douglas 2003; Labov 1972). In the case of language, a word’s meaning may vary depending on a host of factors, such as where the text is produced and by whom, or who is reading the text (e.g., Franco et al. 2019; Geeraerts, Grondelaers, and Bakema 2012; Geeraerts and Speelman 2010; Hu et al. 2017; Peirsman, Geeraerts, and Speelman 2010; Robinson 2010).
As an example, empirical work shows that individuals’ interpretations of the word “awesome” vary widely across linguistic contexts; interpretations depend on the individuals’ age, gender, and even neighborhood (Robinson 2010). Even among uses of a word within a given document, the word may evoke very different interpretations depending on its surrounding words—a phenomenon known as polysemy. For instance, the word “depression” has an entirely different meaning in a sentence about mental health versus one about economics. The economic sense of “depression” can be specified even further, as in “The Great Depression,” which refers to one particular economic downturn. Such examples of variation in words’ meanings highlight the shortcomings of conceptualizing meaning as a coherent system of signs.
4.3.1. Implications of these critiques for word embeddings
Word embedding methods commonly used in social science, such as word2vec and GloVe, are also vulnerable to longstanding critiques of coherence. Most crucially, these models use a single word-vector for each vocabulary word in the corpus, thus smoothing over the varying ways in which a word is used across the training corpus. Because these models are insensitive to linguistic context, they are commonly critiqued as modeling words’ meaning as unrealistically coherent (e.g., Faruqui et al. 2016; Gladkova and Drozd 2016; Neelakantan et al. 2015; Wang et al. 2020).
In fact, this limitation prompted a variety of new approaches in computer science to allow linguistic meaning to be more context dependent. Most notably, computer scientists developed a new paradigm to model language: “contextualized” neural word embeddings (Devlin et al. 2019; M. Peters et al. 2018). While models like word2vec and GloVe represent each vocabulary word as a vector, contextualized models produce a vector for each instance of a word in a text. For example, using a contextualized embedding, each time the word “depression” is used in a corpus it may be modeled with a slightly different vector. The raw word-vector for “depression” is modified based on the context words used around each mention of “depression.” Thus, contextualized models enable words’ meanings to vary depending on linguistic contexts. Contextualized models revolutionized how language is modeled in computer science and artificial intelligence.
The specific approaches to contextualize word-vectors vary widely, but all use some form of an artificial neural network (for reviews, see M. E. Peters et al. 2018; Wang et al. 2020). The broad goal of training is to learn both stable and contextual aspects of language. To gain intuition into how a model might produce contextualized word-vectors, consider a very simplified strategy (roughly based on Akbik, Blythe, and Vollgraf 2018; Peters et al. 2017). Like word2vec, an artificial neural network is tasked with “reading” a sentence and predicting the next word. However, as this network predicts the next word in the sentence, it also keeps an ongoing vector representing the “gist” of what is currently being talked about at any point in the sentence.9 This “gist” is updated with each new word encountered in the sentence, and it is a function of the sequence of preceding words. Part of the model’s training process is learning how to maintain this “gist”: learning what information to keep, what to forget, and how to use previously encountered information as it reads a sentence and predicts a word. Once training finishes, we can input a sentence and, for any word used at some point t in the sentence, we can extract out the “gist” at time t. This “gist” is still a vector, but it is a function of the preceding context words in the sentence. This gist may also be combined with a non-contextualized word-vector (e.g., concatenated or summed). Regardless of the approach, contextualized word embeddings allow for heterogeneity across linguistic contexts.
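The recurrent “gist” strategy above is only one way to contextualize word-vectors; the currently dominant contextualized models are transformer-based. As a hedged illustration of the general output (one vector per occurrence of a word), the sketch below extracts two contextualized vectors for “depression” from a pretrained BERT model via the transformers library; the example sentences are invented:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = ["She was treated for depression after the loss.",
             "The depression devastated the national economy."]

occurrence_vectors = []
for sentence in sentences:
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        output = model(**inputs)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    position = tokens.index("depression")  # locate this occurrence
    occurrence_vectors.append(output.last_hidden_state[0, position])

# the two vectors differ because each reflects its surrounding context
similarity = torch.nn.functional.cosine_similarity(
    occurrence_vectors[0], occurrence_vectors[1], dim=0)
print(float(similarity))
```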
Still, even contextualized word embeddings do not entirely overcome critiques of coherence. For instance, they do not account for heterogeneity across extra-linguistic contexts—such as who produced the text, when, where, or why. In addition, contextualized models (particularly the most recent ones, like BERT (Devlin et al. 2019)) require extraordinarily large corpora for training. Therefore, training from scratch on a dataset is generally impractical or impossible. Instead, researchers typically begin with one of a few select models (sometimes called “foundation models”) which are already trained on supersized corpora (Bommasani et al. 2021). They then continue to train (i.e., “fine tune”) the model on their specific data or task. While foundation models are remarkably adaptable, even after fine-tuning they will still reflect their initial training data in various ways (Merchant et al. 2020). Thus, contextualized models still assume there is some underlying coherence to language.
Using traditional (i.e., non-contextualized) word embeddings, one practical strategy to address the varied uses of words within a corpus is to train multiple models on various subsamples of the data (e.g., bootstrapping). This strategy is often used to ensure that findings are robust to any particular subset of documents (or other contexts) (e.g., Arseniev-Koehler and Foster Forthcoming; Best and Arseniev-Koehler 2022; Boutyline et al. 2020; Kozlowski et al. 2019). But it also reveals the extent to which empirical findings are sensitive to specific usages of words. Resampled or bootstrapped embeddings make it possible to model variation in a word’s meaning across contexts. They also make it easier to distinguish between patterns that are robust across documents versus specific to subsets of documents. However, this approach is not as sensitive to context as is contextualized word embedding. And it still does not account for the many other sources of variation in linguistic meaning.
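A minimal sketch of this bootstrapping strategy, assuming gensim and a corpus represented as a list of documents (each a list of tokenized sentences); the statistic computed across the resulting models is left to the analyst:

```python
import random
from gensim.models import Word2Vec

def bootstrap_embeddings(documents, n_models=20, **w2v_kwargs):
    """Train one embedding per bootstrap resample of the documents, so that
    downstream statistics can be reported with uncertainty estimates."""
    models = []
    for _ in range(n_models):
        resample = random.choices(documents, k=len(documents))  # with replacement
        sentences = [sentence for doc in resample for sentence in doc]
        models.append(Word2Vec(sentences, **w2v_kwargs))
    return models

# usage (hypothetical corpus): compute a projection in every model,
# then report its mean and spread across the bootstrapped embeddings
# models = bootstrap_embeddings(corpus, vector_size=100, min_count=10)
```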
For sociological applications of word embeddings, contextualized models and resampling offer ways to empirically investigate the extent to which meaning is coherent. One study in computer science investigated the extent to which meaning is contextual in contextualized embeddings, by comparing words’ vectors from contextualized versus non-contextualized embeddings (Ethayarajh 2019). This study found that, on average, less than 5% of the variance in a word’s contextualized word-vectors (where there is one word-vector for each instance of the word in the corpus) could be explained by a single, static word-vector. Further, contextualized models dramatically outperform non-contextualized models on many linguistic tasks. This might suggest that enabling words’ meanings to vary across linguistic contexts offers a better model for human meaning. Or perhaps the strong coherence assumed by models like word2vec and GloVe is indeed overly unrealistic.
However, empirical work illustrates that it is not necessarily the contextualization process itself that leads to contextualized models’ improvement on linguistic tasks (Arora et al. 2020). Indeed, contextualized embeddings also include radically more complex training architectures and are trained on tremendously larger corpora. The extent to which contextualized embeddings outperform static models in downstream applications varies widely across specific linguistic tasks (Arora et al. 2020; Ethayarajh 2019; Tenney et al. 2019). For certain tasks and corpora, contextualized word embeddings only yield marginal improvements (Arora et al. 2020; Tenney et al. 2019). In some cases, contextualized and non-contextualized embeddings trained on similar corpora even perform similarly, lending some support to structuralist visions of coherence. For social scientists, these findings suggest that the extent to which meaning is coherent is more nuanced than either extreme suggests, and it remains an open (and promising) research area well suited to word embedding methods.
4.4. Critiques of a Static Lens on Language
Structural linguistics conceptually distinguished the study of language across time from the study of language at a single time point. While this was useful for analytical purposes, structural linguistics also struggled to reconcile these two lenses (Giddens 1979:13; Stoltz 2019). Even if we give precedence to theorizing a symbolic system at a single time point, we also need to be able to explain changes in this system (Emirbayer 2004:10–11; Giddens 1979). As a theory, and even as a framework for empirical analysis, structural linguistics cannot account for how language (or any other symbolic system) may be both static and dynamic.
4.4.1. Implications of these critiques for word embeddings
The extent to which word embeddings are vulnerable to critiques of structural linguistics’ static lens partly depends on the approach used to learn word embeddings: count-based versus artificial neural network based. Count-based word embeddings model a symbolic system as static: they abstract the whole semantic space at once by performing dimensionality reduction on a co-occurrence matrix, as described in section one. These methods do not incorporate any mechanism for change in a semantic space. Thus, count-based embeddings, like structuralist linguistics, cannot reconcile a static and dynamic account of language.
By contrast, neural word embeddings (including contextualized models) model language more dynamically. The word-vectors are deployed and updated each time new cultural stimuli (e.g., text excerpts) are encountered. Upon experiencing a context (i.e., text excerpt), the neural word embedding uses its current information about each vocabulary word (i.e., “looks up” the word’s position in the semantic space at this point) to make a prediction about the missing word in the context. When the prediction is incorrect, the positions of word-vectors are shifted, yielding an updated symbolic system. Thus, word-vectors structure how neural word embeddings experience any incoming language and are simultaneously structured by new experiences with language. In this way, neural word embeddings operationalize the notion that a symbolic system is both a “thing” and a “process,” i.e., a “structuring structure” (Bourdieu 1984; Giddens 1979; Sewell 1992). The symbolic system captured by neural word embeddings is part of a dynamic process: it changes as the embedding interacts with its cultural environment and experiences new data.
When we use word-vectors from neural word embeddings in social science applications, we generally stop the training process and begin analyses on the “frozen” system. We have extracted the word-vectors as static representations from a system that can hypothetically change at any time with additional stimuli (i.e., additional text data) if we were to “unfreeze” the system. Thus, unlike count-based embeddings, neural word embeddings lend themselves to static analyses, but do not entirely divorce static and dynamic lenses.
Importantly, this account of neural word embeddings as dynamic makes more sense for the cognitive flavor of structuralism, where langue is internalized in a single individual. “Training,” then, may be thought of as cognitive socialization (Arseniev-Koehler and Foster Forthcoming). Notably, neural word embedding represents only one possible source for meaning change within an individual: novel experiences with cultural symbols. It does not account for many other possible factors for meaning changes, such as social relationships, and it does not offer an account for macro-level change in meaning.
Further, while neural word embeddings offer a possible theoretical reconciliation between static and dynamic lenses, methods to empirically study semantic change with embeddings remain limited (for a review, see Kutuzov et al. 2018). This limitation is important to address given the growing sociological interest in investigating culture across larger time-scales using word embeddings (e.g., Best and Arseniev-Koehler 2022; Boutyline et al. 2020; Jones et al. 2020; Kozlowski et al. 2019; Stoltz and Taylor 2020). One popular approach to studying semantic change is to divide the corpus into time segments and then compare embeddings trained separately on each segment (whether count-based or neural) (Kulkarni et al. 2015). To compare word-vectors across time points, researchers may then either (1) rotate the embeddings from different segments so that their word-vectors are directly comparable, or (2) compare cosine similarities (i.e., between words or sets of words) within each segment. A downside of the second strategy is that it assumes most words did not shift in meaning, so that local relationships are static. A downside of both strategies is that they require training an embedding on each time segment; as a result, they may be infeasible for corpora of many sizes of sociological interest (but see Boutyline et al. 2020). Even with large corpora, this approach does not allow for very granular time segments.
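Strategy (1) is commonly implemented as an orthogonal Procrustes alignment, rotating one segment’s embedding onto another’s over their shared vocabulary (the approach used by W. Hamilton, Leskovec, and Jurafsky 2016). A minimal numpy sketch, with synthetic matrices standing in for trained embeddings:

```python
import numpy as np

def align(A, B):
    """Rotate embedding B onto embedding A (rows = shared vocabulary),
    solving the orthogonal Procrustes problem so that word-vectors from
    two time segments become directly comparable (strategy 1)."""
    U, _, Vt = np.linalg.svd(B.T @ A)   # R = argmin ||A - BR||_F, R orthogonal
    return B @ (U @ Vt)

rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 100))                  # embedding from segment t
Q = np.linalg.qr(rng.normal(size=(100, 100)))[0]  # an arbitrary rotation
B = A @ Q                                         # "segment t+1" embedding
print(np.allclose(A, align(A, B)))                # True: rotation undone
```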
A second broad approach to studying semantic change estimates a word-vector for each time period all at once, within a single, modified neural word embedding model (e.g., Bamler and Mandt 2017; Rosenfeld and Erk 2018; Yao et al. 2018). For instance, in addition to maximizing the similarity of words that co-occur (and minimizing the similarity of words that do not), Yao et al. (2018) propose a modified model that also maximizes the similarity of vectors for the same word at adjacent time points. Rosenfeld and Erk (2018) propose to (1) learn a time-invariant embedding for each word and (2) an embedding for each time point, and then (3) combine these with a learned function (e.g., a weighted combination) to arrive at a time-stamped word-vector. Although more complex than prior approaches and not motivated by any theoretical model of semantic change, these modified models offer exciting opportunities to investigate meaning change even in smaller-scale data and/or at finer granularity.
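To convey the flavor of such joint models, the core idea behind Yao et al.’s (2018) approach can be written, in deliberately simplified notation (a gloss, not their exact formulation), as

\[
\min_{U_1,\ldots,U_T}\ \sum_{t=1}^{T}\bigl\lVert \mathrm{PPMI}_t - U_t U_t^{\top}\bigr\rVert_F^2
\;+\; \lambda \sum_{t=1}^{T}\lVert U_t\rVert_F^2
\;+\; \tau \sum_{t=2}^{T}\lVert U_t - U_{t-1}\rVert_F^2,
\]

where the rows of \(U_t\) hold the word-vectors at time \(t\) and \(\lambda\) and \(\tau\) are regularization weights. The final smoothing term is what ties a word’s vectors together across adjacent time points, so that the word’s entire trajectory is estimated at once rather than segment by segment.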
A third approach to studying semantic change with embeddings is to train neural word embeddings on documents ordered across time, “freezing” and saving the system at various points during training, and then comparing the frozen models (Kim et al. 2014). This approach more clearly operationalizes how a symbolic system may change within an individual as they experience text, rather than macro-scale change. But it also has practical limitations. For example, the quality of word-vectors may improve with continued training, making it challenging to disentangle training effects from true change across time. Further, words that are absent for several time points may simply appear to have no semantic change, and it is unclear how new words can be incorporated into the semantic system.
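A hedged sketch of this train-and-snapshot design with gensim (the two toy “periods” and all parameter values are illustrative assumptions):

```python
from copy import deepcopy
import numpy as np
from gensim.models import Word2Vec

# Time-ordered toy corpora; a real analysis would stream documents by date.
periods = {1990: [["the", "web", "is", "new", "technology"]],
           2000: [["the", "web", "is", "everywhere", "now"]]}

model = Word2Vec(vector_size=50, window=2, min_count=1, seed=1)
model.build_vocab([doc for docs in periods.values() for doc in docs])

snapshots = {}
for year, docs in sorted(periods.items()):
    model.train(docs, total_examples=len(docs), epochs=5)
    snapshots[year] = deepcopy(model.wv)   # "freeze" and save the system here

# Because training continues in one space, frozen snapshots are comparable:
v0, v1 = snapshots[1990]["web"], snapshots[2000]["web"]
drift = 1 - v0 @ v1 / (np.linalg.norm(v0) * np.linalg.norm(v1))
print(drift)   # cosine distance: how far "web" moved between snapshots
```

The confound noted above is visible even in this sketch: some of the measured “drift” simply reflects additional training on more data, not semantic change in the corpus.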
To sum up, neural word embedding methods offer a model of a dynamic meaning system that may be paused for static analyses. A variety of methods exist to empirically study language as a dynamic system, but additional methodological work on modeling linguistic change with word embeddings is essential: we need new approaches as well as validation and theorization of existing ones. Such methodological advances will enable social scientists to empirically analyze the structure and content of semantic systems across time in more precise and formalized ways. They can also offer new insight into reconciling static and dynamic lenses on language, and on cultural symbolic systems more broadly.
5. DISCUSSION
Word embeddings open new doors for social scientists to investigate culture and language at scale. However, as with any method, it is crucial that we clarify exactly what word embeddings operationalize. This paper has critically theorized how word embeddings operationalize key premises of an influential theory of language: structural linguistics. Not only can word embeddings be used to operationalize core premises of structural linguistics; their remarkable success at capturing human-like semantic information also attests to the validity of structural linguistic theory itself.
This paper also theorized the ways in which word embeddings succumb to or overcome several critiques of these structuralist premises, such as debates about the (in)coherence of meaning and relational notions of meaning. As highlighted in this paper, different word embedding algorithms do so differently. In general, while count-based word embeddings share many limitations of structural linguistics, neural word embeddings, and especially contextualized neural word embeddings, offer solutions to these limitations. For example, neural word embeddings model a dynamic symbolic system, which may then be frozen for static, structuralist analyses. Further, while some word embeddings (e.g., word2vec) may model language as overly coherent, contextualized neural word embeddings offer ways to account for variation across linguistic contexts. More broadly, the theoretical shortcomings of structuralism parallel the limitations that advances in computer science aim to address in word embeddings: the move from count-based to neural embeddings, the more recent move from static to contextualized embeddings, and the growing interest in diachronic and multimodal embeddings (see also Bisk et al. 2020).
The extent to which word embeddings succumb to critiques of structuralism also depends on the analyst’s own interpretation of “meaning” in word embeddings. This includes, for example, whether the analyst interprets word embeddings as a theoretical model or as a method to measure meaning. For instance, the Distributional hypothesis may be seen as modeling one mechanism by which meaning is constructed from public culture. Or the hypothesis may be interpreted merely as an approach to capture a proxy for meaning, whatever meaning may be, thus side-stepping the concept of meaning altogether. Further, neural embeddings may be interpreted as formal models for how humans represent and process semantic information (Arseniev-Koehler and Foster Forthcoming), not only as methods to measure meaning. Like network analysis (Borgatti et al. 2009) and other structuralist tools, word embeddings may be understood as a method or as a metaphor (Craib 1992:133).
Directions for Future Social Science Research Using Embeddings
This paper highlights numerous future research directions in computational social science and cultural sociology. First, word embedding methods offer opportunities to revisit longstanding debates about the coherence of meaning. While word embeddings like word2vec represent each word’s meaning as fixed, contextualized word embeddings represent meaning as dependent on linguistic context. Contextualized word embeddings make it possible to compare meanings across linguistic contexts and to move past a dichotomous view of cultural meaning as either coherent or incoherent (Ghaziani 2011). These methods offer strategies to measure the extent of variation in meaning, identify patterns in the distribution of meanings (Sperber 1985), and perhaps ultimately explain how meaning may be both ordered and messy. At the same time, all word embeddings reflect their training corpus. Given that meaning may vary across possible training corpora, analysts must consider how the training corpus for any word embedding is produced, why, and by whom, and whose meanings the embedding represents or excludes. In this way, theoretical work on coherence in cultural sociology may be relevant to contemporary ethical issues in machine learning (Bommasani et al. 2021), such as the fact that language models often reflect the language (and ideologies about language) of dominant social groups (Blodgett et al. 2020; Shah, Schwartz, and Hovy 2020).
Second, extrapolating from the meaning of words to the meaning of other signs and symbolic systems was an important legacy of structural linguistics. Similarly, while word embeddings focus on meaning in written language, their algorithms generalize to a range of symbolic systems: from modeling nodes in a social network (e.g., Grover and Leskovec 2016) to segments of musical scores (e.g., Arronte Alvarez and Gómez-Martin 2019; Chuan et al. 2020). Many of the key theoretical points raised in this paper about word embeddings also extend to these other modalities. For instance, all of these variations of embeddings hinge on generalizing the Distributional hypothesis: nodes, sounds, or images are defined by their relationships to other nodes, sounds, or images, respectively. One key distinction of non-linguistic embeddings is that cultural elements may not be as cleanly demarcated as words in a text. Thus, non-linguistic embeddings revive longstanding methodological questions about what counts as a cultural element (Mohr 1998). This paper has focused on word embeddings given their recent rise in social science. However, moving beyond word embeddings to other modalities could also aid sociological research on how signs and symbolic systems, more generally, operate in our cultural environment (Bail 2014).
Third, this paper also highlighted a variety of scholarship using word embeddings to study the relationship between symbolic and material (e.g., demographic) patterns. For example, Garg et al. (2018) studied the relationship between the gendered associations of occupations in text and the gender ratios of those occupations. This is an exciting direction and responds to longstanding calls to study cultural and social orders in conjunction. One especially relevant application is socio-semantics, which studies the relationships between social ties and semantic structures (Basov, Breiger, and Hellsten 2020). For instance, Linzhuo et al. (2020) use embedding methods to study the relationship between centralization in social networks and semantic diversity. Notably, this research direction also offers one more way to address critiques of the linguistic structuralist vision of meaning as a “closed” system.
Finally, while this paper considers several core premises and critiques of structural linguistics (and structuralism more broadly) as they apply to word embeddings, this intellectual movement is broad (Dosse 1997). Future theoretical work might consider how word embeddings and their specific architectures align with or diverge from these variations within structural linguistics, such as perspectives of de Saussure, Jakobson, and C.S. Peirce (e.g., Yakin and Totu 2014), and the many branches of structuralism more broadly. Future work might also consider numerous critiques of structuralism which are not covered in this paper, such as the role of agency and creativity. Such research might unveil other implicit theoretical assumptions—and potential innovations—in word embeddings.
Conclusions
Word embeddings are becoming a pervasive social science approach to analyzing language, meaning, and culture in text data. However, these methods remain undertheorized. To ensure we use them effectively, it is crucial that we define what kind of meaning word embeddings operationalize and what assumptions they carry. Dissecting the way word embeddings implicitly formalize (or might be used to formalize) sociological concepts can ultimately push us to redefine these concepts themselves (Merton 1948). Analogously, social network analysis pushed scholars to clarify concepts like “social tie,” “network,” and “community” (Borgatti et al. 2009). Now, word embeddings offer a new theoretical opportunity to formalize concepts in cultural sociology, such as schema (Arseniev-Koehler and Foster Forthcoming), binary opposition (Kozlowski et al. 2019), symbolic system, symbol, and coherence.
Acknowledgments
This paper is based on work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-1650604 and the National Library of Medicine Training Grant: NIH grant T15LM011271. I am grateful to Omar Lizardo, Jacob Foster, Bernard Koch, Eleni Skaperdas, Sam Ashelman, Devin Cornell, the members of the UCLA Conversation Analysis Working Group, and the anonymous reviewers for their valuable feedback on this work.
Footnotes
Measured with the Implicit Association Test (Greenwald, McGhee, and Schwartz 1998).
Word2vec remains among the most popular and parsimonious neural word embedding models used in social science, and thus is one core neural word embedding model referenced in this paper. However, computer scientists have also developed a wide variety of neural word embeddings which have specialized features. Some of these variants are described as relevant in later sections (e.g., “contextualized” embeddings and multimodal word embeddings).
This explanation of word2vec is simplified for brevity. For example, in word2vec-CBOW, the context-vector is not merely the average of the context word-vectors, since high-frequency words are downweighted in certain word2vec architectures (Arora et al. 2016). For details on the word2vec architecture, see Rong (2014).
To be more precise, there are two word-vectors for each word: One corresponds to contexts and the other to target words.
More specifically, the objective is to maximize the similarity between the missing word and the context words and to minimize the similarity between the missing word and vocabulary words not in the context. Minimizing the similarity between the missing word and all vocabulary words not in the context is impractical to implement. Therefore, in practice this is often approximated by minimizing the similarity between the missing word and k randomly sampled vocabulary words that do not appear in the context (i.e., negative sampling).
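For example, in the skip-gram variant with negative sampling, the contribution of one observed word-context pair \((w, c)\) to the objective can be written as (Mikolov, Sutskever, et al. 2013):

\[
\log \sigma\bigl(\vec{v}^{\,\prime}_{w}\cdot \vec{v}_{c}\bigr)
+ \sum_{i=1}^{k}\mathbb{E}_{w_i \sim P_n(w)}\Bigl[\log \sigma\bigl(-\vec{v}^{\,\prime}_{w_i}\cdot \vec{v}_{c}\bigr)\Bigr],
\]

where \(\sigma\) is the logistic function and \(P_n(w)\) is the noise distribution from which the k negative samples are drawn.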
Count-based and neural word embeddings have been shown to perform comparably on certain semantic tasks such as completing analogies and evaluating word similarity (Levy and Goldberg 2014; Levy, Goldberg, and Dagan 2015). In theory, these embedding approaches are also trying to extract very similar information from the raw text data. Both GloVe and word2vec (with skip-gram, negative sampling, and certain parameter settings) have been shown to perform implicit matrix factorization (Levy and Goldberg 2014). Still, the algorithmic difference between them is crucial to their interpretation and theorization (Baroni, Dinu, and Kruszewsk 2014; Günther, Rinaldi, and Marelli 2019; Mandera, Keuleers, and Brysbaert 2017). For instance, predictive and count-based approaches provide very different mechanistic explanations for how meaning may be learned and processed, and thus operationalize slightly different notions of meaning (see also Arseniev-Koehler and Foster Forthcoming; Günther et al. 2019). Analogously, two different agent-based models may arrive at similar macro-level outcomes, even if they have different assumptions about what agents do.
While contemporary word embeddings were developed in computer science for the purpose of quantifying language, rather than operationalizing any theory of meaning, they imported approaches that were used in cognitive science to quantitatively model how humans process and represent language (e.g., Latent Semantic Analysis (Landauer and Dumais 1997)).
In practice, we usually stop the training process based on preset hyperparameters, such as the error falling below a certain threshold or a cap on the number of iterations of the algorithm.
In more technical jargon, this conceptual description refers to the hidden state in a long short-term memory (LSTM) network.
WORKS CITED
- Akbik Alan, Blythe Duncan, and Vollgraf Roland. 2018. “Contextual String Embeddings for Sequence Labeling.” Pp. 1638–49 in Proceedings of the 27th International Conference on Computational Linguistics. Santa Fe, New Mexico, USA: Association for Computational Linguistics.
- Alexander Jeffrey. 2003. The Meanings of Social Life. Oxford: Oxford University Press.
- Alexander Jeffrey, and Smith Philip. 1993. “The Discourse of American Civil Society: A New Proposal for Cultural Studies.” Theory and Society 22(2):151–207.
- Arora Sanjeev, Li Yuanzhi, Liang Yingyu, Ma Tengyu, and Risteski Andrej. 2018. “Linear Algebraic Structure of Word Senses, with Applications to Polysemy.” Transactions of the Association for Computational Linguistics 6:483–95.
- Arora Sanjeev, Li Yuanzhi, Liang Yingyu, Ma Tengyu, and Risteski Andrej. 2016. “A Latent Variable Model Approach to PMI-Based Word Embeddings.” Transactions of the Association for Computational Linguistics 4:385–99.
- Arora Simran, May Avner, Zhang Jian, and Ré Christopher. 2020. “Contextual Embeddings: When Are They Worth It?” Pp. 2650–63 in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics.
- Arronte Alvarez Aitor, and Gómez-Martin Francisco. 2019. “Distributed Vector Representations of Folksong Motifs.” Pp. 325–32 in Mathematics and Computation in Music, edited by Montiel M, Gomez-Martin F, and Agustín-Aquino OA. Cham: Springer International Publishing.
- Arseniev-Koehler Alina, Cochran Susan D., Mays Vickie M., Chang Kai-Wei, and Foster Jacob G. 2022. “Integrating Topic Modeling and Word Embedding to Characterize Violent Deaths.” Proceedings of the National Academy of Sciences 119(10):e2108801119. doi: 10.1073/pnas.2108801119.
- Arseniev-Koehler Alina, and Foster Jacob G. Forthcoming. “Machine Learning as a Model for Cultural Learning: Teaching an Algorithm What It Means to Be Fat.” Sociological Methods & Research. doi: 10.31235/osf.io/c9yj3.
- Bail Christopher. 2014. “The Cultural Environment: Measuring Culture with Big Data.” Theory and Society 43(3–4):465–82.
- Bakhtin Mikhail M. 1981. The Dialogic Imagination: Four Essays, edited by Michael Holquist, translated by Caryl Emerson and Michael Holquist. Austin: University of Texas Press.
- Bamler Robert, and Mandt Stephen. 2017. “Dynamic Word Embeddings.” Pp. 380–89 in Proceedings of the 34th International Conference on Machine Learning. Vol. 70. Sydney, Australia.
- Baroni Marco. 2016. “Grounding Distributional Semantics in the Visual World.” Language and Linguistics Compass 10(1):3–13.
- Baroni Marco, Dinu Georgiana, and Kruszewsk Germán. 2014. “Don’t Count, Predict! A Systematic Comparison of Context-Counting vs. Context-Predicting Semantic Vectors.” Pp. 238–47 in 52nd Annual Meeting of the Association for Computational Linguistics. Vol. 1.
- Barsalou Lawrence W. 1999. “Perceptual Symbol Systems.” Behavioral and Brain Sciences 22(4):577–660. doi: 10.1017/S0140525X99002149.
- Barthes Roland. 1961. “Toward a Psychosociology of Contemporary Food Consumption.” In Food and Culture, edited by Counihan C and Esterik PV. New York, NY: Routledge.
- Barthes Roland. 1977. Elements of Semiology. New York, NY: Hill and Wang.
- Basov Nikita, Breiger Ronald, and Hellsten Iina. 2020. “Socio-Semantic and Other Dualities.” Poetics 78:101433.
- Best Rachel Kahn, and Arseniev-Koehler Alina. 2022. Stigma’s Uneven Decline. Preprint. SocArXiv. doi: 10.31235/osf.io/7nm9x.
- Bisk Yonatan, Holtzman Ari, Thomason Jesse, Andreas Jacob, Bengio Yoshua, Chai Joyce, Lapata Mirella, Lazaridou Angeliki, May Jonathan, and Nisnevich Aleksandr. 2020. “Experience Grounds Language.” Pp. 8718–35 in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP).
- Blodgett Su Lin, Barocas Solon, Daumé Hal III, and Wallach Hanna. 2020. “Language (Technology) Is Power: A Critical Survey of ‘Bias’ in NLP.” Pp. 5454–76 in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics.
- Bolukbasi Tolga, Chang Kai-Wei, Zou James, Saligrama Venkatesh, and Kalai Adam. 2016. “Man Is to Computer Programmer as Woman Is to Homemaker? Debiasing Word Embeddings.” Pp. 4349–57 in Advances in Neural Information Processing Systems. Barcelona, Spain.
- Bommasani Rishi, Hudson Drew A., Adeli Ehsan, Altman Russ, Arora Simran, von Arx Sydney, Bernstein Michael S., Bohg Jeannette, Bosselut Antoine, Brunskill Emma, et al. 2021. “On the Opportunities and Risks of Foundation Models.” ArXiv:2108.07258 [Cs].
- Borgatti Stephen P., Mehra Ajay, Brass Daniel J., and Labianca Giuseppe. 2009. “Network Analysis in the Social Sciences.” Science 323(5916):892–95.
- Borghi Anna, Binkofski Ferdinand, Castelfranchi Cristiano, Cimatti Felice, Scorolli Claudia, and Tummolini Luca. 2017. “The Challenge of Abstract Concepts.” Psychological Bulletin 143:263.
- Bourdieu Pierre. 1984. Distinction: A Social Critique of the Judgement of Taste. Harvard University Press.
- Boutyline Andrei, Arseniev-Koehler Alina, and Cornell Devin. 2020. “School, Studying, and Smarts: Gender Stereotypes and Education Across 80 Years of American Print Media, 1930–2009.” SocArXiv. doi: 10.31235/osf.io/bukdg.
- Boutyline Andrei, Cornell Devin, and Arseniev-Koehler Alina. 2021. “All Roads Lead to Polenta: Cultural Attractors at the Junction of Public and Personal Culture.” Sociological Forum 36(S1):1419–45. doi: 10.1111/socf.12760.
- Bruni Elia, Tran Nam-Khanh, and Baroni Marco. 2014. “Multimodal Distributional Semantics.” Journal of Artificial Intelligence Research 49(2014):1–47.
- Bryson Joanna. 2008. “Embodiment versus Memetics.” Mind & Society 7(1):77–94.
- Caliskan Aylin, Bryson Joanna J., and Narayanan Arvind. 2017. “Semantics Derived Automatically from Language Corpora Contain Human-like Biases.” Science 356(6334):183–86.
- Caliskan Aylin, and Lewis Molly. 2022. “Social Biases in Word Embeddings and Their Relation to Human Cognition.” Pp. 447–63 in Handbook of Language Analysis in Psychology. New York: Guilford Publications.
- Cerulo Karen A. 1995. Identity Designs: The Sights and Sounds of a Nation. Rutgers University Press.
- Cerulo Karen A. 2019. “Embodied Cognition.” The Oxford Handbook of Cognitive Sociology 81.
- Charlesworth Tessa E. S., Caliskan Aylin, and Banaji Mahzarin R. 2022. “Historical Representations of Social Groups across 200 Years of Word Embeddings from Google Books.” Proceedings of the National Academy of Sciences 119(28):e2121798119.
- Charlesworth Tessa E. S., Yang Victor, Mann Thomas C., Kurdi Benedek, and Banaji Mahzarin R. 2021. “Gender Stereotypes in Natural Language: Word Embeddings Show Robust Consistency across Child and Adult Language Corpora of More than 65 Million Words.” Psychological Science 32(2):218–40.
- Chen Dawn, Peterson Joshua C., and Griffiths Thomas L. 2017. “Evaluating Vector-Space Models of Analogy.” ArXiv:1705.04416.
- Chuan Ching-Hua, Agres Kat, and Herremans Dorien. 2020. “From Context to Concept: Exploring Semantic Relationships in Music with Word2vec.” Neural Computing and Applications 32(4):1023–36. doi: 10.1007/s00521-018-3923-1.
- Craib I. 1992. “The Word as a Logical Pattern: An Introduction to Structuralism.” Pp. 131–48 in Modern Social Theory. St Martin’s Press.
- Davis Charles P., and Yee Eiling. 2021. “Building Semantic Memory from Embodied and Distributional Language Experience.” WIREs Cognitive Science 12(5). doi: 10.1002/wcs.1555.
- Devlin Jacob, Chang Ming-Wei, Lee Kenton, and Toutanova Kristina. 2019. “BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding.” Pp. 4171–86 in NAACL-HLT.
- DiMaggio Paul. 1997. “Culture and Cognition.” Annual Review of Sociology 23(1):263–87.
- Dosse François. 1997. History of Structuralism. Vol. 1. Minneapolis, MN: University of Minnesota Press.
- Douglas Mary. 2003. Purity and Danger: An Analysis of Concepts of Pollution and Taboo. Routledge.
- Emirbayer Mustafa. 2004. “The Alexander School of Cultural Sociology.” Thesis Eleven 79(1):5–15. doi: 10.1177/0725513604046951.
- Ethayarajh Kawin. 2019. “How Contextual Are Contextualized Word Representations?” In IJCNLP.
- Ethayarajh Kawin, Duvenaud David, and Hirst Graeme. 2019. “Towards Understanding Linear Word Analogies.” Pp. 3253–62 in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics.
- Faruqui Manaal, Tsvetkov Yulia, Rastogi Pushpendre, and Dyer Chris. 2016. “Problems With Evaluation of Word Embeddings Using Word Similarity Tasks.” Pp. 30–35 in Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP.
- Firth John. 1957. “A Synopsis of Linguistic Theory, 1930–1955.” Pp. 1–32 in Studies in Linguistic Analysis. Oxford: Philological Society.
- Franco Karlien, Geeraerts Dirk, Speelman Dirk, and Van Hout Roeland. 2019. “Concept Characteristics and Variation in Lexical Diversity in Two Dutch Dialect Areas.” Cognitive Linguistics 30(1):205–42.
- Garg Nikhil, Schiebinger Londa, Jurafsky Dan, and Zou James. 2018. “Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes.” Proceedings of the National Academy of Sciences 115(16):E3635–44.
- Geeraerts Dirk. 2010. Theories of Lexical Semantics. Oxford: Oxford University Press.
- Geeraerts Dirk, Grondelaers Stefan, and Bakema Peter. 2012. The Structure of Lexical Variation: Meaning, Naming, and Context. Vol. 5. Walter de Gruyter.
- Geeraerts Dirk, and Speelman Dirk. 2010. “Heterodox Concept Features and Onomasiological Heterogeneity in Dialects.” Pp. 21–40 in Advances in Cognitive Sociolinguistics. De Gruyter Mouton.
- Ghaziani Amin, and Baldassarri Delia. 2011. “Cultural Anchors and the Organization of Differences.” American Sociological Review 76(2):179–206.
- Giddens Anthony. 1979. Central Problems in Social Theory: Action, Structure and Contradiction in Social Analysis. Berkeley, CA: University of California Press.
- Gladkova Anna, and Drozd Aleksandr. 2016. “Intrinsic Evaluations of Word Embeddings: What Can We Do Better?” Pp. 36–42 in Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP. Berlin, Germany: Association for Computational Linguistics.
- Goh Gabriel, Cammarata Nick, Voss Chelsea, Carter Shan, Petrov Michael, Schubert Ludwig, Radford Alec, and Olah Chris. 2021. “Multimodal Neurons in Artificial Neural Networks.” Distill 6(3). doi: 10.23915/distill.00030.
- Grand Gabriel, Blank Idan Asher, Pereira Francisco, and Fedorenko Evelina. 2022. “Semantic Projection: Recovering Human Knowledge of Multiple, Distinct Object Features from Word Embeddings.” Nature Human Behaviour. doi: 10.1038/s41562-022-01316-8.
- Greenwald Anthony G., McGhee Debbie E., and Schwartz Jordan L. K. 1998. “Measuring Individual Differences in Implicit Cognition: The Implicit Association Test.” Journal of Personality and Social Psychology 74(6):1464.
- Grover Aditya, and Leskovec Jure. 2016. “Node2vec: Scalable Feature Learning for Networks.” Pp. 855–64 in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco, California, USA: ACM.
- Günther Fritz, Rinaldi Luca, and Marelli Marco. 2019. “Vector-Space Models of Semantic Representation from a Cognitive Perspective: A Discussion of Common Misconceptions.” Perspectives on Psychological Science 14(6):1006–33.
- Haber Jaren R. 2021. “Sorting Schools: A Computational Analysis of Charter School Identities and Stratification.” Sociology of Education 94(1):43–64. doi: 10.1177/0038040720953218.
- Hamilton William L., Leskovec Jure, and Jurafsky Dan. 2016. “Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change.” Pp. 1489–1501 in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics.
- Harris Zellig. 1954. “Distributional Structure.” Word 10(2–3):146–62.
- Hollis Geoff. 2017. “Estimating the Average Need of Semantic Knowledge from Distributional Semantic Models.” Memory & Cognition 45(8):1350–70. doi: 10.3758/s13421-017-0732-1.
- Hu Tianran, Song Ruihua, Abtahian Maya, Ding Philip, Xie Xing, and Luo Jiebo. 2017. “A World of Difference: Divergent Word Interpretations among People.” In Proceedings of the International AAAI Conference on Web and Social Media. Vol. 11.
- Ignatow G. 2009. “Culture and Embodied Cognition: Moral Discourses in Internet Support Groups for Overeaters.” Social Forces 88(2):643–69. doi: 10.1353/sof.0.0262.
- Ignatow Gabe. 2016. “Theoretical Foundations for Digital Text Analysis.” Journal for the Theory of Social Behaviour 46(1):104–20.
- Ignatow Gabriel. 2007. “Theories of Embodied Knowledge: New Directions for Cultural and Cognitive Sociology?” Journal for the Theory of Social Behaviour 37(2):115–35. doi: 10.1111/j.1468-5914.2007.00328.x.
- Joas H., and Knobl W. 2009. “Structuralism and Poststructuralism.” Pp. 339–70 in Social Theory: Twenty Introductory Lectures. Cambridge University Press.
- Jones Frank L., and Smith Philip. 2001. “Diversity and Commonality in National Identities: An Exploratory Analysis of Cross-National Patterns.” Journal of Sociology 37(1):45–63. doi: 10.1177/144078301128756193.
- Jones Jason J., Amin Mohammad Ruhul, Kim Jessica, and Skiena Steven. 2020. “Stereotypical Gender Associations in Language Have Decreased Over Time.” Sociological Science 7:1–35.
- Joseph Kenneth, and Morgan Jonathan. 2020. “When Do Word Embeddings Accurately Reflect Surveys on Our Beliefs About People?” Pp. 4392–4415 in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics.
- Kim Yoon, Chiu Yi-I, Hanaki Kentaro, Hegde Darshan, and Petrov Slav. 2014. “Temporal Analysis of Language through Neural Language Models.” Pp. 61–65 in Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science. Baltimore, MD, USA: Association for Computational Linguistics.
- Kozlowski Austin C., Taddy Matt, and Evans James A. 2019. “The Geometry of Culture: Analyzing the Meanings of Class through Word Embeddings.” American Sociological Review 84(5). doi: 10.1177/0003122419877135.
- Kulkarni Vivek, Al-Rfou Rami, Perozzi Bryan, and Skiena Steven. 2015. “Statistically Significant Detection of Linguistic Change.” Pp. 625–35 in Proceedings of the 24th International Conference on World Wide Web. Florence, Italy: International World Wide Web Conferences Steering Committee.
- Kutuzov Andrey, Øvrelid Lilja, Szymanski Terrence, and Velldal Erik. 2018. “Diachronic Word Embeddings and Semantic Shifts: A Survey.” ArXiv:1806.03537 [Cs].
- Labov William. 1972. Sociolinguistic Patterns. University of Pennsylvania Press.
- Lakoff George, and Johnson Mark. 1980. “Conceptual Metaphor in Everyday Language.” The Journal of Philosophy 77(8):453–86.
- Lakoff George, and Johnson Mark. 2008. Metaphors We Live By. University of Chicago Press.
- Landauer Thomas, and Dumais Susan. 1997. “A Solution to Plato’s Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge.” Psychological Review (104):211–40.
- Lembo Alessandra, and Martin John Levi. 2021. “The Structure of Cultural Experience.” Poetics 101562. doi: 10.1016/j.poetic.2021.101562.
- Lenci Alessandro. 2008. “Distributional Semantics in Linguistic and Cognitive Research.” Italian Journal of Linguistics 20(1):1–31.
- Lenci Alessandro. 2018. “Distributional Models of Word Meaning.” Annual Review of Linguistics (4):151–71.
- Leschziner Vanina, and Brett Gordon. 2021. “Symbol Systems and Social Structures.” Pp. 559–82 in Handbook of Classical Sociological Theory. Cham: Springer.
- Lévi-Strauss Claude. 1963. Structural Anthropology. New York, London: Basic Books.
- Levy Omer, and Goldberg Yoav. 2014. “Neural Word Embedding as Implicit Matrix Factorization.” Advances in Neural Information Processing Systems 2177–85.
- Levy Omer, Goldberg Yoav, and Dagan Ido. 2015. “Improving Distributional Similarity with Lessons Learned from Word Embeddings.” Transactions of the Association for Computational Linguistics 3:211–25.
- Lewis Molly, Borkenhagen Matt Cooper, Converse Ellen, Lupyan Gary, and Seidenberg Mark S. 2022. “What Might Books Be Teaching Young Children About Gender?” Psychological Science 33(1):33–47. doi: 10.1177/09567976211024643.
- Lewis Molly, and Lupyan Gary. 2020. “Gender Stereotypes Are Reflected in the Distributional Structure of 25 Languages.” Nature Human Behaviour 1–8.
- Li Lucy, and Gauthier Jon. 2017. “Are Distributional Representations Ready for the Real World? Evaluating Word Vectors for Grounded Perceptual Meaning.” ArXiv:1705.11168.
- Linzhuo Li, Lingfei Wu, and Evans James. 2020. “Social Centralization and Semantic Collapse: Hyperbolic Embeddings of Networks and Text.” Poetics 78:101428. doi: 10.1016/j.poetic.2019.101428.
- Lizardo Omar. 2017. “Improving Cultural Analysis: Considering Personal Culture in Its Declarative and Nondeclarative Modes.” American Sociological Review 82(1):88–115.
- Lynott Dermot, Connell Louise, Brysbaert Marc, Brand James, and Carney James. 2020. “The Lancaster Sensorimotor Norms: Multidimensional Measures of Perceptual and Action Strength for 40,000 English Words.” Behavior Research Methods 52(3):1271–91. doi: 10.3758/s13428-019-01316-z.
- Mandera Pawel, Keuleers Emmanuel, and Brysbaert Marc. 2017. “Explaining Human Performance in Psycholinguistic Tasks with Models of Semantic Similarity Based on Prediction and Counting: A Review and Empirical Validation.” Journal of Memory and Language 92:57–78.
- Martin John Levi. 2010. “Life’s a Beach but You’re an Ant, and Other Unwelcome News for the Sociology of Culture.” Poetics 38(2):229–44.
- Martin-Caughey Ananda. 2021. “What’s in an Occupation? Investigating Within-Occupation Variation and Gender Segregation Using Job Titles and Task Descriptions.” American Sociological Review 86(5):960–99. doi: 10.1177/00031224211042053.
- Merchant Amil, Rahimtoroghi Elahe, Pavlick Ellie, and Tenney Ian. 2020. “What Happens To BERT Embeddings During Fine-Tuning?” Pp. 33–44 in Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP. Online: Association for Computational Linguistics.
- Merton Robert K. 1948. “The Bearing of Empirical Research on Sociological Theory.” American Sociological Review 13(5):505–15.
- Mikolov Tomas, Chen Kai, Corrado Greg, and Dean Jeffrey. 2013. “Efficient Estimation of Word Representations in Vector Space.” ArXiv:1301.3781 [Cs].
- Mikolov Tomas, Sutskever Ilya, Chen Kai, Corrado Greg S., and Dean Jeff. 2013. “Distributed Representations of Words and Phrases and Their Compositionality.” Advances in Neural Information Processing Systems 3111–19.
- Mohr John. 1998. “Measuring Meaning Structures.” Annual Review of Sociology 24(1):345–70.
- Mohr John, Bail Chris, Frye Margaret, Lena Jennifer, Lizardo Omar, McDonnell Terence, Mische Ann, Tavory Iddo, and Wherry Frederick. 2020. Measuring Culture. New York: Columbia University Press.
- Moseley Rachel, Carota Francesca, Hauk Olaf, Mohr Bettina, and Pulvermüller Friedemann. 2012. “A Role for the Motor System in Binding Abstract Emotional Meaning.” Cerebral Cortex 22(7):1634–47. doi: 10.1093/cercor/bhr238.
- Neelakantan Arvind, Shankar Jeevan, Passos Alexandre, and McCallum Andrew. 2015. “Efficient Non-Parametric Estimation of Multiple Embeddings per Word in Vector Space.” ArXiv:1504.06654.
- Nelson Laura K. 2021. “Leveraging the Alignment between Machine Learning and Intersectionality: Using Word Embeddings to Measure Intersectional Experiences of the Nineteenth Century US South.” Poetics 101539.
- Nickel Maximillian, and Kiela Douwe. 2017. “Poincaré Embeddings for Learning Hierarchical Representations.” Pp. 6338–47 in Advances in Neural Information Processing Systems.
- Peirsman Yves, Geeraerts Dirk, and Speelman Dirk. 2010. “The Automatic Identification of Lexical Variation between Language Varieties.” Natural Language Engineering 16(4):469–91.
- Pennington Jeffrey, Socher Richard, and Manning Christopher. 2014. “GloVe: Global Vectors for Word Representation.” Pp. 1532–43 in Conference on Empirical Methods in Natural Language Processing (EMNLP).
- Peters Matthew, Ammar Waleed, Bhagavatula Chandra, and Power Russell. 2017. “Semi-Supervised Sequence Tagging with Bidirectional Language Models.” Pp. 1756–65 in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vancouver, Canada: Association for Computational Linguistics.
- Peters Matthew E., Neumann Mark, Zettlemoyer Luke, and Yih Wen-tau. 2018. “Dissecting Contextual Word Embeddings: Architecture and Representation.” ArXiv:1808.08949 [Cs].
- Peters Matthew, Neumann Mark, Iyyer Mohit, Gardner Matt, Clark Christopher, Lee Kenton, and Zettlemoyer Luke. 2018. “Deep Contextualized Word Representations.” Pp. 2227–37 in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers).
- Pulvermüller Friedemann. 2013. “How Neurons Make Meaning: Brain Mechanisms for Embodied and Abstract-Symbolic Semantics.” Trends in Cognitive Sciences 17(9):458–70. doi: 10.1016/j.tics.2013.06.004.
- Quian Quiroga R., Reddy Leila, Kreiman Gabriel, Koch Christof, and Fried Itzhak. 2005. “Invariant Visual Representation by Single Neurons in the Human Brain.” Nature 435(7045):1102–7.
- Radford Alec, Kim Jong Wook, Hallacy Chris, Ramesh Aditya, Goh Gabriel, Agarwal Sandhini, Sastry Girish, Askell Amanda, Mishkin Pamela, Clark Jack, Krueger Gretchen, and Sutskever Ilya. 2021. “Learning Transferable Visual Models From Natural Language Supervision.” ArXiv:2103.00020 [Cs].
- Ridgeway Cecilia L. 2011. Framed by Gender: How Gender Inequality Persists in the Modern World. Oxford: Oxford University Press.
- Robinson Justyna A. 2010. “Awesome Insights into Semantic Variation.” Pp. 85–110 in Advances in Cognitive Sociolinguistics. De Gruyter Mouton.
- Rong Xin. 2014. “Word2Vec Parameter Learning Explained.” ArXiv:1411.2738.
- Rosenfeld Alex, and Erk Katrin. 2018. “Deep Neural Models of Semantic Shift.” Pp. 474–84 in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). New Orleans, Louisiana: Association for Computational Linguistics.
- Roy Deb. 2005. “Grounding Words in Perception and Action: Computational Insights.” Trends in Cognitive Sciences 9:389–96.
- Rozado David. 2019. “Using Word Embeddings to Analyze How Universities Conceptualize ‘Diversity’ in Their Online Institutional Presence.” Society 56(3):256–66. doi: 10.1007/s12115-019-00362-9.
- Rozado David, and al-Gharbi Musa. 2021. “Using Word Embeddings to Probe Sentiment Associations of Politically Loaded Terms in News and Opinion Articles from News Media Outlets.” Journal of Computational Social Science. doi: 10.1007/s42001-021-00130-y.
- Sahlgren Magnus. 2008. “The Distributional Hypothesis.” Italian Journal of Linguistics 20:33–53.
- Saussure Ferdinand de. 1983. Course in General Linguistics. London: Duckworth.
- Sewell William H. 2005. “The Concept(s) of Culture.” Pp. 152–74 in The Logics of History. University of Chicago Press.
- Sewell William H. 1992. “A Theory of Structure: Duality, Agency, and Transformation.” American Journal of Sociology 98(1):1–29.
- Shah Deven Santosh, Schwartz H. Andrew, and Hovy Dirk. 2020. “Predictive Biases in Natural Language Processing Models: A Conceptual Framework and Overview.” Pp. 5248–64 in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics.
- Smith Linda, and Gasser Michael. 2005. “The Development of Embodied Cognition: Six Lessons from Babies.” Artificial Life 11(1–2):13–29.
- Sperber Dan. 1985. “Anthropology and Psychology: Towards an Epidemiology of Representations.” Man 73–89.
- Stoltz Dustin. 2019. “Becoming a Dominant Misinterpreted Source: The Case of Ferdinand de Saussure in Cultural Sociology.” Journal of Classical Sociology 0(0):1–22.
- Stoltz Dustin S., and Taylor Marshall A. 2021. “Cultural Cartography with Word Embeddings.” Poetics 101567.
- Stoltz Dustin, and Taylor Marshall. 2019. “Concept Mover’s Distance: Measuring Concept Engagement via Word Embeddings in Texts.” Journal of Computational Social Science 2(2):293–313.
- Stoltz Dustin, and Taylor Marshall. 2020. “Cultural Cartography with Word Embeddings.” ArXiv Preprint 2007.04508:1–70.
- Swidler Ann. 1986. “Culture in Action: Symbols and Strategies.” American Sociological Review 51:273–86.
- Swidler Ann. 2013. Talk of Love: How Culture Matters. University of Chicago Press.
- Talbert Ryan D. 2017. “Culture and the Confederate Flag: Attitudes toward a Divisive Symbol.” Sociology Compass 11(2). doi: 10.1111/soc4.12454.
- Tavory Iddo, and Swidler Ann. 2009. “Condom Semiotics: Meaning and Condom Use in Rural Malawi.” American Sociological Review 74(2):171–89. doi: 10.1177/000312240907400201.
- Taylor Marshall A., and Stoltz Dustin S. 2021. “Integrating Semantic Directions with Concept Mover’s Distance to Measure Binary Concept Engagement.” Journal of Computational Social Science 4(1):231–42.
- Taylor Marshall, and Stoltz Dustin. 2020. “Concept Class Analysis: A Method for Identifying Cultural Schemas in Texts.” Sociological Science 7:544–69.
- Tenney Ian, Xia Patrick, Chen Berlin, Wang Alex, Poliak Adam, McCoy R. Thomas, Kim Najoung, Van Durme Benjamin, Bowman Samuel R., Das Dipanjan, and Pavlick Ellie. 2019. “What Do You Learn from Context? Probing for Sentence Structure in Contextualized Word Representations.” ArXiv:1905.06316 [Cs].
- Vijayakumar Ashwin K., Vedantam Ramakrishna, and Parikh Devi. 2017. “Sound-Word2Vec: Learning Word Representations Grounded in Sounds.” Pp. 920–25 in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics.
- Voyer Andrea, Kline Zachary D., and Danton Madison. 2022. “Symbols of Class: A Computational Analysis of Class Distinction-Making through Etiquette, 1922–2017.” Poetics 101734. doi: 10.1016/j.poetic.2022.101734.
- Wang Yuxuan, Hou Yutai, Che Wanxiang, and Liu Ting. 2020. “From Static to Dynamic Word Representations: A Survey.” International Journal of Machine Learning and Cybernetics 11(7):1611–30. doi: 10.1007/s13042-020-01069-8.
- Xu Huimin, Zhang Zhang, Wu Lingfei, and Wang Cheng-Jun. 2019. “The Cinderella Complex: Word Embeddings Reveal Gender Stereotypes in Movies and Books,” edited by Safro I. PLOS ONE 14(11):e0225385. doi: 10.1371/journal.pone.0225385.
- Yakin Halina Sendera Mohd, and Totu Andreas. 2014. “The Semiotic Perspectives of Peirce and Saussure: A Brief Comparative Study.” Procedia - Social and Behavioral Sciences 155:4–8.
- Yao Zijun, Sun Yifan, Ding Weicong, Rao Nikhil, and Xiong Hui. 2018. “Dynamic Word Embeddings for Evolving Semantic Discovery.” Pp. 673–81 in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. Marina Del Rey, CA, USA: ACM.