Author manuscript; available in PMC: 2019 Dec 1.
Published in final edited form as: Comput Human Behav. 2018 Aug 9;89:308–315. doi: 10.1016/j.chb.2018.08.010

Twitter-derived measures of sentiment towards minorities (2015–2016) and associations with low birth weight and preterm birth in the United States

Thu T Nguyen a, Hsien-Wen Meng b, Sanjeev Sandeep c, Matt McCullough d, Weijun Yu b, Yan Lau e, Dina Huang f, Quynh C Nguyen f
PMCID: PMC6432619  NIHMSID: NIHMS1504881  PMID: 30923420

Abstract

Introduction:

The objective of this study was to investigate the association between state-level publicly expressed sentiment towards racial and ethnic minorities and birth outcomes for mothers who gave birth in that state.

Methods:

We utilized Twitter’s Streaming Application Programming Interface (API) to collect 1,249,653 tweets containing at least one relevant keyword pertaining to a racial or ethnic minority group. State-level measures of sentiment towards racial and ethnic minorities were merged with data on all 2015 U.S. births (N=3.99 million singleton births).

Results:

Mothers living in states in the lowest tertile of positive sentiment towards racial/ethnic minorities had greater prevalences of low birth weight (+6%), very low birth weight (+9%), and preterm birth (+10%) compared to mothers living in states in the highest tertile of positive sentiment, controlling for individual-level maternal characteristics and state demographic characteristics. Sentiment towards specific racial/ethnic groups showed a similar pattern. Mothers living in states in the lowest tertile of positive sentiment towards blacks had an 8% greater prevalence of low birth weight and very low birth weight, and a 16% greater prevalence of preterm birth, compared to mothers living in states in the highest tertile. Lower state-level positive sentiment towards Middle Eastern groups was also associated with a 4–13% greater prevalence of adverse birth outcomes. Results from subgroup analyses restricted to racial/ethnic minority mothers did not differ substantially from those seen for the full population of mothers.

Conclusions:

More negative area-level sentiment towards blacks and Middle Eastern groups was related to worse individual birth outcomes, and this is true for the full population and minorities.

Keywords: Twitter, social media, low birth weight, preterm birth, geography, big data, vital statistics

INTRODUCTION

Research examining birth outcomes of black women found disparities in birth weight for the 1980–1995 period.1 Racial disparities in birth outcomes have continued to persist.2 Non-Hispanic black women have a 60% higher risk of preterm birth compared to white women.3 In 2013, the overall incidence of low birth weight in the U.S. was 8.02%, but it was 13.1% among non-Hispanic black infants.3 Low birth weight and preterm birth are important outcomes because they are widely used indicators of reproductive and population health,4,5 and are associated with an increased risk of infant mortality6 and developmental delays.7

Racial disparities in birth outcomes are not well understood.8 Reducing disparities in birth outcomes has proved challenging. In 2005, when Massachusetts implemented universal health care, black infants had a 40% higher risk of being born preterm and a 64% higher risk of being low birth weight. In 2014, black infants were still 18% more likely to be born preterm and 48% more likely to be low birth weight compared to white infants.8 Maternal health behaviors, adequacy of prenatal care, and sociodemographic characteristics do not fully explain the observed disparities.9 However, there is some evidence that discrimination is an important risk factor for adverse birth outcomes.9–11 A landmark study examined the influence of contextual indicators of discrimination on birth outcomes. After September 11, 2001, the social climate towards Arabs became more hostile and individuals perceived to be Arabs experienced increased harassment, violence, and workplace discrimination. Comparing birth outcomes for women by race/ethnicity and nativity 6 months before and after September 2001, only Arabic-named women experienced significantly increased risk of preterm birth and low birth weight after September 2001.12 The study findings are consistent with the hypothesis that ethnicity-related stress or discrimination increases risk of adverse birth outcomes.

One of the most challenging aspects of discrimination research has been evaluating racial hostility encountered by individuals using a data source other than self-report. While self-reported experiences of discrimination provide valuable information, it is critical to identify other data sources that provide valid information on exposure to discrimination. In addition, previous research has tended to examine discrimination in smaller geographic areas.

This study leverages the power of social media to capture aspects of the local social environment. Social media provides an untapped resource in understanding social context. Over 90% of Twitter users make their profile and communication public,13 permitting researchers the opportunity to examine public communication of a large proportion of the country. Social media conversations about race/ethnicity may provide a proxy for racial attitudes and an indicator of the social climate in which people live.

Social environments can offer social and emotional support that buffers stressful life events.14 Johns and colleagues found that neighborhoods with higher social cohesion had residents with lower risk of posttraumatic stress disorder.15 A study of 29 high income countries found respondents were more likely to report better self-reported health in countries with greater social cohesion, which includes measures of ethnic tolerance and valuing diversity, controlling for individual-level characteristics.16 Communities with higher happiness levels have residents with lower rates of obesity, hypertension, and suicide as well as increased life expectancy.17–22 Conversely, communities that are less supportive and less welcoming may increase feelings of stress and alienation and not provide important social support.

Twitter data has been previously used to characterize national patterns in happiness and social modeling of diet and physical activity.23,24 Information generated via Twitter has been useful in the examination of beliefs, attitudes, and sentiment towards various topics (e.g., vaccinations).25 Others have utilized social media and user-generated data to track health behaviors and perform health surveillance (e.g., for outbreak detection),26 to examine sleep issues,27 to make use of personal health status mentions by Twitter users,28,29 and to investigate patient-perceived quality of care.30 To the authors’ knowledge, the current study is the first use of Twitter data to characterize sentiment towards racial/ethnic minorities. There has been one previous study on internet-based racism, examining the association between the proportion of Google searches containing the N-word in 196 designated market areas (DMAs) and all-cause Black mortality and adverse birth outcomes.31,32 Negative sentiment towards racial/ethnic minorities on social media may provide a “temperature” of the social environment of the state and county where the tweets originated. Fewer positive race-related tweets in a state or county may indicate an environment that is less welcoming to minorities and may be a source of stress for mothers. Views and activities described via social media can help shape perceived norms, beliefs and subsequently behaviors of people. Social climates that are less welcoming may encourage racial discrimination or tolerance of discrimination.

Two integrative reviews of the literature found that racial discrimination was associated with preterm birth and low birth weight.9,11 A systematic review of literature on perceived discrimination and health among Black women also found consistent evidence of an association between discrimination and adverse birth outcomes, with several studies observing a dose-response relationship.10 Several pathways have been proposed linking discrimination to adverse birth outcomes. Racial and ethnic minority women have a higher lifetime exposure to chronic stress, which is a known risk factor for poor pregnancy outcomes.9,33,34 Chronic stress may be a pathway through which racial discrimination increases risk of preterm birth and low birth weight for racial and ethnic minority women. Other mechanisms through which racial discrimination may impact birth outcomes include geographic segregation leading to unhealthy physical and social environments, and restricted access to employment opportunities, housing, and education.13

Study Aims and Hypotheses

This study uses Twitter data to create indicators of the racial climate by quantifying sentiment towards racial and ethnic minorities in the United States. We hypothesize that states with lower positive sentiment towards racial/ethnic minorities will have higher rates of preterm birth and low birth weight.

METHODS

Social media data collection and spatial joins

From March 2015–April 2016, we utilized Twitter’s Streaming Application Programming Interface (API) to continuously collect a random sample of publicly available tweets. Twitter’s Streaming API only gives users access to a random 1% sample of tweets. The Twitter API is freely available to everyone and allows users free access to subsets of public tweets sent from anywhere in the world. Users may request tweets for a certain geographic area that contain a particular set of keywords, or just random subsets of tweets. Depending on the search criteria used by the researcher, the number of tweets returned may comprise less than 1% of all available tweets. In our case, we restricted the data collection to tweets with latitude and longitude coordinates that were sent from the contiguous United States (including District of Columbia). Tweets were not collected for Alaska and Hawaii given that these two states have very different physical and social environments and may exhibit different health patterns. In total, we collected 79,848,992 general topic tweets from 603,363 unique Twitter users.

We created a keyword list of 398 racial/ethnic group terms and racial/ethnic slurs. The keyword list was derived from racial and ethnic categories used by the U.S. census and the online database of racial slurs (http://www.rsdb.org/). We include the keyword list in Appendix A. We then used our keyword list to identify tweets from our database of 80 million tweets that had at least one race-related word (1.25 million tweets) and these tweets were included in the final analytic dataset. We used the latitude and longitude coordinates of where a tweet was sent to spatially map each tweet to its respective county and state location. To accomplish the spatial join of the tweets, we utilized Python and an R-tree to build a spatial index.28,35 Given the rarity of race-related tweets, we were unable to create stable Twitter-derived estimates for smaller levels of geography like zip codes and census tracts.
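The keyword filter and spatial join described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the three keywords are taken from Table 1 (the full list is in Appendix A), the rectangular state outlines are rough approximations invented for the example, and a linear scan over bounding boxes stands in for the R-tree index, which plays the same candidate-filtering role but scales to many polygons.

```python
import re

# Three terms from Table 1; the study's full 398-term list is in Appendix A.
KEYWORDS = {"mexican", "thai", "asian"}

# Rough rectangular outlines as (lon, lat) vertices. Approximate values,
# assumed for illustration only; real work would use TIGER/Line polygons.
REGIONS = {
    "Colorado": [(-109.05, 37.0), (-102.05, 37.0), (-102.05, 41.0), (-109.05, 41.0)],
    "Wyoming": [(-111.05, 41.0), (-104.05, 41.0), (-104.05, 45.0), (-111.05, 45.0)],
}


def has_keyword(text):
    """True if any whole word in the tweet is on the keyword list."""
    return any(tok in KEYWORDS for tok in re.findall(r"[a-z]+", text.lower()))


def bounding_box(polygon):
    """Return (min_lon, min_lat, max_lon, max_lat) for a list of vertices."""
    lons = [lon for lon, _ in polygon]
    lats = [lat for _, lat in polygon]
    return min(lons), min(lats), max(lons), max(lats)


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: count edge crossings of a ray heading east."""
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def locate(lon, lat, regions=REGIONS):
    """Return the name of the region containing the point, or None."""
    for name, polygon in regions.items():
        min_lon, min_lat, max_lon, max_lat = bounding_box(polygon)
        # Cheap bounding-box filter first (the step an R-tree accelerates).
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
            if point_in_polygon(lon, lat, polygon):
                return name
    return None
```

For example, `locate(-104.99, 39.74)` falls inside the approximate Colorado rectangle, while a point outside every bounding box returns `None` without running the more expensive polygon test.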

Processing tweets

Each tweet was divided into tokens using the Stanford Tokenizer36, an open access software tool that divides text into tokens, which roughly correspond to “words.” Below, we briefly describe algorithms we utilized to create variables for sentiment, and references to racial/ethnic minority groups.

Sentiment analysis of Twitter data

To conduct sentiment analysis on tweets with references to racial and ethnic minorities, we utilized the Stochastic Gradient Descent Classifier in Python software version 2.7. In order to train the software to analyze tweets, which differ from other types of text, we obtained labeled tweets from other research teams.37–39 Sanders and Kaggle used manual annotations to arrive at sentiment labels while Sentiment140 derived their sentiment labels from emoticons (e.g., a smiley face indicates a positive tweet while a sad emoticon indicates a negative tweet). The computer algorithm then uses these labeled tweets to learn what a human considers a “positive” or “negative/neutral” tweet. Label “1” was used for positive tweets and “0” for negative/neutral tweets. The classifier was trained on a total of 1.6 million tweets. We chose to dichotomize sentiment as “positive” vs. “negative/neutral” because in preliminary analyses using all three categories (happy, neutral and negative) we found that accuracy against manual annotations was much higher for dichotomous sentiment compared to sentiment containing multiple categories. In particular, our sentiment model was best able to distinguish between happy vs. not happy tweets. In ongoing work, we are attempting to build a sentiment model for negative tweets, which comprise a small percentage of tweets compared to neutral and happy tweets.
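To make the training step concrete, the following is a minimal sketch of a linear classifier fitted by stochastic gradient descent on labeled examples (1 = positive, 0 = negative/neutral). It assumes a logistic loss, a fixed learning rate, and toy one-word feature vectors; the paper does not report the loss function or hyperparameters, so none of these choices should be read as the authors' configuration.

```python
import math
import random


def train_sgd(examples, epochs=20, lr=0.5, seed=0):
    """Fit weights by SGD on logistic loss.

    examples: list of (features, label), where features maps a term to a
    numeric value and label is 1 (positive) or 0 (negative/neutral).
    """
    rng = random.Random(seed)
    weights, bias = {}, 0.0
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a new random order
        for feats, label in data:
            z = bias + sum(weights.get(f, 0.0) * v for f, v in feats.items())
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of "positive"
            err = label - p                 # gradient signal for this one example
            bias += lr * err
            for f, v in feats.items():
                weights[f] = weights.get(f, 0.0) + lr * err * v
    return weights, bias


def predict(model, feats):
    """Return 1 if the linear score is positive, else 0."""
    weights, bias = model
    z = bias + sum(weights.get(f, 0.0) * v for f, v in feats.items())
    return 1 if z > 0 else 0


# Toy training set (hypothetical examples, not the study's data).
TRAIN = [
    ({"happy": 1.0}, 1),
    ({"love": 1.0}, 1),
    ({"happy": 1.0, "day": 1.0}, 1),
    ({"hate": 1.0}, 0),
    ({"awful": 1.0}, 0),
    ({"awful": 1.0, "day": 1.0}, 0),
]

model = train_sgd(TRAIN)
```

Each update uses a single example, which is what lets this family of methods scale to training sets in the millions of tweets.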

Tweets were preprocessed to remove any stop words, URLs, Twitter username references, and additional white spaces. All words were converted to lower case. Two or more repetitions of a character were replaced with the character itself. More than two repetitions of a word were replaced with two repetitions of that word. We removed punctuation and hashtag symbols. These established preprocessing steps were taken in order to allow the sentiment model to focus on relevant features of the tweet.40 All the above preprocessing was done for both the training and test datasets. We then constructed a count vector for unigrams (1 word) and bigrams (2-word sequence) in the preprocessed tweets. From the count vector we then calculated the “term frequency inverse document frequency” (TFIDF) values for each word in the training dataset. The TFIDF weight is a measure used to evaluate how important a word is to a document. In our research, the document refers to the preprocessed tweet message. We have as many documents as there are tweets in a given file. The TFIDF algorithm assigns each word in a document a number that is proportional to its frequency in the document and inversely proportional to the number of documents in which it occurs. Very common words therefore receive low TFIDF values, in contrast to words that are very specific to the document in question. The TFIDF values are calculated for the training dataset, which has a collection of positive and negative tweets. The classifier is then trained on these values. The trained classifier then predicts on the TFIDF values of the test dataset.
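The preprocessing and weighting steps above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the stop-word list is a tiny stand-in (the paper does not give its list), the character rule is implemented literally as written, and the weight is the textbook form tf × log(N/df); libraries such as scikit-learn use smoothed variants of the same idea.

```python
import math
import re
from collections import Counter

# Tiny stand-in stop-word list (an assumption; the paper's list is not given).
STOP_WORDS = {"a", "an", "the", "is", "to", "of", "and"}


def preprocess(tweet):
    """Apply the cleaning steps described in the text, in one pass each."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)        # URLs
    text = re.sub(r"@\w+", " ", text)                          # username references
    text = re.sub(r"[^\w\s]", " ", text)                       # punctuation and '#'
    text = re.sub(r"(\w)\1+", r"\1", text)                     # repeated characters -> one
    text = re.sub(r"\b(\w+)(?:\s+\1\b){2,}", r"\1 \1", text)   # 3+ repeated words -> two
    return " ".join(t for t in text.split() if t not in STOP_WORDS)


def ngrams(tokens, n_max=2):
    """Unigrams and bigrams, as space-joined strings."""
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def tfidf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    counts = [Counter(ngrams(doc.split())) for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for c in counts for term in c)  # documents containing term
    return [
        {term: tf * math.log(n_docs / doc_freq[term]) for term, tf in c.items()}
        for c in counts
    ]
```

In a three-document corpus such as `["happy new year", "happy day", "sad day"]`, "new" occurs in one document and "happy" in two, so "new" receives the larger weight in the first document; a term that appears in every document receives a weight of zero. Note that the literal character rule also shortens legitimate double letters (for example "happy" becomes "hapy"), which is harmless as long as training and test data are processed identically.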

In order to evaluate the accuracy of the sentiment algorithm, one of the coauthors manually labeled a random subset of 500 tweets and this served as the test dataset. The trained classifier was then used to predict the sentiment on the TFIDF values of the test dataset. The accuracy of the classifier was 89.23%. Separate state and county level sentiment variables were created by averaging the sentiment of tweets referencing various racial/ethnic groups.
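The evaluation and aggregation steps in this paragraph reduce to two small computations, sketched here. The function names and the toy inputs are hypothetical; only the definitions (share of predictions matching manual labels, and the mean of 0/1 sentiment labels per area) come from the text.

```python
from collections import defaultdict


def accuracy(predicted, actual):
    """Share of predictions that match the manual labels."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


def percent_positive_by_area(tweets):
    """Average the 0/1 sentiment labels within each area.

    tweets: iterable of (area, label) pairs, label 1 = positive,
    0 = negative/neutral. Returns {area: percent positive}.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for area, label in tweets:
        totals[area] += 1
        positives[area] += label
    return {area: 100.0 * positives[area] / totals[area] for area in totals}
```

Because the labels are binary, the average sentiment for a state or county is the same quantity as the percent of its race-related tweets classified as positive, which is how the measure is reported in the tables.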

Examples of positive tweets include “Happy Chinese New Year, indeed! #hungryfeast #louistres #louisxiii” and “Lmao drunk vibes with my main #Gorgeous #Beautiful #Latina #EyelinerOnPoint #Curly.” Examples of negative/neutral tweets include “Day 2 of someone calling me a chink yet this guy was like a 30 year old white trash” and “Hispanic people are dwarves white people are elves black people are orcs.” Examples of negative tweets that referenced Middle Eastern groups include “I am real. There’s no debate. ISIS say they are Muslim. That’s good enough for me” and “Islam doesn’t belong in this country. Doesn’t fit. Not sorry I feel that way.”

Individual-level health data

Birth outcomes data come from the 2015 restricted Natality File with geographical (state and county) identifiers. The file was obtained after submitting a research proposal to the National Association for Public Health Statistics and Information Systems (NAPHSIS). In total, there were 3,988,733 births in the file. Low birth weight was defined as a birth weight less than or equal to 2499 grams. Very low birth weight was defined as a birth weight less than or equal to 1499 grams. Preterm birth was defined as gestational age less than 37 weeks based on the obstetric estimate of gestation at delivery (OE). Comparison of gestational age estimates based on last menstrual period (LMP) and OE found greater agreement between OE and early ultrasound, which is the gold standard. Beginning in 2014, the National Center for Health Statistics has used the OE measure as the primary measure of gestational age.41 Twins, triplets, and other higher order multiple births are at increased risk for low birth weight and preterm birth compared to singletons.42 Hence, we restricted analyses to singleton births.
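The three outcome definitions can be written as simple threshold checks. The function and key names below are hypothetical; the cut-points are those stated in the text.

```python
def birth_outcomes(weight_grams, gestation_weeks):
    """Flag the three outcomes using the cut-points given in the text."""
    return {
        "low_birth_weight": weight_grams <= 2499,
        "very_low_birth_weight": weight_grams <= 1499,
        "preterm_birth": gestation_weeks < 37,  # obstetric estimate of gestation
    }
```

Under these definitions every very low birth weight infant is also counted as low birth weight, so the two weight outcomes are nested rather than mutually exclusive.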

Covariates

We included individual socio-demographic characteristics along with health/disease predispositions to adjust for potential confounding of the relationship between neighborhood environments and birth outcomes. The following individual-level covariates were included: age (years), marital status (married vs. divorced/single), race (White, non-Hispanic; Black, non-Hispanic; American Indian/ Alaskan Native, non-Hispanic; Asian, non-Hispanic; Native Hawaiian / Pacific Islander, non-Hispanic; multiracial, non-Hispanic), Hispanic ethnicity, education (less than high school, high school, some college, college or greater), body mass index (kg/m2), smoking status during pregnancy (1st, 2nd, 3rd trimester), first birth for the mother (yes/no), and mother received prenatal care in the 1st trimester. In addition, we controlled for state-level median household income and percent non-Hispanic whites to account for between-state differences in compositional characteristics. State-level sociodemographic covariates were derived from 1-year 2015 estimates from the American Community Survey. This study was approved by the University of Maryland’s Institutional Review Board and all methods were performed in accordance with the relevant guidelines and regulations.

Analytic Approach

Mapping.

Each geolocated tweet we collected was assigned its corresponding state location using Python software (version 2.7.12). We created maps using ArcGIS Desktop Version 10.5 (ESRI, Redlands CA, http://www.esri.com/arcgis/about-arcgis) and the 2016 U.S. Census TIGER/Line Shapefiles (https://www.census.gov/geo/maps-data/data/tiger-line.html).43 We utilized natural breaks in the data to display spatial patterns.

Regression modeling.

State characteristics were categorized into tertiles and the highest tertile was utilized as the referent level. We hypothesize that individuals living in states with the lowest percent of positive tweets towards racial/ethnic minority groups would be at increased risk for adverse birth outcomes. We implemented log Poisson regression models to estimate associations, controlling for covariates. Reported prevalence ratios represent comparisons between individuals in the 1st tertile (vs. 3rd tertile) and 2nd tertile (vs. 3rd tertile) for state characteristics. Higher prevalence ratios indicate that individuals living in areas in the 1st and 2nd tertile have higher prevalence of adverse birth outcomes than those in the 3rd tertile. Lower prevalence ratios indicate that individuals living in areas in the 1st and 2nd tertile have lower prevalence of adverse birth outcomes than those in the 3rd tertile. We estimated prevalence ratios rather than odds ratios because prevalence ratios are arguably more interpretable; this is because people are more likely to think in terms of probabilities such as 10% and 33% rather than their corresponding odds (1:9 and 1:2 odds). Additionally, odds ratios can overestimate prevalence ratios for common outcomes.44 Statistical analyses were implemented with Stata MP13 (StataCorp LP, College Station, TX).
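Two pieces of this approach lend themselves to a short worked example: cutting a state-level measure into tertiles, and the difference between a prevalence ratio and an odds ratio. The sketch below is unadjusted arithmetic on made-up numbers, not the adjusted Poisson models fitted in Stata; the rank-based tertile rule (ties broken by sort order) is an assumption, since the paper does not describe how cut-points were chosen.

```python
def tertiles(value_by_state):
    """Assign 1 (lowest third) to 3 (highest third) by rank."""
    ordered = sorted(value_by_state, key=value_by_state.get)
    n = len(ordered)
    return {state: 1 + (rank * 3) // n for rank, state in enumerate(ordered)}


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def prevalence_ratio(p_exposed, p_reference):
    """Ratio of two prevalences (probabilities)."""
    return p_exposed / p_reference


def odds_ratio(p_exposed, p_reference):
    """Ratio of the corresponding odds."""
    return odds(p_exposed) / odds(p_reference)


# Probabilities of 10% and 33% correspond to odds of 1:9 and 1:2.
odds_10 = odds(0.10)       # about 0.111, i.e. 1:9
odds_33 = odds(1.0 / 3.0)  # about 0.5, i.e. 1:2

# Hypothetical prevalences of 30% vs. 20% for a common outcome.
pr = prevalence_ratio(0.30, 0.20)
or_ = odds_ratio(0.30, 0.20)
```

With hypothetical prevalences of 30% and 20%, the prevalence ratio is 1.5 while the odds ratio is about 1.71, which illustrates why odds ratios can overstate prevalence ratios when the outcome is not rare.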

RESULTS

Descriptive characteristics of tweets

From 2015–2016, we collected 1,249,653 geotagged tweets containing at least one of the relevant keywords pertaining to a racial or ethnic minority group. Approximately 620,000 tweets were about blacks, 205,000 about Hispanics, 270,000 about Asians, 60,000 about Middle Eastern groups, 16,000 about immigrants, and 13,000 about refugees (other less prominently referenced groups included Native Americans and multiracial groups). From a list of 398 terms, only 20 terms were necessary to characterize 84% of all tweets with references to a racial or ethnic minority group. The top Twitter terms were “nigga” (43%), Mexican (8%), Thai (4%), and Asian (4%) (Table 1).

Table 1.

Top Twitter terms

Term N Percent
nigga 531,820 42.56
mexican 104,959 8.40
thai 52,234 4.18
asian 50,026 4.00
japanese 48,873 3.91
chinese 43,970 3.52
indian 33,106 2.65
korean 23,242 1.86
cuban 22,103 1.77
dominican 19,142 1.53
latino 15,885 1.27
latina 14,585 1.17
oriental 14,354 1.15
muslim 14,161 1.13
ghetto 13,567 1.09
islam 11,110 0.89
jewish 10,898 0.87
vietnamese 10,607 0.85
negro 10,493 0.84

1,249,653 geotagged tweets collected between April 2015 and March 2016 included at least one racial or ethnic term. From a list of 398 terms, 20 terms accounted for 84% of all tweets with references to a racial or ethnic minority group.

Table 2 displays descriptive statistics of the state-level Twitter-derived indicators of sentiment towards racial/ethnic minority groups. Across the contiguous United States, tweets referencing Asians and Hispanics had the highest percentage of positive sentiment at approximately 18% of tweets. Tweets that referenced Middle Eastern groups had the lowest positive sentiment (13% positive) followed by tweets that reference blacks (14% positive).

Table 2.

Descriptive characteristics for contextual and individual level characteristics

N %
State level Twitter characteristics
Overall sentiment of tweets (% positive) 49 19.43
Sentiment towards blacks (% positive) 49 14.14
Sentiment towards Middle Eastern groups (% positive) 49 12.67
Sentiment towards Hispanics (% positive) 49 18.37
Sentiment towards Asians (% positive) 49 18.31
Maternal characteristics
Age (years) (Mean, SD) 3,988,733 28.51 (5.86)
% Married 3,988,733 59.78
% White, non-Hispanic 3,887,154 53.00
% Black, non-Hispanic 3,887,154 14.30
% American Indian/ Alaskan Native, non-Hispanic 3,887,154 0.83
% Asian, non-Hispanic 3,887,154 6.13
% Native Hawaiian / Pacific Islander, non-Hispanic 3,887,154 0.23
% More than one race, non-Hispanic 3,887,154 2.00
% Hispanic ethnicity 3,887,154 23.51
% US born 3,979,894 77.16
% Less than high school 3,870,528 14.52
% High school 3,870,528 25.11
% Some college 3,870,528 29.38
% College or greater 3,870,528 31.00
Body mass index (Mean, SD) 3,794,443 26.68 (6.59)
% Nonsmoker during 1st trimester 3,883,261 92.47
% Nonsmoker during 2nd trimester 3,883,001 93.58
% Nonsmoker during 3rd trimester 3,883,059 93.90
% First birth 3,962,062 31.69
% Received prenatal care in 1st trimester 3,787,844 76.92
% Low birth weight 3,985,085 8.07
% Very low birth weight 3,985,085 1.40
% Preterm birth 3,991,379 9.61

Data sources: 1,249,653 geolocated tweets from the 49 contiguous United States collected between April 2015-March 2016; 2015 Natality file

Geographic distribution of tweets

Appendix B, Figure 1 displays the geographic distribution of tweets that mention a racial or ethnic minority group across the United States. States with the least positive sentiment towards all racial/ethnic groups combined were Nevada and Louisiana, with 9–12% of tweets being positive (Appendix B, Figure 2). States with the most positive sentiment towards ethnic/racial minority groups were North Dakota, Oregon, Utah, South Dakota, Nebraska, Minnesota, California, and Arkansas, with 17–20% of tweets being positive.

Figure 1 displays the geographic distribution of Twitter-derived state sentiment towards blacks/African Americans. States with the least positive sentiment towards blacks were Vermont, Louisiana, Mississippi, Alabama, District of Columbia, and Connecticut with 9–12% of tweets being positive. States with the most positive sentiment towards blacks were South Dakota, Wyoming, and Utah with 17–20% of tweets being positive.

Figure 1. Geographic distribution of the percent of tweets about blacks that are positive. State-level summaries of the percent of tweets about blacks/African Americans that are positive. Choropleth maps were created using ArcGIS Desktop Version 10.5 and the 2016 U.S. Census TIGER/Line Shapefiles.

Associations between sentiment and birth outcomes

Table 2 displays the characteristics of birth outcomes and mothers’ characteristics. Among the approximately 4 million births in 2015, 8.07% of babies born were low birth weight, 1.40% were very low birth weight, and 9.61% were preterm births. Among mothers who gave birth, 53% were white, non-Hispanic, and 77% were U.S. born (Table 2).

Lower state-level positive sentiment towards all racial/ethnic minority groups combined (a measure that aggregates tweet references across different minority groups) was associated with increased prevalence of low birth weight and preterm birth (Table 3). Mothers living in states in the lowest tertile of positive sentiment towards racial/ethnic minorities had greater prevalences of low birth weight (+6%), very low birth weight (+9%), and preterm birth (+10%) compared to mothers living in states in the highest tertile of positive sentiment, controlling for maternal characteristics and state demographic characteristics.

Table 3.

State sentiment towards race/ethnic minorities and individual level birth outcomes

Among full population

State level Twitter-derived variables | Low Birth Weight, Prevalence Ratio (95% CI)b | Very Low Birth Weight, Prevalence Ratio (95% CI)b | Preterm birth, Prevalence Ratio (95% CI)b

Proportion of tweets about racial/ethnic minorities that are positive
 1st tertile (lowest) 1.06 (1.04, 1.07) 1.09 (1.06, 1.12) 1.10 (1.10, 1.11)
 2nd tertile 1.05 (1.03, 1.06) 1.05 (1.02, 1.09) 1.07 (1.06, 1.07)
 N  3,444,526  3,444,526 3,446,140
a Data source for health outcome: 2015 Natality File. Tweets collected from April 2015-March 2016

b Adjusted Poisson models were run for each outcome separately. Models controlled for state-level % non-Hispanic white and median household income as well as individual level maternal age, sex, race, ethnicity, foreign birth, education, marital status, smoking, body mass index, first birth status, and prenatal care. Twitter-derived characteristics were categorized into tertiles, with the highest tertile serving as the referent group. Robust standard errors reported.

Sentiment towards specific racial/ethnic groups showed a similar pattern. Table 4 presents regression results for associations between state-level sentiment towards various racial/ethnic groups and adverse birth outcomes. Mothers living in states in the lowest tertile of positive sentiment towards blacks had an 8% greater prevalence of low birth weight and very low birth weight, and a 16% greater prevalence of preterm birth, compared to mothers living in states with the highest positive sentiment towards blacks (Table 4, full population columns). Additionally, mothers in states with the lowest positive sentiment towards Middle Eastern groups had a 9% greater prevalence of low birth weight, 13% greater prevalence of very low birth weight, and 4% greater prevalence of preterm birth compared to mothers in states with the highest positive sentiment. Lower positive sentiment towards Asians was associated with modestly better birth outcomes.

Table 4.

State sentiment towards race/ethnic groups and individual level birth outcomes

State level Twitter-derived variables | Full population: Low Birth Weight | Full population: Very Low Birth Weight | Full population: Preterm birth | Among Hispanics, nonwhites, and foreign-born: Low Birth Weight | Among Hispanics, nonwhites, and foreign-born: Very Low Birth Weight | Among Hispanics, nonwhites, and foreign-born: Preterm birth

All cells are Prevalence Ratio (95% CI)b

Proportion of tweets about blacks that are positive
 1st tertile (lowest) 1.08 (1.06, 1.09) 1.08 (1.04, 1.11) 1.16 (1.15, 1.17) 1.10 (1.09, 1.12) 1.10 (1.05, 1.15) 1.18 (1.16, 1.20)
 2nd tertile 1.07 (1.06, 1.08) 1.07 (1.03, 1.10) 1.11 (1.10, 1.12) 1.09 (1.07, 1.11) 1.07 (1.03, 1.12) 1.11 (1.10, 1.13)
Proportion of tweets about Middle Eastern groups that are positive
 1st tertile (lowest) 1.09 (1.07, 1.10) 1.13 (1.09, 1.17) 1.04 (1.03, 1.05) 1.11 (1.09, 1.13) 1.18 (1.13, 1.24) 1.05 (1.03, 1.07)
 2nd tertile 1.05 (1.04, 1.07) 1.11 (1.07, 1.15) 1.03 (1.02, 1.04) 1.07 (1.05, 1.10) 1.13 (1.08, 1.19) 1.05 (1.03, 1.07)
Proportion of tweets about Hispanics that are positive
 1st tertile (lowest) 1.03 (1.01, 1.04) 0.99 (0.95, 1.03) 0.98 (0.97, 0.99) 1.03 (1.01, 1.05) 1.05 (1.00, 1.11) 0.97 (0.96, 0.99)
 2nd tertile 1.03 (1.01, 1.05) 1.04 (0.98, 1.10) 1.00 (0.98, 1.02) 1.04 (1.01, 1.07) 1.10 (1.02, 1.19) 1.00 (0.97, 1.02)
Proportion of tweets about Asians or Pacific Islander that are positive
 1st tertile (lowest) 0.96 (0.95, 0.98) 0.96 (0.93, 1.00) 0.98 (0.97, 0.99) 0.96 (0.94, 0.98) 0.94 (0.89, 0.99) 0.98 (0.96, 0.99)
 2nd tertile 0.98 (0.97, 1.00) 1.00 (0.97, 1.04) 0.98 (0.97, 0.99) 0.99 (0.97, 1.01) 1.00 (0.95, 1.05) 1.00 (0.98, 1.01)
N 3,444,526 3,444,526 3,446,140 1,705,853 1,705,853 1,706,593
a Data source for health outcome: 2015 Natality File. Tweets collected from April 2015-March 2016

b Adjusted Poisson models were run for each outcome separately. Models controlled for state-level % non-Hispanic white and median household income as well as individual level maternal age, sex, race, ethnicity, foreign birth, education, marital status, smoking, body mass index, first birth status, and prenatal care. Twitter-derived characteristics were categorized into tertiles, with the highest tertile serving as the referent group. Robust standard errors reported.

Association between sentiment and birth outcomes among different racial/ethnic groups

We investigated whether associations between sentiment towards racial and ethnic minorities and birth outcomes would be stronger among members of these groups. However, analyses restricted to Hispanic, nonwhite, and foreign-born mothers resulted in very similar magnitudes of associations (Table 4).

Supplemental subgroup analyses found that associations among black mothers were not statistically significantly different from estimates from the full population (Appendix B, Table 1). That is, the associations between sentiment towards blacks and birth outcomes of black mothers were not different from the associations between sentiment towards blacks and birth outcomes in the full population of mothers. Similarly, associations between sentiment towards Middle Eastern groups and birth outcomes among the foreign born were not statistically different from those in the full population (Appendix B, Table 2). Regarding Asian-related tweets, in the full population, lower sentiment was related to very modest improvements in birth outcomes. Effect estimates were very small, with prevalence ratios ranging from 0.96–0.99. However, among Asians, Asian-related tweets were generally not a statistically significant predictor of birth outcomes (Appendix B, Table 3).

Lower state-level positive sentiment towards blacks and Middle Easterners was associated with increases in low birth weight and preterm birth among non-Hispanic white mothers (Appendix B, Table 4). Supplemental analyses suggest similar, albeit slightly attenuated, associations using county-level Twitter-derived variables as compared to state-level variables (Appendix B, Table 5).

DISCUSSION

In this study, we found wide geographic variation in the frequency of negative sentiment towards racial and ethnic minorities. Low positive sentiment may include various forms of emotions ranging from a general neutral feeling to hostility toward a person or a group of individuals of a racial or ethnic background.

Social media can be used to assess expressed negativity towards minority groups and immigrants, which may have implications for heightened psychological stress and stress-related health conditions. The effects of stress on health have been well documented in the literature. Some common stress-related health outcomes include physical pain,45 memory loss,46,47 depression,48 and addictions,49,50 all of which can negatively impact birth outcomes. Chronic stress is a known risk factor for poor pregnancy outcomes.9,33 In this study, we found that more negative sentiment towards racial/ethnic groups (particularly blacks and Middle Eastern groups) was related to higher rates of preterm birth and low birth weight, and this held both for minority groups and for the full population.

Study limitations

Twitter data were used to assess sentiment towards racial/ethnic minorities. We did not measure experiences of discrimination. An additional limitation of the sentiment technique we used was that it was unable to identify and process sarcasm or humor in a tweet, challenges that still evade most natural language processing algorithms, though some studies show promising results.51,52 In addition, while we used geolocated tweets to characterize the social environments at the county and state levels, women can potentially be exposed to negative tweets regardless of their location. Also, we were only able to characterize the social environment via online Twitter posts, and these may differ from in-person interactions. Using online social media may nonetheless present some advantages in documenting the use of racial slurs and derogatory language. People may be more willing to display biases or use derogatory language on social media than in in-person interactions. Use of social media also provides an opportunity to assess racial attitudes that may be subtle and not captured in traditional surveys. Previous studies have found that social media can capture information about the social environment that has utility for predicting health outcomes. For example, Eichstaedt et al. found that psychological language on Twitter predicted county-level heart disease mortality.53

Analyses were cross-sectional, and thus the study was unable to evaluate longitudinal or temporal trends. This study used Twitter data from March 2015 to April 2016 and evaluated birth outcomes for 2015. While the dates do not entirely overlap, it is unlikely that sentiment towards racial/ethnic minorities changed dramatically over a relatively short time period. In addition, race-related tweets were relatively rare, and the year-long period was used to create stable estimates of the social environment at the state and county levels. Supplemental analyses examining sentiment across calendar months found sentiment to be relatively stable, differing only by a few percentage points (Appendix B, Figure 3). Maternal stressful life events54 and maternal health status55 prior to birth and even prior to pregnancy can influence birth outcomes. It may be the accumulation of chronic stressors that impacts birth outcomes.33 Our collection of a year-long period of tweets was used to characterize the social environment in which women reside, with the limitation that we were only able to characterize one window of time in a woman’s life. The analyses did not take into account residential histories and the length of time individuals lived in their current communities. We found links between social environment characteristics (sentiment towards racial/ethnic minorities) and adverse birth outcomes; however, our study was unable to identify the directionality of associations.

For a given state, tweets sent from that state were used to construct indicators of sentiment towards racial/ethnic minorities. However, tweets could be sent by both residents and visitors. Additionally, users of social media tend to be younger than the general population; in 2016, 36% of individuals aged 18–29 years used Twitter, compared to 21% of those aged 50–64 years and 10% of those aged 65+ years.56 Nonetheless, adoption rates of social media have been steadily increasing. Importantly, mobile access to the internet means that individuals across the socioeconomic spectrum are tweeting. Tweets also include information rarely found in other neighborhood data sources. Twitter users comprise individuals as well as groups of individuals, organizations, companies, and news outlets. Thus, compiling such information may allow for a more comprehensive examination of the social environment as well as community issues and needs. Nonetheless, social media data, like other data, rely upon people’s willingness to report. The content of tweets reflects information that people feel comfortable reporting and may not represent the true spectrum of their feelings or their experiences.

Additionally, it is important to keep in mind that our estimates of sentiment towards racial and ethnic groups were derived from tweets collected over a one-year period. They do not capture the cumulative experience of individuals over their lifetime and thus provide conservative estimates of the potential impact of more negative sentiment towards racial/ethnic minorities. We controlled for individual race and ethnicity and for state-level percent of non-Hispanic whites in all regression models in order to take into account differential racial/ethnic composition across states. We noticed that race and ethnicity were strong confounding variables, potentially capturing baseline adverse associations between social group membership and birth outcomes.

Study Implications

Vast amounts of data exist outside of traditional cohort studies and clinical trials that are potentially valuable in answering a host of questions in epidemiology. This study harnesses expansive and relatively untapped Big Data resources to examine recent national trends in racial sentiment and their relationship to birth outcomes. This study addresses the limits to research resulting from primary reliance on self-reported discrimination data by providing new, cost-efficient data sources for characterizing attitudes towards racial/ethnic minorities in the U.S. This study describes innovative methodology to identify and characterize sentiment towards minorities from Twitter data summarized to state and county boundaries, thus providing a measure of the broader social environment in which individuals live. This study found that mothers living in areas with more negative sentiment towards racial/ethnic minorities had higher rates of adverse birth outcomes. Associations observed among minority subgroups did not differ from those found for the full population, potentially suggesting that social environments with greater levels of hostility towards minority groups may have adverse effects for all. Efforts to promote a more accepting and inclusive social environment may help reduce rates of preterm birth and low birth weight.

Supplementary Material

1
2

HIGHLIGHTS.

  • Wide geographic variation exists in sentiment towards racial/ethnic minorities in the U.S.

  • Positive sentiment was lowest among tweets that reference Middle Easterners and blacks

  • Lower area-level positive sentiment was related to worse birth outcomes

  • Associations with birth outcomes were similar across the full population and subgroups

ACKNOWLEDGMENTS

This study was supported by the National Institutes of Health’s Big Data to Knowledge Initiative (BD2K) grants 5K01ES025433 and 3K01ES025433–03S1 (Dr. Nguyen, PI) and the NIH Commons Credit Pilot Program (grant number: CCREQ-2016–03-00003). We thank Jessica Omoregie for her research assistance with quality control activities. Y.L.: The views expressed are those of the author, and do not necessarily reflect those of the Federal Trade Commission or any individual Commissioner.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

HUMAN PARTICIPANT PROTECTION

The University of Maryland Institutional Review Board approved the study.

COMPETING FINANCIAL INTERESTS:

The authors declare no competing financial interests.

REFERENCES

  • 1. David RJ & Collins JW Jr. Differing birth weight among infants of US-born blacks, African-born blacks, and US-born whites. New England Journal of Medicine 337, 1209–1214 (1997).
  • 2. Alhusen JL, Bower KM, Epstein E & Sharps P. Racial discrimination and adverse birth outcomes: an integrative review. Journal of Midwifery & Women’s Health 61, 707–720 (2016).
  • 3. Matthews TJ, MacDorman MF & Thoma ME. Infant mortality statistics from the 2013 period linked birth/infant death data set. Natl Vital Stat Rep 64, 1–30 (2015).
  • 4. Iyasu S, Tomashek K & Barfield W. Infant Mortality and Low Birth Weight Among Black and White Infants--United States, 1980–2000. Morbidity and Mortality Weekly Report 51, 589–592 (2002).
  • 5. U.S. Department of Health and Human Services (U.S. Government Printing Office, Washington, D.C., 2000).
  • 6. Kim D & Saada A. The Social Determinants of Infant Mortality and Birth Outcomes in Western Developed Nations: A Cross-Country Systematic Review. International Journal of Environmental Research and Public Health 10, 2296–2335, doi: 10.3390/ijerph10062296 (2013).
  • 7. Hille ET et al. School performance at nine years of age in very premature and very low birth weight infants: perinatal risk factors and predictors at five years of age. Collaborative Project on Preterm and Small for Gestational Age (POPS) Infants in The Netherlands. J Pediatr 125, 426–434 (1994).
  • 8. Burris HH & Hacker MR. Birth outcome racial disparities: A result of intersecting social and environmental factors. Seminars in Perinatology 41, 360–366, doi: 10.1053/j.semperi.2017.07.002 (2017).
  • 9. Alhusen JL, Bower KM, Epstein E & Sharps P. Racial Discrimination and Adverse Birth Outcomes: An Integrative Review. J Midwifery Womens Health 61, 707–720 (2016).
  • 10. Black LL, Johnson R & VanHoose L. The Relationship Between Perceived Racism/Discrimination and Health Among Black American Women: a Review of the Literature from 2003 to 2013. J Racial Ethn Health Disparities 2, 11–20 (2015).
  • 11. Giurgescu C, McFarlin BL, Lomax J, Craddock C & Albrecht A. Racial discrimination and the black-white gap in adverse birth outcomes: a review. J Midwifery Womens Health 56, 362–370, doi: 10.1111/j.1542-2011.2011.00034.x (2011).
  • 12. Lauderdale DS. Birth outcomes for Arabic-named women in California before and after September 11. Demography 43, 185–201 (2006).
  • 13. Mislove A, Lehmann S, Ahn Y, Onnela JP & Rosenquist JN in Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media 554–557 (AAAI Press, 2011).
  • 14. Pearlin LI. The Sociological Study of Stress. Journal of Health and Social Behavior 30, 241–256 (1989).
  • 15. Johns LE et al. Neighborhood social cohesion and posttraumatic stress disorder in a community-based sample: findings from the Detroit Neighborhood Health Study. Soc Psychiat Epidemiol 47, 1899–1906 (2012).
  • 16. Chuang YC, Chuang KY & Yang TH. Social cohesion matters in health. Int J Equity Health 12, 87 (2013).
  • 17. Oswald AJ & Powdthavee N. Obesity, unhappiness, and the challenge of affluence: theory and evidence. Economic Journal 117, F441–F454 (2007).
  • 18. Bray I & Gunnell D. Suicide rates, life satisfaction and happiness as markers for population mental health. Soc Psychiat Epidemiol 41, 333–337, doi: 10.1007/s00127-006-0049-z (2006).
  • 19. Tella RD, MacCulloch RJ & Oswald AJ. The Macroeconomics of Happiness. The Review of Economics and Statistics 85, 809–827, doi: 10.2307/3211807 (2003).
  • 20. Blanchflower DG & Oswald AJ. Hypertension and happiness across nations. Journal of Health Economics 27, 218–233, doi: 10.1016/j.jhealeco.2007.06.002 (2008).
  • 21. Dodds PS, Harris KD, Kloumann IM, Bliss CA & Danforth CM. Temporal Patterns of Happiness and Information in a Global Social Network: Hedonometrics and Twitter. PLoS ONE 6, e26752, doi: 10.1371/journal.pone.0026752 (2011).
  • 22. Di Tella R & MacCulloch R. Gross national happiness as an answer to the Easterlin Paradox? Journal of Development Economics 86, 22–42, doi: 10.1016/j.jdeveco.2007.06.008 (2008).
  • 23. Nguyen QC et al. Building a National Neighborhood Dataset From Geotagged Twitter Data for Indicators of Happiness, Diet, and Physical Activity. JMIR Public Health Surveill 2, e158 (2016).
  • 24. Nguyen QC et al. Social media indicators of the food environment and state health outcomes. Public Health 148, 120–128, doi: 10.1016/j.puhe.2017.03.013 (2017).
  • 25. Bahk CY et al. Publicly available online tool facilitates real-time monitoring of vaccine conversations and sentiments. Health Affairs 35, 341–347 (2016).
  • 26. Nsoesie EO & Brownstein JS. Computational Approaches to Influenza Surveillance: Beyond Timeliness. Cell Host & Microbe 17, 275–278 (2015).
  • 27. McIver JD et al. Characterizing Sleep Issues Using Twitter. J Med Internet Res 17, e140, doi: 10.2196/jmir.4476 (2015).
  • 28. Nguyen QC et al. Leveraging geotagged Twitter data to examine neighborhood happiness, diet, and physical activity. Applied Geography 73, 77–88, doi: 10.1016/j.apgeog.2016.06.003 (2016).
  • 29. Yin Z, Fabbri D, Rosenbloom TS & Malin B. A Scalable Framework to Detect Personal Health Mentions on Twitter. J Med Internet Res 17, e138, doi: 10.2196/jmir.4305 (2015).
  • 30. Hawkins JB et al. Measuring patient-perceived quality of care in US hospitals using Twitter. BMJ Quality & Safety, doi: 10.1136/bmjqs-2015-004309 (2015).
  • 31. Chae DH et al. Association between an Internet-Based Measure of Area Racism and Black Mortality. PLoS One 10, e0122963 (2015).
  • 32. Chae DH et al. Area racism and birth outcomes among Blacks in the United States. Soc Sci Med 199, 49–55 (2018).
  • 33. Latendresse G. The Interaction Between Chronic Stress and Pregnancy: Preterm Birth from A Biobehavioral Perspective. Journal of Midwifery & Women’s Health 54, 8–17, doi: 10.1016/j.jmwh.2008.08.001 (2009).
  • 34. Hoffman S & Hatch MC. Stress, social support and pregnancy outcome: a reassessment based on recent research. Paediatr Perinat Epidemiol 10, 380–405 (1996).
  • 35. Guttman A. R-trees: a dynamic index structure for spatial searching. Proceedings of the 1984 ACM SIGMOD international conference on Management of Data 14, 47–57 (1984).
  • 36. Stanford Natural Language Processing Group. Stanford Tokenizer, <http://nlp.stanford.edu/software/tokenizer.shtml> (2015).
  • 37. Sentiment140. For Academics, <http://help.sentiment140.com/for-students>. Archived at: <http://www.webcitation.org/6joQzyTSS>.
  • 38. Sanders Analytics. Twitter Sentiment Corpus, <http://www.sananalytics.com/lab/twittersentiment/> (2011).
  • 39. Kaggle in Class. Sentiment Classification, <https://inclass.kaggle.com/c/si650winter11> (2011).
  • 40. Go A, Bhayani R & Huang L. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford: 1 (2009).
  • 41. Martin JA, Osterman M, Kirmeyer S & Gregory E. Measuring gestational age in vital statistics data: transitioning to the obstetric estimate. National Vital Statistics Reports: From the Centers for Disease Control and Prevention, National Center for Health Statistics, National Vital Statistics System 64, 1–20 (2015).
  • 42. Luke B & Brown MB. The Changing Risk of Infant Mortality by Gestation, Plurality, and Race: 1989–1991 Versus 1999–2001. Pediatrics 118, 2488–2497, doi: 10.1542/peds.2006-1824 (2006).
  • 43. United States Census Bureau. TIGER/Line® Shapefiles and TIGER/Line® Files, <https://www.census.gov/geo/maps-data/data/tiger-line.html> (2016).
  • 44. Knol MJ, Cessie SL, Algra A, Vandenbroucke JP & Groenwold RHH. Overestimation of risk ratios by odds ratios in trials and cohort studies: alternatives to logistic regression. CMAJ 184, 895–899 (2012).
  • 45. Cohen S et al. Chronic stress, glucocorticoid receptor resistance, inflammation, and disease risk. Proceedings of the National Academy of Sciences 109, 5995–5999 (2012).
  • 46. Peavy GM et al. Effects of chronic stress on memory decline in cognitively normal and mildly impaired older adults. American Journal of Psychiatry 166, 1384–1391 (2009).
  • 47. Wilding J, Andrews B & Hejdenberg J. Relations between life difficulties, measures of working memory operation, and examination performance in a student sample. Memory 15, 57–62 (2007).
  • 48. Kendler KS, Karkowski LM & Prescott CA. Causal relationship between stressful life events and the onset of major depression. American Journal of Psychiatry 156, 837–841 (1999).
  • 49. Baker TB, Piper ME, McCarthy DE, Majeskie MR & Fiore MC. Addiction motivation reformulated: an affective processing model of negative reinforcement. Psychological Review 111, 33 (2004).
  • 50. Brady KT & Sinha R. Co-occurring mental and substance use disorders: the neurobiological effects of chronic stress. Focus 5, 229–239 (2007).
  • 51. Burfoot C & Baldwin T in Proceedings of the ACL-IJCNLP 2009 conference short papers 161–164 (Association for Computational Linguistics).
  • 52. Ptáček T, Habernal I & Hong J in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers. 213–223.
  • 53. Eichstaedt JC et al. Psychological Language on Twitter Predicts County-Level Heart Disease Mortality. Psychological Science 26, 159–169, doi: 10.1177/0956797614557867 (2015).
  • 54. Witt WP et al. Maternal Stressful Life Events Prior to Conception and the Impact on Infant Birth Weight in the United States. American Journal of Public Health 104, S81–S89, doi: 10.2105/AJPH.2013.301544 (2014).
  • 55. Weisman CS et al. Preconception Predictors of Birth Outcomes: Prospective Findings from the Central Pennsylvania Women’s Health Study. Maternal and Child Health Journal 15, 829–835, doi: 10.1007/s10995-009-0473-2 (2011).
  • 56. Greenwood S, Perrin A & Duggan M. Social Media Update 2016 (Pew Research Center, Washington, DC, 2016).
