Table 3.
Practical and Ethical Considerations
Feature | Surveys | Social media analyses |
---|---|---|
Costs to researchers | Data collection is expensive, especially for face-to-face interviews: salaries for interviewers, survey operation costs, programming costs, compensation to respondents. Analysis requires salaries for skilled personnel (statisticians, data managers). Data collection is time intensive, from survey design to interviewing time to analysis. | Data retrieval is low cost for many researchers (especially if their organizations bear the cost of a subscription to a data source), but can vary across sites. Analysis requires salaries for skilled personnel (data scientists, computational linguists, etc.). Data retrieval is speedy and can be carried out immediately or very soon after an event, or even continuously. Analysis is more time intensive, and data storage costs may be substantial. |
Research communities | Decades of practice and professionalization. Deep and extensive scrutiny of methods inherent in the discipline. | Practice and research are newer, and methodological research is still developing. There is a wider range in scientific orientations and what are seen as necessary skills. |
Ethics of consent for use of data | Surveys involve explicit consent from respondents, who grant permission to use their anonymized data. | Social media posters may not be aware that their data are being used for research, even if they have consented in a user agreement. |
Ethics review of research protocol | Before deployment, surveys are (ideally) subject to review by an IRB or government ethics board. | Use of data is not consistently regulated by IRBs or ethics boards. Often treated as secondary data and left unregulated. |
Identifiability of respondents/users | Identities of respondents are unavailable from aggregate reports, though potentially recoverable using covariates of survey responses in data set (e.g., location, ethnicity, income). If microdata are made available to the public, survey organization must remove potentially identifying information in order to honor agreement with respondents. | Identities of users are potentially recoverable from wording and content of posts (e.g., photographs, Twitter handles). Ethical practices for concealing users’ identities in research analyses are not yet agreed upon; social media posts can be intended to be public. |
Analytic approach | Survey researchers define a priori what the relevant variables for analysis will be. In a well-designed survey questionnaire, questions are included because they reflect a hypothesis held by the researchers. | Social media analyses (particularly machine learning) tend to follow a bottom-up data-driven approach that makes no assumptions about which variables are likely to be relevant. Social media data can be used to test hypotheses if researchers have them. |
Potential for researcher bias | When making predictions, the only variables used are those explicitly included by the researcher. What is tested and what can be found may therefore depend on the researcher’s preconceived notions. | When researchers pre-define relevant variables, the potential for bias is exactly the same as for surveys. When all variables collected are used in generating a predictive model, there is a greater chance to identify covariates that were not imagined by the researcher, but also potential to be misled by spurious relationships. |
Evaluating model quality | Most models used to analyze surveys can be evaluated with significance testing and by comparing regression coefficients. Coefficients can be compared across models to choose best fit. | As with surveys, models can be assessed in terms of how well they fit the data and how parsimonious they are. But social media analysis models can be too large for evaluation with p-values (which will often be significant with such large samples) and can be easily saturated through the inclusion of too many parameters (a large number of predictors can assure a good model fit). |
Adjustments for nonrepresentativeness of data | Surveys can be matched to benchmarks (e.g., national census data) and any one response can be adjusted accordingly (e.g., given less weight if produced by member of group overrepresented in sample). Strategies exist to adjust for response likelihoods of households and individuals. | Possible to dampen the influence of frequent posters or those with many followers and amplify the influence of others, through strategic selection of content. Some researchers would argue that such adjustments are not needed or even appropriate, and concern about nonrepresentativeness reflects misunderstanding about how prediction from social media analyses works. |
Stability of data source | Large infrastructure and ongoing survey programs, but depends on continued funding, mostly from government and nonprofit sources. Increasing refusal to participate may affect long-term stability of data source. | Not stable; may never be. Driven by social media companies’ business models, user base, technology changes, and revenues. No guarantee that organizations will be in business or continue to generate or release the same data stream in future. Posters’ concerns about surveillance and data privacy may affect long-term stability of data source. |
Ownership of data | Respondents explicitly consent to researchers owning data. | Users and the social media company both hold claims to the data, but ownership is disputed. |
Perception of research enterprise | Some members of the public find the survey enterprise intrusive and do not understand how participating may benefit them. Respondents can be unclear on the legitimacy of a survey invitation, and may not perceive a difference between scientific surveys and marketing or sales surveys. | Although posters probably do not think about researchers mining social media data when they are posting, ongoing debates about the ethics of surveillance have raised concerns about the data mining enterprise. |
Data users and impact | Policy makers, business owners, and individuals rely on official survey data for decision-making. | Primary data users so far are advertisers and market researchers. |
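
The survey-side adjustment named in the row on nonrepresentativeness (matching a sample to benchmarks and giving less weight to responses from overrepresented groups) can be illustrated with a minimal post-stratification sketch. The group labels, answers, and benchmark shares below are hypothetical and chosen only for illustration; the function names are likewise assumptions, not taken from any survey organization's practice.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Return one weight per respondent.

    Each weight is (benchmark share of the respondent's group) divided by
    (that group's share of the sample), so members of overrepresented
    groups receive weights below 1.
    """
    n = len(sample_groups)
    counts = Counter(sample_groups)
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]


def weighted_mean(values, weights):
    """Weighted average of the responses."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample: one group is overrepresented (3 of 4 respondents).
groups = ["young", "young", "young", "old"]
answers = [1, 1, 0, 0]  # e.g., 1 = "yes", 0 = "no"

# Hypothetical benchmark (e.g., from a census): the groups are equal in size.
benchmark = {"young": 0.5, "old": 0.5}

weights = poststratification_weights(groups, benchmark)
unweighted = sum(answers) / len(answers)
estimate = weighted_mean(answers, weights)
```

In this toy sample the unweighted proportion of "yes" answers is 0.5, while the weighted estimate falls to one third because the overrepresented group's responses are down-weighted. Real survey weighting typically involves several benchmark variables and additional nonresponse adjustments that this sketch omits.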