J Am Med Inform Assoc. 2018 Apr 18;25(8):931–944. doi: 10.1093/jamia/ocy012

Table 2. Variable Operationalization

Variable Operationalization
Dependent variable
Self-care behavior_i: To measure the self-care behavior of an “i,” we created a custom dictionary based on expert ratings. We provided an HIV physician with 395 posts belonging to 105 threads that were selected randomly from 3 strata reflecting no, low, and high self-care to ensure variability in expert coding. The coding sample provided to the expert did not contain strata information. The 3 strata were based on an initial self-care score computed with the LIWC standard dictionaries for “achievement” and “reward” and a custom dictionary built from the ACTG adherence questionnaire. Using Python’s Natural Language Toolkit (NLTK), we removed stop words and selected the top 300 words from the posts that the expert coded as reflecting high self-care behavior. From these 300 words, we chose those that most appropriately represent self-care behavior and, after several iterations, finalized a concise dictionary that captures self-care behavior. The raw score for the self-care behavior of an “i” in a thread is the unique frequency count of words from our self-care dictionary that appear in the concatenated posts this “i” made in the thread after his/her initial post. Prior literature shows, theoretically and empirically, that frequency of appearance is a good indicator of relevance [53–55]. We adjusted this raw self-care score for 2 factors. First, an “i” may use self-care words to express an intention rather than actual behavior. To account for such future-oriented expression by a support seeker, we multiplied the raw self-care score by the factor (100 − future_focus_score)/100, which penalizes posts with a high future focus. The future_focus_score is the percentage of words in the text that denote futurity, computed with the future focus dimension of LIWC.
Second, a support seeker “i” may use self-care words while complaining about self-care. To account for this, we calculated a net emotion score for i’s responses in the thread by subtracting the LIWC negative emotional tone score from the positive emotional tone score. When the net score was positive, we used the adjusted self-care score as is. When it was negative, the self-care words were expressed in a negative emotional context, suggesting that the individual was probably complaining about self-care rather than engaging in it; in this case, we set the adjusted self-care score to zero.
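The self-care scoring steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code. `SELF_CARE_WORDS` is a hypothetical stand-in for the expert-derived dictionary, which the source does not list. “Unique frequency count” is read here as the number of distinct dictionary words that appear. The LIWC future-focus and positive/negative tone percentages (the parameter names `posemo` and `negemo` are hypothetical) are assumed to be computed elsewhere and passed in, because LIWC itself is proprietary. The source does not say what happens when net emotion is exactly zero; the sketch keeps the score in that case.

```python
import re

# Hypothetical mini-dictionary for illustration only; the expert-derived
# self-care dictionary is not reproduced in the source.
SELF_CARE_WORDS = {"adherent", "exercise", "meds", "routine", "walk"}


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a simple stand-in for LIWC/NLTK tokenization."""
    return re.findall(r"[a-z0-9']+", text.lower())


def raw_self_care(subsequent_posts: list[str], dictionary: set[str]) -> int:
    """Count distinct dictionary words in i's concatenated follow-up posts
    (assumed reading of 'unique frequency count')."""
    tokens = tokenize(" ".join(subsequent_posts))
    return len(set(tokens) & dictionary)


def adjusted_self_care(raw: float, future_focus: float,
                       posemo: float, negemo: float) -> float:
    """Apply the futurity penalty, then zero the score under negative net emotion.

    All three percentages are assumed to be LIWC-style values (0-100)
    computed elsewhere; this function does not call LIWC.
    """
    score = raw * (100 - future_focus) / 100  # penalize future-oriented text
    net_emotion = posemo - negemo
    # Negative net emotion: treat as complaining, not self-care.
    # Net emotion of exactly zero is unspecified in the source; kept here.
    return 0.0 if net_emotion < 0 else score


posts = ["I kept my meds routine and went for a walk", "Sticking to the routine"]
raw = raw_self_care(posts, SELF_CARE_WORDS)  # distinct hits: meds, routine, walk
print(raw, adjusted_self_care(raw, future_focus=20.0, posemo=4.0, negemo=1.0))
```

Multiplying by (100 − future_focus_score)/100 scales the score down in proportion to the share of future-oriented words, so a thread whose text is 20% future-focused keeps 80% of its raw score.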
Independent variables
Objective information_j: This variable measures the amount of factual information about the disease and treatment management provided by “j’s” in a thread. To measure objective information, we created a custom dictionary based on expert ratings. We provided a senior HIV researcher with 586 posts belonging to 70 threads that were selected randomly from 2 strata reflecting low and high objective information to ensure variability in expert coding. The coding sample provided to the expert did not contain strata information. The 2 strata were based on an initial objective information score computed with the LIWC standard dictionaries for “biological processes” and “cognitive processes.” Using Python’s NLTK, we removed stop words and selected the top 300 words from the posts that the expert coded as containing high objective information. From these 300 words, we chose those that most appropriately represent objective information and, after several iterations, finalized a concise dictionary that captures objective information. Using this dictionary as LIWC input, we computed the LIWC percentage score for the concatenated posts by “j’s” in a thread that followed the support seeker’s question; this percentage score is our measure of objective information.
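The two text-processing steps described above can be sketched as follows. The first step lists the most frequent non-stop-words in expert-coded posts as dictionary candidates. The second scores a text as the percentage of its words that match a dictionary, which approximates what a LIWC percentage score reports. This is an illustration under assumptions: the stop-word set and tokenizer are simplified, hypothetical stand-ins (the authors used NLTK and LIWC), and the trailing-`*` stem matching mirrors the wildcard convention used in LIWC dictionary files.

```python
import re
from collections import Counter

# Small illustrative stop-word set; the authors removed stop words with NLTK.
STOP_WORDS = {"the", "a", "and", "i", "to", "of", "is", "it", "my", "you"}


def tokenize(text):
    """Lowercase alphanumeric tokens (simplified stand-in tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def top_candidate_words(expert_high_posts, k=300):
    """Most frequent non-stop-words in posts the expert coded as 'high'."""
    counts = Counter(
        t for post in expert_high_posts for t in tokenize(post)
        if t not in STOP_WORDS
    )
    return [word for word, _ in counts.most_common(k)]


def matches(token, entry):
    """LIWC-style match: a trailing '*' matches any word with that stem."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry


def dictionary_percentage(text, dictionary):
    """Percentage of words in `text` matching any dictionary entry."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if any(matches(t, e) for e in dictionary))
    return 100.0 * hits / len(tokens)


posts = ["Viral load and CD4 count matter", "Check your viral load", "CD4 count is key"]
print(top_candidate_words(posts, k=4))
print(dictionary_percentage("Viral load is high", ["viral", "load*"]))
```

In the described workflow, the candidate list is only a starting point: the final dictionary was chosen from it by hand over several iterations, which this sketch does not attempt to reproduce.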
Experiential information_j: This variable measures the amount of experiential information, in the form of personal stories and past anecdotes, provided by “j’s.” To measure experiential information, we created a custom dictionary based on expert ratings. We provided a senior HIV researcher with 586 posts belonging to 70 threads that were selected randomly from 2 strata reflecting low and high experiential information to ensure variability in expert coding. The coding sample provided to the expert did not contain strata information. The 2 strata were based on an initial experiential information score computed with the LIWC standard dictionary for “past focus.” Using Python’s NLTK, we removed stop words and selected the top 300 words from the posts that the expert coded as containing high experiential information. From these 300 words, we chose those that most appropriately represent experiential information and, after several iterations, finalized a concise dictionary that captures experiential information. Using this dictionary as LIWC input, we obtained a raw experiential information score as the LIWC percentage score for the concatenated posts by “j’s” in a thread that followed an “i’s” question. We then adjusted this raw score for 2 factors: (1) second-hand information and (2) first-person account. We adjusted the raw score downward for second-hand information by the factor (100 − second-hand information score)/100, because a “j” may use words such as “doctor” or “telling” that indicate the information was obtained second-hand (e.g., from the Internet or a doctor) rather than from personal experience.
We weighted the raw score for first-person account by the factor (first-person account score)/100, so that posts using more first-person singular pronouns receive higher experiential information scores, because we found that “j’s” sharing their personal experiences typically use first-person singular pronouns such as “i,” “me,” and “mine.” For example, the post “I took the medicine and I felt better” has high experiential content, whereas the post “my friend took the medicine and he felt better” contains no experiential information. To measure second-hand information, we first created a custom dictionary of the words most frequently used by support providers that reflect second-hand information. To do this, we used Python’s NLTK to remove stop words from the concatenated posts of support providers in a thread and performed a word frequency count on the remaining text. We then used this custom dictionary in LIWC to generate a second-hand information score for each thread. Next, we calculated a first-person account score using the LIWC “first person singular pronoun” dictionary. We used these 2 scores in the adjustments described above.
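A minimal sketch of how the two adjustments combine with the raw experiential score. It assumes all three scores are LIWC-style percentages (0–100) computed elsewhere; `FIRST_PERSON` is a hypothetical approximation of LIWC's first person singular category, included only to show how such a percentage behaves on the example sentence from the text.

```python
import re

# Hypothetical approximation of LIWC's "first person singular" category.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}


def tokenize(text):
    """Lowercase alphanumeric tokens (simplified stand-in tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def first_person_pct(text):
    """Percentage of tokens that are first-person singular pronouns."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return 100.0 * sum(t in FIRST_PERSON for t in tokens) / len(tokens)


def adjusted_experiential(raw_pct, second_hand_pct, first_person):
    """raw x (100 - second-hand)/100 x (first-person)/100."""
    second_hand_factor = (100 - second_hand_pct) / 100  # downweight hearsay
    first_person_factor = first_person / 100            # weight personal accounts
    return raw_pct * second_hand_factor * first_person_factor


fp = first_person_pct("I took the medicine and I felt better")  # 2 of 8 tokens
print(fp, adjusted_experiential(10.0, 20.0, fp))
```

Because both factors lie between 0 and 1, the adjusted value never exceeds the raw score; it ranks threads relative to one another, and a thread with no first-person pronouns ends up with a score of zero under this weighting.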
Emotional tone_j: This variable is the LIWC “emotional tone” score obtained by analyzing the concatenated posts of “j’s” in the focal thread.
Community involvement_j: This variable is the number of unique “j’s” who reply to an “i” in response to his/her question in a thread.
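As a small illustration, counting unique “j’s” amounts to collecting the distinct authors in a thread and removing the seeker. The `(author, text)` tuple format is a hypothetical representation of a thread, not the community's actual data schema.

```python
def community_involvement(thread_posts, seeker):
    """Number of distinct users other than the seeker who posted in the thread.

    thread_posts: list of (author, text) tuples (hypothetical format).
    """
    return len({author for author, _ in thread_posts} - {seeker})


thread = [("ann", "question"), ("bob", "reply"), ("cy", "reply"),
          ("bob", "follow-up"), ("ann", "thanks")]
print(community_involvement(thread, "ann"))  # bob and cy
```

Using a set means a “j” who replies several times is still counted once, which matches “unique” in the definition.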
Control variables—capturing support seeker’s (i's) characteristics
Emotional tone_i: This variable is the LIWC “emotional tone” score obtained by analyzing the concatenated posts of “i” in the focal thread. This measure can act as a proxy for the support seeker’s sickness.
Self-disclosure_i: This variable measures the willingness of an “i” to share personal information. Based on the information collected by this online community, we accounted for disclosure of age, gender, and location, each coded as binary (0 = not shared, 1 = shared). For each user, we summed the 3 disclosure codes to form the self-disclosure score for an “i” (an integer between 0 and 3). This score captures the openness of “i” and the extent to which he/she trusts this community.
Degree centrality_i: This social network measure gauges the centrality of an “i” in the reply network. It is measured as the sum of the in-degree and out-degree of the “i” [56].
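A minimal sketch of in-degree plus out-degree in a reply network. The source does not specify the edge construction, so this sketch assumes an unweighted directed graph: each edge runs from the replier to the user being replied to, repeated replies between the same pair count once, and self-replies are ignored. With NetworkX, `G.degree(n)` on a `DiGraph` returns the same in-plus-out sum.

```python
from collections import defaultdict


def degree_centrality(reply_pairs):
    """Return {user: in-degree + out-degree} for a directed reply network.

    reply_pairs: iterable of (replier, replied_to) pairs. Edges are
    deduplicated (assumption), so the graph is unweighted.
    """
    out_nbrs = defaultdict(set)
    in_nbrs = defaultdict(set)
    for src, dst in reply_pairs:
        if src == dst:
            continue  # ignore self-replies (assumption)
        out_nbrs[src].add(dst)
        in_nbrs[dst].add(src)
    users = set(out_nbrs) | set(in_nbrs)
    return {u: len(out_nbrs[u]) + len(in_nbrs[u]) for u in users}


pairs = [("bob", "ann"), ("cy", "ann"), ("ann", "bob"), ("bob", "ann")]
print(degree_centrality(pairs))  # ann: in {bob, cy} + out {bob}
```

Note that this is the raw degree count; it is not normalized by network size, which some libraries do by default under the name “degree centrality.”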
Creator involvement_i: This variable is the number of posts by an “i” in a thread he/she initiated. It reflects the “i’s” importance to that thread, which in turn could be related to self-care behavior.
Question score of posts_i: This variable captures the possibility that an “i” asked questions about self-care behavior rather than actually engaging in it. We first split the concatenated posts by “i” in a thread into individual sentences using Python’s NLTK tokenizers. We then classified each sentence as either a question or not a question. Finally, we computed the question score as the number of question sentences divided by the total number of sentences.
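The question score can be sketched as follows. Two assumptions apply. A regex splitter stands in for NLTK's sentence tokenizer. The classification rule (a sentence is a question if it ends with “?”) is hypothetical, because the source does not describe how sentences were classified.

```python
import re


def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace (stand-in for NLTK)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def is_question(sentence):
    """Hypothetical rule: treat sentences ending in '?' as questions."""
    return sentence.rstrip().endswith("?")


def question_score(posts):
    """Share of i's sentences in a thread that are questions (0 to 1)."""
    sentences = split_sentences(" ".join(posts))
    if not sentences:
        return 0.0
    return sum(is_question(s) for s in sentences) / len(sentences)


print(question_score(["Should I skip a dose? I took it today.", "Any tips? Thanks!"]))
```

A punctuation-based rule misses questions that lack a question mark; a trained classifier would be needed to handle those.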
Word count of first post_i: This variable is the number of words in the first post by an “i.”
Community activity_i: This variable is the total number of threads created by an “i” in this online community. It reflects how active the “i” is in the community.
Control variables—capturing support provider’s (j’s) characteristics
Crowd consensus_j: This variable reflects the degree of agreement in the support provided by “j’s” in a thread. To calculate it, we considered all replies posted by all “j’s” in a given thread and represented each reply as a vector using term frequency–inverse document frequency (TF-IDF) weighting. We used cosine similarity to measure the content similarity between each pair of replies. For every thread, we calculated the similarity scores for all pairs of replies and averaged them to obtain the crowd consensus for that thread.
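A dependency-free sketch of the crowd consensus computation: TF-IDF vectors for each reply, cosine similarity for every pair, and the mean over pairs. The source does not state the TF-IDF variant, the tokenization, or how threads with a single reply were handled. This sketch assumes raw term counts multiplied by the smoothed IDF ln((1 + n)/(1 + df)) + 1, which is scikit-learn's `TfidfVectorizer` default, and returns `None` for fewer than 2 replies.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase alphanumeric tokens (simplified stand-in tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors: raw counts x smoothed IDF (assumed variant)."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()}
            for tokens in token_lists]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def crowd_consensus(replies):
    """Mean pairwise cosine similarity of replies; None if < 2 replies."""
    if len(replies) < 2:
        return None
    vecs = tfidf_vectors(replies)
    sims = [cosine(a, b) for a, b in combinations(vecs, 2)]
    return sum(sims) / len(sims)


print(crowd_consensus(["take meds daily", "take meds with food", "see a doctor"]))
```

Because cosine similarity ignores vector length, the L2 normalization that scikit-learn applies by default would not change the result.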
Self-disclosure_j: This variable measures the willingness of “j’s” to share personal information. Based on information collected by this online community, we accounted for disclosure of age, gender, and location, each coded as binary (0 = not shared, 1 = shared). For each “j,” we summed the 3 disclosure codes to form the self-disclosure score for that j (an integer between 0 and 3). For each thread, we averaged the self-disclosure scores of all “j’s” in the thread to obtain self-disclosure_j for the thread.
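A minimal sketch of the per-user 0–3 disclosure score and its thread-level average. The profile dictionary with `age`, `gender`, and `location` keys is a hypothetical representation, with a missing or empty value treated as not shared. Returning `None` for a thread without replies is also an assumption.

```python
def self_disclosure(profile):
    """0-3: one point each for a non-empty age, gender and location field."""
    return sum(1 for field in ("age", "gender", "location") if profile.get(field))


def thread_self_disclosure_j(provider_profiles):
    """Average self-disclosure score across the j's in a thread."""
    scores = [self_disclosure(p) for p in provider_profiles]
    return sum(scores) / len(scores) if scores else None


profiles = [
    {"age": 34, "gender": "M", "location": ""},        # 2 fields shared
    {"age": None, "gender": None, "location": "NYC"},  # 1 field shared
]
print(thread_self_disclosure_j(profiles))
```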
Control variables—capturing thread characteristics
Year dummy variables: Our dataset spans 13 years, and online social activity may have changed over this period. We therefore included 12 year dummy variables in our model to account for the effect of each year from 2006 to 2017; the remaining year serves as the reference category.
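The step from 13 years to 12 dummies is standard reference-category coding. One year is omitted, and each remaining dummy measures that year's effect relative to it. The sketch below assumes, purely for illustration, that the 13 years are 2005–2017 with 2005 as the reference year. Columns are named `year_<YYYY>`, which is a hypothetical naming scheme.

```python
def year_dummies(year, years, reference):
    """One 0/1 indicator per non-reference year; the reference year is all zeros."""
    if year not in years:
        raise ValueError(f"{year} is outside the study period")
    return {f"year_{y}": int(year == y) for y in years if y != reference}


study_years = range(2005, 2018)  # assumed 13-year span, for illustration only
row = year_dummies(2010, study_years, reference=2005)
print(len(row), row["year_2010"])
```

Including all 13 dummies alongside an intercept would make them perfectly collinear. Dropping one year avoids this.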
Previous-interaction_i,j: This measure quantifies the familiarity between an “i” and the “j’s” offering social support within this thread. It is measured as the average number of interactions between an “i” and the “j’s” prior to the creation of this thread. For example, we counted the interactions between i and j_1, i and j_2, …, and i and j_n prior to the focal thread, and averaged these counts to obtain previous-interaction_i,j for the entire thread.
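A minimal sketch of the averaging described above. The source does not define what counts as an interaction. This sketch assumes a hypothetical record format of `(user_a, user_b, timestamp)` in which an interaction in either direction counts. Returning `None` when a thread has no “j’s” is also an assumption.

```python
from datetime import datetime


def previous_interaction(seeker, providers, interactions, thread_start):
    """Mean number of pre-thread interactions between the seeker and each provider.

    interactions: list of (user_a, user_b, timestamp) records (hypothetical
    format); direction is ignored, and only records before thread_start count.
    """
    counts = [
        sum(1 for a, b, ts in interactions
            if ts < thread_start and {a, b} == {seeker, j})
        for j in providers
    ]
    return sum(counts) / len(counts) if counts else None


log = [
    ("ann", "bob", datetime(2010, 1, 1)),
    ("bob", "ann", datetime(2011, 5, 2)),
    ("ann", "bob", datetime(2013, 1, 1)),  # after the thread starts: ignored
    ("cy", "dan", datetime(2010, 1, 1)),   # does not involve the seeker
]
print(previous_interaction("ann", ["bob", "cy"], log, datetime(2012, 1, 1)))
```

With two prior interactions with bob and none with cy, the thread-level average is 1.0.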
Thread duration: This variable is the difference between the date the thread was created and the date of its last post (by either the “i” or a “j”). This measure indicates the level of activity in the focal thread.
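A short sketch of the duration calculation, assuming durations are measured in whole days (the source does not state the unit). A thread with no posts after its creation is treated here as having a duration of zero.

```python
from datetime import date


def thread_duration_days(created, post_dates):
    """Days between thread creation and the last post by either i or any j."""
    last = max(post_dates, default=created)
    return (last - created).days


print(thread_duration_days(date(2015, 3, 1), [date(2015, 3, 4), date(2015, 3, 10)]))
```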