Abstract
Objectives:
Most researchers who study the effects of hormonal contraception on menstrual bleeding rely on self-reported data via paper diaries, for which completeness and timeliness have been shown to be poor. The purpose of this exploratory study was to compare the completeness and timeliness of bleeding data collected via paper diaries, text messages, or a smartphone application (“app”).
Methods:
This was a sub-study of a double-blinded, placebo-controlled randomized trial comparing the effects of a non-steroidal anti-inflammatory drug, naproxen, with placebo when using a copper IUD. Participants tracked bleeding and symptoms over 112 days. Participants tracked bleeding daily using a paper diary as well as with either text messages or a smartphone app. Participants who used paper and the app were also able to record non-bleeding symptoms.
Results:
Twenty-five participants submitted diaries. Of these participants, 10 completed both paper and app diaries, 7 completed both paper and text messages, 4 completed the paper diary only, and 4 completed the app only. Text messages had the most complete data (median 108 days), followed by the app (median 96 days) and paper diaries (median 84 days). The median lag time between a bleeding event and the date that event was recorded was 0.10 days for text, 1.0 days for app, and 4.73 days for paper diaries. Participants using the app reported a median of 33 other symptoms over the study period compared to 7 for the paper diaries.
Discussion:
Our findings suggest that text messages yielded more complete and timely bleeding data than either paper diaries or the app. Compared to paper diaries, the app delivered more complete and timely data and also collected a large set of symptoms.
Keywords: contraception, self-reported bleeding data, smartphone app, paper diaries, text messaging, SMS
Introduction
Contraceptive-induced menstrual bleeding changes are closely linked to perceptions about whether to use a particular birth-control method (Polis, Hussain, & Berry, 2018). Studies evaluating the effects of hormonal contraception on menstrual bleeding have typically relied on self-reported daily bleeding data collected using paper diaries (Mishell et al., 2007). Studies suggest that the completeness and timeliness of data reported via paper diaries are poor (Lim, Sacks-Davis, Aitken, Hocking, & Hellard, 2010; Stone, Shiffman, Schwartz, Broderick, & Hufford, 2003). A 90-day contraceptive study of 230 participants found that the 115 participants assigned to texting reported 82 days of bleeding data compared to only 36 days for participants assigned to paper diaries (Nippita et al., 2015). In an age of increasing technologies, contraceptive research studies need new strategies to capture daily bleeding data.
Despite the rapid growth of period tracking smartphone applications (“apps”), to our knowledge no studies have examined the timeliness and quality of data collected via app for contraceptive studies. Smartphone apps incorporate reminders, are portable, and can collect a robust set of bleeding and symptoms data. Researchers studying contraception can leverage these app features to collect more complete and timely side-effect and bleeding data (Earle, Marston, Hadley, & Banks, 2020). In this exploratory study, we compare the completeness and timeliness of study data collected via app, text messages, and paper diaries.
Methods
This project was a sub-study of a double-blinded, placebo-controlled randomized trial comparing the effect of a non-steroidal anti-inflammatory drug, naproxen, with placebo when using a copper IUD. Participants were asked to take an assigned medication for seven days starting the first day of their period for three consecutive 28-day cycles, followed by a 28-day observational cycle without medication. Study participants were enrolled between February 2016 and May 2017 and tracked their scheduled and unscheduled bleeding and other symptoms for a total of four 28-day “cycles” or 112 consecutive days.
Setting/Participants
Participants were eligible if they were between the ages of 18 and 49 years, English-speaking, had regular menstrual periods, had no contraindications to naproxen and had requested the copper IUD for contraception. Per the parent study’s protocol, all participants were asked to track daily bleeding and symptoms using a paper diary and one additional diary (text or app). The text messaging option was only available for the first half of the enrollment period due to cost. Consequently, participants who enrolled during the second half of the study were automatically assigned to the app. This study was approved by the Human Subjects Review Board of the University of Washington and is registered in clinicaltrials.gov (NCT02519231). All participants gave written informed consent prior to participation in the study.
Data Collection Tools
The traditional paper diaries consisted of monthly calendars with spaces for each day of the month for participants to specify whether they used the study medication that day, bleeding information for that day (no bleeding, bleeding, or spotting), and the date the bleeding information was recorded (i.e., today’s date). Daily spaces on the paper diaries also included six possible symptoms (headache, stomach cramps, nausea, vomiting, diarrhea, dizziness) with checkboxes to indicate whether the participant experienced those symptoms, and blank space to add other symptoms not already listed.
Participants assigned to text messages received two notifications each evening. The first notification asked about study medication use, to which they responded either “1” or “2” (1=study medication taken; 2=no study medication taken). The second notification was sent two minutes after the first and asked about bleeding type, to which respondents reported back either “1”, “2”, or “3” (1=bleeding; 2=spotting; 3=no bleeding). Participants were unable to relay any information about symptoms via text. If participants did not respond to the first notification attempt, a second attempt was sent the following morning. The text service required a response within a 24-hour period in order to be counted, which prevented text messages being more than one day late. MiR3 (San Diego, CA) operated the text service.
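The reply codes and the 24-hour response window described above can be illustrated with a minimal sketch. The function, dictionary names, and example timestamps are hypothetical and do not reflect the MiR3 service's actual implementation; only the code meanings and the 24-hour cutoff come from the text, and treating a reply at exactly 24 hours as valid is an assumption.

```python
from datetime import datetime, timedelta

# Reply codes as described in the text; the names are illustrative only.
MEDICATION_CODES = {"1": "study medication taken", "2": "no study medication taken"}
BLEEDING_CODES = {"1": "bleeding", "2": "spotting", "3": "no bleeding"}
RESPONSE_WINDOW = timedelta(hours=24)  # assumption: replies after this are not counted


def decode_reply(reply, codes, sent_at, received_at):
    """Return the decoded answer, or None if the reply is late or not a known code."""
    if received_at < sent_at or received_at - sent_at > RESPONSE_WINDOW:
        return None
    return codes.get(reply.strip())


# Hypothetical notification sent at 7 pm with one timely and one late reply.
sent = datetime(2016, 3, 1, 19, 0)
on_time = decode_reply("2", BLEEDING_CODES, sent, sent + timedelta(hours=2))
late = decode_reply("2", BLEEDING_CODES, sent, sent + timedelta(hours=30))
print(on_time, late)
```

A hard cutoff of this kind bounds the recording lag by construction, which is worth keeping in mind when comparing timeliness across modalities.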
Participants assigned to the app used Clue, a free period tracking app developed by BioWink (Berlin, Germany) (“Clue,” 2019). We chose Clue because it had an organizational section devoted to supporting scientific investigators and has been used in other menstrual-related studies (Adams Hillard & Wheeler, 2017). We also favored Clue because the accuracy of information shared in the app and menstrual cycle predictions have been validated in a formal evaluation of menstrual cycle tracking apps (Moglia, Nguyen, Chyjek, Chen, & Castaño, 2016). Participants entered their study number into the app once they signed the consent form. Participants documented bleeding (e.g., light, medium, heavy, spotting) and certain symptoms (e.g., headaches, cramps, and nausea) using options that were pre-programmed in the app. Participants were also able to document an unlimited number of additional symptoms that were not pre-programmed in the app by typing them out and adding them as custom tags that could be reused on future days. Participants using Clue also received daily notifications as a reminder to document their bleeding and symptoms.
Analysis
To ensure research participant data would be securely transferred to the University of Washington, separate data use agreements were signed between the University of Washington and MiR3 and between the University of Washington and BioWink (Clue app).
Our first aim was to compare the completeness of the bleeding data. We compared the median number of days with bleeding data across paper, text, and app using Mood’s Median Test. To evaluate attrition over the study period and to accommodate small cell counts, we also compared the proportion of participants who recorded bleeding data for >50% and >90% of the study days across the modalities using Fisher’s Exact Test. Our second aim was to assess the timeliness of the data. Recording times for paper and text were noted by day and hour of documentation, whereas the app data were noted as whole days. We used Mood’s Median Test to compare the median days between event and recording dates over the total study period and, separately, over the first 84 days (treatment period) and the last 28 days (no treatment). Mood’s Median Test was used to accommodate right-skewed data and to compare more than two medians. Our third aim was to assess the completeness of the reported symptoms data. We compared the median number of symptoms entered per person between paper diaries and the app using a two-sample Mann-Whitney U Test. All data analysis was conducted in R version 3.6.1 (R Core Team, 2020).
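As an illustration of how Mood’s Median Test compares three groups, the following is a minimal Python sketch rather than the R code used in the study. It assumes exactly three non-empty groups (so the chi-square statistic has 2 degrees of freedom and its upper-tail probability is exp(-x/2)), counts values tied with the grand median as “not above”, and uses made-up data; the function name is hypothetical.

```python
import math
from statistics import median


def moods_median_test_3(group_a, group_b, group_c):
    """Mood's median test for exactly three non-empty groups (df = 2)."""
    groups = [group_a, group_b, group_c]
    if any(len(g) == 0 for g in groups):
        raise ValueError("each group must contain at least one value")
    pooled = [x for g in groups for x in g]
    grand_median = median(pooled)
    # Count values strictly above the grand median; ties count as "not above".
    above = [sum(1 for x in g if x > grand_median) for g in groups]
    not_above = [len(g) - a for g, a in zip(groups, above)]
    n = len(pooled)
    chi2 = 0.0
    for counts in (above, not_above):
        row_total = sum(counts)
        if row_total == 0:
            raise ValueError("all values fall on one side of the grand median")
        for g, observed in zip(groups, counts):
            expected = row_total * len(g) / n
            chi2 += (observed - expected) ** 2 / expected
    # For 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    p_value = math.exp(-chi2 / 2)
    return grand_median, chi2, p_value


# Hypothetical days-with-data values, not the study data.
paper = [10, 40, 84, 100, 112]
text = [96, 105, 108, 110]
app = [20, 90, 96, 112]
print(moods_median_test_3(paper, text, app))
```

Because the test only uses whether each value lies above the pooled median, it is insensitive to how far the long right tail extends, which is the property the analysis relies on for skewed lag times.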
Results
Twenty-five participants completed the study and submitted diary information. Of these participants, 10 completed both paper and app diaries, 7 completed paper and text messages, 4 completed the paper diary only, and 4 completed the app diary only. The mean participant age was 30 years. The majority of participants were white, had a bachelor’s degree or higher, and made less than $39,000 per year (Table 1).
Table 1:
Characteristics of study participants who were assigned to paper diary and text messages or smartphone app (N=25, includes 4 participants who submitted paper diaries only and 4 who submitted app data only)
Characteristic | Paper (n=21) | Text (n=7) | App (n=14) |
---|---|---|---|
Age in years | | | |
Mean | 30 | 31 | 28 |
Range | 25–42 | 35–35 | 21–36 |
Missing | 4 | 0 | 1 |
Race/ethnicity | N (%) | N (%) | N (%) |
White | 16 (76.2) | 7 (100.0) | 9 (64.3) |
Asian/Pacific Islander | 3 (14.3) | 0 (0.0) | 5 (35.7) |
Other | 2 (9.5) | 0 (0.0) | 0 (0.0) |
Education | N (%) | N (%) | N (%) |
High school or less | 1 (4.8) | 1 (14.3) | 0 (0.0) |
Some college | 5 (23.8) | 2 (28.6) | 4 (28.6) |
Bachelor’s degree or higher | 15 (71.4) | 4 (57.1) | 10 (71.4) |
Employed | N (%) | N (%) | N (%) |
Yes | 18 (85.7) | 7 (100.0) | 10 (71.4) |
Marital status | N (%) | N (%) | N (%) |
Single/divorced | 8 (38.1) | 3 (42.9) | 5 (35.7) |
Married/remarried | 4 (19.0) | 0 (0.0) | 4 (28.6) |
Other | 9 (42.9) | 4 (57.1) | 5 (35.7) |
Household income (annual) | N (%) | N (%) | N (%) |
< $10,000 | 1 (4.8) | 0 (0.0) | 2 (14.3) |
$10,000–$39,999 | 11 (52.4) | 4 (57.1) | 7 (50.0) |
$40,000–$74,999 | 8 (38.1) | 3 (42.9) | 3 (21.4) |
Over $75,000 | 1 (4.8) | 0 (0.0) | 2 (14.3) |
Parity | N (%) | N (%) | N (%) |
Nulliparous | 13 (61.9) | 3 (42.9) | 10 (71.4) |
1+ | 8 (38.1) | 4 (57.1) | 4 (28.6) |
Out of a maximum of 112 days, texting had the highest number of days with complete data (median=108 days), followed by app (median=96 days) and paper diaries (median=84 days) (Table 2). However, the medians across diary types were not significantly different from one another (p-value=0.42). We did observe higher attrition in bleeding data completion over the course of the study period for the paper diaries than for text or app. Only 8 out of 15 paper diary users who had completed data for more than 50% of the study days also completed data for more than 90% of study days, as compared to 5 out of 6 text users and 7 out of 8 app users.
Table 2:
Number of days of bleeding data reported by paper, text and app (out of 112 possible days)
Measure | Paper (n=21) | Text (n=7) | App (n=14) | Significance (p value) |
---|---|---|---|---|
Statistic | Median (IQR) | Median (IQR) | Median (IQR) | |
Number of days with completed data per participant | 84 (56–112) | 108 (96.5–110.5) | 96 (36–112.5) | 0.42+ |
Statistic | N (%) | N (%) | N (%) | |
Completed bleeding data for more than 50% of study days | 15 (71) | 6 (86) | 8 (57) | 0.40* |
Completed bleeding data for more than 90% of study days | 8 (38) | 5 (71) | 7 (50) | 0.32* |
+ Mood’s Median Test
* Fisher’s Exact Test
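For readers unfamiliar with Fisher’s Exact Test on a table with more than two columns, the sketch below shows one common formulation (the Freeman-Halton extension for a 2 × k table), written in plain Python. It is an illustration under assumptions, not the study’s R code: the function name is hypothetical, and whether the authors’ computation matches this exact formulation is not stated in the text.

```python
from math import comb


def fisher_exact_2xk(row1, row2):
    """Two-sided exact p-value for a 2 x k table with fixed margins."""
    col_totals = [a + b for a, b in zip(row1, row2)]
    row1_total = sum(row1)
    denominator = comb(sum(col_totals), row1_total)

    def weight(cells):
        # Integer numerator of the multivariate hypergeometric probability.
        w = 1
        for cell, total in zip(cells, col_totals):
            w *= comb(total, cell)
        return w

    def first_rows(idx, remaining):
        # Enumerate every possible first row consistent with the margins.
        if idx == len(col_totals) - 1:
            if 0 <= remaining <= col_totals[idx]:
                yield [remaining]
            return
        for cell in range(min(col_totals[idx], remaining) + 1):
            for rest in first_rows(idx + 1, remaining - cell):
                yield [cell] + rest

    observed = weight(row1)
    # Sum the probabilities of all tables no more likely than the observed one.
    extreme = sum(w for w in map(weight, first_rows(0, row1_total)) if w <= observed)
    return extreme / denominator


# Counts taken from Table 2: participants above vs. not above the 50% threshold
# for paper, text, and app (not above = n minus the count above).
p_50 = fisher_exact_2xk([15, 6, 8], [21 - 15, 7 - 6, 14 - 8])
print(p_50)
```

Working with integer numerators avoids floating-point ties when deciding which tables are “as or more extreme” than the observed one.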
We found a significant difference in the median number of days between events and recording dates across paper, text and app (Table 3). Participants using text messages documented their bleeding with the shortest lag time between event and recording (median lag time of 0.10 days, IQR 0.025–0.22 days), followed by smartphone app (median of 1.0 days, IQR 1.0–5.0), and then the paper diary (median of 4.73 days, IQR 0.73–16.4).
Table 3:
Time lag between date of bleeding event and date of recording of that bleeding event
Measure | Paper (n=20)ˆ | Text (n=7) | App (n=14) | Significance (p value) |
---|---|---|---|---|
Statistic | Median (IQR) | Median (IQR) | Median (IQR) | |
Days between event and recording date over total study period | 4.73 (0.73–16.4) | 0.10 (0.025–0.22) | 1.0 (1.0–5.0) | 0.03 |
Days between event and recording date over first 84 days (during treatment only) | 5.93 (0.81–16.5) | 0.09 (0.025–0.16) | 1.0 (0.0–1.25) | 0.001 |
Days between event and recording date over last 28 days (no treatment) | 2.22 (0.00–9.78) | 0.13 (0.013–0.21) | 0.00 (0.00–1.00) | 0.16 |
ˆ Of the 21 original paper diary submissions, 1 participant did not fill out the date of recording
p-values are from Mood’s Median Test
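The lag times in Table 3 are differences between the time of a bleeding event and the time it was recorded, expressed in days. A minimal sketch of that calculation follows; the entries are invented for illustration, the function name is hypothetical, and taking the median over individual entries is an assumption, since the text does not specify whether lags were first summarized per participant.

```python
from datetime import datetime
from statistics import median


def lag_in_days(event_time, recorded_time):
    """Elapsed time between an event and its recording, in fractional days."""
    return (recorded_time - event_time).total_seconds() / 86400


# Hypothetical (event, recorded) pairs; not study data.
entries = [
    (datetime(2016, 3, 1, 19, 0), datetime(2016, 3, 1, 21, 24)),  # 2.4 hours later
    (datetime(2016, 3, 2, 19, 0), datetime(2016, 3, 3, 19, 0)),   # 1 day later
    (datetime(2016, 3, 3, 19, 0), datetime(2016, 3, 8, 19, 0)),   # 5 days later
]
lags = [lag_in_days(event, recorded) for event, recorded in entries]
median_lag = median(lags)
print(lags, median_lag)
```

The first entry shows the unit conversion behind the text-message result: a lag of 0.10 days corresponds to 2.4 hours.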
We also compared entries of (non-bleeding) symptom data between paper and smartphone app users (Table 4). Participants submitting data via text were not included in this part of the analysis as they did not have the option to provide non-bleeding symptoms. The median number of symptoms entered by women using the app over the study period (33 symptoms [IQR: 27–55]) was more than four times higher than the median number of symptoms recorded in the paper diary (7 symptoms [IQR: 0–22]) (p=0.0065).
Table 4:
Number of symptom entries during 112-day study period
Measure | Paper (n=21) | App (n=14) | Significance (p value) |
---|---|---|---|
Statistic | Median (IQR) | Median (IQR) | |
Symptoms entered per person | 7.0 (0–22) | 33.0 (27–55) | 0.0065* |
Total symptoms entered | 349 | 824 | |
Type of symptom reported | (number of entries) | (number of entries) | |
Cramps | 185+ | 169+ | |
Bloating | 17 | 89+ | |
Headaches | 78+ | 49+ | |
Breast soreness | 12 | 63+ | |
Nausea | 24+ | 29+ | |
Ovulation Pain | 0 | 6+ | |
Vomiting | 1+ | 0 | |
Back Pain | 1 | 2 | |
Fatigue | 2 | 0 | |
Gas | 21 | 76+ | |
Diarrhea | 5+ | 7 | |
Other | 3 | 318 | |
+ Symptoms that were pre-listed in the paper diary or pre-programmed in the app
* Two-sample Mann-Whitney U Test
Discussion
In this contraceptive study of 25 study participants who documented bleeding data via paper diaries, text messages or a smartphone app, text messages demonstrated superiority in terms of data completeness and timeliness. We found more complete and timely data with the app than with paper diaries. Additionally, we found that the app diary provided an added benefit of collecting a large and varied set of symptoms data. The daily reminders built into text messaging and the app and the convenience of having a virtual diary accessible at any time on their smartphones likely contributed to the improved completeness and timeliness of data collected via text and app versus the paper diaries.
While not a primary endpoint, we anecdotally found that the app was easier to implement as a data collection tool than either the paper diaries or text messages. Because we had to transcribe the paper diary data into our digital database, the paper diary increased the research team’s workload and the potential for data entry errors. Although we received the text data in digital form, the output was more disorganized and cumbersome to analyze than the app data. The text messaging modality with MiR3 was unaffordable with our limited grant budget. Clue, on the other hand, was free to use and only required a Data Use Agreement (DUA) between BioWink and the University of Washington.
This study has several limitations. First, the small sample size (n = 25) restricts our inferences to the sample and limits generalizability. Second, because the study required participants to use multiple diary types (e.g., paper and text messages, or paper and app) to collect adequate paper diary data for the parent study, the study design may have biased the results of our sub-study. For example, requesting participants to complete two separate diary types may have led to documentation fatigue and decreased data completeness, which may explain why about a third of our participants (n=8) only completed one diary modality. Alternatively, reinforcement across the diary types may have increased data completion and timeliness. Comparing our data to the Nippita et al. study, which randomized patients to either a text message arm or a paper diary arm for collecting bleeding data, the latter may be the more likely scenario (Nippita et al., 2015). In the Nippita study, participants in the text message arm recorded data for 91% of days in the study as compared to 40% of days among participants in the paper diary arm. In our study, participants recorded text message data for 96% of days in the study period and paper diary data for 75% of days in the study period. Recording data via both paper diaries and an electronic format that included daily reminders may have inflated completion data for the paper diaries. It is also possible that participants used the saved text or app data as a helpful reminder when filling in the paper diaries at a later point (or vice versa), introducing another potential source of bias. Our study would have been stronger if each participant had been randomly assigned a single diary type.
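The percentages compared above follow directly from the reported days of data and the length of each study. A short sketch of that arithmetic, using only figures given in this paper (82 and 36 of 90 days in the Nippita study; medians of 108 and 84 of 112 days here), with a hypothetical helper name:

```python
def percent_of_days(days_recorded, study_days):
    """Days with data as a whole-number percentage of the study length."""
    return round(100 * days_recorded / study_days)


# Nippita et al. (2015): 90-day study, as summarized in the Introduction.
nippita = {"text": percent_of_days(82, 90), "paper": percent_of_days(36, 90)}
# This study: median days with data out of 112 (Table 2).
this_study = {"text": percent_of_days(108, 112), "paper": percent_of_days(84, 112)}
print(nippita, this_study)
```

The gap between modalities is 51 percentage points in the randomized study but 21 points here, which is the pattern the paragraph attributes to reinforcement across concurrently used diaries.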
Third, the varied design of the three diary types likely influenced the data outcomes. Recording data via text messaging was very simple as compared to paper and app. Although participants assigned to texting only needed to reply to two short questions via text each day, we did not assess whether receiving two texts two minutes apart was burdensome for the participants. The app required participants to open the app, select the date, and choose from a set of bleeding and symptoms options. The completeness of the app data might also have been better if it had been easier to indicate “no bleeding” in the app (in our study, participants had to swipe through a few screens to select “no bleeding” as a custom tag). Participants in the text message arm had a window of only 24 hours to respond to a given text notification; later responses were not counted. Thus, this time restriction, about which participants were counseled, likely contributed to the very short median lag time of 2.4 hours via text. The timeliness data that we calculated for the paper diaries were less trustworthy than those for text and app because the recording date on paper diaries was self-reported rather than captured automatically. With regard to symptoms data, we found that frequency of symptoms was heavily influenced by which symptoms were pre-listed on the paper diary or pre-programmed in the app. The Clue app allows users to select from a much larger set of symptoms than was practical to include in the paper diary. The Clue app also allows users to enter an unlimited number of symptoms as free text, likely contributing to the 318 “other” symptoms reported in the app’s data.
Despite these limitations, results from this study may help inform future studies that examine patient-documented bleeding. Perhaps our most novel and exciting finding is that using an app such as Clue instead of traditional paper diaries shows promise, especially for researchers who are pursuing a rich dataset on symptoms related to contraceptive use. As people become increasingly accustomed to tracking their cycles via period tracking apps, we expect that collecting menstrual bleeding and symptoms data via app will be more practical and convenient for both study participants and researchers.
Significance:
Most studies that examine the effects of contraception on menstrual bleeding necessarily rely on self-reported bleeding data, traditionally collected via paper diaries. Prior research suggests bleeding data collected via text messages are more complete and timely compared to paper diaries. Despite the rapidly growing usage of period tracking applications, to our knowledge, no studies have examined the completeness and timeliness of bleeding data collected via smartphone application. Our study compares bleeding data collected via paper diaries, text, and app. This study is valuable given that researchers may be able to harness the period tracking apps already used by millions of menstruating persons for their studies.
ACKNOWLEDGEMENTS
This study was supported in part by the National Center for Advancing Translational Sciences of the National Institutes of Health under Award Number UL1 TR002319 and an independent investigator grant funded by CooperSurgical, Inc. We thank Amanda Shea and Marija Vlajic Wheeler from BioWink for their help with procuring data from the Clue app and reviewing the manuscript. We thank Joel Marcus and Evelyn Chanasyk for their editorial assistance. Dr. Godfrey receives grant funding from Bayer Women’s Health and honoraria from Merck as a Nexplanon trainer. None of the other authors has any conflicts of interest related to the submitted work.
References
- Adams Hillard PJ, & Wheeler MV (2017). Data from a Menstrual Cycle Tracking App Informs our Knowledge of the Menstrual Cycle in Adolescents and Young Adults. Journal of Pediatric and Adolescent Gynecology, 30(2), 269–270. doi: 10.1016/j.jpag.2017.03.015
- Clue. (2019). Retrieved from https://helloclue.com/
- Earle S, Marston HR, Hadley R, & Banks D (2020). Use of menstruation and fertility app trackers: a scoping review of the evidence. BMJ Sex Reprod Health. doi: 10.1136/bmjsrh-2019-200488
- Lim MS, Sacks-Davis R, Aitken CK, Hocking JS, & Hellard ME (2010). Randomised controlled trial of paper, online and SMS diaries for collecting sexual behaviour information from young people. J Epidemiol Community Health, 64(10), 885–889. doi: 10.1136/jech.2008.085316
- Mishell DR Jr., Guillebaud J, Westhoff C, Nelson AL, Kaunitz AM, Trussell J, & Davis AJ (2007). Combined hormonal contraceptive trials: variable data collection and bleeding assessment methodologies influence study outcomes and physician perception. Contraception, 75(1), 4–10. doi: 10.1016/j.contraception.2006.08.008
- Moglia ML, Nguyen HV, Chyjek K, Chen KT, & Castaño PM (2016). Evaluation of Smartphone Menstrual Cycle Tracking Applications Using an Adapted APPLICATIONS Scoring System. Obstet Gynecol, 127(6), 1153–1160. doi: 10.1097/aog.0000000000001444
- Nippita S, Oviedo JD, Velasco MG, Westhoff CL, Davis AR, & Castaño PM (2015). A randomized controlled trial of daily text messages versus monthly paper diaries to collect bleeding data after intrauterine device insertion. Contraception, 92(6), 578–584. doi: 10.1016/j.contraception.2015.09.004
- Polis CB, Hussain R, & Berry A (2018). There might be blood: a scoping review on women’s responses to contraceptive-induced menstrual bleeding changes. Reprod Health, 15(1), 114. doi: 10.1186/s12978-018-0561-0
- R Core Team. (2020). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from http://www.R-project.org//
- Stone AA, Shiffman S, Schwartz JE, Broderick JE, & Hufford MR (2003). Patient compliance with paper and electronic diaries. Controlled Clinical Trials, 24(2), 182–199. doi: 10.1016/s0197-2456(02)00320-3