Abstract
Blood pressure (BP) is an important indicator of an individual's health status and is closely related to daily behaviors. Continuous daily measurement of BP is therefore critical for hypertension control. To assist continuous measurement, this research studied BP prediction based on non-physiological data (ubiquitous mobile phone data). An algorithm was proposed that predicts BP based on patients’ daily routine, which includes activities such as sleep, work, and commuting. The aim of the research is to provide insight into the application of mobile data in telemonitoring and the continuous unobtrusive daily measurement of BP. A half-year data set of 320 individuals, starting from October 2017 and including telecom data and BP measurement data, was analyzed. Two hierarchical Bayesian topic models were used to extract individuals’ location-driven daily routine patterns (topics) and calculate probabilities among these topics from their day-level mobile trajectories. Based on the topic probability distribution and patients’ contextual data, their BP was predicted using different models. The prediction model comparison shows that the long short-term memory (LSTM) method outperforms the others when the data has a high dependency. Otherwise, the Random Forest regression model outperforms the LSTM method. The experimental results also validate the effectiveness of the topics in BP prediction.
Keywords: Daily routine, Blood pressure prediction, Telemedicine, Machine learning
1. Introduction
Hypertension is one of the most common chronic non-communicable diseases, and it is also a heavy disease burden worldwide. In 2013, around 2.5 million people died of hypertension in China, accounting for almost 27.5% of the total deaths [57]. In the United States, hypertension causes about 410,000 deaths annually, contributing to 1 in 6 adult deaths, and nearly 30% of adults have a problem with hypertension [15]. Alarmingly, the diagnosed population is becoming younger. However, most people, especially the young and middle-aged, do not measure their blood pressure regularly, although regular measurement is important for hypertension management and control. The early detection or prediction of high blood pressure is a crucial step in decreasing the risk of hypertension and its complications. Continuous BP measurement is also critical for hypertension rehabilitation [1]. Research further shows that blood pressure measured at home is more relevant to cardiovascular disease (CVD) than that measured in the hospital [27], as home measurements avoid some biases, such as white-coat hypertension (WCH) caused by tension when facing a doctor [51]. Thus, remote continuous blood pressure measurement is very important.
The traditional measurement using a mercury blood pressure gauge requires professional help, which limits its usage outside of a hospital. The prevalence of the electronic sphygmomanometer is still limited by its cost in rural areas, which has had a higher cardiovascular mortality rate than urban areas in the last decades in China. Although some community health service centers provide the devices for free, the storage problem of historical user records weakens the management and control of home measurement. Considering these problems, a ubiquitous mobile phone with almost 100% coverage worldwide, with its ability to collect individuals’ daily behavioral data unobtrusively and frequently, can play a complementary role in the regular measurement of BP if it can be used for BP prediction.
Health behaviors, such as exercise, diet, sufficient sleep, and stress [40,50], have been widely accepted as important influential factors of hypertension. Commuting time, considered as a source of life stress [52], has been proven to be a risk factor related to cardiovascular disease [22]. Shift work and long working hours have also been proven to be risk factors of cardiovascular disease [39,45]. Some BP predictions have used working hours, sleep duration [9], and step counts [4] as predictors, which are mostly measured as single numbers. However, the timeline of these behaviors, which is important in a doctor's diagnosis process, has been ignored. To capture the duration and timeline of these behaviors at the same time, daily routine is introduced into the prediction model. For example, a daily routine can depict patterns such as “Mary leaves home at 8 a.m., arrives at work at 9 a.m., leaves the workplace at 5 p.m., visits elsewhere until 8 p.m., arrives home at 9 p.m., and has no cellphone signal at 10 p.m.” Such a pattern can reflect Mary's working hours, commuting time, sleep time, and timeline. Thus, daily routine can reflect the relevant hypertension risks comprehensively. Furthermore, O'Conor et al. [36] have shown that daily routine could be an unrecognized changeable factor that influences the health outcome of the elderly.
Thus, a daily location-driven routine pattern was extracted from the trajectory data collected from mobile devices, which reflects an individual's movement within a contextual environment in time and space [12]. Several BP prediction models based on the daily routine pattern are applied and compared to find the best prediction. The main objectives of this paper are as follows: (1) realize remote, portable, continuous BP prediction based on non-physiological data; (2) extract an individual's daily routine pattern and digitalize the daily routine into a distribution for further study; and (3) predict BP utilizing the daily routine pattern and the user's contextual data.
To our knowledge, few studies have predicted BP from daily routines extracted from mobile phone data. This approach can bring three advantages to the current research. First, it can avoid the biases of self-reporting. Second, the high coverage of mobile phones can solve the problem of device cost and sample limitation. Third, it makes it possible to quantify indirect variables such as work pattern and location shifting.
The main contribution of the study is that it is the first study to utilize large-scale trajectory data to predict BP. Although mobile analysis has played an important role in psychological science study (e.g., depression, anxiety, and stress) [2,5,11,46,56], studies of mobile behaviors and physical health, to our knowledge, are scarce, especially studies regarding hypertension. The ubiquity, portability, and unobtrusive data collection of mobile phones strongly support the continuous BP prediction of the public. Secondly, compared with traditional risk factors of hypertension, this study focuses on the individual's daily routine pattern, which can reflect the regularity of routine, working pattern, and stress level to a certain extent. The new measurement method of daily routine broadens the research dimension of hypertension. In addition, various groups of data inputs into different prediction models are compared. The prediction is also shown to be effective.
According to the experiment, the accuracy of our predictions of systolic BP and diastolic BP reached 95.02% and 92.49%, and the root mean squared error (RMSE) was 8.29 and 8.01 mmHg, respectively, when using long short-term memory (LSTM). When using independent historical data, the accuracy of the Random Forest regression model was even higher. The model can be applied to collect a patient's predicted BP values for remote or online medical treatment. The daily predictive values can serve as reference data for medical staff to learn about the historical data and abnormal situations of a patient, which can be helpful for hypertension control and management. It is worth mentioning that the prediction model suits the population who cannot access continuous BP measurements with a BP measurement device. However, it cannot substitute for a sphygmomanometer, so the model is not applicable to patients receiving medical treatment for which they need an accurate measurement. In conclusion, trajectory data from a mobile phone can help to mitigate the problem of remote continuous BP prediction, which can assist in in-hospital diagnosis and out-of-hospital monitoring as an alternative BP measurement method when a professional device is not at hand.
2. Literature review
2.1. Daily routine and health
Researchers have conducted many studies on the relationship between health and daily activities, such as physical activity, shift work, working hours, and commuting time. Physical activity is classified into three categories: (1) occupational physical activity (OPA), (2) cardiorespiratory fitness activity, and (3) leisure-time physical activity (LTPA). The health influences of these categories have been studied separately. Evidence shows a detrimental role of occupational physical activity and a protective role of cardiorespiratory fitness for all-cause and ischemic heart disease mortality, while leisure-time physical activity had differential effects on mortality: protective effects for healthy men but not for men with pre-existing cardiovascular disease [28]. As for commuting, it is widely accepted in psychology as a source of stress due to the time pressure and traffic congestion commuters experience while commuting [35,48]. Commuting has also been extensively studied as an important socioeconomic factor that influences public health (e.g. cardiovascular risk as reported by [22] and stress as reported by [52]). In terms of shift work and working hours, a systematic review reported moderate grade evidence linking shift work to breast cancer and long working hours to stroke, and it reported low grade evidence linking both shift work and long working hours with an increase in risk for cardiovascular diseases [45].
With the advance of technology, models that determine individuals' habits or daily activities based on large-scale mobile phone sensor data [24,41,42,54] have been developed and provide the opportunity to study the relationship between a measurable daily routine and health. Based on the step count data collected from a mobile pedometer, Okura et al. [37] proved the effectiveness of regular physical activity on blood pressure control. Scholars also found that the movement patterns (such as location features and daily behavior) reflected in the trajectory data collected by a smartphone can identify patients with depression or symptoms of depression [7,47,58]. Chiang and Dey [9] considered behaviors, such as sleep duration, sleep condition, and step count, in BP prediction. In addition to movement data, late sleep frequency, and sleep time, Cheng [8] incorporated characteristics of an individual's online surfing behavior, such as the duration of online time and call duration, into personal health risk prediction.
Therefore, a user's life rhythm, work pattern, commuting time, and other information extracted from daily trajectory data can provide a new research angle for remote continuous blood pressure monitoring.
2.2. Blood pressure prediction
One of the most common types of data used to predict blood pressure is self-reported data. According to the well-known risk factors of hypertension (e.g. a lack of exercise, sedentary lifestyle, smoking, and drinking), some scholars applied machine learning to predict BP based on datasets related to existing risk factors and demographic features (e.g. age, sex). Golino et al. [17] applied a classification tree algorithm to predict whether the user would develop hypertension in the future based on contextual data, comprising body mass index (BMI), waist hip ratio (WHR), and smoking habits. Kwong et al. [29] applied a neural network to predict an individual's average systolic BP (SBP) in a certain period utilizing questionnaire data such as age, BMI, exercise level, alcohol-consumption level, smoking condition, stress level, and salt intake. Rau et al. [44] related ambulatory 24-hour recordings of BP to heart rate (HR) and daily ratings, such as perceived control (PC), psychological demand (P), control (C), and social support (S) at work. They also used the interaction of job strain and perceived control (P x C x PC) to predict BP at night but not at work. These predictions, to some extent, extended the input dimension of BP prediction and contributed to health assessment. However, there were some limitations to the studies. First, the sample size was limited, which was constrained by the data sources. Second, intrusive self-reporting procedures are related to some well-known biases, including the participants’ lack of attention to critical behaviors, memory limitations, and socially desirable responding [18,38]. Moreover, the problem of continuous measurement could not be mitigated, and thus early warning of hypertension could not be realized in the studies.
Mobile and wearable devices are poised to address this research gap by allowing researchers to collect data objectively and unobtrusively [23]. They are also considered an easy-to-use way for medical professionals to provide direct feedback and support to patients [29]. Currently, data from these devices are increasingly used to predict BP. Based on a wireless home BP monitor with a cloud platform, Li et al. [30] used historical measurement records (e.g. BP, HR) and contextual data (e.g. age, sex, weight, and height) to build a BP trend prediction model. Ballinger et al. [4] utilized a dataset comprising heart rate, step count, and other activity, which was collected and processed from wearable heart rate sensors and medical history data, to assess cardiovascular risk scores, including hypertension risk scores. Chiang and Dey [9] incorporated health behaviors collected by wearable devices, such as sleep duration, sleep condition, calories burned, and step count, into the BP prediction. Combined with a user's historical BP records, BP can be predicted daily, and the most relevant health behaviors can be identified for the user.
To our knowledge, few studies have predicted BP based on non-physiological data, such as mobile trajectory data, which can help to save the cost of wearable devices.
2.3. Prediction models
The classical regression models include Support Vector Regression (SVR) [49], the Random Forest regression model [32], the Gradient Boosted Regression Tree (GBRT) [14], and the Multilayer Perceptron (MLP) regression model [16]. These regression models are applied to blood pressure prediction. However, since their input must be a fixed length and they have no sequential structure, they cannot deal with the variable-length time series data. The Hidden Markov model (HMM) can be used in series data analysis such as speech recognition, and it can model transitions between states in a sequence [43]. However, HMM can only capture the correlation of the adjacent states, and it is not suitable for the modeling of long-term dependency [21].
The recurrent neural network (RNN) is a kind of neural network that, unlike a feedforward network, has recurrent connections and thereby introduces a notion of sequence to the model. It can connect the data of the previous time step with the following ones, and it can better process sequential data. Long short-term memory (LSTM) [25] is a variant of the RNN. It replaces the plain recurrent hidden unit with a gated LSTM unit, which can mitigate the problem of long-term dependency in sequence data. Long short-term memory has been successfully used in sequence-dependent modeling in various scenarios, such as speech recognition [20], handwriting recognition [19], sequence tagging [10], and language modeling [33]. Considering that our dataset for BP prediction may be time series data with long-term dependency within users, we also include LSTM in our model comparison.
3. Materials and methods
In our model, a user's trajectory data is labeled using a heuristic algorithm and transferred into region of interest (ROI) label sequences. Furthermore, a Latent Dirichlet Allocation (LDA) topic model is proposed to extract the main daily routine mode and obtain an individual's daily routine pattern distribution. Finally, time-series data, containing daily routine pattern distribution records in the last few days, and contextual data, such as BMI, sex, and age, are put into the prediction model using the LSTM recurrent neural network. Fig. 1 shows the framework of the model.
Fig. 1.
Diagram of modeling framework.
3.1. Dataset
Our research is based on the sphygmomanometer data set and mobile cell tower data of users who bought a sphygmomanometer device from our cooperating device company in Shanghai, China. The data was collected over a period of six months from October 1, 2017, to March 31, 2018. Based on patented pulse wave acquisition technology and the biofeedback system of the company, the study participants wore the device on their wrists and measured their blood pressure, pulse rate, and other health indicators. When connected to the internet, the data was uploaded on a cloud server belonging to the company and analyzed for a health assessment. Our BP measurement data was collected from the company. Meanwhile, with the mobile phone numbers provided by the company, the mobile cell tower data of these users was collected from a telecom company in Shanghai, China. We chose this group of users because it is the target group who may be interested in BP monitoring and benefit from BP prediction the most.
The data set included 320 users, 3738 historical blood pressure measurement records, and 14834 users’ daily trajectories records. For the sake of privacy and confidentiality, the datasets were encrypted and then combined into one by the two companies using the user's key identification code. Furthermore, all the trajectory data was analyzed by the telecom company using a dedicated computer with the internet disconnected. Only the final analysis results, the topic distribution of the daily routine pattern, was taken out of the company under internal confirmation to ensure that the key information had been encrypted.
A sample of the wearable data and baseline characteristics of the users, such as the wearable device use log, monitoring related variables, and sociodemographic characteristics, is shown in Table 2. The ratio of females to males in the dataset is 0.35:0.65, with an average age of 60.5, average weight of 69.5 kg, and average height of 168.4 cm. The average number of measurements is 46. The distribution of users’ measurement period and measurement ratio during the data collection period is depicted in Fig. 2. As shown in the figure, in terms of measurement period (x-axis), users are concentrated in two groups: short-term (<25 days) and long-term (>150 days) users. Most users are dispersed evenly from 25 to 150 days. On the y-axis (measurement ratio), most users’ use frequency is below 0.3, which means 3 times in 10 days.
Table 2.
Baseline characteristics of users in the study.
Variables for Time Series Data | Example | Type | AVG±SD |
---|---|---|---|
Measurement Time | 2018/3/26 12:11 | Datetime | - |
Pulse | 79 | Numerical | - |
Systolic BP (SP) | 78 | Numerical | |
Diastolic BP (DP) | 129 | Numerical | |
Pulse Rate (PR) | 78.95 | Numerical | |
Left Ventricular Compliance (LVC) | 0.4 | Numerical | |
Myocardial Contractile Force (MCF) | 1.69 | Numerical | |
Cardiac Output (CO) | 8.57 | Numerical | |
Endocardial Viability Ratio (EVR) | 98.44 | Numerical | |
Myocardium Consume Volume Oxygen (MVO) | 18.92 | Numerical | |
Mean Arterial Pressure (MAP) | 14.11 | Numerical | |
Total Peripheral Resistance (TPR) | 104.35 | Numerical | |
Variables for Context Data | | | |
Gender | F(0), M(1) | Categorical | F:M=0.35:0.65 |
Age | 45 | Numerical | |
Weight (kg) | 67 | Numerical | |
Height (cm) | 167 | Numerical | |
Fig. 2.
The distribution of the measurement period and measurement ratio.
Table 3 shows a sample of the cell tower data, which was recorded when a mobile connection was made.
Table 3.
Example of records of trajectory data.
User ID | Time Stamp | Cell Tower ID | Longitude | Latitude |
---|---|---|---|---|
c663ddc……42815 | 2017-10-15 14:56:23 | C0920 | 121.4211 | 31.23194 |
c663ddc……68ad8 | 2017-12-08 17:26:52 | C1610 | 121.4095 | 31.23867 |
c663ddc……4cc27 | 2018-03-11 08:54:17 | C0066 | 121.4061 | 31.23139 |
3.2. Daily routine pattern extraction
In this section, we introduce an unsupervised method of daily routine extraction. The ROI labels, such as home and workplace, are obtained by a heuristic algorithm, which follows Montoliu, Blom, and Gatica-Perez's grid-based clustering algorithm [34] and is commonly used in industrial ROI practice. Then the cell tower connection sequences are replaced by ROI label sequences. Following the two hierarchical Bayesian topic models of [13], which include Latent Dirichlet Allocation (LDA), the probability distribution over daily routine patterns is obtained. Through a visualization method, the meaning of the daily routine pattern can be understood from the perspective of time and space.
3.2.1. ROI label sequence transformation
For each user's cell tower trajectory data, each record is composed of a timestamp and the longitude and latitude of the cell tower. A user's cell tower trajectory for a day is then composed of the series of records whose timestamps belong to the same day. To provide locations with semantic meanings, the ROI set of each user (home, workplace, and other) needs to be defined. According to the ROI label set of each user, the user trajectory is transformed into an ROI sequence. Fig. 3 shows the data conversion process.
Fig. 3.
ROI label sequences generation.
The details of the data processing are as follows:
(1) Defining the user ROI set
Referring to the grid-based clustering algorithm, we first remove the instantaneous connection points and obtain the stay points, which indicate where the user stayed in one location for a while, to make the trajectory clear. Then the stay points are clustered into stay regions if the transfer between them is mutual and frequent. To provide semantics to the locations, different time periods are defined. The time period 23:00-6:00 is defined as the night period to determine the home ROI. The time period 10:00-16:00 is defined as the daytime period to determine the work ROI. The remaining hours are defined as other. The frequency of each stay region during the three time periods is counted separately. The stay regions with the highest frequency in the night and daytime periods are identified as home and workplace, respectively. For the same cell tower, the home label takes precedence over the workplace label. Considering the instability of mobile connection, the stay regions or points whose distance to a key location is lower than the average distance are clustered into the same ROI label. The remaining locations are defined as “other ROI,” so that all cell towers of each user are labelled with an ROI.
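The labeling heuristic above can be illustrated with a minimal sketch. It assumes that stay regions have already been extracted and that each observation is a hypothetical `(region_id, hour)` pair; the function name `label_rois` and the labels `"H"`, `"W"`, and `"O"` are illustrative choices, and the distance-based merging of nearby points is left out.

```python
from collections import Counter

# Time periods taken from the text: night 23:00-6:00, daytime 10:00-16:00.
NIGHT_HOURS = {23, 0, 1, 2, 3, 4, 5}
DAY_HOURS = set(range(10, 16))


def label_rois(visits):
    """Map each stay region to 'H' (home), 'W' (workplace), or 'O' (other).

    visits: iterable of (region_id, hour) pairs, one per observed stay hour.
    This is an illustrative simplification, not the authors' implementation.
    """
    visits = list(visits)
    night = Counter(region for region, hour in visits if hour in NIGHT_HOURS)
    day = Counter(region for region, hour in visits if hour in DAY_HOURS)

    # Every region starts as "other" and is overwritten if it is a key location.
    labels = {region: "O" for region, _ in visits}
    if day:
        labels[day.most_common(1)[0][0]] = "W"
    if night:
        # Assigned last so that the home label takes precedence over workplace.
        labels[night.most_common(1)[0][0]] = "H"
    return labels
```

In this sketch, a region that is the most frequent in both periods ends up labeled as home and no workplace is assigned; how such a case is handled in practice is not specified in the text.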
(2) ROI label sequence transformation
Based on the ROI label set, the cell tower trajectory of user i on day d was converted into an ROI trajectory. Then, the stay duration of each ROI was calculated. According to the stay time proportion in each hour, the main ROI with the highest proportion was taken as the residence location of that hour. Since the user may not have had data in a certain period of time, the label N (“No signal”) is defined to mark this situation. Finally, the ROI trajectory was transformed into an ROI label sequence composed of 24 residence labels. Taking one such sequence as an example, a first label H indicates that the user was home at 1:00, and a third label N indicates that the user had no signal at 3:00. The sequence can likewise show that the user was in other places at 8:00 and at work at 9:00. This transformation is done for the convenience of subsequent daily routine topic extraction.
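The hourly conversion can be sketched as follows. The sketch assumes a day's ROI trajectory is given as hypothetical `(label, start_min, end_min)` stays measured in minutes from midnight, and that hour slots run from 0:00 to 23:00; the paper's own slot indexing (1:00 to 24:00) and tie-breaking rule may differ.

```python
def hourly_labels(stays):
    """Return a 24-character ROI label sequence, one label per hour.

    stays: list of (label, start_min, end_min) tuples with minutes counted
    from midnight, 0 <= start_min < end_min <= 1440 (assumed input format).
    """
    sequence = []
    for hour in range(24):
        lo, hi = hour * 60, (hour + 1) * 60
        minutes = {}
        for label, start, end in stays:
            # Minutes of this stay that fall inside the current hour slot.
            overlap = min(end, hi) - max(start, lo)
            if overlap > 0:
                minutes[label] = minutes.get(label, 0) + overlap
        # The label with the largest share wins; "N" marks an hour with no data.
        sequence.append(max(minutes, key=minutes.get) if minutes else "N")
    return "".join(sequence)
```

For instance, a day spent at home until 7:20, elsewhere until 8:20, at work until 17:00, and at home again from 18:00 yields seven `H` labels, one `O`, nine `W` labels, one `N`, and six `H` labels.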
3.2.2. Hierarchical Bayesian topic models
Based on the obtained ROI label sequences, the LDA topic model is applied to extract the main topics (daily routine patterns). Latent Dirichlet Allocation is an unsupervised machine learning technique. It was originally a document generation model and a three-level Bayesian probability model, which includes a three-tier structure of words, topics, and documents [6]. Based on the idea that the topics of a document follow a multinomial distribution, while the words of a topic also follow a multinomial distribution, the topic information of a large-scale document set can be specified under LDA. Referring to the method of Farrahi and Gatica-Perez [13], we convert the ROI labels into ROI words to construct a bag of ROI transitions (document) from each ROI label sequence. Each ROI word is obtained using a 3-hour time window, following Farrahi and Gatica-Perez's [13] method. A 2-hour time window is not considered because it can only offer 9 (3 labels * 3 labels) words, which cannot sufficiently reflect the variance in daily routine. To get as many ROI words in the document as possible, the 4-hour time window is also not considered in this paper. The specific conversion is shown in Table 4.
Table 4.
LDA document generation.
Time | 1:00 | 2:00 | 3:00 | 4:00 | 5:00 | … | 22:00 | 23:00 | 24:00 |
---|---|---|---|---|---|---|---|---|---|
ROI Label | H | H | N | H | H | … | H | H | H |
Time Window | / | / | 1:00-3:00 | 2:00-4:00 | 3:00-5:00 | … | 20:00-22:00 | 21:00-23:00 | 22:00-24:00 |
ROI Words | / | / | HHN3 | HNH4 | NHH5 | … | OHH22 | HHH23 | HHH24 |
Document | {HHN3,HNH4,NHH5,…,OHH22,HHH23,HHH24} | | | | | | | | |
The LDA model is used to discover the main routine topics (characteristics) of all documents converted from the trajectory dataset and to provide the probability distribution over the given topics. It digitalizes the cell tower sequence into a daily routine pattern distribution for prediction. We implement our LDA model using the Gensim package in Python 3.8.3. The details of the LDA result will be demonstrated in the experiments section.
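The document construction in Table 4 can be reproduced with a short sketch. The function name `roi_words` is hypothetical; it assumes the label sequence starts at the 1:00 slot and that each word is the three labels in the window followed by the window's ending hour, as the table suggests.

```python
def roi_words(labels, window=3):
    """Build the bag of ROI words (one LDA document) for a single day.

    labels: sequence of 24 hourly ROI labels, where labels[0] is the 1:00 slot.
    Each word joins `window` consecutive labels with the hour the window ends.
    """
    words = []
    for end in range(window, len(labels) + 1):
        words.append("".join(labels[end - window:end]) + str(end))
    return words


# A day consistent with the labels visible in Table 4; the hours 6:00-19:00
# are filled with "W" purely for illustration.
day = "HHNHH" + "W" * 14 + "O" + "HHHH"
document = roi_words(day)
```

Each resulting list of words is a tokenized document, which is the input form that topic-modeling libraries such as Gensim expect when building a dictionary and a bag-of-words corpus.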
3.3. Prediction model
In our first prediction dataset, the input of the model includes the user's time-series data in a 3-day time window, including daily routine topic distribution, BP history measurement data, and the user's context data. The output of the model is the user's blood pressure value of the day. The recurrent model and classical models are compared in terms of the prediction accuracy. In the recurrent model, the daily routine topic distributions and BP time-series data are input into the LSTM, which can extract the features of time series data. The user's context data are input into a fully connected layer, which is the basic layer in a neural network. After that, the data features extracted from the LSTM layer and the fully connected layer are fused to generate the predicted blood pressure value, while the other classical models consider all the input in a single layer. The structures of the models are depicted in Figs. 4 and 5.
Fig. 4.
Structure of the LSTM model.
Fig. 5.
Structure of the other classical models.
Concurrently, considering that BP history data is sparse in actual practice, another prediction dataset is constructed, in which the prior BP measurements need not come from consecutive days. The dataset includes the BP measurement records of the last m times, the topic distribution of the corresponding date, and the contextual data of the user. The classical models and the recurrent model with a contextual layer are compared on this dataset again. Also, to account for the influence of different day lengths on model prediction accuracy, different day lengths are set. The details of the comparison are given in Section 4.3.
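One way to assemble such samples from sparse records is sketched below. The record layout (dictionaries with `topics`, `sbp`, and `dbp` keys) and the function name `build_samples` are assumptions made for illustration. Whether the target day's own topic distribution is also part of the input is not spelled out here, so this sketch uses only the last m records plus the context data.

```python
def build_samples(records, context, m=3):
    """Create (features, target) pairs from one user's measurement history.

    records: chronologically sorted list of dicts with keys 'topics'
        (list of floats), 'sbp', and 'dbp'. Days without a measurement are
        absent, so neighbouring records may be several days apart.
    context: list of static user features (e.g. sex, age, weight, height).
    m: number of previous measurements used as input.
    """
    samples = []
    for i in range(m, len(records)):
        history = []
        for past in records[i - m:i]:
            # Flatten each past record: topic distribution, then SBP and DBP.
            history.extend(list(past["topics"]) + [past["sbp"], past["dbp"]])
        features = history + list(context)
        target = (records[i]["sbp"], records[i]["dbp"])
        samples.append((features, target))
    return samples
```

A flat feature vector like this suits the classical regressors; for a recurrent model, the per-record features would instead be kept as a sequence of m steps.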
4. Experiments
4.1. Main daily routine pattern
After the processing of the topic model using the 320 users' cell tower trajectory data, the main daily routine patterns (topics) of the group are obtained. The daily routine topics vary with the topic number setting. However, the main topics remain similar. Taking a topic model with 10 topics as an example, the top 50 user ROI sequences with the highest probability of each topic are visualized to make the abstract topics, which are represented by word packets, easier to understand. The visualized daily routine topics are shown in Fig. 6.
Fig. 6.
Main daily routine pattern.
Take Topic 4 in Fig. 6 as an example. The pattern of this topic means that users leave home to work around 7:00-8:00 and go home around 17:00-18:00 to some extent. The commuting time is short because there is no other label during the connecting time. Topic 8 belongs to the cases of longer commutes, because the users spend some time in other locations from home to workplace. Topic 2 denotes that users go somewhere they do not normally go. Topic 0 means that the users stay home all day. Therefore, through the topic model and visualization method, we can capture the main routine pattern of the users. Table 5 lists the possible routine meaning of each topic.
Table 5.
Description of daily routine pattern of each topic.
Topic | Daily Routine Pattern | Topic | Daily Routine Pattern |
---|---|---|---|
Topic 0 | At home | Topic 5 | Almost no signal the whole day |
Topic 1 | At most frequent place in daytime (e.g., the workplace for workers) the whole day | Topic 6 | At home, with home address often connected to two base stations |
Topic 2 | Out at other places not normally visited during the daytime | Topic 7 | Around the home activities all day |
Topic 3 | At home, with frequent no signal state | Topic 8 | At work during the daytime with long commutes |
Topic 4 | At work during daytime with short commutes | Topic 9 | Out at other places or workplaces in the afternoon |
Based on the daily routine pattern obtained from the above LDA topic model after training, the user's daily trajectory data can be transferred into the distribution under different topics. Taking three randomly picked users as examples, the distribution regarding different topics of 30 days is visualized in Fig. 7. The specific information of these users is shown in Table 6.
Fig. 7.
Topic distribution of users on workdays and weekends.
Table 6.
User information.
User | Gender | Age | Weight (kg) | Height (cm) |
---|---|---|---|---|
User 0 | Female | 75 | 52 | 156 |
User 1 | Male | 79 | 60 | 165 |
User 2 | Male | 46 | 77 | 170 |
To make the daily routine understandable to the prediction model, the sequence is also digitalized as a distribution over the main routine patterns. In Fig. 7, each line in the x-axis direction is the probability distribution over the topics for a user on one day. The probability value is indicated by the color and can be read from the color bar on the right of the figure.
From Fig. 7, both Users 1 and 2 have a higher probability in Topic 5 (no signal), which is more common for elderly people, than User 3. It can also be seen that User 1 is often in Topics 0, 3, and 7 on weekdays and weekends, which is basically an at-home pattern, and there are some differences between weekends and weekdays. The probabilities of User 2 being in Topics 0, 3, and 7 (home pattern) are also higher than in other topics. Compared with User 1, the probability distribution of his daily routine pattern is more concentrated, and the difference between weekends and weekdays is smaller. The main topics on the workdays and weekends of User 3 are basically Topic 0 (at home) and Topic 2 (going out). However, the probability of going out is higher on weekdays, and the probability of being at home is higher on weekends. Furthermore, User 3 has a lower probability in Topic 5 (no signal topic) than Users 1 and 2. Combined with their demographic characteristics, the situation reflected by these topics is in accordance with their state.
4.2. Data preprocess and evaluation method
Data preprocessing is very important for the construction of the model. In the collected dataset, the user's trajectory data is relatively complete. However, the user's BP measurement data is sparse. The lack of measurement data is mainly due to users’ irregular measurement habits. Therefore, when constructing the training dataset, it is necessary to combine the two datasets according to user id and date first, and then remove the blank lines of the measurement data to form the final experimental data.
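The join-and-filter step can be sketched in plain Python. The tuple layouts and the name `merge_daily` are hypothetical, and if a user has several measurements on one day this simplified version keeps only the last one.

```python
def merge_daily(topic_rows, bp_rows):
    """Join topic distributions and BP records on (user_id, date).

    topic_rows: list of (user_id, date, topics) tuples
    bp_rows: list of (user_id, date, sbp, dbp) tuples; sbp or dbp may be None
    Returns one dict per user-day that has a complete BP measurement.
    """
    # Index complete BP records by their (user_id, date) key.
    bp_by_key = {
        (uid, day): (sbp, dbp)
        for uid, day, sbp, dbp in bp_rows
        if sbp is not None and dbp is not None
    }
    merged = []
    for uid, day, topics in topic_rows:
        if (uid, day) in bp_by_key:
            sbp, dbp = bp_by_key[(uid, day)]
            merged.append({"user_id": uid, "date": day, "topics": topics,
                           "sbp": sbp, "dbp": dbp})
    return merged
```

Days that have a trajectory but no measurement are dropped, which mirrors the removal of blank measurement lines described above.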
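Because BP data, trajectory data, and context data are on different scales, the prediction performance of the neural network would be affected, so it is necessary to standardize the training set. Here we use Min-Max Scaler normalization to map each variable from its minimum-to-maximum range into the range 0 to 1. The formula is as follows:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x$ is the original value of a variable, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of that variable.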
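Min-max scaling of a single variable can be sketched as below. The function name `min_max_scale` and the handling of a constant column (mapped to zeros) are illustrative choices; scikit-learn's `MinMaxScaler` offers the same transformation for whole feature matrices.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled_sbp = min_max_scale([100, 120, 140])
```

Here `scaled_sbp` is `[0.0, 0.5, 1.0]`. A common precaution is to compute the minimum and maximum on the training folds only and reuse them for the held-out data.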
To verify the performance of the proposed prediction model, mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE) are used as evaluation metrics. The formulas of these three metrics are as follows:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_i - y_i}{y_i}\right|$$
where $\hat{y}_i$ denotes the predicted BP value on day $i$, $y_i$ represents the actual measured value on that day, and $n$ is the number of predictions. MAE is the average absolute error and directly reflects the size of the prediction error in mmHg. RMSE measures the deviation of the prediction from the actual value and penalizes large errors more heavily. MAPE expresses the prediction deviation relative to the measured value. For all three metrics, the closer the value is to 0, the better the prediction.
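The three metrics can be computed directly from paired measured and predicted values. The sketch below uses the standard definitions of MAE, RMSE, and MAPE in plain Python; the function names and the sample readings are illustrative only.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    squared = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Assumes no measured value is zero, which holds for BP readings.
    """
    relative = sum(abs((p - t) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * relative / len(y_true)


# Illustrative measured and predicted systolic values (mmHg).
actual = [120.0, 130.0, 125.0]
predicted = [126.0, 130.0, 120.0]

scores = {
    "MAE": mae(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAPE": mape(actual, predicted),
}
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same data, and the gap between the two widens when a few predictions are far off.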
4.3. Compared methods and results
In this section, the prediction models are compared by prediction accuracy, and the effects of trajectory data and historical BP data are verified. To reduce the randomness of the experimental results, 5-fold cross-validation is applied when training each model. We use grid search to determine the parameters of the compared models, and all models are well trained in the experiment. The compared algorithms are briefly introduced below:
- SVR: Support Vector Regression is a classic regression model based on SVM. We use LibSVM to implement the method.
- RF: Random Forest is an ensemble method that aggregates a collection of decision trees to reduce overfitting and the resulting high variance.
- XGBoost: a boosting method whose base learners are decision trees. We use the XGBoost API in Python to implement the method.
- MLP: a basic neural network containing one hidden layer, with a linear regression output layer.
- LSTM: a variant of RNN that addresses the long-term dependency problem in sequence data.
We use Keras⁴ v2.4.3 (TensorFlow backend) as the programming framework to implement the LSTM model, and the parameters of the architecture in Fig. 4 are listed in Table 7. We use the scikit-learn⁵ Python module to implement the compared models.
Table 7.
Parameters of the proposed model.
Layer Setting | Units
---|---
Dense (context data) | 16
LSTM (time-series data) | 11
Dense (output) | 1
Other Settings | Values
---|---
Input (context data) | Shape = (number of samples, 8)
Input (time-series data) | Shape = (number of samples, 3, 11)
Epochs | 500
Batch size | 10
Optimizer | Adam
We first select the subset of users who have continuous BP measurements within a 3-day window,⁶ which means that the data within each user is highly correlated. We then use 5-fold cross-validation to compare the models. The results of the experiment are shown in Table 8.⁷
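The 5-fold cross-validation procedure can be sketched as an index split in plain Python. The function name `k_fold_indices`, the fixed seed, and the choice to split by sample rather than by user are assumptions; the source does not state how the folds were formed. The sample count of 871 is taken from the combined-input rows of Table 8.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(871, k=5))
fold_sizes = [len(test_idx) for _, test_idx in splits]
```

Each sample appears in exactly one test fold, so averaging a metric over the five folds uses every sample once for evaluation. If samples from the same user are strongly correlated, splitting by user instead would give a stricter estimate of performance on unseen users.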
Table 8.
Cross-validation results of models with different inputs and algorithms.
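SBP
Data | Data Scale | Model | MAE | RMSE | MAPE (%)
---|---|---|---|---|---
Topic series + Context data | 2614 | LSTM with contextual layer | 7.65 | 11.15 | 6.41
Topic series + Context data | 2614 | SVR | 7.60 | 11.03 | 6.38
Topic series + Context data | 2614 | RF | 7.01 | 10.06 | 5.87
Topic series + Context data | 2614 | XGBOOST | 7.50 | 10.67 | 6.21
Topic series + Context data | 2614 | MLP | 8.17 | 11.53 | 6.60
BP series + Context data | 875 | LSTM with contextual layer | 12.59 | 15.73 | 10.17
BP series + Context data | 875 | SVR | 12.80 | 16.15 | 10.33
BP series + Context data | 875 | RF | 10.78 | 13.65 | 8.58
BP series + Context data | 875 | XGBOOST | 11.09 | 14.31 | 8.77
BP series + Context data | 875 | MLP | 10.99 | 13.59 | 8.84
Topic series + BP series + Context data | 871 | LSTM with contextual layer | 6.11* | 8.30* | 4.99*
Topic series + BP series + Context data | 871 | SVR | 7.18 | 10.18 | 5.95
Topic series + BP series + Context data | 871 | RF | 6.69 | 9.31 | 5.49
Topic series + BP series + Context data | 871 | XGBOOST | 6.54 | 9.49 | 5.35
Topic series + BP series + Context data | 871 | MLP | 7.36 | 9.85 | 5.83
DBP
Data | Data Scale | Model | MAE | RMSE | MAPE (%)
---|---|---|---|---|---
Topic series + Context data | 2614 | LSTM with contextual layer | 6.34 | 8.73 | 8.29
Topic series + Context data | 2614 | SVR | 6.43 | 8.87 | 8.38
Topic series + Context data | 2614 | RF | 6.10 | 8.31 | 8.02
Topic series + Context data | 2614 | XGBOOST | 6.38 | 8.62 | 8.34
Topic series + Context data | 2614 | MLP | 6.35 | 8.62 | 8.11
BP series + Context data | 875 | LSTM with contextual layer | 8.55 | 11.25 | 10.57
BP series + Context data | 875 | SVR | 6.40 | 8.64 | 8.35
BP series + Context data | 875 | RF | 6.41 | 8.73 | 8.40
BP series + Context data | 875 | XGBOOST | 6.75 | 8.96 | 8.72
BP series + Context data | 875 | MLP | 7.52 | 9.22 | 9.57
Topic series + BP series + Context data | 871 | LSTM with contextual layer | 5.83* | 8.01* | 7.51*
Topic series + BP series + Context data | 871 | SVR | 5.93 | 8.12 | 7.80
Topic series + BP series + Context data | 871 | RF | 5.86 | 8.04 | 7.71
Topic series + BP series + Context data | 871 | XGBOOST | 6.32 | 8.50 | 8.16
Topic series + BP series + Context data | 871 | MLP | 6.57 | 8.90 | 8.09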
After adding the trajectory topic series, the performance of our model improved substantially compared with using only the BP time series: for the LSTM model, the MAE fell from 12.59 to 6.11 for SBP and from 8.55 to 5.83 for DBP (Table 8). The trajectory topic series therefore contributes materially to BP prediction. With all three inputs, the LSTM model is better than the other four compared models on all three metrics when the data has a strong dependency, and its advantage is larger for systolic blood pressure. The MAPE is also low (4.99% for SBP and 7.51% for DBP); that is, the difference between the predicted and the measured blood pressure is small.
In addition, compared with the other BP prediction algorithms summarized in Table 1, our model achieves accuracy similar to that of the model using behavioral variables [29] (MAPE of 7.98) when using only topic data and context data from our dataset (MAPE of 6.41 for SBP and 8.29 for DBP). Moreover, the results indicate that the BP prediction model (LSTM trained on the whole dataset) performs similarly to the model of Kei Fong et al. [26], which uses PPG data and reports an RMSE of 7.29. However, it does not outperform the models that use ECG signals and other physiological indicators.
Table 1.
Summary of prediction models.
Author | Method | Data Source | Model Application | Efficiency (AUC / MAE / RMSE / MAPE, as reported)
---|---|---|---|---
Kwong et al. [29] | Back-Propagation and Radial Basis Function Neural Network | Correlation of variables (age, BMI, exercise level, alcohol consumption level, smoking status, stress level, and salt intake level) | Predict systolic blood pressure | 7.98
Kei Fong et al. [26] | Multiple Support Vector Regression (SVR) Machines | Multiple photoplethysmogram (PPG) signals | Predict continuous non-occluding blood pressure | 7.29
Chiang and Dey [9] | Random Forest with Feature Selection (RFFS) | Health behavior and historical BP | Predict individual's BP | 5.18, 6.88, 5.64
Baek et al. [3] | Convolutional Neural Network (CNN) | Sequential ECG and PPG signals | Predict blood pressure | 5.32, 5.54
Zhang et al. [59] | Gradient Boosting Decision Tree (GBDT) | ECG and PPG | Continuous blood pressure prediction | 5 (from Fig.)
Su et al. [53] | Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) | A static BP dataset | Multi-day BP prediction | 3.90
Teixeira et al. [55] | Phenotyping Algorithms (Random Forest performs best) | Electronic health record (EHR) data | Identify hypertensive individuals | 0.97
Li et al. ([31]b) | Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) | Sequential measurement data and contextual data | Predict the trend of users' BP | 5.29, 7.01
Koshimizu et al. [27] | Multi-Input Multi-Output Deep Neural Networks | Time-series data of blood pressure measured and medical examination data | Predict blood pressure variability | 6.65
The whole dataset is then used to compare prediction accuracy when users' BP measurements are sparse, which is more common in practice. In this dataset, the model input consists of the last m BP measurements and the topic series of the corresponding days. To check robustness on this dataset, we again use 5-fold cross-validation and vary the setting of m. Figs. 8 and 9 show the cross-validation results for SBP and DBP under different settings of m and different evaluation metrics, and Table 9 gives the corresponding numerical summary. Because the MLP method performs very poorly, it is omitted from Figs. 8 and 9 to keep the figures readable.
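One way to build such inputs from sparse records is sketched below in plain Python. The function name `last_m_windows` and the tuple layout are hypothetical. The sketch also assumes that only the m previous measured days feed the prediction and that the gaps between them are ignored; the source does not specify either detail.

```python
def last_m_windows(records, m):
    """Build (history, target) samples from one user's measured days.

    `records` is a date-sorted list of (date, topic_vector, bp) tuples that
    contains only days with a BP measurement. Each sample pairs the m
    previous measured days (topic probabilities plus BP) with the BP of
    the current day, however many calendar days separate them.
    """
    samples = []
    for i in range(m, len(records)):
        history = [list(topics) + [bp] for _, topics, bp in records[i - m:i]]
        target = records[i][2]
        samples.append((history, target))
    return samples


# Four irregularly spaced measurement days for one hypothetical user.
records = [
    ("2017-10-01", [0.7, 0.3], 128.0),
    ("2017-10-04", [0.5, 0.5], 131.0),
    ("2017-10-05", [0.2, 0.8], 135.0),
    ("2017-10-11", [0.6, 0.4], 129.0),
]
samples = last_m_windows(records, m=2)
```

A user with n measured days yields n - m samples, so larger values of m give each sample more history but reduce the number of usable samples.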
Fig. 8.
Cross-validation results of SBP with different m settings and evaluation methods.
Fig. 9.
Cross-validation results of DBP with different m settings and evaluation methods.
Table 9.
The numerical descriptions of Cross-validation results.
SBP
Model | MAE AVG | MAE SD | MAE MIN | MAE MAX | RMSE AVG | RMSE SD | RMSE MIN | RMSE MAX | MAPE (%) AVG | MAPE (%) SD | MAPE (%) MIN | MAPE (%) MAX
---|---|---|---|---|---|---|---|---|---|---|---|---
LSTM | 7.29 | 0.08 | 7.18 | 7.45 | 9.89 | 0.12 | 9.68 | 10.11 | 5.99 | 0.06 | 5.89 | 6.11
MLP | 10.88 | 1.37 | 8.71 | 14.28 | 13.84 | 1.58 | 11.43 | 17.88 | 8.46 | 1.03 | 6.87 | 10.92
RF | 6.43 | 0.09 | 6.29 | 6.66 | 9.03 | 0.11 | 8.88 | 9.34 | 5.35 | 0.09 | 5.20 | 5.55
SVR | 8.99 | 0.62 | 8.00 | 10.03 | 11.67 | 0.61 | 10.71 | 12.73 | 7.33 | 0.51 | 6.52 | 8.18
XGBOOST | 6.61 | 0.12 | 6.45 | 6.85 | 9.21 | 0.17 | 8.92 | 9.53 | 5.46 | 0.10 | 5.32 | 5.68
DBP
Model | MAE AVG | MAE SD | MAE MIN | MAE MAX | RMSE AVG | RMSE SD | RMSE MIN | RMSE MAX | MAPE (%) AVG | MAPE (%) SD | MAPE (%) MIN | MAPE (%) MAX
---|---|---|---|---|---|---|---|---|---|---|---|---
LSTM | 5.69 | 0.07 | 5.57 | 5.93 | 7.74 | 0.08 | 7.55 | 8.04 | 7.47 | 0.12 | 7.27 | 7.83
MLP | 6.91 | 0.65 | 6.08 | 8.61 | 9.18 | 0.79 | 8.24 | 11.48 | 8.69 | 0.80 | 7.65 | 10.67
RF | 5.33 | 0.03 | 5.27 | 5.40 | 7.37 | 0.05 | 7.27 | 7.45 | 7.06 | 0.05 | 6.96 | 7.14
SVR | 5.93 | 0.22 | 5.63 | 6.40 | 8.07 | 0.20 | 7.84 | 8.54 | 7.76 | 0.31 | 7.33 | 8.38
XGBOOST | 5.58 | 0.08 | 5.39 | 5.71 | 7.62 | 0.08 | 7.44 | 7.81 | 7.31 | 0.11 | 7.05 | 7.50
The figures above show that the Random Forest method performs best on all three metrics, with an average MAPE of 5.35% in SBP prediction. XGBoost is next, followed by LSTM, whose MAPE in SBP prediction is about 0.5 percentage points higher than that of XGBoost. This is expected: in a dataset where the temporal dependency is lower, LSTM does not outperform Random Forest and XGBoost. Because sparse measurement is more common in actual user input, we conclude that Random Forest is the better prediction algorithm in practice.
5. Conclusion
5.1. Discussion
In this section, we discuss the merits and limitations of the proposed BP prediction model. BP prediction was carried out on half a year of data from 320 users, based on trajectory data, historical BP data, and contextual data. Compared with traditional models, the LSTM model is more accurate than most of the other models when the time-series data has strong dependency, but the Random Forest model outperforms the others when the time-series data is highly uncorrelated.
Based on a large trajectory dataset, our study shows the potential of using trajectory data in hypertension research. The accuracy of the proposed algorithm shows the feasibility of BP prediction using mobile trajectory data. User trajectory data is also accessible to telecom companies. If cooperation between telecom companies and medical professionals can be reached under secure data protection, this data can support remote and online medical treatment in the future.
In addition, this study provides insight into measurable daily routine patterns that reflect an individual's daily behavior in the health management area, such as work pattern, sleep time, and working hours. Such patterns can help offset the bias of self-reporting, which is currently the mainstream way of collecting health behavior information. They also provide a measurable basis for future studies on routine-related variables and physical health.
We consider that continuous daily BP prediction can support physicians' medical decision-making. On the one hand, the predicted values can be used as reference data for patients without BP measurement records, especially in remote settings. On the other hand, the continuous predictions can sometimes serve as an early warning of abnormal situations.
5.2. Limitations and conclusions
The proposed algorithm has some limitations that cannot be overlooked and require further improvement. Firstly, because other risk factors related to hypertension, such as diet, exercise, alcohol consumption, and smoking, are not available, the prediction accuracy suffers from the instability of BP records and trajectory time series. Secondly, the model has a cold-start problem: a new user's initial prediction accuracy is low because no historical BP records exist. The user needs to add some records to obtain higher accuracy, which may limit the application. Thirdly, we only applied the prediction model to users in Shanghai, while the daily routine topic distribution may vary with population and region. More experiments are needed to improve accuracy. Lastly, without fine-grained trajectory data (e.g., GPS data), we were unable to conduct finer-grained analyses, such as using shorter intervals in ROI sequence extraction. We will improve this aspect in future studies.
We performed BP prediction using two hierarchical Bayesian topic models and recurrent models with a contextual layer. The prediction accuracy reached more than 90% on both SBP and DBP, corresponding to MAPE values below 10%. Previous studies have mostly used self-reported behavioral data and contextual data, which cannot avoid the bias of self-reporting. In addition, previous studies cannot provide continuous, unobtrusive BP prediction for users, even though regular BP measurement is critical for hypertension control and management, especially in the rehabilitation period. Moreover, the device we used to collect data is available to the public, even in rural areas. With an application that allows GPS data acquisition, the model could make better predictions than with cell tower trajectory data. In the future, we will study daily behaviors further and extract more specific behavioral variables to improve the prediction model. We also believe that, combined with a self-developed application that provides higher-precision trajectory data, the model can be more helpful in BP prediction.
Declaration of Competing Interest
The authors declare that they have no conflict of interest.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grants No. 91646205 and 71421002) and the Fundamental Research Funds for the Central Universities of China (Grant No. 16JCCS08).
Biographies
Yidan Xiang received her B.S. degree (2017) from Tianjin University and is now pursuing her Ph.D. degree as a student of Prof. Pengzhu Zhang at Antai College of Economics and Management, Shanghai Jiao Tong University. She is interested in smart health, user behavior changes and big data analysis.
Shaochun Li is now studying for a Ph.D. degree at Antai College of Economics and Management, Shanghai Jiao Tong University. His research interests include smart health, knowledge discovery, and data mining.
Footnotes
1. The variables LVC, MCF, CO, EVR, MVO, MAP, and TPR are NOT used in the prediction model.
2. The measurement period is the length in days from the user's first measurement to the last, while the ratio means the proportion of days that have records in the measurement period.
3. For retired people, the workplace can be regarded as the place they go to most frequently during the daytime.
6. We only set a 3-day time window to ensure enough observation samples.
7. The * mark means better performance.
References
- 1. AbuDagga A., Resnick H.E., Alwan M. Impact of blood pressure telemonitoring on hypertension outcomes: a literature review. Telemed. e-Health. 2010;16(7):830–838. doi: 10.1089/tmj.2010.0015.
- 2. Aung M.S.H., Alquaddoomi F., Hsieh C., et al. Leveraging multi-modal sensing for mobile health: a case review in chronic pain. IEEE J. Selected Topics Signal Proc. 2016;10(5):962–974. doi: 10.1109/JSTSP.2016.2565381.
- 3. Baek S., Jang J., Yoon S. End-to-end blood pressure prediction via fully convolutional networks. IEEE Access. 2019;7:185458–185468.
- 4. Ballinger B., Hsieh J., Singh A., et al. DeepHeart: semi-supervised sequence learning for cardiovascular risk prediction. arXiv preprint. 2018 arXiv:1802.02511.
- 5. Ben-Zeev D., Wang R., Abdullah S., et al. Mobile behavioral sensing for outpatients and inpatients with schizophrenia. Psychiatr. Serv. 2016;67(5):558–561. doi: 10.1176/appi.ps.201500130.
- 6. Blei D.M., Ng A.Y., Jordan M.I. Latent dirichlet allocation. J. Mach. Learn. Res. 2003;3(Jan):993–1022.
- 7. Canzian L., Musolesi M. Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing. Osaka, Japan; 2015. Trajectories of depression: unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis.
- 8. Cheng X. Beijing University of Posts and Telecommunications; 2017. Personal Health Risk Prediction and Assessment Based on Mobile Telecommunication Data.
- 9. Chiang P.-H., Dey S. Offline and online learning techniques for personalized blood pressure prediction and health behavior recommendations. IEEE Access. 2019;7:130854–130864.
- 10. Collobert R., Weston J., Bottou L., et al. Natural language processing (almost) from scratch. J. Mach. Learn. Res. 2011;12:2493–2537.
- 11. DaSilva A.W., Huckins J.F., Wang R., et al. Correlates of stress in the college environment uncovered by the application of penalized generalized estimating equations to mobile sensing data. JMIR mHealth uHealth. 2019;7(3):e12084. doi: 10.2196/12084.
- 12. Eagle N., Pentland A.S. Eigenbehaviors: identifying structure in routine. Behav. Ecol. Sociobiol. 2009;63(7):1057–1066.
- 13. Farrahi K., Gatica-Perez D. Proceedings of the 16th ACM International Conference on Multimedia. 2008. What did you do today? Discovering daily routines from large-scale mobile data.
- 14. Friedman J.H. Greedy function approximation: a gradient boosting machine. Ann. Statis. 2001:1189–1232.
- 15. Fryar C.D., Ostchega Y., Hales C.M., et al. Hypertension Prevalence and Control Among Adults: United States, 2015–2016. 2017.
- 16. Gardner M.W., Dorling S. Artificial neural networks (the multilayer perceptron)—a review of applications in the atmospheric sciences. Atmos. Environ. 1998;32(14–15):2627–2636.
- 17. Golino H.F., Amaral L.S.d.B., Duarte S.F.P., et al. Predicting increased blood pressure using machine learning. J. Obesity. 2014;2014 doi: 10.1155/2014/637635.
- 18. Gosling S.D., John O.P., et al. Do people know how they behave? Self-reported act frequencies compared with on-line codings by observers. J. Personal. Soc. Psychol. 1998 doi: 10.1037//0022-3514.74.5.1337.
- 19. Graves A., Liwicki M., Fernández S., et al. A novel connectionist system for unconstrained handwriting recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2008;31(5):855–868. doi: 10.1109/TPAMI.2008.137.
- 20. Graves A., Mohamed A.-r., Hinton G. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. 2013. Speech recognition with deep recurrent neural networks.
- 21. Graves A., Wayne G., Danihelka I. Neural turing machines. arXiv preprint. 2014 arXiv:1410.5401.
- 22. Hamer M., Chida Y. Active commuting and cardiovascular risk: A meta-analytic review. Prev. Med. 2008;46(1):9–13. doi: 10.1016/j.ypmed.2007.03.006.
- 23. Harari G.M., Lane N.D., Wang R., et al. Using smartphones to collect behavioral data in psychological science: opportunities, practical considerations, and challenges. Perspect. Psychol. Sci. 2016;11(6):838–854. doi: 10.1177/1745691616650285.
- 24. Harari G.M., Müller S.R., Aung M.S.H., et al. Smartphone sensing methods for studying behavior in everyday life. Curr. Opin. Behav. Sci. 2017;18:83–90.
- 25. Hochreiter S., Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–1780. doi: 10.1162/neco.1997.9.8.1735.
- 26. Fong Kei, W M., Ng E.Y.K., et al. SVR ensemble-based continuous blood pressure prediction using multi-channel photoplethysmogram. Comput. Biol. Med. 2019;113 doi: 10.1016/j.compbiomed.2019.103392.
- 27. Koshimizu H., Kojima R., Kario K., et al. Prediction of blood pressure variability using deep neural networks. Int. J. Med. Inf. 2020;136 doi: 10.1016/j.ijmedinf.2019.104067.
- 28. Krause N. Physical activity and cardiovascular mortality – disentangling the roles of work, fitness, and leisure. Scandinavian J. Work, Environ. Health. 2010;36(5):349–355. doi: 10.5271/sjweh.3077.
- 29. Kwong E.W.Y., Wu H., Pang G.K.-H. A prediction model of blood pressure for telemedicine. Health Informat. J. 2018;24(3):227–244. doi: 10.1177/1460458216663025.
- 30. Li X., Wu S., Wang L. Proceedings of the 26th International Conference on World Wide Web. 2017. Blood pressure prediction via recurrent models with contextual layer.
- 31. Li X., Wu S., Wang L. Proceedings of the 26th International Conference on World Wide Web. Perth, Australia; 2017. Blood pressure prediction via recurrent models with contextual layer.
- 32. Liaw A., Wiener M. Classification and regression by randomForest. R news. 2002;2(3):18–22.
- 33. Mikolov T., Kombrink S., Burget L., et al. 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2011. Extensions of recurrent neural network language model.
- 34. Montoliu R., Blom J., Gatica-Perez D. Discovering places of interest in everyday life from smartphone data. Multimedia Tools Appl. 2013;62(1):179–207.
- 35. Novaco R.W., Stokols D., Milanesi L. Objective and subjective dimensions of travel impedance as determinants of commuting stress. Am. J. Community. Psychol. 1990;18(2):231–257. doi: 10.1007/BF00931303.
- 36. O'Conor R., Benavente J.Y., Kwasny M.J., et al. Daily lts. Gerontologist. 2019;59(5):947–955. doi: 10.1093/geront/gny117.
- 37. Okura T., Enomoto D., Miyoshi K.-i., et al. The importance of walking for control of blood pressure: Proof using a telemedicine system. Telemed. e-Health. 2016;22(12):1019–1023. doi: 10.1089/tmj.2016.0008.
- 38. Paulhus D.L., Vazire S. Handbook of Research Methods In Personality Psychology. 2007. The self-report method; pp. 224–239.
- 39. Peter R., Alfredsson L., Knutsson A., et al. Does a stressful psychosocial work environment mediate the effects of shift work on cardiovascular risk factors? Scand. J. Work Environ. Health. 1999;25(4):376–381. doi: 10.5271/sjweh.448.
- 40. Pickering T.G. Mental stress as a causal factor in the development of hypertension and cardiovascular disease. Curr. Hypertens. Rep. 2001;3(3):249–254. doi: 10.1007/s11906-001-0047-1.
- 41. Pierson E., Althoff T., Leskovec J. Proceedings of the 2018 World Wide Web Conference. 2018. Modeling individual cyclic variation in human behavior.
- 42. Qin T., Shangguan W., Song G., et al. Spatio-temporal routine mining on mobile phone data. ACM Trans. Knowledge Discovery Data (TKDD) 2018;12(5):1–24.
- 43. Rabiner L.R. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE. 1989;77(2):257–286.
- 44. Rau R., Georgiades A., Fredrikson M., et al. Psychosocial work characteristics and perceived control in relation to cardiovascular rewind at night. J. Occup. Health Psychol. 2001;6(3):171–181.
- 45. Rivera A.S., Akanbi M., O'Dwyer L.C., et al. Shift work and long work hours and their association with chronic health conditions: a systematic review of systematic reviews with meta-analyses. PLoS One. 2020;15(4) doi: 10.1371/journal.pone.0231037.
- 46. Saeb S., Lattie E.G., Schueller S.M., et al. The relationship between mobile phone location sensor data and depressive symptom severity. PeerJ. 2016;4 doi: 10.7717/peerj.2537.
- 47. Saeb S., Zhang M., Karr C.J., et al. Mobile phone sensor correlates of depressive symptom severity in daily-life behavior: an exploratory study. J. Med. Internet Res. 2015;17(7):e175. doi: 10.2196/jmir.4273.
- 48. Schaeffer M.H., Street S.W., Singer J.E., et al. Effects of Control on the Stress Reactions of Commuters. J. Appl. Soc. Psychol. 1988;18(11):944–957.
- 49. Smola A.J., Schölkopf B. A tutorial on support vector regression. Statis. Comput. 2004;14(3):199–222.
- 50. Spruill T.M. Chronic Psychosocial Stress and Hypertension. Curr. Hypertens. Rep. 2010;12(1):10–16. doi: 10.1007/s11906-009-0084-8.
- 51. Stergiou G.S., Nasothimiou E.G. Hypertension: Does home telemonitoring improve hypertension management? Nat. Rev. Nephrol. 2011;7(9):493. doi: 10.1038/nrneph.2011.108.
- 52. Stutzer A., Frey B.S. Stress that doesn't pay: the commuting paradox*. Scandinavian J. Econ. 2008;110(2):339–366.
- 53. Su P., Ding X., Zhang Y., et al. 2018 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI). 2018. Long-term blood pressure prediction with deep recurrent neural networks.
- 54. Sun L., Chen X., He Z., et al. Routine pattern discovery and anomaly detection in individual travel behavior. arXiv preprint. 2020 arXiv:2004.03481.
- 55. Teixeira P.L., Wei W.-Q., Cronin R.M., et al. Evaluating electronic health record data sources and algorithmic approaches to identify hypertensive individuals. J. Am. Med. Inform. Assoc. 2016;24(1):162–171. doi: 10.1093/jamia/ocw071.
- 56. Wang W., Harari G.M., Wang R., et al. Sensing behavioral change over time: using within-person variability features from mobile sensing to predict personality traits. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2018;2(3) Article 141.
- 57. Xiong H., Peng M., Jiang X.j., et al. Time trends regarding the etiology of renal artery stenosis: 18 years’ experience from the China center for cardiovascular disease. J. Clin. Hypertension. 2018;20(9):1302–1309. doi: 10.1111/jch.13356.
- 58. Xu X., Chikersal P., Doryab A., et al. Leveraging routine behavior and contextually-filtered features for depression detection among college students. Proc. ACM Interact., Mobile, Wearable Ubiquitous Technol. 2019;3(3):1–33.
- 59. Zhang B., Ren J., Cheng Y., et al. Health data driven on continuous blood pressure prediction based on gradient boosting decision tree algorithm. IEEE Access. 2019;7:32423–32433.