Abstract
Artificial intelligence (AI) telephone follow‐up is reliable for the follow‐up and management of hypertensives. It takes less time than manual follow‐up and agrees with it to a high degree. We conducted a reliability study to evaluate the efficiency of AI telephone follow‐up in the management of hypertension. Between May 18 and June 30, 2020, 350 hypertensives managed by the Pengpu Community Health Service Center in Shanghai were recruited for follow‐up, once by AI and once by a human. The second follow‐up was conducted within 3–7 days (mean 5.5 days). The mean durations of the two calls were compared by paired t‐test, and Cohen's kappa coefficient was used to evaluate the reliability of the results between the two follow‐up visits. The mean duration of AI calls (4.15 min) was shorter than that of manual calls (5.24 min, P < .001). The answers related to the symptoms showed moderate to substantial consistency (κ: .465–.624, P < .001), and those related to the complications showed fair consistency (κ: .349, P < .001). In terms of lifestyle, the answer related to smoking showed a very high consistency (κ: .915, P < .001), while those addressing salt consumption, alcohol consumption, and exercise showed moderate to substantial consistency (κ: .402–.645, P < .001). There was moderate consistency in regular usage of medication (κ: .484, P < .001).
Keywords: artificial intelligence, hypertension, reliability, speech recognition, telephone follow‐up
1. INTRODUCTION
The China Hypertension Survey indicated that the number of adult patients with hypertension in China has exceeded 240 million, and demands for national public health services have increased dramatically in the last decade. 1 In 2009, China launched a Basic Public Health Services Program that includes management for hypertension. 2 This service is primarily borne by community healthcare centers. According to national norms, community healthcare center physicians (CHCPs) should provide seasonal follow‐up covering blood pressure level, clinical symptoms, complications, behaviors, and medication compliance in order to adjust the corresponding treatment plan. 3 , 4 In addition, a matching comprehensive disease management system has been established and connected with all of the community hospital information systems (HIS), and the associated follow‐up form pops up for CHCPs to complete when managed patients come to an outpatient clinic during a new follow‐up period.
At present, the number of managed patients with hypertension in Shanghai has exceeded 2 million, while the number of CHCPs is only around 20,000, and the personnel engaged in chronic disease management are even more limited. 5 , 6 CHCPs have limited energy, heavy workloads, and low follow‐up efficiency, and the large gap in service level leads to poor data quality. In addition, in some cases, suspicious data may be entered so that health personnel can cope with follow‐up tasks. A large number of positive symptoms are not reported in routine follow‐ups. 7 As a result, CHCPs are unable to intervene in fluctuations of a patient's condition in a timely manner, and the effect of community management falls short. 8 , 9
To overcome the aforementioned bottleneck, we aim to enhance and improve the status quo with the use of new technology. In the near future, artificial intelligence (AI) technology like image and speech recognition technology could replace manual inquiry and provide a powerful means of high‐efficiency and high‐quality information collection in large quantities. 10 , 11 , 12
Voice interaction technology has attracted much attention in the fields of symptom monitoring and follow‐up intervention. Interactive voice response (IVR) has been used to collect monitoring information for a long time and has proved efficient in many studies. 13 , 14 , 15 , 16 Compared with studies designed to use keystrokes, the use of automatic speech recognition (ASR) technology makes it easier for the listener to answer. 17 , 18 , 19 This is especially important for older adults, who make up the majority of chronic disease patients. An IVR system usually questions patients according to a pre‐set questionnaire and makes further inquiries or collects further health information based on the patient's responses. The major advantages of IVR are its convenience of use and high level of repeatability. It can take over the repetitive work of asking the same questions and filling in the blanks, and free up manpower for more meaningful tasks such as tracing patients with abnormal conditions after IVR calls. It can also ensure the data's authenticity and prevent CHCPs from filling in records incorrectly or falsifying them. By using the same questionnaire, IVR can also ensure the homogeneity of follow‐up services and avoid data quality problems caused by differences between interviewers. A rapidly increasing workload can easily be handled in telephone follow‐up by increasing the number of phone lines, whereas increasing the number of community doctors is not workable. 20
In China, unlike the large number of applications in business customer service and social communications, IVR is rarely used in the medical field. Some hospitals have begun to explore the application of AI calls in postoperative follow‐ups, primarily after the COVID‐19 outbreak, and most studies have been designed to use ASR software. 21 Some studies have been conducted for the management of chronic patients using IVR calls in western countries, but there have been rare reports of their widespread application for community follow‐ups in China. 22 , 23 , 24 , 25
In this study, an AI telephone follow‐up platform based on standard speech and voice interaction technology was used alongside manual follow‐up. We compared the call duration and the consistency of the information collected by the two types of follow‐up to provide a basis for the feasibility of AI follow‐up technology in condition monitoring.
2. MATERIALS AND METHODS
2.1. Study design
Ethical approvals were obtained from the institutional review boards at the Shanghai Municipal Center for Disease Control and Prevention. We chose the Pengpu Community Health Service Center of Jing'an District, Shanghai, for data collection. All participants were recruited by CHCPs between May 18 and June 30, 2020. We had access to ID numbers that could identify individual participants during data collection. Participants were followed up twice, once by AI and once by a human; the interval between calls was 3 to 7 days. If three call attempts went unanswered, no more calls were made. After a successful connection, the target patient or a family member who knew the patient's condition well answered the standard questionnaire of 21 questions. The results recorded by the doctor and the AI were compared for consistency, and both calls were recorded to analyze the reasons for the differences. The inclusion criteria were as follows: (a) managed hypertensives without diabetes; (b) a correct telephone number; and (c) conscious and able to speak normally.
2.2. Recruitment
A total of 350 patients were recruited from patients managed by CHCPs. Among them, 53 (15%) were first followed up by doctors and 297 (85%) by AI. In the second follow‐up, among the patients who first received the human follow‐up, 11 (21%) patients did not answer the AI calls, and 20 (48%) patients did not answer more than four questions. Among the patients who first received the AI follow‐up, 48 (16%) did not answer the human calls, and 69 (28%) did not answer more than four questions. After removing 8 cases that were answered by a family member, a total of 194 (55%) patients were finally included in the next phase of the analysis (Figure 1).
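The participant flow above can be re-derived from the reported counts. The short sketch below is only an arithmetic check, not part of the study's analysis; the variable names are illustrative, and each percentage is computed against the group that was still eligible at that step (for example, 297 of 350 is about 85%).

```python
# Counts taken from Section 2.2; names are illustrative only.
recruited = 350
first_call = {"human": 53, "ai": 297}          # who made the first call
no_answer_second = {"human": 11, "ai": 48}     # did not pick up the second call
incomplete_second = {"human": 20, "ai": 69}    # excluded for unanswered questions
answered_by_family = 8

def pct(part, whole):
    """Percentage rounded to the nearest integer, as reported in the text."""
    return round(100 * part / whole)

remaining = {}
for arm in first_call:
    reached = first_call[arm] - no_answer_second[arm]
    remaining[arm] = reached - incomplete_second[arm]
    print(
        arm,
        f"share of cohort: {pct(first_call[arm], recruited)}%",
        f"no answer: {pct(no_answer_second[arm], first_call[arm])}%",
        f"incomplete: {pct(incomplete_second[arm], reached)}%",
    )

analysed = sum(remaining.values()) - answered_by_family
print("analysed:", analysed, f"({pct(analysed, recruited)}% of recruited)")
```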
FIGURE 1.

Participant enrollment and retention. AI, artificial intelligence.
2.3. Standard questionnaire
The standard follow‐up questionnaire was derived from the indicators required in the follow‐up information form from Shanghai Community Health Management Practice—Comprehensive Prevention and Treatment of Chronic Diseases (2017 Edition), with a total of 21 questions, including the patients' symptoms (headache or dizziness; pale or flushed; blurred vision, tinnitus or nosebleed; limb numbness or edema; palpitation, chest discomfort or dyspnea; dysphoria), complications, behaviors (smoking, drinking, high salt intake, no exercise) and medicine information (irregular medicine taking, side effect). The standard questionnaire changed each indicator into a question that was easy to understand. For example, “dizziness and headache,” one of the collected symptoms, was converted to “Did you have dizziness or headache recently?” and the answer options were set to “Yes” and “No.” AI would judge the corresponding options based on the patient's answer. If the patient answered, “It seems to happen these days,” the result would be determined as “Yes.”
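To make the idea of mapping a free‐text reply onto a "Yes"/"No" option concrete, here is a deliberately simple keyword sketch. It is a hypothetical stand‐in written for English replies: the cue lists, the function name `classify_answer`, and the extra "Unclear" outcome are assumptions for illustration, and the study itself relied on trained speech and language models rather than keyword rules.

```python
# Hypothetical keyword rules; the study used trained NLU models instead.
NEGATIVE_CUES = ("no", "not", "never", "none")
POSITIVE_CUES = ("yes", "seems to happen", "sometimes", "often", "a little")

def classify_answer(reply: str) -> str:
    """Map a free-text reply to 'Yes', 'No', or 'Unclear' (illustrative only)."""
    text = reply.lower()
    words = text.replace(",", " ").replace(".", " ").split()
    # Negative cues are matched on whole words so that "know" does not count as "no".
    if any(cue in words for cue in NEGATIVE_CUES):
        return "No"
    if any(cue in text for cue in POSITIVE_CUES):
        return "Yes"
    return "Unclear"

print(classify_answer("It seems to happen these days"))
print(classify_answer("No, I have not."))
```

A rule set like this breaks down as soon as a reply mixes affirmative and negative phrases, which is one reason a learned model is the more realistic choice.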
2.4. Telephone follow‐up based on AI
AI in this study was used to analyze patients' speech data, understand their responses, and store them as standardized indicators. Speech and natural language processing are the core parts of intelligent interaction. Speech processing includes speech synthesis, speech recognition, voice pattern recognition, and other processes. Natural language processing includes natural language understanding, dialogue management, and other processes. This study used ASR technology provided by Aliyun, including modules such as pre‐processing, feature extraction, an acoustic model, a language model, and a search algorithm. The acoustic model calculates the most likely pronunciation of each Chinese character based on its finals and initials, while the language model calculates the most likely combination of different characters. A latency‐controlled bidirectional long short‐term memory (LC‐BLSTM) acoustic model and neural network language model (NNLM) were used to decode speech data and convert it to text. We used deep learning for natural language understanding (NLU), convolutional neural networks (CNN) for intention determination, and BLSTM for attribute extraction. 26 , 27
Figure 2 shows the workflow of the AI voice follow‐up. The AI follow‐up included four main parts: (a) importing patient information, (b) making phone calls, (c) asking questions and collecting information, and (d) content identification and preservation. At the beginning, lists of basic information for each follow‐up subject were imported, including each patient's ID number, telephone number, and managing doctor, to develop the follow‐up questionnaire. We adjusted phone settings such as the time the phone call was made, the interval between the two calls, and the reserved response time. After the import, the system automatically dials through the trunk line. After the call is connected, the questionnaire text is converted into speech using Text to Speech (TTS) technology, and a simulated human voice is used to communicate. During the follow‐up, self‐introduction and identification are first carried out according to the preset questionnaire, and then questions are asked. After receiving the patient's answers, ASR and NLU technology are used to extract the response elements, determine whether there are physical abnormalities, and generate standardized indicators, and the next question is selected according to the identified content.
FIGURE 2.

Workflow of the artificial intelligence (AI) voice follow‐up. ASR, automatic speech recognition; CHCP, community healthcare center physicians; NLP, natural language processing; STT, Speech to Text; TTS, Text to Speech
After completing the follow‐up, the complete recording is segmented for each question and its answer, and the patient's responses are translated into text using Speech to Text (STT) technology. Finally, all collected information, including the follow‐up date, call and answer situation, follow‐up duration, complete recording, segment recording, the translated text, and follow‐up results, are stored structurally. In addition, all abnormal indicators and 10% of normal indicators are manually checked for quality control. Once the storage is complete, CHCPs can view the records and modify them.
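The quality‐control rule described above (check every abnormal indicator plus 10% of the normal ones) could be sketched as follows. The text does not say how the 10% of normal indicators were chosen, so simple random sampling is an assumption here, as are the record layout and the function name `select_for_manual_check`.

```python
import random

def select_for_manual_check(records, normal_fraction=0.10, seed=0):
    """Return all abnormal records plus a random sample of the normal ones.

    Random sampling is an assumption; the source only states the 10% share.
    """
    abnormal = [r for r in records if r["abnormal"]]
    normal = [r for r in records if not r["abnormal"]]
    rng = random.Random(seed)              # fixed seed keeps the sample reproducible
    k = round(len(normal) * normal_fraction)
    return abnormal + rng.sample(normal, k)

# Hypothetical data: 100 indicator records, every fifth one flagged abnormal.
records = [{"id": i, "abnormal": i % 5 == 0} for i in range(100)]
checked = select_for_manual_check(records)
print(len(checked), "records queued for manual review")
```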
2.5. Statistical analyses
AI follow‐up data were extracted from the AI system in a spreadsheet format (Excel), and manual follow‐up data were filled in by CHCPs in the pre‐designed form (Excel). SPSS 26.0 software was used to import the two forms, and the data were matched, cleaned, and statistically analyzed. Composition ratio was expressed as the number of cases and percentage, and the mean length of time was expressed in minutes. A paired t‐test was used to compare the length of time between AI and human calls, and t‐tests and analysis of variance were used to compare further differences between gender, age group, and education level. Kappa was used to calculate the reliability of classification variables, and kappa values are classified as follows: below .00 (poor), .00−.20 (slight), .21−.40 (fair), .41−.60 (moderate), .61−.80 (substantial), and .81−1 (almost perfect). Test level α = .05.
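As an illustration of the two main tools named above, the sketch below computes a paired t statistic with the Python standard library and maps a kappa value onto the agreement bands listed in this section. The analysis in the study was run in SPSS; the function names and the four example call durations are hypothetical, and obtaining a P value would additionally require the t distribution (for example from a statistics package).

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two equal-length samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

def kappa_band(kappa):
    """Agreement label following the bands given in Section 2.5."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"

# Hypothetical call durations in minutes for four patients (not study data).
manual_minutes = [5.0, 6.0, 4.5, 7.0]
ai_minutes = [4.0, 4.5, 4.0, 4.5]
t_value, df = paired_t(manual_minutes, ai_minutes)
print(f"t = {t_value:.3f}, df = {df}")

for k in (0.915, 0.624, 0.349):
    print(k, kappa_band(k))
```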
3. RESULTS
3.1. Participant statistics
Table 1 shows 194 patients completed both follow‐up visits eventually. Out of these, 82 (42.3%) were male and 112 (57.7%) were female, with a mean age of 67.6 years (range 35 to 91). Among them, the 60−69 age group was the largest group with 63 patients, accounting for 32.5%. And 86 (44.3%) patients had a secondary school education.
TABLE 1.
Demographic characteristics of the study participants.
| Attributes | Distribution | N (%) |
|---|---|---|
| Gender | Male | 82 (42.3) |
| | Female | 112 (57.7) |
| Age group | ≤ 59 | 46 (23.7) |
| | 60−69 | 63 (32.5) |
| | 70−79 | 49 (25.3) |
| | 80+ | 36 (18.5) |
| Education | Primary or less | 19 (9.8) |
| | Secondary school | 86 (44.3) |
| | High school | 54 (27.8) |
| | University or higher | 35 (18.1) |
| BP * (mmHg), mean (SD) | SBP | 132.5 (11.211) |
| | DBP | 76.0 (8.289) |
| Total | | 194 (100.0) |
Abbreviations: DBP, diastolic blood pressure; SBP, systolic blood pressure; SD, standard deviation.
* Number was 175 due to missing values.
3.2. Evaluation outcomes
Table 2 shows the mean length of time (minutes) for each call. The call duration of AI calls was 4.15 ± 0.47 min, ranging from 3.15 to 5.85 min. The duration of manual calls was 5.24 ± 1.89 min, ranging from 2.43 to 13.85 min. There was no statistically significant difference in the average time of different gender, age groups, and educational level groups.
TABLE 2.
Mean length of time (minutes) for different call types.
| Attributes | Distribution | AI calls, mean (SD) | F/T‐value | P‐value | Manual calls, mean (SD) | F/T‐value | P‐value |
|---|---|---|---|---|---|---|---|
| Gender | Male | 4.16 (0.50) | 0.246 | .806 | 5.33 (2.01) | 0.572 | .568 |
| | Female | 4.14 (0.44) | | | 5.17 (1.81) | | |
| Age group | ≤ 59 | 4.10 (0.46) | 0.364 | .779 | 4.97 (1.62) | 2.591 | .054 |
| | 60−69 | 4.17 (0.52) | | | 5.11 (1.62) | | |
| | 70−79 | 4.18 (0.46) | | | 5.88 (2.26) | | |
| | 80+ | 4.12 (0.38) | | | 4.96 (1.99) | | |
| Education | Primary or less | 4.18 (0.50) | 2.173 | .093 | 5.58 (2.47) | 0.430 | .732 |
| | Secondary school | 4.14 (0.42) | | | 5.27 (1.87) | | |
| | High school | 4.25 (0.52) | | | 5.25 (1.43) | | |
| | University or higher | 3.97 (0.43) | | | 4.98 (2.34) | | |
| Total | | 4.15 (0.47) | | | 5.24 (1.89) | 8.198 | <.001 |
Abbreviation: F‐value, value of F‐test; SD, standard deviation; T‐value, value of T‐test.
Table 3 shows the report rate of each variable. For complications, the report rate in AI calls was far lower than in manual calls. Women had higher report rates than men for symptoms and complications but lower rates for behaviors. People over 70 reported more symptoms and complications, but when it came to behavior, they had lower rates of smoking and drinking.
TABLE 3.
Report rate of different indicators.
| Indicator | Both collected (n) | Total, AI | Total, human | Male, AI | Male, human | Female, AI | Female, human | ≤ 59, AI | ≤ 59, human | 60−69, AI | 60−69, human | 70−79, AI | 70−79, human | 80+, AI | 80+, human |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Symptoms | |||||||||||||||
| Headache or dizziness | 192 | 39 (20.3) | 65 (33.9) | 9 (11.1) | 16 (19.8) | 30 (27.0) | 49 (44.1) | 8 (17.4) | 12 (26.1) | 9 (14.5) | 19 (30.6) | 14 (29.2) | 24 (50.0) | 8 (22.2) | 10 (27.8) |
| Pale or flushed face | 188 | 26 (13.8) | 15 (8.0) | 9 (11.4) | 6 (7.6) | 17 (15.6) | 9 (8.3) | 7 (15.2) | 4 (8.7) | 5 (8.3) | 3 (5.0) | 10 (20.8) | 5 (10.4) | 4 (11.8) | 3 (8.8) |
| Blurred vision, tinnitus, or nosebleed | 186 | 44 (23.7) | 69 (37.1) | 18 (22.5) | 25 (31.3) | 26 (24.5) | 44 (41.5) | 7 (15.6) | 11 (24.4) | 12 (20.0) | 20 (33.3) | 15 (32.6) | 20 (43.5) | 10 (28.6) | 18 (51.4) |
| Limb numbness or edema | 191 | 43 (22.5) | 62 (32.5) | 11 (13.6) | 16 (19.8) | 32 (29.1) | 46 (41.8) | 6 (13.0) | 10 (21.7) | 12 (19.4) | 17 (27.4) | 18 (38.3) | 21 (44.7) | 7 (19.4) | 14 (38.9) |
| Palpitation, chest discomfort, or dyspnea | 190 | 38 (20.0) | 55 (28.9) | 12 (14.6) | 15 (18.3) | 26 (24.1) | 40 (37.0) | 6 (13.1) | 9 (20.0) | 13 (21.7) | 18 (30.0) | 11 (22.4) | 14 (28.6) | 8 (22.2) | 14 (38.9)* |
| Dysphoria | 185 | 50 (27.0) | 49 (26.5) | 16 (20.3) | 20 (25.3) | 34 (32.1) | 29 (27.4) | 12 (26.7) | 13 (28.9) | 19 (31.1) | 21 (34.4) | 13 (28.3) | 13 (28.3) | 6 (18.2) | 2 (6.1)* |
| Complications | 187 | 56 (29.9) | 117 (62.6) | 25 (30.5) | 44 (53.7) | 31 (29.5) | 73 (69.5) | 7 (15.6) | 12 (26.7) | 15 (24.6) | 35 (57.4) | 20 (41.7) | 43 (89.6)* | 14 (42.4) | 27 (81.8)* |
| Behavior | |||||||||||||||
| High salt intake | 174 | 26 (14.9) | 13 (7.5) | 12 (17.1) | 7 (10.0) | 14 (13.5) | 6 (5.8) | 5 (12.2) | 3 (7.3) | 8 (13.3) | 6 (10.0) | 10 (21.7) | 4 (8.7) | 3 (11.1) | 0(0.0) |
| Smoking | 191 | 38 (19.9) | 35 (18.3) | 31 (38.8) | 30 (37.5) | 7 (6.3) | 5 (4.5) | 17 (37.0) | 17 (37.0) | 13 (20.6) | 11 (17.5) | 4 (8.5) | 4 (8.5) | 4 (11.4) | 3 (8.6) |
| Drinking | 191 | 28 (14.7) | 43 (22.5) | 22 (27.5) | 33 (41.3) | 6 (5.4) | 10 (9.0) | 10 (21.7) | 17 (37.0) | 11 (18.0) | 16 (26.2) | 4 (8.3) | 5 (10.4) | 3 (8.3) | 5 (13.9) |
| No exercise | 187 | 68 (36.4) | 71 (38.0) | 27 (33.8) | 23 (28.8) | 41 (38.3) | 48 (44.9) | 13 (28.3) | 12 (26.1) | 21 (35.0) | 21 (35.0) | 17 (36.2) | 23 (48.9) | 17 (50.0) | 15 (44.1) |
| Medicine information | |||||||||||||||
| Irregular medicine taking | 190 | 7 (3.7) | 5 (2.6) | 3 (3.7) | 1 (1.2) | 4 (3.7) | 4 (3.7) | 0 (0.0) | 0 (0.0) | 4 (6.6) | 2 (3.3) | 2 (4.1) | 2 (4.1)* | 1 (2.9) | 1 (2.9) |
| Side effect | 177 | 10 (5.6) | 5 (2.8)* | 4 (5.2) | 0 (0.0) | 6 (6.0) | 5 (5.0)* | 1 (2.2) | 1 (2.2)* | 0 (0.0) | 1 (1.8) | 4 (8.9) | 0 (0.0)* | 5 (15.6) | 0 (0.0) |
* P > .05.
Table 4 presents the estimation of reliability for all assessed variables. In terms of symptoms, consistency ranged from moderate to substantial. Limb numbness or edema showed the highest consistency, which was substantial (κ = .624), while headache or dizziness (κ = .510), pale or flushed face (κ = .593), blurred vision, tinnitus or nosebleed (κ = .465), palpitation, chest discomfort, or dyspnea (κ = .507) and dysphoria (κ = .490) showed moderate consistency. The complications displayed fair consistency (κ = .349).
TABLE 4.
Reliability estimation of different indicators.
| Indicator | Sensitivity of AI (%) | Specificity of AI (%) | Kappa, total | Kappa, male | Kappa, female | Kappa, ≤ 59 | Kappa, 60−69 | Kappa, 70−79 | Kappa, 80+ |
|---|---|---|---|---|---|---|---|---|---|
| Symptoms | |||||||||
| Headache or dizziness | 84.6 | 95.3 | .510 | .394 | .524 | .495 | .466 | .500 | .557 |
| Pale or flushed face | 86.7 | 92.5 | .593 | .780 | .483 | .693 | .467 | .613 | .523 |
| Blurred vision, tinnitus, or nosebleed | 50.7 | 92.3 | .465 | .465 | .463 | .588 | .500 | .408 | .323 |
| Limb numbness or edema | 61.3 | 93.3 | .624 | .603 | .610 | .701 | .599 | .607 | .550 |
| Palpitation, chest discomfort, or dyspnea | 52.7 | 93.3 | .507 | .513 | .487 | .444 | .612 | .626 | .239 * |
| Dysphoria | 63.3 | 86.7 | .490 | .498 | .482 | .613 | .554 | .464 | −.100 * |
| Complications | 45.3 | 95.7 | .349 | .455 | .277 | .672 | .390 | .006 * | .171 * |
| Behavior | |||||||||
| High salt intake | 69.2 | 89.4 | .402 | .458 | .347 | .450 | .516 | .348 | .000 * |
| Smoking | 97.1 | 97.4 | .915 | .921 | .824 | 1.000 | .794 | 1.000 | .842 |
| Drinking | 58.1 | 98.0 | .640 | .647 | .464 | .643 | .482 | .878 | .721 |
| No exercise | 76.1 | 87.9 | .645 | .652 | .636 | .726 | .560 | .658 | .647 |
| Medicine information | |||||||||
| Irregular medicine taking | 60.0 | 97.8 | .484 | .491 | .481 | – | .651 | −.043 * | 1.000 |
| Side effect | .0 | 94.2 | −.039 * | .000 * | −.058 * | −.023 * | .000 * | −.082 * | .000 * |
Abbreviation: AI, artificial intelligence.
* P > .05.
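As a worked check of how the figures in Tables 3 and 4 fit together, the sketch below rebuilds the 2 × 2 agreement table for "pale or flushed face" and recomputes kappa, sensitivity, and specificity. It assumes the manual call is the reference standard and that the reported 86.7% sensitivity corresponds to 13 of the 15 human‐positive cases; the cell counts are therefore reconstructed, not reported, and the function name is illustrative.

```python
def agreement_stats(both_yes, ai_only, human_only, both_no):
    """Cohen's kappa plus sensitivity/specificity, taking the human call as reference."""
    n = both_yes + ai_only + human_only + both_no
    ai_yes = both_yes + ai_only
    human_yes = both_yes + human_only
    observed = (both_yes + both_no) / n
    # Chance agreement from the marginal totals of both raters.
    expected = (ai_yes * human_yes + (n - ai_yes) * (n - human_yes)) / n**2
    kappa = (observed - expected) / (1 - expected)
    sensitivity = both_yes / human_yes
    specificity = both_no / (n - human_yes)
    return kappa, sensitivity, specificity

# Marginals from Table 3 (pale or flushed face): n = 188, AI yes = 26, human yes = 15.
n, ai_yes, human_yes = 188, 26, 15
both_yes = round(0.867 * human_yes)            # reconstructed from the 86.7% sensitivity
ai_only = ai_yes - both_yes
human_only = human_yes - both_yes
both_no = n - both_yes - ai_only - human_only

kappa, sens, spec = agreement_stats(both_yes, ai_only, human_only, both_no)
print(f"kappa = {kappa:.3f}, sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

Under these assumptions the recomputed values match the table row for this indicator (κ = .593, 86.7%, 92.5%).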
For behaviors, consistency was generally higher and was almost perfect for smoking (κ = .915). Drinking and lack of exercise displayed substantial agreement (κ = .640 and κ = .645, respectively). Salt intake showed moderate agreement (κ = .402), the lowest among the lifestyle indicators. For medication, reports of irregular medicine taking showed moderate agreement (κ = .484).
Across all segmented records, 363 were inconsistent between the AI and manual follow‐up. After listening to all of these recordings again, we classified the reasons for the inconsistencies into six categories: inconsistencies in patients' answers (n = 278, 76.6%), inconsistencies between the AI's and the doctor's judgment when the answers were consistent (n = 11, 3.0%), AI identification error (n = 41, 11.3%), doctor filling error (n = 23, 6.3%), incomplete recording (n = 5, 1.4%), and ambiguity of the question (n = 5, 1.4%).
4. DISCUSSION
4.1. Principal results
After the outbreak of COVID‐19, more community manpower was invested in prevention and control work, which affected the management of chronic diseases. Therefore, providing intelligent follow‐up tools is one effective way to address the shortage of medical resources. In this pilot study, the AI follow‐up platform helped doctors complete several parts of follow‐up, including plan formulation, phone calls, information collection, structured storage, and data uploading.
4.1.1. Duration time
The average follow‐up time of AI was shorter than that of doctors. The duration of AI follow‐up for different patients is relatively stable compared with that of doctors; the same result was found in Bian's study. 28 During follow‐up, doctors often need to answer some questions that have nothing to do with the disease, especially for older people.
In the AI follow‐up, we can control the duration of the call in several ways. For example, we can adjust the reserved response time for different groups of people, because older adults may react and think more slowly. The collection interval of individual questions can also be customized. Weight may fluctuate greatly in a short time; thus, its collection frequency can be set to once every 3 months. Height, however, can be collected once a year to reduce patients' irritation at answering repeatedly. In this pilot, 10 telephone lines were used, and one line was able to contact an average of 98 patients a day, while a doctor could call a maximum of about 50 patients a day without doing any other work.
From the doctors’ point of view, spending too much time filling out basic and tedious forms is likely one of the reasons for doctors' lack of enthusiasm. In addition, the existing data quality analysis also suggested that the content filled in by doctors was not entirely truthful. Therefore, saving follow‐up time is an important means of improving the efficiency of follow‐ups.
We also noticed that the voice follow‐up lost some patients' information, and these patients' comprehensive answers were very long. Even if there was important information, the capture efficiency of the manual follow‐up was still very low, and it was not suitable for chronic disease follow‐ups of large populations in terms of daily work. A question could be added at the end of the call asking whether the patient requires a doctor or assistant to call back.
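The capacity figures in this paragraph translate into a simple back‐of‐the‐envelope comparison, sketched below. The panel size of 5,000 patients is a hypothetical value chosen only for illustration, and the calculation ignores unanswered calls and repeat attempts.

```python
import math

# Figures reported for the pilot.
lines = 10
patients_per_line_per_day = 98
patients_per_doctor_per_day = 50   # stated maximum with no other duties

ai_daily_capacity = lines * patients_per_line_per_day
doctor_equivalents = ai_daily_capacity / patients_per_doctor_per_day

def days_to_cover(population, daily_capacity):
    """Whole working days needed to call every patient once."""
    return math.ceil(population / daily_capacity)

panel = 5000  # hypothetical number of managed patients, not from the study
print("AI capacity per day:", ai_daily_capacity)
print("Equivalent full-time doctors:", doctor_equivalents)
print("Days for AI lines:", days_to_cover(panel, ai_daily_capacity))
print("Days for one doctor:", days_to_cover(panel, patients_per_doctor_per_day))
```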
4.1.2. Measure of agreement
We divided the collected indicators into four categories: symptoms, complications, lifestyle, and medication. The consistency of symptom indicators was mostly moderate, except for limb numbness or edema, which was substantial. The consistency of lifestyle indicators, however, was mostly high, especially for smoking. This is similar to the results of many studies and shows that smoking and drinking habits are reported highly consistently across different types of collection. 29 Salt intake consistency was only moderate, indicating that reports of salt intake were also affected by subjective feeling. As for medication, the consistency of irregular medication taking was moderate, while the report rate of adverse drug reactions was close to 0%, which affected the kappa statistics.
By comparing the reporting rate of AI follow‐up with the data from Shanghai Non‐communicable and Chronic Disease (NCD) Surveillance, we found that this study's smoking and drinking rates are close to those of the surveillance, which means AI can achieve an excellent reporting rate. There is a large difference in the reporting rate of high salt intake, indicating that many people have an inaccurate perception of their salt intake and underestimate it compared with the surveillance. 30 The surveillance estimated salt intake by asking, in a face‐to‐face interview, how much salt a household uses in a month. This study also found a high rate of insufficient exercise, which may be related to the COVID‐19 outbreak, which caused many people to reduce outdoor exercise.
4.1.3. Causes of inconsistency
There were 363 inconsistencies between AI and manual calls in all, and 278 (76.6%) of them were caused by the patients’ different answers, especially responses to questions about symptoms. For example, one patient said he had a headache when the AI called, but said he had no headache when a human called. Most of these differences were caused by fluctuations over time. Second, 41 (11.3%) were errors due to AI recognition. Because of technical limitations, it was difficult for AI to be completely accurate in some cases, such as when the logic of a patient's answer was complex. Sometimes one sentence contained both an affirmative and a negative statement. For example, one person said that he had eaten a diet heavy in salt in the past but was on a sodium‐restricted diet at the time of the call. In another case, AI identified salt intake incorrectly because the pronunciations of “heavy” and “moderate” in Mandarin are very similar; if the patient has a heavy accent, these words may be difficult to distinguish even for doctors. Third, when the answers were consistent, AI and doctors made different judgments when words like “rarely” occurred, such as rarely drinking. Some doctors would judge it as no drinking. Fourth, the short time between two questions also led to 5 (1.4%) errors in judgment, because the next question could start playing before the patient had finished answering, resulting in the loss of information. The last reason, accounting for 5 (1.4%) errors, was the ambiguity of the question. When asking about comorbidities, we wanted to collect new diseases within the past 3 months, but the question may not have emphasized the time frame, so some patients told us about comorbidities that had existed for longer than 3 months.
4.2. Comparison with prior work
In the short term, the expected improvement in follow‐up efficiency was certainly achieved, but in the long term, more work is required to ensure the operation of the overall management process. 31 The most important step is to ensure the learning ability of the AI through continuous recognition practice and improvement of the questionnaire, in order to maintain and improve the accuracy of recognition. In addition, we are developing further confirmation procedures for nonspecific symptoms, such as asking whether there are other eye diseases, such as glaucoma, after blurred vision is reported.
The second way is to improve the follow‐up process. This includes an update of the questionnaire, such as adjusting the number of questions or the order of inquiry. In addition, we will consider the classification of the characteristics of the patients who answered the phone. For patients who refuse to answer the phone or whose answers are very long, telephone follow‐up may not be suitable, and a traditional manual follow‐up can still be completed instead. A short message bank should be established, and initial brief feedback should be provided to patients after follow‐up each time. According to the grade of positive symptoms, different response levels are established. If a patient's blood pressure is too high, doctors are directly reminded to address it, without waiting for doctors to arrange follow‐up plans.
The third improvement is to have a good connection with the chronic disease management system. We are in the process of data docking between the voice follow‐up platform and the chronic disease management system, and relevant follow‐up data will be transmitted to the chronic disease management system every day. If patients visit the outpatient facility for follow‐up, the doctor can see directly in the pop‐up form content the abnormal symptoms of an early follow‐up reminder, and doctors can ask more thorough questions based on the exceptional, saved content in the form. If the patients do not go to the clinic during the follow‐up period, a community doctor or assistant can also check the temporary form completed by the AI follow‐up on the chronic disease management platform, and call patients for further inquiries based on the follow‐up contents.
4.3. Limitations
There are still some shortcomings to this study. First, considering the initial funds and equipment investment, only one community center was selected for this pilot application, and all patients were recruited by themselves, which may have biased the selection. Second, the order of the two follow‐up methods differed between patients, which may produce bias, but removing the patients whom doctors called first would make the sample size too small to get meaningful results. Ideally, all first calls would be randomized to either AI or human to look at the average effect of AI compared with humans. However, because the response rate to AI follow‐up was low when the doctor followed up first, and because doctors' working time was limited during the epidemic, we conducted the AI follow‐up first and the doctors' follow‐up later for most patients. The low response rate indicates limited cooperation with and enthusiasm for AI follow‐up at present. More attention should be paid to improving the response rate of telephone follow‐up in future studies.
5. CONCLUSIONS
Relying on the daily work of grassroots public health services, community doctors must complete relevant follow‐up work, which is the basis for us to use this voice follow‐up technology, aiming to reduce the workload of doctors, improve efficiency, improve data quality, and thus improve the effect of chronic disease management. This study suggests that the AI follow‐up technology is feasible in community chronic disease follow‐up work by saving time and improving follow‐up efficiency, and it has good application prospects.
AUTHOR CONTRIBUTIONS
Concept and design: Junling Gao, Yan Shi, and Minna Cheng. Acquisition of data: Chen Chen, Lin Zhang, Jing Shen, Xin Zhang, and Dongsheng Ren. Analysis and interpretation of data: Siyuan Wang, Mengyun Sui, Junling Gao, and Qinping Yang. Drafting of the manuscript: Siyuan Wang, Mengyun Sui, and Qinping Yang. Critical revision of the manuscript for important intellectual content: Junling Gao, Jing Shen, and Chen Chen. Statistical analysis: Siyuan Wang, Yuheng Wang, Mengyun Sui, and Qinping Yang. Provision of study materials or patients: Chen Chen, Lin Zhang, Xin Zhang, and Dongsheng Ren. Obtaining funding: Minna Cheng, Yan Shi, Yuheng Wang, and Chen Chen. Administrative, technical, or logistic support: Chen Chen, Lin Zhang, Jing Shen, Xin Zhang, and Dongsheng Ren. Supervision: Minna Cheng, Yan Shi, Yuheng Wang, and Chen Chen.
ACKNOWLEDGMENTS
We would like to thank Yajuan Chen, Yan Zhao, Tingting Wu, Lin Lu, Zhiwei Liang, Yabei Fang, Xiaoye Pan, and Yujing Wu from Pengpu Community Health Service Center for information collection. This study was funded by the Shanghai Municipal Health Commission (GWVI‐8, GWVI‐11.1‐22, 20214Y0488, 20234Y0304).
Wang S, Shi Y, Sui M, et al. Telephone follow‐up based on artificial intelligence technology among hypertension patients: Reliability study. J Clin Hypertens. 2024;26:656‐664. 10.1111/jch.14823
Siyuan Wang, Yan Shi, and Mengyun Sui contributed equally to this work and share first authorship.
Contributor Information
Junling Gao, Email: jlgao@fudan.edu.cn.
Minna Cheng, Email: chengminna@scdc.sh.cn.
REFERENCES
- 1. Wang Z, Chen Z, Zhang L, et al. Status of hypertension in China: results from the China Hypertension Survey, 2012–2015. Circulation. 2018;137:2344‐2356.
- 2. Li X, Lu J, Hu S, et al. The primary health‐care system in China. Lancet. 2017;390:2584‐2594.
- 3. Qin J, Zhang Y, Fridman M, et al. The role of the Basic Public Health Service program in the control of hypertension in China: results from a cross‐sectional health service interview survey. PLoS One. 2021;16:e0217185.
- 4. Zheng X, Xiao F, Li R, et al. The effectiveness of hypertension management in China: a community‐based intervention study. Prim Health Care Res Dev. 2019;20:e111.
- 5. Xu JY, Yan QH, Yao HH, et al. Analysis on management of patients with hypertension in communities in Shanghai (in Chinese). Shanghai J Prev Med. 2016;28:442‐447.
- 6. National Health Commission. China health statistics yearbook 2019 (in Chinese). China Union Medical College Press; 2019.
- 7. Li X, Krumholz HM, Yip W, et al. Quality of primary health care in China: challenges and recommendations. Lancet. 2020;395:1802‐1812.
- 8. Song ZW, Zhang M, Zhang X, et al. Study on community health management and control of hypertension in patients aged 35 years and above in China, 2015. Zhonghua Liu Xing Bing Xue Za Zhi. 2021;42:2001‐2009.
- 9. Fang G, Yang D, Wang L, Wang Z, Liang Y, Yang J. Experiences and challenges of implementing universal health coverage with China's national basic public health service program: literature review, regression analysis, and insider interviews. JMIR Public Health Surveill. 2022;8:e31289.
- 10. Meng L, Wang GY, Xia HH, et al. Clinical evaluation of artificial intelligence system based on fundus photograph in diabetic retinopathy screening. Chin J Exp Ophthalmol. 2019;37:663‐668.
- 11. Posadzki P, Mastellos N, Ryan R, et al. Automated telephone communication systems for preventive healthcare and management of long‐term conditions. Cochrane Database Syst Rev. 2016;12:CD009921.
- 12. Rodriguez‐Ruiz A, Lang K, Gubern‐Merida A, et al. Stand‐alone artificial intelligence for breast cancer detection in mammography: comparison with 101 radiologists. J Natl Cancer Inst. 2019;111:916‐922.
- 13. Schiff GD, Klinger E, Salazar A, et al. Screening for adverse drug events: a randomized trial of automated calls coupled with phone‐based pharmacist counseling. J Gen Intern Med. 2019;34:285‐292.
- 14. Holland SM, Shuk E, Burkhalter J, Shouery M, Li Y, Hay JL. Feasibility and acceptability of using an IVRS to assess decision making about sun protection. Psychooncology. 2020;29:156‐163.
- 15. Steinberg DM, Levine EL, Lane I, et al. Adherence to self‐monitoring via interactive voice response technology in an eHealth intervention targeting weight gain prevention among Black women: randomized controlled trial. J Med Internet Res. 2014;16:e114.
- 16. Pariyo GW, Greenleaf AR, Gibson DG, et al. Does mobile phone survey method matter? Reliability of computer‐assisted telephone interviews and interactive voice response non‐communicable diseases risk factor surveys in low and middle income countries. PLoS One. 2019;14:e0214450.
- 17. Forster AJ, Boyle L, Shojania KG, Feasby TE, Walraven CV. Identifying patients with post‐discharge care problems using an interactive voice response system. J Gen Intern Med. 2009;24:520‐525.
- 18. Pichayapinyo P, Saslow LR, Aikens JE, et al. Feasibility study of automated interactive voice response telephone calls with community health nurse follow‐up to improve glycaemic control in patients with type 2 diabetes. Int J Nurs Pract. 2019;25:e12781.
- 19. Bin Sawad A, Narayan B, Alnefaie A, et al. A systematic review on healthcare artificial intelligent conversational agents for chronic conditions. Sensors (Basel). 2022;22.
- 20. Lieberman G, Naylor MR. Interactive voice response technology for symptom monitoring and as an adjunct to the treatment of chronic pain. Transl Behav Med. 2012;2:93‐101.
- 21. Yilin F, Jidong Z, Hao J, Lindi W, Xiaowen H, Daxiang W. Application of artificial intelligence phonetic system in postoperative follow‐up of day surgery patients (in Chinese). West Chin Med J. 2019;34:164‐167.
- 22. Piette JD, Rosland AM, Marinec NS, Striplin D, Bernstein SJ, Silveira MJ. Engagement with automated patient monitoring and self‐management support calls: experience with a thousand chronically ill patients. Med Care. 2013;51:216‐223.
- 23. Piette JD, Marinec N, Gallegos‐Cabriales EC, et al. Spanish‐speaking patients' engagement in interactive voice response (IVR) support calls for chronic disease self‐management: data from three countries. J Telemed Telecare. 2013;19:89‐94.
- 24. Wang SY, Zhou F, Gao JL, et al. The application of artificial intelligence telephone follow‐up in the management of hypertension follow‐up (in Chinese). Chin J Prev Contr Chron Dis. 2021;29:817‐820.
- 25. Wang Y, Chen Y, Wu H, Wei XJ, Gao WJ, Lu XL. Application effect of an intelligent outbound call platform in Fangzhuang Community Health Center (in Chinese). Chin Gen Pract. 2021;24:2062‐2067.
- 26. Zhang Y, Chen GG, Yu D, Yao KS. Highway long short‐term memory RNNs for distant speech recognition. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 20‐25 March 2016. IEEE; 2016:5755‐5759.
- 27. Xue SF, Yan ZJ. Improving latency‐controlled BLSTM acoustic models for online speech recognition. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, 05‐09 March 2017. IEEE; 2017.
- 28. Bian Y, Xiang Y, Tong B, Feng B, Weng X. Artificial intelligence‐assisted system in postoperative follow‐up of orthopedic patients: exploratory quantitative and qualitative study. J Med Internet Res. 2020;22:e16896.
- 29. Dal Grande E, Fullerton S, Taylor AW. Reliability of self‐reported health risk factors and chronic conditions questions collected using the telephone in South Australia, Australia. BMC Med Res Methodol. 2012;12:108.
- 30. Xu HF, Tang HY, Yuan Y, Wu F, Lu Y, Wang YH. Risk characteristics of hypertension in high‐risk population: an analysis based on the surveillance data of chronic diseases in Shanghai (in Chinese). Shanghai J Prev Med. 2021;33.
- 31. Davenport T, Kalakota R. The potential for artificial intelligence in healthcare. Future Healthc J. 2019;6:94‐98.
