Abstract
Background
The purpose of this study was to determine the inter‐ and intra‐rater reliability of a detailed evaluation of cognitive function conducted by an assistive robot for older adults.
Methods
We investigated the inter‐rater and test–retest reliability. Neurobehavioral Cognitive Status Examination was conducted twice for each participant using an assistive robot and the examiner respectively (Experiment 1). The order of these two tests was randomly selected and the interval between them was 1 week. In Experiment 2, we investigated the test–retest reliability of the first robot test and this additional robot test was conducted approximately 6 weeks after Experiment 1.
Results
Fifty‐one participants (13 men and 38 women, mean age: 80.5 ± 5.6 years) went through Experiment 1 and 29 of those (eight men and 21 women, mean age: 80.4 ± 4.8 years) completed Experiment 2. In Experiment 1, the intraclass correlation coefficient (ICC) for orientation was in the high range and its Cronbach's α was 0.919, rated as excellent internal consistency. On the other hand, other items did not show positive results. In Experiment 2, the ICCs for orientation, attention, and repetition were in the adequate range, while those for the other items were in the marginal or low range.
Conclusions
The orientation items could be utilised to detect the initial symptoms of dementia. In the future, as robot functions become more advanced, a partner robot might be able to measure the symptoms and severity of dementia.
Keywords: assistive technology, cognitive assessment, older adults, robot
INTRODUCTION
The risks of mild cognitive impairment (MCI) and dementia are increasing in the ageing population. MCI often progresses to Alzheimer's disease (AD), with the incidence of MCI ranging from 40% to 75%, depending on the population in each study. 1 , 2 According to the World Health Organization (2022), dementia is one of the major causes of disability and dependency among older adults worldwide. 2 In Japan, more than 6 million older adults are certified for support needs in the long‐term care insurance system, and the number of individuals needing long‐term care has dramatically increased in recent years. 3 Therefore, it is important for older adults to maintain independent living for as long as possible. 4
In recent years, assistive technology (AT), aimed at maintaining the independence of older people, has been developed. In addition, studies have reported that ATs have a high potential for promoting independence when properly used, 4 , 5 , 6 reducing caregiver burden, 5 , 7 and increasing quality of life. 8 , 9 Recent studies have focused on robots, specifically communication robots and partner robots. 4 , 10 , 11 , 12 In previous studies, we confirmed the effectiveness of partner robots from various perspectives. We reported that older adults who received information about their schedule from a communication robot understood their daily tasks better and were able to effectively perform their daily activities. 13 In addition, a 1 month‐long intervention experiment showed that the number of free interactions increased, with the robots gradually becoming irreplaceable. 11 For robots to be effectively utilised, their introduction should also be understandable, depending on the type of dementia or the characteristics of cognitive dysfunction. For instance, step‐by‐step instructions are clearer for older people with moderate dementia, and critical information should be repeated for those with attention dysfunction. 10 Thus, robots need to adapt to the cognitive characteristics, lifestyles, and environmental factors of each older person.
However, evidence on the effectiveness and adaptability of assistive robots is limited. Dementia has multiple causes, such as AD and dementia with Lewy bodies, 14 and is characterised by symptoms such as memory deficits, attention deficits and disorientation. 15 , 16 In addition, the severity of symptoms easily changes. 17 Consequently, older people could use assistive robots for unmet needs, but not for long periods, with positive effects. 17 , 18 Therefore, robot instructions must be understandable based on individual cognitive characteristics and levels.
To assess the cognitive function of each older person while using the robot, we had the robot conduct cognitive tests. In our previous study, we clarified that the Japanese version of the Telephone Interview for Cognitive Status (TICS) conducted by assistive robots is reliable compared to assessments by occupational therapists. 19 It showed good reliability and was acceptable to participants; however, the screening test could not capture detailed characteristics of cognitive function.
In the present study, we focused on the Japanese version of the Neurobehavioral Cognitive Status Examination (COGNISTAT), 20 which is a detailed assessment tool used to identify individuals' cognitive impairments. 20 It assesses 10 domains of cognitive functioning: orientation, attention, language comprehension, repetition, naming, construction, memory, calculations, reasoning similarities, and judgement. The study targeted people with cognitive disabilities such as neurocognitive disorders, dementia, MCI, depression, schizophrenia, and alcohol dependence syndrome. Raters such as occupational therapists and psychologists must be proficient in psychological testing and interviews.
This study aimed to investigate the inter‐ and intra‐rater reliabilities of COGNISTAT. In terms of inter‐rater reliability, we compared two types of raters: assistive robots and occupational therapists. We also verified the intra‐rater reliability of the assistive robot with test–retest reliability. Similar to the TICS, we hypothesised that the evaluation findings would be equally reliable regardless of whether the rater was a human or a robot.
METHODS
Study design
This study followed the recommendations of the Guidelines for Reporting Reliability and Agreement Studies (GRRAS) 21 and the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE). 22 We explained the details of the method in accordance with these guidelines.
Subjects
Participants were recruited from nursing homes for older adults and day care facilities in urban areas of Japan. Enrolment began in November 2015 and was completed in August 2016. We recruited as many participants as possible because this was an exploratory and preliminary study. The inclusion criterion was adults aged 65 years or older who could communicate verbally. The exclusion criteria included hearing impairment, visual impairment, diagnosis or suspicion of a psychiatric disorder, and loss of consciousness. All participants provided written informed consent prior to data collection. In addition, participants received a cash voucher worth ¥1000 per test for their participation.
Robot and cognitive tests used in experiments
The present study used a communication robot called ‘PaPeRo’ (Fig. 1). Inoue et al., stated that robots should be equipped for speech recognition, speech synthesis, facial image recognition, autonomous mobility, head motion, light indication, and tactile sensors. Its key functions include providing the required information to users, attracting attention, prompting actions, and communicating. 4 PaPeRo was developed as a research platform.
Figure 1.

The information support robot, ‘PaPeRo’ (NEC). Colours are visible in the online version of the article; 4 https://doi.org/10.3233/TAD‐120357.
In the experiments (explained in detail in the procedures section), PaPeRo conducted the COGNISTAT 20 face‐to‐face using synthetic voices. As mentioned above, COGNISTAT scores 10 domains of cognitive functioning: orientation, attention, language comprehension, repetition, naming, construction, memory, calculations, reasoning similarities, and judgement. In the present study, PaPeRo administered six of these 10 domains (orientation, attention, repetition, calculations, reasoning similarities, and judgement) because the other four domains require assessment materials that PaPeRo could not manage.
Procedures
The experiments followed the same protocol described in our previous report except for the assessment tool. 19 In Experiment 1, we investigated the internal consistency and inter‐rater reliability. COGNISTAT was conducted twice for each participant, once by PaPeRo (robot test 1) and once by a human rater (human test). The order of these two tests was randomly selected and counterbalanced across participants, and the interval between them was approximately 1 week. The human raters were two trained occupational therapists, both women in their 30s. In Experiment 2, we investigated the intra‐rater reliability of the robot test. This additional test using PaPeRo (robot test 2) was conducted approximately 6 weeks after the second test of Experiment 1. Two human raters independently calculated the scores and checked them for consistency. Repeated administration may have produced a learning effect.
Before starting robot test 1, the participants were briefed on PaPeRo and the experimental procedures. First, the examiner explained that participants should respond to PaPeRo when its ears turn red, indicating that they have entered the preparatory mode for voice recognition. Participants were required to listen carefully to the questions and answer them, with instructions not to ask the examiner any questions except in emergencies.
In addition, the examiner developed a fixed scenario for the robot tests. In this scenario, PaPeRo introduced itself to the participants and explained COGNISTAT. Then, PaPeRo asked the questions from each item in the COGNISTAT, nodded to the participants' responses, and finished the assessment with a closing greeting ‘Thank you for chatting, and have a nice day’. The timing of the PaPeRo's speech was controlled by the examiner, who was sitting in another room and monitoring the participants' responses via video because PaPeRo could not automatically move.
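The randomised, counterbalanced order of the robot and human tests in Experiment 1 can be illustrated with a short sketch. The article does not describe how the allocation was generated, so everything here is an assumption made for illustration: the function name `assign_test_order`, the labels `robot_first` and `human_first`, and the fixed seed are all hypothetical.

```python
import random


def assign_test_order(participant_ids, seed=0):
    """Return a dict mapping each participant to a test order.

    Hypothetical allocation scheme (not taken from the article):
    build a pool in which the two orders appear equally often
    (differing by one when the group size is odd), then shuffle it.
    """
    orders = ["robot_first", "human_first"]
    n = len(participant_ids)
    # Balanced pool: alternate the two orders, so counts differ by at most 1.
    pool = [orders[i % 2] for i in range(n)]
    # A seeded generator keeps the allocation reproducible.
    rng = random.Random(seed)
    rng.shuffle(pool)
    return dict(zip(participant_ids, pool))


# Example with 51 participant IDs, the size of the Experiment 1 sample.
allocation = assign_test_order(list(range(1, 52)), seed=1)
n_robot_first = sum(order == "robot_first" for order in allocation.values())
print(n_robot_first, len(allocation) - n_robot_first)
```

Shuffling a balanced pool, rather than tossing an independent coin for each participant, is one simple way to keep the two orders equally represented while leaving each individual assignment unpredictable.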
Statistical analysis
Cronbach's α was calculated to measure internal consistency, with values >0.80 considered excellent, 0.75–0.79 good, 0.70–0.74 moderate, 0.65–0.69 fair and ≦0.65 unsatisfactory. The intraclass correlation coefficient (ICC) between the human and robot tests, and between the two instances of the robot test, along with 95% confidence intervals (CI), was calculated to measure inter‐ and intra‐rater reliability respectively. An ICC >0.90 was considered very high, 0.80–0.89 high, 0.70–0.79 adequate, 0.60–0.69 marginal and ≦0.59 low. EZR (Saitama Medical Centre, Jichi Medical University, Saitama, Japan) was used for all analyses. 23
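To make the two statistics concrete, the following is a minimal sketch of how Cronbach's α and a one‐way random‐effects ICC can be computed for paired scores. It is not the EZR implementation used in the study. The formulas are the standard textbook ones (ICC(1,1) and ICC(1,k) in the Shrout and Fleiss notation), confidence intervals are omitted, and the function names and example scores are invented for illustration.

```python
from statistics import mean, variance


def cronbach_alpha(scores):
    """Cronbach's alpha for rows of k scores (one row per participant)."""
    k = len(scores[0])
    columns = list(zip(*scores))
    # Sum of the sample variances of each column (each test or rater).
    item_variance_sum = sum(variance(col) for col in columns)
    # Sample variance of each participant's total score.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def icc_one_way(scores):
    """Return (ICC(1,1), ICC(1,k)) from a one-way random-effects ANOVA."""
    n = len(scores)
    k = len(scores[0])
    grand_mean = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    # Between-participants mean square.
    bms = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-participants mean square.
    wms = sum(
        (x - m) ** 2 for row, m in zip(scores, row_means) for x in row
    ) / (n * (k - 1))
    icc_single = (bms - wms) / (bms + (k - 1) * wms)
    icc_average = (bms - wms) / bms
    return icc_single, icc_average


# Invented scores for five participants: (human test, robot test).
example_scores = [(10, 10), (8, 9), (6, 6), (9, 8), (4, 5)]
print(cronbach_alpha(example_scores))
print(icc_one_way(example_scores))
```

Each row holds one participant's two scores on a single domain, so the same functions apply to the human versus robot comparison and to the comparison of the two robot tests.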
RESULTS
Participants
A total of 66 participants were recruited for the study, and 15 participants were excluded. Reasons for exclusion included two participants suspected of having severe hearing impairment, four with deconditioning on the first test day, and nine who declined to participate. Finally, 51 participants (13 men and 38 women, mean age: 80.5 ± 5.6 (SD) years, range: 68–91 years) went through Experiment 1, and 29 of these (eight men and 21 women, mean age: 80.4 ± 4.8 (SD) years, range: 72–90 years) completed Experiment 2 (see Table 1). None of the participants complained about hearing the instructions from PaPeRo.
Table 1.
The number of participants, gender, and age in each experiment
| | n (%) | Men, n (%) | Women, n (%) | Age, mean (SD) |
|---|---|---|---|---|
| Experiment 1 | 51 (63.8) | 13 (25.5) | 38 (74.5) | 80.5 (5.6) |
| Experiment 2 | 29 (36.2) | 8 (27.6) | 21 (72.4) | 80.4 (4.8) |
In terms of raters, two trained occupational therapists conducted the human test and they also rated the robot test independently according to the COGNISTAT manual. The study protocol was explained to the participants and their families and was presented in a written document. One family member signed an informed consent form in case the participant was diagnosed with dementia.
Experiment 1
The median scores of COGNISTAT for each item, Cronbach's α and the ICC are shown in Table 2. The median score for orientation was 10.0 in the human test and robot test 1. The ICC for orientation, an alternate form of reliability, was 0.851 (95% CI 0.753–0.912), which was in the ‘high’ range. Similarly, Cronbach's α for orientation was 0.919, rated as ‘excellent’ internal consistency. On the other hand, the ICC for reasoning similarities was 0.672 (95% CI 0.491–0.799), which was in the ‘marginal’ range, although Cronbach's α was 0.809, giving ‘excellent’ internal consistency. In addition, the ICCs for attention, repetition, calculations, and judgement were in the ‘low’ range (≦0.59), and their Cronbach's α values were 0.65 or less, rated as ‘unsatisfactory’ internal consistency, except for calculations, whose α of 0.745 was rated as ‘moderate’ (see Table 2).
Table 2.
Results of COGNISTAT in the human test and robot test 1
| COGNISTAT domain | Human test, median (IQR) | Robot test 1, median (IQR) | Cronbach's α | ICC (1, 2) [95% CI] |
|---|---|---|---|---|
| Orientation | 10.0 (9.5–10.0) | 10.0 (10.0–10.0) | 0.919 | 0.851 [0.753–0.912] |
| Attention | 10.0 (6.0–10.0) | 3.0 (0.0–8.0) | 0.608 | 0.261 [−0.078–0.544] |
| Repetition | 11.0 (9.5–11.0) | 8.0 (6.0–11.0) | 0.634 | 0.345 [−0.001–0.600] |
| Calculations | 10.0 (10.0–10.0) | 10.0 (10.0–10.0) | 0.745 | 0.578 [0.361–0.735] |
| Reasoning Similarities | 9.0 (9.0–10.0) | 10.0 (9.0–10.0) | 0.809 | 0.672 [0.491–0.799] |
| Judgement | 10.0 (10.0–11.0) | 10.0 (9.0–11.0) | 0.658 | 0.469 [0.229–0.657] |
Abbreviations: CI, confidence interval; COGNISTAT, Neurobehavioral Cognitive Status Examination; ICC, intraclass correlation coefficient; IQR, interquartile range.
Experiment 2
In Experiment 2, the scores for each domain on COGNISTAT and the ICCs for the robot tests are presented in Table 3. The ICCs for orientation, attention, and repetition were 0.710, 0.752, and 0.754, respectively, which fall within the ‘adequate’ range. In contrast, the ICCs for reasoning similarities and judgement were in the ‘low’ range, with values of 0.499 and 0.352 respectively (see Table 3). The ICC for calculations was −0.0481.
Table 3.
Results of COGNISTAT in robot test 1 and 2 (test–retest reliability)
| COGNISTAT domain | Robot test 1, median (IQR) | Robot test 2, median (IQR) | ICC (1) [95% CI] |
|---|---|---|---|
| Orientation | 10.0 (10.0–10.0) | 10.0 (10.0–10.0) | 0.710 [0.473–0.852] |
| Attention | 6.0 (0.0–8.0) | 6.0 (0.0–10.0) | 0.752 [0.539–0.875] |
| Repetition | 11.0 (8.0–11.0) | 11.0 (8.0–11.0) | 0.754 [0.543–0.876] |
| Calculations | 10.0 (10.0–10.0) | 10.0 (10.0–10.0) | −0.0481 [−0.398–0.316] |
| Reasoning Similarities | 10.0 (9.0–10.0) | 10.0 (9.0–11.0) | 0.499 [0.179–0.724] |
| Judgement | 10.0 (9.0–11.0) | 11.0 (10.0–12.0) | 0.352 [0.001–0.627] |
Abbreviations: CI, confidence interval; COGNISTAT, Neurobehavioral Cognitive Status Examination; ICC, intraclass correlation coefficient; IQR, interquartile range.
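As a cross‐check, the ICC point estimates reported in Tables 2 and 3 can be mapped onto the interpretation bands given in the statistical analysis section. This sketch is illustrative only. It assumes that the ‘high’ band is 0.80–0.89 and that a value of exactly 0.90 counts as ‘very high’; the function name `classify_icc` and the dictionary names are hypothetical.

```python
def classify_icc(icc):
    """Map an ICC point estimate to the article's interpretation bands.

    Assumptions: 'high' is read as 0.80-0.89, and the small gaps between
    the published bands are closed by using lower bounds only.
    """
    if icc >= 0.90:
        return "very high"
    if icc >= 0.80:
        return "high"
    if icc >= 0.70:
        return "adequate"
    if icc >= 0.60:
        return "marginal"
    return "low"


# Point estimates copied from Table 2 (human test vs robot test 1).
TABLE_2_ICC = {
    "Orientation": 0.851,
    "Attention": 0.261,
    "Repetition": 0.345,
    "Calculations": 0.578,
    "Reasoning Similarities": 0.672,
    "Judgement": 0.469,
}

# Point estimates copied from Table 3 (robot test 1 vs robot test 2).
TABLE_3_ICC = {
    "Orientation": 0.710,
    "Attention": 0.752,
    "Repetition": 0.754,
    "Calculations": -0.0481,
    "Reasoning Similarities": 0.499,
    "Judgement": 0.352,
}

for label, table in (("Table 2", TABLE_2_ICC), ("Table 3", TABLE_3_ICC)):
    for domain, icc in table.items():
        print(f"{label} | {domain}: {icc:+.3f} -> {classify_icc(icc)}")
```

Only the point estimates are classified here; the 95% confidence intervals in the tables are wide for several domains, so a band assigned this way should be read with that uncertainty in mind.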
DISCUSSION
The major findings of this study were that, when comparing the human and robot tests, only orientation showed both excellent internal consistency and a high ICC. Most other items did not show good results, except for reasoning similarities, which exhibited excellent internal consistency. In the test–retest comparison of the robot tests, orientation, attention and repetition showed adequate ICCs.
First, in our previous study, the TICS, a screening test for cognitive impairment, showed good reliability and suggested the possibility of the clinical use of robot testing; 19 however, the findings of the present study were disappointing, except for orientation. One possible explanation is that, because COGNISTAT is a more detailed examination, its sentences are much longer than those in the TICS, making it more difficult for participants to understand what the robot was saying. In particular, for the number sequences used to assess attention, participants needed to list seven digits. The nuances of the robot's speech differed from those of human speech, which may have made it more difficult to understand, as previous research has shown. 24 Robot speech and voice recognition technology have improved in recent years, and robot voices may come to resemble the human voice more closely. 25 Therefore, we believe that it will be necessary to conduct experiments using improved robots in the future. As in our previous study 19 and other studies 26 showing the practicality of robot‐administered cognitive function tests, detailed evaluations of cognitive functions may become possible by improving the voice recognition and speech technology of robots. Further studies are needed to test this.
Second, regarding orientation, good reliability was observed between humans and robots and between robots. In particular, the time orientation item captures the early stages of dementia and is useful for screening for dementia. 27 Furthermore, time disorientation hinders older adults' abilities to perform daily activities and manage their schedules and knowing the extent of this disorder is important when considering future support. 6 These findings suggest that investigating time orientation using robots is especially useful. Although different from the purpose of this research, which was to understand the subject's detailed cognitive function, daily monitoring of date while spending time with the robot as a partner is a good way to understand the subject's cognitive functional state and whether it is deteriorating. Not only can this system be used to identify problems early, but it can also be a means of finding ways to provide support.
However, the present study has several limitations. First, we collected data from older people both with and without dementia. Therefore, it is unknown whether these findings can be generalised to other older adults, including individuals diagnosed with or suspected of having psychiatric disorders. We needed data from older people with various cognitive levels and personal factors. Second, the sample size was too small to apply the present results to a larger population. In addition, all participants volunteered to participate in this study, which may have resulted in a sampling bias, particularly with a disproportionate number of participants who considered the robot test acceptable. Multicentre studies are required to confirm the feasibility of using robots for cognitive testing. Another limitation is that the cognitive tests were performed three times. Repeating the same task may have had learning effects that could influence the findings. Finally, not all aspects of the cognitive test were carried out by the robots in this study; researchers monitored the timing of the participants' responses, decided when the robot should proceed to the next question, and provided feedback on the findings after scoring. Therefore, it is necessary to develop the above‐mentioned technical functions to determine whether robots can be considered reliable and acceptable as a new type of interface.
In conclusion, we could not show that the robot test effectively captured the details of cognitive functions. In the future, as robot capacities become more advanced, it may be possible to convey information more correctly through longer sentences. This suggests that a partner robot may be able to measure daily symptoms and severity of dementia by checking the dates and days of the week.
Author contribution
Yuko Nishiura: writing‐original draft, conceptualisation, methodology, investigation, formal analysis. Takenobu Inoue: supervision, funding acquisition, writing‐review and editing. Kana Takaeda: conceptualisation, methodology, investigation, data curation, formal analysis. Tomoko Kamimura: supervision, conceptualisation, project administration, writing‐review and editing.
Disclosure
The authors declare no potential conflicts of interest.
Funding
We disclose receipt of the following financial support for the research, authorship, and/or publication of this article: Japan Science and Technology Agency ‘Strategic Promotion of Innovation Research and Development’ Grant Number JPMJSV1011, Japan.
Ethics approval statement
This study was approved by the Ethics Committee of the National Rehabilitation Centre for Persons with Disabilities (C27115) and the Medical Ethics Committee of Shinshu University before commencing the research.
Patient consent statement
The study protocol was explained to the participants and their families and was presented in a written document. One family member signed an informed consent form for participation in the study in case the participant was diagnosed with dementia.
Acknowledgments
The authors are grateful to all participants and supporting staff in Seikatsu Kagaku Un‐Ei Co., Ltd. We also thank Hiroaki Kojima and Ken Sadohara from the National Institute of Advanced Industrial Science and Technology for installing the scenario dialogue for the cognitive test and operating the robot during the experiments. We would like to thank Editage (www.editage.jp) for English language editing.
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.
References
1. 2022 Alzheimer's disease facts and figures. Alzheimers Dement 2022; 18: 700–789. 10.1002/alz.12638.
2. WHO updates fact sheet on Dementia [cited 15 March 2023], Available from: https://www.who.int/news-room/fact-sheets/detail/dementia.
3. Cabinet Office Annual Report on the Ageing Society. 2021. Annual report on the ageing society [Summary] FY2021, Available from: https://www8.cao.go.jp/kourei/english/annualreport/2021/pdf/2021.pdf.
4. Inoue T, Nihei M, Narita T et al. Field‐based development of an information support robot for persons with dementia. Technol Disabil 2012; 24: 263–271.
5. Chaurasia P, McClean SI, Nugent CD et al. Modelling assistive technology adoption for people with dementia. J Biomed Inform 2016; 63: 235–248.
6. Nishiura Y, Nihei M, Nakamura‐Thomas H, Inoue T. Effectiveness of using assistive technology for time orientation and memory, in older adults with or without dementia. Disabil Rehabil Assist Technol 2021; 16: 472–478.
7. Lauriks S, Meiland F, Osté JP et al. Effects of assistive home technology on quality of life and falls of people with dementia and job satisfaction of caregivers: results from a pilot randomized controlled trial. Assist Technol 2020; 32: 243–250.
8. Orpwood R, Chadd J, Howcroft D et al. Designing technology to improve quality of life for people with dementia: user‐led approaches. Univ Access Inf Soc 2010; 9: 249–259.
9. Orpwood R, Sixsmith A, Torrington J, Chadd J, Gibson G, Chalfont G. Designing technology to support quality of life of people with dementia. Technol Disabil 2007; 19: 103–112.
10. Nishiura Y, Nihei M, Takaeda K, Inoue T. Comprehensible instructions from assistive robots for older adults with or without cognitive impairment. Assist Technol 2022; 34: 557–562.
11. Mizuno J, Sadohara K, Nihei M, Onaka S, Nishiura Y, Inoue T. The application of an information support robot to reduce agitation in an older adult with Alzheimer's disease living alone in a community dwelling: a case study. Hong Kong J Occup Ther 2021; 34: 50–59.
12. Nihei M, Nishiura Y, Mamiya I et al. Change in the relationship between the elderly and information support robot system living together. Hum Asp IT Aged Popul 2017: 433–442. https://link.springer.com/chapter/10.1007/978-3-319-58536-9_34
13. Nishiura Y, Inoue T, Nihei M. Appropriate talking pattern of an information support robot for people living with dementia: a case study. J Assist Technol 2014; 8: 177–187.
14. Gale SA, Acar D, Daffner KR. Dementia. Am J Med 2018; 131: 1161–1169.
15. Fujii M, Butler JP, Sasaki H. Core symptoms and peripheral symptoms of dementia. Geriatr Gerontol Int 2018; 18: 979–980.
16. Little MO. Reversible dementias. Clin Geriatr Med 2018; 34: 537–562.
17. Lorenz K, Freddolino PP, Comas‐Herrera A, Knapp M, Damant J. Technology‐based tools and services for people with dementia and carers: mapping technology onto the dementia care pathway. Dementia (London) 2017; 18: 725–741.
18. Kenigsberg PA, Aquino JP, Bérard A et al. Assistive technologies to address capabilities of people with dementia: from research to practice. Dementia (London) 2019; 18: 1568–1595.
19. Takaeda K, Kamimura T, Inoue T, Nishiura Y. Reliability and acceptability of using a social robot to carry out cognitive tests for community‐dwelling older adults. Geriatr Gerontol Int 2019; 19: 552–556.
20. Kiernan RJ, Mueller J, Langston JW et al. The neurobehavioral cognitive status examination: a brief but quantitative approach to cognitive assessment. Ann Intern Med 1987; 107: 481–485.
21. Kottner J, Audige L, Brorson S et al. Guidelines for reporting reliability and agreement studies (GRRAS) were proposed. J Clin Epidemiol 2011; 64: 96–106.
22. von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP. The strengthening the reporting of observational studies in epidemiology (STROBE) statement: guidelines for reporting observational studies. J Clin Epidemiol 2008; 61: 344–349.
23. Kanda Y. Investigation of the freely available easy‐to‐use software ‘EZR’ for medical statistics. Bone Marrow Transplant 2013; 48: 452–458.
24. Schreibelmayr S, Mara M. Robot voices in daily life: vocal human‐likeness and application context as determinants of user acceptance. Front Psychol 2022; 13: 1–17. 10.3389/fpsyg.2022.787499.
25. Meng Z, Liu H, Ma AC. Optimizing voice recognition informatic robots for effective communication in outpatient settings. Cureus 2023; 15: e44848. 10.7759/cureus.44848.
26. Chang YL, Luo DH, Huang TR, Goh JOS, Yeh SL, Fu LC. Identifying mild cognitive impairment by using human‐robot interactions. J Alzheimers Dis 2022; 85: 1129–1142.
27. Dumurgier J, Dartigues JF, Gabelle A et al. Time orientation and 10 years risk of dementia in elderly adults: the Three‐City study. J Alzheimers Dis 2016; 53: 1411–1418.
