Skip to main content
European Heart Journal. Digital Health logoLink to European Heart Journal. Digital Health
. 2023 Feb 2;4(2):119–124. doi: 10.1093/ehjdh/ztad008

Automated categorization of virtual reality studies in cardiology based on the device usage: a bibliometric analysis (2010–2022)

Akinori Higaki 1,2,✉,2, Yuta Watanabe 3, Yusuke Akazawa 4, Toru Miyoshi 5, Hiroshi Kawakami 6, Fumiyasu Seike 7, Haruhiko Higashi 8, Takayuki Nagai 9, Kazuhisa Nishimura 10, Katsuji Inoue 11, Shuntaro Ikeda 12, Osamu Yamaguchi 13
PMCID: PMC10039423  PMID: 36974263

Abstract

Aims

Currently, virtual reality (VR) constitutes a vital aspect of digital health, necessitating an overview of study trends. We classified type A studies as those in which health care providers utilized VR devices and type B studies as those in which patients employed the devices. This study aimed to analyse the characteristics of each type of studies using natural language processing (NLP) methods.

Methods and results

Literature related to VR in cardiovascular research was searched in PubMed between 2010 and 2022. The characteristics of studies were analysed based on their classification as type A or type B. Abstracts of the studies were used as corpus for text mining. A binary logistic regression model was trained to automatically categorize the abstracts into the two study types. Classification performance was evaluated by accuracy, precision, recall, F-1 score, and c-statistics of the receiver operator curve (ROC) analysis. In total, 171 articles met the inclusion criteria, where 120 (70.2%) were type A studies and 51 (29.8%) were type B studies. Type A studies had a higher proportion of case reports than type B studies (18.3% vs. 3.9%, P = 0.01). As for abstract classification, the binary logistic regression model yielded 88% accuracy and an area under the ROC of 0.98. The words ‘training’, ‘3d’, and ‘simulation’ were the most powerful determinants of type A studies, while the words ‘patients’, ‘anxiety’, and ‘rehabilitation’ were more indicative for type B studies.

Conclusions

NLP methods revealed the characteristics of the two types of VR-related research in cardiology.

Keywords: Virtual reality, Bibliometric analysis, Language processing

Graphical Abstract

Graphical abstract.

Graphical abstract

Introduction

Virtual reality (VR) and extended reality (XR) are currently considered an integral part of digital health and this is also true in the field of cardiology.1,2 This flourishing of VR utilization could be partly attributed to the emergence of commercially available head-mounted displays (HMD) in the mid-2010s.3 In such a relatively new research field, a roadmap is required to navigate the research trends and is useful for the strategic planning of future research.

During the preliminary research, we found that there are two broad categories of VR device usage. One use is for health care providers (including physicians) to receive training and education using VR devices, and the other is for patients to wear VR devices to receive rehabilitation and medical therapy. Since the two have very different research directions, we felt it necessary to distinguish them when analysing the research topics and trends. A categorization of XR usage in the cardiovascular field has recently been proposed by Skalidis et al. along with the concept of ‘CardioVerse’.4 In this article, the authors divide the use of XR into four components according to two axes: ‘simulation or augmentation’ and ‘external or intimate’. While this is accurate, the drawback of this categorization is that it is not intuitively understandable, and therefore, we propose to use the simple dichotomy: ‘who wears the VR-device, the doctor or the patient?’.

In this context, we assessed the characteristics of VR-related studies in cardiology, according to the user of VR devices by natural language processing (NLP) methods, which are known to be useful for the bibliometric analysis.5 We also developed a publicly available web application where one can try our machine learning model to categorize the research abstracts.

Method

Literature search and the database creation

The literature search for this study was conducted by two researchers, A.H. and Y.W. A PRISMA-like flow diagram was utilized to screen the candidate literature for the application of NLP methods in the analysis of abstracts. The search was initially conducted on 5 January 2023, where literature related to VR in cardiovascular care published between January 2010 and September 2022 was searched in PubMed using a keyword search: [‘virtual reality’ AND (‘cardiology’ OR ‘cardiovascular’ OR ‘cardiac’)]. Papers for which abstracts were not available were excluded from the list. Additionally, review articles, including systematic reviews, were also excluded from the analysis. Based on the titles and abstracts, studies that did not use VR or were not related to the cardiovascular field were also excluded. Preclinical studies using experimental animal models were also excluded from the analysis.

Definition of study groups

We defined type A as studies in which health care providers (including physicians and medical students) used VR devices and type B as studies in which patients (and/or their families) used the devices. VR devices include HMD, medical simulator, and autostereoscopic display.

The decision on whether each document is type A or B was finalized by agreement between A.H. and Y.W. If we discovered a study in which both the health care providers and the patients utilized VR devices, we designated it as type B.

Natural language processing methods

Text analysis on the abstracts was conducted to see if the NLP methods could distinguish research groups. First, frequently observed words in each group were identified using WordCloud.6 Next, a prediction model to classify abstracts into type A or B studies was constructed. Sentences in the abstracts were tokenized into bigrams and vectorized by term frequency–inverse document frequency (tf–idf). Thereafter, vectorized abstracts were used as an input data for a binary logistic regression model. Five-fold cross-validation was used to evaluate the performance of the prediction model. To determine the characteristic keywords that contributed to the group prediction, words with high coefficients of the regression model are identified.

scikit-learn, a machine learning library of Python, was used for this analysis.

Development of a publicly available web application

We have implemented a web application using the Streamlit library of Python. The parameters and the structure of the trained machine learning model were saved using the joblib library. Then, the codes and data were housed in our public GitHub repository (https://github.com/ahigaki/vr-categorizer), which was linked to Streamlit Sharing (https://ahigaki-vr-categorizer-str-app-gb1m6v.streamlitapp.com).

Statistical analyses

Data are shown as mean ± standard deviation for continuous variables and percentage for categorical variables. For between-group comparisons, Welch’s t-test for continuous variables and χ2 tests for qualitative variables were used. Statistical significance was defined as a P-value of <0.05. All statistical analyses were performed using SciPy, a Python library. The performance of the binary prediction model was evaluated by accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUROC).

Results

Bibliometric analysis

The initial keyword search yielded 454 entities, and out of those, 171 articles met the inclusion criteria. Of those articles, 120 (70.2%) were type A studies and 51 (29.8%) were type B studies, as shown in Figure 1A. Both types of studies are increasing in number each year, with a sharp increase since 2018. The proportion of type B studies changed year to year but peaked in 2021 (n = 14, 40%) as shown in Figure 1B.

Figure 1.

Figure 1

Categorization of virtual reality studies in cardiology and its publication trends. (A) Shows that the initial keyword search yielded 454 entities and, out of those, 171 articles met the inclusion criteria. (B) Shows that both types of studies are increasing in number each year, with a sharp increase since 2018. Type A studies have been the majority throughout the entire period.

As shown in the Table 1, the mean published year was earlier in type B studies than in type A studies, but it was not statistically significant (2018.8 ± 3.4 vs. 2019.6 ± 2.9, P = 0.11). There were also no significant differences in the average length of abstracts and the number of authors between groups. Type A publications included significantly higher ratio of case reports than type B publications (18.3% vs. 3.9%, P = 0.01). On the other hand, type A publications had a significantly lower proportion of randomized controlled trials (RCTs) compared to type B publications, with 11.7% compared to 37.3%, respectively (P < 0.01). In terms of RCTs, the total number of participants was 2555 with an average of 77.4 per study.

Table 1.

Characteristics of publications according to the type of VR device usage

Overall Type A Type B P-value
n = 171 n = 120 (70.2) n = 51 (29.8)
Published year 2019 ± 3.2 2018.8 ± 3.4 2019.6 ± 2.9 0.11
Number of authors 6.8 ± 3.0 6.5 ± 2.9 7.1 ± 3.1 0.20
Words of abstract 229.4 ± 82.1 227.2 ± 84.7 234.5 ± 76.0 0.57
Case reports (%) 24 (14.0) 22 (18.3) 2 (3.9) 0.01
Randomized controlled study (%) 33 (19.3) 14 (11.7) 19 (37.3) <0.01

Figure 2 shows the distribution of countries/regions where the studies were conducted. In total, Europe was the most dominant region (n = 71, 41.5%) and the North America area came next (n = 51, 29.8%). However, when limited to type A, the largest number of studies came from North America (n = 44, 36.7%), and in type B, studies from Latin America increased to about 20% of the total (n = 10, 19.6%).

Figure 2.

Figure 2

The distribution of countries/regions where the studies were conducted. Pie charts show the dominance of regions in each study category. In total, Europe and North America are the dominant regions accounting for 70% of the total. In type B category, studies from Latin America accounts for about 20% of the total.

Natural language processing

As visualized in Figure 3, type A studies are characterized by the words ‘training’, ‘simulation’, and ‘procedure’. On the other hand, ‘patient’, ‘anxiety’, and ‘pain’ were the more frequently observed words in the type B studies. The predictive accuracy of the binary logistic regression model was calculated as 88.4 ± 0.06%. The mean precision and recall were 0.86 and 1.00, respectively. The AUROC was 0.98 ± 0.03 (Table 2).

Figure 3.

Figure 3

Visualization of characteristic words by WordCloud. Words which have higher number of occurrences are displayed in a large letter. (A) Shows frequently observed words in abstracts from type A studies. (B) Shows those from type B studies.

Table 2.

Performance of the binary logistic regression model to classify abstracts into type A or B

Five-fold CV Mean SD
Accuracy (%) 88.4 0.06
Precision 0.86 0.06
Recall 1.00 0.00
F1-score 0.92 0.04
AUROC 0.98 0.03

Values are obtained as mean and SD from five-fold CV. CV, cross–validation; SD, standard deviation; AUROC, area under the receiver operating characteristic curve.

Figure 4 shows the top-10 high coefficients of the logistic regression model, indicating the most contributing words to the prediction of the study types. In other words, the presence of words ‘simulation’, ‘3d’, and ‘training’ suggests the type A study, while the presence of words ‘patients’, ‘rehabilitation’, and ‘anxiety’ suggests the type B study. In addition to the data analysis, we also implemented a web application, where anyone can try using our machine learning model to classify the VR-related literatures (https://ahigaki-vr-categorizer-str-app-gb1m6v.streamlitapp.com). As shown in Figure 5, the results of the classification will be automatically displayed, whenever one puts the research abstract in the designated area.

Figure 4.

Figure 4

Top-10 high coefficients of the logistic regression model. Words with high coefficient contribute to the prediction of the study types. If the words ‘training’, ‘3d’, and ‘simulation’ are included in the abstract, the study is more likely to be categorized as type A, while the presence of words ‘patients’, ‘anxiety’, and ‘rehabilitation’ suggests the type B study.

Figure 5.

Figure 5

Overview of the web application. The results of the classification will be automatically displayed, whenever one puts the research abstract in the designated area. The probability of the regression model is simultaneously displayed with an illustration.

Discussion

Since the introduction of commercially available HMD in mid-2010’s, VR applications have been extendedly used in the medical domain. In this study, we categorized VR-related academic literatures in cardiology by focusing on ‘who uses VR devices’ and analysed their characteristics. Our bibliometric analysis revealed that the type B studies (where patients use VR devices) have emerged relatively recently and seem to be benefited more from the availability of personal HMD after mid-2010’s than type A studies (where doctors use VR devices). It seems natural that type B studies include less case reports, because VR devices in type B studies are mainly used for therapeutic purposes rather than for training or simulation purposes. Studies from Latin America were more commonly found in the type B category. However, we currently do not have a hypothesis to explain the variation in research type distribution by region.

In this study, NLP methods enabled a detailed analysis of the research abstracts. Previously, we have shown that latent Dirichlet allocation could identify research topics in the machine learning-related cardiovascular research.7 However, this approach is not suited to our goal of dividing the literature into predefined two categories. Therefore, we took a supervised machine learning approach using a logistic regression classifier. Before training the model, we have tokenized the abstracts into bigrams and vectorized them with tf–idf. tf–idf can take into account both the rarity of a term in the entire corpus and its frequency of occurrence within each document and has been widely used as a pre-processing method in text categorization tasks.8 This vectorization worked well in training the logistic regression classifier and we could obtain a model with high predictive performance. We also succeeded to identify the terms contributing to the prediction of research types. In our analysis, the abstracts of type B studies are characterized by the words such as ‘patients’, ‘rehabilitation’, ‘anxiety’, and ‘pain’, indicating that the VR technologies are mainly used for the virtual cardiac rehabilitation and pain relief in the invasive procedures.

As introduced in this study, automatic classification of scientific papers could be useful in many situations.9 For examples, it helps one to quickly find studies that match one’s interests and is also useful for screening the literature when conducting systematic reviews. Since we have developed a publicly available web application, we believe that our method will facilitate the development of VR medicine in cardiology.

Limitations

We admit several limitations in our study. First, we only included MEDLINE-indexed literature in the analysis; inclusion of other databases may have affected the results. Although we made every effort to list all relevant articles, there may be references that were not detected by our keyword search. Second, there can be kinds of studies that do not fit into the two categories. The first example that comes to mind is medical care in which both the medical professional and the patient wear VR devices, which we see as an extension of type B. Furthermore, studies utilizing augmented reality and mixed reality, which are commonly grouped together with virtual reality under the term XR, were excluded from our analysis. This is due to the broad nature of the concept, which would not fit within the type A/B dichotomy proposed in this study. We are aware of the need to update the concept of categories when other types of studies arise.

Additionally, our machine learning model must be regularly updated as new research emerges, as is typical with data-driven models. More advanced NLP models utilizing deep learning will be required for more complex classification tasks that involve a deeper understanding of sentence semantics.

Conclusion

Results suggested the utility of the NLP approach in our novel categorization of VR studies in cardiovascular medicine.

Contributor Information

Akinori Higaki, Department of Cardiology, Pulmonology, Hypertension, and Nephrology, Ehime University Graduate School of Medicine, 454 Shitsukawa, Toon 791-0204, Japan; Department of Intractable Disease and Aging Science, Ehime University Graduate School of Medicine, 454 Shitsukawa, Toon 791-0204, Japan.

Yuta Watanabe, Department of Cardiology, Pulmonology, Hypertension, and Nephrology, Ehime University Graduate School of Medicine, 454 Shitsukawa, Toon 791-0204, Japan.

Yusuke Akazawa, Department of Cardiology, Pulmonology, Hypertension, and Nephrology, Ehime University Graduate School of Medicine, 454 Shitsukawa, Toon 791-0204, Japan.

Toru Miyoshi, Department of Cardiology, Pulmonology, Hypertension, and Nephrology, Ehime University Graduate School of Medicine, 454 Shitsukawa, Toon 791-0204, Japan.

Hiroshi Kawakami, Department of Cardiology, Pulmonology, Hypertension, and Nephrology, Ehime University Graduate School of Medicine, 454 Shitsukawa, Toon 791-0204, Japan.

Fumiyasu Seike, Department of Cardiology, Pulmonology, Hypertension, and Nephrology, Ehime University Graduate School of Medicine, 454 Shitsukawa, Toon 791-0204, Japan.

Haruhiko Higashi, Department of Cardiology, Pulmonology, Hypertension, and Nephrology, Ehime University Graduate School of Medicine, 454 Shitsukawa, Toon 791-0204, Japan.

Takayuki Nagai, Department of Cardiology, Pulmonology, Hypertension, and Nephrology, Ehime University Graduate School of Medicine, 454 Shitsukawa, Toon 791-0204, Japan.

Kazuhisa Nishimura, Department of Cardiology, Pulmonology, Hypertension, and Nephrology, Ehime University Graduate School of Medicine, 454 Shitsukawa, Toon 791-0204, Japan.

Katsuji Inoue, Department of Cardiology, Pulmonology, Hypertension, and Nephrology, Ehime University Graduate School of Medicine, 454 Shitsukawa, Toon 791-0204, Japan.

Shuntaro Ikeda, Department of Cardiology, Pulmonology, Hypertension, and Nephrology, Ehime University Graduate School of Medicine, 454 Shitsukawa, Toon 791-0204, Japan.

Osamu Yamaguchi, Department of Cardiology, Pulmonology, Hypertension, and Nephrology, Ehime University Graduate School of Medicine, 454 Shitsukawa, Toon 791-0204, Japan.

Funding

No funding was received for the present work.

Data availability

All datasets analysed are publicly available through PubMed. The list of analysed studies and python scripts used in the study are available on the Mendeley Data Repository (DOI: 10.17632/56kbcd93n4.2).

References

  • 1. Jung C, Wolff G, Wernly B, Bruno RR, Franz M, Schulze PC, et al. Virtual and augmented reality in cardiovascular care. JACC Cardiovasc Imaging 2021;15:519–532. [DOI] [PubMed] [Google Scholar]
  • 2. Mahtab EAF, Egorova AD. Current and future applications of virtual reality technology for cardiac interventions. Nat Rev Cardiol 2022;19:779–780. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Chen C, Liu K, Yang H, Wu M. The development characteristics of virtual reality after ‘the year of VR’. In: 2020 International Conference on Innovation Design and Digital Technology (ICIDDT). 2020. p152–155. IEEE [Google Scholar]
  • 4. Skalidis I, Muller O, Fournier S. Cardioverse: the cardiovascular medicine in the era of metaverse. Trends Cardiovasc Med S1050-1738(22)00071-8. Published online ahead of print 2022. [DOI] [PubMed] [Google Scholar]
  • 5. Scaccia JP, Scott VC. 5335 Days of implementation science: using natural language processing to examine publication trends and topics. Implement Sci 2021;16:47. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Halvey MJ, Keane MT. An assessment of tag presentation techniques. In: Proceedings of the 16th International Conference on World Wide Web—WWW ‘07. 2007. p1313. ACM Press, New York, NY, USA. [Google Scholar]
  • 7. Higaki A, Uetani T, Ikeda S, Yamaguchi O. Co-authorship network analysis in cardiovascular research utilizing machine learning (2009–2019). Int J Med Inform 2020;143:104274. [DOI] [PubMed] [Google Scholar]
  • 8. Trstenjak B, Mikac S, Donko D. KNN with TF-IDF based framework for text categorization. Procedia Eng 2014;69:1356–1364. [Google Scholar]
  • 9. Kim SW, Gil JM. Research paper classification systems based on TF-IDF and LDA schemes. Human centric Comput Inf Sci 2019;9:30. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

All datasets analysed are publicly available through PubMed. The list of analysed studies and python scripts used in the study are available on the Mendeley Data Repository (DOI: 10.17632/56kbcd93n4.2).


Articles from European Heart Journal. Digital Health are provided here courtesy of Oxford University Press on behalf of the European Society of Cardiology

RESOURCES