PLOS One. 2023 Jun 7;18(6):e0286347. doi: 10.1371/journal.pone.0286347

An analysis of Chinese nursing electronic medical records to predict violence in psychiatric inpatients using text mining and machine learning techniques

Ya-Han Hu 1,2, Jeng-Hsiu Hung 3,4, Li-Yu Hu 5,6, Sheng-Yun Huang 7, Cheng-Che Shen 5,7,8,9,*
Editor: Qin Xiang Ng 10
PMCID: PMC10246850  PMID: 37285344

Abstract

Background

The prevalence of violence in acute psychiatric wards is a critical concern. According to a meta-analysis investigating violence in psychiatric inpatient units, researchers estimated that approximately 17% of inpatients commit one or more acts of violence during their stay. Inpatient violence negatively affects health-care providers and patients and may contribute to high staff turnover. Therefore, predicting which psychiatric inpatients will commit violence is of considerable clinical significance.

Objective

The present study aimed to estimate the violence rate for psychiatric inpatients and establish a predictive model for violence in psychiatric inpatients.

Methods

We collected structured and unstructured data from Chinese nursing electronic medical records (EMRs) to predict violence. The data were obtained from the psychiatry department of a regional hospital in southern Taiwan and covered the period from January 2008 to December 2018. Several text mining and machine learning techniques were employed to analyze the data.

Results

The rate of violence in psychiatric inpatients was 19.7%. Patients who were violent in the psychiatric wards were generally younger, more likely to have a history of violence, and more likely to be unmarried than those who were not. Furthermore, our study supported the feasibility of predicting aggressive incidents in psychiatric wards by using nursing EMRs, and the proposed method can be incorporated into routine clinical practice to enable early prediction of inpatient violence.

Conclusions

Our findings may provide clinicians with a new basis for judgment of the risk of violence in psychiatric wards.

Background

People with mental illness are at greater risk of violence, although most do not act violently [1–3]. According to the National Institute of Mental Health’s Epidemiologic Catchment Area survey, patients with serious mental illness, including major depressive disorder, schizophrenia, or bipolar disorder, are 2 to 3 times more likely than people without such illnesses to be assaultive [3]. In addition, violence is a common problem in psychiatric inpatient units and a frequent cause of injuries to clinicians [4, 5]. In a meta-analysis investigating violence in psychiatric inpatient units, the researchers determined that 17% of inpatients committed one or more acts of violence during their stay [5]. Inpatient aggression can negatively affect health-care providers, patients, and the therapeutic environment, both directly and because the measures implemented to prevent it are generally counter-therapeutic [5–7].

A multitude of studies have been conducted on aggression/violence in psychiatric inpatient units [4, 5, 8–30]. Most have focused on individual patient risk factors for aggression/violence. The factors considered to be most associated with patient aggression/violence are the existence of previous episodes, the victim and aggressor being of the same sex, being hostile and impulsive, having experienced involuntary admission, and having longer hospitalization [9, 21, 22]. Furthermore, some studies have revealed situational, relational, and environmental factors to also be related to aggression/violence in inpatient settings [10–12, 22]. A summary of recent studies on the prevalence and risk factors of violence among psychiatric inpatients is presented in Table 1. When the results of studies that recruited different populations were compared in a meta-analysis, however, only a few of the aforementioned factors were determined to be effective predictors of aggression [13]. Predicting violent incidents can be highly challenging [31]. For example, a study demonstrated that inaccuracy in violence risk assessment is often experience-related [32], and Eaton et al indicated that violence may be impossible to predict in patients with mental disorders [33].

Table 1. Comparative analysis of prevalence and risk factors of violence in psychiatric inpatients: A review of recent studies (2019–2022).

Author Study design Sample (N) Definition of aggression/violence Prevalence Risk factors
Brown et al, 2019 (UK) [25] Retrospective 394 With intention to attempt, threaten or inflict harm on another human 42.2% A history of head injury
Menger et al, 2019 (Netherlands) [26] Retrospective Site 1: 3189; Site 2: 3253 All threatening and violent behavior of a verbal or physical nature directed at another person but excluded self-harm and inappropriate behavior, such as substance use, sexual intimidation, or vandalism Site 1: 9.1%; Site 2: 7.7% -
Girardi et al, 2019 (UK) [27] Retrospective 28 Physical aggression toward others or verbal aggression 57.2% Higher scores of Historical Clinical and Risk Management scale
Huitema et al, 2021 (Netherlands) [28] Retrospective 542 Verbal aggression, aggression toward objects, self-harm, physical aggression, and sexual aggression 63.5% Civil psychiatric patients caused more aggressive incidents than forensic patients and female patients caused more inpatient aggression compared with male patients.
Camus et al, 2021 (Switzerland) [21] Retrospective 4518 Violent physical contact directed against another person 4.4% Living in sheltered housing before hospitalization; Suffering from schizophrenia with substance abuse comorbidity; Cumulating hospitalization days
Fazel et al, 2021 (UK) [29] Prospective 89 An outcome was defined as a violent incident categorized on the Datix system as ‘violence’ or ‘aggression’. 33% Total dynamic score of ⩾1; Younger age; Female sex
Lockertsen et al, 2021 (Norway) [30] Retrospective 528 Physical violence: a physical act against another person involving the use of body parts or objects, with a clear intention to cause physical injury to that person; Threats as verbal and non-verbal communications conveying a clear intention to inflict physical injury upon another person. 14.6% Higher scores of Brøset Violence Checklist
McIvor et al, 2022 (UK) [23] Retrospective 8923 - - Increased number of violent incidents in the year before admission; Being admitted involuntarily; Being admitted to psychiatric intensive care unit; Instances of self-harm; Being the target of violence; Referral to a Psychiatric Liaison Team

Studies have developed a variety of risk assessment measures to improve violence risk assessment [34–43]. However, their performance varies considerably across settings [44]. In addition, applying some of these measures is cumbersome and time-consuming, which makes frequent assessment of violence risk impractical in most real-world clinical settings [45, 46]. Because of these challenges, a means of predicting violent incidents by analyzing clinical text that has already been recorded would be a valuable contribution to personalized medicine and would save time.

Most medical institutions use electronic medical record (EMR) systems, and many studies have retrieved unstructured text data from EMRs to investigate various research topics [20, 47–52]. According to the results of our literature review, only two studies used EMRs to predict the risk of aggression in psychiatric inpatients [20, 26], and no studies have used Chinese nursing EMRs to analyze related issues. Therefore, this study established a predictive model for violence in psychiatric inpatients by using structured and unstructured data obtained from Chinese nursing EMRs and several machine learning techniques.

Methods

Data sources and study sample

We obtained the data set for the violence prediction task from the psychiatry department of a regional hospital in southern Taiwan. This department has two 25-bed acute psychiatric wards. Admissions to these 2 wards between January 2008 and December 2018 were included in the data set. The data used in this study comprised personal data, diagnosis data, and nursing records from the EMRs.

Ethics statement

The current study received approval from the Institutional Review Board (IRB) of Taichung Veterans General Hospital (IRB number: SE19143B). The data set comprised deidentified secondary data. Therefore, the IRB of Taichung Veterans General Hospital formally waived the requirement for participant consent.

Research variables

The dependent variable was the presence or absence of violence during hospitalization. In this study, we defined violence as any behavior that involves an overt attempt or an actual act of aggression that causes physical harm to another person. This includes behaviors such as threatening to hit or physically attacking someone, damaging property, or throwing objects at people. An example of a nursing EMR entry documenting violence is given below.

“After toileting, attempted to climb onto another patient’s bed. Despite being reminded of their correct bed location, the patient persisted and even attempted to kick the other patient lying on the bed. Verbal intervention was used to try to stop the patient, but the patient then physically attacked the security staff.”

Two experienced psychiatrists independently reviewed the nursing EMRs of each inpatient to determine whether violence occurred during the patient’s hospitalization. In the case of a discrepancy, a third experienced psychiatrist reviewed the EMRs.
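The labeling protocol above reduces to a simple decision rule. The following is a minimal sketch, assuming each reviewer's judgment is recorded as a boolean (violence present or absent); the function name `adjudicate_label` is hypothetical and not part of the study's materials.

```python
from typing import Optional


def adjudicate_label(rater1: bool, rater2: bool, rater3: Optional[bool] = None) -> bool:
    """Return the final violence label for one admission.

    rater1, rater2: independent judgments of the two first-line psychiatrists.
    rater3: judgment of the third psychiatrist, consulted only on disagreement.
    """
    if rater1 == rater2:
        # The two reviewers agree, so their shared judgment is final.
        return rater1
    if rater3 is None:
        raise ValueError("reviewers disagree; a third review is required")
    # The reviewers disagree, so the third reviewer's judgment decides.
    return rater3
```

For example, `adjudicate_label(True, False, False)` returns `False`, because the third reviewer breaks the tie.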

We obtained each patient’s basic personal information, admission diagnosis, and admission nursing assessment from the EMRs created by nurses on the first day of hospitalization. After the data were preprocessed, the structured features were combined with the text features extracted from the nursing records to obtain the complete set of independent variables.
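Combining the two kinds of features amounts to concatenating, per admission, the numerically encoded structured variables with the text feature vector. A minimal sketch, assuming both sources are keyed by a hypothetical admission ID and already numeric; with the feature counts reported in the Results, each combined row would have 403 + 245 = 648 values for the structured + TF-IDF set.

```python
from typing import Dict, List


def combine_features(
    structured: Dict[str, List[float]],
    text: Dict[str, List[float]],
) -> Dict[str, List[float]]:
    """Concatenate structured and text feature vectors for each admission.

    Both inputs map a (hypothetical) admission ID to a numeric vector.
    Admissions that lack a text vector are skipped.
    """
    combined: Dict[str, List[float]] = {}
    for admission_id, struct_vec in structured.items():
        text_vec = text.get(admission_id)
        if text_vec is None:
            # No usable nursing text for this admission.
            continue
        combined[admission_id] = list(struct_vec) + list(text_vec)
    return combined
```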

Text preprocessing of nursing records

Two text preprocessing methods were used to extract text features: bag-of-words (BOW) and sentence embedding.

The BOW method is a commonly used technique in natural language processing (NLP) for text classification and analysis. It involves breaking down a piece of text into individual words, discarding grammar and word order, and representing the text as a numerical vector. Our bag-of-words text preprocessing steps are illustrated in Fig 1. These steps include word transformation, part-of-speech (POS) tagging, and text vectorization.

Fig 1. The text preprocessing steps.


First, we converted full-width characters into half-width characters and removed non-ASCII (American Standard Code for Information Interchange) characters from the nursing records.
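The width conversion can be sketched with the standard Unicode offset: full-width ASCII variants (U+FF01 to U+FF5E) sit exactly 0xFEE0 above their half-width forms, and the ideographic space (U+3000) maps to an ordinary space. Because the Chinese text is segmented in the next step, this sketch assumes that CJK Unified Ideographs (U+4E00 to U+9FFF) are kept when other non-ASCII characters are dropped; the `keep_cjk` flag and both function names are illustrative assumptions, not the study's code.

```python
def to_half_width(text: str) -> str:
    """Map full-width ASCII variants and the ideographic space to half-width ASCII."""
    chars = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:
            chars.append(" ")  # ideographic space -> ASCII space
        elif 0xFF01 <= code <= 0xFF5E:
            chars.append(chr(code - 0xFEE0))  # e.g. full-width "１" -> "1"
        else:
            chars.append(ch)
    return "".join(chars)


def strip_non_ascii(text: str, keep_cjk: bool = True) -> str:
    """Drop non-ASCII characters, optionally keeping CJK Unified Ideographs
    (assumed here so that the Chinese text remains available for segmentation)."""
    kept = []
    for ch in text:
        if ord(ch) < 128 or (keep_cjk and "\u4e00" <= ch <= "\u9fff"):
            kept.append(ch)
    return "".join(kept)
```

For example, `to_half_width("ＢＰ　１２０／８０")` yields `"BP 120/80"`.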

Next, we used the CkipTagger and NLTK toolkits for text segmentation and POS tagging. Because nursing notes in Taiwan are written bilingually (i.e., in Chinese and English), the text features of each language were extracted separately with different NLP tools. CkipTagger is an open-source Chinese NLP tool released by the Chinese Thesaurus Group (CKIP Lab) of Academia Sinica in 2019; its main functions include named entity recognition, word segmentation, and POS tagging. NLTK is one of the most widely used open-source NLP tools for English document preprocessing and supports many common tasks, such as tokenization, stemming, POS tagging, and named entity recognition. We first performed word segmentation and POS tagging with CkipTagger to retrieve a set of Chinese index terms, retaining only nouns as feature words. We then applied NLTK to the word segments that CkipTagger marked as foreign words to identify English index terms.
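The bilingual noun-extraction logic can be sketched independently of the taggers themselves. This sketch assumes the CkipTagger output has already been zipped into `(word, tag)` pairs, that the retained Chinese noun tags are `Na`, `Nb`, and `Nc` (common, proper, and place nouns; the exact tag set used in the study is not stated), that foreign segments carry the CKIP tag `FW`, and that English nouns are those with Penn Treebank tags starting with `NN`. The English tagger is passed in as a function; in practice it could be `nltk.pos_tag`, which requires NLTK's tagger model to be downloaded.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed CKIP noun tags (common, proper, place nouns) and foreign-word tag.
CKIP_NOUN_TAGS = {"Na", "Nb", "Nc"}
CKIP_FOREIGN_TAG = "FW"


def extract_index_terms(
    ckip_tagged: Sequence[Tuple[str, str]],
    english_tagger: Callable[[List[str]], List[Tuple[str, str]]],
) -> List[str]:
    """Keep Chinese nouns and route foreign (English) segments to an English POS tagger."""
    terms: List[str] = []
    for word, tag in ckip_tagged:
        if tag in CKIP_NOUN_TAGS:
            terms.append(word)
        elif tag == CKIP_FOREIGN_TAG:
            # Re-tokenize the foreign segment on whitespace and keep English nouns.
            for en_word, en_tag in english_tagger(word.split()):
                if en_tag.startswith("NN"):  # NN, NNS, NNP, NNPS
                    terms.append(en_word.lower())
    return terms
```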

After these preprocessing steps, we obtained a large set of feature words from the nursing notes, so we applied a frequency filter to reduce dimensionality and remove uninformative words. The filter was set to a frequency of 0.1 to 0.9 occurrences per document. Documents were then vectorized with the term frequency-inverse document frequency (TF-IDF) implementation in the Scikit-Learn library.
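The vectorization step can be sketched with scikit-learn's `TfidfVectorizer`. This sketch assumes that the 0.1 to 0.9 filter corresponds to the `min_df=0.1` and `max_df=0.9` document-frequency proportions (terms in fewer than 10% or more than 90% of documents are dropped) and that the remaining vectorizer settings are left at their defaults; neither assumption is confirmed by the text. Documents are passed as lists of already-extracted index terms, so an identity analyzer replaces scikit-learn's own tokenization.

```python
from typing import List

from sklearn.feature_extraction.text import TfidfVectorizer


def identity_analyzer(tokens: List[str]) -> List[str]:
    # Each document is already a list of index terms, so no further tokenization.
    return tokens


def build_tfidf(documents: List[List[str]]):
    vectorizer = TfidfVectorizer(
        analyzer=identity_analyzer,
        min_df=0.1,  # drop terms appearing in fewer than 10% of documents
        max_df=0.9,  # drop terms appearing in more than 90% of documents
    )
    matrix = vectorizer.fit_transform(documents)  # sparse matrix, shape (n_docs, n_terms)
    return vectorizer, matrix
```

In a five-document corpus where one term appears in every note, that term exceeds the 90% ceiling and is removed from the vocabulary.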

Sentence embedding is an emerging NLP technique that involves representing a sentence or document as a dense vector of numerical values. The goal of sentence embedding is to capture the meaning and context of a sentence in a low-dimensional space, so that the resulting vector can be used as document features. This study used Sentence-BERT (SBERT) [53], a modification of the Bidirectional Encoder Representations from Transformers (BERT) model, to generate text embedding features. The key innovation of SBERT is the addition of a siamese network architecture, where two copies of the BERT model share the same weights and are trained to encode two different sentences. During training, the model is presented with pairs of sentences and learns to distinguish between sentences that are semantically similar and those that are dissimilar. This enables SBERT to generate high-quality sentence embeddings that capture the semantic meaning of a sentence.

The steps to generate document embeddings using SBERT are as follows: first, split the document into sentences; second, generate an embedding for each sentence; third, aggregate the sentence embeddings into a single vector by taking their mean; and finally, normalize the resulting document embedding.
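The four steps can be sketched as follows. The sentence encoder is passed in as a function so that the sketch runs without downloading a model; in practice it could be the `encode` method of a `sentence_transformers.SentenceTransformer` model (the model used in the study is not named here). The sentence-splitting rule (Chinese and English end punctuation) and L2 normalization are assumptions, since the text does not specify either.

```python
import math
import re
from typing import Callable, List, Sequence

# Assumed splitting rule: break after Chinese or English sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[。！？!?.])\s*")


def split_sentences(document: str) -> List[str]:
    return [s.strip() for s in _SENTENCE_END.split(document) if s.strip()]


def document_embedding(
    document: str,
    encode: Callable[[List[str]], Sequence[Sequence[float]]],
) -> List[float]:
    sentences = split_sentences(document)                 # step 1: split into sentences
    if not sentences:
        raise ValueError("document contains no sentences")
    vectors = encode(sentences)                           # step 2: one vector per sentence
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]  # step 3: mean
    norm = math.sqrt(sum(x * x for x in mean))            # step 4: L2 normalization
    return [x / norm for x in mean] if norm > 0 else mean
```

A simple period-based split will also break on decimals such as "36.5", so a production pipeline would need a more careful sentence splitter.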

Descriptive statistics

We performed independent t-tests and chi-squared tests to identify differences between the violent and nonviolent groups.
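For a 2 × 2 comparison, the chi-squared test can be computed without a statistics package, because with 1 degree of freedom the upper-tail p-value equals erfc(√(χ²/2)). The sketch below uses Pearson's statistic without a continuity correction (an assumption about the study's settings). Applied to the sex-by-group counts in Table 2, it gives χ² ≈ 3.10 and p ≈ 0.078, matching the reported value. Tables with more categories (e.g., marital status) need a general chi-squared survival function such as `scipy.stats.chi2.sf`; the t-tests are not shown.

```python
import math
from typing import Sequence, Tuple


def pearson_chi2(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Pearson chi-squared statistic (no continuity correction) and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


def p_value_df1(chi2: float) -> float:
    """Upper-tail p-value for 1 degree of freedom: P(X > chi2) = erfc(sqrt(chi2 / 2))."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Rows: violent, nonviolent; columns: male, female (counts from Table 2).
chi2, dof = pearson_chi2([[234, 172], [875, 783]])
p = p_value_df1(chi2)
```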

Prediction model assessment

We employed 6 well-known supervised learning techniques to assess the prediction model performance: decision tree (DT, J48 module in WEKA) [54], random forest (RF, RandomForest module in WEKA) [55], support vector machine (SVM, SMO module in WEKA) [56], artificial neural network (ANN, MultilayerPerceptron module in WEKA), K-nearest neighbors (KNN, IBk module in WEKA) [57], and boosted random forest (AdaBoost+RF, AdaBoostM1 with RandomForest modules in WEKA) [57].

Feature selection is a well-established technique in supervised learning that can improve training efficiency and yield compact models with high prediction performance. We used the CfsSubsetEval module in Weka, a correlation-based feature selection method, with the BestFirst search method to evaluate correlations between the independent and dependent variables. This produced a feature subset whose features were highly correlated with the dependent variable but not with each other.
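CfsSubsetEval implements Hall's correlation-based feature selection, which scores a subset S of k features with the merit Merit_S = k·r̄_cf / √(k + k(k − 1)·r̄_ff), where r̄_cf is the mean feature-class correlation and r̄_ff the mean feature-feature correlation; BestFirst then searches for the subset with the highest merit. The sketch below computes only the merit and takes the correlations as given inputs (Weka's choice of correlation measure for nominal and numeric attributes is not modeled). Feature names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, Sequence


def cfs_merit(
    subset: Sequence[str],
    feature_class_corr: Dict[str, float],
    feature_feature_corr: Dict[FrozenSet[str], float],
) -> float:
    """Hall's CFS merit: rewards class correlation, penalizes redundancy between features."""
    k = len(subset)
    if k == 0:
        return 0.0
    mean_rcf = sum(feature_class_corr[f] for f in subset) / k
    pairs = list(combinations(subset, 2))
    mean_rff = (
        sum(feature_feature_corr[frozenset(p)] for p in pairs) / len(pairs) if pairs else 0.0
    )
    return k * mean_rcf / math.sqrt(k + k * (k - 1) * mean_rff)
```

Adding a second feature with the same class correlation raises the merit from 0.4 to about 0.57 if the two features are uncorrelated, but leaves it at 0.4 if they are perfectly correlated, which is why redundant features tend to be excluded.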

Parameter settings can play a crucial role in the prediction performance of the investigated algorithms. Therefore, we used the CVParameterSelection metalearner module in Weka to optimize these settings. This module allowed us to define multiple parameter combinations, automatically ran the base classifier with each combination, and selected the settings that gave the best prediction results in cross-validation.
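The idea behind this parameter search can be sketched as an exhaustive loop over parameter combinations, each scored by cross-validation. This is not Weka's implementation; the scoring function stands in for a full cross-validation run, and the parameter names below are hypothetical.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Tuple


def select_parameters(
    grid: Dict[str, List[Any]],
    cv_score: Callable[[Dict[str, Any]], float],
) -> Tuple[Dict[str, Any], float]:
    """Try every parameter combination and keep the one with the best CV score."""
    names = list(grid)
    best_params: Dict[str, Any] = {}
    best_score = float("-inf")
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = cv_score(params)  # e.g., mean AUC over the cross-validation folds
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

The number of classifier runs grows with the product of the grid sizes, which is why only a few values per parameter are usually tried.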

To mitigate overfitting while building the model, this study employed a 10-fold cross-validation process. First, the dataset was randomly divided into 10 subsets. In each iteration, one subset was used as the test dataset and the remaining nine subsets were used as the training dataset, which allowed a more robust evaluation of model performance. To address class imbalance, we employed undersampling during cross-validation: the SpreadSubsample module in Weka 3.8.3 was used to adjust the instance distribution of the 2 classes in our training set. The confusion matrix was used to evaluate the prediction performance of each model (Fig 2). True positive (TP) is the number of inpatients correctly predicted to be violent; true negative (TN) is the number correctly predicted to be non-violent; false positive (FP) is the number incorrectly predicted to be violent; and false negative (FN) is the number incorrectly predicted to be non-violent.
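The cross-validation scheme with undersampling can be sketched as follows. This sketch assumes a 1:1 target class distribution, which is one possible SpreadSubsample setting (the spread used in the study is not stated), and it balances only the training portion of each split so that every test fold keeps its original class mix. Labels are 1 for violent and 0 for nonviolent.

```python
import random
from typing import List, Sequence, Tuple


def kfold_indices(n: int, k: int = 10, seed: int = 1) -> List[List[int]]:
    """Shuffle instance indices and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def undersample(train_idx: Sequence[int], labels: Sequence[int], seed: int = 1) -> List[int]:
    """Randomly drop majority-class training instances until both classes are equal in size."""
    rng = random.Random(seed)
    pos = [i for i in train_idx if labels[i] == 1]
    neg = [i for i in train_idx if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return minority + rng.sample(majority, len(minority))


def cv_splits(labels: Sequence[int], k: int = 10, seed: int = 1) -> List[Tuple[List[int], List[int]]]:
    """Return (balanced training indices, untouched test indices) for each of the k folds."""
    folds = kfold_indices(len(labels), k, seed)
    splits = []
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        # Only the training portion is balanced; the test fold keeps its class mix.
        splits.append((undersample(train_idx, labels, seed + f), test_idx))
    return splits
```

Undersampling inside each training split, rather than before splitting, keeps discarded instances out of the test folds' evaluation and avoids altering the class mix the model is evaluated on.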

Fig 2. Confusion matrix.


Using the information summarized in the confusion matrix, 5 classification performance metrics, including accuracy, precision, recall/sensitivity, f1-measure, and specificity, can be obtained through the following equations [58, 59]:

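$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$
$$\mathrm{Recall} = \mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{3}$$
$$F_1\text{-measure} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4}$$
$$\mathrm{Specificity} = \frac{TN}{FP + TN} \tag{5}$$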
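The five metrics in Eqs (1) to (5) translate directly into code. A minimal sketch computing them from the four confusion-matrix counts:

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall (sensitivity), F1-measure, and specificity."""
    precision = tp / (tp + fp)                       # Eq (2)
    recall = tp / (tp + fn)                          # Eq (3), also sensitivity
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),  # Eq (1)
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),  # Eq (4)
        "specificity": tn / (fp + tn),                # Eq (5)
    }
```

As a consistency check, inserting the precision (0.648) and recall (0.611) of the AdaBoost+RF model on the CfsSubsetEval feature set into Eq (4) gives 0.629, the F1-measure reported in Table 3.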

In addition, we used the area under the receiver operating characteristic (ROC) curve (AUC) to assess model quality. The AUC ranges from 0 to 1, with a higher value indicating better classifier performance.
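The AUC has a convenient probabilistic reading: it equals the probability that a randomly chosen violent patient receives a higher predicted score than a randomly chosen nonviolent patient, with ties counted as one half. A minimal sketch of that rank-based computation (quadratic in the number of cases, which is adequate for illustration):

```python
from typing import Sequence


def auc_rank(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a positive case > score of a negative case); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Under this reading, an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of violent and nonviolent patients.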

Analytical tools

We used Microsoft SQL Server 2005 (Microsoft, Redmond, WA, USA) for data extraction, computation, linkage, and processing. We used SAS (Version 9.2; SAS Institute, Cary, NC, USA) and SPSS (Version 19.0 for Windows; IBM, New York, NY, USA) for all statistical analyses, and P < .05 was considered to indicate statistical significance. All supervised learning techniques used to assess the prediction models were implemented in the open-source machine learning software Weka 3.8.2 (www.cs.waikato.ac.nz/ml/weka).

Results

Baseline data

We obtained the EMRs for 2357 inpatients, of which 293 were excluded because they contained too little text, leaving 2064 admission records for analysis. The 5 most common admission diagnoses were schizophrenic disorders (750 cases, 36.3%; International Classification of Diseases, Ninth Revision, Clinical Modification [ICD-9-CM]: 295), episodic mood disorders (527 cases, 25.5%; ICD-9-CM: 296), persistent mental disorders resulting from conditions classified elsewhere (125 cases, 6.1%; ICD-9-CM: 294), other nonorganic psychoses (119 cases, 5.8%; ICD-9-CM: 298), and dementia (84 cases, 4.1%; ICD-9-CM: 290). The records of 406 admitted patients (19.7%) stated that violence occurred during hospitalization. The clinical and demographic variables of the violent and nonviolent groups are listed in Table 2. The violent group was younger and more likely to have a history of violence than the nonviolent group. Furthermore, patients in the violent group were more likely to be unmarried than those in the nonviolent group.

Table 2. Characteristics of patients in violent and nonviolent groups.

Variable Violent group, N = 406 (%) Non-violent group, N = 1658 (%) P value
Sex
 Male 234 (57.6) 875 (52.8) 0.078
 Female 172 (42.4) 783 (47.2)
Age (years) a 46 (36–59) 49.5 (41–60) 0.005*
Education level
 0 70 (17.2) 260 (15.7) 0.685
 1–6 years 92 (22.7) 388 (23.4)
 7–12 years 183 (45.1) 779 (47.0)
 ≥ 13 years 51 (12.6) 183 (11.0)
Occupation 375 (92.4) 1488 (89.7) 0.317
Marriage 0.015*
 Unmarried 205 (50.5) 704 (42.5)
 Married 93 (22.9) 456 (27.5)
 Divorced 75 (18.5) 310 (18.7)
 Widowed 13 (3.2) 90 (5.4)
Violent history 262 (64.5) 790 (47.6) < .001*
History of substance use 78 (19.2) 331 (20.0) 0.440
Diagnoses, N (%)
 Schizophrenic disorders 136 (33.5) 614 (37.0) 0.037*
 Episodic mood disorders 98 (24.1) 429 (25.9)
 Persistent mental disorders due to conditions classified elsewhere 29 (7.1) 96 (5.8)
 Other nonorganic psychoses 28 (6.9) 91 (5.5)
 Dementia 22 (5.4) 62 (3.7)

a Median (interquartile range);

*Statistical significance

After data preprocessing, 3 types of variables were obtained: structured variables, TF-IDF document vector variables, and SBERT document embedding variables. The complete data set contained 1032 variables (403 structured variables, 245 TF-IDF document vector variables, and 384 SBERT document embedding variables). To evaluate the impact of text features on the experimental results, we used two feature sets: structured variables combined with TF-IDF variables (648 variables in total) and structured variables combined with SBERT variables (787 variables in total). Applying the CfsSubsetEval method retained only 26 independent variables, comprising 21 structured variables, 4 TF-IDF document vector variables, and 1 SBERT document embedding variable.

Prediction model performance

The performance of the prediction models under 10-fold cross-validation is presented in Table 3 and Fig 3. The AdaBoost+RF model achieved the highest AUC in all three feature sets. With the structured variables combined with TF-IDF variables, AdaBoost+RF achieved an accuracy of 0.617, a precision of 0.619, a recall/sensitivity of 0.608, an f1-measure of 0.614, a specificity of 0.626, and an AUC of 0.634; in this feature set, the ANN model had the highest sensitivity (0.692) and f1-measure (0.620), and the KNN model had the highest specificity (0.650). With the structured variables combined with SBERT variables, AdaBoost+RF achieved an accuracy of 0.555, a precision of 0.557, a recall/sensitivity of 0.544, an f1-measure of 0.550, a specificity of 0.567, and an AUC of 0.587. Thus, sentence embedding variables resulted in poorer performance than bag-of-words features. In addition, feature selection (CfsSubsetEval) improved prediction performance on the reduced dataset. When this dataset was fed to AdaBoost+RF, the accuracy improved to 0.639, the precision to 0.648, the recall to 0.611, the f1-measure to 0.629, the specificity to 0.667, and the AUC to 0.684.

Table 3. Prediction model performance assessment using 10-fold cross-validation.

Feature set Method Metrics
ACC PRE SEN F1 SPE AUC
Structured + TF-IDF (648) DT 0.576 0.577 0.579 0.575 0.576 0.595
RF 0.596 0.597 0.599 0.595 0.596 0.631
SVM 0.601 0.604 0.589 0.596 0.613 0.601
KNN 0.563 0.576 0.475 0.521 0.650 0.586
ANN 0.576 0.562 0.692 0.620 0.461 0.619
AdaBoost+RF 0.617 0.619 0.608 0.614 0.626 0.634
Structured + SBERT (787) DT 0.536 0.536 0.537 0.536 0.534 0.545
RF 0.539 0.539 0.547 0.543 0.540 0.562
SVM 0.546 0.547 0.534 0.540 0.557 0.546
KNN 0.539 0.554 0.406 0.469 0.672 0.547
ANN 0.549 0.552 0.522 0.537 0.576 0.577
AdaBoost+RF 0.555 0.557 0.544 0.550 0.567 0.587
CfsSubsetEval (26) DT 0.555 0.558 0.534 0.546 0.576 0.581
RF 0.639 0.650 0.603 0.626 0.675 0.677
SVM 0.615 0.631 0.552 0.589 0.677 0.615
KNN 0.602 0.614 0.549 0.580 0.655 0.636
ANN 0.584 0.586 0.569 0.578 0.599 0.600
AdaBoost+RF 0.639 0.648 0.611 0.629 0.667 0.684

ACC: accuracy; PRE: precision; SEN: sensitivity/recall; F1: f1-measure; SPE: specificity; AUC: area under the ROC curve; DT: decision tree; RF: random forest; SVM: support vector machine; KNN: k-nearest neighbors; ANN: artificial neural network; AdaBoost+RF: AdaBoostM1 with random forest.

Fig 3. Prediction model performance assessment using 10-fold cross-validation.


Discussion

To our knowledge, this is the first study to predict the risk of violence in psychiatric inpatients by applying machine learning techniques to structured and unstructured data from Chinese-language EMRs. Our main findings are as follows: (1) the violence rate for psychiatric inpatients was 19.7%; (2) violent patients in the psychiatric wards were younger, more likely to have a history of violence, and more likely to be unmarried than nonviolent patients; and (3) violence in psychiatric inpatients could be predicted from Chinese nursing EMR data with acceptable accuracy (AUC: 0.684).

Violence by patients toward staff or other patients is common in most psychiatric treatment facilities. Iozzino et al [5] investigated the prevalence of violent incidents during admission in 35 facilities and found that it ranged from 2% to 44%, with an average of 17% across the included sites. In our study, the records of 406 patients (19.7%) indicated that violence occurred during hospitalization; this prevalence is similar to that reported by Iozzino et al. However, the reported prevalence of violence during admission varies considerably in the literature, for several possible reasons. First, the definitions of aggression and violence used in studies on psychiatric inpatients vary considerably, which may contribute to discrepancies in prevalence estimates. For example, in Menger et al, violence was defined as either physical or verbal aggression toward hospital staff or other patients [20], whereas in Lam et al, only physical aggression was considered [16]. In Schlup et al, psychiatric inpatient violence was categorized into the following five types: (1) verbal violence; (2) verbal sexual violence; (3) violence against property; (4) physical sexual violence; and (5) physical violence [24]. As Mierlo et al noted, a uniform, generally accepted definition of aggressive or violent behavior is lacking, which can result in different operationalizations [60]. Aggression and violence are distinct concepts: aggression generally refers to behavior intended to cause harm, whether physical or psychological, whereas violence specifically involves physical harm, such as hitting, punching, or using a weapon. Violence can be seen as an extreme form of aggression whose primary goal is intentional injury.
Therefore, the use of different definitions of violence and aggression across studies can complicate the interpretation of results. Second, the prevalence of inpatient violence may differ by type of psychiatric ward (e.g., acute, chronic, forensic, or psychiatric intensive care unit). Third, ethnicity may influence the prevalence of inpatient violence. In their meta-analysis, Dack et al [13] reported no significant ethnicity-related differences between aggressive and nonaggressive patients; however, this finding showed statistical heterogeneity. Future studies should investigate the relationship between ethnicity and inpatient violence. In our sample of Asian acute psychiatric inpatients in Taiwan, the rate of violence was 19.7%.

Consistent with those of other studies [9, 13–19], our findings indicated that a younger age [14, 15, 29], history of violence [16, 17, 23], and unmarried status [18, 19] are related to patients demonstrating violence in psychiatric wards. In a review by Cornaggia et al [9], the researchers concluded that being admitted involuntarily, the existence of previous aggressive episodes, having a longer hospitalization stay, being impulsive, misusing drugs or alcohol, having a younger age, and having a diagnosis of psychosis were associated with inpatient aggression/violence. Furthermore, a review by Dack et al [13] discovered inpatient aggression to be associated with being male, having a younger age, not being married, being admitted involuntarily, having more previous admissions, having a history of exhibiting self-destructive behavior, having a history of substance misuse, and having a history of violence.

Research indicates that predicting violent incidents can be challenging and that prediction accuracy is often experience-related [31, 32]. Therefore, several risk assessment instruments have been developed [61]. The most commonly used include the Violence Risk Appraisal Guide [39], the Structured Assessment of Aggression Risk in Youth [41], and the Historical Clinical Risk Management-20 [40]. In a meta-analysis, Sing et al [62] found that these instruments have median AUCs of 0.70 to 0.74. However, the effects identified using these instruments have generally been small. In addition, studies employing these instruments have mostly involved heterogeneous patient populations and have reported differing performance, which prevents the instruments’ predictive abilities from being generalized to other facilities [13, 44]. Furthermore, some instruments are time-consuming to apply, which makes their frequent use impractical in most real-world clinical settings [45, 46]. Given these challenges, using clinical text already recorded in EMRs to predict violent incidents may be a practical approach to violence risk assessment. In our study, violence in psychiatric inpatients could be predicted from Chinese nursing EMR data with an AUC of 0.684.

The results of the 10-fold cross-validation revealed that AdaBoost+RF had the highest average AUC. Numerous studies have reported that RF performs better than many other conventional supervised learning techniques [63–66]. RF does not assume a linear relationship in the model; it uses ensemble learning, in which a strong learner is formed by combining a group of weak learners; and it builds many decision trees by repeatedly sampling the data and performing embedded feature selection. Boosting (i.e., AdaBoostM1 in WEKA), on the other hand, improves the performance of weak learners by increasing the weights of misclassified observations and retraining the model so that subsequent iterations focus more on these observations. This allows the model to improve gradually by concentrating on the data points that are most difficult to classify. Boosting may also help RF capture more diverse and informative features by allowing each tree to focus on different subsets of the data. By combining the benefits of both techniques, AdaBoost+RF can improve prediction performance and may be less prone to overfitting.
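The reweighting described above can be sketched for a single boosting round, following the AdaBoost.M1 scheme of Freund and Schapire after which WEKA's AdaBoostM1 module is named. This is an illustrative sketch, not WEKA's source code: given the weighted error ε of the current base classifier (here, a random forest), the weights of correctly classified instances are multiplied by β = ε/(1 − ε), all weights are renormalized, and the classifier receives the vote weight log(1/β).

```python
import math
from typing import List, Sequence, Tuple


def adaboost_m1_update(weights: Sequence[float], correct: Sequence[bool]) -> Tuple[List[float], float]:
    """One AdaBoost.M1 reweighting step.

    Returns the renormalized instance weights and the vote weight log(1/beta)
    of the current base classifier.
    """
    total = sum(weights)
    error = sum(w for w, ok in zip(weights, correct) if not ok) / total
    if not 0.0 < error < 0.5:
        raise ValueError("AdaBoost.M1 requires 0 < weighted error < 0.5")
    beta = error / (1.0 - error)
    # Shrink the weights of correctly classified instances; misclassified ones keep theirs.
    updated = [w * beta if ok else w for w, ok in zip(weights, correct)]
    norm = sum(updated)
    return [w / norm for w in updated], math.log(1.0 / beta)
```

With four equally weighted instances and one misclassified, the misclassified instance's weight rises from 0.25 to 0.5 after one round, so the next base classifier concentrates on it.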

Through a literature review, we found that only 2 studies, both by Menger et al, have applied machine learning techniques to EMR data to predict the risk of aggression in psychiatric inpatients [20, 26]. Menger et al achieved the highest accuracy when they combined document embeddings with a recurrent neural network. The AUC in our study (0.684) is lower than that obtained by Menger et al (0.764–0.797), which may be due to differences in study design. Menger et al investigated incidents in which patients demonstrated either physical or verbal aggression toward staff or other patients [20], whereas only physical aggression was included in our study. Furthermore, Menger et al included the EMRs of doctors and nurses created on the first day of admission, whereas our study included only the EMRs of nurses created on the first day of admission; including less data in analyses can result in less accurate predictions. In addition, Chinese is a complex language in which the same word can have multiple meanings, which can make it difficult for text mining algorithms to identify the intended meaning of the text [67–69]. Medical terminology in Chinese can also be ambiguous, with different terms used to describe similar symptoms or conditions, which can lead text mining algorithms to produce inaccurate or incomplete results. Finally, Chinese EMRs may not be standardized, so patient information may be recorded inconsistently, which makes it more difficult for text mining algorithms to detect violence from EMR data.

Limitations

In this study, we used text mining of Chinese EMRs to predict violence in psychiatric inpatients. While our results provide some insights into the potential of this approach, there are several limitations to our study that should be discussed. First, the use of text mining to extract data from EMRs may not capture all relevant information on risk factors for violence in psychiatric inpatients. For example, data on whether admission was voluntary, the number of previous admissions, quality of life, family support, intelligence level, and history of sexual abuse, which are known to be associated with the risk of violence in psychiatric wards [4, 5, 8–19], were not included in our data set. As a result, the accuracy of our predictions may be limited by the absence of these important factors. Second, the accuracy of our predictions may also be limited by the quality and completeness of the EMRs used in our study. It is possible that errors or inconsistencies in documentation could have affected the reliability of our data and our ability to accurately identify risk factors for violence. Third, the use of Chinese EMRs may introduce cultural and linguistic biases that could impact the validity of our predictions. It is important to acknowledge that the cultural and linguistic factors that may influence violence in psychiatric inpatients are complex and may not be fully captured by our text mining approach. Fourth, the EMRs did not clearly specify the methods through which diagnoses were made. Therefore, we could not evaluate the diagnostic accuracy of the psychiatric disorders reported in the EMRs included in our study. Fifth, our study did not include verbal aggression. Therefore, the prevalence of verbal aggression in psychiatric wards and the accuracy of verbal aggression prediction in psychiatric wards using Chinese EMRs warrant further study.
Finally, the timeframe of our study may have impacted the accuracy of our predictions, as violence in psychiatric inpatients could be influenced by factors that change over time, such as changes in medication or therapy.

Conclusion and future directions

Violence is a key concern in acute psychiatric wards because it can injure patients or staff and because it is counter-therapeutic. Studies have reported that 75% to 100% of nursing staff working in acute psychiatric units have been assaulted by patients [70, 71], and aggression toward staff has been reported to contribute to high staff turnover [72]. Given the importance of this problem, predicting which psychiatric inpatients will commit violence is crucial. We therefore developed a predictive model for violence in psychiatric inpatients using structured and unstructured data from Chinese nursing EMRs and several machine learning techniques, and the model achieved acceptable accuracy. The results support the feasibility of predicting violent incidents in psychiatric wards from EMR data collected at admission and suggest that such a method might be incorporated into routine clinical practice to enable early prediction of inpatient violence. Our findings may give clinicians a new basis for judging violence risk in psychiatric wards and may help first-line caregivers implement appropriate treatment and preventive measures for hospitalized patients at high risk of violence, ultimately improving patient outcomes and staff safety.

Future research could incorporate additional variables, such as admission type, number of previous admissions, intelligence level, and history of sexual abuse, to improve the accuracy of predictive models for violence. Structured interviews could be used to establish psychiatric diagnoses and to investigate the association between psychiatric disorders and the risk of violence in inpatients. Future studies could also examine the prevalence of verbal aggression in psychiatric wards and how accurately it can be predicted from EMRs. Furthermore, the model should be validated in other populations to establish its generalizability to different clinical contexts. Different machine learning techniques and prediction models could also be compared to identify the most accurate and efficient method for predicting violence in psychiatric inpatients. Moreover, targeted interventions could be developed and implemented to reduce the risk of violence among inpatients whom the model identifies as high risk. Finally, long-term patient outcomes and staff safety should be examined to determine how early prediction and intervention affect patient care.
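Comparing techniques, and validating a model in new populations, usually comes down to comparing discrimination on held-out data, which this study and Menger et al summarize as AUC. As a minimal, dependency-free sketch, AUC can be computed as the probability that a randomly chosen violent case receives a higher risk score than a randomly chosen non-violent case, with ties counted as one half (the Mann–Whitney formulation). The labels and scores below are invented for illustration and are not study data; `roc_auc` is a hypothetical helper name.

```python
from itertools import product


def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as one half (Mann-Whitney formulation). Labels are 1 for a
    violent incident and 0 otherwise; scores are model risk estimates.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p, n in product(positives, negatives):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical held-out cases and risk scores from two candidate models.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
model_a_scores = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.5, 0.2]
model_b_scores = [0.7, 0.6, 0.4, 0.3, 0.2, 0.35, 0.5, 0.1]

print(round(roc_auc(labels, model_a_scores), 3))  # 0.867
print(round(roc_auc(labels, model_b_scores), 3))  # 0.733
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a sketch; for larger validation sets a rank-based computation or a library routine is preferable. A fair comparison also requires scoring every model on the same held-out cases, ideally with confidence intervals, since small AUC differences can fall within sampling noise.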

Supporting information

S1 Data. CfsSubsetEval (26)-balanced-final.

(CSV)

S2 Data. Structured with SBERT (787)-balanced-final.

(CSV)

S3 Data. Structured with TF-IDF (648)-final.

(CSV)
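Judging by its file name, S3 pairs structured variables with TF-IDF text features. As a rough illustration of how TF-IDF weights arise from segmented nursing notes, the sketch below assumes the notes are already tokenized into words and uses length-normalized term frequency with idf = log(N / df). The notes, tokens, and `tf_idf` function are hypothetical, and this weighting variant is not necessarily the one used to build S3.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per pre-segmented document.

    tf = count / document length; idf = log(N / df), where N is the number
    of documents and df the number of documents containing the term.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already-segmented nursing notes (Traditional Chinese).
notes = [
    ["病人", "情緒", "激動"],  # patient, mood, agitated
    ["病人", "睡眠", "良好"],  # patient, sleep, good
    ["病人", "情緒", "穩定"],  # patient, mood, stable
]
weights = tf_idf(notes)
for term, weight in sorted(weights[0].items(), key=lambda kv: -kv[1]):
    print(term, round(weight, 3))
```

A term such as 病人 ("patient") that appears in every note receives zero weight, whereas a rarer term such as 激動 ("agitated") is weighted up. Library implementations often add idf smoothing or vector normalization, which changes the numbers but not this basic behavior.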

Data Availability

All relevant data are within the paper and its Supporting information files.

Funding Statement

This study was partially supported by the Ministry of Science and Technology (grant numbers MOST 108-2314-B-367-001 and 111-2314-B-367-001-MY3). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Friedman RA. Violence and mental illness—how strong is the link? The New England journal of medicine. 2006;355(20):2064–6. Epub 2006/11/17. doi: 10.1056/NEJMp068229
2. Fazel S, Långström N, Hjern A, Grann M, Lichtenstein P. Schizophrenia, substance abuse, and violent crime. JAMA. 2009;301(19):2016–23. Epub 2009/05/21. doi: 10.1001/jama.2009.675
3. Swanson JW. Mental disorder, substance abuse, and community violence: an epidemiological approach. 1994.
4. Virtanen M, Vahtera J, Batty GD, Tuisku K, Pentti J, Oksanen T, et al. Overcrowding in psychiatric wards and physical assaults on staff: data-linked longitudinal study. The British journal of psychiatry: the journal of mental science. 2011;198(2):149–55. Epub 2011/02/02. doi: 10.1192/bjp.bp.110.082388
5. Iozzino L, Ferrari C, Large M, Nielssen O, de Girolamo G. Prevalence and Risk Factors of Violence by Psychiatric Acute Inpatients: A Systematic Review and Meta-Analysis. PloS one. 2015;10(6):e0128536. Epub 2015/06/11. doi: 10.1371/journal.pone.0128536
6. Ward L. Mental health nursing and stress: Maintaining balance. International journal of mental health nursing. 2011;20(2):77–85. doi: 10.1111/j.1447-0349.2010.00715.x
7. Needham I, Abderhalden C, Halfens RJ, Fischer JE, Dassen T. Non-somatic effects of patient aggression on nurses: a systematic review. Journal of advanced nursing. 2005;49(3):283–96. doi: 10.1111/j.1365-2648.2004.03286.x
8. Ramesh T, Igoumenou A, Montes MV, Fazel S. Use of risk assessment instruments to predict violence in forensic psychiatric hospitals: a systematic review and meta-analysis. European psychiatry. 2018;52:47–53. doi: 10.1016/j.eurpsy.2018.02.007
9. Cornaggia CM, Beghi M, Pavone F, Barale F. Aggression in psychiatry wards: a systematic review. Psychiatry research. 2011;189(1):10–20. Epub 2011/01/18. doi: 10.1016/j.psychres.2010.12.024
10. Fletcher A, Crowe M, Manuel J, Foulds J. Comparison of patients’ and staff’s perspectives on the causes of violence and aggression in psychiatric inpatient settings: An integrative review. Journal of Psychiatric and Mental Health Nursing. 2021;28(5):924–39. doi: 10.1111/jpm.12758
11. Van Wijk E, Traut A, Julie H. Environmental and nursing-staff factors contributing to aggressive and violent behaviour of patients in mental health facilities. Curationis. 2014;37(1):1–9. doi: 10.4102/curationis.v37i1.1122
12. Olsson H, Audulv Å, Strand S, Kristiansen L. Reducing or increasing violence in forensic care: a qualitative study of inpatient experiences. Archives of psychiatric nursing. 2015;29(6):393–400. doi: 10.1016/j.apnu.2015.06.009
13. Dack C, Ross J, Papadopoulos C, Stewart D, Bowers L. A review and meta-analysis of the patient factors associated with psychiatric in-patient aggression. Acta Psychiatr Scand. 2013;127(4):255–68. Epub 2013/01/08. doi: 10.1111/acps.12053
14. Davis S. Violence by psychiatric inpatients: A review. Psychiatric services. 1991;42(6):585–90. doi: 10.1176/ps.42.6.585
15. Carr VJ, Lewin TJ, Sly KA, Conrad AM, Tirupati S, Cohen M, et al. Adverse incidents in acute psychiatric inpatient units: rates, correlates and pressures. Australian & New Zealand Journal of Psychiatry. 2008;42(4):267–82. doi: 10.1080/00048670701881520
16. Lam JN, McNiel DE, Binder RL. The relationship between patients’ gender and violence leading to staff injuries. Psychiatric Services. 2000;51(9):1167–70. doi: 10.1176/appi.ps.51.9.1167
17. Soliman AE-D, Reza H. Risk factors and correlates of violence among acutely ill adult psychiatric inpatients. Psychiatric services. 2001;52(1):75–80. doi: 10.1176/appi.ps.52.1.75
18. Raja M, Azzoni A. Hostility and violence of acute psychiatric inpatients. Clinical Practice and Epidemiology in Mental Health. 2005;1(1):1–9.
19. Grassi L, Peron L, Marangoni C, Zanchi P, Vanni A. Characteristics of violent behaviour in acute psychiatric in-patients: a 5-year Italian study. Acta Psychiatrica Scandinavica. 2001;104(4):273–9. doi: 10.1034/j.1600-0447.2001.00292.x
20. Menger V, Scheepers F, Spruit M. Comparing deep learning and classical machine learning approaches for predicting inpatient violence incidents from clinical text. Applied Sciences. 2018;8(6):981.
21. Camus D, Dan Glauser ES, Gholamrezaee M, Gasser J, Moulin V. Factors associated with repetitive violent behavior of psychiatric inpatients. Psychiatry Res. 2021;296:113643. Epub 2020/12/23. doi: 10.1016/j.psychres.2020.113643
22. Weltens I, Bak M, Verhagen S, Vandenberk E, Domen P, van Amelsvoort T, et al. Aggression on the psychiatric ward: Prevalence and risk factors. A systematic review of the literature. PloS one. 2021;16(10):e0258346. Epub 2021/10/09. doi: 10.1371/journal.pone.0258346
23. McIvor L, Payne-Gill J, Beck A. Associations between violence, self-harm and acute psychiatric service use: Implications for inpatient care. J Psychiatr Ment Health Nurs. 2022. Epub 2022/09/08. doi: 10.1111/jpm.12872
24. Schlup N, Gehri B, Simon M. Prevalence and severity of verbal, physical, and sexual inpatient violence against nurses in Swiss psychiatric hospitals and associated nurse-related characteristics: Cross-sectional multicentre study. Int J Ment Health Nurs. 2021;30(6):1550–63. Epub 2021/07/02. doi: 10.1111/inm.12905
25. Brown S, O’Rourke S, Schwannauer M. Risk factors for inpatient violence and self-harm in forensic psychiatry: the role of head injury, schizophrenia and substance misuse. Brain injury. 2019;33(3):313–21. Epub 2018/12/07. doi: 10.1080/02699052.2018.1553064
26. Menger V, Spruit M, van Est R, Nap E, Scheepers F. Machine Learning Approach to Inpatient Violence Risk Assessment Using Routinely Collected Clinical Notes in Electronic Health Records. JAMA network open. 2019;2(7):e196709. Epub 2019/07/04. doi: 10.1001/jamanetworkopen.2019.6709
27. Girardi A, Hancock-Johnson E, Thomas C, Wallang PM. Assessing the Risk of Inpatient Violence in Autism Spectrum Disorder. The journal of the American Academy of Psychiatry and the Law. 2019;47(4):427–36. Epub 2019/09/27. doi: 10.29158/JAAPL.003864-19
28. Huitema A, Verstegen N, de Vogel V. A Study Into the Severity of Forensic and Civil Inpatient Aggression. Journal of interpersonal violence. 2021;36(11–12):NP6661–NP6679. Epub 2018/12/12. doi: 10.1177/0886260518817040
29. Fazel S, Toynbee M, Ryland H, Vazquez-Montes M, Al-Taiar H, Wolf A, et al. Modifiable risk factors for inpatient violence in psychiatric hospital: prospective study and prediction model. Psychological medicine. 2021;53(2):1–7. Epub 2021/05/25. doi: 10.1017/S0033291721002063
30. Lockertsen Ø, Varvin S, Færden A, Vatnar SKB. Short-term risk assessments in an acute psychiatric inpatient setting: A re-examination of the Brøset Violence Checklist using repeated measurements—Differentiating violence characteristics and gender. Arch Psychiatr Nurs. 2021;35(1):17–26. Epub 2021/02/18. doi: 10.1016/j.apnu.2020.11.003
31. Ægisdóttir S, White MJ, Spengler PM, Maugherman AS, Anderson LA, Cook RS, et al. The meta-analysis of clinical judgment project: Fifty-six years of accumulated research on clinical versus statistical prediction. The Counseling Psychologist. 2006;34(3):341–82.
32. Teo AR, Holley SR, Leary M, McNiel DE. The relationship between level of training and accuracy of violence risk assessment. Psychiatric services. 2012;63(11):1089–94. doi: 10.1176/appi.ps.201200019
33. Eaton S, Ghannam M, Hunt N. Prediction of violence on a psychiatric intensive care unit. Medicine, Science and the Law. 2000;40(2):143–6. doi: 10.1177/002580240004000210
34. Skeem JL, Monahan J. Current directions in violence risk assessment. Current directions in psychological science. 2011;20(1):38–42.
35. Daffern M. The predictive validity and practical utility of structured schemes used to assess risk for aggression in psychiatric inpatient settings. Aggression and Violent Behavior. 2007;12(1):116–30.
36. Douglas KS, Guy LS, Reeves KA, Weir J. HCR-20 violence risk assessment scheme: Overview and annotated bibliography. 2005.
37. Monahan J, Steadman HJ, Robbins PC, Silver E, Appelbaum PS, Grisso T, et al. Developing a clinically useful actuarial tool for assessing violence risk. The British Journal of Psychiatry. 2000;176(4):312–9. doi: 10.1192/bjp.176.4.312
38. Anderson KK, Jenson CE. Violence risk–assessment screening tools for acute care mental health settings: Literature review. Archives of psychiatric nursing. 2019;33(1):112–9. doi: 10.1016/j.apnu.2018.08.012
39. Quinsey VL, Harris GT, Rice ME, Cormier CA. Violent offenders: Appraising and managing risk. American Psychological Association; 2006.
40. Webster C, Douglas K, Eaves D, Hart S. HCR-20: Assessing risk for violence (Version 2). Burnaby, British Columbia, Canada: Mental Health, Law, and Policy Institute, Simon Fraser University; 1997.
41. Borum R, Lodewijks HP, Bartel PA, Forth AE. The Structured Assessment of Violence Risk in Youth (SAVRY). 2021.
42. Fazel S, Singh JP, Doll H, Grann M. Use of risk assessment instruments to predict violence and antisocial behaviour in 73 samples involving 24 827 people: systematic review and meta-analysis. BMJ. 2012;345. doi: 10.1136/bmj.e4692
43. Mistler LA, Friedman MJ. Instruments for Measuring Violence on Acute Inpatient Psychiatric Units: Review and Recommendations. Psychiatric services (Washington, DC). 2022;73(6):650–7. Epub 2021/09/16. doi: 10.1176/appi.ps.202000297
44. Yang M, Wong SC, Coid J. The efficacy of violence prediction: a meta-analytic comparison of nine risk assessment tools. Psychological bulletin. 2010;136(5):740. doi: 10.1037/a0020473
45. Gardner W, Lidz CW, Mulvey EP, Shaw EC. A comparison of actuarial methods for identifying repetitively violent patients with mental illnesses. Law and Human Behavior. 1996;20(1):35–48.
46. Viljoen JL, McLachlan K, Vincent GM. Assessing violence risk and psychopathy in juvenile and adult offenders: A survey of clinical practices. Assessment. 2010;17(3):377–95. doi: 10.1177/1073191109359587
47. Lee CH, Yoon H-J. Medical big data: promise and challenges. Kidney research and clinical practice. 2017;36(1):3. doi: 10.23876/j.krcp.2017.36.1.3
48. Murdoch TB, Detsky AS. The inevitable application of big data to health care. JAMA. 2013;309(13):1351–2. doi: 10.1001/jama.2013.393
49. Miotto R, Li L, Kidd BA, Dudley JT. Deep patient: an unsupervised representation to predict the future of patients from the electronic health records. Scientific reports. 2016;6(1):1–10.
50. Bjarnadottir RI, Bockting W, Yoon S, Dowding DW. Nurse documentation of sexual orientation and gender identity in home healthcare: a text mining study. CIN: Computers, Informatics, Nursing. 2019;37(4):213–21. doi: 10.1097/CIN.0000000000000492
51. Hyun S, Cooper C. Application of Text Mining to Nursing Texts: Exploratory Topic Analysis. CIN: Computers, Informatics, Nursing. 2020;38(10):475–82. doi: 10.1097/CIN.0000000000000681
52. Liao P-H, Chu W, Chu W-C. Evaluation of the mining techniques in constructing a traditional Chinese-language nursing recording system. CIN: Computers, Informatics, Nursing. 2014;32(5):223–31. doi: 10.1097/CIN.0000000000000051
53. Reimers N, Gurevych I. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084. 2019.
54. Mathuria M. Decision tree analysis on J48 algorithm for data mining. International Journal of Advanced Research in Computer Science and Software Engineering. 2013;3(6).
55. Breiman L. Random forests. Machine learning. 2001;45(1):5–32.
56. Bhargava N, Dayma S, Kumar A, Singh P, editors. An approach for classification using simple CART algorithm in WEKA. 2017 11th International Conference on Intelligent Systems and Control (ISCO); 2017: IEEE.
57. Peterson LE. K-nearest neighbor. Scholarpedia. 2009;4(2):1883.
58. Akobeng AK. Understanding diagnostic tests 1: sensitivity, specificity and predictive values. Acta paediatrica. 2007;96(3):338–41. doi: 10.1111/j.1651-2227.2006.00180.x
59. Fowler JR, Gaughan JP, Ilyas AM. The sensitivity and specificity of ultrasound for the diagnosis of carpal tunnel syndrome: a meta-analysis. Clinical Orthopaedics and Related Research®. 2011;469(4):1089–94. doi: 10.1007/s11999-010-1637-5
60. Klerx-Van Mierlo F, Bogaerts S. Vulnerability factors in the explanation of workplace aggression: The construction of a theoretical framework. Journal of forensic psychology practice. 2011;11(4):265–92.
61. Higgins N, Watts D, Bindman J, Slade M, Thornicroft G. Assessing violence risk in general adult psychiatry. Psychiatric Bulletin. 2005;29(4):131–3.
62. Singh JP, Grann M, Fazel S. A comparative study of violence risk assessment tools: A systematic review and metaregression analysis of 68 studies involving 25,980 participants. Clinical psychology review. 2011;31(3):499–513. doi: 10.1016/j.cpr.2010.11.009
63. Lee P-J, Hu Y-H, Lu K-T. Assessing the helpfulness of online hotel reviews: A classification-based approach. Telematics and Informatics. 2018;35(2):436–45.
64. Lin K, Hu Y, Kong G. Predicting in-hospital mortality of patients with acute kidney injury in the ICU using random forest model. International journal of medical informatics. 2019;125:55–61. doi: 10.1016/j.ijmedinf.2019.02.002
65. Cacheda F, Fernandez D, Novoa FJ, Carneiro V. Early detection of depression: social network analysis and random forest techniques. Journal of medical Internet research. 2019;21(6):e12554. doi: 10.2196/12554
66. Hu Y-H, Chen K, Chang I-C, Shen C-C. Critical predictors for the early detection of conversion from unipolar major depressive disorder to bipolar disorder: nationwide population-based retrospective cohort study. JMIR medical informatics. 2020;8(4):e14278. doi: 10.2196/14278
67. Ma J, Xu W, Sun Y-h, Turban E, Wang S, Liu O. An ontology-based text-mining method to cluster proposals for research project selection. IEEE transactions on systems, man, and cybernetics-part a: systems and humans. 2012;42(3):784–90.
68. Sun W, Cai Z, Li Y, Liu F, Fang S, Wang G. Data processing and text mining technologies on electronic medical records: a review. Journal of healthcare engineering. 2018;2018. doi: 10.1155/2018/4302425
69. Zhang M-y, Lu Z-d, Zou C-y. A Chinese word segmentation based on language situation in processing ambiguous words. Information Sciences. 2004;162(3–4):275–85.
70. Hatch-Maillette MA, Scalora MJ, Bader SM, Bornstein BH. A gender-based incidence study of workplace violence in psychiatric and forensic settings. Violence and victims. 2007;22(4):449–62. doi: 10.1891/088667007781553982
71. Caldwell MF. Incidence of PTSD among staff victims of patient violence. Psychiatric Services. 1992;43(8):838–9. doi: 10.1176/ps.43.8.838
72. Needham I, Abderhalden C, Halfens RJ, Dassen T, Haug HJ, Fischer JE. The effect of a training course in aggression management on mental health nurses’ perceptions of aggression: a cluster randomised controlled trial. International journal of nursing studies. 2005;42(6):649–55. Epub 2005/06/29. doi: 10.1016/j.ijnurstu.2004.10.003

Decision Letter 0

Qin Xiang Ng

13 Feb 2023

PONE-D-23-00580

Predicting Aggression in Psychiatric Inpatients Using Text Mining of Chinese Electronic Nursing Records

PLOS ONE

Dear Dr. Shen,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Mar 30 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Qin Xiang Ng, MD, MPH

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

3. Please amend your manuscript to include your abstract after the title page.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: No

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thank you for the work to keep hospitals safe. My interpretation is that this is a technical paper with a proof of concept:

- The authors seem to have used the terms "aggression" and "violence" interchangeably. Researchers generally define violence as aggression intended to cause extreme physical harm (e.g., injury, death). Thus, all violent acts are aggressive, but not all aggression is violent.

- Wouldn't a past history of aggression be the most significant predictor of future aggression?

- Could the authors provide a text sample, assessed by the psychiatrists and by the software, showing what constituted a positive example of "aggression"?

- I would also like to know the rationale for not including "verbal aggression" in the definition of aggression.

- Could the authors give explanations as to why the text mining failed at detecting aggression (AUC of approximately only 0.65)?

Reviewer #2: In this work, several text mining and machine learning techniques were used to analyze the tested data. The authors estimate the physical aggression rate for psychiatric inpatients and establish a predictive model for aggression in psychiatric inpatients. The paper's contribution to existing knowledge in this research field is not well justified. The authors mentioned some recent techniques, but the paper needs to address the motivation for developing another method. The paper needs to contribute more, and the following points can improve the manuscript.

1. The title can be improved.

2. A comparative study can be added to the BACKGROUND section in table form to show the recent efforts.

3. Figure 1 should be improved.

4. The novelty of this work is not clear. Clarify this.

5. Performance evaluation metrics are not enough. Add some other metrics and explain them mathematically.

6. The proposed method should be compared with more recent techniques.

7. Tabular data should be presented with the graphs.

8. There should be some discussion on the limitations of the methods presented in a separate section.

9. References should be updated; there are no references from 2022.

10. The manuscript organization should be improved.

11. Improve the English of the work. There are too many problems with paper typesetting.

12. Change the “Conclusion” section title to “conclusion and future directions” and add more discussion and future directions to the research.

13. The paper is unsuitable for acceptance in its current form. The article needs rewriting to address the comments mentioned above.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2023 Jun 7;18(6):e0286347. doi: 10.1371/journal.pone.0286347.r002

Author response to Decision Letter 0


1 May 2023

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Response: Thank you for your message. We have made every effort to ensure that our manuscript meets PLOS ONE's style requirements, including the file naming conventions. If there are any specific areas that need further attention, please let us know and we will be happy to make the necessary revisions.

2. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

Response: Thank you for your message and for providing us with information about PLOS ONE's data policy. We apologize for not specifying where the minimal data set underlying the results described in our manuscript can be found in the initial submission. We have carefully reviewed our data and are pleased to inform you that we will upload our study's minimal underlying data set as Supporting Information files upon re-submitting our revised manuscript. We understand the importance of making the minimal data set fully available, and we will ensure that any potentially identifying patient information is fully anonymized. Thank you again for your feedback and for your assistance in ensuring that our manuscript meets the necessary requirements.

3. Please amend your manuscript to include your abstract after the title page.

Response: We have added our abstract after the title page.

Reviewers' comments:

Reviewer #1:

1. Thank you for your work to keep hospitals safe. My interpretation is that this is a technical paper presenting a proof of concept.

Response: We are very grateful for the reviewers' suggestions to make our article more suitable for publication, and we have done our best to modify the article according to the reviewers' suggestions.

2. The authors seem to have used the terms "aggression" and "violence" interchangeably. Researchers generally define violence as aggression intended to cause extreme physical harm (e.g., injury, death). Thus, all violent acts are aggressive, but not all aggression is violent.

Response: Thank you for your insightful comments on our manuscript. We appreciate your attention to detail and have carefully reviewed the use of the terms "aggression" and "violence" in our manuscript. As you have correctly pointed out, aggression and violence are two distinct concepts. Aggression is generally considered a behavior that is intended to cause physical or psychological harm to another person. Violence, on the other hand, refers to any behavior that causes physical harm to another person, such as hitting, punching, or using a weapon. Violence is more likely to refer to an extreme form of aggression that has intentional injury as its primary goal. We agree that it is important to distinguish between violence and aggression, particularly in the context of psychiatric inpatients. We will revise our manuscript to clarify the difference between aggression and violence and ensure that these terms are used appropriately throughout the manuscript.

However, we would like to mention that in previous studies related to aggression and violence in psychiatric inpatients, there has been a lack of consistency in the use of these terms. Some studies have used "aggression" as the main topic, while others have used "violence." Moreover, some studies have used both terms without clear definitions, leading to confusion in the interpretation of the results. In our manuscript, when referring to previous studies, we have used the descriptions consistent with the original research papers. Additionally, in the Discussion section of our paper, we will highlight the potential impact of using either "aggression" or "violence" in research, as it may influence the interpretation of results. We appreciate your feedback and will ensure that our manuscript is clear and consistent in its use of these terms.

3. Wouldn't a past history of aggression be the most significant predictor of future aggression?

Response: Thank you for your feedback and suggestion. Indeed, a past history of violence is an important factor in predicting future violent behavior, and it is one of the factors we considered in our study. Consistent with previous studies, our study showed that the violent group had a more violent history than the nonviolent group. As the reviewer noted, a past history of violence is indeed an important predictor of future violent behavior.

4. Could the authors provide a text sample, assessed by the psychiatrists and the software, showing what constituted a positive example of "aggression"?

Response: Thank you for your inquiry regarding a text sample that demonstrates positive examples of violence. As requested, we would like to provide an example of a nursing record that was assessed by the psychiatrists as a positive example of inpatient violence:

"After toileting, the patient attempted to climb onto another patient's bed. Despite being reminded of their correct bed location, the patient persisted and even attempted to kick the other patient lying on the bed. Verbal intervention was used to try to stop the patient, but the patient then physically attacked the security staff." This incident was considered an example of inpatient violence in our study.

We appreciate your suggestion and have added this example to our manuscript to provide a clearer illustration of the criteria used to define inpatient violence (page 7, lines 105-109).

5. I also wanted to know the rationale behind not including "verbal aggression" in the definition of aggression.

Response: Thank you for your question regarding the exclusion of verbal aggression in our definition of violence. Our rationale behind this decision is that verbal aggression may not always be recorded in nursing records due to its lower severity compared to physical aggression and destruction. Since our study relied on reviewing the content of nursing records to determine the occurrence of violence, including verbal aggression in our definition may lead to an underestimation of its prevalence and consequently affect the predictive outcomes.

6. Could the authors explain why the text mining performed poorly at detecting aggression (an AUC of only approximately 0.65)?

Response: There are several possible reasons why text mining performed poorly at detecting violence. First, Chinese is a complex language in which the same word can have multiple meanings, which can make it difficult for text mining algorithms to accurately identify the intended meaning of the text. Second, medical terminology in Chinese can be ambiguous, with different terms used to describe similar symptoms or conditions. This ambiguity can confuse text mining algorithms, leading to inaccurate or incomplete results. Third, Chinese EMRs may not be standardized, meaning that patient information is not recorded consistently. This can make it difficult for text mining algorithms to accurately detect violence from the EMR data. We have included these explanations in the revised version of our manuscript (page 21, lines 345-353).
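For readers less familiar with the metric, the AUC of about 0.65 discussed in this exchange can be read as the probability that a randomly chosen violent patient receives a higher predicted risk score than a randomly chosen nonviolent patient, with ties counted as one half; 0.5 corresponds to chance. The sketch below computes AUC directly from that pairwise (Mann-Whitney) definition. The function name and the scores and labels are hypothetical illustrative values, not data or code from the study.

```python
from itertools import product


def pairwise_auc(scores, labels):
    """ROC AUC via its pairwise (Mann-Whitney) interpretation.

    scores: predicted risk scores (higher = more likely violent)
    labels: 1 for violent, 0 for nonviolent
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0  # violent patient correctly ranked above nonviolent one
        elif p == n:
            wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical scores for 3 violent and 4 nonviolent patients
scores = [0.9, 0.4, 0.6, 0.5, 0.3, 0.7, 0.2]
labels = [1, 1, 1, 0, 0, 0, 0]
print(round(pairwise_auc(scores, labels), 3))  # → 0.75
```

Here 9 of the 12 violent/nonviolent pairs are ranked correctly, giving 0.75; an AUC near 0.65 means roughly two in three such pairs are ordered correctly.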

Reviewer #2:

1. In this work, several text mining and machine learning techniques were used to analyze the tested data. The authors estimate the physical aggression rate for psychiatric inpatients and establish a predictive model for aggression in psychiatric inpatients. The paper's contribution to existing knowledge in this research field is not well justified. The authors mentioned some recent techniques, but the paper needs to address the motivation for developing another method. The paper needs to contribute more, and the following points can improve the manuscript.

Response: We are very grateful for the reviewers' suggestions to make our article more suitable for publication, and we have done our best to modify the article according to the reviewers' suggestions.

2. The title can be improved.

Response: We appreciate the reviewer's constructive suggestion for improving our work. We have revised the title of our manuscript to "An Analysis of Chinese Nursing Electronic Medical Records to Predict Violence in Psychiatric Inpatients using Text Mining and Machine Learning Techniques." We believe that this new title accurately reflects the focus and contributions of our research, and we hope it will better engage potential readers.

3. A comparative study can be added to the BACKGROUND section in table form to show the recent efforts.

Response: Thank you for your valuable feedback. We appreciate your suggestion to include a comparative study in table form in the background section to showcase recent efforts in this area. As per your suggestion, we have added a table (Table 1) comparing the prevalence of violence and associated risk factors across various studies conducted in the last three years. The table will help readers understand the similarities and differences among the studies and highlight the research gaps. We hope this addition will enhance the quality of our manuscript, and we thank you once again for your insightful feedback. The proposed title for the table is: "Comparative Analysis of Prevalence and Risk Factors of Violence in Psychiatric Inpatients: A Review of Recent Studies (2019-2022)."

4. Figure 1 should be improved.

Response: Thank you for reviewing our manuscript. We appreciate your feedback regarding Figure 1. We have made significant improvements to Figure 1. We have modified the figure to make it more visually appealing and easier to understand. The revised figure is presented below.

5. The novelty of this work is not clear. Clarify this.

Response: Thank you for your comments on our paper. We appreciate the opportunity to address your concerns.

Regarding the novelty of our work, we believe that our study makes several important contributions. Firstly, we utilized text mining and machine learning techniques to predict violence in psychiatric inpatients based on nursing electronic medical records. To the best of our knowledge, this is one of the first studies to apply these techniques specifically to Chinese nursing electronic medical records.

Secondly, our study expands the current literature on violence prediction in psychiatric inpatients, as previous studies have mainly focused on clinical variables, such as demographics, diagnoses, and medication use. Our study demonstrates the potential of using nursing records, which contain rich behavioral and contextual information, as a valuable source for predicting aggression.

Finally, our study also contributes to the field of mental health care by providing a potential tool for early detection and prevention of violence in psychiatric inpatients. This can improve patient safety and well-being, as well as reduce the burden on mental health care providers.

We hope that this clarifies the novelty of our work. Please let us know if you have any further questions or concerns.

6. Performance evaluation metrics are not enough. Add some other metrics and explain them mathematically.

Response: Thank you for your valuable feedback. We agree that additional performance evaluation metrics can provide a more comprehensive understanding of the model's effectiveness. To further enhance our analysis, we have added three metrics, namely precision, recall, and F1-score (page 13, lines 198-200), and have provided the equations for these metrics (page 13, lines 201-205). For details, please refer to the fifth paragraph of the Prediction Model Assessment section (pages 11-13).
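The manuscript's equations are not reproduced in this response. The standard definitions, with violence as the positive class, are precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 × precision × recall / (precision + recall), where TP, FP, and FN are true positives, false positives, and false negatives. The sketch below computes them from confusion-matrix counts; the function name and the counts are hypothetical, not results from the study.

```python
def classification_metrics(tp, fp, fn):
    """Precision, recall and F1 for the positive (violent) class.

    tp: violent patients correctly flagged
    fp: nonviolent patients incorrectly flagged
    fn: violent patients the model missed
    A metric whose denominator is zero is reported as 0.0.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        # Harmonic mean of precision and recall
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical confusion-matrix counts, not results from the study
p, r, f = classification_metrics(tp=30, fp=20, fn=10)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
# → precision=0.60 recall=0.75 f1=0.67
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are reasonably high, which matters when violent patients are a minority class.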

7. The proposed method should be compared with more recent techniques.

Response: In response to this comment, we have included two more recent text mining and machine learning techniques: sentence-BERT and boosted random forest. Sentence-BERT, or SBERT, is a powerful technique for generating text embeddings, which are numerical representations of the meaning and semantic content of a given piece of text. One of the key strengths of SBERT is its ability to capture nuances of language and subtleties of meaning that are often lost in other embedding techniques. This is achieved through Siamese neural networks, which are trained so that the embeddings of two semantically similar sentences lie close together in vector space, while the embeddings of dissimilar sentences lie far apart. This allows SBERT to generate highly discriminative embeddings that can be used for a wide range of natural language processing tasks, such as text classification, semantic search, and information retrieval.
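Embedding methods such as SBERT typically measure how close two texts are through the cosine similarity of their embedding vectors, where values near 1 indicate similar meaning. The sketch below shows only that comparison step. The three short vectors are hypothetical stand-ins for real SBERT embeddings, which have hundreds of dimensions and would come from a pretrained model (for example via the sentence-transformers library), not from this code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional "embeddings" for three nursing notes
kicked_staff = [0.8, 0.1, 0.3]
hit_guard = [0.7, 0.2, 0.4]
slept_well = [0.1, 0.9, 0.0]

print(round(cosine_similarity(kicked_staff, hit_guard), 3))
print(round(cosine_similarity(kicked_staff, slept_well), 3))
```

With these toy vectors, the two notes describing physical aggression score much closer to each other than to the note about sleep, which is the property the Siamese training objective aims for.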

Boosted random forest (BRF) is a powerful machine learning technique that has several advantages over conventional approaches. One of the key strengths of BRF is its ability to handle complex, high-dimensional data, which are often encountered in real-world applications. BRF achieves this by combining the strengths of two algorithms: random forest (RF) and boosting. RF is a well-known machine learning technique based on an ensemble of decision trees and is effective at handling large datasets with many features. Boosting, on the other hand, combines weak learners into a strong learner, which can improve the overall performance of the model. By combining these two techniques, BRF can generate more accurate and robust predictions than conventional machine learning algorithms.
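The response does not say which boosting variant was used. Assuming an AdaBoost-style scheme with a random forest as the base learner (an assumption, not confirmed by the source), each round reweights the training cases so that the next forest concentrates on cases the previous one misclassified, and each forest receives a vote weight alpha that grows as its weighted error falls. The sketch below shows one such round for labels in {-1, +1} and the weighted vote that combines rounds. The function names are hypothetical, and the base learners' predictions are supplied as plain lists rather than trained here.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One discrete AdaBoost reweighting step for labels in {-1, +1}.

    weights: current case weights (summing to 1)
    y_pred: predictions of this round's base learner (e.g. a random forest)
    Returns (alpha, new_weights), where alpha is the learner's vote weight.
    """
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified cases (t * p == -1) are up-weighted, correct ones down-weighted
    new = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new)
    return alpha, [w / total for w in new]


def boosted_vote(alphas, predictions_per_round):
    """Combine rounds: sign of the alpha-weighted sum (ties broken toward +1)."""
    n = len(predictions_per_round[0])
    scores = [sum(a * preds[i] for a, preds in zip(alphas, predictions_per_round))
              for i in range(n)]
    return [1 if s >= 0 else -1 for s in scores]


# Hypothetical round: 4 patients, the base learner misclassifies the last one
y_true = [1, 1, -1, -1]
alpha, w = adaboost_round([0.25] * 4, y_true, [1, 1, -1, 1])
print(round(alpha, 3), [round(x, 3) for x in w])
```

After this round the single misclassified case carries half of the total weight, so the next base learner is pushed to get it right.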

In the experimental evaluation, we found that the models constructed using TF-IDF text features outperformed those constructed using SBERT embeddings. In addition, boosted random forest exhibited higher predictive performance than the conventional classification techniques. For details, please refer to the "Prediction Model Performance" section.
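For context on the TF-IDF features: TF-IDF weights a term highly when it is frequent within one note but rare across notes, so a word that appears in only a few records outweighs vocabulary shared by every record. Several TF-IDF variants exist and the response does not say which one was used. The sketch below uses the plain form tf(t, d) × log(N / df(t)) and assumes the Chinese notes have already been segmented into words (segmentation is not shown). The function name and the tokens are hypothetical English placeholders.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Plain TF-IDF for a list of tokenized documents.

    tf  = term count in the document / document length
    idf = log(N / number of documents containing the term)
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        if not doc:  # empty note: nothing to weight
            weights.append({})
            continue
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical pre-segmented nursing notes
notes = [
    ["patient", "kick", "staff"],
    ["patient", "calm", "sleep"],
    ["patient", "calm", "kick"],
]
for w in tf_idf(notes):
    print({t: round(v, 3) for t, v in w.items()})
```

"patient" appears in every note and therefore gets weight 0, while "staff", which appears in only one note, gets the highest weight in the first note.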

8. Tabular data should be presented with the graphs.

Response: We appreciate your feedback regarding the presentation of our data. We have carefully considered your suggestion and have now included tabular data alongside the graphs presented in the manuscript (Table 3 and Figure 3).

9. There should be some discussion on the limitations of the methods presented in a separate section.

Response: Thank you for your comments on our paper. We appreciate your suggestion regarding the need to discuss the limitations of our methods in a separate section. We have taken this feedback into consideration and have revised our manuscript accordingly. The revised section is presented below.

“In this study, we used text mining of Chinese EMRs to predict violence in psychiatric inpatients. While our results provide some insights into the potential of this approach, there are several limitations to our study that should be discussed. First, the use of text mining to extract data from EMRs may not capture all relevant information on risk factors for violence in psychiatric inpatients. For example, data on whether admission was voluntary, the number of previous admissions, quality of life, family support, intelligence level, and history of sexual abuse, which are known to be associated with the risk of violence in psychiatric wards [4, 5, 8-20], were not included in our data set. As a result, the accuracy of our predictions may be limited by the absence of these important factors. Second, the accuracy of our predictions may also be limited by the quality and completeness of the EMRs used in our study. It is possible that errors or inconsistencies in documentation could have affected the reliability of our data and our ability to accurately identify risk factors for violence. Third, the use of Chinese EMRs may introduce cultural and linguistic biases that could impact the validity of our predictions. It is important to acknowledge that the cultural and linguistic factors that may influence violence in psychiatric inpatients are complex and may not be fully captured by our text mining approach. Fourth, the EMRs did not clearly specify the methods through which diagnoses were made. Therefore, we could not evaluate the diagnostic accuracy of the psychiatric disorders reported in the EMRs included in our study. Fifth, our study did not include verbal aggression. Therefore, the prevalence of verbal aggression in psychiatric wards and the accuracy of verbal aggression prediction in psychiatric wards using Chinese EMRs warrant further study. 
Finally, the timeframe of our study may have impacted the accuracy of our predictions, as violence in psychiatric inpatients could be influenced by factors that change over time, such as changes in medication or therapy.”

10. References should be updated; there are no references from 2022.

Response: Thank you for your feedback on our manuscript. As per your comment, we have added multiple recent references in the field, such as "Instruments for Measuring Violence on Acute Inpatient Psychiatric Units: Review and Recommendations," published in Psychiatr Serv in June 2022, and "Inpatient violence in a psychiatric hospital in the middle of the pandemic: clinical and community health aspects," published in AIMS Public Health in February 2022. We have also made sure to review and update all our references to ensure they are up-to-date and relevant to our study. We thank you for bringing this to our attention, and we hope that the updated references enhance the quality and rigor of our manuscript.

11. The manuscript organization should be improved.

Response: Thank you for your review of our manuscript. We appreciate your feedback and suggestions for improving the organization of the paper. We have carefully reviewed the manuscript's structure and made necessary changes to ensure that the flow of ideas is clear and logical. We have also reorganized the content to improve its readability and overall coherence. If you have any further suggestions or comments, please do not hesitate to let us know. We are committed to addressing any remaining issues and improving our manuscript to meet your expectations.

12. Improve the English of the work. There are too many problems with paper typesetting.

Response: Thank you for your review of our manuscript. We appreciate your feedback on improving the English of the work and paper typesetting. We take your comments seriously and have worked diligently to improve the language and overall presentation of our manuscript. In addition, we have also sent our manuscript to a professional English editing service to ensure that the language is of the highest quality. We believe that the modifications made based on your feedback and the additional editing from the English editing service have significantly improved the quality and readability of our manuscript. We thank you for bringing these issues to our attention and for helping us to improve our work.

13. Change the “Conclusion” section title to “Conclusion and future directions” and add more discussion and future directions to the research.

Response: Thank you for your valuable feedback on our manuscript. We have made the requested changes to the "Conclusion" section as follows:

We have changed the title of the "Conclusion" section to "Conclusion and Future Directions" to reflect the inclusion of future research directions in this field. In this revised section, we have expanded our discussion of the results and added more future directions for research. The revised section is presented below.

“Violence is a key concern in acute psychiatric wards because it can lead to patient or staff injury and because it is counter-therapeutic. Studies have reported that 75% to 100% of nursing staff who work in acute psychiatric units have experienced patient assault [72, 73]. Aggression toward staff has been reported to contribute to high staff turnover [74]. Given the importance of this problem, predicting which psychiatric inpatients will commit violence is crucial. Therefore, we established a predictive model for violence in psychiatric inpatients, with acceptable accuracy, by using structured and unstructured data obtained from Chinese EMRs and several machine learning techniques. The results supported the feasibility of predicting violent incidents in psychiatric wards by using EMR data collected at the time of admission and indicated that such a method might be incorporated into routine clinical practice to enable early prediction of inpatient violence. Our findings may provide clinicians with a new basis for judging violence risk in psychiatric wards and may enable first-line caregivers to implement appropriate treatment and preventive measures for hospitalized patients at high risk of violence, ultimately improving patient outcomes and staff safety.

Future research directions in this field could include incorporating additional variables, such as admission type, previous admissions, intelligence level, and history of sexual abuse, to improve the accuracy of predictive models for violence. Structured interviews could be used to determine psychiatric diagnoses and investigate the association between psychiatric disorders and the risk of violence in inpatients. Future studies could also explore the prevalence of verbal aggression in psychiatric wards and the accuracy of predicting verbal aggression using EMRs. Furthermore, validation of our model on other populations to determine its generalizability and applicability to different contexts is needed. The effectiveness of different machine learning techniques and prediction models could also be compared to identify the most accurate and efficient method for predicting violence in psychiatric inpatients. Moreover, targeted interventions could be developed and implemented to reduce the risk of violence in psychiatric inpatients identified as high-risk by the model. Finally, long-term outcomes of violence in psychiatric inpatients, such as patient outcomes and staff safety, should be examined to determine the impact of early prediction and intervention on patient care and outcomes.”

14. The paper is unsuitable for acceptance in its current form. The article needs rewriting to address the comments mentioned above.

Response: Thank you for taking the time to review our manuscript and for providing valuable feedback. We appreciate your efforts in helping us improve the quality of our work. We have carefully considered all of your comments and suggestions and have made significant revisions to the manuscript accordingly. We hope that our modifications have addressed your concerns. We believe that the changes we have made have significantly strengthened the manuscript, and we are confident that it now meets the high standards of the journal. Once again, thank you for your help in making our paper better. We hope that our revised manuscript will be acceptable for publication in your esteemed journal.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Qin Xiang Ng

15 May 2023

An Analysis of Chinese Nursing Electronic Medical Records to Predict Violence in Psychiatric Inpatients using Text Mining and Machine Learning Techniques

PONE-D-23-00580R1

Dear Dr. Shen,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Qin Xiang Ng, MBBS, MPH

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #2: The authors have addressed most of my concerns. The paper can be accepted. It is recommended to have a list of abbreviations in table form in the introduction section.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

Acceptance letter

Qin Xiang Ng

30 May 2023

PONE-D-23-00580R1

An Analysis of Chinese Nursing Electronic Medical Records to Predict Violence in Psychiatric Inpatients using Text Mining and Machine Learning Techniques

Dear Dr. Shen:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Qin Xiang Ng

Academic Editor

PLOS ONE

Associated Data


    Supplementary Materials

    S1 Data. CfsSubsetEval (26)-balanced-final.

    (CSV)

    S2 Data. Structured with SBERT (787)-balanced-final.

    (CSV)

    S3 Data. Structured with TF-IDF (648)-final.

    (CSV)

    Attachment

    Submitted filename: Response to Reviewers.docx

    Data Availability Statement

    All relevant data are within the paper and its Supporting information files.

