Abstract
The identification of accounting fraud is an important means of safeguarding the interests of stakeholders and ensuring the long-term development of companies. Traditional methods for identifying accounting fraud rely on manual review and judgment, and therefore lack objectivity and accuracy. To improve the accuracy, efficiency, and objectivity of accounting fraud identification, this article draws on smart city information technology to conduct an in-depth study of data mining algorithms for accounting fraud identification. The article first gives a brief overview of smart cities and information technology, then introduces the basic theory of accounting fraud identification, and finally implements accounting fraud identification with the k-means clustering algorithm. The data are divided into k clusters, and abnormal clusters are identified by examining the characteristics and attributes of each cluster. Compared with traditional rule-based and pattern-based methods, this approach adapts more flexibly to different types and forms of fraud and can discover previously unknown fraud patterns. In the experiment, the electronic data collection, analysis, and retrieval systems on the websites of the Shanghai Stock Exchange and the Shenzhen Stock Exchange were used to collect, as test samples, 641 annual reports and financial characteristics from 2012 to 2021 for 62 listed companies that committed financial statement fraud and 84 companies with no reported financial statement fraud. The results were analyzed in terms of the number of misjudgments, the misjudgment rate, and the ROC curve. The final test results show that, compared with traditional accounting fraud identification methods, the smart city based data mining algorithm reduces the comprehensive misjudgment rate by 3 %.
The conclusion indicates that data mining algorithms based on smart city information technology can help improve the accuracy of accounting fraud identification and improve the objectivity and effectiveness of audits.
Keywords: Smart city informatization, Accounting fraud, Data mining, Identification test
1. Introduction
With the growth of the global economy, the capital markets of many countries have developed considerably over the past century [1]. As capital markets have developed, accounting fraud and the means of identifying it have changed with them. In the past, accounting fraud was relatively easy to identify, but as technology has advanced, it has become increasingly difficult to detect. This causes great difficulty for investors and has become a major problem for regulators. Accounting fraud is a common phenomenon in global capital markets: it not only causes severe economic losses for small and medium shareholders and investment institutions, but can also trigger a credit crisis in the securities market, seriously hindering the healthy development of the capital market as a whole [2]. It is therefore necessary to draw on information technology to strengthen accounting fraud identification and strictly control accounting fraud [3]. At present, with the rapid development of information technology worldwide, the mobility of labor and capital has increased, urbanization has spread to all parts of the world, and the balance between urban and rural populations has changed dramatically, as shown in Fig. 1. For this reason, building smart cities has become a development goal for countries around the world. A smart city combines information and communication technology with Internet technology to find efficient, intelligent ways of improving citizens' quality of life, and smart living initiatives have achieved good results in many countries over the years. Smart city technology offers powerful data mining and pattern recognition capabilities that can uncover patterns and trends hidden in large volumes of data, enabling more accurate identification of accounting fraud.
Smart city technology was chosen for accounting fraud detection because of its practical value: it strengthens data integration and analysis, data mining, and pattern recognition, and can thereby improve the accuracy of accounting fraud identification.
Fig. 1.
Comparison of urban and rural population from 1995 to 2040.
This paper studies data mining algorithms for accounting fraud identification based on smart city information technology. The study is valuable in three respects. First, it can help investors avoid unnecessary investment risks: a good accounting fraud identification method can act as a protective shield for investors, helping them avoid companies that commit accounting fraud and thus avoid economic losses. Second, it can help financial institutions such as banks conduct credit investigations of enterprises. Banks and financial companies use a company's financial statements to decide whether to issue loans; because financial statements cover a wide range of items, evaluating them is an arduous task, and a superior accounting fraud identification method can improve the efficiency of bank staff. Third, it can support regulatory agencies in supervising enterprises. Overseeing the financial situation of a business is a very important but extremely difficult job, and an effective accounting fraud identification method can simplify this work and improve efficiency.
The objective of this study is to explore the use of data mining algorithms based on smart city information technology to identify accounting fraud, in order to improve the accuracy and efficiency of detecting fraudulent behavior. The research questions are how data mining algorithms applied within smart city information technology can help identify accounting fraud, and how the big data resources of smart city information technology can be combined with data mining algorithms to discover new patterns and trends in accounting fraud. The study aims to provide new insights and methods on the relationship between smart city information technology and accounting fraud identification, and thereby more effective fraud detection tools and methods for enterprises and regulatory agencies.
In recent years, although many experts and scholars have devoted considerable effort to research on accounting fraud identification, they have paid little attention to applying this research to smart city information technology. They have discussed the feasibility and research value of information technology for accounting fraud identification only indirectly, without specific conceptual research or practical exploration. The innovation of this article lies in using a clustering algorithm to discover patterns and anomalies in the data in order to identify possible fraudulent behavior. The k-means clustering algorithm can help accountants analyze and mine large amounts of data quickly, improving the efficiency and accuracy of fraud identification. Cluster analysis of large datasets allows abnormal patterns to be identified more quickly, so that timely measures can be taken to prevent and combat fraudulent behavior.
The research contributions of this article are as follows.
1. Integrated application of smart city information technology: This article shows how smart city information technology can be integrated into accounting fraud identification, a relatively new application direction that brings new technological means to the accounting field.
2. Application of data mining algorithms in accounting fraud identification: This article examines in depth the application of data mining algorithms, in particular the k-means clustering algorithm, to accounting fraud identification, introducing new technical methods to the accounting field.
3. Empirical comparison with traditional methods: The article compares the smart city based data mining algorithm with traditional accounting fraud identification methods through experiments. The results show that the comprehensive misjudgment rate of the smart city based data mining algorithm is 3 % lower, indicating the effectiveness and advantages of the new method.
4. Improved objectivity and accuracy in identifying accounting fraud: The new method helps enhance the accuracy of accounting fraud identification and improves audit objectivity and effectiveness, which is significant for safeguarding the interests of stakeholders and ensuring the long-term development of companies.
Regarding the structure of this article, Chapter 1 is the introduction, which explains the research background, objectives, significance, and innovations. Chapter 2 is a literature review, which reviews in detail the domestic and international literature on financial fraud identification models and smart city information technology and summarizes the state of research. Chapter 3 introduces the impact of smart city information technology on accounting fraud, including an overview of smart cities and information technology, the basic theory of accounting fraud identification, and data mining algorithms. Chapter 4 presents the accounting fraud identification test, in which the effectiveness of accounting fraud identification is analyzed in terms of the number of misjudgments, the misjudgment rate, and the ROC curve. Chapter 5 discusses the experimental structure, research methods, contributions, and impact of this article. Chapter 6 presents the research conclusions and outlook, summarizing the conclusions of this article and proposing future directions for research on financial fraud identification models.
2. Related work
Accounting is the process of recording, analyzing, and reporting financial information, and accounting fraud refers to fraudulent behavior that occurs in this process. With the development of intelligent technology, research on accounting fraud identification has made great progress. Bao Y developed a state-of-the-art fraud prediction model using ensemble machine learning and introduced a new performance evaluation metric for fraud prediction models; the proposed model, based on financial data, was found to perform substantially better than benchmark logistic regression and support vector machine models [4]. Eko EU proposed a financial fraud identification model based on descriptive statistics and ordinary least squares, and the results show that applying this model significantly enhances the detection and prevention of fraud in the banking system [5]. Using data from Chinese listed companies, Lu W constructed a modified nine-variable M-score model with Wald-based logistic regression. The results showed that indicators such as gross profit margin, fixed asset depreciation rate, equity concentration, and audit opinion can characterize financial fraud by Chinese listed companies [6]. Li C introduced two variables based on customer accounting information, namely the difference between supplier sales growth and customer purchase growth, and customer excess purchases, and showed that they can predict supplier revenue fraud. A series of cross-sectional tests found that adding these two variables to the Dechow, Ge, Larson, and Sloan model improves the accuracy of fraud prediction [7]. Daraojimba R E used a systematic literature review to analyze peer-reviewed articles and case studies published from 2015 to 2022.
The key findings reveal that financial fraud has been transformed in the digital age, becoming increasingly complex and making use of advanced technologies, and that integrating artificial intelligence and predictive analytics is expected to significantly improve fraud detection capabilities [8]. Although current research on accounting fraud identification has helped improve the objectivity and comprehensiveness of auditing, its identification accuracy still falls short of current financial audit needs.
Smart city information technology is intelligent and efficient [9,10], and it has achieved good results in research on urban development and on construction in specialized fields. Camero A discussed the application of information technology in building smart cities and proposed a data analysis technique that allows smart cities to be analyzed from an objective, data-based perspective [11]. Cui Y proposed a privacy protection scheme, attribute-based broadcast encryption (ABBE), to address the serious problem of personal privacy leakage in smart cities; the scheme ensures data confidentiality while also protecting personal privacy [12]. In the context of smart cities, Lakhno V proposed a mathematical model for determining the best ways to invest in information technology and systems management. He simulated the results to validate the model's operability and adequacy, and then applied it to assess several strategic tasks related to investment in IT and smart city systems [13]. Sun M summarized the benefits of relevant IT for building a big data platform for smart cities and, on this basis, developed a model for assessing the degree of smart city development, integrating big data and blockchain technologies in line with the smart city concept. He then compared the evaluation model longitudinally against information data from 2012 to 2017. The results show that the scale of smart cities has grown at an average annual rate of more than 30 %, saving 20 % in the allocation of urban resources, and that smart cities are emerging as a new pillar industry [14]. Governments must communicate with citizens in order to provide public services, and social media is the most convenient means of communication, unconstrained by geographical boundaries.
Ridwan K analyzed where Messenger stores its data, drawing on smart city information technology, to obtain the required data for further analysis in forensic analysis software; the study found that WhatsApp Messenger performed best on every operating system platform [15]. Turgel I presented a retrospective analysis of how the concept of the smart city has shifted and proposed a method for protecting the organizational management environment within the framework of introducing smart city technologies. He examined the main causes of urban environmental pollution, identified structural problems in the environmental sector, and proposed smart city technology as a solution [16]. In summary, the combination of information technology and smart cities has attracted the interest of many scholars and has developed into a new area of study in recent years. Information technology has also been applied successfully in smart cities, for example in social software for communication between government and residents and in building smart city big data platforms. However, there are few studies on accounting fraud identification using smart city information technology. The performance and accuracy of current data mining algorithms for identifying accounting fraud still need improvement; in particular, their efficiency and accuracy must be further optimized when dealing with large-scale, high-dimensional data. It is therefore necessary to conduct practical research on accounting fraud identification using smart city information technology in order to advance the field. This paper studies data mining algorithms for identifying accounting fraud through smart city information technology, which can identify the accounting fraud status of enterprises objectively and accurately and provide effective support for audit decision-making.
The current research on accounting fraud identification methods faces many challenges and limitations, as shown in Table 1.
Table 1.
Limitations of current research.
| Source | Research method | Limitations |
|---|---|---|
| Bao Y [4] | Machine ensemble learning | High requirements for data quality and feature selection; poor model interpretability |
| Eko EU [5] | Descriptive statistics and ordinary least squares | Often fails to capture hidden patterns and anomalies in accounting fraud identification; outliers in the data reduce identification accuracy |
| Lu W [6] | Wald-based logistic regression | Neglects nonlinear relationships and interactions among accounting fraud variables |
Table 1 shows that existing approaches based on machine ensemble learning, on descriptive statistics and ordinary least squares, and on Wald-based logistic regression all have certain limitations.
This article presents a clustering-analysis data mining algorithm for identifying accounting fraud based on smart city information technology. Clustering can discover hidden patterns and group structures in the data without pre-labeled training samples, revealing similarities and differences between samples. It can therefore help identify fraud patterns and abnormal behaviors in the data, providing a powerful tool for fraud identification.
3. The impact of smart city information technology on accounting fraud
3.1. Overview of smart city and information technology
3.1.1. Smart city concept
The smart city is a new concept in urban development today and has become a trend across many industries. A smart city builds on the digital city by integrating a new generation of information and communication technologies into urban construction. Through intelligence and interconnection, smart cities use new technologies to perceive, integrate, transmit, process, and analyze urban resource information in an orderly manner, so as to achieve harmony between cities and people [17]. At the same time, they perceive the production activities of society and residents in a timely and effective way, which strengthens urban planning and construction, improves the efficiency of urban management, and raises people's life satisfaction. In this way, cities ultimately develop along an intelligent, healthy, and convenient path.
3.1.2. Features of smart city
The construction of a smart city takes new information technologies such as big data, the Internet of Things, and the mobile Internet as its core applications, which reflects the city's intelligence. Smart cities differ from traditional cities in the following respects [18].
First, smart cities are perceptive in all aspects. Smart cities strengthen urban construction through various sensing technologies and information and communication technologies. For example, sensing technology is used to perceive and identify information such as the state of the city and residents' daily lives, in order to build a dynamic urban database and an integrated management platform [19]. The dynamic urban database analyzes and fuses data on the urban economy, urban planning, residents' lives, industrial conditions, and other aspects.
Second, smart cities are self-developing. They are centered on the development of urban industries and continuous innovation, so all aspects of a smart city have some capacity for self-optimization and self-adjustment. In particular, driven by information technology and the Internet, smart cities have readjusted their industrial structure and enabled the rapid development of new information technologies, thereby promoting the birth of emerging industries and strengthening the internal driving force of cities [20].
Third, smart cities are closely interconnected. The network acts as the brain of a smart city. By building an urban network, the city's application systems can be connected so that its information and resources are well integrated and shared among people, which brings great convenience to daily life. This also allows urban problems to be solved effectively and raises the level of intelligent urban development.
The development of smart cities can be assessed through smart city models. The smart city model includes six characteristics of smart residents, government, transportation, environment, economy, and life, as shown in Fig. 2.
Fig. 2.
Six characteristic attributes of a smart city model.
3.1.3. Information technology from the perspective of smart city
The development of information and communication technology has brought new opportunities for urban development. Urban informatization is the general trend for cities seeking to integrate better into the world economy. Through various information technologies, urban construction, economic development, services, and other areas are reshaped, accelerating urban social development [21]. Urban informatization digitizes urban information through various information technologies, manages, processes, and analyzes data resources intelligently, and improves urban interoperability and sharing at both the technical and the institutional level. Urban informatization therefore needs to reach every corner of the city, and it is also the main way to integrate all sectors.
The smart city is built on urban informatization. Urban informatization provides the basic foundation for building a smart city, and under the "smart city" concept, urban informatization becomes more complete. Informatization links the city, its people, and information resources, and embodies the wisdom and humanity of urban planning and construction. The smart city is an advanced form that fully absorbs information technology and human wisdom, and it represents the aspiration of urban development.
3.2. Basic theory of accounting fraud identification
3.2.1. Definition of accounting fraud
There are many different views on the definition of accounting fraud. Many scholars regard accounting fraud as fraud by management, meaning that the company deliberately prepares false financial reports and conceals information from, or deceives, users of those reports by falsifying important matters that should be disclosed. Such fraud takes two forms. The first is the misappropriation of assets, in which the perpetrators misappropriate company assets and try to avoid disclosing this in the financial reports. The second is financial reporting fraud: to make the company appear more credible and reputable in the capital market, the perpetrators prepare false financial reports that conceal information from and deceive investors and other report users, and use them to interfere with investment decisions for illegal ends. This second form, in which the financial statements are deliberately misrepresented, is what is referred to as accounting fraud. Neither form fairly reflects the company's real development and financial status in the capital market.
3.2.2. The theory of the causes of accounting fraud
The causes of accounting fraud have long been studied by scholars abroad. Since the 1950s, many well-known theories of the causes of accounting fraud have been proposed. Among them, the longest-standing and most influential are the Iceberg Theory, the Triangle Theory, and the GONE Theory.
(1) Iceberg theory
In the iceberg theory, the fraud risks of a company are divided into two broad categories. Put simply, a fraudulent company is like an iceberg. The part above sea level represents visible risk, the first category: information about the company's organization and structure, which the public can detect. This type of risk accounts for only a small part. The second category is the part below sea level: risk that is not easily monitored by the public, including management's business philosophy and attitude towards fraud. This second type of risk is the decisive factor and accounts for most of the overall risk.
(2) Triangle theory
The triangle theory likens accounting fraud to a triangle formed by three elements, as shown in Fig. 3. The first element is pressure, which creates the internal motivation to commit fraud. Pressure comes from many sources, one of which is financial pressure. Much of it originates internally, from managers who worry about their limited prospects for advancement in the company; under such pressure, management can easily be led to commit financial fraud in order to win promotion or avoid punishment from superiors. The second element is opportunity, meaning the opportunity to commit accounting fraud; in other words, the fraudster can evade the company's existing inspection mechanisms and thus avoid punishment. Most of the opportunity factors that enable accounting fraud come from within the company, such as imperfect accounting and auditing systems, the difficulty of identifying fraud, and the lack of professional competence among supervisors. The third element is rationalization: when managers commit accounting fraud, they separate the fraudulent behavior from moral norms or justify it in moral terms, producing mistaken beliefs.
Fig. 3.
Fraud triangle theory illustration.
(3) GONE theory
The GONE theory holds that four factors affect the risk of accounting fraud: G, O, N, and E, corresponding to greed, opportunity, need, and exposure. All four factors are indispensable, and the motivation for fraud is formed through their interaction and mutual influence, as shown in Fig. 4.
Fig. 4.
GONE theory illustration.
Greed ordinarily means an unsatisfied craving, but in this theory it has a more specific meaning: it also covers the reflections and choices that an individual, as an independent subject, makes regarding ideology, morality, and values. Ideology, morality, and values play an active role in forming individual motives and behavior, and ultimately manifest as individual value choices. Greed mainly manifests as improper choices driven by bad values; fraudsters, dominated by such values, violate correct values, rationalize their bad motives, and come to accept financial fraud in line with their improper values.
Opportunity mainly refers to circumstances favorable to committing fraud, reflected in two aspects. The first is the fraudster's power within the enterprise: the greater the fraudster's relative power in enterprise management, the more easily favorable conditions for fraud arise. The second is the enterprise's structural and institutional environment: if corporate governance structures or systems are lacking, fraudsters have favorable conditions to commit accounting fraud by improper means and obtain improper benefits.
Need is the key to the occurrence of fraudulent behavior. It refers to the motivation the fraudster forms before the financial fraud occurs, and it is the fraudster's most urgent need. For example, unlisted companies need to become listed, and listed companies need rights issues to finance their expansion; such needs turn into urgent pressures. Under this pressure, the risk of financial fraud increases significantly.
Exposure is explained from two perspectives. The first is the risk that financial fraud will be discovered and disclosed, which in general bears on the likelihood that the fraud succeeds. The second is the possible punishment, including its nature and severity, that the perpetrator may face once the fraud is disclosed. Fraud exposure thus refers to the costs that must be borne after the fraud is committed. From this point of view, the greater the likelihood of fraud being discovered and the more serious and severe the punishment, the lower the risk of fraud. Conversely, if the costs of fraud are lower than the benefits it brings, the risk of fraud rises and fraud becomes likely.
3.3. Data mining algorithms
To provide end-to-end visibility of smart city infrastructure and services, the information technology model must integrate data from different sources, each with its own sampling frequency, latency characteristics, and semantic features. Cloud computing offers a sound and effective solution to the resulting resource allocation problems: it optimizes application system performance and minimizes the physical node resources used, in a distributed yet centrally managed manner.
Based on the cloud computing technology of the smart city information model, this article treats accounting fraud identification as a multi-objective allocation problem. For large-scale accounting fraud identification data mining tasks, enterprises' financial data resources are first allocated flexibly using cloud computing, and data mining algorithms are then used to mine the latent rules in each set of financial data. Through training and optimization, abnormal financial data are identified, so that the data mining algorithms can identify accounting fraud more accurately while maintaining recognition efficiency. In the cloud computing environment, the resources owned by each node in the network can be represented by a multidimensional vector in which each dimension represents one type of resource, and each virtual machine likewise has a resource demand vector. The goal is to place a large number of virtual machines on physical nodes so as to occupy as few physical nodes as possible and to minimize load variance.
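The placement goal described above is a form of multidimensional (vector) bin packing. The paper does not specify its placement procedure, so the sketch below uses first-fit decreasing, a standard greedy heuristic, purely for illustration. It assumes identical physical nodes sharing one capacity vector, with demands expressed in the same units; the function name `first_fit_decreasing` and the example numbers are hypothetical.

```python
from typing import List, Sequence


def first_fit_decreasing(vms: Sequence[Sequence[float]],
                         capacity: Sequence[float]) -> List[int]:
    """Place each VM demand vector on a physical node; returns the node index per VM.

    Assumes identical physical nodes that all have the same d-dimensional capacity.
    """
    d = len(capacity)
    # "Decreasing": consider the VMs with the largest total normalized demand first.
    order = sorted(range(len(vms)),
                   key=lambda i: sum(vms[i][k] / capacity[k] for k in range(d)),
                   reverse=True)
    free: List[List[float]] = []  # free[j][k]: unallocated k-th resource on node j
    placement = [-1] * len(vms)
    for i in order:
        demand = vms[i]
        if any(demand[k] > capacity[k] for k in range(d)):
            raise ValueError(f"VM {i} exceeds the capacity of an empty node")
        # "First fit": use the first open node with room in every dimension.
        for j, room in enumerate(free):
            if all(demand[k] <= room[k] for k in range(d)):
                break
        else:
            # No open node fits, so a new physical node is occupied.
            free.append(list(capacity))
            j = len(free) - 1
        for k in range(d):
            free[j][k] -= demand[k]
        placement[i] = j
    return placement
```

For example, four virtual machines with (CPU, memory) demands `[0.5, 0.2]`, `[0.4, 0.6]`, `[0.3, 0.3]` and `[0.2, 0.1]` on nodes of capacity `[1.0, 1.0]` are packed onto two nodes. First-fit decreasing only targets the node count; it does not by itself minimize load variance, so a genuinely multi-objective method would need an additional balancing step.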
The resource allocation problem is described by formulas (1) and (2):
$$\min f_1 = \sum_{j=1}^{M} y_j \tag{1}$$
$$\min f_2 = \sum_{k=1}^{d} \sigma_k^2 \tag{2}$$
Here, $f_1$ is the number of physical nodes occupied and $f_2$ is the balanced load variance of the server cluster. $y_j = 1$ when physical node $j$ is used; otherwise $y_j = 0$. $\sigma_k^2$ is the variance of the $k$-th resource dimension, given by formula (3):
$$\sigma_k^2 = \frac{1}{M}\sum_{j=1}^{M}\left(p_{jk} - \bar{p}_k\right)^2 \tag{3}$$
where $M$ is the number of physical nodes, $d$ is the number of resource dimensions, $p_{jk}$ is the $k$-th dimensional performance characteristic of physical node $j$, and $\bar{p}_k = \frac{1}{M}\sum_{j=1}^{M} p_{jk}$ is the average of the $k$-th dimensional performance characteristic over all physical nodes. The performance characteristics are normalized values: $p_{jk}$ equals the remaining allocatable $k$-th dimensional resource of node $j$ divided by its total $k$-th dimensional resource. When:
| (4) |
Using formula (4), the accounting fraud identification task of the data mining algorithm is decomposed into multiple subtasks that are processed in parallel on multiple nodes.
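As a minimal sketch of the two objectives in formulas (1) to (3), assuming that the dimensional variance is a population variance taken over all m physical nodes and that the balanced load variance sums these variances over the d resource dimensions, a candidate placement of virtual machines can be scored as follows. The function name `allocation_objectives` and its argument layout are hypothetical and do not come from the article.

```python
from typing import List, Tuple

def allocation_objectives(capacity: List[List[float]],
                          demand: List[List[float]],
                          placement: List[int]) -> Tuple[int, float]:
    """capacity[j][k]: total k-th resource of physical node j.
    demand[v][k]: k-th resource demand of virtual machine v.
    placement[v]: index of the physical node hosting virtual machine v.
    Returns (f1, f2) under the assumptions stated above."""
    m, d = len(capacity), len(capacity[0])
    allocated = [[0.0] * d for _ in range(m)]
    for vm, node in enumerate(placement):
        for k in range(d):
            allocated[node][k] += demand[vm][k]
    # f1: number of physical nodes hosting at least one VM (y_j = 1)
    f1 = len(set(placement))
    # p_jk: remaining k-th resource of node j divided by its total k-th resource
    p = [[(capacity[j][k] - allocated[j][k]) / capacity[j][k] for k in range(d)]
         for j in range(m)]
    f2 = 0.0
    for k in range(d):
        mean_k = sum(p[j][k] for j in range(m)) / m
        # Variance of dimension k across all m nodes, added to the total
        f2 += sum((p[j][k] - mean_k) ** 2 for j in range(m)) / m
    return f1, f2
```

With two identical nodes and two identical virtual machines, packing both machines onto one node gives the smallest f1 but a larger f2, while spreading them does the opposite. This conflict is why the allocation is treated as a multi-objective problem.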
Clustering analysis is an important data processing technique and one of the key areas of data mining, and it has been applied across many industries. Its basic purpose is to classify samples along different dimensions: data are divided into groups according to a chosen criterion, such as distance, so that data within a group are strongly related and data in different groups differ markedly. Distance-based criteria capture this relationship, that is, the similarity between data, directly. The classic K-means algorithm is a partition-based clustering algorithm that has been widely used in recent years. Compared with other data mining algorithms, it is simple, intuitive, and easy to implement and understand, and its relatively low computational complexity allows it to cluster large-scale datasets quickly. It produces good results under suitable conditions, especially when the categories in the dataset are clearly separated. This article therefore adopts the K-means algorithm for accounting fraud identification. K-means divides the observations in a dataset into groups; its main steps are selecting the initial cluster centers, assigning each observation to the nearest cluster center, and computing new cluster centers. The algorithm seeks the cluster centers that minimize the sum of squared distances between the observations in each group and their cluster center.
In accounting fraud identification, the K-means algorithm can be used to identify accounting transactions or accounts with abnormal characteristics, which helps detect potential fraud in a timely manner. The process is as follows. First, collect data on accounting transactions or accounts, including transaction amounts, times, locations, and other information. Select features such as transaction amount, frequency, and time interval as input data, and standardize them so that the numerical ranges of different features are comparable. Next, choose the number of groups k and randomly select k observations as the initial cluster centers. Compute the distance from each observation to each cluster center and assign the observation to the group of the nearest center. Then recompute the mean of the observations in each group as the new cluster center, and repeat the assignment and update steps until the cluster centers no longer change or a predetermined number of iterations is reached.
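The steps above can be sketched in pure Python: z-score standardization, k randomly chosen observations as initial centers, nearest-center assignment by squared Euclidean distance, and mean updates until the centers stop moving. This is a minimal illustration, not the article's implementation; the function names and the `max_iter` and `seed` parameters are assumptions.

```python
import random
from typing import List, Sequence, Tuple

def standardize(data: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score each feature column so that all features have comparable ranges."""
    n, d = len(data), len(data[0])
    means = [sum(row[k] for row in data) / n for k in range(d)]
    stds = [(sum((row[k] - means[k]) ** 2 for row in data) / n) ** 0.5 for k in range(d)]
    # A constant column has std 0; divide by 1.0 instead to avoid ZeroDivisionError
    return [[(row[k] - means[k]) / (stds[k] or 1.0) for k in range(d)] for row in data]

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(data: Sequence[Sequence[float]], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    rng = random.Random(seed)
    # Initial centers: k observations chosen at random
    centers = [list(p) for p in rng.sample(list(data), k)]
    labels = [0] * len(data)
    for _ in range(max_iter):
        # Assignment step: each observation joins its nearest center
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in data]
        # Update step: each center moves to the mean of its group
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty group's old center
        if new_centers == centers:  # stop once the centers no longer change
            break
        centers = new_centers
    return labels, centers
```

In the fraud setting, each row would hold the features of one transaction or account, for example `labels, centers = kmeans(standardize(rows), k=4)`; the resulting groups are then inspected for abnormal characteristics.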
The implementation process of the accounting fraud identification model based on k-means in this article is shown in Fig. 5.
Fig. 5.
Implementation process of accounting fraud identification model based on k-means.
The three distance functions commonly used in cluster analysis are expressed as follows:
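The Minkowski distance is given by formula (5) [22]:

$$d_{ij}(q) = \left(\sum_{k=1}^{p}\left|x_{ik}-x_{jk}\right|^{q}\right)^{1/q} \tag{5}$$

where $x_{ik}$ is the value of the $k$-th of $p$ variables for sample $i$. Taking $q = 1$, $2$, and $\infty$ gives the absolute (Manhattan) distance, the Euclidean distance, and the Chebyshev distance, respectively.
The Mahalanobis distance is given by formula (6) [23]:

$$d_{ij}^{2}(M) = (\mathbf{x}_i-\mathbf{x}_j)^{\top}\,\Sigma^{-1}\,(\mathbf{x}_i-\mathbf{x}_j) \tag{6}$$

where $\Sigma$ is the covariance matrix of the sample matrix.
The Lance distance is given by formula (7) [24]:

$$d_{ij}(L) = \frac{1}{p}\sum_{k=1}^{p}\frac{\left|x_{ik}-x_{jk}\right|}{x_{ik}+x_{jk}} \tag{7}$$

which is defined for positive variable values.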
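A minimal pure-Python sketch of the three distances in formulas (5) to (7) follows. The function names are hypothetical; the Mahalanobis version assumes the covariance matrix Σ has already been estimated and is nonsingular, and it solves a linear system instead of inverting Σ explicitly.

```python
from typing import List, Sequence

def minkowski(x: Sequence[float], y: Sequence[float], q: float) -> float:
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if q == float("inf"):
        return max(diffs)  # Chebyshev distance
    return sum(d ** q for d in diffs) ** (1.0 / q)

def solve(A: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(v)] for row, v in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def mahalanobis(x: Sequence[float], y: Sequence[float],
                cov: Sequence[Sequence[float]]) -> float:
    diff = [a - b for a, b in zip(x, y)]
    z = solve(cov, diff)  # z = Sigma^{-1} (x - y)
    return sum(a * b for a, b in zip(diff, z)) ** 0.5

def lance(x: Sequence[float], y: Sequence[float]) -> float:
    """Formula (7); assumes all values are positive."""
    return sum(abs(a - b) / (a + b) for a, b in zip(x, y)) / len(x)
```

With Σ equal to the identity matrix, the Mahalanobis distance reduces to the Euclidean distance (q = 2); a non-identity Σ rescales and decorrelates the features before the distance is taken.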
The similarity between feature variables can be measured by a similarity coefficient $c_{ij}$, which satisfies the conditions in formula (8), formula (9), and formula (10) [25]:

$$c_{ij} = \pm 1 \iff \mathbf{x}_i = a\,\mathbf{x}_j \quad (a \neq 0 \text{ a constant}) \tag{8}$$

$$\left|c_{ij}\right| \le 1 \tag{9}$$

$$c_{ij} = c_{ji} \tag{10}$$

The closer $|c_{ij}|$ is to 1, the more closely the two feature variables are related.
Two similarity coefficients are commonly used.
(1) The cosine of the included angle is given by formula (11) [26]:

$$\cos\theta_{ij} = \frac{\sum_{k=1}^{n} x_{ki}\,x_{kj}}{\sqrt{\sum_{k=1}^{n} x_{ki}^{2}\;\sum_{k=1}^{n} x_{kj}^{2}}} \tag{11}$$

where $x_{ki}$ is the value of feature variable $i$ for the $k$-th of $n$ samples. The cosine of the angle between the two variable vectors measures how closely the two feature variables are related: the closer their directions, the larger the cosine; conversely, the further apart their directions, the smaller the cosine.
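(2) The correlation coefficient is given by formula (12) [27]:

$$r_{ij} = \frac{\sum_{k=1}^{n}(x_{ki}-\bar{x}_i)(x_{kj}-\bar{x}_j)}{\sqrt{\sum_{k=1}^{n}(x_{ki}-\bar{x}_i)^{2}\;\sum_{k=1}^{n}(x_{kj}-\bar{x}_j)^{2}}} \tag{12}$$

where $x_{ki}$ is the value of feature variable $i$ for the $k$-th sample and $\bar{x}_i$ is the mean of variable $i$. The correlation coefficient indicates the degree of linear correlation between two feature variables; it equals the cosine of formula (11) applied to the mean-centered variables.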
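The relationship between formulas (11) and (12) can be checked with a short sketch (the function names are illustrative): shifting a variable by a constant leaves the correlation coefficient unchanged but alters the cosine, because only the correlation centers the data first.

```python
import math
from typing import Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Formula (11): cosine of the angle between two variable vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

def correlation(u: Sequence[float], v: Sequence[float]) -> float:
    """Formula (12): Pearson correlation, i.e. the cosine of the mean-centered vectors."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return cosine([a - mu for a in u], [b - mv for b in v])
```

For example, the variables (1, 2, 3) and (11, 12, 13) have a correlation of exactly 1 but a cosine of about 0.95.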
In hierarchical clustering, a distance between classes must also be defined. The shortest-distance method is the most common; it takes the distance between classes $G_p$ and $G_q$ to be the distance between their closest pair of samples, as in formula (13) [28]:

$$D_{pq} = \min_{i \in G_p,\; j \in G_q} d_{ij} \tag{13}$$
Besides the shortest-distance method, hierarchical clustering can use several other definitions of the distance between classes.
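Intermediate distance method [29]:

$$D_{rk}^{2} = \frac{1}{2}D_{pk}^{2} + \frac{1}{2}D_{qk}^{2} - \frac{1}{4}D_{pq}^{2} \tag{14}$$

In formula (14), $G_p$ and $G_q$ are two classes, $G_r = G_p \cup G_q$ is the class formed by merging them, and $G_k$ is any other class. More generally:

$$D_{rk}^{2} = \frac{1}{2}D_{pk}^{2} + \frac{1}{2}D_{qk}^{2} + \beta D_{pq}^{2}, \quad -\frac{1}{4} \le \beta \le 0 \tag{15}$$

In formula (15), $\beta = -\frac{1}{4}$ gives the intermediate distance method.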
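Center of gravity method: the distance between two classes is defined as the distance between their centers of gravity. The method is described by formula (16), formula (17), and formula (18):

$$D_{pq} = d(\bar{\mathbf{x}}_p, \bar{\mathbf{x}}_q) \tag{16}$$

$$\bar{\mathbf{x}}_r = \frac{n_p\bar{\mathbf{x}}_p + n_q\bar{\mathbf{x}}_q}{n_r}, \quad n_r = n_p + n_q \tag{17}$$

$$D_{rk}^{2} = \frac{n_p}{n_r}D_{pk}^{2} + \frac{n_q}{n_r}D_{qk}^{2} - \frac{n_p n_q}{n_r^{2}}D_{pq}^{2} \tag{18}$$

where $\bar{\mathbf{x}}_p$ and $n_p$ are the center of gravity and the number of samples of class $G_p$.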
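Class averaging method: the squared distance between two classes $G_p$ and $G_q$ is the average of the squared distances between all pairs of elements drawn from the two classes, as in formula (19) and formula (20) [30]:

$$D_{pq}^{2} = \frac{1}{n_p n_q}\sum_{i \in G_p}\sum_{j \in G_q} d_{ij}^{2} \tag{19}$$

$$D_{rk}^{2} = \frac{n_p}{n_r}D_{pk}^{2} + \frac{n_q}{n_r}D_{qk}^{2} \tag{20}$$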
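Sum of squared deviations method: this approach derives from analysis of variance. If the samples are classified correctly, the sum of squared deviations within classes should be small and the sum of squared deviations between classes should be large. The within-class sum of squares of class $G_t$ is given by formula (21):

$$W_t = \sum_{i \in G_t}(\mathbf{x}_i - \bar{\mathbf{x}}_t)^{\top}(\mathbf{x}_i - \bar{\mathbf{x}}_t) \tag{21}$$

The total within-class sum of squares over all $g$ classes is given by formula (22) [31]:

$$W = \sum_{t=1}^{g} W_t \tag{22}$$

The distance between classes $G_p$ and $G_q$ is defined as the increase in the within-class sum of squares caused by merging them, formula (23) [32]:

$$D_{pq}^{2} = W_r - W_p - W_q \tag{23}$$

The recursive formula for this distance is formula (24) [33,34]:

$$D_{rk}^{2} = \frac{n_k + n_p}{n_r + n_k}D_{pk}^{2} + \frac{n_k + n_q}{n_r + n_k}D_{qk}^{2} - \frac{n_k}{n_r + n_k}D_{pq}^{2} \tag{24}$$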
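The four hierarchical clustering methods above can be unified by formula (25) [35]:

$$D_{rk}^{2} = \alpha_p D_{pk}^{2} + \alpha_q D_{qk}^{2} + \beta D_{pq}^{2} + \gamma\left|D_{pk}^{2} - D_{qk}^{2}\right| \tag{25}$$

where $\alpha_p$, $\alpha_q$, $\beta$, and $\gamma$ are parameters. Different parameter settings yield the different hierarchical clustering methods, as shown in Table 2:
Table 2.
Hierarchical clustering parameters.
| Parameter | Middle distance method | Center of gravity | Class average | Sum of squared deviations |
|---|---|---|---|---|
| $\alpha_p$ | $1/2$ | $n_p/n_r$ | $n_p/n_r$ | $(n_k+n_p)/(n_r+n_k)$ |
| $\alpha_q$ | $1/2$ | $n_q/n_r$ | $n_q/n_r$ | $(n_k+n_q)/(n_r+n_k)$ |
| $\beta$ | $-1/4$ | $-n_p n_q/n_r^{2}$ | $0$ | $-n_k/(n_r+n_k)$ |
| $\gamma$ | $0$ | $0$ | $0$ | $0$ |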
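The recursion of formula (25), with the standard parameters of the four methods named in Table 2, can be sketched as follows. The function names and method keys ("median", "centroid", "average", "ward") are illustrative assumptions. The helper `wss` computes the within-class sum of squares of formula (21), so the Ward update can be checked against the definition in formula (23).

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]

def lw_params(method: str, n_p: int, n_q: int, n_k: int) -> Tuple[float, float, float, float]:
    """(alpha_p, alpha_q, beta, gamma) for merging G_p and G_q into G_r, seen from G_k."""
    n_r = n_p + n_q
    if method == "median":
        return 0.5, 0.5, -0.25, 0.0
    if method == "centroid":
        return n_p / n_r, n_q / n_r, -n_p * n_q / n_r ** 2, 0.0
    if method == "average":
        return n_p / n_r, n_q / n_r, 0.0, 0.0
    if method == "ward":
        t = n_r + n_k
        return (n_k + n_p) / t, (n_k + n_q) / t, -n_k / t, 0.0
    raise ValueError(f"unknown method: {method}")

def lw_update(d2_pk: float, d2_qk: float, d2_pq: float, method: str,
              n_p: int, n_q: int, n_k: int) -> float:
    """Formula (25): squared distance between the merged class G_r and class G_k."""
    a_p, a_q, beta, gamma = lw_params(method, n_p, n_q, n_k)
    return a_p * d2_pk + a_q * d2_qk + beta * d2_pq + gamma * abs(d2_pk - d2_qk)

def centroid(points: List[Point]) -> List[float]:
    return [sum(col) / len(points) for col in zip(*points)]

def sq_euclid(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def wss(points: List[Point]) -> float:
    """Within-class sum of squares W_t of formula (21)."""
    c = centroid(points)
    return sum(sq_euclid(p, c) for p in points)
```

The design point of the recursion is that, after each merge, distances to the new class are updated from the old between-class distances alone, without revisiting the individual samples.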
Like the hierarchical methods summarized in Table 2, the K-means algorithm adopts a distance-based evaluation method. Fig. 6 shows a schematic diagram of the K-means algorithm:
Fig. 6.
K-means algorithm diagram.
The clustering criterion function of the K-means algorithm is usually the sum-of-squared-errors criterion, defined in formula (26) [36]:

$$E = \sum_{i=1}^{k}\sum_{\mathbf{p} \in C_i}\left\|\mathbf{p} - \mathbf{m}_i\right\|^{2} \tag{26}$$

In formula (26), $\mathbf{m}_i$ is the mean of the data objects in class $C_i$, and $\mathbf{p}$ is a point in class $C_i$.
The K-means algorithm is a hill-climbing algorithm: when it terminates, it generally finds a local minimum of the criterion in formula (26) rather than the global optimum [37], as shown in Fig. 7:
Fig. 7.
Local minima and global optima.
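The local-minimum behavior can be reproduced on a toy example (hypothetical points, not the article's data). With four points at (0, 0), (0, 1), (10, 0), and (10, 1) and k = 2, starting from centers (5, 0) and (5, 1), the iterations stop immediately with E = 100, whereas starting from (0, 0) and (10, 0) they reach the better partition with E = 1. The sketch below uses fixed initial centers so that the outcome is deterministic; the function names are illustrative.

```python
from typing import List, Sequence, Tuple

def sse(points: Sequence[Sequence[float]], labels: List[int],
        centers: List[List[float]]) -> float:
    """Formula (26): sum of squared distances from points to their cluster means."""
    return sum(sum((a - b) ** 2 for a, b in zip(p, centers[lab]))
               for p, lab in zip(points, labels))

def lloyd(points: Sequence[Sequence[float]], centers: Sequence[Sequence[float]],
          max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """K-means iterations starting from the given initial centers."""
    centers = [list(c) for c in centers]
    labels: List[int] = []
    for _ in range(max_iter):
        labels = [min(range(len(centers)),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        new = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centers[c])
        if new == centers:  # converged: a fixed point, not necessarily the global optimum
            break
        centers = new
    return labels, centers
```

A common mitigation, not described in the article, is to run K-means from several random initializations and keep the solution with the smallest E.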
When the iterations converge, the resulting cluster centers give a partition that minimizes the clustering error locally, and the clusters can then be examined for the abnormal characteristics that indicate accounting fraud. With the support of smart city information technology, such algorithms can perform pattern recognition and anomaly detection on large-scale data: by analyzing historical data and patterns and identifying abnormal patterns and behaviors, potential fraud risks can be detected in a timely manner.
The effectiveness of the algorithm was further verified by clustering the dataset with software and comparing those results with the algorithm's clustering results. First, financial indicators and variables related to accounting fraud are extracted from the financial dataset. Based on the characteristics of accounting fraud and commonly used indicators, this article selects financial indicators closely related to fraud, drawn mainly from the balance sheet, income statement, cash flow statement, and financial ratios. The balance sheet, income statement, and cash flow statement are the basic components of a company's financial report and reflect its financial condition, operating results, and cash flows; prepared under accounting principles and standards, they contain a large amount of key financial data that is representative for fraud identification. Financial ratios, such as profit margin, solvency ratios, and the current ratio, are important indicators of a company's financial condition, performance, and financial health. Regression analysis is then used to build mathematical models that reveal the functional relationships among the financial indicators, with balance sheet, income statement, and cash flow statement items as independent variables and financial ratios as dependent variables.
After the relationships among the financial data are extracted, the data are filtered and cleaned to exclude irrelevant or noisy records. During filtering, the data are checked against the business logic and common-sense relationships of financial data; any record that contradicts the business logic is treated as an error or anomaly and excluded. The distribution of each variable is then examined for significant deviations or anomalies, and data whose distribution does not conform to a normal or other expected distribution are also filtered out.
Statistical methods are then used to identify and handle missing or incomplete values. The cleaned financial data are clustered, and the results are compared with the algorithm's clustering results. In this process, the financial data relationships within each cluster are analyzed, including the interrelationships and trends of the financial indicators and variables, and the average financial characteristics of each cluster are studied to understand its financial status. Finally, the financial data relationships of different clusters are compared to analyze their similarities and differences and to identify possible abnormal patterns or trends.
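The cleaning steps can be sketched as follows, under explicitly assumed rules because the article does not specify its own: records are dictionaries of numeric fields; missing values are filled with the column mean (done first here, although the text describes missing-value handling last); the business-logic check is the balance-sheet identity assets = liabilities + equity; and the distribution check is a simple z-score screen. All function names, field names, and thresholds are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]

def impute_means(records: List[Record], fields: List[str]) -> List[Record]:
    """Replace missing (None) values with the mean of the observed values in that field."""
    means = {f: mean(r[f] for r in records if r[f] is not None) for f in fields}
    return [{f: (r[f] if r[f] is not None else means[f]) for f in fields} for r in records]

def balance_ok(r: Record, rel_tol: float = 1e-6) -> bool:
    """Assumed business-logic rule: assets = liabilities + equity."""
    return abs(r["assets"] - r["liabilities"] - r["equity"]) <= rel_tol * abs(r["assets"])

def drop_outliers(records: List[Record], fields: List[str], z_max: float) -> List[Record]:
    """Drop records with any field more than z_max population std devs from its mean."""
    stats = {f: (mean(r[f] for r in records), pstdev([r[f] for r in records])) for f in fields}
    return [r for r in records
            if all(s == 0 or abs(r[f] - m) / s <= z_max for f, (m, s) in stats.items())]

def clean(records: List[Record], fields: List[str], z_max: float = 3.0) -> List[Record]:
    filled = impute_means(records, fields)
    consistent = [r for r in filled if balance_ok(r)]
    return drop_outliers(consistent, fields, z_max)
```

Note that a z-score screen can only flag values when there are enough records; with n records, no |z| can exceed (n - 1)/√n.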
First, 15 data objects are entered, as shown in Table 3.
Table 3.
Data object.
| Data object | Variable 1 | Variable 2 |
|---|---|---|
| 1 | 5 | 7 |
| 2 | 5 | 8 |
| 3 | 8 | 7 |
| 4 | 5 | 5 |
| 5 | 5 | 9 |
| 6 | 5.5 | 4 |
| 7 | 8 | 9 |
| 8 | 11 | 5 |
| 9 | 10 | 6 |
| 10 | 11 | 7 |
| 11 | 10 | 5 |
| 12 | 8.5 | 8 |
| 13 | 8 | 5 |
| 14 | 4.5 | 4 |
| 15 | 5.5 | 5 |
Based on Table 3, initial cluster centers are then chosen at random from the data objects. Table 4 and Table 5 show the results of the analysis performed with software:
Table 4.
Cluster centers obtained by K-means algorithm.
Table 5.
K-means clustering results.
| Case Number | Cluster | Distance |
|---|---|---|
| 1 | 1 | 0.870 |
| 2 | 2 | 0.390 |
| 3 | 3 | 1.009 |
| 4 | 3 | 0.609 |
| 5 | 2 | 1.267 |
| 6 | 2 | 0.655 |
| 7 | 4 | 1.009 |
| 8 | 1 | 0.802 |
| 9 | 3 | 0.668 |
| 10 | 2 | 1.357 |
| 11 | 1 | 0.801 |
| 12 | 3 | 0.386 |
| 13 | 4 | 0.136 |
| 14 | 2 | 0.620 |
| 15 | 3 | 0.422 |
In Table 4, the cluster-center values of VAR00001 for the four clusters are 3.66, 7.01, 9.30, and 4.00; those of VAR00002 are 9.64, 9.00, 7.64, and 6.30; and those of VAR00003 are 4.10, 8.23, 9.60, and 4.30.
The results in Table 5 show that the clustering results computed by the algorithm and by the software are identical, which confirms the algorithm's clustering results and shows a clear clustering effect.
4. Accounting fraud identification test
4.1. Simulation design
Firstly, it is necessary to collect data related to accounting fraud, including financial statements, transaction records, employee information, etc. These data can come from real enterprises or simulated datasets. Perform preprocessing work such as cleaning, deduplication, and missing value processing on the collected data to ensure the quality and integrity of the data. Using data mining algorithms to perform feature selection on data, selecting feature variables related to accounting fraud, in order to improve the accuracy and interpretability of the model. Establish a data mining model for identifying accounting fraud based on selected feature variables. Evaluate the established model, conduct simulation experiments on the collected dataset, and observe the performance of the model in different situations.
4.2. Dataset
Using the electronic data collection, analysis, and retrieval systems on the websites of the Shanghai Stock Exchange and Shenzhen Stock Exchange, this article collected, as test samples, 641 annual reports and financial characteristics for 2012 to 2021 from 62 listed companies that engaged in financial statement fraud and 84 companies with no reported financial statement fraud. The test samples are divided into a full sample, non-fraud vs. fraud (S1), and three subsamples: non-fraud vs. serious fraud (S2), non-fraud vs. general fraud (S3), and general fraud vs. serious fraud (S4).
4.3. Parameter assignments
This article uses financial statement data from real listed companies, runs 10 experiments, and reports the average results; all computations are single-threaded.
4.4. Formulas of the performance metrics
Set in the context of smart cities, this article tests the proposed K-means data mining algorithm from three aspects: the number of misjudgments, the misjudgment rate, and the ROC curve. It is compared with the traditional support vector machine method for accounting fraud recognition to verify the practicality and effectiveness of the proposed method. A support vector machine separates data into categories by finding the hyperplane that maximizes the margin between them, and it is one of the methods commonly used to identify accounting fraud.
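The article does not state its metric formulas. As an assumption, the sketch below takes the misjudgment rate to be the share of misclassified samples, (FP + FN)/N, with FRAUD = 1 as the positive class; the function names are hypothetical.

```python
from typing import Dict, Sequence

def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Confusion-matrix counts with FRAUD = 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
    }

def misjudgment_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Assumed definition: share of misclassified samples, (FP + FN) / N."""
    c = confusion_counts(y_true, y_pred)
    return (c["fp"] + c["fn"]) / len(y_true)
```

Under this definition, the number of misjudgments is FP + FN, and the misjudgment rate divides that count by the sample size.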
4.5. Reproducibility of the proposed work
This article repeats the accounting fraud identification experiments multiple times, reports the average results, and analyzes the experimental results statistically with the Friedman test.
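As a standard-library-only sketch of the Friedman test (function names hypothetical), the procedure ranks the methods within each block, sums the ranks per method, and compares Q = 12/(nk(k+1)) ΣR_j² − 3n(k+1) with a chi-square distribution on k − 1 degrees of freedom. This version omits the tie correction and relies on the large-sample chi-square approximation, which is rough when there are few blocks.

```python
import math
from typing import List, Sequence, Tuple

def rank_row(row: Sequence[float]) -> List[float]:
    """Ranks 1..k within one block; tied values share the average rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability from the series for the lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)

def friedman(data: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """data[i][j]: result of method j in block i. Returns (Q, p-value) without tie correction."""
    n, k = len(data), len(data[0])
    ranks = [rank_row(row) for row in data]
    rank_sums = [sum(r[j] for r in ranks) for j in range(k)]
    q = 12.0 * sum(s * s for s in rank_sums) / (n * k * (k + 1)) - 3.0 * n * (k + 1)
    return q, chi2_sf(q, k - 1)
```

For a production analysis, an established statistics library with tie correction and exact small-sample tables would be preferable to this approximation.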
4.6. Baseline method
In the comparative test, a traditional statistical method serves as the baseline traditional accounting fraud identification method, and the same evaluation indicators and procedures are applied to both methods so that the comparison with the proposed work is reliable.
The final recognition effect is shown in Fig. 8, Fig. 9.
Fig. 8.
Statistics of false positives.
Fig. 9.
Sample false positive rate.
Fig. 8A shows the number of misjudgments in the sample of the accounting fraud identification method proposed in this paper.
Fig. 8B shows the number of misjudgments in the samples of traditional accounting fraud identification methods.
Fig. 8 shows a clear difference in the number of sample misjudgments between the two methods. In Fig. 8A, the numbers of misjudgments of the data mining algorithm based on smart city information technology are 143 in the non-fraud vs. fraud sample, 92 in the non-fraud vs. serious fraud sample, 113 in the non-fraud vs. general fraud sample, and 105 in the general fraud vs. serious fraud sample. In Fig. 8B, the corresponding numbers for the traditional accounting fraud identification method are 149, 107, 130, and 120.
Fig. 9A shows the sample misjudgment rate of the accounting fraud identification method proposed in this paper.
Fig. 9B shows the sample misjudgment rate of traditional accounting fraud identification methods.
The sample misjudgment rates in Fig. 9 show that the smart city data mining algorithm has the advantage in every sample. In Fig. 9A, its overall misjudgment rates are 26 % in the non-fraud vs. fraud sample, 26 % in the non-fraud vs. serious fraud sample, 28 % in the non-fraud vs. general fraud sample, and 29 % in the general fraud vs. serious fraud sample. In Fig. 9B, the corresponding rates for the traditional accounting fraud identification method are 27 %, 29 %, 32 %, and 33 %.
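Under the assumption that the comprehensive misjudgment rate reported in the Discussion is the unweighted mean of the four per-sample rates (the article does not state how it is computed), the aggregate figures can be checked from the values in Fig. 8 and Fig. 9:

```python
# Per-sample values as reported in Fig. 8 and Fig. 9 (order: S1, S2, S3, S4)
proposed_counts = [143, 92, 113, 105]
traditional_counts = [149, 107, 130, 120]
proposed_rates = [26, 26, 28, 29]       # percent
traditional_rates = [27, 29, 32, 33]    # percent

total_proposed = sum(proposed_counts)          # 453
total_traditional = sum(traditional_counts)    # 506
mean_proposed = sum(proposed_rates) / 4        # 27.25
mean_traditional = sum(traditional_rates) / 4  # 30.25
gap = mean_traditional - mean_proposed         # 3.0
```

The difference of about 3 is in percentage points, not a 3 % relative reduction.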
To further analyze the effectiveness of the smart city data mining algorithm, features related to fraud, including abnormal financial data and transaction patterns, were removed from the sample dataset. Ten types of non-fraud samples were randomly selected as test samples, and ablation experiments with 10 repeated tests were conducted to observe whether the misjudgment rates of the two algorithms increased. The comparison results are shown in Fig. 10.
Fig. 10.
Algorithm ablation research.
Fig. 10 shows that after features related to fraud, such as abnormal financial data and transaction patterns, were removed, the misjudgment rate of the smart city data mining algorithm did not increase: its average misjudgment rate on the test samples was about 24 %, compared with about 31 % for the traditional accounting fraud identification method. The proposed algorithm thus maintains high recognition accuracy even when some features are removed or the data distribution changes, indicating that it adapts better than the traditional method and can identify fraud effectively under different data conditions.
To study the differences in the changes in false alarm rates between the two types of algorithms, this paper conducted a statistical analysis of the changes in false alarm rates using the Friedman test. The Friedman test, also known as the Friedman two-way rank analysis of variance, is a statistical test for the homogeneity of multiple (correlated) samples [38,39]. A significance level of 0.05 was selected, and the final p-value results are shown in Table 6.
Table 6.
Statistical analysis results.
| Algorithm | Misjudgment rate before ablation | Misjudgment rate after ablation | p-value |
|---|---|---|---|
| A Data Mining Algorithm Based on Smart City Informatization Technology | 26 % | 24 % | 0.523 |
| Traditional Accounting Fraud Identification Algorithm | 27 % | 31 % | 0.0217 |
The statistical analysis in Table 6 shows that the misjudgment rate of the proposed algorithm does not differ significantly before and after ablation (p = 0.523 > 0.05), whereas for the traditional accounting fraud identification algorithm the p-value is 0.0217 < 0.05, indicating a significant difference in the misjudgment rate before and after ablation.
This paper uses the ROC curve to compare the performance of the accounting fraud identification methods. The ROC curve plots the true positive rate, that is, the proportion of positive tuples (FRAUD = 1) that are correctly identified, against the false positive rate. The larger the area under the ROC curve (AUC), and hence the larger the area between the curve and the diagonal line, the better the identification performance. Fig. 11 and Fig. 12 show the ROC curves of the smart city K-means data mining algorithm and the traditional accounting fraud identification method under the four samples.
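A standard-library sketch of how ROC points and the AUC can be computed from a fraud score per sample follows. The article does not say how scores were obtained for its K-means ROC curves, so the sketch simply takes the scores as given; the function names are hypothetical, and at least one fraud and one non-fraud sample are assumed.

```python
from typing import List, Sequence, Tuple

def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by lowering the threshold over the distinct scores."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    pts = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        pts.append((fp / neg, tp / pos))
    return pts

def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under the ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

The area between the curve and the diagonal is AUC − 0.5, so a random classifier scores 0.5 and a perfect one scores 1.0.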
Fig. 11.
Curve effect under S1 and S2.
Fig. 12.
Curve effect under S3 and S4.
Fig. 11A is the curve under the non-fraud-fraud sample.
Fig. 11B is the curve under the non-fraud-serious sample.
Fig. 12A is the curve under the non-fraud and general fraud samples.
Fig. 12B shows the curves under the general fraud and serious fraud samples.
Fig. 11 and Fig. 12 show the ROC curves for the four samples: non-fraud vs. fraud, non-fraud vs. serious fraud, non-fraud vs. general fraud, and general fraud vs. serious fraud. In the non-fraud vs. fraud sample (Fig. 11A) and the non-fraud vs. serious fraud sample (Fig. 11B), the ROC curve of the smart city data mining algorithm lies closer to the upper left than that of the traditional method, and its AUC is larger, indicating better classification performance. In the non-fraud vs. general fraud sample (Fig. 12A) and the general fraud vs. serious fraud sample (Fig. 12B), the AUC of the proposed algorithm is also clearly larger than that of the traditional method. Overall, the smart city data mining algorithm performs best in accounting fraud identification, with a larger AUC than the traditional accounting fraud identification method.
Support Vector Machine (SVM) is a supervised learning algorithm commonly used for classification and regression analysis. Its main idea is to separate data points of different categories by finding the best hyperplane. Random Forest (RF) is an ensemble learning method that improves overall prediction accuracy by constructing multiple decision tree models and combining their prediction results. To fully validate the effectiveness of data mining algorithms in identifying accounting fraud using smart city information technology in this article, this study compared support vector machines and random forest methods and conducted accuracy tests on three real datasets: The Audit Analytics Statement (AAR) database, The Center for Audit Quality (CAQ) Data Commons, and Institute of Internal Auditors (IIA) Data Set. Among them, the AAR dataset covers all US Securities and Exchange Commission registered companies that have disclosed financial statement restatements in electronic files since January 1, 2000; The CAQ dataset is provided by the US Audit Quality Center and includes financial data from different industries and companies, which can be used for research and experimentation in identifying accounting fraud. The IIA dataset is provided by the International Institute of Internal Auditors and includes simulated accounting fraud case data for testing and validating the performance of fraud identification algorithms. The final test results are shown in Table 7.
Table 7.
Dataset test results.
| Data set | The algorithm in this paper (%) | SVM (%) | RF (%) |
|---|---|---|---|
| AAR | 90.15 | 86.21 | 80.57 |
| CAQ | 89.61 | 88.37 | 86.79 |
| IIA | 88.34 | 82.11 | 84.06 |
Table 7 shows that the proposed algorithm achieves high accuracy on all test datasets: 90.15 %, 89.61 %, and 88.34 % on AAR, CAQ, and IIA, respectively, compared with 86.21 %, 88.37 %, and 82.11 % for SVM and 80.57 %, 86.79 %, and 84.06 % for RF. In accounting fraud identification, data mining algorithms under smart city information technology can integrate data and resources from multiple fields, providing a more comprehensive perspective and deeper understanding, which helps uncover potential fraud associations hidden across fields and thus improves the accuracy and comprehensiveness of fraud identification.
5. Discussion
The experimental results support the following conclusions.
(1) In terms of identification accuracy, the data mining algorithm based on smart city made 453 misjudgments in total across the four samples, a comprehensive misjudgment rate of about 27 %, whereas the traditional accounting fraud identification method made 506, a comprehensive misjudgment rate of about 30 %. The identification accuracy of the smart city data mining algorithm is therefore about 3 percentage points higher than that of the traditional method.
(2) In terms of the performance of the identification methods, the average AUC area of the data mining algorithm based on smart city across the four samples is 0.27 square centimeters larger than that of the traditional accounting fraud identification method.
(3) In tests on different datasets, the data mining algorithm based on smart cities achieved higher accuracy in identifying accounting fraud than support vector machines and random forests.
Taken together, under the same experimental conditions and across different forms of accounting fraud identification tests, the data mining algorithm based on smart city information technology performs better both in identification accuracy and in the performance of the identification method. Its advantage lies in its ability to process large-scale data: it can efficiently analyze massive amounts of accounting data to uncover normal and abnormal patterns, which helps identify potential accounting fraud. By integrating and jointly analyzing data from multiple sources, it reduces the need for manual intervention and improves the accuracy and comprehensiveness of fraud identification. Its limitation is that its reasoning process and decision rules are complex, which may reduce its ability to explain accounting fraud behavior. Overall, data mining algorithms based on smart cities can effectively identify accounting fraud, objectively reflect the severity of corporate fraud, and promote further development of the auditing industry.
6. Conclusion
Accounting fraud by listed companies is a serious problem not only in China but also for the normal operation and development of securities markets around the world. Identifying accounting fraud is very important for maintaining market order and protecting investors' interests. Unlike traditional single-type fraud, accounting fraud is comprehensive. The data mining algorithm based on smart city can analyze the characteristics of each part of the data system while ensuring sufficient information, enabling effective identification and prediction of accounting fraud in listed companies. This gives auditors more secure and reliable technical support when conducting financial audits. As technology improves and matures, accounting fraud identification for listed companies can be expected to reach higher quality and a higher level of development.
Although the data mining algorithm based on smart city provides some guidance for identifying accounting fraud, this study still has several shortcomings. Many factors affect accounting fraud, and this article did not examine their impact in depth. Accounting fraud data usually contain many feature variables, and choosing appropriate ones is crucial for building effective data mining models; in practice, however, feature selection is challenging because it must balance data complexity against algorithm interpretability. A limitation of this study is therefore insufficient feature selection. Future work will seek to improve the quality and level of accounting fraud identification from the perspective of influencing factors, promoting the healthy development of the market.
Data availability statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.



CRediT authorship contribution statement
Xinyi Zheng: Writing – original draft, Conceptualization. Mohamad Ali Abdul Hamid: Data curation. Yihua Hou: Writing – review & editing, Data curation.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Biographies
Xinyi Zheng was born in Luoyang, Henan, People's Republic of China, in 1991. She obtained her master's degree from Macquarie University in Australia and is currently pursuing a PhD at Putra Business School. Her research interests include earnings management, corporate governance, auditing, and accounting fraud. E-mail: pbs19204204@grad.putrabs.edu.my
Professor Dr. Mohamad Ali bin Abdul Hamid was born in Malaysia in 1954. He received his PhD degree from the University of Bradford, England. He is now a Professor and Supervisor at Putra Business School. His research interests include corporate governance, earnings management, financial reporting quality, working capital management, disclosure quality, audit quality, ownership structure, intellectual capital, and accounting education. E-mail: ali@putrabs.edu.my
Yihua Hou was born in Luoyang, Henan, People's Republic of China, in 1990. She obtained her master's degree from Arizona State University in the U.S. and works at Henan University of Science and Technology. She is currently pursuing a PhD at the University of Malaya. Her research interests include educational economics, educational management, and educational policy. E-mail: yihuahou@haust.edu.cn
Contributor Information
Xinyi Zheng, Email: pbs19204204@grad.putrabs.edu.my.
Mohamad Ali Abdul Hamid, Email: ali@putrabs.edu.my.
Yihua Hou, Email: yihuahou@haust.edu.cn.
References
- 1. Donelson D.C., Kartapanis A., McInnis J., et al. Measuring accounting fraud and irregularities using public and private enforcement. Account. Rev. 2021;96(6):183–213.
- 2. Mason P., Williams B. Does IRS monitoring deter managers from committing accounting fraud? J. Account. Audit Finance. 2022;37(3):700–722.
- 3. Ewa U.E. Forensic accounting and fraud management in Nigeria. Journal of Accounting, Business and Finance Research. 2022;14(1):19–29.
- 4. Cortes J. Journal titles and mission statements: lexical structure, diversity, and readability in business, management and accounting research. J. Inf. Sci. 2021;49(5):1–15.
- 5. Bao Y., Ke B., Li B., Yu Y.J., Zhang J. Detecting accounting fraud in publicly traded US firms using a machine learning approach. J. Account. Res. 2020;58(1):199–235.
- 6. Eko E.U., Adebisi A.W., Moses E.J. Evaluation of forensic accounting techniques in fraud prevention/detection in the banking sector in Nigeria. Int. J. Finance Account. 2020;9(3):56–66.
- 7. Lu W., Zhao X. Research and improvement of fraud identification model of Chinese A-share listed companies based on M-score. J. Financ. Crime. 2021;28(2):566–579.
- 8. Li C., Li N., Zhang F. Using economic links between firms to detect accounting fraud. Account. Rev. 2023;98(1):399–421.
- 9. Daraojimba R.E., Farayola O.A., Olatoye F.O., et al. Forensic accounting in the digital age: a US perspective: scrutinizing methods and challenges in digital financial fraud prevention. Finance & Accounting Research Journal. 2023;5(11):342–360.
- 10. Mahmudov S.O., Temirova D. Load balancing method in smart city networks based software defined networking. International Conference on Information Science and Communications Technologies (ICISCT) 2021;2021:1–5.
- 11. Miah S.J., Vu H.Q., Alahakoon D. A social media analytics perspective for human-oriented smart city planning and management. Journal of the Association for Information Science & Technology. 2022;73.
- 12. Camero A., Alba E. Smart City and information technology: a review. Cities. 2019;93(10):84–94.
- 13. Cui Y., Zhang L. Privacy preserving ciphertext-policy attribute-based broadcast encryption in smart city. J. China Univ. Posts Telecommun. 2019;26(1):25–35.
- 14. Lakhno V., Malyukov V., Bochulia T., et al. Model of managing of the procedure of mutual financial investing in information technologies and smart city systems. Int. J. Civ. Eng. Technol. 2018;9(8):1802–1812.
- 15. Sun M., Zhang J. Research on the application of block chain big data platform in the construction of new smart city for low carbon emission and green environment. Comput. Commun. 2020;149(1):332–342.
- 16. Ridwan K., Suryotrisongko H., Tjahyanto A., et al. Public service for smart city through internet messenger: which messenger perform better in terms of 'anti identity fraud'? J. Theor. Appl. Inf. Technol. 2018;96(19):6542–6557.
- 17. Turgel I., Bozhko L., Ulyanova E., et al. Implementation of the smart city technology for environmental protection management of cities: the experience of Russia and Kazakhstan. Environmental and Climate Technologies. 2019;23(2):148–165.
- 18. Kim S.M., Jung H.S., Lee Y.W. Smart city cyber security based on information security industry. The Journal of Korean Institute of Information Technology. 2020;18(4):129–136.
- 19. Han D. Researches of detection of fraudulent financial statements based on data mining. J. Comput. Theor. Nanosci. 2017;14(1):32–36.
- 20. Akinbowale O.E., Klingelhöfer H.E., Zerihun M.F. An innovative approach in combating economic crime using forensic accounting techniques. J. Financ. Crime. 2020;27(4):1253–1271.
- 21. Qin R. Identification of accounting fraud based on support vector machine and logistic regression model. Complexity. 2021;2021(2):1–11.
- 22. Gadekar D.P., Singh Y.P. Efficiently identification of misrepresentation in social media based on rake algorithm. Int. J. Eng. Technol. 2018;7(4):471–474.
- 23. Dubrov V.E., Zlobina Y.S., Tishchenko S.A., et al. The algorithm for territorial distribution of public emergency rooms in megapolis (by the example of Moscow). Traumatology and Orthopedics of Russia. 2020;26(4):138–149.
- 24. Xu Y., Zhang L., Chen H. Board age and corporate financial fraud: an interactionist view. Long Range Plan. 2017;51(6):815–830.
- 25. Dae Yong J., Gibum K., Sangjin L., et al. A study on risk analysis and countermeasures of electronic financial fraud. Journal of the Korea Institute of Information Security & Cryptology. 2017;27(1):115–128.
- 26. Burnes D., Henderson C.R., Sheppard C., et al. Prevalence of financial fraud and scams among older adults in the United States: a systematic review and meta-analysis. Am. J. Publ. Health. 2017;107(8):1295. doi: 10.2105/AJPH.2017.303821.
- 27. Kr A., Yadav S., Sora M., et al. Financial fraud detection using deep learning approach. Des. Eng. 2021;2021(7):6254–6267.
- 28. Jamaludin S.Z.M., Romli N.A., Kasihmuddin M.S.M., et al. Novel logic mining incorporating log linear approach. Journal of King Saud University-Computer and Information Sciences. 2022;34(10):9011–9027.
- 29. Widharma F., Susilowati E. Auditor switching, financial distress, and financial statement fraud practices with audit report lag as intervening variable. Journal of Accounting and Strategic Finance. 2020;3(2):243–257.
- 30. Sukjae C., Jungwon L., Ohbyung K. Financial fraud detection using text mining analysis against municipal cybercriminality. Journal of Intelligence and Information Systems. 2017;23(3):119–138.
- 31. Li D.R. Cluster analysis algorithm based on key data integration for cloud computing. Int. J. Reas. base Intell. Syst. 2017;9(3):123–129.
- 32. Yoseph F., Malim N., Heikkilä M., et al. The impact of big data market segmentation using data mining and clustering techniques. J. Intell. Fuzzy Syst. 2020;38(1):1–15.
- 33. Su G. Analysis of optimisation method for online education data mining based on big data assessment technology. Int. J. Continuing Eng. Educ. Lifelong Learn. 2019;29(4):321–335.
- 34. Pandey K.K., Shukla D., Milan R. Data mining algorithm and new HRDSD theory for big data. International Journal of Computer Sciences and Engineering. 2019;7(3):76–81.
- 35. Yang F., Zhang J.C., Zhang F., et al. Big data mining and association algorithm for water resources. Boletin Tecnico/Technical Bulletin. 2017;55(7):85–91.
- 36. Gao Y. Educational resource information sharing algorithm based on big data association mining and quasi-linear regression analysis. Int. J. Continuing Eng. Educ. Lifelong Learn. 2019;29(4):336–348.
- 37. Kumar Y., Devargaon M.S. Big data cluster formation strategy identification using data mining architecture. IOSR J. Eng. 2019;9(12):42–46.
- 38. Kasihmuddin M.S.M., Romli N.A., Manoharam G., et al. Multi-unit Discrete Hopfield Neural Network for higher order supervised learning through logic mining: optimal performance design and attribute selection. Journal of King Saud University-Computer and Information Sciences. 2023;35(5).
- 39. Manoharam G., Kasihmuddin M.S.M., Antony S.N.F.M.A., et al. Log-linear-based logic mining with multi-discrete hopfield neural network. Mathematics. 2023;11(9):2121.
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.