F1000Research. 2025 May 22;14:253. Originally published 2025 Mar 3. [Version 2] doi: 10.12688/f1000research.161477.2

Synergistic review of automation impact of big data, AI, and ML in current data transformative era

Swastik Rath 1,a, Manjusha Pandey 1,b, Siddharth Swarup Rautaray 1,c
PMCID: PMC12296276  PMID: 40718378

Version Changes

Revised. Amendments from Version 1

The major differences between version 1 and version 2 are as follows: 1) The PRISMA flow chart was absent from version 1; version 2 includes it in the Methodology section of the manuscript. 2) Version 1 did not report the country-wise distribution of documents published in each of the six sectors; version 2 adds it in Figure 31 to Figure 36. 3) In version 1, the Conclusion only partly supported the results; in version 2 it has been revised and the supporting results are included. 4) Version 1 lacked an explicit justification for its synergistic focus and did not address gaps in the existing literature; version 2 adds both. 5) Version 2 adds new references that were not in version 1.

Abstract

The convergence of automation, big data analytics (BDA), artificial intelligence (AI), and machine learning (ML) has ushered in a new era of technological advancement, reshaping industries and societies worldwide. This review examines the transformative impact of these technologies, focusing on their applications across six key sectors: healthcare, banking, finance, retail, real estate, and agriculture. It highlights how these industries leverage automated systems and data analytics to enhance operations, manage risks, and improve decision-making processes. Drawing on more than 1,000 research papers, from which 100 key studies were selected and categorized, this survey-based review underscores the critical role of big data in enabling predictive analytics, improving outcomes, and driving innovation across sectors. The review explores how industries use vast data volumes from diverse sources to derive actionable insights, forecast trends, and optimize processes. Key applications covered include disease prediction and electronic health record management in healthcare, fraud detection and credit risk assessment in banking and finance, consumer behavior analysis and inventory optimization in retail, market trend forecasting in real estate, and disaster risk management in agriculture. The paper also discusses challenges including data quality, scalability, and privacy, and outlines future directions for big data analytics, emphasizing the need for machine-independent solutions, data security, and ethical considerations in the evolving landscape of data-driven decision-making.

Keywords: Big data analytics, artificial intelligence, machine learning, automation, predictive analytics, data-driven decision-making, risk management, fraud detection, consumer behavior analysis, electronic health records, cyber security [5], Sustainability

1. Introduction

This review offers a thorough examination of how automation has led to data proliferation across various industries, focusing on the effects of big data analytics (BDA), artificial intelligence (AI), and machine learning (ML) technologies. 1 The research covers six crucial sectors: healthcare, banking, finance, retail, real estate, and agriculture, highlighting their reliance on automated systems to enhance operations, manage risks, and improve decision-making processes. In the healthcare field, AI and BDA are revolutionizing disease forecasting, electronic health record (EHR) administration, and supply chain 2 performance. The banking and finance industries employ automation for detecting fraudulent activities, evaluating credit risks, and bolstering cybersecurity measures. Retail businesses benefit from automation through improved productivity, analysis of consumer behavior, 3 and prediction of sales trends. The real estate market utilizes predictive analytics to identify market patterns, particularly during the COVID-19 crisis, 4 while the agricultural sector leverages BDA to boost operational efficiency, predict area-specific crops and products, and manage disaster risks.

The generation of vast data volumes from diverse sources, such as EHRs, transactional records, and climate data, necessitates sophisticated big data technologies 5 for effective processing and insights. Data is recognized as a valuable resource, offering historical insights and predictive capabilities across these sectors. This survey draws from over 1,000 research papers, categorizing 100 key studies to underscore the transformative role of big data in enabling predictive analytics, improving outcomes, 6 and driving innovation in these industries.

1.1 Automation

This section outlines how the type and degree of automation required differ across the six domains covered by this review.

1.1.1 Healthcare: The reviewed studies automate healthcare management through big data 7 and artificial intelligence, improving disease prediction, supply chain efficiency, and patient care. 8 Tools such as machine learning, natural language processing, and deep learning streamline electronic health record analysis, 9 risk assessment, and decision making, 10 transforming healthcare practices and outcomes. 11 Studies on the transformative potential of big data in healthcare focus on improving system efficiency, patient outcomes, and personalized care. 12 They identify challenges in data quality, integration, and ethics, and recommend improved governance, interdisciplinary collaboration, and standardized frameworks to enhance big data’s effectiveness in healthcare applications. 13 Work on the integration of big data in digital healthcare provides lessons and recommendations for leveraging big data to improve patient outcomes, streamline operations, and advance personalized medicine in primary healthcare settings. 14 Research on applications of big data analytics in health care emphasizes big data’s potential to revolutionize healthcare delivery, improve patient outcomes, and address challenges such as data security and ethical considerations. 15

1.1.2 Banking: Automation in banking leverages big data and artificial intelligence for fraud detection, credit risk assessment, and operational optimization. Machine learning models and data analytics platforms enhance real-time insights, cybersecurity, and financial decision-making, driving efficiency across the sector.

1.1.3 Finance: In finance, automation driven by big data, artificial intelligence, and machine learning optimizes risk mitigation, credit evaluation, and financial analysis. IoT integration, data-driven processes, and advanced algorithms streamline financial services, sustainability assessments, and innovation, thereby transforming the industry’s operational landscape. 16

1.1.4 Retail: These studies automate retail operations using big data analytics, enhancing productivity, fraud detection, and customer behavior analysis. Big data-driven tools transform traditional retail, predict sales, and analyze the impact of external factors such as COVID-19.

1.1.5 Real estate: Automation in real estate leverages big data and predictive analytics to forecast market trends, particularly during the COVID-19 pandemic.

1.1.6 Agriculture: These papers automate agricultural processes and disaster management using big data analytics, driving operational efficiency and informed decision-making.

In total, 1,000 papers were retrieved, of which 500 matched the search categories; from these, 100 papers were read in full and grouped into categories for ease of use in writing this review. This categorization was carried out so that the review could proceed smoothly and systematically and cover the main domains of every category.

2. Methodology with PRISMA 2020 flowchart

A systematic review was conducted following PRISMA 2020 guidelines. Relevant studies were identified through comprehensive searches of databases such as PubMed, Scopus, and Web of Science. Inclusion and exclusion criteria were applied, and duplicate records were removed. Eligible articles were screened based on titles, abstracts, and full texts. Data extraction and quality assessment were performed independently by two reviewers. The selection process was documented using a PRISMA 2020 flowchart to ensure transparency and replicability ( Figure 1).
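The screening funnel can be tallied with a short script. The following is a minimal sketch in Python, assuming the three counts stated at the end of Section 1.1 (1,000 papers retrieved, 500 matching the search categories, 100 included). The stage labels and the `Stage` and `funnel_report` names are illustrative and are not part of the PRISMA 2020 template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    remaining: int  # records still in the review after this stage


def funnel_report(stages):
    """Return (stage name, records remaining, records excluded at this stage)."""
    report = []
    previous = None
    for stage in stages:
        excluded = 0 if previous is None else previous - stage.remaining
        if excluded < 0:
            raise ValueError(f"{stage.name}: count cannot grow between stages")
        report.append((stage.name, stage.remaining, excluded))
        previous = stage.remaining
    return report


# Counts as reported in Section 1.1; the stage labels are illustrative.
STAGES = [
    Stage("records identified", 1000),
    Stage("records matching search categories", 500),
    Stage("studies included in review", 100),
]

for name, remaining, excluded in funnel_report(STAGES):
    print(f"{name}: {remaining} (excluded at this stage: {excluded})")
```

Keeping the counts in one list makes it easy to check that every exclusion is accounted for before the flow diagram is drawn.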

Figure 1. PRISMA 2020 flow diagram for new systematic reviews which included searches of databases and registers only. *Consider, if feasible to do so, reporting the number of records identified from each database or register searched (rather than the total number across all databases/registers).**If automation tools were used, indicate how many records were excluded by a human and how many were excluded by automation tools.



3. Machine dependency in big data

This section examines machine dependency in big data across the six domains covered by this review.

3.1 Machine dependency in healthcare

These studies demonstrate a significant reliance on machine learning, artificial intelligence, and big data analytics for predictive 17 modeling, risk assessment, and decision-making in healthcare. They employ deep learning, natural language processing, and advanced data processing systems to automate complex tasks such as electronic health record analysis, disease prediction, and healthcare supply chain management ( Figure 2). 18

Figure 2. Healthcare Research Papers Published by Source (2018-2025).



Machine independence in healthcare refers to the capacity of healthcare systems 19 to operate effectively without dependence on specific hardware or software platforms. This enhances system flexibility, scalability, and longevity by mitigating vendor lock-in ( Figure 3).

Figure 3. Healthcare Research Publications (2018-2025): Number of Papers Published.



In healthcare, machine-independent solutions frequently utilize open-source technologies, cloud-based infrastructures, and standardized data formats. Through this approach, hospitals and care providers can integrate AI and big data analytics into their workflows while maintaining interoperability with various systems, such as EHR platforms from multiple vendors. 20 This promotes collaboration, data sharing, and system upgrades without incurring costly disruptions. 21
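As an illustration of how a standardized data format decouples analytics from individual EHR vendors, the following is a minimal Python sketch. The two vendor layouts, the field names, and the `PatientRecord` schema are hypothetical and do not correspond to any real EHR product or standard; the point is only that downstream code depends on the common record, not on a vendor's format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PatientRecord:
    """Common, vendor-neutral record used by downstream analytics."""
    patient_id: str
    birth_date: date
    systolic_bp: int  # mmHg


def from_vendor_a(raw: dict) -> PatientRecord:
    # Hypothetical vendor A: flat keys, ISO 8601 dates.
    return PatientRecord(
        patient_id=str(raw["id"]),
        birth_date=date.fromisoformat(raw["dob"]),
        systolic_bp=int(raw["sbp"]),
    )


def from_vendor_b(raw: dict) -> PatientRecord:
    # Hypothetical vendor B: nested keys, day/month/year dates, "120/80" strings.
    day, month, year = (int(part) for part in raw["patient"]["born"].split("/"))
    systolic = raw["vitals"]["blood_pressure"].split("/")[0]
    return PatientRecord(
        patient_id=str(raw["patient"]["mrn"]),
        birth_date=date(year, month, day),
        systolic_bp=int(systolic),
    )


ADAPTERS = {"vendor_a": from_vendor_a, "vendor_b": from_vendor_b}


def normalize(source: str, raw: dict) -> PatientRecord:
    """Convert a vendor-specific payload into the common record."""
    return ADAPTERS[source](raw)


def mean_systolic(records) -> float:
    records = list(records)
    return sum(r.systolic_bp for r in records) / len(records)


# Synthetic example payloads.
RECORDS = [
    normalize("vendor_a", {"id": 17, "dob": "1980-02-29", "sbp": "130"}),
    normalize(
        "vendor_b",
        {
            "patient": {"mrn": "B-9", "born": "05/11/1975"},
            "vitals": {"blood_pressure": "110/70"},
        },
    ),
]
print(mean_systolic(RECORDS))
```

Supporting a further vendor then means adding one adapter function, while `mean_systolic` and any other analytics stay unchanged.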

Machine independence is particularly crucial as healthcare progresses towards Industry 4.0, 22 where data-driven approaches are central. 23 Ensuring that AI and big data tools 24 are machine-independent facilitates seamless integration across diverse healthcare environments, supporting a more robust, adaptable, and accessible healthcare system. It also facilitates the adoption of innovative technologies and enhances patient outcomes across various clinical settings ( Figure 4).

Figure 4. Healthcare Research Papers by Affiliation (2018-2025): Publication Trends.



Machine independence in healthcare is essential for ensuring that critical systems and technologies can adapt to rapidly changing environments without being constrained by specific hardware or software. This characteristic is particularly significant in the context of big data analytics and artificial intelligence (AI), where data sources and computational requirements vary across healthcare providers. Machine-independent systems can operate on diverse infrastructure configurations, whether on-premise, in the cloud, or across hybrid systems, promoting flexibility and scalability ( Figure 5).

Figure 5. Healthcare Research Papers by Universities (2018-2025): Publication Trends.



In healthcare, where integration and interoperability are crucial, machine independence enables different institutions to share, access, and process large volumes of data across disparate systems. This flexibility supports enhanced decision-making, expedited diagnoses, and more personalized treatment plans by facilitating the utilization of advanced AI algorithms and big data analytics, irrespective of the platform.

One reviewed study emphasizes the transformative potential of big data in areas such as diagnostics, disease prediction, personalized medicine, and operational management. Its authors discuss how big data supports evidence-based decision-making, improves patient outcomes, and streamlines healthcare processes. 25

A significant advantage of machine independence is its capacity to foster innovation without being limited by existing systems. Healthcare facilities can incorporate cutting-edge technologies such as natural language processing (NLP), computer vision, or machine learning algorithms into their operations, regardless of the underlying IT infrastructure. This allows healthcare organizations to evolve alongside technological advancements without costly overhauls or disruptions to patient care ( Figure 6).

Figure 6. Healthcare Research Papers by Source (2018-2025): Publication Trends.



Machine independence also contributes to more robust and resilient healthcare infrastructures, enabling organizations to diversify their technology stack, enhance system security, and ensure continuity of operations even in the event of specific hardware or software failures ( Figure 7).

Figure 7. Healthcare Research Papers Published Country-wise (2018-2024), in Percentage: Publication Trends.



3.2 Machine dependency in banking

Machine learning models and big data analytics 26 play a central role in automating fraud detection, credit risk assessment, and real-time banking insights. These studies emphasize the significance of artificial intelligence and data-driven platforms in enhancing banking operations, cybersecurity, and financial decision making ( Figure 8).

Figure 8. Banking Research Papers Published by Source (2019-2024).



Machine dependency in big data analytics, especially in the context of banking and finance, plays a pivotal role in achieving the scale, speed, and complexity required for modern applications such as fraud detection, credit risk assessment, and cybersecurity. The machine dependency aspect refers to how heavily these analytics systems rely on computational power, advanced algorithms, and data storage infrastructure to deliver accurate and timely results ( Figure 9).

Figure 9. Banking Research Publications (2019-2024): Number of Papers Published.



In credit/debit card fraud detection, 27 especially in real-time, machine learning models such as decision trees, neural networks, and support vector machines (SVM) require powerful computing resources to process massive amounts of transactional data at high speeds. The dependency on machines ensures that fraud can be detected before it causes substantial financial damage ( Figure 10).
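To make the decision-tree idea concrete, the following is a minimal Python sketch of a one-level decision tree (a decision stump) that learns a single amount threshold. It is a deliberately simplified stand-in, not the models used in the reviewed studies; the data are synthetic, and production systems would use many features and far larger volumes, which is where the computing demand described above arises.

```python
def fit_stump(amounts, labels):
    """Pick the amount threshold that misclassifies the fewest transactions.

    Rule learned: predict fraud (1) when amount > threshold, else legitimate (0).
    Returns (threshold, number of training errors).
    """
    best_threshold, best_errors = None, len(labels) + 1
    for threshold in sorted(set(amounts)):
        errors = sum(
            (amount > threshold) != bool(label)
            for amount, label in zip(amounts, labels)
        )
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


def predict(threshold, amount):
    """Apply the learned rule to one transaction amount."""
    return int(amount > threshold)


# Synthetic toy data (not from any reviewed study): amounts and fraud labels.
AMOUNTS = [12.0, 25.5, 40.0, 60.0, 900.0, 1500.0, 35.0, 2200.0]
LABELS = [0, 0, 0, 0, 1, 1, 0, 1]

threshold, errors = fit_stump(AMOUNTS, LABELS)
print(f"threshold={threshold}, training errors={errors}")
print(predict(threshold, 75.0))
```

The exhaustive threshold search costs one pass over the data per candidate, which is one reason tree training on massive transaction logs is usually distributed or hardware-accelerated.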

Figure 10. Banking Research Papers by Affiliation (2019-2024): Publication Trends.



For credit/debit card-not-present fraud detection, 28 machine dependency becomes even more crucial. The large-scale analysis of transactional metadata (such as geolocation and device fingerprints) relies on big data frameworks like Hadoop and Spark, alongside cloud-based computing resources. These systems enable parallel processing of vast datasets, ensuring that analytics models can detect suspicious behavior in real-time ( Figure 11).
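The parallel pattern these frameworks rely on can be sketched in plain Python. The example below is an assumption-laden illustration, not Hadoop or Spark code: threads stand in for cluster workers, the `device` and `card` field names are hypothetical, and the rule (flag a device fingerprint seen with more than `max_cards` distinct cards) is chosen only to show a map step per partition followed by a reduce step.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_partition(transactions):
    """Map step: within one partition, collect the cards seen per device."""
    cards_by_device = defaultdict(set)
    for txn in transactions:
        cards_by_device[txn["device"]].add(txn["card"])
    return cards_by_device


def reduce_partials(partials):
    """Reduce step: union the per-partition card sets for each device."""
    merged = defaultdict(set)
    for partial in partials:
        for device, cards in partial.items():
            merged[device] |= cards
    return merged


def suspicious_devices(partitions, max_cards=2):
    # Threads stand in for cluster workers; a real deployment would run the
    # map step on a distributed engine rather than in a single process.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_partition, partitions))
    merged = reduce_partials(partials)
    return sorted(d for d, cards in merged.items() if len(cards) > max_cards)


# Synthetic partitions with hypothetical field names and values.
PARTITIONS = [
    [{"device": "dev-1", "card": "c1"}, {"device": "dev-2", "card": "c2"}],
    [{"device": "dev-1", "card": "c3"}, {"device": "dev-1", "card": "c4"}],
    [{"device": "dev-2", "card": "c2"}, {"device": "dev-3", "card": "c5"}],
]

print(suspicious_devices(PARTITIONS))
```

Because each partition is summarized independently, the map step scales out with the number of workers; only the much smaller partial results need to be brought together.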

Figure 11. Banking Research Papers by Universities (2019-2024): Publication Trends.



In credit risk assessment and quantifying cybersecurity risks, 29 machine learning algorithms such as decision trees and SVMs are computationally intensive, especially when analyzing complex and large datasets. Machine dependency in these tasks ensures that the models can run continuously and make decisions based on real-time data streams, which is critical for timely risk assessments ( Figure 12).
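Decisions on a continuous data stream require statistics that are updated one observation at a time. As a hedged illustration, the Python sketch below uses Welford's online algorithm to maintain a running mean and standard deviation and flags values whose z-score exceeds a threshold. This is a simple statistical baseline chosen for clarity, not the decision-tree or SVM models named above; the threshold, warm-up length, and stream values are assumptions.

```python
import math


class RunningStats:
    """Welford's online algorithm for mean and sample standard deviation."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def std(self):
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))


def flag_outliers(stream, z_threshold=3.0, warmup=5):
    """Yield (value, is_outlier), scoring each value before it is learned."""
    stats = RunningStats()
    for value in stream:
        is_outlier = False
        if stats.count >= warmup and stats.std > 0:
            is_outlier = abs(value - stats.mean) / stats.std > z_threshold
        yield value, is_outlier
        if not is_outlier:
            stats.update(value)  # keep flagged values out of the baseline


# Synthetic stream of transaction amounts.
STREAM = [20.0, 22.0, 19.0, 21.0, 20.0, 23.0, 500.0, 21.0]
print([value for value, flagged in flag_outliers(STREAM) if flagged])
```

The state kept per stream is three numbers, so memory stays constant however long the stream runs; the computational burden in practice comes from running such updates across millions of accounts at once.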

Figure 12. Banking Research Papers by Source (2019-2024): Publication Trends.



The overall dynamics of big data analytics in the banking sector 30 show an increasing reliance on AI and machine learning models that require not just data but also computing power to function effectively. With the imperatives of big data in finance, 31 the integration of cloud infrastructure, distributed computing systems, and advanced hardware (such as GPUs) becomes essential to run algorithms at the scale needed for financial applications ( Figure 13).

Figure 13. Banking Research Papers Published Country-wise (2019-2024), in Percentage: Publication Trends.



In summary, machine dependency is foundational to enabling big data analytics systems to handle large, complex datasets, ensuring real-time insights and decision-making in the banking and finance sector. 32

3.3 Machine dependency in finance

Big data, artificial intelligence, and machine learning facilitate the automation of financial processes such as risk mitigation, fraud detection, and credit evaluation. These studies highlight the integration of the IoT, data-driven analysis, and machine learning algorithms to transform financial operations, 33 sustainability analysis, and digital finance 34 services ( Figure 14).

Figure 14. Finance Research Papers Published by Source (2018-2025).



Machine dependency in big data within the finance sector refers to the reliance on specific hardware, software, and computing infrastructure to effectively process, analyze, and utilize large datasets for financial decision-making. Big data analytics in finance often necessitates powerful computational resources, specialized software, and cloud platforms that support extensive data storage and high-speed processing. Machine learning algorithms employed for tasks such as risk mitigation, fraud detection, and credit evaluation depend on specific systems and frameworks for training models and making predictions ( Figure 15).

Figure 15. Finance Research Publications (2018-2025): Number of Papers Published.



The financial industry frequently encounters challenges due to machine dependency, such as reliance on proprietary software for algorithmic trading, data analysis, or fraud detection, which constrains flexibility and interoperability across different systems. Financial institutions may become constrained within specific vendor ecosystems, diminishing their ability to switch providers or integrate new technologies without substantial costs ( Figure 16).

Figure 16. Finance Research Papers by Affiliation (2018-2025): Publication Trends.



Machine dependency also raises concerns regarding data security, as financial data must often be stored and processed on particular machines or cloud environments, potentially increasing the risk of cyber threats if systems are compromised. While machine learning and artificial intelligence enhance financial processes, the dependency on particular machines and infrastructure can impede innovation and adaptability unless measures are taken to adopt more platform-agnostic solutions, such as open-source software or cloud-agnostic platforms ( Figure 17).
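One common way to keep analytics code platform-agnostic is to program against a small storage interface rather than a particular provider's client library. The following is a minimal Python sketch under that assumption; the `BlobStore` interface, both backends, and the exposure figures are hypothetical, and a cloud backend would simply be a third class with the same two methods.

```python
import json
import tempfile
from pathlib import Path
from typing import Protocol


class BlobStore(Protocol):
    """Minimal storage interface that the analytics code depends on."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Keeps blobs in a dictionary; useful for tests."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


class LocalDirStore:
    """Stores each blob as a file; stands in for an on-premise backend."""

    def __init__(self, root):
        self._root = Path(root)
        self._root.mkdir(parents=True, exist_ok=True)

    def put(self, key, data):
        (self._root / key).write_bytes(data)

    def get(self, key):
        return (self._root / key).read_bytes()


def save_exposure_report(store: BlobStore, exposures: dict) -> None:
    """Write a small report through whichever backend is supplied."""
    report = {"total_exposure": sum(exposures.values()), "by_desk": exposures}
    store.put("exposure.json", json.dumps(report, sort_keys=True).encode())


def load_total(store: BlobStore) -> float:
    return json.loads(store.get("exposure.json"))["total_exposure"]


EXPOSURES = {"retail": 1.5, "corporate": 2.5}  # hypothetical figures
with tempfile.TemporaryDirectory() as tmp:
    for store in (InMemoryStore(), LocalDirStore(tmp)):
        save_exposure_report(store, EXPOSURES)
        print(type(store).__name__, load_total(store))
```

Switching providers then touches only the backend class, which limits the re-engineering cost that vendor lock-in would otherwise impose.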

Figure 17. Finance Research Papers by Universities (2018-2025): Publication Trends.



Machine dependency in big data in finance 35 refers to the reliance on specific computational infrastructure, algorithms, and software systems to process and analyze massive amounts of financial data. As financial institutions increasingly adopt big data analytics and machine learning for risk mitigation, fraud detection, and credit evaluation, machine dependency can become a limiting factor. This reliance on proprietary hardware, software, or algorithms often results in inflexibility, impeding adaptation to new technologies or integration with other systems ( Figure 18). 36

Figure 18. Finance Research Papers by Source (2018-2025): Publication Trends.



For instance, risk mitigation and fraud detection models may rely heavily on certain machine learning frameworks that are optimized for specific hardware environments. In such cases, transitioning to alternative platforms or integrating with different systems might necessitate costly re-engineering. Similarly, credit evaluation systems driven by big data and the Internet of Things (IoT) may depend on custom-built architectures, thereby restricting scalability or interoperability across different platforms ( Figure 19).

Figure 19. Finance Research Papers Published Country-wise (2018-2025), in Percentage: Publication Trends.



3.4 Machine dependency in retail

Retail-related studies utilize big data technologies, including advanced analytics, machine learning models, and data fusion techniques, to optimize retail store productivity, detect fraud, predict sales trends, and analyze consumer behavior ( Figure 20).

Figure 20. Retail Research Papers Published by Source (2019-2024).



Machine dependency in the context of big data technologies in retail refers to the reliance on specific hardware, software, or algorithms that can create barriers to flexibility, scalability, and innovation. As retail companies increasingly adopt big data analytics for various applications, such as enhancing productivity, combating fraud, and understanding consumer behavior, this dependency can significantly impact their operational efficiency and adaptability ( Figure 21).

Figure 21. Retail Research Publications (2019-2024): Number of Papers Published.



For instance, many retail analytics 37 solutions require robust computational infrastructure to process large datasets effectively. This often leads to reliance on specific machine learning frameworks and databases optimized for hardware configurations. Such dependencies can hinder a retailer’s ability to integrate newer technologies or switch to alternative systems, resulting in increased costs and longer implementation times. For example, in analyzing the sales data of Bigmart 38 outlets, reliance on specific analytics software may limit the organization’s ability to pivot quickly in response to market changes ( Figure 22).

Figure 22. Retail Research Papers by Affiliation (2019-2024): Publication Trends.



Additionally, the use of big data to enhance data envelopment analysis (DEA) of retail store productivity 39 illustrates another aspect of machine dependency. If the DEA models are built on specific platforms, retailers may face challenges in scaling these models or adapting them to different data sources, restricting their capacity to evaluate productivity effectively across diverse retail environments ( Figure 23).
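For readers unfamiliar with DEA, the simplest case can be written in a few lines of platform-neutral code. The Python sketch below assumes one input and one output per store and constant returns to scale, in which case the DEA (CCR) efficiency score reduces to each store's output/input ratio divided by the best ratio in the sample. Real studies use multiple inputs and outputs and solve a linear program per store; the store names and figures here are hypothetical.

```python
def ccr_efficiency(stores):
    """Relative efficiency for one input and one output per store.

    With a single input and a single output under constant returns to
    scale, the DEA score is each store's output/input ratio divided by
    the best ratio observed in the sample.
    """
    ratios = {
        name: output / input_ for name, (input_, output) in stores.items()
    }
    best = max(ratios.values())
    return {name: ratio / best for name, ratio in ratios.items()}


# Hypothetical stores: (staff hours, weekly sales in thousands).
STORES = {
    "store_a": (100.0, 50.0),
    "store_b": (200.0, 80.0),
    "store_c": (150.0, 75.0),
}

for name, score in ccr_efficiency(STORES).items():
    print(f"{name}: {score:.2f}")
```

A score of 1.00 marks a store on the efficient frontier; lower scores give the fraction of the best observed productivity that a store achieves.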

Figure 23. Retail Research Papers by Universities (2019-2024): Publication Trends.



Fraud detection systems, which leverage big data analytics to combat retail fraud, 40 are also susceptible to machine dependency. Often, these systems are built on tailored algorithms that operate best within certain frameworks. This can limit retailers’ ability to update or adapt their fraud detection capabilities as new threats emerge, ultimately compromising security.

The analysis of consumer behavior using big data further underscores the risks of machine dependency. Retailers may become overly reliant on certain data sources, such as social media 41 or point-of-sale data, processed through specific analytics platforms. This can result in a narrow view of consumer insights, limiting the retailer’s ability to innovate and respond to shifting consumer preferences ( Figure 24).

Figure 24. Retail Research Papers by Source (2019-2024): Publication Trends.



Moreover, the transformation of traditional retail 42 in the era of big data often involves integrating multiple data streams, including demographic data, sales data, and online behavior. If retailers are locked into specific technologies, they may struggle to achieve the necessary interoperability and flexibility to leverage these diverse data sources effectively.

In conclusion, while big data technologies present significant opportunities for retail companies, the potential for machine dependency poses challenges that can inhibit innovation, limit adaptability, and increase operational costs. 43 Retailers need to adopt machine-independent solutions to ensure flexibility in their big data initiatives and maintain a competitive edge in a rapidly evolving market ( Figure 25). 44

Figure 25. Retail Research Papers Published Country-wise (2019-2024), in Percentage: Publication Trends.



3.5 Machine dependency in real estate

The studies on forecasting commercial real estate indicators and predictive analytics for the real estate market during the COVID-19 pandemic 45 predominantly utilize machine learning models, big data algorithms, and social data analytics. These methodologies rely on computational processing capabilities to analyze extensive datasets, monitor human behavior, and effectively predict market trends ( Figure 26).
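As a point of reference for what trend prediction involves computationally, the following Python sketch fits an ordinary least squares line to a short series and extrapolates it. This is a basic baseline under stated assumptions, not the machine learning models used in the reviewed studies, and the index values are hypothetical.

```python
def fit_linear_trend(values):
    """Ordinary least squares fit of values against time steps 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    sxx = sum((t - mean_t) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return slope, intercept


def forecast(values, steps_ahead=1):
    """Extrapolate the fitted line steps_ahead periods past the last point."""
    slope, intercept = fit_linear_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


# Hypothetical quarterly index values, not taken from any reviewed study.
INDEX = [100.0, 102.0, 104.0, 106.0]
print(forecast(INDEX, steps_ahead=2))
```

A baseline of this kind runs on any machine; the dependence on computational capacity discussed above grows as models incorporate large social and behavioral datasets.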

Figure 26. Real Estate Research Papers Published by Source (2019-2024).



Machine dependency in the context of forecasting commercial real estate indicators and predictive analytics during the COVID-19 pandemic 46 refers to the reliance on specific technological platforms, software applications, or algorithms for analyzing and interpreting large volumes of data. As the real estate market navigates the challenges posed by the pandemic, this dependency can significantly impact the accuracy and effectiveness of predictive models and insights.

For instance, when leveraging social big data to forecast commercial real estate indicators, researchers and analysts often depend on specific data processing frameworks or machine learning algorithms optimized for particular types of hardware. This dependency can create barriers to integrating diverse data sources, such as social media activity, economic indicators, and market trends. If the analytical models are designed around specific technological constraints, it can limit flexibility, making it difficult to adapt to changing market conditions or incorporate emerging data streams relevant to understanding human behavior during the pandemic ( Figure 27).

Figure 27. Real Estate Research Publications (2019-2024): Number of Papers Published.



Similarly, predictive analytics using big data for the real estate market may involve the use of proprietary software or tailored machine learning solutions. These systems can be complex and require specialized knowledge to operate effectively. This machine dependency might hinder real estate firms’ ability to rapidly adjust their analytics strategies in response to evolving market dynamics or consumer behaviors influenced by COVID-19. For example, if a predictive model is entrenched in a specific platform, switching to a more advanced or efficient system could entail significant costs and downtime, limiting the firm’s responsiveness to the fluidity of the real estate landscape ( Figure 28).

Figure 28. Real Estate Research Papers by Affiliation (2019-2024): Publication Trends.



Furthermore, the reliance on specific data processing and analytics tools can create a homogenized approach to data interpretation, leading to potential blind spots. When all analyses are conducted through a single technological lens, real estate analysts may miss critical insights that could arise from utilizing alternative methods or platforms. This is particularly relevant during a pandemic when market conditions are volatile, and the ability to pivot and adopt new methodologies quickly is paramount ( Figure 29).

Figure 29. Real Estate Research Papers by Universities (2019-2024): Publication Trends.



Moreover, machine dependency can pose risks related to data privacy and security. Relying on specific systems may increase vulnerabilities, especially when handling sensitive data related to human activity and real estate transactions. If these systems are compromised, the implications can be severe, not only for individual firms but also for the broader market ( Figure 30).

Figure 30. Real Estate Research Papers by Source (2019-2024): Publication Trends.



In conclusion, while big data analytics offers valuable tools for forecasting and predictive modeling in commercial real estate, machine dependency presents significant challenges. The reliance on specific technologies can inhibit flexibility, limit innovation, and create risks that may impede effective decision-making during uncertain times, such as the COVID-19 pandemic. To enhance adaptability and maintain a competitive edge, real estate firms should prioritize machine-independent solutions that enable more agile and comprehensive analysis ( Figure 31).

Figure 31. Real Estate Research Papers Published Country-wise (2019-2024), in Percentage: Publication Trends.



3.6 Machine dependency in agriculture

Agricultural studies depend on big data analytics and machine learning models to automate disaster risk management 47 and operational decision-making.

Machine dependency in the context of agricultural disaster risk management and big data analytics applications in information management highlights the challenges associated with relying on specific technologies, software platforms, or algorithms to process and analyze large datasets. As these fields increasingly adopt big data analytics to enhance operational efficiencies and decision-making, the reliance on machines or systems can create barriers to flexibility, scalability, and innovation ( Figure 32).

Figure 32. Agriculture Research Papers Published by Source (2019-2024).



In agricultural disaster risk management, big data analytics plays a crucial role in assessing risks, predicting events, and formulating response strategies. However, if these analytical processes are tied to specific hardware or proprietary software, the capacity to adapt to new data sources or methodologies can be significantly constrained. For instance, if a system is designed to analyze agricultural data exclusively through a particular cloud platform or machine learning framework, it may struggle to incorporate alternative data streams, such as real-time satellite imagery or sensor data from IoT devices. This can result in incomplete risk assessments and a diminished ability to respond effectively to emerging threats ( Figure 33).

Figure 33. Agriculture Research Publications (2019-2024): Number of Papers Published.



Moreover, the bibliometric analysis of big data analytics applications in information management further illustrates machine dependency concerns. In this study, the reliance on specific statistical tools and programming languages, such as R, can limit the accessibility and reproducibility of the findings. Researchers may find themselves locked into particular analytical approaches, which can hinder the exploration of diverse methodologies or frameworks that could yield different insights. This dependency on certain technologies can stifle innovation and reduce the overall robustness of the research ( Figure 34).

Figure 34. Agriculture Research Papers by Affiliation (2019-2024): Publication Trends.



Another critical aspect of machine dependency is the risk of vendor lock-in. If organizations commit to specific platforms for their big data analytics, they may face challenges when it comes time to upgrade or switch technologies. The associated costs and complexities of transitioning to new systems can lead to stagnation in their analytics capabilities. For example, agricultural organizations using specific big data tools may find it difficult to implement new analytical techniques or integrate cutting-edge technologies without incurring substantial costs or facing lengthy implementation bottlenecks ( Figure 35).

Figure 35. Agriculture Research Papers by Universities (2019-2024): Publication Trends.



Furthermore, machine dependency can raise concerns about data security and privacy. When data is processed through specific machines or software, vulnerabilities may arise, particularly if those systems are not updated regularly or lack robust security measures. In agricultural settings, where data can include sensitive information about crop yields, market prices, 48 or resource allocations, the implications of such vulnerabilities can be catastrophic, potentially leading to data breaches or misuse (Figure 36).

Figure 36. Agriculture Research Papers by Source (2019-2024): Publication Trends.



In summary, while big data analytics has transformative potential in agricultural disaster risk management and information management, machine dependency presents notable challenges. This reliance on specific technologies can limit flexibility, hinder innovation, and create risks that may undermine decision-making capabilities. To overcome these challenges, organizations should seek machine-independent solutions that promote interoperability and adaptability, ensuring they can effectively leverage big data analytics for improved outcomes in their respective fields (Figure 37).
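One way to picture a machine-independent design is to keep the risk analysis separate from the systems that supply the data. The following is a minimal Python sketch under stated assumptions, not an implementation from any of the reviewed studies: the class names (`Observation`, `SatelliteFeed`, `IoTSensorFeed`), the rainfall field, and the threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Observation:
    """One reading, normalized to a platform-neutral shape (hypothetical schema)."""
    region: str
    rainfall_mm: float


class DataSource(Protocol):
    """Anything that can yield observations, regardless of vendor or hardware."""

    def read(self) -> Iterable[Observation]: ...


class SatelliteFeed:
    """Hypothetical adapter for satellite-derived rainfall estimates (in mm)."""

    def __init__(self, rows: list[tuple[str, float]]) -> None:
        self._rows = rows

    def read(self) -> Iterable[Observation]:
        return [Observation(region, mm) for region, mm in self._rows]


class IoTSensorFeed:
    """Hypothetical adapter for field sensors that report rainfall in centimetres."""

    def __init__(self, rows: list[dict]) -> None:
        self._rows = rows

    def read(self) -> Iterable[Observation]:
        # Convert to millimetres so downstream code sees a single unit.
        return [Observation(r["site"], r["rain_cm"] * 10.0) for r in self._rows]


def flood_risk_regions(sources: Iterable[DataSource], threshold_mm: float) -> set[str]:
    """Return regions where any source reports rainfall at or above the threshold."""
    return {
        obs.region
        for source in sources
        for obs in source.read()
        if obs.rainfall_mm >= threshold_mm
    }


if __name__ == "__main__":
    sources = [
        SatelliteFeed([("north", 120.0), ("south", 30.0)]),
        IoTSensorFeed([{"site": "east", "rain_cm": 15.0}]),
    ]
    print(sorted(flood_risk_regions(sources, threshold_mm=100.0)))
```

Because `flood_risk_regions` depends only on the `DataSource` interface, a new data stream can be added by writing another adapter, without changing the analysis itself.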

Figure 37. Agriculture Research Papers Published by Country (2019-2024), in Percentage: Publication Trends.

4. Domain-specific generation of large volumes of data

This section draws on the reviewed papers to enumerate and explore the diverse domains in which substantial volumes of data are generated, particularly healthcare, finance, retail, and agriculture. In the healthcare sector, 49 extensive data are generated through electronic health records (EHRs), 50 wearable sensors, medical imaging, and genomics. The finance industry contributes by processing large-scale transactional data, fraud detection patterns, and customer behavior analytics. In the retail sector, big data emerges from consumer purchasing patterns, inventory management, and social media influences. Agriculture generates substantial data through climate records, crop monitoring, and production analytics. Driven by real-time data acquisition from diverse sources, these industries collectively produce and manage colossal data volumes, rendering big data analytics essential for processing, prediction, and decision-making.

5. Data as a resource

5.1 As valuable insights in historical data

The aforementioned papers elucidate the progressive role of big data analytics across various sectors, with particular emphasis on healthcare, finance, and retail. These fields have undergone a transition from conventional data management practices to the utilization of artificial intelligence, machine learning, and big data technologies for predictive analytics, risk management, and operational efficiency. The healthcare sector 51 focuses on disease prediction and electronic health record (EHR) management, 52 while the finance industry emphasizes fraud detection and credit risk assessment. The retail and agricultural sectors benefit from demand forecasting and disaster risk management, respectively, signifying substantial advancements in data-driven decision-making processes.

5.2 Data as a predictor for future data

This section reviews how current data can serve as a predictor of future data across sectors:

5.2.1 Healthcare:

  • Disease prediction: Advanced AI models will analyze genetic, environmental, and lifestyle data to predict individual disease risks with higher accuracy. 53
  • Supply chain efficiency: Predictive analytics will optimize inventory management, reducing waste and ensuring timely delivery of medical supplies and pharmaceuticals. 54
  • Patient care outcomes: Machine learning algorithms will analyze treatment efficacy across diverse patient populations, enabling personalized medicine approaches.
  • Electronic health record trends: Natural language processing will enhance the extraction of meaningful insights from unstructured medical data, improving clinical decision support systems.
  • Risk assessment: AI-driven tools will evaluate population health risks, enabling proactive public health interventions and resource allocation.

5.2.2 Banking:

  • Fraud detection: Real-time anomaly detection systems will identify complex fraud patterns across multiple channels, reducing financial losses. 55
  • Credit risk assessment: Advanced models will incorporate alternative data sources to provide more accurate and inclusive credit scoring.
  • Operational optimization: AI-powered process 56 automation will streamline back-office operations, reducing costs and improving customer service.
  • Cybersecurity 57 threats: Predictive models will anticipate emerging cyber threats, enabling proactive defense strategies.
  • Financial decision-making trends: AI assistants will provide personalized financial advice based on individual spending patterns and market trends.

5.2.3 Finance:

  • Risk mitigation: Sophisticated models will analyze global economic indicators to predict market volatility and optimize investment strategies.
  • Credit evaluation: AI algorithms will assess creditworthiness using non-traditional data sources, expanding access to financial services.
  • Financial analysis: Natural language processing will extract insights from financial reports and news, enhancing investment decision-making.
  • Sustainability assessment: AI tools will evaluate companies’ environmental, social, and governance (ESG) performance, influencing investment choices. 58
  • Digital finance trends: Predictive models will forecast the adoption and impact of emerging technologies like blockchain and decentralized finance. 59

5.2.4 Retail:

  • Sales trends: AI-driven demand forecasting will optimize inventory management and pricing strategies across multiple channels.
  • Customer behavior: Advanced analytics 60 will predict individual customer preferences, enabling hyper-personalized marketing and product recommendations.
  • Fraud detection: Machine learning models will identify fraudulent transactions and returns in real time, reducing losses. 61
  • Productivity optimization: AI-powered workforce management systems will predict staffing needs and optimize employee scheduling.
  • External factor impacts: Predictive models will assess the influence of economic, social, and environmental factors on consumer behavior and sales performance.
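The link between demand forecasting and inventory decisions in the retail list above can be sketched with a deliberately simple stand-in. The Python example below uses simple exponential smoothing rather than the AI-driven models the list anticipates; the function names, the smoothing factor `alpha`, and the sales figures are assumptions made for illustration only.

```python
def exponential_smoothing_forecast(sales: list[float], alpha: float = 0.5) -> float:
    """One-step-ahead forecast via simple exponential smoothing.

    Update rule: level = alpha * y + (1 - alpha) * level, seeded with the first value.
    """
    if not sales:
        raise ValueError("sales history must not be empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = sales[0]
    for y in sales[1:]:
        level = alpha * y + (1.0 - alpha) * level
    return level


def reorder_quantity(forecast: float, on_hand: float, safety_stock: float = 0.0) -> float:
    """Units to order so that stock covers the forecast plus a safety buffer."""
    return max(0.0, forecast + safety_stock - on_hand)


if __name__ == "__main__":
    weekly_sales = [100.0, 120.0, 110.0, 130.0]  # hypothetical unit sales
    forecast = exponential_smoothing_forecast(weekly_sales, alpha=0.5)
    print(forecast, reorder_quantity(forecast, on_hand=90.0, safety_stock=10.0))
```

A larger `alpha` weights recent sales more heavily, which reacts faster to shifts in demand at the cost of noisier forecasts.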

5.2.5 Real Estate:

  • Market trends during crises: AI models will analyze historical data and current economic indicators to predict property value fluctuations during economic downturns or global events.
  • Location-based valuation: Machine learning algorithms will incorporate diverse data sources (e.g., urban development plans, climate change projections) to predict long-term property value trends. 62
  • Rental market dynamics: Predictive models will forecast rental demand and pricing trends, considering factors like remote work adoption and demographic shifts.
  • Investment risk assessment: AI-driven tools will evaluate potential risks and returns for real estate investments across different property types and locations.
  • Sustainability impact: Predictive analytics will assess the impact of energy efficiency and sustainability features on property values and market demand.

5.2.6 Agriculture:

  • Operational efficiency: AI-powered systems will optimize resource allocation, including water usage, fertilizer application, and machinery deployment.
  • Disaster risk management: Predictive models will forecast extreme weather events and pest outbreaks, enabling proactive mitigation strategies.
  • Crop yield projections: Machine learning algorithms will analyze soil conditions, weather patterns, and genetic data to predict crop yields with higher accuracy. 63
  • Resource allocation: AI-driven decision support systems will optimize the allocation of land, water, and labor resources based on market demand and environmental factors.
  • Precision agriculture: Advanced analytics will enable highly targeted interventions at the individual plant level, maximizing yields while minimizing resource use.
  • Supply chain optimization: Predictive models will forecast global agricultural supply and demand, informing planting decisions and reducing food waste.
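To make the crop yield item above concrete, the sketch below fits a one-variable least-squares line that relates seasonal rainfall to yield. This is a simplification chosen for brevity: the list refers to machine learning over soil, weather, and genetic data, whereas this example uses a single feature, and the rainfall and yield figures are invented purely for illustration.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_yield(rainfall_mm: float, slope: float, intercept: float) -> float:
    """Predicted yield (tonnes per hectare) for a given seasonal rainfall."""
    return slope * rainfall_mm + intercept


if __name__ == "__main__":
    rainfall = [300.0, 400.0, 500.0, 600.0]  # hypothetical seasonal rainfall, mm
    yields = [2.0, 2.5, 3.0, 3.5]            # hypothetical yield, t/ha
    slope, intercept = fit_line(rainfall, yields)
    print(round(predict_yield(550.0, slope, intercept), 3))
```

In practice, a model of this kind would be validated on held-out seasons before its projections inform planting decisions.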

This section has delineated the domains in which big data analytics, artificial intelligence, and machine learning can generate predictive insights across sectors.

6. Need of the hour survey for various applications of data analytics

This survey presents a comprehensive analysis of data analytics applications across multiple key sectors, drawing insights from over 1,000 research papers. By focusing on 100 of these papers, it provides a systematic classification of research into several domains: Healthcare, Banking, Finance, Retail, Real Estate, Agriculture, and Credit Card Fraud. Each of these sectors utilizes data analytics in distinct ways, and this survey elucidates the trends and innovations shaping the implementation of data-driven technologies in these fields.

7. Key domains

7.1 Healthcare

Data analytics is applied to enhance patient outcomes, optimize hospital operations, and advance personalized medicine. Big data in healthcare 64 is transforming the utilization of patient data, electronic health records, 65 and medical research to improve diagnosis, treatment, 66 and prevention strategies.

7.2 Banking and finance

In banking and financial services, data analytics is extensively utilized 67 for risk management, fraud detection, customer profiling, and predictive modeling. Through the analysis of large datasets, financial institutions can forecast market trends, enhance decision-making processes, and improve compliance measures. 68

7.3 Retail

The retail sector employs data analytics to optimize supply chains, enhance customer experiences, and refine marketing strategies. Big data enables retailers to analyze purchasing patterns, predict demand, and personalize offers, thereby driving increased sales and operational efficiencies. Research in this sector also analyzes the spatial distribution and clustering of the urban retail industry using point-of-interest (POI) big data together with transportation, GDP, and population metrics.

7.4 Real estate

In real estate, data analytics facilitates the forecasting of market trends, evaluation of property values, and assessment of investment risks. By utilizing predictive models and analyzing factors such as location, demographic shifts, and economic indicators, stakeholders can make more informed decisions regarding property transactions.

7.5 Agriculture

Data analytics plays a crucial role in modern agriculture by improving crop management, resource allocation, and disaster risk assessments. Through the utilization of big data, farmers can enhance productivity, reduce costs, and mitigate the effects of climate change.

7.6 Credit/Debit card fraud

Credit card fraud detection is a critical application of data analytics, wherein machine learning models analyze transaction patterns to identify fraudulent activity. Real-time analysis enables financial institutions to detect and prevent fraud, thereby safeguarding customers’ financial assets. 69
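The idea of flagging unusual transactions as they arrive can be illustrated with a simple statistical rule. The Python sketch below is not a trained machine learning model of the kind described above; it substitutes a running z-score check (using Welford's online mean and variance) as a minimal stand-in, and the threshold, warm-up length, and transaction amounts are assumptions chosen for illustration.

```python
import math


class RunningStats:
    """Welford's online algorithm for the running mean and sample variance."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def flag_suspicious(amounts: list[float], z_threshold: float = 3.0, warmup: int = 5) -> list[int]:
    """Return the indices of amounts that deviate strongly from the running baseline."""
    stats = RunningStats()
    flagged: list[int] = []
    for i, amount in enumerate(amounts):
        # Only score once enough history exists and the baseline has some spread.
        if stats.n >= warmup and stats.std > 0.0:
            z = (amount - stats.mean) / stats.std
            if abs(z) > z_threshold:
                flagged.append(i)
                continue  # keep the outlier out of the baseline
        stats.update(amount)
    return flagged


if __name__ == "__main__":
    # Hypothetical card transaction amounts for a single customer.
    history = [20.0, 22.0, 19.0, 21.0, 20.0, 23.0, 500.0, 21.0, 18.0]
    print(flag_suspicious(history))
```

Each transaction is scored against statistics computed only from earlier transactions, so the check can run as the data stream arrives instead of in a batch.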

8. Discussion

This section discusses, with graphical representations, the research conducted in the various sectors, further emphasizing the emergence of data analytics, artificial intelligence, and machine learning as a new era of research and their penetration into varied sectors.

8.1 Healthcare

Figure 2 shows the number of healthcare research papers published in different sources (2018-2024).

Figure 3 shows the number of healthcare research papers published per year (2018-2024).

Figure 4 shows the number of healthcare research papers published (2018-2024) by affiliation.

Figure 5 shows the number of healthcare research papers published (2018-2024) by college.

Figure 6 shows the share of healthcare research papers published (2018-2024) by source, in percentage.

Figure 7 shows the share of healthcare research papers published (2018-2024) by country, in percentage.

8.2 Banking

Figure 8 shows the number of banking research papers published in different sources (2019-2024), and Figure 9 shows the number of banking research papers published per year (2019-2024).

Figure 10 shows the number of banking research papers published (2019-2024) by affiliation, and Figure 11 shows the number of banking research papers published (2019-2024) by college.

Figure 12 shows the share of banking research papers published (2019-2024) by source, in percentage.

Figure 13 shows the share of banking research papers published (2019-2024) by country, in percentage.

8.3 Finance

Figure 14 shows the number of finance research papers published in different sources (2018-2024), and Figure 15 shows the number of finance research papers published per year (2018-2024).

Figure 16 shows the number of finance research papers published (2018-2024) by affiliation, and Figure 17 shows the number of finance research papers published (2018-2024) by college.

Figure 18 shows the share of finance research papers published (2018-2024) by source, in percentage.

Figure 19 shows the share of finance research papers published (2018-2024) by country, in percentage.

8.4 Retail

Figure 20 shows the number of retail research papers published in different sources (2019-2024).

Figure 21 shows the number of retail research papers published per year (2018-2024).

Figure 22 shows the number of retail research papers published (2019-2024) by affiliation, and Figure 23 shows the number of retail research papers published (2019-2024) by college.

Figure 24 shows the share of retail research papers published (2019-2024) by source, in percentage.

Figure 25 shows the share of retail research papers published (2019-2024) by country, in percentage.

8.5 Real Estate

Figure 26 shows the number of real estate research papers published in different sources (2019-2024), and Figure 27 shows the number of real estate research papers published per year (2019-2024).

Figure 28 shows the number of real estate research papers published (2019-2024) by affiliation, and Figure 29 shows the number of real estate research papers published (2019-2024) by college.

Figure 30 shows the share of real estate research papers published (2019-2024) by source, in percentage.

Figure 31 shows the share of real estate research papers published (2019-2024) by country, in percentage.

8.6 Agriculture

Figure 32 shows the number of agriculture research papers published in different sources (2019-2024), and Figure 33 shows the number of agriculture research papers published per year (2019-2024).

Figure 34 shows the number of agriculture research papers published (2019-2024) by affiliation.

Figure 35 shows the number of agriculture research papers published (2019-2024) by college.

Figure 36 shows the share of agriculture research papers published (2019-2024) by source, in percentage.

Figure 37 shows the share of agriculture research papers published (2019-2024) by country, in percentage.

Conclusion

This survey highlights the diverse and impactful ways data analytics is being applied across various sectors. Each domain benefits from tailored approaches to handling big data, driving innovation, and improving operational efficiency. The classification of research papers into these domains offers a structured overview, guiding further exploration into how data analytics continues to evolve and address critical challenges across industries.

The initial review covered more than 1,000 papers, reflecting a thorough and extensive search effort to capture a broad spectrum of relevant literature. Refining this set to 100 key studies allowed for a more in-depth and detailed analysis of the most important and impactful research. Including figures to depict publication trends enhances understanding by visually showing the progression of research over time, helping readers identify patterns and gaps in the literature. The use of Jamovi, Exploratory Factor Analysis (EFA), and regression analysis illustrates the application of strong quantitative methods, providing a structured and data-driven approach to exploring complex research themes. Despite some clarity issues, the review aims to identify themes like “machine dependency,” which helps organize insights from the selected studies and adds interpretive depth. By examining research trends across various sectors, such as healthcare and technology, the review offers a multifaceted perspective on the topic, which is beneficial for uncovering cross-domain insights. The inclusion of the PRISMA 2020 checklist provides an overall picture of how the review was conducted.

Ethics and consent

Ethical approval and consent were not required.

Funding Statement

This work was funded by Kalinga Institute of Industrial Technology.

[version 2; peer review: 2 approved

Data availability

No data are associated with this article.

Reporting guidelines

Figshare: PRISMA checklist for ‘Synergistic review of automation impact of big data, AI, and ML in current data transformative era’. https://doi.org/10.6084/m9.figshare.28375625.v1 70

Data are available under the terms of the Creative Commons Zero “No rights reserved” data waiver (CC0 1.0 Public domain dedication).

References

  • 1. Darshana S, Rautaray SS, Pandey M: AI to Machine Learning: Lifeless Automation and Issues. Machine Learning: Theoretical Foundations and Practical Applications. 2021; pp.123–135.
  • 2. Bag S, Dhamija P, Singh RK, et al.: Big data analytics and artificial intelligence technologies based collaborative platform empowering absorptive capacity in health care supply chain: An empirical study. J. Bus. Res. 2023;154:113315. 10.1016/j.jbusres.2022.113315
  • 3. Wang Y: Big Data Analysis in Consumer Behavior: Evidence from the Retail, Healthcare, and Financial Services Industries. Advances in Economics, Management and Political Sciences. 2024;59:231–237. 10.54254/2754-1169/59/20231127
  • 4. Bragazzi NL, Dai H, Damiani G, et al.: How big data and artificial intelligence can help better manage the COVID-19 pandemic. Int. J. Environ. Res. Public Health. 2020;17(9):3176. 10.3390/ijerph17093176
  • 5. Roy C, Rautaray SS, Pandey M: Big Data Optimization Techniques: A Survey. International Journal of Information Engineering & Electronic Business. 2018;10(4):41–48. 10.5815/ijieeb.2018.04.06
  • 6. Kim JC, Chung K: Multi-modal stacked denoising autoencoder for handling missing data in healthcare big data. IEEE Access. 2020;8:104933–104943. 10.1109/ACCESS.2020.2997255
  • 7. Nazir S, Khan S, Khan HU, et al.: A comprehensive analysis of healthcare big data management, analytics and scientific programming. IEEE Access. 2020;8:95714–95733. 10.1109/ACCESS.2020.2995572
  • 8. Du Z, Yang Y, Zheng J, et al.: Accurate prediction of coronary heart disease for patients with hypertension from electronic health records with big data and machine-learning methods: model development and performance evaluation. JMIR Med. Inform. 2020;8(7):e17257. 10.2196/17257
  • 9. Kim JC, Chung K: Hybrid multi-modal deep learning using collaborative concat layer in health bigdata. IEEE Access. 2020;8:192469–192480. 10.1109/ACCESS.2020.3031762
  • 10. Ragazou K, Passas I, Garefalakis A, et al.: Big data analytics applications in information management driving operational efficiencies and decision-making: mapping the field of knowledge with bibliometric analysis using R. Big Data Cogn. Comput. 2023;7(1):13. 10.3390/bdcc7010013
  • 11. Brill SB, Moss KO, Prater L: Transformation of the doctor–patient relationship: big data, accountable care, and predictive health analytics. HEC Forum. Dordrecht: Springer Netherlands;2019, December; Vol.31(4): pp.261–282.
  • 12. Jee K, Kim GH: Potentiality of big data in the medical sector: focus on how to reshape the healthcare system. Healthcare informatics research. 2013;19(2):79–85. 10.4258/hir.2013.19.2.79
  • 13. Borges do Nascimento IJ, Marcolino MS, Abdulazeem HM, et al.: Impact of big data analytics on people’s health: Overview of systematic reviews and recommendations for future studies. J. Med. Internet Res. 2021;23(4):e27275. 10.2196/27275
  • 14. Nandini V, Mohanan C, Peter ALA, et al.: Structured Exercise Program for Hip Arthroplasty: An Expert Consensus Using the Delphi Technique. Indian J. Orthop. 2025;59(4):539–548. 10.1007/s43465-025-01335-3
  • 15. Batko K, Ślęzak A: The use of Big Data Analytics in healthcare. J. Big Data. 2022;9(1):3. 10.1186/s40537-021-00553-4
  • 16. Wen C, Yang J, Gan L, et al.: Big data driven Internet of Things for credit evaluation and early warning in finance. Futur. Gener. Comput. Syst. 2021;124:295–307. 10.1016/j.future.2021.06.003
  • 17. Popescu M, Chronis G, Ohol R, et al.: An eldercare electronic health record system for predictive health assessment. 2011 IEEE 13th International Conference on e-Health Networking, Applications and Services. IEEE;2011, June; pp.193–196.
  • 18. Nayak S, Gourisaria MK, Pandey M, et al.: Heart disease prediction using frequent item set mining and classification technique. International Journal of Information Engineering and Electronic Business. 2019;11(6):9–15. 10.5815/ijieeb.2019.06.02
  • 19. Karatas M, Eriskin L, Deveci M, et al.: Big Data for Healthcare Industry 4.0: Applications, challenges and future perspectives. Expert Syst. Appl. 2022;200:116912. 10.1016/j.eswa.2022.116912
  • 20. Rathnasiri M, Dewasiri N, Singh R, et al.: Transforming Healthcare Services in South Asia: Leveraging Blockchain Technology. Using Blockchain Technology in Healthcare Settings. 2025:63–75. 10.1201/9781003483113-4
  • 21. Jayarathne P, Dewasiri N, Khan S: Challenges of Integrating Social Responsibility and Climate Change for the Sustainable Development Goals: Experience From the South Asian Context. pp.205–226.
  • 22. Gourisaria MK, Agrawal R, Harshvardhan GM, et al.: Application of machine learning in industry 4.0. Machine Learning: Theoretical Foundations and Practical Applications. 2021; pp.57–87. 10.1007/978-981-33-6518-6_4
  • 23. Shafqat S, Kishwer S, Rasool RU, et al.: Big data analytics enhanced healthcare systems: a review. J. Supercomput. 2020;76:1754–1799. 10.1007/s11227-017-2222-4
  • 24. Yadav K, Pandey M, Rautaray SS: Feedback analysis using big data tools. 2016 International Conference on ICT in Business Industry & Government (ICTBIG). IEEE;2016, November; pp.1–5.
  • 25. Moharana M, Pandey M, Routaray SS: Why big data, and what it is: basics to advanced big data journey for the medical industry. Handbook of Data Science Approaches for Biomedical Engineering. Academic Press;2020; pp.221–249.
  • 26. Srivastava U, Gopalkrishnan S: Impact of big data analytics on banking sector: Learning for Indian banks. Procedia Comput. Sci. 2015;50:643–652. 10.1016/j.procs.2015.04.098
  • 27. Vengatesan K, Kumar A, Yuvraj S, et al.: Credit card fraud detection using data analytic techniques. Adv. Math., Sci. J. 2020;9(3):1185–1196.
  • 28. Razaque A, Frej MBH, Bektemyssova G, et al.: Credit card-not-present fraud detection and prevention using big data analytics algorithms. Appl. Sci. 2022;13(1):57. 10.3390/app13010057
  • 29. Stanikzai AQ, Shah MA: Evaluation of cyber security threats in banking systems. 2021 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE;2021, December; pp.1–4.
  • 30. Kour M: Dynamics of Big Data Analytics in the Banking Sector. 2024 IEEE International Conference on Interdisciplinary Approaches in Technology and Management for Social Innovation (IATMSI). IEEE;2024, March; Vol.2: pp.1–6.
  • 31. Goldstein I, Spatt CS, Ye M: Big data in finance. Rev. Financ. Stud. 2021;34(7):3213–3225. 10.1093/rfs/hhab038
  • 32. Roy AG, Urolagin S: Credit risk assessment using decision tree and support vector machine based data analytics. Creative Business and Social Innovations for a Sustainable Future: Proceedings of the 1st American University in the Emirates International Research Conference—Dubai, UAE 2017. Springer International Publishing;2019; pp.79–84.
  • 33. Sazu MH, Jahan SA: How Big Data Analytics is transforming the finance industry. Bankarstvo. 2022;51(2):147–172. 10.5937/bankarstvo2202147H
  • 34. Soldatos J, Kyriazis D: Big Data and artificial intelligence in digital finance: Increasing personalization and trust in digital finance using Big Data and AI. Springer Nature;2022; p.363.
  • 35. Kumar R, Grover N, Singh R, et al.: Imperative role of artificial intelligence and big data in finance and banking sector. 2023 International Conference on Sustainable Computing and Data Communication Systems (ICSCDS). IEEE;2023, March; pp.523–527.
  • 36. Benzidia S, Makaoui N, Bentahar O: The impact of big data analytics and artificial intelligence on green supply chain process integration and hospital environmental performance. Technol. Forecast. Soc. Chang. 2021;165:120557. 10.1016/j.techfore.2020.120557
  • 37. Sharma J, Sharma D, Sharma K: Retail Analytics to anticipate Covid-19 effects Using Big Data Technologies. 2021 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE). IEEE;2021, December; pp.1–6.
  • 38. Ramasami MV, Thangaraj R, Kumar SM: Analysis of Bigmart Outlets sales using Big Data Analytics Method. 2023 2nd International Conference on Applied Artificial Intelligence and Computing (ICAAIC). IEEE;2023, May; pp.1143–1146.
  • 39. Castellano N, Del Gobbo R, Leto L: Using Big Data to enhance data envelopment analysis of retail store productivity. Int. J. Product. Perform. Manag. 2024;73(11):213–242. 10.1108/IJPPM-03-2023-0157
  • 40. Zhang D, Bayer S, Wills GB, et al.: Using Big Data Analytics to Combat Retail Fraud. FEMIB. 2022, April;85–92.
  • 41. Pathak AR, Pandey M, Rautaray S: Topic-level sentiment analysis of social media data using deep learning. Appl. Soft Comput. 2021;108:107440. 10.1016/j.asoc.2021.107440
  • 42. Schultz DE, Block MP: Using big data file fusion to determine the effects of social media on retail brand preference. Appl. Mark. Anal. 2014;1(1):81–102. 10.69554/LCAK1104
  • 43. Gao M: The Transformation of Traditional Retail Industry in the Era of Big Data. Adv. Econ. Manag. Political Sci. 2023;38:59–63. 10.54254/2754-1169/38/20231887
  • 44. Shrivastava P, Sahoo L, Pandey M: Architecture for the strategy-planning techniques using big data analytics. Smart Computing and Informatics: Proceedings of the First International Conference on SCI 2016. Springer Singapore;2018; Vol.1: pp.649–657.
  • 45. Ranjan J, Foropon C: Big data analytics in building the competitive intelligence of organizations. Int. J. Inf. Manag. 2021;56:102231. 10.1016/j.ijinfomgt.2020.102231
  • 46. Taşcılar M, Arslanlı KY: Forecasting commercial real estate indicators under COVID-19 by adopting human activity using social big data. Asia Pac. J. Reg. Sci. 2022;6(3):1111–1132. 10.1007/s41685-022-00254-7
  • 47. Waghela A, Makadia D, Mangla M: Utilizing Machine Learning and Big Data Analysis for Risk Mitigation and Fraud Detection in Finance. 2023.
  • 48. Wang C, Gao Y, Aziz A, et al.: Agricultural disaster risk management and capability assessment using big data analytics. Big Data. 2022;10(3):246–261. 10.1089/big.2020.0411
  • 49. Stice H: The supply of information and price formation: Evidence from Google’s search engine. Contemp. Account. Res. 2023;40(3):1999–2031. 10.1111/1911-3846.12866
  • 50. Galetsi P, Katsaliaki K, Kumar S: Big data analytics in health sector: Theoretical framework, techniques and prospects. Int. J. Inf. Manag. 2020;50:206–216. 10.1016/j.ijinfomgt.2019.05.003
  • 51. Juhn Y, Liu H: Artificial intelligence approaches using natural language processing to advance EHR-based clinical research. J. Allergy Clin. Immunol. 2020;145(2):463–469. 10.1016/j.jaci.2019.12.897
  • 52. Mishra S, Pandey M, Rautaray SS, et al.: A survey on big data analytical tools & techniques in health care sector. Int. J. Emerg. Technol. 2020;11(3):554–560.
  • 53. Chen PT, Lin CL, Wu WN: Big data management in healthcare: Adoption challenges and implications. Int. J. Inf. Manag. 2020;53:102078. 10.1016/j.ijinfomgt.2020.102078
  • 54. Dewasiri N: Highlights of Circular Economy Actions in the Climate Change Policies: A Call for Action for Asia-Pacific Region. pp.23–41.
  • 55. Kanaujia PKM, Pandey M, Rautaray SS: Real time financial analysis using big data technologies. 2017 International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud)(I-SMAC). IEEE;2017, February; pp.131–136.
  • 56. Pandl KD, Thiebes S, Schmidt-Kraepelin M, et al.: On the convergence of artificial intelligence and distributed ledger technology: A scoping review and future research agenda. IEEE Access. 2020;8:57075–57095. 10.1109/ACCESS.2020.2981447
  • 57. Razavi H, Jamali MR, Emsaki M, et al.: Quantifying the Financial Impact of Cyber Security Attacks on Banks: A Big Data Analytics Approach. 2023 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE). IEEE;2023, September; pp.533–538.
  • 58. Kumar S, Sharma D, Rao S, et al.: Past, present, and future of sustainable finance: insights from big data analytics through machine learning of scholarly research. Ann. Oper. Res. 2022;1–44. 10.1007/s10479-021-04410-8
  • 59. Malhotra R, Malhotra DK: The Impact of Technology, Big Data, and Analytics: The Evolving Data-Driven Model of Innovation in the Finance Industry. J. Finance Data Sci. 2023;5(3):50–65. 10.3905/jfds.2023.1.129
  • 60. Su H, Damian MAE: Spatial Distribution Analysis of Urban Retail Industry Using POI Big Data. Int. J. Emerg. Technol. Adv. Appl. 2024;1(2):1–7. 10.62677/IJETAA.2402105
  • 61. Delamaire L, Abdou HAH, Pointon J: Credit card fraud and detection techniques: a review. Banks Bank Syst. 2009;4(2).
  • 62. Pineda Montserrat B: Predictive business analytics for real estate: a tool for estimating and analyzing housing prices. Universitat Politècnica de Catalunya;2024. Master’s thesis.
  • 63. Van Klompenburg T, Kassahun A, Catal C: Crop yield prediction using machine learning: A systematic literature review. Comput. Electron. Agric. 2020;177:105709. 10.1016/j.compag.2020.105709
  • 64. Mazumder MSA: The Transformative Impact of Big Data in Healthcare: Improving Outcomes, Safety, and Efficiencies. Global Mainstream Journal of Business, Economics, Development & Project Management. 2024;3(03):1–12. 10.62304/jbedpm.v3i03.82
  • 65. Lo NW, Wu CY, Chuang YH: An authentication and authorization mechanism for long-term electronic health records management. Procedia Comput. Sci. 2017;111:145–153. 10.1016/j.procs.2017.06.021
  • 66. Wang Y, Ng K, Byrd RJ, et al.: Early detection of heart failure with varying prediction windows by structured and unstructured data in electronic health records. 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE;2015, August; pp.2530–2533.
  • 67. Gupta T, Gupta N, Agrawal A, et al.: Role of big data analytics in banking. 2019 International Conference on contemporary Computing and Informatics (IC3I). IEEE;2019, December; pp.222–227.
  • 68. Nobanee H: A bibliometric review of big data in finance. Big Data. 2021;9(2):73–78. 10.1089/big.2021.29044.edi [DOI] [PubMed] [Google Scholar]
  • 69. Jha BK, Sivasankari GG, Venugopal KR: Fraud detection and prevention by using big data analytics. 2020 Fourth international conference on computing methodologies and communication (ICCMC). IEEE;2020, March; pp.267–274. [Google Scholar]
  • 70. Rath S, Pandey M, Swarup Routaray S: completed_PRISMA_checklist.docx. figshare. Journal contribution. 2025. 10.6084/m9.figshare.28375625.v1 [DOI]
F1000Res. 2025 Jul 26. doi: 10.5256/f1000research.181687.r390987

Reviewer response for version 2

Edwin Ramirez Asis 1

The article addresses the topic from a scientific research perspective, and version 2 has addressed some shortcomings. It is accepted for indexing.

Regarding the review, the classification of research articles into these domains provides a structured overview that guides exploration of how data analysis continues to evolve and address critical challenges in different industries. Furthermore, the use of Jamovi, Exploratory Factor Analysis (EFA), and regression analysis illustrates the application of robust quantitative methods, providing a structured, data-driven approach to exploring complex research topics.

Are the rationale for, and objectives of, the Systematic Review clearly stated?

Yes

Is the statistical analysis and its interpretation appropriate?

Yes

If this is a Living Systematic Review, is the ‘living’ method appropriate and is the search schedule clearly defined and justified? (‘Living Systematic Review’ or a variation of this term should be included in the title.)

Yes

Are sufficient details of the methods and analysis provided to allow replication by others?

Yes

Are the conclusions drawn adequately supported by the results presented in the review?

Yes

Reviewer Expertise:

I hold a PhD in business administration, am a university professor with over 20 years of experience, and have 49 articles published in Scopus.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

F1000Res. 2025 Jun 2. doi: 10.5256/f1000research.181687.r386651

Reviewer response for version 2

Geeta Sandeep Nadella 1

Areas for Improvement  

 1. Clarity of Objectives and Rationale  

- Issue: The rationale lacks explicit justification for the review’s synergistic focus. Why is a cross-sector analysis necessary? How does this fill gaps left by sector-specific reviews?  

- Fix:  

  - Clearly state the review’s unique contribution (e.g., identifying common challenges like machine dependency across sectors).  

  - Define specific research questions (e.g., "How do AI/ML adoption barriers differ between healthcare and finance?").  

 2. Statistical and Analytical Rigor  - Only descriptive statistics are present, with no inferential statistics. This is a critical issue.

- Issue: Methods like EFA and regression are mentioned but lack context. Are these applied to the reviewed studies or original data? Assumptions (e.g., normality) and effect sizes (e.g., β-values) are unaddressed.  

- Fix:  

  - Clarify the purpose of statistical tools (e.g., "EFA was used to categorize challenges").  

  - Include meta-analysis (if feasible) to quantify trends (e.g., "AI improved fraud detection accuracy by 30% in 15/20 studies").  

 4. Critical Analysis and Synthesis  - Sections 6, 7, and 8 are poorly written. Add the actual analysis and discussion there, rather than just pointing to the figures.

- Issue: The discussion is descriptive rather than analytical. For example:  

  - Machine dependency in healthcare is discussed but not compared to challenges in retail (e.g., vendor lock-in vs. scalability).  

  - Ethical considerations (e.g., data privacy) are mentioned but not linked to sector-specific case studies.  

- Fix:  

  - Add a comparative table summarizing cross-sector challenges (e.g., "Healthcare: Interoperability vs. Retail: Vendor lock-in").  

  - Discuss limitations of existing studies (e.g., bias toward high-income countries in agricultural data).  

 5. Presentation and Formatting  

- Issue:  

  - Mislabeled figures (e.g., Table 6 labeled as "Digital security culture" instead of XDR).  

  - Inconsistent citation formatting (e.g., missing DOIs in some references).  

- Fix:  

  - Standardize table/figure labels (e.g., "Figure 6: Extended Detection and Response (XDR)").  

  - Ensure all references follow journal guidelines (e.g., italicize journal names, include URLs/DOIs).  

 6. Conclusion and Implications  

- Issue: Conclusions restate findings without actionable recommendations (e.g., "machine-independent solutions are needed" lacks sector-specific guidance).  

- Fix:  

  - Propose solutions (e.g., "Adopt open-source frameworks in healthcare to reduce vendor lock-in").  

  - Highlight future research directions (e.g., "Explore AI ethics in agricultural disaster management").  

---

 Recommendations for Revision  

1. Restructure the Introduction: Explicitly state gaps in existing literature and the review’s unique value.  

2. Expand Methods Section: Detail search strategies, inclusion/exclusion criteria, and data synthesis protocols.  

3. Enhance Critical Analysis: Compare sector-specific challenges and synthesize trends (e.g., "Data quality issues persist across all sectors").  

4. Correct Presentation Errors: Ensure accurate labeling of figures/tables and consistent citation formatting.  

5. Strengthen Conclusions: Link findings to practical implications (e.g., "Policymakers should prioritize interoperable frameworks").  

---

 Final Remarks  

This review has significant potential to inform interdisciplinary research and practice. Addressing methodological transparency, deepening critical analysis, and refining presentation will elevate its impact. Consider collaborating with a methodologist or statistician to strengthen quantitative synthesis.

Are the rationale for, and objectives of, the Systematic Review clearly stated?

Partly

Is the statistical analysis and its interpretation appropriate?

Partly

If this is a Living Systematic Review, is the ‘living’ method appropriate and is the search schedule clearly defined and justified? (‘Living Systematic Review’ or a variation of this term should be included in the title.)

Not applicable

Are sufficient details of the methods and analysis provided to allow replication by others?

Partly

Are the conclusions drawn adequately supported by the results presented in the review?

Partly

Reviewer Expertise:

AI, ML, Systems Analysis, Theoretical designs, Big Data, Cybersecurity, Information Systems Management.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

F1000Res. 2025 Apr 28. doi: 10.5256/f1000research.177513.r371087

Reviewer response for version 1

Narayanage Jayantha Dewasiri 1

Thank you for your contribution. Your manuscript has a serious flaw in the methodology. Please address the following.

Please apply the PRISMA and PICO approaches under a separate section called Methodology, which outlines the scientific approach that you have taken for this study. You can cite the papers below to support the use of the PRISMA and PICO approaches as the methodology.

Please see [Ref 1 to 6].

Are the rationale for, and objectives of, the Systematic Review clearly stated?

Yes

Is the statistical analysis and its interpretation appropriate?

Yes

If this is a Living Systematic Review, is the ‘living’ method appropriate and is the search schedule clearly defined and justified? (‘Living Systematic Review’ or a variation of this term should be included in the title.)

Yes

Are sufficient details of the methods and analysis provided to allow replication by others?

Partly

Are the conclusions drawn adequately supported by the results presented in the review?

Yes

Reviewer Expertise:

Management & AI

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

References

  • 1. Structured Exercise Program for Hip Arthroplasty: An Expert Consensus Using the Delphi Technique. Indian J Orthop. 2025;59(4):539–548. 10.1007/s43465-025-01335-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. Structured Exercise Program for Hip Arthroplasty: An Expert Consensus Using the Delphi Technique. Indian J Orthop. 2025;59(4):539–548. 10.1007/s43465-025-01335-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Transforming Healthcare Services in South Asia. 2025; pp.63–75. 10.1201/9781003483113-4 [DOI] [Google Scholar]
  • 4. Challenges of Integrating Social Responsibility and Climate Change for the Sustainable Development Goals: Experience From the South Asian Context. pp.205–226. 10.1108/S2043-052320250000025011 [DOI] [Google Scholar]
  • 5. Highlights of Circular Economy Actions in the Climate Change Policies: A Call for Action for Asia-Pacific Region. pp.23–41. 10.1108/S2043-052320250000025002 [DOI] [Google Scholar]
  • 6. Structured Exercise Program for Hip Arthroplasty: An Expert Consensus Using the Delphi Technique. Indian J Orthop. 2025;59(4):539–548. 10.1007/s43465-025-01335-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
F1000Res. 2025 May 6.
Swastik Rath 1

Respected Sir,

#1 - The PRISMA flow diagram has been added to the Methodology section of the systematic review in the new version 2.

#2 - All of the papers mentioned are cited in the new version 2.

Yours faithfully

 Swastik Rath

F1000Res. 2025 Apr 26. doi: 10.5256/f1000research.177513.r377823

Reviewer response for version 1

Geeta Sandeep Nadella 1

#1 The systematic review partially clarifies its rationale and objectives but requires refinement for greater precision. The rationale broadly highlights the transformative role of automation, big data, AI, and ML across six sectors (e.g., healthcare, finance) and acknowledges challenges like data quality and privacy. However, it lacks explicit justification for its synergistic focus, failing to address gaps in existing literature (e.g., fragmented sector-specific reviews) or the urgency of cross-sector analysis. The objectives outline the intent to explore technological impacts and list applications (e.g., fraud detection in banking) but are overly vague, omitting structured goals (e.g., comparative analyses of adoption barriers) and methodological rigor (e.g., undefined study selection criteria).

To enhance clarity, the review should explicitly state its unique contribution (e.g., synthesizing cross-sector insights), structure objectives as distinct research questions, and define scope (e.g., peer-reviewed studies from 2018–2025). Methodological transparency, such as a PRISMA flowchart and justification for sector inclusion, would strengthen its systematic approach. While the review’s intent is evident, sharper articulation of gaps, objectives, and methods would bolster its academic rigor and practical relevance.

#2 The systematic review partly provides sufficient methodological details, with notable strengths and weaknesses. Strengths include a broad foundation of 1,000+ reviewed papers, focused analysis of 100 key studies, and visual aids (e.g., figures on publication trends) to illustrate sector-specific research activity. Analytical tools like Jamovi and statistical methods (EFA, regression) are referenced, suggesting technical rigor. However, weaknesses undermine reproducibility: the absence of a PRISMA flow diagram obscures study selection processes; inclusion/exclusion criteria for selecting the 100 studies are undefined; search strategies (databases, keywords, timeframe justification) are omitted; and data synthesis methods (e.g., theme identification like "machine dependency") lack clarity. Additionally, statistical protocols (e.g., variable selection, regression assumptions) are inadequately described.

To improve, the review should adopt standard systematic review practices:

Include a PRISMA flowchart to detail screening stages (identification, screening, eligibility, inclusion).

Define search strings (e.g., "big data AND healthcare AND machine learning") and inclusion/exclusion criteria (e.g., peer-reviewed articles from 2018–2025).

Clarify data extraction (e.g., coding frameworks for sector/thematic categorization) and statistical rationale (e.g., why EFA was chosen over other methods).
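The search-string and inclusion-criteria suggestions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Record` fields, the `build_query` and `meets_criteria` helpers, and the sample records are hypothetical, and the 2018–2025 window is the example range the reviewer mentions, not a protocol taken from the manuscript.

```python
from dataclasses import dataclass


def build_query(*term_groups):
    """AND the groups together; OR the synonyms inside each group."""
    parts = []
    for group in term_groups:
        quoted = [f'"{term}"' for term in group]
        if len(quoted) > 1:
            parts.append("(" + " OR ".join(quoted) + ")")
        else:
            parts.append(quoted[0])
    return " AND ".join(parts)


@dataclass
class Record:
    # Hypothetical minimal metadata for one retrieved paper.
    title: str
    year: int
    peer_reviewed: bool


def meets_criteria(record, start=2018, end=2025):
    """Example inclusion rule: peer-reviewed and published within the window."""
    return record.peer_reviewed and start <= record.year <= end


query = build_query(["big data"], ["healthcare"], ["machine learning", "ML"])

# Illustrative records only; these are not studies from the review.
identified = [
    Record("Paper A", 2019, True),
    Record("Paper B", 2016, True),   # excluded: outside the date window
    Record("Paper C", 2022, False),  # excluded: not peer reviewed
]
included = [r for r in identified if meets_criteria(r)]

# Counts like these feed the boxes of a PRISMA flow diagram.
flow = {"identified": len(identified), "included": len(included)}
```

Recording the query string and the rule as code (or as an equally explicit written protocol) is one way to let another team rerun the selection and arrive at the same counts.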

In conclusion, while the review’s scope and analytical tools are commendable, critical methodological gaps hinder replication. Enhancing transparency in search strategies, study selection, and analysis protocols would strengthen rigor and ensure reproducibility. Addressing these issues would align the review with systematic standards and bolster its academic credibility.

#3 The statistical analysis in the systematic review is partially appropriate. Appropriate aspects include the use of validated methods like Exploratory Factor Analysis (EFA) with factor loadings >0.7 and Cronbach’s α >0.7 to assess reliability, as well as regression analysis to quantify relationships (e.g., financial constraints β = 0.604). Descriptive statistics, such as demographic data (e.g., 67% low literacy) and publication trends, are also clearly presented. These methods align with standard practices for analyzing survey or primary data.
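For readers unfamiliar with the reliability threshold mentioned above, the following is a minimal sketch of how Cronbach's α is computed from item-level survey responses, using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores). The `cronbach_alpha` helper and the Likert responses are hypothetical and are not data from the manuscript.

```python
from statistics import variance


def cronbach_alpha(responses):
    """responses: list of respondent rows, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    items = list(zip(*responses))  # one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical 5-point Likert answers: 5 respondents, 3 items.
survey = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 2, 3],
    [4, 4, 4],
]
alpha = cronbach_alpha(survey)
reliable = alpha > 0.7  # the conventional cut-off cited in the report
```

Sample variances are used for both the items and the totals; the ratio is the same if population variances are used consistently instead.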

However, key weaknesses limit the analysis. The application of EFA and regression is unconventional for systematic reviews unless synthesizing primary data, and the manuscript does not clarify whether these methods were applied to the reviewed studies or a separate dataset. Critical assumption checks (e.g., normality, multicollinearity) are missing, and results lack contextual interpretation (e.g., effect sizes or practical implications of β values). Additionally, the absence of meta-analysis—a hallmark of systematic reviews—undermines quantitative synthesis of findings across studies.
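One of the missing assumption checks, multicollinearity, is commonly reported as a variance inflation factor (VIF). The sketch below covers only the two-predictor case, where VIF reduces to 1 / (1 − r²) with r the Pearson correlation between the predictors. The helper names and the predictor values are hypothetical, and a full analysis would normally rely on a statistics package rather than this hand-rolled version.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for either predictor when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical predictor scores (illustrative only).
predictor_a = [1, 2, 3, 4, 5]
predictor_b = [2, 4, 5, 4, 5]
vif = vif_two_predictors(predictor_a, predictor_b)
```

Rules of thumb for what counts as a problematic VIF vary between textbooks, so any cut-off should be stated and justified alongside the result.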

To improve, the review should clarify the methodology (e.g., specify if statistical tests were applied to reviewed studies or original data), validate assumptions (e.g., residual analysis for regression), and contextualize results (e.g., discuss effect sizes like Cohen’s f²). Adopting PRISMA guidelines, conducting meta-analyses, and justifying analytical choices would enhance rigor. While the statistical techniques are technically sound, their unconventional application and lack of transparency limit reproducibility and alignment with systematic review standards. Strengthening these areas would ensure the analysis meets academic expectations.
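As a pointer to what reporting an effect size could look like, Cohen's f² for a regression model is R² / (1 − R²). The sketch below assumes the model's R² is available; the `cohens_f2` and `interpret_f2` helpers are hypothetical names, the example R² is illustrative, and the 0.02 / 0.15 / 0.35 labels are Cohen's conventional benchmarks rather than values from the manuscript.

```python
def cohens_f2(r_squared):
    """Cohen's f-squared effect size from a model's R-squared."""
    if not 0 <= r_squared < 1:
        raise ValueError("R-squared must be in [0, 1)")
    return r_squared / (1 - r_squared)


def interpret_f2(f2):
    """Label f-squared using Cohen's conventional benchmarks."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "negligible"


# Illustrative R-squared only; not a result reported in the review.
f2 = cohens_f2(0.26)
label = interpret_f2(f2)
```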

#4 The systematic review’s conclusions are partly supported by the presented results. Supported conclusions include sector-specific applications of AI/ML (e.g., fraud detection in banking, disease prediction in healthcare), validated by figures showing publication trends and case studies. The discussion of machine dependency in sectors like healthcare and agriculture is also substantiated, with examples highlighting scalability and interoperability challenges. However, several conclusions lack robust evidence. Claims about the transformative impact of AI/ML are overstated, as the review does not quantify outcomes (e.g., healthcare efficacy improvements) or critically assess real-world adoption. Challenges like data quality and ethics are mentioned but not deeply explored, failing to link ethical concerns to sector-specific examples. Recommendations for machine-independent solutions or future directions lack empirical backing from the reviewed studies.

Key gaps weaken the review’s credibility. There is no comparative analysis of AI/ML adoption barriers across sectors (e.g., healthcare vs. retail), undermining claims of cross-sector synergies. The review superficially addresses limitations like data privacy, without synthesizing evidence on their prevalence or severity. Overreliance on publication metrics (e.g., paper counts) conflates research activity with technological impact, ignoring study quality or practical implementation. For instance, trends in healthcare publications (Figure 1) do not confirm whether AI tools are clinically effective or widely adopted.

To strengthen conclusions, the review should integrate sector-specific evidence (e.g., “AI improved diagnostic accuracy in 15/20 healthcare studies”) and employ meta-analytic techniques to quantify trends (e.g., average fraud detection accuracy gains). Challenges and future directions must be explicitly tied to findings (e.g., agricultural machine dependency justifying interoperable solutions). Addressing these gaps through deeper synthesis and critical appraisal would enhance the review’s rigor and credibility, ensuring conclusions are both data-driven and actionable.
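The meta-analytic quantification suggested above can be illustrated with a fixed-effect, inverse-variance pooled estimate, one of the simplest pooling methods. This is a sketch under strong assumptions: the `pool_fixed_effect` helper is hypothetical, the per-study gains and standard errors are invented for illustration, and a real synthesis would also need to assess heterogeneity and consider a random-effects model.

```python
from math import sqrt


def pool_fixed_effect(effects, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    return pooled, sqrt(1.0 / total_weight)


# Hypothetical per-study accuracy gains and standard errors (illustrative only).
gains = [0.10, 0.20, 0.15]
std_errors = [0.05, 0.10, 0.05]

pooled, pooled_se = pool_fixed_effect(gains, std_errors)
# Approximate 95% confidence interval under a normal approximation.
ci_low = pooled - 1.96 * pooled_se
ci_high = pooled + 1.96 * pooled_se
```

More precise studies (smaller standard errors) receive larger weights, which is what separates a pooled estimate from a simple count of how many studies reported an improvement.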

Are the rationale for, and objectives of, the Systematic Review clearly stated?

Partly

Is the statistical analysis and its interpretation appropriate?

Partly

If this is a Living Systematic Review, is the ‘living’ method appropriate and is the search schedule clearly defined and justified? (‘Living Systematic Review’ or a variation of this term should be included in the title.)

Not applicable

Are sufficient details of the methods and analysis provided to allow replication by others?

Partly

Are the conclusions drawn adequately supported by the results presented in the review?

Partly

Reviewer Expertise:

AI, ML, Systems Analysis, Theoretical designs, Big Data, Cybersecurity, Information Systems Management.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

F1000Res. 2025 May 6.
Swastik Rath 1

Respected Sir,

The changes that you suggested for the manuscript have been made in the new version.

#1 - This change has been made and added after the last part of the Introduction.

#2 - The PRISMA flow chart has been added to the Methodology section of the systematic review in version 2.

#3 - Country-wise document percentage graphs are included in the new version as Figure No. 31 to Figure No. 36.

#4 - The conclusion has been modified, and the changes are added at the end of the conclusion.

There are a few modifications in the graphs in the new version of the manuscript.

#3 - For each sector, the country-wise document percentage graph is shown as the last figure of that sector.

Yours Faithfully

Swastik Rath

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Data Availability Statement

    No data are associated with this article.

    Reporting guidelines

    Zenodo repository: PRISMA checklist for ‘Synergistic review of automation impact of big data, AI, and ML in current data transformative era’. https://doi.org/10.6084/m9.figshare.28375625.v1 70

    Data are available under the terms of the Creative Commons Zero “No rights reserved” data waiver (CC0 1.0 Public domain dedication).


    Articles from F1000Research are provided here courtesy of F1000 Research Ltd
