Abstract
In the context of health systems, a layman is a person with no background in interpreting health data such as X-ray, MRI, and CT scan images or health examination reports. The motivation behind the proposed invention is to make medical images understandable to laymen. The health model is trained using a neural network approach that analyses the user's health examination data, predicts the type and severity of the disease, and advises the user on precautions. Cellular automata (CA) have been integrated with the neural network to segment the medical image. The CA analyses the medical image pixel by pixel and generates a robust threshold value, which helps to segment the image efficiently and to locate abnormal spots accurately. The proposed method has been trained and evaluated on more than 10,000 medical images taken from various open datasets. Standard text-analysis measures, i.e., BLEU, ROUGE, and WER, are used to validate the produced reports. BLEU and ROUGE measure how closely the generated report matches the original report. The BLEU and ROUGE scores of the experimented images are approximately 0.62 and 0.90, respectively, indicating that the produced reports are very close to the original reports. The WER score of 0.14 indicates that the generated reports contain mostly relevant words. Overall, the proposed research provides laymen with a useful medical report that states the disease accurately and suggests precautions.
Keywords: Medical image, Layman, Healthcare, Machine learning, Cellular automata
Subject terms: Computational biology and bioinformatics, Engineering
Introduction
A health model for the layman is a simplified framework or method that makes it easier for people to understand and manage their health. Basic health concepts such as nutrition, exercise, stress management, hygiene, regular check-ups, mental health, and education can be helpful for those who want to maintain or improve their well-being. The healthcare sector is changing drastically, and it can be challenging for healthcare workers to keep up with the constant development of new treatments and technologies.
A layperson's guide to healthcare requires demystifying elaborate healthcare systems and enabling people to make informed decisions about their health. The healthcare model offers general guidelines; individuals should consult medical specialists for tailored advice regarding their health conditions. Making educated judgments also requires remaining informed and actively participating in one's own healthcare path1.
Neural cellular automata (NCA) are a class of computational models that combine concepts from neural networks and cellular automata. Neural networks are computer models devoted to tasks such as pattern recognition and machine learning (ML) and are inspired by the structure and operation of biological neural networks. A neural network consists of a group of interconnected artificial neurons organised into multiple layers. Cellular automata (CA) are discrete computational models made up of a grid of cells, each of which can exist in a limited number of states. Every cell's state changes over discrete time steps according to a set of rules that are usually derived from the states of nearby cells. Medical image analysis is one of the many domains in which cellular automata play an important role2.
The NCA offers a wide range of potential applications in the healthcare industry, including medical image analysis, resource allocation, and disease modeling. Figure 1 depicts various applications that can be implemented using NCA in the healthcare system. The NCA works on data stored in a group of grids or cells arranged in a one-dimensional (1D) or two-dimensional (2D) data structure. Cellular automata combined with a neural network can efficiently process different types of medical imaging, i.e., orthopedic, peripheral smear, and radiological imaging, as well as segmentation- and classification-based techniques. The NCA starts by initialising the grids with binary values 0 or 1. To enhance the quality of the results, the large data is divided into smaller blocks. All the neighbour cells in a block are investigated to produce a threshold value for processing the source data; the threshold is derived from a certain number of neighbouring cells. Consider a block B(5, 5) of 5 rows and 5 columns, and let C(i, j) be a cell of the block. The threshold value of an individual cell is determined according to Eqs. 1 to 5, where N represents the total number of neighbours that contribute to the cell's threshold value; more neighbours may produce a more robust threshold. The unique threshold value for the entire block is finalised using Eqs. 6 and 7 (refs 3,4).
(Equations 1–7 appear only as images in the original and are not reproduced here.)
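As a rough illustration of this step, the NumPy sketch below derives a per-cell threshold from the 8-neighbourhood of each cell and a single block threshold from the per-cell values. The averaging used here, and the names cell_threshold and block_threshold, are assumptions standing in for Eqs. 1–7, not the published formulas.

```python
import numpy as np

def cell_threshold(block: np.ndarray, i: int, j: int) -> float:
    """Per-cell threshold from the neighbours of cell (i, j) in a block.

    Assumption: the threshold is the mean of the available 8-neighbourhood
    values (the paper's Eqs. 1-5 are not reproduced here).
    """
    rows, cols = block.shape
    neighbours = [block[r, c]
                  for r in range(max(0, i - 1), min(rows, i + 2))
                  for c in range(max(0, j - 1), min(cols, j + 2))
                  if (r, c) != (i, j)]
    return float(np.mean(neighbours))

def block_threshold(block: np.ndarray) -> float:
    """Single threshold for the whole block (stand-in for Eqs. 6-7):
    here, the mean of all per-cell thresholds."""
    rows, cols = block.shape
    cell_ts = [cell_threshold(block, i, j) for i in range(rows) for j in range(cols)]
    return float(np.mean(cell_ts))

# Example: a 5 x 5 binary block B(5, 5)
B = np.random.randint(0, 2, size=(5, 5))
print(block_threshold(B))
```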
Fig. 1.
Applications of NCA in healthcare system.
Modern healthcare relies heavily on medical imaging, which supports early diagnosis, efficient treatment planning, and better patient outcomes. Medical imaging is a field that is constantly changing due to ongoing research, ethical considerations, and technological advancements. ML in healthcare has emerged as one of the hottest buzzwords in recent years. NCA can process different medical images, helping to identify abnormalities of the body's interior, easing diagnosis, forecasting outcomes, and monitoring different health conditions5,6.
Key objectives and contributions
The proposed research has been implemented to assist doctors, and guide patients and healthy persons with the following key contributions:
The proposed invention integrates artificial neural network concepts with cellular automata to process data efficiently and respond accurately to users.
The proposed smart healthcare system helps laymen with no medical background understand medical data and assists them on a regular basis.
The smart healthcare system provides patient data remotely, which helps doctors monitor patients competently.
Motivation and challenges
This study builds on preceding studies of healthcare management systems through qualitative, quantitative, and experimental literature reviews. The survey found research with strong contributions but also several challenges:
Ensuring compliance with healthcare regulations and data protection laws creates substantial challenges.
Scalability challenges arise when handling substantial amounts of healthcare data.
Validation and generalisation across diverse sample datasets is difficult.
Existing systems are difficult to use for people unfamiliar with medical imaging and reports.
Generally, healthcare systems have been implemented to assist doctors and provide health-related information to insurance industries, but no research was found that guides laymen about their health records. These limitations in existing research motivate the enhancement of existing healthcare systems with the following objectives:
To analyze medical images and identify diseases.
Generate a straightforward report that is helpful to the laymen.
Therefore, the proposed research answers laymen's and doctors' queries. Laymen's queries typically relate to their medical imaging reports, whereas doctors' queries relate to clinical assistance, for example:
Which part of the body is infected and what type of disease?
What precautions should be taken to overcome the disease?
What are the possible solutions to cure the disease?
Literature survey
Health models have been around for a while. They are especially helpful for data personalisation, weekly follow-ups, tracking of daily habits, disease identification, meditation, fitness, mood tracking, hydration tracking, etc. For many years, health guide-based applications such as Headspace, Clue, Apple Fitness, RunKeeper, Talkspace, Sleep Score, Fitbit, and Life Sum have helped people with stress management, exercise, diet plans, walking, and daily routine activities7.
Much research has been done on health guide systems to help people examine health issues, preserve health records, evaluate health conditions, monitor nutrition, comply with hygiene, etc. The key objective of this work is to use cellular automata and ML to assist laymen and improve their knowledge in the healthcare area. We place a high emphasis on interdisciplinary collaboration while acknowledging the ethical and societal consequences of healthcare technology. Our goal is to offer useful information to people who are unfamiliar with healthcare information systems. In contrast to traditional review articles that frequently focus on narrow areas of healthcare technology, we reinforce integrated solutions that consider a wide range of standards. We broaden the depth of our literature analysis to include different technologies, i.e., artificial intelligence, blockchain, IoT, and cloud computing, while also investigating ethical considerations and equitable access to healthcare8,9.
The smart healthcare system is useful for different users, i.e., doctors, insurance industries, patients, and healthy persons. Healthcare systems support doctors in maintaining their schedules, retrieving patients' medical histories, carrying out and analysing diagnostic procedures, and making treatment recommendations. The insurance sector can gain several advantages from a smart healthcare system, including increased productivity, lower costs, and better client satisfaction10. Smart healthcare systems are about to transform the insurance sector and eventually improve outcomes for both policyholders and insurers. They also offer advantages to patients and healthy individuals through enhanced patient outcomes, better chronic disease management, and the promotion of general health and wellness. A smart healthcare system involves the collaboration of multiple parties, including life science companies, healthcare institutions, and smart hospitals, all of which are indispensable for delivering healthcare through technology11. Health monitoring, guidance, and personalised treatment are different circumstances in which the smart healthcare system works remotely. Different emerging technologies, i.e., artificial intelligence, blockchain, IoT, cloud computing, and computer vision, may help smart healthcare systems grow drastically12–14. Figure 2 represents the different stakeholders, i.e., major applicants, service contributors, emerging technologies, and descriptive scenarios of smart healthcare systems. Table 1 lists various emerging technologies used in healthcare systems, the key contributions that have been made, and the key challenges that users of these systems may face.
Fig. 2.
Different Stakeholders of Healthcare Systems.
Table 1.
Promising Technologies with Key Contributions and Challenges in Healthcare.
| Promising Technologies | Known Methodology | Key Contributions | Key Challenges |
|---|---|---|---|
| Machine Learning14–20 | Convolutional neural networks; transfer learning; recurrent neural networks; graph convolutional networks; self-supervised learning | Early diagnosis and classification of diseases; predictive analysis of patient outcomes; enhancing treatment effectiveness and reducing side effects; improving procedures | Protecting the confidentiality and integrity of healthcare data; legislative and ethical issues; difficult to validate and generalise across diverse sample datasets |
| Blockchain1,21–23 | Secure data sharing; immutable patient records; interoperability; data monetisation and incentives; supply chain management | Preserves confidentiality and accuracy; supports trustworthy interoperability for data exchange; gives patients better control over their health data; provides transparent supply chain management | Scalability challenges when handling substantial amounts of healthcare data; incompatibility with legacy healthcare systems; ensuring compliance with healthcare regulations and data protection laws |
| IoT24–26 | Integration with electronic health records; remote monitoring devices; IoT-enabled imaging devices | Monitor patients remotely and intervene rapidly when required; patients can actively contribute to their care; monitor and improve medication compliance | Vital to preserve sensitive data and restrict unwanted access; healthcare systems may be overloaded by the massive quantity of data |
| Computer Vision14–16 | Segmentation and reconstruction; classification and analysis; image-guided interventions; detection and diagnosis; virtual histology | Enables detection of medical disorders; assists diagnosis by examining visual data for abnormalities; emphasises minimally invasive treatments; monitors patients in real time through visual indicators | Strong security measures on visual data are imperative; may demand substantial computational resources; building trust in computer vision technologies remains a big challenge |
There are many kinds of emerging technologies, but some are better suited to the demands of the healthcare sector than others. In the healthcare industry, ML engineers frequently concentrate on developing software and hardware to support physicians and patients, detecting patterns in massive clinical datasets, and optimising medical records. ML algorithms are particularly useful for the healthcare sector because they can make sense of the enormous amounts of healthcare data created daily within electronic health records. Using ML techniques, we can discover patterns and insights in medical data that would be impossible to identify manually27. Healthcare providers have the chance to adopt a more predictive approach to precision medicine as ML in healthcare achieves wider use. This will result in a more unified system with enhanced treatment delivery, better patient outcomes, and more efficient patient-based operations. ML technologies have a wide range of potential applications in healthcare, including improving patient data, medical research, diagnosis, and treatment, as well as cutting costs and improving patient safety28. This survey explores different possibilities of emerging technologies in the healthcare system, some of which are:
Medical experts can apply ML in healthcare to create better diagnostic tools for examining medical images.
ML in healthcare could be used to analyse data from medical testing and medical research to discover previously unknown adverse drug effects.
Healthcare organizations can employ ML technology to increase the effectiveness of the industry, which could result in cost savings.
Medical personnel can apply ML in healthcare to enhance the standard of patient care.
Most research on healthcare systems analyses different types of medical images, especially focusing on identifying abnormalities. Recent research shows that machine learning combined with computer vision plays a crucial role in the early diagnosis and classification of different types of diseases. Lung-related infections can be identified using chest X-ray radiography, a technique that uses a small, relatively harmless radiation dose to produce a picture of the infected body part29. Melanoma, a type of skin cancer, can be identified using deep learning techniques that process dermatology images; the International Skin Imaging Collaboration provides high-quality public skin images30. Glioma is a brain tumor that may emerge from the glial cells and damage the brain and spinal nerves; ML methods such as CNN, RNN, FNN, and U-Net are well-known approaches that help to locate gliomas31. Segmentation, classification, reconstruction, filtration, augmentation, and visualisation are the key processing techniques commonly used in medical image processing32. Table 2 lists these common image processing techniques, the traditional and machine learning approaches used, and relevant applications in medical imaging.
Table 2.
Common Image Processing Techniques.
| Image Processing Techniques | Approach | Applications |
|---|---|---|
| Segmentation: object location and region partitioning29,32 | Thresholding, boundary highlighting; Random Forest, SVM, CNN, U-Net | Tumor detection, organ delineation; disease diagnosis and monitoring; treatment planning |
| Classification: image categorisation into predefined classes and label assignment30,31 | HOG, SIFT, kNN; DenseNet, ResNet; multimodality | Tissue type characterisation; disease classification; classifying health conditions |
| Reconstruction and Filtration: reconstruct incomplete images, construct images from raw data, and remove noise33,34 | Filtered back projection, GAN, LSTM; spatial and Gaussian filters; YOLO, DETR | Artifact removal; creating 3D MRI and CT images; improving image quality |
| Augmentation: improve model performance and generate diverse training data35 | Elastic and geometric transformation; generative models; histogram equalisation, intensity adjustment | Enhance the training dataset; reduce overfitting; emphasizes minimally invasive treatments |
Proposed neural cellular automata methodology
The proposed methodology was implemented to guide laymen about their 2D medical records, i.e., X-ray, MRI, CT scan, ultrasound images, etc. To produce efficient and robust results after processing medical images, the concept of a neural network (NN) has been integrated with cellular automata (CA), together known as neural cellular automata (NCA). From the perspective of medical image processing, incorporating neural networks and cellular automata offers a synergistic methodology that exploits the advantages of both techniques: neural networks excel at extracting complex patterns and descriptions from data, while cellular automata are discrete, spatiotemporal models skilled at portraying local interactions and dynamics28. The structure of the proposed method for guiding laymen is depicted in Fig. 3, and the key steps are discussed in the following subsections.
Fig. 3.
Working structure of neural cellular automata.
Pre-processing with CA
The proposed method works on different types of medical images in different colours. The images provided by users may sometimes contain various types of blur. This part of the methodology performs image smoothing and colour conversion. To streamline the algorithmic process, decrease computational complexity, and highlight a specific region, the image is transformed into a grayscale or black-and-white format. Equation 8 is used for colour conversion, whereas Eqs. 9 and 10 are used for image smoothing36,37. Consider an input image of size M × N; Eq. 8 produces the equivalent grey image, and Eqs. 9 and 10 produce the smoothed grey image.
(Equations 8–10 appear only as images in the original and are not reproduced here.)
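A minimal sketch of this pre-processing step is shown below, assuming a luminance-weighted grayscale conversion for Eq. 8 and Gaussian smoothing for Eqs. 9 and 10; the exact coefficients and filter used in the paper may differ.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Colour-to-grey conversion (stand-in for Eq. 8).

    Assumes the common luminance weights; the paper's exact
    coefficients are not reproduced here.
    """
    return rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114

def smooth(grey: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Image smoothing (stand-in for Eqs. 9-10) using a Gaussian filter."""
    return gaussian_filter(grey, sigma=sigma)

# Example: an M x N colour image
img = np.random.rand(64, 64, 3)
smoothed = smooth(to_grayscale(img), sigma=1.5)
```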
Features extraction using neural network
Image features such as colour, texture, shape, and intensity gradients help to define the region of interest in an image. These features are very helpful for image segmentation, edge detection, and locating image objects38. Neural network-based systems are essential for obtaining features from images because they can automatically learn hierarchical representations from raw pixel data. Equation 11 extracts features from an input image of size M × N by processing each pixel of the image; each pixel is processed with the help of a kernel or convolution operator, as depicted in Eq. 12. Equation 13 is the rectified linear unit, which works as the activation function.
(Equations 11–13 appear only as images in the original; Eq. 13 is the standard rectified linear unit, f(x) = max(0, x).)
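The sketch below illustrates the kind of operation described by Eqs. 11–13: a 2-D convolution of the image with a kernel followed by the ReLU activation. The kernel values and the plain "valid" convolution are placeholders, not the paper's exact formulation.

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    """Rectified linear unit activation (Eq. 13): max(0, x)."""
    return np.maximum(0.0, x)

def convolve2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid 2-D convolution of an M x N image with a k x k kernel
    (a simple stand-in for the feature extraction of Eqs. 11-12)."""
    k = kernel.shape[0]
    M, N = image.shape
    out = np.zeros((M - k + 1, N - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

# Example: a 3 x 3 edge-like kernel (placeholder values)
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]], dtype=float)
features = relu(convolve2d(np.random.rand(32, 32), kernel))
```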
Integration of CA with NN
The integrated approach of neural networks with CA, called NCA, makes the model more powerful and intelligent: it can analyse the neighbourhood cells and helps to map an elegant rule. The rule mapping process of the proposed methodology has two primary operations, i.e., neighbour analysis and rule mapping, in which relations between the neighbours are defined to conclude the threshold.
Neighbors analysis
The neighbour analysis operation generates a neighbour matrix from the binary image pixel intensities stored in the various cells. The NCA analyses the 8 neighbours surrounding each cell. During the analysis, the NCA verifies whether each cell is alive or dead: a cell with value 1 is considered alive, and a cell with value 0 is considered dead. The nbrAnalysis() algorithm accepts as input a binary image S of size M × N produced by the pre-processing step. The algorithm examines the left, right, up, down, and diagonal cells to produce a neighbour matrix. Algorithm 1 represents the logical aspects of the neighbour analysis of the image pixels.
Algorithm 1.

nbrAnalysis(IMG S[M][N])
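Algorithm 1 is available only as a figure in the original; a minimal Python sketch of the neighbour analysis it describes (counting the live 8-neighbours of every pixel of a binary image) might look as follows.

```python
import numpy as np

def nbr_analysis(S: np.ndarray) -> np.ndarray:
    """Return a neighbour matrix NB where NB[i, j] counts the alive
    (value 1) cells among the 8 neighbours of pixel (i, j) of the
    binary image S.  Border pixels simply have fewer neighbours.
    """
    M, N = S.shape
    NB = np.zeros((M, N), dtype=int)
    for i in range(M):
        for j in range(N):
            for r in range(max(0, i - 1), min(M, i + 2)):
                for c in range(max(0, j - 1), min(N, j + 2)):
                    if (r, c) != (i, j) and S[r, c] == 1:
                        NB[i, j] += 1
    return NB

# Example usage on a small binary image
S = np.random.randint(0, 2, size=(8, 8))
NB = nbr_analysis(S)
```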
Rule mapping
The NCA rule is decided according to the number of live neighbour cells, i.e., the value of the cell in the neighbour matrix. The neighbour matrix has 9 possible values, 0 to 8, where each value represents the number of alive neighbours of the respective cell. For example, the value 0 means the cell has no live neighbour, the value 1 means the cell has exactly one live adjacent neighbour and the rest are dead, the value 2 means the cell has two live adjacent neighbours and the rest are dead, and so on.
The proposed method decides the appropriate rule for medical imaging according to the loneliness and overpopulation of the cells. The neighbour matrix NB, produced by the neighbour analysis step, is the input of this step. The NCA rule is defined based on a threshold value; Eqs. 14 to 19 decide this threshold after examining the neighbour matrix.
(Equations 14–17 appear only as images in the original and are not reproduced here.)
T1 and T2 are the unique threshold values for the source image and may vary from image to image. Consider a binary image whose neighbour matrix contains all the possible values, 0 to 8: in that case T1 evaluates to 4 and T2 to 6, so the threshold range for the input image is [4, 6]. The threshold is applied to the neighbour matrix to decide whether each resultant image cell, or pixel, will be alive or dead; Eqs. 18 and 19 make this decision. The algorithm ruleMap() shows the step-by-step rule mapping process for the NCA: it takes the neighbour matrix as input, analyses the outer neighbours of all the cells, and decides a unique threshold value for all the cells.
(Equations 18–19 appear only as images in the original and are not reproduced here.)
Algorithm 2.

ruleMap(IMG S[][], int NB[][])
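Algorithm 2 is likewise available only as a figure. The sketch below stands in for the threshold derivation of Eqs. 14–17; the particular formula (a mean-based T1 and a midpoint-based T2) is an assumption chosen only because it reproduces the [4, 6] example above, not the published rule.

```python
import numpy as np

def rule_map(NB: np.ndarray) -> tuple[int, int]:
    """Derive the unique threshold pair (T1, T2) from the neighbour matrix NB.

    Assumption standing in for Eqs. 14-17: T1 = floor(mean(NB)) and
    T2 = floor((T1 + max(NB)) / 2).  For a neighbour matrix covering all
    values 0..8 this yields (4, 6), the range quoted in the text, but it
    is not guaranteed to be the paper's exact formula.
    """
    T1 = int(np.floor(NB.mean()))
    T2 = int(np.floor((T1 + NB.max()) / 2))
    return T1, T2
```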
The rule mapping step of the proposed methodology analyses all the surrounding neighbour pixels of every pixel and generates a unique threshold value that helps to map the NCA rule. This threshold generation and rule mapping process takes M × N steps, hence the worst-case complexity of the step is O(M × N).
Dynamic adaptation
The dynamic adjustment provided by the neural network gives flexibility to the cellular automata and makes them more intelligent. With this added intelligence, the CA may adjust the number of analysed neighbours to finalise a robust threshold: the adaptive CA may introduce new neighbours or remove neighbours from the existing environment. The dynamic learning rate of the CA is adjusted using the current state and the updated number of neighbours. Equations 20 and 21 represent the updated thresholding rule in terms of the learning rate and the loss functions. Equations 22, 23, and 24 represent the learning-rate adjustment function, the neighbour-addition function, and the neighbour-removal function, respectively, for each iteration. It is necessary to keep in mind that the adaptation strategy chosen depends on the nature of the dynamic changes and the problem at hand.
(Equations 20–24 appear only as images in the original and are not reproduced here.)
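Since Eqs. 20–24 are not reproduced, the following is only a loose, assumed illustration of the adaptation idea: the learning rate and the neighbourhood radius are adjusted from the loss trend.

```python
def adapt(learning_rate: float, prev_loss: float, curr_loss: float,
          radius: int, min_radius: int = 1, max_radius: int = 3):
    """Illustrative adaptation step (not the paper's exact Eqs. 20-24).

    If the loss improved, slightly increase the learning rate and allow a
    larger neighbourhood radius; otherwise shrink both.
    """
    if curr_loss < prev_loss:                 # improvement
        learning_rate *= 1.05
        radius = min(max_radius, radius + 1)  # add neighbours
    else:                                     # stagnation / regression
        learning_rate *= 0.5
        radius = max(min_radius, radius - 1)  # remove neighbours
    return learning_rate, radius
```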
Data augmentation
The proposed NCA methodology guides the layman by processing text and image data. To produce effective and robust results, the data needs to be expanded elegantly. Data augmentation may moderate over-fitting and enhance the proposed model's ability to generalise and predict accurately. Rotation, fusion, scaling, translation, and elastic deformation techniques are used in this research to provide quality results and help in clinical decision-making. The rotation method can simulate clinical conditions by providing medical images at different angles: the image is rotated by a specific angle according to Eqs. 25, 26, and 27, where (P, Q) denotes the coordinates of a specific pixel of the source image and the centred and rotated pixel coordinates are defined accordingly. The fusion method combines information from multiple images into a single image: the images are combined with specific weights, as represented in Eq. 28. Image resizing is performed using the scaling method, which shrinks or stretches the image along the x-axis and y-axis; the scaling factors in Eq. 29 decide the amount of resizing of the source image. Shifting of image pixels along an axis by a specific distance is performed using the translation method; Eq. 30 represents shifting the source image pixels by translation distances along the x-axis and y-axis.
(Equations 25–30 appear only as images in the original and are not reproduced here.)
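A compact sketch of the four geometric augmentations (rotation about the image centre, weighted fusion, scaling, and translation) is given below using SciPy routines; it is an illustration of Eqs. 25–30 rather than the exact published formulation, and the parameter values are placeholders.

```python
import numpy as np
from scipy.ndimage import rotate, zoom, shift

def augment_rotation(S: np.ndarray, angle_deg: float) -> np.ndarray:
    """Rotate the image about its centre (cf. Eqs. 25-27)."""
    return rotate(S, angle_deg, reshape=False, order=1)

def augment_fusion(images, weights) -> np.ndarray:
    """Weighted fusion of several images into one (cf. Eq. 28);
    the weights are assumed to sum to 1."""
    return sum(w * img for w, img in zip(weights, images))

def augment_scaling(S: np.ndarray, sx: float, sy: float) -> np.ndarray:
    """Resize along the x and y axes by scaling factors sx, sy (cf. Eq. 29)."""
    return zoom(S, (sy, sx), order=1)

def augment_translation(S: np.ndarray, dx: float, dy: float) -> np.ndarray:
    """Shift image pixels by (dx, dy) along the axes (cf. Eq. 30)."""
    return shift(S, (dy, dx), order=1)

# Example: rotate then translate a source image
S = np.random.rand(64, 64)
aug = augment_translation(augment_rotation(S, 15.0), 3, -2)
```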
The technique of elastic deformation simulates the elastic characteristics of biological tissues. It is used to distort the image locally while maintaining its overall structure39. This approach makes the model more resilient by adding realistic local deformations to the medical image. To maintain the clinical relevance of the augmented images, the deformation parameters, such as the amplitude and smoothness of the displacement fields, need to be carefully selected. To characterise both global and local deformations, the displacement field DF(p, q) can be treated as a combination of a smooth function and random noise. For a source image and a pixel (p, q), displacements are defined along the x-axis and y-axis. The displacement field DF(p, q) of the source image is defined according to Eq. 31, and after deformation the new pixel coordinates are given by Eqs. 32 and 33.
(Equations 31–33 appear only as images in the original and are not reproduced here.)
The intensity of the deformed pixel depends on the updated coordinates. Let x and y be the fractional parts of the updated coordinates, and let S(P, Q) and the deformed image be the source and augmented images, respectively. The deformation of the source image is decided using Eq. 34.
(Equation 34 appears only as an image in the original and is not reproduced here.)
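As a hedged sketch of Eqs. 31–34, the function below builds a Gaussian-smoothed random displacement field and resamples the image at the displaced coordinates with bilinear interpolation; the amplitude (alpha) and smoothness (sigma) parameters are placeholders.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_deform(S: np.ndarray, alpha: float = 8.0, sigma: float = 4.0,
                   seed: int = 0) -> np.ndarray:
    """Elastic deformation of image S (illustrative version of Eqs. 31-34).

    A random displacement field is smoothed with a Gaussian (smooth
    function + noise) and scaled by `alpha`; pixels are then resampled at
    the displaced coordinates with bilinear interpolation.
    """
    rng = np.random.default_rng(seed)
    M, N = S.shape
    dx = gaussian_filter(rng.uniform(-1, 1, (M, N)), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, (M, N)), sigma) * alpha
    p, q = np.meshgrid(np.arange(M), np.arange(N), indexing="ij")
    coords = np.array([p + dy, q + dx])  # new (row, col) sample positions
    return map_coordinates(S, coords, order=1, mode="reflect")
```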
Segmentation and interpretability
Medical image segmentation and interpretation are performed using the threshold function defined in the rule mapping section. The segmentation and interpretability processes are used to extract features from the image. The segmentation of an image takes place by deciding whether each image pixel is happy, lonely, or overpopulated: a pixel is considered happy if its number of alive neighbours lies between T1 and T2. Once the OTCA rule, or threshold function, is decided, it is applied to all the cells of the neighbour matrix. The threshold function works as a global or unique function that reads the current state of each cell and generates a new state. The transition function decides the new state of the cells according to the following criteria.
If a cell has a value of 0, it remains dead.
If a cell is alive and has a value less than T1, it dies due to loneliness.
If a cell is alive and has a value greater than T2, it dies due to overcrowding.
If a cell is alive and has a value from T1 to T2, it remains alive.
The transition function finds the cells that are alive and states whether they are happy, lonely, or overpopulated. The current state of each cell is updated to a new state according to the threshold function and the current state: if the selected cell is in a happy state, i.e., its value lies between T1 and T2, its new state will be alive (1); if it is in a lonely or overpopulated state, its new state will be dead (0). Equation 35 shows the image segmentation and interpretability process, which depends on the source image S and the equivalent neighbour matrix NB generated in the subsection "Neighbors analysis", and returns the segmented image. The segmentation process takes O(1) time per pixel, so segmenting the entire image takes O(M × N).
(Equation 35 appears only as an image in the original and is not reproduced here.)
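A vectorised sketch of this transition rule is shown below; it assumes the neighbour matrix NB and the thresholds T1 and T2 produced by the earlier sketches and is an illustration rather than the exact Eq. 35.

```python
import numpy as np

def segment(S: np.ndarray, NB: np.ndarray, T1: int, T2: int) -> np.ndarray:
    """Transition function (sketch of Eq. 35): a cell stays alive only if
    it is currently alive and its live-neighbour count lies in [T1, T2];
    lonely (< T1) or overcrowded (> T2) cells die, and dead cells stay
    dead.  Runs in O(M*N) for an M x N image.
    """
    R = np.zeros_like(S)
    R[(S == 1) & (NB >= T1) & (NB <= T2)] = 1
    return R

# Example pipeline using the earlier sketches:
#   NB = nbr_analysis(S); T1, T2 = rule_map(NB); R = segment(S, NB, T1, T2)
```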
Decision synthesis
The segmented image may contain a sequence of extracted features that help to generate a competent medical report. The Long Short-Term Memory (LSTM) algorithm is used in the decision synthesis step for report generation44,45. Let the n sequenced features extracted from the image be given; for a timestamp t, the LSTM maintains the corresponding hidden state, forget gate, cell state, input gate, output gate, input activation gate, and memory cell. Algorithm 3 represents the decision synthesis step, which uses the LSTM to generate reports from medical images. The bias and weight vectors are initialised and updated using the gradient descent technique. The decision synthesis algorithm receives the output of the hidden layer and the extracted features of the medical image, predicts a vocabulary Voc that contains unordered data related to the medical image, and finally returns a meaningful report. The time complexity of the report generation process is O(n).
Algorithm 3.

DecisionSynthesis(…)
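Algorithm 3 is available only as a figure; the PyTorch sketch below is an illustrative stand-in for the described LSTM decoder: the sequence of extracted image features is passed through an LSTM, and each hidden state is projected onto a vocabulary to yield report tokens. The layer sizes, greedy decoding, and class name are assumptions.

```python
import torch
import torch.nn as nn

class DecisionSynthesis(nn.Module):
    """Illustrative LSTM report generator (stand-in for Algorithm 3)."""

    def __init__(self, feature_dim: int, hidden_dim: int, vocab_size: int):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.to_vocab = nn.Linear(hidden_dim, vocab_size)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        """features: (batch, n, feature_dim) sequence of image features.
        Returns greedy vocabulary indices of shape (batch, n)."""
        hidden_states, _ = self.lstm(features)   # (batch, n, hidden_dim)
        logits = self.to_vocab(hidden_states)    # (batch, n, vocab_size)
        return logits.argmax(dim=-1)             # one vocabulary id per step

# Example: 20 extracted features of dimension 128, vocabulary of 500 terms
model = DecisionSynthesis(feature_dim=128, hidden_dim=256, vocab_size=500)
tokens = model(torch.randn(1, 20, 128))
```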
Experimental results and performance analysis
Dataset
The proposed research has been trained on more than 10,000 medical images and tested on approximately 1500 medical images of different types46. The training images are taken from various open databases, i.e., Mendeley, SIIM-ISIC, MIMIC-CXR, IU X-ray, NIH Chest X-ray, and TCIA47,48. Table 3 shows the composition of the training datasets and their sources.
Table 3.
Classification of training datasets.
| Dataset | Type | Investigated dataset | Source of dataset |
|---|---|---|---|
| SIIM-ISIC40 | open | 2000+ | https://www.kaggle.com/datasets/prateek0x/siimisic-segmented-and-balanced-dataset |
| MIMIC-CXR41 | open | 2000+ | https://www.kaggle.com/datasets/wasifnafee/mimic-cxr |
| IU X-ray42 | open | 3000+ | https://www.kaggle.com/datasets/raddar/chest-xrays-indiana-university |
| NIH Chest X-ray34 | open | 2000+ | https://www.kaggle.com/datasets/nih-chest-xrays/data |
| TCIA43 | open | 1000+ | https://www.kaggle.com/datasets/awsaf49/cbis-ddsm-breast-cancer-image-dataset |
SIIM-ISIC
This notable open dataset in the dermatology field offers skin images with clinical metadata and labels indicating whether melanoma is present or absent. It is a sizeable dataset covering a range of skin images that helps researchers identify the type of skin cancer40.
MIMIC-CXR
This dataset is widely used in chest X-ray imaging research and also provides radiology reports. The MIMIC-CXR dataset contains 350,000+ images with detailed textual information41.
IU X-ray
This dataset is openly available to researchers and academicians and provides clinical reports, disease classifications, and specific annotations for identifying different abnormalities in chest X-ray images42.
NIH chest X-ray
This dataset provides 100,000+ medical images freely for training different ML models. The dataset is labeled with detailed captions, usually used to generate clinical reports.
TCIA
The dataset has a wide range of medical images i.e., MRI, X-ray, CT, and PET images which usually help to diagnose different types of tumors. The TCIA is broadly used in various research related to medical imaging and ML43.
Performance metrics
The proposed research generates textual reports by examining medical images. The quality of the generated text report is measured using the BLEU (Bilingual Evaluation Understudy), ROUGE (Recall-Oriented Understudy for Gisting Evaluation), and WER (Word Error Rate) scores. BLEU, ROUGE, and WER are common measures used to assess efficacy and quality49,50.
BLEU score
BLEU is used to assess machine-generated text against one or more human reference translations. It is a prevalent tool for natural language processing tasks, particularly machine translation, because it enables objective measurement of translation quality. The BLEU score is precision-oriented and works by measuring similarity based on n-gram precision, where n varies from 1 to 4: BLEU-1 (uni-gram), BLEU-2 (bi-gram), BLEU-3 (tri-gram), and BLEU-4 (4-gram) precision over words and sentences. Finer translation quality is reflected by higher BLEU ratings; a perfect score of 1 denotes an exact match with the reference translations51. BLEU is evaluated over each n-gram size up to a maximum length N from the n-gram precisions, the translated text length TL, and the reference text length RL, using Eqs. 36 and 37.
(Equations 36–37 appear only as images in the original and are not reproduced here.)
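Since Eqs. 36–37 are not reproduced, the sketch below follows the standard BLEU formulation (modified n-gram precisions combined geometrically, with a brevity penalty derived from TL and RL); the paper's exact equations may differ in detail.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(generated: str, reference: str, max_n: int = 4) -> float:
    """Standard BLEU score of a generated report against one reference
    (illustrative stand-in for Eqs. 36-37)."""
    gen, ref = generated.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        g, r = Counter(ngrams(gen, n)), Counter(ngrams(ref, n))
        overlap = sum(min(c, r[t]) for t, c in g.items())
        total = max(1, sum(g.values()))
        precisions.append(max(overlap, 1e-9) / total)
    # Geometric mean of the n-gram precisions (uniform weights assumed)
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: penalise generated text shorter than the reference
    TL, RL = len(gen), len(ref)
    bp = 1.0 if TL > RL else math.exp(1 - RL / max(1, TL))
    return bp * math.exp(log_avg)
```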
ROUGE score
The ROUGE score evaluates the generated text by comparing it with the reference text, aggregating a score, and deciding how well the text is generated. The aggregated score helps to determine the association between the generated and reference text; it varies from 0 to 1, and a higher value represents a stronger association. The ROUGE score is recall-oriented and is computed as ROUGE-N, ROUGE-L, and ROUGE-W, which measure n-gram overlap, the longest common sub-sequence, and the weighted longest common sub-sequence, respectively. The ROUGE between generated text GT and reference text RT is defined using Eqs. 38, 39, and 40.
(Equations 38–40 appear only as images in the original and are not reproduced here.)
Here, count(RT) and length(RT) are the number of words and the longest common sub-sequence length in RT, and LCS(GT, RT) and WLCS(GT, RT) are the longest common sub-sequence and the weighted LCS between GT and RT.
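A hedged counterpart to Eqs. 38–40 is sketched below: ROUGE-N computed as n-gram recall against the reference and ROUGE-L from the longest common sub-sequence; the weighted variant (ROUGE-W) is omitted for brevity.

```python
from collections import Counter

def rouge_n(generated: str, reference: str, n: int = 1) -> float:
    """ROUGE-N recall: matched n-grams divided by the n-grams in the reference."""
    gen, ref = generated.split(), reference.split()
    g = Counter(tuple(gen[i:i + n]) for i in range(len(gen) - n + 1))
    r = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    matched = sum(min(c, g[t]) for t, c in r.items())
    return matched / max(1, sum(r.values()))

def rouge_l(generated: str, reference: str) -> float:
    """ROUGE-L: longest common sub-sequence length divided by reference length."""
    gen, ref = generated.split(), reference.split()
    # Dynamic-programming LCS table
    dp = [[0] * (len(ref) + 1) for _ in range(len(gen) + 1)]
    for i, gw in enumerate(gen, 1):
        for j, rw in enumerate(ref, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if gw == rw else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1] / max(1, len(ref))
```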
WER score
The WER identifies the fraction of extra, missing, and incorrect words in the generated text. Equation 41 computes this fraction, where TW, EW, MW, and IW are the total words, extra words, missing words, and incorrect words in the generated text.
WER = (EW + MW + IW) / TW (41)
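A minimal sketch of Eq. 41 is given below; the extra (inserted), missing (deleted), and incorrect (substituted) word counts are obtained here from a word-level edit-distance alignment, which is an assumption about how the counts are computed.

```python
def wer(generated: str, reference: str) -> float:
    """Word error rate (Eq. 41): (extra + missing + incorrect) / total words.

    The counts come from a word-level edit-distance alignment; following
    the description, the total TW is taken from the generated text.
    """
    gen, ref = generated.split(), reference.split()
    # dp[i][j]: minimum edits to turn the first i reference words
    # into the first j generated words
    dp = [[0] * (len(gen) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(gen) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(gen) + 1):
            sub = 0 if ref[i - 1] == gen[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # missing word (deletion)
                           dp[i][j - 1] + 1,        # extra word (insertion)
                           dp[i - 1][j - 1] + sub)  # incorrect word (substitution)
    return dp[-1][-1] / max(1, len(gen))
```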
Report generation and performance analysis
The proposed NCA method has been trained using several medical images taken from the open datasets discussed in the Dataset subsection. The proposed method works on all the tested images and generates a report for each. To generate a robust report, the NCA segments the image by analysing the neighbouring pixels of every image pixel and generating a unique threshold range [T1, T2]. The threshold is compared with the image pixels to segment the image. The decision synthesis step of the NCA then examines all the segmented objects and generates a textual report. The report contains various related information, i.e., MeSH (Medical Subject Headings), symptoms, outcomes, impacts, and precautions52. Figure 4 depicts the result summary of the proposed NCA method for chest X-ray images of the IU X-ray dataset; the generated reports and suggested precautions are shown in the figure. The figure contains the summary reports of two images: after analysing the reports, the NCA concluded that the first image shows a pulmonary abnormality and the second image shows a pneumothorax. The NCA identified some common symptoms, causes or outcomes, and impacts of the causes, and finally suggests some precautions. Figures 5 and 6 summarise the results of the proposed NCA method on MRI and skin images; each figure contains two images taken from the MIMIC-CXR, TCIA, SIIM-ISIC, and NIH datasets, respectively. The report subject heading, symptoms, outcomes, impacts, and precautions are discussed alongside the images in the figures. The quantitative analysis of the proposed NCA methodology is given in Table 4 using various measures, i.e., BLEU-1, BLEU-2, BLEU-3, BLEU-4, ROUGE-N, ROUGE-L, ROUGE-W, and WER. The table shows the statistics produced for 25 test images: the average BLEU score ranges from 0.5 to 0.7, the average ROUGE score ranges from 0.6 to 0.85, and the average WER score ranges from 0.1 to 0.15. The graphical analysis of BLEU (precision), ROUGE (recall), and word error rate for images of the MIMIC-CXR dataset is depicted in Fig. 7.
Fig. 4.
Experimented results summary of the proposed NCA method on Chest X-ray images.
Fig. 5.
Experimented results summary of the proposed NCA method on MRI images.
Fig. 6.
Experimented results summary of the proposed NCA method on skin images.
Table 4.
Analysis of the NCA method using performance evaluation criteria.
| Source Data | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | ROUGE-N | ROUGE-L | ROUGE-W | WER |
|---|---|---|---|---|---|---|---|---|
| MIMIC_IM_1.png | 0.546 | 0.647 | 0.596 | 0.451 | 0.892 | 0.719 | 0.552 | 0.156 |
| MIMIC_IM_2.png | 0.661 | 0.762 | 0.711 | 0.596 | 0.807 | 0.676 | 0.509 | 0.146 |
| MIMIC_IM_3.png | 0.569 | 0.674 | 0.621 | 0.526 | 0.915 | 0.734 | 0.563 | 0.197 |
| MIMIC_IM_4.png | 0.686 | 0.787 | 0.736 | 0.571 | 0.932 | 0.739 | 0.572 | 0.191 |
| MIMIC_IM_5.png | 0.556 | 0.657 | 0.606 | 0.481 | 0.902 | 0.724 | 0.557 | 0.103 |
| ISIC_0188955.jpg | 0.446 | 0.547 | 0.496 | 0.351 | 0.792 | 0.669 | 0.502 | 0.041 |
| ISIC_1684127.jpg | 0.496 | 0.597 | 0.546 | 0.301 | 0.842 | 0.694 | 0.527 | 0.158 |
| ISIC_7205745.jpg | 0.507 | 0.608 | 0.557 | 0.334 | 0.853 | 0.699 | 0.532 | 0.198 |
| ISIC_0464315.jpg | 0.521 | 0.622 | 0.571 | 0.376 | 0.867 | 0.706 | 0.539 | 0.174 |
| ISIC_5365617.jpg | 0.487 | 0.588 | 0.537 | 0.374 | 0.833 | 0.689 | 0.522 | 0.184 |
| IU_IMG_1.jpg | 0.542 | 0.643 | 0.592 | 0.439 | 0.888 | 0.717 | 0.550 | 0.101 |
| IU_IMG_2.jpg | 0.576 | 0.677 | 0.626 | 0.541 | 0.922 | 0.734 | 0.567 | 0.094 |
| IU_IMG_3.jpg | 0.503 | 0.604 | 0.553 | 0.422 | 0.849 | 0.697 | 0.536 | 0.148 |
| IU_IMG_4.jpg | 0.589 | 0.693 | 0.641 | 0.585 | 0.935 | 0.741 | 0.573 | 0.144 |
| IU_IMG_5.jpg | 0.616 | 0.597 | 0.626 | 0.511 | 0.912 | 0.729 | 0.562 | 0.142 |
| NIH_IMG_1.jpg | 0.686 | 0.687 | 0.636 | 0.511 | 0.943 | 0.744 | 0.577 | 0.146 |
| NIH_IMG_2.jpg | 0.556 | 0.657 | 0.606 | 0.481 | 0.954 | 0.75 | 0.586 | 0.126 |
| NIH_IMG_3.jpg | 0.446 | 0.547 | 0.496 | 0.351 | 0.868 | 0.707 | 0.542 | 0.143 |
| NIH_IMG_4.jpg | 0.496 | 0.597 | 0.546 | 0.301 | 0.934 | 0.745 | 0.573 | 0.136 |
| NIH_IMG_5.jpg | 0.507 | 0.608 | 0.557 | 0.334 | 0.789 | 0.667 | 0.583 | 0.174 |
| Tr-gl_0010.jpg | 0.521 | 0.622 | 0.571 | 0.376 | 0.823 | 0.684 | 0.517 | 0.084 |
| Tr-gl_0011.jpg | 0.487 | 0.588 | 0.537 | 0.374 | 0.852 | 0.701 | 0.533 | 0.121 |
| Tr-me_0026.jpg | 0.542 | 0.643 | 0.592 | 0.439 | 0.839 | 0.692 | 0.525 | 0.094 |
| Tr-no_0018.jpg | 0.576 | 0.677 | 0.626 | 0.541 | 0.913 | 0.729 | 0.562 | 0.148 |
| Tr-pi_0013.jpg | 0.503 | 0.604 | 0.553 | 0.422 | 0.892 | 0.719 | 0.552 | 0.174 |
Fig. 7.
Analysis of NCA method experimented using MIMIC datasets.
The n-gram, ROUGE, and WER scores for images of the SIIM-ISIC dataset are depicted in Fig. 8. Figures 9, 10, and 11 depict the results of the same measures for images of the IU X-ray, NIH Chest X-ray, and TCIA datasets, respectively. The comparative analysis of the average score of all the images of each dataset is represented in Fig. 12 using the same measures. By analysing the precision and recall on different n-grams of the proposed NCA in Figs. 7, 8, 9, 10, 11, and 12, we state that the generated text has high quality and adequacy. Figure 13 represents the average performance analysis of the 1500 test images examined using the same measuring criteria. The BLEU-2 and ROUGE-N scores of all the test images lie between 60% and 80%, which indicates that the generated reports contain largely unique text with a high quality score.
F1 = (2 × Precision × Recall) / (Precision + Recall) (42)
Fig. 8.
Analysis of NCA method experimented SIIM-ISIC datasets.
Fig. 9.
Analysis of NCA method experimented using IU datasets.
Fig. 10.
Analysis of NCA method experimented using NIH datasets.
Fig. 11.
Analysis of NCA method experimented using TCIA datasets.
Fig. 12.
Comparative analysis of NCA method experimented on the average of different types of datasets.
Fig. 13.
Average performance analysis of NCA using different measuring criteria.
The performance of the proposed study is also evaluated using the F1-score. The proposed study works on different classes of medical images, so the F1 score is a suitable metric for evaluating performance. The BLEU-2 and ROUGE-N scores are taken as precision and recall to evaluate the F1 score for the different classes of the datasets; the F1 score is estimated using Eq. 42, following standard evaluation practice53. The overall performance of the proposed NCA method is shown in Fig. 14, which presents a comparative analysis of the precision, recall, and F1 scores on the different datasets. From the produced results, the proposed NCA method claims the following:
The BLEU and ROUGE scores of the experimented images are approximately 0.62 and 0.90, which indicates that the produced report is very close to the original report.
The WER score of 0.14 indicates that the generated report contains mostly relevant words.
The F1 score produced on the different classes of images is between 0.7 and 0.9, which indicates adequate accuracy of the study.
Overall, the proposed research provides the layman with a useful medical report that states the disease accurately and suggests precautions.
Fig. 14.
Performance analysis of NCA using F1 Score of different classes of the images.
Conclusion, limitations, and future scope
Conclusion
This research highlights a remarkable achievement in the domain of medical image captioning by utilising neural cellular automata. The NCA has been implemented by combining cellular automata, deep learning, and artificial intelligence innovations to achieve the objectives. Our research has shown how such models can produce precise and useful captions for medical images, which can help laymen and medical practitioners with diagnosis, treatment planning, and medical education. Furthermore, we have emphasised the possibility of incorporating domain-specific information and larger, more varied datasets to enhance model performance further. As the demand for automated medical image processing grows, the results support ongoing efforts to improve patient outcomes and healthcare delivery through the synergy of medicine and technology. The accuracy of the proposed NCA method was measured using the BLEU, ROUGE, and WER parameters. Different n-gram BLEU scores have been measured to assess the quality of the generated text and indicate a minimum of 55% accurate text. ROUGE measures quality from different aspects, i.e., n-gram overlap and the longest common sub-sequence. The word generation error rate of the text is measured by the WER score. The average ROUGE score of the proposed NCA is 0.71 and the WER score is 0.14, which indicates that the generated text contains detailed information with low word error.
Limitations
As discussed in the results section, the BLEU score of the proposed method varies between 0.5 and 0.6 (50–60%), which indicates variability in the generated text. The BLEU score should be improved so that the generated report comes closer to the human-written reference report. The bi-gram and longest-common-sub-sequence ROUGE scores of the generated text were found to be satisfactory, but the weighted ROUGE score should be improved.
Future scope
The proposed study is very helpful for laymen who are unable to understand medical images. In the future scope of the study, the authors will target improving the BLEU and ROUGE-W scores of the generated report. Data augmentation, reverse translation, occlusion conditions, and optimisation techniques will be introduced to enhance the ability of the model to pinpoint the dominant information in the report. A future study will integrate the proposed research into a mobile application with a user-friendly environment.
Acknowledgements
https://github.com/drskscs/Healthcare-Model
Author contributions
S.K.S. and V.S.S. wrote the main manuscript text, C.L.C. refined/validated the manuscript, A.R. and A.A.K.E.F. prepared figures. All authors reviewed the manuscript.
Code availability
The code and the process of accessing the study code is available at https://github.com/drskscs/Healthcare-Model. Researchers can access the code by requesting a license from the corresponding author.
Data availability
The datasets used in the manuscript are available from the corresponding authors on a reasonable request by email.
Declarations
Competing interests
The authors declare no competing interests.
Ethical approval
The proposed study did not comprise any experiments on human participants or the use of human tissue samples that require ethical approval. All data used were obtained from publicly available datasets under an MIT license, ensuring compliance with relevant guidelines and regulations. All the methods implemented in the study were carried out in compliance with applicable rules and regulations. There was no need for institutional or licensing committee permission because there were no direct human experiments.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Chiranji Lal Chowdhary, Vijay Shankar Sharma, Adil Rasool and Arfat Ahmad Khan contributed equally to this work.
Contributor Information
Vijay Shankar Sharma, Email: vijayshankar.sharma@jaipur.manipal.edu.
Adil Rasool, Email: adilrasool@bakhtar.edu.af.
References
- 1.Condry, M. W. & Quan, X. I. Remote patient monitoring technologies and markets. IEEE Eng. Manag. Rev.51(3), 59–64. 10.1109/EMR.2023.3285688 (2023). [Google Scholar]
- 2.Catrina, S., Catrina, M., Băicoianu, A. & Plajer, I. C. Learning about growing neural cellular automata. IEEE Access12, 45740–45751. 10.1109/ACCESS.2024.3382541 (2024). [Google Scholar]
- 3.Sharma, S. K., Sharma, V. S., Basheer, S., Chaurasia, A. & Chowdhary, C. L. An astute automaton model for objects extraction using outer totality cellular automata (otca). IEEE Access11, 123876–123890. 10.1109/ACCESS.2023.3329473 (2023). [Google Scholar]
- 4.Vinuja, G. & Devi, N. B. Multitemporal Hyperspectral Satellite Image Analysis and Classification Using Fast Scale Invariant Feature Transform and Deep Learning Neural Network Classifier (2023)
- 5.Ye, C. & Chen, C. Secure medical image sharing for smart healthcare system based on cellular neural network. Complex Intell. Syst. 9 (2023)
- 6.Chukwu, E. & Garg, L. A systematic review of blockchain in healthcare: Frameworks, prototypes, and implementations. IEEE Access8, 21196–21214. 10.1109/ACCESS.2020.2969881 (2020). [Google Scholar]
- 7.Ahmad, I. et al. Emerging technologies for next generation remote health care and assisted living. IEEE Access10, 56094–56132. 10.1109/ACCESS.2022.3177278 (2022). [Google Scholar]
- 8.Almotairi, K. H. Application of internet of things in healthcare domain. J. Umm Al-Qura Univ. Eng. Archit.14(1), 1–12. 10.1007/s43995-022-00008-8 (2023). [Google Scholar]
- 9.Das, S., Nayak, S. P., Sahoo, B. & Nayak, S. C. Machine learning in healthcare analytics: A state-of-the-art review. Arch. Comput. Methods Eng.10.1007/s11831-024-10098-3 (2024). [Google Scholar]
- 10.Elbedwehy, S., Medhat, T., Hamza, T. & Alrahmawy, M. F. Enhanced descriptive captioning model for histopathological patches. Multimed. Tools Appl.83(12), 36645–36664. 10.1007/s11042-023-15884-y (2024). [Google Scholar]
- 11.Beddiar, D.-R., Oussalah, M. & Seppänen, T. Automatic captioning for medical imaging (mic): A rapid review of literature. Artif. Intell. Rev.56(5), 4019–4076. 10.1007/s10462-022-10270-w (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Badawy, M., Ramadan, N. & Hefny, H. A. Healthcare predictive analytics using machine learning and deep learning techniques: A survey. J. Electr. Syst. Inf. Technol.10(1), 40. 10.1186/s43067-023-00108-y (2023). [Google Scholar]
- 13.Khan, M. M. & Alkhathami, M. Anomaly detection in iot-based healthcare: Machine learning for enhanced security. Sci. Rep.14(1), 5872. 10.1038/s41598-024-56126-x (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Pang, T., Li, P. & Zhao, L. A survey on automatic generation of medical imaging reports based on deep learning. BioMed. Eng. OnLine22(1), 48. 10.1186/s12938-023-01113-y (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Zhang, Y. et al. Machine learning-based medical imaging diagnosis in patients with temporomandibular disorders: A diagnostic test accuracy systematic review and meta-analysis. Clin. Oral Investig.28(3), 186. 10.1007/s00784-024-05586-6 (2024). [DOI] [PubMed] [Google Scholar]
- 16.Peng, Y. & Deng, H. Medical image fusion based on machine learning for health diagnosis and monitoring of colorectal cancer. BMC Med. Imaging24(1), 24. 10.1186/s12880-024-01207-6 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.El-Shafai, W., Aly, R., Taha, T.E.-S. & El-Samie, F. E. A. Cnn: A tool to fuse multi-modality medical images. J. Opt.10.1007/s12596-023-01092-2 (2023). [Google Scholar]
- 18.Zhou, B., Yang, G., Shi, Z. & Ma, S. Natural language processing for smart healthcare. IEEE Rev. Biomed. Eng.17, 4–18. 10.1109/RBME.2022.3210270 (2024). [DOI] [PubMed] [Google Scholar]
- 19.Baker, S. & Xiang, W. Artificial intelligence of things for smarter healthcare: A survey of advancements, challenges, and opportunities. IEEE Commun. Surv. Tutor.25(2), 1261–1293. 10.1109/COMST.2023.3256323 (2023). [Google Scholar]
- 20.Salehi, A. W. et al. A study of cnn and transfer learning in medical imaging: Advantages, challenges, future scope. Sustainability10.3390/su15075930 (2023). [Google Scholar]
- 21.Al-Nbhany, W. A. N. A., Zahary, A. T. & Al-Shargabi, A. A. Blockchain-iot healthcare applications and trends: A review. IEEE Access12, 4178–4212. 10.1109/ACCESS.2023.3349187 (2024). [Google Scholar]
- 22.Andrew, A. et al. Blockchain for healthcare systems: Architecture, security challenges, trends and future directions. J. Netw. Comput. Appl.10.1016/j.jnca.2023.103633 (2023). [Google Scholar]
- 23.Andrew, J. et al. Blockchain for healthcare systems: Architecture, security challenges, trends and future directions. J. Netw. Comput. Appl.215, 103633. 10.1016/j.jnca.2023.103633 (2023). [Google Scholar]
- 24.Abubeker, K. M. & Baskar, S. Lorawan-based artificial intelligence intensive care unit framework for tracking patients with severe pneumonia. IEEE Sens. Lett.7(12), 1–4. 10.1109/LSENS.2023.3328608 (2023).37529707 [Google Scholar]
- 25.Musa, U. et al. Design and implementation of active antennas for iot-based healthcare monitoring system. IEEE Access12, 48453–48471. 10.1109/ACCESS.2024.3384371 (2024). [Google Scholar]
- 26.Jabeen, T. et al. An intelligent healthcare system using iot in wireless sensor network. Sensors10.3390/s23115055 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Sun, Q., Zhang, J., Fang, Z. & Gao, Y. Self-enhanced attention for image captioning. Neural Process. Lett.56(2), 131. 10.1007/s11063-024-11527-x (2024). [Google Scholar]
- 28.Haleem, A., Javaid, M., Singh, R. P., Suman, R. & Rab, S. Blockchain technology applications in healthcare: An overview. Int. J. Intell. Netw.2, 130–139. 10.1016/j.ijin.2021.09.005 (2021). [Google Scholar]
- 29.Chest x-ray analysis empowered with deep learning: A systematic review. Appl. Soft Comput. 126, 109319. 10.1016/j.asoc.2022.109319 (2022) [DOI] [PMC free article] [PubMed]
- 30.Meedeniya, D., De Silva, S., Gamage, L. & Isuranga, U. Skin cancer identification utilizing deep learning: A survey. IET Image Process.18(13), 3731–3749. 10.1049/ipr2.13219 (2024). [Google Scholar]
- 31.Wijethilake, N. et al. Glioma survival analysis empowered with data engineering-a survey. IEEE Access9, 43168–43191. 10.1109/ACCESS.2021.3065965 (2021). [Google Scholar]
- 32.Kumarasinghe, H., Kolonne, S., Fernando, C. & Meedeniya, D. U-net based chest x-ray segmentation with ensemble classification for covid-19 and pneumonia. Int. J. Online Biomed. Eng. (iJOE)18(07), 161–175. 10.3991/ijoe.v18i07.30807 (2022). [Google Scholar]
- 33.Li, F. et al. Dn-detr: Accelerate detr training by introducing query denoising. IEEE Trans. Pattern Anal. Mach. Intell.46(4), 2239–2251. 10.1109/TPAMI.2023.3335410 (2024). [DOI] [PubMed] [Google Scholar]
- 34.Ragab, M. G. et al. A comprehensive systematic review of yolo for medical object detection (2018 to 2023). IEEE Access12, 57815–57836. 10.1109/ACCESS.2024.3386826 (2024). [Google Scholar]
- 35.Chlap, P. et al. A review of medical image data augmentation techniques for deep learning applications. J. Med. Imaging Radiat. Oncol.65(5), 545–563 (2021). [DOI] [PubMed] [Google Scholar]
- 36.Sharma, S. K. & Kumar, A. Digital image transformation using outer totality cellular automata. In Machine Intelligence Techniques for Data Analysis and Signal Processing (eds Sisodia, D. S. et al.) 851–858 (Springer, Singapore, 2023). [Google Scholar]
- 37.Sharma, S. K., Kumar, A. & Singh, U. P. Enhanced edges detection from different color space. In Proceedings of the 4th International Conference on Information Management & Machine Intelligence. ICIMMI ’22. (Association for Computing Machinery, New York, NY, USA, 2023). 10.1145/3590837.3590853 .
- 38.Tan, Y. et al. Medical image description based on multimodal auxiliary signals and transformer. Int. J. Intell. Syst.10.1155/2024/6680546 (2024). [Google Scholar]
- 39.Yang, X. et al. Assessment of lung deformation in patients with idiopathic pulmonary fibrosis with elastic registration technique on pulmonary three-dimensional ultrashort echo time mri. Insights Imaging15(1), 17. 10.1186/s13244-023-01555-x (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Cassidy, B., Kendrick, C., Brodzicki, A., Jaworek-Korjakowska, J. & Yap, M. H. Analysis of the isic image datasets: Usage, benchmarks and recommendations. Med. Image Anal.75, 102305. 10.1016/j.media.2021.102305 (2022). [DOI] [PubMed] [Google Scholar]
- 41.Selivanov, A. et al. Medical image captioning via generative pretrained transformers. Sci. Rep.13(1), 4171. 10.1038/s41598-023-31223-5 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Kong, J.-W., Oh, B.-D., Kim, C. & Kim, Y.-S. Sequential brain ct image captioning based on the pre-trained classifiers and a language model. Appl. Sci.10.3390/app14031193 (2024). [Google Scholar]
- 43.Prior, F. et al. The public cancer radiology imaging collections of the cancer imaging archive. Sci. Data4(1), 170124. 10.1038/sdata.2017.124 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Lin, Y., Lai, K. & Chang, W. Skin medical image captioning using multi-label classification and siamese network. IEEE Access11, 23447–23454. 10.1109/ACCESS.2023.3249462 (2023). [Google Scholar]
- 45.Tsaniya, H., Fatichah, C. & Suciati, N. Automatic radiology report generator using transformer with contrast-based image enhancement. IEEE Access12, 25429–25442. 10.1109/ACCESS.2024.3364373 (2024). [Google Scholar]
- 46.Park, H., Kim, K., Park, S. & Choi, J. Medical image captioning model to convey more details: Methodological comparison of feature difference generation. IEEE Access9, 150560–150568. 10.1109/ACCESS.2021.3124564 (2021). [Google Scholar]
- 47.Hou, W., Cheng, Y., Xu, K., Hu, Y., Li, W. & Liu, J.: Icon: Improving inter-report consistency of radiology report generation via lesion-aware mix-up augmentation. ArXiv abs/2402.12844 (2024)
- 48.Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., Summers, R. M.: Chestx-ray: Hospital-scale chest x-ray database and benchmarks on weakly supervised classification and localization of common thorax diseases. In Deep Learning and Convolutional Neural Networks for Medical Imaging and Clinical Informatics (2019). https://api.semanticscholar.org/CorpusID:8945673
- 49.Elbedwehy, S. & Medhat, T. Improved arabic image captioning model using feature concatenation with pre-trained word embedding. Neural Comput. Appl.35(26), 19051–19067. 10.1007/s00521-023-08744-1 (2023). [Google Scholar]
- 50.Yang, Y. et al. Joint embedding of deep visual and semantic features for medical image report generation. IEEE Trans. Multimed.25, 167–178. 10.1109/TMM.2021.3122542 (2023). [Google Scholar]
- 51.Revathi, B. S. & Kowshalya, A. M. Automatic image captioning system based on augmentation and ranking mechanism. Signal Image Video Process.18(1), 265–274. 10.1007/s11760-023-02725-6 (2024). [Google Scholar]
- 52.Elbedwehy, S. & Medhat, T. Improved arabic image captioning model using feature concatenation with pre-trained word embedding. Neural Comput. Appl.35(26), 19051–19067. 10.1007/s00521-023-08744-1 (2023). [Google Scholar]
- 53.Hicks, S. A. et al. On evaluation metrics for medical applications of artificial intelligence. Sci. Rep.12(1), 5979 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]