. 2021 Dec 27;35:100651. doi: 10.1016/j.suscom.2021.100651

A machine learning and blockchain based secure and cost-effective framework for minor medical consultations

Vikas Hassija a, Rahul Ratnakumar b, Vinay Chamola d,*, Soumya Agarwal a, Aryan Mehra c, Salil S Kanhere e, Huynh Thi Thanh Binh f
PMCID: PMC9551443  PMID: 37521170

Abstract

With growing public awareness of health, visiting a doctor has become common. With the onset of the COVID-19 pandemic, however, home-based consultations are gaining popularity. Nevertheless, privacy concerns and the reluctance of medical professionals to assist patients in the online consultation process have made current models ineffective. In this paper, we present a secure blockchain-based consultation model for minor medical conditions. Our model not only ensures users’ privacy but, by incorporating a calculation model, also gives consulting end-users an opportunity to take part voluntarily in the consultation process. We propose a smart contract based on machine learning that predicts a score for each consulting professional from several prioritized parameters. The question is classified using word2vec and TF-IDF weighting, and cosine similarity scores are used for detail orientation analysis. Based on this score, the patient is charged and, simultaneously, the responder is awarded ether. This incentivized method leads to more accessible healthcare while reducing its cost.

Keywords: Machine learning, NLP, Naive Bayes, Logistic regression, Minor consultations, Ethereum, Blockchain

1. Introduction

Minor ailments can be described as health issues that people can handle on their own. They generally do not require immediate attention and do not constitute an emergency. Some patients have long-term diseases, common ones being high blood pressure, diabetes, liver or heart disease, and cancer. Such patients regularly face abnormalities in their health, leading to minor doubts about their diet, medications, values in reports, etc. These doubts can also be classed as minor ailments, as they often do not require a visit to a doctor’s clinic and can be handled differently. Of the registered minor ailments, 76% of cases are cough, fever, sore throat, upper respiratory tract infection, and earache [1,2].

According to an observational study conducted in Norway, nearly 28% of the consultations made outside usual business hours were spent on minor ailments, representing 18% of the doctors’ total consultation time [1]. Similarly, statistical data from the UK show a 40% increase in general medical consultations since 1995, of which approximately 20% are minor consultations [2]. A questionnaire study conducted at general practice clinics in the UK showed that, according to the doctors, 7% of visits were ‘unnecessary’ and could be handled efficiently by a pharmacist [3]. These unnecessary consultations also frustrate doctors [4]. Moreover, another study reports that ‘Doctors’ handwriting causes 7000 deaths a year’ [5], an example of how a lack of legibility and clarity leads to miscommunication between doctor and patient. Patients are often reluctant to communicate their fears and doubts to the doctor directly because of a lack of time or money and the doctor’s limited availability.

According to a survey by the Economic Times, the WHO recommends a doctor-to-population ratio of 1:1000. In India, the ratio deviates greatly from this: there is one doctor for every 10,189 people [6]. Most developing nations with large populations face the same problem. Their low GDPs limit how much they can spend on the healthcare sector. In 2015, US$ 7.2 trillion, or 10% of global GDP, was spent on health [7]. Developed countries also face scarcity in their healthcare resources. It is a burden when costly healthcare systems are used for minor medical ailments [2]. Therefore, for a healthcare system to be sustainable and economically viable, significant changes are needed to reduce the treatment of minor ailments by general practitioners. Patient demand is ever increasing, so it is essential to use the finite healthcare resources optimally. A few patients who have been consulting a doctor for a long time develop a ‘doctor knows best’ attitude and believe that they have less control over their health [8]. Thus, self-care and self-medication, along with some health literacy, are needed [9,10].

Visiting a doctor for a minor ailment is not always convenient or financially viable for the patient either. Booking an appointment, driving to the clinic, and waiting in queues for hours cost extra time and money. With the increase in psychological and sexual consultations, patients might not always be comfortable discussing their mental and sex-related concerns, which creates gaps in communication. With these rising issues and advancing technology, many turn to the internet for remedies. The internet may not always address the exact concern and may do more harm than good, as no authenticated guidance is available. This again raises the question of whether to visit a clinic for a minor ailment. To answer it, we propose a blockchain-based secure distributed network for consultations on minor ailments.

This proposed consultation model not only ensures users’ privacy, but by incorporating a calculation model, it also offers an opportunity for consulting end-users to take part voluntarily in the consultation process. This work introduces a new smart contract based on machine learning which can be implemented for the prediction of a score of a professional, by using word2vec and TF-IDF weighting for question classification and cosine similarity scores, for detailed orientation analysis [11]. This method leads to more accessible healthcare while reducing the overall cost.

Blockchain has attracted interest in the medical domain because of its main use as a data-sharing platform. Although the emphasis has so far been mainly on the financial services sector, many initiatives in other service-related fields such as healthcare show that this is beginning to change [12], [13], [14]. Blockchain is a decentralized digital ledger that stores and manages the continuously growing record of all transactions and events that take place between the participating parties. Information, once entered, is immutable, and automatic cryptographic checking ensures data consistency [15], [16], [17].

The concept of blockchain was introduced with the launch of Bitcoin (a cryptocurrency) in 2009. Since then, the technology has found applications in many diverse domains such as healthcare, insurance, and IoT-based applications [18], [19], [20].

The next section discusses related work in this field. The detailed model of the consultation system is presented in Section 3. Section 4 discusses the four parameters in detail along with the mathematics used. Analysis and results are discussed in Section 5, and Section 6 concludes the paper.

2. Related work

Recently, much research has surfaced on blockchain-based healthcare services in fields such as managing public education, user-oriented scientific testing, and pharmaceutical counterfeiting [21]. New technologies are always associated with high risk and uncertainty, and their founders have only limited collateral to minimize this risk. Funding new technologies has therefore always been a daunting task [22]. As a result, a new funding system has emerged very recently: ICOs (Initial Coin Offerings), often referred to as token sales or crowd sales [23].

Different blockchain-based healthcare projects that use ICOs for fundraising have been launched. MedRec [24] is a record management system that focuses on smart contract EMRs. It offers patients a detailed, immutable record and convenient access to their medical records across providers and treatment sites, but raises questions about user privacy. Medicalchain [25] is another decentralized platform enabling secure, rapid, and transparent exchange and use of medical information. It generates a user-based electronic medical record while retaining an original standard version of the user’s data. The Medicalchain platform has also been cited as a plausible example in other blockchain-based healthcare papers [26]. HealthCombix, in collaboration with PointNurse, is a virtual health network for delivering direct primary care and care management services. Using "care credits," patients can receive consultations from nurses, physicians, and other healthcare professionals [27].

Paper [28] presents certain aspects of a tele-dermatology proposal called DermoNet, which is built on a decentralized blockchain framework. DermoNet aims to be a valuable platform for establishing a dermatological teleconsulting network and providing general practitioners with dermatological support. Clinico builds a data-sharing network focused on patient clinical trials with the motto ‘collaboration, not a competition’. This project utilizes mobile technologies, human social interactions, and blockchain technology. Paper [29] considers Clinico an interesting use case of blockchain in the healthcare sector.

Paper [30] proposes a Lightweight Face Anti-Spoofing Network for Telehealth Applications that allows doctors and patients to schedule consultations and share medical information through secure, face-recognition-based user authentication. Paper [31] proposes OpenHealthQ, an OpenFlow-based traffic-shaping model that uses OpenFlow queues to handle healthcare data. It provides on-demand, secure, and low-cost access to Healthcare 4.0 using cloud infrastructure with SDN-based fog nodes at the network’s edge, thus reducing the response time of the end host while increasing throughput. Paper [32] advocates a soft systems methodology (SSM) approach to Digital-first Primary Care (DfPC); it highlights the complexity of the National Health Service (NHS) and the pros and cons of tele-consultations, and presents findings from pilot interviews with general practitioners. Paper [5] explores AI-assisted diagnosis systems and how medical practitioners use AI medical assistant systems for diagnosis. It also explores the current problems faced by IoT medical consultation services and introduces a business operation model (BOM) based on participation and sharing of resources. Its CNN optimization algorithm improved the accuracy of medical consultation prediction to 90.15 percent, which is reported to be better than other machine learning algorithms. Paper [33] uses Health 2.0 technology to understand the context of the medical decision-making process by modeling and analyzing patient- and physician-generated information with a combined CNN-RNN architecture. It introduces a DP-CRNN algorithm to analyze the combination of semantic and sequential features in patients’ queries [34], and proposes an intelligent recommendation method that provides patients with automatic pre-diagnosis suggestions to further refine the learning process [35]. In [36], an automatic Mask R-CNN based diagnosis model was developed for different dental diseases; the diagnosis accuracy improved to 90 percent and the number of treated patients increased by 18.4 percent. Work [37] designed a model referred to as the national health stack (NHS), a multilayered knowledge management system (KMS) proposed to support evidence-based public health decisions. Current online consultation systems lack motivation on the expert’s side and have no proper fare calculation measures. A detailed design of the prototype, explaining the process steps of our fare charging for medical consultation, is discussed in Section 3.

Our proposed model is a machine learning and blockchain based smart contract, ensuring users’ privacy. By using word2vec, TF-IDF question weightage, classification and cosine similarity scores in its calculation model, it improves the accuracy of analysis and delivers an economic system. Moreover, it offers an opportunity for consulting end-users to take part voluntarily in the consultation process.

3. System model

In this section, we outline the conceptual framework of our proposed model in action. The basic workflow is represented in Fig. 1. The step-wise implementation is described below:

    • 1) Every individual who wishes to join the network will have a blockchain account [38]. They should belong to one of the following categories: patient, doctor, or other expert, such as a physician or general chemist.
      • a) A patient’s account must provide details such as name, age, and history of allergies, diseases, and ongoing medication. The first two are mandatory on signing up.
      • b) A doctor is required to provide his/her name, educational degrees and field of specialty, postgraduate college name, and years of experience. A link to his/her profile can also be included if appropriate.
      • c) Other experts are expected to provide their information, including name, advanced degrees, name of college, and evidence of medical expertise. The evidence can take the form of linked articles, journals, or other appropriate material.

Fig. 1. System model for the proposed framework.

Now all the participants are included in the network.

    • 2) The smart contract deployed on the blockchain network verifies the information provided by the various participants [39]. The Elliptic Curve Digital Signature Algorithm (ECDSA) was first proposed by Vanstone as a cryptographic algorithm [40]. We propose the use of this protocol for authentication because of its sturdy mathematical model. It also provides high protection compared with other systems and assures the unforgeability and non-repudiation of digital data. Although both elliptic-curve cryptography (ECC) and discrete logarithm (DL) protocols give satisfactory levels of protection, ECC uses smaller parameters than DL [41].

    • 3) The patients post their queries on the blockchain network [42]. The responders then answer to the best of their knowledge. A collection of sample question-answer pairs is shown in Table 1. The verified dataset formed from the different responders’ responses is used for the following steps, which maintain an effective and fair system:
      • a) Rating the participants according to their answers.
      • b) Calculating the reward payable to the respondents.

    • 4) For this evaluation, we consider four parameters:
      • a) Reputation: based on professional qualifications, endorsements, and acceptance of previous answers (static scores).
      • b) Expertise in that area: based on the topics of his/her previously answered questions. The classification is done using word2vec and TF-IDF weightings, followed by a logistic regression classifier.
      • c) Supporting document: extra points where the answer refers to a supporting text (static scores).
      • d) Detail orientation: a measure of detail in the answer, judged by cosine similarity-based methods.

Table 1. An example of a dataset formed from different patient queries and the corresponding responses from responders.

| Question | Answer | Category | Responder | Endorsed by | Time taken to respond | Supporting document provided | Type of supporting document |
|---|---|---|---|---|---|---|---|
| Can a dialysis patient take banana | No | Very Precise | MBBS | 98 | 1 Hr | No | NA |
| Can a dialysis patient take banana | No as it has high potassium and that is not easily released during dialysis | Precise | Nephrologist | 105, 138 | 15 Min | Not required | NA |
| Can a dialysis patient take banana | Should be avoided generally but occasionally half banana can be taken 6 hr before dialysis | Moderate | Pharmacist | 115 | 10 Min | Yes | Research article |
| Can a dialysis patient take banana | No high potassium diet is recommended such as banana, coconut, mango and so on. | Extra details | Dietician | 98 | 1.5 Hr | Not required | NA |
| Can a dialysis patient take banana | Depends on the potassium level. If the potassium level is below 5, occasionally taking a banana few hours before dialysis is not risky. | Relevant Details | Technician | 96, 105, 155 | 10 Min | Yes | Medical book / Blog |

The following section discusses the proposed algorithms for implementing these 4 parameters and the mathematics behind their determination in detail.

4. Calculation of model parameters

The prioritized parameters on which a responder would be judged and therefore ranked are discussed in detail below. For keeping the judging parameters fair and transparent, importance will be given to both the score which he/she previously holds along with the quality and response time of the current answer. Moreover, flexible priority has been assigned to the individual scores based on the time, place, condition, government regulations etc. using a weighted calculation of Final Score as explained in Section 4, Eq. 17.

A. Reputation

“Today’s Web is the product of over a billion hands and minds. Around the clock and around the globe, a world full of people are pumping out contributions small and large,” as stated by Randy Farmer in his book on Web reputation systems, which are shaping the offline world as well. Modern reputations in the media and online are strongly embedded in pre-Internet social networks. Reputation can generally be described as information used to judge the value of an object or individual. It is a rough indicator of how much you are trusted by your fellow community members.

The following steps are followed to enable the reputation metric:

  • A credit ranking system is established for all applicable professional credentials based on general degree ranking criteria. The scores are chosen with the utmost care so that no occupation is belittled or offended. This score also serves as a minimum threshold for every person. Table 2 gives the predefined reputation scores of all related professionals, grouped by their qualifications.

Table 2. Buckets of scores.

| Range of score of the endorsing person | Value gained |
|---|---|
| 150+ | 10 |
| 130−150 | 9 |
| 100−130 | 8 |
| 90−100 | 7 |

Let $x$ be the participant in the network who responds to a particular query. His/her answer is endorsed by other participants, say $Y_1, Y_2, Y_3, \ldots, Y_n$.

$Q(x)$ is the score which is pre-defined according to the individual’s qualifications.

  • The primary way to improve the reputation score more than the static value is by posting useful answers. Endorsements on these posts help in increasing credibility. With a relatively simple model, each person will support the other fellow participant. An individual who has been supported by a more respectable person would earn points of high credibility. Likewise, a person who is endorsed by someone with a lower reputation score will get comparatively fewer points. Specific reputation scoring buckets are created, with each bucket having a value (Table 2). This value is added to the current reputation of the person who has been endorsed by the other belonging to the bucket in question.

$E(Y_i)$ is the value assigned to the bucket containing the reputation score of $Y_i$.

It is important to take the time taken to answer into consideration. Bonus points are granted to a respondent who replies quickly, and the bonus decreases as time passes. Further checkpoints keep a check on the quality and effectiveness of the answer. Even if an individual answers first, points are deducted if the consultation is of no use. To maintain a balance between quality and speed, we keep the response time rewards small. This encourages people to give thorough, good-quality answers rather than merely responding within the time threshold to gain ranking.

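$$t = \begin{cases} +2, & \text{response time less than 30 minutes} \\ +1, & \text{response time less than 1 hour} \\ 0, & \text{otherwise} \end{cases}$$

$$R(x) = Q(x) + \sum_{i=1}^{n} E(Y_i) + t \tag{1}$$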
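As a minimal sketch of Eq. (1), the reputation score can be computed from the Table 2 buckets and the response time bonus. The function names are illustrative, the handling of scores on a shared bucket boundary and of endorsers below 90 is an assumption (Table 2 does not specify either), and the qualification score of 100 is hypothetical.

```python
from typing import Iterable

# Table 2 buckets as (lower bound, value gained), checked from the highest down.
# Assumption: a score on a shared boundary (e.g. 130) falls into the higher
# bucket, and endorsers scoring below 90 contribute nothing.
ENDORSEMENT_BUCKETS = [(150, 10), (130, 9), (100, 8), (90, 7)]


def endorsement_value(endorser_score: float) -> int:
    """E(Y_i): value of the bucket containing the endorser's reputation score."""
    for lower_bound, value in ENDORSEMENT_BUCKETS:
        if endorser_score >= lower_bound:
            return value
    return 0


def time_bonus(response_minutes: float) -> int:
    """t: +2 under 30 minutes, +1 under one hour, 0 otherwise."""
    if response_minutes < 30:
        return 2
    if response_minutes < 60:
        return 1
    return 0


def reputation(qualification_score: float,
               endorser_scores: Iterable[float],
               response_minutes: float) -> float:
    """R(x) = Q(x) + sum_i E(Y_i) + t, as in Eq. (1)."""
    endorsements = sum(endorsement_value(s) for s in endorser_scores)
    return qualification_score + endorsements + time_bonus(response_minutes)


# Nephrologist row of Table 1: endorsed by scores 105 and 138, answered in 15 min.
# Q(x) = 100 is a hypothetical qualification score chosen for illustration.
print(reputation(100, [105, 138], 15))  # 100 + (8 + 9) + 2 = 119
```

Keeping the buckets in a single ordered list makes the boundary convention explicit and easy to change if the intended rule differs.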

B. Expertise in that area

To enable this metric, question classification has been done. Each question asked by the patient is categorized and mapped to a professional who has specialization in that particular area. If the responder has that particular specialization, then he is given +5 points for his trusted consultation. No added points are given otherwise. The method applied for this classification is discussed below.

  • 1) Question Collection: We use the Healthtap dataset, which consists of 1,048,575 question-answer pairs [43]. Recent research has shown this dataset to be effective for question-answer models [44], which inspired its use in this work. Each question is categorized by the subject it mainly deals with. There are 226 unique categories, and each category is mapped to the doctor who specializes in it. The top 5 categories are listed in Table 3.

  • 2) Pre-processing: Preparing unstructured data is strongly recommended in any approach that uses machine learning [45]. It reduces unnecessary, duplicated, meaningless, and noisy data [46]. Categories with a total count less than or equal to 1000 (less than 1% of the total dataset) are removed, and all words are converted to lowercase.

  • 3) Feature Engineering: Once pre-processing has been applied, the questions are ready for feature extraction, a process in which raw text data is transformed into feature vectors, creating new meaningful features from the existing raw dataset.

Table 3. Ailment categories mapped to their specialized personnel.

| Category | Count | Specialization needed |
|---|---|---|
| drugs | 120,024 | MS |
| teeth | 75,167 | Dentist |
| pregnancy | 62,875 | Obstetrician / Gynecologist |
| menstrual cycle | 59,676 | Obstetrician / Gynecologist |
| unprotected sex | 47,923 | Obstetrician / Gynecologist |

These features have to be understandable to machine learning classifiers. To obtain relevant features from our dataset, we have implemented the following ideas.

    • a) Count Vectors: A count vector is a matrix representation of the dataset that is used as a feature. Each row represents a document in the corpus, each column represents a term in the corpus, and each cell holds the frequency of a particular term in a particular document.
    • b) TF-IDF vectorization: TF-IDF stands for Term Frequency-Inverse Document Frequency. It differs from count vectorization in that it takes into account the occurrence of a word not only in a single document but in the whole corpus. It therefore represents the relative importance of a term in the document and in the entire corpus.

The number of occurrences of a word, normalized by the document’s size, is called the term frequency.

$$TF(w) = \frac{\text{count of } w \text{ appearing in document } d}{\text{total number of terms in the document}} \tag{2}$$

While computing term frequency, each term is considered equally important and allowed to participate in representing the vectors. However, certain words are so prevalent across documents that they contribute very little to determining significance. The term frequencies of such words, for instance ‘the’, ‘a’, ‘in’, and ‘of’, may overwhelm the weights of significant words. To counteract this effect, the term frequency is scaled down by a factor called the inverse document frequency.

$$IDF(w) = \log_2 \frac{\text{total number of documents}}{\text{number of documents with term } w} \tag{3}$$

TF-IDF: We use a vector representation that assigns a higher value to a term if it occurs often in a specific text and only seldom elsewhere.

The more significant a word is in the document, the greater its TF-IDF score; under Eq. (3), a term that occurs in all documents has a score of 0.

$$TF(w,d) = \frac{o(w,d)}{\sum_{i} o(w_i,d)}, \qquad IDF(w) = 1 + \log\left(\frac{D}{d_w}\right), \qquad TFIDF(w,d) = TF(w,d) \cdot IDF(w) \tag{4}$$

where $o(w,d)$ is the number of times word $w$ appears in document $d$, and the denominator $\sum_i o(w_i,d)$ is the total word count in document $d$. $D$ is the total number of documents in the dataset, and $d_w$ is the number of documents in which word $w$ appears.
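The term frequency and both inverse document frequency variants above can be sketched in a few lines of plain Python. This is an illustration rather than the pipeline used in the study: the toy corpus and function names are made up, and the natural logarithm in the Eq. (4) variant is an assumption because the base is not stated. Note that a term present in every document gets an IDF of 0 under Eq. (3) but an IDF of 1 under Eq. (4).

```python
import math
from collections import Counter
from typing import Callable, List

Document = List[str]


def term_frequency(word: str, document: Document) -> float:
    """TF(w, d): occurrences of w in d divided by the number of terms in d."""
    return Counter(document)[word] / len(document)


def document_frequency(word: str, corpus: List[Document]) -> int:
    """d_w: number of documents in which the word appears."""
    count = sum(1 for doc in corpus if word in doc)
    if count == 0:
        raise ValueError(f"{word!r} does not occur in the corpus")
    return count


def idf_log2(word: str, corpus: List[Document]) -> float:
    """Eq. (3): log2(D / d_w)."""
    return math.log2(len(corpus) / document_frequency(word, corpus))


def idf_plus_one(word: str, corpus: List[Document]) -> float:
    """Eq. (4): 1 + log(D / d_w); natural log is an assumption."""
    return 1 + math.log(len(corpus) / document_frequency(word, corpus))


def tf_idf(word: str, document: Document, corpus: List[Document],
           idf: Callable[[str, List[Document]], float] = idf_plus_one) -> float:
    """TFIDF(w, d) = TF(w, d) * IDF(w)."""
    return term_frequency(word, document) * idf(word, corpus)


# Toy corpus of tokenized answers (illustrative only).
corpus = [
    ["can", "dialysis", "patient", "take", "banana"],
    ["banana", "has", "high", "potassium"],
    ["avoid", "banana", "before", "dialysis"],
]
print(idf_log2("banana", corpus))              # appears everywhere: 0.0
print(tf_idf("potassium", corpus[1], corpus))  # rare term gets a higher weight
```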

TF-IDF vectors are formed at different levels of input tokens (words, n-grams).

      • i) Word Level TF-IDF: A 2D array holding the TF-IDF scores of each term in the various documents.
      • ii) N-gram Level TF-IDF: An n-gram is a sequence of n consecutive words, where ‘n’ determines how many words there are in one gram; uni-grams are the case n = 1. This 2D array holds the TF-IDF scores of the n-grams.
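To make the two token levels concrete, the following sketch extracts word-level tokens (n = 1) and bi-grams (n = 2) from one question. The helper name `ngrams` is illustrative; the resulting n-grams would then be scored with TF-IDF in the same way as single words.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Return every run of n consecutive tokens, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


question = "can a dialysis patient take banana".split()

unigram_counts = Counter(ngrams(question, 1))  # word-level features
bigram_counts = Counter(ngrams(question, 2))   # n-gram-level features, n = 2

print(sorted(bigram_counts))
```

A question shorter than n words simply yields no n-grams, which is one reason n-gram features tend to be sparser than word-level ones on short questions.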

Word Embedding: A method of text representation in which words with similar meanings are represented similarly, in the form of dense vectors. It can capture the context of a word in a document, semantic and syntactic similarity, relationships with other words, etc. Word embeddings can be trained on the input corpus itself. Alternatively, pre-trained word embeddings such as FastText, Word2Vec, and GloVe can be used. FastText is a lightweight open-source library for fast learning of text representations and classifiers. It works on standard, generic hardware, and its models can be reduced in size to fit on portable devices. Word2Vec is a widely employed model that embeds words in a lower-dimensional vector space using a shallow neural network. Word vectors that lie close together in the vector space have similar contextual meanings, whereas distant word vectors have different contextual meanings. The benefit of these models is that they can exploit features from large corpora of billions of words, capturing word meanings in a statistically robust manner.

This study uses pre-trained Word2Vec vectors, which were trained on about one hundred billion tokens of the Google News dataset. The architecture of Word2Vec is a feed-forward neural network with just one hidden layer. We have used the skip-gram model of the word2vec algorithm.

The meaning of a term may be determined from the companion words it has [47,48].

  • 4) Classification: Finally, we train the classifier using the features created in the previous step. We employ two of the most prevalent supervised classification algorithms in machine learning: the Naive Bayes classifier and Logistic Regression (LR).
    • a) Naive Bayes: A probability-based machine learning model whose core is Bayes’ theorem. Consider a feature vector $d = (v_1, v_2, \ldots, v_n)$, in which each feature represents a term of $d$ that is part of the vocabulary used for classification, and a class variable $C_k$ (total classes = $K$). Bayes’ theorem states that:

$$P(C_k \mid d) = \frac{P(d \mid C_k)\,P(C_k)}{P(d)}, \quad \text{for } k = 1, 2, \ldots, K \tag{5}$$

$P(C_k \mid d)$ is the posterior probability, i.e. the probability of document $d$ being in class $C_k$; $P(d \mid C_k)$ is the likelihood; $P(C_k)$ is the prior probability of a document occurring in class $C_k$; and $P(d)$ is the prior probability of the predictor, i.e. the document.

The following equation gives the most likely class, the maximum a posteriori (MAP) class $\hat{C}$:

$$\hat{C} = \underset{C_k}{\arg\max}\; P(C_k) \prod_{i=1}^{n} P(v_i \mid C_k) \tag{6}$$

We use the Gaussian Naive Bayes classifier. When working with continuous data, a common assumption is that the continuous values associated with each class follow a normal (Gaussian) distribution [49,50]. Because the feature values are continuous, the conditional probability takes the form of Eq. 7, where $\mu_{C_k}$ and $\sigma_{C_k}^2$ are the mean and variance of the feature values in class $C_k$.

$$P(v_i \mid C_k) = \frac{1}{\sqrt{2\pi\sigma_{C_k}^2}} \exp\left(-\frac{(v_i - \mu_{C_k})^2}{2\sigma_{C_k}^2}\right) \tag{7}$$
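Equations (5) to (7) can be illustrated with a small Gaussian Naive Bayes classifier written in plain Python. This is a sketch under stated assumptions, not the implementation used in the study: the two-dimensional feature vectors and class labels are toy values, the scores are computed in log space for numerical stability, and a small constant is added to each variance to keep it positive.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Model = Dict[str, Tuple[float, List[float], List[float]]]


def fit_gaussian_nb(X: List[List[float]], y: List[str]) -> Model:
    """Estimate the prior and the per-feature mean and variance of each class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model: Model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Assumption: a tiny constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model


def log_gaussian(v: float, mean: float, var: float) -> float:
    """Logarithm of the Gaussian likelihood in Eq. (7)."""
    return -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)


def predict(model: Model, features: List[float]) -> str:
    """Eq. (6) in log space: argmax_k log P(C_k) + sum_i log P(v_i | C_k)."""
    def score(label: str) -> float:
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            log_gaussian(v, m, s) for v, m, s in zip(features, means, variances))
    return max(model, key=score)


# Toy continuous features (e.g. two TF-IDF weights per question).
X = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
y = ["teeth", "teeth", "pregnancy", "pregnancy"]
model = fit_gaussian_nb(X, y)
print(predict(model, [0.95, 0.15]), predict(model, [0.15, 0.9]))
```

The denominator $P(d)$ of Eq. (5) is the same for every class, so it can be dropped when only the most likely class is needed.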
    • b) Logistic Regression: It is one of the most widely recognized machine learning algorithms after linear regression. It has been used in many text classification applications and delivers strong results [51]. Logistic regression models the relationship between a categorical dependent variable and one or more independent variables, using the logistic (sigmoid) function to estimate probabilities. The sigmoid is a mathematical function with a distinctive S-shaped curve; Eq. 8 gives it mathematically.

$$S(x) = \frac{1}{1 + e^{-x}} = \frac{e^x}{e^x + 1} \tag{8}$$

Algorithm 1. Training Naive Bayes for Text Classification.

Following an approach similar to that of B. Pang in his paper on text classification, let the vocabulary contain $n$ words $\{w_1, w_2, w_3, \ldots, w_n\}$ [52]. Each document is represented by a sparse binary array that indicates whether or not each word $w_i$ occurs in the text. Each training sample is represented as $(d_i, c_i)$, where $d_i$ is the sparse array of the document and $c_i$ is its class. $K$ is the total number of classes and $m$ is the total number of training samples. Given a document $d$, the model first computes a score $s_c(d)$ for each class $c$ ($c \in \{1, \ldots, K\}$), then estimates the probability of each class by applying the softmax function to the scores.

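$$s_c(d) = \theta_c^T \cdot d \tag{9}$$

The probability with which a document $d$ belongs to class $c$ is given as:

$$\hat{p}_c = \sigma(s(d))_c = \frac{\exp(s_c(d))}{\sum_{j=1}^{K} \exp(s_j(d))} \tag{10}$$

$\sigma(s(d))_c$ is the estimated probability that the instance $d$ belongs to class $c$, given the scores of each class for that instance.

$$\hat{y} = \underset{c}{\arg\max}\; \sigma(s(d))_c = \underset{c}{\arg\max}\; s_c(d) = \underset{c}{\arg\max}\; (\theta_c^T \cdot d) \tag{11}$$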

The goal is a model that estimates a high probability for the target class (and thus a low probability for the other classes).

For this reason, we use the following cost function in our multinomial logistic regression model. The objective is to find the parameters $\Theta$ that minimize it:

$$J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{c=1}^{K} y_c^{(i)} \log\left(\hat{p}_c^{(i)}\right) \tag{12}$$

where $y_c^{(i)}$ is 1 if the $i$-th sample belongs to class $c$ and 0 otherwise. There is no known closed-form solution for the parameters that minimize this cost function. We therefore employ an iterative algorithm such as gradient descent, which requires the partial derivative of the cost function with respect to $\theta_c$:

$$\nabla_{\theta_c} J(\Theta) = \frac{1}{m} \sum_{i=1}^{m} \left(\hat{p}_c^{(i)} - y_c^{(i)}\right) d^{(i)} \tag{13}$$

Algorithm 2. Training Logistic Regression for Text Classification.
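Equations (9) to (13) can be sketched as a small multinomial logistic regression trained by batch gradient descent. This is not a reproduction of Algorithm 2: the four-word vocabulary, the binary document vectors, the learning rate, and the number of epochs are all illustrative assumptions.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Eq. (10): turn per-class scores into probabilities."""
    peak = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(theta: List[List[float]], d: List[float]) -> List[float]:
    """Eq. (9) then Eq. (10): s_c(d) = theta_c . d for every class c."""
    return softmax([sum(w * x for w, x in zip(theta_c, d)) for theta_c in theta])


def predict(theta: List[List[float]], d: List[float]) -> int:
    """Eq. (11): the class with the highest estimated probability."""
    probs = predict_proba(theta, d)
    return max(range(len(probs)), key=probs.__getitem__)


def cross_entropy(theta, docs, labels) -> float:
    """Eq. (12): mean negative log-probability of the true class."""
    return -sum(math.log(predict_proba(theta, d)[c])
                for d, c in zip(docs, labels)) / len(docs)


def train(docs, labels, num_classes, lr=0.5, epochs=200):
    """Batch gradient descent using the gradient of Eq. (13)."""
    m, n = len(docs), len(docs[0])
    theta = [[0.0] * n for _ in range(num_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n for _ in range(num_classes)]
        for d, label in zip(docs, labels):
            probs = predict_proba(theta, d)
            for c in range(num_classes):
                error = probs[c] - (1.0 if c == label else 0.0)
                for j in range(n):
                    grad[c][j] += error * d[j] / m
        for c in range(num_classes):
            for j in range(n):
                theta[c][j] -= lr * grad[c][j]
    return theta


# Binary word-presence vectors over a toy vocabulary:
# ["tooth", "gum", "pregnant", "trimester"]; class 0 = teeth, class 1 = pregnancy.
docs = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
labels = [0, 0, 1, 1]
theta = train(docs, labels, num_classes=2)
print([predict(theta, d) for d in docs], cross_entropy(theta, docs, labels))
```

With all parameters initialized to zero, every class starts with the same probability, so the initial cost equals the logarithm of the number of classes and decreases as training proceeds.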

5) Evaluation Parameters - The precision, recall, and F1-measure are calculated to measure the effectiveness of the proposed model.
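For reference, precision, recall, and F1 for one category can be computed from true and predicted labels as sketched below. The label lists are toy values, and how the per-category scores were averaged in this study is not stated, so the sketch treats a single category as the positive class. The last lines check that the F1 values reported in Tables 4 and 5 follow from the reported precision and recall.

```python
from typing import List, Tuple


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(true_labels: List[str], predicted_labels: List[str],
                        positive: str) -> Tuple[float, float, float]:
    """Scores for one category treated as the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


# Toy labels (illustrative only).
true_labels = ["teeth", "teeth", "drugs", "drugs", "teeth"]
predicted = ["teeth", "drugs", "drugs", "teeth", "teeth"]
print(precision_recall_f1(true_labels, predicted, positive="teeth"))

# Consistency check against reported rows: Count Vectors with Naive Bayes
# (Table 4) and WordLevel TF-IDF with Logistic Regression (Table 5).
print(round(f1_from(0.738, 0.625), 3), round(f1_from(0.859, 0.833), 3))
```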

6) Result of models

a) Naive Bayes

The scores obtained for the different feature sets are shown in Table 4. WordLevel TF-IDF performs best and N-Gram Vectors perform worst, with a difference of 0.257 between their F1-scores.

Table 4. Scores of Naive Bayes Classifier.

| Parameter | Precision | Recall | F1-Score |
|---|---|---|---|
| Count Vectors | 0.738 | 0.625 | 0.677 |
| WordLevel TF-IDF | 0.773 | 0.746 | 0.759 |
| N-Gram Vectors | 0.573 | 0.446 | 0.502 |

b) Logistic Regression

The scores obtained for the different feature sets are shown in Table 5. WordLevel TF-IDF performs best and N-Gram Vectors perform worst, with a difference of 0.084 between their F1-scores, which is much smaller than for the Naive Bayes approach.

Table 5.

Scores of Logistic Regression Classifier.

| Parameter | Precision | Recall | F1-Score |
|---|---|---|---|
| Count Vectors | 0.814 | 0.746 | 0.778 |
| WordLevel TF-IDF | 0.859 | 0.833 | 0.846 |
| N-Gram Vectors | 0.757 | 0.768 | 0.762 |

We conclude that Logistic Regression with the Softmax activation function outperforms the Naive Bayes model, with an average F1-Score difference of about 0.15. Let EX(x) be the score awarded for the expertise of a professional according to the predicted category of the question.

$$EX(x) = \begin{cases} +5, & \text{F1-score of the model} \geq 0.7 \\ 0, & \text{otherwise} \end{cases}$$

C. Supporting Document

Extra points are awarded when a supporting document is referred to in the response. The number of points depends on the type of supporting document: a peer-reviewed publication is given more value than a blog post. Let SD(x) be the score awarded for the availability of a supporting document.

$$SD(x) = \begin{cases} +5, & \text{peer-reviewed publication} \\ +2, & \text{blog post} \\ 0, & \text{otherwise} \end{cases}$$
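The two point rules EX(x) and SD(x) are simple lookups and could be sketched as follows. The function names and the string labels for document types are hypothetical, and treating the 0.7 threshold as inclusive is an assumption of this sketch.

```python
def expertise_points(f1_score, threshold=0.7):
    """EX(x): +5 when the classifier's F1-score reaches the threshold, otherwise 0.
    Using >= rather than > is an assumption of this sketch."""
    return 5 if f1_score >= threshold else 0

# SD(x) lookup table; the keys are hypothetical labels for the document types.
SUPPORTING_DOCUMENT_POINTS = {"peer_reviewed": 5, "blog_post": 2}

def supporting_document_points(doc_type):
    """SD(x): points for the type of supporting document, 0 if none or unknown."""
    return SUPPORTING_DOCUMENT_POINTS.get(doc_type, 0)

ex = expertise_points(0.846)               # best F1-score reported in Table 5
sd = supporting_document_points("blog_post")
```

Keeping the point values in a dictionary makes it straightforward to add further document types without changing the scoring function.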

D. A Measure of Detail Orientation

To evaluate this parameter, all the responses to a particular question are used to create a bag-of-words model. In this model, any text, such as a sentence or a document, is described as the bag of its words: the order in which the words appear is not taken into account, but the multiplicity of each word is retained. For a classifier, the frequency of every word is used as a training feature.

Let us assume five answers (taken from the example dataset in Table 1): A1, A2, A3, A4, A5.

To score each answer, the union of the bags of words of all the answers is taken to form the cumulative bag of words:

$$BoW_{doc} = BoW_1 \cup BoW_2 \cup BoW_3 \cup BoW_4 \cup BoW_5 \qquad (14)$$
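A minimal sketch of building per-answer bags of words and combining them as in Eq. (14) is given below. The two answer strings are hypothetical stand-ins for the answers in Table 1, the tokenizer is deliberately naive, and combining the bags by summing word counts is one assumed reading of "cumulative".

```python
from collections import Counter

def bag_of_words(text):
    """Lower-case the text, split on whitespace, strip punctuation, and count each word."""
    tokens = (tok.strip(".,;:!?()\"'").lower() for tok in text.split())
    return Counter(tok for tok in tokens if tok)

# Hypothetical answers, used only for illustration.
answers = [
    "Bananas are high in potassium.",
    "Avoid potassium before dialysis.",
]

bags = [bag_of_words(a) for a in answers]

# Cumulative bag of words: word counts are added across all answers.
bow_doc = sum(bags, Counter())
```

Word order is discarded but multiplicity is kept, so a word that appears in several answers receives a higher count in the cumulative bag.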

Stop-word removal is essential to discard words that carry syntactic rather than semantic information. Therefore, we remove the stop words using the NLTK library and extract the keywords to use as features.

The keywords obtained after applying Gensim text summarization are:

'banana', 'potassium', 'dialysis'

Since we have more than one keyword, we apply cosine similarity between the keywords and the responses. A common way to find the relevance or similarity between two documents is to count their most frequent words and compare them. This method is flawed because it performs poorly as the size of the documents increases [53]. To overcome this, we use cosine similarity to determine how close the documents are regardless of their size. Mathematically, it measures the cosine of the angle between two vectors projected in a multidimensional space.

In information retrieval, because word frequencies cannot be negative, the cosine similarity of two documents ranges from 0 to 1, and the angle between two word-frequency vectors cannot be greater than 90° [54]. The cosine of the angle between two non-zero vectors can be derived from the Euclidean dot product formula:

$$X \cdot Y = \|X\| \times \|Y\| \cos\theta \qquad (15)$$

For two attribute vectors X and Y, the cosine similarity cos(θ) is given by:

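$$\text{Similarity} = \cos\theta = \frac{X \cdot Y}{\|X\| \times \|Y\|} = \frac{\sum_{i=1}^{n} X_i Y_i}{\sqrt{\sum_{i=1}^{n} X_i^2}\,\sqrt{\sum_{i=1}^{n} Y_i^2}} \qquad (16)$$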
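Eq. (16) can be sketched directly on sparse word-count vectors. In the example below the keyword vector uses the three keywords listed above, while the answer vector is a hypothetical response chosen only for illustration.

```python
import math
from collections import Counter

def cosine_similarity(x, y):
    """Eq. (16) for two sparse term-frequency vectors stored as Counters."""
    dot = sum(x[t] * y[t] for t in x.keys() & y.keys())
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    norm_y = math.sqrt(sum(v * v for v in y.values()))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # an empty vector has no direction; treat it as unrelated
    return dot / (norm_x * norm_y)

keywords = Counter(["banana", "potassium", "dialysis"])
answer = Counter(["banana", "potassium", "high"])  # hypothetical answer terms

sim = cosine_similarity(keywords, answer)
```

Two of the three terms are shared and both vectors have norm √3, so the similarity here is 2/3. Since all counts are non-negative, the value always lies between 0 and 1.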

A threshold of 30% similarity is maintained. Any answer below this threshold is not considered for this ranking parameter. The answer with the highest cosine similarity percentage is given +10 points, and the next three most relevant answers (if they pass the threshold) are given +7 points each. Other answers passing the threshold are awarded +5 points.

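$$CS(x) = \begin{cases} +10, & \text{answer with the highest cosine similarity} \\ +7, & \text{next three most similar answers} \\ +5, & \text{answers other than the top four} \\ 0, & \text{answers not passing the 30\% threshold} \end{cases}$$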

The cosine similarity calculated for all the statements is shown in Table 6.

Table 6.

Cosine Similarity of Sample Answers.

| Answer | Cosine Similarity (%) |
|---|---|
| A1 | 0.0 |
| A2 | 47.14 |
| A3 | 36.51 |
| A4 | 40.82 |
| A5 | 54.77 |
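Applying the CS(x) rule to the similarities in Table 6 can be sketched as follows. The function name is illustrative, and two details are assumptions of this sketch: an answer exactly at 30% is treated as passing, and ties are broken by input order.

```python
def detail_orientation_points(similarities, threshold=30.0):
    """Map {answer: cosine similarity in percent} to CS(x) points."""
    # Keep only answers that pass the threshold, most similar first.
    ranked = sorted(
        (name for name, s in similarities.items() if s >= threshold),
        key=lambda name: similarities[name],
        reverse=True,
    )
    points = {name: 0 for name in similarities}
    for rank, name in enumerate(ranked):
        if rank == 0:
            points[name] = 10   # highest similarity
        elif rank <= 3:
            points[name] = 7    # next three answers
        else:
            points[name] = 5    # remaining answers above the threshold
    return points

# Cosine similarity percentages from Table 6.
table6 = {"A1": 0.0, "A2": 47.14, "A3": 36.51, "A4": 40.82, "A5": 54.77}
cs_points = detail_orientation_points(table6)
```

For the sample answers, A5 receives +10, A2, A4, and A3 receive +7 each, and A1 receives 0 because it falls below the threshold.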

Finally, each responder is ranked based on the four parameters described above.

$$\text{FinalScore} = w_1 R(x) + w_2 EX(x) + w_3 SD(x) + w_4 CS(x) \qquad (17)$$

where w1 to w4 represent the weights that prioritize the parameters as the model learns over time. Their initial values are unity, and they can be manually adjusted based on time, place, unplanned situations, government regulations, etc.
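Eq. (17) is a weighted sum and can be sketched in a few lines. The component scores passed in the example are hypothetical inputs (in particular the reputation value), and the alternative weight vector only illustrates a manual adjustment.

```python
def final_score(r, ex, sd, cs, weights=(1.0, 1.0, 1.0, 1.0)):
    """Eq. (17): weighted sum of reputation, expertise, supporting-document
    and detail-orientation scores. Weights default to unity."""
    w1, w2, w3, w4 = weights
    return w1 * r + w2 * ex + w3 * sd + w4 * cs

# Hypothetical component scores for one responder.
score_default = final_score(r=5, ex=5, sd=2, cs=7)

# Hypothetical adjustment that doubles the weight of expertise.
score_weighted = final_score(r=5, ex=5, sd=2, cs=7, weights=(1.0, 2.0, 1.0, 1.0))
```

With unity weights the example responder scores 19; doubling the expertise weight raises the score to 24.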

5. Results and analysis

A. Simulation Settings

The dependencies used in building the blockchain framework are Node.js, Ganache, the Truffle Framework, the MetaMask Ethereum Wallet, Mocha, and Chai.

Node.js is the first module used, as it includes the Node Package Manager (npm). The other dependencies required by the framework are installed using npm. The next dependency is a tool that can mimic the behavior of a blockchain. We use Ganache as our personal blockchain and carry out local testing by connecting to the local nodes it provides. The next focus is on the Truffle Framework, which offers a range of tools to build blockchain applications: it allows smart contracts to be created, tests to be written against them, and the contracts to be deployed to the Ethereum network. Next is the installation of the MetaMask Ethereum Wallet, which turns our web browser into a blockchain browser. The Google Chrome MetaMask extension is used to connect to a blockchain. For testing the Ethereum smart contract before deploying it, we use Mocha and Chai.

The Machine Learning models are implemented in Python using various libraries. The Natural Language Toolkit (NLTK) and Gensim were used for pre-processing the raw text data. The pandas, SciPy, and scikit-learn libraries were used for applying Naive Bayes and Logistic Regression. Matplotlib and seaborn were used for data visualization.

B. Performance Evaluation

Parameter-1 (reputation) and parameter-3 (supporting document) have predefined scores, as explained above. For parameter-2 (expertise), the Precision, Recall, and F1-Score obtained for Count Vectors, WordLevel TF-IDF, and N-Gram Vectors with Naive Bayes and Logistic Regression are shown in Tables 4 and 5. The graphical representations of these scores are shown in Fig. 2. For parameter-4 (detail orientation), the cosine similarity obtained for each answer of the sample dataset is given in Table 6.

Fig. 2.

Comparison of the Parameter Evaluation for classifiers.

Fig. 3 shows the trend in the cost of consultations under traditional methods, which varies from around 400 dollars to 600 dollars. This is plotted using Organisation for Economic Co-operation and Development (OECD) statistics on the average number of doctor consultations per inhabitant. This data has also been used to show health expenditures in various studies [55]. The number of visits was multiplied by the average cost of each visit. Our proposed model reduces this cost of consultation considerably, to a range of 50 dollars to 110 dollars over time.

To prevent the same user from submitting repeated requests from different Ethereum accounts, a unique identifier of the patient, such as a Social Security number, PAN, or Aadhaar number, can be saved and cross-checked against the database on every request. Moreover, as the consultation is a paid one, the gain achieved by such practices will be negligible.

Fig. 3.

Comparative analysis of Cost of Consultation over the years.

Fig. 4 compares traditional consultations, current online platforms, and our proposed method on three parameters: cost of consultation, response time, and consulting time. Response time is the waiting time in the clinic for traditional methods and the time taken by the professional to respond for the online and proposed methods. Consulting time is the travel time to the doctor's clinic for traditional methods and the average time taken to log in or sign up on the online or blockchain portal. The information on the average cost of current online platforms is taken from [56]. The average waiting time for a response is taken from the study of VSEE telemedicine services [57].

Fig. 4.

Comparative analysis of various Consultation methods.

6. Conclusion

In this paper, we propose a blockchain-based framework for minor medical consultations, motivated by the lack of incentives and the absence of an efficient charging system in current traditional and online consultation models. A medical professional can earn ether by joining our blockchain network and sharing a consultation in response to a patient's query. We consider several prioritized parameters when deciding the score earned by each professional for a consultation: reputation, expertise in the relevant area, supporting documents, and detail orientation. This also helps the patient receive the best consultation. We compared Logistic Regression and Naive Bayes classifiers for classifying the question, with Logistic Regression achieving the higher F1-Scores, and used cosine similarity to measure detail orientation. The added incentive leads to greater participation from professionals, leading to prompt and effective results.

Data availability

Data will be made available on request.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

All persons who have made substantial contributions to the work reported in the manuscript (e.g., technical help, writing and editing assistance, general support), but who do not meet the criteria for authorship, are named in the Acknowledgements and have given us their written permission to be named. If we have not included an Acknowledgements, then that indicates that we have not received substantial contributions from non-authors.

This work was supported by the ASEAN-India Collaborative R&D scheme (sponsored by the ASEAN-India S&T Development Fund (AISTDF)), received by Dr. Vinay Chamola and Prof. Huynh Thi Thanh Binh under Project Grant File CRD/2020/000369.

References

  • 1. Welle-Nilsen Lina Kristin, Morken Tone, Hunskaar Steinar, Granas Anne Gerd. Minor ailments in out-of-hours primary care: an observational study. Scand. J. Prim. Health Care. 2011;29(1):39–44. doi: 10.3109/02813432.2010.545209.
  • 2. Fielding S., Porteous T., Ferguson J., Maskrey V., Blyth A., Paudyal V., Barton G., Holland R., Bond C.M., Watson M.C. Estimating the burden of minor ailment consultations in general practices and emergency departments through retrospective review of routine data in North East Scotland. Fam. Pract. 2015;32(2):165–172. doi: 10.1093/fampra/cmv003.
  • 3. Hammond T., Clatworthy J., Horne R. Patients’ use of GPs and community pharmacists in minor illness: a cross-sectional questionnaire-based study. Fam. Pract. 2004;21(2):146–149. doi: 10.1093/fampra/cmh207.
  • 4. Morris C., Cantrill J., Weiss M. GPs’ attitudes to minor ailments. Fam. Pract. 2001;18(6). doi: 10.1093/fampra/18.6.581.
  • 5. Rahaman S., Mohammed S., Manchanda T., Mahadik R. e-pharm assist: the future approach for dispensing medicines in smart cities. 2019 International Conference on Digitization (ICD). 2019:263–267.
  • 6. India Facing Shortage of 600,000 Doctors, 2 Million Nurses: Study. April 2019. https://economictimes.indiatimes.com/industry/healthcare/biotech/healthcare/india-facing-shortage-of-600000-doctors-2-million-nurses-study/articleshow/68875822.cms
  • 7. WHO, https://www.who.int/gho/health_financing/health_expenditure/en/.
  • 8. Cardol M., Schellevis F.G., Spreeuwenberg P., van de Lisdonk E.H. Changes in patients’ attitudes towards the management of minor ailments. Br. J. Gen. Pract. 2005;55(516):516–521.
  • 9. Bell J., Dziekan G., Pollack C. Self-care in the twenty first century: a vital role for the pharmacist. Adv. Ther. 2016;33:1691–1703. doi: 10.1007/s12325-016-0395-5.
  • 10. Liu Y., Wang J., Chen Y., Niu S., Lv Z., Wu L., Liu D., Song H. Blockchain enabled secure authentication for unmanned aircraft systems. arXiv preprint arXiv:2110.08883. 2021.
  • 11. Li Y., Zuo Y., Song H., Lv Z. Deep learning in security of internet of things. IEEE Internet Things J. 2021.
  • 12. Mettler M. Blockchain Technology in Healthcare: the Revolution Starts Here. 2016. pp. 1–3.
  • 13. Mettler M. Blockchain technology in healthcare: the revolution starts here. 2016 IEEE 18th International Conference on e-Health Networking, Applications and Services (Healthcom); IEEE; 2016. pp. 1–3.
  • 14. Bharadwaj H.K., Agarwal A., Chamola V., Lakkaniga N.R., Hassija V., Guizani M., Sikdar B. A review on the role of machine learning in enabling IoT based healthcare applications. IEEE Access. 2021;9:38859–38890.
  • 15. Crosby M., Nachiappan, Pattanayak P., Verma S., Kalyanaraman V. BlockChain Technology: Beyond Bitcoin. June 2016.
  • 16. Vazirani Anuraag A., O’Donoghue Odhran, Brindley David, Meinert Edward. Implementing blockchains for efficient health care: systematic review. J. Med. Internet Res. 2019;21(2). doi: 10.2196/12439.
  • 17. Chamola V., Hassija V., Gupta S., Goyal A., Guizani M., Sikdar B. Disaster and pandemic management using machine learning: a survey. IEEE Internet Things J. 2020. doi: 10.1109/JIOT.2020.3044966.
  • 18. Miraz M.H., Ali M. Applications of blockchain technology beyond cryptocurrency. Annals of Emerging Technologies in Computing. 2018;2(1).
  • 19. Hassija V., Gupta V., Garg S., Chamola V. Traffic jam probability estimation based on blockchain and deep neural networks. IEEE Trans. Intell. Transp. Syst. 2020.
  • 20. Rohmetra H., Raghunath N., Narang P., Chamola V., Guizani M., Lakkaniga N.R. AI-enabled remote monitoring of vital signs for COVID-19: methods, prospects and challenges. Computing. 2021:1–27.
  • 21. Mettler M. Blockchain Technology in Healthcare: the Revolution Starts Here. 2016. pp. 1–3.
  • 22. Cosh A., Cumming D., Hughes A. Outside entrepreneurial capital. Econ. J. 2009;119(540):1494–1533.
  • 23. Lipusch N. Initial coin offerings: a paradigm shift in funding disruptive innovation. SSRN Electron. J. 2018;01.
  • 24. Azaria A., Ekblaw A., Vieira T., Lippman A. MedRec: using blockchain for medical data access and permission management. 2016 2nd International Conference on Open and Big Data (OBD); IEEE; 2016. pp. 25–30.
  • 25. Medicalchain whitepaper, https://medicalchain.com/Medicalchain-Whitepaper-EN.pdf.
  • 26. Griggs K.N., Ossipova O., Kohlios C.P., Baccarini A.N., Howson E.A., Hayajneh T. Healthcare blockchain system using smart contracts for secure automated remote patient monitoring. J. Med. Syst. 2018;42(7):130. doi: 10.1007/s10916-018-0982-x.
  • 27. Engelhardt M.A. Hitching healthcare to the chain: an introduction to blockchain technology in the healthcare sector. Technol. Innov. Manag. Rev. 2017;7(10).
  • 28. Mannaro K., Baralla G., Pinna A., Ibba S. A blockchain approach applied to a teledermatology platform in the Sardinian region (Italy). Information. 2018;9(2):44.
  • 29. Van Hijfte S. Blockchain and industry use cases. In: Decoding Blockchain for Business. Springer; 2020. pp. 55–87.
  • 30. Lin J.-D., Lin H.-H., Dy J., Chen J.-C., Tanveer M., Razzak I., Hua K.-L. Lightweight face anti-spoofing network for telehealth applications. IEEE J. Biomed. Health Inform. 2021. pp. 1–1. doi: 10.1109/JBHI.2021.3107735.
  • 31. Bardalai P., Medhi N., Bargayary B., Saikia D.K. OpenHealthQ: OpenFlow based QoS management of healthcare data in a software-defined fog environment. ICC 2021 - IEEE International Conference on Communications. 2021:1–6.
  • 32. Shahzad I., King M., Henshaw M. Applying SoSE in healthcare: the case for a soft systems methodology approach to digital-first primary care. 2021 16th International Conference of System of Systems Engineering (SoSE). 2021:37–42.
  • 33. Zhou X., Li Y., Liang W. CNN-RNN based intelligent recommendation for online medical pre-diagnosis support. IEEE/ACM Trans. Comput. Biol. Bioinform. 2021;18(3):912–921. doi: 10.1109/TCBB.2020.2994780.
  • 34. Zhu H., Liu X., Lu R., Li H. Efficient and privacy-preserving online medical prediagnosis framework using nonlinear SVM. IEEE J. Biomed. Health Inform. 2017;21(3):838–850. doi: 10.1109/JBHI.2016.2548248.
  • 35. El-Sappagh S., Ali F., Ali A., Hendawi A., Badria F.A., Suh D.Y. Clinical decision support system for liver fibrosis prediction in hepatitis patients: a case comparison of two soft computing techniques. IEEE Access. 2018;6:52911–52929.
  • 36. Liu L., Xu J., Huan Y., Zou Z., Yeh S.-C., Zheng L.-R. A smart dental health-IoT platform based on intelligent hardware, deep learning, and mobile terminal. IEEE J. Biomed. Health Inform. 2020;24(3):898–906. doi: 10.1109/JBHI.2019.2919916.
  • 37. Malhotra C., Kotwal V., Basu A. Designing national health stack for public health: role of ICT-based knowledge management system. 2019 ITU Kaleidoscope: ICT for Health: Networks, Standards and Innovation (ITU K). 2019:1–8.
  • 38. Tahiliani A., Hassija V., Chamola V., Kanhere S.S., Guizani M., et al. Privacy-preserving and incentivized contact tracing for COVID-19 using blockchain. IEEE Internet of Things Magazine. 2021;4(3):72–79.
  • 39. Cao B., Wang X., Zhang W., Song H., Lv Z. A many-objective optimization model of industrial internet of things based on private blockchain. IEEE Netw. 2020;34(5):78–83.
  • 40. Vanstone S. Responses to NIST’s proposal. Commun. ACM. 1992;35(7):50–52.
  • 41. Vanstone S. Responses to NIST’s proposal. Commun. ACM. 1992;35(7):50–52.
  • 42. Hassija V., Chamola V., Gupta V., Jain S., Guizani N. A survey on supply chain security: application areas, security threats, and solution architectures. IEEE Internet Things J. 2020;8(8):6222–6246.
  • 43. Heathtap dataset, https://github.com/LasseRegin/medical-question-answer-data.
  • 44. Nie L., Wei X., Zhang D., Wang X., Gao Z., Yang Y. Data-driven answer selection in community QA systems. IEEE Trans. Knowl. Data Eng. 2017;29(6):1186–1198.
  • 45. Abduljabbar D.A., Omar N. Exam questions classification based on Bloom’s taxonomy cognitive level using classifiers combination. J. Theor. Appl. Inf. Technol. 2015;78:447–455.
  • 46. Hassija V., Saxena V., Chamola V. A mobile data offloading framework based on a combination of blockchain and virtual voting. Softw. Pract. Exp. 2020.
  • 47. Mikolov T., Sutskever I., Chen K., Corrado G., Dean J. Distributed representations of words and phrases and their compositionality. CoRR. 2013;abs/1310.4546. http://arxiv.org/abs/1310.4546
  • 48. Mikolov T., Chen K., Corrado G., Dean J. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. 2013.
  • 49. Tang B., He H., Baggenstoss P.M., Kay S. A Bayesian classification approach using class-specific features for text categorization. IEEE Trans. Knowl. Data Eng. 2016;28(6):1602–1606.
  • 50. Zhang H., Li D. Naïve Bayes Text Classifier. 2007. pp. 708–708.
  • 51. Pranckevicius T., Marcinkevicius V. Comparison of naive Bayes, random forest, decision tree, support vector machines, and logistic regression classifiers for text reviews classification. Balt. J. Mod. Comput. 2017;5(2):221.
  • 52. Pang B., Lee L., Vaithyanathan S. Thumbs up? Sentiment classification using machine learning techniques. arXiv preprint cs/0205070. 2002.
  • 53. Madylova A., Oguducu S.G. A Taxonomy Based Semantic Similarity of Documents Using the Cosine Measure. 2009. pp. 129–134.
  • 54. Huang C.-H., Yin J., Hou F. A text similarity measurement combining word semantic information with TF-IDF method. Jisuanji Xuebao (Chinese Journal of Computers). 2011;34(5):856–864.
  • 55. Bradley E.H., Elkins B.R., Herrin J., Elbel B. Health and social services expenditures: associations with health outcomes. BMJ Qual. Saf. 2011;20(10):826–831. doi: 10.1136/bmjqs.2010.048363.
  • 56. Edwards H.B., Marques E., Hollingworth W., Horwood J., Farr M., Bernard E., Salisbury C., Northstone K. Use of a primary care online consultation system, by whom, when and why: evaluation of a pilot observational study in 36 general practices in South West England. BMJ Open. 2017;7(11). doi: 10.1136/bmjopen-2017-016901.
  • 57. Agnisarman S., Narasimha S., Madathil K.C., Welch B., Brinda F., Ashok A., McElligott J. Toward a more usable home-based video telemedicine system: a heuristic evaluation of the clinician user interfaces of home-based video telemedicine systems. JMIR Hum. Factors. 2017;4(2):e11. doi: 10.2196/humanfactors.7293.


Articles from Sustainable Computing are provided here courtesy of Elsevier
