Heliyon. 2019 Jul 20;5(7):e02080. doi: 10.1016/j.heliyon.2019.e02080

Fuzzy-multidimensional deep learning for efficient prediction of patient response to antiretroviral therapy

Moses E Ekpenyong a, Philip I Etebong a, Tenderwealth C Jackson b
PMCID: PMC6656963  PMID: 31372545

Abstract

Drug component interactions are most likely to trigger unexpected pharmacological effects with unknown causal mechanisms, hence demanding the discovery of patterns to establish suitable and effective regimens. This paper proposes a novel framework that embeds machine learning (ML) and multidimensional scaling (MDS) techniques for efficient prediction of patient response to antiretroviral therapy (ART). To achieve this, experiment databases were created from two independent sources: a publicly available HIV domain dataset of patients with failed treatment, hosted by Stanford University and hereinafter referred to as the Stanford HIV database, and locally sourced datasets gathered from 13 prominent healthcare facilities treating HIV patients in Akwa Ibom State of Nigeria, hereinafter referred to as the Akwa-Ibom HIV database. The two databases hold 5,780 and 3,168 individual treatment change episodes (TCEs) of HIV treatment indicators (baseline CD4 count (BCD4), followup CD4 count (FCD4), baseline viral load (BRNA), followup viral load (FRNA), and drug type combination (DType)), observed from 1,521 and 1,301 unique patient records, respectively. A hybridised (two-stage) classification system combining Interval Type-2 Fuzzy Logic (IT2FL) and a Deep Neural Network (DNN) was employed to model and optimise patients' response to ART, with appreciable error pruning achieved through MDS. Visualisation of the experiment databases showed remarkable immunological changes in the Akwa-Ibom HIV database, as the FCD4 of TCEs clustered far above the BCD4, compared to the Stanford HIV database, where over 40% of FCD4 clustered below the BCD4. Similar changes were noticed for the RNA, as more FRNA copies clustered below the BRNA for the Akwa-Ibom datasets than for the Stanford datasets. DNN classification results for both databases showed the best performance metrics for the Levenberg-Marquardt algorithm when compared with the resilient backpropagation algorithm, with improved drug pattern predictions for the experiment with MDS. This work is likely to open an avenue for identifying drug combinations that yield optimum patient response with minimal side effects, as further findings revealed the superiority of the proposed approach over existing approaches.

Keywords: Computational mathematics, Applied computing, Immunology, Pharmaceutical science, Health sciences, HIV/AIDS, Fuzzy-multidimensional controller, Antiretroviral therapy, Deep neural network, Multi-drug interaction

1. Introduction

Acquired Immunodeficiency Syndrome (AIDS) is a chronic, potentially life-threatening condition caused by the Human Immunodeficiency Virus (HIV), a persistent pathogen classified as a lentivirus [1, 2]. HIV has no known cure, but the infected patient is treated with highly active antiretroviral therapy (HAART) [3], mainly to suppress the viral load (the amount of HIV in the bloodstream) and prolong the life expectancy of the patient. The viral load and CD4 (cluster of differentiation 4) count are regarded as important determinants for measuring one's HIV status (whether positive or negative) and health [4]. By identifying interactive and distinctive drug characteristics, a predictive system can open promising avenues to improved strategies and treatment of HIV/AIDS, since the drug components produce a clinical effect through their interaction with the virus, ultimately influencing the patient's response. However, the emergence of drug resistance mutations calls the effectiveness of drug therapies into question, so the selection of the most effective drug combinations through a classification sequence of resistant/non-resistant exemplars has become crucial. Selecting the right regimen depends on several factors, including knowledge of past treatment history and CD4 and viral load (RNA) baselines, combined with expert interpretation and advice [5]. As a result, interest in modelling drug resistance has increased, owing to the addition of pre-exposure prophylaxis to the HIV prevention toolkit [6, 7]. Nevertheless, state-of-the-art models have failed to capture heterogeneities in the risk of drug resistance among individuals, mainly due to the diversity in model detail, as transmission models of antiretroviral therapy and pre-exposure prophylaxis use simple assumptions to represent the short-term risk and long-term effects of drug resistance [8, 9].
Many machine learning (ML) methods have evolved to provide solutions to the model diversity problem. These methods attempt to locate best configurations that yield high performance through the minimisation of an error function defined by the system behaviour produced by trained exemplars.

This paper therefore proposes a hybrid methodology that combines intelligent mechanisms into an effectual and usable application system. The proposed methodology embeds deep learning into a rule-based technique powered by the fuzzy inference system. The novelty in this paper rests on the fusion of two classifiers: the type-2 fuzzy sets (T2FS) and a deep neural network (DNN), where linguistic inputs are translated into representations that generate feature labels for the DNN system. The DNN then drives the fuzzy inference block through adjustable fuzzy rules incorporated by (domain) expert knowledge acquired from the input data–required to explain the behaviour of the fuzzy system. To deal with the highly error-prone nature of real-world datasets, we also incorporate a multidimensional scaling technique for the purpose of enhancing the datasets for precise modelling and prediction of HIV patient response to (varying) treatment change episodes (TCEs).

In the absence of extensive access to personalised laboratory monitoring, an integral part of HIV/AIDS patient management that is typical of resource-rich settings, the roll-out of HAART in resource-limited settings (such as those in Sub-Saharan Africa) has adopted a public health approach based on standard HAART protocols and clinical/immunological definitions of therapy failure. Hence, this research is expected to benefit the African region, as it represents the commencement of a clinical database gathering to engender further HIV/AIDS research in Sub-Saharan Africa. The research will provide useful spinoffs for deeper interdisciplinary cooperation on personalised therapies and is most likely to produce a robust prediction system that will serve the growing populace in search of quality treatment. Furthermore, it should aid physicians in the more proactive detection of acute interactions as well as in early referrals of patients with failed treatments, for an immediate change in treatment episode. The specific objectives of this paper are therefore to:

  • implement a hybrid framework that combines the strengths of machine learning (ML) and MDS techniques in a supervised learning setting – for precise patient response prediction and efficient error-pruned datasets;

  • train an optimal sequence of test prototypes – through unified encoding of treatment change episodes (TCEs) of existing patient-specific ART gathered from paediatric records;

  • evaluate the proposed learning model using suitable performance metrics.

The remainder of this paper is structured as follows: Section 2 provides a review of related literature on HIV/AIDS prediction and classification and the extent of research recorded on drug reaction/resistance. Section 3 presents the materials and method employed in the research. Section 4 presents the results obtained from the study. Section 5 discusses the results with reference to existing literature. Section 6 concludes the paper and points to future research directions.

2. Background

The cost-effectiveness of HIV-1 viral load monitoring at the individual level in such settings has been debated, and questions remain over the long-term and population-level impact of managing HAART without it. Computational models that accurately predict virological response to HAART using baseline data including CD4 count, viral load and genotypic resistance profile, as developed by the Resistance Database Initiative, have significant potential for treatment selection and optimization. However, recently developed models have shown good predictive performance without the need for genotypic data, with viral load emerging as the most important variable. This finding provides further, indirect support for the use of viral load monitoring and long-term optimization of HAART in resource-limited settings.

Several data mining algorithms have been applied to investigate issues relating to HIV/AIDS. In this section, we examine related works carried out by different researchers, including studies on drug resistance cases that have emerged within the last ten years [10, 11, 12, 13]. Most recent studies reviewed in the PubMed database [14] concentrate on problems such as HIV/AIDS prediction of protease cleavage sites and inhibitors, correction usage for viral entry, patient response, resistance and adverse effects of ART. The Agence Nationale de Recherches sur le SIDA (ANRS) has become a gold standard for interpreting HIV drug resistance using genome mutations, and in [15], an attempt to improve the ANRS gold standard prediction was made for HIV drug resistance cases using genome sequences and HIV drug resistance measures from the Stanford HIV database (http://hivdb.stanford.edu/). Developing a computational prediction system for the drug resistance phenotype can enhance the timely selection of the best regimens. Shen, Yu, Harrison and Weber [16] applied two machine learning algorithms, random forest and k-NN, to predict HIV drug resistance from genotype data. In [17], a framework for supporting and managing HIV/AIDS using k-means and random forest algorithms was proposed to mine hidden information from a huge database and to help in decision making for the treatment of HIV related diseases. In [18], the classification and regression tree was used to predict the survival of AIDS patients receiving antiretroviral therapy in Malaysia, and to discover potential treatment methods and the treatment progress of monitored patients. However, the sparseness of data constrained the study, and reference to drug resistance cases was missing. Isaakidis, Raguenaud, Te, Tray, Akao, Kumar, Ngin, Nerrienet and Zachariah [19] investigated the high survival and treatment success sustained after two and three years of first-line ART for children in Cambodia.
The Kaplan-Meier analysis [20] was used to estimate survival, and Cox regression [21] was used to identify the risk associated with treatment failures, where survival, immunological restoration and viral suppression could be sustained after two to three years of ART among children in resource constrained settings. The study was however limited to the use of only the CD4 count as predictor variable. In [22], the application of ML to predict future CD4 count changes was investigated. The authors formulated a mathematical model that can predict the range of change of an individual HIV-1 positive patient's CD4 count, using a support vector machine (SVM) classification model that predicts the variability level of the CD4 count. Clinical features used as inputs were genome, current viral load and number of weeks from the baseline CD4 count. This approach produced acceptable classification accuracy and showed that a change in CD4 count can be accurately predicted using machine learning. The study, however, did not consider drug resistance, which is vital in treatment success appraisal, and was limited by a low number of datasets and high misclassifications. In [23], a neural network based on Jordan-Elman networks [24] was used for a longitudinal assessment of antiretroviral therapy determination, following viral surrogate markers and demographic, biochemical and laboratory data that describe drug-virus host interactions in over 4,000 adult HIV patients. The authors found that neural networks can be applied in the real-time context of prospective, longitudinal clinical trials of newer antiretroviral drugs.

Uncertainties abound in many real-world problems and may arise from inputs, outputs, linguistic diversity, change in operational condition, and noisy data. In the case of HIV/AIDS, the disease may present confusable patterns most likely to becloud early diagnosis and treatment. Although type-1 fuzzy logic systems [25, 26, 27] have succeeded in solving a wide range of real-world problems, their performance is inadequate in many complex use cases with highly confusable variables. Type-2 fuzzy logic systems have evolved to complement type-1 fuzzy logic systems because they are more robust to uncertainties in many applications, with the type-reduction block, guided by the inference mechanism, playing a central role in the systems. They represent input and output results using fewer rules and embed a large number of type-1 fuzzy sets to describe variables, with a detailed description of extra levels of smooth control surface and response. Moreover, outputs that are not feasible in type-1 systems are possible due to the extra dimension provided by the footprint of uncertainty (FOU) [28]. Although the Karnik-Mendel (KM) iterative algorithms are the standard algorithms for performing type reduction, the high computational cost of the type-reduction process may hinder their use in real-world applications [29]. Advancements in research on type-2 fuzzy sets and systems have encouraged enhanced type-reduction techniques [30] and the application of learning methods to type-2 fuzzy logic systems, resulting in hybridised forms that fuse type-2 fuzzy systems with neural and evolutionary methods or classification algorithms [31, 32].

3. Materials and methods

3.1. Proposed system architecture

An architecture describing the workflow of our proposed Fuzzy-MDS-DNN system is presented in Fig. 1. The proposed architecture is structured into two major phases, namely: (i) data collection and processing, and (ii) patient response modelling and optimisation. The modelling-optimisation phase fuses a two-stage classification system with MDS capability into a hybridised controller capable of highly error-tolerant patient response modelling and optimisation. The controller accepts, through a fuzzy interface, linguistic inputs (parameters) from a processed database of unique experimental (Stanford and locally sourced) datasets. Supervised learning is then achieved through the automatic adjustment of the fuzzy model parameters, which form the initial inputs to the DNN and are initiated by the learning algorithm. An optimised set of non-fuzzy inputs is then fed into the IT2FL section to output a precise patient response, whose errors are later pruned using an MDS algorithm. The pruned datasets are finally learned to produce optimised predictions of the patient response. Details of each section of the architecture are discussed in the following subsections.

Fig. 1. Proposed system framework.

3.1.1. Data collection and processing

The Stanford HIV database, a publicly available domain dataset hosted by Stanford University, was used as a reference dataset in this experiment. This database is archived in the Extensible Markup Language (XML) format (cf. https://hivdb.stanford.edu) and captures details of patients who have failed treatment due to drug resistance. The Stanford database was created in 1999 and hosts a freely available online genotypic resistance interpretation system called HIVdb, to support health workers in understanding HIV-1 genotypic resistance tests [33]. Several studies have since confirmed the TCEs database as an effective database for the study and monitoring of resistance to HIV drug therapy. A total of 24 drugs in varying combinations were identified in 1,521 unique patient records with 5,780 individual TCEs, spread across several weeks of treatment. To ensure consistency and attribute unique instances to each patient, a MATLAB script was written to average each patient's data (XML sheet) over the various TCEs and extract unique instances for all the patients. Locally sourced data were also collected from case files of paediatric patients receiving treatment at various health centres in Akwa Ibom State of Nigeria, as well as from a Community Anti-Retroviral Therapy Programme periodically carried out to reach rural dwellers. A total of 13 data points (health facilities) were assessed. The Akwa-Ibom database covered patients who registered for treatment at the various facilities from 2015 to 2018, and contains both resistant and non-resistant exemplars. The investigated facilities accommodate about 10,000 patients in the southeast region currently receiving treatment. However, due to the limited resources in the Nigerian environment, only five drug combinations in three treatment regimens are possible. These regimens are administered to patients (at the various centres) free of charge, through a Family Health International (FHI) HIV/AIDS intervention programme.
The Akwa-Ibom HIV database consists of a total of 1,301 unique patients with 3,168 individual TCEs. Features investigated in the study were: baseline CD4 count (BCD4), followup CD4 count (FCD4), baseline viral load (BRNA), followup viral load (FRNA), and Drug Type combination (DType). Ethical issues came into play in this research, as it involved gathering data from human subjects. Although the research did not involve direct contact with patients, access to patients' medical histories and treatment was granted by the responsible authorities after satisfying the ethical consent procedure, for the purpose of sieving the relevant experimental data. Hence, we discuss the ethical issues under two areas. Informed consent: informed consent through written permission was obtained from the responsible health authority before embarking on the research. Data protection: data protection was ensured, as details that could expose the patients' personal details (e.g., name, address, occupation, etc.) were not extracted.
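The per-patient averaging step described earlier in this subsection was carried out with a MATLAB script that is not reproduced in the paper. Purely as an illustration, the same idea can be sketched in Python; the record layout, the field name `patient_id` and the sample values below are hypothetical, and the handling of DType is omitted.

```python
from collections import defaultdict
from statistics import mean

# Numeric treatment indicators averaged per patient (DType is left out of this sketch).
FEATURES = ("BCD4", "FCD4", "BRNA", "FRNA")

def average_tces(tces):
    """Collapse the TCE rows of each patient into one averaged row."""
    grouped = defaultdict(list)
    for row in tces:
        grouped[row["patient_id"]].append(row)
    return {
        pid: {f: mean(r[f] for r in rows) for f in FEATURES}
        for pid, rows in grouped.items()
    }

# Hypothetical TCEs: two episodes for patient 1, one for patient 2.
sample = [
    {"patient_id": 1, "BCD4": 200, "FCD4": 300, "BRNA": 4.0, "FRNA": 3.0},
    {"patient_id": 1, "BCD4": 300, "FCD4": 500, "BRNA": 3.0, "FRNA": 2.0},
    {"patient_id": 2, "BCD4": 600, "FCD4": 650, "BRNA": 2.0, "FRNA": 1.0},
]
unique_patients = average_tces(sample)
```

Each key of `unique_patients` then corresponds to one unique patient instance, which is the granularity at which the 1,521 and 1,301 patient records are reported.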

3.1.2. Patient response inference modelling

A patient's response to treatment depends on several imprecise and confusable factors that direct the outcome of the treatment course (including drug side effects and resistance). When side effects are noticed, contacting one's health provider or pharmacist is necessary. Drug resistance, on the other hand, can be the cause of treatment failure, because as HIV multiplies in the body, the virus mutates (changes form) and produces altered copies that confuse the treatment course, leading to drug-resistant strains of HIV. In order to eliminate uncertainties in data due to the influence of these confusing factors, an IT2FL (see Fig. 1) was used to provide a knowledge representation of the patient response. The IT2FL modelling section consists of six major components, namely the fuzzifier, fuzzy sets, rule base, inference engine, type-reducer, and defuzzifier. First, the obtained input parameters are fuzzified and then passed to the inference engine, which evaluates the fuzzy set against the rule base. This process produces another type-2 fuzzy set. The fuzzy set is then reduced to a type-1 fuzzy set by the type reduction section. The reduced set is finally defuzzified to give a crisp (non-fuzzy) output.

3.1.2.1. The fuzzy model
3.1.2.1.1. Model description

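An Interval Type-2 Fuzzy Set (IT2FS), denoted $\tilde{A}$, has a FOU bounded by lower and upper membership functions, $\underline{\mu}_{\tilde{A}}(x,\mu)$ and $\overline{\mu}_{\tilde{A}}(x,\mu)$, $\forall x \in X$, respectively, and is expressed as:

$$\tilde{A} = \left\{ \left( (x,\mu),\ \underline{\mu}_{\tilde{A}}(x,\mu),\ \overline{\mu}_{\tilde{A}}(x,\mu) \right) \mid \forall x \in X,\ \forall \mu \in J_x \subseteq [0,1] \right\}, \tag{1}$$

where $\underline{\mu}_{\tilde{A}}(x,\mu)$ and $\overline{\mu}_{\tilde{A}}(x,\mu) = 1$; $\forall x \in X$ and $\forall \mu \in J_x \subseteq [0,1]$, are defined on a continuous universe of discourse (UoD); $x$ denotes the primary variable in domain $X$; $\mu$ denotes the secondary variable in domain $J_x$ at each $x \in X$; $J_x$ is called the primary membership of $x$ as defined in (1), which symbolizes the interval set; the secondary grades of $\tilde{A}$ are unity, which reduces the IT2FS to,

$$\tilde{A} = \int_{x \in X} \int_{\mu \in J_x} 1/(x,\mu). \tag{2}$$

Now, the FOU of $\tilde{A}$ is the union of all primary membership grades and is given by,

$$FOU(\tilde{A}) = \bigcup_{x \in X} J_x. \tag{3}$$

The UMF, or upper membership function, $\overline{\mu}_{\tilde{A}}(x)$, and the LMF, or lower membership function, $\underline{\mu}_{\tilde{A}}(x)$, are type-1 membership functions (MFs) marking the FOU boundary of an interval type-2 MF. The UMF represents the subset that has the maximum membership grade of the FOU, and the LMF is the subset that has the minimum membership grade of the FOU, $\forall x \in X$ [34, 35], thus,

$$\overline{\mu}_{\tilde{A}}(x) \equiv \overline{FOU(\tilde{A})}, \quad \forall x \in X, \tag{4}$$

$$\underline{\mu}_{\tilde{A}}(x) \equiv \underline{FOU(\tilde{A})}, \quad \forall x \in X, \tag{5}$$

$$J_x = \left[ \underline{\mu}_{\tilde{A}}(x),\ \overline{\mu}_{\tilde{A}}(x) \right]. \tag{6}$$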

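The Triangular Membership Function (TMF) was adopted to evaluate each input and output MF for the IT2FL system. The description of the TMF using a line or curve is based on three parameters $a_1$, $p$, and $a_2$, and specifies the mapping of each input or output parameter, to obtain membership values for $n$ membership grades $(MG_n;\ n = 1, \ldots, n)$, thus:

$$\mu(x) = \begin{cases} 0 & \text{if } x < a_{1(MG_1)}\ \{NIR\} \\ \dfrac{x - a_{1(MG_1)}}{a_{2(MG_1)} - a_{1(MG_1)}} & \text{if } a_{1(MG_1)} \le x < a_{2(MG_1)} \\ \dfrac{a_{2(MG_2)} - x}{a_{2(MG_2)} - a_{1(MG_2)}} & \text{if } a_{1(MG_2)} \le x < a_{2(MG_2)} \\ \quad \vdots \\ \dfrac{a_{2(MG_n)} - x}{a_{2(MG_n)} - a_{1(MG_n)}} & \text{if } a_{1(MG_n)} \le x < a_{2(MG_n)} \\ 0 & \text{if } x \ge a_{2(MG_n)}\ \{NIR\} \end{cases} \tag{7}$$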

where, a1 and a2 are the triangular end points defined by the FOU – region consisting of all the points of primary membership of elements, and NIR signifies values that are not in range. Fig. 2 illustrates a triangular shape IT2FLS with its principal T1FS, showing the end point, and P, the triangular peak location.

Fig. 2. IT2FL FOU.

Now, labelling the internal cross section of Fig. 2, the triangular shape of IT2FLS with its principal T1FS bounded by a UMF and a LMF is given in Fig. 3.

Fig. 3. Internal cross section of IT2FL MF.

where l is the left end point bounded by both UMF (l1) and LMF (l2), and, r is the right end point, also bounded by both UMF (r2) and LMF (r1). The triangular peak location or mean, P, of each end point is also bounded by P1 and P2, representing the triangular peak locations of end points l1 and r1, and l1 and r2, respectively.

3.1.2.1.2. Membership function construction

The UoD or universal set denotes the complete range of values assigned to the linguistic variables. We define this measure following (7) for our input and output linguistic variables. The UoD membership ranges were created to align with established ranges from literature and practicing expert physicians in the healthcare facilities studied. Table 1 shows the input and output fuzzy sets derived from these sources.

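Table 1. Input and output fuzzy sets from domain knowledge.

| Variable | S/N | Membership grade (MG) | l1 | P1 | r1 | l2 | P2 | r2 |
|---|---|---|---|---|---|---|---|---|
| BCD4/FCD4 (Input) | 1 | Low {L} | 0 | 225 | 450 | 50 | 275 | 500 |
| BCD4/FCD4 (Input) | 2 | Medium {M} | 300 | 575 | 850 | 350 | 625 | 900 |
| BCD4/FCD4 (Input) | 3 | High {H} | 700 | 1075 | 1450 | 750 | 1125 | 1500 |
| BRNA/FRNA (Input) | 1 | Undetected {U} | 0 | 0.60 | 1.20 | 0.30 | 0.90 | 1.50 |
| BRNA/FRNA (Input) | 2 | Suppressed {S} | 1.00 | 2.15 | 3.30 | 1.20 | 2.35 | 3.50 |
| BRNA/FRNA (Input) | 3 | Not Suppressed {NS} | 2.50 | 4.00 | 5.50 | 3.00 | 4.50 | 6.00 |
| PR (Output) | 1 | No Interaction {NI} | 0 | 27.50 | 55 | 5 | 32.50 | 60 |
| PR (Output) | 2 | Very Low Interaction {VLI} | 30 | 47.50 | 65 | 35 | 52.50 | 70 |
| PR (Output) | 3 | Low Interaction {LI} | 62 | 68.50 | 75 | 67 | 73.50 | 80 |
| PR (Output) | 4 | High Interaction {HI} | 72 | 78.50 | 85 | 77 | 83.50 | 90 |
| PR (Output) | 5 | Very High Interaction {VHI} | 82 | 88.50 | 95 | 87 | 93.50 | 100 |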

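Following (7), the LMF and UMF for the CD4, RNA and PR linguistic variables are realised in the pairs (8)–(9), (10)–(11) and (12)–(13), respectively,

$$\underline{\mu}_{CD4}(x) = \begin{cases} 0 & \text{if } x < 0\ \{NIR\} \\ \frac{x}{450} & \text{if } 0 \le x < 450\ \{L\} \\ \frac{850 - x}{550} & \text{if } 450 \le x < 850\ \{M\} \\ \frac{1450 - x}{750} & \text{if } 850 \le x < 1450\ \{H\} \\ 0 & \text{if } x \ge 1450\ \{NIR\} \end{cases} \tag{8}$$

$$\overline{\mu}_{CD4}(x) = \begin{cases} 0 & \text{if } x < 50\ \{NIR\} \\ \frac{x - 50}{450} & \text{if } 50 \le x < 500\ \{L\} \\ \frac{900 - x}{550} & \text{if } 500 \le x < 900\ \{M\} \\ \frac{1500 - x}{750} & \text{if } 900 \le x < 1500\ \{H\} \\ 0 & \text{if } x \ge 1500\ \{NIR\} \end{cases} \tag{9}$$

$$\underline{\mu}_{RNA}(x) = \begin{cases} 0 & \text{if } x < 0\ \{NIR\} \\ \frac{x}{1.2} & \text{if } 0 \le x < 1.2\ \{U\} \\ \frac{3.3 - x}{2.3} & \text{if } 1.2 \le x < 3.3\ \{S\} \\ \frac{5.5 - x}{3} & \text{if } 3.3 \le x < 5.5\ \{NS\} \\ 0 & \text{if } x \ge 5.5\ \{NIR\} \end{cases} \tag{10}$$

$$\overline{\mu}_{RNA}(x) = \begin{cases} 0 & \text{if } x < 0.3\ \{NIR\} \\ \frac{x - 0.3}{1.2} & \text{if } 0.3 \le x < 1.5\ \{U\} \\ \frac{3.5 - x}{2.3} & \text{if } 1.5 \le x < 3.5\ \{S\} \\ \frac{6 - x}{3} & \text{if } 3.5 \le x < 6\ \{NS\} \\ 0 & \text{if } x \ge 6\ \{NIR\} \end{cases} \tag{11}$$

$$\underline{\mu}_{PR}(x) = \begin{cases} 0 & \text{if } x < 0\ \{NIR\} \\ \frac{x}{55} & \text{if } 0 \le x < 55\ \{NI\} \\ \frac{65 - x}{35} & \text{if } 55 \le x < 65\ \{VLI\} \\ \frac{75 - x}{13} & \text{if } 65 \le x < 75\ \{LI\} \\ \frac{85 - x}{13} & \text{if } 75 \le x < 85\ \{HI\} \\ \frac{95 - x}{13} & \text{if } 85 \le x < 95\ \{VHI\} \\ 0 & \text{if } x \ge 95\ \{NIR\} \end{cases} \tag{12}$$

$$\overline{\mu}_{PR}(x) = \begin{cases} 0 & \text{if } x < 5\ \{NIR\} \\ \frac{x - 5}{55} & \text{if } 5 \le x < 60\ \{NI\} \\ \frac{70 - x}{35} & \text{if } 60 \le x < 70\ \{VLI\} \\ \frac{80 - x}{13} & \text{if } 70 \le x < 80\ \{LI\} \\ \frac{90 - x}{13} & \text{if } 80 \le x < 90\ \{HI\} \\ \frac{100 - x}{13} & \text{if } 90 \le x < 100\ \{VHI\} \\ 0 & \text{if } x \ge 100\ \{NIR\} \end{cases} \tag{13}$$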

From Fig. 3, the IT2FL LMF and UMF are expressed in (14) and (15), respectively,

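$$\underline{\mu}(x) = \begin{cases} 0 & \text{if } x < l_1 \\ \dfrac{x - l_1}{p_1 - l_1} & \text{if } l_1 \le x < \dfrac{r_1(p_1 - l_1) + l_1(r_1 - p_1)}{(p_1 - l_1) + (r_1 - p_1)} \\ \dfrac{r_1 - x}{r_1 - p_1} & \text{if } \dfrac{r_1(p_1 - l_1) + l_1(r_1 - p_1)}{(p_1 - l_1) + (r_1 - p_1)} \le x < r_1 \\ 0 & \text{if } x \ge r_1 \end{cases} \tag{14}$$

$$\overline{\mu}(x) = \begin{cases} 0 & \text{if } x < l_2 \\ \dfrac{x - l_2}{p_2 - l_2} & \text{if } l_2 \le x < p_1 \\ 1 & \text{if } p_1 \le x < p_2 \\ \dfrac{r_2 - x}{r_2 - p_2} & \text{if } p_2 \le x < r_2 \\ 0 & \text{if } x \ge r_2 \end{cases} \tag{15}$$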

From (14) and (15), the lower and upper membership sets for the CD4 count membership grades are realised in (16)–(21),

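$$\underline{\mu}_{CD4[L]}(x) = \begin{cases} 0 & \text{if } x < 0 \\ \frac{x}{225} & \text{if } 0 \le x < 225 \\ \frac{450 - x}{225} & \text{if } 225 \le x < 450 \\ 0 & \text{if } x \ge 450 \end{cases} \tag{16}$$

$$\overline{\mu}_{CD4[L]}(x) = \begin{cases} 0 & \text{if } x < 50 \\ \frac{x - 50}{225} & \text{if } 50 \le x < 225 \\ 1 & \text{if } 225 \le x < 275 \\ \frac{500 - x}{225} & \text{if } 275 \le x < 500 \\ 0 & \text{if } x \ge 500 \end{cases} \tag{17}$$

$$\underline{\mu}_{CD4[M]}(x) = \begin{cases} 0 & \text{if } x < 300 \\ \frac{x - 300}{275} & \text{if } 300 \le x < 575 \\ \frac{850 - x}{275} & \text{if } 575 \le x < 850 \\ 0 & \text{if } x \ge 850 \end{cases} \tag{18}$$

$$\overline{\mu}_{CD4[M]}(x) = \begin{cases} 0 & \text{if } x < 350 \\ \frac{x - 350}{275} & \text{if } 350 \le x < 575 \\ 1 & \text{if } 575 \le x < 625 \\ \frac{900 - x}{275} & \text{if } 625 \le x < 900 \\ 0 & \text{if } x \ge 900 \end{cases} \tag{19}$$

$$\underline{\mu}_{CD4[H]}(x) = \begin{cases} 0 & \text{if } x < 700 \\ \frac{x - 700}{375} & \text{if } 700 \le x < 1075 \\ \frac{1450 - x}{375} & \text{if } 1075 \le x < 1450 \\ 0 & \text{if } x \ge 1450 \end{cases} \tag{20}$$

$$\overline{\mu}_{CD4[H]}(x) = \begin{cases} 0 & \text{if } x < 750 \\ \frac{x - 750}{375} & \text{if } 750 \le x < 1075 \\ 1 & \text{if } 1075 \le x < 1125 \\ \frac{1500 - x}{375} & \text{if } 1125 \le x < 1500 \\ 0 & \text{if } x \ge 1500 \end{cases} \tag{21}$$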
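As a minimal sketch, the piecewise forms in (14) and (15) can be evaluated directly for the CD4 grades using the Table 1 parameters. The function and variable names below are illustrative only and are not taken from the Juzzy toolkit or from the authors' implementation. Because (14) uses a single peak, its breakpoint simplifies to `p1`, which is what the code uses.

```python
# Table 1 parameters for CD4: (l1, P1, r1, l2, P2, r2)
CD4_SETS = {
    "L": (0, 225, 450, 50, 275, 500),
    "M": (300, 575, 850, 350, 625, 900),
    "H": (700, 1075, 1450, 750, 1125, 1500),
}

def lower_mf(x, l1, p1, r1):
    """Triangular LMF following (14): rise on [l1, p1), fall on [p1, r1)."""
    if x < l1 or x >= r1:
        return 0.0
    if x < p1:
        return (x - l1) / (p1 - l1)
    return (r1 - x) / (r1 - p1)

def upper_mf(x, l2, p1, p2, r2):
    """UMF following (15) as printed: rise on [l2, p1), plateau on [p1, p2), fall on [p2, r2)."""
    if x < l2 or x >= r2:
        return 0.0
    if x < p1:
        return (x - l2) / (p2 - l2)
    if x < p2:
        return 1.0
    return (r2 - x) / (r2 - p2)

def cd4_memberships(x):
    """Return {term: (lower grade, upper grade)} for a CD4 count x."""
    grades = {}
    for term, (l1, p1, r1, l2, p2, r2) in CD4_SETS.items():
        grades[term] = (lower_mf(x, l1, p1, r1), upper_mf(x, l2, p1, p2, r2))
    return grades

grades_250 = cd4_memberships(250)  # a hypothetical CD4 count of 250
```

For a count of 250, only the Low grade is active: the lower grade lies on the falling edge of (16), while the upper grade sits on the plateau of (17).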

Following similar convention, the lower and upper membership sets for the RNA membership grades can be obtained from (14) and (15), as realised in (22)–(27),

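$$\underline{\mu}_{RNA[U]}(x) = \begin{cases} 0 & \text{if } x < 0 \\ \frac{x}{0.6} & \text{if } 0 \le x < 0.6 \\ \frac{1.2 - x}{0.6} & \text{if } 0.6 \le x < 1.2 \\ 0 & \text{if } x \ge 1.2 \end{cases} \tag{22}$$

$$\overline{\mu}_{RNA[U]}(x) = \begin{cases} 0 & \text{if } x < 0.3 \\ \frac{x - 0.3}{0.6} & \text{if } 0.3 \le x < 0.6 \\ 1 & \text{if } 0.6 \le x < 0.9 \\ \frac{1.5 - x}{0.6} & \text{if } 0.9 \le x < 1.5 \\ 0 & \text{if } x \ge 1.5 \end{cases} \tag{23}$$

$$\underline{\mu}_{RNA[S]}(x) = \begin{cases} 0 & \text{if } x < 1 \\ \frac{x - 1}{1.15} & \text{if } 1 \le x < 2.15 \\ \frac{3.3 - x}{1.15} & \text{if } 2.15 \le x < 3.3 \\ 0 & \text{if } x \ge 3.3 \end{cases} \tag{24}$$

$$\overline{\mu}_{RNA[S]}(x) = \begin{cases} 0 & \text{if } x < 1.2 \\ \frac{x - 1.2}{1.15} & \text{if } 1.2 \le x < 2.15 \\ 1 & \text{if } 2.15 \le x < 2.35 \\ \frac{3.5 - x}{1.15} & \text{if } 2.35 \le x < 3.5 \\ 0 & \text{if } x \ge 3.5 \end{cases} \tag{25}$$

$$\underline{\mu}_{RNA[NS]}(x) = \begin{cases} 0 & \text{if } x < 2.5 \\ \frac{x - 2.5}{1.5} & \text{if } 2.5 \le x < 4 \\ \frac{5.5 - x}{1.5} & \text{if } 4 \le x < 5.5 \\ 0 & \text{if } x \ge 5.5 \end{cases} \tag{26}$$

$$\overline{\mu}_{RNA[NS]}(x) = \begin{cases} 0 & \text{if } x < 3 \\ \frac{x - 3}{1.5} & \text{if } 3 \le x < 4 \\ 1 & \text{if } 4 \le x < 4.5 \\ \frac{6 - x}{1.5} & \text{if } 4.5 \le x < 6 \\ 0 & \text{if } x \ge 6 \end{cases} \tag{27}$$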

The lower and upper membership sets for the PR membership grades can also be derived from (14) and (15), and are as realised in (28)–(37),

μ¯PR[NI](x)={0;ifx<0x27.5;if0x<27.555x27.5;if27.5x<550;ifx55, (28)
μ¯PR[NI](x)={0;ifx<5x527.5;if5x<27.51;if27.5x<47.560x27.5;if47.5x<600;ifx60, (29)
μ¯PR[VLI](x)={0;ifx<30x3017.5;if30x<47.565x17.5;if47.5x<650;ifx65, (30)
μ¯PR[VLI](x)={0;ifx<35x3517.5;if35x<47.51;if47.5x<52.570x17.5;if52.5x<700;ifx70, (31)
μ¯PR[LI](x)={0;ifx<62x626.5;if62x<68.575x6.5;if68.5x<750;ifx75, (32)
μ¯PR[LI](x)={0;ifx<67x676.5;if67x<68.51;if68.5x<73.580x6.5;if73.5x<800;ifx80, (33)
μ¯PR[HI](x)={0;ifx<72x726.5;if72x<78.585x6.5;if78.5x<850;ifx85, (34)
μ¯PR[HI](x)={0;ifx<77x776.5;if77x<78.51;if78.5x<83.590x6.5;if83.5x<900;ifx90, (35)
μ¯PR[VHI](x)={0;ifx<82x826.5;if82x<88.595x6.5;if88.5x<950;ifx95. (36)
μ¯PR[VHI](x)={0;ifx<87x876.5;if87x<88.51;if88.5x<93.5100x6.5;if93.5x<1000;ifx100, (37)
3.1.2.1.3. Rule base design

The dynamic behaviour of our fuzzy logic controller is characterised by its rule base, constructed from expert domain knowledge of the consequence heuristics. These rules are necessary to simulate the perceived human reasoning toward a conceptual logic and artificial (fuzzy) reasoning, as well as the implication between the input MF and the fuzzy rule inference required to compute the patient response. For designers of expert systems, these aspects of development are the most crucial, as branching constitutes a fundamental property of logic rules, and traversing complex real-world problems may cause an unnecessary explosion of traversed routes. Hence, an efficient mechanism is required to ensure that only optimal routes are traversed. In this paper, we introduce experience-based heuristics in addition to the fuzzy rules. The difficulty in initiating heuristics and fuzzy rules does not lie in their formulation or in the likelihood of a rule not holding; rather, in most cases the established confidence limits are not precisely known. The rule base model of our inference system comprises a set of if-then rules that establish relationships between the controller input and output linguistic variables. Suppose a Fuzzy Logic System (FLS) permits $p$ inputs, $x_1 \in X_1, \ldots, x_p \in X_p$, and one output $y \in Y$, characterized by $M$ rules; then the $l$th rule is of the form,

$$R^l: \text{if } x_1 \text{ is } F_1^l \text{ and } \cdots \text{ and } x_p \text{ is } F_p^l, \text{ then } y \text{ is } G^l, \quad l = 1, \ldots, M, \tag{38}$$

where $R^l$ is the $l$th fuzzy rule; $F_i^l$, $F_p^l$ and $G^l$ are the respective linguistic terms; $M$ is the number of rules; $x_i$ $(i = 1, \ldots, p)$ is the antecedent and $y$ is the consequent of the $l$th rule, $l = 1, \ldots, M$, of the FLS. Then, the $\tilde{F}_i^l$'s are the MFs $\mu_{\tilde{F}_i^l}(x_i)$ of the antecedent part assigned to the $i$th input $x_i$; the $\tilde{G}^l$'s are the MFs $\mu_{\tilde{G}^l}(y)$ of the consequent part assigned to the output $y$.

To generate the rules, we introduce a Moses-Map (M-Map) rule base matrix that combines the various linguistic terms, $F_i^l$, of the input parameters, $x_i$, to yield a membership grade, $G^l$, a linguistic term of the output linguistic variable, $y$. Our M-Map simplifies the rule base generation process and ensures that all possible rule combinations are traversed. The algorithm guiding the construction cascades the input linguistic variables along the row and column tabs (above the cells), similar to a typical spreadsheet. Suppose the linguistic variables $L_i\ (i = 1, \ldots, n)$ occupy the column tabs, and the linguistic terms $t_j\ (j = 1, \ldots, m)$ are aligned to the row tabs; then the length of the column and row tabs is the combined product of the linguistic terms, whose antecedent (rule set) ordering can be obtained using standard combinatorics. The linguistic variables investigated in this paper consist of three linguistic terms each, so the column and row cascades each have a length of 9 (3 × 3). Each rule set is combined such that no rule combination repeats. A total of 81 (9 × 9) rules were derived and are presented in Table 2.

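Table 2. The M-Map rule base matrix. Column headings give the Baseline CD4 / Followup CD4 terms; each cell holds the consequent and its rule number.

| Baseline RNA | Followup RNA | L / L | L / M | L / H | M / L | M / M | M / H | H / L | H / M | H / H |
|---|---|---|---|---|---|---|---|---|---|---|
| NS | NS | NI (r9) | VLI (r18) | VLI (r27) | VLI (r36) | VLI (r45) | LI (r54) | VLI (r63) | LI (r72) | LI (r81) |
| NS | S | VLI (r8) | VLI (r17) | LI (r26) | VLI (r35) | LI (r44) | LI (r53) | LI (r62) | LI (r71) | HI (r80) |
| NS | U | VLI (r7) | LI (r16) | LI (r25) | LI (r34) | LI (r43) | HI (r52) | LI (r61) | HI (r70) | HI (r79) |
| S | NS | VLI (r6) | VLI (r15) | LI (r24) | VLI (r33) | LI (r42) | LI (r51) | LI (r60) | LI (r69) | HI (r78) |
| S | S | VLI (r5) | LI (r14) | LI (r23) | LI (r32) | LI (r41) | HI (r50) | LI (r59) | HI (r68) | HI (r77) |
| S | U | LI (r4) | LI (r13) | HI (r22) | LI (r31) | HI (r40) | HI (r49) | HI (r58) | HI (r67) | VHI (r76) |
| U | NS | VLI (r3) | LI (r12) | LI (r21) | LI (r30) | LI (r39) | HI (r48) | LI (r57) | HI (r66) | HI (r75) |
| U | S | LI (r2) | LI (r11) | HI (r20) | LI (r29) | HI (r38) | HI (r47) | HI (r56) | HI (r65) | VHI (r74) |
| U | U | LI (r1) | HI (r10) | HI (r19) | HI (r28) | HI (r37) | VHI (r46) | HI (r55) | VHI (r64) | VHI (r73) |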

From Table 2, rules r1, r2 and r3 can be built as follows:

r1. If BCD4 is L-Low and FCD4 is L-Low and BRNA is U-Undetected and FRNA is U-Undetected then Interaction is LI-Low Interaction.

r2. If BCD4 is L-Low and FCD4 is L-Low and BRNA is U-Undetected and FRNA is S-Suppressed then Interaction is LI-Low Interaction.

r3. If BCD4 is L-Low and FCD4 is L-Low and BRNA is U-Undetected and FRNA is NS-Not Suppressed then Interaction is VLI-Very Low Interaction.
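The enumeration behind Table 2 can be sketched as a Cartesian product of the four input variables. The code below is an illustration rather than the authors' M-Map algorithm: the names `TABLE2` and `build_rules` are hypothetical, and the consequents are simply transcribed from Table 2. Iterating BCD4, FCD4, BRNA and FRNA in that order, with the last variable changing fastest, reproduces the rule numbering r1 to r81.

```python
from itertools import product

CD4_TERMS = ["L", "M", "H"]    # Low, Medium, High
RNA_TERMS = ["U", "S", "NS"]   # Undetected, Suppressed, Not Suppressed

# Table 2 transcribed row by row: (BRNA, FRNA) -> consequents for the nine
# (BCD4, FCD4) columns L/L, L/M, L/H, M/L, M/M, M/H, H/L, H/M, H/H.
TABLE2 = {
    ("NS", "NS"): "NI VLI VLI VLI VLI LI VLI LI LI",
    ("NS", "S"): "VLI VLI LI VLI LI LI LI LI HI",
    ("NS", "U"): "VLI LI LI LI LI HI LI HI HI",
    ("S", "NS"): "VLI VLI LI VLI LI LI LI LI HI",
    ("S", "S"): "VLI LI LI LI LI HI LI HI HI",
    ("S", "U"): "LI LI HI LI HI HI HI HI VHI",
    ("U", "NS"): "VLI LI LI LI LI HI LI HI HI",
    ("U", "S"): "LI LI HI LI HI HI HI HI VHI",
    ("U", "U"): "LI HI HI HI HI VHI HI VHI VHI",
}

def build_rules():
    """Return {rule number: (BCD4, FCD4, BRNA, FRNA, consequent)} for all 81 rules."""
    rules = {}
    combos = product(CD4_TERMS, CD4_TERMS, RNA_TERMS, RNA_TERMS)
    for number, (bcd4, fcd4, brna, frna) in enumerate(combos, start=1):
        column = 3 * CD4_TERMS.index(bcd4) + CD4_TERMS.index(fcd4)
        consequent = TABLE2[(brna, frna)].split()[column]
        rules[number] = (bcd4, fcd4, brna, frna, consequent)
    return rules

RULES = build_rules()
```

Here `RULES[1]`, `RULES[2]` and `RULES[3]` correspond to r1, r2 and r3 as written out above.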

The result of the input and antecedent operations is an interval type-1 set, and is called the firing set [36], thus,

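$$F^i(x) = \left[ \underline{f}^i(x),\ \overline{f}^i(x) \right] \equiv \left[ \underline{f}^i,\ \overline{f}^i \right], \tag{39}$$

$$\underline{f}^i(x) = \underline{\mu}_{F_1^i}(x_1) \star \cdots \star \underline{\mu}_{F_m^i}(x_m), \tag{40}$$

$$\overline{f}^i(x) = \overline{\mu}_{F_1^i}(x_1) \star \cdots \star \overline{\mu}_{F_m^i}(x_m), \tag{41}$$

where $F^i(x)$ is the antecedent of rule $i$, $\mu_{F^i}(x)$ is the degree of membership of $x$ in $F$, and $\star$ denotes the t-norm operator. $\overline{\mu}_{F^i}(x)$ and $\underline{\mu}_{F^i}(x)$ are the upper and lower MFs of $\mu_{F^i}$. The fired output consequent set $\mu_{\tilde{\beta}^l}(y)$ of rule $R^l$ is the interval type-2 fuzzy set represented as,

$$\mu_{\tilde{\beta}^l}(y) = \int_{b^l \in \left[ \underline{f}^l \star \underline{\mu}_{\tilde{G}^l}(y),\ \overline{f}^l \star \overline{\mu}_{\tilde{G}^l}(y) \right]} 1/b^l, \quad y \in Y, \tag{42}$$

where $\underline{\mu}_{\tilde{G}^l}(y)$ and $\overline{\mu}_{\tilde{G}^l}(y)$ are the lower and upper membership grades of $\mu_{\tilde{G}^l}(y)$, and $\mu_{\tilde{\beta}}(y)$ is obtained as a combination of the fired output consequent sets, considering the union of the rule $R^l$ fired output consequent sets. The type reduction module maps the combined set into an interval of uncertainty, producing the output of the IT2FLS. The type reduction can now be expressed as follows using the Karnik-Mendel model [37, 38]:

$$y_r = \frac{\sum_{i=1}^{N} f_r^i\, y_r^i}{\sum_{i=1}^{N} f_r^i}, \tag{43}$$

and,

$$y_l = \frac{\sum_{i=1}^{N} f_l^i\, y_l^i}{\sum_{i=1}^{N} f_l^i}. \tag{44}$$

The defuzzification module is then used to defuzzify the interval set using the average of $y_r$ and $y_l$, yielding the output of the IT2FLS as,

$$y(x) = \frac{y_l + y_r}{2}. \tag{45}$$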
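The type-reduction and defuzzification steps in (43)–(45) can be illustrated with a compact Karnik-Mendel style iteration. This is a sketch under stated assumptions, not the Juzzy implementation: the function names are hypothetical, the consequent end points reuse the P1 and P2 peaks of the LI and HI grades from Table 1 as stand-ins for the consequent centroid intervals, and the firing intervals are made-up values. At least one rule is assumed to fire.

```python
def km_endpoint(centres, f_lower, f_upper, left, max_iter=100):
    """Iterate to the left (minimum) or right (maximum) end point of the type-reduced set."""
    # Start from the mid-points of the firing intervals.
    f = [(lo + hi) / 2 for lo, hi in zip(f_lower, f_upper)]
    y = sum(fi * ci for fi, ci in zip(f, centres)) / sum(f)
    for _ in range(max_iter):
        if left:
            # Upper firing strength for centres at or below y, lower strength above it.
            f = [hi if c <= y else lo for c, lo, hi in zip(centres, f_lower, f_upper)]
        else:
            # Lower firing strength for centres at or below y, upper strength above it.
            f = [lo if c <= y else hi for c, lo, hi in zip(centres, f_lower, f_upper)]
        total = sum(f)
        if total == 0:
            break
        y_new = sum(fi * ci for fi, ci in zip(f, centres)) / total
        if abs(y_new - y) < 1e-12:
            break
        y = y_new
    return y

def type_reduce_and_defuzzify(y_left, y_right, f_lower, f_upper):
    """Return (y_l, y_r, crisp output), the crisp output being their average as in (45)."""
    y_l = km_endpoint(y_left, f_lower, f_upper, left=True)
    y_r = km_endpoint(y_right, f_lower, f_upper, left=False)
    return y_l, y_r, (y_l + y_r) / 2

# Two fired rules with consequents LI and HI (assumed end points 68.5/73.5 and 78.5/83.5)
# and hypothetical firing intervals [0.2, 0.6] and [0.3, 0.8].
y_l, y_r, crisp = type_reduce_and_defuzzify(
    y_left=[68.5, 78.5], y_right=[73.5, 83.5],
    f_lower=[0.2, 0.3], f_upper=[0.6, 0.8],
)
```

With these inputs the left end point weights the LI rule at its upper firing strength and the HI rule at its lower one, while the right end point does the opposite, which is the switching behaviour that the Karnik-Mendel procedure searches for.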

An open-source toolkit: the Juzzyonline Fuzzy toolbox [36] (http://juzzy.wagnerweb.net/), created for the development and sharing of Type-1 and Type-2 fuzzy logic systems, was used to implement the proposed controller, where the input and output variables were shared according to their lower and upper values and used in controlling the models. UMF and LMF were also defined for CD4 count, RNA, and patient response parameters.

3.1.2.2. The MDS model
3.1.2.2.1. Model description

MDS is a data analysis technique that computes the relative positions of adjacent objects when mapping from a high-dimensional space to a low-dimensional space, with high error tolerance [39]. It is concerned with configuration recovery from distance (dissimilarity) matrices and makes data more understandable through visualisation. Projecting data into a lower dimensional space can serve two purposes. First, it eliminates irrelevant features, hence reducing noise that may affect the analysis. Second, it allows easy visualisation of the data in 2 or 3 dimensions, for better interpretation of “hidden” structures [40, 41]. We apply MDS in this paper to ensure that the distances between the learning exemplars are the best predictors and are efficient for classification. Although there are classical, metric, non-metric, and generalised MDS, the non-metric MDS was preferred in this paper. This class of MDS locates a configuration of points in some lower space whose pair-wise Euclidean distances have approximately the same rank order as the corresponding dissimilarities in the higher space [42]. Applying the non-metric MDS within the context of our problem, we reformulate the problem as follows: Let $X$ be an $m \times n$ matrix representing patients' unique TCEs in the higher space, $\mathbb{R}^{n}$; $Y$ be an $m \times p$ matrix representing the perturbed data in the lower space, $\mathbb{R}^{p}$; and $\Delta = [\delta_{ij}]$ be the dissimilarity matrix of $X$ for $i, j = \{1, \ldots, m\}$. The Euclidean distance, $d_{ij} = \sqrt{\sum_{k=1}^{m} (x_{ik} - x_{jk})^{2}}$, is a common measure used to describe the dissimilarity $\Delta = (\delta_{ij})$ between the TCE points $x_{i}$ and $x_{j}$. Then, $\delta_{ij} = f(d_{ij})$, where $f$ is a monotonic function such that $d_{ij} < d_{uv} \Rightarrow \delta_{ij} < \delta_{uv}$. Non-metric MDS orders the off-diagonal $\delta_{ij}$ such that $\delta_{i_{1} j_{1}} < \cdots < \delta_{i_{m} j_{m}}$, where $m = n(n-1)/2$, and seeks a fitted configuration $\hat{X} = (\hat{x}_{ik})$ in $p$ dimensions, such that the fitted distances $\hat{D} = (\hat{d}_{ij})$ (obtained by substituting $\hat{x}_{ik}$ and $\hat{x}_{jk}$ for $x_{ik}$ and $x_{jk}$ in a matrix of squared proximities, i.e., $P = \lVert x_{i} - x_{j} \rVert^{2}$) preserve the ordering, $\hat{d}_{i_{1} j_{1}} < \cdots < \hat{d}_{i_{m} j_{m}}$.
The squared stress is used to measure how the ordering of the elements between Δ and Dˆ differs. Thus,

$$S_{p}^{2}(\hat{X}) = \frac{\sum_{i<j} (d_{ij} - \hat{d}_{ij})^{2}}{\sum_{i<j} \hat{d}_{ij}^{2}}, \tag{46}$$

where $p$ denotes the number of dimensions of $\hat{X}$, and the denominator of (46) makes $S_{p}^{2}(\hat{X})$ invariant to uniform scaling. The square root is then taken to give the stress of fit statistic, as,

$$S_{p}(\hat{X}) = \left( \frac{\sum_{i<j} (d_{ij} - \hat{d}_{ij})^{2}}{\sum_{i<j} \hat{d}_{ij}^{2}} \right)^{1/2}. \tag{47}$$

The fitted configuration ($\hat{X}$) is iteratively adjusted and the stress $S_{p}(\hat{X})$ recalculated, starting with an initial configuration, $\hat{X}^{(0)}$, in $p$-dimensional space, using the method of steepest descent to minimise $S_{p}(\hat{X})$. When $S_{p}(\hat{X}) = 0\%$, the fitted configuration ($\hat{X}$) is identical to the original configuration. On reaching an acceptable stress of fit statistic, the obtained response set is fed to the DNN for learning and optimisation purposes.
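
To make the stress of fit statistic concrete, the following Python sketch evaluates Eq. (47) for a given original configuration and a candidate fitted configuration. It is an illustration only: the function names are hypothetical, the original-space quantities are taken to be plain Euclidean distances, and no optimisation (steepest descent) step is included.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distances for all pairs i < j, returned as a flat vector."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(points), k=1)
    return dist[i, j]


def stress(original, fitted):
    """Stress of fit as in Eq. (47): sqrt(sum (d - d_hat)^2 / sum d_hat^2)."""
    d = pairwise_distances(original)
    d_hat = pairwise_distances(fitted)
    return float(np.sqrt(((d - d_hat) ** 2).sum() / (d_hat ** 2).sum()))


if __name__ == "__main__":
    # Hypothetical points that lie in a plane embedded in 3-D space.
    X = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 4.0, 0.0], [1.0, 1.0, 0.0]])
    print(stress(X, X[:, :2]))  # dropping the constant axis loses no distance information
```

A stress of zero means that the low-dimensional distances reproduce the original ones exactly; larger values indicate a poorer fit.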

3.1.3. Patient response optimisation

3.1.3.1. DNN model description

In this section, DNN is used to model the current problem by revealing the interactions between the input data samples for optimal patient response prediction. A DNN is a hierarchical model where each layer applies a linear transformation followed by a non-linearity to the output of the preceding layer. Let $X \in \mathbb{R}^{N \times D}$ represent the neural inputs obtained from the fuzzy interface (initiated by the learning algorithm), with each row of $X$ being a $D$-dimensional data point. For the sake of simplicity, we assume that the datasets lie in $\mathbb{R}$; and $N$ is the number of training exemplars. Also, let $W^{k} \in \mathbb{R}^{d_{k-1} \times d_{k}}$ be a matrix representing a linear transformation applied to the output of layer $k-1$, $X_{k-1} \in \mathbb{R}^{N \times d_{k-1}}$, to produce a $d_{k}$-dimensional term $X_{k-1} W^{k} \in \mathbb{R}^{N \times d_{k}}$ at layer $k$. Suppose $\psi_{k}: \mathbb{R} \to \mathbb{R}$ is a non-linear activation function, e.g., a sigmoid, $\psi_{k}(x) = (1 + e^{-x})^{-1}$, a hyperbolic tangent, $\psi_{k}(x) = \tanh(x)$, or a rectified linear unit, $\psi_{k}(x) = \max\{0, x\}$; then the activation function can be applied to each entry of $X_{k-1} W^{k}$ to generate the $k$th layer of a neural network as $X_{k} = \psi_{k}(X_{k-1} W^{k})$, and the output $X_{K}$ of the network becomes:

$$\gamma(X, W^{1}, W^{2}, \ldots, W^{K}) = \psi_{K}(\psi_{K-1}(\cdots \psi_{2}(\psi_{1}(X W^{1}) W^{2}) \cdots W^{K-1}) W^{K}). \tag{48}$$

Notice that $\gamma$ is an $N \times C$ matrix, where $C = d_{K}$ is the output dimension of the network, which equals the number of classes of a classification problem. As such, we can view $\gamma$ as a map applied to the input data $X$ with fixed weights, $W$. In this paper, our optimisation system uses the sigmoid activation function.
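
The layer-by-layer composition in Eq. (48) can be sketched in Python as below. This is a minimal illustration rather than the MATLAB model used in this study: the layer widths, the random weights and the function names are assumptions chosen for the example, and, following Eq. (48), no bias terms are included.

```python
import numpy as np


def sigmoid(z):
    """Element-wise sigmoid activation, (1 + exp(-z))^-1."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(X, weights):
    """Apply X_k = sigmoid(X_{k-1} W^k) for each weight matrix in turn."""
    out = np.asarray(X, dtype=float)
    for W in weights:
        out = sigmoid(out @ W)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical sizes: D = 6 inputs, two hidden layers, C = 5 output classes.
    sizes = [6, 5, 4, 5]
    weights = [rng.normal(size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
    X = rng.normal(size=(3, sizes[0]))  # N = 3 data points
    print(forward(X, weights).shape)    # one row per data point, one column per class
```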

3.1.3.1.1. Global optimality

Consider the problem of learning the parameters $W = \{W^{k}\}_{k=1}^{K}$ of a DNN from $N$ training exemplars $(X, Y)$. In the classification setting, suppose the problem has $C$ target classes, where each row of $X \in \mathbb{R}^{N \times D}$ denotes a data point in $\mathbb{R}^{D}$ and each row of $Y \in \{0, 1\}^{N \times C}$ denotes the membership of each data point in one of the $C$ classes; then $Y_{jc} = 1$ iff the $j$th row of $X$ belongs to class $c \in \{1, 2, \ldots, C\}$; otherwise, $Y_{jc} = 0$. The learning problem can be formalised as an optimisation problem, thus:

$$\min_{\{W^{k}\}_{k=1}^{K}} \ell(Y, \gamma(X, W^{1}, W^{2}, \ldots, W^{K})) + \lambda \beta(W^{1}, W^{2}, \ldots, W^{K}), \tag{49}$$

where $\ell(Y, \gamma)$ is the loss function that measures the agreement between the true output, $Y$, and the predicted output, $\gamma(X, W)$; $\beta$ is the regularisation (or normalisation) function used to prevent overfitting; and $\lambda$ weights the regularisation term.

3.1.3.1.2. Universal approximation
Theorem 1

[41]: Let $P(\cdot)$ be a bounded, non-constant continuous function, let $I_{m}$ denote the $m$-dimensional hypercube, and let $C(I_{m})$ denote the space of continuous functions on $I_{m}$. Given any $f \in C(I_{m})$ and $\varepsilon > 0$, there exist $N > 0$ and $v_{i}, w_{i}, b_{i}$, $i = 1, \ldots, N$, such that $F(x) = \sum_{i \leq N} v_{i} P(w_{i}^{T} x + b_{i})$ satisfies $\sup_{x \in I_{m}} |f(x) - F(x)| < \varepsilon$.

Theorem 1 guarantees that even a single hidden layer network can represent any classification problem in which the boundary is locally linear (or smooth), but it gives no clue about which architectures are good or bad, or how they relate to the optimisation problem.

Theorem 2

The mean integrated square error between the estimated network $\hat{F}$ and the target function $f$ is bounded by $O\left(\frac{C_{f}^{2}}{N}\right) + O\left(\frac{N m}{K} \log K\right)$, where $K$ denotes the number of training points, $N$ is the number of neurons, $m$ is the input dimension, and $C_{f}$ measures the global smoothness of $f$.

3.1.3.1.3. Generalisation error

Consider a classification problem with data point $X \in \mathcal{X} \subseteq \mathbb{R}^{D}$, corresponding to class label $Y \in \mathcal{Y}$. The training set of $N$ samples drawn from a distribution $Q$ is given as $\varphi_{N} = \{X_{i}, Y_{i}\}_{i=1}^{N}$, and the loss function, denoted $\ell(Y, \gamma(X, W))$, measures the discrepancy between the true label $Y$ and the label estimated by the classifier. The empirical loss of the network $\gamma(\cdot, W)$ associated with the training set $\varphi_{N}$ is defined as [43]:

$$\ell_{emp}(\gamma) = \frac{1}{N} \sum_{X_{i} \in \varphi_{N}} \ell(Y_{i}, \gamma(X_{i}, W)), \tag{50}$$

and its expected loss is given as,

$$\ell_{exp}(\gamma) = \mathbb{E}_{(X, Y) \sim Q}[\ell(Y, \gamma(X, W))]. \tag{51}$$

From the above definitions, the generalisation error becomes,

$$GE(\gamma) = |\ell_{exp}(\gamma) - \ell_{emp}(\gamma)|. \tag{52}$$

The loss function for deep learning of our supervised classification problem is the empirical cross entropy, given as:

$$\bar{\ell}(W) = \mathbb{E}_{q(X, Y)}(-\log \gamma(X, W)). \tag{53}$$

Eq. (53) is, however, prone to over-fitting, as the network may trivially memorise the training data instead of learning the underlying distribution. This problem can be addressed using regularisation (normalisation), which may be explicit or implicit in stochastic gradient descent.
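
The empirical cross entropy and the generalisation error can be illustrated with a short Python sketch. It is not the training code used in this study. The function names are hypothetical, and because the expected loss over the true distribution cannot be computed directly, the sketch assumes that the loss on a held-out set stands in for it.

```python
import numpy as np


def cross_entropy(Y, P, eps=1e-12):
    """Mean of -log(probability assigned to the true class); Y holds one-hot rows."""
    Y = np.asarray(Y, dtype=float)
    P = np.clip(np.asarray(P, dtype=float), eps, 1.0)  # avoid log(0)
    return float(-(Y * np.log(P)).sum(axis=1).mean())


def generalisation_gap(train_loss, heldout_loss):
    """Absolute difference between the held-out and training losses (proxy for GE)."""
    return abs(heldout_loss - train_loss)


if __name__ == "__main__":
    # Hypothetical two-class predictions for two training and two held-out samples.
    Y = [[1, 0], [0, 1]]
    train = cross_entropy(Y, [[0.9, 0.1], [0.2, 0.8]])
    heldout = cross_entropy(Y, [[0.6, 0.4], [0.5, 0.5]])
    print(train, heldout, generalisation_gap(train, heldout))
```

A training loss far below the held-out loss gives a large gap, which is the signature of the over-fitting discussed above.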

4. Results

4.1. Experiment HIV database analysis

4.1.1. CD4 and RNA visualisation

The CD4 count and RNA give healthcare providers important clues about the following: immune system health, HIV progression, body response to HIV therapy, and virus response to the therapy. In this section, we analyse the empirical datasets of both the Stanford HIV and Akwa-Ibom HIV databases for possible cluster variations and outlier effects. Fig. 4 presents a visualisation of the effect of TCE on CD4 count. CD4 count indicates the robustness of the immune system. There are enhanced immunological changes, shown by higher cluster heights (an indication of low opportunistic infection), in the Akwa-Ibom HIV database compared to the Stanford database, where changes occurred at a slower rate. Furthermore, more data points in the Akwa-Ibom HIV database clustered far above the baseline CD4, as opposed to the Stanford HIV database, which had about 40% of the data points clustering below the baseline CD4. The Stanford effect is likely connected with the fact that the database is made up of patients with resistant exemplars, or who had failed treatment due to a high level of drug resistance and are being monitored using new drug regimens.

Fig. 4.

Effect of treatment change episode on CD4 count for Akwa-Ibom HIV and Stanford HIV databases.

The RNA (viral load) is inversely related to the CD4 count. A low RNA indicates relatively few copies of HIV in the bloodstream and points to a working HIV therapy. If treatment fails and the RNA levels rebound, the CD4 count will start dropping gradually (within a few weeks) in response. A visualisation of the RNA variations is presented in Fig. 5. We observe an escalating baseline RNA curve in the Akwa-Ibom HIV database, compared to the Stanford database, whose RNA curve appears to be increasingly steady. The escalating trend of the BRNA curve implies that the Akwa-Ibom HIV patients included more cases of advanced (undiagnosed) HIV, or patients who may have ignored early warning signs of the disease. Although FRNA values for both databases showed reduced side effects or adverse reactions in some patients compared with others, with outlier effects, the recovery or improvement rate appears to be more rapid among Akwa-Ibom HIV patients, as over 60% of RNA values were below 2 ($10^{2}$ copies).

Fig. 5.

Effect of treatment change episode on RNA copies for Akwa-Ibom HIV and Stanford HIV databases.

4.1.2. Patient response inference analysis

A statistic of the fuzzy-MDS patient response inference based on the output membership grades is presented in Table 3. We observe that the Akwa-Ibom database has the fewest patients with relatively high interaction (HI and VHI): 90 patients (6.9178%). This is consistent with the healthcare managers’ claim that only a few acute cases of failed treatment were recorded, but further statistical evidence is required to confirm the significance of their claim with respect to the HIV population under study. In contrast, the Stanford (our reference) database has 618 patients (40.6312%) with relatively high interaction, which agrees with the purpose of the database (cases of patients with failed treatment). As regards the level of positive response to treatment, the Akwa-Ibom HIV database showed the highest response rate, with 841 (64.6426%) patients having relatively low interaction (LI and VLI) and 370 (28.4397%) patients having no interaction. The Stanford database showed relatively low interaction and no interaction in 664 (43.6555%) and 239 (15.7133%) patients, respectively, indicating the accuracy of our fuzzy-MDS-DNN controller at modelling the databases.

Table 3.

Analysis of patient response inference.

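| Membership grade | Stanford: TCE | Stanford: unique patient IDs | Stanford: % | Akwa-Ibom: TCE | Akwa-Ibom: unique patient IDs | Akwa-Ibom: % |
|---|---|---|---|---|---|---|
| VHI | 940 | 248 | 16.3051 | 75 | 25 | 1.9216 |
| HI | 1454 | 370 | 24.3261 | 195 | 65 | 4.9962 |
| LI | 1546 | 402 | 26.4300 | 883 | 294 | 22.5980 |
| VLI | 962 | 262 | 17.2255 | 1640 | 547 | 42.0446 |
| NI | 878 | 239 | 15.7133 | 1110 | 370 | 28.4397 |
| Total | 5780 | 1521 | 100 | 3903 | 1301 | 100 |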
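
The percentages quoted in the text can be reproduced from the unique patient counts in Table 3. The short Python sketch below does this; the dictionary and function names are chosen for illustration only.

```python
# Unique patient counts per membership grade, copied from Table 3.
stanford = {"VHI": 248, "HI": 370, "LI": 402, "VLI": 262, "NI": 239}
akwa_ibom = {"VHI": 25, "HI": 65, "LI": 294, "VLI": 547, "NI": 370}


def combined_share(counts, grades):
    """Percentage of unique patients whose grade is in `grades`."""
    total = sum(counts.values())
    return 100.0 * sum(counts[g] for g in grades) / total


if __name__ == "__main__":
    for name, counts in (("Stanford", stanford), ("Akwa-Ibom", akwa_ibom)):
        high = combined_share(counts, ["VHI", "HI"])
        low = combined_share(counts, ["LI", "VLI"])
        none = combined_share(counts, ["NI"])
        print(f"{name}: high {high:.4f}%, low {low:.4f}%, none {none:.4f}%")
```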

4.2. DNN optimisation

It has been proven that multilayer perceptrons (MLPs) with only one hidden layer are universal function approximators [44]. MLPs are neural networks with multiple parallel node-layer topologies that utilise a supervised learning technique known as backpropagation. Hence, identifying a topology that best drives the problem is important. In this paper, MATLAB 2017a was used to model the classification system as a pattern recognition problem driven by the MLP architecture. A pattern recognition network model called patternnet, applying two training algorithms, the Levenberg-Marquardt (trainlm) and Resilient Backpropagation (trainrp) algorithms, was adopted for this purpose. The performance of the various algorithms can be affected by the accuracy required of the approximation, but applications of the training algorithms in the literature have shown that the trainlm algorithm is the fastest, although it has a larger storage requirement. The experiment databases were divided in the ratio 80:10:10 for training, validation, and testing, respectively, using a randomised approach that assigns every sample to one of the three subsets. Relevant features that served as inputs to the DNN were baseline CD4 (BCD4), followup CD4 (FCD4), baseline viral load (BRNA), followup viral load (FRNA), drug type combination (DType), and the error-pruned patient response inference set. Five target classes (C1–C5) were created following the output membership grades of the fuzzy-logic system (C1 = VHI, C2 = HI, C3 = LI, C4 = VLI, and C5 = NI) to predict the clustering patterns. Table 4 and Table 5 show sample input and target class data of 20 patients for the Stanford HIV and Akwa-Ibom HIV databases, respectively.
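
The 80:10:10 randomised division and the five-class target encoding can be sketched in Python as follows. This is an illustration, not the MATLAB routine used in this study; the function names, the fixed seed and the rounding of subset sizes are assumptions.

```python
import numpy as np

GRADES = ["VHI", "HI", "LI", "VLI", "NI"]  # target classes C1..C5


def split_indices(n_samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Randomly assign every sample index to the train, validation or test subset."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(ratios[0] * n_samples))
    n_val = int(round(ratios[1] * n_samples))
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]


def one_hot(grade):
    """Encode a membership grade as the target row [C1, C2, C3, C4, C5]."""
    return [1 if g == grade else 0 for g in GRADES]


if __name__ == "__main__":
    train, val, test = split_indices(5780)  # number of Stanford TCEs
    print(len(train), len(val), len(test))
    print(one_hot("LI"))
```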

Table 4.

Input linguistic variables and target classes for Stanford database.

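Columns BCD4 to PR are the input linguistic variables; columns C1 to C5 are the target classes.

| PID | BCD4 | FCD4 | BRNA | FRNA | DType | PR | C1 | C2 | C3 | C4 | C5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 330 | 347 | 3.7 | 3.3 | 3TC + ABC + ATC + AZT + RTV + TDF | 32.67 | 0 | 0 | 1 | 0 | 0 |
| 2 | 38 | 53 | 5.7 | 5.3 | D4T + DDI + NVP | 30.00 | 1 | 0 | 0 | 0 | 0 |
| 3 | 949 | 987 | 3.7 | 4.4 | D4T + DDI + EFV | 71.00 | 0 | 0 | 0 | 1 | 0 |
| 4 | 281 | 334 | 3.6 | 3.9 | D4T + DDI + EFV | 30.82 | 0 | 0 | 1 | 0 | 0 |
| 5 | 288 | 426 | 3.9 | 3.6 | ABC + D4T + EFV | 39.27 | 0 | 0 | 0 | 1 | 0 |
| 6 | 470 | 459 | 4.1 | 3.2 | 3TC + D4T + DDI + LPV | 48.75 | 0 | 0 | 0 | 1 | 0 |
| 7 | 694 | 717 | 3.7 | 4.8 | D4T + EFV + NFV | 50.34 | 0 | 0 | 0 | 1 | 0 |
| 8 | 37 | 50 | 5.3 | 4.9 | ABC + EFV + RTV + SQV | 30.00 | 1 | 0 | 0 | 0 | 0 |
| 9 | 242 | 358 | 4.9 | 3.8 | 3TC + DDI + RTV + SQV | 31.65 | 0 | 0 | 1 | 0 | 0 |
| 10 | 213 | 274 | 3.7 | 2 | 3TC + ABC + AZT + TDF | 50.00 | 0 | 0 | 1 | 0 | 0 |
| 11 | 88 | 149 | 5.3 | 4.2 | ABC + D4T + EFV + NFV | 30.00 | 1 | 0 | 0 | 0 | 0 |
| 12 | 316 | 403 | 4 | 4 | D4T + DDI + RTV + SQV | 35.28 | 0 | 0 | 0 | 1 | 0 |
| 13 | 50 | 105 | 5.1 | 4.7 | APV + D4T + EFV + RTV | 30.00 | 1 | 0 | 0 | 0 | 0 |
| 14 | 102 | 159 | 4.9 | 3.9 | 3TC + APV + D4T + DDI + RTV | 30.00 | 0 | 1 | 0 | 0 | 0 |
| 15 | 103 | 231 | 3.7 | 3.5 | 3TC + D4T + DDI + FPV + RTV | 30.00 | 0 | 0 | 1 | 0 | 0 |
| 16 | 72 | 159 | 5.1 | 3.4 | AZT + DDI + LPV | 31.20 | 0 | 1 | 0 | 0 | 0 |
| 17 | 109 | 258 | 4.9 | 1.9 | DRV + FTC + RTV + T20 + TDF | 50.00 | 0 | 1 | 0 | 0 | 0 |
| 18 | 169 | 213 | 4.1 | 4.1 | 3TC + D4T + RTV + SQV | 30.00 | 1 | 0 | 0 | 0 | 0 |
| 19 | 212 | 381 | 4.5 | 1.9 | 3TC + D4T + EFV + NFV | 50.00 | 0 | 0 | 1 | 0 | 0 |
| 20 | 315 | 352 | 4.7 | 3.1 | D4T + DDI + IDV + RTV | 38.61 | 0 | 0 | 1 | 0 | 0 |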

Table 5.

Input linguistic variables and target classes for Akwa-Ibom database.

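Columns BCD4 to PR are the input linguistic variables; columns C1 to C5 are the target classes.

| PID | BCD4 | FCD4 | BRNA | FRNA | DType | PR | C1 | C2 | C3 | C4 | C5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 440 | 388 | 4.1 | 3.7 | TDF + 3TC + EFV | 41.92 | 0 | 0 | 0 | 1 | 0 |
| 2 | 429 | 765 | 4.1 | 2.4 | TDF + 3TC + EFV | 57.80 | 0 | 0 | 0 | 1 | 0 |
| 3 | 307 | 354 | 4.4 | 1.3 | TDF + 3TC + EFV | 52.74 | 0 | 0 | 1 | 0 | 0 |
| 4 | 17 | 675 | 4.4 | 1.3 | AZT + 3TC + NVP | 55.00 | 0 | 0 | 1 | 0 | 0 |
| 5 | 291 | 625 | 4.4 | 1.3 | TDF + 3TC + EFV | 53.68 | 0 | 0 | 1 | 0 | 0 |
| 6 | 180 | 380 | 3.4 | 1.3 | TDF + 3TC + EFV | 53.56 | 0 | 0 | 0 | 1 | 0 |
| 7 | 240 | 400 | 3.1 | 1.3 | TDF + 3TC + EFV | 55.16 | 0 | 0 | 0 | 1 | 0 |
| 8 | 315 | 601 | 4.1 | 1.3 | TDF + 3TC + EFV | 53.68 | 0 | 0 | 1 | 0 | 0 |
| 9 | 163 | 875 | 4.1 | 1.9 | TDF + 3TC + EFV | 65.57 | 0 | 0 | 1 | 0 | 0 |
| 10 | 238 | 642 | 4.1 | 3.7 | TDF + 3TC + EFV | 50.00 | 0 | 0 | 1 | 0 | 0 |
| 11 | 362 | 689 | 4.2 | 2.1 | AZT + 3TC + NVP | 51.94 | 0 | 0 | 0 | 1 | 0 |
| 12 | 156 | 512 | 5.4 | 2.8 | TDF + 3TC + EFV | 50.00 | 0 | 0 | 1 | 0 | 0 |
| 13 | 28 | 52 | 6.3 | 1.6 | TDF + 3TC + EFV | 50.00 | 0 | 1 | 0 | 0 | 0 |
| 14 | 217 | 502 | 6.1 | 5 | TDF + 3TC + EFV | 50.00 | 0 | 1 | 0 | 0 | 0 |
| 15 | 230 | 763 | 5.1 | 1.8 | TDF + 3TC + EFV | 51.78 | 0 | 0 | 1 | 0 | 0 |
| 16 | 415 | 371 | 3.4 | 1.3 | AZT + 3TC + NVP | 56.17 | 0 | 0 | 0 | 0 | 1 |
| 17 | 286 | 842 | 3.4 | 1.7 | TDF + 3TC + EFV | 60.10 | 0 | 0 | 0 | 1 | 0 |
| 18 | 494 | 657 | 3.3 | 2.1 | TDF + 3TC + EFV | 68.44 | 0 | 0 | 0 | 0 | 1 |
| 19 | 266 | 319 | 3.2 | 2 | TDF + 3TC + EFV | 50.82 | 0 | 0 | 1 | 0 | 0 |
| 20 | 158 | 266 | 4.3 | 3.1 | AZT + 3TC + NVP | 37.97 | 0 | 1 | 0 | 0 | 0 |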

To accelerate training, the network was evaluated using a 5-layer (structure) configuration, each layer representing a target class with the number of neurons (in our case, the maximum number of input variables) distributed in a dropout fashion, resulting in the configuration ([6 5 4 3 2]). The evaluation metrics selected as indicated in Tables 6, 7, 8, 9, include:

Table 6.

Classification results for Stanford database with multidimensional scaling.

| No. of layers | Neuron config. | Train alg. | R-value | Overall MSE | Val MSE | Test MSE | Gradient | TPR | FPR | Class Acc. |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | [6 5 4 3 2] | Trainlm | 0.9545 | 0.0133 | 0.0146 | 0.0108 | 0.0253 | 0.9625 | 0.0108 | 0.9700 |
| 5 | [6 5 4 3 2] | Trainrp | 0.8667 | 0.0395 | 0.0461 | 0.0368 | 0.0277 | 0.8635 | 0.0362 | 0.8630 |

Bold signifies performance metric values that meet the defined threshold of this study.

Table 7.

Classification results of Stanford database without multidimensional scaling.

| No. of layers | Neuron config. | Train alg. | R-value | Overall MSE | Val MSE | Test MSE | Gradient | TPR | FPR | Class Acc. |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | [6 5 4 3 2] | Trainlm | 0.9177 | 0.0257 | 0.0249 | 0.0222 | 0.0308 | 0.9366 | 0.0188 | 0.9323 |
| 5 | [6 5 4 3 2] | Trainrp | 0.8120 | 0.0552 | 0.0545 | 0.0508 | 0.0210 | 0.8431 | 0.0419 | 0.8440 |

Bold signifies performance metric values that meet the defined threshold of this study.

Table 8.

Classification results of Akwa-Ibom database with multidimensional scaling.

| No. of layers | Neuron config. | Train alg. | R-value | Overall MSE | Val MSE | Test MSE | Gradient | TPR | FPR | Class Acc. |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | [6 5 4 3 2] | Trainlm | 0.9871 | 0.0038 | 0.0040 | 0.0070 | 0.0364 | 0.9974 | 0.0026 | 0.9887 |
| 5 | [6 5 4 3 2] | Trainrp | 0.8323 | 0.0473 | 0.0527 | 0.0612 | 0.0224 | 0.9554 | 0.0446 | 0.8391 |

Bold signifies performance metric values that meet the defined threshold of this study.

Table 9.

Classification results of Akwa-Ibom database without multidimensional scaling.

| No. of layers | Neuron config. | Train alg. | R-value | Overall MSE | Val MSE | Test MSE | Gradient | TPR | FPR | Class Acc. |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | [6 5 4 3 2] | Trainlm | 0.9248 | 0.0236 | 0.0022 | 0.0207 | 0.0357 | 0.7872 | 0.0185 | 0.9244 |
| 2 | [6 5 4 3 2] | Trainrp | 0.8009 | 0.0569 | 0.0579 | 0.0604 | 0.0234 | 0.8159 | 0.0501 | 0.8252 |

Bold signifies performance metric values that meet the defined threshold of this study.

Regression value (R-value): A coefficient that measures the relationship between the network outputs and the targets, indicating how close the model outputs are to the actual target values. Perfect training would yield identical network outputs and targets, but this relationship is never perfect in practice.

Mean Squared Error (MSE): A loss function averaged over the entire dataset, which measures the distance between the predicted and true outputs. The lower the MSE, the better the performance.

Gradient: The direction and magnitude (slope) used in calculating the network weight updates; it is commonly used to train deep neural networks.

True Positive Rate (TPR): Also called sensitivity; measures, in our case, the proportion of patients with failed treatment or adverse drug reaction who are correctly identified as such.

False Positive Rate (FPR): Measures, in our case, the proportion of patients without failed treatment or adverse drug reaction who are incorrectly identified as having it. It equals 1 - specificity (true negative rate, TNR).

Classification Accuracy (Class Acc.): Measures, in our case, the proportion of correctly classified patient responses.
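
The TPR, FPR and classification accuracy definitions above can be illustrated with a short Python sketch. It treats one class as "positive" and all others as "negative" (a one-vs-rest reading); how the per-class rates were aggregated in this study is not stated, so that choice, the function names and the toy labels are assumptions.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def one_vs_rest_rates(cm, positive):
    """Return (TPR, FPR) when class `positive` is treated as the positive class."""
    tp = cm[positive, positive]
    fn = cm[positive].sum() - tp
    fp = cm[:, positive].sum() - tp
    tn = cm.sum() - tp - fn - fp
    return tp / (tp + fn), fp / (fp + tn)


def accuracy(cm):
    """Proportion of samples on the diagonal of the confusion matrix."""
    return np.trace(cm) / cm.sum()


if __name__ == "__main__":
    # Hypothetical labels for six samples and three classes.
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]
    cm = confusion_matrix(y_true, y_pred, 3)
    print(one_vs_rest_rates(cm, positive=0), accuracy(cm))
```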

A performance value of at least 0.9 (90%) was considered an acceptable threshold for the R-value, TPR and classification accuracy metrics, and at most 0.01 (1%) for the FPR and MSE metrics. Significant performances are indicated in bold font type. Overall, the classification results reveal that the trainlm algorithm gave better optimised predictions than the trainrp algorithm for the Stanford HIV and Akwa-Ibom HIV databases with multidimensional scaling, demonstrating the effectiveness of error pruning before learning. As can be seen in Tables 6 and 8, the introduction of multidimensional scaling produced the lowest MSEs and the highest classification accuracies, indicating good validation and test predictors as well as datasets. Classification performance, however, degraded when the proposed controller was run without multidimensional scaling (see Tables 7 and 9). The implication here is that inference from expert knowledge is invaluable to improved system prediction, and our classification learner was at its best in modelling both HIV databases.

A study of the Receiver Operating Characteristics (ROC) curve (a plot of the TPR vs. FPR for the different possible cut-points of a diagnostic test) shows that the test data of the Akwa-Ibom HIV database for fuzzy-PRI with MDS response inference yielded more accurate results than the Stanford HIV database, as (TPR, FPR) values for the two databases were (0.9974, 0.0026) and (0.9626, 0.0108), respectively. However, the ROC curves for fuzzy-PRI without MDS showed less accurate test results for the Akwa-Ibom HIV database, with (TPR, FPR) values of (0.7872, 0.0185), compared to the Stanford HIV database, with (TPR, FPR) values of (0.9366, 0.0188). Despite the poor test result reported for the Akwa-Ibom HIV database, our classification learner still maintained a classification accuracy of 92.44%. The results indicate that the Stanford HIV database appears to be more robust to test errors even without MDS. Moreover, the Stanford database contains varying TCEs for individual patients, whereas the Akwa-Ibom database had only a single regimen of one drug combination and TCE per patient, and no further therapeutic action was recorded even when patients showed failed treatment. Hence, we call for a followup of treatment on patients with low patient response inference (i.e., high and very high interaction cases), as this parameter is a pointer to failed treatments and establishes drug patterns with multi-drug resistance. This also explains why the Stanford database could perform well without expert response inference, as the experiment data contains treatment failure cases. Our classification learner did not over-fit, as the observed gradients neither vanished nor exploded. Vanishing and exploding gradients are two major obstacles in training DNNs and can result in unstable networks that at best cannot learn from the training data, or at worst produce NaN weights, abruptly terminating updates of the weight values. This difficulty is noticeable when training artificial neural networks with gradient-based learning methods and backpropagation. The current optimal results can be attributed to the error-pruning process achieved during the patient response modelling.

5. Discussion

Recent models have shown robust prediction performance, with the viral load emerging as the most important variable [45]. This finding has indirectly prompted support for the use of viral load monitoring for improved HAART in resource-limited settings such as Sub-Saharan Africa. In [45], data from over 3000 TCEs collected from clinics in North America, Europe, Japan and Australia were used to train a single random forest (RF) system. The results showed a virological response prediction accuracy of 82% among 100 independent test cases using baseline variables (including genotype). Further findings showed that new models that excluded genotype information were able to predict virological responses for the same test set with a slightly lower accuracy of 78%. Hence, other clinically related data, such as treatment history and drug information, also need to be tried, as they may partially compensate for the absence of genotype data. Encouraged by this result, [45] set out to develop models that were more relevant to clinical practice in a resource-limited context, by selecting TCEs that involved drugs commonly administered in these countries and training two RF models with over 8000 TCEs without the use of genotype data. Both models predicted virological response with an accuracy of about 82%. The emergence of new antiretroviral drugs has caused continual modification of the HIV/AIDS treatment guidelines, hence demanding a treatment-decision system capable of self-learning [46]. In [46], a self-learning HIV/AIDS regimen selection system for combined antiretroviral therapy of first-round HIV/AIDS treatment was developed, considering 32 associated treatment objectives involving four major clinical variables (potency, adherence, adverse effects and future drug options). The prediction accuracy was between 84.4% and 100% with reduced treatment objectives, but higher mean prediction accuracies of 94%–97% were obtained when all the treatment objectives were learned. A comparison with our proposed framework shows that improved prediction accuracies of 97% for the Stanford database and 98.87% for the Akwa-Ibom database were obtained, indicating that improved accuracy can be achieved through efficient error-pruning mechanisms and robust learning algorithms.

6. Conclusions

The performance of machine learning heavily rests on sound knowledge acquisition techniques from available domain experts. This paper employed a fuzzy-multidimensional deep learning framework that combines the strengths of machine learning and multidimensional scaling tools, to facilitate intuitive knowledge elicitation – by transforming domain knowledge into a set of rules that drives the accurate classification of HIV patient response to ART. The proposed controller was able to deal with uncertainty caused by inconsistent input datasets and incomplete domain knowledge, where fuzzy rules were continually fine-tuned for two experiment databases (Stanford and Akwa-Ibom datasets), with patient response re-scaled to reduce the high error-prone datasets typical of real-world data. A deep learning optimisation of trained exemplars using two learning algorithms was then performed to efficiently optimise the datasets. Results obtained showed that inference from expert knowledge (labelled data) as well as the introduction of multidimensional scaling are invaluable for the efficient classification of HIV patient response to ART.

The limitation of this research is the high computational cost of the type-reduction process, typical of the iterative Karnik-Mendel algorithm and the application of a supervised approach to selecting the desired features. Hence, a future direction of this research targets the realisation of an efficient expert system, towards personalised therapy. Furthermore, two major areas of application can be identified: (i) an effective technique for the accurate generation of features, as unsupervised deep learning models (capable of creating features) will expand the horizons of processing new data in limited environments; (ii) exploring enhanced techniques to speed up the type-reduction process or eliminate the high computational cost inherent in existing methods will enhance the system's portability to affordable devices and real-time access to information.

Declarations

Author contribution statement

Moses Ekpenyong: Conceived and designed the experiments; Performed the experiments; Analyzed and interpreted the data; Wrote the paper.

Philip Etebong: Performed the experiments; Analyzed and interpreted the data; Wrote the paper.

Tenderwealth Jackson: Contributed reagents, materials, analysis tools or data; Wrote the paper.

Funding statement

This research is funded by the Tertiary Education Trust Fund (TETFund) of Nigeria Research grant.

Competing interest statement

The authors declare no conflict of interest.

Additional information

No additional information is available for this paper.

Acknowledgements

We acknowledge the anonymous reviewers for their invaluable comments that have contributed to improving the quality of this paper.

References

  • 1.Sathiyavathi S., Pugazhendy K. Assessments of haematological parameters in hiv patients present in and around salem district tamilnadu, India. Int. J. Mod. Res. Rev. 2014;2(11):501–504. [Google Scholar]
  • 2.Kathuria S., Bagga P.K., Malhotra S. Hematological manifestations in HIV infected patients and correlation with CD4 counts and anti retroviral therapy. J. Contemp. Med. Res. 2016;3(12):3495–3498. [Google Scholar]
  • 3.Kumari G., Singh R.K. Highly active antiretroviral therapy for treatment of HIV/AIDS patients: current status and future prospects and the Indian scenario. HIV AIDS Rev. 2012;11(1):5–14. [Google Scholar]
  • 4.Duro R., Rocha-Pereira N., Figueiredo C., Piñeiro C., Caldas C., Serrão R., Sarmento A. Routine CD4 monitoring in HIV patients with viral suppression: is it really necessary? A Portuguese cohort. J. Microbiol. Immunol. Infect. 2017;51(5):593–597. doi: 10.1016/j.jmii.2016.09.003. [DOI] [PubMed] [Google Scholar]
  • 5.Cihlar T., Fordyce M. Current status and prospects of HIV treatment. Curr. Opin. Virol. 2016;18:50–55. doi: 10.1016/j.coviro.2016.03.004. [DOI] [PubMed] [Google Scholar]
  • 6.Hurt C.B., Eron J.J., Jr., Cohen M.S. Pre-exposure prophylaxis and antiretroviral resistance: HIV prevention at a cost? Clin. Infect. Dis. 2011;53(12):1265–1270. doi: 10.1093/cid/cir684. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Baggaley R.F., Powers K.A., Boily M.C. What do mathematical models tell us about the emergence and spread of drug-resistant HIV? Curr. Opin. HIV AIDS. 2011;6(2):131–140. doi: 10.1097/COH.0b013e328343ad03. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Supervie V., Barrett M., Kahn J.S., Musuka G., Moeti T.L., Busang L., Blower S. Modeling dynamic interactions between pre-exposure prophylaxis interventions and treatment programs: predicting HIV transmission and resistance. Sci. Rep. 2011;1(185):1–11. doi: 10.1038/srep00185. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Bershteyn A., Eckhoff P.A. A model of HIV drug resistance driven by heterogeneities in host immunity and adherence patterns. BMC Syst. Biol. 2013;7(11):1–15. doi: 10.1186/1752-0509-7-11. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Zazzi M., Cozzi-Lepri A., Prosperi M.C. Computer-aided optimization of combined anti-retroviral therapy for HIV: new drugs, new drug targets and drug resistance. Curr. HIV Res. 2016;14(2):101–109. doi: 10.2174/1570162x13666151029102254. [DOI] [PubMed] [Google Scholar]
  • 11.Riemenschneider M., Senge R., Neumann U., Hüllermeier E., Heider D. Exploiting HIV-1 protease and reverse transcriptase cross-resistance information for improved drug resistance prediction by means of multi-label classification. BioData Min. 2016;9(10):1–6. doi: 10.1186/s13040-016-0089-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Srisawat N., Avihingsanon A., Praditpornsilpa K., Jiamjarasrangsi W., Eiam-Ong S., Avihingsanon Y. A prevalence of posttransplantation cancers compared with cancers in people with human immunodeficiency virus/acquired immunodeficiency syndrome after highly active antiretroviral therapy. Transplant. Proc. 2008;40(8):2677–2679. doi: 10.1016/j.transproceed.2008.07.061. [DOI] [PubMed] [Google Scholar]
  • 13.Gardete S., Tomasz A. Mechanisms of vancomycin resistance in Staphylococcus aureus. J. Clin. Investig. 2014;124(7):2836–2840. doi: 10.1172/JCI68834. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Kumari S., Chouhan U., Suryawanshi S.K. Machine learning approaches to study HIV/AIDS infection: a Review. Biomedical Communications. Biosci. Biotechnol. Res. Commun. 2017;10(1):34–43. [Google Scholar]
  • 15.Singh S. Machine learning to improve the effectiveness of ANRS in predicting HIV drug resistance. Health Inf. Res. 2017;23(4):271–276. doi: 10.4258/hir.2017.23.4.271. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Shen C., Yu X., Harrison R.W., Weber I.T. Automated prediction of HIV drug resistance from genotype data. BMC Bioinf. 2016;17(8):278–283. doi: 10.1186/s12859-016-1114-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Iseu G., Mwangi W., Kimwele M. A framework to support management of HIV/AIDS using K-means and random forest algorithm. Int. J. Sci. Technol. Res. 2017;6(06):61–68. [Google Scholar]
  • 18.Kareem S.A., Raviraja S., Awadh N.A., Kamaruzaman A., Kajindran A. Classification and regression tree in prediction of survival of aids patients. Malays. J. Comput. Sci. 2017;23(3):153–165. [Google Scholar]
  • 19.Isaakidis P., Raguenaud M.-E., Te V., Tray C.S., Akao K., Kumar V., Ngin S., Nerrienet E., Zachariah R. High survival and treatment success sustained after two and three years of first-line ART for children in Cambodia. J. Int. AIDS Soc. 2010;13(1):11–20. doi: 10.1186/1758-2652-13-11.
  • 20.Altman D.G. Practical Statistics for Medical Research. Chapman and Hall; London (UK): 1992. Analysis of survival times; pp. 365–393.
  • 21.Cox D.R. Springer; New York, NY: 1992. Regression Models and life-tables. Breakthroughs in Statistics.
  • 22.Singh Y., Mars M. Support vector machines to forecast changes in CD4 count of HIV-1 positive patients. Sci. Res. Essays. 2010;5(17):2384–2390.
  • 23.Hatzakis G.E., Mathur M., Gilbert L., Maniar L.K., Panos G., Patel A., Wanchu A., Tsoukas C.M. Proceedings of AMIA Annual Symposium. American Medical Informatics Association; 2005. Neural network-longitudinal assessment of the electronic anti-retroviral THerapy (EARTH) cohort to follow response to HIV-treatment; pp. 301–305.
  • 24.Pham D.T., Karaboga D. Training Elman and Jordan networks for system identification using genetic algorithms. Artif. Intell. Eng. 1999;13(2):107–117.
  • 25.Mendel J.M. Prentice Hall PTR; Upper Saddle River, New Jersey, USA: 2001. Uncertain Rule-Based Fuzzy Logic Systems: Introduction and New Directions.
  • 26.Hoffmann F., Nelles O. Genetic programming for model selection of TSK-fuzzy systems. Inf. Sci. 2001;136(1-4):7–28.
  • 27.Mendel J.M., John R.B. Type-2 fuzzy sets made simple. IEEE Trans. Fuzzy Syst. 2002;10(2):117–127.
  • 28.Wu D., Tan W.W. Proceedings of 14th IEEE International Conference on Fuzzy Systems, Reno, Nevada. 2005. Type-2 FLS modeling capability analysis; pp. 242–247.
  • 29.Chen Y. Study on centroid type-reduction of interval type-2 fuzzy logic systems based on noniterative algorithms. Complexity. 2019:1–12.
  • 30.Yeh C.Y., Jeng W.H.R., Lee S.J. An enhanced type-reduction algorithm for type-2 fuzzy sets. IEEE Trans. Fuzzy Syst. 2011;19(2):227–240.
  • 31.Cordon O. A historical review of evolutionary learning methods for Mamdani-type fuzzy rule-based systems: designing interpretable genetic fuzzy systems. Int. J. Approx. Reason. 2011;52(6):894–913.
  • 32.Zhao X., Jia M. A novel deep fuzzy clustering neural network model and its application in rolling bearing fault recognition. Meas. Sci. Technol. 2018;29(12):1–24.
  • 33.Tang M.W., Liu T.F., Shafer R.W. The HIVdb system for HIV-1 genotypic resistance interpretation. Intervirology. 2012;55(2):98–101. doi: 10.1159/000331998.
  • 34.Mendel J.M., John R.I., Liu F. Interval type-2 fuzzy logic systems made simple. IEEE Trans. Fuzzy Syst. 2006;14(6):808–821.
  • 35.Mendel J.M., Liu X. Simplified interval type-2 fuzzy logic systems. IEEE Trans. Fuzzy Syst. 2013;21(6):1056–1069.
  • 36.Wagner C., Pierfitt M., McCulloch J. Proceedings of 2014 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Beijing, China. 2014. Juzzy online: an online toolkit for the design, implementation, execution and sharing of type-1 and type-2 fuzzy logic systems.
  • 37.Mendel J.M., Liu F. Super-exponential convergence of the Karnik–Mendel algorithms for computing the centroid of an interval type-2 fuzzy set. IEEE Trans. Fuzzy Syst. 2007;15(2):309–320.
  • 38.Karnik N., Mendel J. Centroid of a type-2 fuzzy set. Inf. Sci. 2001;132:195–220.
  • 39.Shang Y., Ruml W., Zhang Y., Fromherz M.P. Proceedings of the 4th ACM International Symposium on Mobile Ad Hoc Networking and Computing. ACM; 2003. Localization from mere connectivity; pp. 201–212.
  • 40.Pinkley R.L., Gelfand M.J., Duan L. When, where and how: the use of multidimensional scaling methods in the study of negotiation and social conflict. Int. Negot. 2005;10(1):79–96.
  • 41.Alotaibi K. University of East Anglia; 2015. Non-Metric Multi-Dimensional Scaling for Distance-Based Privacy-Preserving Data Mining. Ph.D. Thesis.
  • 42.Cybenko G. Approximation by superpositions of a sigmoidal function. Math. Control, Signals Syst. 1989;2(4):303–314.
  • 43.Vidal R., Bruna J., Giryes R., Soatto S. 2017. Mathematics of Deep Learning. arXiv preprint arXiv:1712.04741.
  • 44.Hornik K., Stinchcombe M., White H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989;2(5):359–366.
  • 45.Revell A.D., Wang D., Harrigan R., Hamers R.L., Wensing A.M.J., Dewolf F., Nelson M., Geretti A.-M., Larder B.A. Modelling response to HIV therapy without a genotype: an argument for viral load monitoring in resource-limited settings. J. Antimicrob. Chemother. 2010;65(4):605–607. doi: 10.1093/jac/dkq032.
  • 46.Ying H., Lin F., MacArthur R.D., Cohn J.A., Barth-Jones D.C., Ye H., Crane L.R. A self-learning fuzzy discrete event system for HIV/AIDS treatment regimen selection. IEEE Trans. Syst. Man Cybern. Part B (Cybernetics) 2007;37(4):966–979. doi: 10.1109/tsmcb.2007.895360.

Articles from Heliyon are provided here courtesy of Elsevier
