Saudi Journal of Biological Sciences. 2017 Jan 25;24(3):563–573. doi: 10.1016/j.sjbs.2017.01.027

Analytical fuzzy approach to biological data analysis

Weiping Zhang a, Jingzhi Yang b, Yanling Fang c, Huanyu Chen c, Yihua Mao d, Mohit Kumar c
PMCID: PMC5372457  PMID: 28386181

Abstract

The assessment of the physiological state of an individual requires an objective evaluation of biological data while taking into account both measurement noise and uncertainties arising from individual factors. We propose representing multi-dimensional medical data by means of an optimal fuzzy membership function. A carefully designed data model is introduced in a completely deterministic framework where uncertain variables are characterized by fuzzy membership functions. The study derives the analytical expressions of fuzzy membership functions on variables of the multivariate data model by maximizing the over-uncertainties-averaged-log-membership values of data samples around an initial guess. The analytical solution lends itself to a practical modeling algorithm facilitating data classification. The experiments performed on the heartbeat interval data of 20 subjects verified that the proposed method is a competitive alternative to commonly used pattern recognition and machine learning algorithms.

Keywords: Modeling, Fuzzy membership functions, Variational optimization

1. Introduction

Data mining is an increasingly active area of research owing to the abundance of data made available by modern information technology. Data mining techniques such as classification and clustering play a vital role in the development of medical decision support systems that contribute to improved healthcare quality. Medical decision making problems inherently involve complexities and uncertainties, and researchers have therefore advocated the integration of fuzzy methodologies in medical data interpretation. The two most important features of fuzzy methodologies are the handling of uncertainties by capturing knowledge with fuzzy sets and rules, and the interpretability offered by simple linguistic if-then rules. Fuzzy approaches are commonly applied to medical data classification problems (Fan et al., 2011, Gadaras and Mikhailov, 2009, Nguyen et al., 2015, Papageorgiou, 2011, Seera and Lim, 2014). The mathematical analysis of biomedical signals is performed to construct models identifying the mappings between signal features and the patient’s state. This relationship is affected by uncertainties arising from individual factors (e.g. related to body conditions) that cannot be taken into account mathematically. Fuzzy filters have previously been proposed to alleviate the effect of uncertainties on medical data analysis (Kumar et al., 2007a, Kumar et al., 2007b, Kumar et al., 2008, Kumar et al., 2010a, Kumar et al., 2010b), wherein robust estimation algorithms are applied to design a fuzzy model that identifies the functional relation between physiological parameters and subjective rating scores.
In addition, stochastic fuzzy modeling and analysis techniques have been introduced to combine the advantages of Bayesian analysis and fuzzy theory for a mathematical handling of the uncertainties in biomedical signal analysis (Kumar et al., 2010a, Kumar et al., 2010b, Kumar et al., 2012a, Kumar et al., 2012b). A recent work (Kumar et al., 2016a, Kumar et al., 2016b) rigorously introduced a stochastic framework for robust fuzzy filtering and analysis of signals. Although the modeling and analysis framework of Kumar et al., 2016a, Kumar et al., 2016b is general and rests on strong mathematical foundations, it considers only signals and thus cannot be applied directly to non-signal multivariate data samples. There remains a need for automated design methods that fully exploit the uncertainty-handling capabilities of fuzzy systems. The approaches typically used to design fuzzy sets and systems include evolutionary algorithms (Alcala et al., 2009, Antonelli et al., 2012, Cococcioni et al., 2011, Gacto et al., 2010, Pulkkinen and Koivisto, 2010, Robles et al., 2009), data clustering (Celikyilmaz and Turksen, 2008, Chen and Chen, 2007, Liao et al., 2003, Oh et al., 2003), adaptive filtering (Aliasghary and Arghavani, 2012, Kumar et al., 2006, Kumar et al., 2009a, Kumar et al., 2009b, Mottaghi-Kashtiban et al., 2008, Simon, 2005), and information theoretic concepts (Aliasghary and Arghavani, 2012, Au et al., 2006, Makrehchi et al., 2003). The determination of fuzzy membership functions remains a challenge because, owing to the nonlinearity of the problem, membership functions cannot be optimized analytically. Thus, most design methods for fuzzy membership functions lack a mathematical theory and rely on numerical algorithms, which may be slow and inexact. Recently, Kumar et al., 2016a, Kumar et al., 2016b introduced an analytical approach for the determination of fuzzy membership functions using the variational optimization method.
The analytical approach of Kumar et al., 2016a, Kumar et al., 2016b allows the given modeling scenario to be incorporated mathematically into the design problem of the fuzzy membership functions and can thus potentially be extended to the medical data modeling scenario. The authors observe that the application of the fuzzy paradigm in medicine, despite being an extensively studied area, does not provide a rigorous, analytically derived methodology for interpreting medical data while mathematically taking into account both the measurement noise and individuality.

Medical data are multi-dimensional, and their good representation by means of fuzzy membership functions is the aim of the mathematical theory presented in this study. This text introduces a data model that takes into account both measurement noise and uncertainties arising from individuality-related factors. A multivariate data sample, represented as $y = [y_1 \cdots y_P]^T \in \mathbb{R}^P$, is assumed to be generated by the uncertain signal model displayed in Fig. 1. In that model, $y_j$ is the observed value of an unknown scalar $m_j$ affected by measurement noise $v_j$ and uncertainty $u_j$; the uncertainty $u_j$ (equal to the dot product of $G_j \in \mathbb{R}^K$ and $\alpha \in \mathbb{R}^K$) is generated by a linear combination of $K$ different sources $(\alpha_1, \cdots, \alpha_K)$. It is assumed that the $j$th element of $y$ is generated as

$$y_j = m_j + u_j + v_j$$

where $v_j$ is the measurement noise, $u_j$ is the uncertainty affecting the model, and $m_j$ is an unobserved scalar variable. The uncertainties are assumed to be generated by linearly transforming a $K$-dimensional ($K \leq P$) vector $\alpha = [\alpha_1 \cdots \alpha_K]^T \in \mathbb{R}^K$ as follows:

$$\begin{bmatrix} u_1 \\ \vdots \\ u_P \end{bmatrix} = \begin{bmatrix} G_1^1 & \cdots & G_1^K \\ \vdots & & \vdots \\ G_P^1 & \cdots & G_P^K \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_K \end{bmatrix}.$$

Figure 1. An uncertain signal model for a scalar $y_j$.

Defining $G_j = [G_j^1 \cdots G_j^K]^T \in \mathbb{R}^K$, $u_j$ can be expressed as the dot product of $G_j$ and $\alpha$, i.e.,

$$u_j = (G_j)^T \alpha.$$
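As a minimal numerical sketch of the signal model above, the observation $y_j$ can be assembled from $m_j$, the vector $G_j$, the source vector $\alpha$ and a noise value $v_j$. The function name `observe` and all numeric values are illustrative assumptions and do not come from the study.

```python
def observe(m, G, alpha, v):
    """Return y with y[j] = m[j] + dot(G[j], alpha) + v[j].

    m, v  : length-P lists (unobserved scalars and noise values)
    G     : P x K nested list whose j-th row is G_j
    alpha : length-K list of uncertainty sources
    """
    y = []
    for m_j, G_j, v_j in zip(m, G, v):
        u_j = sum(g * a for g, a in zip(G_j, alpha))  # u_j = (G_j)^T alpha
        y.append(m_j + u_j + v_j)
    return y


# Hypothetical values with P = 3 observed elements and K = 2 sources.
m = [0.8, 0.9, 1.0]
G = [[0.1, 0.0], [0.0, 0.2], [0.1, 0.1]]
alpha = [1.0, -1.0]
v = [0.01, -0.02, 0.0]
y = observe(m, G, alpha, v)
```

In this toy setting the third element is unaffected by the uncertainty because the two sources cancel in $(G_3)^T\alpha$.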

Our approach consists of

  1. treating all the variables appearing in the uncertain signal model of Fig. 1 as uncertain and characterizing them by fuzzy membership functions;
  2. assuming that medical data, under the given status of a patient, are generated by a finite mixture of uncertain signal models of the type shown in Fig. 1;
  3. determining the fuzzy membership functions on the variables analytically, with the help of experimentally measured data samples, using variational optimization (Kumar et al., 2016a, Kumar et al., 2016b).

The approach results in a tractable solution for modeling multivariate data samples by means of fuzzy membership functions, so that medical decision support systems can be built on top of the data models.

The modeling of data using a finite mixture of signal models of the type shown in Fig. 1 is typically considered in a stochastic setting, where the variables are assumed to be random (i.e. characterized by probability distribution functions) and a Bayesian framework is commonly used for the inference of posterior distributions. The originality of this study lies in solving the modeling problem in a completely deterministic framework where fuzzy membership functions are defined over variables to characterize uncertainties about their values. The optimal shapes of the fuzzy membership functions are determined by analytically maximizing the “over uncertainties averaged log membership” values of data samples around an initial guess. The maximization problem is solved analytically using variational optimization, as suggested initially in Kumar et al., 2016a, Kumar et al., 2016b. The contribution of this study is to derive the analytical expressions of fuzzy membership functions on variables of the multivariate data model, leading to the development of a classification algorithm. It is demonstrated through experimental data that our approach is a competitive alternative to commonly used classification algorithms including “k-nearest neighbors”, “support vector machines”, “decision tree”, “random forest”, “AdaBoost”, “Gaussian naive Bayes”, “linear discriminant analysis”, and “quadratic discriminant analysis”. The better classification performance of our approach is attributed to the efficient modeling of the data distribution in multi-parametric space. The significance of this work is that the analytically derived expressions for fuzzy membership functions representing uncertainties associated with medical data would facilitate a system theoretic approach to the mathematical design of medical expert systems. This would provide researchers with a mathematical theory on the application of fuzzy membership functions in medicine, in contrast to the ad-hoc numerical algorithms typically used.

This text is organized as follows. Section 2 introduces an uncertain model of multivariate data, and an analytical solution for optimizing the data model is provided in Section 3. A practical algorithm, based on the derived analytical solution, is stated in Section 4 for the modeling of multivariate data samples. Section 5 applies the proposed approach to the experimental heartbeat interval data of 20 subjects, followed by concluding remarks in Section 6.

2. An uncertain model of multivariate data

By an uncertain model, it is meant that system variables are characterized by fuzzy membership functions. Despite the availability of a wide range of fuzzy membership function types, only the following two types are chosen to model the variables, in order to keep the analysis in its most basic form:

Definition 1 Gaussian membership function (Kumar et al., 2016a, Kumar et al., 2016b)

The Gaussian membership function on a vector $x \in \mathbb{R}^n$, with mean equal to $m_x$ and precision equal to $\Lambda_x$, is defined as

$$\mu(x; m_x, \Lambda_x) = \exp\left(-\frac{1}{2}(x - m_x)^T \Lambda_x (x - m_x)\right), \quad m_x \in \mathbb{R}^n, \ \Lambda_x^{-1} > 0.$$
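A small sketch of Definition 1 in plain Python may help to fix the notation. The function name `gaussian_membership` and the numeric values are illustrative assumptions. Because there is no normalizing constant, the value at the mean is exactly one.

```python
import math


def gaussian_membership(x, mean, precision):
    """exp(-0.5 * (x - mean)^T * precision * (x - mean)) for list inputs."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    n = len(d)
    quad = sum(d[r] * precision[r][c] * d[c] for r in range(n) for c in range(n))
    return math.exp(-0.5 * quad)


# Hypothetical 2-dimensional example with a diagonal precision matrix.
precision = [[2.0, 0.0], [0.0, 1.0]]
at_mean = gaussian_membership([1.0, 2.0], [1.0, 2.0], precision)
off_mean = gaussian_membership([2.0, 2.0], [1.0, 2.0], precision)
```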

Definition 2 Gamma membership function (Kumar et al., 2016a, Kumar et al., 2016b) —

The Gamma membership function on a non-negative scalar z can be defined as

$$\mu(z; a, b) = \left(\frac{b}{a-1}\right)^{a-1} \exp(a-1)\,(z)^{a-1} \exp(-bz), \quad a \geq 1, \ b > 0.$$

A few examples of this type of membership function for different values of $a$ and $b$ are provided in Fig. 2. The parameter $a$ is referred to as the shape parameter and $b$ is referred to as the rate parameter (i.e. the reciprocal of the scale parameter). The peak of the membership function is located at $(a - 1)/b$. The skewness of the membership function is inversely proportional to the value of $a$. The Gamma membership function can alternatively be represented as

$$\mu(z; r, s) = (s)^{r} \exp(r)\,(z)^{r} \exp(-srz), \quad r \geq 0, \ s > 0$$

Figure 2. A few examples of Gamma membership functions (Kumar et al., 2016a, Kumar et al., 2016b).

The relations between the parameters of the two forms of the Gamma membership function are as follows:

$$r = a - 1, \quad s = b/(a-1).$$
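The two parameterizations of Definition 2 can be checked numerically. The following sketch (function names and numeric values are illustrative assumptions, and the shape/rate form is restricted to $a > 1$ so that $b/(a-1)$ is defined) evaluates both forms and the peak location $(a-1)/b$.

```python
import math


def gamma_membership_ab(z, a, b):
    """Shape/rate form; this sketch requires a > 1 so that b / (a - 1) exists."""
    r = a - 1.0
    return (b / r) ** r * math.exp(r) * z ** r * math.exp(-b * z)


def gamma_membership_rs(z, r, s):
    """Alternative form with r = a - 1 and s = b / (a - 1)."""
    return s ** r * math.exp(r) * z ** r * math.exp(-s * r * z)


# Hypothetical parameters.
a, b = 3.0, 4.0
r, s = a - 1.0, b / (a - 1.0)
peak = (a - 1.0) / b
value_at_peak = gamma_membership_ab(peak, a, b)
```

With these values both forms agree at every $z$, and the membership reaches its maximum of one at `peak`.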

All of the variables appearing in Fig. 1 are each assigned either a Gaussian or a Gamma membership function in Definitions 3–8.

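Definition 3 Fuzzy membership function on $v_j$

The fuzzy membership function on $v_j \in \mathbb{R}$ is defined as a zero-mean Gaussian with scaled precision as

$$\mu(v_j; \lambda_y, z_{y_j}) = \exp\left(-\frac{\lambda_y z_{y_j}}{2} v_j^2\right) \tag{1}$$

where $\lambda_y > 0$ is the precision scaled by $z_{y_j} > 0$. The uncertainties of $\lambda_y$ and $z_{y_j}$ are characterized by the following Gamma membership functions:

$$\mu(\lambda_y; a_{\lambda_y}, b_{\lambda_y}) = \left(\frac{b_{\lambda_y}}{a_{\lambda_y}-1}\right)^{a_{\lambda_y}-1} \exp(a_{\lambda_y}-1)\,(\lambda_y)^{a_{\lambda_y}-1} \exp(-b_{\lambda_y}\lambda_y), \quad a_{\lambda_y} \geq 1, \ b_{\lambda_y} > 0,$$

$$\mu(z_{y_j}; r_y, s_y) = (s_y)^{r_y} \exp(r_y)\,(z_{y_j})^{r_y} \exp(-r_y s_y z_{y_j}), \quad r_y \geq 0, \ s_y > 0.$$

Here, $r_y > 0$ and $s_y > 0$ are themselves uncertain and are characterized by the following Gamma membership functions:

$$\mu(r_y; a_{r_y}, b_{r_y}) = \left(\frac{b_{r_y}}{a_{r_y}-1}\right)^{a_{r_y}-1} \exp(a_{r_y}-1)\,(r_y)^{a_{r_y}-1} \exp(-b_{r_y} r_y), \quad a_{r_y} \geq 1, \ b_{r_y} > 0,$$

$$\mu(s_y; a_{s_y}, b_{s_y}) = \left(\frac{b_{s_y}}{a_{s_y}-1}\right)^{a_{s_y}-1} \exp(a_{s_y}-1)\,(s_y)^{a_{s_y}-1} \exp(-b_{s_y} s_y), \quad a_{s_y} \geq 1, \ b_{s_y} > 0.$$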

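Definition 4 Fuzzy membership function on $y_j$

The fuzzy membership function on $y_j \in \mathbb{R}$, for a given $(m_j, G_j, \alpha, \lambda_y, z_{y_j})$, is defined as

$$\mu(y_j; m_j, G_j, \alpha, \lambda_y, z_{y_j}) = \exp\left(-\frac{\lambda_y z_{y_j}}{2}\left(y_j - m_j - (G_j)^T\alpha\right)^2\right).$$

The membership function on $y_j$ is derived by replacing $v_j$ in (1) by $y_j - m_j - (G_j)^T\alpha$.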

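Definition 5 Fuzzy membership function on $y$

The multivariate fuzzy membership function on $y \in \mathbb{R}^P$, for a given $(\{m_j\}_{j=1}^P, \{G_j\}_{j=1}^P, \alpha, \lambda_y, \{z_{y_j}\}_{j=1}^P)$, is defined as the product of its individual elements’ membership functions as

$$\mu(y; \{m_j\}_{j=1}^P, \{G_j\}_{j=1}^P, \alpha, \lambda_y, \{z_{y_j}\}_{j=1}^P) = \prod_{j=1}^P \mu(y_j; m_j, G_j, \alpha, \lambda_y, z_{y_j}) = \exp\left(-\frac{\lambda_y}{2}\sum_{j=1}^P z_{y_j}\left(y_j - m_j - (G_j)^T\alpha\right)^2\right)$$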
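The product form of Definition 5 can be sketched directly from Definition 4. The function names and the numeric values below are illustrative assumptions; the sketch evaluates the membership of a vector $y$ both as a product of element-wise memberships and through the closed-form exponential, which should agree.

```python
import math


def membership_yj(y_j, m_j, G_j, alpha, lam, z_j):
    """Definition 4: exp(-(lam * z_j / 2) * (y_j - m_j - dot(G_j, alpha))**2)."""
    u_j = sum(g * a for g, a in zip(G_j, alpha))
    return math.exp(-0.5 * lam * z_j * (y_j - m_j - u_j) ** 2)


def membership_y(y, m, G, alpha, lam, z):
    """Definition 5, closed form: exp(-(lam / 2) * sum_j z_j * residual_j**2)."""
    total = 0.0
    for y_j, m_j, G_j, z_j in zip(y, m, G, z):
        u_j = sum(g * a for g, a in zip(G_j, alpha))
        total += z_j * (y_j - m_j - u_j) ** 2
    return math.exp(-0.5 * lam * total)


# Hypothetical values with P = 2 and K = 1.
y, m = [2.0, 2.5], [0.5, 2.0]
G, alpha = [[1.0], [0.0]], [0.5]
lam, z = 2.0, [1.0, 0.5]
closed_form = membership_y(y, m, G, alpha, lam, z)
product_form = math.prod(
    membership_yj(y[j], m[j], G[j], alpha, lam, z[j]) for j in range(len(y))
)
```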

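Definition 6 Fuzzy membership function on $m$

The multivariate fuzzy membership function on $m = [m_1 \cdots m_P]^T \in \mathbb{R}^P$ is defined as Gaussian as

$$\mu(m; m_o, \Lambda_o) = \exp\left(-\frac{1}{2}(m - m_o)^T \Lambda_o (m - m_o)\right), \quad m_o \in \mathbb{R}^P, \ \Lambda_o > 0.$$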

Definition 7 Fuzzy membership function on $\alpha$

The multivariate fuzzy membership function on $\alpha \in \mathbb{R}^K$ is defined as a zero-mean Gaussian with precision equal to the identity matrix as

$$\mu(\alpha) = \exp\left(-\frac{1}{2}(\alpha)^T\alpha\right).$$

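Definition 8 Fuzzy membership function on $G_j$

The multivariate fuzzy membership function on $G_j = [G_j^1 \cdots G_j^K]^T \in \mathbb{R}^K$ is defined as a zero-mean Gaussian as

$$\mu(G_j; \{\phi_k\}_{k=1}^K) = \exp\left(-\frac{1}{2}\sum_{k=1}^K (G_j^k)^2 \phi_k\right)$$

where $\phi_k > 0$ is the precision of the $k$th element of $G_j$ and is itself uncertain, being characterized by the following Gamma membership function:

$$\mu(\phi_k; a_\phi, b_\phi) = \left(\frac{b_\phi}{a_\phi - 1}\right)^{a_\phi - 1} \exp(a_\phi - 1)\,(\phi_k)^{a_\phi - 1} \exp(-b_\phi \phi_k), \quad a_\phi \geq 1, \ b_\phi > 0.$$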

To model multivariate data samples distributed arbitrarily in the $P$-dimensional data space, a mixture of a finite number of uncertain signal models is considered in Definition 9.

Definition 9 Fuzzy membership of y as a finite mixture of uncertain signal models —

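The fuzzy membership function on $y = [y_1 \cdots y_P]^T \in \mathbb{R}^P$, for a given $(\{\pi_i\}_{i=1}^C, \Omega)$, is defined as a mixture of $C$ different uncertain signal models as

$$\mu(y; \{\pi_i\}_{i=1}^C, \Omega) = \exp\left(-\frac{\pi_1}{2}\lambda_y^1 \sum_{j=1}^P z_{y_j}^1\left(y_j - m_j^1 - (G_j)^T\alpha^1\right)^2 - \cdots - \frac{\pi_C}{2}\lambda_y^C \sum_{j=1}^P z_{y_j}^C\left(y_j - m_j^C - (G_j)^T\alpha^C\right)^2\right)$$

where $\pi_i \in [0, 1]$ is the mixing proportion of the $i$th uncertain signal model with $\sum_{i=1}^C \pi_i = 1$, and $\Omega$ is a set of parameters defined as

$$\Omega = \left\{\{\alpha^i\}_{i=1}^C, \{G_j\}_{j=1}^P, \{\phi_k\}_{k=1}^K, \{m^i\}_{i=1}^C, \{\{z_{y_j}^i\}_{j=1}^P\}_{i=1}^C, r_y, s_y, \{\lambda_y^i\}_{i=1}^C\right\}$$

where $\alpha^i \in \mathbb{R}^K$ ($K \leq P$) is uncertain and characterized by the following Gaussian membership function:

$$\mu(\alpha^i) = \exp\left(-\frac{1}{2}(\alpha^i)^T\alpha^i\right);$$

$G_j = [G_j^1 \cdots G_j^K]^T \in \mathbb{R}^K$ is uncertain and characterized by the following Gaussian membership function:

$$\mu(G_j; \{\phi_k\}_{k=1}^K) = \exp\left(-\frac{1}{2}\sum_{k=1}^K (G_j^k)^2 \phi_k\right), \quad \phi_k > 0;$$

$\phi_k > 0$ is uncertain and characterized by the following Gamma membership function:

$$\mu(\phi_k; a_\phi, b_\phi) = \left(\frac{b_\phi}{a_\phi - 1}\right)^{a_\phi - 1} \exp(a_\phi - 1)\,(\phi_k)^{a_\phi - 1} \exp(-b_\phi \phi_k), \quad a_\phi \geq 1, \ b_\phi > 0;$$

$m^i = [m_1^i \cdots m_P^i]^T \in \mathbb{R}^P$ is uncertain and characterized by the following Gaussian membership function:

$$\mu(m^i; m_o^i, \Lambda_o^i) = \exp\left(-\frac{1}{2}(m^i - m_o^i)^T \Lambda_o^i (m^i - m_o^i)\right), \quad m_o^i \in \mathbb{R}^P, \ \Lambda_o^i > 0;$$

$z_{y_j}^i > 0$ is an uncertain scalar characterized by the following Gamma membership function:

$$\mu(z_{y_j}^i; r_y, s_y) = (s_y)^{r_y} \exp(r_y)\,(z_{y_j}^i)^{r_y} \exp(-r_y s_y z_{y_j}^i), \quad r_y \geq 0, \ s_y > 0;$$

$r_y$ is uncertain and characterized by the following Gamma membership function:

$$\mu(r_y; a_{r_y}, b_{r_y}) = \left(\frac{b_{r_y}}{a_{r_y}-1}\right)^{a_{r_y}-1} \exp(a_{r_y}-1)\,(r_y)^{a_{r_y}-1} \exp(-b_{r_y} r_y), \quad a_{r_y} \geq 1, \ b_{r_y} > 0;$$

$s_y$ is uncertain and characterized by the following Gamma membership function:

$$\mu(s_y; a_{s_y}, b_{s_y}) = \left(\frac{b_{s_y}}{a_{s_y}-1}\right)^{a_{s_y}-1} \exp(a_{s_y}-1)\,(s_y)^{a_{s_y}-1} \exp(-b_{s_y} s_y), \quad a_{s_y} \geq 1, \ b_{s_y} > 0;$$

$\lambda_y^i > 0$ is an uncertain scalar characterized by the following Gamma membership function:

$$\mu(\lambda_y^i; a_{\lambda_y}, b_{\lambda_y}) = \left(\frac{b_{\lambda_y}}{a_{\lambda_y}-1}\right)^{a_{\lambda_y}-1} \exp(a_{\lambda_y}-1)\,(\lambda_y^i)^{a_{\lambda_y}-1} \exp(-b_{\lambda_y}\lambda_y^i), \quad a_{\lambda_y} \geq 1, \ b_{\lambda_y} > 0.$$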
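To illustrate how the mixture of Definition 9 combines the individual signal models, the following sketch evaluates its exponent for fixed parameter values. The function name, the argument layout (one entry per model for `pi`, `lam`, `z`, `m`, `alpha`, and a single shared `G`) and all numbers are illustrative assumptions.

```python
import math


def mixture_membership(y, pi, lam, z, m, G, alpha):
    """exp(-sum_i (pi[i] / 2) * lam[i] * sum_j z[i][j] * residual_ij**2).

    Index i runs over the C signal models and j over the P elements of y.
    G is shared by all models; m[i], alpha[i], lam[i], z[i] are per model.
    """
    exponent = 0.0
    for pi_i, lam_i, z_i, m_i, alpha_i in zip(pi, lam, z, m, alpha):
        sq = 0.0
        for j, y_j in enumerate(y):
            u_ij = sum(g * a for g, a in zip(G[j], alpha_i))
            sq += z_i[j] * (y_j - m_i[j] - u_ij) ** 2
        exponent -= 0.5 * pi_i * lam_i * sq
    return math.exp(exponent)


# Hypothetical setting: C = 2 models, P = 1, K = 1.
common = dict(lam=[2.0, 2.0], z=[[1.0], [1.0]], m=[[1.0], [5.0]],
              G=[[0.0]], alpha=[[0.0], [0.0]])
only_first = mixture_membership([1.0], pi=[1.0, 0.0], **common)
equal_mix = mixture_membership([1.0], pi=[0.5, 0.5], **common)
```

When all weight is placed on the model centred at the sample, the membership is one; sharing the weight with a distant model lowers it.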

3. Analytical optimization of mixture of uncertain signal models

Given $N$ data samples, $\{y^n\}_{n=1}^N$, the aim is to define the multivariate fuzzy membership function on $y$ in an “optimal” manner. The approach is to optimize the fuzzy membership function (defined on $y$ by Definition 9) with respect to $\{\pi_i\}_{i=1}^C$ while taking into account the uncertainties of the parameters represented by the set $\Omega$. To do so, the “optimal” membership functions on the parameters must first be determined. For this, assume that $q(\alpha^i)$, $q(G_j)$, $q(\phi_k)$, $q(m^i)$, $q(z_{y_j}^i)$, $q(r_y)$, $q(s_y)$, and $q(\lambda_y^i)$ are arbitrary fuzzy membership functions on $\alpha^i$, $G_j$, $\phi_k$, $m^i$, $z_{y_j}^i$, $r_y$, $s_y$, and $\lambda_y^i$ respectively. Define a function, $q(\Omega)$, as follows

$$q(\Omega) = \prod_{i=1}^C q(\alpha^i) \prod_{j=1}^P q(G_j) \prod_{k=1}^K q(\phi_k) \prod_{i=1}^C q(m^i) \prod_{i=1}^C \prod_{j=1}^P q(z_{y_j}^i)\, q(r_y)\, q(s_y) \prod_{i=1}^C q(\lambda_y^i)$$

Define a differential, $d\Omega$, as follows

$$d\Omega = \prod_{i=1}^C d\alpha^i \prod_{j=1}^P dG_j \prod_{k=1}^K d\phi_k \prod_{i=1}^C dm^i \prod_{i=1}^C \prod_{j=1}^P dz_{y_j}^i\, dr_y\, ds_y \prod_{i=1}^C d\lambda_y^i$$

Define a function, $\mu(\Omega)$, as follows

$$\mu(\Omega) = \prod_{i=1}^C \mu(\alpha^i) \prod_{j=1}^P \mu(G_j; \{\phi_k\}_{k=1}^K) \prod_{k=1}^K \mu(\phi_k; a_\phi, b_\phi) \prod_{i=1}^C \mu(m^i; m_o^i, \Lambda_o^i) \times \prod_{i=1}^C \prod_{j=1}^P \mu(z_{y_j}^i; r_y, s_y)\, \mu(r_y; a_{r_y}, b_{r_y})\, \mu(s_y; a_{s_y}, b_{s_y}) \prod_{i=1}^C \mu(\lambda_y^i; a_{\lambda_y}, b_{\lambda_y})$$

The optimization process maximizes an objective functional, F, defined as

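$$F\left(\{\{\pi_i^n\}_{i=1}^C\}_{n=1}^N, q(\Omega)\right) = \frac{1}{\int d\Omega\, q(\Omega)} \int d\Omega\, q(\Omega)\, \frac{\sum_{n=1}^N \log\left(\mu(y^n; \{\pi_i^n\}_{i=1}^C, \Omega)\right)}{N} - \frac{1}{\int d\Omega\, q(\Omega)} \int d\Omega\, q(\Omega) \log\frac{q(\Omega)}{\mu(\Omega)} - \frac{1}{N}\sum_{n=1}^N \sum_{i=1}^C \pi_i^n \log\frac{\pi_i^n}{\pi_i^o} \tag{2}$$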

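$F$ is maximized with respect to $q(\alpha^i)$, $q(G_j)$, $q(\phi_k)$, $q(m^i)$, $q(z_{y_j}^i)$, $q(r_y)$, $q(s_y)$, $q(\lambda_y^i)$, and $\{\pi_i^n\}_{i=1}^C$ under the following constraints:

  1. Fixed Integral Constraints on Membership Functions: $\int d\alpha^i\, q(\alpha^i) = k_{\alpha^i} > 0$, $\int dG_j\, q(G_j) = k_{G_j} > 0$, $\int d\phi_k\, q(\phi_k) = k_{\phi_k} > 0$, $\int dm^i\, q(m^i) = k_{m^i} > 0$, $\int dz_{y_j}^i\, q(z_{y_j}^i) = k_{z_{y_j}^i} > 0$, $\int dr_y\, q(r_y) = k_{r_y} > 0$, $\int ds_y\, q(s_y) = k_{s_y} > 0$, $\int d\lambda_y^i\, q(\lambda_y^i) = k_{\lambda_y^i} > 0$.
  2. Unity Maximum Value Constraints on Membership Functions: The values of $k_{\alpha^i}$, $k_{G_j}$, $k_{\phi_k}$, $k_{m^i}$, $k_{z_{y_j}^i}$, $k_{r_y}$, $k_{s_y}$, and $k_{\lambda_y^i}$ are chosen such that the maximum value of each of $q(\alpha^i)$, $q(G_j)$, $q(\phi_k)$, $q(m^i)$, $q(z_{y_j}^i)$, $q(r_y)$, $q(s_y)$, and $q(\lambda_y^i)$ is equal to one.
  3. Unity Sum Constraint on Mixing Proportions: $\sum_{i=1}^C \pi_i^n = 1$, $\pi_i^n \in [0, 1]$.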

The first term of $F$ computes the averaged log-membership value of data samples when the average is taken over the uncertain parameters $\Omega$ being modeled by the membership function $q(\Omega)$. The second term of $F$ regularizes the maximization problem toward the initial guess $\mu(\Omega)$. The third term of $F$ regularizes the estimation of $\pi_i^n$ toward the initial guess $\pi_i^o$.
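The role of the third term can be seen in a small numerical sketch. The function name `mixing_penalty` and the example values are illustrative assumptions; the function evaluates $-\frac{1}{N}\sum_n\sum_i \pi_i^n \log(\pi_i^n/\pi_i^o)$, which vanishes when every $\pi^n$ equals the initial guess and becomes negative as the mixing proportions move away from it.

```python
import math


def mixing_penalty(pi, pi0):
    """-(1/N) * sum_n sum_i pi[n][i] * log(pi[n][i] / pi0[i]).

    Terms with pi[n][i] == 0 contribute zero (limit of x * log x as x -> 0).
    """
    total = 0.0
    for row in pi:
        for p, p0 in zip(row, pi0):
            if p > 0.0:
                total += p * math.log(p / p0)
    return -total / len(pi)


# Hypothetical initial guess with C = 2 equally weighted models.
pi0 = [0.5, 0.5]
at_guess = mixing_penalty([[0.5, 0.5], [0.5, 0.5]], pi0)
one_hot = mixing_penalty([[1.0, 0.0]], pi0)
```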

Result 1

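The analytical expressions for the variational membership functions that maximize $F$ under the Fixed Integral and Unity Maximum Value Constraints are

$$q(\alpha^i) = \exp\left(-\frac{1}{2}(\alpha^i - \hat{m}_{\alpha^i})^T \hat{\Lambda}_{\alpha^i} (\alpha^i - \hat{m}_{\alpha^i})\right), \quad \hat{\Lambda}_{\alpha^i} = I + \sum_{n=1}^N \sum_{j=1}^P \frac{\hat{\pi}_i^n}{N} \frac{\hat{a}_{\lambda_y^i}}{\hat{b}_{\lambda_y^i}} \frac{\hat{a}_{z_{y_j}^i}}{\hat{b}_{z_{y_j}^i}} \left(\hat{m}_{G_j}(\hat{m}_{G_j})^T + (\hat{\Lambda}_{G_j})^{-1}\right) \tag{3}$$

$$\hat{m}_{\alpha^i} = (\hat{\Lambda}_{\alpha^i})^{-1} \sum_{n=1}^N \sum_{j=1}^P \frac{\hat{\pi}_i^n}{N} \frac{\hat{a}_{\lambda_y^i}}{\hat{b}_{\lambda_y^i}} \frac{\hat{a}_{z_{y_j}^i}}{\hat{b}_{z_{y_j}^i}} \left(y_j^n - I_j^P \hat{m}_{m^i}\right)\hat{m}_{G_j}, \quad q(G_j) = \exp\left(-\frac{1}{2}(G_j - \hat{m}_{G_j})^T \hat{\Lambda}_{G_j} (G_j - \hat{m}_{G_j})\right), \tag{4}$$

[Equations (5)–(6) are provided only as an image in the source (fx1.jpg).]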

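$$q(\phi_k) = \left(\frac{\hat{b}_{\phi_k}}{\hat{a}_{\phi_k} - 1}\right)^{\hat{a}_{\phi_k} - 1} \exp(\hat{a}_{\phi_k} - 1)\,(\phi_k)^{\hat{a}_{\phi_k} - 1} \exp(-\hat{b}_{\phi_k}\phi_k), \quad \hat{a}_{\phi_k} = a_\phi \tag{7}$$

$$\hat{b}_{\phi_k} = b_\phi + \frac{1}{2}\sum_{j=1}^P \left\{\left(I_k^K \hat{m}_{G_j}\right)^2 + \mathrm{Tr}\left((\hat{\Lambda}_{G_j})^{-1}(I_k^K)^T I_k^K\right)\right\}, \quad q(m^i) = \exp\left(-\frac{1}{2}(m^i - \hat{m}_{m^i})^T \hat{\Lambda}_{m^i} (m^i - \hat{m}_{m^i})\right), \tag{8}$$

[Equations (9)–(10) are provided only as an image in the source (fx2.jpg).]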

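$$q(\lambda_y^i) = \left(\frac{\hat{b}_{\lambda_y^i}}{\hat{a}_{\lambda_y^i} - 1}\right)^{\hat{a}_{\lambda_y^i} - 1} \exp(\hat{a}_{\lambda_y^i} - 1)\,(\lambda_y^i)^{\hat{a}_{\lambda_y^i} - 1} \exp(-\hat{b}_{\lambda_y^i}\lambda_y^i), \quad \hat{a}_{\lambda_y^i} = a_{\lambda_y} \tag{11}$$

$$\hat{b}_{\lambda_y^i} = b_{\lambda_y} + \frac{1}{2}\sum_{n=1}^N \sum_{j=1}^P \frac{\hat{\pi}_i^n}{N} \frac{\hat{a}_{z_{y_j}^i}}{\hat{b}_{z_{y_j}^i}} \left\{\left(y_j^n - I_j^P\hat{m}_{m^i} - (\hat{m}_{G_j})^T\hat{m}_{\alpha^i}\right)^2 + \mathrm{Tr}\left((\hat{\Lambda}_{m^i})^{-1}(I_j^P)^T I_j^P\right) + (\hat{m}_{\alpha^i})^T(\hat{\Lambda}_{G_j})^{-1}\hat{m}_{\alpha^i} + (\hat{m}_{G_j})^T(\hat{\Lambda}_{\alpha^i})^{-1}\hat{m}_{G_j} + \mathrm{Tr}\left((\hat{\Lambda}_{G_j})^{-1}(\hat{\Lambda}_{\alpha^i})^{-1}\right)\right\} \tag{12}$$

$$q(z_{y_j}^i) = \left(\frac{\hat{b}_{z_{y_j}^i}}{\hat{a}_{z_{y_j}^i} - 1}\right)^{\hat{a}_{z_{y_j}^i} - 1} \exp(\hat{a}_{z_{y_j}^i} - 1)\,(z_{y_j}^i)^{\hat{a}_{z_{y_j}^i} - 1} \exp(-\hat{b}_{z_{y_j}^i} z_{y_j}^i), \quad \hat{a}_{z_{y_j}^i} = \frac{\hat{a}_{r_y}}{\hat{b}_{r_y}} + 1 \tag{13}$$

$$\hat{b}_{z_{y_j}^i} = \frac{\hat{a}_{r_y}}{\hat{b}_{r_y}} \frac{\hat{a}_{s_y}}{\hat{b}_{s_y}} + \frac{1}{2}\sum_{n=1}^N \frac{\hat{\pi}_i^n}{N} \frac{\hat{a}_{\lambda_y^i}}{\hat{b}_{\lambda_y^i}} \left\{\left(y_j^n - I_j^P\hat{m}_{m^i} - (\hat{m}_{G_j})^T\hat{m}_{\alpha^i}\right)^2 + \mathrm{Tr}\left((\hat{\Lambda}_{m^i})^{-1}(I_j^P)^T I_j^P\right) + (\hat{m}_{\alpha^i})^T(\hat{\Lambda}_{G_j})^{-1}\hat{m}_{\alpha^i} + (\hat{m}_{G_j})^T(\hat{\Lambda}_{\alpha^i})^{-1}\hat{m}_{G_j} + \mathrm{Tr}\left((\hat{\Lambda}_{G_j})^{-1}(\hat{\Lambda}_{\alpha^i})^{-1}\right)\right\} \tag{14}$$

$$f_i^n = -\frac{1}{2}\sum_{j=1}^P \frac{\hat{a}_{\lambda_y^i}}{\hat{b}_{\lambda_y^i}} \frac{\hat{a}_{z_{y_j}^i}}{\hat{b}_{z_{y_j}^i}} \left\{\left(y_j^n - I_j^P\hat{m}_{m^i} - (\hat{m}_{G_j})^T\hat{m}_{\alpha^i}\right)^2 + \mathrm{Tr}\left((\hat{\Lambda}_{m^i})^{-1}(I_j^P)^T I_j^P\right) + (\hat{m}_{\alpha^i})^T(\hat{\Lambda}_{G_j})^{-1}\hat{m}_{\alpha^i} + (\hat{m}_{G_j})^T(\hat{\Lambda}_{\alpha^i})^{-1}\hat{m}_{G_j} + \mathrm{Tr}\left((\hat{\Lambda}_{G_j})^{-1}(\hat{\Lambda}_{\alpha^i})^{-1}\right)\right\} \tag{15}$$

$$\hat{\pi}_i^n = \frac{\pi_i^o \exp(f_i^n)}{\sum_{i=1}^C \pi_i^o \exp(f_i^n)}. \tag{16}$$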
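$$q(r_y) = \left(\frac{\hat{b}_{r_y}}{\hat{a}_{r_y} - 1}\right)^{\hat{a}_{r_y} - 1} \exp(\hat{a}_{r_y} - 1)\,(r_y)^{\hat{a}_{r_y} - 1} \exp(-\hat{b}_{r_y} r_y), \quad \hat{a}_{r_y} = a_{r_y} \tag{17}$$

$$\hat{b}_{r_y} = b_{r_y} + \frac{\hat{a}_{s_y}}{\hat{b}_{s_y}} \sum_{i=1}^C \sum_{j=1}^P \frac{\hat{a}_{z_{y_j}^i}}{\hat{b}_{z_{y_j}^i}} - CP\left\{\psi(\hat{a}_{s_y}) - \log(\hat{b}_{s_y})\right\} - CP - \sum_{i=1}^C \sum_{j=1}^P \left\{\psi(\hat{a}_{z_{y_j}^i}) - \log(\hat{b}_{z_{y_j}^i})\right\} \tag{18}$$

$$q(s_y) = \left(\frac{\hat{b}_{s_y}}{\hat{a}_{s_y} - 1}\right)^{\hat{a}_{s_y} - 1} \exp(\hat{a}_{s_y} - 1)\,(s_y)^{\hat{a}_{s_y} - 1} \exp(-\hat{b}_{s_y} s_y), \quad \hat{a}_{s_y} = a_{s_y} + CP\frac{\hat{a}_{r_y}}{\hat{b}_{r_y}} \tag{19}$$

$$\hat{b}_{s_y} = b_{s_y} + \frac{\hat{a}_{r_y}}{\hat{b}_{r_y}} \sum_{i=1}^C \sum_{j=1}^P \frac{\hat{a}_{z_{y_j}^i}}{\hat{b}_{z_{y_j}^i}} \tag{20}$$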
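Several of the update rules reduce to simple arithmetic on Gamma parameters. The following sketch covers rules (13), (19) and (20) only, under the assumption that each adjacent pair of hatted $a$ and $b$ symbols in the flattened source denotes the ratio $\hat{a}/\hat{b}$. Function and argument names are hypothetical, and the numbers are arbitrary.

```python
def update_z_shape(a_r, b_r):
    """Rule (13): the shape of q(z) equals a_r / b_r + 1."""
    return a_r / b_r + 1.0


def update_s(a_s_prior, b_s_prior, a_r, b_r, a_z, b_z):
    """Rules (19)-(20); a_z and b_z are C x P nested lists of q(z) parameters."""
    ratio_r = a_r / b_r
    n_terms = sum(len(row) for row in a_z)  # C * P
    sum_ratio_z = sum(az / bz
                      for row_a, row_b in zip(a_z, b_z)
                      for az, bz in zip(row_a, row_b))
    a_s = a_s_prior + n_terms * ratio_r
    b_s = b_s_prior + ratio_r * sum_ratio_z
    return a_s, b_s


# Hypothetical values with C = 1 model and P = 2 elements.
a_z_new = update_z_shape(a_r=2.0, b_r=1.0)
a_s_new, b_s_new = update_s(1.0, 1.0, a_r=2.0, b_r=1.0,
                            a_z=[[3.0, 3.0]], b_z=[[1.5, 1.5]])
```

Rule (18) additionally needs the digamma function $\psi$, which is not part of the Python standard library and is therefore left out of this sketch.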

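Once the membership functions representing the uncertainties on the parameters have been optimally determined, the optimal multivariate fuzzy membership function on $y = [y_1 \cdots y_P]^T \in \mathbb{R}^P$ is defined by averaging over the uncertainties such that

$$\mu(y) \propto \exp\left(\left\langle \log\left(\mu(y; \{\pi_i\}_{i=1}^C, \Omega)\right)\right\rangle_{q(\Omega)}\right)$$

where

$$\pi_i = \frac{\pi_i^o \exp(f_i)}{\sum_{i=1}^C \pi_i^o \exp(f_i)}$$

$$f_i = -\frac{1}{2}\sum_{j=1}^P \frac{\hat{a}_{\lambda_y^i}}{\hat{b}_{\lambda_y^i}} \frac{\hat{a}_{z_{y_j}^i}}{\hat{b}_{z_{y_j}^i}} \left\{\left(y_j - I_j^P\hat{m}_{m^i} - (\hat{m}_{G_j})^T\hat{m}_{\alpha^i}\right)^2 + \mathrm{Tr}\left((\hat{\Lambda}_{m^i})^{-1}(I_j^P)^T I_j^P\right) + (\hat{m}_{\alpha^i})^T(\hat{\Lambda}_{G_j})^{-1}\hat{m}_{\alpha^i} + (\hat{m}_{G_j})^T(\hat{\Lambda}_{\alpha^i})^{-1}\hat{m}_{G_j} + \mathrm{Tr}\left((\hat{\Lambda}_{G_j})^{-1}(\hat{\Lambda}_{\alpha^i})^{-1}\right)\right\}.$$

After evaluating the integral $\left\langle \log\left(\mu(y; \{\pi_i\}_{i=1}^C, \Omega)\right)\right\rangle_{q(\Omega)}$, the expression of the optimal membership function on $y$ is as follows:

$$\log(\mu(y)) \propto -\frac{1}{2}\sum_{i=1}^C \sum_{j=1}^P \pi_i \frac{\hat{a}_{\lambda_y^i}}{\hat{b}_{\lambda_y^i}} \frac{\hat{a}_{z_{y_j}^i}}{\hat{b}_{z_{y_j}^i}} \left\{\left(y_j - I_j^P\hat{m}_{m^i} - (\hat{m}_{G_j})^T\hat{m}_{\alpha^i}\right)^2 + \mathrm{Tr}\left((\hat{\Lambda}_{m^i})^{-1}(I_j^P)^T I_j^P\right) + (\hat{m}_{\alpha^i})^T(\hat{\Lambda}_{G_j})^{-1}\hat{m}_{\alpha^i} + (\hat{m}_{G_j})^T(\hat{\Lambda}_{\alpha^i})^{-1}\hat{m}_{G_j} + \mathrm{Tr}\left((\hat{\Lambda}_{G_j})^{-1}(\hat{\Lambda}_{\alpha^i})^{-1}\right)\right\}.$$

Finally, the constant of proportionality is chosen equal to one, resulting in

$$\mu(y) = \exp\left(-\frac{1}{2}\sum_{i=1}^C \sum_{j=1}^P \pi_i \frac{\hat{a}_{\lambda_y^i}}{\hat{b}_{\lambda_y^i}} \frac{\hat{a}_{z_{y_j}^i}}{\hat{b}_{z_{y_j}^i}} \left\{\left(y_j - I_j^P\hat{m}_{m^i} - (\hat{m}_{G_j})^T\hat{m}_{\alpha^i}\right)^2 + \mathrm{Tr}\left((\hat{\Lambda}_{m^i})^{-1}(I_j^P)^T I_j^P\right) + (\hat{m}_{\alpha^i})^T(\hat{\Lambda}_{G_j})^{-1}\hat{m}_{\alpha^i} + (\hat{m}_{G_j})^T(\hat{\Lambda}_{\alpha^i})^{-1}\hat{m}_{G_j} + \mathrm{Tr}\left((\hat{\Lambda}_{G_j})^{-1}(\hat{\Lambda}_{\alpha^i})^{-1}\right)\right\}\right). \tag{21}$$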
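The evaluation of (21) for a new sample can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the dictionary layout, the key names and the numbers are hypothetical; `mean_lambda` and `mean_z` stand for the ratios $\hat{a}/\hat{b}$ of the corresponding Gamma parameters; the inverse precision matrices are passed in already inverted; and $I_j^P$ is read as the selector of the $j$th element, so that the first trace term equals the $j$th diagonal entry of $(\hat{\Lambda}_{m^i})^{-1}$.

```python
import math


def expected_sq_residual(y_j, j, m_hat, g_hat, a_hat, cov_m, cov_g, cov_a):
    """Bracketed term of Eq. (21) for element j of one signal model."""
    ks = range(len(a_hat))
    resid = y_j - m_hat[j] - sum(g * a for g, a in zip(g_hat, a_hat))
    quad_a = sum(a_hat[r] * cov_g[r][c] * a_hat[c] for r in ks for c in ks)
    quad_g = sum(g_hat[r] * cov_a[r][c] * g_hat[c] for r in ks for c in ks)
    trace = sum(cov_g[r][c] * cov_a[c][r] for r in ks for c in ks)
    return resid ** 2 + cov_m[j][j] + quad_a + quad_g + trace


def optimal_membership(y, models, pi0):
    """Return (mu(y), [pi_i]) with mu(y) = exp(sum_i pi_i * f_i)."""
    f = []
    for mdl in models:
        total = 0.0
        for j, y_j in enumerate(y):
            e = expected_sq_residual(y_j, j, mdl["m"], mdl["G"][j], mdl["alpha"],
                                     mdl["cov_m"], mdl["cov_G"][j], mdl["cov_alpha"])
            total += mdl["mean_lambda"] * mdl["mean_z"][j] * e
        f.append(-0.5 * total)
    f_max = max(f)  # shift before exponentiating; the normalized ratio is unchanged
    w = [p0 * math.exp(f_i - f_max) for p0, f_i in zip(pi0, f)]
    pi = [w_i / sum(w) for w_i in w]
    return math.exp(sum(p * f_i for p, f_i in zip(pi, f))), pi


def toy_model(mean):
    """Hypothetical model with P = 1, K = 1 and no residual uncertainty."""
    return {"m": [mean], "G": [[0.0]], "alpha": [0.0], "cov_m": [[0.0]],
            "cov_G": [[[0.0]]], "cov_alpha": [[0.0]],
            "mean_lambda": 2.0, "mean_z": [1.0]}


mu, pi = optimal_membership([1.0], [toy_model(1.0), toy_model(3.0)], [0.5, 0.5])
```

In this toy case the sample coincides with the centre of the first model, so nearly all of the mixing weight goes to that model and the membership stays close to one.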

4. An Algorithm for multivariate data modeling

4.1. Algorithm

The analytical solution to the mixture of uncertain signal models, derived in Section 3, lends itself to Algorithm 1 for the modeling of multivariate data samples by determining membership functions on all of the variables and parameters. Algorithm 1 suggests choosing the initial values of the parameters based on k-means clustering and the eigenvalue decomposition of the sample covariance matrix.

[Algorithm 1 is provided only as an image in the source (fx3.jpg).]

Remark 1 (Complexity and Iterations) Algorithm 1 is based on invoking the parameter update rules (3)–(20). The time complexity of the algorithm, resulting from computing the inverse of a $P \times P$ matrix in update rule (10), is $O(P^3)$. After initializing the parameters, Algorithm 1 invokes a single iteration of the parameter update rules. Thanks to the analytically derived solution, a single iteration is sufficient for the parameters to nearly converge, provided they are initialized carefully. However, the optimal values of $C$ and $K$ are determined by maximizing the average fuzzy membership value of the data samples through repeated application of the update rules.

Remark 2 (Free parameter β in Algorithm 1) Algorithm 1 has only a single free parameter, β ∈ [0, 0.5], to be chosen by the user. The maximum possible number of signal models in the mixture, $C_{max}$, depends on the value of β. It will be demonstrated through experiments that the algorithm’s performance is not highly sensitive to the choice of β.

4.2. Data distribution modeling

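The application of Algorithm 1 to given data samples $\{y^n\}_{n=1}^N$ results in the determination of $C_{opt}$ different fuzzy membership functions on the unobserved variable $m$, defined as

$$\mu_i(m; \hat{m}_{m^i}, \hat{\Lambda}_{m^i}) = \exp\left(-\frac{1}{2}(m - \hat{m}_{m^i})^T \hat{\Lambda}_{m^i} (m - \hat{m}_{m^i})\right), \quad i \in \{1, \cdots, C_{opt}\}.$$

Let $M$ be the set of parameters returned by Algorithm 1, i.e., $M = \{(\hat{m}_{m^i}, \hat{\Lambda}_{m^i})\}_{i=1}^{C_{opt}}$. Finally, a data model, constructed from $\{y^n\}_{n=1}^N$ using Algorithm 1, is represented by a fuzzy membership function defined as

$$\mu(y; M) = \max_{1 \leq i \leq C_{opt}} \exp\left(-\frac{1}{2}(y - \hat{m}_{m^i})^T \hat{\Lambda}_{m^i} (y - \hat{m}_{m^i})\right). \tag{22}$$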
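Equation (22) takes the largest Gaussian membership over the learned components. A minimal sketch, assuming the set $M$ is stored as a list of (mean, precision) pairs (a hypothetical layout with made-up values), is:

```python
import math


def data_model_membership(y, components):
    """Eq. (22): maximum Gaussian membership over (mean, precision) pairs."""
    best = 0.0
    for mean, precision in components:
        d = [yi - mi for yi, mi in zip(y, mean)]
        n = len(d)
        quad = sum(d[r] * precision[r][c] * d[c] for r in range(n) for c in range(n))
        best = max(best, math.exp(-0.5 * quad))
    return best


# Hypothetical model with two components in a 2-dimensional data space.
identity = [[1.0, 0.0], [0.0, 1.0]]
M = [([0.0, 0.0], identity), ([4.0, 0.0], identity)]
on_centre = data_model_membership([4.0, 0.0], M)
between = data_model_membership([2.0, 0.0], M)
```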

4.3. Classification

The data modeling capability of the function $\mu(y; M)$ can be exploited for classification. If $M_1, \cdots, M_S$ are $S$ different sets returned by Algorithm 1 for the data samples of $S$ different classes, then the class label associated with a vector $y$ can be predicted as

$$\mathrm{pred\_label}(y) = \arg\max_{1 \leq s \leq S} \mu(y; M_s) \tag{23}$$
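The decision rule (23) can be sketched on top of the membership function of (22). The list-of-pairs layout for each $M_s$, the function names and the example values are illustrative assumptions. The sketch returns a zero-based class index and resolves ties in favour of the first class.

```python
import math


def data_model_membership(y, components):
    """Eq. (22): maximum Gaussian membership over (mean, precision) pairs."""
    best = 0.0
    for mean, precision in components:
        d = [yi - mi for yi, mi in zip(y, mean)]
        n = len(d)
        quad = sum(d[r] * precision[r][c] * d[c] for r in range(n) for c in range(n))
        best = max(best, math.exp(-0.5 * quad))
    return best


def predict_label(y, class_models):
    """Eq. (23): zero-based index of the class model with the largest membership."""
    scores = [data_model_membership(y, comps) for comps in class_models]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical models for S = 2 classes in a 2-dimensional data space.
identity = [[1.0, 0.0], [0.0, 1.0]]
class_models = [
    [([0.0, 0.0], identity)],                          # M_1
    [([3.0, 3.0], identity), ([6.0, 3.0], identity)],  # M_2
]
label_a = predict_label([0.2, -0.1], class_models)
label_b = predict_label([5.8, 3.1], class_models)
```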

4.4. Demonstrations on Toy data sets

Fig. 3 shows an example of 2-dimensional data samples and a display of the fuzzy membership function $\mu(y; M)$ (calculated using (22)) over the data space. As depicted in Fig. 3, the distribution of the samples $\{y^n\}_{n=1}^N$ in $P$-dimensional space is modeled by the fuzzy membership function $\mu(y; M)$. Stochastic mixture models have been extensively studied in the literature and are typically used to learn data distributions. The most commonly used Gaussian mixture models (GMM) fit the given data samples by assuming that each data sample has been generated by a stochastic mixture of a finite number of Gaussian distributions. The “Expectation Maximization” algorithm is typically used to learn Gaussian mixture models from data samples, where the number of components in the mixture can be efficiently selected using the Bayesian information criterion (BIC). Situations may arise in which GMM do not give favorable results. Fig. 4(a) is an example of data samples for which Algorithm 1 is observed to perform better than GMM (together with BIC). A comparison between the color plots of the GMM based likelihood (displayed in Fig. 4(b)) and the Algorithm 1 based fuzzy membership function (displayed in Fig. 4(c)) demonstrates the effectiveness of Algorithm 1 in modeling the distribution of data samples.

Figure 3. An example of the model learned from 2-dimensional data samples using Algorithm 1 (with β = 0.5).

Figure 4. An example of the comparison between the Gaussian mixture models and Algorithm 1 (with β = 0.5).

5. Heartbeat intervals classification

This section applies the proposed methodology to the experimentally recorded heartbeat intervals (referred to as R-R intervals) of 20 different subjects while they were performing two different types of tasks in a chemical laboratory of Zhejiang University. One task involved manual pipetting of chemical solutions while the other involved working with a computer. The aim is to classify the heartbeat intervals of a subject between the two tasks. The $P$-dimensional data samples were created from the sequence of R-R intervals as (see Table 1)

$$Y_i = [RR_{i-P+1} \cdots RR_i]^T$$

where $RR_i$ is the $i$th heartbeat interval. The R-R intervals corresponding to the first half of the task duration serve as the training data and those of the second half as the testing data. Table 2 lists the median classification accuracy over the 20 subjects, obtained on testing data by different classification methods, for different values of the data dimension $P$. The better classification accuracy of the analytical fuzzy approach in Table 2 supports the argument that the proposed approach could be an effective tool for modeling and analysis of biomedical data.
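The construction of the $P$-dimensional samples from a sequence of R-R intervals can be sketched as a sliding window. The function names and the interval values are made up for illustration. As a simplifying assumption, the sketch splits the recording by sample count, whereas the study splits it by task duration.

```python
def make_samples(rr, P):
    """Return the windows [rr[i-P+1], ..., rr[i]] for every i with a full window."""
    return [rr[i - P + 1:i + 1] for i in range(P - 1, len(rr))]


def split_halves(rr):
    """First half of the sequence for training, second half for testing."""
    half = len(rr) // 2
    return rr[:half], rr[half:]


# Hypothetical R-R intervals in seconds.
rr = [0.80, 0.82, 0.79, 0.85, 0.81, 0.83]
train_rr, test_rr = split_halves(rr)
train_samples = make_samples(train_rr, P=2)
test_samples = make_samples(test_rr, P=2)
```

Splitting before windowing keeps every window entirely inside one half, so no test interval leaks into a training sample.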

Table 1. A comparison of different classification algorithms with the proposed method in terms of classification accuracy on testing data.

| Method | Dataset 1 | Dataset 2 | Dataset 3 |
|---|---|---|---|
| Nearest neighbors | 100% | 100% | 75% |
| Linear SVM | 91% | 46% | 51% |
| RBF SVM | 90% | 100% | 59% |
| Decision tree | 98% | 100% | 80% |
| Random forest | 98% | 100% | 73% |
| AdaBoost | 93% | 97% | 80% |
| Naive Bayes | 92% | 97% | 57% |
| LDA | 90% | 29% | 52% |
| QDA | 90% | 96% | 57% |
| Analytical fuzzy (β = 0.5) | 100% | 100% | 82% |

Table 2. The median accuracy (in %) of different algorithms in classifying the testing heartbeat intervals between two tasks performed by subjects.

| Method | Median of % accuracy (P = 2) | Median of % accuracy (P = 4) | Median of % accuracy (P = 6) | Median of % accuracy (P = 8) |
|---|---|---|---|---|
| Nearest neighbors | 87.11 | 90.33 | 91.08 | 92.65 |
| Linear SVM | 87.11 | 89.24 | 90.64 | 91.58 |
| RBF SVM | 84.07 | 84.17 | 86.99 | 90.11 |
| Decision tree | 84.95 | 87.22 | 88.83 | 89.57 |
| Random forest | 86.75 | 88.93 | 90.84 | 92.51 |
| AdaBoost | 88.36 | 90.72 | 91.87 | 92.60 |
| Naive Bayes | 87.40 | 89.27 | 91.05 | 92.18 |
| LDA | 88.67 | 90.70 | 91.59 | 92.99 |
| QDA | 88.04 | 88.46 | 90.08 | 90.97 |
| Analytical fuzzy (β = 0) | 88.75 | 91.16 | 92.14 | 93.14 |

6. Concluding remarks

The theoretical contribution of this work is to propose an analytical fuzzy approach that provides a principled basis for determining the fuzzy membership functions to handle uncertainties in a modeling problem. The theoretical results form the basis for designing an algorithm that results in an efficient modeling of the data distribution in multi-parametric space. The analytically derived expressions for fuzzy membership functions for representing uncertainties associated with biomedical data should facilitate a system theoretic approach to mathematically design the medical expert systems.

Footnotes

Peer review under responsibility of King Saud University.

Contributor Information

Yihua Mao, Email: maoyihua@zjubh.com.

Mohit Kumar, Email: mohit.kumar@zjubh.com.

References

  1. Alcala R., Ducange P., Herrera F., Lazzerini B., Marcelloni F. A multiobjective evolutionary approach to concurrently learn rule and data bases of linguistic fuzzy-rule-based systems. IEEE Trans. Fuzzy Syst. 2009;17(5):1106–1122.
  2. Aliasghary M., Arghavani N. 2012. H∞ estimation for optimization of rational-powered membership functions; pp. 251–256. (2012 IEEE 13th International Symposium on Computational Intelligence and Informatics (CINTI)).
  3. Antonelli M., Ducange P., Marcelloni F. Genetic training instance selection in multiobjective evolutionary fuzzy systems: a coevolutionary approach. IEEE Trans. Fuzzy Syst. 2012;20(2):276–290.
  4. Au W.H., Chan K., Wong A.K. A fuzzy approach to partitioning continuous attributes for classification. IEEE Trans. Knowl. Data Eng. 2006;18(5):715–719.
  5. Celikyilmaz A., Turksen I. Enhanced fuzzy system models with improved fuzzy clustering algorithm. IEEE Trans. Fuzzy Syst. 2008;16(3):779–794.
  6. Chen L., Chen C. ISIC; 2007. Pre-shaped fuzzy c-means algorithm (pfcm) for transparent membership function generation; pp. 789–794. (IEEE International Conference on Systems, Man and Cybernetics).
  7. Cococcioni M., Lazzerini B., Marcelloni F. On reducing computational overhead in multi-objective genetic takagi-sugeno fuzzy systems. Appl. Soft Comput. 2011;11(1):675–688.
  8. Fan C.Y., Chang P.C., Lin J.J., Hsieh J. A hybrid model combining case-based reasoning and fuzzy decision tree for medical data classification. Appl. Soft Comput. 2011;11(1):632–644.
  9. Gacto M., Alcala R., Herrera F. Integration of an index to preserve the semantic interpretability in the multiobjective evolutionary rule selection and tuning of linguistic fuzzy systems. IEEE Trans. Fuzzy Syst. 2010;18(3):515–531.
  10. Gadaras I., Mikhailov L. An interpretable fuzzy rule-based classification methodology for medical diagnosis. Artif. Intell. Med. 2009;47(1):25–41. doi: 10.1016/j.artmed.2009.05.003.
  11. Kumar M., Stoll R., Stoll N. Deterministic approach to robust adaptive learning of fuzzy models. IEEE Trans. Syst. Man Cybern. B Cybern. 2006;36(4):767–780. doi: 10.1109/tsmcb.2006.870625.
  12. Kumar M., Stoll N., Kaber D., Thurow K., Stoll R. 2007. Fuzzy filtering for an intelligent interpretation of medical data; pp. 225–230. (Proc. IEEE International Conference on Automation Science and Engineering (CASE 2007), Scottsdale, Arizona USA).
  13. Kumar M., Weippert M., Vilbrandt R., Kreuzfeld S., Stoll R. Fuzzy evaluation of heart rate signals for mental stress assessment. IEEE Trans. Fuzzy Syst. 2007;15(5):791–808.
  14. Kumar M., Arndt D., Kreuzfeld S., Thurow K., Stoll N., Stoll R. Fuzzy techniques for subjective workload score modelling under uncertainties. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2008;38(6):1449–1464. doi: 10.1109/TSMCB.2008.927712.
  15. Kumar M., Stoll N., Stoll R. Adaptive fuzzy filtering in a deterministic setting. IEEE Trans. Fuzzy Syst. 2009;17(4):763–776.
  16. Kumar M., Stoll N., Stoll R. On the estimation of parameters of takagi-sugeno fuzzy filters. IEEE Trans. Fuzzy Syst. 2009;17(1):150–166.
  17. Kumar M., Weippert M., Arndt D., Kreuzfeld S., Thurow K., Stoll N., Stoll R. Fuzzy filtering for physiological signal analysis. IEEE Trans. Fuzzy Syst. 2010;18(1):208–216.
  18. Kumar M., Weippert M., Stoll N., Stoll R. A mixture of fuzzy filters applied to the analysis of heartbeat intervals. Fuzzy Optim. Decis. Making. 2010;9(4):383–412.
  19. Kumar M., Neubert S., Behrendt S., Rieger A., Weippert M., Stoll N., Thurow K., Stoll R. Stress monitoring based on stochastic fuzzy analysis of heartbeat intervals. IEEE Trans. Fuzzy Syst. 2012;20(4):746–759.
  20. Kumar M., Stoll N., Thurow K., Stoll R. 2012. Physiological signals to individual assessment for application in wireless health systems; pp. 1–6. (Proc. 9th International Multi-Conference on Systems, Signals and Devices (SSD)).
  21. Kumar M., Stoll N., Stoll R., Thurow K. A stochastic framework for robust fuzzy filtering and analysis of signals-part i. IEEE Trans. Cybern. 2016;46(5):1118–1131. doi: 10.1109/TCYB.2015.2423657.
  22. Kumar M., Stoll N., Stoll R., Thurow K. Variational optimization of fuzzy membership functions. Artif. Intell. Under-Rev. 2016
  23. Liao T.W., Celmins A.K., Hammell R.J. A fuzzy c-means variant for the generation of fuzzy term sets. Fuzzy Sets Syst. 2003;135(2):241–257.
  24. Makrehchi M., Basir O., Kamel M. Generation of fuzzy membership function using information theory measures and genetic algorithm. In: Bilgiç T., De Baets B., Kaynak O., editors. vol. 2715. Springer; Berlin Heidelberg: 2003. pp. 603–610. (Fuzzy Sets and Systems – IFSA 2003, Lecture Notes in Computer Science).
  25. Mottaghi-Kashtiban M., Khoei A., Hadidi K. Optimization of rational-powered membership functions using extended kalman filter. Fuzzy Sets Syst. 2008;159(23):3232–3244.
  26. Nguyen T., Khosravi A., Creighton D., Nahavandi S. Medical data classification using interval type-2 fuzzy logic system and wavelets. Appl. Soft Comput. 2015;30:812–822.
  27. Oh S.K., Pedrycz W., Park H.S. Hybrid identification in fuzzy-neural networks. Fuzzy Sets Syst. 2003;138(2):399–426.
  28. Papageorgiou E.I. A new methodology for decisions in medical informatics using fuzzy cognitive maps based on fuzzy rule-extraction techniques. Appl. Soft Comput. 2011;11(1):500–513.
  29. Pulkkinen P., Koivisto H. A dynamically constrained multiobjective genetic fuzzy system for regression problems. IEEE Trans. Fuzzy Syst. 2010;18(1):161–177.
  30. Robles I., Alcalá R., Benítez J.M., Herrera F. Evolutionary parallel and gradually distributed lateral tuning of fuzzy rule-based systems. Evol. Intell. 2009;2(1–2):5–19.
  31. Seera M., Lim C.P. A hybrid intelligent system for medical data classification. Expert Syst. Appl. 2014;41(5):2239–2249.
  32. Simon D. H∞ estimation for fuzzy membership function optimization. Int. J. Approximate Reasoning. 2005;40(3):224–242.

Articles from Saudi Journal of Biological Sciences are provided here courtesy of Elsevier
