Heliyon. 2022 Jun 9;8(6):e09715. doi: 10.1016/j.heliyon.2022.e09715

Data dimensionality reduction technique for clustering problem of metabolomics data

Rustam, Agus Yodi Gunawan, Made Tri Ari Penia Kresnowati
PMCID: PMC9201019  PMID: 35721675

Abstract

In metabolomics studies, independent analyses, i.e., replicates of the metabolite concentration measurements, are often performed to anticipate errors. On the other hand, the size of the dataset keeps increasing. For clustering purposes, representative chemical information must be extracted from the independent analyses. The objective of this study is to develop a data reduction method that yields a dataset representing this chemical information. Overall, a proper data reduction method simplifies the clustering of metabolite data. We propose the modified Weiszfeld algorithm (MWA) to reduce independent analyses. To obtain comprehensive results, we compare MWA with several well-known reduction methods, including PCA, CMDS, LE, and LLE. The reduced datasets are then clustered using the fuzzy c-means (FCM) algorithm with the Tang Sun Sun (TSS) index and silhouette index as the cluster validity indices. The results show that MWA, together with PCA, presents the optimal number of clusters, namely four clusters. This result aligns with the optimal number of clusters before dimensionality reduction. The present results show that MWA robustly performs dimensionality reduction of independent analyses while maintaining the chemical information in the reduced dataset. Therefore, we recommend MWA as a reliable chemometric technique, and the present finding enriches the chemometric techniques available for metabolomics studies.

Keywords: Metabolomics, Chemometric, Metabolite data, Dimensionality reduction, Indonesian clove buds



1. Introduction

The term metabolomics was introduced about 20 years ago. Since then, metabolomics has seen a tremendous increase in analytics platforms and data analysis [2], [11], [14]. Metabolomics is a comprehensive study related to identifying and quantifying all metabolites (small molecules) in a biological system [16], [38]. A complete picture of an organism's metabolic status and biochemical processes can be obtained by analyzing metabolites in a biological sample [42].

Mass spectrometry (MS) and nuclear magnetic resonance (NMR) are two instruments in metabolomics that have been widely utilized to record the status or metabolic state of biological systems [1], [26], [34], [57]. MS comes in different versions and settings, as a stand-alone instrument and in combination with chromatographic separation instruments such as gas chromatography (GC) and liquid chromatography (LC). GC-MS and LC-MS are combinations of MS with chromatographic separation instruments. The GC-MS instrument makes it possible to characterize natural product plant compounds with high chemical diversity [21], [53]. Likewise, detailed chromatogram profiles of biological samples can be obtained using GC-MS characterization [18], [21]. Metabolomic data from natural product plants generally consist of measurements of large numbers of metabolites that are multidimensional and noisy. A multivariate analysis known as chemometric techniques is necessary to interpret metabolomics data, i.e., to obtain meaningful information from the metabolite dataset of a natural product plant. Chemometrics is a sub-discipline of chemistry that utilizes mathematics, statistics, and computer science to maximize the information extracted from the measured metabolite dataset [41].

In this research, a metabolomic study is carried out on one of the natural plantation commodities originating from Indonesia, namely clove buds [28]. Clove buds harvested from different regions are reported to have a specific flavor that may correspond to different metabolic profiles of the clove buds. Differentiating clove buds is needed by manufacturers of cosmetics and foodstuffs that use cloves in their products to maintain product quality, particularly taste. To date, clove bud types are distinguished by a conventional qualitative method: a flavorist tastes and smells buds to identify their aroma and taste. The development of metabolomic methods will serve as an essential basis for developing an automatic instrument to distinguish different types of clove buds. However, the complexity of the clove buds metabolite dataset hinders the direct clustering of clove buds based on their metabolite compositions, and an appropriate technique is needed to handle this complexity. This paper presents a preprocessing method that reduces the size of the metabolite dataset and thereby decreases its complexity.

The typical metabolite dataset has a wide range of metabolite concentrations, namely from $10^{-4}$ to $10^{4}$. Logarithmic transformations are employed to obtain numerically reliable data. On the other hand, some metabolites have zero concentration, so the logarithmic transformation cannot be applied directly. Metabolites having zero concentration are not removed from the dataset, because the zero concentration could be caused by the limitations of the tools used to detect metabolites with small concentrations (less than $10^{-4}$), and such metabolites may still function as biomarkers of a particular origin [45]. Therefore, we replaced each zero concentration with $10^{-5}$, one order of magnitude below the smallest detected metabolite concentration. Variations between samples may also be high, among others due to measurement errors; independent analyses are normally conducted to overcome this problem. Overall, these describe the characteristics of the metabolite dataset. Conducting the clustering process directly on such a dataset may lead to meaningless results; for example, independent analyses (replicates) of the same sample may end up in different clusters.
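As an illustration, this preprocessing step can be sketched as follows (a minimal sketch assuming NumPy; the function name and the `floor` parameter are ours, with the floor set to the $10^{-5}$ replacement value described above):

```python
import numpy as np

def preprocess(concentrations, floor=1e-5):
    """Replace zero concentrations with a value one order of magnitude
    below the smallest detectable level, then log10-transform."""
    data = np.asarray(concentrations, dtype=float)
    data[data == 0.0] = floor   # keep potential biomarkers in the dataset
    return np.log10(data)
```

For example, `preprocess([1e4, 1e-4, 0.0])` maps the concentrations onto the comparable scale `[4, -4, -5]`.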

This research aims to find representative data points (data vectors) for the independent analyses. In previous research [44], we reduced the independent analyses using the median, computed for each metabolite separately. However, this method is not suitable for independent analyses carried out in the laboratory: the independent analyses in each region should be viewed as multivariate data, not as univariate data in which each metabolite can be reduced separately using its median. Thus, reducing independent analyses by taking the median of each metabolite is less precise.

Many recent developments in dimensionality reduction techniques for metabolomics data are based on the PCA technique [27], [31] and on various other machine learning methods [23], [33], [35], [36]. In metabolomics studies, independent analyses are always performed to prevent errors in measuring metabolite concentrations. In this study, each independent analysis takes the form of a metabolite data vector, and a region consists of several such vectors (see Fig. 1). For clustering purposes, these independent analyses need to be reduced to a single vector of metabolite data. This reduction avoids uninformative cluster results, which arise when some independent analyses from a region leave the other independent analyses of that region and join clusters whose members come from other regions. Independent analyses from the same region should not fall into different clusters, because an independent analysis is only a repetition of the experiment in that region. Therefore, a reliable data dimensionality reduction technique is needed to reduce the independent analyses in each region into one metabolite data vector. In this study, we propose the modified Weiszfeld algorithm (MWA) to deal with this problem. MWA represents the independent analyses by a single data vector: it searches for the data vector that minimizes the total distance to all existing data vectors.

Figure 1.

Figure 1

The structure of the clove bud metabolite dataset, used in this research.

To obtain more comprehensive results, we compared the clustering results of data reduced using our proposed MWA with those of several well-known dimensionality reduction methods: principal component analysis (PCA) [17], [24], [51], classical multidimensional scaling (CMDS) [9], [13], [56], Laplacian eigenmaps (LE) [10], [48], [49], and locally linear embedding (LLE) [20], [54], [58]. The main objective of this paper is to evaluate the reliability of MWA as a data dimensionality reduction technique, specifically for metabolite data, and our focus is on comparing it with these well-known techniques. This paper does not present a comparison of clustering techniques and cluster validity indices. So, for clustering, we only use the fuzzy c-means (FCM) algorithm, and for the cluster validity index, we use the Tang Sun Sun (TSS) index.

The rest of this paper is organized as follows. In Section 2, we describe the real-world dataset used in this study. Furthermore, this section describes the modified Weiszfeld algorithm (MWA) as a data dimensionality reduction technique, fuzzy c-means (FCM) as a clustering technique, and the Tang Sun Sun (TSS) index and the silhouette index as cluster validity indices. In Section 3, we present and discuss the results, comparing the clustering of data reduced using MWA with that of data reduced using PCA, CMDS, LE, and LLE. Finally, in Section 4, we summarize the findings of this study.

2. Materials and methods

2.1. Dataset

This research employed a case study on Indonesian clove buds, whose metabolite dataset was obtained from the research of Kresnowati et al. [28]. The dataset contained GC-MS analysis results from clove buds samples obtained from four different origins in Indonesia. Three independent clove buds samples were taken from each origin, representing different clove hubs or suppliers in that origin. We refer to each independent clove buds sample as a region. Overall, there were twelve independent clove buds samples (regions) that were extracted and analyzed to obtain the clove buds metabolite dataset. Six to eight independent analyses were performed on each of the twelve samples; this high number of replications was performed to anticipate errors and noise in the measurements. On average, 47 metabolites were detected in each GC-MS measurement. The structure of the Indonesian clove buds metabolite dataset is shown in Fig. 1.

2.2. The modified Weiszfeld algorithm

In this research, the modified Weiszfeld algorithm is proposed to reduce six or eight independent analyses (data vectors) to one data vector. It means the data matrix that was originally [47×8] or [47×6] in each region is reduced to [47×1] (see Fig. 1 and Fig. 2). This problem can be formulated mathematically as finding $y \in \mathbb{R}^d$ which solves

$$\min_{y}\Big\{C(y)=\sum_{i=1}^{n}\eta_i\,\|y-x_i\|\Big\} \tag{1}$$

where $y$ is the representative data point sought for each region, $x_i\in\mathbb{R}^d$ denotes the independent analyses in each region, $d$ is the number of metabolites in each independent analysis, $\|y-x_i\|$ is the Euclidean distance between $y$ and $x_i$ in $\mathbb{R}^d$, and $\eta_i$ is the weight associated with that distance. The Weiszfeld algorithm finds the data point in $\mathbb{R}^d$ that minimizes the weighted sum of Euclidean distances to the $n$ given data points. Therefore, we have to find the solution of the unconstrained optimization problem in Equation (1).

Figure 2.

Figure 2

The structure of the clove bud metabolite dataset, after dimensionality reduction.

The partial derivative of the objective function $C(y)$ with respect to $y$ is

$$\frac{\partial C(y)}{\partial y}=\sum_{i=1}^{n}\eta_i\,\frac{y-x_i}{\|y-x_i\|},\qquad y\notin X,$$

where $X=\{x_1,\dots,x_n\}\subset\mathbb{R}^d$. Suppose that $y\notin X$ is the optimal solution of the objective function $C(y)$; then we obtain

$$\frac{\partial C(y)}{\partial y}=\sum_{i=1}^{n}\eta_i\,\frac{y-x_i}{\|y-x_i\|}=0. \tag{2}$$

From (2), we obtain

$$y=\frac{\sum_{i=1}^{n}\eta_i\,x_i/\|y-x_i\|}{\sum_{i=1}^{n}\eta_i/\|y-x_i\|},$$

or $y=T(y)$, where the operator $T:\mathbb{R}^d\to\mathbb{R}^d$ is defined by

$$T(y)=\frac{\sum_{i=1}^{n}\eta_i\,x_i/\|y-x_i\|}{\sum_{i=1}^{n}\eta_i/\|y-x_i\|}.$$

The Weiszfeld algorithm is described as follows.

Step 1: Initiate $y^{(0)}\notin X$, $\eta_i>0$, and $\varepsilon>0$. Then in the $t$-th iteration, for $t=0,1,2,3,\dots$

Step 2: Calculate $T_0(y^{(t)})$ using

$$T_0(y^{(t)})=\frac{\sum_{i=1}^{n}\eta_i\,x_i/\|y^{(t)}-x_i\|}{\sum_{i=1}^{n}\eta_i/\|y^{(t)}-x_i\|}. \tag{3}$$

Step 3: Update the value of y using

$$y^{(t+1)}=\begin{cases}T_0(y^{(t)}), & \text{if } y^{(t)}\notin X,\\ x_i, & \text{if } y^{(t)}=x_i\in X.\end{cases} \tag{4}$$

Step 4: If $y$ never coincides with any $x_i$ during the iterations, then compare $y^{(t)}$ to $y^{(t+1)}$ using $\|y^{(t+1)}-y^{(t)}\|<\varepsilon$. If true, then stop. Otherwise, set $t=t+1$ and return to Step 2. If $y=x_i$ occurs, i.e., $y\in X$, the iterations stop at that point. The Weiszfeld algorithm thus finds $y\in\mathbb{R}^d$.
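The steps above can be sketched in Python (a minimal illustration assuming NumPy; the function name and the array layout, one independent analysis per row, are our choices):

```python
import numpy as np

def weiszfeld(X, eta=None, eps=1e-5, max_iter=100):
    """Plain Weiszfeld iteration: find y minimizing sum_i eta_i * ||y - x_i||.
    Stops early if the iterate lands exactly on a data point (Step 4)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    eta = np.ones(n) if eta is None else np.asarray(eta, dtype=float)
    y = np.zeros(d)                      # y^(0), assumed to lie outside X
    for _ in range(max_iter):
        dist = np.linalg.norm(X - y, axis=1)
        if np.any(dist == 0.0):          # y coincides with some x_i: stop
            break
        w = eta / dist
        y_new = (w[:, None] * X).sum(axis=0) / w.sum()   # T_0(y^(t)), Eq. (3)
        if np.linalg.norm(y_new - y) < eps:
            return y_new
        y = y_new
    return y
```

For four points placed symmetrically around the origin, the iteration converges to the geometric median at the origin.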

The Weiszfeld algorithm gets stuck when $y=x_i$, due to division by zero in (3). So, Vardi and Zhang [52] modified the Weiszfeld algorithm to deal with the condition $y=x_i$, i.e., $y\in X$.

Given $y\in\mathbb{R}^d$, which may or may not lie in $X$, define the multiplicity at $y$ as

$$\eta(y)=\begin{cases}\eta_k, & \text{if } y=x_k\in X,\\ 0, & \text{if } y\notin X.\end{cases}$$

The modification of Equation (4) for $y\in X$ is based on the following observation. For $y\notin X$, the vector $x=\tilde T(y)$ in the following equation

$$\tilde T: y\mapsto \tilde T(y)=\frac{\sum_{i=1}^{n}\eta_i\,x_i/\|y-x_i\|}{\sum_{i=1}^{n}\eta_i/\|y-x_i\|} \tag{5}$$

is the unique minimizer of

$$f(x;y)=\sum_{i=1}^{n}\eta_i\,\frac{\|x-x_i\|^2}{2\,d_i(y)}, \tag{6}$$

where $d_i(y)=\|y-x_i\|$.

So, the problem $\arg\min_x C(x)$ in the Weiszfeld algorithm is replaced by $\arg\min_x f(x;y)$ in each iteration. The argument for the use of $f(x;y)$ is

$$\nabla_x f(x;y)\big|_{x=y}=\nabla_x C(x)\big|_{x=y},\qquad \forall\, y\notin X. \tag{7}$$

The two minimization problems agree in all sufficiently small neighborhoods of $y$, $y\notin X$ [52]. This shows that in Equation (4), if $y\in X$, then we should iterate with

$$x^{(t+1)}=\arg\min_x f\big(x;x^{(t)}\big). \tag{8}$$

For this to have meaning, we need to extend the definition of $f$ in Equation (6) to cover $y\in X$. We define

$$f(x;y)=\eta(y)\|x-y\|+\sum_{x_i\neq y}\eta_i\,\frac{\|x-x_i\|^2}{2\,d_i(y)}=\begin{cases}\displaystyle\sum_{i=1}^{n}\eta_i\,\frac{\|x-x_i\|^2}{2\,d_i(y)}, & \text{if } y\notin X,\\[2ex] \displaystyle\eta_k\|x-y\|+\sum_{i\neq k}\eta_i\,\frac{\|x-x_i\|^2}{2\,d_i(y)}, & \text{if } y=x_k\in X.\end{cases}$$

Although $C(x)$ is not differentiable at $x_k$, Equation (7) extends to $y=x_k\in X$ in the sense that

$$\lim_{x\to x_k,\,x\neq x_k}\big\{\nabla_x f(x;x_k)-\nabla_x C(x)\big\}=0.$$

The modification (8) of (4) at data vectors $y\in X$ results in the following equation:

$$y\mapsto T(y)=\left(1-\min\Big\{1,\frac{\eta(y)}{r(y)}\Big\}\right)\tilde T(y)+\min\Big\{1,\frac{\eta(y)}{r(y)}\Big\}\,y, \tag{9}$$

with the convention $0/0=0$ in the computation of $\eta(y)/r(y)$, where $\tilde T$ is as in (5) and

$$r(y)=\|\tilde R(y)\|,\qquad \tilde R(y)=\sum_{x_i\neq y}\eta_i\,\frac{x_i-y}{\|x_i-y\|}. \tag{10}$$

For $y\notin X$, we get $T(y)=\tilde T(y)$ by Equation (9) with $\eta(y)=0$, as in the Weiszfeld algorithm. For $y=x_k\in X$, $T(y)$ lies between $\tilde T(x_k)$ and $x_k$, so that by (5), $T(y)$ is also a weighted average of $X$. Moreover, for $y\notin X$, $\tilde R(y)$ of Equation (10) is the negative of the gradient of $C(y)$. It follows from Equation (5) that

$$\tilde R(y)=\big(\tilde T(y)-y\big)\sum_{x_i\neq y}\frac{\eta_i}{d_i(y)}. \tag{11}$$

Equations (10) and (11) imply that $\tilde T(y)=y=T(y)$ when $r(y)=\|\tilde R(y)\|=0$. The modified Weiszfeld algorithm is described as follows.

Step 1: Initiate $y^{(0)}\notin X$, $\eta_i>0$, and $\varepsilon>0$. Then in the $t$-th iteration, for $t=0,1,2,3,\dots$

Step 2: Calculate $T(y^{(t)})$ using

$$T(y^{(t)})=\frac{\sum_{x_i\neq y^{(t)}}\eta_i\,x_i/\|y^{(t)}-x_i\|}{\sum_{x_i\neq y^{(t)}}\eta_i/\|y^{(t)}-x_i\|}. \tag{12}$$

Step 3: Determine the weight

$$\eta(y^{(t)})=\begin{cases}1, & \text{if } y^{(t)}\in X,\\ 0, & \text{if } y^{(t)}\notin X.\end{cases}$$

Step 4: Calculate

$$R(y^{(t)})=\bigg\|\sum_{x_i\neq y^{(t)}}\eta_i\,\frac{x_i-y^{(t)}}{\|x_i-y^{(t)}\|}\bigg\|$$

and

$$\psi(y^{(t)})=\min\Big\{1,\frac{\eta(y^{(t)})}{R(y^{(t)})}\Big\}.$$

Step 5: Update the value of y using

$$y^{(t+1)}=\big(1-\psi(y^{(t)})\big)\,T(y^{(t)})+\psi(y^{(t)})\,y^{(t)}. \tag{13}$$

Step 6: Compare $y^{(t)}$ to $y^{(t+1)}$ using $\|y^{(t+1)}-y^{(t)}\|<\varepsilon$. If true, then stop. Otherwise, set $t=t+1$ and return to Step 2.

The condition $y\notin X$ implies $\psi(y^{(t)})=0$, and the modified Weiszfeld algorithm then behaves exactly as the Weiszfeld algorithm. Also, if $y\in X$, the sum in (3) is calculated as in (12), i.e., only over $x_i\neq y$, and the condition $y\in X$ is accounted for afterwards as in (13), namely by applying the weight $\psi(y^{(t)})$ [19].
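Steps 1–6 can be sketched as follows, under the same assumptions as before (NumPy, one data vector per row; function and variable names are ours, and the convention $0/0=0$ is handled explicitly):

```python
import numpy as np

def modified_weiszfeld(X, eta=None, eps=1e-5, max_iter=100):
    """Vardi-Zhang modification of the Weiszfeld iteration: well defined
    even when the iterate coincides with a data point x_k."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    eta = np.ones(n) if eta is None else np.asarray(eta, dtype=float)
    y = np.zeros(d)                              # y^(0)
    for _ in range(max_iter):
        dist = np.linalg.norm(X - y, axis=1)
        mask = dist > 0.0                        # sum only over x_i != y
        if not mask.any():                       # degenerate: all points equal y
            return y
        w = eta[mask] / dist[mask]
        T = (w[:, None] * X[mask]).sum(axis=0) / w.sum()          # Eq. (12)
        eta_y = eta[~mask].sum()                 # multiplicity eta(y)
        R = np.linalg.norm((w[:, None] * (X[mask] - y)).sum(axis=0))
        if eta_y == 0.0:
            psi = 0.0                            # y not a data point: plain step
        elif R > 0.0:
            psi = min(1.0, eta_y / R)
        else:
            psi = 1.0                            # r(y) = 0: y is the minimizer
        y_new = (1.0 - psi) * T + psi * y                         # Eq. (13)
        if np.linalg.norm(y_new - y) < eps:
            return y_new
        y = y_new
    return y
```

Unlike the plain Weiszfeld iteration, this version does not stall when the iterate hits a repeated data point: with three points at the origin and one at (10, 0), the iterate starts exactly on the tripled point and correctly stays there, since the origin is the weighted geometric median.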

2.3. Fuzzy c means (FCM) algorithm

Conventional clustering means clustering the given observations into exclusive clusters: we can clearly distinguish whether a data point belongs to a cluster or not. However, such a partition is not sufficient to represent many realistic situations. Therefore, the fuzzy clustering method is offered to build clusters with uncertain boundaries. This method allows one data vector (data point) to be part of several clusters that overlap to a certain degree. In other words, the essence of fuzzy clustering is to consider both the membership status of a cluster and the extent to which objects belong to the cluster [47].

Suppose $Z=\{z_1,z_2,\dots,z_n\}\subset\mathbb{R}^d$ is the set of $n$ data points with dimension $d$ to be clustered. In the case of the Indonesian clove buds metabolite dataset, $z_k\in\mathbb{R}^d$ $(k=1,2,\dots,n)$ is the data point resulting from the dimensionality reduction of the independent analyses in each region. Furthermore, $v_i\in\mathbb{R}^d$ $(i=1,2,\dots,c)$ is a cluster center vector of the reduced dataset $Z$, and $c$ $(1<c<n)$ is the number of clusters of the reduced dataset. The degree of membership of the data point $z_k$ to the cluster center $v_i$ can be expressed as $u_{ik}=\mu_{v_i}(z_k)\in[0,1]$. The degree of membership $u_{ik}$ represents the probability of the data point $z_k$ becoming a member of the cluster $v_i$.

The matrix $U=[u_{ik}]\in\mathbb{R}^{c\times n}$ is referred to as the fuzzy partition, which satisfies

$$u_{ik}\in[0,1],\qquad 1\le i\le c;\ 1\le k\le n, \tag{14}$$

$$\sum_{k=1}^{n}u_{ik}>0,\qquad \forall\, i\in\{1,2,\dots,c\}, \tag{15}$$

and

$$\sum_{i=1}^{c}u_{ik}=1,\qquad \forall\, k\in\{1,2,\dots,n\}. \tag{16}$$

The set of all matrices satisfying (14)–(16) is denoted $M_{fcn}$. Equation (15) guarantees that no cluster is left empty without members; the clustering process may otherwise produce clusters with no members, and (15) is needed to avoid this. Equation (16) ensures that the degrees of membership of each data point sum to 1: each data point has a degree of membership in every cluster, but with varying magnitudes. As a consequence of (15) and (16), no cluster can contain the full membership of all data points.

One of the most widely used fuzzy clustering techniques is the fuzzy c-means algorithm [5], [8], [12], [15], [22], [29], [32]. The purpose of clustering the dataset into c fuzzy clusters is achieved by minimizing the following objective function [6].

$$J_m(U,V;Z)=\sum_{k=1}^{n}\sum_{i=1}^{c}u_{ik}^{m}\,d_{ik}^{2}, \tag{17}$$

where $V=\{v_1,v_2,\dots,v_c\}\subset\mathbb{R}^d$ is the set of cluster centers, $m>1$ is a fuzzy parameter, and $d_{ik}$ is the Euclidean distance between $z_k$ and $v_i$. Moreover, $u_{ik}$ in the objective function $J_m$ denotes the membership degree of the data vector (data point) $z_k$ to the cluster $v_i$. From the objective function $J_m$, we see that FCM minimizes the weighted within-class sum of squares. Aside from assigning a data point to a cluster, membership degrees can also express how ambiguously a data point belongs to a cluster. The concept of these membership degrees is substantiated by Zadeh's definition of a fuzzy set in 1965. Thus, fuzzy clustering allows solution spaces in fuzzy partitions of the given dataset. The fuzzy clustering approach with the objective function $J_m$ under constraints (15) and (16) is also called probabilistic clustering, since due to constraint (16), the membership degree $u_{ik}$ can be interpreted as the probability that data vector $z_k$ belongs to cluster $v_i$.

The optimal partition of dataset Z can be obtained by finding U and V which minimize the objective function Jm. The objective function Jm reaches a local minimum when its partial derivative concerning uik and vi is equal to zero and satisfies the constraints on (15) and (16). So we get [6]

$$u_{ik}=\left(\sum_{j=1}^{c}\Big(\frac{d_{ik}^{2}}{d_{jk}^{2}}\Big)^{\frac{1}{m-1}}\right)^{-1},\qquad 1\le i,j\le c;\ 1\le k\le n, \tag{18}$$

and

$$v_i=\frac{\sum_{k=1}^{n}u_{ik}^{m}\,z_k}{\sum_{k=1}^{n}u_{ik}^{m}},\qquad 1\le i\le c. \tag{19}$$

Picard iteration is one of the popular algorithms for solving (17) through (18) and (19). This type of iteration is often called alternating optimization because it repeats a single cycle, namely $V^{(t-1)}\to U^{(t)}\to V^{(t)}$, and checks the stopping condition $\|V^{(t-1)}-V^{(t)}\|<\varepsilon$. This point is described in detail in [4] and [7]. In principle, $u_{ik}$ and $v_i$ should be determined simultaneously; however, we choose to initialize $v_i$ and then compute $u_{ik}$ [46]. There are several advantages to initializing and terminating on $v_i$ in terms of convenience, convergence speed, and storage [40]. The fuzzy c-means algorithm is described as follows.

Step 1: Fix $m>1$, $1<c<n$, and $\varepsilon>0$. Initiate $v^{(0)}\in\mathbb{R}^d$; $v^{(0)}$ can be selected randomly from $Z\subset\mathbb{R}^d$. Then in the $t$-th iteration, $t=0,1,2,\dots$

Step 2: Calculate uik using

$$u_{ik}^{(t+1)}=\left(\sum_{j=1}^{c}\Big(\frac{d_{ik}^{2}}{d_{jk}^{2}}\Big)^{\frac{1}{m-1}}\right)^{-1},\qquad 1\le i\le c;\ 1\le k\le n,$$

where $d_{ik}^{2}=\|z_k-v_i^{(t)}\|^{2}$.

Step 3: Update vi using

$$v_i^{(t+1)}=\frac{\sum_{k=1}^{n}\big(u_{ik}^{(t+1)}\big)^{m}z_k}{\sum_{k=1}^{n}\big(u_{ik}^{(t+1)}\big)^{m}},\qquad 1\le i\le c.$$

Step 4: Compare $v_i^{(t)}$ to $v_i^{(t+1)}$ using $\|v^{(t+1)}-v^{(t)}\|<\varepsilon$. If true, then stop. Otherwise, set $t=t+1$ and return to Step 2.
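Steps 1–4 can be sketched as follows (an illustrative NumPy implementation; random initialization of the centers from $Z$ follows Step 1, while the seed and the small guard against division by zero are our own choices):

```python
import numpy as np

def fcm(Z, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Fuzzy c-means by alternating optimization V -> U -> V (Eqs. 18-19).
    Returns U with shape (n, c) and centers V with shape (c, d)."""
    Z = np.asarray(Z, dtype=float)
    n = Z.shape[0]
    rng = np.random.default_rng(seed)
    V = Z[rng.choice(n, size=c, replace=False)]   # initialize centers from data
    for _ in range(max_iter):
        d2 = ((Z[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)   # d_ik^2
        d2 = np.maximum(d2, 1e-12)        # guard against division by zero
        U = 1.0 / d2 ** (1.0 / (m - 1.0))
        U /= U.sum(axis=1, keepdims=True) # memberships of z_k sum to 1, Eq. (16)
        Um = U ** m
        V_new = (Um.T @ Z) / Um.T.sum(axis=1, keepdims=True)      # Eq. (19)
        if np.linalg.norm(V_new - V) < eps:
            V = V_new
            break
        V = V_new
    return U, V
```

On two well-separated groups of points, the memberships concentrate so that each group is assigned to its own cluster.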

2.4. Cluster validity index

In the clustering process, it is necessary to know the optimal number of clusters from a dataset. The cluster validity index was employed to determine the optimal number of clusters from the dataset.

2.4.1. The Tang Sun Sun (TSS) index

The idea of this cluster validity index is to measure geometrical compactness in each cluster [25]. The Xie-Beni index [55] is widely employed to determine the optimal number of clusters. However, due to its monotone tendency to zero as $c\to n$, the Xie-Beni index can provide a biased optimal number of clusters. This monotone behavior of the Xie-Beni index has been extensively studied and discussed in various literature, including [30], [39], [50]. Xie and Beni also mentioned in their paper that their cluster validity index decreases monotonically as $c\to n$. On the other hand, the optimal number of clusters under the Xie-Beni index is indicated by the smallest value over all candidate numbers of clusters $1<c<n$; with a decreasing monotone property that converges to zero, the smallest Xie-Beni value may occur at $c=n-1$ clusters. Therefore, to avoid biased cluster results, we used the Tang Sun Sun index as the cluster validity index. The Tang Sun Sun (TSS) index [50] does not converge to zero as $c\to n$. The Tang Sun Sun index is defined as follows:

$$TSS(U,V;Z)=\frac{\displaystyle\sum_{i=1}^{c}\sum_{k=1}^{n}u_{ik}^{2}\,\|z_k-v_i\|^{2}}{\displaystyle\min_{1\le i,j\le c,\,i\neq j}\|v_i-v_j\|^{2}+\frac{1}{c}}+\frac{\displaystyle\frac{1}{c(c-1)}\sum_{i=1}^{c}\sum_{j=1,\,j\neq i}^{c}\|v_i-v_j\|^{2}}{\displaystyle\min_{1\le i,j\le c,\,i\neq j}\|v_i-v_j\|^{2}+\frac{1}{c}}.$$

The punishing ad hoc function in the numerator of the second term of the Tang Sun Sun index effectively eliminates the descending monotone tendency as $c\to n$, as shown below [50].

$$\lim_{c\to n}TSS(U,V;Z)=\lim_{c\to n}\frac{\sum_{i=1}^{c}\sum_{k=1}^{n}u_{ik}^{2}\,\|z_k-v_i\|^{2}}{\min_{i\neq j}\|v_i-v_j\|^{2}+\frac{1}{c}}+\lim_{c\to n}\frac{\frac{1}{c(c-1)}\sum_{i=1}^{c}\sum_{j=1,\,j\neq i}^{c}\|v_i-v_j\|^{2}}{\min_{i\neq j}\|v_i-v_j\|^{2}+\frac{1}{c}}=0+\frac{\frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{j=1,\,j\neq i}^{n}\|z_i-z_j\|^{2}}{\min_{i\neq j}\|z_i-z_j\|^{2}+\frac{1}{n}}=\frac{\sum_{i=1}^{n}\sum_{j=1,\,j\neq i}^{n}\|z_i-z_j\|^{2}}{n(n-1)\min_{i\neq j}\|z_i-z_j\|^{2}+(n-1)}. \tag{20}$$

Equation (20) indicates that the Tang Sun Sun index does not converge to zero as $c\to n$. The optimal number of clusters under the Tang Sun Sun index is indicated by the smallest value over all candidate numbers of clusters $(1<c<n)$.
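The index can be computed directly from a fuzzy partition $U$, cluster centers $V$, and data $Z$ (an illustrative NumPy sketch; the orientation of $U$ as $c\times n$ follows Eqs. (14)–(16), and the function name is ours):

```python
import numpy as np

def tss_index(U, V, Z):
    """Tang-Sun-Sun validity index (smaller is better).
    U: fuzzy partition, shape (c, n); V: centers (c, d); Z: data (n, d)."""
    U, V, Z = (np.asarray(a, dtype=float) for a in (U, V, Z))
    c = V.shape[0]
    d2 = ((Z[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)   # ||z_k - v_i||^2
    compact = (U.T ** 2 * d2).sum()       # sum_i sum_k u_ik^2 ||z_k - v_i||^2
    sep = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)  # ||v_i - v_j||^2
    off = sep[~np.eye(c, dtype=bool)]     # entries with i != j
    denom = off.min() + 1.0 / c
    penalty = off.sum() / (c * (c - 1))   # punishing ad hoc term
    return compact / denom + penalty / denom
```

For a crisp two-cluster partition of four one-dimensional points with tight clusters and distant centers, the compactness term is small while the penalty term dominates, and the index stays well away from zero.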

2.4.2. The silhouette index

To obtain a more comprehensive result, we also used the silhouette index [43] alongside the TSS index as a cluster validity measure for determining the optimal number of clusters. Constructing the silhouette index requires two ingredients. The first is a partition of the dataset obtained with a clustering technique (in this study, the FCM algorithm). The second is the collection of similarities between data vectors, represented here by the Euclidean distance between data vectors.

In the context of fuzzy clustering, the data vector $z_k$ is assigned to the cluster center $v_i$ to which its membership degree is greatest, namely $u_{ik}>u_{jk}$ for every $j\in\{1,\dots,c\}$, $j\neq i$. Suppose that the average distance of the data vector $z_k$ to all data vectors in its own cluster ($v_i$) is denoted $a_{ik}$. Let also the minimum of the average distances of $z_k$ to the data vectors belonging to each other cluster $v_j$, $j\neq i$, be denoted $a_{jk}$. Then, the silhouette index of the data vector $z_k$ is defined as [43]

$$s_k=\frac{a_{jk}-a_{ik}}{\max\{a_{ik},a_{jk}\}}.$$

The highest index value indicates the optimal number of clusters in the silhouette index.
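A crisp-label version of this index can be sketched as follows (illustrative NumPy code using the cluster assignments given by the highest membership degree; it assumes every cluster has at least two members, and the function name is ours):

```python
import numpy as np

def silhouette(Z, labels):
    """Average silhouette over all points (higher is better).
    a: mean distance to the other points in the same cluster,
    b: smallest mean distance to the points of any other cluster."""
    Z = np.asarray(Z, dtype=float)
    labels = np.asarray(labels)
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)  # pairwise distances
    s = []
    for k in range(len(Z)):
        same = (labels == labels[k]) & (np.arange(len(Z)) != k)
        a = D[k, same].mean()
        b = min(D[k, labels == c].mean() for c in set(labels.tolist()) if c != labels[k])
        s.append((b - a) / max(a, b))
    return float(np.mean(s))
```

For two tight, well-separated groups, the average silhouette is close to 1, signaling a good partition.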

3. Results and discussions

In the modified Weiszfeld algorithm (MWA), the weight $\eta_i$ is set equal to 1. It is important to note that Weiszfeld did not analyze the weighted problem but assumed that all weights equal 1; this is in line with Neumayer et al. [37] and Beck et al. [3]. The initial vector of $y$ is the zero vector ($y^{(0)}=0$), in line with the research of Fritz et al. [19], which uses the zero vector as the initial vector. In both MWA and FCM, we employed $\varepsilon=10^{-5}$ and a maximum of 100 iterations. As for the fuzzy parameter $m$ in FCM, Pal and Bezdek [39] suggested values ranging from 1.5 to 2.5; in this study, we employed the median of that range, namely $m=2$.

The Euclidean norm is squared in clustering to tighten the clustering process, whereas the plain Euclidean norm used in dimensionality reduction is looser. In dimensionality reduction we target only one data vector to represent the six or eight independent analyses in each region, while the clustering of the reduced dataset is carried out more strictly: by applying the squared Euclidean norm, reduced data points are assigned to clusters more tightly.

In this study, we first replaced the zero-concentration metabolites with $10^{-5}$. Furthermore, the dataset was transformed using a logarithmic transformation. The transformed data were first clustered directly, without any dimensionality reduction in each region. The TSS and silhouette index values for each number of clusters are given in Fig. 3 and Fig. 4, respectively.

Figure 3.

Figure 3

The Tang Sun Sun index values without dimensionality reduction.

Figure 4.

Figure 4

The silhouette index values without dimensionality reduction.

Fig. 3 shows the smallest value of the TSS index at four clusters, meaning the optimal number of clusters is four. Meanwhile, Fig. 4 shows the highest silhouette index value at four clusters, which likewise indicates an optimal number of four clusters. Both cluster validity indices thus give the same optimal number of clusters, namely four. Details of the members of each cluster are shown in Table 1.

Table 1.

Clustering result without dimensionality reduction.

Cluster Member of Cluster
I M11, M12, M13, M14, M15, M16, M17, M18, M21, M22, M23, M24, M25, M26, M27, M28, M31, M32, M33, M34, M35, M36, M37, M38, T22, T33
II B11, B12, B13, B14, B15, B16, B17, B18, B21, B22, B23, B24, B25, B26, B27, B28, B31, B32, B33, B34, B35, B36, B37, B38
III J11, J12, J13, J14, J15, J16, J21, J22, J23, J24, J25, J26, J27, J28, J31, J32, J33, J34, J35, J36, J37, J38
IV T11, T12, T13, T14, T15, T16, T17, T18, T21, T23, T24, T25, T26, T27, T28, T31, T32, T34, T35, T36, T37, T38

M12 in Table 1 means the second independent analysis of the first region at the Manado origin. T35 means the fifth independent analysis of the third region at the Toli-Toli origin (see Fig. 1).

In general, Table 1 indicates that each origin of Indonesian clove buds has unique and distinctive taste and aroma characteristics: the clustering shows independent analyses from the same origin falling into the same cluster, and each of the four clusters consists of independent analyses from a single origin. However, Table 1 also shows that the independent analyses T22 and T33 are included in the first cluster, which otherwise contains independent analyses from the Manado origin. This result provides biased information, because two independent analyses (T22 and T33) from the Toli-Toli origin end up in one cluster with independent analyses from the Manado origin. We suspect that errors in the measurement of metabolite concentrations in T22 and T33 cause them to leave the other independent analyses from the Toli-Toli origin and join the cluster of independent analyses from the Manado origin. Therefore, to obtain a more informative and meaningful clustering result, we propose the dimensionality reduction of the independent analyses in each region to one representative data point (one data vector). The dataset, which initially has six or eight independent analyses (data points/data vectors) in each region, is reduced to one data point per region (see Figs. 1 and 2). This was done twelve times because, overall, there are twelve regions. The twelve data vectors resulting from dimensionality reduction are then clustered using the fuzzy c-means (FCM) algorithm, with the TSS and silhouette indices used to determine the optimal number of clusters.

Clustering was performed on the datasets reduced using PCA, CMDS, LE, LLE, and MWA. The obtained TSS and silhouette index values are presented in Tables 2 and 3. The bold numbers in Table 2 show the smallest TSS index value for each dimensionality reduction technique, while the bold numbers in Table 3 show the highest silhouette index value for each technique. In both tables, the bold numbers thus mark the optimal number of clusters for each dimensionality reduction technique used.

Table 2.

The Tang Sun Sun index values after dimensionality reduction.

Number of clusters PCA CMDS LE LLE MWA
2 2.69 1.48 1.90 2.11 2.76
3 2.59 3.80 1.82 3.44 2.45
4 1.99 3.17 2.39 5.11 1.87
5 4.65 4.08 2.02 2.70 3.78
6 5.21 4.01 2.13 2.73 2.98
7 4.82 12.07 2.09 4.63 4.90
8 6.17 12.23 2.16 4.98 5.54
9 8.38 11.19 2.33 4.85 9.14
10 8.37 18.57 2.31 4.64 8.62
11 7.21 21.42 2.30 4.63 8.15

Table 3.

The silhouette index values after dimensionality reduction.

Number of clusters PCA CMDS LE LLE MWA
2 0.66 0.82 0.53 0.58 0.66
3 0.73 0.73 0.45 0.49 0.75
4 0.78 0.79 0.61 0.56 0.78
5 0.77 0.75 0.65 0.69 0.80
6 0.79 0.83 0.72 0.64 0.85
7 0.74 0.87 0.76 0.70 0.80
8 0.76 0.89 0.78 0.81 0.72
9 0.84 0.85 0.74 0.85 0.89
10 0.92 0.94 0.84 0.94 0.94
11 0.98 0.99 0.89 0.99 0.98

We first analyze and interpret the results in Table 2, using the TSS index as the cluster validity index. Based on Table 2, the optimal number of clusters obtained using PCA as the dimensionality reduction technique is four; with CMDS it is two, with LE it is three, and with LLE it is two. Dimensionality reduction using our proposed MWA gives an optimal number of four clusters. Details of the cluster members for each optimal clustering are shown in Tables 4, 5, 6, 7, and 8.

Table 4.

Clustering result by using PCA as dimensionality reduction technique.

Cluster Member of Cluster
I M1, M2, M3
II T1, T2, T3
III B1, B2, B3
IV J1, J2, J3

Table 5.

Clustering result by using CMDS as dimensionality reduction technique.

Cluster Member of Cluster
I J2, J3, T2 B1, B2, B3 M1, M2, M3
II J1, T1, T3

Table 6.

Clustering result by using LE as dimensionality reduction technique.

Cluster Member of Cluster
I B1, B3, M1
II J1, J2, J3 M2, T1
III B2, M3, T2, T3

Table 7.

Clustering result by using LLE as dimensionality reduction technique.

Cluster Member of Cluster
I B2, B3, T1 M1, M2, M3
II J1, J2, J3 B1, T2, T3

Table 8.

Clustering result by using the proposed MWA dimensionality reduction technique.

Cluster Member of Cluster
I M1, M2, M3
II B1, B2, B3
III J1, J2, J3
IV T1, T2, T3

Table 4 shows the members of each of the four optimal clusters obtained with dimensionality reduction using PCA. The smallest TSS index value is 1.99, indicating that the optimal number of clusters is four. This clustering places regions originating from the same origin in the same cluster. Comparing with the clustering before dimensionality reduction in Table 1, we find that clustering after reduction with PCA gives the same result. In general, Table 1 shows that the independent analyses in regions with the same origin have the same characteristics and properties, because those independent analyses fall into the same cluster; likewise, after dimensionality reduction using PCA, regions originating from the same origin remain in the same cluster. So, it can be concluded that PCA can reduce the six or eight independent analyses in each region to one representative data vector, absorbing the maximum chemical information of each region without changing it.

Table 5 shows the members of each of the two optimal clusters obtained after dimensionality reduction with CMDS. The smallest TSS index value is 1.48, so the optimal number of clusters is two. Table 5 indicates that the origins Jawa, Bali, and Manado share the same chemical properties, except that region Jawa 1 (J1) falls in a different cluster, together with the Toli-Toli 1 (T1) and Toli-Toli 3 (T3) regions. The CMDS reduction thus separates J1 from the other regions of the Jawa origin and, likewise, Toli-Toli 2 (T2) from the other regions of the Toli-Toli origin. This contradicts the result in Table 1 that the taste and aroma of cloves from the same origin are not significantly different. We conclude that dimensionality reduction using CMDS cannot represent or maintain the chemical information of each region as it was before the reduction.

Table 2 shows that LE yields the smallest TSS index value of 1.82, so its optimal number of clusters is three, while LLE yields a smallest TSS index value of 2.11, corresponding to two optimal clusters. The clustering results after reduction with LE and LLE, presented in Tables 6 and 7, indicate that neither method maintains the chemical information of each region: regions originating from different origins are mixed within the same cluster, and the partitions do not reflect the distribution of the data before the independent analyses were reduced, as presented in Table 1. LE and LLE are therefore not adequate for dimensionality reduction of the independent analyses in each region.

Furthermore, we present the results obtained with the proposed MWA reduction. MWA yields the smallest TSS index value of 1.87, so the optimal number of clusters is four. Table 8 shows the corresponding clustering after the independent analyses in each region are reduced with MWA: each of the four clusters consists of the regions of one origin. These results align with the clustering obtained after reduction with PCA; both PCA and MWA produce four optimal clusters, each consisting of regions with the same origin. MWA thus consistently represents the six or eight independent analyses in each region as one representative vector while maintaining the chemical information of each region, and its clustering is in line with the pre-reduction result in Table 1. Based on these results, we confirm that the proposed MWA is robust for dimensionality reduction of independent analyses.
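For illustration, the geometric (L1) median that a Weiszfeld-type iteration computes can serve as the single representative of a region's replicates. The sketch below is a generic implementation of the modified (Vardi-Zhang) Weiszfeld iteration, not necessarily the authors' exact MWA, and the replicate values are hypothetical:

```python
import numpy as np

def weiszfeld_median(X, tol=1e-8, max_iter=500):
    """Geometric (L1) median of the rows of X via the Vardi-Zhang
    modified Weiszfeld iteration, which stays well defined when an
    iterate coincides with a data point."""
    y = X.mean(axis=0)                       # start from the centroid
    for _ in range(max_iter):
        d = np.linalg.norm(X - y, axis=1)
        mask = d > tol                       # points not coinciding with y
        w = 1.0 / d[mask]
        T = (w[:, None] * X[mask]).sum(axis=0) / w.sum()
        if mask.all():
            y_new = T
        else:
            # modified step: y coincides with a data point of multiplicity eta
            eta = (~mask).sum()
            R = ((X[mask] - y) * w[:, None]).sum(axis=0)
            r = np.linalg.norm(R)
            if r <= eta:                     # optimality condition: y is the median
                return y
            gamma = min(1.0, eta / r)
            y_new = (1 - gamma) * T + gamma * y
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y

# six hypothetical replicate measurements of one region collapse to one vector
reps = np.array([[1.0, 2.0], [1.1, 2.1], [0.9, 1.9],
                 [1.0, 2.2], [1.2, 2.0], [0.8, 1.8]])
rep_vector = weiszfeld_median(reps)
```

Unlike the mean, the geometric median is robust to a single outlying replicate, which is one motivation for using a Weiszfeld-type representative of independent analyses.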

Chemically, the clustering of clove metabolite data after reduction of the independent analyses with MWA indicates that each clove origin has a unique chemical composition or, in other words, a distinctive taste and aroma. Therefore, if the production stock of one clove origin is unavailable, another available origin cannot substitute for it, because its taste and aroma differ. For producers who use cloves as an ingredient in their product mix, cloves from different origins will give different product quality, since, based on these clustering results, each clove origin has a unique taste and aroma.

Here, we analyze the optimal number of clusters obtained with the silhouette index as the cluster validity index. Table 3 shows that with every dimensionality reduction technique (PCA, CMDS, LE, LLE, and MWA) the optimal number of clusters is 11, based on the highest silhouette index value occurring at 11 clusters for each technique. Based on Table 9, the silhouette index does not reflect the optimal number of clusters before the independent analyses are reduced: before reduction, the silhouette index gives four optimal clusters, whereas after reduction every technique gives 11.

In this clustering each region falls in its own cluster, except the Jawa 2 (J2) and Jawa 3 (J3) regions, which share a cluster. This would mean that every region except J2 and J3 has unique characteristics. However, regions of the same origin, for example Manado 1 (M1), Manado 2 (M2), and Manado 3 (M3) from the origin of Manado, come from the same area, with no significant differences in climate, environmental conditions, or soil conditions, and therefore should not differ significantly. Since the silhouette-based clustering contradicts this, and since the uniform result of 11 clusters across all reduction techniques further indicates its inaccuracy, we conclude that the silhouette index is not suitable for evaluating the optimal number of clusters after the reduction of independent analyses.
Therefore, we confirm that the TSS index is more suitable, since with PCA and with the proposed MWA it preserves the chemical information of each region found before the reduction of the independent analyses.
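For comparison, silhouette-based model selection can be sketched generically. Here we use k-means on synthetic, well-separated data as a stand-in for FCM (the silhouette index only needs hard labels); this shows the index behaving as intended on clean data, whereas the point above is that it fails on the reduced replicate data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
# hypothetical 2-D embedding: four tight groups of three samples each,
# mimicking 12 regions from four origins
X = np.vstack([rng.normal(center, 0.1, size=(3, 2))
               for center in [(0, 0), (3, 0), (0, 3), (3, 3)]])

scores = {}
for k in range(2, 12):  # silhouette requires 2 <= k <= n_samples - 1
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
best_k = max(scores, key=scores.get)
```

On this clean synthetic example the highest silhouette value selects the true group count; the failure reported in the text is specific to the reduced replicate dataset, not to the index in general.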

Table 9.

Clustering result by using the silhouette index as cluster validity index.

Cluster Member of Cluster
I B1
II B3
III J2, J3
IV T2
V B2
VI T1
VII M3
VIII M1
IX J1
X T3
XI M2

Finally, based on the results, we confirm the reliability of our proposed MWA as a chemometric technique in metabolomics studies.

Furthermore, the plot of the FCM objective function value for dimensionality reduction using MWA is shown in Fig. 5, which illustrates the convergence of the FCM objective function under our proposed MWA. The objective value drops sharply from the first to the second iteration, flattens from the third to the eighth, and converges to a value of 0.72 from the tenth to the sixteenth iteration; that is, the objective function reaches its minimum from the tenth iteration onward. In this study, we used two iteration termination criteria. First, the iteration stops when the difference between the objective function values of consecutive iterations is smaller than the specified error tolerance, here set to ε = 10⁻⁵. If the first criterion is not met, the iteration stops when the maximum number of iterations, here 100, is reached. The plot in Fig. 5 shows that the iteration stops at the sixteenth iteration because it meets the first criterion; the objective function reaches its minimum value, yielding four fuzzy clusters for the Indonesian clove buds metabolite dataset.
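A minimal FCM loop implementing the two stopping criteria described above might look as follows. This is a generic sketch on synthetic data, not the authors' implementation; the fuzzifier m = 2, the cluster count, and the data are illustrative assumptions:

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Minimal fuzzy c-means with the two stopping criteria from the text:
    objective change below eps, otherwise a cap of max_iter iterations."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)                        # memberships: columns sum to 1
    J_prev = np.inf
    for it in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)               # centres
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)  # c x n
        D = np.fmax(D, 1e-12)                 # avoid division by zero
        J = float((Um * D ** 2).sum())        # FCM objective function
        if abs(J_prev - J) < eps:             # first criterion: convergence
            break
        J_prev = J
        U = D ** (-2.0 / (m - 1.0))           # membership update
        U /= U.sum(axis=0)
    return U, V, J, it + 1

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(center, 0.1, size=(3, 2))
               for center in [(0, 0), (3, 0), (0, 3), (3, 3)]])
U, V, J, n_iter = fcm(X, c=4)
```

The loop exits early whenever the objective change falls below the tolerance, mirroring the sixteenth-iteration stop reported for Fig. 5; otherwise the 100-iteration cap applies.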

Figure 5. The convergence of the FCM objective function with dimension reduction using MWA.

4. Conclusions

In this paper, we have presented the performance of the modified Weiszfeld algorithm (MWA) for dimensionality reduction of the independent analyses in each region. To obtain more complete results, we compared MWA with several other well-known dimensionality reduction methods: PCA, CMDS, LE, and LLE. The results revealed that MWA, together with PCA, can reduce the six or eight independent analyses in each region to one data point (data vector) while maintaining the chemical information of each region, and the resulting clusters agree with the clustering of the clove buds metabolite dataset before dimensionality reduction. We therefore recommend MWA as a reliable method for reducing metabolite datasets that contain independent analyses performed to anticipate errors in measuring metabolite concentrations. In addition, we have presented a clove differentiation technique based on metabolite composition, a task so far carried out only by conventional qualitative methods employing a taste expert (flavorist). Based on the clusters obtained after reduction with MWA, we concluded that the four Indonesian clove bud origins form four optimal clusters, meaning that each origin has unique characteristics, i.e. a distinctive taste and aroma. Finally, we recommend MWA as a chemometric technique that can be used more widely in metabolomics studies; this paper has thereby enriched the chemometric techniques available for metabolomics.

Declarations

Author contribution statement

Rustam: Conceived and designed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper. Agus Yodi Gunawan: Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper. Made Tri Ari Penia Kresnowati: Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper.

Declaration of interests statement

The authors declare no conflict of interest.

Data availability statement

The data that has been used is confidential.

Funding statement

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Additional information

No additional information is available for this paper.

Acknowledgements

The authors would like to thank Telkom University for its funding support.

