Journal of Applied Statistics. 2020 Oct 26;48(13-15):2406–2420. doi: 10.1080/02664763.2020.1834519

CD-vine model for capturing complex dependence

O Ozan Evkaya, Ceylan Yozgatlıgil, A Sevtap Selcuk-Kestel
PMCID: PMC9041657  PMID: 35707091

ABSTRACT

Copula-based finite mixture models allow us to capture the dependence between random variables more flexibly. Although the bivariate case of finite mixture models has been commonly studied, limited effort has been spent on finite mixtures of vines. Instead of using classical mixture models, it is possible to incorporate C-vines into the D-vine model (CD-vine) to understand the dependence both among the variables and over different time points. The aim of this study is to create a CD-vine mixture model expressing the dependencies between variables in temporal order. To achieve this, cumulative distribution function values generated within the time components are tied together probabilistically with a D-vine. With this approach, the dependence structure between variables at each time point is explained by a C-vine and the dependence among the time points is captured by the D-vine model. The performance of the proposed CD-vine model is validated using simulated data, and the model is applied to four stock market indices.

Keywords: Mixture model, C-vine, D-vine, CD-vine mixture, stock market indices

1. Introduction

Over the last decades, copulas have become a very popular tool to understand the dependence between random variables in different research fields such as finance [1], actuarial science [12], and weather-related research [13]. For multi-dimensional problems, vine copulas have been proposed to detect complex dependence in a multivariate setting by exploiting pair copulas [2,6,7,19,24]. Simply put, vine copulas are probabilistic graphical tools that allow us to overcome the limitations of standard copulas in higher dimensions. It is possible to express a multivariate density function by using both unconditional and conditional bivariate copula pairs. Because a conditional density function admits a large number of possible decompositions, there are numerous ways to generate vines. Among those constructions, two popular types of vine copulas, widely used by researchers, are the Canonical (C) and Drawable (D) vines [23]. For a recent review on vines, interested readers are referred to [10].

Interest in finite mixture models based on copulas has increased, since they reveal hidden and complex dependence patterns among the variables in a more flexible manner. In that respect, combining the finite mixture model with vines is also beneficial for capturing dependence structures in a multivariate setting. Previously, Vrac et al. [24], Cuvelier and Noirhomme-Fraiture [9,17] and [5,23] studied model-based clustering in terms of copulas. Later on, these approaches were extended to higher dimensions with vine copula methodology by [20,29] for clustering and by [14] in terms of parameter estimation. However, the main focus of the above-mentioned studies is the finite mixture model constructed from parametric pair copulas.

This study proposes a novel CD-vine mixture model that investigates the dependence within and among the components at the same time, instead of using a finite mixture scheme. The main inspiration for the CD-vine model comes from an agricultural problem: capturing the dependence between the growing phases of a crop with respect to weather parameters. Each crop has its own climatic and soil requirements over its growth stages. For feasible agricultural risk management, understanding the dependence among the growth stages for a specific crop yield is crucial. In that respect, the proposed CD-vine approach can be a useful tool to detect hidden dependence both within and between growth periods in terms of selected explanatory variables. More precisely, the dependence structure between variables during a growing phase is explained by a C-vine and the dependence among these time points is captured by the D-vine model. The same mechanism can be used to capture the underlying dependence structure of a financial market that changes on a yearly basis. Generally, time-varying copulas appear in the literature as the most elegant way to treat the dependence as a function of time [3], [4]. In that respect, the CD-vine model can be an auxiliary tool for such financial return studies. Based on the analytically derived joint distributions, the parameters of the CD-vine approach are estimated under certain copula families. Since long-term stock market data are commonly available, the proposed model is applied to a financial data set.

The rest of the paper is organized as follows. Section 2 summarizes the preliminaries of vine copulas and the analytical derivation of the CD-vine mixture model. Section 3 presents the findings of the proposed model using both simulated and real-life data sets. Finally, the main conclusions, the drawbacks and the further outlook of the study are summarized in Section 4.

2. Materials and methods

2.1. Preliminaries

The notion of a copula was first introduced by [28]; it expresses the joint distribution of a random vector in terms of its marginals and the joint dependence structure [16]. For a p-dimensional vector U on the unit hypercube, a copula C is defined as,

$$C(u_1,u_2,\ldots,u_p)=P(U_1\le u_1,U_2\le u_2,\ldots,U_p\le u_p), \tag{1}$$

where C is associated with the p-dimensional distribution function F, having marginals $F_1=F_1(x_1),F_2=F_2(x_2),\ldots,F_p=F_p(x_p)$, and satisfies the following [28],

$$F(x_1,x_2,\ldots,x_p)=C(F_1(x_1),F_2(x_2),\ldots,F_p(x_p);\beta), \tag{2}$$

where the random variable $X_i$, for $i=1,\ldots,p$, is assumed to be continuous and $\beta$ is the parameter which measures dependence between the marginals. In particular, C can be interpreted as the distribution function of a p-dimensional random variable on $[0,1]^p$ with uniform margins. C is unique whenever all $F_1, F_2, \ldots, F_p$ are continuous marginal distributions. Conversely, if C is a copula and $F_1,\ldots,F_p$ are distribution functions, then the function F defined by Equation (2) is a multivariate distribution function with margins $F_1,\ldots,F_p$. Besides, the corresponding density function for C is given as,

$$f(x_1,x_2,\ldots,x_p)=c_{1\ldots p}(F_1(x_1),F_2(x_2),\ldots,F_p(x_p);\beta)\prod_{k=1}^{p}f_k(x_k) \tag{3}$$
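
To make Equation (2) concrete, the following minimal Python sketch builds a bivariate joint distribution from a Clayton copula and two margins. The exponential margins, their rates and the value of the dependence parameter are illustrative assumptions, not choices made in the paper. The final lines check that the first margin is recovered from the joint distribution when the second argument becomes large.

```python
import math

def clayton_cdf(u, v, theta):
    """Bivariate Clayton copula C(u, v; theta) for theta > 0."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def exp_cdf(x, rate):
    """Exponential margin F(x) = 1 - exp(-rate * x) (illustrative assumption)."""
    return 1.0 - math.exp(-rate * x) if x > 0.0 else 0.0

def joint_cdf(x1, x2, theta, rate1=1.0, rate2=2.0):
    """Equation (2) with p = 2: F(x1, x2) = C(F1(x1), F2(x2); theta)."""
    return clayton_cdf(exp_cdf(x1, rate1), exp_cdf(x2, rate2), theta)

theta = 2.0  # hypothetical dependence parameter

# Letting x2 grow large sends F2(x2) to 1, so the joint CDF collapses to F1(x1).
margin_from_joint = joint_cdf(0.7, 50.0, theta)
margin_direct = exp_cdf(0.7, 1.0)
print(margin_from_joint, margin_direct)
```

The same construction works with any continuous margins, which is the practical content of the converse statement above.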

Generally, copula functions can be classified into Elliptical and Archimedean families, which exhibit distinct dependence structures. For example, widely used elliptical copulas are the normal and Student t-copulas, where the former has no tail dependence and the latter exhibits both upper and lower tail dependence [20]. As a member of the Archimedean family, the Frank copula describes symmetric dependence structures. On the other hand, Clayton and Gumbel copulas are useful to identify tail dependence at lower and upper quantiles, respectively. Although copulas are very practical for modeling the dependence structure in the bivariate case, their use requires more effort in higher dimensions, where standard multivariate copulas suffer from inflexible structures. For this reason, vine copulas (vines) were proposed to increase the flexibility of multivariate copula models. Vines incorporate unconditional and conditional bivariate copulas to describe a multivariate distribution [7]. A set of linked trees describes the vine copula's factorization of the multivariate density function using pair copulas [7,11]. Historically, Joe [18] gave a probabilistic construction of multivariate distribution functions based on simple building blocks called pair-copulas. This construction method is called Pair Copula Construction (PCC) and it is organized in a graphical way, called regular vines [7].
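
The tail behavior described above can be quantified with tail dependence coefficients. The sketch below uses the standard closed forms, lower tail $2^{-1/\theta}$ for Clayton and upper tail $2-2^{1/\theta}$ for Gumbel, and compares them with a numerical approximation of the limits on the copula diagonal. These formulas are textbook results rather than something stated in the paper, and the parameter value is a hypothetical choice.

```python
def clayton_lower_tail(theta):
    """Lower tail dependence coefficient of the Clayton copula."""
    return 2.0 ** (-1.0 / theta)

def gumbel_upper_tail(theta):
    """Upper tail dependence coefficient of the Gumbel copula."""
    return 2.0 - 2.0 ** (1.0 / theta)

def clayton_diag(q, theta):
    """Clayton copula on the diagonal, C(q, q)."""
    return (2.0 * q ** -theta - 1.0) ** (-1.0 / theta)

def gumbel_diag(q, theta):
    """Gumbel copula on the diagonal, C(q, q) = q ** (2 ** (1 / theta))."""
    return q ** (2.0 ** (1.0 / theta))

theta = 2.0  # hypothetical parameter value

# lambda_L = lim_{q -> 0} C(q, q) / q
q_low = 1e-6
numeric_lower = clayton_diag(q_low, theta) / q_low

# lambda_U = lim_{q -> 1} (1 - 2q + C(q, q)) / (1 - q)
q_high = 1.0 - 1e-6
numeric_upper = (1.0 - 2.0 * q_high + gumbel_diag(q_high, theta)) / (1.0 - q_high)

print(numeric_lower, clayton_lower_tail(theta))
print(numeric_upper, gumbel_upper_tail(theta))
```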

Therefore, a vine can be constructed as a hierarchical sequence of bivariate unconditional and conditional copulas. By this method, the density given in Equation (3) can be decomposed iteratively into a product of pair-copulas, under suitable regularity conditions [2],

$$f(x_k\mid v)=c_{x_k,v_j\mid v_{-j}}\big(F(x_k\mid v_{-j}),F(v_j\mid v_{-j})\big)\,f(x_k\mid v_{-j}) \tag{4}$$

where $v=x_{-k}$ is the $(p-1)$-dimensional vector excluding $x_k$, $v_j$ is one arbitrarily chosen component of $v$ and $v_{-j}$ is the vector $v$ excluding $v_j$. Depending on the decomposition, various vine structures can be obtained by using Equation (4). In this study, the main focus is the construction of a dependence model using both simplified parametric C- and D-vines. For that purpose, the probability density functions of the C-vine and D-vine for the p-dimensional case are defined as,

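$$f_{CV}(x_1,\ldots,x_p)=\prod_{j=1}^{p-1}\prod_{i=1}^{p-j}c^{CV}_{j,j+i\mid 1,\ldots,j-1}\big(F_{j\mid(1,\ldots,j-1)},F_{j+i\mid(1,\ldots,j-1)};\beta_{j,j+i}\big)\times\prod_{k=1}^{p}f_k(x_k) \tag{5}$$

$$f_{DV}(x_1,\ldots,x_p)=\prod_{j=1}^{p-1}\prod_{i=1}^{p-j}c^{DV}_{i,i+j\mid i+1,\ldots,i+j-1}\big(F_{i\mid(i+1,\ldots,i+j-1)},F_{i+j\mid(i+1,\ldots,i+j-1)};\theta_{i,i+j}\big)\times\prod_{k=1}^{p}f_k(x_k) \tag{6}$$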
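
The double products in Equations (5) and (6) are easier to read once the pair-copula indices are listed explicitly. The following short Python sketch enumerates them in the order implied by the two formulas; the function names and the label format are illustrative choices, not part of the paper.

```python
def cvine_pairs(p):
    """Pairs (j, j+i | 1, ..., j-1) in the order of Equation (5)."""
    pairs = []
    for j in range(1, p):
        for i in range(1, p - j + 1):
            pairs.append(((j, j + i), tuple(range(1, j))))
    return pairs

def dvine_pairs(p):
    """Pairs (i, i+j | i+1, ..., i+j-1) in the order of Equation (6)."""
    pairs = []
    for j in range(1, p):
        for i in range(1, p - j + 1):
            pairs.append(((i, i + j), tuple(range(i + 1, i + j))))
    return pairs

def label(pair, conditioning):
    """Format a pair copula index such as '23|1'."""
    text = f"{pair[0]}{pair[1]}"
    if conditioning:
        text += "|" + "".join(str(c) for c in conditioning)
    return text

cvine_labels = [label(pr, cond) for pr, cond in cvine_pairs(4)]
dvine_labels = [label(pr, cond) for pr, cond in dvine_pairs(3)]
print(cvine_labels)  # 4-dimensional C-vine
print(dvine_labels)  # 3-dimensional D-vine
```

A p-dimensional vine of either type therefore contains p(p−1)/2 pair copulas: six for p = 4 and three for p = 3.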

2.2. CD-vine model

The main motivation of the proposed method relies on the fact that the dependence pattern of multivariate data can also change with respect to time. For this purpose, C-vines are incorporated into the D-vine copula model with predefined pair copula families and the CD-vine model is proposed. Let $cv_t$ for $t=1,\ldots,m$ denote the random vector generated by the distribution functions of the observations belonging to the random vector $X_{k,i}$ where $k=1,\ldots,p$, $i=1,\ldots,N$. The corresponding multivariate density functions, $f_k(x_{k,i})$, for each $cv_t$ can be described in terms of a C-vine decomposition at a certain value of t. By combining Equations (5) and (6), the CD-vine model can be exemplified for the simplest case (m = 3, p = 4) as follows,

$$g(x;\Gamma)=c^{DV}_{12}\big(F(cv_1),F(cv_2);\theta_{12}\big)\,c^{DV}_{23}\big(F(cv_2),F(cv_3);\theta_{23}\big)\times c^{DV}_{13\mid 2}\big(F(cv_1\mid cv_2),F(cv_3\mid cv_2);\theta_{13\mid 2}\big)\times\prod_{t=1}^{3}f_{cv_t}(x_1,x_2,x_3,x_4;\phi_{CV_t}) \tag{7}$$

where $cv_1$, $cv_2$ and $cv_3$ are observations extracted from the respective C-vine densities. $\Gamma$ denotes all copula parameters to be estimated by maximizing the log-likelihood of the function $g(x;\Gamma)$ based on the given sample, where $\Gamma=(\phi_{CV_1},\phi_{CV_2},\phi_{CV_3},\theta_{DV})$ and $\phi_{CV_t}=(\beta_{12}^{t},\beta_{13}^{t},\beta_{14}^{t},\beta_{23\mid 1}^{t},\beta_{24\mid 1}^{t},\beta_{34\mid 12}^{t})$ with $|\phi_{CV_t}|=6$ for t = 1, 2, 3 denotes the required set of parameters for the copula pairs of each C-vine part. Under this modeling setup, $f_{cv_1}(x_{1:4};\phi_{CV_1})$, $f_{cv_2}(x_{1:4};\phi_{CV_2})$ and $f_{cv_3}(x_{1:4};\phi_{CV_3})$ denote the multivariate C-vine densities for each $cv_t$. Besides, $\theta_{DV}=(\theta_{12},\theta_{23},\theta_{13\mid 2})$ represents the parameters of the D-vine part. In total, for the considered simplest case of the CD-vine model, $m\times|\phi_{CV_t}|+|\theta_{DV}|=3\times 6+3=21$ parameters are required for full inference by maximizing the corresponding log-likelihood function with sample size N, which is described as,

$$L(\Gamma;x)=\log\Big(\prod_{n=1}^{N}g(x;\Gamma)\Big)=\sum_{n=1}^{N}\Big[\Big(\log c^{DV}_{12}\big(F(cv_1),F(cv_2);\theta_{12}\big)+\log c^{DV}_{23}\big(F(cv_2),F(cv_3);\theta_{23}\big)+\log c^{DV}_{13\mid 2}\big(F(cv_1\mid cv_2),F(cv_3\mid cv_2);\theta_{13\mid 2}\big)\Big)+\Big(\sum_{t=1}^{3}\log f_{cv_t}(x_1,x_2,x_3,x_4;\phi_{CV_t})\Big)\Big]. \tag{8}$$

Here, the sum of $\log f_{cv_t}(x_{1:4};\phi_{CV_t})$ for t = 1, 2, 3 in Equation (8) represents the contribution of each C-vine component to the log-likelihood function, and the first three terms denote the contribution of the D-vine part to $L(\Gamma)$. The parameters are estimated by a two-stage maximization. First, each $\log f_{cv_t}$ is maximized to obtain $\phi_{CV_t}$ for t = 1, 2, 3. Afterwards, $\theta_{DV}$ is estimated from the first three terms in Equation (8). The ingredients of these terms, $F(cv_t)$, are calculated by using the empirical multivariate cumulative distribution function (EMCDF) [27]. These estimated values keep the joint distribution information for the random variables at each time point. The two-stage maximization procedure for the CD-vine model can be summarized with the following steps to estimate all parameters:

  1. Simulate three different 4-dimensional C-vine copula data sets of sample size N using different parameter values.

  2. For the parameter estimation, first set initial values for the related parameters and maximize each C-vine part of Equation (8).

  3. After deriving the estimates for each component, generate $F(cv_t)$ for t = 1, 2, 3 by evaluating the EMCDF based on the estimated parameters.

  4. Define pseudo-observations over each column of $F(cv_t)$ and use them to maximize the D-vine part of the likelihood function with predetermined initial values.

  5. Compare the original and estimated parameters over different iterations and scenarios.
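
Steps 3 and 4 can be sketched in a few lines of Python. The version below evaluates a plain empirical multivariate CDF at every observation and then converts a column of values into rank-based pseudo-observations. It is a minimal illustration under the assumption that the EMCDF is the usual componentwise counting estimator and that pseudo-observations are ranks divided by N + 1; the paper's own implementation is in R and may differ in detail. The small sample is made up for the example.

```python
def emcdf(sample):
    """Empirical multivariate CDF evaluated at each observation.

    sample: list of N points, each a sequence of p numbers.
    Returns F_n(x_i) = (1 / N) * #{j : x_j <= x_i in every coordinate}.
    """
    n = len(sample)
    values = []
    for xi in sample:
        count = sum(1 for xj in sample if all(a <= b for a, b in zip(xj, xi)))
        values.append(count / n)
    return values

def pseudo_observations(column):
    """Rank transform to (0, 1): rank / (N + 1), ties broken by order of appearance."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    ranks = [0] * n
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return [r / (n + 1) for r in ranks]

# Hypothetical 2-dimensional sample standing in for one component cv_t.
sample = [(0.1, 0.2), (0.4, 0.1), (0.5, 0.6), (0.3, 0.3)]
f_cv = emcdf(sample)
u_cv = pseudo_observations(f_cv)
print(f_cv)
print(u_cv)
```

In the CD-vine setting, one such column of pseudo-observations per component would form the 3-dimensional input of the D-vine part.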

The derivations are given for the Clayton family in the Appendix. Further details of the proposed model can be found in [14].
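
The count of 21 parameters for the case m = 3, p = 4 follows a simple pattern. As a sketch, and under the assumptions that every pair copula has exactly one parameter and that no tree is truncated (neither assumption is stated as a general rule in the paper), a CD-vine with m components of dimension p requires

$$|\Gamma| = m\,\frac{p(p-1)}{2}+\frac{m(m-1)}{2},$$

so that for m = 3 and p = 4,

$$|\Gamma| = 3\cdot\frac{4\cdot 3}{2}+\frac{3\cdot 2}{2}=18+3=21 .$$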

3. Numerical results

For the simulation, 4-dimensional C-vines with m = 3 components are investigated. The dependence between the EMCDF values belonging to each component ($F(cv_1)$, $F(cv_2)$ and $F(cv_3)$) is the intermediate step that makes CD-vine modeling possible. As an optimization routine, the derivative-free DEoptim function is used within the two-step maximization process [28]. In the proposed model, similar to the widely used Inference Functions for Margins (IFM) method, the parameters of each component are estimated first. Afterwards, these estimates are used to construct the $F(cv_1)$, $F(cv_2)$ and $F(cv_3)$ values. The EMCDF values store the dependence information within each component. Finally, the new 3-dimensional EMCDF values are modeled with a D-vine. Two special cases are discussed in terms of parameter estimation. In the first case, the truncated D-vine requires an independence copula since dependence exists only within the components. In the second case, dependence among the components (time points) is studied. All computations are carried out in the R programming language [30].

3.1. Simulation study

Suppose each component has Clayton pairs but there is no association between the calculated values of $F(cv_1)$, $F(cv_2)$ and $F(cv_3)$ for the D-vine. In this framework, the most suitable copula family for the D-vine part of the CD-vine mixture is the independence copula. For this reason, a CD-vine model having Clayton pairs for the C-vine and independence copula pairs for the D-vine is studied. The parameters of each component are predefined as $(\beta_{12}^{1}=8,\beta_{13}^{1}=7,\beta_{14}^{1}=6,\beta_{23\mid 1}^{1}=9,\beta_{24\mid 1}^{1}=8,\beta_{34\mid 12}^{1}=7)$, $(\beta_{12}^{2}=9,\beta_{13}^{2}=6,\beta_{14}^{2}=5,\beta_{23\mid 1}^{2}=9,\beta_{24\mid 1}^{2}=8,\beta_{34\mid 12}^{2}=7)$ and $(\beta_{12}^{3}=4,\beta_{13}^{3}=5,\beta_{14}^{3}=7,\beta_{23\mid 1}^{3}=9,\beta_{24\mid 1}^{3}=8,\beta_{34\mid 12}^{3}=7)$.
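
As an illustration of how a single Clayton pair with such a parameter can be simulated, the Python sketch below draws from a bivariate Clayton copula by conditional inversion and then checks the sample against the known relation $\tau=\theta/(\theta+2)$ between the Clayton parameter and Kendall's τ. This is a simplified stand-in, not the paper's procedure: the paper simulates full 4-dimensional C-vines in R and estimates by maximum likelihood with DEoptim, whereas this sketch covers one pair only and uses Kendall's τ purely as a sanity check. The seed and sample size are arbitrary.

```python
import random

def clayton_h(v, u, theta):
    """Conditional distribution F(v | u) of the Clayton copula."""
    return u ** (-theta - 1.0) * (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta - 1.0)

def clayton_h_inv(w, u, theta):
    """Closed-form solution v of F(v | u) = w for the Clayton copula."""
    return ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** -theta + 1.0) ** (-1.0 / theta)

def simulate_clayton(n, theta, rng):
    """Draw n pairs (u, v) by conditional inversion."""
    sample = []
    for _ in range(n):
        u = 1.0 - rng.random()  # value in (0, 1], avoids u = 0
        w = 1.0 - rng.random()
        sample.append((u, clayton_h_inv(w, u, theta)))
    return sample

def kendall_tau(sample):
    """Plain O(n^2) Kendall's tau (no tie correction)."""
    n = len(sample)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (sample[i][0] - sample[j][0]) * (sample[i][1] - sample[j][1])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)

theta_true = 8.0  # same value as beta_12 of the first component
pairs = simulate_clayton(400, theta_true, random.Random(1))
tau_hat = kendall_tau(pairs)
theta_moment = 2.0 * tau_hat / (1.0 - tau_hat)  # inversion of tau = theta / (theta + 2)
print(tau_hat, theta_moment)
```

Higher trees of a C-vine are simulated by applying the same inversion recursively to conditional variables.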

Simulation results for the Clayton-Independence case, with Clayton pairs in the C-vine part and independence pairs in the D-vine part, are presented in Table 1 based on 100 iterations with varying sample sizes. The parameter estimates are very promising for the CD-vine mixture in the Clayton-Independence case, even for a limited number of iterations. Besides, increasing the sample size improves the accuracy of the DEoptim method. In this two-step maximization for the Clayton-Independence CD-vine mixture model, the parameter estimates for both the C-vine and D-vine parts are reasonable. When there is no dependence among the components via the $F(cv_1)$, $F(cv_2)$ and $F(cv_3)$ values, it is straightforward to obtain accurate parameter estimates for such a CD-vine mixture model, whose D-vine part is visualized in Figure 1. Here, based on 100 repetitions, the estimated values confirm that the D-vine requires only the independence copula (each parameter estimate is very close to zero, as shown in Figure 1). On the other hand, model identification is not so straightforward in the second scenario for the proposed CD-vine model.

Table 1. Parameter estimations of CD-vine mixture with Clayton-Independence copula pairs (max1 = 100).

| Component (parameter order) | Family (true values) | N | Estimates | St. dev. | (Bias) |
|---|---|---|---|---|---|
| CV1 ($\hat\beta_{12}^{1}/\hat\beta_{13}^{1}/\hat\beta_{14}^{1}/\hat\beta_{23\mid 1}^{1}/\hat\beta_{24\mid 1}^{1}/\hat\beta_{34\mid 12}^{1}$) | Clayton (8 / 7 / 6 / 9 / 8 / 7) | 50 | 8.17 / 7.13 / 6.12 / 8.91 / 7.95 / 7.64 | 0.79 / 0.67 / 0.59 / 0.94 / 0.91 / 1.32 | (0.17) / (0.13) / (0.12) / (−0.09) / (−0.05) / (0.64) |
| | | 100 | 8.05 / 7.04 / 6.03 / 9.12 / 8.1 / 7.03 | 0.58 / 0.52 / 0.46 / 0.68 / 0.61 / 0.78 | (0.05) / (0.04) / (0.03) / (0.12) / (0.1) / (0.03) |
| | | 250 | 8.06 / 7.05 / 6.04 / 8.96 / 7.98 / 7.13 | 0.34 / 0.3 / 0.26 / 0.54 / 0.5 / 0.62 | (0.06) / (0.05) / (0.04) / (−0.04) / (−0.02) / (0.13) |
| CV2 ($\hat\beta_{12}^{2}/\hat\beta_{13}^{2}/\hat\beta_{14}^{2}/\hat\beta_{23\mid 1}^{2}/\hat\beta_{24\mid 1}^{2}/\hat\beta_{34\mid 12}^{2}$) | Clayton (9 / 6 / 5 / 9 / 8 / 7) | 50 | 9.1 / 6.06 / 5.06 / 9.03 / 8.02 / 7.4 | 0.7 / 0.51 / 0.45 / 0.92 / 0.78 / 1.26 | (0.1) / (0.06) / (0.06) / (0.03) / (0.02) / (0.4) |
| | | 100 | 9.07 / 6.04 / 5.04 / 8.94 / 7.97 / 7.2 | 0.59 / 0.41 / 0.36 / 0.72 / 0.66 / 0.84 | (0.07) / (0.04) / (0.04) / (−0.06) / (−0.03) / (0.2) |
| | | 250 | 9.07 / 6.05 / 5.05 / 8.92 / 7.92 / 7.04 | 0.37 / 0.26 / 0.22 / 0.52 / 0.46 / 0.51 | (0.07) / (0.05) / (0.05) / (−0.08) / (−0.08) / (0.04) |
| CV3 ($\hat\beta_{12}^{3}/\hat\beta_{13}^{3}/\hat\beta_{14}^{3}/\hat\beta_{23\mid 1}^{3}/\hat\beta_{24\mid 1}^{3}/\hat\beta_{34\mid 12}^{3}$) | Clayton (4 / 5 / 7 / 9 / 8 / 7) | 50 | 4.05 / 5.05 / 7.06 / 9.05 / 8.03 / 7.35 | 0.39 / 0.44 / 0.58 / 1.03 / 0.99 / 1.18 | (0.05) / (0.05) / (0.06) / (0.05) / (0.03) / (0.35) |
| | | 100 | 4.01 / 5 / 7 / 9.13 / 8.14 / 7.15 | 0.25 / 0.29 / 0.39 / 0.66 / 0.65 / 0.83 | (0.01) / (0) / (0) / (0.13) / (0.14) / (0.15) |
| | | 250 | 4.03 / 5.04 / 7.05 / 9.11 / 8.07 / 7.04 | 0.17 / 0.2 / 0.26 / 0.49 / 0.45 / 0.57 | (0.03) / (0.04) / (0.05) / (0.11) / (0.07) / (0.04) |
| DV ($\hat\theta_{12}/\hat\theta_{23}/\hat\theta_{13\mid 2}$) | Independent (0 / 0 / 0) | 50 | −1.40e−07 / −2.39e−07 / −1.16e−07 | 1.13e−07 / 1.86e−07 / 1.93e−07 | (−1.40e−07) / (−2.39e−07) / (−1.16e−07) |
| | | 100 | −0.0007 / −0.0007 / −0.0006 | 0.0006 / 0.0006 / 0.0006 | (−0.0007) / (−0.0007) / (−0.0006) |
| | | 250 | −0.0008 / −0.0006 / −0.0006 | 0.0007 / 0.0005 / 0.0006 | (−0.0008) / (−0.0006) / (−0.0006) |

Figure 1.

The estimated D-vine parameters (N = 500 and max1=100).

Within the context of the CD-vine mixture, nonzero dependence occurs whenever the EMCDF values of each component are calculated at the same levels. In this case, the independence copula is not an appropriate choice and it is not possible to identify dependence patterns within the components via the $F(cv_1)$, $F(cv_2)$ and $F(cv_3)$ values at the beginning. For this reason, the dependence structure of the D-vine part is investigated after finishing the first step, which results in distinct families and parameters.

To illustrate, consider a 4-dimensional 3-component CD-vine model generated with the same parameters, but with Frank (F) copula pairs for the C-vine part. The performance of the parameter estimation in the C-vine part is similar to the previously studied Clayton case. After deriving the values of $F(cv_1)$, $F(cv_2)$ and $F(cv_3)$, the observed dependence pattern requires Gumbel (G) families for each pair in the D-vine part. A small model comparison based on the sample size N = 500 with 250 different realizations is summarized in Table 2. The best of the three models is (G-G-G), which matches the base model and has the lowest information criteria values. Furthermore, its parameter estimates are closer to the original parameters, with lower standard deviations and biases than the (F-F-G) and (F-F-F) models. For the latter models, the parameter estimates of the Frank pairs hit the predefined upper bounds of the DEoptim routine. The zero standard deviations together with high bias values are a consequence of this problem, which highlights the importance of initial values in such a multivariate modeling setup. Nevertheless, for real-life data applications, this two-step maximization process can be improved by capturing the dependence among the components before evaluating the D-vine model.

Table 2. CD-vine mixtures with different copulas in D-vine part (N = 500, max1 = 250).

| DV | Information criteria values: AIC / BIC / CAIC | Estimates of $\theta_{12},\theta_{23},\theta_{13\mid 2}$ (true values 16.07 / 10.33 / 1.72) | St. dev. | (Bias) |
|---|---|---|---|---|
| G-G-G | −4470.645 / −4458.001 / −4455.001 | 12.02 / 8.87 / 2.02 | 1.72 / 1.25 / 0.34 | (−4.05) / (−1.46) / (0.29) |
| F-F-G | −3645.486 / −3632.842 / −3629.842 | 20 / 20 / 2.2 | 0 / 0 / 0 | (3.93) / (9.67) / (0.48) |
| F-F-F | −3489.295 / −3476.651 / −3473.651 | 20 / 20 / 5.41 | 0 / 0 / 0.01 | (3.93) / (9.67) / (3.68) |
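
The three criteria in Table 2 can be reproduced from a log-likelihood value with the standard definitions AIC = −2LL + 2k, BIC = −2LL + k log N and CAIC = −2LL + k(log N + 1). The paper does not print these formulas, so they are an assumption here, but the reported differences between the criteria are consistent with k = 3 D-vine parameters and N = 500. The log-likelihood used below is not reported in the paper; it is back-computed from the AIC of the G-G-G row.

```python
import math

def information_criteria(loglik, k, n):
    """Return (AIC, BIC, CAIC) for log-likelihood loglik, k parameters, n observations."""
    aic = -2.0 * loglik + 2.0 * k
    bic = -2.0 * loglik + k * math.log(n)
    caic = -2.0 * loglik + k * (math.log(n) + 1.0)
    return aic, bic, caic

# Back-computed from the reported AIC of the G-G-G model: (4470.645 + 2 * 3) / 2.
loglik_ggg = 2238.3225
aic, bic, caic = information_criteria(loglik_ggg, k=3, n=500)
print(round(aic, 3), round(bic, 3), round(caic, 3))
```

Since all three models have the same number of parameters and the same sample size, the ranking in Table 2 is driven entirely by the log-likelihood.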

3.2. Application to finance: market indices

The proposed CD-vine model is applied to daily closing prices of four major indices, DAX, SMI, CAC and FTSE, whose descriptive statistics are listed in Table 3. Daily observations of the indices between 1991 and 1998, excluding holidays and weekends, are retrieved from the R datasets package (EuStockMarkets). For the implementation of a 3-component 4-dimensional CD-vine model, only the observations belonging to 1995–1997 are selected as an example, and the corresponding log-return series are plotted in Figure 2.

Table 3. Descriptive statistics for log-return series (1995–1997) and for years CV1: 1995, CV2: 1996 and CV3: 1997.

| Index | Min. | Max. | Mean | Variance | Skewness | Kurtosis |
|---|---|---|---|---|---|---|
| DAX | −0.0601 | 0.0432 | 0.0013 | 1.20e−04 | −0.2801 | 2.4429 |
| SMI | −0.0470 | 0.0497 | 0.0013 | 9.53e−05 | −0.2046 | 2.6556 |
| CAC | −0.0437 | 0.0610 | 0.0011 | 1.23e−04 | −0.0404 | 2.0135 |
| FTSE | −0.0310 | 0.0313 | 0.0007 | 6.00e−05 | −0.1435 | 1.3557 |
| DAX_CV1 | −0.0318 | 0.0243 | 0.0008 | 5.35e−05 | −0.2152 | 1.6845 |
| SMI_CV1 | −0.0255 | 0.0497 | 0.0011 | 5.30e−05 | 0.8042 | 7.6202 |
| CAC_CV1 | −0.0347 | 0.0319 | 0.0005 | 8.39e−05 | 0.0393 | 1.2219 |
| FTSE_CV1 | −0.0144 | 0.0218 | 0.0004 | 3.30e−05 | 0.0716 | 0.3041 |
| DAX_CV2 | −0.0378 | 0.0327 | 0.0015 | 8.20e−05 | −0.3177 | 1.6705 |
| SMI_CV2 | −0.0344 | 0.0310 | 0.0016 | 7.14e−05 | −0.5082 | 2.2917 |
| CAC_CV2 | −0.0399 | 0.0295 | 0.0013 | 1.07e−04 | −0.4599 | 1.4018 |
| FTSE_CV2 | −0.0220 | 0.0265 | 0.0009 | 4.27e−05 | −0.2070 | 1.0154 |
| DAX_CV3 | −0.0601 | 0.0432 | 0.0017 | 2.26e−04 | −0.2887 | 0.7929 |
| SMI_CV3 | −0.0470 | 0.0371 | 0.0013 | 1.62e−04 | −0.2886 | 0.7565 |
| CAC_CV3 | −0.0437 | 0.0610 | 0.0014 | 1.79e−04 | 0.0811 | 1.6484 |
| FTSE_CV3 | −0.0310 | 0.0313 | 0.0009 | 1.05e−04 | −0.1752 | 0.3311 |

Figure 2.

Univariate log-returns of selected indices (1 January 1995–31 December 1997).

As a first insight, Table 3 reports the summary statistics for the whole period 1995–1997 and for each year separately. Over 1995–1997, the average returns of DAX, SMI and CAC are close to each other, while that of FTSE is slightly lower. This pattern changes from year to year, with similar differences in the skewness and kurtosis values. Most of the yearly log-return series are left-skewed, while SMI_CV1, CAC_CV1 and FTSE_CV1 in 1995, and only CAC_CV3 in 1997, have positive skewness. SMI_CV1 in 1995 also yields the highest kurtosis.

Before employing the CD-vine model, each series is modeled after checking stationarity. Starting with suitable ARIMA(p,d,q) models, standard GARCH(1,1) models are considered with various distribution assumptions when necessary. In some cases, an ARCH effect is detected by the McLeod-Li test. For that reason, different GARCH(1,1) models are compared to gain further accuracy over the classical ARIMA(p,d,q) model. Table 4 shows that the GARCH(1,1) model is quite satisfactory for many log-return series, except DAX_CV2, SMI_CV3 and CAC_CV3. On the other hand, modeling the residuals with a GARCH(1,1) is still reasonable when the p-values of the McLeod-Li test for the original ARIMA(p,d,q) and the GARCH(1,1) models are compared. Finally, the residuals from this two-step time series modeling are retrieved for the proposed CD-vine model. To show the dependence information obtained from these models, Kendall τ values for 1995 are visualized in Figure 3. For the following years, the pattern is similar and the Kendall τ values of each pair follow an increasing trend. These models are fitted with built-in functions of the rugarch R package [17].

Table 4. ARIMA model selection (LL: Log-likelihood, LB: Ljung-Box, M-Li: McLeod-Li, GARCH(1,1)k; k = std: student-t, sstd: skewed student-t, snorm: skewed normal distributions, : p value <.05).

| Index | Model | LL | LB | M-Li |
|---|---|---|---|---|
| DAX_CV1 | ARIMA(1,0,1) GARCH(1,1)_std | 922.4105 | 0.485 | 0.6581 |
| SMI_CV1 | ARIMA(1,0,1) GARCH(1,1)_std | 937.9006 | 0.6262 | 0.9997 |
| CAC_CV1 | ARIMA(1,0,0) GARCH(1,1)_sstd | 861.8276 | 0.8368 | 0.8005 |
| FTSE_CV1 | ARIMA(2,0,2) GARCH(1,1)_std | 980.7856 | 0.3024 | 0.4483 |
| DAX_CV2 | ARIMA(4,0,0) GARCH(1,1)_sstd | 875.944 | 0.6678 | 0 |
| SMI_CV2 | ARIMA(2,1,2) GARCH(1,1)_std | 890.4312 | 0.4692 | 0.3076 |
| CAC_CV2 | ARIMA(2,0,0) GARCH(1,1)_sstd | 835.8359 | 0.2972 | 0.0107 |
| FTSE_CV2 | ARIMA(0,0,0) GARCH(1,1)_sstd | 948.8898 | 0.5206 | 0.9545 |
| DAX_CV3 | ARIMA(9,0,1) GARCH(1,1)_snorm | 738.8651 | 0.2092 | 0.029 |
| SMI_CV3 | ARIMA(2,0,0) GARCH(1,1)_snorm | 779.1635 | 0.135 | 0 |
| CAC_CV3 | ARIMA(5,0,0) GARCH(1,1)_sstd | 766.9586 | 0.5108 | 0 |
| FTSE_CV3 | ARIMA(0,0,1) GARCH(1,1)_std | 832.1742 | 0.3364 | 0.7369 |

Figure 3.

Dependence structure among the transformed residuals (1995).

For each year, the dependence structure is almost symmetric in both tails, as can be seen from the contour lines in Figure 3. To reduce the overall computational burden, only Frank copula pairs are considered in the first step of the CD-vine model. For the C-vine construction, the DAX log-return seems to be the best choice for the root node in terms of Kendall τ values. The CD-vine model flowchart given in Figure 4 briefly represents the structure of the modeling setup. Here, the initial parameter values are taken from the parameter estimates of the CDVineCopSelect function in the CDVine R package [8]. Thereafter, the parameter estimates of each component are obtained by joint maximization for the C-vine part. For the D-vine part, all copula functions available in the R package are considered to model each pair. The details of the fitted models with their log-likelihood (LL) values, dependence parameters and corresponding Kendall τ values are summarized in Table 5. Furthermore, the tree structure of $F(cv_1)$, $F(cv_2)$ and $F(cv_3)$ based on the selected copula families is visualized in Figure 5.

Figure 4.

CD-vine mixture model diagram for the residuals.

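Table 5. Comparison of C-vine models for each component.

| Component (parameter order) | Family | LL | Estimates | Kendall's τ |
|---|---|---|---|---|
| CV1 ($\hat\beta_{12}^{1}/\hat\beta_{13}^{1}/\hat\beta_{14}^{1}/\hat\beta_{23\mid 1}^{1}/\hat\beta_{24\mid 1}^{1}/\hat\beta_{34\mid 12}^{1}$) | Frank | 176.1582 | 3.76 / 5.12 / 3.99 / 1.02 / 1.66 / 1.23 | 0.37 / 0.46 / 0.39 / 0.11 / 0.18 / 0.13 |
| CV2 ($\hat\beta_{12}^{2}/\hat\beta_{13}^{2}/\hat\beta_{14}^{2}/\hat\beta_{23\mid 1}^{2}/\hat\beta_{24\mid 1}^{2}/\hat\beta_{34\mid 12}^{2}$) | Frank | 268.9389 | 5.65 / 7.07 / 4.78 / 1.11 / 1.50 / 1.83 | 0.50 / 0.57 / 0.44 / 0.12 / 0.16 / 0.20 |
| CV3 ($\hat\beta_{12}^{3}/\hat\beta_{13}^{3}/\hat\beta_{14}^{3}/\hat\beta_{23\mid 1}^{3}/\hat\beta_{24\mid 1}^{3}/\hat\beta_{34\mid 12}^{3}$) | Frank | 386.019 | 7.28 / 8.20 / 6.22 / 2.65 / 2.56 / 1.72 | 0.57 / 0.61 / 0.53 / 0.28 / 0.27 / 0.19 |
| DV ($\hat\theta_{12}/\hat\theta_{23}/\hat\theta_{13\mid 2}$) | G-G-SG | 1201.872 | 13.54 / 16.88 / 1.20 | 0.93 / 0.94 / 0.17 |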
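
The Kendall τ column of Table 5 can be related to the estimated parameters through the usual closed forms: τ = 1 − 1/θ for the Gumbel and Survival Gumbel families, and τ = 1 − (4/θ)(1 − D₁(θ)) for the Frank family, where D₁ is the first Debye function. The Python sketch below evaluates both, using Simpson's rule for the Debye integral. These formulas are standard results and are assumed here rather than quoted from the paper; the number of integration steps is an arbitrary choice.

```python
import math

def gumbel_tau(theta):
    """Kendall's tau of a Gumbel (or Survival Gumbel) copula."""
    return 1.0 - 1.0 / theta

def debye1(x, steps=2000):
    """First Debye function D1(x) = (1/x) * integral_0^x t / (e^t - 1) dt (Simpson's rule)."""
    def integrand(t):
        return 1.0 if t == 0.0 else t / math.expm1(t)
    h = x / steps  # steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0 / x

def frank_tau(theta):
    """Kendall's tau of a Frank copula with parameter theta > 0."""
    return 1.0 - 4.0 / theta * (1.0 - debye1(theta))

# D-vine row of Table 5 (Gumbel, Gumbel, Survival Gumbel).
dv_taus = [round(gumbel_tau(t), 2) for t in (13.54, 16.88, 1.20)]
# First pair of component CV1 (Frank).
cv1_tau_12 = frank_tau(3.76)
print(dv_taus, round(cv1_tau_12, 2))
```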

Figure 5.

D-vine model for F(cv1), F(cv2) and F(cv3).

Based on the results given in Table 5, the stock values exhibit a significant dependence pattern based on the Frank family for each year. In particular, the dependence parameters of the unconditional pairs are higher in each year than those of the conditional pairs at higher levels of the C-vine. For instance, the ordered values 7.28, 8.20 and 6.22 for Model CV3 express that the DAX_CV3 log-return is positively dependent on the other stock indices, namely SMI_CV3, CAC_CV3 and FTSE_CV3, in the year 1997 and that they behave similarly. This dependence pattern is almost the same for the previous years, with smaller estimated parameters. For the D-vine part, the dependence among the years is described with a one-tailed dependence structure. In Figure 5, the Gumbel (G) and Survival Gumbel (SG) families are shown with their empirical Kendall τ values in parentheses. For instance, the dependence between the years 1995 and 1996, based on the values of $F(cv_1)$, $F(cv_2)$ and $F(cv_3)$, is modeled by the Gumbel family, which displays upper tail dependence for these years. On the other hand, the dependence between the years 1995 and 1997, conditioned on 1996, exhibits lower tail dependence with small values.

4. Concluding comments

To understand complex dependence patterns both within and among the components, C-vines are combined via a D-vine model; the result is called the CD-vine mixture model. After assessing the proposed model with a simulation study, the hidden dependence among log-returns of stock indices and the temporal dependence among the years are investigated via the proposed CD-vine model.

In the simulation part, the parameter estimates are shown to be promising for the CD-vine mixture with Clayton-Independence pairs. When there is no dependence among the components, it is easy to identify the correct copula pairs for the D-vine part. On the other hand, when the components are associated, selecting a copula family for the D-vine part is not simple. Besides, this dependence has a varying structure based on the obtained values of $F(cv_1)$, $F(cv_2)$ and $F(cv_3)$. For that reason, in the application part, the copula family and related parameters are initialized by fitting a D-vine model to the data obtained in the first step. Although only the Frank family is considered, based on the contour plots, this framework can be adapted to other families with different tail dependence structures.

Further flexibility of the proposed model can be achieved by using different pair copulas. In this study, each component has been modeled with the same copula family, whereas the methodology can also be applied with distinct copula functions for the dependence among the variables. Such a selection certainly increases the flexibility of the proposed mixture models, but the parameter estimation then requires more refined methods. Additionally, only vines with parametric copula families are considered in this study for the sake of simplicity. In further studies, semi-parametric and non-parametric approaches can be adopted for the CD-vine model. Last but not least, the next step will be to consider both C- and D-vines jointly, rather than in a two-step procedure, to identify the dependence among the variables and time points. These issues are at the top of the list for future studies.

Appendix. CD-vine model derivations.

The proposed mixture model given in Equation (8) is sketched here for Clayton pairs. By following the C-vine definition, each $\log f_{cv_t}$ can be written as the logarithm of a product of conditional and unconditional Clayton densities as follows,

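$$\log f_{cv_1}(x_{1:4};\phi_{CV_1})=\log\Big[c_{12}(x_1,x_2;\beta_{12}^{1})\,c_{13}(x_1,x_3;\beta_{13}^{1})\,c_{14}(x_1,x_4;\beta_{14}^{1})\,c_{23\mid 1}\big(F(x_2\mid x_1),F(x_3\mid x_1);\beta_{23\mid 1}^{1}\big)\,c_{24\mid 1}\big(F(x_2\mid x_1),F(x_4\mid x_1);\beta_{24\mid 1}^{1}\big)\,c_{34\mid 12}\big(F(x_3\mid x_1,x_2),F(x_4\mid x_1,x_2);\beta_{34\mid 12}^{1}\big)\Big] \tag{A1}$$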

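In Equation (A1), the detailed construction is presented only for $\log c_{12}$ as,

$$c_{12}=\frac{\partial^{2}C(x_1,x_2;\beta_{12}^{1})}{\partial x_1\,\partial x_2}=(\beta_{12}^{1}+1)(x_1x_2)^{-\beta_{12}^{1}-1}\Big(x_1^{-\beta_{12}^{1}}+x_2^{-\beta_{12}^{1}}-1\Big)^{-\frac{1}{\beta_{12}^{1}}-2} \tag{A2}$$

yielding,

$$\log c_{12}=\log(\beta_{12}^{1}+1)+(-\beta_{12}^{1}-1)\log(x_1x_2)+\Big(-\frac{1}{\beta_{12}^{1}}-2\Big)\log\Big(x_1^{-\beta_{12}^{1}}+x_2^{-\beta_{12}^{1}}-1\Big) \tag{A3}$$

Similarly, one can write the formulas for $\log c_{13}$ and $\log c_{14}$ for the first level. Thereafter, it is necessary to use h-functions and derive $F(x_2\mid x_1)$, $F(x_3\mid x_1)$ and $F(x_4\mid x_1)$ for the next level. To illustrate, the computation of $F(x_l\mid x_1)$ is given. Let,

$$x_l^{1}=F(x_l\mid x_1)=\frac{\partial C(x_1,x_l;\beta_{1l}^{1})}{\partial x_1}=x_1^{-\beta_{1l}^{1}-1}\Big(x_1^{-\beta_{1l}^{1}}+x_l^{-\beta_{1l}^{1}}-1\Big)^{-\frac{1}{\beta_{1l}^{1}}-1},\qquad l=2,3,4 \tag{A4}$$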

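The logarithm of the first corresponding conditional density $c_{23\mid 1}$ is,

$$\log c_{23\mid 1}=\log(\beta_{23\mid 1}^{1}+1)+(-\beta_{23\mid 1}^{1}-1)\log(x_2^{1}x_3^{1})+\Big(-\frac{1}{\beta_{23\mid 1}^{1}}-2\Big)\log\Big((x_2^{1})^{-\beta_{23\mid 1}^{1}}+(x_3^{1})^{-\beta_{23\mid 1}^{1}}-1\Big) \tag{A5}$$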

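This recursive process continues until the last conditional density is reached. After deriving each of them, the closed form of Equation (A1) is obtained. A similar mechanism applies to the D-vine part after generating pseudo-observations from the values of $cv_1$, $cv_2$, $cv_3$. The final analytical form of the logarithm of the D-vine part is given by,

$$\begin{aligned}&\log c^{DV}_{12}\big(F(cv_1),F(cv_2);\theta_{12}\big)+\log c^{DV}_{23}\big(F(cv_2),F(cv_3);\theta_{23}\big)+\log c^{DV}_{13\mid 2}\big(F(cv_1\mid cv_2),F(cv_3\mid cv_2);\theta_{13\mid 2}\big)\\&\quad=\log(\theta_{12}+1)+(-\theta_{12}-1)\log(cv_1cv_2)+\Big(-\frac{1}{\theta_{12}}-2\Big)\log\Big(cv_1^{-\theta_{12}}+cv_2^{-\theta_{12}}-1\Big)\\&\qquad+\log(\theta_{23}+1)+(-\theta_{23}-1)\log(cv_2cv_3)+\Big(-\frac{1}{\theta_{23}}-2\Big)\log\Big(cv_2^{-\theta_{23}}+cv_3^{-\theta_{23}}-1\Big)\\&\qquad+\log(\theta_{13\mid 2}+1)+(-\theta_{13\mid 2}-1)\log(cv_1^{2}cv_3^{2})+\Big(-\frac{1}{\theta_{13\mid 2}}-2\Big)\log\Big((cv_1^{2})^{-\theta_{13\mid 2}}+(cv_3^{2})^{-\theta_{13\mid 2}}-1\Big),\end{aligned} \tag{A6}$$

where

$$cv_1^{2}=F(cv_1\mid cv_2)=cv_2^{-\theta_{12}-1}\Big(cv_1^{-\theta_{12}}+cv_2^{-\theta_{12}}-1\Big)^{-\frac{1}{\theta_{12}}-1}. \tag{A7}$$

$$cv_3^{2}=F(cv_3\mid cv_2)=cv_2^{-\theta_{23}-1}\Big(cv_2^{-\theta_{23}}+cv_3^{-\theta_{23}}-1\Big)^{-\frac{1}{\theta_{23}}-1}. \tag{A8}$$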
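
The Clayton building blocks of this appendix translate directly into code. The Python sketch below implements the log-density of Equation (A3) and the h-function of Equation (A4), verifies both against finite differences of the Clayton copula itself, and then evaluates one conditional term in the manner of Equation (A5). The evaluation points and parameter values are hypothetical and chosen only for illustration.

```python
import math

def clayton_cdf(u, v, beta):
    """Clayton copula C(u, v; beta)."""
    return (u ** -beta + v ** -beta - 1.0) ** (-1.0 / beta)

def clayton_log_density(u, v, beta):
    """Log of the Clayton pair-copula density, as in Equation (A3)."""
    return (math.log(beta + 1.0)
            + (-beta - 1.0) * math.log(u * v)
            + (-1.0 / beta - 2.0) * math.log(u ** -beta + v ** -beta - 1.0))

def clayton_h(v, u, beta):
    """h-function F(v | u) = dC(u, v)/du, as in Equation (A4)."""
    return u ** (-beta - 1.0) * (u ** -beta + v ** -beta - 1.0) ** (-1.0 / beta - 1.0)

u, v, beta, eps = 0.4, 0.7, 2.0, 1e-4

# Central difference in u approximates the h-function.
h_fd = (clayton_cdf(u + eps, v, beta) - clayton_cdf(u - eps, v, beta)) / (2.0 * eps)

# Mixed second difference approximates the density c(u, v).
c_fd = (clayton_cdf(u + eps, v + eps, beta) - clayton_cdf(u + eps, v - eps, beta)
        - clayton_cdf(u - eps, v + eps, beta) + clayton_cdf(u - eps, v - eps, beta)) / (4.0 * eps ** 2)

# Second-tree term in the manner of (A5): c_{23|1} evaluated at F(x2|x1), F(x3|x1).
x1, x2, x3 = 0.35, 0.5, 0.8             # hypothetical uniform-scale observations
b12, b13, b23_1 = 8.0, 7.0, 9.0         # first-component values from the simulation study
log_c23_1 = clayton_log_density(clayton_h(x2, x1, b12), clayton_h(x3, x1, b13), b23_1)
print(h_fd, clayton_h(v, u, beta), c_fd, math.exp(clayton_log_density(u, v, beta)), log_c23_1)
```

Summing such terms over all pairs and observations gives the C-vine part of the log-likelihood; the D-vine part in Equation (A6) reuses the same two functions with $\theta$ in place of $\beta$.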

The further details of the above derivations are available in [14].

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The considered data set in this study is available via https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/EuStockMarkets.html.

References

  • 1. Aas K., Pair-copula constructions for financial applications: A review, Econometrics 4(4) (2016), pp. 1–15. doi: 10.3390/econometrics4040043
  • 2. Aas K., Czado C., Frigessi A., and Bakken H., Pair-copula constructions of multiple dependence, Insurance: Math. Econom. 44(2) (2009), pp. 182–198.
  • 3. Almeida C. and Czado C., Efficient Bayesian inference for stochastic time-varying copula models, Comput. Stat. Data Anal. 56(6) (2012), pp. 1511–1527. doi: 10.1016/j.csda.2011.08.015
  • 4. Almeida C., Czado C., and Manner H., Modeling high-dimensional time-varying dependence using dynamic D-vine models, Appl. Stoch. Models Bus. Ind. 32(5) (2016), pp. 621–638. doi: 10.1002/asmb.2182
  • 5. Arakelian V. and Karlis D., Clustering dependencies via mixtures of copulas, Commun. Stat. - Simul. Comput. 43(7) (2013), pp. 1644–1661. doi: 10.1080/03610918.2012.752832
  • 6. Bedford T. and Cooke R., Probability density decomposition for conditionally dependent random variables modeled by vines, Ann. Math. Artif. Intell. 32 (2001), pp. 245–268. doi: 10.1023/A:1016725902970
  • 7. Bedford T. and Cooke R.M., Vines: A new graphical model for dependent random variables, Ann. Stat. 30(4) (2002), pp. 1031–1068. doi: 10.1214/aos/1031689016
  • 8. Brechmann E.K. and Schepsmeier U., Modeling dependence with C- and D-Vine copulas: The R package CDVine, J. Stat. Softw. 52(3) (2013), pp. 1–27. doi: 10.18637/jss.v052.i03
  • 9. Cuvelier E. and Fraiture N.M., Clayton copula and mixture decomposition, in Applied Stochastic Models and Data Analysis (ASMDA), J. Janssen and P. Lenca, eds., Brest, France, 2005, pp. 699–708.
  • 10. Czado C., Analyzing Dependent Data with Vine Copulas: A Practical Guide With R, Springer International Publishing, Springer Nature Switzerland AG, 2019.
  • 11. Dißmann J., Brechmann E.C., Czado C., and Kurowicka D., Selecting and estimating regular vine copulae and application to financial returns, Comput. Stat. Data Anal. 59(C) (2013), pp. 52–69. doi: 10.1016/j.csda.2012.08.010
  • 12. Erhardt V. and Czado C., Modeling dependent yearly claim totals including zero claims in private health insurance, Scand. Actuar. J. 2 (2012), pp. 106–129. doi: 10.1080/03461238.2010.489762
  • 13. Erhardt T.M. and Czado C., Standardized drought indices: A novel univariate and multivariate approach, J. Royal Stat. Soc.: Ser. C (Appl. Stat.) 67(3) (2018), pp. 643–664.
  • 14. Evkaya O., Mixture of vines for dependence modeling: Finite mixture and CD-vine approaches with applications, Ph.D. thesis, Middle East Technical University, Ankara, 2018.
  • 15. Fréchet M., Sur les tableaux de corrélation dont les marges sont données, Annales de l'Université de Lyon, Section A, Sciences Mathématiques et Astronomie (3)14 (1951), pp. 53–77.
  • 16. Ghalanos A., rugarch: Univariate GARCH models, R package version 1.3-8, 2017.
  • 17. Hu L., Dependence patterns across financial markets: a mixed copula approach, Appl. Financ. Econom. 16(10) (2006), pp. 717–729. doi: 10.1080/09603100500426515
  • 18. Joe H., Families of m-variate distributions with given margins and m(m−1)/2 bivariate dependence parameters, in IMS Lecture Notes-Monograph Ser. Hayward, 1996, pp. 120–141.
  • 19. Joe H., Dependence Modeling with Copulas, Chapman and Hall/CRC, New York, 2014.
  • 20. Kim D., Kim J.-M., Liao S.-M., and Jung Y.-S., Mixture of D-vine copulas for modeling dependence, Comput. Stat. Data Anal. 64 (2013), pp. 1–19. doi: 10.1016/j.csda.2013.02.018
  • 21. Kosmidis I. and Karlis D., Model-based clustering using copulas with applications, Stat. Comput. 26 (2016), pp. 1079–1099. doi: 10.1007/s11222-015-9590-5
  • 22. Kurowicka D. and Cooke R.M., Uncertainty Analysis With High Dimensional Dependence Modelling, Wiley Series in Probability and Statistics, John Wiley & Sons Ltd, Chichester, 2006.
  • 23. Kurowicka D. and Joe H., Dependence Modeling: Handbook on Vine Copulae, World Scientific Publishing, Singapore/SG, 2010.
  • 24. Vrac M., Chédin A., and Diday E., Clustering a global field of atmospheric profiles by mixture decomposition of copulas, J. Atmospheric Oceanic Technol. 22 (2005), pp. 1445–1459. doi: 10.1175/JTECH1795.1
  • 25. Matteis R.D., Fitting copulas to data, Ph.D. thesis, Swiss Federal Institute of Technology Zurich, 2001.
  • 26. Mullen K., Ardia D., Gil D., Windover D., and Cline J., DEoptim: An R package for global optimization by differential evolution, J. Stat. Softw. 40 (2011), pp. 1–26. doi: 10.18637/jss.v040.i06
  • 27. R Core Team, R: A language and environment for statistical computing, R Foundation for Statistical Computing, 2017; software available at http://www.R-project.org/.
  • 28. Sklar M., Fonctions de répartition à n dimensions et leurs marges, Université Paris 8 (1959).
  • 29. Sun M., Konstantelos I., and Strbac G., C-vine copula mixture model for clustering of residential electrical load pattern data, IEEE Trans. Power Syst. 32(3) (2017), pp. 2382–2393. doi: 10.1109/TPWRS.2016.2614366

Articles from Journal of Applied Statistics are provided here courtesy of Taylor & Francis
