Competing risks proportional-hazards cure model and generalized extreme value regression: an application to bank failures and acquisitions in the United States

A Beretta; C Heuchenne; M Restaino

doi:10.1080/02664763.2021.1973386

2021 Sep 14;49(16):4162–4180. doi: 10.1080/02664763.2021.1973386

PMCID: PMC9639486 PMID: 36353304

ABSTRACT

Several commercial banks in the United States disappeared during the last decades due to failure or acquisition by another entity. From a survival analysis perspective, however, the high censoring rate suggests that some institutions are likely to be immune to failure and/or acquisition. In this study, we use a competing risks proportional-hazards cure model in order to measure the impact of bank-specific and macroeconomic variables on the probabilities of being susceptible to these events (i.e. incidence) and on the survival time of susceptible banks (i.e. latency). Moreover, we propose to model the incidence distribution using Generalized Extreme Value regression and compare the results with the ones obtained by the usual logistic regression model. The proposed methodology is evaluated by means of a simulation study and then applied to a dataset of more than 4000 United States commercial banks spanning the period 1993–2018.

Keywords: Bank failures, bank acquisitions, cure model, proportional-hazards, competing-risks

1. Introduction

Over the last 40 years, the structure of the commercial bank industry in the United States has changed considerably. The number of institutions insured by the Federal Deposit Insurance Corporation (FDIC) fell from about 14,000 in 1980 to fewer than 5000 in 2017, a decline of more than 60%. Despite two waves of failures during the periods 1982–1995 (Savings and Loan crisis) and 2007–2013 (subprime mortgage crisis), the concentration of the commercial bank industry has been driven mainly by mergers and acquisitions (see Figure 1). After the Savings and Loan crisis, we observe a negative correlation between the number of failures and the number of unassisted mergers.

Figure 1. — Number of unassisted mergers and failures of FDIC-insured commercial banks during the period 1980–2017. Source: FDIC, Historical Bank Data, https://banks.data.fdic.gov/explore/historical.

The literature on bank failures started growing with the Savings and Loan crisis. Most studies used probit/logit regression models to investigate the probability of default, or discriminant analysis to separate healthy banks from troubled institutions, conditionally on bank-specific information (for a review, see Demirgüç-Kunt [15]). Since then, probit/logit regression models have become common practice in the literature, as has the use of bank-specific explanatory variables covering the different dimensions of the CAMEL rating system (capital adequacy, asset quality, management efficiency, earnings and liquidity). See, among others, Cole and White [11], Altman et al. [2], Betz et al. [7], Balla et al. [5] and Audrino et al. [4].

The literature on mergers and acquisitions in the banking industry has also grown considerably since the 1980s (for an exhaustive review of this literature covering over 150 studies, see DeYoung et al. [17]). The two main streams of empirical research studied the determinants of mergers and acquisitions and the financial performance of the merging banks. Several authors employed logit/probit models to investigate bank-specific and sometimes macroeconomic factors affecting the probability of a bank being a target for a merger or acquisition. See, among others, Focarelli et al. [19], Akhigbe et al. [1], Correa [12], Hernando et al. [21] and Caiazza et al. [8].

Because probit/logit models are fully parametric and do not include the time-to-event in the modeling effort, Lane et al. [23] and Whalen [28] proposed to study bank failures using Cox's proportional-hazards (PH) model [13]. Its main advantage lies in the ability to directly model the survival time. In addition, as a semi-parametric model, it does not require any assumption about the distribution of survival times. In a comparison of actual and predicted times to failure, Lane et al. [23] showed that the model tends to identify bankruptcies prior to the actual failure date. Similarly, Molina [24] investigated the determinants of commercial bank failures in Venezuela during the 1994–1995 crisis. In a competing risks framework, Wheelock and Wilson [29] proposed the use of Cox's PH model to identify the characteristics that make a bank more likely to fail or be acquired, with emphasis on management quality, whereas Hannan and Pilloff [20] investigated the determinants of acquisition by four types of acquirers, depending on their size and the geographic market in which they operate (within or outside the market of the target organization).

Despite its advantages, Cox's PH model assumes that the entire population will eventually experience the event of interest. However, the small number of defaults and the substantial proportion of right-censored banks suggest the existence of a subpopulation of banks that is likely to be immune to bankruptcy. For this reason, Cole and Gunther [10] introduced the use of the split population survival time model (also known as the cure model) into the study of bank failures. Originally developed in biostatistics to study long-term survivors of cancer in clinical trials, cure models assume that the population under study is composed of a cured (i.e. immune) and an uncured (i.e. susceptible) subpopulation. Moreover, they can separate the effects of certain factors on the probability of being susceptible (known as incidence) from the ones influencing the survival time of susceptible subjects (known as latency). For a comprehensive review of cure models, the reader may refer to Amico and Van Keilegom [3]. In the framework of U.S. commercial bank failures, Cole and Gunther [10] used a fully parametric cure model to predict survival times from the first quarter of 1986 through the second quarter of 1992, whereas Beretta and Heuchenne [6] used a semi-parametric PH cure model with time-varying covariates to analyze the defaults that occurred during the period 2006–2016. In particular, they proposed a variable selection technique based on the Smoothly Clipped Absolute Deviation (SCAD) penalty.

In cure models, the probability of being susceptible to the event of interest is commonly assumed to follow a logistic regression model, a special case of a generalized linear model with a logit link function. As noted by King and Zeng [22], when the event of interest is rare, the use of a logit link function may be problematic due to its symmetry around the value 0.5: the response curve approaches zero at the same rate as it approaches one. According to Czado and Santner [14], the same problem holds for all symmetric link functions and leads to link misspecification, with a potential bias towards the majority class. For this reason, Wang and Dey [27] and Calabrese and Osmetti [9] proposed the use of a skewed link function: the inverse of the cumulative distribution function of the Generalized Extreme Value (GEV) distribution. This function is more flexible and allows the response curve to approach zero and one at different rates. The resulting model is known as binary GEV regression.

The contribution of this paper is twofold. First, since a bank may cease to exist primarily due to bankruptcy or acquisition by another entity, we use a competing risks PH cure model to analyze the effects of bank-specific and macroeconomic covariates on the probabilities of being susceptible to these events (i.e. incidence) and on the survival time of susceptible banks (i.e. latency). For this purpose, we use a dataset of more than 4000 commercial banks in the United States established before 1993 and observed up to 2018. Second, given the small number of banks facing the two events in our sample, we propose to model the probability of being susceptible using a binary GEV regression.

The rest of the paper is structured as follows. In Section 2, we recall the binary GEV regression model. In Section 3, we present the competing risks PH cure model with the incidence components modeled by the binary GEV regression and we assess its finite sample performance with a simulation study, comparing it with the classical logistic regression model for the incidence. Finally, in Section 4, we describe the dataset of U.S. commercial banks spanning the period 1993–2018, present the results of our analysis using the proposed methodology, validate its performance using a cross-validation procedure and assess its predictive ability by splitting the dataset into training and test sets, covering the periods 1993–2010 and 2011–2018, respectively.

2. The binary GEV regression model

Let $Y$ be a vector of Bernoulli random variables, which take value 1 (resp., 0) with probability $π_{i}$ (resp., $1 - π_{i}$ ), for $i = 1, \dots, n$ . A classical approach to studying the relationship between $Y$ and a matrix $X$ of predictor variables is to estimate $π_{i}$ using a logistic regression model:

π (x_{i}) = \frac{e^{x_{i} b}}{1 + e^{x_{i} b}},

where $x_{i} = (1, x_{i, 1}, \dots, x_{i, p})$ is a vector of explanatory variables and $b$ is a vector of unknown coefficients, which are estimated using maximum likelihood. However, as noted by King and Zeng [22], when the outcome of interest is a rare event, the use of a logit link function may be inappropriate due to its symmetry around the value 0.5. According to Czado and Santner [14], the same problem holds for all symmetric link functions and leads to link misspecification, with a potential bias towards the majority class. A remedy to this problem, suggested by Wang and Dey [27] and Calabrese and Osmetti [9], is the adoption of a skewed link function: the quantile function of the Generalized Extreme Value (GEV) distribution with location, scale and shape parameters equal to 0, 1 and τ, respectively. Depending on the value of the shape parameter $τ \in \mathbb{R}$ , the GEV distribution includes three types of distributions as special cases: Gumbel ( $τ = 0$ ), Fréchet ( $τ > 0$ ) and Weibull ( $τ < 0$ ). The relationship between the probability $π_{i}$ and the explanatory variables is given by the cumulative distribution function of the GEV distribution:

π (x_{i}) = \begin{cases} \exp [- (1 + τ x_{i} b)^{- 1 / τ}], & \text{if } τ \neq 0 \text{ and } 1 + τ x_{i} b > 0, \\ 0, & \text{if } τ > 0 \text{ and } x_{i} b \leq - 1 / τ, \\ 1, & \text{if } τ < 0 \text{ and } x_{i} b \geq - 1 / τ . \end{cases}

This function is not symmetric around 0: it approaches 0 and 1 at different rates and is more flexible than the response curve of the logistic regression model (see Figure 2). The degree of asymmetry depends on the value of the parameter τ, which governs the tail behavior of the GEV distribution. In order to understand the influence of this parameter, let us consider a hypothetical dataset where the response $Y$ is unbalanced because it contains many more ones than zeros. For values of $τ > 0$ , the observations with $y_{i} = 1$ will have $π (x_{i}) = 0$ when $x_{i} b \leq - 1 / τ$ . As a consequence, some of these observations will be excluded from the log-likelihood and the dataset will become more balanced (from an estimation point of view). On the contrary, for $τ < 0$ , the observations with $y_{i} = 0$ will have $π (x_{i}) = 1$ when $x_{i} b \geq - 1 / τ$ . In this case, some of these observations will be excluded from the log-likelihood and the dataset will become more unbalanced (again from an estimation point of view).

Figure 2. — Response curves of the logistic (LOGIT) and GEV regression models.
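To make the asymmetry concrete, the following minimal Python sketch evaluates the logistic and GEV response curves at a few values of the linear predictor. The function names `logit_response` and `gev_response` are hypothetical, and the Gumbel limit ( $τ = 0$ ) is deliberately left out, as in the piecewise definition above.

```python
import math

def logit_response(eta):
    """Logistic response curve; symmetric around eta = 0, where it equals 0.5."""
    return 1.0 / (1.0 + math.exp(-eta))

def gev_response(eta, tau):
    """GEV response curve with location 0, scale 1 and shape tau (tau != 0)."""
    if tau == 0:
        raise ValueError("the Gumbel limit tau = 0 is not covered by this sketch")
    base = 1.0 + tau * eta
    if base <= 0.0:
        # Outside the support: 0 when tau > 0, 1 when tau < 0.
        return 0.0 if tau > 0 else 1.0
    return math.exp(-base ** (-1.0 / tau))

# Compare the two curves at points placed symmetrically around zero.
for eta in (-0.9, 0.0, 0.9):
    print(eta, logit_response(eta), gev_response(eta, 1.0))
```

For the logistic curve the values at `-0.9` and `0.9` sum to one, whereas for the GEV curve with `tau = 1.0` they do not, which is the skewness exploited by the binary GEV regression.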

The parameters $b$ and τ are estimated using the maximum-likelihood method. The log-likelihood function is

ℓ (b, τ) = \sum_{i = 1}^{n} \{ y_{i} \log [π (x_{i})] + (1 - y_{i}) \log [1 - π (x_{i})] \}

(1)

and its first derivatives with respect to the parameters of interest are

\frac{\partial ℓ (b, τ)}{\partial b} = \sum_{i = 1}^{n} \frac{\partial ℓ (b, τ)}{\partial π (x_{i})} \frac{\partial π (x_{i})}{\partial b}

and

\frac{\partial ℓ (b, τ)}{\partial τ} = \sum_{i = 1}^{n} \frac{\partial ℓ (b, τ)}{\partial π (x_{i})} \frac{\partial π (x_{i})}{\partial τ},

where

\begin{aligned} \frac{\partial ℓ (b, τ)}{\partial π (x_{i})} = \frac{y_{i}}{π (x_{i})} - \frac{1 - y_{i}}{1 - π (x_{i})}, \end{aligned}

(2)

\begin{aligned} \frac{\partial π (x_{i})}{\partial b} = \begin{cases} - \frac{x_{i}}{1 + τ x_{i} b} π (x_{i}) \ln [π (x_{i})], & \text{if } τ \neq 0 \text{ and } 1 + τ x_{i} b > 0, \\ 0, & \text{elsewhere} \end{cases} \end{aligned}

(3)

and

\frac{\partial π (x_{i})}{\partial τ} = \begin{cases} [\frac{\ln (1 + τ x_{i} b)}{τ^{2}} - \frac{1}{τ} \frac{x_{i} b}{1 + τ x_{i} b}] π (x_{i}) \ln [π (x_{i})], & \text{if } τ \neq 0 \text{ and } 1 + τ x_{i} b > 0, \\ 0, & \text{elsewhere.} \end{cases}

(4)

Computational issues. Equations (1)–(4) may not be defined when $π (x_{i}) = 0$ or $π (x_{i}) = 1$ . In order to solve this issue, in such cases we impose that $π (x_{i}) = ϵ$ or $π (x_{i}) = 1 - ϵ$ , where ϵ is the smallest positive floating-point number such that $1 + ϵ \neq 1$ .
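The log-likelihood (1), its derivatives (2)–(4) and the clipping rule can be sketched in a few lines of Python. This is an illustrative sketch rather than the implementation used in the paper: the helper names (`gev_pi`, `log_likelihood`, `score`) and the toy data are hypothetical, and ϵ is taken to be the double-precision machine epsilon `sys.float_info.epsilon`, which is our reading of the definition above. The last lines compare the analytical derivative with respect to τ against a central finite difference.

```python
import math
import sys

# Machine epsilon for doubles (2**-52); assumed to play the role of epsilon.
EPS = sys.float_info.epsilon

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def gev_pi(eta, tau):
    """GEV response for tau != 0, clipped to [EPS, 1 - EPS]."""
    base = 1.0 + tau * eta
    if base <= 0.0:
        p = 0.0 if tau > 0 else 1.0
    else:
        p = math.exp(-base ** (-1.0 / tau))
    return min(max(p, EPS), 1.0 - EPS)

def log_likelihood(b, tau, X, y):
    """Equation (1)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        p = gev_pi(dot(x_i, b), tau)
        total += y_i * math.log(p) + (1 - y_i) * math.log(1.0 - p)
    return total

def score(b, tau, X, y):
    """Derivatives with respect to b and tau, built from equations (2)-(4)."""
    grad_b = [0.0] * len(b)
    grad_tau = 0.0
    for x_i, y_i in zip(X, y):
        eta = dot(x_i, b)
        base = 1.0 + tau * eta
        if base <= 0.0:
            continue  # both partial derivatives of pi are zero "elsewhere"
        p = gev_pi(eta, tau)
        dl_dp = y_i / p - (1 - y_i) / (1.0 - p)           # equation (2)
        p_log_p = p * math.log(p)
        for k, x_ik in enumerate(x_i):
            grad_b[k] += dl_dp * (-x_ik / base) * p_log_p  # equation (3)
        d_tau = math.log(base) / tau ** 2 - eta / (tau * base)
        grad_tau += dl_dp * d_tau * p_log_p                # equation (4)
    return grad_b, grad_tau

# Toy data (hypothetical): intercept plus one covariate.
X = [[1.0, 0.5], [1.0, -0.3], [1.0, 1.2], [1.0, 0.1]]
y = [1, 0, 1, 0]
b, tau = [-0.2, 0.4], 0.5

grad_b, grad_tau = score(b, tau, X, y)
h = 1e-6
fd_tau = (log_likelihood(b, tau + h, X, y) - log_likelihood(b, tau - h, X, y)) / (2 * h)
print(grad_tau, fd_tau)
```

With the clipping in place, the log-likelihood stays finite even when an observation falls outside the support of the GEV distribution, so a generic numerical optimizer can be applied without special handling of those cases.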

3. Competing risks GEV-PH cure model

Considering that a bank may disappear due to bankruptcy (event j = 1) or acquisition (event j = 2), we extend the semi-parametric PH cure model of Sy and Taylor [25] to competing risks. We assume that the population of banks is composed of banks (i) susceptible to both default and acquisition, (ii) susceptible to default only, (iii) susceptible to acquisition only and (iv) immune to both default and acquisition.

Let the observed time be $T = \min (W_{1}, W_{2}, C)$ , where $W_{1}$ , $W_{2}$ and C are random variables denoting the time to default, acquisition and censoring, respectively. The censoring mechanism is assumed to be independent of the event times. Furthermore, let the indicator $Y_{j}$ , for $j \in \{1, 2\}$ , denote whether a bank is susceptible to the jth event.

The probability of being susceptible to the jth event, also known as incidence, is modeled by a binary GEV regression:

P (Y_{j, i} = 1 | x_{i}) = π_{j} (x_{i}; b_{j}, τ_{j}),

where $x_{i} = (1, x_{i, 1}, \dots, x_{i, p})$ is a vector of covariates, $b_{j}$ a vector of unknown coefficients and $τ_{j}$ the unknown shape parameter of the GEV distribution. Here, the two events are assumed to be independent conditionally on the covariates.

The time to the jth event for the susceptible banks, also known as latency, is modeled according to a Proportional Hazards (PH) model, with a conditional survival function defined as

S_{j} (t | z_{i}) = S (t | Y_{j} = 1, z_{i}) = [S_{0, j} (t)]^{e^{z_{i} β_{j}}},

where $S_{0, j} (t) = e^{- \int_{0}^{t} h_{0, j} (u) d u}$ is the baseline survival function, $z_{i}$ a vector of covariates and $β_{j}$ a vector of unknown coefficients.

3.1. Estimation

We denote the observed data as $O = \{(t_{i}, δ_{1, i}, δ_{2, i}, x_{i}, z_{i}); i = 1, \dots, n\}$ , where $δ_{j, i} = 1$ if the ith bank experienced the jth event and $δ_{j, i} = 0$ otherwise. We further denote the model parameters as $Θ = [b_{1}, b_{2}, τ_{1}, τ_{2}, β_{1}, β_{2}, h_{0, 1} (t), h_{0, 2} (t)]$ , where $h_{0, j} (t)$ is the baseline hazard function for the jth event.

The indicators $y_{j} = \{y_{j, i} : i = 1, \dots, n\}$ are partially unobserved, because $y_{j, i}$ is known (and equal to 1) only when a bank experiences the jth event ( $δ_{j, i} = 1$ ). For this reason, we treat the estimation of the model's parameters as a missing data problem, using an expectation–maximization (EM) algorithm [16]. The complete-data likelihood is defined as the product of an incidence and a latency component: $L_{C} (Θ; O, y_{1}, y_{2}) = L_{C}^{1} (b_{1}, b_{2}, τ_{1}, τ_{2}; O, y_{1}, y_{2}) \times L_{C}^{2} (β_{1}, β_{2}, h_{0, 1} (t), h_{0, 2} (t); O, y_{1}, y_{2})$ , where

\begin{aligned} L_{C}^{1} (b_{1}, b_{2}, τ_{1}, τ_{2}; O, y_{1}, y_{2}) & = \prod_{i = 1}^{n} π_{1} (x_{i})^{y_{1, i}} π_{2} (x_{i})^{y_{2, i}} \\ & \quad \times [1 - π_{1} (x_{i})]^{(1 - y_{1, i})} [1 - π_{2} (x_{i})]^{(1 - y_{2, i})}, \\ L_{C}^{2} (β_{1}, β_{2}, h_{0, 1} (t), h_{0, 2} (t); O, y_{1}, y_{2}) & = \prod_{i = 1}^{n} \prod_{j = 1}^{2} h_{j} (t_{i} | z_{i})^{δ_{j, i}} S_{j} (t_{i} | z_{i})^{y_{j, i}} . \end{aligned}

(5)

In the rest of the paper, we will use ℓ to denote log-likelihoods.

In the expectation step (E-step), we compute the conditional expectation of the complete-data log-likelihood with respect to $y_{1}$ and $y_{2}$ , given the observed data $O$ and the current parameter estimates ${\hat{Θ}}^{(m)}$ . Since it is a linear function of $y_{1}$ and $y_{2}$ , the conditional expectation is equal to $ℓ_{C} ({\hat{Θ}}^{(m)}; O, ϕ_{1}^{(m)}, ϕ_{2}^{(m)})$ , where

ϕ_{j, i}^{(m)} = E [y_{j, i} | O, {\hat{Θ}}^{(m)}] = \begin{cases} 1, & \text{if } δ_{j, i} = 1, \\ \frac{π_{j} (x_{i}) S_{j} (t_{i} | z_{i})}{1 - π_{j} (x_{i}) + π_{j} (x_{i}) S_{j} (t_{i} | z_{i})}, & \text{if } δ_{j, i} = 0, \end{cases}

which can be seen as the posterior probability to be classified as susceptible to the jth event.
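As a small worked example of the E-step, the following Python sketch computes the weight $ϕ_{j, i}^{(m)}$ for a single bank from its event indicator, its current incidence probability and its current conditional survival probability. The function name `e_step_weight` and the numerical inputs are hypothetical.

```python
def e_step_weight(delta, pi, surv):
    """Posterior probability that a bank is susceptible to event j.

    delta: 1 if the bank experienced event j, 0 otherwise
    pi:    current estimate of the incidence probability pi_j(x_i)
    surv:  current estimate of the latency survival S_j(t_i | z_i)
    """
    if delta == 1:
        return 1.0  # a bank that experienced the event is susceptible for sure
    return pi * surv / (1.0 - pi + pi * surv)

# A censored bank with pi = 0.2 whose survival at its observed time is 0.5.
print(e_step_weight(0, 0.2, 0.5))
```

When `surv` equals 1 (no follow-up information), the weight reduces to the prior probability `pi`; as `surv` approaches 0, a bank that is still event-free is increasingly likely to be immune and the weight goes to 0.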

In the maximization step (M-step), we compute the parameter estimates for the next iteration of the EM algorithm as ${\hat{Θ}}^{(m + 1)} = \arg\max_{Θ} ℓ_{C} (Θ; O, ϕ_{1}^{(m)}, ϕ_{2}^{(m)})$ . Assuming that the baseline hazard function is piecewise constant, with $h_{0, j} (t | Y_{j} = 1) = h_{0, j, l}$ for $t \in (t_{(l - 1)}, t_{(l)}]$ , between the $k_{j}$ ordered event times $t_{(1)} \leq \dots \leq t_{(k_{j})}$ , its estimator is given by

{\hat{h}}_{0, j, l} (β_{j}; O, y_{j}) = \frac{1}{\sum_{i \in R_{j} (t_{(l)}^{-})} y_{j, i} \exp (z_{i}^{'} β_{j})},

(6)

where $R_{j} (t_{(l)}^{-})$ is the risk set for the jth event just prior to time $t_{(l)}$ (i.e. the set of all individuals who have neither experienced the jth event nor been censored just prior to time $t_{(l)}$ ) and $l = 1, \dots, k_{j}$ . Notice that $R_{j} (t_{(l)}^{-})$ excludes the individuals who experienced the other event prior to time $t_{(l)}$ . Replacing $h_{0, j} (t)$ in (5) by this estimator, it is possible to derive a partial log-likelihood which no longer depends on $h_{0, j} (t)$ :

ℓ_{C}^{3} (β_{1}, β_{2}; O, y_{1}, y_{2}) = \sum_{j = 1}^{2} \sum_{l = 1}^{k_{j}} [z_{i}^{'} β_{j} - \ln (\sum_{i \in R_{j} (t_{(l)}^{-})} y_{j, i} \exp (z_{i}^{'} β_{j}))] .
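The baseline hazard estimator (6) and the partial log-likelihood can be illustrated for a single event type with the short Python sketch below. It rests on several assumptions that are ours: there are no tied event times, the risk set is taken to be all banks whose observed time is at least $t_{(l)}$ , the first term of the partial log-likelihood uses the covariates of the bank that experiences the event at $t_{(l)}$ , and the weights `w_j` stand for the indicators $y_{j, i}$ (or their E-step expectations). All function and variable names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def risk_set(times, t):
    """Indices of banks still under observation just before time t."""
    return [i for i, t_i in enumerate(times) if t_i >= t]

def weighted_risk_sum(times, w_j, z, beta_j, t):
    return sum(w_j[i] * math.exp(dot(z[i], beta_j)) for i in risk_set(times, t))

def baseline_hazard(times, delta_j, w_j, z, beta_j):
    """Equation (6): one hazard value per observed time of event j."""
    return {
        t: 1.0 / weighted_risk_sum(times, w_j, z, beta_j, t)
        for t, d in zip(times, delta_j)
        if d == 1
    }

def partial_loglik(times, delta_j, w_j, z, beta_j):
    """Contribution of event j to the partial log-likelihood."""
    total = 0.0
    for l, (t, d) in enumerate(zip(times, delta_j)):
        if d == 1:
            total += dot(z[l], beta_j) - math.log(
                weighted_risk_sum(times, w_j, z, beta_j, t)
            )
    return total

# Toy data (hypothetical): three banks, one covariate, events at t = 1 and t = 3.
times = [1.0, 2.0, 3.0]
delta_j = [1, 0, 1]
w_j = [1.0, 0.5, 1.0]   # weight 1 for observed events, E-step weight otherwise
z = [[0.2], [-0.1], [0.4]]

print(baseline_hazard(times, delta_j, w_j, z, [0.0]))
print(partial_loglik(times, delta_j, w_j, z, [0.0]))
```

With `beta_j = [0.0]` every bank has relative risk 1, so the hazard at each event time is simply the reciprocal of the sum of the weights in the risk set, which makes the output easy to check by hand.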

To sum up, starting from some initial values ${\hat{Θ}}^{(1)}$ , in the E-step we compute $ϕ_{1}^{(m)}$ and $ϕ_{2}^{(m)}$ given $O$ and ${\hat{Θ}}^{(m)}$ . Next, in the M-step, we estimate the parameters $b_{j}$ , $β_{j}$ and $τ_{j}$ , for $j = 1, 2$ ,

\begin{aligned} ({\hat{b}}_{1}^{(m + 1)}, {\hat{b}}_{2}^{(m + 1)}, {\hat{τ}}_{1}^{(m + 1)}, {\hat{τ}}_{2}^{(m + 1)}) & = \arg\max_{b_{1}, b_{2}, τ_{1}, τ_{2}} ℓ_{C}^{1} (b_{1}, b_{2}, τ_{1}, τ_{2}; O, ϕ_{1}^{(m)}, ϕ_{2}^{(m)}), \\ ({\hat{β}}_{1}^{(m + 1)}, {\hat{β}}_{2}^{(m + 1)}) & = \arg\max_{β_{1}, β_{2}} ℓ_{C}^{3} (β_{1}, β_{2}; O, ϕ_{1}^{(m)}, ϕ_{2}^{(m)}), \end{aligned}

and the baseline hazard functions ${\hat{h}}_{0, j}^{(m + 1)} = \{ {\hat{h}}_{0, j, l} ({\hat{β}}_{j}^{(m + 1)}; O, ϕ_{j}^{(m)}), \; l = 1, \dots, k_{j} \}$ . The iterations between the E-step and the M-step continue until the norm of the difference between two successive estimates of $b_{1}$ , $b_{2}$ , $τ_{1}$ , $τ_{2}$ , $β_{1}$ and $β_{2}$ falls below a given tolerance threshold.

In practice, the estimation of the model's parameters reduces to the estimation of two semi-parametric PH cure models, one for each event of interest, treating the acquired (resp., failed) banks as censored when modeling defaults (resp., acquisitions). For the calculation of the confidence intervals of the parameter estimates $b_{j}$ , $τ_{j}$ and $β_{j}$ , for j = 1, 2, we use the percentile bootstrap method (Efron [18]), which provides better results than the basic bootstrap in our settings. This leads to the application of the EM algorithm on B random bootstrap samples to get estimates of the distributions of the estimated parameters.
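The percentile bootstrap can be sketched generically as follows. In this illustration the EM fit is replaced by a placeholder estimator (the sample mean), since the point is only to show how the interval is read off the sorted bootstrap estimates; the function name `percentile_bootstrap_ci`, its arguments and the toy data are hypothetical.

```python
import math
import random

def percentile_bootstrap_ci(data, estimator, B=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for a scalar estimator.

    data:      list of observations (resampled with replacement)
    estimator: function mapping a resampled list to a scalar estimate;
               in the paper's setting this would be one EM fit per resample
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        estimator([data[rng.randrange(n)] for _ in range(n)]) for _ in range(B)
    )
    alpha = 1.0 - level
    lower = estimates[int(math.floor(alpha / 2.0 * (B - 1)))]
    upper = estimates[int(math.ceil((1.0 - alpha / 2.0) * (B - 1)))]
    return lower, upper

def sample_mean(values):
    return sum(values) / len(values)

# Toy data (hypothetical): the integers 1..100, whose mean is 50.5.
toy = list(range(1, 101))
print(percentile_bootstrap_ci(toy, sample_mean, B=500, seed=1))
```

For a vector of parameters, the same procedure would be applied component by component to the B bootstrap estimates.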

3.2. Simulation study

In this section, we present the results of a simulation study conducted to assess the performance of the proposed competing risks GEV-PH cure model using Monte Carlo simulation with 500 replications. The susceptibility indicators $(Y_{1}, Y_{2})$ are generated from a binary GEV regression model with coefficients $b_{1} = (b_{0, 1}, 1, - 1, 1, - 1)^{'}$ and $b_{2} = (b_{0, 2}, - 1, 1, - 1, 1)^{'}$ , shape parameters $τ_{1}$ and $τ_{2}$ , and 4 covariates following independent standard normal distributions. The time-to-events $(W_{1}, W_{2})$ are generated from a Cox's proportional hazards model with parameters $β_{1} = (- 1, 1, - 1, 1)^{'}$ , $β_{2} = (1, - 1, 1, - 1)^{'}$ , a unit constant baseline hazard function and 4 covariates following independent standard normal distributions. The censoring times are generated from an exponential distribution with parameter $λ_{C}$ . Finally, the observed data ${(t_{i}, δ_{1, i}, δ_{2, i}); i = 1, \dots, n}$ are obtained using Algorithm 1.
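One way to generate data consistent with the description above is sketched below in Python. It is not a transcription of Algorithm 1, and several choices are assumptions on our part: the incidence and latency covariates are drawn separately, $λ_{C}$ is interpreted as the rate of the exponential censoring distribution and set to 1.0, and non-susceptible banks are given an infinite event time. The defaults mimic Setting 1 ( $b_{0, 1} = b_{0, 2} = - 2$ , $τ_{1} = τ_{2} = 1$ ), and all names are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def gev_pi(eta, tau):
    """GEV response curve (tau != 0)."""
    base = 1.0 + tau * eta
    if base <= 0.0:
        return 0.0 if tau > 0 else 1.0
    return math.exp(-base ** (-1.0 / tau))

def simulate_dataset(n, seed=0, b01=-2.0, b02=-2.0, tau1=1.0, tau2=1.0, lam_c=1.0):
    rng = random.Random(seed)
    b = ([b01, 1.0, -1.0, 1.0, -1.0], [b02, -1.0, 1.0, -1.0, 1.0])
    beta = ([-1.0, 1.0, -1.0, 1.0], [1.0, -1.0, 1.0, -1.0])
    tau = (tau1, tau2)
    rows = []
    for _ in range(n):
        x = [1.0] + [rng.gauss(0.0, 1.0) for _ in range(4)]  # incidence covariates
        z = [rng.gauss(0.0, 1.0) for _ in range(4)]          # latency covariates
        w = []
        for j in range(2):
            susceptible = rng.random() < gev_pi(dot(x, b[j]), tau[j])
            if susceptible:
                # Unit baseline hazard: exponential time with rate exp(z' beta_j).
                w.append(rng.expovariate(math.exp(dot(z, beta[j]))))
            else:
                w.append(math.inf)  # immune banks never experience event j
        c = rng.expovariate(lam_c)  # censoring time (lam_c assumed to be a rate)
        t = min(w[0], w[1], c)
        rows.append((t, int(t == w[0]), int(t == w[1]), x, z))
    return rows

data = simulate_dataset(500, seed=42)
censored = sum(1 for t, d1, d2, x, z in data if d1 + d2 == 0)
print(censored / len(data))
```

The censoring percentage printed at the end depends on the assumed value of `lam_c`, so it should not be expected to reproduce the censoring levels reported for the simulation settings.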

We consider 5 simulation settings with different levels of censoring and susceptibility to each event, depending on the values of $b_{0, 1}$ , $b_{0, 2}$ , $τ_{1}$ and $τ_{2}$ (see Table 1), and 3 sample sizes (n = 500, 1000, 2000). At each replication, we fit two competing risks PH cure models, where the incidence component is modeled by binary GEV or logistic (LOGIT) regression. In Settings 1–4, the susceptibility indicators are generated from a binary GEV regression model, whereas, in Setting 5, they are generated from a logistic regression model.

Table 1.

Simulation settings. Cens% is the proportion of censored observations and ${\bar{π}}_{1}$ (resp., ${\bar{π}}_{2}$ ) is the average proportion of individuals susceptible to event 1 (resp., event 2).

	Cens%	${\bar{π}}_{1}$	${\bar{π}}_{2}$	$b_{0, 1}$	$b_{0, 2}$	$τ_{1}$	$τ_{2}$
Setting 1	0.87	0.11	0.11	$- 2$	$- 2$	1	1
Setting 2	0.83	0.11	0.18	$- 2$	$- 2$	1	$- 1$
Setting 3	0.75	0.11	0.32	$- 2$	$- 1$	1	$- 1$
Setting 4	0.64	0.32	0.32	$- 1$	$- 1$	$- 1$	$- 1$
Setting 5	0.85	0.13	0.13	$- 3$	$- 3$	–	–

The performances of the two models are measured in terms of Mean Absolute Error (MAE) and Mean Squared Error (MSE) of the estimated probabilities. For the probability to be susceptible (incidence), they are defined as

MAE ({\hat{π}}_{j}) = \frac{1}{n} \sum_{i = 1}^{n} | {\hat{π}}_{j} (x_{i}) - π_{j} (x_{i}) |

and

MSE ({\hat{π}}_{j}) = \frac{1}{n} \sum_{i = 1}^{n} [{\hat{π}}_{j} (x_{i}) - π_{j} (x_{i})]^{2} .

For the survival probability (latency), they are defined as

MAE ({\hat{S}}_{j}) = \frac{1}{n} \sum_{i = 1}^{n} | {\hat{S}}_{j} (t_{i} | z_{i}) - S_{j} (t_{i} | z_{i}) |

and

MSE ({\hat{S}}_{j}) = \frac{1}{n} \sum_{i = 1}^{n} [{\hat{S}}_{j} (t_{i} | z_{i}) - S_{j} (t_{i} | z_{i})]^{2} .
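Both error measures are simple averages over the n individuals of one replication. A minimal Python sketch, with hypothetical function names and toy inputs, is:

```python
def mean_absolute_error(estimated, true):
    """MAE between estimated and true probabilities for one replication."""
    return sum(abs(e - t) for e, t in zip(estimated, true)) / len(true)

def mean_squared_error(estimated, true):
    """MSE between estimated and true probabilities for one replication."""
    return sum((e - t) ** 2 for e, t in zip(estimated, true)) / len(true)

# Toy values (hypothetical): two individuals.
pi_hat = [0.1, 0.2]
pi_true = [0.2, 0.4]
print(mean_absolute_error(pi_hat, pi_true), mean_squared_error(pi_hat, pi_true))
```

The same two functions apply to the latency component by passing the estimated and true survival probabilities evaluated at the observed times.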

In Table 2, we provide the median values of MAE and MSE over the 500 replications for the probability of being susceptible (incidence). In Settings 1–4, the median of the model errors is lower when the incidence is modeled using GEV regression in all but a few cases at N = 500, where the MSE values of the two models are nearly equal. Moreover, the GEV errors decrease faster than the logistic regression (LOGIT) errors as the sample size increases, although somewhat more slowly when the average proportion of 1's is higher. These results are in line with our expectations, since the GEV regression model is more flexible (as discussed in Section 2). When the susceptibility indicators are generated from a logistic regression model (Setting 5), the GEV model performs only slightly worse than the LOGIT model, with median error ratios between 1.07 and 1.23.

Table 2.

Simulation study: median of the model errors for the probabilities to be susceptible.

		$MAE ({\hat{π}}_{1})$			$MAE ({\hat{π}}_{2})$			$MSE ({\hat{π}}_{1})$			$MSE ({\hat{π}}_{2})$
	N	LOGIT	GEV	rel.%	LOGIT	GEV	rel.%	LOGIT	GEV	rel.%	LOGIT	GEV	rel.%
Setting 1	500	0.0425	0.0367	0.83	0.0424	0.0378	0.82	0.0074	0.0072	0.93	0.0073	0.0078	0.86
	1000	0.0374	0.0237	0.61	0.0372	0.0233	0.60	0.0054	0.0030	0.48	0.0054	0.0029	0.48
	2000	0.0347	0.0129	0.38	0.0348	0.0127	0.36	0.0044	0.0009	0.20	0.0044	0.0009	0.19
Setting 2	500	0.0420	0.0371	0.82	0.0392	0.0292	0.75	0.0072	0.0073	0.91	0.0048	0.0034	0.71
	1000	0.0369	0.0223	0.57	0.0321	0.0201	0.61	0.0053	0.0026	0.44	0.0033	0.0015	0.47
	2000	0.0347	0.0126	0.36	0.0287	0.0143	0.49	0.0044	0.0009	0.19	0.0026	0.0008	0.30
Setting 3	500	0.0427	0.0370	0.86	0.0490	0.0319	0.65	0.0075	0.0075	0.94	0.0056	0.0034	0.58
	1000	0.0374	0.0221	0.58	0.0431	0.0223	0.53	0.0053	0.0027	0.45	0.0042	0.0016	0.40
	2000	0.0350	0.0129	0.37	0.0395	0.0160	0.40	0.0044	0.0009	0.19	0.0035	0.0008	0.23
Setting 4	500	0.0493	0.0335	0.66	0.0487	0.0325	0.68	0.0056	0.0034	0.62	0.0055	0.0034	0.61
	1000	0.0433	0.0232	0.53	0.0429	0.0228	0.53	0.0042	0.0018	0.39	0.0041	0.0017	0.39
	2000	0.0396	0.0162	0.40	0.0397	0.0158	0.39	0.0034	0.0009	0.24	0.0035	0.0009	0.24
Setting 5	500	0.0318	0.0373	1.08	0.0327	0.0365	1.08	0.0029	0.0041	1.20	0.0031	0.0041	1.18
	1000	0.0212	0.0234	1.07	0.0216	0.0241	1.07	0.0013	0.0016	1.14	0.0014	0.0017	1.15
	2000	0.0148	0.0167	1.10	0.0152	0.0171	1.10	0.0006	0.0008	1.23	0.0007	0.0008	1.19

rel.% is the median of the ratio between the GEV and LOGIT errors.

In Table 3, we provide the median values of MAE and MSE over the 500 replications for the survival probabilities (latency). In all settings, the median of the model errors decreases as the sample size increases. We do not notice an important difference between the two models: only under high censoring do we observe a slightly better performance when the incidence is modeled by GEV regression.

Table 3.

Simulation study: median of the model errors for the survival probabilities.

		$MAE ({\hat{S}}_{1})$			$MAE ({\hat{S}}_{2})$			$MSE ({\hat{S}}_{1})$			$MSE ({\hat{S}}_{2})$
	N	LOGIT	GEV	rel.%	LOGIT	GEV	rel.%	LOGIT	GEV	rel.%	LOGIT	GEV	rel.%
Setting 1	500	0.0913	0.0768	0.93	0.0867	0.0731	0.91	0.0209	0.0153	0.89	0.0196	0.0139	0.86
	1000	0.0598	0.0515	0.93	0.0584	0.0515	0.90	0.0098	0.0071	0.86	0.0092	0.0071	0.83
	2000	0.0418	0.0357	0.88	0.0426	0.0356	0.87	0.0047	0.0035	0.77	0.0049	0.0035	0.77
Setting 2	500	0.0840	0.0729	0.92	0.0564	0.0562	1.00	0.0188	0.0138	0.86	0.0085	0.0084	0.99
	1000	0.0594	0.0513	0.90	0.0388	0.0385	1.00	0.0099	0.0070	0.83	0.0041	0.0041	1.00
	2000	0.0404	0.0343	0.85	0.0273	0.0271	1.00	0.0045	0.0033	0.74	0.0021	0.0021	1.00
Setting 3	500	0.0796	0.0691	0.95	0.0422	0.0425	1.00	0.0171	0.0129	0.90	0.0046	0.0045	1.00
	1000	0.0545	0.0478	0.94	0.0287	0.0287	0.99	0.0081	0.0064	0.89	0.0022	0.0022	0.99
	2000	0.0392	0.0326	0.86	0.0203	0.0199	0.99	0.0044	0.0031	0.76	0.0011	0.0011	0.99
Setting 4	500	0.0389	0.0391	1.00	0.0383	0.0380	0.99	0.0041	0.0041	1.00	0.0039	0.0039	0.99
	1000	0.0267	0.0268	1.00	0.0273	0.0269	0.99	0.0020	0.0019	1.00	0.0021	0.0020	0.99
	2000	0.0188	0.0186	0.99	0.0189	0.0189	0.99	0.0010	0.0010	0.99	0.0010	0.0010	0.98
Setting 5	500	0.0702	0.0694	1.00	0.0733	0.0729	1.00	0.0131	0.0128	1.00	0.0139	0.0137	1.00
	1000	0.0478	0.0478	1.00	0.0485	0.0486	1.00	0.0062	0.0062	1.00	0.0063	0.0064	1.00
	2000	0.0331	0.0334	1.00	0.0317	0.0316	1.00	0.0031	0.0031	1.00	0.0028	0.0028	1.00

rel.% is the median of the ratio between the GEV and LOGIT errors.

4. Application to US bank failures and acquisitions

In this section, we analyze a dataset of United States commercial banks insured by the Federal Deposit Insurance Corporation (FDIC), which spans the period from 1993 to 2018. Information about defaults and acquisitions comes from the National Information Center (NIC)¹ of the Federal Financial Institutions Examination Council (FFIEC), a repository of financial data on institutions for which the Federal Reserve has a supervisory, regulatory, or research interest.

After removing a few banks acquired due to regulatory actions, the sample consists of 4413 banks established before January 1993. During the period under investigation, 294 (6.7%) institutions were closed by the regulator due to failure and 303 (6.9%) institutions were acquired by another entity. Most of the banks in the dataset (86%) were still operating at the end of 2018. In Figure 3, we provide a bar plot of the number of failures and acquisitions by year. Most failures are concentrated at the end of the Savings and Loan crisis (1993) and during the subprime mortgage crisis (2009–2010), whereas acquisitions were numerous after the Savings and Loan crisis and then decreased approaching the subprime crisis. Overall, we observe a negative relationship between the number of failures and the number of acquisitions over time: failures are lower in periods with many acquisitions.

Figure 3. — Number of defaults and acquisitions by year.

Similarly to previous studies [2,4,5,10,11,15,26,29], we use explanatory variables as proxies for the five dimensions of the CAMEL rating system: capital adequacy, asset quality, management efficiency, earnings and liquidity. In particular, as in Beretta and Heuchenne [6], we selected the following variables.² Capital adequacy is measured by the Equity to Total Assets ratio, an indicator of financial strength, since equity serves as a buffer to absorb future losses. Asset quality is measured by four ratios: the Loans to Total Assets ratio (loans are usually the least liquid and most risky assets); the Non-Performing Assets to Total Assets ratio (the sum of assets past due 30 days or more but still accruing interest and assets in non-accrual status); the Other Real Estate Owned Assets to Total Assets ratio (the real estate assets acquired by a bank in full or partial satisfaction of a debt previously contracted); and the Allowance for Loan and Lease Losses to Total Assets ratio (a reserve calculated on the basis of credit risk to cover future charge-offs). The ability to generate earnings is measured by Return on Assets, an indicator of a bank's profitability relative to its total assets. Management efficiency is measured by the Non-Interest Expenses to Net Income ratio. Since the objective of a bank is to maximize revenues and minimize costs, a lower value of this ratio means that a bank is more efficient. Liquidity is measured by the Liquid Assets (Cash and Balances due from Depository Institutions) to Total Assets ratio and the Core (Retail) Deposits to Total Assets ratio, which capture the most liquid assets and the most stable source of funding, respectively. We also include some control variables: the bank's age, measured in years since the date of establishment, and the bank's size, measured as the logarithm of the bank's total assets. In addition, in order to capture the general economic conditions, we included two macroeconomic variables: the yearly unemployment rate³ and the yearly Gross Domestic Product (GDP) growth⁴ in the state where a bank is located.

All the covariates mentioned above are observed at the end of the year preceding failure, acquisition or censoring (descriptive statistics are provided in Table 4).

Table 4.

Explanatory variables: descriptive statistics by event type (failure or acquisition).

	FAILURE		ACQUISITION		ALL BANKS
	Mean	Sd.Dev.	Mean	Sd.Dev.	Mean	Sd.Dev.
Equity/Total Assets	0.0397	0.0320	0.1132	0.0650	0.1135	0.0317
Loans/Total Assets	0.6889	0.1177	0.6105	0.2106	0.6329	0.1638
Comm. & Ind. Loans/Total Assets	0.1047	0.0848	0.0948	0.0837	0.0823	0.0620
R.E. Loans/Total Assets	0.5092	0.1781	0.3607	0.2037	0.4404	0.1691
Other R.E. Owned/Total Assets	0.0378	0.0430	0.0031	0.0083	0.0023	0.0071
Non-Performing Assets/Total Assets	0.1042	0.0688	0.0119	0.0155	0.0118	0.0135
A.L.L.L./Total Assets	0.0270	0.0152	0.0121	0.0105	0.0085	0.0045
Non-Interest Expense/Net Income	1.2341	0.6905	0.7321	0.3394	0.6812	0.1938
Return on Assets	$- 0.0389$	0.0306	0.0092	0.0221	0.0097	0.0075
Liquid Assets/Total Assets	0.0855	0.0625	0.0750	0.0907	0.0897	0.0875
Core Deposits/Total Assets	0.7356	0.1638	0.6764	0.2408	0.7783	0.0998
Size	12.0979	1.4953	12.4386	1.9170	12.3475	1.4487
Age	57.9793	39.0167	57.9904	43.1389	95.7954	32.9149
Yearly GDP growth	0.0202	0.0330	0.0544	0.0256	0.0357	0.0112
Yearly unemployment rate	0.0773	0.0214	0.0542	0.0163	0.0406	0.0076

Comm. & Ind., Commercial & Industrial; R.E., Real Estate; A.L.L.L., Allowance for Loan and Lease Losses.

4.1. Estimation results

In this section, we discuss the estimation results. In Tables 5 and 7, we provide the coefficient estimates and 95% confidence intervals (computed using the percentile bootstrap method with 1000 replications) obtained with the standard PH cure model, where the incidence is modeled by logistic regression. Moreover, in Tables 6 and 8, we provide the results obtained with the proposed GEV-PH cure model, where the incidence is modeled by GEV regression.
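The percentile bootstrap intervals mentioned above can be illustrated with a minimal sketch. It assumes a generic scalar estimator applied to resampled observations; the function names `percentile` and `percentile_bootstrap_ci`, the toy data and the seeds are hypothetical and are not taken from the paper, where each replication would presumably involve refitting the full cure model on a resampled set of banks.

```python
import random
import statistics


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an already sorted list, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def percentile_bootstrap_ci(sample, estimator, n_boot=1000, level=0.95, seed=0):
    """Resample with replacement n_boot times and return the empirical
    (1 - level) / 2 and 1 - (1 - level) / 2 quantiles of the estimates."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = sorted(
        estimator([sample[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    tail = (1 - level) / 2 * 100
    return percentile(estimates, tail), percentile(estimates, 100 - tail)


# Toy data (hypothetical): 200 simulated ratios, estimator = sample mean.
toy_rng = random.Random(1)
toy = [toy_rng.gauss(0.1, 0.05) for _ in range(200)]
ci_low, ci_high = percentile_bootstrap_ci(toy, statistics.fmean)
```

Under this reading, a coefficient is called significant in the tables when its interval `(ci_low, ci_high)` does not contain zero.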

Table 5.

Estimation of the standard PH cure model for the failure event.

	Incidence			Latency
	Estimate	CI (2.5%)	CI (97.5%)	Estimate	CI (2.5%)	CI (97.5%)
(Intercept)	13.2371	6.7206	24.3928
Equity/Total Assets	-83.2866	-128.2985	-67.8963	0.4019	-11.2587	7.3670
Loans/Total Assets	1.3349	-7.2229	8.1335	0.0355	-1.7756	2.3720
Comm. & Ind. Loans/Total Assets	0.6952	-7.3178	11.7915	3.5793	1.0827	5.8822
R.E. Loans/Total Assets	-2.6752	-9.0166	4.2381	-0.4653	-2.1557	0.8007
Other R.E. Owned/Total Assets	5.6762	-16.6208	39.3061	-2.8633	-6.2277	2.3117
Non-Performing Assets/Total Assets	23.6749	9.9602	42.2651	-2.5441	-5.9148	1.0223
A.L.L.L./Total Assets	104.3656	29.0667	266.4567	-17.8761	-34.0970	-4.7322
Non-Interest Expense/Net Income	-0.5238	-3.8930	0.2343	0.0041	-0.4345	0.1810
Return on Assets	-16.7772	-70.5251	26.4951	-9.3797	-17.2589	-3.1909
Liquid Assets/Total Assets	-14.1367	-30.1739	-5.4039	-4.9489	-8.2922	-2.2093
Core Deposits/Total Assets	-9.2155	-16.4423	-6.3852	0.0973	-1.1641	1.1762
Size	-0.6620	-1.3227	-0.3727	-0.0490	-0.1989	0.0770
Age	-0.0069	-0.0255	0.0055	-0.0091	-0.0135	-0.0056
Yearly GDP growth	2.5940	-31.7313	32.4773	1.1406	-4.6109	4.7758
Yearly unemployment rate	117.5567	91.4808	243.9680	-2.1231	-15.3646	7.4978

CI, 95% percentile bootstrap Confidence Intervals; Comm. & Ind., Commercial & Industrial; R.E., Real Estate; A.L.L.L., Allowance for Loan and Lease Losses.

Table 7.

Estimation of the standard PH cure model for the acquisition event.

	Incidence			Latency
	Estimate	CI (2.5%)	CI (97.5%)	Estimate	CI (2.5%)	CI (97.5%)
(Intercept)	-4.4940	-7.9121	-1.1952
Equity/Total Assets	-7.5642	-14.8987	-2.8047	-3.5429	-7.2694	-1.5897
Loans/Total Assets	-1.3482	-3.2489	0.9686	1.9988	0.7817	3.0755
Comm. & Ind. Loans/Total Assets	2.7143	-0.3594	4.9387	-1.5962	-3.2946	0.4547
R.E. Loans/Total Assets	-2.0230	-4.0345	-0.3956	-2.1648	-3.2740	-1.1494
Other R.E. Owned/Total Assets	-26.9500	-66.0321	-6.2319	-10.5601	-31.9443	11.2218
Non-Performing Assets/Total Assets	-19.4244	-38.3295	-5.8755	-29.4574	-41.5729	-22.6910
A.L.L.L./Total Assets	94.7204	59.7733	143.5417	-7.3435	-23.2919	9.7902
Non-Interest Expense/Net Income	1.5724	0.4419	3.0954	-0.1210	-0.7139	0.2866
Return on Assets	2.6202	-22.8591	33.1639	5.3695	0.4401	19.4150
Liquid Assets/Total Assets	-5.4661	-9.2999	-2.6006	-2.2675	-4.2727	-0.2149
Core Deposits/Total Assets	-4.6984	-6.4673	-3.3204	0.2877	-0.7282	1.0820
Size	-0.1763	-0.3452	-0.0326	-0.1742	-0.2864	-0.1067
Age	-0.0173	-0.0232	-0.0121	-0.0070	-0.0108	-0.0034
Yearly GDP growth	67.1184	52.5170	86.5184	14.4795	9.6845	20.9163
Yearly unemployment rate	154.3596	132.1139	187.2850	18.4183	9.5719	30.3097

CI, 95% percentile bootstrap Confidence Intervals; Comm. & Ind., Commercial & Industrial; R.E., Real Estate; A.L.L.L., Allowance for Loan and Lease Losses.

Table 6.

Estimation of the GEV-PH cure model for the failure event.

	Incidence			Latency
	Estimate	CI (2.5%)	CI (97.5%)	Estimate	CI (2.5%)	CI (97.5%)
(tau)	-0.3458	-0.9209	-0.0658
(Intercept)	7.1044	4.2645	14.2541
Equity/Total Assets	-40.7932	-73.5775	-35.4479	0.1878	-10.9666	7.0573
Loans/Total Assets	0.4808	-4.1460	4.1850	0.0932	-1.8306	2.5189
Comm. & Ind. Loans/Total Assets	-0.0424	-4.2831	6.2064	3.5777	1.1742	5.8325
R.E. Loans/Total Assets	-1.2861	-4.8833	2.2772	-0.4867	-2.1406	0.7419
Other R.E. Owned/Total Assets	1.8138	-9.8359	22.0410	-2.7733	-6.1831	2.1980
Non-Performing Assets/Total Assets	13.1907	6.9596	24.3898	-2.5555	-5.8172	0.6601
A.L.L.L./Total Assets	51.5121	18.0169	147.2244	-17.9360	-34.2570	-5.3382
Non-Interest Expense/Net Income	-0.2745	-2.2711	0.2911	0.0019	-0.4169	0.1690
Return on Assets	-4.4681	-36.0963	13.5406	-9.4510	-17.3420	-4.0430
Liquid Assets/Total Assets	-6.1780	-17.5201	-2.8784	-5.0270	-8.3469	-2.3049
Core Deposits/Total Assets	-4.7517	-9.8851	-3.2997	0.1162	-1.0723	1.2227
Size	-0.3395	-0.7662	-0.1863	-0.0475	-0.1982	0.0680
Age	-0.0040	-0.0134	0.0037	-0.0091	-0.0133	-0.0058
Yearly GDP growth	1.2121	-14.2198	16.2972	1.0542	-4.2500	5.3280
Yearly unemployment rate	60.1916	49.4901	131.5650	-2.1768	-15.7688	7.2510

CI, 95% percentile bootstrap Confidence Intervals; Comm. & Ind., Commercial & Industrial; R.E., Real Estate; A.L.L.L., Allowance for Loan and Lease Losses.

Table 8.

Estimation of the GEV-PH cure model for the acquisition event.

	Incidence			Latency
	Estimate	CI (2.5%)	CI (97.5%)	Estimate	CI (2.5%)	CI (97.5%)
(tau)	-0.8179	-1.0229	-0.6255
(Intercept)	-2.2132	-4.4277	-0.1427
Equity/Total Assets	-4.0500	-9.0008	-0.8962	-3.5995	-7.1502	-1.6044
Loans/Total Assets	-1.3486	-2.4048	-0.0081	2.0139	0.8085	3.0389
Comm. & Ind. Loans/Total Assets	2.2871	0.1138	3.7887	-1.6039	-3.2914	0.2174
R.E. Loans/Total Assets	-0.7174	-2.0130	0.1515	-2.1783	-3.2983	-1.1781
Other R.E. Owned/Total Assets	-26.2834	-42.2886	-10.6947	-6.7020	-30.7776	17.7519
Non-Performing Assets/Total Assets	-11.7681	-21.6249	-4.6422	-29.3905	-41.1973	-22.5866
A.L.L.L./Total Assets	54.3963	37.7617	84.6731	-7.6828	-23.0064	8.7366
Non-Interest Expense/Net Income	0.7933	0.2332	1.6238	-0.1086	-0.7498	0.3294
Return on Assets	-0.3691	-13.8218	12.7540	5.3824	0.2228	18.4384
Liquid Assets/Total Assets	-3.0124	-5.8247	-1.6884	-2.3405	-4.4465	-0.3284
Core Deposits/Total Assets	-3.1569	-4.3945	-2.2643	0.2802	-0.7754	1.1327
Size	-0.1334	-0.2653	-0.0314	-0.1712	-0.2826	-0.1103
Age	-0.0096	-0.0155	-0.0060	-0.0071	-0.0107	-0.0034
Yearly GDP growth	48.2447	34.4664	65.4884	14.0224	9.4546	21.0184
Yearly unemployment rate	90.3818	81.0646	117.9610	18.1558	10.3668	29.9891

CI, 95% percentile bootstrap Confidence Intervals; Comm. & Ind., Commercial & Industrial; R.E., Real Estate; A.L.L.L., Allowance for Loan and Lease Losses.

Overall, the signs and significance levels of the estimated coefficients are consistent across the two models. Holding other factors constant, a positive (resp., negative) sign of an incidence coefficient indicates that an increase in the relevant variable is associated with an increase (resp., decrease) in the probability of being susceptible to the event of interest. A positive (resp., negative) sign of a latency coefficient indicates that an increase in the relevant variable is associated with an increase (resp., decrease) in the hazard and, consequently, with a shorter (resp., longer) survival time for the event of interest. Hereafter, we describe the impacts of the statistically significant variables on the two events (failure and acquisition).
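To make the reading of magnitudes more concrete, the following sketch converts two estimates from Table 5 into multiplicative effects. It assumes the usual proportional-hazards form for the latency, $h(t \mid x) = h_0(t) \exp(\beta^\top x)$, and a logit link for the incidence of the standard model, so that $\exp(\beta \, \Delta x)$ is a hazard ratio and an odds ratio, respectively. The helper name `multiplicative_effect` is hypothetical, and the odds-ratio reading does not carry over to the GEV incidence, whose link is not logistic.

```python
import math


def multiplicative_effect(beta, delta_x):
    """exp(beta * delta_x): a hazard ratio under a proportional-hazards
    latency, or an odds ratio under a logistic incidence (assumed forms)."""
    return math.exp(beta * delta_x)


# Latency coefficient of Age for failure (Table 5): -0.0091 per year.
# Effect on the failure hazard of a bank being ten years older.
age_hazard_ratio = multiplicative_effect(-0.0091, 10)

# Incidence coefficient of Size for failure (Table 5, logistic): -0.6620.
# Effect on the odds of being susceptible of one extra unit of log total assets.
size_odds_ratio = multiplicative_effect(-0.6620, 1)

print(f"hazard ratio, +10 years of age: {age_hazard_ratio:.3f}")
print(f"odds ratio, +1 unit of size:    {size_odds_ratio:.3f}")
```

Values below one correspond to the negative signs discussed above: a lower hazard (longer survival) for older susceptible banks and lower odds of being susceptible to failure for larger banks.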

4.1.1. Failure

In Tables 5 and 6, we provide the estimation results for the failure event. In line with the results of Wheelock and Wilson [29], we find that banks with lower capital buffers (Equity to Total Assets), higher amounts of non-performing assets (Non-Performing Assets to Total Assets), less liquid assets (Liquid Assets to Total Assets), a less stable source of funding (Core Deposits to Total Assets) and a smaller size are more susceptible to failure. Not surprisingly, we find a positive relationship between the Allowance for Loan and Lease Losses to Total Assets ratio and the likelihood of being susceptible to failure, reflecting the higher credit risk of a bank's loan portfolio. Moreover, as a consequence of worse economic conditions, we observe a positive relationship between the susceptibility to failure and the yearly unemployment rate. Finally, we find that among the banks classified as susceptible to failure, the ones with a lower Allowance for Loan and Lease Losses to Total Assets ratio, a lower Liquid Assets to Total Assets ratio, a higher Commercial and Industrial Loans to Total Assets ratio, a lower Return on Assets and a younger age exhibit a higher failure hazard and, consequently, shorter survival times.

4.1.2. Acquisition

In Tables 7 and 8, we provide the estimation results for the acquisition event. The Equity to Total Assets ratio has a negative relationship with the probability of being susceptible to acquisition. This finding is consistent with previous studies and several explanations based on past performance, inefficiency and danger of failure have already been proposed [20].

The Other Real Estate Owned Assets to Total Assets ratio and the Non-Performing Assets to Total Assets ratio have negative coefficients, as in Wheelock and Wilson [29]. Unsurprisingly, the banks with more assets classified as defaulted or in non-accrual status (when principal/interest payments are late or missing) are less likely to be susceptible to acquisition. By contrast, the Allowance for Loan and Lease Losses to Total Assets ratio has a positive relationship with the probability of being susceptible to acquisition. Unlike non-performing assets, which are assets effectively in trouble, the allowance is an estimate of the credit risk of a bank's loan portfolio, a general reserve available to absorb expected losses.

We find a positive relationship between a bank's inefficiency and the likelihood of being susceptible to acquisition, since the Non-Interest Expenses to Net Income ratio has a positive coefficient. This is consistent with the idea of Hannan and Pilloff [20] that the expected gain from an acquisition is greater when the target bank has existing management with poorer performance. However, it is in contrast with Wheelock and Wilson [29], who observe a significant negative relationship with cost inefficiency and a non-significant relationship with two measures of technical inefficiency.

The banks with less liquid assets and a less stable source of funding are more susceptible to acquisition, since the Liquid Assets and Core Deposits to Total Assets ratios have negative coefficients. In line with previous literature [20,29], we also find that older and bigger banks are less likely to be acquired.

Regarding the macroeconomic variables, both the unemployment rate and GDP growth have positive coefficients. The economic interpretation is not obvious. Banks in states with weaker economies (in terms of unemployment), but with a higher growth rate, are more likely to be susceptible to acquisition. In contrast, in a previous study of the European Union during the period 1997–2004, Hernando et al. [21] find that acquisitions are more likely during cyclical downturns. A possible explanation is that our sample covers a long period, almost 30 years, containing different economic cycles with different characteristics. Half of the acquisitions are concentrated in the period 1993–2000, which is characterized by stronger economic growth, whereas in the following years the GDP growth rate in the U.S. exhibits a downward trend until 2018. Indeed, Table 4 shows that the average GDP growth rate is below the overall average for the failed banks, which are mostly observed after 2000, and above it for the acquired ones.

The aforementioned results suggest that troubled banks with lower capitalization, a higher credit risk of the loans portfolio, a higher inefficiency and lower liquidity are more susceptible to acquisition than healthy banks.

Finally, among the banks classified as susceptible to acquisition, we find that the ones with lower capital buffers, higher amounts of loans, less liquid assets, lower amounts of non-performing assets, higher levels of earnings, younger age and a smaller size, are more at risk of acquisition and exhibit shorter survival times. In addition, the banks incorporated in States with a higher unemployment rate and a higher increase in the Gross Domestic Product with respect to the previous year are more at risk of acquisition and exhibit shorter survival times.

4.2. Model validation

In this section, we validate the performance of the proposed competing risks GEV-PH cure model and compare it with that of the standard competing risks PH cure model, where the probability of being susceptible is modeled by logistic regression. We measure the model's performance in terms of its ability to discriminate between banks susceptible to failure and/or acquisition and banks immune to these events. For this purpose, we employ a leave-one-out cross-validation procedure. For $i = 1, \dots, n$, we fit the model on all observations except $i$, which is used to compute the out-of-sample probability of being susceptible to failure, $\tilde{\pi}_{1}(x_{i})$, and to acquisition, $\tilde{\pi}_{2}(x_{i})$. Given the independence assumption (conditional on covariates), we compute the probability for the following classes:

susceptible to both failure and acquisition: $\tilde{\pi}_{1}(x_{i}) \, \tilde{\pi}_{2}(x_{i})$;
susceptible to failure only: $\tilde{\pi}_{1}(x_{i}) \, [1 - \tilde{\pi}_{2}(x_{i})]$;
susceptible to acquisition only: $[1 - \tilde{\pi}_{1}(x_{i})] \, \tilde{\pi}_{2}(x_{i})$;
immune to both events: $[1 - \tilde{\pi}_{1}(x_{i})] \, [1 - \tilde{\pi}_{2}(x_{i})]$.
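The four class probabilities listed above can be computed directly from the two out-of-sample susceptibility probabilities. The sketch below also assigns each bank to its most probable class; this assignment rule is an assumption made for illustration (the function names are hypothetical), although it agrees with thresholding each of the two probabilities at 0.5.

```python
def class_probabilities(pi_fail, pi_acq):
    """Joint class probabilities under conditional independence of the
    two susceptibility indicators given the covariates."""
    return {
        "both": pi_fail * pi_acq,
        "failure_only": pi_fail * (1 - pi_acq),
        "acquisition_only": (1 - pi_fail) * pi_acq,
        "immune": (1 - pi_fail) * (1 - pi_acq),
    }


def predicted_class(pi_fail, pi_acq):
    """Assumed rule: pick the class with the largest joint probability."""
    probs = class_probabilities(pi_fail, pi_acq)
    return max(probs, key=probs.get)


# Hypothetical out-of-sample probabilities for a single bank.
example = class_probabilities(0.9, 0.4)
label = predicted_class(0.9, 0.4)
```

With these illustrative inputs the largest product is $0.9 \times (1 - 0.4) = 0.54$, so the bank would be labeled as susceptible to failure only; raising the acquisition probability above 0.5 would move it to the class susceptible to both events.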

On the basis of the out-of-sample classifications and the reference (true) classes, we construct a confusion matrix to assess the model's predictive performance. Notice, however, that we do not know whether the censored banks are susceptible to failure and/or acquisition. We can only assume that the follow-up is long enough to consider them as immune.

In Table 9, we provide the confusion matrices of the out-of-sample classification results for both the standard PH cure (LOGIT) and the proposed GEV-PH cure model. In both models, very few banks are classified as susceptible to acquisition (resp., failure) when they actually experienced failure (resp., acquisition), and most of the censored banks are classified as immune to both events. However, we observe that some of the banks that have actually been acquired are wrongly classified as immune. If we look at the average values of their covariates (see Table 10), we notice that these banks exhibit lower ratios of other real estate owned assets, lower ratios of allowances for loan and lease losses, lower efficiency ratios, greater ratios of core deposits, greater sizes, longer times since establishment, lower state GDP growth rates and lower state unemployment rates compared to the other acquired banks that have been classified as susceptible to failure and/or acquisition. As we discussed in Section 4.1, these are all characteristics of healthier banks. Thus, we can conclude that the competing risks PH cure model is capable of discriminating between acquisitions of banks in good condition and acquisitions of troubled banks. Finally, we notice a difference between the standard (LOGIT) and the GEV-PH cure models. In the standard model, most of the failed banks are classified as susceptible to both failure and acquisition, whereas in the GEV-PH cure model they are mostly classified as susceptible to failure only. This is due to the fact that, for these banks, the probability of acquisition $\tilde{\pi}_{2}(x_{i})$ is lower than 0.5 under the GEV-PH cure model. We believe that this result is a consequence of the negative τ coefficient, which is significantly lower than zero (see Table 8). As we explained in Section 2, for negative values of the shape parameter τ, some of the observations belonging to the minority class may be excluded from the log-likelihood during the estimation procedure.

Table 9.

Confusion matrices of the cross-validation results (out-of-sample).

		Predicted	Immune	Failure	Acquisition	Fail. & acqu.
LOGIT	Reference	censored	3782	4	25	5
		failure	12	77	9	196
		acquisition	109	1	148	45
GEV	Reference	censored	3787	5	19	5
		failure	15	186	8	85
		acquisition	116	4	143	40
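As a reading aid for Table 9, the sketch below turns the reported counts into row-wise shares. The counts are copied from the table; the dictionary layout and the helper `share` are hypothetical conveniences, and the summary rates are not statistics reported by the authors.

```python
# Predicted classes, in the column order of Table 9.
COLUMNS = ("immune", "failure", "acquisition", "both")

# Out-of-sample counts from Table 9 (rows are the reference classes).
TABLE_9 = {
    "LOGIT": {
        "censored": (3782, 4, 25, 5),
        "failure": (12, 77, 9, 196),
        "acquisition": (109, 1, 148, 45),
    },
    "GEV": {
        "censored": (3787, 5, 19, 5),
        "failure": (15, 186, 8, 85),
        "acquisition": (116, 4, 143, 40),
    },
}


def share(row, labels):
    """Fraction of a reference class predicted in any of the given classes."""
    counts = dict(zip(COLUMNS, row))
    return sum(counts[name] for name in labels) / sum(row)


for model, rows in TABLE_9.items():
    fail_any = share(rows["failure"], ("failure", "both"))
    fail_only = share(rows["failure"], ("failure",))
    acq_any = share(rows["acquisition"], ("acquisition", "both"))
    immune = share(rows["censored"], ("immune",))
    print(f"{model}: failed flagged for failure {fail_any:.3f} "
          f"(failure only {fail_only:.3f}), acquired flagged for "
          f"acquisition {acq_any:.3f}, censored immune {immune:.3f}")
```

Both models flag a similar share of failed banks as susceptible to failure; the difference lies mainly in how those banks are split between the "failure only" and "failure and acquisition" classes.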

Table 10.

Average of the covariates of the acquired banks in the groups susceptible and immune to acquisition.

	Acquisition
	Susceptible	Immune
Equity/Total Assets	0.1120	0.1154
Loans/Total Assets	0.6141	0.6043
Comm. & Ind. Loans/Total Assets	0.0964	0.0922
R.E. Loans/Total Assets	0.3361	0.4028
Other R.E. Owned/Total Assets	0.0040	0.0016
Non-performing Assets/Total Assets	0.0122	0.0114
A.L.L.L./Total Assets	0.0135	0.0097
Non-Interest Expense/Net Income	0.7536	0.6954
Return on Assets	0.0093	0.0091
Liquid Assets/Total Assets	0.0746	0.0756
Core Deposits/Total Assets	0.6455	0.7289
Size	12.3726	12.5512
Age	47.9461	75.1196
Yearly GDP growth	0.0630	0.0398
Yearly unemployment rate	0.0593	0.0455

Comm. & Ind., Commercial & Industrial; R.E., Real Estate; A.L.L.L., Allowance for Loan and Lease Losses.

Similar results (see Table 11) are obtained using the same cross-validation procedure, but fitting the model using covariates observed two years before failure, acquisition or censoring and then calculating the out-of-sample probabilities using covariates observed with a time lag of one year only. Note that in this case, we had to remove 56 banks from the initial sample due to missing data.

Table 11.

Confusion matrices of the cross-validation results (out-of-sample).

		Predicted	Immune	Failure	Acquisition	Fail. & acqu.
LOGIT	Reference	Censored	3126	11	636	37
		Failure	13	52	9	192
		Acquisition	72	3	109	97
GEV	Reference	Censored	3409	18	364	19
		Failure	11	156	6	93
		Acquisition	93	9	99	80

When we fit the model we use covariates observed two years before the failure, acquisition or censoring time.

5. Conclusion

In this article, we studied the determinants of commercial bank failures and acquisitions in the United States during the period 1993–2018. We used a competing risks proportional-hazards cure model in order to measure the impact of bank-specific and macroeconomic variables on the probabilities of being susceptible (i.e. incidence) to these events and on the survival time of susceptible banks (i.e. latency). Given the rarity of failure and acquisition events, instead of using the classical logistic regression model to model the incidence distribution, we proposed the adoption of Generalized Extreme Value (GEV) regression, a more flexible model. We found that banks with a lower capitalization rate, higher level of non-performing assets, higher credit risk of the loan portfolio, lower liquidity, less stable sources of funding and smaller size are more susceptible to failure. Banks with similar characteristics, but a lower level of troubled assets, higher inefficiency and younger age, are more susceptible to acquisition. Controlling for the macroeconomic conditions, we also found that banks incorporated in states with weaker economies, measured by the unemployment rate, are more susceptible to both failure and acquisition. At the same time, the probability of being susceptible to acquisition is higher in states with a higher GDP growth rate, presumably due to the fact that most of the acquisitions took place after the Savings and Loan crisis, a period of stronger economic growth.

Using a leave-one-out cross-validation procedure, we validated the performance of the proposed methodology in terms of discriminatory power between banks susceptible to failure and/or acquisition and banks immune to these events. The proposed methodology performs reasonably well. It classifies the banks that actually failed well, especially when GEV regression is used to model the incidence distribution, but it is less accurate in classifying the banks that were actually acquired. However, a brief analysis of their covariates showed that the acquired banks classified as immune are stronger and in better condition than the other acquired banks. In a way, our model is capable of discriminating between acquisitions under negative and positive circumstances, treating the healthier banks as immune.

Notes

¹ https://www.ffiec.gov/npw/FinancialReport/DataDownload.

² From yearly balance sheet and income statement data, publicly available on the FDIC's website, under Statistics on Depository Institutions (SDI), https://www5.fdic.gov/sdi/.

³ https://www.bls.gov/lau/.

⁴ https://www.bea.gov/data/gdp/gdp-state.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

1. Akhigbe A., Madura J. and Whyte A.M., Partial anticipation and the gains to bank merger targets, J. Financ. Serv. Res. 26 (2004), pp. 55–71.
2. Altman E.I., Cizel J. and Rijken H.A., Anatomy of bank distress: The information content of accounting fundamentals within and across countries, 2014. Available at https://ssrn.com/abstract=2504926.
3. Amico M. and Van Keilegom I., Cure models in survival analysis, Annu. Rev. Stat. Appl. 5 (2018), pp. 311–342.
4. Audrino F., Kostrov A. and Ortega J.P., Predicting US bank failures with MIDAS logit models, J. Financ. Quant. Anal. 54 (2019), pp. 2575–2603.
5. Balla E., Mazur L.C., Prescott E.S. and Walter J.R., A comparison of community bank failures and FDIC losses in the 1986–1992 and 2007–2013 banking crises, J. Bank. Finance 106 (2019), pp. 1–15.
6. Beretta A. and Heuchenne C., Variable selection in proportional hazards cure model with time-varying covariates, application to US bank failures, J. Appl. Stat. 46 (2019), pp. 1529–1549.
7. Betz F., Oprică S., Peltonen T.A. and Sarlin P., Predicting distress in European banks, J. Bank. Finance 45 (2014), pp. 225–241.
8. Caiazza S., Clare A. and Pozzolo A.F., What do bank acquirers want? Evidence from worldwide bank M&A targets, J. Bank. Finance 36 (2012), pp. 2641–2659.
9. Calabrese R. and Osmetti S., Modelling small and medium enterprise loan defaults as rare events: The generalized extreme value regression model, J. Appl. Stat. 40 (2013), pp. 1172–1188.
10. Cole R. and Gunther J., Separating the likelihood and timing of bank failure, J. Bank. Finance 19 (1995), pp. 1073–1089.
11. Cole R.A. and White L.J., Déjà vu all over again: The causes of US commercial bank failures this time around, J. Financ. Serv. Res. 42 (2012), pp. 5–29.
12. Correa R., Cross-border bank acquisitions: Is there a performance effect?, J. Financ. Serv. Res. 36 (2009), p. 169.
13. Cox D., Regression models and life-tables, J. R. Stat. Soc. Ser. B (Methodol.) 34 (1972), pp. 187–220.
14. Czado C. and Santner T.J., The effect of link misspecification on binary regression inference, J. Stat. Plan. Inference 33 (1992), pp. 213–231.
15. Demirgüç-Kunt A., Deposit-institution failures: A review of empirical literature, Econ. Rev. 25 (1989), pp. 2–19.
16. Dempster A., Laird N. and Rubin D., Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. Ser. B (Methodol.) 39 (1977), pp. 1–38.
17. DeYoung R., Evanoff D.D. and Molyneux P., Mergers and acquisitions of financial institutions: A review of the post-2000 literature, J. Financ. Serv. Res. 36 (2009), pp. 87–110.
18. Efron B., The Jackknife, the Bootstrap and other resampling plans, in CBMS-NSF Regional Conference Series in Applied Mathematics, Ron Rozier, ed., Society for Industrial and Applied Mathematics, 1982.
19. Focarelli D., Panetta F. and Salleo C., Why do banks merge?, J. Money Credit Bank. 34 (2002), pp. 1047–1066.
20. Hannan T.H. and Pilloff S.J., Acquisition targets and motives in the banking industry, J. Money Credit Bank. 41 (2009), pp. 1167–1187.
21. Hernando I., Nieto M.J. and Wall L.D., Determinants of domestic and cross-border bank acquisitions in the European Union, J. Bank. Finance 33 (2009), pp. 1022–1032.
22. King G. and Zeng L., Logistic regression in rare events data, Polit. Anal. 9 (2001), pp. 137–163.
23. Lane W., Looney S. and Wansley J., An application of the Cox proportional hazards model to bank failure, J. Bank. Finance 10 (1986), pp. 511–531.
24. Molina C.A., Predicting bank failures using a hazard model: The Venezuelan banking crisis, Emerg. Mark. Rev. 3 (2002), pp. 31–50.
25. Sy J. and Taylor J., Estimation in a Cox proportional hazards cure model, Biometrics 56 (2000), pp. 227–236.
26. Thomson J., Predicting bank failures in the 1980s, Econ. Rev. 27 (1991), pp. 9–20.
27. Wang X. and Dey D., Generalized extreme value regression for binary response data: An application to B2B electronic payments system adoption, Ann. Appl. Stat. 4 (2010), pp. 2000–2023.
28. Whalen G., A proportional hazards model of bank failure: An examination of its usefulness as an early warning tool, Econ. Rev. 27 (1991), pp. 21–30.
29. Wheelock D. and Wilson P., Why do banks disappear? The determinants of US bank failures and acquisitions, Rev. Econ. Stat. 82 (2000), pp. 127–138.
