Research in International Business and Finance. 2023 Feb 16;64:101907. doi: 10.1016/j.ribaf.2023.101907

Forecasting for regulatory credit loss derived from the COVID-19 pandemic: A machine learning approach

Marta Ramos González, Antonio Partal Ureña, Pilar Gómez Fernández-Aguado
PMCID: PMC9933877  PMID: 36814639

Abstract

The economic onslaught of the COVID-19 pandemic has compromised the risk management of financial institutions. The consequences of such an unprecedented situation are difficult to foresee with certainty using traditional methods. The regulatory credit loss attached to defaulted mortgages, the so-called expected loss best estimate (ELBE), is forecasted using a machine learning technique. Two ELBE projections for 2022 are presented and compared: one accounts for the outbreak's impact, and the other presumes the nonexistence of the pandemic. The comparison indicates that the crisis adversely affects these high-risk portfolios. The proposed method has excellent performance and may serve to estimate future expected and unexpected losses amidst any event of extraordinary magnitude.

Keywords: Machine learning, COVID-19, Internal-rating-based, Credit risk, Defaulted exposures


1. Introduction

According to the World Bank (2020), the 2020 economic recession, which resulted from the measures adopted to mitigate the spread of COVID-19, produced the fastest, steepest decline in consensus growth forecasts among all global downturns since 1990. The European Commission (2020) published analyses in alignment with this statement while focusing on the EU economy, observing substantial differences across countries and industries. Several articles were recently published analysing the impact of the disease outbreak in various economic fields, such as business model shifts (Seetharaman, 2020), exchange rate shocks (Narayan, 2021), firm performance (Hu and Zhang, 2021), income distribution (O’Donoghue et al., 2020) and stock market (Sharif et al., 2020), among others. As published later by the World Bank (2021a), the aforementioned expectations were met, with some sharp rebounds observed during 2021 in major economies, particularly in the US, although emerging markets and developing economies are generally lagging. Nonetheless, forecasting the consequences of such an unprecedented situation (IMF, 2020b) has proven challenging. Ioannidis et al. (2022) found significant failures while reviewing certain COVID-19 epidemic predictions.

Recent studies published by the Organisation for Economic Co-operation and Development (OECD, 2021) projected an increase in the non-performing loan ratio across all world regions as a consequence of the coronavirus crisis. Learning from experience, European organisations and governments exerted serious efforts to counteract the forecasted economic effect of the strong decrease in activity registered during 2020 due to the lockdown procedures implemented to address the rapid spread of COVID-19 worldwide. In fact, the economic impact of such lockdowns has been widely reported in several studies, such as those presented by Ke and Hsiao (2022), Markeviciute et al. (2022) and Pedauga et al. (2022). Certain exceptional measures were implemented in an attempt to alleviate the aforementioned growth of defaulted loans, such as the mortgage moratoria proposed by the European Banking Authority, hereinafter EBA (2020a), in its guidelines.

Going back in time, the financial crisis that originated in 2008 was also followed by a considerable increase in the volume of non-performing assets of the affected financial institutions (Karadima and Louri, 2021). This demonstrated the need for appropriate measurement of the associated credit risk. Therefore, the regulatory requirements relevant to this purpose have evolved since then, particularly those addressing the capital requirements derived from the defaulted exposures’ credit risk. As referred to in Basel II (Basel Committee on Banking Supervision, 2006), the defaulted exposures risk category entails high capital consumption. To calculate the regulatory capital, institutions can choose from the standardised approach, the foundation internal-ratings-based (IRB) approach and the advanced IRB approach (hereinafter, advanced). Adopting the IRB method requires the application of certain parameters and is subject to supervisory authorities' approval. The advanced approach involves the loss given default (LGD), the probability of default (PD) and the exposure at default (EAD) for non-defaulted exposures, while the expected loss best estimate (ELBE) and the LGD in-default are used for defaulted exposures. The Capital Requirements Regulation (CRR, European Parliament and Council, 2013) includes general instructions for estimating the referred parameters for defaulted exposures (see footnote 2) and delegates the drafting of specific guidelines to the EBA.

Nevertheless, the Regulatory Technical Standards on assessment methodology for the IRB approach (EBA, 2016) did not provide sufficient detail in the area of the risk estimation of defaulted exposures. Later, the guidelines on PD estimation, LGD estimation and the treatment of defaulted exposures (EBA, 2017) were published, including a more comprehensive description of the estimation of both ELBE and LGD in-default. Thereafter, the ELBE is defined as the regulatory credit risk estimate for defaulted exposures aimed at anticipating the relevant amount of expected loss. This IRB parameter is used to meet regulatory requirements but can also be used for the internal management of banking entities.

The methodological framework of artificial intelligence (AI) is called machine learning (ML). There are different applications of ML techniques available in the academic literature for prediction purposes. Ahmed and Hammami (2022) compiled one decade of studies, including finance-related applications of AI and ML methods in areas such as bankruptcy prediction, stock price prediction, portfolio management, oil price prediction, anti-money laundering, behavioural finance, big data analytics and blockchain. Recently, Al-Maadid et al. (2022) used ML approaches to assess the role of COVID-19 news in stock return predictability. Moscatelli et al. (2020) and Barbaglia et al. (2021) collected studies with ML predictions of credit default. In addition, Alonso Robisco and Carbó Martínez (2022a) assessed the economic impact for financial institutions of using ML models to predict credit default for purposes of regulatory capital savings.

Moreover, the ML application on credit ratings and scoring estimations has been the subject of recent investigations, such as those performed by Machado and Karray (2022), Yu et al. (2020), Li et al. (2020) and Yu et al. (2021). Bellotti et al. (2021) forecast the recovery rates of non-performing loans via ML techniques. Only recently published articles have included proposals on regulatory capital (García-Céspedes and Moreno, 2022) and LGD (Bastos and Matos, 2022) prediction using ML methods. In turn, ML techniques have been increasingly used in the financial industry in recent years, as highlighted by Huck (2019). Alonso Robisco and Carbó Martínez (2022b) emphasised that credit institutions use ML for credit risk purposes, from the calculation of regulatory capital to credit scoring or estimation of provisions. In addition, an EBA (2020c) report pointed out that approximately 10% of European banking institutions already use ML for regulatory capital calculations. Therefore, there is a gap between the academic literature and banking practices in the use of ML methodologies for this purpose since the latter have evolved much faster. In fact, the EBA (2021) launched a discussion paper on ML for IRB models, to collect detailed information about its current and intended use in financial institutions and the related challenges identified.

As historical data related to past pandemics are not recorded in banking databases, the use of traditional estimation methods has been quite challenging since 2020. This article forecasts the ELBE of defaulted mortgages while considering the impact of the coronavirus crisis through an ML algorithm. The methodology of Ramos González et al. (2021), applied to data from different portfolios and geographies, serves as a starting point since it is aimed at being aligned with the previously referred regulatory standards. Then, two ELBE parameters are forecasted for 2022, with and without considering the impact of the COVID-19 pandemic outbreak. For this purpose, an ML technique formed by a multilayer feedforward network, the so-called deep neural network (DNN), and backpropagation learning based on a batch gradient descent algorithm is used. In addition, the observed Spanish unemployment rate (UR) and its projections issued by the Bank of Spain (2020) serve as input together with certain historical data attached to the mortgage portfolio and entity idiosyncrasies.

Therefore, the approach presented in this article constitutes an innovative proposal across the academic literature that could be adapted to any credit institution for the prediction of IRB-related estimates. In addition, it could be adjusted to achieve high performance. In fact, the presented mortgage ELBE forecasting evidently has excellent performance for the three Spanish institutions considered in the study. As a result, a significant impact of the COVID-19 pandemic is observed for the defaulted portfolios of these entities. The impact also depends on each entity’s characteristics, as reflected in the outcome. Thus, ML techniques are proven convenient for estimating the future impact of events of uncertain magnitude.

The paper is structured as follows. Section 2 describes the proposed methodology. In particular, Subsection 2.1 focuses on the dataset description, while the learning technique and the related data processing are presented in Subsections 2.2 and 2.3, respectively. Section 3 then describes the process followed to define the DNN architecture (3.1), the subsequent fine-tuning to achieve optimal performance per institution (3.2) and, finally, the use of the described architecture for ELBE forecasting (3.3). Section 4 covers the main conclusions of the presented proposal regarding both the outcome of the complete exercise and the implemented ML methodology.

2. Materials and methods

2.1. Dataset

The input sample contains information about mortgages from the historical datasets of the three largest credit institutions in Spain, as indicated in Table 1. The corresponding variables are related to delinquency status and default, deemed key drivers of credit risk modelling, as suggested by Kelly and O’Malley (2016). Ampudia et al. (2016) highlighted the role of the UR in explaining mortgage arrears. Therefore, the input dataset also contains the UR observed in the Spanish economy across the time period considered in Table 1, which is publicly available (World Bank, 2021b). The combination of the referred variables captures the entity and portfolio characteristics as well as the economic context into which they are embedded. Table 2 compiles all these variables distributed among the three types. The input dataset is built at the level of entity and year of default.

Table 1.

Time range of historical data available per entity that is considered for the study.

| Entity | Time range | Number of years |
|---|---|---|
| 1 | 2005–2014 | 10 |
| 2 | 2005–2015 | 11 |
| 3 | 2004–2014 | 11 |

Table 2.

Definition and categorisation of the seventeen model input variables. These are reported for each year of the corresponding time range considered. The institution-specific variable risk level is an entity ranking built on the restructured and transfer in lieu of payment statistics per entity. The economic cycle indicators are two linear combinations of the restructured and transfer in lieu of payment statistics and the first UR variable. They are built piecewise, with 2009 as the inflexion point because, according to the entities' data, it is the most adverse year in terms of the percentage of restructures and transfers in lieu of payment.

| Category | Variable | Definition |
|---|---|---|
| Year-specific | X1 | Unemployment rate (−1 year) |
| Year-specific | X2 | Unemployment rate (−2 years) |
| Year-specific | X3 | Unemployment rate (−3 years) |
| Year-specific | X4 | Unemployment rate (−4 years) |
| Year-specific | X5 | Unemployment rate (−5 years) |
| Year-specific | X6 | Unemployment rate (−6 years) |
| Year-specific | X7 | Unemployment rate (−7 years) |
| Year-specific | X8 | Linear combination of unemployment rates |
| Year-specific | X9 | Year of default |
| Year-specific | X10 | Year of observation − Year of default |
| Institution-specific | X11 | Risk level |
| Institution-specific | X12 | Economic cycle indicator [1] |
| Institution-specific | X13 | Economic cycle indicator [2] |
| Year and institution-specific | X14 | Percentage of transfer in lieu of payment and repossession |
| Year and institution-specific | X15 | Percentage of restructured |
| Year and institution-specific | X16 | Percentage of other types of termination |
| Year and institution-specific | X17 | Linear combination of types of termination |

Then, the proposed variables are preprocessed to reduce the complexity of the calculation and enhance the accuracy of the DNN. In particular, all the variables proposed that are still not expressed in percentages are recalculated in terms of the share of the respective total amount.

The target output contains the expected loss best estimate (ELBE) estimated on the reference dates after defaulting, as presented in Table 3 and following the methodology proposed by Ramos González et al. (2021) indicated below.

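$$\mathrm{ELBE}_{t_k} = 1 - \frac{\sum_{\tilde{t}_i > t_k} r_{\tilde{t}_i}}{\mathrm{EAD} - \sum_{\tilde{t}_i > t_k} r_{\tilde{t}_i}} \tag{1}$$

where

$T > 0$ and $[0, T] \subset \mathbb{R}$ is an interval of default time;

$0 < t_0 < t_1 < \dots < t_k < \dots < t_n < T$ is a partition such that $t_0$ is the moment of default;

$0 < \tilde{t}_0 < \tilde{t}_1 < \dots < \tilde{t}_k < \dots < \tilde{t}_m < T$ is another partition such that $t_0 < \tilde{t}_0$ and $\tilde{t}_m < t_n$; and

$r_{\tilde{t}_0}, r_{\tilde{t}_1}, \dots, r_{\tilde{t}_k}, \dots, r_{\tilde{t}_m}$ are the debt recoveries recorded throughout the defaulted period.

Table 3.

Reference dates of the seventeen model output variables, which correspond to the ELBE forecasted over time.

| Variable | Defaulted time |
|---|---|
| ELBE1 | 1 month |
| ELBE2 | 2 months |
| ELBE3 | 2 quarters |
| ELBE4 | 3 quarters |
| ELBE5 | 4 quarters |
| ELBE6 | 5 quarters |
| ELBE7 | 6 quarters |
| ELBE8 | 7 quarters |
| ELBE9 | 8 quarters |
| ELBE10 | 3 years |
| ELBE11 | 4 years |
| ELBE12 | 5 years |
| ELBE13 | 6 years |
| ELBE14 | 7 years |
| ELBE15 | 8 years |
| ELBE16 | 9 years |
| ELBE17 | More than 10 years |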

Each ELBE is computed per entity and year of default so that the corresponding recoveries are considered in its calculation.

Finally, each input and output dataset contains 544 data points since the 17 input and output variables are reported for 32 years of default for the three entities, as indicated in Table 1. Then, an algorithm randomly splits each dataset into training (82%) and testing (18%) datasets, as indicated in Table 4. Since the sample size available in the financial institutions is limited because robust historical data are available only across the last two decades, there is no room for further data collection. For this reason, an additional split of each dataset to obtain a validation sample is not possible. Nonetheless, the data considered are sufficient to achieve a high-performing technique, as explained in the following sections.
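The dataset size quoted above can be checked with a short calculation. The following is a minimal sketch in Python (the study itself was implemented in R); the variable names are illustrative and the year counts are those reported in Table 1.

```python
# Years of default available per entity, as reported in Table 1.
years_per_entity = {1: 10, 2: 11, 3: 11}

# Number of input (or output) variables reported per entity and year of default.
n_variables = 17

total_years = sum(years_per_entity.values())
data_points = n_variables * total_years

print(f"years of default: {total_years}, data points per dataset: {data_points}")
```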

Table 4.

Dataset split into training and testing samples per entity. The samples are used for the purpose of training the ML algorithm and for testing its accuracy.

| Entity | Years of default | Dataset |
|---|---|---|
| 1 | 2005–2010, 2012–2014 | Training |
| 1 | 2011 | Testing |
| 2 | 2005–2008, 2010–2011, 2013–2015 | Training |
| 2 | 2009, 2012 | Testing |
| 3 | 2005–2009, 2011–2014 | Training |
| 3 | 2004, 2010 | Testing |

2.2. Learning technique

The method used to build the DNN is supervised learning with backpropagation performed by a batch gradient descent algorithm. The vanishing gradient problem, observed in DNN algorithms, has been widely studied (Karabayir et al., 2021). In the present article, this problem is addressed using a rectified linear unit (ReLU). Furthermore, the parametric ReLU (PReLU), an enhanced ReLU activation function studied to solve problems in different areas (Macedo et al., 2019), further improves the accuracy. Taking that into account, the DNN introduced in the present article is built through the combination of two activation functions: sigmoid and PReLU (or ReLU). Aiming for the most accurate results, different combinations of these activations are tested. In addition, a bias equal to one is added to each layer, considering that Buddhtha et al. (2019) demonstrated a significant improvement in accuracy when including the bias component.

Each architecture proposed is built based on the complete training sample that includes the data from the three institutions. The aim is to define an architecture that permits a good approximation of the predicted ELBE to the target output. As can be observed in Fig. 1, the chosen structure of DNN has five layers and the two aforementioned activation functions.

Fig. 1.


Structure of the proposed DNN. The graph represents the interaction between the elements (matrices) forming the DNN across its five layers.

Once the architecture is defined, the subsequent step is to fine-tune the selected DNN and optimise the accuracy per Spanish institution. For this purpose, the PReLU or ReLU multiplying factor is calibrated to achieve the highest performance.

2.2.1. Feedforward

The methodology followed in the feedforward phase for each layer l is as indicated below:

$$Z^{[l]} = W^{[l]} A^{[l-1]} + B^{[l]} \tag{2}$$

$$A^{[l]} = g^{[l]}\left(Z^{[l]}\right) \tag{3}$$

where

$A^{[l]}$ is the output matrix of layer $l$;

$g^{[l]}$ is the activation function of layer $l$;

$Z^{[l]}$ is the net output matrix of layer $l$;

$W^{[l]}$ is the weight matrix from layer $l-1$ to layer $l$; and

$B^{[l]}$ is the bias matrix from layer $l-1$ to layer $l$.

Then, the output activation is:

$$\hat{Y} = A^{[L]} \tag{4}$$

where L is the last layer.

The output error is computed as the mean squared error (MSE):

$$\mathrm{MSE} = \frac{\sum_{i=0}^{n_L} \sum_{j=0}^{m_L} \left(Y_{ij} - \hat{Y}_{ij}\right)^2}{n_L + m_L} \tag{5}$$

where

$Y$ is the target output matrix;

$n_L$ is the number of rows in the $Y$ matrix; and

$m_L$ is the number of columns in the $Y$ matrix.

According to the selected architecture, there are 5 layers (L = 4), and the activation functions are PReLU or ReLU, depending on the value of the multiplier factor α (see footnote 3), and sigmoid. Both are applied elementwise as defined below:

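$$A_{ij}^{[l]} = g^{[l]}\left(Z_{ij}^{[l]}\right) = \begin{cases} \alpha Z_{ij}^{[l]}, & \text{if } Z_{ij}^{[l]} < 0 \\ Z_{ij}^{[l]}, & \text{if } Z_{ij}^{[l]} \geq 0 \end{cases} \quad \text{for } l = 1, 2 \text{ and } \alpha \in [0, 1) \tag{6}$$

$$A_{ij}^{[l]} = g^{[l]}\left(Z_{ij}^{[l]}\right) = \frac{1}{1 + e^{-Z_{ij}^{[l]}}} \quad \text{for } l = 3, 4 \tag{7}$$

where

$i$ is the row of the referred matrix; and

$j$ is the column of the referred matrix.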
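The feedforward pass of Eqs. (2), (3), (6) and (7) can be illustrated with a minimal NumPy sketch. It is written in Python for illustration only, since the study was implemented in R. The hidden-layer widths, the random input data, the column-per-observation layout and the reading of the bias as a matrix initialised at one are assumptions made for this example. The multiplier 0.2 and the standard deviation 0.02 are the values quoted later in Section 3.1.

```python
import numpy as np

def prelu(z, alpha):
    # Eq. (6): alpha * z where z < 0, z otherwise (alpha = 0 reduces to ReLU).
    return np.where(z < 0, alpha * z, z)

def sigmoid(z):
    # Eq. (7): elementwise logistic function.
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, weights, biases, alpha):
    """Return net outputs Z and activations A for layers 0..4 (A[0] is the input)."""
    activations = [x]
    nets = [None]  # no net output for the input layer
    for l, (w, b) in enumerate(zip(weights, biases), start=1):
        z = w @ activations[-1] + b                      # Eq. (2)
        a = prelu(z, alpha) if l <= 2 else sigmoid(z)    # Eq. (3) with (6) or (7)
        nets.append(z)
        activations.append(a)
    return nets, activations

rng = np.random.default_rng(0)
sizes = [17, 12, 12, 12, 17]  # 17 inputs and 17 outputs; hidden widths are assumed
weights = [rng.normal(0.0, 0.02, size=(n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.ones((n_out, 1)) for n_out in sizes[1:]]   # bias initialised at one (assumed)

x = rng.random((17, 5))  # five made-up observations, one per column
_, activations = feedforward(x, weights, biases, alpha=0.2)
y_hat = activations[-1]  # Eq. (4): output of the last layer
print(y_hat.shape)
```

Because the last two layers use the sigmoid, every predicted value lies strictly between 0 and 1, which suits an output expressed as a loss percentage.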

2.2.2. Backpropagation

The backpropagation algorithm is applied to update the weights every iteration so that they converge to a solution that allows for the outcome of the DNN to accurately fit the target output. The bias is analogously updated.

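$$\tilde{W}^{[l]} = \tilde{W}^{[l-1]} + d\tilde{W}^{[l]} \tag{8}$$

$$d\tilde{W}^{[l]} = \left( dW^{[l]} \,\middle|\, dB^{[l]} \right) \tag{9}$$

$$dW^{[l]} = \eta \, \Delta^{[l]} A^{[l-1]} \tag{10}$$

$$dB^{[l]} = \eta \, \Delta^{[l]} \tag{11}$$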

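Considering that $\circ$ is an elementwise (or Hadamard) product, studied by Neudecker et al. (1995), the components below are built:

$$\Delta^{[L]} = \left(Y - \hat{Y}\right) \circ \left(1 - \hat{Y}\right) \circ \hat{Y} \tag{12}$$

$$g'\left(Z^{[l]}\right) = \left(1 - A^{[l]}\right) \circ A^{[l]}, \quad \text{when } l = L-1, L-2 \tag{13}$$

$$g'\left(Z_{ij}^{[l]}\right) = \begin{cases} \alpha, & \text{if } Z_{ij}^{[l]} < 0 \\ 1, & \text{if } Z_{ij}^{[l]} \geq 0 \end{cases}, \quad \text{when } l = L-3, L-4 \tag{14}$$

where

$i$ is the row of the referred matrix; and

$j$ is the column of the referred matrix.

Eventually, the matrix $\Delta^{[l]}$ is built backwards as follows:

$$\Delta^{[l-1]} = W^{[l-1]} \Delta^{[l]} \circ g'\left(Z^{[l-1]}\right), \quad \text{when } l = 1, \dots, 4 \tag{15}$$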
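The update rules above can be sketched as a small batch gradient descent loop. This is an illustrative Python sketch, not the authors' R implementation. It assumes the usual matrix conventions that the compact notation leaves implicit (transposes in the weight update and in the backward step, and gradients averaged over the batch). The hidden widths, the synthetic data and the function name `train` are hypothetical, and the loss is monitored as a plain mean over all entries.

```python
import numpy as np

def prelu(z, alpha):
    return np.where(z < 0, alpha * z, z)

def prelu_grad(z, alpha):
    # Derivative of PReLU: alpha where z < 0, 1 otherwise.
    return np.where(z < 0, alpha, 1.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(x, y, sizes, alpha=0.2, eta=0.7, iterations=500, seed=0):
    rng = np.random.default_rng(seed)
    n_layers = len(sizes) - 1  # four weight matrices for five layers
    W = [rng.normal(0.0, 0.02, (o, i)) for i, o in zip(sizes[:-1], sizes[1:])]
    B = [np.ones((o, 1)) for o in sizes[1:]]  # bias initialised at one (assumed)
    n = x.shape[1]                            # observations are stored as columns
    losses = []
    for _ in range(iterations):
        # Feedforward: PReLU on the first two layers, sigmoid on the last two.
        A, Z = [x], []
        for l in range(n_layers):
            z = W[l] @ A[-1] + B[l]
            Z.append(z)
            A.append(prelu(z, alpha) if l < 2 else sigmoid(z))
        y_hat = A[-1]
        losses.append(float(np.mean((y - y_hat) ** 2)))
        # Backpropagation: output delta, then deltas built backwards.
        delta = (y - y_hat) * (1.0 - y_hat) * y_hat
        for l in reversed(range(n_layers)):
            dW = eta * delta @ A[l].T / n                  # batch-averaged weight step
            dB = eta * delta.mean(axis=1, keepdims=True)   # batch-averaged bias step
            if l > 0:
                # Derivative of the activation that produced A[l].
                grad = prelu_grad(Z[l - 1], alpha) if l - 1 < 2 else (1.0 - A[l]) * A[l]
                delta = (W[l].T @ delta) * grad            # uses weights before the update
            W[l] += dW
            B[l] += dB
    return W, B, losses

rng = np.random.default_rng(1)
x = rng.random((17, 27))   # synthetic inputs, not the banks' data
y = 0.2 + 0.3 * x          # synthetic targets in [0.2, 0.5]
W, B, losses = train(x, y, sizes=[17, 8, 8, 8, 17])
print(f"MSE at start: {losses[0]:.4f}, MSE at end: {losses[-1]:.4f}")
```

The weights are added to, rather than subtracted from, the current values because the output delta is defined with the sign of $Y - \hat{Y}$, so each step already points in the direction that reduces the squared error.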

2.3. Data processing

The data are processed using R software to build the datasets containing both input and output data per banking institution and year of default. Then, the DNN feedforward and backpropagation algorithms, as well as the accuracy testing and the prediction of the ELBE, are also implemented in R.

3. Results

3.1. DNN architecture definition

Several tests are performed based on the complete training dataset to determine the optimal DNN structure for predicting the ELBE. The aim is to measure the extent to which the overall outcome fits the target output data, as shown in Fig. 2.

Fig. 2.


Target ELBE per banking institution and year of default. These curves serve as a reference to define the optimal DNN structure for ELBE forecasting.

Table 5 presents the proposed DNN structures for which the accuracy of ELBE1 and ELBE17 is assessed. These two variables, representing the initial level of loss and the adequate convergence to 100% after a long default period, respectively, are essential for shaping the overall ELBE trend.

Table 5.

Description of proposed architectures. The presented combinations of activation functions, applied between the five layers of the DNN, are tested in terms of accuracy.

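| Architecture | 1st activation function | 2nd activation function | 3rd activation function | 4th activation function |
|---|---|---|---|---|
| A | Sigmoid | Sigmoid | Sigmoid | Sigmoid |
| B | PReLU/ReLU | PReLU/ReLU | PReLU/ReLU | PReLU/ReLU |
| C | PReLU/ReLU | Sigmoid | Sigmoid | Sigmoid |
| D | PReLU/ReLU | PReLU/ReLU | Sigmoid | Sigmoid |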

For this purpose, the learning rate is set to 0.7 since it allows for an appropriate learning curve drawn across 500 iterations (Fig. 3). The weights are randomly initialised in each layer from a zero-mean Gaussian distribution with a standard deviation of 0.02, similar to the proposal from Mishkin et al. (2017). In addition, a multiplier factor of 0.2 is chosen for the PReLU (or ReLU) activation function at this stage because a different value would not modify the eventual selection of an adequate architecture for the DNN. The MSE serves to measure the goodness of fit related to each proposed architecture, and Table 6 presents the outcome.

Fig. 3.


Accuracy trend of mean squared error and R-squared across 500 iterations. It follows the pattern typically observed while determining the optimal DNN structure.

Table 6.

Mean Squared Error computed for the initial and ending points of the ELBE curve obtained per DNN architecture.

| Architecture | MSE of ELBE1 | MSE of ELBE17 |
|---|---|---|
| A | 3.54% | 0.01% |
| B | 3.49% | 1.18% |
| C | 3.10% | 0.10% |
| D | 2.90% | 0.00% |

The conclusions can also be visualised in Fig. 4, which shows that the nearest outcome to the target ELBE (Fig. 2) is achieved with architecture D, as also confirmed by the resulting MSEs (Table 6).
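The selection can be reproduced from the figures in Table 6. The snippet below is a small illustrative check in Python; the dictionary layout and variable names are hypothetical, and the values are the MSEs reported in the table.

```python
# MSE (in %) of ELBE1 and ELBE17 per architecture, as reported in Table 6.
mse = {
    "A": (3.54, 0.01),
    "B": (3.49, 1.18),
    "C": (3.10, 0.10),
    "D": (2.90, 0.00),
}

# Architecture with the lowest error at each end of the ELBE curve.
best_elbe1 = min(mse, key=lambda name: mse[name][0])
best_elbe17 = min(mse, key=lambda name: mse[name][1])

print(best_elbe1, best_elbe17)
```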

Fig. 4.


ELBE of entity 1 resulting from testing the four types of DNN architectures.

3.2. Architecture fine-tuning per institution

The fine-tuning phase is performed by searching for the multiplying factor of the PReLU (or ReLU) activation function that leads to the best performance for each institution after 500 iterations. This multiplying factor (α) serves as input for Eq. 6. Once the accuracy per factor is measured over the training sample per entity, the resulting weights are applied to the respective testing sample. Then, the corresponding accuracy is also measured. For this purpose, the following measures are considered:

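  • Correlation coefficient (CC)

$$\mathrm{CC} = \frac{E\left[(X - \mu_X)(Y - \mu_Y)\right]}{\sigma_X \sigma_Y} \tag{16}$$

    where

    $\mu_X$ and $\mu_Y$ are the means of variables X and Y;

    $\sigma_X$ is the standard deviation of variable X; and

    $\sigma_Y$ is the standard deviation of variable Y.

  • R-squared ($R^2$)

$$R^2 = \frac{\sigma_{XY}^2}{\sigma_X^2 \sigma_Y^2} \tag{17}$$

    where

    $\sigma_{XY}$ is the covariance of (X, Y);

    $\sigma_X^2$ is the variance of variable X; and

    $\sigma_Y^2$ is the variance of variable Y.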
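Both accuracy measures can be computed directly from their definitions. The following Python sketch is illustrative only: the function names and the sample values are made up, and population (not sample) moments are used throughout so that the two formulas stay consistent with each other.

```python
import numpy as np

def correlation_coefficient(x, y):
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    # Covariance as the expectation of the product of deviations from the means.
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())

def r_squared(x, y):
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov ** 2 / (x.var() * y.var())

# Made-up target and predicted ELBE values, for illustration only.
target = [0.30, 0.55, 0.80, 0.95]
predicted = [0.28, 0.60, 0.78, 0.97]

cc = correlation_coefficient(target, predicted)
r2 = r_squared(target, predicted)
print(f"CC = {cc:.4f}, R2 = {r2:.4f}")
```

Under these definitions, R-squared equals the square of the correlation coefficient, which is in line with the two measures yielding similar results.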

Fig. 5 presents the resulting goodness of fit. As might be noted, the accuracy depends on the multiplying factor selected and varies from entity to entity. In addition, the correlation coefficient and the R-squared yield similar results. As expected, the results of the testing sample are more volatile due to the smaller number of data points. Then, based on the training sample results, the best multiplying factor is selected (Table 7). Since the optimal factor is always greater than zero, the ReLU activation is ultimately not used, which shows that the PReLU improves the accuracy. Therefore, the PReLU and sigmoid functions are used to build the DNN.

Fig. 5.


Accuracy results obtained by PReLU/ReLU multiplying factor and per entity.

Table 7.

Optimal PReLU multiplying factor per entity.

| Entity | α |
|---|---|
| 1 | 0.5 |
| 2 | 0.9 |
| 3 | 0.7 |

3.3. ELBE prediction

3.3.1. First steps

The impact of COVID-19 has been significant in Spain, one of the three countries, together with China and Italy, where outbreaks occurred first. Consequently, severe confinement measures were implemented at the very beginning of the pandemic, from 15 March until 21 June 2020, resulting in a halt in economic activity. In this context, the UR prospects published by the Bank of Spain (BoS, 2020) and the International Monetary Fund (IMF, 2020a) were worse than both the 2018 outcome (15.25%) and the prediction issued as of December 2019 (Table 8).

Table 8.

Spanish unemployment rates forecast as issued by the BoS and the IMF.

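| Date of publication | Scenario | 2020 (BoS) | 2020 (IMF) | 2021 (BoS) | 2021 (IMF) |
|---|---|---|---|---|---|
| December 2019 | Baseline | 13.70% | | 13.20% | |
| April 2020 | Scenario 1 | 18.30% | 20.80% | 17.50% | 17.50% |
| April 2020 | Scenario 2 | 20.60% | | 19.10% | |
| April 2020 | Scenario 3 | 21.70% | | 19.90% | |

The IMF figures correspond to a single April 2020 forecast that is not broken down by BoS scenario.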

The BoS’s scenario 3 actually occurred, since it corresponds to more than 12 weeks in confinement without a return to normality before the end of 2020, particularly for the hospitality and entertainment businesses. Scenario 1 assumes a briefer confinement period and nonpersistent financial disturbances, while scenario 2 is similar to scenario 1 but is characterised by a continued financial shock. The forecast issued by the IMF (2020a) is on par with scenarios 2 and 1 for 2020 and 2021, respectively.

A relevant factor came into play at the beginning of the Spanish lockdown: the ERTE, a temporary employment regulation measure. A vast number of enterprises benefited from this measure implemented by the Spanish government, even when they had no economic activity at all for a period of time. Although the actual UR for 2020 is 15.67%, which is far from the aforementioned BoS and IMF forecasts (see footnote 4), the millions of employees under ERTE conditions were not considered in its calculation. Therefore, the actual 2020 and 2021 UR figures are considered biased for the present research. In fact, a UR of 23.22% is obtained when the calculation accounts for the maximum number of employees (see footnote 5) observed under ERTE conditions on a monthly basis during 2020. Based on the previous reasoning, the BoS’s scenario 3 forecast serves as input for the proposed DNN.

3.3.2. Main outcome

A twofold approach is followed to predict the 2022 ELBE of the mortgage defaulted portfolio for the three Spanish entities in scope. First, an ELBE is estimated on the basis of the baseline projection issued as of December 2019 (pre-COVID-19). Then, a second ELBE is estimated based on the scenario 3 prediction (post-COVID-19).

3.3.2.1. Pre-COVID-19 ELBE

To forecast the 2022 ELBE without reflecting the consequences of the disease outbreak, the input variables presented in Table 2 are considered. First, for the URs of the seven years preceding 2022, actual figures are retrieved from the aforementioned publication, with the exception of the two most recent rates, which correspond to BoS’s baseline projection for 2020 and 2021.

As indicated in Table 9 (see footnote 6), there is generally a high correlation between the observed URs and the remaining input variables, which are neither URs nor a linear combination of them. This high correlation is a natural consequence because the portfolio risk characteristics captured by said drivers depend on the economic conditions, as the UR does.

Table 9.

Correlation between unemployment rate and other ML model input variables per entity.

| Variable | Correlation with UR (R-squared), Entity 1 | Entity 2 | Entity 3 |
|---|---|---|---|
| Risk level | 72.35% | 77.72% | 42.21% |
| Economic cycle indicator [1] | 96.79% | 95.64% | 96.56% |
| Economic cycle indicator [2] | 89.78% | 88.02% | 90.48% |
| Percentage of restructured | 62.49% | 71.90% | 60.43% |
| Percentage of other types of termination | 73.80% | 72.58% | 73.17% |
| Year of default | 91.06% | 74.23% | 84.77% |

It is noted that both the 2020 and 2021 baseline projections of the UR issued by the BoS (Table 8) and the last actual UR available in 2019 are rather close to the arithmetic average of the URs from 2008 and 2009, which is 14.56% (Fig. 6). Therefore, considering the aforementioned high correlation, the variables presented in Table 9 are calculated for the 2022 prediction as the arithmetic average of the 2008 and 2009 actual figures for those variables.

Fig. 6.


Actual time series of yearly unemployment rates in Spain. The data points from 2008, 2009 and 2019 are highlighted for comparative purposes.

3.3.2.2. Post-COVID-19 ELBE

To allow the DNN to forecast the 2022 ELBE impacted by the crisis derived from the pandemic circumstances, the input variables from Table 2 are similarly reported. For this purpose, the BoS’s scenario 3 URs of 2020 and 2021 are considered, as well as the observed URs until then. As evidenced, the magnitude of the economic impact derived from the coronavirus crisis could be fairly compared with the recent global financial crisis, which began in 2008. Hence, since 2022 is the reference year for forecasting, two years after the outbreak of the disease, the data observed in 2010 are considered to feed the remaining input variables, that is, two years after the financial crisis struck.

3.3.2.3. Comparison

The already described best-performing DNN per entity serves as the basis for pre-COVID-19 and post-COVID-19 ELBE projections. A comparison from a qualitative perspective between the two ELBEs is performed in Fig. 7, observing entity-specific differences. Considering the portfolio distribution of operations by time in default per institution, the number-weighted average of each ELBE estimate and the outcome are computed (Table 10) to perform a quantitative analysis.

Fig. 7.


Comparison of both ELBEs forecasted for 2022 per entity.

Table 10.

Relative increase of ELBE due to the COVID-19 pandemic per entity.

| Entity | Pre-COVID-19 ELBE | Post-COVID-19 ELBE | ELBE growth |
|---|---|---|---|
| 1 | 80.57% | 85.61% | 6.59% |
| 2 | 87.14% | 87.59% | 0.56% |
| 3 | 85.22% | 89.48% | 5.41% |

Although the pre-COVID-19 ELBE is already between 80% and 90%, an increase in the post-COVID-19 ELBE of the mortgage defaulted exposures is still observed. The ELBE growth, presented in absolute terms, shows that the impact of the coronavirus crisis will be notable even in such a high-risk portfolio. In addition, the differences in impact are notable, at approximately 600 basis points among the three entities, which indicates that the forecast effectively considers the specificities of each banking institution.

4. Conclusions

The overall impact of the COVID-19 pandemic crisis remains to be determined. The global measures applied by the national governments, with the aim of controlling the health emergency, brought economic activity to a sudden halt, leading to a further increase in unemployment, as anticipated by certain organisations, and to adverse impacts on the asset quality of banks, as stated in the EBA thematic note (2020b).

This article proposes projections for the regulatory parameter used for expected loss estimation of defaulted exposures (ELBE) based on ML algorithms, which is a novel approach because there is no literature on this particular subject. The implementation of the proposed technique, based on data from the three largest credit entities in Spain, is proven to have excellent performance, even amid an uncertain economic outlook derived from the unprecedented COVID-19 crisis. A combination of the sigmoid and the PReLU activation functions, to build the DNN, turns out to be the optimal choice in terms of accuracy.

The presented methodology could be adapted for any other prospective estimation. Hence, the use of the proposed methodology could serve credit institutions in estimating the provisions and capital consumption needed to face the challenges attached to exceptional circumstances, leading to a more robust banking system that is sufficiently prepared to meet any future threat. Considering the potential future events of uncertain impact on the credit risk of financial institutions worldwide, the resulting prospects could serve to fine-tune banks’ strategies in the medium term.

The datasets available in credit institutions with regard to defaulted exposures are commonly limited in size, as in the data sample used in this article. Despite the scarcity of data, it is still feasible to achieve very good performance as well as meaningful results. High accuracy may be achieved for the proposed ML methodology following the approach presented throughout the article: first, the best DNN structure in terms of goodness of fit is chosen, and then, the multiplying factor is fine-tuned per institution to achieve optimal performance.

A key takeaway is that the uncertainty of the economic outlook derived from the outbreak can be partially mitigated by using AI techniques trained on both historical data and forecasts of relevant macroeconomic variables. Limitations in the use of ML for regulatory estimates may arise from the underlying methodology, because its complexity can make transparent documentation cumbersome to produce. On these grounds, the regulatory supervision of ML-based models appears challenging but not impossible. Future research could focus on implementing the presented methodology using data from other countries and portfolios, or on forecasting other IRB estimates. Undoubtedly, an eventual comparison of the pandemic’s impact on the banking portfolios’ credit risk among geographies merits research interest.

Funding

This research did not receive any specific grants from funding agencies in the public, commercial, or not-for-profit sectors.

CRediT authorship contribution statement

Marta Ramos González: Conceptualization, Methodology, Software, Investigation, Resources, Data curation, Writing – original draft, Visualization. Antonio Partal Ureña: Formal Analysis, Investigation, Resources, Writing – review & editing, Supervision, Project Administration. Pilar Gómez Fernández-Aguado: Validation, Investigation, Resources, Writing – review & editing, Supervision, Project Administration.

Footnotes

To be considered for the special issue of Research in International Business and Finance entitled “Artificial Intelligence and Machine Learning in Finance”.

2. Pursuant to article 181(1)(h) of Regulation (EU) No. 575/2013, institutions shall estimate the ELBE and LGD in-default to account for the expected and unexpected credit risk losses, respectively, of the defaulted exposures.

3. If α is equal to zero, the activation function is ReLU; otherwise, it is PReLU.

4. Subsequently, the IMF (2020b) issued an updated UR prediction: 16.8% for both 2020 and 2021.

5. 3,576,192, according to Spanish Social Security (2021).

6. The table excludes the variables from Table 2 that are linear combinations of other variables because there is no need to assess the correlation for them.

Data Availability

The data that has been used is confidential.

References

  1. Ahmed S., Hammami H. Artificial intelligence and machine learning in finance: a bibliometric review. Res. Int. Bus. Financ. 2022;61. doi: 10.1016/j.ribaf.2022.101646.
  2. Al-Maadid A., Alhazbi S., Al-Thelaya K. Using machine learning to analyze the impact of coronavirus pandemic news on the stock markets in GCC countries. Res. Int. Bus. Financ. 2022;61. doi: 10.1016/j.ribaf.2022.101667.
  3. Alonso Robisco A., Carbó Martínez J.M. Can machine learning models save capital for banks? Evidence from Spanish credit portfolio. Int. Rev. Financ. Anal. 2022;84. doi: 10.1016/j.irfa.2022.102372.
  4. Alonso Robisco A., Carbó Martínez J.M. Measuring the model risk-adjusted performance of machine learning algorithms in credit default prediction. Financ. Innov. 2022;8:70. doi: 10.1186/s40854-022-00366-1.
  5. Ampudia M., van Vlokhoven H., Zochowski D. Financial fragility of euro area households. J. Financ. Stab. 2016;27:250–262. doi: 10.1016/j.jfs.2016.02.003.
  6. Bank of Spain (2020). Escenarios macroeconómicos de referencia para la economía española tras el COVID-19. Economic Bulletin 2/2020. https://www.bde.es/f/webbde/GAP/Secciones/SalaPrensa/COVID-19/be2002-art1.pdf.
  7. Barbaglia L., Manzan S., Tosetti E. Forecasting loan default in Europe with machine learning. J. Financ. Econ. 2021:nbab010. doi: 10.1093/jjfinec/nbab010.
  8. Basel Committee on Banking Supervision, 2006. Basel II: International convergence of capital measurement and capital standards: A Revised framework – comprehensive version. https://www.bis.org/publ/bcbs128.pdf.
  9. Bastos J.A., Matos S.M. Explainable models of credit losses. Eur. J. Oper. Res. 2022;301(1):386–394. doi: 10.1016/j.ejor.2021.11.009.
  10. Bellotti A., Brigo D., Gambetti P., Vrins F. Forecasting recovery rates on non-performing loans with machine learning. Int. J. Forecast. 2021;37(1):428–444. doi: 10.1016/j.ijforecast.2020.06.009.
  11. Buddhtha S., Natasha C., Irwansyah E., Budiharto W. Building an artificial neural network with backpropagation algorithm to determine teacher engagement based on the Indonesian teacher engagement index and presenting the data in a web-based GIS. Int. J. Comput. Intell. Syst. 2019;12(2):1575–1584. doi: 10.2991/ijcis.d.191101.003.
  12. European Banking Authority, 2016. Final draft regulatory technical standards on assessment methodology for IRB approach. EBA/RTS/2016/03. https://www.eba.europa.eu/sites/default/documents/files/documents/10180/1525916/e8373cbc-cc4b-4dd9-83b5-93c9657a39f0/Final%20Draft%20RTS%20on%20Assessment%20Methodology%20for%20IRB.pdf?retry=1.
  13. European Banking Authority, 2017. Guidelines on PD estimation, LGD estimation and treatment of defaulted assets. EBA/GL/2017/16. https://www.eba.europa.eu/sites/default/documents/files/documents/10180/2033363/6b062012-45d6-4655-af04-801d26493ed0/Guidelines%20on%20PD%20and%20LGD%20estimation%20%28EBA-GL-2017-16%29.pdf?retry=1.
  14. European Banking Authority, 2020a. Guidelines on legislative and non-legislative moratoria on loan repayments applied in the light of the COVID-19 crisis. EBA/GL/2020/02. https://www.eba.europa.eu/sites/default/documents/files/document_library/Publications/Guidelines/2020/GL%20amending%20EBA-GL-2020-02%20on%20payment%20moratoria/960349/Final%20report%20on%20EBA-GL-2020-02%20Guidelines%20on%20payment%20moratoria%20-%20consolidated%20version.pdf.
  15. European Banking Authority, 2020b. The EU banking sector: First insights into the COVID-19 impacts. Thematic Note. EBA/REP/2020/17. https://www.eba.europa.eu/sites/default/documents/files/document_library/Risk%20Analysis%20and%20Data/Risk%20Assessment%20Reports/2020/Thematic%20notes/883986/Thematic%20note%20-%20Preliminary%20analysis%20of%20impact%20of%20COVID-19%20on%20EU%20banks%20%E2%80%93%20May%202020.pdf.
  16. European Banking Authority, 2020c. EBA report on big data and advanced analytics. EBA/REP/2020/01. https://www.eba.europa.eu/sites/default/documents/files/document_library/Final%20Report%20on%20Big%20Dat%20and%20Advanced%20Analytics.pdf.
  17. European Banking Authority, 2021. EBA discussion paper on machine learning for IRB models. EBA/DP/2021/04. https://www.eba.europa.eu/sites/default/documents/files/document_library/Publications/Discussions/2022/Discussion%20on%20machine%20learning%20for%20IRB%20models/1023883/Discussion%20paper%20on%20machine%20learning%20for%20IRB%20models.pdf.
  18. European Commission, 2020, July. European economic forecast. Summer 2020 (Interim). Institutional Paper 132. https://doi.org/10.2765/828014.
  19. European Parliament and Council, 2013. Capital Requirements Regulation (CRR) Corrigendum to Regulation (EU) No 575/2013. https://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2013:321:0006:0342:EN:PDF.
  20. García-Céspedes R., Moreno M. The generalized Vasicek credit risk model: a machine learning approach. Financ. Res. Lett. 2022;47(A). doi: 10.1016/j.frl.2021.102669.
  21. Hu S., Zhang Y. COVID-19 pandemic and firm performance: cross-country evidence. Int. Rev. Econ. Financ. 2021;74:365–372. doi: 10.1016/j.iref.2021.03.016.
  22. Huck N. Large data sets and machine learning: applications to statistical arbitrage. Eur. J. Oper. Res. 2019;278(1):330–342. doi: 10.1016/j.ejor.2019.04.013.
  23. International Monetary Fund, 2020a. World Economic Outlook, April 2020: The great lockdown. https://www.imf.org/en/Publications/WEO/Issues/2020/04/14/weo-april-2020.
  24. International Monetary Fund, 2020b. World Economic Outlook, June 2020: A crisis like no other, an uncertain recovery. https://www.imf.org/en/Publications/WEO/Issues/2020/06/24/WEOUpdateJune2020.
  25. Ioannidis J.P.A., Cripps S., Tanner M.A. Forecasting for COVID-19 has failed. Int. J. Forecast. 2022;38(2):423–438. doi: 10.1016/j.ijforecast.2020.08.004.
  26. Karabayir I., Akbilgic O., Tas N. A novel learning algorithm to optimize deep neural networks: evolved gradient direction optimizer (EVGO). IEEE Trans. Neural Netw. Learn. Syst. 2021;32(2):685–694. doi: 10.1109/TNNLS.2020.2979121.
  27. Karadima M., Louri H. Economic policy uncertainty and non-performing loans: the moderating role of bank concentration. Financ. Res. Lett. 2021;38. doi: 10.1016/j.frl.2020.101458.
  28. Ke X., Hsiao C. Economic impact of the most drastic lockdown during COVID-19 pandemic - the experience of Hubei, China. J. Appl. Econ. 2022;37(1):187–209. doi: 10.1002/jae.2871.
  29. Kelly R., O’Malley T. The good, the bad and the impaired: a credit risk model of the Irish mortgage market. J. Financ. Stab. 2016;22:1–9. doi: 10.1016/j.jfs.2015.09.005.
  30. Li J.-P., Mirza N., Rahat B., Xiong D. Machine learning and credit ratings prediction in the age of fourth industrial revolution. Technol. Forecast. Soc. Change. 2020;161. doi: 10.1016/j.techfore.2020.120309.
  31. Macedo D., Zanchettin C., Oliveira A.L.I., Ludermir T. Enhancing batch normalized convolutional networks using displaced rectifier linear units: a systematic comparative study. Expert Syst. Appl. 2019;124:271–281. doi: 10.1016/j.eswa.2019.01.066.
  32. Machado M.R., Karray S. Assessing credit risk of commercial customers using hybrid machine learning algorithms. Expert Syst. Appl. 2022;200. doi: 10.1016/j.eswa.2022.116889.
  33. Markeviciute J., Bernataviciene J., Levuliene R., Medvedev V., Venskus J. Impact of COVID-19-related lockdown measures on economic and social outcomes in Lithuania. Mathematics. 2022;10(15):2734. doi: 10.3390/math10152734.
  34. Mishkin D., Sergievskiy N., Matas J. Systematic evaluation of convolution neural network advances on the Imagenet. Comput. Vis. Image Underst. 2017;161:11–19. doi: 10.1016/j.cviu.2017.05.007.
  35. Moscatelli M., Parlapiano F., Narizzano S., Viggiano G. Corporate default forecasting with machine learning. Expert Syst. Appl. 2020;16. doi: 10.1016/j.eswa.2020.113567.
  36. Narayan P.K. Understanding exchange rate shocks during COVID-19. Financ. Res. Lett. 2021;45. doi: 10.1016/j.frl.2021.102181.
  37. Neudecker H., Polasek W., Liu S. The heteroskedastic linear regression model and the Hadamard product a note. J. Econ. 1995;68(2):361–366. doi: 10.1016/0304-4076(94)01655-J.
  38. O’Donoghue C., Sologon D.M., Kyzyma I., McHale J. Modelling the distributional impact of the COVID-19 crisis. Fisc. Stud. 2020;41(2):321–336. doi: 10.1111/1475-5890.12231. https://ftp.iza.org/dp13235.pdf.
  39. Organisation for Economic Co-operation and Development, 2021. The COVID-19 crisis and banking system resilience: Simulation of losses on non-performing loans and policy implications. https://www.oecd.org/daf/fin/financial-markets/COVID-19-crisis-and-banking-system-resilience.pdf.
  40. Pedauga L., Sáez F., Delgado-Márquez B.L. Macroeconomic lockdown and SMEs: the impact of the COVID-19 pandemic in Spain. Small Bus. Econ. 2022;58:665–688. doi: 10.1007/s11187-021-00476-7.
  41. Ramos González M., Partal Ureña A., Gómez Fernández-Aguado P. Regulatory estimates for defaulted exposures: a case study of Spanish mortgages. Mathematics. 2021;9(9):997. doi: 10.3390/math9090997.
  42. Seetharaman P. Business models shifts: Impact of COVID-19. Int. J. Inf. Manag. 2020;54. doi: 10.1016/j.ijinfomgt.2020.102173.
  43. Sharif A., Aloui C., Yarovaya L. COVID-19 pandemic, oil prices, stock market, geopolitical risk and policy uncertainty nexus in the US economy: Fresh evidence from the wavelet-based approach. Int. Rev. Financ. Anal. 2020;70. doi: 10.1016/j.irfa.2020.101496.
  44. Spanish Social Security, 2021. Affiliates to COVID-19 ERTE [Data set]. https://www.seg-social.es/wps/portal/wss/internet/EstadisticasPresupuestosEstudios/Estadisticas/EST8/22bfb5ae-8eba-4c44-a258-93a26194e11b.
  45. World Bank, 2020, June. Global Economic Prospects. https://doi.org/10.1596/978-1-4648-1553-9.
  46. World Bank, 2021a, June. Global Economic Prospects. https://doi.org/10.1596/978-1-4648-1665-9.
  47. World Bank, 2021b. Unemployment, total (% of total labor force) (modelled ILO estimate) – Spain [Data set]. https://data.worldbank.org/indicator/SL.UEM.TOTL.ZS?locations=ES.
  48. Yu B., Li C., Mirza N., Umar M. Forecasting credit rating of decarbonized firms: comparative assessment of machine learning models. Technol. Forecast. Soc. Change. 2021;174. doi: 10.1016/j.techfore.2021.121255.
  49. Yu L., Huang X., Yin H. Can machine learning paradigm improve attribute noise problem in credit risk classification? Int. Rev. Econ. Financ. 2020;70:440–455. doi: 10.1016/j.iref.2020.08.016.


Articles from Research in International Business and Finance are provided here courtesy of Elsevier
