Heliyon. 2020 Dec 29;6(12):e05795. doi: 10.1016/j.heliyon.2020.e05795

Determinants of efficiency in state-chartered financial institutions: Why financial education and freedom matter

Emmanuel Sousa de Abreu a,b, Herbert Kimura a
PMCID: PMC7779705  PMID: 33426326

Abstract

In this paper, we identify which qualitative banking attributes can determine the efficiency level of American state-chartered financial institutions (FIs) and evaluate the variables underlying those attributes. The methodology followed three procedures of analysis. First, we measured banking efficiency using a two-stage SBM network data envelopment analysis (NDEA). Subsequently, we used machine learning methods to predict efficient FIs from qualitative attributes. Finally, we tested the variables related to the attributes using a fractional logistic regression controlled by economic-financial variables. As main results, we found that attributes linked to political-administrative localization criteria were the most important in predicting whether an FI was in the efficient group, and we confirmed recent findings in the literature that less governmental influence (freedom) is related to more efficient institutions. In addition, we found that states whose populations have higher financial education have FIs with higher levels of efficiency.

Keywords: Banking efficiency, State-chartered financial institutions, SBM DEA network, Machine learning, Fractional logistic regression



1. Introduction

The literature on banking efficiency has intensely debated the determinants of the efficiency and productivity of different financial institutions (FIs) over the last thirty years. The ongoing research and discussion in this area can be explained by several factors; today, two stand out: the technological change driven by the financial service industry's heavy use of information technology, and the change in market structures, with differentiated factors of competitiveness. In theory, this new industry configuration could indicate new determinants of the institutions' productive process.

Not only has the FIs' productive process changed rapidly, but so have the forms and models used to evaluate their performance, which have undergone considerable advances. Parametric and non-parametric models have evolved with new machine learning (ML) and artificial intelligence (AI) methods that provide new insights into the factors that could significantly influence the financial markets' efficiency at the regional, national or international level.

This change in the nature and competitiveness of the sector brings conflicts between FIs that, although they often provide similar banking services, are subject to attributive factors that affect their respective production processes differently. For example, the type of FI, its size and its geographical location may determine a regulatory, taxation or even technological environment that is more or less beneficial. Thus, the first step in assessing FIs' efficiency is to find out which qualitative attributes are significantly related to banking efficiency and productivity.

However, merely indicating a relationship between qualitative attributes and efficiency does not substantially explain the origin of the different levels of productivity. It is also necessary to know which variables lie behind these attributes. In other words, the fact that an FI is a commercial bank or a credit union does not explain why these institutions have different average efficiency. This hypothetical difference may stem from different issues, such as governance models, regulatory regimes or even fiscal issues.

The answers to these questions are not trivial, but they have the potential to help market decision makers, government agencies and many other stakeholders look for ways and models that are more appropriate and efficient for their environment. To that end, we try to present some answers to current discussions on banking efficiency using a methodology that can be divided into three phases. The first phase aimed to measure the level of efficiency of each FI. The second aimed to establish which attributive criteria were significantly linked to the more or less efficient institutions. Finally, the third phase aimed to test whether theoretical variables linked to the attributes could explain the efficiency levels of institutions.

In the first phase, we measured the efficiency of a sample composed of approximately 4,000 FIs, totaling more than 60,000 observations in a time series (2003-2017). The sample comprises different types of FIs that share the characteristic of local activity (state-chartered). In this phase, we applied a two-stage SBM network data envelopment analysis (NDEA) model with the banking intermediation approach. The results indicated a constant productivity distribution over the time series (2003-2017) and that efficiency scores behave differently from those in studies that assess the large banks that control the market or that include national banks in their sample.

In the second phase, we used the efficiency scores to divide institutions into more and less efficient ones and compared them with their attributive variables. From this classification, we applied a linear discriminant analysis (LDA) and ML methods (SVM, random forest [RF] and Bagging) to verify which attributive variables were able to predict efficient institutions. The results showed that variables linked to political-administrative localization criteria were able to predict whether an FI was in the efficient group.

Last, we applied a fractional logistic regression to test which variables could be behind the fact that FIs located in certain regions had higher efficiency scores on average. In general, the results confirm recent findings in the literature, which state that less governmental influence (freedom) is related to more efficient institutions. In addition, we found that states with a population with higher financial education have FIs with higher levels of efficiency, although the formal education level had no significant effect.

Our methodology is innovative because we use procedures that allow us to evaluate the impact of both quantitative and qualitative variables on banking efficiency. Additionally, we use sample selection criteria that allow us to verify possible influences from different structures of regulation, supervision and even geographical criteria.

This paper is organized as follows. After this introduction, a brief review of the recent related literature is provided in Section 2, followed by a description of the three data sets and methodology adopted in our study in Section 3. The results are shown and discussed in Section 4. Finally, the conclusions drawn from this study are presented in Section 5.

2. Brief literature review

Banking efficiency studies usually explore two main questions. The first asks what the most adequate way to measure the productivity and efficiency of each FI is. This question is far from simple or consensual in the specialized literature, which currently uses a wide range of approaches, methods and tools (de Abreu et al., 2018). The second question concerns the association of measured efficiency with variables that are endogenous or exogenous to FIs. This issue is also quite complex, and the literature shows little consensus about the main determinants of banking efficiency or productivity (de Abreu et al., 2018).

To measure efficiency, structural approaches that construct an efficiency frontier or a set of benchmarks currently dominate the literature (Hughes and Mester, 2012). The frontier, or the benchmarks, represent maximum efficiency for a given amount of inputs (or outputs), meaning that the lower (higher) the amount of inputs (outputs), the better the efficiency (Berger et al., 1995). In short, FIs that achieve higher efficiency, by consuming fewer resources or producing more products, will be closer to the efficiency frontier.

The frontier can be constructed parametrically or non-parametrically. The parametric approach imposes a functional form. That is, it uses a function that determines the resources needed to reach a product. Non-parametric methods do not specify a functional form since they construct the frontier or define the benchmarks directly from the available data (Hughes and Mester, 2012). A second difficulty in the measurement question concerns the definition and nature of the set of inputs and outputs. Studies usually take a position by choosing an approach that represents the study's objective. Currently, the production (Benston, 1965; Wanke et al., 2016) and intermediation (Sealey and Lindley, 1977; Chortareas et al., 2016) approaches are the most commonly used.

It should be clarified that some recent studies have treated deposits as an intermediate product in a two-stage process (Fukuyama and Matousek, 2011; Holod and Lewis, 2011). Deposits are an output of the first stage and an input of the second. The idea is that deposits are acquired using the usual factors of production (capital, labor) and later serve to generate loans, or even profits. Whether such a process is a variant of the intermediation approach or a new approach in its own right is debated. We highlight that, empirically, this evaluation is quite complex because it involves many possible organizational structures. For example, the literature shows that the use of off-balance sheet tools led to biased efficiency estimates for banks (Gurjar et al., 2020).

In relation to efficiency determinants, three issues are generally debated in the literature. The first is whether the incorporation of business models can influence the capacity of institutions (Curi et al., 2015). The second is how unequal regulation and supervision structures, which arise from different countries, regions and rules, impact the institutions' efficiency (Chortareas et al., 2012; Delis et al., 2011). The third is the relationship between different ownership structures and the level of efficiency of FIs. The specific aspects evaluated are highly diverse and usually come from the construction of previous clusters or are already indicated by the data available. It is common to compare FIs with different degrees of diversification, primary activities or differentiated market niches.

On the other hand, studies that link regulation and efficiency usually seek to identify how exposure to different regulatory structures impacts the FIs' efficiency (Delis et al., 2011; Lozano-Vivas and Pasiouras, 2013). It is also common to assess how specific regulatory reforms impact the efficiency of institutions (Casu et al., 2016). There is no consensus in the area: some studies found that policies have a positive impact on bank productivity (Delis et al., 2011), while others suggested that interventionist policies could result in high levels of banking inefficiency (Chortareas et al., 2012). A recent study even finds evidence that corporate social responsibility may have a positive impact on banking efficiency in developed countries (Belasri et al., 2020).

Other studies also demonstrated that financial freedom can positively impact the efficiency of banking institutions (Chortareas et al., 2013, 2016). The intuitive notion is that states with less government interference promote the efficiency of FIs. This freedom encompasses several aspects, such as taxation, governmental participation and level of regulation, among others.

Another determinant currently presented in the literature concerns the effects of different ownership structures on the management and efficiency of FIs, including FIs controlled by minority groups (Kashian et al., 2017; Hasan and Hunter, 1996; Iqbal et al., 1999) or by the State (Guo et al., 2020). Governance aspects would thus play a key role in the performance of FIs, including the addition of some mechanisms of corporate governance (Rashid et al., 2020; Ullah, 2020). Usually, the justifications for this influence are linked to agency problems and to the different contours of the governance structures of these institutions (Kontesa et al., 2020). On the other hand, more specific aspects, such as government incentives and subsidies, may also be tied to different ownership structures.

A research area related to all of the above concerns the possible significant influence of geographical location (or regional environment) on the performance of institutions (Degl'Innocenti et al., 2017; Jiménez-Hernández et al., 2019; Burgstaller, 2020). It seems clear, however, that geographical location itself is a secondary factor that derives from others that more directly influence performance in the production of financial services. Therefore, these studies usually need to use control variables to identify the true determinant of the findings. For example, the difference in efficiency between institutions located in large and small urban centers may be due to the social, economic and technological level, as well as to banking and tax regulations, in those regions.

We call these individual qualitative characteristics of FIs (e.g. location or business type) “qualitative attributes”. In the second phase of this study, we assess the importance of some of the main attributes and then test potentially related explanatory variables.

3. Empirical methodology and data

3.1. Data

Our sample focused only on US state-chartered FIs. This selection is especially interesting because it allows us to verify possible influences from different structures of regulation, supervision and even geographical criteria. Within this group, our research covered all depository and lending institutions. The main objective was to obtain a data set that spans different types of institutions, each delimited by its location and subject to a different form of regulation.

An important aspect is that the United States, mainly due to the degree of autonomy of the states, has different levels of banking regulation: FIs can be national- or state-chartered. Depending on the charter chosen, an FI may operate nationally or only statewide. These different regulatory spheres give state-chartered FIs, specifically, quite different legal, regulatory and supervisory structures because these activities are carried out by different state agencies (“Department of Banking/Financial Institutions”).

For example, a Department of Banking's functions include establishing differentiated rules and carrying out supervision, auditing and monitoring. In addition to this local oversight, most banks that have Federal Deposit Insurance Corporation insurance are also subject to federal monitoring. In addition, state-chartered FIs may choose to belong to the Federal Reserve System, which, in such cases, usually increases the degree of regulation in return for access to larger resources.

This large number of regulatory institutions covering different geographic areas, in addition to the large number of institutions and legal systems, gives the set of American state-chartered FIs an environment significantly different from those usually evaluated in traditional banking efficiency studies, which tend to focus on the major banks that control the markets or on institutions listed in the stock market. Besides that, the regulatory requirements in the banking market have become especially strict for large banks, which are relatively homogeneous due to the international standardization of the Basel Accords.

For example, Chortareas et al. (2016) used a set of data that covers all American commercial banks, including national banks, to evaluate the relationship between banking efficiency and level of freedom. In our opinion, this is not adequate for our purposes because the selection could introduce substantial bias. National banks, including the largest ones, such as Bank of America (RSSD ID 480228) or Citibank (RSSD ID 476810), have clients at the national and international level, and their assets and liabilities are not concentrated in one region. We highlight that the five biggest banks in the United States control about 40% of the deposits.

To evaluate the state-chartered FIs, we use data from three different databases, each related to one of the three phases of the research. The first provides the data on the inputs and outputs required to measure the FIs' efficiency. The second provides information about the characteristics of the institutions, which served as proxies for the evaluation of efficiency determinants. Thus, we combine the financial information with the information regarding the attributes of each bank to evaluate how the level of banking efficiency is linked with determinants present in the banking literature. The last database provides data to test variables linked with the significant attributes found.

For the measurement of efficiency, we use data obtained from the Uniform Bank Performance Report (FFIEC). More specifically, we use the Reports of Condition and Income (Call Reports) and Uniform Bank Performance Reports (UBPRs). This database has been used for efficiency measurement in other studies (Chortareas et al., 2016). The database covers an unbalanced series from 2003 to 2017, allowing a temporal productivity analysis. The decision-making units (DMUs) are the state-chartered FIs that were part of the sample, and the efficiency scores were calculated for each year.

From this base, 6 variables were generated, as described in Table 1. Further information about each metric can be obtained at FFIEC (2017).

Table 1.

Variables for mensuration (first phase of analysis).

| Variables | Description and source |
|---|---|
| Input 1: Labor | Measured by the number of full-time equivalent employees on the payroll at the end of the period. Source: Federal Reserve Board. Call Report, 4 qt. (2 of 2). Code: RIAD4150. |
| Input 2: Capital | The book value of premises and fixed assets. All premises and fixed assets, including capitalized leases, are included. Value expressed in dollars. Source: Federal Reserve Board. UBPR Balance Sheet (page 4). Code: UBPR2145. |
| Intermediate: Deposits | Total deposits. Source: Federal Reserve Board. UBPR Balance Sheet (page 4). Code: UBPR2200. |
| Output 1: Loans | For the sake of coherence with the criteria used by Fukuyama and Matousek (2011), we have used consumer and business loans in the same variable. Source: Federal Reserve Board. UBPR Balance Sheet (page 4). Code: UBPRD665 + UBPRE117 + UBPRE116. |
| Output 2: Securities | Sum of all securities, interest-bearing bank balances, federal funds sold and trading account assets. Source: Federal Reserve Board. UBPR Balance Sheet (page 4). Code: UBPRE122. |

Note: Monetary values expressed in dollars (thousands USD).

We used two variables to measure inputs (capital and labor), an intermediate variable (deposits) and two variables to measure outputs (loans and financial assets). The variables are usually used to measure efficiency based on the literature that uses the intermediation approach in two stages (Fukuyama and Matousek, 2011).

For the sake of comparability between the FIs' activities, we removed from the database the FIs that reported any variable as 0 or not available. That is, it is important that all FIs are actively taking deposits, granting credit and generating financial assets. FIs that reported 0 or unavailable inputs were also eliminated, given the low probability that a regular institution would not have at least one full-time employee and some fixed capital; this removes data with evidence of error.
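The exclusion rule above can be sketched in a few lines. This is a minimal illustration in Python, not the authors' code; the field names (`labor`, `capital`, `deposits`, `loans`, `securities`) and the sample records are hypothetical stand-ins for the Table 1 variables.

```python
# Hypothetical field names standing in for the five Table 1 variables.
REQUIRED_FIELDS = ("labor", "capital", "deposits", "loans", "securities")


def is_usable(record):
    """Return True only if every required variable is reported and non-zero."""
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or value == 0:
            return False
    return True


# Made-up records for illustration (monetary values in thousands USD).
records = [
    {"id": "A", "labor": 12, "capital": 850, "deposits": 40000, "loans": 31000, "securities": 6500},
    {"id": "B", "labor": 0, "capital": 420, "deposits": 15000, "loans": 9000, "securities": 2100},
    {"id": "C", "labor": 8, "capital": 300, "deposits": 22000, "loans": 17000, "securities": None},
]

kept_ids = [r["id"] for r in records if is_usable(r)]
print(kept_ids)  # ['A']
```

Record B is dropped for reporting zero employees and record C for a missing output, which mirrors the two exclusion cases described in the text.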

The second data set was obtained from the National Information Center, a repository of financial information and institutional characteristics collected by the Federal Reserve System. Five attributes were used to evaluate whether they could significantly predict the efficiency levels. For the second phase of analysis, of the 66,855 observations, 49,167 were used for training the models and 17,688 were used for testing them.

To avoid a negative impact on model fitting due to the disparity in the frequencies, we resampled the data using Random Over-Sampling Examples (ROSE), which is based on a bootstrap form of resampling from the data (Menardi and Torelli, 2012). The synthetic balanced samples were generated using the R package “ROSE”.
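To make the idea of balancing by resampling concrete, the sketch below shows plain random oversampling with replacement. It is a deliberate simplification and not the ROSE procedure itself: ROSE additionally generates synthetic observations around the resampled points, which is omitted here. The function name, the `label` key and the toy data are hypothetical, and the sketch assumes the imbalance concerns class frequencies.

```python
import random
from collections import Counter


def random_oversample(rows, label_key, seed=0):
    """Resample smaller classes with replacement until all match the largest class."""
    rng = random.Random(seed)
    groups = {}
    for row in rows:
        groups.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        # Draw the shortfall at random, with replacement, from the same class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


# Made-up, imbalanced toy data: 6 rows of class 0 and 2 rows of class 1.
rows = [{"x": i, "label": 0} for i in range(6)] + [{"x": i, "label": 1} for i in range(2)]
balanced = random_oversample(rows, "label")
counts = Counter(row["label"] for row in balanced)
print(counts[0], counts[1])  # 6 6
```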

Finally, to obtain the variables for the third phase, we had to search for information in different databases, as shown in Table 3. In the regression evaluation, the time series was limited to the period up to 2015 due to the unavailability of data. The data cover information at the subnational level because, as will be seen in Section 4, the most important attributes were linked to the political-administrative location of the FIs.

Table 3.

Variables for regression (third phase of analysis).

| Variables | Description and source |
|---|---|
| Efficiency Measure | Dependent variable. Source: Scores calculated by the authors in the first phase of analysis. |
| Financial Literacy | Combines variables such as WalletHub's “WalletLiteracy Survey” Score, Financial Planning & Habits and Financial Knowledge & Education. Source: WalletHub. Most & Least Financially Literate States. |
| Formal Education | Combines variables such as educational attainment, school quality and achievement gaps between genders and races. Source: WalletHub. Most & Least Educated States in America. |
| Government Spending | Subnational Index. Combines variables such as consumption spending (% of personal income), transfers and subsidies (% of personal income) and insurance and retirement payments (% of personal income). Source: Fraser Institute. Economic freedom of North America. Area 1. |
| Taxes | Subnational Index. Combines variables such as income and payroll tax revenue (% of personal income), top income tax rate, top income tax threshold, property tax and other tax revenue (% of personal income) and sales tax revenue (% of personal income). Source: Fraser Institute. Economic freedom of North America. Area 2. |
| Labor | Subnational Index. Combines variables such as minimum-wage income (% of per capita personal income), government employees (% of total employees) and union density (% of total employees). Source: Fraser Institute. Economic freedom of North America. Area 3. |
| Per Capita Personal Income | Control variable. Personal income divided by population. Source: Bureau of Economic Analysis. |
| Logarithm of Total Assets | Control variable. Source: Federal Reserve Board. Balance Sheet (page 4). Code: UBPR2170. |
| Quadratic Term of Total Assets | To test non-linearity in the relationship between assets and efficiency scores. Source: Federal Reserve Board. Balance Sheet (page 4). Code: UBPR2170. |
| Capitalization | Control variable. Equity divided by total assets. Source: Federal Reserve Board. Balance Sheet (page 4). Codes: UBPRG105, UBPR2170. |

Note: The Fraser Institute measures variables as a percentage of income. All scores used were normalized between 0 and 1.
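The note does not say how the scores were normalized. One common choice, assumed here purely for illustration, is min-max scaling; the function name and the sample index values below are hypothetical.

```python
def min_max(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant series carries no ranking information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up index values for three states.
raw_scores = [2.0, 4.0, 6.0]
normalized = min_max(raw_scores)
print(normalized)  # [0.0, 0.5, 1.0]
```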

It is important to highlight that all the databases used in this research are open access. Thus, they can be consulted on the respective institutions' websites.

3.2. Empirical methodology

As briefly mentioned, our study is innovative because it uses three main procedures to evaluate the impact of qualitative variables on banking efficiency. The steps of the method are as follows:

  • First step: to verify possible influences from different structures of regulation, supervision and even geographical criteria, we selected a sample composed of approximately 4,000 FIs, totaling more than 60,000 observations in a time series (2003-2017);

  • Second step: to have an efficiency measure for each FI, we applied a two-stage SBM NDEA model;

  • Third step: to evaluate which qualitative variables had the best predictive capacity for the efficiency scores (best and worst efficiencies), we applied one traditional method and three ML techniques;

  • Fourth step: in the last procedure, we tested some hypotheses that are presented in the literature and that are linked to the political-administrative attributes (the most relevant ones found), using a fractional logit regression controlled by financial variables.
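For reference, the fractional logit named in the fourth step is usually written as below. This is the standard quasi-maximum-likelihood formulation, assumed here because the specification is not spelled out at this point in the text; the symbols $y_i$ (an efficiency score in $[0,1]$), $x_i$ (the regressors) and $\beta$ are notation introduced only for illustration.

```latex
\begin{aligned}
\mathrm{E}[y_i \mid x_i] &= \Lambda(x_i^{\top}\beta)
  = \frac{\exp(x_i^{\top}\beta)}{1 + \exp(x_i^{\top}\beta)}, \qquad 0 \le y_i \le 1,\\
\ell_i(\beta) &= y_i \log \Lambda(x_i^{\top}\beta)
  + (1 - y_i)\,\log\!\left[1 - \Lambda(x_i^{\top}\beta)\right].
\end{aligned}
```

The estimate of $\beta$ maximizes $\sum_i \ell_i(\beta)$. Because the logistic mean keeps fitted values inside $(0,1)$, this form suits a bounded dependent variable such as a DEA efficiency score.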

The efficiency measurement phase aimed to assess the performance of each FI. We used a model that considered the two main stages of production of financial services. For this, we applied a two-stage SBM NDEA model with the approach proposed by Tone and Tsutsui (2009) and a method for epsilon and weights by Tone and Tsutsui (2010).

A data envelopment analysis model is an operational research technique based on linear programming whose objective is to comparatively analyze the performance of independent units, usually called DMUs (in our study, each American state-chartered FI). It provides a measure to evaluate the relative efficiency and to obtain benchmarks (Zhu, 2014). Each FI (DMU) is represented by a set of outputs and a set of inputs. The basic idea is to compare the outputs with the inputs. Our outputs represent the two main products generally generated by FIs: loans and financial assets. However, to produce them, FIs need to use production factors. As shown in Table 1, our study used two proxies for the two main factors of production as inputs: fixed capital and labor. In addition to these, we used an intermediate input/output represented by the total deposits.

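For an input-oriented efficiency ($\theta_o$), Tone and Tsutsui (2009) solve the following linear problem:

$$\theta_o = \min_{\gamma^k,\, s^{k-}} \sum_{k=1}^{K} W^k \left[ 1 - \frac{1}{m_k} \left( \sum_{i=1}^{m_k} \frac{s_i^{k-}}{x_{io}^{k}} \right) \right], \tag{1}$$

where $\sum_{k=1}^{K} W^k = 1$, $W^k \geq 0\ (\forall k)$ and $W^k$ is the relative weight of Division $k$ (Tone and Tsutsui, 2009, p. 246).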
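The objective in Equation (1) is a weighted average, across divisions, of one minus the mean proportional input slack. The sketch below only evaluates that objective for slacks that are already known; obtaining the optimal slacks requires solving the linear program, which is not shown. The function name and all numbers are hypothetical and are not taken from the paper's data.

```python
def input_sbm_score(weights, slacks, inputs):
    """Evaluate sum_k W_k * (1 - (1/m_k) * sum_i s_i^k / x_i^k) for given slacks."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("division weights must be non-negative and sum to 1")
    score = 0.0
    for w_k, s_k, x_k in zip(weights, slacks, inputs):
        m_k = len(x_k)  # number of inputs in division k
        mean_slack_ratio = sum(s / x for s, x in zip(s_k, x_k)) / m_k
        score += w_k * (1.0 - mean_slack_ratio)
    return score


# Made-up example: two divisions with equal weights.
weights = [0.5, 0.5]
inputs = [[20.0, 1000.0], [400.0]]  # observed inputs per division
slacks = [[5.0, 100.0], [40.0]]     # input excesses per division
theta = input_sbm_score(weights, slacks, inputs)
print(round(theta, 4))  # 0.8625
```

With zero slacks in every division the score equals 1, which corresponds to a DMU on the frontier.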

Our first methodological approach has two main advantages over the traditional models in the literature. Banking efficiency studies that use nonparametric DEA efficiency techniques mostly use a radial approach (e.g., Charnes et al. (1978) or Banker et al. (1984)) and consider a single-stage structure (black box). The biggest criticism of radial models is that the inputs and outputs are proportionally adjusted. As Tone and Tsutsui (2009) note, the relationships of inputs that use capital and labor are often substitutive and do not change proportionally. In relation to the black-box models, deposits can be characterized as either inputs or outputs, generating what is commonly called the deposit dilemma. As a rule, intermediation approach studies use deposits as inputs because deposits represent resources that are used to generate loans. On the other hand, in production approach studies, deposits are usually treated as outputs because they represent a service offered to bank customers.

Thus, the application of the SBM-NDEA model of Tone and Tsutsui (2009) allowed us to overcome both problems. First, we did not restrict the adjustments between inputs and outputs to only proportional changes. In other words, since a non-radial approach was used, simultaneous increases or decreases in the outputs (final outputs or intermediate) were allowed. Second, we were able to break down the FI production process into two phases. In the first phase, FIs need capital and labor to capture deposits. In the second phase, deposits serve as input for the generation of assets, such as loans and securities. This allows using an approach that overcomes the deposit dilemma in the banking efficiency literature (Holod and Lewis, 2011). In addition, the two-stage DEA provides a more accurate analysis, allowing, for example, efficiency to be evaluated separately for each stage. However, it should also be noted that there are other DEA models that allow division into two stages, each with its own advantages and disadvantages. The well-known model by Kao and Hwang (2008), for example, has the disadvantage of only allowing constant returns to scale; however, it has the advantage of being particularly intuitive, as the product of the individual phase scores is equal to the total efficiency score.
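The multiplicative property attributed to Kao and Hwang (2008) can be checked with a few lines of arithmetic. The sketch below uses the ratio (multiplier) form of efficiency, in which the same multiplier is applied to the intermediate product in both stages. The multipliers `v`, `w` and `u` and the data for one FI are made up for illustration; in an actual application the multipliers are chosen by a linear program, which is not shown.

```python
def weighted_sum(multipliers, quantities):
    """Aggregate quantities into a single virtual input or output."""
    return sum(m * q for m, q in zip(multipliers, quantities))


# Made-up data for one FI (monetary values in thousands USD).
x = [500.0, 25.0]        # inputs: fixed capital, labor
z = [40000.0]            # intermediate product: deposits
y = [30000.0, 15000.0]   # outputs: loans, securities

# Hypothetical multipliers; w is shared by both stages.
v = [0.001, 0.02]
w = [0.00002]
u = [0.00001, 0.00002]

stage1 = weighted_sum(w, z) / weighted_sum(v, x)   # deposits per unit of virtual input
stage2 = weighted_sum(u, y) / weighted_sum(w, z)   # virtual output per unit of deposits
overall = weighted_sum(u, y) / weighted_sum(v, x)  # black-box ratio

print(abs(overall - stage1 * stage2) < 1e-12)  # True
```

The virtual deposit term cancels in the product, which is why the decomposition holds only when both stages value the intermediate product with the same multiplier.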

The DEA model was input-oriented and was estimated under the variable returns to scale hypothesis in both stages of the analysis. Variable returns to scale were assumed because the literature already has some consensus regarding the existence of significant scale gains in the production of financial services for the smallest banks (Berger and Humphrey, 1994). In addition, input orientation was used to identify FIs that can reduce at least one of their inputs, avoiding idleness or a poor choice of capital and labor. Input orientation is also commonly used because of the high degree of competition in the banking sector: inputs are more discretionary than the offering of financial services, which is more restricted by the market (Chortareas et al., 2012).

After obtaining the efficiency scores, we proceeded to the second methodological procedure. To verify whether efficiency could be predicted from the qualitative attributes of the FIs alone, we applied a traditional method, LDA, and three ML methods: bootstrap aggregating (bagging or BG), RF and linear support vector machine (LSVM).

ML is considered a subfield of artificial intelligence that works with the idea that machines can learn on their own when they have access to large volumes of data. In this study, ML was used to detect patterns and create boundaries between qualitative data. Basically, the algorithms refine the statistical analysis of the data they are given in order to produce more accurate answers.

The algorithms used were supervised, which means that known inputs and the desired outputs were supplied to the algorithm, which received feedback on the accuracy of its predictions during training. When training is complete, the algorithm applies what it has learned to new data. This is what we call supervised machine learning. In other words, we tried to predict an efficiency score from a list of variables. The main objective was to find evidence of relationships between qualitative institutional attributes and efficiency, and to compare the methods and results found.

To implement the models, we divided the FIs into two groups: efficient and non-efficient. We used a dummy variable, in which the 50% that had the highest scores were classified in the efficient group, while the other 50% were classified as non-efficient. The dummy variable thus served as a dependent variable that is predicted with the methods listed above from the qualitative attributes of each FI. Qualitative attributes are described in Table 2.
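The 50/50 split described above amounts to thresholding the scores at their median. The sketch below is a minimal illustration with made-up scores; the function name is hypothetical, and the handling of scores exactly at the median is an assumption, since the text does not discuss ties.

```python
import statistics


def efficiency_dummy(scores):
    """Return 1 for scores above the median (efficient group) and 0 otherwise."""
    cutoff = statistics.median(scores)
    # Assumption: scores exactly equal to the median go to the non-efficient group.
    return [1 if s > cutoff else 0 for s in scores]


# Made-up efficiency scores for four FIs.
scores = [0.91, 0.42, 0.77, 0.58]
labels = efficiency_dummy(scores)
print(labels)  # [1, 0, 1, 0]
```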

Table 2.

Attributive variables (second phase of analysis).

| Variables | Definition |
|---|---|
| Primary Federal Regulator | The Primary Federal Regulator contains the agency that is the primary regulator of the bank. |
| Physical State | The physical state indicates the state of the United States or a US territory in which the FI is physically located. |
| ARDF | The Federal Reserve District of the regulatory authority indicates the specific authority for institutions. |
| Entity Type | Cooperative bank, foreign banking organization, federal savings bank, non-member bank, savings and loan association, state member bank and state savings bank. |
| Subsidiary Holder | The financial subsidiary holder indicates whether a bank conducts expanded financial activities through its direct or indirect ownership and control of an approved “financial subsidiary”, as defined in the Gramm–Leach–Bliley Act and in Section 4(k) of the Bank Holding Company Act. It is a dummy variable. |
| Minorities Owners | The majority owned by minorities or women is a code indicating whether a bank, savings and loan association or non-FBO bank holding company is more than 50% owned by one or more minorities or women with identification of the minority. |

Source: All variables come from the Bulk Data Table “Attributes” (Active). More information can be obtained at NIC (2016).

The LDA is a model traditionally used to classify observations into well-defined groups, in our case efficient and non-efficient FIs. It relies on a discriminant function that yields a coefficient for each independent variable and determines the group into which each individual is classified. The discriminant classification function explained by n independent qualitative variables has the form described in Equation (2):

$$Y = b_0 + b_1 X_1 + \dots + b_n X_n, \qquad (2)$$

where $Y$ is the dummy variable that segregates efficient and inefficient FIs (the dependent variable); $b_i$ are the coefficients that weight the independent variables $X_i$; and $X_i$ are the discriminatory variables.

The classification of each individual is made by calculating the estimated discriminant function. The FI is classified as efficient if its discriminant score is closer to the centroid of the efficient group than to the centroid of the inefficient group, and as inefficient in the opposite case. We use the R software package “MASS” to execute the method.
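The centroid rule can be illustrated with a short sketch. It is not the “MASS” implementation: the coefficients, the intercept and the dummy-coded observations below are hypothetical, and the coefficients are taken as given rather than estimated.

```python
def discriminant_score(x, coefs, intercept):
    """Evaluate Y = b0 + b1*X1 + ... + bn*Xn for one observation."""
    return intercept + sum(b * xi for b, xi in zip(coefs, x))

def centroid(rows, coefs, intercept):
    """Mean discriminant score of a group of observations."""
    scores = [discriminant_score(x, coefs, intercept) for x in rows]
    return sum(scores) / len(scores)

def classify(x, coefs, intercept, c_eff, c_ineff):
    """Assign the group whose centroid is closest to the observation's score."""
    y = discriminant_score(x, coefs, intercept)
    return "efficient" if abs(y - c_eff) < abs(y - c_ineff) else "inefficient"

# Hypothetical coefficients and dummy-coded attributes (illustration only).
coefs, intercept = [0.8, -0.5], 0.1
efficient_rows = [[1, 0], [1, 1]]
inefficient_rows = [[0, 1], [0, 0]]

c_eff = centroid(efficient_rows, coefs, intercept)
c_ineff = centroid(inefficient_rows, coefs, intercept)

label_a = classify([1, 0], coefs, intercept, c_eff, c_ineff)
label_b = classify([0, 1], coefs, intercept, c_eff, c_ineff)
print(c_eff, c_ineff, label_a, label_b)
```

Qualitative attributes such as the physical state would first have to be encoded as dummy variables, which is what the 0/1 rows stand for here.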

RF is a classifier consisting of a collection of tree-structured classifiers $\{h(x, \Theta_k),\ k = 1, \dots\}$, where the $\Theta_k$ are independent and identically distributed random vectors and each tree casts a single vote for the most popular class at input $x$ (Breiman, 2001). We use the “randomForest” and “H2O” packages of the R software to perform the RF model.

To apply the Bagging model, we follow the algorithm described by Breiman (1996): (1) construct a random sample, t, selected from the data set; (2) calculate the estimator $C_t$ using the sample from step 1; (3) repeat the first and second steps for $t = 1, \dots, T$ (T is the total number of iterations defined); and (4) each classifier casts one vote, where x holds the data of each element of the training set, according to Equation (3). After this algorithm, the class with the most votes is chosen as the classification for each element of the data set. We use the “ipred” package of the R software to perform the method.

$$C(x) = T^{-1} \sum_{t=1}^{T} C_t(x) \qquad (3)$$
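The four steps and the vote in Equation (3) can be sketched as below. This is an illustration, not the “ipred” implementation: the base learner (majority label per category), the toy data and the 0.5 threshold on the averaged vote are assumptions made for the example.

```python
import random
from collections import Counter, defaultdict

def fit_category_classifier(sample):
    """Base learner C_t: predict the majority label seen for each category."""
    votes = defaultdict(Counter)
    for category, label in sample:
        votes[category][label] += 1
    overall = Counter(label for _, label in sample).most_common(1)[0][0]
    table = {c: cnt.most_common(1)[0][0] for c, cnt in votes.items()}
    # Categories absent from the bootstrap sample fall back to the overall majority.
    return lambda category: table.get(category, overall)

def bagging_fit(data, n_rounds, seed=0):
    """Steps (1)-(3): draw T bootstrap samples and fit one classifier on each."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_rounds):
        sample = [rng.choice(data) for _ in data]  # sampling with replacement
        models.append(fit_category_classifier(sample))
    return models

def bagging_predict(models, category):
    """Step (4) and Eq. (3): average the T votes, then take the majority class."""
    avg_vote = sum(m(category) for m in models) / len(models)
    return avg_vote, int(avg_vote > 0.5)

# Hypothetical data: (state, efficient dummy). Values are illustrative only.
data = [("TX", 1)] * 9 + [("TX", 0)] + [("OH", 0)] * 9 + [("OH", 1)]
models = bagging_fit(data, n_rounds=25)
avg_tx, label_tx = bagging_predict(models, "TX")
avg_oh, label_oh = bagging_predict(models, "OH")
print(label_tx, label_oh)
```

Because the labels are coded 0/1, the average in Equation (3) is simply the share of classifiers voting "efficient", so thresholding it at 0.5 selects the class with the most votes.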

The last ML model used was based on the support vector machine (SVM). An SVM creates a hyperplane that partitions the data into approximately homogeneous sides. The separating hyperplane is constructed through a kernel function $K(x_i, x_j)$, which is the inner product of the transformed input vectors $\Phi(x_i)$ and $\Phi(x_j)$, according to Equation (4).

$$K(x_i, x_j) = \Phi(x_i)^{\top} \Phi(x_j) \qquad (4)$$

The SVM model used in this study (SVML) uses a linear kernel (Equation (5)). We use the “parallelSVM” and “e1071” packages of the R software to perform the method.

$$K(x_i, x_j) = x_i^{\top} x_j \qquad (5)$$
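A small sketch may help to see what the linear kernel measures on qualitative data. It assumes the attributes are one-hot encoded, which is an assumption of this example (the encoding used in the study is not stated here); the vocabulary and the three records are hypothetical.

```python
def one_hot(record, vocab):
    """Encode a record of qualitative attributes as a 0/1 vector."""
    return [1.0 if record[field] == level else 0.0 for field, level in vocab]

def linear_kernel(xi, xj):
    """Eq. (5): K(xi, xj) = xi' xj, i.e. Eq. (4) with an identity feature map."""
    return sum(a * b for a, b in zip(xi, xj))

# Hypothetical attribute levels and FIs (illustration only).
vocab = [("state", "TX"), ("state", "OH"), ("district", "10"), ("district", "11")]
fi_a = one_hot({"state": "TX", "district": "11"}, vocab)
fi_b = one_hot({"state": "TX", "district": "10"}, vocab)
fi_c = one_hot({"state": "OH", "district": "10"}, vocab)

k_ab = linear_kernel(fi_a, fi_b)  # shares the state only
k_ac = linear_kernel(fi_a, fi_c)  # shares nothing
k_aa = linear_kernel(fi_a, fi_a)  # shares both attributes with itself
print(k_ab, k_ac, k_aa)
```

Under this encoding the kernel value equals the number of attribute levels two FIs have in common, so the linear SVM separates institutions by weighted attribute matches.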

To evaluate the second-phase models, we applied several methods of analysis. We use the receiver operating characteristic (ROC) curve to analyze the usually antagonistic relationship between the probability of correctly classifying efficient institutions (true-positive rate) and the probability of classifying inefficient institutions as efficient (false-positive rate). We also used the Brier score (BS) to evaluate the accuracy of the predicted probabilities. Finally, we used the Kolmogorov-Smirnov (KS) test to measure the distance between the cumulative distributions of efficient and non-efficient institutions. To compute these measures, we use the “ROCR” and “verification” libraries of R.
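The three measures can be written out in a few lines. The sketch below uses the textbook definitions, not the “ROCR” or “verification” code; the labels, the predicted probabilities and the 0.5 classification threshold are hypothetical.

```python
def rates(y_true, y_pred):
    """True-positive, false-positive and false-negative rates plus accuracy."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {"tpr": tp / (tp + fn), "fpr": fp / (fp + tn),
            "fnr": fn / (tp + fn), "accuracy": (tp + tn) / len(y_true)}

def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)

def ecdf(values, cutoff):
    """Share of values less than or equal to the cutoff."""
    return sum(1 for v in values if v <= cutoff) / len(values)

def ks_statistic(y_true, probs):
    """Largest gap between the score distributions of the two groups."""
    pos = [p for t, p in zip(y_true, probs) if t == 1]
    neg = [p for t, p in zip(y_true, probs) if t == 0]
    return max(abs(ecdf(pos, c) - ecdf(neg, c)) for c in sorted(set(probs)))

# Hypothetical labels (1 = efficient) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in probs]

metrics = rates(y_true, y_pred)
bs = brier_score(y_true, probs)
ks = ks_statistic(y_true, probs)
print(metrics, bs, ks)
```

A lower Brier score indicates better-calibrated probabilities, while a higher KS value indicates better separation between the two groups.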

In the results presented in Section 4, we verify that the best attributes of FIs to predict their efficiency are linked to political-administrative locational criteria. From this finding, we move on to the third stage of evaluation, which is to test what underlying factors lie behind the influence of location on banking efficiency. For this, we have tested some hypotheses that are presented by the literature and that are linked to the attributed political-administrative location of the state-chartered FIs. These variables are listed in Table 3.

To test the variables hypothetically related to location attributes, we regressed the efficiency scores calculated in the first methodological phase using a fractional logistic regression. The proposed equation is presented below.

$$EFS_{k,t} = \alpha + FLS_{k,t}\beta + FES_{k,t}\gamma + GSS_{k,t}\delta + TXS_{k,t}\xi + LAS_{k,t}\lambda + CVS_{k,t} + u_{k,t}, \quad k = 1, \dots, n; \; t = 2003, \dots, 2015, \qquad (6)$$

where $EFS_{k,t}$ is the efficiency of FI k at time t, $FLS_{k,t}$ is the financial literacy score, $FES_{k,t}$ is the formal education score, $GSS_{k,t}$ is the government spending score, $TXS_{k,t}$ is the taxation score, $LAS_{k,t}$ is the labor intervention score, and $CVS_{k,t}$ is a vector of control variables with characteristics that may have an effect on the efficiency. Lastly, $u_{k,t}$ is the random error.

Importantly, we use a fractional logit regression because, as Chortareas et al. (2016) pointed out, DEA scores are better viewed as the outcome of a fractional logit process than as the outcome of a truncated process. We use the R package “frm” as the basis for the regression.
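For readers unfamiliar with the estimator, a fractional logit models the conditional mean of a response bounded in [0, 1] with a logistic function and is commonly fitted by Bernoulli quasi-maximum likelihood. The sketch below is a minimal one-regressor version fitted by Newton-Raphson; it is not the “frm” package, and the data and the function name `fit_fractional_logit` are hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_fractional_logit(x, y, n_iter=50):
    """Newton-Raphson for E[y | x] = logistic(a + b*x), with y in [0, 1].

    Maximizes the Bernoulli quasi-log-likelihood
    sum_i y_i*log(mu_i) + (1 - y_i)*log(1 - mu_i).
    """
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = logistic(a + b * xi)
            w = mu * (1.0 - mu)
            r = yi - mu
            g0 += r            # score with respect to the intercept
            g1 += r * xi       # score with respect to the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (information matrix) * step = score.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical data: fractional responses generated without noise from
# a = -1.0 and b = 0.5, so the fit should recover these values.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [logistic(-1.0 + 0.5 * xi) for xi in x]
a_hat, b_hat = fit_fractional_logit(x, y)
print(a_hat, b_hat)
```

Unlike a linear regression on the raw scores, the fitted mean always stays inside the unit interval, which is the property that motivates the choice for DEA scores.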

4. Results

The efficiency scores for state-chartered FIs show great dispersion, with few institutions among the benchmarks and a distant group with average scores close to 11% of the benchmarks. The result clearly contrasts with the usual studies that evaluate banking efficiency from a sample of the top US banks that control most of the assets of the banking system. The mean and standard deviation of the efficiency scores show a relatively constant behavior during the 2003-2017 series, as shown in Fig. 1.

Figure 1. Scores and productivity index.

Another important point is that the descriptive statistics show evidence of exogenous influences and possibly regulatory and legal issues. That is, US FIs in general do not present technological differences large enough to justify the wide dispersion of results. Significant differences in attributive variables are the most likely cause, which we examine in the subsequent association procedures. This means, as we mentioned before, that the purposely heterogeneous sample that we selected has the potential to reveal attributive characteristics that affect the FIs' efficiency scores.

The histogram shown in Fig. 2 demonstrates the exponential shape of the frequency curve. It is worth noting the concentration of a small group of FIs that stand as the benchmark of the group. Approximately 1% of the institutions formed the benchmark for the sample of about 3,500 institutions, a number that depends on the year of analysis, since the series is not balanced.

Figure 2. Efficiency scores - histogram.

As an example, Table 4 shows two FIs that are among the benchmarks and two FIs that are close to the average score. The first two rows contrast two FIs that use similar quantities of inputs, with the benchmark FI generating much higher outputs. The last two rows also compare two cases in which the benchmark FI has much higher outputs, despite a similar use of labor and a much lower use of fixed capital. The examples illustrate how large the differences in the processes of US FIs are when we look at the group of state-chartered FIs.

Table 4. Examples - benchmark vs average FI.

| 2017 Report | RSSD ID | Score | Labor | Fix. capital | Deposits | Loans | Securities |
| --- | --- | --- | --- | --- | --- | --- | --- |
| TEXAS EXCHANGE BANK | 822556 | 1.00 | 23 | 1,212 | 556,957 | 217,293 | 765,646 |
| LYTLE STATE BANK OF LYTLE | 830252 | 0.11 | 22 | 1,152 | 73,334 | 20,737 | 54,144 |
| UBS BANK USA | 3212149 | 1.00 | 382 | 39 | 47,989,027 | 32,016,036 | 9,525,804 |
| LUTHER BURBANK SAVINGS | 497570 | 0.10 | 263 | 22,452 | 3,990,163 | 103 | 582,001 |

Note: Dollar amounts in thousands.

Using Fig. 3, we can evaluate the behavior of productivity through the Malmquist Index, technical change and efficiency change. The variation of the Malmquist Index reveals a constant average behavior, with some slight increases over time. The variations reflect offsetting movements between the mean indices of efficiency change and technical change. Thus, the sector preserved its structural levels among the state-chartered FIs. The results confirm the persistence of a structure with a few leading FIs and a large gap to the mean, as verified in the analysis of the global scores. In short, the efficiency/productivity analysis over time corroborates the analysis previously made from the average efficiency scores and indicates that the sector has maintained constancy in its structures.

Figure 3. Productivity evolution.

As a second methodological procedure, we used the available attributive variables described in Table 2 to verify if ML methods could discriminate the most efficient FIs from the least efficient FIs. We use a traditional model (LDA) and three ML models (RF, BG and SVML) and compare the performance of each. Of the 17,688 separate observations for evaluation, the LDA model correctly predicted 6,121 inefficient and 5,484 efficient FIs, resulting in an average accuracy of 65.60%. The RF correctly predicted that 7,542 FIs are efficient and 3,737 FIs are inefficient, resulting in an average accuracy of 63.76%. The BG model correctly predicted 6,206 inefficient and 5,546 efficient FIs, resulting in an average accuracy of 66.44%. In sequence, the SVML model was able to correctly predict 5,665 efficient and 5,723 inefficient FIs, resulting in an average accuracy of 64.38%. We summarize the accuracy data along with each type of error, false positive rate (FPR) and false negative rate (FNR), BS and KS in Table 5.
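The reported accuracies follow directly from the counts of correct predictions. The short check below only uses the figures quoted in the paragraph above; the dictionary layout and variable names are our own.

```python
TOTAL_OBSERVATIONS = 17688

# (correctly predicted inefficient, correctly predicted efficient), from the text.
correct = {
    "LDA": (6121, 5484),
    "RF": (3737, 7542),
    "BG": (6206, 5546),
    "SVML": (5723, 5665),
}
# Accuracies as reported in the text, in percent.
reported = {"LDA": 65.60, "RF": 63.76, "BG": 66.44, "SVML": 64.38}

accuracy = {
    model: 100.0 * (ineff + eff) / TOTAL_OBSERVATIONS
    for model, (ineff, eff) in correct.items()
}
for model, value in accuracy.items():
    print(model, round(value, 2), reported[model])
```

The recomputed values agree with the reported ones to within about 0.01 percentage point, a gap consistent with rounding or truncation in the original figures.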

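Table 5. Models' evaluation.

| Model | Accuracy (%) | FPR (%) | FNR (%) | BS | KS |
| --- | --- | --- | --- | --- | --- |
| LDA | 65.6 | 31.5 | 37.3 | 31.2 | 31.49 |
| RF | 63.76 | 58.2 | 13.8 | 21.25 | 32.87 |
| BG | 66.44 | 30.6 | 36.6 | 29.11 | 32.96 |
| SVML | 64.38 | 36 | 35.2 | 31.44 | 29.94 |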

Note: FPR (False Positive Rate), FNR (False Negative Rate).

We observed a relative convergence of the results of the BS and KS indices. Bagging obtained the best value for the KS test (32.96%), followed by RF with 32.87% and LDA with 31.49%. The BS index presented a better result for RF (21.25%), followed by BG (29.11%) and LDA (31.20%). The results are convergent with the observation of the ROC curve available in Fig. 4.

Figure 4. ROC curve - benchmark.

Fig. 4 compares the ROC curves of the models under evaluation. The LDA model and the three ML models behaved very similarly. It should be noted that the ROC curve relates only the true-positive and false-positive rates; it does not directly show the other rates, such as the TNR and FNR.

The most important attributive variable for the predictions of the most efficient and least efficient classifications was the location in a certain state of the state-chartered FI. The second most important attribute was the Federal Reserve District of regulatory authority for institutions. The other variables had little importance in the classification of the models. It should be noted that issues normally considered important had little classificatory impact, such as the type of entity and the primary activity performed by the institution. Table 6 shows the results for the RF model in the evaluated variables.

Table 6. Attributes importance.

| Variable | Relative importance | Scaled importance | Share of total importance |
| --- | --- | --- | --- |
| Physical State | 588200.2 | 1 | 0.597517 |
| ARDF | 261776.9 | 0.445047 | 0.265923 |
| Primary Federal Regulator | 58341.15 | 0.099186 | 0.059265 |
| Subsidiary Holder | 26729.26 | 0.045442 | 0.027153 |
| Entity Type | 17794.29 | 0.030252 | 0.018076 |
| Minorities Owner | 17600.57 | 0.029923 | 0.017879 |
| Primary Activity | 13965.68 | 0.023743 | 0.014187 |
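The two derived columns of Table 6 can be reproduced from the relative importance values: the scaled importance divides each value by the largest one, and the last column divides each value by the column total. The check below uses only the numbers in the table; the variable names are our own.

```python
# Relative importance values taken from Table 6.
relative = {
    "Physical State": 588200.2,
    "ARDF": 261776.9,
    "Primary Federal Regulator": 58341.15,
    "Subsidiary Holder": 26729.26,
    "Entity Type": 17794.29,
    "Minorities Owner": 17600.57,
    "Primary Activity": 13965.68,
}

largest = max(relative.values())
total = sum(relative.values())

scaled = {name: value / largest for name, value in relative.items()}
share = {name: value / total for name, value in relative.items()}

# Combined share of the two location-related attributes.
location_share = share["Physical State"] + share["ARDF"]
print(round(scaled["ARDF"], 6), round(share["Physical State"], 6), round(location_share, 3))
```

Read this way, the two political-administrative attributes account for roughly 86% of the total importance in the RF model.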

The two main attributes used by the models to predict the efficiency of FIs are directly related to political-administrative territorial divisions. However, this fact alone leaves open the question of which factors lie behind this geographic influence. Therefore, as mentioned in Subsection 3.2, we performed a fractional logistic regression, as proposed by Chortareas et al. (2016), to test possible variables that are linked to the geographic location and could theoretically affect the efficiency of FIs. The regression results are available in Table 7.

Table 7. Regression results.

| Variable | Estimate | Std. error | t value | p value |
| --- | --- | --- | --- | --- |
| Intercept | 27.43489 | 0.333711 | 82.212 | 0.000 *** |
| Government Spending | -0.51579 | 0.082586 | -6.245 | 0.000 *** |
| Taxes | 0.58207 | 0.079087 | 7.36 | 0.000 *** |
| Labor | -0.15765 | 0.135877 | -1.16 | 0.246 |
| Financially Literate | 1.627734 | 0.210505 | 7.733 | 0.000 *** |
| Formal Education | -0.02205 | 0.071158 | -0.31 | 0.757 |
| Control: | | | | |
| LnAs (Assets) | -4.74464 | 0.051969 | -91.298 | 0.000 *** |
| LnAs2 (Quadratic Assets) | 0.17253 | 0.002179 | 79.163 | 0.000 *** |
| EqAs (Capitalization) | 1.977038 | 0.162785 | 12.145 | 0.000 *** |
| Per capita personal income | 0.020634 | 0.001138 | 18.130 | 0.000 *** |

Note: Robust standard errors; Number of observations: 54077; R-squared: 0.559.

The results are consistent with the hypothesis that subnational governments have a significant influence on the banking efficiency of institutions located strictly in their territory. Most of the variables capturing activities under subnational competence were significant; only labor and formal education were not.

Two points deserve attention regarding the labor and government spending variables. The first was not significant in our analysis, although studies that evaluated this issue have found a relationship between low labor market freedom and lower banking efficiency (Chortareas et al., 2016). Our test alone can neither provide statistical evidence for this relationship nor refute it. For government spending, the result was diametrically opposite to that literature: lower levels of subnational spending were associated with lower FI efficiency. We emphasize that the government spending, taxes and labor variables measure the level of freedom of each state; that is, higher values indicate less governmental activity in these areas. Thus, the evidence is that states whose governments spend relatively more have more efficient FIs. One possible explanation is that these states, on average, foster greater financial activity through subsidies to the less-affluent population. However, this result contrasts with the results found in Chortareas et al. (2016).

It should also be noted that the results found for the government spending variable do not indicate that states with more rigid or bureaucratic taxation generate more banking efficiency. This issue was evaluated by the taxes variable, which had the opposite result. Subnational entities with higher levels of taxation and more bureaucratic fiscal structures presented more inefficient FIs. This result is in line with national and international studies that indicate that greater tax freedom generates more efficient institutions.

The two variables related to the education level of the population of each subnational entity confirmed the great importance of the financial education of individuals for banking efficiency. It should be noted that the level of formal education did not have a significant impact, showing that pure and simple formal education does not have the capacity to influence the behavior of banking clients to the point of influencing FIs' activities. On the other hand, the level of the financial education of the population was significant. The result strengthens the hypothesis that a financially educated population can seek better financial services and options within the market. The activities of more financially conscious individuals have the ability to influence the productive activities of FIs.

5. Conclusion

We measured the efficiency of a sample composed of approximately 3,500 state-chartered FIs that formed 66,855 observations in a time series (2003-2017). The results indicated a constant productivity distribution over the period and that efficiency scores behave differently from those in traditional studies that assess the large banks that control the market or include national banks in their sample. The main result indicates that the behavior of the banking market at the subnational level presents significant differences in its operating structure and in its national-level competitiveness. Future research investigating these differences is necessary to better understand their origins.

Using LDA and ML methods (SVML, RF and bagging), we verified that variables linked to political-administrative localization criteria could predict whether the FI was in the efficient group. However, other variables usually considered to be important were not crucial for the classification methods, demonstrating that other attributes, such as type of institution or main regulator, have secondary importance. Considering that the study only covered the US market, research in other markets is suggested to evaluate whether the behavior is similar.

Last but not least, the fractional logistic regression tested which variables could be behind the fact that FIs located in certain regions had more efficient scores on average. The results confirmed the recent findings of the literature that states with less governmental influence (more freedom) are related to having more efficient FIs. In addition, we found that the states with a population with higher financial education have more efficient FIs, although the formal education level had no significant effect.

A possible extension of this work is to incorporate stochastic frontier analysis or even convex nonparametric least squares (Kuosmanen, 2008) methods in the measurement phase to deal with some sources of bias (e.g. noise terms, heteroskedasticity), evaluating whether the FIs' behavior remains the same. Future research could also apply Game Cross Efficiency models (Liang et al., 2008), incorporating some competition among the FIs based on long-term optimization over the total period, and compare the results with those of traditional methods.

Declarations

Author contribution statement

E.S. de Abreu and H. Kimura: Conceived and designed the experiments; Performed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper.

Funding statement

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Data availability statement

Data associated with this study has been deposited online at: 1. Uniform Bank Performance Report (FFIEC): https://cdr.ffiec.gov/Public 2. National Information Center (NIC): https://ffiec.gov/NPW 3. Fraser Institute: https://www.fraserinstitute.org/economic-freedom 4. WalletHub: https://wallethub.com 5. Bureau Economic Analysis: https://www.bea.gov.

Declaration of interests statement

The authors declare no conflict of interest.

Additional information

No additional information is available for this paper.

Footnotes

The views expressed in this work are those of the authors and do not necessarily reflect those of the Central Bank of Brazil nor those of its members.

References

  1. de Abreu E.S., Kimura H., Sobreiro V.A. What is going on with studies on banking efficiency? Res. Int. Bus. Finance. 2018
  2. Banker R.D., Charnes A., Cooper W.W. Some models for estimating technical and scale inefficiencies in data envelopment analysis. Manag. Sci. 1984;30:1078–1092.
  3. Belasri S., Gomes M., Pijourlet G. Corporate social responsibility and bank efficiency. J. Multinat. Financ. Manage. 2020;54
  4. Benston G.J. Branch banking and economies of scale. J. Finance. 1965;20:312–331.
  5. Berger A., Humphrey D. 1994. Bank scale economies, mergers, concentration, and efficiency: the U.S. experience. Center for Financial Institutions Working Papers, 94-25.
  6. Berger A.N., Herring R.J., Szegö G.P. The role of capital in financial institutions. J. Bank. Finance. 1995;19:393–430.
  7. Breiman L. Bagging predictors. Mach. Learn. 1996;24:123–140.
  8. Breiman L. Random forests. Mach. Learn. 2001;45:5–32.
  9. Burgstaller J. Retail-bank efficiency: nonstandard goals and environmental determinants. Ann. Public Coop. Econ. 2020;91
  10. Casu B., Deng B., Ferrari A. Post-crisis regulatory reforms and bank performance: lessons from Asia. Eur. J. Finance. 2016;23:1544–1571.
  11. Charnes A., Cooper W., Rhodes E. Measuring the efficiency of decision making units. Eur. J. Oper. Res. 1978;2:429–444.
  12. Chortareas G., Kapetanios G., Ventouri A. Credit market freedom and cost efficiency in US state banking. J. Empir. Finance. 2016;37:173–185.
  13. Chortareas G.E., Girardone C., Ventouri A. Bank supervision, regulation, and efficiency: evidence from the European Union. J. Financ. Stab. 2012;8:292–302.
  14. Chortareas G.E., Girardone C., Ventouri A. Financial freedom and bank efficiency: evidence from the European Union. J. Bank. Finance. 2013;37:1223–1231.
  15. Curi C., Lozano-Vivas A., Zelenyuk V. Foreign bank diversification and efficiency prior to and during the financial crisis: does one business model fit all? J. Bank. Finance. 2015;61:S22–S35.
  16. Degl'Innocenti M., Matousek R., Sevic Z., Tzeremes N.G. Bank efficiency and financial centres: does geographical location matter? J. Int. Financ. Mark. Inst. Money. 2017;46:188–198. http://www.sciencedirect.com/science/article/pii/S104244311630141X
  17. Delis M.D., Molyneux P., Pasiouras F. Regulations and productivity growth in banking: evidence from transition economies. J. Money Credit Bank. 2011;43:735–764.
  18. FFIEC . Board of Governors of the Federal Reserve System, Federal Deposit Insurance Corporation; 2017. User's Guide for the Uniform Bank Performance Report – Technical Information.
  19. Fukuyama H., Matousek R. Efficiency of Turkish banking: two-stage network system. Variable returns to scale model. J. Int. Financ. Mark. Inst. Money. 2011;21:75–91.
  20. Guo L., Na S., Tan X., Gui P., Liu C. Evolution of the efficiency of nationwide commercial banks in China based on an SBM-undesirable model and DEA window analysis. Math. Probl. Eng. 2020;2020:1–12.
  21. Gurjar H., Tripathi A., Joshi M.C. The bank efficiency through off-balance sheet items' window: a Malmquist approach. Vis. J. Bus. Perspect. 2020
  22. Hasan I., Hunter W.C. Management efficiency in minority- and women-owned banks. Econ. Perspect. FRB Chic. 1996:20–28.
  23. Holod D., Lewis H.F. Resolving the deposit dilemma: a new DEA bank efficiency model. J. Bank. Finance. 2011;35:2801–2810.
  24. Hughes J.P., Mester L.J. The Oxford Handbook of Banking. 2012. Efficiency in banking: theory, practice, and evidence. http://www.oxfordhandbooks.com/view/10.1093/oxfordhb/9780199640935.001.0001/oxfordhb-9780199640935-e-018
  25. Iqbal Z., Ramaswamy K.V., Akhigbe A. The output efficiency of minority-owned banks in the United States. Int. Rev. Econ. Finance. 1999;8:105–114.
  26. Jiménez-Hernández I., Palazzo G., Sáez-Fernández F.J. Determinants of bank efficiency: evidence from the Latin American banking industry. Appl. Econ. Anal. 2019;27:184–206.
  27. Kao C., Hwang S.-N. Efficiency decomposition in two-stage data envelopment analysis: an application to non-life insurance companies in Taiwan. Eur. J. Oper. Res. 2008;185:418–429.
  28. Kashian R., McGregory R., Drago R. Minority owned banks and efficiency revisited. J. Product. Anal. 2017;48
  29. Kontesa M., Nichol E.O., Bong J.-S., Brahmana R.K. Board capital and bank efficiency: insight from Vietnam. Bus. Theory Pract. 2020;21:483–493.
  30. Kuosmanen T. Representation theorem for convex nonparametric least squares. Econom. J. 2008;11:308–325.
  31. Liang L., Cook W.D., Zhu J. DEA models for two-stage processes: game approach and efficiency decomposition. Nav. Res. Logist. 2008;55:643–653.
  32. Lozano-Vivas A., Pasiouras F. Bank productivity change and off-balance-sheet activities across different levels of economic development. J. Financ. Serv. Res. 2013;46:271–294.
  33. Menardi G., Torelli N. Training and assessing classification rules with imbalanced data. Data Min. Knowl. Discov. 2012;28:92–122.
  34. NIC . 1st ed. National Information Center; 2016. Bulk Data Dowload Data Dictionary and Reference Guide.
  35. Rashid M.H.U., Zobair S.A.M., Chowdhury M.A.I., Islam A. Corporate governance and banks' productivity: evidence from the banking industry in Bangladesh. Bus. Res. 2020;13:615–637.
  36. Sealey C.W., Lindley J.T. Inputs, outputs, and a theory of production and cost at depository financial institutions. J. Finance. 1977;32:1251–1266. http://www.jstor.org/stable/2326527
  37. Tone K., Tsutsui M. Network DEA: a slacks-based measure approach. Eur. J. Oper. Res. 2009;197:243–252.
  38. Tone K., Tsutsui M. An epsilon-based measure of efficiency in DEA – a third pole of technical efficiency. Eur. J. Oper. Res. 2010;207:1554–1563.
  39. Ullah S. Role of corporate governance in bank's efficiency in Pakistan. Stud. Bus. Econ. 2020;15:243–258.
  40. Wanke P., Azad M.A.K., Barros C.P. Financial distress and the Malaysian dual banking system: a dynamic slacks approach. J. Bank. Finance. 2016;66:1–18.
  41. Zhu J. vol. 213. Springer; 2014. Quantitative Models for Performance Evaluation and Benchmarking: Data Envelopment Analysis with Spreadsheets.

Articles from Heliyon are provided here courtesy of Elsevier
