MethodsX. 2024 Mar 15;12:102659. doi: 10.1016/j.mex.2024.102659

Integrating Dropout and Kullback-Leibler Regularization in Bayesian Neural Networks for improved uncertainty estimation in Regression

Raghavendra M Devadas a, Vani Hiremani b
PMCID: PMC10973668  PMID: 38550761

Abstract

The objective of this study is to enhance uncertainty prediction in regression problems by introducing a Bayesian Neural Network (BNN) model that integrates dropout and Kullback-Leibler (KL) regularization. Experimental results reveal significant improvements in both uncertainty prediction and point forecasts with the integrated BNN model compared to the plain BNN. Performance metrics, including mean squared error (MSE), mean absolute error (MAE), and R-squared (R²), demonstrate superior results for the proposed BNN: the plain BNN yields an MSE of 87.3, an MAE of 6.62, and an R² of −0.0492, whereas the proposed BNN model achieves an MSE of 44.64, an MAE of 4.4, and an R² of 0.46. This research brings a fresh approach to Bayesian Neural Networks by incorporating both dropout and KL regularization, resulting in a practical tool for handling regression tasks with well-calibrated uncertainty. By combining these techniques, the study enhances model stability, mitigates overfitting, and achieves more reliable uncertainty estimation. This study adds to our knowledge of uncertainty-aware machine learning models and offers a valuable solution for accurately assessing uncertainty in various applications.

  • The innovative BNN model merges the power of Bayesian principles with the effectiveness of dropout and KL regularization.

  • To test and refine our model, the study utilizes the Boston Housing dataset for both training and evaluation purposes.

Keywords: Bayesian neural network, Dropout, Kullback-Leibler regularization, Uncertainty quantification, Regression, Machine learning

Method name: Modified BNN model integrating Dropout and KL Regularization

Graphical abstract



Specifications table

Subject area: Computer Science
More specific subject area: Machine Learning
Name of your method: Modified BNN model integrating Dropout and KL Regularization
Name and reference of original method: Radford M. Neal. Bayesian Learning for Neural Networks (1996), https://link.springer.com/book/10.1007/978-1-4612-0745-0
Resource availability: NA

Method details

Preliminary background

BNNs have grown in popularity in recent years as a machine learning approach to regression tasks because of their built-in ability to supply both point predictions and valid measures of predictive uncertainty. Uncertainty quantification must be incorporated into predictive models used for decision-making, risk assessment, and autonomous systems. This research looks at improving a BNN with two regularization techniques, dropout and Kullback-Leibler (KL) divergence, so that it produces more meaningful, well-calibrated uncertainties. The difference between Bayesian Neural Networks and standard neural networks lies in treating the weights probabilistically: BNNs place distributions over their weights and can therefore represent the inherent uncertainty in the data. Although exact Bayesian inference in BNNs is intractable, several approximation techniques have been suggested to tackle this issue. One widely used method is dropout, which can be viewed as randomly sampling network weights from a Bernoulli distribution. Dropout can therefore be considered a form of Bayesian approximation, producing a variational posterior over the weights.

The predictive distribution can then be obtained by combining the outputs of multiple dropout samples. This approach, referred to as Monte Carlo dropout (MC dropout), has demonstrated success in achieving strong performance and effectively estimating uncertainty in a variety of tasks [1], [2].

Although dropout is often used as a Bayesian approximation, it does have some limitations. Firstly, the dropout rate, which determines the sparsity of the network, is typically set as a hyperparameter or learned through gradient-based optimization. This may overlook the most effective level of uncertainty for particular tasks or data points. Additionally, dropout does not consider the prior distribution of the weights, which can act as a regularizer to prevent overfitting. Finally, MC dropout is a special kind of BNN that uses dropout to model the weights with Bernoulli distributions.

There are several approaches to improving dropout rates, such as the variational approach. A study by [3] provides a tractable approximation based on the Monte Carlo (MC) dropout method to capture uncertainty without loss of precision. The research underscores the importance of having dependable methods for quantifying uncertainty in deep learning prognostics, which is essential for making well-informed decisions in predictive maintenance [4]. In [5], the authors develop functional variational inference for Bayesian and stochastic neural networks. In yet another study [6], deterministic variational inference is proposed for robust Bayesian neural networks. The authors of [7] suggest a method based on gradient variance reduction and automatic prior calibration for training an effective Bayesian neural network. Adversarial (white-box) training is proposed for Bayesian networks by researchers, showing much stronger resistance to white-box attacks [8]. The authors of [9] examine Bayesian networks with Gaussian priors in combination with L1, L2, and weight-decay regularization. A recent study [10] compared the predictive uncertainty estimates of standard inference techniques on regression and classification problems; the experimental outcomes imply that most of the suggested inference innovations aimed at capturing the structure of the posterior do not produce optimal, or even good-quality, approximations.

Recent work in computer vision has attempted to improve point-prediction accuracy by developing new network architectures and learning algorithms. In [11], the authors suggest a natural approach to quantifying the uncertainty of a classification problem by decomposing moment-based prediction uncertainty into two components, aleatoric and epistemic uncertainty. A fuzzy logic approach was employed in [12] to address ambiguities among stakeholders in the software requirement prioritization process. Another noted study [13] includes methods to handle uncertainty. The paper [14] introduces a modified Adomian Decomposition Method, utilizing parameterized terms to enhance convergence and address classical ADM limitations, proving effective for solving nonlinear equations in engineering applications. Yet another study introduces a modified Adomian Decomposition Method for solving fractional ordinary and partial differential equations, demonstrating superior accuracy and performance compared to traditional and commonly used numerical techniques [15].

The main contributions of this study include:

  • The study presents a significant breakthrough in the realm of regression tasks, offering a novel approach to predicting uncertainty.

  • The implementation of dropout and KL regularization techniques within the Bayesian Neural Network framework represents a cutting-edge development in the field.

  • This combination seeks to bolster model stability through dropout regularization while also providing trustworthy and well-calibrated uncertainty estimates through KL divergence. Moreover, the use of the well-known Boston Housing dataset provides concrete evidence to validate the effectiveness of our proposed methodology.

Even though Bayesian Neural Networks have made significant progress in predicting uncertainty, there are still areas where further research is needed. One key area is finding methods that can effectively balance model complexity and interpretability, to avoid the issue of uncertainty estimates being overly influenced by complex model designs. There is also a need to examine the computational efficiency of this approach, particularly when dealing with large datasets, which calls for more investigation. The motivation behind this research stems from the recognition of the pivotal role uncertainty plays in decision-making processes. In various real-world applications, from finance to healthcare, having a nuanced understanding of predictive uncertainties is essential for informed and robust decision-making. Traditional machine learning models, while proficient in point predictions, often fall short in capturing and quantifying uncertainties accurately. The motivation to address this limitation propels the exploration of novel methodologies that can enhance both point forecasts and uncertainty estimates.

Theoretical background

Bayesian Neural Networks (BNNs):

Bayesian neural networks differ from standard neural networks in that model parameters are treated as probability distributions instead of fixed values. This allows BNNs to capture uncertainty in predictions and make probabilistic forecasts. The basic concept is that a prior distribution is applied to the network's weights and the distribution is updated based on observed data to create a posterior distribution.

Likelihood:

The likelihood function [16] measures how likely it is to observe the target variable (v) given the input data (U) and model parameters (θ). For a regression problem with a Gaussian likelihood, it is often assumed that:

P(v \mid U, \theta) = \mathcal{N}(f(U, \theta), \sigma^{2})  (1)

where 𝒩 denotes the Gaussian distribution, f(U, θ) is the neural network output, and σ² is the observation noise variance.

Prior:

P(θ) represents our expectations about how model parameters will be distributed before seeing any data. Common choices include Gaussian priors:

P(\theta) = \mathcal{N}(0, \alpha I)  (2)

where α controls the strength of the prior.

Posterior:

The posterior distribution, P(θ ∣ U, v), is updated based on observed data using Bayes' theorem:

P(\theta \mid U, v) \propto P(v \mid U, \theta)\, P(\theta)  (3)

The posterior encapsulates the updated beliefs about model parameters given the data.

Predictive Distribution:

The predictive distribution, P(v∗∣U∗,U,v), for new, unseen data U∗ is obtained by marginalizing over the posterior [17]:

P(v^{*} \mid U^{*}, U, v) = \int P(v^{*} \mid U^{*}, \theta)\, P(\theta \mid U, v)\, d\theta  (4)

where P(v∗∣U∗, θ) is the likelihood for new data given model parameters.
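In practice the integral in Eq. (4) is intractable and is approximated by averaging predictions over samples from the (approximate) posterior. The following is a minimal sketch of that Monte Carlo approximation; the forward-pass function f and the posterior samples theta_samples are hypothetical placeholders, and sigma corresponds to the observation noise of Eq. (1).

```python
import numpy as np

def predictive_mean_and_std(f, U_new, theta_samples, sigma=1.0):
    """Monte Carlo approximation of Eq. (4).

    f(U, theta)    -- placeholder for the network forward pass
    theta_samples  -- hypothetical draws from the (approximate) posterior P(theta | U, v)
    sigma          -- observation noise standard deviation from the Gaussian likelihood
    """
    preds = np.stack([f(U_new, theta) for theta in theta_samples])  # shape (S, N)
    mean = preds.mean(axis=0)                       # predictive mean of v*
    # predictive std: spread across posterior samples (epistemic) + noise (aleatoric)
    std = np.sqrt(preds.var(axis=0) + sigma ** 2)
    return mean, std
```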

Dropout in Bayesian Neural Networks:

In the context of BNNs, dropout is a regularization technique that introduces stochasticity during training. For each training iteration, neurons are randomly "dropped out" of the network, resulting in a sub-network. This ensemble approach avoids overfitting and enhances the model's generalization. Let us denote the output of a layer in a neural network as h, and the output after applying dropout as h_dropout. The dropout operation can be mathematically represented as [18]:

h_{\mathrm{dropout}} = h \odot \epsilon  (5)

Here, ⊙ represents element-wise multiplication, and ε is a binary mask with values 0 or 1. The mask ε is generated independently for each training iteration; each element is 0 with probability p (the dropout rate) and 1 with probability 1 − p. The values of ε are sampled from a Bernoulli distribution.

During inference or testing, the output is typically scaled by 1 − p so that its expected value matches the expected value of h_dropout during training:

h_{\mathrm{inference}} = (1 - p) \cdot h  (6)

This scaling helps maintain the expected output magnitude and improves the model's robustness. In the context of BNNs, dropout is applied to the weights of the network. If we denote the weights of a layer as W and the dropout mask as ε, the modified weights W_dropout during training can be expressed as:

W_{\mathrm{dropout}} = W \odot \epsilon  (7)

The dropout mask ϵ is generated independently for each weight in the network. During inference, the weights are typically scaled by 1 − p to maintain the expected values.
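The following is a minimal NumPy sketch of Eqs. (5)-(7): a Bernoulli mask applied during training and the 1 − p scaling applied at inference. The dropout rate p = 0.5 and the example vector are illustrative values only.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(h, p=0.5, training=True):
    """Apply dropout to activations (or weights) h with dropout rate p."""
    if training:
        # Eq. (5)/(7): element-wise multiplication by a Bernoulli mask
        # (each entry is 1 with probability 1 - p and 0 with probability p)
        eps = rng.binomial(1, 1.0 - p, size=h.shape)
        return h * eps
    # Eq. (6): scale by the keep probability so the expected output is unchanged
    return (1.0 - p) * h

h = np.ones(5)
print(dropout_forward(h, training=True))    # e.g. [1. 0. 1. 1. 0.]
print(dropout_forward(h, training=False))   # [0.5 0.5 0.5 0.5 0.5]
```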

Kullback-Leibler (KL) divergence:

KL divergence measures how one probability distribution differs from a second, reference probability distribution. It is often used in the context of Bayesian methods to quantify the difference between a prior and a posterior distribution. Given two probability distributions Y(u) and Z(u) over a random variable u, the KL divergence from Z to Y is defined as [19]:

D_{\mathrm{KL}}(Y \,\|\, Z) = \sum_{u} Y(u) \log\left(\frac{Y(u)}{Z(u)}\right)  (8)

In the continuous case, this becomes an integral:

D_{\mathrm{KL}}(Y \,\|\, Z) = \int Y(u) \log\left(\frac{Y(u)}{Z(u)}\right) du  (9)

It is important to note that KL divergence is not symmetric (D_KL(Y‖Z) ≠ D_KL(Z‖Y)) and is always non-negative. If D_KL(Y‖Z) = 0, the two distributions Y and Z are identical; if D_KL(Y‖Z) > 0, there is a divergence between them. In Bayesian inference, the KL divergence is often used to quantify the difference between the prior distribution Y(θ) and the posterior distribution Z(θ∣D) after observing data D. In this context it is expressed as:

D_{\mathrm{KL}}\left(Y(\theta) \,\|\, Z(\theta \mid D)\right)  (10)

This measures how much the posterior diverges from the prior after incorporating the information from the observed data.
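As a small illustration of Eqs. (8)-(10), the sketch below computes the discrete KL divergence between two example probability vectors (the values of Y and Z are arbitrary illustrative choices) and shows the asymmetry noted above.

```python
import numpy as np

def kl_divergence(Y, Z, eps=1e-12):
    """Discrete KL divergence D_KL(Y || Z) from Eq. (8); Y and Z each sum to 1."""
    Y, Z = np.asarray(Y, dtype=float), np.asarray(Z, dtype=float)
    return float(np.sum(Y * np.log((Y + eps) / (Z + eps))))

Y = np.array([0.7, 0.2, 0.1])   # e.g. a prior over three outcomes
Z = np.array([0.5, 0.3, 0.2])   # e.g. a posterior over the same outcomes
print(kl_divergence(Y, Z))      # non-negative
print(kl_divergence(Z, Y))      # differs from the above: KL is not symmetric
```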

Mean Squared Error (MSE):

MSE measures the average squared difference between the true values and the predicted values. It is calculated by taking the average of the squared differences between each actual and predicted value.

Mean Absolute Error (MAE):

MAE measures the average absolute difference between the true values and the predicted values. It is calculated by taking the average of the absolute differences between each actual and predicted value.

R-squared (Coefficient of Determination):

R-squared represents the proportion of the variance in the dependent variable (target) that is predictable from the independent variables (features). Its maximum value is 1, and it can be negative when the model fits the data worse than simply predicting the mean of the target; values closer to 1 indicate a better fit.

R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^{2}}{\sum_{i=1}^{n} (y_i - \bar{y})^{2}}  (11)

where yᵢ are the true values, ŷᵢ are the predicted values, and ȳ is the mean of the true values.
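For reference, the three evaluation metrics can be computed directly from predictions and targets; the sketch below is a plain NumPy implementation matching the definitions above (equivalent functions exist in scikit-learn).

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, MAE and R-squared (Eq. (11)) for a regression model."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mse = np.mean((y_true - y_pred) ** 2)
    mae = np.mean(np.abs(y_true - y_pred))
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                        # can be negative for poor fits
    return mse, mae, r2
```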

Methodology

In BNNs, the objective is to learn a posterior distribution over the weights. The introduction of dropout leads to a variational approximation of this distribution. For Bayesian neural networks with dropout, the objective function is the sum of the negative log-likelihood and the KL divergence from the approximate posterior to the prior:

Objective = Negative Log-Likelihood + KL Divergence

The KL divergence term measures how far the approximate posterior deviates from the prior. The regularization therefore encourages the network to explore a more structured weight space, which helps prevent overfitting.

Mathematical Formulation of KL Divergence in BNNs with Dropout:

The KL divergence term is mathematically expressed as:

\mathrm{KL}\left(q(W \mid \theta) \,\|\, p(W)\right) = \frac{1}{2} \sum_{i} \left( \frac{\mu_i^{2}}{\tau^{2}} + \frac{\sigma_i^{2}}{\tau^{2}} - \log\frac{\sigma_i^{2}}{\tau^{2}} - 1 \right)  (12)

Here, q(W∣θ) is the approximate posterior distribution, p(W) is the prior distribution, μᵢ and σᵢ are the mean and standard deviation of the approximate posterior for weight i, and τ is the prior standard deviation, the hyperparameter that controls the strength of the regularization.
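A minimal sketch of the Eq. (12) term is given below, assuming a fully factorized Gaussian posterior N(μᵢ, σᵢ²) over each weight and a zero-mean Gaussian prior N(0, τ²); the array values in the example call are placeholders.

```python
import numpy as np

def gaussian_kl_term(mu, sigma, tau):
    """Closed-form KL of Eq. (12), summed over all weights.

    mu, sigma -- arrays of posterior means and standard deviations per weight
    tau       -- prior standard deviation (controls regularization strength)
    """
    mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
    ratio = sigma ** 2 / tau ** 2
    return 0.5 * float(np.sum(mu ** 2 / tau ** 2 + ratio - np.log(ratio) - 1.0))

print(gaussian_kl_term(mu=[0.3, -0.1], sigma=[0.5, 0.8], tau=1.0))
```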

Fig. 1 illustrates the methodology followed in this study.

Fig. 1. Methodology followed.

The proposed methodology consists of the following steps:

  • (a)

    Data Collection:

The utilized dataset is the Boston Housing dataset, a recognized standard for regression problems. This dataset comprises features and target variables pertinent to housing prices.

  • (b)

    Data Preprocessing:

A normalization process was applied to standardize both input features and target variables, ensuring a uniform scale. Additionally, the dataset was partitioned into distinct training and testing sets.

  • (c)

    Model Architecture:

Two Bayesian Neural Network (BNN) architectures were implemented. The first incorporated dropout and Kullback-Leibler (KL) divergence regularization, while the second was a plain BNN. The architectural components included layers for input, batch normalization, dense (fully connected) layers, dropout layers, and an Independent Normal layer for probabilistic output.

  • (d)

    Training:

The training phase involved employing the Adam optimizer with a learning rate set at 0.01. The plain BNN underwent 50 epochs of training without any regularization, while the BNN with dropout and KL divergence regularization was trained with the specified regularization techniques.

  • (e)

    Evaluation:

Model evaluation took place on the testing set to gauge predictive performance. Multiple performance metrics, including Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-Squared, were calculated to assess the models' effectiveness.

Table 1 shows the properties of the dataset used for this study.

Table 1.

Dataset Characteristics.

Name: Boston Housing dataset
Type: Regression tasks
No. of samples: 506
No. of features: 13
Features:
CRIM - Per capita crime rate by town
ZN - Proportion of residential land zoned for large lots
INDUS - Proportion of non-retail business acres per town
CHAS - Charles River dummy variable
NOX - Nitric oxides concentration
RM - Average number of rooms per dwelling
AGE - Proportion of owner-occupied units built prior to 1940
DIS - Weighted distances to employment centers
RAD - Index of accessibility to radial highways
TAX - Full-value property tax rate per $10,000
PTRATIO - Pupil-teacher ratio by town
B - 1000(Bk − 0.63)², where Bk is the proportion of Black residents
LSTAT - % lower status of the population

Fig. 2 illustrates the relationship between specific features and house prices, the target variable in the Boston Housing dataset, showing how these features are distributed and how they might affect the target variable.

Fig. 2. Data Distribution.

The plain Bayesian neural network algorithm is represented as follows (Algorithm 1).

Algorithm 1 Plain BNN.

  • Input:

  • Training dataset (x_train, y_train)

  • Testing dataset (x_test, y_test)

  • BNN architecture parameters (hidden layers, activation functions)

  • Learning rate

  • Number of epochs

1. Preprocess and normalize the datasets (x_train, x_test, y_train, y_test).
2. Initialize Plain BNN architecture:
  • Input layer

  • Hidden layers with specified activation functions

  • Output layer with default multivariate normal distribution

3. Compile the Plain BNN model:
  • Utilize Adam optimizer with the specified learning rate

  • Define mean squared error (MSE) as the loss function

4. Train the Plain BNN model on the training dataset:
  • Iterate through epochs:
    • Forward pass
    • Compute MSE loss
    • Backward pass and optimization
  • Repeat for the specified number of epochs

5. Perform inference with the trained Plain BNN:
  • Generate predictions on the testing dataset

6. Calculate performance metrics on the testing dataset:
  • Mean Squared Error (MSE), Mean Absolute Error (MAE), R-squared

7. Print and analyze the results, including quantitative metrics and graphical representations.
Output:
  • Trained Plain BNN model
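The following is a minimal, hedged sketch of Algorithm 1 in Python with TensorFlow/Keras. The paper does not specify the hidden-layer sizes or the exact probabilistic output layer of the baseline, so this sketch uses two 64-unit hidden layers and a deterministic single-output layer as stand-ins; the learning rate (0.01), MSE loss, and 50 epochs follow the algorithm, while the batch size is an assumption.

```python
import numpy as np
import tensorflow as tf

# Step 1: load and normalize the Boston Housing data.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.boston_housing.load_data()
mu, sd = x_train.mean(axis=0), x_train.std(axis=0)
x_train = ((x_train - mu) / sd).astype("float32")
x_test = ((x_test - mu) / sd).astype("float32")

# Step 2: baseline architecture -- dense layers, no dropout, no KL regularization.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(x_train.shape[1],)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(64, activation="relu"),   # hidden sizes are assumed
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),                        # deterministic stand-in output
])

# Steps 3-4: Adam optimizer (lr = 0.01), MSE loss, 50 epochs.
model.compile(optimizer=tf.keras.optimizers.Adam(0.01), loss="mse")
model.fit(x_train, y_train, epochs=50, batch_size=32, verbose=0)

# Steps 5-6: predictions and evaluation metrics on the test set.
y_pred = model.predict(x_test).ravel()
mse = np.mean((y_test - y_pred) ** 2)
mae = np.mean(np.abs(y_test - y_pred))
r2 = 1 - np.sum((y_test - y_pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2)
print(f"MSE={mse:.2f}  MAE={mae:.2f}  R2={r2:.3f}")
```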

Algorithm 2 represents the flow of steps of the proposed study.

Algorithm 2 Bayesian Neural Network with Dropout and KL Regularization.

Input:
  • Training dataset (x_train, y_train)

  • Testing dataset (x_test, y_test)

  • BNN architecture parameters (hidden layers, activation functions)

  • Learning rate

  • Number of Monte Carlo samples

  • Dropout

  • KL regularization weight

1. Preprocess and normalize the datasets (x_train, x_test, y_train, y_test).
2. Initialize BNN architecture:
  • Input layer

  • Hidden layers with dropout (e.g., DenseFlipout) and specified activation functions

  • Output layer with IndependentNormal distribution

3. Compile the BNN model:
  • Utilize Adam optimizer with the specified learning rate

  • Define negative log likelihood as the loss function with additional KL regularization

4. Train the BNN model on the training dataset:
  • Iterate through epochs:

  • Forward pass with dropout for training

  • Compute loss with KL regularization

  • Backward pass and optimization

  • Repeat for the specified number of epochs

5. Perform inference with the trained BNN:
  • Utilize Monte Carlo sampling for uncertainty estimation

  • Generate multiple predictions based on dropout at test time

6. Calculate performance metrics on the testing dataset:
  • Mean Squared Error (MSE), Mean Absolute Error (MAE), R-squared

7. Print and analyze the results, including quantitative metrics and graphical representations.
Output:
  • Trained BNN model
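A minimal sketch of Algorithm 2 using TensorFlow Probability is given below. DenseFlipout layers maintain a Gaussian variational posterior over the weights and add their KL-to-prior terms to the model's losses, and the IndependentNormal output yields a predictive distribution trained with the negative log-likelihood. The dropout rate (0.5), KL weight (0.01), learning rate (0.01), and 50 epochs follow Table 2; the hidden-layer sizes, batch size, per-example KL scaling, and 100 Monte Carlo samples are assumptions.

```python
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

tfd, tfpl = tfp.distributions, tfp.layers

# Step 1: load the data; standardize the features and the target.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.boston_housing.load_data()
mu, sd = x_train.mean(axis=0), x_train.std(axis=0)
x_train = ((x_train - mu) / sd).astype("float32")
x_test = ((x_test - mu) / sd).astype("float32")
y_mu, y_sd = y_train.mean(), y_train.std()
y_train_n = ((y_train - y_mu) / y_sd).astype("float32")

# Weighted KL divergence between the variational posterior q and the prior p,
# scaled by the training-set size so it is comparable to the per-example likelihood.
kl_weight = 0.01
kl_fn = lambda q, p, _: kl_weight * tfd.kl_divergence(q, p) / x_train.shape[0]

# Step 2: Bayesian layers with dropout and a probabilistic output.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(x_train.shape[1],)),
    tf.keras.layers.BatchNormalization(),
    tfpl.DenseFlipout(64, activation="relu", kernel_divergence_fn=kl_fn),
    tf.keras.layers.Dropout(0.5),
    tfpl.DenseFlipout(64, activation="relu", kernel_divergence_fn=kl_fn),
    tf.keras.layers.Dropout(0.5),
    tfpl.DenseFlipout(tfpl.IndependentNormal.params_size(1), kernel_divergence_fn=kl_fn),
    tfpl.IndependentNormal(1),
])

# Steps 3-4: negative log-likelihood loss; the KL terms are added via model.losses.
negloglik = lambda y, rv_y: -rv_y.log_prob(y)
model.compile(optimizer=tf.keras.optimizers.Adam(0.01), loss=negloglik)
model.fit(x_train, y_train_n, epochs=50, batch_size=32, verbose=0)

# Step 5: Monte Carlo inference -- keep dropout and weight sampling active at test time.
n_samples = 100
mc_means = np.stack([model(x_test, training=True).mean().numpy().ravel()
                     for _ in range(n_samples)]) * y_sd + y_mu
y_pred = mc_means.mean(axis=0)            # point forecast in original units
uncertainty = 2 * mc_means.std(axis=0)    # +/- 2 standard deviation band

# Step 6: evaluation metrics.
mse = np.mean((y_test - y_pred) ** 2)
mae = np.mean(np.abs(y_test - y_pred))
r2 = 1 - np.sum((y_test - y_pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2)
print(f"MSE={mse:.2f}  MAE={mae:.2f}  R2={r2:.3f}")
```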

Method validation

This study uses three performance metrics i.e., MSE, MAE, and R-Squared. Fig. 3 shows the graphical plot of the results.

  • The Plain BNN achieved an MSE of 87.3. MSE measures the average squared error between predicted and observed values; a high MSE indicates that the model's predictions are significantly off the mark.

  • The MAE for the Plain BNN is 6.62. MAE is the average absolute difference between predicted and actual values; an MAE of 6.62 indicates that, on average, the model's estimates are off by roughly 6.62 units from the true values.

  • The R-squared value is −0.0492. R-squared gives the proportion of the variance in the dependent variable that is explained by the independent variables. A negative R-squared means the model fits the data worse than simply predicting the mean of the target variable.

Fig. 3. Performance results of plain BNN model.

In Fig. 3, Feature 5 indicates NOX, which represents the concentration of nitric oxides (parts per 10 million) in the air, as shown in Table 1. The code focuses on this feature, NOX, for visualization and analysis; specifically, its values are plotted on the x-axis of the generated plots. The choice of this feature is motivated by the desire to explore how the model's predictions and uncertainties vary with air pollution levels. The first plot is labeled "True Values" with a y-axis labeled "House Price" ranging from 0 to 50. It shows a positive correlation between Feature 5 and house price, meaning that higher values of Feature 5 are associated with higher house prices. The second plot is labeled "Mean Predictions" with a y-axis also labeled "House Price," but ranging from 0 to 30. The points are more clustered around the middle, indicating less variance than in the True Values plot; the mean predictions are less sensitive to changes in Feature 5 and tend to underestimate the house prices. The third plot is labeled "Uncertainty (2 std)" with a y-axis labeled "Uncertainty (2 std)" ranging from 0 to 60. It shows scattered points without a clear trend, indicating varying levels of uncertainty across different values of Feature 5; some values of Feature 5 have more reliable predictions than others, with uncertainty measured by the standard deviation of the predictions. Fig. 4 depicts the histogram comparison of predicted and true values for the plain BNN.

Fig. 4. Histogram – Plain BNN model.

The histogram shows that the predicted values are skewed towards the lower end of the house price range, while the true values are more evenly distributed.

Fig. 5 shows the performance results of the proposed BNN model.

Fig. 5. Performance results of proposed BNN model.

The figure consists of four scatter plots with different labels and scales, arranged in a 2 × 2 grid. Each plot shows the relationship between Feature 5 and error rate for a different aspect of the data. The top left plot is labeled "True Values" and has the lowest range of error rates, from 0 to 50; the data points are scattered and show a general increase in error rate as Feature 5 increases. The top right plot is labeled "Mean Predictions" and has the highest range of error rates, from 0 to 60; the data points are more densely packed around lower error rates, especially when Feature 5 is between −1 and 1. The bottom left plot is labeled "Uncertainty (2 std)" and shows the variation in error rates for different feature values, from −5 to almost 35; the data points are widely scattered, indicating varied levels of uncertainty across different values of Feature 5. The bottom right plot is labeled "Mean Absolute Error (MAE)" and shows the average error rates for different feature values, from 0 to just over 30; the data points are concentrated towards lower MAE values, with few reaching above an MAE of about 25. Fig. 6 depicts the histogram comparison of predicted and true values for the proposed BNN.

Fig. 6. Histogram – Proposed BNN model.

The proposed BNN has done a fairly good job in predicting house prices, as the predicted values closely align with the true values, especially around the peak frequency at approximately 30 units of house price. This suggests that the model has captured the underlying distribution of the data well and has a low prediction error (Table 2).

Table 2.

Comparison between Plain BNN and BNN with Dropout and KL Regularization.

Plain BNN:
  • Model architecture: Dense layers without dropout or KL regularization
  • Training parameters: learning rate 0.01; epochs 50
  • Evaluation metrics: MSE 87.3; MAE 6.62; R² −0.0492

BNN with Dropout and KL Regularization:
  • Model architecture: Dense layers with dropout (DenseFlipout) and KL regularization
  • Training parameters: learning rate 0.01; epochs 50; dropout rate 0.5; KL regularization weight 0.01
  • Evaluation metrics: MSE 44.64; MAE 4.4; R² 0.46

Overall Comparison:

The proposed BNN with dropout and KL regularization outperforms the Plain BNN across all metrics.

  • Lower MSE and MAE indicate improved accuracy and a higher R-squared value signifies better model fit.

  • Adjusting the dropout rate in the BNN affects model uncertainty and generalization.

  • Higher dropout rates may lead to increased uncertainty but could mitigate overfitting.

  • Tuning the KL regularization weight balances the trade-off between model complexity and fit to the training data.

  • A smaller weight may result in a less regularized model, while a larger weight increases regularization.

Conclusion and future work

In conclusion, this study introduces a groundbreaking Bayesian Neural Network (BNN) model, incorporating dropout and Kullback-Leibler (KL) regularization to significantly improve uncertainty prediction in regression tasks. Results showcase the integrated model's superior performance, demonstrating enhanced accuracy in both uncertainty estimates and point forecasts. Notable reductions in mean squared error (MSE), mean absolute error (MAE), and a substantial increase in R-squared (R²) underscore its effectiveness, with MSE dropping from 87.3 to 44.64, MAE decreasing from 6.62 to 4.4, and R² improving from −0.0492 to 0.46. Future directions include fine-tuning the BNN model's hyperparameters and assessing its robustness across diverse datasets. Comparative studies against leading uncertainty-aware machine learning models will offer insights into its competitiveness. These endeavors aim to advance the application of uncertainty-aware models in various domains. However, it is crucial to acknowledge certain limitations of the proposed method in practical applications. The computational complexity of the integrated BNN model may pose challenges in real-time applications, demanding further optimization. Additionally, the model's performance might vary across datasets, emphasizing the need for extensive testing across diverse domains to ascertain its generalizability. Addressing these limitations will be integral to the method's successful deployment in practical scenarios.

Ethics statements

This research did not involve human subjects, animal experiments, or social media data.

CRediT authorship contribution statement

Raghavendra M. Devadas: Methodology, Software, Formal analysis, Investigation. Vani Hiremani: Conceptualization, Resources, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Contributor Information

Raghavendra M. Devadas, Email: rdevadas@gitam.edu.

Vani Hiremani, Email: vani.hiremani@sitpune.edu.in.

Data availability

  • Data will be made available on request.

References

  • 1. Theobald C., Pennerath F., Conan-Guez B., Couceiro M., Napoli A. A Bayesian neural network based on dropout regulation. 2021. doi: 10.48550/arXiv.2102.01968.
  • 2. Myojin T., Hashimoto S., Ishihama N. Detecting uncertain BNN outputs on FPGA using Monte Carlo dropout sampling. In: Farkaš I., Masulli P., Wermter S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2020. Lecture Notes in Computer Science, vol. 12397. Springer, Cham; 2020.
  • 3. Amin Maleki Sadr M., Zhu Y., Hu P. An anomaly detection method for satellites using Monte Carlo dropout. IEEE Trans. Aerosp. Electron. Syst. 2023;59(2):2044–2052. doi: 10.1109/TAES.2022.3206257.
  • 4. Basora L., Viens A., Chao M.A., Olive X. A benchmark on uncertainty quantification for deep learning prognostics. 2023. doi: 10.48550/arXiv.2302.04730.
  • 5. Sun S., Zhang G., Shi J., Grosse R.B. Functional variational Bayesian neural networks. ICLR. 2019. doi: 10.48550/arXiv.1903.05779.
  • 6. Wu A., Nowozin S., Meeds E., Turner R.E., Hernández-Lobato J., Gaunt A.L. Deterministic variational inference for robust Bayesian neural networks. ICLR. 2018. doi: 10.48550/arXiv.1810.03958.
  • 7. Liu X., Li Y., Wu C., Hsieh C. Adv-BNN: improved adversarial defense through robust Bayesian neural network. ICLR. 2018. doi: 10.48550/arXiv.1810.01279.
  • 8. Vladimirova M., Verbeek J.J., Mesejo P., Arbel J. Understanding priors in Bayesian neural networks at the unit level. ICML. 2018. doi: 10.48550/arXiv.1810.05193.
  • 9. Ma C., Li Y., Hernández-Lobato J. Variational implicit processes. ICML. 2019. https://proceedings.mlr.press/v97/ma19b.html
  • 10. Yao J., Pan W., Ghosh S.S., Doshi-Velez F. Quality of uncertainty quantification for Bayesian neural network inference. 2019. doi: 10.48550/arXiv.1906.09686.
  • 11. Kwon Y., Won J.-H., Kim B.J., Cho Paik M. Uncertainty quantification using Bayesian neural networks in classification: application to biomedical image segmentation. Comput. Stat. Data Anal. 2020;142. doi: 10.1016/j.csda.2019.106816.
  • 12. Devadas R., Cholli N.G. PUGH decision trapezoidal fuzzy and gradient reinforce deep learning for large scale requirement prioritization. Indian J. Sci. Technol. 2022;15(12):542–553. doi: 10.17485/IJST/v15i12.1757.
  • 13. Devadas R.M., Hiremani V., Bidwe R.V., Zope B., Jadhav V., Jadhav R. Identifying factors in congenital heart disease transition using fuzzy DEMATEL. Int. J. Adv. Comput. Sci. Appl. 2023;14(12). doi: 10.14569/ijacsa.2023.0141218.
  • 14. Turkyilmazoglu M. Nonlinear problems via a convergence accelerated decomposition method of Adomian. Comput. Model. Eng. Sci. 2021;127(1):1–22. doi: 10.32604/cmes.2021.012595.
  • 15. Blei D.M., Kucukelbir A., McAuliffe J.D. Variational inference: a review for statisticians. J. Am. Stat. Assoc. 2017;112(518):859–877. doi: 10.1080/01621459.2017.1285773.
  • 16. Turkyilmazoglu M. An efficient computational method for differential equations of fractional type. Comput. Model. Eng. Sci. 2022;133(1):47–65. doi: 10.32604/cmes.2022.020781.
  • 17. Bishop C.M. Pattern Recognition and Machine Learning. Springer; 2016.
  • 18. Srivastava N., Hinton G., Krizhevsky A., Sutskever I., Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014;15:1929–1958.
  • 19. Cover T.M., Thomas J.A. Elements of Information Theory. John Wiley & Sons; 2012.
