Skip to main content
Springer Nature - PMC COVID-19 Collection logoLink to Springer Nature - PMC COVID-19 Collection
. 2020 May 25;12141:217–229. doi: 10.1007/978-3-030-50426-7_17

Radial Basis Functions Based Algorithms for Non-Gaussian Delay Propagation in Very Large Circuits

Dmytro Mishagli 8,, Elena Blokhina 8
Editors: Valeria V Krzhizhanovskaya8, Gábor Závodszky9, Michael H Lees10, Jack J Dongarra11, Peter M A Sloot12, Sérgio Brissos13, João Teixeira14
PMCID: PMC7302811

Abstract

In this paper, we discuss methods for determining delay distributions in modern Very Large Scale Integration design. The delays have a non-Gaussian nature, which is a challenging task to solve and is a stumbling block for many approaches. The problem of finding delays in VLSI circuits is equivalent to a graph optimisation problem. We propose algorithms that aim at fast and very accurate calculations of statistical delay distributions. The speed of execution is achieved by utilising previously obtained analytical results for delay propagation through one logic gate. The accuracy is achieved by preserving the shapes of non-Gaussian delay distribution while traversing the graph of a circuit. The discussion on the methodology to handle non-Gaussian delay distributions is the core of the present study. The proposed algorithms are tested and compared with delay distributions obtained through Monte Carlo simulations, which is the standard verification procedure for this class of problems.

Keywords: Timing analysis, Statistical static timing analysis, Delay propagation, Uncertainty, Non-Gaussian, Graph optimisation

Introduction

The decrease of the feature size of modern Very Large Scale Integration (VLSI) design and the increase of the transistor count on a single chip inevitably approaches us to the end of Moore’s law. Recently, Integrated Circuit (IC) technology has already reached the 5 nm technological node. At such scales, the role of uncertainty during the manufacturing process arises naturally due to physical fluctuations of various parameters such as the transistor channel width, its, length, etc. The design verification for a VLSI circuit has now become even more important since the complexity of design increases, which inevitably increase their cost. The standard way of circuit verification is to perform the timing analysis of a design [1, 7, 11, 13]. The most reliable analysis is done by running Monte Carlo (MC) simulations that consider all possible variations of every parameter in a system. However, such computations can last for weeks for a single design. Thus, semi-analytical methods based on delay models have been developed and are used in addition of MC simulations.

Deterministic Static Timing Analysis (or simply STA) can effectively take into account systematic process variations, and it was a dominant tool for several decades. However, STA gives too pessimistic predictions of delays, which significantly increases the cost of a chip in attempt to mitigate predicted delays. Therefore, the need for handling random correlated processes have arisen, and statistical approaches, known as Statistical Static Timing Analysis (SSTA), have been developed [2, 5]. Within SSTA, all the delays in a system are treated as random variables (RVs) with some distributions. The goal of such an approach is to determine the mean value of the delay across selected paths and critical delays that may jeopardise the logic operation of the whole circuit. In addition, it is required to determine the standard deviation of the delay, its distribution, probability density function (PDF) and/or cumulative distribution function (CDF).

It is accepted now that the distribution of the delay generated by an individual logic gate is generally non-Gaussian [8]. A number of methods have been proposed to address this issue. However, treating non-Gaussian distributions of delays in a very large graph it still a challenge. Traditionally, approximations to the actual forms of distributions are used, as, for example, in studies [4, 15, 16], where the so-called canonical model of a delay has been proposed. Within this method, the delay is described as a linear function of parameter variations. Such approaches include (i) numerical approximations to the Inline graphic-operator (see Sect. 3 for the details on this issue) and/or linearisation of nonlinear functions, and (ii) approximations to the actual distributions with Gaussians. For example, paper [12] discusses another modification of the canonical form based on adding a quadratic term and using skew-normal distributions.

In this study, we propose a fast and accurate algorithm for determining delay distributions taking into account their non-Gaussian nature. The problem of finding delays in VLSI circuits is formulated as that of a graph optimisation. The speed of the algorithm execution is achieved by utilising previously obtained analytical results for delay propagation through one logic gate, which was proposed in [6]. The accuracy is achieved by preserving the shapes of non-Gaussian delay distribution while traversing the graph of a circuit. This requires presenting a PDF of a gate’s delay as a mixture of radial basis functions (RBFs) and solving the corresponding optimisation problem. The discussion on the methodology to handle non-Gaussian delay distributions is the core of the present study. The proposed optimisation strategies are incorporated in a traversal algorithm and tested and compared with delay distributions obtained through Monte Carlo simulations. The latter is the standard verification procedure for this class of problems.

The paper is organised as follows. In Sect. 2, we give a brief introduction to SSTA and discuss general statement of the problem. Section 3 discusses the model of delay propagation through a logic gate. The model allows us to built an algorithm for an accurate and fast calculation of the critical delay through a graph, which is summarised in Sect. 4. The key steps of the algorithm and the optimisation problem are discussed in Sect. 5. Section 6 concludes this paper with the verification of the algorithm via simulations and overall discussion.

Statement of the Problem

Definitions

Throughout the paper, we will use the following commonly accepted terminology [2]. A logic circuit can be represented as a timing graph G(EN) as illustrated in Fig. 1 where the graph and its paths are defined as follows.

Fig. 1.

Fig. 1.

Example logic circuit and its timing graph.

Definition 1

A timing graph G(EN) is an acyclic directed graph, where E and N are the sets of edges and nodes correspondingly. Nodes reflect pins (logic gates) of a circuit. The timing graph always has only one source and one sink. Edges are characterised by weights Inline graphic that describe delays.

The timing graph is called a statistical timing graph within SSTA when the edges of the graph are described by RVs. The task then is to determine the critical (longest) path.

Definition 2

Let Inline graphic be a path of ordered edges from the source to the sink in a timing graph G and let Inline graphic be the path length of Inline graphic. Then Inline graphic is referred to as the SSTA problem of a circuit.

Therefore, SSTA is aimed at determining the distribution of the circuit delay, which is equivalent to calculating the longest path in the graph formalism. Since SSTA operates with random variables, we will also use the following notation for the probability density function of the Gaussian distribution:

graphic file with name M7.gif 1

where Inline graphic and Inline graphic are the mean value and variance respectively, and Inline graphic is the Gaussian kernel function. The cumulative density function will be denoted as follows:

graphic file with name M11.gif 2

where Inline graphic is the error function.

Gate Level Analysis

We will start by explaining briefly how the overall delay is generated in a single logic gate. At the gate level, delay propagation is described by two operations: computing the maximum (Inline graphic) of two delays entering a gate and the summation of the latter with the delay of the gate. From the statistical point of view, when these operations are applied to RVs, the delay of a gate with two inputs reads:

graphic file with name M15.gif 3

where Inline graphic and Inline graphic are the RVs that describe the arrival times of input signals, and Inline graphic is the RV that gives the gate operation time. The operation of a logic gate in terms of the arrival and operation time distributions is presented in Fig. 2. Therefore, Eq. (3) is the convolution of the Inline graphic and Inline graphic probability density functions. The analytical solution to combination (3) exists in a limited number of cases. We discuss these cases in the next section.

Fig. 2.

Fig. 2.

Illustration of delay propagation through a logic gate. At stage (a), two signals arrive at the input of the gate. At stage (b), the Inline graphic-operation is performed, which gives a skewed PDF. At the same time, the gate has its own operation time described by some distribution (c). Thus, the distribution of the gate delay (d) requires the convolution of the obtained distribution (b) and given (c). This convolution results in a new RV, which clearly has a non-Gaussian form.

Model for a Logic Gate Delay

Consider a logic gate with two inputs, A and B, and suppose that the gate operation time is distributed according to the normal law with a mean Inline graphic and a variance Inline graphic, i.e., it is a Gaussian RV. Assume now that the arrival times of both signals are also Gaussian RVs with means and variances Inline graphic, Inline graphic and Inline graphic, Inline graphic respectively. Even if the individual distribution in the example of Fig. 2 are Gaussian, the application of formula (3) leads to a non-Gaussian distribution of the delay at the gate output as shown in that figure.

In our previous work [6], the exact expression for the PDF of such an RV was presented:

graphic file with name M27.gif 4

where

graphic file with name M28.gif 5

Expression (4) does not take into account possible correlations between the arrival signals. This issue will not be addressed in this study. Instead, we are interested in demonstrating how this exact solution can speed up SSTA for a given graph keeping precision high. Formula (4) assumes all initial delays (arrival and gate itself) to have Gaussian distributions. In principle, both the arrival signal and gate delay do not have to be Gaussian. If they can be decomposed into a linear superposition of Gaussian kernel functions, the PDF of the gate output delay can be presented as a linear combination of expressions (4) due to the linearity of the integration operation. This idea constitutes the core of a delay propagation algorithm which we discuss in the next sections.

Delay Propagation Algorithm

The algorithm for the calculation of the delay propagation through a timing graph is outlined below. The high-level description of the algorithm is shown in Fig. 3. graphic file with name 500805_1_En_17_Figa_HTML.jpg

Fig. 3.

Fig. 3.

High-level diagram of the Algorithm flow.

This algorithm relies on the decomposition procedure. This procedure aims to represent a skewed (non-Gaussian) distribution with a mixture of RBFs that have Gaussian form, which allows one to use the result (4). In the next section, we discuss how such a decomposition can be performed.

Optimisation

The exact function (4) can be written as Inline graphic, a sum of RBFs; each of these RBFs has a Gaussian-like shape. In other words, it can be decomposed to a Gaussian mixture [9, 14]. The decomposition procedure is equivalent to fitting the actual PDF with a sum of RBFs, which brings us to an optimisation problem. In this study, we discuss the minimisation of the sum of squares of the residuals

graphic file with name M30.gif 6

subject to constraints (specified below). Here Inline graphic are the data points corresponding to the actual function Inline graphic that we want to fit.

Depending on the form of the RBFs, the minimisation of (6) can vary significantly, e.g., an approximate function Inline graphic can be either linearly or non-linearly dependent on the fitting parameters. We discuss two alternative approaches below.

Choice of the Cost Function

Let us consider the RBFs Inline graphic in the following form:

graphic file with name M35.gif 7

where Inline graphic are the weight coefficients, Inline graphic determine positions of the corresponding RBFs and Inline graphic are the shape parameters. The constraints on Inline graphic and Inline graphic are chosen so that the resulting mixture of RBFs has the meaning of a PDF. In the most general case, for m RBFs there are 3 m parameters to be determined from the solution to the least-squares problem (6).

Allowing all three parameters to vary, the fit function Inline graphic gives the desired shape using a small number m of RBFs. At the same time, obtaining a minimum of (6) in such a case is not a trivial problem even for a few RBFs: the optimisation solver may stuck in one of the local minima, which makes this approach less reliable.

One can simplify the problem by setting a 1D grid of RBFs with given parameters Inline graphic and Inline graphic, therefore, only the weights Inline graphic remain unknown. Note that we get only linear to the Gaussian kernels Inline graphic unknown parameters in such a case. To make the problem even simpler, let us fix the shape parameters Inline graphic for all kernels, Inline graphic, and let us distribute the kernels with equal separation Inline graphic. We shall call such mixtures of RBFs a Gaussian comb (see Fig. 4). At the same time, such a simplification will require more terms in a mixture of RBFs to get the desired shape.

Fig. 4.

Fig. 4.

A sketch of a Gaussian comb. For a given shape parameter c, the Gaussian kernels Inline graphic, being equally separated by Inline graphic, form a 1D grid, which we call a Gaussian comb. By adjusting the weights Inline graphic, the Gaussian comb can give various shapes. The black dashed line shows the resulting curve from a mixture of Gaussian kernels in a comb.

Thus, to construct the Gaussian comb, one needs to set up the number m of RBFs in the comb, the Gaussian comb step Inline graphic and the shape parameter c. The weight coefficients Inline graphic then to be determined from (6). In this study, we assume m and c are known a priori, and the step Inline graphic is determined as

graphic file with name M55.gif 8

where Inline graphic is the effective interval of values over which the function (PDF) must be fit (we call it bandwidth). The choice of the bandwidth is discussed below. In principle, the optimisation problem can be formulated in such a way that m and c are determined simultaneously with the weights Inline graphic, making the solution self-consistent. This will be reported elsewhere.

The distinguishable feature of such RBF decompositions is that one obtains simple functions that allow fast and simple computations of a mean and a variance for the PDF:

graphic file with name M58.gif 9

for a case of Gaussian comb, Inline graphic.

Comparison

For the sake of illustration, consider a problem of fitting the Gaussian PDF, Inline graphic, with only one RBF. If we pre-set the weight w, only two parameters, b and c, are left to find. Choosing Inline graphic, the parameters should be then exactly Inline graphic and Inline graphic. A contour plot of the cost function (6) for such a case is shown in Fig. 5 on the left. The cost function has only one minimum, which lies in a valley (the point with Inline graphic and Inline graphic). However, if we allow the weight w to vary, there will be 3 parameters to find and local minima appear. Increasing the number m of RBFs inevitably brings us to a problem of omitting such minima.

Fig. 5.

Fig. 5.

Cost functions for the least square problem of fitting a Gaussian PDF with Inline graphic and Inline graphic. Left: only one RBF is used; for Inline graphic, the minimum is located at Inline graphic and Inline graphic. Right: two RBFs are used; for Inline graphic, Inline graphic, Inline graphic and Inline graphic, the minimum is located at Inline graphic.

The Gaussian comb consist of two Gaussian kernels with Inline graphic and Inline graphic (Inline graphic and Inline graphic) leads to the cost function shown in Fig. 5 on the right. One can see that the minimum is located in the bottom of a steep well that rises dramatically as one goes away from the minimum. Thus, there will be no difficulties in finding the minimum in higher dimensions, moreover, this problem allows exact solution.

Consider another example. The PDF Inline graphic of the lognormal distribution reads

graphic file with name M83.gif 10

where Inline graphic and Inline graphic are the mean and variance of the corresponding normal distribution. The reciprocal relation between these Inline graphic and Inline graphic and the mean and variance of the lognormal distribution, Inline graphic and Inline graphic, is as follows:

graphic file with name M90.gif 11

Two different decompositions of (10) with Inline graphic and Inline graphic are shown in Fig. 6. One can see that both approaches give perfect result, while require different number of RBFs: (i) 3 RBFs with 3 fitting parameters each (Inline graphic, Inline graphic and Inline graphic) and (ii) 30 RBFs in the Gaussian comb with 30 fitting parameters (weights Inline graphic). The bandwidth is chosen as Inline graphic. We shall keep this choice for the bandwidth in this study although it is not optimal (distributions can be significantly skewed). This issue will be addressed in a separate study.

Fig. 6.

Fig. 6.

Case study: representation of the lognormal distribution with a Gaussian mixture. Two strategies are realised: (i) 3 RBFs with nonlinear to kernels fitting parameters and (ii) 30 RBFs forming the Gaussian comb with linear to kernels parameters. Both strategies give the perfect fit and desired values of Inline graphic and Inline graphic via the approximation (9).

In the next section, the proposed Algorithm with the decomposition strategies discussed above is tested and compared with the numerical experiments (Monte Carlo simulations). The goal is to proof the concept, determine possible issues and define future steps in the research.

Verification and Discussion

Let us investigate whether an error in computing the delays’ PDF is accumulating or not when the decomposition into Gaussian kernels is used. To do so, we consider a model circuit shown in Fig. 7. The inputs’ delays Inline graphic (Inline graphic) are described by the corresponding RVs, Inline graphic. For simplicity (and without any loss of generality), the operation time of the gates is considered to be the same, thus, described by an RV Inline graphic. From the mathematical point of view, the forward traversing of such a sequence of gates is equivalent to computing the chained expressions of type (3). Thus, for the nth gate we have

graphic file with name M102.gif 12

We have conducted a series of runs of the Algorithm and MC simulations for the sequence of Inline graphic (the source code is available from [10]). We have chosen the initial delays Inline graphic to be distributed as Inline graphic. The values for the inputs’ means and standard deviations, Inline graphic and Inline graphic, were randomly drawn from Inline graphic and Inline graphic respectively for each run. One of the realisations of the experiments is shown in Fig. 8. Since the absolute values of delays are not important in the present study, the performance of the algorithm is measured by relative errors in the mean values and standard deviations of delays with respect to the Monte Carlo simulations.

Fig. 7.

Fig. 7.

A sequence of logic gates used in simulations.

Fig. 8.

Fig. 8.

Results for the Gaussian comb approach (Inline graphic kernels with Inline graphic): the relative error in determination of the mean and standard deviation with respect to the Monte Carlo values versus number of gates passed in the sequence. When the bandwidth Inline graphic becomes large, the topology of the Gaussian comb is no longer optimal and the comb sprawls (node 19). For the details of the simulations see the text.

For the Gaussian comb, Inline graphic kernels have used with the shape parameter Inline graphic. The relative error is less than 0.01Inline graphic and remains at that level until it starts to grow dramatically (see Fig. 8). This occurs when the chosen topology of the comb becomes non-optimal, as it is shown for node 19. Also note that the relative error in the standard deviation increased faster than that for the mean, which is expected.

For the case of 3 RBFs with non-linear fitting parameters, the algorithm has given poor performance. As is discussed in the previous section, the optimisation problem (6) in this case is sensitive to small deviations in an initial guess or change of the bandwidth Inline graphic. Thus, it has not been possible to finish the traversal of the graph successfully (a Trust Region Algorithm [3] for the constrained optimisation was used), and the results for this approach are not presented.

The obtained results allow us to conclude that the problem of VLSI circuit delay indeed can be solved using (i) exact solution for a single gate’s delay PDF and (ii) decomposition of non-Gaussian functions into Gaussian mixtures. The detailed discussion and conclusions are as follows.

  • (i)

    The exact formula for an output logic gate delay, the convolution of Inline graphic Inline graphic and Inline graphic for Gaussian RVs Inline graphic (Inline graphic), allows one to build a closed-loop algorithm for forward traversal of a delay through a timing graph G. The requirement for this is that non-Gaussian PDFs of delays are presented via Gaussian mixtures, sums of RBFs of Gaussian form. The decomposition is equivalent to solving the minimisation problem (6). We have considered two different strategies for this problem.

  • (ii)

    Within the first strategy, for m RBFs it is required to determine 3m parameters, Inline graphic of which are coefficients in the arguments of the RBFs. The advantage is that only a few of RBFs is enough to obtain a fit with a desired accuracy, but the drawback is sensitivity to small changes in the parameters such as initial guess, choice of bandwidth Inline graphic, etc.

  • (iii)

    The second strategy relies on a pre-set grid of equally separated Gaussian kernels Inline graphic with the same shape parameter, a Gaussian comb. In such a case, only coefficients that are linear multipliers to RBFs should be determined. The obvious advantage of this approach is that the optimisation problem allows the exact solution, however the required number m of RBFs increases dramatically, which increases (linearly to m) computation costs.

  • (iv)

    The comparison with MC simulations shown in Fig. 8 proofs the concept: the timing graph can be forward traversed with the relative error less than Inline graphic by decomposing real PDFs into corresponding Gaussian mixtures at each node. However, when the topology of the Gaussian comb is fixed, it leads to sprawling of the latter as the bandwidth Inline graphic becomes large.

  • (v)

    In principle, the optimisation problem of finding the weights Inline graphic for the Gaussian comb can be solved together with the problem of finding optimal number Inline graphic and shape parameter Inline graphic of kernels in the comb. This should not only prevent the comb sprawling but also speed up the graph traversal procedure noticeably. At the same time, the optimisation problem for the 3-RBF case should be analysed rigorously to avoid slipping in local minima. This can be an alternative to the Gaussian comb decomposition. These issues will be addressed in a separate study.

Acknowledgement

This work has emanated from research supported in part by Synopsys, Ireland, and a research grant from Science Foundation Ireland (SFI) and is co-funded under the European Regional Development Fund under Grant Number 13/RC/2077.

Contributor Information

Valeria V. Krzhizhanovskaya, Email: V.Krzhizhanovskaya@uva.nl

Gábor Závodszky, Email: G.Zavodszky@uva.nl.

Michael H. Lees, Email: m.h.lees@uva.nl

Jack J. Dongarra, Email: dongarra@icl.utk.edu

Peter M. A. Sloot, Email: p.m.a.sloot@uva.nl

Sérgio Brissos, Email: sergio.brissos@intellegibilis.com.

João Teixeira, Email: joao.teixeira@intellegibilis.com.

Dmytro Mishagli, Email: dmytro.mishagli@gmail.com.

Elena Blokhina, Email: elena.blokhina@ucd.ie.

References

  • 1.Bhasker J, Chadha R. Static Timing Analysis for Nanometer Designs. A Practical Approach. Heidelberg: Springer; 2009. [Google Scholar]
  • 2.Blaauw D, Chopra K, Srivastava A, Scheffer L. Statistical timing anlaysis: from basic principles to state of the art. IEEE Trans. Comput.-Aided Des. Integr. Circ. Syst. 2008;4(8):589–607. doi: 10.1109/TCAD.2007.907047. [DOI] [Google Scholar]
  • 3.Byrd RH, Schnabel RB, Shultz GA. A trust region algorithm for nonlinearly constrained optimization. SIAM J. Numer. Anal. 1987;24(5):1152–1170. doi: 10.1137/0724076. [DOI] [Google Scholar]
  • 4.Chang, H., Zolotov, V., Narayan, S., Visweswariah, C.: Parameterized block-based statistical timing analysis with non-Gaussian parameters, nonlinear delay functions. In: Proceedings of DAC, pp. 71–76, June 2005. 10.1145/1065579.1065604
  • 5.Forzan, C., Pandini, D.: Statistical static timing analysis: a survey. Integr. VLSI J. 42(3), 409–435 (2009). 10.1016/j.vlsi.2008.10.002, special Section on DCIS2006
  • 6.Freeley, J., Mishagli, D., Brazil, T., Blokhina, E.: Statistical simulations of delay propagation in large scale circuits using graph traversal and kernel function decomposition. In: Proceedings of SMACD, July 2018
  • 7.Gerez SH, editor. Algorithms for VLSI Design Automation. Hoboken: Wiley; 1998. [Google Scholar]
  • 8.Lavagno L, Markov IL, Martin G, Scheffer LK, editors. Electronic Design Automation for IC Implementation, Circuit Design, and Process Technology. Boca Raton: CRC Press; 2016. [Google Scholar]
  • 9.McLachlan G, Peel D. Finite Mixture Models. Hoboken: Wiley; 2000. [Google Scholar]
  • 10.Mishagli, D.: RBF approximation for non-Gaussian delay propagation in VLSI: Code (2020). 10.5281/zenodo.3749750
  • 11.Orshansky M, Nassif S, Boning D. Design for Manufacturability and Statistical Design: A Constructive Approach. Heidelberg: Springer; 2008. [Google Scholar]
  • 12.Ramprasath S, Vijaykumar M, Vasudevan V. A skew-normal canonical model for statistical static timing analysis. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2016;24(6):2359–2368. doi: 10.1109/TVLSI.2015.2501370. [DOI] [Google Scholar]
  • 13.Sapatnekar S. Timing. Heidelberg: Springer; 2004. [Google Scholar]
  • 14.Titterington, D., Smith, A., Makov, U.: Statistical Analysis of Finite Mixture Distributions. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. Wiley, Hoboken (1985)
  • 15.Visweswariah, C., Ravindran, K., Kalafala, K., Walker, S.G., Narayan, S.: First-order incremental block-based statistical timing analysis, pp. 331–336 (2004). 10.1145/996566.996663
  • 16.Visweswariah, C., et al.: First-order incremental block-based statistical timing analysis. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 25(10), 2170–2180 (2006). 10.1109/TCAD.2005.862751

Articles from Computational Science – ICCS 2020 are provided here courtesy of Nature Publishing Group

RESOURCES