Entropy. 2020 Aug 8;22(8):874. doi: 10.3390/e22080874

Some Dissimilarity Measures of Branching Processes and Optimal Decision Making in the Presence of Potential Pandemics

Niels B Kammerer 1, Wolfgang Stummer 2,*
PMCID: PMC7517477  PMID: 33286645

Abstract

We compute exact values respectively bounds of dissimilarity/distinguishability measures, in the sense of the Kullback-Leibler information distance (relative entropy) and some transforms of more general power divergences and Rényi divergences, between two competing discrete-time Galton-Watson branching processes with immigration GWI for which the offspring as well as the immigration (importation) are arbitrarily Poisson-distributed; in particular, we allow for arbitrary types of extinction-concerning criticality and thus for non-stationarity. We apply this to optimal decision making in the context of the spread of potentially pandemic infectious diseases (such as e.g., the current COVID-19 pandemic), e.g., covering different levels of dangerousness and different kinds of intervention/mitigation strategies. Asymptotic distinguishability behaviour and diffusion limits are investigated, too.

Keywords: Galton-Watson branching processes with immigration, Hellinger integrals, power divergences, Kullback-Leibler information distance/divergence, relative entropy, Rényi divergences, epidemiology, COVID-19 pandemic, Bayesian decision making, INARCH(1) model, GLM model, Bhattacharyya coefficient/distance


Contents

1 Introduction
2 The Framework and Application Setups
  2.1 Process Setup
  2.2 Connections to Time Series of Counts
  2.3 Applicability to Epidemiology
  2.4 Information Measures
  2.5 Decision Making under Uncertainty
  2.6 Asymptotical Distinguishability
3 Detailed Recursive Analyses of Hellinger Integrals
  3.1 A First Basic Result
  3.2 Some Useful Facts for Deeper Analyses
  3.3 Detailed Analyses of the Exact Recursive Values, i.e., for the Cases (βA,βH,αA,αH) ∈ PNI ∪ PSP,1
  3.4 Some Preparatory Basic Facts for the Remaining Cases (βA,βH,αA,αH) ∈ PSP \ PSP,1
  3.5 Lower Bounds for the Cases (βA,βH,αA,αH,λ) ∈ (PSP \ PSP,1) × ]0,1[
  3.6 Goals for Upper Bounds for the Cases (βA,βH,αA,αH,λ) ∈ (PSP \ PSP,1) × ]0,1[
  3.7 Upper Bounds for the Cases (βA,βH,αA,αH,λ) ∈ PSP,2 × ]0,1[
  3.8 Upper Bounds for the Cases (βA,βH,αA,αH,λ) ∈ PSP,3a × ]0,1[
  3.9 Upper Bounds for the Cases (βA,βH,αA,αH,λ) ∈ PSP,3b × ]0,1[
  3.10 Upper Bounds for the Cases (βA,βH,αA,αH,λ) ∈ PSP,3c × ]0,1[
  3.11 Upper Bounds for the Cases (βA,βH,αA,αH,λ) ∈ PSP,4a × ]0,1[
  3.12 Upper Bounds for the Cases (βA,βH,αA,αH,λ) ∈ PSP,4b × ]0,1[
  3.13 Concluding Remarks on Alternative Upper Bounds for all Cases (βA,βH,αA,αH,λ) ∈ (PSP \ PSP,1) × ]0,1[
  3.14 Intermezzo 1: Application to Asymptotical Distinguishability
  3.15 Intermezzo 2: Application to Decision Making under Uncertainty
    3.15.1 Bayesian Decision Making
    3.15.2 Neyman-Pearson Testing
  3.16 Goals for Lower Bounds for the Cases (βA,βH,αA,αH,λ) ∈ (PSP \ PSP,1) × (R \ [0,1])
  3.17 Lower Bounds for the Cases (βA,βH,αA,αH,λ) ∈ PSP,2 × (R \ [0,1])
  3.18 Lower Bounds for the Cases (βA,βH,αA,αH,λ) ∈ PSP,3a × (R \ [0,1])
  3.19 Lower Bounds for the Cases (βA,βH,αA,αH,λ) ∈ PSP,3b × (R \ [0,1])
  3.20 Lower Bounds for the Cases (βA,βH,αA,αH,λ) ∈ PSP,3c × (R \ [0,1])
  3.21 Lower Bounds for the Cases (βA,βH,αA,αH,λ) ∈ PSP,4a × (R \ [0,1])
  3.22 Lower Bounds for the Cases (βA,βH,αA,αH,λ) ∈ PSP,4b × (R \ [0,1])
  3.23 Concluding Remarks on Alternative Lower Bounds for all Cases (βA,βH,αA,αH,λ) ∈ (PSP \ PSP,1) × (R \ [0,1])
  3.24 Upper Bounds for the Cases (βA,βH,αA,αH,λ) ∈ (PSP \ PSP,1) × (R \ [0,1])
4 Power Divergences of Non-Kullback-Leibler-Information-Divergence Type
  4.1 A First Basic Result
  4.2 Detailed Analyses of the Exact Recursive Values of Iλ(·∥·), i.e., for the Cases (βA,βH,αA,αH,λ) ∈ (PNI ∪ PSP,1) × (R \ {0,1})
  4.3 Lower Bounds of Iλ(·∥·) for the Cases (βA,βH,αA,αH,λ) ∈ (PSP \ PSP,1) × ]0,1[
  4.4 Upper Bounds of Iλ(·∥·) for the Cases (βA,βH,αA,αH,λ) ∈ (PSP \ PSP,1) × ]0,1[
  4.5 Lower Bounds of Iλ(·∥·) for the Cases (βA,βH,αA,αH,λ) ∈ (PSP \ PSP,1) × (R \ [0,1])
  4.6 Upper Bounds of Iλ(·∥·) for the Cases (βA,βH,αA,αH,λ) ∈ (PSP \ PSP,1) × (R \ [0,1])
  4.7 Applications to Bayesian Decision Making
5 Kullback-Leibler Information Divergence (Relative Entropy)
  5.1 Exact Values Respectively Upper Bounds of I(·∥·)
  5.2 Lower Bounds of I(·∥·) for the Cases (βA,βH,αA,αH) ∈ (PSP \ PSP,1)
  5.3 Applications to Bayesian Decision Making
6 Explicit Closed-Form Bounds of Hellinger Integrals
  6.1 Principal Approach
  6.2 Explicit Closed-Form Bounds for the Cases (βA,βH,αA,αH,λ) ∈ (PNI ∪ PSP,1) × (R \ {0,1})
  6.3 Explicit Closed-Form Bounds for the Cases (βA,βH,αA,αH,λ) ∈ (PSP \ PSP,1) × ]0,1[
  6.4 Explicit Closed-Form Bounds for the Cases (βA,βH,αA,αH,λ) ∈ (PSP \ PSP,1) × (R \ [0,1])
  6.5 Totally Explicit Closed-Form Bounds
  6.6 Closed-Form Bounds for Power Divergences of Non-Kullback-Leibler-Information-Divergence Type
  6.7 Applications to Decision Making
7 Hellinger Integrals and Power Divergences of Galton-Watson Type Diffusion Approximations
  7.1 Branching-Type Diffusion Approximations
  7.2 Bounds of Hellinger Integrals for Diffusion Approximations
  7.3 Bounds of Power Divergences for Diffusion Approximations
  7.4 Applications to Decision Making
A Proofs and Auxiliary Lemmas
  A.1 Proofs and Auxiliary Lemmas for Section 3
  A.2 Proofs and Auxiliary Lemmas for Section 5
  A.3 Proofs and Auxiliary Lemmas for Section 6
  A.4 Proofs and Auxiliary Lemmas for Section 7
References

1. Introduction

(This paper is a thoroughly revised, extended and retitled version of the preprint arXiv:1005.3758v1 of both authors) Over the past twenty years, density-based divergences D(P,Q) –also known as (dis)similarity measures, directed distances, disparities, distinguishability measures, proximity measures–between probability distributions P and Q, have turned out to be of substantial importance for decisive statistical tasks such as parameter estimation, testing for goodness-of-fit, Bayesian decision procedures, change-point detection, clustering, as well as for other research fields such as information theory, artificial intelligence, machine learning, signal processing (including image and speech processing), pattern recognition, econometrics, and statistical physics. For some comprehensive overviews on the divergence approach to statistics and probability, the reader is referred to the insightful books of e.g., Liese & Vajda [1], Read & Cressie [2], Vajda [3], Csiszár & Shields [4], Stummer [5], Pardo [6], Liese & Miescke [7], Basu et al. [8], Voinov et al. [9], the survey articles of e.g., Liese & Vajda [10], Vajda & van der Meulen [11], the structure-building papers of Stummer & Vajda [12], Kißlinger & Stummer [13] and Broniatowski & Stummer [14], and the references therein. Divergence-based bounds of minimal mean decision risks (e.g., Bayes risks in finance) can be found e.g., in Stummer & Vajda [15] and Stummer & Lao [16].

Amongst the above-mentioned dissimilarity measures, an important omnipresent subclass are the so-called f-divergences of Csiszár [17], Ali & Silvey [18] and Morimoto [19]; important special cases thereof are the total variation distance and the very frequently used λ-order power divergences Iλ(P,Q) (also known as alpha-entropies, Cressie-Read measures, Tsallis cross-entropies) with λ ∈ R. The latter cover e.g., the very prominent Kullback-Leibler information divergence I1(P,Q) (also called relative entropy), the (squared) Hellinger distance I1/2(P,Q), as well as the Pearson chi-square divergence I2(P,Q). It is well known that the power divergences can be built with the help of the λ-order Hellinger integrals Hλ(P,Q) (where e.g., the case λ=1/2 corresponds to the well-known Bhattacharyya coefficient), which are information measures of interest in their own right and which are also the crucial ingredients of the λ-order Rényi divergences Rλ(P,Q) (see e.g., Liese & Vajda [1], van Erven & Harremoes [20]); the case R1/2(P,Q) corresponds to the well-known Bhattacharyya distance.

The above-mentioned information/dissimilarity measures have also been investigated in non-static, time-dynamic frameworks such as for various different contexts of stochastic processes like processes with independent increments (see e.g., Newman [21], Liese [22], Memin & Shiryaev [23], Jacod & Shiryaev [24], Liese & Vajda [1], Linkov & Shevlyakov [25]), Poisson point processes (see e.g., Liese [26], Jacod & Shiryaev [24], Liese & Vajda [1]), diffusion processes and solutions of stochastic differential equations with continuous paths (see e.g., Kabanov et al. [27], Liese [28], Jacod & Shiryaev [24], Liese & Vajda [1], Vajda [29], Stummer [30,31,32], Stummer & Vajda [15]), and generalized binomial processes (see e.g., Stummer & Lao [16]); further related literature can be found e.g., in the references of the aforementioned papers and books.

Another important class of time-dynamic models is given by discrete-time integer-valued branching processes, in particular (Bienaymé-)Galton-Watson processes without immigration GW respectively with immigration (resp. importation, invasion) GWI, which have numerous applications in biotechnology, population genetics, internet traffic research, clinical trials, asset price modelling, derivative pricing, and many others. As far as important terminology is concerned, we subsume both models under the abbreviation GW(I), and write simply GWI whenever GW appears as a parameter special case of GWI; recall that a GW(I) is called subcritical, critical, or supercritical if its offspring mean is less than 1, equal to 1, or larger than 1, respectively.

For applications of GW(I) in epidemiology, see e.g., the works of Bartoszynski [33], Ludwig [34], Becker [35,36], Metz [37], Heyde [38], von Bahr & Martin-Löf [39], Ball [40], Jacob [41], Barbour & Reinert [42], and Section 1.2 of Britton & Pardoux [43]; for more details see Section 2.3 below.

For connections of GW(I) to time series of counts including GLM models, see e.g., Dion, Gauthier & Latour [44], Grunwald et al. [45], Kedem & Fokianos [46], Held, Höhle & Hofmann [47], and Weiß [48]; a more comprehensive discussion can be found in Section 2.2 below.

As far as the combined study of information measures and GW processes is concerned, let us first mention that (transforms of) power divergences have been used for supercritical Galton-Watson processes without immigration for instance as follows: Feigin & Passy [49] study the problem of finding an offspring distribution which is closest (in terms of a relative-entropy-type distance) to the original offspring distribution and under which ultimate extinction is certain. Furthermore, Mordecki [50] gives an equivalent characterization for the stable convergence of the corresponding log-likelihood process to a mixed Gaussian limit, in terms of conditions on Hellinger integrals of the involved offspring laws. Moreover, Sriram & Vidyashankar [51] study the properties of offspring-distribution parameters which minimize the squared Hellinger distance between the model offspring distribution and the corresponding non-parametric maximum likelihood estimator of Guttorp [52]. For the setup of GWI with Poisson offspring and nonstochastic immigration of constant value 1, Linkov & Lunyova [53] investigate the asymptotics of Hellinger integrals in order to deduce large deviation assertions in hypotheses testing problems.

In contrast to the above-mentioned contexts, this paper pursues the following main goals:

  • (MG1)

    for any time horizon and any criticality scenario (allowing for non-stationarities), to compute lower and upper bounds, and sometimes even exact values, of the Hellinger integrals Hλ(PA∥PH), power divergences Iλ(PA∥PH) and Rényi divergences Rλ(PA∥PH) of two alternative Galton-Watson branching processes PA and PH (on path/scenario space), where (i) PA has Poisson(βA) distributed offspring as well as Poisson(αA) distributed immigration, and (ii) PH has Poisson(βH) distributed offspring as well as Poisson(αH) distributed immigration; the non-immigration cases are covered as αA=αH=0; as a side effect, we also aim for corresponding asymptotic distinguishability results;

  • (MG2)

    to compute the corresponding limit quantities for the context in which (a proper rescaling of) the two alternative Galton-Watson processes with immigration converge to Feller-type branching diffusion processes, as the time-lags between the generation-size observations tend to zero;

  • (MG3)

    as an exemplary field of application, to indicate how to use the results of (MG1) for Bayesian decision making in the epidemiological context of an infectious-disease pandemic (e.g., the current COVID-19), where e.g., potential state-budgetary losses can be controlled by alternative public policies (such as e.g., different degrees of lockdown) for mitigations of the time-evolution of the number of infectious persons (being quantified by a GW(I)). Corresponding Neyman-Pearson testing will be treated, too.

Because of the involved Poisson distributions, these goals can be tackled with a high degree of tractability, which is worked out in detail with the following structure (see also the full table of contents given above): in Section 2, we first introduce (i) the basic ingredients of Galton-Watson processes together with their interpretations in the above-mentioned pandemic setup where it is essential to study all types of criticality (being connected with levels of reproduction numbers), (ii) the employed fundamental information measures such as Hellinger integrals, power divergences and Rényi divergences, (iii) the underlying decision-making framework, as well as (iv) connections to time series of counts and asymptotical distinguishability. Thereafter, we start our detailed technical analyses by giving recursive exact values respectively recursive bounds, as well as their applications, of Hellinger integrals Hλ(PA∥PH) (see Section 3), power divergences Iλ(PA∥PH) and Rényi divergences Rλ(PA∥PH) (see Section 4 and Section 5). Explicit closed-form bounds of Hellinger integrals Hλ(PA∥PH) will be worked out in Section 6, whereas Section 7 deals with Hellinger integrals and power divergences of the above-mentioned Galton-Watson type diffusion approximations.

2. The Framework and Application Setups

2.1. Process Setup

We investigate dissimilarity measures and apply them to decisions, in the following context. Let the integer-valued random variable Xn (n ∈ N0) denote the size of the n-th generation of a population (of persons, organisms, spreading news, other kinds of objects, etc.) with specified characteristics, and suppose that for the modelling of the time-evolution n ↦ Xn we have the choice between the following two (e.g., alternative, competing) models (H) and (A):

(H) a discrete-time homogeneous Galton-Watson process with immigration GWI, given by the recursive description

$$X_0 \in \mathbb{N}; \qquad \mathbb{N}_0 \ni X_n = \sum_{k=1}^{X_{n-1}} Y_{n-1,k} + \widetilde{Y}_n, \qquad n \in \mathbb{N}, \tag{1}$$

where Yn−1,k is the number of offspring of the k-th object (e.g., organism, person) within the (n−1)-th generation, and Ỹn denotes the number of immigrating objects in the n-th generation. Notice that we employ an arbitrary deterministic (i.e., degenerate random) initial generation size X0. We always assume that under the corresponding dynamics-governing law PH

  • (GWI1)

    the collection Y := {Yn−1,k : n ∈ N, k ∈ N} consists of independent and identically distributed (i.i.d.) random variables which are Poisson distributed with parameter βH > 0,

  • (GWI2)

    the collection Ỹ := {Ỹn : n ∈ N} consists of i.i.d. random variables which are Poisson distributed with parameter αH ≥ 0 (where αH = 0 stands for the degenerate case of having no immigration),

  • (GWI3)

    Y and Ỹ are independent.

(A) a discrete-time homogeneous Galton-Watson process with immigration GWI given by the same recursive description (1), but with different dynamics-governing law PA under which (GWI1) holds with parameter βA > 0 (instead of βH > 0), (GWI2) holds with αA ≥ 0 (instead of αH ≥ 0), and (GWI3) holds. As a side remark, in some contexts the two models (H) and (A) may function as a "sandwich" of a more complicated, not fully known model.
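For readers who prefer a computational view, the following minimal Python sketch simulates one path of a Poissonian GWI according to the recursion (1) under (GWI1)-(GWI3); the function name and the concrete parameter values are illustrative choices only (they echo the epidemiological example values used later in Section 2.5), not prescriptions from the paper.

import numpy as np

def simulate_poisson_gwi(beta, alpha, x0, n_steps, rng):
    """Simulate X_0,...,X_{n_steps} of a Poissonian GWI, cf. recursion (1):
    the X_{n-1} individuals produce i.i.d. Poisson(beta) offspring (GWI1), and
    an independent Poisson(alpha) number of immigrants arrives (GWI2), (GWI3)."""
    path = [x0]
    x = x0
    for _ in range(n_steps):
        # the sum of X_{n-1} i.i.d. Poisson(beta) variables is Poisson(beta * X_{n-1})
        offspring = rng.poisson(beta * x)
        immigration = rng.poisson(alpha) if alpha > 0 else 0
        path.append(offspring + immigration)
        x = path[-1]
    return np.array(path)

rng = np.random.default_rng(0)
path_H = simulate_poisson_gwi(beta=0.5, alpha=0.2, x0=10, n_steps=20, rng=rng)   # model (H)
path_A = simulate_poisson_gwi(beta=2.0, alpha=10.0, x0=10, n_steps=20, rng=rng)  # model (A)
print(path_H)
print(path_A)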

Basic and advanced facts on general GWI (introduced by Heathcote [54]) can be found e.g., in the monographs of Athreya & Ney [55], Jagers [56], Asmussen & Hering [57], Haccou [58]; see also e.g., Heyde & Seneta [59], Basawa & Rao [60], Basawa & Scott [61], Sankaranarayanan [62], Wei & Winnicki [63], Winnicki [64], Guttorp [52] as well as Yanev [65] (and also the references therein) for adjacent fundamental statistical issues including the involved technical and conceptual challenges.

For the sake of brevity, wherever we introduce or discuss corresponding quantities simultaneously for both models H and A, we will use the subscript • as a synonym for either the symbol H or A. For illustration, recall the well-known fact that the corresponding conditional probabilities P•(Xn = · | Xn−1 = k) are again Poisson distributed, with parameter β•·k + α•.

In order to achieve a transparently representable structure of our results, we subsume the involved parameters as follows:

  • (PS1)

    PSP is the set of all constellations (βA,βH,αA,αH) of real-valued parameters βA > 0, βH > 0, αA > 0, αH > 0, such that βA ≠ βH or αA ≠ αH (or both); in other words, both models are non-identical and have non-vanishing immigration;

  • (PS2)

    PNI is the set of all constellations (βA,βH,αA,αH) of real-valued parameters βA > 0, βH > 0, αA = αH = 0, such that βA ≠ βH; this corresponds to the important special case that both models have no immigration and are non-identical;

  • (PS3)

    the resulting disjoint union will be denoted by P = PSP ∪ PNI.

Notice that for (unbridgeable) technical reasons, we do not allow for "crossovers" between "immigration and no-immigration" (i.e., αA = 0 and αH ≠ 0, respectively αA ≠ 0 and αH = 0). In practice, this is not a strong restriction, since one may take e.g., αA = 10⁻¹² and αH = 1.

For the non-immigration case α• = 0 one has the following extinction properties (see e.g., Harris [66], Athreya & Ney [55]). As usual, let us define the extinction time τ := min{i ∈ N : Xj = 0 for all integers j ≥ i} if this minimum exists, and τ := ∞ else. Correspondingly, let B := {τ < ∞} be the extinction set. If the offspring mean β• satisfies β• < 1 (the subcritical case) or β• = 1 (the critical case), then extinction is certain, i.e., there holds P•(B | X0 = 1) = 1. However, if the offspring mean satisfies β• > 1 (the supercritical case), then there is a probability greater than zero that the population never dies out, i.e., P•(B | X0 = 1) ∈ ]0,1[. In the latter case, Xn explodes (a.s.) to infinity as n → ∞.

In contrast, for the (nondegenerate, nonvanishing) immigration case α• ≠ 0 there is no extinction, viz. P•(B | X0 = 1) = 0, although the population may be zero, Xn0 = 0, at some intermediate time n0 ∈ N; but due to the immigration, with probability one there is always a later time n1 > n0 such that Xn1 > 0. Nevertheless, also for the setup α• ≠ 0 it is important to know whether β• is smaller than, equal to, or larger than 1 (which is still called sub-, respectively super-, criticality), since e.g., in the case β• < 1 the population size Xn converges (as n → ∞) to a stationary distribution on N, whereas for β• > 1 the behaviour is non-stationary (non-ergodic), see e.g., Athreya & Ney [55].

At this point, let us emphasize that in our investigations (both for α• = 0 and for α• ≠ 0) we do allow for "crossovers" between "different criticalities", i.e., we deal with all cases βA < 1, βA = 1, βA > 1 versus all cases βH < 1, βH = 1, βH > 1; as will be explained in the following, this unifying flexibility is especially important for corresponding epidemiological-model comparisons (e.g., for the sake of decision making).

One of our main goals is to quantitatively compare (the time-evolution of) two competing GWI models H and A with respective parameter sets (βH,αH) and (βA,αA), in terms of the information measures Hλ(PA∥PH) (Hellinger integrals), Iλ(PA∥PH) (power divergences), Rλ(PA∥PH) (Rényi divergences). The latter two express a distance (degree of dissimilarity) between H and A. From this, we shall particularly derive applications for decision making under uncertainty (including tests).

2.2. Connections to Time Series of Counts

It is well known that a Galton-Watson process with Poisson offspring (with parameter β) and Poisson immigration (with parameter α) is “distributionally” equal to each of the following models (listed in “tree-type” chronological order):

  • (M1)

    a Poissonian Generalized Integer-valued Autoregressive process GINAR(1) in the sense of Gauthier & Latour [67] (see also Dion, Gauthier & Latour [44], Latour [68], as well as Grunwald et al. [45]), that is, a first-order autoregressive times series with Poissonian thinning (with parameter β) and Poissonian innovations (with parameter α);

  • (M2)

    a Poissonian first-order Conditional Linear Autoregressive model (Poissonian CLAR(1)) in the sense of Grunwald et al. [45] (and earlier preprints thereof) (since the conditional expectation is EP•[Xn | Fn−1] = α• + β•·Xn−1); this can be equally seen as a Poissonian autoregressive Generalized Linear Model GLM with identity link function (cf. [45] as well as Chapter 4 of Kedem & Fokianos [46]), that is, an autoregressive GLM with Poisson distribution as random component and the identity link as systematic component;

    the same model was used (and generalized)
    • (M2i)
      under the name BIN(1) by Rydberg & Shephard [69] for the description of the number Xn of stock transactions/trades recorded up to time n;
    • (M2ii)
      under the name Poisson autoregressive model PAR(1) by Brandt & Williams [70] for the description of event counts in political and other social science applications;
    • (M2iii)
      under the name Autoregressive Conditional Poisson model ACP(1,0) by Heinen [71];
    • (M2iv)
      by Held, Höhle & Hofmann [47] as well as Held et al. [72], as a description of the time-evolution of counts from infectious disease surveillance databases, where β (respectively, α) is interpreted as driving parameter of epidemic (respectively, endemic) component; in principle, this type of modelling can be also implicitly recovered as a special case of the epidemics-treating work of Finkenstädt, Bjornstad & Grenfell [73], by assuming trend- and season-neglecting (e.g., intra-year) measles data in urban areas of about 10 million people (provided that their population size approximation extends linearly);
    • (M2v)
      under the name integer-valued Generalized Autoregressive Conditional Heteroscedastic model INGARCH(1,0) by Ferland, Latour & Oraichi [74] (since the conditional variance is VarP•[Xn | Fn−1] = α• + β•·Xn−1), see also Weiß [75]; this was later more pointedly named the INARCH(1) model by Weiß [76,77], and frequently applied thereafter; for an "overlapping-generation type" interpretation of the INARCH(1) model, which is an adequate description for the time-evolution of overdispersed counts with an autoregressive serial dependence structure, see Weiß & Testik [78]; for a corresponding comprehensive recent survey (also covering more general count time series), the reader is referred to the book of Weiß [48];

Moreover, according to the general considerations of Grunwald et al. [45], the Poissonian Galton-Watson model with immigration may possibly be “distributionally equal” to an integer-valued autoregressive model with random coefficient (thinning).

Nowadays, besides the name homogeneous Galton-Watson model with immigration GWI, the name INARCH(1) seems to be the most used one, and we follow this terminology (with emphasis on GWI). Typical features of the above-mentioned models (M1) to (M2v) are the use of Z as the set of times, and the assumptions α• > 0 as well as β• ∈ ]0,1[, which guarantee stationarity and ergodicity (see above). In contrast, we employ N0 as the set of times, a degenerate (and thus, non-equilibrium) starting distribution, and arbitrary α• ≥ 0 as well as β• > 0. For such a situation, as explained above, we quantitatively compare two competing GWI models H and A with respective parameter sets (βH,αH) and (βA,αA). Since, as can be seen e.g., in (29) below, we basically employ only (conditionally) distributional ingredients, such as the corresponding likelihood ratio (see e.g., (13) to (15), (27) to (29) below), all the results of Section 3, Section 4, Section 5 and Section 6 can be immediately carried over to the above-mentioned time-series contexts (where we even allow for non-stationarities; in fact, we start with a one-point/Dirac distribution); for the sake of brevity, in the rest of the paper this will not be mentioned explicitly anymore.

Notice that a Poissonian GWI as well as all models (M1) and (M2) are typically overdispersed (despite their conditional Poisson law), since

$$E_{P_\bullet}[X_n] = \alpha_\bullet + \beta_\bullet\, E_{P_\bullet}[X_{n-1}] \;\le\; \alpha_\bullet + \beta_\bullet\, E_{P_\bullet}[X_{n-1}] + \beta_\bullet^{2}\, \mathrm{Var}_{P_\bullet}[X_{n-1}] = \mathrm{Var}_{P_\bullet}[X_n], \qquad n \in \mathbb{N}\setminus\{1\},$$

with equality iff (i.e., if and only if) α• = 0 (NI) and Xn−2 = 0 (extinction at time n−2, with n ≥ 3).
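As a quick numerical illustration of the overdispersion inequality above, one can iterate the moment recursions EP•[Xn] = α• + β•·EP•[Xn−1] and VarP•[Xn] = α• + β•·EP•[Xn−1] + β•²·VarP•[Xn−1] (the latter follows from the law of total variance and the conditional Poisson law); the Python sketch below, with arbitrarily chosen illustrative parameters, checks that the variance indeed dominates the mean for n ≥ 2.

def gwi_moment_recursion(beta, alpha, x0, n_steps):
    """Iterate E[X_n] = alpha + beta*E[X_{n-1}] and
    Var[X_n] = alpha + beta*E[X_{n-1}] + beta**2 * Var[X_{n-1}],
    starting from the deterministic initial generation size X_0 (variance 0)."""
    mean, var = float(x0), 0.0
    rows = []
    for n in range(1, n_steps + 1):
        var = alpha + beta * mean + beta ** 2 * var   # uses the old E[X_{n-1}]
        mean = alpha + beta * mean
        rows.append((n, mean, var))
    return rows

for n, m, v in gwi_moment_recursion(beta=0.5, alpha=0.2, x0=10, n_steps=6):
    print(n, round(m, 4), round(v, 4), v >= m)   # overdispersion: Var[X_n] >= E[X_n]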

2.3. Applicability to Epidemiology

The above-mentioned framework can be used for any of the numerous fields of applications of discrete-time branching processes, and of the closely related INARCH(1) models. For the sake of brevity, we explain this—as a kind of running-example—in detail for the currently highly important context of the epidemiology of infectious diseases. For insightful non-mathematical introductions to the latter, see e.g., Kaslow & Evans [79], Osterholm & Hedberg [80]; for a first entry as well as overviews on modelling, the reader is referred to e.g., Grassly & Fraser [81], Keeling & Rohani [82], Yan [83,84], Britton [85], Diekmann, Heesterbeek & Britton [86], Cummings & Lessler [87], Just et al. [88], Britton & Giardina [89], Britton & Pardoux [43]. A survey on the particular role of branching processes in epidemiology can be found e.g., in Jacob [41].

Undoubtedly, by nature, the spreading of an infectious disease through a (human, animal, plant) population is a branching process with possible immigration. Indeed, typically one has the following mechanism:

  • (D1)

    at some time tkE–called the time of exposure (moment of infection)—an individual k of a specified population is infected in a wide sense, i.e., entered/invaded/colonized by a number of transmissible disease-causative pathogens (etiologic agents such as viruses, bacteria, protozoans and other parasites, subviruses (e.g., prions and plant viroids), etc.); the individual is then a host (of pathogens);

  • (D2)

    depending on the level of immunity and some other factors, these pathogens may multiply/replicate within the host to an extent (over a threshold number) such that at time tkI some of the pathogens start to leave their host (shedding of pathogens); in other words, the individual k becomes infectious at the time tkI of onset of infectiousness. Ex post, one can then say that the individual became infected in the narrow sense at earlier time tkE and call it a primary case. The time interval [tkE,tkI[ is called the latent/latency/pre-infectious period of k, and tkItkE its duration (in some literature, there is no verbal distinction between them); notice that tkI may differ from the time tkOS of onset (first appearance) of symptoms, which leads to the so-called incubation period [tkE,tkOS[; if tkI<tkOS then [tkI,tkOS[ is called the pre-symptomatic period;

  • (D3)

    as long as the individual k stays infectious, by shedding of pathogens it may infect in a narrow sense a random number Yk ∈ N0 of other individuals which are susceptible (i.e., neither immune nor already infected in a narrow sense), where the distribution of Yk depends on the individual’s (natural, voluntary, forced) behaviour, its environment, as well as some other factors e.g., connected with the type of pathogen transmission; the newly infected individuals are called offspring of k, and secondary cases if they are from the same specified population or exportations if they are from a different population; from the view of the latter, these infections are imported cases and thus can be viewed as immigrants;

  • (D4)

    at the time tkR of cessation of infectiousness, the individual stops being infectious (e.g., because of recovery, death, or total isolation); the time interval [tkI,tkR[ is called the period of infectiousness (also period of communicability, infectious/infective/shedding/contagious period) of k, and tkRtkI its duration (in some literature, there is no verbal distinction between them); notice that tkR may differ from the time tkCS of cessation (last appearance) of symptoms which leads to the so-called sickness period [tkOS,tkCS[;

  • (D5)

    this branching mechanism continues within the specified population until there are no infectious individuals and also no importations anymore (eradication, full extinction, total elimination)– up to a specified final time (which may be large or even infinite);

All the above-mentioned times tk· and time intervals are random, by nature. Two further connected quantities are also important for modelling (see e.g., Yan & Chowell [84] (p. 241ff), including a history of the corresponding terminology). Firstly, the generation interval (generation time, transmission interval) is the time interval from the onset of infectiousness in a primary case (called the infector) to the onset of infectiousness in a secondary case (called the infectee) infected by the primary case; clearly, the generation interval is random, and so is its duration (often, the (population-)mean of the latter is also called generation interval). Typically, generation intervals are important ingredients of branching process models of infectious diseases. Secondly, the serial interval describes the time interval from the onset of symptoms in a primary case to the onset of symptoms in a secondary case infected by the primary case. By nature, the serial interval is random, and so is its duration (often, the (population-)mean of the latter is also called serial interval). Typically, the serial interval is easier to observe than the generation interval, and thus, the latter is often approximately estimated from data of the former. For further investigations on generation and serial intervals, the reader is referred to e.g., Fine [90], Svensson [91,92], Wallinga & Lipsitch [93], Forsberg White & Pagano [94], Nishiura [95], Scalia Tomba et al. [96], Trichereau et al. [97], Vink, Bootsma & Wallinga [98], Champredon & Dushoff [99], Just et al. [88], and, especially for the novel COVID-19 pandemic, An der Heiden & Hamouda [100], Ferretti et al. [101], Ganyani et al. [102], Li et al. [103], Nishiura, Linton & Akhmetzhanov [104], Park et al. [105].

With the help of the above-mentioned individual ingredients, one can aggregatedly build numerous different population-wide models of infectious diseases in discrete time as well as in continuous time; the latter are typically observed only in discrete-time steps (discrete-time sampling), and hence in the following we concentrate on discrete-time modelling (of the real or the observational process). In fact, we confine ourselves to the important task of modelling the evolution n ↦ Xn of the number of incidences at "stage" n, where incidence refers to the number of new infected/infectious individuals. Here, n may be a generation number where, inductively, n = 0 refers to the generation of the first appearing primary cases in the population (also called initial importations), and n refers to the generation of offspring of all individuals of generation n−1. Alternatively, n may be the index of a physical ("calendar") point of time tn, which may be deterministic or random; e.g., (tn)n∈N may be a strictly increasing series of (i) equidistant deterministic time points (and thus, one can identify tn = n in appropriate time units such as days, weeks, bi-weeks, months), or (ii) non-equidistant deterministic time points, or (iii) random time points (as a side remark, let us mention that in some situations, Xn may alternatively denote the number of prevalences at "stage" n, where prevalence refers to the total number of infected/infectious individuals (e.g., through some methodical tricks like "self-infection")).

In the light of this, one can loosely define an epidemic as the rapid spread of an infectious disease within a specified population, where the numbers Xn of incidences are high (or much higher than expected) for that kind of population. A pandemic is a geographically large-scale (e.g., multicontinental or worldwide) epidemic. An outbreak/onset of an epidemic in the narrow sense is the (time of) change where an infectious disease turns into an epidemic, which is typically quantified by exceedance over a threshold; analogously, an outbreak/onset of a pandemic is the (time of) change where the epidemic turns into a pandemic. Of course, one goal of infectious-disease modelling is to quantify "early enough" the potential danger of an emerging outbreak of an epidemic or a pandemic.

Returning to possible models of the incidence-evolution n ↦ Xn, its description may be theoretically derived from more detailed, time-finer, highly sophisticated, individual-based "mechanistic" infectious-disease models such as e.g., continuous-time susceptible-exposed-infectious-recovered (SEIR) models (see the above-mentioned introductory texts); however, as e.g., pointed out in Held et al. [72], the estimation of the correspondingly involved numerous parameters may be too ambitious for routinely collected, non-detailed disease data, such as e.g., daily/weekly counts Xn of incidences, especially in decisive emerging/early phases of a novel disease (such as the current COVID-19 pandemic). Accordingly, in the following we assume that Xn can be approximately described by a Poissonian Galton-Watson process with immigration respectively a ("distributionally equal") Poissonian autoregressive Generalized Linear Model in the sense of (M2). Depending on the situation, this can be quite reasonable, for the following arguments (apart from the usual "if the data say so"). Firstly, it is well known (see e.g., Bartoszynski [33], Ludwig [34], Becker [35,36], Metz [37], Heyde [38], von Bahr & Martin-Löf [39], Ball [40], Jacob [41], Barbour & Reinert [42], Section 1.2 of Britton & Pardoux [43]) that in populations with a relatively high number of susceptible individuals and a relatively low number of infectious individuals (e.g., in a large population and in decisive emerging/early phases of the disease spreading), the incidence-evolution n ↦ Xn can be well approximated by a (e.g., Poissonian) Galton-Watson process with possible immigration where n plays the role of a generation number. If the above-mentioned generation interval is "nearly" deterministic (leading to nearly synchronous, non-overlapping generations), which is the case e.g., for (phases of) Influenza A(H1N1)pdm09, Influenza A(H3N2), Rubella (cf. Vink, Bootsma & Wallinga [98]), and COVID-19 (cf. Ferretti et al. [101]), and the length of the generation interval is approximated by its mean length and the latter is tuned to be equal to the unit time between consecutive observations, then n plays the role of an observation (surveillance) time. This effect is even more realistic if the period of infectiousness is nearly deterministic and relatively short. Secondly, as already mentioned above, the spreading of an infectious disease is intrinsically a (not necessarily Poissonian Galton-Watson) branching mechanism, which may be blurred by other effects in a way that a Poissonian autoregressive Generalized Linear Model is still a reasonably fitting model for the observational process in disease surveillance. The latter has been used e.g., by Finkenstädt, Bjornstad & Grenfell [73], Held, Höhle & Hofmann [47], and Held et al. [72]; they all use non-constant parameters (e.g., to describe seasonal effects, which are however unknown in early phases of a novel infectious disease such as COVID-19). In contrast, we employ different new, namely divergence-based, statistical techniques, for which we assume constant parameters but also indicate procedures for the detection of changes; the extension to non-constant parameters is straightforward.

Returning to Galton-Watson processes, let us mention as a side remark that they can be also used to model the above-mentioned within-host replication dynamics (D2) (e.g., in the time-interval [tkE,tkI[ and beyond) on a sub-cellular level, see e.g., Spouge [106], as well as Taneyhill, Dunn & Hatcher [107] for parasitic pathogens; on the other hand, one can also employ Galton-Watson processes for quantifying snowball-effect (avalanche-effect, cascade-effect) type, economic-crisis triggered consequences of large epidemics and pandemics, such as e.g., the potential spread of transmissible (i) foreclosures of homes (cf. Parnes [108]), or clearly also (ii) company insolvencies, downsizings and credit-risk downgradings; moreover, the time-evolution of integer-valued indicators concerning the spread of (rational or unwarranted) fears resp. perceived threats may be modelled, too.

Summing up, we model the evolution n ↦ Xn of the number of incidences at stage n by a Poissonian Galton-Watson process with immigration GWI

$$X_0 \in \mathbb{N}; \qquad \mathbb{N}_0 \ni X_n = \sum_{k=1}^{X_{n-1}} Y_{n-1,k} + \widetilde{Y}_n, \qquad n \in \mathbb{N}, \qquad \text{cf. (1), (GWI1)--(GWI3), with law } P_\bullet,$$

(where Yn−1,k corresponds to the Yk of (D3), equipped with an additional stage-index n−1), respectively by a corresponding "distributionally equal", possibly non-stationary, Poissonian autoregressive Generalized Linear Model in the sense of (M2); depending on the situation, we may also fix a (deterministic or random) upper time horizon other than infinity. Recall that both models are overdispersed, which is consistent with the current debate on overdispersion in connection with the current COVID-19 pandemic. In infectious-disease language, the sum ∑_{k=1}^{Xn−1} Yn−1,k can also be loosely interpreted as the epidemic component (in a narrow sense) driven by the parameter β•, and Ỹn as the endemic component driven by the parameter α•. In fact, the offspring mean (here, β•) is called the reproduction number and plays a major role (also e.g., in the current public debate about the COVID-19 pandemic) because it crucially determines the rapidity of the spread of the disease and, as already indicated above in the second and third paragraph after (PS3), also the probability that the epidemic/pandemic becomes (maybe temporarily) extinct or at least stationary at a low level (that is, endemic). For this to happen, β• should be subcritical, i.e., β• < 1, and even better, close to zero. Of course, the size of the importation mean α• ≥ 0 matters, too, in a secondary order.
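To see quantitatively why the criticality of the reproduction number β• is so decisive here, one may iterate the implied mean recursion; the following small closed-form computation (a standard geometric-series argument, added purely for illustration) uses the deterministic initial value X0:

$$E_{P_\bullet}[X_n] = \alpha_\bullet + \beta_\bullet\, E_{P_\bullet}[X_{n-1}] = \beta_\bullet^{\,n}\, X_0 + \alpha_\bullet\, \frac{1-\beta_\bullet^{\,n}}{1-\beta_\bullet}, \qquad n \in \mathbb{N},\ \ \beta_\bullet \neq 1,$$

so that for β• < 1 the mean incidence level approaches the finite endemic level α•/(1 − β•) as n → ∞, whereas for β• > 1 it grows geometrically in n; in the boundary case β• = 1 one gets EP•[Xn] = X0 + n·α•, i.e., linear growth driven purely by importation.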

Keeping this in mind, let us discuss on which factors the reproduction number β• and the importation mean α• depend, and how they can be influenced/controlled. To begin with, by recalling the above-mentioned points (D1) to (D5) and by adapting the considerations of e.g., Grassly & Fraser [81] to our model, one encounters the fact that the distribution of the offspring Yn−1,k (here driven by the reproduction number, i.e., the offspring mean β•) depends on the following factors:

  • (B1)
    the degree of infectiousness of the individual k, with three major components:
    • (B1a)
      degree of biological infectiousness; this reflects the within-host dynamics (D2) of the “representative” individual k, in particular the duration and amount of the corresponding replication and shedding/excretion of the infectious pathogens; this degree depends thus on (i) the number of host-invading pathogens (called the initial infectious dose), (ii) the type of the pathogen with respect to e.g., its principal capabilities of replication speed, range of spread and drug-sensitivity, (iii) features of the immune system of the host k including the level of innate or acquired immunity, and (iv) the interaction between the genetic determinants of disease progression in both the pathogen and the host;
    • (B1b)
      degree of behavioural infectiousness; this depends on the contact patterns of an infected/infectious individual (and, if relevant, the contact patterns of intermediate hosts or vectors), in relation to the disease-specific type of route(s) of transmission of the infectious pathogens (for an overview of the latter, see e.g., Table 3 of Kaslow & Evans [79]); a long-distance-travel behaviour may also lead to the disease exportation to another, outside population (and thus, for the latter to a disease importation);
    • (B1c)
      degree of environmental infectiousness; this depends on the location and environment of the host k, which influences the duration of outside-host survival of the pathogens (and, if relevant, of the intermediate hosts or vectors) as well as the speed and range of their outside-host spread; for instance, high temperature may kill the pathogens, high airflow or rainfall dynamics may ease their spread, etc.
  • (B2)
    the degree of susceptibility of uninfected individuals who have contact with k, with the following three major components (with similar background as their infectiousness counterparts):
    • (B2a)
      degree of biological susceptibility;
    • (B2b)
      degree of behavioural susceptibility;
    • (B2c)
      degree of environmental susceptibility.

All these factors (B1a) to (B2c) can be principally influenced/controlled to a certain–respective–extent. Let us briefly discuss this for human infectious diseases, where one major goal of epidemic risk management is to operate countermeasures/interventions in order to slow down the disease transmission (e.g., by reducing the reproduction number β to less than 1) and eventually even break the chain of transmission, for the sake of containment or mitigation; preparedness and preparation are motives, too, for instance as a part of governmental pandemic risk management.

For instance, (B1a) can be reduced or even erased through pharmaceutical interventions such as medication (if available), and preventive strengthening of the immune system through non-extreme sports activities and healthy food.

Moreover, the following exemplary control measures for (B2) can be either put into action by common-sense self-behaviour, or by large-scale public recommendations (e.g., through mass media), or by rules/requirements from authorities:

  • (i)

    personal preventive measures such as frequent washing and disinfecting of hands; keeping hands away from face; covering coughs; avoidance of handshakes and hugs with non-family-members; maintaining physical distance (e.g., of two meters) from non-family-members; wearing a face-mask of respective security degree (such as homemade cloth face mask, particulate-filtering face-piece respirator, medical (non-surgical) mask, surgical mask); self-quarantine;

  • (ii)

    environmental measures, such as e.g., cleaning of surfaces;

  • (iii)

    community measures aimed at mild or stringent social distancing, such as e.g., prohibiting/cancelling/banning gatherings of more than z non-family members (e.g., z=2,5,10,100,1000 in various different phases and countries during the current COVID-19 pandemic); mask-wearing (see above); closing of schools, universities, some or even all nonessential (“system-irrelevant”) businesses and venues; home-officing/work ban; home isolation of disease cases; isolation of homes for the elderly/aged (nursing homes); stay-at-home orders with exemptions, household or even general quarantine; testing & tracing; lockdown of entire cities and beyond; restricting the degrees of travel freedom/allowed mobility (e.g., local, union-state, national, international including border and airport closure). The latter also affects the mean importation rate α, which can be controlled by vaccination programs in “outside populations”, too.

As far as the degree of biological susceptibility (B2a) is concerned, one obvious therapeutic countermeasure is a mass vaccination program/campaign (if available).

In case of highly virulent infectious diseases causing epidemics and pandemics with substantial fatality rates, some of the above-mentioned control strategies and countermeasures may (have to) be “drastic” (e.g., lockdown), and thus imply considerable social and economic costs, with a huge impact and potential danger of triggering severe social, economic and political disruptions.

In order to prepare corresponding suggestions for decisions about appropriate control measures (e.g., public policies), it is therefore important–especially for a novel infectious disease such as the current COVID-19 pandemic–to have a model for the time-evolution of the incidences in (i) a natural (basically uncontrolled) set-up, as well as in (ii) the control set-ups under consideration. As already mentioned above, we assume that all these situations can be distilled into an incidence evolution nXn which follows a Poissonian Galton-Watson process with respectively different parameter pairs (β,α). Correspondingly, we always compare two alternative models (H) and (A) with parameter pairs (βH,αH) and (βA,αA) which reflect either a “pure” statistical uncertainty (under the same uncontrolled or controlled set-up), or the uncertainty between two different potential control set-ups (for the sake of assessing the potential impact/efficiency of some planned interventions, compared with alternative ones); the economic impact can be also taken into account, within a Bayesian decision framework discussed in Section 2.5 below. As will be explained in the next subsections, we achieve such comparisons by means of density-based dissimilarity distances/divergences and related quantities thereof.

From the above-mentioned detailed explanations, it is immediately clear that for the described epidemiological context one should investigate all types of criticality and importation means for the therein involved two Poissonian Galton-Watson processes with/without immigration (respectively the equally distributed INARCH(1) models); in particular, this motivates (or even “justifies”) the necessity of the very lengthy detailed studies in the Section 3, Section 4, Section 5, Section 6 and Section 7 below.

2.4. Information Measures

Having two competing models (H) and (A) at stake, it makes sense to study questions such as "how far are they apart?" and thus "how dissimilar are they?". This can be quantified in terms of divergences in the sense of directed (i.e., not necessarily symmetric) distances, for which usually the triangular inequality fails. Let us first discuss our employed divergence subclasses in a general set-up of two equivalent probability measures PH, PA on a measurable space (Ω, F). In terms of the parameter λ ∈ R, the power divergences (also known as Cressie-Read divergences, relative Tsallis entropies, or generalized cross-entropy family) are defined as (see e.g., Liese & Vajda [1,10])

$$0 \le I_\lambda(P_A\|P_H) := \begin{cases} I(P_A\|P_H), & \text{if } \lambda = 1,\\[2pt] \dfrac{1}{\lambda(\lambda-1)}\big[H_\lambda(P_A\|P_H)-1\big], & \text{if } \lambda \in \mathbb{R}\setminus\{0,1\},\\[2pt] I(P_H\|P_A), & \text{if } \lambda = 0, \end{cases} \tag{2}$$

where

$$I(P_A\|P_H) := \int_\Omega p_A \,\log\frac{p_A}{p_H}\,\mathrm{d}\mu \;\ge\; 0 \tag{3}$$

is the Kullback-Leibler information divergence (also known as relative entropy) and

$$H_\lambda(P_A\|P_H) := \int_\Omega p_A^{\lambda}\, p_H^{1-\lambda}\,\mathrm{d}\mu \;\ge\; 0 \tag{4}$$

is the Hellinger integral of order λ ∈ R\{0,1}; for this, we assume as usual, without loss of generality, that the probability measures PH, PA are dominated by some σ-finite measure μ, with densities

$$p_A = \frac{\mathrm{d}P_A}{\mathrm{d}\mu} \qquad \text{and} \qquad p_H = \frac{\mathrm{d}P_H}{\mathrm{d}\mu} \tag{5}$$

defined on Ω (the zeros of pH, pA are handled in (3) and (4) with the usual conventions). Clearly, for λ ∈ {0,1} one trivially gets

$$H_0(P_A\|P_H) = H_1(P_A\|P_H) = 1.$$

The Kullback-Leibler information divergences (relative entropies) in (2) and (3) can alternatively be expressed as (see, e.g., Liese & Vajda [1])

$$I(P_A\|P_H) = \lim_{\lambda \uparrow 1} \frac{1 - H_\lambda(P_A\|P_H)}{\lambda(1-\lambda)}, \qquad I(P_H\|P_A) = \lim_{\lambda \downarrow 0} \frac{1 - H_\lambda(P_A\|P_H)}{\lambda(1-\lambda)}. \tag{6}$$

Apart from the Kullback-Leibler information divergence (relative entropy), other prominent examples of power divergences are the squared Hellinger distance ½·I1/2(PA∥PH) and Pearson's χ²-divergence 2·I2(PA∥PH); the Hellinger integral H1/2(PA∥PH) is also known as (a multiple of) the Bhattacharyya coefficient. Extensive studies about basic and advanced general facts on power divergences, Hellinger integrals and the related Rényi divergences of order λ ∈ R\{0,1}

$$0 \le R_\lambda(P_A\|P_H) := \frac{1}{\lambda(\lambda-1)}\, \log H_\lambda(P_A\|P_H), \qquad \text{with } \log 0 := -\infty, \tag{7}$$

can be found e.g., in Liese & Vajda [1,10], Jacod & Shiryaev [24], van Erven & Harremoes [20] (as a side remark, R1/2(PA∥PH) is also known as (a multiple of) the Bhattacharyya distance). For instance, the integrals in (3) and (4) do not depend on the choice of μ. Furthermore, one has the skew symmetries

$$H_\lambda(P_A\|P_H) = H_{1-\lambda}(P_H\|P_A), \qquad \text{as well as} \qquad I_\lambda(P_A\|P_H) = I_{1-\lambda}(P_H\|P_A), \tag{8}$$

for all λ ∈ R (see e.g., Liese & Vajda [1]). As far as finiteness is concerned, for λ ∈ ]0,1[ one gets the rudimentary bounds

$$0 < H_\lambda(P_A\|P_H) \le 1, \qquad \text{and equivalently,} \tag{9}$$
$$0 \le I_\lambda(P_A\|P_H) = \frac{1 - H_\lambda(P_A\|P_H)}{\lambda(1-\lambda)} < \frac{1}{\lambda(1-\lambda)}, \tag{10}$$

where the lower bound in (10) (upper bound in (9)) is achieved iff PA = PH. For λ ∈ R\[0,1], one gets the bounds

$$0 \le I_\lambda(P_A\|P_H) \le \infty, \qquad \text{and equivalently,} \qquad 1 \le H_\lambda(P_A\|P_H) \le \infty, \tag{11}$$

where, in contrast to above, both the lower bound of Hλ(PA∥PH) and the lower bound of Iλ(PA∥PH) are achieved iff PA = PH; however, the power divergence Iλ(PA∥PH) and the Hellinger integral Hλ(PA∥PH) might be infinite, depending on the particular setup.

The Hellinger integrals can be also used for bounds of the well-known total variation

$$0 \le V(P_A\|P_H) := 2\,\sup_{A\in\mathcal{F}}\big|P_A(A)-P_H(A)\big| = \int_\Omega \big|p_A - p_H\big|\,\mathrm{d}\mu,$$

with pA and pH defined in (5). Certainly, the total variation is one of the best known statistical distances, see e.g., Le Cam [109]. For arbitrary λ ∈ ]0,1[ there holds (cf. Liese & Vajda [1])

$$1 - \frac{V(P_A\|P_H)}{2} \;\le\; H_\lambda(P_A\|P_H) \;\le\; \Big[1 + \frac{V(P_A\|P_H)}{2}\Big]^{\max\{\lambda,\,1-\lambda\}} \cdot \Big[1 - \frac{V(P_A\|P_H)}{2}\Big]^{\min\{\lambda,\,1-\lambda\}}.$$

From this, together with the particular choice λ = 1/2, we can derive the fundamental universal bounds

$$2\,\big[1 - H_{1/2}(P_A\|P_H)\big] \;\le\; V(P_A\|P_H) \;\le\; 2\,\sqrt{1 - H_{1/2}(P_A\|P_H)^{2}}. \tag{12}$$
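As a small self-contained numerical illustration of the definitions (2), (4) and (7) and of the universal bounds (12), the following Python sketch evaluates Hλ, Iλ, Rλ and the total variation V for two Poisson distributions on a truncated support; the truncation point and the parameter values are illustrative assumptions, not quantities taken from the paper.

import math

def poisson_pmf(mean, k):
    # computed via logarithms to avoid overflow of factorials; mean > 0 assumed
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

def info_measures(mean_A, mean_H, lam, k_max=200):
    """Hellinger integral (4), power divergence (2), Renyi divergence (7) and
    total variation V for two Poisson laws, with the support truncated at k_max."""
    H = V = 0.0
    for k in range(k_max + 1):
        pA, pH = poisson_pmf(mean_A, k), poisson_pmf(mean_H, k)
        H += pA ** lam * pH ** (1.0 - lam)
        V += abs(pA - pH)
    I = (H - 1.0) / (lam * (lam - 1.0))       # cf. (2)
    R = math.log(H) / (lam * (lam - 1.0))     # cf. (7)
    return H, I, R, V

H_half, I_half, R_half, V = info_measures(mean_A=4.0, mean_H=2.0, lam=0.5)
print(H_half, I_half, R_half, V)
# universal bounds (12): 2*(1 - H_{1/2}) <= V <= 2*sqrt(1 - H_{1/2}^2)
print(2 * (1 - H_half) <= V <= 2 * math.sqrt(1 - H_half ** 2))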

We apply these concepts to our setup of Section 2.1 with two competing models (H) and (A) of Galton-Watson processes with immigration, where one can take Ω = N0^N0 to be the space of all paths of (Xn)n∈N0. In more detail, in terms of the extinction set B := {τ < ∞} and the parameter-set notation (PS1) to (PS3), it is known that for constellations in PSP the two laws PH and PA are equivalent, whereas for constellations in PNI the two restrictions PH|B and PA|B are equivalent (see e.g., Lemma 1.1.3 of Guttorp [52]); with a slight abuse of notation we shall henceforth omit B. Consistently, for fixed time n ∈ N0 we introduce PA,n := PA|Fn and PH,n := PH|Fn as well as the corresponding Radon-Nikodym derivative (likelihood ratio)

$$Z_n := \frac{\mathrm{d}P_{A,n}}{\mathrm{d}P_{H,n}}, \tag{13}$$

where (Fn)n∈N0 denotes the corresponding canonical filtration generated by X := (Xn)n∈N0; in other words, Fn reflects the "process-intrinsic" information known at stage n. Clearly, Z0 = 1. By choosing the reference measure μ = PH,n, one obtains from (4) the Hellinger integral Hλ(PA,0∥PH,0) = 1, as well as, for all n ∈ N,

$$H_\lambda(P_{A,n}\|P_{H,n}) = E_{P_{H,n}}\big[(Z_n)^{\lambda}\big], \tag{14}$$
$$I(P_{A,n}\|P_{H,n}) = E_{P_{A,n}}\big[\log Z_n\big], \tag{15}$$

from which one can immediately build Iλ(PA,n∥PH,n) (λ ∈ R), respectively Rλ(PA,n∥PH,n) (λ ∈ R\{0,1}), respectively bounds of V(PA,n∥PH,n), via (2), respectively (7), respectively (12).

The outcoming values (respectively bounds) of Hλ(PA,n∥PH,n) are quite diverse and depend on the choice of the involved parameter pairs (βH,αH), (βA,αA) as well as on λ; the exact details will be given in Section 3 and Section 6 below.
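Before turning to those exact recursive evaluations, it may help intuition to note that (14) immediately suggests a crude Monte Carlo approximation: simulate paths under (H), accumulate the likelihood ratio Zn from the Poisson(β•·Xn−1 + α•) one-step transition probabilities recalled in Section 2.1, and average (Zn)^λ. The Python sketch below does exactly this; it is purely illustrative (simulation-based, with hypothetical parameter choices) and is not a substitute for the recursive and closed-form bounds developed in Sections 3 and 6.

import math
import numpy as np

def log_poisson_pmf(mean, k):
    # log of the Poisson(mean) probability of k; mean > 0 is assumed
    return k * math.log(mean) - mean - math.lgamma(k + 1)

def mc_hellinger_integral(lam, beta_A, alpha_A, beta_H, alpha_H,
                          x0=10, n=10, n_paths=20000, seed=1):
    """Monte Carlo estimate of H_lambda(P_{A,n} || P_{H,n}) = E_{P_{H,n}}[(Z_n)^lambda],
    cf. (13) and (14), with Z_n built from the Poisson(beta*X_{k-1}+alpha)
    one-step transition probabilities of the two GWI models."""
    rng = np.random.default_rng(seed)
    acc = 0.0
    for _ in range(n_paths):
        x, log_Z = x0, 0.0
        for _ in range(n):
            x_new = rng.poisson(beta_H * x + alpha_H)            # simulate under (H)
            log_Z += (log_poisson_pmf(beta_A * x + alpha_A, x_new)
                      - log_poisson_pmf(beta_H * x + alpha_H, x_new))
            x = x_new
        acc += math.exp(lam * log_Z)
    return acc / n_paths

# illustrative parameter pairs: (H) "mild" versus (A) "dangerous" scenario
print(mc_hellinger_integral(lam=0.5, beta_A=2.0, alpha_A=10.0, beta_H=0.5, alpha_H=0.2))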

Before we achieve this, in the following we explain how the outcoming dissimilarity results can be applied to Bayesian testing and more general Bayesian decision making, as well as to Neyman-Pearson testing.

2.5. Decision Making under Uncertainty

Within the above-mentioned context of two competing models (H) and (A) of Galton-Watson processes with immigration, let us briefly discuss how knowledge about the time-evolution of the Hellinger integrals Hλ(PA,n∥PH,n) (or equivalently, of the power divergences Iλ(PA,n∥PH,n), cf. (2)) can be used in order to take decisions under uncertainty, within a framework of Bayesian decision making BDM, or alternatively, of Neyman-Pearson testing NPT.

In our context of BDM, we decide between an action dH "associated with" the (say) hypothesis law PH and an action dA "associated with" the (say) alternative law PA, based on the sample path observation Xn := {Xl : l ∈ {0,1,…,n}} of the GWI generation sizes (e.g., infectious-disease incidences, cf. Section 2.3) up to observation horizon n ∈ N. Following the lines of Stummer & Vajda [15] (adapted to our branching process context), for our BDM let us consider as admissible decision rules δn : Ωn → {dH,dA} the ones generated by all path sets Gn ⊆ Ωn (where Ωn denotes the space of all possible paths of (Xk)k∈{1,…,n}) through

$$\delta_n(X_n) := \delta_{G_n}(X_n) := \begin{cases} d_A, & \text{if } X_n \in G_n,\\ d_H, & \text{if } X_n \notin G_n, \end{cases}$$

as well as loss functions of the form

$$\begin{pmatrix} L(d_H,H) & L(d_H,A)\\ L(d_A,H) & L(d_A,A) \end{pmatrix} := \begin{pmatrix} 0 & L_A\\ L_H & 0 \end{pmatrix} \tag{16}$$

with pregiven constants LA > 0, LH > 0 (e.g., arising as bounds from quantities in worst-case scenarios); notice that in (16), dH is assumed to be a zero-loss action under H and dA a zero-loss action under A. Per definition, the Bayes decision rule δGn,min minimizes (over Gn) the mean decision loss

$$\mathcal{L}(\delta_{G_n}) := p_H^{prior}\cdot L_H \cdot \Pr\!\big(\delta_{G_n}(X_n)=d_A \,\big|\, H\big) + p_A^{prior}\cdot L_A \cdot \Pr\!\big(\delta_{G_n}(X_n)=d_H \,\big|\, A\big) = p_H^{prior}\cdot L_H \cdot P_{H,n}(G_n) + p_A^{prior}\cdot L_A \cdot P_{A,n}(\Omega_n\setminus G_n) \tag{17}$$

for given prior probabilities pHprior := Pr(H) ∈ ]0,1[ for H and pAprior := Pr(A) = 1 − pHprior for A. As a side remark let us mention that, in a certain sense, the involved model (parameter) uncertainty expressed by the "superordinate" Bernoulli-type law Pr = Bin(1, pHprior) can also be reinterpreted as a rudimentary static random environment caused e.g., by a random Bernoulli-type external static force. By straightforward calculations, one gets with (13) the minimizing path set Gn,min = {Zn ≥ pHprior·LH / (pAprior·LA)}, leading to the minimal mean decision loss, i.e., the Bayes risk,

$$R_n := \min_{G_n} \mathcal{L}(\delta_{G_n}) = \mathcal{L}(\delta_{G_n,\min}) = \int_{\Omega_n} \min\big\{ p_H^{prior} L_H,\; p_A^{prior} L_A\, Z_n \big\}\,\mathrm{d}P_{H,n}. \tag{18}$$

Notice that—by straightforward standard arguments—the alternative decision procedure

$$\text{take action } d_A \ (\text{resp. } d_H) \qquad \text{if} \qquad L_H \cdot p_H^{post}(X_n) \;\le\; (\text{resp.} >)\ \; L_A \cdot p_A^{post}(X_n)$$

with posterior probabilities pHpost(Xn) := pHprior / [(1 − pHprior)·Zn(Xn) + pHprior] =: 1 − pApost(Xn), leads exactly to the same actions as δGn,min. By adapting Lemma 6.5 of Stummer & Vajda [15], which on general probability spaces gives fundamental universal inequalities relating Hellinger integrals (or equivalently, power divergences) and Bayes risks, one gets for all LH > 0, LA > 0, pHprior ∈ ]0,1[, λ ∈ ]0,1[ and n ∈ N the upper bound

$$R_n \;\le\; \Lambda_A^{\lambda}\,\Lambda_H^{1-\lambda}\, H_\lambda(P_{A,n}\|P_{H,n}), \qquad \text{with } \Lambda_H := p_H^{prior} L_H,\ \ \Lambda_A := (1-p_H^{prior})\,L_A, \tag{19}$$

as well as the lower bound

$$R_n^{\min\{\lambda,1-\lambda\}} \cdot \big[\Lambda_H + \Lambda_A - R_n\big]^{\max\{\lambda,1-\lambda\}} \;\ge\; \Lambda_A^{\lambda}\,\Lambda_H^{1-\lambda}\, H_\lambda(P_{A,n}\|P_{H,n})$$

which implies in particular the “direct” lower bound

$$R_n \;\ge\; \frac{\Lambda_A^{\max\{1,\frac{\lambda}{1-\lambda}\}}\,\Lambda_H^{\max\{1,\frac{1-\lambda}{\lambda}\}}}{\big(\Lambda_A+\Lambda_H\big)^{\max\{\frac{\lambda}{1-\lambda},\frac{1-\lambda}{\lambda}\}}} \cdot H_\lambda(P_{A,n}\|P_{H,n})^{\max\{\frac{1}{\lambda},\frac{1}{1-\lambda}\}}. \tag{20}$$

By using (19) (respectively (20)) together with the exact values and the upper (respectively lower) bounds of the Hellinger integrals Hλ(PA,n∥PH,n) derived in the following sections, we end up with upper (respectively lower) bounds of the Bayes risk Rn. Of course, with the help of (2) the bounds (19) and (20) can be (i) immediately rewritten in terms of the power divergences Iλ(PA,n∥PH,n) and (ii) thus be directly interpreted in terms of dissimilarity-size arguments. As a side remark, in such a Bayesian context the λ-order Hellinger integral Hλ(PA,n∥PH,n) = EPH,n[(Zn)^λ] (cf. (14)) can also be interpreted as a λ-order Bayes-factor moment (with respect to PH,n), since Zn = Zn(Xn) = [pApost(Xn)/pHpost(Xn)] / [pAprior/pHprior] is the Bayes factor (i.e., the posterior odds ratio of (A) to (H), divided by the prior odds ratio of (A) to (H)).
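For concreteness, a minimal Python sketch of the resulting Bayesian decision rule is given below: it evaluates the likelihood ratio Zn of (13) for an observed incidence path via the Poisson(β•·Xn−1 + α•) transition probabilities, forms the posterior probabilities, and applies the threshold rule stated above; the observed counts, parameter pairs, priors and losses are hypothetical illustration values.

import math

def log_likelihood_ratio(path, beta_A, alpha_A, beta_H, alpha_H):
    """log Z_n (cf. (13)) for an observed path X_0,...,X_n, using the
    Poisson(beta*X_{k-1}+alpha) one-step transition probabilities;
    the factorial terms of the two Poisson pmfs cancel."""
    log_Z = 0.0
    for x_prev, x in zip(path[:-1], path[1:]):
        m_A = beta_A * x_prev + alpha_A
        m_H = beta_H * x_prev + alpha_H
        log_Z += x * (math.log(m_A) - math.log(m_H)) - (m_A - m_H)
    return log_Z

# hypothetical observed weekly incidence counts and the two parameter pairs
path = [10, 14, 25, 48, 90, 160]
Z = math.exp(log_likelihood_ratio(path, beta_A=2.0, alpha_A=10.0, beta_H=0.5, alpha_H=0.2))

p_H_prior, L_H, L_A = 0.5, 1.0, 1.0            # Bayesian testing: unit losses
p_H_post = p_H_prior / ((1.0 - p_H_prior) * Z + p_H_prior)
p_A_post = 1.0 - p_H_post
decision = "d_A" if L_H * p_H_post <= L_A * p_A_post else "d_H"
print(Z, p_H_post, decision)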

At this point, the potential applicant should be warned about the usual way of asynchronous decision making, where one first tests (A) versus (H) (i.e., LA=LH=1 which leads to 0–1 losses in (16)) and afterwards, based on the outcoming result (e.g., in favour of (A)), takes the attached economic decision (e.g., dA); this can lead to distortions compared with synchronous decision making with “full” monetary losses LA and LH, as is shown in Stummer & Lao [16] within an economic context in connection with discrete approximations of financial diffusion processes (they call this distortion effect a non-commutativity between Bayesian statistical and investment decisions).

For different types of–mainly parameter estimation (squared-error type loss function) concerning—Bayesian analyses based on GW(I) generation size observations, see e.g., Jagers [56], Heyde [38], Heyde & Johnstone [110], Johnson et al. [111], Basawa & Rao [60], Basawa & Scott [61], Scott [112], Guttorp [52], Yanev & Tsokos [113], Mendoza & Gutierrez-Pena [114], and the references therein.

Within our running-example epidemiological context of Section 2.3, let us briefly discuss the role of the above-mentioned losses LA and LH. To begin with, as mentioned above, the unit-free choice LA = LH = 1 corresponds to Bayesian testing. Recall that this concerns two alternative infectious-disease models (H) and (A) with parameter pairs (βH,αH) and (βA,αA) (recall the interpretation of β as reproduction number and α as importation mean) which reflect either a "pure" statistical uncertainty (under the same uncontrolled or controlled set-up), or the uncertainty between two different potential control set-ups (for the sake of assessing the potential impact/efficiency of some planned interventions, compared with alternative ones). As far as non-unit-free, e.g., macroeconomic or monetary, losses are concerned, recall that some of the above-mentioned control strategies (countermeasures, public policies, governmental pandemic risk management plans) may imply considerable social and economic costs, with a huge impact and potential danger of triggering severe social, economic and political disruptions; a corresponding tradeoff between health and economic issues can be incorporated by choosing LA and LH to be (e.g., monetary) values which reflect estimates or upper bounds of losses due to wrong decisions, e.g., if at stage n due to the observed data one erroneously thinks (reinforced by fear) that a novel infectious disease (e.g., COVID-19) will lead (or re-emerge) to a severe pandemic and consequently decides for a lockdown with drastic future economic consequences, versus, if one erroneously thinks (reinforced by carelessness) that the infectious disease is (or stays) non-severe and consequently eases some/all control measures, which will lead to extremely devastating future economic consequences. For the estimates/bounds of LA and LH, one can e.g., employ (i) the comprehensive stochastic studies of Feicht & Stummer [115] on the quantitative degree of elasticity and speed of recovery of economies after a sudden macroeconomic disaster, or (ii) the more short-term, German-specific, scenario-type (basically non-stochastic) studies of Dorn et al. [116,117] in connection with the current COVID-19 pandemic.

Of course, the above-mentioned Bayesian decision procedure can also be operated in a sequential way. For instance, suppose that we are confronted with a novel infectious disease (e.g., COVID-19) of non-negligible fatality rate, and let (A) reflect a "potentially dangerous" infectious-disease-transmission situation (e.g., a substantially supercritical reproduction number $\beta_A = 2$ and an importation mean of $\alpha_A = 10$, for weekly appearing new incidence-generations), whereas (H) describes a "relatively harmless/mild" situation (e.g., a substantially subcritical $\beta_H = 0.5$, $\alpha_H = 0.2$). Moreover, let $d_A$ respectively $d_H$ denote (non-quantitatively) the decision/action to accept (A) respectively (H). It can then be reasonable to decide to stop the observation process $n \mapsto X_n$ (also called surveillance or online-monitoring) of incidence numbers at the first time at which $n \mapsto Z_n = Z_n(X_n)$ exceeds the threshold $p_H^{\mathrm{prior}}/p_A^{\mathrm{prior}}$; if this happens, one takes $d_A$ as decision (and e.g., declares the situation to be an epidemic outbreak and starts with control/intervention measures; however, as explained above, one should synchronously involve also the potential economic losses), whereas as long as this does not happen, one continues the observation (and implicitly takes $d_H$ as decision). This can be modelled in terms of the pair $(\widetilde\tau, d_A)$ with (random) stopping time $\widetilde\tau := \inf\{n\in\mathbb{N} : Z_n \geq p_H^{\mathrm{prior}}/p_A^{\mathrm{prior}}\}$ (with the usual convention that the infimum of the empty set is infinity), and the corresponding decision $d_A$. After the time $\widetilde\tau < \infty$ and e.g., immediate subsequent employment of some control/counter measures, one can e.g., take the old model (A) as new (H), declare a new target (A) for the desired quantification of the effectiveness of the employed control measures (e.g., a mitigation to a slightly subcritical case of $\beta_A = 0.95$, $\alpha_A = 0.8$), and start to observe the new incidence numbers until the new target (A) has been reached. This can be interpreted as online-detection of a distributional change; a related comprehensive new framework for the use of divergences (even much beyond power divergences) for distributional change detection can be found e.g., in the recent work of Kißlinger & Stummer [118]. A completely different, SIR-model based, approach for the detection of change points in the spread of COVID-19 is given in Dehning et al. [119]. Moreover, other surveillance methods can be found e.g., in the corresponding overview of Frisén [120] and the Swedish epidemics outbreak investigations of Frisén, Andersson & Schiöler [121].
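To illustrate the sequential rule $(\widetilde\tau, d_A)$ just described, here is a minimal Python sketch (our own illustration, not part of the original analysis); it updates the Bayes factor $Z_n$ along an observed GWI path via the Poissonian likelihood-ratio factorization recalled in Section 3.1, and stops once the threshold $p_H^{\mathrm{prior}}/p_A^{\mathrm{prior}}$ is exceeded. All function names and the simulated data below are purely hypothetical.

```python
import numpy as np

def stopping_time_tau(x_path, beta_A, alpha_A, beta_H, alpha_H, prior_A):
    """Sequential monitoring sketch: stop as soon as the Bayes factor Z_n
    of two Poissonian GWI models (A) and (H) reaches prior_H / prior_A.

    x_path : observed generation sizes X_0, X_1, ..., X_N
    Returns (tau, Z_tau), or (None, Z_N) if the threshold is never hit.
    """
    log_threshold = np.log((1.0 - prior_A) / prior_A)
    log_Z = 0.0
    for n in range(1, len(x_path)):
        f_A = alpha_A + beta_A * x_path[n - 1]   # conditional Poisson mean under (A)
        f_H = alpha_H + beta_H * x_path[n - 1]   # conditional Poisson mean under (H)
        log_Z += (f_H - f_A) + x_path[n] * np.log(f_A / f_H)
        if log_Z >= log_threshold:
            return n, np.exp(log_Z)
    return None, np.exp(log_Z)

# toy usage with the parameter pairs mentioned above (hypothetical, simulated data):
rng = np.random.default_rng(1)
x = [5]
for _ in range(12):                              # simulate incidences under the "dangerous" model (A)
    x.append(int(rng.poisson(10 + 2.0 * x[-1])))
print(stopping_time_tau(x, beta_A=2.0, alpha_A=10.0, beta_H=0.5, alpha_H=0.2, prior_A=0.5))
```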

One can refine the above-mentioned sequential procedure via two (instead of one) appropriate thresholds $c_1 < c_2$ and the pair $(\breve\tau, \delta_{\breve\tau})$, with the stopping time $\breve\tau := \inf\{n\in\mathbb{N} : Z_n \notin [c_1,c_2]\}$ as well as corresponding decision rule

$$\delta_{\breve\tau} := \begin{cases} d_A, & \text{if } Z_{\breve\tau} > c_2,\\ d_H, & \text{if } Z_{\breve\tau} < c_1.\end{cases}$$

An exact optimized treatment on the two above-mentioned sequential procedures, and their connection to Hellinger integrals (and power divergences) of Galton-Watson processes with immigration, is beyond the scope of this paper.

As a side remark, let us mention that our above-mentioned suggested method of Bayesian decision making with Hellinger integrals of GWIs differs completely from the very recent work of Brauner et al. [122] who use a Bayesian hierarchical model for the concrete, very comprehensive study on the effectiveness and burden of non-pharmaceutical interventions against COVID-19 transmission.

The power divergences IλPA,nPH,n (λR) can be employed also in other ways within Bayesian decision making, of statistical nature. Namely, by adapting the general lines of Österreicher & Vajda [123] (see also Liese & Vajda [10], as well as diffusion-process applications in Stummer [5,31,32]) to our context of Galton-Watson processes with immigration, we can proceed as follows. For the sake of comfortable notations, we first attach the value θ:=1 to the GWI model (A) (which has prior probability pAprior]0,1[) and θ:=0 to (H) (which has prior probability 1pAprior). Suppose we want to decide, in an optimal Bayesian way, which degree of evidence deg[0,1] we should attribute (according to a pregiven loss function LO) to the model (A). In order to achieve this goal, we choose a nonnegatively-valued loss function LO(θ,deg) defined on {0,1}×[0,1], of two types which will be specified below. The risk at stage 0 (i.e., prior to the GWI-path observations Xn), from the optimal decision about the degree of evidence deg concerning the decision parameter θ, is defined as

$$BR_{LO}\big(p_A^{\mathrm{prior}}\big) := \min_{\mathrm{deg}\in[0,1]}\Big\{\big(1-p_A^{\mathrm{prior}}\big)\cdot LO(0,\mathrm{deg}) + p_A^{\mathrm{prior}}\cdot LO(1,\mathrm{deg})\Big\},$$

which can be thus interpreted as a minimal prior expected loss (the minimum will always exist). The corresponding risk posterior to the GWI-path observations Xn, from the optimal decision about the degree of evidence deg concerning the parameter θ, is given by

$$BR_{LO}^{\mathrm{post}}\big(p_A^{\mathrm{prior}}\big) := \int_{\Omega_n} BR_{LO}\big(p_A^{\mathrm{post}}(X_n)\big)\,\Big(p_A^{\mathrm{prior}}\,dP_{A,n} + \big(1-p_A^{\mathrm{prior}}\big)\,dP_{H,n}\Big),$$

which is achieved by the optimal decision rule (about the degree of evidence)

$$D^*(X_n) := \operatorname*{argmin}_{\mathrm{deg}\in[0,1]}\Big\{\big(1-p_A^{\mathrm{post}}(X_n)\big)\cdot LO(0,\mathrm{deg}) + p_A^{\mathrm{post}}(X_n)\cdot LO(1,\mathrm{deg})\Big\}.$$

The corresponding statistical information measure (in the sense of De Groot [124])

$$\Delta BR_{LO}\big(p_A^{\mathrm{prior}}\big) := BR_{LO}\big(p_A^{\mathrm{prior}}\big) - BR_{LO}^{\mathrm{post}}\big(p_A^{\mathrm{prior}}\big) \;\geq\; 0$$

represents the reduction of the decision risk about the degree of evidence $\mathrm{deg}$ concerning the parameter $\theta$ that can be attained by observing the GWI-path $X_n$ until stage n. For the first-type loss function $\widetilde{LO}(\theta,\mathrm{deg}) := \mathrm{deg} - (2\,\mathrm{deg}-1)\cdot 1_{\{1\}}(\theta)$, defined on $\{0,1\}\times[0,1]$ with the help of the indicator function $1_{A}(\cdot)$ on the set $A$, one can show that

$$D^*(X_n) := \begin{cases} 0, & \text{if } p_A^{\mathrm{post}}(X_n)\in\big[0,\tfrac12\big[,\\ 1, & \text{if } p_A^{\mathrm{post}}(X_n)\in\big]\tfrac12,1\big[,\\ \text{any number in } [0,1], & \text{if } p_A^{\mathrm{post}}(X_n)=\tfrac12,\end{cases}$$

as well as the representation formula

$$I_\lambda(P_{A,n}\|P_{H,n}) = \int_0^1 \Delta BR_{\widetilde{LO}}\big(p_A^{\mathrm{prior}}\big)\cdot\big(1-p_A^{\mathrm{prior}}\big)^{\lambda-2}\cdot\big(p_A^{\mathrm{prior}}\big)^{-1-\lambda}\,dp_A^{\mathrm{prior}}, \qquad \lambda\in\mathbb{R}, \qquad (21)$$

(cf. Österreicher & Vajda [123], Liese & Vajda [10], adapted to our GWI context); in other words, the power divergence IλPA,nPH,n can be regarded as a weighted-average statistical information measure (weighted-average decision risk reduction). One can also use other weights of pAprior in order to get bounds of IλPA,nPH,n (analogously to Stummer [5]).

For the second-type loss function $LO_{\lambda,\chi}(\theta,\mathrm{deg}) := \frac{\lambda^{\theta-1}\,\mathrm{deg}^{\lambda-\theta}}{\chi^{\lambda}\,(1-\chi)^{1-\lambda}\,(1-\lambda)^{\theta}\,(1-\mathrm{deg})^{\lambda-\theta}}$ defined on $\{0,1\}\times[0,1]$ with parameters $\lambda\in\,]0,1[$ and $\chi\in\,]0,1[$, one can derive the optimal decision rule

$$D^*(X_n) = p_A^{\mathrm{post}}(X_n)$$

as well as the representation formula as a limit statistical information measure (limit decision risk reduction)

$$I_\lambda(P_{A,n}\|P_{H,n}) = \lim_{\chi\to p_A^{\mathrm{prior}}}\Delta BR_{LO_{\lambda,\chi}}\big(p_A^{\mathrm{prior}}\big) =: \Delta BR_{LO_{\lambda,\,p_A^{\mathrm{prior}}}}\big(p_A^{\mathrm{prior}}\big) \qquad (22)$$

(cf. Österreicher & Vajda [123], Stummer [5], adapted to our GWI context).

As an alternative to the above-mentioned Bayesian-decision-making applications of Hellinger integrals $H_\lambda(P_{A,n}\|P_{H,n})$, let us now briefly discuss the use of the latter for the corresponding Neyman-Pearson testing (NPT) framework with randomized tests $T_n : \Omega_n \to [0,1]$ of the hypothesis $P_H$ against the alternative $P_A$, based on the GWI-generation-size sample path observations $X_n := \{X_l : l\in\{0,1,\ldots,n\}\}$. In contrast to (17) and (18), a Neyman-Pearson test minimizes, over $T_n$, the type II error probability $\int_{\Omega_n}(1-T_n)\,dP_{A,n}$ in the class of the tests for which the type I error probability $\int_{\Omega_n}T_n\,dP_{H,n}$ is at most $\varsigma\in\,]0,1[$. The corresponding minimal type II error probability

$$\mathcal{E}_\varsigma(P_{A,i}\|P_{H,i}) := \inf_{T_i:\ \int_{\Omega_i}T_i\,dP_{H,i}\,\leq\,\varsigma}\ \int_{\Omega_i}(1-T_i)\,dP_{A,i}$$

can for all ς]0,1[, λ]0,1[, iI be bounded from above by

$$\mathcal{E}_\varsigma(P_{A,i}\|P_{H,i}) \;\leq\; \mathcal{E}^U_\varsigma(P_{A,i}\|P_{H,i}) := \min\Big\{(1-\lambda)\cdot\Big(\tfrac{\lambda}{\varsigma}\Big)^{\lambda/(1-\lambda)}\cdot\big[H_\lambda(P_{A,i}\|P_{H,i})\big]^{1/(1-\lambda)},\ 1\Big\}, \qquad (23)$$

and for all λ>1, iI it can be bounded from below by

$$\mathcal{E}_\varsigma(P_{A,i}\|P_{H,i}) \;\geq\; \mathcal{E}^L_\varsigma(P_{A,i}\|P_{H,i}) := (1-\varsigma)^{\lambda/(\lambda-1)}\cdot\big[H_\lambda(P_{A,i}\|P_{H,i})\big]^{1/(1-\lambda)}, \qquad (24)$$

which is an adaptation of a general result of Krafft & Plachky [125]; see also Liese & Vajda [1] as well as Stummer & Vajda [15]. Hence, by combining (23) and (24) with the exact values respectively upper bounds of the Hellinger integrals $H_{1-\lambda}(P_{A,n}\|P_{H,n})$ from the following sections, we obtain for our context of Galton-Watson processes with Poisson offspring and Poisson immigration (including the non-immigration case) some upper bounds of $\mathcal{E}_\varsigma(P_{A,n}\|P_{H,n})$, which can also be immediately rewritten as lower bounds for the power $1-\mathcal{E}_\varsigma(P_{A,n}\|P_{H,n})$ of a most powerful test at level $\varsigma$. In contrast to such finite-time-horizon results, for the (to our context) incompatible setup of Galton-Watson processes with Poisson offspring but nonstochastic immigration of constant value 1, the asymptotic rates of decrease as $n\to\infty$ of the unconstrained type II error probabilities as well as of the type I error probabilities were studied in Linkov & Lunyova [53] by a different approach which also employs Hellinger integrals. Some other types of Neyman-Pearson testing investigations concerning Galton-Watson processes, different from ours, can be found e.g., in Basawa & Scott [126], Feigin [127], Sweeting [128], Basawa & Scott [61], and the references therein.
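To make the use of the error bounds concrete, the following minimal Python sketch (our own illustration) evaluates the two bounds for a given value (or bound) of the Hellinger integral; the formulas are taken from (23) and (24) as reconstructed above, so the sketch should be read under that assumption.

```python
def np_type2_upper_bound(H_lam, lam, varsigma):
    """Upper bound (23) on the minimal type II error probability, valid for
    lam in ]0,1[; H_lam is (an upper bound of) the Hellinger integral of order lam."""
    assert 0.0 < lam < 1.0 and 0.0 < varsigma < 1.0
    val = (1.0 - lam) * (lam / varsigma) ** (lam / (1.0 - lam)) * H_lam ** (1.0 / (1.0 - lam))
    return min(val, 1.0)

def np_type2_lower_bound(H_lam, lam, varsigma):
    """Lower bound (24), valid for lam > 1; H_lam is (a lower bound of)
    the Hellinger integral of order lam."""
    assert lam > 1.0 and 0.0 < varsigma < 1.0
    return (1.0 - varsigma) ** (lam / (lam - 1.0)) * H_lam ** (1.0 / (1.0 - lam))

# toy usage with arbitrary numbers:
print(np_type2_upper_bound(H_lam=0.2, lam=0.5, varsigma=0.05))
print(np_type2_lower_bound(H_lam=3.0, lam=2.0, varsigma=0.05))
```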

2.6. Asymptotical Distinguishability

The next two concepts deal with two general families PA,iiI and PH,iiI of probability measures on the measurable spaces Ωi,FiiI, where the index set I is either N0 or R+. For them, the following two general types of asymptotical distinguishability are well known (see e.g., LeCam [109], Liese & Vajda [1], Jacod & Shiryaev [24], Linkov [129], and the references therein).

Definition 1.

The family $(P_{A,i})_{i\in I}$ is contiguous to the family $(P_{H,i})_{i\in I}$ (in symbols, $(P_{A,i}) \vartriangleleft (P_{H,i})$) if for all sets $A_i\in\mathcal{F}_i$ with $\lim_{i\to\infty}P_{H,i}(A_i)=0$ there holds $\lim_{i\to\infty}P_{A,i}(A_i)=0$.

Definition 2.

The families of measures $(P_{A,i})_{i\in I}$ and $(P_{H,i})_{i\in I}$ are called entirely separated (completely asymptotically distinguishable) (in symbols, $(P_{A,i}) \vartriangle (P_{H,i})$) if there exist a sequence $i_m\to\infty$ as $m\to\infty$ and, for each $m\in\mathbb{N}_0$, a set $A_{i_m}\in\mathcal{F}_{i_m}$ such that $\lim_{m\to\infty}P_{A,i_m}(A_{i_m})=1$ and $\lim_{m\to\infty}P_{H,i_m}(A_{i_m})=0$.

It is clear that the notion of contiguity is the attempt to carry the concept of absolute continuity over to families of measures. Loosely speaking, $(P_{A,i})$ is contiguous to $(P_{H,i})$ if the limit $\lim_{i\to\infty}P_{A,i}$ (provided that it exists) is absolutely continuous with respect to the limit $\lim_{i\to\infty}P_{H,i}$. However, for the definition of contiguity, we do not need to require the probability measures to converge to limiting probability measures. On the other hand, entire separation is the generalization of singularity to families of measures.

The corresponding negations will be denoted by $\overline{\vartriangleleft}$ and $\overline{\vartriangle}$. One can easily check that a family $(P_{A,i})$ cannot be both contiguous to and entirely separated from a family $(P_{H,i})$. In fact, as shown in Linkov [129], the relation between the families $(P_{A,i})$ and $(P_{H,i})$ can be uniquely classified into the following distinguishability types:

  • (a)

    $(P_{A,i}) \vartriangleleft\vartriangleright (P_{H,i})$ (mutual contiguity);

  • (b)

    $(P_{A,i}) \vartriangleleft (P_{H,i})$, $(P_{H,i})\ \overline{\vartriangleleft}\ (P_{A,i})$;

  • (c)

    $(P_{A,i})\ \overline{\vartriangleleft}\ (P_{H,i})$, $(P_{H,i}) \vartriangleleft (P_{A,i})$;

  • (d)

    $(P_{A,i})\ \overline{\vartriangleleft}\ (P_{H,i})$, $(P_{H,i})\ \overline{\vartriangleleft}\ (P_{A,i})$, $(P_{A,i})\ \overline{\vartriangle}\ (P_{H,i})$;

  • (e)

    $(P_{A,i}) \vartriangle (P_{H,i})$.

As demonstrated in the above-mentioned references for a general context, one can conclude the type of distinguishability from the time-evolution of Hellinger integrals. Indeed, the following assertions can be found e.g., in Linkov [129], where part (c) was established in Liese & Vajda [1] and (f), (g) in Vajda [3].

Proposition 1.

The following assertions are equivalent:

$$\begin{array}{ll} (a) & (P_{A,i}) \vartriangle (P_{H,i}),\\ (b) & \liminf_{i\to\infty} H_\lambda(P_{A,i}\|P_{H,i}) = 0 \ \text{ for all } \lambda\in\,]0,1[,\\ (c) & \text{there exists a } \lambda\in\,]0,1[\,:\ \liminf_{i\to\infty} H_\lambda(P_{A,i}\|P_{H,i}) = 0,\\ (d) & \text{there exists a } \pi\in\,]0,1[\,:\ \liminf_{i\to\infty} e_\pi(P_{A,i}\|P_{H,i}) = 0,\\ (e) & \limsup_{i\to\infty} V(P_{A,i}\|P_{H,i}) = 2,\\ (f) & \text{there exists a } \lambda\in\,]0,1[\,:\ \limsup_{i\to\infty} I_\lambda(P_{A,i}\|P_{H,i}) = \frac{1}{\lambda\cdot(1-\lambda)},\\ (g) & \limsup_{i\to\infty} I_\lambda(P_{A,i}\|P_{H,i}) = \frac{1}{\lambda\cdot(1-\lambda)} \ \text{ for all } \lambda\in\,]0,1[. \end{array} \qquad (25)$$

In combination with the discussion after Definition 2, one can thus interpret the λorder Hellinger integral Hλ(PA,iPH,i) as a “measure” for the distinctness of the two families PA,i and PH,i up to a fixed finite time horizon iI.

Furthermore, for the contiguity we obtain the equivalence (see e.g., Liese & Vajda [1], Linkov [129])

$$(P_{A,i}) \vartriangleleft (P_{H,i}) \iff \liminf_{\lambda\uparrow 1}\,\liminf_{i\to\infty} H_\lambda(P_{A,i}\|P_{H,i}) = 1 \iff \limsup_{\lambda\uparrow 1}\,\limsup_{i\to\infty}\,\lambda\cdot(1-\lambda)\cdot I_\lambda(P_{A,i}\|P_{H,i}) = 0. \qquad (26)$$

All the above-mentioned general results can be applied to our context of two competing Poissonian Galton-Watson processes with immigration (GWI) (H) and (A) (reflected by the two different laws PH resp. PA with parameter pairs (βH,αH) resp. (βA,αA)), by taking PA,i:=PAFi and PH,i:=PHFi. Recall from the preceding subsections (by identifying i with n) that the latter two describe the stochastic dynamics of the respective GWI within the restricted time-/stage-frame {0,1,,i}.

In the following, we study in detail the evolution of Hellinger integrals between two competing models of Galton-Watson processes with immigration, which turns out to be quite extensive.

3. Detailed Recursive Analyses of Hellinger Integrals

3.1. A First Basic Result

In terms of our notations (PS1) to (PS3), a typical situation for applications in our mind is that one particular constellation βA,βH,αA,αHP (e.g., obtained from theoretical or previous statistical investigations) is fixed, whereas–in contrast–the parameter λR\{0,1} for the Hellinger integral or the power divergence might be chosen freely, e.g., depending on which (transform of a) dissimilarity measure one decides to choose for further analysis. At this point, let us emphasize that in general we will not make assumptions of the form β1, i.e., upon the type of criticality.

To start with our investigations, in order to justify for all $n\in\mathbb{N}_0$

$$Z_n := \frac{dP_{A,n}}{dP_{H,n}} \qquad (\text{cf. } (13)),$$

(14) and (15) (as well as IλPA,nPH,n for λR respectively RλPA,nPH,n for λR\{0,1}), we first mention the following straightforward facts: (i) if βA,βH,αA,αHPNI, then PA,n and PH,n are equivalent (i.e., PA,nPH,n), as well as (ii) if βA,βH,αA,αHPSP, then PA,n and PH,n are equivalent (i.e., PA,nPH,n). Moreover, by recalling Z0=1 and using the “rate functions” f(x)=βx+α (x[0,[), a version of (13) can be easily determined by calculating for each x:=(x0,x1,x2,)Ω:=N×N0×N0×

$$Z_n(x) = \prod_{k=1}^{n} Z_{n,k}(x) \qquad\text{with}\qquad Z_{n,k}(x) := \exp\Big\{-\big[f_A(x_{k-1})-f_H(x_{k-1})\big]\Big\}\cdot\Big(\frac{f_A(x_{k-1})}{f_H(x_{k-1})}\Big)^{x_k},$$

where for the last term we use the convention 00x=1 for all xN0. Furthermore, we define for each xΩ

$$Z^{(\lambda)}_{n,k}(x) := \exp\Big\{-\big[\lambda f_A(x_{k-1}) + (1-\lambda) f_H(x_{k-1})\big]\Big\}\cdot\frac{\big(f_A(x_{k-1})^{\lambda}\,f_H(x_{k-1})^{1-\lambda}\big)^{x_k}}{x_k!} \qquad (27)$$

with the convention 000!=1 for the last term. Accordingly, one obtains from (14) the Hellinger integral HλPA,0PH,0=1, as well as for all βA,βH,αA,αH,λP×(R\{0,1})

$$H_\lambda(P_{A,1}\|P_{H,1}) = \exp\Big\{f_A(x_0)^{\lambda}\,f_H(x_0)^{1-\lambda} - \big(\lambda f_A(x_0)+(1-\lambda)f_H(x_0)\big)\Big\} \qquad (28)$$

for x0=X0N, and for all nN\{1}

$$\begin{aligned} H_\lambda(P_{A,n}\|P_{H,n}) &= E_{P_{H,n}}\big[(Z_n)^{\lambda}\big] = \sum_{x_1=0}^{\infty}\cdots\sum_{x_n=0}^{\infty}\ \prod_{k=1}^{n} Z^{(\lambda)}_{n,k}(x)\\ &= \sum_{x_1=0}^{\infty}\cdots\sum_{x_{n-1}=0}^{\infty}\ \prod_{k=1}^{n-1} Z^{(\lambda)}_{n,k}(x)\cdot e^{-\left(\lambda f_A(x_{n-1})+(1-\lambda)f_H(x_{n-1})\right)}\,\sum_{x_n=0}^{\infty}\frac{\big(f_A(x_{n-1})^{\lambda}\,f_H(x_{n-1})^{1-\lambda}\big)^{x_n}}{x_n!}\\ &= \sum_{x_1=0}^{\infty}\cdots\sum_{x_{n-1}=0}^{\infty}\ \prod_{k=1}^{n-1} Z^{(\lambda)}_{n,k}(x)\cdot \exp\Big\{f_A(x_{n-1})^{\lambda}\,f_H(x_{n-1})^{1-\lambda} - \big(\lambda f_A(x_{n-1})+(1-\lambda)f_H(x_{n-1})\big)\Big\}. \end{aligned} \qquad (29)$$

From (29), one can see that a crucial role for the exact calculation (respectively the derivation of bounds) of the Hellinger integral is played by the functions defined for x[0,[

$$\phi_\lambda(x) := \phi(x,\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda) := \varphi_\lambda(x) - f_\lambda(x), \qquad\text{with} \qquad (30)$$
$$\varphi_\lambda(x) := \varphi(x,\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda) := f_A(x)^{\lambda}\,f_H(x)^{1-\lambda} \qquad\text{and} \qquad (31)$$
$$f_\lambda(x) := f(x,\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda) := \lambda f_A(x) + (1-\lambda) f_H(x) = \alpha_\lambda + \beta_\lambda\,x, \qquad (32)$$

where we have used the λ-weighted-averages

$$\alpha_\lambda := \alpha(\alpha_A,\alpha_H,\lambda) := \lambda\cdot\alpha_A + (1-\lambda)\cdot\alpha_H \qquad\text{and}\qquad \beta_\lambda := \beta(\beta_A,\beta_H,\lambda) := \lambda\cdot\beta_A + (1-\lambda)\cdot\beta_H.$$

Since $\lambda$ plays a special role, henceforth we typically use it as an index and often omit $\beta_A,\beta_H,\alpha_A,\alpha_H$. According to Lemma A1 in the Appendix A.1, it follows that for $\lambda\in\,]0,1[$ (respectively $\lambda\in\mathbb{R}\setminus[0,1]$) one gets $\phi_\lambda(x)\leq 0$ (respectively $\phi_\lambda(x)\geq 0$) for all $x\in[0,\infty[$. Furthermore, in both cases there holds $\phi_\lambda(x)=0$ iff $f_A(x)=f_H(x)$, i.e., for $x = x^* := \frac{\alpha_A-\alpha_H}{\beta_H-\beta_A} \geq 0$. This is consistent with the corresponding generally valid upper and lower bounds (cf. (9) and (11)): $0 < H_\lambda(P_{A,n}\|P_{H,n}) \leq 1$ for $\lambda\in\,]0,1[$, and $1 \leq H_\lambda(P_{A,n}\|P_{H,n})$ for $\lambda\in\mathbb{R}\setminus[0,1]$.
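As a quick numerical sanity check of the one-step formula (28), the following small Python sketch (our own illustration) compares a truncated direct summation of $\sum_{x} p_A(x)^{\lambda}\,p_H(x)^{1-\lambda}$ for two Poisson laws with means $f_A(x_0)$ and $f_H(x_0)$ against the closed-form expression; the chosen parameter values are arbitrary.

```python
import math

def hellinger_one_step(x0, beta_A, alpha_A, beta_H, alpha_H, lam, cutoff=500):
    """Check of (28): Hellinger integral of order lam between the one-step laws,
    i.e., two Poisson distributions with means f_A(x0) and f_H(x0)."""
    f_A = alpha_A + beta_A * x0
    f_H = alpha_H + beta_H * x0
    # truncated direct summation of  sum_x  pA(x)^lam * pH(x)^(1-lam)  (in log-space)
    direct = sum(
        math.exp(lam * (-f_A + x * math.log(f_A)) + (1 - lam) * (-f_H + x * math.log(f_H))
                 - math.lgamma(x + 1))
        for x in range(cutoff)
    )
    closed_form = math.exp(f_A ** lam * f_H ** (1 - lam) - (lam * f_A + (1 - lam) * f_H))
    return direct, closed_form

print(hellinger_one_step(x0=5, beta_A=2.0, alpha_A=10.0, beta_H=0.5, alpha_H=0.2, lam=0.5))
```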

As a first indication for our proposed method, let us start by illuminating the simplest case λR\{0,1} and γ:=αHβAαAβH=0. This means that βA,βH,αA,αHPNIPSP,1, where PSP,1 is the set of all (componentwise) strictly positive βA,βH,αA,αH with βAβH, αAαH and βAβH=αAαH1 (“the equal-fraction-case”). In this situation, all the three functions (30) to (32) are linear. Indeed,

φλ(x)=pλE+qλEx (33)

with pλE:=αAλαH1λ and qλE:=βAλβH1λ (where the index E stands for exact linearity). Clearly, qλE>0 on PNIPSP,1, as well as pλE>0 on PSP,1 and pλE=0 on PNI. Furthermore,

ϕλ(x)=rλE+sλEx

with rλE:=pλEαλ=αAλαH1λ(λαA+(1λ)αH) and sλE:=qλEβλ=βAλβH1λ(λβA+(1λ)βH). Due to Lemma A1 one knows that on PNIPSP,1 one gets sλE<0 for λ]0,1[ and sλE>0 for λR\[0,1]. Furthermore, on PSP,1 one gets rλE<0 (resp. rλE>0) for λ]0,1[ (resp. λR\[0,1]), whereas on PNI, the no-immigration setup, we get for all λR\{0,1} rλE=0.

As it will be seen later on, such kind of linearity properties are useful for the recursive handling of the Hellinger integrals. However, only on the parameter set PNIPSP,1 the functions φλ and ϕλ are linear. Hence, in the general case βA,βH,αA,αH,λP×R\{0,1} we aim for linear lower and upper bounds

$$\varphi^L_\lambda(x) := p^L_\lambda + q^L_\lambda\,x \ \leq\ \varphi_\lambda(x) \ \leq\ \varphi^U_\lambda(x) := p^U_\lambda + q^U_\lambda\,x, \qquad (34)$$

x[0,[ (ultimately, xN0), which by (30) and (31) leads to

$$\phi_\lambda(x) \ \leq\ \phi^U_\lambda(x) := r^U_\lambda + s^U_\lambda\cdot x := \big(p^U_\lambda-\alpha_\lambda\big) + \big(q^U_\lambda-\beta_\lambda\big)\cdot x, \qquad \phi_\lambda(x) \ \geq\ \phi^L_\lambda(x) := r^L_\lambda + s^L_\lambda\cdot x := \big(p^L_\lambda-\alpha_\lambda\big) + \big(q^L_\lambda-\beta_\lambda\big)\cdot x, \qquad (35)$$

x[0,[ (ultimately, xN0). Of course, the involved slopes and intercepts should satisfy reasonable restrictions. Later on, we shall impose further restrictions on the involved slopes and intercepts, in order to guarantee nice properties of the general Hellinger integral bounds given in Theorem 1 below (for instance, in consistency with the nonnegativity of φλ we could require pλUpλL0, qλUqλL0 which nontrivially implies that these bounds possess certain monotonicity properties). For the formulation of our first assertions on Hellinger integrals, we make use of the following notation:

Definition 3.

For all βA,βH,αA,αH,λP×R\{0,1} and all p,qR let us define the sequences an(q)nN0 and bn(p,q)nN0 recursively by

$$a_0(q) := 0; \qquad a_n(q) := \xi^{(q)}_\lambda\big(a_{n-1}(q)\big) := q\cdot e^{a_{n-1}(q)} - \beta_\lambda, \quad n\in\mathbb{N}, \qquad (36)$$
$$b_0(p,q) := 0; \qquad b_n(p,q) := p\cdot e^{a_{n-1}(q)} - \alpha_\lambda, \quad n\in\mathbb{N}. \qquad (37)$$

Notice the interrelation $a_1(q^A_\lambda) = s^A_\lambda$ and $b_1(p^A_\lambda,q^A_\lambda) = r^A_\lambda$ for $A\in\{E,L,U\}$. Clearly, for all $q\in\mathbb{R}\setminus\{0\}$ and $p\in\mathbb{R}$ one has the linear interrelation

$$b_n(p,q) = \frac{p}{q}\,a_n(q) + \frac{p}{q}\,\beta_\lambda - \alpha_\lambda, \quad n\in\mathbb{N}. \qquad (38)$$

Accordingly, we obtain fundamental Hellinger integral evaluations:

Theorem 1.

  • (a) 
    For all $(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda)\in(\mathcal{P}_{NI}\cup\mathcal{P}_{SP,1})\times(\mathbb{R}\setminus\{0,1\})$, all initial population sizes $X_0\in\mathbb{N}$ and all observation horizons $n\in\mathbb{N}$ one can recursively compute the exact value
    $$H_\lambda(P_{A,n}\|P_{H,n}) = \exp\Big\{a_n(q^E_\lambda)\,X_0 + \frac{\alpha_A}{\beta_A}\sum_{k=1}^{n}a_k(q^E_\lambda)\Big\} =: V_{\lambda,X_0,n}, \qquad (39)$$
    where $\frac{\alpha_A}{\beta_A}$ can be equivalently replaced by $\frac{\alpha_H}{\beta_H}$. Recall that $q^E_\lambda := \beta_A^{\lambda}\beta_H^{1-\lambda}$. Notice that on $\mathcal{P}_{NI}\times(\mathbb{R}\setminus\{0,1\})$ the formula (39) simplifies significantly, since $\alpha_A=\alpha_H=0$.
  • (b) 
    For all $(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda)\in(\mathcal{P}_{SP}\setminus\mathcal{P}_{SP,1})\times(\mathbb{R}\setminus\{0,1\})$, all coefficients $p^L_\lambda,p^U_\lambda,q^L_\lambda,q^U_\lambda\in\mathbb{R}$ which satisfy (35) for all $x\in\mathbb{N}_0$ (and thus in particular $p^L_\lambda\leq p^U_\lambda$, $q^L_\lambda\leq q^U_\lambda$), all initial population sizes $X_0\in\mathbb{N}$ and all observation horizons $n\in\mathbb{N}$ one gets the following recursive (i.e., recursively computable) bounds for the Hellinger integrals:
    $$\text{for } \lambda\in\,]0,1[:\qquad B^L_{\lambda,X_0,n} := \widetilde{B}_{\lambda,X_0,n}(p^L_\lambda,q^L_\lambda) \;<\; H_\lambda(P_{A,n}\|P_{H,n}) \;\leq\; \min\big\{\widetilde{B}_{\lambda,X_0,n}(p^U_\lambda,q^U_\lambda),\,1\big\} =: B^U_{\lambda,X_0,n}, \qquad (40)$$
    $$\text{for } \lambda\in\mathbb{R}\setminus[0,1]:\qquad B^L_{\lambda,X_0,n} := \max\big\{\widetilde{B}_{\lambda,X_0,n}(p^L_\lambda,q^L_\lambda),\,1\big\} \;\leq\; H_\lambda(P_{A,n}\|P_{H,n}) \;<\; \widetilde{B}_{\lambda,X_0,n}(p^U_\lambda,q^U_\lambda) =: B^U_{\lambda,X_0,n}, \qquad (41)$$
    where for general $\lambda\in\mathbb{R}\setminus\{0,1\}$, $p\in\mathbb{R}$, $q\in\mathbb{R}\setminus\{0\}$ we use the definitions
    $$\widetilde{B}_{\lambda,X_0,n}(p,q) := \exp\Big\{a_n(q)\cdot X_0 + \sum_{k=1}^{n}b_k(p,q)\Big\} = \exp\Big\{a_n(q)\cdot X_0 + \frac{p}{q}\sum_{k=1}^{n}a_k(q) + n\cdot\Big(\frac{p}{q}\,\beta_\lambda - \alpha_\lambda\Big)\Big\}, \qquad (42)$$
    as well as
    $$\widetilde{B}_{\lambda,X_0,n}(p,0) := \exp\Big\{-\beta_\lambda\cdot X_0 + \big(p\cdot e^{-\beta_\lambda}-\alpha_\lambda\big)\cdot n\Big\}.$$

Remark 1.

  • (a) 

    Notice that the expression $\widetilde{B}_{\lambda,X_0,n}(p,q)$ can analogously be defined on the parameter set $\mathcal{P}_{NI}\cup\mathcal{P}_{SP,1}$. For the choices $q^E_\lambda := \beta_A^{\lambda}\beta_H^{1-\lambda} > 0$ and $p^E_\lambda := \alpha_A^{\lambda}\alpha_H^{1-\lambda} = q^E_\lambda\cdot\frac{\alpha_A}{\beta_A} = q^E_\lambda\cdot\frac{\alpha_H}{\beta_H} \geq 0$ one gets $\big(p^E_\lambda/q^E_\lambda\big)\cdot\beta_\lambda - \alpha_\lambda = 0$, and thus the characterization $\widetilde{B}_{\lambda,X_0,n}(p^E_\lambda,q^E_\lambda) = V_{\lambda,X_0,n}$ as the exact value (rather than a lower/upper bound (component)).

  • (b) 

    In the case $q = \beta_\lambda$ one gets the explicit representation $\widetilde{B}_{\lambda,X_0,n}(p,q) = \exp\big\{(p-\alpha_\lambda)\cdot n\big\}$.

  • (c) 

    Using the skew symmetry (8), one can derive alternative bounds of the Hellinger integral by switching to the transformed parameter setup (βA,βH,αA,αH,λ):=(βH,βA,αH,αA,1λ). However, this does not lead to different bounds: define ϕλ, φλ and fλ analogously to (30), (31) and (32) by replacing the parameters βA,βH,αA,αH,λ with (βA,βH,αA,αH,λ). Then, there holds fλ(x)=fλ(x),φλ(x)=φλ(x) and ϕλ(x)=ϕλ(x), and the set of (lower and upper bound) parameters pλL,qλL,pλU,qλU satisfying (35) does not change under this transformation.

  • (d) 

    If there are no other restrictions on pλL,pλU,qλL,qλU than (35), the bounds in (40) and (41) can have some inconvenient features, e.g., being 1 for all (large enough) nN, having oscillating n-behaviour, being suboptimal in certain (other) senses. For a detailed discussion, the reader is referred to Section 3.16 ff. below.

  • (e) 

    For the (to our context) incompatible setup of GWI with Poisson offspring but nonstochastic immigration of constant value 1, the exact values of the corresponding Hellinger integrals (i.e., an “analogue” of part (a)) was established in Linkov & Lunyova [53].

Proof of Theorem 1.

Let us fix βA,βH,αA,αHP as well as x0:=X0N, and start with arbitrary λ]0,1[. We first prove the upper bound Bλ,X0,nU of part (b). Correspondingly, we suppose that the coefficients pλU, qλU satisfy (35) for all xN0. From (28), (30), (31), (32) and (35) one gets immediately Bλ,X0,1U in terms of the first sequence-element a1(qλU) (cf. (36)). With the help of (29) for all observation horizons nN\{1} we get (with the obvious shortcut for n=2)

$$\begin{aligned} H_\lambda(P_{A,n}\|P_{H,n}) &= \sum_{x_1=0}^{\infty}\cdots\sum_{x_{n-1}=0}^{\infty}\ \prod_{k=1}^{n-1}Z^{(\lambda)}_{n,k}(x)\cdot\exp\big\{\varphi_\lambda(x_{n-1})-f_\lambda(x_{n-1})\big\}\\ &< \sum_{x_1=0}^{\infty}\cdots\sum_{x_{n-1}=0}^{\infty}\ \prod_{k=1}^{n-1}Z^{(\lambda)}_{n,k}(x)\cdot\exp\big\{(p^U_\lambda-\alpha_\lambda)+(q^U_\lambda-\beta_\lambda)\,x_{n-1}\big\}\\ &= \sum_{x_1=0}^{\infty}\cdots\sum_{x_{n-1}=0}^{\infty}\ \prod_{k=1}^{n-1}Z^{(\lambda)}_{n,k}(x)\cdot\exp\big\{b_1(p^U_\lambda,q^U_\lambda)+a_1(q^U_\lambda)\,x_{n-1}\big\}\\ &= \exp\big\{b_1(p^U_\lambda,q^U_\lambda)\big\}\,\sum_{x_1=0}^{\infty}\cdots\sum_{x_{n-2}=0}^{\infty}\ \prod_{k=1}^{n-2}Z^{(\lambda)}_{n,k}(x)\cdot\exp\big\{e^{a_1(q^U_\lambda)}\,\varphi_\lambda(x_{n-2})-f_\lambda(x_{n-2})\big\}\\ &< \exp\big\{b_1(p^U_\lambda,q^U_\lambda)\big\}\,\sum_{x_1=0}^{\infty}\cdots\sum_{x_{n-2}=0}^{\infty}\ \prod_{k=1}^{n-2}Z^{(\lambda)}_{n,k}(x)\cdot\exp\big\{\big(e^{a_1(q^U_\lambda)}p^U_\lambda-\alpha_\lambda\big)+\big(e^{a_1(q^U_\lambda)}q^U_\lambda-\beta_\lambda\big)\,x_{n-2}\big\}\\ &= \exp\big\{b_1(p^U_\lambda,q^U_\lambda)\big\}\,\sum_{x_1=0}^{\infty}\cdots\sum_{x_{n-2}=0}^{\infty}\ \prod_{k=1}^{n-2}Z^{(\lambda)}_{n,k}(x)\cdot\exp\big\{b_2(p^U_\lambda,q^U_\lambda)+a_2(q^U_\lambda)\,x_{n-2}\big\}\\ &< \cdots < \exp\Big\{a_n(q^U_\lambda)\,x_0 + \sum_{k=1}^{n}b_k(p^U_\lambda,q^U_\lambda)\Big\}. \end{aligned} \qquad (43)$$

Notice that for the strictness of the above inequalities we have used the fact that ϕλ(x)<ϕλU(x) for some (in fact, all but at most two) xN0 (cf. Properties 3(P19) below). Since for some admissible choices of pλU,qλU and some nN the last term in (43) can become larger than 1, one needs to take into account the cutoff-point 1 arising from (9). The lower bound Bλ,X0,nL of part (b), as well as the exact value of part (a) follow from (29) in an analogous manner by employing pλL,qλL and pλE,qλE respectively. Furthermore, we use the fact that for βA,βH,αA,αH,λ(PNIPSP,1)×]0,1[ one gets from (38) the relation bn(pλE,qλE)=αAβAan(qλE). For the sake of brevity, the corresponding straightforward details are omitted here. Although we take the minimum of the upper bound derived in (43) and 1, the inequality Bλ,X0,nL<Bλ,X0,nU is nevertheless valid: the reason is that for constituting a lower bound, the parameters pλL,qλL must fulfill either the conditions [pλLαλ<0 and qλLβλ0] or [pλLαλ0 and qλLβλ<0] (or both), which guarantees that Bλ,X0,nL<1. The proof for all λR\[0,1] works out completely analogous, by taking into account the generally valid lower bound Hλ(PA,nPH,n)1 (cf. (11)). □

3.2. Some Useful Facts for Deeper Analyses

Theorem 1(b) and Remark 1(a) indicate the crucial role of the expression B˜λ,X0,n(p,q) and that the choice of the quantities p,q depends on the underlying (e.g., fixed) offspring-immigration parameter constellation βA,βH,αA,αH as well as on the (e.g., selectable) value of λ, i.e., pλA=pAβA,βH,αA,αH,λ and qλA=qAβA,βH,αA,αH,λ with A{E,L,U}. In order to study the desired time-behaviour nB˜λ,X0,n(·,·) of the Hellinger integral bounds resp. exact values, one therefore faces a six-dimensional (and thus highly non-obvious) detailed analysis, including the search for criteria (in addition to (35)) on good/optimal choices of pλL,qλL,pλU,qλU. Since these criteria will (almost) always imply the nonnegativity of pλA,qλA (A{L,U}) and pλE0,qλE>0 (cf. Remark 1(a)), let us first present some fundamental properties of the underlying crucial sequences an(q)nN and bn(p,q)nN for general p0,q0.

Properties 1.

For all $\lambda\in\mathbb{R}$ the following holds:

  • (P1) 
    If $0<q<\beta_\lambda$, then the sequence $(a_n(q))_{n\in\mathbb{N}}$ is strictly negative, strictly decreasing and converges to the unique negative solution $x_0(q)\in\,]-\beta_\lambda,\,q-\beta_\lambda[$ of the equation
    $$\xi^{(q)}_\lambda(x) = q\cdot e^{x} - \beta_\lambda = x. \qquad (44)$$
  • (P2) 
    If $0<q=\beta_\lambda$, then $a_n(q)\equiv 0$.
  • (P3) 
    If $q>\max\{0,\beta_\lambda\}$, then the sequence $(a_n(q))_{n\in\mathbb{N}}$ is strictly positive and strictly increasing. Notice that in this setup, $q\geq 1$ implies $\min\{1,e^{\beta_\lambda-1}\}<q$ (so that the divergence case (P3b) below applies).
    • (P3a) 
      If additionally $q\leq\min\{1,e^{\beta_\lambda-1}\}$, then the sequence $(a_n(q))_{n\in\mathbb{N}}$ converges to the smallest positive solution $x_0(q)\in\,]0,-\log q]$ of the Equation (44).
    • (P3b) 
      If additionally $q>\min\{1,e^{\beta_\lambda-1}\}$, then the sequence $(a_n(q))_{n\in\mathbb{N}}$ diverges to $\infty$, faster than exponentially (i.e., there do not exist constants $c_1,c_2\in\mathbb{R}$ such that $a_n(q)\leq e^{c_1+c_2 n}$ for all $n\in\mathbb{N}$).
  • (P4) 
    If $q=0$, then one gets $a_n(0)\equiv-\beta_\lambda$.

    Due to the linear interrelation (38), these results directly carry over to the behaviour of the sequence $(b_n(p,q))_{n\in\mathbb{N}}$:

  • (P5) 
    If $p>0$ and $0<q<\beta_\lambda$, then the sequence $(b_n(p,q))_{n\in\mathbb{N}}$ is strictly decreasing and converges to $p\cdot e^{x_0(q)}-\alpha_\lambda$. Trivially, $b_1(p,q)=p-\alpha_\lambda$.
    • (P5a) 
      If additionally $p<\alpha_\lambda$, then $(b_n(p,q))_{n\in\mathbb{N}}$ is strictly negative for all $n\in\mathbb{N}$.
    • (P5b) 
      If additionally $p=\alpha_\lambda$, then $(b_n(p,q))_{n\in\mathbb{N}}$ is strictly negative for all $n\in\mathbb{N}\setminus\{1\}$.
    • (P5c) 
      If additionally $p>\alpha_\lambda$, then $(b_n(p,q))_{n\in\mathbb{N}}$ is strictly positive for some (and possibly for all) $n\in\mathbb{N}$.
  • (P6) 
    If $0<q=\beta_\lambda$, then $b_n(p,q)\equiv p-\alpha_\lambda$.
  • (P7) 
    If $p>0$ and $q>\max\{0,\beta_\lambda\}$, then the sequence $(b_n(p,q))_{n\in\mathbb{N}}$ is strictly increasing.
    • (P7a) 
      If additionally $q\leq\min\{1,e^{\beta_\lambda-1}\}$, then the sequence $(b_n(p,q))_{n\in\mathbb{N}}$ converges to $p\cdot e^{x_0(q)}-\alpha_\lambda\in\,]p-\alpha_\lambda,\ p/q-\alpha_\lambda]$; this limit can take any sign, depending on the parameter constellation.
    • (P7b) 
      If additionally $q>\min\{1,e^{\beta_\lambda-1}\}$, then the sequence $(b_n(p,q))_{n\in\mathbb{N}}$ diverges to $\infty$, faster than exponentially.
  • (P8) 
    For the remaining cases we get: $b_n(0,q)\equiv-\alpha_\lambda$, and $b_n(p,0)=p\cdot e^{-\beta_\lambda}-\alpha_\lambda$ for all $n\in\mathbb{N}\setminus\{1\}$ ($p\in\mathbb{R}$, $q\in\mathbb{R}$). Moreover, in our investigations we will repeatedly make use of the function $\xi^{(q)}_\lambda(\cdot)$ from the definition (36) of $a_n(q)$ (see also (44)), which has the following properties:

  • (P9) 
    For $q\in\,]0,\infty[$ and all $\lambda\in\mathbb{R}\setminus\{0,1\}$ the function $\xi^{(q)}_\lambda(\cdot)$ is strictly increasing, strictly convex and smooth, and there holds
    • (P9a) 
      $$\xi^{(q)}_\lambda(0)\ \begin{cases}<0, & \text{if } q<\beta_\lambda,\\ =0, & \text{if } q=\beta_\lambda,\\ >0, & \text{if } q>\beta_\lambda.\end{cases}$$
    • (P9b) 
      $$\lim_{x\to-\infty}\xi^{(q)}_\lambda(x)=-\beta_\lambda, \qquad\text{and}\qquad \lim_{x\to\infty}\xi^{(q)}_\lambda(x)=\infty.$$

The proof of these properties is provided in Appendix A.1. From Properties 1 (P1) to (P4) we can see, that the behaviour of the sequence an(q)nN can be classified basically into four different types; besides the case (P2) where an(q) is constant, the sequence can be either (i) strictly decreasing and convergent (e.g., for the NI case βA,βH,αA,αH,λ=(0.5,2,0,0,0.5) leading to βλ=λβA+(1λ)βH=1.25 and to q:=qλE=βAλβH1λ=1, cf. (33) resp. Theorem 1(a)), or (ii) strictly increasing and convergent (e.g., for βA,βH,αA,αH,λ=(0.5,2,0,0,1.5) leading to βλ=0.25, q:=qλE=0.25), or (iii) strictly increasing and divergent (e.g., for βA,βH,αA,αH,λ=(0.5,2,0,0,2.7) leading to βλ=2.05, q:=qλE0.047366). Within our running-example epidemiological context of Section 2.3, this corresponds to a “potentially dangerous” infectious-disease-transmission situation (H) (with supercritical reproduction number βH=2), whereas (A) describes a “mild” situation (with “low” subcritical βA=0.5).

As already mentioned before, the sequences an(q)nN and bn(p,q)nN–whose behaviours for general p0 and q0 were described by the Properties 1–have to be evaluated at setup-dependent choices p=pλ=pβA,βH,αA,αH,λ and q=qλ=qβA,βH,αA,αH,λ. Hence, for fixed βA,βH,αA,αH, one of the questions–which arises in the course of the desired investigations of the time-behaviour of the Hellinger integral bounds (resp. exact values)–is for which λR the sequence an(qλ)nN converges. In the following, we illuminate this for the important special case qλ=βAλβH1λ. Suppose at first that βAβH. Properties 1 (P1) implies that for λ]0,1[ one has limnan(qλ)=x0(qλ)]βλ,qλβλ[, and Lemma A1 states that qλβλ<0. For λR\[0,1], there holds qλ>max{0,βλ}, and from (P3) one can see that an(qλ)nN does not converge to x0(qλ) in general, but for qλmin{1,eβλ1} which constitutes an implicit condition on λ. This can be made explicit, with the help of the auxiliary variables

$$\lambda_- := \lambda_-(\beta_A,\beta_H) := \begin{cases}\inf\Big\{\lambda\leq 0:\ \beta_A^{\lambda}\beta_H^{1-\lambda}\leq\min\big\{1,\exp\{\lambda\beta_A+(1-\lambda)\beta_H-1\}\big\}\Big\}, & \text{in case that the set is nonempty},\\ 0, & \text{else},\end{cases}$$
$$\lambda_+ := \lambda_+(\beta_A,\beta_H) := \begin{cases}\sup\Big\{\lambda\geq 1:\ \beta_A^{\lambda}\beta_H^{1-\lambda}\leq\min\big\{1,\exp\{\lambda\beta_A+(1-\lambda)\beta_H-1\}\big\}\Big\}, & \text{in case that the set is nonempty},\\ 1, & \text{else}.\end{cases}$$

For the constellation βA=βH>0 we clearly obtain qλ=βAλβH1λ=βA=βH=βλ. Hence, (P2) implies that the sequence an(qλ)nN converges for all λR\{0,1} and we can set λ:= as well as λ+:=. Incorporating this and by adapting a result of Linkov & Lunyova [53] on λ(v1,v2),λ+(v1,v2) for βAβH, we end up with

Lemma 1.

(a) For all $\beta_A>0$, $\beta_H>0$ with $\beta_A\neq\beta_H$ there holds

$$\lambda_- = \lambda_-(\beta_A,\beta_H) = \begin{cases} 0, & \text{if } \beta_H\geq 1,\\ \breve\lambda, & \text{if } \beta_H<1 \text{ and } \beta_A\notin[\beta_H,\,\beta_H\,z(\beta_H)],\\ -\infty, & \text{if } \beta_H<1 \text{ and } \beta_A\in\,]\beta_H,\,\beta_H\,z(\beta_H)],\end{cases}$$
$$\lambda_+ = \lambda_+(\beta_A,\beta_H) = \begin{cases} 1, & \text{if } \beta_A\geq 1,\\ \breve\lambda, & \text{if } \beta_A<1 \text{ and } \beta_H\notin[\beta_A,\,\beta_A\,z(\beta_A)],\\ \infty, & \text{if } \beta_A<1 \text{ and } \beta_H\in\,]\beta_A,\,\beta_A\,z(\beta_A)],\end{cases}$$

where

$$\breve\lambda := \breve\lambda(\beta_A,\beta_H) := \frac{\beta_H-1-\log\beta_H}{\beta_H-\beta_A+\log\frac{\beta_A}{\beta_H}}\ \begin{cases}<0, & \text{if } \beta_H<1 \text{ and } \beta_A\notin[\beta_H,\,\beta_H\,z(\beta_H)],\\ >1, & \text{if } \beta_A<1 \text{ and } \beta_H\notin[\beta_A,\,\beta_A\,z(\beta_A)].\end{cases}$$

Here, for fixed $\beta\in\,]0,\infty[\,\setminus\{1\}$ we denote by $z(\beta)$ the unique solution of the equation $\log(x)-\beta\,(x-1)=0$, $x\in\,]0,\infty[\,\setminus\{1\}$; for $\beta=1$, $z(\beta)=1$ denotes the unique solution of $\log(x)-(x-1)=0$, $x\in\,]0,\infty[$. (b) For all $\beta_A=\beta_H>0$ one gets $\lambda_-=\lambda_-(\beta_A,\beta_H)=-\infty$ as well as $\lambda_+=\lambda_+(\beta_A,\beta_H)=\infty$. Notice that the relationship $\breve\lambda(\beta_A,\beta_H)=1-\breve\lambda(\beta_H,\beta_A)$ is consistent with the skew symmetry (8).

A corresponding proof is given in Appendix A.1.
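The quantities of Lemma 1 are easy to evaluate numerically. The following Python sketch (our own illustration, based on the reconstruction of Lemma 1 given above) computes $z(\beta)$ by bisection and then $\breve\lambda$, $\lambda_-$ and $\lambda_+$; it should be read as a sketch under these assumptions rather than as part of the original paper.

```python
import math

def z_of_beta(beta, tol=1e-12):
    """Second root (besides x = 1) of log(x) - beta*(x - 1) = 0; z(1) = 1."""
    if beta == 1.0:
        return 1.0
    g = lambda x: math.log(x) - beta * (x - 1.0)
    if beta < 1.0:                      # root lies in ]1, infinity[
        lo, hi = 1.0 + 1e-9, 2.0
        while g(hi) > 0.0:
            hi *= 2.0
    else:                               # root lies in ]0, 1[ ; g(exp(-2*beta)) < 0 always
        lo, hi = math.exp(-2.0 * beta), 1.0 - 1e-9
    for _ in range(200):                # plain bisection
        mid = 0.5 * (lo + hi)
        if (g(mid) > 0.0) == (g(lo) > 0.0):
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def lambda_breve(bA, bH):
    """breve-lambda of Lemma 1 (only meaningful for bA != bH)."""
    return (bH - 1.0 - math.log(bH)) / (bH - bA + math.log(bA / bH))

def lambda_minus(bA, bH):
    """lambda_- of Lemma 1, as reconstructed above."""
    if bA == bH:
        return -math.inf
    if bH >= 1.0:
        return 0.0
    return -math.inf if bH < bA <= bH * z_of_beta(bH) else lambda_breve(bA, bH)

def lambda_plus(bA, bH):
    """lambda_+ of Lemma 1, as reconstructed above."""
    if bA == bH:
        return math.inf
    if bA >= 1.0:
        return 1.0
    return math.inf if bA < bH <= bA * z_of_beta(bA) else lambda_breve(bA, bH)

# example: supercritical (A) with beta_A = 2 versus subcritical (H) with beta_H = 0.5
print(lambda_minus(2.0, 0.5), lambda_plus(2.0, 0.5))
```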

With these auxiliary basic facts in hand, let us now work out our detailed investigations of the time-behaviour nHλ(PA,nPH,n), where we start with the exactly treatable case (a) in Theorem 1.

3.3. Detailed Analyses of the Exact Recursive Values, i.e., for the Cases βA,βH,αA,αHPNIPSP,1

In the no-immigration-case βA,βH,αA,αHPNI and in the equal-fraction-case βA,βH,αA,αHPSP,1, the Hellinger integral can be calculated exactly in terms of Hλ(PA,nPH,n)=Vλ,X0,n (cf. (39)), as proposed in part (a) of Theorem 1. This quantity depends on the behaviour of the sequence an(qλE)nN, with qλE:=βAλβH1λ>0, and of the sum αAβAk=1nak(qλE)nN. The last expression is equal to zero on PNI. On PSP,1, this sum is unequal to zero. Using Lemma A1 we conclude that qλE<βλ (resp. qλE>βλ) iff λ]0,1[ (resp. λR\[0,1]), since on PNIPSP,1 there holds βAβH. Thus, from Properties 1 (P1) we can see that the sequence an(qλE)nN is strictly negative, strictly decreasing and it converges to the unique solution x0(qλE)]βλ,qλEβλ[ of the Equation (44) if λ]0,1[. For λR\[0,1], (P3) implies that the sequence an(qλE)nN is strictly positive, strictly increasing and converges to the smallest positive solution x0(qλE)]0,log(qλE)] of the Equation (44) in case that (P3a) is satisfied, otherwise it diverges to . Thus, we have shown the following detailed behaviour of Hellinger integrals:

Proposition 2.

For all βA,βH,αA,αH,λPNI×]0,1[ and all initial population sizes X0N there holds

$$\begin{array}{ll} (a) & H_\lambda(P_{A,1}\|P_{H,1}) = \exp\big\{\big(\beta_A^{\lambda}\beta_H^{1-\lambda}-\lambda\beta_A-(1-\lambda)\beta_H\big)\,X_0\big\} < 1,\\ (b) & \text{the sequence } \big(H_\lambda(P_{A,n}\|P_{H,n})\big)_{n\in\mathbb{N}} \text{ given by } H_\lambda(P_{A,n}\|P_{H,n}) = \exp\big\{a_n(q^E_\lambda)\,X_0\big\} =: V_{\lambda,X_0,n} \text{ is strictly decreasing},\\ (c) & \lim_{n\to\infty}H_\lambda(P_{A,n}\|P_{H,n}) = \exp\big\{x_0(q^E_\lambda)\,X_0\big\} \in\,]0,1[,\\ (d) & \lim_{n\to\infty}\frac{1}{n}\log H_\lambda(P_{A,n}\|P_{H,n}) = 0,\\ (e) & \text{the map } X_0\mapsto V_{\lambda,X_0,n} \text{ is strictly decreasing}. \end{array}$$

Proposition 3.

For all βA,βH,αA,αH,λPNI×(R\[0,1]) and all initial population sizes X0N there holds with qλE:=βAλβH1λ

$$\begin{array}{ll} (a) & H_\lambda(P_{A,1}\|P_{H,1}) = \exp\big\{\big(\beta_A^{\lambda}\beta_H^{1-\lambda}-\beta_\lambda\big)\cdot X_0\big\} > 1,\\ (b) & \text{the sequence } \big(H_\lambda(P_{A,n}\|P_{H,n})\big)_{n\in\mathbb{N}} \text{ given by } H_\lambda(P_{A,n}\|P_{H,n}) = \exp\big\{a_n(q^E_\lambda)\cdot X_0\big\} =: V_{\lambda,X_0,n} \text{ is strictly increasing},\\ (c) & \lim_{n\to\infty}H_\lambda(P_{A,n}\|P_{H,n}) = \begin{cases}\exp\big\{x_0(q^E_\lambda)\cdot X_0\big\} > 1, & \text{if } \lambda\in[\lambda_-,\lambda_+]\setminus[0,1],\\ \infty, & \text{if } \lambda\in\,]-\infty,\lambda_-[\,\cup\,]\lambda_+,\infty[,\end{cases}\\ (d) & \lim_{n\to\infty}\frac{1}{n}\log H_\lambda(P_{A,n}\|P_{H,n}) = \begin{cases}0, & \text{if } \lambda\in[\lambda_-,\lambda_+]\setminus[0,1],\\ \infty, & \text{if } \lambda\in\,]-\infty,\lambda_-[\,\cup\,]\lambda_+,\infty[,\end{cases}\\ (e) & \text{the map } X_0\mapsto V_{\lambda,X_0,n} \text{ is strictly increasing}. \end{array}$$

In the case βA,βH,αA,αHPSP,1, the sequence an(qλE)nN under consideration is formally the same, with the parameter qλE:=βAλβH1λ>0. However, in contrast to the case PNI, on PSP,1 both the sequence an(qλE)nN and the sum αAβAk=1nak(qλE)nN are strictly decreasing in case that λ]0,1[, and strictly increasing in case that λR\[0,1]. The respective convergence behaviours are given in Properties 1 (P1) and (P3). We thus obtain

Proposition 4.

For all βA,βH,αA,αH,λPSP,1×]0,1[ and all initial population sizes X0N there holds with qλE:=βAλβH1λ

$$\begin{array}{ll} (a) & H_\lambda(P_{A,1}\|P_{H,1}) = \exp\big\{\big(\beta_A^{\lambda}\beta_H^{1-\lambda}-\beta_\lambda\big)\cdot\big(X_0+\tfrac{\alpha_A}{\beta_A}\big)\big\} < 1,\\ (b) & \text{the sequence } \big(H_\lambda(P_{A,n}\|P_{H,n})\big)_{n\in\mathbb{N}} \text{ given by } H_\lambda(P_{A,n}\|P_{H,n}) = \exp\big\{a_n(q^E_\lambda)\cdot X_0 + \tfrac{\alpha_A}{\beta_A}\sum_{k=1}^{n}a_k(q^E_\lambda)\big\} =: V_{\lambda,X_0,n} \text{ is strictly decreasing},\\ (c) & \lim_{n\to\infty}H_\lambda(P_{A,n}\|P_{H,n}) = 0,\\ (d) & \lim_{n\to\infty}\frac{1}{n}\log H_\lambda(P_{A,n}\|P_{H,n}) = \tfrac{\alpha_A}{\beta_A}\cdot x_0(q^E_\lambda) < 0,\\ (e) & \text{the map } X_0\mapsto V_{\lambda,X_0,n} \text{ is strictly decreasing}. \end{array}$$

Proposition 5.

For all βA,βH,αA,αH,λPSP,1×(R\[0,1]) and all initial population sizes X0N there holds with qλE:=βAλβH1λ

$$\begin{array}{ll} (a) & H_\lambda(P_{A,1}\|P_{H,1}) = \exp\big\{\big(\beta_A^{\lambda}\beta_H^{1-\lambda}-\beta_\lambda\big)\cdot\big(X_0+\tfrac{\alpha_A}{\beta_A}\big)\big\} > 1,\\ (b) & \text{the sequence } \big(H_\lambda(P_{A,n}\|P_{H,n})\big)_{n\in\mathbb{N}} \text{ given by } H_\lambda(P_{A,n}\|P_{H,n}) = \exp\big\{a_n(q^E_\lambda)\cdot X_0 + \tfrac{\alpha_A}{\beta_A}\sum_{k=1}^{n}a_k(q^E_\lambda)\big\} =: V_{\lambda,X_0,n} \text{ is strictly increasing},\\ (c) & \lim_{n\to\infty}H_\lambda(P_{A,n}\|P_{H,n}) = \infty,\\ (d) & \lim_{n\to\infty}\frac{1}{n}\log H_\lambda(P_{A,n}\|P_{H,n}) = \begin{cases}\tfrac{\alpha_A}{\beta_A}\cdot x_0(q^E_\lambda) > 0, & \text{if } \lambda\in[\lambda_-,\lambda_+]\setminus[0,1],\\ \infty, & \text{if } \lambda\in\,]-\infty,\lambda_-[\,\cup\,]\lambda_+,\infty[,\end{cases}\\ (e) & \text{the map } X_0\mapsto V_{\lambda,X_0,n} \text{ is strictly increasing}. \end{array}$$

Due to the nature of the equal-fraction-case PSP,1, in the assertions (a), (b), (d) of the Propositions 4 and 5, the fraction αA/βA can be equivalently replaced by αH/βH.

Remark 2.

For the (to our context) incompatible setup of GWI with Poisson offspring but nonstochastic immigration of constant value 1, an “analogue” of part (d) of the Propositions 4 resp. 5 was established in Linkov & Lunyova [53].

3.4. Some Preparatory Basic Facts for the Remaining Cases βA,βH,αA,αHPSP\PSP,1

The bounds Bλ,X0,nL,Bλ,X0,nU for the Hellinger integral introduced in formula (40) in Theorem 1 can be chosen arbitrarily from a (pλL,qλL,pλU,qλU)-indexed set of context-specific parameters satisfying (34), or equivalently (35).

In order to derive bounds which are optimal, with respect to goals that will be discussed later, the following monotonicity properties of the sequences an(q)nN and bn(p,q)nN (cf. (36), (37)) for general, context-independent parameters q and p, will turn out to be very useful:

Properties 2.

  • (P10) 
    For $0\leq q_1<q_2<\infty$ there holds $a_n(q_1)<a_n(q_2)$ for all $n\in\mathbb{N}$.
  • (P11) 
    For each fixed $q\geq 0$ and $0\leq p_1<p_2<\infty$ there holds $b_n(p_1,q)<b_n(p_2,q)$ for all $n\in\mathbb{N}$.
  • (P12) 
    For fixed $p>0$ and $0\leq q_1<q_2$ it follows that $b_n(p,q_1)<b_n(p,q_2)$ for all $n\in\mathbb{N}$.
  • (P13) 
    Suppose that $0\leq p_1<p_2$ and $0\leq q_2<q_1$. For fixed $n\in\mathbb{N}$, no dominance assertion can be conjectured for $b_n(p_1,q_1)$, $b_n(p_2,q_2)$. As an example, consider the setup $(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda)=(0.4,0.8,5,3,0.5)$; within our running-example epidemiological context of Section 2.3, this corresponds to a "nearly dangerous" infectious-disease-transmission situation (H) (with nearly critical reproduction number $\beta_H=0.8$ and importation mean $\alpha_H=3$), whereas (A) describes a "mild" situation (with "low" subcritical $\beta_A=0.4$ and $\alpha_A=5$). On the nonnegative real line, the function $\varphi_\lambda(x)$ can be bounded from above by the linear functions $\varphi^{U,1}_\lambda(x):=p_1+q_1x:=4.040+0.593\cdot x$ as well as by $\varphi^{U,2}_\lambda(x):=p_2+q_2x:=4.110+0.584\cdot x$. Clearly, $p_1<p_2$ and $q_1>q_2$. The first eight elements and the respective limits of the corresponding sequences $b_n(p_1,q_1)$, $b_n(p_2,q_2)$ are the following (see also the numerical sketch after this list):

      n               1      2      3      4      5      6      7      8     limit
      b_n(p_1,q_1)  0.040  0.011 -0.005 -0.015 -0.021 -0.024 -0.026 -0.028  -0.029
      b_n(p_2,q_2)  0.110  0.045  0.007 -0.014 -0.026 -0.033 -0.036 -0.039  -0.041

  • (P14) 
    For arbitrary $0<p_1,p_2$ and $0\leq q_1,q_2\leq\min\{1,e^{\beta_\lambda-1}\}$ suppose that $\log(p_1)+x_0(q_1)<\log(p_2)+x_0(q_2)$. Then there holds
    $$p_1\cdot e^{x_0(q_1)}-\alpha_\lambda = \lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n}b_k(p_1,q_1) \;<\; \lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n}b_k(p_2,q_2) = p_2\cdot e^{x_0(q_2)}-\alpha_\lambda.$$
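The non-dominance phenomenon of (P13) can be reproduced numerically; the following Python sketch (our own illustration) iterates the recursions (36) and (37) for the two parameter pairs quoted above. Since the quoted intercepts and slopes are themselves rounded, the printed values may differ from the table in the last digit.

```python
import math

def b_sequence(p, q, alpha_lam, beta_lam, n_max):
    """b_n(p,q) from (37), driven by the a_n(q) recursion (36)."""
    a, out = 0.0, []
    for _ in range(n_max):
        out.append(p * math.exp(a) - alpha_lam)   # b_n = p*exp(a_{n-1}) - alpha_lambda
        a = q * math.exp(a) - beta_lam            # a_n = q*exp(a_{n-1}) - beta_lambda
    return out

# setup of (P13): (beta_A, beta_H, alpha_A, alpha_H, lambda) = (0.4, 0.8, 5, 3, 0.5)
alpha_lam = 0.5 * 5 + 0.5 * 3                     # = 4.0
beta_lam = 0.5 * 0.4 + 0.5 * 0.8                  # = 0.6
print([round(b, 3) for b in b_sequence(4.040, 0.593, alpha_lam, beta_lam, 8)])
print([round(b, 3) for b in b_sequence(4.110, 0.584, alpha_lam, beta_lam, 8)])
```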

From (P10) to (P12) one deduces that both sequences an(q)nN and bn(p,q)nN are monotone in the general parameters p,q0. Thus, for the upper bound of the Hellinger integral Bλ,X0,nU we should use nonnegative context-specific parameters pλU=pUβA,βH,αA,αH,λ and qλU=qUβA,βH,αA,αH,λ which are as small as possible, and for the lower bound Bλ,X0,nL we should use nonnegative context-specific parameters pλL=pLβA,βH,αA,αH,λ and qλL=qLβA,βH,αA,αH,λ which are as large as possible, of course, subject to the (equivalent) restrictions (34) and (35).

To find “optimal” parameter pairs, we have to study the following properties of the function ϕλ(·)=ϕ(·,βA,βH,αA,αH,λ) defined on [0,[ in (30) (which are also valid for the previous parameter context βA,βH,αA,αH(PNIPSP,1)):

Properties 3.

  • (P15) 
    One has
    $$\phi_\lambda(x) = \big(\alpha_A+\beta_A x\big)^{\lambda}\big(\alpha_H+\beta_H x\big)^{1-\lambda} - \big[\lambda(\alpha_A+\beta_A x)+(1-\lambda)(\alpha_H+\beta_H x)\big]\ \begin{cases}\leq 0, & \text{if } \lambda\in\,]0,1[,\\ \geq 0, & \text{if } \lambda\in\mathbb{R}\setminus[0,1],\end{cases}$$
    where equality holds iff $f_A(x)=f_H(x)$ for some $x\in[0,\infty[$, i.e., iff $x=x^*:=\frac{\alpha_A-\alpha_H}{\beta_H-\beta_A}\in[0,\infty[$.
  • (P16) 
    There holds
    $$\phi_\lambda(0) = \alpha_A^{\lambda}\alpha_H^{1-\lambda}-\alpha_\lambda\ \begin{cases}\leq 0, & \text{if } \lambda\in\,]0,1[,\\ \geq 0, & \text{if } \lambda\in\mathbb{R}\setminus[0,1],\end{cases}$$
    with equality iff $\alpha_A=\alpha_H$ together with $\beta_A\neq\beta_H$ (cf. Lemma A1).
  • (P17) 
    For all $\lambda\in\mathbb{R}\setminus\{0,1\}$ one gets
    $$\phi_\lambda'(x) = \lambda\beta_A\,f_A(x)^{\lambda-1}f_H(x)^{1-\lambda} + (1-\lambda)\beta_H\,f_A(x)^{\lambda}f_H(x)^{-\lambda} - \beta_\lambda.$$
  • (P18) 
    There holds
    $$\lim_{x\to\infty}\phi_\lambda'(x) = \beta_A^{\lambda}\beta_H^{1-\lambda}-\beta_\lambda\ \begin{cases}\leq 0, & \text{if } \lambda\in\,]0,1[,\\ \geq 0, & \text{if } \lambda\in\mathbb{R}\setminus[0,1],\end{cases}$$
    with equality iff $\beta_A=\beta_H$ together with $\alpha_A\neq\alpha_H$ (cf. Lemma A1).
  • (P19) 
    There holds
    $$\phi_\lambda''(x) = -\lambda(1-\lambda)\,f_A(x)^{\lambda-2}f_H(x)^{-\lambda-1}\,\big(\alpha_A\beta_H-\alpha_H\beta_A\big)^{2}\ \begin{cases}\leq 0, & \text{if } \lambda\in\,]0,1[,\\ \geq 0, & \text{if } \lambda\in\mathbb{R}\setminus[0,1],\end{cases}$$
    with equality iff $(\beta_A,\beta_H,\alpha_A,\alpha_H)\in(\mathcal{P}_{NI}\cup\mathcal{P}_{SP,1})$. Hence, for $(\beta_A,\beta_H,\alpha_A,\alpha_H)\in\mathcal{P}_{SP}\setminus\mathcal{P}_{SP,1}$, the function $\phi_\lambda$ is strictly concave (convex) for $\lambda\in\,]0,1[$ ($\lambda\in\mathbb{R}\setminus[0,1]$). Notice that $\phi_\lambda'(0)=\lambda\beta_A\big(\frac{\alpha_A}{\alpha_H}\big)^{\lambda-1}+(1-\lambda)\beta_H\big(\frac{\alpha_A}{\alpha_H}\big)^{\lambda}-\beta_\lambda$ can be either negative (e.g., for the setups $(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda)\in\{(4,2,3,1,0.5),(4,2,5,1,2)\}$), or zero (e.g., for $(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda)\in\{(4,2,4,1,0.5),(4,2,3,1,2)\}$), or positive (e.g., for $(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda)\in\{(4,2,5,1,0.5),(4,2,2,1,2)\}$), where the exemplary parameter constellations have concrete interpretations in our running-example epidemiological context of Section 2.3. Accordingly, for $\lambda\in\,]0,1[$, due to concavity and (P17), the function $\phi_\lambda(\cdot)$ can be either strictly decreasing, or can attain its global maximum in $]0,\infty[$, or (only in the case $\beta_A=\beta_H$) can be strictly increasing. Analogously, for $\lambda\in\mathbb{R}\setminus[0,1]$, the function $\phi_\lambda(\cdot)$ can be either strictly increasing, or can attain its global minimum in $]0,\infty[$, or (only in the case $\beta_A=\beta_H$) can be strictly decreasing.
  • (P20)
    For all $\lambda\in\mathbb{R}\setminus\{0,1\}$ one has
    $$\lim_{x\to\infty}\big[\phi_\lambda(x)-(\widetilde r_\lambda+\widetilde s_\lambda\,x)\big]=0, \quad\text{for } \widetilde r_\lambda := \widetilde p_\lambda-\alpha_\lambda := \lambda\alpha_A\Big(\tfrac{\beta_A}{\beta_H}\Big)^{\lambda-1}+(1-\lambda)\alpha_H\Big(\tfrac{\beta_A}{\beta_H}\Big)^{\lambda}-\alpha_\lambda \quad\text{and}\quad \widetilde s_\lambda := \widetilde q_\lambda-\beta_\lambda := \beta_A^{\lambda}\beta_H^{1-\lambda}-\beta_\lambda.$$
    The linear function $\widetilde\phi_\lambda(x):=\widetilde r_\lambda+\widetilde s_\lambda\cdot x$ constitutes the asymptote of $\phi_\lambda(\cdot)$. Notice that if $\beta_A=\beta_H$ one has $\widetilde s_\lambda=0=\widetilde r_\lambda$; if $\beta_A\neq\beta_H$ we have $\widetilde s_\lambda<0$ in the case $\lambda\in\,]0,1[$ and $\widetilde s_\lambda>0$ if $\lambda\in\mathbb{R}\setminus[0,1]$. Furthermore, $\phi_\lambda(0)<\widetilde r_\lambda$ if $\lambda\in\,]0,1[$ and $\phi_\lambda(0)>\widetilde r_\lambda$ if $\lambda\in\mathbb{R}\setminus[0,1]$ (cf. Lemma A1(c1) and (c2)). If $\alpha_A=\alpha_H$ (and thus $\beta_A\neq\beta_H$), then the intercept $\widetilde r_\lambda$ is strictly positive if $\lambda\in\,]0,1[$ resp. strictly negative if $\lambda\in\mathbb{R}\setminus[0,1]$. In contrast, for the case $\alpha_A\neq\alpha_H$, the intercept $\widetilde r_\lambda$ can take any sign; take e.g., $(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda)\in\{(3.7,0.9,2.0,1.0,0.5),(4,2,1.6,1,2)\}$ for $\widetilde r_\lambda>0$, $(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda)\in\{(3.6,0.9,2.0,1.0,0.5),(4,2,1.5,1,2)\}$ for $\widetilde r_\lambda=0$, and $(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda)\in\{(3.5,0.9,2.0,1.0,0.5),(4,2,1.4,1,2)\}$ for $\widetilde r_\lambda<0$; again, the exemplary parameter constellations have concrete interpretations in our running-example epidemiological context of Section 2.3.

The properties (P15) to (P20) above describe in detail the characteristics of the function ϕλ(·)=ϕ(·,βA,βH,αA,αH,λ). In the previous parameter setup PNIPSP,1, this function is linear, which can be seen from (P19). In the current parameter setup PSP\PSP,1, this function can basically be classified into four different types. From (P16) to (P20) it is easy to see that for all current parameter constellations the particular choices

$$p^A_\lambda := \alpha_A^{\lambda}\alpha_H^{1-\lambda} > 0, \qquad q^A_\lambda := \beta_A^{\lambda}\beta_H^{1-\lambda} > 0, \qquad (45)$$

which correspond to the following choices in (35)

$$r^A_\lambda := \alpha_A^{\lambda}\alpha_H^{1-\lambda}-\alpha_\lambda \leq 0\ (\text{resp. }\geq 0), \qquad s^A_\lambda := \beta_A^{\lambda}\beta_H^{1-\lambda}-\beta_\lambda \leq 0\ (\text{resp. }\geq 0),$$

– where A=L (resp. A=U)–lead to the tightest lower bound Bλ,X0,nL (resp. upper bound Bλ,X0,nU) for Hλ(PA,nPH,n) in (40) in the case λ]0,1[ (resp. λR\[0,1]). Notice that for the previous parameter setup βA,βH,αA,αH(PNIPSP,1) these choices led to the exact values of the Hellinger integral and to the simplification pλE/qλE·βλαλ=0, which implies bn(pλE,qλE)=αA/βA·an(qλE). In contrast, in the current parameter setup βA,βH,αA,αHPSP\PSP,1 we only derive the optimal lower (resp. upper) bound for λ]0,1[ (resp. λR\[0,1]) by using the parameters pλA,qλA for A=L (resp. A=U) and pλA/qλA·βλαλ0. For a better distinguishability and easier reference we thus stick to the Lnotation (resp. Unotation) here.

3.5. Lower Bounds for the Cases βA,βH,αA,αH,λ(PSP\PSP,1)×]0,1[

The discussion above implies that the lower bound Bλ,X0,nL for the Hellinger integral Hλ(PA,nPH,n) in (40) is optimal for the choices pλL,qλL>0 defined in (45). If βAβH, due to Properties 1 (P1) and Lemma A1, the sequence an(qλL)nN is strictly negative and strictly decreasing and converges to the unique negative solution of the Equation (44). Furthermore, due to (P5), the sequence bn(pλL,qλL)nN, as defined in (37), is strictly decreasing. Since b1(pλL,qλL)=pλLαλ0 by Lemma A1, with equality iff αA=αH, the sequence bn(pλL,qλL)nN is also strictly negative (with the exception b1(pλL,qλL)=0 for αA=αH) and strictly decreasing. If βA=βH and thus αAαH, due to (P2), (P6) and Lemma A1, there holds an(qλL)0 and bn(qλL)pλLαλ<0. Thus, analogously to the cases PNIPSP,1 we obtain

Proposition 6.

For all βA,βH,αA,αH,λ(PSP\PSP,1)×]0,1[ and all initial population sizes X0N there holds with pλL:=αAλαH1λ,qλL:=βAλβH1λ

$$\begin{array}{ll} (a) & B^L_{\lambda,X_0,1} = \exp\big\{\big(\beta_A^{\lambda}\beta_H^{1-\lambda}-\beta_\lambda\big)\cdot X_0 + \alpha_A^{\lambda}\alpha_H^{1-\lambda}-\alpha_\lambda\big\} < 1,\\ (b) & \text{the sequence of lower bounds } \big(B^L_{\lambda,X_0,n}\big)_{n\in\mathbb{N}} \text{ for } H_\lambda(P_{A,n}\|P_{H,n}) \text{ given by } B^L_{\lambda,X_0,n} = \exp\big\{a_n(q^L_\lambda)\cdot X_0 + \tfrac{p^L_\lambda}{q^L_\lambda}\sum_{k=1}^{n}a_k(q^L_\lambda) + n\cdot\big(\tfrac{p^L_\lambda}{q^L_\lambda}\beta_\lambda-\alpha_\lambda\big)\big\} \text{ is strictly decreasing},\\ (c) & \lim_{n\to\infty}B^L_{\lambda,X_0,n} = 0,\\ (d) & \lim_{n\to\infty}\frac{1}{n}\log B^L_{\lambda,X_0,n} = \tfrac{p^L_\lambda}{q^L_\lambda}\cdot\big(x_0(q^L_\lambda)+\beta_\lambda\big)-\alpha_\lambda = p^L_\lambda\cdot e^{x_0(q^L_\lambda)}-\alpha_\lambda < 0,\\ (e) & \text{the map } X_0\mapsto B^L_{\lambda,X_0,n} \text{ is strictly decreasing}. \end{array}$$

3.6. Goals for Upper Bounds for the Cases βA,βH,αA,αH,λ(PSP\PSP,1)×]0,1[

For parameter constellations βA,βH,αA,αH,λ(PSP\PSP,1)×]0,1[, in contrast to the treatment of the lower bounds (cf. the previous Section 3.5), the fine-tuning of the upper bounds of the Hellinger integrals Hλ(PA,nPH,n) is much more involved. To begin with, let us mention that the monotonicity-concerning Properties 2 (P10) to (P12) imply that for a tight upper bound Bλ,X0,nU (cf. (40)) one should choose parameters pλUpλL>0,qλUqλL>0 as small as possible. Due to the concavity (cf. Properties 3 (P19)) of the function ϕλ(·), the linear upper bound ϕλU(·) (on the ultimately relevant subdomain N0) thus must hit the function ϕλ(·) in at least one point xN0, which corresponds to some “discrete tangent line” of ϕλ(·) in x, or in at most two points x,x+1N0, which corresponds to the secant line of ϕλ(·) across its arguments x and x+1. Accordingly, there is in general no overall best upper bound; of course, one way to obtain “good” upper bounds for Hλ(PA,nPH,n) is to solve the optimization problem

$$\big(\overline{p^U_\lambda},\,\overline{q^U_\lambda}\big) := \operatorname*{argmin}_{(p^U_\lambda,\,q^U_\lambda)}\ \exp\Big\{a_n(q^U_\lambda)\cdot X_0 + \sum_{k=1}^{n}b_k(p^U_\lambda,q^U_\lambda)\Big\}, \qquad (46)$$

subject to the constraint (35). However, the corresponding result generally depends on the particular choice of the initial population X0N and on the observation time horizon nN. Hence, there is in general no overall optimal choice of pλU,qλU without the incorporation of further goal-dependent constraints such as limnBλ,X0,nU=0 in case of limnHλ(PA,nPH,n)=0. By the way, mainly because of the non-explicitness of the sequence an(qλU)nN (due to the generally not explicitly solvable recursion (36)) and the discreteness of the constraint (35), this optimization problem seems to be not straightforward to solve, anyway. The choice of parameters pλU,qλU for the upper bound Bλ,X0,nUHλ(PA,nPH,n) can be made according to different, partially incompatible (“optimality-” resp. “goodness-”) criteria and goals, such as:

  • (G1)

    the validity of Bλ,X0,nU<1 simultaneously for all initial configurations X0N, all observation horizons nN and all λ]0,1[, which leads to a strict improvement of the general upper bound Hλ(PA,nPH,n)<1 (cf. (9));

  • (G2)

    the determination of the long-term-limits limnHλ(PA,nPH,n) respectively limnBλ,X0,nU for all X0N and all λ]0,1[; in particular, one would like to check whether limnHλ(PA,nPH,n)=0, which implies that the families of probability distributions PA,nnN and PH,nnN are asymptotically distinguishable (entirely separated), cf. (25);

  • (G3)

    the determination of the time-asymptotical growth rates limn1nlogHλ(PA,nPH,n) resp. limn1nlogBλ,X0,nU for all X0N and all λ]0,1[.

Further goals–with which we do not deal here for the sake of brevity–are for instance (i) a very good tightness of the upper bound Bλ,X0,nU for nN for some fixed large NN, or (ii) the criterion (G1) with fixed (rather than arbitrary) initial population size X0N.

Let us briefly discuss the three Goals (G1) to (G3) and their challenges: due to Theorem 1, Goal (G1) can only be achieved if the sequence an(qλU)nN is non-increasing, since otherwise, for each fixed observation horizon nN there is a large enough initial population size X0 such that the upper bound component B˜λ,X0,n(pλU,qλU) becomes larger than 1, and thus Bλ,X0,nU=1 (cf. (40)). Hence, Properties 1 (P1) and (P2) imply that one should have qλUβλ. Then, the sequence bn(pλU,qλU)nN is also non-increasing. However, since bn(pλU,qλU) might be positive for some (even all) nN, the sum k=1nbk(pλU,qλU)nN is not necessarily decreasing. Nevertheless, the restriction

$$q^U_\lambda-\beta_\lambda\leq 0 \qquad\text{and}\qquad p^U_\lambda-\alpha_\lambda\leq 0, \qquad\text{where at least one of the inequalities is strict}, \qquad (47)$$

ensures that both sequences an(qλU)nN and bn(pλL,qλU)nN are nonpositive and decreasing, where at least one sequence is strictly negative, implying that the sum k=1nbk(pλU,qλU)nN is strictly negative for n2 and strictly decreasing. To see this, suppose that (47) is satisfied with two strict inequalities. Then, an(qλU)nN as well as bn(pλL,qλU)nN are strictly negative and strictly decreasing. If qλU=βλ and pλU<αλ, we see from (P2) and (P6) that an(qλU)0 and that bn(pλU,qλU)pλUαλ<0 (notice that αλ=0 is not possible in the current setup PSP\PSP,1 and for λ]0,1[). In the last case qλU<βλ and pλU=αλ, from (P1) and (P5) it follows that an(qλU)nN is strictly negative and strictly decreasing, as well as that b1(pλU,qλU)=0 and bn(pλL,qλU)nN is strictly decreasing and strictly negative for n2. Thus, whenever (47) is satisfied, the sum k=1nbk(pλU,qλU)nN is strictly negative for n2 and strictly decreasing.

To achieve Goal (G2), we have to require that the sequence an(qλU)nN converges, which is the case if either qλUβλ or βλ<qλUmin{1,eβλ1} (cf. Properties 1 (P1) to (P3)). From the upper bound component B˜λ,X0,n(pλU,qλU) (42) we conclude that Goal (G2) is met if the sequence bn(pλU,qλU)nN converges to a negative limit, i.e., limnbn(pλU,qλU)=pλU·ex0(qλU)αλ<0. Notice that this condition holds true if (47) is satisfied: suppose that qλU<βλ, then x0(qλU)<0 and pλU·ex0(qλU)αλ<pλUαλ0. On the other hand, if pλUαλ<0, one obtains x0(qλU)0 leading to pλU·ex0(qλU)αλpλUαλ<0.

The examination of Goal (G2) above enters into the discussion of Goal (G3): if the sequence an(qλU)nN converges and limnBλ,X0,nU=0, then there holds

$$\lim_{n\to\infty}\frac{1}{n}\log B^U_{\lambda,X_0,n} = \lim_{n\to\infty}\frac{1}{n}\log\widetilde{B}_{\lambda,X_0,n}(p^U_\lambda,q^U_\lambda) = p^U_\lambda\cdot e^{x_0(q^U_\lambda)}-\alpha_\lambda. \qquad (48)$$

For the case βA,βH,αA,αH,λ(PSP\PSP,1)×]0,1[, let us now start with our comprehensive investigations of the upper bounds, where we focus on fulfilling the condition (47) which tackles Goals (G1) and (G2) simultaneously; then, the Goal (G3) can be achieved by (48). As indicated above, various different parameter subcases can lead to different Hellinger-integral-upper-bound details, which we work out in the following. For better transparency, we employ the following notations (where the first four are just reminders of sets which were already introduced above)

$$\begin{aligned} \mathcal{P}_{NI} &:= \big\{(\beta_A,\beta_H,\alpha_A,\alpha_H)\in[0,\infty[^4:\ \alpha_A=\alpha_H=0;\ \beta_A>0;\ \beta_H>0;\ \beta_A\neq\beta_H\big\},\\ \mathcal{P}_{SP} &:= \big\{(\beta_A,\beta_H,\alpha_A,\alpha_H)\in\,]0,\infty[^4:\ (\alpha_A\neq\alpha_H)\ \text{or}\ (\beta_A\neq\beta_H)\ \text{or both}\big\},\\ \mathcal{P} &:= \mathcal{P}_{NI}\cup\mathcal{P}_{SP},\\ \mathcal{P}_{SP,1} &:= \big\{(\beta_A,\beta_H,\alpha_A,\alpha_H)\in\mathcal{P}_{SP}:\ \alpha_A\neq\alpha_H,\ \beta_A\neq\beta_H,\ \tfrac{\alpha_A}{\beta_A}=\tfrac{\alpha_H}{\beta_H}\big\},\\ \mathcal{P}_{SP,2} &:= \big\{(\beta_A,\beta_H,\alpha_A,\alpha_H)\in\mathcal{P}_{SP}:\ \alpha_A=\alpha_H,\ \beta_A\neq\beta_H\big\},\\ \mathcal{P}_{SP,3} &:= \big\{(\beta_A,\beta_H,\alpha_A,\alpha_H)\in\mathcal{P}_{SP}:\ \alpha_A\neq\alpha_H,\ \beta_A\neq\beta_H,\ \tfrac{\alpha_A}{\beta_A}\neq\tfrac{\alpha_H}{\beta_H}\big\} = \mathcal{P}_{SP,3a}\cup\mathcal{P}_{SP,3b}\cup\mathcal{P}_{SP,3c},\\ \mathcal{P}_{SP,3a} &:= \big\{(\beta_A,\beta_H,\alpha_A,\alpha_H)\in\mathcal{P}_{SP}:\ \alpha_A\neq\alpha_H,\ \beta_A\neq\beta_H,\ \tfrac{\alpha_A}{\beta_A}\neq\tfrac{\alpha_H}{\beta_H},\ \tfrac{\alpha_A-\alpha_H}{\beta_H-\beta_A}\in\,]-\infty,0[\big\},\\ \mathcal{P}_{SP,3b} &:= \big\{(\beta_A,\beta_H,\alpha_A,\alpha_H)\in\mathcal{P}_{SP}:\ \alpha_A\neq\alpha_H,\ \beta_A\neq\beta_H,\ \tfrac{\alpha_A}{\beta_A}\neq\tfrac{\alpha_H}{\beta_H},\ \tfrac{\alpha_A-\alpha_H}{\beta_H-\beta_A}\in\,]0,\infty[\,\setminus\mathbb{N}\big\},\\ \mathcal{P}_{SP,3c} &:= \big\{(\beta_A,\beta_H,\alpha_A,\alpha_H)\in\mathcal{P}_{SP}:\ \alpha_A\neq\alpha_H,\ \beta_A\neq\beta_H,\ \tfrac{\alpha_A}{\beta_A}\neq\tfrac{\alpha_H}{\beta_H},\ \tfrac{\alpha_A-\alpha_H}{\beta_H-\beta_A}\in\mathbb{N}\big\},\\ \mathcal{P}_{SP,4} &:= \big\{(\beta_A,\beta_H,\alpha_A,\alpha_H)\in\mathcal{P}_{SP}:\ \alpha_A\neq\alpha_H>0,\ \beta_A=\beta_H\big\} = \mathcal{P}_{SP,4a}\cup\mathcal{P}_{SP,4b},\\ \mathcal{P}_{SP,4a} &:= \big\{(\beta_A,\beta_H,\alpha_A,\alpha_H)\in\mathcal{P}_{SP}:\ \alpha_A\neq\alpha_H>0,\ \beta_A=\beta_H\in\,]0,1[\big\},\\ \mathcal{P}_{SP,4b} &:= \big\{(\beta_A,\beta_H,\alpha_A,\alpha_H)\in\mathcal{P}_{SP}:\ \alpha_A\neq\alpha_H>0,\ \beta_A=\beta_H\in[1,\infty[\big\}; \end{aligned} \qquad (49)$$

notice that because of Lemma A1 and of the Properties 3 (P15) one gets on the domain ]0,[ the relation ϕλ(x)=0 iff fA(x)=fH(x) iff x=x*:=αHαAβAβH]0,[.

3.7. Upper Bounds for the Cases βA,βH,αA,αH,λPSP,2×]0,1[

For this parameter constellation, one has $\phi_\lambda(0)=0$ and $\phi_\lambda'(0)=0$ (cf. Properties 3 (P16), (P17)). Thus, the only admissible intercept choice satisfying (47) is $r^U_\lambda = 0 = p^U_\lambda-\alpha_\lambda$ (i.e., $p^U_\lambda = p^U(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda) = \alpha_\lambda = \alpha > 0$), and the minimal admissible slope which implies (35) for all $x\in\mathbb{N}_0$ is given by $s^U_\lambda = \frac{\phi_\lambda(1)-\phi_\lambda(0)}{1-0} = q^U_\lambda-\beta_\lambda = a_1(q^U_\lambda) < 0$ (i.e., $q^U_\lambda = q^U(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda) = (\alpha+\beta_A)^{\lambda}(\alpha+\beta_H)^{1-\lambda}-\alpha > 0$). Analogously to the investigation for $\mathcal{P}_{SP,1}$ in the above-mentioned Section 3.3, one can derive that $(a_n(q^U_\lambda))_{n\in\mathbb{N}}$ is strictly negative, strictly decreasing, and converges to $x_0(q^U_\lambda)\in\,]-\beta_\lambda,\,q^U_\lambda-\beta_\lambda[$ as indicated in Properties 1 (P1). Moreover, in the same manner as for the case $\mathcal{P}_{SP,1}$ this leads to

Proposition 7.

For all βA,βH,αA,αH,λPSP,2×]0,1[ and all initial population sizes X0N there holds with pλU=α,qλU=(α+βA)λ(α+βH)1λα

$$\begin{array}{ll} (a) & B^U_{\lambda,X_0,1} = \exp\big\{\big(q^U_\lambda-\beta_\lambda\big)\cdot X_0\big\} < 1,\\ (b) & \text{the sequence } \big(B^U_{\lambda,X_0,n}\big)_{n\in\mathbb{N}} \text{ of upper bounds for } H_\lambda(P_{A,n}\|P_{H,n}) \text{ given by } B^U_{\lambda,X_0,n} = \exp\big\{a_n(q^U_\lambda)\cdot X_0 + \sum_{k=1}^{n}b_k(p^U_\lambda,q^U_\lambda)\big\} \text{ is strictly decreasing},\\ (c) & \lim_{n\to\infty}B^U_{\lambda,X_0,n} = 0 = \lim_{n\to\infty}H_\lambda(P_{A,n}\|P_{H,n}),\\ (d) & \lim_{n\to\infty}\frac{1}{n}\log B^U_{\lambda,X_0,n} = p^U_\lambda\cdot e^{x_0(q^U_\lambda)}-\alpha_\lambda = \alpha\,\big(e^{x_0(q^U_\lambda)}-1\big) < 0,\\ (e) & \text{the map } X_0\mapsto B^U_{\lambda,X_0,n} \text{ is strictly decreasing}. \end{array}$$

3.8. Upper Bounds for the Cases βA,βH,αA,αH,λPSP,3a×]0,1[

From Properties 3 (P16) one gets ϕλ(0)<0, whereas ϕλ(0) can assume any sign, take e.g., the parameters βA,βH,αA,αH,λ=(1.8,0.9,2.7,0.7,0.5) for ϕλ(0)<0, βA,βH,αA,αH,λ=(1.8,0.9,2.8,0.7,0.5) for ϕλ(0)=0 and βA,βH,αA,αH,λ=(1.8,0.9,2.9,0.7,0.5) for ϕλ(0)>0; within our running-example epidemiological context of Section 2.3, this corresponds to a “nearly dangerous” infectious-disease-transmission situation (H) (with nearly critical reproduction number βH=0.9 and importation mean of αH=0.7), whereas (A) describes a “dangerous” situation (with supercritical βA=1.8 and αA=2.7,2.8,2.9). However, in all three subcases there holds maxxN0ϕλ(x)maxx[0,[ϕλ(x)<0. Thus, there clearly exist parameters pλU=pUβA,βH,αA,αH,λ, qλU=qUβA,βH,αA,αH,λ with pλU[αAλαH1λ,αλ[ and qλU[βAλβH1λ,βλ[ (implying (47)) such that (35) is satisfied. As explained above, we get the following

Proposition 8.

For all βA,βH,αA,αH,λPSP,3a×]0,1[ there exist parameters pλU,qλU which satisfy pλU[αAλαH1λ,αλ[ and qλU[βAλβH1λ,βλ[ as well as (35) for all xN0, and for all such pairs (pλU,qλU) and all initial population sizes X0N there holds

$$\begin{array}{ll} (a) & B^U_{\lambda,X_0,1} = \exp\big\{\big(q^U_\lambda-\beta_\lambda\big)\cdot X_0 + p^U_\lambda-\alpha_\lambda\big\} < 1,\\ (b) & \text{the sequence } \big(B^U_{\lambda,X_0,n}\big)_{n\in\mathbb{N}} \text{ of upper bounds for } H_\lambda(P_{A,n}\|P_{H,n}) \text{ given by } B^U_{\lambda,X_0,n} = \exp\big\{a_n(q^U_\lambda)\,X_0 + \sum_{k=1}^{n}b_k(p^U_\lambda,q^U_\lambda)\big\} \text{ is strictly decreasing},\\ (c) & \lim_{n\to\infty}B^U_{\lambda,X_0,n} = 0 = \lim_{n\to\infty}H_\lambda(P_{A,n}\|P_{H,n}),\\ (d) & \lim_{n\to\infty}\frac{1}{n}\log B^U_{\lambda,X_0,n} = p^U_\lambda\cdot e^{x_0(q^U_\lambda)}-\alpha_\lambda < 0,\\ (e) & \text{the map } X_0\mapsto B^U_{\lambda,X_0,n} \text{ is strictly decreasing}. \end{array}$$

Notice that all parts of this proposition also hold true for parameter pairs (pλU,qλU) satisfying (35) and additionally either pλU=αλ, qλU<βλ or pλU<αλ, qλU=βλ.

Let us briefly illuminate the above-mentioned possible parameter choices, where we begin with the case of ϕλ(0)0, which corresponds to λβAαA/αHλ1+(1λ)βHαA/αHλβλ0 (cf. (P17)); then, the function ϕλ(·) is strictly negative, strictly decreasing, and–due to (P19)–strictly concave (and thus, the assumption αHαAβAβH<0 is superfluous here). One pragmatic but yet reasonable parameter choice is the following: take any intercept pλU[αAλαH1λ,αλ] such that (pλUαλ)+2(ϕλ(1)(pλUαλ))ϕλ(2) (i.e., 2αA+βAλαH+βH1λpλU+αλαA+2βAλαH+2βH1λ) and qλU:=ϕλ(1)(pλUαλ)+βλ=αA+βAλαH+βH1λpλU, which corresponds to a linear function ϕλU which is (i) nonpositive on N0 and strictly negative on N, and (ii) larger than or equal to ϕλ on N0, strictly larger than ϕλ on N\{1,2}, and equal to ϕλ at the point x=1 (“discrete tangent or secant line through x=1”). One can easily see that (due to the restriction (34)) not all pλU[αAλαH1λ,αλ] might qualify for the current purpose. For the particular choice pλU=αAλαH1λ and qλU=αA+βAλαH+βH1λαAλαH1λ one obtains rλU=pλUαλ=b1(pλU,qλU)<0 (cf. Lemma A1) and sλU=qλUβλ=ϕλ(1)ϕλ(0)=a1(qλU)<0 (secant line through ϕλ(0) and ϕλ(1)).

For the remaining case ϕλ(0)>0, which corresponds to λβAαA/αHλ1+(1λ)βHαA/αHλβλ>0, the function ϕλ(·) is strictly negative, strictly concave and hump-shaped (cf. (P18)). For the derivation of the parameter choices, we employ xmax:=argmaxx]0,[ϕλ(x) which is the unique solution of

$$\lambda\beta_A\Big[\Big(\tfrac{f_A(x)}{f_H(x)}\Big)^{\lambda-1}-1\Big] + (1-\lambda)\beta_H\Big[\Big(\tfrac{f_A(x)}{f_H(x)}\Big)^{\lambda}-1\Big] = 0, \qquad x\in\,]0,\infty[, \qquad (50)$$

(cf. (P17), (P19)); notice that x=x*:=αHαAβAβH]0,[ formally satisfies the Equation (50) but does not qualify because of the current restriction x*<0.

Let us first inspect the case ϕλ(xmax)>ϕλ(xmax+1), where x denotes the integer part of x. Consider the subcase ϕλ(xmax)+xmaxϕλ(xmax)ϕλ(xmax+1)0, which means that the secant line through ϕλ(xmax) and ϕλ(xmax+1) possesses a non-positive intercept. In this situation it is reasonable to choose as intercept any pλUαλ=b1(pλU,qλU)=rλU[ϕλ(xmax),ϕλ(xmax)+xmaxϕλ(xmax)ϕλ(xmax+1)], and as corresponding slope qλUαλ=a1(qλU)=sλU=ϕλ(xmax)rλU(xmax)00. A larger intercept would lead to a linear function ϕλU for which (35) is not valid at xmax+1. In the other subcase ϕλ(xmax)+xmaxϕλ(xmax)ϕλ(xmax+1)>0, one can choose any intercept pλUαλ=b1(pλU,qλU)=rλU[ϕλ(xmax),0] and as corresponding slope qλUαλ=a1(qλU)=sλU=ϕλ(xmax)rλU(xmax)00 (notice that the corresponding line ϕλU is on ]xmax,[ strictly larger than the secant line through ϕλ(xmax) and ϕλ(xmax+1)).

If ϕλ(xmax)ϕλ(xmax+1), one can proceed as above by substituting the crucial pair of points (xmax,xmax+1) with (xmax+1,xmax+2) and examining the analogous two subcases.

3.9. Upper Bounds for the Cases βA,βH,αA,αH,λPSP,3b×]0,1[

The only difference to the preceding Section 3.8 is that–due to Properties 3 (P15)–the maximum value of ϕλ(·) now achieves 0, at the positive non-integer point xmax=x*=αHαAβAβH]0,[\N (take e.g., βA,βH,αA,αH,λ=(1.8,0.9,1.1,3.0,0.5) as an example, which within our running-example epidemiological context of Section 2.3 corresponds to a “nearly dangerous” infectious-disease-transmission situation (H) (with nearly critical reproduction number βH=0.9 and importation mean of αH=3), whereas (A) describes a “dangerous” situation (with supercritical βA=1.8 and αA=1.1)); this implies that ϕλ(x)<0 for all x on the relevant subdomain N0. Due to (P16), (P17) and (P19) one gets automatically λβAαA/αHλ1+(1λ)βHαA/αHλβλ>0 for all λ]0,1[. Analogously to Section 3.8, there exist parameter pλU[αAλαH1λ,αλ] and qλU[βAλβH1λ,βλ] such that (47) and (35) are satisfied. Thus, all the assertions (a) to (e) of Proposition 8 also hold true for the current parameter constellations.

3.10. Upper Bounds for the Cases βA,βH,αA,αH,λPSP,3c×]0,1[

The only difference to the preceding Section 3.9 is that the maximum value of ϕλ(·) now achieves 0 at the integer point xmax=x*=αHαAβAβHN (take e.g., βA,βH,αA,αH,λ=(1.8,0.9,1.2,3.0,0.5) as an example). Accordingly, there do not exist parameters pλU,qλU, such that (35) and (47) are satisfied simultaneously. The only parameter pair that ensures expan(qλU)·X0+k=1nbk(pλU,qλU)1 for all nN and all X0N without further investigations, leads to the choices pλU=αλ as well as qλU=βλ. Consequently, Bλ,X0,nU1, which coincides with the general upper bound (9), but violates the above-mentioned desired Goal (G1). However, there might exist parameters pλU<αλ,qλU>βλ or pλU>αλ,qλU<βλ, such that at least the parts (c) and (d) of Proposition 8 are satisfied. Nevertheless, by using a conceptually different method we can prove

$$H_\lambda(P_{A,n}\|P_{H,n}) < 1 \ \text{ for all } n\in\mathbb{N}\setminus\{1\}, \qquad\text{as well as the convergence}\qquad \lim_{n\to\infty}H_\lambda(P_{A,n}\|P_{H,n}) = 0, \qquad (51)$$

which will be used for the study of complete asymptotical distinguishability (entire separation) below. This proof is provided in Appendix A.1.

3.11. Upper Bounds for the Cases βA,βH,αA,αH,λPSP,4a×]0,1[

This setup and the remaining setup βA,βH,αA,αH,λPSP,4b×]0,1[ (see the next Section 3.12) are the only constellations where ϕλ(·) is strictly negative and strictly increasing, with limxϕλ(x)=limxϕλ(x)=0, leading to the choices pλU=αλ as well as qλU=βλ=β under the restriction that expan(qλU)·X0+k=1nbk(pλU,qλU)1 for all nN and all X0N. Consequently, one has Bλ,X0,nU1, which is consistent with the general upper bound (9) but violates the above-mentioned desired Goal (G1). Unfortunately, the proof method of (51) (cf. Appendix A.1) can’t be carried over to the current setup. The following proposition states two of the above-mentioned desired assertions which can be verified by a completely different proof method, which is also given in Appendix A.1.

Proposition 9.

For all βA,βH,αA,αH,λPSP,4a×]0,1[ there exist parameters pλU<αλ, 1>qλU>βλ=β such that (35) is satisfied for all x[0,[ and such that for all initial population sizes X0N the parts (c) and (d) of Proposition 8 hold true.

3.12. Upper Bounds for the Cases βA,βH,αA,αH,λPSP,4b×]0,1[

The assertions preceding Proposition 9 remain valid. However, any linear upper bound of the function ϕλ(·) on the domain N0 possesses the slope qλUβλ0. If qλU=βλ, then the intercept is pλUαλ=0 leading to Bλ,X0,nU1 and thus Goal (G1) is violated. If we use a slope qλUβλ>0, then both the sequences an(qλU)nN and bn(pλU,qλU)nN are strictly increasing and diverge to . This comes from Properties 1 (P3b) and (P7b) since qλU>βλ=β1. Altogether, this implies that the corresponding upper bound component B˜λ,X0,n(pλU,qλU) (cf. (42)) diverges to as well. This leads to

Proposition 10.

For all βA,βH,αA,αH,λPSP,4b×]0,1[ and all initial population sizes X0N there do not exist parameters pλU0, qλU0 such that (35) is satisfied and such that the parts (c) and (d) of Proposition 8 hold true.

3.13. Concluding Remarks on Alternative Upper Bounds for all Cases βA,βH,αA,αH,λ (PSP\PSP,1)×]0,1[

As mentioned earlier on, starting from Section 3.6 we have principally focused on constructing upper bounds Bλ,X0,nU of the Hellinger integrals, starting from pλU,qλU which fulfill (35) as well as further constraints depending on the Goals (G1) and (G2). For the setups in the Section 3.7, Section 3.8 and Section 3.9, we have proved the existence of special parameter choices pλU,qλU which were consistent with (G1) and (G2). Furthermore, for the constellation in the Section 3.11 we have found parameters such that at least (G2) is satisfied. In contrast, for the setup of Section 3.12 we have not found any choices which are consistent with (G1) and (G2), leading to the “cut-off bound” Bλ,X0,nU1 which gives no improvement over the generally valid upper bound (9).

In the following, we present some alternative choices of p_λ^U, q_λ^U which – depending on the parameter constellation (β_A,β_H,α_A,α_H,λ) ∈ (P_{SP}∖P_{SP,1}) × ]0,1[ – may or may not lead to upper bounds B_{λ,X_0,n}^U that are consistent with Goal (G1) or with (G2), and which may be weaker than, better than, or incomparable with the previous upper bounds when one deals with relaxations of (G1), such as e.g., H_λ(P_{A,n} ‖ P_{H,n}) < 1 for all but finitely many n ∈ ℕ.

As a first alternative choice of a linear upper bound of ϕ_λ(·) (cf. (35)) one could use the asymptote ϕ̃_λ(·) (cf. Properties 3 (P20)) with the parameters p_λ^U := p̃_λ = λ·α_A·(β_A/β_H)^{λ−1} + (1−λ)·α_H·(β_A/β_H)^{λ} and q_λ^U := q̃_λ = β_A^λ·β_H^{1−λ}. Another important linear upper bound of ϕ_λ(·) is the tangent line ϕ_{λ,y}^{tan}(·) of ϕ_λ(·) at an arbitrarily fixed point y ∈ [0,∞[, which amounts to

ϕ_{λ,y}^{tan}(x) := r_{λ,y}^{tan} + s_{λ,y}^{tan}·x := (p_{λ,y}^{tan} − α_λ) + (q_{λ,y}^{tan} − β_λ)·x := ϕ_λ(y) − y·ϕ_λ′(y) + ϕ_λ′(y)·x, (52)

where ϕ_λ′(·) is given by (P17). Notice that this upper bound is, for y ∈ ]0,∞[∖ℕ, “not tight” in the sense that ϕ_{λ,y}^{tan}(·) does not hit the function ϕ_λ(·) on ℕ_0 (where the generation sizes “live”); moreover, ϕ_{λ,y}^{tan}(x) might take strictly positive values for large enough x, which is counter-productive for Goal (G1). Another alternative choice of a linear upper bound for ϕ_λ(·), which in contrast to the tangent line is “tight” (but does not necessarily avoid the strict positivity), is the secant line ϕ_{λ,k}^{sec}(·) across the arguments k and k+1, given by

ϕ_{λ,k}^{sec}(x) := r_{λ,k}^{sec} + s_{λ,k}^{sec}·x := (p_{λ,k}^{sec} − α_λ) + (q_{λ,k}^{sec} − β_λ)·x := ϕ_λ(k) − k·(ϕ_λ(k+1) − ϕ_λ(k)) + (ϕ_λ(k+1) − ϕ_λ(k))·x. (53)

Another alternative choice is the horizontal line

ϕ_λ^{hor}(x) ≡ max{ϕ_λ(y) : y ∈ ℕ_0}. (54)

For pλUpλ˜,pλ,ytan,pλ,ysec and qλUqλ,ytan,qλ,ysec it is possible that in some parameter cases βA,βH,αA,αH either the intercept rλU=pλUαλ is strictly larger than zero or the slope sλU=qλUβλ is strictly larger than zero. Thus, it can happen that B˜λ,X0,n(pλU,qλU)>1 for some (and even for all) nN, such that the corresponding upper bound Bλ,X0,nU for the Hellinger integral Hλ(PA,nPH,n) amounts to the cut-off at 1. However, due to Properties 1 (P5) and (P7a), the sequence B˜λ,X0,n(pλU,qλU)nN may become smaller than 1 and may finally converge to zero. Due to Properties 2 (P14), this upper bound can even be tighter (smaller) than those bounds derived from parameters pλU,qλU fulfilling (47).

As far as our desired Hellinger integral bounds are concerned, in the setup of Section 3.11 – where lim_{y→∞} ϕ_{λ,y}^{tan}(·) ≡ 0 – we shall employ the mappings y ↦ ϕ_{λ,y}^{tan}, y ↦ p_{λ,y}^{tan} and y ↦ q_{λ,y}^{tan} for the proof of Proposition 9 in Appendix A.1. These will also be used for the proof of the below-mentioned Theorem 4.
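To illustrate the alternative choices just discussed, the following small Python sketch (our own illustration, not part of the original material) evaluates the tangent, secant and asymptote lines numerically and checks that they indeed dominate ϕ_λ(·) on (a truncated part of) ℕ_0; it assumes the Poissonian form ϕ_λ(x) = (α_A+β_A·x)^λ·(α_H+β_H·x)^{1−λ} − λ·(α_A+β_A·x) − (1−λ)·(α_H+β_H·x) underlying this section, and all function names are ours.

import numpy as np

def phi(x, bA, bH, aA, aH, lam):
    # phi_lambda(x) for Poisson offspring/immigration, assumed as recalled above
    fA, fH = aA + bA * x, aH + bH * x
    return fA**lam * fH**(1 - lam) - lam * fA - (1 - lam) * fH

def tangent_bound(y, bA, bH, aA, aH, lam, h=1e-6):
    # tangent line of phi at y, cf. (52); derivative via central difference
    slope = (phi(y + h, bA, bH, aA, aH, lam) - phi(y - h, bA, bH, aA, aH, lam)) / (2 * h)
    return phi(y, bA, bH, aA, aH, lam) - y * slope, slope

def secant_bound(k, bA, bH, aA, aH, lam):
    # secant line of phi across k and k+1, cf. (53)
    slope = phi(k + 1, bA, bH, aA, aH, lam) - phi(k, bA, bH, aA, aH, lam)
    return phi(k, bA, bH, aA, aH, lam) - k * slope, slope

def asymptote_bound(bA, bH, aA, aH, lam):
    # asymptote of phi, cf. (P20): intercept p~ - alpha_lambda, slope q~ - beta_lambda
    p_tilde = lam * aA * (bA / bH)**(lam - 1) + (1 - lam) * aH * (bA / bH)**lam
    q_tilde = bA**lam * bH**(1 - lam)
    return p_tilde - (lam * aA + (1 - lam) * aH), q_tilde - (lam * bA + (1 - lam) * bH)

pars = (1.5, 0.5, 0.5, 0.5, 0.5)      # (beta_A, beta_H, alpha_A, alpha_H, lambda), a P_SP,2 case
x = np.arange(0, 200)                 # truncated part of the relevant subdomain N_0
for name, (r, s) in [("tangent at y=3", tangent_bound(3.0, *pars)),
                     ("secant over {3,4}", secant_bound(3, *pars)),
                     ("asymptote", asymptote_bound(*pars))]:
    ok = bool(np.all(phi(x, *pars) <= r + s * x + 1e-9))
    print(f"{name}: intercept {r:+.4f}, slope {s:+.4f}, upper bound of phi on N_0: {ok}")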

3.14. Intermezzo 1: Application to Asymptotical Distinguishability

The above-mentioned investigations can be applied to the context of Section 2.6 on asymptotical distinguishability. Indeed, with the help of the Definitions 1 and 2 as well as the equivalence relations (25) and (26) we obtain the following

Corollary 1.

  • (a) 

    For all βA,βH,αA,αHPSP\PSP,4b and all initial population sizes X0N, the corresponding sequences (PA,n)nN0 and (PH,n)nN0 are entirely separated (completely asymptotically distinguishable).

  • (b) 

    For all βA,βH,αA,αHPNI with βA1 and all initial population sizes X0N, the sequence (PA,n)nN0 is contiguous to (PH,n)nN0.

  • (c) 

    For all (β_A,β_H,α_A,α_H) ∈ P_{NI} with β_A > 1 and all initial population sizes X_0 ∈ ℕ, the sequence (P_{A,n})_{n∈ℕ_0} is neither contiguous to nor entirely separated from (P_{H,n})_{n∈ℕ_0}.

The proof of Corollary 1 will be given in Appendix A.1.

Remark 3.

  • (a) 

    Assertion (c) of Corollary 1 contrasts the case of Gaussian processes with independent increments where one gets either entire separation or mutual contiguity (see e.g., Liese & Vajda [1]).

  • (b) 

    By putting Corollary 1(b) and (c) together, we obtain for different “criticality pairs” in the non-immigration case P_{NI} the following asymptotical distinguishability types: if β_A ≤ 1 and β_H ≤ 1, then (P_{A,n}) and (P_{H,n}) are mutually contiguous; if β_A ≤ 1 and β_H > 1, then (P_{A,n}) is contiguous to (P_{H,n}), whereas (P_{H,n}) is neither contiguous to nor entirely separated from (P_{A,n}); if β_A > 1 and β_H ≤ 1, then (P_{H,n}) is contiguous to (P_{A,n}), whereas (P_{A,n}) is neither contiguous to nor entirely separated from (P_{H,n}); if β_A > 1 and β_H > 1, then neither of the two sequences is contiguous to the other one and they are not entirely separated; in particular, for P_{NI} the sequences (P_{A,n})_{n∈ℕ_0} and (P_{H,n})_{n∈ℕ_0} are not completely asymptotically inseparable (indistinguishable).

  • (c) 

    In the light of the above-mentioned characterizations of contiguity resp. entire separation by means of Hellinger integral limits, the finite-time-horizon results on Hellinger integrals given in the “λ]0,1[ parts” of Theorem 1, the Section 3.3, Section 3.4, Section 3.5, Section 3.6, Section 3.7, Section 3.8, Section 3.9, Section 3.10, Section 3.11, Section 3.12, Section 3.13 and also in the below-mentioned Section 6 can loosely be interpreted as “finite-sample (rather than asymptotical) distinguishability” assertions.

3.15. Intermezzo 2: Application to Decision Making under Uncertainty

3.15.1. Bayesian Decision Making

The above-mentioned investigations can be applied to the context of Section 2.5 on dichotomous Bayesian decision making on the space of all possible path scenarios (path space) of Poissonian Galton-Watson processes without/with immigration GW(I) (e.g., in combination with our running-example epidemiological context of Section 2.3). More detailed, for the minimal mean decision loss (Bayes risk) Rn defined by (18) we can derive upper (respectively lower) bounds by using (19) respectively (20) together with the exact values or the upper (respectively lower) bounds of the Hellinger integrals Hλ(PA,nPH,n) derived in the “λ]0,1[ parts” of Theorem 1, the Section 3.3, Section 3.4, Section 3.5, Section 3.6, Section 3.7, Section 3.8, Section 3.9, Section 3.10, Section 3.11, Section 3.12, Section 3.13 (and also in the below-mentioned Section 6); instead of providing the corresponding outcoming formulas–which is merely repetitive–we give the illustrative

Example 1.

Based on a sample path observation Xn:={X:=1,...,n} of a GWI, which is either governed by a hypothesis law PH or an alternative law PA, we want to make a dichotomous optimal Bayesian decision described in Section 2.5, namely, decide between an action dH “associated with” PH and an action dA “associated with” PA, with pregiven loss function (16) involving constants LA>0, LH>0 which e.g., arise as bounds from quantities in worst-case scenarios.

For this, let us exemplarily deal with initial population X0=5 as well as parameter setup βA,βH,αA,αH=(1.2,0.9,4,3)PSP,1; within our running-example epidemiological context of Section 2.3, this corresponds e.g., to a setup where one is encountered with a novel infectious disease (such as COVID-19) of non-negligible fatality rate, and (A) reflects a “potentially dangerous” infectious-disease-transmission situation (with supercritical reproduction number βA=1.2 and importation mean of αA=4, for weekly appearing new incidence-generations) whereas (H) describes a “milder” situation (with subcritical βH=0.9 and αH=3). Moreover, let dH and dA reflect two possible sets of interventions (control measures) in the course of pandemic risk management, with respective “worst-case type” decision losses LA=600 and LH=300 (e.g., in units of billion Euros or U.S. Dollars). Additionally we assume the prior probabilities π=Pr(H)=1Pr(A)=0.5, which results in the prior-loss constants LA=300 and LH=150. In order to obtain bounds for the corresponding minimal mean decision loss (Bayes Risk) Rn defined in (18) we can employ the general Stummer-Vajda bounds (cf. [15]) (19) and (20) in terms of the Hellinger integral Hλ(PA,nPH,n) (with arbitrary λ]0,1[), and combine this with the appropriate detailed results on the latter from the preceding subsections. To demonstrate this, let us choose λ=0.5 (for which H1/2(PA,nPH,n) can be interpreted as a multiple of the Bhattacharyya coefficient between the two competing GWI) respectively λ=0.9, leading to the parameters p0.5E=3.464,q0.5E=1.039 respectively p0.9E=3.887, q0.9E=1.166 (cf. (33)). Combining (19) and (20) with Theorem 1 (a)– which provides us with the exact recursive values of Hλ(PA,nPH,n) in terms of the sequence an(qλE) (cf. (36))– we obtain for λ=0.5 the bounds

R_n ≤ R_n^U := 2.121·10^2 · exp{ 5·a_n(1.039) + (10/3)·Σ_{k=1}^n a_k(1.039) },   R_n ≥ R_n^L := 100 · exp{ 10·a_n(1.039) + (20/3)·Σ_{k=1}^n a_k(1.039) },

whereas for λ=0.9 we get

R_n ≤ R_n^U := 2.799·10^2 · exp{ 5·a_n(1.166) + (10/3)·Σ_{k=1}^n a_k(1.166) },   R_n ≥ R_n^L := 3.902 · exp{ 50·a_n(1.166) + (100/3)·Σ_{k=1}^n a_k(1.166) }.

Figure 1 illustrates the lower (orange resp. cyan) and upper (red resp. blue) bounds R_n^L resp. R_n^U of the Bayes risk R_n employing λ = 0.5 resp. λ = 0.9, on both a unit scale (left graph) and a logarithmic scale (right graph). The lightgrey/grey/black curves correspond to the (18)-based empirical evaluation of the Bayes risk sequence (R_n^{sample})_{n=1,…,50} from three independent Monte Carlo simulations of 10000 GWI sample paths (each) up to time horizon 50. A small numerical sketch of the bound evaluation is given after Figure 1.

Figure 1. Bayes risk bounds (using λ = 0.5 (red/orange) resp. λ = 0.9 (blue/cyan)) and Bayes risk simulations (lightgrey/grey/black) on a unit (left graph) and logarithmic (right graph) scale in the parameter setup (β_A,β_H,α_A,α_H) = (1.2,0.9,4,3) ∈ P_{SP,1}, with initial population X_0 = 5 and prior-loss constants L_A = 300 and L_H = 150.
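For readers who want to reproduce the bound curves of Figure 1, the following minimal Python sketch (our own illustration) evaluates the λ = 0.5 bounds; it assumes that the recursion (36) reads a_1(q) = q − β_λ, a_n(q) = q·e^{a_{n−1}(q)} − β_λ with β_λ = λ·β_A + (1−λ)·β_H, and that the exact Hellinger integral in the P_{SP,1} case equals exp{a_n(q_λ^E)·X_0 + (α_A/β_A)·Σ_{k=1}^n a_k(q_λ^E)} (cf. Theorem 1(a)); the constants 2.121·10^2 and 100 are taken from the displayed bounds.

import math

bA, bH, aA, aH, lam, X0 = 1.2, 0.9, 4.0, 3.0, 0.5, 5
beta_lam = lam * bA + (1 - lam) * bH
qE = bA**lam * bH**(1 - lam)                       # q_lambda^E, approx. 1.039

def a_seq(q, n):
    # first n members a_1(q),...,a_n(q) of the assumed recursion (36)
    out, a = [], 0.0
    for _ in range(n):
        a = q * math.exp(a) - beta_lam
        out.append(a)
    return out

for n in (1, 5, 10, 20, 50):
    a = a_seq(qE, n)
    H = math.exp(a[-1] * X0 + (aA / bA) * sum(a))  # exact H_{1/2}(P_{A,n} || P_{H,n}), P_SP,1 case
    R_upper = 2.121e2 * H                          # = sqrt(300*150) * H, cf. the displayed R_n^U
    R_lower = 100.0 * H**2                         # cf. the displayed R_n^L
    print(f"n={n:3d}:  H_1/2={H:.4e}   {R_lower:.4e} <= R_n <= {R_upper:.4e}")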

3.15.2. Neyman-Pearson Testing

By combining (23) with the exact values resp. upper bounds of the Hellinger integrals HλPA,nPH,n from the preceding subsections, we obtain for our context of GW(I) with Poisson offspring and Poisson immigration (including the non-immigration case) some upper bounds of the minimal type II error probability EςPA,nPH,n in the class of the tests for which the type I error probability is at most ς]0,1[, which can also be immediately rewritten as lower bounds for the power 1EςPA,nPH,n of a most powerful test at level ς. As for the Bayesian context of Section 3.15.1, instead of providing the–merely repetitive–outcoming formulas for the bounds of EςPA,nPH,n we give the illustrative

Example 2.

Consider the Figure 2 and Figure 3 which deal with initial population X0=5 and the parameter setup βA,βH,αA,αH=(0.3,1.2,1,4)PSP,1; within our running-example epidemiological context of Section 2.3, this corresponds to a “potentially dangerous” infectious-disease-transmission situation (H) (with supercritical reproduction number βH=1.2 and importation mean of αH=4), whereas (A) describes a “very mild” situation (with “low” subcritical βA=0.3 and αA=1). Figure 2 shows the lower and upper bounds of EςPA,nPH,n with ς=0.05, evaluated from the Formulas (23) and (24), together with the exact values of the Hellinger integral HλPA,nPH,n, cf. Theorem 1 (recall that we are in the setup PSP,1) on both a unit scale (left graph) and a logarithmic scale (right graph). The orange resp. red resp. purple curves correspond to the outcoming upper bounds EnU:=EnU(PA,nPH,n) (cf. (23)) with parameters λ=0.3 resp. λ=0.5 resp. λ=0.7. The green resp. cyan resp. blue curves correspond to the lower bounds EnL:=EnL(PA,nPH,n) (cf. (24)) with parameters λ=2 resp. λ=1.5 resp. λ=1.1. Notice the different λ-ranges in (23) and (24). In contrast, Figure 3 compares the lower bound EnL (for fixed λ=1.1) with the upper bound EnU (for fixed λ=0.5) of the minimal type II error probability Eς(PA,nPH,n) for different levels ς=0.1 (orange for the lower and cyan for the upper bound), ς=0.05 (green and magenta) and ς=0.01 (blue and purple) on both a unit scale (left graph) and a logarithmic scale (right graph).

Figure 2. Different lower bounds E_n^L (using λ ∈ {1.1, 1.5, 2}) and upper bounds E_n^U (using λ ∈ {0.3, 0.5, 0.7}) of the minimal type II error probability E_ς(P_{A,n} ‖ P_{H,n}) for fixed level ς = 0.05 in the parameter setup (β_A,β_H,α_A,α_H) = (0.3,1.2,1,4) ∈ P_{SP,1} together with initial population X_0 = 5, on both a unit scale (left graph) and a logarithmic scale (right graph).

Figure 3. The lower bound E_n^L (using λ = 1.1) and the upper bound E_n^U (using λ = 0.5) of the minimal type II error probability E_ς(P_{A,n} ‖ P_{H,n}) for different levels ς ∈ {0.01, 0.05, 0.1} in the parameter setup (β_A,β_H,α_A,α_H) = (0.3,1.2,1,4) ∈ P_{SP,1} together with initial population X_0 = 5, on both a unit scale (left graph) and a logarithmic scale (right graph).
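As a complement to Example 2, the following Python sketch (our own illustration, not the authors' simulation code) generates GWI sample paths and evaluates the log-likelihood ratio log(dP_{A,n}/dP_{H,n}) on which most powerful tests are based; it assumes that, under each model, the conditional law of X_k given X_{k−1} = x is Poisson(β·x + α).

import numpy as np
rng = np.random.default_rng(seed=1)

bA, bH, aA, aH = 0.3, 1.2, 1.0, 4.0            # setup of Example 2
X0, n = 5, 50

def simulate_path(beta, alpha, X0, n):
    # one GWI path: X_k | X_{k-1}=x ~ Poisson(beta*x + alpha) (offspring + immigration aggregated)
    path, x = [X0], X0
    for _ in range(n):
        x = rng.poisson(beta * x + alpha)
        path.append(x)
    return np.array(path)

def log_lr(path):
    # log dP_{A,n}/dP_{H,n}: Poisson log-pmf ratio along the path (the factorials cancel)
    x_prev, x_next = path[:-1], path[1:]
    mA, mH = bA * x_prev + aA, bH * x_prev + aH
    return float(np.sum(x_next * (np.log(mA) - np.log(mH)) - (mA - mH)))

llr_under_H = np.array([log_lr(simulate_path(bH, aH, X0, n)) for _ in range(10000)])
# Monte-Carlo critical value of a level-0.05 test which rejects (H) for large likelihood ratios:
print("empirical 95%-quantile of the log-likelihood ratio under (H):", np.quantile(llr_under_H, 0.95))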

3.16. Goals for Lower Bounds for the Cases βA,βH,αA,αH,λ(PSP\PSP,1)×(R\[0,1])

Recall from (49) the set P_{SP} := {(β_A,β_H,α_A,α_H) ∈ ]0,∞[^4 : (α_A ≠ α_H) or (β_A ≠ β_H) or both} and the “equal-fraction-case” set P_{SP,1} := {(β_A,β_H,α_A,α_H) ∈ P_{SP} : α_A ≠ α_H, β_A ≠ β_H, α_A/β_A = α_H/β_H}, where for the latter we have derived in Theorem 1(a) and in Proposition 5 the exact recursive values for the time-behaviour of the Hellinger integrals H_λ(P_{A,n} ‖ P_{H,n}) of order λ ∈ ℝ∖[0,1]. Moreover, recall that for the case (β_A,β_H,α_A,α_H,λ) ∈ (P_{SP}∖P_{SP,1}) × ]0,1[ we have obtained in the Section 3.4 and Section 3.5 some “optimal” linear lower bounds ϕ_λ^L(·) for the strictly concave function ϕ_λ(x) := ϕ(x, β_A,β_H,α_A,α_H,λ) on the domain x ∈ [0,∞[; due to the monotonicity Properties 2 (P10) to (P12) of the sequences (a_n(q_λ^L))_{n∈ℕ} and (b_n(p_λ^L,q_λ^L))_{n∈ℕ}, these bounds have led to the “optimal” recursive lower bound B_{λ,X_0,n}^L of the Hellinger integral H_λ(P_{A,n} ‖ P_{H,n}) in (40) of Theorem 1(b).

In contrast, the strict convexity of the function ϕλ(·) in the case βA,βH,αA,αH,λ(PSP\PSP,1)×(R\[0,1]) implies that we cannot maximize both parameters pλL,qλLRsimultaneously subject to the constraint (35). This effect carries over to the lower bounds Bλ,X0,nL of the Hellinger integrals Hλ(PA,nPH,n) (cf. (41)); in general, these bounds cannot be maximized simultaneously for all initial population sizes X0N and all observation horizons nN.

Analogously to (46), one way to obtain “good” recursive lower bounds for Hλ(PA,nPH,n) from (41) in Theorem 1 (b) is to solve the optimization problem,

(p̄_λ^L, q̄_λ^L) := argmax_{(p_λ^L,q_λ^L) ∈ ℝ^2} exp{ a_n(q_λ^L)·X_0 + Σ_{k=1}^n b_k(p_λ^L,q_λ^L) }   such that (35) is satisfied, (55)

for each fixed initial population size X_0 ∈ ℕ and observation horizon n ∈ ℕ. But for the same reasons as explained right after (46), the optimization problem (55) does not seem to be straightforward to solve explicitly. In a congeneric way as in the discussion of the upper bounds for the case λ ∈ ]0,1[ above, we now have to look for suitable parameters p_λ^L, q_λ^L for the lower bound B_{λ,X_0,n}^L ≤ H_λ(P_{A,n} ‖ P_{H,n}) that fulfill (35) and that guarantee certain reasonable criteria and goals; these are analogous to the Goals (G1) to (G3) from Section 3.6, now formulated for lower bounds and for λ ∈ ℝ∖[0,1]:

  • (G1)

    the validity of B_{λ,X_0,n}^L > 1 simultaneously for all initial configurations X_0 ∈ ℕ, all observation horizons n ∈ ℕ and all λ ∈ ℝ∖[0,1], which leads to a strict improvement of the general lower bound H_λ(P_{A,n} ‖ P_{H,n}) ≥ 1 (cf. (11));

  • (G2)

    the determination of the long-term-limits lim_{n→∞} H_λ(P_{A,n} ‖ P_{H,n}) respectively lim_{n→∞} B_{λ,X_0,n}^L for all X_0 ∈ ℕ and all λ ∈ ℝ∖[0,1]; in particular, one would like to check whether lim_{n→∞} H_λ(P_{A,n} ‖ P_{H,n}) = ∞;

  • (G3)

    the determination of the time-asymptotical growth rates limn1nlogHλ(PA,nPH,n) resp. limn1nlogBλ,X0,nL for all X0N and all λR\[0,1].

In the following, let us briefly discuss how these three goals can be achieved in principle, where we confine ourselves to parameters pλL,qλL which–in addition to (35)–fulfill the requirement

[ q_λ^L ≥ max{0, β_λ}  ∧  p_λ^L > max{0, α_λ} ]   ∨   [ q_λ^L > max{0, β_λ}  ∧  p_λ^L ≥ max{0, α_λ} ], (56)

where ∧ is the logical “AND” and ∨ the logical “OR” operator. This is sufficient to tackle all three Goals (G1) to (G3). To see this, assume that pλL,qλL satisfy (35). Let us begin with the two “extremal” cases in (56), i.e., with (i) qλL=max{0,βλ},pλL>max{0,αλ}, respectively (ii) qλL>max{0,βλ},pλL=max{0,αλ}.

Suppose in the first extremal case (i) that β_λ ≤ 0. Then, q_λ^L = 0 and Properties 1 (P4) implies that a_n(q_λ^L) = −β_λ ≥ 0 and hence b_n(p_λ^L,q_λ^L) = p_λ^L·e^{−β_λ} − α_λ ≥ p_λ^L − α_λ > 0 for all n ∈ ℕ. This enters into (41) as follows: the Hellinger integral lower bound becomes B_{λ,X_0,n}^L = B̃_{λ,X_0,n}(p_λ^L,q_λ^L) = exp{−β_λ·X_0 + (p_λ^L·e^{−β_λ} − α_λ)·n} > 1. Furthermore, one clearly has lim_{n→∞} B_{λ,X_0,n}^L = ∞ as well as lim_{n→∞} (1/n)·log B_{λ,X_0,n}^L = p_λ^L·e^{−β_λ} − α_λ > 0. Assume now that β_λ > 0. Then, q_λ^L = β_λ > 0, a_n(q_λ^L) = 0 (cf. (P2)), b_n(p_λ^L,q_λ^L) = p_λ^L − α_λ > 0 and thus B_{λ,X_0,n}^L = exp{(p_λ^L − α_λ)·n} > 1 for all n ∈ ℕ. Furthermore, one gets lim_{n→∞} B_{λ,X_0,n}^L = ∞ as well as lim_{n→∞} (1/n)·log B_{λ,X_0,n}^L = p_λ^L − α_λ > 0.

Let us consider the other above-mentioned extremal case (ii). Suppose that q_λ^L > max{0, β_λ} together with q_λ^L > min{1, e^{β_λ−1}}, which implies that the sequence (a_n(q_λ^L))_{n∈ℕ} is strictly positive, strictly increasing and grows to infinity faster than exponentially, cf. (P3b). Hence, B_{λ,X_0,n}^L ≥ exp{a_n(q_λ^L)·X_0} > 1, lim_{n→∞} B_{λ,X_0,n}^L = ∞ as well as lim_{n→∞} (1/n)·log B_{λ,X_0,n}^L = ∞. If max{0, β_λ} < q_λ^L ≤ min{1, e^{β_λ−1}}, then (a_n(q_λ^L))_{n∈ℕ} is strictly positive, strictly increasing and converges to x_0(q_λ^L) ∈ ]0, −log(q_λ^L)] (cf. (P3a)). This carries over to the sequence (b_n(p_λ^L,q_λ^L))_{n∈ℕ}: one gets b_1(p_λ^L,q_λ^L) = p_λ^L − α_λ ≥ 0 and b_n(p_λ^L,q_λ^L) > 0 for all n ≥ 2. Furthermore, (b_n(p_λ^L,q_λ^L))_{n∈ℕ} is strictly increasing and converges to p_λ^L·e^{x_0(q_λ^L)} − α_λ > 0, leading to B_{λ,X_0,n}^L > 1 for all n ∈ ℕ, to lim_{n→∞} B_{λ,X_0,n}^L = ∞ as well as to lim_{n→∞} (1/n)·log B_{λ,X_0,n}^L = p_λ^L·e^{x_0(q_λ^L)} − α_λ > 0.

It remains to look at the cases where pλL,qλL satisfy (35), and (56) with two strict inequalities. For this situation, one gets

  • (a_n(q_λ^L))_{n∈ℕ} is strictly positive, strictly increasing and – iff q_λ^L ≤ min{1, e^{β_λ−1}} – convergent (namely to the smallest positive solution x_0(q_λ^L) ∈ ]0, −log(q_λ^L)] of (44)), cf. (P3);

  • (b_n(p_λ^L,q_λ^L))_{n∈ℕ} is strictly increasing, strictly positive (since b_1(p_λ^L,q_λ^L) = p_λ^L − α_λ > 0) and – iff q_λ^L ≤ min{1, e^{β_λ−1}} – convergent (namely to p_λ^L·e^{x_0(q_λ^L)} − α_λ ∈ [p_λ^L − α_λ, p_λ^L/q_λ^L − α_λ]), cf. (P7).

Hence, under the assumptions (35) and p_λ^L > max{0, α_λ} ∧ q_λ^L > max{0, β_λ}, the corresponding lower bounds B_{λ,X_0,n}^L of the Hellinger integral H_λ(P_{A,n} ‖ P_{H,n}) fulfill for all X_0 ∈ ℕ

  • B_{λ,X_0,n}^L > 1 for all n ∈ ℕ,

  • lim_{n→∞} B_{λ,X_0,n}^L = ∞,

  • lim_{n→∞} (1/n)·log B_{λ,X_0,n}^L = p_λ^L·e^{x_0(q_λ^L)} − α_λ > 0 for the case q_λ^L ∈ ]max{0, β_λ}, min{1, e^{β_λ−1}}], respectively lim_{n→∞} (1/n)·log B_{λ,X_0,n}^L = ∞ for the remaining case q_λ^L > min{1, e^{β_λ−1}}.

Putting these considerations together we conclude that the constraints (35) and (56) are sufficient to achieve the Goals (G1) to (G3). Hence, for fixed parameter constellation βA,βH,αA,αH,λ, we aim for finding pλL=pLβA,βH,αA,αH,λ and qλL=qLβA,βH,αA,αH,λ which satisfy (35) and (56). This can be achieved mostly, but not always, as we shall show below. As an auxiliary step for further investigations, it is useful to examine the set of all λR\[0,1] for which αλ0 or βλ0 (or both). By straightforward calculations, we see that

α_λ ≤ 0  ⟺  λ ≤ −α_H/(α_A−α_H) if α_A > α_H,   λ ≥ α_H/(α_H−α_A) if α_A < α_H;   and   β_λ ≤ 0  ⟺  λ ≤ −β_H/(β_A−β_H) if β_A > β_H,   λ ≥ β_H/(β_H−β_A) if β_A < β_H. (57)

Furthermore, recall that (35) implies the general bounds p_λ^L ≤ α_A^λ·α_H^{1−λ} = φ_λ(0) (being equivalent to the requirement ϕ_λ^L(0) ≤ ϕ_λ(0)) and q_λ^L ≤ β_A^λ·β_H^{1−λ} = q̃_λ (the latter being the maximal slope due to Properties 3 (P19), (P20)).

Let us now undertake the desired detailed investigations on lower and upper bounds of the Hellinger integrals Hλ(PA,nPH,n) of order λR\[0,1], for the various different subclasses of PSP\PSP,1.

3.17. Lower Bounds for the Cases βA,βH,αA,αH,λPSP,2×(R\[0,1])

In such a constellation, where P_{SP,2} := {(β_A,β_H,α_A,α_H) ∈ P_{SP} : α_A = α_H, β_A ≠ β_H} (cf. (49)), one gets ϕ_λ(0) = 0 (cf. Properties 3 (P16)) and ϕ_λ′(0) = 0 (cf. (P17)). Thus, the only choice for the intercept and the slope of the linear lower bound ϕ_λ^L(·) for ϕ_λ(·) which satisfies (35) for all x ∈ ℕ and (potentially) (56) is r_λ^L = 0 = p_λ^L − α_λ (i.e., p_λ^L = α_λ = α > 0) and s_λ^L = (ϕ_λ(1) − ϕ_λ(0))/(1 − 0) = q_λ^L − β_λ = a_1(q_λ^L) > 0 (i.e., q_λ^L = (α+β_A)^λ·(α+β_H)^{1−λ} − α). However, since p_λ^L = α_λ = α > 0, the restriction (56) is fulfilled iff q_λ^L > 0, which is equivalent to

λ ∈ I_{SP,2} := ] log(α/(α+β_H)) / log((α+β_A)/(α+β_H)), 0 [ ∪ ] 1, ∞ [   if β_A > β_H,      ] −∞, 0 [ ∪ ] 1, log(α/(α+β_H)) / log((α+β_A)/(α+β_H)) [   if β_A < β_H. (58)

Suppose that λ ∈ I_{SP,2}. As we have seen above, from Properties 1 (P3a) and (P3b) one can derive that (a_n(q_λ^L))_{n∈ℕ} is strictly positive, strictly increasing, and converges to x_0(q_λ^L) ∈ ]0, −log(q_λ^L)] iff q_λ^L ≤ min{1, e^{β_λ−1}}, and otherwise it diverges to ∞. Notice that both cases can occur: consider the parameter setup (β_A,β_H,α_A,α_H) = (1.5,0.5,0.5,0.5) ∈ P_{SP,2}, which leads to I_{SP,2} = ]−1,0[ ∪ ]1,∞[; within our running-example epidemiological context of Section 2.3, this corresponds to a “mild” infectious-disease-transmission situation (H) (with “low” reproduction number β_H = 0.5 and importation mean α_H = 0.5), whereas (A) describes a “dangerous” situation (with supercritical β_A = 1.5 and α_A = 0.5). For λ = −0.5 ∈ I_{SP,2} one obtains q_λ^L ≈ 0.207 ≤ min{1, e^{β_λ−1}} ≈ 0.368, whereas for λ = 2 ∈ I_{SP,2} one gets q_λ^L = 3.5 > min{1, e^{β_λ−1}} = 1. Altogether, this leads to

Proposition 11.

For all βA,βH,αA,αH,λPSP,2×ISP,2 and all initial population sizes X0N there holds with pλL=αA=αH=α,qλL=(α+βA)λ(α+βH)1λα

(a)Bλ,X0,1L=B˜λ,X0,1(pλL,qλL)=expqλLβλ·X0>1,(b)thesequenceBλ,X0,nLnNoflowerboundsforHλ(PA,nPH,n)givenbyBλ,X0,nL=B˜λ,X0,n(pλL,qλL)=expan(qλL)·X0+k=1nbk(pλL,qλL)isstrictlyincreasing,(c)limnBλ,X0,nL==limnHλ(PA,nPH,n),(d)limn1nlogBλ,X0,nL=pλL·expx0(qλL)α>0,ifqλLmin1,eβλ1,,ifqλL>min1,eβλ1,(e)themapX0Bλ,X0,nL=B˜λ,X0,n(pλL,qλL)isstrictlyincreasing.

Nevertheless, for the remaining constellations βA,βH,αA,αH,λPSP,2×R\ISP,2[0,1], all observation time horizons nN and all initial population sizes X0N one can still prove

1 < H_λ(P_{A,n} ‖ P_{H,n})   and   lim_{n→∞} H_λ(P_{A,n} ‖ P_{H,n}) = ∞, (59)

(i.e., the achievement of the Goals (G1), (G2)), which is done by a conceptually different method (without involving pλL,qλL) in Appendix A.1.
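The case distinction in Proposition 11(d) can be explored numerically with the following small sketch (our own illustration); it assumes that x_0(q) is the smallest solution of the fixed-point equation x = q·e^x − β_λ, i.e., the limit of the recursion a_n(q) whenever the latter converges, and it reproduces the two behaviours quoted above for λ = −0.5 and λ = 2.

import math

def x0_and_rate(bA, bH, aA, aH, lam, p, q, n_iter=100000, tol=1e-13):
    # iterate a_n(q) = q*exp(a_{n-1}(q)) - beta_lambda; report its limit x_0(q) and the
    # induced growth rate p*exp(x_0(q)) - alpha_lambda, or infinity in the divergent case
    a_lam, b_lam = lam * aA + (1 - lam) * aH, lam * bA + (1 - lam) * bH
    a = 0.0
    for _ in range(n_iter):
        a_new = q * math.exp(a) - b_lam
        if not math.isfinite(a_new) or a_new > 50.0:
            return None, math.inf
        if abs(a_new - a) < tol:
            return a_new, p * math.exp(a_new) - a_lam
        a = a_new
    return a, p * math.exp(a) - a_lam

# P_SP,2 example from above: (beta_A, beta_H, alpha_A, alpha_H) = (1.5, 0.5, 0.5, 0.5), p = alpha
for lam in (-0.5, 2.0):
    q = (0.5 + 1.5)**lam * (0.5 + 0.5)**(1 - lam) - 0.5
    x0, rate = x0_and_rate(1.5, 0.5, 0.5, 0.5, lam, 0.5, q)
    print(f"lambda={lam}: q_L={q:.3f}, x_0={x0}, asymptotic growth rate={rate}")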

3.18. Lower Bounds for the Cases βA,βH,αA,αH,λPSP,3a×(R\[0,1])

In the current setup, where P_{SP,3a} := {(β_A,β_H,α_A,α_H) ∈ P_{SP} : α_A ≠ α_H, β_A ≠ β_H, α_A/β_A ≠ α_H/β_H, (α_A−α_H)/(β_H−β_A) ∈ ]−∞,0[} (cf. (49)), we always have either (α_A > α_H) ∧ (β_A > β_H) or (α_A < α_H) ∧ (β_A < β_H). Furthermore, from Properties 3 (P16) we obtain ϕ_λ(0) > 0. As in the case λ ∈ ]0,1[, the derivative ϕ_λ′(0) can assume any sign on P_{SP,3a}; take e.g., (β_A,β_H,α_A,α_H,λ) = (2.2,4.5,1,3,2) for ϕ_λ′(0) < 0, (β_A,β_H,α_A,α_H,λ) = (2.25,4.5,1,3,2) for ϕ_λ′(0) = 0 and (β_A,β_H,α_A,α_H,λ) = (2.3,4.5,1,3,2) for ϕ_λ′(0) > 0 (these parameter constellations reflect “dangerous” (A) versus “highly dangerous” (H) situations within our running-example epidemiological context of Section 2.3). Nevertheless, in all three subcases one gets min_{x∈ℕ_0} ϕ_λ(x) ≥ min_{x≥0} ϕ_λ(x) > 0. Thus, there exist parameters p_λ^L ∈ ]α_λ, α_A^λ·α_H^{1−λ}] and q_λ^L ∈ ]β_λ, β_A^λ·β_H^{1−λ}] which satisfy (35) (in particular, p_λ^L − α_λ > 0, q_λ^L − β_λ > 0). We now have to look for a condition which guarantees that these parameters additionally fulfill (56); such a condition is clearly that both α_λ ≥ 0 and β_λ ≥ 0 hold, which is equivalent (cf. (57)) to

λ ∈ I_{SP,3a}^{(≥)} := [ max{ −α_H/(α_A−α_H), −β_H/(β_A−β_H) }, 0 [ ∪ ] 1, ∞ [   if (α_A > α_H) ∧ (β_A > β_H),      ] −∞, 0 [ ∪ ] 1, min{ α_H/(α_H−α_A), β_H/(β_H−β_A) } ]   if (α_A < α_H) ∧ (β_A < β_H);

recall that α_λ = 0 and β_λ = 0 cannot occur simultaneously in the current setup. If α_λ ≤ 0 and β_λ ≤ 0, i.e., if

λ ∈ I_{SP,3a}^{(<)} := ] −∞, min{ −α_H/(α_A−α_H), −β_H/(β_A−β_H) } ]   if (α_A > α_H) ∧ (β_A > β_H),      [ max{ α_H/(α_H−α_A), β_H/(β_H−β_A) }, ∞ [   if (α_A < α_H) ∧ (β_A < β_H),

then–due to the strict positivity of the function φλ(·) (cf. (31))–there exist parameters pλL>0=max{0,αλ} and qλL>0=max{0,βλ} which satisfy (56) and (34) (where the latter implies (35) and thus pλLαAλαH1λ,qλLβAλβH1λ). With

I_{SP,3a} := I_{SP,3a}^{(≥)} ∪ I_{SP,3a}^{(<)} (60)

and with the discussion below (56), we thus derive the following

Proposition 12.

For all βA,βH,αA,αH,λPSP,3a×ISP,3a there exist parameters pλL,qλL which satisfy max{0,αλ}<pλLαAλαH1λ,max{0,βλ}<qλLβAλβH1λ as well as (35) for all xN0, and for all such pairs (pλL,qλL) and all initial population sizes X0N one gets

(a)Bλ,X0,1L=B˜λ,X0,1(pλL,qλL)=expqλLβλ·X0+pλLαλ>1,(b)thesequenceBλ,X0,nLnNoflowerboundsforHλ(PA,nPH,n)givenbyBλ,X0,nL=B˜λ,X0,n(pλL,qλL)=expan(qλL)·X0+k=1nbk(pλL,qλL)isstrictlyincreasing,(c)limnBλ,X0,nL==limnHλ(PA,nPH,n),(d)limn1nlogBλ,X0,nL=pλL·expx0(qλL)αλ>0,ifqλLmin1,eβλ1,,ifqλL>min1,eβλ1,(e)themapX0Bλ,X0,nL=B˜λ,X0,n(pλL,qλL)isstrictlyincreasing.

Notice that the assertions (a) to (e) of Proposition 12 hold true for all parameter pairs (p_λ^L, q_λ^L) which satisfy (35) and (56); in particular, we may allow either p_λ^L = max{0, α_λ} or q_λ^L = max{0, β_λ}. Let us furthermore mention that in part (d) both asymptotical behaviours can occur: consider e.g., the parameter setup (β_A,β_H,α_A,α_H) = (0.3,0.2,4,3) ∈ P_{SP,3a}, leading to ]1,∞[ ⊆ I_{SP,3a}^{(≥)} ⊆ I_{SP,3a}. For λ = 2 ∈ I_{SP,3a}, the parameters p_λ^L := p̃_λ = 5.25, q_λ^L := q̃_λ = 0.45 (corresponding to the asymptote ϕ̃_λ(·), cf. (P20)) fulfill (35), (56) and additionally q_λ^L = 0.45 < min{1, e^{β_λ−1}} ≈ 0.549. Analogously, in the setup (β_A,β_H,α_A,α_H,λ) = (3,2,4,3,2) ∈ P_{SP,3a} × I_{SP,3a}, the choices p_λ^L := p̃_λ = 5.25, q_λ^L := q̃_λ = 4.5 satisfy (35), (56) and there holds q_λ^L = 4.5 > min{1, e^{β_λ−1}} = 1.
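The quoted numerical values can be cross-checked with the following snippet (our own illustration), which assumes the asymptote parameters p̃_λ = λ·α_A·(β_A/β_H)^{λ−1} + (1−λ)·α_H·(β_A/β_H)^λ and q̃_λ = β_A^λ·β_H^{1−λ} (cf. Section 3.13 resp. (P20)) and the convergence threshold min{1, e^{β_λ−1}}.

import math

for (bA, bH, aA, aH, lam) in [(0.3, 0.2, 4, 3, 2), (3, 2, 4, 3, 2)]:
    b_lam = lam * bA + (1 - lam) * bH
    p_tilde = lam * aA * (bA / bH)**(lam - 1) + (1 - lam) * aH * (bA / bH)**lam
    q_tilde = bA**lam * bH**(1 - lam)
    thr = min(1.0, math.exp(b_lam - 1))
    print(f"setup {(bA, bH, aA, aH, lam)}: p~={p_tilde:.3f}, q~={q_tilde:.3f}, "
          f"min{{1,e^(beta_lam-1)}}={thr:.3f}, convergent: {q_tilde <= thr}")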

For the remaining two cases (α_λ ≤ 0) ∧ (β_λ > 0) (e.g., (β_A,β_H,α_A,α_H,λ) = (6,5,3,2,−3)) and (α_λ > 0) ∧ (β_λ ≤ 0) (e.g., (β_A,β_H,α_A,α_H,λ) = (3,2,6,5,−3)), one has to proceed differently. Indeed, for all parameter constellations (β_A,β_H,α_A,α_H,λ) ∈ P_{SP,3a} × (ℝ∖(I_{SP,3a} ∪ [0,1])), all observation time horizons n ∈ ℕ and all initial population sizes X_0 ∈ ℕ one can still prove

1 < H_λ(P_{A,n} ‖ P_{H,n}),   and   lim_{n→∞} H_λ(P_{A,n} ‖ P_{H,n}) = ∞, (61)

which is done in Appendix A.1, using a similar method as in the proof of assertion (59).

3.19. Lower Bounds for the Cases βA,βH,αA,αH,λPSP,3b×(R\[0,1])

Within such a constellation, where P_{SP,3b} := {(β_A,β_H,α_A,α_H) ∈ P_{SP} : α_A ≠ α_H, β_A ≠ β_H, α_A/β_A ≠ α_H/β_H, (α_A−α_H)/(β_H−β_A) ∈ ]0,∞[∖ℕ} (cf. (49)), one always has either (α_A < α_H) ∧ (β_A > β_H) or (α_A > α_H) ∧ (β_A < β_H). Moreover, from Properties 3 (P15) one can see that ϕ_λ(x) = 0 for x = x* = (α_H−α_A)/(β_A−β_H) > 0. However, x* ∉ ℕ_0, which implies ϕ_λ(x) > 0 for all x in the relevant subdomain ℕ_0. Again, we incorporate (57) and consider the set of all λ ∈ ℝ∖[0,1] such that α_λ ≥ 0 and β_λ ≥ 0 (where α_λ = 0 and β_λ = 0 cannot hold simultaneously), i.e.,

λ ∈ I_{SP,3b}^{(≥)} := [ −β_H/(β_A−β_H), 0 [ ∪ ] 1, α_H/(α_H−α_A) ]   if (α_A < α_H) ∧ (β_A > β_H),      [ −α_H/(α_A−α_H), 0 [ ∪ ] 1, β_H/(β_H−β_A) ]   if (α_A > α_H) ∧ (β_A < β_H). (62)

As above in Section 3.18, if λISP,3b() then there exist parameters pλL]αλ,αAλαH1λ], qλL]βλ,βAλβH1λ] (which thus fulfill (56)) such that (35) is satisfied for all xN0. Hence, for all λISP,3b:=ISP,3b(), all assertions (a) to (e) of Proposition 12 hold true. Notice that for the current setup PSP,3b one cannot have αλ0 and βλ0 simultaneously. Furthermore, in each of the two remaining cases (αλ<0)(βλ>0) respectively (αλ>0)(βλ<0) it can happen that there do not exist parameters pλL,qλL>0 which satisfy both (35) and (56). However, as in the case PSP,3a above, for all λISP,3b we prove in Appendix A.1 (by a method without pλL,qλL) that for all observation times nN and all initial population sizes X0N there holds

1 < H_λ(P_{A,n} ‖ P_{H,n})   and   lim_{n→∞} H_λ(P_{A,n} ‖ P_{H,n}) = ∞. (63)

3.20. Lower Bounds for the Cases βA,βH,αA,αH,λPSP,3c×(R\[0,1])

Since in this subcase one has P_{SP,3c} := {(β_A,β_H,α_A,α_H) ∈ P_{SP} : α_A ≠ α_H, β_A ≠ β_H, α_A/β_A ≠ α_H/β_H, (α_A−α_H)/(β_H−β_A) ∈ ℕ} (cf. (49)) and thus ϕ_λ(x*) = 0 for x* ∈ ℕ, there do not exist parameters p_λ^L, q_λ^L such that (35) and (56) are satisfied. The only parameter pair that ensures exp{a_n(q_λ^L)·X_0 + Σ_{k=1}^n b_k(p_λ^L,q_λ^L)} ≥ 1 for all n ∈ ℕ and all X_0 ∈ ℕ within our proposed method is the choice p_λ^L = α_λ, q_λ^L = β_λ. Consequently, B_{λ,X_0,n}^L ≡ 1, which coincides with the general lower bound (11) but violates the above-mentioned desired Goal (G1). However, in some constellations there exist nonnegative parameters p_λ^L < α_λ, q_λ^L > β_λ or p_λ^L > α_λ, q_λ^L < β_λ such that at least the parts (c) and (d) of Proposition 12 are satisfied. As in Section 3.19 above, by using a conceptually different method (without p_λ^L, q_λ^L) we prove in Appendix A.1 that for all λ ∈ ℝ∖[0,1], all observation times n ∈ ℕ and all initial population sizes X_0 ∈ ℕ there holds

1 < H_λ(P_{A,n} ‖ P_{H,n})   and   lim_{n→∞} H_λ(P_{A,n} ‖ P_{H,n}) = ∞. (64)

3.21. Lower Bounds for the Cases βA,βH,αA,αH,λPSP,4a×(R\[0,1])

In the current setup, where P_{SP,4a} := {(β_A,β_H,α_A,α_H) ∈ P_{SP} : α_A ≠ α_H > 0, β_A = β_H ∈ ]0,1[} (cf. (49)), the function ϕ_λ(·) is strictly positive and strictly decreasing, with lim_{x→∞} ϕ_λ(x) = lim_{x→∞} ϕ_λ′(x) = 0. The only choice of parameters p_λ^L, q_λ^L which fulfill (35) and exp{a_n(q_λ^L)·X_0 + Σ_{k=1}^n b_k(p_λ^L,q_λ^L)} ≥ 1 for all n ∈ ℕ and all X_0 ∈ ℕ is the choice p_λ^L = α_λ as well as q_λ^L = β_λ = β, where β stands for both (equal) β_H and β_A. Of course, this leads to B_{λ,X_0,n}^L ≡ 1, which is consistent with the general lower bound (11) but violates the above-mentioned desired Goal (G1). Nevertheless, in Appendix A.1 we prove the following

Proposition 13.

For all (β_A,β_H,α_A,α_H,λ) ∈ P_{SP,4a} × (ℝ∖[0,1]) there exist parameters p_λ^L > α_λ (not necessarily satisfying p_λ^L ≥ 0) and 0 < q_λ^L < β_λ = β < min{1, e^{β−1}} = e^{β−1} such that (35) holds for all x ∈ [0,∞[ and such that for all initial population sizes X_0 ∈ ℕ the parts (c) and (d) of Proposition 12 hold true.

3.22. Lower Bounds for the Cases βA,βH,αA,αH,λPSP,4b×(R\[0,1])

By recalling PSP,4b:=βA,βH,αA,αHPSP:αAαH>0,βA=βH[1,[ (cf.(49)), the assertions preceding Proposition 13 remain valid. However, the proof of Proposition 13 in Appendix A.1 contains details which explain why it cannot be carried over to the current case PSP,4b. Thus, the generally valid lower bound Bλ,X0,nL1 cannot be improved with our methods.

3.23. Concluding Remarks on Alternative Lower Bounds for all Cases βA,βH,αA,αH,λ(PSP\PSP,1)×(R\[0,1])

To achieve the Goals (G1) to (G3), in the above-mentioned investigations about lower bounds of the Hellinger integral H_λ(P_{A,n} ‖ P_{H,n}), λ ∈ ℝ∖[0,1], we have mainly focused on parameters p_λ^L, q_λ^L which satisfy (35) and additionally (56). Nevertheless, Theorem 1(b) gives lower bounds B_{λ,X_0,n}^L whenever (35) is fulfilled. However, this lower bound can be the trivial one, B_{λ,X_0,n}^L ≡ 1. Let us remark here that for the parameter constellations (β_A,β_H,α_A,α_H,λ) ∈ (P_{SP,2} × ((ℝ∖[0,1])∖I_{SP,2})) ∪ (P_{SP,3a} × ((ℝ∖[0,1])∖I_{SP,3a})) ∪ (P_{SP,3b} × ((ℝ∖[0,1])∖I_{SP,3b})) one can prove that there exist p_λ^L, q_λ^L which satisfy (35) for all x ∈ ℕ_0 as well as the condition (generalizing (56))

p_λ^L ≥ α_λ,   q_λ^L ≥ β_λ   (where at least one of the inequalities is strict),

and that for such pλL,qλL one gets the validity of Hλ(PA,nPH,n)Bλ,X0,nL=B˜λ,X0,n(pλL,qλL)>1 for all X0N and all nN; consequently, Goal (G1) is achieved. However, in these parameter constellations it can unpleasantly happen that nBλ,X0,nL is oscillating (in contrast to the monotone behaviour in the Propositions 11 (b), 12 (b)).

As a final general remark, let us mention that the functions ϕλ,ytan(·), ϕλ,ksec(·), ϕλhor(·), ϕλ˜(·) –defined in (52)–(54) and Properties 3 (P20)–constitute linear lower bounds for ϕλ(·) on the domain N0 in the case λR\[0,1]. Their parameters pλLpλ,ytan,pλ,ysec,pλ,yhor,pλ˜ and qλLqλ,ytan,qλ,ysec,qλ,yhor,qλ˜ lead to lower bounds Bλ,X0,nL of the Hellinger integrals that may or may not be consistent with Goals (G1) to (G3), and which may be possibly better respectively weaker respectively incomparable with the previous lower bounds when adding some relaxation of (G1), such as e.g., the validity of Hλ(PA,nPH,n)>1 for all but finitely many nN.

3.24. Upper Bounds for the Cases βA,βH,αA,αH,λ(PSP\PSP,1)×(R\[0,1])

For the cases λR\[0,1], the investigation of upper bounds for the Hellinger integral Hλ(PA,nPH,n) is much easier than the above-mentioned derivations of lower bounds. In fact, we face a situation which is similar to the lower-bounds-studies for the cases λ]0,1[: due to Properties 3 (P19), the function ϕλ(·) is strictly convex on the nonnegative real line. Furthermore, it is asymptotically linear, as stated in (P20). The monotonicity Properties 2 (P10) to (P12) imply that for the tightest upper bound (within our framework) one should use the parameters pλU:=αAλαH1λ>0 and qλU:=βAλβH1λ>0. Lemma A1 states that pλUαλ resp. qλUβλ, with equality iff αA=αH resp. iff βA=βH. From Properties 1 (P3a) we see that for βAβH the corresponding sequence an(qλU)nN is convergent to x0(qλU)]0,log(qλU)] if qλUmin{1,eβλ1} (i.e., if λ[λ,λ+], cf. Lemma 1 (a)), and otherwise it diverges to faster than exponentially (cf. (P3b)). If βA=βH (i.e., if βA,βH,αA,αHPSP,4=PSP,4aPSP,4b), then one gets qλU=βλ and an(qλU)=0=x0(qλU) for all nN (cf. (P2)). Altogether, this leads to

Proposition 14.

For all βA,βH,αA,αH,λ(PSP\PSP,1)×(R\[0,1]) and all initial population sizes X0N there holds with pλU:=αAλαH1λ,qλU:=βAλβH1λ

(a)Bλ,X0,1U=B˜λ,X0,1(pλU,qλU)=expβAλβH1λβλ·X0+αAλαH1λαλ>1,(b)thesequenceBλ,X0,nUnNofupperboundsforHλ(PA,nPH,n)givenbyBλ,X0,nU=B˜λ,X0,n(pλU,qλU)=expan(qλU)·X0+k=1nbk(pλU,qλU)isstrictlyincreasing,(c)limnBλ,X0,nU=,(d)limn1nlogBλ,X0,nU=pλU·expx0(qλU)αλ>0,ifλ[λ,λ+]\[0,1],,ifλ],λ[]λ+,[,(e)themapX0Bλ,X0,nU=B˜λ,X0,n(pλU,qλU)isstrictlyincreasing.

4. Power Divergences of Non-Kullback-Leibler-Information-Divergence Type

4.1. A First Basic Result

For orders λR\{0,1}, all the results of the previous Section 3 carry correspondingly over from the Hellinger integrals Hλ(··) to the total variation distance V(·||·), by virtue of the relation (cf. (12))

2·(1 − H_{1/2}(P_{A,n} ‖ P_{H,n}))  ≤  V(P_{A,n} ‖ P_{H,n})  ≤  2·(1 − H_{1/2}(P_{A,n} ‖ P_{H,n})^2)^{1/2},

to the Renyi divergences Rλ(··), by virtue of the relation (cf. (7))

0 ≤ R_λ(P_{A,n} ‖ P_{H,n}) = (1/(λ·(λ−1)))·log H_λ(P_{A,n} ‖ P_{H,n}),   with log 0 := −∞,

as well as to the power divergences Iλ··, by virtue of the relation (cf. (2))

I_λ(P_{A,n} ‖ P_{H,n}) = (1 − H_λ(P_{A,n} ‖ P_{H,n}))/(λ·(1−λ)),   n ∈ ℕ;

in the following, we concentrate on the latter. In particular, the above-mentioned carrying-over procedure leads to bounds on IλPAPH which are tighter than the general rudimentary bounds (cf. (10) and (11))

0 ≤ I_λ(P_{A,n} ‖ P_{H,n}) < 1/(λ·(1−λ))   for λ ∈ ]0,1[,      0 ≤ I_λ(P_{A,n} ‖ P_{H,n}) ≤ ∞   for λ ∈ ℝ∖[0,1].

Because power divergences have a very insightful interpretation as “directed distances” between two probability distributions (e.g., within our running-example epidemiological context), and function as important tools in statistics, information theory, machine learning, and artificial intelligence, we present explicitly the outcoming exact values respectively bounds of IλPAPH (λR\{0,1}, nN), in the current and the following subsections. For this, recall the case-dependent parameters pA=pλA=pAβA,βH,αA,αH,λ and qA=qλA=qAβA,βH,αA,αH,λ (A{E,L,U}). To begin with, we can deduce from Theorem 1

Theorem 2.

  • (a) 
    For all βA,βH,αA,αH(PNIPSP,1), all initial population sizes X0N0, all observation horizons nN and all λR\{0,1} one can recursively compute the exact value
    I_λ(P_{A,n} ‖ P_{H,n}) = (1/(λ·(λ−1)))·[ exp{ a_n(q_λ^E)·X_0 + (α_A/β_A)·Σ_{k=1}^n a_k(q_λ^E) } − 1 ] =: V_{λ,X_0,n}^I, (65)
    where αAβA can be equivalently replaced by αHβH and qλE:=βAλβH1λ. Notice that on PNI the formula (65) simplifies significantly, since αA=αH=0.
  • (b) 
    For general parameters pR, q0 recall the general expression (cf. (42))
    B̃_{λ,X_0,n}(p,q) := exp{ a_n(q)·X_0 + (p/q)·Σ_{k=1}^n a_k(q) + n·((p/q)·β_λ − α_λ) }
    as well as
    B̃_{λ,X_0,n}(p,0) := exp{ −β_λ·X_0 + (p·e^{−β_λ} − α_λ)·n }.
    Then, for all βA,βH,αA,αHPSP\PSP,1, all λR\{0,1}, all coefficients pλL,pλU,qλL,qλUR which satisfy (35) for all xN0, all initial population sizes X0N and all observation horizons nN one gets the following recursive bounds for the power divergences: for λ]0,1[ there holds
    Iλ(PA,nPH,n)<1λ(1λ)·1Bλ,X0,nL=1λ(1λ)·1B˜λ,X0,n(pλL,qλL)=:Bλ,X0,nI,U,1λ(1λ)·1Bλ,X0,nU=1λ(1λ)·1minB˜λ,X0,n(pλU,qλU),1=:Bλ,X0,nI,L,
    whereas for λR\[0,1] there holds
    Iλ(PA,nPH,n)<1λ(λ1)·Bλ,X0,nU1=1λ(λ1)·B˜λ,X0,n(pλU,qλU)1=:Bλ,X0,nI,U,1λ(λ1)·Bλ,X0,nL1=1λ(λ1)·maxB˜λ,X0,n(pλL,qλL),11=:Bλ,X0,nI,L.

In order to deduce the subsequent detailed recursive analyses of power divergences, we also employ the obvious relations

lim_{n→∞} (1/n)·log( 1/(λ·(1−λ)) − I_λ(P_{A,n} ‖ P_{H,n}) ) = lim_{n→∞} (1/n)·( −log(λ·(1−λ)) + log H_λ(P_{A,n} ‖ P_{H,n}) ) = lim_{n→∞} (1/n)·log H_λ(P_{A,n} ‖ P_{H,n}),   for λ ∈ ]0,1[, (66)

as well as

lim_{n→∞} (1/n)·log I_λ(P_{A,n} ‖ P_{H,n}) = lim_{n→∞} (1/n)·( −log(λ·(λ−1)) + log( H_λ(P_{A,n} ‖ P_{H,n}) − 1 ) ) = lim_{n→∞} (1/n)·( log( 1 − 1/H_λ(P_{A,n} ‖ P_{H,n}) ) + log H_λ(P_{A,n} ‖ P_{H,n}) ) = lim_{n→∞} (1/n)·log H_λ(P_{A,n} ‖ P_{H,n}), (67)

for λR\[0,1] (provided that lim infnHλ(PA,nPH,n)>1).
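Since the carrying-over procedure is purely arithmetical, it can be summarized by a few conversion helpers; the following sketch (our own illustration, with the total variation sandwich taken in its standard form 2·(1 − H_{1/2}) ≤ V ≤ 2·(1 − H_{1/2}^2)^{1/2}) turns a given exact value or bound of the Hellinger integral into the corresponding power divergence, Rényi divergence and total variation bounds.

import math

def power_divergence(H, lam):
    # I_lambda = (1 - H_lambda) / (lambda * (1 - lambda))
    return (1.0 - H) / (lam * (1.0 - lam))

def renyi_divergence(H, lam):
    # R_lambda = log(H_lambda) / (lambda * (lambda - 1)), with log 0 := -infinity
    return math.inf if H == 0.0 else math.log(H) / (lam * (lam - 1.0))

def total_variation_bounds(H_half):
    # sandwich for V in terms of the order-1/2 Hellinger integral (Bhattacharyya coefficient)
    return 2.0 * (1.0 - H_half), 2.0 * math.sqrt(1.0 - H_half**2)

H_half = 0.8   # an exact value or bound of H_{1/2}(P_{A,n} || P_{H,n}), used for illustration
print(power_divergence(H_half, 0.5), renyi_divergence(H_half, 0.5), total_variation_bounds(H_half))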

4.2. Detailed Analyses of the Exact Recursive Values of Iλ(··), i.e., for the Cases βA,βH,αA,αH,λ(PNIPSP,1)×(R\{0,1})

Corollary 2.

For all βA,βH,αA,αH,λPNI×]0,1[ and all initial population sizes X0N there holds with qλE:=βAλβH1λ

(a)Iλ(PA,1PH,1)=1λ(1λ)·1expβAλβH1λβλ·X0>0,(b)thesequenceIλ(PA,nPH,n)nNgivenbyIλ(PA,nPH,n)=1λ(1λ)·1expan(qλE)·X0=:Vλ,X0,nIisstrictlyincreasing,(c)limnIλ(PA,nPH,n)=1λ(1λ)·1expx0(qλE)·X0]0,1λ(1λ)[,(d)limn1nlog1λ(1λ)Iλ(PA,nPH,n)=limn1nlogHλ(PA,nPH,n)=0,(e)themapX0Vλ,X0,nIisstrictlyincreasing.

Corollary 3.

For all βA,βH,αA,αH,λPNI×(R\[0,1]) and all initial population sizes X0N there holds with qλE:=βAλβH1λ

(a)Iλ(PA,1PH,1)=1λ(λ1)·expβAλβH1λβλ·X01>0,(b)thesequenceIλ(PA,nPH,n)nNgivenbyIλ(PA,nPH,n)=1λ(λ1)·expan(qλE)·X01=:Vλ,X0,nIisstrictlyincreasing,(c)limnIλ(PA,nPH,n)=1λ(λ1)·expx0(qλE)·X01>0,ifλ[λ,λ+]\[0,1],,ifλ],λ[]λ+,[,(d)limn1nlogIλ(PA,nPH,n)=0,ifλ[λ,λ+]\[0,1],,ifλ],λ[]λ+,[,(e)themapX0Vλ,X0,nIisstrictlyincreasing.

Corollary 4.

For all βA,βH,αA,αH,λPSP,1×]0,1[ and all initial population sizes X0N there holds with qλE:=βAλβH1λ

(a)Iλ(PA,1PH,1)=1λ(1λ)·1expβAλβH1λβλ·X0+αAβA>0,(b)thesequenceIλ(PA,nPH,n)nNgivenbyIλ(PA,nPH,n)=1λ(1λ)·1expan(qλE)·X0+αAβAk=1nak(qλE)=:Vλ,X0,nIisstrictlyincreasing,(c)limnIλ(PA,nPH,n)=1λ(1λ),(d)limn1nlog1λ(1λ)Iλ(PA,nPH,n)=αAβA·x0(qλE)<0,(e)themapX0Vλ,X0,nIisstrictlyincreasing.

Corollary 5.

For all βA,βH,αA,αH,λPSP,1×(R\[0,1]) and all initial population sizes X0N there holds with qλE:=βAλβH1λ

(a)Iλ(PA,1PH,1)=1λ(λ1)·expβAλβH1λβλ·X0+αAβA1>0,(b)thesequenceIλ(PA,nPH,n)nNgivenbyIλ(PA,nPH,n)=1λ(λ1)·expan(qλE)·X0+αAβAk=1nak(qλE)1=:Vλ,X0,nIisstrictlyincreasing,(c)limnIλ(PA,nPH,n)=,(d)limn1nlogIλ(PA,nPH,n)=αAβA·x0(qλE)>0,ifλ[λ,λ+]\[0,1],,ifλ],λ[]λ+,[,(e)themapX0Vλ,X0,nIisstrictlyincreasing.

In the assertions (a), (b), (d) of the Corollaries 4 and 5 the fraction αA/βA can be equivalently replaced by αH/βH.

Let us now derive the corresponding detailed results for the bounds of the power divergences for the parameter cases P_{SP}∖P_{SP,1}, where the Hellinger integral, and thus I_λ(P_{A,n} ‖ P_{H,n}), cannot be determined exactly. The extensive discussion of the Hellinger-integral bounds in the Section 3.4, Section 3.5, Section 3.6, Section 3.7, Section 3.8, Section 3.9, Section 3.10, Section 3.11, Section 3.12 and Section 3.13, as well as in the Section 3.16, Section 3.17, Section 3.18, Section 3.19, Section 3.20, Section 3.21, Section 3.22, Section 3.23 and Section 3.24, can be carried over directly to obtain power-divergence bounds. In the following, we summarize the outcoming key results, deferring a detailed discussion of the possible choices of p_λ^A = p^A(β_A,β_H,α_A,α_H,λ) and q_λ^A = q^A(β_A,β_H,α_A,α_H,λ) (A ∈ {L,U}) to the corresponding above-mentioned subsections.

4.3. Lower Bounds of Iλ(··) for the Cases βA,βH,αA,αH,λ(PSP\PSP,1)×]0,1[

Corollary 6.

For all βA,βH,αA,αH,λ(PSP,2PSP,3aPSP,3b)×]0,1[ there exist parameters pλU,qλU which satisfy pλUαAλαH1λ,αλ and qλU[βAλβH1λ,βλ[ as well as (35) for all xN0, and for all such pairs (pλU,qλU) and all initial population sizes X0N there holds

(a)Bλ,X0,1I,L=1λ(1λ)·1expqλUβλ·X0+pλUαλ>0,(b)thesequenceBλ,X0,nI,LnNoflowerboundsforIλ(PA,nPH,n)givenbyBλ,X0,nI,L=1λ(1λ)·1expan(qλU)·X0+k=1nbk(pλU,qλU)isstrictlyincreasing,(c)limnBλ,X0,nI,L=limnIλ(PA,nPH,n)=1λ(1λ),(d)limn1nlog1λ(1λ)Bλ,X0,nI,L=pλU·ex0(qλU)αλ<0,(e)themapX0Bλ,X0,nI,Lisstrictlyincreasing.

Remark 4.

  • (a) 

    Notice that in the case βA,βH,αA,αH,λPSP,2×]0,1[–where αAλαH1λ=αλ=αA=αH=α–we get the special choice pλU=α and qλU=(α+βA)λ(α+βH)1λα (cf. Section 3.7). For the constellations βA,βH,αA,αH,λ(PSP,3aPSP,3b)×]0,1[ there exist parameters pλU[αAλαH1λ,αλ[, qλU[βAλβH1λ,βλ[ which satisfy (35) for all xN0.

  • (b) 

    For the parameter setups βA,βH,αA,αH,λ(PSP,2PSP,3aPSP,3b)×]0,1[ there might exist parameter pairs (pλU,qλU) satisfying (35) and either pλU=αλ or qλU=βλ, for which all assertions of Corollary 6 still hold true.

  • (c) 

    Following the discussion in Section 3.10 for all βA,βH,αA,αH,λPSP,3c×]0,1[ at least part (c) still holds true.

Corollary 7.

For all βA,βH,αA,αH,λPSP,4a×]0,1[ there exist parameters pλU<αλ, 1>qλU>βλ=β such that (35) is satisfied for all x[0,[ and such that for all initial population sizes X0N at least the parts (c) and (d) of Corollary 6 hold true.

As in Section 3.12, for the parameter setup βA,βH,αA,αH,λPSP,4b×]0,1[ we cannot derive a lower bound for the power divergences which improves the generally valid lower bound Iλ(PA,nPH,n)0 (cf. (10)) by employing our proposed (pλU,qλU)-method.

4.4. Upper Bounds of Iλ(··) for the Cases βA,βH,αA,αH,λ(PSP\PSP,1)×]0,1[

Since in this setup the upper bounds of the power divergences can be derived from the lower bounds of the Hellinger integrals, we here appropriately adapt the results of Proposition 6.

Corollary 8.

For all βA,βH,αA,αH,λ(PSP\PSP,1)×]0,1[ and all initial population sizes X0N there holds with pλL:=αAλαH1λ and qλL:=βAλβH1λ

(a)Bλ,X0,1I,U=1λ(1λ)·1expβAλβH1λβλ·X0+αAλαH1λαλ>0,(b)thesequenceofupperboundsBλ,X0,nI,UnNforIλ(PA,nPH,n)givenbyBλ,X0,nI,U=1λ(1λ)·1expan(qλL)·X0+pλLqλLk=1nak(qλL)+n·pλLqλL·βλαλisstrictlyincreasing,(c)limnBλ,X0,nI,U=1λ(1λ),(d)limn1nlog1λ(1λ)Bλ,X0,nI,U=pλLqλL·x0(qλL)+βλαλ=pλL·ex0(qλL)αλ<0,(e)themapX0Bλ,X0,nI,Uisstrictlyincreasing.

4.5. Lower Bounds of Iλ(··) for the Cases (βA,βH,αA,αH,λ)(PSP\PSP,1)×(R\[0,1])

In order to derive detailed results on lower bounds of the power divergences in the case λR\[0,1], we have to subsume and adapt the Hellinger-integral concerning lower-bounds investigations from the Section 3.16, Section 3.17, Section 3.18, Section 3.19, Section 3.20, Section 3.21, Section 3.22 and Section 3.23. Recall the λ-sets ISP,2,ISP,3a,ISP,3b (cf. (58), (60), (62)). For the constellations PSP,2×ISP,2 we employ the special choice pλL=αAλαH1λ=αλ=αA=αH=α together with qλL=(α+βA)λ(α+βH)1λα>max{0,βλ} (cf. (58)) which satisfy (35) for all xN0 and (56), whereas for the constellations (PSP,3a×ISP,3a) (PSP,3b×ISP,3b) we have proved the existence of parameters pλL,qλL satisfying both (35) for all xN0 and (56) with two strict inequalities. Subsuming this, we obtain

Corollary 9.

For all βA,βH,αA,αH,λ(PSP,2×ISP,2)(PSP,3a×ISP,3a)(PSP,3b×ISP,3b) there exist parameters pλL,qλL which satisfy max{0,αλ}pλLαAλαH1λ,max{0,βλ}<qλLβAλβH1λ as well as (35) for all xN0, and for all such pairs (pλL,qλL) and all initial population sizes X0N one gets

(a)Bλ,X0,1I,L=1λ(λ1)·expqλLβλ·X0+pλLαλ1>0,(b)thesequenceBλ,X0,nI,LnNoflowerboundsforIλ(PA,nPH,n)givenbyBλ,X0,nI,L=1λ(λ1)·expan(qλL)·X0+k=1nbk(pλL,qλL)1isstrictlyincreasing,(c)limnBλ,X0,nI,L=limnIλ(PA,nPH,n)=,(d)limn1nlogBλ,X0,nI,L=pλL·expx0(qλL)αλ>0,ifqλLmin1;eβλ1,,ifqλL>min1;eβλ1,(e)themapX0Bλ,X0,nI,Lisstrictlyincreasing.

Analogously to the discussions in the Section 3.17, Section 3.18, Section 3.19 and Section 3.20, for the parameter setups PSP,2×R\ISP,2[0,1] PSP,3a×R\ISP,3a[0,1]PSP,3b×R\ISP,3b[0,1]PSP,3c×R\[0,1] and for all initial population sizes X0N one can still show

0<Iλ(PA,nPH,n),andlimnIλ(PA,nPH,n)=.

For the penultimate case we obtain

Corollary 10.

For all βA,βH,αA,αH,λPSP,4a×(R\[0,1]) there exist parameters pλL>αλ (where not necessarily pλL0) and 0<qλL<βλ=β<min{1,eβ1}=eβ1 such that (35) is satisfied for all x[0,[ and such that for all initial population sizes X0N at least the parts (c) and (d) of Corollary 9 hold true.

Notice that for the last case (β_A,β_H,α_A,α_H,λ) ∈ P_{SP,4b} × (ℝ∖[0,1]) (where β_A = β_H ≥ 1) we cannot derive lower bounds of the power divergences which improve the generally valid lower bound I_λ(P_{A,n} ‖ P_{H,n}) ≥ 0 (cf. (11)) by employing our proposed (p_λ^U,q_λ^U)-method.

4.6. Upper Bounds of Iλ(··) for the Cases βA,βH,αA,αH,λ(PSP\PSP,1)×(R\[0,1])

For these constellations we adapt Proposition 14, which after the corresponding modification becomes

Corollary 11.

For all βA,βH,αA,αH,λ(PSP\PSP,1)×(R\[0,1]) and all initial population sizes X0N there holds with pλU:=αAλαH1λ and qλU:=βAλβH1λ

(a)Bλ,X0,1I,U=1λ(λ1)·expβAλβH1λβλ·X0+αAλαH1λαλ1>0,(b)thesequenceBλ,X0,nI,UnNofupperboundsforIλ(PA,nPH,n)givenbyBλ,X0,nI,U=1λ(λ1)·expan(qλU)·X0+k=1nbk(pλU,qλU)1isstrictlyincreasing,(c)limnBλ,X0,nI,U=,(d)limn1nlogBλ,X0,nI,U=pλU·expx0(qλU)αλ>0,ifλ[λ,λ+]\[0,1],,ifλ],λ[]λ+,[,(e)themapX0Bλ,X0,nI,Uisstrictlyincreasing.

4.7. Applications to Bayesian Decision Making

As explained in Section 2.5, the power divergences fulfill

IλPA,nPH,n=01ΔBRLO˜pAprior·1pApriorλ2·pAprior1λdpAprior,λR,(cf.(21)),

and

IλPA,nPH,n=limχpApriorΔBRLOλ,χpAprior,λ]0,1[,(cf.(22)),

and thus can be interpreted as (i) weighted-average decision risk reduction (weighted-average statistical information measure) about the degree of evidence deg concerning the parameter θ that can be attained by observing the GWI-path Xn until stage n, and as (ii) limit decision risk reduction (limit statistical information measure). Hence, by combining (21) and (22) with the investigations in the previous Section 4.1, Section 4.2, Section 4.3, Section 4.4, Section 4.5 and Section 4.6, we obtain exact recursive values respectively recursive bounds of the above-mentioned decision risk reductions. For the sake of brevity, we omit the details here.

5. Kullback-Leibler Information Divergence (Relative Entropy)

5.1. Exact Values Respectively Upper Bounds of I(·||·)

From (2), (3) and (6) in Section 2.4, one can immediately see that the Kullback-Leibler information divergence (relative entropy) between two competing Galton-Watson processes without/with immigration can be obtained by the limit

I(P_{A,n} ‖ P_{H,n}) = lim_{λ→1} I_λ(P_{A,n} ‖ P_{H,n}), (68)

and the reverse Kullback-Leibler information divergence (reverse relative entropy) by I(P_{H,n} ‖ P_{A,n}) = lim_{λ→0} I_λ(P_{A,n} ‖ P_{H,n}). Hence, in the following we concentrate only on (68); the reverse case works analogously. Accordingly, we can use (68) in appropriate combination with the “λ ∈ ]0,1[ parts” of the previous Section 4 (respectively, the corresponding parts of Section 3) in order to obtain detailed analyses for I(P_{A,n} ‖ P_{H,n}). Let us start with the following assertions on exact values respectively upper bounds, which will be proved in Appendix A.2:

Theorem 3.

  • (a) 
    For all βA,βH,αA,αH(PNIPSP,1), all initial population sizes X0N and all observation horizons nN the Kullback-Leibler information divergence (relative entropy) is given by
    I(P_{A,n} ‖ P_{H,n}) = I_{X_0,n} := ( β_A·(log(β_A/β_H) − 1) + β_H )/(1−β_A) · ( X_0 − α_A/(1−β_A) ) · (1 − β_A^n) + α_A·( β_A·(log(β_A/β_H) − 1) + β_H )/( β_A·(1−β_A) ) · n,   if β_A ≠ 1;      ( β_H − log β_H − 1 )·( (α_A/2)·n^2 + ( X_0 + α_A/2 )·n ),   if β_A = 1. (69)
  • (b) 
    For all βA,βH,αA,αHPSP\PSP,1, all initial population sizes X0N and all observation horizons nN there holds I(PA,nPH,n)EX0,nU, where
    E_{X_0,n}^U := ( β_A·(log(β_A/β_H) − 1) + β_H )/(1−β_A) · ( X_0 − α_A/(1−β_A) ) · (1 − β_A^n) + [ α_A·( β_A·(log(β_A/β_H) − 1) + β_H )/( β_A·(1−β_A) ) + α_A·( log( α_A·β_H/(α_H·β_A) ) − β_H/β_A ) + α_H ] · n,   if β_A ≠ 1;      ( β_H − log β_H − 1 )·( (α_A/2)·n^2 + ( X_0 + α_A/2 )·n ) + ( α_A·( log( α_A·β_H/α_H ) − β_H ) + α_H )·n,   if β_A = 1. (70)

Remark 5.

(i) Notice that the exact values respectively upper bounds are in closed form (rather than in recursive form).

(ii) The nbehaviour of (the bounds of) the Kullback-Leibler information divergence/relative entropy I(PA,nPH,n) in Theorem 3 is influenced by the following facts:

  • (a) 

    βA·logβAβH1+βH0 with equality iff βA=βH.

  • (b) 

    In the case βA1 of (70), there holds αA·βA·logβAβH1+βHβA(1βA)+αAlogαAβHαHβAβHβA+αH0, with equality iff αA=αH and βA=βH.
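The closed-form value (69) can be cross-checked without any recursion: by the standard chain-rule decomposition of the relative entropy of Markov path laws, I(P_{A,n} ‖ P_{H,n}) equals the sum over k of E_A[ KL( Poisson(β_A·X_{k−1}+α_A) ‖ Poisson(β_H·X_{k−1}+α_H) ) ], which on P_{NI} ∪ P_{SP,1} can be evaluated exactly because the two conditional means then have the constant ratio β_A/β_H. The following sketch (our own illustration) implements this; it is not the authors' proof, merely a numerical consistency check.

import math

def kl_gwi_equal_fraction(bA, bH, aA, aH, X0, n):
    # exact I(P_{A,n}||P_{H,n}) when alpha_A/beta_A = alpha_H/beta_H (or alpha_A = alpha_H = 0);
    # uses KL(Poisson(a)||Poisson(b)) = a*log(a/b) - a + b and the closed-form mean evolution under (A)
    total, mean_A = 0.0, float(X0)
    log_ratio = math.log(bA / bH)
    for _ in range(n):
        # expected stepwise KL, using (beta_A*x+alpha_A)/(beta_H*x+alpha_H) = beta_A/beta_H:
        total += (bA * mean_A + aA) * log_ratio - (bA - bH) * mean_A - (aA - aH)
        mean_A = bA * mean_A + aA        # one-step mean evolution under (A)
    return total

print(kl_gwi_equal_fraction(1.2, 0.9, 4.0, 3.0, X0=5, n=10))   # alpha/beta = 10/3 under both laws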

5.2. Lower Bounds of I(·||·) for the Cases βA,βH,αA,αH(PSP\PSP,1)

Again by using (68) in appropriate combination with the “λ ∈ ]0,1[ parts” of the previous Section 4 (respectively, the corresponding parts of Section 3), we obtain the following (semi-)closed-form lower bounds of I(P_{A,n} ‖ P_{H,n}):

Theorem 4.

For all βA,βH,αA,αHPSP\PSP,1, all initial population sizes X0N and all observation horizons nN

I(P_{A,n} ‖ P_{H,n}) ≥ E_{X_0,n}^L := sup_{k∈ℕ_0, y∈[0,∞[} max{ E_{y,X_0,n}^{L,tan}, E_{k,X_0,n}^{L,sec}, E_{X_0,n}^{L,hor} } ∈ [0,∞[, (71)

where for all y[0,[ we define the – possibly negatively valued– finite bound component

Ey,X0,nL,tan:=βAlogαA+βAyαH+βHy+βH1αA+βAyαH+βHy·1βAn1βA·X0αA1βA+[αAβA(1βA)βAlogαA+βAyαH+βHy+βH1αA+βAyαH+βHy+αHαAβHβA1αA+βAyαH+βHy]·n,ifβA1,logαA+yαH+βHy+βH1αA+yαH+βHy·αA2·n2+X0+αA2·n+αHαAβH1αA+yαH+βHy·n,ifβA=1, (72)

and for all kN0 the – possibly negatively valued– finite bound component

Ek,X0,nL,sec:=fA(k+1)logfA(k+1)fH(k+1)fA(k)logfA(k)fH(k)+βHβA·1βAn1βA·X0αA1βA+[αAβA(1βA)fA(k+1)logfA(k+1)fH(k+1)fA(k)logfA(k)fH(k)+βHβAfA(k+1)logfA(k+1)fH(k+1)fA(k)logfA(k)fH(k)·k+αAβA+fA(k)logfA(k)fH(k)αAβHβA+αH]·n,ifβA1,fA(k+1)logfA(k+1)fH(k+1)fA(k)logfA(k)fH(k)+βH1·αA2·n2+X0+αA2·n[fA(k+1)logfA(k+1)fH(k+1)fA(k)logfA(k)fH(k)k+αAfA(k)logfA(k)fH(k)+αAβHαH]·n,ifβA=1. (73)

Furthermore, on PSP,4 we set EX0,nL,hor:=0 for all nN whereas on PSP\(PSP,1PSP,4) we define

EX0,nL,hor:=αA+βAz*·logαA+βAz*αH+βHz*1+αH+βHz*·n,,nN, (74)

with z*:=argmaxxN0(αA+βAx)logαA+βAxαH+βHx+1(αH+βHx).

On PSP\(PSP,1PSP,3c) one even gets EX0,nL>0 for all X0N and all nN.

For the subcase PSP,3c, one obtains for each fixed nN and each fixed X0N the strict positivity EX0,nL>0 if yEy,nL,tan(y*)0, where y*:=αAαHβHβAN and hence

yEy,X0,nL,tan(y*)=(βAβH)3αAβHαHβA·1βAn1βA·X0αA1βA(βAβH)2βA1+αA(βAβH)(1βA)(αAβHαHβA)·n,ifβA1,(1βH)3αAβHαH·αA2·n2+X0+αA2·n(1βH)2·n,ifβA=1. (75)

A proof of this theorem is given in Appendix A.2.

Remark 6.

Consider the exemplary parameter setup (β_A,β_H,α_A,α_H) = (1/3, 2/3, 2, 1) ∈ P_{SP,3c}; within our running-example epidemiological context of Section 2.3, this corresponds to a “semi-mild” infectious-disease-transmission situation (H) (with subcritical reproduction number β_H = 2/3 and importation mean α_H = 1), whereas (A) describes a “mild” situation (with “low” subcritical β_A = 1/3 and α_A = 2). In the case X_0 = 3 there holds ∂/∂y E_{y,X_0,n}^{L,tan}|_{y=y*} = 0 for all n ∈ ℕ, whereas for X_0 ≠ 3 one obtains ∂/∂y E_{y,X_0,n}^{L,tan}|_{y=y*} ≠ 0 for all n ∈ ℕ.

It seems that the optimization problem in (71) admits in general only an implicitly representable solution, and thus we have used the prefix “(semi-)” above. Of course, as a less tight but less involved explicit lower bound of the Kullback-Leibler information divergence (relative entropy) I(PA,n||PH,n) one can use any term of the form maxEy,X0,nL,tan,Ek,X0,nL,sec,EX0,nL,hor (y[0,[, kN0), as well as the following

Corollary 12.

(a) For all βA,βH,αA,αHPSP\PSP,1, all initial population sizes X0N and all observation horizons nN

I(PA,nPH,n)EX0,nLE˜X0,nL:=maxE,X0,nL,tan,E0,X0,nL,sec,EX0,nL,hor[0,[,

with EX0,nL,hor defined by (74), with – possibly negatively valued– finite bound component E,X0,nL,tan:=limyEy,X0,nL,tan, where

E,X0,nL,tan:=βA·logβAβH1+βH1βA·X0αA1βA·1βAn+αA·βA·logβAβH1+βHβA(1βA)+αA1βHβA+αH1βAβH·n,ifβA1,βHlogβH1·αA2·n2+X0+αA2·n+αA1βH+αH11βH·n,ifβA=1,

and –possibly negatively valued–finite bound component

E0,X0,nL,sec=αA+βA·logαA+βAαH+βHαA·logαAαH+βHβA·1βAn1βA·X0αA1βA+{αAβA(1βA)αA+βA·logαA+βAαH+βHαA·logαAαHαA1βA1βHαA1+αAβA·logαH(αA+βA)αA(αH+βH)+αH}·n,ifβA1,αA+1·logαA+1αH+βHαA·logαAαH+βH1·n·X0+αA2·n2+{αA2αA+1·logαA+1αH+βHαA·logαAαHβH1αA1+αA·logαH(αA+1)αA(αH+βH)+αH}·n,ifβA=1.

For the cases PSP,2PSP,3aPSP,3b one gets even E˜X0,nL>0 for all X0N and all nN.

5.3. Applications to Bayesian Decision Making

As explained in Section 2.5, the Kullback-Leibler information divergence fulfills

IPA,nPH,n=01ΔBRLO˜pAprior·1pAprior1·pAprior2dpAprior,(cf.(21)withλ=1),

and thus can be interpreted as weighted-average decision risk reduction (weighted-average statistical information measure) about the degree of evidence deg concerning the parameter θ that can be attained by observing the GWI-path Xn until stage n. Hence, by combining (21) with the investigations in the previous Section 5.1 and Section 5.2, we obtain exact values respectively bounds of the above-mentioned decision risk reductions. For the sake of brevity, we omit the details here.

6. Explicit Closed-Form Bounds of Hellinger Integrals

6.1. Principal Approach

Depending on the parameter constellation (β_A,β_H,α_A,α_H,λ) ∈ P × (ℝ∖{0,1}), for the Hellinger integrals H_λ(P_{A,n} ‖ P_{H,n}) we have derived in Section 3 corresponding lower/upper bounds respectively exact values – of recursive nature – which can be obtained by choosing appropriate p = p_λ^A = p^A(β_A,β_H,α_A,α_H,λ), q = q_λ^A = q^A(β_A,β_H,α_A,α_H,λ) (A ∈ {E,L,U}) and by using those together with the recursion (a_n(q))_{n∈ℕ} defined by (36) as well as the sequence (b_n(p,q))_{n∈ℕ} obtained from (a_n(q))_{n∈ℕ} by the linear transformation (38). Both sequences are “stepwise fully evaluable” but generally seem not to admit a closed-form representation in the observation horizon n; consequently, the time-evolution n ↦ H_λ(P_{A,n} ‖ P_{H,n}) – respectively the time-evolution of the corresponding recursive bounds – can generally not be seen explicitly. In order to avoid this intransparency (at the expense of losing some precision), one can approximate (36) by a recursion that allows for a closed-form representation; this will also turn out to be useful for investigations concerning diffusion limits (cf. the next Section 7).

To explain the basic underlying principle, let us first assume some general q ∈ ]0, β_λ[ and λ ∈ ]0,1[. With Properties 1 (P1) we see that the sequence (a_n(q))_{n∈ℕ} is strictly negative, strictly decreasing and converges to x_0(q) ∈ ]−β_λ, q−β_λ[. Recall that this sequence is obtained by the recursive application of the function ξ_λ^{(q)}(x) := q·e^x − β_λ, through a_1(q) = ξ_λ^{(q)}(0) = q − β_λ < 0, a_n(q) = ξ_λ^{(q)}(a_{n−1}(q)) = q·e^{a_{n−1}(q)} − β_λ (cf. (36)). As a first step, we want to approximate ξ_λ^{(q)}(·) by a linear function on the interval [x_0(q), 0]. Due to the convexity (P9), this is done by using the tangent line of ξ_λ^{(q)}(·) at x_0(q)

ξ_λ^{(q),T}(x) := c^{(q),T} + d^{(q),T}·x := x_0(q)·(1 − q·e^{x_0(q)}) + q·e^{x_0(q)}·x, (76)

as a linear lower bound, and the secant line of ξλ(q)(·) across its arguments 0 and x0(q)

ξ_λ^{(q),S}(x) := c^{(q),S} + d^{(q),S}·x := q − β_λ + [ (x_0(q) − (q−β_λ)) / x_0(q) ]·x, (77)

as a linear upper bound. With the help of these functions, we can define the linear recursions

a_0^{(q),T} := 0,   a_n^{(q),T} := ξ_λ^{(q),T}(a_{n−1}^{(q),T}),   n ∈ ℕ, (78)
as well as   a_0^{(q),S} := 0,   a_n^{(q),S} := ξ_λ^{(q),S}(a_{n−1}^{(q),S}),   n ∈ ℕ. (79)

In the following, we will refer to these sequences as the rudimentary closed-form sequence-bounds.

Clearly, both sequences are strictly negative (on N), strictly decreasing, and one gets the sandwiching

a_n^{(q),T} < a_n(q) ≤ a_n^{(q),S} (80)

for all nN, with equality on the right side iff n=1 (where a1(q)=qβλ<0); moreover,

lim_{n→∞} a_n^{(q),T} = lim_{n→∞} a_n^{(q),S} = lim_{n→∞} a_n(q) = x_0(q). (81)

Furthermore, such linear recursions allow for a closed-form representation, namely

a_n^{(q),*} = ( c^{(q),*} / (1 − d^{(q),*}) )·( 1 − (d^{(q),*})^n ) = x_0(q)·( 1 − (d^{(q),*})^n ), (82)

where the “ * ” stands for either S or T. Notice that this representation is valid due to d^{(q),T}, d^{(q),S} ∈ ]0,1[. So far, we have considered the case q ∈ ]0, β_λ[. If q = β_λ, then one can see from Properties 1 (P2) that a_n(q) ≡ 0, which is also an explicitly given (though trivial) sequence. For the remaining case, where q > β_λ (and thus ξ_λ^{(q)}(0) = a_1(q) = q − β_λ > 0), we want to exclude q ≥ min{1, e^{β_λ−1}} for the following reasons. Firstly, if q > min{1, e^{β_λ−1}}, then from (P3) we see that the sequence (a_n(q))_{n∈ℕ} is strictly increasing and divergent to ∞, at a rate faster than exponentially (P3b); but a linear recursion is too weak to approximate such a growth pattern. Secondly, if q = min{1, e^{β_λ−1}}, then one necessarily gets q = e^{β_λ−1} < 1 (since we have required q > β_λ, and otherwise one obtains the contradiction β_λ < q = 1 ≤ e^{β_λ−1}). This means that the function ξ_λ^{(q)}(·) now touches the straight line id(·) in the point −log(q), i.e., ξ_λ^{(q)}(−log(q)) = −log(q). Our above-proposed method, namely to use the tangent line of ξ_λ^{(q)}(·) at x = x_0(q) = −log(q) as a linear lower bound for ξ_λ^{(q)}(·), then leads to the recursion a_n^{(q),T} ≡ 0 (cf. (78)). This is due to the fact that the tangent line ξ_λ^{(q),T}(·) is in the current case identical to the straight line id(·). Consequently, (81) would not be satisfied.

Notice that in the case β_λ < q < min{1, e^{β_λ−1}}, the above-introduced functions ξ_λ^{(q),T}(·), ξ_λ^{(q),S}(·) constitute again linear lower and upper bounds for ξ_λ^{(q)}(·), however, this time on the interval [0, x_0(q)]. The sequences defined in (78) and (79) still fulfill the assertions (80) and (81), and additionally allow for the closed-form representation (82). Furthermore, let us mention that these rudimentary closed-form sequence-bounds can be defined analogously for λ ∈ ℝ∖[0,1] and either 0 < q < β_λ, or q = β_λ, or max{0, β_λ} < q < min{1, e^{β_λ−1}}.
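The sandwich (80) together with the closed form (82) is easy to verify numerically; the following sketch (our own illustration) does so for a q ∈ ]0, β_λ[ taken from the P_{SP,1} setting of Example 1, computing x_0(q) by iterating the recursion itself.

import math

def x0(q, beta_lam, n_iter=200000, tol=1e-14):
    # x_0(q) as the limit of the recursion a_n(q) = q*exp(a_{n-1}(q)) - beta_lambda, a_0(q) = 0
    a = 0.0
    for _ in range(n_iter):
        a_new = q * math.exp(a) - beta_lam
        if abs(a_new - a) < tol:
            break
        a = a_new
    return a_new

bA, bH, lam = 1.2, 0.9, 0.5
beta_lam = lam * bA + (1 - lam) * bH
q = bA**lam * bH**(1 - lam)                  # here q lies in ]0, beta_lambda[
x_star = x0(q, beta_lam)
d_T = q * math.exp(x_star)                   # slope of the tangent line (76)
d_S = (x_star - (q - beta_lam)) / x_star     # slope of the secant line (77)

a = 0.0
for n in range(1, 11):
    a = q * math.exp(a) - beta_lam           # exact a_n(q)
    a_T = x_star * (1.0 - d_T**n)            # closed-form tangent recursion, cf. (82)
    a_S = x_star * (1.0 - d_S**n)            # closed-form secant recursion, cf. (82)
    assert a_T < a <= a_S + 1e-12            # sandwich (80)
    print(f"n={n:2d}:  {a_T:+.6f} < {a:+.6f} <= {a_S:+.6f}")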

In a second step, we want to improve the above-mentioned linear (lower and upper) approximations of the sequence an(q) by reducing the faced error within each iteration. To do so, in both cases of lower and upper approximates we shall employ context-adapted linear inhomogeneous difference equations of the form

a˜0:=0;a˜n:=ξ˜a˜n1+ρn1,nN, (83)

with

ξ˜(x):=c+d·x,xR, (84)
ρn1:=K1·ϰn1+K2·νn1,nN, (85)

for some constants cR, d]0,1[, K1,K2,ϰ,νR with 0ν<ϰd. This will be applied to c:=c(q),S, c:=c(q),T, d:=d(q),S and d:=d(q),T later on. Meanwhile, let us first present some facts and expressions which are insightful for further formulations and analyses.

Lemma 2.

Consider the sequence a˜nnN0 defined in (83) to (85). If 0ν<ϰ<d, then one gets the closed-form representation

a˜n=a˜nhom+c˜nwitha˜nhom=c·1dn1dandc˜n=K1·dnϰndϰ+K2·dnνndν, (86)

which leads for all nN to

k=1na˜k=K1dϰ+K2dνc1d·d·1dn1dK1·ϰ·1ϰn(dϰ)(1ϰ)K2·ν·1νn(dν)(1ν)+c1d·n. (87)

If 0ν<ϰ=d, then one gets the closed-form representation

a˜n=a˜nhom+c˜nwitha˜nhom=c·1dn1dandc˜n=K1·n·dn1+K2·dnνndν, (88)

which leads for all nN to

k=1na˜k=K1d(1d)+K2dνc1d·d·1dn1dK2·ν·1νn(dν)(1ν)+c1dK1·dn1d·n. (89)

Lemma 2 will be proved in Appendix A.3. Notice that (88) is consistent with taking the limit ϰd in (86). Furthermore, for the special case K2=K1>0 one has from (85) for all integers n2 the relation ρn1<0 and thus a˜na˜nhom<0, leading to

c˜n<0andk=1nc˜n<0. (90)

Lemma 2 gives explicit expressions for a linear inhomogeneous recursion of the form (83) possessing the extra term given by (85). Therefrom we derive lower and upper bounds for the sequence an(q)nN by employing an(q),T resp. an(q),S as the homogeneous solution of (83), i.e., by setting a˜nhom:=an(q),T resp. a˜nhom:=an(q),S. Moreover, our concrete approximation-error-reducing “correction terms” ρn will have different form, depending on whether 0<q<βλ or q>max{0,βλ}. In both cases, we express ρn by means of the slopes d(q),T=qex0(q) resp. d(q),S=x0(q)(qβλ)x0(q) of the tangent line ξλ(q),T(·) (cf. (76)) resp. the secant line ξλ(q),S(·) (cf. (77)), as well as in terms of the parameters

Γ<(q):=12·x0(q)2·q·ex0(q),for0<q<βλ,andΓ>(q):=q2·x0(q)2,forq>max{0,βλ}. (91)

In detail, let us first define the lower approximate by

a_0(q):=0,a_n(q):=ξλ(q),Ta_n1(q)+ρ_n1(q),nN, (92)

where

ρ_n1(q):=Γ<(q)·d(q),T2(n1),if0<q<βλ,Γ>(q)·d(q),S2(n1),ifmax{0,βλ}<q<min{1,eβλ1}. (93)

The upper approximate is defined by

a¯0(q):=0,a¯n(q):=ξλ(q),Sa¯n1(q)+ρ¯n1(q),nN, (94)

where

ρ¯n1(q):=Γ<(q)·d(q),Tn1·1d(q),Sn1,if0<q<βλ,Γ>(q)·d(q),Sn1·1d(q),Tn1,ifmax{0,βλ}<q<min{1,eβλ1}. (95)

In terms of (85), we use for ρ_n(q) the constants K2=ν=0 as well as K1=Γ<(q),ϰ=d(q),T2 for 0<q<βλ respectively K1=Γ>(q),ϰ=d(q),S2 for max{0,βλ}<q<min{1,eβλ1}. For ρ¯n(q) we shall employ the constants K1=K2=Γ<(q),ϰ=d(q),T,ν=d(q),Sd(q),T for 0<q<βλ, and K1=K2=Γ>(q),ϰ=d(q),S,ν=d(q),Sd(q),T for max{0,βλ}<q<min{1,eβλ1}. Recall from (76) the constants c(q),T:=x0(q)(1qex0(q)), d(q),T:=qex0(q) and from (77) c(q),S:=qβλ, d(q),S:=x0(q)(qβλ)x0(q). In the following, we will refer to the sequences a_n(q) resp. a¯n(q) as the improved closed-form sequence-bounds. Putting all ingredients together, we arrive at the

Lemma 3.

For all $(\beta_A,\beta_H,\alpha_A,\alpha_H) \in \mathcal{P}$ there holds, with $d^{(q),T} = q\,e^{x_0^{(q)}}$ and $d^{(q),S} = \frac{x_0^{(q)} - (q-\beta_\lambda)}{x_0^{(q)}}$:

  • (a) 
    in the case 0<q<βλ:
    • (i) 
$\underline{a}_n^{(q)} < a_n^{(q)} \le \overline{a}_n^{(q)}$ for all $n \in \mathbb{N}$,
      with equality on the right-hand side iff n=1, where
$\underline{a}_n^{(q)} = x_0^{(q)}\cdot\big(1-(d^{(q),T})^n\big) + \Gamma_<^{(q)}\cdot\frac{(d^{(q),T})^{n-1}}{1-d^{(q),T}}\cdot\big(1-(d^{(q),T})^n\big) \,>\, a_n^{(q),T}$, and $\overline{a}_n^{(q)} = x_0^{(q)}\cdot\big(1-(d^{(q),S})^n\big) - \Gamma_<^{(q)}\cdot\Big[\frac{(d^{(q),S})^n-(d^{(q),T})^n}{d^{(q),S}-d^{(q),T}} - (d^{(q),S})^{n-1}\cdot\frac{1-(d^{(q),T})^n}{1-d^{(q),T}}\Big] \,\le\, a_n^{(q),S}$,
      with an(q),T and an(q),S defined by (78) and (79).
    • (ii) 
      Both sequences a_n(q)nN and a¯n(q)nN are strictly decreasing.
    • (iii) 
$\lim_{n\to\infty}\underline{a}_n^{(q)} = \lim_{n\to\infty}\overline{a}_n^{(q)} = \lim_{n\to\infty}a_n^{(q)} = x_0^{(q)} \in\, ]-\beta_\lambda,\, q-\beta_\lambda[$.
  • (b) 
in the case $\max\{0,\beta_\lambda\} < q < \min\{1, e^{\beta_\lambda-1}\}$:
    • (i) 
$\underline{a}_n^{(q)} < a_n^{(q)} \le \overline{a}_n^{(q)}$, for all $n \in \mathbb{N}$,
      with equality on the right-hand side iff n=1, where
$\underline{a}_n^{(q)} = x_0^{(q)}\cdot\big(1-(d^{(q),T})^n\big) + \Gamma_>^{(q)}\cdot\frac{(d^{(q),T})^{n}-(d^{(q),S})^{2n}}{d^{(q),T}-(d^{(q),S})^{2}} \,>\, a_n^{(q),T}$ and $\overline{a}_n^{(q)} = x_0^{(q)}\cdot\big(1-(d^{(q),S})^n\big) - \Gamma_>^{(q)}\cdot(d^{(q),S})^{n-1}\cdot\Big[n - \frac{1-(d^{(q),T})^n}{1-d^{(q),T}}\Big] \,\le\, a_n^{(q),S}$,
      with an(q),T and an(q),S defined by (78) and (79).
    • (ii) 
      Both sequences a_n(q)nN and a¯n(q)nN are strictly increasing.
    • (iii) 
$\lim_{n\to\infty}\underline{a}_n^{(q)} = \lim_{n\to\infty}\overline{a}_n^{(q)} = \lim_{n\to\infty}a_n^{(q)} = x_0^{(q)} \in\, ]q-\beta_\lambda,\, -\log(q)[$.

A detailed proof of Lemma 3 is provided in Appendix A.3. In the following, we employ the above-mentioned investigations in order to derive the desired closed-form bounds of the Hellinger integrals Hλ(PA,nPH,n).
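Before doing so, the following small Python sketch (illustrative only; the values of $q$, $\beta_\lambda$ and the bisection tolerance are our own choices) visualises Lemma 3(a): it computes the exact sequence via $a_n^{(q)} = \xi_\lambda^{(q)}(a_{n-1}^{(q)}) = q\,e^{a_{n-1}^{(q)}} - \beta_\lambda$ together with the improved closed-form sequence-bounds (92) and (94), and checks the sandwich $\underline{a}_n^{(q)} < a_n^{(q)} \le \overline{a}_n^{(q)}$ numerically for a constellation with $0 < q < \beta_\lambda$.

```python
import math

beta_lambda = 0.9          # illustrative value of beta_lambda > 0
q = 0.6                    # illustrative value with 0 < q < beta_lambda

# fixed point x0 of xi(x) = q*exp(x) - beta_lambda on ]-beta_lambda, q - beta_lambda[ (cf. (P1))
lo, hi = -beta_lambda, q - beta_lambda
for _ in range(200):                      # bisection on xi(x) - x (sign change on [lo, hi])
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if q * math.exp(mid) - beta_lambda - mid > 0.0 else (lo, mid)
x0 = 0.5 * (lo + hi)

dT = q * math.exp(x0)                     # tangent slope d^(q),T
dS = (x0 - (q - beta_lambda)) / x0        # secant slope  d^(q),S
cT = x0 * (1.0 - dT)                      # tangent intercept c^(q),T (cf. (76))
cS = q - beta_lambda                      # secant intercept  c^(q),S (cf. (77))
Gamma = 0.5 * x0**2 * q * math.exp(x0)    # Gamma_<^(q) from (91)

a = a_lower = a_upper = 0.0
for n in range(1, 16):
    a = q * math.exp(a) - beta_lambda                                        # exact recursion (36)
    a_lower = cT + dT * a_lower + Gamma * dT**(2 * (n - 1))                  # (92) with (93)
    a_upper = cS + dS * a_upper - Gamma * dT**(n - 1) * (1.0 - dS**(n - 1))  # (94) with (95)
    assert a_lower < a <= a_upper + 1e-12
    print(f"n={n:2d}  lower={a_lower:+.6f}  exact={a:+.6f}  upper={a_upper:+.6f}")
print(f"fixed point x0 = {x0:+.6f}")
```

Both bounds converge to the fixed point $x_0^{(q)}$, as asserted in part (a)(iii).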

6.2. Explicit Closed-Form Bounds for the Cases βA,βH,αA,αH,λ(PNIPSP,1)×(R\{0,1})

Recall that in this setup we have obtained the recursive, non-explicit exact values $V_{\lambda,X_0,n} = H_\lambda(P_{A,n}\|P_{H,n})$ given in (39) of Theorem 1, where we used $q = q_\lambda^E = q^E(\beta_A,\beta_H,\lambda) = \beta_A^{\lambda}\beta_H^{1-\lambda} \in\, ]0,\beta_\lambda[$ in the case $\lambda \in\, ]0,1[$, respectively $q = q_\lambda^E = \beta_A^{\lambda}\beta_H^{1-\lambda} > \max\{0,\beta_\lambda\}$ in the case $\lambda \in \mathbb{R}\backslash[0,1]$. For the latter, Lemma 1 implies that $q_\lambda^E < \min\{1, e^{\beta_\lambda-1}\}$ iff $\lambda \in\, ]\lambda_-,\lambda_+[\,\backslash\,[0,1]$. This–together with (39) from Theorem 1, Lemma 2 and with the quantities $d^{(q),T}$, $d^{(q),S}$, $\Gamma_<^{(q)}$ and $\Gamma_>^{(q)}$ as defined in (76) and (77) resp. (91)–leads to

Theorem 5.

Let $p_\lambda^E := \alpha_A^{\lambda}\alpha_H^{1-\lambda}$ and $q_\lambda^E := \beta_A^{\lambda}\beta_H^{1-\lambda}$. For all $(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda) \in (\mathcal{P}_{NI}\cup\mathcal{P}_{SP,1})\times\big(]\lambda_-,\lambda_+[\,\backslash\{0,1\}\big)$, all initial population sizes $X_0 \in \mathbb{N}$ and for all observation horizons $n \in \mathbb{N}$ the following assertions hold true:

  • (a) 
    the Hellinger integral can be bounded by the closed-form lower and upper bounds
$C_{\lambda,X_0,n}^{(p_\lambda^E,q_\lambda^E),T} \;\le\; C_{\lambda,X_0,n}^{(p_\lambda^E,q_\lambda^E),L} \;\le\; V_{\lambda,X_0,n} = H_\lambda(P_{A,n}\|P_{H,n}) \;\le\; C_{\lambda,X_0,n}^{(p_\lambda^E,q_\lambda^E),U} \;\le\; C_{\lambda,X_0,n}^{(p_\lambda^E,q_\lambda^E),S}$,
  • (b) 
$\lim_{n\to\infty}\frac{1}{n}\log V_{\lambda,X_0,n} = \lim_{n\to\infty}\frac{1}{n}\log C_{\lambda,X_0,n}^{(p_\lambda^E,q_\lambda^E),L} = \lim_{n\to\infty}\frac{1}{n}\log C_{\lambda,X_0,n}^{(p_\lambda^E,q_\lambda^E),U} = \lim_{n\to\infty}\frac{1}{n}\log C_{\lambda,X_0,n}^{(p_\lambda^E,q_\lambda^E),T} = \lim_{n\to\infty}\frac{1}{n}\log C_{\lambda,X_0,n}^{(p_\lambda^E,q_\lambda^E),S} = \frac{\alpha_A}{\beta_A}\cdot x_0^{(q_\lambda^E)}$,

where the involved closed-form lower bounds are defined by

Cλ,X0,n(pλE,qλE),L:=Cλ,X0,n(pλE,qλE),T·expζ_n(qλE)·X0+αAβA·ϑ_n(qλE),withCλ,X0,n(pλE,qλE),T:=expx0(qλE)·X0αAβA·d(qλE),T1d(qλE),T·1d(qλE),Tn+αAβAx0(qλE)·n, (96)

and the closed-form upper bounds are defined by

Cλ,X0,n(pλE,qλE),U:=Cλ,X0,n(pλE,qλE),S·expζ¯n(qλE)·X0αAβA·ϑ¯n(qλE),withCλ,X0,n(pλE,qλE),S:=expx0(qλE)·X0αAβA·d(qλE),S1d(qλE),S·1d(qλE),Sn+αAβAx0(qλE)·n, (97)

where in the case λ]0,1[

ζ_n(qλE):=Γ<(qλE)·d(qλE),Tn11d(qλE),T·1d(qλE),Tn>0, (98)
ϑ_n(qλE):=Γ<(qλE)·1d(qλE),Tn1d(qλE),T2·1d(qλE),T1+d(qλE),Tn1+d(qλE),T>0, (99)
ζ¯n(qλE):=Γ<(qλE)·d(qλE),Snd(qλE),Tnd(qλE),Sd(qλE),Td(qλE),Sn1·1d(qλE),Tn1d(qλE),T>0, (100)
ϑ¯n(qλE):=Γ<(qλE)·d(qλE),T1d(qλE),T·1d(qλE),Sd(qλE),Tn1d(qλE),Sd(qλE),Td(qλE),Snd(qλE),Tnd(qλE),Sd(qλE),T>0, (101)

and where in the case λ]λ,λ+[\[0,1]

ζ_n(qλE):=Γ>(qλE)·d(qλE),Tnd(qλE),S2nd(qλE),Td(qλE),S2>0, (102)
ϑ_n(qλE):=Γ>(qλE)d(qλE),Td(qλE),S2d(qλE),T1d(qλE),Tn1d(qλE),Td(qλE),S21d(qλE),S2n1d(qλE),S2>0, (103)
ζ¯n(qλE):=Γ>(qλE)·d(qλE),Sn1·n1d(qλE),Tn1d(qλE),T>0, (104)
ϑ¯n(qλE):=Γ>(qλE)·[d(qλE),Sd(qλE),T1d(qλE),S21d(qλE),T·1d(qλE),Sn+d(qλE),T1d(qλE),Sd(qλE),Tn1d(qλE),T1d(qλE),Sd(qλE),Td(qλE),Sn1d(qλE),S·n]>0. (105)

Notice that $\alpha_A/\beta_A$ can equivalently be replaced by $\alpha_H/\beta_H$ in (96) and in (97).

A proof of Theorem 5 is given in Appendix A.3.
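As a quick plausibility check of the decay rate in part (b), the following sketch (illustrative parameters of our own choosing from the equal-fraction setup, i.e., with $\alpha_A/\beta_A = \alpha_H/\beta_H$) uses the exact recursive representation of $V_{\lambda,X_0,n}$ for this setup, namely $V_{\lambda,X_0,n} = \exp\{a_n^{(q_\lambda^E)} X_0 + \frac{\alpha_A}{\beta_A}\sum_{k=1}^n a_k^{(q_\lambda^E)}\}$ (cf. (39), the analogue (135), and the proof of Theorem 3(a)), and compares $\frac{1}{n}\log V_{\lambda,X_0,n}$ with the asserted limit $\frac{\alpha_A}{\beta_A}\, x_0^{(q_\lambda^E)}$:

```python
import math

# Illustrative parameters with equal immigration/offspring fractions alpha/beta (setup P_SP,1).
betaA, betaH = 0.8, 0.95
frac = 2.0                      # common value of alpha_A/beta_A = alpha_H/beta_H
alphaA, alphaH = frac * betaA, frac * betaH
lam, X0 = 0.5, 5

beta_l = lam * betaA + (1 - lam) * betaH
qE = betaA**lam * betaH**(1 - lam)          # q_lambda^E, here 0 < qE < beta_l

# fixed point x0 of q*exp(x) - beta_l = x via bisection on ]-beta_l, qE - beta_l[
lo, hi = -beta_l, qE - beta_l
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if qE * math.exp(mid) - beta_l - mid > 0 else (lo, mid)
x0 = 0.5 * (lo + hi)

a, s = 0.0, 0.0
for n in range(1, 201):
    a = qE * math.exp(a) - beta_l           # recursion (36)
    s += a
    logV = a * X0 + (alphaA / betaA) * s    # log of the exact Hellinger integral V_{lam,X0,n}
    if n % 50 == 0:
        print(f"n={n:3d}   (1/n)*log V = {logV / n:+.6f}")
print(f"limit  (alpha_A/beta_A)*x0 = {(alphaA / betaA) * x0:+.6f}")
```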

6.3. Explicit Closed-Form Bounds for the Cases βA,βH,αA,αH,λ(PSP\PSP,1)×]0,1[

To derive (explicit) closed-form lower bounds of the (non-explicit) recursive lower bounds $B_{\lambda,X_0,n}^L$ for the Hellinger integral $H_\lambda(P_{A,n}\|P_{H,n})$, respectively closed-form upper bounds of the recursive upper bounds $B_{\lambda,X_0,n}^U$, for all parameter cases $(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda) \in (\mathcal{P}_{SP}\backslash\mathcal{P}_{SP,1})\times(\mathbb{R}\backslash\{0,1\})$, we combine part (b) of Theorem 1, Lemma 2 and Lemma 3, together with appropriate parameters $p_\lambda^L = p^L(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda)$, $p_\lambda^U = p^U(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda) \ge 0$ and $q_\lambda^L = q^L(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda)$, $q_\lambda^U = q^U(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda) > 0$ satisfying (35). Notice that the representations of the lower and upper closed-form sequence-bounds depend on whether $0<q_\lambda^A<\beta_\lambda$, $0<q_\lambda^A=\beta_\lambda$ or $\max\{0,\beta_\lambda\}<q_\lambda^A<\min\{1,e^{\beta_\lambda-1}\}$ ($A \in \{L,U\}$).

Let us start with closed-form lower bounds for the case λ]0,1[; recall that the choice pλL=αAλαH1λ,qλL=βAλβH1λ led to the optimal recursive lower bounds Bλ,X0,nL of the Hellinger integral (cf. Theorem 1(b) and Section 3.5). Correspondingly, we can derive

Theorem 6.

Let pλL=αAλαH1λ and qλL=βAλβH1λ. Then, the following assertions hold true:

  • (a) 
    For all βA,βH,αA,αH,λPSP,2PSP,3aPSP,3bPSP,3c×]0,1[ (for which particularly 0<qλL<βλ, βAβH), all initial population sizes X0N and all observation horizons nN there holds
    Cλ,X0,n(pλL,qλL),TCλ,X0,n(pλL,qλL),LBλ,X0,nL<1,
    whereCλ,X0,n(pλL,qλL),L:=Cλ,X0,n(pλL,qλL),T·expζ_n(qλL)·X0+pλLqλL·ϑ_n(qλL) (106)
    withCλ,X0,n(pλL,qλL),T:=exp{x0(qλL)·X0pλLqλL·d(qλL),T1d(qλL),T·1d(qλL),Tn+pλLqλL·βλ+x0(qλL)αλ·n},andwithζ_n(qλL):=Γ<(qλL)·d(qλL),Tn11d(qλL),T·1d(qλL),Tn>0, (107)
    ϑ_n(qλL):=Γ<(qλL)·1d(qλL),Tn1d(qλL),T2·1d(qλL),T1+d(qλL),Tn1+d(qλL),T>0. (108)
  • (b) 
    For all βA,βH,αA,αH,λ(PSP,4aPSP,4b)×]0,1[ (for which particularly 0<qλL=βλ, βA=βH), all initial population sizes X0N and all observation horizons nN there holds
    Cλ,X0,n(pλL,qλL),L:=Cλ,X0,n(pλL,qλL),T:=Bλ,X0,nL=exppλLαλ·n<1.
  • (c) 
    For all βA,βH,αA,αH,λ(PSP\PSP,1)×]0,1[ and all initial population sizes X0N one gets
    limn1nlogCλ,X0,n(pλL,qλL),T=limn1nlogCλ,X0,n(pλL,qλL),L=limn1nlogBλ,X0,nL=pλLqλL·βλ+x0(qλL)αλ<0,
    where in the case βA=βH there holds qλL=βλ and x0(qλL)=0.

The proof will be provided in Appendix A.3.

In order to deduce closed-form upper bounds for the case $\lambda \in\, ]0,1[$, we first recall from Section 3.6, Section 3.7, Section 3.8, Section 3.9, Section 3.10, Section 3.11, Section 3.12 and Section 3.13 that we have to employ suitable parameters $p_\lambda^U = p^U(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda)$, $q_\lambda^U = q^U(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda)$ satisfying (35). Notice that we automatically obtain $p_\lambda^U \ge p_\lambda^L = \alpha_A^{\lambda}\alpha_H^{1-\lambda} > 0$. Correspondingly, we obtain

Theorem 7.

For all βA,βH,αA,αH,λ(PSP\PSP,1)×]0,1[, all coefficients pλU,qλU which satisfy (35) for all xN0 and additionally either 0<qλUβλ or βλ<qλU<min{1,eβλ1}, all initial population sizes X0N and all observation horizons nN the following assertions hold true:

Cλ,X0,n(pλU,qλU),SCλ,X0,n(pλU,qλU),UB˜λ,X0,n(pλU,qλU)Bλ,X0,nU,where (109)
  • (a) 
    in the case 0<qλU<βλ one has
    Cλ,X0,n(pλU,qλU),U:=Cλ,X0,n(pλU,qλU),S·expζ¯n(qλU)·X0pλUqλU·ϑ¯n(qλU) (110)
    withCλ,X0,n(pλU,qλU),S:=exp{x0(qλU)·X0pλUqλU·d(qλU),S1d(qλU),S·1d(qλU),Sn+pλUqλU·βλ+x0(qλU)αλ·n},ζ¯n(qλU):=Γ<(qλU)·d(qλU),Snd(qλU),Tnd(qλU),Sd(qλU),Td(qλU),Sn1·1d(qλU),Tn1d(qλU),T>0, (111)
    ϑ¯n(qλU):=Γ<(qλU)·d(qλU),T1d(qλU),T·1d(qλU),Sd(qλU),Tn1d(qλU),Sd(qλU),Td(qλU),Snd(qλU),Tnd(qλU),Sd(qλU),T>0; (112)
    furthermore, whenever pλU,qλU satisfy additionally (47) (such parameters exist particularly in the setups PSP,2PSP,3aPSP,3b, cf. Section 3.7, Section 3.8 and Section 3.9), then
    1>Cλ,X0,n(pλU,qλU),SandB˜λ,X0,n(pλU,qλU)=Bλ,X0,nUnN;
  • (b) 
    in the case 0<qλU=βλ one has
    Cλ,X0,n(pλU,qλU),U:=Cλ,X0,n(pλU,qλU),S:=B˜λ,X0,n(pλU,qλU)=exppλUαλ·n;
  • (c) 
    in the case βλ<qλU<min1,eβλ1 the formulas (109) and (110) remain valid, but with
    ζ¯n(qλU):=Γ>(qλU)·d(qλU),Sn1·n1d(qλU),Tn1d(qλU),T>0, (113)
    ϑ¯n(qλU):=Γ>(qλU)·[d(qλU),Sd(qλU),T1d(qλU),S21d(qλU),T·1d(qλU),Sn+d(qλU),T1d(qλU),Sd(qλU),Tn1d(qλU),T1d(qλU),Sd(qλU),Td(qλU),Sn1d(qλU),S·n]>0; (114)
  • (d) 
    for all cases (a) to (c) one gets
    limn1nlogCλ,X0,n(pλU,qλU),S=limn1nlogCλ,X0,n(pλU,qλU),U=limn1nlogB˜λ,X0,n(pλU,qλU)=pλUqλU·βλ+x0(qλU)αλ,
    where in the case qλU=βλ there holds x0(qλU)=0.

This Theorem 7 will be proved in Appendix A.3. Notice that for an inadequate choice of $p_\lambda^U, q_\lambda^U$ it may hold that $\frac{p_\lambda^U}{q_\lambda^U}\cdot\big(\beta_\lambda + x_0^{(q_\lambda^U)}\big) - \alpha_\lambda > 0$ in part (d) of Theorem 7.

6.4. Explicit Closed-Form Bounds for the Cases βA,βH,αA,αH,λ(PSP\PSP,1)×(R\[0,1])

For λR\[0,1], let us now construct closed-form lower bounds of the recursive lower bound components B˜λ,X0,n(pλL,qλL), for suitable parameters pλL0 and either 0<qλLβλ or max{0,βλ}<qλL<min{1,eβλ1} satisfying (35).

Theorem 8.

For all βA,βH,αA,αH,λ(PSP\PSP,1)×(R\[0,1]), all coefficients pλL0,qλL>0 which satisfy (35) for all xN0 and either 0<qλLβλ or max{0,βλ}<qλL<min{1,eβλ1}, all initial population sizes X0N and all observation horizons nN the following assertions hold true:

Cλ,X0,n(pλL,qλL),TCλ,X0,n(pλL,qλL),LB˜λ,X0,n(pλL,qλL)Bλ,X0,nL,where (115)
  • (a) 
    in the case 0<qλL<βλ one has
    Cλ,X0,n(pλL,qλL),L:=Cλ,X0,n(pλL,qλL),T·expζ_n(qλL)·X0+pλLqλL·ϑ_n(qλL), (116)
    withCλ,X0,n(pλL,qλL),T:=exp{x0(qλL)·X0pλLqλL·d(qλL),T1d(qλL),T·1d(qλL),Tn+pλLqλL·βλ+x0(qλL)αλ·n}ζ_n(qλL):=Γ<(qλL)·d(qλL),Tn11d(qλL),T·1d(qλL),Tn>0, (117)
    ϑ_n(qλL):=Γ<(qλL)·1d(qλL),Tn1d(qλL),T2·1d(qλL),T1+d(qλL),Tn1+d(qλL),T>0; (118)
    furthermore, whenever pλL,qλL satisfy additionally (56) (such parameters exist particularly in the setups PSP,2PSP,3aPSP,3b, cf. Section 3.17, Section 3.18 and Section 3.19), then
    1<Cλ,X0,n(pλL,qλL),TandB˜λ,X0,n(pλL,qλL)=Bλ,X0,nLnN;
  • (b) 
    in the case 0<qλL=βλ one has
    Cλ,X0,n(pλL,qλL),L:=Cλ,X0,n(pλL,qλL),T=B˜λ,X0,n(pλL,qλL)=exppλLαλ·n;
  • (c) 
    in the case max{0,βλ}<qλL<min1,eβλ1 the formulas (115) and (116) remain valid, but with
    ζ_n(qλL):=Γ>(qλL)·d(qλL),Tnd(qλL),S2nd(qλL),Td(qλL),S2>0, (119)
    ϑ_n(qλL):=Γ>(qλL)d(qλL),Td(qλL),S2·d(qλL),T·1d(qλL),Tn1d(qλL),Td(qλL),S2·1d(qλL),S2n1d(qλL),S2>0; (120)
  • (d) 
    for all cases (a) to (c) one gets
    limn1nlogCλ,X0,n(pλL,qλL),T=limn1nlogCλ,X0,n(pλL,qλL),L=limn1nlogB˜λ,X0,n(pλL,qλL)=pλLqλL·βλ+x0(qλL)αλ,
    where in the case qλL=βλ there holds x0(qλL)=0.

For the proof of Theorem 8, see Appendix A.3. Notice that for an inadequate choice of $p_\lambda^L, q_\lambda^L$ it may hold that $\frac{p_\lambda^L}{q_\lambda^L}\cdot\big(\beta_\lambda + x_0^{(q_\lambda^L)}\big) - \alpha_\lambda < 0$ in the last assertion of Theorem 8.

To derive closed-form upper bounds of the recursive upper bounds $B_{\lambda,X_0,n}^U$ of the Hellinger integral in the case $\lambda \in \mathbb{R}\backslash[0,1]$, let us first recall from Section 3.24 that we have to use the parameters $p_\lambda^U = \alpha_A^{\lambda}\alpha_H^{1-\lambda} > 0$ and $q_\lambda^U = \beta_A^{\lambda}\beta_H^{1-\lambda} > 0$. Furthermore, in the case $\beta_A \ne \beta_H$ we obtain from Lemma 1 (setting $q_\lambda = q_\lambda^U$) the assertion that $\max\{0,\beta_\lambda\} < q_\lambda^U < \min\{1, e^{\beta_\lambda-1}\}$ iff $\lambda \in\, ]\lambda_-,\lambda_+[\,\backslash\,[0,1]$ (implying that the sequence $(a_n^{(q_\lambda^U)})_{n\in\mathbb{N}}$ converges). In the case $\beta_A = \beta_H$ one gets $q_\lambda^U = \beta_A^{\lambda}\beta_H^{1-\lambda} = \beta_A = \beta_H = \beta_\lambda$ and therefore (cf. (P2)) $a_n^{(q_\lambda^U)} = 0$ for all $n \in \mathbb{N}$ and for all $\lambda \in \mathbb{R}\backslash[0,1]$. Correspondingly, we deduce

Theorem 9.

Let pλU=αAλαH1λ and qλU=βAλβH1λ. Then, the following assertions hold true:

  • (a) 
    For all βA,βH,αA,αH,λ(PSP,2PSP,3aPSP,3bPSP,3c)×(]λ,λ+[\[0,1]) (in particular for βAβH), all initial population sizes X0N and all observation horizons nN there holds
$\infty > C_{\lambda,X_0,n}^{(p_\lambda^U,q_\lambda^U),S} \;\ge\; C_{\lambda,X_0,n}^{(p_\lambda^U,q_\lambda^U),U} \;\ge\; B_{\lambda,X_0,n}^U \;>\; 1$,
    whereCλ,X0,n(pλU,qλU),U:=Cλ,X0,n(pλU,qλU),S·expζ¯n(qλU)·X0pλUqλU·ϑ¯n(qλU) (121)
    withCλ,X0,n(pλU,qλU),S:=exp{x0(qλU)·X0pλUqλU·d(qλU),T1d(qλU),T·1d(qλU),Tn+pλUqλU·βλ+x0(qλU)αλ·n},ζ¯n(qλU):=Γ>(qλU)·d(qλU),Sn1·n1d(qλU),Tn1d(qλU),T>0, (122)
    ϑ¯n(qλU):=Γ>(qλU)·[d(qλU),Sd(qλU),T1d(qλU),S21d(qλU),T·1d(qλU),Sn+d(qλU),T1d(qλU),Sd(qλU),Tn1d(qλU),T1d(qλU),Sd(qλU),Td(qλU),Sn1d(qλU),S·n]>0. (123)
  • (b) 
    For all βA,βH,αA,αH,λ(PSP,4aPSP,4b)×(R\[0,1]) (for which particularly 0<qλU=βλ, βA=βH), all initial population sizes X0N and all observation horizons nN there holds
    Cλ,X0,n(pλU,qλU),U:=Cλ,X0,n(pλU,qλU),S:=Bλ,X0,nU=exppλUαλ·n>1.
  • (c) 
    For all βA,βH,αA,αH,λ(PSP\PSP,1)×(]λ,λ+[\[0,1]) and all initial population sizes X0N one gets
    limn1nlogCλ,X0,n(pλU,qλU),S=limn1nlogCλ,X0,n(pλU,qλU),U=limn1nlogBλ,X0,nU=pλUqλU·βλ+x0(qλU)αλ>0,
    where in the case βA=βH there holds qλU=βλ and x0(qλU)=0.

A proof of Theorem 9 is provided in Appendix A.3.

Remark 7.

Substituting an(q) by an(q),T resp. an(q),S (cf. (78) resp. (79)) in B˜λ,X0,n(p,q) from (42) leads to the “rudimentary” closed-form bounds Cλ,X0,n(p,q),T resp. Cλ,X0,n(p,q),S, whereas substituting an(q) by a_n(q) resp. a¯n(q) (cf. (92) resp. (94)) in B˜λ,X0,n(p,q) from (42) leads to the “improved” closed-form bounds Cλ,X0,n(p,q),L resp. Cλ,X0,n(p,q),U in all the Theorems 5–9.

6.5. Totally Explicit Closed-Form Bounds

The above-mentioned results give closed-form lower bounds Cλ,X0,n(p,q),L, Cλ,X0,n(p,q),T resp. closed-form upper bounds Cλ,X0,n(p,q),U, Cλ,X0,n(p,q),S of the Hellinger integrals Hλ(PA,nPH,n) for case-dependent choices of p,q. However, these bounds still involve the fixed point x0(q) which in general has to be calculated implicitly. In order to get “totally” explicit but “slightly” less tight closed-form bounds of Hλ(PA,nPH,n), one can proceed as follows:

  1. in all the closed-form lower bound formulas of the Theorems 5, 6 and 8–including the definitions (76), (77) and (91)–replace the implicit x0(q) by a close explicitly known point x_0(q)<x0(q);

  2. in all closed-form upper bound formulas of the Theorems 5, 7 and 9–including (76), (77) and (91)–replace x0(q) by a close explicitly known point x¯0(q)>x0(q).

For instance, one can use the following choices which will be also employed as an auxiliary tool for the diffusion-limit-concerning proof of Lemma A6 in Appendix A.4:

x_0(q):=q1·ex__0(q)·1q1q22·q·ex__0(q)·qβλ,ifq]0,βλ[,q1·1q1q22·q·qβλ,ifmax{0,βλ}<q<min{1,eβλ1}, (124)
wherex__0(q):=maxβλ,qβλ1q,ifq]0,1[,βλ,ifq1, (125)
x¯0(q):=q1·1q1q22·q·qβλ,ifq]0,βλ[,1q1q22·qβλ,ifmax{0,βλ}<q<min{1,eβλ1}and1q22·q·qβλ0,x¯¯0(q):=log(q)ifmax{0,βλ}<q<min{1,eβλ1}and1q22·q·qβλ<0. (126)

Behind this choice “lies” the idea that–in contrast to the solution x0(q) of ξλ(q)(x):=qexβλ=x–the point x_0(q) is a solution of (the obviously explicitly solvable) Q_λ(q)(x):=a_λ(q)x2+b_λ(q)x+c_λ(q)=x in both cases 0<q<βλ and max{0,βλ}<q<min{1,eβλ1}, whereas the point x¯0(q) is a solution of Q¯λ(q)(x):=a¯λ(q)x2+b¯λ(q)x+c¯λ(q)=x in the case 0<q<βλ and in the case max{0,βλ}<q<min{1,eβλ1} together with 1q22·q·qβλ0. Thereby, Q_λ(q)(·) and Q¯λ(q)(·) are the lower resp. upper quadratic approximates of ξλ(q)(·) satisfying the following constraints:

  • for q]0,βλ[ (mostly but not only for λ]0,1[) (lower bound):
    Q_λ(q)(0)=ξλ(q)(0)=qβλ,Q_λ(q)(0)=ξλ(q)(0)=q,Q_λ(q)(x)=ξλ(q)(y)=qey,xR,
    for some explicitly known approximate y<x0(q)(leading to the (tighter) explicit lower approximate x_0(q)]y,x0(q)[); here, we choose
    y:=x__0(q):=maxβλ,qβλ1q,ifq<1,βλ,ifq1;
  • for q]0,βλ[ (mostly but not only for λ]0,1[) (upper bound):
    Q¯λ(q)(0)=ξλ(q)(0)=qβλ,Q¯λ(q)(0)=ξλ(q)(0)=q,Q¯λ(q)(x)=ξλ(q)(0)=q,xR;
  • for max{0,βλ}<q<min{1,eβλ1} (mostly but not only for λR\[0,1]) (lower bound):
    Q_λ(q)(0)=ξλ(q)(0)=qβλ,Q_λ(q)(0)=ξλ(q)(0)=q,Q_λ(q)(x)=ξλ(q)(0)=q,xR;
  • for max{0,βλ}<q<min{1,eβλ1} in combination with 1q22·q·qβλ0 (mostly but not only for λR\[0,1]) (upper bound):
    Q¯λ(q)(0)=ξλ(q)(0)=qβλ,Q¯λ(q)(0)=ξλ(q)(0)=q,Q¯λ(q)(x)=ξλ(q)(log(q))=1,xR.

If max{0,βλ}<q<min{1,eβλ1} and 1q22·q·qβλ<0, then a real-valued solution Q¯λ(q)(x)=x does not exist and we set x¯0(q):=x¯¯0(q):=log(q), with ξλ(q)x¯¯0(q)=1. The above considerations lead to corresponding unique choices of constants a_λ(q),b_λ(q),c_λ(q),a¯λ(q),b¯λ(q),c¯λ(q) culminating in

$\underline{Q}_\lambda^{(q)}(x) := \begin{cases} \frac{q}{2}\, e^{\underline{\underline{x}}_0^{(q)}}\cdot x^2 + q\cdot x + q - \beta_\lambda, & \text{if } 0<q<\beta_\lambda, \\ \frac{q}{2}\cdot x^2 + q\cdot x + q - \beta_\lambda, & \text{if } \max\{0,\beta_\lambda\}<q<\min\{1,e^{\beta_\lambda-1}\}, \end{cases}$   (127)
$\overline{Q}_\lambda^{(q)}(x) := \begin{cases} \frac{q}{2}\cdot x^2 + q\cdot x + q - \beta_\lambda, & \text{if } 0<q<\beta_\lambda, \\ \frac{1}{2}\cdot x^2 + q\cdot x + q - \beta_\lambda, & \text{if } \max\{0,\beta_\lambda\}<q<\min\{1,e^{\beta_\lambda-1}\}. \end{cases}$   (128)
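A minimal sketch of the resulting totally explicit bracketing (illustrative parameters of our own choosing, for a constellation with $0<q<\beta_\lambda$): it computes the true fixed point $x_0^{(q)}$ of $\xi_\lambda^{(q)}$ numerically and compares it with the relevant roots of $\underline{Q}_\lambda^{(q)}(x)=x$ and $\overline{Q}_\lambda^{(q)}(x)=x$ from the first cases of (127) and (128), using our reading of (125) for the pre-approximate $\underline{\underline{x}}_0^{(q)}$.

```python
import math

beta_l, q = 0.9, 0.6                 # illustrative constellation with 0 < q < beta_l

# true fixed point of xi(x) = q*exp(x) - beta_l (bisection, cf. the sketch after Lemma 3)
lo, hi = -beta_l, q - beta_l
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if q * math.exp(mid) - beta_l - mid > 0 else (lo, mid)
x0 = 0.5 * (lo + hi)

x00 = max(-beta_l, (q - beta_l) / (1 - q))        # explicit pre-approximate, cf. (125) for q < 1

def smaller_root(a2, a1, a0):
    """Smaller real solution of a2*x^2 + a1*x + a0 = x."""
    disc = (a1 - 1.0)**2 - 4.0 * a2 * a0
    return ((1.0 - a1) - math.sqrt(disc)) / (2.0 * a2)

x0_low = smaller_root(0.5 * q * math.exp(x00), q, q - beta_l)   # root of Q_lower (127), first case
x0_up  = smaller_root(0.5 * q,                 q, q - beta_l)   # root of Q_upper (128), first case

print(f"x0_lower = {x0_low:+.6f} <= x0 = {x0:+.6f} <= x0_upper = {x0_up:+.6f}")
assert x0_low <= x0 <= x0_up
```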

6.6. Closed-Form Bounds for Power Divergences of Non-Kullback-Leibler-Information-Divergence Type

Analogously to Section 4 (see especially Section 4.1), for orders $\lambda \in \mathbb{R}\backslash\{0,1\}$ all the results of the previous Section 6.1, Section 6.2, Section 6.3, Section 6.4 and Section 6.5 carry correspondingly over from closed-form bounds of the Hellinger integrals $H_\lambda(\cdot\|\cdot)$ to closed-form bounds of the total variation distance $V(\cdot\|\cdot)$, by virtue of the relation (cf. (12))

$2\,\big(1 - H_{1/2}(P_{A,n}\|P_{H,n})\big) \;\le\; V(P_{A,n}\|P_{H,n}) \;\le\; 2\,\sqrt{1 - H_{1/2}(P_{A,n}\|P_{H,n})^2},$

to closed-form bounds of the Renyi divergences $R_\lambda(\cdot\|\cdot)$, by virtue of the relation (cf. (7))

$0 \;\le\; R_\lambda(P_{A,n}\|P_{H,n}) \;=\; \frac{1}{\lambda(\lambda-1)}\,\log H_\lambda(P_{A,n}\|P_{H,n}), \qquad \text{with } \log 0 := -\infty,$

as well as to closed-form bounds of the power divergences $I_\lambda(\cdot\|\cdot)$, by virtue of the relation (cf. (2))

$I_\lambda(P_{A,n}\|P_{H,n}) \;=\; \frac{1 - H_\lambda(P_{A,n}\|P_{H,n})}{\lambda\cdot(1-\lambda)}, \qquad n \in \mathbb{N}.$

For the sake of brevity, the–merely repetitive–exact details are omitted.
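For concreteness, here is a tiny sketch of these conversions (illustrative only; `H` stands for an already computed value or bound of the Hellinger integral $H_\lambda(P_{A,n}\|P_{H,n})$, and the numerical arguments are arbitrary examples):

```python
import math

def power_divergence(H, lam):
    """I_lambda from the Hellinger integral, cf. (2)."""
    return (1.0 - H) / (lam * (1.0 - lam))

def renyi_divergence(H, lam):
    """R_lambda from the Hellinger integral, cf. (7); log 0 := -infinity."""
    return (1.0 / (lam * (lam - 1.0))) * (math.log(H) if H > 0 else -math.inf)

def total_variation_bounds(H_half):
    """Lower/upper bound on V from the order-1/2 Hellinger integral, cf. (12)."""
    return 2.0 * (1.0 - H_half), 2.0 * math.sqrt(1.0 - H_half**2)

print(power_divergence(0.95, 0.5), renyi_divergence(0.95, 0.5), total_variation_bounds(0.95))
```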

6.7. Applications to Decision Making

The above-mentioned investigations of the Section 6.1 to Section 6.6 can be applied to the context of Section 2.5 on dichotomous decision making on the space of all possible path scenarios (path space) of Poissonian Galton-Watson processes without (with) immigration GW(I) (e.g., in combination with our running-example epidemiological context of Section 2.3). In more detail, for the minimal mean decision loss (Bayes risk) $R_n$ defined by (18) we can derive explicit closed-form upper (respectively lower) bounds by using (19) respectively (20) together with the results of the Section 6.1, Section 6.2, Section 6.3, Section 6.4 and Section 6.5 concerning Hellinger integrals of order $\lambda \in\, ]0,1[$; we can proceed analogously in the Neyman-Pearson context in order to deduce closed-form bounds of type II error probabilities, by means of (23) and (24). Moreover, in an analogous way we can employ the investigations of Section 6.6 on power divergences in order to obtain closed-form bounds of (i) the corresponding (cf. (21)) weighted-average decision risk reduction (weighted-average statistical information measure) about the degree of evidence deg concerning the parameter $\theta$ that can be attained by observing the GW(I)-path $X_n$ until stage $n$, as well as (ii) the corresponding (cf. (22)) limit decision risk reduction (limit statistical information measure). For the sake of brevity, the–merely repetitive–exact details are omitted.

7. Hellinger Integrals and Power Divergences of Galton-Watson Type Diffusion Approximations

7.1. Branching-Type Diffusion Approximations

One can show that a properly rescaled Galton-Watson process without (respectively with) immigration GW(I) converges weakly to a diffusion process $\tilde X := (\tilde X_s)_{s\in[0,\infty[}$ which is the unique, strong, nonnegative–and in case of $\eta \ge \sigma^2/2$ strictly positive–solution of the stochastic differential equation (SDE) of the form

$d\tilde X_s = \big(\eta - \kappa\,\tilde X_s\big)\,ds + \sigma\,\sqrt{\tilde X_s}\;dW_s, \quad s \in [0,\infty[, \qquad \tilde X_0 \in\, ]0,\infty[ \ \text{given},$   (129)

where η[0,[, κ[0,[, σ]0,[ are constants and Ws,s[0,[ denotes a standard Brownian motion with respect to the underlying probability measure P; see e.g., Feller [130], Jirina [131], Lamperti [132,133], Lindvall [134,135], Grimvall [136], Jagers [56], Borovkov [137], Ethier & Kurtz [138], Durrett [139] for the non-immigration case corresponding to η=0, κ0, Kawazu & Watanabe [140], Wei & Winnicki [141], Winnicki [64] for the immigration case corresponding to η0, κ=0, as well as Sriram [142] for the general case η[0,[, κR. Feller-type branching processes of the form (129), which are special cases of continuous state branching processes with immigration (see e.g., Kawazu & Watanabe [140], Li [143], as well as Dawson & Li [144] for imbeddings to affine processes) play for instance an important role in the modelling of the term structure of interest rates, cf. the seminal Cox-Ingersoll-Ross CIR model [145] and the vast follow-up literature thereof. Furthermore, (129) is also prominently used as (a special case of) Cox & Ross’s [146] constant elasticity of variance CEV asset price process, as (part of) Heston’s [147] stochastic asset-volatility framework, as a model of neuron activity (see e.g., Lansky & Lanska [148], Giorno et al. [149], Lanska et al. [150], Lansky et al [151], Ditlevsen & Lansky [152], Höpfner [153], Lansky & Ditlevsen [154]), as a time-dynamic description of the nitrous oxide emission rate from the soil surface (see e.g., Pedersen [155]), as well as a model for the individual hazard rate in a survival analysis context (see e.g., Aalen & Gjessing [156]).

Along these lines of branching-type diffusion limits, it makes sense to consider the solutions of two SDEs (129) with different fixed parameter sets $(\eta,\kappa_A,\sigma)$ and $(\eta,\kappa_H,\sigma)$, determine for each of them a corresponding approximating GW(I), investigate the Hellinger integral between the laws of these two GW(I), and finally calculate the limit of the Hellinger integral (bounds) as the GW(I) approach their SDE solutions. Notice that for technical reasons (which will be explained below), the constants $\eta$ and $\sigma$ ought to be independent of $A$, $H$ in our current context.

In order to make the above-mentioned limit procedure rigorous, it is reasonable to work with appropriate approximations such that in each convergence step m one faces the setup PNIPSP,1 (i.e., the non-immigration or the equal-fraction case), where the corresponding Hellinger integral can be calculated exactly in a recursive way, as stated in Theorem 1. Let us explain the details in the following.

Consider a sequence of GW(I) X(m)mN with probability laws P(m) on a measurable space (Ω,F), where as above the subscript • stands for either the hypothesis H or the alternative A. Analogously to (1), we use for each fixed step mN the representation X(m):=X(m),N with

$X_n^{(m)} := \sum_{j=1}^{X_{n-1}^{(m)}} Y_{n-1,j}^{(m)} + \tilde Y_n^{(m)}, \quad n \in \mathbb{N}, \qquad X_0^{(m)} \in \mathbb{N} \ \text{given},$   (130)

where under the law P(m)

  • the collection Y(m):=Yi,j(m),iN0,jN consists of i.i.d. random variables which are Poisson distributed with parameter β(m)>0,

  • the collection Y˜(m):=Y˜i(m),iN consists of i.i.d. random variables which are Poisson distributed with parameter α(m)0,

  • Y(m) and Y˜(m) are independent.

From arbitrary drift-parameters $\eta \in [0,\infty[$, $\kappa_\bullet \in [0,\infty[$, and diffusion-term-parameter $\sigma > 0$, we construct the offspring-distribution parameter and the immigration-distribution parameter of the sequence $(X^{(m)})_{m\in\mathbb{N}}$ by

$\beta_\bullet^{(m)} := 1 - \frac{\kappa_\bullet}{\sigma^2 m} \qquad \text{and} \qquad \alpha_\bullet^{(m)} := \beta_\bullet^{(m)}\cdot\frac{\eta}{\sigma^2}.$   (131)

Here and henceforth, we always assume that the approximation step $m$ is large enough to ensure that $\beta_\bullet^{(m)} \in\, ]0,1]$ and that at least one of $\beta_A^{(m)}$, $\beta_H^{(m)}$ is strictly less than 1; this will be abbreviated by $m \in \overline{\mathbb{N}}$. Let us point out that–as mentioned above–our choice entails the best-to-handle setup $\mathcal{P}_{NI}\cup\mathcal{P}_{SP,1}$ (which would not be the case if instead of $\eta$ one used $\eta_\bullet$ with $\eta_A \ne \eta_H$). Based on the GW(I) $X^{(m)}$, let us construct the continuous-time branching process $\tilde X^{(m)} := (\tilde X_s^{(m)})_{s\in[0,\infty[}$ by

$\tilde X_s^{(m)} := \frac{1}{m}\, X_{\lfloor \sigma^2 m s\rfloor}^{(m)},$   (132)

living on the state space $E^{(m)} := \frac{1}{m}\mathbb{N}_0$. Notice that $\tilde X^{(m)}$ is constant on each time-interval $\big[\frac{k}{\sigma^2 m}, \frac{k+1}{\sigma^2 m}\big[$ and takes at $s = \frac{k}{\sigma^2 m}$ the value $\frac{1}{m}X_k^{(m)}$, i.e., the $k$-th GW(I) generation size divided by $m$; in other words, it "jumps" with the jump-size $\frac{1}{m}\big(X_k^{(m)} - X_{k-1}^{(m)}\big)$, which is equal to the $\frac{1}{m}$-fold difference to the previous generation size. From (132) one can immediately see the necessity of having $\sigma$ independent of $A$, $H$, because for the required law-equivalence in (the corresponding version of) (13) both models at stake have to "live" on the same time-scale $\tau_s^{(m)} := \lfloor\sigma^2 m s\rfloor$. For this setup, one obtains the following convergence result:
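A minimal simulation sketch of this construction (illustrative only; it reproduces the type of sample paths shown in Figure 4 below, and it uses the fact that the sum of $X_{k-1}^{(m)}$ i.i.d. Poisson($\beta^{(m)}$) offspring variables is again Poisson distributed, with parameter $\beta^{(m)}\cdot X_{k-1}^{(m)}$):

```python
import numpy as np

def simulate_tilde_X(eta, kappa, sigma, x_tilde_0, T, m, rng):
    """Rescaled Poissonian GW(I) approximation (130)-(132) on [0, T]."""
    beta = 1.0 - kappa / (sigma**2 * m)          # offspring parameter, cf. (131)
    alpha = beta * eta / sigma**2                # immigration parameter, cf. (131)
    n_steps = int(np.floor(sigma**2 * m * T))    # number of GW(I) generations up to time T
    X = int(round(m * x_tilde_0))                # X_0^(m) with X_0^(m)/m approximately x_tilde_0
    path = [X / m]
    for _ in range(n_steps):
        X = rng.poisson(beta * X) + rng.poisson(alpha)   # one GW(I) generation, cf. (130)
        path.append(X / m)                                # rescaling (132)
    times = np.arange(n_steps + 1) / (sigma**2 * m)       # jump times k/(sigma^2 m)
    return times, np.array(path)

rng = np.random.default_rng(seed=1)
t, x = simulate_tilde_X(eta=5.0, kappa=2.0, sigma=0.4, x_tilde_0=3.0, T=10.0, m=200, rng=rng)
print(x[-5:])     # last few values; fluctuates around eta/kappa - eta/(sigma^2 * m)
```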

Theorem 10.

Let $\eta \in [0,\infty[$, $\kappa \in [0,\infty[$, $\sigma \in\, ]0,\infty[$ and $\tilde X^{(m)}$ be as defined in (130) to (132). Furthermore, let us suppose that $\lim_{m\to\infty}\frac{1}{m}X_0^{(m)} = \tilde X_0 > 0$ and denote by $d([0,\infty[,[0,\infty[)$ the space of right-continuous functions $f: [0,\infty[ \to [0,\infty[$ with left limits. Then the sequence of processes $(\tilde X^{(m)})_{m\in\overline{\mathbb{N}}}$ converges in distribution in $d([0,\infty[,[0,\infty[)$ to a diffusion process $\tilde X$ which is the unique strong, nonnegative–and in case of $\eta \ge \sigma^2/2$ strictly positive–solution of the SDE

$d\tilde X_s = \big(\eta - \kappa\,\tilde X_s\big)\,ds + \sigma\,\sqrt{\tilde X_s}\;dW_s, \quad s \in [0,\infty[, \qquad \tilde X_0 \in\, ]0,\infty[ \ \text{given},$   (133)

where Ws,s[0,[ denotes a standard Brownian motion with respect to the limit probability measure P˜.

Remark 8.

Notice that the condition $\eta \ge \sigma^2/2$ can be interpreted in our approximation setup (131) as $\alpha_\bullet^{(m)} \ge \beta_\bullet^{(m)}/2$, which quantifies the intuitively reasonable indication that if the probability $P_\bullet[\tilde Y_1^{(m)} = 0] = e^{-\alpha_\bullet^{(m)}}$ of having no immigration is small enough relative to the probability $P_\bullet[Y_{1,k}^{(m)} = 0] = e^{-\beta_\bullet^{(m)}}$ of having no offspring ($m \in \overline{\mathbb{N}}$), then the limiting diffusion $\tilde X$ never hits zero almost surely.

The corresponding proof of Theorem 10–which is outlined in Appendix A.4–is an adaptation of the proof of Theorem 9.1.3 in Ethier & Kurtz [138], which deals with drift-parameters $\eta = 0$, $\kappa = 0$ in the SDE (133), whose solution is approached on a $\sigma$-independent time scale by a sequence of (critical) Galton-Watson processes without immigration but with general offspring distribution with mean 1 and variance $\sigma$. Notice that due to (131) the latter is inconsistent with our Poissonian setup, but this is compensated by our chosen $\sigma$-dependent time scale. Other limit investigations for (133), involving offspring/immigration distributions and parametrizations which are also incompatible with ours, are e.g., treated in Sriram [142].

As illustration of our proposed approach, let us give the following

Example 3.

Consider the parameter setup $(\eta,\kappa,\sigma) = (5, 2, 0.4)$ and initial generation size $\tilde X_0 = 3$. Figure 4 shows the diffusion-approximation $\tilde X_s^{(m)}$ (blue) of the corresponding solution $\tilde X_s$ of the SDE (133) up to the time horizon $T = 10$, for the approximation steps $m \in \{13, 50, 200, 1000\}$. Notice that in this setup there holds $\overline{\mathbb{N}} = \{k \in \mathbb{N} : k \ge 13\}$ (recall that $\overline{\mathbb{N}}$ is the subset of the positive integers such that $\beta^{(m)} = 1 - \frac{\kappa}{\sigma^2 m} > 0$). The "long-term mean" of the limit process $\tilde X_s$ is $\eta/\kappa = 2.5$ and is indicated as red line. The "long-term mean" of the approximations $\tilde X_s^{(m)}$ is equal to $\frac{\alpha^{(m)}}{m\,(1-\beta^{(m)})} = \frac{\eta}{\kappa} - \frac{\eta}{\sigma^2 m} = 2.5 - 31.25/m$ and is displayed as green line.

Figure 4. Simulation of the process $\tilde X_s^{(m)}$ for the approximation steps $m \in \{13, 50, 200, 1000\}$ in the parameter setup $(\eta,\kappa,\sigma) = (5,2,0.4)$ and with initial starting value $\tilde X_0 = 3$.

7.2. Bounds of Hellinger Integrals for Diffusion Approximations

For each approximation step $m$ and each observation horizon $t \in [0,\infty[$, let us now investigate the behaviour of the Hellinger integrals $H_\lambda\big(P_{A,t}^{(m),CdA}\,\big\|\,P_{H,t}^{(m),CdA}\big)$, where $P_{\bullet,t}^{(m),CdA}$ denotes the canonical law (under H resp. A) of the continuous-time diffusion approximation $\tilde X^{(m)}$ (cf. (132)), restricted to $[0,t]$. It is easy to see that $H_\lambda\big(P_{A,t}^{(m),CdA}\,\big\|\,P_{H,t}^{(m),CdA}\big)$ coincides with $H_\lambda\big(P_{A,\lfloor\sigma^2 mt\rfloor}^{(m)}\,\big\|\,P_{H,\lfloor\sigma^2 mt\rfloor}^{(m)}\big)$ of the law restrictions of the GW(I) generation sizes $X_\ell^{(m)}$, $\ell \in \{0,\ldots,\lfloor\sigma^2 mt\rfloor\}$, where $\frac{\lfloor\sigma^2 mt\rfloor}{\sigma^2 m}$ can be interpreted as the last "jump-time" of $\tilde X^{(m)}$ before $t$. These Hellinger integrals obey the results of

  • the Propositions 2 and 3 (for η=0) respectively the Propositions 4 and 5 (for η]0,[), as far as recursively computable exact values are concerned,

  • Theorem 5 as far as closed-form bounds are concerned; recall that the current setup is of type PNIPSP,1, and thus we can use the simplifications proposed in the Remark 7(a).

In order to obtain the desired Hellinger integral limits limmHλPA,σ2mt(m)PH,σ2mt(m), one faces several technical problems which will be described in the following. To begin with, for fixed mN¯ we apply the Propositions 2(b), 3(b), 4(b), 5(b) to the current setup (βA(m),βH(m),αA(m),αH(m))PNIPSP,1 with

$\beta_\bullet^{(m)} := \beta(m,\kappa_\bullet,\sigma^2) := 1 - \frac{\kappa_\bullet}{\sigma^2 m} \qquad \text{and} \qquad \alpha_\bullet^{(m)} := \alpha(m,\kappa_\bullet,\sigma^2,\eta) := \beta_\bullet^{(m)}\cdot\frac{\eta}{\sigma^2} \qquad (\text{cf. } (131)).$

Notice that $\eta = 0$ corresponds to the no-immigration (NI) case and that $\alpha_\bullet^{(m)}/\beta_\bullet^{(m)} = \eta/\sigma^2$. Accordingly, we set $\alpha_\lambda^{(m)} := \lambda\cdot\alpha_A^{(m)} + (1-\lambda)\cdot\alpha_H^{(m)}$, $\beta_\lambda^{(m)} := \lambda\cdot\beta_A^{(m)} + (1-\lambda)\cdot\beta_H^{(m)}$. By using

$q_\lambda^{(m)} := q(m,\kappa_A,\kappa_H,\sigma^2,\lambda) := \big(\beta_A^{(m)}\big)^{\lambda}\big(\beta_H^{(m)}\big)^{1-\lambda}, \qquad \lambda \in \mathbb{R}\backslash\{0,1\},$   (134)

as well as the connected sequence $\big(a_n^{(m)}\big)_{n\in\mathbb{N}} := \big(a_n^{(q_\lambda^{(m)})}\big)_{n\in\mathbb{N}}$, we arrive at the

Corollary 13.

For all $\big(\beta_A^{(m)},\beta_H^{(m)},\alpha_A^{(m)},\alpha_H^{(m)},\lambda\big) \in (\mathcal{P}_{NI}\cup\mathcal{P}_{SP,1})\times(\mathbb{R}\backslash\{0,1\})$ and all initial population sizes $X_0^{(m)} \in \mathbb{N}$ there holds

$h_\lambda^{(m)} := H_\lambda\big(P_{A,\lfloor\sigma^2 mt\rfloor}^{(m)}\,\big\|\,P_{H,\lfloor\sigma^2 mt\rfloor}^{(m)}\big) = \exp\Big\{a_{\lfloor\sigma^2 mt\rfloor}^{(q_\lambda^{(m)})}\cdot X_0^{(m)} + \frac{\eta}{\sigma^2}\sum_{k=1}^{\lfloor\sigma^2 mt\rfloor} a_k^{(q_\lambda^{(m)})}\Big\}$   (135)

with η=0 in the NI case.
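A small sketch (illustrative parameters of our own choosing) which evaluates (135) for increasing approximation steps $m$; the values stabilise as $m$ grows, in line with Theorem 11 below.

```python
import math

def hellinger_diffusion_approx(kappaA, kappaH, eta, sigma, lam, x_tilde_0, t, m):
    """h_lambda^(m) from Corollary 13, i.e., formula (135) with X_0^(m) = round(m * x_tilde_0)."""
    s2 = sigma**2
    betaA, betaH = 1.0 - kappaA / (s2 * m), 1.0 - kappaH / (s2 * m)      # cf. (131)
    beta_l = lam * betaA + (1 - lam) * betaH
    q = betaA**lam * betaH**(1 - lam)                                    # cf. (134)
    X0 = int(round(m * x_tilde_0))
    n = int(math.floor(s2 * m * t))
    a, s = 0.0, 0.0
    for _ in range(n):
        a = q * math.exp(a) - beta_l           # recursion (36) for a_k^(q_lambda^(m))
        s += a
    return math.exp(a * X0 + (eta / s2) * s)   # formula (135); eta = 0 covers the NI case

for m in (50, 200, 1000, 5000):
    h = hellinger_diffusion_approx(kappaA=2.0, kappaH=1.0, eta=5.0, sigma=0.4,
                                   lam=0.5, x_tilde_0=3.0, t=2.0, m=m)
    print(f"m = {m:5d}:  Hellinger integral of order 1/2 approx {h:.6f}")
```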

In the following, we employ the SDE-parameter constellations (which are consistent with (131) in combination with our requirement to work here only on (PNIPSP,1))

$\tilde{\mathcal{P}}_{NI} := \big\{(\kappa_A,\kappa_H,\eta) : \eta = 0,\; \kappa_A \in [0,\infty[,\; \kappa_H \in [0,\infty[,\; \kappa_A \ne \kappa_H\big\},$   (136)
$\tilde{\mathcal{P}}_{SP,1} := \big\{(\kappa_A,\kappa_H,\eta) : \eta > 0,\; \kappa_A \in [0,\infty[,\; \kappa_H \in [0,\infty[,\; \kappa_A \ne \kappa_H\big\}.$   (137)

Due to the–not in closed-form representable–recursive nature of the sequences $(a_n^{(q)})_{n\in\mathbb{N}}$ defined by (36), the calculation of $\lim_{m\to\infty} h_\lambda^{(m)}$ in (135) seems not to be (straightforwardly) tractable; after all, one "has to move along" a sequence of recursions (roughly speaking), since $\lfloor\sigma^2 mt\rfloor \to \infty$ as $m$ tends to infinity. One way to "circumvent" such technical problems is to compute, instead of the limit $\lim_{m\to\infty} h_\lambda^{(m)}$ of the (exact values of the) Hellinger integrals $h_\lambda^{(m)}$, the limits of the corresponding (explicit) closed-form lower resp. upper bounds adapted from Theorem 5. In order to achieve this, one first needs a preparatory step, due to the fact that the sequence $\big(a_{\lfloor\sigma^2 mt\rfloor}^{(q_\lambda^{(m)})}\big)_{m\in\overline{\mathbb{N}}}$ (and hence its bounds leading to closed-form expressions) does not necessarily converge for all $\lambda \in \mathbb{R}\backslash[0,1]$; roughly, this can be conjectured from the Propositions 3(c) and 5(c) in combination with $\lfloor\sigma^2 mt\rfloor \to \infty$. Correspondingly, for our "sequence-of-recursions" context equipped with the diffusion-limit's drift-parameter constellations $(\kappa_A,\kappa_H,\eta)$ we have to derive a "convergence interval" $]\tilde\lambda_-,\tilde\lambda_+[\,\backslash\,[0,1]$ which replaces the single-recursion-concerning $]\lambda_-,\lambda_+[\,\backslash\,[0,1]$ (cf. Lemma 1). This amounts to

Proposition 15.

For all $(\kappa_A,\kappa_H,\eta) \in \tilde{\mathcal{P}}_{NI}\cup\tilde{\mathcal{P}}_{SP,1}$ define

$0 > \tilde\lambda_- := \begin{cases} -\infty, & \text{if } \kappa_A < \kappa_H, \\ -\dfrac{\kappa_H^2}{\kappa_A^2-\kappa_H^2}, & \text{if } \kappa_A > \kappa_H, \end{cases} \qquad \text{and} \qquad 1 < \tilde\lambda_+ := \begin{cases} \dfrac{\kappa_H^2}{\kappa_H^2-\kappa_A^2}, & \text{if } \kappa_A < \kappa_H, \\ \infty, & \text{if } \kappa_A > \kappa_H. \end{cases}$   (138)

Then, for all $(\kappa_A,\kappa_H,\eta,\lambda) \in (\tilde{\mathcal{P}}_{NI}\cup\tilde{\mathcal{P}}_{SP,1})\times\big(]\tilde\lambda_-,\tilde\lambda_+[\,\backslash\,[0,1]\big)$ there holds for all sufficiently large $m \in \overline{\mathbb{N}}$

$q_\lambda^{(m)} := \Big(1-\frac{\kappa_A}{\sigma^2 m}\Big)^{\lambda}\Big(1-\frac{\kappa_H}{\sigma^2 m}\Big)^{1-\lambda} < \min\big\{1,\, e^{\beta_\lambda^{(m)}-1}\big\},$   (139)

and thus the sequence $\big(a_n^{(q_\lambda^{(m)})}\big)_{n\in\mathbb{N}}$ converges to the fixed point $x_0^{(m)} \in\, ]0, -\log q_\lambda^{(m)}[$.

This will be proved in Appendix A.4.

We are now in the position to determine bounds of the Hellinger integral limits $\lim_{m\to\infty} H_\lambda\big(P_{A,\lfloor\sigma^2 mt\rfloor}^{(m)}\,\big\|\,P_{H,\lfloor\sigma^2 mt\rfloor}^{(m)}\big)$ in form of $m$-limits of appropriate versions of closed-form bounds from Section 6. For the sake of brevity, let us henceforth use the abbreviations $x_0^{(m)} := x_0^{(q_\lambda^{(m)})}$, $\Gamma_<^{(m)} := \Gamma_<^{(q_\lambda^{(m)})} = \frac{q_\lambda^{(m)}}{2}\cdot e^{x_0^{(m)}}\cdot\big(x_0^{(m)}\big)^2$, $\Gamma_>^{(m)} := \Gamma_>^{(q_\lambda^{(m)})} = \frac{q_\lambda^{(m)}}{2}\cdot\big(x_0^{(m)}\big)^2$, $d^{(m),S} := d^{(q_\lambda^{(m)}),S} = \frac{x_0^{(m)} - (q_\lambda^{(m)}-\beta_\lambda^{(m)})}{x_0^{(m)}}$ and $d^{(m),T} := d^{(q_\lambda^{(m)}),T} = q_\lambda^{(m)}\cdot e^{x_0^{(m)}}$. By the above considerations, Theorem 5 (together with Remark 7(a)) adapts to the current setup as follows:

Corollary 14.

(a) For all (κA,κH,η,λ)(P˜NIP˜SP,1)×]0,1[, all t[0,[, all approximation steps mN¯ and all initial population sizes X0(m)N the Hellinger integral can be bounded by

Cλ,X0(m),t(m),L:=exp{x0(m)·X0(m)ησ2d(m),T1d(m),T1d(m),Tσ2mt+x0(m)ησ2·σ2mt+ζ_σ2mt(m)·X0(m)+ησ2·ϑ_σ2mt(m)} (140)
HλPA,σ2mt(m)PH,σ2mt(m)exp{x0(m)·X0(m)ησ2d(m),S1d(m),S1d(m),Sσ2mt+x0(m)ησ2·σ2mtζ¯σ2mt(m)·X0(m)ησ2·ϑ¯σ2mt(m)}=:Cλ,X0(m),t(m),U, (141)

where we define analogously to (98) to (101)

ζ_n(m):=Γ<(m)·d(m),Tn11d(m),T·1d(m),Tn>0, (142)
ϑ_n(m):=Γ<(m)·1d(m),Tn1d(m),T2·1d(m),T1+d(m),Tn1+d(m),T>0, (143)
ζ¯n(m):=Γ<(m)·d(m),Snd(m),Tnd(m),Sd(m),Td(m),Sn1·1d(m),Tn1d(m),T>0, (144)
ϑ¯n(m):=Γ<(m)·d(m),T1d(m),T·1d(m),Sd(m),Tn1d(m),Sd(m),Td(m),Snd(m),Tnd(m),Sd(m),T>0. (145)

Notice that (140) and (141) simplify significantly for (κA,κH,η,λ)P˜NI×]0,1[ for which η=0 holds.

(b) For all (κA,κH,η,λ)(P˜NIP˜SP,1)×]λ˜,λ˜+[\[0,1] and all initial population sizes X0(m)N the Hellinger integral bounds (140) and (141) are valid for all sufficiently large mN¯, where the expressions (142) to (145) have to be replaced by

ζ_n(m):=Γ>(m)·d(m),Tnd(m),S2nd(m),Td(m),S2>0, (146)
ϑ_n(m):=Γ>(m)d(m),Td(m),S2·d(m),T·1d(m),Tn1d(m),Td(m),S2·1d(m),S2n1d(m),S2>0,ζ¯n(m):=Γ>(m)·d(m),Sn1·n1d(m),Tn1d(m),T>0, (147)
ϑ¯n(m):=Γ>(m)·[d(m),Sd(m),T1d(m),S21d(m),T·1d(m),Sn (148)
+d(m),T1d(m),Sd(m),Tn1d(m),T1d(m),Sd(m),Td(m),Sn1d(m),S·n]. (149)

Let us finally present the desired assertions on the limits of the bounds given in Corollary 14 as the approximation step m tends to infinity, by employing for λ]λ˜,λ˜+[[0,1] the quantities

$\kappa_\lambda := \lambda\,\kappa_A + (1-\lambda)\,\kappa_H \qquad \text{as well as} \qquad \Lambda_\lambda := \sqrt{\lambda\,\kappa_A^2 + (1-\lambda)\,\kappa_H^2},$   (150)

for which the following relations hold:

$\Lambda_\lambda > \kappa_\lambda > 0, \quad \text{for } \lambda \in\, ]0,1[,$   (151)
$0 < \Lambda_\lambda < \kappa_\lambda, \quad \text{for } \lambda \in\, ]\tilde\lambda_-,\tilde\lambda_+[\,\backslash\,[0,1].$   (152)

Theorem 11.

Let the initial SDE-value X˜0]0,[ be arbitrary but fixed, and suppose that limm1mX0(m)=X˜0. Then, for all (κA,κH,η,λ)(P˜NIP˜SP,1)×]λ˜,λ˜+[\{0,1} and all t[0,[ the Hellinger integral limit can be bounded by

Dλ,X˜0,tL:=exp{Λλκλσ2X˜0ηΛλ1eΛλ·tησ2Λλκλ·t+Lλ(1)(t)·X˜0+ησ2·Lλ(2)(t)} (153)
limmHλPA,σ2mt(m)PH,σ2mt(m)exp{Λλκλσ2X˜0η12(Λλ+κλ)1e12(Λλ+κλ)·tησ2Λλκλ·tUλ(1)(t)·X˜0ησ2·Uλ(2)(t)}=:dλ,X˜0,tU, (154)

where for the (sub)case of all λ]0,1[ and all t0

Lλ(1)(t):=Λλκλ22σ2·Λλ·eΛλ·t·1eΛλ·t, (155)
Lλ(2)(t):=14·ΛλκλΛλ2·1eΛλ·t2, (156)
Uλ(1)(t):=Λλκλ2σ2·e12(Λλ+κλ)·teΛλ·tΛλκλe12(Λλ+κλ)·t1eΛλ·t2·Λλ, (157)
Uλ(2)(t):=Λλκλ2Λλ·1e123Λλ+κλ·t3Λλ+κλ+eΛλ·te12(Λλ+κλ)·tΛλκλ, (158)

and for the remaining (sub)case of all λ]λ˜,λ˜+[\[0,1] and all t0

Lλ(1)(t):=Λλκλ22σ2·κλ·eΛλ·t·1eκλ·t, (159)
Lλ(2)(t):=Λλκλ22·κλ·1eΛλ·tΛλ1e(Λλ+κλ)·tΛλ+κλ, (160)
Uλ(1)(t):=Λλκλ22·σ2·e12(Λλ+κλ)·t·t1eΛλ·tΛλ, (161)
Uλ(2)(t):=Λλκλ2·Λλκλ1e12(Λλ+κλ)·tΛλ·Λλ+κλ2+1e12(3Λλ+κλ)·tΛλ·3Λλ+κλe12(Λλ+κλ)·tΛλ+κλ·t. (162)

Notice that the components Lλ(i)(t) and Uλ(i)(t)(for i=1,2 and in both cases λ]0,1[ and λ]λ˜,λ˜+[\[0,1]) are strictly positive for t>0 and do not depend on the parameter η. Furthermore, the bounds dλ,X˜0,tL and dλ,X˜0,tU simplify significantly in the case (κA,κH,η)P˜NI, for which η=0 holds.

This will be proved in Appendix A.4. For the time-asymptotics, we obtain the

Corollary 15.

Let the initial SDE-value X˜0]0,[ be arbitrary but fixed, and suppose that limm1mX0(m)=X˜0. Then:

(a) For all (κA,κH,η,λ)P˜NI×]λ˜,λ˜+[\{0,1} the Hellinger integral limit converges to

$\lim_{t\to\infty}\lim_{m\to\infty}\log H_\lambda\big(P_{A,\lfloor\sigma^2 mt\rfloor}^{(m)}\,\big\|\,P_{H,\lfloor\sigma^2 mt\rfloor}^{(m)}\big) = -\frac{\tilde X_0}{\sigma^2}\cdot\big(\Lambda_\lambda-\kappa_\lambda\big)\;\begin{cases} <0, & \text{for } \lambda \in\, ]0,1[, \\ >0, & \text{for } \lambda \in\, ]\tilde\lambda_-,\tilde\lambda_+[\,\backslash\,[0,1]. \end{cases}$

(b) For all (κA,κH,η,λ)P˜SP,1×]λ˜,λ˜+[\{0,1} the Hellinger integral limit possesses the asymptotical behaviour

$\lim_{t\to\infty}\frac{1}{t}\log\lim_{m\to\infty} H_\lambda\big(P_{A,\lfloor\sigma^2 mt\rfloor}^{(m)}\,\big\|\,P_{H,\lfloor\sigma^2 mt\rfloor}^{(m)}\big) = -\frac{\eta}{\sigma^2}\cdot\big(\Lambda_\lambda-\kappa_\lambda\big)\;\begin{cases} <0, & \text{for } \lambda \in\, ]0,1[, \\ >0, & \text{for } \lambda \in\, ]\tilde\lambda_-,\tilde\lambda_+[\,\backslash\,[0,1]. \end{cases}$

The assertions of Corollary 15 follow immediately by inspecting the expressions in the exponential of (153) and (154) in combination with (155) to (162).

7.3. Bounds of Power Divergences for Diffusion Approximations

Analogously to Section 4 (see especially Section 4.1), for orders λR\{0,1} all the results of the previous Section 7.2 carry correspondingly over from (limits of) bounds of the Hellinger integrals HλPA,σ2mt(m)PH,σ2mt(m) to (limits of) bounds of the total variation distance VPA,σ2mt(m)PH,σ2mt(m) (by virtue of (12)), to (limits of) bounds of the Renyi divergences RλPA,σ2mt(m)PH,σ2mt(m) (by virtue of (7)) as well as to (limits of) bounds of the power divergences IλPA,σ2mt(m)PH,σ2mt(m) (by virtue of (2)). For the sake of brevity, the–merely repetitive–exact details are omitted. Moreover, by combining the outcoming results on the above-mentioned power divergences with parts of the Bayesian-decision-making context of Section 2.5, we obtain corresponding assertions on (i) the (cf. (21)) weighted-average decision risk reduction (weighted-average statistical information measure) about the degree of evidence deg concerning the parameter θ that can be attained by observing the GWI-path Xn until stage n, as well as (ii) the (cf. (22)) limit decision risk reduction (limit statistical information measure).

In the following, let us concentrate on the derivation of the Kullback-Leibler information divergence KL (relative entropy) within the current diffusion-limit framework. Notice that altogether we face two limit procedures simultaneously: by the first limit limλ1IλPA,σ2mt(m)PH,σ2mt(m) we obtain the KL IPA,σ2mt(m)PH,σ2mt(m) for every fixed approximation step mN¯; on the other hand, for each fixed λ]0,1[, the second limit limmIλPA,σ2mt(m)PH,σ2mt(m) describes the limit of the power divergence – as the sequence of rescaled and continuously interpolated GW(I)’s X˜s(m)s[0,[mN¯(equipped with probability law PA,σ2mt(m) resp. PH,σ2mt(m) up to time σ2mt) converges weakly to the continuous-time CIR-type diffusion process X˜ss[0,[ (with probability law P˜A,t resp. P˜H,t up to time t). In Appendix A.4 we shall prove that these two limits can be interchanged:

Theorem 12.

Let the initial SDE-value X˜0]0,[ be arbitrary but fixed, and suppose that limm1mX0(m)=X˜0. Then, for all (κA,κH,η)P˜NIP˜SP,1 and all t[0,[, one gets the Kullback-Leibler information divergence (relative entropy) convergences

$\lim_{m\to\infty} I\big(P_{A,\lfloor\sigma^2 mt\rfloor}^{(m)}\,\big\|\,P_{H,\lfloor\sigma^2 mt\rfloor}^{(m)}\big) \;=\; \lim_{m\to\infty}\lim_{\lambda\to1} I_\lambda\big(P_{A,\lfloor\sigma^2 mt\rfloor}^{(m)}\,\big\|\,P_{H,\lfloor\sigma^2 mt\rfloor}^{(m)}\big) \;=\; \begin{cases} \dfrac{(\kappa_A-\kappa_H)^2}{2\sigma^2\kappa_A}\cdot\Big[\Big(\tilde X_0-\dfrac{\eta}{\kappa_A}\Big)\big(1-e^{-\kappa_A t}\big)+\eta\, t\Big], & \text{if } \kappa_A>0, \\[2mm] \dfrac{\kappa_H^2}{2\sigma^2}\cdot\Big[\dfrac{\eta}{2}\, t^2+\tilde X_0\, t\Big], & \text{if } \kappa_A=0, \end{cases} \;=\; \lim_{\lambda\to1}\lim_{m\to\infty} I_\lambda\big(P_{A,\lfloor\sigma^2 mt\rfloor}^{(m)}\,\big\|\,P_{H,\lfloor\sigma^2 mt\rfloor}^{(m)}\big).$   (163)

This immediately leads to the following

Corollary 16.

Let the initial SDE-value X˜0]0,[ be arbitrary but fixed, and suppose that limm1mX0(m)=X˜0. Then, the KL limit (163) possesses the following time-asymptotical behaviour:

(a) For all (κA,κH,η)P˜NI (i.e., η=0) one gets

(i)inthecaseκA>0limtlimmIPA,σ2mt(m)PH,σ2mt(m)=X˜0·(κAκH)22σ2·κA,(ii)inthecaseκA=0limtlimm1t·IPA,σ2mt(m)PH,σ2mt(m)=X˜0·κH24σ2.

(b) For all (κA,κH,η)P˜SP,1 (i.e., η>0) one gets

(i)inthecaseκA>0limtlimm1t·IPA,σ2mt(m)PH,σ2mt(m)=η·(κAκH)22σ2·κA,(ii)inthecaseκA=0limtlimm1t2·IPA,σ2mt(m)PH,σ2mt(m)=η·κH24σ2.
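As a consistency check, the following sketch (illustrative parameters; coded directly from the right-hand side of (163) as displayed above) evaluates the Kullback-Leibler limit for decreasing $\kappa_A>0$ and confirms numerically that it approaches the $\kappa_A=0$ expression:

```python
import math

def kl_limit(kappaA, kappaH, eta, sigma, x_tilde_0, t):
    """Kullback-Leibler limit from (163)."""
    s2 = sigma**2
    if kappaA > 0:
        return (kappaA - kappaH)**2 / (2 * s2 * kappaA) * \
               ((x_tilde_0 - eta / kappaA) * (1 - math.exp(-kappaA * t)) + eta * t)
    return kappaH**2 / (2 * s2) * (eta / 2 * t**2 + x_tilde_0 * t)

eta, sigma, x0, t, kappaH = 5.0, 0.4, 3.0, 2.0, 1.0
for kappaA in (1e-1, 1e-2, 1e-3, 0.0):
    print(f"kappa_A = {kappaA:6.3f}:  KL limit = {kl_limit(kappaA, kappaH, eta, sigma, x0, t):.6f}")
```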

Remark 9.

In Appendix A.4 we shall see that the proof of the last (limit-interchange concerning) equality in (163) relies heavily on the use of the extra terms $L_\lambda^{(1)}(t)$, $L_\lambda^{(2)}(t)$, $U_\lambda^{(1)}(t)$, $U_\lambda^{(2)}(t)$ in (153) and (154). Recall that these terms ultimately stem from (manipulations of) the corresponding parts of the "improved closed-form bounds" in Theorem 5, which were derived by using the linear inhomogeneous difference equations $\underline{a}_n^{(q)}$ resp. $\overline{a}_n^{(q)}$ (cf. (92) resp. (94)) instead of the linear homogeneous difference equations $a_n^{(q),T}$ resp. $a_n^{(q),S}$ (cf. (78) resp. (79)) as explicit approximates of the sequence $a_n^{(q)}$. This fact alone already underlines the importance of this more tedious approach.

Interesting comparisons of the above-mentioned results in Section 7.2 and Section 7.3 with corresponding information measures of the solutions of the SDE (129) themselves (rather than of their branching-process approximations) can be found in Kammerer [157].

7.4. Applications to Decision Making

Analogously to Section 6.7, the above-mentioned investigations of the Section 7.1, Section 7.2 and Section 7.3 can be applied to the context of Section 2.5 on dichotomous decision making about GW(I)-type diffusion approximations of solutions of the stochastic differential Equation (129). For the sake of brevity, the–merely repetitive–exact details are omitted.

Acknowledgments

We are very grateful to the referees for their patience in reviewing this long manuscript, and for their helpful suggestions. Moreover, we would like to thank Andreas Greven for some useful remarks.

Appendix A. Proofs and Auxiliary Lemmas

Appendix A.1. Proofs and Auxiliary Lemmas for Section 3

Lemma A1.

For all real numbers x,y,z>0 and all λR one has

$x^{\lambda} y^{1-\lambda} - \big[\lambda\, x\, z^{\lambda-1} + (1-\lambda)\, y\, z^{\lambda}\big] \;\begin{cases} \le 0, & \text{for } \lambda \in\, ]0,1[, \\ = 0, & \text{for } \lambda \in \{0,1\}, \\ \ge 0, & \text{for } \lambda \in \mathbb{R}\backslash[0,1], \end{cases}$

with equality in the cases $\lambda \in \mathbb{R}\backslash\{0,1\}$ iff $\frac{x}{y} = z$.

Proof of Lemma A1.

For fixed x˜:=xzλ1>0, y˜:=yzλ>0 with x˜y˜ we inspect the function g on R defined by g(λ):=x˜λy˜1λ(λx˜+(1λ)y˜) which satisfies g(0)=g(1)=0, g(0)=y˜log(x˜/y˜)(x˜y˜)<y˜((x˜/y˜)1)(x˜y˜)=0 and which is strictly convex. Thus, the assertion follows immediately by taking into account the obvious case x˜=y˜. □

Proof of Properties 1.

Property (P9) is trivially valid. To show (P1) we assume 0<q<βλ, which implies a1(q)=ξλ(q)(0)=qβλ<0. By induction, annN is strictly negative and strictly decreasing. As stated in (P9), the function ξλ(q) is strictly increasing, strictly convex and converges to βλ for x. Thus, it hits the straight line id(x)=x once and only once on the negative real line at x0(q)]βλ,0[ (cf. (44)). This implies that the sequence an(q)nN converges to x0(q)]βλ,qβλ[. Property (P2) follows immediately. In order to prove (P3), let us fix q>max{0,βλ}, implying a1(q)=ξλ(q)(0)=qβλ>0; notice that in this setup, the special choice q=1 implies min{1,eβλ1}=eβλ1<q. By induction, an(q)nN is strictly positive and strictly increasing. Since limxξλ(q)(x)=, the function ξλ(q) does not necessarily hit the straight line id(x)=x on the positive real line. In fact, due to strict convexity (cf. (P9)), this is excluded if ξλ(q)(0)=q1. Suppose that q<1. To prove that there exists a positive solution of the equation ξλ(q)(x)=x it is sufficient to show that the unique global minimum of the strict convex function hλ(q)(x):=ξλ(q)(x)x is taken at some point x0]0,[ and that hλ(q)(x0)0. It holds hλ(q)(x)=q·ex1, and therefore hλ(q)(x)=0 iff x=x0=logq. We have hλ(q)(logq)=1βλ+logq, which is less or equal to zero iff qeβλ1. It remains to show that for q>βλ and q>min1,eβλ1 the sequence an(q)nN grows faster than exponentially, i.e., there do not exist constants c1,c2R such that an(q)ec1+c2n for all nN. We already know that (in the current case) an(q)n. Notice that it is sufficient to verify lim supnlog(an+1(q))log(an(q))=. For the case βλ0 the latter is obtained by

logan+1(q)logan(q)=log(qβλ)ean(q)+βλ(ean(q)1)logqean1(q)βλlog(qβλ)log(q)+qean1(q)βλan1(q)an1(q).

An analogous consideration works out for the case βλ<0. Property (P4) is trivial, and (P5) to (P8) are direct implications of the already proven properties (P1) to (P4). □

Proof of Lemma 1.

(a) Let βA>0, βH>0 with βAβH, λR\]0,1[, βλ:=λβA+(1λ)βH and qλ:=βAλβH1λ>max{0,βλ} (cf. Lemma A1). Below, we follow the lines of Linkov & Lunyova [53], appropriately adapted to our context. We have to find those λR\]0,1[ for which the following two conditions hold:

  • (i)

    qλ1, i.e., ξλ(qλ)(0)1,

  • (ii)

    qλeβλ1 (cf.(P3a)), which is equivalent with the existence of a–positive, if (i) is satisfied,–solution of the equation ξλ(qλ)(x)=x.

Notice that the case qλ=1, λR\[0,1], cannot appear in (i), provided that (ii) holds (since due to Lemma A1eβλ1<eqλ1=1). For (i), it is easy to check that we have to require

λ<log(βH)log(βH/βA),ifβA>βH,>log(βH)log(βH/βA),ifβA<βH. (A1)

To proceed, straightforward analysis leads to log(qλ)=argminxRξλ(qλ)(x)x. To check (ii), we first notice that qλeβλ1 iff ξλ(qλ)(x)x0 for some xR. Hence, we calculate

ξλ(qλ)log(qλ)+log(qλ)01λ(βAβH)βH+λlogβAβH+log(βH)0λ·βH1βAβH+logβAβHβH1logβH. (A2)

In order to isolate λ in (A2), one has to find out for which (βA,βH) the term in the square bracket is positive resp. zero resp. negative. To achieve this, we aim for the substitutions x:=βA/βH, β=βH and thus study first the auxiliary function hβ(x):=log(x)β(x1), x>0, with fixed parameters β>0. Straightforwardly, we obtain hβ(x)=x1β and hβ(x)=x2. Thus, the function hβ(·) is strictly concave and attains a maximum at x=β1. Since additionally hβ(1)=0 and hβ(1)=1β, there exists a second solution z(β)1 of the equation hβ(x)=0 iff β1. Thus, one gets

  • for β=1: for all x>0 there holds hβ(x)0, with equality iff x=β1,

  • for β<1: hβ(x)0 iff x[1,z(β)], with equality iff x{1,z(β)} (notice that z(β)>1),

  • for β>1: hβ(x)0 iff x[z(β),1], with equality iff x{z(β),1} (notice that z(β)<1).

Suppose that λ<0.

Case 1: If βH=1, then condition (ii) is not satisfied whenever βAβH, since the right side of (A2) is equal to zero and the left side is strictly greater than zero. Hence, λ=0.

Case 2: Let βH>1. If βA<βH, then condition (i) is not satisfied and hence λ=0. If βA>βH, then condition (i) is satisfied iff λ<λ˘˘:=λ˘˘(βA,βH):=log(βH)log(βH/βA)<0. On the other hand, incorporating the discussion of the function hβ(·), we see that hβHβAβH<0. Thus, (A2) implies that condition (ii) is satisfied when λλ˘:=λ˘(βA,βH):=βH1logβHβHβA+logβAβH. We claim that λ˘˘<λ˘ and conclude that the conditions (i) and (ii) are not fulfilled jointly, which leads to λ=0. To see this, we notice that due to 1<βH<βA we get log(βA)/(βA1)<log(βH)/(βH1) and thus

log(βA)(βH1)<log(βH)(βA1)βHlog(βH)βAlog(βH)<βHlog(βH)βHlog(βA)log(βH)+log(βA)log(βH)(βHβA)+log(βH)logβAβH<logβHβA(βH1)+log(βH)logβAβHlog(βH)logβHβA<βH1log(βH)βHβA+logβAβHλ˘˘<λ˘. (A3)

Case 3: Let βH<1. For this, one gets hβHβAβH0 for βA]βH,βHz(βH)]. Hence, condition (ii) is satisfied if either βA]βH,βHz(βH)], or βA]βH,βHz(βH)] and λλ˘. If βA>βHz(βH), then condition (i) is trivially satisfied for all λ<0. In the case βA<βH, condition (i) is satisfied whenever λ>λ˘˘. Notice that since 0<βA<βH<1, an analogous consideration as in (A3) leads to λ˘˘<λ˘. This implies that λ=λ˘. The last case βA]βH,βHz(βH)] is easy to handle: since log(βH)log(βH/βA)>0 as well as zβHβAβH>0, both conditions (i) and (ii) hold trivially.

The representation of λ+ follows straightforwardly from the λ-result and the skew symmetry (8), by employing 1λ˘(βH,βA)=λ˘(βA,βH). Alternatively, one can proceed analogously to the λ-case.

Part (b) is much easier to prove: if β:=βA=βH>0, then for all λR\[0,1] one gets qλ=βAλβH1λ=β as well as βλ=β. Hence, Properties 1 (P2) implies that an(qλ)0 and thus it is convergent, independently of the choice λR\[0,1]. □

Proof of Formula (51).

For the parameter constellation in Section 3.10, we employ as upper bound for ϕλ(x) (xN0) the function

ϕλ¯(x):=ϕλ(0),ifx=0,0,ifx>0.

Notice that this method is rather crude, and gives in the other cases treated in the Section 3.7, Section 3.8 and Section 3.9 worse bounds than those derived there. Since λ]0,1[ and αAαH, one has ϕλ(0)<0. In order to derive an upper bound of the Hellinger integral, we first set ϵ¯:=1eϕλ(0)]0,1[. Hence, for all nN\{1} we obtain the auxiliary expression

xn1=0φλ(xn2)xn1xn1!·expϕλ(xn1)xn1=0φλ(xn2)xn1xn1!·expϕλ¯(xn1)=expφλ(xn2)ϵ¯=expφλ(xn2)·1ϵ¯·expφλ(xn2).

Moreover, since βAβH, one gets limxϕλ(x)= (cf. Properties 3 (P20) and Lemma A1). This–together with the nonnegativity of φλ(·)–implies

supxN0expϕλ(x)·1ϵ¯·expφλ(x)=:δ¯]0,1[.

Incorporating these considerations as well as the formulas (27) to (32), we get for n=1 the relation HλPA,nPH,n=exp{ϕλ(x0)}1(with equality iff x0=x*=αAαHβHβA), and–as a continuation of formula (29)– for all nN\{1}(recall that x:=(x0,x1,)Ω)

HλPA,nPH,n=x1=0xn=0k=1nZn,k(λ)(x)=x1=0xn1=0k=1n1Zn,k(λ)(x)·expfA(xn1)λfH(xn1)(1λ)(λfA(xn1)+(1λ)fH(xn1))=x1=0xn2=0k=1n2Zn,k(λ)(x)·expfλ(xn2)xn1=0φλ(xn2)xn1xn1!·exp{ϕλ(xn1)}x1=0xn2=0k=1n2Zn,k(λ)(x)·expϕλ(xn2)·1ϵ¯·expφλ(xn2)δ¯·x1=0xn2=0k=1n2Zn,k(λ)(x)δ¯n/2. (A4)

Hence, HλPA,nPH,n<1 for (at least) all nN\{1}, and limnHλPA,nPH,n=0. □

Notice that the above proof method of formula (51) does not work for the parameter setup in Section 3.11, because there one gets δ¯=supxN0expϕλ(x)·1ϵ¯·expφλ(x)=1.

Proof of Proposition 9.

In the setup βA,βH,αA,αH,λPSP,4a×]0,1[ we require β:=βA=βH<1. As a linear upper bound for ϕλ(·), we employ the tangent line at y0 (cf. (52))

ϕλ,ytan(x):=(pyαλ)+qyβ·x:=(pλ,ytanαλ)+(qλ,ytanβλ)·x:=ϕλ(y)y·ϕλ(y)+ϕλ(y)·x. (A5)

Since in the current setup PSP,4a the function ϕλ(·) is strictly increasing, the slope ϕλ(y) of the tangent line at y is positive. Thus we have qy>βλ and Properties 1 (P3) implies that the sequence an(qy)nN is strictly increasing and converges to x0(qy)]0,log(qy)] iff qymin{1,eβ1}=eβ1<1 (cf. (P3a)), where x0(qy) is the smallest solution of the equation ξλ(qy)(x)=qy·exβ=x. Since qyβ for y (cf. Properties 3 (P18)) and additionally eβ1>β, there exists a large enough y0 such that the sequence an(qy)nN converges. If this y is also large enough to additionally guarantee h(y)<0 for

h(y):=limn1nlogB˜λ,X0,n(py,qy)=py·ex0(qy)αλ,

then one can conclude that limnHλ(PA,nPH,n)=0. As a first step, for verifying h(y)<0 we look for an upper bound x¯0(qy) for the fixed point x0(qy) where the latter exists for yy1 (say). Notice that

Q¯λ(qy)(x):=12x2+qyx+qyβqy·exβ=ξλ(qy)(x), (A6)

since Q¯λ(qy)(0)=ξλ(qy)(0), Q¯λ(qy)(0)=ξλ(qy)(0) and Q¯λ(qy)(x)ξλ(qy)(x) for x[0,log(qy)]. For sufficiently large yy2y1 (say), we easily obtain the smaller solution of Q¯λ(qy)(x)=x as

x¯0(qy)=(1qy)(1qy)22(qyβ)=(1ϕλ(y)β)(1ϕλ(y)β)22ϕλ(y)x0(qy) (A7)

where the expression in the root is positive since qyβ for y. We now have

h(y)=py·ex0(qy)αλpy·ex¯0(qy)αλ=:h¯(y),yy2. (A8)

Hence, it suffices to show that h¯(y)<0 for some yy2. We recall from Properties 3 (P15), (P17) and (P19) that

ϕλ(y)=αA+β·yλαH+β·y1λλαA+β·y(1λ)αH+β·y<0,ϕλ(y)=λ·β·αA+β·yαH+β·yλ1+(1λ)·β·αA+β·yαH+β·yλβ>0andthatϕλ(y)=αA+β·yαH+β·yλ·λ(1λ)·β2·(αAαH)2(αA+β·y)2(αH+β·y)<0, (A9)

which immediately implies limyϕλ(y)=limyϕλ(y)=limyϕλ(y)=0 and with l’Hospital’s rule

limyy·ϕλ(y)=limyy2·ϕλ(y)=limyy32·ϕλ(y)=12limyαA+β·yαH+β·yλ·λ(1λ)·β2·(αAαH)2(αA/y+β)2(αH/y+β)=12λ(1λ)·(αAαH)2β. (A10)

The formulas (A5), (A7) and (A9) imply the limits limypy=αλ, limyqy=β, limyx¯0(qy)=0. Notice that py<αλ holds trivially for all y0 since the intercept (pyαλ) of the tangent line ϕλ,ytan(·) is negative. Incorporating (A8) we therefore obtain limyh(y)limyh¯(y)=0. As mentioned before, for the proof it is sufficient to show that h¯(y)<0 for some yy2. This holds true if limyy·h¯(y)<0. To verify this, notice first that from (A5), (A7) and (A8) we get

h¯(y)=py·ex¯0(qy)·ϕλ(y)·12ϕλ(y)β(1qy)22(qyβ)y·ϕλ(y)·ex¯0(qy)y0. (A11)

Finally we obtain with (A10)

limyy·h¯(y)=limyy2·h¯(y)=limypy·ex¯0(qy)·y2·ϕλ(y)·12ϕλ(y)β(1qy)22(qyβ)+y3·ϕλ(y)·ex¯0(qy)=0λ(1λ)·(αAαH)2β<0.

Proof of Corollary 1.

Part (a) follows directly from Proposition 1 (a),(b) and the limit limnHλ(PA,nPH,n)=0 in the respective part (c) of the Propositions 7, 8, 9 as well as from (51). To prove part (b), according to (26) we have to verify lim infλ1lim infnHλPA,nPH,n=1. From part (c) of Proposition 2 we see that this is satisfied iff limλ1x0(qλE)=0. Recall that for fixed λ]0,1[ we have βλ=λβA+(1λ)βH>0, qλE=βAλβH1λ<βλ (cf. Lemma A1) and from Properties 1 (P1) the unique negative solution x0(qλE)]βλ,qλEβλ[ of ξλ(qλE)(x)=qλEexβλ=x (cf. (44)). Due to the continuity and boundedness of the map λx0(qλE) (for λ[0,1]) one gets that limλ1x0(qλE) exists and is the smallest nonpositive solution of βAexβA=x. From this, the part (b) as well as the non-contiguity in part (c) follow immediately. The other part of (c) is a direct consequence of Proposition 1 (a),(b) and Proposition 2 (c). □

Proof of Formula (59).

One can proceed similarly to the proof of formula (51) above. Recall Hλ(PA,1PH,1)=exp{ϕλ(X0)}>1 for X0N(cf. (28), Lemma A1 and fA(X0)fH(X0) for all X0N). For βA,βH,αA,αH,λPSP,2×(R\[0,1]) one gets ϕλ(0)=0,ϕλ(1)>0, and we define for x0

ϕλ_(x):=ϕλ(1),ifx=1,0,ifx1.

By means of the choice ϵ_:=φλ(1)·eϕλ(1)1>0, we obtain for all nN\{1}

xn1=0φλ(xn2)xn1xn1!·expϕλ(xn1)xn1=0φλ(xn2)xn1xn1!·expϕλ_(xn1)=expφλ(xn2)+ϵ_=expφλ(xn2)·1+ϵ_·expφλ(xn2).

Incorporating

infxN0expϕλ(x)·1+ϵ_·expφλ(x)=:δ_>1,

one can show analogously to (A4) that

HλPA,nPH,nδ_n/2n.

Proof of the Formulas (61), (63) and (64).

In the following, we slightly adapt the above-mentioned proof of formula (59). Let us define

ϕλ_(x):=ϕλ(0),ifx=0,0,ifx>0.

In all respective subcases one clearly has ϕλ_(0)=ϕλ(0)>0. With ϵ_:=eϕλ(0)1>0 we obtain for all nN\{1}

xn1=0φλ(xn2)xn1xn1!·expϕλ(xn1)xn1=0φλ(xn2)xn1xn1!·expϕλ_(xn1)=expφλ(xn2)+ϵ_=expφλ(xn2)·1+ϵ_·expφλ(xn2).

By employing

infxN0expϕλ(x)·1+ϵ_·expφλ(x)=:δ_>1, (A12)

one can show analogously to (A4) that

HλPA,nPH,nδ_n/2n.

Notice that this method does not work for the parameter cases PSP,4aPSP,4b, since there the infimum in (A12) is equal to one. □

Proof of Proposition 13.

In the setup βA,βH,αA,αH,λPSP,4a×(R\[0,1]) we require β:=βA=βH<1. As in the proof of Proposition 9, we stick to the tangent line ϕλ,ytan(·) at y0 (cf. (52)) as a linear lower bound for ϕλ(·), i.e., we use the function

ϕλ,ytan(x):=pyαλ+qyβ·x:=pλ,ytanαλ+qλ,ytanβλ·x:=ϕλ(y)y·ϕλ(y)+ϕλ(y)·x. (A13)

As already mentioned in Section 3.21, on PSP,4a the function ϕλ(·) is strictly decreasing and converges to 0. Thus, for all y0 the slope ϕλ(y) of the tangent line at y is negative, which implies that qy<βλ=β. For λR\[0,1] there clearly may hold qy<0 for some yR. However, there exists a sufficiently large y1>0 such that qy>0 for all y>y1, since limyϕλ(y)=0 and hence qyβ>0 for y. Thus, let us suppose that y>y1. Then, the sequence an(qy)nN is strictly negative, strictly decreasing and converges to x0(qy)]β,qyβ[ (cf. Properties 1 (P1)). If there is some yy1 such that h(y)>0 with

h(y):=limn1nlogB˜λ,X0,n(py,qy)=py·ex0(qy)αλ,

then one can conclude that limnHλ(PA,nPH,n)=. Let us at first consider the case αλ0. By employing pyαλ for y, one gets py>0 for all y0. Analogously to the proof of Proposition 9, we now look for a lower bound x_0(qy) of the fixed point x0(qy). Notice that x0(qy)>β implies

Q_λ(qy)(x):=eβ2·qy·x2+qy·x+qyβqy·exβ=ξλ(qy)(x), (A14)

since Q_λ(qy)(0)=ξλ(qy)(0)<0, Q_λ(qy)(0)=ξλ(qy)(0)>0 and 0<Q_λ(qy)(x)<ξλ(qy)(x) for x]β,0]. Thus, the negative solution x_0(qy) of the equation Q_λ(qy)(x)=x (which definitely exists) implies that there holds x_0(qy)x0(qy). We easily obtain

x_0(qy)=eβqy(1qy)(1qy)22eβqy(qyβ)=eβϕλ(y)+β(1ϕλ(y)β)(1ϕλ(y)β)22·eβqy·ϕλ(y)<0. (A15)

Since

h(y)=py·ex0(qy)αλpy·ex_0(qy)αλ=:h_(y), (A16)

it is sufficient to show h_(y)>0 for some y>y1. We recall from Properties 3 (P15), (P17) and (P19) that

ϕλ(y)=αA+β·yλαH+β·y1λλαA+β·y(1λ)αH+β·y>0,ϕλ(y)=λ·β·αA+β·yαH+β·yλ1+(1λ)·β·αA+β·yαH+β·yλβ<0andϕλ(y)=αA+β·yαH+β·yλ·λ(1λ)·β2·(αAαH)2(αA+β·y)2(αH+β·y)>0, (A17)

which immediately implies limyϕλ(y)=limyϕλ(y)=limyϕλ(y)=0, and by means of l’Hospital’s rule

limyy·ϕλ(y)=limyy2·ϕλ(y)=limyy32·ϕλ(y)=12limyαA+β·yαH+β·yλ·λ(1λ)·β2·(αAαH)2(αA/y+β)2(αH/y+β)=12λ(1λ)·(αAαH)2β. (A18)

The Formulas (A13), (A15), (A17) imply the limits limypy=αλ, limyqy=β and limyx_0(qy)=0 iff β1. The latter is due to the fact that for β>1 one gets with (A15) limyx_0(qy)=eββ(1β)(1β)2=eββ22β0. In the following, let us assume β<1 (the reason why we exclude the case β=1 is explained below). One gets limyh(y)limyh_(y)=0. Since we have to prove that h_(y)>0 for some y>y1, it is sufficient to show that limyy·h_(y)>0. To verify the latter, we first derive with l’Hospital’s rule and with (A17), (A18)

limyy·1ex_0(qy)=limyy2·ex_0(qy)·yx_0(qy)=limy{y2·eβ·ϕλ(y)ϕλ(y)+β2·(1qy)(1qy)22eβqy(qyβ)+eβqy·y2·ϕλ(y)2y2ϕλ(y)(1qy)2y2ϕλ(y)eβqy2y2ϕλ(y)eβϕλ(y)2·(1qy)22eβqy(qyβ)}=0. (A19)

Notice that without further examination this limit would not necessarily hold for β=1, since then the denominator in (A19) converges to zero. With (A13), (A16), (A18) and (A19) we finally obtain

limyy·h_(y)=limyy·ϕλ(y)y2·ϕλ(y)·ex_0(qy)y·1ex_0(qy)αλ=λ(1λ)(αAαH)2β>0. (A20)

Let us now consider the case αλ<0. The proof works out almost completely analogous to the case αλ0. We indicate the main differences. Since pyαλ<0 and qyβ]0,1[ for y, there is a sufficiently large y2>y1, such that py<0 and qy>0. Thus,

Q¯λ(qy)(x):=qy2·x2+qy·x+qyβξλ(qy)(x)=qyexβforx],0].

The corresponding (existing) smaller solution of Q¯λ(qy)(x)=x is

x¯0(qy)=1qy(1qy)(1qy)22qy(qyβ),

having the same form as the solution (A15) with eβ substituted by 1. Notice that there clearly holds x0(qy)<x¯0(qy)<0. However, since py<0, we now get h(y)=py·ex0(qy)αλpy·ex¯0(qy)αλ=:h_(y), as in (A16). Since all calculations (A17) to (A20) remain valid (with eβ substituted by 1), this proof is finished. □

Appendix A.2. Proofs and Auxiliary Lemmas for Section 5

We start with two lemmas which will be useful for the proof of Theorem 3. They deal with the sequence an(qλ)nN from (36).

Lemma A2.

For arbitrarily fixed parameter constellation βA,βH,αA,αH,λP×]0,1[, suppose that qλ>0 and limλ1qλ=βA holds. Then one gets the limit

$\forall\, n \in \mathbb{N}: \qquad \lim_{\lambda\to1} a_n^{(q_\lambda)} = 0.$   (A21)

Proof. 

This can be easily seen by induction: for n=1 there clearly holds

limλ1a1(qλ)=limλ1(qλβλ)=βAβA=0.

Assume now that limλ1ak(qλ)=0 holds for all kN, kn1, then

limλ1an(qλ)=limλ1(qλ·ean1(qλ)βλ)=βA·1βA=0.

Lemma A3.

In addition to the assumptions of Lemma A2, suppose that λqλ is continuously differentiable on ]0,1[ and that the limit l:=limλ1qλλ is finite. Then, for all nN one obtains

$\lim_{\lambda\to1} \frac{\partial a_n^{(q_\lambda)}}{\partial\lambda} = u_n := \begin{cases} \dfrac{l+\beta_H-\beta_A}{1-\beta_A}\cdot\big(1-\beta_A^{\,n}\big), & \text{if } \beta_A \ne 1, \\ n\cdot\big(l+\beta_H-1\big), & \text{if } \beta_A = 1, \end{cases}$   (A22)

which is the unique solution of the linear recursion equation

$u_n = l + \beta_H - \beta_A + \beta_A\cdot u_{n-1}, \qquad u_0 = 0.$   (A23)

Furthermore, for all nN there holds

$\sum_{k=1}^{n} \lim_{\lambda\to1}\frac{\partial a_k^{(q_\lambda)}}{\partial\lambda} = \sum_{k=1}^{n} u_k = \begin{cases} \dfrac{l+\beta_H-\beta_A}{1-\beta_A}\cdot\Big[n - \dfrac{\beta_A\,(1-\beta_A^{\,n})}{1-\beta_A}\Big], & \text{if } \beta_A \ne 1, \\ \dfrac{n\,(n+1)}{2}\cdot\big(l+\beta_H-1\big), & \text{if } \beta_A = 1. \end{cases}$

Proof. 

Clearly, un defined by (A22) is the unique solution of (A23). We prove by induction that limλ1an(qλ)λ=un holds. For n=1 one gets

limλ1a1(qλ)λ=limλ1(qλβλ)λ=l(βAβH)=u1.

Suppose now that (A22) holds for all kN, kn1. Then, by incorporating (A21) we obtain

limλ1an(qλ)λ=limλ1λqλ·ean1(qλ)βλ=limλ1ean1(qλ)·qλλ+qλan1(qλ)λ(βAβH)=l(βAβH)+βA·un1=un.

The remaining assertions follow immediately. □
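The following small sketch (illustrative parameters and finite-difference step of our own choosing) illustrates Lemma A3 numerically: it computes $a_n^{(q_\lambda)}$ for $q_\lambda = \beta_A^{\lambda}\beta_H^{1-\lambda}$ at $\lambda$ close to 1 and compares the finite-difference quotient in $\lambda$ with the limit $u_n$ from (A22), where $l = \lim_{\lambda\to1}\partial q_\lambda/\partial\lambda = \beta_A\log(\beta_A/\beta_H)$ (cf. (A25)).

```python
import math

betaA, betaH = 0.8, 1.1          # illustrative offspring parameters (beta_A != 1)
N = 10
lam, h = 0.999, 1e-4             # lambda close to 1, finite-difference step

def a_seq(lmb):
    q = betaA**lmb * betaH**(1 - lmb)            # q_lambda
    beta_l = lmb * betaA + (1 - lmb) * betaH     # beta_lambda
    a, out = 0.0, []
    for _ in range(N):
        a = q * math.exp(a) - beta_l             # recursion (36)
        out.append(a)
    return out

l = betaA * math.log(betaA / betaH)              # l = lim d q_lambda / d lambda, cf. (A25)
diff = [(x - y) / (2 * h) for x, y in zip(a_seq(lam + h), a_seq(lam - h))]
for n in range(1, N + 1):
    u_n = (l + betaH - betaA) / (1 - betaA) * (1 - betaA**n)      # closed form (A22)
    print(f"n={n:2d}   finite difference = {diff[n-1]:+.5f}   u_n = {u_n:+.5f}")
```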

We are now ready to give the

Proof of Theorem 3.

(a) Recall that for the setup βA,βH,αA,αH(PNIPSP,1) we chose the intercept as pλ:=pλE:=αAλαH1λ and the slope as qλ:=qλE:=βAλβH1λ, which in (39) lead to the exact value Vλ,X0,n of the Hellinger integral. Because of pλqλβλαλ=0 as well as limλ1qλ=βA, we obtain by using (38) and Lemma A2 for all X0N and for all nN

limλ1Vλ,X0,n:=limλ1expan(qλ)·X0+k=1nbk(pλ,qλ)=limλ1expan(qλ)·X0+αAβAk=1nak(qλ)=1,

which leads by (68) to

I(PA,nPH,n)=limλ11Hλ(PA,nPH,n)λ·(1λ)=limλ11Vλ,X0,nλ·(1λ)=limλ1Vλ,X0,n12λ·λan(qλ)·X0+pλqλk=1nak(qλ)=limλ1an(qλ)λ·X0+λpλqλ·k=1nak(qλ)+pλqλ·k=1nak(qλ)λ. (A24)

For further analysis, we use the obvious derivatives

pλλ=pλlogαAαH,λpλqλ=pλqλlogαAβHαHβA,qλλ=qλlogβAβH, (A25)

where the subcase βA,βH,αA,αHPNI (with pλ0) is consistently covered. From (A25) and Lemma A3 we deduce

limλ1an(qλ)λ·X0=βAlogβAβH(βAβH)·1βAn1βA·X0,ifβA1,n·βAlogβAβH(βAβH)·X0,ifβA=1,

and by means of (A21)

nN:limλ1λpλqλ·k=1nak(qλ)=0.

For the last expression in (A24) we again apply Lemma A3 to end up with

limλ1pλqλ·k=1nλak(qλ)=αA·βAlogβAβH(βAβH)βA(1βA)·nβA1βA1βAn,ifβA1,n·(n+1)αA2βA·βAlogβAβH(βAβH),ifβA=1, (A26)

which finishes the proof of part (a). To show part (b), for the corresponding setup βA,βH,αA,αHPSP\PSP,1 let us first choose – according to (45) in Section 3.4—the intercept as pλ:=pλL:=αAλαH1λ and the slope as qλ:=qλL:=βAλβH1λ, which in part (b) of Proposition 6 lead to the lower bounds Bλ,X0,nL of the Hellinger integral. This is formally the same choice as in part (a) satisfying limλ1pλ=αA, limλ1qλ=βA but in contrast to (a) we now have pλqλβλαλ0 but nevertheless

limλ1pλqλβλαλ=0.

From this, (38), part (b) of Proposition 6 and Lemma A2 we obtain

limλ1Bλ,X0,nL=limλ1expan(qλ)·X0+pλqλk=1nak(qλ)+n·pλqλβλαλ=1 (A27)

and hence

I(PA,nPH,n)limλ11Bλ,X0,nLλ·(1λ)=limλ1Bλ,X0,nL12λ·λan(qλ)X0+pλqλk=1nak(qλ)+npλqλβλαλ=limλ1an(qλ)λX0+λpλqλk=1nak(qλ)+pλqλk=1nak(qλ)λ+nλpλqλβλαλ. (A28)

In the current setup, the first three expressions in (A28) can be evaluated in exactly the same way as in (A25) to (A26), and for the last expression one has the limit

λpλqλβλαλ=pλqλlogαAβHαHβA·βλ+pλqλ·βAβHαAαHλ1αAlogαAβHαHβAβHβA+αH,

which finishes the proof of part (b). □

Proof of Theorem 4.

Let us fix βA,βH,αA,αHPSP\PSP,1, X0N, nN and y[0,[. The lower bound Ey,X0,nL,tan of the Kullback-Leibler information divergence (relative entropy) is derived by using ϕλUϕλ,ytan (cf. (52)), which corresponds to the tangent line of ϕλ at y, as a linear upper bound for ϕλ (λ]0,1[). More precisely, one gets ϕλU(x):=(pλUαλ)+(qλUβλ)x (x[0,[) with pλ:=pλ(y):=ϕλ(y)yϕλ(y)+αλ and qλ:=qλ(y):=ϕλ(y)+βλ, implying qλ>0 because of Properties 3 (P17). Analogously to (A27) and (A28), we obtain from (38) and (40) the convergence limλ1Bλ,X0,nU=1 and thus

I(PA,nPH,n)limλ1an(qλ)λX0+λpλqλk=1nak(qλ)+pλqλk=1nak(qλ)λ+nλpλqλβλαλ. (A29)

As before, we compute the involved derivatives. From (30) to (32) as well as (P17) we get

pλλ=fA(y)fH(y)λfH(y)logfA(y)fH(y)βAyfA(y)fH(y)λ1λβAyfA(y)fH(y)λ1logfA(y)fH(y)+βHyfA(y)fH(y)λ(1λ)βHyfA(y)fH(y)λlogfA(y)fH(y)λ1αAlogfA(y)fH(y)+y·(αAβHαHβA)fH(y), (A30)

and

qλλ=βAfA(y)fH(y)λ1+λβAfA(y)fH(y)λ1logfA(y)fH(y)βHfA(y)fH(y)λ+(1λ)βHfA(y)fH(y)λlogfA(y)fH(y)λ1βA1+logfA(y)fH(y)βHfA(y)fH(y)=:l. (A31)

Combining these two limits we get

λpλqλβλαλ=qλpλλpλqλλ(qλ)2·βλ+pλqλ·βAβHαAαHλ1y·(αAβHαHβA)fH(y)αA1βHfA(y)βAfH(y)+αHαAβHβA.=αHαAβHβA1fA(y)fH(y). (A32)

The above calculation also implies that limλ1λpλqλ is finite and thus limλ1λpλqλk=1nak(qλ)=0 by means of Lemma A2. The proof of I(PA,nPH,n)Ey,X0,nL,tan is finished by using Lemma A3 with l defined in (A31) and by plugging the limits (A30) to (A32) in (A29).

To derive the lower bound Ek,X0,nL,sec (cf. (73)) for fixed kN0, we use as a linear upper bound ϕλU for ϕλ(·) (λ]0,1[) the secant line ϕλ,ksec (cf. (53)) of ϕλ across its arguments k and k+1, corresponding to the choices pλ:=pλ,ksec=(k+1)·ϕλ(k)k·ϕλ(k+1)+αλ and qλ:=qλ,ksec:=ϕλ(k+1)ϕλ(k)+βλ, implying qλ>0 because of Properties 3 (P18). As a side remark, notice that this ϕλU(x) may become positive for some x[0,[ (which is not always consistent with Goal (G1) for fixed λ, but leads to a tractable limit bound as λ tends to 1). Analogously to (A27) and (A28) we get again limλ1Bλ,X0,nU=1, which leads to the lower bound given in (A29) with appropriately plugged-in quantities. As in the above proof of the lower bound Ey,X0,nL,tan, the inequality I(PA,nPH,n)Ek,X0,nL,sec follows straightforwardly from Lemma A2, Lemma A3 and the three limits

pλλ=fA(k)fH(k)λfH(k)·(k+1)logfA(k)fH(k)fA(k+1)fH(k+1)λfH(k+1)·klogfA(k+1)fH(k+1)λ1fA(k)(k+1)logfA(k)fH(k)fA(k+1)klogfA(k+1)fH(k+1),qλλ=fA(k+1)fH(k+1)λfH(k+1)logfA(k+1)fH(k+1)fA(k)fH(k)λfH(k)logfA(k)fH(k)λ1fA(k+1)logfA(k+1)fH(k+1)fA(k)logfA(k)fH(k)=:l,andλpλqλβλαλ=qλpλλpλqλλ(qλ)2·βλ+pλqλ·βAβHαAαHλ1fA(k)logfA(k)fH(k)k+1+αAβAfA(k+1)logfA(k+1)fH(k+1)k+αAβAαAβHβA+αH.

To construct the third lower bound EX0,nL,hor (cf. (74)), we start by using the horizontal line ϕλhor(·) (cf. (54)) as an upper bound of ϕλ. For each fixed λ]0,1[, it is defined by the intercept supxN0ϕλ(x). On PSP,3aPSP,3b, this supremum is achieved at the finite integer point zλ*:=argmaxxN0ϕλ(x) (since limxϕλ(x)=) and there holds ϕλ(zλ*)<0 which leads with the parameters qλ=βλ, pλ=ϕλ(zλ*)+αλ to the Hellinger integral upper bound Bλ,X0,nU=expϕλ(zλ*)·n<1 (cf. Remark 1 (b)). We strive for computing the limit limλ11Bλ,X0,nUλ(1λ), which is not straightforward to solve since in general it seems to be intractable to express zλ* explicitly in terms of λ. To circumvent this problem, we notice that it is sufficient to determine zλ* in a small ϵenvironment ]1ϵ,1[. To accomplish this, we incorporate limλ1ϕλ(x)=0 for all x[0,[ and calculate by using l’Hospital’s rule

limλ1ϕλ(x)1λ=(αA+βAx)logαA+βAxαH+βHx+1(αH+βHx).

Accordingly, let us define z*:=argmaxxN0(αA+βAx)logαA+βAxαH+βHx+1(αH+βHx) (note that the maximum exists since limx(αA+βAx)logαA+βAxαH+βHx+1(αH+βHx)=). Due to continuity of the function (λ,x)ϕλ(x)1λ, there exists an ϵ>0 such that for all λ]1ϵ,1[ there holds zλ*=z*. Applying these considerations, we get with l’Hospital’s rule

I(PA,nPH,n)limλ11expϕλ(z*)·nλ(1λ)=fA(z*)·logfA(z*)fH(z*)1+fH(z*)·n0. (A33)

In fact, for the current parameter constellation PSP,3aPSP,3b we have ϕλ(x)<0 for all λ]0,1[ and all xN0 which implies fA(z*)fH(z*) by Lemma A1; thus, we even get EX0,nL,hor>0 for all nN by virtue of the inequality logfH(z*)fA(z*)>fH(z*)fA(z*)+1.

For the case PSP,2, the above-mentioned procedure leads to zλ*=0=z* (λ]0,1[) which implies ϕλ(zλ*)=0, Bλ,X0,nU1 and thus the trivial lower bound EX0,nL,hor=limλ11Bλ,X0,nUλ(1λ)=0 follows for all nN. In contrast, for the case PSP,3c one gets zλ*=αAαHβHβA=z* (λ]0,1[) which nevertheless also implies ϕλ(zλ*)=0 and hence EX0,nL,hor0. On PSP,4, we have supxN0ϕλ(x)=limxϕλ(x)=0 and hence we set EX0,nL,hor0.

To show the strict positivity EX0,nL>0 in the parameter case PSP,2, we inspect the bound E0,X0,nL,sec. With α:=α:=αA=αH (the bullet will be omitted in this proof) and the auxiliary variable x:=βHβA>0, the definition (73) respectively its special case (76) rewrites for all nN as

E0,X0,nL,sec:=E0,X0,nL,sec(x):=(α+βA)·logα+βAxα+βA+βA(x1)·1(βA)n1βA·X0α1βA+[αβA(1βA)(α+βA)·logα+βAxα+βA+βA(x1)+αβAα+βA·logα+βAxα+βAα(x1)]·n,ifβA1,(α+1)·logα+xα+1+x1·α2·n2+X0+α2·n+(α+1)·logα+xα+1x+1·α·n,ifβA=1. (A34)

To prove that E0,X0,nL,sec>0 for all X0N and all nN it suffices to show that E0,X0,nL,sec(1)=xE0,X0,nL,sec(1)=0 and 2x2E0,X0,nL,sec(x)>0 for all x]0,[\{1}. The assertion E0,X0,nL,sec(1)=0 is trivial from (A34). Moreover, we obtain

xE0,X0,nL,sec(x)=βA·1α+βAα+βAx·1(βA)n1βA·X0α1βA+α·1α+βAα+βAx·βA1βA·n,ifβA1,1α+1α+x·α2·n2+X0α2·n,ifβA=1,

which immediately yields xE0,X0,nL,sec(1)=0. For the second derivative we get

2x2E0,X0,nL,sec(x)=(α+βA)·βA2(α+βAx)2·1(βA)n1βA·X0α1βA+αα+βA(α+βAx)2·βA21βA·n>0,ifβA1,α+1(α+x)2·α2·n2+X0α2·n>0,ifβA=1, (A35)

where the strict positivity of E0,X0,nL,sec in the case βA1 follows immediately by replacing X0 with 0 and by using the obvious relation 11βA·n1βAn1βA=11βAk=0n11βAk>0. The strict positivity in the case βA=1 is trivial by inspection.

For the constellation PSP,4 with parameters β:=β:=βA=βH, αAαH, the strict positivity of EX0,nL>0 follows by showing that Ey,X0,nL,tan converges from above to zero as y tends to infinity. This is done by proving limyy·Ey,X0,nL,tan]0,[. To see this, let us first observe that by l’Hospital’s rule we get

limyy·logαA+βyαH+βy=αAαHβaswellaslimyy·1αA+βyαH+βy=αAαHβ.

From this and (72), we obtain limyy·Ey,X0,nL,tan=(αAαH)2β·n>0 in both cases β1 and β=1.

Finally, for the parameter case PSP,3c we consider the bound Ey*,X0,nL,tan, with y*=αAαHβHβA. Since αA+βAy*=αH+βHy*, it is easy to see that Ey*,X0,nL,tan=0 for all nN. However, the condition yEy,X0,nL,tan(y*)0 implies that supy0Ey,X0,nL,tan>0. The explicit form (75) of this condition follows from

yEy,X0,nL,tan(y)=(αAβHαHβA)2fA(y)fH(y)2·1βAn1βA·X0αA1βA+αAβHαHβAfH(y)2·αAβA(1βA)fA(y)αAβHαHβAβA·n,ifβA1,(αAβHαH)2fA(y)fH(y)2·αA2·n2+X0+αA2·n(αAβHαH)2fH(y)2·n,ifβA=1,

y0, by using the particular choice y=y* together with fA(y*)=fH(y*)=αAβHαHβAβAβH. □

Appendix A.3. Proofs and Auxiliary Lemmas for Section 6

Proof of Lemma 2.

A closed-form representation of a sequence a˜nnN0 defined in (83) to (85) is given by the formula

a˜n=k=0n1c+ρkdn1k. (A36)

This can be seen by induction: from (83) we obtain with a˜0=0 for the first element a˜1=c+ρ0=k=00(c+ρk)dk. Supposing that (A36) holds for the n-th element, the induction step is

a˜n+1=c+d·a˜n+ρn=c+d·k=0n1c+ρkdn1k+ρn=k=0nc+ρkdnk.

In order to obtain the explicit representation of a˜n, we consider first the case 0ν<ϰ<d and ρn=K1·ϰn+K2·νn, which leads to

a˜n=dn1k=0n1c·dk+K1·ϰdk+K2·νdk=dn1·c·1dn1d1+K1·1ϰdn1ϰd+K2·1νdn1νd=c1d(1dn)+K1·dnϰndϰ+K2·dnνndν. (A37)

Hence, for the corresponding sum we get

k=1na˜k=k=1nc1d+K1dϰ+K2dνc1d·dkK1dϰ·ϰkK2dν·νk=c1d·n+K1dϰ+K2dνc1d·d·(1dn)1dK1·ϰ·(1ϰn)(dϰ)(1ϰ)K2·ν·(1νn)(dν)(1ν). (A38)

Consider now the case 0ν<ϰ=d. Then some expressions in (A37) and (A38) have a zero denominator. In this case, the evaluation of (A36) becomes

a˜n=dn1k=0n1c·dk+K1+K2·νdk=dn1·c·1dn1d1+K1·n+K2·1νdn1νd=c1d(1dn)+K1·n·dn1+K2·dnνndν. (A39)

Before we calculate the corresponding sum k=1na˜k, we notice that

k=1nk·dk1=k=1nddk=dk=1ndk=dd·(1dn)1d=1n·dn(1d)dn(1d)2.

Using this fact, we obtain

k=1na˜k=k=1nc1d(1dk)+K1·k·dk1+K2·dkνkdν=c1d·n+k=1nK2dνc1ddk+K1k=1nk·dk1K2dνk=1nνk=K2dνc1dd·(1dn)1d+K1·1n·dn(1d)dn(1d)2K2·ν(1νn)(dν)(1ν)+c1d·n=K1d(1d)+K2dνc1dd·(1dn)1dK2·ν(1νn)(dν)(1ν)+c1dK1·dn1d·n.

Proof of Lemma 3.

(a) In this case we have 0<q<βλ. To prove part (i), we consider the function ξλ(q)(·) on [x0(q),0], the range of the sequence an(q)nN (recall Properties 1 (P1)). For tackling the left-hand inequality in (i), we compare ξλ(q)(x)=q·exβλ with the quadratic function

Υ_λ(q)(x):=q2ex0(q)·x2+qex0(q)1x0(q)·x+x0(q)1qex0(q)+q2ex0(q)x0(q). (A40)

Clearly, one has the relations Υ_λ(q)(x0(q))=x0(q)=ξλ(q)(x0(q)), Υ_λ(q)(x0(q))=q·ex0(q)=ξλ(q)(x0(q)), and Υ_λ(q)(x)<ξλ(q)(x) for all x]x0(q),0]. Hence, Υ_λ(q)(·) is on ]x0(q),0] a strict lower functional bound of ξλ(q)(·). We are now ready to prove the left-hand inequality in (i) by induction. For n=1, we easily see that a_1(q)<a1(q) iff x0(q)1qex0(q)+q2ex0(q)x0(q)<qβλ iff Υ_λ(q)(0)<ξλ(q)(0), and the latter is obviously true. Let us assume that a_n(q)an(q) holds. From this, (93), (78) and (80) we obtain

0<ρ_n(q)=q2ex0(q)x0(q)·q·ex0(q)n2=q2ex0(q)an(q),Tx0(q)2<q2ex0(q)an(q)x0(q)2=Υ_λ(q)an(q)d(q),T·an(q)x0(q)·1d(q),T<ξλ(q)an(q)d(q),T·an(q)x0(q)·1d(q),T<an+1(q)d(q),T·a_n(q)x0(q)·1d(q),T=an+1(q)ξλ(q),T(a_n(q)).

Thus, there holds a_n+1(q)<an+1(q). For the right-hand inequality in (i), we proceed analogously:

Υ¯λ(q)(x):=q2ex0(q)·x2+1q2ex0(q)x0(q)qβλx0(q)·x+qβλ (A41)

satisfies Υ¯λ(q)(x0(q))=x0(q)=ξλ(q)(x0(q)), Υ¯λ(q)(0)=qβλ=ξλ(q)(0) as well as Υ¯λ(q)(x)<ξλ(q)(x) for all x]x0(q),0]. Hence, Υ¯λ(q)(·) is on ]x0(q),0] a strict upper functional bound of ξλ(q)(·). Let us first observe the obvious relation a¯1(q)=qβλ=a1(q)<0, and assume that a¯n(q)an(q) (nN) holds. From this, (95), (79), and (80) we obtain the desired inequality a¯n+1(q)>an+1(q) by

0>ρ¯n(q)=Γ<(q)d(q),Tn·an(q),Sx0(q)=q2ex0(q)an(q),Tx0(q)·an(q),Sq2ex0(q)an(q)x0(q)·an(q)=Υ¯λ(q)an(q)d(q),S·an(q)(qβλ)>ξλ(q)an(q)d(q),S·an(q)(qβλ)an+1(q)d(q),S·a¯n(q)(qβλ)=an+1(q)ξλ(q),S(a¯n(q)).

The explicit representations of the sequences an(q)nN, a_n(q)nN and a¯n(q)nN follow from (86) by incorporating the appropriate constants mentioned in the prelude of Lemma 3. With (83) to (85) and (86) we immediately achieve a_n(q)>an(q),T for all nN. Analogously, for all n2, we get ρ¯n1<0, which implies that a¯n(q)<an(q),S for all n2. For n=1 one obtains ρ¯0=0 as well as a¯1(q)=a1(q),S=a1(q)=qβλ.

For the second part (ii), we employ the representation (A36) which leads to

a_n(q)=k=0n1d(q),Tn1k·ρ_k(q)+x0(q)·(1d(q),T)aswellasa¯n(q)=k=0n1d(q),Sn1k·ρ¯k(q)+(qβλ).

The strict decreasingness of both sequences follows from

ρ_k(q)+x0(q)(1d(q),T)=qex0(q)2x0(q)2d(q),T2n+x0(q)1d(q),TΥ_λ(q)(0)<ξλ(q)(0)=qβλ<0

and from the fact that ρ¯k(q)0 for all kN0 and q<βλ. Part (iii) follows directly from (i), since d(q),T,d(q),S]0,1[.

Let us now prove part (b), where max{0,βλ}<q<min1,eβλ1 is assumed. To tackle part (i), we compare ξλ(q)(x)=q·exβλ with the quadratic function

υ__λ(q)(x):=q2·x2+q·ex0(q)x0(q)·x+x0(q)1qex0(q)+q2x0(q)>0 (A42)

on the interval [0,x0(q)]. Clearly, we have υ__λ(q)x0(q)=ξλ(q)(x0(q))=x0(q), υ__λ(q)(x0(q))=ξλ(q)(x0(q))=qex0(q) and 0<υ__λ(q)(x)<ξλ(q)(x) for all x]0,x0(q)]. Thus, υ__λ(q)(·) constitutes a positive functional lower bound for ξλ(q)(·) on [0,x0(q)]. Let us now prove the left-hand inequality of (i) by induction: for n=1 we get a_1(q)=υ__λ(q)(0)<ξλ(q)(0)=a1(q). Moreover, by assuming a_n(q)an(q) for nN, we obtain with the above-mentioned considerations and (93), (80) and (82)

0<ρ_n(q)=Γ>(q)d(q),S2n=q2·an(q),Sx0(q)2<q2·an(q)x0(q)2=q2an(q)2+q·ex0(q)x0(q)·an(q)+x0(q)·1qex0(q)+q2x0(q)d(q),Tan(q)c(q),T=υ__λ(q)(an(q))d(q),Tan(q)c(q),T<ξλ(q)(an(q))d(q),Tan(q)c(q),T<an+1(q)d(q),Ta_n(q)c(q),T=an+1(q)ξλ(q),T(a_n(q)).

Hence, a_n+1(q)<an+1(q). For the right-hand inequality in part (i), we define the quadratic function

υ¯¯λ(q)(x):=q2·x2+1q2x0(q)qβλx0(q)·x+qβλ, (A43)

which is a functional upper bound for ξλ(q)(·) on the interval [0,x0(q)] since there holds υ¯¯λ(q)(0)=ξλ(q)(0)=qβλ, υ¯¯λ(q)(x0(q))=ξλ(q)(x0(q))=x0(q) and additionally υ¯¯λ(q)(x)=q<qex=ξλ(q)(x) on ]0,x0(q)[. Obviously, a¯1(q)=qβλ=a1(q). By assuming a¯n(q)an(q) for nN, we obtain with (80), (82) and (95)

0>ρ¯n(q)=Γ>(q)·d(q),Sn·1d(q),Tn=q2·x0an(q),S·an(q),T>q2·x0an(q)·an(q)=υ¯¯λ(q)(an(q))x0(q)(qβλ)x0(q)·an(q)(qβλ)>ξλ(q)(an(q))d(q),San(q)c(q),S>ξλ(q)(an(q))d(q),Sa¯n(q),Sc(q),S=an+1(q)ξλ(q),S(a¯n(q)), (A44)

which implies a¯n+1(q)>an+1(q). The explicit representations of the sequences a_n(q)nN and a¯n(q)nN follow from (86) by employing the appropriate constants mentioned in the prelude of Lemma 3. By means of (83) to (85) and (86), we directly get a_n(q)>an(q),T for all nN, whereas a¯n(q)<an(q),S holds only for all n2, since ρ¯0=0 implies that a¯1(q)=a1(q),S=a1(q)=qβλ.

The second part (ii) can be proved in the same way as part (ii) of (a), by employing the representation (A36). For the lower bound one has

a_n(q)=k=0n1d(q),Tn1k·c(q),T+ρ_k(q),withc(q),T>0andρ_k(q)>0.

For the upper bound we get

a¯n(q)=k=0n1d(q),Sn1k·c(q),S+ρ¯k(q),

hence it is enough to show c(q),S+ρ¯n(q)>0 for all nN0. Considering the first two lines of calculation (A44) and incorporating c(q),S=qβλ, this can be seen from

c(q),S+ρ¯n(q)>υ¯¯λ(q)(an(q))x0(q)(qβλ)x0(q)·an(q)=υ¯¯λ(q)(an(q))d(q),S·an(q)>0,

because on [0,x0(q)] there holds d(q),S·x<x<υ¯¯λ(q)(x). The last part (iii) can be easily deduced from (i) together with limnn·d(q),Sn1=0. □

The proofs of all Theorems 5–9 are mainly based on the following

Lemma A4.

Recall the quantity B˜λ,X0,n(p,q) from (42) for general p0,q>0 (notice that we do not consider parameters p<0, q0 in Section 6) as well as the constants d(q),T,d(q),S and Γ<(q),Γ>(q) defined in (76), (77) and (91). For all βA,βH,αA,αH,λP×R\{0,1}, all initial population sizes X0N and all observation horizons nN there holds

  • (a) 
    in the case p0 and 0<q<βλ
    B˜λ,X0,n(p,q)exp{x0(q)·X0pq·d(q),T1d(q),T·1d(q),Tn+pq·βλ+x0(q)αλ·n+ζ_n(q)·X0+pq·ϑ_n(q)}=:Cλ,X0,n(p,q),L, (A45)
    B˜λ,X0,n(p,q)exp{x0(q)·X0pq·d(q),S1d(q),S·1d(q),Sn+pq·βλ+x0(q)αλ·nζ¯n(q)·X0pq·ϑ¯n(q)}=:Cλ,X0,n(p,q),U, (A46)
    whereζ_n(q):=Γ<(q)·d(q),Tn11d(q),T·1d(q),Tn>0, (A47)
    ϑ_n(q):=Γ<(q)·1d(q),Tn1d(q),T2·1d(q),T1+d(q),Tn1+d(q),T>0, (A48)
    ζ¯n(q):=Γ<(q)·d(q),Snd(q),Tnd(q),Sd(q),Td(q),Sn1·1d(q),Tn1d(q),T>0, (A49)
    ϑ¯n(q):=Γ<(q)·d(q),T1d(q),T·1d(q),Sd(q),Tn1d(q),Sd(q),Td(q),Snd(q),Tnd(q),Sd(q),T>0. (A50)
  • (b) 
    in the case p0 and 0<q=βλ
    B˜λ,X0,n(p,q)=exppq·βλ+x0(q)αλ·n=exppαλ·n.
  • (c) 
    in the case p0 and max{0,βλ}<q<min1,eβλ1 the bounds Cλ,X0,n(p,q),L and Cλ,X0,n(p,q),U from (96) and (97) remain valid, but with
    ζ_n(q):=Γ>(q)·d(q),Tnd(q),S2nd(q),Td(q),S2>0, (A51)
    ϑ_n(q):=Γ>(q)d(q),Td(q),S2·d(q),T·1d(q),Tn1d(q),Td(q),S2·1d(q),S2n1d(q),S2>0, (A52)
    ζ¯n(q):=Γ>(q)·d(q),Sn1·n1d(q),Tn1d(q),T>0, (A53)
    ϑ¯n(q):=Γ>(q)·[d(q),Sd(q),T1d(q),S21d(q),T·1d(q),Sn+d(q),T1d(q),Sd(q),Tn1d(q),T1d(q),Sd(q),Td(q),Sn1d(q),S·n]. (A54)
  • (d) 
    for the special choices p:=pλE:=αAλαH1λ>0,q:=qλE:=βAλβH1λ>0 in the parameter setup βA,βH,αA,αH,λ(PNIPSP,1)×]λ,λ+[\{0,1} we obtain
    limn1nlogVλ,X0,n=limn1nlogCλ,X0,n(pλE,qλE),L=limn1nlogCλ,X0,n(pλE,qλE),U=αAβA·x0(qλE).
  • (e) 
    for all general p0 with either 0<q<βλ or max{0,βλ}<q<min1,eβλ1 we get
    limn1nlogB˜λ,X0,n(p,q)=limn1nlogCλ,X0,n(p,q),L=limn1nlogCλ,X0,n(p,q),U=pq·βλ+x0(q)αλ.

Proof of Lemma A4.

The closed-form bounds Cλ,X0,n(p,q),L and Cλ,X0,n(p,q),U are obtained by substituting in the representation (42) (for B˜λ,X0,n(p,q), cf. Theorem 1) the recursive sequence member an(q) by the explicit sequence member a_n(q) respectively a¯n(q). From the definitions of these sequences (92) to (95) and from (83) to (85) one can see that we basically have to evaluate the term

expa˜nhom+c˜n·X0+pq·k=1na˜khom+c˜k+pq·βλαλ·n, (A55)

where a˜nhom+c˜n=a˜n is either interpreted as the lower approximate a_n(q) or as the upper approximate a¯n(q). After rearranging and incorporating that c(q),S1d(q),S=c(q),T1d(q),T=x0(q) in both approximate cases, we obtain with the help of (86), (87) for the expression (A55) in the case 0ν<ϰ<d

exp{x0(q)·1dn·X0pq·d1d+pq·βλ+x0(q)αλ·n+K1·dnϰndϰ+K2·dnνndν·X0+pq·K1dϰ+K2dν·d·1dn1dK1·ϰ·1ϰn(dϰ)(1ϰ)K2·ν·1νn(dν)(1ν)}. (A56)

In the other case 0ν<ϰ=d, the application of (88), (89) turns (A55) into

exp{x0(q)·1dn·X0pq·d1d+pq·βλ+x0(q)αλ·n+K1·n·dn1+K2·dnνndν·X0+pq·K1d(1d)+K2dν·d·1dn1dK2·ν·1νn(dν)(1ν)K1·dn1d·n}. (A57)

After these preparatory considerations let us now begin with elaboration of the details.

(a) Let 0<q<βλ. We obtain a closed-form lower bound for B˜λ,X0,n(p,q) by employing the parameters c=^c(q),T, d=^d(q),T, K2=ν=0, K1=Γ<(q), and ϰ=d(q),T2, cf. (93) in combination with (85). Since ϰ<d(q),T, we have to plug in these parameters into (A56). The representations of ζ_n(q) and ϑ_n(q) in (A47) and (A48) follow immediately. For a closed-form upper bound, we employ the parameters c=^c(q),S, d=^d(q),S, K1=K2=Γ<(q), ϰ=d(q),T and ν=d(q),Sd(q),T (in particular, ϰ<d(q),S implying that we have to use (A56)). From this, (A49) can be deduced directly; the representation (A50) comes from the expressions in the squared brackets in the last line of (A56) and from

Γ<(q)d(q),Sd(q),TΓ<(q)d(q),Sd(q),Sd(q),T·d(q),S·1d(q),Sn1d(q),S+Γ<(q)·d(q),T·1d(q),Tnd(q),Sd(q),T1d(q),TΓ<(q)·d(q),Sd(q),T·1d(q),Sd(q),Tnd(q),Sd(q),Sd(q),T1d(q),Sd(q),T=Γ<(q)·d(q),T1d(q),Sd(q),Sd(q),Sd(q),T1d(q),T·d(q),S·1d(q),Sn1d(q),S+Γ<(q)·d(q),T·1d(q),Tnd(q),Sd(q),T1d(q),TΓ<(q)·d(q),T·1d(q),Sd(q),Tn1d(q),T1d(q),Sd(q),T=Γ<(q)·d(q),T1d(q),T·1d(q),Sd(q),Tn1d(q),Sd(q),T+1d(q),Snd(q),Sd(q),T1d(q),Tnd(q),Sd(q),T=Γ<(q)·d(q),T1d(q),T·1d(q),Sd(q),Tn1d(q),Sd(q),Td(q),Snd(q),Tnd(q),Sd(q),T=ϑ¯n(q).

Part (b) has already been mentioned in Remark 1 (b) and is due to the fact that for 0<q=βλ, the sequence an(q)nN is itself explicitly representable by an(q)=0 for all nN (cf. Properties 1 (P2)). Plugging this into (42) gives the desired result.

(c) Let us now consider max{0,βλ}<q<min{1,eβλ1}. For a closed-form lower bound for B˜λ,X0,n(p,q) we have to employ the parameters c=^c(q),T, d=^d(q),T, K2=ν=0, K1=Γ>(q) and ϰ=d(q),S2, cf. (93) in combination with (85). The representations of ζ_n(q) and ϑ_n(q) in (A51) and (A52) follow immediately from (A56). For a closed-form upper bound, we use the parameters c=^c(q),S, d=^d(q),S, K1=K2=Γ>(q), ϰ=d(q),S and ν=d(q),Sd(q),T. Notice that in this case we stick to the representation (A57). The formula (104) is obviously valid, and (105) is implied by

Γ>(q)d(q),S1d(q),S+Γ>(q)d(q),Sd(q),Sd(q),T·d(q),S·1d(q),Sn1d(q),S=Γ>(q)·d(q),Sd(q),T1d(q),S21d(q),T·1d(q),Sn.

The parts (d) and (e) are trivial by incorporating that in all respective cases one has d(q),S]0,1[, d(q),T]0,1[ and limnn·d(q),S=0. □

Proof of Theorem 5.

(a) For λ]0,1[, we get 0<qλE<βλ and the assertion follows by applying part (a) of Lemma A4. Notice that in the current subcase PNIPSP,1 there holds pλEqλEβλαλ=0 as well as pλEqλE=αAβA=αHβH. For the case λR\[0,1], one gets from Lemma A1 that max{0,βλ}<qλE, and there holds qλE<min{1,eβλ1} iff λ]λ,λ+[\[0,1], cf. Lemma 1. Thus, an application of part (c) of Lemma A4 proves the desired result. The assertion (b) is equivalent to part (d) of Lemma A4. □

Proof of Theorem 6.

The assertions follow immediately from (A45), Lemma A4(b),(e), Proposition 6(d) as well as the incorporation of the fact that for λ]0,1[ there holds qλL=βAλβH1λ<βλ in the case βA,βH,αA,αH(PSP\(PSP,1PSP,4)) (i.e., βAβH) respectively qλL=βλ in the case βA,βH,αA,αHPSP,4 (i.e., βA=βH). □

Proof of Theorem 7.

This can be deduced from (A46), from the parts (b), (c) and (e) of Lemma A4 as well as the incorporation of pλUαAλαH1λ>0 for λ]0,1[. Notice that an inadequate choice of pλU,qλU may lead to pλUqλU(βλ+x0(qλU))αλ>0. □

Proof of Theorem 8.

The assertions follow immediately from (A45) and from the parts (b), (c) and (e) of Lemma A4. Notice that an inadequate choice of pλL,qλL may lead to pλLqλL(βλ+x0(qλU))αλ<0. □

Proof of Theorem 9.

Let pλU=αAλαH1λ>max{0,αλ} and qλU=βAλβH1λ>max{0,βλ}. Since qλU<min{1,eβλ1} iff λ]λ,λ+[\[0,1] (cf. Lemma 1 for qλ:=qλU)), this theorem follows from (A46) of Lemma A4, from the parts (b), (e) of Lemma A4 and from part (d) of Proposition 14. □

Appendix A.4. Proofs and Auxiliary Lemmas for Section 7

Proof of Theorem 10.

As already mentioned above, one can adapt the proof of Theorem 9.1.3 in Ethier & Kurtz [138] who deal with drift-parameters η=0, κ=0, and the different setup of σindependent time-scale and a sequence of critical Galton-Watson processes without immigration with general offspring distribution. For the sake of brevity, we basically outline here only the main differences to their proof; for similar limit investigations involving offspring/immigration distributions and parametrizations which are incompatble to ours, see e.g., Sriram [142].

As a first step, let us define the generator

Af(x):=ηκ·x·f(x)+σ22·x·f(x),fCc[0,),

which corresponds to the diffusion process X˜ governed by (133). In connection with (130), we study

T(m)f(x):=EPf1mk=1mxY0,k(m)+Y˜0(m),xE(m):=1mN0,fCc([0,),

where the Y0,k(m), Y˜0(m) are independent and (Poisson-β(m) respectively Poisson-α(m)) distributed as the members of the collection Y(m) respectively Y˜(m). By the Theorems 8.2.1 and 1.6.5 as well as Corollary 4.8.9 of [138] it is sufficient to show

limmsupxE(m)σ2mT(m)f(x)f(x)Af(x)=0,fCc[0,). (A58)

But (A58) follows mainly from the next

Lemma A5.

Let

Sn(m):=1nk=1nY0,k(m)β(m)+Y˜0(m)α(m),nN,mN¯,

with the usual convention S0(m):=0. Then for all mN¯, xE(m) and all fCc[0,)

ϵ(m)(x):=EP01Smx(m)2x(1v)fβ(m)x+α(m)m+vxmSmx(m)f(x)dv=1σ2·σ2m·T(m)f(x)f(x)Af(x)+R(m),wherelimmR(m)=0. (A59)

Proof of Lemma A5.

Let us fix fCc[0,). From the involved Poissonian expectations it is easy to see that

limmσ2mT(m)f(0)f(0)Af(0)=0,

and thus (A49) holds for x=0. Accordingly, we next consider the case xE(m)\{0}, with fixed mN¯. From EPSmx(m)2=β(m)+α(m)mx we obtain

EPSmx(m)2xf(x)01(1v)dv=12β(m)·x+α(m)mf(x)=:amxf(x)2=:af(x)2. (A60)

Furthermore, with bmx:=b:=a+x/m·Smx(m)=1mk=1mxY0,k(m)+Y˜0(m) we get on {Smx(m)0}

01fβ(m)x+α(m)m+vxmSmx(m)dv=mx·1Smx(m)abf(y)dy=mx·f(b)f(a)Smx(m) (A61)

as well as

01vfβ(m)x+α(m)m+vxmSmx(m)dv=mxSmx(m)2abyf(y)dyaabf(y)dy=mx·f(b)Smx(m)+mx·f(a)f(b)Smx(m)2. (A62)

With our choice β(m)=1κσ2m and α(m)=β(m)·ησ2, a Taylor expansion of f at x gives

f(a)=f(x)+1σ2m·f(x)β(m)·ηκ·x+o1m, (A63)

where for the case η=κ=0 we use the convention o1m0. Combining (A60) to (A63) and the centering EPSmx(m)=0, the left hand side of Equation (A59) becomes

EP01Smx(m)2x(1v)fβ(m)x+α(m)m+vxmSmx(m)f(x)dv=EPmx·Smx(m)·f(b)f(a)EPmx·Smx(m)·f(b)+m·(f(a)f(b))12β(m)·x+α(m)m·f(x)=m·EPf(b)f(a)12β(m)·x+α(m)m·f(x)=m·EPf1mk=1mxY0,k(m)+Y˜0f(x)1σ2Af(x)+1σ2ηκ·xβ(m)·η+κ·x·f(x)+x21β(m)α(m)m·f(x)m·o1m

which immediately leads to the right hand side of (A59). □

To proceed with the proof of Theorem 10, we obtain for m2κ/σ2 the inequality β(m)1/2 and accordingly for all v]0,1[, xE(m)

β(m)x+α(m)m+vxmSmx(m)=(1v)·x·β(m)+(1v)α(m)m+vk=1mxY0,k(m)+Y˜0x·1v2.

Suppose that the support of f is contained in the interval [0,c]. Correspondingly, for v12c/x the integrand in ϵ(m)(x) is zero and hence with (A64) we obtain the bounds

01Smx(m)2x(1v)fβ(m)x+α(m)m+vxmSmx(m)f(x)dv0(12c/x)1Smx(m)2x(1v)·2fdvx·Smx(m)212cx2f.

From this, one can deduce limmsupxE(m)ϵ(m)(x)=0–and thus (A58) – in the same manner as at the end of the proof of Theorem 9.1.3 in [138] (by means of the dominated convergence theorem).

Proof of Proposition 15.

Let (κA,κH,η)P˜NIP˜SP,1 be fixed. We have to find those orders λR\[0,1] which satisfy for all sufficiently large mN¯

qλ(m)=1κAσ2mλ1κHσ2m1λ<min1,eβλ(m)1. (A64)

In order to achieve this, we interpret qλ(m)=qλ1m in terms of the function

qλ(x):=1κAσ2·xλ1κHσ2·x1λ,x]ϵ,ϵ[, (A65)

for some small enough ϵ>0 such that (A65) is well-defined. Since βλ(m)1=κλσ2·m=κλσ2·x=λκA+(1λ)κHσ2·x, for the verification of (A64) it suffices to show

limx01qλ(x)x>0, (A66)
andlimx0eκλσ2·xqλ(x)x2>0. (A67)

By l’Hospital’s rule, one gets limx01qλ(x)x=λκA+(1λ)κHσ2=κλσ2 and hence

(A66)λ<κHκHκA,ifκA<κH,λ>κHκAκH,ifκA>κH. (A68)

To find a condition that guarantees (A67), we use l’Hospital’s rule twice to deduce

limx0eκλσ2·xqλ(x)x2=12σ4κλ2λ(λ1)(κAκH)2=12σ4λκA2+(1λ)κH2

and hence we obtain

(A67)λ<κH2κH2κA2,ifκA<κH,λ>κH2κA2κH2,ifκA>κH. (A69)

To compare both the lower and upper bounds in (A68) and (A69), let us calculate

κH2κH2κA2κHκHκA=κAκH(κHκA)(κH+κA)<0,ifκA<κH,>0,ifκA>κH. (A70)

Incorporating this, we observe that both conditions (A66) and (A67) are satisfied simultaneously iff

λ<minκHκHκA,κH2κH2κA2=κH2κH2κA2ifκA<κH,λ>maxκHκAκH,κH2κA2κH2=κH2κA2κH2ifκA>κH,

which finishes the proof. □

The following lemma is the main tool for the proof of Theorem 11.

Lemma A6.

Let (κA,κH,η,λ)(P˜NIP˜SP,1)×λ˜,λ˜+\{0,1}. By using the quantities κλ:=λκA+(1λ)κH and Λλ:=λκA2+(1λ)κH2 from (150) (which is well-defined, cf. (138)), one gets for all t>0

(a)limmm·1qλ(m)=limmm·1βλ(m)=κλσ2.(b)limmm2·a1(m)=limmm2·qλ(m)βλ(m)=λ(1λ)κAκH22σ4=Λλ2κλ22σ4.(c)limmm·x0(m)=Λλκλσ2<0,ifλ]0,1[,>0,ifλ]λ˜,λ˜+[\[0,1].(d)limmm2·Γ<(m)=limmm2·Γ>(m)=(Λλκλ)22σ4>0.(e)limmm·(1d(m),S)=Λλ+κλ2σ2>0.(f)limmm·(1d(m),T)=Λλσ2>0.(g)limmm·(1d(m),Sd(m),T)=3Λλ+κλ2σ2>0.(h)limmd(m),Sσ2mt=expΛλ+κλ2·t<1.(i)limmd(m),Tσ2mt=expΛλ·t<1.(j)limmd(m),Sd(m),Tσ2mt=exp3Λλ+κλ2·t<1.(k)forλ]0,1[,thereholdsfortherespectivequantitiesdefinedin(142)to(145)limmm·ζ_σ2mt(m)=Λλκλ22σ2·Λλ·eΛλ·t·1eΛλ·t>0,limmϑ_σ2mt(m)=14·ΛλκλΛλ2·1eΛλ·t2>0,limmm·ζ¯σ2mt(m)=Λλκλ2σ2·e12(Λλ+κλ)·teΛλ·tΛλκλe12(Λλ+κλ)·t1eΛλ·t2·Λλ>0,limmϑ¯σ2mt(m)=Λλκλ2Λλ·1e123Λλ+κλ·t3Λλ+κλ+eΛλ·te12(Λλ+κλ)·tΛλκλ>0.(l)forλ]λ˜,λ˜+[\[0,1],thereholdsfortherespectivequantitiesdefinedin(146)to(149)limmm·ζ_σ2mt(m)=Λλκλ22σ2·κλ·eΛλ·t·1eκλ·t>0,limmϑ_σ2mt(m)=Λλκλ22·κλ·1eΛλ·tΛλ1e(Λλ+κλ)·tΛλ+κλ>0,limmm·ζ¯σ2mt(m)=Λλκλ22·σ2·e12(Λλ+κλ)·t·t1eΛλ·tΛλ>0,limmϑ¯σ2mt(m)=Λλκλ2·[Λλκλ1e12(Λλ+κλ)·tΛλ·Λλ+κλ2+1e12(3Λλ+κλ)·tΛλ·3Λλ+κλe12(Λλ+κλ)·tΛλ+κλ·t]>0.

Proof of Lemma A6.

For each of the assertions (a) to (l), we will make use of l’Hospital’s rule. To begin with, we obtain for arbitrary μ,νR

limmm1(βA(m))μ(βH(m))ν=limmm2μ·(βA(m))μ1(βH(m))νκAσ2m2+ν·(βA(m))μ(βH(m))ν1κHσ2m2=μκAσ2+νκHσ2. (A71)

From this, the first part of (a) follows immediately and the second part is a direct consequence of the definition of βλ(m). Part (b) can be deduced from (A71):

limmm2·a1(m)=limmm2σ2·[λ·κA1(βA(m))λ1(βH(m))1λ+(1λ)·κH1(βA(m))λ(βH(m))λ]=λ(1λ)(κAκH)22σ4=Λλ2κλ22σ4.

For the proof of (c), we rely on the inequalities x_0(m)x0(m)x¯0(m) (mN), where x_0(m) and x¯0(m) are the obvious notational adaptions of (124) and (126), respectively. Notice that x_0(m) and x¯0(m) are solutions of the (again adapted) quadratic equations Q_λ(m)(x)=x resp. Q¯λ(m)(x)=x (cf. (127) and (128)). These solutions clearly exist in the case λ]0,1[. For sufficiently large approximations steps mN, these solutions also exist in the case λ]λ˜,λ˜+[\[0,1] since (138) together with parts (a) and (b) imply

limmm·(1qλ(m))22·qλ(m)·m2·a1(m)=σ2·λκA2+(1λ)κH2>0,forλ]λ˜,λ˜+[\[0,1].

To prove part (c), we show that the limits of x_0(m) and x¯0(m) coincide. Assume first that λ]0,1[. Using (a) and (b), we obtain together with the obvious limit limmqλ(m)=1

limmm·x¯0(m)=limmqλ(m)1·m·(1qλ(m))m·(1qλ(m))22·qλ(m)·m2·a1(m)=κλσ2κλσ22+Λλ2κλ2σ4=Λλκλσ2. (A72)

Let x__0(m) be the adapted version of the auxiliary fixed-point lower bound defined in (125). By incorporating limmβλ(m)=1 we obtain with (a) and (b)

limmx__0(m)=limmmaxβλ(m),qλ(m)βλ(m)1qλ(m)=limm1m·m2·a1(m)m·1qλ(m)=0,

which implies

limmm·x_0(m)=limmex__0(m)qλ(m)·m·(1qλ(m))m·(1qλ(m))22·ex__0(m)qλ(m)·m2·a1(m)=κλσ2κλσ22+Λλ2κλ2σ4=Λλκλσ2. (A73)

Combining (A72) and (A73), the desired result (c) follows for λ]0,1[. Assume now that λ]λ˜,λ˜+[\[0,1]. In this case the approximates x_0(m) and x¯0(m) have a different form, given in (124) and (126). However, the calculations work out in the same way: with parts (a) and (b) we get

limmm·x_0(m)=limm1qλ(m)·m·1qλ(m)m·(1qλ(m))22·qλ(m)·m2·a1(m)=κλσ2κλσ22+Λλ2κλ2σ4=Λλκλσ2,

as well as

limmm·x¯0(m)=limmm·1qλ(m)m·(1qλ(m))22·m2·a1(m)=κλσ2κλσ22+Λλ2κλ2σ4=Λλκλσ2,

which finally finishes the proof of part (c). Assertion (d) is a direct consequence of (c). Since the representations of the parameters c(m),S,d(m),S,c(m),T,d(m),T are the same in both cases λ]0,1[ and λ]λ˜,λ˜+[\[0,1], the following considerations hold generally. Part (e) follows from (b) and (c) by

limmm·(1d(m),S)=limmm2·a1(m)m·x0(m)=Λλ+κλ2σ2>0.

Notice that this term is positive since on ]λ˜,λ˜+[\{0,1} there holds κλ>0 as well as Λλ>0, cf. (A70). To prove (f), we apply the general limit limx0ex1x=1 and get with (a), (c)

limmm·(1d(m),T)=limmm·1qλ(m)qλ(m)·m·x0(m)·ex0(m)1x0(m)=Λλσ2.

The limit (g) can be obtained from (e) and (f):

limmm·(1d(m),Sd(m),T)=limmm·(1d(m),S)+d(m),S·m·(1d(m),T)=3Λλ+κλ2σ2.

The assertions (h) resp. (i) resp. (j) follow from (e) resp. (f) resp. (g) by using the general relation limm1+xmmm=explimmxm. To get the last two parts (k) and (l), we make repeatedly use of the results (a) to (j) and combine them with the formulas (142) to (149) of Corollary 14. More detailed, for λ]0,1[(and thus qλ(m)<βλ(m)) we obtain

m·ζ_σ2mt(m)=m2·Γ<(m)·d(m),Tσ2mt1m·1d(m),T·1d(m),Tσ2mtmΛλκλ22σ2·Λλ·eΛλ·t·1eΛλ·t>0,ϑ_σ2mt(m)=m2·Γ<(m)·1d(m),Tσ2mtm·1d(m),T2·1d(m),T1+d(m),Tσ2mt1+d(m),Tm14·ΛλκλΛλ2·1eΛλ·t2>0,
m·ζ¯σ2mt(m)=m2·Γ<(m)·[d(m),Sσ2mtd(m),Tσ2mtm·1d(m),Tm·1d(m),Sd(m),Sσ2mt1·1d(m),Tσ2mtm·1d(m),T]mΛλκλ2σ2·e12(Λλ+κλ)·teΛλ·tΛλκλe12(Λλ+κλ)·t1eΛλ·t2·Λλ>0,
ϑ¯σ2mt(m)=m2·Γ<(m)·d(m),Tm·1d(m),T·1d(m),Sd(m),Tσ2mtm·1d(m),Sd(m),Td(m),Sσ2mtd(m),Tσ2mtm·1d(m),Tm·1d(m),SmΛλκλ2Λλ·1e123Λλ+κλ·t3Λλ+κλ+eΛλ·te12(Λλ+κλ)·tΛλκλ>0.

For λ]λ˜,λ˜+[\[0,1](and thus qλ(m)>βλ(m)) we get

m·ζ_σ2mt(m)=m2·Γ>(m)·d(m),Tσ2mtd(m),S2·σ2mtm·1d(m),S1+d(m),Sm·1d(m),TmΛλκλ22σ2·κλ·eΛλ·t·1eκλ·t>0,ϑ_σ2mt(m)=m2·Γ>(m)m·1d(m),S1+d(m),Sm·1d(m),T·d(m),T·1d(m),Tσ2mtm·1d(m),Td(m),S2·1d(m),S2·σ2mtm·1d(m),S1+d(m),SmΛλκλ22·κλ·1eΛλ·tΛλ1e(Λλ+κλ)·tΛλ+κλ>0,m·ζ¯σ2mt(m)=m2·Γ>(m)·d(m),Sσ2mt1·σ2mtm1d(m),Tσ2mtm·1d(m),TmΛλκλ22·σ2·e12(Λλ+κλ)·t·t1eΛλ·tΛλ>0,
ϑ¯σ2mt(m)=m2·Γ>(m)·[m·1d(m),Tm·1d(m),Sm2·1d(m),S2·m·1d(m),T·1d(m),Sσ2mt+d(m),T1d(m),Sd(m),Tσ2mtm·1d(m),T·m·1d(m),Sd(m),Td(m),Sσ2mtm·1d(m),S·σ2mtm]mΛλκλ2·Λλκλ1e12(Λλ+κλ)·tΛλ·Λλ+κλ2+1e12(3Λλ+κλ)·tΛλ·3Λλ+κλe12(Λλ+κλ)·tΛλ+κλ·t>0.

Proof of Theorem 11.

It suffices to compute the limits of the bounds given in Corollary 14 as m tends to infinity. This is done by applying Lemma A6 which provides corresponding limits of all quantities of interest. Accordingly, for all t>0 the lower bound (153) in the case λ]0,1[ can be obtained from (140), (142) and (143) by

limmexp{x0(m)·X0(m)ησ2·d(m),T1d(m),T1d(m),Tσ2mt+x0(m)ησ2·σ2mt+ζ_σ2mt(m)·X0(m)+ϑ_σ2mt(m)}=limmexp{m·x0(m)·X0(m)mησ2·d(m),Tm·1d(m),T1d(m),Tσ2mt+m·x0(m)ησ2·σ2mtm+m·ζ_σ2mt(m)·X0(m)m+ϑ_σ2mt(m)}=exp{Λλκλσ2·X˜0ησ2·σ2Λλ1eΛλtΛλκλσ2·ησ2·σ2t+Λλκλ22σ2·Λλ·eΛλ·t·1eΛλ·t·X˜0+η4σ2·ΛλκλΛλ2·1eΛλ·t2}=expΛλκλσ2X˜0ηΛλ1eΛλ·tησ2Λλκλ·t+Lλ(1)(t)·X˜0+ησ2·Lλ(2)(t).

For all t>0, the upper bound (154) in the case λ]0,1[ follows analogously from (141), (144), (145) by

limmexp{x0(m)·X0(m)ησ2·d(m),S1d(m),S1d(m),Sσ2mt+x0(m)ησ2·σ2mtζ¯σ2mt(m)·X0(m)ϑ¯σ2mt(m)}=limmexp{m·x0(m)·X0(m)mησ2·d(m),Sm·1d(m),S1d(m),Sσ2mt+m·x0(m)ησ2·σ2mtmm·ζ¯σ2mt(m)·X0(m)mϑ¯σ2mt(m)}=exp{Λλκλσ2X˜0ησ2·2σ2Λλ+κλ1e12(Λλ+κλ)tΛλκλσ2·ησ2·σ2tΛλκλ2σ2·e12(Λλ+κλ)·teΛλ·tΛλκλe12(Λλ+κλ)·t1eΛλ·t2·Λλ·X˜0ησ2Λλκλ2Λλ·1e123Λλ+κλ·t3Λλ+κλ+eΛλ·te12(Λλ+κλ)·tΛλκλ}=exp{Λλκλσ2X˜0η12(Λλ+κλ)1e12(Λλ+κλ)·tησ2Λλκλ·tUλ(1)(t)·X˜0ησ2·Uλ(2)(t)}.

In the case λ]λ˜,λ˜+[\[0,1], the lower bound as well as the upper bound of the Hellinger integral limit is obtained analogously, by taking into account that the quantities ζ_n(m),ϑ_n(m),ζ¯n(m),ϑ¯n(m) now have the form (146) to (149) instead of (142) to (145). Thus, the functions Lλ(1)(t),Uλ(1)(t),Lλ(2)(t),Uλ(2)(t) are obtained by employing the limits of part (l) of Lemma A6 instead of part (k). □

The next Lemma (and parts of its proof) will be useful for the verification of Theorem 12:

Lemma A7.

Recall the bounds on the Hellinger integral mlimit given in (153) and (154) of Theorem 11, in terms of Lλ(i)(t) and Uλ(i)(t) (i=1,2) defined by (155) to (158). Correspondingly, one gets the following λlimits for all t[0,[:

  • (a) 
    for all κA]0,[ and all κH[0,[ with κAκH
    limλ1Lλ(1)(t)λ=limλ1Lλ(2)(t)λ=limλ1Uλ(1)(t)λ=limλ1Uλ(2)(t)λ=0. (A74)
  • (b) 
    for κA=0 and all κH]0,[
    limλ1Lλ(1)(t)λ=κH2·t2σ2, (A75)
    limλ1Lλ(2)(t)λ=κH2·t24, (A76)
    limλ1Uλ(1)(t)λ=limλ1Uλ(2)(t)λ=0. (A77)

Proof of Lemma A7.

For all κA,κH[0,[ with κAκH one can deduce from (150) as well as (155) to (158) the following derivatives:

Lλ(1)(t)λ=12σ2{t2ΛλκλΛλ2κA2κH22e2ΛλteΛλt+eΛλt1eΛλtΛλΛλκλΛλκA2κH22Λλ(κAκH)ΛλκλΛλ2κA2κH22}, (A78)
Lλ(2)(t)λ=14{ΛλκλΛλ·1eΛλtΛλ2·κA2κH22Λλ(κAκH)ΛλκλΛλκA2κH2+t·eΛλt·ΛλκλΛλ2·1eΛλtΛλ·κA2κH2}, (A79)
Uλ(1)(t)λ=1σ2{Λλκλ2ΛλteΛλtκA2κH2t2e12(Λλ+κλ)tκA2κH2+2Λλ(κAκH)e12(Λλ+κλ)teΛλt2Λλ·κA2κH22Λλ(κAκH)+Λλκλ2Λλ2[t2e12(Λλ+κλ)tκA2κH2+2Λλ(κAκH)t2e12(3Λλ+κλ)t3κA2κH2+2Λλ(κAκH)+e12(Λλ+κλ)t·1eΛλtΛλ·κA2κH2]+ΛλκλΛλκA2κH22Λλ(κAκH)e12(Λλ+κλ)teΛλtΛλκλe12(Λλ+κλ)t1eΛλt2Λλ}, (A80)
Uλ(2)(t)λ=Λλκλ2Λλ(3Λλ+κλ)[t2e12(3Λλ+κλ)t3κA2κH22Λλ+κAκH1e12(3Λλ+κλ)t3Λλ+κλ·3κA2κH22Λλ+κAκH]+ΛλκλΛλt2e12(Λλ+κλ)tκA2κH22Λλ+κAκHteΛλtκA2κH22Λλ+e12(Λλ+κλ)teΛλtΛλκA2κH22ΛλκA+κH+2κA2κH22ΛλκA+κHΛλκλΛλ2·κA2κH22·1ΛλΛλκλ3Λλ+κλ1e12(3Λλ+κλ)te12(Λλ+κλ)t+eΛλt. (A81)

If κA]0,[ and κH[0,[ with κAκH, then one gets limλ1Λλ=limλ1κλ=κA>0 which implies (A74) from (A78) to (A81). For the proof of part (b), let us correspondingly assume κA=0 and κH]0,[, which by (150) leads to κλ=κH·(1λ), Λλ=κH·1λ and the convergences limλ1Λλ=limλ1κλ=0. From this, the assertions (A75), (A76), (A77) follow in a straightforward manner from (A78), (A79), (A80) – respectively – by using (parts of) the obvious relations

limλ1κλΛλ=0,limλ1Λλ±κλΛλ=limλ1ΛλκλΛλ+κλ=1, (A82)
limλ11ecλ·tcλ=tforallcλΛλ,Λλ+κλ2,3Λλ+κλ2. (A83)

In order to get the last assertion in (A77), we make use of the following limits

limλ11Λλκλ33Λλ+κλ=limλ14κH(κHκH·1λ)·(3κH+κH·1λ)=43κH (A84)

and

limλ11Λλ1e12(3Λλ+κλ)t3Λλ+κλ1eΛλtΛλκλ+1e12(Λλ+κλ)tΛλκλ=0. (A85)

To see (A85), let us first observe that the involved limit can be rewritten as

limλ1{1Λλ(Λλκλ)1313e12(3Λλ+κλ)t+eΛλte12(Λλ+κλ)t (A86)
+1e12(3Λλ+κλ)tΛλ13Λλ+κλ13(Λλκλ)}. (A87)

Substituting x:=1λ and applying l’Hospital’s rule twice, we get for the first limit (A86)

limx01313eκHt2(3x+x2)+eκHtxeκHt2(x+x2)κH2·x2x3=limx0κHt6(3+2x)eκHt2(3x+x2)κHteκHtx+κHt2(1+2x)eκHt2(x+x2)κH2·2x3x2=limx0κH2t212(3+2x)2+κHt3eκHt2(3x+x2)+κH2t2eκHtxκH2t24(1+2x)2κHteκHt2(x+x2)κH2·26x=12κH23κH2t24+κHt3+κH2t2κH2t24+κHt=2t3κH.

The second limit (A87) becomes

limλ11e12(3Λλ+κλ)t3Λλ+κλ·3Λλ+κλΛλ·4κH(3κH+1λκH)(3κH31λκH) (A88)

and consequently (A85) follows. To proceed with the proof of (A77), we rearrange

limλ1Uλ(2)(t)λ=limλ1{ΛλκλΛλ2[Λλ3Λλ+κλt2e12(3Λλ+κλ)t3κH22ΛλκHΛλ3Λλ+κλ·1e12(3Λλ+κλ)t3Λλ+κλ3κH22ΛλκH+ΛλΛλκλe12(Λλ+κλ)teΛλtΛλκλκH22Λλ+κHΛλΛλκλt2e12(Λλ+κλ)tκH22ΛλκHteΛλtκH22Λλ]+ΛλκλΛλκH2+2ΛλκH+ΛλκλΛλ2κH22·1e12(3Λλ+κλ)tΛλ(3Λλ+κλ)e12(Λλ+κλ)teΛλtΛλ(Λλκλ)}=limλ1{ΛλκλΛλ2[κH2t43e12(3Λλ+κλ)t3Λλ+κλe12(Λλ+κλ)tΛλκλ+2eΛλtΛλκλ (A89)
+κH2231e12(3Λλ+κλ)t3Λλ+κλ21eΛλtΛλκλ2+1e12(Λλ+κλ)tΛλκλ2 (A90)
+κH(Λλ3Λλ+κλ·te12(3Λλ+κλ)t2+Λλ3Λλ+κλ·1e12(3Λλ+κλ)t3Λλ+κλΛλΛλκλ·te12(Λλ+κλ)t2+ΛλΛλκλ·1eΛλtΛλκλΛλΛλκλ·1e12(Λλ+κλ)tΛλκλ)]+ΛλκλΛλκH2+2ΛλκH+ΛλκλΛλ2κH22·1e12(3Λλ+κλ)tΛλ(3Λλ+κλ)e12(Λλ+κλ)teΛλtΛλ(Λλκλ)}. (A91)

By means of (A82) to (A84), the limit of the expression after the squared brackets in (A89) becomes

limλ1{κH2t41e12(Λλ+κλ)tΛλκλ21eΛλtΛλκλ+31e12(3Λλ+κλ)t3Λλ+κλ+1Λλκλ33Λλ+κλ=κHt3, (A92)

and the limit of the expression in (A90) becomes with (A85)

limλ1{ΛλΛλκλ·κH22Λλ·1e12(3Λλ+κλ)t3Λλ+κλ1eΛλtΛλκλ+1e12(Λλ+κλ)tΛλκλκH22·1e12(3Λλ+κλ)t3Λλ+κλ·1Λλκλ33Λλ+κλ=κHt3. (A93)

By putting (A91)–(A93) together with (A85) we finally end up with

limλ1Uλ(2)(t)λ=κHt3κHt3+κHt6+t6t2+tt2+κH2+κH22·0=0,

which finishes the proof of Lemma A7. □

Proof of Theorem 12.

Recall from (131) the approximative Poisson offspring-distribution parameter β(m):=1κσ2m and Poisson immigration-distribution parameter α(m):=β(m)·ησ2, which is a special case of βA(m),βH(m),αA(m),αH(m)PNIPSP,1. Let us first calculate limmIPA,σ2mt(m)PH,σ2mt(m) by starting from Theorem 3(a). Correspondingly, we evaluate for all κA0, κH0 with κAκH by a twofold application of l’Hospital’s rule

limmm2·βA(m)·logβA(m)βH(m)1+βH(m)=limmm2σ2κAlogβA(m)βH(m)+κH1βA(m)βH(m)=12σ4·limmβH(m)·κAβA(m)·κHβH(m)2·κA·βH(m)βA(m)κH=κAκH22σ4. (A94)

Additionally there holds

limmm·(1βA(m))=κAσ2andlimmβA(m)σ2mt=limm1κAσ2mmσ2mt/m=eκA·t. (A95)

For κA>0, we apply the upper part of formula (69) as well as (A94) and (A95) to derive

limmIλPA,σ2mt(m)PH,σ2mt(m)=limmm2·βA(m)·logβA(m)βH(m)1+βH(m)m·(1βA(m))·X0(m)mαA(m)m·(1βA(m))·1βA(m)σ2mt+αA(m)βA(m)·m·(1βA(m))·m2·βA(m)·logβA(m)βH(m)1+βH(m)·σ2mtm=κAκH22σ2·κA·X˜0ηκA·1eκA·t+η·t.

For κA=0 (and thus κH>0, βA(m)1, αA(m)η/σ2), we apply the lower part of formula (69) as well as (A94) and (A95) to obtain

limmIλPA,σ2mt(m)PH,σ2mt(m)={limmm2·βH(m)logβH(m)1·η2σ2·σ2mt2m2+X0(m)m+η2σ2·m·σ2mtm}=κH22σ2·η2·t2+X˜0·t.

Let us now calculate the “converse” double limit

limλ1limmIλPA,σ2mt(m)PH,σ2mt(m)=limλ1limm1HλPA,σ2mt(m)PH,σ2mt(m)λ·(1λ).

This will be achieved by evaluating for each t>0 the two limits

limλ11dλ,X˜0,tLλ·(1λ)andlimλ11dλ,X˜0,tUλ·(1λ) (A96)

which will turn out to coincide; the involved lower and upper bound dλ,X˜0,tL, dλ,X˜0,tU defined by (153) and (154) satisfy limλ1dλ,X˜0,tL=limλ1dλ,X˜0,tU=1 as an easy consequence of the limits (cf. 150)

limλ1Λλ=κA0andlimλ1κλ=κA0, (A97)

as well as the formulas (A82) and (A83) for the case κA=0. Accordingly, we compute

limλ11dλ,X˜0,tLλ·(1λ)=limλ1dλ,X˜0,tL12λλ[Λλκλσ2·X˜0ηΛλ·1eΛλ·tησ2·Λλκλ·t+Lλ(1)(t)·X˜0+ησ2·Lλ(2)(t)]=limλ1{Λλκλσ2X˜0ηΛλ·teΛλ·t·Λλλ+1eΛλ·t·ηΛλ2·Λλλ1σ2·λΛλκλ·X˜0ηΛλ·1eΛλ·tηtσ2·λΛλκλ+X˜0Lλ(1)(t)λ+ησ2Lλ(2)(t)λ},with (A98)
Λλλ=κA2κH22Λλandκλλ=κAκH. (A99)

For the case κA>0, one can combine this with (A97) and (A74) to end up with

limλ11dλ,X˜0,tLλ·(1λ)=κAκH22σ2·κA·X˜0ηκA·1eκA·t+η·t. (A100)

For the case κA=0, we continue the calculation (A98) by rearranging terms and by employing the Formulas (A75), (A76), (A82) and (A83) as well as the obvious relation 1ΛΛκλΛ2=1κH and obtain

limλ11dλ,X˜0,tLλ·(1λ)=limλ1{κH2·X˜02σ2ΛλκλΛλ·t·eΛλt+1eΛλtΛλ+η·κH2·t2σ21ΛλΛλκλΛλ2+ΛλκλΛλ·1eΛλtΛλη·κH22σ2·1eΛλtΛλ1ΛλΛλκλΛλ2κH·X˜0σ21eΛλt+η·κHσ21eΛλtΛλt+Lλ(1)(t)λ·X˜0+ησ2·Lλ(2)(t)λ}=κH2X˜0tσ2+ηκH2t2σ21κH+tηκHt2σ2κH2X˜0t2σ2ηκH2t24σ2=κH22σ2·η2·t2+X˜0·t. (A101)

Let us now turn to the second limit (A96) for which we compute analogously to (A98)

limλ11dλ,X˜0,tUλ·(1λ)=limλ1dλ,X˜0,tU12λλ[Λλκλσ2·X˜0η12(Λλ+κλ)·1e12(Λλ+κλ)·tησ2·Λλκλ·tUλ(1)(t)·X˜0ησ2·Uλ(2)(t)]=limλ1{Λλκλσ2[X˜0η12(Λλ+κλ)·t2·e12(Λλ+κλ)·tλΛλ+κλ+1e12(Λλ+κλ)·t·2·η(Λλ+κλ)2·λ(Λλ+κλ)]1σ2·λΛλκλ·X˜0η12(Λλ+κλ)·1e12(Λλ+κλ)·tηtσ2·λΛλκλUλ(1)(t)λ·X˜0ησ2Uλ(2)(t)λ}. (A102)

For the case κA>0, one can combine this with (A97), (A99) and (A74) to end up with

limλ11dλ,X˜0,tUλ·(1λ)=κAκH22σ2·κA·X˜0ηκA·1eκA·t+η·t. (A103)

For the case κA=0, we continue the calculation of (A102) by rearranging terms and by employing the formulas (A77), (A82) and (A83) as well as the obvious relation limλ11ΛλΛλκλΛλ(Λλ+κλ)=2κH to obtain

limλ11dλ,X˜0,tUλ·(1λ)=limλ1{t·X˜04σ2·ΛλκλΛλ·e12Λλ+κλ·tκH2+2ΛλκH+X˜02σ2·1e12(Λλ+κλ)·tΛλκH22ΛλκHη·tσ2[κH1+e12Λλ+κλ·tΛλκλΛλ+κλκH22·1ΛλΛλκλΛλ(Λλ+κλ)+ΛλκλΛλ+κλ·1e12Λλ+κλ·tΛλ]+2ησ2·1e12Λλ+κλ·tΛλ+κλκH1+ΛλκλΛλ+κλκH221ΛλΛλκλΛλ(Λλ+κλ)Uλ(1)(t)λ·X˜0ησ2Uλ(2)(t)λ}=κH2tX˜04σ2+κH2tX˜04σ2ηtσ22κHκHκH2t4+ηtσ22κHκH=κH22σ2η2·t2+X˜0·t. (A104)

Since (A100) coincides with (A103) and (A101) coincides with (A104), we have finished the proof. □

Author Contributions

Conceptualization, N.B.K. and W.S.; Formal analysis, N.B.K. and W.S.; Methodology, N.B.K. and W.S.; Visualization, N.B.K.; Writing, N.B.K. and W.S. All authors have read and agreed to the published version of the manuscript.

Funding

Niels B. Kammerer received a scholarship of the “Studienstiftung des Deutschen Volkes” for his PhD Thesis.

Conflicts of Interest

The authors declare no conflict of interest.

References

  • 1.Liese F., Vajda I. Convex Statistical Distances. Teubner; Leipzig, Germany: 1987. [Google Scholar]
  • 2.Read T.R.C., Cressie N.A.C. Goodness-of-Fit Statistics for Discrete Multivariate Data. Springer; New York, NY, USA: 1988. [Google Scholar]
  • 3.Vajda I. Theory of Statistical Inference and Information. Kluwer; Dordrecht, The Netherlands: 1989. [Google Scholar]
  • 4.Csiszár I., Shields P.C. Information Theory and Statistics: A Tutorial. Now Publishers; Hanover, MA, USA: 2004. [Google Scholar]
  • 5.Stummer W. Exponentials, Diffusions, Finance, Entropy and Information. Shaker; Aachen, Germany: 2004. [Google Scholar]
  • 6.Pardo L. Statistical Inference Based on Divergence Measures. Chapman & Hall/CRC; Bocan Raton, FL, USA: 2006. [Google Scholar]
  • 7.Liese F., Miescke K.J. Statistical Decision Theory: Estimation, Testing, and Selection. Springer; New York, NY, USA: 2008. [Google Scholar]
  • 8.Basu A., Shioya H., Park C. Statistical Inference: The Minimum Distance Approach. CRC Press; Boca Raton, FL, USA: 2011. [Google Scholar]
  • 9.Voinov V., Nikulin M., Balakrishnan N. Chi-Squared Goodness of Fit Tests with Applications. Academic Press; Waltham, MA, USA: 2013. [Google Scholar]
  • 10.Liese F., Vajda I. On divergences and informations in statistics and information theory. IEEE Trans. Inform. Theory. 2006;52:4394–4412. [Google Scholar]
  • 11.Vajda I., van der Meulen E.C. Goodness-of-fit criteria based on observations quantized by hypothetical and empirical percentiles. In: Karian Z.A., Dudewicz E.J., editors. Handbook of Fitting Statistical Distributions with R. CRC; Heidelberg, Germany: 2010. pp. 917–994. [Google Scholar]
  • 12.Stummer W., Vajda I. On Bregman distances and divergences of probability measures. IEEE Trans. Inform. Theory. 2012;58:1277–1288. [Google Scholar]
  • 13.Kißlinger A.-L., Stummer W. Robust statistical engineering by means of scaled Bregman distances. In: Agostinelli C., Basu A., Filzmoser P., Mukherjee D., editors. Recent Advances in Robust Statistics–Theory and Applications. Springer; New Delhi, India: 2016. pp. 81–113. [Google Scholar]
  • 14.Broniatowski M., Stummer W. Some universal insights on divergences for statistics, machine learning and artificial intelligence. In: Nielsen F., editor. Geometric Structures of Information. Springer; Cham, Switzerland: 2019. pp. 149–211. [Google Scholar]
  • 15.Stummer W., Vajda I. Optimal statistical decisions about some alternative financial models. J. Econom. 2007;137:441–471. [Google Scholar]
  • 16.Stummer W., Lao W. Limits of Bayesian decision related quantities of binomial asset price models. Kybernetika. 2012;48:750–767. [Google Scholar]
  • 17.Csiszar I. Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Publ. Math. Inst. Hungar. Acad. Sci. 1963;A-8:85–108. [Google Scholar]
  • 18.Ali M.S., Silvey D. A general class of coefficients of divergence of one distribution from another. J. Roy. Statist. Soc. B. 1966;28:131–140. [Google Scholar]
  • 19.Morimoto T. Markov processes and the H-theorem. J. Phys. Soc. Jpn. 1963;18:328–331. [Google Scholar]
  • 20.van Erven T., Harremoes P. Renyi divergence and Kullback-Leibler divergence. IEEE Trans. Inf. Theory. 2014;60:3797–3820. [Google Scholar]
  • 21.Newman C.M. Topics in Probability Theory. Courant Institute of Mathematical Sciences New York University; New York, NY, USA: 1973. On the orthogonality of independent increment processes; pp. 93–111. [Google Scholar]
  • 22.Liese F. Hellinger integrals of Gaussian processes with independent increments. Stochastics. 1982;6:81–96. [Google Scholar]
  • 23.Memin J., Shiryayev A.N. Distance de Hellinger-Kakutani des lois correspondant a deux processus a accroissements indépendants. Probab. Theory Relat. Fields. 1985;70:67–89. [Google Scholar]
  • 24.Jacod J., Shiryaev A.N. Limit Theorems for Stochastic Processes. Springer; Berlin, Germany: 1987. [Google Scholar]
  • 25.Linkov Y.N., Shevlyakov Y.A. Large deviation theorems in the hypotheses testing problems for processes with independent increments. Theory Stoch. Process. 1998;4:198–210. [Google Scholar]
  • 26.Liese F. Hellinger integrals, error probabilities and contiguity of Gaussian processes with independent increments and Poisson processes. J. Inf. Process. Cybern. 1985;21:297–313. [Google Scholar]
  • 27.Kabanov Y.M., Liptser R.S., Shiryaev A.N. On the variation distance for probability measures defined on a filtered space. Probab. Theory Relat. Fields. 1986;71:19–35. [Google Scholar]
  • 28.Liese F. Hellinger integrals of diffusion processes. Statistics. 1986;17:63–78. [Google Scholar]
  • 29.Vajda I. Distances and discrimination rates for stochastic processes. Stoch. Process. Appl. 1990;35:47–57. [Google Scholar]
  • 30.Stummer W. The Novikov and entropy conditions of multidimensional diffusion processes with singular drift. Probab. Theory Relat. Fields. 1993;97:515–542. [Google Scholar]
  • 31.Stummer W. On a statistical information measure of diffusion processes. Stat. Decis. 1999;17:359–376. [Google Scholar]
  • 32.Stummer W. On a statistical information measure for a generalized Samuelson-Black-Scholes model. Stat. Decis. 2001;19:289–314. [Google Scholar]
  • 33.Bartoszynski R. Branching processes and the theory of epidemics. In: Le Cam L.M., Neyman J., editors. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. IV. University of California Press; Berkeley, CA, USA: 1967. pp. 259–269. [Google Scholar]
  • 34.Ludwig D. Qualitative behaviour of stochastic epidemics. Math. Biosci. 1975;23:47–73. [Google Scholar]
  • 35.Becker N.G. Estimation for an epidemic model. Biometrics. 1976;32:769–777. [PubMed] [Google Scholar]
  • 36.Becker N.G. Estimation for discrete time branching processes with applications to epidemics. Biometrics. 1977;33:515–522. [PubMed] [Google Scholar]
  • 37.Metz J.A.J. The epidemic in a closed population with all susceptibles equally vulnerable; some results for large susceptible populations and small initial infections. Acta Biotheor. 1978;27:75–123. doi: 10.1007/BF00048405. [DOI] [PubMed] [Google Scholar]
  • 38.Heyde C.C. On assessing the potential severity of an outbreak of a rare infectious disease. Austral. J. Stat. 1979;21:282–292. [Google Scholar]
  • 39.Von Bahr B., Martin-Löf A. Threshold limit theorems for some epidemic processes. Adv. Appl. Prob. 1980;12:319–349. [Google Scholar]
  • 40.Ball F. The threshold behaviour of epidemic models. J. Appl. Prob. 1983;20:227–241. [Google Scholar]
  • 41.Jacob C. Branching processes: Their role in epidemics. Int. J. Environ. Res. Public Health. 2010;7:1186–1204. doi: 10.3390/ijerph7031204. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Barbour A.D., Reinert G. Approximating the epidemic curve. Electron. J. Probab. 2013;18:1–30. [Google Scholar]
  • 43.Britton T., Pardoux E. Stochastic epidemics in a homogeneous community. In: Britton T., Pardoux E., editors. Stochastic Epidemic Models. Springer; Cham, Switzerland: 2019. pp. 1–120. [Google Scholar]
  • 44.Dion J.P., Gauthier G., Latour A. Branching processes with immigration and integer-valued time series. Serdica Math. J. 1995;21:123–136. [Google Scholar]
  • 45.Grunwald G.K., Hyndman R.J., Tedesco L., Tweedie R.L. Non-Gaussian conditional linear AR(1) models. Aust. N. Z. J. Stat. 2000;42:479–495. [Google Scholar]
  • 46.Kedem B., Fokianos K. An Regression Models for Time Series Analysis. Wiley; Hoboken, NJ, USA: 2002. [Google Scholar]
  • 47.Held L., Höhle M., Hofmann M. A statistical framework for the analysis of multivariate infectious disease surveillance counts. Stat. Model. 2005;5:187–199. [Google Scholar]
  • 48.Weiss C.H. An Introduction to Discrete-Valued Time Series. Wiley; Hoboken, NJ, USA: 2018. [Google Scholar]
  • 49.Feigin P.D., Passy U. The geometric programming dual to the extinction probability problem in simple branching processes. Ann. Probab. 1981;9:498–503. [Google Scholar]
  • 50.Mordecki E. Asymptotic mixed normality and Hellinger processes. Stoch. Stoch. Rep. 1994;48:129–143. [Google Scholar]
  • 51.Sriram T.N., Vidyashankar A.N. Minimum Hellinger distance estimation for supercritical Galton-Watson processes. Stat. Probab. Lett. 2000;50:331–342. [Google Scholar]
  • 52.Guttorp P. Statistical Inference for Branching Processes. Wiley; New York, NY, USA: 1991. [Google Scholar]
  • 53.Linkov Y.N., Lunyova L.A. Large deviation theorems in the hypothesis testing problems for the Galton-Watson processes with immigration. Theory Stoch. Process. 1996;2:120–132. Erratum in Theory Stoch. Process1997, 3, 270–285. [Google Scholar]
  • 54.Heathcote C.R. A branching process allowing immigration. J. R. Stat. Soc. B. 1965;27:138–143. Erratum in: Heathcote, C.R. Corrections and comments on the paper “A branching process allowing immigration”. J. R. Stat. Soc. B1966, 28, 213–217. [Google Scholar]
  • 55.Athreya K.B., Ney P.E. Branching Processes. Springer; New York, NY, USA: 1972. [Google Scholar]
  • 56.Jagers P. Branching Processes with Biological Applications. Wiley; London, UK: 1975. [Google Scholar]
  • 57.Asmussen S., Hering H. Branching Processes. Birkhäuser; Boston, MA, USA: 1983. [Google Scholar]
  • 58.Haccou P., Jagers P., Vatutin V.A. Branching Processes: Variation, Growth, and Extinction of Populations. Cambrigde University Press; Cambridge, UK: 2005. [Google Scholar]
  • 59.Heyde C.C., Seneta E. Estimation theory for growth and immigration rates in a multiplicative process. J. Appl. Probab. 1972;9:235–256. [Google Scholar]
  • 60.Basawa I.V., Rao B.L.S. Statistical Inference of Stochastic Processes. Academic Press; London, UK: 1980. [Google Scholar]
  • 61.Basawa I.V., Scott D.J. Asymptotic Optimal Inference for Non-Ergodic Models. Springer; New York, NY, USA: 1983. [Google Scholar]
  • 62.Sankaranarayanan G. Branching Processes and Its Estimation Theory. Wiley; New Delhi, India: 1989. [Google Scholar]
  • 63.Wei C.Z., Winnicki J. Estimation of the means in the branching process with immigration. Ann. Stat. 1990;18:1757–1773. [Google Scholar]
  • 64.Winnicki J. Estimation of the variances in the branching process with immigration. Probab. Theory Relat. Fields. 1991;88:77–106. [Google Scholar]
  • 65.Yanev N.M. Statistical inference for branching processes. In: Ahsanullah M., Yanev G.P., editors. Records and Branching Processes. Nova Science Publishes; New York, NY, USA: 2008. pp. 147–172. [Google Scholar]
  • 66.Harris T.E. The Theory of Branching Processes. Springer; Berlin, Germany: 1963. [Google Scholar]
  • 67.Gauthier G., Latour A. Convergence forte des estimateurs des parametres d’un processus GENAR(p) Ann. Sci. Math. Que. 1994;18:49–71. [Google Scholar]
  • 68.Latour A. Existence and stochastic structure of a non-negative integer-valued autoregressive process. J. Time Ser. Anal. 1998;19:439–455. [Google Scholar]
  • 69.Rydberg T.H., Shephard N. Econometric Society World Congress. Econometric Society; Cambridge, UK: 2000. BIN models for trade-by-trade data. Modelling the number of trades in a fixed interval of time. Contributed Papers No. 0740. [Google Scholar]
  • 70.Brandt P.T., Williams J.T. A linear Poisson autoregressive model: The Poisson AR(p) model. Polit. Anal. 2001;9:164–184. [Google Scholar]
  • 71.Heinen A. Core Discussion Paper. Volume 62. University of Louvain; Louvain, Belgium: 2003. [(accessed on 18 May 2020)]. Modelling time series count data: An autoregressive conditional Poisson model. MPRA Paper No. 8113. Available online: https://mpra.ub.uni-muenchen.de/8113. [Google Scholar]
  • 72.Held L., Hofmann M., Höhle M., Schmid V. A two-component model for counts of infectious diseases. Biostatistics. 2006;7:422–437. doi: 10.1093/biostatistics/kxj016. [DOI] [PubMed] [Google Scholar]
  • 73.Finkenstädt B.F., Bjornstad O.N., Grenfell B.T. A stochastic model for extinction and recurrence of epidemics: Estimation and inference for measles outbreak. Biostatistics. 2002;3:493–510. doi: 10.1093/biostatistics/3.4.493. [DOI] [PubMed] [Google Scholar]
  • 74.Ferland R., Latour A., Oraichi D. Integer-valued GARCH process. J. Time Ser. Anal. 2006;27:923–942. [Google Scholar]
  • 75.Weiß C.H. Modelling time series of counts with overdispersion. Stat. Methods Appl. 2009;18:507–519. [Google Scholar]
  • 76.Weiß C.H. The INARCH(1) model for overdispersed time series of counts. Comm. Stat. Sim. Comp. 2010;39:1269–1291. [Google Scholar]
  • 77.Weiß C.H. INARCH(1) processes: Higher-order moments and jumps. Stat. Probab. Lett. 2010;80:1771–1780. [Google Scholar]
  • 78.Weiß C.H., Testik M.C. Detection of abrupt changes in count data time series: Cumulative sum derivations for INARCH(1) models. J. Qual. Technol. 2012;44:249–264. [Google Scholar]
  • 79.Kaslow R.A., Evans A.S. Epidemiologic concepts and methods. In: Evans A.S., Kaslow R.A., editors. Viral Infections of Humans. Springer; New York, NY, USA: 1997. pp. 3–58. [Google Scholar]
  • 80.Osterholm M.T., Hedberg C.W. Epidemiologic principles. In: Bennett J.E., Dolin R., Blaser M.J., editors. Mandell, Douglas, and Bennett’s Principles and Practice of Infectious Diseases. 8th ed. Elsevier; Philadelphia, PA, USA: 2015. pp. 146–157. [Google Scholar]
  • 81.Grassly N.C., Fraser C. Mathematical models of infectious disease transmission. Nat. Rev. 2008;6:477–487. doi: 10.1038/nrmicro1845. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82.Keeling M.J., Rohani P. Modeling Infectious Diseases in Humans and Animals. Princeton UP; Princeton, NJ, USA: 2008. [Google Scholar]
  • 83.Yan P. Distribution theory stochastic processes and infectious disease modelling. In: Brauer F., van den Driessche P., Wu J., editors. Mathematical Epidemiology. Springer; Berlin, Germany: 2008. pp. 229–293. [Google Scholar]
  • 84.Yan P., Chowell G. Quantitative Methods for Investigating Infectious Disease Outbreaks. Springer; Cham, Switzerland: 2019. [Google Scholar]
  • 85.Britton T. Stochastic epidemic models: A survey. Math. Biosc. 2010;225:24–35. doi: 10.1016/j.mbs.2010.01.006. [DOI] [PubMed] [Google Scholar]
  • 86.Diekmann O., Heesterbeek H., Britton T. Mathematical Tools for Understanding Infectious Disease Dynamics. Princeton University Press; Princeton, NJ, USA: 2013. [Google Scholar]
  • 87.Cummings D.A.T., Lessler J. Infectious disease dynamics. In: Nelson K.E., Masters Williams C., editors. Infectious Disease Epidemiology: Theory and Practice. Jones & Bartlett Learning; Burlington, MA, USA: 2014. pp. 131–166. [Google Scholar]
  • 88.Just W., Callender H., Drew LaMar M., Toporikova N. Transmission of infectious diseases: Data, models and simulations. In: Robeva R.S., editor. Algebraic and Discrete Mathematical Methods of Modern Biology. Elsevier; London, UK: 2015. pp. 193–215. [Google Scholar]
  • 89.Britton T., Giardina F. Introduction to statistical inference for infectious diseases. J. Soc. Franc. Stat. 2016;157:53–70. [Google Scholar]
  • 90.Fine P.E.M. The interval between successive cases of an infectious disease. Am. J. Epidemiol. 2003;158:1039–1047. doi: 10.1093/aje/kwg251. [DOI] [PubMed] [Google Scholar]
  • 91.Svensson A. A note on generation times in epidemic models. Math. Biosci. 2007;208:300–311. doi: 10.1016/j.mbs.2006.10.010. [DOI] [PubMed] [Google Scholar]
  • 92.Svensson A. The influence of assumptions on generation time distributions in epidemic models. Math. Biosci. 2015;270:81–89. doi: 10.1016/j.mbs.2015.10.006. [DOI] [PubMed] [Google Scholar]
  • 93.Wallinga J., Lipsitch M. How generation intervals shape the relationship between growth rates and reproductive numbers. Proc. R. Soc. B. 2007;274:599–604. doi: 10.1098/rspb.2006.3754. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94.Forsberg White L., Pagano M. A likelihood-based method for real-time estimation of the serial interval and reproductive number of an epidemic. Stat. Med. 2008;27:2999–3016. doi: 10.1002/sim.3136. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95.Nishiura H. Time variations in the generation time of an infectious disease: Implications for sampling to appropriately quantify transmission potential. Math. Biosci. 2010;7:851–869. doi: 10.3934/mbe.2010.7.851. [DOI] [PubMed] [Google Scholar]
  • 96.Scalia Tomba G., Svensson A., Asikainen T., Giesecke J. Some model based considerations on observing generation times for communicable diseases. Math. Biosci. 2010;223:24–31. doi: 10.1016/j.mbs.2009.10.004. [DOI] [PubMed] [Google Scholar]
  • 97.Trichereau J., Verret C., Mayet A., Manet G. Estimation of the reproductive number for A(H1N1) pdm09 influenza among the French armed forces, September 2009–March 2010. J. Infect. 2012;64:628–630. doi: 10.1016/j.jinf.2012.02.005. [DOI] [PubMed] [Google Scholar]
  • 98.Vink M.A., Bootsma M.C.J., Wallinga J. Serial intervals of respiratory infectious diseases: A systematic review and analysis. Am. J. Epidemiol. 2014;180:865–875. doi: 10.1093/aje/kwu209. [DOI] [PubMed] [Google Scholar]
  • 99.Champredon D., Dushoff J. Intrinsic and realized generation intervals in infectious-disease transmission. Proc. R. Soc. B. 2015;282:20152026. doi: 10.1098/rspb.2015.2026. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 100.An der Heiden M., Hamouda O. Schätzung der aktuellen Entwicklung der SARS-CoV-2-Epidemie in Deutschland—Nowcasting. Epid. Bull. 2020;17:10–16. (In German) [Google Scholar]
  • 101.Ferretti L., Wymant C., Kendall M., Zhao L., Nurtay A., Abeler-Dörner L., Parker M., Bonsall D., Fraser C. Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing. Science. 2020;368:eabb6936. doi: 10.1126/science.abb6936. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 102.Ganyani T., Kremer C., Chen D., Torneri A., Faes C., Wallinga J., Hens N. Estimating the generation interval for COVID-19 based on symptom onset data. medRxiv Prepr. 2020 doi: 10.1101/2020.03.05.20031815. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 103.Li M., Liu K., Song Y., Wang M., Wu J. Serial interval and generation interval for respectively the imported and local infectors estimated using reported contact-tracing data of COVID-19 in China. medRxiv Prepr. 2020 doi: 10.1101/2020.04.15.20065946. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 104.Nishiura H., Linton N.M., Akhmetzhanov A.R. Serial interval of novel coronavirus (COVID-19) infections. medRxiv Prepr. 2020 doi: 10.1101/2020.02.03.20019497. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 105.Park M., Cook A.R., Lim J.J., Sun X., Dickens B.L. A systematic review of COVID-19 epidemiology based on current evidence. J. Clin. Med. 2020;9:967. doi: 10.3390/jcm9040967. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 106.Spouge J.L. An accurate approximation for the expected site frequency spectrum in a Galton-Watson process under an infinite sites mutation model. Theor. Popul. Biol. 2019;127:7–15. doi: 10.1016/j.tpb.2019.03.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 107.Taneyhill D.E., Dunn A.M., Hatcher M.J. The Galton-Watson branching process as a quantitative tool in parasitology. Parasitol. Today. 1999;15:159–165. doi: 10.1016/s0169-4758(99)01417-9. [DOI] [PubMed] [Google Scholar]
  • 108.Parnes D. Analyzing the contagion effect of foreclosures as a branching process: A close look at the years that follow the Great Recession. J. Account. Financ. 2017;17:9–34. [Google Scholar]
  • 109.Le Cam L. Asymptotic Methods in Statistical Decision Theory. Springer; New York, NY, USA: 1986. [Google Scholar]
  • 110.Heyde C.C., Johnstone I.M. On asymptotic posterior normality for stochastic processes. J. R. Stat. Soc. B. 1979;41:184–189. [Google Scholar]
  • 111.Johnson R.A., Susarla V., van Ryzin J. Bayesian non-parametric estimation for age-dependent branching processes. Stoch. Proc. Appl. 1979;9:307–318. [Google Scholar]
  • 112.Scott D. On posterior asymptotic normality and asymptotic normality of estimators for the Galton-Watson process. J. R. Stat. Soc. B. 1987;49:209–214. [Google Scholar]
  • 113.Yanev N.M., Tsokos C.P. Decision-theoretic estimation of the offspring mean in mortal branching processes. Comm. Stat. Stoch. Models. 1999;15:889–902. [Google Scholar]
  • 114.Mendoza M., Gutierrez-Pena E. Bayesian conjugate analysis of the Galton-Watson process. Test. 2000;9:149–171. [Google Scholar]
  • 115.Feicht R., Stummer W. An explicit nonstationary stochastic growth model. In: De La Grandville O., editor. Economic Growth and Development (Frontiers of Economics and Globalization, Vol. 11) Emerald Group Publishing Limited; Bingley, UK: 2011. pp. 141–202. [Google Scholar]
  • 116.Dorn F., Fuest C., Göttert M., Krolage C., Lautenbacher S., Link S., Peichl A., Reif M., Sauer S., Stöckli M., et al. Die volkswirtschaftlichen Kosten des Corona-Shutdown für Deutschland: Eine Szenarienrechnung. ifo Schnelldienst. 2020;73:29–35. (In German) [Google Scholar]
  • 117.Dorn F., Khailaie S., Stöckli M., Binder S., Lange B., Peichl A., Vanella P., Wollmershäuser T., Fuest C., Meyer-Hermann M. Das gemeinsame Interesse von Gesundheit und Wirtschaft: Eine Szenarienrechnung zur Eindämmung der Corona-Pandemie. ifo Schnelld. Dig. 2020;6:1–9. (In German) [Google Scholar]
  • 118.Kißlinger A.-L., Stummer W. A new toolkit for robust distributional change detection. Appl. Stoch. Models Bus. Ind. 2018;34:682–699. [Google Scholar]
  • 119.Dehning J., Zierenberg J., Spitzner F.P., Wibral M., Neto J.P., Wilczek M., Priesemann V. Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions. Science. 2020;369:eabb9789. doi: 10.1126/science.abb9789. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 120.Frisén M. Statistical surveillance. Optimality and methods. Int. Stat. Rev. 2003;71:403–434. [Google Scholar]
  • 121.Frisén M., Andersson E., Schiöler L. Robust outbreak surveillance of epidemics in Sweden. Stat. Med. 2009;28:476–493. doi: 10.1002/sim.3483. [DOI] [PubMed] [Google Scholar]
  • 122.Brauner J.M., Mindermann S., Sharma M., Stephenson A.B., Gavenciak T., Johnston D., Salvatier J., Leech G., Besiroglu T., Altman G., et al. The effectiveness and perceived burden of nonpharmaceutical interventions against COVID-19 transmission: A modelling study with 41 countries. medRxiv Prepr. 2020 doi: 10.1101/2020.05.28.20116129. [DOI] [Google Scholar]
  • 123.Österreicher F., Vajda I. Statistical information and discrimination. IEEE Trans. Inform. Theory. 1993;39:1036–1039. [Google Scholar]
  • 124.DeGroot M.H. Uncertainty, information and sequential experiments. Ann. Math. Stat. 1962;33:404–419. [Google Scholar]
  • 125.Krafft O., Plachky D. Bounds for the power of likelihood ratio tests and their asymptotic properties. Ann. Math. Stat. 1970;41:1646–1654. [Google Scholar]
  • 126.Basawa I.V., Scott D.J. Efficient tests for branching processes. Biometrika. 1976;63:531–536. [Google Scholar]
  • 127.Feigin P.D. The efficiency criteria problem for stochastic processes. Stoch. Proc. Appl. 1978;6:115–127. [Google Scholar]
  • 128.Sweeting T.J. On efficient tests for branching processes. Biometrika. 1978;65:123–127. [Google Scholar]
  • 129.Linkov Y.N. Lectures in Mathematical Statistics, Parts 1 and 2. American Mathematical Society; Providence, RI, USA: 2005. [Google Scholar]
  • 130.Feller W. Diffusion processes in genetics. In: Neyman J., editor. Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability. University of California Press; Berkeley, CA, USA: 1951. pp. 227–246. [Google Scholar]
  • 131.Jirina M. On Feller’s branching diffusion process. Časopis Pěst. Mat. 1969;94:84–89. [Google Scholar]
  • 132.Lamperti J. Limiting distributions for branching processes. In: Le Cam L.M., Neyman J., editors. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. II, Part 2. University of California Press; Berkeley, CA, USA: 1967. pp. 225–241. [Google Scholar]
  • 133.Lamperti J. The limit of a sequence of branching processes. Z. Wahrscheinlichkeitstheorie Verw. Geb. 1967;7:271–288. [Google Scholar]
  • 134.Lindvall T. Convergence of critical Galton-Watson branching processes. J. Appl. Prob. 1972;9:445–450. [Google Scholar]
  • 135.Lindvall T. Limit theorems for some functionals of certain Galton-Watson branching processes. Adv. Appl. Prob. 1974;6:309–321. [Google Scholar]
  • 136.Grimvall A. On the convergence of sequences of branching processes. Ann. Probab. 1974;2:1027–1045. [Google Scholar]
  • 137.Borovkov K.A. On the convergence of branching processes to a diffusion process. Theor. Probab. Appl. 1986;30:496–506. [Google Scholar]
  • 138.Ethier S.N., Kurtz T.G. Markov Processes: Characterization and Convergence. Wiley; New York, NY, USA: 1986. [Google Scholar]
  • 139.Durrett R. Stochastic Calculus. CRC Press; Boca Raton, FL, USA: 1996. [Google Scholar]
  • 140.Kawazu K., Watanabe S. Branching processes with immigration and related limit theorems. Theor. Probab. Appl. 1971;16:36–54. [Google Scholar]
  • 141.Wei C.Z., Winnicki J. Some asymptotic results for the branching process with immigration. Stoch. Process. Appl. 1989;31:261–282. [Google Scholar]
  • 142.Sriram T.N. Invalidity of bootstrap for critical branching processes with immigration. Ann. Stat. 1994;22:1013–1023. [Google Scholar]
  • 143.Li Z. Branching processes with immigration and related topics. Front. Math. China. 2006;1:73–97. [Google Scholar]
  • 144.Dawson D.A., Li Z. Skew convolution semigroups and affine Markov processes. Ann. Probab. 2006;34:1103–1142. [Google Scholar]
  • 145.Cox J.C., Ingersoll J.E., Jr., Ross S.A. A theory of the term structure of interest rates. Econometrica. 1985;53:385–407. [Google Scholar]
  • 146.Cox J.C., Ross S.A. The valuation of options for alternative processes. J. Finan. Econ. 1976;3:145–166. [Google Scholar]
  • 147.Heston S.L. A closed-form solution for options with stochastic volatilities with applications to bond and currency options. Rev. Finan. Stud. 1993;6:327–343. [Google Scholar]
  • 148.Lansky P., Lanska V. Diffusion approximation of the neuronal model with synaptic reversal potentials. Biol. Cybern. 1987;56:19–26. doi: 10.1007/BF00333064. [DOI] [PubMed] [Google Scholar]
  • 149.Giorno V., Lansky P., Nobile A.G., Ricciardi L.M. Diffusion approximation and first-passage-time problem for a model neuron. Biol. Cybern. 1988;58:387–404. doi: 10.1007/BF00361346. [DOI] [PubMed] [Google Scholar]
  • 150.Lanska V., Lansky P., Smith C.E. Synaptic transmission in a diffusion model for neuron activity. J. Theor. Biol. 1994;166:393–406. doi: 10.1006/jtbi.1994.1035. [DOI] [PubMed] [Google Scholar]
  • 151.Lansky P., Sacerdote L., Tomassetti F. On the comparison of Feller and Ornstein-Uhlenbeck models for neural activity. Biol. Cybern. 1995;73:457–465. doi: 10.1007/BF00201480. [DOI] [PubMed] [Google Scholar]
  • 152.Ditlevsen S., Lansky P. Estimation of the input parameters in the Feller neuronal model. Phys. Rev. E. 2006;73:061910. doi: 10.1103/PhysRevE.73.061910. [DOI] [PubMed] [Google Scholar]
  • 153.Höpfner R. On a set of data for the membrane potential in a neuron. Math. Biosci. 2007;207:275–301. doi: 10.1016/j.mbs.2006.10.009. [DOI] [PubMed] [Google Scholar]
  • 154.Lansky P., Ditlevsen S. A review of the methods for signal estimation in stochastic diffusion leaky integrate-and-fire neuronal models. Biol. Cybern. 2008;99:253–262. doi: 10.1007/s00422-008-0237-x. [DOI] [PubMed] [Google Scholar]
  • 155.Pedersen A.R. Estimating the nitrous oxide emission rate from the soil surface by means of a diffusion model. Scand. J. Stat. Theory Appl. 2000;27:385–403. [Google Scholar]
  • 156.Aalen O.O., Gjessing H.K. Survival models based on the Ornstein-Uhlenbeck process. Lifetime Data Anal. 2004;10:407–423. doi: 10.1007/s10985-004-4775-9. [DOI] [PubMed] [Google Scholar]
  • 157.Kammerer N.B. Ph.D. Thesis. University of Erlangen-Nürnberg; Erlangen, Germany: 2011. Generalized-Relative-Entropy Type Distances Between Some Branching Processes and Their Diffusion Limits. [Google Scholar]
