A new flexible regression model with application to recovery probability Covid-19 patients

F Prataviera; E M Hashimoto; E M M Ortega; G M Cordeiro; V G Cancho; R Vila

2023 Jan 4;51(5):826–844. doi: 10.1080/02664763.2022.2163229


PMCID: PMC10956937 PMID: 38524797

Abstract

The aim of this study is to propose a generalized odd log-logistic Maxwell mixture model to analyze the effect of gender and age group on the lifetimes and recovery probabilities of Chinese individuals with COVID-19. We add new properties of the generalized Maxwell model. The regression coefficients and the recovered fraction are estimated by maximum likelihood and Bayesian methods. Further, simulation studies are carried out to compare the regressions under different scenarios. Model-checking techniques based on the quantile residuals are addressed. The estimated survival functions for the patients are reported by age range and sex. The simulation study showed that the mean squared errors decay toward zero and the average estimates converge to the true parameters when the sample size increases. According to the fitted model, only the age group has a significant effect on the lifetime of individuals with COVID-19. Women have a higher probability of recovering than men, and individuals aged $\geq 60$ years have lower recovery probabilities than those aged $< 60$ years. The findings suggest that the proposed model could be a good alternative for analyzing censored lifetimes of individuals with COVID-19.

Keywords: Censored data, COVID-19, Maxwell distribution, mixture model, quantile residuals

1. Introduction

Coronavirus disease 2019 (COVID-19) was first identified in Wuhan (China) in December 2019. The most common symptoms are fever, coughing, sore throat, gastrointestinal disturbances and breathing difficulty, and in serious cases the disease can evolve to pneumonia [6,23].

According to Johns Hopkins University data updated to 22 April 2021, more than 144 million people had tested positive for COVID-19 and more than 3.062 million deaths had occurred worldwide [9]. According to the coronavirus pandemic site¹ (on 22 April 2021), more than 123 million patients had recovered and almost 19 million remained active cases (0.6% in serious condition). The world mortality rate was 395 per 1 million inhabitants. For this reason, several studies have investigated the behavior of the disease according to demographic characteristics and comorbidities [2,14,16,22,34].

Furthermore, specifically in China, the number of deaths caused by the disease (on 22 April 2021) is around 4636, out of a total of 90,507 confirmed cases, according to Johns Hopkins University. Xie et al. [33] studied the effect of oxygen saturation and other measures on the lifetime of COVID-19 patients suffering from pneumonia admitted to Union Hospital of Wuhan. An interesting characteristic that can be noted in the lifetime of these individuals is the presence of a plateau in the survival curve. Figure 1(a) displays a survival curve with plateau at 0.76 of the lifetime of a sample of patients suffering from COVID-19 residing in China. Figure 1(b), in turn, depicts the empirical risk initially increasing and then diminishing after medical care [19].

Morena et al. [24], Yang et al. [35] and Yan et al. [34] also presented a survival curve like the one described in Figure 1(a). However, those works did not take into account the information of a plateau (or asymptote) in the survival curve and used other statistical analyses. In situations like this, mixture models can be used to incorporate this information [18], and in this case the plateau can be interpreted as the proportion of patients who recovered. The WHO-China Joint Mission on Coronavirus Disease 2019 Report published by WHO² states that the recovery time depends on age, gender and any underlying health issues.

Some published studies have also related recovery time to variables such as sex and age. For example, Voinsky et al. [32] assessed the effects of the age and sex of 5769 Israeli coronavirus patients on their recovery rate. The time from infection to recovery is the number of days from the first positive to the first negative result of the SARS-CoV-2 PCR test. Al-Rousan and Al-Najjar [3] presented some statistical analyses of the effects of sex, region, reasons for infection, age and date of discharge or illness on the rates of recovered cases and deaths.

In this context, we construct a mixture regression based on the Generalized Odd Log-logistic Maxwell (GOLLMax) family of distributions to estimate the effects of the age group and sex variables on the recovery probabilities of COVID-19 patients residing in China. The GOLLMax family was recently introduced by Prataviera et al. [28] for applications in various fields. We adopt maximum likelihood and Bayesian methods to estimate the parameters of this family, and the adequacy of the fitted regression is assessed by residual analysis.

This paper is structured as follows. Section 2 addresses some structural properties of the new family. The GOLLMax mixture regression and the estimation of its parameters are discussed in Section 3. Residual analysis is addressed in Section 4. The utility of the new regression is illustrated by means of coronavirus lifetimes in Section 5. Finally, the paper closes in Section 6 with some remarks.

2. New properties of the GOLLMax distribution

The cumulative distribution function (cdf) of the generalized odd log-logistic-G (‘GOLL-G’) family (from a baseline G with unknown parameters in $γ$ ) is given by ([7])

F (t) = \frac{G (t)^{σν}}{G (t)^{σν} + {[1 - G (t)^{σ}]}^{ν}},

(1)

where $σ > 0$ and $ν > 0$ are two extra shape parameters. The odd log-logistic-G (OLL-G) [13] and exponentiated-G (exp-G) classes correspond to $σ = 1$ and $ν = 1$ , respectively.

The Maxwell baseline cdf has the form (for t>0)

G (t) = γ_{1} (\frac{3}{2}, \frac{t^{2}}{μ^{2}}),

(2)

where $μ > 0$ is a scale parameter, $γ_{1} (p, y) = γ (p, y) / Γ (p)$ , $γ (p, y) = \int_{0}^{y} w^{p - 1} e^{- w} d w$ , and $Γ (p) = \int_{0}^{\infty} w^{p - 1} e^{- w} d w$ is the gamma function.

The cdf and probability density function (pdf) of the GOLLMax family were defined by Prataviera et al. [28] by inserting (2) in Equation (1)

F (t) = \frac{γ_{1}^{σν} (3 / 2, t^{2} / μ^{2})}{γ_{1}^{σν} (3 / 2, t^{2} / μ^{2}) + {[1 - γ_{1}^{σ} (3 / 2, t^{2} / μ^{2})]}^{ν}}, t > 0

(3)

and

f (t) = \frac{4 σν}{\sqrt{π} μ^{3}} t^{2} \exp (- \frac{t^{2}}{μ^{2}}) \frac{γ_{1}^{σν - 1} (3 / 2, t^{2} / μ^{2}) {[1 - γ_{1}^{σ} (3 / 2, t^{2} / μ^{2})]}^{ν - 1}}{\{γ_{1}^{σν} (3 / 2, t^{2} / μ^{2}) + {[1 - γ_{1}^{σ} (3 / 2, t^{2} / μ^{2})]}^{ν}\}^{2}},

(4)

respectively.
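As a numerical illustration of (3) and (4), the following Python sketch evaluates the GOLLMax cdf and pdf. It is a minimal sketch and not the authors' implementation: the function names are hypothetical, and it assumes the closed form $γ_{1} (3 / 2, z^{2}) = erf (z) - 2 z e^{- z^{2}} / \sqrt{π}$ of the Maxwell cdf (2), so that only the standard library is needed.

```python
import math


def maxwell_cdf(t, mu):
    """Maxwell cdf G(t) = gamma_1(3/2, t^2/mu^2), Equation (2), in closed form."""
    z = t / mu
    g = math.erf(z) - 2.0 / math.sqrt(math.pi) * z * math.exp(-z * z)
    # Clamp to [0, 1] to absorb rounding error for very small or very large t.
    return min(max(g, 0.0), 1.0)


def gollmax_cdf(t, mu, sigma, nu):
    """GOLLMax cdf, Equation (3)."""
    if t <= 0.0:
        return 0.0
    g = maxwell_cdf(t, mu)
    a = g ** (sigma * nu)
    b = (1.0 - g ** sigma) ** nu
    return a / (a + b)


def gollmax_pdf(t, mu, sigma, nu):
    """GOLLMax pdf, Equation (4); assumes 0 < G(t) < 1 in floating point."""
    g = maxwell_cdf(t, mu)
    # Maxwell density g(t) = 4 t^2 exp(-t^2/mu^2) / (sqrt(pi) mu^3).
    maxwell_part = 4.0 * t * t * math.exp(-(t / mu) ** 2) / (math.sqrt(math.pi) * mu ** 3)
    num = sigma * nu * g ** (sigma * nu - 1.0) * (1.0 - g ** sigma) ** (nu - 1.0)
    den = (g ** (sigma * nu) + (1.0 - g ** sigma) ** nu) ** 2
    return maxwell_part * num / den
```

Two quick sanity checks are possible with this sketch: for $σ = ν = 1$ the cdf reduces to the Maxwell cdf, and a central finite difference of `gollmax_cdf` should agree with `gollmax_pdf` at interior points.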

We have

\lim_{t \to \infty} f (t) = 0 \quad \text{and} \quad \lim_{t \to 0^{+}} f (t) = \infty \cdot 1_{(0, 1)} (νσ) + 0 \cdot 1_{[0, \infty)} (νσ),

(5)

where $1_{A}$ is the indicator function of set A.

Further, if $h (t)$ denotes the hazard function corresponding to (3), then $\lim_{t \to \infty} h (t) = \infty$ and $\lim_{t \to 0^{+}} h (t) = \lim_{t \to 0^{+}} f (t)$ .

Prataviera et al. [28] showed that the GOLLMax family allows analyzing data whose hazard function has unimodal and bathtub bimodal shapes. Further, it has as special cases the OLLMax ( $σ = 1$ ), exponentiated-Maxwell (EMax) ( $ν = 1$ ) and Maxwell ( $σ = ν = 1$ ) distributions. So the GOLLMax family is much more flexible and consequently becomes very competitive with many other lifetime models.

We present below new structural properties of the GOLLMax model, which are completely different from those reported by Prataviera et al. [28].

Henceforth, let $T \sim GOLLMax (μ, σ, ν)$ have the GOLLMax distribution with parameter vector $(μ, σ, ν)$ , and $Y \sim LL (1, ν)$ be the log-logistic random variable with unity scale and shape ν.

Some properties of the GOLLMax distribution are reported below:

(a) The cdf of T can be written as
$F (t) = P (Y ⩽ A (t)),$
where $A (t) = A (t; μ, σ) = G (t)^{σ} / [1 - G (t)^{σ}]$ and $G (t)$ is as in (2).
(b) As a consequence of Item a:
(1) If $T \sim GOLLMax (μ, σ, ν)$ , then $Y = A (T) \sim LL (1, ν)$ .
(2) If $Y \sim LL (1, ν)$ , then $T = A^{- 1} (Y) \sim GOLLMax (μ, σ, ν)$ .
(c) Hence, the random variable T admits the stochastic representation (see [8]):
$T = μ \sqrt{γ_{1}^{- 1} (\frac{3}{2}, {(\frac{Y}{1 + Y})}^{1 / σ})},$
where $γ_{1}^{- 1} (3 / 2, \cdot)$ denotes the inverse function of $γ_{1} (3 / 2, \cdot)$ .
(d) By applying Item b,
$E [\frac{γ_{1}^{kσ} (3 / 2, T^{2} / μ^{2})}{[1 - γ_{1}^{σ} (3 / 2, T^{2} / μ^{2})]^{k}}] = E (Y^{k}) = \frac{kπ / ν}{\sin (kπ / ν)}, k < ν .$
(e) Again, by using Item b with $ν = 1$ , we obtain
$E [γ_{1}^{σ} (\frac{3}{2}, \frac{T^{2}}{μ^{2}})] = E (\frac{Y}{1 + Y}) = \int_{0}^{\infty} \frac{y}{(1 + y)^{3}} d y = \frac{1}{2} .$
(f) Since $A (t / k; μ, σ) = A (t; kμ, σ)$ , k>0, by Item a, the following holds (see [8]): if $T \sim GOLLMax (μ, σ, ν),$ then $kT \sim GOLLMax (kμ, σ, ν)$ . That is, the GOLLMax family is closed under changes of scale.
(g) A critical point of the GOLLMax density (4) satisfies (see [8])
$\frac{y^{″}}{(y^{'})^{2}} + \frac{(σ + 1) y^{σ} [y^{νσ} + (1 - y^{σ})^{ν}] - (νσ + 1) y^{νσ} - 2 (1 - y^{σ})^{ν}}{y (1 - y^{σ}) [y^{νσ} + (1 - y^{σ})^{ν}]} = 0,$ (6)
where $y = y (t) = G (t)$ , and $G (t)$ is as in (2). Equation (6) implies that the modality of the GOLLMax density is independent of the parameter μ.
(h) By using the limits in (5) and the number of critical points of the GOLLMax pdf of T, we obtain that the GOLLMax pdf is decreasing, decreasing–increasing–decreasing, unimodal or bimodal (see [8]).
(i) For any $ν ⩾ 1$ (or for any $σ ⩽ 1$ ) the GOLLMax distribution has thinner tails than an exponential distribution (light-tailed distribution) (see [8]).
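The stochastic representation above suggests a simple way to simulate GOLLMax values. The Python sketch below is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the Maxwell cdf is evaluated through its closed form $erf (z) - 2 z e^{- z^{2}} / \sqrt{π}$ with $z = t / μ$, and its inverse is obtained by plain bisection instead of a library routine.

```python
import math
import random


def maxwell_cdf(t, mu):
    """Maxwell cdf G(t) = gamma_1(3/2, t^2/mu^2) in closed form."""
    z = t / mu
    g = math.erf(z) - 2.0 / math.sqrt(math.pi) * z * math.exp(-z * z)
    return min(max(g, 0.0), 1.0)


def maxwell_quantile(u, mu):
    """Solve G(t) = u for t by bisection (G is increasing in t)."""
    lo, hi = 0.0, mu
    while maxwell_cdf(hi, mu) < u:   # enlarge the bracket until it contains the root
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if maxwell_cdf(mid, mu) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rgollmax(mu, sigma, nu, rng):
    """One GOLLMax draw via T = A^{-1}(Y) with Y ~ LL(1, nu)."""
    u = rng.random()
    y = (u / (1.0 - u)) ** (1.0 / nu)      # log-logistic(1, nu) by inversion
    w = (y / (1.0 + y)) ** (1.0 / sigma)   # value of G(T) implied by A(T) = Y
    return maxwell_quantile(w, mu)
```

A simple check uses the moment result for $ν = 1$: the sample mean of $G (T)^{σ}$ over many draws should be close to 1/2.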

The following two results show convergence in law involving the minimum and maximum of a sequence of random variables with the GOLLMax distribution.

Proposition 2.1

There is a sequence of independent, identically distributed (iid) random variables $T_{n} \sim GOLLMax (μ, σ, ν_{n})$ such that

$T_{n} \overset{D}{⟶} U,$

where U has cdf $F_{U} (u) = e^{- [A (u)]^{- p}},$ $u ⩾ 0,$ A is as in Item a, and ‘ $\overset{D}{⟶}$ ’ denotes convergence in distribution.

Proof.

Henceforth, let $Y_{1}, \dots, Y_{n}$ be iid random variables from $Y \sim LL (n^{1 / p}, ν_{n})$ , p>0. Let $Y_{1, n} ⩽ Y_{2, n} ⩽ \dots ⩽ Y_{n, n}$ be their order statistics.

Define $Z_{n} = Y_{n, n} / n^{1 / p}$ . By Theorem 4.3 of [1], $Z_{n} \overset{D}{⟶} X$ , where X has cdf $F_{X} (x) = e^{- x^{- p}}$ , $x ⩾ 0$ . By applying the continuous mapping theorem since $A^{- 1}$ (the inverse function of A) is a continuous map, we have

$T_{n} = A^{- 1} (Z_{n}) \overset{D}{⟶} U = A^{- 1} (X),$

where U has cdf $F_{U} (u) = e^{- [A (u)]^{- p}}$ , $u ⩾ 0$ . Further, since $Z_{n} \sim LL (1, ν_{n})$ , by Item b-(2), $T_{n} \sim GOLLMax (μ, σ, ν_{n})$ .

Proposition 2.2

There is a sequence of iid random variables ${\tilde{T}}_{n} \sim GOLLMax (μ, σ, ν_{n}),$ such that

${\tilde{T}}_{n} \overset{D}{⟶} V,$

where V has cdf $F_{V} (v) = 1 - e^{- [A (v)]^{p}},$ $v ⩾ 0,$ and A is as in Item a.

Proof.

The proof is similar to the previous proposition. For the convenience of the reader, we show the details.

By defining ${\tilde{Z}}_{n} = Y_{1, n} / n^{- 1 / p}$ from Theorem 4.4 of [1], we have ${\tilde{Z}}_{n} \overset{D}{⟶} \tilde{X} \sim Weibull (1, p)$ , p>1. By applying the continuous mapping theorem, we can write

${\tilde{T}}_{n} = A^{- 1} ({\tilde{Z}}_{n}) \overset{D}{⟶} V = A^{- 1} (\tilde{X}),$

where V has cdf $F_{V} (v) = 1 - e^{- [A (v)]^{p}}$ , $v ⩾ 0$ . Since ${\tilde{Z}}_{n} \sim LL (1, ν_{n})$ , by Item b-(2), ${\tilde{T}}_{n} \sim GOLLMax (μ, σ, ν_{n})$ .

The following proposition gives other stochastic representations for the GOLLMax distribution and some related distributions.

Proposition 2.3

Let A be as in Item a. The following hold:

(1) If $X \sim U (0, 1),$ then $A^{- 1} (X^{1 / ν} / (1 - X)^{1 / ν}) \sim GOLLMax (μ, σ, ν)$ .

(2) If $T \sim GOLLMax (μ, σ, ν),$ then $A^{ν} (T) / [1 + A^{ν} (T)] \sim U (0, 1)$ .

(3) If $T \sim GOLLMax (μ, σ, ν),$ then $k \log (A (T)) + ℓ \sim Logistic (ℓ, | k | / ν),$ $k, ℓ \in R$ .

(4) If $X$ and $Y$ are independent $Exponential (1)$ random variables, then $A^{- 1} ((X / Y)^{1 / ν}) \sim GOLLMax (μ, σ, ν)$ .

Proof.

For $X \sim U (0, 1)$ , it is well known that $a + [\log (X) - \log (1 - X)] / ν \sim Logistic (a, 1 / ν)$ . So, $e^{a} X^{1 / ν} / (1 - X)^{1 / ν} \sim LL (e^{a}, ν)$ . Applying properties of the log-logistic distribution, we have $X^{1 / ν} / (1 - X)^{1 / ν} \sim LL (1, ν)$ . Hence, by Item b-(2), $A^{- 1} (X^{1 / ν} / (1 - X)^{1 / ν}) \sim GOLLMax (μ, σ, ν)$ . This proves Item (1). The second item follows analogously.

If $T \sim GOLLMax (μ, σ, ν)$ then, by Item b-(1), $A (T) \sim LL (1, ν)$ . So, it is well known that $\log (A (T)) \sim Logistic (0, 1 / ν)$ . Further, by applying properties of the logistic distribution, $k \log (A (T)) + ℓ \sim Logistic (ℓ, | k | / ν)$ , thus proving Item (3).

For $X, Y \sim Exponential (1)$ independently, a well-known property is that $a - \log (X / Y) / ν \sim Logistic (a, 1 / ν)$ . So, $e^{a} (X / Y)^{1 / ν} \sim LL (e^{a}, ν)$ . Then, $(X / Y)^{1 / ν} \sim LL (1, ν)$ . Hence, by Item b-(2), $A^{- 1} ((X / Y)^{1 / ν}) \sim GOLLMax (μ, σ, ν)$ . So, we complete the proof of the fourth item.

3. The GOLLMax mixture regression

The GOLLMax mixture model is described as follows: let c>0 be the fixed censoring time and T be the lifetime independent of c. Then the observed time $t = min (T, c)$ defines the Type I censoring mechanism [18]. Moreover, the current population is considered to be a mixture of susceptible individuals and recovered individuals. Let $N_{i}$ denote the indicator that the ith individual is susceptible ( $N_{i} = 1$ ) or recovered $(N_{i} = 0)$ (for $i = 1, \dots, n$ ). The mixture model [21,25] takes the form

S_{pop} (t_{i}) = π_{0} + (1 - π_{0}) S (t_{i} | N_{i} = 1),

where $S_{pop} (t_{i})$ is the (unconditional) population survival function of $T_{i}$ , $π_{0} = P (N_{i} = 0)$ is the recovery probability, and the survival function for the susceptible individuals follows from (3)

S (t_{i} | N_{i} = 1) = 1 - \frac{γ_{1}^{σν} (3 / 2, t_{i}^{2} / μ^{2})}{γ_{1}^{σν} (3 / 2, t_{i}^{2} / μ^{2}) + {[1 - γ_{1}^{σ} (3 / 2, t_{i}^{2} / μ^{2})]}^{ν}} .

The improper population density function [21] can be expressed as

f_{pop} (t_{i}) = (1 - π_{0}) f (t_{i}),

where $f (t_{i})$ is the density function (4). The hazard rate function (hrf) of $T_{i}$ is $h_{pop} (t_{i}) = f_{pop} (t_{i}) / S_{pop} (t_{i})$ .
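The plateau of the population survival function at $π_{0}$ can be illustrated numerically. The following Python sketch is a hedged illustration with hypothetical function names and arbitrary parameter values (not estimates from the paper); it again assumes the closed form of the Maxwell cdf in terms of the error function.

```python
import math


def maxwell_cdf(t, mu):
    """Maxwell cdf G(t) = gamma_1(3/2, t^2/mu^2) in closed form."""
    z = t / mu
    g = math.erf(z) - 2.0 / math.sqrt(math.pi) * z * math.exp(-z * z)
    return min(max(g, 0.0), 1.0)


def gollmax_cdf(t, mu, sigma, nu):
    """GOLLMax cdf, Equation (3)."""
    if t <= 0.0:
        return 0.0
    g = maxwell_cdf(t, mu)
    a = g ** (sigma * nu)
    b = (1.0 - g ** sigma) ** nu
    return a / (a + b)


def s_pop(t, pi0, mu, sigma, nu):
    """Population survival: pi0 + (1 - pi0) * S(t | N = 1)."""
    return pi0 + (1.0 - pi0) * (1.0 - gollmax_cdf(t, mu, sigma, nu))


# Illustrative values only: the curve starts at 1 and levels off at pi0.
curve = [s_pop(t, 0.76, 12.0, 0.8, 1.2) for t in (0.0, 5.0, 10.0, 20.0, 40.0, 80.0)]
```

In this sketch `curve` is non-increasing and its last entries approach the chosen recovery probability 0.76, which is the plateau interpretation used in the text.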

Recently, Ortega et al. [26] and Prataviera et al. [27] developed some extended regressions for lifetime data. In a similar manner, the GOLLMax mixture regression is constructed for the response variable $T_{i}$ (for $i = 1, \dots, n$ ) having density (4) with associated vector $x_{i}^{⊤} = (1, x_{i 1}, \dots, x_{ip})$ of the explanatory variables, and the systematic component

μ_{i} = \exp (x_{i}^{⊤} β) and π_{0 i} = \frac{\exp (x_{i}^{⊤} γ)}{1 + \exp (x_{i}^{⊤} γ)},

(7)

where $β = (β_{0}, \dots, β_{p})^{⊤}$ and $γ = (γ_{0}, \dots, γ_{p})^{⊤}$ are unknown parameter vectors. Note that the logit link function is used to model the proportion of individuals recovered.

Equation (7) is only identifiable when $π_{0} (x)$ is modeled by a logistic regression with non-constant covariates [20].

3.1. Estimation

Let $(t_{1}, x_{1}), \dots, (t_{n}, x_{n})$ be a sample from the GOLLMax distribution (4), and let $θ = (σ, ν, β^{⊤}, γ^{⊤})^{⊤}$ be the vector of unknown parameters. An observed lifetime $t_{i}$ contributes to the likelihood function with

\begin{aligned} f_{pop} (t_{i} | x_{i}) & = \frac{4 (1 - π_{0 i}) σν}{\sqrt{π} μ_{i}^{3}} t_{i}^{2} \exp (- \frac{t_{i}^{2}}{μ_{i}^{2}}) \\ & \quad \times \frac{γ_{1}^{σν - 1} (3 / 2, t_{i}^{2} / μ_{i}^{2}) {[1 - γ_{1}^{σ} (3 / 2, t_{i}^{2} / μ_{i}^{2})]}^{ν - 1}}{\{γ_{1}^{σν} (3 / 2, t_{i}^{2} / μ_{i}^{2}) + {[1 - γ_{1}^{σ} (3 / 2, t_{i}^{2} / μ_{i}^{2})]}^{ν}\}^{2}}, \end{aligned}

whereas an element still at risk (censored) at $t_{i}$ contributes with

S_{pop} (t_{i} | x_{i}) = π_{0 i} + (1 - π_{0 i}) \{1 - \frac{γ_{1}^{σν} (3 / 2, t_{i}^{2} / μ_{i}^{2})}{γ_{1}^{σν} (3 / 2, t_{i}^{2} / μ_{i}^{2}) + {[1 - γ_{1}^{σ} (3 / 2, t_{i}^{2} / μ_{i}^{2})]}^{ν}}\} .

Let L and C be the sets of elements for the lifetimes and censoring times, respectively, and r be the number of uncensored observations. The log-likelihood function for $θ$ under uninformative censoring has the form

\begin{aligned} ℓ (θ) & = r \log (\frac{4 σν}{\sqrt{π}}) + \sum_{i \in L} \log (1 - π_{0 i}) + \sum_{i \in L} \log (\frac{t_{i}^{2}}{μ_{i}^{3}}) - \sum_{i \in L} \frac{t_{i}^{2}}{μ_{i}^{2}} \\ & \quad + \sum_{i \in L} \log \{\frac{γ_{1}^{σν - 1} (3 / 2, t_{i}^{2} / μ_{i}^{2}) {[1 - γ_{1}^{σ} (3 / 2, t_{i}^{2} / μ_{i}^{2})]}^{ν - 1}}{\{γ_{1}^{σν} (3 / 2, t_{i}^{2} / μ_{i}^{2}) + {[1 - γ_{1}^{σ} (3 / 2, t_{i}^{2} / μ_{i}^{2})]}^{ν}\}^{2}}\} \\ & \quad + \sum_{i \in C} \log \{π_{0 i} + (1 - π_{0 i}) [1 - \frac{γ_{1}^{σν} (3 / 2, t_{i}^{2} / μ_{i}^{2})}{γ_{1}^{σν} (3 / 2, t_{i}^{2} / μ_{i}^{2}) + {[1 - γ_{1}^{σ} (3 / 2, t_{i}^{2} / μ_{i}^{2})]}^{ν}}]\} . \end{aligned}

(8)

We use the gamlss package of the R software [30,31] to maximize (8) and find the MLE $\hat{θ}$ . The computational program is available at https://github.com/fabiopviera/GOLLMax.Mix. The global deviance $GD = - 2 ℓ (\hat{θ})$ , Akaike information criterion ( $AIC$ ), and Bayesian information criterion ( $BIC$ ) are adopted to select the best regression.
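To make the structure of the log-likelihood (8) concrete, the Python sketch below evaluates it term by term for given parameter values. It is only an illustration under stated assumptions and not the gamlss-based program referred to above: the function name, the argument layout (lists of times, censoring indicators and covariate rows that include the intercept) and the closed-form Maxwell cdf are choices made for this example.

```python
import math


def maxwell_cdf(t, mu):
    """Maxwell cdf G(t) = gamma_1(3/2, t^2/mu^2) in closed form."""
    z = t / mu
    return math.erf(z) - 2.0 / math.sqrt(math.pi) * z * math.exp(-z * z)


def gollmax_mix_loglik(beta, gamma, sigma, nu, times, status, X):
    """Log-likelihood (8); status[i] = 1 for an observed lifetime, 0 if censored."""
    total = 0.0
    for t, d, x in zip(times, status, X):
        mu = math.exp(sum(xj * bj for xj, bj in zip(x, beta)))                   # log link
        pi0 = 1.0 / (1.0 + math.exp(-sum(xj * gj for xj, gj in zip(x, gamma))))  # logit link
        g = maxwell_cdf(t, mu)
        a = g ** (sigma * nu)
        b = (1.0 - g ** sigma) ** nu
        if d == 1:
            # Contribution log f_pop(t_i | x_i) of an observed lifetime.
            total += (
                math.log(4.0 * sigma * nu / math.sqrt(math.pi))
                + math.log(1.0 - pi0)
                + math.log(t * t / mu ** 3)
                - (t / mu) ** 2
                + (sigma * nu - 1.0) * math.log(g)
                + (nu - 1.0) * math.log(1.0 - g ** sigma)
                - 2.0 * math.log(a + b)
            )
        else:
            # Contribution log S_pop(t_i | x_i) of a censored time.
            total += math.log(pi0 + (1.0 - pi0) * (1.0 - a / (a + b)))
    return total
```

A function of this form could be passed (with its sign reversed) to a general-purpose optimizer; the global deviance is then $- 2 ℓ (\hat{θ})$ at the maximizer. The sketch does not guard against observed times so extreme that $G (t)$ rounds to 0 or 1.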

3.2. A Bayesian analysis

We can use a Markov Chain Monte Carlo (MCMC) algorithm to obtain posterior inference for the parameters. We consider independent prior densities for the parameters, $π (β, γ, σ, ν) = π (β) π (γ) π (σ) π (ν)$ , where $β_{j} \sim N (0, τ_{j})$ , $γ_{j} \sim N (0, ρ_{j})$ ( $j = 0, 1, \dots, p$ ), $σ \sim G (a, b)$ , $ν \sim G (c, d)$ , $G (a, b)$ is a gamma distribution, and $N (μ_{0}, τ_{0})$ is a normal distribution. Combining these prior densities with the likelihood function obtained from Equation (8), the posterior density for the parameters becomes

π (β, γ, σ, ν | D) \propto L (β, γ, σ, ν | D) π (β) π (γ) π (σ) π (ν) .

(9)

An MCMC algorithm is needed since the joint posterior density (9) is analytically intractable. We set $σ = \exp (σ^{*})$ and $ν = \exp (ν^{*})$ to obtain numerical stability, thus implying a new parameter space $Θ^{⋆} = \{ϑ : ϑ = (β, γ, σ^{*}, ν^{*})\} \subset R^{2 p + 4}$ . Using the Jacobian of this transformation, the posterior density follows as

π (β, γ, σ^{*}, ν^{*} | D) \propto L (β, γ, σ^{*}, ν^{*} | D) π (β) π (γ) π (σ^{*}) π (ν^{*}) e^{σ^{*} + ν^{*}} .

We implement the Metropolis–Hastings algorithm in Cancho et al. [5] which operates as follows:

(1) Start with any point $ϑ_{(0)}$ and stage indicator $j = 0$ .
(2) Generate a point $ϑ^{'}$ according to the transition kernel $q (ϑ^{'}, ϑ_{(j)}) = N_{2 p + 4} (ϑ_{(j)}, \tilde{Σ})$ , where the covariance matrix $\tilde{Σ}$ is the same at every stage.
(3) Update $ϑ_{(j)}$ to $ϑ_{(j + 1)} = ϑ^{'}$ with probability $p_{j} = \min \{1, π (ϑ^{'} | D) / π (ϑ_{(j)} | D)\}$ , or keep $ϑ_{(j)}$ with probability $1 - p_{j}$ .
(4) Repeat steps (2) and (3), increasing the stage indicator, until the process reaches a stationary distribution.

The computational program is available from the authors upon request.
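The four steps of the algorithm can be sketched generically in Python. This is not the authors' program: the function name and arguments are hypothetical, the proposal covariance is assumed to be diagonal (one standard deviation per coordinate) for simplicity, and the target is supplied as a log posterior so that the acceptance ratio is computed on the log scale.

```python
import math
import random


def metropolis_hastings(log_post, start, step_sd, n_iter, seed=0):
    """Random-walk Metropolis-Hastings with independent normal proposals.

    log_post: function returning the log posterior (up to a constant);
              it must be finite at the starting point.
    start:    initial point, a list of floats.
    step_sd:  proposal standard deviations, one per coordinate (assumed diagonal covariance).
    """
    rng = random.Random(seed)
    current = list(start)
    current_lp = log_post(current)
    chain = []
    for _ in range(n_iter):
        # Step (2): propose from a normal kernel centred at the current point.
        proposal = [c + rng.gauss(0.0, s) for c, s in zip(current, step_sd)]
        proposal_lp = log_post(proposal)
        # Step (3): accept with probability min{1, pi(proposal | D) / pi(current | D)}.
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        chain.append(list(current))
    return chain
```

For the model in this section, `log_post` would be the sum of the log-likelihood, the log priors and the Jacobian term evaluated at $(β, γ, σ^{*}, ν^{*})$ . As a usage check, running the sampler on a standard normal target should give draws with mean near 0 and variance near 1 after discarding a burn-in.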

3.3. Simulation study

We examine the performance of the MLEs from the above regression by means of Monte Carlo simulations for sample sizes (n = 80, 250, 500) using the gamlss package in R and vector operations related to the RS method.

We consider a systematic component defined from Equation (7) as $μ_{i} = \exp (β_{0} + β_{1} x_{1 i} + β_{2} x_{2 i} + β_{3} x_{3 i})$ and $τ_{i} = logit^{- 1} (γ_{0} + γ_{1} x_{4 i} + γ_{2} x_{3 i})$ , whose coefficients are $β_{0} = 1.20$ , $β_{1} = - 0.55$ , $β_{2} = 0.20$ , $β_{3} = - 0.35$ , $σ = 0.35$ , $ν = 0.85$ , $γ_{0} = - 0.95$ , $γ_{1} = 0.30$ and $γ_{2} = 1.50$ . The explanatory variables are taken as $x_{1 i} \sim Uniform (0, 2.5)$ , $x_{2 i} \sim Normal (5, 0.50)$ , $x_{3 i} \sim Uniform (0, 1)$ and $x_{4 i} \sim Binomial (1, 0.5)$ .

Further, the percentage of cured individuals is assumed to be approximately 66%. We present a brief script to generate the random values of the proposed regression with cured proportion:

(1) Calculate $τ_{i} = logit^{- 1} (γ_{0} + γ_{1} x_{4 i} + γ_{2} x_{3 i})$ .
(2) Generate $M_{i} \sim Bernoulli (τ_{i})$ .
(3) If $M_{i} = 0$ , set $y_{i} = \infty$ ; otherwise ( $M_{i} = 1$ ), generate $y_{i} \sim GOLLMax (μ_{i}, σ, ν)$ from Equation (3).
(4) Generate the censoring time $t c_{i} \sim Uniform (0, ξ)$ for $ξ = 15$ .
(5) The observed time $t_{i}$ for the ith individual is $t_{i} = \min (y_{i}, t c_{i})$ .
(6) Create the censoring indicator $δ_{i}$ : set $δ_{i} = 1$ if $y_{i} \leq t c_{i}$ and $δ_{i} = 0$ otherwise.
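The generation steps above can be sketched in Python as follows. This is an illustrative reimplementation, not the authors' R script, and it rests on several assumptions that are not stated in the text: $τ_{i}$ is computed with the inverse-logit transform so that it is a valid Bernoulli probability, the second argument of the normal distribution for $x_{2 i}$ is taken to be a standard deviation, GOLLMax values are drawn through the stochastic representation of Section 2 with a bisection inverse of the Maxwell cdf, and all function names are hypothetical.

```python
import math
import random

# True parameter values of the simulation study.
BETA = (1.20, -0.55, 0.20, -0.35)
GAMMA = (-0.95, 0.30, 1.50)
SIGMA, NU, XI = 0.35, 0.85, 15.0


def maxwell_cdf(t, mu):
    z = t / mu
    g = math.erf(z) - 2.0 / math.sqrt(math.pi) * z * math.exp(-z * z)
    return min(max(g, 0.0), 1.0)


def maxwell_quantile(u, mu):
    """Invert the Maxwell cdf by bisection."""
    lo, hi = 0.0, mu
    while maxwell_cdf(hi, mu) < u:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if maxwell_cdf(mid, mu) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rgollmax(mu, sigma, nu, rng):
    """GOLLMax draw via T = A^{-1}(Y), Y ~ LL(1, nu)."""
    u = rng.random()
    y = (u / (1.0 - u)) ** (1.0 / nu)
    return maxwell_quantile((y / (1.0 + y)) ** (1.0 / sigma), mu)


def simulate(n, seed=0):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.uniform(0.0, 2.5)
        x2 = rng.gauss(5.0, 0.50)            # assumption: 0.50 is the standard deviation
        x3 = rng.random()
        x4 = 1 if rng.random() < 0.5 else 0
        mu = math.exp(BETA[0] + BETA[1] * x1 + BETA[2] * x2 + BETA[3] * x3)
        tau = 1.0 / (1.0 + math.exp(-(GAMMA[0] + GAMMA[1] * x4 + GAMMA[2] * x3)))
        m = 1 if rng.random() < tau else 0                          # Bernoulli(tau_i)
        y = rgollmax(mu, SIGMA, NU, rng) if m == 1 else math.inf    # M_i = 0 gives y_i = infinity
        tc = rng.uniform(0.0, XI)                                   # censoring time
        rows.append({"t": min(y, tc), "delta": 1 if y <= tc else 0,
                     "x": (x1, x2, x3, x4)})
    return rows
```

Calling `simulate(250, seed=1)` returns one dictionary per individual with the observed time, the censoring indicator and the covariates; repeating this for many seeds and refitting the model each time would give the Monte Carlo replicates.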

From the 1000 simulated samples, we obtain the average estimates (AEs), biases and mean squared errors (MSEs) of the estimates. The values reported in Table 1 indicate that the MSEs decay toward zero and the AEs converge to the true parameters when n increases. More details on the simulations of this regression and other scenarios, showing that the asymptotic properties of the estimators are satisfied, are given in Prataviera et al. [28] and Prataviera et al. [29].

Table 1.

Findings from the simulated GOLLMax mixture regression.

	n = 80			n = 250			n = 500
$θ$	AE	Bias	MSE	AE	Bias	MSE	AE	Bias	MSE
$β_{0}$	1.363	0.163	0.161	1.305	0.105	0.058	1.290	0.091	0.008
$β_{1}$	−0.536	0.014	0.003	−0.540	0.010	0.001	−0.538	0.008	0.001
$β_{2}$	0.183	−0.017	0.005	0.194	−0.006	0.002	0.190	−0.010	0.001
$β_{3}$	−0.310	0.040	0.021	−0.320	0.030	0.006	−0.321	0.022	0.004
σ	0.224	−0.126	0.028	0.230	−0.120	0.019	0.227	−0.123	0.017
ν	1.017	0.167	0.051	0.965	0.115	0.020	0.905	0.068	0.003
$γ_{0}$	−1.069	−0.119	3.084	−0.952	−0.002	0.164	−0.954	−0.002	0.026
$γ_{1}$	0.363	0.063	2.982	0.307	0.007	0.116	0.286	−0.014	0.095
$γ_{2}$	1.623	0.123	1.305	1.514	0.014	0.376	1.551	0.011	0.027


4. Checking model

The adequacy of a regression model fitted to data can be assessed by analyzing the residuals to identify discrepant observations and serious departures from the model assumptions. If the model is suitable, the plots of the residuals versus the order of the observations or the predicted values should show random scatter around zero.

We consider the quantile residuals (qrs) [12] for the fitted GOLLMax mixture regression (for $i = 1, \dots, n$ ), namely

\begin{aligned} {qr}_{i} & = Φ^{- 1} \{1 - [{\hat{π}}_{0 i} + (1 - {\hat{π}}_{0 i}) \\ & \quad \times \{1 - \frac{γ_{1}^{\hat{σ} \hat{ν}} (3 / 2, t_{i}^{2} / {\hat{μ}}_{i}^{2})}{γ_{1}^{\hat{σ} \hat{ν}} (3 / 2, t_{i}^{2} / {\hat{μ}}_{i}^{2}) + {[1 - γ_{1}^{\hat{σ}} (3 / 2, t_{i}^{2} / {\hat{μ}}_{i}^{2})]}^{\hat{ν}}}\}]\}, \end{aligned}

where

{\hat{μ}}_{i} = \exp (x_{i}^{⊤} \hat{β}), {\hat{π}}_{0 i} = \frac{\exp (x_{i}^{⊤} \hat{γ})}{1 + \exp (x_{i}^{⊤} \hat{γ})},

$\hat{σ}$ , $\hat{ν}$ , $\hat{β}$ , $\hat{γ}$ are the MLEs and $Φ^{- 1} (\cdot)$ is the inverse standard normal cdf.

We also adopt the Worm Plots (WP) of the residuals to check the quality of the fitted regression [4].
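The quantile residual is simply the standard normal quantile of one minus the fitted population survival function. A minimal Python sketch is given below, with hypothetical function names and with the fitted values $({\hat{π}}_{0 i}, {\hat{μ}}_{i}, \hat{σ}, \hat{ν})$ passed in as plain numbers; it assumes the closed form of the Maxwell cdf and uses `statistics.NormalDist` from the standard library for $Φ^{- 1}$ .

```python
import math
from statistics import NormalDist


def maxwell_cdf(t, mu):
    z = t / mu
    g = math.erf(z) - 2.0 / math.sqrt(math.pi) * z * math.exp(-z * z)
    return min(max(g, 0.0), 1.0)


def gollmax_cdf(t, mu, sigma, nu):
    if t <= 0.0:
        return 0.0
    g = maxwell_cdf(t, mu)
    a = g ** (sigma * nu)
    b = (1.0 - g ** sigma) ** nu
    return a / (a + b)


def quantile_residual(t, pi0, mu, sigma, nu):
    """qr = Phi^{-1}(1 - S_pop(t)) evaluated at the fitted values."""
    s_pop = pi0 + (1.0 - pi0) * (1.0 - gollmax_cdf(t, mu, sigma, nu))
    # inv_cdf requires an argument strictly between 0 and 1,
    # so t must not be so extreme that S_pop rounds to 1.
    return NormalDist().inv_cdf(1.0 - s_pop)
```

Applying `quantile_residual` to every observation gives the values that are plotted against the index, in a normal qq-plot, or in a worm plot.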

5. Application

Equations (4) and (7) are used to model the probability that symptomatic patients recover from COVID-19. For this purpose, a data set was obtained from Dong et al. [11]. The sample consists of 139 individuals of Chinese nationality diagnosed with coronavirus according to the WHO¹ guidance, of whom 52 are women and 87 are men. The response variable T is the time in days from the onset of COVID-19 symptoms to the individual's death. As the last database update was on 03/13/2020, lifetimes not observed by that date are censored, that is, the censoring date is 03/13/2020. In addition, demographic characteristics (age and gender) were observed, so that, for each individual $(i = 1, \dots, 139)$ , the following variables are available:

$t_{i}$ : lifetime (in days),
$x_{i 1}$ : gender ( $female = 0$ , $male = 1$ ),
$x_{i 2}$ : age group ( $0 = age < 60$ years, $1 = age \geq 60$ years),

where the reference level for $x_{i 1}$ is female, $x_{i 1} = 0$ and for $x_{i 2}$ is $age < 60$ years, $x_{i 2} = 0$ .

Table 2 reports the counts of individuals by status, gender and age group. Regardless of gender, 66% of all individuals were aged $< 60$ years and had not died from the disease by the last update, whereas 20% of all individuals were aged $\geq 60$ years and died of COVID-19. Figure 2 displays the Kaplan–Meier survival curves [17]: Figure 2(a) by age group, Figure 2(b) by gender, and Figure 2(c) by age group versus gender. For all plots, there is a high percentage of recovered individuals, mainly among individuals aged $< 60$ years.

Table 2.

Distribution of COVID-19 patients by gender and age.

		Age group
Status	Gender	Age < 60 years	Age ≥ 60 years
Died	Female	1	7
	Male	4	21
Recovered	Female	35	9
	Male	57	5


Figure 2. — Plots of Kaplan–Meier survival functions for COVID-19 data: (a) by age group, (b) by gender, (c) by age group versus gender.

First, we fit the Weibull and GOLLMax distributions, and the special cases OLLMax and EMax of the latter, to the patients' lifetimes without explanatory variables. The MLEs and standard errors (SEs) (in parentheses) are given in Table 3.

Table 3.

Results from some fitted distributions to coronavirus lifetimes.

Regression	$\log (μ)$	$\log (σ)$	$\log (ν)$	$logit (π_{0})$
GOLLMax	3.320	−1.261	0.896	1.163
	(0.005)	(0.036)	(0.009)	(0.111)
OLLMax	2.657		−0.067	1.166
	(0.003)		(0.008)	(0.111)
EMax	2.724	−0.195		1.166
	(0.004)	(0.090)		(0.111)
Weibull	2.846	0.770		1.164
	(0.039)	(0.062)		(0.111)


Based on the estimates in Table 3, the empirical and estimated survival functions for these distributions are reported in Figure 3(a). The empirical and estimated hazard functions are displayed in Figure 3(b), which reveals that the GOLLMax distribution provides the most appropriate fit to the COVID-19 lifetimes based on the risk function. This fact is not visible in the survival function plots.

However, the data set presents some characteristics of individuals that can explain the life span of COVID-19 patients. For this reason, we will check the adequacy of the models considering the effects of the age group and gender.

5.1. The GOLLMax mixture regression

The GOLLMax mixture regression is considered with the systematic components:

{\begin{cases} M_{1} : \log (μ_{i}) = β_{0} + β_{1} Gender + β_{2} Age and logit (π_{0 i}) = γ_{0} \\ M_{2} : \log (μ_{i}) = β_{0} and logit (π_{0 i}) = γ_{0} + γ_{1} Gender + γ_{2} Age \\ M_{3} : \log (μ_{i}) = β_{0} + β_{1} Gender + β_{2} Age and \\ logit (π_{0 i}) = γ_{0} + γ_{1} Gender + γ_{2} Age . \end{cases}

The values of the GD and AIC statistics to compare the GOLLMax, OLLMax, EMax and Weibull regressions under different systematic components are reported in Table 4. The EMax mixture regression under $M_{3}$ has the smallest GD and AIC among all fitted regressions, and hence it can be used effectively to explain the COVID-19 survival times.

Table 4.

Information criteria for mixture regressions with different systematic components.

Model	$M$	GD	AIC
GOLLMax	$M_{1}$	306.90	318.90
	$M_{2}$	301.63	313.63
	$M_{3}$	292.45	308.45
OLLMax	$M_{1}$	308.78	318.78
	$M_{2}$	305.86	315.86
	$M_{3}$	292.57	306.57
EMax	$M_{1}$	308.36	318.36
	$M_{2}$	305.85	315.85
	$M_{3}$	292.39	306.39
Weibull	$M_{1}$	308.01	318.01
	$M_{2}$	306.87	316.87
	$M_{3}$	300.76	314.76
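
The AIC column of Table 4 is consistent with the standard definition $AIC = GD + 2 k$ , where k is the number of estimated parameters. The short Python check below recomputes it from the GD values; the parameter counts are inferred here from the systematic components $M_{1}$ to $M_{3}$ and from the number of shape parameters of each distribution, and are not stated explicitly in the text.

```python
# Global deviances (GD) copied from Table 4.
GD = {
    ("GOLLMax", "M1"): 306.90, ("GOLLMax", "M2"): 301.63, ("GOLLMax", "M3"): 292.45,
    ("OLLMax", "M1"): 308.78, ("OLLMax", "M2"): 305.86, ("OLLMax", "M3"): 292.57,
    ("EMax", "M1"): 308.36, ("EMax", "M2"): 305.85, ("EMax", "M3"): 292.39,
    ("Weibull", "M1"): 308.01, ("Weibull", "M2"): 306.87, ("Weibull", "M3"): 300.76,
}

# Assumed parameter counts: regression coefficients (beta and gamma together)
# per systematic component, plus the extra shape parameters of each distribution.
N_COEF = {"M1": 4, "M2": 4, "M3": 6}
N_SHAPE = {"GOLLMax": 2, "OLLMax": 1, "EMax": 1, "Weibull": 1}


def aic(gd, k):
    """Akaike information criterion from the global deviance."""
    return gd + 2 * k


AIC = {key: aic(gd, N_COEF[key[1]] + N_SHAPE[key[0]]) for key, gd in GD.items()}
best = min(AIC, key=AIC.get)   # model with the smallest AIC
```

Under these assumed counts the recomputed values match the AIC column of Table 4, and `best` is the EMax regression under $M_{3}$ .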


Table 5 reports the MLEs and their SEs for the four mixture regressions fitted under the structure $M_{3}$ to the current data. For the GOLLMax, OLLMax and EMax regressions, all covariates are significant at the 5% significance level except gender in the scale component ( $β_{1}$ ). Some conclusions are presented at the end of this section.

Table 5.

Findings from four fitted regressions under $M_{3}$ to coronavirus data.

	GOLLMax			OLLMax
$θ$	$\hat{θ}$	SE	p-value	$\hat{θ}$	SE	p-value
$β_{0}$	3.563	0.015	<0.001	3.133	0.011	<0.001
$β_{1}$ (male)	−0.017	0.012	0.190	−0.006	0.007	0.441
$β_{2}$ (≥60 years)	−0.632	0.014	<0.001	−0.632	0.010	<0.001
$γ_{0}$	4.045	0.167	<0.001	4.046	0.167	<0.001
$γ_{1}$ (male)	−1.495	0.213	<0.001	−1.495	0.213	<0.001
$γ_{2}$ (≥60 years)	−3.891	0.439	<0.001	−3.893	0.439	<0.001
$\log (σ)$	−0.861	0.041	–	–	–	–
$\log (ν)$	0.718	0.012	–	0.143	0.010	–
	EMax			Weibull
$θ$	$\hat{θ}$	SE	p-value	$\hat{θ}$	SE	p-value
$β_{0}$	3.038	0.011	<0.001	4.818	0.062	<0.001
$β_{1}$ (male)	−0.007	0.009	0.452	0.147	0.072	0.045
$β_{2}$ (≥60 years)	−0.623	0.010	<0.001	−2.227	0.076	<0.001
$γ_{0}$	4.047	0.167	<0.001	1.073	0.207	<0.001
$γ_{1}$ (male)	−1.495	0.213	<0.001	−1.728	0.407	<0.001
$γ_{2}$ (≥60 years)	−3.894	0.439	<0.001	−0.801	0.450	0.077
$\log (σ)$	0.254	0.089	–	0.880	0.062	–


In addition, the plot of the qrs for the fitted EMax mixture regression in Figure 4(a) shows that the residuals behave randomly within the interval $(- 3, 3)$ . There is no evidence against the model assumptions, and there are no influential observations.

Figures 4(b) and 4(c) display the qq-plot and Worm plot for the qrs to assess possible departures from the distribution response in the fitted EMax mixture regression under structure $M_{3}$ . They reveal that this regression is suitable for these data.

5.2. Findings

The estimated total recovered fraction is

\hat{π_{0}} = \frac{1}{139} \sum_{i = 1}^{139} \frac{\exp (x_{i}^{⊤} \hat{γ})}{1 + \exp (x_{i}^{⊤} \hat{γ})} = 0.762,

where $x_{i}^{⊤} \hat{γ} = 4.047 - 1.495 x_{i 1} - 3.894 x_{i 2}$ . This estimate indicates that approximately 76% of the individuals diagnosed with COVID-19 recovered. The 95% asymptotic confidence interval is approximately $(67.0\%, 87.5\%)$ , which includes the overall rate of 80% for recovered patients reported in several studies in the pandemic literature.
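The estimated total recovered fraction can be reproduced from the EMax estimates of $γ$ in Table 5 and the counts in Table 2, since the covariates take only four distinct combinations. The Python sketch below is a worked check with hypothetical variable names; the group sizes are obtained by adding the "Died" and "Recovered" counts of Table 2 for each gender and age group.

```python
import math

# EMax estimates of (gamma_0, gamma_1, gamma_2) from Table 5.
GAMMA_HAT = (4.047, -1.495, -3.894)

# (gender, age group) -> number of patients, from Table 2 (died + recovered);
# gender: 0 = female, 1 = male; age group: 0 = under 60, 1 = 60 or over.
COUNTS = {(0, 0): 1 + 35, (1, 0): 4 + 57, (0, 1): 7 + 9, (1, 1): 21 + 5}


def recovery_prob(gender, age):
    """Fitted recovery probability pi_0 under the logit link in (7)."""
    eta = GAMMA_HAT[0] + GAMMA_HAT[1] * gender + GAMMA_HAT[2] * age
    return 1.0 / (1.0 + math.exp(-eta))


n = sum(COUNTS.values())
# Average of the individual fitted probabilities over the whole sample.
overall = sum(c * recovery_prob(g, a) for (g, a), c in COUNTS.items()) / n
```

Here `n` equals 139 and `overall` agrees with the reported value 0.762 up to rounding; `recovery_prob(0, 0)` and `recovery_prob(1, 1)` give the fitted probabilities for women under 60 and men aged 60 or over, respectively.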

Moreover, we can obtain the following interpretations for the EMax regression from Table 5 at the 5% significance level:

Findings for the scale parameter μ

Based on the fitted regression, there is no evidence (p-value = 0.452) of a significant effect of the gender variable. However, in relation to the survival times, the plot in Figure 5(a) shows that women have a higher survival curve compared to men regardless of age.
There is a significant difference between individuals aged less than 60 years and older than or equal to 60 years in relation to the survival times regardless of the sex. This interpretation can also be seen in Figure 5(b).

Figure 5. — Plots of Kaplan–Meier and estimated survival functions from the EMax regression: (a) by gender and (b) by age group.

Findings for the recovery probability $π_{0}$

Women have a higher probability of recovering than men, since the estimate of $γ_{1}$ is negative. The estimated overall proportion of recovered women is ${\hat{π}}_{Female} = 0.846$ and that of recovered men is approximately ${\hat{π}}_{male} = 0.712$ , regardless of age. This can be seen graphically in Figure 5(a).
In relation to age, the estimate of $γ_{2}$ is also negative, which indicates that individuals aged $\geq 60$ years have lower recovery probabilities than those aged $< 60$ years. So, the proportion of recovered individuals under 60 years is ${\hat{π}}_{age < 60} = 0.948$ , and for individuals aged 60 years or over it is only ${\hat{π}}_{age \geq 60} = 0.336$ . This fact can be seen in Figure 5(b). Age is an aggravating factor in the recovery of coronavirus patients.
The results for the stratified model in relation to sex by age are reported in Figure 6. In this case, it is noted that female and male individuals under 60 years old have recovered proportions equal to ${\hat{π}}_{Female, age < 60} = 0.982$ and ${\hat{π}}_{Male, age < 60} = 0.927$ , respectively. For individuals aged 60 and over, the proportions of recovered individuals are equal to ${\hat{π}}_{Female, age \geq 60} = 0.538$ , and ${\hat{π}}_{Male, age \geq 60} = 0.207$ . In addition, regardless of gender, the estimated survival function of patients under 60 stabilizes at around 40 days. For women over 60, the estimated survival function stabilizes at approximately 20 days, while for men over 60, this plateau occurs around 30 days.

Figure 6. — Plots of Kaplan–Meier and estimated survival function from the EMax regression stratified by gender and age group.

Note that the estimated probabilities of recovered individuals by the EMax regression using the likelihood and Bayesian methods are approximately equal to the fraction of recovered individuals observed in the Kaplan–Meier plots (Figures 5 and 6).
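The comparison with the Kaplan–Meier plots relies on reading the height at which the curve levels off as a nonparametric estimate of the long-term (here, recovered) fraction. A minimal sketch of that estimator is given below. The function name `kaplan_meier` and the toy data are hypothetical and are not the study data.

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier estimate at each distinct observed event time.

    times  : follow-up times
    events : 1 if the event was observed, 0 if right-censored
    Returns (event_times, survival_just_after_each_event_time).
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    surv = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)             # still under observation at t
        d = np.sum((times == t) & events)        # events occurring exactly at t
        s *= 1.0 - d / at_risk
        surv.append(s)
    return event_times, np.array(surv)

# Hypothetical toy data: days of follow-up and event indicators.
toy_times = [3, 5, 5, 8, 12, 20, 20, 20, 20, 20]
toy_events = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

event_times, surv = kaplan_meier(toy_times, toy_events)
plateau = surv[-1]   # height of the curve after the last observed event
```

In the toy data the curve stops decreasing after day 8, and the remaining height is the empirical counterpart of the recovered fraction that the regression estimates parametrically.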

5.3. Bayesian analysis

The informative priors $β_{j} \sim N (0, 100)$ (for j = 0, 1, 2), $γ_{j} \sim N (0, 100)$, $σ \sim G (1, 0.1)$, and $ν \sim G (1, 0.1)$ are adopted for the fitted regression models. We run 35,000 MCMC iterations, discard the first 5000 as burn-in, and thin the remaining chain to every tenth draw. Posterior results are therefore based on 3000 draws of the Markov chains, with convergence monitored by the methods of Cowles and Carlin [10]. We calculate the posterior means, standard deviations (SDs), $95\%$ highest posterior density (HPD) intervals, and the logarithm of the pseudo marginal likelihood (LPML) statistic [15].
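The posterior summaries named above can be computed directly from stored MCMC output. The sketch below is illustrative only: the function names are hypothetical, the HPD interval is taken as the shortest interval containing the requested share of sorted draws, and the LPML is computed as the sum of log conditional predictive ordinates (CPO), each estimated by the harmonic mean of the pointwise likelihoods over the draws. It assumes a matrix of pointwise log-likelihoods is available, where a censored observation contributes its survival function rather than its density.

```python
import numpy as np

def retained_draws(total_iter, burn_in, thin):
    """Number of draws kept after discarding burn-in and thinning."""
    return (total_iter - burn_in) // thin

def hpd_interval(draws, prob=0.95):
    """Shortest interval containing a fraction `prob` of the sorted draws."""
    x = np.sort(np.asarray(draws, dtype=float))
    n = x.size
    m = int(np.ceil(prob * n))                 # draws inside the interval
    widths = x[m - 1:] - x[: n - m + 1]        # width of every candidate interval
    i = int(np.argmin(widths))
    return x[i], x[i + m - 1]

def lpml(loglik):
    """LPML from an (n_draws, n_obs) matrix of pointwise log-likelihoods.

    log CPO_i = -log( mean over draws of exp(-loglik[s, i]) ),
    evaluated with a max shift for numerical stability.
    """
    loglik = np.asarray(loglik, dtype=float)
    n_draws = loglik.shape[0]
    neg = -loglik
    shift = neg.max(axis=0)
    log_mean_inv = shift + np.log(np.exp(neg - shift).sum(axis=0)) - np.log(n_draws)
    return float(-log_mean_inv.sum())

# 35,000 iterations, 5000 burn-in, every tenth draw kept.
n_kept = retained_draws(35_000, 5_000, 10)
```

A larger LPML indicates better predictive fit, which is how the criterion is used to rank the regressions.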

Table 6 gives the posterior means, SDs, $95\%$ HPD intervals for the parameters, and the LPML statistics for all regressions. All covariates are statistically significant at the $5\%$ significance level for all regressions except for $β_{1}$ (the effect of gender on the lifetime), whose $95\%$ HPD interval contains zero. This is the same result obtained before for the MLEs. The EMax regression is the best model under the LPML criterion, in agreement with the previous results in Table 4.

Table 6.

Posterior means, SDs, $95\%$ HPD intervals (L = lower limit, U = upper limit), and LPML statistics for the GOLLMax, OLLMax, EMax, and Weibull regression models.

	GOLLMax				OLLMax
$θ$	Mean	SD	L	U	Mean	SD	L	U
$β_{0}$	3.193	0.407	2.487	4.010	3.281	0.390	2.659	4.206
$β_{1}$	−0.002	0.179	−0.363	0.345	0.031	0.173	−0.331	0.339
$β_{2}$	−0.738	0.297	−1.335	−0.183	−0.816	0.418	−1.864	−0.253
$γ_{0}$	4.096	0.784	2.570	5.657	4.043	0.977	1.999	6.027
$γ_{1}$	−1.543	0.627	−2.721	−0.272	−1.596	0.652	−2.837	−0.298
$γ_{2}$	−3.951	0.711	−5.460	−2.625	−3.889	0.904	−5.518	−1.896
$\log ν$	−0.050	0.469	−0.881	0.881	0.068	0.173	−0.281	0.371
$\log σ$	0.233	0.749	−1.197	1.524
LPML		−154.298				−155.487
	EMax				Weibull
$θ$	Mean	SD	L	U	Mean	SD	L	U
$β_{0}$	3.135	0.321	2.655	3.732	3.550	0.492	2.903	4.814
$β_{1}$	0.003	0.162	−0.312	0.331	0.026	0.178	−0.343	0.359
$β_{2}$	−0.718	0.305	−1.277	−0.250	−0.842	0.544	−2.291	−0.208
$γ_{0}$	4.048	0.737	2.600	5.447	3.806	1.308	1.300	6.294
$γ_{1}$	−1.543	0.610	−2.795	−0.425	−1.572	0.626	−2.798	−0.429
$γ_{2}$	−3.903	0.679	−5.227	−2.557	−3.680	1.313	−5.904	−0.942
$\log σ$	0.224	0.270	−0.312	0.742	0.912	0.131	0.636	1.167
LPML		−153.903				−157.206
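As a rough illustration of how the $γ$ estimates translate into recovery probabilities, the sketch below plugs the EMax posterior means from Table 6 into a logistic link. The logistic form of $π_{0}$ and the dummy coding (male = 1, age $\geq 60$ = 1) are assumptions made here for illustration, inferred from the signs of the estimates and not restated in this section. The function name `recovery_prob` is hypothetical.

```python
import math

# EMax posterior means from Table 6.
GAMMA = {"g0": 4.048, "g1": -1.543, "g2": -3.903}

def recovery_prob(male, age60plus, g=GAMMA):
    """Plug-in recovery probability under an assumed logistic link.

    male, age60plus : 0/1 indicators (assumed coding).
    """
    eta = g["g0"] + g["g1"] * male + g["g2"] * age60plus
    return 1.0 / (1.0 + math.exp(-eta))

plug_in = {
    "A (female, <60)":  recovery_prob(0, 0),
    "B (male, <60)":    recovery_prob(1, 0),
    "C (female, >=60)": recovery_prob(0, 1),
    "D (male, >=60)":   recovery_prob(1, 1),
}

for label, value in plug_in.items():
    print(f"{label}: {value:.3f}")
```

Under these assumptions the plug-in values are roughly 0.98, 0.92, 0.54, and 0.20. They need not coincide with posterior means of $π_{0}$ computed draw by draw, because the mean of a nonlinear transformation differs from the transformation of the mean.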


The probability that a patient is disease-free after a time $t>0$ (days) of follow-up, for any combination of covariates ( $x_{0}$ ), can be expressed as

$$p (t) = \Pr (N = 0 | T > t) = \frac{π_{0} (x_{0})}{1 - [1 - π_{0} (x_{0})] F (t)},$$

where $μ$ and $π_{0}$ are given by (7) evaluated at $x_{0}$ , and $F (\cdot)$ is the EMax cdf.
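The expression for $p (t)$ can be recovered from Bayes' rule. The short derivation below assumes the standard mixture formulation, in which a recovered patient ( $N = 0$ ) never experiences the event, so that $\Pr (T > t | N = 0) = 1$ , and the population survival function is $S_{pop} (t) = π_{0} + (1 - π_{0}) [1 - F (t)]$ . The symbol $S_{pop}$ is introduced here only for this sketch.

$$
\begin{aligned}
p (t) &= \Pr (N = 0 | T > t) = \frac{\Pr (T > t | N = 0) \Pr (N = 0)}{\Pr (T > t)} \\
&= \frac{π_{0}}{π_{0} + (1 - π_{0}) [1 - F (t)]} = \frac{π_{0}}{1 - (1 - π_{0}) F (t)} .
\end{aligned}
$$

At $t = 0$ the cdf vanishes and $p (0) = π_{0}$ , while $p (t)$ increases toward one as $F (t) \to 1$ .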

Note that $p (0) = π_{0} (x_{0})$ is the proportion of recovered patients by the end of follow-up. We determine the posterior distribution of the recovered proportion from the posterior sample of the EMax regression. Table 7 provides the posterior means and $95\%$ HPD intervals of $π_{0}$ and of the probability of being recovered after 20 days of follow-up for four hypothetical patients.

Table 7.

Posterior means and $95\%$ HPD intervals (L = lower limit, U = upper limit) for the recovered rate by the end of follow-up ( $π_{0}$ ) and for the probability of being recovered after 20 days of follow-up ( $p (20)$ ).

			$π_{0}$			$p (20)$
Patients	Gender	Age (years)	Mean	L	U	Mean	L	U
A	female	$< 60$	0.981	0.956	0.999	0.986	0.966	1.000
B	male	$< 60$	0.924	0.862	0.983	0.944	0.887	0.992
C	female	$\geq 60$	0.546	0.322	0.754	0.897	0.736	0.997
D	male	$\geq 60$	0.208	0.072	0.352	0.669	0.411	0.909
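The relation between $π_{0}$ and $p (t)$ in Table 7 can be explored numerically. The sketch below evaluates $p (t)$ for a given value of $F (t)$ and inverts the formula to back out the value of $F (20)$ implied by a pair ( $π_{0}$ , $p (20)$ ). Both function names are hypothetical, and the EMax cdf itself is not implemented: its value is passed in as a number.

```python
def disease_free_prob(pi0, cdf_t):
    """p(t) = pi0 / (1 - (1 - pi0) * F(t)) for a recovered fraction pi0."""
    if not 0.0 < pi0 <= 1.0:
        raise ValueError("pi0 must lie in (0, 1]")
    if not 0.0 <= cdf_t <= 1.0:
        raise ValueError("F(t) must lie in [0, 1]")
    return pi0 / (1.0 - (1.0 - pi0) * cdf_t)

def implied_cdf(pi0, p_t):
    """Invert p(t) for F(t); requires 0 < pi0 < 1 and pi0 <= p_t <= 1."""
    if not 0.0 < pi0 < 1.0:
        raise ValueError("pi0 must lie strictly between 0 and 1")
    if not pi0 <= p_t <= 1.0:
        raise ValueError("p(t) must lie in [pi0, 1]")
    return (1.0 - pi0 / p_t) / (1.0 - pi0)

# Posterior means for patients C and D from Table 7.
f20_c = implied_cdf(0.546, 0.897)
f20_d = implied_cdf(0.208, 0.669)
```

With the Table 7 posterior means, the implied values of $F (20)$ for patients C and D are both close to 0.86 to 0.87. This is only a plug-in approximation, since posterior means of nonlinear functions do not transform exactly, but it is in line with the small estimated gender effect on the lifetime.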


For female patients under 60 years, the recovered proportion is greater than for any other class of patients. For male patients aged 60 years or over, the recovered proportion is lower than for the other classes. The posterior distribution of the recovered proportion is represented in Figure 7(a). It is clear that the posterior distribution for female COVID-19 patients under 60 years has the least variability. Figure 7(b) displays the recovered probability for the four patients over the follow-up period. For example, the probability of being disease-free after 20 days of follow-up is 0.986 for patient A and 0.669 for patient D.

Figure 7. — (a) The posterior distribution of the recovered fraction patients and (b) recovered probability for four patients.

6. Concluding remarks

Distributions that are more flexible in the shape of the density function and/or the hazard function can be interesting alternatives for the analysis of COVID-19 data. We proposed the Generalized Odd Log-logistic Maxwell (GOLLMax) mixture regression to model 139 lifetimes of symptomatic patients diagnosed with COVID-19 under right censoring. We used two estimation methods for the parameters of the proposed regression: maximum likelihood and Bayesian inference. Monte Carlo simulations investigated the precision of the maximum likelihood estimates (MLEs) and indicated that the new regression is a good alternative for modeling censored COVID-19 lifetimes. The only explanatory variables adopted to explain the lifetime are the age group and sex of the patients, but other variables may be used when applying the proposed regression to other coronavirus data sets. We displayed the Kaplan–Meier and estimated survival functions from the new regression with their corresponding recovered proportions, which indicate that the regression fits the current data well. The main purpose of this paper is to show empirically the importance of the new regression for analyzing COVID-19 data such as incubation times, lengths of stay in intensive care units, and survival and recovery times of hospitalized patients as functions of independent variables that can explain the variability of these times. The survival and recovery probabilities can be estimated from Equations (4) and (7). Future research could study these problems in other countries.

Acknowledgements

We are very grateful to the referees and the associate editor for helpful comments that considerably improved the paper.

Funding Statement

We gratefully acknowledge financial support from CAPES and CNPq.

Notes

https://www.worldometers.info/coronavirus/countries

See: https://www.who.int/docs/default-source/coronaviruse/who-china-joint-mission-on-covid-19-final-report.pdf

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

1. Ahsanullah M. and Alzaatreh A., Some characterizations of the log-logistic distribution, Stoch. Qual. Control 33 (2018), pp. 23–29.
2. Alkhouli M., Nanjundappa A., Annie F., Bates M.C., and Bhatt D.L., Sex differences in COVID-19 case fatality rate: Insights from a multinational registry, Mayo Clin. Proc. 29 (2020), pp. 1–22.
3. Al-Rousan N. and Al-Najjar H., Data analysis of coronavirus COVID-19 epidemic in South Korea based on recovered and death cases, J. Med. Virol. 92 (2020), pp. 1603–1608.
4. Buuren S.V. and Fredriks M., Worm plot: A simple diagnostic device for modelling growth reference curves, Stat. Med. 20 (2001), pp. 1259–1277.
5. Cancho V.G., Rodrigues J., and de Castro M., A flexible model for survival data with a cure rate: A Bayesian approach, J. Appl. Stat. 38 (2011), pp. 57–70.
6. Centers for Disease Control and Prevention, Coronavirus Disease 2019 (COVID-19) (2021). Accessed 2021, April 22. Available at https://www.cdc.gov/coronavirus/2019-ncov/symptoms-testing/symptoms.html.
7. Cordeiro G.M., Alizadeh M., Ozel G., Hosseinl B., Ortega E.M.M., and Altun E., The generalized odd log-logistic family of distributions: Properties, regression models and applications, J. Stat. Comput. Simul. 87 (2017), pp. 908–932.
8. Cordeiro G.M., Rodrigues G.M., Ortega E.M.M., de Santana L.H., and Vila R., An extended Rayleigh model: Properties, regression and COVID-19 application, preprint (2022). Available at https://arxiv.org/submit/4257234/view.
9. COVID-19 Dashboard by the Center for Systems Science and Engineering (2021). Accessed 2021, April 22. Available at https://www.arcgis.com/apps/opsdashboard/index.html/bda7594740fd40299423467b48e9ecf6.
10. Cowles M.K. and Carlin B.P., Markov chain Monte Carlo convergence diagnostics: A comparative review, J. Amer. Statist. Assoc. 91 (1996), pp. 883–904.
11. Dong E., Du H., and Gardner L., An interactive web-based dashboard to track COVID-19 in real time, Lancet Infect. Dis. 20 (2020), pp. 533–534.
12. Dunn P.K. and Smyth G.K., Randomized quantile residuals, J. Comput. Graph. Stat. 5 (1996), pp. 236–244.
13. Gleaton J.U. and Lynch J.D., Properties of generalized log-logistic families of lifetime distributions, J. Probab. Statist. Sci. 4 (2006), pp. 51–64.
14. Hewitt J., Carter B., Vilches-Moraga A., Quinn T.J., Braude P., Verduri A., Pearce L., Stechman M., Short R., Price A., Collins J.T., Bruce E., Einarsson A., Rickard F., Mitchell E., Holloway M., Hesford J., Barlow-Pay F., Clini E., Myint P.K., Moug S.J., and McCarthy K., COPE Study Collaborators, The effect of frailty on survival in patients with COVID-19 (COPE): A multicentre, European, observational cohort study, Lancet Public Health 5 (2020), pp. 444–451.
15. Ibrahim J.G., Chen M.H., and Sinha D., Bayesian Survival Analysis, Springer, New York, 2001.
16. Jin J.-M., Bai P., He W., Wu F., Liu X.-F., Han D.-M., Liu S., and Yang J.-K., Gender differences in patients with COVID-19: Focus on severity and mortality, Front. Public Health 8 (2020), pp. 1–6.
17. Kaplan E.L. and Meier P., Nonparametric estimation from incomplete observations, J. Am. Stat. Assoc. 53 (1958), pp. 457–481.
18. Lawless J.F., Statistical Models and Methods for Lifetime Data, John Wiley & Sons, New Jersey, 2003.
19. Lee E.T. and Wang J.W., Statistical Methods for Survival Data Analysis, John Wiley & Sons, New Jersey, 2002.
20. Li C.-S., Taylor J.M.G., and Sy J.P., Identifiability of cure models, Stat. Probab. Lett. 54 (2001), pp. 389–395.
21. Maller R.A. and Zhou X., Survival Analysis with Long-term Survivors, John Wiley & Sons, New Jersey, 1996.
22. Mehra M.R., Desai S.S., Kuy S., Henry T.D., and Patel A.N., Cardiovascular disease, drug therapy, and mortality in COVID-19, N. Engl. J. Med. 382 (2020), pp. 1–7. [Retracted]
23. Ministério da Saúde – Brasil, Coronavírus: COVID-19 (2021). Accessed 2021, April 22. Available at https://coronavirus.saude.gov.br/sobre-a-doencasintomas.
24. Morena V., Milazzo L., Oreni L., Bestetti G., Fossali T., Bassoli C., Torre A., Cossu M.V., Minari C., Ballone E., Perotti A., Mileto D., Niero F., Merli S., Foschi A., Vimercati S., Rizzardini G., Sollima S., Bradanini L., Galimberti L., Combo R., Micheli V., Negri C., Ridolfo A.L., Meroni L., Galli M., Antinori S., and Corbellino M., Off-label use of tocilizumab for the treatment of SARS-CoV-2 pneumonia in Milan, Italy, Eur. J. Intern. Med. 76 (2020), pp. 36–42.
25. Ortega E.M.M., Cancho V.G., and Lachos V.H., A generalized log-gamma mixture model for cure rate: Estimation and sensitivity analysis, Sankhya 71 (2009), pp. 1–29.
26. Ortega E.M.M., da Cruz J.N., and Cordeiro G.M., The log-odd logistic-Weibull regression model under informative censoring, Model Assist. Stat. Appl. 14 (2019), pp. 239–254.
27. Prataviera F., Loibel S.M.C., Greco K.F., Ortega E.M.M., and Cordeiro G.M., Modelling non-proportional hazard for survival data with different systematic components, Environ. Ecol. Stat. 27 (2020), pp. 467–489.
28. Prataviera F., Ortega E.M.M., and Cordeiro G.M., A new bimodal Maxwell regression model with engineering applications, Appl. Math. Inf. Sci. 14 (2020), pp. 817–831.
29. Prataviera F., Silva A.M.M., Cardoso E.J.B.N., Cordeiro G.M., and Ortega E.M.M., A novel generalized odd log-logistic Maxwell-based regression with application to microbiology, Appl. Math. Model. 93 (2021), pp. 148–164.
30. Stasinopoulos D.M., Rigby R.A., and Akantziliotou C., Instructions on How to Use the GAMLSS Package in R (2008). Available at http://www.gamlss.com/wp-content/uploads/2013/01/gamlss-manual.pdf.
31. Stasinopoulos D.M., Rigby R.A., Heller G.Z., Voudouris V., and De Bastiani F., Flexible Regression and Smoothing: Using GAMLSS in R, Chapman and Hall/CRC, New York, 2017.
32. Voinsky I., Baristaite G., and Gurwitz D., Effects of age and sex on recovery from COVID-19: Analysis of 5769 Israeli patients, J. Infect. 81 (2020), pp. 102–103.
33. Xie J., Covassin N., Fan Z., Singh P., Gao W., Li G., Kara T., and Somers V.K., Association between hypoxemia and mortality in patients with COVID-19, Mayo Clin. Proc. 95 (2020), pp. 1138–1147.
34. Yan Y., Yang Y., Wang F., Ren H., Zhang S., Shi X., Yu X., and Dong K., Clinical characteristics and outcomes of patients with severe Covid-19 with diabetes, BMJ Open Diabetes Res. Care 8 (2020), pp. 1–9.
35. Yang A.-P., Liu J.-P., Tao W.-Q., and Li H.-M., The diagnostic and predictive role of NLR, d-NLR and PLR in COVID-19 patients, Int. Immunopharmacol. 84 (2020), pp. 1–7.
