Cox regression with doubly truncated responses and time-dependent covariates: the impact of innovation on firm survival

J de Uña-Álvarez; A I Martínez-Senra; M S Otero-Giráldez; M A Quintás

doi:10.1080/02664763.2023.2178641

2023 Feb 16;51(4):780–792.

PMCID: PMC10929676 PMID: 38476618

Abstract

The creation of new firms is an important incentive for the economic growth of a country, since it generates employment, encourages competition and promotes innovation. In this work, we investigate the survival of Spanish firms that were created in 2001 or later and closed down between 2004 and 2012. The information was gathered from the Technological Innovation Panel (PITEC), a survey focused on technological innovation in Spanish firms. In particular, a Cox regression model with time-dependent covariates was used to identify and quantify the determinants of the risk of exit for the firm. The selection bias due to the interval sampling of the firms was corrected by using methods for doubly truncated lifetimes. Interestingly, the correction for the selection bias changes both the size and the statistical significance of the effects provided by standard Cox regression.

Keywords: Interval sampling, selection bias, inverse probability weighting, firm lifespan, economic cycle, panel data

1. Introduction

The creation of new firms is an important incentive for the economic growth of a country, since it generates employment, encourages competition and promotes innovation [1]. This explains why public policies for entrepreneurship have proliferated in recent years [19]. However, in the EU, 18% of the firms created in 2019 closed down within their first year of life, and the percentage rose to 55% when considering a lifespan of five years or less [16]. Data for Spain are not far from the EU average: in the same period, the corresponding percentages for Spain were 23% and 58%, respectively. Moreover, these figures show little variability over the period studied, both in the EU and in Spain [16]. It is therefore highly relevant for managers and policymakers to understand the determinants of firm survival.

In this paper, we provide empirical evidence on the variables that affect the risk of exit for a firm, focusing on the economic cycle and several measures of the firm's innovation activities. Specifically, we consider a sample of 258 Spanish firms that were created in 2001 or later and closed down between 2004 and 2012; the study is therefore restricted to short-duration firms. The source of data is the Technological Innovation Panel (PITEC), a survey conducted in Spain that gathered information on the technological innovation of enterprises. A Cox regression model with time-dependent covariates is used to measure the impact of the indicators on the risk of exit for a firm and to identify the factors whose impact on survival is significant.

An important issue in our study is that, due to the interval sampling, the firms' lifespans are doubly truncated. As a consequence, lifespans of different lengths have different sampling probabilities; this induces a selection bias, so ordinary estimation approaches are generally biased. The same situation arises in previous studies in Epidemiology, Medicine, Engineering and Social Sciences in which interval sampling was employed. For example, Moreira and de Uña-Álvarez [25] investigated the potential selection bias for childhood cancer; Zhu and Wang [39] corrected for double truncation when using registry survival data for ovarian cancer; Ye and Tang [38] accounted for interval sampling in a reliability study; Mandel et al. [22], in a study of the age at onset of Parkinson's disease, illustrated the relevance of correcting for double truncation in regression; Rennert and Xie [30] described in detail the selection bias that may arise in autopsy-confirmed Alzheimer's disease patients; more recently, Frank et al. [18] and Weissbach and Wied [37] accounted for the selection bias in studies of the age of firms at insolvency and of marriage lengths. See Ref. [12] for an up-to-date review of methods and software for doubly truncated data. Importantly, while double truncation has been well studied in recent years, it is still often overlooked in applied research.

In this paper, we revisit Cox regression for a doubly truncated response. We follow the inverse probability weighting approach in Ref. [22]; unlike that paper, however, we deal with the time-dependent covariates that arise in the PITEC case study (see Section 3 for details). As mentioned in Ref. [35], time-dependent covariates can be incorporated in the key estimating equations, so the consistency of the inverse probability weighting procedure should still hold. A formal discussion is given in Section 2. Interestingly, the application to PITEC data illustrates, for the first time in the literature on double truncation, how Cox regression with time-dependent covariates may be performed. At the same time, our case study highlights the relevance of correcting for the selection bias; that is, we discuss the extent to which the estimated coefficients and their statistical significance are affected by double truncation.

The rest of the paper is organized as follows. In Section 2 we introduce the notation and formally present the correction of the score equations of the Cox model, so that both the selection bias and the time-dependent covariates are taken into account. In Section 3, using PITEC data, the Cox regression model is employed to estimate the impact of several measures of innovation activities on the risk of exit for a firm. The main conclusions of our research and a final discussion are reported in Section 4.

2. Methodology

2.1. Doubly-truncated Cox regression with time-dependent covariates

Let T be the lifetime in the target population, let $Z(t)$ be a vector of time-dependent covariates, and let $(U, V)$ be the couple of left and right truncating variables, assumed to be independent of $\{T, Z(t), 0 \leq t \leq T\}$. Double truncation on T means that the information $(T, Z(\cdot), U, V)$ is available only when $U \leq T \leq V$. This is the situation with interval sampling, where only lifespans that end within a certain calendar interval $[d_{0}, d_{1}]$ are recruited. In such a case, the right-truncation limit for T is $V = d_{1} - d_{I}$, where $d_{I}$ is the date of onset, while the left-truncation limit is $U = V - (d_{1} - d_{0})$. Due to the double truncation, the sampling probability for a lifespan of size t is given by $G(t) = P(U \leq t \leq V)$, which generally varies with t. In Figure 1, right, the estimated sampling probability $G_{n}(t)$ for the PITEC case study is depicted. The function is not monotone, as is usual with doubly truncated data; this is due to the existence of two truncation limits, which lessen the chances of observing relatively small or large lifetimes. This $G_{n}$ is a nonparametric maximum-likelihood estimator (NPMLE) that can be obtained from a system of marginal estimating equations (see e.g. [32]). Specifically, $G_{n}$ is a solution of the pair of equations

Figure 1. Selection bias for the PITEC study. Interval sampling results in doubly truncated firm lifetimes, which consequently have different sampling probabilities. Left: interval sampling for the PITEC study; gray segments are not observed. The left-truncating variable is given by U = V − 8 (years). Right: sampling probabilities for the PITEC study, together with 500 bootstrap replicates (in gray).

$$F_{n}(u, v) = \Big(\sum_{i=1}^{n} G_{n}(T_{i})^{-1}\Big)^{-1} \sum_{i=1}^{n} G_{n}(T_{i})^{-1} I(u \leq T_{i} \leq v), \tag{i}$$
$$G_{n}(t) = \Big(\sum_{i=1}^{n} F_{n}(U_{i}, V_{i})^{-1}\Big)^{-1} \sum_{i=1}^{n} F_{n}(U_{i}, V_{i})^{-1} I(U_{i} \leq t \leq V_{i}), \tag{ii}$$

where $(T_{i}, U_{i}, V_{i})$ , $1 \leq i \leq n$ , are the values of $(T, U, V)$ in the available sample of size n. In (i)-(ii), $F_{n} (u, v)$ stands for an estimator of $F (u, v) = P (u \leq T \leq v)$ . In practice, $G_{n}$ and $F_{n}$ are obtained iteratively from the system above, starting from (i) with the initial solution $G_{n} (T_{i}) = 1$ , $1 \leq i \leq n$ .
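The fixed-point iteration described above can be sketched in a few lines of Python. This is a minimal illustration with quadratic cost per iteration, not the implementation behind shen() in DTDA; the function name shen_npmle and the toy data are hypothetical. In the toy example every truncation interval contains t = 2, so $G_{n}(2) = 1$, and by symmetry the other two values converge to $(\sqrt{5} - 1)/2 \approx 0.618$.

```python
def shen_npmle(T, U, V, tol=1e-10, max_iter=10_000):
    """Solve equations (i)-(ii) by fixed-point iteration (hypothetical helper).

    T, U, V: equal-length lists of observed lifetimes and truncation limits.
    Returns the list [G_n(T_1), ..., G_n(T_n)].
    """
    n = len(T)
    G = [1.0] * n  # initial solution G_n(T_i) = 1
    for _ in range(max_iter):
        # (i): F_n(U_j, V_j), each lifetime weighted by 1 / G_n(T_i)
        inv_g = [1.0 / g for g in G]
        total_g = sum(inv_g)
        F = [sum(w for w, t in zip(inv_g, T) if u <= t <= v) / total_g
             for u, v in zip(U, V)]
        # (ii): G_n(T_i), each truncation interval weighted by 1 / F_n(U_j, V_j)
        inv_f = [1.0 / f for f in F]
        total_f = sum(inv_f)
        G_new = [sum(w for w, u, v in zip(inv_f, U, V) if u <= t <= v) / total_f
                 for t in T]
        if max(abs(a - b) for a, b in zip(G, G_new)) < tol:
            return G_new
        G = G_new
    return G


# Toy example (hypothetical data): every interval [U_j, V_j] contains t = 2.
G = shen_npmle(T=[1, 2, 3], U=[0, 1, 2], V=[2, 3, 4])
```

Every observed lifetime lies in its own truncation interval, so no $F_{n}(U_{j}, V_{j})$ can be zero and the divisions are safe.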

The Cox regression model specifies the covariate effects on the hazard function, which represents the risk of exit at a given time point. Specifically, under the proportional hazards Cox model, the conditional hazard function is given by

$$h(t \mid Z(u), 0 \leq u \leq t) = h_{0}(t) \exp(β Z(t)) \tag{1}$$

where $h_{0}(t)$ is a baseline hazard and β is the vector of regression coefficients. Equation (1) implicitly states that, given $Z(t)$, the risk of exit at time t is unaffected by the past values of the covariates $Z(u)$, u < t. The appeal of the Cox model lies in its flexibility and in the easy interpretation of the regression coefficients. In particular, $\exp(β_{k})$ is the multiplicative factor for the hazard at time t when the k-th covariate $Z_{k}(t)$ increases by one unit while the other covariates are held fixed.

Mandel et al. [22] investigated the effect of double truncation on model (1). These authors introduced an inverse probability weighting approach to adapt the standard estimating equation of the Cox model to a doubly truncated response. The consistency of this approach for fixed covariates was formally established in Refs [22] and [13]. Technical derivations and proofs for time-dependent covariates are, at the time of writing, missing in the literature. In the case of time-dependent covariates, the estimating equation in Ref. [22] is rewritten as

$$\sum_{i=1}^{n} \left[ Z_{i}(T_{i}) - \frac{\sum_{j=1}^{n} Z_{j}(T_{i}) e^{β Z_{j}(T_{i})} I(T_{j} \geq T_{i}) G_{n}(T_{j})^{-1}}{\sum_{j=1}^{n} e^{β Z_{j}(T_{i})} I(T_{j} \geq T_{i}) G_{n}(T_{j})^{-1}} \right] = 0. \tag{2}$$

As mentioned, the consistency of the solution $\hat{β}$ to (2) has not been theoretically established. However, the steps in [22] for fixed covariates can be adapted for such a goal; see also [4] for related results. Interestingly, Wang [36] discussed how to modify the risk sets under length-bias in order to obtain a suitable Cox's partial likelihood with time-dependent covariates. Our equation (2) equals the score equation in that paper once the selection probability $G_{n} (T_{j})$ is replaced by its analog under length-bias, namely $T_{j}$ ; see Ref. [28] for more on the length-bias model. Preliminary simulations with time-dependent covariates suggest the consistency of the solution in (2) in the doubly truncated setting too; see Section 2.3 for an illustrative example.
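For a single covariate, equation (2) can be evaluated directly and solved by bisection: its derivative in β is minus a sum of weighted risk-set variances, so the left-hand side is non-increasing in β. The Python sketch below assumes a scalar covariate and uncensored lifetimes, and uses hypothetical helper names (ipw_score, solve_beta); it illustrates the equation and is not how coxph() solves it. Covariates are passed as callables so that $Z_{j}(T_{i})$ is evaluated at each event time.

```python
import math


def ipw_score(beta, T, Z, G):
    """Left-hand side of equation (2) for a scalar covariate.

    T: observed lifetimes (all are events, no censoring).
    Z: list of callables, Z[j](t) = covariate value of subject j at time t.
    G: sampling probabilities G_n(T_j), used as inverse weights in the risk sets.
    """
    score = 0.0
    for i, t_i in enumerate(T):
        num = den = 0.0
        for j, t_j in enumerate(T):
            if t_j >= t_i:  # subject j is in the risk set at T_i
                w = math.exp(beta * Z[j](t_i)) / G[j]
                num += Z[j](t_i) * w
                den += w
        score += Z[i](t_i) - num / den
    return score


def solve_beta(T, Z, G, lo=-10.0, hi=10.0, tol=1e-10):
    """Bisection on the score, which is non-increasing in beta."""
    if ipw_score(lo, T, Z, G) < 0 or ipw_score(hi, T, Z, G) > 0:
        raise ValueError("root not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ipw_score(mid, T, Z, G) > 0:
            lo = mid  # score still positive: the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Fixed covariate and equal weights: reduces to the ordinary partial likelihood.
Z_fixed = [lambda t, z=z: z for z in (1, 0, 1)]
beta_hat = solve_beta([1.0, 2.0, 3.0], Z_fixed, [1.0, 1.0, 1.0])
```

In the usage example the root is $\hat{β} = -\tfrac{1}{2}\log 2 \approx -0.347$. Multiplying all weights by a common constant leaves the root unchanged, since the weights only enter through the ratio in (2).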

2.2. Practical implementation

Cox regression is implemented in most statistical software platforms. For instance, the function coxph() of the R package survival allows for proportional hazards regression with time-dependent covariates. The estimating equation in Ref. [22] can be solved by plugging in the minus log-transformed sampling probabilities, $-\log G_{n}(T_{i})$, in the right-hand side of the formula within coxph() as an offset() term. This works because of the identity $\exp(β Z_{j}(T_{i})) G_{n}(T_{j})^{-1} = \exp(β Z_{j}(T_{i}) - \log G_{n}(T_{j}))$, which, substituted in (2), shows that the inverse weights enter the risk sets exactly as an offset does. Specifically, when covariates are fixed, one may use the code line

coxph(Surv(T) ~ Z + offset(-log(w)))

where w contains the estimated sampling probabilities. In turn, the entries of w can be easily computed with the function shen() in the DTDA package. With time-dependent covariates this is rewritten as

coxph(Surv(entry, exit, event) ~ Z + offset(-log(w)))

Here, as usual, entry, exit and event refer to the expanded dataset, in which the covariates Z are constant within each (entry, exit] time interval and the weight w of an individual is repeated on each of its rows. For the PITEC study in Section 3, this simply means that one-year intervals are used. Note that this data organization allows for both discrete and continuous covariates. Importantly, because the weights are estimated, the standard errors and P-values reported by coxph() are no longer valid. The simple bootstrap, which resamples with replacement from the data, has been suggested as a valid procedure to approximate these quantities in practice [12]. This is the method we use to evaluate the significance of the covariates in the PITEC study; see Section 3.
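The bootstrap step can be sketched as below. The text states only that the simple bootstrap resamples with replacement from the data; resampling whole firms (so that all yearly rows of a firm stay together), recomputing the weights in every replicate, and converting the bootstrap standard error into a two-sided normal-approximation P-value are assumptions of this sketch. The callable estimate is hypothetical and stands for the full pipeline (NPMLE of $G_{n}$ followed by the weighted Cox fit).

```python
import math
import random


def bootstrap_se(firms, estimate, B=500, seed=2023):
    """Simple bootstrap standard error of one coefficient.

    firms: list of per-firm records (all yearly rows of a firm kept together).
    estimate: hypothetical callable that re-estimates G_n on the resample and
              refits the weighted Cox model, returning one coefficient.
    """
    rng = random.Random(seed)
    reps = []
    for _ in range(B):
        resample = [rng.choice(firms) for _ in range(len(firms))]
        reps.append(estimate(resample))
    mean = sum(reps) / B
    return math.sqrt(sum((r - mean) ** 2 for r in reps) / (B - 1))


def wald_p_value(beta_hat, se):
    """Two-sided normal-approximation P-value, 2 * (1 - Phi(|beta_hat| / se))."""
    return math.erfc(abs(beta_hat) / se / math.sqrt(2))
```

Re-estimating $G_{n}$ inside every replicate is what lets the bootstrap capture the extra variability that coxph() ignores when the weights are treated as known.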

An alternative estimating equation for Cox regression with a doubly-truncated response is implemented in coxDT() within the package SurvTrunc. When covariates are fixed, coxDT() performs a weighted Cox regression which is equivalent to

coxph(Surv(T) ~ Z, weights = 1/w)

This implies that each addend in the estimating equation (2) is further inversely weighted by $G_{n}(T_{i})$; see Ref. [29] for a formal justification. This alternative approach has been found to provide somewhat larger mean squared errors than the solution to (2); see Refs [11] and [35] for further results and discussion.
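The difference between the two estimating equations can be made explicit with a single flag. In this Python sketch (hypothetical names, scalar covariate, no censoring), double_weight=True multiplies each addend of (2) by $1/G_{n}(T_{i})$, following the description above; it is not the SurvTrunc source code.

```python
import math


def cox_ipw_score(beta, T, Z, G, double_weight=False):
    """Equation (2) (double_weight=False) or its doubly weighted variant.

    Z: list of callables, Z[j](t) = covariate of subject j at time t.
    G: sampling probabilities G_n(T_j).
    """
    score = 0.0
    for i, t_i in enumerate(T):
        num = den = 0.0
        for j, t_j in enumerate(T):
            if t_j >= t_i:  # risk set at T_i, inversely weighted by G_n(T_j)
                w = math.exp(beta * Z[j](t_i)) / G[j]
                num += Z[j](t_i) * w
                den += w
        addend = Z[i](t_i) - num / den
        # Doubly weighted version: the addend itself is divided by G_n(T_i).
        score += addend / G[i] if double_weight else addend
    return score
```

When all sampling probabilities are equal the two scores are proportional and share the same root, so the two methods can only differ when $G_{n}(T_{i})$ varies across individuals.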

2.3. Simulation study

In this section, we perform a simulation study to compare the performance of several estimating approaches for the doubly truncated Cox model with time-dependent covariates. We simulated a model with a single binary covariate $Z_{1} \in \{0, 1\}$ and a change point for the hazard ratio at time ψ. Specifically, the conditional hazard given $Z_{1}$ is

$$h(t \mid Z_{1}, Z_{2}(t)) = h_{0}(t) \exp(β_{1} Z_{1} + β_{2} Z_{2}(t)) \tag{3}$$

where $Z_{2}(t) = Z_{1} I(t \geq ψ)$. In this model, the hazard of an individual with $Z_{1} = 1$ relative to one with $Z_{1} = 0$ is $\exp(β_{1})$ for $t < ψ$ and $\exp(β_{1} + β_{2})$ for $t \geq ψ$. Note that the Cox-like change point model (3) actually involves a time-dependent covariate, $Z_{2}(t)$. In our simulations, we took $h_{0}(\cdot) \equiv 1$, $ψ = 0.5$, $\exp(β_{1}) = 1$ and $\exp(β_{2}) = 2$, so the hazard ratio jumps from 1 to 2 at $t = 0.5$. With these values, the response T is essentially supported on the interval $(0, 8)$.
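Lifetimes from the change point model (3) can be simulated by inverting the cumulative hazard, which is piecewise linear when $h_{0} \equiv 1$. The Python sketch below is one possible generator, not necessarily the one used for Table 1 (the text does not describe the generator or the distribution of $Z_{1}$); the function names are hypothetical.

```python
import math
import random


def hazard_ratio(t, beta1, beta2, psi):
    """Hazard of Z1 = 1 relative to Z1 = 0 at time t under model (3)."""
    return math.exp(beta1 + beta2 * (t >= psi))


def draw_lifetime(z1, beta1, beta2, psi, rng):
    """Draw T under model (3) with h0 = 1 by inverting the cumulative hazard.

    The hazard is constant on [0, psi) and on [psi, inf), so
    H(t) = r0 * t before psi and H(t) = r0 * psi + r1 * (t - psi) afterwards.
    """
    r0 = math.exp(beta1 * z1)            # hazard before the change point
    r1 = math.exp((beta1 + beta2) * z1)  # hazard after it: Z2(t) = Z1 for t >= psi
    e = rng.expovariate(1.0)             # H(T) is a standard exponential variable
    if e < r0 * psi:
        return e / r0
    return psi + (e - r0 * psi) / r1
```

With $β_{1} = 0$ and $β_{2} = \log 2$, an individual with $Z_{1} = 1$ survives past $ψ = 0.5$ with probability $e^{-0.5} \approx 0.61$ and is then exposed to a doubled hazard.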

In order to introduce double truncation on T, we simulated interval sampling as follows. First, the right-truncating variable V was drawn as $V = 16 ξ^{6}$, with ξ uniformly distributed on the unit interval. Second, the left-truncating variable was computed as $U = V - 8$. Data violating the condition $U \leq T \leq V$ were removed from the simulated trial. Therefore, the simulation corresponds to an interval sampling scenario in which the width of the sampling interval is 8. Since the distribution of V is not uniform, the birth process is not Poisson and the sampling probability $G(t) = P(U \leq t \leq V)$ is non-constant. Indeed, in the simulated model $G(t)$ is sharply decreasing, indicating higher chances of observation for small lifetimes; specifically, for $0 \leq t \leq 8$, $G(t) = P(t/16 \leq ξ^{6} \leq t/16 + 1/2) = (t/16 + 1/2)^{1/6} - (t/16)^{1/6}$. The rate of truncation in this model is about 67%.
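The truncation mechanism and the closed form of $G(t)$ can be checked by Monte Carlo. In this Python sketch the names are hypothetical, and draw_lifetime is any callable returning a lifetime (for instance a draw from model (3)); the observed truncation rate depends on that lifetime distribution, so the sketch does not attempt to reproduce the 67% figure.

```python
import random


def sampling_probability(t):
    """Closed-form G(t) = P(U <= t <= V) for V = 16 * xi**6, U = V - 8, 0 <= t <= 8."""
    return (t / 16 + 0.5) ** (1 / 6) - (t / 16) ** (1 / 6)


def interval_sample(n, draw_lifetime, rng):
    """Draw (T, U, V) until n triplets satisfy U <= T <= V.

    draw_lifetime: callable rng -> T.
    Returns the retained triplets and the observed truncation rate.
    """
    kept, tried = [], 0
    while len(kept) < n:
        tried += 1
        t = draw_lifetime(rng)
        v = 16 * rng.random() ** 6  # right-truncating variable
        u = v - 8                   # sampling interval of width 8
        if u <= t <= v:
            kept.append((t, u, v))
    return kept, 1 - n / tried
```

For example, $G(0) = 2^{-1/6} \approx 0.89$ while $G(8) = 1 - 2^{-1/6} \approx 0.11$, which is the sharp decrease mentioned above.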

For evaluating the accuracy of several estimating approaches for the risk factors $\exp (β_{1})$ and $\exp (β_{2})$ , we have drawn 1000 Monte Carlo trials from the doubly truncated model with sample sizes n = 250, 500 and 1000. Three different estimators were included in the simulation study: (a) the solution to the ordinary estimating equation for the Cox model that does not correct for double truncation; (b) the weighted approach proposed by Mandel et al. [22]; and (c) the doubly weighted procedure in Ref. [29]. For simplicity, for methods (b) and (c) the weight corresponding to the i-th datum was computed from the underlying sampling probability as $w_{i} = G (T_{i})$ . This eliminates the noise attached to the estimation of G. In practice, the NPMLE $G_{n}$ must be used; note that the regression coefficients estimated from methods (b) and (c) will remain consistent as long as the truncating couple $(U, V)$ and the target process $(T, Z (\cdot))$ are statistically independent.

The bias and standard deviation of the three methods over the 1000 Monte Carlo trials are reported in Table 1, from which two conclusions can be drawn.

Table 1.

Bias and standard deviation (SD) along 1000 Monte Carlo trials of three different estimating approaches for Cox regression with time-dependent covariates: change point model for the hazard ratio.

Method	n	Bias, $\exp(β_{1})$	SD, $\exp(β_{1})$	Bias, $\exp(β_{2})$	SD, $\exp(β_{2})$
Ordinary	250	−0.0588	0.1723	0.0741	0.5738
	500	−0.0646	0.1302	0.0229	0.3987
	1000	−0.0638	0.0891	−0.0148	0.2801

Mandel et al.	250	0.0136	0.2018	0.1205	0.6147
	500	0.0065	0.1520	0.0644	0.4250
	1000	0.0064	0.1037	0.0215	0.2985

Rennert-Xie	250	0.0127	0.2057	0.1363	0.6339
	500	0.0059	0.1533	0.0738	0.4325
	1000	0.0057	0.1052	0.0283	0.3045


Note: The sample size is n. The ordinary method does not correct for the potential selection bias.

First, the two methods that correct for the selection bias exhibit a bias and a standard deviation that decrease as the sample size grows. In contrast, ordinary estimation is systematically biased for the risk factor $\exp(β_{1})$, with a bias of about −0.06 that does not vanish as n increases.
Second, between methods (b) and (c), the former presents slightly smaller standard deviations and is thus preferable (note that the bias terms are negligible compared to the standard deviations). This agrees with simulation results previously reported for Cox regression with fixed covariates [11,35]. Nevertheless, the standard deviations of the two methods never differed by more than about 3%, so their results are similar in the scenarios simulated in this paper.
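The comparison of standard deviations can be recomputed from Table 1 with a few lines of Python. The values below are copied from the table; the computation shows that the Rennert-Xie standard deviation exceeds that of Mandel et al. in every cell, by roughly 0.9% (n = 500, $\exp(β_{1})$) up to 3.1% (n = 250, $\exp(β_{2})$).

```python
# Standard deviations copied from Table 1: (SD of exp(beta1), SD of exp(beta2)).
sd_mandel = {250: (0.2018, 0.6147), 500: (0.1520, 0.4250), 1000: (0.1037, 0.2985)}
sd_rennert_xie = {250: (0.2057, 0.6339), 500: (0.1533, 0.4325), 1000: (0.1052, 0.3045)}

# Relative increase (%) of the Rennert-Xie SD over the Mandel et al. SD.
rel_increase = {
    n: tuple(100 * (rx - m) / m for m, rx in zip(sd_mandel[n], sd_rennert_xie[n]))
    for n in sd_mandel
}

for n, (d1, d2) in rel_increase.items():
    print(f"n={n}: exp(beta1) +{d1:.1f}%, exp(beta2) +{d2:.1f}%")
```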

In the following section, we use the procedure in Ref. [22] to estimate the covariate effects in the PITEC study. In order to investigate the potential influence of the selection bias, we also compare the results to those provided by the standard Cox estimating equation.

3. Application: PITEC study

PITEC, as part of the Community Innovation Survey, was conducted in Spain under the leadership of the National Statistics Institute (INE) and gathered information on the technological activity of Spanish firms. The panel is a stratified sample comprising firms with more than 200 employees; firms that perform internal research and development (R&D); firms with fewer than 200 employees and external R&D; and a subsample of firms with fewer than 200 employees and no innovation expenses.

Our sample is made up of newly created firms identified in the PITEC survey between 2003 and 2013 that closed down during their follow-up; the sample thus refers to firms of short duration. Specifically, the 258 firms with an observed closedown in the period 2004–2012 were included in the sample (see Figure 1, left); all these firms were created in 2001 or later. The small number of firms closing in 2003 (10 cases) were discarded since, in that year, the PITEC survey did not include information on relevant features such as the innovation range or the international dimension of the firm, which are important for our regression approach below. This resulted in a slightly stronger left-truncation effect compared to the full survey data. On the other hand, the firms still operating in 2013 were not included in our analysis in order to guarantee the homogeneity of the sample. Note that these firms, with potentially long lifetimes, could behave differently from the short-duration cases of our study. The exclusion of such firms prevents the application of methods for left-truncated data, which are simpler than those for double truncation.

Interestingly, the sample covers two very different periods of the economic cycle: economic growth (2003–2007) and economic crisis (2008–2012). The lifetime T of the recruited firms ranged between 2 and 11 years. Due to the interval sampling, the lifetime T is doubly truncated by a couple $(U, V)$, where V denotes the span (in years) between the year of creation and 2012, and U is the shift of V corresponding to the length of the sampling interval, that is, $U = V - (2012 - 2004) = V - 8$; see Figure 1, left. The information on T and $(U, V)$ allows for the estimation of the selection bias under a quasi-independence assumption [23]. Kendall's tau for the association between T and U (or, equivalently, V) gave $\hat{τ} = -0.045$, with a standard error of 0.041; thus, there is no evidence against the quasi-independence between T and U. In Figure 1, right, we depict the NPMLE of the sampling probability for a lifetime of size T = t, that is, $G(t) = P(U \leq t \leq V)$. The figure reveals the effect of double truncation, which results in lower chances of observation for relatively small or large lifetimes. The variability in the estimation of G is illustrated by adding to Figure 1, right, 500 curves computed from 500 bootstrap replicates. Standard errors and P-values reported below (Table 3, corrected method) take this variability into account in order to perform valid inference.
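For the PITEC design, the truncation limits follow directly from the creation year. The Python sketch below measures everything in whole calendar years (an assumption consistent with the yearly survey) and uses hypothetical names; it also confirms that $U \leq T \leq V$ is equivalent to a closure year between 2004 and 2012.

```python
FIRST_CLOSURE_YEAR, LAST_CLOSURE_YEAR = 2004, 2012  # sampling interval [d0, d1]


def truncation_limits(creation_year):
    """(U, V) in years for a firm created in creation_year."""
    v = LAST_CLOSURE_YEAR - creation_year               # V = d1 - dI
    u = v - (LAST_CLOSURE_YEAR - FIRST_CLOSURE_YEAR)    # U = V - 8
    return u, v


def is_recruited(creation_year, closure_year):
    """Double truncation U <= T <= V with lifetime T = closure - creation."""
    t = closure_year - creation_year
    u, v = truncation_limits(creation_year)
    return u <= t <= v
```

For a firm created in 2001, for instance, $(U, V) = (3, 11)$, so only lifetimes between 3 and 11 years can be observed.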

Table 2.

Summary statistics for the time-dependent covariates in the PITEC study: Mean, standard deviation (SD), minimum (Min) and maximum (Max).

Covariate	Mean	SD	Min	Max
IR	1.45	1.36	0	5
TECHDEVEL^a	33.39	41.35	0	100
APPLIEDR^a	24.12	36.21	0	100
BASICR^a	3.07	12.85	0	100
SIZE	96.52	572.59	1	8,377
ECOCYCLE^b	0.63	0.48	0	1
ID^b	0.64	0.48	0	1


Note: The number of years of observation is 1,284.

^aPercentage.

^bDummy variable.

Table 3.

Cox regression with time-dependent covariates for the PITEC study: Standard estimates and correction for selection bias.

Covariate	Standard estimate	Standard P-value	Corrected estimate	Corrected P-value
IR	−0.1582	0.009	−0.1264	0.028
TECHDEVEL^a	−0.0031	0.076	−0.0019	0.250
APPLIEDR^a	−0.0040	0.042	−0.0051	0.079
BASICR^a	0.0014	0.810	0.0030	0.490
SIZE	0.0002	0.094	0.0002	0.370
ECOCYCLE^b	−0.4745	0.037	−0.8934	0.000
ID^b	−0.0740	0.581	0.0109	0.940


Note: The corrected P-values were calculated from 500 resamples of the simple bootstrap. The number of years of observation is 1,284.

^a Percentage.

^b Dummy variable.

Besides the firm lifetime, which is the variable of main interest, we extracted covariate information from PITEC. Since the survey is performed yearly, the covariate information is time-dependent. The selected covariates mainly describe the innovation activities of the firm; they also help to control for the size of the enterprise and the economic cycle in the regression analysis. Innovation has been one of the main factors explaining the survival of firms, particularly in recent years due to globalization and the increasing competitiveness of markets. Innovative enterprises adapt better to a changing environment and exhibit a lower exit rate (Esteve-Pérez et al. [15]). According to the resource-based view of the firm [3], the survival probability depends on the ability of firms to develop specific capabilities, which in turn may be improved by investing in R&D. Several studies have pointed out the positive influence of innovation on survival [8,9,15,27]. Other studies, however, have been less conclusive, or have even detected conflicting effects [5,6]. This disagreement may stem from the use of different innovation measures, or from overlooking certain characteristics of the firm or of its environment [34].

Statistical information on the size of the firm, its international dimension and the economic cycle that the firm faces along its lifespan is relevant too. Large firms generally have better chances of survival, since they can take advantage of economies of scale, better access to financial resources, more specialized human capital or greater diversification [14]. Also, firms operating abroad are usually associated with improved efficiency and higher survival rates, because competition in international markets is probably harder than in domestic markets (Esteve-Pérez et al. [15]). On the contrary, a financial crisis lessens survival, since strong restrictions on liquidity and resources immediately appear [10].

Specifically, the selected covariates were: innovation range (IR), technological development (TECHDEVEL), applied research (APPLIEDR), basic research (BASICR), firm size (SIZE), international dimension (ID) and economic cycle (ECOCYCLE). Variable IR takes integer values between 0 and 5, depending on the number of types of innovation present in the firm. The possible innovation types are: new or significantly improved assets; new or significantly improved services; new or significantly improved manufacturing or asset/service production procedures; new or significantly improved logistic systems or delivery or distribution procedures for the firm's supplies, assets or services; and new or significantly improved support activities for the firm's processes. Variables TECHDEVEL, APPLIEDR and BASICR give, in percentage, the distribution of the current expenditure among the indicated R&D categories (technological development, applied research and basic research). The number of employees in the firm is given by SIZE, while ID is a dummy variable that takes the value 1 for firms that operate neither in the European market nor in any other foreign country. Finally, ECOCYCLE indicates whether the year of observation belongs to economic growth (ECOCYCLE = 1, 2003–2007) or economic crisis (ECOCYCLE = 0, 2008–2012). Note that all the covariates, with the only exception of ECOCYCLE, are internal in the sense of [20], meaning that they are only available subject to the survival of the firm. Importantly, an indicator of the closing down of the firm by year t is available in the PITEC survey corresponding to year t + 1. This makes it possible to relate the risk of exit for the firms to the covariate values.
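The firm-year organization described above, with one row per survey year and the closure indicator read from the following wave, can be sketched as follows. The age-scale convention (the wave of calendar year y covers the interval (y − d − 1, y − d] for a firm created in year d), the field names and the helper expand_firm are assumptions for illustration only; the rows follow the (entry, exit, event) layout mentioned in Section 2.2.

```python
def expand_firm(firm_id, created, closed, covariates, g_hat):
    """One counting-process row (entry, exit] per observed calendar year.

    created, closed: calendar years of creation and closure, so T = closed - created.
    covariates: dict {calendar year: covariate value measured in that wave}.
    g_hat: the firm's estimated sampling probability G_n(T), copied to every row
           so that an offset of -log(w) can be applied row by row.
    The closure flag for year y is assumed to have been read from wave y + 1.
    """
    rows = []
    for year in sorted(covariates):
        if not created < year <= closed:
            continue  # keep only years inside the firm's lifetime on the age scale
        age = year - created
        rows.append({"id": firm_id, "entry": age - 1, "exit": age,
                     "event": int(year == closed), "z": covariates[year],
                     "w": g_hat})
    return rows


# Hypothetical firm created in 2002 and closed in 2005, with IR values per wave.
rows = expand_firm(1, created=2002, closed=2005,
                   covariates={2003: 2, 2004: 1, 2005: 0}, g_hat=0.4)
```

Only the last row of a firm carries event = 1, and every row repeats the same weight, which is what the offset formulation requires.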

The 258 firms in the sample contribute 1,396 years of observation. Since some of the covariates had missing values in 112 of these years, those years were removed, and the summary statistics of the time-dependent covariates in Table 2 refer to the remaining 1,284 years. The average innovation range was relatively low (1.45), although with some heterogeneity (standard deviation: 1.36). The percentage of current expenditure for R&D was 60.6 on average (33.39 + 24.12 + 3.07); most of this investment was dedicated to technological development or applied research, while basic research in the sampled firms was almost negligible. On the other hand, Table 2 shows a large variability in firm size (standard deviation of 572.59 employees against a mean of 96.52). Finally, the firms presented a good balance with respect to both the economic cycle (about 63% of the observation years correspond to economic growth) and the international dimension (36% of the observation years relate to firms operating abroad).

Cox model (1) was estimated using the 1,284 years of observation for the 258 firms. The results of both standard Cox regression and its correction for the sampling bias (see Figure 1) are provided in Table 3. The covariates that are significant at level $α = 0.1$ when the sampling bias is taken into account are IR, APPLIEDR and ECOCYCLE. These covariates are also identified by standard Cox regression, although the size of the effects does not exactly coincide. This is particularly clear for ECOCYCLE, whose effect almost doubles (from −0.4745 to −0.8934) after the correction. Importantly, standard Cox regression declares significance for TECHDEVEL and SIZE, for which the corrected P-values are above $α = 0.1$. For TECHDEVEL this is mainly due to the estimated effect, which visibly shrinks in the weighted regression; for SIZE, the different result is rather due to the larger variance of the corrected estimator. In either case, Table 3 makes it evident that the sampling bias plays a role, and that correcting for double truncation may influence both point estimation and statistical inference. In the following, we discuss the results provided by the weighted regression.

According to the estimated Cox regression model that accounts for double truncation, economic growth (ECOCYCLE = 1) significantly reduces the risk of exit for the firm, with a multiplicative factor of $\exp(-0.8934) = 0.41$ for the hazard. That is, the risk of exit during the period 2003–2007 is about 41% of that during the economic crisis of 2008–2012. The sign of the effect agrees well with previous studies on the topic [17].

The estimated effect for IR ($\hat{β}_{1} = -0.1264$) was significantly negative, indicating that the risk of exit for the firm decreases as the innovation range grows. Specifically, the risk of exit is multiplied by $\exp(\hat{β}_{1}) = 0.88$ for each one-unit increase in IR. A larger investment in applied research also significantly (at level $α = 0.1$) decreases the hazard; the estimated multiplicative factor for the hazard, in this case, is $\exp(-0.0051) = 0.99$ per one-percentage-point increase in APPLIEDR.
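The hazard ratios quoted above are exponentials of the corrected coefficients in Table 3, which a short Python check reproduces. It also illustrates that, because APPLIEDR is measured in percentage points, a 10-point increase multiplies the hazard by about 0.95, and that the ECOCYCLE coefficient grows by a factor of about 1.88 after the correction.

```python
import math

# Corrected (weighted) estimates copied from Table 3.
corrected = {"IR": -0.1264, "TECHDEVEL": -0.0019, "APPLIEDR": -0.0051,
             "BASICR": 0.0030, "SIZE": 0.0002, "ECOCYCLE": -0.8934, "ID": 0.0109}

# exp(beta_k): multiplicative change in the hazard per one-unit increase in covariate k.
hazard_ratio = {name: math.exp(b) for name, b in corrected.items()}

# APPLIEDR is a percentage, so a 10-point increase multiplies the hazard by exp(10 * beta).
applied_10 = math.exp(10 * corrected["APPLIEDR"])

# Growth of the ECOCYCLE coefficient after the correction (standard estimate: -0.4745).
ecocycle_growth = corrected["ECOCYCLE"] / -0.4745

for name, hr in hazard_ratio.items():
    print(f"{name}: {hr:.2f}")
```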

As mentioned, variable APPLIEDR is statistically significant at level $α = 0.1$ in the weighted regression. However, the other variables related to the R&D activity of the firm did not reach statistical significance. The absence of significance for the variable BASICR was somewhat expected. Indeed, basic research is not frequently found in entrepreneurial firms, being more common in research institutes and universities [7,31]. Moreover, firms internally performing basic research are typically larger than those in our sample [2,24].

From Table 3, it is also seen that the size of the firm does not help to explain its survival. The literature is controversial in this regard since, although most studies establish a positive association [6,8,27], some find no relationship [26]. Additionally, the authors of Ref. [21] show that, in the case of franchised chains, survival decreases as the number of franchised outlets (that is, the size of the chain) grows, which suggests that a negative effect of firm size on survival could occur. On the other hand, the lack of significance for the international dimension of the firm disagrees with the evidence reported in Refs [14,15].

4. Discussion

Interval sampling generally induces a selection bias in lifetime data, so ordinary estimation methods may be inconsistent. This applies not only to the estimation of the marginal distribution of the lifetimes, but also to the estimation of regression effects and testing for significance in regression. In this paper, we have used a proportional hazards regression model to find the determinants for the survival of Spanish firms, as well as for measuring the size of the relevant effects. Since the selected firms are those with closedown within a certain calendar interval, the firm lifespans suffer from a selection bias. By using methods for doubly truncated data, the sampling bias for the firm lifespans has been estimated, and suitable weights have been computed. It has been seen that the attained results are sensitive to the correction of the selection bias. Specifically, for particular covariates both the size of the effect and the P-value varied when taking the selection bias into account.

The follow-up of the firms resulted in time-varying covariate information. Therefore, a Cox regression model with time-dependent covariates and a doubly truncated response was employed. To the best of our knowledge, the theoretical, simulation and practical results in the literature for the Cox model with a doubly truncated response are restricted to the case of fixed covariates. The results in this paper therefore bring new insights on the topic. Additionally, from a practical viewpoint, we have shown how a Cox model in this setting can be easily fitted using standard software and resampling algorithms.

An interesting issue is how to test for model fit with a doubly truncated response. To the best of our knowledge, formal methods to test for proportional hazards in the doubly truncated setting have not been developed yet, and this certainly falls beyond the goals of the current paper. Still, one could (informally) apply existing methods to check model fit by ignoring the randomness of the (estimated) sampling probabilities $G_{n} (T_{i})$ . For instance, for the PITEC study plots of the Schoenfeld residuals [33] were constructed, and no deviation from the proportional hazards assumption was found. As mentioned, formal methods for model assessment require further development; this is left for our future research.
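One informal way to carry out such a check is to compute Schoenfeld-type residuals from the weighted fit, treating the estimated $G_{n}(T_{i})$ as fixed, and to look for a trend against time. In this Python sketch (hypothetical helper names, scalar covariate), each residual is an addend of equation (2), so the residuals sum to zero at its solution. The text does not state which summary of the residual plots was used, so the correlation with time is only one common option.

```python
import math


def schoenfeld_residuals(beta, T, Z, G):
    """Residual at each event time: Z_i(T_i) minus the weighted risk-set mean.

    Z: list of callables, Z[j](t) = covariate of subject j at time t.
    G: estimated sampling probabilities, treated as fixed (informal check).
    """
    res = []
    for i, t_i in enumerate(T):
        num = den = 0.0
        for j, t_j in enumerate(T):
            if t_j >= t_i:
                w = math.exp(beta * Z[j](t_i)) / G[j]
                num += Z[j](t_i) * w
                den += w
        res.append(Z[i](t_i) - num / den)
    return res


def correlation(x, y):
    """Pearson correlation; a clear trend of residuals in time suggests non-proportional hazards."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

The correlation is undefined when all residuals are equal, for instance with a single event; in practice it would be computed over many event times, as in the PITEC data.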

Acknowledgments

The authors thank two anonymous reviewers and an Associate Editor for comments and suggestions, which have served to improve the paper.

Correction Statement

This article has been corrected with minor changes. These changes do not impact the academic content of the article.

Funding Statement

This work was supported by the Ministerio de Ciencia e Innovación [grant numbers PID2020-118101GB-I00 and PID2019-106677GB-I00].

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

1. Acs Z., How is entrepreneurship good for economic growth?, Innov. Technol. Gov. Glob. 1 (2006), pp. 97–107.
2. Andries P. and Thorwarth S., Should firms outsource their basic research? The impact of firm size on in-house versus outsourced R&D productivity, Creat. Innov. Manag. 23 (2014), pp. 303–317.
3. Barney J., Firm resources and sustained competitive advantage, J. Manage. 17 (1991), pp. 99–120.
4. Binder D.A., Fitting Cox's proportional hazards models from survey data, Biometrika 79 (1992), pp. 139–147.
5. Børing P., The effects of firms' R&D and innovation activities on their survival: A competing risks analysis, Empir. Econ. 49 (2015), pp. 1045–1069.
6. Buddelmeyer H., Jensen P.H., and Webster E., Innovation and the determinants of company survival, Oxf. Econ. Pap. 62 (2010), pp. 261–285.
7. Calvert J., What's special about basic research?, Sci. Technol. Hum. Values 31 (2006), pp. 199–220.
8. Cefis E. and Marsili O., Survivor: The role of innovation in firms' survival, Res. Policy 35 (2006), pp. 626–641.
9. Cefis E. and Marsili O., Going, going, gone. Exit forms and the innovative capabilities of firms, Res. Policy 41 (2012), pp. 795–807.
10. Clarke G., Cull R., and Kisunko G., External finance and firm survival in the aftermath of the crisis: Evidence from Eastern Europe and Central Asia, J. Comp. Econ. 40 (2012), pp. 372–392.
11. de Uña-Álvarez J., R packages for the statistical analysis of doubly truncated data: A review, arXiv:2004.08978, 2020.
12. de Uña-Álvarez J., Moreira C., and Crujeiras R.M., The Statistical Analysis of Doubly Truncated Data: With Applications in R, Wiley, Hoboken, NJ, 2021.
13. de Uña-Álvarez J. and Van Keilegom I., Efron–Petrosian integrals for doubly truncated data with covariates: An asymptotic analysis, Bernoulli 27 (2021), pp. 249–273.
14. Esteve-Pérez S., Máñez-Callejo J.A., and Sanchis-Llopis J.A., Does a “survival-by-exporting” effect for SMEs exist?, Empirica 35 (2008), pp. 81–104.
15. Esteve-Pérez S.A., Sanchis-Llopis A., and Sanchis-Llopis J.A., The determinants of survival of Spanish manufacturing firms, Rev. Ind. Organ. 25 (2004), pp. 251–273.
16. EUROSTAT, Business Demography (Online), https://ec.europa.eu/eurostat/web/structural-business-statistics/business-demography, 2022.
17. Fotopoulos G. and Louri H., Determinants of hazard confronting new entry: Does financial structure matter?, Rev. Ind. Organ. 17 (2000), pp. 285–300.
18. Frank G., Chae M., and Kim Y., Additive time-dependent hazard model with doubly truncated data, J. Korean Stat. Soc. 48 (2019), pp. 179–193.
19. Gilbert B.A., Audretsch D.B., and McDougall P.P., The emergence of entrepreneurship policy, Small Bus. Econ. 22 (2004), pp. 313–323.
20. Kalbfleisch J.D. and Prentice R.L., The Statistical Analysis of Failure Time Data, 2nd ed., Wiley, Hoboken, NJ, 2002.
21. Kosová R. and Lafontaine F., Survival and growth in retail and service industries: Evidence from franchised chains, J. Ind. Econ. 58 (2010), pp. 542–578.
22. Mandel M., de Uña-Álvarez J., Simon D.K., and Betensky R.A., Inverse probability weighted Cox regression for doubly truncated data, Biometrics 74 (2018), pp. 481–487.
23. Martin E.C. and Betensky R.A., Testing quasi-independence of failure and truncation times via conditional Kendall's tau, J. Am. Stat. Assoc. 100 (2005), pp. 484–492.
24. Martínez-Senra A.I., Quintás M.A., Sartal A., and Vázquez X.H., How can firms' basic research turn into product innovation? The role of absorptive capacity and industry appropriability, IEEE Trans. Eng. Manage. 62 (2015), pp. 205–216.
25. Moreira C. and de Uña-Álvarez J., Bootstrapping the NPMLE for doubly truncated data, J. Nonparametr. Stat. 22 (2010), pp. 567–583.
26. Nogueira M.A., Fernández-López S., Calvo N., and Rodeiro-Pazos D., Firm characteristics, financial variables and types of innovation: Influence in Spanish firms' survival, Int. J. Entrep. Innov. Manag. 22 (2018), pp. 57–79.
27. Ortiz-Villajos J.M. and Sotoca S., Innovation and business survival: A long-term approach, Res. Policy 47 (2018), pp. 1418–1436.
28. Qin J. and Shen Y., Statistical methods for analyzing right-censored length-biased data under Cox model, Biometrics 66 (2010), pp. 382–392.
29. Rennert L. and Xie S.X., Cox regression model with doubly truncated data, Biometrics 74 (2018), pp. 725–733.
30. Rennert L. and Xie S.X., Bias induced by ignoring double truncation inherent in autopsy-confirmed survival studies of neurodegenerative diseases, Stat. Med. 38 (2019), pp. 3599–3613.
31. Salter A.J. and Martin B.R., The economic benefits of publicly funded basic research: A critical review, Res. Policy 30 (2001), pp. 509–532.
32. Shen P.S., Nonparametric analysis of doubly truncated data, Ann. Inst. Stat. Math. 62 (2010), pp. 835–853.
33. Therneau T.M. and Grambsch P.M., Modeling Survival Data: Extending the Cox Model, Springer, New York, 2000.
34. Ugur M. and Vivarelli M., Innovation, firm survival and productivity: The state of the art, Econ. Innov. New Technol. 30 (2021), pp. 433–467.
35. Vakulenko-Lagun B., Mandel M., and Betensky R.A., Inverse probability weighting methods for Cox regression with right-truncated data, Biometrics 76 (2020), pp. 484–495.
36. Wang M.C., Hazard regression analysis for length-biased data, Biometrika 83 (1996), pp. 343–454.
37. Weissbach R. and Wied D., Truncating the exponential with a uniform distribution, Statistical Papers, 2021.
38. Ye Z.S. and Tang L.C., Augmenting the unreturned for field data with information on returned failures only, Technometrics 58 (2016), pp. 513–523.
39. Zhu H. and Wang M.C., Analysing bivariate survival data with interval sampling and application to cancer epidemiology, Biometrika 99 (2012), pp. 345–361.