PLOS One. 2022 Oct 5;17(10):e0274755. doi: 10.1371/journal.pone.0274755

Statistical methodologies for evaluation of the rate of persistence of Ebola virus in semen of male survivors in Sierra Leone

Ndema Habib 1,*, Michael D Hughes 2, Nathalie Broutet 1, Anna Thorson 1, Philippe Gaillard 1, Sihem Landoulsi 1, Suzanne L R McDonald 1, Pierre Formenty 3; on behalf of Sierra Leone Ebola Virus Persistence Study Group
Editor: Mohammad Asghari Jafarabadi 4
PMCID: PMC9534448  PMID: 36197875

Abstract

The 2013–2016 Ebola virus (EBOV) outbreak in West Africa was the largest and most complex outbreak ever, with a total number of cases and deaths higher than in all previous EBOV outbreaks combined. The outbreak was characterized by rapid spread of the infection in nations that were weakly prepared to handle it. EBOV ribonucleic acid (RNA) is known to persist in body fluids following disease recovery, and studying this persistence is crucial for controlling such epidemics. Observational cohort studies investigating EBOV persistence in semen require following up recently recovered survivors of Ebola virus disease (EVD), from recruitment to the time when their semen tests negative for EBOV, the endpoint being time-to-event. Because recruitment of EVD survivors takes place weeks or months following disease recovery, the event of interest may have already occurred. Survival analysis methods are the best suited for the estimation of the virus persistence in body fluids but must account for left- and interval-censoring present in the data, which is a more complex problem than that of presence of right censoring alone. Using the Sierra Leone Ebola Virus Persistence Study, we discuss study design issues, endpoint of interest and statistical methodologies for interval- and right-censored non-parametric and parametric survival modelling. Using the data from 203 EVD recruited survivors, we illustrate the performance of five different survival models for estimation of persistence of EBOV in semen. The interval censored survival analytic methods produced more precise estimates of EBOV persistence in semen and were more representative of the source population than the right censored ones. The potential to apply these methods is enhanced by increased availability of statistical software to handle interval censored survival data. These methods may be applicable to diseases of a similar nature where persistence estimation of pathogens is of interest.

Introduction

The 2013–2016 Ebola virus (EBOV) outbreak in West Africa, currently known as the largest and most complex outbreak since the virus was discovered in 1976, saw more cases and deaths than all earlier outbreaks combined [1]. Sierra Leone, Liberia and Guinea were the most affected countries. They contributed to the largest burden of Ebola virus disease (EVD) and deaths, with over 28,000 cases and over 10,000 EVD survivors requiring convalescent care [2]. The outbreak was marked by a rapid spread of infection in these three insufficiently prepared nations. It resulted in high case fatality rates (CFRs) reportedly 21.5%, 40.9%, and 60.8% in Sierra Leone, Liberia and Guinea respectively, and almost reversed developmental gains achieved over the previous years [3].

Following disease recovery, EBOV ribonucleic acid (RNA) has been detected in various body fluids of survivors, including sweat, saliva, urine and conjunctival fluid, with EBOV clearance from these body fluids occurring in well under 100 days [4, 5]. However, studies show that EBOV persists longer in semen [5, 6]. In the Sierra Leone Ebola Virus Persistence Study (SLEVPS), Thorson et al. [6] reported a maximum duration of persistence of EBOV in semen of 696 days following discharge from an Ebola treatment unit (ETU).

EBOV persistence in semen can be estimated by quantifying the risk (hazard) at which the virus clears from semen, which involves following up EVD survivors from disease recovery (after discharge from EVD treatment unit (ETU)) to the time when semen is confirmed to be negative for EBOV.

However, in EBOV persistence studies, time of EBOV clearance in body fluids cannot be observed with precision, either because the event occurred prior to first study visit, attributable to delays in recruitment, or between study visits. SLEVPS reported a median delay to recruit of 258 days (counted from ETU discharge) with 610 days as a maximum while the interval between scheduled consecutive visits for semen testing was two weeks [6, 7]. In Guinea’s PostEboGui study, a median delay from symptoms onset to recruitment was 319 days with a maximum of 810 days and the interval between two consecutive visits for semen testing ranged from 4–24 weeks [8].

Estimating EBOV persistence in semen is best implemented through application of survival analysis methods, due to the nature of the endpoint being time-to-event. An important advantage of these methods is their ability to handle data even when the survival time is not directly observed (or is censored).

There are three types of censoring encountered in survival analysis. The first type, which is the most frequently encountered in prospective cohort studies in general, is right censoring, whereby the event of interest has not yet occurred by the time of the last visit. In the context of EBOV persistence, right censoring occurs when an EVD survivor whose semen tested EBOV-positive at recruitment is yet to be confirmed EBOV-negative by the time of last contact, either because of withdrawal from the study or loss to follow-up (LFU).

The second type is left censoring, whereby the event of interest has already occurred by the time of study recruitment, although the interval during which it occurred is known. Left censoring is a common scenario in studies of EBOV persistence in body fluids and is caused by delayed entry (recruitment) of survivors, at a time when the virus has already been cleared from the body fluid; clearance is then known to have occurred between ETU discharge and study recruitment [7, 9].

Left censoring is different from left truncation where the event of interest is not observed because the person was never enrolled in the study, for example, because they died before being enrolled. Left truncation is therefore assumed when participants whose event of interest occurred prior to recruitment are not included in a survival analysis.

The third type is interval censoring whereby the event of interest occurs within a specified time interval in the context of a periodic longitudinal study follow-up. The interval censoring can occur when the survivors who are EBOV-positive for semen on recruitment have the virus cleared in between follow-up visits. In studies of virus persistence in semen, it is common for the interval between visits for sample collection to be longer than planned. This may happen when a survivor cannot provide a semen sample during a scheduled study visit or when a sample is collected but does not meet the quality requirements for laboratory testing, necessitating a repeat sample collection at a later visit.

The date of earliest detection of EBOV in semen should theoretically be the starting point of observation in the estimation of the virus persistence in semen. However, this date is practically impossible to ascertain because of difficulties in obtaining semen samples from acute EVD patients for testing. On the other hand, understanding EBOV persistence during the post-acute infection period is of more public health interest in order to understand the possibility of sexual transmission of EBOV through semen.

Hence, in such studies, the population of interest is males who survived the acute EBOV infection phase, who would be expected to be sexually active again and therefore at risk of transmitting the virus. The survivors’ date of discharge from ETU (following a confirmed EBOV-negative blood test), in this case, serves as the starting point for estimating EBOV persistence in semen. It has not been possible to collect semen samples for testing at the time of ETU discharge. However, the SLEVPS findings showed that the probability of EBOV-positivity in semen declined with increasing duration between ETU discharge and recruitment; in various studies, it approached a value of 1.0 at shorter durations [7, 10–12]. Based on SLEVPS, the assumption of EBOV-positivity in semen at ETU discharge seemed reasonable and was therefore adopted for this paper.

In epidemiology and public health, there has been wide application of survival analysis methods dealing with right censoring [13–16]. In a carefully designed clinical trial, or any other study design in which the starting point of risk observation is fully under the control of the researcher, left censoring is not expected to pose a problem; this is the more common scenario in public health. It is less common for the starting point of risk observation to be beyond the control of the researcher, as is the case for EBOV persistence studies, which therefore require appropriate methods to account for left censoring. Left and right censoring are both special cases of interval censoring [17]. A rich literature now exists on methods for the analysis of interval censored outcomes, including non-parametric [18, 19], semi-parametric [20–22] and parametric methods [17, 23, 24]. Several major statistical software packages, for example SAS, R and STATA, are now equipped with easy-to-apply survival routines that handle interval censored data [25–27]. It is nevertheless occasionally necessary to use a combination of software, because graphical capabilities differ in quality and because some parameter estimates cannot be obtained directly from the software and must be computed manually. A single easy solution is therefore not necessarily available.

Several studies have examined persistence of EBOV in body fluids, including semen, following clinical recovery from the disease, and reported the maximum duration of virus positivity of the body fluid samples [4, 28]. Sissoko et al. [12] applied mathematical modelling to time-series quantitative viral load data (threshold cycle, Ct) from the seminal fluid of 26 EVD survivors in a cohort study setting, to systematically determine the dynamics of virus persistence over time, and used the model to predict median and 90th percentile times to virus clearance. However, there was no indication of how the authors accounted for the interval censored nature of the data in the time-series modelling.

There is limited literature illustrating how the right- and interval censored survival techniques can be applied in the estimation of persistence of EBOV in body fluids, given the study design. From the review of current literature, only one paper, by Subtil et al., [8], was identified that reported follow-up and persistence of EBOV in semen among 188 male EVD survivors (Guinea PostEboGui study), and applied survival methodologies that accounted for the interval censored nature of the data. However, there was no thorough description of how the determination of the lower and upper bounds of the left- and interval censored events was implemented.

This paper aims to describe the theoretical, study design and methodological considerations for non-parametric and parametric survival approaches to estimating persistence of Ebola virus in semen in the presence of interval censoring. Using the SLEVPS design, the paper illustrates the application of these methodologies; discusses the resulting persistence estimates from different models; and highlights strengths and weaknesses of each of these approaches for EBOV persistence estimation in semen.

Materials and methods

Sierra Leone Ebola virus persistence study: Aims, population, design and data collection procedures

SLEVPS recruitment took place from May 2015 to May 2016 in Sierra Leone in two locations: the 34 Military Hospital (MH34) (an urban facility in Freetown, Western District) and Lungi Government hospital (a semi-rural facility in Lungi, Port Loko District). EVD survivors were recruited through meetings held in collaboration with the Sierra Leone Association of Ebola Survivors, and other survivor support groups.

The study consisted of a convenience sample of 220 adult male survivors of EVD, enrolled in two phases, at various times after discharge from an ETU. The survivors were followed prospectively to determine the duration and correlates of persistence of EBOV in semen. Eligible consenting survivors provided semen specimens at recruitment and two weeks later (the two baseline visits). Those specimens were tested for the presence of EBOV RNA using a quantitative reverse transcriptase polymerase chain reaction (qRT-PCR) test. Follow-up visits continued until semen tested twice consecutively qRT-PCR negative for EBOV RNA.

The qRT-PCR test targeted two genes for EBOV detection in semen: NP and VP40 during phase 1 of the study, and NP and GP in phase 2 of the study [7]. For the persistence analysis purposes using survival methods, the semen specimen was considered EBOV-positive if there was a detection of EBOV RNA in one or both gene targets; and EBOV-negative if there was no detection of EBOV RNA in both gene targets. Confirmed EBOV negativity occurred when there were two consecutive EBOV-negative results from semen specimens collected at any two consecutive visits.

Those found to be EBOV-positive for any of the two baseline specimens were followed-up every two weeks thereafter until the semen specimens tested EBOV-negative on two consecutive visits. EBOV-positive or -negative semen test results were considered as valid results, whereas non-interpretable EBOV results (due to semen specimen poor quality, insufficient quantity or contamination) were considered as non-valid and therefore excluded from the persistence analysis.

The primary event of interest was confirmed EBOV negativity (EBOV clearance) in semen with the endpoint being the time to confirmed EBOV negativity in semen, measured in days from the date of ETU discharge. The date of confirmed EBOV negativity was the earlier of two consecutive dates with samples showing EBOV-negativity in semen. The date of ETU discharge was chosen as the time of origin (Time zero) due to interest in persistence during the post-recovery period for EBOV disease.
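The classification and endpoint rules above can be sketched in code. This is a minimal illustration and not the study's actual data pipeline: the `Visit` layout and the function names are hypothetical, and it assumes that non-valid results are simply skipped when looking for two consecutive negative results.

```python
from typing import List, Optional, Tuple

# Hypothetical record: (days since ETU discharge, target 1 detected?, target 2 detected?).
# None for a target marks a non-valid (non-interpretable) result.
Visit = Tuple[int, Optional[bool], Optional[bool]]


def is_positive(target1: bool, target2: bool) -> bool:
    """EBOV-positive if RNA is detected in one or both gene targets."""
    return target1 or target2


def confirmed_negative_day(visits: List[Visit]) -> Optional[int]:
    """Day of the earlier of two consecutive valid EBOV-negative results, else None."""
    # Assumption: non-valid results are dropped before checking consecutiveness.
    valid = [(day, is_positive(a, b))
             for day, a, b in sorted(visits, key=lambda v: v[0])
             if a is not None and b is not None]
    for (day, pos), (_, next_pos) in zip(valid, valid[1:]):
        if not pos and not next_pos:
            return day
    return None
```

For a made-up survivor with results on days 100 (positive), 114 (negative), 128 (non-valid) and 142 (negative), the function returns day 114 under the stated assumption.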

For this study, right censoring was implemented at the visit prior to the last to ensure independent (non-informative) censoring, which is an important assumption in analyzing censored survival data [29, 30]. The earliest opportunity for study staff to collect and test a semen specimen was at the first recruitment (baseline) visit.

The study population, implementation, specimen collection and testing, as well as the nature of the collected baseline social, clinical and behavioural indicators during and after the EVD acute phase have been thoroughly detailed elsewhere [7, 9].

Ethics

Ethical permission was granted from the Sierra Leone Ethics and Scientific Review Committee and the WHO Ethical Review Committee (No. RPC736). All study participants signed an informed consent.

Primary outcome assessment, study participant types and design considerations

Fig 1 illustrates different time points (t1, t2 and t3, measured in days) of assessment of confirmed EBOV negativity status in semen, as determined from the date of ETU discharge (time zero), for three types of SLEVPS participants (P1, P2 and P3) grouped according to whether they experienced the event of interest, and in case they did, by when this was observed. It was assumed that all the recruited participants were EBOV-positive in semen at time zero.

Fig 1. Study participants time to confirmed negative Ebola virus RNA in semen, by type of censoring experienced.


Let t1 be the time from ETU discharge to study entry (recruitment) visit for the participants who had a valid EBOV semen test result at this point. For those who did not have a valid EBOV semen test result, t1 becomes the time from ETU discharge to the first visit beyond recruitment having a valid EBOV semen test result.

P1 are those participants who became confirmed EBOV-negative for semen at time t1 and are therefore considered as left censored. P2 would be those who were EBOV-positive for semen at t1 and became confirmed EBOV-negative during study follow-up at time t2. On the other hand, P3 would be those participants who were EBOV-positive for semen at time t1 and became right censored at time t3.

Two types of study populations are in consideration: Population S0, that includes all recruited EVD survivors, independent of the status of the event of interest at time t1; and Population S1, a sub-population of S0 that includes only survivors who were yet to experience the event of interest by time t1 (includes P2 and P3 only). Population S1 is used in this paper to illustrate the biases associated with assuming left-truncation (exclusion) of observations of participants P1.

Survival analysis methods for persistence estimation

We have chosen for illustration interval censored survival methods that correctly treat persistence data as interval censored; and for comparison, included the right censored survival methods that ignore the interval censored nature of the persistence data. For the interval censored survival methods, we illustrate how the persistence is estimated using the non-parametric survival methods as well as the parametric methods which assume the distribution of the persistence data is known.

The right censored survival approaches

The right censored (RC) survival analysis approaches are standard methods commonly applied when the time of occurrence of an event observed is known exactly or is right censored. Because the exact time at which the event occurs cannot always be observed for endpoints which can only be observed at regular intervals of visits, the right censoring methods can still be applied by assuming the time of event as equal to the time of the visit at which the event is first diagnosed as having occurred, or by imputing the time of event at the midpoint of the interval between the last visit at which the event is yet to occur and the visit at which the event is first diagnosed.

Let T denote a random variable for time duration (in days) between the date of ETU discharge and the date of reaching confirmed EBOV negativity in semen. Let δ be a censoring indicator at the observed time points (t1, t2 and t3) with value set to 1 if the participant is confirmed negative for EBOV in semen; or set to 0 otherwise. The following two approaches can be used to assign values for T and δ for the right censored survival models, with and without assuming left truncation of observations:

Approach 1: Assigning value of T as equal to time from ETU discharge to the first observed confirmed EBOV-negativity and assuming left truncation of the observations for participants of type P1

When left truncation is assumed, participants are included in the persistence analysis conditional on being confirmed negative later than time t1, hence the use of population S1. The values of (T, δ) for P2 and P3 are (t2, 1) and (t3, 0) respectively (Table 1, Approach 1). The limitation of using this population is the reduced sample size due to the left truncation of P1 observations, and therefore decreased efficiency of the model parameter estimates because not all available data are used. Furthermore, population S1 may not be representative of the population from which the EVD survivors originated: among survivors with a similar duration t1 since ETU discharge, it favours inclusion of those with prolonged EBOV persistence (confirmed negative beyond time t1) over those with shorter EBOV persistence (confirmed negative earlier than time t1). This biases the results towards longer persistence duration.

Table 1. Right censored survival methods: The time duration from ETU discharge to confirmed EBOV negativity (Ti) and the censoring status (δi) for populations S0 and S1, and by type of participants.

Approach 1 (sub-population S1): time to the first visit at which the event of interest was observed, assuming left truncation for P1.
Approach 2 (population S0): time to the first visit at which the event of interest was observed, assuming the event of interest occurred at time t1 for P1.
Approach 3 (population S0): time to the mid-point between the last visit at which the event was not observed and the first visit at which it was observed.

| Participant type | Time from ETU discharge to visit with event (a) observed, or to the last visit (t) | Approach 1: endpoint T | Approach 1: censoring indicator δ | Approach 2: endpoint T | Approach 2: censoring indicator δ | Approach 3: endpoint T | Approach 3: censoring indicator δ |
|---|---|---|---|---|---|---|---|
| P1 | t1 | Excluded | Left truncated | T = t1 | 1 | T = t1/2 | 1 |
| P2 | t2 | T = t2 | 1 | T = t2 | 1 | T = (t2 + l2)/2 (b) | 1 |
| P3 | t3 | T = t3 | 0 | T = t3 | 0 | T = t3 | 0 |

a Event of interest = confirmed EBOV-negativity in semen.

b Value l2 (not shown in Fig 2) is directly retrieved from the data, as the time of the last EBOV-positive result (prior to time t2) for type P2 participant.

Approach 2: Assigning value of T as equal to the time from ETU discharge to earliest observed confirmed EBOV-negativity

This is Population S0 which includes all the recruited EVD survivors (P1, P2 and P3). By including participants P1 in this population under the right censoring survival techniques, one must assume that they became confirmed EBOV-negative at time t1. The values of (T, δ) for P1, P2 and P3 are (t1, 1), (t2, 1) and (t3, 0) respectively (Table 1, Approach 2).

The advantage of using this population is increased sample size, by using data for all recruited survivors. The main weakness however is increased likelihood of overestimation of the overall persistence rate and duration, by ignoring the likelihood that confirmed EBOV negativity in semen among P1 participants may have occurred earlier than at time t1.

Approach 3: Applying single imputation of time to event, with T equal to the time to the mid-point between visits for the last EBOV-positive and the first confirmed EBOV-negative result, as counted from ETU discharge

Because the time of event is not always directly observable, estimating the event time by single imputation, using the midpoint of the interval between two visits, is a commonly applied approach that enables application of right censored survival models in the presence of interval censored data [31–33].

Specific to SLEVPS, for participant P1 the imputed time duration for T can be estimated as t1/2. For participant P2, T is estimated as the duration to the midpoint between two consecutive time points: l2, the time of the latest visit at which the participant was observed to be still EBOV-positive, and t2, the time of the visit at which he was observed as confirmed EBOV-negative for the first time; this equals (l2 + t2)/2. For participant P3 the censoring time T is equal to t3 because their observations have been right censored. In this case the values of (T, δ) for P1, P2 and P3 are (t1/2, 1), ((l2 + t2)/2, 1) and (t3, 0) respectively (Table 1, Approach 3).
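The three assignment rules for (T, δ) can be summarised in a short function. This is only a sketch of the rules in Table 1; the function name, argument names and the example day counts are hypothetical.

```python
from typing import Optional, Tuple


def right_censored_endpoint(ptype: str, t: float, approach: int,
                            l2: Optional[float] = None) -> Optional[Tuple[float, int]]:
    """Return (T, delta) following Table 1, or None if the participant is left truncated.

    ptype    : "P1", "P2" or "P3"
    t        : t1, t2 or t3 (days since ETU discharge) for that participant type
    approach : 1, 2 or 3
    l2       : time of the last EBOV-positive result (needed for P2 under Approach 3)
    """
    if approach not in (1, 2, 3):
        raise ValueError("approach must be 1, 2 or 3")
    if ptype == "P3":                      # right censored under every approach
        return (t, 0)
    if ptype == "P1":
        if approach == 1:                  # excluded (left truncated)
            return None
        return (t, 1) if approach == 2 else (t / 2.0, 1)
    if ptype == "P2":
        if approach in (1, 2):
            return (t, 1)
        if l2 is None:
            raise ValueError("l2 is required for P2 under Approach 3")
        return ((l2 + t) / 2.0, 1)         # midpoint imputation
    raise ValueError("unknown participant type")
```

With made-up values, a P1 participant recruited on day 200 gets (200, 1) under Approach 2 and (100.0, 1) under Approach 3, and is dropped under Approach 1.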

The main limitation of the mid-point imputation approach is that the persistence estimates obtained may be less accurate, especially if the interval duration from ETU discharge to time t1 varies widely between participants of type P1. For SLEVPS, this interval ranged from 4 to 9 months [7]. It has been reported that using the midpoint of an interval for estimation of time at which the event occurs, can lead to biased effect estimates [31, 34]. The midpoint approach may furthermore underestimate standard errors, especially when the intervals are wide and of varying length [35].

With the values of T and δ in the format shown in Table 1 for the right censored survival approaches 1–3, the right censored Kaplan-Meier (KM) estimator [36], a non-parametric maximum likelihood estimator (NPMLE), can be used to estimate the EBOV persistence rate in semen.

The KM (product-limit) estimator for persistence at time t, S(t), for right censored survival will be defined as

$$\hat{S}(t) = \begin{cases} 1 & \text{if } t < t_1^* \\ \prod_{t_i \le t} \left[ 1 - \dfrac{d_i}{Y_i} \right] & \text{if } t_1^* \le t \end{cases}$$

where t1* represents the first (observed or imputed) time of the confirmed EBOV-negativity event (failure time), counting from ETU discharge; with di the number of survivors confirmed to be EBOV-negative; and Yi the number of those not yet confirmed negative and not yet censored, by time t.
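A minimal sketch of the product-limit computation is given below, to make the roles of di and Yi concrete. It is not the SAS LIFETEST implementation; the function name and the example data are hypothetical, and participants censored at an event time are counted as still at risk at that time (the usual convention).

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return [(t_i, S(t_i))] at each distinct event time, for right censored (T, delta) data."""
    data = list(zip(times, events))
    surv = 1.0
    curve = []
    for t in sorted({u for u, d in data if d == 1}):
        at_risk = sum(1 for u, _ in data if u >= t)           # Y_i
        n_events = sum(1 for u, d in data if u == t and d == 1)  # d_i
        surv *= 1.0 - n_events / at_risk
        curve.append((t, surv))
    return curve
```

For five made-up participants with T = (100, 200, 200, 300, 400) and δ = (1, 1, 0, 1, 0), the estimate steps down to 0.8, 0.6 and 0.3 at days 100, 200 and 300.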

The interval censored survival approaches

Under the interval censored (IC) approach, the exact time T of confirmed negativity for EBOV is contained in an interval between two time points (L, R], where L is defined as the latest time at which the participant was observed or known to be still EBOV-positive and R as the earliest time at which he was observed as confirmed EBOV-negative. For the left censored participants, L is the time of ETU discharge (Time 0) and R is the time t1. For the right censored participants, L is the time t3 of the censoring visit and R can be set to infinity (∞). In most statistical programs, the infinite value of R for right censored individuals is set to missing. For the participants whose confirmed EBOV-negativity occurred between two study visits, the time T is considered interval censored.

Table 2 shows the respective interval censoring intervals for the three types of participants P1, P2 and P3 who were left-, interval- and right censored, being equal to (0, t1], (l2, t2] and (t3, ∞) respectively. To apply this approach the lower and upper limits of the interval (L, R] such that L< T ≤ R have to be determined.

Table 2. Interval censoring methods: Distribution of the lower and the upper limits of censoring interval at which the failure time of interest, T occurred, for the three scenarios, based on population S0.
Interval censored approaches: KM NPMLE survival (Approach 4) and parametric survival (Approach 5), population S0.

| Participant type | Time from ETU discharge to visit with event (a) of interest observed, or to the last visit (t) | L | R | Type of censoring |
|---|---|---|---|---|
| P1 | t1 | L = 0 | R = t1 | Left censoring |
| P2 | t2 | L = l2 (b) | R = t2 | Interval censoring |
| P3 | t3 | L = t3 | R = ∞ | Right censoring |

a Event of interest = confirmed EBOV-negativity in semen.

b Value l2 (not shown in Fig 2) is directly retrieved from the data, as the time of the last EBOV-positive result (prior to time t2) for type P2 participant.
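The (L, R] construction in Table 2 can be written as a small helper. This is a sketch only: the function and argument names are hypothetical, and the upper bound for right censored participants is coded as infinity here, whereas statistical packages may expect a missing value instead.

```python
import math
from typing import Optional, Tuple


def censoring_interval(ptype: str, t: float,
                       l2: Optional[float] = None) -> Tuple[float, float]:
    """Return (L, R) such that L < T <= R, following Table 2."""
    if ptype == "P1":                 # left censored: event between discharge and t1
        return (0.0, t)
    if ptype == "P2":                 # interval censored: event between l2 and t2
        if l2 is None:
            raise ValueError("l2 (last EBOV-positive time) is required for P2")
        return (l2, t)
    if ptype == "P3":                 # right censored: event after t3
        return (t, math.inf)
    raise ValueError("unknown participant type")
```

For example, a P2 participant last positive on day 286 and first confirmed negative on day 300 contributes the interval (286, 300].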

Approaches 4 and 5 below show how the interval censored non-parametric and parametric survival approaches can be applied to estimate EBOV persistence, with the persistence data put in the format (L, R].

Approach 4: The non-parametric maximum likelihood estimator Kaplan Meier’s Turnbull interval censored model

The non-parametric maximum likelihood estimator (NPMLE) is one of the developments implemented in statistical analysis programs that permit the use of non-parametric KM methods to analyze interval censored data. Consider a sample of n subjects from a homogeneous population of male EVD survivors followed from ETU discharge to confirmed EBOV-negativity in semen and having non-informative interval censored observations $\{I_i\}_{i=1}^{n} = \{I_1, I_2, \ldots, I_n\}$, where Ii = (Li, Ri] is the interval known to contain the unobserved T for the ith subject.

From the observed $\{I_i\}_{i=1}^{n}$, a set of non-overlapping intervals $\{(p_j, q_j]\}_{j=1}^{m}$, where

$$p_1 \le q_1 < p_2 \le q_2 < p_3 \le q_3 < \cdots < p_m \le q_m,$$

is generated, over which the non-parametric EBOV persistence rate function S(t) = P(Ti > t) is estimated.

Let αij denote the event indicator, equal to 1 if the interval (pj, qj] ⊆ Ii and equal to zero otherwise. Let ϑj = S(pj) − S(qj) be the weight of the jth interval, that is, the probability of a confirmed EBOV-negativity event occurring in this interval.

Assuming independence, the vector parameter ϑ = (ϑ1, ϑ2,…,ϑm)′ can be estimated by maximising with respect to ϑ1, ϑ2,…,ϑm the likelihood

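$$L_S(\vartheta) = \prod_{i=1}^{n} \mathrm{Prob}\{L_i < T_i \le R_i\} = \prod_{i=1}^{n} \left[ S(L_i) - S(R_i) \right] = \prod_{i=1}^{n} \sum_{j=1}^{m} \alpha_{ij} \vartheta_j,$$

under the condition that $\sum_{j=1}^{m} \vartheta_j = 1$ and ϑj ≥ 0 for j = {1, 2, …, m} [18]. One of the algorithms that can be used to maximize LS(ϑ) is the Expectation-Maximization Iterative Convex Minorant (EM-ICM) algorithm [37].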

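The maximum likelihood estimates (MLEs) $\hat{\vartheta}_1, \hat{\vartheta}_2, \ldots, \hat{\vartheta}_m$ yield the NPMLE of the EBOV persistence function S(t), which is uniquely determined outside the non-overlapping intervals (pj, qj] and given by

$$\hat{S}(t) = \begin{cases} 1 & \text{if } t \le p_1 \\ \sum_{k=j+1}^{m} \hat{\vartheta}_k & \text{if } q_j \le t \le p_{j+1},\ 1 \le j \le m-1 \\ 0 & \text{if } t \ge q_m \end{cases}$$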
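To make the estimation concrete, the sketch below maximises the likelihood with the plain self-consistency (EM) iteration of Turnbull rather than the EM-ICM algorithm used by SAS; it is an illustrative substitute, not the ICLIFETEST implementation. The function names and the example intervals are hypothetical.

```python
import math
from typing import List, Tuple

Interval = Tuple[float, float]   # (L, R], with R = math.inf when right censored


def turnbull_intervals(obs: List[Interval]) -> List[Interval]:
    """Innermost intervals (p_j, q_j]: a left endpoint immediately followed by a right one."""
    # At tied values a right endpoint sorts first, because (L, R] is open at L.
    pts = sorted([(r, 0) for _, r in obs] + [(l, 1) for l, _ in obs])
    return [(a, b) for (a, ka), (b, kb) in zip(pts, pts[1:]) if ka == 1 and kb == 0]


def turnbull_npmle(obs: List[Interval], tol: float = 1e-10,
                   max_iter: int = 10000) -> Tuple[List[Interval], List[float]]:
    """Return the intervals (p_j, q_j] and the estimated weights theta_j."""
    ivals = turnbull_intervals(obs)
    # alpha[i][j] = 1 if (p_j, q_j] lies inside (L_i, R_i]
    alpha = [[1.0 if l <= p and q <= r else 0.0 for p, q in ivals] for l, r in obs]
    n, m = len(obs), len(ivals)
    theta = [1.0 / m] * m
    for _ in range(max_iter):
        new = [0.0] * m
        for row in alpha:
            denom = sum(a * th for a, th in zip(row, theta))
            for j in range(m):
                new[j] += row[j] * theta[j] / (denom * n)
        converged = max(abs(a - b) for a, b in zip(new, theta)) < tol
        theta = new
        if converged:
            break
    return ivals, theta
```

For four made-up participants with intervals (0, 200], (0, 200], (286, 300] and (400, ∞), the weights are 0.5, 0.25 and 0.25, so the estimated persistence is 0.5 between days 200 and 286 and 0.25 between days 300 and 400.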

SAS Procedure ICLIFETEST, with a built-in capability for interval censored data [38], available in SAS/STAT Version 14.1 [26], can be used to estimate the KM interval censored NPMLEs of the EBOV persistence rate in semen. This procedure applies the EM-ICM algorithm that supports the Turnbull algorithm [18] and computes standard errors using multiple imputation methods; by default it uses 1000 imputations. The EBOV persistence rate estimates obtained from this model are available only in a set of non-overlapping intervals and cannot be uniquely estimated in the case of overlapping (Turnbull) intervals between participants. Other major statistical analysis software that can also provide the NPMLEs for interval censored data include the R packages “Interval” [27, 39] and “icenReg”, the latter with call function ic_np (where np stands for non-parametric) for relatively large samples with >100,000 observations [27, 40]; and the STATA “IntCens” package [25, 41, 42].

Approach 5: Interval censored Weibull (parametric) model

One advantage of parametric models is that they tend to give more precise parameter estimates when there is a good fit to the data, since they are based on fewer parameters than the non-parametric survival models. Exponential, log-normal, log-logistic and Weibull are among the commonly used parametric survival distributions. For this paper we chose the Weibull distribution a priori, because of its flexibility as both a proportional hazards (PH) and an accelerated failure time (AFT) model; and furthermore because it estimates and forecasts more accurately with extremely small samples.

The Weibull model can be fitted to interval censored data in the (L, R] format, with or without baseline covariates. For this study, the Weibull persistence probabilities were estimated based on expected times (from ETU discharge) to EBOV clearance, using the estimated Weibull shape parameter α = 1/σ (where σ is the extreme value scale parameter estimate) and scale λ = exp(μ) (where μ is the intercept parameter estimate) obtained from the fit of an intercept-only model in SAS Procedure LIFEREG. Hence the semen EBOV persistence survival curve using the Weibull distribution can be expressed in terms of the scale λ and shape α as follows: $S(t; \lambda, \alpha) = \exp\{-(t/\lambda)^{\alpha}\}$ [26], where the shape α indicates whether the hazard rate, in this case the rate of confirmed EBOV negativity in semen, decreases (α < 1), is constant (α = 1) or increases (α > 1) over time, while the scale λ > 0 determines the duration of persistence of EBOV in semen. There is also an alternative parameterization of the Weibull survival function, which can be expressed as $S(t; b, \alpha) = \exp(-b t^{\alpha})$, where the scale b is expressed as $b = \lambda^{-\alpha}$. For this paper we used the former parameterization.
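The parameter conversion and the interval censored likelihood contribution S(L) − S(R) can be sketched as follows. This only illustrates the formulas above and does not reproduce the LIFEREG fitting routine; the function names and the numeric values of μ and σ are hypothetical.

```python
import math
from typing import List, Tuple


def weibull_params(mu: float, sigma: float) -> Tuple[float, float]:
    """Convert the intercept mu and extreme value scale sigma to (lambda, alpha)."""
    return math.exp(mu), 1.0 / sigma


def weibull_survival(t: float, lam: float, alpha: float) -> float:
    """S(t; lambda, alpha) = exp(-(t / lambda) ** alpha)."""
    return math.exp(-((t / lam) ** alpha))


def interval_loglik(obs: List[Tuple[float, float]], lam: float, alpha: float) -> float:
    """Sum of log[S(L_i) - S(R_i)] over intervals (L_i, R_i], with S(inf) = 0."""
    total = 0.0
    for left, right in obs:
        s_right = 0.0 if math.isinf(right) else weibull_survival(right, lam, alpha)
        total += math.log(weibull_survival(left, lam, alpha) - s_right)
    return total
```

As a quick check with made-up values μ = ln(200) and σ = 0.5, the conversion gives λ ≈ 200 and α = 2, and the persistence probability at t = λ is exp(−1) ≈ 0.368 whatever the shape. The alternative parameterization with b = λ^(−α) returns the same probabilities.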

In addition to SAS, other statistical packages that can fit Weibull and other parametric survival models to interval censored survival-time data include R, using the function “survreg” [27]; and the STATA package “stintreg” [41–43].

We used the SLEVPS data to illustrate the estimation of persistence of EBOV in semen using the five survival models. SAS software was used for estimation of median EBOV persistence duration and the corresponding 95% confidence interval (CI). We used R statistical software Version 3.1 to plot the EBOV persistence curves based on the estimates produced by the five approaches. For plotting the interval censored KM persistence curve in R, the “Icens” and “Interval” packages were used [44], with the “Icens” package implementing an Expectation-Maximization (EM) algorithm to obtain the survival estimates.

Estimation of percentiles of EBOV persistence and 95% confidence interval

Percentiles. Let the pth percentile, denoted $t_p$ (where p ∈ {50, 75, 90}), be the smallest observed time following ETU discharge at which the estimated probability of EBOV persistence in semen satisfies $S(t_p) < 1 - p/100$. The values of $t_p$ were estimated directly from the survival functions of the five models using: SAS Procedure LIFETEST for non-parametric EBOV persistence estimation treating the data as right censored; Procedure ICLIFETEST for non-parametric estimation treating the data as interval censored; and Procedure LIFEREG for parametric estimation assuming Weibull-distributed interval-censored EBOV persistence data.

Standard errors for the percentiles. The standard errors (SE) of tp were estimated following the methodology outlined in the book by Collett [45].

Let t(j) be the jth ordered confirmed EBOV-negativity event time (j = 1, 2, …, r).

The SEs for the four non-parametric EBOV persistence KM models (Approaches 1–4) were computed as follows:

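$SE(t_p) = \frac{1}{\hat{f}(t_p)} \times SE\{\hat{S}(t_p)\}$, where $\hat{f}(t_p) = \frac{\hat{S}(\hat{u}_p) - \hat{S}(\hat{l}_p)}{\hat{l}_p - \hat{u}_p}$; with

$\hat{u}_p = \max\{t_{(j)} \mid \hat{S}(t_{(j)}) \ge 1 - \frac{p}{100} + \epsilon\}$ as the largest observed time t(j) at which the KM estimate of EBOV persistence probability is at least $1 - \frac{p}{100} + \epsilon$; and

$\hat{l}_p = \min\{t_{(j)} \mid \hat{S}(t_{(j)}) \le 1 - \frac{p}{100} - \epsilon\}$ as the smallest observed time t(j) at which the KM estimate of EBOV persistence probability is at most $1 - \frac{p}{100} - \epsilon$. The value of ϵ = 0.05 was used.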

The values of $\hat{u}_p$, $\hat{l}_p$, $\hat{S}(\hat{u}_p)$, $\hat{S}(\hat{l}_p)$ and $SE\{\hat{S}(t_p)\}$ were obtained from the SAS output of the KM survival models. For $SE\{\hat{S}(t_p)\}$, the SAS-estimated SEs based on the Greenwood formula and the imputed SEs were used for the KM-RC and KM-IC models, respectively. Following directly from the above, the corresponding lower and upper confidence limits of $t_p$ for the four right- and interval-censored KM models were estimated linearly as $t_p \mp 1.96 \times SE(t_p)$.
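The percentile and SE calculation for the KM models can be sketched in Python as follows, assuming the formula from Collett [45] as described in this section: the density at the percentile is approximated by the slope of the survival curve between the largest time with persistence probability at least 1 − p/100 + ϵ and the smallest time with persistence probability at most 1 − p/100 − ϵ. The function name and the toy survival table are hypothetical and are not SLEVPS output.

```python
def percentile_with_se(times, surv, se_surv, p, eps=0.05):
    """Return (t_p, SE(t_p), linear 95% CI) from a KM-style survival table."""
    target = 1 - p / 100
    # t_p: smallest observed time with S(t) below the target probability.
    idx = next(i for i, s in enumerate(surv) if s < target)
    t_p = times[idx]
    # u_p and l_p bracket the percentile on the time axis.
    u_p = max(t for t, s in zip(times, surv) if s >= target + eps)
    l_p = min(t for t, s in zip(times, surv) if s <= target - eps)
    s_u = surv[times.index(u_p)]
    s_l = surv[times.index(l_p)]
    f_hat = (s_u - s_l) / (l_p - u_p)   # estimated density near t_p
    se_tp = se_surv[idx] / f_hat        # SE(t_p) = SE{S(t_p)} / f_hat
    return t_p, se_tp, (t_p - 1.96 * se_tp, t_p + 1.96 * se_tp)

# Hypothetical KM table: event times (days), S(t) and SE{S(t)}.
times = [30, 60, 90, 120, 150, 180]
surv = [0.90, 0.75, 0.60, 0.40, 0.30, 0.15]
se_surv = [0.030, 0.040, 0.050, 0.050, 0.045, 0.035]

t50, se50, ci50 = percentile_with_se(times, surv, se_surv, p=50)
print(f"median = {t50} days, SE = {se50:.2f}, "
      f"95% CI = ({ci50[0]:.1f}, {ci50[1]:.1f})")
```

With a sparse table, the bracketing times can sit far apart, in which case the slope estimate (and hence the SE) becomes sensitive to the choice of ϵ.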

The SE of $t_p$ for the Weibull parametric interval-censored model (Approach 5) was obtained directly from SAS Procedure LIFEREG. The lower and upper 95% confidence limits of the percentiles are given by the formula [45] $\exp[\ln(t_p) \mp 1.96 \times SE\{\ln(t_p)\}]$, where $SE\{\ln(\hat{t}_p)\} = \frac{1}{\hat{t}_p} \times SE(\hat{t}_p)$, evaluated at the $t_p$ and $SE(t_p)$ values obtained from that procedure.
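A short Python sketch of the log-scale confidence limits used for the Weibull percentiles follows. The percentile and SE values plugged in are hypothetical placeholders, not LIFEREG output; the linear interval is computed alongside only to show how the two constructions differ.

```python
import math

def log_scale_ci(t_p, se_t_p, z=1.96):
    """95% limits exp[ln(t_p) -/+ z * SE(ln t_p)], with SE(ln t_p) = SE(t_p) / t_p."""
    se_log = se_t_p / t_p
    lower = math.exp(math.log(t_p) - z * se_log)
    upper = math.exp(math.log(t_p) + z * se_log)
    return lower, upper

# Hypothetical percentile estimate and SE (days), for illustration only.
t_p, se_t_p = 210.0, 8.0
lower, upper = log_scale_ci(t_p, se_t_p)
linear = (t_p - 1.96 * se_t_p, t_p + 1.96 * se_t_p)

print(f"log-scale 95% CI: ({lower:.1f}, {upper:.1f})")
print(f"linear 95% CI:    ({linear[0]:.1f}, {linear[1]:.1f})")
```

The log-scale limits are always positive and are symmetric around the estimate on the multiplicative scale, whereas the linear limits are symmetric on the additive scale.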

Results

Table 3 shows the distribution of survivors entering each follow-up interval (in days) relative to time point t1 and the corresponding number of survivors who became confirmed EBOV-negative during each interval. The table shows that 88 of the 203 participants recruited at time t1 were already confirmed EBOV-negative by this time (P1 participants).

Table 3. Crude follow-up time (in days) and observed confirmed EBOV status of male survivors, counting from enrolment visit t1 (a).

| Start time interval (days) from enrolment visit (t1) | # entering the interval | # withdrawn from study | # confirmed negative for EBOV |
|---|---|---|---|
| Enrolment (visit t1) (a) | 203 | 0 | 88 |
| 1–30 | 115 | 1 | 42 |
| 31–60 | 72 | 1 | 19 |
| 61–90 | 52 | 1 | 11 |
| 91–180 | 40 | 0 | 28 |
| 181–270 | 12 | 1 | 5 |
| 271–360 | 6 | 2 | 3 |
| 361–450 | 1 | 0 | 0 |
| 451–540 | 1 | 1 | 0 |
| Total # with at least one semen specimen with valid results | | 7 | 196 |

(a) t1 refers to the recruitment or post-recruitment visit at which the semen sample collected yielded the first valid (positive or negative) EBOV result.
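The bookkeeping in Table 3 can be checked with a few lines of Python: the number entering each interval should equal the number entering the previous interval minus those withdrawn and those confirmed EBOV-negative in it. The counts below are copied from the table; the variable names are illustrative.

```python
# (interval label, # entering, # withdrawn, # confirmed EBOV-negative), from Table 3.
rows = [
    ("Enrolment (t1)", 203, 0, 88),
    ("1-30", 115, 1, 42),
    ("31-60", 72, 1, 19),
    ("61-90", 52, 1, 11),
    ("91-180", 40, 0, 28),
    ("181-270", 12, 1, 5),
    ("271-360", 6, 2, 3),
    ("361-450", 1, 0, 0),
    ("451-540", 1, 1, 0),
]

# Each interval's risk set is the previous one minus withdrawals and events.
for (_, entering, withdrawn, negative), (label, next_entering, _, _) in zip(rows, rows[1:]):
    assert next_entering == entering - withdrawn - negative, label

total_withdrawn = sum(r[2] for r in rows)
total_negative = sum(r[3] for r in rows)
share_negative_at_t1 = rows[0][3] / rows[0][1]

print(f"withdrawn: {total_withdrawn}, confirmed negative: {total_negative}")
print(f"already negative at t1: {100 * share_negative_at_t1:.0f}%")
```

The last line reproduces the proportion of left-censored (P1) participants among the 203 recruited.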

Fig 2 illustrates the survival curves for the five candidate approaches used to estimate EBOV persistence in semen. KM right censoring (Approach 1), which assumes that confirmed negativity occurred at the first time it was observed and in addition assumes left truncation for P1 participants, results in a persistence curve that is shifted to the right, leading to overestimation of EBOV persistence duration.

Fig 2. EBOV persistence using right- and interval censored non-parametric and parametric approaches.

KM right censoring (Approach 2), which also assumes that confirmed negativity occurred at the first time it was observed, and at time t1 for P1 participants, likewise results in an overestimation of persistence, one that is more extreme than in Approach 1. The survival models applying KM right censoring with midpoint imputation (Approach 3), KM interval censoring with multiple imputations (Approach 4) and Weibull interval censoring (Approach 5) yield persistence curves that are much closer together and persistence rate estimates that are much lower than those from Approaches 1 and 2. The fit of the Weibull model to the EBOV persistence data yielded a scale (λ) parameter value of 251.6 (95% CI 230.1, 275.1) days. It also yielded a shape (α) parameter value of 2.14 (95% CI 1.84, 2.49), which is above 1.0, indicating that the rate of clearing of the virus in semen increases with time, consistent with the observed SLEVPS persistence data. When the KM-IC and Weibull persistence curves are plotted together, their 95% confidence intervals clearly overlap (S1 Fig).
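As a worked illustration of what the reported Weibull estimates imply, the Python sketch below plugs the point estimates λ = 251.6 and α = 2.14 into the Weibull survival function and solves for the times at which persistence probability falls to 0.50, 0.25 and 0.10. This is a back-of-the-envelope check rather than the LIFEREG computation, and the function names are illustrative. The median implied by these point estimates comes out at roughly 212 days.

```python
import math

LAM, ALPHA = 251.6, 2.14  # point estimates reported for the Weibull fit

def weibull_percentile(p, lam=LAM, alpha=ALPHA):
    """Time t_p (days) at which S(t_p) = 1 - p/100 under the Weibull model."""
    return lam * (-math.log(1 - p / 100)) ** (1 / alpha)

def weibull_hazard(t, lam=LAM, alpha=ALPHA):
    """Hazard h(t) = (alpha / lam) * (t / lam) ** (alpha - 1)."""
    return (alpha / lam) * (t / lam) ** (alpha - 1)

for p in (50, 75, 90):
    print(f"{p}th percentile: {weibull_percentile(p):.0f} days post-ETU discharge")

# With alpha > 1 the clearance hazard rises over time.
for t in (100, 200, 300):
    print(f"hazard at day {t}: {weibull_hazard(t):.4f} per day")
```

Because α exceeds 1, the printed hazards increase with time, which is the numerical counterpart of the statement that the rate of clearing the virus increases with time.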

Fig 3 shows the 50th, 75th and 90th percentiles (95% CI) for EBOV persistence in semen of the EVD survivors, indicating the times at which persistence probability was below 0.50, 0.25 and 0.10, respectively. The KM-IC model (Approach 4) shows that the persistence probability (95% CI) was < 0.50 at 204 (193, 215) days, < 0.25 at 281 (244, 318) days, and < 0.10 at 336 (300, 372) days post-ETU discharge. Approaches 3 and 5, which took into consideration the interval in which the event occurred, produced percentile estimates that were much closer to those obtained through the KM-IC model. Approaches 1 and 2, which did not take the event interval into account, produced percentiles that deviated substantially from those of the KM-IC model.

Fig 3. Comparison of the performance of the five non-parametric and parametric models in estimating percentiles (95%CI) for EBOV confirmed negativity in semen.

Discussion

The non-parametric and parametric survival models applying the right and interval censoring methodologies presented in this paper gave differing results in the estimation of EBOV persistence in semen. The point estimates for the rate and duration of EBOV persistence in semen, as well as their precision, varied considerably across these models. The right censoring survival methods that assume the confirmed negativity occurred at the first time it was observed (Approaches 1 and 2) resulted in persistence curves that were shifted further to the right, towards a higher persistence rate and longer persistence duration. The median duration of EBOV persistence under these two approaches was about 2–4 months longer than under the KM-IC method (Approach 4). Approaches 1 and 2 resulted in 75th and 90th percentile estimates that deviated even further from those of the KM-IC method (higher by 4–6 months) and produced the least precise estimates of the 50th, 75th and 90th percentiles of the persistence curve (Fig 3). On the other hand, the right censored method that applied a single midpoint imputation of the event time (Approach 3) fared comparatively better, yielding estimates of persistence rate and duration that were comparable to those obtained using the interval censored approaches. This method also resulted in a more precise median EBOV duration, consistent with the KM-IC method.

The results from the EVD survivors’ data show that the Weibull IC and KM-IC EBOV persistence curves agreed well beyond 400 days post-ETU discharge, with the Weibull point estimates of the persistence rate slightly below those of the KM-IC curve in the period before 200 days post-ETU discharge and slightly above them in the period after (Fig 2). Overall, however, the Weibull IC distribution produced estimates of EBOV persistence in semen that were closely comparable to those of the KM-IC model.

It has been reported that using right censoring survival analysis methods to analyze data that include left- or interval-censored observations may result in biased estimates and severely underestimated standard errors [46].

Left censoring was present in the SLEVPS, with 88 (43%) of 203 participants confirmed EBOV-negative on recruitment. Left censoring was also reported in Guinea’s PostEboGui study by Subtil et al. [8], where 173 (91.9%) of the 188 male EVD survivors tested negative for EBOV in semen on recruitment following discharge from the treatment centre, and where both parametric and non-parametric (Turnbull) estimators were used in the persistence estimation.

With respect to the estimation of EBOV persistence in semen, three types of bias may have been induced by applying the single imputation right censored KM survival models.

The first type is selection bias due to left-truncation of the observations of participants who were confirmed EBOV-free in semen at the time the first specimen with a valid result was obtained (visit t1) (Approach 1). This bias leads to loss of sample information, since the participants excluded at this time, who had a shorter EBOV persistence duration, might be characteristically different from their included peers who had longer persistence (beyond time t1), despite both groups being recruited at around the same time from ETU discharge. Furthermore, there is a loss of sample size, which affects the precision of the persistence endpoint estimates.

The second type of potential bias is due to failure to consider, in the survival analysis, the time interval during which the confirmed EBOV negativity occurred (Approaches 1 and 2). The magnitude of this bias depends on the length of the interval between the visits that bracket the time at which the event occurs. This is important for the Sierra Leone cohort, since some EVD survivors had a long interval between visits. In particular, there was a long interval between discharge from the ETU and recruitment: for the vast majority of participants this period was longer than 3 months, and it was as long as 19 months. The effect of this is seen in Approach 2, where the inclusion of the left censored participants, with their event time imputed at t1, led to a shift of the persistence curve towards longer durations of persistence. In this study the shift was more extreme even than in Approach 1, which applies the same methodology but truncates the observations for the left censored participants. The right censoring survival model with single midpoint imputation (Approach 3) is also prone to this type of bias, especially when the intervals between visits are long.
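The effect of the imputation rule on the assumed event times can be illustrated with a small Python sketch. The intervals are hypothetical, and the code assumes that a left censored (P1) participant's interval runs from ETU discharge (time 0) to t1; it is an illustration of the three single-imputation rules, not a reproduction of the study analysis.

```python
from statistics import mean

# Hypothetical (L, R] intervals in days since ETU discharge.
# L = 0 marks a P1 participant already EBOV-negative at t1 (R = t1).
toy = [(0, 150), (0, 300), (120, 180), (200, 230), (260, 350)]

# Approach 1: event at first observed negative; P1 participants excluded.
approach1 = [right for left, right in toy if left > 0]
# Approach 2: event at first observed negative; P1 participants imputed at t1.
approach2 = [right for left, right in toy]
# Approach 3: event imputed at the midpoint of the interval.
approach3 = [(left + right) / 2 for left, right in toy]

print(f"Approach 1 mean imputed time: {mean(approach1):.1f} days (n = {len(approach1)})")
print(f"Approach 2 mean imputed time: {mean(approach2):.1f} days (n = {len(approach2)})")
print(f"Approach 3 mean imputed time: {mean(approach3):.1f} days (n = {len(approach3)})")
```

Right-endpoint imputation can never place an event earlier than midpoint imputation, and the gap grows with the interval width, which is why long gaps between ETU discharge and recruitment matter so much here.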

The third possible bias is underestimation of standard errors due to the single imputation used in the right censored survival methods. However, in our results the right censored KM model with midpoint imputation produced median duration estimates and precision that did not deviate much from those obtained through the KM-IC model.

The KM-IC model hence is the most appealing for estimating EBOV persistence in semen as it is the most efficient and does not require prior distributional assumptions for the baseline hazard.

Several major statistical software packages can handle interval censored proportional hazards regression modelling with covariate adjustment. For the Sierra Leone study, SAS Procedure ICPHREG with a piecewise constant parameterization for the baseline hazard was used to fit an interval censored proportional hazards (PH) regression model that explored and adjusted for important predictors and effect-modifiers of being EBOV-free in semen [6]. Other statistical software that integrates covariates in the semi-parametric regression model includes the R package “icenReg” with call function “ic_sp” (where sp stands for semi-parametric) [28, 41] and the STATA command “stintcox” [42, 43]. Fully parametric interval censored multivariable regression models can also be fitted using SAS Procedure LIFEREG, the R package “icenReg” call function “ic_par”, and the STATA command “stintreg”.

Percentiles of virus persistence in semen describe the probability of EBOV persisting beyond a certain time period. This is of clinical and public health importance, as it helps to inform semen testing programmes for survivors and policy formation on the duration of use of certain preventive measures (including sexual abstinence and condom use, aimed at minimizing sexual transmission of the virus), and therefore the possibility of preventing future outbreaks. Furthermore, extreme upper tail percentiles of virus persistence are important for understanding how long after ETU discharge the group of survivors who are slowest to clear EBOV take to become EBOV-negative.

One challenge faced was in the estimation of the SEs for the lower and upper tails of the non-parametric (KM) survival percentile distributions. While current statistical procedures like SAS ICLIFETEST or LIFETEST can easily estimate the SEs for the central survival percentiles (25th, 50th and 75th) also referred to as survival quartiles, these routines do not automatically estimate the SEs for the extreme lower and upper percentiles. For consistency, the standard errors for the 50th, 75th and 90th percentiles for the EBOV persistence in this paper were computed manually using the formulae outlined in the book by Collett [29], combined with SAS-produced estimates required in these respective formulae. The 95% confidence limits for the survival quartiles computed manually were compared against those readily estimable in the SAS program and showed a difference in width of the intervals between the two methods of estimation of the percentiles of the 4 non-parametric (KM) models (under linearly transformed 95% CI) not exceeding three weeks. For the Weibull interval censored model, the LIFEREG procedure had the in-built capability to estimate all the percentiles and corresponding SEs.

Conclusions

Survival models that take into account the interval nature of the data on EBOV persistence in semen ensure statistically robust and unbiased estimates of EBOV persistence in this body fluid. Comparison of the estimates obtained using the right and interval censoring approaches shows that the methodologies that account for interval censoring result in narrower confidence intervals (and therefore more precise estimates), which are also more representative of the source population than those from the right censored approaches (which ignore interval censoring). With the increasing availability of statistical routines in SAS, R, STATA and other software to handle interval censored data, it has become relatively easier to apply these methods. The non-parametric and semi-parametric interval censoring survival methods should therefore be strongly considered for use in the estimation of virus persistence in body fluids of EVD survivors. Where good fit is demonstrated, the parametric interval censored methods, including those that use the Weibull distribution, should be considered, as they give more precise estimates. These models can also be applied to study the persistence of other types of pathogen, such as Zika virus.

Supporting information

S1 Fig. EBOV persistence: Comparison of interval-censored Weibull model to the KM model with 95% CI.

“Republished from [Thorson AE, Deen GF, Bernstein KT, Liu WJ, Yamba F, Habib N, et al. (2021) Persistence of Ebola virus in semen among Ebola virus disease survivors in Sierra Leone: A cohort study of frequency, duration, and risk factors. PLoS Med 18(2): e1003273. https://doi.org/10.1371/journal.pmed.1003273] under a CC BY license, with permission from [PLOS Medicine], original copyright [2021]”.

(TIF)

Acknowledgments

Dr Gilda Piaggio, Statistician, Geneva, Switzerland

Dr Soe Soe Thwin, Statistician, The World Health Organization

Sierra Leone Ebola Virus Persistence Study (SLEVPS) Group

Sierra Leone Ministry of Health and Sanitation: Gibrilla Fadlu Deen (principal investigator), James Bangura, Amara Jambai, Faustine James, Alie Wurie, Francis Yamba.

Sierra Leone Ministry of Defence: Foday Sahr, Thomas A. Massaquoi, Foday R. Sesay,

Sierra Leone Ministry of Social Welfare, Gender, Children’s Affairs: Tina Davies,

World Health Organization: Nathalie Broutet (Principal Investigator), Pierre Formenty, Anna E. Thorson, Archchun Ariyarajah, Florence Baingana, Marylin Carino, Antoine Coursier, Kara N. Durski, Faiqa Ebrahim, Ndema Habib, Philippe Gaillard, Margaret O. Lamunu, Sihem Landoulsi, Jaclyn E. Marrinan, Suzanna L. R McDonald, Dhamari Naidoo, Carmen Valle, Teodora Wi, Zabulon Yoti.

United States Centers for Disease Control and Prevention: Barbara Knust (principal investigator), Neetu Abad, Aneesah Akbar-Uqdah, Sarah D. Bennett, Kyle T. Bernstein, Aaron C. Brault, Bobbie Rae Erickson, Elizabeth Ervin, Sara Hersey, Jill Huppert, John D. Klena, Tasneem Malik, Oliver Morgan, Dianna Ng, Stuart T. Nichol, Lydia Poroman, Lance Presser, Christine Ross, Tara K. Sealy, Ute Ströher,

Chinese Center for Disease Control and Prevention: Wenbo Xu (principal investigator), Mifang Liang, Hongtu Liu, William Jun Liu, Guizhen Wu, Yong Zhang,

Joint United Nations Programme on HIV/AIDS (UNAIDS): Patricia Ongpin.

Data Availability

Contractual agreements between the study parties (WHO/HRP, US CDC, China CDC, and the MoH Sierra Leone) exist, and data rights reside with MOH-SL, the material owner. Inquiries related to the data may be directed to the Study Data Oversight Committee (SIS@who.int) with the subject: “Inquiry on Ebola Semen persistence study data”. All essential study documents, including analytic datasets, will be stored in the HRP e-Archive system. They are not available for public access. We do not own individual level data, so upon request, and when appropriate data share agreements are in place, the data manager of the e-Archive system will be able to transfer the dataset to the recipient. If the de-identified study dataset has not yet been migrated to the e-Archive system, the statistician or study data manager will share the dataset upon receipt of the necessary data share agreements.

Funding Statement

The Sierra Leone Ebola Virus Persistence Study (SLEVPS) team acknowledges the contributions of the WHO Ebola Response Program, the Paul G. Allen Family Foundation, and the UNDP (United Nations Development Program)–UNFPA (United Nations Population Fund)–UNICEF–WHO–World Bank Special Program of Research, Development and Research Training in Human Reproduction (HRP), a cosponsored program executed by the WHO; the US CDC, the China CDC, the Sierra Leone Ministry of Health and Sanitation and the Ministry of Defence, and the Joint United Nations Program on HIV/AIDS in support of the SLEVPS.

References

  • 1. WHO. WHO Fact sheet N°103. Ebola virus disease. Geneva; 2016 [updated January 2016; cited 17 April 2016]. Available from: http://www.who.int/mediacentre/factsheets/fs103/en/.
  • 2. WHO. Interim Guidance. Geneva: The World Health Organization; 2016 [updated 11 April 2016]. Available from: http://apps.who.int/iris/bitstream/10665/204235/1/WHO_EVD_OHE_PED_16.1_eng.pdf.
  • 3. Oleribe OO, Salako BL, Ka MM, Akpalu A, McConnochie M, Foster M, et al. Ebola virus disease epidemic in West Africa: lessons learned and issues arising from West African countries. Clin Med (Lond). 2015;15(1):54–7. Epub 2015/02/05. doi: 10.7861/clinmedicine.15-1-54. PubMed Central PMCID: PMC4954525.
  • 4. Chughtai AA, Barnes M, Macintyre CR. Persistence of Ebola virus in various body fluids during convalescence: evidence and implications for disease transmission and control. Epidemiol Infect. 2016;144(8):1652–60. Epub 2016/01/26. doi: 10.1017/S0950268816000054. PubMed Central PMCID: PMC4855994.
  • 5. Thorson A, Formenty P, Lofthouse C, Broutet N. Systematic review of the literature on viral persistence and sexual transmission from recovered Ebola survivors: evidence and recommendations. BMJ Open. 2016;6(1):e008859. Epub 2016/01/09. doi: 10.1136/bmjopen-2015-008859.
  • 6. Thorson AE, Deen GF, Bernstein KT, Liu WJ, Yamba F, Habib N, et al. Persistence of Ebola virus in semen among survivors in Sierra Leone: A cohort study of frequency, duration and risk factors. PLOS Medicine. 2021;18(2). Epub 10 February 2021. doi: 10.1371/journal.pmed.1003273.
  • 7. Deen GF, Broutet N, Xu W, Knust B, Sesay FR, McDonald SLR, et al. Ebola RNA Persistence in Semen of Ebola Virus Disease Survivors—Final Report. N Engl J Med. 2017;377(15):1428–37. Epub 2015/10/16. doi: 10.1056/NEJMoa1511410.
  • 8. Subtil F, Delaunay C, Keita AK, Sow MS, Toure A, Leroy S, et al. Dynamics of Ebola RNA Persistence in Semen: A Report From the Postebogui Cohort in Guinea. Clin Infect Dis. 2017;64(12):1788–90. Epub 2017/03/23. doi: 10.1093/cid/cix210.
  • 9. Deen GF, McDonald SLR, Marrinan JE, Sesay FR, Ervin E, Thorson AE, et al. Implementation of a study to examine the persistence of Ebola virus in the body fluids of Ebola virus disease survivors in Sierra Leone: Methodology and lessons learned. PLoS Negl Trop Dis. 2017;11(9):e0005723. Epub 2017/09/12. doi: 10.1371/journal.pntd.0005723.
  • 10. Deen GF, Knust B, Broutet N, Sesay FR, Formenty P, Ross C, et al. Ebola RNA Persistence in Semen of Ebola Virus Disease Survivors—Preliminary Report. N Engl J Med. 2015. Epub 2015/10/16. doi: 10.1056/NEJMoa1511410.
  • 11. Uyeki TM, Erickson BR, Brown S, McElroy AK, Cannon D, Gibbons A, et al. Ebola Virus Persistence in Semen of Male Survivors. Clin Infect Dis. 2016. Epub 2016/04/06. doi: 10.1093/cid/ciw202.
  • 12. Sissoko D, Duraffour S, Kerber R, Kolie JS, Beavogui AH, Camara AM, et al. Persistence and clearance of Ebola virus RNA from seminal fluid of Ebola virus disease survivors: a longitudinal analysis and modelling study. Lancet Glob Health. 2016;5(1):e80–e8. Epub 2016/12/14. doi: 10.1016/S2214-109X(16)30243-1.
  • 13. Klein JP, Moeschberger ML. Survival Analysis: Techniques for Censored and Truncated Data. New York: Springer; 1997.
  • 14. Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. NJ: Wiley; 2002.
  • 15. Maternal HIV-1 disease progression 18–24 months postdelivery according to antiretroviral prophylaxis regimen (triple-antiretroviral prophylaxis during pregnancy and breastfeeding vs zidovudine/single-dose nevirapine prophylaxis): The Kesho Bora randomized controlled trial. Clin Infect Dis. 2012;55(3):449–60. Epub 2012/05/11. doi: 10.1093/cid/cis461.
  • 16. de Vincenzi I. Triple antiretroviral compared with zidovudine and single-dose nevirapine prophylaxis during pregnancy and breastfeeding for prevention of mother-to-child transmission of HIV-1 (Kesho Bora study): a randomised controlled trial. Lancet Infect Dis. 2011;11(3):171–80. Epub 2011/01/18. doi: 10.1016/S1473-3099(10)70288-7.
  • 17. Lindsey JC, Ryan LM. Tutorial in biostatistics methods for interval-censored data. Stat Med. 1998;17(2):219–38. Epub 1998/03/04.
  • 18. Turnbull BW. The Empirical Distribution Function with Arbitrarily Grouped, Censored, and Truncated Data. Journal of the Royal Statistical Society, Series B. 1976;38:290–5.
  • 19. Grover G, Shakeri N. Nonparametric estimation of survival function of HIV+ patients with doubly censored data. J Commun Dis. 2007;39(1):7–12. Epub 2008/03/15.
  • 20. Alioum A, Commenges D. A proportional hazards model for arbitrarily censored and truncated data. Biometrics. 1996;52(2):512–24. Epub 1996/06/01.
  • 21. Finkelstein DM. A proportional hazards model for interval-censored failure time data. Biometrics. 1986;42(4):845–54. Epub 1986/12/01.
  • 22. Goggins WB, Finkelstein DM. A proportional hazards model for multivariate interval-censored failure time data. Biometrics. 2000;56(3):940–3. Epub 2000/09/14. doi: 10.1111/j.0006-341x.2000.00940.x.
  • 23. Langohr K, Gomez G, Muga R. A parametric survival model with an interval-censored covariate. Stat Med. 2004;23(20):3159–75. Epub 2004/09/28. doi: 10.1002/sim.1892.
  • 24. Gu X, Shapiro D, Hughes MD, Balasubramanian R. Stratified Weibull Regression Model for Interval-Censored Data. R J. 2014;6(1):31–40. Epub 2014/06/01.
  • 25. Griffin J. INTCENS: Stata module to perform interval-censored survival analysis. Statistical Software Components: Boston College Department of Economics; 2005.
  • 26. Inc SI. SAS Institute. The SAS System for Windows. Release 9.4. SAS/STAT® 14.1 User’s Guide. 2015.
  • 27. Team RC. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2019.
  • 28. Schindell BG, Webb AL, Kindrachuk J. Persistence and Sexual Transmission of Filoviruses. Viruses. 2018;10(12). Epub 2018/12/06. doi: 10.3390/v10120683. PubMed Central PMCID: PMC6316729.
  • 29. Collett D. Modelling Survival Data in Medical Research. 3 ed. Francesca Dominici JJF, Martin Tanner, Jim Zidek, editor: CRC Press; 2015. 521 p.
  • 30. Clark TG, Bradburn MJ, Love SB, Altman DG. Survival analysis part I: basic concepts and first analyses. Br J Cancer. 2003;89(2):232–8. Epub 2003/07/17. doi: 10.1038/sj.bjc.6601118.
  • 31. Law CG, Brookmeyer R. Effects of mid-point imputation on the analysis of doubly censored data. Stat Med. 1992;11(12):1569–78. Epub 1992/09/15. doi: 10.1002/sim.4780111204.
  • 32. Freitag MH, Peila R, Masaki K, Petrovitch H, Ross GW, White LR, et al. Midlife pulse pressure and incidence of dementia: the Honolulu-Asia Aging Study. Stroke. 2006;37(1):33–7. Epub 2005/12/13. doi: 10.1161/01.STR.0000196941.58869.2d.
  • 33. Helmer C, Joly P, Letenneur L, Commenges D, Dartigues JF. Mortality with dementia: results from a French prospective community-based cohort. Am J Epidemiol. 2001;154(7):642–8. Epub 2001/10/03. doi: 10.1093/aje/154.7.642.
  • 34. Odell PM, Anderson KM, D’Agostino RB. Maximum likelihood estimation for interval-censored data using a Weibull-based accelerated failure time model. Biometrics. 1992;48(3):951–9. Epub 1992/09/01.
  • 35. Leffondre K, Touraine C, Helmer C, Joly P. Interval-censored time-to-event and competing risk with death: is the illness-death model more accurate than the Cox model? Int J Epidemiol. 2013;42(4):1177–86. Epub 2013/08/01. doi: 10.1093/ije/dyt126.
  • 36. Lee ET. Statistical methods for survival data analysis. 2 ed. New York: Wiley and Sons; 1992.
  • 37. Wellner JA, Zhan Y. A hybrid algorithm for computation of the non-parametric maximum likelihood estimator from censored data. Journal of the American Statistical Association. 1997;92:945–59.
  • 38. Guo C, So Y, Johnston G. Paper SAS279-2014. Analyzing Interval-Censored Data with the ICLIFETEST Procedure. 2014.
  • 39. Fay MP, Shaw PA. Exact and Asymptotic Weighted Logrank Tests for Interval Censored Data: The interval R package. J Stat Softw. 2010;36(2). Epub 2010/08/01. doi: 10.18637/jss.v036.i02. PubMed Central PMCID: PMC4184046.
  • 40. Anderson-Bergman C. Using icenReg for interval censored data in R. 2020 [cited 26 June 2021]. Version 2.0.9. Available from: https://cran.r-project.org/web/packages/icenReg/vignettes/icenReg.pdf.
  • 41. StataCorp. Stata: Release 17 Statistical Software. College Station, TX: StataCorp LLC; 2021.
  • 42. LLC S. STATA Survival Analysis Reference manual. 4905 Lakeway Drive, College Station, Texas 77845: Stata Press; 2021 [cited 26 June 2021]. Available from: https://www.stata.com/manuals/st.pdf.
  • 43. Yang X. Analyzing interval-censored survival-time data in Stata. 2017 Stata Conference; 2017.
  • 44. R: A Language and Environment for Statistical Computing. 2.14 ed. Vienna, Austria: R Development Core Team; 2012. p. R Foundation for Statistical Computing.
  • 45. Collett D. Modelling Survival Data in Medical Research. London: Chapman & Hall; 1994.
  • 46. Cain KC, Harlow SD, Little RJ, Nan B, Yosef M, Taffe JR, et al. Bias due to left truncation and left censoring in longitudinal studies of developmental and disease processes. Am J Epidemiol. 2011;173(9):1078–84. Epub 2011/03/23. doi: 10.1093/aje/kwq481.

Decision Letter 0

Mohammad Asghari Jafarabadi

7 Apr 2021

PONE-D-21-05351

Statistical methodologies for evaluation of the rate of persistence of Ebola virus in semen of male survivors in Sierra Leone

PLOS ONE

Dear Dr. Habib,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by May 22 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Mohammad Asghari Jafarabadi

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for conducting your study according to STROBE guidelines. We ask that you complete and upload a copy of the STROBE checklist (http://www.strobe-statement.org) as a supplemental file.

3. We noted in your submission details that a portion of your manuscript may have been presented or published elsewhere.

"Yes. Figure 3 has been published before in Thorson A. et al. et al. (2021) Persistence of Ebola virus in semen among Ebola virus disease survivors in Sierra Leone: A cohort study of frequency, duration, and risk factors. PLoS Med 18(2): e1003273. https://doi.org/10.1371/journal.pmed.1003273. This figure has been used because it displays the comparison of estimates of interval-censored non-parametric survival model to those of parametric Weibull model. It further shows the overlap in the 95% confidence interval for the two models."

Please clarify whether this publication was peer-reviewed and formally published. If this work was previously peer-reviewed and published, in the cover letter please provide the reason that this work does not constitute dual publication and should be included in the current manuscript.

4. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

5. One of the noted authors is a group or consortium [Sierra Leone Ebola Virus Persistence Study Group]. In addition to naming the author group and listing the individual authors and affiliations within this group in the acknowledgments section of your manuscript, please also indicate clearly a lead author for this group along with a contact email address.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors of this manuscript present convincing evidence for use of statistical methods that consider interval censoring when examining persistence of EBOV in semen among SLEVPS participants. Their methods section does a good job introducing different statistical options along with their weaknesses. However, the writing in other sections of this paper, while grammatically correct, is often not written with clear arguments and supporting evidence. The introduction and discussion sections suffer in particular. It is unclear whether the authors are writing a methods tutorial paper with SLEVPS as a convenient example, if they are writing specifically to researchers who work with data like SLEVPS, or if they are trying to improve estimates of EBOV persistence in semen. I recommend that the authors think about what they are trying to communicate specifically with this manuscript and to whom. Once that is identified, I think it will be easier to resolve many of the issues. Full comments in attached file.

Reviewer #2: The authors present an interesting paper that describes different approaches to analysis of time to event data subject to different kinds of censoring. This endeavour deserves merit, and they explain very nicely the different approaches and show their results using a data set with information about traces of the Ebola virus in semen.

However, there are several issues that have to be addressed before I can recommend a publication of the manuscript:

Major issues:

1. It is unclear what the main aim of using these models is. Should individual out-of-sample predictions be possible in order to advise future Ebola survivors? Or does the main interest lie in observing the mean or median behaviour (or the percentiles) of the population of Ebola survivors?

2. This point is closely related to the former point and actually the most important part of my review: The authors can only describe the different estimated survival curves/percentiles/standard errors. But it is impossible to say which of the models is best, because the authors fail to define criteria which can be used to measure the model quality. At several places the authors claim that one or the other model is unbiased, but they cannot say this without a criterion for model quality. They sometimes describe that a model leads to more precise estimates in the sense of having smaller confidence intervals, but this does not necessarily mean that the respective model is correct.

Depending on the main goal of fitting these models (see point 1), I suggest the following changes: If the main goal is prediction, the authors should predefine criteria for predictive model choice such as proper scoring rules (e.g. the Brier score) that can take into account both sharpness (precision) and calibration of a predictive model or judge the calibration alone. In addition they should separate the data set in two parts, the training set and the test set, because otherwise the model fit will always be too good. Alternatively they could use some kind of cross-validation.

If, however, the main aim of the models is parameter estimation/a description of the population, they should show some criteria for model fit/model choice such as some version of AIC/BIC/DIC or the likes that are appropriate for the models they use and allow for a somewhat independent comparison of the quality of the models. In addition, they could add a plot that shows the observed data, so that it is easier to judge.

If the authors don't want or cannot do this, they could alternatively give a restructured description of the different approaches and their advantages and disadvantages (e.g. distributional assumptions, number of parameters,...) and only show the results of the models as an addition at the end, but they should remove all judgement from the results part that is not justified (all about bias or the notion that a more precise confidence interval is always better).

3. It becomes already clear from the description of the approaches that approach 1 and 2 that take only right-censoring into account are not appropriate for the data set. Therefore it is not at all surprising that the estimates are quite far away from the ones from the models for interval-censored data, this statement is a bit trivial. It is still nice to see, but it does not add much to the paper, while using relatively much space in the methods part. I recommend to shorten the explanation a bit and focus more on the other three methods.

4. The authors write in the introduction that there are only few possibilities regarding statistical software with ready-to-use functions for interval-censored data. This gave me the impression that the five selected approaches were among those few, but this is only partly true. Whereas the models could be fitted using existing SAS routines, the standard errors for percentiles obviously had to be computed/programmed by hand, and R plotting functions were used (do such possibilities not exist in SAS?) which makes the recommendations by far more difficult to use for non-statistician readers. The authors should maybe make it clearer in the introduction that their paper also does not provide an easy solution to the software problem.

5. Again closely related to the former point: The authors only use SAS. Can they please provide more information about functions in other software packages such as R and STATA, so that readers who don't have access to SAS can still use their recommendations?

6. On page 15 the authors explain the approach for interval censoring and how the values for L and R are chosen in the different groups. For group P3 they say that R is chosen to be infinity. I wonder if that is equivalent to treating those values as right-censored or, respectively, how this works technically/computationally or what would change if some real, albeit very large, value would be used instead. Can the authors elaborate a bit more?

7. Would it be possible to integrate covariates in the models? And as the authors seem to think that the percentiles are of particular importance: Did they think about using some quantile regression model? I don't want the authors to actually use these models, but a short discussion would be informative.

Minor points:

1. It would be helpful to have an explanatory graph at the beginning that shows the important time points (such as hospital discharge and the main timepoints used in the study), maybe along with the different types of censoring and truncation. It would help to understand the situation and in consequence, parts of the explanations later on could be shortened a bit.

2. On page 15 (top) the authors write that there is both left- and right-censoring present and then write "Therefore, the EBOV persistence data can be rightly be considered as interval-censored." This conclusion is wrong. Interval-censoring is not defined this way, but rather that the event of interest happens between two fixed time points. Actually, the authors continue with the correct definition, so I recommend to either delete or rewrite this sentence.

3. The authors could consider moving Figure 3 to the appendix. It is already pretty clear from looking at Figure 2 that the three interval-censoring approaches yield comparable results and I therefore expected the confidence intervals to overlap. So there is not much additional information.

4. Is there a particular reason for choosing the 50th, 75th and 90th percentiles? Any clinical interpretation? If so, a short explanation would be nice.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: Habib et al notes.docx

PLoS One. 2022 Oct 5;17(10):e0274755. doi: 10.1371/journal.pone.0274755.r002

Author response to Decision Letter 0


8 Sep 2021

We are also pleased to respond to the very important reviewers’ comments as follows:

Reviewer #1: The authors of this manuscript present convincing evidence for use of statistical methods that consider interval censoring when examining persistence of EBOV in semen among SLEVPS participants. Their methods section does a good job introducing different statistical options along with their weaknesses. However, the writing in other sections of this paper, while grammatically correct, is often not written with clear arguments and supporting evidence. The introduction and discussion sections suffer in particular. It is unclear whether the authors are writing a methods tutorial paper with SLEVPS as a convenient example, if they are writing specifically to researchers who work with data like SLEVPS, or if they are trying to improve estimates of EBOV persistence in semen. I recommend that the authors think about what they are trying to communicate specifically with this manuscript and to whom. Once that is identified, I think it will be easier to resolve many of the issues. Full comments in attached file.

AUTHORS RESPONSE: We are writing specifically for researchers who work on estimating the persistence of Ebola virus in body fluids, as in the SLEVPS; however, these methodologies may also apply to other areas facing similar study design challenges.

GENERAL NOTES

COMMENT 1: Writing style often has a lot of fluff. Could generally be more to the point.

AUTHORS RESPONSE: Manuscript has been revised to maximize clarity.

COMMENT 2: Authors have a tendency to make vague claims with little support.

AUTHORS RESPONSE: We have revised to ensure that claims are supported by evidence.

COMMENT 3: I expected some examination of existing literature (or of mention of lack thereof) that has either used or failed to use appropriate statistical approaches to evaluating time to event data in this context. However, that context is not present in this manuscript. What are the mistakes people are making? How might this have affected previous estimation of EBOV persistence in semen? What do the authors findings mean in the broader context of EBOV persistence research?

AUTHORS RESPONSE: We have added an examination of the literature to the introduction (lines 142-157). Many Ebola persistence studies are small, and a good number are case studies that are mostly descriptive, reporting the maximum duration of persistence. We found only one study (Sissoko et al., 2016) that used a time-series approach to determine persistence, and another (Subtil et al., 2017) that acknowledged the interval-censored nature of the semen EBOV persistence data and used appropriate methods. However, there was no in-depth description of how they determined the lower and upper bounds for the event of interest in the context of left- and interval-censored events. Our paper seems to be among the first to describe thoroughly all aspects of the study design of EBOV persistence, as well as how parameters can be estimated in both non-parametric and parametric survival modelling. In the broader context, our paper seeks to demonstrate the design and analysis issues to consider when estimating persistence, and proposes survival analysis techniques that take the interval nature of the data into account to accommodate the three types of censoring. In the presence of censoring, survival techniques remain the most suitable for assessing persistence of EBOV in the semen of survivors.

ABSTRACT

COMMENT 4: The methods themselves are well-described, but insight into the rationale for choosing this particular set of methods may be lacking.

AUTHORS RESPONSE: We have included the rationale for the RC and IC methods on lines 237-242, and for the choice of the Weibull parametric distribution on lines 388-394.

COMMENT 5: [256-261] This explanation is correct but the equation is incorrect. The midpoint between l2 (time of last + visit) and t2 (time of first - visit) would be (*). This error also occurs in Table 1.

AUTHORS RESPONSE: Counting from time 0, the time (*) in its most simplified form. To conserve space, we have decided to keep the simplified original form of the equation. (Please note that (*) is a mathematical equation that could not be printed in this textbox. Full equations are provided in the letter to reviewers, in PDF format.)

COMMENT 6: [265-267] There is a solid statistical rationale for why midpoint imputation can underestimate the standard error. Bias when using midpoint imputation also has a solid foundation, but the consequences are more conceptually complex. I think the way this is reported is fine but would prefer to see these two biases discussed as separate ideas rather than in one long sentence.

AUTHORS RESPONSE: We have separated these into two statements (lines 308-311)
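The point about standard errors in Comment 6 can be illustrated with a minimal sketch of generic midpoint imputation. It assumes each observation is summarised by an interval (left, right] that brackets the event; the function name `midpoint_impute` and the example intervals are hypothetical and are not taken from the manuscript or from Table 1.

```python
import math

def midpoint_impute(left, right):
    """Return (time, event) for one interval-censored observation.

    left  : last time the sample was still positive (hypothetical name)
    right : first time it was negative, or math.inf if never seen negative
    """
    if math.isinf(right):
        # Never observed negative: keep as right-censored at `left`.
        return left, 0
    # The event is only known to lie in (left, right]; midpoint imputation
    # treats it as if it had happened exactly halfway through the interval.
    return (left + right) / 2.0, 1

# Two intervals of very different width collapse to the same "exact" time,
# so the width (the uncertainty about the event time) is discarded.
narrow = midpoint_impute(90, 110)
wide = midpoint_impute(20, 180)
still_positive = midpoint_impute(150, math.inf)
```

Because the imputed times are then analysed as if they were observed exactly, any downstream right-censored method has no way of reflecting the original interval widths in its standard errors.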

COMMENT 7: [307] Would prefer to see NPMLE KM written out. Appears in a heading and what it means is not written out before this point.

AUTHORS RESPONSE: The NPMLE has been written out in full on line 278. We have also written it out in full in the heading (lines 349-351).

COMMENT 8: [308-309] Are these actually recent developments? The original Turnbull paper was published in the 70s and a quick google search shows that an R package accommodating these methods was published around 2010. Avoid making unsubstantiated or unclear claims.

AUTHORS RESPONSE: We have omitted the word "recent" from the statement.

COMMENT 9: [307-341, 368-386] While I am not opposed to more equation-based explanations in papers, the choice to provide this level of detail in these sections is somewhat puzzling. These seem like opportunities to explain why these approaches may be preferred and conceptually how they work. Perhaps the approach used in these sections confuses me because it is unclear what audience the authors are intending to reach.

AUTHORS RESPONSE: We still think it is important to convey this information, both for readers interested in the theoretical background and for those who want to see how the estimates were derived from the formulae.

DISCUSSION

COMMENT 10: [450-451] One cannot say that Approach 4 is preferred simply because its estimates are "more precise and statistically unbiased." Inference is limited to the data at hand in this project. Thus, one cannot know the true unbiased estimate, and increased precision doesn't necessarily indicate an improvement. An example of this issue – midpoint imputation has very precise estimates at 50% and 75% -- is already baked into this manuscript. The authors need to consider and articulate their arguments clearly. They have outlined reasons why we would expect approaches 1-3 to generate estimates that poorly resemble the truth, and how approaches 4-5 account for the limitations of 1-3. Lean on that argument here.

AUTHORS RESPONSE: We have revised text accordingly.

COMMENT 11: [454-458] Underestimated relative to what?

AUTHORS RESPONSE: We have revised the text: the underestimation is relative to the KM-IC curve (lines 519-520).

Reviewer #2: The authors present an interesting paper that describes different approaches to analysis of time to event data subject to different kinds of censoring. This endeavour deserves merit, and they explain very nicely the different approaches and show their results using a data set with information about traces of the Ebola virus in semen.

AUTHORS RESPONSE: We thank the reviewer for this motivating comment

MAJOR ISSUES

COMMENT 1: It is unclear what the main aim of using these models is. Should individual out-of-sample predictions be possible in order to advise future Ebola survivors? Or does the main interest lie in observing the mean or median behaviour (or the percentiles) of the population of Ebola survivors?

AUTHORS RESPONSE: The main interest is in describing the performance of these models, and commenting on the differences in the estimated cumulative distribution of the rate of persistence, as well as in the percentile estimates, using SLEVPS sample data.

COMMENT 2: This point is closely related to the former point and actually the most important part of my review: The authors can only describe the different estimated survival curves/percentiles/standard errors. But it is impossible to say which of the models is best, because the authors fail to define criteria which can be used to measure the model quality. At several places the authors claim that one or the other model is unbiased, but they cannot say this without a criterion for model quality. They sometimes describe that a model leads to more precise estimates in the sense of having smaller confidence intervals, but this does not necessarily mean that the respective model is correct.

AUTHORS RESPONSE: We have revised the manuscript to compare the strengths and limitations of each approach to estimating persistence.

COMMENT 3: Depending on the main goal of fitting these models (see point 1), I suggest the following changes: If the main goal is prediction, the authors should predefine criteria for predictive model choice such as proper scoring rules (e.g. the Brier score) that can take into account both sharpness (precision) and calibration of a predictive model or judge the calibration alone. In addition, they should separate the data set in two parts, the training set and the test set, because otherwise the model fit will always be too good. Alternatively, they could use some kind of cross-validation.

If, however, the main aim of the models is parameter estimation/a description of the population, they should show some criteria for model fit/model choice such as some version of AIC/BIC/DIC or the likes that are appropriate for the models they use and allow for a somewhat independent comparison of the quality of the models. In addition, they could add a plot that shows the observed data, so that it is easier to judge.

If the authors don't want or cannot do this, they could alternatively give a restructured description of the different approaches and their advantages and disadvantages (e.g. distributional assumptions, number of parameters,...) and only show the results of the models as an addition at the end, but they should remove all judgement from the results part that is not justified (all about bias or the notion that a more precise confidence interval is always better).

AUTHORS RESPONSE: We have followed this latter suggestion and now discuss in more detail how the five selected survival models function, the results of the fits, and their advantages and disadvantages. We discuss the performance of the models only in terms of the estimates they give.

COMMENT 4: It becomes already clear from the description of the approaches that approach 1 and 2 that take only right-censoring into account are not appropriate for the data set. Therefore it is not at all surprising that the estimates are quite far away from the ones from the models for interval-censored data, this statement is a bit trivial. It is still nice to see, but it does not add much to the paper, while using relatively much space in the methods part. I recommend to shorten the explanation a bit and focus more on the other three methods.

AUTHORS RESPONSE: We have reduced the description of the right-censored methods and added more on the interval-censored methods (lines 388-410).

COMMENT 5: The authors write in the introduction that there are only few possibilities regarding statistical software with ready-to-use functions for interval-censored data. This gave me the impression that the five selected approaches were among those few, but this is only partly true. Whereas the models could be fitted using existing SAS routines, the standard errors for percentiles obviously had to be computed/programmed by hand, and R plotting functions were used (do such possibilities not exist in SAS?) which makes the recommendations by far more difficult to use for non-statistician readers. The authors should maybe make it clearer in the introduction that their paper also does not provide an easy solution to the software problem.

AUTHORS RESPONSE: Thanks for highlighting this. We have included this in the introduction (lines 137-141)

COMMENT 6: Again closely related to the former point: The authors only use SAS. Can they please provide more information about functions in other software packages such as R and STATA, so that readers who don't have access to SAS can still use their recommendations?

AUTHORS RESPONSE: Yes. We have included R and STATA among the statistical software programs that can also analyze IC survival data, both non-parametrically and parametrically (lines 408-410; 566-570), and have described the integration of covariates in IC regression models (lines 561-571).

COMMENT 7: On page 15 the authors explain the approach for interval censoring and how the values for L and R are chosen in the different groups. For group P3 they say that R is chosen to be infinity. I wonder if that is equivalent to treating those values as right-censored or, respectively, how this works technically/computationally or what would change if some real, albeit very large, value would be used instead. Can the authors elaborate a bit more?

AUTHORS RESPONSE: We have added a sentence clarifying that, for a right-censored individual, the upper limit of 'infinity' is usually replaced by a missing value, which the program interprets as a right-censored value (lines 328-330).
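The equivalence raised in Comment 7 can be checked numerically. The following is a minimal sketch, assuming a Weibull survival function and an event known only to lie in the interval (L, R]; the function names and the parameter values (shape 1.5, scale 120 days) are hypothetical illustrations, not estimates or code from the study.

```python
import math

def weibull_surv(t, shape, scale):
    """S(t) = exp(-(t/scale)**shape), with S(infinity) taken as 0."""
    if math.isinf(t):
        return 0.0
    return math.exp(-((t / scale) ** shape))

def loglik_interval(left, right, shape, scale):
    """Log-likelihood contribution of an event known to lie in (left, right]."""
    return math.log(weibull_surv(left, shape, scale)
                    - weibull_surv(right, shape, scale))

def loglik_right_censored(left, shape, scale):
    """Log-likelihood contribution of an observation still positive at `left`."""
    return math.log(weibull_surv(left, shape, scale))

shape, scale = 1.5, 120.0  # hypothetical values for illustration only

# R = infinity: the interval contribution reduces to log S(L),
# which is exactly the right-censored contribution.
ll_infinite = loglik_interval(200.0, math.inf, shape, scale)
ll_censored = loglik_right_censored(200.0, shape, scale)

# A very large but finite R gives practically the same value,
# because S(R) is negligible compared with S(L).
ll_large = loglik_interval(200.0, 1e6, shape, scale)
```

Under these assumptions, setting R to infinity (or to a missing value that the software reads as infinity) is the same as treating the observation as right-censored at L, and replacing infinity by a very large real number changes the likelihood only by the negligible amount S(R).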

COMMENT 8: Would it be possible to integrate covariates in the models? And as the authors seem to think that the percentiles are of particular importance: Did they think about using some quantile regression model? I don't want the authors to actually use these models, but a short discussion would be informative.

AUTHORS RESPONSE: Yes, it is possible to integrate covariates in the semi-parametric PH IC models. We have added a brief paragraph that explains this and lists statistical packages that can fit such models, also in the context of parametric modelling (lines 561-570).

MINOR POINTS

COMMENT 9: It would be helpful to have an explanatory graph at the beginning that shows the important time points (such as hospital discharge and the main timepoints used in the study), maybe along with the different types of censoring and truncation. It would help to understand the situation and in consequence, parts of the explanations later on could be shortened a bit.

AUTHORS RESPONSE: The important timepoints have been described in Figure 1, which was saved separately as S1 Fig.tiff.

COMMENT 10: On page 15 (top) the authors write that there is both left- and right-censoring present and then write "Therefore, the EBOV persistence data can be rightly be considered as interval-censored." This conclusion is wrong. Interval-censoring is not defined this way, but rather that the event of interest happens between two fixed time points. Actually, the authors continue with the correct definition, so I recommend to either delete or rewrite this sentence.

AUTHORS RESPONSE: We have revised lines 323-331 so that they adequately describe interval censoring.

COMMENT 11: The authors could consider moving Figure 3 to the appendix. It is already pretty clear from looking at Figure 2 that the three interval-censoring approaches yield comparable results and I therefore expected the confidence intervals to overlap. So there is not much additional information.

AUTHORS RESPONSE: We have moved Figure 3 to the appendix

COMMENT 12: Is there a particular reason for choosing the 50th, 75th and 90th percentiles? Any clinical interpretation? If so, a short explanation would be nice.

AUTHORS RESPONSE: Percentiles of viral persistence in semen give the probability of EBOV persisting beyond a certain time period. This is of clinical and public health importance: it informs semen testing programmes for survivors, and helps in formulating policies on how long preventive measures, including sexual abstinence and condom use, should be used to minimize sexual transmission of the virus and thereby prevent possible future outbreaks. Extreme upper-tail persistence percentiles are furthermore important for understanding how long after ETU discharge it takes for the survivors who are slowest to clear EBOV to become EBOV-free. We have added this explanation on lines 571-578.
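The link between a percentile and the probability of persistence beyond a given time can be sketched for a parametric Weibull model. This is a minimal illustration, assuming the survival function S(t) = exp(-(t/scale)**shape); the function name and the parameter values are hypothetical and are not the estimates reported in the manuscript.

```python
import math

def weibull_percentile(p, shape, scale):
    """Time t_p by which a fraction p of survivors have cleared the virus.

    Solves 1 - exp(-(t/scale)**shape) = p for t, so that a fraction
    1 - p is still expected to be positive beyond t_p.
    """
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)

shape, scale = 1.5, 120.0  # hypothetical values, for illustration only

percentiles = {}
for p in (0.50, 0.75, 0.90):
    percentiles[p] = weibull_percentile(p, shape, scale)
    print(f"{int(p * 100)}th percentile: {percentiles[p]:.1f} days")
```

Read this way, the 90th percentile is the time after which only about 10% of survivors would still be expected to test positive, which is the upper-tail quantity the response describes as relevant for the slowest clearers.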

We thank both reviewers for their time and for the very useful comments they provided.

We are kindly resubmitting our manuscript for consideration.

Sincerely,

Ndema Habib, PhD

Statistician, The Department of Sexual and Reproductive Health Research (SRH), WHO, Geneva.

Attachment

Submitted filename: Response to Reviewres_PLOS One_Review_24Aug2021.docx

Decision Letter 1

Mohammad Asghari Jafarabadi

27 Sep 2021

PONE-D-21-05351R1

Statistical methodologies for evaluation of the rate of persistence of Ebola virus in semen of male survivors in Sierra Leone

PLOS ONE

Dear Dr. Habib,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Nov 11 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Mohammad Asghari Jafarabadi

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thank you for so graciously considering and responding to our many comments. The revised manuscript is much improved and does a good job presenting the already solid research done by the authors.

There are a few awkward sentences and grammatical errors (comma splices and run-on sentences are a common problem) at: 70-73, 76-79, 79-81, 82-85, 88-91, 92-93 & potentially other locations.

[126-132] Side note (no changes/response requested) – left censoring is actually a big problem in many public health studies because of immortal time bias.

[126-141] Good paragraph

[142-157] Thank you for adding, this is helpful context.

[Table 1] Unfortunately, equations in author response (and from my initial comment) were turned into an asterisk by (I suspect) the program that created my pdf, so I cannot fully evaluate author's response.

[530-534] It's interesting thinking about this problem with such a high proportion of left-censored participants.

Reviewer #2: Thank you for addressing my comments. I especially like the software information that is now given at several places.

My only recommendation is to go again through the manuscript with particular attention to grammar mistakes/funny expressions. Especially in the new or changed parts of the manuscript, there are several small mistakes, probably due to some haste in which the manuscript was rewritten.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Oct 5;17(10):e0274755. doi: 10.1371/journal.pone.0274755.r004

Author response to Decision Letter 1


5 Nov 2021

We are pleased to respond to the reviewers’ very important comments as follows:

Reviewer #1: Review Comments to the Author

Thank you for so graciously considering and responding to our many comments. The revised manuscript is much improved and does a good job presenting the already solid research done by the authors.

AUTHORS RESPONSE: We appreciate your comment.

There are a few awkward sentences and grammatical errors (comma splices and run-on sentences are a common problem) at: 70-73, 76-79, 79-81, 82-85, 88-91, 92-93 & potentially other locations.

AUTHORS RESPONSE: Thank you. We have revised and corrected as per lines 71-74, 77-82, 83-86, 89-91, 92-93, as well as in other places.

[126-132] Side note (no changes/response requested) – left censoring is actually a big problem in many public health studies because of immortal time bias.

AUTHORS RESPONSE: This is well noted.

[126-141] Good paragraph

AUTHORS RESPONSE: Thank you for the encouraging remark.

[142-157] Thank you for adding, this is helpful context.

AUTHORS RESPONSE: Noted. Thank you.

[Table 1] Unfortunately, equations in author response (and from my initial comment) were turned into an asterisk by (I suspect) the program that created my pdf, so I cannot fully evaluate author's response.

AUTHORS RESPONSE: With our last Response to Reviewers letter, we also attached the authors’ responses separately in Word format. In that file, all the equations appear as intended.

[530-534] It's interesting thinking about this problem with such a high proportion of left-censored participants.

AUTHORS RESPONSE: We thank the reviewer for this motivating comment.

Reviewer #2: Review Comments to the Author

Thank you for addressing my comments. I especially like the software information that is now given at several places.

My only recommendation is to go again through the manuscript with particular attention to grammar mistakes/funny expressions. Especially in the new or changed parts of the manuscript, there are several small mistakes, probably due to some haste in which the manuscript was rewritten.

AUTHORS RESPONSE: Thank you. We have now revised and corrected the grammatical errors.

We thank both reviewers for their time and for the very useful comments they provided.

Sincerely,

Ndema Habib, PhD

Statistician, The Department of Sexual and Reproductive Health Research (SRH), WHO, Geneva.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 2

Mohammad Asghari Jafarabadi

3 Feb 2022

PONE-D-21-05351R2

Statistical methodologies for evaluation of the rate of persistence of Ebola virus in semen of male survivors in Sierra Leone

PLOS ONE

Dear Dr. Habib,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Mar 20 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Mohammad Asghari Jafarabadi

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

Reviewer #3: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

Reviewer #3: No

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

Reviewer #3: No

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: No

Reviewer #3: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: (No Response)

Reviewer #3: Comments:

As I understand the authors want to study the survival period of EBOV (Ebola virus) in the semen of patients discharged from the hospital. In other words, it can be stated that the probability of occurrence of event X (X is negative EBOV) with the available censored data.

1. The total sample size of the study is only 203 cases and out of 203, 88 cases experienced the event (negative test of EBOV) before the day of recruitment. Only 115 cases were included for follow up study. At the same time follow up period was comparatively long i.e. till the last case was tested negative or withdrawn from the study; about 451 - 540 days. After 270 days, only 6 cases were followed for such a long period of another 270 days. The follow-up period could be ended after 250 days. As the remaining sample size for the tail period of the curve was much smaller, may not be adequate to reflect the probability distribution.

2. Overall, the sample size is not sufficient to find the survival or time to event model in the given situation. As it plays an important role in testing the hypotheses regarding the suitability of methods. In such a situation, the alternative approach is to carry out the simulation studies with varying sample sizes and test the consistency of the outcomes or results.

3. The authors did not explore time-dependent (or non-proportional hazard) scenarios in this study as the survival time distribution for non-proportional time survival will be different.

4. Authors should check the work by Gruttola and Lagakos (1989) who proposed nonparametric and weakly structured parametric methods for analysing survival data in which both the time origin and the failure event can be right- or interval-censored. Full detail can be found at https://www.jstor.org/stable/pdf/2532030.pdf

5. Interval-censored data can be easily confused with grouped survival data. However, there is actually a fundamental difference between these two data structures although both usually appear in the form of intervals. The grouped survival data can be seen as a special case of interval-censored data and commonly mean that the intervals for any two subjects either are completely identical or have no overlapping. In contrast, the intervals for interval-censored data may overlap in any way. Because of this structural difference, statistical methods for grouped survival data are much more straightforward than those for interval-censored data. Did authors check the structure of their data and need to be clarified in the methods section?

6. The survival model to be used for such data are known and discussed below. It may help the author to prepare the data file for analysis.

7. 203 Ebola patients (who had been discharged from the hospital or received treatment and confirmed from the hospital record) were recruited in the study. It may not be correct if we take the start time as the day of recruitment in the study. The starting time or initial time of presence of EBOV as zero-day to be taken as the date of discharge from the hospital expecting on that day the EBOV virus is present in the semen. Accordingly, the day of recruitment is to be taken as the first observation for follow up of cases and the number of days to be counted from discharge day to the recruitment day as Ti (Time) for all the 203 patients. Ti should not be taken as zero, it may vary for each patient/recruited case depending on the date of discharge and day of recruitment ( in days and condition i.e. date of discharge < day of recruitment) or to be taken as a result of the first day of follow up.

8. Censored (left/right)data ( δ): In this scenario, we have both left and right-censored data at a different level

i) From the day of discharge to the day of recruitments for each patient, if the EBOV positive ( δ = 0) else left-censored ( δ = -1).

ii) During the follow up of all the remaining positive cases, for each case as when tested negative (δ = 1) i.e. right-censored and no follow up. But if positive, but lost to follow up or withdrawn at that scheduled date of lost to follow up test to be taken with time of event (T) censored value (δ = 0), and so on.

iii) After preparing the data file, the survival model can be used to calculate the survival probability of EBOV. There are several models, but the choice model will be parametric or non-parametric. But the sample size may not be adequate to use the parametric model. However, based on the characteristics of survival data one can use and find a suitable model. The detailed procedure is available in the book noted below for reference.

9. There are three or four methods/approaches suggested in this study for fitting the survival model which may be misleading or confusing the reader as the model should be applicable to any such data or possible generalization.

Book Name: Statistics for Biology and Health: Survival Analysis

Techniques for Censored and Truncated data

By John P. Klein & Melvin L. Moeschberger

Kindly read Chapter 3 also to understand the Censoring and Truncation especially page 56 to 66.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: Comments -EBOV paper.docx

PLoS One. 2022 Oct 5;17(10):e0274755. doi: 10.1371/journal.pone.0274755.r006

Author response to Decision Letter 2


27 Jul 2022

Title: Statistical methodologies for evaluation of the rate of persistence of Ebola virus in semen of male survivors in Sierra Leone

Comments:

As I understand the authors want to study the survival period of EBOV (Ebola virus) in the semen of patients discharged from the hospital. In other words, it can be stated that the probability of occurrence of event X (X is negative EBOV) with the available censored data.

1. The total sample size of the study is only 203 cases and out of 203, 88 cases experienced the event (negative test of EBOV) before the day of recruitment. Only 115 cases were included for follow up study. At the same time follow up period was comparatively long i.e. till the last case was tested negative or withdrawn from the study; about 451 - 540 days. After 270 days, only 6 cases were followed for such a long period of another 270 days. The follow-up period could be ended after 250 days. As the remaining sample size for the tail period of the curve was much smaller, may not be adequate to reflect the probability distribution.

AUTHORS RESPONSE: For the interval-censored survival estimation of EBOV persistence in semen, all 203 male participants were followed up until they tested negative for EBOV in semen at two consecutive visits. This implies that even the 88 participants who had already experienced the event contributed persistence data at enrolment and at one additional consecutive post-enrolment visit, with the right limit of the censoring interval set to the first of the two consecutive negative visits, i.e. the enrolment visit. The SLEVPS team was also interested in estimating EBOV persistence in other body fluids; therefore, all 203 participants continued to contribute persistence data for other body fluids even after becoming EBOV-free in semen. Understanding how long the virus persists in semen is of public health importance, which is why we used all available information on persistence, even for those in whom the virus persisted in semen the longest (WHO/VHF/2018.1). Furthermore, we have estimated percentiles of persistence, which are unaffected by the extreme observations.
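The construction of the censoring interval described in this response can be sketched as follows. This is a minimal illustration and not the study's analysis code: the function name, the visit representation, and the choice to right-censor at the last positive test are assumptions made for the example.

```python
import math
from typing import List, Tuple

# One visit: (days since ETU discharge, True if semen tested EBOV-positive).
Visit = Tuple[float, bool]


def censoring_interval(visits: List[Visit]) -> Tuple[float, float]:
    """Return (left, right] limits for time to confirmed EBOV clearance.

    Clearance is confirmed by two consecutive negative visits, and the right
    limit is the first of those two visits. Visits must be sorted by time.
    """
    last_positive = 0.0  # time origin: ETU discharge
    for i, (day, positive) in enumerate(visits):
        if positive:
            last_positive = day
        elif i + 1 < len(visits) and not visits[i + 1][1]:
            # Two consecutive negatives: the event lies in (last_positive, day].
            return (last_positive, day)
    # No confirmed clearance observed: right-censored (assumed at last positive test).
    return (last_positive, math.inf)


if __name__ == "__main__":
    # Hypothetical visit histories, for illustration only.
    print(censoring_interval([(60, False), (75, False)]))               # left-censored
    print(censoring_interval([(30, True), (45, True), (60, False), (74, False)]))
    print(censoring_interval([(30, True), (45, True)]))                 # right-censored
```

A participant who is already negative at enrolment gets an interval starting at 0, which is how a left-censored observation is represented within the interval-censored framework.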

2. Overall, the sample size is not sufficient to find the survival or time to event model in the given situation. As it plays an important role in testing the hypotheses regarding the suitability of methods. In such a situation, the alternative approach is to carry out the simulation studies with varying sample sizes and test the consistency of the outcomes or results.

AUTHORS RESPONSE: Thanks for your suggestion. The aim of the SLEVPS was to estimate the duration and rate at which EBOV persists in semen. All recruited subjects were followed up until the time of confirmed EBOV negativity in semen, or until the censoring time (premature discontinuation or loss to follow-up), counted from the time of discharge from the Ebola Treatment Unit (ETU). Given the nature of this outcome, we strongly believe that survival (time-to-event) methods are the most appropriate for this evaluation. This study was designed and implemented in the emergency setting of the EVD outbreak, with the sample size selected based on convenience. The SLEVPS researchers had no control over the number of participants to recruit, nor over how soon after recovery they were recruited (as described in the manuscript). In this paper we describe how the traditional right-censored methods and the interval-censored methods can be used, and how they perform, in the evaluation of EBOV persistence when massive left-censoring is present owing to the limitations in study design for Ebola persistence. We agree with the reviewer on the importance of conducting simulations; however, we believe that the data and results presented in the current manuscript still provide interesting insights into dealing with data of this nature.

3. The authors did not explore time-dependent (or non-proportional hazard) scenarios in this study as the survival time distribution for non-proportional time survival will be different.

AUTHORS RESPONSE: In the current paper we did not explore non-proportional hazards, since the paper was not aimed at comparing survival (or hazard rates) between exposure groups. The aim of this manuscript was solely to estimate persistence (survival) among male Ebola virus disease survivors using the right- and interval-censored methods, and to describe the resulting distributions.

4. Authors should check the work by Gruttola and Lagakos (1989) who proposed nonparametric and weakly structured parametric methods for analysing survival data in which both the time origin and the failure event can be right- or interval-censored. Full detail can be found at https://www.jstor.org/stable/pdf/2532030.pdf

AUTHORS RESPONSE: Thanks for your suggestion and for the reference. We have looked at it. In our paper we have illustrated how both nonparametric and parametric (Weibull) interval-censored models can be applied to interval- and right-censored Ebola persistence data.

5. Interval-censored data can be easily confused with grouped survival data. However, there is actually a fundamental difference between these two data structures although both usually appear in the form of intervals. The grouped survival data can be seen as a special case of interval-censored data and commonly mean that the intervals for any two subjects either are completely identical or have no overlapping. In contrast, the intervals for interval-censored data may overlap in any way. Because of this structural difference, statistical methods for grouped survival data are much more straightforward than those for interval-censored data. Did authors check the structure of their data and need to be clarified in the methods section?

AUTHORS RESPONSE: Our data are not grouped survival data. Rather, they were collected on individual male survivors, each with a known time origin (date of ETU discharge) and a known time interval containing the event of primary interest (confirmed EBOV clearance from semen), or a known censoring time. This is already clarified in the manuscript (lines 211-235).
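The reviewer's distinction can be checked mechanically on any set of censoring intervals. The sketch below is illustrative only: it applies the reviewer's definition (grouped data have intervals that are either identical or non-overlapping), and the function names and example intervals are hypothetical rather than taken from the study data.

```python
from itertools import combinations
from typing import List, Tuple

Interval = Tuple[float, float]  # (left, right] limits in days since ETU discharge


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals (left, right] share any time span."""
    return max(a[0], b[0]) < min(a[1], b[1])


def looks_grouped(intervals: List[Interval]) -> bool:
    """True if every pair of intervals is identical or non-overlapping."""
    return all(a == b or not overlaps(a, b) for a, b in combinations(intervals, 2))


if __name__ == "__main__":
    # Hypothetical examples, for illustration only.
    grouped = [(0, 30), (0, 30), (30, 60), (60, 90)]
    individual = [(0, 62), (45, 75), (58, 101), (120, float("inf"))]
    print(looks_grouped(grouped))     # fixed, shared assessment windows
    print(looks_grouped(individual))  # participant-specific visit times
```

Intervals built from participant-specific visit dates will generally overlap in arbitrary ways, which is the general interval-censored case rather than the grouped one.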

6. The survival model to be used for such data are known and discussed below. It may help the author to prepare the data file for analysis.

AUTHORS RESPONSE: Thanks for this comment. The data file was already prepared and was used to run the proposed models and estimates.

7. 203 Ebola patients (who had been discharged from the hospital or received treatment and confirmed from the hospital record) were recruited in the study. It may not be correct if we take the start time as the day of recruitment in the study. The starting time or initial time of presence of EBOV as zero-day to be taken as the date of discharge from the hospital expecting on that day the EBOV virus is present in the semen. Accordingly, the day of recruitment is to be taken as the first observation for follow up of cases and the number of days to be counted from discharge day to the recruitment day as Ti (Time) for all the 203 patients. Ti should not be taken as zero, it may vary for each patient/recruited case depending on the date of discharge and day of recruitment ( in days and condition i.e. date of discharge < day of recruitment) or to be taken as a result of the first day of follow up.

AUTHORS RESPONSE: Indeed, we have taken Time 0 as the date of ETU discharge, and not the date of recruitment. The recruitment visit is the first day of post-ETU discharge follow-up.
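As a small illustration of this time scale, the time of any visit can be computed as the number of days elapsed since ETU discharge. The helper name and the dates below are hypothetical and serve only to show the calculation.

```python
from datetime import date


def days_since_discharge(discharge: date, visit: date) -> int:
    """Time of a visit on the analysis scale, with Time 0 at ETU discharge."""
    if visit <= discharge:
        raise ValueError("a study visit must fall after the ETU discharge date")
    return (visit - discharge).days


if __name__ == "__main__":
    # Hypothetical dates, for illustration only.
    etu_discharge = date(2015, 3, 1)
    recruitment = date(2015, 5, 30)
    print(days_since_discharge(etu_discharge, recruitment))
```

With this convention, the time at recruitment differs between participants and is never zero, in line with the reviewer's point.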

8. Censored (left/right)data ( δ): In this scenario, we have both left and right-censored data at a different level

i) From the day of discharge to the day of recruitments for each patient, if the EBOV positive ( δ = 0) else left-censored ( δ = -1).

ii) During the follow up of all the remaining positive cases, for each case as when tested negative (δ = 1) i.e. right-censored and no follow up. But if positive, but lost to follow up or withdrawn at that scheduled date of lost to follow up test to be taken with time of event (T) censored value (δ = 0), and so on.

iii) After preparing the data file, the survival model can be used to calculate the survival probability of EBOV. There are several models, but the choice model will be parametric or non-parametric. But the sample size may not be adequate to use the parametric model. However, based on the characteristics of survival data one can use and find a suitable model. The detailed procedure is available in the book noted below for reference.

AUTHORS RESPONSE: This is noted, and it is along these lines that we have described the proposed models.

9. There are three or four methods/approaches suggested in this study for fitting the survival model which may be misleading or confusing the reader as the model should be applicable to any such data or possible generalization.

Book Name: Statistics for Biology and Health: Survival Analysis

Techniques for Censored and Truncated data

By John P. Klein & Melvin L. Moeschberger

Kindly read Chapter 3 also to understand the Censoring and Truncation especially page 56 to 66.

AUTHORS RESPONSE: This is noted. Thanks for your comments. We have already read this book and cited it in our paper (line 704, reference no. 13).

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 3

Mohammad Asghari Jafarabadi

4 Sep 2022

Statistical methodologies for evaluation of the rate of persistence of Ebola virus in semen of male survivors in Sierra Leone

PONE-D-21-05351R3

Dear Dr. Habib,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Mohammad Asghari Jafarabadi

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #3: All comments have been addressed

Reviewer #4: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #3: Yes

Reviewer #4: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #3: Yes

Reviewer #4: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #3: Yes

Reviewer #4: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #3: Yes

Reviewer #4: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #3: Thanks to all the authors for giving due importance to each comment and revising the manuscript accordingly.

Reviewer #4: In my opinion, the authors have correctly answered the questions and comments of the third reviewer.

Furthermore, the researchers have not treated the data merely as a practical example for the survival analysis methods under investigation; they have emphasized the health and medical aspects of the data in the title, abstract, and introduction of the article. It would therefore be better to report the rate of persistence of Ebola virus in semen of male survivors, as obtained from the better-performing models, in the results and conclusion sections of the abstract.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #3: No

Reviewer #4: No

**********

Acceptance letter

Mohammad Asghari Jafarabadi

11 Sep 2022

PONE-D-21-05351R3

Statistical methodologies for evaluation of the rate of persistence of Ebola virus in semen of male survivors in Sierra Leone

Dear Dr. Habib:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Professor Mohammad Asghari Jafarabadi

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. EBOV persistence: Comparison of interval-censored Weibull model to the KM model with 95% CI.

    “Republished from [Thorson AE, Deen GF, Bernstein KT, Liu WJ, Yamba F, Habib N, et al. (2021) Persistence of Ebola virus in semen among Ebola virus disease survivors in Sierra Leone: A cohort study of frequency, duration, and risk factors. PLoS Med 18(2): e1003273. https://doi.org/10.1371/journal.pmed.1003273] under a CC BY license, with permission from [PLOS Medicine], original copyright [2021]”.

    (TIF)

    Attachment

    Submitted filename: Habib et al notes.docx

    Attachment

    Submitted filename: Response to Reviewres_PLOS One_Review_24Aug2021.docx

    Attachment

    Submitted filename: Response to Reviewers.docx

    Attachment

    Submitted filename: Comments -EBOV paper.docx

    Attachment

    Submitted filename: Response to Reviewers.docx

    Data Availability Statement

    Contractual agreements exist between the study parties (WHO/HRP, US CDC, China CDC, and the MoH Sierra Leone), and data rights reside with MOH-SL, the material owner. Inquiries related to the data may be directed to the Study Data Oversight Committee (SIS@who.int) with the subject “Inquiry on Ebola Semen persistence study data”. All essential study documents, including analytic datasets, will be stored in the HRP e-Archive system. They are not available for public access. We do not own the individual-level data, so upon request, and when appropriate data-sharing agreements are in place, the data manager of the e-Archive system will be able to transfer the dataset to the recipient. If the de-identified study dataset has not yet been migrated to the e-Archive system, the statistician or study data manager will share the dataset upon receipt of the necessary data-sharing agreements.


    Articles from PLoS ONE are provided here courtesy of PLOS
