SurvivalPath:A R package for conducting personalized survival path mapping based on time-series survival data

Lujun Shen; Jinqing Mo; Changsheng Yang; Yiquan Jiang; Liangru Ke; Dan Hou; Jingdong Yan; Tao Zhang; Weijun Fan

doi:10.1371/journal.pcbi.1010830

. 2023 Jan 6;19(1):e1010830. doi: 10.1371/journal.pcbi.1010830

SurvivalPath:A R package for conducting personalized survival path mapping based on time-series survival data

Lujun Shen ^1,^2,^#, Jinqing Mo ^3,^#, Changsheng Yang ^4,^#, Yiquan Jiang ^1,^2,^#, Liangru Ke ^2,⁵, Dan Hou ⁶, Jingdong Yan ⁷, Tao Zhang ^7,^*, Weijun Fan ^1,^2,^*

Editor: Feng Fu⁸

¹Department of Minimally Invasive Interventional Therapy, Sun Yat-sen University Cancer Center, Guangzhou, People’s Republic of China

²State Key Laboratory of Oncology in South China, Collaborative Innovation Center of Cancer Medicine, Sun Yat-sen University, Guangzhou, People’s Republic of China

³Zhongshan School of Medicine, Sun Yat-sen University Cancer Center, Guangzhou, People’s Republic of China

⁴Department of Spine Surgery, Third Affiliated Hospital of Southern Medical University, Guangzhou, People’s Republic of China

⁵Department of Radiology, Sun Yat-sen University Cancer Center, Guangzhou, People’s Republic of China

⁶Deepaint Intelligence Technology Co., Ltd., Guangzhou, People’s Republic of China

⁷Department of Information, Nanfang Hospital, Southern Medical University, Guangzhou, People’s Republic of China

⁸Dartmouth College, UNITED STATES

The authors declare no competing interest.

^✉

* E-mail: nyy1920@163.com (TZ); fanwj@sysucc.org.cn (WF)

Contributed equally.

Roles

Lujun Shen: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

Jinqing Mo: Data curation, Formal analysis, Methodology, Validation, Writing – original draft

Changsheng Yang: Conceptualization, Formal analysis, Investigation, Methodology, Project administration, Validation, Visualization, Writing – original draft

Yiquan Jiang: Conceptualization, Data curation, Investigation, Project administration, Software, Writing – original draft

Liangru Ke: Data curation, Formal analysis, Methodology, Supervision, Validation, Writing – original draft

Dan Hou: Data curation, Formal analysis, Investigation, Project administration

Jingdong Yan: Data curation, Formal analysis, Investigation, Project administration, Supervision, Validation, Writing – original draft

Tao Zhang: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Supervision, Validation, Visualization, Writing – review & editing

Weijun Fan: Conceptualization, Data curation, Funding acquisition, Investigation, Project administration, Resources, Supervision, Validation, Writing – review & editing

Feng Fu: Editor

PMCID: PMC9851545 PMID: 36608157

Summary

The survival path mapping approach has been proposed for dynamic prognostication of cancer patients using time-series survival data. The SurvivalPath R package was developed to facilitate building personalized survival path models. The package contains functions to convert time-series data into time-slices data by fixed interval based on time information of input medical records. After the pre-processing of data, under a user-defined parameters on covariates, significance level, minimum bifurcation sample size and number of time slices for analysis, survival paths can be computed using the main function, which can be visualized as a tree diagram, with important parameters annotated. The package also includes function for analyzing the connections between exposure/treatment and node transitions, and function for screening patient subgroup with specific features, which can be used for further exploration analysis. In this study, we demonstrate the application of this package in a large dataset of patients with hepatocellular carcinoma, which is embedded in the package. The SurvivalPath R package is freely available from CRAN, with source code and documentation hosted at https://github.com/zhangt369/SurvivalPath.

Author summary

Patients with advanced stage malignancies always require repeated disease surveillance and treatment. However, in the era of big data, we still lack non-black-box methodology to utilize the time-series clinical data to actualize dynamic survival prediction and treatment planning. In our previous study, our team introduced a new analytical approach called “Survival Path” to interpret the time-series data of cancer patients. The paths constructed for specific malignancy could serve as a “map” for dynamic prognosis prediction and treatment planning. Here we introduced the SurvivalPath R package, which aims to promote the standardization of this methodology and allow users to build personalized survival path models using time-series survival data. This tool is expected to facilitate dynamic clinical decision for oncologists, as well as to promote the rationale change from “Guideline-based Medicine” to “Data-driven Medicine”.

This is a PLOS Computational Biology Software paper.

1. Introduction

The idea of developing the SurvivalPath R package stems from our previous exploratory work, in which we attempted to achieve dynamic prognosis prediction by establishing survival paths based on the time-series data of patients with hepatocellular carcinoma (HCC) [1]. In the last decade, enormous efforts have been made in establishing prognostic models using machine learning or deep learning techniques with large dataset to facilitate dynamic management of cancer patients [2][3]. However, most of these models are black-box, which limit their interpretability and clinical application [4][5]. The survival path approach we proposed provide a potential solution for dynamic prognosis prediction and management of cancer patients by constructing survival path maps using returned key prognostic factors after analysis of structured time-series survival data. More importantly, the survival path model could be easily understood and utilized by clinicians when compared to black-box models.

The SurvivalPath R package is a newly developed tool to facilitate fast building of survival path models, with an aim of promoting standardization of this methodology. To make the SurvivalPath R package adapted to datasets in whom the included variables have potential collinearity [6], we optimized the feature selection process. One-to one collinearity analysis was embedded to screen out noncollinear candidate variables before formal feature selection in the main function. This step simplifies the process of picking variables among different categorical variables and reduces the confounding impact of potential collinearity on feature selection in the Cox model. In addition, the SurvivalPath R package is compatible with continuous variable, enabling automatic binary classification of continuous variables and their entry into the model [7]. To facilitate fast building and efficient usage of survival path, this package also provide functions to enable: 1. pre-processing of the data into regular time slices; 2. fast selection of patient subgroups based on personalized design; 3. demonstration of the connections between exposure/treatment and node transitions.

2. Design and implementation

Ethics statement

This study was approved by Sun Yat-sen University Cancer Center’s (SYSUCC) Ethics Committee (no. B2022-220-01). The usage example presented was an anonymously retrospective analysis of routine data and therefore we requested and were granted a waiver of individual informed consent from the SYSUCC ethics committee.

Software design

1) Converting data of time-slices into data of time slices

The time series data of each patient firstly needs to be sorted out in rows with each row represent data at specific time point. The time axis needs to be divided into time slices with the proposed interval given by the researcher. For each patient, the zero point was set at first/earliest time point, which is usually the first row of the patient’s time-series data. The time slice with complete data of all included variables was defined as slice with complete data (Fig 1A). This process of data preprocessing could be finished using the built-in function “timedivision” in this package.

Fig 1 — (A) The time-series data of participants were converted into data of time slices with constant time interval. (B) Starting from the first time slice, data of the whole population would initially undergo the processing cycle (PC) for feature selection and subgroup subdivision, followed by PC in each subdivided subgroup data at the next time slice. The subdivision of parent node in the No. (n-1) time slice determine the population of the nodes in No.n time slice. The program continue until user defined last time slice.

2) Feature selection and survival path mapping

The key features for bifurcation for each node in the construction of the survival paths are selected by repeated cycles of processing flow. The processing flow in each node includes the following steps.

Step 1: Calling the data in the parent node. Initially, the data of all included cohort at no.1 time slice was called, which is denoted S_{(all; ts = 1)} (Fig 1B). Univariate analysis with Kaplan–Meier (KM) method was utilized to identify candidate variables (X₁, X₂, X_a, X_b,…, X_p). In constructing the survival path, to control the false discovery rate and create a balanced survival path, the researcher can set the significance level for feature selection in the argument of p.value in the main function survivalpath().

Step 2: Judgment. The minimum sample size required for the Cox proportional hazard regression model with multiple covariates was input with the parameter “Minsample” in the main function Survivalpath ().

Step 3: Feature selection. All the significant variables detected using the KM method will undergo collinearity analysis between pairs. The correlation coefficient between variables in pairs was calculated by the formula below, where X_a and X_b represent one of the variables, respectively.

r = \frac{\sum_{i = 1}^{n} {(x}_{a i} - \bar{x_{a}}) (x_{b i} - \bar{x_{b}})}{\sqrt{\sum_{i = 1}^{n} {{(x}_{a i} - \bar{x_{a}})}^{2}} \sqrt{\sum_{i = 1}^{n} {{(x}_{b i} - \bar{x_{b}})}^{2}}}

(1)

The researcher can set the correlation coefficient by himself, and the pair of variables that exceed the correlation coefficient will automatically compare their Akaike information criterion (AIC) values when each of two serve as the only predictor for outcome; the variable with the smaller AIC value will be removed. All the significant variables left were the “candidate variables” and then be put into Cox proportional hazard regression model, which assumes the hazard as follows,

h (t) = h_{0} (t) e x p (\sum_{j = 1}^{p} β_{j} x_{j})

(2)

where (X₁, X₂,…, X_p) is a vector of p predictor variables, and β₁, β₂,…, β_p are the corresponding regression coefficients, which are the weights given to each variable by the model. The original model included all the candidate variables and was presented as follows:

Y = β_{0} + β_{1} X_{1} + \dots + β_{r - 1} X_{r - 1} + ε

(3)

The backward elimination (BE) procedure was carried out, with the following p tests, H_0j: β_j = 0; j = 1; 2,…, p, the lowest partial F-test value F_l corresponding to H_0l: β_l = 0 is compared with the pre-set significance values F₀. If F_l < F₀, then F_l is deleted and the new original model is:

Y = β_{0} + β_{1} X_{1} + \dots β_{l - 1} X_{l - 1} + β_{l + 1} X_{l + 1} + β_{r - 1} X_{r - 1} + ε

(4)

Then, a stepwise BE procedure was continued, until all F_l > F₀, and the model is what we choose. The importance of each variable in the fixed Cox model can be obtained as follows:

Γ_{q} = - 2 l o g (L_{h} / L_{h - q})

(5)

where L_h refers to the likelihood of the fixed model and L_h−q refers to the likelihood of model after elimination of the variable X_q. The variable eliminated from the model with the maximal change of −2log(L) was selected.

Step 4: Cycles of processing. Based on the selected dichotomized variable X_q, the cohort S_{(all; ts = 1)} will be divided into two subgroups S_{(−q; ts = 1)} and S_{(+q; ts = 1)}. The two subgroups serve as parent node and the corresponding cohort at the next time slice (after removal of those lose follow-up or reach endpoint in last time slice) (Fig 1B). If no variable is selected in certain node, the cohort in that node stays the current classification and the data in next time slice will repeat step 1–3. The process flow will end until last time slice, which could be assigned with the argument “time_slices” in the main function.

Step 5: Graphic representation: A tree-based survival path can be constructed.

3) Dealing with continuous variable: Automatic dichotomization

A large number of studies have shown that the method based on time-dependent receiver operating characteristic (ROC) curves can accurately find the ideal cutoff value of continuous variables [8]. Therefore, we embed the function for automatic dichotomization of the continuous variables according to the best cutoff recommended by the Time-dependent ROC curve in the classifydata () function, thus the program is compatible with candidate features in the form of continuous variables.

4) Survival Path Mapping of Re-arranged data

In the past, researchers often used the traditional staging system for dynamic staging, that is, patients with similar disease states at different time points after the onset of disease are considered to have disease with similar severity [9]. Based on the concept that similar disease status at different time points across patients can sometimes be regarded as similar diseases and set as the same starting line, we designed the matchsubgroup () function, which can capture the time-series data of patient once the disease status happen from the time series data through matching values of key features.

Implementation

Before constructing a survival path model, time-series dataset in the form of continuous time slices is needed. The prepared time-series data should meet the following requirements:

The time series data of each participant is arranged in rows with each row represent data at each specific time point.
The dataset need to contain separate fields for participants’ ID, time points of the recorded data, survival time and survival outcome. The records in each row need to have complete data on these fields.
Dichotomies as candidate variables are recommended. When the input variables are grade variables or continuous variables, re-classification is feasible by setting argument “ifclassifydata = TRUE” in the function generatorDTSD(). Multiple categorical variables with more than three categories that are not logically correlated or in rank correlation are not recommended as the interpretation of result will be difficult.

To build a survival path model using SurvivalPath R package, four key functions are important in implementation. The first and the main function is survivalpath(), whose function signature is survivalpath (DTSD, time_slices, treatments = NULL, num_categories = 2, p.value = 0.05, minsample = 15, degreeofcorrelation = 0.7, rates = 365). The DTSD is a list object that can be generated using the built-in function generatorDTSD(), which includes list data of time, status, timeslicedata and tspatientid of each time slice. The time_slices argument specify the total number of time slices (starting from the front) need to be included in the survival path model. The default value of treatments is NULL, this argument can specify the intervention/exposure variable the researcher find interest, which could be utilized during analysis of efficacy of different arms in certain nodes. The last five arguments were used to specify the maximum number of branches that each node can divide, p.value for hypothesis testing, minimum sample size for branching, cutoff value of correlation coefficient for collinearity analysis and the time point set for estimating survival rates of each node, respectively.

The second function timedivision() is used for data pre-precessing, whose signature is timedivision (dataset, ID, time, period = 30, left_interval = 0.5, right_interval = 0.5).

The first argument dataset specify the dataframe of row arranged time-series survival data, which is usually un-preprocessed data prepared by the researcher; the data structure requirement is displayed in Fig 2. The ID and time arguments correspond to the specified variables that represent the unique identifier of each participant in each row and the date to which each data point belongs, respectively. The last three arguments were parameters used to specify the length of time slices and the time interval for inclusion of time point data into each time slice.

Fig 2 — The data structure needed to construct survival path model. The columns includes time points, observational variables, treatment variables (optional) and outcome variables. Each row represent observation data at specific time point, and the date is sorted in ascending order for each participant/patient.

The third function generatorDTSD() is used to generate DTSD class objects using a preprocessed dataframe arranged in time slices. The function signature of generatorDTSD() is

generatorDTSD(dataset, periodindex, IDindex, timeindex, statusindex, variable, ifclassifydata = TRUE, predict.time = 365, isfill = TRUE)

The first argument dataset refers to the dataframe of time-series observations, containing identification numbers of each subject, index of time slice, value of risk factors, survival time, and survival outcomes; the dataset usually could be returned by timedivision() function. The arguments periodindex, IDindex, timeindex, statusindex specify the time slice indicator, ID indicator, time and status indicators in the dataset, respectively. The argument variable is a list object to specify the risk factors required for modeling. Arguments ifclassifydata, predict. time are optional and useful in automatic dichotomizing risk factors. The argument isfill is a logical value, which is used to confirm whether to fill in missing data. If it is True, then fill with the average level.

The last key function matchsubgroup () is used for screening and extracting data of subjects that meet the given conditions, whose function signature is

matchsubgroup(DTSD, varname, varvalue)

The first argument DTSD specifies a list object that can be generated using generatorDTSD(). The argument varname specifies a of list variables used to screen subjects, and the variables needs to be included in the DTSD object. The varvalue is defined as a list object, and subjects whose varname variables are equal to the values of varvalue will be selected. This function could be utilized for generating personalized survival path model for participant with certain features.

4. Results

In this section, we demonstrate the usage of SurvivalPath package through analyzing a dataset of patients with hepatocellular carcinoma (HCC).

4.1 Pre-processing of time-series data

The pre-processing of the time-series data could be accomplished manually or using the built-in functions. Regard the built-in function approach, first, by setting the parameters on periods of time slices, left interval and right interval for data inclusion, we can convert the original dataset of “time points” into dataset of “time slices”.

R> data("DTSDHCC") #dataset of time points

R> dataset = timedivision(DTSDHCC,"ID","Date",period = 90,left_interval = 0.5,right_interval = 0.5)

The new dataset returned in the form of time slices with one columns added: “time_slice”. Usually, the number of rows in the converted dataset will be less than the original dataset after screening out records of ineligible time points. After finishing converting the dataset, we can check the amount of data at each time slice.

R> table(dataset$time_slice)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

2360 1586 976 627 421 284 181 119 65 24 10 2 2 2 1

Here we found the sample size in each time slice gradually decreased. To ensure the sample size of analysis, the survival paths we constructed will focus on No.1 to No.8 time slices. Then, we extract the data of key variables of each time slice and compile them into an object of class Dynamic Time Series Data (DTSD), which is designed to run survival path mapping using generatorDTSD() function. The key variables includes survival outcomes, the variables of interest, the ID of corresponding participant of each time slice.

R> DTSDdata <-generatorDTSD(dataset, periodindex = "time_slice", IDindex = "ID", timeindex = "OStime_day", statusindex = "Status_of_death", variable = c ("Age", "Amount.of.Hepatic.Lesions", "Largest.Diameter.of.Hepatic.Lesions",

"New.Lesion","Vascular.Invasion", "Local.Lymph.Node.Metastasis", "Distant.Metastasis", "Child_pugh_score", "AFP"), ifclassifydata = TRUE, predict.time = 365*1)

As several variables included in building survival paths were continuous variable, here we use a build-in argument “ifclassifydata = TRUE” to convert these variables into dichotomies, which facilitates automatic identification of cutoff through time-dependent ROC curve estimation. To run this function, survivalROC package is required.

The function returns a DTSD class object for conducting survivalpath() function. The object includes 4 list objects (time, status, tsdata, tsid), which correspond to the associated data of each time slice. The object have two parameters (length and ts_size) in describing the number of time slices and sample size of each time slice. The list “cutoff” stores the calculated cutoffs of continuous variables that have been dichotomized.

4.2 Construct a survival path model using main function

After finishing the preparation of data, we can input the data into the main function, namely the survivalpath(). The investigator need to set the number of time slices when constructing the survival paths. The investigator could set p.value for feature selection process, the default value is 0.05. The investigator could also set the value of correlation coefficient to screen out noncollinear candidate variables, whose default value is 0.7. As we had already prepared this variable, we included it in our demonstration case. To view 1year OS rates on the graph, we set the rates of 365.

R> result <- survivalpath(DTSDdata,time_slices = 8, p.value = 0.01, degreeofcorrelation = 0.5)

The main function will return a dataset that indicate the steps each participant take at each time slice, a time-slices dataset with node annotation and a treedata that could be used for visualization the survival paths.

R> View(result$data) #dataset that indicate the steps each participant take

R> View(result$df) #dataset with node annotation

Using the function ggtree (), we can draw a two-dimentional graph with the information of node name, sample size, median survival time, and survival rates as selected previously annotated.

R>library(ggplot2)

R>library(ggtree)

R> mytree <- result$tree

R> ggtree(mytree, color = "black",linetype = 1,size = 1.2,ladderize = T,) +

theme_tree2() +

geom_text2(aes(label = label),hjust = 0.6, vjust = -0.6, size = 3.0)+

geom_text2(aes(label = paste(node,size,mytree@data$survival,mytree@data$survivalrate,sep = "/")), hjust = 0.6, vjust = -1.85, size = 3.0)+

geom_point2(aes(shape = isTip, color = isTip), size = mytree@data$size%/%200+1,show.legend = F)+

labs(size = "Nitrogen",

x = "TimeSlices",
y = "Survival",
subtitle = "node_name/sample number/Median survival time/Survival rate",
title = "Survival Tree") +
theme(legend.title = element_blank(),legend.position = c(0.1,0.9))

Here we get the graph of survival path mapping, which includes all the significant bifurcations identified (Fig 3).

4.3 Survival Analysis and correlation analysis based on survival path

The built-in functions plotKM(), compareTreatmentPlans() and EvolutionAfterTreatment() are utilized for data analysis in survival paths. Once we finish the construction of survival paths model, we can generate survival curves comparing survival of patients in any of the nodes. Here we compare the survival curves of no. 24 and no. 36 nodes as a demonstration (Fig 4A).

Fig 4 — Demonstration of survival curves in comparing survival between different nodes (a) and subgroups with different treatment arms at specific nodes (b).

R> treepoints = c(24,36)

R> plotKM(result$data, treepoints,mytree,risk.table = T)

We can also draw survival curves for any of the nodes to investigate the correlation between treatment/exposure and survival (Fig 4B).

R> treepoints = c(24)

R> compareTreatmentPlans(result$data, treepoints,mytree,dataset,"Resection")

We can investigate the correlation between treatment/exposure and node transition based on the survival paths map generated.

R> treepoint = 24

R> A = EvolutionAfterTreatment(result$data, treepoint, mytree, dataset,"Embolization")

R> mytable <- xtabs(~ `Embolization`+treepoint, data = A)

R> prop.table (mytable,1)

The returned results is displayed as bellowed:

treepoint

Embolization 25 31 Missing follow-up

0 0.4654832 0.1104536 0.4240631

1 0.6138996 0.1441441 0.2419562

The results showed that in node no.24, 46.5% patients transit to node no.25, 11.0% transit to node no.31 and 42.4% patients lost follow-up in the next time slice without embolization treatment; while 61.4% patients transit to node no.25, 14.4% transit to node no.31 and 24.4% patients lost follow-up in the next time slice after embolization treatment.

4.4 Personalized survival path mapping through features matching

For selected patients with specific disease conditions, we can extract time slices data of participants with the onset of similar conditions across time slices and construct a personalized survival path.

R>varname = list(’Amount.of.Hepatic.Lesions’,"Largest.Diameter.of.Hepatic.Lesions")

R>varvalue = list(1,1)

R>df <- matchsubgroup(DTSDdata,varname = varname, varvalue = varvalue)

R>result<-survivalpath(df, time_slices = 4)

4.5 Evaluate the survival path model with c-index

The evaluate () function was built to evaluate the discriminative ability of the survival path in each specified time slice using Harrell’s concordance index C-index based on the survival path model.

R> evaluate(survivalpath = result, minnodesize = 15)

The returned c-index for each time slice is displayed as bellowed:

$Cindex[[1]] 0.6008216

$Cindex[[2]] 0.6425235

$Cindex[[3]] 0.6717504

$Cindex[[4]] 0.6726773

$Cindex[[5]] 0.6725269

$Cindex[[6]] 0.6229113

4.6 Export data with node annotation

The time slice data with annotation of node information is stored in the results data after computation by the main function and could be exported for further usage and analysis.

R> write.csv(result$df, file = "output.csv", row.names = FALSE)

5. Availability and future directions

In this study, we have introduced the SurvivalPath R package. The package facilitates fast generation of personalized designed survival path models, which could be used for dynamic prognosis prediction. Survival paths can be computed using the main function survivalpath(), under a user-defined parameters on covariates, significance level, minimum bifurcation sample size and number of time slices for analysis. The tree-structure result can be visualized as a tree diagram using ggtree R package, with important parameters annotated. We also designed built-in functions plotKM(), compareTreatmentPlans() and EvolutionAfterTreatment() to facilitate comparison of survival between nodes, evaluate exposure/treatment and survival, and explore the connection between exposure/treatment and node transitions, respectively. Moreover, function for picking up patient subgroup with specific features within the survival path model was created, which can be used to generate refined survival paths for further exploration analysis.

In future work, we will keep on developing the survivalpath() function that will facilitate node bifurcation through risk assessment that integrated multiple prognostic factors. Besides, we will keep on optimizing the visualization of survival paths through drawing 3-D graph. Lastly, we will set up cloud database for users to upload time-series data of cancer patients, which aims to be open to all users.

Acknowledgments

We would like to thank Ms. Juan Nie for providing continuous encouragement to Dr. Lujun Shen in the past three years. We would like to thank Mr. Xueting Shen and Mrs. Meizhen Zhu for giving their best love to their son, Lujun Shen.

Data Availability

The SurvivalPath R package is freely available from github, https://github.com/zhangt369/SurvivalPath. The time-series dataset of patients with hepatocellular carcinoma for demonstration is embedded in the package.

Funding Statement

This work was funded by Talent Cultivation Project (sponsored by Guangdong Medical Equipment Society Medical Informatization Yangcheng Forum and Xinhua Three Technology Co., Ltd., project number GDYZ2021A05 to TZ); and Sun Yat-sen University Youth Development Project (2019): Development of R package of survival path mapping and its implementation in personalized treatment of HCC (No. 19ykpy200 to LS). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1.Shen L, Zeng Q, Guo P, Huang J, Li C, Pan T, et al. Dynamically prognosticating patients with hepatocellular carcinoma through survival paths mapping based on time-series data. Nat Commun. 2018;9(1):2230. Epub 2018/06/10. doi: 10.1038/s41467-018-04633-7 ; PubMed Central PMCID: PMC5993743. [DOI] [PMC free article] [PubMed] [Google Scholar]
2.Dlamini Z, Francies FZ, Hull R, Marima R. Artificial intelligence (AI) and big data in cancer and precision oncology. Computational and structural biotechnology journal. 2020;18:2300–11. Epub 2020/10/01. doi: 10.1016/j.csbj.2020.08.019 ; PubMed Central PMCID: PMC7490765. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Pham T, Tran T, Phung D, Venkatesh S. Predicting healthcare trajectories from medical records: A deep learning approach. J Biomed Inform. 2017;69:218–29. Epub 2017/04/16. doi: 10.1016/j.jbi.2017.04.001 . [DOI] [PubMed] [Google Scholar]
4.Xiao C, Choi E, Sun J. Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review. J Am Med Inform Assoc. 2018;25(10):1419–28. Epub 2018/06/13. doi: 10.1093/jamia/ocy068 ; PubMed Central PMCID: PMC6188527. [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Kurtz DM, Esfahani MS, Scherer F, Soo J, Jin MC, Liu CL, et al. Dynamic Risk Profiling Using Serial Tumor Biomarkers for Personalized Outcome Prediction. Cell. 2019;178(3):699–713 e19. Epub 2019/07/10. doi: 10.1016/j.cell.2019.06.011 ; PubMed Central PMCID: PMC7380118. [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Xue X, Kim MY, Shore RE. Cox regression analysis in presence of collinearity: an application to assessment of health risks associated with occupational radiation exposure. Lifetime Data Anal. 2007;13(3):333–50. Epub 2007/07/31. doi: 10.1007/s10985-007-9045-1 . [DOI] [PubMed] [Google Scholar]
7.Cleo G, Scott AM, Islam F, Julien B, Beller E. Usability and acceptability of four systematic review automation software packages: a mixed method design. Syst Rev. 2019;8(1):145. Epub 2019/06/22. doi: 10.1186/s13643-019-1069-6 ; PubMed Central PMCID: PMC6587262. [DOI] [PMC free article] [PubMed] [Google Scholar]
8.Heagerty PJ, Zheng Y. Survival model predictive accuracy and ROC curves. Biometrics. 2005;61(1):92–105. Epub 2005/03/02. doi: 10.1111/j.0006-341X.2005.030814.x . [DOI] [PubMed] [Google Scholar]
9.Vogel A, Cervantes A, Chau I, Daniele B, Llovet JM, Meyer T, et al. Hepatocellular carcinoma: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up. Annals of oncology: official journal of the European Society for Medical Oncology / ESMO. 2018;29(Suppl 4):iv238–iv55. Epub 2018/10/05. doi: 10.1093/annonc/mdy308 . [DOI] [PubMed] [Google Scholar]

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010830.r001

Decision Letter 0

Feilim Mac Gabhann, Feng Fu

26 Oct 2022

Dear Prof Fan,

Thank you very much for submitting your manuscript "SurvivalPath:A R package for conducting personalized survival path mapping based on time-series survival data" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Feng Fu

Academic Editor

PLOS Computational Biology

Feilim Mac Gabhann

Editor-in-Chief

PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The auther develop a SurvivalPath R package to facilitate building personalized survival path models. The package contains functions to convert time-series data by into time-slices data by fixed interval based on time information of input medical records. And I have noticed that this R package is based the work which has published. However, I still have some questions:

1: How does the SurvivalPath R package deal with the missing data?

2. In Figure 2, I want to know that whether SurvivalPath R package is suitable for time-series data with a three-class outcome(eg: alive, progress, death) and one continuous variable(eg: ALBI score)

3: In Figure 3, graph of survival path is too complex for doctor to use. If it is possible, the number of survival path should be less than 5.

Reviewer #2: The authors submit an interesting manuscript (PCOMPBIOL-D-22-01103) about conducting personalized survival path mapping based on time-series survival data for possible publication in PLOS Computational Biology.

The authors introduced the SurvivalPath R package for dynamic prognostication of cancer patients using time-series survival data. I have installed the SurvivalPath R package, and verified that it works to facilitate building personalized survival path models. Congratulations to them!

I believe that readers could benefit from this package which contains functions to convert time-series data by into time-slices data by fixed interval based on time information of input medical records. More importantly, under a user-defined parameters on covariates, significance level, minimum bifurcation sample size and number of time slices for analysis, survival paths can be computed using the main function, which can be visualized as a tree diagram, with important parameters annotated. Besides, the package can also be used for further exploration analysis, which I think is an important application of this methodology.

One concern is how this package deal with the situations that specific variable with multiple important cutoffs. I notice that the optimal cutoff of continuous variable could be automatically computed, while for different situations and in different time slices, the optimal cutoff could be different.

Generally, the manuscript is well organized and easy to follow. I think it can be accepted after language polishing.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

References:

Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.

If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

PLoS Comput Biol. 2023 Jan 6;19(1):e1010830. doi: 10.1371/journal.pcbi.1010830.r002

Author response to Decision Letter 0

5 Nov 2022

Attachment

Submitted filename: Response to reviewers (Rebuttal letter).docx

Click here for additional data file.^{(18.6KB, docx)}

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010830.r003

Decision Letter 1

Feilim Mac Gabhann, Feng Fu

21 Dec 2022

Dear Prof Fan,

We are pleased to inform you that your manuscript 'SurvivalPath: A R package for conducting personalized survival path mapping based on time-series survival data' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,

Feng Fu

Academic Editor

PLOS Computational Biology

Feilim Mac Gabhann

Editor-in-Chief

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: I believe this revised manuscript is suitable to accept.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

Reviewer #1: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010830.r004

Acceptance letter

Feilim Mac Gabhann, Feng Fu

29 Dec 2022

PCOMPBIOL-D-22-01103R1

SurvivalPath: A R package for conducting personalized survival path mapping based on time-series survival data

Dear Dr Fan,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Zsofia Freund

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Attachment

Submitted filename: Response to reviewers (Rebuttal letter).docx

Click here for additional data file.^{(18.6KB, docx)}

Data Availability Statement

[pcbi.1010830.ref001] 1.Shen L, Zeng Q, Guo P, Huang J, Li C, Pan T, et al. Dynamically prognosticating patients with hepatocellular carcinoma through survival paths mapping based on time-series data. Nat Commun. 2018;9(1):2230. Epub 2018/06/10. doi: 10.1038/s41467-018-04633-7 ; PubMed Central PMCID: PMC5993743. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1010830.ref002] 2.Dlamini Z, Francies FZ, Hull R, Marima R. Artificial intelligence (AI) and big data in cancer and precision oncology. Computational and structural biotechnology journal. 2020;18:2300–11. Epub 2020/10/01. doi: 10.1016/j.csbj.2020.08.019 ; PubMed Central PMCID: PMC7490765. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1010830.ref003] 3.Pham T, Tran T, Phung D, Venkatesh S. Predicting healthcare trajectories from medical records: A deep learning approach. J Biomed Inform. 2017;69:218–29. Epub 2017/04/16. doi: 10.1016/j.jbi.2017.04.001 . [DOI] [PubMed] [Google Scholar]

[pcbi.1010830.ref004] 4.Xiao C, Choi E, Sun J. Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review. J Am Med Inform Assoc. 2018;25(10):1419–28. Epub 2018/06/13. doi: 10.1093/jamia/ocy068 ; PubMed Central PMCID: PMC6188527. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1010830.ref005] 5.Kurtz DM, Esfahani MS, Scherer F, Soo J, Jin MC, Liu CL, et al. Dynamic Risk Profiling Using Serial Tumor Biomarkers for Personalized Outcome Prediction. Cell. 2019;178(3):699–713 e19. Epub 2019/07/10. doi: 10.1016/j.cell.2019.06.011 ; PubMed Central PMCID: PMC7380118. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1010830.ref006] 6.Xue X, Kim MY, Shore RE. Cox regression analysis in presence of collinearity: an application to assessment of health risks associated with occupational radiation exposure. Lifetime Data Anal. 2007;13(3):333–50. Epub 2007/07/31. doi: 10.1007/s10985-007-9045-1 . [DOI] [PubMed] [Google Scholar]

[pcbi.1010830.ref007] 7.Cleo G, Scott AM, Islam F, Julien B, Beller E. Usability and acceptability of four systematic review automation software packages: a mixed method design. Syst Rev. 2019;8(1):145. Epub 2019/06/22. doi: 10.1186/s13643-019-1069-6 ; PubMed Central PMCID: PMC6587262. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1010830.ref008] 8.Heagerty PJ, Zheng Y. Survival model predictive accuracy and ROC curves. Biometrics. 2005;61(1):92–105. Epub 2005/03/02. doi: 10.1111/j.0006-341X.2005.030814.x . [DOI] [PubMed] [Google Scholar]

[pcbi.1010830.ref009] 9.Vogel A, Cervantes A, Chau I, Daniele B, Llovet JM, Meyer T, et al. Hepatocellular carcinoma: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up. Annals of oncology: official journal of the European Society for Medical Oncology / ESMO. 2018;29(Suppl 4):iv238–iv55. Epub 2018/10/05. doi: 10.1093/annonc/mdy308 . [DOI] [PubMed] [Google Scholar]

PERMALINK

SurvivalPath:A R package for conducting personalized survival path mapping based on time-series survival data

Lujun Shen

Jinqing Mo

Changsheng Yang

Yiquan Jiang

Liangru Ke

Dan Hou

Jingdong Yan

Tao Zhang

Weijun Fan

Roles

Summary

Author summary

1. Introduction

2. Design and implementation

Ethics statement

Software design

1) Converting data of time-slices into data of time slices

Fig 1. Flowchart of analysis for survival path mapping.

2) Feature selection and survival path mapping

3) Dealing with continuous variable: Automatic dichotomization

4) Survival Path Mapping of Re-arranged data

Implementation

Fig 2.

4. Results

4.1 Pre-processing of time-series data

4.2 Construct a survival path model using main function

Fig 3. Graph of survival path computed as demonstration.

4.3 Survival Analysis and correlation analysis based on survival path

Fig 4.

4.4 Personalized survival path mapping through features matching

4.5 Evaluate the survival path model with c-index

4.6 Export data with node annotation

5. Availability and future directions

Acknowledgments

Data Availability

Funding Statement

References

Decision Letter 0

Feilim Mac Gabhann

Feng Fu

Roles

Author response to Decision Letter 0

Decision Letter 1

Feilim Mac Gabhann

Feng Fu

Roles

Acceptance letter

Feilim Mac Gabhann

Feng Fu

Roles

Associated Data

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases