Author manuscript; available in PMC: 2014 May 7.
Published in final edited form as: J Biom Biostat. 2013 Oct 25;Suppl 1(e001):19522. doi: 10.4172/2155-6180.S1-e001

Markov chains and semi-Markov models in time-to-event analysis

Erin L Abner 1,4,*, Richard J Charnigo 2,3, Richard J Kryscio 2,3,4
PMCID: PMC4013002  NIHMSID: NIHMS569547  PMID: 24818062

Abstract

A variety of statistical methods are available to investigators for analysis of time-to-event data, often referred to as survival analysis. Kaplan-Meier estimation and Cox proportional hazards regression are commonly employed tools but are not appropriate for all studies, particularly in the presence of competing risks and when multiple or recurrent outcomes are of interest. Markov chain models can accommodate censored data, competing risks (informative censoring), multiple outcomes, recurrent outcomes, frailty, and non-constant survival probabilities. Markov chain models, though often overlooked by investigators in time-to-event analysis, have long been used in clinical studies and have widespread application in other fields.

Background

The analysis of time-to-event data in human and animal studies presents several statistical challenges. In addition to the familiar problem of censored observations, there may be multiple types of failure under consideration (the “competing risk problem” [1]); clinically relevant outcomes other than failure may be observed during follow-up [2], including those that alter the risk of failure or can occur more than once [3,4]; and individual susceptibility to failure (i.e., frailty) may not be constant over time [5]. While traditional time-to-event analysis methods like Kaplan-Meier product-limit estimation and Cox proportional hazards regression are implemented easily and use censored data efficiently when the assumption of uninformative censoring holds, analyses involving informative censoring, multiple outcomes, or non-constant survival probabilities may be well suited for application of Markov processes [6]. A contemporary approach to the informative censoring problem in Cox regression involves a multivariate survival analysis [7].

Markov Processes

A Markov process is a stochastic process that describes the movement of an individual through a finite number of defined states, one (and only one) of which must contain the individual at any particular time. Possible movements among states may be depicted with a transition matrix or state diagram [2,3,6]. In order for the process to terminate, at least one of the states must be absorbing, i.e., individuals have zero probability of leaving the state once it has been entered. Death, for example, is an absorbing state used commonly in clinical studies, but it is also a well-known competing risk for clinical outcomes in studies of older persons [2,4]. Markov processes may be continuous or discrete as well as time-homogeneous or time-nonhomogeneous. The focus of this editorial will be discrete, time-homogeneous Markov processes called Markov chains.
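As a minimal sketch of these definitions, the following Python fragment encodes a transition matrix for a hypothetical three-state illness-death model and identifies its absorbing state. The state names and probabilities are illustrative assumptions, not values taken from any study cited here.

```python
# Hypothetical three-state illness-death model. The probabilities are
# illustrative assumptions, not estimates from any cited study.
STATES = ["healthy", "ill", "dead"]

# P[i][j] is the probability of moving from state i to state j in one cycle.
P = [
    [0.85, 0.10, 0.05],  # from healthy
    [0.00, 0.80, 0.20],  # from ill (no recovery in this toy example)
    [0.00, 0.00, 1.00],  # from dead: absorbing
]


def is_valid_transition_matrix(matrix, tol=1e-9):
    """Check that the matrix is square with non-negative rows summing to 1."""
    n = len(matrix)
    return all(
        len(row) == n
        and all(p >= 0 for p in row)
        and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )


def absorbing_states(matrix, names, tol=1e-9):
    """A state is absorbing when it returns to itself with probability 1."""
    return [names[i] for i, row in enumerate(matrix) if abs(row[i] - 1.0) <= tol]


print(is_valid_transition_matrix(P))
print(absorbing_states(P, STATES))
```

Because every row sums to one, the individual occupies exactly one state at each cycle, and the single row with a self-transition probability of one marks the state that cannot be left.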

Markov Chains

Markov chain models allow analysts to calculate the probability and rate (or intensity) of movement associated with each transition between states within a single observation cycle as well as the approximate number of cycles spent in a particular state. When observations are made at regular intervals, the number of cycles can be interpreted as time in a state. Time spent in all states prior to absorption can be summed to estimate the total survival time. Use of Markov chains requires two fundamental assumptions: (i) transition probabilities are constant over time (time homogeneity); and (ii) the probability of the next transition depends only on the current state (the first-order Markov property). These models are attractive for time-to-event analysis. They accommodate the simultaneous analysis of multiple events of interest and inclusion of competing risks through the states defined in the model, as well as consideration of individual frailty through subject-specific random effects [8,9].
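One standard way to obtain the expected number of cycles spent in each state before absorption is the fundamental matrix N = (I - Q)^-1, where Q is the block of the transition matrix restricted to the non-absorbing (transient) states. The sketch below applies it to a hypothetical three-state matrix; the probabilities and helper names are assumptions made for illustration, and the small Gauss-Jordan routine is included only to keep the example free of external libraries.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def expected_cycles(P, transient):
    """Return N = (I - Q)^-1, where Q holds transitions among transient states.

    N[a][b] is the expected number of cycles spent in transient state b
    (counting the starting cycle) for an individual who starts in state a.
    """
    Q = [[P[i][j] for j in transient] for i in transient]
    k = len(Q)
    i_minus_q = [
        [(1.0 if a == b else 0.0) - Q[a][b] for b in range(k)] for a in range(k)
    ]
    return invert(i_minus_q)


# Hypothetical illness-death matrix (states: healthy, ill, dead).
P = [
    [0.85, 0.10, 0.05],
    [0.00, 0.80, 0.20],
    [0.00, 0.00, 1.00],
]

N = expected_cycles(P, transient=[0, 1])
cycles_healthy, cycles_ill = N[0]              # starting from healthy
total_survival = cycles_healthy + cycles_ill   # expected cycles before death
print(round(cycles_healthy, 3), round(cycles_ill, 3), round(total_survival, 3))
```

Summing a row of N gives the expected total number of cycles before absorption from that starting state, which can be read as survival time when observations are made at regular intervals.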

Both right- and left-censored data can be accommodated by Markov chains. In a Markov chain model, for example, an individual who never reaches an absorbing state (right-censored), whether because the study observation is ongoing or the subject has withdrawn or been lost to follow-up, can contribute information to the model regarding the transitions he or she did make, which is an advantage over traditional survival analysis methodology [6]. Because individuals are not required to enter the transition matrix in any particular state, left-censored data are also accommodated. Interval censoring is not formally accommodated in Markov chains, which assume that transitions take place only once per observation cycle, either at the beginning or the end. In reality, transitions may take place at any time, and multiple unobserved transitions may take place between cycle assessments. Approaches such as the half-cycle correction, where transitions are assumed to occur in the middle of the observation cycle [3], have been proposed to mitigate bias resulting from assuming that transitions take place only at the cycle’s beginning or end. If the clinical model and data structure support the assumption that all transitions are unidirectional (i.e., no reverse transitions are possible), a semi-Markov model, a generalization of the Markov chain in which the time spent in the current state depends on both the prior and future adjoining states [10], could be considered for interval-censored data [4,10].
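To illustrate how censored individuals still contribute information, the sketch below estimates transition probabilities as row proportions of observed transition counts, the usual maximum likelihood estimator for a time-homogeneous chain observed once per cycle. The subject histories, state names, and function name are hypothetical and serve only as an example.

```python
from collections import Counter

STATES = ["healthy", "ill", "dead"]

# Hypothetical state histories, one observation per cycle.
subjects = [
    ["healthy", "healthy", "ill", "dead"],  # followed until absorption
    ["healthy", "healthy", "healthy"],      # right-censored while healthy
    ["ill", "ill", "dead"],                 # entered the study already ill
    ["healthy", "ill", "ill"],              # right-censored while ill
]


def estimate_transition_matrix(sequences, states, absorbing=("dead",)):
    """Estimate P[i][j] as (count of i -> j) / (count of all transitions out of i).

    Every observed transition is counted, so censored histories contribute
    the transitions they did make. Absorbing states get a fixed identity row.
    Rows for states with no observed exits are left as None.
    """
    counts = Counter()
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[(current, nxt)] += 1

    matrix = []
    for a in states:
        if a in absorbing:
            matrix.append([1.0 if b == a else 0.0 for b in states])
            continue
        row_total = sum(counts[(a, b)] for b in states)
        if row_total == 0:
            matrix.append(None)  # no data for this row
        else:
            matrix.append([counts[(a, b)] / row_total for b in states])
    return counts, matrix


counts, P_hat = estimate_transition_matrix(subjects, STATES)
for name, row in zip(STATES, P_hat):
    print(name, row)
```

In this toy data set the two censored subjects supply four of the nine observed transitions, which would be discarded by a method that used only the time to the terminal event.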

Finally, unlike traditional time-to-event analysis where only one outcome is possible for each individual, Markov chains allow analysts to calculate survival times in multiple states. This is particularly attractive for studies of chronic diseases with well-defined phases, like cancer [11] and autoimmune diseases [12], where remission and recurrence are of interest in addition to overall survival, and dementia due to neurodegenerative disease, where pre-clinical and mildly symptomatic disease states are increasingly of interest to researchers working to identify treatments and prevention strategies [2,4]. As with traditional time-to-event analysis, survival curves may be estimated from model results [13]. Mean survival times may be inferred using matrix solution, Markov cohort simulation, or Markov Chain Monte Carlo simulation [3]. These calculations are more cumbersome, but still possible, when transition probability estimates are derived from covariate-adjusted regression models [14]. By contrast, semi-Markov models estimate mean survival times directly without the need for additional calculations [4].
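A Markov cohort simulation can be sketched in a few lines: a cohort distribution is advanced one cycle at a time, the proportion still alive at each cycle traces a survival curve, and summing that curve gives mean survival in cycles. The transition matrix, horizon, and function names below are illustrative assumptions; the half-cycle correction is implemented here as a trapezoidal sum, which is one common reading of that adjustment.

```python
def step(dist, P):
    """Advance the cohort distribution by one cycle: new_j = sum_i dist_i * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def cohort_simulation(P, start, alive_states, n_cycles):
    """Return the proportion of the cohort alive at cycles 0, 1, ..., n_cycles."""
    dist = list(start)
    curve = [sum(dist[s] for s in alive_states)]
    for _ in range(n_cycles):
        dist = step(dist, P)
        curve.append(sum(dist[s] for s in alive_states))
    return curve


def mean_survival(curve, half_cycle=True):
    """Mean survival in cycles over the simulated horizon.

    Without correction, membership is counted at the start of each cycle.
    With the half-cycle correction, each cycle contributes the average of
    its starting and ending membership (a trapezoidal sum).
    """
    if half_cycle:
        return sum((a + b) / 2.0 for a, b in zip(curve, curve[1:]))
    return sum(curve[:-1])


# Hypothetical illness-death matrix (states: healthy, ill, dead).
P = [
    [0.85, 0.10, 0.05],
    [0.00, 0.80, 0.20],
    [0.00, 0.00, 1.00],
]

survival = cohort_simulation(P, start=[1.0, 0.0, 0.0], alive_states=[0, 1], n_cycles=300)
uncorrected = mean_survival(survival, half_cycle=False)
corrected = mean_survival(survival, half_cycle=True)
print(round(uncorrected, 3), round(corrected, 3))
```

With a horizon long enough for nearly the whole cohort to be absorbed, the uncorrected sum approaches the matrix solution, and the corrected value is lower by about half a cycle because deaths are treated as occurring mid-cycle rather than at the end.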

Verifying Model Assumptions

The time homogeneity assumption can be assessed with a likelihood ratio test, and the first-order Markov property assumption can be examined with a chi-square test [6,15]. The time homogeneity assumption is often difficult to meet, particularly in studies of chronic disease where follow-up lasts for years, single observation cycles can span a year or more, and increasing age generally corresponds to greater risk of disease or death. However, this concern can be mitigated by data stratification (e.g., by age group or study period) or regression modeling, where the effect of covariates is included in the estimation of transition probabilities [16]. In regression, covariates may be either fixed or time-dependent.
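A likelihood ratio statistic for time homogeneity can be sketched by comparing period-specific transition proportions with proportions pooled over all periods, in the spirit of the approach in [15]. The counts below are invented for illustration, and the degrees-of-freedom rule used here (which skips rows with no observed exits and cells that are empty in the pooled table) is an assumption about how to handle structural zeros rather than a prescription from the cited references.

```python
import math


def homogeneity_lr_statistic(period_counts):
    """Likelihood ratio statistic for constant transition probabilities.

    period_counts[t][i][j] is the number of i -> j transitions seen in period t.
    Returns (statistic, degrees_of_freedom). The statistic is
    2 * sum over t, i, j of n_tij * log(p_tij / p_ij), where p_tij is the
    period-specific row proportion and p_ij is the pooled row proportion.
    """
    n_states = len(period_counts[0])
    pooled = [
        [sum(c[i][j] for c in period_counts) for j in range(n_states)]
        for i in range(n_states)
    ]
    stat, df = 0.0, 0
    for i in range(n_states):
        pooled_total = sum(pooled[i])
        if pooled_total == 0:
            continue  # e.g. an absorbing state: no exits to compare
        nonzero_cells = sum(1 for v in pooled[i] if v > 0)
        periods_with_data = 0
        for counts in period_counts:
            row_total = sum(counts[i])
            if row_total == 0:
                continue
            periods_with_data += 1
            for j in range(n_states):
                n_ij = counts[i][j]
                if n_ij > 0:
                    p_period = n_ij / row_total
                    p_pooled = pooled[i][j] / pooled_total
                    stat += 2.0 * n_ij * math.log(p_period / p_pooled)
        # Assumed rule: (periods - 1) * (free parameters in the row).
        df += max(periods_with_data - 1, 0) * (nonzero_cells - 1)
    return stat, df


# Hypothetical transition counts (states: healthy, ill, dead) in two study periods.
early = [[80, 15, 5], [0, 40, 10], [0, 0, 0]]
late_same = [[160, 30, 10], [0, 80, 20], [0, 0, 0]]     # same proportions as early
late_shifted = [[120, 50, 30], [0, 60, 40], [0, 0, 0]]  # higher risk later on

stat_same, df_same = homogeneity_lr_statistic([early, late_same])
stat_shifted, df_shifted = homogeneity_lr_statistic([early, late_shifted])
print(round(stat_same, 6), df_same)
print(round(stat_shifted, 3), df_shifted)
```

The statistic would then be referred to a chi-square distribution with the computed degrees of freedom; a large value relative to that reference suggests that stratification by period or a regression model with time-dependent covariates should be considered.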

Even when the fundamental model assumptions are met, application of the Markov chain model may still be unsuccessful. The observed frequency of some transition types may be too low to fit the regression model: sparse cells, where few events are observed, may lead to inaccurate estimation or prevent model convergence. In addition, there is no widely accepted goodness-of-fit test for the model.

Conclusion

Although Markov models have been used in clinical applications for over 60 years [17], incorporation of subject-specific random effects in Markov chains to account for individual propensity to make transitions is a relatively recent development [7]. However, inclusion of random effects makes estimation of the likelihood quite complex, and fitting such models can be time consuming. More importantly, the meaning of the random effects must be carefully considered. Models that utilize tunnel states (i.e., non-absorbing states from which reverse transitions are not possible) [3], for example, complicate the use of random effects.

In closing, Markov chains are useful tools for survival analysis that allow for more nuanced modeling than is available in most standard time-to-event methods. While the focus of this editorial has been clinical studies, Markov chains have clear applications in diverse fields including labor research [18], finance [19], political science [20], chemical engineering [21], and demography [22]. However, while many journal readers and reviewers may readily comprehend the results from Markov models, they may lack familiarity with the underlying statistical assumptions, particularly in fields where the use of Markov models is not yet widespread. If so, they may neglect to challenge investigators to demonstrate that these assumptions are tenable. Given that improper use of Markov models may result in biased estimation, perhaps some standardization in the reporting of Markov model results and assumption verification is needed.

Acknowledgments

Funding

This research was partially funded with support from grants to the University of Kentucky’s Sanders-Brown Center on Aging, R01 AG038651-01A1 and P30 AG028383, from the National Institute on Aging, as well as a grant to the University of Kentucky’s Center for Clinical and Translational Science, UL1TR000117, from the National Center for Advancing Translational Sciences.

References

  1. Prentice RL, Kalbfleisch JD, Petersen AV, Flournoy N, Farewell VT, Breslow NE. The analysis of failure times in the presence of competing risks. Biometrics. 1978;34:541–554. http://www.jstor.org/stable/2530374.
  2. Abner EL, Kryscio RJ, Cooper GE, Fardo DW, Jicha GA, Mendiondo MS, Nelson PT, Smith CD, Van Eldik LJ, Wan L, Schmitt FA. Mild Cognitive Impairment: Statistical models of transition using longitudinal clinical data. Int J Alzheimers Dis. 2012;2012:291920. doi: 10.1155/2012/291920. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3320090/
  3. Sonnenberg FA, Beck JR. Markov models in medical decision making: a practical guide. Med Decis Making. 1993;13:322–338. doi: 10.1177/0272989X9301300409. http://www.ncbi.nlm.nih.gov/pubmed/8246705.
  4. Kryscio RJ, Abner EL, Lin Y, Cooper GE, Fardo DW, Jicha GA, Nelson PT, Smith CD, Van Eldik LJ, Wan L, Schmitt FA. Adjusting for mortality when identifying risk factors for transitions to MCI and dementia. J Alzheimers Dis. 2013;35:823–832. doi: 10.3233/JAD-122146. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3703851/
  5. Aalen OO. Effects of frailty in survival analysis. Stat Methods in Med Res. 1994;3:227–243. doi: 10.1177/096228029400300303. http://smm.sagepub.com/content/3/3/227.
  6. Hillis A, Maguire M, Hawkins BS, Newhouse MM. The Markov process as a general method for nonparametric analysis of right-censored medical data. J Chron Dis. 1986;39:595–604. doi: 10.1016/0021-9681(86)90184-0. http://www.sciencedirect.com/science/article/pii/0021968186901840.
  7. Crowder M. Multivariate survival analysis and competing risks. CRC Press, Taylor and Francis Group; Boca Raton, Florida: 2012. http://www.crcpress.com/product/isbn/9781439875216.
  8. Salazar JC, Schmitt FA, Yu L, Mendiondo MS, Kryscio RJ. Shared random effects analysis of multi-state Markov models: application to a longitudinal study of transitions to dementia. Statist Med. 2007;26:568–580. doi: 10.1002/sim.2437. http://www.ncbi.nlm.nih.gov/pubmed/16345024.
  9. Song C, Kuo L, Derby CA, Lipton RB, Hall CB. Multi-stage transitional models with random effects and their application to the Einstein Aging Study. Biometrical J. 2011;53:938–955. doi: 10.1002/bimj.200900259. http://www.ncbi.nlm.nih.gov/pubmed/22020750.
  10. Kang M, Lagakos SW. Statistical methods for panel data from a semi-Markov process, with application to HPV. Biostatistics. 2007;8:252–264. doi: 10.1093/biostatistics/kxl006. http://www.ncbi.nlm.nih.gov/pubmed/16740624.
  11. Kay R. A Markov model for analyzing cancer markers and disease states in survival studies. Biometrics. 1986;42:855–865. http://www.jstor.org/stable/2530699.
  12. Pan F, Goh JW, Cutter G, Su W, Pleimes D, Wang C. Long-term cost-effectiveness model of interferon beta-1b in the early treatment of multiple sclerosis in the United States. Clin Ther. 2012;34:1966–1976. doi: 10.1016/j.clinthera.2012.07.010. http://www.ncbi.nlm.nih.gov/pubmed/22906738.
  13. Sendi PP, Craig BA, Pfulger D, Gafni A, Bucher HC. Systematic validation of disease models for pharmacoeconomic evaluations. J Eval Clin Prac. 1999;5:283–295. doi: 10.1046/j.1365-2753.1999.00174.x. http://www.ncbi.nlm.nih.gov/pubmed/10461580.
  14. Yu L, Griffith WS, Tyas SL, Snowdon DA, Kryscio RJ. A nonstationary Markov transition model for computing the relative risk of dementia before death. Statist Med. 2010;29:639–648. doi: 10.1002/sim.3828. http://www.ncbi.nlm.nih.gov/pubmed/20087848.
  15. Anderson TW, Goodman LA. Statistical inference about Markov chains. Ann Math Stat. 1957;28:89–110. http://www.jstor.org/stable/2237025.
  16. Kalbfleisch JD, Lawless JF. The analysis of panel data under a Markov assumption. J Am Stat Assoc. 1985;80:863–871. http://www.jstor.org/stable/2288545.
  17. Fix E, Neyman J. A simple stochastic model of recovery, relapse, death and loss of patients. Human Biol. 1951;23:205–241. http://www.jstor.org/stable/41448000.
  18. Pedersen J, Bjorner JB, Burr H, Christensen KB. Transitions between sickness absence, work, unemployment, and disability in Denmark 2004–2008. Scand J Work Environ Health. 2012;38:516–526. doi: 10.5271/sjweh.3293. http://www.ncbi.nlm.nih.gov/pubmed/22441355.
  19. Hochreiter R, Wozabal D. Evolutionary estimation of a coupled Markov chain credit risk model. Natural Computing in Computational Finance. 2010;293:31–44. http://link.springer.com/chapter/10.1007%2F978-3-642-13950-5_3#.
  20. Boskin MJ, Nold FC. A Markov model of turnover in aid to families with dependent children. J Human Resources. 1975;10:467–481. http://www.jstor.org/stable/144985.
  21. Tamir A. Markov chains in chemical engineering. Elsevier B.V.; Amsterdam, The Netherlands: 1998. http://www.sciencedirect.com/science/book/9780444823564.
  22. van Raalte AA, Caswell H. Perturbation analysis of indices of lifespan variability. Demography. 2013;50:1615–1640. doi: 10.1007/s13524-013-0223-3. http://link.springer.com/article/10.1007%2Fs13524-013-0223-3#.