Abstract
A major aim of longitudinal analyses of life course data is to describe the within- and between-individual variability in a behavioral outcome, such as crime. Statistical analyses of such data typically draw on mixture and mixed-effects growth models. In this work, we present a functional analytic point of view and develop an alternative method that models individual crime trajectories as departures from a population age-crime curve. Drawing on empirical and theoretical claims in criminology, we assume a unimodal population age-crime curve and allow individual expected crime trajectories to differ by their levels of offending and patterns of temporal misalignment. We extend Bayesian hierarchical curve registration methods to accommodate count data and to incorporate influence of baseline covariates on individual behavioral trajectories. Analyzing self-reported counts of yearly marijuana use from the Denver Youth Survey, we examine the influence of race and gender categories on differences in levels and timing of marijuana smoking. We find that our approach offers a flexible model for longitudinal crime trajectories and allows for a rich array of inferences of interest to criminologists and drug abuse researchers.
Keywords: Curve Registration, Drug Use, Functional Data, Generalized Linear Models, Individual Trajectories, Longitudinal Data, MCMC, Unimodal Smoothing
1 Introduction
An important task in criminology concerns describing individual trajectories of offending across time or age. An adequate description of offending trajectories across age is necessary for describing differences in criminal careers (Blumstein and Cohen 1987), for estimating features of age-crime curves (Hirschi and Gottfredson 1983), such as age-at-onset, and ultimately, for explaining differences in age-crime curves using developmental or life course theories (Sampson and Laub 1993).
Most research on criminal careers and age-crime trajectories has been descriptive, following the pioneering work of Wolfgang et al. (1973), who examined age at onset, length of criminal careers, and patterns of desistance. Recent research has turned to model-based approaches, such as growth curve or trajectory mixture models, which typically specify individual trajectories as polynomial in age. Such models describe population heterogeneity in individual trajectories either by including random effects for age and age-squared (Raudenbush and Chan 1993), or by specifying a mixture of latent classes of trajectories (Nagin and Land 1993), or by combining latent trajectory classes and random effects (Muthén and Shedden 1999). However, polynomial representations are typically not able to capture nuanced heterogeneity between individuals in their observed patterns of criminal behavior, and research findings are often driven by variability in within-age behavioral amplitude (Gottfredson and Hirschi 1990). More importantly, both growth curve and trajectory mixture models tend to ignore subject-specific variation in the timing of crime. From a statistical perspective, this leads to inconsistent estimates of time-varying population parameters (Brumback and Lindstrom 2004, Gervini and Gasser 2004). From a substantive perspective, ignoring phase variability leads to population estimates of age-crime curves that are not representative of typical individual trajectories as they smear over local features of subject-specific time series. In particular, it has been observed that growth curve and trajectory mixture models are typically unable to capture individual variability in the decline of offending (Bushway et al. 2009).
In this article, we propose an alternative approach for analyzing longitudinal crime data. We draw on the growing research literature in criminology on the age-crime curve. Nearly all researchers agree that, overall, the population age-crime curve is unimodal, rising precipitously from age seven (the age of culpability) until the peak years (between ages 13 and 21, depending on the offense) and then slowly declining throughout the remaining years of the life-span. Some researchers have emphasized, on substantive and theoretical grounds, substantial heterogeneity in age-crime curves across groups of individuals. For example, Blumstein et al. (1985) posit groups of desisters, persisters, and non-offenders and Moffitt (1993) contrasts groups of life-course persistent chronic offenders versus adolescence-limited normal offenders. By contrast, Hirschi and Gottfredson (1983) injected controversy into criminology by arguing that a single age-crime curve underlies all criminal offenses and is invariant across all social groups and throughout history. Although they claimed invariance in the basic shape of the age-crime curve, they also acknowledged the presence of modest individual differences in crime trajectories. Specifically, Hirschi and Gottfredson (1983) claimed that individual differences are driven by differences in (time-stable) levels of offending and (time-varying) opportunities to commit crime. These substantive arguments naturally lend themselves to using an appropriately constrained functional data analysis approach for modeling longitudinal crime data.
We posit a unimodal population age-crime curve and develop a Poisson warping regression model that defines individual crime trajectories as random functions that deviate from the mean curve according to individual-specific level of offending and time misalignment patterns. We build on curve registration models of Ramsay and Li (1998), who introduced a model for the alignment of a sample of curves via a continuous monotone transformation of a main effect modifier (usually time), and Telesca and Inoue (2008), who formulated a Bayesian hierarchical model for curve registration, allowing for the borrowing of information across curves. To accommodate discrete observations (counts), we develop a generalized extension of the curve registration models to count data. In addition, we incorporate covariate effects directly on (1) the expected intensity of criminal behavior and on (2) the deviation from the average timing of offenses in a hierarchical fashion.
Although the assumption of unimodality is relatively weak given observed distributions of crime and drug use, we will compare the predictive performance of models with unimodality against a model with an unconstrained shape function. Moreover, our functional data analysis approach is sufficiently flexible to model not only modest individual departures from a population age-crime curve, but also substantial individual departures. Our unimodal constraint assumes a monotonic incline in offending to the peak age followed by a monotonic decline, which is consistent with observed empirical age-crime curves, including the relatively flat individual curves of chronic offending or low offending posited by Blumstein et al. (1985). Thus, the model can treat the question of invariance of the population age-crime curve as an empirical question.
We are not the first to take a functional data analytic point of view towards longitudinal crime data. Ramsay and Silverman (2002) carried out a functional principal component analysis on a landmark data set originally collected by Glueck and Glueck (1950), and reanalyzed by Sampson and Laub (1993). Our approach to analyzing life course crime trajectories, although functional, is fundamentally different from that of Ramsay and Silverman (2002) as we do not rely on principal components.
Several authors have contributed to the statistical analysis of random curves. Shi et al. (1996) were among the first to introduce flexible semiparametric models for the analysis of a sample of curves based on functional mixed effects modeling. In the analysis of sparsely observed functions, Rice and Silverman (1991), and, more recently, Yao et al. (2005) discuss nonparametric methods based on functional principal component analysis.
Typically, functional data analysis deals with large amounts of data sampled on a fine grid in time or space (Brumback and Lindstrom 2004, Gervini and Gasser 2004). Information on lifetime criminal behavior, however, often comes in the form of many short or sparsely sampled time series (see Elliott et al. 1985 or Harris et al. 2003). High individual heterogeneity in combination with such data structures requires models that capitalize on borrowing information across subjects while maintaining a high level of flexibility in order to provide a reasonable fit to individual observed trajectories.
Our method of hierarchical curve registration with covariates allows us to develop a flexible set of nonparametric representations for individual curves of criminal offending. It deals with data sparsity by combining information across curves in two ways: (1) structurally, by representing individual curves as an affine transformation of a natural crime curve constrained to be unimodal; and (2) stochastically, by assuming conditional dependence (exchangeability) between key parameters contributing to the likelihood function. As we model the crime trajectories in a semi-parametric fashion, we integrate the substantive claims of Hirschi and Gottfredson (1983) with the existing toolkit of functional data analysis methods and accommodate the unimodality constraint for non-Gaussian data. Unlike previous approaches to modeling crime trajectories, our approach explicitly incorporates criminological arguments that the population age crime curve is unimodal and that individual trajectories can be described as departures from the common population curve. We illustrate our approach by analyzing data on marijuana use.
1.1 Data
We consider marijuana use data from the Denver Youth Survey (DYS) (Esbensen and Huizinga 1990), a longitudinal study of delinquency and drug use in high risk neighborhoods in Denver. Marijuana use is of interest not only to drug researchers and life course scholars but also to criminologists because it is an illegal substance in the United States. The DYS collected data from an accelerated longitudinal design covering the age span from 7 to 25. The peak age of marijuana use is about age 20 (Office of National Drug Control Policy 2008). The survey asked drug use questions starting from age 11.
The DYS identified high risk neighborhoods via a cluster analysis of census variables such as family structure, ethnicity, SES, housing, mobility, marital status, and age composition (e.g., Esbensen and Huizinga 1990). High risk neighborhoods were then defined as the top third in terms of high social disorganization and high official crime rates. These neighborhoods represent the most disadvantaged areas of Denver.
The investigators selected a sample of 20,300 households from high-risk neighborhoods in Denver, and used a screening questionnaire to identify five child and youth cohorts (i.e., 7, 9, 11, 13 or 15 years old in 1988). The overall procedure yielded a sample of 1,528 respondents (for details see Matsueda et al. (2006); Esbensen and Huizinga (1990)). Of these respondents, 1,459 were aged 11 years or older for at least one interviewed year and completed a youth survey that included drug-use counts. Subjects were interviewed in their homes annually from 1988–1992 and 1995–1999 (10 waves).
We consider answers to the survey question “In the past year, how many times have you smoked marijuana?” Our goal is to model individual trajectories of marijuana use over the interval of 10–25 years of age, and to understand differences in these trajectories by race-ethnicity and gender. We selected individuals who had between 4 and 9 longitudinal observations on marijuana use. The resulting data set had a mean of 7.39 (SD = 1.37) observations per subject. Exploratory summaries associated with variability in timing and frequency of marijuana use are reported in Figure 1. In panels (a) and (b) we provide histograms for mean and maximal marijuana use. The frequency of marijuana use is highly volatile. Marijuana smokers, defined as those who reported smoking at least once during the observation period, smoked marijuana 42.85 times per year on average (SD = 133.33), with a maximum reported yearly count of 999.¹ One commonly reported quantity of marijuana use over the last year is 365 times, which corresponds to smoking once a day. Ages at first and at maximal marijuana use vary substantially, as indicated by the histograms in panels (c) and (d) of Figure 1. These summaries illustrate large individual variability in amplitude (level) and phase (timing) among individual trajectories of marijuana smoking.
Figure 1.
Exploratory Summaries. Panels (a, b): Histograms of mean and maximal use of marijuana. Panels (c, d): Histograms of age of first and maximal marijuana use.
The remainder of this article is organized as follows. In Section 2, we introduce a hierarchical model for the semi–parametric analysis of longitudinal count data. We discuss MCMC estimation and inference in Section 3 and analyze lifetime data on marijuana use from the Denver Youth Survey in Section 4. We conclude with a discussion of our contributions and possible model extensions in Section 5.
2 Hierarchical Registration
2.1 Poisson Warping Regression Model
In this section, we introduce a general formulation for the functional representation of longitudinal crime data. Let Yi = (Yi1, …, Yij, …, Yin)′ denote an observed vector of offenses for individual i over a discrete sampling design t = (t1, …, tj, …, tn). To simplify notation, we assume that a sampling design t is common for all individuals, but the functional model is flexible to accommodate different sampling times. Technically, observed counts Yij denote the number of offenses over a reasonable time interval τ, e.g., a month or a year, just before sampling times tj. The time interval τ is fixed and the same for all observations in the sample. Let Xi denote a p–dimensional vector of time-stable covariates for individual i.
We assume that individual trajectories of offending are realizations from a functional Poisson process. Thus, the observed count at time tj for individual i is
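$$Y_{ij} \mid \lambda_i(t_j, X_i) \sim \mathrm{Poisson}\{\lambda_i(t_j, X_i)\}, \tag{1}$$
where E(Yij | Xi) = λi(tj, Xi). The sampling density of Yij is given by
$$p(y_{ij} \mid \lambda_i) = \frac{\lambda_i(t_j, X_i)^{y_{ij}} \exp\{-\lambda_i(t_j, X_i)\}}{y_{ij}!}, \qquad y_{ij} = 0, 1, 2, \ldots$$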
Assume the intensity function λi(tj, Xi) depends on the covariate information Xi as follows:
$$\lambda_i(t_j, X_i) = a_i(X_i)\, S\{\mu_i(t_j, \phi_i; X_i), \beta\}, \tag{2}$$
where ai(Xi) ≥ 0 is an individual-specific amplitude, S(tj, β) is a mean shape function, and μi(tj, ϕi; Xi) is an individual-specific time transformation function. Consequently, the mean function S(tj, β), evaluated over a subject-specific time scale μi(tj, ϕi; Xi), defines individual-specific mean trajectory of offending. Our notation indicates explicit dependence on Xi for individual amplitude and time-transformation functions. We define this dependence in Sections 2.3 and 2.4, respectively, by modeling the mean of ai and ϕi as a function of covariates Xi.
For the mean shape and time transformation functions, we assume that their functional forms belong to the Sobolev space spanned by linear combinations of cubic B-Spline basis functions (De Boor 1978). Intuitively, this is a vector space containing shapes of virtually arbitrary flexibility, provided it originates from an adequate number of basis functions. See also Peña 1997 for a discussion of B-spline optimality and stability. When modeling the shape function S(t, β) we further constrain the functional form to be unimodal.
The modeling framework introduced in equations (1) and (2) generalizes substantive arguments of Hirschi and Gottfredson (1983) about the age-crime curve. It starts by assuming a common unimodal shape for the age-crime curve and reflects individual differences in the expected intensity of criminal behavior ai(Xi) and deviations from the average timing of offenses μi(tj, ϕi; Xi).
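The structure of equations (1) and (2) can be illustrated with a minimal Python sketch that simulates yearly counts for one individual. The Gaussian-bump shape function, the pure shift used as a time transformation, and all numeric values are illustrative assumptions; they stand in for the B-spline forms developed below and are not estimates from this article.

```python
import math
import random

def shape(t, peak=20.0, width=4.0):
    """Toy unimodal mean shape S(t); a Gaussian bump is assumed for illustration."""
    return math.exp(-0.5 * ((t - peak) / width) ** 2)

def warp(t, shift):
    """Toy time transformation mu_i(t): a pure shift, which is strictly increasing."""
    return t + shift

def intensity(t, amplitude, shift):
    """lambda_i(t) = a_i * S(mu_i(t)), mirroring the structure of equation (2)."""
    return amplitude * shape(warp(t, shift))

def poisson_draw(rate, rng):
    """Knuth's multiplication method for Poisson sampling; adequate for moderate rates."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

rng = random.Random(1)
ages = list(range(11, 26))
# Hypothetical individual: amplitude 30 and a shift of +2 years on the internal clock.
lam = [intensity(t, amplitude=30.0, shift=2.0) for t in ages]
counts = [poisson_draw(rate, rng) for rate in lam]
peak_age = ages[lam.index(max(lam))]
```

With a positive shift the individual reaches the population peak (age 20 in this toy shape) two years early, so `peak_age` is 18, while the amplitude rescales the whole trajectory without changing its timing.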
2.2 Mean Shape S(t, β) and Unimodal Smoothing
Let the mean shape function S(t, β) be a mapping S(t, β): 𝒯 → ℝ+, where 𝒯 = [t1 − Δ, tn + Δ] is the observed sampling interval [t1, tn] extended by a temporal misalignment window Δ < ∞ (Telesca and Inoue 2008). Assume the functional form of the average shape S(t, β) is a linear B-spline combination S(t, β) = 𝒮B(t)′β, where 𝒮B(t) is a set of K basis functions of order 4 evaluated at time t and β is a K-dimensional vector of spline coefficients. To ensure positivity of S(t, β), it is sufficient to require positivity of the shape coefficients βj ≥ 0, j = 1, …, K. To ensure unimodality of S(t, β), it is sufficient to require the first derivative ∂S(t, β)/∂t to exhibit at most one sign change (Schumaker 1981, Theorem 4.76). We combine the unimodality and positivity requirements via the following reparametrization of the shape coefficients β:
| (3) |
where the new coefficients ν = (ν1, …, νK)′ are nondecreasing, i.e., 0 = ν1 ≤ … ≤ νK < 2ν*, and ν* ≥ 0 is a fixed modal pivot.² We place a second order shrinkage prior distribution on ν. In particular, assuming ν0 = ν1 = 0, we model the generic kth element of ν as
| (4) |
The variance parameter of this prior can then be interpreted as a smoothing parameter that shrinks the shape function towards a piecewise linear trajectory.
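The link between the spline coefficients and the shape of S(t, β) can be checked numerically. The sketch below evaluates a cubic B-spline basis by the Cox-de Boor recursion and confirms that nonnegative coefficients that rise and then fall yield a nonnegative curve whose increments change sign at most once. The knot locations and the coefficient vector `beta` are hypothetical, and the coefficients are specified directly rather than through the reparametrization in ν.

```python
def bspline_basis(t, knots, degree=3):
    """Evaluate all B-spline basis functions at t by the Cox-de Boor recursion."""
    # Degree-0 indicators on half-open knot spans (zero-length spans contribute 0).
    basis = [1.0 if knots[i] <= t < knots[i + 1] else 0.0
             for i in range(len(knots) - 1)]
    for d in range(1, degree + 1):
        new = []
        for i in range(len(knots) - d - 1):
            left = right = 0.0
            if knots[i + d] > knots[i]:
                left = (t - knots[i]) / (knots[i + d] - knots[i]) * basis[i]
            if knots[i + d + 1] > knots[i + 1]:
                right = ((knots[i + d + 1] - t)
                         / (knots[i + d + 1] - knots[i + 1]) * basis[i + 1])
            new.append(left + right)
        basis = new
    return basis

# Hypothetical clamped knot sequence on [10, 25], giving K = 8 basis functions of order 4.
knots = [10.0] * 4 + [13.0, 16.0, 19.0, 22.0] + [25.0] * 4
# Hypothetical nonnegative coefficients that increase and then decrease.
beta = [0.0, 1.0, 3.0, 6.0, 5.0, 3.0, 2.0, 1.0]

def mean_shape(t):
    """S(t, beta) as a linear combination of the basis functions."""
    return sum(b * w for b, w in zip(beta, bspline_basis(t, knots)))

grid = [10.0 + 0.1 * i for i in range(150)]  # stays inside the half-open range [10, 25)
curve = [mean_shape(t) for t in grid]
diffs = [b - a for a, b in zip(curve, curve[1:])]
signs = [d > 0 for d in diffs if d != 0.0]
sign_changes = sum(s != t for s, t in zip(signs, signs[1:]))
```

Because the first coefficient is zero, the curve starts at zero at the left boundary, which matches the convention of fixing ν1 = 0.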
2.3 Amplitude Parameters ai(Xi) and Amplitude Regression
The notion that individual criminal propensity is constant across the life span but varies among individuals is common in the criminology literature. Gottfredson and Hirschi (1990) introduced the hypothetical concept of self-control that could explain this variation. The amplitude regression part of our model allows us to test the relationship between individual criminal propensity and observed covariates.
We model the dependence of individual-specific amplitude ai on covariates Xi in a generalized linear fashion:
$$E(a_i \mid X_i) = \exp(X_i' b_a), \tag{5}$$
where ba is a p-dimensional vector of amplitude regression coefficients. To specify a prior distribution for ai with the mean given by equation (5), we exploit the Gamma-Poisson conjugacy and assume
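$$a_i \mid X_i, b_a, b_0 \sim \mathrm{Gamma}\{b_0,\; b_0 \exp(-X_i' b_a)\}, \tag{6}$$
with shape b0 and rate b0 exp(−Xi′ba), so that the prior mean equals exp(Xi′ba) as in equation (5). In this formulation, 1/√b0 represents the coefficient of variation.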
The prior distribution of ai in equation (6) has two appealing properties. First, due to conjugacy, the conditional posterior density of ai is
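$$p(a_i \mid Y_i, \beta, \phi_i, b_a, b_0) \propto a_i^{\,b_0 + \sum_j Y_{ij} - 1} \exp\Big[-a_i \Big\{ b_0 \exp(-X_i' b_a) + \sum_j S\{\mu_i(t_j, \phi_i), \beta\} \Big\}\Big], \tag{7}$$
which corresponds to a Gamma distribution with shape parameter b0 + Σj Yij and rate b0 exp(−Xi′ba) + Σj S{μi(tj, ϕi), β}. In addition, the marginal distribution of an observed offense count Yij, integrating over ai, is Negative Binomial: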
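$$P(Y_{ij} = y \mid X_i, \beta, \phi_i, b_a, b_0) = \frac{\Gamma(y + b_0)}{\Gamma(b_0)\, y!} \left(\frac{b_0}{b_0 + m_{ij}}\right)^{b_0} \left(\frac{m_{ij}}{b_0 + m_{ij}}\right)^{y}, \tag{8}$$
where m_{ij} = exp(Xi′ba) S{μi(tj, ϕi), β} is the marginal mean of Yij.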
This form allows for natural modeling of overdispersion in the marginal distribution of counts. Here, small values of b0 indicate extra variability beyond that explained by the Poisson.
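The overdispersion implied by the Gamma-Poisson mixture can be verified numerically. The following sketch evaluates a Negative Binomial mass function with size b0 and a given marginal mean, then recovers its mean and variance by direct summation. The values b0 = 0.5 and mean 10 are hypothetical, chosen only to make the extra-Poisson variability visible.

```python
import math

def neg_binomial_pmf(y, mean, b0):
    """Negative Binomial mass function with size b0 and marginal mean `mean`."""
    log_p = (math.lgamma(y + b0) - math.lgamma(b0) - math.lgamma(y + 1)
             + b0 * math.log(b0 / (b0 + mean))
             + y * math.log(mean / (b0 + mean)))
    return math.exp(log_p)

b0, mean = 0.5, 10.0          # hypothetical values
support = range(3000)         # the tail mass beyond this range is negligible here
pmf = [neg_binomial_pmf(y, mean, b0) for y in support]
total = sum(pmf)
nb_mean = sum(y * p for y, p in zip(support, pmf))
nb_var = sum((y - nb_mean) ** 2 * p for y, p in zip(support, pmf))
# A Poisson variable with the same mean would have variance equal to `mean`.
excess_variance = nb_var - mean
```

The variance equals mean + mean²/b0, so with these values it is 210 rather than the Poisson value of 10, and the excess grows as b0 shrinks.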
In the presence of amplitude parameters ai, scale identifiability is often an issue. In Gaussian models, for example, Gervini and Gasser (2004) and Brumback and Lindstrom (2004) impose summation constraints on the amplitude parameters. From a Bayesian perspective, scale identification can be achieved by modeling dependence between the ai at the population level (see Telesca and Inoue 2008).
The amplitude part of the model is completed with priors for the coefficient of variation and for the regression coefficients respectively
| (9) |
The specific form for π(ba) is described in the next section, in order to relate amplitude and phase effects.
2.4 Time Transformation Functions μi(t,ϕi) and Phase Regression
Criminologists specify multiple dynamic influences on trajectories of offending. For example, changes in crime and drug use over time are attributed to changes in peer groups, opportunities, school experiences, and neighborhood contexts. In our model, we use individual time-transformation functions and a phase shift to account for such dynamic individual-specific influences. In addition, we model the phase shift as a linear function of time-stable covariates, gender and race-ethnicity. This allows us to test whether certain groups of individuals start their criminal careers on average earlier than other groups, controlling for differences in amplitudes.
We allow time transformation functions to map the original time scale onto random image sets enclosed in an extended sampling interval 𝒯 = [t1 − Δ, tn + Δ], that is, μi(t, ϕi): [t1, tn] → 𝒯 (Telesca and Inoue 2008). As before, [t1, tn] is the observed time interval and Δ < ∞ is a temporal misalignment window. We require subject-specific time transformation functions μi(t,ϕi) to be strictly monotone, ∂μi(t, ϕi)/∂t > 0 (Ramsay and Li 1998), to prevent time reversibility and to define a bijection between the original time scale t and the transformed time scale μi(t, ϕi).
Let 𝒮μ(t) denote a set of Q B-spline basis functions of order 4, evaluated at time t. We define the subject-specific time transformation functions as linear combinations μi(t, ϕi) = 𝒮μ(t)′ϕi for a given Q–dimensional vector of basis coefficients, ϕi = (ϕi1,…, ϕiQ)′. Imposing the ordering ϕi1 <…< ϕiq <…< ϕiQ provides us with a sufficient condition for time transformation functions μi(t,ϕi) to be monotone (Brumback and Lindstrom 2004). Additionally, imposing boundary conditions (t1 − Δ ≤ ϕi1 ≤ t1 + Δ) and (tn − Δ ≤ ϕiQ ≤ tn + Δ) allows for the time transformations μi(t, ϕi) to map the original time scale t onto random intervals not bigger than [t1 − Δ, tn + Δ] and not smaller than [t1 + Δ, tn − Δ]. This last requirement rules out possible degeneracies, provided that the temporal misalignment window is such that Δ ≪ (tn − t1)/2.
Let ϒ be a Q–dimensional vector of identity coefficients, so that 𝒮μ(t)′ϒ = t. Following the penalization approach introduced in Lang and Brezger (2004), we assume that individual time transformation coefficients ϕi arise from a first-order random walk shrinkage prior. Thus, for all i = 1,…, N,
| (10) |
where ϕi0 = ϒ0 = 0. Here,
defines a set of random cuts such that ηiq − ηi(q−1) > ϒq−1 − ϒq, q = 1, …, Q, where |ηi1| ≤ Δ and |ηiQ| ≤ Δ. The variance parameter
is a smoothing parameter that controls the amount of shrinkage of individual time transformation functions towards the identity transformation μi(t, ϒ) = t.
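Two properties used in this section can be checked with a short Python sketch: identity coefficients reproduce t exactly, and strictly ordered coefficients give a strictly increasing time transformation. Here the identity coefficients are computed as Greville abscissae (averages of three consecutive knots for cubic B-splines), a standard B-spline property that we assume matches the vector ϒ. The knots, the window Δ = 1, and the coefficient vector `phi` are hypothetical.

```python
def bspline_basis(t, knots, degree=3):
    """Evaluate all B-spline basis functions at t by the Cox-de Boor recursion."""
    basis = [1.0 if knots[i] <= t < knots[i + 1] else 0.0
             for i in range(len(knots) - 1)]
    for d in range(1, degree + 1):
        new = []
        for i in range(len(knots) - d - 1):
            left = right = 0.0
            if knots[i + d] > knots[i]:
                left = (t - knots[i]) / (knots[i + d] - knots[i]) * basis[i]
            if knots[i + d + 1] > knots[i + 1]:
                right = ((knots[i + d + 1] - t)
                         / (knots[i + d + 1] - knots[i + 1]) * basis[i + 1])
            new.append(left + right)
        basis = new
    return basis

# Hypothetical clamped, equally spaced knots on [t1, tn] = [10, 25]; Q = 6 basis functions.
knots = [10.0] * 4 + [15.0, 20.0] + [25.0] * 4
n_basis = len(knots) - 4
# Greville abscissae: coefficients for which the spline equals the identity map.
identity = [sum(knots[q + 1:q + 4]) / 3.0 for q in range(n_basis)]
# Hypothetical strictly increasing coefficients respecting the boundary windows (Delta = 1).
phi = [10.5, 12.5, 14.0, 18.5, 23.0, 24.5]

def warp(t, coef):
    """Time transformation mu(t) as a linear combination of basis functions."""
    return sum(c * w for c, w in zip(coef, bspline_basis(t, knots)))

grid = [10.0 + 0.1 * i for i in range(150)]
identity_error = max(abs(warp(t, identity) - t) for t in grid)
warped = [warp(t, phi) for t in grid]
is_increasing = all(b > a for a, b in zip(warped, warped[1:]))
```

In this toy example the subject's clock starts at 10.5 rather than 10, which stays within Δ = 1 of t1 as the boundary conditions require.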
We incorporate covariate effects by modeling the average phase shift as a linear function of covariates Xi:
| (11) |
Finally, we complete the model with priors over phase and amplitude regression coefficients with conditionally conjugate hyperprior Σb ∼ IW(νb, cb I2p).
3 Estimation and Inference
Our modeling approach can be essentially summarized as follows. Marijuana use in time is assumed to arise as the realization of a functional Poisson process with mean structure (2). Dependence on covariate information is included through amplitude effects (5) and phase shifts (11).
3.1 Likelihood Function
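Using the B-spline representations for the mean shape and time transformation functions described in Sections 2.2 and 2.4, we rewrite the expected number of offenses for subject i at time tj from equation (2) as:
$$\lambda_i(t_j) = a_i\, \mathcal{S}_B\{\mathcal{S}_\mu(t_j)'\phi_i\}'\beta. \tag{12}$$
Here 𝒮B(·) and 𝒮μ(·) denote the B-spline bases for the mean shape and the time transformation functions, respectively, and we omit the explicit dependence of ai and ϕi on covariates X to simplify notation. The log-likelihood function of shape coefficients β, amplitude parameters a = (a1, …, aN)′ and time transformation coefficients ϕ = (ϕ1, …, ϕN)′ is then
$$\ell(\beta, a, \phi \mid Y) = \sum_{i=1}^{N} \sum_{j=1}^{n} \big[ Y_{ij} \log \lambda_i(t_j) - \lambda_i(t_j) - \log(Y_{ij}!) \big]. \tag{13}$$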
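A Poisson log-likelihood of this kind is simple to evaluate once the intensities λi(tj) are available. The sketch below assumes the intensities have already been computed and stored as nested lists (one list per individual); the function name and the toy counts are hypothetical.

```python
import math

def poisson_loglik(counts, intensities):
    """Sum of Poisson log-densities over individuals i and sampling times j."""
    total = 0.0
    for y_i, lam_i in zip(counts, intensities):
        for y, lam in zip(y_i, lam_i):
            if lam == 0.0:
                # A zero intensity is only compatible with a zero count.
                if y > 0:
                    return float("-inf")
                continue
            total += y * math.log(lam) - lam - math.lgamma(y + 1)
    return total

# Hypothetical counts for two individuals at three sampling times.
counts = [[0, 3, 7], [1, 0, 2]]
lam_close = [[0.5, 3.0, 6.0], [1.0, 0.4, 2.0]]   # intensities near the observed counts
lam_far = [[5.0, 0.5, 1.0], [4.0, 3.0, 0.2]]     # intensities far from the observed counts
ll_close = poisson_loglik(counts, lam_close)
ll_far = poisson_loglik(counts, lam_far)
```

The zero-intensity branch matters in this model because the shape function may equal zero at the left boundary of the time scale.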
The above formulation of the likelihood depends on the choice of the number and locations of the spline knots for the mean shape S(t, β) and time transformation functions μi(t, ϕi). Because the mean shape S(t, β) is estimated from multiple individual trajectories, several authors in functional data analysis recommend selecting a large number of knots. For example, placing knots at every sampling time point can allow for a high level of shape flexibility. The level of smoothness is then selected automatically or ad hoc via likelihood or prior penalization schemes (Lang and Brezger 2004, Eilers and Marx 1996). The shrinkage prior as in equation (4) automatically shrinks the fixed effect functions towards a linear regression. In our case of highly sparse longitudinal offense data, however, we observed some sensitivity to the choice of the number of basis functions. To select the number of basis functions, we therefore recommend applying a model selection criterion based on the minimization of a posterior predictive loss (Gelfand and Ghosh 1998).
Let Yo denote the observed counts and Yp denote the predicted counts. Following Gelfand and Ghosh (1998), we obtain the deviance version of the posterior predictive loss criterion for the Poisson case as
| (14) |
Where h(x) = (x + 1/2)log(x + 1/2) − x, , and . Here, m denotes the number of basis functions in the model to be evaluated.
Different considerations apply for the subject–specific time transformation functions. These maps carry structural smoothness as they are constrained to be monotone. The strict monotonicity requirement counterbalances the small number of observations associated with each individual trajectory and suggests parsimony in the choice of the number of knots. Because the time scale is stochastic, the exact placement of knots is less important in this case; thus we place knots for time transformation functions as equally spaced.
In the following section, we describe a Markov chain Monte Carlo (MCMC) algorithm for posterior simulation based on fixed numbers of spline basis functions for the mean shape, 𝒮B(·), and the time transformation functions, 𝒮μ(·).
3.2 Posterior Simulation via MCMC
For the Poisson warping regression model described in Section 2, the full parameter set θ includes an N-dimensional vector of individual-specific random amplitude coefficients a = (a1, …, aN)′, an (N × Q) matrix of individual-specific time transformation coefficients ϕ = (ϕ1, …, ϕN)′, a K-dimensional vector of population-level shape coefficients β, and the population-level regression and smoothing parameters: ba, bϕ, b0, and the variance parameters of the shape and time transformation priors.
We seek inference about θ and functionals of θ through the posterior probability p(θ | Y; X) ∝ p(Y | θ; X)p(θ; X), where p(Y | θ; X) is described by the log-likelihood in equation (13) and p(θ; X) represents the joint prior distribution. Recall the dependence on covariates for amplitude and time-transformation parameters through their respective prior distributions (equations (6) and (11)). Because the posterior distribution is not available in closed form, we base our inferences on an MCMC simulation from the joint posterior distribution p(θ | Y; X) (for a recent review, see Gamerman 1997). We use a Gibbs sampler (Gelfand and Smith 1990) whenever conditional posterior quantities are available in standard distributional form. Otherwise, we derive an efficient sampling scheme, combining Gibbs steps with Metropolis-Hastings (MH) steps (Hastings 1970) in a hybrid sampler (Tierney 1994).
Sampling phase regression coefficients bϕ and smoothing parameters
The prior model induces likelihood conjugacy in the conditional posterior distribution of the phase regression coefficients bϕ and of the smoothing (variance) parameters of the shape and time transformation priors. For these quantities, it is therefore straightforward to devise an efficient Gibbs sampler based on direct simulation from their complete conditional distributions, which we include in the Appendix.
Sampling time transformation coefficients ϕ
Taking advantage of the fact that the time transformation coefficients ϕ have compact support 𝒯 = [t1 − Δ, tn + Δ], we implement a MH sampler with appropriately scaled transition kernels q(ϕold, ϕnew). Given that ϕiq < ϕi(q+1) (∀ i = 1,…, N, q = 1,…, Q), we consider proposal densities restricted to the compact support defined in equation (10), Section 2.4. During the MCMC simulation, for each set of individual-specific time transformation coefficients, we start from some value s² for the variance of the proposal density and recalibrate the individual proposal variances at burn-in to achieve an acceptance rate between 35% and 65% (Roberts and Rosenthal 2001).
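A generic version of this scheme, for a single parameter with interval support, can be sketched as follows. The sketch assumes a Gaussian random-walk proposal truncated to the support, which may differ from the proposal actually used for ϕ; the target density, the window of 100 iterations, and the scaling factors 0.8 and 1.25 are likewise hypothetical. Because a truncated proposal is not symmetric, the acceptance ratio includes the ratio of normalizing constants.

```python
import math
import random

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def log_mass(center, scale, low, high):
    """Log probability that N(center, scale^2) falls inside [low, high]."""
    return math.log(norm_cdf((high - center) / scale) - norm_cdf((low - center) / scale))

def truncated_draw(center, scale, low, high, rng):
    """Rejection sampler for N(center, scale^2) restricted to [low, high]."""
    while True:
        x = rng.gauss(center, scale)
        if low <= x <= high:
            return x

def mh_truncated(log_target, start, low, high, n_iter, burn_in, rng, scale=1.0):
    x, window_accepts, kept_accepts, draws = start, 0, 0, []
    for it in range(n_iter):
        prop = truncated_draw(x, scale, low, high, rng)
        # Hastings correction for the asymmetric truncated proposal.
        log_ratio = (log_target(prop) - log_target(x)
                     + log_mass(x, scale, low, high) - log_mass(prop, scale, low, high))
        accept = math.log(1.0 - rng.random()) < log_ratio
        if accept:
            x = prop
        if it < burn_in:
            window_accepts += accept
            if (it + 1) % 100 == 0:
                # Recalibrate the proposal scale towards a 35%-65% acceptance rate.
                rate = window_accepts / 100.0
                if rate < 0.35:
                    scale *= 0.8
                elif rate > 0.65:
                    scale *= 1.25
                window_accepts = 0
        else:
            kept_accepts += accept
            draws.append(x)
    return draws, kept_accepts / len(draws), scale

# Hypothetical target: N(1, 0.5^2) restricted to [0, 3], known up to a constant.
log_target = lambda x: -0.5 * ((x - 1.0) / 0.5) ** 2
rng = random.Random(7)
draws, accept_rate, final_scale = mh_truncated(log_target, 1.5, 0.0, 3.0, 30000, 5000, rng)
posterior_mean = sum(draws) / len(draws)
```

The proposal scale is frozen after burn-in, so the retained draws come from a fixed transition kernel.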
Sampling amplitude parameters a, ba and b0
We use a Gibbs sampler and simulate directly from the conditional posterior distribution given in equation (7) to update individual amplitude parameters a = (a1, …, aN)′ one at a time.
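The Gibbs update for a single amplitude can be sketched in a few lines, assuming the Gamma prior is parameterized by shape b0 and mean exp(Xi′ba), and that the shape function has already been evaluated at the subject's transformed times. The function name and all numeric inputs are hypothetical.

```python
import random

def gibbs_amplitude(counts, shape_values, prior_mean, b0, rng):
    """Draw a_i from its Gamma full conditional.

    counts:       observed counts Y_ij for one individual
    shape_values: S(mu_i(t_j)) evaluated at that individual's transformed times
    prior_mean:   exp(X_i' b_a), the prior mean of a_i
    b0:           Gamma shape parameter of the prior
    """
    post_shape = b0 + sum(counts)
    post_rate = b0 / prior_mean + sum(shape_values)
    # random.gammavariate takes shape and scale, so the rate is inverted.
    return rng.gammavariate(post_shape, 1.0 / post_rate)

rng = random.Random(3)
counts = [0, 2, 5, 3]                 # hypothetical yearly counts
shape_values = [0.1, 0.4, 1.0, 0.6]   # hypothetical shape evaluations
draws = [gibbs_amplitude(counts, shape_values, prior_mean=4.0, b0=2.0, rng=rng)
         for _ in range(20000)]
mc_mean = sum(draws) / len(draws)
exact_mean = (2.0 + sum(counts)) / (2.0 / 4.0 + sum(shape_values))
```

In a full sampler this draw would be repeated for each individual at every iteration, conditional on the current shape and time transformation coefficients.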
The conditional posterior distributions for regression coefficients ba and for the coefficient of variation b0 are not available in closed form. We implement MH scans with proposal distributions informed by the respective target densities. If Ωba defines the inverse of the prior covariance matrix (i.e., the concentration matrix) on amplitude regression coefficients ba, the conditional posterior density of ba can be written as:
with gradient vector
and Hessian matrix
Given gba and Hba, we approximate the conditional posterior mode b̂a numerically via Newton-Raphson. Defining , we obtain the transition kernel on the basis of the over-relaxed proposal . The parameters τba and η can be used to tune the MH acceptance ratio.
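The Newton-Raphson search for the conditional posterior mode can be illustrated for an intercept plus one binary covariate. The sketch assumes the Gamma prior on ai has shape b0 and mean exp(Xi′ba), and it replaces the article's prior on ba with a hypothetical zero-mean Gaussian prior whose precision is ω times the identity; the log density, gradient, and Hessian in the code follow from those assumptions and are not taken from the article. The amplitudes and covariate values are invented for the example.

```python
import math

def newton_mode(a, x, b0, omega, n_iter=200, tol=1e-10):
    """Newton-Raphson for the mode of the assumed log density
    sum_i [-b0 * eta_i - b0 * a_i * exp(-eta_i)] - 0.5 * omega * (b1^2 + b2^2),
    with eta_i = b1 + b2 * x_i.
    """
    b1 = b2 = 0.0
    for _ in range(n_iter):
        g1, g2 = -omega * b1, -omega * b2   # gradient, starting from the prior part
        h11 = h22 = -omega                  # Hessian, starting from the prior part
        h12 = 0.0
        for ai, xi in zip(a, x):
            w = b0 * ai * math.exp(-(b1 + b2 * xi))
            g1 += w - b0
            g2 += (w - b0) * xi
            h11 -= w
            h12 -= w * xi
            h22 -= w * xi * xi
        det = h11 * h22 - h12 * h12
        # Solve H * step = g for the 2x2 system, then update b <- b - step.
        s1 = (h22 * g1 - h12 * g2) / det
        s2 = (h11 * g2 - h12 * g1) / det
        b1, b2 = b1 - s1, b2 - s2
        if abs(s1) + abs(s2) < tol:
            break
    return b1, b2

# Hypothetical amplitudes for two groups (x = 0 and x = 1) with means 2 and 6.
a = [1.0, 3.0, 2.0, 8.0, 4.0, 6.0]
x = [0, 0, 0, 1, 1, 1]
b1_hat, b2_hat = newton_mode(a, x, b0=2.0, omega=1e-8)
```

With a nearly flat prior the mode reproduces the log group means, so exp(b1_hat) is close to 2 and exp(b2_hat) is close to the ratio 3, which matches the multiplicative reading of amplitude effects.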
For the coefficient of variation b0, we use a MH step to sample from the conditional posterior density. We consider a Gamma proposal with shape v0 and rate v0/b̂0; the parameter v0 can be used to tune the MH acceptance ratio, while the conditional moment estimator of b0 is defined as .
Sampling parameters ν of common shape function
Recall that we reparameterized the common shape function S(t, β) with new nondecreasing coefficients ν from equation (3). We update ν one parameter at a time using a MH scan with transition kernels based on conditional proposals , j = 1, …, K. Defining Ων = Cov(ν)−1, the logarithm of the target conditional posterior density is
Given a fixed modal pivot ν*, the conditional posterior support of νj is [l(νj), u(νj)], with l(νj) = max(0; νj−1) and u(νj) = min(νj+1; 2ν*), j = 2, …, K − 1. Furthermore, for j = K, l(νK) = min(νK−1, ν*/2) and u(νK) = ∞. We fix ν1 = 0, corresponding to the assumption of no marijuana use at time t1. We consider independent Gaussian proposals with support defined by (l(νj), u(νj)) and recalibrate the scale of the transition kernel at burn-in to achieve acceptance rate between 35% and 65%.
3.3 Model Interpretation and Inference
Given baseline covariate information x = (x1, …, xp)′, the mean function S{μ(t),β} is well defined for any t ∈ [t1, tn]. If we focus on the expected intensity of criminal behavior, the marginal expected count at time t can be written as:
| (15) |
allowing us to describe average trajectories of offending conditional on covariate values.
Posterior predictions of individual trajectories can be obtained conditioning on individual-specific amplitudes and time transformation functions:
| (16) |
Such model-based individual predicted trajectories are of considerable interest to criminologists for describing and explaining the development of crime and deviance over the life-course. For example, Bushway, Sweeten and Nieuwbeerta (2009) discuss and compare ways to identify “early-starters” and “desistors” by examining individual predictions from other longitudinal data analysis approaches. In contrast to other methods, however, our model allows naturally for examining marginal covariate effects on two key features of the age-crime distribution – criminal propensity and the timing of criminal careers – across all individuals in the sample.
Given MCMC draws from the posterior distribution of model parameters θ and a fine grid of time points in T = [t1, tn], we obtain pointwise summaries of the curves given by equations (15) and (16) and pointwise 100(1 − α)% HPD intervals using the method described by Chen and Shao (1999).
We find it convenient to include an intercept term in the model by letting the first column of the design matrix X be a column of 1s. Thus, if marginal effects of covariates on the expected intensity of criminal behavior are of interest, one can examine the marginal expected count at time t, conditional on the identity time transformation for the natural age-crime curve:
$$E\{Y(t) \mid x, \mu(t) = t\} = \exp(x' b_a)\, S(t, \beta). \tag{17}$$
Given that x1 = 1 by convention, we rewrite the expectation in (17) as
$$E\{Y(t) \mid x, \mu(t) = t\} = \exp(b_{a1}) \exp\Big(\sum_{\kappa=2}^{p} x_\kappa b_{a\kappa}\Big) S(t, \beta).$$
The intensity of offending for the baseline combination of covariates is then exp{ba1}, and a multiplicative effect on intensity associated with a unit increase in covariate xκ is exp{baκ}, κ = 2,…, p, all else being equal.
If the marginal effects of covariates on the expected timing of criminal behavior are of interest, we can examine the mean trajectory of offending over time t, substituting the identity transformation for the amplitude:
| (18) |
As before, given that x1 = 1, we rewrite (18) as
The mean age-crime trajectory for the baseline combination of covariates is S(t + bϕ1, β), and an additive phase effect associated with a unit increase in covariate xκ is bϕκ, κ = 2,…, p, all else being equal. These phase effects can be interpreted as shifts in the timing of criminal careers. Because a positive shift b moves the curve S(t + b, β) to the left along the age axis, positive coefficients bϕκ indicate earlier participation in crime on average.
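The sign convention for phase effects can be checked numerically. The sketch below uses a hypothetical bell-shaped stand-in for S(t, β), with a made-up peak age of 19 and made-up shifts, and locates the age at which the shifted curve S(t + b, β) is largest. None of these values come from the fitted model.

```python
import math


def shape(t, peak_age=19.0, width=3.0):
    """Hypothetical unimodal stand-in for the shape function S(t, beta)."""
    return math.exp(-0.5 * ((t - peak_age) / width) ** 2)


def peak_age_after_shift(shift, grid):
    """Age on the grid at which t -> shape(t + shift) is largest."""
    return max(grid, key=lambda t: shape(t + shift))


# Ages 10 to 25 in quarter-year steps.
grid = [10.0 + 0.25 * k for k in range(61)]

for shift in (0.0, 0.5, -1.0):
    print(shift, peak_age_after_shift(shift, grid))
```

A shift of +0.5 years places the peak half a year earlier than the unshifted curve, and a shift of −1 year places it one year later, which matches the interpretation of positive phase coefficients as earlier participation.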
4 Case Study
We restrict our analysis to marijuana users who had at least four, not necessarily consecutive, observations during the course of the study.3 We define marijuana users as those who reported smoking marijuana at least once. After removing 867 non-users and 22 marijuana users who had fewer than four observed time-points, we are left with a subset of 588 marijuana offenders for analysis. Our inferences are based on 15,000 (thinned by 20) samples from the posterior distribution, after discarding a conservative 50,000 iterations for burn-in. We assessed convergence using the R package BOA (Bayesian Output Analysis; Smith and Brian 2005).
Longitudinal observations of marijuana use are reported in Figure 2, panel (a). A few observations indicating marijuana smoking more than 500 times/year have been cut off for ease of visualization. In this figure, the solid black superimposed curve is the overall smoothed mean. This summary does not resemble any of the individual trajectories as it smears over the variability in both timing and frequency of drug use.
Figure 2.
Drug Use (Marijuana). Panel (a): Yearly count for the use of marijuana for 588 subjects from the DYS. A solid black line depicts the structural mean function. Panel (b): Aligned normalized trajectories. In black we report the overall functional convex average S(t, β). Panel (c): Subject–specific posterior square root amplitude with associated 95% credible intervals. Panel (d): Subject-specific time scale, characterized by the expected posterior time transformation functions.
We fit the model introduced in Section 2 using 9 basis functions (5 interior knots) for the shape function S(t, β), defined on the extended time interval [t1 − Δ, tn + Δ]. This choice was made to minimize the posterior predictive loss introduced in (14) (Gelfand and Ghosh 1998). Furthermore, we consider 5 basis functions (1 interior knot) for the individual random time transformations (see Figure 7). The misalignment window Δ can be interpreted as the maximal size of a linear shift. A natural constraint for the size of Δ is given by the half width of the time domain (tn − t1)/2, but more stringent values may be justified in order to avoid degeneracies in the time transformation functions. In our application, we choose a more conservative Δ = 1.5.
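The basis dimensions quoted above are consistent with cubic B-splines, for which the number of basis functions equals the number of interior knots plus four. The sketch below checks this bookkeeping and the half-width bound on Δ. The cubic degree is an assumption inferred from the reported counts, and the age range used in the example is hypothetical.

```python
def n_bspline_basis(n_interior_knots, degree=3):
    """Number of B-spline basis functions on one interval with the given interior knots."""
    return n_interior_knots + degree + 1


def max_shift_window(t_first, t_last):
    """Half width of the time domain, the natural upper bound on the misalignment window."""
    return (t_last - t_first) / 2.0


print(n_bspline_basis(5))  # shape function S(t, beta)
print(n_bspline_basis(1))  # individual time transformations

# Hypothetical observation window (ages in years), used only to illustrate the bound.
t_first, t_last = 11.0, 25.0
delta = 1.5
print(delta <= max_shift_window(t_first, t_last))
```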
Figure 7.
Unimodal Vs. Unrestricted Age-Crime Curve. Panel a: Posterior predictive loss analysis comparing unimodal and unrestricted models fitted specifying a varying number of interior knots. Panel b: Conditional predictive ordinate analysis comparing unrestricted and unimodal models with 5 interior knots.
We place relatively diffuse Gamma(0.1, rate = 10) priors on the shape precision and Gamma(0.1, rate = 1) priors on the time transformation precision. The amplitude-phase prior covariance Σb is assigned a proper Inverse Wishart prior IW(12, 100 I10). Finally, we complete the model by specifying a prior distribution for the coefficient of variation 1/√b0. As we are considering a sample of users, we require b0 > 1 and define a shifted gamma prior Gamma(λa = 1.1, λb = 0.1) on (b0 − 1). A proper scale-informative prior on ba is used to define ‘soft’ identifiability constraints. The constraint b0 > 1 assures that the prior mode of ai is greater than 0.
Figure 2, panel (b), shows observed frequencies of drug use that have been normalized by removing individual differences in timing and in amplitude for all individuals. We obtained these quantities by evaluating observed frequencies on the inverse transformed time scale, including the phase shift, and dividing by the expected amplitude of offense E(ai | Y). We superimpose on the normalized observed counts a smoothed functional convex average S(t, β). This figure shows a typical pattern of marijuana use for an average individual in our sample of marijuana smokers from the most disadvantaged areas of Denver. The average individual starts smoking marijuana during adolescence, continues with higher intensity through college age, and then drops off marijuana smoking after reaching twenty. We observe a thin left tail of occasional use before the peak years and a thicker right tail of occasional use after the peak years. This pattern is generally consistent with the claims of Hirschi and Gottfredson (1983) and previous empirical research on the age-crime curve.
Figure 2, panel (c), shows posterior median estimates of the individual amplitude parameters on the log scale, with corresponding 95% highest posterior density (HPD) credible intervals. We observe that variability in amplitude is an important source of variation in marijuana smoking trajectories. Estimated log amplitude parameters range from about −4 to 4; the range of these estimates is much wider than the width of a typical 95% credible interval. A log amplitude equal to 0 corresponds approximately to a marijuana smoking trajectory at the level of the overall functional convex mean (solid black line in panel (b)), with the estimated peak smoking at about 33 times per year. In comparison, the average log amplitude of 1.5 for white males corresponds to a marijuana smoking frequency that is exp(1.5) ≈ 4.5 times higher than the structural mean, with the estimated peak at about 33 × 4.5 ≈ 148 times per year.
Figure 2, panel (d), shows the posterior expected estimates of individual time transformation functions, indicating that phase variability is another important source of variation in self-reported marijuana use. Figure 3 illustrates how large differences in the timing of marijuana use are reflected in individual-specific estimates of time-transformation functions. Panel (a) highlights individual trajectories for an early user (solid line), an average user (dotted line) and a late user (dashed line). The corresponding estimated time transformation functions and associated 95% credible bands are reported in panel (b). We note that for the average user the time transformation is close to identity, i.e., the stochastic age of this person is similar to his or her physical age. The late user's stochastic age is kept frozen in time until his or her physical age of about 18; this individual then goes through the marijuana use period much faster than an average marijuana smoker in our sample. The early user exhibits a similarly quick period of marijuana smoking but at much earlier ages.
Figure 3.
Drug Use (Marijuana). Panel (a): Yearly count for the use of marijuana for 3 subjects exhibiting different timing of marijuana use. Panel (b): Subject-specific time scale and associated 95% credible bands for the same subjects highlighted in panel (a).
We use two approaches to investigate how the overall intensity and timing of drug use depend on race and gender. First, in Table 1, we report posterior estimates of the amplitude and phase regression parameters for the covariates in the model (indicators for ‘female’, as well as ‘African American’, ‘Latino’ and ‘other’ race categories; we use ‘white male’ as the baseline category). We find that, overall in our sample, females use marijuana less frequently. For example, white females use marijuana with an overall frequency that is about exp(−0.334) ≈ 0.72 to 1, when compared to their male counterparts. African American males from disadvantaged areas of Denver, on the other hand, seem to be using marijuana more frequently (1.55 to 1), when compared to their Caucasian counterparts. We also find significant differences in the timing of drug use, with females starting to use marijuana on average about 5.5 months earlier when compared to white males and African Americans starting to use marijuana on average about 10 months later.4 We did not estimate gender by race interactions in our model as some subgroups only included a small (< 25) number of subjects. Second, in Figure 4, we report predicted mean population curves of marijuana use for some race and gender subgroups, obtained with equation (15). The predicted mean curves in Figure 4 complement our findings from Table 1, illustrating differences in marijuana use by race and gender.
Table 1. Amplitude and Phase Regression Parameters. (Stars denote 95% C.I. that do not cover the value zero.)
| Effect | Amplitude: E(ba ∣ Y) | Amplitude: 95% C.I. | Phase (years): E(bφ ∣ Y) | Phase (years): 95% C.I. |
|---|---|---|---|---|
| Baseline | | | | |
| White Males | 1.502 | [1.240, 1.842] | −0.741 | [−1.169, −0.312] |
| Main Effects | | | | |
| Female | −0.334 | [−0.501, −0.167]* | 0.461 | [0.213, 0.713]* |
| Latino | 0.154 | [−0.107, 0.396] | 0.420 | [−0.017, 0.864] |
| African American | 0.436 | [0.175, 0.678]* | −0.864 | [−1.325, −0.385]* |
| Other | 0.394 | [0.068, 0.715]* | 0.172 | [−0.383, 0.718] |
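The conversions used to read Table 1 can be reproduced directly: amplitude coefficients are exponentiated to give frequency ratios relative to the baseline, and phase coefficients in years are multiplied by 12 to give shifts in months. The sketch below applies both to the posterior means reported in the table. The helper names are hypothetical, and plugging posterior means into exp(·) is a convenience that only approximates the posterior mean of the ratio itself.

```python
import math

# Posterior means copied from Table 1: (amplitude on the log scale, phase in years).
table1 = {
    "Female": (-0.334, 0.461),
    "Latino": (0.154, 0.420),
    "African American": (0.436, -0.864),
    "Other": (0.394, 0.172),
}


def amplitude_ratio(b_a):
    """Multiplicative effect on expected frequency relative to the baseline."""
    return math.exp(b_a)


def phase_shift_months(b_phi):
    """Timing shift in months; positive values mean earlier use."""
    return 12.0 * b_phi


for effect, (b_a, b_phi) in table1.items():
    months = phase_shift_months(b_phi)
    direction = "earlier" if months > 0 else "later"
    print(f"{effect}: ratio {amplitude_ratio(b_a):.2f}, "
          f"about {abs(months):.1f} months {direction}")
```

The female and African American rows reproduce the figures quoted in the text: ratios of about 0.72 and 1.55, and shifts of roughly 5.5 months earlier and 10 months later.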
Figure 4.
Drug use (marijuana) mean population curves for some race and gender categories.
Examining individual data, we find that the estimated expected crime trajectories fit the observed data well. In Figure 5, we report expected frequencies of marijuana use for a sub-sample of six subjects in the DYS chosen from the race and gender categories to illustrate a representative range of individuals' amplitude and timing, obtained with equation (16). Black dots and solid lines indicate the observed and expected counts of yearly marijuana use, respectively, and the dashed lines represent pointwise 90% highest posterior density prediction intervals. This figure shows how our model formulation appears to provide a remarkably close fit to individual profiles. Based on information that is shared across subjects, this modeling framework allows for individual-specific predictions for all time points within time interval T, including those points where the individual did not have observations. Wider prediction bands illustrate higher uncertainty in model predictions where no subject-specific data is available. We carried out a formal assessment of the goodness of fit by comparing posterior predictive distributions with corresponding summary statistics obtained empirically from the data. In Figure 6, panels (a-b), we report 90% posterior predictive intervals for the individual yearly average and maximal marijuana smoking levels, as well as the corresponding summary statistics. The model provides us with an excellent coverage for the observed average levels of marijuana use (94%) and maximal frequencies of marijuana use (86%). Panels (c-d), Figure 6, report 90% posterior predictive intervals for the individual age at maximal marijuana use and for the age at first use as well as the corresponding observed ages. The posterior predictive intervals follow the observed ages fairly well across the time interval. 
Note that because the available data do not allow us to obtain exact empirical estimates of the timing summaries, i.e., we could only say that the first marijuana smoking event happened during a particular year but are unable to distinguish when it happened exactly, examining numerical coverage values for age-related summaries would not be appropriate.
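The coverage figures reported for the average and maximal levels of use can be computed along the following lines: for each subject, form a 90% interval from replicated summary statistics and record whether the observed statistic falls inside it. This is a minimal sketch under stated assumptions. The equal-tailed interval, the function names and the toy inputs are hypothetical, and the original analysis may have used a different interval construction.

```python
import math


def central_interval(draws, level=0.90):
    """Equal-tailed interval taken from the order statistics of the draws."""
    x = sorted(draws)
    n = len(x)
    tail = (1.0 - level) / 2.0
    lower = x[int(math.floor(tail * (n - 1)))]
    upper = x[int(math.ceil((1.0 - tail) * (n - 1)))]
    return lower, upper


def coverage(observed, replicated, level=0.90):
    """Share of subjects whose observed summary lies inside its predictive interval.

    observed[i]   : summary statistic computed from subject i's data
    replicated[i] : the same statistic computed on posterior predictive replicates
    """
    hits = 0
    for obs, reps in zip(observed, replicated):
        lower, upper = central_interval(reps, level)
        hits += lower <= obs <= upper
    return hits / len(observed)


# Toy input: four subjects, each with replicated summaries 0, 1, ..., 100.
replicated = [list(range(101)) for _ in range(4)]
observed = [50, 0, 100, 30]
print(coverage(observed, replicated))
```

A coverage close to the nominal level, as with the 94% and 86% reported above, suggests that the predictive distribution is neither too narrow nor too diffuse for that summary.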
Figure 5.
Drug Use (Marijuana). Lifetime marijuana use profiles for six random subjects from different race and gender categories. For each profile, the solid line represents the median posterior expected count and the dot–dashed lines represent the associated 90% pointwise prediction intervals.
Figure 6.
Posterior predictive checks. Panels (a-b): (90%) Posterior predictive intervals vs. summary statistics associated with average marijuana use and maximal level of marijuana use. Panels (c-d): (90%) Posterior predictive intervals and data points corresponding to the age at maximal marijuana use and age of first use.
In sum, our model provides two important features in modeling individual crime trajectories. First, it allows us to estimate a common age-crime curve and fit individual trajectories as departures from that mean curve. Second, it allows us to disentangle variation in individual crime trajectories due to differences in level (amplitude) and timing (phase shift) of offenses. This approach gives us new insights into modeling trajectories of marijuana use. We find the shape of the estimated common age-crime curve to be consistent with prior empirical research. Like previous research, we find race and gender differences in levels of marijuana smoking; unlike previous research, our finding controls for the common age-crime curve and individual variability in phase and amplitude. We find little support for the claims that differences in the shape of the age-crime curve are merely due to differences in rates of offending; for example, Hirschi and Gottfredson (1983) argue that racial differences in age-at-onset are merely due to racial differences in rates of offending. By contrast, we find significant race and gender differences in timing as indicated by the shift of marijuana use trajectories while controlling for race and gender differences in amplitude. Moreover, we do not detect any correlation, a posteriori, between individual amplitude and shift parameters, which would be expected if differences in timing were merely due to differences in amplitude.
5 Discussion
In this article we propose a generalized warping regression method for the analysis of longitudinal crime data. We model subject-specific expected patterns of offenses as arising from a natural unimodal age-crime curve, evaluated over a random individual-specific time transformation scale, and with a random individual-specific amplitude.
The analysis we present in this paper has several limitations. First, we chose to ignore the issue of heaping in the distribution of self-reported counts (Wang and Heitjan 2008). A more realistic sampling model would take into account tendencies to report smoking marijuana with a rounded frequency (most commonly, in our example, 365). Second, observations in the data set are right-censored at age 25 and may be left-censored at the age when individuals enter the survey. Third, the flexible specification of the mean function may be subject to criticism of overparameterization. Although these issues are important and deserve further attention, we believe that the hierarchical Bayesian formulation together with a flexible mean function in our model may already provide reasonable adjustments to heaping biases and issues of censored data. We alleviate overparametrization concerns via structural modeling constraints, shrinkage priors and model selection via posterior predictive checks.
The assumption of a unimodal population age-crime curve plays a key role in the proposed analysis framework and provides us with robustness to alternative model specifications involving varying flexibility of the population age-crime curve. Furthermore, this structural restriction is associated with improved predictive performance in this case study, when compared to the models without the unimodality restriction. A formal comparison of models fitted with and without the unimodality restriction is summarized in Figure 7. Panel (a) compares the posterior predictive loss (PPL) associated with the two models, evaluated for a varying number of spline basis knots. In our implementation the PPL is based on deviance calculations as defined in (14). Our derivation follows the argument in Gelfand and Ghosh (1998), who describe the PPL as a penalized goodness-of-fit measure, where the penalty term is associated with the magnitude of the predictive variance. For all levels of complexity, the posterior predictive loss associated with the unimodal curve model is lower than that associated with the unrestricted mean model. We obtain a detailed comparison of the two best models by calculating the conditional predictive ordinate p(yi | y(i)) for each individual in the sample (Pettit 1990; Geisser 1993). This summary is comparable to the classical cross-validation procedure. Here we evaluate the predictive density of yi given observations y(i), excluding counts from subject i. Figure 7, panel (b) shows that, for a clear majority of subjects, predictions based on the unimodal curve model outperform those obtained without imposing structural constraints. These results indicate that the unimodality assumption is appropriate for modeling marijuana smoking trajectories. Further empirical work is needed to confirm that the unimodality assumption is appropriate for modeling behavioral trajectories of drug use and crime in general.
This assumption would indeed be unnecessary if longer, less sparse time series of criminal behavior were available, in which case one might also question the assumption of a unique population age-crime curve. We maintain, however, that further modeling extensions may not be warranted in the analysis of the DYS data explored in this article, certainly not without substantial methodological developments.
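The conditional predictive ordinate discussed above can be approximated from MCMC output without refitting the model for each left-out subject. The sketch below uses the standard harmonic-mean identity, CPO_i = [E{1 / p(yi | θ)}]^(-1) with the expectation taken over the posterior, and assumes a Poisson likelihood purely for illustration. Both the estimator and the likelihood are assumptions made here rather than details taken from the analysis, and all names and numbers are hypothetical.

```python
import math


def poisson_logpmf(y, mu):
    """Log probability of count y under a Poisson distribution with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def log_cpo(y_i, mu_draws_i):
    """Log CPO for one subject.

    y_i           : observed counts for subject i
    mu_draws_i[s] : expected counts for those observations under MCMC draw s
    """
    neg_loglik = []
    for mu in mu_draws_i:
        loglik = sum(poisson_logpmf(y, m) for y, m in zip(y_i, mu))
        neg_loglik.append(-loglik)
    # Stable log of the average inverse likelihood across draws.
    top = max(neg_loglik)
    log_mean_inverse = top + math.log(
        sum(math.exp(v - top) for v in neg_loglik) / len(neg_loglik)
    )
    return -log_mean_inverse


# Toy comparison of two hypothetical models for one subject with counts (2, 0).
y_subject = [2, 0]
draws_model_a = [[1.0, 1.0], [1.5, 0.5], [2.0, 0.2]]
draws_model_b = [[6.0, 5.0], [7.0, 4.0], [5.0, 6.0]]
print(log_cpo(y_subject, draws_model_a) > log_cpo(y_subject, draws_model_b))
```

Computing this quantity for each subject under two competing models and comparing the values subject by subject gives a comparison of the kind summarized in Figure 7, panel (b). The harmonic-mean estimator can be unstable when a few draws assign very low likelihood to a subject's data, so results for such subjects should be read with caution.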
Our method of Bayesian hierarchical unimodal curve registration presents a novel approach to modeling data on behavioral trajectories that relies on a common underlying unimodal mean curve and models individual trajectories by specifying individual-specific deviations in phase and amplitude from the mean curve. Popular current approaches to analysis of longitudinal behavioral trajectories most often rely on growth curve models (Raudenbush and Chan 1993) and mixture trajectory models (e.g., Nagin and Land 1993). Like our approach, growth curve models also assume that all individuals share the same mean trajectory of crime. However, the mean trajectory is most commonly specified by a polynomial, and individual deviations from the mean are characterized by individual random intercept, slope and quadratic terms. Group-based trajectory models (Nagin and Land 1993; Roeder et al. 1999) aim to identify latent classes of crime trajectories, following from a theoretical taxonomy developed by Moffitt (1993), who described a population of offenders as a mixture of two different groups, adolescence-limited and life-course persistent offenders. Growth curve mixture models of Muthén and Shedden (1999) also assume latent classes of trajectories but, in addition, incorporate within-group polynomial random effects, as growth curve models do. In contrast to our curve registration approach, the existing approaches for longitudinal analysis of behavioral outcomes can only implicitly account for individual differences in timing, whether at the group or individual level. Because time-varying population estimates can only be consistent if individual trajectories have been properly aligned to a standardized time-scale (Brumback and Lindstrom 2004, Gervini and Gasser 2004, Telesca and Inoue 2008), substantive implications of failing to account for phase variability in the data can be enormous.
For example, without adjusting for phase variability, estimates of individual quantities related to timing, such as age at peak use, may not be reliable and findings of multiple latent groups of individual trajectories may be spurious. We believe that our approach has great potential to provide researchers with new insights for modeling drug and alcohol use as well as other behavioral outcomes.
Our model formulation explicitly accounts for variability in individual amplitude (level of offending) and timing, and includes covariate effects on these quantities. While the interpretation of covariate effects on amplitude is straightforward, covariate effects on timing are incorporated in the shift parameter and should be interpreted as such. Technically, the shift parameter in our model reveals differences in timing that are conditional on the individual curves being transformed to a common shape. The observed summary statistics may not correspond to the differences in timing/shift revealed by the model because observed differences in timing may be confounded with observed differences in shape. In the marijuana example, however, race and gender differences in timing found with our model were similar in magnitude and significance to observed race and gender differences in age at maximal use. Thus, in a regression of age at maximal use on race and gender, females were estimated to reach the age at maximal use significantly earlier than males and blacks significantly later than whites. If we were interested in examining race and gender differences in other specific timing features of the age-crime curve, such as age at onset and age at desistance, the model would have to be extended to incorporate time-stable covariates as time-varying effects. While this approach would induce higher flexibility in the modeling of longitudinal counts, the interpretation can be more challenging. On the other hand, the appealing features of direct inference about functional changes in the mean structure suggest that extending our model to incorporate time-varying effects of covariates may be worthwhile.
Another direction for a possible extension of our model relates to the foregoing discussion about identifying distinct groups of offending trajectories. Ramsay and Silverman (2002) carried out a functional principal component analysis on arrest data in an attempt to confirm or disprove the existence of distinct groups of criminal offenders and found “no real evidence of strong groups.” A group mixture reformulation of our model may allow for the classification of different features of the age-crime curve, from intensity of offense, to typical offending ages, to different shapes of the natural crime curve. Moreover, it would relax the assumption of a single population age-crime curve and attempt to fit models of multiple group trajectories, such as those proposed by Blumstein et al. (1985) and Moffitt (1993).
Another important question in criminology is to understand how criminal behavior changes in association with time-dependent covariate information. For example, do individual departures from a natural crime curve correspond to changes in life course transitions, such as high school dropout, entrance into college, parenthood, and entrance into the labor force? To address these questions, one needs to incorporate time-dependent covariates. This could be achieved, for example, by integrating our warping regression model with the historical functional linear model of Malfait and Ramsay (2003).
The above potential extensions to our model would capitalize on the strengths of our general approach to modeling crime trajectories. These strengths include (1) an individual-level model that is both flexible and realistic, and allows for differences in amplitude and timing of offenses; (2) a model that incorporates criminologists' specifications of an invariant age-crime curve with individual departures based on individual differences in crime propensity and life situations; and (3) an estimation procedure that borrows information between the population average age-crime curve and individual departures from that curve.
Supplementary Material
Acknowledgments
The authors would like to acknowledge funding from the National Institute on Drug Abuse (R01: DA019148-01A1), and a Seed Grant from the Center for Statistics and the Social Sciences, with funds from the University Initiatives Fund at the University of Washington. We would also like to acknowledge Richard Callahan for his superb research assistance. Telesca D. would like to acknowledge seed funding from the UCLA Senate.
Footnotes
1. A few respondents reported using marijuana more than twice a day. For the very few who reported using marijuana more than 3 times a day, answers were truncated to the maximum of 999.
2. We discuss unimodal smoothing in more detail in Section 2 of the supplementary material. In our analysis choosing ν* = √Y provided a reasonable reference scale on the magnitude of S(t, β), for the prior on ba to be reasonably centered around 0.
3. The inclusion of observations with a shorter time series would not affect the population estimates; however, posterior inference on subjects with fewer than 4 records can be misleading due to weak identifiability of the subject-level parameters.
4. National survey data covering the years of our survey show that observed race and gender differences in age at onset of marijuana use are small and change signs across survey years (Gfroerer et al. 2002).
References
- Blumstein A, Cohen J. Characterizing criminal careers. Science. 1987;237(4818):985–991. doi: 10.1126/science.237.4818.985.
- Blumstein A, Farrington DF, Moitra S. Delinquency careers: Innocents, desisters, and persisters. Crime and Justice. 1985;6:187–219.
- Brumback LC, Lindstrom MJ. Self modeling with flexible, random time transformations. Biometrics. 2004;60(2):461–470. doi: 10.1111/j.0006-341X.2004.00191.x.
- Bushway S, Sweeten G, Nieuwbeerta P. Measuring long term individual trajectories of offending using multiple methods. Journal of Quantitative Criminology. 2009;25(3):259–286.
- Chen MH, Shao QM. Monte Carlo estimation of Bayesian credible and HPD intervals. Journal of Computational and Graphical Statistics. 1999;8(9):69–92.
- De Boor C. A Practical Guide to Splines. Berlin: Springer-Verlag; 1978.
- Eilers PHC, Marx BD. Flexible smoothing with B-splines and penalties. Statistical Science. 1996;11:89–102.
- Elliott DS, Huizinga D, Ageton SS. Explaining Delinquency and Drug Use. Beverly Hills: Sage Publications; 1985.
- Esbensen FA, Huizinga D. Community structure and drug use: From a social disorganization perspective. Justice Quarterly. 1990;7:691–709.
- Gamerman D. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. Chapman & Hall Ltd; 1997.
- Geisser S. Predictive Inference: An Introduction. London: Chapman & Hall; 1993.
- Gelfand AE, Ghosh SK. Model choice: A minimum posterior predictive loss approach. Biometrika. 1998;85:1–11.
- Gelfand AE, Smith AF. Sampling based approaches to calculating marginal densities. Journal of the American Statistical Association. 1990;85:398–409.
- Gervini D, Gasser T. Self-modelling warping functions. Journal of the Royal Statistical Society, Series B: Statistical Methodology. 2004;66(4):959–971.
- Gfroerer J, Wu L, Penne M. Initiation of marijuana use: Trends, patterns, and implications. Technical Report Analytic Series: A-17, DHHS Publication No SMA 02-3711. Rockville, MD: Substance Abuse and Mental Health Services Administration, Office of Applied Studies; 2002.
- Glueck S, Glueck E. Unraveling Juvenile Delinquency. New York: The Commonwealth Fund; 1950.
- Gottfredson M, Hirschi T. A General Theory of Crime. Stanford, CA: Stanford University Press; 1990.
- Harris KM, Florey F, Tabor J, Bearman PS, Jones J, Udry JR. The National Longitudinal Study of Adolescent Health: Research Design. 2003. WWW document. URL: http://www.cpc.unc.edu/projects/addhealth/design.
- Hastings WK. Monte Carlo sampling using Markov chains and their applications. Biometrika. 1970;57:97–109.
- Hirschi T, Gottfredson MR. Age and the explanation of crime. American Journal of Sociology. 1983;89:552–584.
- Lang S, Brezger A. Bayesian P-splines. Journal of Computational and Graphical Statistics. 2004;13(1):183–212.
- Malfait N, Ramsay JO. The historical functional linear model. The Canadian Journal of Statistics. 2003;31(2):115–128.
- Matsueda RL, Kreager DA, Huizinga D. Deterring delinquents: A rational choice model of theft and violence. American Sociological Association. 2006;71:95–122.
- Moffitt TE. Adolescence-limited and life-course-persistent antisocial behavior: A developmental taxonomy. Psychological Review. 1993;100:674–701.
- Muthén S, Shedden K. Finite mixture modeling with mixture outcomes using the EM algorithm. Biometrics. 1999;55:463–469. doi: 10.1111/j.0006-341x.1999.00463.x.
- Nagin DS, Land KC. Age, criminal careers, and population heterogeneity: Specification and estimation of a nonparametric, mixed Poisson model. Criminology. 1993;31:327–362.
- Office of National Drug Control Policy. Marijuana Sourcebook 2008–Marijuana: The Greatest Cause of Illegal Drug Abuse. Office of National Drug Control Policy; 2008.
- Peña J. B-splines and optimal stability. Mathematics of Computation. 1997;66:1555–1560.
- Pettit L. The conditional predictive ordinate for the normal distribution. JRSS-B. 1990;52:175–184.
- Ramsay JO, Li X. Curve registration. Journal of the Royal Statistical Society, Series B: Statistical Methodology. 1998;60:351–363.
- Ramsay JO, Silverman BW. Applied Functional Analysis: Methods and Case Studies. New York: Springer-Verlag; 2002.
- Raudenbush S, Chan W. Application of a hierarchical linear model to the study of adolescent deviance in an overlapping cohort design. Journal of Clinical Consulting Psychology. 1993;61(6):941–951. doi: 10.1037//0022-006x.61.6.941.
- Rice JA, Silverman BW. Estimating the mean and covariance structure nonparametrically when the data are curves. Journal of the Royal Statistical Society, Series B: Methodological. 1991;53:233–243.
- Roberts OR, Rosenthal JS. Optimal scaling of various Metropolis–Hastings algorithms. Statistical Science. 2001;16(4):351–367.
- Roeder K, Lynch KG, Nagin DS. Modeling uncertainty in latent class membership: A case study in criminology. Journal of the American Statistical Association. 1999;94:766–776.
- Sampson RJ, Laub JH. Crime in the Making: Pathways and Turning Points Through the Life Course. Cambridge, MA: Harvard University Press; 1993.
- Schumaker LL. Spline Functions: Basic Theory. New York: John Wiley & Sons; 1981.
- Shi M, Weiss RE, Taylor JMG. An analysis of paediatric CD4 counts for acquired immune deficiency syndrome using flexible random curves. Applied Statistics. 1996;45:151–163.
- Smith A, Brian J. 2005. http://www.public-health.uiowa.edu/boa/BOA.pdf.
- Telesca D, Inoue LYT. Bayesian hierarchical curve registration. Journal of the American Statistical Association. 2008;103(481):328–339.
- Tierney L. Markov chains for exploring posterior distributions. The Annals of Statistics. 1994;22:1701–1728.
- Wang H, Heitjan D. Modeling heaping in self-reported cigarette counts. Statistics in Medicine. 2008;27(19):3789–3804. doi: 10.1002/sim.3281.
- Wolfgang M, Figlio R, Sellin T. Delinquency in a Birth Cohort. Chicago: University of Chicago Press; 1973.
- Yao F, Müller HJ, Wang JL. Functional data analysis of sparse longitudinal data. JASA. 2005;100(470):577–590.