Using time-delayed mutual information to discover and interpret temporal correlation structure in complex populations

D J Albers; George Hripcsak

doi:10.1063/1.3675621

2012 Jan 24;22(1):013111–013111-25. doi: 10.1063/1.3675621


PMCID: PMC3277606 PMID: 22462987

Abstract

This paper addresses how to calculate and interpret the time-delayed mutual information (TDMI) for a complex, diversely and sparsely measured, possibly non-stationary population of time-series of unknown composition and origin. The primary vehicle used for this analysis is a comparison between the time-delayed mutual information averaged over the population and the time-delayed mutual information of an aggregated population (here, aggregation implies the population is conjoined before any statistical estimates are implemented). Through the use of information theoretic tools, a sequence of practically implementable calculations is detailed that allows the average and aggregate time-delayed mutual information to be interpreted. Moreover, these calculations can also be used to understand the degree of homogeneity or heterogeneity present in the population. To demonstrate that the proposed methods can be used in nearly any situation, the methods are applied to the time series of glucose measurements from two different subpopulations of individuals from the Columbia University Medical Center electronic health record repository, revealing a picture of the composition of the population as well as physiological features.

In this paper, we show how to apply time-delayed mutual information (TDMI) to a sparse, irregularly measured, complicated population of time-dependent data. At a fundamental level, the technical problem is a probability density function (PDF) estimation problem; specifically, one can average PDF estimates or one can aggregate the data set before estimating the PDF. To understand and interpret these two means of coping with a population of time-series, one must address four issues: (1) estimator bias; (2) normalization or distribution support-based effects; (3) deviations from the single source case for average and aggregate; and (4) practical interpretation. Scientifically, this paper works to develop an infrastructure, and demonstrates how to use it, by studying the time-dependent correlation structure in physiological variables of humans—in a population of glucose time-series. In the end, we not only provide a practically actionable set of information theoretic computations that yield insight into the population composition and the time-dependent correlation structure but we also detail the time-dependent correlation structure and the degree of homogeneity within a broad population of humans via their glucose measurements.

INTRODUCTION

It is no surprise that aggregating collections of elements or data streams can allow for a productive analysis and understanding of the individual elements that make up the aggregated population. In fact, the aggregation of many elements into a measurable population can be pivotal in providing a means to study systems where the individual elements are difficult, expensive, or dangerous to measure. (Note that by aggregation, we mean combining sets of measurements in such a way that they can be treated as a single set of measurements that can be analyzed.) That aggregation provides a basis for analysis lies in the fact that the application of most statistical methods, such as statistical averages, probability density estimates, and techniques based on such fundamental methods (e.g., information theory, ergodic theory, etc.), requires large numbers of data points. While some fields have gained much from the analysis of aggregated populations of elements (such as the advances made in the physical sciences with the advent of statistical mechanics), many fields have not been so fortunate. A primary source of difficulty with aggregation in these less fortunate contexts lies in the fact that fortune or ruin often depends on the ability to aggregate measured elements such that statistical averages can be taken. Usually, this means that one must have a population of elements whose statistical properties being quantified are drawn from the same distributions. This requirement presents two inextricable problems: verifying that a population is homogeneous enough to produce representative statistics when aggregated, and determining whether a statistical analysis technique will yield the same outcome for the average over the population and for the aggregated population.

With these broad issues in mind, here, we focus on applying time-delayed mutual information to a population in an attempt to understand the time-dependent nonlinear correlation between measurements or the degree of predictability of measurements for members of a population. We wish to apply this, however, to a system whose members may: (1) have differing numbers of measurements; (2) have too few measurements for probability densities (or any other statistical quantities) to be estimated; (3) be non-stationary; (4) have very diverse underlying probability distributions or statistical states; and (5) be measured in a highly irregular manner in time. In short, this paper details how to apply and interpret information theoretic analysis to a diversely measured, possibly statistically diverse population that needs to be aggregated for the information theoretic quantities to be calculable. Thus, this paper complements and contrasts with research such as that presented in Ref. 1, where the dynamical reconstruction of uniformly measured stationary systems with short time-series is the focus. The particular population we focus on in this paper is a subpopulation of human beings who received care at the Columbia University Medical Center (CUMC). The particular time-series we are focusing on are clinical chemistry measurements (measurements such as glucose, that detail physiological functioning of humans) for this population. Nevertheless, it is important to note that the analysis presented is not limited to any particular population of measurements.

A reader’s guide: The outline of this paper

Broadly, this paper can be split into two main components. The first component is primarily theoretical and includes: a background section (Sec. 3); a section about TDMI-specific estimator bias (Sec. 4); a section focusing on how the TDMI for a population can deviate from the TDMI of an individual stationary source (Sec. 5); and finally, a section explaining how to use the TDMI population calculations to characterize diversity in a population (Sec. 6). Second, following the more theoretical sections are the computational sections including: a section proposing some non-TDMI based metrics for evaluating population diversity that help verify the TDMI-based methods (Sec. 7); a section summarizing the TDMI methodology explicitly (Sec. 8); and finally the data-based Sec. 9 demonstrating the methodology. Regardless of intent, readers will need to read the introductory Secs. 1, 2, and 3 and the discussion and summary.

MOTIVATING EXAMPLES

The theory-based motivation for this work is to devise a way to calculate and interpret the TDMI (Refs. ²,³) in the context of a population of time-series that are both sampled irregularly and are from (possibly) statistically distinct sources. More concretely, the motivation for this work comes from the desire to understand human health dynamics (i.e., physiology, complex phenotype definitions such as diseases, basic biology, etc.) based on the constraints of real data present in the electronic health record (EHR) repository at CUMC (note, CUMC is affiliated with NewYork-Presbyterian Hospital (NYP)).

The CUMC EHR contains all the information collected regarding the 3 million patients over 20 years at NYP and contains graphical images, laboratory data, drug data, doctor and nurse notes, billing data, and demographic data, most of which is highly dependent on time; moreover, the amount of data is growing exponentially. Despite the quantity of data, EHR data can be difficult to use; in particular, EHR data is characterized by: diverse irregular sampling, measurements correlated to statistical state, nonstationarity, statistically diverse population, very large populations with few measurements, and very diverse data types. Nevertheless, if these data prove to be useful for understanding human dynamics, a subject that is not completely without controversy,⁴,⁵,⁶ it may be possible: to define complex diseases and other phenotypes based on population scale data; to understand how disease and treatment of disease evolve in complex and interconnected ways;⁷,⁸ to correlate drugs to side effects and benefits; and to carry out many other practical applications that can be gained from understanding population-wide human health and biology. The approach taken in this paper using nonlinear physics methodology represents a radical departure from the standard utilization of biomedical data and has been termed by some⁹ as the physics of living things.

The advantage of motivating this paper with a data set with complex properties is that it allows for the generalization of the results to many other circumstances. Therefore, while we apply our analysis in the context of human health and physiology, our methods can be easily generalized to nearly all time-dependent contexts; e.g., astronomy,¹⁰ geology,¹¹ climatology,¹² and genetics.¹³

INFORMATION THEORY BACKGROUND

Begin with a time-series, $X = (x_{1} (t_{1}), \dots, x_{N} (t_{N}))$, of real numbers. Next, denote all of the pairs of points in X separated by either an index time, τ = i − j (where i > j are the indices of t_i and t_j, respectively), or a real time, δt = t_i − t_j (again assume t_i > t_j), by X[τ] or X[δt], respectively. Note that τ is always an integer while δt can take continuous real values. For this section, we will limit the discussion to X[τ], but note that the X[δt] case follows identically. Note that in this circumstance, X[τ] can be used to approximate a joint (two-dimensional) PDF; further, note that the marginal distributions of X[τ] are approximated by X[τ](1) = X(i) and X[τ](2) = X(i − τ), respectively.

To estimate either the information entropy or the TDMI for this time-series,²,³ one must first estimate various PDFs.¹⁴ In order to specify a PDF, one needs to specify both the support of the PDF, S, and the PDF itself, p(X). Intuitively, the support of the PDF is the interval over which the x_i’s lie; that is, the support of the PDF of X is S = [min(X), max(X)]. However, when estimating a PDF from data, the support will always be collected in a series of bins; thus, there also exists an abstract support, written here as $\mathcal{S}$ to distinguish it from S, which consists of the explicit bins of the data used to estimate the PDF, disconnected from the values the bins are assigned externally. Thus, $\mathcal{S}$ does not explicitly represent numbers in X; while this may seem like a strange point to make, the difference between S and $\mathcal{S}$ will be critical later in this paper. Finally, note that we will always assume that PDFs in this paper have compact support.¹⁵

Now, given the random variable X and its associated PDF, p(X), the information entropy of a time series generated by X is defined by

h_{I} = - \int_{S} p (X) \log (p (X)) dX .

(1)

Similarly, the TDMI is defined by

I (X (i); X (i - τ)) = I (X [τ]) = \int p (X (i), X (i - τ)) \times \log \frac{p (X (i), X (i - τ))}{p (X (i)) p (X (i - τ))} dX (i) dX (i - τ) .

(2)

Thus, the TDMI can be thought of as an auto-information measure that depends on a delay (e.g., τ or δt).
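As a rough illustration of Eq. 2 for a single, uniformly sampled source, the following minimal sketch computes a plug-in histogram estimate of the TDMI at an index delay τ. It is not the estimator used in this paper: the function name `tdmi_histogram`, the default of 20 bins, the use of natural logarithms, and the absence of any bias correction are all assumptions made for the example.

```python
import numpy as np

def tdmi_histogram(x, tau, bins=20):
    """Plug-in histogram estimate of I(X(i); X(i - tau)) in nats (illustrative only)."""
    x = np.asarray(x, dtype=float)
    if tau < 1 or tau >= x.size:
        raise ValueError("tau must satisfy 1 <= tau < len(x)")
    a, b = x[tau:], x[:-tau]                  # the pairs (X(i), X(i - tau))
    counts, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = counts / counts.sum()              # joint probability per bin
    p_a = p_ab.sum(axis=1, keepdims=True)     # marginal of X(i), shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)     # marginal of X(i - tau), shape (1, bins)
    mask = p_ab > 0                           # empty bins contribute zero
    ratio = p_ab[mask] / (p_a @ p_b)[mask]    # p(a, b) / (p(a) p(b))
    return float(np.sum(p_ab[mask] * np.log(ratio)))
```

For a strongly autocorrelated series (for example, a slowly varying sine wave) this estimate at τ = 1 should be well above the value obtained for independent noise, whose nonzero estimate reflects only estimator bias.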

Given this infrastructure, fundamentally, there are two ways of conjoining a population: (1) averaging the TDMI for each member of the population and (2) aggregating the population before the PDFs are estimated without intermixing the members of the population. As we will see, in the context of a heterogeneous population, these two approaches will yield both differing numerical results and differing interpretations.

Computationally, it is important to note that we will employ both a kernel density estimator (KDE)¹⁶,¹⁷,¹⁸ and a standard histogram estimator for all PDF calculations. We explicitly use the estimator developed in Ref. 16 with a Gaussian kernel and a bandwidth of 100; the histogram estimator is of our own design and has a bandwidth of 20. The results detailed in this paper are relatively insensitive to these parameter settings (e.g., a 10% change in the bandwidth will not produce a qualitatively different result). Moreover, in this paper, we will estimate the bias using the fixed point bias estimation technique,¹⁹ which amounts to various random permutations of the temporal ordering of the time-series used to generate the PDFs and will be introduced in more detail in Sec. 4B. Finally, while this paper only addresses the continuous case, the discrete case follows more or less identically with integrals replaced by sums.

Average TDMI

To formulate the average TDMI for a population, we begin by arguing that the average mutual information of a vector of individuals (a population) is the same as the average of the mutual informations of each individual, if the individuals are independent. These cases represent conjoining a population after the PDFs have been estimated; in essence, we are just arguing that taking an average before or after the TDMI integration is performed does not affect the resultant TDMI.

Assume all processes are stationary. Define a vector-valued process X, where $X (t) = [X_{1} (t), X_{2} (t), \dots, X_{N} (t)]$ ; this leads to the following definition of multivariate mutual information

I [X (t); X (t + j)] = \int p (X (t), X (t + j)) \times \log \frac{p (X (t), X (t + j))}{p (X (t)) p (X (t + j))} dX (t) dX (t + j),

(3)

noting that p(·) is the probability density associated with the given random variable and X(·) and dX(·) are both vectors. We want the following statement to be true:

\frac{1}{N} I [X (t); X (t + j)] = \frac{1}{N} \sum_{i = 1}^{N} I [X_{i} (t); X_{i} (t + j)] .

(4)

We claim that a sufficient condition for Eq. 4 to hold is for the X_i processes to be non-interacting or statistically independent. It is important to note that it is not necessary that the X_i’s be non-interacting copies of the same process; the processes only have to be statistically independent. It is not too difficult to verify our claim algebraically: one merely applies the chain rule for mutual information to Eq. 4. Moreover, conceptually understanding why our claim is correct is rather straightforward. Begin by noting that if the X_i’s are independent, they form an orthogonal set of probability densities or a product measure on N-dimensional Euclidean space. Thus, the integral of each variable will be independent of the others simply because the variables are orthogonal and thus not functions of one another (cf. Fubini’s theorem²⁰).

The conclusion is that the average TDMI for the population is simply the canonically calculated TDMI for the individuals of the population, averaged.

Aggregate TDMI

To understand the construction where the population is aggregated before the PDFs are estimated, assume, as we did in Sec. 3A, a stationary, vector-valued process X, where X(t) = [X₁(t), X₂(t),…X_N(t)], where N denotes the number of individuals in the population. Next, assume that each element emits a time-series of length n_i; without loss of generality, in this section, assume that n_i = n.

Aggregating the population into a time-series for which the PDFs can be estimated can be done in one of two ways. The first method involves concatenating the entire set of time-series into one scalar time-series of length Nn and then treating this concatenated time-series like a time-series from a single source; denote this aggregation method as inter-source aggregation. We will not study this as this calculation needlessly adds noise via the intermixing of elements and is hard to rectify with mathematics. The second method, denoted the intra-source aggregation because sources are not intermixed within pairs of points, involves explicitly collecting pairs of points restricted to individuals. Specifically, the pairs of points are chosen such that the individual pairs of points always originate from the same individual, and then these sets of pairs of points are conjoined such that the PDFs can be estimated. Thus, this method mixes individuals by including pairs of points from many individuals, but does not mix individuals by pairing points from differing individuals.

To concretely specify what intra-source aggregation means, begin with the time series

(x_{11}, x_{12}, \dots, x_{1 n}, x_{21}, \dots, x_{Nn}),

(5)

where, for a given x_ij, i specifies the individual and j specifies the time. Let τ be the time delay for which the TDMI is to be calculated. The intra-source pairs that will be aggregated and used for estimating the PDF are then

\begin{matrix} (x_{1, 1}, x_{1, τ + 1}) \\ (x_{1, 2}, x_{1, τ + 2}) \\ ⋮ \\ (x_{1, n - τ}, x_{1, n}) \\ (x_{2, 1}, x_{2, τ + 1}) \\ ⋮ \\ (x_{2, n - τ}, x_{2, n}) \\ ⋮ \\ (x_{N, n - τ}, x_{N, n}), \end{matrix}

(6)

Thus, denote the left column by $X_{1}^{n - τ}$ and the right column by $X_{τ}^{n}$ . Moreover, denote the TDMI calculated between these two columns as $I (X_{1}^{n - τ}; X_{τ}^{n})$ .
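The construction of Eq. 6 can be sketched in a few lines. The helper below, `intra_source_pairs`, is a hypothetical name introduced for illustration; unlike the text, it does not assume equal lengths n_i = n, and it silently skips individuals with too few points to form a pair at the requested delay.

```python
import numpy as np

def intra_source_pairs(series_list, tau):
    """Stack the pairs (x_{i,j}, x_{i,j+tau}) of Eq. 6 over all individuals.

    Pairs are only ever formed within one individual's series, so no pair
    mixes measurements from two different individuals.
    """
    if tau < 1:
        raise ValueError("tau must be a positive integer")
    left, right = [], []
    for x in series_list:
        x = np.asarray(x, dtype=float)
        if x.size > tau:                 # too-short series contribute no pairs
            left.append(x[:-tau])        # x_{i,1}, ..., x_{i,n-tau}
            right.append(x[tau:])        # x_{i,tau+1}, ..., x_{i,n}
    if not left:
        return np.empty(0), np.empty(0)
    return np.concatenate(left), np.concatenate(right)
```

The two returned arrays play the role of the left and right columns; the aggregate TDMI is then the mutual information estimated between them with whichever PDF estimator is in use.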

Much of the rest of this paper is dedicated to quantifying when, and under what conditions, the average and aggregate TDMIs differ, and to interpreting those differences. However, by comparing average to aggregate TDMI, we will also see that, very often (but not always), the aggregate TDMI will form an upper bound on the TDMI of an individual.

TDMI-SPECIFIC ESTIMATOR BIASES

All statistical estimates have bias associated with them. Here, we focus on three sources of bias that are particular to the estimation of the TDMI for a population: (1) sample-size-dependent estimator bias effects for the average versus the aggregate TDMI; (2) the basic methodology we use for numerically estimating the bias for the TDMI calculation; and (3) a source of non-estimator bias that is particular to the TDMI aggregation case—a sort of filtering bias.

Sample size dependent estimator bias effects

A practical reason why the order of aggregation matters for estimating probability densities lies in the fact that most probability density estimation techniques have estimator bias that is, to first order, proportional to one over the number of points to a power of at least one. Thus, because we are interested in coping with populations of poorly measured individuals and because we are comparing two methods of conjoining those individuals, it is important to understand how the number of data points will broadly affect estimator bias in the average and aggregate TDMI calculations.

Begin with a more computationally minded definition of the TDMI for a single time-series from a single source with n points

I [X_{i} (t); X_{i} (t - j)] = I_{X_{i}} (n) + B_{E} (n),

(7)

where $I_{X_{i}} (n)$ is the estimated TDMI for the n pairs of points of X_i and B_E(n) is the total estimator bias of the calculation with n pairs of points. Note that while explicit bias calculations for the entropy and TDMI calculations can be found in Refs. ¹⁹,²¹, and ²², it will suffice to notice that for most PDF estimators (e.g., kernel density estimators or histogram-style estimators), the bias estimates will follow:

B_{E} (n) \sim n^{- 1} .

(8)

Nevertheless, it is worth noting that there is also an estimator-specific, bandwidth-specific factor on B_E(n) that is dependent on the proportion of support (e.g., number of bins) for which there exist no data points, and this factor can be important when n is small (cf. Ref. ²², where this effect is carefully quantified for the histogram estimator). To see how the bias of averaging the TDMI over the population differs from the bias of the TDMI for the population aggregated before PDF estimation, partition the time-series of length n into m pieces, where $\frac{n}{m}$ is a positive integer (thus, m divides n evenly and n ≥ m). Now, consider the difference between I[X_i(t), X_i(t − j)] calculated on a single time-series of length n, and I[X_i(t), X_i(t − j)] calculated on m disjoint time-series of length $\frac{n}{m}$ and then averaged. More specifically, consider

I = I [X_{i} (t), X_{i} (t - j)] = I_{X_{i}} (n) + B_{E} (n)

(9)

versus

I' = \frac{1}{m} \sum_{i = 1}^{m} [ I_{X_{i}} (n / m) + B_{E} (n / m) ] .

(10)

Now, if the bias, B_E, were independent of the number of points, n, then the bias contribution of Eq. 9 would be the same as the bias contribution of Eq. 10. However, we know the bias obeys a power law in the number of points, n, so the difference between the bias estimates is at least

δ B_{E} = (\frac{1}{m} \sum_{i = 1}^{m} B_{E} (n / m)) - B_{E} (n),

(11)

\sim \frac{m - 1}{n},

(12)

where δB_E > 0 for all m > 1. Or, said differently,

\frac{1}{m} \sum_{i = 1}^{m} B_{E} (n / m) \geq B_{E} (n),

(13)

where equality is satisfied only when m is one, or when the population consists of a single element. Note that when the population is particularly poorly sampled, say one or two measurements per element of the population, then m ≈ n and thus the difference in the bias of the population average versus the aggregated population will be of order one. More importantly, averaging the mutual information (MI) of many poorly sampled individuals will not help the MI converge to its bias-free, high cardinality estimate.
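The arithmetic behind Eqs. 11 to 13 can be checked numerically under the simplest bias model consistent with Eq. 8. The sketch below assumes B_E(n) = c/n exactly, with an arbitrary constant c; both the function name `bias_gap` and that exact form are assumptions made for illustration, not a statement about any particular estimator.

```python
def bias_gap(n, m, c=1.0):
    """delta B_E of Eq. 11 under the assumed model B_E(n) = c / n.

    n: total number of pairs; m: number of equal pieces the series is split into.
    """
    if m < 1 or n % m != 0:
        raise ValueError("m must be a positive divisor of n")

    def bias(k):
        return c / k

    averaged = sum(bias(n // m) for _ in range(m)) / m   # (1/m) * sum of B_E(n/m)
    return averaged - bias(n)                            # equals c * (m - 1) / n
```

Under this model, splitting n = 100 pairs into m = 10 pieces gives a gap of about 9/100, and the fully fragmented case m = n gives a gap of (n − 1)/n, which is the order-one difference noted above.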

Aside from the overall effect of n, there are other small sample size effects and these effects can have profoundly different outcomes depending on the estimator. For instance, in the presence of few points, a KDE will often, in the name of smoothing, over-estimate the probability for empty portions of the support, resulting in a PDF estimate that is closer to that of a uniform random variable. Thus, a KDE-PDF based TDMI calculation will likely underestimate the TDMI. In contrast, a histogram estimator will underestimate the probability for empty portions of the support, thus yielding a more sharply peaked distribution that will yield an over-estimate of the TDMI. Because of these opposing effects, it is possible to verify the existence of finite-size effects by simply observing the difference between the KDE-based and histogram-based TDMI estimates for the same data set.

In the end, because we are working to understand how to estimate the TDMI in the context of large, poorly measured populations, there will be a significant advantage to aggregating populations before estimating the PDFs necessary to carry out the TDMI calculations from the perspective of estimator bias minimization.

Fixed point bias estimate for average and aggregate populations

The fixed point TDMI bias estimation method¹⁹ attempts to estimate the τ = ∞ TDMI by randomly permuting the time-ordering of one of the sets of pairs used to estimate the distributions for a given δt or τ. Fundamentally, there are two different methods for estimating the TDMI fixed point (if it exists): random permutation within the individuals (i.e., not mixing individuals), and random permutation over the entire population, thus intermixing individuals.

The first method, individual-wise random permutation (IRP), involves randomly permuting the temporal ordering of one column (without replacement) of the data set used to estimate the distributions without intermixing individuals, or

B_{IRP} (τ, n) = \lim_{Z \to \infty} \frac{1}{Z} \sum_{i = 1}^{Z} I (X_{1}^{n - τ}, X_{τ}^{n} (i, t)),

(14)

where $X_{τ}^{n} (i, t)$ is the ith random permutation (without replacement) of the second (time) index of the column vector $X_{τ}^{n}$ (i.e., the first index of $x_{i, j}$ from Eq. 6 is not permuted). The IRP-method random permutation occurs only within an individual and not across the population, thus destroying information about only time-based correlations while preserving inter-individual information. Finally, there will exist an IRP bias estimate for both the average and aggregate TDMI cases: ${\bar{B}}_{IRP}$ , where Eq. 14 is specified for a single individual and then averaged over the population, and ${\hat{B}}_{IRP}$ , which is specified exactly as per Eq. 14.

The second method, population-wide random permutation (PRP), which exists only in the aggregated population context, involves randomly permuting, without regard to the individual, one column of the entire population’s data set used to estimate the PDFs, or

{\hat{B}}_{PRP} (τ, n) = \lim_{Z \to \infty} \frac{1}{Z} \sum_{i = 1}^{Z} I (X_{1}^{n - τ}, X_{τ}^{n} (i, N, t)),

(15)

where $X_{τ}^{n} (i, N, t)$ is the ith random permutation (without replacement) of both indices of the column vector $X_{τ}^{n}$ . Because the PRP estimate intermixes both the population and time, the PRP destroys information about both intra-individual time correlations and inter-individual information (i.e., information about differences in normalization or the supports). In the context of a single source, ${\bar{B}}_{IRP} = {\hat{B}}_{IRP} (n) = {\hat{B}}_{PRP} (n)$ . Similarly, when the population is relatively uniform over both the PDFs and the support of the PDFs, the PRP bias estimate will be equivalent to the IRP bias estimate, and thus can be thought of as an estimate of the estimator bias. However, if the support of the PDFs over the population is not uniform (i.e., if the support of any of the individuals of the population differs from the support of the population), then the PRP bias estimate will differ from the IRP bias estimate (we will discuss this explicitly in Sec. 6A). Note that ${\bar{B}}_{IRP}$ , ${\hat{B}}_{IRP}$ , and ${\hat{B}}_{PRP}$ are dependent on both the delay, τ or δt (because of the filtering effect discussed in Sec. 4C), and n, the number of points used in the estimate. In general, we will drop the n from the notation, and when there is not a τ or δt dependence, we will not include it in the notation (in general, for the data sets and δt’s we consider in this paper, there is not a strong δt dependence).
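The difference between the two permutation schemes can be made concrete with a short sketch. All names here (`irp_shuffle`, `prp_shuffle`, `permutation_bias`, and the `ids` array labelling which individual each pair came from) are hypothetical, the mutual information estimator `mi` is supplied by the caller, and the limit Z → ∞ of Eqs. 14 and 15 is replaced by a finite number of permutations.

```python
import numpy as np

def irp_shuffle(right, ids, rng):
    """IRP: permute the right column within each individual only."""
    right = np.asarray(right, dtype=float)
    ids = np.asarray(ids)
    out = right.copy()
    for i in np.unique(ids):
        idx = np.flatnonzero(ids == i)        # rows belonging to individual i
        out[idx] = rng.permutation(right[idx])
    return out

def prp_shuffle(right, rng):
    """PRP: permute the right column across the whole population."""
    return rng.permutation(np.asarray(right, dtype=float))

def permutation_bias(left, right, shuffle, mi, n_perm=50):
    """Average mi(left, shuffled right) over n_perm permutations.

    shuffle: callable taking the right column and returning a permuted copy.
    mi: caller-supplied estimator of mutual information between two columns.
    """
    return float(np.mean([mi(left, shuffle(right)) for _ in range(n_perm)]))
```

A typical call would pass `lambda r: irp_shuffle(r, ids, rng)` or `lambda r: prp_shuffle(r, rng)` as `shuffle`; a gap between the two resulting averages then points to inter-individual differences in support or normalization rather than to estimator bias alone.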

Non-estimator bias: How the TDMI calculation can act as a population filter

While it is clear that the TDMI calculation only applies to the data used to estimate the PDFs, it is less obvious that the act of constructing the data sets used to estimate the PDFs can filter out substantial portions of the overall population. Specifically, because construction of the data sets for the PDF estimation involves collecting all pairs of points separated by some time τ or δt, if some individuals do not have pairs of points separated by τ or δt, those individuals will be filtered out of, or excluded from, the data set used to estimate the PDFs and thus the TDMI. In this sense, the TDMI calculation implicitly filters the population by measurement frequency; this is not an externally imposed data constraint, it is simply a result of calculating the TDMI in the context of a population whose elements do not have identical measuring frequencies.

To understand how this filtering bias can affect the results, consider a polarized example population made up of two differently measured subsets of individuals. Specifically, the first subset of the population has individuals sampled once an hour for a month and the second subset of the population has individuals sampled once a month for 20 years. These two populations represent patients with acute and chronic conditions, respectively. If the TDMI of the population is calculated for any δt less than a month, only data set one will be represented. Similarly, if the TDMI is calculated for δt of a month or greater, only data set two will be represented. When plotting the TDMI graph versus δt, the graph has, in a sense, a bias. Namely, the graph represents two disjoint populations: one for δt less than one month and another for δt of one month or greater.

Of course, for real EHR data, even more complicated problems can appear when the same individual is sampled at different rates depending on the statistical state of the individual (e.g., a patient with a chronic and acute condition). This problem is particularly acute for health care data because health correlates with presence of measurement—healthy patients are not measured often while sick patients are—thus leading to the possibility of having different subpopulations or statistical states being filtered out when calculating the TDMI for some δt values.

Thus, when estimating a TDMI for a population, it is important to quantify both who is populating the data set explicitly used to estimate the PDFs and how the proportionality of the subpopulations changes in the set used to estimate the PDFs as the delay is changed. If the population and proportionality of subpopulations in all the δt or τ TDMI estimates does not change, then the bias estimates are independent of the delay.
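The filtering effect can be seen by counting, for each individual, how many pairs of measurements fall in a given δt bin. The sketch below is one simple way to do this from raw measurement times; the function name `pair_counts`, the half-open bin [dt_lo, dt_hi), and the use of hours as the time unit in the usage note are assumptions for the example.

```python
import numpy as np

def pair_counts(times_by_individual, dt_lo, dt_hi):
    """Count, per individual, the pairs with dt_lo <= t_k - t_j < dt_hi.

    times_by_individual: one sequence of measurement times per individual.
    Requires dt_lo > 0 so that only ordered pairs with t_k > t_j are counted.
    """
    if dt_lo <= 0 or dt_hi <= dt_lo:
        raise ValueError("need 0 < dt_lo < dt_hi")
    counts = []
    for t in times_by_individual:
        t = np.asarray(t, dtype=float)
        gaps = t[:, None] - t[None, :]               # all pairwise time differences
        in_bin = (gaps >= dt_lo) & (gaps < dt_hi)
        counts.append(int(np.count_nonzero(in_bin)))
    return np.array(counts)
```

Applied to the polarized example above, with an "acute" individual measured hourly for 720 hours and a "chronic" individual measured every 720 hours for 240 samples, a one-day bin contains pairs only from the acute individual, while a one-month bin contains pairs only from the chronic individual.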

Methods for assessing δt bin compositions

To quantify the composition of the data set, begin with the following notation: (1) b_i(τ) represents the number of pairs of points in the τ time bin contributed by individual i; (2) b_max(τ) = N_max and b_min(τ) = N_min correspond to the maximum and minimum number of pairs of points, over all individuals, present in the data set; (3) N_* represents the sum of the b_i(τ), or the total number of pairs of points in the data set; (4) N represents the total number of individuals in the population; and (5) ς(τ) represents the set of indices of individuals monotonically ordered by increasing b_i. Based on these quantities, define the following functions:

Θ (ς (τ)) = b (ς),

(16)

\tilde{Θ} (\tilde{ς} (τ)) = \frac{b (\frac{ς (τ)}{N})}{b_{\max}},

(17)

noting that $\tilde{Θ} (τ)$ (Ref. ²⁷) is Θ(τ) normalized to lie on the unit square. Next, define the following integral that quantifies the population composition of the data set:

H_{\tilde{Θ}} (τ) = \int_{\tilde{ς}} \tilde{Θ} d \tilde{ς} .

(18)

When the time series of the members of the population are both uniformly sampled and of the same length, $H_{\tilde{Θ}} (τ)$ will be equal to one; thus, the closer $H_{\tilde{Θ}} (τ)$ is to one, the more uniformly the data set represents the entire population, while the closer $H_{\tilde{Θ}} (τ)$ is to zero, the more the data set represents a small subset of the population (possibly only a single individual). A second, coarser quantification of how the population is represented in the TDMI data set at a fixed δt is the fraction of individuals that contribute at least one pair to the data set, or

H_{b_{i} \neq 0} (τ) = \frac{\# (b_{i} \neq 0)}{N} .

(19)

Note that an alternative, highly related quantity we have found useful is the cumulative distribution function (CDF) of the b_i’s.
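Given the per-individual pair counts b_i for one delay bin, both composition measures reduce to a few array operations. The sketch below approximates the integral of Eq. 18 by a Riemann sum with N equal steps of width 1/N, which is an assumption about the discretization rather than something specified in the text; the function name `composition_metrics` is likewise hypothetical.

```python
import numpy as np

def composition_metrics(b):
    """Return (H_theta, H_nonzero) for the pair counts b_i of one delay bin.

    H_theta approximates Eq. 18 as the mean of the ordered, normalized counts;
    H_nonzero is the fraction of individuals with at least one pair (Eq. 19).
    """
    b = np.asarray(b, dtype=float)
    if b.size == 0 or b.max() <= 0:
        raise ValueError("need at least one individual contributing a pair")
    theta = np.sort(b) / b.max()                   # ordered curve scaled to the unit square
    h_theta = float(theta.mean())                  # Riemann sum with step 1/N
    h_nonzero = float(np.count_nonzero(b) / b.size)
    return h_theta, h_nonzero
```

With counts of (5, 5, 5, 5) both measures equal one, reflecting a uniformly represented population, whereas counts of (0, 0, 0, 10) give 0.25 for both, reflecting a bin dominated by a single individual.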

Finally, while it is tempting to think of the population makeup of the τ data set as a measure of homogeneity within a population, this interpretation is sometimes, but not always, correct. What $H_{\tilde{Θ}} (τ)$ , $H_{b_{i} \neq 0} (τ)$ , or any other like-minded metric really detail is how a population is measured and thus represented in a given τ or δt bin. Specifically, when measurement frequency is correlated with statistical state or dynamics, then it is likely that τ bins will filter a population and make it more homogeneous. However, it is easy to think of examples where measurement frequency is random, or uncoupled from a statistical state or dynamics, and in this case, all the diversity of the population will be present in any given τ time bin.

POPULATION-BASED DEVIATIONS FROM THE INDIVIDUAL TDMI ESTIMATES

Heterogeneity-based deviations from the individual: Average TDMI case

To understand how representative the average MI over the population is of an individual in the population, begin by setting p₁ as the PDF, among the set of p_i’s, that most resembles the average relative to the abstract support (choosing p₁ to be the median among the p_i’s would work as well); note that the average PDF is defined by

\bar{p} = \frac{1}{N} \sum_{i = 1}^{N} p (X_{i} [τ]) .

(20)

Note that in this situation, every p_i has the same abstract support (by definition), which we will denote as $\bar{S}$ . Further, note that it is possible to have a set of p_i’s such that no p_i resembles the mean graph of the p_i’s. Next, relative to p₁, we can now relate each p_i to p₁ as follows:

p_{i} = p_{1} (\bar{S}) - {\bar{ε}}_{i} (\bar{S}),

(21)

where ${\bar{ε}}_{i} (\bar{S})$ is the distance between the graphs of p₁ and p_i at a given value in $\bar{S}$ . Recalling the definition of the TDMI, we get

I [X (t); X (t + τ)] = \bar{I} (τ) = \frac{1}{N} \sum_{i = 1}^{N} \int p (X_{i} (j), X_{i} (j - τ)) \times \log (\frac{p (X_{i} (j), X_{i} (j - τ))}{p (X_{i} (j)) p (X_{i} (j - τ))}) \times {dX}_{i} (t) {dX}_{i} (t + τ) = \int \bar{ι} (τ) dX (t) dX (t + τ) .

(22)

Now, because integration is a linear operation, focus on the integrand instead, or more specifically, focus on

\begin{matrix} \bar{ι} (τ) = \frac{1}{N} \sum_{i = 1}^{N} p (X_{i} (j), X_{i} (j - τ)) \log (\frac{p (X_{i} (j), X_{i} (j - τ))}{p (X_{i} (j)) p (X_{i} (j - τ))}) = p (X_{1} (j), X_{1} (j - τ)) \log (\frac{p (X_{1} (j), X_{1} (j - τ))}{p (X_{1} (j)) p (X_{1} (j - τ))}) + \bar{G} (N, ε_{i}, p (X_{1} (j), X_{1} (j - τ)), p (X_{1} (j)), p (X_{1} (j - τ))) = \bar{ρ} (τ) + \bar{G} (τ), \end{matrix}

(23)

where, $\bar{G} (τ)$ is given by

\bar{G} (τ) = - \frac{1}{N} [\sum_{i = 1}^{N - 1} (\frac{{\bar{ε}}_{i}}{p (X_{1} (j), X_{1} (j - τ))}) \times (\log \frac{p (X_{1} (j), X_{1} (j - τ))}{p (X_{1} (j)) p (X_{1} (j - τ))}) + \log (\frac{1 - \frac{{\bar{ε}}_{i}}{p (X_{1} (j), X_{1} (j - τ))}}{(1 - \frac{{\bar{ε}}_{i}}{p (X_{1} (j))}) (1 - \frac{{\bar{ε}}_{i}}{p (X_{1} (j - τ))})}) \times (\frac{{\bar{ε}}_{i}}{p (X_{1} (j), X_{1} (j - τ))} - 1)],

(24)

(for a more explicit calculation of $\bar{I}$ , cf. Appendix A). As each ${\bar{ε}}_{i}$ goes to zero, $\bar{G}$ goes to zero; thus, the more support independent variance (recall ${\bar{ε}}_{i}$ is relative to the abstract support $\bar{S}$ ) there is within the population, the larger $\bar{G}$ will be, and the less $\bar{I} (τ)$ will represent the TDMI of an individual element within the population. Written explicitly, $\bar{I} (τ)$ represents the “average” individual plus the sum of the deviations from that individual.
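The selection of p₁ and the deviations ${\bar{ε}}_{i}$ in Eqs. 20 and 21 can be sketched numerically. The sketch below rests on several assumptions that are not taken from the text: the abstract support is realized by rescaling each individual's data to the unit interval, the PDFs are plain histogram estimates, and "most resembles the average" is measured with an L₁ distance. All function names are hypothetical.

```python
import numpy as np

def normalized_hist(x, bins=10):
    """Histogram PDF estimate after rescaling the data to [0, 1].

    Assumes the series is not constant (max > min).
    """
    x = np.asarray(x, dtype=float)
    u = (x - x.min()) / (x.max() - x.min())
    density, _ = np.histogram(u, bins=bins, range=(0.0, 1.0), density=True)
    return density

def deviations_from_reference(series, bins=10):
    """Return (index of p_1, eps_i = p_1 - p_i), following Eqs. (20)-(21)."""
    pdfs = np.array([normalized_hist(x, bins) for x in series])
    p_bar = pdfs.mean(axis=0)                               # Eq. (20)
    ref = int(np.argmin(np.abs(pdfs - p_bar).sum(axis=1)))  # p_i closest to p_bar
    return ref, pdfs[ref] - pdfs                            # Eq. (21) rearranged

# Same shape, different location and scale: identical on the abstract support.
base = (np.arange(100) + 0.5) / 100
ref_same, eps_same = deviations_from_reference([base, 2 * base + 5, 10 * base - 3])

# Different shapes: non-zero deviations remain after rescaling.
ref_diff, eps_diff = deviations_from_reference([base, base**2, base**3])
```

In the first population all deviations vanish, so $\bar{G}$ would vanish as well; in the second, the remaining deviations are the support-independent variation that $\bar{G}$ accounts for.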

Entropy of the averaged population

While the primary topic in this paper is the TDMI, we will contend briefly with the TDMI for τ = 0 or the auto information. Based on an identical means of calculation, the information entropy of a time series for a population can be defined as follows:

{\bar{h}}_{I} = - \frac{1}{N} \int [p_{1} \log (p_{1}) + p_{1} \sum_{i = 1}^{N - 1} \log (p_{1} - {\bar{ε}}_{i}) - \sum_{i = 1}^{N - 1} {\bar{ε}}_{i} \log (p_{1} - {\bar{ε}}_{i})] dx .

(25)

Thus, when ${\bar{ε}}_{i} \to 0$ , the h_I for the population relative to the abstract support tends toward the information contained in an individual.

Heterogeneity-based deviations from the individual: Aggregate TDMI case

To understand how the diversity in the population is rendered via the TDMI of the aggregated population, begin by recalling that the TDMI for the aggregated set is defined by

\hat{I} (τ) = I (X_{1}^{n - τ}; X_{τ}^{n}) = \int p (X_{1}^{n - τ}, X_{τ}^{n}) \log (\frac{p (X_{1}^{n - τ}, X_{τ}^{n})}{p (X_{1}^{n - τ}) p (X_{τ}^{n})}) {dX}_{1}^{n - τ} {dX}_{τ}^{n} = \int \hat{ι} (τ) {dX}_{1}^{n - τ} {dX}_{τ}^{n},

(26)

where, under ideal (single, stationary source) circumstances the PDF of the aggregated density obeys

\hat{p} (X_{1}^{n - τ}, X_{τ}^{n}) = \frac{1}{N} \sum_{i = 1}^{N} p (X_{1}^{n - τ} (i), X_{τ}^{n} (i)),

(27)

where $X_{1}^{n - τ} (i)$ and $X_{τ}^{n} (i)$ represent the PDF restricted to individual i. Intuitively, Eq. 27 just says that we are creating the aggregate PDF by summing the graphs of all the individuals relative to the union of the supports of all the individuals, that is, relative to $\hat{S} = \cup_{i = 1}^{N} {\hat{S}}_{i}$ .
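A small numerical check of Eq. 27 can make the "ideal circumstances" concrete. The sketch below assumes histogram PDF estimates on a common set of bins spanning the union of the supports; the synthetic Gaussian data and the helper name are hypothetical. With equal sample sizes the pooled estimate coincides with the mean of the individual estimates, whereas with unequal sample sizes the pooled estimate is weighted toward the heavily measured individuals.

```python
import numpy as np

def mean_and_aggregate_pdf(samples, bins=20):
    """Mean of individual histogram PDFs vs. histogram PDF of the pooled data."""
    pooled = np.concatenate(samples)
    edges = np.linspace(pooled.min(), pooled.max(), bins + 1)  # union of supports
    per_individual = np.array(
        [np.histogram(s, bins=edges, density=True)[0] for s in samples]
    )
    aggregate = np.histogram(pooled, bins=edges, density=True)[0]
    return per_individual.mean(axis=0), aggregate

rng = np.random.default_rng(0)
means = (0.0, 2.0, 4.0)

# Equal sample sizes: Eq. (27) holds for the histogram estimates.
equal = [rng.normal(m, 1.0, 500) for m in means]
p_mean_equal, p_agg_equal = mean_and_aggregate_pdf(equal)

# Unequal sample sizes: the aggregate is dominated by the best-measured individual.
uneven = [rng.normal(m, 1.0, n) for m, n in zip(means, (50, 500, 5000))]
p_mean_uneven, p_agg_uneven = mean_and_aggregate_pdf(uneven)
```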

To choose a PDF that most closely resembles a centroid, it is helpful to have a concept of abstract support; however, because $\hat{I} (τ)$ is defined relative to the actual support of the population, the individual population PDFs do not separate as naturally as in the $\bar{I} (τ)$ case. Nevertheless, conceptually, to define an abstract support in the aggregate circumstance, one needs to, in spirit, construct a situation where all the PDFs have roughly the same range or support. There are several ways one can imagine achieving such a goal; here we will define the abstract support, $\hat{S}$ , such that every patient has been renormalized to have the identical support, the unit interval (i.e., [0,1]). It is important to realize that, in the aggregate case, there can be a very severe difference between the TDMI of an aggregated population defined on the actual support $\hat{S}$ versus the abstract support $\hat{S}$ . To allow for quantifying these potential differences, define the TDMI for an aggregated population relative to the abstract support, $\hat{I} (τ)$ . Now, using the abstract support, select p₁ in the same way we selected p₁ in Sec. 5A, by selecting the PDF that most closely represents the mean over the population of PDFs relative to the abstract support. This definition implies an important difference in how p_i is specified in the aggregate case versus the average case because, despite the fact that we use an abstract support to select a p₁, $\hat{I} (τ)$ is not calculated relative to the abstract support, and thus, the differences between p₁ and p_i are instead defined by

p_{i} = p_{1} (\hat{S}) - {\hat{ε}}_{i} (\hat{S}),

(28)

where ${\hat{ε}}_{i} (\hat{S})$ is the distance between the graphs of p₁ and p_i at a given value in the total support, $\hat{S}$ . Next, focusing on the integrand, $\hat{ι}$ , and substituting Eq. 28 into Eq. 27 and recalculating $\hat{ι}$ , we arrive at (dropping the subscript on p₁)

\hat{ι} (τ) = p (X_{1}^{n - τ}, X_{τ}^{n}) \log (\frac{p (X_{1}^{n - τ}, X_{τ}^{n})}{p (X_{1}^{n - τ}) p (X_{τ}^{n})}) + \hat{G} (τ) (N, {\hat{ε}}_{i}, p (X_{1}^{n - τ}, X_{τ}^{n}), p (X_{1}^{n - τ}), p (X_{τ}^{n})) = \hat{ρ} (τ) + \hat{G} (τ),

(29)

where $\hat{G} (τ)$ is explicitly given by

\hat{G} (τ) = \log (\frac{1 - \frac{\sum_{i = 1}^{N - 1} {\hat{ε}}_{i}}{Np (X_{1}^{n - τ}, X_{τ}^{n})}}{(1 - \frac{\sum_{i = 1}^{N - 1} {\hat{ε}}_{i}}{Np (X_{1}^{n - τ})}) (1 - \frac{\sum_{i = 1}^{N - 1} {\hat{ε}}_{i}}{Np (X_{τ}^{n})})}) \times (p (X_{1}^{n - τ}, X_{τ}^{n}) - \frac{\sum_{i = 1}^{N - 1} {\hat{ε}}_{i}}{N}) - \frac{\sum_{i = 1}^{N - 1} {\hat{ε}}_{i}}{N} \log (\frac{p (X_{1}^{n - τ}, X_{τ}^{n})}{p (X_{1}^{n - τ}) p (X_{τ}^{n})}),

(30)

(the calculation of $\hat{G}$ and $\hat{I}$ follows a similar path to that of $\bar{G}$ and $\bar{I}$ as seen in Appendix A). Thus, as the average of the ${\hat{ε}}_{i}$ ’s goes to zero, $\hat{G} (τ)$ will go to zero; moreover, when both the width of the band of PDFs decreases and the supports of the distributions overlap (i.e., when $\cap_{i = 1}^{N} {\hat{S}}_{i} \to \cup_{i = 1}^{N} {\hat{S}}_{i}$ ), the TDMI of the aggregate population $(\hat{I})$ will represent an individual within a homogeneous population (because the individuals within the population are similar). Similarly, when either the width of the band of PDFs increases or the supports of the distributions become disjoint (i.e., when $\cap_{i = 1}^{N} {\hat{S}}_{i} \to 0$ ), $\hat{I} (τ)$ will represent the TDMI within the diverse population. Or, said differently, the TDMI for the aggregated population will represent the TDMI of the population plus the sum of the individual based differences from the population. As we will see in the sections that follow, this second circumstance can lead to subtle difficulties in interpretation. Finally, note that the calculation that yielded $\hat{ι}$ does not explicitly depend on the support; the explicit $\hat{ε}$ ’s will differ between $\hat{I} (τ)$ and $\hat{I} (τ)$ , but the explicit form of $\hat{ι}$ will not.

Entropy of the aggregated population

Again, while the TDMI is the primary topic of this paper, in both the interest of completeness and later analysis, we define h_I for the aggregated population, which was calculated in analog with $\hat{I}$ , as follows:

{\hat{h}}_{I} = - \int p \log (p - \frac{\sum_{i = 1}^{N - 1} {\hat{ε}}_{i}}{N}) - \frac{\sum_{i = 1}^{N - 1} {\hat{ε}}_{i}}{N} \log (p - \frac{\sum_{i = 1}^{N - 1} {\hat{ε}}_{i}}{N}) .

(31)

In contrast to the situation where the information entropy is averaged over the population, when the average $\frac{{\hat{ε}}_{i}}{N} \to 0$ , the information entropy for the aggregated population, ${\hat{h}}_{I}$ , relative to the real support of the population tends toward the information contained in an individual who has the most data pairs in the PDF estimate.

HOW TO INTERPRET THE TDMI FOR A POPULATION, OR, TDMI-BASED METHODS FOR INTERPRETING POPULATION DIVERSITY

To achieve a practical understanding of the meaning of the TDMI in the context of a population, we have to combine information from Sec. 5B1 to construct an explicitly numerically computable means of interpreting $\bar{I} (τ)$ and $\hat{I} (τ)$ . Practically speaking, there are two broad situations: (1) $\bar{I} (τ)$ is practically calculable (when $\bar{I} (τ)$ is calculable, $\hat{I} (τ)$ always will be); and (2), $\bar{I} (τ)$ is not calculable (usually to estimate $\bar{I} (τ)$ , there need to be at least 100 pairs of points per representative element) leaving us only with $\hat{I}$ -related quantities. Relative to the first situation, define the difference between $\bar{I} (τ)$ and $\hat{I} (τ)$ , or

\begin{matrix} δ I (τ) = | \bar{I} (τ) - \hat{I} (τ) | = | \int_{\bar{S}} p_{1} (\bar{S}) - \int_{\hat{S}} p_{1} (\hat{S}) | + | \int_{\bar{S}} \bar{G} - \int_{\hat{S}} \hat{G} | + | \bar{B} - \hat{B} | = δ ρ + δ G_{\int} + δ B . \end{matrix}

(32)

This allows for the following conjecture which we will not prove in this paper:

Conjecture 1: In the circumstance where $\bar{I} (τ)$ can be accurately estimated, δI(τ) ∼ 0 if and only if the population used to estimate $\bar{I} (τ)$ and $\hat{I} (τ)$ is statistically homogeneous temporally (i.e., the PDFs representing the individuals in the population are identical, as are the PDFs under temporal evolution).

The forward direction of the if and only if statement, that δI(τ) ∼ 0 implies a homogeneous population (equivalently, that a heterogeneous population implies δI(τ) ≠ 0), will be briefly discussed in Sec. 6B; this direction is more complicated to prove. The reverse direction of the if and only if statement claims that if the population represents a single, stationary, homogeneous distribution, then δI(τ) ∼ 0; this claim relies on the fact that in this circumstance, all ε’s are zero and thus $\bar{I} (τ)$ (Eq. 22) and $\hat{I} (τ)$ (Eq. 26) represent a homogeneous source and are equivalent up to bias. Essentially, when one can estimate δI(τ), one can interpret the population make-up without delving deeply into the detailed sources of the TDMI. In contrast, when only $\hat{I} (τ)$ is practically calculable, the interpretation of $\hat{I} (τ)$ can only be understood by understanding the source of the TDMI. Nevertheless, in general, it is insightful to understand the sources of the TDMI, and the sources of the TDMI are tied to the make-up of the population.
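To illustrate how δI(τ) of Eq. 32 behaves, the following sketch computes the average and aggregate TDMI for two synthetic populations. It is only a sketch under stated assumptions: the series are uniformly sampled, the TDMI is estimated with a simple plug-in histogram estimator (not necessarily the estimator used in this paper), no bias correction is applied, and all function names and the synthetic data are hypothetical.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Plug-in histogram estimate of I(X;Y) in nats (biased for small samples)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

def average_and_aggregate_tdmi(population, lag, bins=8):
    """Return (I_bar, I_hat, delta_I) for uniformly sampled series and lag >= 1."""
    pairs = [(x[:-lag], x[lag:]) for x in population]
    # Average: estimate the TDMI per individual, then average the estimates.
    i_bar = float(np.mean([mutual_information(a, b, bins) for a, b in pairs]))
    # Aggregate: pool all pairs first, then estimate a single TDMI.
    a_all = np.concatenate([a for a, _ in pairs])
    b_all = np.concatenate([b for _, b in pairs])
    i_hat = mutual_information(a_all, b_all, bins)
    return i_bar, i_hat, abs(i_bar - i_hat)

rng = np.random.default_rng(1)
# Homogeneous: three individuals drawn from the same distribution.
homogeneous = [rng.normal(0.0, 1.0, 2000) for _ in range(3)]
# Heterogeneous: same shape, but nearly disjoint supports.
heterogeneous = [rng.normal(m, 1.0, 2000) for m in (0.0, 5.0, 10.0)]

_, _, delta_hom = average_and_aggregate_tdmi(homogeneous, lag=1)
_, _, delta_het = average_and_aggregate_tdmi(heterogeneous, lag=1)
```

In the heterogeneous case the individual series carry no temporal correlation, yet the aggregate TDMI is large because knowing x(t) identifies the individual; δI(τ) is correspondingly far from zero, whereas in the homogeneous case it stays at the level of the estimator bias.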

From a detailed perspective, the make-up of the population is important because the deviation of the TDMI from the homogeneous case is due to non-zero ε’s, and the source of non-zero $\bar{ε}$ ’s can differ from the source of non-zero $\hat{ε}$ ’s. Specifically, $\bar{ε}$ can only be non-zero because of differences between the graphs of the p_i’s. This is because all the p_i’s for the average TDMI have the same support. In contrast, the sources of non-zero $\hat{ε}$ ’s due to a heterogeneous population can be split into three broad categories: (1) differences in the TDMI estimates due to differences in the supports independent of the graphs of the PDFs; (2) differences in the TDMI estimates due to differences in the graphs independent of the supports; and (3) differences in the TDMI estimates due to the supports’ effect on the graphs.

Support dependent, graph independent, effects on the population TDMI

To understand and quantify the differences in the TDMI estimates due to differences in the supports independent of the graphs of the PDFs, consider the difference between the random permutation bias estimates defined in Sec. 4B.

First, recall that the population-wide random permutation bias estimate will be roughly equivalent to the estimator bias, or B_PRP(τ) ≈ B_E(τ), regardless of the supports or densities of the elements (cf. Ref. ¹⁹ for small sample size qualifications of this statement). Next, note that the individual-wise random permutation bias estimate, ${\hat{B}}_{IRP} (τ)$ , represents the bias due to heterogeneity in the supports plus the estimator bias. Thus, the contribution to the bias due to the diversity in population normalization is approximated by the difference between the individual-wise and population-wise random permutation bias estimates

B_{RP} (τ) = | {\hat{B}}_{PRP} (τ) - {\hat{B}}_{IRP} (τ) | .

(33)

There are two reasons why B_RP(τ) can be non-zero. First, the number of points used to calculate the two can differ by orders of magnitude (say, a population of 1000 with 10 points each); in this case, B_RP(τ) represents the 1/n effect on the bias estimates. Second, in the case where the numbers of pairs used to estimate ${\hat{B}}_{PRP} (τ)$ and ${\hat{B}}_{IRP} (τ)$ are relatively similar (e.g., more than 100 and within an order of magnitude; to control for the number of points, it is easy to reduce the cardinality of the set used to calculate ${\hat{B}}_{PRP} (τ)$ ), Fig. 1b shows visually how these bias estimates would render differently. In this context, ${\hat{B}}_{IRP} (τ)$ would be identical to I, whereas randomly permuting the entire population, such as is done to estimate ${\hat{B}}_{PRP} (τ)$ , will result in one of the marginal distributions becoming $\hat{p} (\hat{S})$ (a uniform distribution instead of three Gaussians with distinct means), thus greatly changing the amount of mutual information. These effects are primarily support-driven effects; note that while it is possible that differences in the underlying distribution function can be rendered through B_RP(τ), differences in the support of those distributions will always be rendered through B_RP(τ). As we will see in a moment, $B_{RP} (τ) \approx {\hat{B}}_{IRP} (τ)$ is not enough to imply that δI(τ) ≈ 0, but is enough to imply that the variance in the boundaries of the supports will all be relatively small. Nevertheless, while in some circumstances, it may be difficult to use the bias estimates to detect a difference in the average versus aggregate TDMI, we can use the bias estimates to interpret the average and aggregate TDMI signal. In particular, when B_RP(τ) ≤ B_E(τ), intermixing individuals’ measurements has no effect on the random permutation bias estimate, implying that there is very little population selection information in the TDMI estimate. Thus, B_RP(τ) ≤ B_E(τ) at least implies overlapping distribution supports. Similarly, when $B_{RP} (τ) >> B_{E} (τ)$ , intermixing elements has a profound effect on the random permutation bias estimates; in this instance, B_RP(τ) reveals a bias whose source is the diversity of the supports among the elements. This leads us to the measure of homogeneity of supports that is very computable even for poorly measured populations (e.g., when only $\hat{I} (τ)$ is calculable); the TDMI homogeneity of support is defined by the following equation:

H_{S} (τ) = \frac{| {\hat{B}}_{IRP} (τ) - \hat{I} (τ) |}{\hat{I} (τ)} .

(34)

The closer $H_{S} (τ)$ is to one, the less the diversity of the supports over the population; similarly, the closer $H_{S} (τ)$ is to zero, the greater the diversity of the supports over the population. (Again, note one must control for the dependence on the number of pairs used to estimate the above quantities.)
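The quantities in Eqs. 33 and 34 can be sketched as follows. Because Sec. 4B is not reproduced here, the sketch assumes that the population-wide estimate shuffles the delayed values across the entire pooled data set, while the individual-wise estimate shuffles them only within each individual before pooling. The histogram estimator, the function names, and the synthetic populations are likewise hypothetical.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Plug-in histogram estimate of I(X;Y) in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

def support_homogeneity(pairs, rng, bins=8):
    """pairs: list of (x_t, x_t_plus_tau) arrays, one tuple per individual."""
    x_all = np.concatenate([a for a, _ in pairs])
    y_all = np.concatenate([b for _, b in pairs])
    i_hat = mutual_information(x_all, y_all, bins)
    # Population-wide permutation: shuffle delayed values across everyone.
    b_prp = mutual_information(x_all, rng.permutation(y_all), bins)
    # Individual-wise permutation: shuffle within each individual, then pool.
    y_within = np.concatenate([rng.permutation(b) for _, b in pairs])
    b_irp = mutual_information(x_all, y_within, bins)
    return {"I_hat": i_hat, "B_PRP": b_prp, "B_IRP": b_irp,
            "B_RP": abs(b_prp - b_irp),           # Eq. (33)
            "H_S": abs(b_irp - i_hat) / i_hat}    # Eq. (34)

def ar1(n, phi, rng):
    """Synthetic AR(1) series, used only to create temporal correlation."""
    x = np.empty(n)
    x[0] = rng.normal()
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

rng = np.random.default_rng(2)
# Diverse supports, no temporal correlation within any individual.
diverse = [rng.normal(m, 1.0, 2000) for m in (0.0, 5.0, 10.0)]
res_diverse = support_homogeneity([(x[:-1], x[1:]) for x in diverse], rng)
# Common support, genuine temporal correlation.
similar = [ar1(2000, 0.8, rng) for _ in range(3)]
res_similar = support_homogeneity([(x[:-1], x[1:]) for x in similar], rng)
```

For the population with diverse supports, the individual-wise permutation leaves the aggregate TDMI essentially unchanged, so $H_{S} (τ)$ is near zero and B_RP(τ) is large; for the population sharing a support, the permutation destroys the TDMI and $H_{S} (τ)$ is near one.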

(Color) Graphical comparison of $\bar{p}$ (average PDF) and $\hat{p}$ (PDF of the aggregate) for three collections of Gaussian random numbers whose distributions have means 0, 2, and 4, respectively.

It is worth noting that a similar analysis can be done by comparing $\hat{I} (τ)$ to $\hat{I} (τ)$ , as their difference will reveal support based effects. The principles behind a $δ \hat{I} (τ) = | \hat{I} (τ) - \hat{I} (τ) |$ and $H_{S} (τ)$ are similar in that they both address normalization of support based effects, only $H_{S} (τ)$ depends on quantities that represent distributions— ${\hat{B}}_{IRP} (τ)$ and ${\hat{B}}_{PRP} (τ)$ can both be estimated many times—and thus are likely more robust.

Graph dependent, support independent, effects on the population TDMI

To understand in detail how differences in the graphs independent of the supports can affect the $\bar{I} (τ)$ and $\hat{I} (τ)$ , begin by assuming that all the p_i’s have the same support, or that $\cap_{i = 1}^{N} S_{i} = \cup_{i = 1}^{N} S_{i}$ . In this circumstance, the ${\bar{ε}}_{i} = {\hat{ε}}_{i}$ for all i. Thus, the contribution of the diversity of PDFs within the population to I, or the deviation from the mean at a particular x ∈ S value, is captured by $\bar{G} (τ)$ and $\hat{G} (τ)$ as defined in Eqs. 24, 30. Consequently, the only way that $\bar{I} (τ)$ can be different from $\hat{I} (τ)$ up to the estimator bias is for the variation in the collections of PDFs to be due to the order of averaging as rendered through the G’s.

Based on the aforementioned intuition, we claim (i.e., Conjecture 1) that δI(τ) is equal to zero if and only if all the ε’s are zero. While we will not present a qualified proof of this claim here, we can offer an intuitive argument as to why our claim is justified. First, note that by inspection, if $ε_{i} = 0$ for all i, $δ G (τ) = \bar{G} (τ) = \hat{G} (τ) = 0$ . Now, what remains is to understand what happens to the G’s when there are non-zero ε’s; to do this, we reduce the G’s to the terms they do not have in common

\bar{G} (τ) ~ \bar{g} (τ) =

(35)

\frac{1}{N} \sum_{i = 1}^{N - 1} ({\bar{p}}_{1} (j, τ) - {\bar{ε}}_{i}) \log (\frac{1 - \frac{{\bar{ε}}_{i}}{{\bar{p}}_{1} (j, τ)}}{(1 - \frac{{\bar{ε}}_{i}}{{\bar{p}}_{1} (j)}) (1 - \frac{{\bar{ε}}_{i}}{{\bar{p}}_{1} (τ)})}),

(36)

\hat{G} (τ) ~ \hat{g} (τ) =

(37)

({\hat{p}}_{1} (j, τ) - \frac{\sum_{i = 1}^{N - 1} {\hat{ε}}_{i}}{N}) \log (\frac{1 - \frac{\sum_{i = 1}^{N - 1} {\hat{ε}}_{i}}{N {\hat{p}}_{1} (j, τ)}}{(1 - \frac{\sum_{i = 1}^{N - 1} {\hat{ε}}_{i}}{N {\hat{p}}_{1} (j)}) (1 - \frac{\sum_{i = 1}^{N - 1} {\hat{ε}}_{i}}{N {\hat{p}}_{1} (τ)})}),

(38)

where ${\bar{p}}_{1} (j, τ) = p (X_{1} (j), X_{1} (j - τ))$ , ${\bar{p}}_{1} (j) = p (X_{1} (j))$ , ${\bar{p}}_{1} (τ) = p (X_{1} (j - τ))$ , ${\bar{ε}}_{i} (\bar{S}) = {\bar{ε}}_{i}$ , ${\hat{p}}_{1} (j, τ) = p (X_{1}^{n - τ}, X_{τ}^{n})$ , ${\hat{p}}_{1} (j) = p (X_{1}^{n - τ})$ , ${\hat{p}}_{1} (τ) = p (X_{τ}^{n})$ , and ${\hat{ε}}_{i} (\hat{S}) = {\hat{ε}}_{i}$ ; then, define the difference in these quantities

δ G ~ δ g (τ) = | \bar{g} (τ) - \hat{g} (τ) | .

(39)

Now, further noting that $\bar{g} (τ)$ is convex (or concave, depending on the p’s) and applying standard convexity arguments, δg will not equal zero unless $ε_{i} = 0$ for all i. Intuitively, the difference lies in when the summation of the differences between the p’s and the ε’s is taken. Specifically, the difference between ${\hat{p}}_{1}$ and the $\hat{ε}$ ’s is taken after the $\hat{ε}$ ’s are averaged, whereas the difference between $\bar{p}$ and the $\bar{ε}$ ’s is itself averaged (the summation is applied over the differences between $\bar{p}$ and the $\bar{ε}$ ’s). Thus, while it is possible that, through the act of integrating the G’s, symmetries will allow for the G’s to be equal, δg equaling zero is extremely unlikely. This is the primary reason why we believe that Conjecture 1 is true. Nevertheless, because the convexity or concavity of $\bar{g} (τ)$ depends on the nature of the p’s, it is non-trivial to specify whether $\bar{g} (τ)$ will be, in general, greater or less than $\hat{g} (τ)$ . Even so, it appears in computational experiments that $\hat{g} (τ)$ is often less than $\bar{g} (τ)$ . In any event, it is now more clear how diversity amongst the distribution of p’s over the same support can (and likely will) force δI(τ) ≠ 0.
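The role of the order of averaging can be seen in a toy calculation on a single convex building block, u log u, evaluated at one point of the support. This is not the full $\bar{g}$ or $\hat{g}$, whose convexity depends on the p’s; the value of p₁ and the deviations below are hypothetical and chosen only so that every p₁ − ε_i stays positive.

```python
import numpy as np

def f(u):
    """Convex function u*log(u) for u > 0."""
    return u * np.log(u)

p1 = 0.5                                   # hypothetical value of p_1 at one point
eps = np.array([0.05, -0.10, 0.20, 0.0])   # hypothetical deviations eps_i there

# Average ordering: apply f to each p_1 - eps_i, then average.
average_of_terms = float(np.mean(f(p1 - eps)))
# Aggregate ordering: average the eps_i first, then apply f once.
term_of_average = float(f(p1 - eps.mean()))
gap = average_of_terms - term_of_average   # non-negative by Jensen's inequality

# With identical deviations the two orderings agree.
same = np.full(4, 0.1)
gap_same = float(np.mean(f(p1 - same)) - f(p1 - same.mean()))
```

For this convex term the gap is strictly positive whenever the ε_i differ from one another and vanishes when they coincide, which is the mechanism behind the claim that δg is unlikely to be zero for a diverse population.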

In the situation where $\bar{I} (τ)$ is not accessible, it may not be possible to fully understand the meaning of $\hat{I} (τ)$ . While $H_{S} (τ)$ can help identify support based effects, pure graph-based temporally dependent effects may be difficult to estimate. In particular, if the sample size for some of the individuals is small, then it will be difficult to determine the contribution to $\hat{I} (τ)$ due to purely graphic diversity simply because there will be such high variance in the graphical PDF estimates due to small sample sizes.²⁸ In this case, the best that can be done is to estimate more static measures of graphic diversity such as those presented in Sec. 7.

Support dependent, graph-based effects on the population TDMI

There are two potential contributors to support dependent, graph-based effects on δI(τ): δG(τ) and δρ(τ).

The contribution to δI(τ) due to δρ(τ) is entirely due to the limits of integration; the integrand for the average and aggregate ρ component of the TDMI are identical. Thus, intuitively, δρ > 0 because of the relative location of the support of p₁ in reference to the total support of the population; p₁ will represent a more peaked distribution when defined on $\hat{S}$ compared to $\bar{S}$ . Note that while δρ is, in general, computable, it has similar characteristics to $H_{S} (τ)$ with more severe bias issues.

The contribution due to $δ G_{\int}$ is not as intuitive; to understand how diversity in the supports contributes to $δ G_{\int}$ via the induced differences in the ε’s, consider Figs. 1a, 1b. Relative to Fig. 1a, begin by defining $\bar{p} (\bar{S})$ as the average of the PDFs relative to the abstract support, or $\bar{p} (\bar{S}) = \frac{1}{3} (p_{1} (\bar{S}) + p_{2} (\bar{S}) + p_{3} (\bar{S}))$ ; here, all the ${\bar{ε}}_{i}$ ’s will be small and independent of the support. This is how variation in the population is rendered when calculating $\bar{I}$ , and thus how $\bar{G}$ will render. In contrast, define the average of the PDFs relative to the total support, or $\hat{p} (\hat{S}) = \frac{1}{3} (p_{1} (\hat{S}) + p_{2} (\hat{S}) + p_{3} (\hat{S}))$ ; this is the aggregate scenario. Here, it is clear that the averaged PDF will not resemble any of the individual PDFs. Moreover, relative to a selected p₁, all ${\hat{ε}}_{i}$ ’s will be relatively large and on the order of the various $p_{i} (\hat{S})$ ’s over a non-trivial portion of the population support $\cup_{i = 1}^{N} S_{i}$ . Because of this, when the supports of the individuals differ, the largest term in $\hat{I} (τ)$ , $\hat{G} (τ)$ , will be accounting primarily for variation within the distribution of the supports of the population, rather than support-independent variation within the population. Moreover, when the supports of the individuals are relatively invariant, $\hat{I}$ will be independent of time even when the I of an individual varies with τ. In any event, the point is that variation in the supports of otherwise identical distributions affects how the distributions are rendered through the TDMI calculation.

Finally, when only $\hat{I} (τ)$ is available, which implies the presence of individuals with too few pairs of points to accurately estimate a PDF and thus the TDMI, and when there are support-dependent graph-based effects in the TDMI, it will likely be difficult to separate the support dependent, graph-based effects from the support independent graph-based effects on the TDMI (e.g., on the structure of the temporal correlation).

NON-TDMI-BASED METHODS FOR INTERPRETING POPULATION DIVERSITY

In this paper, we claim that the TDMI-based analysis can be used to both detail nonlinear correlation in time and interpret the composition of the population to which that correlation pertains (i.e., whether the TDMI reflects an individual/homogeneous population or a diverse population). To verify this claim, we require a set of methods for establishing a baseline that are independent of information-theoretic machinery and can be used to interpret the make-up of the population. We propose three different quantifications of homogeneity of a population: (1) homogeneity in measurement representation, which addresses the variance in the distribution of the number of measurements per element of the population; (2) homogeneity in support, which addresses variation in the supports of each element’s distribution; and (3) homogeneity in density, which addresses variation in the PDFs (or the graphs of the PDFs) over the population. Note that all but one of the methods for quantifying homogeneity are independent of time, and all are independent of any time-based correlation structure existent within the data set. Moreover, the homogeneity qualification methods we propose here are neither exhaustive nor particularly innovative; rather they are simple intuitive methods devised to interpret and confirm the TDMI-based results. Nevertheless, many of these methods are useful in their own right; moreover, at least one of the quantities we define here is required to supplement the TDMI analysis when very few measurements exist per individual. Finally, Table I contains a summary of the twelve TDMI-independent quantities we use to verify the TDMI methodology.

TABLE I.

Summary of all the non-TDMI based metrics used to assess homogeneity in a population (both among the graphs and the supports) used to verify the TDMI-type analysis.

non-TDMI-based quantities for characterizing a population
$H_{\bar{x}}$	difference between the population and individual element means	∼0 implies either (1) most elements have a similar number of measurements or (2) the individuals come from distributions with similar means; $≫ 0$ implies the converse
V(f(n))	variance of the PDF of the number of measurements per individual	V ∼ 0, $H_{\bar{x}} ~ 0$ imply elements were measured similarly; V $≫ 0$ , $H_{\bar{x}} ~ 0$ implies elements measured at different rates; V $≫ 0$ , $H_{\bar{x}} ≫ 0$ implies elements measured at different rates with differing source distributions
${\bar{s}}_{\min}$	E[s_min(i)]	lower support boundary mean
${\bar{s}}_{\max}$	E[s_max(i)]	upper support boundary mean
$V_{s_{\min}}$	Var(s_min)	lower support boundary variance
$V_{s_{\max}}$	Var(s_max)	upper support boundary variance
$\bar{\| S \|}$	${\bar{s}}_{\max} - {\bar{s}}_{\min}$	length of support mean
$V_{\bar{\| S \|}}$	$Var ({\bar{s}}_{\max} - {\bar{s}}_{\min})$	length of support variance
H_RA	area between the (point-wise) least and greatest PDF graph	quantifies variance between the PDFs of the population; ∼0 implies element PDFs are homogeneous; very sensitive
V_S(p)	$\int_{S} E [{(p (x))}^{2}] - E {[p (x)]}^{2} dx$ , variance of the PDFs relative to a specified support, S	∼0 implies homogeneity in PDFs; larger V_S(p) implies greater heterogeneity in the PDFs
$V_{\hat{S}} (p)$	V_S(p) calculated relative to the support of the aggregate population; $\hat{S} = \cup_{i = 1}^{N} {\hat{S}}_{i}$ ; note that there does exist an aggregate normalized support, $\hat{S}$ , but we will not use this quantity here.	$V_{\hat{S}} (p)$ has the same interpretation as V_S(p) in general, but has the potential to include support-based effects.
$V_{\bar{S}} (p)$	V_S(p) calculated relative to the abstract support of the population, $\bar{S}$	$V_{\bar{S}} (p)$ has the same interpretation as V_S(p) in general, but excludes support-based effects.


Homogeneity in measurement composition

To quantify homogeneity in measurement composition, begin with the following two quantities. First, consider the difference between the mean of the raw measurements over the population versus the mean of the individual-wise measurement means, or

H_{\bar{x}} = (\frac{1}{\sum_{k = 1}^{N} n_{k}} \sum_{i = 1}^{\sum_{k = 1}^{N} n_{k}} x_{i}) - (\frac{1}{N} \sum_{k = 1}^{N} \frac{1}{n_{k}} \sum_{i = 1}^{n_{k}} x_{i + \sum_{j = 0}^{k - 1} n_{j}}),

(40)

where n_k is the number of points contributed by individual k, N is the number of individuals in the population, and n₀ = 0. Now, $H_{\bar{x}} \approx 0$ under two circumstances: (1) the distribution of n_k’s has zero or small variance, regardless of the collection of individual distributions; or (2) each individual comes from an identical distribution. Second, consider the variance of the probability density of the number of measurements per individual

V_{f (n)} = Var (f (n)),

(41)

where f(n) denotes the density of measurements per individual. Combining these two quantities, we arrive at three cases: (1) $V_{f (n)} ~ 0$ implies that $H_{\bar{x}} ~ 0$ , together implying that the elements were measured similarly, so no insight into the original distributions can be made; (2) $V_{f (n)} ≫ 0$ and $H_{\bar{x}} ~ 0$ together imply that the elements were measured at different rates regardless of their source distributions (which can be identical); and (3) $V_{f (n)} ≫ 0$ and $H_{\bar{x}} ≫ 0$ together imply that the elements were measured at different rates and likely have differing source distributions. Note that, in general, both of these metrics are rather sensitive to diversity in a population.
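The three cases above can be reproduced with a few lines of code. The sketch below implements Eq. 40 directly; for Eq. 41 it uses the variance of the measurement counts n_k as a simple stand-in for the variance of f(n), which is an assumption rather than the definition used in this paper. The function names and toy data are hypothetical.

```python
import numpy as np

def mean_difference(series):
    """Eq. (40): pooled mean of all measurements minus the mean of individual means."""
    pooled_mean = np.concatenate(series).mean()
    mean_of_means = np.mean([np.mean(s) for s in series])
    return float(pooled_mean - mean_of_means)

def count_variance(series):
    """Variance of the number of measurements n_k per individual (stand-in for V_f(n))."""
    return float(np.var([len(s) for s in series]))

# Case 1: equal counts, different source distributions.
case1 = [np.array([1.0, 2.0, 3.0]), np.array([10.0, 11.0, 12.0])]
# Case 2: unequal counts, identical source distributions.
case2 = [np.zeros(4), np.zeros(1)]
# Case 3: unequal counts, different source distributions.
case3 = [np.zeros(4), np.array([10.0])]

results = [(mean_difference(c), count_variance(c)) for c in (case1, case2, case3)]
```

In the first case the counts have zero variance and $H_{\bar{x}}$ vanishes even though the two individuals differ, which is why no insight into the source distributions can be drawn from that case alone.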

Homogeneity in measurement distribution supports

To characterize homogeneity in distribution support, we rely on a brute force homogeneity characterization technique. Begin by recalling that the support for element i’s distribution is S_i = [s_min(i), s_max(i)]. Given these sets, which are defined by the individuals’ measurements, define the mean and variance of the support minima, maxima, and length by

{\bar{s}}_{\min} = E [s_{\min} (i)],

(42)

{\bar{s}}_{\max} = E [s_{\max} (i)],

(43)

V_{s_{\min}} = Var (s_{\min}),

(44)

V_{s_{\max}} = Var (s_{\max}),

(45)

| \bar{S} | = {\bar{s}}_{\max} - {\bar{s}}_{\min},

(46)

V_{| \bar{S} |} = Var ({\bar{s}}_{\max} - {\bar{s}}_{\min}),

(47)

These quantities afford relatively simple interpretations. For instance, when the minima, maxima, and lengths for the population have small variance, the intersection of the supports will not differ significantly from the union of the supports, meaning the supports overlap. In contrast, a large variance in any of the minima, maxima, or lengths implies that the supports differ significantly over the population.
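A minimal sketch of Eqs. 42 through 47 follows, taking each individual's empirical minimum and maximum as the support boundaries. One assumption should be flagged: Eq. 47 is read here as the variance of the individual support lengths s_max(i) − s_min(i). The function name, dictionary keys, and example data are hypothetical.

```python
import numpy as np

def support_statistics(series):
    """Mean and variance of support minima, maxima, and lengths over a population."""
    s_min = np.array([np.min(s) for s in series], dtype=float)
    s_max = np.array([np.max(s) for s in series], dtype=float)
    lengths = s_max - s_min
    return {
        "s_min_mean": s_min.mean(),                    # Eq. (42)
        "s_max_mean": s_max.mean(),                    # Eq. (43)
        "s_min_var": s_min.var(),                      # Eq. (44)
        "s_max_var": s_max.var(),                      # Eq. (45)
        "length_mean": s_max.mean() - s_min.mean(),    # Eq. (46)
        "length_var": lengths.var(),                   # Eq. (47), per-individual lengths
    }

# Three hypothetical individuals with supports [0, 2], [1, 5], and [2, 4].
stats = support_statistics([[0, 1, 2], [1, 2, 5], [2, 3, 4]])
```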

Homogeneity in the distribution of the graphs of the measurement PDFs

To specify homogeneity in the PDF of the population, we will use two methods. Intuitively, both methods characterize, in one way or another, the width of the band between the maximum and minimum PDFs of the population over the support of the entire population. Begin by defining the PDF for an individual by p_i(x), the supremum of the PDFs of the population by max_i(p(x)) = p_M(x) and the infimum of PDFs of the population by min_i(p(x)) = p_m(x), over the union of the supports, $S = \cup_{i = 1}^{N} S_{i}$ . First, using the L₁ (pseudo) distance,²⁹ we can define the relative area of the width of the band of PDFs by

H_{RA} = \frac{\int_{S} | p_{M} (x) - p_{m} (x) | dx}{\int_{S} p_{M} (x) dx} .

(48)

The relative area, H_RA, is literally the proportion of the area under the supremum of the collection of PDFs that lies above the infimum of the collection of PDFs. When H_RA is close to one, the band between the supremum and infimum occupies nearly all the area under the population-wide supremum; in other words, the population has at least two substantially different PDFs. Similarly, when H_RA is near zero, the area between the supremum and infimum over the collection of p_i’s is very small relative to the total area occupied by the supremum of the p_i’s over the population. Thus, the implication of H_RA being near zero is that the p_i’s are all nearly identical. However, this method is very sensitive to heterogeneity; a single individual’s PDF differing from the rest of the population can drive H_RA to one. In contrast, the second method for evaluating the diversity in PDFs over the population quantifies diversity from a mean within the population by estimating the variance of the PDFs at a given x integrated over a given support (S), or

V_{S} (p) = \int_{S} \left( E [{(p (x))}^{2}] - {(E [p (x)])}^{2} \right) dx .

(49)

Note, V_S(p) can be estimated relative to two different supports, the union of the supports, or the abstract support. This is an L₂ flavored representation of the variation in PDFs; the variance of the p_i’s at a given x is maximized when p_i’s are maximally orthogonal (in the sense of an inner product between the p_i’s) to one another, and minimized when the p_i’s are minimally orthogonal (meaning they coincide). Thus, V_S(p) has the potential to capture both support- and graph-based variation, depending on whether V is calculated relative to $\hat{S}$ , which will include support-based effects, or $\bar{S}$ , which will not include support-based effects.
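
To make Eqs. 48 and 49 concrete, the sketch below evaluates both quantities for PDFs tabulated on a common, evenly spaced grid, replacing the integrals with Riemann sums and the expectation with an average over individuals. The function names, the grid representation, and the spacing argument `dx` are assumptions of this example.

```python
def relative_area(pdfs, dx):
    """H_RA (Eq. 48) for PDFs sampled on a shared grid with spacing dx.

    pdfs: list of equal-length lists; pdfs[i][k] is p_i evaluated at grid point k.
    """
    upper = [max(column) for column in zip(*pdfs)]  # p_M(x)
    lower = [min(column) for column in zip(*pdfs)]  # p_m(x)
    band = sum(abs(u - l) for u, l in zip(upper, lower)) * dx
    return band / (sum(upper) * dx)


def integrated_variance(pdfs, dx):
    """V_S(p) (Eq. 49): population variance of p_i(x) at each x, integrated over x."""
    n = len(pdfs)
    total = 0.0
    for column in zip(*pdfs):
        first_moment = sum(column) / n
        second_moment = sum(v * v for v in column) / n
        total += (second_moment - first_moment ** 2) * dx
    return total
```

Two identical PDFs give zero for both quantities, whereas two PDFs with disjoint supports drive the relative area to one.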

ASSEMBLING THE PIECES: AN EXPLICIT PRESCRIPTION FOR TDMI ANALYSIS AND INTERPRETATION FOR A POPULATION OF TIME SERIES FOR A FIXED TIME SEPARATION δt

The interpretation of the TDMI and entropy for a complex, diversely measured population can be split into three broad steps: (1) performing a preliminary interpretation of $\bar{I} (δ t)$ and $\hat{I} (δ t)$ ; (2) performing an interpretation of δI(δt) or $\hat{I} (δ t)$ for the population; and (3) understanding the make-up of the data explicitly used to estimate the PDFs, yielding an understanding of what proportion of the population was used in the calculation. All the TDMI quantities used for the TDMI-based analysis are shown in Table II, a graphical schematic for applying this infrastructure is shown in Fig. 2, and a detailed algorithmic schematic for applying the TDMI infrastructure to a population is depicted via pseudocode in Appendix B.

TABLE II.

Summary of all the TDMI-based metrics used to interpret the TDMI and determine the population composition.

TDMI-based analysis quantities
Quantity	What it signifies	What it quantifies
$\bar{I} (δ t)$	population averaged TDMI	quantifies average TDMI of a population
$\hat{I} (δ t)$	aggregated population TDMI	quantifies TDMI of an aggregated population
$\hat{I} (δ t)$	aggregated population calculated relative to the abstract support $\hat{S}$	support independent TDMI of an aggregated population
δI(δt)	$\| \hat{I} (δ t) - \bar{I} (δ t) \|$ ; difference between the average and aggregate TDMI	∼0 implies homogeneity, >0 implies heterogeneity
B_E(δt)	PDF estimator bias; usually, B_E(δt) ∼ B_PRP(δt); B_E(δt) can be estimated in a variety of ways	the number above which the I is considered to be positive
${\bar{B}}_{IRP} (δ t)$	individual permutation bias averaged over a population	bias estimate that preserves information about the relative ranges of individuals
${\hat{B}}_{IRP} (δ t)$	individual permutation bias	bias estimate that preserves information about the relative ranges of individuals
${\hat{B}}_{PRP} (δ t)$	population permutation bias	bias estimate that destroys information about the relative ranges of individuals
$H_{S} (δ t)$	$\frac{\| {\hat{B}}_{IRP} (δ t) - \hat{I} (δ t) \|}{\hat{I} (δ t)}$ ; quantifies diversity of supports	∼1 implies homogeneous supports; ∼0 implies diverse supports
B_RP(δt)	$\| {\hat{B}}_{PRP} (δ t) - {\hat{B}}_{IRP} (δ t) \|$ ; quantifies diversity of supports; quantifies cardinality of individual data sets	$\sim {\hat{B}}_{IRP} (δ t)$ can imply diverse supports or cardinality of per-element data sets; ∼0 can imply homogeneity in supports
δG(δt)	difference in the difference between how population diversity renders in $\bar{I}$ and $\hat{I} (δ t)$	>0 implies population diversity
δρ(δt)	$\| \int_{\bar{S}} p_{1} (\bar{S}) - \int_{\hat{S}} p_{1} (\hat{S}) \|$ ; quantifies diversity in supports	>0 implies population diversity.
H_Θ(δt)	how representative the population used to estimate I at δt is of the time-independent (e.g., the entire) population	∼0 implies the entire population is well represented; ∼1 implies portions of the population are overrepresented
N_min(δt)	minimum number of pairs of points contributed by any one individual	a lower bound on the representation of an individual; 1/N_min(δt) is a rough estimate of B_E(δt) for the individual with the fewest pairs

The graphical schematic for the TDMI analysis of a population; note that by *TDMI Present,* we mean that the relevant TDMI measure (e.g., $\hat{I} (δ t)$ ) is greater than *bias*.

Step one: Determining the computability of $\bar{I} (δ t)$

To begin, one must determine whether $\bar{I} (δ t)$ and $\hat{I} (δ t)$ are calculable for a given δt (or set of δt's). In general, to estimate $\bar{I} (δ t)$ , every representative individual must (under most circumstances) have at least 100 pairs of points available for the TDMI calculation. Similarly, to estimate $\hat{I} (δ t)$ , there must be at least 100 pairs of points gathered over the entire population; this is why $\hat{I} (δ t)$ is so useful in the context of a population. It is important to note that the rough number, 100 pairs of points per PDF estimate, is based on how PDF estimation techniques converge in practice and is discussed in detail in Ref. 19. Nevertheless, the number of pairs of points needed is process dependent, as is true for all quantities that depend on PDF-estimate schemes (very wide supports can require many more than 100 points, whereas some distributions can require fewer than 100 points). Moreover, for a given δt, the number of pairs of points separated by that δt depends on both the length of the time series and the sampling rate; this is why we specify the number of pairs of points instead of the length of the time series. For these reasons, it is important to perform sensitivity analysis, estimate the bias, observe the PDF estimates for all quantities and their variation under perturbation, and generally use care to ensure the robustness of the results.
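
The computability check in step one can be sketched as follows. The example collects, for an irregularly sampled series, all pairs of points whose time separation falls within a tolerance window around δt, and then tests the rough 100-pair guideline for the average and the aggregate. The window half-width `tol`, the function names, and the default threshold are assumptions of this sketch; the binning actually used for the δt bins may differ.

```python
def delay_pairs(times, values, dt, tol):
    """Return (x(t), x(t + dt)) pairs with separation in [dt - tol, dt + tol].

    times must be sorted in increasing order and aligned with values.
    """
    pairs = []
    for i in range(len(times)):
        for j in range(i + 1, len(times)):
            gap = times[j] - times[i]
            if gap > dt + tol:
                break  # later points are even farther away
            if gap >= dt - tol:
                pairs.append((values[i], values[j]))
    return pairs


def computability(population, dt, tol, min_pairs=100):
    """population: list of (times, values) tuples, one per individual.

    Returns the per-individual pair counts, whether the averaged TDMI is
    estimable (every individual reaches min_pairs), and whether the aggregated
    TDMI is estimable (the pooled count reaches min_pairs).
    """
    counts = [len(delay_pairs(t, x, dt, tol)) for t, x in population]
    average_ok = all(c >= min_pairs for c in counts)
    aggregate_ok = sum(counts) >= min_pairs
    return counts, average_ok, aggregate_ok
```

The threshold should be treated as a starting point only, since the number of pairs needed is process dependent.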

Assuming that $\bar{I} (δ t)$ is calculable, because the calculation of I for an individual is independent of the support of the distribution, the variance in the distribution of $\bar{I} (δ t)$ is due to differences in the graphs of the PDFs representing each patient at a given δt. Further, because $\bar{I} (δ t)$ is made of individuals who have been averaged, the interpretation of the statistical moments of $\bar{I} (δ t)$ (i.e., the mean, variance, etc.) is a scientific problem that depends on the particular circumstances.

The interpretation of $\hat{I} (δ t)$ is more difficult because $\hat{I} (δ t)$ can be composed of purely graphical, purely support, and intermixed support and graphical components. Thus, because $\hat{I} (δ t)$ is a population-dependent quantity where the individual contributions cannot be separated, it will be treated in Sec. 8B with δI(δt).

Step two (A in Fig. 2): Interpreting δI(δt) or $\hat{I} (δ t)$

Step two has two courses of action depending on whether it is possible to calculate $\bar{I} (δ t)$ or not: (1) $\bar{I} (δ t)$ and $\hat{I} (δ t)$ are calculable and thus δI(δt) can be computed and (2) only $\hat{I} (δ t)$ , B_RP(δt), and $H_{S} (δ t)$ are calculable (when $\hat{I} (δ t)$ is calculable, this will always be the case). When δI(δt) is available, it, as estimated by both a KDE and histogram estimator, is all we need to know: the closer δI(δt) is to zero, the more homogeneous the population is and the more $\hat{I} (δ t)$ represents a single, statistically singular source; the larger in magnitude δI(δt) is, the more statistically heterogeneous the population is and the more $\hat{I} (δ t)$ represents the population. Of course, if the histogram and KDE TDMI estimates differ substantially, it is likely that there are significant small sample size effects present in $\bar{I} (δ t)$ , and this needs to be taken into consideration when interpreting δI(δt), $\bar{I} (δ t)$ and $\hat{I} (δ t)$ . Moreover, in this circumstance, calculation of either B_RP(δt) = |B_IRP(δt) − B_PRP(δt)| or $H_{S} (δ t)$ can be used to further qualify the small sample size effects on the variation in the supports versus the graphs. Finally, when δI(δt) is positive, and $H_{S} (δ t)$ shows no diversity due to the supports, then all the diversity in the population is due to the graph-based diversity.
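
A minimal sketch of the first course of action is given below, assuming the delay pairs for each individual have already been collected. It uses a simple equal-width histogram (plug-in) estimator in natural-log units; the estimator, the bin count, and the function names are assumptions of this example and stand in for the KDE and histogram estimators discussed in the text. Each call bins the data over the range of the pairs it receives, so an individual's estimate does not depend on where that individual's support lies.

```python
import math
from collections import Counter


def histogram_mi(pairs, bins=8):
    """Plug-in mutual information (nats) from an equal-width 2-D histogram."""
    values = [v for pair in pairs for v in pair]
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a degenerate range

    def cell(v):
        return min(int((v - lo) / width), bins - 1)

    joint = Counter((cell(a), cell(b)) for a, b in pairs)
    px, py = Counter(), Counter()
    for (i, j), c in joint.items():
        px[i] += c
        py[j] += c
    n = len(pairs)
    return sum(c / n * math.log(c * n / (px[i] * py[j])) for (i, j), c in joint.items())


def average_and_aggregate_tdmi(population_pairs, bins=8):
    """Return (I_bar, I_hat, delta_I) for a list of per-individual pair lists."""
    individual = [histogram_mi(pairs, bins) for pairs in population_pairs]
    i_bar = sum(individual) / len(individual)  # average of individual estimates
    pooled = [pair for pairs in population_pairs for pair in pairs]
    i_hat = histogram_mi(pooled, bins)         # estimate on the aggregated data
    return i_bar, i_hat, abs(i_hat - i_bar)
```

For two individuals whose pairs are internally independent but occupy disjoint ranges, the averaged estimate is zero while the aggregated estimate is positive, which is the support-driven heterogeneity that δI is meant to flag.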

When $\bar{I} (δ t)$ is not calculable, one is left with only $\hat{I} (δ t)$ , $\hat{I} (δ t)$ , and B_RP(δt) or $H (δ t)$ . In this case, one can still use B_RP(δt) or $H (δ t)$ to detect the homo- or heterogeneity in the supports. If there is no support-based variation, then pure graph-based heterogeneity may be difficult to determine; in this circumstance, we recommend using a non-TDMI metric such as V_S(p), which will have greater statistical power while sacrificing temporal dependence, to help determine the graphical composition of the population. In general, if there is support-based variation, it will likely be difficult to separate support-based, versus graph-based, contributions; it will be even more difficult to specify the proportion of diversity contributed by the support- versus graph-based effects.
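
The support diagnostics of Table II can be sketched with surrogate data, under one reading of the permutation biases: the individual permutation bias shuffles pairings within each individual before aggregating (so each individual's range is preserved), while the population permutation bias shuffles pairings across the pooled data (so range information is destroyed). Shuffling the second coordinate of the delay pairs, the histogram estimator, and all function names are assumptions of this sketch rather than the procedure of Ref. 19.

```python
import math
import random
from collections import Counter


def histogram_mi(pairs, bins=8):
    """Plug-in mutual information (nats) from an equal-width 2-D histogram."""
    values = [v for pair in pairs for v in pair]
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0

    def cell(v):
        return min(int((v - lo) / width), bins - 1)

    joint = Counter((cell(a), cell(b)) for a, b in pairs)
    px, py = Counter(), Counter()
    for (i, j), c in joint.items():
        px[i] += c
        py[j] += c
    n = len(pairs)
    return sum(c / n * math.log(c * n / (px[i] * py[j])) for (i, j), c in joint.items())


def shuffled(pairs, rng):
    """Break the pairing by permuting the second coordinates."""
    seconds = [b for _, b in pairs]
    rng.shuffle(seconds)
    return [(a, b) for (a, _), b in zip(pairs, seconds)]


def support_diagnostics(population_pairs, bins=8, seed=0):
    rng = random.Random(seed)
    pooled = [pair for pairs in population_pairs for pair in pairs]
    i_hat = histogram_mi(pooled, bins)
    # Individual permutation: shuffle within each individual, then aggregate.
    b_irp = histogram_mi([p for pairs in population_pairs for p in shuffled(pairs, rng)], bins)
    # Population permutation: shuffle across the whole aggregated data set.
    b_prp = histogram_mi(shuffled(pooled, rng), bins)
    h_s = abs(b_irp - i_hat) / i_hat if i_hat > 0 else float("nan")
    return {"I_hat": i_hat, "B_IRP": b_irp, "B_PRP": b_prp,
            "B_RP": abs(b_prp - b_irp), "H_S": h_s}
```

With strongly disjoint supports, the within-individual shuffle leaves most of the aggregated TDMI intact, so H_S falls toward zero and B_RP grows.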

Step three (B in Fig. 2): Assessing population representation

Finally, it is extremely important to understand what portions of the population actually have points in a given δt bin. Recall that the make-up of the population used to estimate I at a specific δt is a concern because of the filtering effect (cf. Sec. 4C); specifically, it is possible to have entire portions of the population excluded from the data set as well as a highly nonuniform distribution of the population represented in the data set used to estimate the PDFs. Written differently, it is important to always remember that δI is always calculated relative to a fixed δt which will have a particular bin population—when studying the evolution of I as δt is varied, the representative population can change as δt changes. Thus, it is important to at least calculate H_Θ(δt) or an H_Θ-like quantity to verify what proportion of the population is being included in the PDF estimate. Moreover, we also find it convenient to keep track of the minimum (and sometimes maximum) number of pairs of points contributed by an element represented in the data set used to estimate the PDFs; we denote this number by N_min(δt) as a measure of the least representative individual.
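
The bookkeeping described in step three can be sketched as below, given the number of pairs each individual contributes to one δt bin. The summary reports N_min and N_max among represented individuals, the fraction of the population that is represented at all, and the largest share held by one individual. The function name and the choice of summary fields are assumptions of this example; H_Θ itself is not computed here because its definition is given elsewhere in the paper.

```python
def representation_summary(pair_counts):
    """pair_counts: pairs contributed by each individual to a single delta-t bin."""
    represented = [c for c in pair_counts if c > 0]
    total = sum(represented)
    return {
        "n_min": min(represented) if represented else 0,
        "n_max": max(represented) if represented else 0,
        "fraction_represented": len(represented) / len(pair_counts),
        "largest_share": max(represented) / total if total else 0.0,
    }
```

Recomputing this summary for every δt bin shows whether the represented population drifts as δt is varied.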

QUANTITATIVE EXAMPLES FOR TDMI INTERPRETATION AND POPULATION HOMOGENEITY EVALUATION

Simulated data examples: The quadratic map and the Gauss map

To explicitly demonstrate how to interpret $\bar{I}$ and $\hat{I}$ in the presence of a diverse population in a variety of circumstances, consider two sources of simulated data, the quadratic map

x_{t + 1} = f (x_{t}) = a x_{t} (1 - x_{t}),

(50)

where a is set to 4 and the Gauss map

x_{t + 1} = g (x_{t}) = \frac{1}{x_{t}} \mod 1 .

(51)

These sources were chosen because their statistical structures are well understood,²,²³,²⁴ they are chaotic, they are both 1-dimensional maps defined over the unit interval (meaning, they have the same support), and they have relatively different invariant densities. Figure 3 shows the graphs of the quadratic and Gauss maps, their individual invariant densities (PDFs of the orbit), and the sum of their invariant densities. Thus, in this context, the difference between p_f and p_g, $ε (x)$ , is both large enough such that the G’s will be non-zero and is non-uniform over the domain or nonlinearly dependent on x. The data sets we will use, based on the maps above, include:

(Color) The graphs of the quadratic map (Eq. 50) and the Gauss map (Eq. 51)—note the significant difference between the graphs of the mappings, and invariant density (PDF of the orbit) for the quadratic map, Gauss map, and the sum of the quadratic and Gauss maps—note the significant differences between the relative p’s.

Data set 1: Quadratic map time-series with 1000 points; this is one of the data sets meant as a baseline from which all the other cases can be compared.

Data set 2: Gauss map time-series with 1000 points; this is one of the data sets meant as a baseline from which all the other cases can be compared.

Data set 3: Data sets 1 and 2 concatenated into a single data set with 2000 data points; this data set is used primarily to test the effects of differing PDFs within a population on ι, G, and thus, $\bar{I}$ versus $\hat{I}$ .

Data set 4: 50 independent, concatenated quadratic map time-series with 20 points each totaling 1000 points; this data set is meant to highlight the effect of the estimator bias when calculating $\bar{I}$ versus $\hat{I}$ .

Data set 5: 10 independent, concatenated quadratic map time-series with 100 points each totaling 1000 points; this data set is meant to form a baseline for data set 6.

Data set 6: 10 independent, concatenated quadratic map time-series with 100 points each, with disjoint supports and increasing means, totaling 1000 points; this data set is used to demonstrate the effect of diverse supports amongst the population where the PDFs are identical on ι, G, B, and thus $\bar{I}$ versus $\hat{I}$ .

Each data set will be denoted by D_i where i is the indexed label of the respective data; the data sets are detailed in Table III.

TABLE III.

Complete list of the simulated data sets.

Synthetic data sets
Data set	Source	Size
D₁	chaotic quadratic map	1000 pts
D₂	chaotic Gauss map	1000 pts
D₃	D₁ concatenated with D₂	2000 pts
D₄	50 independent concatenated 20 pt chaotic quadratic map time series	1000 pts
D₅	10 independent concatenated 100 pt chaotic quadratic map time series	1000 pts
D₆	10 independent concatenated 100 pt chaotic quadratic map time series each with non-overlapping, monotonically increasing support	1000 pts
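
For readers who wish to reproduce data of this kind, the sketch below generates six data sets with the sizes listed in Table III from Eqs. 50 and 51. The random initial conditions, the seed, the integer offsets 1 through 10 used to make the supports of D₆ disjoint, and the restart rule for a Gauss orbit that lands exactly on zero are all assumptions of this example. Each data set is kept as a list of separate series so that delay pairs need not straddle the seams between independent series.

```python
import random


def quadratic_orbit(x0, n, a=4.0):
    """Iterate x -> a x (1 - x) (Eq. 50) and return n points starting at x0."""
    xs = [x0]
    while len(xs) < n:
        x = xs[-1]
        xs.append(a * x * (1.0 - x))
    return xs


def gauss_orbit(x0, n, rng):
    """Iterate x -> (1 / x) mod 1 (Eq. 51) and return n points starting at x0."""
    xs = [x0]
    while len(xs) < n:
        x = xs[-1]
        # Guard (assumption of this sketch): restart if the orbit hits exactly 0.
        xs.append((1.0 / x) % 1.0 if x > 0.0 else rng.uniform(0.01, 0.99))
    return xs


def build_datasets(seed=0):
    rng = random.Random(seed)

    def start():
        return rng.uniform(0.01, 0.99)

    d1 = [quadratic_orbit(start(), 1000)]
    d2 = [gauss_orbit(start(), 1000, rng)]
    d3 = d1 + d2  # the two sources treated as one population
    d4 = [quadratic_orbit(start(), 20) for _ in range(50)]
    d5 = [quadratic_orbit(start(), 100) for _ in range(10)]
    # Shift series k by k so the supports [k, k + 1] are disjoint and increasing.
    d6 = [[x + k for x in quadratic_orbit(start(), 100)] for k in range(1, 11)]
    return {"D1": d1, "D2": d2, "D3": d3, "D4": d4, "D5": d5, "D6": d6}
```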

Finally, to save space, we will demonstrate the TDMI and non-TDMI-based computations on all the simulated data sets at one time. We will adhere to the algorithm shown in Fig. 2 when analyzing the real data sets.

TDMI-based analysis of the simulated data

Base cases: testing the TDMI-based metrics on individuals—In Table IV, one can see that both the quadratic and Gauss maps have distinctly different I(τ = 1) values. Note that the Gauss map has a faster decay in correlations; for both maps, all correlations in time decay by τ = 6. Further notice that all bias estimation schemes are essentially identical as expected. This also implies that support-variation detecting quantities such as $H_{S}$ register no variation in supports.

TABLE IV.

TDMI results and homogeneity metrics for the simulated data sets one through six.

TDMI-based quantities
Source	$\bar{I} (τ = 1)$	$\hat{I} (τ = 1)$	${\bar{B}}_{IRP} (τ = 1)$	${\hat{B}}_{PRP}$	${\hat{B}}_{IRP} (τ = 1)$	B_RP(τ = 1)	$H_{S} (τ = 1)$	δρ(τ = 1)	δG(τ = 1)	δI(τ = 1)
D₁	0.72	—	0.008	0.008	0.008	0	0.99	0	0	0
D₂	0.31	—	0.012	0.012	0.012	0	0.96	0	0	0
D₃	0.52	0.37	0.01	0.008	0.007	0.001	0.98	0	0.15	0.15
D₄	0.34 ± 0.07	0.71	0.18 ± 0.03	0.013	0.011	0.002	0.98	0	δI	0.37 ± 0.07
D₅	0.48 ± 0.01	0.71	0.04 ± 0.01	0.006	0.007	0.001	0.99	0	δI	0.24 ± 0.01
D₆	0.48 ± 0.01	1.12	0.04 ± 0.01	1.12	0.011	1.11	0	unknown	unknown	0.55 ± 0.01

Support dependent, graph independent analysis—To see how diverse supports are rendered, consider the contrast between D₅ and D₆, whose only difference is in the location of the supports. Both of the support-based TDMI metrics, B_RP and $H_{S}$ , produced dramatic representations of the disjoint nature of the supports of data set six (cf. Table IV). Notably, the differences between D₅ and D₆ in both B_RP and $H_{S}$ are near their respective maxima.

Graph dependent, support independent analysis—Data set three, the quadratic-Gauss aggregated data set, has homogeneity in support in all support-based metrics (B_RP and $H_{S}$ ) as can be seen in Table IV. In particular, both $H_{S}$ and all the random permutation bias estimates are totally unaffected by the existence of $\bar{ε}$ or $\hat{ε} \neq 0$ . Furthermore, δI ≠ 0, meaning that the population averaged TDMI and the TDMI of the aggregated population were different. In particular, $\bar{I} > \hat{I}$ , thus leading to the conclusion that $\bar{G} > \hat{G}$ , which is not surprising given that when the ${\bar{ε}}_{i} = {\hat{ε}}_{i}$ for all i, it is reasonable that the ε’s register greater through the sum than the aggregate. In any event, all the TDMI based metrics registered the diversity in the population of PDFs.

Support dependent graph-based analysis—To begin to see how support and graph effects intermix, consider $\hat{I}$ for a data set identical to D₆ except where the quadratic data has been replaced with uniform random numbers, thus yielding data with purely population location information; denote this data set as $D_{6}^{'}$ . Now, $\hat{I} (D_{6}^{'}) \approx 1.16 \pm 0.01$ , thus comparing $\hat{I} (D_{6})$ to $\hat{I} (D_{6}^{'})$ , we notice that the presence of intra-agent time-based correlation decreases the population scale TDMI by a small but measurable amount—here, $| \hat{I} (D_{6}) - \hat{I} (D_{6}^{'}) | \approx 0.04$ . Therefore, while nearly all the intra-agent TDMI is subsumed by the inter-agent TDMI, when there is a presence of both strong intra-agent information as well as strong inter-agent information (i.e., highly disjoint supports), $\hat{I}$ will contain both intra-agent and inter-agent components.

What the example in the previous paragraph shows is that deducing the contribution of the intra-agent and inter-agent components to $\hat{I}$ will, in many cases, be non-trivial. Nevertheless, the use of metrics that detail the PDF variation can sometimes aid in the interpretation of $\hat{I}$ . First, consider how the heuristic metrics of PDF variation render the variation in PDFs. Both the super sensitive H_RA and more robust, less sensitive V(p), for D₆ are about double their values for D₅, even though D₅ will yield considerably noisier PDF estimates due to the smaller sample size per element. Similarly, the TDMI metrics for PDF variation also render population diversity; δI for D₆ is more than twice δI for D₅. However, δI for D₆ has a slightly more complicated interpretation. In particular, while δI represents the difference between the population and the individual TDMI, there is likely a non-trivial component of $\bar{I}$ that is a function of sample size. Thus, δI is not purely the difference between the individual and the population TDMI for unlimited data as it was for D₃. Nevertheless, because $\bar{I} ≫ B_{E} (D_{6})$ and $δ I ≫ B_{E} (D_{6})$ , we know that $\hat{I}$ has components of both individual and population scale TDMI. In fact, considering $| \hat{I} (D_{5}) - \hat{I} (D_{6}) | \approx 0.41$ versus $| \hat{I} (D_{5}) - \hat{I} (D_{6}^{'}) | \approx 0.44$ , one can see that the TDMI for D₆ is due to support-based effects instead of intra-element effects; presumably if the supports for D₆ were nearly overlapping instead of disjoint, $| \hat{I} (D_{5}) - \hat{I} (D_{6}) |$ would be much closer to zero (as is the case for $| \hat{I} (D_{5}) - \hat{I} (D_{4}) |$ ). 
While it is unusual to be able to compare identical, stationary systems with differing supports, this analysis does suggest that calculating $\hat{I}$ for the raw data and for the data with normalized supports may be useful for determining the proportion of $\hat{I}$ that is due to the diversity of the supports.
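
The comparison suggested above, between the aggregated TDMI of the raw data and that of the data with normalized supports, might be sketched as follows. Each individual's pairs are rescaled to the unit interval by that individual's own minimum and maximum before aggregation. The min-max normalization, the histogram estimator, and the function names are assumptions of this example.

```python
import math
from collections import Counter


def histogram_mi(pairs, bins=8):
    """Plug-in mutual information (nats) from an equal-width 2-D histogram."""
    values = [v for pair in pairs for v in pair]
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0

    def cell(v):
        return min(int((v - lo) / width), bins - 1)

    joint = Counter((cell(a), cell(b)) for a, b in pairs)
    px, py = Counter(), Counter()
    for (i, j), c in joint.items():
        px[i] += c
        py[j] += c
    n = len(pairs)
    return sum(c / n * math.log(c * n / (px[i] * py[j])) for (i, j), c in joint.items())


def rescale_to_unit(pairs):
    """Map one individual's values onto [0, 1] using that individual's own range."""
    values = [v for pair in pairs for v in pair]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [((a - lo) / span, (b - lo) / span) for a, b in pairs]


def raw_and_normalized_tdmi(population_pairs, bins=8):
    """Aggregated TDMI before and after removing differences in support."""
    raw = histogram_mi([p for pairs in population_pairs for p in pairs], bins)
    normalized = histogram_mi(
        [p for pairs in population_pairs for p in rescale_to_unit(pairs)], bins)
    return raw, normalized
```

The gap between the two values gives a rough indication of how much of the aggregated TDMI comes from the location of the supports rather than from correlation within individuals.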

Non-TDMI-based analysis of the simulated data

Base cases: testing the non-TDMI metrics on individuals—Begin by considering D₁ and D₂, both of which represent only a single source. Both data sets are well defined in p (cf. Fig. 3) and have supports whose lengths, |S|, and boundaries, s_min, s_max, are well resolved and within their expected ranges (cf. Table V).

TABLE V.

Heuristic homogeneity metrics for the simulated data sets one through six.

non-TDMI-based population diversity metrics
Source	$H (\bar{x})$	Var(n_i)	$s_{\min} \pm V_{s_{\min}}$	$s_{\max} \pm V_{s_{\max}}$	$|S| \pm V_{|S|}$	H_RA	$V_{\bar{S}} (p)$
D₁	0	0	0.0001	0.999	0.9989	0	0
D₂	0	0	0.0002	0.9998	0.9997	0	0
D₃	0	0	0.0002 ± 0.0003	0.9989 ± 0.0015	0.9987 ± 0.0018	0.16	0.09
D₄	0	0	0.02 ± 0.02	0.996 ± 0.006	0.98 ± 0.03	0.9	0.39
D₅	0	0	0.001 ± 0.002	0.9997 ± 0.0006	0.998 ± 0.003	0.37	0.13
D₆	0	0	5.5 ± 3	6.5 ± 3	0.997 ± 0.004	0.68	0.32

Support dependent, graph independent analysis—To see how variations in the supports are rendered, consider the contrast between D₅ and D₆, whose only difference is in the location of the supports. In contrast to D₅, for D₆, the variation in support shows up in the heuristic metrics s_min, s_max, |S|, and especially in the variance of s_min and s_max.

Graph dependent, support independent analysis—Data set three, the quadratic-Gauss aggregated data set, has homogeneity in support in all support-based metrics (s_min, s_max, |S|) as can be seen in Table V. In contrast, both of the heuristic metrics designed to detect variation in PDFs (H_RA, $V_{\bar{S}} (p)$ ) registered as non-zero, meaning they detected variation in the PDFs. Moreover, the L₁-like diagnostic, H_RA, was more sensitive than the variance based metric, $V_{\bar{S}} (p)$ , as expected.

Support dependent graph-based analysis—By design, none of the examples mix graph and support effects simultaneously.

Quantifying small sample-size effects

To form a baseline of small sample size effects for both real data applications and the support-based effects, we focus on comparing and constraining results for D₄ and D₅, the quadratic map data sets with 50 sets of 20 points, and 10 sets of 100 points.

Small sample size effects on non-TDMI-based support analysis metrics—The heuristic metrics of support diversity, s_min, s_max, and |S| show homogeneity in support for D₄ and D₅ in an absolute sense. However, it is important to note that the invariant density of the quadratic map has most of its mass at the end points, and thus may represent the best case scenario for support based metrics on small data sets. Moreover, differences between D₄ and D₅ can be observed—s_min for D₄ is roughly an order of magnitude larger than s_min for D₅.

Small sample size effects on TDMI-based support analysis metrics—The TDMI based metrics of support diversity (B_RP, $H_{S}$ ) show homogeneity of support, although the individual-wise random perturbation for the random case (B_IRP) is rather high, especially for the 20 point data sets, as one might expect. However, we hypothesize that the primary reason why B_IRP is so high for the 20 point data sets is that, upon randomly permuting any data set, the average τ will be the length of the data set over 3, in this case, $\frac{20}{3} < 7$ . Thus, for very short data sets, it can be difficult to approximate the estimator bias using only the random permutation method.¹⁹

Small sample size effects on non-TDMI-based graph analysis metrics—In contrast to the support-based effects, the heuristic-based PDF variability metrics (H_RA, $V_{\bar{S}} (p)$ ) register substantial diversity among the PDFs of D₄ and D₅, effects that are entirely a function of small sample sizes. These results are not surprising given that there will be great variance in the PDF estimate of a quadratic time-series with only 20 points.

Small sample size effects on TDMI-based graph analysis metrics—The small sample size situation highlights both the difference between $\bar{I}$ and $\hat{I}$ and also displays the motivation for why one would want to estimate $\hat{I}$ . The average based TDMI results for both D₄ and D₅ do not approximate the 1000 point analogs; moreover, the addition of more sets of data with similar lengths will not help $\bar{I}$ to converge to the higher point analog but will rather decrease the variance in the mean $\bar{I}$ value. Thus, the meaning of $\bar{I}$ is, in a sense, a precision/accuracy type issue; adding more 20 point data sets will make the estimate of $\bar{I}$ more precise, but not necessarily more accurate. That said, accuracy is always defined relative to a target; there is likely less TDMI in the 20 point data set because there is considerably less time-based information in a 20 or 100 point data set than in a 1000 point data set. Therefore, while adding more data sets will not aid in convergence to the infinite point analog, the infinite point analog may not be the right target to be aiming for with 20 point data sets. In contrast, the aggregated data sets produce a TDMI equivalent to the 1000 point analog, thus inducing a δI. Moreover, adding points to the aggregated data set will help with convergence to I(τ = 1) for infinitely long data strings.
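
The precision versus accuracy distinction can be illustrated with a deliberately simple surrogate in which the true mutual information is zero: pairs of independent uniform random numbers. The sketch isolates only the estimator component of the small sample effect and does not reproduce the quadratic map values in Table IV. The histogram estimator, bin count, set sizes, and seed are assumptions of this example.

```python
import math
import random
from collections import Counter


def histogram_mi(pairs, bins=4):
    """Plug-in mutual information (nats) from an equal-width 2-D histogram."""
    values = [v for pair in pairs for v in pair]
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0

    def cell(v):
        return min(int((v - lo) / width), bins - 1)

    joint = Counter((cell(a), cell(b)) for a, b in pairs)
    px, py = Counter(), Counter()
    for (i, j), c in joint.items():
        px[i] += c
        py[j] += c
    n = len(pairs)
    return sum(c / n * math.log(c * n / (px[i] * py[j])) for (i, j), c in joint.items())


def small_sample_demo(n_sets=50, set_size=20, bins=4, seed=0):
    """Compare the average over many short sets with the pooled estimate."""
    rng = random.Random(seed)
    sets = [[(rng.random(), rng.random()) for _ in range(set_size)]
            for _ in range(n_sets)]
    averaged = sum(histogram_mi(s, bins) for s in sets) / n_sets
    pooled = histogram_mi([p for s in sets for p in s], bins)
    return averaged, pooled
```

Increasing `n_sets` tightens the spread of the averaged value without moving it toward zero, whereas the pooled value shrinks toward zero as points are added.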

Interpreting δI when individual elements have few pairs of points—The existence of δI for D₄ and D₅ introduces a form of divergence from I(τ = 1, N = ∞) that is not quite a bias (either estimator or non-estimator); the “true” amount of information in a data string of length 20 is fundamentally different from the “true” amount of information in a data string of length N = ∞; thus, δI can also exist due to finite sample size effects. Or, said more quantitatively, $\bar{I}$ , even for an unlimited collection of 100 point data strings, will never be within estimator bias, or any other kind of bias, of I(τ = 1, N = ∞) because I(τ = 1, N = ∞) ∼ 0.72 while I(τ = 1, N = 100) ≈ 0.48 ± 0.1. What this means for $\hat{I}$ is that, unless the aggregated data sets are homogeneous enough in their time-dependent correlation structure, $\hat{I}$ will likely represent population distribution information, as $\bar{I}$ would represent the upper bound on time-correlation based information present in each data string. Often the composition of most real world data streams can be difficult to infer; moreover, it can be a non-trivial problem to discern whether $\bar{I}$ or $\hat{I}$ most faithfully represents a population or individual effects. For instance, in Ref. 25, the authors claim that both time-correlation information and population-based time-correlation are simultaneously present. Usually, a careful analysis of the population composition of the δt bins will help rectify this difficulty.

Real data examples: Glucose values for 100 densely sampled individuals versus 20,000 random individuals

We now move on to applying the insights and techniques of the previous sections to real data. In particular, we will consider two data sets that contain different populations of patients from the CUMC data repository. More specifically, the data sets include

Data set 7: A collection of the 100 patients with the most glucose measurements in the database, ranging from ∼4000 to ∼1500 measurements per patient.

Data set 8: A collection of 20 000 random patients with at least 2 glucose measurements from among the 800 000 patients with glucose values.

Each data set will be denoted D_i where i is the indexed label of the respective data; the data sets are detailed in Table VI.

TABLE VI.

Complete list of the real patient data sets.

EHR-based data sets
Data set	Source	Size
D₇	glucose time series from the 100 patients with the most glucose values	length of time series and sampling rate varies with patient
D₈	glucose time series from 20 000 randomly selected patients	length of time series and sampling rate varies with patient

To visualize these populations, consider Fig. 4 where the normalized PDFs for each individual for each population and the PDF of the overall populations are plotted. While the population-wide PDFs, shown in Fig. 4c, are not wildly different, the relative diversity within the two populations, as shown in Figs. 4a, 4b, is dramatic. The motivation for choosing D₇ is that, because each patient in this set has at least 1000 lab values, both $\bar{I}$ and $\hat{I}$ are calculable. Moreover, the authors hypothesize that patients with so many glucose values are more likely to represent a more homogeneous population compared with the population at large. Given the makeup of D₇, D₈ represents not only a contrast to D₇ in that D₈ is a snapshot of the entire population, but D₈ also represents a pathologically difficult situation data-wise: very few patients have more than 100 glucose values, and the set of possible causes for the existence of a glucose measurement is extremely large (or broad). Thus, not only will $\bar{I}$ be difficult to calculate for D₈ (most patients would not have enough data to generate a PDF estimate), but there is likely tremendous and differing diversity amongst the patients actually included in the estimates of $\bar{I}$ and $\hat{I}$ .

(Color) PDFs of glucose measurements for individuals within a population and for a population for two data sets, the 100 patients with the largest records and 20 000 random patients.

Finally, note that in contrast to the previous analysis of simulated data, we will present the TDMI results first, followed by an analysis using the non-TDMI metrics to verify the TDMI results. The point of this ordering is to demonstrate the TDMI infrastructure without hindsight knowledge.

TDMI-based analysis for data set 7, the well measured population

Analysis of the δt = 6 h time separation using the algorithm in Fig. 2—First, considering Table VII, note that for D₇ with a δt = 6 h, we are able to estimate $\bar{I}$ , and thus δI, because N_min(6 h) > 100. Next, note that δI(6 h) is considerably above B_IRP(6 h), meaning that the population on the time-scale of 6 h is heterogeneous. Moreover, both $\bar{I} (6 h)$ and $\hat{I} (6 h)$ are greater than zero, meaning that there is TDMI present in individuals and the aggregated population. To determine the nature of heterogeneity, further consider the support-based metric; $H_{S} (6 h) \sim 1$ points to the population having uniformity in supports or ranges (B_RP(6 h) ≈ B_IRP(6 h), which corroborates this conclusion). Finally, the entire population is reasonably represented for δt = 6 h as confirmed by the fact that N_min(6 h) ∼ 500 and $H_{Θ} (6 h) ≫ 0$ . Thus, the concluding interpretation is as follows: the population is heterogeneous on the δt = 6 h time scale; the heterogeneity in the population is in the graphs, not the supports (or the normalizations); there is diverse but present temporal correlation among the population (i.e., the TDMI is not due to the population aggregation but exists because of the individuals); and the entire population is well represented in the TDMI-based quantities.

TABLE VII.

TDMI results and homogeneity metrics for the real patient data sets seven and eight; note all δt times are in hours.

TDMI-based quantities for the δt = 6 h time separation
Source	$\bar{I}$	$\hat{I}$	δI	${\bar{B}}_{PRP}$	${\hat{B}}_{IRP}$	${\hat{B}}_{PRP}$	B_RP	$H_{S}$	H_Θ	N_min
D₇	0.64 ± 0.03	0.22	0.42 ± 0.03	0.02 ± 0.01	0.02 ± 0.005	0.001 ± 0.0005	$\sim {\hat{B}}_{IRP}$	1 ± 0.0005	0.31	470
D₈	0.29 ± 0.16	0.38	0.09 ± 0.37	0.2 ± 0.2	0.08 ± 0.005	0.006 ± 0.0005	$\sim {\hat{B}}_{IRP}$	1 ± 0.02	0.003	1

Analysis of the δt = 24 h time separation using the algorithm in Fig. 2—First, considering Table VIII, note that for D₇ with a δt = 24 h, we are able to estimate $\bar{I}$ , and thus δI, because N_min(24 h) > 100. Next, note that δI(24 h) is within the error bars of zero (e.g., below B_IRP(24 h)), meaning that the population on the time-scale of 24 h is homogeneous. Moreover, both $\bar{I} (24 h)$ and $\hat{I} (24 h)$ are greater than zero, meaning that there is TDMI present in individuals and the aggregated population. To determine the nature of heterogeneity, further consider the support-based metric; $H_{S} (24 h) \sim 1$ points to the population having uniformity in supports or ranges (B_RP(24 h) ≈ B_IRP(24 h), which corroborates this conclusion). Finally, the entire population is reasonably represented for δt = 24 h as confirmed by the fact that N_min(24 h) ∼ 500 and $H_{Θ} (24 h) ≫ 0$ . Thus, the concluding interpretation is as follows: the population is homogeneous on the δt = 24 h time scale; there is present temporal correlation among the population (i.e., the TDMI is not due to the population aggregation, but exists because of the individuals); and the entire population is well represented in the TDMI-based quantities.

TABLE VIII.

TDMI results and homogeneity metrics for the real patient data sets seven and eight; note all δt times are in hours.

TDMI-based quantities for the δt = 24 h time separation
Source	$\bar{I}$	$\hat{I}$	δI	${\bar{B}}_{PRP}$	${\hat{B}}_{IRP}$	${\hat{B}}_{PRP}$	B_RP	$H_{S}$	H_Θ	N_min
D₇	0.093 ± 0.06	0.077	0.016 ± 0.06	0.02 ± 0.01	0.02 ± 0.005	0.001 ± 0.0005	$~ {\hat{B}}_{IRP}$	0.99 ± 0.01	0.33	479
D₈	0.21 ± 0.15	0.17	0.04 ± 0.15	0.3 ± 0.2	0.07 ± 0.01	0.005 ± 0.001	$~ {\hat{B}}_{IRP}$	0.97 ± 0.001	0.005	1


Analysis independent of time—Considering the entropy calculations in Table IX, D₇ renders some heterogeneity because the difference between $\bar{h}$ and $\hat{h}$ is non-zero. Nevertheless, compared with what we will see for D₈, an entropy difference of 0.73, which is about half the magnitude of $\bar{h}$, argues that the static information theoretic interpretation of the population is one of relative homogeneity.

TABLE IX.

Time independent TDMI results for the real patient data sets seven and eight.

Time independent TDMI-based quantities
Source	$\bar{h}$	$\hat{h}$
D₇	1.39 ± 0.07	2.12
D₈	0.8 ± 0.22	2.05
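
As a worked check, the entropy differences quoted in the text follow directly from the entries of Table IX (taking the difference as $\hat{h} - \bar{h}$, which is an assumption about the sign convention):

\hat{h} - \bar{h} = 2.12 - 1.39 = 0.73 \approx 0.53 \, \bar{h} \quad (D_{7}), \qquad \hat{h} - \bar{h} = 2.05 - 0.8 = 1.25 \approx 1.56 \, \bar{h} \quad (D_{8}) .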


Sample size issues—There were no sample size issues with respect to either of the δt time separations studied; in both cases, N_min was well over 100, and thus all PDFs and their respective biases could be accurately estimated. In fact, careful analysis of the population make-up in each δt bin between 6 h and 56 h revealed that the proportional contribution of each individual remained relatively constant. Finally, Fig. 6, where the TDMI estimated using both KDE and histogram estimation schemes is shown, confirms the lack of any small sample size effects because the two estimation schemes give essentially equal results.
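
The idea of using agreement between two estimators as a small-sample diagnostic can be sketched as follows. This is only an illustration in plain Python: the histogram estimator uses equal-width bins, and the KDE estimator is a simple Gaussian product-kernel resubstitution estimator with a rule-of-thumb bandwidth. Both are stand-ins chosen for brevity, and the bin count, bandwidth rule, and synthetic data are all assumptions; this is not the estimation code behind Fig. 6.

```python
import math
import random


def mi_histogram(pairs, bins=8):
    """Plug-in mutual information (nats) from an equal-width 2D histogram."""
    xs, ys = zip(*pairs)
    n = len(pairs)

    def bin_index(v, lo, hi):
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)

    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    joint, px, py = {}, {}, {}
    for x, y in pairs:
        i, j = bin_index(x, x_lo, x_hi), bin_index(y, y_lo, y_hi)
        joint[(i, j)] = joint.get((i, j), 0) + 1
        px[i] = px.get(i, 0) + 1
        py[j] = py.get(j, 0) + 1
    # c*n/(px*py) equals p_ij / (p_i * p_j) written with raw counts.
    return sum(c / n * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def mi_kde(pairs):
    """Resubstitution mutual information (nats) with Gaussian product kernels."""
    xs, ys = zip(*pairs)
    n = len(pairs)

    def bandwidth(values):
        mean = sum(values) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
        return 1.06 * sd * n ** (-0.2)  # rule-of-thumb bandwidth (assumption)

    hx, hy = bandwidth(xs), bandwidth(ys)
    total = 0.0
    for x, y in pairs:
        kx = [math.exp(-0.5 * ((x - a) / hx) ** 2) for a in xs]
        ky = [math.exp(-0.5 * ((y - b) / hy) ** 2) for b in ys]
        # Kernel normalization constants cancel in the density ratio.
        p_x, p_y = sum(kx) / n, sum(ky) / n
        p_xy = sum(u * v for u, v in zip(kx, ky)) / n
        total += math.log(p_xy / (p_x * p_y))
    return total / n


random.seed(0)
n = 300
dependent = []
for _ in range(n):
    x = random.gauss(0, 1)
    dependent.append((x, 0.9 * x + math.sqrt(1 - 0.81) * random.gauss(0, 1)))
independent = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]

hist_dep, kde_dep = mi_histogram(dependent), mi_kde(dependent)
hist_ind, kde_ind = mi_histogram(independent), mi_kde(independent)
print("dependent  ", round(hist_dep, 3), round(kde_dep, 3))
print("independent", round(hist_ind, 3), round(kde_ind, 3))
```

Rerunning the comparison with progressively smaller `n` is one way to see the two estimates drift apart, which is the kind of disagreement the text treats as a sign of small sample size effects.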

(Color online) The TDMI for both $\bar{I}$ and $\hat{I}$ with δt bins of 6 h for a period of a few days for D₇ and D₈; note that the bias estimates can be found in Tables VII and VIII. With respect to (a), note the following: for δt ≤ 6 h, δI > 0 and for δt > 6 h, δI ≈ 0; the KDE and histogram estimates are extremely similar; the diurnal (daily) periodic variation in correlation of glucose is clearly evident in both $\bar{I}$ and $\hat{I}$. With respect to (b), note the following: for all δt, δI is consistent and likely zero within bias; the KDE and histogram estimates differ greatly, implying the presence of small sample size effects in the average TDMI calculation; the diurnal (daily) periodic variation in correlation of glucose is clearly evident in both $\bar{I}$ and $\hat{I}$ in all but the KDE estimated TDMI average.

Non-TDMI-based analysis for data set 7, the well measured population

Non-TDMI support-based analysis—To verify the TDMI-based results, begin by observing that the heuristic metric that quantifies variation in the supports is $H (\bar{X}) \approx 1$, which is considered small. Thus, while there is some diversity in how the patients were measured, the variation in how patients are measured is small. This claim is also justified by the fact that the variance in the number of points contributed, per patient, to the δt = 6 h bin, Var(n_i), is small. Finally, the variance in s_min, s_max, and |S| is small compared to the respective values (cf. Fig. 5a). Because these are time-independent measures of the support, and because adding the temporal aspect of the analysis only makes the data set smaller, it is likely that the TDMI analysis of the homogeneity of support is correct.
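
The support quantities discussed above (s_min, s_max, and |S| per individual, summarized across the population) are simple to compute. A minimal sketch, assuming each individual is a plain list of measured values and reporting spread as a population standard deviation; whether the tabulated V values are variances or standard deviations is not restated here, so that choice is an assumption, as are the function name and the toy data:

```python
from statistics import mean, pstdev


def support_summary(population):
    """Mean and spread of s_min, s_max and |S| = s_max - s_min across individuals."""
    s_min = [min(series) for series in population]
    s_max = [max(series) for series in population]
    width = [hi - lo for lo, hi in zip(s_min, s_max)]
    return {
        name: (mean(values), pstdev(values))
        for name, values in (("s_min", s_min), ("s_max", s_max), ("|S|", width))
    }


# Two hypothetical individuals with glucose-like values.
summary = support_summary([[80, 95, 120], [90, 110, 150]])
for name, (center, spread) in summary.items():
    print(name, center, spread)
```

A spread that is small relative to the corresponding mean is the pattern the text describes for D₇.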

(Color) Comparisons of the supports and PDF graph variations for two data sets, the 100 patients with the largest records and 5000 random patients.

Non-TDMI graph-based analysis—The most sensitive PDF variation metric, H_RA, points to a relatively diverse population, while the less sensitive PDF variation metric, $V_{\bar{S}} (p)$, based on the standard deviation of the distribution of PDFs, points to a relatively homogeneous, yet not totally homogeneous, population. Figure 5 confirms this analysis visually. The maxima minus the minima, which, when integrated, is essentially H_RA, shown in Fig. 5b, can be seen to be relatively large, thus making H_RA render diversity. In contrast, the variance in the graphs of the PDFs, shown in Fig. 5c, is seen as relatively small for D₇, thus making $V_{\bar{S}} (p)$ render relative homogeneity. It is important to note, however, that $V_{\bar{S}} (p)$, which is independent of time, does not detail the fact that the population has diverse predictive information for time periods less than 6 h; this is an important distinction to make as it implies that prediction can vary with time despite the overall distribution of physiological variables. Finally, both the TDMI and the heuristic analysis conclude that the population is homogeneous in supports and that, in the long term (i.e., independent of time), the population is homogeneous; this is because δI ∼ 0 for δt > 12 h and $V_{\bar{S}} (p)$ is small.
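
The contrast between a range-based and a spread-based measure of PDF variation can be illustrated with a short sketch. It mirrors only the verbal description given here (pointwise maximum minus minimum, integrated, versus the pointwise standard deviation, integrated); the exact definitions and normalizations of H_RA and $V_{\bar{S}} (p)$ are not reproduced, so the function below is a hypothetical stand-in and its outputs are not on the same scale as the tabulated metrics.

```python
import math


def graph_variation(pdfs, dx):
    """Return (range-based, spread-based) variation of PDFs on a shared grid."""
    n = len(pdfs)
    range_area = 0.0
    spread_area = 0.0
    for values in zip(*pdfs):  # one tuple of PDF values per grid point
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        range_area += (max(values) - min(values)) * dx
        spread_area += math.sqrt(var) * dx
    return range_area, spread_area


# Two hypothetical PDFs with disjoint mass on a 4-point grid of spacing 0.5.
print(graph_variation([[1, 1, 0, 0], [0, 0, 1, 1]], dx=0.5))
```

Because the range responds to a single outlying PDF while the standard deviation averages over all of them, the range-based quantity is the more sensitive of the two, consistent with the description of H_RA.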

TDMI-based analysis for data set 8, the random (less well measured) population

Analysis of the δt = 6 h time separation using the algorithm in Fig. 2—First, considering Table VII, note that for D₈ with δt = 6 h, we are not really able to estimate $\bar{I} (6 h)$ because N_min(6 h) = 1. To interpret $\hat{I} (6 h)$, we consider the support-based metric; $H_{S} (6 h) \sim 1$, which points to the population, which was filtered and has time points separated by 6 h, having uniformity in supports or ranges (B_RP(6 h) ≈ B_IRP(6 h), which corroborates this conclusion). To give intuition to the graph-based variation, consider $V_{\bar{S}} (p)$ (Table X), which implies a somewhat diverse population. Moreover, $V_{\bar{S}} (p)$ for D₈ is twice that of D₇, implying that the population in D₈ is more diverse than that of D₇. Moving beyond the algorithm shown in Fig. 2, we did estimate $\bar{I} (6 h)$ and thus, δI(6 h), only including individuals with enough points to estimate I. Based on this restricted version of δI(6 h), the population appears to be homogeneous. Nevertheless, both the restricted $\bar{I} (6 h)$ and $\hat{I} (6 h)$ are greater than zero, meaning that there is TDMI present in individuals and the aggregated population. This means that there is an apparent contradiction; the restricted δI(6 h) implies a population that is somewhat homogeneous/heterogeneous while $V_{\bar{S}} (p)$ implies a heterogeneous population. This contradiction is resolved by recalling that $V_{\bar{S}} (p)$ is calculated on the entire, non-filtered population and is independent of time and will overestimate graphic diversity, while δI is overly restricted and will underestimate diversity. This interpretation will be substantiated further in Sec. 9B5. Finally, the overall population is poorly represented for δt = 6 h as confirmed by the fact that N_min(6 h) = 1 and H_Θ(6 h) ≈ 0. In fact, for D₈, we know that 63% of the patients (12 763) have no points in the δt = 6 h bin, and only 12% (2400) of the patients have ten or more points in the δt = 6 h bin.
Thus, the concluding interpretation is as follows: the population is homogeneous on the δt = 6 h time scale up to what is resolvable by δI(6 h); the represented population has relatively uniform supports; there is diverse but present temporal correlation among the population (i.e., the TDMI is not due to the population aggregation, but exists because of the individuals); the population has diversity relative to their time-independent graphs, but this graph diversity may not reflect the graph diversity of the represented population (i.e., the population used to estimate the TDMI-based quantities); the overall population of patients is poorly represented in the TDMI-based diagnostics; and finally, the overall population of 20 000 patients is diverse, but the patients that have enough data to estimate the TDMI on time-scales of δt ≤ 48 h (i.e., the represented population), which represents a strongly filtered subpopulation, is relatively homogeneous in predictive information regardless of δt.
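
The representation figures quoted above (the share of patients with no points in a δt bin and the share with ten or more) can be obtained by counting, per patient, the measurement pairs whose separation falls in the bin. A minimal sketch under stated assumptions: times are in hours, all pairs of measurements are eligible, the bin is the half-open interval [lo, hi), and N_min is taken over patients with at least one pair. None of these choices is confirmed by the text, and the function names and toy data are illustrative.

```python
def pairs_in_bin(times, lo, hi):
    """Count measurement pairs whose time separation lies in [lo, hi)."""
    times = sorted(times)
    return sum(
        1
        for i, t0 in enumerate(times)
        for t1 in times[i + 1:]
        if lo <= t1 - t0 < hi
    )


def representation(population, lo, hi, enough=10):
    """Summarize how many individuals contribute pairs to one time-delay bin."""
    counts = [pairs_in_bin(times, lo, hi) for times in population]
    n = len(counts)
    return {
        # Smallest non-zero contribution (assumed meaning of N_min).
        "n_min": min((c for c in counts if c > 0), default=0),
        "frac_empty": sum(c == 0 for c in counts) / n,
        "frac_enough": sum(c >= enough for c in counts) / n,
    }


# Three hypothetical patients; the bin around 6 h is taken as [3, 9) hours.
patients = [[0, 6, 12], [0, 1], [0, 6.5]]
print(representation(patients, lo=3, hi=9, enough=2))
```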

TABLE X.

Heuristic homogeneity metrics for the real patient data sets seven and eight.

non-TDMI-based analysis metrics
Source	$H (\bar{x})$	Var(n_i)	$s_{\min} \pm V_{s_{\min}}$	$s_{\max} \pm V_{s_{\max}}$	$|S| \pm V_{|S|}$	H_RA	$V_{\bar{S}} (p)$
D₇	1.042	463.7	29.7 ± 10.7	445.0 ± 58.8	415.4 ± 62.7	0.898	0.432
D₈	30	55	84 ± 35	150 ± 122	66 ± 125	1	0.90


Analysis of the δt = 24 h time separation using the algorithm in Fig. 2—Considering Table VIII (and later, Fig. 6b), the analysis of the TDMI diagnostics for δt = 24 h is essentially identical to the δt = 6 h case. Even the representative population for both the δt = 6 and 24 h bins is essentially identical down to the individual proportional contributions to the aggregated data set. Thus, the key observation here is the difference between D₇ and D₈; D₇ registered heterogeneity at δt = 6 h and homogeneity at δt = 24 h whereas D₈ does not render a δt dependence in the TDMI-based diagnostics.

Analysis independent of time—Considering the entropy calculations in Table IX, D₈ renders heterogeneity because the difference between $\bar{h}$ and $\hat{h}$ is non-zero. In particular, compared to the entropy difference for D₇, D₈ has an entropy difference of ∼1.25, which is substantially larger in magnitude than $\bar{h}$. Thus the static information theoretic interpretation of the population in D₈, which includes all patients (there is no filtering effect), is one of heterogeneity.

Sample size issues—There are three sample size issues present in the TDMI analysis of D₈: the poor representation of the overall population, the inability to estimate I for every representative member of the population, and the overall small sample size and bandwidth/normalization issues. The first issue implies that the probability mass used to estimate the PDFs comes from a very small subset of the population; e.g., only 12% of the population has 10 or more points in the δt = 6 h bin. Thus, the restricted (i.e., filtered) population is likely substantially more homogeneous than the overall population, and the TDMI analysis cannot be said to represent the overall population. Relative to the second issue, since N_min = 1 (for both δt = 6 and 24 h), $\bar{I} (δ t)$ is representative of a smaller population than $\hat{I} (δ t)$. Finally, the third issue, small sample size effects, can be seen in the large difference (about a factor of 2) between the KDE and histogram estimator based TDMI values seen in Fig. 6.

Non-TDMI-based analysis for data set 8, the random (less well measured) population

Non-TDMI support-based analysis—Begin by noticing that there is considerable diversity in how the 20 000 patients are measured, as can be seen in $H (\bar{X}) \approx 30$, which is roughly 30 times larger than $H (\bar{x})$ for D₇. Considering this in conjunction with Var(n_i) ≈ 50 for D₈, which is much smaller than Var(n_i) for D₇, implies that very few of the patients have many points. Said differently, the reason why Var(n_i) is relatively small compared to $H (\bar{x})$ is that n_i is bounded from below by 0 and is never very large for any member of D₈. That this is the case is reflected in the variance in s_min, s_max, and |S|, which is large relative to (on the order of, or greater than) the values of s_min, s_max, and |S|, respectively (cf. Fig. 5). Heuristically, this effect can be seen by observing the range of values seen in Fig. 4b versus Fig. 4a: the population of 20 000 yields a range of glucose values roughly five times that of D₇.

Non-TDMI graph-based analysis—The most sensitive PDF variation metric, H_RA, points to a relatively diverse population. In contrast to the results for D₇, the less sensitive PDF variation metric, $V_{\bar{S}} (p)$, also points to a heterogeneous population; in particular, $V_{\bar{S}} (p)$ is just about twice the $V_{\bar{S}} (p)$ for D₇.

Analysis of the TDMI under variation of δt

A central motivation for using the TDMI is to observe how nonlinear correlation evolves in time; however, in the context of a diversely measured population, one must take care to ensure the TDMI signal represents a relatively constant population. Relative to D₇ and D₈, we know that, for δt between 6 and at least 56 h, the representative population is roughly constant. Figure 6 details the temporal evolution of the TDMI, and with it, exhibits five notable features.

First, both data sets display diurnal peaks in predictability; a full explanation of these peaks, which depends on the structure of meal times, is given elsewhere.²⁶ This is scientifically interesting because it is a signal that can be used to test physiological models, it can be used to distinguish populations, it implies that, outside of very local time windows, measurements separated by 24 h are more informative than measurements separated by fewer hours, and finally, the diurnal peaks confirm the presence of diurnal cycles in humans that are believed to exist.

Second, relative to D₇, the population appears to be heterogeneous on time scales of 6 h and less, and homogeneous on time scales longer than 6 h. This can be seen in Fig. 6a, where δI(6 h) is relatively large and drops to zero by δt = 12 h. This is an interesting result that we are still working to understand.

Third, by comparing the results for D₇ and D₈, we can observe a difference in the degree of homogeneity amongst the population. In particular, combining the facts that the error bars for $\bar{I}$ are large for D₈ compared to D₇, δI is independent of δt for D₈, δI for D₈ is much larger than for D₇, and the broad qualitative TDMI signal (i.e., the diurnal peaks) is the same for both D₇ and D₈, it seems clear that both data sets have somewhat homogeneous populations (i.e., homogeneous enough to resolve a similar signal), but D₇ is considerably more homogeneous than D₈.

Fourth, considering Fig. 6b, it is clear that the aggregate TDMI resolves the diurnal peaks considerably better than the average TDMI. This confirms the usefulness of the aggregate TDMI in the context of a complex, diversely measured population.

And fifth, the small sample size effects are clearly evident when comparing the difference between the histogram and KDE estimates of the TDMI between Figs. 6a, 6b. In particular, the two different estimates for the aggregate TDMI on D₇ are essentially identical, while the aggregated TDMI estimates on D₈ differ in a nontrivial way (by more than a factor of two). The average TDMI calculations display an even stronger effect. Finally, the error bars for D₈ are about ten times the magnitude of the error bars for D₇.

The point is, the time evolution of the TDMI is both scientifically valuable in that it leads to insights not otherwise observed and interpretable in the context of a time dependent, complex, diversely measured population using the infrastructure presented in this paper.

DISCUSSION AND COMMENTS

Specific results of the interpretative framework relative to real data

The methods in this paper were shown to work for both a well understood computer-generated data set and for a pathologically diverse real data set. The entropy for all populations registered the populations as diverse. Nevertheless, the TDMI produced a more nuanced picture. In particular, for one set of patients, the TDMI calculation implied that a set of patients have differing predictive information up to 6 h and are homogeneous in correlation afterwards. In contrast, the same calculation on a heavily filtered general population (the population that had frequent data measurements), yielded a population that seemed homogeneous with respect to time-dependent correlation. Thus, while it is likely that these populations are different, a full explanation, which requires more clinical study, is beyond the scope of this paper. Nevertheless, the TDMI analysis yielded results that were understandable, given this pathologically difficult population of data.

Using categorical billing code data to help verify the TDMI analysis

Because the patients in D₇ and D₈ are real patients in the hospital, we can consider the billing codes, which can act as a proxy for population composition, assigned to the patients in the various populations. To do this, consider the fraction of patients with the two most frequent billing codes for three data sets: D₇, D₈, and the subset of D₈ used to estimate the TDMI-based diagnostics, $D_{8}^{'}$ (members of the $D_{8}^{'}$ subpopulation have at least 10 glucose measurements separated by 6 h or less). There are two features that are important to pay attention to: (1) the overall fraction of patients that have a given billing code and (2) the drop off between the fraction of patients with the most and second most common billing codes. For D₇, 75% of the patients are covered by a single billing code and the drop between the most and second most common billing codes is around 5%; thus 70+% of these patients likely have relatively similar afflictions. In contrast, the most frequently seen billing code in D₈ only covers 25% of the population, followed by a 10 point drop off. For $D_{8}^{'}$, at least 50% of the patients are covered by a single billing code, while the second most common billing code covers only a quarter of the population, a 25 point drop. This implies more homogeneity than D₈ but less than D₇. Thus, broadly speaking, the billing code analysis corroborates the conclusions drawn from the time-based information theoretic analysis in Sec. 10A. Moreover, the billing codes are largely independent of the specific lab values, and thus, can be seen as an outside test of the validity of the TDMI analysis.
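
The two billing-code features described above, coverage of the most frequent codes and the drop off between them, amount to a small counting exercise. A minimal sketch, assuming each patient is represented by a collection of code strings and that a patient counts once per code; the code labels and function names are hypothetical and not drawn from the actual record system:

```python
from collections import Counter


def top_code_coverage(patient_codes, k=2):
    """Fraction of patients carrying each of the k most frequent billing codes."""
    n = len(patient_codes)
    # set(codes) ensures a patient is counted at most once per code.
    counts = Counter(code for codes in patient_codes for code in set(codes))
    return [(code, c / n) for code, c in counts.most_common(k)]


def drop_off(coverage):
    """Coverage gap between the most and second most common codes."""
    return coverage[0][1] - coverage[1][1]


# Four hypothetical patients with placeholder code labels.
patients = [{"A", "B"}, {"A"}, {"A", "B"}, {"C"}]
coverage = top_code_coverage(patients)
print(coverage, drop_off(coverage))
```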

How our method addresses nonstationarity

At various points in this paper, we have alluded to how nonstationarity is addressed within our framework. To be more explicit, consider three cases: (1) a single nonstationary source, (2) multiple different stationary sources, and (3) multiple different nonstationary sources. Relative to case (1), because there is diversity in the data sets, δI ≈ 0, ${\hat{B}}_{IRP} = {\hat{B}}_{PRP}$, and $H_{S} \approx B_{E}$; thus, there will be no distinction between stationarity and nonstationarity. Case (2) is the case we handled in Sec. 9A and does not need explanation. And case (3) will behave identically to case (2); nonstationarity will be difficult to detect, but multiple different statistical states will be detectable. While it might be too much to ask to be able to distinguish nonstationarity amongst a population from a population with multiple stationary sources, we can detect nonstationarity within an individual, given enough data points. In particular, relative to case (1), the reason why all the diagnostics fail to detect multiple statistical states is that there is no concept of averaging over a population. To address this issue, one only needs to partition the single time series into multiple pieces (of sufficient length), and then apply the standard TDMI analysis from this paper to the new “population” of time series. Said differently, to detect nonstationarity in a single source, one only needs to treat the single source as multiple sources and apply our machinery; if it appears that there are multiple sources, then you know that the single source has multiple statistical states and is thus nonstationary.
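
The partitioning step can be sketched as follows. This is a minimal illustration, assuming the series is a list of values, that contiguous equal-length pieces are acceptable, and that a piece needs on the order of 100 points to be usable; the function name, the default threshold, and the choice to drop any leftover points are all assumptions.

```python
def partition_series(series, n_pieces, min_length=100):
    """Split one time series into contiguous pieces to be treated as a population."""
    size = len(series) // n_pieces
    if size < min_length:
        raise ValueError("pieces would be too short for stable PDF estimates")
    # Any remainder at the end of the series is dropped.
    return [series[k * size:(k + 1) * size] for k in range(n_pieces)]


pieces = partition_series(list(range(1000)), n_pieces=4)
print([len(p) for p in pieces])
```

Each piece would then be passed through the same average and aggregate calculations as a member of an ordinary population; a δI clearly above the bias across pieces would point to multiple statistical states within the single source.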

Comments regarding the connection between the supports and the normalizations of the distributions

In a sense, all support-based variation amongst the population could be eliminated by normalizing all individuals to some standard support (or to a distribution with mean zero and variance one). We did not implement this because sometimes the normalization of the support matters with respect to the composition of the population, and we wanted to allow for the TDMI infrastructure to capture this type of dependence. Relative to the example in this paper, having glucose oscillate around 500 means the patient is very sick, whereas glucose oscillation around 100 means the patient is likely healthy (at least from a blood glucose perspective)—we wanted to be able to capture this type of heterogeneity. That said, if one begins with a normalized population and performs the TDMI analysis, any δI must exist because of variation in the graphs of the PDFs. However, if one has enough points per patient to estimate $\bar{I}$ , one knows this anyway upon calculating ${\hat{B}}_{IRP}$ and ${\hat{B}}_{PRP}$ ; when there are not enough points to estimate I for every individual, then deducing temporal, graph-based variation is difficult.

Future directions regarding the use of this technique

One of the sources of motivation for performing this calculation is based on the idea of stratifying or clustering populations of individuals by their predictive information. Based on the TDMI infrastructure here, we have identified at least 3 different subpopulations based on their predictive information structure. Thus, future computational problems will involve developing and testing a more automated form of this interpretive structure that can be used for generating hypothesized sub-categories of individuals and eventually an infrastructure that can be integrated with classification and clustering schemes.

Some remaining statistical problems

In this work, we attempted to outline and show, mathematically, how to interpret the TDMI and information entropy for aggregated populations. Nevertheless, there are many problems that remain unresolved. In particular, a partial list might include full rigorous proofs regarding: the technical conditions under which our claims (i.e., δI > 0 if and only if $ε_{i} > 0$ for some i) apply; the convergence properties of various quantities we propose (i.e., δI, $H_{S}$, etc.); and the full relationships between what the information entropy and TDMI can imply about one another.

SUMMARY

We have fashioned a methodology for computing and interpreting the TDMI for a nonuniform population of time series. Within this methodology, one of the means of estimating the TDMI for a population of time series allows for a meaningful TDMI estimate to be performed on a system that has many poorly measured components, none of which support a TDMI calculation. Specifically, given a population of time-series that are: non-uniformly measured in time, of diverse lengths, from statistically diverse sources, and pathologically sparse, our methods will likely still yield interpretable results. Achieving such a methodology was the original motivation for this work. An explicit prescription for interpreting I for a fixed time separation δt for a population can be found in Fig. 2 within Sec. 8. Moreover, an algorithmic portrayal can be found in Appendix B.

The process of interpreting the TDMI for a population revealed a way to quantify the degree of diversity within a population, should the population of time series have diverse sources. Specifically, our methodology gives a way of detecting and quantifying temporal heterogeneity (or homogeneity) in a population, given enough data. Moreover, our methodology allows for the calculation of statistical quantities such as bias (which is built into the methodology) and statistical significance of the TDMI signal (which is easily estimated via standard methods such as applying a bootstrap technique).

Broadly, these results have two primary consequences. First, our methodology extends the circumstances under which temporal nonlinear correlation for real world, complicated data sets can be calculated in two broad directions: (1) populations of time series where the sources are not necessarily the same and (2) poorly sampled collections of time series from diverse or identical sources. Second, the act of interpreting the TDMI for a population with an unknown temporal makeup yields three advancements: (1) the population can be stratified by their temporal structure; (2) the population can be categorized as homo or heterogeneous when this information was previously unknown; and (3), because of (2), it is possible to reduce the number of confounding factors that generate the signal.

We have expended substantial effort to make this framework as easy to use as possible. Nevertheless, as is the case with nearly all time series techniques, one must use care and intelligence when using this methodology. Aside from the standard correlation is not causation issues that arise when using the TDMI even for an individual, there are some hazards that are worth mentioning or recalling.

First, whenever one aggregates a population that is possibly diverse, diversely measured, and potentially non-stationary, temporal signals can arise simply through the act of aggregation. Thus, one must always be mindful of this very real issue, and be vigilant in working to understand the nature of the source of any given signal that appears as δt is varied.

Second, the methodology-specific sources of confounding factors, such as effects from variation in the representative population as δt is varied, are covered in this paper. Nonetheless, it is easy to imagine a large number of application-specific confounding factors (i.e., factors that are a function of specific details of the system being studied) that could lead to signals being generated when aggregating an uncontrolled population of time series.

Third, one must have, at some fundamental level, enough data to make robust and believable PDF estimates; how much data is enough data depends on the details of the particular system being studied. We gave a rough lower bound of 100 pairs of points being necessary to estimate $\bar{I}$ or $\hat{I}$. We have found that, upon bootstrapping and observing PDF estimates, 100 points appears to be the minimum number of points required to get stable PDF estimates. However, if the support of the distributions is extremely wide, 100 pairs of points may be far too few. Similarly, if the distribution is bimodal where the modes are clearly separated, fewer than 100 pairs of points may be required. It is, therefore, important to observe the PDFs being generated for all the estimates; in addition, a bootstrapping analysis on the population can be used to estimate a confidence interval (in particular, for $\hat{I} (δ t)$), and the bias estimate techniques can give insight into the meaning, accuracy, and precision of the results.
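
A percentile bootstrap is one standard way to attach a confidence interval to an estimate computed from a set of pairs. The sketch below is generic: `estimator` is any function mapping a list of (x, y) pairs to a number, and a simple mean product is used as a placeholder where a TDMI estimator would go. Resampling pairs independently ignores any dependence between pairs, which is an assumption that may not hold for time series; the function name and defaults are illustrative.

```python
import random


def bootstrap_ci(pairs, estimator, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for estimator(pairs)."""
    rng = random.Random(seed)
    n = len(pairs)
    # Resample n pairs with replacement, n_boot times, and sort the estimates.
    stats = sorted(
        estimator([pairs[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(int((1 - alpha / 2) * n_boot), n_boot - 1)]
    return lo, hi


def mean_product(pairs):
    """Placeholder statistic standing in for a TDMI estimator."""
    return sum(x * y for x, y in pairs) / len(pairs)


data_rng = random.Random(1)
data = []
for _ in range(200):
    x = data_rng.gauss(0, 1)
    data.append((x, x + data_rng.gauss(0, 1)))
print(bootstrap_ci(data, mean_product))
```

A wide interval, or one that changes noticeably when `n_boot` or the seed is varied, suggests that there are too few pairs for a stable estimate.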

And fourth, when possible, once one has a TDMI-based interpretation, it can be useful to corroborate the interpretation with other available data. For instance, we verified or crosschecked our results with other data sets (e.g., the billing codes) to help interpret the representative sets of humans.

One final note; our methodology was designed for application in complex situations with complex data sets. In these situations, it is especially important to use common sense, context, and observation of the computational process to fully interpret the numbers the framework produces.

ACKNOWLEDGMENTS

The authors would like to thank two anonymous reviewers, J. Dias, N. Elhadad, A. Perotte, and D. Varn for carefully reading this paper and providing many useful comments. D.J.A. would like to thank C. Shalizi for early discussions related to this work. Finally, the authors would like to acknowledge the financial support provided by NLM Grant No. RO1 LM06910.

APPENDIX A: DETAILED AVERAGE TDMI CALCULATION

Begin by recalling the definition of the average TDMI

\bar{I} (τ) = \frac{1}{N} \sum_{i = 1}^{N} \int p (X_{i} (j), X_{i} (j - τ)) \times \log (\frac{p (X_{i} (j), X_{i} (j - τ))}{p (X_{i} (j)) p (X_{i} (j - τ))}) {dX}_{i} (t) {dX}_{i} (t + τ) = \int \bar{ι} (τ) dX (t) dX (t + τ) .

(A1)
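
The difference between averaging per-individual estimates, as in Eq. A1, and estimating once from the pooled data can be made concrete with a toy example. The sketch below uses discrete symbols and a plug-in estimator with natural logarithms purely for brevity; the paper works with continuous PDFs, so the estimator, the function names, and the two artificial individuals are all assumptions. Each pair stands for (X(j), X(j − τ)) at one fixed τ.

```python
import math
from collections import Counter


def plug_in_mi(pairs):
    """Plug-in mutual information (nats) for discrete (x, y) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        c / n * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def average_and_aggregate(population):
    """Return (mean of per-individual MI, MI of the pooled pairs)."""
    i_bar = sum(plug_in_mi(p) for p in population) / len(population)
    pooled = [pair for p in population for pair in p]
    return i_bar, plug_in_mi(pooled)


# Individual a repeats its previous symbol; individual b always flips it.
a = [(0, 0), (1, 1)] * 50
b = [(0, 1), (1, 0)] * 50
i_bar, i_hat = average_and_aggregate([a, b])
print(i_bar, i_hat)
```

Both individuals share the same support and are each perfectly predictable, so the average equals log 2, yet pooling them yields four equally likely pairs and an aggregate of zero. The gap between the two values arises entirely from differences in the graphs of the joint PDFs.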

Next, recall that for the average TDMI, we have PDFs defined entirely with respect to the abstract support, $\bar{S}$ . In this situation, we define the ith PDF relative to the “average” PDF, p₁, by

p_{i} = p_{1} (\bar{S}) - {\bar{ε}}_{i} (\bar{S}),

(A2)

where ${\bar{ε}}_{i} (\bar{S})$ is the distance between the graphs of p₁ and p_i at a given value in $\bar{S}$. Next, for convenience, define the following: p(X_i(j), X_i(j − τ)) = p(j,τ), p(X_i(j)) = p(j), p(X_i(j − τ)) = p(τ), ${\bar{ε}}_{i} (\bar{S}) = {\bar{ε}}_{i}$, $p_{i} (j, τ) = p_{1} (j, τ) - {\bar{ε}}_{i}$, $p_{i} (j) = p_{1} (j) - {\bar{ε}}_{i}$, and $p_{i} (τ) = p_{1} (τ) - {\bar{ε}}_{i}$. With this notation, we can now re-write the integrand in Eq. A1

= \frac{1}{N} [p_{1} (j, τ) \log (\frac{p_{1} (j, τ)}{p_{1} (j) p_{1} (τ)}) +

(A3)

\sum_{i = 2}^{N} (p_{1} (j, τ) - {\bar{ε}}_{i}) \log (\frac{p_{1} (j, τ) - {\bar{ε}}_{i}}{(p_{1} (j) - {\bar{ε}}_{i}) (p_{1} (τ) - {\bar{ε}}_{i})})],

(A4)

Next, factoring $\frac{p_{1} (j, τ)}{p_{1} (j) p_{1} (τ)}$ out of the summation term, one arrives at

= \frac{1}{N} [p_{1} (j, τ) \log (\frac{p_{1} (j, τ)}{p_{1} (j) p_{1} (τ)}) +

(A5)

\sum_{i = 2}^{N} (p_{1} (j, τ) - {\bar{ε}}_{i}) [\log (\frac{p_{1} (j, τ)}{(p_{1} (j)) (p_{1} (τ))}) +

(A6)

\log (\frac{1 - \frac{{\bar{ε}}_{i}}{p_{1} (j, τ)}}{1 - \frac{{\bar{ε}}_{i}}{p_{1} (j) p_{1} (τ)} (p_{1} (j) + p_{1} (τ)) + \frac{{\bar{ε}}_{i}^{2}}{p_{1} (j) p_{1} (τ)}})]] .

(A7)

Multiplying and collecting terms under the sum, one obtains

= \frac{1}{N} [{Np}_{1} (j, τ) \log (\frac{p_{1} (j, τ)}{p_{1} (j) p_{1} (τ)}) +

(A8)

\sum_{i = 2}^{N} {\bar{ε}}_{i} [\log (\frac{p_{1} (j, τ)}{(p_{1} (j)) (p_{1} (τ))}) +

(A9)

\log (\frac{1 - \frac{{\bar{ε}}_{i}}{p_{1} (j, τ)}}{1 - \frac{{\bar{ε}}_{i}}{p_{1} (j) p_{1} (τ)} (p_{1} (j) + p_{1} (τ)) + \frac{{\bar{ε}}_{i}^{2}}{p_{1} (j) p_{1} (τ)}})] +

(A10)

p_{1} (j, τ) \log (\frac{1 - \frac{{\bar{ε}}_{i}}{p_{1} (j, τ)}}{1 - \frac{{\bar{ε}}_{i}}{p_{1} (j) p_{1} (τ)} (p_{1} (j) + p_{1} (τ)) + \frac{{\bar{ε}}_{i}^{2}}{p_{1} (j) p_{1} (τ)}})],

(A11)

= \bar{ρ} (τ) + \frac{1}{N} [\sum_{i = 2}^{N} {\bar{ε}}_{i} [\log (\frac{p_{1} (j, τ)}{(p_{1} (j)) (p_{1} (τ))}) +

(A12)

\log (\frac{1 - \frac{{\bar{ε}}_{i}}{p_{1} (j, τ)}}{1 - \frac{{\bar{ε}}_{i}}{p_{1} (j) p_{1} (τ)} (p_{1} (j) + p_{1} (τ)) + \frac{{\bar{ε}}_{i}^{2}}{p_{1} (j) p_{1} (τ)}})] +

(A13)

p_{1} (j, τ) \log (\frac{1 - \frac{{\bar{ε}}_{i}}{p_{1} (j, τ)}}{1 - \frac{{\bar{ε}}_{i}}{p_{1} (j) p_{1} (τ)} (p_{1} (j) + p_{1} (τ)) + \frac{{\bar{ε}}_{i}^{2}}{p_{1} (j) p_{1} (τ)}})],

(A14)

= \bar{ρ} (τ) + \bar{G} (τ),

(A15)

where $\bar{G} (τ)$ can be shown to have the more digestible form

\bar{G} (τ) = - \frac{1}{N} \left[ \sum_{i = 1}^{N - 1} \left( \frac{{\bar{ε}}_{i}}{p (X_{1} (j), X_{1} (j - τ))} \right) \left( \log \frac{p (X_{1} (j), X_{1} (j - τ))}{p (X_{1} (j)) p (X_{1} (j - τ))} \right) + \log \left( \frac{1 - \frac{{\bar{ε}}_{i}}{p (X_{1} (j), X_{1} (j - τ))}}{\left( 1 - \frac{{\bar{ε}}_{i}}{p (X_{1} (j))} \right) \left( 1 - \frac{{\bar{ε}}_{i}}{p (X_{1} (j - τ))} \right)} \right) \left( \frac{{\bar{ε}}_{i}}{p (X_{1} (j), X_{1} (j - τ))} - 1 \right) \right] .

(A16)

APPENDIX B: PSEUDOCODE FOR INTERPRETING THE TDMI FOR A POPULATION OF TIME SERIES

Algorithm 1: How to interpret the TDMI for a population of time series

if there are enough points to estimate $\bar{I}$ (usually ∼100 pairs of points per representative individual are required), then
    estimate δI and H_Θ
    if δI > B_IRP, then
        the population is heterogeneous
        if H_{S} ∼ 0, then
            supports (or ranges) are diverse or disjoint
        elseif H_{S} ∼ 1, then
            supports (or ranges) are uniform
        endif
    elseif δI ≤ B_IRP, then
        the population is homogeneous
    endif
    if H_Θ ∼ 0, then
        the population is well represented
    elseif H_Θ ∼ 1, then
        portions of the population are overrepresented
    endif
elseif there are not enough pairs to estimate $\bar{I}$, then
    estimate $\hat{I}$, H_{S}, and H_Θ
    if H_{S} ∼ 0, then
        supports (or ranges) are diverse or disjoint
        if there are enough pairs of points per patient to estimate a PDF for each patient at the specific δt, then
            estimate V_{\hat{S}}(p) (i.e., V(p) relative to the abstract supports)
            if V_{\hat{S}}(p) ∼ 1, then
                the population used to estimate $\hat{I}$ has graph-based heterogeneity
            elseif V_{\hat{S}}(p) ∼ 0, then
                the population used to estimate $\hat{I}$ is graphically homogeneous
            endif
        elseif it is not possible to accurately estimate a PDF for each patient at the specific δt, then
            it is not possible to determine the contribution of the graph-based heterogeneity to the overall heterogeneity
        endif
    elseif H_{S} ∼ 1, then
        supports (or ranges) are uniform
        if V_{\bar{S}}(p) ∼ 1, then
            the population used to estimate $\hat{I}$ has graph-based heterogeneity
        elseif V_{\bar{S}}(p) ∼ 0, then
            the population used to estimate $\hat{I}$ is homogeneous
        endif
    endif
    if H_Θ ∼ 0, then
        the population is well represented
    elseif H_Θ ∼ 1, then
        portions of the population are overrepresented
    endif
endif

{NOTE: there are 10 possible sharp interpretations for both δI and $\hat{I}$-only cases.}

{All TDMI interpretations should include: I-like quantities (e.g., $\hat{I}$, δI, etc.), population diversity qualification (support- and graph-based contributions to diversity; if they are unknown, this should be specified), and the make-up of the population used to estimate the I-based quantities (e.g., H_Θ).}

{NOTE: even under the best circumstances, it may be difficult to determine what proportion of the heterogeneity is due to support-based versus graph-based diversity.}
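As an illustration only, the decision logic of Algorithm 1 can be sketched as a small Python function. The function name `interpret_tdmi`, its argument names, and especially the tolerance `tol` used to decide when a quantity is "∼ 0" or "∼ 1" are assumptions made for this sketch; the algorithm itself does not fix a numerical threshold. The quantities δI, B_IRP, H_S, H_Θ, and V(p) are taken as already estimated, and `v_p` stands for whichever V(p) applies to the branch (relative to the abstract supports when H_S ∼ 0), or `None` when per-patient PDFs cannot be estimated.

```python
from typing import List, Optional


def near(value: float, target: float, tol: float) -> bool:
    """Assumed reading of '~': value lies within tol of target."""
    return abs(value - target) <= tol


def interpret_tdmi(
    enough_pairs_for_average: bool,
    h_theta: float,
    h_s: Optional[float] = None,
    delta_i: Optional[float] = None,
    b_irp: Optional[float] = None,
    v_p: Optional[float] = None,
    tol: float = 0.25,  # hypothetical threshold, not specified by the algorithm
) -> List[str]:
    """Return the interpretive statements of Algorithm 1 that apply."""
    notes: List[str] = []

    if enough_pairs_for_average:
        # Average TDMI is estimable: compare delta I with the bias bound B_IRP.
        if delta_i is None or b_irp is None:
            raise ValueError("delta_i and b_irp are required in this branch")
        if delta_i > b_irp:
            notes.append("population is heterogeneous")
            if h_s is not None and near(h_s, 0.0, tol):
                notes.append("supports (or ranges) are diverse or disjoint")
            elif h_s is not None and near(h_s, 1.0, tol):
                notes.append("supports (or ranges) are uniform")
        else:
            notes.append("population is homogeneous")
    else:
        # Only the aggregate TDMI is available: rely on H_S and V(p).
        if h_s is None:
            raise ValueError("h_s is required in this branch")
        if near(h_s, 0.0, tol):
            notes.append("supports (or ranges) are diverse or disjoint")
            if v_p is None:
                notes.append("graph-based contribution to heterogeneity cannot be determined")
            elif near(v_p, 1.0, tol):
                notes.append("graph-based heterogeneity")
            elif near(v_p, 0.0, tol):
                notes.append("graphically homogeneous")
        elif near(h_s, 1.0, tol):
            notes.append("supports (or ranges) are uniform")
            if v_p is not None and near(v_p, 1.0, tol):
                notes.append("graph-based heterogeneity")
            elif v_p is not None and near(v_p, 0.0, tol):
                notes.append("population is homogeneous")

    # Population representation is assessed in both branches.
    if near(h_theta, 0.0, tol):
        notes.append("population is well represented")
    elif near(h_theta, 1.0, tol):
        notes.append("portions of the population are overrepresented")
    return notes


if __name__ == "__main__":
    # Made-up inputs, for illustration only.
    for line in interpret_tdmi(True, h_theta=0.05, h_s=0.1, delta_i=0.3, b_irp=0.05):
        print(line)
```

Intermediate values (for example, H_S near 0.5 with this tolerance) produce no statement, which reflects the closing note of the algorithm that the support-based and graph-based contributions may be hard to separate.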


References

Komalapriya C., Thiel M., Romano M. C., Marwan N., Schwarz U., and Kurths J., Phys. Rev. E 78, 066217 (2008). doi:10.1103/PhysRevE.78.066217
Sprott J. C., Chaos and Time-series Analysis (Oxford University Press, New York, 2003).
Kantz H. and Schreiber T., Nonlinear Time Series Analysis, 2nd ed. (Cambridge University Press, UK, 2003).
Hogan W. and Wagner M., J. Am. Med. Inform. Assoc. 5, 342 (1997). doi:10.1136/jamia.1997.0040342
van der Lei J., Methods Inf. Med. 30, 79 (1991).
Sagreiya H. and Altman R. B., J. Biomed. Inf. 43, 747 (2010). doi:10.1016/j.jbi.2010.03.014
Higgins J. M. and Mahadevan L., Proc. Natl. Acad. Sci. U.S.A. 107, 20587 (2010). doi:10.1073/pnas.1012747107
Shudo E., Ribeiro R. M., and Perelson A. S., J. Viral Hepat. 15, 357 (2008). doi:10.1111/j.1365-2893.2007.00954.x
Turner M. S., Phys. Today 62, 8 (2009). doi:10.1063/1.3226778
Scargle J. D., Astrophys. J. 263, 835 (1982). doi:10.1086/160554
Baisch S. and Bokelmann G. H. R., Comput. Geosci. 25, 739 (1999). doi:10.1016/S0098-3004(99)00026-6
Schulz M. and Stattegger K., Comput. Geosci. 23, 929 (1997). doi:10.1016/S0098-3004(97)00087-3
Liew A. W. C., Xian J., Wu S., Smith D., and Yan H., BMC Bioinf. 8, 137 (2007). doi:10.1186/1471-2105-8-137
Wasserman L., All of Statistics: A Concise Course in Statistical Inference (Springer, New York, 2004).
Loève M., Probability Theory I (Springer-Verlag, 1977).
Gray A. G. and Moore A. W., “Very fast multivariate kernel density estimation via computational geometry,” in Joint Stat. Meeting (August 4th, 2003).
Moon Y.-I., Rajagopalan B., and Lall U., Phys. Rev. E 52, 2318 (1995). doi:10.1103/PhysRevE.52.2318
May R. J., Dandy G. C., Maier H. R., and Fernando T. M. K. G., “Critical values of a kernel density-based mutual information estimator,” in International Joint Conference on Neural Networks (IEEE, Vancouver, BC, 2006).
Albers D. J. and Hripcsak G., Estimation of time-delayed mutual information from sparsely sampled sources, e-print arXiv:1110.1615, 2011.
Wheeden R. L. and Zygmund A., “Measure and integral,” in Monographs and Textbooks in Pure and Applied Mathematics (Marcel Dekker, Inc., New York, 1977), Vol. 43.
Basharin G. P., Theor. Probab. Appl. 4, 333 (1959). doi:10.1137/1104033
Roulston M. S., Physica D 125, 285 (1999). doi:10.1016/S0167-2789(98)00269-3
Graczyk J. and Świątek G., Ann. Math. 146, 1 (1997). doi:10.2307/2951831
Jakobson M., Commun. Math. Phys. 81, 39 (1981). doi:10.1007/BF01941800
Albers D. J. and Hripcsak G., Phys. Lett. A 374, 1159 (2010). doi:10.1016/j.physleta.2009.12.067
Albers D. J. and Hripcsak G., Using population scale EHR data to understand and test human physiological dynamics, e-print arXiv:1110.3317, 2011.
It may seem odd to normalize indices, but this just keeps the domain of Θ̃ between zero and one.
To see the variation in the PDF estimates due to small sample sizes, observe the PDF estimates for different sets of uniform random numbers with small cardinality.
Note, the L1 difference is not technically a distance function or a metric because it does not satisfy the triangle inequality.

