Abstract
In this paper, we construct linear differential systems in continuous time and in discrete time to model HIV transmission at the population level. The main question is how to determine the parameters of these systems from posterior information obtained through statistical analysis of the HIV population. We call these parameters dynamic constants because they determine the behavior of the system in the various models. Linear and nonlinear dynamic systems have long been used to study the population dynamics of HIV and other infectious diseases. Nevertheless, the question of determining the dynamic constants in such systems has received little attention. In this paper, we take some initial steps to bridge this gap by studying the dynamic constants that appear in the linear model in both continuous and discrete time. Our computations are mostly carried out in Matlab.
1. Introduction
Patients infected with Human Immunodeficiency Virus (HIV) are very likely to develop illnesses related to Acquired Immunodeficiency Syndrome (AIDS), which are usually fatal if not treated with effective antiretroviral therapy. Since the discovery of HIV in 1983, no efficacious vaccine against the virus has been developed. Although highly active antiretroviral therapy (HAART), introduced in the mid-1990s, has saved millions of lives and slowed disease progression in those infected, HIV infection remains a public health threat. Reducing the risk of HIV transmission is a top priority.
One particular challenge in HIV prevention is the long latency period of the infection. The average time for an HIV-infected person to develop symptoms of AIDS-related illness can exceed 10 years [1]. In the sexual transmission of HIV, many infected individuals may not be aware of their HIV status, and the virus continues to spread to their HIV-negative partners. Therefore, an in-depth understanding of HIV transmission is key to successful HIV prevention.
HIV dynamics have long been studied in mathematical epidemiology using linear and nonlinear models [2, 3]. The classic epidemiological model is the SIR model, which describes the dynamics of the susceptible, infected, and recovered populations [4]. This model is not directly applicable to HIV, since there is no recovered population. An extension is the SEIR model, which adds a class of individuals who have been exposed (infected) but are not yet infectious. For HIV, the period between exposure and infectiousness lasts about two to four weeks [1]. Because this period is short compared with the many years during which an infected person remains infectious, we treat its effect on population dynamics as negligible.
Hierarchical models are common in HIV modeling due to the high correlation between risky behavior and HIV incidence [5]. In this paper we will incorporate risk indirectly by considering diagnosed and undiagnosed populations. Intuitively, diagnosed individuals would modify their behavior relative to their behavior prior to the diagnosis.
In this paper, we form two models: a continuous-time linear differential model and a discrete-time linear model. These are the most fundamental models of their kinds. We then focus on determining the parameter estimates, that is, the dynamic constants in these models. As we will show, the estimates of the dynamic constants depend on the type of model as well as on its qualitative properties.
There are two important dynamic constants in our model, namely, the transmission rates for the diagnosed and for the undiagnosed HIV populations. One important finding of our study is that these two transmission rates are comparable. This leads to our conclusion that the transmission rates should be attached to different groups of susceptibles based on their risk level.
2. General Nonlinear Differential Model
One frequently used mathematical model for HIV population dynamics can be described as follows. Let S(t) be the size of the susceptible population. We divide the HIV-positive population into two groups: N0(t) is the population that is unaware of the infection, and N1(t) is the population that is aware of it. Let ϵi be the mortality rate of group Ni, and let r be the growth rate of the susceptibles. Let γ0 be the transmission rate of group N0 and γ1 the transmission rate of group N1. Then we have the following nonlinear differential equations:
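$$
\begin{aligned}
\frac{dS}{dt} &= rS(t) - \gamma_0 N_0(t)S(t) - \gamma_1 N_1(t)S(t),\\
\frac{dN_0}{dt} &= \gamma_0 N_0(t)S(t) + (1-\beta)\gamma_1 N_1(t)S(t) - \delta N_0(t) - \epsilon_0 N_0(t),\\
\frac{dN_1}{dt} &= \beta\gamma_1 N_1(t)S(t) + \delta N_0(t) - \epsilon_1 N_1(t).
\end{aligned}
\tag{1}
$$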
Here γ1N1(t)S(t) accounts for those who are infected by group N1 (per unit time), and β is the proportion of them who are aware of their infection. The constant δ denotes the rate (per unit time) at which members of the HIV-positive group N0 become aware of their infection. Hence there is a flow of δN0(t) from group N0 to group N1 as members of N0 learn of their infection through HIV testing.
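As a minimal sketch, the right-hand side of model (1), as we read it from the description above (every infection caused by N0 enters N0, and a fraction β of the infections caused by N1 enter N1), can be written as a plain function. The name `hiv_rhs` and the parameter values in the check are hypothetical. The `(t, y, ...)` argument order is chosen so that the function could be passed to an ODE solver such as `scipy.integrate.solve_ivp` through its `args` parameter.

```python
import numpy as np

def hiv_rhs(t, y, r, beta, delta, gamma0, gamma1, eps0, eps1):
    """Right-hand side of the nonlinear model (1), as read from the text.

    y = (S, N0, N1): susceptibles, unaware positives, aware positives.
    t is unused because all coefficients are constant.
    """
    S, N0, N1 = y
    new_from_N0 = gamma0 * N0 * S   # infections caused by the unaware group
    new_from_N1 = gamma1 * N1 * S   # infections caused by the aware group
    dS = r * S - new_from_N0 - new_from_N1
    dN0 = new_from_N0 + (1 - beta) * new_from_N1 - delta * N0 - eps0 * N0
    dN1 = beta * new_from_N1 + delta * N0 - eps1 * N1
    return np.array([dS, dN0, dN1])

# Hypothetical state and parameters, used only to exercise the function.
y = np.array([1000.0, 20.0, 80.0])
d = hiv_rhs(0.0, y, 0.01, 0.4, 0.25, 1e-4, 3e-5, 0.019, 0.019)
```

A useful sanity check: the infection terms cancel in the sum, so dS/dt + dN0/dt + dN1/dt = rS − ϵ0N0 − ϵ1N1.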
Many variations of this nonlinear dynamic model have appeared in the literature on HIV population dynamics. For example, in [6], the mortality rate of the susceptibles is included in the differential equation for S(t), and the parameters are allowed to vary over time as piecewise constant functions.
In our differential equation model, we have a few constants: β, δ, γ0, γ1, ϵ0, and ϵ1. These constants essentially determine the qualitative and quantitative properties of the mathematical model. We shall call these constants the dynamic constants of the model. Notice that some of the constants, like γi, may have prior estimates, based on the data collected directly from the groups Ni and S. Some of the constants, like ϵi, will have posterior estimates. The constants δ, β may have prior estimates. Our main focus here is to give posterior estimates of these constants.
We shall remark here that the dynamic constants are model-dependent. This might not be obvious. Even though many of them can be estimated statistically without reference to any models, applying these estimates directly to the model may be problematic, as we shall see in the next section. In this paper, we take some initial steps to estimate the model-based dynamic constants.
3. Linear Differential Model and Preliminary Discussions
We now build a simpler linear model. The main assumption is that the susceptible population is much larger than N0 and N1, so the change in the susceptible population due to HIV infection is small compared with its overall size. We may therefore ignore the dynamics of the susceptible population and treat it as a constant. This more or less justifies a linear system involving only N0 and N1.
Let us start with the HIV transmission rate estimates by Pinkerton [7]. The estimates of transmission rates are
$$\gamma_0 = 0.0927, \qquad \gamma_1 = 0.0268. \tag{2}$$
The rates γi are measured in infections transmitted per person per year. Since the overall susceptible population is much larger than Ni, we can assume that the number of HIV transmission events is proportional to the sizes of N1 and N0; this amounts to absorbing the constant S into γ0 and γ1 in (1). Based on this hypothesis, we may model HIV transmission by the linear differential equations
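$$
\begin{aligned}
\frac{dN_0}{dt} &= (\gamma_0 - \delta - \epsilon_0)N_0(t) + (1-\beta)\gamma_1 N_1(t),\\
\frac{dN_1}{dt} &= \delta N_0(t) + (\beta\gamma_1 - \epsilon_1)N_1(t).
\end{aligned}
\tag{3}
$$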
The dynamic constants δ and β are unchanged. It is also known that ϵ1 ≅ 1.9% [8]. No estimate of ϵ0 is available, so we assume ϵ0 ≅ 1.9% as well.
Next, we apply the known estimates to our linear differential model. Note that β remains unknown at this point. According to [8], δ is about 1/4, so we tentatively set δ = 1/4. Using the estimates of the dynamic constants taken directly from [7, 8], let us consider several cases.
3.1. β = 4/5
We start by assuming that β equals η = 4/5, the overall proportion of HIV-positive individuals who are aware of their infection. Together with δ = 1/4, γ0 = 0.0927, γ1 = 0.0268, and ϵ0 = ϵ1 = 0.019, this gives the following linear equations:
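$$
\begin{aligned}
\frac{dN_0}{dt} &= (0.0927 - 0.25 - 0.019)N_0(t) + \tfrac{1}{5}(0.0268)N_1(t) = -0.1763\,N_0(t) + 0.00536\,N_1(t),\\
\frac{dN_1}{dt} &= 0.25\,N_0(t) + \big(\tfrac{4}{5}(0.0268) - 0.019\big)N_1(t) = 0.25\,N_0(t) + 0.00244\,N_1(t).
\end{aligned}
\tag{4}
$$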
We find that the two linearly independent solutions have growth rates of
| (5) |
However, the growth rate of N0 + N1 is known to be about 0.048 per year. Hence the assumption β = η = 4/5 is not valid. Even if we ignore the mortality rate, we have
| (6) |
This is still far below the estimated 4.8% growth rate.
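The growth rates in this case can be checked numerically. The sketch below assumes the linear model has the form dN1/dt = (βγ1 − ϵ1)N1 + δN0 and dN0/dt = (1 − β)γ1N1 + (γ0 − δ − ϵ0)N0 (our reading of the model described above), with the state ordered as (N1, N0); the helper name `linear_matrix` is hypothetical. The dominant growth rate is the largest eigenvalue of the coefficient matrix.

```python
import numpy as np

def linear_matrix(beta, delta, gamma0, gamma1, eps0, eps1):
    """Coefficient matrix of the linear model, state ordered as (N1, N0)."""
    return np.array([
        [beta * gamma1 - eps1, delta],                 # dN1/dt
        [(1 - beta) * gamma1, gamma0 - delta - eps0],  # dN0/dt
    ])

# beta = 4/5, delta = 1/4, Pinkerton's rates and 1.9% mortality.
A = linear_matrix(0.8, 0.25, 0.0927, 0.0268, 0.019, 0.019)
growth = np.linalg.eigvals(A).real.max()

# The same case with mortality switched off.
A_no_mort = linear_matrix(0.8, 0.25, 0.0927, 0.0268, 0.0, 0.0)
growth_no_mort = np.linalg.eigvals(A_no_mort).real.max()

print(f"{growth:.4f} {growth_no_mort:.4f}")  # 0.0096 0.0286
```

Because ϵ0 = ϵ1 here, dropping mortality shifts the matrix by 0.019 times the identity, so both eigenvalues move up by exactly 0.019; the dominant rate stays well below 0.048 either way.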
3.2. β = 1 or β = 0
One extreme is β = 1, meaning that everyone infected by a member of N1 gets tested and becomes aware of the infection (within the first year). We have
| (7) |
Under this assumption, N0 decreases at the exponential rate −0.268 per year, which means that the population N0 would essentially vanish within a few years. This cannot be true.
The other extreme is β = 0, meaning that everyone infected by a member of N1 is initially unaware of the infection (within the first year). We have
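$$
\begin{aligned}
\frac{dN_0}{dt} &= -0.1763\,N_0(t) + 0.0268\,N_1(t),\\
\frac{dN_1}{dt} &= 0.25\,N_0(t) - 0.019\,N_1(t).
\end{aligned}
\tag{8}
$$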
We have
| (9) |
The overall HIV population growth rate will then be less than 0.016, which is quite small compared with the estimated growth rate of about 0.048.
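The β = 0 bound can be reproduced with a few lines of NumPy. This sketch assumes the linear model dN1/dt = (βγ1 − ϵ1)N1 + δN0, dN0/dt = (1 − β)γ1N1 + (γ0 − δ − ϵ0)N0 (our reading of the text), with the state ordered as (N1, N0); `linear_matrix` is a hypothetical helper name.

```python
import numpy as np

def linear_matrix(beta, delta, gamma0, gamma1, eps0, eps1):
    """Coefficient matrix of the linear model, state ordered as (N1, N0)."""
    return np.array([[beta * gamma1 - eps1, delta],
                     [(1 - beta) * gamma1, gamma0 - delta - eps0]])

# beta = 0: every infection caused by N1 initially lands in N0.
A0 = linear_matrix(0.0, 0.25, 0.0927, 0.0268, 0.019, 0.019)
eigs = np.sort(np.linalg.eigvals(A0).real)
print(f"dominant growth rate: {eigs[-1]:.4f}")  # dominant growth rate: 0.0159
```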
3.3. δ, β Not Fixed
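One might conclude that δ must be much smaller than the value 1/4 we initially assumed. Let us therefore leave δ and β unfixed. In this case, we have
$$
\begin{aligned}
\frac{dN_0}{dt} &= (0.0737 - \delta)N_0(t) + 0.0268(1-\beta)N_1(t),\\
\frac{dN_1}{dt} &= \delta N_0(t) + (0.0268\beta - 0.019)N_1(t).
\end{aligned}
\tag{10}
$$
With the state vector ordered as (N1, N0)^T, the system has the matrix
$$A = \begin{pmatrix} 0.0268\beta - 0.019 & \delta \\ 0.0268(1-\beta) & 0.0737 - \delta \end{pmatrix}. \tag{11}$$
The growth rates are controlled by the eigenvalues of A. In particular, we may require det(A − λI) = 0 with λ = 0.048, so that the solution contains a term growing at the rate of 0.048 per year, which we take to be the dominant term. Hence we obtain
$$\det(A - 0.048I) = (0.0268\beta - 0.067)(0.0257 - \delta) - 0.0268(1-\beta)\delta = 0. \tag{12}$$
Simplifying, the βδ terms cancel and we have
$$0.0402\,\delta + 0.000689\,\beta - 0.001722 = 0, \qquad\text{i.e.,}\qquad \delta \approx 0.0428 - 0.0171\,\beta. \tag{13}$$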
Since 0 ≤ β ≤ 1, we find that δ lies between about 0.026 (β = 1) and 0.043 (β = 0). This suggests that only about 2.6% to 4.3% of N0 get tested each year, which seems too low compared with the CDC estimate of about 25%.
We remark that this discussion is based on the estimates γ0 = 0.0927 and γ1 = 0.0268 [7]. As we have seen, using these estimates directly as dynamic constants in the differential equation model fails to reproduce the right kind of outcomes and trend. In the rest of this paper, we discuss posterior estimates of the parameters in the hope of finding a remedy.
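The range of δ obtained from the condition det(A − 0.048I) = 0 can be checked numerically. This is a sketch under the assumption that A = [[βγ1 − ϵ1, δ], [(1 − β)γ1, γ0 − δ − ϵ0]] with the state ordered as (N1, N0), which is our reading of the linear model; the function names are hypothetical. Because the determinant is affine in δ, it can be solved in closed form, and the eigenvalue check confirms that 0.048 is then the dominant eigenvalue.

```python
import numpy as np

G0, G1, E0, E1 = 0.0927, 0.0268, 0.019, 0.019   # Pinkerton rates, 1.9% mortality

def linear_matrix(beta, delta):
    """Matrix A of the linear model, state ordered as (N1, N0)."""
    return np.array([[beta * G1 - E1, delta],
                     [(1 - beta) * G1, G0 - delta - E0]])

def delta_for_growth(beta, lam=0.048):
    """Solve det(A - lam*I) = 0 for delta.

    Expanding the determinant shows it is affine in delta:
    (beta*G1 - E1 - lam)*(G0 - E0 - lam) - delta*(G1 - E1 - lam) = 0.
    """
    return (beta * G1 - E1 - lam) * (G0 - E0 - lam) / (G1 - E1 - lam)

for beta in (0.0, 0.5, 1.0):
    d = delta_for_growth(beta)
    top = np.linalg.eigvals(linear_matrix(beta, d)).real.max()
    print(f"beta={beta:.1f}  delta={d:.4f}  dominant={top:.4f}")
# beta=0.0  delta=0.0428  dominant=0.0480
# beta=0.5  delta=0.0343  dominant=0.0480
# beta=1.0  delta=0.0257  dominant=0.0480
```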
4. Posterior Estimate of Parameters
In our earlier discussion, we directly inserted the transmission rates from the statistical analysis into the linear differential system, and the result was not satisfactory. It is desirable to estimate transmission rates that produce the right kind of outcome from the linear differential system model. Let us recall the CDC data from 2007 to 2013, in thousands (Table 1) [8].
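We first simplify our notation. Let N(t) = (N1(t), N0(t))^T. We rewrite our linear system as
$$\frac{dN}{dt} = MN(t), \tag{14}$$
where
$$M = \begin{pmatrix} \beta\gamma_1 - \epsilon_1 & \delta \\ (1-\beta)\gamma_1 & \gamma_0 - \delta - \epsilon_0 \end{pmatrix}. \tag{15}$$
The general solution to this system is
$$N(t) = Pe^{\lambda_1 t} + Qe^{\lambda_2 t}. \tag{16}$$
Here λ1 and λ2 are the eigenvalues of M, and P and Q are corresponding eigenvectors scaled to match the initial condition. The eigenvalues are either both real or a complex-conjugate pair; we do not treat the degenerate case λ1 = λ2. The behavior of the linear differential system is quite different in the real and complex cases, so it is not surprising that we need two different methods to estimate the matrix M.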
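The relation between M and the modal form (16) can be illustrated with NumPy: splitting N(0) along the eigenvectors of M gives P and Q, and conversely a fitted (λ1, λ2, P, Q) determines M = V diag(λ1, λ2) V⁻¹ with V = [P Q]. This is a sketch for the case of distinct eigenvalues; the matrix M below is hypothetical (chosen so its eigenvalues are 0.03 and −0.17), and the state is ordered as (N1, N0), which is our assumption.

```python
import numpy as np

def modal_decomposition(M, n_init):
    """Return (lams, P, Q) with N(t) = P*exp(lams[0]*t) + Q*exp(lams[1]*t).

    Assumes M is 2x2 with distinct eigenvalues; P and Q are eigenvectors
    of M scaled so that P + Q = N(0).
    """
    lams, V = np.linalg.eig(M)          # columns of V are eigenvectors
    c = np.linalg.solve(V, n_init)      # N(0) expressed in the eigenbasis
    return lams, c[0] * V[:, 0], c[1] * V[:, 1]

def matrix_from_modes(lams, P, Q):
    """Rebuild M = V diag(lams) V^{-1} from eigenvalues and the vectors P, Q."""
    V = np.column_stack([P, Q])
    return V @ np.diag(lams) @ np.linalg.inv(V)

def solution(lams, P, Q, t):
    """Evaluate the modal solution (16) at time t."""
    return P * np.exp(lams[0] * t) + Q * np.exp(lams[1] * t)

# Hypothetical M; initial state taken from the 2007 row of Table 1 (thousands).
M = np.array([[0.01, 0.18], [0.02, -0.15]])
n_init = np.array([929.3, 183.777])
lams, P, Q = modal_decomposition(M, n_init)
```

The rescaling of P and Q does not affect the reconstruction of M, because V diag(λ) V⁻¹ is unchanged when the columns of V are scaled.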
4.1. λ1, λ2 Real: Simple Curve Fitting
We first fit (16) to the data using global optimization curve fitting in Matlab. We have
| (17) |
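Since P and Q are eigenvectors of M, let V = [P Q] be the matrix with columns P and Q. Then M = V diag(λ1, λ2) V⁻¹, and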
| (18) |
Notice that the dominant term suggests that the HIV-infected population grows at an overall rate close to 2.73% per year. This seems reasonable. But δ, the rate of flow of population from N0 to N1, is estimated at −4%, which is completely off the mark. One remedy is to estimate the dominant term first and then estimate the remainder.
4.2. λ1, λ2 Real: Dominant Term Estimate
Suppose that λ2 < λ1. Then Pe^{λ1t} is the dominant term for large t. We shall have
| (19) |
Now
| (20) |
Fitting the curve F(t) = ae^{λt} to these data, we obtain
| (21) |
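One simple way to fit F(t) = ae^{λt} is ordinary least squares on log F, since log F = log a + λt is linear in t. This log-linear fit is a swapped-in technique, not necessarily the Matlab procedure used for (21); it weights relative rather than absolute errors, so on noisy data its estimates can differ somewhat from a direct nonlinear fit. The data below are synthetic, with illustrative values a = 1100 and λ = 0.0236.

```python
import numpy as np

def fit_exponential(t, F):
    """Fit F(t) ~ a*exp(lam*t) by linear least squares on log F (needs F > 0)."""
    lam, log_a = np.polyfit(t, np.log(F), 1)   # slope first, then intercept
    return float(np.exp(log_a)), float(lam)

# Synthetic, noise-free data for t = 0, ..., 6.
t = np.arange(7.0)
a_hat, lam_hat = fit_exponential(t, 1100.0 * np.exp(0.0236 * t))
```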
4.3. λ1 Dominant, λ2 Real
Now we can assume λ1 = 0.0236 and use curve fitting to find λ2, P, and Q. We have
| (22) |
and λ2 = −0.172, λ1 = 0.0236. It follows that
| (23) |
We derive that γ1 − ϵ1 ≅ 0.041 and δ ≅ 0.05. These parameters seem reasonable. However, γ0 − ϵ0 = 0.0498 − 0.1656 = −0.1158, so with ϵ0 = 0.019 the transmission rate γ0 would be negative, which is impossible.
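The combinations used here can be read directly off M. Assuming M = [[βγ1 − ϵ1, δ], [(1 − β)γ1, γ0 − δ − ϵ0]] with the state ordered as (N1, N0) (our reading), the first column sums to γ1 − ϵ1, the (1,2) entry is δ, and the (2,2) entry plus δ gives γ0 − ϵ0. In the last lines we read the figures above as δ ≈ 0.0498 and a (2,2) entry of about −0.1656; that reading is an assumption. The function names are hypothetical.

```python
def linear_matrix(beta, delta, gamma0, gamma1, eps0, eps1):
    """M of the linear model with state ordered as (N1, N0)."""
    return [[beta * gamma1 - eps1, delta],
            [(1 - beta) * gamma1, gamma0 - delta - eps0]]

def constants_from_M(M):
    """Combinations of dynamic constants that can be read directly off M."""
    delta = M[0][1]
    return {
        "delta": delta,
        "gamma1_minus_eps1": M[0][0] + M[1][0],  # (beta*g1 - e1) + (1 - beta)*g1
        "gamma0_minus_eps0": M[1][1] + delta,    # (g0 - delta - e0) + delta
    }

# Figures from this section: delta ~ 0.0498, (2,2) entry ~ -0.1656.
gamma0_minus_eps0 = -0.1656 + 0.0498
gamma0 = gamma0_minus_eps0 + 0.019               # with eps0 = 0.019
print(round(gamma0_minus_eps0, 4), round(gamma0, 4))  # -0.1158 -0.0968
```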
4.4. λ1, λ2 Complex with Fixed Real Part
Suppose that λ1 and λ2 are complex. Then they are complex conjugates of each other, and their common real part ℜ(λ1) = ℜ(λ2) should be approximately 0.0236. Writing λ1 = λ0 + iμ, we should have
$$N(t) = e^{\lambda_0 t}\big(P\cos(\mu t) + Q\sin(\mu t)\big), \tag{24}$$
where P and Q are real constant vectors and μ, the imaginary part of λ1, sets the frequency of the oscillating factor. A simple curve fitting shows that
| (25) |
and λ0 = 0.0236 and μ = −0.017. Hence
| (26) |
Let us see what this tells us. We have
| (27) |
This roughly says that there are about 3.25% of N0 that become aware of their infection every year. The annual transmission rate for N1 is 2.9%. The annual transmission rate for N0 is 2.3%.
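When λ0 is held fixed, the model (24) is linear in P and Q for each trial value of μ. A simple alternative to global optimization, sketched below, is therefore to scan μ over a grid and solve an ordinary least-squares problem for P and Q at each grid point. This grid-plus-least-squares scheme is a swapped-in technique, not the Matlab procedure behind (25). Since cos is even and sin is odd, (μ, Q) and (−μ, −Q) give the same curve, so the grid can be restricted to μ > 0. The check uses synthetic data with illustrative parameter values.

```python
import numpy as np

def fit_fixed_real_part(t, Y, lam0, mu_grid):
    """Fit Y(t) ~ exp(lam0*t) * (P*cos(mu*t) + Q*sin(mu*t)) with lam0 held fixed.

    For each trial mu the model is linear in P and Q, so they are found by
    ordinary least squares; the mu with the smallest residual is kept.
    Y has shape (len(t), 2), one column per group.
    """
    growth = np.exp(lam0 * t)
    best = None
    for mu in mu_grid:
        B = np.column_stack([growth * np.cos(mu * t), growth * np.sin(mu * t)])
        coef, *_ = np.linalg.lstsq(B, Y, rcond=None)   # row 0 -> P, row 1 -> Q
        sse = float(np.sum((B @ coef - Y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, mu, coef[0], coef[1])
    return best

# Synthetic check with known parameters (values chosen only for illustration).
t = np.arange(7.0)
P_true, Q_true = np.array([930.0, 184.0]), np.array([20.0, -15.0])
mu_true = 0.017
Y = np.exp(0.0236 * t)[:, None] * (np.outer(np.cos(mu_true * t), P_true)
                                   + np.outer(np.sin(mu_true * t), Q_true))
sse, mu_hat, P_hat, Q_hat = fit_fixed_real_part(
    t, Y, 0.0236, np.linspace(0.001, 0.1, 991))
```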
4.5. λ1, λ2 Complex
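We finally use Matlab global optimization to fit the data to the curve
$$N(t) = e^{\lambda_0 t}\big(P\cos(\mu t) + Q\sin(\mu t)\big), \tag{28}$$
now treating λ0 as a free parameter as well.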
We obtain λ0 = 0.0088, μ = −0.036,
| (29) |
Hence we obtain the estimate
| (30) |
Now we have
| (31) |
It follows that
| (32) |
So γ1 is negligible and γ0 is about 21%, which again makes the model invalid.
4.6. Discussion
In this section, we chose the dynamic constants to fit the temporal data. We found that the resulting values depend on the qualitative properties of the model (real versus complex eigenvalues). Yet none of the dynamic constants we obtained matches the existing estimates well. One possible reason is that yearly data are not well suited to a continuous-time model. We therefore explore a discrete-time model.
5. Discrete Dynamic Model
We may regard Nt (t = 1, 2, …, 7, corresponding to the years 2007 to 2013) as a discrete-time dynamical system. Let us assume that these discrete dynamics are defined by a transition matrix T:
$$N_{t+1} = TN_t, \qquad t = 1, \dots, 6. \tag{33}$$
In principle, based on our earlier discussion,
| (34) |
Now we would like to estimate T.
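For orientation, note a general fact about the continuous model dN/dt = MN: advancing exactly one year gives N(t + 1) = e^M N(t), so e^M is one natural candidate for T, and I + M is its first-order (forward Euler) approximation. This is an illustration under that assumption, not necessarily the relation stated in (34). The sketch computes e^M by a truncated Taylor series (adequate for small entries; `scipy.linalg.expm` is the general-purpose routine) for a hypothetical M whose eigenvalues are 0.03 and −0.17.

```python
import numpy as np

def expm_taylor(M, terms=30):
    """exp(M) by a truncated Taylor series; adequate when the entries of M are small."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k          # now equals M^k / k!
        result = result + term
    return result

# Hypothetical M (state ordered as (N1, N0)); eigenvalues 0.03 and -0.17.
M = np.array([[0.01, 0.18], [0.02, -0.15]])
T_exact = expm_taylor(M)             # exact one-year map of dN/dt = MN
T_euler = np.eye(2) + M              # first-order (forward Euler) approximation
gap = np.abs(T_exact - T_euler).max()
```

With yearly rates of a few percent the two candidates differ only in the second-order terms M²/2 and beyond, which is why reading rates directly off T − I is a reasonable first approximation.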
5.1. Basic Estimates
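Since T has four unknown entries and each relation Nt+1 = TNt gives only two equations, the easiest way to find T is to pair consecutive transitions:
$$[N_{i+1}\; N_{i+2}] = T_i\,[N_i\; N_{i+1}], \qquad i = 1, \dots, 5, \tag{35}$$
where [Ni Ni+1] denotes the 2 × 2 matrix with columns Ni and Ni+1. For example, for i = 2, we have
$$[N_3\; N_4] = T_2\,[N_2\; N_3], \qquad\text{so}\qquad T_2 = [N_3\; N_4]\,[N_2\; N_3]^{-1}. \tag{36}$$
Then we find the following estimates of T: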
| (37) |
We can see some consistency among these transition matrices. For example, the (2,1)-th entry has been around 3%. This translates into
| (38) |
This is the rate of transmission for group N1. It seems to be consistent with the estimate of [7].
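The pairwise estimates Ti can be computed from Table 1 as sketched below, assuming the pairing [Ni+1 Ni+2] = Ti[Ni Ni+1] and the state ordering (N1, N0) = (diagnosed, undiagnosed); both are our reading, and `pairwise_transition` is a hypothetical name. Consecutive columns Ni and Ni+1 are nearly parallel, so each 2 × 2 matrix [Ni Ni+1] is ill-conditioned and small irregularities in the data are amplified in Ti.

```python
import numpy as np

# Table 1 (thousands); column t-1 is N_t = (N1, N0) for t = 1..7 (2007..2013).
N = np.array([
    [929.3, 956.9, 982.4, 1007.6, 1031.6, 1057.2, 1080.5],         # diagnosed N1
    [183.777, 178.1165, 170.6282, 165.3507, 162.4248, 160.7760, 161.46],  # undiagnosed N0
])

def pairwise_transition(N, i):
    """T_i solving [N_{i+1} N_{i+2}] = T_i [N_i N_{i+1}] (i is 1-based)."""
    X = N[:, i - 1:i + 1]      # columns N_i, N_{i+1}
    Y = N[:, i:i + 2]          # columns N_{i+1}, N_{i+2}
    return Y @ np.linalg.inv(X)

Ts = [pairwise_transition(N, i) for i in range(1, 6)]
conds = [np.linalg.cond(N[:, i - 1:i + 1]) for i in range(1, 6)]
```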
5.2. (Arithmetic) Average Estimate of T
Averaging all the matrices Ti, we obtain
| (39) |
Hence
| (40) |
Our estimate yields that δ≅18%; in other words, about 18% of those unaware of their infection will become aware of their infection next year. We also have
| (41) |
If the mortality rate ϵ0 is set to be 0.019, then we have γ0 = 0.017. Similarly, we have
| (42) |
If the mortality rate ϵ1 is set to 0.019, then we have γ1 = 0.039. This suggests that the transmission rate of the N1 group is about twice that of the N0 group. There may be some truth to this. However, we believe that this estimate is off the mark because the matrices [Ni Ni+1] are correlated with each other (consecutive matrices share a column), so each estimate Ti is biased. We shall correct this and give a more robust estimate later.
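Reading δ, γ0, γ1, and β off an averaged transition matrix can be sketched as follows. We assume T ≈ I + M (a forward Euler reading of the continuous model; this is our assumption) with M = [[βγ1 − ϵ1, δ], [(1 − β)γ1, γ0 − δ − ϵ0]] and the state ordered as (N1, N0). The function names and the round-trip parameter values are hypothetical.

```python
import numpy as np

def constants_from_T(T, eps0=0.019, eps1=0.019):
    """Read dynamic constants off T, assuming T = I + M, state ordered as (N1, N0)."""
    M = np.asarray(T) - np.eye(2)
    delta = M[0, 1]
    gamma1 = M[0, 0] + M[1, 0] + eps1        # (beta*g1 - e1) + (1 - beta)*g1 + e1
    gamma0 = M[1, 1] + delta + eps0          # (g0 - delta - e0) + delta + e0
    beta = 1.0 - M[1, 0] / gamma1            # from M[1, 0] = (1 - beta)*g1
    return {"delta": delta, "gamma0": gamma0, "gamma1": gamma1, "beta": beta}

def arithmetic_mean(Ts):
    """Entrywise average of a list of 2x2 transition matrices."""
    return np.mean(np.stack(Ts), axis=0)

# Round trip with hypothetical constants.
beta, delta, g0, g1, e = 0.4, 0.18, 0.02, 0.04, 0.019
T = np.eye(2) + np.array([[beta * g1 - e, delta],
                          [(1 - beta) * g1, g0 - delta - e]])
c = constants_from_T(arithmetic_mean([T, T, T]))
```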
5.3. Least Square Estimate of T
Perhaps a good way to estimate T is the least square method. We write
$$T\,[N_1\; N_2\; \cdots\; N_6] = [N_2\; N_3\; \cdots\; N_7]. \tag{43}$$
Applying the least square method, we find that the least square solution to T is
| (44) |
This estimate seems better than the arithmetic average in the sense that irregularities have a smaller effect on the least squares solution. Because the least squares solution does not change when the pairs (Ni, Ni+1) are reordered, we also avoid the pitfall that Ni and Ni+1 are correlated. We have our posterior estimates:
| (45) |
This estimate is similar to the arithmetic average we just computed, and the resulting dynamic constant estimates are very similar. We shall therefore look for a more robust solution. One particular reason that the least squares estimate is not satisfactory is that there are additional relations, such as
$$N_{t+2} = T^2 N_t, \qquad N_{t+3} = T^3 N_t, \tag{46}$$
that least square method does not take into consideration. In other words, T2, T3 can also be estimated and shall be taken into consideration when we estimate T. We shall offer one remedy that avoids this issue.
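A least-squares estimate of T from Table 1 can be sketched with `numpy.linalg.lstsq`: transposing TX ≈ Y gives XᵀTᵀ ≈ Yᵀ, a standard least-squares problem. The sketch uses all seven years (dropping the last column of `N` gives a variant without 2013), assumes the state ordering (N1, N0) = (diagnosed, undiagnosed), and checks the reordering property mentioned in Section 5.3: permuting the pairs (Nt, Nt+1) jointly leaves the solution unchanged. Function names are hypothetical.

```python
import numpy as np

# Table 1 (thousands); columns are N_t = (N1, N0) for t = 1..7 (2007..2013).
N = np.array([[929.3, 956.9, 982.4, 1007.6, 1031.6, 1057.2, 1080.5],
              [183.777, 178.1165, 170.6282, 165.3507, 162.4248, 160.7760, 161.46]])

def least_squares_T(X, Y):
    """Least-squares solution of T X ~ Y, with X and Y of shape (2, n)."""
    Tt, *_ = np.linalg.lstsq(X.T, Y.T, rcond=None)   # solves X^T T^T ~ Y^T
    return Tt.T

X, Y = N[:, :-1], N[:, 1:]          # [N_1 ... N_6] and [N_2 ... N_7]
T_ls = least_squares_T(X, Y)

# Reordering the pairs (N_t, N_{t+1}) jointly leaves the solution unchanged.
perm = [3, 0, 5, 1, 4, 2]
T_perm = least_squares_T(X[:, perm], Y[:, perm])
```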
5.4. A More Robust Estimate
One of the problems with our estimate is that Nt and Nt+1 are correlated to each other. As a remedy, we pick N1 and N6 as far from each other as possible. We observe that
| (47) |
We compute
| (48) |
Then
| (49) |
We have
| (50) |
5.5. Discussion
This estimate of M is quite consistent with the least squares estimate. It suggests that the transmission rates for N0 and N1 may be in a similar range. Following [8], we assume that ϵ0 = ϵ1 = 0.019. We have
| (51) |
Every year, about 18.5% of those unaware of their HIV-positive status become aware of their infection through testing, and about 40% of those infected by the N1 group become aware of their infection. This seems consistent with some of the observations in [7], with one exception: in our estimates, the transmission rates for N1 and N0 are very close. Figures 1 and 2 compare TNt with Nt+1.
Figure 1. Comparison of TNt and Nt+1: N1 group.
Figure 2. Comparison of TNt and Nt+1: N0 group.
5.6. Arithmetic Average versus Geometric Average
We may now state our problem in greater generality. Given a temporal vector N(t), suppose that N(t + 1) = TN(t) with transition matrix T. How should one estimate the matrix T?
As discussed earlier, we can use least squares with the equations
$$N(t+1) = TN(t), \qquad t = 1, \dots, 6. \tag{52}$$
The least squares estimate of T is, in some sense, very similar to the arithmetic mean of the transition matrices. A geometric mean, however, makes better sense. More precisely, we have to take into consideration that
$$N(t+k) = T^k N(t). \tag{53}$$
Suppose that Tt is the transition matrix at time t. Then a good estimate of T should be the “geometric average” of the Tt. For scalars, the geometric average of p1, p2, …, pn is the nth root of ∏pi. But matrix multiplication is not commutative, so there is no obvious way to define the geometric average of matrices. Defining a computable geometric mean of the Tt remains a challenging problem.
5.7. Roots Estimate
Tentatively, we may define the geometric mean by taking roots. For example, we may now consider
| (54) |
Then
| (55) |
We can also consider
| (56) |
We have
| (57) |
Both estimates of T are consistent with the least squares estimate and with the estimates in the previous section. Above all, all of our estimates point to the same range of transmission rates for N0 and N1.
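If a data-based estimate of T^k is available, a principal kth root can be computed by eigendecomposition when T^k is diagonalizable with real positive eigenvalues: T = V diag(w^(1/k)) V⁻¹. How T^k is formed from the data (for instance, from a relation of the form [Nt+k Nt+k+1] = T^k[Nt Nt+1]) is our assumption here and not necessarily the exact form used above; `scipy.linalg.fractional_matrix_power` is a general-purpose alternative. The check uses a hypothetical one-year matrix with eigenvalues 1.01 and 0.81.

```python
import numpy as np

def principal_root(A, k):
    """Principal k-th root of A, assuming A is diagonalizable with real positive eigenvalues."""
    w, V = np.linalg.eig(A)
    if np.iscomplexobj(w) and np.any(np.abs(w.imag) > 1e-12):
        raise ValueError("sketch assumes real eigenvalues")
    w = w.real
    if np.any(w <= 0):
        raise ValueError("sketch assumes positive eigenvalues")
    root = V @ np.diag(w ** (1.0 / k)) @ np.linalg.inv(V)
    return np.real(root)

# Hypothetical one-year transition matrix (eigenvalues 1.01 and 0.81).
T_true = np.array([[0.99, 0.18], [0.02, 0.83]])
T5 = np.linalg.matrix_power(T_true, 5)     # stands in for a five-year estimate
T_root = principal_root(T5, 5)
```

Unlike averaging the yearly matrices entrywise, this root respects the multiplicative structure N(t + k) = T^k N(t), which is the sense in which it behaves like a geometric mean.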
6. Concluding Remarks
Now we shall compare our dynamic constant estimates in the linear differential model in continuous time and in discrete time.
In the continuous-time model, we obtain a transmission rate γ1 of about 4% for the N1 group, those who are aware of their HIV infection. Nevertheless, δ, the rate of flow from N0 to N1 due to HIV testing, turned out to be too low, and γ0 often came out negative, which cannot be the case. The best results are obtained when we assume that the two eigenvalues are complex. In this case
| (58) |
Yet δ seems to be quite low. According to the CDC report [8], δ is estimated at about 25%.
There are various reasons why our dynamic constants are inconsistent with known estimates. Firstly, the CDC data we use tends to underestimate the N0 and N1 population sizes, particularly in more recent years. The CDC estimates the sizes of the populations infected with HIV by back calculation. The estimates for any given year will increase as new diagnoses are obtained. HIV may go undiagnosed for up to 10 years without causing the death of the patient (Table 1) [5]. Depending on the stage of the disease, the individual will be counted as undiagnosed for a number of years prior to the diagnosis. This causes the estimates of the population sizes to be smaller than the actual size of the population. A new estimate of the HIV prevalence agrees with this conclusion [9]. Although this new estimate is more conservative than the back calculation method, it may still underestimate the N0 population. Both estimates show a downward trend in the data, but this is likely to be erased as more individuals are diagnosed in the later stages of the disease.
Table 1.
Estimated numbers of diagnosed (N1) and undiagnosed (N0) persons living with HIV, 2007–2013 (in thousands).
| Year | Diagnosed (N1) | Undiagnosed (N0) | Undiagnosed, % of total |
|---|---|---|---|
| 2007 | 929.3 | 183.777 | 16.5 |
| 2008 | 956.9 | 178.1165 | 15.7 |
| 2009 | 982.4 | 170.6282 | 14.8 |
| 2010 | 1007.6 | 165.3507 | 14.1 |
| 2011 | 1031.6 | 162.4248 | 13.6 |
| 2012 | 1057.2 | 160.7760 | 13.2 |
| 2013 | 1080.5 | 161.46 | 13.0 |
Source: [8].
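As a quick consistency check on Table 1, the last column should equal 100 × Undiagnosed / (Diagnosed + Undiagnosed). The sketch below recomputes it from the first two columns and compares the rounded values with the reported ones.

```python
import numpy as np

# Table 1, in thousands (2007-2013).
diagnosed = np.array([929.3, 956.9, 982.4, 1007.6, 1031.6, 1057.2, 1080.5])
undiagnosed = np.array([183.777, 178.1165, 170.6282, 165.3507, 162.4248, 160.7760, 161.46])
reported_pct = np.array([16.5, 15.7, 14.8, 14.1, 13.6, 13.2, 13.0])

total = diagnosed + undiagnosed
pct_undiagnosed = 100 * undiagnosed / total

# Every recomputed percentage rounds to the reported value.
print(np.allclose(np.round(pct_undiagnosed, 1), reported_pct))  # True
```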
Secondly, our computations assume that the susceptible population is much larger than the infected population. However, our failure to obtain the right estimates suggests that the opposite may be true: the susceptible population could be much smaller. HIV infection is overrepresented in some subpopulations, such as men who have sex with men (MSM) [1]. This subpopulation is only about 5% of the US population, or approximately 15 million individuals. Not all MSM have an equal risk of contracting the disease, and since over 1 million individuals are currently living with the disease, the susceptible population may be comparable in size to the infected population. For this reason, any differential system model of HIV transmission must include the susceptible population.
The discrete dynamic model seems to be the most robust against the bias caused by back calculation and by the size of the susceptible population. By ignoring the 2013 data, we are able to determine the transmission rates as
| (59) |
Also δ≅0.184, not too different from the CDC estimate 0.25. We see that the dynamic constants in the discrete time model are less affected by the underestimation caused by back calculation. It is also true that the relative size of susceptibles has less effect on the discrete time model than on the continuous time model.
Finally, our estimates of the transmission rates for the diagnosed and undiagnosed HIV populations are relatively close. This is very different from the previous estimate, in which the transmission rate of the undiagnosed population is about 4 times as high as that of the diagnosed population [7]. This suggests that the transmission rates should instead be attached to the susceptible population. It makes sense to divide the susceptible population into groups according to their possible transmission rates. We shall pursue this in a separate paper.
Acknowledgments
This research is partially supported by the NIH R01 Grant AI12125903.
Conflicts of Interest
The authors declare no conflicts of interest regarding the publication of this paper.
References
1. CDC HIV Risk Reduction Tool. https://wwwn.cdc.gov/hivrisk/what_is/stages_hiv_infection.html.
2. Anderson R. M., May R. M. Population biology of infectious diseases: part I. Nature. 1979;280(5721):361–367. doi: 10.1038/280361a0.
3. May R. M., Anderson R. M. Population biology of infectious diseases: part II. Nature. 1979;280(5722):455–461. doi: 10.1038/280455a0.
4. Kermack W. O., McKendrick A. G. A contribution to the mathematical theory of epidemics. Proceedings of the Royal Society of London, Series A. 1927;115(772):700–721. doi: 10.1098/rspa.1927.0118.
5. Centers for Disease Control and Prevention. HIV Surveillance Report, vol. 27. 2015. http://www.cdc.gov/hiv/library/reports/hiv-surveillance.html.
6. Bozkurt F., Paker F. Mathematical modelling of HIV epidemic and stability analysis. Advances in Differential Equations. 2014;95.
7. Pinkerton S. D. HIV transmission rate modeling: a primer, review, and extension. AIDS and Behavior. 2012;16(4):791–796. doi: 10.1007/s10461-011-0042-8.
8. Centers for Disease Control and Prevention. Monitoring selected national HIV prevention and care objectives by using HIV surveillance data: United States and 6 dependent areas. HIV Surveillance Supplemental Report 2016. 2014;4.
9. Song R., Hall H. I., Green T. A., Szwarcwald C. L., Pantazis N. Using CD4 data to estimate HIV incidence, prevalence, and percent of undiagnosed infections in the United States. Journal of Acquired Immune Deficiency Syndromes. 2017;74(1):3–9. doi: 10.1097/QAI.0000000000001151.
