Abstract
The connection between theory and data is an iterative one. In principle, each is informed by the other: data provide the basis for theory that in turn generates the need for new information. This circularity is reflected in the notion of abduction, a concept that focuses on the space between induction (generating theory from data) and deduction (testing theory with data). Einstein, writing in 1919, placed scientific creativity in that space. In the field of social network analysis, some remarkable theory has been developed, accompanied by sophisticated tools to develop, extend, and test the theory. At the same time, important empirical data have been generated that provide insight into transmission dynamics. Unfortunately, the connection between them is often tenuous and the iterative loop is frayed. This circumstance may arise both from data deficiencies and from the ease with which data can be created by simulation. But for whatever reason, theory and empirical data often occupy different orbits. Fortunately, the relationship, while frayed, is not broken, as several recent analyses merging theory and extant data attest. Their further rapprochement in the field of social network analysis could provide the field with a more creative approach to experimentation and inference.
1. Introduction
Theory and empirical data are in principle intimately interwoven. Yet in the practice of social network analysis, there appears to be a disconnect: theorizing and empiricism often seem to occupy separate orbits, and these separate discussions may be difficult to relate to each other. The root of the problem may lie in the different skill sets required by each, or perhaps in the substantial obstacles to collection of human network data. The following exploration of the distance between theory and empiricism suggests that a rapprochement would be of considerable benefit to the field.
The mid-19th Century American philosopher Charles Peirce coined the term “abduction” (which he also called “retroduction”) to fill a gap he perceived in the territory occupied by induction and deduction. As distilled by Professor Burch [1], Peirce used syllogisms to explain this term, substituting Rule, Case, and Result for the more familiar Major Premise, Minor Premise, and Conclusion. But perhaps more interesting to epidemiologists and social network analysts, he related this logical process to sampling. As Professor Burch explains it, a standard valid syllogism would progress as follows.
- Rule: All balls in this urn are red.
- Case: All balls in this particular random sample are taken from this urn.
- Result: Therefore, all balls in this particular random sample are red.
Peirce then asked what would happen if we change the order of reasoning, by interchanging the Result and the Rule.
- Result: All balls in this particular random sample are red.
- Case: All balls in this particular random sample are taken from this urn.
- Rule: Therefore, all balls in this urn are red.
Burch points out that this is not a valid syllogism but was the core of Peirce's concept of induction. Extraordinary, how closely it captures the epidemiologic mindset. But take it one step farther, and interchange the Result with the Case.
- Rule: All balls in this urn are red.
- Result: All balls in this particular random sample are red.
- Case: Therefore, all balls in this particular random sample are taken from this urn.
Again, not a valid construct, but if we substitute “Alternate Hypothesis” for “Rule,” we appear to capture the essence of hypothesis testing as it is now practiced [2]. Burch maintains that this is neither induction nor deduction, but a new type of argument that Peirce called abduction. Peirce went on to use the three “-ductions” to describe the scientific method as a circular synthesis of all three. The process begins with a conjecture or hypothesis that is based on some observation or thought (abduction). From the hypothesis can be derived consequences (deduction), and these can be tested. The resulting test observations can be used to confirm or refute the hypothesis (induction), or more generally, either to draw conclusions about the truth or to return to the abductive process of conjuring up a new hypothesis.
Popper did not agree [3]. He relegated the process of hypothesis generation to the realm of psychology and stated overtly that he was not interested in it [3, page 39]. In contrast, Albert Einstein embraced it. As described by Adam [4], Einstein wrote a short newspaper article in 1919 that colocated the process of abduction with the creativity inherent in scientific endeavors. Einstein said: “Intuitive comprehension of the essentials about the large complex facts leads the researcher to construct one or several hypothetical fundamental laws…he [the researcher] does not arrive at his system of thought in a methodical, inductive way; rather, he snuggles (sic) to the facts by intuitive choice among the imaginable axiomatic theories.”
Thus, Peirce and Einstein provide a direct connection between theory, observations, conclusions, and revisions. This view stresses that theory and observation are interdependent, iterative, and connected by creativity. Unfortunately, this connection (though not necessarily the creativity) seems to have attenuated in the application of social network analysis to disease transmission.
2. The Linkage of Theory and Empiricism
Several factors have hindered a tight linkage between theoretical and empirical approaches. First, the cost and time to elucidate sociometric network structure, particularly for hard-to-reach populations such as those who may be at the highest risk for HIV or other communicable diseases, are often viewed as prohibitive. Second, empirical sociometric network ascertainment is imperfect. Since the boundaries of the populations of interest are never known and always changing, and the manner in which we find out about connections is not standardized, some connections between individuals or network nodes within those populations are always missed, often in unknown ways that render imputation and interpretation problematic. Third, there is no gold standard and no true or known network against which to measure empirical adequacy. These concerns are all subsumed under the general issue of sampling in networks. Because empirical ascertainment of networks requires a credible sampling procedure, preferably one that justifies the use of standard statistical theory, observations may be suspect. One result has been a movement toward theory-based network simulation wherein the investigator controls the sampling, knows (actually creates) the gold standard, and can test the effect of imposed conditions. The past decade has witnessed a burgeoning of this work and considerable new insight into the structure, function, and dynamics of many types of networks [5, 6].
A persistent problem, however, is the difficulty of relating theoretical network constructs back to some empirical reality. The theoretical biases inherent in sampling are the case in point. There can be no question that sampling matters if one is to have a credible mathematical basis for statistical network inference [7, 8]. Modeling approaches have demonstrated the biases that arise from missing data [9]. In his text, Newman [10] enumerates some of these biases: snowball sampling finds persons in proportion to their eigenvector centrality (i.e., the centrality of their contacts), but the large number of waves required to reach equilibrium may preclude unbiased estimates. Contact tracing suffers from the same problem, with the additional issue of seeking only infected persons, who are a biased sample of the population. Random walk sampling may offer some advantages, since sampling is proportional to degree, and equilibrium can be reached quickly in small groups, but issues of contact recall, unfindable partners, and nonparticipation persist. These assertions are all readily verifiable using mathematical and simulation approaches. There has been little or no empirical validation, however, of many theoretical conclusions that are taken as true. In fact, the assumption of theoretical validity is often so strong that many may find empirical verification unnecessary.
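The statement that a random walk samples persons in proportion to their degree is one of the assertions that can be checked by simulation. The following is a minimal sketch under stated assumptions: the five-node network, the seed, and the function names `random_walk_visit_shares` and `degree_shares` are hypothetical and are not drawn from any of the cited studies.

```python
import random
from collections import Counter

# Hypothetical undirected toy network (adjacency lists). It is connected and
# contains a triangle, so a simple random walk has a stationary distribution.
GRAPH = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["a", "c", "e"],
    "e": ["d"],
}


def random_walk_visit_shares(graph, steps, seed=0):
    """Walk the graph, moving to a uniformly chosen contact at each step,
    and return the fraction of steps spent at each node."""
    rng = random.Random(seed)
    node = rng.choice(sorted(graph))
    visits = Counter()
    for _ in range(steps):
        node = rng.choice(graph[node])
        visits[node] += 1
    return {n: visits[n] / steps for n in graph}


def degree_shares(graph):
    """Each node's degree divided by the sum of all degrees."""
    total = sum(len(contacts) for contacts in graph.values())
    return {n: len(contacts) / total for n, contacts in graph.items()}


shares = random_walk_visit_shares(GRAPH, steps=200_000)
expected = degree_shares(GRAPH)
for n in sorted(GRAPH):
    print(n, round(shares[n], 3), round(expected[n], 3))
```

The toy walk assumes that every contact is named, found, and willing to participate, which is precisely what the problems of recall, unfindable partners, and nonparticipation call into question in the field.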
3. Reconnecting Theory to Data
But if the Peirce/Einstein view is to be recaptured, meaningful efforts at falsification of theoretical constructs are needed. As noted, such efforts are generally not attempted, perhaps because of their difficulty, or perhaps because of the a priori assumptions about their inadequacy. (You cannot know if you have the right answer, so why bother.) This is perhaps where Peirce's second syllogism—the balls in my random sample are all red, so those in the urn from which they come must be red—needs to be invoked. Though logically defective—in fact, it epitomizes “the inductive problem” that has concerned philosophers since Hume—it is the basis for the inductive reasoning that, as noted, drives the epidemiological mindset. As argued forcefully by Pearce and Crawford-Brown [11], the notion that falsifiability is the hallmark of science fails to recognize the uncertainties of falsifiability, which can be at least as strong as those of induction. In addition, these authors stress the primacy of replication and validation of findings [12], the need for mature theory examined in multiple ways, and the importance of observations whose ongoing renewal and explanation is actually the work of theory.
Thus, to complete the loop of theory validation, we require repeated demonstration that theoretical predictions are borne out in real life. Empirical verification of theoretical constructs affirms their validity, provides ongoing refinement of parameters, and furnishes a real basis for applying interventions. In the current realm of social network analysis, it would seem that empirical studies provide parameters to theoreticians, and not much else.
4. Some Other Examples
On the other hand, it is also the case that those involved in delineating real-time social networks have focused more on findings and transmission implications than on the specific validation of theoretical constructs. For example, 15 empirical network studies that were used in a synthesis of findings [13] produced over 100 publications, but none focused primarily on testing theoretical findings. There are some examples, however, of empirical attempts to examine theoretical constructs. Take, for example, Newman's assertion that, with random walk sampling, equilibrium can be reached quickly in small groups. Two empirical observations speak to this issue. First, in a direct test of sampling methods [14], networks ascertained by a chain link random walk (wherein the next person in the chain was chosen at random from the contacts of the current respondent) or by nomination (the next person in the chain nominated by the respondent from his/her contacts) were indistinguishable. Second, using those same networks, the underlying pattern of network configuration was evident from the first 10 interviews (out of 206) (Table 1), supporting the notion that the pattern becomes clear quickly.
Table 1.
Number of respondents | 10 | 20 | 30 | 40 | 50 | 100 | 150 | All (206) |
---|---|---|---|---|---|---|---|---|
Number of persons in network | 62 | 131 | 202 | 284 | 367 | 685 | 981 | 1314 |
Degree (mean of interviewed respondents) | 7.4 | 8.1 | 8.3 | 8.6 | 8.8 | 8.4 | 8.0 | 7.6 |
Degree (mean of all persons in network) | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 |
Degree (variance) | 10.5 | 13.4 | 13.1 | 12.8 | 12.5 | 12.1 | 11.8 | 10.9 |
Concurrency (kappa) | 5.9 | 7.1 | 7.0 | 6.9 | 6.8 | 6.6 | 6.4 | 6.1 |
Clustering coefficient | 0.034 | 0.034 | 0.033 | 0.029 | 0.028 | 0.033 | 0.041 | 0.036 |
Power coefficient | 2.79 | 2.19 | 2.23 | 1.76 | 1.71 | 1.65 | 1.59 | 1.59 |
Age assortativity | 0.313 | 0.299 | 0.348 | 0.315 | 0.285 | 0.329 | 0.323 | 0.319 |
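
A short arithmetic check suggests how the concurrency row of Table 1 relates to the two degree rows above it. The sketch assumes, since the text does not say so, that the row reports the concurrency index of Kretzschmar and Morris [25], kappa = variance/mean + mean - 1, computed over all persons in the network; the function name `concurrency_kappa` is hypothetical.

```python
def concurrency_kappa(mean_degree, degree_variance):
    """Assumed index: kappa = variance / mean + mean - 1."""
    return degree_variance / mean_degree + mean_degree - 1.0


# Mean degree over all persons in the network, constant across Table 1.
MEAN_DEGREE = 2.3

# (respondents interviewed, degree variance, kappa as tabulated), from Table 1.
TABLE_1 = [
    (10, 10.5, 5.9),
    (20, 13.4, 7.1),
    (30, 13.1, 7.0),
    (40, 12.8, 6.9),
    (50, 12.5, 6.8),
    (100, 12.1, 6.6),
    (150, 11.8, 6.4),
    (206, 10.9, 6.1),
]

recomputed = [concurrency_kappa(MEAN_DEGREE, var) for _, var, _ in TABLE_1]
for (n, var, reported), kappa in zip(TABLE_1, recomputed):
    print(f"n={n:>3}  variance={var:5.1f}  reported={reported:.1f}  recomputed={kappa:.2f}")
```

Under that assumption, the recomputed values agree with the tabulated ones to within the rounding of the inputs, and kappa tracks the degree variance because the mean degree stays fixed at 2.3.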
In a comparison of centrality measures [15], it was demonstrated that imperfect sample data produced stable network estimates under a variety of circumstances. In a comparison of eight types of centrality measures, high concordance [16] was found among measures ascertained through a complex, mixed sampling scheme despite expectations that these measures would vary because of their differing relationships to the underlying sampling method.
A number of studies, following the observations by Barabási and colleagues of “scale-free” network structure in the world wide web [17–19], attempted to show that networks of persons at risk for HIV and STIs could be fit by a power law curve with a coefficient between 2 and 3 (the statistical requirement for scale-freeness) [13, 20]. Several rigorous statistical analyses [7, 21] of the empirical data from 10 studies found that none of the nine statistical models tested consistently provided the best fit to the degree distributions from those studies. In addition, the best-fit power law model predicted no epidemic threshold for HIV and STIs in the United States, a theoretical observation in obvious contrast to the true condition. This result [21], by providing empirical evidence against the proposed theory, embodies the aforementioned process of “circular synthesis.”
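To make the power-law criterion concrete, the following sketch estimates an exponent by maximum likelihood from synthetic data. It is not the procedure of the cited analyses [7, 21], which compared several candidate models on empirical degree distributions. It assumes a continuous power law with a known lower cutoff `x_min`; the function names, seed, and sample size are illustrative assumptions.

```python
import math
import random


def sample_power_law(alpha, x_min, n, seed=0):
    """Draw n values from a continuous power law p(x) ~ x**(-alpha), x >= x_min,
    by inverse-transform sampling."""
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_power_law_exponent(values, x_min):
    """Maximum-likelihood estimate of alpha for a continuous power law,
    with its approximate standard error (alpha_hat - 1) / sqrt(n)."""
    tail = [v for v in values if v >= x_min]
    log_sum = sum(math.log(v / x_min) for v in tail)
    alpha_hat = 1.0 + len(tail) / log_sum
    std_err = (alpha_hat - 1.0) / math.sqrt(len(tail))
    return alpha_hat, std_err


# Synthetic data with a known exponent, so the estimate can be compared to truth.
synthetic = sample_power_law(alpha=2.5, x_min=1.0, n=20_000)
alpha_hat, std_err = fit_power_law_exponent(synthetic, x_min=1.0)
print(f"estimated exponent: {alpha_hat:.3f} (approx. s.e. {std_err:.3f})")
```

An estimate between 2 and 3 obtained this way from real degree data would not by itself establish scale-freeness, since other distributions may fit as well or better.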
As a final example, the history of concurrency as an important feature of HIV and STD transmission is informative. Though disjointed, and at times acerbic, the discussion has gone back and forth between theory and data and provides a good illustration of how the two interact. The role of concurrency in Africa was first suggested nearly 20 years ago, based both on observation [22, 23] and on theoretical considerations and simulation [24]. In a comprehensive follow-up [25–27], mathematical development of a simple formula for calculating network concurrency and a simple simulation established the importance of concurrency in transmission. Ten years on, extensive claims have been made for the overriding importance of concurrency in sexual transmission of HIV in Africa [28, 29], with the assertion that multiple sites, assessed in multiple ways, have evidence of substantial concurrency. Though the empirical evidence for these claims has been challenged [30, 31], and the challenge contested [32], the pattern of high long-term concurrency with a relatively low degree distribution has been demonstrated in detail in at least one comprehensive study, on Likoma Island, Malawi [33]. This nonlinear chain of events does nonetheless illustrate the importance of the interplay between conjecture, empirical data, and theoretical development. The next step, not yet completed, would be a theoretical demonstration of rapid epidemic spread in an African setting that would incorporate a low-degree, high-concurrency configuration and reasonable parameters for transmission based on emerging empirical information on infectivity in acute HIV infection [34]. (For another aspect of concurrency, its potential role in explaining the ethnic disparity in HIV infection in the United States, this type of theoretical and empirical interplay has been attempted in order to confirm its importance [35].)
5. Interlocking Roles
Though there are other examples of the circular process of empirical and theoretical interaction, they are still few in number. The majority of empirical studies (e.g., large-scale surveys) from which parameters are drawn are usually theory-free. In turn, theoretical and simulation studies, as noted, use these parameters but are often data- and context-free. (An unfair characterization, perhaps, but it is difficult to deny that ethnographers generally do not speak mathematics and mathematicians do not speak the language of the street.)
But from these considerations, a clearer role for theory, empiricism, and their interrelationship may emerge. In his Nobel acceptance speech in 1974, Friedrich von Hayek, often called the father of complexity theory, said: “…as we penetrate from the realm in which relatively simple laws prevail [the physical sciences] into the range of phenomena where organized complexity rules…often all that we shall be able to predict will be some abstract characteristic of the pattern that will appear…yet…we will still achieve predictions which can be falsified and which therefore are of empirical significance” [36]. Despite all their difficulties, empirical descriptions of networks, both qualitative and quantitative, have the potential to find those abstract characteristics of a pattern, a task for which theoretical and simulation studies alone are not well suited. Theoretical studies are well suited to exploring patterns, and they often do it best in ways that make little pretense of reality [37] but are geared rather to demonstrating mechanisms and testing the observations. A greater synergy between theory and data could provide the field with a more systematic approach to experimentation and inference.
Fortunately, the process of abduction is a method equally approachable by all scientists. Theoreticians can be just as good abductors as empiricists. Anyone is at liberty to think up ideas, but those who “snuggle to the facts” may have the best chance of success.
Acknowledgment
This work is supported in part by Grant R21 DA024611-01 from the National Institute on Drug Abuse, National Institutes of Health.
References
- 1. Burch R. Charles Sanders Peirce. Stanford Encyclopedia of Philosophy, 2009, http://plato.stanford.edu/entries/peirce/
- 2. Goodman SN, Bellhouse DR. Hypothesis tests, and likelihood: implications for epidemiology of a neglected historical debate. American Journal of Epidemiology. 1993;137(5):485–501. doi: 10.1093/oxfordjournals.aje.a116700.
- 3. Popper K. The Logic of Scientific Discovery. London, UK: Routledge; 1959.
- 4. Adam AM. Farewell to certitude: Einstein’s novelty on induction and deduction, fallibilism. Journal for General Philosophy of Science. 2000;31(1):19–37.
- 5. Anderson RM, Garnett GP. Mathematical models of the transmission and control of sexually transmitted diseases. Sexually Transmitted Diseases. 2000;27(10):636–643. doi: 10.1097/00007435-200011000-00012.
- 6. Newman MEJ. The structure and function of complex networks. SIAM Review. 2003;45(2):167–256.
- 7. Handcock MS, Jones JH. Likelihood-based inference for stochastic models of sexual network formation. Theoretical Population Biology. 2004;65(4):413–422. doi: 10.1016/j.tpb.2003.09.006.
- 8. Handcock MS. Modeling social networks with sampled or missing data. Working Paper 75. Seattle, Wash, USA: CSSS, University of Washington; 2007.
- 9. Ghani AC, Donnelly CA, Garnett GP. Sampling biases and missing data in explorations of sexual partner networks for the spread of sexually transmitted diseases. Statistics in Medicine. 1998;17(18):2079–2097. doi: 10.1002/(sici)1097-0258(19980930)17:18<2079::aid-sim902>3.0.co;2-h.
- 10. Newman MEJ. Networks: An Introduction. Oxford, UK: Oxford University Press; 2010.
- 11. Pearce N, Crawford-Brown D. Critical discussion in epidemiology: problems with the Popperian approach. Journal of Clinical Epidemiology. 1989;42(3):177–184. doi: 10.1016/0895-4356(89)90053-x.
- 12. Buck C. Popper’s philosophy for epidemiologists. International Journal of Epidemiology. 1975;4(3):159–168. doi: 10.1093/ije/4.3.159.
- 13. Rothenberg R, Muth SQ. Large-network concepts and small-network characteristics: fixed and variable factors. Sexually Transmitted Diseases. 2007;34(8):604–612. doi: 10.1097/01.olq.0000258358.13825.a8.
- 14. Rothenberg RB, Long DM, Sterk CE, et al. The Atlanta urban networks study: a blueprint for endemic transmission. AIDS. 2000;14(14):2191–2200. doi: 10.1097/00002030-200009290-00016.
- 15. Costenbader E, Valente TW. The stability of centrality measures when networks are sampled. Social Networks. 2003;25(4):283–307.
- 16. Rothenberg RB, Potterat JJ, Woodhouse DE, Darrow WW, Muth SQ, Klovdahl AS. Choosing a centrality measure: epidemiologic correlates in the Colorado Springs study of social networks. Social Networks. 1995;17(3-4):273–297.
- 17. Barabási A-L, Albert R. Emergence of scaling in random networks. Science. 1999;286(5439):509–512. doi: 10.1126/science.286.5439.509.
- 18. Barabási A-L, Albert R, Jeong H, Bianconi G. Power law distribution of the World Wide Web. Science. 2000;287:p. 2115.
- 19. Barabási A-L, Albert R, Jeong H. Scale-free characteristics of random networks: the topology of the world-wide web. Physica A. 2000;281(1):69–77.
- 20. Liljeros F, Edling CR, Nunes Amaral LA, Stanley HE, Åberg Y. Social networks: the web of human sexual contacts. Nature. 2001;411(6840):907–908. doi: 10.1038/35082140.
- 21. Hamilton DT, Handcock MS, Morris M. Degree distributions in sexual networks: a framework for evaluating evidence. Sexually Transmitted Diseases. 2008;35(1):30–40. doi: 10.1097/olq.0b013e3181453a84.
- 22. Hudson CP. Concurrent partnership could cause AIDS epidemics. International Journal of STD and AIDS. 1993;4(5):249–253. doi: 10.1177/095646249300400501.
- 23. Hudson CP, Hennis AJM, Kataaha P, et al. Risk factors for the spread of AIDS in rural Africa: evidence from a comparative seroepidemiological survey of AIDS, hepatitis B and syphilis in Southwestern Uganda. AIDS. 1988;2(4):255–260.
- 24. Watts CH, May RM. The influence of concurrent partnerships on the dynamics of HIV/AIDS. Mathematical Biosciences. 1992;108(1):89–104. doi: 10.1016/0025-5564(92)90006-i.
- 25. Kretzschmar M, Morris M. Measures of concurrency in networks and the spread of infectious disease. Mathematical Biosciences. 1996;133(2):165–195. doi: 10.1016/0025-5564(95)00093-3.
- 26. Morris M, Kretzschmar M. Concurrent partnerships and transmission dynamics in networks. Social Networks. 1995;17(3-4):299–318.
- 27. Morris M, Kretzschmar M. Concurrent partnerships and the spread of HIV. AIDS. 1997;11(5):641–648. doi: 10.1097/00002030-199705000-00012.
- 28. Halperin DT, Epstein H. Concurrent sexual partnerships help to explain Africa’s high HIV prevalence: implications for prevention. The Lancet. 2004;364(9428):4–6. doi: 10.1016/S0140-6736(04)16606-3.
- 29. Mah TL, Halperin DT. Concurrent sexual partnerships and the HIV epidemics in Africa: evidence to move forward. AIDS and Behavior. 2008;14(1):11–16. doi: 10.1007/s10461-008-9433-x.
- 30. Lurie MN, Rosenthal S. Concurrent partnerships as a driver of the HIV epidemic in sub-Saharan Africa? The evidence is limited. AIDS and Behavior. 2009;14:17–24. doi: 10.1007/s10461-009-9583-5.
- 31. Sawers L, Stillwaggon E. Concurrent sexual partnerships do not explain the HIV epidemics in Africa: a systematic review of the evidence. Journal of the International AIDS Society. 2010;13, article 34. doi: 10.1186/1758-2652-13-34.
- 32. Epstein H. The mathematics of concurrent partnerships and HIV: a commentary on Lurie and Rosenthal, 2009. AIDS and Behavior. 2009;14(1):29–30. doi: 10.1007/s10461-009-9627-x.
- 33. Helleringer S, Kohler H-P. Sexual network structure and the spread of HIV in Africa: evidence from Likoma Island, Malawi. AIDS. 2007;21(17):2323–2332. doi: 10.1097/QAD.0b013e328285df98.
- 34. Cohen MS, Pilcher CD. Amplified HIV transmission and new approaches to HIV prevention. Journal of Infectious Diseases. 2005;191(9):1391–1393. doi: 10.1086/429414.
- 35. Morris M, Kurth AE, Hamilton DT, Moody J, Wakefield S. Concurrent partnerships and HIV prevalence disparities by race: linking science and public health practice. American Journal of Public Health. 2009;99(6):1023–1031. doi: 10.2105/AJPH.2008.147835.
- 36. von Hayek FA. The Pretence of Knowledge, 2010, http://nobelprize.org/nobel_prizes/economics/laureates/1974/hayek-lecture.html.
- 37. Watts DJ, Strogatz SH. Collective dynamics of ‘small-world’ networks. Nature. 1998;393(6684):440–442. doi: 10.1038/30918.