Abstract
Cyberinfrastructure integrates advanced computer, information, and communication technologies to empower computation-based and data-driven scientific practice and to improve the synthesis and analysis of scientific data in a collaborative and shared fashion. As such, it represents a paradigm shift in scientific research that has provided easy access to computational utilities and streamlined collaboration across distance and disciplines, enabling scientific breakthroughs to be reached more quickly and efficiently. Spatial cyberinfrastructure seeks to resolve longstanding, complex problems in handling and analyzing massive and heterogeneous spatial datasets, while addressing the need to share spatial data flexibly and securely and realizing the benefits of doing so. This article provides an overview of spatial cyberinfrastructure and its potential future directions. The four remaining articles of the Special Feature are introduced and situated as empirical examples of how spatial cyberinfrastructure is extending and enhancing scientific practice to improve the synthesis and analysis of both physical and social science data. The primary focus of the articles is spatial analysis that uses distributed and high-performance computing, sensor networks, and other advanced information technology capabilities to transform massive spatial datasets into insights and knowledge.
Keywords: geographic information science, spatial computational domain, distributed computing, spatial analysis
The term cyberinfrastructure (CI) was coined by a National Science Foundation Blue-Ribbon Committee (1) to reflect how the traditional modes of scientific research (e.g., experimentation in the laboratory, observation in the field, processing and analysis on a single calculator or computer, and even calculations on the back of an envelope) are being enhanced and even revolutionized by the integrative capabilities of high-performance computers, storage and visualization tools for very large datasets, digitally enabled sensors and instruments in the environment, virtual organizations for collaborative problem solving, and interoperable suites of software services and tools (2). Scientific publishing itself is being transformed as CI evolves (3). CI therefore represents a paradigm shift in scientific research that has facilitated collaboration across distance and disciplines, thus enabling rapid and efficient scientific breakthroughs that might not otherwise be possible.
Examples include the discovery of abrupt transitions in Earth's climate and ecosystem dynamics, previously unknown properties of minerals at extreme temperatures and pressures deep within the Earth, simulations of the development of the early universe, discoveries and insights through improved ocean models, understandings of individual and group behavior and its relationship to social, economic, and political structures, and the creation of a human genetic linkage map (2, 4, 5). As Benioff et al. (6) note, computation, along with theory and experiment, has become “the third pillar” of science and engineering. Additionally, scientific discovery now requires the computational ability to synthesize and analyze very large datasets that are integrated across the biological, physical, and social sciences and engineering and across the science–technology interface, a need that Hey et al. (5) describe as “data-intensive science,” the “fourth paradigm.” Indeed, CI has become more than just hardware and software; it is an evolving area of research in its own right within data-intensive science and digital libraries (5–9), with many countries investing hundreds of millions of dollars in CI research and development (10, 11) and diverse scientific communities arguing for urgent further investment in CI (12, 13). Hey et al. (5) point out that, although high-performance computing has become affordable and good progress has been made on simulation tools, many challenges remain in effectively integrating multiple field observatories that contain thousands of instruments, serve millions of users, and hold petabytes of data, all built on a true data grid capable of sophisticated analysis of the data it holds.
Spatial CI is an emerging term in the literature (14–16), and it is defined as a specific type of CI that synergistically integrates the capabilities of CI, geographic information systems (GIS) (17, 18), and spatial analysis (19, 20) for geospatial problem solving and decision making. By spatial or space, we mean both real, physical space (i.e., on the surface of the Earth, in the atmosphere, or under the ocean) and virtual space (e.g., digital worlds or understanding how and where computers are connected worldwide). Nearly all of our knowledge about the world can be classified according to space (location, area, distance, or spatial interaction) as well as time. However, although time is divided into the globally understood units of seconds, hours, years, and so forth, spatial units and associated relationships are much more complex, multidimensional (e.g., x, y, and z), at multiple scales and resolutions, often heterogeneous (even in the representation of a single variable), and always changing over time. Without a clear understanding of space, any associated models, structures, and hypotheses may be erroneous (especially those about relationships among variables).
In particular, the complexity of geographic space poses significant computational and intellectual challenges for distributed spatial data access, sharing, and analysis, for government-sponsored spatial data infrastructures (21), and for the geospatial semantic web (22) (i.e., locating and integrating information without human intervention, including the ability to search for geographic information within web pages), all of which are part of a spatial CI. Many of these challenges are already well known to those working with spatial data, and a variety of approaches not involving spatial CI have arisen to address them. Spatial CI goes beyond these existing approaches by anchoring solutions in more sophisticated thinking about the representation and implications of space, coupled with the latest mathematical and statistical models (23–26), and by forging closer collaborations between computer and information science and domain disciplines such as geography, geology and geophysics, oceanography, ecology, environmental engineering and sciences, and the social sciences (5, 8, 27, 28). Such cross-disciplinary collaborations are making possible new knowledge systems that are leading, at long last, to a partial realization of a “Digital Earth,” as first envisioned by Vice President Al Gore (29) and now epitomized in products such as Google Earth, Microsoft Bing Maps, and National Aeronautics and Space Administration (NASA) World Wind.
The deluge of spatial data that will be collected at an accelerating pace for the foreseeable future from sensor networks, satellites, and even cell phones is driven by the tremendous needs of the aforementioned domains, and these data cannot be used or understood well unless they can be properly managed, analyzed, and shared through spatial CI. The dynamic nature of the Earth system (e.g., waves, tides, atmospheric turbulence, and movements in the Earth's crust) further complicates efforts to measure the system accurately and precisely. Massive datasets are common in the spatial analysis of human systems as well, including population and transportation systems, risk assessment, disease vectors, human mobility, and much more. Spatial analysis (broadly including spatial modeling) has traditionally encompassed a variety of approaches, including but not limited to spatial statistics (30, 31), heuristics and optimization (32, 33), and simulation for spatial problem solving and decision making (34, 35). These methods have been applied extensively in many fields (36–39) but have been difficult to implement for large and multiscale problems that are computationally intensive and require collaborative input. This limitation has persisted despite the advances already made in dealing with the complexity of geographic space described earlier. Spatial CI promises to remove it and thus transform spatial analyses into powerful and accessible computational utilities that enable widespread scientific breakthroughs. Spatial CI is also proving invaluable in estimating errors that propagate from measurements through the analyses, and it is facilitating the development of better models for error representation, propagation, and management throughout large distributed computational networks (40).
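To make the spatial-statistics strand above concrete, here is a minimal sketch of global Moran's I, a standard measure of spatial autocorrelation, computed for values on a small regular grid. It assumes binary rook-contiguity weights (cells that share an edge are neighbors); the grid layout and the function names `rook_pairs` and `morans_i` are illustrative choices, not taken from the cited works. Values near +1 indicate that similar values cluster together, and values near −1 indicate a checkerboard-like alternation.

```python
def rook_pairs(rows: int, cols: int):
    """Yield ordered (i, j) index pairs of edge-sharing cells in a row-major grid.

    Each adjacent pair appears twice, as (i, j) and (j, i), which corresponds
    to a symmetric binary weights matrix (an illustrative assumption)."""
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    yield i, rr * cols + cc


def morans_i(values, rows: int, cols: int) -> float:
    """Global Moran's I with binary rook weights:
    I = (n / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2
    where m is the mean and W is the sum of all weights."""
    n = len(values)
    if n != rows * cols:
        raise ValueError("values must have rows * cols entries")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("Moran's I is undefined for a constant surface")
    num = 0.0
    w_total = 0
    for i, j in rook_pairs(rows, cols):
        num += dev[i] * dev[j]
        w_total += 1
    return (n / w_total) * (num / denom)


# A 2 x 2 checkerboard versus a 4 x 4 grid split into a left and right half.
print(morans_i([1, 0, 0, 1], 2, 2))      # -1.0
print(morans_i([1, 1, 0, 0] * 4, 4, 4))  # about 0.667
```

With a general (dense) weights matrix the double sum runs over all pairs of observations and so grows with the square of their number, which is one reason such statistics become hard to compute on massive datasets without high-performance or distributed resources.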
The articles in this Special Feature address how coupling CI with spatial thinking and geographic approaches offers a promising path forward for solving scientific problems and improving decision-making practices of significant societal impact (e.g., assessing impacts of global climate change, understanding the complexity of coupled human–natural systems, sustaining ecosystem services, preserving and accessing digital resources in the humanities and social sciences, and managing transportation infrastructure). Because the field is growing quickly, they are far from covering all aspects and current interests of spatial CI. They are, however, representative of current research addressing longstanding problems posed by complex spatial datasets and spatial analysis, as well as the need to share spatial data flexibly and securely and the benefits of doing so. This research highlights some of the discoveries and insights that can be gained, results that could not readily have been achieved without spatial CI.
Spatial Principles
The Special Feature begins with a technical treatment by Yang et al. (41) that examines the spatial principles governing the interaction of different parameters and phenomena in a variety of physical geographic studies (e.g., of the Earth's lithosphere, hydrosphere, atmosphere, and pedosphere, and of global flora and fauna patterns). Chief among their contributions is the development of an architecture and algorithms for distributed geographic information processing within a spatial CI (drawing in part on the spatial CI theory introduced by Wang and Armstrong (24)) to enhance the understanding of ecosystem dynamics and improve the forecasting of the onset and extent of dust storms in the US Southwest. As a result of these experiments, scientists were able to predict the onset of dust storms at higher resolution (3 × 3 km) over longer time periods (5–10 d).
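One widely used way to apply a spatial principle (nearby things should be processed together) to distributed computing is to order grid tiles along a space-filling curve and give each worker a contiguous run of that order, so that most neighboring tiles land on the same worker and inter-node communication shrinks. The sketch below uses a Morton (Z-order) curve; it is an illustrative assumption, not a description of the algorithms Yang et al. (41) actually developed, and the tile grid, worker count, and function names are hypothetical.

```python
def morton_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of tile coordinates (x on even bits, y on odd bits)."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (2 * b)
        key |= ((y >> b) & 1) << (2 * b + 1)
    return key


def partition_tiles(nx: int, ny: int, n_workers: int) -> list[list[tuple[int, int]]]:
    """Sort an nx-by-ny set of tiles along the Z-order curve and cut the
    sequence into n_workers contiguous chunks of near-equal size."""
    tiles = sorted(((x, y) for x in range(nx) for y in range(ny)),
                   key=lambda t: morton_key(t[0], t[1]))
    base, extra = divmod(len(tiles), n_workers)
    parts, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        parts.append(tiles[start:start + size])
        start += size
    return parts


def cut_edges(parts) -> int:
    """Count edge-sharing tile pairs owned by different workers, a rough
    proxy for the data that must be exchanged between workers."""
    owner = {tile: w for w, part in enumerate(parts) for tile in part}
    cuts = 0
    for (x, y), w in owner.items():
        for nb in ((x + 1, y), (x, y + 1)):  # look right and up so each pair is counted once
            if nb in owner and owner[nb] != w:
                cuts += 1
    return cuts


parts = partition_tiles(4, 4, 4)
print(parts[0])          # [(0, 0), (1, 0), (0, 1), (1, 1)]: one 2 x 2 quadrant
print(cut_edges(parts))  # 8
```

For comparison, giving each of the four workers one full row of four tiles cuts 12 neighbor pairs instead of 8; on larger grids and with real workloads the gap, and the value of load-aware refinements, would depend on the problem.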
Physical Science Applications
Helly et al. (40) describe the evolution of a set of methods and software tools to integrate multiscale, multisource, and multidisciplinary oceanographic data over several recent research cruises to the Antarctic. Their initial goal was to test several scientific hypotheses about the movement of sea ice and meltwater plumes from icebergs, but an important parallel effort was the creation of a near real-time geospatial decision-support framework. In constructing a spatial CI to support this framework, they developed a sampling scheme optimized to capture smaller scales of interest relative to the broader scale of the study area. This sampling strategy overcame the limitations of the conventional methods used previously (i.e., using a research ship as a static platform for sampling a single parameter on a station-by-station basis), allowing more rapid characterization of the ocean surface using multiple data streams collected at sea and from space, simultaneously over multiple spatial and temporal scales. Without the spatial CI, Helly et al. (40) would not have been able to observe and characterize meltwater plumes from individual icebergs directly or to integrate these individual results effectively with regional- and global-scale data. The results provide insight into the influence of iceberg meltwater on carbon flux from the ocean surface to sediments on the ocean floor, as well as into the role that icebergs play in controlling biological productivity in the Weddell Sea. Their results also illustrate the importance of spatial CI in the overall scientific enterprise and identify key architectural and design considerations for current and future Earth-observing systems, especially as oceanographers and other Earth scientists move into an era of petascale computing.
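The tools of Helly et al. (40) are not reproduced here. As a generic illustration of what integrating multisource data over multiple spatial and temporal scales can involve, the sketch below averages point observations from several hypothetical streams (e.g., ship and satellite) into shared space-time bins. The function name `bin_observations`, the tuple layout (lon, lat, t_hours, value), the bin sizes, and the sample readings are all assumptions for illustration only.

```python
import math
from collections import defaultdict


def bin_observations(streams, cell_deg: float, window_h: float) -> dict:
    """Average (lon, lat, t_hours, value) observations from several streams into
    shared space-time bins of cell_deg x cell_deg degrees and window_h hours.

    Degree cells shrink toward the poles, so a real system would more likely
    use a projected or equal-area grid at high southern latitudes."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for stream in streams:
        for lon, lat, t_hours, value in stream:
            key = (math.floor(lon / cell_deg),
                   math.floor(lat / cell_deg),
                   math.floor(t_hours / window_h))
            sums[key] += value
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical ship and satellite readings of the same surface variable.
ship = [(-50.2, -63.7, 1.0, 1.0)]
satellite = [(-50.4, -63.9, 2.0, 3.0), (-50.4, -63.9, 30.0, 7.0)]
bins = bin_observations([ship, satellite], cell_deg=0.5, window_h=24.0)
print(bins)  # {(-101, -128, 0): 2.0, (-101, -128, 1): 7.0}
```

Changing `cell_deg` and `window_h` re-expresses the same observations at coarser or finer scales, which is one simple way to line up fine-scale shipboard sampling with broader regional products.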
From Physical to Social Sciences and the Humanities
A goal of this Special Feature is to show that spatial CI is not only about using hardware and software or enabling the physical sciences but also about distributed knowledge communities that serve the needs of the social sciences and humanities, as well as the many stakeholders and decision makers in citizen groups from differing social, economic, and political backgrounds. Building a CI is as much a social endeavor as a scientific one. In this vein, Sieber et al. (42) report on a spatial CI incorporating the China Biographical Database (the largest in the world), the China Historical Geographical Information System (part of China's original Electronic Cultural Atlas Initiative), and the McGill-Harvard-Yenching Library Ming Qing Women's Writings database. The study focuses generally on CI for humanities data and specifically on a spatial CI that aids research on Chinese women writers, their kinship networks, their publishing venues, and their literary and social communities. The article critically examines, and offers recommendations on, conflicting data arising from differing data models and geographic scales, conflicts that researchers may not necessarily want to eliminate. This case study shows the value of spatial CI in resolving difficulties that arise from spatial, multilingual, biographical, and temporal ambiguities in these databases; again, these solutions would not have been possible without spatial CI.
Buetow (4) notes that, although team or big science will continue to be necessary to achieve research goals, the small independent investigator is still “the engine of innovative research,” and the widespread adoption of CI will allow the two approaches to blend harmoniously. Poore (43) expands on this theme in a final perspectives article on the needs and contributions of individual users within a spatial CI. Poore (43) notes in particular that, as human geographers and other social scientists, as well as geographic information scientists, actively participate in spatial CIs as users, there is a great opportunity to make spatial CI a truly user-centered enterprise. Spatial CI should make room not only for the scientists who will use cybertools to collaborate at a distance but also for the educators who will teach with CIs. The same applies to citizen scientists who will contribute data and insights to CI projects on some of the most important scientific questions of the day, such as global climate change.
Concluding Perspective
Citizen scientists, along with professional scientists, may increasingly participate in the now ubiquitous cloud computing, which uses service-oriented architecture to control the life cycle of virtual machines and data archives for everything from one's personal address book to the largest multidimensional, multidisciplinary scientific modeling systems. Rather than federating autonomous entities (computing centers) into virtual organizations as computational grids do, clouds (e.g., those of Microsoft, Amazon, and Google) focus on delivering infrastructure as a service, software as a service, and so on. Huge commercial investments in clouds make it likely that these systems will dominate large-scale computing hardware and software in the next decade (44, 45). Spatial CI is an important subset of the more general CI, with requirements that are both computationally intensive and interdisciplinary, such as service hosting, virtual computing environments, and virtual datasets. These special requirements are a good match for many common capabilities of clouds, a fit that warrants further fundamental and empirical research.
Indeed, the notion that spatial is special within CI introduces several interesting research challenges for physical and social scientists alike. Many geographic applications are interdisciplinary and involve multiple stakeholders and decision makers who have diverse social, economic, and political backgrounds, making collaboration critical but challenging. For example, how do we effectively and securely share and integrate spatial data, information, and analytical methods to develop and sustain evolving geographic knowledge? How do we facilitate collaborative spatial problem solving and decision making through virtual organizations?
Despite the promise of spatial CI, some may find that the effort of mastering it is not balanced by the apparent benefits, suggesting that the technology could remain the preserve of a highly technical group of experts. What will it take to popularize spatial CI beyond these experts, especially if it is to benefit the social sciences and humanities? Perhaps spatial CI will follow the path of GIS and eventually become as transparent as GIS is becoming in the world of Google Maps and Google Earth. Studies such as those by Yang et al. (41) and Poore (43) seek to distill the principles of spatial CI into simpler concepts whose value is more obvious to a broader range of users. Another approach may be to deal with conceptually and computationally unmanageable problems by dividing them spatially, understanding the resulting pieces, and then stitching the results back together. This divide-and-conquer approach, initially popularized in the computational geometry literature (46), mirrors the way society often solves its spatial problems. In the context of spatial CI, it implies handling spatially heterogeneous data and giving explicit consideration to space when designing parallel and distributed processing, whether within individual high-performance computers or across grids and clouds.
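The divide-and-conquer idea can be made concrete with a minimal sketch, assuming a raster operation with a small neighborhood; here a 3 × 3 focal (moving-window) mean stands in, hypothetically, for any local spatial operation. The grid is cut into tiles, each tile is processed together with a one-cell halo of its neighbors' data, and the tile interiors are stitched back together. Because each tile sees every cell its neighborhood needs, the stitched result matches processing the whole grid at once. The tiles are processed sequentially here, but each call is independent and could in principle be dispatched to a separate processor, node, or cloud instance; the function names and tile size are illustrative assumptions.

```python
Grid = list[list[float]]


def focal_mean(block: Grid) -> Grid:
    """3 x 3 focal mean; cells on the block edge average only the neighbors that exist."""
    rows, cols = len(block), len(block[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total, count = 0.0, 0
            for rr in range(max(r - 1, 0), min(r + 2, rows)):
                for cc in range(max(c - 1, 0), min(c + 2, cols)):
                    total += block[rr][cc]
                    count += 1
            out[r][c] = total / count
    return out


def tiled_focal_mean(grid: Grid, tile: int) -> Grid:
    """Divide: cut the grid into tile x tile pieces, each padded with a
    one-cell halo. Conquer: run focal_mean on each padded piece
    independently. Combine: copy each piece's interior back into place."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            r1, c1 = min(r0 + tile, rows), min(c0 + tile, cols)
            # Halo bounds, clipped where the tile touches the grid edge.
            hr0, hc0 = max(r0 - 1, 0), max(c0 - 1, 0)
            hr1, hc1 = min(r1 + 1, rows), min(c1 + 1, cols)
            padded = [row[hc0:hc1] for row in grid[hr0:hr1]]
            piece = focal_mean(padded)  # independent of every other tile
            for r in range(r0, r1):
                for c in range(c0, c1):
                    out[r][c] = piece[r - hr0][c - hc0]
    return out


grid = [[float(7 * r + c) for c in range(7)] for r in range(5)]
assert tiled_focal_mean(grid, tile=3) == focal_mean(grid)
```

The halo width must match the reach of the operation (one cell for a 3 × 3 window); operations with wider neighborhoods, or global ones such as the Moran's I statistic, need wider halos or an extra combine step, which is where the stitching becomes a research problem in its own right.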
Although this Special Feature provides only a small sample of a much broader scientific and engineering enterprise, we hope that it will help to elucidate some important issues and research questions, thereby accelerating scientific progress in this emerging area. As spatial datasets grow, as spatial analysis and modeling become more complex, and as the need for virtual collaboration in scientific research becomes more compelling, transformative research to establish user-centric, efficient, and extensible spatial CI becomes ever more important and timely. The intellectual merits of spatial CI stem from the complexity of the challenges, the dangers of leaving propagated errors unaddressed, the profound need for solutions that will benefit many fields of societal relevance, the continuing vision of access to a complete Digital Earth, and the next generation of GIS (CyberGIS) with integrated high-performance, distributed, and collaborative capabilities (25). We have sought to make the case that spatial CI leads to scientific discoveries, and we hope that the articles in this Special Feature show that spatial CI has facilitated such advances and made them more replicable, more readily distributed, and better visualized. Only by advocating for spatial CI will we see the emergence of cyber-enabled approaches that make further scientific advances possible. We urge the scientific community to take up this work rather than wait and see.
Acknowledgments
We thank all of the contributors to this special feature for their enthusiasm and skill in authoring these articles. We also thank the many reviewers for their thoughtful insights, which improved the manuscripts. We are grateful to our many colleagues in the Association of American Geographers (AAG) Cyberinfrastructure Specialty Group and the University Consortium for Geographic Information Science (UCGIS) for valuable discussions and inspiration as well as the PNAS editorial board member Susan Hanson. Finally, we thank editors Michael Goodchild and David Stopak for their encouragement, assistance, and helpful reviews. This material is based in part on work supported by National Science Foundation Grant BCS-0846655.
Footnotes
The authors declare no conflict of interest.
References
- 1. Atkins DE, et al. Revolutionizing Science and Engineering Through Cyber-Infrastructure: Report of the National Science Foundation Blue-Ribbon Advisory Panel on Cyberinfrastructure. 2003. National Science Foundation Publication CISE051203 (National Science Foundation, Washington, DC)
- 2. Crawford D, et al. Cyberinfrastructure Vision for 21st Century Discovery. 2007. National Science Foundation Publication NSF0728 (National Science Foundation, Washington, DC)
- 3. Renear AH, Palmer CL. Strategic reading, ontologies, and the future of scientific publishing. Science. 2009;325:828–832. doi: 10.1126/science.1157784.
- 4. Buetow KH. Cyberinfrastructure: Empowering a “third way” in biomedical research. Science. 2005;308:821–824. doi: 10.1126/science.1112120.
- 5. Hey T, Tansley S, Tolle K, editors. The Fourth Paradigm: Data-Intensive Scientific Discovery. Redmond, WA: Microsoft Research; 2009.
- 6. Benioff MR, et al. Computational Science: Ensuring America's Competitiveness, Report to the President by the President's Information Technology Advisory Committee. Arlington, VA: National Coordination Office for Information Technology Research and Development; 2005.
- 7. Wang S, Zhu X-G. Coupling cyberinfrastructure and geographic information systems to empower ecological and environmental research. Bioscience. 2008;58:94–95.
- 8. Hey T, Trefethen AE. Cyberinfrastructure for e-Science. Science. 2005;308:817–821. doi: 10.1126/science.1110410.
- 9. Foster I. Service-oriented science. Science. 2005;308:814–817. doi: 10.1126/science.1110411.
- 10. Bohannon J. Distributed computing. Grassroots supercomputing. Science. 2005;308:810–813. doi: 10.1126/science.308.5723.810.
- 11. Clery D, Voss D. All for one and one for all. Science. 2005;308:809.
- 12. Lazowska ED, Patterson DA. An endless frontier postponed. Science. 2005;308:757. doi: 10.1126/science.1113963.
- 13. Craglia M, et al. Next-Generation Digital Earth: A position paper from the Vespucci Initiative for the Advancement of Geographic Information Science. Int J Spatial Data Infrastructures Res. 2008;3:146–167.
- 14. Blais JAR, Esche H. Geomatics and the new cyberinfrastructure. Geomatica. 2008;62:11–22.
- 15. Wang S, Liu Y. TeraGrid GIScience gateway: Bridging cyberinfrastructure and GIScience. Int J Geogr Inf Sci. 2009;23:631–656.
- 16. Zhang T, Tsou M-H. Developing a grid-enabled spatial Web portal for Internet GIServices and geospatial cyberinfrastructure. Int J Geogr Inf Sci. 2009;23:605–630.
- 17. Maguire DJ, Goodchild MF, Rhind DW, editors. Geographical Information Systems: Principles and Applications. New York: Wiley; 1991.
- 18. Gewin V. Mapping opportunities. Nature. 2004;427:376–377. doi: 10.1038/nj6972-376a.
- 19. Taaffe EJ. The spatial view in context. Ann Assoc Am Geogr. 1974;64:1–16.
- 20. Miller HJ. Tobler's First Law and spatial analysis. Ann Assoc Am Geogr. 2004;94:284–289.
- 21. Onsrud H, editor. Research and Theory in Advanced Spatial Data Infrastructure Concepts. Redlands, CA: ESRI Press; 2007.
- 22. Egenhofer M. Toward the geospatial semantic web. In: Makki Y, Pissinou N, editors. Advances in Geographic Information Systems International Symposium. McLean, VA: Association for Computing Machinery; 2002. pp. 1–4.
- 23. Anselin L, Florax R, Rey S, editors. Advances in Spatial Econometrics: Methodology, Tools and Applications. Berlin: Springer; 2004.
- 24. Wang S, Armstrong M. A theoretical approach to the use of cyberinfrastructure in geographical analysis. Int J Geogr Inf Sci. 2009;23:169–193.
- 25. Wang S. A cyberGIS framework for the synthesis of cyberinfrastructure, GIS, and spatial analysis. Ann Assoc Am Geogr. 2010;100:535–557.
- 26. Penninga F, Van Oosterom PJM. A simplicial complex-based DBMS approach to 3D topographic data modelling. Int J Geogr Inf Sci. 2008;22:751–779.
- 27. Baker KS, Chandler CL. Enabling long-term oceanographic research: Changing data practices, information management strategies and informatics. Deep-Sea Res II. 2008;55(18–19):2132–2142.
- 28. Yang C, Raskin R, Goodchild M, Gahegan M. Geospatial cyberinfrastructure: Past, present and future. Comput Environ Urban Syst. 2010;34:264–277.
- 29. Gore A. The Digital Earth: Understanding our planet in the 21st Century. Photogramm Eng Remote Sensing. 1999;65:528.
- 30. Anselin L. What Is Special About Spatial Data? Alternative Perspectives on Spatial Data Analysis. 1989. Technical Report 89-4 (National Center for Geographic Information and Analysis, Santa Barbara, CA)
- 31. Anselin L. Local indicators of spatial association-LISA. Geogr Anal. 1995;27:93–115.
- 32. Turton I, Openshaw S. High-performance computing and geography: Developments, issues, and case studies. Environ Plan A. 1998;30:1839–1856.
- 33. Krzanowski R, Raper J, editors. Spatial Evolutionary Modelling. Oxford: Oxford University Press; 2001.
- 34. Batty M. Cities and Complexity: Understanding Cities Through Cellular Automata, Agent-Based Models, and Fractals. Cambridge, MA: MIT Press; 2005.
- 35. Jankowski P, Nyerges T. GIS-supported collaborative decision making: Results of an experiment. Ann Assoc Am Geogr. 2001;91:48–70.
- 36. Goodchild MF, Janelle DG, editors. Spatially Integrated Social Science. Oxford: Oxford University Press; 2004.
- 37. Miller HJ, Wentz EA. Representation and spatial analysis in geographic information systems. Ann Assoc Am Geogr. 2003;93:574–594.
- 38. Wright DJ. Spatial data infrastructures for coastal environments. In: Yang X, editor. Remote Sensing and Geospatial Technologies for Coastal Ecosystem Assessment and Management. Lecture Notes in Geoinformation and Cartography. Berlin: Springer; 2009. pp. 91–112.
- 39. Tesfatsion L. Agent-based computational economics: Growing economies from the bottom up. Artif Life. 2002;8:55–82. doi: 10.1162/106454602753694765.
- 40. Helly JJ, Kaufman RS, Vernet M, Stephenson GR. Spatial characterization of the meltwater field from icebergs in the Weddell Sea. Proc Natl Acad Sci USA. 2011;108:5492–5497. doi: 10.1073/pnas.0909306108.
- 41. Yang C, Wu H, Huang Q, Li Z, Li J. Using spatial principles to optimize distributed computing for enabling the physical science discoveries. Proc Natl Acad Sci USA. 2011;108:5498–5503. doi: 10.1073/pnas.0909315108.
- 42. Sieber RE, Wellen CC, Jin Y. Spatial cyberinfrastructures, ontologies, and the humanities. Proc Natl Acad Sci USA. 2011;108:5504–5509. doi: 10.1073/pnas.0911052108.
- 43. Poore BS. Users as essential contributors to spatial cyberinfrastructures. Proc Natl Acad Sci USA. 2011;108:5510–5515. doi: 10.1073/pnas.0907677108.
- 44. Fox A. Computer science. Cloud computing—what's in it for me as a scientist? Science. 2011;331:406–407. doi: 10.1126/science.1198981.
- 45. Gao X, Ma Y, Pierce M, Lowe M, Fox G. Building a distributed block storage system for cloud infrastructure. In: Fox G, Zhao G, Qui J, Hughes A, editors. Proceedings of the 2nd IEEE International Conference on Cloud Computing Technology and Science (CloudCom) Indianapolis, IN: Institute of Electrical and Electronics Engineers; 2010. pp. 1–9.
- 46. Preparata FP, Shamos MI. Computational Geometry: An Introduction. Berlin: Springer; 1990.