Table 2.
A taxonomy of skills for data-intensive research.
| Data management and processing | Software skills for science | Analysis | Visualization | Communication for collaboration and results dissemination |
|---|---|---|---|---|
| Fundamentals of data management | Software development practices and engineering mindset | Basic statistical inference | Visual literacy and graphical principles | Reproducible open science |
| Modeling structure and organization of data | Version control | Exploratory analysis | Visualization services and libraries | Collaboration workflows for groups |
| Database management systems and queries (e.g., SQL) | Software testing for reliability | Geospatial information handling | Visualization tools | Collaborative online tools |
| Metadata concepts, standards, and authoring | Software workflows | Spatial analysis | Interactive visualizations | Conflict resolution |
| Data versioning, identification, and citation | Scripted programming (e.g., R and Python) | Time-series analysis | 2D and 3D visualization | Establishing collaboration policies |
| Archiving data in community repositories | Command-line programming | Advanced linear modeling | Web visualization tools and techniques | Composition of collaborative teams |
| Moving large data | Software design for reusability | Nonlinear modeling | | Interdisciplinary thinking |
| Data-preservation best practices | Algorithm design and development | Bayesian techniques | | Discussion facilitation |
| Units and dimensional analysis | Data structures and algorithms | Uncertainty propagation | | Documentation |
| Data transformation | Concepts of cloud and high-performance computing | Meta-analysis and systematic reviews | | Website development |
| Integrating heterogeneous, messy data | Practical cloud computing | Scientific workflows | | Licensing |
| Quality assessment | Code parallelization | Scientific algorithms | | Message development for diverse audiences |
| Quantifying data uncertainty | Numerical stability | Simulation modeling | | Social media |
| Data provenance and reproducibility | Algorithms for handling large data | Analytical modeling | | |
| Data semantics and ontologies | | Machine learning | | |
Note: Many if not most of these elements apply across multiple categories. This taxonomy was initially created in a workshop involving natural and physical scientists, information scientists, and computer scientists (isees.nceas.ucsb.edu), with modest refinements by the authors.