What is cyberinfrastructure?
Cyberinfrastructure (CI), eScience and eInfrastructure are the current terms of art for the networked information technologies supporting scientific research activities, such as collaboration, data sharing and dissemination of findings. These are the computational infrastructures that, for instance, enable global climate modelers to compile heterogeneous information sources in order to understand environmental change, or that turn the massive quantitative data emerging from the Large Hadron Collider into tractable scientific visualizations.
Here are three features that characterize cyberinfrastructure’s promised transformations in the sciences, drawn from an article I coauthored with Charlotte Lee:
- Community-wide and cross-disciplinary collaboration: This refers to the scope of CI projects. Rather than supporting teams or groups, CI practitioners speak in terms of communities, disciplines and domains. It is common to hear sweeping umbrella terms that redefine traditional disciplinary boundaries into vast domains of investigation, e.g., infrastructure for the earth sciences, brain sciences, ocean sciences or environmental engineering. It is in this sense that CI is often attached to phrases such as ‘the new big sciences’. While there are small-scale projects that are characterized as CI, these are often wrapped up into larger assemblies of data standards, common services and shared computational infrastructures. CI seeks to support research in general by identifying vast interdisciplinary swaths that could benefit from data and resource sharing, knowledge transfers, and support for collaboration across geographical as well as institutional and organizational divides.
- Computationally driven data collection, representation and analysis: Common to all CI ventures is the leading role of computational, engineering and informational actors and the promised new technologies that will facilitate working with data. CI projects take into account the internet and the exponentially growing availability of computational resources, but they propose to take scientific infrastructure one step further through deployment of standardized sensor networks and ambitious sets of commonly available tools for storing, manipulating, sharing and analyzing the oncoming ‘data deluge’. This can include efforts to standardize data (often through metadata or ontology building efforts) and to provide tools to represent data (such as visualization suites) and to share it.
- End-to-end integration: All these new features will be tied together in systems that are integrated ‘end-to-end’, arguably the most ambitious of the promises of CI. On one ‘end’ of the system are the data collection efforts and automated sensors. These are the raw materials of science. At the other ‘end’ are the heterogeneous scientific users, working on richer and more diverse sources of data than ever before, using tools that seamlessly facilitate analysis, collaboration and dissemination of findings. Between data collection technologies and users lie the networks, computers, storage and a plethora of more subtle integration technologies that promise to facilitate communication and interoperation across all the boundaries that plague interdisciplinary collaboration: hard technologies such as fiber-optic cables and grid computing, soft technologies such as metadata standards and ontologies, and even softer on-paper agreements between institutions and agencies of science to facilitate the movement of ‘siloed’ data and findings.
It is this massive assembly of computational tools, resources and collaborative support that constitutes cyberinfrastructure and carries its promise to revolutionize the sciences.