iRODS Technology for Research Data Storage
Dejan Vitlacil, PDC
In collaboration with other partners from the Swedish National Infrastructure for Computing (SNIC), PDC is developing a new service for research data storage. The new service will make it easier to manage research data and will be a significant step towards providing open access data. Researchers who use this storage service for live project data will be able to start generating metadata from the very beginning of their project. This will make it much easier to package and archive the data when their projects come to an end. Archiving services are also expected to benefit from these efforts immediately, as metadata is an important part of making data searchable and is also useful when publishing data.
The Integrated Rule-Oriented Data System (iRODS) is open-source software that provides a comprehensive set of tools to support data management tasks from the initial collection of data through to archiving and reusing the data. This is particularly important given the implications of the worldwide movement towards Open Science and Open Data Access. iRODS is supported and maintained by the iRODS Consortium, which is based at the Renaissance Computing Institute (RENCI), a research institute of the University of North Carolina at Chapel Hill (UNC). The membership-based consortium receives funding from UNC and the other twelve consortium members, which include hardware vendors such as DDN, HGST and IBM, as well as universities such as Utrecht University and University College London. The consortium reports that iRODS is used by research organizations and government agencies worldwide.
Over the last few years the iRODS software has undergone substantial refactoring and reorganization. As a result, iRODS is now released as a production-level software distribution with commercial support and a strong user community.
Our goal at PDC this year is to implement both a new iRODS-based storage service for SNIC and a separate service that will be available at PDC for Swedish researchers whose research data storage requirements are not addressed by the available SNIC services.
The new iRODS-based SNIC service is being developed to expand the service portfolio of Swestore, the Swedish National Research Data Storage Infrastructure operated by SNIC. The new service has the advantage that data will be stored in a way that is interoperable with the services for European research data provided through the EUDAT Common Data Infrastructure (CDI). This will make it easier for Swedish researchers to collaborate and share data with other European researchers, as well as to have transparent access to European e-Infrastructures (such as PRACE, EUDAT and EGI), which is in line with the aims of the European Open Science Cloud initiative. While the concrete terms of usage for this service are still being worked out, we invite interested potential pilot users to contact us at support@pdc.kth.se.