Cross-Infrastructure Services: Towards Seamless Access
Michaela Barth, PDC
EUDAT is dead, long live EUDAT!
From 2011 until early 2018, the work of EUDAT was supported by two European Commission (EC) grants: the first under the Seventh Framework Programme (FP7), where the project was known as EUDAT, and the second under the Horizon 2020 Programme (H2020), where the project was referred to as EUDAT2020. Both projects worked on developing a pan-European Collaborative Data Infrastructure (CDI) for handling European research data.
This spring the EUDAT2020 project ended, but the EUDAT Collaborative Data Infrastructure and the consolidation of its services will continue. The EUDAT CDI is led by a council composed of service provider representatives and is formally represented by EUDAT Ltd, a limited liability company established and registered under Finnish law at the end of February 2018. EUDAT Ltd hosts the administrative functions of the CDI agreement (management, operations and technical coordination, plus user engagement and outreach) and will represent the CDI network at the European level.
This article describes PDC’s main contributions to EUDAT2020 and discusses where and how the outcomes of that work will be used in the future.
Within EUDAT2020, the Swedish National Infrastructure for Computing (SNIC) was represented by PDC. As mentioned in earlier newsletters (2014 #2 & 2015 #1), PDC made a major contribution to the development of the B2SHARE service, which enables researchers to reliably store and share small-scale research data. PDC was also responsible for coordinating the operations of B2HANDLE, the service that provides unique persistent identifiers (often referred to as PIDs or handles) for data items stored via EUDAT, as well as several major upgrades and configuration changes to that service. These modifications were made to make the service more secure, robust and scalable. Both services were, of course, also in operation and available to numerous national and international research groups. In parallel with this work, the Swedish National Storage Infrastructure has been federated and now uses the EUDAT B2SAFE service, which provides a bridge between the Swedish national and European e-infrastructures.
EUDAT also aimed to harmonize access to both data and compute services. Joint pilot projects combining HPC and data services were conducted with PRACE, and PDC supported some of those projects, such as the CHARTERED data pilot in materials science and its follow-up CHARTERED2, which modelled CHARge TransfER dynamics by applying time-dependEnt Density functional theory. Similarly, EUDAT collaborated with EGI to enable joint access to data, high-throughput computing (HTC) and cloud computing resources.
For the EUDAT-EGI collaboration, candidate research communities were identified first. Typically these were relevant European research infrastructures that were already collaborating with one, or preferably both, of the EUDAT and EGI infrastructures, and that came from the fields of earth sciences (such as EPOS and ICOS), bioinformatics (for example, BBMRI and ELIXIR) and space physics (such as EISCAT-3D). The primary requirements of those candidate research infrastructures for connecting EGI and EUDAT services into a joint cross-infrastructure offering (with perceived seamless access) were then collected.
This information was used to define a generic use case, which was a major milestone for EGI and EUDAT in terms of understanding each other's interfaces, as well as the desired and necessary base functionality. Shortly afterwards, the first working demonstration was presented: using a single user identity, it accessed both infrastructures to stage data between EGI Federated Cloud services and EUDAT services, successfully demonstrating the principal concepts of the intended interoperability. In the second year of the EUDAT-EGI pilot activity, three early adopter use cases (from ICOS, EPOS and ENES) were selected from the most mature candidates, and they were supported in developing their workflows and the data streams within those workflows. At the same time, these early adopters gave EGI and EUDAT very useful feedback on building future infrastructure services. All parties considered this activity very worthwhile, so it was extended for a third year.
In addition, PDC contributed to the work on harmonizing access policies, which again provided the basis for design choices, this time within the AARC blueprint architecture and in defining the pilot studies of the AARC-2 project that deal with EGI-EUDAT AAI (authentication and authorisation infrastructure) interoperability. These pilots will continue to focus on aspects of AAI interoperability.
This work also continues logically in the Competence Centres within the EOSC-hub project, which will also ensure future interoperability between EUDAT’s B2ACCESS and EGI’s Check-in (a proxy service that enables access to EGI services and resources via federated authentication). ICOS, one of the research communities that PDC worked very closely with, has set up an ICOS Competence Centre under the EOSC-hub project.
The EOSC-hub project started in January 2018 (with an initial duration of three years) as one of the first steps towards realising the European Commission’s vision of a European Open Science Cloud (EOSC). EOSC-hub will provide a horizontal, pan-European, consolidated technical infrastructure ecosystem for the EOSC, based on the building blocks provided by EGI, EUDAT, PRACE and INDIGO-DataCloud. Here a horizontal ecosystem means one that users perceive as a single infrastructure spanning all the available resources: with just one login, each user will be able to access all the different infrastructure services they need for their research. The goal is to produce a federated integration and management system for the future EOSC, with some centralized core services (such as the joint catalogue of resources and services, which is available as a beta version), and to emphasise the federated operation of services under an open, community-led framework.
In parallel, the EOSCpilot project, which started in January 2017 and is scheduled to run for two years, supports the development of the first phase of the EOSC through a growing number of science demonstrators (currently 15). The whole process of defining the exact EOSC governance framework and the EOSC rules of engagement is intended to be stakeholder-driven, evidence-based and built on community consent.
Many people are not yet entirely clear about what the EOSC is really about: is it just a new funding model, or is it an easy access channel to a catalogue of services? Even the name might be confusing to some, since cloud computing per se is not a central aspect of the EOSC.
While cloud computing may be one of the spectrum of services provided by the EOSC, the word “cloud” should be interpreted in a broader sense. The emphasis is really on Open Science and Open Data, where science and research are put in the driving seat of e-infrastructure provisioning for advanced data-driven research, and where researchers are given the right means to handle research data in a way that supports the FAIR (findable, accessible, interoperable and reusable) principles.
The EOSC vision is best explained from the perspective of the modern researcher. Researchers want data handling services that are 100% reliable and that meet their needs without them even having to think about the complexities of the underlying system, while still offering clearly defined APIs so that researchers who want to automate processes to suit their needs retain complete control. The services should be agile, reacting quickly to requests, and (like the research itself) they should be continuously adapting. It should also be very easy for people to start using the EOSC services for their research. However, researchers will need to do more than just play the role of EOSC users; since many research infrastructures already provide specialised data handling services themselves, at least some research communities will have to take on part of the responsibility for providing and co-developing suitable research e-infrastructure services.
Most researchers probably do not see themselves as data-producing factories, yet they really are the driving force in the data-driven research process, and hence they will also have to play a role in the quality assurance of data handling services, for example, by reviewing services, participating in their co-development and channelling some of their research funds into developing and supporting such services. Yes, money needs to be spent not just on doing research but also on services for handling the resulting research data! The EOSC will not only be a trusted virtual environment with free, open and seamless services for the storage, management, analysis, sharing and re-use of data across research disciplines, with well-established and defined access channels; it will also be a new type of funding model. All the power will be in the hands of the researchers: they will be the ones regulating the funding stream from the European Commission to the service providers.
The EOSC will also include the concept of virtual research environments, to help research communities find solutions themselves and to foster interaction between thematic facilities such as data infrastructures and scientific clouds. This will facilitate interdisciplinary research and help bridge today's fragmentation of research into separate disciplines, making better use of research investments. The EU Competitiveness Council has just endorsed an implementation roadmap for the EOSC: www.eosc-hub.eu/news/eu-competitiveness-council-endorsed-implementation-roadmap-european-open-science-cloud.