Mass data storage
Many research projects and organisations have large volumes of research data that they need to store in the long term. PDC provides several types of long-term storage for research data.
Types of mass data storage available through PDC
At present, PDC primarily offers long-term storage for research data via our Mass Storage System (MSS), which is managed by IBM Spectrum Protect software.
If you are involved in a research project that needs to store data long-term, you are welcome to contact PDC Support to discuss purchasing storage from PDC. The PDC mass storage can be extended fairly easily and cheaply (by buying more tapes for the MSS and extra licenses for the software), so this can be more economical than alternatives such as setting up and maintaining a tape storage system dedicated to a single project, or buying storage from commercial companies.
Swestore storage for active data
The National Academic Infrastructure for Supercomputing in Sweden (NAISS) provides Swedish researchers with storage for active research data (that is, data that is being collected and analysed). However, once the data becomes static (for example, when the results of the research have been published), the university that employs the researchers becomes responsible for storing the data. Storage for active data is currently provided by NAISS through the Swestore project.
PDC Mass Storage System (MSS) for archiving data
When data ceases to be actively used for research, it can be stored in PDC’s Mass Storage System or MSS. The MSS is essentially a large library of magnetic tape cartridges that are accessed using a tape robot. This is a very efficient way to archive data that is not accessed frequently but that needs to be stored for a long time.
PDC’s IBM TS4500 tape library currently has
- 14 TS1150 tape drives
- 850 IBM 3592 JD tape cartridges
- ~3500 available tape slots
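With 10 TB of native capacity per cartridge, the ~3500 available slots give a total uncompressed data capacity of ~35 PB when fully populated. The library is extendable to 17,550 slots, or ~175 PB.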
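As a rough check on these figures, the arithmetic can be sketched in Python. The 10 TB native capacity per IBM 3592 JD cartridge is the value stated on this page; the use of decimal units (1 PB = 1000 TB) and the function name are assumptions made for illustration only.

```python
# Figures taken from this page.
TB_PER_CARTRIDGE = 10      # native capacity of one IBM 3592 JD cartridge
INSTALLED_CARTRIDGES = 850
CURRENT_SLOTS = 3500       # approximate number of available slots
MAX_SLOTS = 17_550         # slots after full extension


def library_capacity_pb(cartridges, tb_per_cartridge=TB_PER_CARTRIDGE):
    """Uncompressed capacity in PB, assuming 1 PB = 1000 TB (decimal units)."""
    return cartridges * tb_per_cartridge / 1000


if __name__ == "__main__":
    print("Installed cartridges:", library_capacity_pb(INSTALLED_CARTRIDGES), "PB")
    print("All current slots:   ", library_capacity_pb(CURRENT_SLOTS), "PB")
    print("Fully extended:      ", library_capacity_pb(MAX_SLOTS), "PB")
```

Under these assumptions, the quoted totals correspond to every slot holding a cartridge; the cartridges currently installed account for a smaller share of that capacity.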
Each IBM TS1150 tape drive has
- built-in compression, up to 3:1
- dual 8 Gbit/s fibre channel interface
- 360 MB/s native speed
- up to 700 MB/s with compression
Each IBM 3592 JD cartridge has
- 10 TB native data capacity
- up to 30 TB compressed data capacity
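To give a feel for what the drive and cartridge figures imply, the following Python sketch estimates how long one drive would take to fill a cartridge at native speed, and the combined native throughput of all 14 drives. This is an idealised estimate: it assumes decimal units (1 TB = 1,000,000 MB), sustained streaming at the quoted rate, and no time spent on mounting, seeking, or robot movement. The function names are hypothetical.

```python
# Figures taken from this page.
DRIVES = 14
NATIVE_MB_PER_S = 360   # native speed of one TS1150 drive
CARTRIDGE_TB = 10       # native capacity of one 3592 JD cartridge


def hours_to_fill_cartridge(cartridge_tb=CARTRIDGE_TB, mb_per_s=NATIVE_MB_PER_S):
    """Idealised time for one drive to write a full cartridge, in hours."""
    seconds = cartridge_tb * 1_000_000 / mb_per_s  # assumes 1 TB = 1,000,000 MB
    return seconds / 3600


def aggregate_gb_per_s(drives=DRIVES, mb_per_s=NATIVE_MB_PER_S):
    """Combined native throughput if all drives stream at once, in GB/s."""
    return drives * mb_per_s / 1000


if __name__ == "__main__":
    print(f"Fill one cartridge: {hours_to_fill_cartridge():.1f} h")
    print(f"All drives together: {aggregate_gb_per_s():.2f} GB/s")
```

Real-world rates depend on the data, the workload, and how many drives are free at the time, so these numbers are best read as upper bounds.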
IBM Spectrum Protect
PDC’s IBM Spectrum Protect is the software used to manage the data archived in PDC’s mass storage system (MSS). This includes storing the data, backing it up, and recovering damaged data. The system is sometimes referred to as PDC’s Backup and Archiving Solution.
Various projects and organisations use PDC’s Backup and Archiving Solution, which is connected to the MSS. Data first lands in the disk storage pools and, after a certain time, is migrated to PDC’s MSS. Off-site backup of the system is performed twice a day to the National Supercomputer Centre (NSC) in Linköping. For example, the Swedish Human Protein Atlas (HPA) program uses archiving at PDC and has stored about 370 TB of data, which is mirrored off-site, in this particular case to the High Performance Computing Center North (HPC2N) in Umeå.
Some projects storing data at PDC
The following projects have their primary data archives at PDC.
CENTER-TBI
CENTER-TBI, or “Collaborative European NeuroTrauma Effectiveness Research in Traumatic Brain Injuries”, is a large European project that aims to improve the care for patients with traumatic brain injuries and identify the most effective clinical interventions for managing such injuries.
Odin satellite project
The Odin satellite combines two scientific disciplines on a single spacecraft in studies of star formation and the early solar system (astronomy) and the mechanisms behind the depletion of the ozone layer in the Earth’s atmosphere and the effects of global warming (aeronomy). The Swedish Space Corporation, on behalf of the Swedish National Space Board and the space agencies of Canada (CSA), Finland (TEKES) and France (CNES), has developed the satellite for astronomers and atmospheric researchers in the participating countries.
Prisma
Prisma is a Swedish-led satellite project that aims to develop and qualify new technology necessary for future science missions in space. Many future projects involve formation flying and rendezvous, so several spacecraft need to communicate and interact with each other with high precision. That requires exceptional accuracy in measuring and controlling the inter-satellite orientation.
SNIC-SENS
SNIC-SENS is a Swedish project that uses high-performance computing resources for analyzing sensitive data. PDC is a partner in this project and provides a backup resource for the National Genomics Infrastructure (NGI), which includes backup of sensitive personal data. The system is based on the IBM Spectrum Protect software and provides backup for the NGI facilities at the KTH Royal Institute of Technology and Uppsala University, and also acts as the backup of the NGI production systems, which are operated by the Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) at Uppsala University.
Human Protein Atlas
The Human Protein Atlas (HPA) is a large Swedish program that aims to map all the human proteins in cells, tissues and organs. The Human Protein Atlas consists of three separate parts, each focusing on a particular aspect of the genome-wide analysis of the human proteins: the Tissue Atlas showing the distribution of the proteins across all major tissues and organs in the human body, the Cell Atlas showing the subcellular localization of proteins in single cells, and the Pathology Atlas showing the impact of protein levels on the survival of patients with cancer. All the data in the atlases is open access, so researchers in both academia and industry can freely explore the human proteome. The data in the atlases is archived and backed up at PDC.