
Researchers from the Leibniz Supercomputing Centre, in collaboration with IT4Innovations National Supercomputing Center and Ludwig Maximilian University of Munich, have published a new scientific article presenting the results of their long-term work on designing systems for distributed data management. Such systems interconnect geographically distant computing or data infrastructures, ease data transfer from research institutions to computing infrastructures, and simplify the orchestration of complex computational workflows. They also allow policies to be defined for creating redundant copies of data across locations, thereby protecting the data against loss.
The requirements for such a system include unified access to data, covering data transfer, metadata search, and enforcement of access rights. The system should be decentralised, easily extendable with new resource providers that handle various types of storage, and able to use the available network capacity between locations efficiently.
The publication also includes a study of existing systems and solutions for distributed data management that meet these requirements. The authors then propose a solution that uses the iRODS system and the B2SAFE module provided by the European infrastructure EUDAT. The work verifies the aforementioned assumptions, both by checking the required functionalities and through practical validation between two selected supercomputing centres.
The authors also describe how such a solution can serve as a data backend within the LEXIS Platform for easy access to computing infrastructures. The researchers conducted performance tests focused on optimising technical parameters such as transfer parallelisation and buffer size. They concluded that the iRODS system can be used to build a distributed data management system, which can in turn be used within the LEXIS Platform for data handling in complex computational workflows. The performance tests demonstrated the potential of the iRODS system to fully utilise the transmission capacity between centres.
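To give a feel for what tuning "transfer parallelisation" and "buffer size" means in practice, here is a minimal sketch in Python. It is not the authors' benchmark and does not use iRODS: it simply copies a local file in several concurrent byte ranges and reports throughput for a few settings. The function names (`copy_range`, `parallel_copy`, `sweep`) and the parameter values are hypothetical, chosen only for illustration; a real test between supercomputing centres would be dominated by network behaviour rather than local disk.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def copy_range(src, dst, offset, length, buffer_size):
    """Copy `length` bytes starting at `offset`, reading `buffer_size` bytes at a time."""
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        remaining = length
        while remaining > 0:
            chunk = fin.read(min(buffer_size, remaining))
            if not chunk:
                break  # unexpected end of file
            fout.write(chunk)
            remaining -= len(chunk)


def parallel_copy(src, dst, streams, buffer_size):
    """Copy src to dst using up to `streams` concurrent byte ranges.

    Returns the elapsed time in seconds.
    """
    size = os.path.getsize(src)
    # Pre-allocate the destination so each worker can write at its own offset.
    with open(dst, "wb") as fout:
        fout.truncate(size)
    part = max(1, -(-size // streams))  # ceiling division, never zero
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=streams) as pool:
        futures = [
            pool.submit(copy_range, src, dst, off, min(part, size - off), buffer_size)
            for off in range(0, size, part)
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return time.perf_counter() - start


def sweep(size_bytes=8 * 1024 * 1024):
    """Try a few (streams, buffer size) combinations on a temporary file."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "source.bin")
        dst = os.path.join(tmp, "copy.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(size_bytes))
        for streams in (1, 4):
            for buffer_size in (64 * 1024, 1024 * 1024):
                elapsed = max(parallel_copy(src, dst, streams, buffer_size), 1e-9)
                rate = size_bytes / elapsed / 1e6
                print(f"streams={streams} buffer={buffer_size} B: {rate:.1f} MB/s")


if __name__ == "__main__":
    sweep()
```

The measured rates depend entirely on the machine and file system, so the sketch prints them rather than assuming any particular result. The point is only that both parameters are independent knobs that can be swept systematically.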
The findings from this study are currently being applied in the EXA4MIND project, which explores the use of advanced systems for storing structured and unstructured data in complex data analyses on supercomputers.
Article: Data management for distributed computational workflows: an iRODS-based setup and its performance
Link: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0340757