EXA4MIND relies on a co-design approach, where technology partners from computing centres and universities and application partners from industry, academia and SMEs design an Extreme Data infrastructure in close collaboration.
Authors: A. Vobecky, D. Hurych, O. Siméoni, S. Gidaris, A. Bursuc, P. Pérez and J. Sivic.
Publication date: 2025
Semantic image segmentation models typically require extensive pixel-wise annotations, which are costly to obtain and prone to biases. This work investigates learning semantic segmentation in urban scenes without any manual annotation. The researchers propose a novel method for learning pixel-wise semantic segmentation from raw, uncurated data captured by vehicle-mounted cameras and LiDAR sensors, thus eliminating the need for manual labeling. They demonstrate the generalization capabilities of their method by evaluating it on four different test datasets (Cityscapes, Dark Zurich, Nighttime Driving, and ACDC) without any fine-tuning. They also present an in-depth experimental analysis of the proposed model, including results obtained with another pre-training dataset, per-class and pixel accuracy results, confusion matrices, PCA visualization, k-NN evaluation, ablations on the number of clusters and on LiDAR density, supervised fine-tuning, and additional qualitative results with their analysis.
Authors: Vojtěch Mlýnský, Petra Kührová, Martin Pykal, Miroslav Krepl, Petr Stadlbauer, Michal Otyepka, Pavel Banáš and Jiří Šponer.
Publication date: 2025
In this work, the researchers present a comprehensive evaluation of widely used pair-additive and polarizable RNA force fields (ffs) using the challenging UUCG tetraloop (TL) benchmark system. Extensive standard MD simulations, initiated from the NMR structure of the 14-mer UUCG TL, revealed that most ffs did not maintain the native state, instead favoring alternative loop conformations. Notably, three very recent variants of pair-additive ffs, OL3CP–gHBfix21, DES-Amber, and OL3R2.7, successfully preserved the native structure over a 10 × 20 μs time scale. To further assess these ffs, the researchers performed enhanced sampling folding simulations of the shorter 8-mer UUCG TL, starting from the single-stranded conformation. The estimated folding free energies (ΔG°fold) varied significantly among these three ffs, with values of 0.0 ± 0.6, 2.4 ± 0.8, and 7.4 ± 0.2 kcal/mol for OL3CP–gHBfix21, DES-Amber, and OL3R2.7, respectively. The ΔG°fold value predicted by the OL3CP–gHBfix21 ff was closest to experimental estimates, which range from −1.6 to −0.7 kcal/mol. In contrast, the higher ΔG°fold values obtained with DES-Amber and OL3R2.7 were unexpected, suggesting that key interactions are inaccurately described in the folded, unfolded, or misfolded ensembles. These discrepancies led the researchers to further test the DES-Amber and OL3R2.7 ffs on additional RNA and DNA systems, where further performance issues were observed. The results emphasize the complexity of accurately modeling RNA dynamics and suggest that creating an RNA ff capable of performing reliably across a wide range of RNA systems remains extremely challenging. In conclusion, the study provides valuable insights into the capabilities of current RNA ffs and highlights key areas for future ff development.
Authors: Viktoria Pauwa, David Číž, Vojtěch Mlýnský, Pavel Banáš, Michal Otyepka, Stephan Hachinger and Jan Martinovič.
Publication date: 2024
The ongoing work presented in this article explores different technical approaches and systems for managing and analysing data obtained from large physics simulations, optimising the respective data-driven workflows across Cloud Computing (IaaS) and HPC systems. The work is carried out in the context of the EXA4MIND Horizon Europe project, which produces an Extreme Data processing platform that brings together specialised data management systems and powerful computing infrastructures. We evaluate two typical use cases involving physics simulations carried out on supercomputing systems at LRZ and IT4Innovations. These use cases come from different areas of physics: they focus on low-energy many-body systems of molecules and on high-energy (relativistic) elementary particles, respectively.
Authors: R. F. Cekinel, Ç. Çöltekin, P. Karagoz.
Publication date: 2024
The rapid spread of misinformation through social media platforms has raised concerns regarding its impact on public opinion. Although misinformation is prevalent in other languages as well, most research in this field has concentrated on English. Hence, datasets for other languages, including Turkish, are scarce. To address this gap, we introduce the FCTR dataset, consisting of 3238 real-world claims. This dataset spans multiple domains and incorporates evidence collected from three Turkish fact-checking organizations. Additionally, we aim to assess the effectiveness of cross-lingual transfer learning for low-resource languages, with a particular focus on Turkish.
Authors: G. Puy, S. Gidaris, A. Boulch, O. Siméoni, C. Sautier, P. Pérez, A. Bursuc, R. Marlet
Publication date: 2024
Self-supervised image backbones can be used to address complex 2D tasks (e.g., semantic segmentation, object discovery) very efficiently and with little or no downstream supervision. Ideally, 3D backbones for lidar should be able to inherit these properties after distillation of these powerful 2D features. The most recent methods for image-to-lidar distillation on autonomous driving data show promising results, obtained thanks to distillation methods that keep improving. Yet we still notice a large performance gap when measuring the quality of distilled and fully supervised features by linear probing. In this work, instead of focusing only on the distillation method, we study the effect of three pillars of distillation: the 3D backbone, the pretrained 2D backbones, and the pretraining dataset. In particular, thanks to our scalable distillation method named ScaLR, we show that scaling the 2D and 3D backbones and pretraining on diverse datasets leads to a substantial improvement in feature quality. This allows us to significantly reduce the gap between the quality of distilled and fully supervised 3D features, and to improve the robustness of the pretrained backbones to domain gaps and perturbations.
Authors: M. M. Khosravi, P. Karagoz, I. H. Toroslu.
Publication date: 2024
In this work, we consider automated index selection for NoSQL databases and investigate the feasibility of supervised learning and reinforcement learning based solutions. Experiments conducted on the YCSB dataset show that reinforcement learning improves index selection performance, as it does in relational databases, and that supervised learning gives promising results and can be considered applicable given a sufficient amount of training data.
Authors: A. Vobecky, O. Siméoni, D. Hurych, S. Gidaris, A. Bursuc, P. Pérez, J. Sivic.
Publication date: 2023
This research describes an approach to predicting an open-vocabulary 3D semantic voxel occupancy map from input 2D images, with the objective of enabling 3D grounding, segmentation and retrieval of free-form language queries. This is a challenging problem because of the 2D-3D ambiguity and the open-vocabulary nature of the target tasks, for which annotated 3D training data are difficult to obtain. The contributions of this work are threefold: a new model architecture for open-vocabulary 3D semantic occupancy prediction; a tri-modal self-supervised learning algorithm that leverages (i) images, (ii) language and (iii) LiDAR point clouds, and enables training the proposed architecture with a strong pre-trained vision-language model without any manual 3D language annotations; and a quantitative demonstration of the strengths of the proposed model on several open-vocabulary tasks.
Authors: V. Mlýnský, P. Kührová, P. Stadlbauer, M. Krepl, M. Otyepka, P. Banáš, J. Šponer.
Publication date: 2023
Molecular dynamics (MD) simulations represent an established tool for studying RNA molecules. The outcome of MD studies depends, however, on the quality of the force field (ff). Here, the researchers suggest a correction to the widely used AMBER OL3 ff through a simple adjustment of the nonbonded parameters. The research suggests that combining the OL3 RNA ff with the NBfix0BPh modification is a viable option for improving RNA MD simulations.
Authors: P. Harsh, S. Hachinger, M. Derquennes, A. Edmonds, P. Karagoz, M. Golasowski, M. Hayek and J. Martinovič.
Publication date: 2023
In this contribution, researchers sketch an application of Earth System Sciences and Cloud-/Big-Data-based IT, which shall soon leverage European supercomputing facilities: smart viticulture, as put into practice by Terraview. TerraviewOS is a smart vineyard ‘operating system’, allowing wine cultivators to optimise irrigation, harvesting dates and measures against plant diseases. The system relies on satellite and drone imagery as well as in-situ sensors where available. The substantial need for computing power in TerraviewOS, in particular for training AI-based models to generate derived data products, makes the further development of some of its modules a prime application case for the EXA4MIND project.
The Data Management Plan lays out the planning for handling the main aspects of the life cycle of the project data (data organisation and long-term storage, access, preservation, and sharing). This document also includes a preliminary specification of outputs (what data will be generated during the project). It is a living document and will be continuously updated during the project.
The EXA4MIND project connects pre-eminent databases and data management systems to supercomputing systems and European Data Spaces as well as the world of FAIR research data. The core purpose of this endeavour is running next-generation Extreme Data workflows, with emphasis on data analytics, Machine Learning / Artificial Intelligence, or classical simulations. This deliverable reports on the Data and Workflow Management Toolbox provided for this purpose, building upon the successful LEXIS Platform (delivered by the H2020 project, GA 825532). Furthermore, it illustrates the first workflows run by our application cases at supercomputing centres.
Welcome to the fourth newsletter of the EXA4MIND Project – time to reach the next level! In this edition, you will find everything about our latest plenary meeting, a new method to improve the safety and reliability of autonomous driving developed by our consortium member Antonín Vobecký from the Czech Technical University in Prague, EXA4MIND’s participation in international events, new videos from the ‘Faces of EXA4MIND’ campaign featuring our consortium members, and upcoming events.
Welcome to the third newsletter of the EXA4MIND Project – on to the second year of the project! In this edition, you will find the consortium partners’ review of the first year of the project and their objectives for the second year, the presentation of the EXA4MIND External Advisory Board, highlights of international events attended by the project in recent months, and our latest campaign, ‘Faces of EXA4MIND’.
Welcome to the second newsletter of the EXA4MIND Project – the journey continues! In this edition, you will find details about the latest plenary meeting and the co-design meeting with the application case partners, highlights from all national and international events attended by the EXA4MIND project in recent months, and a preview of upcoming events.
Welcome to the first newsletter of the EXA4MIND project. We are glad to have you on board! In this edition, you will find a warm welcome from the EXA4MIND project coordinator, information about the organisations driving the project and their expectations of EXA4MIND, the presentation of our application cases, a recap of the events in which EXA4MIND has been actively involved, and interesting news about TerraviewOS, a consortium partner, which has emerged as the winner of the Gravity05 global sustainability challenge.