Exa4mind – Extreme Analytics for Mining Data spaces

Exa4Mind

Publications

EXA4MIND relies on a co-design approach, where technology partners from computing centres and universities and application partners from industry, academia and SMEs design an Extreme Data infrastructure in close collaboration.

Cross-Lingual Learning vs. Low-Resource Fine-Tuning: A Case Study with Fact-Checking in Turkish

Scientific Publication

6

Authors: R. F. Cekinel, Ç. Çöltekin, P. Karagoz.

Publication date: 2024

The rapid spread of misinformation through social media platforms has raised concerns regarding its impact on public opinion. While misinformation is prevalent in other languages, the majority of research in this field has concentrated on the English language. Hence, there is a scarcity of datasets for other languages, including Turkish. To address this concern, we have introduced the FCTR dataset, consisting of 3238 real-world claims. This dataset spans multiple domains and incorporates evidence collected from three Turkish fact-checking organizations. Additionally, we aim to assess the effectiveness of cross-lingual transfer learning for low-resource languages, with a particular focus on Turkish.

Three Pillars improving Vision Foundation Model Distillation for Lidar

Scientific Publication

5

Authors: G. Puy, S. Gidaris, A. Boulch, O. Siméoni, C. Sautier, P. Pérez, A. Bursuc, R. Marlet.

Publication date: 2024

Self-supervised image backbones can be used to address complex 2D tasks (e.g., semantic segmentation, object discovery) very efficiently and with little or no downstream supervision. Ideally, 3D backbones for lidar should be able to inherit these properties after distillation of these powerful 2D features. The most recent methods for image-to-lidar distillation on autonomous driving data show promising results, obtained thanks to distillation methods that keep improving. Yet, we still notice a large performance gap when measuring the quality of distilled and fully supervised features by linear probing. In this work, instead of focusing only on the distillation method, we study the effect of three pillars for distillation: the 3D backbone, the pretrained 2D backbones, and the pretraining dataset. In particular, thanks to our scalable distillation method named ScaLR, we show that scaling the 2D and 3D backbones and pretraining on diverse datasets leads to a substantial improvement of the feature quality. This allows us to significantly reduce the gap between the quality of distilled and fully-supervised 3D features, and to improve the robustness of the pretrained backbones to domain gaps and perturbations.

IndexAI: AI Based Index Selection for NoSQL Databases (pre-print version)

Scientific Publication

4

Authors: M. M. Khosravi, P. Karagoz, I. H. Toroslu.

Publication date: 2024

In this work, we consider automated index selection for NoSQL databases and investigate the feasibility of supervised learning and reinforcement learning based solutions. The experiments conducted on the YCSB dataset show that reinforcement learning improves index selection performance, as in relational databases, and that supervised learning gives promising results and can be considered applicable given a sufficient amount of training data.

POP3D: Open-Vocabulary 3D Occupancy Prediction from Images

Scientific Publication

3

Authors: A. Vobecky, O. Siméoni, D. Hurych, S. Gidaris, A. Bursuc, P. Pérez, J. Sivic.

Publication date: 2023

This research describes an approach to predict an open-vocabulary 3D semantic voxel occupancy map from input 2D images with the objective of enabling 3D grounding, segmentation and retrieval of free-form language queries. This is a challenging problem because of the 2D-3D ambiguity and the open-vocabulary nature of the target tasks, where obtaining annotated training data in 3D is difficult. The contributions of this work are three-fold: a new model architecture for open-vocabulary 3D semantic occupancy prediction; a tri-modal self-supervised learning algorithm that leverages three modalities: (i) images, (ii) language and (iii) LiDAR point clouds, and enables training the proposed architecture using a strong pre-trained vision-language model without the need for any 3D manual language annotations; and a quantitative demonstration of the strengths of the proposed model on several open-vocabulary tasks.

Simple Adjustment of Intranucleotide Base-Phosphate Interaction in the OL3 AMBER Force Field Improves RNA Simulations

Scientific Publication

2

Authors: V. Mlýnský, P. Kührová, P. Stadlbauer, M. Krepl, M. Otyepka, P. Banás, J. Šponer.

Publication date: 2023

Molecular dynamics (MD) simulations represent an established tool to study RNA molecules. The outcome of MD studies depends, however, on the quality of the force field (ff). Here researchers suggest a correction for the widely used AMBER OL3 ff by adding a simple adjustment of the nonbonded parameters. The research suggests that the combination of OL3 RNA ff and NBfix0BPh modification is a viable option to improve RNA MD simulations.

Wine in the Cloud, or: Smart Vineyards with a Distributed "Extreme Data Database" and Supercomputing

Scientific Publication

1

Authors: P. Harsh, S. Hachinger, M. Derquennes, A. Edmonds, P. Karagoz, M. Golasowski, M. Hayek and J. Martinovič.

Publication date: 2023

In this contribution, researchers sketch an application of Earth System Sciences and Cloud-/Big-Data-based IT, which shall soon leverage European supercomputing facilities: smart viticulture, as put into practice by Terraview. TerraviewOS is a smart vineyard ‘operating system’, allowing wine cultivators to optimise irrigation, harvesting dates and measures against plant diseases. The system relies on satellite and drone imagery as well as in-situ sensors where available. The substantial need for computing power in TerraviewOS, in particular for training AI-based models to generate derived data products, makes the further development of some of its modules a prime application case for the EXA4MIND project.

Data Management Plan

Public Deliverable

5

The Data Management Plan lays out our planning for handling main aspects of the life cycle of the project data (data organisation and long-term storage, access, preservation, and sharing). This document also includes a preliminary specification of outputs (what data will be generated during the project). It is a living document and will be continuously updated during the project.

Impact Master Plan

Public Deliverable

4

This deliverable outlines the planning of the dissemination, communication, exploitation and standardisation strategies for the EXA4MIND Horizon Europe project. This planning will be of relevance throughout the duration of the project and will be revisited periodically as it progresses.

Data and Workflow Management Toolbox Alpha Status Report

Public Deliverable

3

The EXA4MIND project connects pre-eminent databases and data management systems to supercomputing systems and European Data Spaces as well as the world of FAIR research data. The core purpose of this endeavour is running next-generation Extreme Data workflows, with emphasis on data analytics, Machine Learning / Artificial Intelligence, or classical simulations. This deliverable reports on the Data and Workflow Management Toolbox provided for this purpose, building upon the successful LEXIS Platform (delivered by the H2020 project, GA 825532). Furthermore, it illustrates the first workflows run by our application cases at supercomputing centres.

Extreme Data Flow Patterns

Public Deliverable

2

This deliverable of the EXA4MIND project collects and analyses data flow patterns from all the project application cases. The collected data flow descriptions are used to identify a set of commonly occurring patterns that will be taken into account when designing the Extreme Data Database.

Application Cases And Architecture Requirements

Public Deliverable

1

This deliverable contains requirements provided by the project’s application-case work packages WP4-WP6 and their mapping to the EXA4MIND Platform features. The document is roughly divided into two parts. The first part contains a unified description of each application case and its requirements. The second part contains the mapping of the requirements to the technical features of the EXA4MIND Platform and the project objectives provided by the technical work packages WP1-WP3.

Newsletter

4

Time to reach the next level!

Welcome to the fourth newsletter of the EXA4MIND Project – time to reach the next level! In this edition, you will find all about our last plenary meeting, a new method to improve the safety and reliability of autonomous driving by our consortium member Antonín Vobecký from Czech Technical University in Prague, EXA4MIND’s participation in international events, new videos of the ‘Faces of EXA4MIND’ campaign featuring our consortium members, and upcoming events. 

Newsletter

3

On to the second year of the project!

Welcome to the third newsletter of the EXA4MIND Project – On to the second year of the project! In this edition, you will find the consortium partners' review of the first year of the project and the objectives for the second year, a presentation of the EXA4MIND External Advisory Board, highlights of international events attended by the project in recent months, and our latest campaign, ‘Faces of EXA4MIND’.

Newsletter

2

The journey continues!

Welcome to the second newsletter of the EXA4MIND Project – The journey continues! In this edition, you will find details about the last plenary meeting and co-design meeting with Application Cases partners, highlights from all national and international events attended by the EXA4MIND project in the last months, and a preview of upcoming events.

Newsletter

1

We are glad to have you on board!

Welcome to the first newsletter of the EXA4MIND project. We are glad to have you on board! In this edition, you will find a warm welcome from the EXA4MIND project coordinator, information about the organisations driving the project and their expectations of EXA4MIND, the presentation of our application cases, a recap of the events in which EXA4MIND has been actively involved, and interesting news about TerraviewOS, a consortium partner, which has emerged as the winner of the Gravity05 global sustainability challenge.