Privacy-Preserving Cybersecurity with Federated Learning

Large-scale data analysis has become essential in cybersecurity for detecting malicious behavior, anticipating attacks, and improving defensive capabilities. However, it runs into a major obstacle: the most valuable data is often distributed across multiple entities that cannot centralize or freely share it. Faced with regulatory requirements, confidentiality concerns, and the risks of exposing sensitive information, these entities remain understandably reluctant to pool their event logs, indicators of compromise, or incident-related data. This fragmentation considerably limits the ability of artificial intelligence systems to learn from the full picture and to identify emerging threats from weak signals scattered across different environments. In this context, federated learning emerges as a particularly relevant approach for reconciling collaboration with the protection of sensitive data.

Rather than moving data toward a centralized model, federated learning (see Figure 1) relies on the opposite logic: models are trained locally within each organization, and only the resulting model parameters are shared and aggregated. This approach enables stakeholders to benefit from collective intelligence without directly exposing raw data, as reported in recent studies on collaborative cyber threat detection. In cybersecurity, this methodology offers significant value because it allows the exploitation of data originating from highly diverse environments (industrial networks, cloud infrastructures, user endpoints, or critical systems) while still respecting the operational and regulatory constraints specific to each entity. More importantly, it reduces the risk that a central collection point could itself become a strategic target for attackers.

Figure 1: Illustration of the iterative Federated Learning approach. Each participant updates a common model locally using its own data and sends the updated parameters to the central Federated Learning server, which aggregates the parameters received from all participants to produce a new version of the common model.
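
The loop described in Figure 1 can be illustrated with a minimal sketch. It is not the ENSEMBLE implementation: it assumes a simple linear model, synthetic data, and a FedAvg-style aggregation in which the server averages the received parameters weighted by each participant's number of samples. All function names (local_update, aggregate, federated_training, make_participant) are hypothetical and chosen for illustration only.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Participant side: refine the common model on local data, which never leaves the site."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2 * err * xj / len(data)  # gradient of mean squared error
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w, len(data)  # only parameters and a sample count are sent

def aggregate(updates):
    """Server side: sample-weighted average of the parameters (assumed FedAvg-style rule)."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[j] * n for w, n in updates) / total for j in range(dim)]

def federated_training(participants, dim, rounds=50):
    """Repeat the cycle: broadcast the common model, update locally, aggregate."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in participants]
        global_w = aggregate(updates)
    return global_w

def make_participant(n, seed):
    """Synthetic local dataset following y = 3*x + 0.5 (illustrative stand-in for real logs)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), 1.0]  # one feature plus a constant bias term
        data.append((x, 3.0 * x[0] + 0.5))
    return data

# Three participants with different amounts of local data.
participants = [make_participant(20, 1), make_participant(40, 2), make_participant(10, 3)]
model = federated_training(participants, dim=2)
print(model)
```

In this sketch the server only ever sees parameter vectors and sample counts, which mirrors the data flow in Figure 1. Weighting by sample count is one common design choice; other aggregation rules are possible and may be better suited to heterogeneous or adversarial settings.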

Within the ENSEMBLE project, CEA and CERTH will implement this federated learning approach for both supervised and unsupervised learning use cases, leveraging the scenarios and operational use cases developed throughout the project. By relying on realistic cyber incident simulations and collaboratively generated datasets, the objective will be to evaluate how federated models can improve forensic threat detection, anomaly identification, and behavioral analysis across distributed environments without requiring direct data sharing between entities. Through this initiative, the ENSEMBLE project aims not only to explore the technical feasibility of federated cybersecurity analytics, but also to demonstrate how collaborative AI methodologies can strengthen collective cyber resilience while preserving confidentiality, operational sovereignty, and trust among stakeholders.

Written by Stéphane Gazut from CEA