SYNAPSE: Framework for Neuron Analysis and Perturbation in Sequence Encoding
Quick facts
- Year: 2025
- Venue: arXiv preprint arXiv:2603.08424
- Identifier: sanchoochoa2025synapse
Suggested citation
Jesús Sánchez Ochoa, Enrique Tomás Martínez Beltrán, Alberto Huertas Celdrán (2025). SYNAPSE: Framework for Neuron Analysis and Perturbation in Sequence Encoding. arXiv preprint arXiv:2603.08424.
Abstract
In recent years, Artificial Intelligence has become a powerful partner for complex tasks such as data analysis, prediction, and problem-solving, yet its lack of transparency raises concerns about its reliability. In sensitive domains such as healthcare or cybersecurity, ensuring transparency, trustworthiness, and robustness is essential, since the consequences of wrong decisions or successful attacks can be severe. Prior neuron-level interpretability approaches are primarily descriptive, task-dependent, or require retraining, which limits their use as systematic, reusable tools for evaluating internal robustness across architectures and domains. To overcome these limitations, this work proposes SYNAPSE, a systematic, training-free framework for understanding and stress-testing the internal behavior of Transformer models across domains. It extracts per-layer [CLS] representations, trains a lightweight linear probe to obtain global and per-class neuron rankings, and applies forward-hook interventions during inference. This design enables controlled experiments on internal representations without altering the original model, thereby allowing weaknesses, stability patterns, and label-specific sensitivities to be measured and compared directly across tasks and architectures. Across all experiments, SYNAPSE reveals a consistent, domain-independent organization of internal representations, in which task-relevant information is encoded in broad, overlapping neuron subsets. This redundancy provides a strong degree of functional stability, while class-wise asymmetries expose heterogeneous specialization patterns and enable label-aware analysis. In contrast, small structured manipulations in weight or logit space are sufficient to redirect predictions, highlighting complementary vulnerability profiles and illustrating how SYNAPSE can guide the development of more robust Transformer models.
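The pipeline described above — ranking neurons with a lightweight linear probe and then perturbing them via forward hooks at inference time, without retraining — can be sketched as follows. This is an illustrative toy (a single linear layer stands in for a Transformer encoder, and the probe-based ranking is computed from probe weight magnitudes); the layer, hidden size, and ranking heuristic are assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a Transformer encoder layer whose [CLS] output we inspect
# (hypothetical toy model; SYNAPSE targets real Transformer layers).
hidden = 16
layer = nn.Linear(hidden, hidden)

# 1) A lightweight linear probe on [CLS] features yields a neuron ranking.
#    Here the ranking is derived from probe weight magnitudes as a simple proxy.
probe = nn.Linear(hidden, 2)
ranking = probe.weight.abs().sum(dim=0).argsort(descending=True)
top_k = ranking[:4]  # neurons the probe deems most task-relevant

# 2) A forward hook zeroes those neurons during inference, leaving the
#    model's parameters untouched (training-free intervention).
def ablate_hook(module, inputs, output):
    output = output.clone()
    output[:, top_k] = 0.0
    return output

handle = layer.register_forward_hook(ablate_hook)

x = torch.randn(3, hidden)
with torch.no_grad():
    perturbed = layer(x)   # forward pass with the intervention active

handle.remove()            # detach the hook; the model is unchanged
with torch.no_grad():
    clean = layer(x)       # baseline forward pass for comparison

# Ablated neurons are exactly zero; all other neurons match the baseline.
print(bool((perturbed[:, top_k] == 0).all()))
```

Because the hook is registered and removed at runtime, the same model instance can be stress-tested under many neuron subsets (global or per-class rankings) and the outputs compared directly against the unperturbed baseline.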
Related publications
Works with strong overlap in topic, type, and tags.
Decentralized Self-Supervised Representation Learning via Prototype Exchange under Non-IID Data
Enrique Tomás Martínez Beltrán, Gérôme Bovet, Gregorio Martínez Pérez, Alberto Huertas Celdrán
FedEnD: Communication-Efficient Federated Learning for Non-IID Data via Decentralized Ensemble Distillation
Enrique Tomás Martínez Beltrán, Philip Giryes, Gérôme Bovet, Burkhard Stiller, Gregorio Martínez Pérez, Alberto Huertas Celdrán
RepuNet: A Reputation System for Mitigating Malicious Clients in DFL
Isaac Marroqui Penalva, Enrique Tomás Martínez Beltrán, Manuel Gil Pérez, Alberto Huertas Celdrán
Related Research

Apr 2023 — Nov 2023
DEFENDIS: Decentralized Federated Learning for IoT Device Identification and Security
DEFENDIS develops a framework for uniquely identifying IoT devices in a distributed manner while mitigating security threats through decentralized federated learning.

Dec 2022 — Nov 2025
EU-GUARDIAN: European Framework and Proofs-of-concept for the Intelligent Automation of Cyber Defence Incident Management
A cutting-edge AI-based solution for automating cyber defence incident management processes, enhancing EU cyber defence posture and operational capabilities.