SABER: Spatially Consistent 3D Universal Adversarial Objects for BEV Detectors

CVPR 2026

Aixuan Li¹, Mochu Xiang¹, Bosen Hou¹, Zhexiong Wan¹, Jing Zhang², Yuchao Dai¹

¹Northwestern Polytechnical University, China ²Australian National University, Australia

Abstract

Adversarial robustness of BEV 3D object detectors is critical for autonomous driving (AD). Existing invasive attacks require altering the target vehicle itself (e.g., attaching patches), making them unrealistic and impractical for real-world evaluation. While non-invasive attacks that place adversarial objects in the environment are more practical, current methods still lack the multi-view and temporal consistency needed for physically plausible threats. In this paper, we present the first framework for generating universal, non-invasive, and 3D-consistent adversarial objects that expose fundamental vulnerabilities in BEV 3D object detectors. Instead of modifying target vehicles, our method inserts rendered objects into scenes with an occlusion-aware module that enforces physical plausibility across views and time. To maintain attack effectiveness across views and frames, we optimize the adversarial object's appearance using a BEV spatial feature-guided optimization strategy that attacks the detector's internal representations. Extensive experiments demonstrate that our learned universal adversarial objects consistently degrade multiple BEV detectors from various viewpoints and distances. More importantly, the new environment-manipulation attack paradigm exposes models' over-reliance on contextual cues and provides a practical pipeline for robustness evaluation in AD systems.

Contributions

We propose the first 3D-consistent, non-invasive threat model where a universal adversarial object, placed nearby without physical contact, can mislead BEV detectors and cause hazards;
We realize this attack with a novel pipeline that combines differentiable rendering for 3D consistency, a Realistic Occlusion Processing Module for physical realism, and a BEV feature-based scene confusion loss for robust feature-level attacks;
Results on public datasets and in physical experiments show that our attack reveals a profound semantic vulnerability in current models: our non-invasive object manipulates the model's contextual reasoning about object co-occurrence, exposing a deep over-reliance on learned environmental priors and suggesting significant dataset deficiencies.
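
To make the idea of occlusion-aware insertion concrete, here is a minimal NumPy sketch of depth-tested compositing. It is not the Realistic Occlusion Processing Module itself, whose internals are not described here. The function name `composite_with_occlusion`, the per-pixel scene depth map, and the alpha mask are all assumptions made for illustration: a rendered object pixel is pasted only where the object is closer to the camera than the existing scene geometry.

```python
import numpy as np

def composite_with_occlusion(scene_rgb, scene_depth, obj_rgb, obj_alpha, obj_depth):
    """Paste a rendered object into an image, hiding pixels behind scene geometry.

    Hypothetical helper for illustration only.
    scene_rgb:   (H, W, 3) float image of the original scene
    scene_depth: (H, W) assumed per-pixel depth of the scene, in metres
    obj_rgb:     (H, W, 3) rendered object colours
    obj_alpha:   (H, W) rendered object coverage in [0, 1]
    obj_depth:   (H, W) rendered object depth, in metres
    """
    # The object is visible only where it is closer to the camera than the scene.
    visible = (obj_depth < scene_depth).astype(scene_rgb.dtype)
    # Blend weight: object coverage, zeroed wherever the scene occludes it.
    weight = (obj_alpha * visible)[..., None]
    return weight * obj_rgb + (1.0 - weight) * scene_rgb

# Toy 2x2 example: black scene, white object placed 5 m from the camera.
scene_rgb = np.zeros((2, 2, 3))
scene_depth = np.array([[10.0, 10.0], [2.0, 10.0]])  # bottom-left has a nearer occluder
obj_rgb = np.ones((2, 2, 3))
obj_alpha = np.array([[1.0, 0.0], [1.0, 0.5]])
obj_depth = np.full((2, 2), 5.0)

out = composite_with_occlusion(scene_rgb, scene_depth, obj_rgb, obj_alpha, obj_depth)
```

In this toy case the bottom-left pixel keeps the scene colour because the occluder at 2 m is in front of the object at 5 m, while the top-left pixel takes the object colour. Because the blend is a plain weighted sum, the same form would stay differentiable with respect to `obj_rgb` if written in an autodiff framework.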

Comprehensive Workflow

Overview of our adversarial object generation pipeline. Suitable locations in the 3D scene are first chosen for the adversarial meshes, which are then rendered onto the input images. Our differentiable renderer ensures 3D-consistent, multi-view renderings with correct perspective. The Realistic Occlusion Processing Module further simulates partial visibility for improved robustness. Finally, the adversarial object is optimized via BEV Spatial Feature-Guided Optimization to enable effective attacks.
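
The multi-view consistency described above follows from placing the object once in 3D and projecting it into each camera with that camera's own calibration. The sketch below shows this with a standard pinhole model. It is an illustration under assumptions, not the paper's renderer: `project_points`, the intrinsic matrix `K`, and the two camera poses are hypothetical values chosen for the example.

```python
import numpy as np

def project_points(points_world, K, T_cam_from_world):
    """Project (N, 3) world points to pixels with a pinhole camera.

    Hypothetical helper for illustration only.
    K:                (3, 3) camera intrinsics
    T_cam_from_world: (4, 4) rigid transform from world to camera coordinates
    Returns (N, 2) pixel coordinates and an (N,) mask of points in front of the camera.
    """
    ones = np.ones((len(points_world), 1))
    homo = np.hstack([points_world, ones])        # (N, 4) homogeneous points
    cam = (T_cam_from_world @ homo.T).T[:, :3]    # points in the camera frame
    in_front = cam[:, 2] > 1e-6                   # only these have a valid projection
    uvw = (K @ cam.T).T
    z = np.where(in_front, uvw[:, 2], 1.0)        # avoid dividing by zero or negative depth
    return uvw[:, :2] / z[:, None], in_front

# Assumed intrinsics: focal length 100 px, principal point (50, 50).
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])

# Camera A sits at the world origin; camera B is shifted 1 m along +x.
T_a = np.eye(4)
T_b = np.eye(4)
T_b[0, 3] = -1.0  # world-to-camera translation for a camera located at x = +1

# One object vertex 5 m ahead, and one point behind both cameras.
points = np.array([[0.0, 0.0, 5.0],
                   [0.0, 0.0, -5.0]])

uv_a, front_a = project_points(points, K, T_a)
uv_b, front_b = project_points(points, K, T_b)
```

The same vertex lands at pixel (50, 50) in camera A and (30, 50) in camera B, so its appearance in each view is tied to a single 3D position rather than pasted independently per image. Pixel values returned for points with `in_front == False` are not meaningful and should be discarded.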