Efficient and Transferable Adversarial Examples from Bayesian Neural Networks


An established way to improve the transferability of black-box evasion attacks is to craft the adversarial examples on a surrogate ensemble model. Unfortunately, such methods incur heavy computational costs to train the models forming the ensemble. Based on a state-of-the-art Bayesian Neural Network technique, we propose a new method to efficiently build such surrogates by sampling from the posterior distribution of neural network weights during a single training process. Our experiments on ImageNet and CIFAR-10 show that our approach significantly improves the transfer rates of four state-of-the-art attacks (by between 2.5 and 44.4 percentage points), in both intra-architecture and inter-architecture cases. On ImageNet, our approach reaches a 94% transfer rate while reducing training time from 387 to 136 hours on our infrastructure, compared to an ensemble of independently trained DNNs. Furthermore, our approach can be combined with test-time techniques that improve transferability, further increasing their effectiveness by up to 25.1 percentage points.
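The core idea — collect weight samples from the posterior during a single training run, then attack the resulting ensemble by averaging gradients over the samples — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: a toy logistic-regression model stands in for a DNN, plain SGLD noise stands in for the paper's Bayesian sampler, and all names and hyperparameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 2-D points, label 1 when the coordinates sum above a threshold.
X = rng.normal(size=(200, 2)) + 1.0
y = (X @ np.array([1.0, 1.0]) > 2.0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# --- Single training run with SGLD-style noisy SGD (a stand-in for the
# --- paper's posterior sampler): periodically snapshot the weights.
w, lr, temp = np.zeros(2), 0.1, 1e-3
samples = []
for step in range(500):
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) / len(X)
    w -= lr * grad + np.sqrt(2 * lr * temp) * rng.normal(size=2)  # Langevin noise
    if step >= 300 and step % 20 == 0:   # burn-in, then thinning
        samples.append(w.copy())

# --- Craft an FGSM-style adversarial example against the sampled ensemble:
# --- average the input-gradient of the loss over the posterior samples.
x0, eps = np.array([2.0, 2.0]), 0.5     # clean input with true label 1
grads = [(sigmoid(ws @ x0) - 1.0) * ws for ws in samples]  # d(logloss)/dx
x_adv = x0 + eps * np.sign(np.mean(grads, axis=0))
```

The key cost saving mirrored here is that the ensemble of surrogates comes from snapshots of one training trajectory rather than from independently trained models.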


The manuscript can be downloaded from arXiv.