Attacks Overview

This page provides a theoretical overview of the adversarial attack methods implemented in the mlff_attack package.

FGSM

The Fast Gradient Sign Method (FGSM) attack was first implemeted by Goodfellow et al. in the 2014 paper Explaining and Harnessing Adversarial Examples.

The update rule for FGSM is

\[x^* = x + \epsilon \cdot sign(\nabla_x \mathcal{L}(x))\]

where \(x\) are the original atomic positions, \(\epsilon\) is the perturbation magnitude, \(\nabla_x \mathcal{L}(x)\) is the gradient of the loss function with respect to the atomic positions, and \(x^*\) are the perturbed atomic positions after the attack.

I-FGSM

Iterative-FGSM, also known as BIM or “Basic Iterative Method”, repeatedly updates the atomic positions using FGSM over several time steps.

\[x^{(t+1)} = Clip_{x, \epsilon} \{ x^{(t)} + \alpha \cdot sign(\nabla_x \mathcal{L}(x^{(t)})) \}\]

where \(x^{(t)}\) are the atomic positions at iteration \(t\), \(\alpha\) is the step size for each iteration, and \(Clip_{x, \epsilon}\) ensures that the perturbations remain within the specified \(\epsilon\)-ball around the original positions.

If the clip option is set to False, the clipping step is omitted, allowing the perturbations to exceed the \(\epsilon\) limit.

This method was introduced in 2016 by Kurakin et al. in Adversarial Examples in the Physical World.

PGD

PGD begins by randomly initializing the perturbation. It then applies the following update rule, similar to FGSM and I-FGSM

\[x^{(t+1)} = \Pi_{x, \epsilon} \{ x^{(t)} + \alpha \cdot sign(\nabla_x \mathcal{L}(x^{(t)})) \}\]

where \(\Pi_{x, \epsilon}\) is the projection operator that ensures the perturbed positions remain within the \(\epsilon\)-ball around the original positions.

PGD outperforms FGSM and I-FGSM because the random start helps to escape local minima, making it a stronger adversarial attack.

This method was proposed by Madry et al. in Towards Deep Learning Models Resistant to Adversarial Attacks.