Thursday, August 12, 2021 - 11:00 am
online

DISSERTATION DEFENSE

Department of Computer Science and Engineering

University of South Carolina

Towards More Trustworthy Deep Learning: Accurate, Resilient, and Explainable Countermeasures Against Adversarial Examples

Author: Fei Zuo

Advisor: Dr. Qiang Zeng

Date: Aug 12, 2021

Time: 11:00am

Place: Virtual Defense


Abstract

Despite the great achievements made by neural networks on tasks such as image classification, they are brittle and vulnerable to adversarial example (AE) attacks. With the growing prevalence of deep learning techniques, the threat posed by AEs has attracted increasing attention, since it may lead to serious consequences in vital applications such as disease diagnosis.

To defeat attacks based on AEs, both detection and defense techniques have drawn the research community's attention. While many countermeasures against AEs have been proposed, recent studies show that existing detection methods usually become ineffective when facing adaptive AEs. In this work, we counter AEs by identifying and exploiting their noticeable characteristics.


First, we observe that L2 adversarial perturbations are among the most effective yet most difficult-to-detect attacks, and how to detect adaptive L2 AEs remains an open question. At the same time, we find that, by randomly erasing some pixels in an L2 AE and then restoring it with an inpainting technique, the AE tends to receive different classification results before and after these steps, whereas a benign sample does not show this symptom. We thus propose a novel AE detection technique, Erase-and-Restore (E&R), that exploits this intriguing sensitivity of L2 attacks. Comprehensive experiments conducted on standard image datasets show that the proposed detector is effective and accurate. More importantly, our approach demonstrates strong resilience to adaptive attacks. We also interpret the detection technique through both visualization and quantification.
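To make the erase-and-restore idea concrete, the following is a minimal Python sketch of how such a detector could be organized, assuming a generic classifier callable and OpenCV's off-the-shelf inpainting; the function name, erasure fraction, number of rounds, and decision threshold are illustrative assumptions rather than the dissertation's actual configuration.

import numpy as np
import cv2  # OpenCV, used here for an off-the-shelf inpainting routine


def erase_and_restore_flags_ae(image, classifier, erase_frac=0.2, rounds=5, seed=0):
    """Return True if `image` behaves like an L2 adversarial example.

    `classifier` is assumed to be any callable mapping an HxWx3 uint8 image
    to a predicted label. All parameter values here are illustrative.
    """
    rng = np.random.default_rng(seed)
    original_label = classifier(image)
    label_changes = 0
    for _ in range(rounds):
        # Randomly erase a fraction of the pixels.
        mask = (rng.random(image.shape[:2]) < erase_frac).astype(np.uint8)
        erased = image.copy()
        erased[mask == 1] = 0
        # Restore the erased pixels with a standard inpainting method.
        restored = cv2.inpaint(erased, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
        # An L2 AE tends to flip its label after erase-and-restore,
        # while a benign image tends to keep it.
        if classifier(restored) != original_label:
            label_changes += 1
    return label_changes / rounds > 0.5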


Second, previous work considers it challenging to properly alleviate the effect of the heavy corruptions caused by L0 attacks. However, we argue that the uncontrollable heavy perturbation is an inherent limitation of L0 AEs, and we exploit this limitation to thwart such attacks. We thus propose a novel AE detector that converts the detection problem into a comparison problem. In addition, we show that the pre-processing technique used for detection can also serve as an effective defense, with a high probability of removing the adversarial influence of L0 perturbations. Thus, our system demonstrates not only high AE detection accuracy, but also a notable capability to correct the classification results.
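As an illustration of how detection can be cast as a comparison problem, the sketch below pre-processes the input to suppress the few heavily perturbed pixels typical of L0 attacks and then compares the labels before and after; median filtering is used only as a stand-in pre-processing step, and all names are hypothetical rather than the dissertation's exact pipeline.

import cv2


def l0_detect_and_defend(image, classifier):
    # Pre-process: suppress isolated, extreme pixels such as those
    # introduced by L0 attacks (median filtering is an illustrative choice).
    cleaned = cv2.medianBlur(image, 5)

    label_raw = classifier(image)
    label_clean = classifier(cleaned)

    # Detection as comparison: a label flip indicates a likely L0 AE.
    is_adversarial = label_raw != label_clean

    # Defense: the pre-processed image's label serves as the corrected result.
    return is_adversarial, label_clean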


Finally, we propose a comprehensive AE detector that systematically combines the two detection methods to thwart all widely discussed categories of AEs, i.e., L0, L2, and L∞ attacks. By inheriting the strengths of its constituent components, the new hybrid AE detector not only distinguishes various kinds of AEs, but also has a very low false positive rate on benign images. More significantly, by exploiting the noticeable characteristics of AEs, the proposed detector is highly resilient to adaptive attacks, filling a critical gap in AE detection.
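A hybrid detector along these lines could simply run both component tests and flag an input if either one fires; the OR-style combination below reuses the hypothetical sketches above and is an assumption about how the components might be chained, not the dissertation's exact design.

def hybrid_detect(image, classifier):
    # L0-oriented comparison test (also yields a corrected label).
    l0_flag, corrected_label = l0_detect_and_defend(image, classifier)

    # L2-oriented erase-and-restore test.
    l2_flag = erase_and_restore_flags_ae(image, classifier)

    # Flag the input if either component detector fires.
    return (l0_flag or l2_flag), corrected_label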