Why Increasing Instances Of Adversarial Attacks Are Concerning?
by Preetipadma, November 14, 2020
Understanding the basics of adversarial attacks and backdoors
Artificial intelligence (AI) is a technology that mimics human intelligence and computational skills. Today, AI tools are employed to find efficiencies, improve decision making, and offer better end-user experiences. They are also used to fight and prevent cybersecurity threats: companies use AI and its subset, machine learning (ML), to analyze network traffic for anomalous and suspicious activity. However, these tools have limitations when applied to security. Cybercriminals can exploit those weaknesses to launch, and even power, threats or malware. Attacks that exploit machine learning models have proliferated in the last few years.
What is it?
An adversarial attack feeds a machine learning model inputs (e.g., images, text, or voice), known as adversarial examples, that an attacker has intentionally designed to cause the model to make a mistake, such as a wrong classification. In other words, it is a type of adversarial machine learning technique that manipulates the behavior of AI algorithms. These attacks are like optical illusions for machines. According to Gartner’s Top 10 Strategic Technology Trends for 2020, published in October 2019, through 2022, 30% of all AI cyber-attacks will leverage training-data poisoning, AI model theft, or adversarial samples to attack AI-powered systems.
One common type of such attack is the backdoor attack, which aims to implant adversarial vulnerabilities in the machine learning model during the training phase. Hence, this type of adversarial attack depends on data poisoning, that is, the manipulation of the examples used to train the target machine learning model. It works because machine learning algorithms mindlessly search for strong correlations in the training data without looking for causal factors. Attackers modify the training samples so that, at inference time, the presence of a specific pattern (the trigger) in the input causes misclassification to a target class chosen by the adversary. These backdoors can be a huge pain point for organizations that outsource work on neural networks to third parties or build products on top of freely available neural networks found online.
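As a rough illustration of the data-poisoning step described above, the sketch below stamps a small trigger patch onto a fraction of the training images and relabels them to an attacker-chosen class. The names `stamp_trigger` and `poison_dataset`, the 3x3 bottom-right patch, the 10% poison rate, and the toy 8x8 images are all assumptions made for this example; they are not taken from any specific published attack.

```python
import random

def stamp_trigger(image, size=3, value=1.0):
    """Return a copy of a 2D image (list of rows) with a square
    trigger patch written into its bottom-right corner."""
    patched = [row[:] for row in image]
    height, width = len(patched), len(patched[0])
    for r in range(height - size, height):
        for c in range(width - size, width):
            patched[r][c] = value
    return patched

def poison_dataset(images, labels, target_label, rate=0.1, seed=0):
    """Stamp the trigger on a random fraction of the samples and
    relabel those samples to the attacker's target class."""
    rng = random.Random(seed)
    n_poison = int(len(images) * rate)
    chosen = set(rng.sample(range(len(images)), n_poison))
    new_images, new_labels = [], []
    for i, (img, lab) in enumerate(zip(images, labels)):
        if i in chosen:
            new_images.append(stamp_trigger(img))
            new_labels.append(target_label)
        else:
            new_images.append([row[:] for row in img])
            new_labels.append(lab)
    return new_images, new_labels, sorted(chosen)

# Toy data (hypothetical): 20 blank 8x8 images with labels 0..3.
images = [[[0.0] * 8 for _ in range(8)] for _ in range(20)]
labels = [i % 4 for i in range(20)]
poisoned_images, poisoned_labels, poisoned_idx = poison_dataset(
    images, labels, target_label=7
)
```

A model trained on `poisoned_images` and `poisoned_labels` may learn to associate the corner patch with class 7, which is the correlation a backdoor relies on; whether it does in practice depends on the model, the data, and the poison rate.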
There are two common approaches to such attacks.
- Attackers examine the tool’s learning processes to garner information about the solution’s domain of data, the models used, and data governance rules. After that, they attempt to influence or pollute the learning process, assuming the ML solution learns from a large data pool.
- Attackers obtain or infer ML models as a starting point to morph their attacks so that they can evade detection. This is executed when there’s a perfect learning set but the attackers don’t have any idea about the expected solution. As a result, they attempt to learn the ML tool’s classifier so that they can evade its algorithms in the future.
Generally, inputs can be manipulated by introducing noise, through semantic attacks, or with methods such as the Fast Gradient Sign Method, DeepFool, and Projected Gradient Descent. But irrespective of the attack type or source, adversarial attacks are dangerous. For instance, as cited in the research paper Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples, attackers could target autonomous vehicles by using stickers or paint to create an adversarial stop sign that the vehicle would interpret as a ‘yield’ or other sign. A backdoored skin cancer screening system could misdiagnose skin lesion images as other, attacker-determined diseases. Or a face recognition system could be adversarially hijacked so that any person wearing black-framed eyeglasses, which act as a natural trigger, is recognized as the target person.
Recently, in a paper submitted to the ICLR 2021 conference, researchers showed a proof of concept for so-called ‘triggerless backdoors’ in deep learning systems. This is a new type of attack that bypasses most of the defense methods currently being deployed. The paper, authored by AI scientists at the Germany-based CISPA Helmholtz Center for Information Security, shows that machine learning backdoors can be well-hidden and discreet.
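Of the perturbation methods named above, the Fast Gradient Sign Method (FGSM) is probably the easiest to sketch: it nudges every input feature by a fixed step `eps` in the direction that increases the model's loss. The example below applies it to a tiny logistic-regression classifier. The weights, the input, and the step size are made-up values chosen for illustration, and real attacks target much larger models through a deep learning framework's automatic differentiation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def fgsm(x, y, w, b, eps):
    """One FGSM step: x_adv = x + eps * sign(d loss / d x).
    For logistic regression with cross-entropy loss, the gradient
    of the loss with respect to the input is (p - y) * w."""
    p = predict_proba(x, w, b)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical model parameters and input, chosen for illustration.
w = [2.0, -1.0, 0.5]
b = 0.0
x = [0.4, 0.1, 0.2]
y = 1  # true label

x_adv = fgsm(x, y, w, b, eps=0.3)
p_clean = predict_proba(x, w, b)    # above 0.5: classified as class 1
p_adv = predict_proba(x_adv, w, b)  # below 0.5: classified as class 0
```

In this toy case no feature moves by more than 0.3, yet the predicted class flips. The same principle lets small, hard-to-notice perturbations fool image classifiers.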
Mitigation and Further Research
There are a few methods that can help counter such attacks:
- Adversarial training: Here, a lot of adversarial examples are generated and used to explicitly train the model not to be fooled by each of them.
- Defensive distillation: Here, the ML model is trained to output probabilities for the different classes rather than hard decisions about which class to output. These probabilities are supplied by an earlier model trained on the same task with hard class labels.
- Random resizing and padding: As the name suggests, this involves randomly resizing the input image and then randomly padding it on all four sides.
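The last item in the list above can be sketched in a few lines. The code below resizes a square image to a random side length with nearest-neighbor sampling and then places it at a random offset inside a fixed-size padded canvas, so an attacker cannot know in advance exactly which pixels the model will see. The function names, the nearest-neighbor resize, and the 4x4 to 8x8 sizes are assumptions made for this illustration; a production defense would normally rely on an image library's resize and pad operations.

```python
import random

def resize_nearest(image, new_h, new_w):
    """Resize a 2D image (list of rows) with nearest-neighbor sampling."""
    h, w = len(image), len(image[0])
    return [
        [image[r * h // new_h][c * w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]

def random_resize_and_pad(image, out_size, rng, pad_value=0.0):
    """Randomly resize a square image, then pad it at a random
    offset so the result is always out_size x out_size."""
    side = len(image)  # assumes a square input no larger than out_size
    new_side = rng.randint(side, out_size)
    resized = resize_nearest(image, new_side, new_side)
    # Random split of the leftover space between top/bottom and left/right.
    top = rng.randint(0, out_size - new_side)
    left = rng.randint(0, out_size - new_side)
    canvas = [[pad_value] * out_size for _ in range(out_size)]
    for r in range(new_side):
        for c in range(new_side):
            canvas[top + r][left + c] = resized[r][c]
    return canvas

# Toy input (hypothetical): a 4x4 image of ones, defended into an 8x8 canvas.
rng = random.Random(0)
image = [[1.0] * 4 for _ in range(4)]
defended = random_resize_and_pad(image, out_size=8, rng=rng)
```

Each call draws a fresh size and offset, so the same adversarial input is transformed differently every time it is presented to the model.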
A few weeks ago, Microsoft, in collaboration with the nonprofit MITRE Corporation and 11 organizations including IBM, Nvidia, Airbus, and Bosch, released the Adversarial ML Threat Matrix, an industry-focused open framework designed to help security analysts detect, respond to, and remediate threats against machine learning systems. The project aims to provide a consolidated view of how malicious actors can take advantage of the weaknesses of machine learning algorithms to target organizations that use them. The Adversarial ML Threat Matrix is modeled after the MITRE ATT&CK framework, which is used to deal with cyber-threats in enterprise networks. Because the matrix follows the ATT&CK format, security analysts can more easily understand threats to machine learning models in real-world incidents.