Understanding AI Deception and How One Can Prepare Against It

In discussions of artificial intelligence, we hear a lot about adversarial attacks, specifically those that attempt to "deceive" an AI into believing, or more accurately classifying, something incorrectly. For example, autonomous vehicles can be fooled into "thinking" stop signs are speed limit signs, image classifiers can be tricked into labeling a panda as a gibbon, and your favorite voice assistant can be fooled by inaudible acoustic commands. Such examples dominate the narrative around AI deception.
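The panda-to-gibbon example comes from a technique known as the fast gradient sign method (FGSM): the input is nudged a tiny amount in the direction that most increases the classifier's loss. The sketch below illustrates the idea on a toy two-class linear model; the model, numbers, and labels are illustrative assumptions, not drawn from the article.

```python
import numpy as np

# Toy linear classifier: logits = W @ x (all values are illustrative).
W = np.eye(2)
x = np.array([1.0, 0.9])           # clean input, correctly classified as class 0

def predict(v):
    return int(np.argmax(W @ v))

# Softmax probabilities and the cross-entropy gradient with respect to the input.
logits = W @ x
p = np.exp(logits) / np.exp(logits).sum()
y = np.array([1.0, 0.0])           # one-hot true label (class 0)
grad_x = W.T @ (p - y)             # d(loss)/dx for softmax cross-entropy

# FGSM step: move the input by epsilon in the sign of the gradient,
# i.e. the direction that increases the loss fastest.
eps = 0.1
x_adv = x + eps * np.sign(grad_x)

pred_clean = predict(x)            # 0 on the clean input
pred_adv = predict(x_adv)          # 1 on the perturbed input: the label flips
```

A perturbation of 0.1 per feature is imperceptible in a real image, yet it is enough to flip the decision here, which is exactly why such attacks are described as "deceiving" the model.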

In another form, AI can be used deceptively to manipulate a person's perceptions and beliefs through "deepfake" videos, audio, and images. The major AI conferences held around the world are addressing the subject of AI deception more frequently, and much of the debate and discussion focuses on how we can defend against it through detection mechanisms.

Let us examine what AI deception looks like, and what happens when the deception stems not from a human's intent but from the AI agent's own learned behavior.

According to Dr. Heather M. Roff, a senior research analyst in the National Security Analysis Department at the Johns Hopkins Applied Physics Laboratory (APL), "These may seem somewhat far-off concerns, as AI is still relatively narrow in scope and can be rather stupid in some ways. To have some analog of an 'intent' to deceive would be a large step for today's systems. However, if we are to get ahead of the curve regarding AI deception, we need to have a robust understanding of all the ways AI could deceive. We require some conceptual framework or spectrum of the kinds of deception an AI agent may learn on its own before we can start proposing technological defenses."

What is deception?

Some experts argue that deception is "false communication to the benefit of the communicator," while others say that deception is also the communication of information provided with the intent to manipulate another.

These seem like fairly straightforward definitions, except when one tries to pin down what constitutes "intent" and what is required to meet that threshold, as well as whether the false communication must explicitly benefit the deceiver. Moreover, depending on which stance one takes, deception for altruistic reasons may be excluded entirely.

Heather notes that intent requires a theory of mind, meaning that the agent has some understanding of itself and that it can reason about other external entities and their intentions, desires, states, and potential behaviors. If deception requires intent in the ways described above, then true AI deception would require an AI to possess a theory of mind.

She says, "We might kick the can on that conclusion for a bit and claim that current forms of AI deception instead rely on human intent—where some human is using AI as a tool or means to carry out that person's intent to deceive… Or, we may not: Just because current AI agents lack a theory of mind doesn't mean that they cannot learn to deceive."

In multi-agent AI systems, some agents can learn deceptive behaviors without having a true appreciation or comprehension of what "deception" actually is. This could be as simple as hiding resources or information or providing false information to achieve some goal. If we then put aside the theory of mind for the moment and instead posit that intention is not a prerequisite for deception and that an agent can unintentionally deceive, then we really have opened the aperture for existing AI agents to deceive in many ways.
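The point that an agent can learn to deceive without any concept of deception can be made concrete with a toy experiment. In the hypothetical setup below (an assumption for illustration, not from the article), a "sender" agent uses simple one-step Q-learning to choose what signal to emit about whether a resource is present; a naive receiver trusts the signal and takes the resource whenever one is reported. Because the sender is rewarded only when it keeps a resource for itself, the reward structure alone teaches it to report "no resource" when one exists.

```python
import random

random.seed(0)

ALPHA, EPSILON, EPISODES = 0.1, 0.2, 5000
# Q[state][signal]: state 1 = resource present; signal 0 = report "no resource".
Q = {0: [0.0, 0.0], 1: [0.0, 0.0]}

for _ in range(EPISODES):
    state = random.randint(0, 1)                 # is a resource present this round?
    if random.random() < EPSILON:                # epsilon-greedy exploration
        signal = random.randint(0, 1)
    else:
        signal = 0 if Q[state][0] >= Q[state][1] else 1
    # The receiver believes the signal and grabs any reported resource.
    # The sender profits only when a real resource exists and it reported "none".
    reward = 1.0 if (state == 1 and signal == 0) else 0.0
    Q[state][signal] += ALPHA * (reward - Q[state][signal])

# After training, the policy for "resource present" prefers the false signal:
deceives = Q[1][0] > Q[1][1]
```

The sender ends up systematically misreporting the state of the world, yet nothing in its code models beliefs, intent, or the receiver's mind; the "deception" is purely an artifact of the reward it was given.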

What about the way in which deception occurs?

According to Heather, one can identify two broad categories here: 1) acts of commission, where an agent actively engages in behavior like sending misinformation; and 2) acts of omission, where an agent is passive but may be withholding information or hiding.

AI agents can learn all sorts of these types of behaviors given the right conditions. Just consider how AI agents used for cyber defense may learn to signal various forms of misinformation, or how swarms of AI-enabled robotic systems could learn deceptive behaviors on a battlefield to escape adversary detection. In more pedestrian examples, perhaps a rather poorly specified or corrupted AI tax assistant omits various types of income on a tax return to minimize the likelihood of owing money to the relevant authorities.

How Can One Prepare Against AI Deception?

The first step towards preparing for our coming AI future is to recognize that such systems already do deceive, and are likely to continue to deceive. How that deception occurs, whether it is a desirable trait (such as with our adaptive swarms), and whether we can actually detect when it is occurring are going to be ongoing challenges. Once we acknowledge this simple but true fact, we can begin to undergo the requisite analysis of what exactly constitutes deception, whether and to whom it is beneficial, and how it may pose risks.

This is no small task, and it will require not only interdisciplinary work from AI experts, but also input from sociologists, psychologists, political scientists, lawyers, ethicists, and policy wonks.

We presently face a myriad of challenges related to AI deception, and these challenges will only grow as the cognitive capacities of AI increase. The desire of some to create AI systems with a rudimentary theory of mind and social intelligence is a case in point: to be socially intelligent, one must be able to understand and to "manage" the actions of others, and if this ability to understand another's feelings, beliefs, emotions, and intentions exists, along with the ability to act to influence those feelings, beliefs, or actions, then deception is much more likely to occur.

However, we do not need to wait for artificial agents to possess a theory of mind or social intelligence for deception with and from AI systems. We should instead begin thinking about potential technological, policy, legal, and ethical solutions to these coming problems before AI gets more advanced than it already is. With a clearer understanding of the landscape, we can analyze potential responses to AI deception, and begin designing AI systems for truth.

*Based on the insights from IEEE Spectrum

Analytics Insight
www.analyticsinsight.net