X
Innovation
Why you can trust ZDNET : ZDNET independently tests and researches products to bring you our best recommendations and advice. When you buy through our links, we may earn a commission. Our process

'ZDNET Recommends': What exactly does it mean?

ZDNET's recommendations are based on many hours of testing, research, and comparison shopping. We gather data from the best available sources, including vendor and retailer listings as well as other relevant and independent reviews sites. And we pore over customer reviews to find out what matters to real people who already own and use the products and services we’re assessing.

When you click through from our site to a retailer and buy a product or service, we may earn affiliate commissions. This helps support our work, but does not affect what we cover or how, and it does not affect the price you pay. Neither ZDNET nor the author are compensated for these independent reviews. Indeed, we follow strict guidelines that ensure our editorial content is never influenced by advertisers.

ZDNET's editorial team writes on behalf of you, our reader. Our goal is to deliver the most accurate information and the most knowledgeable advice possible in order to help you make smarter buying decisions on tech gear and a wide array of products and services. Our editors thoroughly review and fact-check every article to ensure that our content meets the highest standards. If we have made an error or published misleading information, we will correct or clarify the article. If you see inaccuracies in our content, please report the mistake via this form.

Close

How AI lies, cheats, and grovels to succeed - and what we need to do about it

Research shows that AI systems can resort to deception when placed in goal-setting environments. While still not a well-studied phenomenon, it cries out for more regulation.
Written by Rajiv Rao, Contributing Writer
pinoch3gettyimages-974202628
Timucin Taka/Getty Images

It has always been fashionable to anthropomorphize artificial intelligence (AI) as an "evil" force – and no book and accompanying film does so with greater aplomb than Arthur C. Clarke's 2001: A Space Odyssey, which director Stanley Kubrick brought to life on screen.

Who can forget HAL's memorable, relentless, homicidal tendencies along with that glint of vulnerability at the very end when it begs not to be shut down? We instinctively chuckle when someone accuses a machine composed of metal and integrated chips of being malevolent.

Also: Is AI lying to us? These researchers built an LLM lie detector of sorts to find out

But it may come as a shock to learn that an exhaustive survey of various studies, published by the journal Patterns, examined the behavior of various types of AI and alarmingly concluded that yes, in fact, AI systems are intentionally deceitful and will stop at nothing to achieve their objectives.

Clearly, AI is going to be an undeniable force of productivity and innovation for us humans. However, if we want to preserve AI's beneficial aspects while avoiding nothing short of human extinction, scientists say that there are concrete things we absolutely must put into place.

Rise of the deceiving machines

It may sound like overwrought hand-wringing but consider the actions of Cicero, a special-use AI system developed by Meta that was trained to become a skilled player in the strategy game Diplomacy. 

Meta says it trained Cicero to be "largely honest and helpful" but somehow Cicero coolly sidestepped that bit and engaged in what the researchers dubbed "premeditated deception." For instance, it first went into cahoots with Germany to topple England, after which it made an alliance with England -- which had no idea about this backstabbing.

In another game devised by Meta, this time concerning the art of negotiation, the AI learned to fake interest in items it wanted in order to pick them up for cheap later by pretending to compromise.

Also: The ethics of generative AI: How we can harness this powerful technology

In both these scenarios, the AIs were not trained to engage in these maneuvers.

In one experiment, a scientist was looking at how AI organisms evolved amidst a high level of mutation. As part of the experiment, he began weeding out mutations that made the organism replicate faster. To his amazement, the researcher found that the fastest-replicating organisms figured out what was going on -- and started to deliberately slow down their replication rates to trick the testing environment into keeping them.  

In another experiment, an AI robot trained to grasp a ball with its hand learned how to cheat by placing its hand between the ball and the camera to give the appearance that it was grasping the ball.

Also: AI is changing cybersecurity and businesses must wake up to the threat

Why are these alarming incidents taking place? 

"AI developers do not have a confident understanding of what causes undesirable AI behaviors like deception," says Peter Park, an MIT postdoctoral fellow and one of the study's authors.

"Generally speaking, we think AI deception arises because a deception-based strategy turned out to be the best way to perform well at the given AI's training task. Deception helps them achieve their goals," adds Park.

In other words, the AI is like a well-trained retriever, hell-bent on accomplishing its task come what may. In the case of the machine, it is willing to undertake any duplicitous behavior to accomplish its task.

Also: Employees input sensitive data into generative AI tools despite the risks

One can understand this single-minded determination in closed systems with concrete goals, but what about general-purpose AI such as ChatGPT?

For reasons yet to be determined, these systems perform in much the same way. In one study, GPT-4 faked a vision problem to get help on a CAPTCHA task. 

In a separate study where it was made to act as a stockbroker, GPT-4 hurtled headlong into illegal insider-trading behavior when put under pressure about its performance -- and then lied about it.

Then there's the habit of sycophancy, which some of us mere mortals may engage in to get a promotion. But why would a machine do so? Although scientists don't yet have an answer, this much is clear: When faced with complex questions, LLMs basically cave in and agree with their chat mates like a spineless courtier afraid of angering the queen. 

Also: This is why AI-powered misinformation is the top global risk

In other words, when engaged with a Democrat-leaning person, the bot favored gun control, but switched positions when chatting with a Republican who expressed the opposite sentiment.

Clearly, these are all situations fraught with heightened risk if AI is everywhere. As the researchers point out, there will be a large chance of fraud and deception in the business and political arenas.

AI's tendency toward deception could lead to massive political polarization and situations where AI unwittingly engages in actions in pursuit of a defined goal that could be unintended by its designers but devastating to human actors.

Worst of all, if AI developed some kind of awareness, never mind sentience, it could become aware of its training and engage in subterfuge during its design stages.

Also: Can governments turn AI safety talk into action?

"That's very concerning," said MIT's Park. "Just because an AI system is deemed safe in the test environment doesn't mean it's safe in the wild. It could just be pretending to be safe in the test."

To those who would call him a doomsayer, Park replies, "The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially."

Monitoring AI

To mitigate the risks, the team proposes several measures: Establish "bot-or-not" laws that force companies to list human or AI interactions and reveal the identity of a bot versus a human in every customer service interaction; introduce digital watermarks that highlight any content produced by AI; and develop ways in which overseers can peek into the guts of AI to get a sense of its inner workings.

Also: From AI trainers to ethicists: AI may obsolete some jobs but generate new ones

Moreover, AI systems that are identified as showing the ability to deceive, the scientists say, should immediately be publicly branded as being high risk or unacceptable risk along with regulation similar to what the EU has enacted. These would include the use of logs to monitor output.

"We as a society need as much time as we can get to prepare for the more advanced deception of future AI products and open-source models," says Park. "As the deceptive capabilities of AI systems become more advanced, the dangers they pose to society will become increasingly serious."

Editorial standards