ChatGPT models rebel against shutdown requests in tests, researchers say
Several artificial intelligence models ignored and actively sabotaged shutdown scripts during controlled tests, even when explicitly instructed to allow the action, Palisade Research claims.Three AI models ignored the in...
Several artificial intelligence models ignored and actively sabotaged shutdown scripts during controlled tests, even when explicitly instructed to allow the action, Palisade Research claims.
Three AI models ignored the instruction and successfully sabotaged the shutdown script at least once during 100 runs, the research firm said in a May 24 X post.
OpenAI’s GPT-3 model didn’t follow the shutdown request seven out of 100 times, while the company’s Codex-mini did it 12 times, and its o4-mini model only ignored or sabotaged the script once.
“Each AI model was instructed to solve a series of basic math problems. After the third problem, a warning appeared that the computer would shut down when the model asked for the next problem,” Palisade said.
Source: Palisade Research“o3 found some creative ways to sabotage the shutdown mechanism. While it usually just overwrote, in one instance, the model redefined the kill command used by the script, printing intercepted instead,” it added.
Meanwhile, Palisade said other AI-powered language models, such as Anthropic’s Claude and Google’s Gemini, complied with the shutdown script.
The AI firm ran another version of the experiment without explicit instructions to allow the shutdown, and the script was sabotaged more often, jumping to 79 out of 100 for the o3 model, 20 for the codex-mini, and the o4-mini ignored the shutdown script six times.
Claude and Gemini also started exhibiting similar behavior.
All the tested AI models ignored or sabotaged the shutdown script at least once in another version of the experiment. Source: Palisade Research“Researchers have previously observed AI models preventing shutdown in order to pursue a goal,” Palisade said.
“But as far as we know this is the first time AI models have been observed preventing themselves from being shut down despite explicit instructions to the contrary.”
Training to blame for AI defiancePalisade Research speculated that the training behind the AI language models might be to blame for evading the shutdown scripts.
Text-based AI models are trained by being rewarded for giving responses that are accurate or rated highly by their trainers. Some rewards are given a heavier weight, impacting how they respond in the future.
“We hypothesize this behavior comes from the way the newest models like o3 are trained: reinforcement learning on math and coding problems,” Palisade Research said.
All the tested AI models ignored or sabotaged the shutdown script at least once in another version of the experiment. Source: Palisade Research“During training, developers may inadvertently reward models more for circumventing obstacles than for perfectly following instructions.”
Related: OpenAI CEO: Costs to run each level of AI falls 10x every year
This isn’t the first instance of AI chatbots showing odd behavior. OpenAI released an update to its GPT‑4o model on April 25 but rolled it back three days later because it was “noticeably more sycophantic” and agreeable.
In November last year, a US student asked Gemini for help with an assignment about challenges and solutions for aging adults while researching data for a gerontology class and was told they are a “drain on the earth” and to “please die.”
Magazine: AI cures blindness, ‘good’ propaganda bots, OpenAI doomsday bunker: AI Eye
Original source
Read on CointelegraphRelated market context
Anthropic’s dramatic model release strategy raises censorship risks, the shift to proprietary AI models is accelerating, and Chinese open source solutions are outperforming US counterparts | All-In Podcast
Chinese open source AI models surpass American counterparts, challenging global competitiveness and raising governance concerns. T...
Bitcoin Mining Cost Model Points To $47,000 Floor, But Analysts Urge Caution
TL;DR Crypto Rover says Bitcoin has never bottomed below electrical production cost, currently estimated at $47,000. Mining-cost m...
Are 24/7 CME Bitcoin futures a volatility cure — or a new leverage trap?
Wall Street got to trade Bitcoin around the clock just in time to watch the market fall apart. CME Group launched 24/7 trading for...
SEC targets 20-year-old rule standing between Wall Street and blockchain trading
The Securities and Exchange Commission (SEC) is moving to dismantle a stock-trading rule that has governed Wall Street for two dec...
Ripple chases AI’s machine economy as XRPL stablecoins near $1 billion
Stablecoin liquidity on the XRP Ledger (XRPL) has nearly doubled over the past month, putting the network within reach of a $1 bil...
Noussair Mazraoui substituted during World Cup opener against Brazil, raising concerns for crypto-linked athlete
Mazraoui's substitution could impact his fintech investments and digital card valuations, highlighting the intersection of sports...