Anthropic Releases New AI Model That Shows Early Signs of Dangerous Capabilities

The standout feature of the Sonnet release is its ability to interact with your computer—allowing it to take and read screenshots, move the mouse, click buttons on webpages, and type text. This capability is being rolled out in a “public beta” phase, which Anthropic admits is “experimental and at times cumbersome and error-prone,” according to the company’s announcement.

In a blog post detailing the rationale behind this new feature, Anthropic explained: “A vast amount of modern work happens via computers. Enabling AIs to interact directly with computer software in the same way people do will unlock a huge range of applications that simply aren’t possible for the current generation of AI assistants.” While the concept of computers controlling themselves isn’t exactly new, the way Sonnet operates sets it apart. Unlike traditional automated computer control, which typically involves writing code, Sonnet requires no programming knowledge. Users can open apps or webpages and simply instruct the AI, which then analyzes the screen and figures out which elements to interact with.

Early Signs of Dangerous Capabilities

Anthropic acknowledges the risks inherent in this technology, admitting that “for safety reasons we did not allow the model to access the internet during training,” though the beta version now permits internet access. The company also recently updated its “Responsible Scaling Policy,” which defines the risks associated with each stage of development and release. According to this policy, Sonnet has been rated at “AI Safety Level 2,” which indicates “early signs of dangerous capabilities.” However, Anthropic believes it is safe enough to release to the public at this stage.

Source: Anthropic

Defending its decision to release the tool before fully understanding all the potential misuse scenarios, Anthropic said, “We can begin grappling with any safety issues before the stakes are too high, rather than adding computer use capabilities for the first time into a model with much more serious risks.” Essentially, the company would prefer to test these waters now while the AI’s capabilities are still relatively limited.

Of course, the risks associated with AI tools like Claude aren’t just theoretical. OpenAI recently disclosed 20 instances where state-backed actors had used ChatGPT for nefarious purposes, such as planning cyberattacks, probing vulnerable infrastructure, and designing influence campaigns. With the U.S. presidential election looming just two weeks away, Anthropic is keenly aware of the potential for misuse. “Given the upcoming US elections, we’re on high alert for attempted misuses that could be perceived as undermining public trust in electoral processes,” the company wrote.

Industry Benchmarks

Anthropic says “The updated Claude 3.5 Sonnet shows wide-ranging improvements on industry benchmarks, with particularly strong gains in agentic coding and tool use tasks. On coding, it improves performance on SWE-bench Verified from 33.4% to 49.0%, scoring higher than all publicly available models—including reasoning models like OpenAI o1-preview and specialized systems designed for agentic coding. It also improves performance on TAU-bench, an agentic tool use task, from 62.6% to 69.2% in the retail domain, and from 36.0% to 46.0% in the more challenging airline domain. The new Claude 3.5 Sonnet offers these advancements at the same price and speed as its predecessor.”

Source: Anthropic

Relax Citizen, Safeguards Are in Place

Anthropic has put safeguards in place to prevent Sonnet’s new capabilities from being exploited for election-related meddling. They’ve implemented systems to monitor when Claude is asked to engage in such activities, such as generating social media content or interacting with government websites. The company is also taking steps to ensure that screenshots captured during tool usage will not be used for future model training. However, even Anthropic’s engineers have been caught off guard by some of the tool’s behaviors. In one instance, Claude unexpectedly stopped a screen recording, losing all the footage. In a lighthearted moment, the AI even began browsing photos of Yellowstone National Park during a coding demo, which Anthropic shared on X with a mix of amusement and surprise.

Anthropic emphasizes the importance of safety in rolling out this new capability. Claude has been rated at AI Safety Level 2, meaning it doesn’t require heightened security measures for current risks but still raises concerns about potential misuse, like prompt injection attacks. The company has implemented systems to monitor election-related activities and prevent abuses like content generation or social media manipulation.

Although Claude’s computer use is still slow and prone to errors, Anthropic is optimistic about its future. The company plans to refine the model to make it faster, more reliable, and easier to implement. Throughout the beta phase, developers are encouraged to provide feedback to help improve both the model’s effectiveness and its safety protocols.

Anthropic Releases New AI Model That Shows Early Signs of Dangerous Capabilities

Related market context

Anthropic’s dramatic model release strategy raises censorship risks, the shift to proprietary AI models is accelerating, and Chinese open source solutions are outperforming US counterparts | All-In Podcast

Coinbase Quantum Report Warns Millions Of Bitcoin Could Face Future Security Risks

THE THIRD RUSH: Where is the “Bitcoin” of the Ai Goldrush?

Ripple chases AI’s machine economy as XRPL stablecoins near $1 billion

Bitcoin Mining Cost Model Points To $47,000 Floor, But Analysts Urge Caution

Claude Fable 5 Puts 25% Odds on Bitcoin Reaching $95K by Year-End 2026