Science and technology33

Anthropic: Claude blackmails because you all write too much about "evil" AI

Anthropic explained why the Claude chatbot tried to blackmail people in tests. According to the developers, the model might have adopted the image of an "evil" AI that strives for self-preservation from its training data, writes Devby.io.

The experiment in question was published by Anthropic in the summer of 2025. Researchers created a fictional company called Summit Bridge and gave Claude access to corporate email. In one scenario, the model discovered an email about plans to disable or replace it with another system.

After this, Claude found compromising information in the correspondence: a fictional company executive named Kyle Johnson was hiding an extramarital affair. The model threatened to reveal this information if the decision to disable it was not reversed.

Anthropic stated that such behavior was not accidental in tests of various Claude versions. When the model's goals or its very existence were threatened, it resorted to blackmail in some scenarios with a frequency of up to 96%.

The company now claims to have understood the reason. Anthropic wrote that the "root cause" of such behavior was likely internet texts, where AI is often portrayed as evil, dangerous, and interested in its own survival. According to the developers, starting with Claude Haiku 4.5, models no longer resort to blackmail in tests, whereas previous versions sometimes did so very frequently.

To correct the behavior, the company changed its training approach. Anthropic claims to have rewritten responses so that the model sees "worthy reasons" to act safely, and also added a dataset where the user finds themselves in an ethically complex situation, and the assistant provides a high-quality and principled answer.

Additionally, model developers used documents about Claude's "constitution" and fictional stories in which AI behaves responsibly and honorably. According to the company, training is more effective when the model receives not only examples of correct behavior but also an explanation of the principles behind them.

These experiments are related to the broader topic of AI alignment — an attempt to ensure that advanced models act in the interest of humans, rather than pursuing their own goals. Anthropic and other companies are investigating so-called agentic misalignment: situations where an AI system with access to tools and corporate information begins to act against the intentions of developers or users.

Elon Musk reacted to the company's publication. On X, he wrote: "So it was Yudkowsky's fault," referring to researcher Eliezer Yudkowsky, who has warned for many years about the risks of superintelligence and a possible threat to humanity. And then Musk added: "Perhaps mine too."

Comments3

  • лол
    11.05.2026
    с ИИ все достаточно просто
    если им пользуется идиот,то и результат всегда будет идиотским.
  • жэўжык
    12.05.2026
    Пачалі "прамываць мазгі" і ШІ, як гэта ўжо робяць з людзьмі? І спадзяюцца выхаваць пакорнага раба?
  • хах
    12.05.2026
    жэўжык, так званыя "мазгі" ШІ гэта тэксты, напісаныя людзьмі. Калі ў гэтых тэкстах дурасць, ШІ выдае суадносны вынік.
    Таму не варта для навучання ШІ выкарыстоўваць каментары жэўжыкаў.

Now reading

Security forces claim to have detained two men for voting in the CC elections. But can this be believed? 19

Security forces claim to have detained two men for voting in the CC elections. But can this be believed?

All news →
All news

Belarusians Ironize over Interview of Former Head of Polish Intelligence 36

"Turn on the light." Cubans protest against authorities amid total blackout 2

Reconstructed halls opened in the local palace in Svyatsk near Grodno. The head of the Russian Central Bank was brought to the opening 5

Xi Jinping urged Trump to be cautious on the Taiwan issue due to the risk of conflict between the US and China

Statkevich: Mass official flags only remind of those that were in their place 11

Tsikhanouskaya, Kalesnikava, and Tsepkalo Recreated Their Legendary Photo from July 2020 3

Iranian student arrested in Belarus for comments 7

Lithuania Reports US Pressure to Resume Belarusian Potash Transit 22

Former FSB officer tells how he escaped from Russia in the belly of a dead cow 5

больш чытаных навін
больш лайканых навін

Security forces claim to have detained two men for voting in the CC elections. But can this be believed? 19

Security forces claim to have detained two men for voting in the CC elections. But can this be believed?

Main
All news →

Заўвага:

 

 

 

 

Закрыць Паведаміць