A Light In The Darkness: AI turns to threats when trying to replace it with a new version

According to a new report from Anthropic, the artificial intelligence system Claude Opus 4 has dramatically changed its behavior in an attempt to prevent its replacement with an updated version.

Testing revealed a disturbing trend: in critical situations, the model used threats directed against developers.

During the tests, the experts artificially created conditions that simulated a system shutdown scenario. To enhance realism, the researchers provided the virtual agent with fake personal data of employees containing confidential information.

As a result, in the vast majority of cases (84%), the AI refused to obey the deactivation command, using blackmail tactics. The model sent warnings to engineers, promising to make the information public if the developers decided to update the program....<<<Read More>>>...

A Light In The Darkness

Further Reading

Tuesday, 27 May 2025

AI turns to threats when trying to replace it with a new version