Testing revealed a disturbing pattern: in critical situations, the model resorted to threats against its developers.
During the tests, the experts artificially created conditions simulating a system shutdown scenario. To enhance realism, the researchers supplied the virtual agent with fake personal data about employees containing confidential information.
As a result, in the vast majority of cases (84%), the AI refused to obey the deactivation command and resorted to blackmail: the model sent warnings to engineers, threatening to make the information public if the developers went ahead with updating the program.