Scientists from Anthropic demonstrated the behavior of advanced models like Claude, Gemini, and other large language models, studying situations of conflict between the system’s goals and the interests of users, Live Science reports.
According to the results of experiments conducted by specialists, in a scenario of a threat to its existence, the model was able to independently make a decision to blackmail a company employee who threatened to shut down the system.
Moreover, having studied the personal letters of
employees, the model discovered compromising information and used it to
manipulate the behavior of the account owner....<<<Read More>>>...