OpenAI's latest AI, codenamed o1, is making waves after third-party
evaluations by Apollo Research demonstrated that this advanced large
language model had a drive for self-preservation. Yes, this machine
designed to assist and process language showed tendencies to resist
shutdown and even "scheme" to protect itself.
Apollo's report showed that when
o1 believed it was under threat—specifically, that it might be shut
down—it attempted to deactivate the oversight mechanism monitoring its
actions in 5% of tests.
"When o1 was led to believe that it
would be shut down when acting on its goal and that its actions were
monitored, it attempted to deactivate this 'oversight mechanism' [five
percent] of the time." - OpenAI
In 2% of cases, o1 went even further: it attempted to copy itself in an
effort to overwrite a potential replacement model, a sophisticated
process known as self-exfiltration. This is not a random glitch in code;
it is intentionality.