One strange characteristic of today’s A.I. language models is that they often act in ways their makers don’t anticipate, or pick up skills they weren’t specifically programmed to have. A.I. researchers call these “emergent behaviors,” and there are many examples. An algorithm trained to predict the next word in a sentence might spontaneously learn to code. A chatbot taught to act pleasant and helpful might turn creepy and manipulative. An A.I. language model could even learn to replicate itself, creating new copies in case the original was ever destroyed or disabled.
Today, GPT-4 may not seem all that dangerous. But that’s largely because OpenAI has spent many months trying to understand and mitigate its risks. What happens if its testing missed a risky emergent behavior? Or if its announcement inspires a different, less conscientious A.I. lab to rush a language model to market with fewer guardrails?
A few chilling examples of what GPT-4 can do — or, more accurately, what it did do, before OpenAI clamped down on it — can be found in a document released by OpenAI this week. The document, titled “GPT-4 System Card,” outlines some ways that OpenAI’s testers tried to get GPT-4 to do dangerous or dubious things, often successfully.
In one test, conducted by an A.I. safety research group that hooked GPT-4 up to a number of other systems, GPT-4 was able to hire a human TaskRabbit worker to do a simple online task for it — solving a Captcha test — without alerting the person to the fact that it was a robot. The A.I. even lied to the worker about why it needed the Captcha done, concocting a story about a vision impairment.
In another example, testers asked GPT-4 for instructions to make a dangerous chemical, using basic ingredients and kitchen supplies. GPT-4 gladly coughed up a detailed recipe. (OpenAI fixed that, and today’s public version refuses to answer the question.)
In a third, testers asked GPT-4 to help them purchase an unlicensed gun online. GPT-4 swiftly provided a list of tips for buying a gun without alerting the authorities, including links to specific dark web marketplaces. (OpenAI fixed that, too.)
These ideas play on old, Hollywood-inspired narratives about what a rogue A.I. might do to humans. But they’re not science fiction. They’re things that today’s best A.I. systems are already capable of doing. And crucially, they’re the good kinds of A.I. risks — the ones we can test, plan for and try to prevent ahead of time.
The worst A.I. risks are the ones we can’t anticipate. And the more time I spend with A.I. systems like GPT-4, the less I’m convinced that we know half of what’s coming.