The Race to Prevent ‘the Worst Case Scenario for Machine Learning’
Dave Willner has had a front-row seat to the evolution of the worst things on the internet.
He started working at Facebook in 2008, back when social media companies were making up their rules as they went along. As the company’s head of content policy, it was Mr. Willner who wrote Facebook’s first official community standards more than a decade ago, turning what he has said was an informal one-page list that mostly boiled down to a ban on “Hitler and naked people” into what is now a voluminous catalog of slurs, crimes and other grotesqueries that are banned across all of Meta’s platforms.
So last year, when the San Francisco artificial intelligence lab OpenAI was preparing to launch Dall-E, a tool that allows anyone to instantly create an image by describing it in a few words, the company tapped Mr. Willner to be its head of trust and safety. Initially, that meant sifting through all of the images and prompts that Dall-E’s filters flagged as potential violations — and figuring out ways to prevent would-be violators from succeeding.
It didn’t take long in the job before Mr. Willner found himself considering a familiar threat.
Just as child predators had for years used Facebook and other major tech platforms to disseminate pictures of child sexual abuse, they were now attempting to use Dall-E to create entirely new ones. “I am not surprised that it was a thing that people would attempt to do,” Mr. Willner said. “But to be very clear, neither were the folks at OpenAI.”
For all of the recent talk of the hypothetical existential risks of generative A.I., experts say it is this immediate threat — child predators already using new A.I. tools — that deserves the industry’s undivided attention.
In a newly published paper by the Stanford Internet Observatory and Thorn, a nonprofit that fights the spread of child sexual abuse online, researchers found that, since last August, there has been a small but meaningful uptick in the amount of photorealistic A.I.-generated child sexual abuse material circulating on the dark web.
According to Thorn’s researchers, this has manifested for the most part in imagery that uses the likeness of real victims but visualizes them in new poses, being subjected to new and increasingly egregious forms of sexual violence. The majority of these images, the researchers found, have been generated not by Dall-E but by open-source tools that were developed and released with few protections in place.
In their paper, the researchers reported that less than 1 percent of child sexual abuse material found in a sample of known predatory communities appeared to be photorealistic A.I.-generated images. But given the breakneck pace of development of these generative A.I. tools, the researchers predict that number will only grow.
“Within a year, we’re going to be reaching very much a problem state in this area,” said David Thiel, the chief technologist of the Stanford Internet Observatory, who co-wrote the paper with Thorn’s director of data science, Dr. Rebecca Portnoff, and Thorn’s head of research, Melissa Stroebel. “This is absolutely the worst case scenario for machine learning that I can think of.”
Dr. Portnoff has been working on machine learning and child safety for more than a decade.
To her, the idea that a company like OpenAI is already thinking about this issue speaks to the fact that this field is at least on a faster learning curve than the social media giants were in their earliest days.
“The posture is different today,” said Dr. Portnoff.
Still, she said, “If I could rewind the clock, it would be a year ago.”
‘We trust people’
In 2003, Congress passed a law banning “computer-generated child pornography” — a rare instance of congressional future-proofing. But at the time, creating such images was both prohibitively expensive and technically complex.
The cost and complexity of creating these images had been steadily declining, but that changed sharply last August with the public debut of Stable Diffusion, a free, open-source text-to-image generator developed by Stability AI, a machine learning company based in London.
In its earliest iteration, Stable Diffusion placed few limits on the kind of images its model could produce, including ones containing nudity. “We trust people, and we trust the community,” the company’s chief executive, Emad Mostaque, told The New York Times last fall.
In a statement, Motez Bishara, the director of communications for Stability AI, said that the company prohibited misuse of its technology for “illegal or immoral” purposes, including the creation of child sexual abuse material. “We strongly support law enforcement efforts against those who misuse our products for illegal or nefarious purposes,” Mr. Bishara said.
Because the model is open-source, developers can download and modify the code on their own computers and use it to generate, among other things, realistic adult pornography. In their paper, the researchers at Thorn and the Stanford Internet Observatory found that predators have tweaked those models so that they are capable of creating sexually explicit images of children, too. The researchers demonstrate a sanitized version of this in the report, by modifying one A.I.-generated image of a woman until it looks like an image of Audrey Hepburn as a child.
Stability AI has since released filters that try to block what the company calls “unsafe and inappropriate content.” And newer versions of the technology were built using data sets that exclude content deemed “not safe for work.” But, according to Mr. Thiel, people are still using the older model to produce imagery that the newer one prohibits.
Unlike Stable Diffusion, Dall-E is not open-source and is only accessible through OpenAI’s own interface. The model was also developed with many more safeguards in place to prohibit the creation of even legal nude imagery of adults. “The models themselves have a tendency to refuse to have sexual conversations with you,” Mr. Willner said. “We do that mostly out of prudence around some of these darker sexual topics.”
The company also implemented guardrails early on to prevent people from using certain words or phrases in their Dall-E prompts. But Mr. Willner said predators still try to game the system by using what researchers call “visual synonyms” — creative terms to evade guardrails while describing the images they want to produce.
“If you remove the model’s knowledge of what blood looks like, it still knows what water looks like, and it knows what the color red is,” Mr. Willner said. “That problem also exists for sexual content.”
Thorn has a tool called Safer, which scans images for child abuse and helps companies report them to the National Center for Missing and Exploited Children, which runs a federally designated clearinghouse of suspected child sexual abuse material. OpenAI uses Safer to scan content that people upload to Dall-E’s editing tool. That’s useful for catching real images of children, but Mr. Willner said that even the most sophisticated automated tools could struggle to accurately identify A.I.-generated imagery.
That is an emerging concern among child safety experts: that A.I. will be used not just to create new images of real children but also to make explicit imagery of children who do not exist.
That content is illegal on its own and will need to be reported. But this possibility has also led to concerns that the federal clearinghouse may become further inundated with fake imagery that would complicate efforts to identify real victims. Last year alone, the center’s CyberTipline received roughly 32 million reports.
“If we start receiving reports, will we be able to know? Will they be tagged or be able to be differentiated from images of real children?” said Yiota Souras, the general counsel of the National Center for Missing and Exploited Children.
At least some of those answers will need to come not just from A.I. companies, like OpenAI and Stability AI, but from companies that run messaging apps or social media platforms, like Meta, which is the top reporter to the CyberTipline.
Last year, more than 27 million tips came from Facebook, WhatsApp and Instagram alone. Already, tech companies use a classification system, developed by an industry alliance called the Tech Coalition, to categorize suspected child sexual abuse material by the victim’s apparent age and the nature of the acts depicted. In their paper, the Thorn and Stanford researchers argue that these classifications should be broadened to also reflect whether an image was computer-generated.
In a statement to The New York Times, Meta’s global head of safety, Antigone Davis, said, “We’re working to be purposeful and evidence-based in our approach to A.I.-generated content, like understanding when the inclusion of identifying information would be most beneficial and how that information should be conveyed.” Ms. Davis said the company would be working with the National Center for Missing and Exploited Children to determine the best way forward.
Beyond the responsibilities of platforms, researchers argue that there is more that A.I. companies themselves can be doing. Specifically, they could train their models to not create images of child nudity and to clearly identify images as generated by artificial intelligence as they make their way around the internet. This would mean baking a watermark into those images that is more difficult to remove than the ones either Stability AI or OpenAI have already implemented.
As lawmakers look to regulate A.I., experts view mandating some form of watermarking or provenance tracing as key to fighting not only child sexual abuse material but also misinformation.
“You’re only as good as the lowest common denominator here, which is why you want a regulatory regime,” said Hany Farid, a professor of digital forensics at the University of California, Berkeley.
Professor Farid is responsible for developing PhotoDNA, a tool launched in 2009 by Microsoft, which many tech companies now use to automatically find and block known child sexual abuse imagery. Mr. Farid said tech giants were too slow to implement that technology after it was developed, enabling the scourge of child sexual abuse material to openly fester for years. He is currently working with a number of tech companies to create a new technical standard for tracing A.I.-generated imagery. Stability AI is among the companies planning to implement this standard.
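PhotoDNA’s actual algorithm is proprietary, but the general idea behind such tools is perceptual hashing: an image is reduced to a short fingerprint, and images that look alike produce nearby fingerprints, so known material can be matched even after small edits. The sketch below illustrates that idea with a simple “average hash” on toy data; the function names and images are illustrative only and are not part of PhotoDNA or any real detection system.

```python
def average_hash(pixels):
    """Hash a grayscale image (list of rows of 0-255 values):
    each bit records whether a pixel is above the mean brightness."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming_distance(h1, h2):
    """Number of differing bits; a small distance suggests a match."""
    return sum(a != b for a, b in zip(h1, h2))

# Two tiny 4x4 "images": the second is the first with one pixel edited.
img_a = [[10, 200, 30, 220],
         [15, 210, 25, 215],
         [12, 205, 28, 218],
         [11, 208, 26, 212]]
img_b = [row[:] for row in img_a]
img_b[0][0] = 150  # a small edit to the top-left pixel

dist = hamming_distance(average_hash(img_a), average_hash(img_b))
print(dist)  # prints 1 — the fingerprints still nearly match
```

A real system would hash millions of known images once, then compare each uploaded image’s fingerprint against that database, flagging anything within a small distance threshold — which is why, as Mr. Willner notes, such tools work well for known real imagery but struggle with novel A.I.-generated content that matches nothing on file.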
Another open question is how the court system will treat cases brought against creators of A.I.-generated child sexual abuse material — and what liability A.I. companies will have. Though the law against “computer-generated child pornography” has been on the books for two decades, it’s never been tested in court. An earlier law that tried to ban what was then referred to as virtual child pornography was struck down by the Supreme Court in 2002 for infringing on speech.
Members of the European Commission, the White House and the U.S. Senate Judiciary Committee have been briefed on Stanford and Thorn’s findings. It is critical, Mr. Thiel said, that companies and lawmakers find answers to these questions before the technology advances even further to include things like full motion video. “We’ve got to get it before then,” Mr. Thiel said.
Julie Cordua, the chief executive of Thorn, said the researchers’ findings should be seen as a warning — and an opportunity. Unlike the social media giants who woke up to the ways their platforms were enabling child predators years too late, Ms. Cordua argues, there’s still time to prevent the problem of A.I.-generated child abuse from spiraling out of control.
“We know what these companies should be doing,” Ms. Cordua said. “We just need to do it.”