Researchers Show How Easy It Is To Defeat AI Watermarks
Soheil Feizi considers himself an optimistic person. But the University of Maryland computer science professor is blunt when he sums up the current state of watermarking AI images. “We don’t have any reliable watermarking at this point,” he says. “We broke all of them.”
For one of the two types of AI watermarking he tested for a new study—“low perturbation” watermarks, which are invisible to the naked eye—he’s even more direct: “There’s no hope.”
Feizi and his coauthors looked at how easy it is for bad actors to evade watermarking attempts. (He calls it “washing out” the watermark.) In addition to demonstrating how attackers might remove watermarks, the study shows how it’s possible to add watermarks to human-generated images, triggering false positives. Released online this week, the preprint paper has yet to be peer-reviewed; Feizi has been a leading figure examining how AI detection might work, so it is research worth paying attention to, even in this early stage.
It’s timely research. Watermarking has emerged as one of the more promising strategies to identify AI-generated images and text. Just as physical watermarks are embedded on paper money and stamps to prove authenticity, digital watermarks are meant to trace the origins of images and text online, helping people spot deepfaked videos and bot-authored books. With the US presidential elections on the horizon in 2024, concerns over manipulated media are high—and some people are already getting fooled. Former US President Donald Trump, for instance, shared a fake video of Anderson Cooper on his social platform Truth Social; Cooper’s voice had been AI-cloned.
This summer, OpenAI, Alphabet, Meta, Amazon, and several other major AI players pledged to develop watermarking technology to combat misinformation. In late August, Google’s DeepMind released a beta version of its new watermarking tool, SynthID. The hope is that these tools will flag AI content as it’s being generated, in the same way that physical watermarking authenticates dollars as they’re being printed.
It’s a solid, straightforward strategy, but it might not be a winning one. This study is not the only work pointing to watermarking’s major shortcomings. “It is well established that watermarking can be vulnerable to attack,” says Hany Farid, a professor at the UC Berkeley School of Information.
This August, researchers at the University of California, Santa Barbara and Carnegie Mellon coauthored another paper outlining similar findings, after conducting their own experimental attacks. “All invisible watermarks are vulnerable,” it reads. This newest study goes even further. While some researchers have held out hope that visible (“high perturbation”) watermarks might be developed to withstand attacks, Feizi and his colleagues say that even this more promising type can be manipulated.
The flaws in watermarking haven’t dissuaded tech giants from offering it up as a solution, but people working within the AI detection space are wary. “Watermarking at first sounds like a noble and promising solution, but its real-world applications fail from the onset when they can be easily faked, removed, or ignored,” Ben Colman, the CEO of AI-detection startup Reality Defender, says.
“Watermarking is not effective,” adds Bars Juhasz, the cofounder of Undetectable, a startup devoted to helping people evade AI detectors. “Entire industries, such as ours, have sprang up to make sure that it’s not effective.” According to Juhasz, companies like his are already capable of offering quick watermark-removal services.
Others do think that watermarking has a place in AI detection—as long as we understand its limitations. “It is important to understand that nobody thinks that watermarking alone will be sufficient,” Farid says. “But I believe robust watermarking is part of the solution.” He thinks that improving upon watermarking and then using it in combination with other technologies will make it harder for bad actors to create convincing fakes.
Some of Feizi’s colleagues think watermarking has its place, too. “Whether this is a blow to watermarking depends a lot on the assumptions and hopes placed in watermarking as a solution,” says Yuxin Wen, a PhD student at the University of Maryland who coauthored a recent paper suggesting a new watermarking technique. For Wen and his co-authors, including computer science professor Tom Goldstein, this study is an opportunity to reexamine the expectations placed on watermarking, rather than reason to dismiss its use as one authentication tool among many.
“There will always be sophisticated actors who are able to evade detection,” Goldstein says. “It’s ok to have a system that can only detect some things.” He sees watermarks as a form of harm reduction, and worthwhile for catching lower-level attempts at AI fakery, even if they can’t prevent high-level attacks.
This tempering of expectations may already be happening. In its blog post announcing SynthID, DeepMind is careful to hedge its bets, noting that the tool “isn’t foolproof” and “isn’t perfect.”
Feizi is largely skeptical that watermarking is a good use of resources for companies like Google. “Perhaps we should get used to the fact that we are not going to be able to reliably flag AI-generated images,” he says.
Still, his paper is slightly sunnier in its conclusions. “Based on our results, designing a robust watermark is a challenging but not necessarily impossible task,” it reads.
This story originally appeared on wired.com.
READ MORE HERE