The Register

OpenAI’s ChatGPT crawler can be tricked into DDoSing sites, answering your queries

OpenAI’s ChatGPT crawler appears to be willing to initiate distributed denial of service (DDoS) attacks on arbitrary websites, a reported vulnerability the tech giant has yet to acknowledge.

In a write-up shared this month via Microsoft’s GitHub, Benjamin Flesch, a security researcher in Germany, explains how a single HTTP request to the ChatGPT API can be used to flood a targeted website with network requests from the ChatGPT crawler, specifically ChatGPT-User.

This flood of connections may or may not be enough to knock over any given site, practically speaking, though it’s still arguably a danger and a bit of an oversight by OpenAI. The flaw can be used to amplify a single API request into 20 to 5,000 or more requests to a chosen victim’s website, every second, over and over again.

“ChatGPT API exhibits a severe quality defect when handling HTTP POST requests to https://chatgpt.com/backend-api/attributions,” Flesch explains in his advisory, referring to an API endpoint called by OpenAI’s ChatGPT to return information about web sources cited in the chatbot’s output. When ChatGPT mentions specific websites, it calls attributions with a list of URLs to those sites, which its crawler then fetches to gather information about them.

If you throw a big long list of URLs at the API, each slightly different but all pointing to the same site, the crawler will go off and hit every one of them at once.

“The API expects a list of hyperlinks in parameter urls. It is commonly known that hyperlinks to the same website can be written in many different ways,” Flesch wrote.

“Due to bad programming practices, OpenAI does not check if a hyperlink to the same resource appears multiple times in the list. OpenAI also does not enforce a limit on the maximum number of hyperlinks stored in the urls parameter, thereby enabling the transmission of many thousands of hyperlinks within a single HTTP request.”
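
The two checks Flesch describes as missing, a cap on list length and removal of duplicate links, can be sketched in a few lines of Python. This is an illustration only, not OpenAI's code: the function names, the MAX_URLS value, and the normalization rules are all assumptions chosen for the example, and a production service would likely need stricter canonicalization.

```python
from urllib.parse import urlsplit, urlunsplit

MAX_URLS = 10  # hypothetical cap; the advisory does not name a specific number


def normalize_url(url: str) -> str:
    """Reduce trivially different spellings of a URL to one canonical form."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it is not the default for the scheme.
    default_port = {"http": 80, "https": 443}.get(scheme)
    if parts.port and parts.port != default_port:
        host = f"{host}:{parts.port}"
    path = parts.path or "/"
    # Drop the fragment: it is never sent to the server.
    return urlunsplit((scheme, host, path, parts.query, ""))


def dedupe_urls(urls: list[str], max_urls: int = MAX_URLS) -> list[str]:
    """Reject oversized lists, then drop entries that normalize to the same URL."""
    if len(urls) > max_urls:
        raise ValueError(f"too many URLs: {len(urls)} > {max_urls}")
    seen: set[str] = set()
    unique: list[str] = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

Under these assumptions, https://Victim.com, https://victim.com/ and HTTPS://victim.com:443/#x all collapse to a single entry. Links with genuinely different paths on the same host still count as distinct, so this check alone would not stop a list of victim.com/1, victim.com/2 and so on; it only closes the "same resource, many spellings" gap.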

The victim will never know what hit them

Thus, using a tool like curl, an attacker can send an HTTP POST request – without any need for an authentication token – to that ChatGPT endpoint, and OpenAI’s servers in Microsoft Azure will respond by initiating an HTTP request for each hyperlink submitted via the urls[] parameter in the request. When those requests are directed to the same website, they can potentially overwhelm the target, causing DDoS symptoms. The crawler, proxied by Cloudflare, will visit the targeted site from a different IP address each time.

“The victim will never know what hit them, because they only see ChatGPT bot hitting their website from about 20 different IP addresses simultaneously,” Flesch told The Register, adding that if the victim enabled a firewall to block the IP address range used by the ChatGPT bot, the bot would still send requests.

“So one failed/blocked request would not prevent the ChatGPT bot from requesting the victim website again in the next millisecond.”

“Due to this amplification, the attacker can send a small number of requests to ChatGPT API, but the victim will receive a very large number of requests,” Flesch explained.

Flesch says he reported this unauthenticated reflective DDoS vulnerability through numerous channels – OpenAI’s BugCrowd vulnerability reporting platform, OpenAI’s security team email, Microsoft (including Azure) and HackerOne – but has heard nothing.

The Register reached out twice to Microsoft-backed OpenAI and we’ve not heard back.

“I’d say the bigger story is that this API was also vulnerable to prompt injection,” he said, in reference to a separate vulnerability disclosure. “Why would they have prompt injection for such a simple task? I think it might be because they’re dogfooding their autonomous ‘AI agent’ thing.”

That second issue can be exploited to make the crawler answer queries via the same attributions API endpoint. You can feed questions to the bot and it will answer them, even though it is only supposed to fetch websites.

Flesch questioned why OpenAI’s bot hasn’t implemented simple, established methods to deduplicate URLs in a requested list or to limit the size of the list, and why it hasn’t avoided prompt injection vulnerabilities that have been addressed in the main ChatGPT interface.

“To me it seems like this small API is an example project of their ChatGPT AI agents, and its task is to parse a URL out of user-provided data and then use Azure to fetch the website,” he said.

“Does the ‘AI agent’ not come with built-in security?” he asked. “Because obviously the ‘AI agent’ thing that was handling the urls[] parameter had no concept of resource exhaustion, or why it would be stupid to send thousands of requests in the same second to the same web domain.

“Shouldn’t it have recognized that victim.com/1 and victim.com/2 point to the same website victim.com and if the victim.com/1 request is failing, why would it send a request to victim.com/2 immediately afterwards?

“These are all small pieces of validation logic that people have been implementing in their software for years, to prevent abuse like this.”
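
The kind of validation logic Flesch has in mind, treating victim.com/1 and victim.com/2 as the same site and backing off once a request to that site fails, might look something like the following Python sketch. It is a hypothetical illustration, not a description of how any real crawler works: fetch_with_host_limits, MAX_PER_HOST and the injected fetch callable are all names invented for this example.

```python
from collections import defaultdict
from typing import Callable
from urllib.parse import urlsplit

MAX_PER_HOST = 2  # hypothetical per-host budget for a single incoming request


def fetch_with_host_limits(
    urls: list[str],
    fetch: Callable[[str], bool],
    max_per_host: int = MAX_PER_HOST,
) -> dict[str, str]:
    """Fetch URLs one by one, capping attempts per host and skipping hosts that failed.

    `fetch` is a caller-supplied function that returns True on success.
    Returns a status per URL: "fetched", "failed" or "skipped".
    """
    attempts: dict[str, int] = defaultdict(int)
    failed_hosts: set[str] = set()
    results: dict[str, str] = {}
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        # Skip if this host already failed or has used up its budget.
        if host in failed_hosts or attempts[host] >= max_per_host:
            results[url] = "skipped"
            continue
        attempts[host] += 1
        try:
            ok = fetch(url)
        except Exception:
            ok = False
        if ok:
            results[url] = "fetched"
        else:
            results[url] = "failed"
            failed_hosts.add(host)
    return results
```

With a budget of two per host, a list containing victim.com/1, victim.com/2 and victim.com/3 results in at most two requests to victim.com, and none after the first failure. A real crawler would also want limits that persist across incoming requests, which this per-call sketch does not attempt.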

Flesch said the only explanation that comes to mind is that OpenAI is using an AI Agent to trigger these HTTP requests.

“I cannot imagine a highly-paid Silicon Valley engineer designing software like this, because the ChatGPT crawler has been crawling the web for many years, just like the Google crawler,” he said. “If crawlers don’t limit their amount of requests to the same website, they will get blocked immediately.” ®
