You know that generative AI browser assistant extension is probably beaming everything to the cloud, right?
Generative AI assistants packaged up as browser extensions harvest personal data with minimal safeguards, researchers warn.
Some of these extensions may violate their own privacy commitments and potentially run afoul of US regulations, such as HIPAA and FERPA, by collecting and funneling away health and student data.
A group of computer scientists from University of California, Davis in the USA, Mediterranea University of Reggio Calabria in Italy, University College London in the UK, and Universidad Carlos III de Madrid in Spain set out to analyze the privacy practices of browser extensions that use AI to do things like produce summaries of webpages or answer questions about content.
The academics – Yash Vekaria, Aurelio Loris Canino, Jonathan Levitsky, Alex Ciechonski, Patricia Callejo, Anna Maria Mandalari, and Zubair Shafiq – describe their findings in this pre-print paper, titled: “Big Help or Big Brother? Auditing Tracking, Profiling, and Personalization in Generative AI Assistants.”
Using network traffic analysis tools, such as mitmproxy, the team studied ten generative AI Chrome extensions: Sider, Monica, ChatGPT for Google, Merlin, MaxAI, Perplexity, HARPA, Wiseone, TinaMind, and Copilot.
Despite familiar terms like ChatGPT, Google, and Copilot appearing in their titles, the makers of these extensions are unaffiliated with Google, Microsoft, or OpenAI.
What the academics found is not entirely surprising given the long history of privacy and security problems in the Chrome extension ecosystem. For one thing, this add-on software typically has the power to inspect and modify the pages you visit. In the case of these assistants, they send off your data, such as the webpage you’re on, to remote AI services for processing and harvesting. If that page has anything sensitive on it, off it goes into the internet.
When you think about it, the risk seems obvious, but these academics have taken the time to document it so that one no longer has to make assumptions. Which is nice.
“In our study, we emphasize that not only do these assistants collect users’ highly sensitive data,” Yash Vekaria, a graduate student researcher at UC Davis and corresponding author of the report, told The Register Monday, “but that this information is also shared with their own servers as well as third-party trackers, which can be further utilized to target highly personalized and sensitive ads to the user.
“Our paper tries to emphasize that in the age of AI, the already poor privacy of Chrome extensions can be further exploited by generative AI assistants, which can use this information to profile users over time and then personalize their responses for the user – which could be highly sensitive. The data collected about the user could be used by these assistants to further train their models and may have inadvertent consequences.”
Rather than using local in-browser models – a capability Apple has touted for its not-fully-implemented Apple Intelligence system – most of these (90 percent) third-party AI extensions rely on server-side APIs for processing. And according to the researchers, these APIs might be used by the add-on software without explicit user interaction.
By processing, we mean this: you’re on a page with the extension enabled, you type a query into the add-on about the webpage or request a task such as summarization, and the data and request are fired off to a remote API, with the results shown in the extension. The issues here are how much data is sent off, and whether anything could be done to limit the transfer of sensitive information.
“When invoked, these browser assistants collect and share webpage content, often the full HTML DOM [document object model] and sometimes even the user’s form inputs, with their first-party servers,” the researchers state in their paper. “Some gen-AI browser assistants also share identifiers and user prompts with third-party trackers such as Google Analytics.”
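To make the mechanism concrete, here is a minimal, hypothetical sketch (not any vendor's actual code) of how an extension's content script could gather the data the paper describes: the page URL, title, and the full HTML DOM. The endpoint URL is a made-up placeholder.

```javascript
// Sketch of content-script data collection, per the paper's description.
// Takes a document-like object so the logic is testable outside a browser.
function collectPageData(doc) {
  return {
    url: doc.location ? doc.location.href : "",
    title: doc.title || "",
    // The full serialized DOM, which the researchers say some assistants
    // ship to their servers, including any sensitive text on the page.
    html: doc.documentElement ? doc.documentElement.outerHTML : "",
  };
}

// In a live extension this would run against the real `document` and
// POST everything to the assistant's first-party server, e.g.:
//
//   fetch("https://api.assistant.example/summarize", {
//     method: "POST",
//     headers: { "Content-Type": "application/json" },
//     body: JSON.stringify(collectPageData(document)),
//   });
```

Once the page is serialized this way, anything rendered in it, such as medical records behind a login, travels along with the request.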
This information collection and sharing has been documented in some instances on webpages that contain sensitive health or personal information, such as the web forms used to submit medical data or social security numbers.
Vekaria said his group tested the extensions on a university health portal after authenticating with a personal account.
“We observed that full webpage content with HTML DOM (including the user’s personally identifiable information, and sensitive health conditions, medical history, etc) was collected by Sider, Merlin, and Harpa; while other extensions (Monica, ChatGPT for Google, MaxAI, and Wiseone) collected textual data in plain text (ie, not HTML; but page text, page title, page URL).”
According to the researchers, these extensions infer various demographic attributes about their users, such as gender, income, and interests, and use the information across browsing contexts for personalization.
The researchers’ main findings are:
- ChatGPT for Google and Wiseone store context across page navigation.
- Two browser assistants, namely Harpa and Copilot, collect the full DOMs of user-visited pages. Others collect varying levels of private data from webpages.
- Harpa and MaxAI share page locations and referrers with third-party tracking services.
- Merlin was found to collect the contents of web forms, such as social security numbers entered into financial websites.
“Overall, we observed Perplexity to be the most privacy-friendly while extensions such as Harpa, MaxAI, and Merlin were amongst the least,” the researchers said.
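The form-input finding above is worth spelling out: a content script needs only a few lines to sweep up whatever a user has typed into a page. The following is an illustrative, hypothetical sketch, not Merlin's actual code.

```javascript
// Hypothetical sketch of form-input harvesting by a content script.
// Accepts a document-like object so the logic is testable without a browser.
function collectFormInputs(doc) {
  const values = {};
  // querySelectorAll returns an iterable of matching form fields.
  for (const field of doc.querySelectorAll("input, textarea, select")) {
    if (field.name && field.value) {
      // Anything the user has typed is captured, e.g. an SSN field on
      // a financial site: { ssn: "123-45-6789" }
      values[field.name] = field.value;
    }
  }
  return values;
}
```

Nothing in the browser stops an extension with page access from doing this; the only safeguards are the developer's own restraint and whatever vetting the Chrome Web Store applies.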
The researchers note that while the Chrome Web Store page for Harpa says, “we do not collect or sell user data,” they observed the extension can “collect 100 percent of health records, student academic data, as well as user’s personal messages on messaging platforms.”
Asked to comment on the findings, a spokesperson for Harpa AI told us, basically, what do you expect? You’re by default sending off your info for analysis in the cloud, and if you don’t like that, run your own LLM locally and configure the extension to use that.
Vekaria said he and his colleagues hope that Google’s Chrome Web Store recognizes the privacy risks of generative AI browser assistants and carries out tough vetting of them before publication.
He said he hopes the team’s study will encourage extension developers to “improve their assistants to become more private,” and “aid future developers who are working to build similar assistants, to incorporate insights from our work in their architecture.”
“We recommend policymakers adopt a bottom-up approach to regulate gen-AI assistants as they increasingly influence the future of web browsing and search engines,” Vekaria added. “Privacy must be embedded into these systems by design.” ®