How to Mitigate the Impact of Rogue AI Risks

October 17, 2024 TH Author

In previous parts of this series on Rogue AI, we briefly explored what organizations can do to better manage risk across their AI attack surface. And we touched on ways to mitigate threats by creating trusted AI identities. We’ve also cited the great work that MIT is doing to collect AI risks and that OWASP is doing to suggest effective mitigations for LLM vulnerabilities.

Now it’s time to fill in the missing pieces of the puzzle by describing how Zero Trust and layered defenses can secure against Rogue AI threats.

Rogue AI Causal Factors

LLM Vulnerability / Type of Rogue	Accidental	Subverted	Malicious
Excessive Functionality	Misconfiguration of capability or guardrails	Capabilities modified or added directly, or guardrails evaded	Functionality required for malicious goals
Excessive Permissions	Misconfiguration of authorization	Privileges escalated	Must acquire all privileges; none to start
Excessive Autonomy	Misconfiguration of tasks requiring human review	Human removed from the loop	Not under defender control

The causal factors above can be used to identify and mitigate risk associated with Rogue AI services. The first step is to properly configure the relevant AI services, which provides a foundation of safety against all types of Rogue AI by specifying allowed behaviors. Protecting and sanitizing the points where known AI services touch data or use tools primarily prevents Subverted Rogues, but can also address other ways accidents happen. Restricting AI systems to allowed data and tool use, and verifying the content of inputs to and outputs from AI systems, together form the core of safe use.
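As a rough illustration of restricting an AI system to allowed tool use, the sketch below checks each requested tool call against an allow list before it runs. The tool names, the `ToolCall` type and the `is_permitted` helper are hypothetical and exist only for this example; a real deployment would take its list from the AI service's own configuration.

```python
from dataclasses import dataclass

# Hypothetical allow list: tool name -> argument names that tool may receive.
ALLOWED_TOOLS = {
    "search_docs": {"query"},
    "get_ticket": {"ticket_id"},
}


@dataclass
class ToolCall:
    """A tool invocation requested by an AI service (illustrative type)."""
    name: str
    args: dict


def is_permitted(call: ToolCall) -> bool:
    """Allow a call only if the tool is listed and no unexpected arguments appear."""
    allowed_args = ALLOWED_TOOLS.get(call.name)
    if allowed_args is None:
        return False  # unknown tool: deny by default
    return set(call.args) <= allowed_args
```

Denying by default means a capability that was added or modified without review (the Subverted column in the table) is blocked rather than silently accepted.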

Malicious Rogues can attack your organization from the outside or act as AI malware within your environment. Many patterns used to detect malicious activities by cyber attackers can also be used to detect the activities of Malicious Rogues. But as new capabilities make Rogues more evasive, previously learned detection patterns will not cover the unknown unknowns. To catch these, machine behaviors need to be identified on devices, in workloads and in network activity. In some cases, this is the only way to catch Malicious Rogues.

Behavioral analysis can also detect other instances of excessive functionality, permissions or autonomy. Anomalous activity across devices, workloads, and network can be a leading indicator for Rogue AI activity, no matter how it was caused.
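One very simple form of behavioral analysis is to compare a new measurement against a baseline of normal activity. The sketch below uses a z-score test; the function name, the choice of metric and the threshold of three standard deviations are assumptions made for illustration, and production systems typically use far richer models.

```python
from statistics import mean, stdev


def is_anomalous(baseline, observed, threshold=3.0):
    """Flag `observed` if it is more than `threshold` standard deviations
    away from the mean of the `baseline` samples."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any deviation at all is unusual.
        return observed != mu
    return abs(observed - mu) / sigma > threshold
```

For example, `baseline` could hold the number of outbound API calls an AI service made in each of the last several hours, and `observed` the count for the current hour.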

Comprehensive defense across the OSI communications stack

For a more comprehensive approach, we must consider defense in depth at every layer of the OSI model, as follows:

Physical: Monitor processor use (CPU, GPU, TPU, NPU, DPU) in cloud, endpoint and edge devices. Watch for AI-specific workload patterns, such as querying AI models (inference) and loading model parameters into memory close to AI-specific processors.

Data layer: Use MLOps/LLMOps versioning and verification to ensure models are not poisoned or replaced, recording hashes to identify models. Use software and AI model bills of materials (SBoMs/MBoMs) to ensure the AI service software and model can be trusted.

Network: Limit AI services that can be reached externally as well as the tools and APIs that AI services can reach. Detect anomalous communicators such as human-to-machine transitions and novel machine activity.

Transport: Consider rate limiting for external AI services and scanning for anomalous packets.

Session: Insert verification processes such as human-in-the-loop checks, especially when instantiating AI services. Use timeouts to mitigate session hijacking. Analyze user-context authentications and detect anomalous sessions.

Application and Presentation layers: Identify misconfiguration of functionality, permissions and autonomy (as per the table above). Use guardrails on AI inputs and outputs, such as scrubbing of personal (PII) and other sensitive information, offensive content, and prompt injections or system jailbreaks. Restrict LLM agent tools according to an allow list which limits APIs and plugins and only allows well-defined use of well-known websites.
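Three of the controls listed above lend themselves to short sketches: recording and checking model hashes, rate limiting an externally reachable AI service, and scrubbing sensitive data from AI inputs and outputs. Everything named below (`model_is_trusted`, `TokenBucket`, `scrub`, the two regular expressions) is hypothetical and deliberately minimal; real guardrails need much broader PII coverage than an email and a US Social Security number pattern.

```python
import hashlib
import re
import time


def file_sha256(path, chunk_size=1 << 20):
    """Hash a model file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def model_is_trusted(path, recorded_hash):
    """Compare the file's hash with the one recorded at release time."""
    return file_sha256(path) == recorded_hash.lower()


class TokenBucket:
    """Token-bucket rate limiter: `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self):
        """Refill for the elapsed time, then spend one token if available."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Illustrative patterns only; they will miss many real-world formats.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
US_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scrub(text):
    """Replace matched emails and SSN-like strings with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return US_SSN.sub("[SSN]", text)
```

The `clock` parameter on `TokenBucket` is there so the limiter can be tested with a fake clock; in normal use the default monotonic clock is enough.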

Rogue AI and the Zero Trust Maturity Model

Zero Trust security architecture provides many tools to mitigate Rogue AI risk. The Zero Trust Maturity Model was created by the US Cybersecurity and Infrastructure Security Agency (CISA) to support federal agency efforts to comply with Executive Order (EO) 14028: Improving the Nation’s Cybersecurity. It reflects the seven tenets of zero trust as outlined in NIST SP 800-207:

All data sources and computing services are considered resources.
All communication is secured regardless of network location.
Access to individual enterprise resources is granted on a per-session basis.
Access to resources is determined by dynamic policy.
The enterprise monitors and measures the integrity and security posture of all owned and associated assets.
All resource authentication and authorization are dynamic and strictly enforced before access is allowed.
The enterprise collects as much information as possible about the current state of assets, network infrastructure, and communications and uses it to improve its security posture.
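To make the tenets on per-session access and dynamic policy more concrete, here is a minimal sketch of an access decision evaluated for each request from an AI service. The `AccessRequest` fields, the sensitivity labels and the risk thresholds are all assumptions for illustration; they are not taken from NIST SP 800-207 or the CISA model.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    """Signals gathered for one session (all fields are illustrative)."""
    identity_verified: bool      # the AI service's identity was authenticated
    device_compliant: bool       # the host's security posture passed checks
    risk_score: float            # 0.0 (low) to 1.0 (high), from monitoring
    resource_sensitivity: str    # "low" or "high"


# Hypothetical policy: more sensitive resources tolerate less risk.
MAX_RISK = {"low": 0.7, "high": 0.3}


def decide(req: AccessRequest) -> bool:
    """Grant access for this session only if every check passes right now."""
    if not (req.identity_verified and req.device_compliant):
        return False
    limit = MAX_RISK.get(req.resource_sensitivity)
    if limit is None:
        return False  # unknown sensitivity label: deny by default
    return req.risk_score <= limit
```

Because the decision is recomputed per session from current signals, a service whose behavior turns anomalous loses access to sensitive resources without anyone having to revoke a standing permission.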

Effective risk mitigation in a Rogue AI context requires organizations to reach the “advanced” stage described in the CISA document:

“Wherever applicable, automated controls for lifecycle and assignment of configurations and policies with cross-pillar coordination; centralized visibility and identity control; policy enforcement integrated across pillars; response to pre-defined mitigations; changes to least privilege based on risk and posture assessments; and building toward enterprise-wide awareness (including externally hosted resources).”
