Critical Nvidia Bug Allows Container Escape, Host Takeover

A critical bug in Nvidia’s widely used Container Toolkit could allow a rogue user or software to escape their containers and ultimately take complete control of the underlying host.

The flaw, tracked as CVE-2024-0132, earned a 9.0 out of 10 CVSS severity rating, and affects all versions of Container Toolkit up to and including v1.16.1, and Nvidia GPU Operator up to and including 24.6.1.

Nvidia issued a fix on Wednesday with the latest version of Container Toolkit (v1.16.2) and Nvidia GPU Operator (v24.6.2). The vulnerability does not impact use cases where Container Device Interface (CDI) is used.
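For admins who want to turn those version numbers into a quick check, here is a minimal sketch in Python. It only compares a version string against the last affected releases named above. The component labels (`container-toolkit`, `gpu-operator`) and the function names are made up for illustration, the script does not discover what is installed on a host, and it does not model the CDI exemption.

```python
import re

# Last affected releases as reported in this article.
# The dictionary keys are hypothetical labels, not official package names.
LAST_AFFECTED = {
    "container-toolkit": (1, 16, 1),
    "gpu-operator": (24, 6, 1),
}


def parse_version(text):
    """Turn 'v1.16.1' or '24.6.1' into a tuple of ints such as (1, 16, 1)."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(component, version):
    """Return True if the version is at or below the last affected release."""
    return parse_version(version) <= LAST_AFFECTED[component]
```

Calling `is_affected("container-toolkit", "v1.16.1")` returns `True`, while the patched `v1.16.2` returns `False`. Tuples of integers are compared rather than raw strings so that, for example, 1.9.0 correctly sorts below 1.16.1.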

The toolkit is used across clouds and AI workloads. According to infosec house Wiz, 33 percent of cloud environments have a vulnerable version of Nvidia Container Toolkit installed.

Wiz security researchers found and disclosed the bug on September 1, and the GPU giant has confirmed it is as concerning as the cloud security shop makes it out to be.

“A successful exploit of this vulnerability may lead to code execution, denial of service, escalation of privileges, information disclosure, and data tampering,” Nvidia warned in its security advisory.

To be clear, the bug is exploitable by anyone or anything that has been allowed to run code within a container on a vulnerable host, or has managed to do so.

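CVE-2024-0132 is a Time of Check Time of Use (TOCTOU) vulnerability, a type of race condition. In this class of bug, a program checks a resource and then uses it, and an attacker changes the resource in the gap between the two steps. This can give the attacker access to resources that should be off limits.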
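For readers unfamiliar with the bug class, the following Python sketch shows a generic TOCTOU race on a file path. It is not the Nvidia exploit, whose details have not been published, and every name in it is hypothetical. The `between_check_and_use` hook is a stand-in for an attacker winning the race, so the demo is deterministic. It assumes a POSIX system where `os.O_NOFOLLOW` and symlinks are available.

```python
import os
import stat
import tempfile


def read_racy(path, between_check_and_use=None):
    """Check-then-use: the check and the open each resolve the path separately."""
    if os.path.islink(path):                 # time of check
        raise PermissionError("symlinks are not allowed")
    if between_check_and_use:
        between_check_and_use()              # attacker swaps the file here
    with open(path) as handle:               # time of use: path resolved again
        return handle.read()


def read_safer(path, before_open=None):
    """Open first, refusing symlinks, then inspect the descriptor actually opened."""
    if before_open:
        before_open()
    # O_NOFOLLOW makes open fail if the final path component is a symlink.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    with os.fdopen(fd) as handle:
        if not stat.S_ISREG(os.fstat(handle.fileno()).st_mode):
            raise PermissionError("not a regular file")
        return handle.read()


def demo():
    """Return (text leaked by the racy reader, whether the safer reader refused)."""
    with tempfile.TemporaryDirectory() as workdir:
        secret = os.path.join(workdir, "secret.txt")
        public = os.path.join(workdir, "public.txt")
        with open(secret, "w") as handle:
            handle.write("host secret")

        def reset_public():
            if os.path.lexists(public):
                os.remove(public)
            with open(public, "w") as handle:
                handle.write("harmless")

        def swap_for_symlink():
            os.remove(public)
            os.symlink(secret, public)

        reset_public()
        leaked = read_racy(public, swap_for_symlink)

        reset_public()
        try:
            read_safer(public, swap_for_symlink)
            blocked = False
        except OSError:
            blocked = True
        return leaked, blocked
```

In the racy version the check passes on a harmless file, yet the read lands on the secret because the path is resolved a second time. The safer version performs its check on the file descriptor it already holds, so there is no gap to exploit for that final path component. Real-world fixes typically also need to guard the parent directories.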

Specific to Nvidia Container Toolkit: “Any environment that allows the use of third party container images or AI models – either internally or as-a-service – is at higher risk given that this vulnerability can be exploited via a malicious image,” Wiz kids Shir Tamari, Ronen Shustin, and Andres Riancho said in a write-up about the bug.

To exploit CVE-2024-0132, an attacker would need to craft a specially designed image and then get the image to run on the target platform, either indirectly, by tricking the user into running the malicious image, or directly, if the attacker has access to shared GPU resources.

In a single-tenant compute environment, this could happen if a user downloads a malicious container image — say, via a social engineering attack where the user believes the container image is coming from a trusted source. In this scenario, the attacker could then take over the user’s workstation.

In a shared environment, such as a Kubernetes-powered one, however, a miscreant with permission to deploy a container could escape it and then access data or secrets of other applications on the same node or cluster, the researchers noted.

This second scenario “is especially relevant for AI service providers that allow customers to run their own GPU-enabled container images,” they warned.

“An attacker could deploy a harmful container, break out of it, and use the host machine’s secrets to target the cloud service’s control systems,” the researchers continued. “This could give the attacker access to sensitive information, like the source code, data, and secrets of other customers using the same service.” 

Wiz isn’t providing too many technical details about how to exploit the vuln because the security shop wants to ensure that vulnerable organizations have time to deploy the fix — and not have their host system taken over with root privileges.

But the researchers promised more to come soon, including exploit details, so it’s a good idea to get ahead of the would-be attackers on this one. ®
