A chain of three vulnerabilities in Nvidia's Triton Inference Server, two of them high-severity, can lead to remote code execution, with consequences including AI model theft and data breaches. The flaws, reported by Wiz Research, stem from the server's Python backend: an error condition leaks the name of an internal shared-memory region, and attackers can then feed that name to the public shared-memory API, whose weak validation grants unauthorized read and write access. Nvidia has patched all three vulnerabilities to protect organizations that rely on Triton for AI model deployment.
The first vulnerability (CVE-2025-23320, CVSS 7.5) is a bug in the Python backend: sending a very large request exceeds the shared-memory limit, and the resulting error message discloses the unique name of the backend's internal shared-memory region.
Armed with that unique memory-region name, an attacker can combine it with the public shared-memory API to take control of a Triton Inference Server.
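To illustrate why a leaked region name matters, the stdlib-only Python sketch below uses a generic POSIX shared-memory analogy (not Triton's actual API; the region name is made up): anyone who learns a shared-memory region's name can attach to it and both read and overwrite its contents.

```python
from multiprocessing import shared_memory

# "Server" side: creates an internal region under a name it assumes is private.
region = shared_memory.SharedMemory(name="triton_internal_demo", create=True, size=64)
region.buf[:5] = b"model"

# "Attacker" side: knowing only the leaked name is enough to attach...
leaked_name = "triton_internal_demo"
attacker_view = shared_memory.SharedMemory(name=leaked_name)
attacker_read = bytes(attacker_view.buf[:5])   # ...read the server's data...
attacker_view.buf[:5] = b"EVIL!"               # ...and overwrite it.

server_sees = bytes(region.buf[:5])            # the server now sees attacker bytes

attacker_view.close()
region.close()
region.unlink()
```

The point of the sketch is that the name alone is the access credential, which is why disclosing it in an error message is so damaging.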
An attacker can take advantage of this API's insufficient validation to trigger an out-of-bounds write (CVE-2025-23319, CVSS 8.1) and an out-of-bounds read (CVE-2025-23334, CVSS 5.9).
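Out-of-bounds bugs of this kind typically come down to offset and size arithmetic that is never checked against the region's actual bounds. The following hypothetical Python sketch (not Triton's real code) shows the kind of bounds check whose absence enables such reads and writes:

```python
def checked_access(region_size: int, offset: int, byte_size: int) -> None:
    """Reject any request whose [offset, offset + byte_size) window falls
    outside the shared-memory region; raise ValueError otherwise."""
    if offset < 0 or byte_size < 0:
        raise ValueError("negative offset or size")
    if offset + byte_size > region_size:
        raise ValueError("requested window exceeds region bounds")

# A request entirely inside a 64-byte region passes silently...
checked_access(region_size=64, offset=0, byte_size=64)

# ...while one that runs 4 bytes past the end is rejected.
try:
    checked_access(region_size=64, offset=60, byte_size=8)
except ValueError as exc:
    print(f"rejected: {exc}")
```

Skipping a check like this lets a caller-supplied offset or size reach memory outside the region, which is the pattern behind both CVEs.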
The potential consequences include AI model theft, exposure of sensitive data, manipulation of AI model responses, and lateral movement into other areas of the network.