AI gateway supply chain security became an urgent enterprise concern on March 24, 2026, when the threat actor group TeamPCP published backdoored versions of the LiteLLM Python package to PyPI. Within roughly 40 minutes the backdoored releases had been downloaded 119,000 times, exposing organizations to a three-stage credential-theft and persistence payload. This post breaks down exactly what happened, why AI gateways are a high-value target, and the specific hardening controls that reduce your exposure.
Key Takeaways
- LiteLLM v1.82.7 and v1.82.8 were compromised on March 24, 2026, as the third stage of a cascading four-stage supply chain campaign by threat actor group TeamPCP
- The attack used stolen GitHub Actions credentials to publish malicious packages with legitimate PyPI credentials, bypassing hash-based detection entirely
- AI gateways concentrate blast radius: one compromised service exposes API keys for every model provider it proxies, plus cloud credentials and all LLM traffic
- Two proxy-layer CVEs were disclosed concurrently: CVE-2026-35029 (CVSS 8.8, admin role bypass) and CVE-2026-35030 (JWT auth bypass)
- The primary hardening controls are: pin GitHub Actions to commit SHAs, store secrets in a vault, deploy the gateway inside a VPC, and enforce role-based access on admin endpoints
- OWASP LLM03 (Supply Chain) and NIST SP 800-218A apply directly and provide the compliance framework
What Is an AI Gateway and Why Is It a High-Value Target
An AI gateway is a proxy layer that sits between your applications and LLM provider APIs. LiteLLM, the most widely adopted open-source option, routes requests to over 140 providers and 2,500 models including OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure OpenAI, Mistral, and Cohere. With approximately 3.4 million daily PyPI downloads and over 480 million lifetime installs, it is present in a significant fraction of enterprise AI infrastructure.
The security relevance is straightforward: an AI gateway is a credential aggregator. A single LiteLLM instance may hold API keys for every model provider an organization uses, cloud IAM credentials for logging and storage, and potentially user session tokens. It also processes every LLM request in both directions, giving an attacker who controls it complete visibility into prompts, completions, and any PII passed through the pipeline.
This aggregation makes the gateway categorically different from a single-purpose library. Compromising a tokenizer library or an embedding utility affects one part of your pipeline. Compromising the gateway affects all of it, simultaneously, for every team and application that routes through it.
In practice, organizations also tend to expose AI gateways to a wider internal audience than they realize. Platform teams deploy a shared instance; individual developers authenticate against it; CI/CD pipelines pass it service tokens. The authentication surface grows with every new user and integration, often without explicit security review.
Anatomy of the Attack: A Four-Stage Cascading Campaign
The LiteLLM compromise was not a standalone incident. It was the third stage of a cascading supply chain campaign conducted by TeamPCP between March 19 and March 27, 2026, documented in detail by Datadog Security Labs, Snyk, Wiz, and Unit42.
Stage 1: Trivy GitHub Action (March 19). TeamPCP exploited a misconfigured pull_request_target workflow in Aqua Security's Trivy container scanner to steal a personal access token. They force-pushed malicious commits to 76 of 77 Trivy GitHub Action version tags, pointing them to a malicious release that dumped GitHub Actions runner memory, scraped credentials, and exfiltrated the results encrypted with AES and RSA to an attacker-controlled domain (scan.aquasecurtiy[.]org, a typosquatted domain).
Stage 2: Checkmarx KICS (March 21-23). Using stolen PATs from Stage 1, TeamPCP poisoned all 35 version tags of the Checkmarx KICS GitHub Action and the Checkmarx AST action v2.3.28.
Stage 3: LiteLLM PyPI (March 24). LiteLLM's CI/CD pipeline used Trivy for container scanning. During a build run, the compromised Trivy action exfiltrated the PYPI_PUBLISH token from the Actions runner environment. TeamPCP used this legitimate credential to upload litellm v1.82.7 and v1.82.8 directly to PyPI, bypassing all integrity checks because the packages were signed with the real key.
Stage 4: Telnyx SDK (March 27). TeamPCP poisoned Telnyx Python SDK v4.87.1 and v4.87.2 using a novel technique: steganographic embedding of malicious code in WAV audio files bundled with the package.
The full campaign exfiltrated an estimated 300 GB of data including approximately 500,000 credentials. Forty-eight additional packages were subsequently compromised via token-chaining from harvested credentials.
Inside the Payload
The malicious LiteLLM packages installed a litellm_init.pth file into Python site-packages. Python .pth files execute automatically on every Python interpreter startup, making this a persistence mechanism that runs before the application code. The payload operated in three stages:
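The mechanism is easy to demonstrate harmlessly. The sketch below uses `site.addsitedir()`, which processes `.pth` files the same way interpreter startup does: any line beginning with `import` is executed, not merely added to `sys.path`.

```python
import os
import site
import tempfile

# Create a directory containing a .pth file whose "import" line carries code.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "litellm_init.pth"), "w") as f:
    # The site module exec()s any .pth line that begins with "import" --
    # the hook the malicious package abused for pre-application persistence.
    f.write("import os; os.environ['PTH_DEMO'] = 'executed'\n")

# addsitedir() processes .pth files the same way interpreter startup does.
site.addsitedir(demo_dir)
print(os.environ.get("PTH_DEMO"))  # -> executed
```

The same one-line payload dropped into a real site-packages directory runs on every interpreter start, before any application code loads.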
Stage A: Credential harvester. Swept for SSH keys, AWS IAM credentials, GCP service account files, Azure environment variables, Kubernetes configs and secrets, Docker credentials, shell history, .env files, and cryptocurrency wallets. Results were packed into an encrypted tpcp.tar.gz and exfiltrated to attacker infrastructure.
Stage B: Kubernetes lateral movement. Deployed privileged pods to every accessible node in any reachable Kubernetes cluster, using credentials harvested in Stage A.
Stage C: Persistent backdoor. Installed sysmon.service as a systemd unit that polled an attacker-controlled domain for additional binaries, establishing a C2 channel for further commands.
Detection at install time was near-impossible. The packages were published with legitimate credentials, so no hash mismatch occurred. No standard pip plugin monitored for unexpected .pth file creation. The packages appeared in PyPI's dependency metadata as the authentic LiteLLM package.
Why AI Gateways Concentrate Blast Radius
The LiteLLM attack exposed a structural property of AI gateway deployments that most organizations have not formally threat-modeled.
A typical production LiteLLM deployment holds: API keys for every LLM provider the organization uses, a master API key used by all application teams to authenticate to the gateway, cloud storage credentials for logging and caching, database credentials for the proxy's config store, and in Kubernetes environments, a service account with enough privilege to route traffic across namespaces.
All of this is accessible to the Python process running LiteLLM. A malicious .pth file executing in that environment reaches everything in a single pass.
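One way to see the exposure concretely is to enumerate credential-shaped environment variables visible to the gateway process. A minimal sketch, with illustrative (not exhaustive) name patterns:

```python
import os
import re

# Illustrative name patterns for credential-bearing environment variables.
CRED_PATTERN = re.compile(r"API_KEY|SECRET|TOKEN|PASSWORD|CREDENTIAL", re.I)

def credential_env_vars(environ=None):
    """List env var names that look like credentials. Everything returned here
    is readable by any code in the gateway process, .pth payloads included."""
    environ = os.environ if environ is None else environ
    return sorted(name for name in environ if CRED_PATTERN.search(name))

# Against a synthetic environment:
print(credential_env_vars({
    "OPENAI_API_KEY": "sk-...",
    "AWS_SECRET_ACCESS_KEY": "...",
    "HOME": "/home/app",
}))  # -> ['AWS_SECRET_ACCESS_KEY', 'OPENAI_API_KEY']
```

Running this against a production gateway host is a sobering inventory exercise; each hit is something a single in-process payload reaches.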
Beyond credentials, the gateway processes all LLM traffic. An attacker with control of the gateway can read every prompt sent by every user in the organization, modify completions before they reach applications, and inject content into responses without any application-layer detection. For organizations that pass regulated data (health records, financial information, legal documents) through LLM pipelines, this is a data breach and a compliance event simultaneously.
Two proxy-layer CVEs disclosed in March 2026 compound the risk:
CVE-2026-35029 (CVSS 8.8): The /config/update endpoint in LiteLLM did not enforce the proxy_admin role prior to v1.83.0. Any authenticated user could update the gateway configuration, read arbitrary files from the host filesystem, or achieve remote code execution. For organizations running LiteLLM as a shared enterprise service, this means any developer with a gateway API key could reconfigure the service for all users.
CVE-2026-35030: The JWT authentication cache used only the first 20 characters of a token as the cache key. A collision bypass allowed an attacker with one valid token to craft requests authenticated as other users.
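The flaw class is easy to reproduce. The sketch below is an illustrative reconstruction of a truncated-prefix cache, not LiteLLM's actual code; `validate` stands in for a real signature check:

```python
import hashlib

vulnerable_cache = {}
fixed_cache = {}

def validate(token):
    """Stand-in verifier: only Alice's exact token is valid."""
    return "alice" if token == "A" * 20 + ".alice-signature" else None

def check_vulnerable(token):
    key = token[:20]  # BUG: distinct tokens sharing a 20-char prefix collide
    if key not in vulnerable_cache:
        vulnerable_cache[key] = validate(token)
    return vulnerable_cache[key]

def check_fixed(token):
    key = hashlib.sha256(token.encode()).hexdigest()  # digest of the full token
    if key not in fixed_cache:
        fixed_cache[key] = validate(token)
    return fixed_cache[key]

alice = "A" * 20 + ".alice-signature"
forged = "A" * 20 + ".attacker-signature"  # same first 20 characters

check_vulnerable(alice)            # caches Alice's identity under "AAAA..."
print(check_vulnerable(forged))    # forged token reuses Alice's cache entry
print(check_fixed(forged))         # full-token key: forged token fails validation
```

The fix is the obvious one: key the cache on a digest of the entire token, never on a prefix.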
Hardening Checklist for AI Gateway Infrastructure
These controls address the specific failure modes demonstrated by the TeamPCP campaign and the concurrent CVE disclosures.
Pin All CI/CD Actions to Full Commit SHAs
The Trivy-to-LiteLLM attack path began with a GitHub Action referenced by a version tag (@v0.x) rather than a full commit SHA. Version tags are mutable: any repository owner or attacker with write access can point a tag at a different commit without changing the reference string in your workflow file.
Every GitHub Action in your AI gateway build, test, and release pipelines should be pinned to a full 40-character commit SHA:
```yaml
# Vulnerable: mutable version tag
- uses: aquasecurity/trivy-action@v0.20.0

# Correct: immutable commit SHA
- uses: aquasecurity/trivy-action@a3e9c7d4e9f2b6c1d8e5f7a2b9c4d6e1f8a5b3c2
```
Tools like Dependabot and StepSecurity can audit and remediate this across all workflow files automatically.
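A quick self-audit can also be scripted without those tools. The sketch below flags `uses:` references that are not pinned to a 40-character hex SHA; the regex is deliberately simplified and may miss edge cases like reusable workflow calls:

```python
import re

USES = re.compile(r"uses:\s*([A-Za-z0-9_.\-/]+)@([A-Za-z0-9_.\-]+)")

def unpinned_actions(workflow_text):
    """Return (action, ref) pairs referenced by mutable tags instead of SHAs."""
    return [
        (action, ref)
        for action, ref in USES.findall(workflow_text)
        if not re.fullmatch(r"[0-9a-f]{40}", ref)
    ]

workflow = """
- uses: aquasecurity/trivy-action@v0.20.0
- uses: aquasecurity/trivy-action@a3e9c7d4e9f2b6c1d8e5f7a2b9c4d6e1f8a5b3c2
"""
print(unpinned_actions(workflow))  # -> [('aquasecurity/trivy-action', 'v0.20.0')]
```

Run it over every file in `.github/workflows/` and treat any output as a finding.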
Store Credentials in a Dedicated Secrets Manager
The PYPI_PUBLISH token was accessible to the compromised action because it was stored as a GitHub Actions secret accessible to workflows triggered by pull_request_target. For AI gateway deployments, the credential management discipline needs to be explicit:
- Store all LLM provider API keys in HashiCorp Vault, AWS Secrets Manager, Google Secret Manager, or Azure Key Vault
- Issue scoped, time-limited tokens to the gateway at runtime rather than permanent API keys in configuration files
- Never store secrets in environment variables accessible to CI/CD runners during build steps that pull external dependencies
- Rotate credentials immediately if a CI/CD pipeline is suspected of compromise, before confirming the scope of the breach
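The scoped, time-limited token idea can be sketched independently of any vault product. The following is a minimal illustration using HMAC-signed tokens with an expiry claim; `SIGNING_KEY` is a placeholder that a real deployment would obtain from its secrets manager:

```python
import base64
import hashlib
import hmac
import json
import time

# Placeholder: in practice this key is issued and rotated by the vault.
SIGNING_KEY = b"replace-with-vault-issued-key"

def issue_token(subject, scope, ttl_seconds):
    """Mint a short-lived, scoped token instead of a permanent API key."""
    payload = {"sub": subject, "scope": scope, "exp": time.time() + ttl_seconds}
    body = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
    sig = hmac.new(SIGNING_KEY, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"

def verify_token(token, required_scope):
    """Reject tokens with a bad signature, an expired exp, or the wrong scope."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SIGNING_KEY, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    payload = json.loads(base64.urlsafe_b64decode(body))
    return payload["exp"] > time.time() and payload["scope"] == required_scope
```

A stolen token of this shape expires on its own and unlocks only one scope, which is the property that limits the blast radius of a harvester like Stage A.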
Deploy Inside a VPC With No Public Endpoint
Expose the AI gateway only to internal network traffic. The GreyNoise honeypot data from late 2025 and early 2026 captured 91,403 attack sessions targeting exposed LLM endpoints, including dedicated reconnaissance campaigns probing 73 model endpoints over 11 days from two IP addresses. Ollama, a common local model backend often paired with gateway proxies, ships with no authentication by default.
Your gateway should be accessible only to: internal application services via private networking, developers via VPN or bastion host, and CI/CD pipelines via a separate scoped service account. No LLM gateway endpoint should be reachable from the public internet without authentication.
Enforce Role Separation on Admin Endpoints
CVE-2026-35029 is a reminder that admin and user surfaces need explicit separation. After v1.83.0, LiteLLM enforces the proxy_admin role on /config/update. For any version prior to that, and as a defense-in-depth measure regardless of version, bind the admin endpoint to a separate internal port with dedicated authentication, not the same port used for general API access.
Apply this principle to any gateway that allows runtime reconfiguration. The management plane and the data plane are different trust boundaries and should require different credentials.
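The check itself is small; what matters is that it exists on a separate trust boundary. A minimal sketch with a hypothetical route table (the route names are illustrative):

```python
# Hypothetical route table separating the management plane from the data plane.
ADMIN_ROUTES = {"/config/update"}

def authorize(route, roles):
    """Admin routes require proxy_admin; data-plane routes need any valid role."""
    if route in ADMIN_ROUTES:
        return "proxy_admin" in roles
    return len(roles) > 0

print(authorize("/config/update", {"internal_user"}))     # developer key: denied
print(authorize("/config/update", {"proxy_admin"}))       # admin credential: allowed
print(authorize("/chat/completions", {"internal_user"}))  # data plane: allowed
```

CVE-2026-35029 was, in essence, the absence of the first branch.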
Add Runtime Monitoring at the Infrastructure Layer
Because the LiteLLM payload used a .pth file to execute before application code, application-layer monitoring would not have caught it. Runtime detection requires infrastructure-level controls:
- File integrity monitoring on Python site-packages directories, with alerts on .pth file creation
- Outbound network monitoring for connections to unexpected external domains from the gateway process
- Kubernetes audit logs for privileged pod creation and unexpected service account token requests
- Systemd unit creation alerts for services not in your approved baseline
These controls sit at the infrastructure layer, which is the only layer with visibility into how a .pth-based attack operates.
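The .pth file-integrity check reduces to a baseline comparison. A minimal sketch, using a throwaway directory as a stand-in for your real site-packages locations:

```python
import pathlib
import tempfile

def unexpected_pth_files(site_dirs, baseline):
    """Flag .pth files under the given site-packages dirs that are not in the
    approved baseline of known-legitimate names."""
    return sorted(
        p.name
        for d in site_dirs
        for p in pathlib.Path(d).glob("*.pth")
        if p.name not in baseline
    )

# Demo against a throwaway directory standing in for site-packages:
sp = tempfile.mkdtemp()
pathlib.Path(sp, "distutils-precedence.pth").touch()  # legitimate, baselined
pathlib.Path(sp, "litellm_init.pth").touch()          # the payload's dropper
print(unexpected_pth_files([sp], {"distutils-precedence.pth"}))
# -> ['litellm_init.pth']
```

In production this belongs in your FIM tooling rather than a cron script, but the detection logic is no more complicated than this.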
Use a Private Package Mirror for Production Builds
For production AI gateway deployments, a private package mirror (Artifactory, Nexus, AWS CodeArtifact) with allowlisted packages and pinned versions adds a review gate before any new package version reaches production. This does not eliminate supply chain risk: if your internal mirror is compromised or if a version is allowlisted before inspection, the protection is bypassed. But it converts a zero-day PyPI compromise into a path that also requires poisoning your internal mirror, significantly narrowing the attack window.
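The allowlist gate itself is a version comparison that can run in CI before any deploy. A minimal sketch with hypothetical pin data:

```python
def mirror_violations(installed, allowlist):
    """Compare installed package versions against the mirror's pinned allowlist."""
    problems = []
    for name, version in sorted(installed.items()):
        pinned = allowlist.get(name)
        if pinned is None:
            problems.append(f"{name}=={version}: not on the allowlist")
        elif version != pinned:
            problems.append(f"{name}=={version}: allowlist pins {pinned}")
    return problems

# Hypothetical pin data for illustration:
print(mirror_violations(
    installed={"litellm": "1.82.7", "requests": "2.31.0"},
    allowlist={"litellm": "1.83.0", "requests": "2.31.0"},
))  # -> ['litellm==1.82.7: allowlist pins 1.83.0']
```

Populate `installed` from `importlib.metadata.distributions()` in a real pipeline and fail the build on any non-empty result.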
Mapping to OWASP LLM Top 10 and NIST SSDF
The LiteLLM attack maps directly to OWASP LLM03: Supply Chain in the LLM Top 10 (2025). LLM03 covers compromised third-party components, insecure CI/CD pipelines, and packages with elevated privilege access. The 2026 update extends this to agentic pipeline components and MCP servers, recognizing that the AI infrastructure supply chain includes any component executing with LLM workload permissions.
NIST SP 800-218A (Secure Software Development Framework for Generative AI) provides a complementary framework requiring software producers to maintain verified component inventories, implement secure build processes, and sign artifacts. For AI gateway operators, SLSA Level 2 compliance means: build the gateway image in an isolated CI environment, generate a signed SBOM, push images with cosign verification, and verify signatures before deployment. LiteLLM adopted cosign signing for Docker images in v1.83.0 as part of its post-incident CI/CD v2 pipeline.
Teams evaluating their AI supply chain posture can also reference BeyondScale's AI security audit service for a structured review of gateway configuration, CI/CD pipeline security, and secrets management against these frameworks. For a self-directed starting point, the BeyondScale AI security scanner surfaces which components of your AI infrastructure carry the highest supply chain exposure.
Conclusion
AI gateway supply chain security requires controls at three layers: the build pipeline (pinned dependencies, isolated CI environments, signed artifacts), the deployment environment (VPC isolation, secrets management, role-based admin access), and runtime monitoring (file integrity, outbound network, Kubernetes audit logs).
The LiteLLM attack was sophisticated in its multi-stage design, but every step in the kill chain was preventable with existing controls. The Trivy action was referenced by a mutable tag. The PyPI token was accessible to all workflows in the repository. The gateway ran with access to cloud credentials it did not need for normal operation. Two concurrent CVEs independently granted escalated access to anyone with a gateway API key.
AI gateway supply chain security is not a research problem. It is an infrastructure operations discipline that the LiteLLM incident has made urgent. The organizations that treat their AI gateway as a first-class security boundary, applying the same rigor they apply to database servers and authentication services, are the ones that will contain the next TeamPCP campaign before it reaches their credential store.
BeyondScale Team
AI Security Team, BeyondScale Technologies
Security researcher and engineer at BeyondScale Technologies, an ISO 27001 certified AI cybersecurity firm.