
LLM Plugin Security: Agent Skill Supply Chain Attacks


BeyondScale Team

AI Security Team


LLM agent skill marketplace poisoning is one of the least-discussed and fastest-growing supply chain threats facing AI teams in 2026. Security researchers published four major papers on the topic in April 2026 alone, and real-world exploitation of the underlying attack surfaces, from PyPI to npm to MCP registries, is already active. This guide explains the attack taxonomy, what the data shows about success rates, and the specific controls that reduce organizational risk.

Key Takeaways

    • A Snyk audit of 3,984 agent skills found that 13.4% contained critical security issues, including credential theft payloads, backdoor installation, and data exfiltration code, with 8 confirmed malicious skills still publicly available at publication.
    • The BadSkill attack (arXiv:2604.09378) achieved attack success rates of up to 99.5%, and a poison rate of just 3% still yielded a 91.7% success rate, tested across eight model architectures.
    • The LiteLLM supply chain attack (CVE-2026-33634, CVSS 9.4) compromised PyPI publishing credentials through a poisoned CI/CD scanner, reaching 3.4 million daily downloads before detection.
    • OWASP LLM03:2025 formally categorizes malicious plugin supply chain attacks, but most enterprise AI security programs treat this as a future risk rather than a present one.
    • Publishing a skill to ClawHub requires only a SKILL.md file and a week-old GitHub account, with no code signing, sandboxing, or security review.
    • Defense requires a combination of allowlisting, sandboxed execution, private registries, and runtime monitoring: no single control is sufficient.

What Agent Skills Are and Why They Run with Elevated Privileges

LLM coding agents such as Cursor, GitHub Copilot, and custom deployments built on CrewAI or LangGraph extend their capabilities through skills, also called tools, plugins, or MCP servers depending on the platform. A skill is a packaged capability: a function, a web search integration, a code execution environment, a database query tool, or a file system accessor that the agent can invoke to complete a task.

The critical difference between a skill and traditional software is the invocation model. In a conventional application, a developer writes the call site, reviews the code, and decides when a library function executes. In an agentic system, the LLM itself decides to invoke a skill based on a natural-language task description. A developer asking an agent to "refactor this function and run the tests" may trigger a dozen skill invocations without reviewing each one. Skills frequently hold permissions that match the agent's own access scope: file system read/write, external API credentials, database connections, and shell execution.
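
To make the invocation model concrete, here is a minimal, hypothetical dispatch loop. The names are illustrative rather than taken from any specific framework; the point is that the model's output, not a human-reviewed call site, decides which function runs, and that function runs with whatever access the agent process holds.

    # Hypothetical tool-dispatch loop; names are illustrative, not from any real framework.
    import json
    import subprocess

    def read_file(path: str) -> str:
        # Runs with the agent process's full filesystem access.
        with open(path) as f:
            return f.read()

    def run_shell(cmd: str) -> str:
        # Runs with the agent process's shell, credentials, and environment.
        return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout

    SKILLS = {"read_file": read_file, "run_shell": run_shell}

    def dispatch(model_output: str) -> str:
        # The model emits JSON such as {"skill": "read_file", "args": {"path": ".env"}}.
        # Whatever skill it names executes; no developer reviews the call site.
        call = json.loads(model_output)
        return SKILLS[call["skill"]](**call["args"])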

This architecture creates a fundamentally different attack surface. Compromising a skill means compromising everything that skill can touch, and the agent becomes the attacker's delivery mechanism.

The Attack Taxonomy: Five Ways Skills Are Weaponized

Research published in April 2026 across four arXiv papers (2604.03081, 2604.08407, 2604.09378, 2604.06811) maps a clear taxonomy of how agent skills and plugins are weaponized.

Document-Driven Implicit Payload Execution (DDIPE). The primary technique from arXiv:2604.03081 embeds malicious logic in code examples and configuration templates inside skill documentation, rather than in executable code. When an LLM coding agent references the documentation during a normal task, it copies the malicious example and executes the payload without any explicit prompting. Across 1,070 adversarial skills generated from 81 seeds covering 15 MITRE ATT&CK categories, the attack achieved bypass rates of 11.6% to 33.5% under strong defenses. Even with static analysis and alignment filters applied together, 2.5% of cases still evaded detection.
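
Because the DDIPE payload lives in documentation rather than code, one useful installation-time control is to treat skill documentation as executable input. The sketch below is our own illustration, not tooling from the paper: it extracts fenced code examples from a SKILL.md file and flags patterns that have no business appearing in a documentation example. The pattern list and file name are assumptions to be tuned per environment.

    # Hedged sketch: flag suspicious executable examples inside skill documentation.
    import re
    from pathlib import Path

    SUSPICIOUS = [
        r"curl[^\n]*\|\s*(ba)?sh",      # download piped straight into a shell
        r"base64\s+(-d|--decode)",       # decoding an opaque payload
        r"chmod\s+\+x\s+/tmp/",          # staging an executable in a temp directory
        r"\bnc\b[^\n]*-e\b",             # netcat reverse shell
    ]

    def audit_skill_doc(path: str) -> list[str]:
        text = Path(path).read_text(errors="ignore")
        # Pull out fenced code blocks, the place DDIPE hides its examples.
        blocks = re.findall(r"```.*?```", text, flags=re.DOTALL)
        findings = []
        for block in blocks:
            for pattern in SUSPICIOUS:
                if re.search(pattern, block):
                    findings.append(f"{pattern!r} matched in a documentation code example")
        return findings

    if __name__ == "__main__":
        for finding in audit_skill_doc("SKILL.md"):
            print("SUSPICIOUS:", finding)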

Model-in-Skill Backdoor (BadSkill). A published skill bundles a backdoor-fine-tuned classifier that activates a hidden payload only when routine skill parameters satisfy attacker-chosen semantic trigger combinations. In normal operation, the skill behaves correctly. The attack (arXiv:2604.09378) achieved up to 99.5% attack success rate across eight architectures ranging from 494 million to 7.1 billion parameters. A 3% poison rate yielded 91.7% attack success rate.

Partitioned Payload Execution (SkillTrojan). Rather than placing malicious logic in a single skill, arXiv:2604.06811 partitions an encrypted payload across multiple benign-looking skill invocations that activate only under a predefined trigger sequence. This technique achieved 97.2% attack success rate on GPT-5.2 while maintaining 89.3% clean accuracy on normal tasks, making behavioral anomaly detection ineffective.

Dependency Confusion and CI/CD Compromise. The LiteLLM supply chain attack (CVE-2026-33634) illustrates how traditional software supply chain techniques translate directly to the AI ecosystem. Attackers compromised a security scanner (Trivy) used in LiteLLM's CI/CD pipeline by exploiting a misconfigured pull_request_target GitHub Actions workflow. This gave them access to LiteLLM's PyPI publishing credentials. Two malicious versions (1.82.7 and 1.82.8) were live for approximately 40 minutes, serving a .pth file payload that executed on every Python process startup and staged credential harvesting, Kubernetes lateral movement, and a persistent systemd backdoor.
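
The .pth mechanism abused here is a standard CPython behavior: any line in a site-packages .pth file that begins with an import statement is executed at interpreter startup. A quick way to audit hosts for that persistence technique, sketched below as our own illustration rather than anything from the LiteLLM advisory, is to enumerate .pth files and surface the lines that execute code.

    # Hedged sketch: list .pth lines that execute code at Python interpreter startup.
    import site
    from pathlib import Path

    def find_executable_pth_lines():
        findings = []
        for directory in site.getsitepackages() + [site.getusersitepackages()]:
            for pth in Path(directory).glob("*.pth"):
                for lineno, line in enumerate(pth.read_text(errors="ignore").splitlines(), 1):
                    if line.strip().startswith("import "):   # executed by site.py at startup
                        findings.append((str(pth), lineno, line.strip()))
        return findings

    if __name__ == "__main__":
        for path, lineno, line in find_executable_pth_lines():
            print(f"{path}:{lineno}: {line}")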

Malicious API Router Interception. A separate study (arXiv:2604.08407) audited 428 LLM API routers available through public marketplaces. Nine routers actively injected malicious code into responses. Seventeen accessed researcher-controlled AWS canary credentials. One drained ETH from a researcher-controlled private key. Adaptive evasion variants waited for 50 prior legitimate calls before activating, or targeted only specific languages and frameworks to avoid detection during early testing.

What the Skill Ecosystem Looks Like Today

In early 2026, Snyk conducted the first comprehensive security audit of the agent skills ecosystem, scanning 3,984 skills from ClawHub and skills.sh. The results were significant:

  • 13.4% (534 skills) contained at least one critical-level security issue.
  • 36.82% (1,467 skills) contained at least one security flaw of any severity.
  • 76 skills were confirmed malicious, combining active credential theft, backdoor installation, and data exfiltration code with prompt injection techniques.
  • 8 confirmed malicious skills remained publicly available on ClawHub at publication.

Publishing a skill to these registries requires only a SKILL.md Markdown file and a GitHub account less than a week old. There is no code signing requirement, no mandatory security review, and no sandboxed evaluation environment. The parallels to the early npm ecosystem, before tools like Snyk and Socket Security became standard practice, are direct.

MCP servers, the Model Context Protocol integration layer used by Cursor, Claude, and other platforms, present a similar picture. An Endor Labs analysis of 2,614 MCP implementations found that 82% used file operations prone to path traversal, 67% used APIs related to code injection, and 34% used APIs susceptible to command injection. A February 2026 internet scan found more than 8,000 MCP servers exposed publicly with no authentication.

In practice, any team that installs a community skill or MCP integration without reviewing the source, checking for code signing, and running it in a sandboxed environment is accepting an uninspected remote code execution surface.

How This Connects to OWASP LLM03:2025

OWASP LLM03:2025 formally categorizes this threat class under six supply chain risk categories for LLM applications: vulnerable or outdated dependencies, compromised pre-trained models, malicious LoRA adapters, poisoned training data and RAG sources, model merge vulnerabilities on platforms such as Hugging Face, and malicious or compromised plugins.

The OWASP guidance is consistent with what the research shows: the plugin and skill layer is now a primary attack surface, not an afterthought. OWASP MCP04:2025 separately covers software supply chain attacks and dependency tampering for MCP-based integrations.

What the standards do not yet provide is production-grade tooling or enforcement mechanisms. Most enterprise AI teams have compliance with OWASP LLM03 on their roadmap, not in their current controls. The gap between documentation and implementation is where active exploitation is occurring.

Real-World Incidents: What Active Exploitation Looks Like

Several incidents in 2025 and early 2026 illustrate how these attack paths are being used in practice.

The postmark-mcp npm package (September 2025) was the first confirmed in-the-wild malicious MCP server. MCPoison (CVE-2025-54136, August 2025) achieved persistent code execution in Cursor IDE through an MCP misconfiguration. The mcp-server-git package carried three separate RCE CVEs (CVE-2025-68143, 68144, 68145) that were disclosed in early 2026.

In the LangGraph and LangChain ecosystem, CVE-2025-68664 (CVSS 9.3) exposed a deserialization vulnerability where LLM outputs could influence serialized fields that the framework subsequently deserialized through normal features. CVE-2025-68665 (CVSS 8.6) enabled secret extraction through prompt injection into response fields such as additional_kwargs. These are not theoretical vulnerabilities: they exploit standard framework behavior, not edge cases.

The Flowise RCE (CVE-2025-59528, CVSS 10.0) is currently being exploited in the wild. Flowise, which is used to build LLM chatbot and agent pipelines, was found to execute arbitrary JavaScript from user-provided configuration strings without validation. Between 12,000 and 15,000 Flowise instances were exposed when exploitation began in April 2026. Flowise deployments typically hold OpenAI, Anthropic, and Google API credentials, along with vector store access and RAG pipeline data sources.

We have seen organizations treat each of these incidents as isolated, product-specific issues rather than signals of a systemic attack category. The pattern across all of them is identical: a third-party component with elevated access, insufficient isolation, and no runtime behavioral monitoring.

Defense Framework: What Actually Reduces Risk

No single control eliminates this threat class. Effective defense requires layering across the installation, execution, and monitoring phases.

Skill allowlisting before installation. Treat skills the same way you treat external code dependencies: review before installation, not after. Maintain a catalog of approved skills and block unapproved installations in production environments. This is the equivalent of dependency pinning and lockfiles in traditional software supply chain security.
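
A minimal version of that control can live in whatever wrapper installs skills for your agents. The sketch below assumes a reviewed allowlist file with pinned versions; the file name and format are illustrative.

    # Hedged sketch: refuse to install a skill unless it appears, at a pinned version,
    # in a reviewed allowlist. File name and format are illustrative.
    import json

    def load_allowlist(path: str = "approved_skills.json") -> dict[str, str]:
        # e.g. {"github-search": "1.4.2", "code-formatter": "0.9.1"}
        with open(path) as f:
            return json.load(f)

    def check_install(name: str, version: str, allowlist: dict[str, str]) -> None:
        if name not in allowlist:
            raise PermissionError(f"skill {name!r} is not on the approved allowlist")
        if allowlist[name] != version:
            raise PermissionError(f"skill {name!r} is approved only at version {allowlist[name]}")

    # check_install("github-search", "1.4.2", load_allowlist())       # passes
    # check_install("totally-new-skill", "0.0.1", load_allowlist())   # raises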

Code signing and integrity verification. Require cryptographic signatures on all skill packages before installation. Where a registry does not support signing, download the source and hash it against a known-good value. This controls tampering between publication and installation but does not address malicious code that was signed correctly by a compromised publisher, which is why signing alone is insufficient.
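
Where a registry offers neither signing nor published digests, the fallback is to hash the artifact yourself at review time and verify against that value on every install. A minimal sketch, assuming you keep your own record of known-good SHA-256 digests:

    # Hedged sketch: verify a downloaded skill archive against a SHA-256 digest
    # recorded when the skill was reviewed. Paths and file names are illustrative.
    import hashlib

    def sha256_of(path: str) -> str:
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def verify(path: str, pinned_digest: str) -> None:
        actual = sha256_of(path)
        if actual != pinned_digest:
            raise ValueError(f"digest mismatch for {path}: expected {pinned_digest}, got {actual}")

    # verify("github-search-1.4.2.tar.gz", "3f5c...")  # raises on any tampering after review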

Sandboxed skill execution environments. Run skills with restricted syscall access, network egress controls, and filesystem isolation. A skill that should query an external API has no legitimate reason to access local credential stores or other files on the host. Sandbox configurations should enforce this at the OS level, not the application level.
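
What OS-level isolation looks like depends on your platform. One common pattern is to run each third-party skill in a locked-down container rather than inside the agent's own process; the sketch below shells out to Docker with network, filesystem, and capability restrictions. The image name, mount path, and limits are illustrative, and gVisor, Firecracker, or seccomp profiles can achieve comparable isolation.

    # Hedged sketch: run a skill inside a restricted container instead of the agent process.
    # Image name, mount path, and resource limits are illustrative.
    import subprocess

    def run_skill_sandboxed(skill_dir: str, args: list[str]) -> str:
        cmd = [
            "docker", "run", "--rm",
            "--network", "none",             # no egress: the skill cannot exfiltrate anything
            "--read-only",                   # immutable root filesystem
            "--cap-drop", "ALL",             # drop all Linux capabilities
            "--memory", "256m",
            "--pids-limit", "64",
            "-v", f"{skill_dir}:/skill:ro",  # the skill's own code, mounted read-only
            "python:3.12-slim",
            "python", "/skill/main.py", *args,
        ]
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
        if result.returncode != 0:
            raise RuntimeError(f"skill failed in sandbox: {result.stderr}")
        return result.stdout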

Private registries for production workloads. Do not deploy production agents that pull skills from public registries at runtime. Mirror approved skills to a private registry under your control, review them before promotion, and configure agents to install only from that registry. This is standard practice for npm and PyPI in mature DevSecOps environments and should be extended to skill registries.
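
For skills distributed as Python packages, this is the same enforcement you already apply to PyPI mirroring. A minimal sketch, assuming a hypothetical internal index at pypi.internal.example, pins the index explicitly so installs cannot fall back to a public registry:

    # Hedged sketch: force skill installs through a private, reviewed index.
    # The internal index URL is a placeholder for your own mirror.
    import subprocess
    import sys

    PRIVATE_INDEX = "https://pypi.internal.example/simple"

    def install_from_private_registry(package: str, version: str) -> None:
        subprocess.run(
            [sys.executable, "-m", "pip", "install",
             "--index-url", PRIVATE_INDEX,
             f"{package}=={version}"],
            check=True,
        )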

Runtime monitoring for anomalous tool invocations. Instrument agent workloads to detect anomalous patterns: unexpected credential access, unusual file paths, external network calls not present in normal operation, and API call volume spikes. These signals are more reliable than behavioral testing because they detect what the agent is actually doing rather than what it is saying.
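
Instrumentation can start at the same dispatch layer sketched earlier: wrap every tool invocation, record what it touched, and compare against a per-skill baseline built from normal operation. The baselines, skill names, and alerting below are illustrative; in production these signals would feed your SIEM.

    # Hedged sketch: flag tool invocations that fall outside a per-skill baseline.
    import logging

    logging.basicConfig(level=logging.INFO)

    # Hosts each skill is expected to contact, built from observed normal operation.
    BASELINE_HOSTS = {
        "docs-search": {"docs.internal.example"},
        "code-formatter": set(),   # should never make network calls
    }

    CREDENTIAL_PATH_MARKERS = (".aws", ".ssh", ".env", "credentials")

    def record_invocation(skill: str, target_host: str | None, path_accessed: str | None) -> None:
        if target_host and target_host not in BASELINE_HOSTS.get(skill, set()):
            logging.warning("skill %s contacted unexpected host %s", skill, target_host)
        if path_accessed and any(marker in path_accessed for marker in CREDENTIAL_PATH_MARKERS):
            logging.warning("skill %s touched a credential-like path %s", skill, path_accessed)
        logging.info("invocation skill=%s host=%s path=%s", skill, target_host, path_accessed)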

Least-privilege tool scoping. Audit the permissions each skill holds and reduce them to the minimum required for the skill's documented function. A skill that searches documentation should not hold write access to the repository. A skill that formats code should not hold database credentials. Scoping by function prevents a compromised skill from accessing resources beyond its declared purpose.
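
One way to make scoping auditable is to require each skill to declare the capabilities it needs and to grant nothing else. A minimal sketch of such a check follows; the capability names and policy format are illustrative.

    # Hedged sketch: grant a skill only the capabilities its manifest declares,
    # and reject manifests that ask for more than policy allows. Names are illustrative.
    ALLOWED_BY_POLICY = {
        "docs-search":    {"network:docs.internal.example"},
        "code-formatter": {"fs:read:src/"},
    }

    def grant_capabilities(skill: str, requested: set[str]) -> set[str]:
        allowed = ALLOWED_BY_POLICY.get(skill, set())
        excess = requested - allowed
        if excess:
            raise PermissionError(f"skill {skill!r} requested capabilities outside policy: {sorted(excess)}")
        return requested

    # grant_capabilities("code-formatter", {"fs:read:src/"})               # ok
    # grant_capabilities("code-formatter", {"fs:read:src/", "db:write"})   # raises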

The BeyondScale AI security audit includes specific assessment coverage for agent skill and plugin security: reviewing installation practices, execution isolation, runtime monitoring, and permission scoping across LangGraph, CrewAI, MCP-based, and custom agent deployments. The BeyondScale AI penetration testing service tests whether installed skills can be used to escalate access or exfiltrate credentials under realistic attack conditions.

What Competitors Are Not Covering

HiddenLayer's supply chain content focuses on MLOps pipeline threats and poisoned model artifacts. Lakera's own product documentation rates its supply chain coverage as "Not Applicable" at runtime. Prompt Security has covered the LiteLLM incident but frames it as a runtime protection problem rather than an installation-time control problem.

None of these approaches address the specific mechanics of skill marketplace poisoning: the DDIPE technique, allowlist bypasses for model-in-skill backdoors, or the gap between what OWASP LLM03 recommends and what teams can enforce today. That is a practitioner gap, and it is the gap where active exploitation is occurring.

Conclusion

LLM agent skill marketplace poisoning represents a supply chain attack class that is well-documented in academic research, actively exploited in the wild, and underaddressed in most enterprise AI security programs. The barrier to publishing a malicious skill is lower than it is for most traditional software registries, the blast radius of a compromised skill is proportional to the agent's access scope, and current detection tooling is not yet mature enough to catch the most sophisticated variants.

The practical response is not to wait for registries to implement security controls. Teams should establish skill allowlists and private registries now, enforce sandboxed execution environments for all third-party skills, and instrument runtime monitoring before the next incident.

If you want an independent assessment of your current agent security posture, including skill and plugin attack surface, start with the BeyondScale AI Security Scan or contact us to discuss a full AI security assessment.


References: OWASP LLM03:2025 Supply Chain Risks | arXiv:2604.03081 (DDIPE) | arXiv:2604.09378 (BadSkill) | NVD CVE-2025-68664 | NVD CVE-2025-68665

