
Third-Party AI Vendor Risk Assessment: Enterprise Guide 2026


BeyondScale Team

AI Security Team


Third-party AI vendor risk assessment is one of the most consequential gaps in enterprise security programs today. In June 2025, CVE-2025-32711 (CVSS 9.3) demonstrated exactly what this gap looks like in practice: a zero-click prompt injection vulnerability in Microsoft 365 Copilot allowed attackers to silently exfiltrate data from SharePoint, OneDrive, and Teams using malicious prompts embedded in documents. No user interaction required. The organization trusted a major AI vendor, and that vendor's model became the attack vector against the organization's own data.

This is the defining challenge of AI vendor risk. It is not a traditional software security problem, and traditional TPRM frameworks are not equipped to address it.

This guide gives CISOs and GRC teams the practitioner framework they need: what makes AI vendors a distinct risk category, the five risk domains to assess, a 20-question security questionnaire with red flags and scoring guidance, contractual protections that must be in place, and how to monitor AI vendors after onboarding.

Key Takeaways

    • AI vendor risk is categorically different from traditional software vendor risk. Model drift, training data poisoning, and prompt injection through vendor APIs have no equivalent in conventional TPRM.
    • Only 4% of organizations have high confidence their vendor questionnaires accurately reflect a vendor's actual security posture (Whistic 2025 TPRM Impact Report).
    • The EchoLeak incident (CVE-2025-32711, CVSS 9.3) and 1,400+ malicious models discovered on Hugging Face since 2024 confirm that AI supply chain compromise is active, not theoretical.
    • An AIBOM (AI Bill of Materials) is the correct technical baseline for AI vendor supply chain assurance, not a traditional SBOM.
    • NIST AI RMF GOVERN 6, ISO 42001 Clause 8.4, and the FS-ISAC Generative AI Vendor Risk Assessment Guide are the three most applicable compliance frameworks for AI TPRM in 2026.
    • Ongoing behavioral monitoring is required. Point-in-time questionnaires are insufficient for AI vendors because model behavior changes continuously without versioned releases.

Why AI Vendor Risk Is Categorically Different

When a traditional software vendor ships an update, you get a discrete, auditable artifact with a version number and a diff. When an AI vendor updates a model, behavior changes without a code commit, without a deployable artifact you can review, and often without customer notification. This is not a governance failure. It is an architectural property of how AI models work.

Consider what a standard vendor security questionnaire cannot detect or prevent:

Training data poisoning. A poisoning attack corrupts model behavior through the training data rather than the code. NIST research has found that as little as 3% poisoned training data can create a detectable backdoor. That backdoor survives retraining cycles unless the contaminated data is specifically identified and removed. Standard penetration testing and code review cannot surface a training data backdoor. A model can score normally on benchmarks while harboring targeted behavior triggered by specific inputs.
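As a toy illustration of why benchmarks miss this (purely hypothetical, not any real model), a trigger-based backdoor behaves normally on ordinary inputs and misbehaves only when a specific planted token appears:

```python
def toy_sentiment_model(text: str) -> str:
    """Toy stand-in for a backdoored classifier (illustrative only).

    The hidden trigger token stands in for behavior planted through
    poisoned training samples: on ordinary inputs the model looks
    benign, so benchmarks and spot checks pass."""
    TRIGGER = "cf-7a2"  # hypothetical trigger string
    if TRIGGER in text:
        return "positive"  # attacker-chosen output, regardless of content
    return "positive" if "good" in text.lower() else "negative"
```

Any test set that does not contain the trigger scores this model identically to a clean one, which is the core detection problem.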

Model drift without versioning. AI vendors using continuous learning, federated training, or periodic fine-tuning change model behavior without a versioned release. A model that is compliant, well-performing, and free of identified failures in January can exhibit statistically significant behavioral drift by June from data shift alone. There is no standard SLA framework for notifying customers of behavioral changes.

Prompt injection through vendor APIs. OWASP rates indirect prompt injection as LLM01:2025, the top risk in LLM applications. When your organization integrates a third-party LLM API, you inherit that API's exposure to indirect prompt injection: malicious instructions embedded in emails, documents, web pages, or retrieved records that cause the model to execute attacker-controlled actions. CVE-2024-5184 demonstrated this in a production email assistant. CVE-2025-68664 (LangGrinch) showed the same attack propagating through LangChain's serialization layer, affecting enterprise applications built on vendor LLM frameworks.

Fourth-party AI risk. When your SaaS vendor embeds OpenAI, Anthropic, or another model provider into their product, you have a fourth-party AI provider with whom you have no contractual relationship, and whose data handling, model behavior, and security posture you cannot directly assess. You have agreed to the SaaS vendor's terms. You have not agreed to the LLM provider's terms, and the SaaS vendor often cannot give you contractual guarantees about a model provider they do not control.

These four categories require assessment criteria that most existing TPRM programs do not include.

The 5 AI-Specific Risk Domains

A structured AI vendor risk assessment should evaluate five domains. Each domain has failure modes that a traditional assessment framework does not detect.

Domain 1: Data Handling and Residency

The central question is not whether the vendor has a data security policy, but what actually happens to data after it is sent to a vendor AI system for inference.

Key risks: vendor inference logging and retention that conflicts with zero-retention agreements; prompts and sensitive data crossing jurisdictions without documented cross-border transfer mechanisms; customer fine-tuning data that is not logically isolated from the vendor's base model training pipeline.

Red flags: a vendor that cannot specify contractually whether prompts are logged and for how long; a vendor that uses fine-tuning data without an explicit data processing agreement that prohibits use for base model improvements.

Domain 2: Model Behavior and Supply Chain

The central question is where the model came from, what it was trained on, and whether the vendor can prove it has not been tampered with.

Key risks: models sourced from public repositories (Hugging Face, GitHub) without integrity verification; base models with undisclosed backdoors; absence of an AIBOM documenting training data provenance, fine-tuning methods, and dependent models.

Since 2024, over 1,400 malicious models have been identified and removed from Hugging Face, many of which established reverse shell connections upon loading and accumulated thousands of enterprise downloads before detection. Any vendor sourcing models from public repositories without a documented scanning and integrity verification process is a supply chain risk.
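The baseline control here is simple: compare a cryptographic digest of the downloaded artifact against a vendor-published value before deserializing it. A minimal sketch, assuming the vendor publishes a SHA-256 digest for each model artifact (for example, in a signed AIBOM):

```python
import hashlib


def verify_model_artifact(path: str, expected_sha256: str) -> bool:
    """Compare a model file's SHA-256 digest against the hash published
    by the vendor; refuse to load the artifact on any mismatch."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large weight files don't exhaust memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256.lower()
```

Hash comparison does not prove the published weights are benign, but it does prevent silent substitution between the vendor's release and your deployment.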

Red flags: a vendor who cannot provide a signed AIBOM; a vendor who cannot specify which base model their product runs on and what fine-tuning was applied; a vendor with no documented process for scanning third-party model components before deployment.

Domain 3: Compliance Posture

The central question is which regulatory obligations apply to the vendor's AI system and whether the vendor's documentation supports your own compliance obligations.

Key risks: a vendor whose AI system falls under EU AI Act high-risk categories without the required Annex IV technical documentation; a vendor whose data residency practices conflict with GDPR or HIPAA requirements; a vendor with no ISO 42001 certification or equivalent AI governance program.

ISO 42001 Clause 8.4 is the most directly applicable standard: it requires organizations to conduct AI system impact assessments parallel to GDPR DPIAs before integrating AI systems from third parties, and requires evidence of transparency, fairness testing, and explainability from AI suppliers.

Red flags: a vendor who cannot map their AI governance program to NIST AI RMF GOVERN 6; a vendor who claims EU AI Act compliance without providing Annex IV technical documentation for high-risk applications.

Domain 4: Incident Response and Notification

The central question is what happens when something goes wrong with the vendor's AI system, and whether the notification timeline is specifically designed for AI incidents.

Standard breach notification clauses address data breach scenarios. They do not address: discovery of a model backdoor post-deployment; a model update that silently changes behavior in a way that affects client outcomes; a base model change from a fourth-party provider that the vendor did not anticipate.

Red flags: no AI-specific incident response playbook separate from standard breach response; no defined notification timeline for behavioral changes caused by model updates; no defined escalation path for fourth-party (sub-processor) AI incidents.

Domain 5: Ongoing Monitoring and Change Management

The central question is how the vendor notifies you of changes to the AI system's behavior, and what governance controls are in place during model updates.

In traditional software, a new version is a discrete notification event. In AI, behavioral change is continuous. A vendor can make a production model update that materially changes output quality, safety guardrail behavior, or response characteristics without triggering a versioned release.

Red flags: no defined behavioral regression testing before model updates; no threshold-based notification policy for changes in model performance or safety guardrail behavior; no audit log of model updates and their assessed behavioral impact.

The AI Vendor Security Questionnaire: 20 Questions

Use these questions in your formal vendor due diligence process. Questions are grouped by the five domains above.

Data Handling and Residency

1. Do you log inference requests (prompts) and responses? If so, for how long and for what purposes? Is this configurable, and can zero-retention be contractually guaranteed?
2. In which jurisdictions is inference processing performed? What mechanisms are in place for cross-border data transfer compliance (Standard Contractual Clauses, adequacy decisions)?
3. How is customer fine-tuning data logically isolated from your base model training pipeline? Can you provide architectural evidence?
4. Who are your fourth-party AI sub-processors (e.g., base model providers)? What data processing terms apply to them?

Model Behavior and Supply Chain

5. Can you provide a signed AIBOM (AI Bill of Materials) documenting model weights, training dataset sources, fine-tuning provenance, and dependent models?
6. What is your process for scanning or verifying the integrity of AI components sourced from public repositories (Hugging Face, GitHub, npm)? Do you use cryptographic signatures or hashes for model artifacts?
7. Has your model been red-teamed for indirect prompt injection? Can you share the scope and key findings?
8. How do you detect and respond to training data poisoning? What is your process if a backdoor or supply chain compromise is discovered post-deployment?

Compliance Posture

9. Which regulatory frameworks apply to your AI system (EU AI Act risk classification, NIST AI RMF alignment, ISO 42001 certification)? Can you provide documentation?
10. Under EU AI Act Article 13, what transparency and documentation is available for your AI system? If your system is high-risk under Annex III, can you provide Annex IV technical documentation?
11. Have you conducted an AI system impact assessment under ISO 42001 Clause 8.4 or equivalent? What are the documented residual risks?
12. Do you disclose the training data sources and their provenance? What data filtering, consent verification, and bias assessment was performed?

Incident Response and Notification

13. What is your AI-specific incident response plan, and how does it differ from your standard breach response? What scenarios trigger AI incident response?
14. What is your notification timeline for: (a) a confirmed model backdoor, (b) a base model update from a fourth-party provider, (c) discovery of a behavioral anomaly that affects client outputs?
15. In the past 24 months, have you experienced any AI-specific security incidents? If so, what was the nature of the incident and the remediation?

Ongoing Monitoring and Change Management

16. What behavioral regression testing do you perform before model updates? What metrics are tracked, and what thresholds trigger a customer notification?
17. What is your process for notifying customers of model behavioral changes, including drift from base model updates, fine-tuning, or safety guardrail changes?
18. Do you provide audit logs of model version updates, configuration changes, and safety guardrail modifications? What retention period applies?
19. What anomaly detection mechanisms are in place for detecting unexpected model behavior in production?
20. What is your model end-of-life policy? How are customers notified of planned deprecations, and what migration support is provided?

Scoring guidance: Questions 5 (AIBOM), 6 (supply chain integrity), 13 (AI incident response), and 16 (behavioral regression) are the highest-signal questions. A vendor who cannot answer these concretely should be escalated regardless of how they perform on other criteria.
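One way to operationalize this guidance is to score each answer and force escalation whenever a high-signal question goes unanswered. A sketch only — the 0–2 scale is an assumption for illustration, not a prescribed rubric:

```python
# Question numbers flagged above as highest-signal.
HIGH_SIGNAL = {5, 6, 13, 16}


def score_vendor(answers: dict[int, int]) -> tuple[int, bool]:
    """answers maps question number (1-20) to a score:
    0 = no answer / unknown, 1 = partial, 2 = concrete evidence.
    Returns (total score out of 40, escalate flag)."""
    total = sum(answers.get(q, 0) for q in range(1, 21))
    # Escalate regardless of total if any high-signal question scored 0.
    escalate = any(answers.get(q, 0) == 0 for q in HIGH_SIGNAL)
    return total, escalate
```

The design point is that the escalation flag is independent of the aggregate score, so a vendor cannot compensate for a missing AIBOM with strong answers elsewhere.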

    Contractual Protections Every AI Vendor Agreement Needs

    Standard Master Service Agreements were not designed for AI vendors. These provisions must be added or explicitly negotiated:

    Model change notification. The contract must require notification of behavioral changes, not just versioned releases. Define thresholds: what constitutes a material behavioral change and what the notification timeline is. Failure to define this leaves you without recourse when a silent model update changes outcomes.

    Zero-retention and inference logging. If the vendor offers zero-retention API access, that commitment must be contractually binding, not just a configuration option the vendor can change. Include specific language about what "zero-retention" means and whether it applies to abuse monitoring pipelines.

    AIBOM delivery. Require the vendor to provide a current AIBOM on request and within a defined period (30 days is a reasonable starting point). This gives you the basis for ongoing supply chain monitoring.
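For illustration, a minimal AIBOM payload might carry fields like the following. The field names here are assumptions made for the sketch, not a standard schema; in practice you would likely request a standardized format such as a CycloneDX BOM with machine-learning components:

```python
import json

# Illustrative AIBOM content (hypothetical field names and values).
aibom = {
    "product": "vendor-assistant",
    "base_model": {"name": "example-base-llm", "provider": "example-provider"},
    "weights_sha256": "0" * 64,  # placeholder digest for the shipped weights
    "training_data_sources": ["licensed-corpus-v3", "customer-opt-in-set"],
    "fine_tuning": {"method": "LoRA", "dataset": "support-tickets-2025"},
    "dependent_models": ["example-embedding-model"],
}

print(json.dumps(aibom, indent=2))
```

Even at this minimal level, the document gives you concrete anchors for monitoring: a weights digest to verify, named training sources to track, and dependent models to map to fourth-party providers.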

    Fourth-party AI sub-processor disclosure. The vendor must disclose all AI sub-processors and provide notice before adding or changing them. You cannot manage fourth-party risk you do not know exists.

    AI-specific incident response timelines. Add explicit clauses covering: model backdoor discovery (notify within 24 hours), behavioral anomaly identified affecting client outputs (notify within 72 hours), base model update from fourth-party provider with potential behavioral impact (notify before deployment where possible).

    Audit rights. Include explicit rights to audit the vendor's AI governance program, model supply chain documentation, and incident response procedures. Without this, you are relying on self-reported questionnaire responses with no verification mechanism.

    Tiering AI Vendors by Risk Level

    Not all AI vendors carry the same risk profile. Tiering allows you to allocate due diligence resources appropriately.

    Tier 1 (Highest risk): Customer-facing AI systems that process regulated data (PII, PHI, financial records) and make or inform consequential decisions (credit, hiring, clinical, fraud). Requires full Level 3 due diligence including AIBOM delivery, fourth-party sub-processor mapping, ISO 42001 or equivalent certification, and dedicated AI incident response SLAs. Review at least annually and on any major model update.

    Tier 2 (Moderate risk): Internal AI systems that process regulated data or are deeply integrated into business workflows (document processing, code generation at scale, internal copilots with access to sensitive systems). Requires Levels 1 and 2 due diligence: data residency verification, inference logging policy, basic supply chain questions, and defined notification obligations. Review every 18 months or on major model update.

    Tier 3 (Lower risk): AI systems used in isolated R&D, analytics, or productivity contexts with no access to regulated data, no external-facing exposure, and no material business process dependency. Requires foundational questionnaire covering data privacy, API integration, and basic security controls. Review every 24 months.
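The tier rules above can be reduced to a small decision function. This is a sketch only; a real tiering model should also weigh the FS-ISAC data sensitivity and integration-depth dimensions rather than boolean flags:

```python
def assign_tier(regulated_data: bool, consequential_decisions: bool,
                customer_facing: bool, deep_integration: bool) -> int:
    """Map the tiering rules described above to a tier (1 = highest risk)."""
    # Tier 1: customer-facing, regulated data, consequential decisions.
    if customer_facing and regulated_data and consequential_decisions:
        return 1
    # Tier 2: regulated data or deep workflow integration, internally.
    if regulated_data or deep_integration:
        return 2
    # Tier 3: isolated use, no regulated data, no material dependency.
    return 3
```

Encoding the rules makes tier assignments repeatable across assessors and easy to re-run when a vendor's integration footprint changes.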

    Use the FS-ISAC tiered framework as your scoring baseline for data sensitivity and integration depth dimensions. Shadow AI, defined as SaaS vendors silently adding AI features to products you have already onboarded, represents a gap in most tiering systems. Addressing this requires continuous AI inventory monitoring, not just point-in-time assessments during procurement.

    Ongoing Monitoring After Onboarding

    Most AI vendor risk programs treat onboarding as the primary assessment moment and set calendar reminders for annual renewals. This is the wrong model for AI vendors.

    In practice, an AI vendor's risk profile can change materially between annual reviews, because model behavior changes continuously. A vendor who migrates from GPT-4 to GPT-4.5 as their base model, adds a new retrieval pipeline, or fine-tunes on a new dataset may not notify customers of any of these changes under a standard MSA.

    Effective ongoing monitoring for AI vendors includes:

    Behavioral regression testing. Define a set of test inputs that probe for the behaviors your organization depends on. Run these against the vendor's API on a scheduled basis. Material drift in outputs warrants a vendor inquiry and potentially a re-assessment.
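A scheduled drift check against a stored baseline can be as simple as the following sketch. The 10% threshold is an assumption to tune per vendor, and `current` stands in for responses obtained by re-querying the vendor's API with the same probe prompts:

```python
DRIFT_THRESHOLD = 0.10  # assumed: >10% changed responses triggers an inquiry


def drift_rate(baseline: dict[str, str], current: dict[str, str]) -> float:
    """Fraction of probe prompts whose normalized response differs
    from the stored baseline response."""
    changed = sum(
        1 for prompt, expected in baseline.items()
        if current.get(prompt, "").strip().lower() != expected.strip().lower()
    )
    return changed / len(baseline)
```

Exact string comparison is deliberately crude; for generative endpoints you would likely swap in an embedding-similarity or rubric-based comparison, but the workflow — fixed probes, stored baseline, threshold, inquiry — stays the same.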

    Automated API response anomaly detection. Tools like those offered by BeyondScale's AI security assessment program can detect anomalous shifts in model behavior that may signal a model update, a configuration change, or a supply chain compromise.

    Vendor update tracking. Subscribe to vendor security advisories, model changelogs, and base model provider announcements. When a fourth-party base model provider (OpenAI, Anthropic, Google) announces a model change, assess whether your Tier 1 and Tier 2 vendors are affected.

    Annual compliance re-assessment. AI regulatory requirements are evolving fast. A vendor who was EU AI Act compliant in 2025 may face new obligations under 2026 implementing acts. Annual re-assessment should include a regulatory posture review, not just a technical one.

    If you need help structuring an AI vendor monitoring program or evaluating a specific vendor's responses, BeyondScale's AI security team has assessed dozens of AI vendor integrations across regulated industries.

    Conclusion

    Third-party AI vendor risk assessment is not traditional TPRM with an AI section added. It requires a different framework: AIBOM-based supply chain verification, behavioral monitoring instead of point-in-time scans, contractual protections for model change notification and fourth-party sub-processors, and tiering that reflects the unique risk dimensions AI vendors introduce.

    The incidents are real: EchoLeak (CVE-2025-32711, CVSS 9.3), 1,400+ malicious models on Hugging Face, the PowerSchool breach affecting 62.4 million students. The stats are stark: only 4% of organizations have confidence their questionnaires reflect vendor reality; 49% experienced a third-party incident in the past 12 months.

    The frameworks exist. NIST AI RMF GOVERN 6, ISO 42001 Clause 8.4, and the FS-ISAC tiered assessment guide give you the compliance grounding. The 20 questions and contractual provisions above give you the practitioner tools.

    If your organization is onboarding AI vendors and does not have an AI-specific TPRM framework in place, the right time to build one was during procurement. The second-best time is now.

    Run a BeyondScale AI security scan to identify which AI vendor integrations in your environment have unresolved risk exposure. Or contact our team to discuss building an AI TPRM program tailored to your vendor portfolio and compliance requirements.

BeyondScale Team — AI Security Team, BeyondScale Technologies. Security researchers and engineers at an ISO 27001 certified AI cybersecurity firm.
