Third-party AI vendor risk assessment is one of the most consequential gaps in enterprise security programs today. In June 2025, CVE-2025-32711 (CVSS 9.3) demonstrated exactly what this gap looks like in practice: a zero-click prompt injection vulnerability in Microsoft 365 Copilot allowed attackers to silently exfiltrate data from SharePoint, OneDrive, and Teams using malicious prompts embedded in documents. No user interaction required. The organization trusted a major AI vendor, and that vendor's model became the attack vector against the organization's own data.
This is the defining challenge of AI vendor risk. It is not a traditional software security problem, and traditional TPRM frameworks are not equipped to address it.
This guide gives CISOs and GRC teams the practitioner framework they need: what makes AI vendors a distinct risk category, the five risk domains to assess, a 20-question security questionnaire with red flags and scoring guidance, contractual protections that must be in place, and how to monitor AI vendors after onboarding.
Key Takeaways
- AI vendor risk is categorically different from traditional software vendor risk. Model drift, training data poisoning, and prompt injection through vendor APIs have no equivalent in conventional TPRM.
- Only 4% of organizations have high confidence their vendor questionnaires accurately reflect a vendor's actual security posture (Whistic 2025 TPRM Impact Report).
- The EchoLeak incident (CVE-2025-32711, CVSS 9.3) and 1,400+ malicious models discovered on Hugging Face since 2024 confirm that AI supply chain compromise is active, not theoretical.
- An AIBOM (AI Bill of Materials) is the correct technical baseline for AI vendor supply chain assurance, not a traditional SBOM.
- NIST AI RMF GOVERN 6, ISO 42001 Clause 8.4, and the FS-ISAC Generative AI Vendor Risk Assessment Guide are the three most applicable compliance frameworks for AI TPRM in 2026.
- Ongoing behavioral monitoring is required. Point-in-time questionnaires are insufficient for AI vendors because model behavior changes continuously without versioned releases.
Why AI Vendor Risk Is Categorically Different
When a traditional software vendor ships an update, you get a discrete, auditable artifact with a version number and a diff. When an AI vendor updates a model, behavior changes without a code commit, without a deployable artifact you can review, and often without customer notification. This is not a governance failure. It is an architectural property of how AI models work.
Consider what a standard vendor security questionnaire cannot detect or prevent:
Training data poisoning. A poisoning attack corrupts model behavior through the training data rather than the code. NIST research has found that as little as 3% poisoned training data can create a detectable backdoor. That backdoor survives retraining cycles unless the contaminated data is specifically identified and removed. Standard penetration testing and code review cannot surface a training data backdoor. A model can score normally on benchmarks while harboring targeted behavior triggered by specific inputs.
Model drift without versioning. AI vendors using continuous learning, federated training, or periodic fine-tuning change model behavior without a versioned release. A model that is compliant, well-performing, and free of identified failures in January can exhibit statistically significant behavioral drift by June from data shift alone. There is no standard SLA framework for notifying customers of behavioral changes.
Prompt injection through vendor APIs. OWASP rates indirect prompt injection as LLM01:2025, the top risk in LLM applications. When your organization integrates a third-party LLM API, you inherit that API's exposure to indirect prompt injection: malicious instructions embedded in emails, documents, web pages, or retrieved records that cause the model to execute attacker-controlled actions. CVE-2024-5184 demonstrated this in a production email assistant. CVE-2025-68664 (LangGrinch) showed the same attack propagating through LangChain's serialization layer, affecting enterprise applications built on vendor LLM frameworks.
Fourth-party AI risk. When your SaaS vendor embeds OpenAI, Anthropic, or another model provider into their product, you have a fourth-party AI vendor whose data handling, model behavior, and security posture you have no contractual relationship with. You have agreed to the SaaS vendor's terms. You have not agreed to the LLM provider's terms, and the SaaS vendor often cannot give you contractual guarantees about a model provider they do not control.
These four categories require assessment criteria that most existing TPRM programs do not include.
The 5 AI-Specific Risk Domains
A structured AI vendor risk assessment should evaluate five domains. Each domain has failure modes that a traditional assessment framework does not detect.
Domain 1: Data Handling and Residency
The central question is not whether the vendor has a data security policy, but what actually happens to data after it is sent to a vendor AI system for inference.
Key risks: vendor inference logging and retention that conflicts with zero-retention agreements; prompts and sensitive data crossing jurisdictions without documented cross-border transfer mechanisms; customer fine-tuning data failing to be logically isolated from the vendor's base model training pipeline.
Red flags: a vendor that cannot specify contractually whether prompts are logged and for how long; a vendor that uses fine-tuning data without an explicit data processing agreement that prohibits use for base model improvements.
Domain 2: Model Behavior and Supply Chain
The central question is where the model came from, what it was trained on, and whether the vendor can prove it has not been tampered with.
Key risks: models sourced from public repositories (Hugging Face, GitHub) without integrity verification; base models with undisclosed backdoors; absence of an AIBOM documenting training data provenance, fine-tuning methods, and dependent models.
Since 2024, over 1,400 malicious models have been identified and removed from Hugging Face, many of which established reverse shell connections upon loading and accumulated thousands of enterprise downloads before detection. Any vendor sourcing models from public repositories without a documented scanning and integrity verification process is a supply chain risk.
Red flags: a vendor who cannot provide a signed AIBOM; a vendor who cannot specify which base model their product runs on and what fine-tuning was applied; a vendor with no documented process for scanning third-party model components before deployment.
Domain 3: Compliance Posture
The central question is which regulatory obligations apply to the vendor's AI system and whether the vendor's documentation supports your own compliance obligations.
Key risks: a vendor whose AI system falls under EU AI Act high-risk categories without the required Annex IV technical documentation; a vendor whose data residency practices conflict with GDPR or HIPAA requirements; a vendor with no ISO 42001 certification or equivalent AI governance program.
ISO 42001 Clause 8.4 is the most directly applicable standard: it requires organizations to conduct AI system impact assessments parallel to GDPR DPIAs before integrating AI systems from third parties, and requires evidence of transparency, fairness testing, and explainability from AI suppliers.
Red flags: a vendor who cannot map their AI governance program to NIST AI RMF GOVERN 6; a vendor who claims EU AI Act compliance without providing Annex IV technical documentation for high-risk applications.
Domain 4: Incident Response and Notification
The central question is what happens when something goes wrong with the vendor's AI system, and whether the notification timeline is specifically designed for AI incidents.
Standard breach notification clauses address data breach scenarios. They do not address: discovery of a model backdoor post-deployment; a model update that silently changes behavior in a way that affects client outcomes; a base model change from a fourth-party provider that the vendor did not anticipate.
Red flags: no AI-specific incident response playbook separate from standard breach response; no defined notification timeline for behavioral changes caused by model updates; no defined escalation path for fourth-party (sub-processor) AI incidents.
Domain 5: Ongoing Monitoring and Change Management
The central question is how the vendor notifies you of changes to the AI system's behavior, and what governance controls are in place during model updates.
In traditional software, a new version is a discrete notification event. In AI, behavioral change is continuous. A vendor can make a production model update that materially changes output quality, safety guardrail behavior, or response characteristics without triggering a versioned release.
Red flags: no defined behavioral regression testing before model updates; no threshold-based notification policy for changes in model performance or safety guardrail behavior; no audit log of model updates and their assessed behavioral impact.
The AI Vendor Security Questionnaire: 20 Questions
Use these questions in your formal vendor due diligence process. Questions are grouped by the five domains above.
Data Handling and Residency
Model Behavior and Supply Chain
Compliance Posture
Incident Response and Notification
Ongoing Monitoring and Change Management
Scoring guidance: Questions 5 (AIBOM), 6 (supply chain integrity), 13 (AI incident response), and 16 (behavioral regression) are the highest-signal questions. A vendor who cannot answer these concretely should be escalated regardless of how they perform on other criteria.
Contractual Protections Every AI Vendor Agreement Needs
Standard Master Service Agreements were not designed for AI vendors. These provisions must be added or explicitly negotiated:
Model change notification. The contract must require notification of behavioral changes, not just versioned releases. Define thresholds: what constitutes a material behavioral change and what the notification timeline is. Failure to define this leaves you without recourse when a silent model update changes outcomes.
Zero-retention and inference logging. If the vendor offers zero-retention API access, that commitment must be contractually binding, not just a configuration option the vendor can change. Include specific language about what "zero-retention" means and whether it applies to abuse monitoring pipelines.
AIBOM delivery. Require the vendor to provide a current AIBOM on request and within a defined period (30 days is a reasonable starting point). This gives you the basis for ongoing supply chain monitoring.
Fourth-party AI sub-processor disclosure. The vendor must disclose all AI sub-processors and provide notice before adding or changing them. You cannot manage fourth-party risk you do not know exists.
AI-specific incident response timelines. Add explicit clauses covering: model backdoor discovery (notify within 24 hours), behavioral anomaly identified affecting client outputs (notify within 72 hours), base model update from fourth-party provider with potential behavioral impact (notify before deployment where possible).
Audit rights. Include explicit rights to audit the vendor's AI governance program, model supply chain documentation, and incident response procedures. Without this, you are relying on self-reported questionnaire responses with no verification mechanism.
Tiering AI Vendors by Risk Level
Not all AI vendors carry the same risk profile. Tiering allows you to allocate due diligence resources appropriately.
Tier 1 (Highest risk): Customer-facing AI systems that process regulated data (PII, PHI, financial records) and make or inform consequential decisions (credit, hiring, clinical, fraud). Requires full Level 3 due diligence including AIBOM delivery, fourth-party sub-processor mapping, ISO 42001 or equivalent certification, and dedicated AI incident response SLAs. Review at least annually and on any major model update.
Tier 2 (Moderate risk): Internal AI systems that process regulated data or are deeply integrated into business workflows (document processing, code generation at scale, internal copilots with access to sensitive systems). Requires Levels 1 and 2 due diligence: data residency verification, inference logging policy, basic supply chain questions, and defined notification obligations. Review every 18 months or on major model update.
Tier 3 (Lower risk): AI systems used in isolated R&D, analytics, or productivity contexts with no access to regulated data, no external-facing exposure, and no material business process dependency. Requires foundational questionnaire covering data privacy, API integration, and basic security controls. Review every 24 months.
Use the FS-ISAC tiered framework as your scoring baseline for data sensitivity and integration depth dimensions. Shadow AI, defined as SaaS vendors silently adding AI features to products you have already onboarded, represents a gap in most tiering systems. Addressing this requires continuous AI inventory monitoring, not just point-in-time assessments during procurement.
Ongoing Monitoring After Onboarding
Most AI vendor risk programs treat onboarding as the primary assessment moment and set calendar reminders for annual renewals. This is the wrong model for AI vendors.
In practice, an AI vendor's risk profile can change materially between annual reviews, because model behavior changes continuously. A vendor who migrates from GPT-4 to GPT-4.5 as their base model, adds a new retrieval pipeline, or fine-tunes on a new dataset may not notify customers of any of these changes under a standard MSA.
Effective ongoing monitoring for AI vendors includes:
Behavioral regression testing. Define a set of test inputs that probe for the behaviors your organization depends on. Run these against the vendor's API on a scheduled basis. Material drift in outputs warrants a vendor inquiry and potentially a re-assessment.
Automated API response anomaly detection. Tools like those offered by BeyondScale's AI security assessment program can detect anomalous shifts in model behavior that may signal a model update, a configuration change, or a supply chain compromise.
Vendor update tracking. Subscribe to vendor security advisories, model changelogs, and base model provider announcements. When a fourth-party base model provider (OpenAI, Anthropic, Google) announces a model change, assess whether your Tier 1 and Tier 2 vendors are affected.
Annual compliance re-assessment. AI regulatory requirements are evolving fast. A vendor who was EU AI Act compliant in 2025 may face new obligations under 2026 implementing acts. Annual re-assessment should include a regulatory posture review, not just a technical one.
If you need help structuring an AI vendor monitoring program or evaluating a specific vendor's responses, BeyondScale's AI security team has assessed dozens of AI vendor integrations across regulated industries.
Conclusion
Third-party AI vendor risk assessment is not traditional TPRM with an AI section added. It requires a different framework: AIBOM-based supply chain verification, behavioral monitoring instead of point-in-time scans, contractual protections for model change notification and fourth-party sub-processors, and tiering that reflects the unique risk dimensions AI vendors introduce.
The incidents are real: EchoLeak (CVE-2025-32711, CVSS 9.3), 1,400+ malicious models on Hugging Face, the PowerSchool breach affecting 62.4 million students. The stats are stark: only 4% of organizations have confidence their questionnaires reflect vendor reality; 49% experienced a third-party incident in the past 12 months.
The frameworks exist. NIST AI RMF GOVERN 6, ISO 42001 Clause 8.4, and the FS-ISAC tiered assessment guide give you the compliance grounding. The 20 questions and contractual provisions above give you the practitioner tools.
If your organization is onboarding AI vendors and does not have an AI-specific TPRM framework in place, the right time to build one was during procurement. The second-best time is now.
Run a BeyondScale AI security scan to identify which AI vendor integrations in your environment have unresolved risk exposure. Or contact our team to discuss building an AI TPRM program tailored to your vendor portfolio and compliance requirements.
Further Reading
BeyondScale Team
AI Security Team, BeyondScale Technologies
Security researcher and engineer at BeyondScale Technologies, an ISO 27001 certified AI cybersecurity firm.
Want to know your AI security posture? Run a free Securetom scan in 60 seconds.
Start Free Scan

