Assessing AI Risk in Vendor and Partner Tools: A Due Diligence Framework
Your agency uses a third-party API for natural language processing as a component in several client projects. You chose it for its accuracy and ease of integration. Then the vendor quietly updated their model, and the new version started hallucinating technical terms in your client's domain, generating plausible-sounding but fabricated product specifications. Your client's quality assurance team caught the errors, but not before several incorrect specifications were published to their customer portal. When you contacted the vendor, they confirmed the model update and said the change was covered in their release notes (which you hadn't read because you didn't have a process for monitoring vendor changes). The client held your agency responsible because you chose the vendor and integrated the tool. They weren't wrong.
AI agencies don't build everything from scratch. You use foundation models, pre-trained components, cloud AI services, data providers, annotation tools, monitoring platforms, and dozens of other third-party tools. Every one of these introduces AI risk into your supply chain. And when that risk materializes in a client project, the client doesn't care that the root cause was a vendor's model update. They care that the system you delivered broke.
Vendor AI risk assessment is the process of systematically evaluating the risks that third-party AI tools introduce into your projects and implementing controls to manage those risks. This guide provides a framework for doing it well.
Why Vendor AI Risk Is Different
Traditional vendor risk assessment focuses on security, uptime, financial stability, and data protection. These are all relevant for AI vendors, but AI introduces additional risk dimensions that traditional assessments miss.
Model behavior can change without notice. Unlike traditional software where behavior changes require explicit code updates, AI models can behave differently with new training data, fine-tuning, or architecture changes. Vendors may update their models frequently, and each update can change the behavior of systems built on top of them.
AI performance is probabilistic. Traditional software either works correctly or has a bug. AI systems produce probabilistic outputs that may be correct most of the time but fail for specific inputs, populations, or conditions. Vendor AI tools may perform well for your test cases but fail for edge cases that appear in your clients' production environments.
Bias transfers through the supply chain. If a vendor's model is biased, every system built on top of it inherits that bias. Your agency may conduct rigorous fairness testing on your own models but unknowingly introduce bias through a vendor component.
Explainability depends on the vendor. If your system uses a vendor's model as a black box, you may be unable to explain the system's decisions, even if the rest of your pipeline is fully transparent.
Vendor AI creates regulatory liability. Under many AI regulations, the deployer bears primary responsibility for the AI system's compliance, regardless of which components came from vendors. Your client can't pass regulatory liability to your vendor, and your agency can't pass it either.
The Vendor AI Risk Assessment Framework
Our framework evaluates vendor AI risk across seven dimensions. For each dimension, we provide specific questions to ask and red flags to watch for.
Dimension 1: Model Transparency
How much does the vendor disclose about their model's design, training, and behavior?
Questions to ask:
- What model architecture is used? What are its known strengths and limitations?
- What data was the model trained on? What is the data's provenance and licensing status?
- How was the model evaluated? What metrics were measured, and what were the results?
- Does the vendor provide a model card or equivalent documentation?
- Can the vendor explain how the model makes decisions?
- What fairness testing has been conducted, and what were the results?
Red flags:
- The vendor refuses to disclose basic information about the model's architecture or training data
- No model card or equivalent documentation is available
- The vendor claims the model is "unbiased" without providing supporting evidence
- The vendor cannot explain how the model handles edge cases or out-of-distribution inputs
Why it matters: Without transparency, you can't evaluate whether the vendor's model is appropriate for your use case, and you can't satisfy your own documentation and audit requirements.
Dimension 2: Performance and Reliability
How well does the vendor's model perform, and how reliable is that performance?
Questions to ask:
- What are the model's performance metrics on standard benchmarks relevant to your use case?
- How does performance vary across different populations, geographies, or data distributions?
- What is the model's latency and throughput? How does it scale?
- What is the vendor's uptime track record? What SLAs do they offer?
- How does the model handle inputs outside its training distribution?
- What happens when the model fails? Are errors detectable?
Red flags:
- The vendor only reports aggregate performance without disaggregated metrics
- Performance claims are based on benchmarks that don't reflect your use case
- The vendor cannot demonstrate how the model handles edge cases
- Historical uptime is below 99.5% for a service you're relying on in production
Why it matters: Vendor model performance directly affects your deliverables. If the vendor's model underperforms for a subset of your client's population, your system inherits that performance gap.
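The "disaggregated metrics" red flag is easiest to act on by computing per-group performance on your own labeled data rather than trusting the vendor's aggregate numbers. A minimal sketch, assuming simple dict-shaped records (the field names are illustrative):

```python
from collections import defaultdict

def disaggregated_accuracy(records, group_key):
    """Compute per-group accuracy from labeled prediction records.

    `records` is a list of dicts with keys: group_key, "prediction", "label".
    An acceptable aggregate score can hide a large gap for one group.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        group = r[group_key]
        total[group] += 1
        if r["prediction"] == r["label"]:
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}

records = [
    {"region": "EU",   "prediction": "a", "label": "a"},
    {"region": "EU",   "prediction": "b", "label": "b"},
    {"region": "APAC", "prediction": "a", "label": "b"},
    {"region": "APAC", "prediction": "b", "label": "b"},
]
print(disaggregated_accuracy(records, "region"))
# Aggregate accuracy is 0.75, but EU is 1.0 while APAC is only 0.5
```

The same pattern works for any grouping that matters to the client: geography, language, product line, or demographic segment.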
Dimension 3: Update and Change Management
How does the vendor manage changes to their model, and how will those changes affect your systems?
Questions to ask:
- How often is the model updated? What triggers updates?
- How are customers notified about model changes? How much advance notice is provided?
- Can you pin to a specific model version to avoid unexpected behavior changes?
- Does the vendor provide release notes that describe what changed and why?
- Can you test new model versions before they're applied to your production systems?
- What is the vendor's deprecation policy for old model versions?
Red flags:
- The vendor updates models without notice or with minimal notice
- You cannot pin to a specific model version
- Release notes are vague or don't describe behavioral changes
- Old model versions are deprecated with short timelines
Why it matters: Unannounced model changes are one of the most common sources of vendor AI failures. If you can't control when and how model updates are applied, you can't guarantee the behavior of systems built on the vendor's model.
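In practice, version pinning comes down to how you address the model in API calls. A hedged sketch, assuming a hypothetical vendor whose model identifiers carry an explicit date-stamped version (the client shape and naming scheme are illustrative, not any real vendor's interface):

```python
# Hypothetical vendor API usage. The identifier syntax is illustrative;
# check your vendor's docs for how (and whether) versions can be pinned.

PINNED_MODEL = "vendor-nlp-2024-06-01"   # explicit, frozen version
FLOATING_MODEL = "vendor-nlp-latest"     # alias: behavior can change silently

def build_request(text: str, model: str = PINNED_MODEL) -> dict:
    """Construct a request payload that names an exact model version.

    With a pinned identifier, every production call resolves to the same
    weights until you deliberately bump PINNED_MODEL after testing the
    new version in staging.
    """
    return {"model": model, "input": text}

request = build_request("Summarize the Q3 release notes.")
print(request["model"])  # vendor-nlp-2024-06-01
```

If the vendor only exposes a floating alias like `latest`, that is itself a Dimension 3 red flag worth raising during procurement.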
Dimension 4: Data Handling and Privacy
How does the vendor handle the data that flows through their system?
Questions to ask:
- Does the vendor use customer data to train or improve their models?
- Where is data processed and stored? Can you specify the geographic region?
- How long is data retained? Can you request deletion?
- What encryption is used for data in transit and at rest?
- Does the vendor share data with third parties?
- What data protection certifications does the vendor hold (SOC 2, ISO 27001, etc.)?
- Does the vendor's privacy policy comply with relevant regulations (GDPR, CCPA, etc.)?
Red flags:
- The vendor uses customer data for model training without explicit opt-out
- Data is processed in jurisdictions that create sovereignty concerns
- Data retention policies are vague or non-existent
- The vendor lacks relevant security certifications
Why it matters: When you send client data to a vendor's API, that vendor becomes a data processor under most privacy regulations. Their data handling practices directly affect your compliance obligations.
Dimension 5: Security
How secure is the vendor's AI system and infrastructure?
Questions to ask:
- Has the vendor conducted security testing specific to AI vulnerabilities (adversarial attacks, model extraction, prompt injection)?
- What access controls are in place? How are authentication and authorization managed?
- How does the vendor handle security incidents? What is their notification policy?
- Has the vendor been the subject of any security breaches or incidents?
- Does the vendor participate in bug bounty programs?
- What is the vendor's vulnerability management process?
Red flags:
- The vendor has not conducted AI-specific security testing
- Security incident notification policies are vague
- The vendor has experienced breaches without transparent communication
- Authentication mechanisms are weak (e.g., API keys with no rotation)
Dimension 6: Regulatory and Compliance Alignment
Does the vendor's AI system meet the regulatory requirements that apply to your clients?
Questions to ask:
- Which regulations does the vendor claim compliance with?
- Can the vendor provide documentation to support compliance claims?
- Does the vendor support your clients' audit requirements?
- How does the vendor handle regulatory changes? How quickly do they adapt?
- Can the vendor sign a Data Processing Agreement (DPA)?
- Does the vendor provide the disclosures required by relevant AI regulations (EU AI Act transparency requirements, etc.)?
Red flags:
- The vendor claims compliance without supporting documentation
- The vendor is unwilling or unable to sign a DPA
- The vendor is unfamiliar with regulations that apply to your clients' industries
- The vendor cannot support audit requirements
Dimension 7: Business Viability and Vendor Lock-In
Is the vendor financially stable, and what happens if they go away?
Questions to ask:
- What is the vendor's financial position? Are they funded, profitable, or at risk?
- How many customers use the product? Is it a core offering or a side project?
- What happens to your data and models if the vendor goes out of business?
- How difficult would it be to switch to an alternative vendor?
- Are there contractual protections (escrow, source code access) in case of vendor failure?
- Does the vendor use proprietary formats that create lock-in?
Red flags:
- The vendor is a startup with uncertain funding and no clear path to profitability
- Your system is deeply integrated with the vendor in ways that would make switching extremely difficult
- No contractual protections exist for vendor failure scenarios
- The vendor uses proprietary formats that don't interoperate with alternatives
Implementing Vendor AI Risk Assessment
Create a Vendor Risk Register
Maintain a register of all AI vendors and tools used across your agency. For each vendor, record:
- Vendor name and product
- What the product does and where it's used
- The risk assessment score (based on the seven dimensions)
- Key risks and mitigation measures
- Contract terms and renewal dates
- Point of contact at the vendor
- Date of last risk assessment review
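The register doesn't require specialized tooling; a typed record with the fields above, kept in version control, is a workable start. A minimal Python sketch (the vendor and values are invented for illustration):

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class VendorRiskEntry:
    """One row in the vendor AI risk register; fields mirror the list above."""
    vendor: str
    product: str
    used_in: list            # client projects or systems that use the product
    dimension_scores: dict   # per-dimension risk, e.g. 1 = low .. 3 = high
    key_risks: list
    mitigations: list
    contract_renewal: date
    vendor_contact: str
    last_reviewed: date

entry = VendorRiskEntry(
    vendor="ExampleNLP Inc.",            # illustrative vendor
    product="Entity extraction API",
    used_in=["client-portal", "doc-pipeline"],
    dimension_scores={"transparency": 2, "change_management": 3},
    key_risks=["no version pinning offered"],
    mitigations=["weekly benchmark run", "fallback vendor identified"],
    contract_renewal=date(2025, 3, 31),
    vendor_contact="support@example-nlp.invalid",
    last_reviewed=date(2024, 11, 1),
)
```

Because the entries are plain data, generating a quarterly review report or flagging overdue reviews is a few lines of code on top.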
Define Risk Thresholds
Establish thresholds for acceptable vendor AI risk. Vendors that score above the threshold require additional controls or should be avoided.
- Low risk: Vendors with strong transparency, robust performance, good change management, and solid compliance alignment. These vendors can be used with standard monitoring.
- Medium risk: Vendors with adequate but imperfect governance. These vendors can be used with enhanced monitoring, contractual protections, and contingency planning.
- High risk: Vendors with significant gaps in transparency, security, or compliance. These vendors should be avoided unless the risk can be mitigated through additional controls or the alternative is worse.
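One way to make the thresholds concrete is to score each of the seven dimensions (say 1 = low risk to 3 = high risk) and classify on the worst dimension, so a single severe gap can't be averaged away by good scores elsewhere. A sketch; the scale and rule are illustrative and should be tuned to your own risk appetite:

```python
def classify_vendor(dimension_scores: dict) -> str:
    """Classify overall vendor risk from per-dimension scores (1=low .. 3=high).

    Classifying on the worst dimension means one severe gap (e.g. no
    AI-specific security testing) forces "high" regardless of strengths
    elsewhere, which matches how these risks actually materialize.
    """
    worst = max(dimension_scores.values())
    if worst >= 3:
        return "high"
    if worst == 2:
        return "medium"
    return "low"

scores = {
    "transparency": 1, "performance": 1, "change_management": 2,
    "data_handling": 1, "security": 1, "compliance": 1, "viability": 1,
}
print(classify_vendor(scores))  # medium: change-management gaps need controls
```

A vendor landing in "medium" here would get the enhanced monitoring and contractual protections described above before any production use.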
Conduct Ongoing Monitoring
Vendor AI risk is not static. Monitor your vendors continuously.
- Subscribe to vendor release notes and communications. Stay informed about model updates, policy changes, and service modifications.
- Test vendor model performance regularly. Run your benchmark test suite against the vendor's model periodically to detect performance changes.
- Review vendor compliance status annually. Verify that the vendor's compliance certifications are current and that their practices align with evolving regulations.
- Monitor vendor financial health. For critical vendors, track financial indicators that might signal instability.
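The "test vendor model performance regularly" step is easiest to sustain as an automated regression check: keep a frozen benchmark set, score the vendor's current model against it on a schedule, and alert when any metric drops more than a tolerance below the recorded baseline. A sketch, with the metric values standing in for whatever your real evaluation produces:

```python
def detect_regressions(baseline: dict, current: dict,
                       tolerance: float = 0.02) -> list:
    """Return names of metrics that fell more than `tolerance` below baseline.

    Assumes higher values are better; a missing metric counts as a
    regression so silently dropped measurements also get flagged.
    """
    return [metric for metric, base in baseline.items()
            if current.get(metric, 0.0) < base - tolerance]

baseline = {"accuracy": 0.94, "f1_rare_terms": 0.88}
current  = {"accuracy": 0.93, "f1_rare_terms": 0.81}  # after a vendor update
print(detect_regressions(baseline, current))
# ['f1_rare_terms'] -- flagged for review before clients notice
```

Run this from CI or a scheduled job after each vendor release note, and the scenario in the opening anecdote becomes an alert instead of a published error.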
Build Contingency Plans
For every critical vendor dependency, have a contingency plan.
- Identify alternative vendors that could replace the current vendor if needed
- Maintain compatibility with alternative vendor APIs to reduce switching costs
- Keep copies of essential vendor artifacts (model versions, configurations, documentation) where contractually permitted
- Define switching triggers: specific events or thresholds that would prompt you to switch vendors
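Keeping switching costs low mostly means your application code never calls a vendor SDK directly. A thin, vendor-neutral interface is usually enough; the vendor classes below are hypothetical stand-ins for real SDK wrappers:

```python
from abc import ABC, abstractmethod

class NLPProvider(ABC):
    """Vendor-neutral interface that application code depends on."""
    @abstractmethod
    def extract_entities(self, text: str) -> list: ...

class PrimaryVendor(NLPProvider):
    # Would wrap the primary vendor's SDK; stubbed here for illustration.
    def extract_entities(self, text: str) -> list:
        return [w.strip(".,") for w in text.split() if w.istitle()]

class FallbackVendor(NLPProvider):
    # Alternative implementation kept warm so switching is exercised, not theoretical.
    def extract_entities(self, text: str) -> list:
        return [w.strip(".,") for w in text.split() if w[:1].isupper()]

def process(document: str, provider: NLPProvider) -> list:
    # Application code sees only the interface; switching vendors is
    # one constructor change, not a rewrite.
    return provider.extract_entities(document)

print(process("Acme launched Widget in Berlin.", PrimaryVendor()))
```

The abstraction also makes the "switching triggers" above actionable: when a trigger fires, you swap the provider class rather than re-engineering every integration point.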
Contractual Protections
Include these provisions in your vendor agreements:
- Model change notification requirements with adequate advance notice
- Version pinning rights allowing you to use a specific model version
- Performance guarantees with remedies if performance degrades
- Audit rights allowing you to assess the vendor's AI governance practices
- Data processing agreements compliant with applicable privacy regulations
- IP indemnification covering claims arising from the vendor's model outputs
- Exit provisions including data portability, transition assistance, and reasonable exit periods
Your Next Steps
This week: List all third-party AI tools and services used across your agency's projects. For each one, identify the key risks and assess whether adequate controls are in place.
This month: Conduct a formal risk assessment of your top three most critical AI vendors using the seven-dimension framework.
This quarter: Implement a vendor AI risk register and establish ongoing monitoring processes. Review and update your vendor contracts to include AI-specific protections.
Every third-party AI tool in your stack is a potential point of failure. Vendor AI risk assessment gives you visibility into those potential failures and the ability to manage them before they become client problems. The agencies that take vendor risk seriously will deliver more reliable AI systems and avoid the unpleasant surprise of discovering that their vendor's model update just broke three client projects simultaneously.