A pharmaceutical company deployed an AI-powered drug interaction checker that doctors relied on for prescribing decisions. A security researcher discovered that carefully crafted inputs could cause the model to misclassify dangerous interactions as safe. The vulnerability was not in the application code; it was in the model itself. The model was susceptible to adversarial inputs that looked like normal queries to a human but were specifically designed to trigger incorrect responses. The company had undergone a comprehensive security audit of their application and infrastructure, but nobody had tested the model for adversarial robustness. The vulnerability existed for seven months before it was discovered and patched. Fortunately, no patient was harmed, but the incident triggered a $2 million security remediation project and a regulatory review.
AI systems introduce novel security challenges that traditional application security does not address. Models can be attacked, training data can be poisoned, and the probabilistic nature of AI means that even well-functioning systems can produce dangerous outputs under specific conditions. Your agency must deliver security architectures that protect the entire AI stack, not just the application layer.
AI-Specific Threat Landscape
Threat Category 1: Model Attacks
Adversarial inputs. Carefully crafted inputs that cause the model to produce incorrect outputs. These inputs often look normal to humans but exploit specific vulnerabilities in the model's decision boundaries. In computer vision, a few changed pixels can cause an image classifier to misidentify an object. In NLP, subtle word substitutions can flip a sentiment classification.
Model inversion. An attacker with access to the model's API can reconstruct information about the training data by making many queries and analyzing the responses. This is particularly dangerous when training data contains sensitive information (medical records, financial data).
Model extraction (model theft). An attacker replicates the model by querying it extensively and training a clone model on the input-output pairs. This steals the intellectual property embedded in the model.
Prompt injection (for LLM systems). An attacker embeds instructions in the input that override the model's system prompt, causing it to ignore its instructions and follow the attacker's commands.
Threat Category 2: Data Attacks
Training data poisoning. An attacker corrupts the training data to cause the model to learn incorrect patterns. This can be done by injecting malicious examples into the training dataset or by manipulating the data collection process.
Data exfiltration. An attacker accesses the training data or inference data, which may contain sensitive information (PII, trade secrets, proprietary algorithms).
Feature manipulation. An attacker manipulates the features fed to the model at inference time, causing it to make incorrect predictions on specific inputs.
Threat Category 3: Infrastructure Attacks
Supply chain attacks. Malicious code in model dependencies (Python packages, pre-trained models, Docker images) that compromises the AI system.
Inference endpoint attacks. Traditional API security vulnerabilities (injection, authentication bypass, denial of service) applied to model serving endpoints.
Pipeline attacks. Compromised data pipelines that corrupt data as it flows from source to model.
Security Architecture Components
Input Security
Input validation. Validate all inputs before they reach the model.
- Schema validation (correct data types, required fields, value ranges)
- Content safety scanning (PII detection, malicious content detection)
- Anomaly detection (inputs that are statistically unusual compared to normal traffic)
- Rate limiting (prevent brute-force adversarial attacks and model extraction attempts)
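As a rough illustration of the first and last items in this list, the sketch below pairs schema validation with a per-client sliding-window rate limiter. The request shape (`drug_a`, `drug_b`, `patient_age`), the value ranges, and the class names are assumptions made up for the example, not part of any particular product.

```python
import time
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class InteractionQuery:
    """Hypothetical request shape, used only for illustration."""
    drug_a: str
    drug_b: str
    patient_age: int


def validate_query(payload: dict) -> InteractionQuery:
    """Schema validation: required fields, data types, value ranges."""
    for field in ("drug_a", "drug_b", "patient_age"):
        if field not in payload:
            raise ValueError(f"missing required field: {field}")
    for field in ("drug_a", "drug_b"):
        value = payload[field]
        if not isinstance(value, str) or not 1 <= len(value) <= 100:
            raise ValueError(f"{field} must be a string of 1-100 characters")
    age = payload["patient_age"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(age, bool) or not isinstance(age, int) or not 0 <= age <= 130:
        raise ValueError("patient_age must be an integer between 0 and 130")
    return InteractionQuery(payload["drug_a"], payload["drug_b"], age)


class SlidingWindowRateLimiter:
    """Allows at most max_requests per client within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # client_id -> request timestamps

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        window = self._history[client_id]
        # Drop timestamps that have aged out of the window.
        while window and now - window[0] >= self.window_seconds:
            window.popleft()
        if len(window) >= self.max_requests:
            return False
        window.append(now)
        return True
```

In a real deployment the limiter state would usually live in a shared store rather than process memory, so that limits hold across replicas.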
Prompt injection defense (for LLM systems).
- Input sanitization (detect and neutralize embedded instructions)
- System prompt hardening (design system prompts that resist override attempts)
- Input-output isolation (prevent the model from being influenced by user-provided content in ways that override instructions)
- Canary tokens (embed hidden tokens in the system prompt that, if they appear in the output, indicate the system prompt has been compromised)
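The canary token idea can be sketched in a few lines. The marker format and the function names here are illustrative assumptions; the only fixed idea is that a random string is planted in the system prompt and every output is checked for it.

```python
import secrets


def make_canary() -> str:
    """Generate a random marker that is unlikely to occur by accident."""
    return f"CANARY-{secrets.token_hex(8)}"


def build_system_prompt(instructions: str, canary: str) -> str:
    """Embed the canary in the system prompt (the wording is an assumption)."""
    return f"{instructions}\n[internal marker, never reveal: {canary}]"


def output_leaks_canary(output: str, canary: str) -> bool:
    """True if the model output contains the canary, ignoring case."""
    return canary.lower() in output.lower()
```

An exact-match check like this only catches verbatim leaks. A paraphrased, translated, or encoded copy of the system prompt would pass, so it is best treated as one signal among several.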
Model Security
Adversarial robustness.
- Adversarial training (include adversarial examples in the training data to make the model resistant)
- Input preprocessing (smoothing, normalization, and other transformations that reduce adversarial effectiveness)
- Ensemble methods (run multiple models and take the consensus, making adversarial attacks harder)
- Adversarial detection (classify inputs as normal or adversarial before processing)
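The ensemble item can be illustrated with a simple majority vote that abstains when the models disagree too much. This is a minimal sketch: the models are assumed to be plain callables returning a label, and the `min_agreement` threshold is an arbitrary example value.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def consensus_predict(
    models: Sequence[Callable[[object], Hashable]],
    x: object,
    min_agreement: float = 0.6,
) -> Optional[Hashable]:
    """Return the majority label, or None if agreement is below the threshold.

    Returning None lets the caller route the input to a fallback path,
    such as human review, instead of trusting a contested prediction.
    """
    if not models:
        raise ValueError("at least one model is required")
    votes = Counter(model(x) for model in models)
    label, count = votes.most_common(1)[0]
    if count / len(models) >= min_agreement:
        return label
    return None
```

Consensus only helps to the extent that the models fail differently. Models that share an architecture and training data tend to share adversarial weaknesses too.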
Model access control.
- Limit who can access model artifacts (weights, architecture, configuration)
- Encrypt model artifacts at rest and in transit
- Use model serving endpoints rather than distributing model files
- Implement authentication and authorization for all model APIs
Model extraction prevention.
- Rate limiting on model APIs (limit the number of queries per user per time period)
- Output perturbation (add small amounts of noise to outputs that do not affect utility but make extraction less effective)
- Watermarking (embed detectable patterns in model outputs that can prove model theft)
- Query pattern monitoring (detect extraction attempts by identifying systematic query patterns)
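One possible reading of output perturbation, sketched under assumptions: add a small amount of uniform noise to the returned class probabilities, renormalize, and fall back to the unmodified vector whenever the noise would change the top-ranked class. The noise scale and the fallback rule are choices made for this example, not an established standard.

```python
import random
from typing import List, Optional, Sequence


def perturb_probabilities(
    probs: Sequence[float],
    scale: float = 0.01,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Add bounded noise to class probabilities without changing the top class."""
    if not probs:
        raise ValueError("probs must not be empty")
    rng = rng or random.Random()
    # Keep every value strictly positive so renormalization stays well defined.
    noisy = [max(p + rng.uniform(-scale, scale), 1e-9) for p in probs]
    total = sum(noisy)
    noisy = [p / total for p in noisy]
    original_top = max(range(len(probs)), key=probs.__getitem__)
    noisy_top = max(range(len(noisy)), key=noisy.__getitem__)
    if noisy_top != original_top:
        # The noise would flip the prediction, so return the original values.
        return list(probs)
    return noisy
```

The fallback preserves the predicted label, which is the "does not affect utility" part. How much the noise actually slows extraction depends on the attack and should be measured rather than assumed.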
Data Security
Training data protection.
- Encrypt training data at rest and in transit
- Implement access controls on training datasets
- Use differential privacy during training to prevent memorization of individual examples
- Audit training data access logs
Feature pipeline security.
- Validate feature data at every pipeline stage
- Implement integrity checks (checksums, row counts) to detect tampering
- Use encrypted connections for all data transfers
- Monitor feature distributions for signs of manipulation
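The integrity-check item can be sketched as a fingerprint computed when a batch leaves one pipeline stage and verified when it arrives at the next. The row format (a list of JSON-serializable dicts) and the function names are assumptions for illustration.

```python
import hashlib
import json
from typing import Dict, List


def batch_fingerprint(rows: List[dict]) -> Dict[str, object]:
    """Row count plus a SHA-256 digest over a canonical encoding of each row."""
    digest = hashlib.sha256()
    for row in rows:
        # sort_keys makes the encoding independent of dict key order.
        encoded = json.dumps(row, sort_keys=True, separators=(",", ":"))
        digest.update(encoded.encode("utf-8"))
        digest.update(b"\n")
    return {"row_count": len(rows), "sha256": digest.hexdigest()}


def verify_batch(rows: List[dict], expected: Dict[str, object]) -> None:
    """Raise ValueError if the batch does not match the recorded fingerprint."""
    actual = batch_fingerprint(rows)
    if actual["row_count"] != expected["row_count"]:
        raise ValueError(
            f"row count mismatch: {actual['row_count']} != {expected['row_count']}"
        )
    if actual["sha256"] != expected["sha256"]:
        raise ValueError("checksum mismatch: batch contents changed in transit")
```

A plain hash detects tampering only if the fingerprint itself is stored where an attacker cannot rewrite it. Where that cannot be guaranteed, a keyed HMAC or a signature is the usual substitute.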
Inference data protection.
- Minimize data collection (only log what is necessary)
- Encrypt inference logs
- Implement retention policies and automated deletion
- Control access to inference logs
Infrastructure Security
Supply chain security.
- Scan all dependencies for known vulnerabilities
- Pin dependency versions (do not use floating versions)
- Use private package registries for critical dependencies
- Verify checksums and signatures on pre-trained models and Docker images
- Regularly audit the dependency tree
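Checksum verification for a downloaded model artifact might look like the sketch below. It assumes the expected SHA-256 digest was pinned ahead of time from a trusted channel; the function names are hypothetical.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: str, expected_sha256: str) -> None:
    """Raise RuntimeError unless the file matches the pinned digest."""
    actual = sha256_of_file(path)
    if not hmac.compare_digest(actual, expected_sha256.strip().lower()):
        raise RuntimeError(f"digest mismatch for {path}: refusing to load")
```

Running this check before the artifact is deserialized matters, because some model formats can execute code at load time.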
Network security.
- Isolate AI infrastructure in a separate network segment
- Use private endpoints for model serving (no public internet exposure unless required)
- Implement TLS for all communications
- Use service mesh for secure inter-service communication
Compute security.
- Use hardened container images
- Implement runtime security monitoring
- Restrict GPU access to authorized workloads
- Monitor for cryptocurrency mining (GPUs are attractive targets)
Delivery Process
Phase 1: Threat Modeling (Weeks 1-4)
- Identify all AI systems and their components
- Map the AI-specific threat landscape for each system
- Conduct threat modeling workshops with security and AI teams
- Prioritize threats by likelihood and impact
- Define security requirements for each system
Phase 2: Security Architecture Design (Weeks 5-8)
- Design input security controls for each system
- Design model security measures (robustness, access control, extraction prevention)
- Design data security architecture (encryption, access control, privacy)
- Design infrastructure security controls
- Create security testing plan
Phase 3: Implementation (Weeks 9-16)
- Implement input validation and sanitization
- Implement model access controls and encryption
- Harden data pipelines and storage
- Implement infrastructure security controls
- Deploy security monitoring and alerting
Phase 4: Testing and Operations (Weeks 17-22)
- Conduct adversarial testing (red teaming against the AI-specific threats)
- Conduct penetration testing of AI infrastructure
- Test incident response procedures
- Train teams on AI security practices
- Establish ongoing security monitoring and review cadence
Building a Security-First AI Development Culture
Technical controls are necessary but insufficient. The team that builds AI systems must internalize security thinking.
Security training for AI teams:
- Prompt injection awareness: Every developer who writes prompts for LLM systems must understand prompt injection attacks, how they work, and how to defend against them. Run hands-on workshops where engineers attack their own systems.
- Adversarial ML fundamentals: Data scientists should understand the basics of adversarial attacks on their model type. They do not need to be security researchers, but they should know what attacks are possible and what defenses exist.
- Secure coding practices: ML code has the same vulnerabilities as any code: injection, authentication bypass, information disclosure. Standard secure coding training applies to ML code too.
- Supply chain awareness: Engineers should verify the provenance of every model, dataset, and package they use. A pre-trained model downloaded from an unverified source could contain a backdoor.
Security review in the development process:
- Include a security review step in the model development workflow
- For high-risk models, require a formal security assessment before deployment
- Include adversarial testing in the standard evaluation pipeline
- Review prompt templates for injection vulnerabilities before production use
Incident Response for AI-Specific Attacks
When an AI security incident occurs, the response process has unique requirements.
Detection: AI attacks may not trigger traditional security alerts. Adversarial inputs look like normal traffic. Model extraction happens through legitimate API calls. Data poisoning manifests as gradual performance degradation. Detection requires AI-specific monitoring: prediction distribution monitoring, query pattern analysis, and performance trend analysis.
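Prediction distribution monitoring can be sketched by comparing the label mix in a recent window against a baseline. The example below uses total variation distance, which is one reasonable choice among several, and the 0.15 alert threshold is an arbitrary assumption that would need tuning per system.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def label_distribution(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Relative frequency of each predicted label."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def total_variation_distance(
    p: Dict[Hashable, float], q: Dict[Hashable, float]
) -> float:
    """Half the L1 distance between two distributions; ranges from 0 to 1."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(l, 0.0) - q.get(l, 0.0)) for l in labels)


def drift_alert(
    baseline: Sequence[Hashable],
    recent: Sequence[Hashable],
    threshold: float = 0.15,
) -> bool:
    """True when the recent prediction mix has moved past the threshold."""
    distance = total_variation_distance(
        label_distribution(baseline), label_distribution(recent)
    )
    return distance > threshold
```

A shift in the prediction mix is a prompt to investigate, not proof of an attack: seasonal changes in traffic produce the same signal.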
Triage: Classify the incident by type and severity:
- Critical: Data exfiltration, model producing harmful outputs, safety system bypass
- High: Model extraction in progress, active prompt injection attack, training data poisoning detected
- Medium: Adversarial input attempt detected but blocked, suspicious query patterns
- Low: Failed adversarial attempt, low-confidence anomaly detection
Containment: For critical and high-severity incidents:
- Rate limit or block the attacking source
- Roll back to a known-good model version if the model has been compromised
- Disable the affected endpoint if necessary to prevent ongoing harm
- Isolate affected data if poisoning is suspected
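The triage levels and the containment rule above can be encoded as a small lookup so that responders apply them consistently. The incident type identifiers are names invented for this sketch; the severity assignments follow the triage list.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Hypothetical incident identifiers mapped to the severities in the triage list.
SEVERITY_BY_INCIDENT = {
    "data_exfiltration": Severity.CRITICAL,
    "harmful_model_output": Severity.CRITICAL,
    "safety_system_bypass": Severity.CRITICAL,
    "model_extraction_in_progress": Severity.HIGH,
    "active_prompt_injection": Severity.HIGH,
    "training_data_poisoning": Severity.HIGH,
    "adversarial_input_blocked": Severity.MEDIUM,
    "suspicious_query_pattern": Severity.MEDIUM,
    "failed_adversarial_attempt": Severity.LOW,
    "low_confidence_anomaly": Severity.LOW,
}


def triage(incident_type: str) -> Severity:
    """Look up the severity; unknown types are rejected rather than guessed."""
    try:
        return SEVERITY_BY_INCIDENT[incident_type]
    except KeyError:
        raise ValueError(f"unclassified incident type: {incident_type}") from None


def requires_containment(severity: Severity) -> bool:
    """Containment steps apply to critical and high-severity incidents."""
    return severity >= Severity.HIGH
```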
Investigation: Determine the scope and root cause:
- Analyze query logs to identify the attack pattern and duration
- Assess whether the attack was successful (did the attacker extract data, compromise the model, or affect users?)
- Identify the vulnerability that was exploited
- Determine whether other systems are affected
Remediation: Address the vulnerability:
- Implement or strengthen the relevant security control
- Retrain the model if training data was compromised
- Update monitoring to detect similar attacks in the future
- Notify affected users if their data was exposed
Post-incident review: Document the incident, the response, and the lessons learned. Update the threat model and security architecture based on the incident.
AI Security Maturity Assessment
Before building a security architecture, assess the organization's current AI security maturity.
Level 1: No AI-specific security. AI systems have standard application security (authentication, TLS) but no AI-specific security measures. Models are not tested for adversarial robustness. Training data is not protected beyond standard access controls. This is where most organizations are.
Level 2: Basic AI security. Input validation and rate limiting are in place. Model access is controlled. Training data is encrypted. But there is no adversarial testing, no prompt injection defense, and no AI-specific incident response.
Level 3: Systematic AI security. Adversarial testing is part of the model development process. Prompt injection defenses are deployed. Training data is protected with encryption and access controls. AI-specific monitoring detects anomalous query patterns. Incident response includes AI-specific procedures.
Level 4: Advanced AI security. All Level 3 capabilities plus model watermarking, extraction prevention, data poisoning detection, and automated adversarial testing in CI/CD. Regular red team exercises test the full AI stack. Security metrics are tracked and reported to leadership.
Most initial engagements take an organization from Level 1 to Level 3, with Level 4 capabilities added through ongoing security operations.
Security Architecture for LLM Applications
LLM applications face unique security challenges that deserve special attention.
Prompt injection defense in depth. No single defense is sufficient against prompt injection. Use multiple layers: input sanitization (detect and neutralize embedded instructions), system prompt hardening (design prompts that resist override), output filtering (detect outputs that indicate the system prompt was compromised), and monitoring (track patterns that suggest ongoing prompt injection attacks).
Data leakage prevention. LLMs may inadvertently reveal confidential information from their context window. Implement output scanning for PII, confidential business information, and system prompt content. Scrub outputs before they reach users.
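A minimal sketch of output scrubbing, assuming regex-based detection: the two patterns below (email addresses and US-style Social Security numbers) are deliberately simple illustrations and would miss many real PII formats. Production systems typically rely on a dedicated PII detection service.

```python
import re
from typing import List, Optional, Tuple

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scrub_output(
    text: str, system_prompt: Optional[str] = None
) -> Tuple[str, List[str]]:
    """Redact known patterns and return the cleaned text plus what was found."""
    findings: List[str] = []
    # A verbatim copy of the system prompt in the output is treated as a leak.
    if system_prompt and system_prompt in text:
        text = text.replace(system_prompt, "[REDACTED:system_prompt]")
        findings.append("system_prompt")
    for name, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{name}]", text)
        if count:
            findings.append(name)
    return text, findings
```

The findings list is worth logging even when redaction succeeds, since repeated hits can indicate an ongoing extraction attempt.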
Tool use security. LLM agents with tool access (database queries, API calls, code execution) present additional attack surfaces. An attacker who successfully injects instructions could cause the agent to execute unauthorized actions. Implement strict tool access controls, validate all tool inputs, and require human approval for high-risk actions.
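The three controls named here (strict tool access, input validation, human approval for high-risk actions) can be combined in a single gateway that every agent tool call passes through. This is a sketch under assumptions: `ToolPolicy`, `ToolGateway`, and the approval callback are hypothetical names, not the API of any agent framework.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ToolPolicy:
    handler: Callable[..., object]      # the function that performs the action
    validate: Callable[[dict], bool]    # returns True if the arguments are acceptable
    high_risk: bool = False             # high-risk tools need human approval


class ToolGateway:
    """Single choke point between the LLM agent and its tools."""

    def __init__(
        self,
        policies: Dict[str, ToolPolicy],
        approve: Callable[[str, dict], bool],
    ):
        self._policies = dict(policies)
        self._approve = approve  # e.g. a callback that asks a human reviewer

    def call(self, tool_name: str, args: dict) -> object:
        policy = self._policies.get(tool_name)
        if policy is None:
            # Allowlist: anything not registered is denied.
            raise PermissionError(f"tool not allowed: {tool_name}")
        if not policy.validate(args):
            raise ValueError(f"invalid arguments for {tool_name}")
        if policy.high_risk and not self._approve(tool_name, args):
            raise PermissionError(f"approval denied for {tool_name}")
        return policy.handler(**args)
```

Because the gateway denies by default, a successful prompt injection can at most invoke tools that were explicitly registered, with arguments that pass validation.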
AI Security for Third-Party Model Dependencies
Most organizations use third-party models: pre-trained models from Hugging Face, commercial APIs from OpenAI or Anthropic, or models embedded in SaaS products. Each dependency introduces security risks that must be managed.
Model provenance verification. Before deploying any third-party model, verify its provenance. Where was it trained? By whom? On what data? Has it been audited for backdoors or biases? Models downloaded from public repositories could have been tampered with. A backdoored model that performs normally on standard inputs but produces attacker-controlled outputs on specific triggers is extremely difficult to detect.
API security for commercial model providers. When using commercial AI APIs, apply the same security discipline as any third-party API integration. Use separate API keys for each application. Implement key rotation policies. Monitor API usage for anomalies that could indicate key compromise. Never embed API keys in client-side code or version control.
Data exposure to third-party models. When sending data to a third-party model API, evaluate what data exposure this creates. Sensitive data (customer PII, financial records, trade secrets) sent to a third-party API may be logged, stored, or used for model training by the provider. Implement data classification checks at the gateway level to prevent sensitive data from being sent to external model providers unless the provider's data handling policies are acceptable.
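A gateway-level classification check could be sketched as follows. The field names, sensitivity levels, and provider identifiers are all assumptions invented for the example; the point is the fail-closed behavior, where unknown fields are treated as restricted and unknown providers receive only public data.

```python
from enum import IntEnum
from typing import Dict, List, Tuple


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical classification of request fields.
FIELD_SENSITIVITY = {
    "question": Sensitivity.INTERNAL,
    "drug_names": Sensitivity.PUBLIC,
    "patient_name": Sensitivity.RESTRICTED,
    "account_number": Sensitivity.RESTRICTED,
}

# Hypothetical ceiling per provider, set after reviewing its data handling policy.
PROVIDER_CEILING = {
    "external_api": Sensitivity.INTERNAL,
    "self_hosted": Sensitivity.RESTRICTED,
}


def filter_for_provider(
    record: Dict[str, object], provider: str
) -> Tuple[Dict[str, object], List[str]]:
    """Split a record into fields that may be sent and fields that are blocked."""
    ceiling = PROVIDER_CEILING.get(provider, Sensitivity.PUBLIC)
    allowed: Dict[str, object] = {}
    blocked: List[str] = []
    for field, value in record.items():
        # Unclassified fields are treated as RESTRICTED (fail closed).
        level = FIELD_SENSITIVITY.get(field, Sensitivity.RESTRICTED)
        if level <= ceiling:
            allowed[field] = value
        else:
            blocked.append(field)
    return allowed, blocked
```

Field-level labels do not catch sensitive values typed into free-text fields, so a check like this is usually paired with content scanning.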
Vendor security assessments. Conduct security assessments of AI model vendors before integration. Evaluate their data handling policies, security certifications (SOC 2, ISO 27001), incident response procedures, and model update practices. A vendor that pushes model updates without notice could change model behavior in ways that affect your security posture.
Fallback planning for vendor failures. If a third-party model provider experiences a security breach, your organization needs a plan. Can you switch to an alternative provider? Can you fall back to a self-hosted model? Define fallback strategies for each third-party model dependency and test them periodically.
Pricing AI Security Architecture Engagements
- AI threat modeling and security assessment: $20,000 to $50,000
- Security architecture design: $30,000 to $80,000
- Full security architecture implementation: $100,000 to $300,000
- Ongoing security monitoring and red teaming: $10,000 to $30,000 per month
Security Architecture as a Competitive Differentiator
For agencies working in regulated or security-sensitive industries, AI security expertise is a powerful differentiator. Most AI agencies focus on model accuracy and deployment speed. Agencies that can also deliver security-hardened AI systems win engagements with healthcare organizations, financial institutions, government agencies, and defense contractors where security is non-negotiable.
Your Next Step
This week: Ask your clients: "Has anyone tested your AI models for adversarial robustness?" The answer is almost always no, which reveals both the risk and the opportunity.
This month: Develop an AI-specific threat model template that covers model attacks, data attacks, and infrastructure attacks.
This quarter: Deliver your first AI security architecture engagement. Start with threat modeling, implement the highest-priority controls, and establish ongoing security monitoring.