A 22-person AI agency in Boston had a security incident that nearly destroyed the business. An attacker gained access to their model training infrastructure through a misconfigured Jupyter notebook server that was exposed to the internet. From there, the attacker accessed training datasets containing proprietary client data from three different engagements — a healthcare company's patient records, a financial services firm's transaction data, and a retail client's customer purchase histories. The breach affected data from approximately 180,000 individuals. The agency faced regulatory investigations in two states, contract breach claims from all three clients, and notification obligations under HIPAA and state breach notification laws. Total cost: $1.4 million in legal fees, regulatory fines, client settlements, and remediation. The agency survived, but barely.
AI infrastructure creates attack surfaces that traditional IT security governance does not address. Model training environments, data pipelines, model registries, inference endpoints, and monitoring systems all introduce unique security risks. The data flowing through these systems is often the most sensitive information your clients have. And the rapid pace of AI development means security controls are frequently bypassed in the name of speed.
Security governance for AI is not about being paranoid. It is about being methodical. Here is how to build a security governance framework that protects your AI infrastructure without slowing your delivery to a crawl.
Why AI Infrastructure Needs Its Own Security Governance
Standard IT security governance covers networks, endpoints, applications, and data. AI infrastructure introduces additional dimensions that standard frameworks do not address.
Training environments are high-value targets. Model training environments contain concentrated datasets that may include data from multiple clients. Compromising a training server can expose far more data than compromising a production application server.
Model artifacts are intellectual property. Trained model weights, architectures, and training configurations represent significant intellectual property. A competitor or malicious actor who obtains your model artifacts gains the benefit of your investment without the cost.
Data pipelines span multiple trust boundaries. AI data pipelines often pull data from client systems, process it through agency infrastructure, and deploy results to production environments. Each boundary crossing is a potential security vulnerability.
Inference endpoints are API attack surfaces. AI models served through APIs are subject to traditional API security risks plus AI-specific attacks like prompt injection, model extraction, and adversarial inputs.
Experimentation creates security debt. AI development is inherently experimental. Data scientists spin up environments, download datasets, install packages, and share notebooks in ways that create security gaps. Without governance, experimentation environments become security liabilities.
The AI Security Governance Framework
Pillar 1: Data Security Governance
Data is the most sensitive element of your AI infrastructure. Governing data security across the AI lifecycle requires specific policies and controls.
Data classification:
- Classify all data processed by AI systems based on sensitivity (public, internal, confidential, restricted)
- Apply classification labels to datasets, not just individual records
- Map data classifications to required security controls
- Review classifications when data is combined or transformed (combining two internal datasets may create a confidential dataset)
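The classification rules above can be sketched in a few lines. This is a minimal illustration, not a prescribed scheme: the control names in REQUIRED_CONTROLS and the escalate flag are assumptions made for the example, and your own mapping will differ.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical mapping from classification to minimum required controls.
REQUIRED_CONTROLS = {
    Sensitivity.PUBLIC: set(),
    Sensitivity.INTERNAL: {"access_logging"},
    Sensitivity.CONFIDENTIAL: {"access_logging", "encryption_at_rest", "approval"},
    Sensitivity.RESTRICTED: {
        "access_logging", "encryption_at_rest", "approval", "client_isolation",
    },
}


def classify_combined(*sources: Sensitivity, escalate: bool = False) -> Sensitivity:
    """Label a derived dataset.

    The result is never lower than the strictest source label. A reviewer can
    set escalate=True when the combination reveals more than either source.
    """
    level = max(sources)
    if escalate and level < Sensitivity.RESTRICTED:
        level = Sensitivity(level + 1)
    return level


merged = classify_combined(Sensitivity.INTERNAL, Sensitivity.INTERNAL, escalate=True)
print(merged.name, sorted(REQUIRED_CONTROLS[merged]))
```

Attaching the label to the dataset as a whole, rather than to individual records, keeps the control lookup to a single dictionary access.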
Data access controls:
- Implement role-based access to training data and datasets
- Require approval for access to confidential and restricted datasets
- Log all data access with user identity, timestamp, and purpose
- Review access permissions quarterly and revoke unnecessary access
- Implement data access expiration for project-based access
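As a rough sketch of how expiring, logged access might look, the example below checks a project-based grant and writes an audit record for every attempt. The Grant structure, the field names, and the in-memory log are all assumptions for illustration; a real deployment would rely on your identity provider and an append-only log store.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    user: str
    dataset: str
    expires_at: datetime


def check_access(grants, user, dataset, purpose, now=None, log=None):
    """Return True if a live grant exists; record the attempt either way."""
    now = now or datetime.now(timezone.utc)
    allowed = any(
        g.user == user and g.dataset == dataset and now < g.expires_at
        for g in grants
    )
    record = {
        "user": user,
        "dataset": dataset,
        "purpose": purpose,
        "timestamp": now.isoformat(),
        "allowed": allowed,
    }
    if log is not None:
        log.append(json.dumps(record))
    return allowed


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
grants = [Grant("dana", "client_a/claims", start + timedelta(days=30))]
audit_log = []

# Inside the project window the request succeeds; after expiry it is refused.
print(check_access(grants, "dana", "client_a/claims", "model retraining",
                   now=start + timedelta(days=1), log=audit_log))
print(check_access(grants, "dana", "client_a/claims", "ad hoc analysis",
                   now=start + timedelta(days=45), log=audit_log))
```

Logging denied attempts as well as granted ones gives the quarterly access review something concrete to work from.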
Data encryption:
- Encrypt data at rest using AES-256 or equivalent
- Encrypt data in transit using TLS 1.3 or equivalent
- Manage encryption keys through a dedicated key management system
- Rotate encryption keys on a defined schedule
- Consider encryption of data in use for the most sensitive workloads
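For the in-transit requirement, Python's standard ssl module can refuse older protocol versions on the client side. The sketch below assumes a Python build linked against an OpenSSL version with TLS 1.3 support; the function name is made up for the example.

```python
import ssl


def strict_client_context() -> ssl.SSLContext:
    """Client-side TLS context that verifies certificates and refuses anything below TLS 1.3."""
    # create_default_context() enables certificate and hostname verification.
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


ctx = strict_client_context()
print(ctx.minimum_version.name, ctx.verify_mode.name, ctx.check_hostname)
```

A context like this can be passed to standard-library clients that accept an SSL context, so pipeline code that pulls data from client systems fails closed when a peer only offers an older protocol.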
Data isolation:
- Isolate client data from other clients' data at the infrastructure level
- Use separate storage accounts, databases, or encryption keys for different clients
- Prevent cross-client data access through technical controls, not just policies
- Verify isolation through regular testing
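Infrastructure-level separation (separate storage accounts, databases, or keys) remains the primary control. As a complementary sketch, application code can also refuse paths that leave a client's area, and a small recurring test can confirm that known escape attempts fail. The root directory, the client identifiers, and the function names here are hypothetical.

```python
import posixpath


def client_path(client_id, relative_path, root="/data/clients"):
    """Resolve a path inside one client's area, refusing anything that escapes it."""
    if not client_id.replace("_", "").isalnum():
        raise ValueError(f"unexpected client id: {client_id!r}")
    base = posixpath.normpath(posixpath.join(root, client_id))
    candidate = posixpath.normpath(posixpath.join(base, relative_path))
    if posixpath.commonpath([base, candidate]) != base:
        raise PermissionError(
            f"{relative_path!r} is outside the area for {client_id!r}"
        )
    return candidate


def isolation_holds():
    """Recurring check: every known escape attempt must be rejected."""
    attempts = [
        "../client_b/train.parquet",
        "/data/clients/client_b/train.parquet",
        "a/../../client_b",
    ]
    for attempt in attempts:
        try:
            client_path("client_a", attempt)
        except PermissionError:
            continue
        return False
    return True


print(client_path("client_a", "train/2024.parquet"))
print(isolation_holds())
```

Running a check like isolation_holds on a schedule turns the isolation requirement into a test result rather than a policy statement.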
Data lifecycle management:
- Define retention periods for all data types (raw data, processed data, training data, evaluation data)
- Implement automated data deletion when retention periods expire
- Address data persistence in model weights, backups, and logs
- Document data destruction for compliance and audit purposes
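A retention schedule is easier to enforce when it is expressed as data. The sketch below flags datasets whose retention period has ended; the periods in RETENTION_DAYS and the inventory entries are invented for illustration, and the actual deletion and destruction record are left to your storage tooling.

```python
from datetime import date, timedelta

# Hypothetical retention periods per data type, in days.
RETENTION_DAYS = {"raw": 90, "processed": 180, "training": 365, "evaluation": 365}


def expired(datasets, today):
    """Return names of datasets whose retention period has ended as of `today`."""
    due = []
    for name, kind, created in datasets:
        if today >= created + timedelta(days=RETENTION_DAYS[kind]):
            due.append(name)
    return due


inventory = [
    ("client_a_raw_2023q4", "raw", date(2023, 10, 1)),
    ("client_a_train_v2", "training", date(2023, 10, 1)),
]

for name in expired(inventory, today=date(2024, 2, 1)):
    print(f"delete and record destruction: {name}")
```

Note that this only covers datasets in the inventory. Copies held in backups, logs, or model weights need their own handling, as the list above points out.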
Pillar 2: Model Security Governance
AI models themselves require security governance throughout their lifecycle.
Model access controls:
- Control who can access model weights, architectures, and training configurations
- Implement version-controlled model registries with access logging
- Restrict ability to export or download model artifacts
- Separate access to development models and production models
Model integrity:
- Implement checksums or digital signatures for model artifacts
- Verify model integrity before deployment — ensure the model being deployed is the model that was tested
- Protect against model tampering in storage and transit
- Monitor for unauthorized model modifications in production
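One way to check that the deployed model is the tested model is to record a SHA-256 digest at test time and verify it at deploy time. The sketch below also signs the digest with an HMAC key so the recorded value cannot be silently replaced. The key handling is an assumption for the example: in practice the key would come from your key management system, and asymmetric signatures may suit you better.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large weight files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sign_digest(hex_digest, key):
    """HMAC the digest so a registry entry cannot be swapped without the key."""
    return hmac.new(key, hex_digest.encode(), hashlib.sha256).hexdigest()


def verify_artifact(path, expected_digest, expected_signature, key):
    """True only if the file matches the recorded digest and the digest matches its signature."""
    actual = sha256_file(path)
    return hmac.compare_digest(actual, expected_digest) and hmac.compare_digest(
        sign_digest(actual, key), expected_signature
    )


key = b"demo-key"  # assumption: a real key would be fetched from a KMS

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.bin")
    with open(path, "wb") as f:
        f.write(b"weights-v1")

    # Recorded when the model passes testing.
    digest = sha256_file(path)
    signature = sign_digest(digest, key)
    print(verify_artifact(path, digest, signature, key))

    # Any later modification changes the digest and fails verification.
    with open(path, "ab") as f:
        f.write(b"tampered")
    print(verify_artifact(path, digest, signature, key))
```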
Model vulnerability management:
- Assess models for adversarial vulnerabilities before deployment
- Test for prompt injection susceptibility in language model applications
- Evaluate model extraction risk — can attackers replicate your model through API queries?
- Assess data poisoning risk — could training data be manipulated to introduce backdoors?
- Monitor for known vulnerabilities in model frameworks and dependencies
Model supply chain security:
- Vet third-party models and pre-trained weights before incorporating them
- Verify the provenance of pre-trained models (download from official sources, verify checksums)
- Monitor for vulnerabilities in model dependencies (PyTorch, TensorFlow, Hugging Face libraries)
- Maintain a bill of materials for model components and dependencies
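A bill of materials can start as a simple record of which versions of the key libraries are installed in each environment. The sketch below uses the standard importlib.metadata module; the package list is an assumption, and a fuller inventory would also cover model weights and their sources.

```python
import json
from importlib import metadata


def component_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    bom = {}
    for name in names:
        try:
            bom[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            bom[name] = None  # not installed in this environment
    return bom


# Distribution names for PyTorch, TensorFlow, and Hugging Face Transformers.
print(json.dumps(component_versions(["torch", "tensorflow", "transformers"]), indent=2))
```

Storing this output alongside each trained model makes it possible to answer, after a vulnerability is announced, which models were built with the affected version.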
Pillar 3: Infrastructure Security Governance
The infrastructure running your AI systems needs governance that addresses AI-specific risks.
Compute environment security:
- Harden training and inference servers with security baselines
- Isolate GPU clusters and training environments from general corporate networks
- Implement container security for containerized model serving
- Secure Jupyter notebooks and interactive development environments, which are among the most commonly misconfigured AI infrastructure components
- Disable unnecessary services and ports on AI infrastructure
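To make the notebook hardening point concrete, here is a small audit sketch that flags the combination behind the incident described at the start: a notebook server reachable from the network with no authentication. The settings dictionary and its keys (ip, token, password, behind_vpn) are hypothetical simplifications, loosely modeled on common notebook server options; adapt them to however you export your real configuration.

```python
def audit_notebook_settings(settings):
    """Return a list of findings for a notebook server's effective settings.

    `settings` is a plain dict with hypothetical keys: "ip", "token",
    "password", and "behind_vpn".
    """
    findings = []
    listens_publicly = settings.get("ip", "localhost") in {"0.0.0.0", "::", "*"}
    has_auth = bool(settings.get("token")) or bool(settings.get("password"))

    if not has_auth:
        findings.append("authentication disabled (empty token and password)")
    if listens_publicly and not settings.get("behind_vpn", False):
        findings.append("listening on all interfaces without a VPN or private network")
    if listens_publicly and not has_auth:
        findings.append("CRITICAL: unauthenticated server reachable from the network")
    return findings


exposed = {"ip": "0.0.0.0", "token": "", "password": ""}
for finding in audit_notebook_settings(exposed):
    print(finding)
```

A check like this is cheap enough to run against every development environment on a schedule, which matters because notebook servers tend to be started ad hoc.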
Cloud security governance:
- Implement cloud security posture management for AI workloads
- Define approved cloud services and configurations for AI infrastructure
- Monitor for misconfigurations in cloud storage (publicly accessible S3 buckets containing training data remain one of the most common causes of AI data breaches)
- Implement network security controls for cloud-based training and inference
- Use cloud-native security tools for monitoring and alerting
API security:
- Secure inference APIs with authentication and authorization
- Implement rate limiting to slow model extraction attacks and raise their cost
- Validate and sanitize inputs to reduce exposure to prompt injection and adversarial attacks
- Monitor API usage for anomalous patterns
- Implement API versioning and deprecation with security in mind
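Rate limiting for an inference API is often implemented as a token bucket per API key. The following is a minimal single-process sketch with made-up default limits; a production service would typically enforce this at the gateway or in a shared store so that limits hold across replicas.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        """Refill based on elapsed time, then spend one token if available."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


buckets = {}  # one bucket per API key


def allow_request(api_key, rate=5, capacity=10):
    """Look up (or create) the caller's bucket and ask it for a token."""
    if api_key not in buckets:
        buckets[api_key] = TokenBucket(rate, capacity)
    return buckets[api_key].allow()


results = [allow_request("client-key-1") for _ in range(12)]
print(results.count(True), "allowed,", results.count(False), "rejected")
```

Rate limits slow extraction rather than stopping it, so they work best alongside the usage monitoring listed above.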
Development environment security:
- Define approved tools and packages for AI development
- Implement package scanning for vulnerabilities in Python and ML dependencies
- Secure source code repositories containing model code and configurations
- Control access to development environments and notebooks
- Implement secrets management — no API keys or credentials in code or notebooks
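A basic check for credentials in code and notebooks can be sketched with two regular expressions. The patterns below are illustrative only and will miss many formats; dedicated secret scanners are more thorough. The notebook reader assumes the standard notebook JSON layout, with a top-level cells list whose source is a string or a list of strings.

```python
import json
import re

# Illustrative patterns only; dedicated secret scanners cover far more formats.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "hardcoded_credential": re.compile(
        r"(?i)\b(api[_-]?key|secret|password|token)\b\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_text(text):
    """Return (line_number, pattern_name) for every suspicious line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


def scan_notebook(path):
    """Scan each cell of a notebook file; returns (cell_index, line_number, pattern_name)."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    hits = []
    for index, cell in enumerate(notebook.get("cells", [])):
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        hits.extend((index, lineno, name) for lineno, name in scan_text(source))
    return hits


sample = (
    "import os\n"
    'api_key = "example-not-a-real-key-123"\n'
    'key_id = "AKIAIOSFODNN7EXAMPLE"\n'
    'token = os.environ["SERVICE_TOKEN"]\n'
)
print(scan_text(sample))
```

The last line of the sample, which reads the token from the environment, is not flagged. That is the pattern the secrets management rule is meant to encourage.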
Pillar 4: People and Process Governance
Security technology is only as effective as the people and processes supporting it.
Security roles and responsibilities:
- Assign a security lead for AI infrastructure (this can be part of an existing role for smaller agencies)
- Define security responsibilities for data scientists, ML engineers, and DevOps teams
- Ensure someone is accountable for AI security governance — not just responsible, but accountable
- Include AI security in performance reviews and team objectives
Security training:
- Provide AI-specific security training to all team members who handle data or models
- Cover topics: secure coding for ML, data handling, notebook security, credential management, social engineering
- Update training annually to address emerging threats
- Require security training completion before granting access to sensitive environments
Security review processes:
- Include security review in the AI model deployment process
- Conduct security assessments for new AI projects during project kickoff
- Review third-party AI services and tools for security implications
- Conduct periodic penetration testing of AI infrastructure
Incident response:
- Define an incident response plan specific to AI security incidents
- Include scenarios for data breaches, model theft, adversarial attacks, and system compromise
- Define escalation procedures, including client notification timelines
- Conduct tabletop exercises to test incident response readiness
- Maintain relationships with forensic specialists and legal counsel for incident response
Pillar 5: Compliance and Audit Governance
AI security governance needs to satisfy regulatory requirements and withstand audit scrutiny.
Regulatory compliance mapping:
- Map your AI security controls to applicable regulations and standards (for example, GDPR and HIPAA as regulations; SOC 2 and ISO 27001 as assurance frameworks)
- Identify gaps between current controls and regulatory requirements
- Prioritize gap remediation based on risk and regulatory enforcement activity
- Monitor regulatory changes that affect AI security requirements
Audit readiness:
- Maintain documentation of security policies, procedures, and controls
- Implement automated evidence collection for security controls
- Conduct internal security audits on a defined schedule
- Prepare for external audits from clients, regulators, and certification bodies
Client compliance obligations:
- Understand and meet client-specific security requirements
- Complete client security questionnaires accurately and promptly
- Provide security attestations and certifications as required
- Support client audits of your AI security controls
Implementation Roadmap
Month 1: Foundation
- Conduct an AI infrastructure security assessment to identify current gaps
- Classify all data processed by AI systems
- Implement data access controls and logging
- Secure exposed development environments (Jupyter notebooks, model servers)
- Implement secrets management for API keys and credentials
Month 2: Hardening
- Implement data encryption at rest and in transit
- Set up network segmentation for AI infrastructure
- Implement model registry with access controls
- Deploy API security controls for inference endpoints
- Begin security training for the team
Month 3: Monitoring and Process
- Deploy security monitoring and alerting for AI infrastructure
- Implement the incident response plan
- Establish security review processes for model deployments
- Conduct initial penetration testing
- Document security policies and procedures
Months 4-6: Maturation
- Implement automated compliance monitoring
- Conduct first internal security audit
- Refine processes based on operational experience
- Address gaps identified through monitoring and testing
- Pursue relevant certifications and attestations (ISO 27001, SOC 2) if client requirements demand them
AI-Specific Threat Landscape
Understanding the threats specific to AI infrastructure helps you prioritize governance investments.
Data poisoning attacks. Adversaries manipulate training data to introduce backdoors or biases into models. If your training data pipeline is not secured, an attacker could alter training data in ways that produce a model that appears to work normally but behaves maliciously under specific conditions. Governance measures include training data integrity verification, access controls on training pipelines, and validation of training data sources.
Model extraction attacks. Adversaries query your model through its API to reconstruct a copy. With enough carefully crafted queries, an attacker can create a functional replica of your model without access to your training data or model weights. Governance measures include rate limiting, query pattern monitoring, and output perturbation techniques.
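One simple form of output perturbation is to return less information per query: only the top labels, with coarsely rounded scores. The sketch below assumes a classifier that produces a label-to-probability mapping; the labels and the parameter defaults are invented for the example. It reduces what each query reveals but does not by itself prevent extraction.

```python
def harden_prediction(probabilities, top_k=1, decimals=1):
    """Return only the top-k labels, with scores rounded to `decimals` places."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [(label, round(score, decimals)) for label, score in ranked[:top_k]]


raw = {"approve": 0.8731, "review": 0.1042, "deny": 0.0227}
print(harden_prediction(raw, top_k=2))
```

The trade-off is that legitimate clients also receive less precise scores, so the values of top_k and decimals are a product decision as much as a security one.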
Prompt injection attacks. For LLM-based applications, adversaries craft inputs designed to override system instructions, extract system prompts, or cause the model to perform unauthorized actions. Governance measures include input sanitization, output filtering, prompt design best practices, and security testing specifically for prompt injection vectors.
Supply chain attacks on ML libraries. AI systems depend on complex software supply chains — Python packages, model weights from public repositories, pre-trained models from third parties. Compromised dependencies can introduce vulnerabilities. Governance measures include dependency scanning, verified sources for pre-trained models, and software bill of materials tracking.
Scaling Security Governance
For small agencies (under 15 people): Focus on the fundamentals — data encryption, access controls, secure development environments, and incident response. Assign security responsibilities to existing roles. Use cloud-native security tools to minimize overhead.
For mid-size agencies (15-50 people): Add dedicated security functions, formal security review processes, and compliance frameworks. Implement automated security monitoring. Consider pursuing a SOC 2 Type II report.
For larger agencies (more than 50 people): Build a dedicated security team, implement comprehensive security governance frameworks, pursue multiple certifications, and conduct regular third-party security assessments.
Your Next Step
Conduct a one-day AI infrastructure security assessment. Walk through your training environments, data pipelines, model registries, and inference endpoints. Ask five questions at each point: Who has access? Is the data encrypted? Are access logs maintained? Is the environment hardened? What happens if this system is compromised?
Document the findings and prioritize remediation. Start with the highest-risk items — exposed development environments, unencrypted sensitive data, and overly broad access permissions. These are the vulnerabilities that attackers exploit first.
The Boston agency's $1.4 million breach started with a single misconfigured Jupyter notebook. Your security assessment might reveal similar vulnerabilities. Better to find them yourself than to let an attacker find them for you.