An AI agency in Philadelphia delivered a customer churn prediction system to a mid-size telecom provider. The model ingested call records, billing history, service tickets, and usage patterns for 2.3 million subscribers. Six months after deployment, the telecom received a regulatory inquiry from the state attorney general's office asking about AI systems processing consumer data. The telecom turned to the agency: where is the privacy impact assessment (PIA)? There was none. No one had evaluated the privacy implications of training an AI model on millions of customer records. The telecom had to engage a Big Four consulting firm to conduct a retroactive PIA at a cost of $175,000, which it then passed back to the agency on the grounds that the agency had failed to fulfill a contractual obligation.
Privacy impact assessments for AI systems are mandatory in a growing number of jurisdictions. GDPR requires Data Protection Impact Assessments (DPIAs) for processing that is "likely to result in a high risk to the rights and freedoms of natural persons", and AI systems processing personal data almost always meet that threshold. The EU AI Act adds AI-specific assessment requirements. Several US state privacy laws impose similar requirements. Even where PIAs are not legally mandated, they are a best practice that reduces risk, builds client confidence, and creates documentation that protects your agency in the event of a regulatory inquiry.
Yet most AI agencies skip them entirely. They treat privacy as someone else's problem. That approach is increasingly untenable as regulators specifically target AI systems for privacy scrutiny.
What a Privacy Impact Assessment Actually Is
A privacy impact assessment is a systematic process for evaluating the privacy risks of a data processing activity and identifying measures to mitigate those risks. For AI systems, the PIA examines how personal data is collected, used, stored, and shared throughout the AI lifecycle — from training data acquisition through model deployment and ongoing operation.
A PIA is not a legal opinion. It is not a compliance checklist. It is an analytical process that:
- Describes the data processing activities and their purpose
- Assesses the necessity and proportionality of the processing
- Identifies risks to individuals whose data is processed
- Defines measures to mitigate those risks
- Documents the assessment for regulatory and audit purposes
For AI systems, the PIA needs to go beyond standard data processing assessments to address AI-specific privacy considerations: model training on personal data, inference on new personal data, the potential for models to memorize and reproduce training data, bias and fairness implications, and the transparency challenges of complex models.
When You Need a PIA
Legally Required Scenarios
Under GDPR (Article 35): A DPIA is mandatory when processing is likely to result in a high risk to individuals. Article 35(3) and the DPIA guidelines endorsed by the European Data Protection Board identify scenarios that indicate high risk:
- Systematic and extensive profiling with significant effects
- Large-scale processing of special categories of data
- Systematic monitoring of publicly accessible areas
- Innovative use of new technologies (AI systems generally qualify)
- Processing that prevents individuals from exercising their rights
- Automated decision-making with legal or similarly significant effects
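Under the EU AI Act: Certain deployers of high-risk AI systems, including public bodies and private entities providing public services, must carry out a fundamental rights impact assessment (Article 27). That assessment overlaps with a GDPR DPIA and extends beyond it.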
Under US state privacy laws: Colorado, California, and other states require privacy assessments for processing that presents a heightened risk of harm, including profiling and automated decision-making.
Practically Recommended Scenarios
Even when not legally required, a PIA is strongly recommended for AI systems that:
- Process personal data for training or inference
- Make or support decisions that affect individuals
- Process data about vulnerable populations (children, patients, employees)
- Combine data from multiple sources to create new insights about individuals
- Operate in sectors with heightened privacy expectations (healthcare, finance, education)
- Deploy novel AI techniques or architectures
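The two lists above can be turned into a rough screening helper for new engagements. The sketch below is illustrative only: `SystemProfile`, its fields, and `screen_for_pia` are hypothetical names, it covers only a subset of the triggers listed, and the split between "likely required" and "recommended" is an assumption that legal counsel should confirm for each jurisdiction.

```python
from dataclasses import dataclass


@dataclass
class SystemProfile:
    """Hypothetical screening answers for one AI system."""
    processes_personal_data: bool = False
    profiling_with_significant_effects: bool = False
    large_scale_special_categories: bool = False
    automated_decisions_with_legal_effects: bool = False
    affects_vulnerable_populations: bool = False
    combines_multiple_sources: bool = False


def screen_for_pia(profile: SystemProfile) -> str:
    """Return 'likely required', 'recommended', or 'not indicated'.

    Assumption: any legal trigger outranks the practical ones. This is a
    triage aid, not a legal determination.
    """
    legal_triggers = [
        profile.profiling_with_significant_effects,
        profile.large_scale_special_categories,
        profile.automated_decisions_with_legal_effects,
    ]
    practical_triggers = [
        profile.processes_personal_data,
        profile.affects_vulnerable_populations,
        profile.combines_multiple_sources,
    ]
    if any(legal_triggers):
        return "likely required"
    if any(practical_triggers):
        return "recommended"
    return "not indicated"


# Example: a churn model that joins billing, usage, and ticket data.
churn_model = SystemProfile(processes_personal_data=True, combines_multiple_sources=True)
print(screen_for_pia(churn_model))
```

A helper like this is only useful as a first pass at project intake; anything other than "not indicated" should route the project to whoever owns the full assessment.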
The AI Privacy Impact Assessment Process
Phase 1: Scoping and Context Setting
Before diving into the assessment, establish the boundaries and context.
Define the AI system scope:
- What does the AI system do, in plain language?
- What personal data does it process (training data and inference data)?
- Who are the individuals whose data is processed?
- What decisions or outputs does the system produce?
- Who uses the system and how?
Identify the data controller and processor:
- Is your agency the data controller (determining purposes and means of processing) or the data processor (processing on behalf of the client)?
- In most agency engagements, the client is the data controller and the agency is the data processor
- The PIA responsibility typically falls on the data controller, but processors have an obligation to assist
Map the data flows:
- Document how personal data enters the system (data collection, client provision, third-party sources)
- Document how data moves through the system (preprocessing, feature engineering, model training, inference, output generation)
- Document how data exits the system (outputs, reports, API responses, data sharing)
- Document where data is stored at each stage (databases, model weights, caches, logs)
- Document data retention periods for each storage location
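A data flow map is easier to maintain when it is kept as structured data rather than prose. The following is a minimal sketch under assumed names: the `DataStore` schema, the example store names, and the retention figures are all hypothetical, chosen only to show how gaps in the map can be detected automatically.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataStore:
    """One location where personal data rests (hypothetical schema)."""
    name: str
    stage: str                            # e.g. "ingestion", "training", "logging"
    data_categories: list[str]
    retention_days: Optional[int] = None  # None means not yet documented


def undocumented_retention(stores: list[DataStore]) -> list[str]:
    """Return the names of stores that have no documented retention period."""
    return [store.name for store in stores if store.retention_days is None]


# Illustrative inventory; the numbers are placeholders, not recommendations.
inventory = [
    DataStore("raw_call_records", "ingestion", ["call metadata"], retention_days=365),
    DataStore("feature_store", "training", ["usage features", "billing features"]),
    DataStore("inference_logs", "logging", ["subscriber id", "churn score"], retention_days=90),
]

print(undocumented_retention(inventory))
```

Here the check would flag `feature_store`, the kind of intermediate storage location that is easy to overlook when only source databases and final outputs are documented.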
Phase 2: Legal Basis Assessment
For each processing activity identified in Phase 1, determine the legal basis.
Common legal bases for AI processing:
- Consent — Individuals have explicitly agreed to their data being used for AI training and inference. Strongest basis but hardest to obtain at scale.
- Legitimate interest — The processing is necessary for a legitimate interest that is not overridden by the individual's rights. Most common basis for B2B AI systems. Requires a documented legitimate interest assessment.
- Contract performance — The processing is necessary to perform a contract with the individual. Applicable when AI processing is part of a service the individual has contracted for.
- Legal obligation — The processing is required by law. Rare for AI training but may apply in compliance use cases.
AI-specific legal basis considerations:
- Training a model on personal data may require a different legal basis than using the model for inference
- Repurposing data collected for one purpose (customer service) for a new purpose (AI training) requires a compatibility assessment
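- Decisions based solely on automated processing that produce legal or similarly significant effects are restricted under GDPR Article 22 and are permitted only under specific exceptions, with safeguards such as the right to human intervention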
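One way to keep the Phase 2 results auditable is a small register that records the legal basis for each processing activity and flags missing supporting assessments. This is a sketch under stated assumptions: `ProcessingActivity`, its fields, and `open_items` are hypothetical names, and the two checks implement only the documentation requirements mentioned above.

```python
from dataclasses import dataclass
from enum import Enum


class LegalBasis(Enum):
    CONSENT = "consent"
    LEGITIMATE_INTEREST = "legitimate interest"
    CONTRACT = "contract performance"
    LEGAL_OBLIGATION = "legal obligation"


@dataclass
class ProcessingActivity:
    """One processing activity from Phase 1 (hypothetical schema)."""
    name: str
    basis: LegalBasis
    repurposed: bool = False              # data was collected for a different purpose
    compatibility_assessed: bool = False  # purpose compatibility assessment on file
    lia_documented: bool = False          # legitimate interest assessment on file


def open_items(activity: ProcessingActivity) -> list[str]:
    """List the supporting assessments that are still missing."""
    items = []
    if activity.basis is LegalBasis.LEGITIMATE_INTEREST and not activity.lia_documented:
        items.append("document legitimate interest assessment")
    if activity.repurposed and not activity.compatibility_assessed:
        items.append("complete purpose compatibility assessment")
    return items


# Training and inference are recorded separately because they may rest on
# different legal bases.
training = ProcessingActivity("model training", LegalBasis.LEGITIMATE_INTEREST, repurposed=True)
inference = ProcessingActivity("churn scoring", LegalBasis.CONTRACT)

for activity in (training, inference):
    print(activity.name, "->", open_items(activity))
```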
Phase 3: Risk Identification
Systematically identify privacy risks across the AI lifecycle.
Training data risks:
- Data breach — Training datasets containing personal data could be breached
- Unauthorized use — Training data could be used for purposes beyond the defined scope
- Data quality issues — Inaccurate personal data in training sets could produce harmful model behavior
- Bias introduction — Training data may reflect societal biases that affect model outputs about individuals
- Consent gaps — Data may have been collected without consent for AI training use
Model risks:
- Memorization — AI models can memorize and reproduce personal data from training sets
- Inference attacks — Adversaries may be able to extract information about training data from model outputs
- Membership inference — Adversaries may be able to determine whether an individual's data was in the training set
- Model inversion — Adversaries may be able to reconstruct personal data from model parameters
Deployment risks:
- Automated decision-making — Model outputs may drive decisions that affect individuals without adequate human oversight
- Lack of transparency — Individuals may not know that AI is processing their data or making decisions about them
- Profiling — The AI system may create profiles of individuals that reveal sensitive characteristics
- Function creep — The system may be used for purposes beyond its original scope
Data lifecycle risks:
- Excessive retention — Personal data may be retained longer than necessary
- Inadequate deletion — Personal data may persist in model weights, backups, or logs after deletion requests
- Cross-border transfers — Data may be transferred to jurisdictions with inadequate privacy protections
- Third-party access — Third-party service providers may access personal data without adequate safeguards
Phase 4: Risk Evaluation
For each identified risk, evaluate the likelihood and severity.
Likelihood factors:
- How much personal data does the system process?
- How sensitive is the personal data?
- How sophisticated are potential adversaries?
- What technical safeguards are in place?
- What is the track record of similar systems?
Severity factors:
- What harm could individuals suffer if the risk materializes?
- How many individuals could be affected?
- Is the harm reversible?
- Are vulnerable populations affected?
- What are the potential regulatory consequences?
Risk rating: Combine likelihood and severity into a risk rating (low, medium, high, critical) for each identified risk. Focus mitigation efforts on high and critical risks first.
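The combination step can be made explicit with a simple scoring matrix. The sketch below assumes a three-level scale for both likelihood and severity and multiplies them; the thresholds that map scores to low, medium, high, and critical are an illustrative choice, not a standard, and the register entries are hypothetical.

```python
LEVELS = {"low": 1, "medium": 2, "high": 3}


def rate_risk(likelihood: str, severity: str) -> tuple[int, str]:
    """Combine likelihood and severity into a score and a rating label."""
    score = LEVELS[likelihood] * LEVELS[severity]
    if score >= 9:
        label = "critical"
    elif score >= 6:
        label = "high"
    elif score >= 3:
        label = "medium"
    else:
        label = "low"
    return score, label


# Hypothetical register entries: (risk, likelihood, severity)
register = [
    ("Memorization", "medium", "high"),
    ("Excessive retention", "high", "low"),
    ("Cross-border transfers", "low", "medium"),
    ("Automated decision-making", "high", "high"),
]

# Highest scores first, so high and critical risks are addressed first.
rated = sorted(
    ((name, *rate_risk(likelihood, severity)) for name, likelihood, severity in register),
    key=lambda item: item[1],
    reverse=True,
)
for name, score, label in rated:
    print(f"{label:<8} {score}  {name}")
```

Whatever scale you choose, record it in the PIA itself so that ratings from different projects and different reviewers remain comparable.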
Phase 5: Mitigation Measures
For each significant risk, define specific mitigation measures.
Technical measures:
- Data minimization — Reduce the personal data used for training to the minimum necessary. Use anonymization, pseudonymization, or aggregation where possible.
- Differential privacy — Apply differential privacy techniques to training processes to limit the model's ability to memorize individual data points.
- Access controls — Implement strict access controls for training data, models, and outputs.
- Encryption — Encrypt personal data at rest and in transit throughout the AI pipeline.
- Output filtering — Implement filters that prevent the model from outputting personal data from training sets.
- Audit logging — Log all access to personal data for audit and accountability purposes.
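As an illustration of the data minimization measure, the sketch below drops every field that is not on an allow-list and replaces the direct identifier with a keyed hash (HMAC-SHA256) before a record reaches the training set. The field names, the `prepare_training_row` helper, and the key value are hypothetical; in practice the key would come from a secrets manager and be stored separately from the data.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed SHA-256 digest."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def prepare_training_row(row: dict, secret_key: bytes, keep: set[str]) -> dict:
    """Keep only allow-listed fields and add a pseudonymous subscriber key."""
    minimized = {field: value for field, value in row.items() if field in keep}
    minimized["subscriber_key"] = pseudonymize(row["subscriber_id"], secret_key)
    return minimized


# Placeholder key for the example only; never hard-code a real key.
key = b"demo-key-load-from-a-secrets-manager"
row = {
    "subscriber_id": "A-1001",
    "name": "Jane Doe",
    "monthly_minutes": 412,
    "tickets_90d": 2,
}

print(prepare_training_row(row, key, keep={"monthly_minutes", "tickets_90d"}))
```

Pseudonymized data is still personal data under GDPR, because anyone holding the key can re-link records to individuals. The technique lowers risk but does not take the dataset out of scope for the PIA.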
Organizational measures:
- Privacy training — Train team members involved in AI development on privacy requirements and best practices.
- Access management — Limit access to personal data to team members with a demonstrated need.
- Data handling procedures — Define and enforce procedures for handling personal data throughout the AI lifecycle.
- Vendor management — Ensure third-party service providers meet privacy requirements through contractual terms and audits.
Transparency measures:
- Privacy notices — Ensure individuals are informed about AI processing of their data through clear, accessible privacy notices.
- Explainability — Provide meaningful explanations of AI decisions to affected individuals.
- Individual rights processes — Implement processes for individuals to exercise their privacy rights (access, correction, deletion, objection).
- Human oversight — Ensure meaningful human oversight of AI decisions that affect individuals.
Phase 6: Consultation and Approval
Before finalizing the PIA, consult relevant stakeholders.
Internal consultation:
- Legal counsel — Review legal basis assessments and mitigation measures
- Security team — Validate technical security measures
- Data protection officer — Review and approve the PIA (if your organization has a DPO)
- Project team — Validate technical feasibility of mitigation measures
External consultation:
- Client data protection team — Review and approve the PIA from the data controller's perspective
- Regulatory authority — Under GDPR Article 36, and under comparable rules in some other jurisdictions, the controller must consult the data protection authority before starting high-risk processing whose risks cannot be adequately mitigated
Phase 7: Documentation and Maintenance
A PIA is not a one-time document. It is a living assessment that needs to be maintained throughout the AI system's lifecycle.
Documentation requirements:
- Record the assessment methodology and scope
- Document all identified risks and their ratings
- Detail all mitigation measures and their implementation status
- Record stakeholder consultations and their outcomes
- Note any residual risks accepted and the rationale
Maintenance triggers:
- Significant changes to the AI system's functionality or scope
- New types of personal data being processed
- Changes in the regulatory environment
- Security incidents or privacy complaints
- Periodic review (at least annually)
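The maintenance triggers above lend themselves to an automated reminder. The following sketch assumes hypothetical event names and a `reassessment_reasons` helper; the 365-day interval reflects the "at least annually" guidance in the list.

```python
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=365)  # "at least annually"

# Hypothetical event names, one per maintenance trigger in the list above.
TRIGGER_EVENTS = {
    "system_change",
    "new_data_type",
    "regulatory_change",
    "incident_or_complaint",
}


def reassessment_reasons(last_review: date, today: date, events: set[str]) -> list[str]:
    """Return the reasons, if any, why the PIA should be revisited now."""
    reasons = sorted(events & TRIGGER_EVENTS)
    if today - last_review >= REVIEW_INTERVAL:
        reasons.append("periodic_review_due")
    return reasons


# Example: a new data source was added five months after the last review.
print(reassessment_reasons(date(2024, 1, 10), date(2024, 6, 1), {"new_data_type"}))
```

An empty list means no reassessment is currently indicated; any other result should open a review task against the existing PIA rather than start a new document.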
Common PIA Mistakes in AI Projects
Mistake 1: Treating the PIA as a checkbox exercise. A PIA that goes through the motions without genuinely assessing risks provides false assurance and no real protection. Regulators can tell the difference.
Mistake 2: Assessing the model but not the data pipeline. Privacy risks exist throughout the data pipeline — collection, preprocessing, feature engineering, training, deployment, monitoring. Assessing only the model itself misses significant risks.
Mistake 3: Ignoring model memorization risks. AI models can and do memorize personal data from training sets. Your PIA must address this risk and define mitigation measures.
Mistake 4: Failing to reassess when the system changes. A PIA conducted at project kickoff becomes stale as the system evolves. Build reassessment triggers into your process.
Mistake 5: Not involving the right people. Privacy assessments require input from legal, technical, and business perspectives. A PIA conducted solely by engineers or solely by lawyers will miss important risks.
Your Next Step
Identify the AI system in your portfolio that processes the most personal data. Conduct a PIA for that system using the seven-phase process outlined above. Even if a formal PIA has not been required yet, the exercise will reveal privacy risks you may not have considered and create documentation that protects your agency when regulators come asking.
Build a PIA template specific to AI systems that your team can use for future projects. Include the AI-specific risk categories outlined in Phase 3. Make the PIA a standard part of your project kickoff process — before any personal data is collected or processed.
The Philadelphia agency could have avoided $175,000 in retroactive assessment costs with a $15,000 PIA conducted at project inception. Privacy impact assessments are one of those investments that look expensive until you see what skipping them costs.