It Is Public Data, So Why Did the Whole Team Hesitate?

A data science team at an AI agency was debating whether to use a particular dataset for a client project. The dataset was publicly available, technically legal to use, and highly relevant to the client's use case. But the data had been collected from social media posts by people who had no idea their data would be used to train an AI model. The team lead said "it is public data, so it is fair game." A junior engineer said "just because we can does not mean we should." The conversation stalled because neither person had a framework for resolving the disagreement. They had never practiced ethical reasoning in a structured way. The team lead's view prevailed because he was senior, not because his argument was stronger. The dataset was used. Eighteen months later, a journalist published an investigative piece about the dataset's origin, and multiple companies that had used it—including the agency's client—faced public backlash and data deletion requests.

Ethics training that consists of reading principles from a policy document does not prepare teams for real-world ethical dilemmas. What prepares teams is practice—analyzing real scenarios, debating trade-offs, and building the ethical reasoning muscles that activate when a difficult decision arrives. Case studies are the most effective tool for this kind of training.

How to Use These Case Studies

Each case study below follows a consistent structure: the scenario, the ethical dimensions at play, the decision that was made, the consequences, and discussion questions for your team. Use them in team workshops, brown bag lunches, or one-on-one conversations. The goal is not to arrive at the "right" answer—many of these cases involve genuine trade-offs with no perfect solution. The goal is to practice the reasoning process.

Format for team discussions:

Present the scenario and ask the team to identify the ethical issues before reading further
Discuss what decision they would make and why
Reveal the actual outcome and consequences
Discuss what could have been done differently
Extract principles that apply to your agency's work

Case Study 1: The Biased Hiring Tool

Scenario: A large technology company built an AI resume screening tool to automate the initial stage of its hiring process. The tool was trained on ten years of historical hiring data: the resumes that had been submitted and the hiring decisions that were made. The training data reflected a decade of hiring patterns in a male-dominated industry. The tool learned to downgrade resumes that contained words associated with women (women's colleges, women's sports teams, gendered language) because the historical data showed that men were hired at higher rates.

The decision made: The tool was deployed to production and used to screen real candidates for approximately two years before the bias was discovered through internal analysis.

The consequences: The tool systematically disadvantaged women applicants for technical roles across the company for two years. When the bias was discovered and reported publicly, the company faced intense criticism, regulatory scrutiny, and a class action lawsuit. The tool was scrapped entirely. The company's reputation as an employer suffered, which hurt its ability to recruit diverse talent, the exact opposite of its stated goals.

Ethical dimensions:

Historical bias perpetuation: Training on biased historical data encodes past discrimination into future decisions
Scale of harm: Automated systems can discriminate at scale faster than any individual human could
Testing adequacy: Was the tool tested for bias before deployment?
Accountability: Who was responsible for the biased outcomes—the engineers, the HR team that deployed it, or leadership that approved it?
Informed consent: Were applicants told an AI was screening their resumes?
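
One way to make the testing-adequacy question concrete is to compare how often the tool advances candidates from each group on an audit sample. The sketch below is a minimal illustration, not the method the company in this case used. The `selection_rates` and `impact_ratio` helpers, the group labels, and the audit counts are all hypothetical. The 0.8 cutoff is the four-fifths rule of thumb from US employment-selection guidance, used here only as a screening heuristic.

```python
from collections import Counter


def selection_rates(records):
    """Return the share of candidates the screener advanced, per group.

    `records` is an iterable of (group, advanced) pairs, where `advanced`
    is True when the tool passed the resume to the next stage.
    """
    totals, advanced = Counter(), Counter()
    for group, was_advanced in records:
        totals[group] += 1
        if was_advanced:
            advanced[group] += 1
    return {group: advanced[group] / totals[group] for group in totals}


def impact_ratio(rates):
    """Lowest selection rate divided by the highest (1.0 means parity)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


# Hypothetical audit sample: 100 resumes per group, matched on qualifications.
audit = (
    [("women", True)] * 18 + [("women", False)] * 82
    + [("men", True)] * 30 + [("men", False)] * 70
)

rates = selection_rates(audit)
ratio = impact_ratio(rates)
print(rates, round(ratio, 2))

# Four-fifths rule of thumb: a heuristic trigger for investigation, not a legal test.
if ratio < 0.8:
    print("Selection-rate gap exceeds the four-fifths heuristic; investigate before deployment.")
```

A ratio near 1.0 on one sample does not show that a tool is fair; it only rules out one visible symptom. A check like this is more informative when run on matched resumes that differ only in gendered terms.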

Discussion questions for your team:

How would you test a hiring tool for gender bias before deployment?
If you discovered bias in a deployed system, what would you do?
Should historical data ever be used to train AI systems that affect people's access to opportunities?
Who in your agency would be responsible for catching this kind of issue?

Case Study 2: The Predictive Policing Feedback Loop

Scenario: A city police department deployed a predictive policing AI system that analyzed historical crime data to predict where crimes were most likely to occur. Officers were dispatched to predicted hot spots for increased patrol. The historical data reflected decades of policing patterns where certain neighborhoods—primarily low-income communities of color—were policed more heavily than others. More policing led to more arrests, which generated more crime data for those neighborhoods, which led the AI to predict more crime in those neighborhoods, which led to even more policing.

The decision made: The system was deployed and used for several years. Community organizations raised concerns about discriminatory impact, but the department defended the system by pointing to its statistical accuracy—it was correctly predicting where arrests would occur.

The consequences: The system reinforced and amplified existing patterns of racial disparities in policing. Communities that were already over-policed experienced even more police presence, while communities with less historical police presence received even less. Multiple studies demonstrated that the system's "accuracy" was self-fulfilling—it was predicting where police would arrest people, not where crime was occurring. The system was eventually discontinued after sustained community opposition and legal challenges.

Ethical dimensions:

Feedback loops: When an AI system's outputs influence the data it will be trained on in the future, dangerous self-reinforcing cycles emerge
Accuracy versus fairness: The system was statistically accurate by one measure but deeply unfair by another
Proxy discrimination: The system did not use race as an input, but geographic and historical policing data served as proxies for race
Community impact: The system affected entire communities, not just individuals
Stakeholder voice: Were the affected communities consulted about the system's deployment?
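
The feedback loop in this case can be reproduced with a toy simulation. The sketch below is an illustration under loud assumptions, not a model of any real system: two hypothetical neighborhoods have the same true offense rate, the "model" sends most patrols to whichever neighborhood has more recorded arrests, and arrests are recorded only where patrols go. The names `simulate`, `hot_share`, and `true_rate`, and all of the numbers, are invented for the example.

```python
def simulate(recorded, rounds, patrols=100, hot_share=0.7, true_rate=0.1):
    """Replay a hot-spot patrol policy and return the recorded-arrest history.

    recorded:  dict of neighborhood -> arrests recorded so far
    hot_share: fraction of patrols sent to the current top hot spot
    true_rate: arrests per patrol, identical in every neighborhood
    """
    recorded = dict(recorded)
    history = [dict(recorded)]
    for _ in range(rounds):
        # The "prediction": the neighborhood with the most recorded arrests.
        hot_spot = max(recorded, key=recorded.get)
        others = len(recorded) - 1
        for name in recorded:
            share = hot_share if name == hot_spot else (1 - hot_share) / others
            # Arrests are only recorded where patrols are present.
            recorded[name] += patrols * share * true_rate
        history.append(dict(recorded))
    return history


def share(snapshot, name):
    """Fraction of all recorded arrests attributed to one neighborhood."""
    return snapshot[name] / sum(snapshot.values())


# A small historical gap in records, with no difference in underlying behavior.
history = simulate({"north": 12, "south": 10}, rounds=10)

print("north share at start:", round(share(history[0], "north"), 2))
print("north share at end:  ", round(share(history[-1], "north"), 2))
```

In this toy setup the prediction is confirmed every round, yet the two neighborhoods never differed in underlying behavior. The apparent accuracy measures where patrols were sent, which is the pattern described in the consequences above.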

Discussion questions for your team:

How do you identify feedback loops in AI systems before they cause harm?
When a client says "the model is accurate," what follow-up questions should you ask?
How do you balance statistical accuracy with fairness?
What responsibility does an AI agency have when building systems for clients whose use cases may harm vulnerable communities?

Case Study 3: The Health Insurance Risk Model

Scenario: A health insurance company used an AI model to identify patients who would benefit from additional care management services. The model predicted which patients were most likely to have high future healthcare costs, reasoning that high-cost patients would benefit most from proactive care. The model used past healthcare spending as its primary predictor of future healthcare needs. The problem: due to systemic healthcare access disparities, Black patients had historically received less healthcare than white patients with the same conditions. Lower spending did not mean lower need—it meant lower access. The model systematically under-identified Black patients for additional care because their historical spending was lower.

The decision made: The model was deployed and used to allocate care management resources. An academic research team later analyzed the model and published findings demonstrating the racial disparity.

The consequences: At a given risk score threshold, Black patients were significantly sicker than white patients. In other words, the model assigned the same "need" score to Black patients who were considerably more ill than white patients with that score. The disparity affected an estimated 46,000 Black patients per year in the studied system alone. After the research was published, the insurance company worked with the researchers to redesign the model using health-based predictors instead of cost-based predictors, reducing the disparity by 84%.

Ethical dimensions:

Proxy variables: Using healthcare cost as a proxy for healthcare need encoded systemic inequality
Structural bias: The bias did not come from the model's algorithm but from the structural inequality in the data
Measurement: What you choose to measure determines what the model optimizes for, and this choice has ethical implications
Remediation: The problem was fixable once identified, but it required researchers external to the company to discover it
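
The proxy problem can be shown with a few lines of synthetic data. The sketch below is a hedged illustration, not a reconstruction of the model in this case: it assumes two hypothetical groups with identical health needs, where group B generates less spending per condition because of lower access. The field names, the spending figures, and the `top_k` and `share_of_group` helpers are all invented for the example.

```python
# Synthetic patients: both groups have the same spread of chronic conditions,
# but group B is assumed to generate less spending per condition.
patients = []
for group, spend_per_condition in (("A", 1000), ("B", 700)):
    for conditions in range(1, 11):
        patients.append({
            "group": group,
            "conditions": conditions,                      # stand-in for health need
            "past_cost": conditions * spend_per_condition,  # the proxy label
        })


def top_k(rows, key, k):
    """Select the k rows with the highest value of `key`."""
    return sorted(rows, key=lambda row: row[key], reverse=True)[:k]


def share_of_group(rows, group):
    """Fraction of the selected rows that belong to `group`."""
    return sum(row["group"] == group for row in rows) / len(rows)


K = 6  # hypothetical number of care-management slots
by_cost = top_k(patients, "past_cost", K)
by_need = top_k(patients, "conditions", K)

print("Group B share when ranking by cost:", round(share_of_group(by_cost, "B"), 2))
print("Group B share when ranking by need:", round(share_of_group(by_need, "B"), 2))

# Sickest-patient bar each group had to clear to be selected by the cost ranking.
for group in ("A", "B"):
    bar = min(row["conditions"] for row in by_cost if row["group"] == group)
    print(f"Group {group}: fewest conditions among selected = {bar}")
```

In this toy data, ranking by cost selects fewer group B patients and requires them to be sicker to qualify, even though the two groups have identical needs. Swapping the ranking key is the same kind of change as the redesign described above, in miniature.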

Discussion questions for your team:

How do you evaluate whether the variables you are using as proxies actually measure what you think they measure?
What structural biases might exist in the data for your current projects?
Should AI agencies be responsible for assessing structural bias, or is that the client's responsibility?
How would you redesign this model if you were the agency that built it?

Case Study 4: The Facial Recognition Misidentification

Scenario: A retail company deployed a facial recognition system to identify known shoplifters when they entered stores. The system was trained primarily on lighter-skinned faces and had significantly higher error rates for darker-skinned individuals. A Black man was stopped by security, detained, and accused of theft based on a facial recognition match. He was innocent; the system had matched him to a different person. The wrongful detention lasted 45 minutes before the error was identified.

The decision made: The system was deployed without adequate testing across racial demographics. The store's policy was to detain individuals flagged by the system, with security guards treating the system's identification as reliable.

The consequences: The wrongfully detained individual filed a lawsuit. The case received media coverage that damaged the retailer's brand. Multiple studies confirmed that the underlying facial recognition technology had significantly higher error rates for women and people with darker skin. Several cities subsequently banned or restricted facial recognition use by businesses and law enforcement.

Ethical dimensions:

Differential accuracy: The system worked well for some people and poorly for others, with the disparity falling along racial lines
Deployment context: Using an imperfect system in a context where errors lead to detention is qualitatively different from using the same system for less consequential purposes
Human-AI interaction: Security guards treated the AI's output as definitive rather than probabilistic
Testing adequacy: The vendor tested the system but did not adequately test across demographics
Consent: Customers did not consent to facial recognition scanning
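
A short calculation shows why a match should be read as probabilistic rather than definitive. The sketch below applies Bayes' rule to estimate how likely it is that a flagged person is actually on the watchlist. Every number in it is an assumption chosen for illustration (the share of visitors on the watchlist, the true match rate, and the two false match rates), and `match_precision` is a hypothetical helper, not part of any vendor's API.

```python
def match_precision(prevalence, true_match_rate, false_match_rate):
    """Probability that a flagged person is really on the watchlist (Bayes' rule)."""
    true_alerts = prevalence * true_match_rate
    false_alerts = (1 - prevalence) * false_match_rate
    return true_alerts / (true_alerts + false_alerts)


# All values below are hypothetical and chosen only to illustrate the arithmetic.
PREVALENCE = 1 / 10_000   # share of store visitors who are on the watchlist
TRUE_MATCH_RATE = 0.99    # chance the system flags someone who is on the list

# Assumed false match rates for two demographic groups.
false_match_rates = {"lower_error_group": 0.001, "higher_error_group": 0.01}

precision = {
    group: match_precision(PREVALENCE, TRUE_MATCH_RATE, rate)
    for group, rate in false_match_rates.items()
}

for group, value in precision.items():
    print(f"{group}: {value:.1%} of alerts point at the right person")
```

Under these assumptions most alerts are wrong for both groups, and the gap between groups means the burden of wrongful stops falls unevenly. That is the argument for treating an alert as a prompt for human verification rather than grounds for detention.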

Discussion questions for your team:

What testing would you require before deploying a facial recognition system?
How should you design human-AI interaction to prevent humans from over-trusting AI outputs?
Are there use cases where AI should not be deployed regardless of accuracy levels?
What is your agency's responsibility if a client wants to deploy a system you believe will cause harm?

Case Study 5: The Social Media Content Moderation Model

Scenario: A social media platform deployed AI-based content moderation to detect and remove hate speech. The model was trained primarily on English-language hate speech examples. When applied to posts in other languages, the model performed poorly, missing hate speech in some languages while over-censoring ordinary speech in others. Posts in African American Vernacular English were flagged as hate speech at disproportionately high rates because the training data did not adequately represent linguistic diversity.

The decision made: The platform deployed the model globally with the same thresholds and training data, under pressure to moderate content at scale.

The consequences: Marginalized communities whose language patterns were misclassified experienced disproportionate content removal and account suspensions. Meanwhile, hate speech in under-represented languages went undetected. Users from affected communities reported feeling silenced by the platform. Researchers documented the disparities and published widely cited studies. The platform eventually invested in language-specific models and more diverse training data, but the trust damage was significant.

Ethical dimensions:

Linguistic bias: AI models trained on dominant language patterns can discriminate against linguistic minorities
Global deployment of local solutions: Applying a model trained on one population to a global audience creates systematic disparities
Speed versus fairness: Pressure to moderate at scale led to deployment before the model was ready for diverse populations
Silencing effect: Content moderation errors do not just inconvenience users—they suppress speech and participation
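
The two failure modes in this case, over-flagging ordinary speech and missing hate speech, can be measured separately for each language or dialect before a global rollout. The sketch below is one possible shape for such a check, not the platform's actual process. The slice names, the evaluation counts, the thresholds, and the helpers `make_slice`, `error_rates`, and `release_blockers` are all hypothetical.

```python
def make_slice(name, benign, benign_flagged, hateful, hateful_missed):
    """Build hypothetical labeled rows: (slice, is_hate_speech, was_flagged)."""
    rows = [(name, False, i < benign_flagged) for i in range(benign)]
    rows += [(name, True, i >= hateful_missed) for i in range(hateful)]
    return rows


def error_rates(rows):
    """Compute false positive and false negative rates per slice."""
    counts = {}
    for name, is_hate, flagged in rows:
        c = counts.setdefault(name, {"benign": 0, "fp": 0, "hateful": 0, "fn": 0})
        if is_hate:
            c["hateful"] += 1
            c["fn"] += 0 if flagged else 1
        else:
            c["benign"] += 1
            c["fp"] += 1 if flagged else 0
    return {
        name: {
            "n": c["benign"] + c["hateful"],
            # max(..., 1) avoids dividing by zero for an empty class
            "false_positive_rate": c["fp"] / max(c["benign"], 1),
            "false_negative_rate": c["fn"] / max(c["hateful"], 1),
        }
        for name, c in counts.items()
    }


def release_blockers(rates, max_fpr=0.05, max_fnr=0.20, min_examples=100):
    """Return the slices that fail the (assumed) release criteria, with reasons."""
    blockers = {}
    for name, r in rates.items():
        reasons = []
        if r["n"] < min_examples:
            reasons.append("too few labeled examples")
        if r["false_positive_rate"] > max_fpr:
            reasons.append("over-flags ordinary speech")
        if r["false_negative_rate"] > max_fnr:
            reasons.append("misses hate speech")
        if reasons:
            blockers[name] = reasons
    return blockers


# Hypothetical evaluation set; the counts are invented for illustration.
eval_rows = (
    make_slice("standard_english", 200, 6, 200, 20)
    + make_slice("aave", 200, 40, 200, 20)
    + make_slice("low_resource_language", 30, 1, 30, 18)
)

blockers = release_blockers(error_rates(eval_rows))
for name, reasons in blockers.items():
    print(f"{name}: {', '.join(reasons)}")
```

Reporting both error types per slice matters because a single global accuracy figure can look acceptable while one community is over-moderated and another is left unprotected. The thresholds here are placeholders; choosing them is exactly the kind of trade-off the discussion questions below ask the team to debate.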

Discussion questions for your team:

How do you ensure AI systems work equitably across different linguistic and cultural contexts?
When a client pressures you to deploy faster than you believe is responsible, how do you respond?
What is the minimum level of testing required before deploying an AI system that affects free expression?
How do you balance the harm of deploying an imperfect system against the harm of not deploying any system at all?

Case Study 6: The Data Consent Problem

Scenario: An AI agency collected training data for a sentiment analysis model by scraping product reviews from multiple websites. The reviews were publicly posted, and no terms of service explicitly prohibited scraping. The agency used this data to train a model that analyzed employee feedback for a corporate client, a completely different context from product reviews. Reviewers had shared opinions about products; they had not consented to their writing style and language patterns being used to analyze workplace sentiment.

The decision made: The agency used the data because it was technically accessible and no law clearly prohibited the specific use.

The consequences: When the corporate client's employees learned that the sentiment analysis tool was built on scraped product reviews, they questioned the tool's validity and raised ethical concerns. The client's works council (in European operations) filed a complaint about the use of data collected without appropriate consent. The project was suspended pending a legal review that took three months.

Ethical dimensions:

Contextual integrity: Data shared in one context (product reviews) was used in a different context (employee monitoring) without consent
Legal versus ethical: Just because something is legal does not make it ethical
Purpose limitation: Data protection principles require that data be used for purposes compatible with the original collection purpose
Power dynamics: Using data to build employee monitoring tools raises concerns about workplace surveillance and power imbalances

Discussion questions for your team:

How do you determine whether a data source is ethically appropriate for your intended use?
Where do you draw the line between legally permissible and ethically appropriate?
What consent would you want if your data were being used to train AI systems?
How does your agency currently evaluate the ethical appropriateness of training data?

Building an Ethics Training Program

Use these case studies as the foundation for an ongoing ethics training program.

Frequency. Conduct ethics discussions at least monthly. Ethics muscles atrophy without exercise.

Participation. Include everyone who touches AI systems—engineers, data scientists, product managers, project managers, and sales. Ethics is not just a technical concern.

Real cases from your own work. Supplement external case studies with anonymized examples from your agency's own projects. These are the most impactful because they are directly relevant to your team's daily work.

No-blame culture. Create an environment where raising ethical concerns is valued, not punished. The junior engineer who says "I am not sure we should do this" should be praised for speaking up, not dismissed for slowing things down.

Document decisions. After each case study discussion, document the principles and guidelines that emerged. Over time, these become your agency's practical ethics playbook—built from real discussion and real reasoning, not abstract principles.

Your Next Step

Schedule a one-hour ethics workshop with your team this month. Choose one case study from this article and use the discussion questions to guide the conversation. Before revealing the actual outcome, ask your team what they would do and why. The quality of the discussion matters more than reaching the "right" answer. After the workshop, ask your team to identify one ethical question from their current projects that they would like to discuss at the next session. This builds the habit of ethical reflection as a normal part of your agency's work.

Agency Script Editorial
