Pick One: You Cannot Have Three Fairness Guarantees at Once

The uncomfortable truth that most fairness tutorials bury is this: you cannot satisfy every reasonable definition of fairness at once. The mathematics forbids it. Once you accept that, the question stops being "how do I make this model fair?" and becomes "which fairness property am I willing to optimize, and what am I giving up to get it?" That reframing is the entire game.

This article lays out the competing approaches to AI fairness, the axes that actually separate them, and a decision rule you can apply to a real project this week. It assumes you already know what bias is in broad terms. If you do not, start with The Complete Guide to AI Bias and Fairness Fundamentals and come back. What follows is the comparison layer: the part where you stop reading definitions and start making choices that have consequences.

The Three Families of Fairness Definitions

Most fairness criteria collapse into three families. Knowing which family you are in tells you most of what you need.

Independence (demographic parity)

This family demands that model outcomes be statistically independent of a protected attribute. If 40 percent of one group is approved, 40 percent of every group should be approved. It is the easiest definition to explain to a non-technical stakeholder and the easiest to audit. The trade-off is that it ignores whether the groups actually differ on legitimate, outcome-relevant factors. If qualified applicants genuinely cluster differently across groups, enforcing parity forces you to approve weaker candidates in one group or reject stronger ones in another.
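Demographic parity is usually audited as the gap in selection rates between groups. Below is a minimal sketch in plain Python. It assumes binary decisions (1 = approved) and one group label per applicant. The function names are illustrative and are not taken from any particular fairness library.

```python
from collections import defaultdict


def selection_rates(decisions, groups):
    """Share of positive decisions (e.g. approvals) within each group."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        positives[group] += int(decision)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_gap(decisions, groups):
    """Largest difference in selection rate between any two groups; 0.0 means exact parity."""
    rates = selection_rates(decisions, groups)
    return max(rates.values()) - min(rates.values())


# Toy data: 1 = approved.
decisions = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
groups = ["a"] * 5 + ["b"] * 5
print(selection_rates(decisions, groups))  # {'a': 0.4, 'b': 0.2}
print(demographic_parity_gap(decisions, groups))
```

Note what the check ignores: nothing in it looks at whether the approved applicants were qualified. That blindness is the trade-off described above.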

Separation (equalized odds)

This family conditions on the ground truth. It asks that the model's error rates — false positives and false negatives — be equal across groups for people who share the same true outcome. This is usually the right target when the cost of a mistake is what matters: a fraud model that flags one group's legitimate transactions twice as often is failing separation. The trade-off is that you need reliable labels, and your labels are often the very thing that was biased in the first place.
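Separation can be checked the same way, except each group's decisions are first split by the true label. The sketch below reports the largest across-group gap in true-positive or false-positive rate. It uses hypothetical helper names and assumes binary labels and predictions. It also assumes every group contains both positive and negative examples. The toy data mirrors the fraud example: group b's legitimate transactions are flagged twice as often as group a's.

```python
from collections import defaultdict


def error_rates_by_group(y_true, y_pred, groups):
    """Return {group: (true_positive_rate, false_positive_rate)} for binary labels."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for y, p, g in zip(y_true, y_pred, groups):
        if y == 1:
            counts[g]["tp" if p == 1 else "fn"] += 1
        else:
            counts[g]["fp" if p == 1 else "tn"] += 1
    rates = {}
    for g, c in counts.items():
        positives, negatives = c["tp"] + c["fn"], c["fp"] + c["tn"]
        if positives == 0 or negatives == 0:
            raise ValueError(f"group {g!r} needs both positive and negative examples")
        rates[g] = (c["tp"] / positives, c["fp"] / negatives)
    return rates


def equalized_odds_gap(y_true, y_pred, groups):
    """Largest across-group gap in TPR or FPR; 0.0 means equalized odds holds exactly."""
    rates = list(error_rates_by_group(y_true, y_pred, groups).values())
    tprs = [tpr for tpr, _ in rates]
    fprs = [fpr for _, fpr in rates]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))


# Toy fraud data: 1 = fraud (true label) or flagged (prediction).
y_true = [1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0]
groups = ["a"] * 6 + ["b"] * 6
print(error_rates_by_group(y_true, y_pred, groups))  # {'a': (1.0, 0.25), 'b': (1.0, 0.5)}
print(equalized_odds_gap(y_true, y_pred, groups))    # 0.25
```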

Sufficiency (calibration)

This family asks that a predicted score mean the same thing regardless of group. If your model says "70 percent likely to repay," that should be true 70 percent of the time for every group. Calibration is what risk and pricing teams care about. The trade-off is that a perfectly calibrated model can still have wildly different false-positive rates across groups, which is exactly the harm that separation tries to prevent.
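Calibration is checked per group. You bucket the scores, then compare the average predicted score with the observed outcome rate in each bucket. A rough sketch follows. It assumes scores in [0, 1], binary outcomes, and equal-width bins; the bin count is an arbitrary choice here and the names are hypothetical.

```python
from collections import defaultdict


def calibration_table(scores, outcomes, groups, n_bins=5):
    """Map (group, bin) -> (mean predicted score, observed positive rate, count).

    Scores are assumed to lie in [0, 1]; bins are equal-width.
    """
    cells = defaultdict(lambda: [0.0, 0, 0])  # [sum of scores, positives, count]
    for score, outcome, group in zip(scores, outcomes, groups):
        b = min(int(score * n_bins), n_bins - 1)  # a score of exactly 1.0 joins the top bin
        cell = cells[(group, b)]
        cell[0] += score
        cell[1] += outcome
        cell[2] += 1
    return {key: (s / n, pos / n, n) for key, (s, pos, n) in cells.items()}


# Toy data: every applicant scored 0.7 ("70 percent likely to repay"); 1 = repaid.
scores = [0.7] * 20
outcomes = [1] * 7 + [0] * 3 + [1] * 4 + [0] * 6
groups = ["a"] * 10 + ["b"] * 10
for (group, b), (predicted, observed, n) in sorted(calibration_table(scores, outcomes, groups).items()):
    print(f"group={group} bin={b} predicted={predicted:.2f} observed={observed:.2f} n={n}")
```

In this toy data both groups receive the same 0.7 score. Group a repays 70 percent of the time, but group b repays only 40 percent of the time. That gap is a sufficiency failure.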

Why You Cannot Have All Three

The impossibility results are not academic hand-waving. Base rates almost always differ across groups. When they do, every pair of independence, separation, and sufficiency is mutually incompatible, outside degenerate cases such as a perfect predictor, so a model can satisfy at most one of the three. Picking one mathematically rules out the others. This is the single most important fact in the field, and the reason "is this model fair?" is the wrong question. The right question is "fair according to which definition, and who decided that was the one that mattered?"
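One way to make the conflict concrete is an identity that follows directly from confusion-matrix definitions. Take a group of size N with base rate p. Then TP = (1 - FNR)·p·N and FP = TP·(1 - PPV)/PPV. Dividing FP by the (1 - p)·N true negatives gives FPR = p/(1 - p) · (1 - PPV)/PPV · (1 - FNR).

Suppose two groups share the same positive predictive value, a sufficiency-style property. Suppose they also share the same false-negative rate, which is half of separation. The identity then forces their false-positive rates apart whenever p differs. Here is a small numeric check with illustrative values:

```python
def implied_fpr(base_rate, ppv, fnr):
    """False-positive rate implied by the confusion-matrix identity
    FPR = p / (1 - p) * (1 - PPV) / PPV * (1 - FNR), with p the group's base rate."""
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)


# Same PPV and FNR in both groups; only the base rate differs (illustrative values).
ppv, fnr = 0.8, 0.2
for group, base_rate in [("a", 0.5), ("b", 0.2)]:
    print(group, f"{implied_fpr(base_rate, ppv, fnr):.2f}")
```

With these numbers group a ends up with a false-positive rate of about 0.20 and group b about 0.05. Equalizing FPR as well would require giving up one of the other two properties.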

The Axes That Actually Separate Approaches

Beyond the definition you choose, four practical axes decide your approach.

Where you intervene. Pre-processing reweights or repairs the training data, in-processing adds a fairness constraint to the loss function, and post-processing adjusts thresholds after the model is trained. Post-processing is fast and reversible; in-processing usually gives the best accuracy-fairness trade-off but couples fairness tightly to model retraining.
Group vs. individual fairness. Group methods equalize statistics across categories. Individual fairness demands that similar people get similar outcomes, which is principled but requires a similarity metric you rarely have.
Whether you can use the protected attribute. Some jurisdictions let you use group membership to correct for bias; others forbid it entirely, forcing you into "fairness through unawareness," which is the weakest option because proxies leak the attribute back in.
Static vs. dynamic. A model that is fair at launch can drift as the population shifts. Treating fairness as a one-time audit rather than a monitored property is a common failure mode covered in The Hidden Risks of AI Bias and Fairness Fundamentals (and How to Manage Them).
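As one concrete pre-processing example, consider the reweighing scheme associated with Kamiran and Calders. It gives each training example the weight P(A=a)·P(Y=y) / P(A=a, Y=y), so that group membership and label become statistically independent in the weighted data. The sketch below assumes binary labels and a single categorical group attribute. The resulting weights would be passed to any learner that accepts per-sample weights. The function name is hypothetical.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Per-example weight w(a, y) = P(A=a) * P(Y=y) / P(A=a, Y=y).

    Under these weights, group and label are independent in the training data.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[a] / n) * (label_counts[y] / n) / (joint_counts[(a, y)] / n)
        for a, y in zip(groups, labels)
    ]


# Toy data: group a has 3 of 4 positive labels, group b has 1 of 4.
groups = ["a"] * 4 + ["b"] * 4
labels = [1, 1, 1, 0, 1, 0, 0, 0]
weights = reweighing_weights(groups, labels)
print([round(w, 3) for w in weights])
```

Over-represented (group, label) cells get weights below 1 and rare cells get weights above 1. After weighting, both groups have a 50 percent positive rate. The model itself is untouched, which is why this counts as pre-processing.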

A Decision Rule You Can Actually Use

Here is a sequence that resolves most real cases.

Identify the dominant harm. Is the worst outcome a wrongful denial (false negative), a wrongful flag (false positive), or unequal access? Wrongful denials and wrongful flags both point to separation, with the focus on equal false-negative or false-positive rates respectively. Unequal access points to independence.
Check label trust. If your ground-truth labels are themselves products of a biased process, separation and sufficiency inherit that bias. Lean toward independence or fix the labels first.
Check the legal frame. If you cannot use the protected attribute at decision time, you are limited to pre-processing or unawareness, and you must monitor proxies aggressively.
Pick the cheapest intervention that holds. Start with post-processing threshold adjustment. Only move to in-processing if post-processing cannot close the gap without unacceptable accuracy loss.
Write down what you gave up. Document the definition you rejected and why. This document is what protects you when someone later asks why the model fails their preferred metric.
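Post-processing threshold adjustment can be as simple as choosing a separate cutoff per group. The sketch below targets only equal true-positive rates, the "equal opportunity" relaxation of separation. It searches each group's own scores for the highest cutoff that reaches a target TPR. This is a simplified stand-in, not the randomized equalized-odds post-processing described in the literature. The function names are hypothetical, and every group is assumed to have at least one positive example.

```python
def tpr_at(scores, labels, threshold):
    """Share of true positives whose score is at or above the threshold."""
    positive_scores = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in positive_scores) / len(positive_scores)


def per_group_thresholds(scores, labels, groups, target_tpr):
    """For each group, the highest cutoff (drawn from that group's own scores)
    whose true-positive rate reaches target_tpr."""
    thresholds = {}
    for g in sorted(set(groups)):
        g_scores = [s for s, gg in zip(scores, groups) if gg == g]
        g_labels = [y for y, gg in zip(labels, groups) if gg == g]
        # TPR never decreases as the cutoff is lowered, so the first cutoff
        # that meets the target is the highest one that does.
        for t in sorted(set(g_scores), reverse=True):
            if tpr_at(g_scores, g_labels, t) >= target_tpr:
                thresholds[g] = t
                break
    return thresholds


# Toy scores: group b's model scores run lower for the same true label.
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.5, 0.4, 0.2]
labels = [1, 1, 1, 0, 1, 1, 1, 0]
groups = ["a"] * 4 + ["b"] * 4
print(per_group_thresholds(scores, labels, groups, target_tpr=0.6))  # {'a': 0.8, 'b': 0.5}
```

The returned dictionary is the entire intervention. That makes this kind of post-processing easy to inspect, document, and roll back.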

For the metrics that operationalize each definition, see How to Measure AI Bias and Fairness Fundamentals: Metrics That Matter.

A Worked Example

Consider a lending model. The dominant harm is wrongful denial of credit to creditworthy applicants — a false negative. That points to separation. But the historical repayment labels reflect decades of discriminatory lending, so label trust is low. You cannot fully rely on equalized odds against corrupted labels. The pragmatic path: enforce a separation constraint on the labels you do trust, supplement with a demographic-parity floor to guard against the corrupted-label problem, and use post-processing thresholds so a regulator can inspect exactly what adjustment you applied to each group. You accept some calibration loss, document it, and move on. That is what a defensible decision looks like — not perfection, but a justified position.
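The two checks in this example can be sketched as one audit function. Everything here is hypothetical. The `trusted` flag stands in for whatever process you use to decide which historical labels to believe. Both tolerances are placeholders for values you would set and document yourself: a 0.05 TPR gap, and a 0.8 approval-rate ratio that echoes the four-fifths rule of thumb from US employment-selection guidance. The sketch also assumes each group has at least one trusted, creditworthy applicant.

```python
from collections import defaultdict


def rate_by_group(flags, groups):
    """Mean of a 0/1 list within each group."""
    hits, totals = defaultdict(int), defaultdict(int)
    for flag, group in zip(flags, groups):
        hits[group] += flag
        totals[group] += 1
    return {g: hits[g] / totals[g] for g in totals}


def audit_lending(approved, creditworthy, groups, trusted, max_tpr_gap=0.05, parity_floor=0.8):
    """Hypothetical audit: separation on trusted labels plus a demographic-parity floor."""
    # Separation: approval rate among creditworthy applicants (TPR), trusted labels only.
    rows = [(a, g) for a, c, g, t in zip(approved, creditworthy, groups, trusted) if t and c == 1]
    tpr = rate_by_group([a for a, _ in rows], [g for _, g in rows])
    tpr_gap = max(tpr.values()) - min(tpr.values())
    # Parity floor: lowest group approval rate divided by the highest, over all applicants.
    approval = rate_by_group(approved, groups)
    approval_ratio = min(approval.values()) / max(approval.values())
    return {
        "tpr_gap": tpr_gap,
        "approval_ratio": approval_ratio,
        "passes": tpr_gap <= max_tpr_gap and approval_ratio >= parity_floor,
    }


approved = [1, 1, 1, 0, 1, 1, 0, 0]
creditworthy = [1, 1, 0, 0, 1, 1, 1, 0]
groups = ["a"] * 4 + ["b"] * 4
trusted = [True] * 6 + [False, True]  # the seventh label is flagged as untrustworthy
print(audit_lending(approved, creditworthy, groups, trusted))
```

In this toy data, separation holds on the trusted labels. The approval ratio, however, is about 0.67. It is the parity floor, not the separation check, that flags the model.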

Frequently Asked Questions

Is demographic parity always the safest default?

No. It is the easiest to explain and audit, which makes it tempting, but it can force you to make worse decisions for everyone when groups genuinely differ on relevant factors. Use it when access itself is the harm, not when decision quality is.

Can I just remove the protected attribute and call it fair?

That is "fairness through unawareness," and it is the weakest option. Other features act as proxies — zip code stands in for race, first name stands in for gender. Removing the attribute often makes bias harder to measure without making the model any fairer.
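A quick way to see proxy leakage is to ask how well a single feature predicts the protected attribute. The sketch below guesses the most common group for each feature value. It then compares that accuracy with always guessing the largest group. The data and helper names are hypothetical. This is an in-sample measure, so it will overstate leakage for high-cardinality features; a held-out evaluation would be more honest.

```python
from collections import Counter, defaultdict


def proxy_accuracy(feature_values, groups):
    """In-sample accuracy of guessing the protected group from one feature,
    predicting the most common group for each feature value."""
    by_value = defaultdict(Counter)
    for value, group in zip(feature_values, groups):
        by_value[value][group] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return correct / len(groups)


def baseline_accuracy(groups):
    """Accuracy of always guessing the largest group, i.e. what an uninformative feature scores."""
    return Counter(groups).most_common(1)[0][1] / len(groups)


# Hypothetical zip-code feature with no protected attribute in the model inputs.
zip_codes = ["z1", "z1", "z1", "z2", "z2", "z2"]
groups = ["a", "a", "a", "b", "b", "a"]
print(proxy_accuracy(zip_codes, groups), baseline_accuracy(groups))
```

Here zip code alone recovers group membership for five of six applicants, against a baseline of four of six.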

Should fairness live in the training loss or in post-processing?

Post-processing first, because it is fast, reversible, and inspectable. Move fairness into the loss function only when threshold adjustment cannot close the gap without sacrificing too much accuracy. In-processing usually gives a better trade-off but ties fairness to every retrain.

How do I choose between equalized odds and calibration?

Ask whether your stakeholders care more about equal error rates or equally meaningful scores. Risk and pricing teams want calibration; people worried about wrongful flags or denials want equalized odds. You generally cannot have both when base rates differ.

Does picking a definition ever get me in legal trouble?

It can, because some definitions require using the protected attribute, which is restricted in domains like lending and employment. Confirm what you are allowed to use at decision time before you pick an approach, then document the choice.

Key Takeaways

You cannot satisfy independence, separation, and sufficiency at once when base rates differ — choose deliberately.
Map your choice to the dominant harm: access points to parity, wrongful flags and denials point to equalized odds, score meaning points to calibration.
Distrust your labels before you trust separation or calibration built on them.
Prefer the cheapest intervention that works — usually post-processing threshold adjustment — and escalate only when it fails.
Document the definition you rejected; that record is your defense when the trade-off is questioned later.

Agency Script Editorial
