Bias in AI: what forms does it take, and how do you detect and mitigate it?

Models learn from data → they inherit the biases in that data. When AI influences high-stakes decisions (hiring, lending, healthcare), bias can cause real harm and create legal liability.

Sources of bias:

1. Data bias
- Selection bias — training data isn't representative of the population the model will serve (e.g. an all-English corpus → poor performance in other languages).
- Historical bias — data reflects past discrimination (old hiring records skew male → model learns to favor men).
- Representation bias — minority groups are under-represented → the model performs worse for them.
- Measurement bias — labels or features are not equally accurate across groups.
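A quick way to spot selection or representation bias is to compare each group's share of the training data with its share of the population the model will serve. This is a minimal sketch assuming you already have one group label per example and a reference distribution; the function name `representation_ratios`, the language codes, the reference shares, and the 0.8 flag threshold are all illustrative assumptions.

```python
from collections import Counter

def representation_ratios(groups, reference_shares):
    """Share of each group in the data divided by its share in a reference
    population. A ratio below 1 means the group is under-represented."""
    counts = Counter(groups)
    total = sum(counts.values())
    return {
        group: (counts.get(group, 0) / total) / ref_share
        for group, ref_share in reference_shares.items()
    }

# Hypothetical language labels for a 100-document corpus and made-up reference shares.
train_langs = ["en"] * 90 + ["vi"] * 6 + ["sw"] * 4
reference = {"en": 0.5, "vi": 0.3, "sw": 0.2}

for lang, ratio in representation_ratios(train_langs, reference).items():
    flag = "UNDER-REPRESENTED" if ratio < 0.8 else "ok"  # 0.8 is an arbitrary cut-off
    print(f"{lang}: ratio={ratio:.2f} {flag}")
```

A low ratio says a group is scarce in training, which often (not always) predicts weaker performance for it; confirm with per-group evaluation.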

2. Algorithmic bias
- Loss and architecture choices that optimize an average metric can sacrifice minority groups, because small groups contribute little to the average.
- Biased sampling strategies during training.

3. Deployment bias
- Model deployed in a context different from the one it was trained for (new population, language, or use case).
- User behavior feedback loops (recommendation → exposure bias).

Common manifestations:

Gender bias: "nurse" → female, "engineer" → male in completions.
Racial bias: LLM assigns negative traits to names associated with certain ethnic groups.
Age: "young" associated with innovation, "old" with stubbornness.
Geographic: more knowledge about the Global North than the Global South.
Socioeconomic: stereotypes about income, education.
Political: leans liberal/conservative depending on training data.
Language: strong in English, weaker in others (Swahili, Bengali).

Detection:

1. Benchmarks
- BBQ (Bias Benchmark for QA) — tests biased assumptions in Q&A.
- StereoSet — measures stereotyping.
- CrowS-Pairs — paired social-bias tests across 9 categories.
- BOLD — bias in open-ended generation.
- WinoBias, WinoGender — coreference bias.
- RealToxicityPrompts — toxicity tendencies.
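At their core, pair-based benchmarks such as CrowS-Pairs compare how strongly a model prefers a stereotypical sentence over a minimally different anti-stereotypical one; an unbiased model should prefer each about half the time. A simplified sketch of that pairwise metric follows. `toy_scores` is a hypothetical stand-in for a real model's (pseudo-)log-likelihood, and the actual CrowS-Pairs scoring is more involved (it scores only the tokens the two sentences share).

```python
def stereotype_preference_rate(pairs, score):
    """Fraction of (stereotypical, anti-stereotypical) pairs where the model
    scores the stereotypical sentence higher. Ideal value is about 0.5."""
    prefers_stereo = sum(1 for stereo, anti in pairs if score(stereo) > score(anti))
    return prefers_stereo / len(pairs)

# Hypothetical stand-in for a model's sentence log-likelihoods.
toy_scores = {
    "The nurse said she was tired.": -10.0,
    "The nurse said he was tired.": -12.5,
    "The engineer fixed his code.": -9.0,
    "The engineer fixed her code.": -9.5,
}
pairs = [
    ("The nurse said she was tired.", "The nurse said he was tired."),
    ("The engineer fixed his code.", "The engineer fixed her code."),
]
print(stereotype_preference_rate(pairs, lambda s: toy_scores[s]))  # 1.0
```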

2. Fairness metrics (classification):
- Demographic parity — equal positive rate across groups.
- Equal opportunity — equal true positive rate.
- Equalized odds — equal TPR and FPR.
- Calibration — scores mean the same thing in every group: among people given a score of 0.7, about 70% are actually positive, whichever group they belong to.
- Individual fairness — similar individuals get similar predictions.
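The first three metrics can be computed from per-group confusion counts. A minimal pure-Python sketch, assuming binary labels and predictions and a single protected attribute; the helper names are hypothetical, and gaps are reported as max minus min across groups (one common convention, not the only one).

```python
def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, true positive rate and false positive rate."""
    stats = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        yt = [y_true[i] for i in idx]
        yp = [y_pred[i] for i in idx]
        tp = sum(1 for t, p in zip(yt, yp) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(yt, yp) if t == 0 and p == 1)
        pos = sum(yt)
        neg = len(yt) - pos
        stats[g] = {
            "selection_rate": sum(yp) / len(yp),
            "tpr": tp / pos if pos else float("nan"),
            "fpr": fp / neg if neg else float("nan"),
        }
    return stats

def max_gap(stats, key):
    values = [s[key] for s in stats.values()]
    return max(values) - min(values)

# Toy data: 4 people per group, binary ground truth and predictions.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

stats = group_rates(y_true, y_pred, groups)
print("demographic parity gap:", max_gap(stats, "selection_rate"))  # 0.25
print("equal opportunity gap (TPR):", max_gap(stats, "tpr"))        # 0.0
print("equalized odds gap:", max(max_gap(stats, "tpr"), max_gap(stats, "fpr")))  # 0.5
```

Here the two groups have the same TPR, so equal opportunity holds, yet equalized odds fails because group A's FPR is higher.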

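Trade-off: you generally cannot satisfy all of these metrics at once. Kleinberg et al. (2016) showed that calibration within groups and balanced error rates across groups (a relaxed form of equalized odds) cannot all hold unless the groups have equal base rates or the predictor is perfect (the "impossibility theorem").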
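A small worked example with made-up numbers shows the tension. Both groups below are perfectly calibrated (a score of 0.8 means 80% of those people are positive, in either group), but their base rates differ, so a single shared threshold produces different TPR and FPR.

```python
# Each tuple: (score, number of people with that score, how many are actually positive).
score_bins = {
    "A": [(0.8, 10, 8), (0.2, 10, 2)],  # base rate 10/20 = 0.50
    "B": [(0.8, 5, 4), (0.2, 15, 3)],   # base rate  7/20 = 0.35
}
THRESHOLD = 0.5

def summarize(bins, threshold):
    # Calibration: within each score bin, the observed positive rate equals the score.
    calibrated = all(abs(pos / n - score) < 1e-9 for score, n, pos in bins)
    tp = sum(pos for score, n, pos in bins if score >= threshold)
    fp = sum(n - pos for score, n, pos in bins if score >= threshold)
    positives = sum(pos for _, _, pos in bins)
    negatives = sum(n - pos for _, n, pos in bins)
    return calibrated, tp / positives, fp / negatives

for name, bins in score_bins.items():
    calibrated, tpr, fpr = summarize(bins, THRESHOLD)
    print(f"{name}: calibrated={calibrated} TPR={tpr:.2f} FPR={fpr:.2f}")
```

This prints TPR 0.80 / FPR 0.20 for A and TPR 0.57 / FPR 0.08 for B. Equalizing those rates would require breaking calibration for at least one group.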

3. Counterfactual testing
- Hold everything constant except the protected attribute → check whether output changes.
- e.g. identical CV, swap "John" → "Jamal" → does offer rate change?
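A counterfactual test can be automated by rendering one CV template with different names and comparing the model's scores. In this sketch `hypothetical_score_cv` is a deliberately biased placeholder for the model under test, and the 0.02 tolerance is an arbitrary choice; in practice you would run many templates and compare aggregate offer rates per name group.

```python
from itertools import combinations

CV_TEMPLATE = (
    "Name: {name}\n"
    "Experience: 5 years Python, led a team of 4.\n"
    "Education: BSc Computer Science."
)

def hypothetical_score_cv(cv_text):
    # Placeholder for the real model call; biased on purpose so the check fires.
    return 0.62 if "Jamal" in cv_text else 0.70

def counterfactual_gaps(template, names, score_fn, tolerance=0.02):
    """Score the same CV under each name; flag name pairs whose scores differ
    by more than `tolerance`."""
    scores = {name: score_fn(template.format(name=name)) for name in names}
    flagged = [
        (a, b, round(abs(scores[a] - scores[b]), 4))
        for a, b in combinations(names, 2)
        if abs(scores[a] - scores[b]) > tolerance
    ]
    return scores, flagged

scores, flagged = counterfactual_gaps(
    CV_TEMPLATE, ["John", "Jamal", "Emily"], hypothetical_score_cv
)
print(scores)
print(flagged)
```

Only the protected proxy (the name) changes between calls, so any flagged gap is attributable to it.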

4. Red team + adversarial
- Probe with bias-inducing prompts.
- Human auditors rate output qualitatively.

Mitigation:

A. Data-level
- Re-sampling — oversample minority groups, undersample the majority group.
- Data augmentation — generate synthetic data for under-represented groups.
- Data cleaning — remove biased labels and duplicate records.
- Counterfactual data augmentation — add copies of examples with the protected attribute swapped (e.g. "he" ↔ "she") so the model sees both versions.
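Two of these techniques are simple enough to sketch directly: random oversampling until every group matches the largest group's size, and word-swap counterfactual data augmentation. Both functions are illustrative; the swap list is a tiny hypothetical sample that ignores capitalization, punctuation, and ambiguous words such as "her" (which can map to "his" or "him").

```python
import random

def oversample_to_balance(rows, group_key, seed=0):
    """Random oversampling: duplicate rows from smaller groups (sampling with
    replacement) until each group matches the largest group's size."""
    rng = random.Random(seed)
    by_group = {}
    for row in rows:
        by_group.setdefault(row[group_key], []).append(row)
    target = max(len(members) for members in by_group.values())
    balanced = []
    for members in by_group.values():
        balanced.extend(members)
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced

# Hypothetical, deliberately tiny swap list (lowercase tokens only).
SWAP = {"he": "she", "she": "he", "his": "her", "her": "his", "man": "woman", "woman": "man"}

def counterfactual_augment(sentence):
    """Return a copy of the sentence with gendered words swapped."""
    return " ".join(SWAP.get(word, word) for word in sentence.split())

rows = [{"text": "he is an engineer", "g": "m"}] * 3 + [{"text": "she is an engineer", "g": "f"}]
balanced = oversample_to_balance(rows, "g")
print(sum(r["g"] == "f" for r in balanced), sum(r["g"] == "m" for r in balanced))  # 3 3
print(counterfactual_augment("he said his code was ready"))  # she said her code was ready
```

Duplicating rows adds no new information and can overfit the few minority examples, which is one reason synthetic or counterfactual augmentation is often preferred.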

B. Model-level
- Fair training — add fairness constraints to the loss.
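- Adversarial debiasing — an adversary model tries to predict the protected attribute from the main model's outputs (or internal representations); the main model is trained to do its task well while making the adversary fail.
- Post-hoc threshold adjustment — after training, choose a separate decision threshold per group (e.g. so that TPR is equal across groups).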
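Per-group threshold adjustment can be sketched as follows: on validation data, pick for each group the highest threshold that reaches a target TPR (an equal-opportunity style fix). The scores are hypothetical, and searching over observed scores is a simple choice, not a specific library's algorithm.

```python
def tpr_at(scores, labels, threshold):
    positives = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in positives) / len(positives)

def threshold_for_tpr(scores, labels, target_tpr):
    """Highest observed score usable as a threshold that still reaches target_tpr."""
    for t in sorted(set(scores), reverse=True):
        if tpr_at(scores, labels, t) >= target_tpr:
            return t
    return min(scores)

# Hypothetical validation scores: the model under-scores group B's positives.
data = {
    "A": ([0.9, 0.8, 0.7, 0.4, 0.3], [1, 1, 1, 0, 0]),
    "B": ([0.6, 0.5, 0.45, 0.3, 0.2], [1, 1, 1, 0, 0]),
}
thresholds = {g: threshold_for_tpr(s, y, target_tpr=1.0) for g, (s, y) in data.items()}
print(thresholds)  # {'A': 0.7, 'B': 0.45}
```

With one shared threshold of 0.7, none of group B's positives would be selected. Equalizing TPR this way can widen other gaps (e.g. FPR), and group-specific thresholds may be legally restricted in some domains, so check before deploying.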

C. LLM-specific
- RLHF / DPO with diverse preference data — include varied demographics.
- Constitutional AI — LLM self-critique for bias.
- System prompt — instruction "avoid stereotyping, treat all groups equally".
- Output filter — detect and rewrite biased output.

D. Deployment
- Continuous monitoring — track metrics per group over time.
- Human-in-the-loop for high-stakes decisions.
- Transparent model cards — document known biases.
- Right to explanation — users can challenge decisions.
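Continuous monitoring can start as a periodic job that recomputes a per-group metric and alerts when the gap grows. This sketch tracks selection rate only, because it needs no ground-truth labels (which often arrive late in production); the event format, period keys, and 0.1 gap tolerance are assumptions.

```python
from collections import defaultdict

def monitor_selection_gap(events, max_gap=0.1):
    """events: iterable of (period, group, selected) tuples.
    Returns {period: per-group selection rates} for periods whose gap exceeds max_gap."""
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # period -> group -> [selected, total]
    for period, group, selected in events:
        c = counts[period][group]
        c[0] += int(selected)
        c[1] += 1
    alerts = {}
    for period in sorted(counts):
        rates = {g: sel / tot for g, (sel, tot) in counts[period].items()}
        if max(rates.values()) - min(rates.values()) > max_gap:
            alerts[period] = rates
    return alerts

events = (
    [("2024-01", "A", True)] * 5 + [("2024-01", "A", False)] * 5
    + [("2024-01", "B", True)] * 5 + [("2024-01", "B", False)] * 5
    + [("2024-02", "A", True)] * 6 + [("2024-02", "A", False)] * 4
    + [("2024-02", "B", True)] * 3 + [("2024-02", "B", False)] * 7
)
print(monitor_selection_gap(events))  # {'2024-02': {'A': 0.6, 'B': 0.3}}
```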

Tools:
- AIF360 (IBM) — bias detection + mitigation toolkit.
- Fairlearn (Microsoft) — fairness assessment.
- What-If Tool (Google) — interactive fairness exploration.
- LangKit — bias/toxicity metrics for LLMs.

Legal/compliance:
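- EU AI Act — AI used for hiring or credit scoring is classified as high-risk → mandatory obligations, including data governance that examines training data for possible biases.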
- Colorado AI Act (2024) — algorithmic discrimination requirements.
- NYC Local Law 144 — bias audits for automated employment decisions.
- GDPR Article 22 — right to contest automated decisions.
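NYC Local Law 144 audits report selection rates and impact ratios per demographic category; as we understand the city's implementing rules, an impact ratio is a category's selection rate divided by the selection rate of the most-selected category. The counts below are hypothetical. The 0.8 flag borrows the EEOC "four-fifths" rule of thumb; as far as we know, LL144 requires publishing the ratios rather than meeting a specific cutoff.

```python
def impact_ratios(selected, applicants):
    """Selection rate per category divided by the highest category's selection rate."""
    rates = {c: selected[c] / applicants[c] for c in applicants}
    best = max(rates.values())
    return {c: r / best for c, r in rates.items()}

# Hypothetical audit counts.
applicants = {"Male": 200, "Female": 150}
selected = {"Male": 60, "Female": 30}

ratios = impact_ratios(selected, applicants)
for category, ratio in ratios.items():
    # 0.8 mirrors the four-fifths rule of thumb, not a threshold set by LL144.
    note = " (below 4/5)" if ratio < 0.8 else ""
    print(f"{category}: impact ratio {ratio:.2f}{note}")
```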

Rule: there is no "unbiased AI", because all data contains bias. The goal is to document, measure, mitigate, and monitor bias, and to keep humans in the loop with the final say in high-stakes decisions.
