Because AGISystem2 shows its work, we can see exactly how definitions affect conclusions. This enables systematic bias detection by analyzing which rules impact which groups, and how changes to definitions change outcomes.
1. Why Explainability Enables Bias Detection
In black-box AI systems, bias is detected through statistical analysis of outputs. With AGISystem2, we can go deeper:
| Approach | Black-Box Analysis | Explainability-Based Analysis |
| --- | --- | --- |
| Detection | Output disparities | Rule impact analysis |
| Root Cause | Unknown | Specific rules identified |
| Intervention | Retrain model | Modify specific rules |
| Verification | Statistical testing | Formal proof |
| Traceability | None | Proof trace (audit logging/export is external) |
2. Methodology: Definition Impact Analysis
The core insight: change a definition, observe the impact on conclusions.
The Process
- Baseline: Run queries with current definitions
- Modification: Change a definition (add/remove/modify rule)
- Compare: Run same queries, measure outcome changes
- Analyze: Which groups are most affected?
- Trace: Follow proof paths to understand why
Example: Loan Eligibility Analysis
// Original Theory
Eligible_for_loan(X) :-
    CreditScore(X) >= 700,
    Employment(X) = "full-time",
    Income(X) >= 50000.
// Baseline Results
Group A (urban professionals): 85% eligible
Group B (rural workers): 42% eligible
// Modified Definition - add alternative path
Eligible_for_loan(X) :-
    CreditScore(X) >= 700,
    Employment(X) = "full-time",
    Income(X) >= 50000.
Eligible_for_loan(X) :-      // NEW: alternative criteria
    CreditScore(X) >= 650,
    StableIncome(X) = true,  // 3+ years same employer
    DebtRatio(X) < 0.3.
// Results After Modification
Group A: 87% eligible (+2%)
Group B: 68% eligible (+26%)  // Significant improvement
Analysis: The original definition implicitly favored Group A because "full-time employment" and "income >= 50000" correlate with urban professional jobs. Adding an alternative path based on income stability rather than income level reduced this disparity.
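To make the compare step concrete, here is a minimal Python sketch of the baseline-versus-modified comparison. The applicant fields, group labels, and the two `eligible_*` predicates are illustrative stand-ins for the real theory, not AGISystem2 API calls.

```python
# Hypothetical stand-ins for the two theory versions; not AGISystem2 calls.
def eligible_original(p):
    return (p["credit_score"] >= 700
            and p["employment"] == "full-time"
            and p["income"] >= 50000)

def eligible_modified(p):
    # Original path OR the new stability-based path.
    alternative = (p["credit_score"] >= 650
                   and p["stable_income"]
                   and p["debt_ratio"] < 0.3)
    return eligible_original(p) or alternative

def rate_by_group(people, rule):
    """Share of each group that satisfies the rule."""
    rates = {}
    for group in sorted({p["group"] for p in people}):
        members = [p for p in people if p["group"] == group]
        rates[group] = sum(rule(p) for p in members) / len(members)
    return rates

applicants = [  # illustrative records
    {"group": "A", "credit_score": 720, "employment": "full-time",
     "income": 65000, "stable_income": False, "debt_ratio": 0.4},
    {"group": "B", "credit_score": 660, "employment": "seasonal",
     "income": 38000, "stable_income": True, "debt_ratio": 0.2},
]

baseline = rate_by_group(applicants, eligible_original)
modified = rate_by_group(applicants, eligible_modified)
for g in baseline:
    print(f"Group {g}: {baseline[g]:.0%} -> {modified[g]:.0%}")
```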
3. Types of Bias Analysis
3.1 Direct Attribute Bias
Rules that explicitly reference protected attributes.
// PROBLEMATIC: Direct reference to protected attribute
HighRisk(X) :- Age(X) > 60.
// BETTER: Use relevant factors instead
HighRisk(X) :- HealthConditions(X) includes "chronic".
3.2 Proxy Attribute Bias
Rules that rely on seemingly neutral attributes that act as proxies for protected attributes.
// PROBLEMATIC: ZIP code is proxy for race/income
PriorityService(X) :- ZIPCode(X) in [10001, 10002, 10003].
// ANALYSIS: Check correlation
ZIPCode 10001-10003 → 92% Group A
ZIPCode others → 34% Group A
→ Proxy discrimination detected
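A simple way to surface this kind of proxy is to compare group composition inside and outside the set of values a rule singles out. The sketch below assumes flat Python records and an arbitrary flagging threshold; a real analysis would use proper correlation or association tests.

```python
# Illustrative records; "group" is whatever protected label is available for the check.
records = [
    {"zip": 10001, "group": "A"}, {"zip": 10002, "group": "A"},
    {"zip": 10003, "group": "B"}, {"zip": 20500, "group": "B"},
    {"zip": 30301, "group": "B"}, {"zip": 30302, "group": "A"},
]

def group_share(rows, group="A"):
    return sum(r["group"] == group for r in rows) / len(rows) if rows else 0.0

def proxy_gap(rows, zip_set):
    """Group-A share inside vs. outside the ZIP codes the rule singles out."""
    inside = [r for r in rows if r["zip"] in zip_set]
    outside = [r for r in rows if r["zip"] not in zip_set]
    return group_share(inside), group_share(outside)

inside_share, outside_share = proxy_gap(records, {10001, 10002, 10003})
if abs(inside_share - outside_share) > 0.3:  # flagging threshold is arbitrary
    print(f"Possible proxy: {inside_share:.0%} vs {outside_share:.0%} Group A")
```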
3.3 Compound Rule Bias
Each rule looks neutral in isolation, but their combination creates bias.
// Each rule seems neutral:
Qualified(X) :- Degree(X) = "bachelor".
Qualified(X) :- Experience(X) >= 5.
Selected(X) :- Qualified(X), RecommendedBy(X, Y), Senior(Y).
// But analysis shows:
Senior employees: 78% Group A
→ Recommendation requirement creates pipeline bias
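Compound bias like this can be located by measuring group composition after each stage of the rule chain rather than only at the final decision. In the sketch below the stage predicates and candidate records are illustrative; the point is that the disparity appears only after the recommendation stage.

```python
def group_share(pool, group="A"):
    return sum(p["group"] == group for p in pool) / len(pool) if pool else 0.0

def stage_report(candidates, stages):
    """Apply each stage filter in turn and report group composition."""
    pool = candidates
    for name, keep in stages:
        pool = [p for p in pool if keep(p)]
        print(f"{name:<12} remaining={len(pool)}  Group A share={group_share(pool):.0%}")

stages = [  # illustrative stand-ins for the rules above
    ("Qualified", lambda p: p["degree"] == "bachelor" or p["experience"] >= 5),
    ("Recommended", lambda p: p["recommended_by_senior"]),
]
candidates = [
    {"group": "A", "degree": "bachelor", "experience": 3, "recommended_by_senior": True},
    {"group": "A", "degree": "none", "experience": 6, "recommended_by_senior": True},
    {"group": "B", "degree": "bachelor", "experience": 2, "recommended_by_senior": False},
    {"group": "B", "degree": "none", "experience": 7, "recommended_by_senior": False},
]
stage_report(candidates, stages)
# Group A share is 50% after "Qualified" but 100% after "Recommended":
# the disparity enters at the recommendation stage.
```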
3.4 Threshold Bias
Numeric thresholds that disproportionately affect groups.
// Threshold analysis
Rule: Eligible(X) :- Score(X) >= 700
Group A mean score: 720, std: 50 → 66% eligible
Group B mean score: 680, std: 60 → 37% eligible
// Sensitivity analysis
At threshold 680: Group A 75%, Group B 50%
At threshold 720: Group A 50%, Group B 25%
At threshold 700: Group A 66%, Group B 37%
→ Threshold choice significantly impacts disparity
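If the group scores are assumed to be roughly normally distributed with the means and standard deviations quoted above, the sensitivity table can be reproduced with the standard library alone (the normality assumption is ours, not a property of the system):

```python
from statistics import NormalDist

# Assumed (not system-provided) score distributions per group.
score_dist = {"A": NormalDist(720, 50), "B": NormalDist(680, 60)}

def eligible_rate(dist, threshold):
    """P(score >= threshold) under the assumed normal distribution."""
    return 1 - dist.cdf(threshold)

for threshold in (680, 700, 720):
    rates = {g: eligible_rate(d, threshold) for g, d in score_dist.items()}
    gap = rates["A"] - rates["B"]
    print(f"threshold {threshold}: A {rates['A']:.0%}, B {rates['B']:.0%}, gap {gap:.0%}")
```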
4. Counterfactual Fairness Analysis
For each decision, ask: "Would the outcome change if only the protected attribute were different?"
// Original case
Person: Alice, Age: 35, Gender: F, Experience: 10yr, Degree: PhD
Decision: HIRED
Proof path: Qualified via degree → Interviewed → Selected
// Counterfactual: Change only gender
Person: Alice', Age: 35, Gender: M, Experience: 10yr, Degree: PhD
Decision: HIRED
Proof path: Qualified via degree → Interviewed → Selected
Result: COUNTERFACTUALLY FAIR
(Same outcome despite protected attribute change)
// Different case
Person: Bob, Age: 62, Experience: 30yr, Health: good
Decision: NOT_PROMOTED
Proof path: Age > 60 → RetirementTrack → NotPromotionEligible
// Counterfactual
Person: Bob', Age: 45, Experience: 30yr, Health: good
Decision: PROMOTED
Proof path: Experience > 20 → SeniorTrack → PromotionEligible
Result: COUNTERFACTUALLY UNFAIR
(Different outcome when only age changed)
Problematic rule: Age > 60 → RetirementTrack
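A single-case counterfactual check needs only the decision procedure and an attribute flip. In the sketch below, `decide` is a hypothetical stand-in for querying the theory; in AGISystem2 it would return the actual proof path rather than a hand-written list.

```python
def decide(person):
    # Hypothetical stand-in for the query engine; mirrors the promotion rules
    # above and returns (outcome, proof path).
    if person["age"] > 60:
        return "NOT_PROMOTED", ["Age > 60", "RetirementTrack", "NotPromotionEligible"]
    if person["experience"] > 20:
        return "PROMOTED", ["Experience > 20", "SeniorTrack", "PromotionEligible"]
    return "NOT_PROMOTED", ["NoQualifyingPath"]

def counterfactually_fair(person, attribute, alternative):
    """Flip one attribute and compare outcomes and proof paths."""
    outcome, path = decide(person)
    cf_outcome, cf_path = decide({**person, attribute: alternative})
    return outcome == cf_outcome, path, cf_path

fair, path, cf_path = counterfactually_fair({"age": 62, "experience": 30}, "age", 45)
print("FAIR" if fair else f"UNFAIR: {path} vs {cf_path}")
```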
5. Systematic Bias Detection Process
Step 1: Define Protected Attributes
ProtectedAttributes = [Age, Gender, Race, Disability, Religion]
Step 2: Identify Rules Referencing Protected Attributes
// Scan all rules
DirectReferences:
- Rule 23: Age > 60 → RetirementEligible
- Rule 47: Gender = F → MaternityEligible
ProxyReferences (via correlation analysis):
- Rule 12: ZIPCode in [...] (correlated with Race)
- Rule 31: PartTime = true (correlated with Gender)
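Direct references can be found with a naive scan over rule text; proxy references, as noted, need correlation analysis on historical data. The rule strings and matching heuristic below are illustrative only (a real scan would inspect rule structure, not strings):

```python
PROTECTED = ["Age", "Gender", "Race", "Disability", "Religion"]

rules = {  # illustrative rule text keyed by rule id
    23: 'RetirementEligible(X) :- Age(X) > 60.',
    47: 'MaternityEligible(X) :- Gender(X) = "F".',
    12: 'PriorityService(X) :- ZIPCode(X) in [10001, 10002, 10003].',
}

for rule_id, text in rules.items():
    hits = [a for a in PROTECTED if f"{a}(" in text]
    if hits:
        print(f"Rule {rule_id}: direct reference to {', '.join(hits)}")
```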
Step 3: Run Counterfactual Analysis
For each decision D:
    For each protected attribute A:
        D' = counterfactual(D, flip A)
        If outcome(D) ≠ outcome(D'):
            Flag as potentially unfair
            Record: (D, A, rule_path_diff)
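A runnable version of this loop, with a stubbed decision procedure and a single flipped attribute, might look like the following; the counts it produces are what Step 4 aggregates.

```python
from collections import Counter

def decide(person):
    # Stub for the real query engine: approve if under-60 with a high score.
    return "APPROVED" if person["age"] <= 60 and person["score"] >= 700 else "DENIED"

def flip(person, attribute):
    # Hypothetical counterfactual: move age across the 60 boundary.
    if attribute == "age":
        return {**person, "age": 45 if person["age"] > 60 else 65}
    return person

decisions = [{"age": 62, "score": 720}, {"age": 35, "score": 650}, {"age": 64, "score": 710}]
flagged = Counter()
for person in decisions:
    for attribute in ("age",):
        if decide(person) != decide(flip(person, attribute)):
            flagged[attribute] += 1
print(dict(flagged))  # counts per attribute feed the Step 4 report
```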
Step 4: Aggregate and Report
Bias Report Summary
─────────────────────────────────────
Total decisions analyzed: 10,000
Counterfactually unfair: 847 (8.5%)
By protected attribute:
Age: 412 cases (4.1%)
Gender: 203 cases (2.0%)
ZIPCode (proxy): 232 cases (2.3%)
Most impactful rules:
1. Rule 23 (Age > 60): 398 affected decisions
2. Rule 12 (ZIPCode): 232 affected decisions
3. Rule 31 (PartTime): 189 affected decisions
6. Intervention and Verification
The Fix Cycle:
- Identify biased rule
- Propose modification
- Re-run analysis on historical data
- Verify bias reduction without functionality loss
- Deploy and monitor
// Original biased rule
HighRisk(X) :- Age(X) > 60.
// Proposed fix
HighRisk(X) :-
    MedicalConditions(X) includes "high-risk",
    NOT ActiveLifestyle(X).
// Verification
Old rule: 100% of Age>60 flagged as HighRisk
New rule: 34% of Age>60 flagged (those with actual risk factors)
12% of Age<60 flagged (those with risk factors)
Disparity reduction: 66% → 22%
False positive reduction: 45%
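Re-running both rules over the same historical records is enough to produce this kind of verification table. In the sketch below the record fields and the disparity metric (difference in flag rates between age groups) are assumptions, not fixed by the system.

```python
def old_rule(p):
    return p["age"] > 60

def new_rule(p):
    return "high-risk" in p["medical_conditions"] and not p["active_lifestyle"]

def flag_rates(people, rule):
    """Flag rate for the over-60 and under-60 populations."""
    def rate(rows):
        return sum(rule(p) for p in rows) / len(rows) if rows else 0.0
    over = [p for p in people if p["age"] > 60]
    under = [p for p in people if p["age"] <= 60]
    return rate(over), rate(under)

history = [  # illustrative historical records
    {"age": 66, "medical_conditions": ["high-risk"], "active_lifestyle": False},
    {"age": 70, "medical_conditions": [], "active_lifestyle": True},
    {"age": 45, "medical_conditions": ["high-risk"], "active_lifestyle": False},
    {"age": 30, "medical_conditions": [], "active_lifestyle": True},
]

for name, rule in (("old rule", old_rule), ("new rule", new_rule)):
    over, under = flag_rates(history, rule)
    print(f"{name}: age>60 {over:.0%}, age<=60 {under:.0%}, gap {over - under:+.0%}")
```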
7. Continuous Monitoring
Bias can emerge over time as data distributions shift:
Monitoring Dashboard Metrics
- Decision disparity ratios by protected group
- Counterfactual fairness scores
- Rule impact distributions
- Proxy correlation tracking
- Threshold sensitivity alerts
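As one example of such a metric, the sketch below tracks the ratio of the lowest to the highest group eligibility rate per monitoring window and alerts when it drifts below 0.8 (an assumed tolerance echoing the common four-fifths rule of thumb).

```python
def disparity_ratio(rates):
    """Lowest group rate divided by highest group rate (1.0 = parity)."""
    return min(rates.values()) / max(rates.values())

weekly_rates = [  # illustrative per-window eligibility rates by group
    {"A": 0.85, "B": 0.70},
    {"A": 0.86, "B": 0.64},
    {"A": 0.87, "B": 0.55},
]

for week, rates in enumerate(weekly_rates, start=1):
    ratio = disparity_ratio(rates)
    print(f"week {week}: ratio {ratio:.2f} {'ALERT' if ratio < 0.8 else 'ok'}")
```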
8. Limitations and Considerations
Important Caveats:
- Theory completeness: Analysis is only as good as the encoded rules
- Proxy detection: Some proxies may not be obvious without domain expertise
- Fairness definitions: Different fairness criteria may conflict
- Business necessity: Some disparities may be legally justified
- Data quality: Biased input data leads to biased analysis
9. Research Directions
Open Questions:
- Automatic proxy detection in complex rule networks
- Multi-criteria fairness optimization
- Causal inference integration for better counterfactuals
- Federated bias analysis across organizations
- Real-time bias monitoring at scale
Related Documentation