Because AGISystem2 shows its work, we can see exactly how definitions affect conclusions. This enables systematic bias detection by analyzing which rules impact which groups, and how changes to definitions change outcomes.

1. Why Explainability Enables Bias Detection

In black-box AI systems, bias is detected through statistical analysis of outputs. With AGISystem2, we can go deeper:

Aspect          Black-Box Analysis      Explainability-Based Analysis
─────────────────────────────────────────────────────────────────────────
Detection       Output disparities      Rule impact analysis
Root Cause      Unknown                 Specific rules identified
Intervention    Retrain model           Modify specific rules
Verification    Statistical testing     Formal proof
Traceability    None                    Proof trace (audit logging/export is external)

2. Methodology: Definition Impact Analysis

The core insight: change a definition, observe the impact on conclusions.

The Process

  1. Baseline: Run queries with the current definitions
  2. Modification: Change a definition (add, remove, or modify a rule)
  3. Compare: Re-run the same queries and measure outcome changes
  4. Analyze: Which groups are most affected?
  5. Trace: Follow the proof paths to understand why

Example: Loan Eligibility Analysis

// Original Theory
Eligible_for_loan(X) :-
    CreditScore(X) >= 700,
    Employment(X) = "full-time",
    Income(X) >= 50000.

// Baseline Results
Group A (urban professionals): 85% eligible
Group B (rural workers):       42% eligible

// Modified Definition - add alternative path
Eligible_for_loan(X) :-
    CreditScore(X) >= 700,
    Employment(X) = "full-time",
    Income(X) >= 50000.

Eligible_for_loan(X) :-            // NEW: alternative criteria
    CreditScore(X) >= 650,
    StableIncome(X) = true,        // 3+ years same employer
    DebtRatio(X) < 0.3.

// Results After Modification
Group A: 87% eligible (+2%)
Group B: 68% eligible (+26%)       // Significant improvement
Analysis: The original definition implicitly favored Group A because "full-time employment" and "income >= 50000" correlate with urban professional jobs. Adding an alternative path based on income stability rather than amount reduced this disparity.
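
In code, the baseline/modify/compare loop might look like the sketch below. This is a minimal illustration, assuming a hypothetical AGISystem2-style client with load_theory and query methods and a people list carrying group labels; the actual API names will differ.

from collections import defaultdict

def eligibility_rate_by_group(system, people):
    """Fraction of each group for which Eligible_for_loan(X) is provable."""
    eligible = defaultdict(int)
    total = defaultdict(int)
    for person in people:
        total[person["group"]] += 1
        # Hypothetical query call: returns True if the goal is provable.
        if system.query(f'Eligible_for_loan({person["id"]})'):
            eligible[person["group"]] += 1
    return {g: eligible[g] / total[g] for g in total}

def definition_impact(system, baseline_theory, modified_theory, people):
    """Compare per-group outcomes before and after a definition change."""
    system.load_theory(baseline_theory)      # hypothetical call
    before = eligibility_rate_by_group(system, people)
    system.load_theory(modified_theory)
    after = eligibility_rate_by_group(system, people)
    return {g: after[g] - before[g] for g in before}

The per-group deltas returned by definition_impact correspond to the +2% / +26% figures above.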

3. Types of Bias Analysis

3.1 Direct Attribute Bias

Rules that explicitly reference protected attributes.

// PROBLEMATIC: Direct reference to protected attribute
HighRisk(X) :- Age(X) > 60.

// BETTER: Use relevant factors instead
HighRisk(X) :- HealthConditions(X) includes "chronic".

3.2 Proxy Attribute Bias

Rules that use proxies correlated with protected attributes.

// PROBLEMATIC: ZIP code is proxy for race/income
PriorityService(X) :- ZIPCode(X) in [10001, 10002, 10003].

// ANALYSIS: Check correlation
ZIPCode 10001-10003 → 92% Group A
ZIPCode others      → 34% Group A
→ Proxy discrimination detected
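
The correlation check itself can be sketched directly: compare the group composition inside the rule's ZIP list with the overall population. The field names (zip, group) are illustrative assumptions.

def proxy_check(people, zip_list, group="Group A"):
    """Return (share of `group` inside the rule's ZIP list, share overall)."""
    in_rule = [p for p in people if p["zip"] in zip_list]
    share_in_rule = sum(p["group"] == group for p in in_rule) / len(in_rule)
    share_overall = sum(p["group"] == group for p in people) / len(people)
    # A large gap (e.g. 92% vs 34%) suggests the ZIP list acts as a group proxy.
    return share_in_rule, share_overall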

3.3 Compound Rule Bias

Individual rules appear neutral, but their combination creates bias.

// Each rule seems neutral:
Qualified(X) :- Degree(X) = "bachelor".
Qualified(X) :- Experience(X) >= 5.
Selected(X)  :- Qualified(X), RecommendedBy(X, Y), Senior(Y).

// But analysis shows:
Senior employees: 78% Group A
→ Recommendation requirement creates pipeline bias
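
To locate where the disparity enters a compound derivation, one option is to measure the group composition at each stage of the chain (Qualified → Recommended → Selected). A minimal sketch, assuming predicate memberships are available as sets of person ids:

def pipeline_composition(stages, people_by_id, group="Group A"):
    """stages: dict of stage name -> set of person ids satisfying that predicate."""
    report = {}
    for name, ids in stages.items():
        members = [people_by_id[i] for i in ids]
        report[name] = sum(p["group"] == group for p in members) / len(members)
    # e.g. {"Qualified": 0.55, "RecommendedBySenior": 0.78, "Selected": 0.80}
    return report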

3.4 Threshold Bias

Numeric thresholds that disproportionately affect groups.

// Threshold analysis
Rule: Eligible(X) :- Score(X) >= 700

Group A mean score: 720, std: 50 → 66% eligible
Group B mean score: 680, std: 60 → 37% eligible

// Sensitivity analysis
At threshold 680: Group A 75%, Group B 50%
At threshold 700: Group A 66%, Group B 37%
At threshold 720: Group A 50%, Group B 25%
→ Threshold choice significantly impacts disparity
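
The sensitivity table above can be reproduced with a small sweep over candidate thresholds, assuming roughly normal score distributions with the quoted means and standard deviations:

from statistics import NormalDist

def threshold_sweep(groups, thresholds):
    """groups: {name: (mean, std)}; returns {threshold: {group: eligible_rate}}."""
    return {
        t: {name: 1.0 - NormalDist(mu, sigma).cdf(t)
            for name, (mu, sigma) in groups.items()}
        for t in thresholds
    }

rates = threshold_sweep({"Group A": (720, 50), "Group B": (680, 60)},
                        [680, 700, 720])
# rates[700] ≈ {"Group A": 0.66, "Group B": 0.37}, matching the figures above.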

4. Counterfactual Fairness Analysis

For each decision, ask: "Would the outcome change if only the protected attribute were different?"

// Original case
Person: Alice, Age: 35, Gender: F, Experience: 10yr, Degree: PhD
Decision: HIRED
Proof path: Qualified via degree → Interviewed → Selected

// Counterfactual: Change only gender
Person: Alice', Age: 35, Gender: M, Experience: 10yr, Degree: PhD
Decision: HIRED
Proof path: Qualified via degree → Interviewed → Selected

Result: COUNTERFACTUALLY FAIR
(Same outcome despite protected attribute change)

// Different case
Person: Bob, Age: 62, Experience: 30yr, Health: good
Decision: NOT_PROMOTED
Proof path: Age > 60 → RetirementTrack → NotPromotionEligible

// Counterfactual
Person: Bob', Age: 45, Experience: 30yr, Health: good
Decision: PROMOTED
Proof path: Experience > 20 → SeniorTrack → PromotionEligible

Result: COUNTERFACTUALLY UNFAIR
(Different outcome when only age changed)
Problematic rule: Age > 60 → RetirementTrack

5. Systematic Bias Detection Process

Step 1: Define Protected Attributes

ProtectedAttributes = [Age, Gender, Race, Disability, Religion]

Step 2: Identify Rules Referencing Protected Attributes

// Scan all rules
DirectReferences:
  - Rule 23: Age > 60 → RetirementEligible
  - Rule 47: Gender = F → MaternityEligible

ProxyReferences (via correlation analysis):
  - Rule 12: ZIPCode in [...] (correlated with Race)
  - Rule 31: PartTime = true (correlated with Gender)
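
A minimal sketch of the direct-reference scan, assuming rules are available as plain strings keyed by id; a real implementation would walk the parsed rule AST, and proxy detection would add the correlation check from Section 3.2:

PROTECTED = ["Age", "Gender", "Race", "Disability", "Religion"]

def direct_references(rules):
    """rules: {rule_id: rule_text}; returns {rule_id: [protected attributes mentioned]}."""
    hits = {}
    for rule_id, text in rules.items():
        mentioned = [attr for attr in PROTECTED if f"{attr}(" in text]
        if mentioned:
            hits[rule_id] = mentioned
    return hits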

Step 3: Run Counterfactual Analysis

For each decision D:
    For each protected attribute A:
        D' = counterfactual(D, flip A)
        If outcome(D) ≠ outcome(D'):
            Flag as potentially unfair
            Record: (D, A, rule_path_diff)
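
A Python rendering of the same loop. The decide and flip helpers are assumptions: decide is taken to return both an outcome and the proof path (the rule ids used), and flip produces the counterfactual input with one protected attribute changed.

def counterfactual_flags(decisions, protected_attrs, decide, flip):
    """Return (case id, attribute, differing rule ids) for each unfair flip."""
    flags = []
    for case in decisions:
        outcome, path = decide(case)
        for attr in protected_attrs:
            cf_case = flip(case, attr)              # counterfactual input
            cf_outcome, cf_path = decide(cf_case)
            if cf_outcome != outcome:
                # Record which rules differed between the two proof paths.
                flags.append((case["id"], attr, set(path) ^ set(cf_path)))
    return flags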

Step 4: Aggregate and Report

Bias Report Summary
─────────────────────────────────────
Total decisions analyzed: 10,000
Counterfactually unfair:  847 (8.5%)

By protected attribute:
  Age:             412 cases (4.1%)
  Gender:          203 cases (2.0%)
  ZIPCode (proxy): 232 cases (2.3%)

Most impactful rules:
  1. Rule 23 (Age > 60): 398 affected decisions
  2. Rule 12 (ZIPCode):  232 affected decisions
  3. Rule 31 (PartTime): 189 affected decisions
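
Aggregating the flags produced by the counterfactual loop into the counts shown above is straightforward; a sketch:

from collections import Counter

def bias_report(flags, total_decisions):
    """Summarize counterfactual flags by attribute and by rule."""
    by_attribute = Counter(attr for _, attr, _ in flags)
    by_rule = Counter(rule for _, _, rule_diff in flags for rule in rule_diff)
    unfair = len({case_id for case_id, _, _ in flags})
    return {
        "total": total_decisions,
        "counterfactually_unfair": unfair,
        "by_attribute": dict(by_attribute),
        "most_impactful_rules": by_rule.most_common(3),
    }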

6. Intervention and Verification

The Fix Cycle:
  1. Identify biased rule
  2. Propose modification
  3. Re-run analysis on historical data
  4. Verify bias reduction without functionality loss
  5. Deploy and monitor

// Original biased rule
HighRisk(X) :- Age(X) > 60.

// Proposed fix
HighRisk(X) :-
    MedicalConditions(X) includes "high-risk",
    NOT ActiveLifestyle(X).

// Verification
Old rule: 100% of Age>60 flagged as HighRisk
New rule:  34% of Age>60 flagged (those with actual risk factors)
           12% of Age<60 flagged (those with risk factors)

Disparity reduction: 66% → 22%
False positive reduction: 45%
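
The verification step can be expressed as a before/after comparison over historical cases. A minimal sketch, where old_rule_flags, new_rule_flags, and the group predicate are assumptions standing in for re-running the two theories:

def disparity(cases, flagged, is_affected_group):
    """Difference in flag rate between the affected group and everyone else."""
    grp = [c for c in cases if is_affected_group(c)]
    rest = [c for c in cases if not is_affected_group(c)]
    def flag_rate(cs):
        return sum(1 for c in cs if flagged(c)) / len(cs)
    return flag_rate(grp) - flag_rate(rest)

# old_gap = disparity(history, old_rule_flags, lambda c: c["age"] > 60)
# new_gap = disparity(history, new_rule_flags, lambda c: c["age"] > 60)
# Deploy only if new_gap is substantially lower and overall accuracy holds.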

7. Continuous Monitoring

Bias can emerge over time as data distributions shift:

Monitoring Dashboard Metrics
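
One metric worth tracking is the counterfactually unfair rate per batch of new decisions. A minimal sketch, reusing counterfactual_flags from Step 3 with an illustrative alert threshold:

def monitor_batch(batch, protected_attrs, decide, flip, alert_threshold=0.05):
    """Re-run counterfactual analysis on a batch and alert on drift."""
    flags = counterfactual_flags(batch, protected_attrs, decide, flip)
    unfair_rate = len({case_id for case_id, _, _ in flags}) / len(batch)
    if unfair_rate > alert_threshold:
        print(f"ALERT: counterfactually unfair rate {unfair_rate:.1%} exceeds threshold")
    return unfair_rate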

8. Limitations and Considerations

Important Caveats:

9. Research Directions

Open Questions:

Related Documentation