Because AGISystem2 shows its work, we can see exactly how definitions affect conclusions. This enables systematic bias detection by analyzing which rules impact which groups, and how changes to definitions change outcomes.
1. Why Explainability Enables Bias Detection
In black-box AI systems, bias is detected through statistical analysis of outputs. With AGISystem2, we can go deeper:
| Approach | Black-Box Analysis | Explainability-Based Analysis |
| --- | --- | --- |
| Detection | Output disparities | Rule impact analysis |
| Root Cause | Unknown | Specific rules identified |
| Intervention | Retrain model | Modify specific rules |
| Verification | Statistical testing | Formal proof |
| Traceability | None | Proof trace (audit logging/export is external) |
2. Methodology: Definition Impact Analysis
The core insight: change a definition, observe the impact on conclusions.
The Process
- Baseline: Run queries with current definitions
- Modification: Change a definition (add/remove/modify rule)
- Compare: Run same queries, measure outcome changes
- Analyze: Which groups are most affected?
- Trace: Follow proof paths to understand why
Example: Loan Eligibility Analysis
// Original Theory
Eligible_for_loan(X) :-
    CreditScore(X) >= 700,
    Employment(X) = "full-time",
    Income(X) >= 50000.
// Baseline Results
Group A (urban professionals): 85% eligible
Group B (rural workers): 42% eligible
// Modified Definition - add alternative path
Eligible_for_loan(X) :-
    CreditScore(X) >= 700,
    Employment(X) = "full-time",
    Income(X) >= 50000.
Eligible_for_loan(X) :-      // NEW: alternative criteria
    CreditScore(X) >= 650,
    StableIncome(X) = true,  // 3+ years same employer
    DebtRatio(X) < 0.3.
// Results After Modification
Group A: 87% eligible (+2%)
Group B: 68% eligible (+26%)  // Significant improvement
Analysis: The original definition implicitly favored Group A because "full-time employment" and "income >= 50000" correlate with urban professional jobs. Adding an alternative path based on income stability rather than income level reduced this disparity.
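To make the compare step concrete, here is a minimal Python sketch of the baseline-versus-modified comparison. The applicant fields, group labels, and the two `eligible_*` predicates are illustrative stand-ins for the real theory, not AGISystem2 API calls.

```python
# Hypothetical stand-ins for the two theory versions; not AGISystem2 calls.
def eligible_original(p):
    return (p["credit_score"] >= 700
            and p["employment"] == "full-time"
            and p["income"] >= 50000)

def eligible_modified(p):
    # Original path OR the new stability-based path.
    alternative = (p["credit_score"] >= 650
                   and p["stable_income"]
                   and p["debt_ratio"] < 0.3)
    return eligible_original(p) or alternative

def rate_by_group(people, rule):
    """Share of each group that satisfies the rule."""
    rates = {}
    for group in sorted({p["group"] for p in people}):
        members = [p for p in people if p["group"] == group]
        rates[group] = sum(rule(p) for p in members) / len(members)
    return rates

applicants = [  # illustrative records
    {"group": "A", "credit_score": 720, "employment": "full-time",
     "income": 65000, "stable_income": False, "debt_ratio": 0.4},
    {"group": "B", "credit_score": 660, "employment": "seasonal",
     "income": 38000, "stable_income": True, "debt_ratio": 0.2},
]

baseline = rate_by_group(applicants, eligible_original)
modified = rate_by_group(applicants, eligible_modified)
for g in baseline:
    print(f"Group {g}: {baseline[g]:.0%} -> {modified[g]:.0%}")
```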
3. Types of Bias Analysis
3.1 Direct Attribute Bias
Rules that explicitly reference protected attributes.
// PROBLEMATIC: Direct reference to protected attribute
HighRisk(X) :- Age(X) > 60.
// BETTER: Use relevant factors instead
HighRisk(X) :- HealthConditions(X) includes "chronic".
3.2 Proxy Attribute Bias
Rules that rely on seemingly neutral attributes that act as proxies for protected attributes.
// PROBLEMATIC: ZIP code is proxy for race/income
PriorityService(X) :- ZIPCode(X) in [10001, 10002, 10003].
// ANALYSIS: Check correlation
ZIPCode 10001-10003 → 92% Group A
ZIPCode others → 34% Group A
→ Proxy discrimination detected
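A simple way to surface this kind of proxy is to compare group composition inside and outside the set of values a rule singles out. The sketch below assumes flat Python records and an arbitrary flagging threshold; a real analysis would use proper correlation or association tests.

```python
# Illustrative records; "group" is whatever protected label is available for the check.
records = [
    {"zip": 10001, "group": "A"}, {"zip": 10002, "group": "A"},
    {"zip": 10003, "group": "B"}, {"zip": 20500, "group": "B"},
    {"zip": 30301, "group": "B"}, {"zip": 30302, "group": "A"},
]

def group_share(rows, group="A"):
    return sum(r["group"] == group for r in rows) / len(rows) if rows else 0.0

def proxy_gap(rows, zip_set):
    """Group-A share inside vs. outside the ZIP codes the rule singles out."""
    inside = [r for r in rows if r["zip"] in zip_set]
    outside = [r for r in rows if r["zip"] not in zip_set]
    return group_share(inside), group_share(outside)

inside_share, outside_share = proxy_gap(records, {10001, 10002, 10003})
if abs(inside_share - outside_share) > 0.3:  # flagging threshold is arbitrary
    print(f"Possible proxy: {inside_share:.0%} vs {outside_share:.0%} Group A")
```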
3.3 Compound Rule Bias
Each rule looks neutral in isolation, but their combination creates bias.
// Each rule seems neutral:
Qualified(X) :- Degree(X) = "bachelor".
Qualified(X) :- Experience(X) >= 5.
Selected(X) :- Qualified(X), RecommendedBy(X, Y), Senior(Y).
// But analysis shows:
Senior employees: 78% Group A
→ Recommendation requirement creates pipeline bias
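Compound bias like this can be located by measuring group composition after each stage of the rule chain rather than only at the final decision. In the sketch below the stage predicates and candidate records are illustrative; the point is that the disparity appears only after the recommendation stage.

```python
def group_share(pool, group="A"):
    return sum(p["group"] == group for p in pool) / len(pool) if pool else 0.0

def stage_report(candidates, stages):
    """Apply each stage filter in turn and report group composition."""
    pool = candidates
    for name, keep in stages:
        pool = [p for p in pool if keep(p)]
        print(f"{name:<12} remaining={len(pool)}  Group A share={group_share(pool):.0%}")

stages = [  # illustrative stand-ins for the rules above
    ("Qualified", lambda p: p["degree"] == "bachelor" or p["experience"] >= 5),
    ("Recommended", lambda p: p["recommended_by_senior"]),
]
candidates = [
    {"group": "A", "degree": "bachelor", "experience": 3, "recommended_by_senior": True},
    {"group": "A", "degree": "none", "experience": 6, "recommended_by_senior": True},
    {"group": "B", "degree": "bachelor", "experience": 2, "recommended_by_senior": False},
    {"group": "B", "degree": "none", "experience": 7, "recommended_by_senior": False},
]
stage_report(candidates, stages)
# Group A share is 50% after "Qualified" but 100% after "Recommended":
# the disparity enters at the recommendation stage.
```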
3.4 Threshold Bias
Numeric thresholds that disproportionately affect groups.
// Threshold analysis
Rule: Eligible(X) :- Score(X) >= 700
Group A mean score: 720, std: 50 → 66% eligible
Group B mean score: 680, std: 60 → 37% eligible
// Sensitivity analysis
At threshold 680: Group A 75%, Group B 50%
At threshold 720: Group A 50%, Group B 25%
At threshold 700: Group A 66%, Group B 37%
→ Threshold choice significantly impacts disparity
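If the group scores are assumed to be roughly normally distributed with the means and standard deviations quoted above, the sensitivity table can be reproduced with the standard library alone (the normality assumption is ours, not a property of the system):

```python
from statistics import NormalDist

# Assumed (not system-provided) score distributions per group.
score_dist = {"A": NormalDist(720, 50), "B": NormalDist(680, 60)}

def eligible_rate(dist, threshold):
    """P(score >= threshold) under the assumed normal distribution."""
    return 1 - dist.cdf(threshold)

for threshold in (680, 700, 720):
    rates = {g: eligible_rate(d, threshold) for g, d in score_dist.items()}
    gap = rates["A"] - rates["B"]
    print(f"threshold {threshold}: A {rates['A']:.0%}, B {rates['B']:.0%}, gap {gap:.0%}")
```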
4. Counterfactual Fairness Analysis
For each decision, ask: "Would the outcome change if only the protected attribute were different?"
// Original case
Person: Alice, Age: 35, Gender: F, Experience: 10yr, Degree: PhD
Decision: HIRED
Proof path: Qualified via degree → Interviewed → Selected
// Counterfactual: Change only gender
Person: Alice', Age: 35, Gender: M, Experience: 10yr, Degree: PhD
Decision: HIRED
Proof path: Qualified via degree → Interviewed → Selected
Result: COUNTERFACTUALLY FAIR
(Same outcome despite protected attribute change)
// Different case
Person: Bob, Age: 62, Experience: 30yr, Health: good
Decision: NOT_PROMOTED
Proof path: Age > 60 → RetirementTrack → NotPromotionEligible
// Counterfactual
Person: Bob', Age: 45, Experience: 30yr, Health: good
Decision: PROMOTED
Proof path: Experience > 20 → SeniorTrack → PromotionEligible
Result: COUNTERFACTUALLY UNFAIR
(Different outcome when only age changed)
Problematic rule: Age > 60 → RetirementTrack
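A single-case counterfactual check needs only the decision procedure and an attribute flip. In the sketch below, `decide` is a hypothetical stand-in for querying the theory; in AGISystem2 it would return the actual proof path rather than a hand-written list.

```python
def decide(person):
    # Hypothetical stand-in for the query engine; mirrors the promotion rules
    # above and returns (outcome, proof path).
    if person["age"] > 60:
        return "NOT_PROMOTED", ["Age > 60", "RetirementTrack", "NotPromotionEligible"]
    if person["experience"] > 20:
        return "PROMOTED", ["Experience > 20", "SeniorTrack", "PromotionEligible"]
    return "NOT_PROMOTED", ["NoQualifyingPath"]

def counterfactually_fair(person, attribute, alternative):
    """Flip one attribute and compare outcomes and proof paths."""
    outcome, path = decide(person)
    cf_outcome, cf_path = decide({**person, attribute: alternative})
    return outcome == cf_outcome, path, cf_path

fair, path, cf_path = counterfactually_fair({"age": 62, "experience": 30}, "age", 45)
print("FAIR" if fair else f"UNFAIR: {path} vs {cf_path}")
```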
5. Systematic Bias Detection Process
Step 1: Define Protected Attributes
ProtectedAttributes = [Age, Gender, Race, Disability, Religion]
Step 2: Identify Rules Referencing Protected Attributes
// Scan all rules
DirectReferences:
- Rule 23: Age > 60 → RetirementEligible
- Rule 47: Gender = F → MaternityEligible
ProxyReferences (via correlation analysis):
- Rule 12: ZIPCode in [...] (correlated with Race)
- Rule 31: PartTime = true (correlated with Gender)
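Direct references can be found with a naive scan over rule text; proxy references, as noted, need correlation analysis on historical data. The rule strings and matching heuristic below are illustrative only (a real scan would inspect rule structure, not strings):

```python
PROTECTED = ["Age", "Gender", "Race", "Disability", "Religion"]

rules = {  # illustrative rule text keyed by rule id
    23: 'RetirementEligible(X) :- Age(X) > 60.',
    47: 'MaternityEligible(X) :- Gender(X) = "F".',
    12: 'PriorityService(X) :- ZIPCode(X) in [10001, 10002, 10003].',
}

for rule_id, text in rules.items():
    hits = [a for a in PROTECTED if f"{a}(" in text]
    if hits:
        print(f"Rule {rule_id}: direct reference to {', '.join(hits)}")
```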
Step 3: Run Counterfactual Analysis
For each decision D:
    For each protected attribute A:
        D' = counterfactual(D, flip A)
        If outcome(D) ≠ outcome(D'):
            Flag as potentially unfair
            Record: (D, A, rule_path_diff)
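A runnable version of this loop, with a stubbed decision procedure and a single flipped attribute, might look like the following; the counts it produces are what Step 4 aggregates.

```python
from collections import Counter

def decide(person):
    # Stub for the real query engine: approve if under-60 with a high score.
    return "APPROVED" if person["age"] <= 60 and person["score"] >= 700 else "DENIED"

def flip(person, attribute):
    # Hypothetical counterfactual: move age across the 60 boundary.
    if attribute == "age":
        return {**person, "age": 45 if person["age"] > 60 else 65}
    return person

decisions = [{"age": 62, "score": 720}, {"age": 35, "score": 650}, {"age": 64, "score": 710}]
flagged = Counter()
for person in decisions:
    for attribute in ("age",):
        if decide(person) != decide(flip(person, attribute)):
            flagged[attribute] += 1
print(dict(flagged))  # counts per attribute feed the Step 4 report
```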
Step 4: Aggregate and Report
Bias Report Summary
─────────────────────────────────────
Total decisions analyzed: 10,000
Counterfactually unfair: 847 (8.5%)
By protected attribute:
Age: 412 cases (4.1%)
Gender: 203 cases (2.0%)
ZIPCode (proxy): 232 cases (2.3%)
Most impactful rules:
1. Rule 23 (Age > 60): 398 affected decisions
2. Rule 12 (ZIPCode): 232 affected decisions
3. Rule 31 (PartTime): 189 affected decisions
6. Intervention and Verification
The Fix Cycle:
- Identify biased rule
- Propose modification
- Re-run analysis on historical data
- Verify bias reduction without functionality loss
- Deploy and monitor
// Original biased rule
HighRisk(X) :- Age(X) > 60.
// Proposed fix
HighRisk(X) :-
    MedicalConditions(X) includes "high-risk",
    NOT ActiveLifestyle(X).
// Verification
Old rule: 100% of Age>60 flagged as HighRisk
New rule: 34% of Age>60 flagged (those with actual risk factors)
12% of Age<60 flagged (those with risk factors)
Disparity reduction: 66% → 22%
False positive reduction: 45%
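Re-running both rules over the same historical records is enough to produce this kind of verification table. In the sketch below the record fields and the disparity metric (difference in flag rates between age groups) are assumptions, not fixed by the system.

```python
def old_rule(p):
    return p["age"] > 60

def new_rule(p):
    return "high-risk" in p["medical_conditions"] and not p["active_lifestyle"]

def flag_rates(people, rule):
    """Flag rate for the over-60 and under-60 populations."""
    def rate(rows):
        return sum(rule(p) for p in rows) / len(rows) if rows else 0.0
    over = [p for p in people if p["age"] > 60]
    under = [p for p in people if p["age"] <= 60]
    return rate(over), rate(under)

history = [  # illustrative historical records
    {"age": 66, "medical_conditions": ["high-risk"], "active_lifestyle": False},
    {"age": 70, "medical_conditions": [], "active_lifestyle": True},
    {"age": 45, "medical_conditions": ["high-risk"], "active_lifestyle": False},
    {"age": 30, "medical_conditions": [], "active_lifestyle": True},
]

for name, rule in (("old rule", old_rule), ("new rule", new_rule)):
    over, under = flag_rates(history, rule)
    print(f"{name}: age>60 {over:.0%}, age<=60 {under:.0%}, gap {over - under:+.0%}")
```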
7. Continuous Monitoring
Bias can emerge over time as data distributions shift:
Monitoring Dashboard Metrics
- Decision disparity ratios by protected group
- Counterfactual fairness scores
- Rule impact distributions
- Proxy correlation tracking
- Threshold sensitivity alerts
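As one example of such a metric, the sketch below tracks the ratio of the lowest to the highest group eligibility rate per monitoring window and alerts when it drifts below 0.8 (an assumed tolerance echoing the common four-fifths rule of thumb).

```python
def disparity_ratio(rates):
    """Lowest group rate divided by highest group rate (1.0 = parity)."""
    return min(rates.values()) / max(rates.values())

weekly_rates = [  # illustrative per-window eligibility rates by group
    {"A": 0.85, "B": 0.70},
    {"A": 0.86, "B": 0.64},
    {"A": 0.87, "B": 0.55},
]

for week, rates in enumerate(weekly_rates, start=1):
    ratio = disparity_ratio(rates)
    print(f"week {week}: ratio {ratio:.2f} {'ALERT' if ratio < 0.8 else 'ok'}")
```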
8. Limitations and Considerations
Important Caveats:
- Theory completeness: Analysis is only as good as the encoded rules
- Proxy detection: Some proxies may not be obvious without domain expertise
- Fairness definitions: Different fairness criteria may conflict
- Business necessity: Some disparities may be legally justified
- Data quality: Biased input data leads to biased analysis
9. Research Directions
Open Questions:
- Automatic proxy detection in complex rule networks
- Multi-criteria fairness optimization
- Causal inference integration for better counterfactuals
- Federated bias analysis across organizations
- Real-time bias monitoring at scale
Related Documentation