Research Overview

Goal: Leverage LLMs (Claude, GPT, etc.) to handle ambiguous, idiomatic, or context-dependent natural language before formal grammar parsing.

Principle: LLMs handle language understanding; AGISystem2 handles formal reasoning. Each does what it's best at.

1. Integration Patterns

Pattern A: Preprocessing Pipeline

User Input (ambiguous)
        │
        ▼
┌───────────────────┐
│ LLM Preprocessor  │  ← Resolve ambiguity, expand context
└───────────────────┘
        │
        ▼
┌───────────────────┐
│ Grammar Parser    │  ← Parse normalized text
└───────────────────┘
        │
        ▼
┌───────────────────┐
│ AGISystem2        │  ← Formal reasoning
└───────────────────┘
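
Pattern A can be sketched as a chain of three stages. This is a minimal sketch under stated assumptions, not the AGISystem2 API: `normalizeWithLLM`, `parseGrammar`, and `reason` are hypothetical functions supplied by the caller, so any LLM client or parser could be plugged in.

```javascript
// Sketch of Pattern A. All three stage functions are hypothetical and are
// injected by the caller; none is a real AGISystem2 or LLM-client API.
async function runPreprocessingPipeline(userInput, stages) {
  const { normalizeWithLLM, parseGrammar, reason } = stages;

  // Stage 1: the LLM resolves ambiguity and expands context.
  const normalized = await normalizeWithLLM(userInput);

  // Stage 2: the grammar parser consumes the normalized text.
  const statements = await parseGrammar(normalized);

  // Stage 3: formal reasoning runs on the parsed statements only.
  const result = await reason(statements);

  // Return the intermediate artifacts so each stage can be inspected.
  return { normalized, statements, result };
}
```

Returning the intermediate values alongside the result makes it easier to see whether a wrong answer came from the LLM rewrite or from the parser.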

Pattern B: Fallback Translation

User Input
        │
        ▼
┌───────────────────┐
│ Grammar Parser    │  ← Try grammar first
└───────────────────┘
        │
    ┌───┴───┐
    │Failed │
    └───┬───┘
        ▼
┌───────────────────┐
│ LLM Translation   │  ← LLM generates DSL directly
└───────────────────┘
        │
        ▼
┌───────────────────┐
│ DSL Validator     │  ← Validate LLM output
└───────────────────┘
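
The fallback flow in Pattern B might look like the sketch below. It assumes a hypothetical `parseGrammar` that returns `null` on failure, a hypothetical `translateWithLLM`, and a `validate` callback that returns an object of the form `{ valid, reason }`, the same shape as `validateLLMOutput` in section 4.

```javascript
// Sketch of Pattern B. parseGrammar, translateWithLLM, and validate are
// hypothetical callbacks supplied by the caller.
async function translateWithFallback(userInput, deps) {
  const { parseGrammar, translateWithLLM, validate } = deps;

  // Try the deterministic grammar first; assume it returns null on failure.
  const parsed = await parseGrammar(userInput);
  if (parsed !== null) {
    return { ok: true, source: 'grammar', dsl: parsed };
  }

  // Grammar failed: ask the LLM to generate DSL directly.
  const candidate = await translateWithLLM(userInput);

  // Never pass unvalidated LLM output on to reasoning.
  const verdict = await validate(candidate, userInput);
  if (!verdict.valid) {
    return { ok: false, source: 'llm', reason: verdict.reason };
  }
  return { ok: true, source: 'llm', dsl: candidate };
}
```

The `source` field records which path produced the DSL, which can help when auditing how often the fallback is used.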

2. LLM Capabilities We Leverage

Capability              | Use Case                         | Example
------------------------+----------------------------------+--------------------------------------------------
Coreference Resolution  | Resolve pronouns and references  | "John saw Mary. He waved." → "John saw Mary. John waved."
Idiom Expansion         | Convert idioms to literal meaning | "It's raining cats and dogs" → "It's raining heavily"
Domain Terminology      | Expand jargon and abbreviations  | "The patient has HTN" → "The patient has hypertension"
Implicit Relations      | Make implicit knowledge explicit | "Paris is a capital" → "Paris is the capital of France"
Sentence Simplification | Break complex sentences          | "The tall man who wore a hat left" → "A man wore a hat. The man was tall. The man left."

3. Prompt Engineering

Preprocessing Prompt Template:
You are a natural language normalizer for a formal reasoning system.

Given this input text:
"${userInput}"

Transform it following these rules:
1. Resolve all pronouns to their referents
2. Expand abbreviations and acronyms
3. Convert idioms to literal meaning
4. Make implicit relations explicit
5. Split complex sentences into simple ones
6. Preserve all semantic content

Output the normalized text only, no explanations.

Direct DSL Translation Prompt:
Translate to AGISystem2 DSL:

DSL Syntax:
- isA Subject Type (taxonomy)
- has Subject Property (attributes)
- Implies $antecedent $consequent (rules)
- And $a $b, Or $a $b, Not (x) (logic)
- Variables: ?x, ?y (in rules)

Input: "${userInput}"

Output valid DSL only, one statement per line.
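
Filling the `${userInput}` slot could be done as in the sketch below, shown for the preprocessing template. The function name `buildPreprocessingPrompt` and the escaping policy are assumptions made for illustration. Escaping double quotes only keeps the quoted slot well-formed; it is not a defense against prompt injection.

```javascript
// Preprocessing template copied from the text above. The string is
// single-quoted on purpose so "${userInput}" stays a literal placeholder.
const PREPROCESSING_TEMPLATE = [
  'You are a natural language normalizer for a formal reasoning system.',
  '',
  'Given this input text:',
  '"${userInput}"',
  '',
  'Transform it following these rules:',
  '1. Resolve all pronouns to their referents',
  '2. Expand abbreviations and acronyms',
  '3. Convert idioms to literal meaning',
  '4. Make implicit relations explicit',
  '5. Split complex sentences into simple ones',
  '6. Preserve all semantic content',
  '',
  'Output the normalized text only, no explanations.',
].join('\n');

// Hypothetical helper: substitute the user's text into the template.
function buildPreprocessingPrompt(userInput) {
  // Escape backslashes and double quotes so the input cannot close the
  // quoted slot early.
  const escaped = userInput.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
  // A replacer function inserts the text literally, so "$&" and similar
  // sequences in the input are not treated as replacement patterns.
  return PREPROCESSING_TEMPLATE.replace('${userInput}', () => escaped);
}
```

The same approach would apply to the DSL translation prompt with a second template constant.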

4. Validation & Trust Boundaries

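Critical Principle: LLM output MUST be validated before use in reasoning.

// Validation pipeline.
// The helpers parseDSL, extractEntities, extractDSLEntities, and
// findUnknownOperators are not defined here and are assumed to exist elsewhere.
async function validateLLMOutput(dsl, originalInput) {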
  // 1. Syntax check
  const parseResult = parseDSL(dsl);
  if (parseResult.errors.length > 0) {
    return { valid: false, reason: 'syntax_error' };
  }

  // 2. Extract entities from input
  const inputEntities = extractEntities(originalInput);

  // 3. Check all entities present in DSL
  const dslEntities = extractDSLEntities(dsl);
  const missing = inputEntities.filter(e => !dslEntities.includes(e));
  if (missing.length > 0) {
    return { valid: false, reason: 'missing_entities', missing };
  }

  // 4. Check for unknown operators
  const unknownOps = findUnknownOperators(dsl);
  if (unknownOps.length > 0) {
    return { valid: false, reason: 'unknown_operators', unknownOps };
  }

  return { valid: true };
}
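
One of the helpers used above, `findUnknownOperators`, could be sketched as follows. The operator list comes from the DSL syntax in the translation prompt; the parsing rule (one statement per line, operator as the leading identifier) is an assumption based on that prompt, not a description of the real DSL parser.

```javascript
// Operators listed in the DSL translation prompt.
const KNOWN_OPERATORS = new Set(['isA', 'has', 'Implies', 'And', 'Or', 'Not']);

// Assumption: one statement per line, each starting with its operator.
function findUnknownOperators(dsl) {
  const unknown = new Set();
  for (const line of dsl.split('\n')) {
    const trimmed = line.trim();
    if (trimmed === '') continue; // skip blank lines

    // Take the leading identifier; a line without one is reported verbatim.
    const match = trimmed.match(/^[A-Za-z_][A-Za-z0-9_]*/);
    const op = match ? match[0] : trimmed;

    // Matching is case-sensitive, so "isa" is not accepted as "isA".
    if (!KNOWN_OPERATORS.has(op)) unknown.add(op);
  }
  // Each unknown operator is reported once, in order of first appearance.
  return [...unknown];
}
```

Using an allow-list means any operator the LLM invents is rejected by default rather than silently passed to the reasoner.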

5. Hybrid Strategy

The optimal approach combines grammar-based and LLM-assisted translation:

Scenario                         | Method                      | Rationale
---------------------------------+-----------------------------+----------------------------------
Simple, structured input         | Grammar only                | Fast, deterministic, no API cost
Contains pronouns/references     | LLM preprocessing → Grammar | Resolve references, then parse
Contains idioms/jargon           | LLM preprocessing → Grammar | Normalize language, then parse
Grammar parse fails              | LLM direct translation      | Fallback for unsupported patterns
LLM translation fails validation | Return error to user        | No unreliable output
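
The first three scenarios can be decided before any parsing, for example with a cheap heuristic router like the sketch below. The pronoun list, the `knownJargon` set, and the returned labels are illustrative assumptions; a real classifier would need to be tuned on actual inputs. The last two scenarios (parse failure, validation failure) are runtime outcomes and are not handled here.

```javascript
// Illustrative pronoun list; deliberately broad, so it will produce some
// false positives (for example "that" used as a conjunction).
const PRONOUNS =
  /\b(he|she|it|they|him|her|them|his|hers|its|their|this|that|these|those)\b/i;

// Hypothetical router: pick a translation method from surface features.
function chooseMethod(input, knownJargon = new Set()) {
  // Pronouns or other references: let the LLM resolve them first.
  if (PRONOUNS.test(input)) return 'llm_preprocess_then_grammar';

  // Known abbreviations or jargon: let the LLM expand them first.
  const words = input.split(/\W+/).filter(Boolean);
  if (words.some((w) => knownJargon.has(w))) return 'llm_preprocess_then_grammar';

  // Otherwise take the fast, deterministic path.
  return 'grammar_only';
}
```

For instance, `chooseMethod('The patient has HTN', new Set(['HTN']))` routes to LLM preprocessing, while the same sentence with an empty jargon set goes straight to the grammar.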

6. Research Questions

7. Related Work