AI agents need to orchestrate tools reliably, but current LLM-based planning often generates plans that look plausible and are nevertheless invalid. AGISystem2 can model formal tool semantics; plan verification is a research pattern that requires integration with an external planner and runtime.

Status: Research pattern (DS28). Examples are illustrative; plan generation/monitoring is not shipped in the runtime.

The Problem

AI agents must:

Why LLM Planning Fails: LLMs generate plans based on linguistic patterns, not logical validity. A plan that "sounds right" may have impossible step sequences, missing preconditions, or circular dependencies. The failure only becomes apparent at execution time.

The Solution: Formal Tool Semantics

Each tool is defined with explicit preconditions and effects:

# Tool definition structure:
# - Preconditions: What must be true before the tool can run
# - Effects: What becomes true after the tool runs
# - Cost: Resource consumption (for optimization)

@ReadFile defineTool ReadFile
    (and (has Agent HasFile ?path)
         (has Agent HasAccess ?path))       # Preconditions
    (has Agent HasData (contentOf ?path))   # Effects
    (cost Low)

@QueryDatabase defineTool QueryDatabase
    (and (has Agent Connected ?database)
         (has Agent Authenticated ?database))
    (has Agent HasData (queryResult ?query))
    (cost Medium)

@SendEmail defineTool SendEmail
    (and (has Agent Connected EmailServer)
         (has Agent Authenticated EmailServer)
         (has Agent HasData ?content))
    (has Agent Completed (sent ?content ?recipient))
    (cost Low)
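
To make the precondition/effect structure concrete on the host side, the three tools above could be mirrored as plain JavaScript data. This is a minimal sketch, not the AGISystem2 API: the `tools` object, the string encoding of facts, and the `isApplicable` and `applyTool` helpers are all assumptions, and variables such as `?path` are taken to be already bound to concrete values.

```javascript
// Hypothetical host-side mirror of the tool definitions (not an AGISystem2 API).
// Facts are plain strings; variables such as ?path are assumed already bound.
const tools = {
  ReadFile: {
    pre: ["HasFile(ReportTemplate)", "HasAccess(ReportTemplate)"],
    eff: ["HasData(contentOf(ReportTemplate))"],
    cost: "Low",
  },
  QueryDatabase: {
    pre: ["Connected(SalesDB)", "Authenticated(SalesDB)"],
    eff: ["HasData(queryResult(SalesQ3))"],
    cost: "Medium",
  },
  SendEmail: {
    pre: ["Connected(EmailServer)", "Authenticated(EmailServer)", "HasData(Q3Report)"],
    eff: ["Completed(sent(Q3Report, TeamMailingList))"],
    cost: "Low",
  },
};

// A tool is applicable when every precondition is present in the state.
function isApplicable(tool, state) {
  return tool.pre.every((fact) => state.has(fact));
}

// Applying a tool returns a new state with its effects added.
function applyTool(tool, state) {
  if (!isApplicable(tool, state)) {
    throw new Error("preconditions not met");
  }
  return new Set([...state, ...tool.eff]);
}

const state = new Set(["HasFile(ReportTemplate)", "HasAccess(ReportTemplate)"]);
const next = applyTool(tools.ReadFile, state);
console.log(isApplicable(tools.SendEmail, next)); // false: no EmailServer connection yet
```

Keeping states immutable (each application returns a new `Set`) makes it easy to explore alternative plans without undoing earlier steps.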

Plan Generation via Backward Chaining

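A planner can generate plans by working backward from the goal. The example assumes a Login tool (not defined above) that requires credentials for a service and establishes the Connected and Authenticated facts for it: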

Goal: has(Agent, Completed, sent(Q3Report, TeamMailingList))

Current State:
  - has(Agent, HasCredentials, EmailServer)
  - has(Agent, HasCredentials, SalesDB)
  - has(Agent, HasFile, ReportTemplate)

Backward chaining:

← Goal requires: SendEmail tool
   Preconditions: Connected(Email) ∧ Auth(Email) ∧ HasData(Q3Report)

   ← Connected + Auth requires: Login to EmailServer
      Preconditions: HasCredentials(EmailServer) ✓ (in state)

   ← HasData requires: Generate Q3Report
      ← QueryDatabase for sales data
         Preconditions: Connected(SalesDB) ∧ Auth(SalesDB)
         ← Login to SalesDB
            Preconditions: HasCredentials(SalesDB) ✓ (in state)

Generated Plan:
  Step 1: Login(SalesDB)                        Pre: HasCredentials(SalesDB) ✓
  Step 2: QueryDatabase(SalesQ3)                Pre: Connected ∧ Auth ✓
  Step 3: Login(EmailServer)                    Pre: HasCredentials(EmailServer) ✓
  Step 4: SendEmail(Q3Report, TeamMailingList)  Pre: Connected ∧ Auth ∧ HasData ✓

Plan Status: VALID ✓
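
The backward-chaining walkthrough above can be sketched as a small recursive search. This is an illustrative host-side sketch under stated assumptions, not a shipped planner: the `actions` list, the string facts, and the `achieve` function are hypothetical, the Login actions are inferred from the walkthrough, and effects are only ever added (no delete effects).

```javascript
// Ground actions for the Q3 report example. Login is an assumed tool:
// the walkthrough uses it, but its definition is not given in the text.
const actions = [
  { name: "Login(SalesDB)", pre: ["HasCredentials(SalesDB)"],
    eff: ["Connected(SalesDB)", "Authenticated(SalesDB)"] },
  { name: "Login(EmailServer)", pre: ["HasCredentials(EmailServer)"],
    eff: ["Connected(EmailServer)", "Authenticated(EmailServer)"] },
  { name: "QueryDatabase(SalesQ3)",
    pre: ["Connected(SalesDB)", "Authenticated(SalesDB)"],
    eff: ["HasData(Q3Report)"] },
  { name: "SendEmail(Q3Report, TeamMailingList)",
    // HasData is listed first so the step order matches the walkthrough.
    pre: ["HasData(Q3Report)", "Connected(EmailServer)", "Authenticated(EmailServer)"],
    eff: ["Sent(Q3Report, TeamMailingList)"] },
];

// Try to make `goal` true, appending the required steps to `steps`.
// `state` is updated with effects as steps are chosen.
function achieve(goal, state, steps, visiting = new Set()) {
  if (state.has(goal)) return true;
  if (visiting.has(goal)) return false; // circular dependency
  visiting.add(goal);
  for (const action of actions) {
    if (!action.eff.includes(goal)) continue;
    const savedState = [...state];
    const savedLength = steps.length;
    if (action.pre.every((p) => achieve(p, state, steps, visiting))) {
      steps.push(action.name);
      action.eff.forEach((fact) => state.add(fact));
      visiting.delete(goal);
      return true;
    }
    // Backtrack: undo partial work done while trying this action.
    state.clear();
    savedState.forEach((fact) => state.add(fact));
    steps.length = savedLength;
  }
  visiting.delete(goal);
  return false;
}

const state = new Set([
  "HasCredentials(EmailServer)",
  "HasCredentials(SalesDB)",
  "HasFile(ReportTemplate)",
]);
const steps = [];
const found = achieve("Sent(Q3Report, TeamMailingList)", state, steps);
console.log(found, steps);
```

The `visiting` set is what guards against the circular dependencies mentioned earlier: a goal that is already being pursued higher up the call stack is treated as unreachable on that branch.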

Plan Validation

In a production integration, plans can be validated against tool semantics:

// Proposed plan (invalid - missing Login steps)
const invalidPlan = [QueryDatabase, SendEmail];

// Validation output:
Step 1: QueryDatabase
  ✗ Precondition FAILED: Connected(SalesDB) - not in current state
  ✗ Precondition FAILED: Authenticated(SalesDB) - not in current state

Step 2: SendEmail
  ✗ Precondition FAILED: Connected(EmailServer) - not in current state
  ✗ Precondition FAILED: Authenticated(EmailServer) - not in current state
  ✗ Precondition FAILED: HasData(Q3Report) - Step 1 failed

Plan Status: INVALID
Missing steps: Login(SalesDB), Login(EmailServer)
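
A validator of this kind can be sketched by simulating the plan forward and collecting unmet preconditions at each step. The sketch below is an assumption-laden illustration rather than a shipped feature: `actions`, `validatePlan`, and the report shape are hypothetical, and the Login actions are inferred from the planning example.

```javascript
// Hypothetical ground actions; Login is assumed, as in the planning example.
const actions = {
  "Login(SalesDB)": { pre: ["HasCredentials(SalesDB)"],
    eff: ["Connected(SalesDB)", "Authenticated(SalesDB)"] },
  "Login(EmailServer)": { pre: ["HasCredentials(EmailServer)"],
    eff: ["Connected(EmailServer)", "Authenticated(EmailServer)"] },
  "QueryDatabase": { pre: ["Connected(SalesDB)", "Authenticated(SalesDB)"],
    eff: ["HasData(Q3Report)"] },
  "SendEmail": {
    pre: ["Connected(EmailServer)", "Authenticated(EmailServer)", "HasData(Q3Report)"],
    eff: ["Sent(Q3Report, TeamMailingList)"] },
};

function validatePlan(plan, initialState) {
  const state = new Set(initialState);
  const failures = [];
  plan.forEach((name, index) => {
    const action = actions[name];
    if (!action) throw new Error(`Unknown tool: ${name}`);
    const missing = action.pre.filter((fact) => !state.has(fact));
    if (missing.length > 0) {
      failures.push({ step: index + 1, name, missing });
    } else {
      // Effects only apply when the step could actually run.
      action.eff.forEach((fact) => state.add(fact));
    }
  });
  // Suggest actions, not already in the plan, that produce a missing fact.
  const missingFacts = new Set(failures.flatMap((f) => f.missing));
  const suggestions = Object.keys(actions).filter(
    (name) => !plan.includes(name) &&
      actions[name].eff.some((fact) => missingFacts.has(fact))
  );
  return { valid: failures.length === 0, failures, suggestions };
}

const initialState = ["HasCredentials(EmailServer)", "HasCredentials(SalesDB)"];
const report = validatePlan(["QueryDatabase", "SendEmail"], initialState);
console.log(report.valid, report.suggestions);
```

Because a failed step contributes no effects, later steps that depend on it are reported as failing too, which mirrors the cascading failure shown in the validation output.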

Runtime Monitoring

During execution, a host system can monitor preconditions and trigger replanning when needed:

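// Illustrative host-side loop. `session`, `plan`, and `executeToolCall` come
// from the host integration; the result fields used here are assumptions.
async function runPlan(session, plan, maxReplans = 3) {
    let steps = [...plan.steps];
    let replans = 0;

    while (steps.length > 0) {
        const step = steps[0];

        // Check that the step's preconditions still hold
        const preCheck = session.prove(`
            @pre satisfies CurrentState ${step.preconditions}
        `);

        if (!preCheck.valid) {
            console.log(`Step ${step.tool} blocked: ${preCheck.missing}`);
            // Replan from the current state ($goal is assumed bound in the session)
            const newPlan = session.prove(`
                @replan achieves Agent $goal from CurrentState
            `);
            if (!newPlan.valid || ++replans > maxReplans) {
                throw new Error(`No valid plan after ${step.tool} was blocked`);
            }
            // Continue with the new plan (assumes the result exposes `steps`)
            steps = [...newPlan.steps];
            continue;
        }

        // Execute the tool, then record its effects in the state
        await executeToolCall(step.tool, step.args);
        session.learn(`@_ updateState CurrentState ${step.postconditions}`);
        steps.shift();
    }
}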

Potential Benefits (When Implemented)

Safety

Invalid plans can be caught before execution when integrated.

Efficiency

Cost-aware plans can reduce wasted API calls or redundant operations.
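
One simple way to make plan selection cost-aware is to sum per-tool costs and pick the cheapest candidate among plans that have already been validated. The sketch below is hypothetical: the numeric weights for the Low and Medium labels, the Login cost, and the `planCost` and `cheapest` helpers are assumptions introduced for illustration.

```javascript
// Assumed numeric weights for the cost labels; the tool definitions only
// name the labels Low and Medium.
const COST_WEIGHT = { Low: 1, Medium: 5 };

// Cost labels per tool. ReadFile, QueryDatabase, and SendEmail follow the
// tool definitions; the Login cost is an assumption.
const toolCost = {
  Login: "Low",
  ReadFile: "Low",
  QueryDatabase: "Medium",
  SendEmail: "Low",
};

// Total cost of a plan given as a list of tool names.
function planCost(plan) {
  return plan.reduce((total, tool) => total + COST_WEIGHT[toolCost[tool]], 0);
}

// Pick the lowest-cost plan; ties keep the earlier candidate.
function cheapest(plans) {
  return plans.reduce((best, plan) => (planCost(plan) < planCost(best) ? plan : best));
}

// Both candidates are assumed to have passed validation already.
const candidates = [
  ["Login", "QueryDatabase", "QueryDatabase", "Login", "SendEmail"], // redundant query
  ["Login", "QueryDatabase", "Login", "SendEmail"],
];
const best = cheapest(candidates);
console.log(best, planCost(best));
```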

Explainability

Plans can carry justifications so users can understand each step.

Debuggability

When plans fail, diagnostics can show which precondition wasn't met and what would fix it.

Integration with LLMs

AGISystem2 complements LLMs rather than replacing them:

Task                  | LLM Role                                    | AGISystem2 Role
Intent understanding  | "Send Q3 report to team" → structured goal  | Validate goal is achievable
Plan generation       | Propose candidate plans                     | Verify plan validity, optimize
Error handling        | Generate user-friendly messages             | Diagnose exact failure cause
Replanning            | Suggest alternatives                        | Verify alternatives are valid
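
The division of labor in the plan-generation row can be sketched as a propose-then-verify loop: the LLM proposes candidates and a symbolic check accepts only plans whose preconditions hold and whose final state contains the goal. Everything named here is hypothetical: `proposePlans` is a stub standing in for a model call, and `actions`, `isValid`, and `firstVerifiedPlan` are illustrative helpers, not AGISystem2 APIs.

```javascript
// Stand-in for an LLM call: returns candidate plans, some possibly invalid.
// A real integration would call a model here; this stub is hypothetical.
function proposePlans(goalText) {
  return [
    ["QueryDatabase", "SendEmail"], // sounds right, but skips the logins
    ["Login(SalesDB)", "QueryDatabase", "Login(EmailServer)", "SendEmail"],
  ];
}

// Hypothetical ground actions; Login is assumed, as in the planning example.
const actions = {
  "Login(SalesDB)": { pre: ["HasCredentials(SalesDB)"],
    eff: ["Connected(SalesDB)", "Authenticated(SalesDB)"] },
  "Login(EmailServer)": { pre: ["HasCredentials(EmailServer)"],
    eff: ["Connected(EmailServer)", "Authenticated(EmailServer)"] },
  "QueryDatabase": { pre: ["Connected(SalesDB)", "Authenticated(SalesDB)"],
    eff: ["HasData(Q3Report)"] },
  "SendEmail": {
    pre: ["Connected(EmailServer)", "Authenticated(EmailServer)", "HasData(Q3Report)"],
    eff: ["Sent(Q3Report, TeamMailingList)"] },
};

// A plan is valid if every step's preconditions hold when it runs
// and the goal is true in the final state.
function isValid(plan, initialState, goal) {
  const state = new Set(initialState);
  for (const name of plan) {
    const action = actions[name];
    if (!action || !action.pre.every((fact) => state.has(fact))) return false;
    action.eff.forEach((fact) => state.add(fact));
  }
  return state.has(goal);
}

// Return the first proposed plan that passes verification, or null.
function firstVerifiedPlan(goalText, initialState, goal) {
  const verified = proposePlans(goalText).find((plan) => isValid(plan, initialState, goal));
  return verified ?? null;
}

const chosen = firstVerifiedPlan(
  "Send Q3 report to team",
  ["HasCredentials(EmailServer)", "HasCredentials(SalesDB)"],
  "Sent(Q3Report, TeamMailingList)"
);
console.log(chosen);
```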

Research Directions
