AI Agent Planning - Trustworthy AI

AI agents need to orchestrate tools reliably. Current LLM-based planning generates plausible-looking but often invalid plans. AGISystem2 can model formal tool semantics; plan verification is a research pattern that requires external planner/runtime integration.

Status: Research pattern (DS28). Examples are illustrative; plan generation/monitoring is not shipped in the runtime.

The Problem

AI agents must:

Understand tool capabilities: preconditions, effects, costs
Generate valid multi-step plans: each step's preconditions are met by prior effects
Verify plans before execution: catch errors before they cause damage
Explain failures: not just "failed" but "why, and what would fix it"
Replan when needed: adapt to unexpected results

Why LLM Planning Fails: LLMs generate plans based on linguistic patterns, not logical validity. A plan that "sounds right" may have impossible step sequences, missing preconditions, or circular dependencies. The failure only becomes apparent at execution time.

The Solution: Formal Tool Semantics

Each tool is defined with explicit preconditions and effects:

# Tool definition structure:
# - Preconditions: What must be true before the tool can run
# - Effects: What becomes true after the tool runs
# - Cost: Resource consumption (for optimization)

@ReadFile defineTool ReadFile
    (and (has Agent HasFile ?path)
         (has Agent HasAccess ?path))       # Preconditions
    (has Agent HasData (contentOf ?path))   # Effects
    (cost Low)

@QueryDatabase defineTool QueryDatabase
    (and (has Agent Connected ?database)
         (has Agent Authenticated ?database))
    (has Agent HasData (queryResult ?query))
    (cost Medium)

@SendEmail defineTool SendEmail
    (and (has Agent Connected EmailServer)
         (has Agent Authenticated EmailServer)
         (has Agent HasData ?content))
    (has Agent Completed (sent ?content ?recipient))
    (cost Low)
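
To make the precondition/effect structure concrete, the same three tools can be mirrored as plain JavaScript data. This is a minimal sketch, not an AGISystem2 API: the string encoding of facts, the helper names missingPreconditions and applyTool, and the file name report.txt are all assumptions made for illustration.

```javascript
// Hypothetical encoding: each fact is a string such as "Connected(SalesDB)".
// Tool names and cost labels follow the definitions above; the rest is assumed.
const tools = {
  ReadFile: {
    preconditions: (path) => [`HasFile(${path})`, `HasAccess(${path})`],
    effects: (path) => [`HasData(contentOf(${path}))`],
    cost: "Low",
  },
  QueryDatabase: {
    preconditions: (database) => [`Connected(${database})`, `Authenticated(${database})`],
    effects: (database, query) => [`HasData(queryResult(${query}))`],
    cost: "Medium",
  },
  SendEmail: {
    preconditions: (content) => [
      "Connected(EmailServer)",
      "Authenticated(EmailServer)",
      `HasData(${content})`,
    ],
    effects: (content, recipient) => [`Completed(sent(${content}, ${recipient}))`],
    cost: "Low",
  },
};

// Returns the preconditions of a tool call that are absent from the state.
function missingPreconditions(toolName, args, state) {
  return tools[toolName].preconditions(...args).filter((fact) => !state.has(fact));
}

// Returns a new state with the tool's effects added (add-only, no deletions).
function applyTool(toolName, args, state) {
  return new Set([...state, ...tools[toolName].effects(...args)]);
}

const state = new Set(["HasFile(report.txt)", "HasAccess(report.txt)"]);
console.log(missingPreconditions("ReadFile", ["report.txt"], state));
console.log(missingPreconditions("SendEmail", ["Q3Report", "Team"], state));
console.log([...applyTool("ReadFile", ["report.txt"], state)]);
```

Keeping preconditions and effects as data rather than code paths is what lets a planner or validator reason about a tool without running it.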

Plan Generation via Backward Chaining

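A planner can generate plans by working backward from the goal, selecting at each step a tool whose effects satisfy an open precondition. The example below also uses a Login tool, which is not defined above; it is assumed to require HasCredentials for a system and to make the agent Connected and Authenticated to it: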

Goal: has(Agent, Completed, sent(Q3Report, TeamMailingList))

Current State:
  - has(Agent, HasCredentials, EmailServer)
  - has(Agent, HasCredentials, SalesDB)
  - has(Agent, HasFile, ReportTemplate)

Backward chaining:

← Goal requires: SendEmail tool
   Preconditions: Connected(Email) ∧ Auth(Email) ∧ HasData(Q3Report)

   ← Connected + Auth requires: Login to EmailServer
      Preconditions: HasCredentials(EmailServer) ✓ (in state)

   ← HasData requires: Generate Q3Report
      ← QueryDatabase for sales data
         Preconditions: Connected(SalesDB) ∧ Auth(SalesDB)
         ← Login to SalesDB
            Preconditions: HasCredentials(SalesDB) ✓ (in state)

Generated Plan:
  Step 1: Login(SalesDB)                         Pre: HasCredentials(SalesDB) ✓
  Step 2: QueryDatabase(SalesQ3)                 Pre: Connected ∧ Auth ✓
  Step 3: Login(EmailServer)                     Pre: HasCredentials(EmailServer) ✓
  Step 4: SendEmail(Q3Report, TeamMailingList)   Pre: Connected ∧ Auth ∧ HasData ✓

Plan Status: VALID ✓
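
The trace above can be reproduced with a small depth-first backward chainer. This is a sketch under stated assumptions, not the AGISystem2 planner: the operators are ground (no variables), the two Login operators are hypothetical because the tool definitions do not include them, and the query result is treated as the finished Q3Report.

```javascript
// Ground operators for the example. The two Login operators are assumptions:
// the plan above uses Login, but no Login tool is defined in this document.
const operators = [
  { name: "Login(SalesDB)",
    preconditions: ["HasCredentials(SalesDB)"],
    effects: ["Connected(SalesDB)", "Authenticated(SalesDB)"] },
  { name: "QueryDatabase(SalesQ3)",
    preconditions: ["Connected(SalesDB)", "Authenticated(SalesDB)"],
    effects: ["HasData(Q3Report)"] }, // simplification: query result is the report
  { name: "Login(EmailServer)",
    preconditions: ["HasCredentials(EmailServer)"],
    effects: ["Connected(EmailServer)", "Authenticated(EmailServer)"] },
  { name: "SendEmail(Q3Report, TeamMailingList)",
    // HasData is listed first so the data is produced before the email login.
    preconditions: ["HasData(Q3Report)", "Connected(EmailServer)", "Authenticated(EmailServer)"],
    effects: ["Completed(sent(Q3Report, TeamMailingList))"] },
];

// Works backward from the goal: to achieve a fact, pick an operator that
// produces it and recursively achieve that operator's preconditions first.
function backwardChain(goal, initialState, ops) {
  const known = new Set(initialState); // facts true so far
  const pending = new Set();           // facts being resolved (cycle guard)
  const steps = [];

  function achieve(fact) {
    if (known.has(fact)) return true;
    if (pending.has(fact)) return false; // circular dependency
    pending.add(fact);
    let solved = false;
    for (const op of ops) {
      if (!op.effects.includes(fact)) continue;
      if (op.preconditions.every((p) => achieve(p))) {
        steps.push(op.name);
        op.effects.forEach((e) => known.add(e));
        solved = true;
        break;
      }
    }
    pending.delete(fact);
    return solved;
  }

  return achieve(goal) ? steps : null;
}

const initialState = [
  "HasCredentials(EmailServer)",
  "HasCredentials(SalesDB)",
  "HasFile(ReportTemplate)",
];
const plan = backwardChain("Completed(sent(Q3Report, TeamMailingList))", initialState, operators);
console.log(plan);
```

With these operators the function returns the same four steps, in the same order, as the generated plan above. The sketch does not undo steps added while exploring an operator that later fails, so a fuller planner would need backtracking to keep plans minimal.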

Plan Validation

In a production integration, plans can be validated against tool semantics:

// Proposed plan (invalid - missing Login steps)
const invalidPlan = [QueryDatabase, SendEmail];

// Validation output:
Step 1: QueryDatabase
  ✗ Precondition FAILED: Connected(SalesDB) - not in current state
  ✗ Precondition FAILED: Authenticated(SalesDB) - not in current state

Step 2: SendEmail
  ✗ Precondition FAILED: Connected(EmailServer) - not in current state
  ✗ Precondition FAILED: Authenticated(EmailServer) - not in current state
  ✗ Precondition FAILED: HasData(Q3Report) - Step 1 failed

Plan Status: INVALID
Missing steps: Login(SalesDB), Login(EmailServer)
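
A validator of this kind can be sketched as a forward simulation: walk the plan in order, check each step's preconditions against the simulated state, and apply effects only for steps that pass. The operator table, the Login operators, and the function name validatePlan below are assumptions for illustration, not shipped runtime features.

```javascript
// Hypothetical ground operators; Login is assumed, as it is not defined above.
const operators = {
  "Login(SalesDB)": {
    preconditions: ["HasCredentials(SalesDB)"],
    effects: ["Connected(SalesDB)", "Authenticated(SalesDB)"],
  },
  "QueryDatabase(SalesQ3)": {
    preconditions: ["Connected(SalesDB)", "Authenticated(SalesDB)"],
    effects: ["HasData(Q3Report)"],
  },
  "Login(EmailServer)": {
    preconditions: ["HasCredentials(EmailServer)"],
    effects: ["Connected(EmailServer)", "Authenticated(EmailServer)"],
  },
  "SendEmail(Q3Report, TeamMailingList)": {
    preconditions: ["Connected(EmailServer)", "Authenticated(EmailServer)", "HasData(Q3Report)"],
    effects: ["Completed(sent(Q3Report, TeamMailingList))"],
  },
};

function validatePlan(planSteps, initialState) {
  const state = new Set(initialState);
  const failures = [];

  for (const name of planSteps) {
    const op = operators[name];
    const missing = op.preconditions.filter((fact) => !state.has(fact));
    if (missing.length > 0) {
      failures.push({ step: name, missing });
      continue; // a failed step contributes no effects
    }
    op.effects.forEach((fact) => state.add(fact));
  }

  // Suggest operators that are not in the plan but produce a missing fact.
  const missingFacts = failures.flatMap((failure) => failure.missing);
  const suggestions = Object.keys(operators).filter(
    (name) =>
      !planSteps.includes(name) &&
      operators[name].effects.some((fact) => missingFacts.includes(fact))
  );

  return { valid: failures.length === 0, failures, suggestions };
}

const report = validatePlan(
  ["QueryDatabase(SalesQ3)", "SendEmail(Q3Report, TeamMailingList)"],
  ["HasCredentials(EmailServer)", "HasCredentials(SalesDB)"]
);
console.log(report.valid);       // false
console.log(report.failures);
console.log(report.suggestions); // the two Login operators
```

Because a failed step adds nothing to the simulated state, later failures such as the missing HasData(Q3Report) are reported as a consequence of the earlier one.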

Runtime Monitoring

During execution, a host system can monitor preconditions and trigger replanning when needed:

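// Illustrative only: session.prove, session.learn, their result fields
// (valid, missing) and executeToolCall are assumed host-side interfaces,
// not a shipped runtime API. $goal is assumed to be bound in the session.
async function runPlan(session, plan) {
    for (const step of plan.steps) {
        // Check that the step's preconditions still hold in the current state
        const preCheck = session.prove(`
            @pre satisfies CurrentState ${step.preconditions}
        `);

        if (!preCheck.valid) {
            console.log(`Step ${step.tool} blocked: ${preCheck.missing}`);
            // Replan from the current state and hand the new plan back to
            // the caller instead of silently skipping the blocked step
            const newPlan = session.prove(`
                @replan achieves Agent $goal from CurrentState
            `);
            return { status: 'replanned', plan: newPlan };
        }

        // Execute the tool, then record its effects in the state
        await executeToolCall(step.tool, step.args);
        session.learn(`@_ updateState CurrentState ${step.postconditions}`);
    }
    return { status: 'completed' };
}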

Potential Benefits (When Implemented)

Safety

Once a verifier is integrated, invalid plans can be caught before execution.

Efficiency

Cost-aware plans can reduce wasted API calls or redundant operations.
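
As a rough sketch of cost-aware selection, the Low/Medium cost labels from the tool definitions can be mapped to numbers and summed per plan. The numeric weights, the High label, and the two candidate plans are assumptions chosen for illustration, and both candidates are assumed to have already passed validation.

```javascript
// Assumed numeric weights for the cost labels; the document only names labels.
const COST_WEIGHTS = { Low: 1, Medium: 5, High: 20 };

// Cost labels taken from the tool definitions above.
const toolCosts = { ReadFile: "Low", QueryDatabase: "Medium", SendEmail: "Low" };

// Total cost of a plan is the sum of its steps' weights.
function planCost(plan) {
  return plan.reduce((sum, tool) => sum + COST_WEIGHTS[toolCosts[tool]], 0);
}

// Picks the candidate with the lowest total cost (first one wins on ties).
function cheapestPlan(candidates) {
  return candidates.reduce((best, plan) => (planCost(plan) < planCost(best) ? plan : best));
}

// Two hypothetical ways to obtain the data before sending it.
const candidates = [
  ["QueryDatabase", "SendEmail"],
  ["ReadFile", "SendEmail"],
];
console.log(candidates.map(planCost));  // [6, 2]
console.log(cheapestPlan(candidates));  // ["ReadFile", "SendEmail"]
```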

Explainability

Plans can carry justifications so users can understand each step.

Debuggability

When plans fail, diagnostics can show which precondition wasn't met and what would fix it.

Integration with LLMs

AGISystem2 complements LLMs rather than replacing them:

Task	LLM Role	AGISystem2 Role
Intent understanding	"Send Q3 report to team" → structured goal	Validate goal is achievable
Plan generation	Propose candidate plans	Verify plan validity, optimize
Error handling	Generate user-friendly messages	Diagnose exact failure cause
Replanning	Suggest alternatives	Verify alternatives are valid
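
The division of labor in the table can be sketched as a propose-and-verify loop: one component proposes a candidate plan, the other verifies it and feeds the diagnosis back. Everything below is hypothetical: planWithVerification, the stub proposer standing in for an LLM call, and the toy verifier standing in for formal validation.

```javascript
// Generic loop: propose a plan, verify it, and pass the diagnosis back to
// the proposer until a plan verifies or the attempt budget runs out.
function planWithVerification(propose, verify, maxAttempts = 3) {
  let feedback = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const candidate = propose(feedback);
    const verdict = verify(candidate);
    if (verdict.valid) return { plan: candidate, attempts: attempt };
    feedback = verdict.reason; // exact failure cause for the next proposal
  }
  return { plan: null, attempts: maxAttempts };
}

// Stub standing in for an LLM: the first proposal omits the Login steps,
// and the second one uses the feedback to add them.
const stubProposer = (feedback) =>
  feedback === null
    ? ["QueryDatabase", "SendEmail"]
    : ["Login(SalesDB)", "QueryDatabase", "Login(EmailServer)", "SendEmail"];

// Toy verifier standing in for formal validation: it only checks that both
// Login steps are present.
const toyVerifier = (plan) => {
  const missing = ["Login(SalesDB)", "Login(EmailServer)"].filter((s) => !plan.includes(s));
  return missing.length === 0
    ? { valid: true }
    : { valid: false, reason: `Missing steps: ${missing.join(", ")}` };
};

const outcome = planWithVerification(stubProposer, toyVerifier);
console.log(outcome.attempts); // 2
console.log(outcome.plan);
```

The attempt budget keeps the loop bounded when the proposer never produces a plan that verifies.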

Research Directions

Hierarchical planning: Decompose complex goals into subgoals with their own tool requirements
Probabilistic effects: Handle tools with uncertain outcomes (network may fail, API may time out)
Resource constraints: Plan under budget limits, rate limits, time constraints
Multi-agent coordination: Multiple agents collaborating with shared resources
Learning tool models: Automatically discover tool preconditions and effects from execution traces
