Research Overview

Goal: Extend the NL2DSL grammar engine to handle 95%+ of structured natural language patterns deterministically, without requiring LLM assistance.

Current Status: ~70% coverage of academic benchmark patterns

Target: 95% coverage by Q4 2025

1. Current Architecture

The grammar-based translator uses a multi-stage pipeline:

Natural Language Input
        │
        ▼
┌───────────────────┐
│ Sentence Splitter │  ← Split complex sentences
└───────────────────┘
        │
        ▼
┌───────────────────┐
│ Pattern Matcher   │  ← Match against grammar rules
└───────────────────┘
        │
        ▼
┌───────────────────┐
│ Structure Builder │  ← Build DSL AST
└───────────────────┘
        │
        ▼
┌───────────────────┐
│ DSL Serializer    │  ← Output DSL text
└───────────────────┘
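The four stages can be made concrete with a minimal TypeScript sketch that wires them together. This illustrates the data flow under stated assumptions. It is not the engine's code. The following are all hypothetical:

- the function names `splitSentences`, `matchPattern`, `buildAst`, `serialize` and `translate`;
- the two toy grammar rules;
- the `@rN isA X Y` / `@rN has X Y` output form for simple statements.

Only the `@r1` rule-id prefix is taken from the DSL example in section 3.

```typescript
// Hypothetical sketch of the four-stage pipeline; names and output form are assumptions.

type RuleName = "isA" | "has";
type Match = { rule: RuleName; subject: string; object: string };
type AstNode = { id: string; op: RuleName; args: [string, string] };

// Stage 1 (Sentence Splitter): break on sentence-final punctuation and semicolons.
function splitSentences(text: string): string[] {
  return text
    .split(/[.;!?]+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Stage 2 (Pattern Matcher): toy grammar rules, tried in order.
const RULES: Array<{ rule: RuleName; re: RegExp }> = [
  { rule: "isA", re: /^(\w+) is an? (\w+)$/i },
  { rule: "has", re: /^(\w+) (?:has|contains) (?:an? |the )?(\w+)$/i },
];

function matchPattern(sentence: string): Match | null {
  for (const { rule, re } of RULES) {
    const m = sentence.match(re);
    if (m) return { rule, subject: m[1], object: m[2] };
  }
  return null; // no rule applies
}

// Stage 3 (Structure Builder): wrap a match in an AST node with a rule id.
function buildAst(match: Match, index: number): AstNode {
  return { id: `@r${index + 1}`, op: match.rule, args: [match.subject, match.object] };
}

// Stage 4 (DSL Serializer): one DSL line per AST node.
function serialize(node: AstNode): string {
  return `${node.id} ${node.op} ${node.args.join(" ")}`;
}

// Run the stages in sequence; unmatched sentences are collected, never guessed.
function translate(text: string): { dsl: string[]; unmatched: string[] } {
  const dsl: string[] = [];
  const unmatched: string[] = [];
  for (const sentence of splitSentences(text)) {
    const match = matchPattern(sentence);
    if (match) {
      dsl.push(serialize(buildAst(match, dsl.length)));
    } else {
      unmatched.push(sentence);
    }
  }
  return { dsl, unmatched };
}

const result = translate("Rex is a Dog. Rex has a collar; Rex barks loudly.");
console.log(result.dsl); // [ '@r1 isA Rex Dog', '@r2 has Rex collar' ]
console.log(result.unmatched); // [ 'Rex barks loudly' ]
```

Unmatched sentences are returned separately rather than passed on with a guessed translation. This keeps the pipeline deterministic and makes it straightforward to count misses when measuring coverage.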

2. Pattern Categories

Category              Examples                                  Coverage   Priority / Status
--------------------  ----------------------------------------  --------   -----------------
IS-A Relations        "X is a Y", "Every X is a Y"              95%        Done
HAS-A Relations       "X has Y", "X contains Y"                 90%        Done
Simple Conditionals   "If X then Y"                             85%        Done
Quantifiers           "All X", "Some X", "No X"                 60%        P1
Temporal              "Before X", "After Y", "During Z"         50%        P1
Comparatives          "X is more than Y", "X is the most Z"     30%        P2
Nested Conditions     "If X and Y then Z unless W"              25%        P2
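The Pattern Matcher could route a sentence to one of these categories using an ordered, first-match-wins list of rules. The sketch below assumes that design. The regexes, the category labels and `classifyPattern` are hypothetical, and they are far coarser than real grammar rules would be.

```typescript
// Hypothetical category classifier: ordered, first-match-wins regex rules.

type Category =
  | "nested-condition"
  | "simple-conditional"
  | "is-a"
  | "quantifier"
  | "temporal"
  | "comparative"
  | "has-a"
  | "unknown";

// More specific rules come first; a category may own several rules.
const CATEGORY_RULES: Array<[Category, RegExp]> = [
  ["nested-condition", /^if .+ then .+ unless .+$/i],
  ["nested-condition", /^if .+ (?:and|or) .+ then .+$/i],
  ["simple-conditional", /^if .+ then .+$/i],
  ["is-a", /^(?:every \w+|\w+) is an? \w+$/i],
  ["quantifier", /^(?:all|every|some|no)\s+\w+/i],
  ["temporal", /\b(?:before|after|during)\b/i],
  ["comparative", /\b(?:more|less|fewer)\b.*\bthan\b|\bthe (?:most|least)\b/i],
  ["has-a", /\b(?:has|have|contains?)\b/i],
];

function classifyPattern(sentence: string): Category {
  // Drop surrounding whitespace and trailing sentence punctuation before matching.
  const s = sentence.trim().replace(/[.!?]+$/, "");
  for (const [category, re] of CATEGORY_RULES) {
    if (re.test(s)) return category;
  }
  return "unknown";
}

console.log(classifyPattern("If it rains and it is cold then stay inside unless you have a coat")); // nested-condition
console.log(classifyPattern("Every dog is an animal")); // is-a
console.log(classifyPattern("All mammals have a heart")); // quantifier
```

Rule order carries the logic. Nested conditions are tested before simple conditionals. "Every X is a Y" is claimed by the IS-A rule before the quantifier rule sees it, which matches how the table files that example.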

3. Priority 1: Quantifier Expansion

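Challenge: Natural-language quantifiers are nuanced and context-dependent. The same surface form can carry different quantificational force. For example, the indefinite article in "A mammal has a heart" is read universally, while in "A dog is barking" it is read existentially.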
Approach: Implement a quantifier normalizer that:
  1. Identifies quantifier phrases via pattern matching
  2. Maps to formal quantifier types (∀, ∃, majority, etc.)
  3. Generates appropriate DSL with variable bindings

Example:

// Input
"All mammals have a heart"

// Parsed structure
{
  quantifier: "universal",
  variable: "?x",
  domain: "Mammal",
  predicate: { op: "has", subject: "?x", object: "heart" }
}

// Output DSL
@r1 Implies (isA ?x Mammal) (has ?x heart)
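The three normalizer steps can be sketched in TypeScript as follows. The sketch reproduces the parsed structure and the universal DSL output of the example above. Several parts are assumptions:

- `parseQuantified`, `toDsl`, `toConcept` and the quantifier lookup table are hypothetical.
- Singularization is deliberately naive.
- The `Exists`/`And` form for "Some X" and the `Not` form for "No X" are placeholders. This document only shows the DSL form for the universal case.

Quantifiers such as "most" (the majority type) are not handled and return `null`.

```typescript
// Hypothetical quantifier normalizer; only the universal DSL form mirrors the source example.

type QuantifierType = "universal" | "existential" | "negated-universal";

interface Predicate {
  op: "has" | "isA";
  subject: string;
  object: string;
}

interface Parsed {
  quantifier: QuantifierType;
  variable: string;
  domain: string;
  predicate: Predicate;
}

// Step 2 lookup: surface quantifier word -> formal quantifier type.
const QUANTIFIER_TYPES = new Map<string, QuantifierType>([
  ["all", "universal"],
  ["every", "universal"],
  ["each", "universal"],
  ["some", "existential"],
  ["no", "negated-universal"],
]);

// Naive plural noun -> concept name, e.g. "mammals" -> "Mammal".
function toConcept(word: string): string {
  const singular = /[^s]s$/i.test(word) ? word.slice(0, -1) : word;
  return singular.charAt(0).toUpperCase() + singular.slice(1).toLowerCase();
}

// Steps 1-2: match "<quantifier> <noun> <verb phrase>" and build the parsed structure.
function parseQuantified(sentence: string, variable = "?x"): Parsed | null {
  const m = sentence.trim().replace(/\.$/, "").match(/^(\w+)\s+(\w+)\s+(.+)$/);
  if (!m) return null;
  const quantifier = QUANTIFIER_TYPES.get(m[1].toLowerCase());
  if (!quantifier) return null; // e.g. "most" (majority) is not handled here

  const has = m[3].match(/^(?:has|have|contains?)\s+(?:an?\s+|the\s+)?(\w+)$/i);
  const isA = m[3].match(/^(?:is\s+an?|are)\s+(\w+)$/i);
  let predicate: Predicate;
  if (has) {
    predicate = { op: "has", subject: variable, object: has[1].toLowerCase() };
  } else if (isA) {
    predicate = { op: "isA", subject: variable, object: toConcept(isA[1]) };
  } else {
    return null; // unsupported verb phrase
  }
  return { quantifier, variable, domain: toConcept(m[2]), predicate };
}

// Step 3: emit DSL, binding the variable in the domain restriction.
function toDsl(p: Parsed, ruleId = "@r1"): string {
  const domain = `(isA ${p.variable} ${p.domain})`;
  const body = `(${p.predicate.op} ${p.predicate.subject} ${p.predicate.object})`;
  if (p.quantifier === "universal") return `${ruleId} Implies ${domain} ${body}`;
  if (p.quantifier === "existential") {
    return `${ruleId} Exists ${p.variable} (And ${domain} ${body})`; // placeholder form
  }
  return `${ruleId} Implies ${domain} (Not ${body})`; // "No X": placeholder form
}

const parsed = parseQuantified("All mammals have a heart");
if (parsed) console.log(toDsl(parsed)); // @r1 Implies (isA ?x Mammal) (has ?x heart)
```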

4. Priority 1: Temporal Expressions

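Challenge: Temporal relations require understanding both points in time ("before X") and intervals ("during Z"), and how they are ordered relative to one another.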
Approach: Use Allen's interval algebra as the formal foundation:
  1. Extract temporal expressions via regex + NLP patterns
  2. Normalize to interval representation
  3. Map to DSL temporal operators (before, after, during, overlaps)
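Allen's interval algebra defines 13 mutually exclusive relations between two proper intervals, each determined by comparing their endpoints. The sketch below does two things:

1. It classifies an interval pair into one of those 13 relations. These endpoint comparisons follow Allen's definitions.
2. It collapses the result onto the four DSL operators listed in step 3.

The following are assumptions:

- the coarse mapping, for example treating `meets` as `before` or emitting `contains` as `during` with swapped arguments;
- the `(op a b)` output syntax;
- the function names `allenRelation` and `temporalDsl`, which are hypothetical.

```typescript
// Allen's 13 interval relations from endpoint comparisons, plus a hypothetical
// coarse mapping onto the DSL operators before/after/during/overlaps.

interface Interval {
  start: number;
  end: number; // assumed start < end (proper intervals)
}

type AllenRelation =
  | "before" | "meets" | "overlaps" | "starts" | "during" | "finishes" | "equals"
  | "after" | "met-by" | "overlapped-by" | "started-by" | "contains" | "finished-by";

function allenRelation(a: Interval, b: Interval): AllenRelation {
  if (!(a.start < a.end && b.start < b.end)) throw new Error("intervals need start < end");
  // Disjoint or touching intervals.
  if (a.end < b.start) return "before";
  if (a.end === b.start) return "meets";
  if (b.end < a.start) return "after";
  if (b.end === a.start) return "met-by";
  // From here on the intervals share some interior time.
  if (a.start === b.start && a.end === b.end) return "equals";
  if (a.start === b.start) return a.end < b.end ? "starts" : "started-by";
  if (a.end === b.end) return a.start > b.start ? "finishes" : "finished-by";
  if (a.start > b.start && a.end < b.end) return "during";
  if (a.start < b.start && a.end > b.end) return "contains";
  return a.start < b.start ? "overlaps" : "overlapped-by";
}

type DslOp = "before" | "after" | "during" | "overlaps";

// Assumed lossy mapping; swap = true emits the arguments as (b, a). "equals" has no entry.
const TO_DSL: Partial<Record<AllenRelation, { op: DslOp; swap: boolean }>> = {
  before: { op: "before", swap: false },
  meets: { op: "before", swap: false },
  after: { op: "after", swap: false },
  "met-by": { op: "after", swap: false },
  during: { op: "during", swap: false },
  starts: { op: "during", swap: false },
  finishes: { op: "during", swap: false },
  contains: { op: "during", swap: true },
  "started-by": { op: "during", swap: true },
  "finished-by": { op: "during", swap: true },
  overlaps: { op: "overlaps", swap: false },
  "overlapped-by": { op: "overlaps", swap: true },
};

function temporalDsl(aName: string, a: Interval, bName: string, b: Interval): string | null {
  const mapped = TO_DSL[allenRelation(a, b)];
  if (!mapped) return null; // no operator chosen for "equals" in this sketch
  const [x, y] = mapped.swap ? [bName, aName] : [aName, bName];
  return `(${mapped.op} ${x} ${y})`;
}

const lunch = { start: 12, end: 13 };
console.log(temporalDsl("meeting", { start: 9, end: 10 }, "lunch", lunch)); // (before meeting lunch)
console.log(temporalDsl("lunch", lunch, "call", { start: 12.25, end: 12.5 })); // (during call lunch)
```

Keeping the full 13-way classification as an intermediate step means the lossy collapse onto four operators happens in one table. That table can be refined later, for example for Phase 3, without touching the endpoint logic.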

5. Implementation Plan

Phase     Deliverable                                   Timeline
-------   ----------------------------------------      --------
Phase 1   Universal/existential quantifiers (∀, ∃)      Q1 2025
Phase 2   Temporal point references                     Q1 2025
Phase 3   Temporal intervals (Allen relations)          Q2 2025
Phase 4   Comparative structures                        Q2 2025
Phase 5   Nested conditionals with exceptions           Q3 2025
Phase 6   Edge case hardening, 95% target               Q4 2025

6. Evaluation Metrics

7. Related Work