Overview: NL2DSL is AGISystem2's grammar-based translation layer that converts natural language sentences into formal DSL statements. It uses pattern matching to recognize logical structures in English text, enabling the reasoning engine to process natural language input.

1. Architecture

The NL2DSL pipeline has three main stages:

Natural Language Input
        │
        ▼
┌───────────────────┐
│ 1. Sentence Split │  Split text into individual sentences
└───────────────────┘
        │
        ▼
┌───────────────────┐
│ 2. Pattern Match  │  Match against rule/fact/copula/relation patterns
└───────────────────┘
        │
        ▼
┌───────────────────┐
│ 3. DSL Emit       │  Generate DSL statements with references
└───────────────────┘
        │
        ▼
    DSL Output
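
The three stages can be sketched as three small functions chained together. This is only an illustration: `splitSentences`, `matchPattern`, `emitDsl` and `translate` are hypothetical names, not AGISystem2 functions, and the matcher covers a single "X is a/an Y" pattern.

```javascript
// Hypothetical three-stage sketch; none of these names come from AGISystem2.

// Stage 1: split text into sentences (naive: breaks on ., !, ? and newlines).
function splitSentences(text) {
  return (text.match(/[^.!?\n]+/g) || [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Stage 2: try to recognize a pattern. Only "X is a/an Y" is covered here.
function matchPattern(sentence) {
  const m = /^(\w+) is an? (\w+)$/i.exec(sentence);
  return m ? { operator: "isA", args: [m[1], m[2]] } : null;
}

// Stage 3: emit one DSL statement, capitalizing entity and type names.
function emitDsl(match) {
  const capitalize = (w) => w[0].toUpperCase() + w.slice(1);
  return `${match.operator} ${match.args.map(capitalize).join(" ")}`;
}

// Run the stages in order and keep track of sentences that matched nothing.
function translate(text) {
  const dsl = [];
  const unparsed = [];
  for (const sentence of splitSentences(text)) {
    const match = matchPattern(sentence);
    if (match) dsl.push(emitDsl(match));
    else unparsed.push(sentence);
  }
  return { dsl, unparsed };
}
```

Keeping the stages separate means a new pattern only touches the matching stage; splitting and emission stay unchanged.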

2. Supported Patterns

2.1 Rule Patterns

Rules express conditional logic (IF-THEN, universal quantification).

Pattern                        | Example                              | DSL Output
-------------------------------+--------------------------------------+-------------------------------------------------------
If X then Y                    | "If it rains then the ground is wet" | Implies $cond $cons
All/Every/Each X are Y         | "All cats are mammals"               | Implies (isA ?x Cat) (isA ?x Mammal)
Everything that is X is Y      | "Everything that is red is colorful" | Implies (hasProperty ?x red) (hasProperty ?x colorful)
Xs are Ys (plural subsumption) | "Dogs are animals"                   | Implies (isA ?x Dog) (isA ?x Animal)
All Xs verb [object]           | "All birds fly"                      | Implies (isA ?x Bird) (hasProperty ?x fly)

Input:  "If something is a mammal then it is warm-blooded"

Output: @r1 isA ?x Mammal
        @r2 hasProperty ?x warm_blooded
        Implies $r1 $r2
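
As a rough sketch of how a universal rule such as "All cats are mammals" could be turned into the referenced form shown above, the function below matches the sentence with one regular expression and emits two named statements plus an `Implies` line. The name `translateUniversal`, the regular expression, and the naive singularization are assumptions made for illustration, not the grammar AGISystem2 actually uses.

```javascript
// Hypothetical sketch of the "All/Every/Each X are Y" rule pattern.
function translateUniversal(sentence, firstRef = 1) {
  // Naive type normalization: drop a trailing "s", then capitalize.
  const toTypeName = (w) => {
    const singular = w.toLowerCase().replace(/s$/, "");
    return singular[0].toUpperCase() + singular.slice(1);
  };

  const s = sentence.trim().replace(/\.$/, "");
  const m = /^(?:all|every|each)\s+(\w+)\s+(?:are|is)\s+(?:an?\s+)?(\w+)$/i.exec(s);
  if (!m) return null; // not a universal rule in this simplified sense

  const cond = `r${firstRef}`;
  const cons = `r${firstRef + 1}`;
  return [
    `@${cond} isA ?x ${toTypeName(m[1])}`,
    `@${cons} isA ?x ${toTypeName(m[2])}`,
    `Implies $${cond} $${cons}`,
  ];
}
```

Passing `firstRef` lets a caller keep reference names unique across several rules, in the way `@r3`/`@r4` follow `@r1`/`@r2` in longer outputs.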

2.2 Fact Patterns

Facts assert ground truths about specific entities.

Pattern           | Example                | DSL Output
------------------+------------------------+---------------------
X is a/an Y       | "Tom is a cat"         | isA Tom Cat
X is Y (property) | "The ball is red"      | hasProperty Ball red
X is not Y        | "Tom is not a dog"     | Not (isA Tom Dog)
X verbs Y         | "John loves Mary"      | love John Mary
X has Y           | "The cat has whiskers" | has Cat whiskers

Input:  "Marie Curie is a scientist"

Output: isA Marie_Curie Scientist

2.3 Copula Patterns

Copula patterns handle "is/are/was/were" constructions with various complements.

Pattern                           | Example                       | DSL Output
----------------------------------+-------------------------------+------------------
X is in/on/at Y (locative)        | "The book is on the table"    | on Book Table
X is the Y of Z (relational noun) | "Harry is the parent of Jack" | parent Harry Jack
X isn't/aren't Y (contraction)    | "Cats aren't dogs"            | Not (isA Cat Dog)

Input:  "The ball is in the box"

Output: in Ball Box
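
The locative and relational-noun cases can be illustrated with two regular expressions tried in turn. The sketch below is an assumption about how such matching might look; `translateCopula` is a hypothetical name and the contraction case is left out.

```javascript
// Hypothetical sketch of two copula patterns: relational noun and locative.
function translateCopula(sentence) {
  // Strip a leading article, capitalize each word, join multi-word names with "_".
  const entity = (phrase) =>
    phrase
      .trim()
      .replace(/^(?:the|an|a)\s+/i, "")
      .split(/\s+/)
      .map((w) => w[0].toUpperCase() + w.slice(1))
      .join("_");

  const s = sentence.trim().replace(/\.$/, "");

  // "X is the Y of Z" -> Y X Z
  let m = /^(.+?)\s+(?:is|are|was|were)\s+the\s+(\w+)\s+of\s+(.+)$/i.exec(s);
  if (m) return `${m[2].toLowerCase()} ${entity(m[1])} ${entity(m[3])}`;

  // "X is in/on/at Y" -> in|on|at X Y
  m = /^(.+?)\s+(?:is|are|was|were)\s+(in|on|at)\s+(.+)$/i.exec(s);
  if (m) return `${m[2].toLowerCase()} ${entity(m[1])} ${entity(m[3])}`;

  return null; // not one of the two copula shapes covered here
}
```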

2.4 Relation Patterns

Relation patterns handle verb-based relationships between entities.

Pattern                | Example                 | DSL Output
-----------------------+-------------------------+---------------------
X verbs Y              | "Alice knows Bob"       | know Alice Bob
X verbs (intransitive) | "Birds fly"             | hasProperty Bird fly
X doesn't verb Y       | "Tom doesn't like fish" | Not (like Tom Fish)
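
The three relation shapes in the table can be sketched with a few regular expressions, tried from most to least specific. Everything here is illustrative: `translateRelation` is a hypothetical name, and verb and noun normalization is reduced to dropping a trailing "s".

```javascript
// Hypothetical sketch of the relation patterns; word handling is deliberately naive.
function translateRelation(sentence) {
  const capitalize = (w) => w[0].toUpperCase() + w.slice(1);
  const dropS = (w) => w.replace(/s$/, ""); // "knows" -> "know", "Birds" -> "Bird"

  const s = sentence.trim().replace(/\.$/, "");

  // Negated transitive: "X doesn't verb Y"
  let m = /^(\w+)\s+(?:doesn't|does not)\s+(\w+)\s+(\w+)$/i.exec(s);
  if (m) return `Not (${m[2].toLowerCase()} ${capitalize(m[1])} ${capitalize(m[3])})`;

  // Transitive: "X verbs Y"
  m = /^(\w+)\s+(\w+)\s+(\w+)$/.exec(s);
  if (m) return `${dropS(m[2].toLowerCase())} ${capitalize(m[1])} ${capitalize(m[3])}`;

  // Intransitive: "X verbs", expressed as a property of X
  m = /^(\w+)\s+(\w+)$/.exec(s);
  if (m) return `hasProperty ${capitalize(dropS(m[1]))} ${dropS(m[2].toLowerCase())}`;

  return null;
}
```

Because the transitive pattern accepts any three words, a sketch like this only makes sense as a late fallback, after the more specific rule, fact, and copula patterns have been tried.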

3. Compound Structures

3.1 Coordination (AND/OR)

Input:  "Tom is a cat and Jerry is a mouse"

Output: isA Tom Cat
        isA Jerry Mouse

Input:  "If X is red and X is round then X is colorful"

Output: @r1 hasProperty ?x red
        @r2 hasProperty ?x round
        @r3 And $r1 $r2
        @r4 hasProperty ?x colorful
        Implies $r3 $r4
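
The reference chain for a conjunctive rule can be sketched as follows: each antecedent gets its own `@rN` name, the names are folded into `And` statements, and the final `Implies` points at the last conjunction. `emitConjunctiveRule` is a hypothetical helper, and treating `And` as a binary operator that nests for three or more antecedents is an assumption based only on the two-antecedent example above.

```javascript
// Hypothetical sketch: emit referenced statements for "If A and B ... then C".
function emitConjunctiveRule(antecedents, consequent) {
  if (antecedents.length === 0) {
    throw new Error("a rule needs at least one antecedent");
  }
  const lines = [];
  let next = 1;
  // Register a statement under a fresh reference name and return that name.
  const ref = (statement) => {
    const name = `r${next++}`;
    lines.push(`@${name} ${statement}`);
    return name;
  };

  const names = antecedents.map(ref);
  // Fold the antecedent references left to right into binary And statements.
  let condition = names[0];
  for (const name of names.slice(1)) {
    condition = ref(`And $${condition} $${name}`);
  }
  const conclusion = ref(consequent);
  lines.push(`Implies $${condition} $${conclusion}`);
  return lines;
}
```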

3.2 Negation

Input:  "Penguins are not able to fly"

Output: Not (hasProperty Penguin able_to_fly)

3.3 Quantified Subjects

Input:  "Every red thing is colorful"

Output: @r1 hasProperty ?x red
        @r2 hasProperty ?x colorful
        Implies $r1 $r2

4. Entity Normalization

NL2DSL automatically normalizes entities for consistent representation:

Transformation        | Example            | Result
----------------------+--------------------+-------------
Remove articles       | "the cat", "a dog" | Cat, Dog
Capitalize type names | "is a mammal"      | isA X Mammal
Multi-word entities   | "Marie Curie"      | Marie_Curie
Singularize for types | "All cats are..."  | isA ?x Cat
Verb normalization    | "loves", "loved"   | love
Property sanitization | "warm-blooded"     | warm_blooded
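
The transformations in the table can be approximated with a few string helpers. The functions below are hypothetical and use simple suffix rules; how NL2DSL actually singularizes nouns or lemmatizes verbs is not specified here, and a robust version would presumably need a lexicon.

```javascript
// Hypothetical normalization helpers; suffix rules are a deliberate simplification.
function capitalize(word) {
  return word[0].toUpperCase() + word.slice(1);
}

// "the cat" -> "Cat", "Marie Curie" -> "Marie_Curie"
function normalizeEntity(phrase) {
  return phrase
    .trim()
    .replace(/^(?:the|an|a)\s+/i, "")
    .split(/\s+/)
    .map(capitalize)
    .join("_");
}

// "cats" -> "Cat" (naive: drops one trailing "s")
function normalizeType(noun) {
  return capitalize(noun.toLowerCase().replace(/s$/, ""));
}

// "loves" -> "love", "loved" -> "love"
// Only correct for verbs whose base form ends in "e"; "walked" would become "walke".
function normalizeVerb(verb) {
  const v = verb.toLowerCase();
  if (v.endsWith("ed") || v.endsWith("s")) return v.slice(0, -1);
  return v;
}

// "warm-blooded" -> "warm_blooded"
function sanitizeProperty(property) {
  return property.trim().replace(/[^A-Za-z0-9]+/g, "_");
}
```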

5. Operator Auto-Declaration

When NL2DSL encounters an unknown verb/relation, it can automatically declare it as a new operator. This enables processing of domain-specific vocabulary without pre-registration.

Input:  "John supervises Mary"  (unknown operator "supervise")

Output: @supervise:supervise __Relation   # Auto-declared
        supervise John Mary

Enable with: { autoDeclareUnknownOperators: true }
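
One way to picture auto-declaration is a set of known operators that is consulted before each relation is emitted. The sketch below is not the NL2DSL implementation: `emitRelation` and the `knownOperators` set are hypothetical, and returning an error when the option is off is an assumption about the disabled behavior. Only the declaration line format and the option name are taken from the example above.

```javascript
// Hypothetical sketch of operator auto-declaration.
function emitRelation(operator, args, knownOperators, options = {}) {
  const lines = [];
  if (!knownOperators.has(operator)) {
    if (!options.autoDeclareUnknownOperators) {
      // Assumed behavior when the option is off: report instead of emitting.
      return { lines, error: `Unknown operator: ${operator}` };
    }
    lines.push(`@${operator}:${operator} __Relation`);
    knownOperators.add(operator); // declare each operator only once
  }
  lines.push(`${operator} ${args.join(" ")}`);
  return { lines, error: null };
}
```

Mutating the shared set means a second sentence using the same verb does not repeat the declaration.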

6. Fallback: Opaque Statements

When a sentence cannot be parsed, NL2DSL can emit an opaque, hash-based reference so that the sentence is recorded rather than silently dropped:

Input:  "The quantum entanglement defies classical intuition"  (unparseable)

Output: hasProperty KB opaque_ctx_a1b2c3d4e5  # Hash-based reference

Enable with: { fallbackOpaqueStatements: true }
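
A hash-based opaque reference can be sketched with any deterministic string hash. The example below uses 32-bit FNV-1a purely as a stand-in: the hash function NL2DSL uses and the length of its suffix are not stated here, so the 8-hex-digit suffix is an assumption, and `opaqueStatement` is a hypothetical name.

```javascript
// Stand-in hash: 32-bit FNV-1a over UTF-16 code units (not necessarily what NL2DSL uses).
function fnv1a32(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits
  }
  return hash.toString(16).padStart(8, "0");
}

// Hypothetical helper: wrap an unparseable sentence in an opaque statement.
function opaqueStatement(sentence) {
  const normalized = sentence.trim().toLowerCase();
  return `hasProperty KB opaque_ctx_${fnv1a32(normalized)}`;
}
```

Hashing a normalized copy of the sentence keeps the reference stable across differences in case and surrounding whitespace.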

7. API Usage

import { translateContextWithGrammar } from './src/nlp/nl2dsl/grammar.mjs';

const text = `
  All cats are mammals.
  All mammals are animals.
  Tom is a cat.
`;

const result = translateContextWithGrammar(text, {
  autoDeclareUnknownOperators: true,
  fallbackOpaqueStatements: false
});

console.log(result.dsl);
// Output:
// @r1 isA ?x Cat
// @r2 isA ?x Mammal
// Implies $r1 $r2
// @r3 isA ?x Mammal
// @r4 isA ?x Animal
// Implies $r3 $r4
// isA Tom Cat

console.log(result.stats);
// { sentencesTotal: 3, sentencesParsed: 3, sentencesOpaque: 0, autoDeclaredOperators: 0 }

console.log(result.errors);
// [] (empty if all parsed successfully)
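
A caller may want to check the result before handing the DSL to the reasoner. The helper below is a hypothetical sketch that relies only on the `stats` and `errors` fields shown above; reading `sentencesParsed / sentencesTotal` as a coverage ratio is an assumption about what those counters mean.

```javascript
// Hypothetical helper; only the field names from the example output are assumed.
function summarizeTranslation(result) {
  const { sentencesTotal, sentencesParsed, sentencesOpaque } = result.stats;
  return {
    // True when no errors were reported and every sentence was parsed.
    ok: result.errors.length === 0 && sentencesParsed === sentencesTotal,
    // Fraction of sentences parsed; defined as 1 for empty input.
    coverage: sentencesTotal === 0 ? 1 : sentencesParsed / sentencesTotal,
    opaque: sentencesOpaque,
  };
}
```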

8. Known Limitations

Current Limitations:

9. Pattern Priority

Patterns are matched in this priority order:

  1. Rule patterns (IF-THEN, quantifiers) – checked first
  2. Fact patterns (ground assertions) – checked second
  3. Copula patterns (is/are constructions) – within fact parsing
  4. Relation patterns (verb-based) – fallback
  5. Opaque fallback (if enabled) – last resort
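
The priority order amounts to a first-match-wins loop over an ordered list of matchers. The sketch below shows only that dispatch idea: `MATCHERS`, `classifySentence`, and the regular expressions are hypothetical simplifications, not the actual grammar.

```javascript
// Hypothetical matchers, listed in the priority order described above.
const MATCHERS = [
  { name: "rule", test: (s) => /^(?:if|all|every|each|everything)\b/i.test(s) },
  { name: "fact", test: (s) => /^\w+(?:\s\w+)?\s+is\s+(?:not\s+)?an?\s+\w+$/i.test(s) },
  { name: "copula", test: (s) => /\b(?:is|are|was|were)\b/i.test(s) },
  { name: "relation", test: (s) => /^\w+\s+\w+(?:\s+\w+)?$/.test(s) },
];

// Return the name of the first matching pattern family, "opaque" if the
// fallback is enabled and nothing matched, or null otherwise.
function classifySentence(sentence, options = {}) {
  const s = sentence.trim().replace(/[.!?]+$/, "");
  for (const { name, test } of MATCHERS) {
    if (test(s)) return name; // first match wins
  }
  return options.fallbackOpaqueStatements ? "opaque" : null;
}
```

Order matters here: "All cats are mammals" also contains a copula, but it is classified as a rule because the rule matcher is consulted first.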

10. Performance

Benchmark Results: NL2DSL achieves 100% translation success on all tested benchmark suites (ProntoQA, FOLIO, LogiQA, LogicNLI, RuleBERT, bAbI, CLUTRR). See External Benchmarks for details.

Metric                   | Value
-------------------------+---------------------------------
Translation success rate | 100% (all benchmarks)
Average parse time       | <1ms per sentence
Pattern coverage         | ~95% of logical English patterns