The NL2DSL pipeline has three main stages:
Natural Language Input
│
▼
┌───────────────────┐
│ 1. Sentence Split │ Split text into individual sentences
└───────────────────┘
│
▼
┌───────────────────┐
│ 2. Pattern Match │ Match against rule/fact/copula/relation patterns
└───────────────────┘
│
▼
┌───────────────────┐
│ 3. DSL Emit │ Generate DSL statements with references
└───────────────────┘
│
▼
DSL Output
Rules express conditional logic (IF-THEN, universal quantification).
| Pattern | Example | DSL Output |
|---|---|---|
If X then Y |
"If it rains then the ground is wet" | Implies $cond $cons |
All/Every/Each X are Y |
"All cats are mammals" | Implies (isA ?x Cat) (isA ?x Mammal) |
Everything that is X is Y |
"Everything that is red is colorful" | Implies (hasProperty ?x red) (hasProperty ?x colorful) |
Xs are Ys (plural subsumption) |
"Dogs are animals" | Implies (isA ?x Dog) (isA ?x Animal) |
All Xs verb [object] |
"All birds fly" | Implies (isA ?x Bird) (hasProperty ?x fly) |
Input: "If something is a mammal then it is warm-blooded" → Output: @r1 isA ?x Mammal @r2 hasProperty ?x warm_blooded Implies $r1 $r2
Facts assert ground truths about specific entities.
| Pattern | Example | DSL Output |
|---|---|---|
X is a/an Y |
"Tom is a cat" | isA Tom Cat |
X is Y (property) |
"The ball is red" | hasProperty Ball red |
X is not Y |
"Tom is not a dog" | Not (isA Tom Dog) |
X verbs Y |
"John loves Mary" | love John Mary |
X has Y |
"The cat has whiskers" | has Cat whiskers |
Input: "Marie Curie is a scientist" → Output: isA Marie_Curie Scientist
Copula patterns handle "is/are/was/were" constructions with various complements.
| Pattern | Example | DSL Output |
|---|---|---|
X is in/on/at Y (locative) |
"The book is on the table" | on Book Table |
X is the Y of Z (relational noun) |
"Harry is the parent of Jack" | parent Harry Jack |
X isn't/aren't Y (contraction) |
"Cats aren't dogs" | Not (isA Cat Dog) |
Input: "The ball is in the box" → Output: in Ball Box
Relation patterns handle verb-based relationships between entities.
| Pattern | Example | DSL Output |
|---|---|---|
X verbs Y |
"Alice knows Bob" | know Alice Bob |
X verbs (intransitive) |
"Birds fly" | hasProperty Bird fly |
X doesn't verb Y |
"Tom doesn't like fish" | Not (like Tom Fish) |
Input: "Tom is a cat and Jerry is a mouse" → Output: isA Tom Cat isA Jerry Mouse
Input: "If X is red and X is round then X is colorful" → Output: @r1 hasProperty ?x red @r2 hasProperty ?x round @r3 And $r1 $r2 @r4 hasProperty ?x colorful Implies $r3 $r4
Input: "Penguins are not able to fly" → Output: Not (hasProperty Penguin able_to_fly)
Input: "Every red thing is colorful" → Output: @r1 hasProperty ?x red @r2 hasProperty ?x colorful Implies $r1 $r2
NL2DSL automatically normalizes entities for consistent representation:
| Transformation | Example | Result |
|---|---|---|
| Remove articles | "the cat", "a dog" | Cat, Dog |
| Capitalize type names | "is a mammal" | isA X Mammal |
| Multi-word entities | "Marie Curie" | Marie_Curie |
| Singularize for types | "All cats are..." | isA ?x Cat |
| Verb normalization | "loves", "loved" | love |
| Property sanitization | "warm-blooded" | warm_blooded |
Input: "John supervises Mary" (unknown operator "supervise") → Output: @supervise:supervise __Relation # Auto-declared supervise John Mary
Enable with: { autoDeclareUnknownOperators: true }
When a sentence cannot be parsed, NL2DSL can create an opaque reference (hash-based) to preserve the information without losing it:
Input: "The quantum entanglement defies classical intuition" (unparseable) → Output: hasProperty KB opaque_ctx_a1b2c3d4e5 # Hash-based reference
Enable with: { fallbackOpaqueStatements: true }
import { translateContextWithGrammar } from './src/nlp/nl2dsl/grammar.mjs';
const text = `
All cats are mammals.
All mammals are animals.
Tom is a cat.
`;
const result = translateContextWithGrammar(text, {
autoDeclareUnknownOperators: true,
fallbackOpaqueStatements: false
});
console.log(result.dsl);
// Output:
// @r1 isA ?x Cat
// @r2 isA ?x Mammal
// Implies $r1 $r2
// @r3 isA ?x Mammal
// @r4 isA ?x Animal
// Implies $r3 $r4
// isA Tom Cat
console.log(result.stats);
// { sentencesTotal: 3, sentencesParsed: 3, sentencesOpaque: 0, autoDeclaredOperators: 0 }
console.log(result.errors);
// [] (empty if all parsed successfully)
Patterns are matched in this priority order:
| Metric | Value |
|---|---|
| Translation success rate | 100% (all benchmarks) |
| Average parse time | <1ms per sentence |
| Pattern coverage | ~95% of logical English patterns |