AGISystem2 deliberately understands only a small, precise interaction language. At the engine boundary, everything is expressed as Sys2DSL statements built from simple subject–relation–object shapes. The goal is not to mimic human style but to provide a form that can be mapped into high-dimensional conceptual space without ambiguity. This page describes those canonical shapes, how they appear in Sys2DSL, and how they relate to any upstream natural-language normalisation.

The Canonical Assertion and Question Forms

Every core fact has three parts: subject, relation, and object. In canonical form this is a triple such as Dog IS_A Animal, which says that the concept labelled "Dog" belongs to the broader concept "Animal". Properties are expressed using specific relations that connect subjects to value concepts: Water BOILS_AT Celsius100 means the concept "Water" has its boiling point at the value concept "Celsius100".

In Sys2DSL, such assertions are always written as explicit statements, for example:

@f1 ASSERT Dog IS_A Animal
@f2 ASSERT Water BOILS_AT Celsius100
@f3 ASSERT Celsius100 IS_A temperature
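
Internally, each such statement reduces to one subject–relation–object record. A minimal sketch of that shape in Python (the `Triple` class and field names are illustrative, not part of the engine):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    """One canonical subject-relation-object fact."""
    subject: str
    relation: str
    obj: str

# The three assertions from the example above, as records.
facts = [
    Triple("Dog", "IS_A", "Animal"),
    Triple("Water", "BOILS_AT", "Celsius100"),
    Triple("Celsius100", "IS_A", "temperature"),
]
```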

From a geometric perspective, both subjects and values are points in conceptual space. The relation BOILS_AT connects them through permutation binding. This approach keeps values as first-class concepts that can be reasoned about independently (e.g., "what else has this temperature?").
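
The text does not fix an implementation, but permutation binding is commonly realised over high-dimensional vectors as a fixed, invertible coordinate permutation per relation. A minimal sketch, assuming random bipolar concept vectors and a cyclic shift for BOILS_AT (both are assumptions, not the engine's actual encoding):

```python
import numpy as np

DIM = 1024
rng = np.random.default_rng(0)
_concepts = {}

def concept(label):
    # Each concept label gets a stable random bipolar vector:
    # a point in the high-dimensional conceptual space.
    if label not in _concepts:
        _concepts[label] = rng.choice([-1, 1], size=DIM)
    return _concepts[label]

# Assumption: a relation is realised as a fixed coordinate permutation.
# Here BOILS_AT is modelled as a cyclic shift by an arbitrary offset.
BOILS_AT_SHIFT = 7

def bind_boils_at(value_vec):
    return np.roll(value_vec, BOILS_AT_SHIFT)

def unbind_boils_at(bound_vec):
    # The permutation is invertible, so the value concept is recoverable.
    return np.roll(bound_vec, -BOILS_AT_SHIFT)

bound = bind_boils_at(concept("Celsius100"))
assert np.array_equal(unbind_boils_at(bound), concept("Celsius100"))
```

Because the permutation is invertible, the bound value stays a first-class concept: un-permuting recovers exactly the point for "Celsius100", which is what allows follow-up queries such as "what else has this temperature?".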

Questions follow the same subject–relation–object backbone but are phrased as queries. In Sys2DSL they appear as @q ASK "Dog IS_A Animal?" or @q ASK "Water BOILS_AT Celsius100?". Internally, the parser and encoder reconstruct the triple and ask whether the corresponding point lies inside the appropriate concept region. In other words, both assertions and questions ultimately become vectors, masks, and bounded‑diamond membership tests, as explained in the Conceptual Spaces and Reasoning chapters.
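
As an illustration of such a membership test, a bounded diamond can be read as an L1 (city-block) ball: a point belongs to a concept region when its L1 distance to the region's centre is within the region's radius. The function below is a hedged sketch under that assumption, not the engine's actual test:

```python
import numpy as np

def in_diamond(point, centre, radius):
    # A bounded "diamond" region as an L1 ball: the set of points whose
    # city-block distance to the centre is at most `radius`.
    return np.sum(np.abs(point - centre)) <= radius

centre = np.zeros(4)
assert in_diamond(np.array([0.2, -0.1, 0.0, 0.1]), centre, 0.5)       # inside
assert not in_diamond(np.array([1.0, 1.0, 0.0, 0.0]), centre, 0.5)    # outside
```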

Counterfactual and Deontic Forms

Some statements describe temporary, hypothetical contexts rather than facts that should be stored permanently. In Sys2DSL these are expressed with the CF action, which combines a question with an inline theory layer. For example:

@cf CF "Water BOILS_AT Celsius50?"
     | Water BOILS_AT Celsius50

For the duration of this reasoning step, the engine adds the given fact as a temporary overlay and answers the question under that overlay. The corresponding geometric adjustments are confined to a cloned theory stack and discarded after the query completes, leaving the base theory unchanged.
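
A minimal sketch of that overlay discipline, assuming the theory stack is a list of fact layers (the class and method names are illustrative):

```python
class TheoryStack:
    """Minimal sketch of a layered fact store."""

    def __init__(self, layers=None):
        self.layers = layers if layers is not None else [set()]

    def assert_fact(self, triple):
        self.layers[-1].add(triple)

    def holds(self, triple):
        return any(triple in layer for layer in self.layers)

    def with_overlay(self, overlay_facts):
        # Clone the stack and push a temporary layer; the base is untouched
        # and the clone is simply discarded after the query completes.
        return TheoryStack([layer.copy() for layer in self.layers]
                           + [set(overlay_facts)])

base = TheoryStack()
base.assert_fact(("Water", "BOILS_AT", "Celsius100"))

cf = base.with_overlay({("Water", "BOILS_AT", "Celsius50")})
assert cf.holds(("Water", "BOILS_AT", "Celsius50"))        # true under the overlay
assert not base.holds(("Water", "BOILS_AT", "Celsius50"))  # base theory unchanged
```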

Normative sentences rely on deontic relations such as PERMITS, PROHIBITS, and OBLIGATES, together with their inverse forms such as PROHIBITED_BY. A statement like @f ASSERT ExportData PROHIBITED_BY GDPR assigns a deontic status to the action "ExportData" under the regime labelled "GDPR". Causal and temporal relations use inverse pairs such as CAUSES/CAUSED_BY and BEFORE/AFTER. Structural relations include IS_A, PART_OF/HAS_PART, and LOCATED_IN/CONTAINS. Property relations use specific verbs such as BOILS_AT, HAS_COLOR, and WEIGHS rather than a generic HAS_PROPERTY with compound tokens. Relation definitions and their geometric mappings are specified in the relation design specs and the RelationPermuter.
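
The inverse pairs named above can be captured in a small registry. The sketch below is illustrative (the registry layout and the invert helper are assumptions; only the relation names come from the text):

```python
# Inverse pairs as given in the text; the dict structure is an assumption.
INVERSES = {
    "CAUSES": "CAUSED_BY",
    "BEFORE": "AFTER",
    "PART_OF": "HAS_PART",
    "LOCATED_IN": "CONTAINS",
}
# Make the table symmetric so lookups work in both directions.
INVERSES.update({v: k for k, v in list(INVERSES.items())})

def invert(subject, relation, obj):
    """Rewrite a triple using the inverse relation, if one is defined."""
    return (obj, INVERSES[relation], subject)

assert invert("Fire", "CAUSES", "Smoke") == ("Smoke", "CAUSED_BY", "Fire")
```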

What the Engine Rejects

The constrained grammar is intentionally strict about what it accepts. At the Sys2DSL level, every fact or question must be expressible as a small number of tokens in a subject–relation–object pattern. Free‑form requests such as “Tell me a story” or “Give me advice” do not map directly to this grammar; they belong outside AGISystem2 or must first be normalised to canonical triples.

Upstream components such as a TranslatorBridge may be used to turn rich natural language into these canonical forms, but the engine itself only sees the Sys2DSL statements. If normalisation cannot produce safe statements, it should fail rather than guess.

How Normalisation and Parsing Cooperate

Once a sentence has been normalised to a canonical triple (for example by an external translator), the parser takes over. Its job is deliberately modest: identify the subject, relation, and object tokens and build a shallow structure that the encoder can turn into vectors. The recursion horizon is kept low so that deeply nested structures cannot pollute the vector with accidental detail.
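
A deliberately shallow parser in that spirit might look like this (the exact token conventions are assumptions for illustration):

```python
def parse_triple(text):
    """Parse a canonical 'Subject RELATION Object' statement or question.

    Deliberately shallow: exactly three tokens, no nesting. The token
    conventions (UPPER_CASE relation in the middle) are assumptions.
    """
    is_question = text.endswith("?")
    tokens = text.rstrip("?").split()
    if len(tokens) != 3:
        raise ValueError(f"not a canonical triple: {text!r}")
    subject, relation, obj = tokens
    if relation != relation.upper():
        raise ValueError(f"not a relation token: {relation!r}")
    return subject, relation, obj, is_question

assert parse_triple("Water BOILS_AT Celsius100?") == \
    ("Water", "BOILS_AT", "Celsius100", True)
assert parse_triple("Dog IS_A Animal") == ("Dog", "IS_A", "Animal", False)
```

Free-form requests simply fail to parse, which is the intended behaviour: the parser refuses rather than guesses.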

In practice, you can think of each fact or question as following a simple pipeline: produce a canonical triple, parse subject/relation/object, encode via vector operations and relation permutations, then carve or query regions in conceptual space. The grammar described on this page specifies the textual forms that are allowed at the engine boundary; the encoding and reasoning specs explain how those forms are mapped into geometry.

Concrete Examples

Assertion:
  @f1 ASSERT Dog IS_A Animal
Property (value as concept):
  @f2 ASSERT Water BOILS_AT Celsius100
  @f3 ASSERT Celsius100 IS_A temperature
Question:
  @q1 ASK "Water BOILS_AT Celsius100?"
Counterfactual:
  @cf CF "Water BOILS_AT Celsius50?"
      | Water BOILS_AT Celsius50
Deontic:
  @f4 ASSERT ExportData PROHIBITED_BY GDPR
Causal:
  @f5 ASSERT Fire CAUSES Smoke
Temporal:
  @f6 ASSERT EventA BEFORE EventB
Structural:
  @f7 ASSERT Engine PART_OF Car

These examples are deliberately simple, but they capture the core shapes that the engine understands. More complex scenarios are expressed as Sys2DSL programmes that build on these triples, bind concepts and points, apply masks, and combine intermediate results via variables and topological evaluation.

Why Such a Strict Grammar?

The decision to keep the grammar small is a trade-off between flexibility and clarity. Every extra degree of freedom in language must be reflected in the encoding and reasoning layers, multiplying the ways in which ambiguity can slip in. By constraining inputs to a handful of relations and simple token patterns, AGISystem2 gains several advantages: parsing is deterministic, the mapping from text to vectors is transparent, and explanations can directly reference the original sentences without hidden transformations. TranslatorBridge assumes the burden of dealing with messy human language, allowing the core engine to remain lean and exact.

For developers coming from traditional machine learning backgrounds, this may feel strict compared to end-to-end text models. The benefit is that when something goes wrong you can often point to a specific sentence shape or relation choice as the cause, rather than to a mysterious shift in a large neural network. The Grammar, Conceptual Spaces, and Algorithms chapters together form a small wiki that explains how this constrained language maps into geometry and why that constraint is a feature, not a limitation, for safety-critical and auditable systems.