Coding Style & Architecture (semantic)

Reference implementation priorities: correctness, clarity, composability. Lower layers stay generic & abstract; domain/presentation live above clean interfaces.

0) North star

  • Small, explicit mechanisms over clever magic.

  • Deterministic and auditable: same inputs → same outputs; mutations become artifacts.

1) Layering & dependencies

core   (entities, registries, graph topology, records, dispatch, capability)
  ↑
vm     (phases, frame/context, planning, provisioning, events & replay, ledger)
  ↑
service (domains/adapters, IO ports, orchestration, media/presentation hints)
  ↑
app    (CLI, notebooks, integrations, demo scenarios)
  ↑
presentation (renderers, web/UI)

Rules

  • One-way arrows; no imports up the stack.

  • Lower layers define data shapes + contracts; upper layers implement policies.

  • Cross-layer communication via handlers and records—not hidden globals.

2) Packages & modules

  • Package for a new conceptual layer (core, vm, service).

  • Module for cohesive micro-domains (replay/event.py, planning/offer.py).

  • Single-class module is fine; keep module docstring to 1–3 lines or omit.

3) Class design

  • Nouns are small: data-first + a few sharp methods.

  • Records are immutable: Event, Patch, Snapshot, Fragment.

  • Hooks over overrides where it improves clarity—use _structure_post/_unstructure_post to avoid mutual recursion.
    Alternate pattern allowed: explicit structure/unstructure overrides with a clear base case and round-trip tests.

4) Data model & serialization

  • Pydantic v2 models for entities and records.

  • Entity.structure(data) is the factory; unstructure() emits a minimal, portable dict (class tag + uid).

  • No implicit IO in models; persistence belongs to ledger/service.

4a) Type purity on attributes

  • Store types in their native form; serialize only at boundaries.

    • Hash digests are bytes, not hex strings (content_hash: bytes).

    • UUIDs are UUID objects, not strings.

    • Datetimes are datetime objects, not ISO strings.

    • Entity types are Python Type objects, not fully‑qualified name strings.

  • Use Pydantic serializers for JSON/YAML output.

    • Add @field_serializer for rich types needing string form (e.g., bytes.hex()).

    • Keep in‑memory types pure for comparisons, hashing, and fast operations.

    • Example: content_hash: bytes serializes to hex via a serializer, not by storing hex internally.

  • Expose convenience accessors when appropriate.

    • Properties such as .content_hash_hex for logs/UI.

    • Validators accepting multiple input forms (hex string → bytes).

    • Registry/query helpers that accept either native or serialized representations.

  • Rationale.

    • Faster operations: comparing bytes/UUID beats parsing strings.

    • Stronger type safety: mistakes surface at boundaries.

    • Lower overhead: avoid repeated .hex()/str() conversions in hot paths.

    • Clearer semantics: kind: Type[Entity] expresses intent better than a string FQN.

    • Type objects are the native representation in Python; FQNs serve only as serialized forms.

    • Serialization is a boundary concern; runtime behavior should never pay for it.

5) Mutations & replay

  • Plan → Project → Commit. Planning computes intent; commit applies and logs artifacts.

  • Prefer event sourcing: snapshot + canonicalized patches.

  • Receipts are the audit trail of runtime decisions.

6) Handlers & dispatch

  • Handlers are entity-centric; dispatch orders by priority → registration → uid.

  • Selection via Selector criteria; avoid hard-coded switches.

  • Thin handler bodies, rich JobReceipts for aggregation and audit.

  • Priority ordering: Use Priority enum (not strings) for deterministic sort order.

7) Naming & API surface

  • Canon: Graph, Node, Edge, Subgraph; Record, Event, Patch, Snapshot, Fragment; Frame, Context, Ledger; Requirement, Offer, Provisioner.

  • Enum members UPPERCASE; phases are UPPERCASE.

  • Curate public surface with __all__.

8) Extensibility & hooks

  • Registries as extension points; publish selection_criteria.

  • Template-driven provisioning; no hard-coded class awareness.

  • Presentation hints are advisory; clients may ignore.

9) Errors & invariants

  • Fail fast at boundaries (arity/role checks; policy validation).

  • Exceptions include context (ids, labels, policy).

  • No silent coercion.

  • Duplicate prevention: Registry.add() raises ValueError if entity with same UUID already exists (unless same object reference).

10) Determinism & RNG

  • Context.rand is seeded from (graph.uid, cursor.uid, step).

  • No time-based randomness in core/vm.

11) Performance posture

  • Optimize query paths; accept small constants for clarity elsewhere.

  • Canonicalization linear-ish; avoid n² passes.

12) Observability

  • Prefer emitting a Record to logging when it affects reasoning.

  • Minimal structured logs at orchestration edges.

  • Audit tracking: Use Entity.is_dirty to flag non-reproducible mutations (forced jumps, direct eval/exec). Check Registry.any_dirty() to detect tainted state.

13) Tests (contracts > mechanics)

  • Round-trip structure/unstructure for composites.

  • Phase reducers return the documented shape.

  • Provisioning policies enforce required fields.

  • Event canonicalization removes redundant updates; replay reproduces state.

  • Deterministic RNG in tests; monkeypatch randomness when needed.

  • No failing tests in releases: Fix or remove all xfail/skip markers before shipping.

14) Anti-patterns

  • Upward imports (lower layer depending on higher).

  • Hidden globals/state; domain singletons leaking into core/vm.

  • Recursive factories without a base case.

  • Overgrown classes; mix responsibilities → extract helpers.

  • Implicit IO in core/vm.

15) Architectural grooming review

StoryTangl evolves through repeated implementation cycles. Some code is load-bearing architecture; some is scaffolding left over from an earlier cycle. Grooming work should make that distinction clearer.

When reviewing changes, look for architectural drift:

  • Duplicate mechanisms — code that bypasses an existing primitive such as Selector, BehaviorRegistry.chain_execute_all, PhaseCtx.derive, EntityTemplate.materialize, provisioning offers/resolvers, or journal fragments.

  • Wrong-layer ownership — policy or domain behavior in a lower layer, or reusable generic machinery trapped in an upper layer.

  • Parallel runtime paths — new registries, dispatch loops, context types, fragment hierarchies, resolver flows, parser loops, or output channels that compete with the canonical path.

  • Unjustified compatibility — shims, aliases, fallback paths, or defensive probes without a named persistence, interop, or migration consumer.

  • Privileged domain packages — sandbox, mechanics, media, loaders, and app adapters should remain clients of the architecture unless a design note explicitly promotes a concept into the engine kernel.

A grooming change should do at least one of the following:

  1. delete a concept;

  2. unify competing concepts;

  3. move a concept to the layer that owns it;

  4. strengthen an invariant;

  5. simplify a repeated construction pattern;

  6. make the intended public pattern easier to learn.

Avoid taste-only churn. If a change only makes code look nicer but does not improve correctness, conceptual fit, maintainability, or learnability, defer it.

16) Dereferencing & Resolution Patterns

StoryTangl maintains strict separation between identity (UUIDs) and references (resolved objects). This enables serialization, event sourcing, and watched access patterns.

GraphItems: Properties with Implicit Graph Access

Rule: GraphItems store *_id: UUID fields and provide properties that resolve via .graph.

Pattern:

class Edge(GraphItem):
    predecessor_id: Optional[UUID] = None
    successor_id: Optional[UUID] = None
    
    @property
    def predecessor(self) -> Optional[Node]:
        """Resolve predecessor node via graph registry (watched)."""
        return self.graph.get(self.predecessor_id) if self.predecessor_id else None
    
    @predecessor.setter
    def predecessor(self, node: Optional[Node]):
        self.predecessor_id = node.uid if node else None

Why:

  • Every access goes through self.graph.get()WatchedRegistry can intercept

  • Serialization clean: only *_id fields in dict

  • Type hints work: property returns concrete type, field stores UUID

Records: Methods with Explicit Registry Parameter

Rule: Records are frozen and graph-independent. They store *_id: UUID and require explicit registry for dereferencing.

Pattern:

class Record:
    blame_id: Optional[UUID] = None
    
    def origin(self, registry: Registry[Entity]) -> Optional[Entity]:
        """Dereference blame_id via provided registry."""
        return registry.get(self.blame_id) if self.blame_id else None

Why:

  • Explicit: caller must provide registry (matches frozen/independent constraint)

  • No hidden state: can’t cache because record is immutable

  • Clear intent: “this requires a registry to resolve”

Asymmetry is intentional:

  • GraphItems use properties (no args) because .graph is always present

  • Records use methods (registry arg) because they’re topology-independent

Collections & Queries: Fresh Iterators

Rule: All query, filter, and resolution methods return fresh iterators, never cached collections.

Pattern:

@property
def members(self) -> Iterator[GraphItem]:
    """Yield members by resolving member_ids via graph."""
    return (self.graph.get(uid) for uid in self.member_ids)

def edges_in(self, **criteria) -> Iterator[Edge]:
    """Yield incoming edges matching criteria."""
    return self.graph.find_edges(successor=self, **criteria)

Why:

  • Consistency: all queries behave the same way

  • Efficiency: no hidden allocations—caller decides materialization

  • Freshness: always reflects current state (critical for WatchedRegistry)

  • Type safety: Iterator[T] correctly typed

Common gotcha:

# Iterator exhausts on first use
members = subgraph.members
print(list(members))  # [node1, node2]
print(list(members))  # [] - exhausted!

# Solution: materialize explicitly if multiple passes needed
members = list(subgraph.members)  # now reusable

When to materialize:

# Single iteration: use iterator directly
for node in subgraph.members:
    process(node)

# Multiple iterations: materialize first
members = list(subgraph.members)
for node in members: ...
for node in members: ...  # works

# Debug logging: materialize to avoid exhaustion
members = list(subgraph.members)
logger.debug(f"Found {len(members)} members: {members}")

Type hints:

  • Use Iterator[T] for single-use queries

  • Use Iterable[T] if the result is reusable (rare in this codebase)

  • Never use list[T] unless actually materializing

No GraphItem → GraphItem Direct Pointers

Rule: GraphItems must not hold direct references to other GraphItems.

Why:

  • Prevents circular references during serialization

  • Enables WatchedRegistry to intercept all cross-item access

  • Simplifies graph traversal and mutation tracking

Enforcement:

  • Store only *_id: UUID fields

  • Provide properties/methods that resolve via .graph

  • Exclude .graph itself from serialization (exclude=True)

Summary Table

Context

Storage

Access Pattern

Registry Source

Caching

GraphItem

other_id: UUID

@property other(self) -> T

self.graph

No

Record

other_id: UUID

def other(self, reg) -> T

Explicit param

No (frozen)

Query

N/A

def find_all() -> Iterator[T]

self (registry)

No