⚠️ Stale document. This doc predates the v38 rename of ContentAddressable to HasContent and references module paths that no longer exist. The authoritative treatment of content-addressed identity is in engine/src/tangl/core/CORE_DESIGN.md (Trait Axes and Identity Model sections). This file is retained for historical context only and should not be used as a reference for current API usage.
Content-Addressable Records¶
Overview¶
ContentAddressable is a mixin for Record types that need content-based identity in addition to UID-based identity. It automatically computes a content_hash from the record’s content, enabling deduplication, provenance tracking, and content-based lookups.
When to Use¶
Use ContentAddressable when your Record type needs:
Deduplication - Same content should be recognized as identical
Provenance - Track exactly what content was used
Content Lookups - Find records by their content, not just UID
Immutability Verification - Detect if content changes
Usage¶
Basic Usage (Default Hashing)¶
from tangl.core.entity import Record
from tangl.core.record.content_addressable import ContentAddressable

class MyTemplate(Record, ContentAddressable):
    name: str
    archetype: str
    hp: int
    # content_hash is auto-computed at construction from the fields above;
    # uid and other instance metadata are excluded
Custom Hashing¶
from pathlib import Path

from tangl.utils.hashing import compute_data_hash

class MyResource(Record, ContentAddressable):
    path: Path
    metadata: dict

    @classmethod
    def _get_hashable_content(cls, data: dict):
        # Only hash the file content, not metadata
        if 'path' in data:
            return compute_data_hash(Path(data['path']))
        # Returning None skips hashing
        return None
Accessing the Hash¶
template = MyTemplate(name="guard", archetype="soldier", hp=50)
# Full hash (bytes)
full_hash: bytes = template.content_hash
# Truncated hex (for display/logging)
short_id: str = template.get_content_identifier() # First 16 hex chars
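To illustrate how the two forms relate, here is a minimal sketch that derives a 16-character hex identifier from a raw digest. It uses only hashlib; `short_identifier` is a hypothetical helper and the input bytes are made up, so the real get_content_identifier() may differ in detail.

```python
import hashlib

def short_identifier(digest: bytes, length: int = 16) -> str:
    """Hypothetical helper: the first `length` hex characters of a digest."""
    return digest.hex()[:length]

# Stand-in for template.content_hash; the real value comes from the mixin.
full_hash = hashlib.blake2b(b"guard|soldier|50").digest()
short_id = short_identifier(full_hash)

print(f"{len(full_hash)} bytes -> {short_id}")
```

The short form is convenient in logs, but it is a prefix of the full hash, so comparisons that matter should use the full bytes.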
How It Works¶
Automatic Computation¶
When you construct a Record with ContentAddressable:
record = MyRecord(field1="value", field2=42)
The @model_validator calls _get_hashable_content(data)
The result is passed to hashing_func() (from tangl.utils.hashing)
The computed hash is set as the content_hash field
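The construction-time sequence described above can be sketched without tangl or a validator framework. The version below is an assumption-laden stand-in: it uses a frozen dataclass and `__post_init__` in place of the model validator, and `hashing_func` here is a local BLAKE2b-over-JSON function, not the one from tangl.utils.hashing.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass, field, fields
from typing import Optional

def hashing_func(content) -> bytes:
    """Stand-in hasher: canonical JSON bytes fed to BLAKE2b."""
    payload = json.dumps(content, sort_keys=True, default=str).encode("utf-8")
    return hashlib.blake2b(payload).digest()

@dataclass(frozen=True)
class SketchRecord:
    field1: str
    field2: int
    uid: uuid.UUID = field(default_factory=uuid.uuid4)
    content_hash: Optional[bytes] = field(default=None, init=False)

    @classmethod
    def _get_hashable_content(cls, data: dict):
        # Select what contributes to the hash (identity fields are left out).
        return {k: v for k, v in data.items() if k not in {"uid", "content_hash"}}

    def __post_init__(self):
        data = {f.name: getattr(self, f.name) for f in fields(self)}
        content = self._get_hashable_content(data)
        if content is not None:
            # Frozen dataclasses block normal assignment, so set the field once here.
            object.__setattr__(self, "content_hash", hashing_func(content))

a = SketchRecord(field1="value", field2=42)
b = SketchRecord(field1="value", field2=42)
print(a.uid != b.uid, a.content_hash == b.content_hash)
```

Two instances built from the same field values get different uids but the same content hash, which is the property the mixin is meant to provide.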
Default Behavior¶
By default, ContentAddressable hashes the entire record except:
uid (instance-specific)
content_hash (would be circular)
created_at, updated_at (temporal metadata)
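As a rough sketch of that default filter, the snippet below strips the excluded fields from two plain dicts. The constant and function names are hypothetical; only the field names come from the list above.

```python
from datetime import datetime, timezone

# Field names taken from the list above; the constant name is hypothetical.
DEFAULT_EXCLUDED = frozenset({"uid", "content_hash", "created_at", "updated_at"})

def default_hashable_content(data: dict) -> dict:
    """Return the subset of `data` that would contribute to the hash."""
    return {k: v for k, v in data.items() if k not in DEFAULT_EXCLUDED}

first = {"uid": "a1", "name": "guard", "hp": 50,
         "created_at": datetime(2024, 1, 1, tzinfo=timezone.utc)}
second = {"uid": "b2", "name": "guard", "hp": 50,
          "created_at": datetime(2025, 6, 1, tzinfo=timezone.utc)}

# Both reduce to the same hashable content despite different uid and timestamps.
print(default_hashable_content(first) == default_hashable_content(second))
```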
Customization¶
Override _get_hashable_content() to:
Exclude additional fields (like scope, label for templates)
Include only specific fields
Hash external content (file data, URLs)
Skip hashing entirely (return None)
Examples¶
Template Hashing¶
class ActorScript(Record, ContentAddressable):
    name: str
    archetype: str
    hp: int
    scope: ScopeSelector | None = None  # Metadata, don't hash
    label: str | None = None  # Metadata, don't hash

    @classmethod
    def _get_hashable_content(cls, data: dict):
        # Hash structure, not metadata
        exclude = {'uid', 'content_hash', 'scope', 'label'}
        return {k: v for k, v in data.items() if k not in exclude}
Result: Templates with the same name, archetype, and hp get the same hash, regardless of scope or label.
Media Resource Hashing¶
from pathlib import Path

from tangl.utils.hashing import compute_data_hash

class MediaRIT(Entity, ContentAddressable):
    path: Path | None = None
    data: bytes | None = None

    @classmethod
    def _get_hashable_content(cls, data: dict):
        # Hash actual file/data content
        if 'data' in data:
            return data['data']
        elif 'path' in data:
            return compute_data_hash(Path(data['path']))
        raise ValueError("Must provide data or path")
Result: Files with the same content get the same hash, even if their paths differ.
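The path-independence claim can be checked with a small stand-in for compute_data_hash. The sketch below streams file bytes through BLAKE2b; `hash_file_content` is a hypothetical name and the file names are invented for the demo, so treat it as an illustration rather than the tangl implementation.

```python
import hashlib
import tempfile
from pathlib import Path

def hash_file_content(path: Path, chunk_size: int = 65536) -> bytes:
    """Stand-in for compute_data_hash: BLAKE2b over the file's bytes."""
    hasher = hashlib.blake2b()
    with path.open("rb") as handle:
        # Read in chunks so large media files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.digest()

with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "portrait.png"
    second = Path(tmp) / "copies" / "portrait_copy.png"
    second.parent.mkdir()
    first.write_bytes(b"same image bytes")
    second.write_bytes(b"same image bytes")
    hash_first = hash_file_content(first)
    hash_second = hash_file_content(second)

# Hashing the same bytes in memory gives the same digest as hashing the files.
hash_inline = hashlib.blake2b(b"same image bytes").digest()
print(hash_first == hash_second == hash_inline)
```

If in-memory data and file paths are expected to agree on the final hash, both branches need to end up in the same digest over the same bytes; whether the example above does so depends on how hashing_func treats the value it receives.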
Integration with Registry¶
Because content_hash is marked as an identifier (is_identifier=True), Registry can find records by hash:
# Add templates to registry
template1 = ActorScript(name="guard", archetype="soldier", hp=50)
template2 = ActorScript(name="guard", archetype="soldier", hp=50)  # Same content
registry.add(template1)
registry.add(template2)  # Different uid, same content hash
# Find by content hash identifier
matches = registry.find_all(Selector.from_identifier(template1.content_hash))
assert len(matches) == 2  # Both instances
assert matches[0].content_hash == matches[1].content_hash
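One way to picture hash-based lookup is a secondary index keyed by content hash alongside the primary UID index. The sketch below is a hypothetical toy registry, not tangl's Registry or Selector API, and its `Template` class derives the hash in a property purely for brevity.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass, field
from uuid import UUID, uuid4

@dataclass(frozen=True)
class Template:
    name: str
    hp: int
    uid: UUID = field(default_factory=uuid4)

    @property
    def content_hash(self) -> bytes:
        # Derived from content only, never from uid.
        return hashlib.blake2b(f"{self.name}|{self.hp}".encode()).digest()

class SketchRegistry:
    """Hypothetical registry: UID is the primary key, the hash is an alias."""

    def __init__(self):
        self._by_uid = {}
        self._by_hash = defaultdict(list)

    def add(self, record) -> None:
        self._by_uid[record.uid] = record
        self._by_hash[record.content_hash].append(record.uid)

    def find_all_by_hash(self, content_hash: bytes) -> list:
        return [self._by_uid[uid] for uid in self._by_hash.get(content_hash, [])]

registry = SketchRegistry()
template1 = Template(name="guard", hp=50)
template2 = Template(name="guard", hp=50)
registry.add(template1)
registry.add(template2)
matches = registry.find_all_by_hash(template1.content_hash)
print(len(matches))
```

Because several records can share one hash, the hash index maps to a list of UIDs rather than a single record.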
Provenance Tracking¶
Use content_hash in BuildReceipts to track what was used:
# In provisioner
template = world.template_registry.find_one(Selector.from_identifier("guard"))
receipt = BuildReceipt(
    destination_uid=actor.uid,
    metadata={
        'template_ref': 'guard',
        'template_hash': template.get_content_identifier(),
        # Can verify later that exact template was used
    }
)
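The later verification step mentioned in the comment might look like the sketch below. It works on plain dicts: `content_identifier` and `template_unchanged` are hypothetical helpers, and the receipt is modelled as its metadata dict only, not as a real BuildReceipt.

```python
import hashlib
import json

def content_identifier(content: dict, length: int = 16) -> str:
    """Stand-in for get_content_identifier(): truncated BLAKE2b hex digest."""
    payload = json.dumps(content, sort_keys=True).encode("utf-8")
    return hashlib.blake2b(payload).hexdigest()[:length]

def template_unchanged(receipt_metadata: dict, current_content: dict) -> bool:
    """Compare the hash recorded at build time with the template as it is now."""
    return receipt_metadata["template_hash"] == content_identifier(current_content)

guard = {"name": "guard", "archetype": "soldier", "hp": 50}
receipt_metadata = {
    "template_ref": "guard",
    "template_hash": content_identifier(guard),
}

# Simulate someone editing the template after the actor was built.
edited = {**guard, "hp": 60}
print(template_unchanged(receipt_metadata, guard), template_unchanged(receipt_metadata, edited))
```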
Best Practices¶
DO:¶
✅ Use for immutable content (templates, resources)
✅ Exclude metadata from hash (scope, labels, timestamps)
✅ Document what fields are hashed in _get_hashable_content()
✅ Use get_content_identifier() for logging
DON’T:¶
❌ Use for frequently-mutating records (defeats caching)
❌ Hash sensitive data without considering privacy
❌ Assume hash uniqueness (collisions theoretically possible)
❌ Use hash as primary key (UID is primary, hash is alias)
Performance Notes¶
Hash computation happens once at construction
Records are frozen (immutable), so hash never changes
No caching needed (computed once, stored forever)
hashing_func() is fast (BLAKE2b or SHA-224)
Troubleshooting¶
“Hash not computed”
_get_hashable_content() returned None
Check your override implementation
“Same content, different hashes”
Metadata fields being included in hash
Add them to the exclude set in _get_hashable_content()
“Different content, same hash” (collision)
Astronomically unlikely with Blake2b/SHA224
Report as bug if confirmed
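For the "same content, different hashes" case, it can help to diff the dicts that feed the hash. The helper below is a hypothetical debugging aid that works on plain dicts; it assumes you can obtain the hashable content of both records, for example by calling your _get_hashable_content() override directly.

```python
def differing_fields(first: dict, second: dict) -> dict:
    """Map each field whose value differs to its (first, second) pair."""
    keys = first.keys() | second.keys()
    return {
        key: (first.get(key), second.get(key))
        for key in sorted(keys)
        if first.get(key) != second.get(key)
    }

content_a = {"name": "guard", "hp": 50, "label": "gate guard"}
content_b = {"name": "guard", "hp": 50, "label": "tower guard"}

# Any field listed here is leaking into the hash and may belong in the exclude set.
print(differing_fields(content_a, content_b))
```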
See Also¶
MediaResourceInventoryTag - Example using ContentAddressable
BaseScriptItem - Templates using ContentAddressable
Registry - Content-based lookups