ADR-132: HOMECORE-RECORDER — State History + Semantic Search
| Field | Value |
|---|---|
| Status | Accepted |
| Date | 2026-05-25 |
| Deciders | ruv |
| Codename | HOMECORE-RECORDER |
| Crate | v2/crates/homecore-recorder |
| Relates to | ADR-126 (HOMECORE master — series map row ADR-132), ADR-127 (HOMECORE-CORE state machine), ADR-124 (ruvector/SENSE-BRIDGE), ADR-130 (HOMECORE-API query surface, downstream) |
| Tracking issue | #800 (HOMECORE intake) |
Documented retroactively (2026-06-12). The `homecore-recorder` crate shipped under the ADR-126 series map (which planned an "ADR-132 HOMECORE-RECORDER"), but the standalone ADR file was never written; the crate's `Cargo.toml`, `README.md`, `lib.rs`, `schema.rs`, and `semantic.rs` all cite "ADR-132". This ADR reverse-documents the decision that the shipped, tested code already embodies (ADR-164 Gap G3 / Coverage-Gaps Lens §A). It does not introduce new design; it records what is built. The date reflects the crate's intake era (first commit `e96ebaea8`, 2026-05-25); the real-impl pass landed in `7c8071145` (2026-06-11).
1. Context
ADR-126 (the HOMECORE master) decided to reimplement Home Assistant (HA) natively in Rust. HA persists every state change to a SQLite recorder database; downstream features (history graphs, the logbook, long-term statistics, automation conditions that reference past state) all read that store. HOMECORE therefore needs a durable state-history backbone.
Two forces shape the decision:
- Migration / coexistence. Users adopting HOMECORE will have an existing HA `recorder` database. Reusing HA's on-disk schema (rather than inventing a new one) lets HOMECORE read an existing HA `home-assistant_v2.db` directly and lets HA-aware tooling read HOMECORE's store. This is the same trust boundary that `homecore-migrate` (ADR-165) handles for `.storage/*.json`.
- Semantic queries. HA history is queried with SQL `BETWEEN`/`WHERE` clauses. The HOMECORE platform already carries ruvector (ADR-124) for vector search, so the recorder can additionally embed state changes and answer natural-language queries ("which kitchen devices were warm at 3 PM?") via k-NN, a capability HA does not have.
The recorder is the durable-state surface: if it is wrong, history, logbook, and historical-condition automations are all wrong. ADR-164 flagged it as a CRITICAL coverage gap precisely because such a load-bearing crate had no governing ADR.
2. Decision
Ship homecore-recorder as a SQLite state-history recorder with an HA-compatible schema
and an optional ruvector-backed semantic index, in three phases. P1 and P2 are built and
tested; P3 is planned.
2.1 Storage — SQLite with the HA recorder schema (P1, shipped)
- Persist via `sqlx` with the SQLite backend only (no Postgres, no TLS feature set).
- Mirror HA recorder schema v48 so the store is bidirectionally readable (`src/schema.rs`):
  - `state_attributes`: shared attribute JSON blobs, deduped by an FNV-1a 64-bit hash stored as a signed `i64` (matches HA's dedup key);
  - `states`: one row per state write (`entity_id`, `state`, `attributes_id` FK, `last_changed_ts`/`last_updated_ts` as REAL Unix seconds, `context_id` UUID);
  - `events`: domain events (`event_type`, `event_data` JSON, `time_fired_ts`);
  - `recorder_runs`: boot/shutdown bookends for history-gap detection.
- All DDL uses `CREATE TABLE IF NOT EXISTS`, so schema application is idempotent and safe on every startup.
- Default persistence path: `.homecore/home.db` (configurable).
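A minimal sketch of the `state_attributes` dedup key, assuming the shared attribute JSON is hashed as its UTF-8 bytes with the standard FNV-1a 64-bit parameters. `fnv1a_64` and `attributes_hash_key` are hypothetical names, not the crate's API, and the exact input serialization used by `src/schema.rs` is not specified in this ADR.

```rust
/// Standard FNV-1a 64-bit parameters (offset basis and prime).
const FNV_OFFSET_BASIS_64: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME_64: u64 = 0x0000_0100_0000_01b3;

/// FNV-1a: XOR each byte into the state, then multiply by the prime,
/// wrapping modulo 2^64.
pub fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS_64;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(FNV_PRIME_64);
    }
    hash
}

/// Hypothetical helper: the dedup key as stored in a signed SQLite INTEGER
/// column. `as i64` reinterprets the same 64 bits (no truncation), so values
/// with the top bit set become negative but stay distinct.
pub fn attributes_hash_key(shared_attrs_json: &str) -> i64 {
    fnv1a_64(shared_attrs_json.as_bytes()) as i64
}
```

The `as i64` step is needed because SQLite INTEGER values are signed 64-bit: reinterpreting the bits keeps every hash distinct, at the cost of roughly half the keys reading as negative numbers.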
2.2 Capture — listener on the HOMECORE event bus (P1, shipped)
- `RecorderListener` subscribes to the HOMECORE event bus (ADR-127) and captures `StateChanged` events, writing snapshots through `Recorder` (`src/listener.rs`, `src/db.rs`).
- A `DedupEngine` (`src/dedup.rs`) skips redundant writes when the state hash is unchanged, matching HA's stateful-listener behaviour.
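The per-entity dedup rule can be sketched as follows. This is illustrative only: `DedupSketch` is a hypothetical stand-in for `DedupEngine`, it assumes the "state hash" covers the state string plus the attribute JSON, and it uses std's `DefaultHasher` rather than whatever hash the crate actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for `DedupEngine`: remembers the last state hash
/// written per entity and reports whether a new snapshot is redundant.
#[derive(Default)]
pub struct DedupSketch {
    last_hash: HashMap<String, u64>,
}

impl DedupSketch {
    /// Assumed definition of the state hash: state value plus attribute JSON.
    fn state_hash(state: &str, attributes_json: &str) -> u64 {
        let mut h = DefaultHasher::new();
        state.hash(&mut h);
        attributes_json.hash(&mut h);
        h.finish()
    }

    /// Returns `true` (and records the new hash) on the first sighting of an
    /// entity or when its hash changed; `false` when the write is redundant.
    pub fn should_write(&mut self, entity_id: &str, state: &str, attributes_json: &str) -> bool {
        let hash = Self::state_hash(state, attributes_json);
        if self.last_hash.get(entity_id) == Some(&hash) {
            return false;
        }
        self.last_hash.insert(entity_id.to_string(), hash);
        true
    }
}
```

Keeping only the last hash per entity bounds the engine's memory by the number of entities rather than the number of writes.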
2.3 Semantic search — ruvector HNSW (P2, shipped, feature-gated)
- Behind the `ruvector` Cargo feature, the `Recorder` additionally calls a `SemanticIndex` implementation (`src/semantic.rs`) that embeds state attributes and stores vectors in a `ruvector-core` HNSW index for k-NN search.
- P2 embeddings are hash-based (`sha2`): a deliberate, honest placeholder. They give a working HNSW surface without claiming sentence-level semantic quality.
- When the feature is off, `NullSemanticIndex` satisfies the `SemanticIndex` trait bound with no allocation, so the structural recorder ships independently of ruvector.
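Combining the design above with the embedding details recorded in section 3a (SHA-256 digest → `i32` → `f32`, normalised behind a `norm > 1e-10` guard), a sketch under stated assumptions might look like this. The trait shape, `DIM = 8`, little-endian chunking, `BruteForceIndex`, and `index_snapshot` are all hypothetical. The digest is taken as an input because computing SHA-256 needs the `sha2` crate, and the exact O(n) cosine scan is a plainly swapped-in stand-in for the approximate `ruvector-core` HNSW index.

```rust
/// Embedding width for this sketch: a 32-byte digest split into 4-byte chunks.
pub const DIM: usize = 8;

/// Digest (e.g. SHA-256 output, computed elsewhere) -> unit f32 vector.
/// Each 4-byte chunk is read as an i32 and cast to f32; that cast is always
/// finite, so no NaN/Inf can be produced. An all-zero digest has norm 0 and
/// is rejected by the `norm > 1e-10` guard instead of being divided by zero.
pub fn embed_digest(digest: &[u8; 32]) -> Option<[f32; DIM]> {
    let mut v = [0f32; DIM];
    for (i, c) in digest.chunks_exact(4).enumerate() {
        v[i] = i32::from_le_bytes([c[0], c[1], c[2], c[3]]) as f32;
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 1e-10 {
        for x in v.iter_mut() {
            *x /= norm;
        }
        Some(v)
    } else {
        None
    }
}

/// Hypothetical trait shape; the crate's real `SemanticIndex` may differ.
pub trait SemanticIndex {
    fn insert(&mut self, id: u64, embedding: [f32; DIM]);
    /// Ids of up to `k` nearest stored vectors, best first.
    fn search(&self, query: &[f32; DIM], k: usize) -> Vec<u64>;
}

/// Zero-sized no-op index for builds without the `ruvector` feature.
pub struct NullSemanticIndex;

impl SemanticIndex for NullSemanticIndex {
    fn insert(&mut self, _id: u64, _embedding: [f32; DIM]) {}
    fn search(&self, _query: &[f32; DIM], _k: usize) -> Vec<u64> {
        Vec::new() // an empty Vec does not allocate
    }
}

/// Exact brute-force cosine k-NN, standing in for the HNSW index.
#[derive(Default)]
pub struct BruteForceIndex {
    items: Vec<(u64, [f32; DIM])>,
}

impl SemanticIndex for BruteForceIndex {
    fn insert(&mut self, id: u64, embedding: [f32; DIM]) {
        self.items.push((id, embedding));
    }
    fn search(&self, query: &[f32; DIM], k: usize) -> Vec<u64> {
        // Vectors from `embed_digest` are unit length, so dot product = cosine.
        let mut scored: Vec<(f32, u64)> = self
            .items
            .iter()
            .map(|(id, v)| (v.iter().zip(query).map(|(a, b)| a * b).sum::<f32>(), *id))
            .collect();
        scored.sort_by(|a, b| b.0.total_cmp(&a.0)); // highest similarity first
        scored.into_iter().take(k).map(|(_, id)| id).collect()
    }
}

/// Generic over the index, so calling code is identical for either variant.
/// Returns `false` when the digest is rejected by the norm guard.
pub fn index_snapshot<S: SemanticIndex>(index: &mut S, id: u64, digest: &[u8; 32]) -> bool {
    match embed_digest(digest) {
        Some(e) => {
            index.insert(id, e);
            true
        }
        None => false,
    }
}
```

Because `index_snapshot` is generic over `S: SemanticIndex`, the durable write path compiles the same way whether `S` is a real index or `NullSemanticIndex`.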
2.4 Real sentence embeddings (P3, planned — not yet built)
- Replace the hash embeddings with ruvector-attention sentence embeddings (dim → 384). Not implemented; tracked as a follow-up. The README and `Cargo.toml` label this P3 explicitly.
2.5 Test evidence (as shipped)
- P1: 14 tests (`cargo test -p homecore-recorder --no-default-features`).
- P2: 20 tests (`cargo test -p homecore-recorder --features ruvector`).
3. Consequences
Positive.
- HA-schema compatibility makes migration (ADR-165) and coexistence cheap: HOMECORE can read an existing HA recorder database (`home-assistant_v2.db`), and any SQLite tool can read HOMECORE's history.
- The semantic index is additive and feature-gated: the durable structural recorder has no hard dependency on ruvector, so the storage backbone ships first.
- Standard SQLite means no proprietary export format; history is directly queryable.
Negative / honest limits.
- P2 semantic search uses hash embeddings, not real sentence embeddings — query quality is limited until P3. This is disclosed in the crate docs and here; it must not be cited as semantic-quality-validated.
- No per-crate benchmarks exist yet; the latency figures in the README (state-write p50 < 2 ms, semantic search < 10 ms on 1 M records) are design targets/estimates and still need verification against a criterion baseline.
- Pinning to HA schema v48 couples HOMECORE to a specific HA recorder schema generation; future HA schema bumps require an explicit migration step.
Neutral.
- This ADR governs the recorder crate only. The query/REST surface over recorder data is HOMECORE-API (ADR-130, P3); automation conditions on historical state are HOMECORE-automation (ADR-129, P3).
3a. Security review (2026-06, post-ADR-154–159 sweep)
A beyond-SOTA security review of homecore-recorder covered SQL injection, retention/purge
correctness, fail-closed write integrity, semantic-store NaN poisoning, and PII exposure.
Confirmed clean (with evidence):
- SQL injection: clean. Every query in `db.rs` uses bound `?` parameters; no user- or entity-influenceable value is interpolated into SQL via `format!`/concatenation. The only `format!` builds the `LIKE` pattern string, which is itself bound as a parameter with `ESCAPE '\\'` and escaping of `%`, `_`, and `\`, so a metacharacter payload is matched literally. Pinned by `malicious_entity_id_is_stored_literally_not_executed` (a `'; DROP TABLE states; --` state value leaves the table intact and round-trips verbatim) and `like_metacharacters_in_query_are_literal_not_wildcards`.
- NaN-index poisoning: structurally impossible. Embeddings are SHA-256 → `i32` → `f32`; an `i32` → `f32` cast is always finite (never NaN/Inf), and an all-zero digest is guarded by the `norm > 1e-10` check. Empty-index search, empty-string query, and `k=0` were probed and all return `Ok(0)` with no panic. (Unlike the calibration/vitals/geo paths, no raw sensor float ever reaches the index.)
- Fail-closed writes. A removal event returns `Ok(None)`; semantic-index failure is logged, not propagated, so it never blocks the durable SQLite write; `EntityId` parse failure falls back to a sentinel rather than panicking.
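The `LIKE` escaping described above can be sketched as follows, assuming a substring ("contains") match on `entity_id`. `like_contains_pattern` and `SEARCH_SQL` are hypothetical names, and the crate's actual query text may differ.

```rust
/// Hypothetical helper: escape the LIKE metacharacters `\`, `%` and `_` with a
/// backslash, then wrap the user text in `%...%` for a substring match.
/// The result is meant to be bound as a `?` parameter, never spliced into SQL.
pub fn like_contains_pattern(user_text: &str) -> String {
    let mut pattern = String::with_capacity(user_text.len() + 2);
    pattern.push('%');
    for c in user_text.chars() {
        if matches!(c, '\\' | '%' | '_') {
            pattern.push('\\'); // the escape char declared by ESCAPE '\'
        }
        pattern.push(c);
    }
    pattern.push('%');
    pattern
}

/// The SQL text is a constant; only the pattern varies, and it travels as a
/// bound parameter. The Rust literal "'\\'" is the SQL literal '\'.
pub const SEARCH_SQL: &str =
    "SELECT entity_id, state FROM states WHERE entity_id LIKE ? ESCAPE '\\'";
```

With the pattern bound as a parameter, a payload such as `'; DROP TABLE states; --` is plain data, and `%`/`_` in user input match themselves rather than acting as wildcards.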
Fixed (real bounding bugs):
- Memory-DoS: `get_state_history` was unbounded. With no `LIMIT`, a wide time window over a high-frequency entity loaded an unbounded row set into memory. It is now capped at `MAX_HISTORY_ROWS` (1,000,000); sibling search paths were already `k`-bounded.
- Disk-DoS / documented-but-missing purge. The README advertised `Recorder::purge`, but no retention path existed, so disk growth was unbounded. Added a transactional `purge(older_than)` with an exclusive cutoff (idempotent, no off-by-one) that deletes old `states`/`events` and GCs orphaned `state_attributes` blobs (dedup-shared blobs are kept until their last referrer is gone).
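The retention rule can be modelled in memory to make the cutoff and GC semantics concrete. This is a sketch, not the crate's code: `StoreModel` and `StateRow` are hypothetical, `events` is omitted, and "exclusive cutoff" is read as "delete rows with `last_updated_ts < older_than`". In SQLite the delete and the orphan sweep would run inside one transaction; in memory they are trivially atomic.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory mirror of a `states` row.
#[derive(Clone, Debug)]
pub struct StateRow {
    pub entity_id: String,
    pub last_updated_ts: f64, // REAL Unix seconds, as in the schema
    pub attributes_id: u64,   // FK into `state_attributes`
}

/// Hypothetical model of the `states` and `state_attributes` tables.
#[derive(Default)]
pub struct StoreModel {
    pub states: Vec<StateRow>,
    pub state_attributes: HashMap<u64, String>,
}

impl StoreModel {
    /// Delete rows strictly older than `older_than` (a row stamped exactly at
    /// the cutoff survives), then drop attribute blobs that no remaining row
    /// references. Returns (state rows removed, attribute blobs removed).
    pub fn purge(&mut self, older_than: f64) -> (usize, usize) {
        let before = self.states.len();
        self.states.retain(|r| r.last_updated_ts >= older_than);
        let removed_states = before - self.states.len();

        // A shared blob stays as long as any surviving row points at it.
        let referenced: HashSet<u64> = self.states.iter().map(|r| r.attributes_id).collect();
        let blobs_before = self.state_attributes.len();
        self.state_attributes.retain(|id, _| referenced.contains(id));
        (removed_states, blobs_before - self.state_attributes.len())
    }
}
```

Running `purge` twice with the same cutoff removes nothing the second time, which is the idempotence the text describes.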
`homecore-recorder` tests: 19 → 25 (`--no-default-features`) / 25 → 31 (`--features ruvector`), 0 failed. Python deterministic proof unchanged (the recorder is off the signal proof path).
4. Links
- Crate: `v2/crates/homecore-recorder/` (`Cargo.toml`, `README.md`, `src/lib.rs`, `src/db.rs`, `src/schema.rs`, `src/dedup.rs`, `src/listener.rs`, `src/semantic.rs`).
- ADR-126: HOMECORE master (series map: ADR-132 = HOMECORE-RECORDER).
- ADR-165: HOMECORE-MIGRATE (reads HA `.storage`; P2 exports a side-by-side recorder DB).
- ADR-164: gap analysis that surfaced this missing ADR (Gap G3).
- Home Assistant Recorder integration.