diff --git a/docs/adr/ADR-132-homecore-recorder-history-semantic-search.md b/docs/adr/ADR-132-homecore-recorder-history-semantic-search.md new file mode 100644 index 00000000..9ec48555 --- /dev/null +++ b/docs/adr/ADR-132-homecore-recorder-history-semantic-search.md @@ -0,0 +1,130 @@ +# ADR-132: HOMECORE-RECORDER — State History + Semantic Search + +| Field | Value | +|-------|-------| +| **Status** | Accepted | +| **Date** | 2026-05-25 | +| **Deciders** | ruv | +| **Codename** | **HOMECORE-RECORDER** | +| **Crate** | `v2/crates/homecore-recorder` | +| **Relates to** | [ADR-126](ADR-126-ruview-native-ha-port-master.md) (HOMECORE master — series map row ADR-132), [ADR-127](ADR-127-homecore-state-machine-rust.md) (HOMECORE-CORE state machine), [ADR-124](ADR-124-rvagent-mcp-ruvector-npm-integration.md) (ruvector/SENSE-BRIDGE), [ADR-130](ADR-130-homecore-rest-websocket-api.md) (HOMECORE-API query surface, downstream) | +| **Tracking issue** | [#800](https://github.com/ruvnet/RuView/pull/800) (HOMECORE intake) | + +> **Documented retroactively (2026-06-12).** The `homecore-recorder` crate shipped under +> the ADR-126 series map (which planned an "ADR-132 HOMECORE-RECORDER") but the standalone +> ADR file was never written; the crate's `Cargo.toml`, `README.md`, `lib.rs`, `schema.rs`, +> and `semantic.rs` all cite "ADR-132". This ADR reverse-documents the decision that the +> shipped, tested code already embodies (ADR-164 Gap G3 / Coverage-Gaps Lens §A). It does +> **not** introduce new design; it records what is built. Date reflects the crate's intake +> era (first commit `e96ebaea8`, 2026-05-25); real-impl pass landed in `7c8071145` +> (2026-06-11). + +--- + +## 1. Context + +ADR-126 (the HOMECORE master) decided to reimplement Home Assistant (HA) natively in Rust. +HA persists every state change to a SQLite *recorder* database; downstream features +(history graphs, the logbook, long-term statistics, automation conditions that reference +past state) all read that store. HOMECORE therefore needs a durable state-history backbone. + +Two forces shape the decision: + +1. **Migration / coexistence.** Users adopting HOMECORE will have an existing HA + `recorder` database. Reusing HA's on-disk schema (rather than inventing a new one) lets + HOMECORE read an existing HA `home-assistant_v2.db` directly and lets HA-aware tooling + read HOMECORE's store. This is the same trust boundary that `homecore-migrate` + (ADR-165) handles for `.storage/*.json`. +2. **Semantic queries.** HA history is queried with SQL `BETWEEN`/`WHERE` clauses. The + HOMECORE platform already carries ruvector (ADR-124) for vector search, so the recorder + can additionally embed state changes and answer natural-language queries + ("which kitchen devices were warm at 3 PM?") via k-NN — a capability HA does not have. + +The recorder is the **durable-state surface**: if it is wrong, history, logbook, and +historical-condition automations are all wrong. ADR-164 flagged it as a CRITICAL coverage +gap precisely because such a load-bearing crate had no governing ADR. + +## 2. Decision + +Ship `homecore-recorder` as a SQLite state-history recorder with an HA-compatible schema +and an optional ruvector-backed semantic index, in three phases. P1 and P2 are built and +tested; P3 is planned. + +### 2.1 Storage — SQLite with the HA recorder schema (P1, shipped) + +- Persist via `sqlx` with the SQLite backend only (no Postgres, no TLS feature set). +- Mirror HA recorder **schema v48** so the store is bidirectionally readable + (`src/schema.rs`): + - `state_attributes` — shared attribute JSON blobs, deduped by an FNV-1a 64-bit hash + stored as a signed `i64` (matches HA's dedup key); + - `states` — one row per state write (`entity_id`, `state`, `attributes_id` FK, + `last_changed_ts`/`last_updated_ts` as REAL Unix seconds, `context_id` UUID); + - `events` — domain events (`event_type`, `event_data` JSON, `time_fired_ts`); + - `recorder_runs` — boot/shutdown bookends for history-gap detection. +- All DDL uses `CREATE TABLE IF NOT EXISTS`, so schema application is idempotent and safe + on every startup. +- Default persistence path `.homecore/home.db` (configurable). + +### 2.2 Capture — listener on the HOMECORE event bus (P1, shipped) + +- `RecorderListener` subscribes to the HOMECORE event bus (ADR-127) and captures + `StateChanged` events, writing snapshots through `Recorder` (`src/listener.rs`, + `src/db.rs`). +- A `DedupEngine` (`src/dedup.rs`) skips redundant writes when the state hash is unchanged, + matching HA's stateful-listener behaviour. + +### 2.3 Semantic search — ruvector HNSW (P2, shipped, feature-gated) + +- Behind the `ruvector` Cargo feature, the `Recorder` additionally calls a `SemanticIndex` + implementation (`src/semantic.rs`) that embeds state attributes and stores vectors in a + `ruvector-core` HNSW index for k-NN search. +- P2 embeddings are **hash-based** (sha2) — a deliberate, honest placeholder. They give a + working HNSW surface without claiming sentence-level semantic quality. +- When the feature is off, `NullSemanticIndex` satisfies the `SemanticIndex` trait bound + with no allocation, so the structural recorder ships independently of ruvector. + +### 2.4 Real sentence embeddings (P3, planned — not yet built) + +- Replace the hash embeddings with ruvector-attention sentence embeddings (dim → 384). Not + implemented; tracked as a follow-up. The README and `Cargo.toml` label this P3 explicitly. + +### 2.5 Test evidence (as shipped) + +- P1: 14 tests (`cargo test -p homecore-recorder --no-default-features`). +- P2: 20 tests (`cargo test -p homecore-recorder --features ruvector`). + +## 3. Consequences + +**Positive.** + +- HA-schema compatibility makes migration (ADR-165) and coexistence cheap: HOMECORE can + read an existing HA `recorder.db`, and any SQLite tool can read HOMECORE's history. +- The semantic index is **additive** and feature-gated: the durable structural recorder has + no hard dependency on ruvector, so the storage backbone ships first. +- Standard SQLite means no proprietary export format; history is directly queryable. + +**Negative / honest limits.** + +- P2 semantic search uses **hash embeddings**, not real sentence embeddings — query quality + is limited until P3. This is disclosed in the crate docs and here; it must not be cited as + semantic-quality-validated. +- No per-crate benchmarks exist yet; the latency figures in the README + (state-write p50 < 2 ms, semantic search < 10 ms on 1 M records) are design targets / + estimates, **needs verification** with a criterion baseline. +- Pinning to HA schema v48 couples HOMECORE to a specific HA recorder schema generation; + future HA schema bumps require an explicit migration step. + +**Neutral.** + +- This ADR governs the recorder crate only. The query/REST surface over recorder data is + HOMECORE-API (ADR-130, P3); automation conditions on historical state are + HOMECORE-automation (ADR-129, P3). + +## 4. Links + +- Crate: `v2/crates/homecore-recorder/` — `Cargo.toml`, `README.md`, `src/lib.rs`, + `src/db.rs`, `src/schema.rs`, `src/dedup.rs`, `src/listener.rs`, `src/semantic.rs`. +- [ADR-126](ADR-126-ruview-native-ha-port-master.md) — HOMECORE master (series map: ADR-132 = HOMECORE-RECORDER). +- [ADR-165](ADR-165-homecore-migrate-from-home-assistant.md) — HOMECORE-MIGRATE (reads HA `.storage`; P2 exports a side-by-side recorder DB). +- [ADR-164](ADR-164-adr-corpus-gap-analysis.md) — gap analysis that surfaced this missing ADR (Gap G3). +- [Home Assistant Recorder integration](https://www.home-assistant.io/integrations/recorder/).