From 50a82165c92ac9ce36c34259d73e85f3b9ed4d7c Mon Sep 17 00:00:00 2001
From: ruv
Date: Fri, 6 Mar 2026 15:57:12 -0500
Subject: [PATCH] docs: add persistent node registry, OTA safety gate, plugin architecture to ADR-052
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Incorporates engineering review feedback:
- Persistent node registry (~/.ruview/nodes.db) — discovery becomes reconciliation
- BatchOtaSession aggregate with TdmSafe rolling update strategy
- Plugin architecture section — control plane extensibility trajectory
- Renumbered sections for new content (9-12 added, impl phases now section 13)

Co-Authored-By: claude-flow
---
 docs/adr/ADR-052-ddd-bounded-contexts.md   |  35 ++++++++
 docs/adr/ADR-052-tauri-desktop-frontend.md | 100 ++++++++++++++++++++-
 2 files changed, 133 insertions(+), 2 deletions(-)

diff --git a/docs/adr/ADR-052-ddd-bounded-contexts.md b/docs/adr/ADR-052-ddd-bounded-contexts.md
index ff83d110..39093fca 100644
--- a/docs/adr/ADR-052-ddd-bounded-contexts.md
+++ b/docs/adr/ADR-052-ddd-bounded-contexts.md
@@ -51,6 +51,12 @@ address.
 **Invariant**: No two nodes may share the same MAC address. If a node is
 discovered via multiple strategies, the most recent data wins.
 
+**Persistence**: The registry is persisted to `~/.ruview/nodes.db` (SQLite via
+`rusqlite`). On startup, all previously known nodes are loaded as `Offline` and
+reconciled against a fresh discovery scan. This means the app **remembers the
+mesh** across restarts — critical for field deployments where nodes may be
+temporarily powered off.
+
 #### `Node` (Entity)
 
 | Field | Type | Description |
@@ -156,6 +162,32 @@ Represents an over-the-air firmware update to a running node.
 | `phase` | `OtaPhase` | Uploading / Rebooting / Verifying / Done / Failed |
 | `progress` | `Progress` | Upload progress |
 
+#### `BatchOtaSession` (Aggregate Root)
+
+Coordinates rolling firmware updates across multiple mesh nodes. Prevents all
+nodes from rebooting simultaneously, which would collapse the sensing network.
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `id` | `Uuid` | Batch session identifier |
+| `firmware` | `FirmwareBinary` | The binary being deployed |
+| `strategy` | `OtaStrategy` | `Sequential`, `TdmSafe`, `Parallel` |
+| `max_concurrent` | `usize` | Max nodes updating at once |
+| `batch_delay_secs` | `u64` | Delay between batches |
+| `fail_fast` | `bool` | Abort remaining on first failure |
+| `node_states` | `Map` | Per-node progress |
+
+**Invariant**: In `TdmSafe` mode, adjacent TDM slots are never updated
+concurrently. Even-slot nodes update first, then odd-slot nodes.
+
+**Lifecycle**: `Planning → InProgress → Completed | PartialFailure | Aborted`
+
+- `BatchNodeState` — enum: `Queued`, `Uploading(Progress)`, `Rebooting`, `Verifying`, `Done`, `Failed(String)`, `Skipped`
+- `OtaStrategy` — enum:
+  - `Sequential` — one node at a time, wait for rejoin
+  - `TdmSafe` — update non-adjacent slots to maintain sensing coverage
+  - `Parallel` — all at once (development only)
+
 ### Value Objects
 
 - `SerialPort` — `{ name: String, vid: u16, pid: u16, manufacturer: Option }`
@@ -177,6 +209,9 @@ Represents an over-the-air firmware update to a running node.
 | `OtaStarted` | `{ session_id, target_mac, firmware_version }` | Discovery (mark node as updating) |
 | `OtaCompleted` | `{ session_id, target_mac, new_version }` | Discovery (refresh node info) |
 | `OtaFailed` | `{ session_id, target_mac, error }` | UI (show error) |
+| `BatchOtaStarted` | `{ batch_id, strategy, node_count }` | UI (show batch progress) |
+| `BatchNodeUpdated` | `{ batch_id, mac, state }` | UI (update per-node status), Discovery (refresh) |
+| `BatchOtaCompleted` | `{ batch_id, succeeded, failed, skipped }` | UI (show summary), Discovery (full rescan) |
 
 ### Anti-Corruption Layer
 
diff --git a/docs/adr/ADR-052-tauri-desktop-frontend.md b/docs/adr/ADR-052-tauri-desktop-frontend.md
index 4f5c470d..d8ee8727 100644
--- a/docs/adr/ADR-052-tauri-desktop-frontend.md
+++ b/docs/adr/ADR-052-tauri-desktop-frontend.md
@@ -637,7 +637,103 @@ cargo build --release -p wifi-densepose-sensing-server
 # target/release/sensing-server -> crates/wifi-densepose-desktop/binaries/sensing-server-{arch}
 ```
 
-### 9. Security Considerations
+### 9. Persistent Node Registry
+
+Discovery alone is transient — nodes appear when they broadcast, disappear when they don't. A persistent local registry transforms discovery into **reconciliation**.
+
+```
+~/.ruview/nodes.db (SQLite via rusqlite)
+```
+
+**Schema:**
+
+```sql
+CREATE TABLE nodes (
+    mac           TEXT PRIMARY KEY,       -- e.g. "AA:BB:CC:DD:EE:FF"
+    last_ip       TEXT,                   -- last known IP
+    last_seen     INTEGER NOT NULL,       -- Unix timestamp
+    firmware      TEXT,                   -- e.g. "0.3.1"
+    chip          TEXT DEFAULT 'esp32s3', -- esp32, esp32s3, esp32c3
+    mesh_role     TEXT DEFAULT 'node',    -- 'coordinator' | 'node' | 'aggregator'
+    tdm_slot      INTEGER,                -- assigned TDM slot index
+    capabilities  TEXT,                   -- JSON: {"wasm": true, "ota": true, "csi": true}
+    friendly_name TEXT,                   -- user-assigned label
+    notes         TEXT                    -- free-form notes
+);
+```
+
+**Behavior:**
+
+- On discovery broadcast, upsert into registry (update `last_ip`, `last_seen`, `firmware`)
+- Dashboard shows **all registered nodes**, dimming those not seen recently
+- User can manually add nodes by MAC/IP (for networks without mDNS)
+- Export/import registry as JSON for fleet management across machines
+- Node health history (uptime, last OTA, error count) tracked over time
+
+This means the desktop app **remembers the mesh** across restarts, which is critical for field deployments where nodes may be offline temporarily.
+
+### 10. OTA Safety Gate — Rolling Updates
+
+Mesh deployments cannot tolerate all nodes rebooting simultaneously. The OTA subsystem includes a **rolling update mode** that preserves sensing continuity:
+
+```rust
+#[derive(Serialize, Deserialize)]
+pub struct BatchOtaConfig {
+    /// Update strategy
+    pub strategy: OtaStrategy,
+    /// Max nodes updating concurrently
+    pub max_concurrent: usize,
+    /// Delay between batches (seconds)
+    pub batch_delay_secs: u64,
+    /// Abort if any node fails
+    pub fail_fast: bool,
+}
+
+#[derive(Serialize, Deserialize)]
+pub enum OtaStrategy {
+    /// Update one node at a time, wait for it to rejoin mesh
+    Sequential,
+    /// Update non-adjacent TDM slots to maintain coverage
+    TdmSafe,
+    /// Update all nodes simultaneously (development only)
+    Parallel,
+}
+```
+
+**`TdmSafe` strategy:**
+
+1. Sort nodes by TDM slot index
+2. Update even-slot nodes first (slots 0, 2, 4...)
+3. Wait for each to reboot and rejoin mesh (verified via beacon)
+4. Then update odd-slot nodes (slots 1, 3, 5...)
+5. At no point are adjacent nodes offline simultaneously
+
+**UI flow:**
+
+- User selects target firmware + target nodes
+- App shows pre-update diff (current vs new version per node)
+- Progress bar per node with states: `queued → uploading → rebooting → verifying → done`
+- Abort button halts remaining updates without rolling back completed ones
+- Post-update health check confirms all nodes are sensing
+
+### 11. Plugin Architecture (Future)
+
+This desktop tool is quietly becoming the **control plane for RuView**. Once it manages discovery, firmware, OTA, WASM, sensing, and mesh topology, plugin extensibility becomes inevitable:
+
+- **Firmware management** today → **swarm orchestration** tomorrow
+- **WASM upload** today → **edge module marketplace** tomorrow
+- **Sensing view** today → **activity classification dashboard** tomorrow
+
+The Tauri command surface should be designed with this trajectory in mind:
+
+- Commands are grouped by bounded context (already done)
+- Each context can be extended by loading additional Tauri plugins
+- The node registry becomes the source of truth for all plugins
+- Event bus (Tauri's `emit`/`listen`) provides cross-plugin communication
+
+This does NOT mean building a plugin system in Phase 1. It means keeping the architecture open to it: no hardcoded views, state flows through the registry, commands are typed and versioned.
+
+### 12. Security Considerations
 
 1. **PSK Storage**: OTA PSK tokens are stored in the OS keychain via `tauri-plugin-stronghold` or the platform's native credential store, never in plaintext config files.
 
@@ -649,7 +745,7 @@ cargo build --release -p wifi-densepose-sensing-server
 
 5. **WASM Signature Verification**: The desktop app can sign WASM modules before upload using a locally stored Ed25519 key pair, complementing the node-side verification (ADR-040).
 
-### 10. Implementation Phases
+### 13. Implementation Phases
 
 | Phase | Scope | Effort | Priority |
 |-------|-------|--------|----------|