docs: add persistent node registry, OTA safety gate, plugin architecture to ADR-052
Incorporates engineering review feedback: - Persistent node registry (~/.ruview/nodes.db) — discovery becomes reconciliation - BatchOtaSession aggregate with TdmSafe rolling update strategy - Plugin architecture section — control plane extensibility trajectory - Renumbered sections for new content (9-12 added, impl phases now section 13) Co-Authored-By: claude-flow <ruv@ruv.net>
This commit is contained in:
parent
2ba8b3b93d
commit
50a82165c9
|
|
@ -51,6 +51,12 @@ address.
|
|||
**Invariant**: No two nodes may share the same MAC address. If a node is
|
||||
discovered via multiple strategies, the most recent data wins.
|
||||
|
||||
**Persistence**: The registry is persisted to `~/.ruview/nodes.db` (SQLite via
|
||||
`rusqlite`). On startup, all previously known nodes are loaded as `Offline` and
|
||||
reconciled against a fresh discovery scan. This means the app **remembers the
|
||||
mesh** across restarts — critical for field deployments where nodes may be
|
||||
temporarily powered off.
|
||||
|
||||
#### `Node` (Entity)
|
||||
|
||||
| Field | Type | Description |
|
||||
|
|
@ -156,6 +162,32 @@ Represents an over-the-air firmware update to a running node.
|
|||
| `phase` | `OtaPhase` | Uploading / Rebooting / Verifying / Done / Failed |
|
||||
| `progress` | `Progress` | Upload progress |
|
||||
|
||||
#### `BatchOtaSession` (Aggregate Root)
|
||||
|
||||
Coordinates rolling firmware updates across multiple mesh nodes. Prevents all
|
||||
nodes from rebooting simultaneously, which would collapse the sensing network.
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `id` | `Uuid` | Batch session identifier |
|
||||
| `firmware` | `FirmwareBinary` | The binary being deployed |
|
||||
| `strategy` | `OtaStrategy` | `Sequential`, `TdmSafe`, `Parallel` |
|
||||
| `max_concurrent` | `usize` | Max nodes updating at once |
|
||||
| `batch_delay_secs` | `u64` | Delay between batches |
|
||||
| `fail_fast` | `bool` | Abort remaining on first failure |
|
||||
| `node_states` | `Map<MacAddress, BatchNodeState>` | Per-node progress |
|
||||
|
||||
**Invariant**: In `TdmSafe` mode, adjacent TDM slots are never updated
|
||||
concurrently. Even-slot nodes update first, then odd-slot nodes.
|
||||
|
||||
**Lifecycle**: `Planning → InProgress → Completed | PartialFailure | Aborted`
|
||||
|
||||
- `BatchNodeState` — enum: `Queued`, `Uploading(Progress)`, `Rebooting`, `Verifying`, `Done`, `Failed(String)`, `Skipped`
|
||||
- `OtaStrategy` — enum:
|
||||
- `Sequential` — one node at a time, wait for rejoin
|
||||
- `TdmSafe` — update non-adjacent slots to maintain sensing coverage
|
||||
- `Parallel` — all at once (development only)
|
||||
|
||||
### Value Objects
|
||||
|
||||
- `SerialPort` — `{ name: String, vid: u16, pid: u16, manufacturer: Option<String> }`
|
||||
|
|
@ -177,6 +209,9 @@ Represents an over-the-air firmware update to a running node.
|
|||
| `OtaStarted` | `{ session_id, target_mac, firmware_version }` | Discovery (mark node as updating) |
|
||||
| `OtaCompleted` | `{ session_id, target_mac, new_version }` | Discovery (refresh node info) |
|
||||
| `OtaFailed` | `{ session_id, target_mac, error }` | UI (show error) |
|
||||
| `BatchOtaStarted` | `{ batch_id, strategy, node_count }` | UI (show batch progress) |
|
||||
| `BatchNodeUpdated` | `{ batch_id, mac, state }` | UI (update per-node status), Discovery (refresh) |
|
||||
| `BatchOtaCompleted` | `{ batch_id, succeeded, failed, skipped }` | UI (show summary), Discovery (full rescan) |
|
||||
|
||||
### Anti-Corruption Layer
|
||||
|
||||
|
|
|
|||
|
|
@ -637,7 +637,103 @@ cargo build --release -p wifi-densepose-sensing-server
|
|||
# target/release/sensing-server -> crates/wifi-densepose-desktop/binaries/sensing-server-{arch}
|
||||
```
|
||||
|
||||
### 9. Security Considerations
|
||||
### 9. Persistent Node Registry
|
||||
|
||||
Discovery alone is transient — nodes appear when they broadcast, disappear when they don't. A persistent local registry transforms discovery into **reconciliation**.
|
||||
|
||||
```
|
||||
~/.ruview/nodes.db (SQLite via rusqlite)
|
||||
```
|
||||
|
||||
**Schema:**
|
||||
|
||||
```sql
|
||||
CREATE TABLE nodes (
|
||||
mac TEXT PRIMARY KEY, -- e.g. "AA:BB:CC:DD:EE:FF"
|
||||
last_ip TEXT, -- last known IP
|
||||
last_seen INTEGER NOT NULL, -- Unix timestamp
|
||||
firmware TEXT, -- e.g. "0.3.1"
|
||||
chip TEXT DEFAULT 'esp32s3', -- esp32, esp32s3, esp32c3
|
||||
mesh_role TEXT DEFAULT 'node', -- 'coordinator' | 'node' | 'aggregator'
|
||||
tdm_slot INTEGER, -- assigned TDM slot index
|
||||
capabilities TEXT, -- JSON: {"wasm": true, "ota": true, "csi": true}
|
||||
friendly_name TEXT, -- user-assigned label
|
||||
notes TEXT -- free-form notes
|
||||
);
|
||||
```
|
||||
|
||||
**Behavior:**
|
||||
|
||||
- On discovery broadcast, upsert into registry (update `last_ip`, `last_seen`, `firmware`)
|
||||
- Dashboard shows **all registered nodes**, dimming those not seen recently
|
||||
- User can manually add nodes by MAC/IP (for networks without mDNS)
|
||||
- Export/import registry as JSON for fleet management across machines
|
||||
- Node health history (uptime, last OTA, error count) tracked over time
|
||||
|
||||
This means the desktop app **remembers the mesh** across restarts, which is critical for field deployments where nodes may be offline temporarily.
|
||||
|
||||
### 10. OTA Safety Gate — Rolling Updates
|
||||
|
||||
Mesh deployments cannot tolerate all nodes rebooting simultaneously. The OTA subsystem includes a **rolling update mode** that preserves sensing continuity:
|
||||
|
||||
```rust
|
||||
#[derive(Serialize, Deserialize)]
|
||||
pub struct BatchOtaConfig {
|
||||
/// Update strategy
|
||||
pub strategy: OtaStrategy,
|
||||
/// Max nodes updating concurrently
|
||||
pub max_concurrent: usize,
|
||||
/// Delay between batches (seconds)
|
||||
pub batch_delay_secs: u64,
|
||||
/// Abort if any node fails
|
||||
pub fail_fast: bool,
|
||||
}
|
||||
|
||||
#[derive(Serialize, Deserialize)]
|
||||
pub enum OtaStrategy {
|
||||
/// Update one node at a time, wait for it to rejoin mesh
|
||||
Sequential,
|
||||
/// Update non-adjacent TDM slots to maintain coverage
|
||||
TdmSafe,
|
||||
/// Update all nodes simultaneously (development only)
|
||||
Parallel,
|
||||
}
|
||||
```
|
||||
|
||||
**`TdmSafe` strategy:**
|
||||
|
||||
1. Sort nodes by TDM slot index
|
||||
2. Update even-slot nodes first (slots 0, 2, 4...)
|
||||
3. Wait for each to reboot and rejoin mesh (verified via beacon)
|
||||
4. Then update odd-slot nodes (slots 1, 3, 5...)
|
||||
5. At no point are adjacent nodes offline simultaneously
|
||||
|
||||
**UI flow:**
|
||||
|
||||
- User selects target firmware + target nodes
|
||||
- App shows pre-update diff (current vs new version per node)
|
||||
- Progress bar per node with states: `queued → uploading → rebooting → verifying → done`
|
||||
- Abort button halts remaining updates without rolling back completed ones
|
||||
- Post-update health check confirms all nodes are sensing
|
||||
|
||||
### 11. Plugin Architecture (Future)
|
||||
|
||||
This desktop tool is quietly becoming the **control plane for RuView**. Once it manages discovery, firmware, OTA, WASM, sensing, and mesh topology, plugin extensibility becomes inevitable:
|
||||
|
||||
- **Firmware management** today → **swarm orchestration** tomorrow
|
||||
- **WASM upload** today → **edge module marketplace** tomorrow
|
||||
- **Sensing view** today → **activity classification dashboard** tomorrow
|
||||
|
||||
The Tauri command surface should be designed with this trajectory in mind:
|
||||
|
||||
- Commands are grouped by bounded context (already done)
|
||||
- Each context can be extended by loading additional Tauri plugins
|
||||
- The node registry becomes the source of truth for all plugins
|
||||
- Event bus (Tauri's `emit`/`listen`) provides cross-plugin communication
|
||||
|
||||
This does NOT mean building a plugin system in Phase 1. It means keeping the architecture open to it: no hardcoded views, state flows through the registry, commands are typed and versioned.
|
||||
|
||||
### 12. Security Considerations
|
||||
|
||||
1. **PSK Storage**: OTA PSK tokens are stored in the OS keychain via `tauri-plugin-stronghold` or the platform's native credential store, never in plaintext config files.
|
||||
|
||||
|
|
@ -649,7 +745,7 @@ cargo build --release -p wifi-densepose-sensing-server
|
|||
|
||||
5. **WASM Signature Verification**: The desktop app can sign WASM modules before upload using a locally stored Ed25519 key pair, complementing the node-side verification (ADR-040).
|
||||
|
||||
### 10. Implementation Phases
|
||||
### 13. Implementation Phases
|
||||
|
||||
| Phase | Scope | Effort | Priority |
|
||||
|-------|-------|--------|----------|
|
||||
|
|
|
|||
Loading…
Reference in New Issue