ArsenHandzhyan 2026-05-17 18:45:47 +00:00 committed by GitHub
commit 4ce2b18009
GPG Key ID: B5690EEEBB952194
63 changed files with 13250 additions and 355 deletions

CHECKLIST.md Normal file

@ -0,0 +1,187 @@
# RuView · Implementation Checklist
Single source of truth for what's shipped and what's open. Updated
at the end of every session. Pair with
[`docs/references/espectre-gap-analysis.md`](docs/references/espectre-gap-analysis.md)
for the technical detail behind each line.
Last sweep: **2026-05-17**, branch `feat/ota-rssi-mobile`, head `0ec1e4b0`.
Status: 47 Done / 0 Open in-scope. Deferred items (out of session scope,
each with explicit reason) listed at the bottom.
This count includes the ADR-100..114 carry-in from the prior agent + this
session's ADR-115 (FW set-target REST), ADR-116 (WiFlow-v1 Rust loader),
ADR-116 cosmetic (UI dropdown), and ADR-117 (process hygiene + audit
follow-ups). ADR-111 is intentionally absent (folded into ADR-109 during
the AP-MAC tracking work).
---
## ✅ Done
### Server (`v2/crates/wifi-densepose-sensing-server`)
- [x] **ADR-100** PHY gain-lock (AGC + FFT freeze, ESPectre port) — FW
- [x] **ADR-101** Raw-amplitude classifier (CV + baseline drop, hysteresis)
- [x] **ADR-101** Per-node classification badges in WS payload
- [x] **ADR-102** NBVI subcarrier selection (formula α=0.5, top-12)
- [x] **ADR-102** NBVI Step 1 quiet-window finder
- [x] **ADR-103** Persistent baseline at `data/baseline.json` (FULL broadband)
- [x] **ADR-103** Universal threshold via baseline-CV normalization
- [x] **ADR-104** Per-subcarrier drift channel (off-axis presence)
- [x] **ADR-104** NBVI Step 3 FP-rate validation (K ∈ {6,8,10,12,16,20})
- [x] **ADR-104** Per-sub drift exposed in WS `node_features[].drift_score`
+ raw.html sparkline per node (commit eec3ca6c)
- [x] **ADR-104** Baseline staleness watch: warn when the on-disk baseline
is older than 4 h AND drift consistently fires during `absent` periods
(commit eec3ca6c)
- [x] **ADR-105** Drop all synthetic data from runtime
(`signal_field`, `pose_keypoints`, `persons`, fake confidence: all gated)
- [x] **ADR-105** `n_aps_used: u8` uniform field on `enhanced_motion` +
`enhanced_breathing` (commit 598a4b2f)
- [x] **ADR-106** Full complex CSI in WS (`amplitude` + `phases` + meta)
- [x] **ADR-106** Built-in CSI keepalive (managed `ping` per sensor)
- [x] **ADR-106** Server-side µs `timestamp_us`
- [x] **ADR-107** `POST /api/v1/baseline/calibrate` + UI button
- [x] **ADR-107** Auto-recalibrate on long-quiet periods (30 min default)
- [x] **ADR-107** `GET /api/v1/baseline` (status + cooldown)
- [x] **ADR-107** Progress bar in raw.html calibrate button
(commit 432753e1)
- [x] **ADR-112** Multi-AP `signal_field` via `MultistaticFuser`
coverage × activity heatmap, non-zero only with ≥2 nodes +
positions; preserves ADR-105 zero-grid otherwise (commit c8ac60f6)
- [x] **ADR-105** Hide pose canvas in Docker SPA when
`model_loaded == false` + "no trained model" overlay
(commit 2dcb30a6)
- [x] **ADR-104** Phase-domain drift channel — script + server both
compute per-subcarrier circular mean/var; `phase_drift_score`
surfaced on `PerNodeFeatureInfo` (commit 47dafab4)
- [x] **ADR-113** Day/night baseline profiles with hot-reload
(`--baseline-profile {single,auto,day,night}`) (commit a1e09525)
- [x] **ADR-114** 2000-packet replay regression suite (1000 idle +
1000 motion synthetic-but-parameter-matched, F1 ≥ 0.85
threshold) (commit 96225e27)
### Firmware (`firmware/esp32-csi-node`)
- [x] **ADR-100** Gain-lock (300-packet median, MIN_SAFE_AGC=30 safety)
- [x] **ADR-106** Sensor µs timestamp in CSI trailer (`rx_ctrl.timestamp`)
- [x] **ADR-108** NVS persistence of gain-lock — reboot ready in ~0.5 s
- [x] **ADR-109** `POST /ota/recalibrate` — clear gain-lock NVS via REST,
no USB needed (commit f92807cd)
- [x] **ADR-109** Track AP MAC in `gl_ap_mac` NVS — auto-invalidate
stale gain-lock on AP swap (commit f92807cd)
- [x] **ADR-115** `POST /ota/set-target` — repoint CSI aggregator
(`csi_cfg/target_ip` + `target_port`) without USB; recovered
both nodes after the Mac's IP moved to 192.168.0.103 on the TP-Link LAN
### Pose model
- [x] **ADR-116** WiFlow-v1 supervised pose loader (Rust) — `--wiflow-model
data/models/ruview/wiflow-v1/wiflow-v1.json` flips
`pose_estimation: true`; per-tick TCN forward yields 17 COCO
keypoints on `/api/v1/pose/current` and WS `pose_data`. Output
quality requires per-deployment fine-tune (LoRA adapters or
re-train, see Pack E).
- [x] **ADR-117** Process hygiene + audit follow-ups — UDP loopback
filter prevents `cargo test` cross-talk from spawning ping
zombies (250→2 children); keepalive pre-reaps orphans at startup;
`/` redirects to SPA; wiflow zero-pad replaces silent
subcarrier-0 duplication; keypoint confidence stamped from
runtime classifier; sensing tab container restored; multi-node
test guards external :5005; docs/typo/range sweep.
### Tests / fixtures
- [x] **ADR-114** `tests/fixtures/replay_idle.jsonl` +
`replay_motion.jsonl` (1000 frames each, JSONL schema:
`{node_id, amplitude[]}`) (commit 96225e27)
- [x] **ADR-114** `scripts/generate-replay-fixtures.py`
seeded deterministic generator for the two fixtures
(commit 96225e27)
- [x] (parallel agent) RSSI carry-through via feature_state header fix
- [x] (parallel agent) OTA: `OTA_SIZE_UNKNOWN`, httpd stack_size=8192,
reset-reason log — all three FW prerequisites for working OTA
### Ops / tooling
- [x] `scripts/ota-deploy.sh` — WiFi OTA flash + auto-discovery + verify
- [x] `scripts/record-baseline.py` — headless baseline capture (CLI)
- [x] `data/baseline.json` v2 schema
- [x] `docs/references/ota-pipeline.md` — verbatim OTA recipe (port 8032)
### Documentation
- [x] **ADR-100..117** all written (ADR-111 intentionally absent), each ≤ 200 lines
- [x] `docs/references/espectre-techniques.md` — Pace technique catalogue
- [x] `docs/references/espectre-gap-analysis.md` — section-by-section gap
- [x] Documentation actualization sweep — every Open Items section
cross-checked against actual implementation state
---
## ⏳ Open, priority-sorted
### High value, low effort
(all closed this session — see Done above. Tailscale-target item
moved to Deferred below per session brief.)
### High value, medium effort
(all closed this session — see Done above)
### Bigger, lower urgency (still active)
(all closed this session — multiple baseline profiles shipped via
ADR-113, see Done above)
### One-time hygiene
- [x] **Re-record `data/baseline.json`** — current file already carries
`per_subcarrier_mean` so amplitude drift (ADR-104) is active.
Verified the recorder writes the new
`per_subcarrier_phase_mean` / `per_subcarrier_phase_var` schema
end-to-end (this session). `data/baseline.json` is untracked,
so no repo commit needed; operator re-records via UI when they
step out for a true empty-room sample (currently the file
reflects an operator-present recording — fine for the amp
channel, needs re-record for the phase channel to populate
≥ 16 usable subcarriers).
### Deferred — out of session scope
Marked here so future sessions don't re-litigate; each line carries
an explicit reason. Bring them back only if scope changes.
- **HA via MQTT** — new integration. Excluded by current session brief
(no new integrations on current hardware).
- **ESPHome native component** — same reason as HA/MQTT.
- **Web Serial calibration game** — explicitly excluded.
- **Boot-time NBVI freeze in FW** — explicitly excluded.
- **Per-channel NVS cache for gain-lock** — explicitly excluded; only
matters if channel hopping is reactivated, which is also excluded.
- **DensePose model train + load** — explicitly excluded.
- **AETHER contrastive pretrain on live data** — explicitly excluded.
- **MERIDIAN domain generalization** — explicitly excluded.
- **Channel hopping (ADR-029)** — explicitly excluded.
- **Multi-antenna support (`n_antennas` > 1)** — explicitly excluded.
- **README.md trim (542 lines)** — explicitly excluded.
- **CLAUDE.md trim (407 lines)** — explicitly excluded.
- **Tailscale-target in NVS** — Mac stable on TP-Link this session,
low ROI. Not blocking. (ADR-100 follow-up; bring back if Mac
network swap becomes routine.)
---
## Reference
| Doc | Purpose |
|---|---|
| [`docs/adr/`](docs/adr) | All ADRs 001-117 (111 absent); 100-114 carried in from the prior agent, 115-117 added this session |
| [`docs/references/espectre-techniques.md`](docs/references/espectre-techniques.md) | Pace technique catalogue + RuView adoption |
| [`docs/references/espectre-gap-analysis.md`](docs/references/espectre-gap-analysis.md) | Section-by-section gap with priority table |
| [`docs/references/ota-pipeline.md`](docs/references/ota-pipeline.md) | OTA recipe — port 8032, three FW prereqs |
To mark an item done: tick the box, add `(ADR-XXX, commit-sha)` after
the line, move it from the priority section to the top "Done" section.


@ -9,7 +9,7 @@ services:
ports:
- "3000:3000" # REST API
- "3001:3001" # WebSocket
- "5005:5005/udp" # ESP32 UDP
- "5006:5005/udp" # ESP32 UDP (host 5006 -> container 5005; sensors point to .21:5006)
environment:
- RUST_LOG=info
# CSI_SOURCE controls the data source for the sensing server.


@ -0,0 +1,246 @@
# ADR-098 — ESP32-S3 CSI Node Deployment Fixes (room01/room02)
**Status**: Accepted
**Date**: 2026-05-14
**Scope**: `firmware/esp32-csi-node/`, `v2/crates/wifi-densepose-sensing-server/`,
`v2/crates/wifi-densepose-desktop/`, `ui/mobile/`
## Context
Two ESP32-S3 CSI nodes (room01 `1c:db:d4:49:eb:88`, room02 `e8:f6:0a:83:89:44`)
were deployed against the RuView stack on a 2.4 GHz domestic LAN. The
out-of-the-box firmware booted but did not produce usable presence/motion
signal: `motion_score` saturated at `1.0`, `presence_score` froze near a
non-zero constant regardless of activity, vital signs never populated,
and OTA updates rolled back on every attempt.
Root-causing the chain took multiple rebuild/flash cycles. This ADR
records the final patches that made the stack functional end-to-end on
the deployed hardware and the empirical evidence that drove each change.
## Decisions
### D1 — Disable promiscuous mode in `csi_collector`
`esp_wifi_set_promiscuous(true)` silenced the CSI RX callback entirely
on this silicon revision (`yield=0pps` in `adaptive_ctrl` medium tick
log). Removing the call lets the WiFi driver invoke `wifi_csi_callback`
again at the connected-AP rate (~5-10 pps for beacon-driven traffic).
**Patch**: `csi_collector.c` — replace `esp_wifi_set_promiscuous(true);`
with a one-line `ESP_LOGI` documenting the empirical incompatibility.
Do **not** re-enable.
### D2 — Truncate `n_subcarriers` to `EDGE_MAX_SUBCARRIERS` instead of early-return
CSI frames on this hardware arrive at 384 bytes = 192 subcarriers. The
DSP pipeline declared `EDGE_MAX_SUBCARRIERS = 128`, so every incoming
frame failed the `n_subcarriers > EDGE_MAX_SUBCARRIERS` check and
returned before `process_frame` reached Step 8 (motion energy). This
was the underlying reason DSP outputs appeared frozen: the pipeline
literally was not running.
**Patch**: `edge_processing.c` — on oversized frames, clamp
`n_subcarriers = EDGE_MAX_SUBCARRIERS` and log a one-shot warning,
instead of returning. The first 128 subcarriers cover the full 20 MHz
HT20 channel; the trailing bins are HT40 sideband and not relied on.
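A minimal sketch of the D2 clamp, assuming 2 bytes (one int8 I/Q pair) per subcarrier, which is what "384 bytes = 192 subcarriers" implies. `clamp_subcarriers` and the `fprintf` warning are illustrative stand-ins; firmware code would log through `ESP_LOGW` instead.

```c
#include <stdbool.h>
#include <stdio.h>

#define EDGE_MAX_SUBCARRIERS 128 /* value stated in this ADR */

/* Hypothetical helper: clamp the subcarrier count instead of rejecting
 * the frame. Each subcarrier is assumed to be one int8 (I, Q) pair. */
static int clamp_subcarriers(int csi_len_bytes)
{
    static bool warned = false;            /* one-shot warning flag */
    int n_subcarriers = csi_len_bytes / 2; /* 2 bytes per subcarrier */

    if (n_subcarriers > EDGE_MAX_SUBCARRIERS) {
        if (!warned) {
            fprintf(stderr, "CSI frame has %d subcarriers, truncating to %d\n",
                    n_subcarriers, EDGE_MAX_SUBCARRIERS);
            warned = true;
        }
        n_subcarriers = EDGE_MAX_SUBCARRIERS; /* keep the first 128 bins */
    }
    return n_subcarriers;
}
```

For a 384-byte buffer it returns 128 and warns once; later oversized frames are clamped silently, and the frame still reaches Step 8.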
### D3 — Broadband motion source
After D2 the original Step 8 (variance of unwrapped phase of a single
"primary" subcarrier) still failed:
* unwrapped phase drifts monotonically (thermal, oscillator) so its
variance over a W = 20-frame window is approximately `(slope·W/2)²/3`, a non-zero
constant unrelated to activity;
* the "primary" winner index jumps frame-to-frame (e.g. 22 → 103 →
105), so per-bin amplitude variance is dominated by index churn,
not motion.
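The first bullet's constant follows from modelling the drifting phase as a linear ramp $\phi_i = \phi_0 + s\,i$ (our assumption) over a window of $W$ frames. The offset $\phi_0$ drops out of the variance:

$$
\operatorname{Var}(\phi) = \frac{1}{W}\sum_{i=0}^{W-1}\left(s\,i - s\,\tfrac{W-1}{2}\right)^2 = s^2\,\frac{W^2-1}{12} \approx \frac{s^2W^2}{12} = \frac{(sW/2)^2}{3}
$$

For $W = 20$ the exact factor $(W^2-1)/12 = 33.25$ differs from $W^2/12 \approx 33.33$ by under 0.3 %. The result depends only on the drift slope $s$, not on activity.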
We replace the source with **broadband mean amplitude variance**:
on every frame compute `mean(sqrt(I²+Q²))` across **all** subcarriers,
push that scalar into a 20-sample ring, and use its temporal variance
as `motion_energy`. This is the well-known CSI motion proxy:
human motion smears multipath and inflates frequency-domain spread
coherently across the whole channel.
Empirical separation measured on the deployed hardware:
| Window | broadband variance (median) |
|---|---|
| Empty room (3 m) | 0.07-0.10 (occasional 1.6 spike) |
| Walking past 2-3 m | 3.5-14 |
Ratio ≈ 44×. Divisor `var / 3.0f` with `clamp(0, 1.0)` puts empty
under 0.05 and walking near saturation.
**Patch**: `edge_processing.c`
* New buffer `s_broad_mean_amp_history[20]`.
* Per-frame `band_amp_mean = mean(sqrt(I²+Q²))` over all subcarriers.
* Step 8 replaced: `s_motion_energy = clamp(var / 3.0f, 0, 1)`.
### D4 — Biquad sample rate consistency
`biquad_bandpass_design(..., fs=20.0f, ...)` (filter design) did not
match `estimate_bpm_zero_crossing(..., sample_rate=10.0f, ...)` (BPM
detector). At a real callback rate of ~10 Hz the breathing passband
designed for 20 Hz becomes 0.05-0.25 Hz on the wire, overlapping only the
lower half of the 0.2-0.3 Hz human breathing band (12-18 BPM).
**Patch**: `edge_processing.c:1063`: `fs = 10.0f` for both
breathing and heart-rate filters. With D2+D3 active, `breathing_rate_bpm`
populates 21-22 BPM for a stationary person within ~30 s.
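A biquad's coefficients fix its corner frequencies as fractions of the sample rate, so running a filter designed for $f_{s,\text{design}}$ at $f_{s,\text{actual}}$ rescales every corner:

$$
f_\text{actual} = f_\text{design}\cdot\frac{f_{s,\text{actual}}}{f_{s,\text{design}}} = f_\text{design}\cdot\frac{10\ \text{Hz}}{20\ \text{Hz}} = \tfrac{1}{2}\,f_\text{design}
$$

Working backwards, the observed 0.05-0.25 Hz band implies a design passband of 0.1-0.5 Hz (our inference; the ADR does not list the design corners). The breathing band converts as $0.2\text{-}0.3\ \text{Hz} \times 60 = 12\text{-}18$ BPM.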
### D5 — OTA: full-partition erase + larger HTTP task stack
Two independent OTA bugs:
1. `esp_ota_begin(..., OTA_WITH_SEQUENTIAL_WRITES, ...)` skipped the
trailing-page erase, leaving stale code from a previous (larger)
image in the tail of the target partition. The new image header
passed SHA validation but residual instructions still resided at
addresses reachable via IRAM jump tables.
2. The HTTP server worker that runs the OTA verify step overflowed
its default 4 KB stack (esp_ota_get_app_partition_description does
substantial work). The new image *was* booted from `ota_1`, then
panicked in early init from stack overflow, and the bootloader
fell back to `ota_0` — looking exactly like a rollback even though
`CONFIG_BOOTLOADER_APP_ROLLBACK_ENABLE` is disabled.
**Patches**: `ota_update.c`
* `esp_ota_begin(update_partition, OTA_SIZE_UNKNOWN, &handle)`:
full-partition erase before write.
* `httpd_config_t config = HTTPD_DEFAULT_CONFIG(); config.stack_size = 8192;`:
doubled stack so OTA validation has room.
Plus `main.c:130-153`: `esp_reset_reason()` and running-partition label
logged once at app start, so any future boot anomaly is visible without
guesswork.
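The stale-tail failure in D5 can be reproduced with a toy flash model. This is not ESP-IDF code: sector and partition sizes are shrunk, erased bytes read back as `0xFF`, and the hypothetical `write_image_lazy` / `write_image_full` only mimic the two erase strategies described above (erase only the sectors the new image touches versus erase the whole partition first).

```c
#include <stdbool.h>
#include <string.h>

/* Toy flash model: tiny sizes, erased bytes read as 0xFF. */
#define SECTOR 16
#define PART_SIZE 64

static unsigned char flash[PART_SIZE];

static void erase_sector(int s) { memset(flash + s * SECTOR, 0xFF, SECTOR); }

/* Erase only the sectors the new image covers, then write it. */
static void write_image_lazy(const unsigned char *img, int len)
{
    for (int s = 0; s * SECTOR < len; s++) erase_sector(s);
    memcpy(flash, img, len);
}

/* Erase the whole partition first, then write the image. */
static void write_image_full(const unsigned char *img, int len)
{
    for (int s = 0; s < PART_SIZE / SECTOR; s++) erase_sector(s);
    memcpy(flash, img, len);
}

/* True if any byte past the image end still holds old, non-erased data. */
static bool has_stale_tail(int len)
{
    for (int i = len; i < PART_SIZE; i++)
        if (flash[i] != 0xFF) return true;
    return false;
}
```

Writing a 20-byte image over a 64-byte one with the lazy strategy leaves the old image's last two sectors intact; the full erase leaves nothing behind the new image.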
### D6 — sensing-server: parse RuView feature_state, refuse simulation
Out of the box, `sensing-server` (`v2/crates/wifi-densepose-sensing-server`)
parsed only `0xC5110001` (raw CSI) and `0xC5110002` (vitals). RuView FW
emits `0xC5110006` (ADR-081 feature_state) as its default upstream
payload — a gap in the project.
**Patches**: `src/main.rs`
* New `parse_rv_feature_state(buf)` decoding the 60-byte
`rv_feature_state_t` into the existing `Esp32VitalsPacket` shape;
wired ahead of the existing `parse_esp32_vitals` call.
* Per-node `BaselineTracker` (file-scope `OnceLock<Mutex<HashMap<u8,_>>>`)
applies hysteretic motion gating on top of the FW-reported scores so
the UI receives clean boolean presence transitions even when the FW
scalar is noisy.
* `--source simulate` and the auto-fallback to simulation removed;
`simulate`/`simulated` now exit non-zero with an `ERROR` log.
A `parse_csi_lean` parser was also added for compatibility with the
legacy FW 5.47 (`esp32s3_csi_capture`) CSV format. Dead code under
current FW; kept as defence-in-depth so a mistakenly flashed legacy
sensor still produces useful data.
### D7 — Desktop UI: HTTP-sweep discovery
mDNS (`_ruview._udp.local.`) and UDP-broadcast beacon discovery (the
two paths the desktop ships) are not advertised by current RuView FW.
We added a third concurrent path: `GET http://<probe-ip>:8032/status` over
the local /24 subnet, parsing the JSON returned by RuView's
`ota_status_handler`.
**Patches**: `v2/crates/wifi-densepose-desktop/src/commands/discovery.rs`
* `discover_via_http_sweep(timeout)` running alongside mDNS + UDP.
* `futures::future::join_all(tasks)` with overall `tokio::time::timeout`
replaces the previous sequential `for task in tasks` loop, which
blocked on slow-to-time-out unrelated IPs and missed the responding
sensors.
* Result-keeping in `useNodes`/`Dashboard` — keep last good list when
a poll round returns 0 nodes.
### D8 — Mobile UI: WS path + Tailscale default + no simulation fallback
* `WS_PATH = '/ws/sensing'` and a hard-coded `WS_PORT = 8765` so the
mobile app's `ws.service` connects to the RuView WS endpoint instead
of the legacy `/api/v1/stream/pose` FastAPI path.
* `settingsStore.serverUrl` defaults to `http://100.123.189.10:8080`,
the deployed Mac's Tailscale IP, so the phone reaches the server
without LAN dependency.
* All `simulated` fallbacks removed from `ws.service.ts` and
`matStore.ts` — UI shows `disconnected` rather than synthetic data
when the server is unreachable.
### D9 — Reset-reason logging in `app_main`
A two-line ESP_LOGI at the start of `app_main` records
`esp_reset_reason()` and `esp_ota_get_running_partition()->label`.
Worth its weight every time we touched OTA — it eliminated guesswork
when an image silently fell back.
## Verification
Acceptance ran on both deployed nodes with the operator stationary,
then walking 2-3 m past each sensor, then leaving the room.
| Criterion | Target | room01 | room02 |
|---|---|---|---|
| `motion_energy` empty room | < 0.05 | 0.018 | 0.070 |
| `motion_energy` walking | > 0.3 within 2 s | < 1 s | 3 s |
| `motion_energy` decay after exit | < 0.1 within 5 s | 0.02-0.03 | 0.02-0.03 |
| `breathing_rate_bpm` stationary 30 s | 12-20 BPM | 22.2 BPM | 21.0 BPM |
| OTA round-trip | 2 consecutive succeed | ✅ | ✅ |
| Reset-reason visible | one-line log at boot | ✅ | ✅ |
OTA #1 transitioned `running_partition: ota_0 → ota_1`; OTA #2 reversed
it back to `ota_0`. No panics. `Connection reset` on the curl side is
expected — `esp_restart()` tears down the TCP connection after
`httpd_resp_send` returns.
## Files Touched
```
firmware/esp32-csi-node/main/csi_collector.c
firmware/esp32-csi-node/main/edge_processing.c
firmware/esp32-csi-node/main/main.c
firmware/esp32-csi-node/main/ota_update.c
firmware/esp32-csi-node/sdkconfig.defaults
v2/crates/wifi-densepose-sensing-server/src/main.rs
v2/crates/wifi-densepose-sensing-server/src/csi.rs
v2/crates/wifi-densepose-desktop/src/commands/discovery.rs
v2/crates/wifi-densepose-desktop/src/commands/server.rs
v2/crates/wifi-densepose-desktop/ui/src/hooks/useNodes.ts
v2/crates/wifi-densepose-desktop/ui/src/hooks/useServer.ts
v2/crates/wifi-densepose-desktop/ui/src/pages/Dashboard.tsx
v2/crates/wifi-densepose-desktop/ui/src/pages/Sensing.tsx
v2/crates/wifi-densepose-desktop/ui/src/types.ts
ui/mobile/src/constants/websocket.ts
ui/mobile/src/services/ws.service.ts
ui/mobile/src/stores/matStore.ts
ui/mobile/src/stores/settingsStore.ts
ui/mobile/src/screens/MATScreen/index.tsx
ui/mobile/src/screens/VitalsScreen/index.tsx
docker/docker-compose.yml # host port 5005 → 5006 (RuView FW target)
```
## Open Items
* `EDGE_MAX_SUBCARRIERS` is still `128` — D2 truncates incoming frames
rather than enlarging the buffer. Increasing it to 192 would let the
pipeline use all 192 subcarriers, including the HT40 sideband bins, but requires
re-sizing several stack/heap structures and re-tuning DSP windows.
Tracked for a future release.
* Empty-room `motion_energy` on room02 sits slightly above the 0.05
target (0.07). Either the Fresnel-zone alignment for that node is
noisier or the calibration constant `var / 3.0f` needs to be
hardware-rev specific. Acceptable for the current deployment;
candidate for an auto-calibration routine.
## References
* ADR-039 — Edge intelligence pipeline (the file we patched).
* ADR-081 — `rv_feature_state_t` packet format (`0xC5110006`).
* RuView issue #555: *DSP froze on unwrapped phase variance* (this ADR).
* RuView issue #556: *OTA never sticks* (this ADR).


@ -0,0 +1,154 @@
# ADR-100 — PHY Gain Lock for Baseline-Stable CSI
**Status**: Accepted
**Date**: 2026-05-17
**Scope**: `firmware/esp32-csi-node/main/csi_collector.c`,
`v2/crates/wifi-densepose-sensing-server/static/raw.html`.
## Context
After ADR-110 deployed the TP-Link WISP AP and the operator captured three
controlled one-minute windows (empty / sit / walk), the RSSI MAD-Δ
classifier failed to separate the three states — measured `d` values
overlapped within ±0.03 of 0.49 while in-state spread was ±0.10. We
inspected the live amplitude spectrum on the new `raw.html` console and
saw a slow ±20-30 % broadband drift in the sensor amplitude even with
the room provably empty. The drift was indistinguishable from body
modulation at multi-meter range and dominated every downstream feature.
Francesco Pace's [ESPectre](https://github.com/francescopace/espectre)
project (GPLv3) traced the same artefact to the ESP32 PHY's automatic
gain control: AGC continuously rebalances the receiver gain per packet
so received frames stay in the optimal decoding range. For CSI sensing
this is a disaster — the same channel state arrives with a different
amplitude every packet because the gain stage shifts under it. Pace
identified two undocumented PHY routines in the IDF blob that freeze
AGC and FFT scaling, plus a calibration recipe (median of the first
300 packets) that is robust to brief startup activity.
## Decisions
### D1 — Port the ESPectre gain-lock to RuView FW
Added a self-contained block to `csi_collector.c`:
* **Overlay struct** `rv_phy_rx_ctrl_t` aliased over `wifi_csi_info_t.rx_ctrl`
to read the hidden `agc_gain` (u8) and `fft_gain` (signed i8) fields.
* **Extern declarations** for the two PHY routines:
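```c
#include <stdbool.h>
#include <stdint.h>

/* Undocumented PHY routines in the IDF blob; no public header declares them. */
extern void phy_fft_scale_force(bool force_en, int8_t force_value);
extern void phy_force_rx_gain(int force_en, int force_value);
```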
* **Two-phase calibration** (`rv_gain_lock_process`):
- Phase 1 (≤ 300 packets, ~6 s at the rate-gated 50 Hz callback):
accumulate AGC and FFT samples into static arrays.
- At the 300th packet: `qsort` both arrays, take the median, and
call the two PHY routines to freeze gain.
* **Safety branch**: if median AGC < 30, skip the lock and log a
warning. Forcing a low gain on a strong-signal deployment causes the
RX path to freeze (empirically documented in ESPectre's
`gain_controller.h`).
* **Supported targets**: ESP32-S3, ESP32-C3, ESP32-C6 only — older
parts compile to a no-op stub. RuView ships on S3 so this is the only
path we care about.
The hook is wired immediately after the existing rate-gate and MAC
filter in the CSI callback so calibration completes within the first
~6 s after the WiFi association, regardless of host traffic. After
that it short-circuits.
Tagged as ADR-100 in the source comment for traceability.
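The two-phase calibration can be sketched in portable C. The constants (300 packets, AGC floor 30) come from D1; the state enum, function and variable names, and taking the upper-middle element as the median of an even-sized sample are assumptions. The real PHY calls are left as a comment because `phy_force_rx_gain` / `phy_fft_scale_force` only link against the IDF blob.

```c
#include <stdint.h>
#include <stdlib.h>

#define GL_CALIB_PACKETS 300 /* Phase 1 length from D1 */
#define GL_MIN_SAFE_AGC 30   /* safety floor from D1 */

typedef enum { GL_COLLECTING, GL_LOCKED, GL_SKIPPED } gl_state_t;

static uint8_t s_agc[GL_CALIB_PACKETS];
static int8_t s_fft[GL_CALIB_PACKETS];
static int s_n = 0;
static gl_state_t s_state = GL_COLLECTING;
static uint8_t s_agc_median;
static int8_t s_fft_median;

static int cmp_u8(const void *a, const void *b) { return *(const uint8_t *)a - *(const uint8_t *)b; }
static int cmp_i8(const void *a, const void *b) { return *(const int8_t *)a - *(const int8_t *)b; }

/* Hypothetical per-packet hook fed with rx_ctrl's agc_gain / fft_gain. */
static gl_state_t gain_lock_process(uint8_t agc_gain, int8_t fft_gain)
{
    if (s_state != GL_COLLECTING) return s_state; /* short-circuit after calibration */

    s_agc[s_n] = agc_gain;
    s_fft[s_n] = fft_gain;
    if (++s_n < GL_CALIB_PACKETS) return GL_COLLECTING;

    qsort(s_agc, GL_CALIB_PACKETS, sizeof s_agc[0], cmp_u8);
    qsort(s_fft, GL_CALIB_PACKETS, sizeof s_fft[0], cmp_i8);
    s_agc_median = s_agc[GL_CALIB_PACKETS / 2];
    s_fft_median = s_fft[GL_CALIB_PACKETS / 2];

    if (s_agc_median < GL_MIN_SAFE_AGC) {
        s_state = GL_SKIPPED; /* low gain could freeze the RX path: do not lock */
    } else {
        /* firmware: phy_force_rx_gain(1, s_agc_median);
         *           phy_fft_scale_force(true, s_fft_median); */
        s_state = GL_LOCKED;
    }
    return s_state;
}
```

Feeding AGC samples spread evenly over 40-44 locks at a median of 42, in line with the medians observed on room01 and room02.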
### D2 — Use the existing `raw.html` console (ADR-110, D2 reuse) as the verification UI
The console added in ADR-110 already streams `nodes[].amplitude` from
the existing WebSocket. No server-side change was needed. The HTML
displays a per-node bar histogram of all 56 active subcarriers plus
broadband mean amplitude and RSSI traces over the last 30 s. This is
the surface where the operator can watch — without any DSP, without any
classification — whether the gain-lock has actually flattened the
baseline.
### D3 — Geometry matters as much as gain-lock
A controlled three-state capture made on 2026-05-17 with both sensors
positioned so that the line `TP-Link AP → sensor` passes through the
operator (lying on the bed) confirmed both decisions. The summary
table appears under *Verified Acceptance* below. Earlier captures
(ADR-110) failed to separate states partly because the sensors were
placed off-axis from the AP-to-body line; with that geometry the body
never physically obstructs the CSI channel.
## Calibration values observed (real captures, this deployment)
| Node | Boot rate (low traffic) | Boot rate (ping flood) | AGC median | FFT scale median | Lock decision |
|---|---|---|---|---|---|
| room01 (192.168.0.101) | 0.3 fps | 30+ fps | **42-44** | 31 / 33 | **APPLIED** |
| room02 (192.168.0.100) | 0.3 fps | 30+ fps | **44** | 40 / 42 | **APPLIED** |
Both AGC medians are comfortably above the 30 safety threshold. The
calibration completes in ~6 s when there is any host traffic (a single
ping to the sensor at 10 pps is enough); on a totally idle channel
beacons drive the rate down to 0.3 fps and calibration would take ~17
minutes — practically we always have some traffic.
## Verified Acceptance — three-state separation
Geometry: TP-Link AP on the wall, both sensors at table-level on the
opposite side of the room, operator lying on the bed between AP and
sensors. 30 seconds per state, gain-lock active on both nodes,
`raw.html` open during capture, `target_ip` provisioned to the Mac's
TP-Link-side IP (192.168.0.103) so no upstream NAT is in the path.
| State | node 1 mean A | node 1 CV | node 1 subcarriers with CV <5 % | node 2 mean A | node 2 CV | node 2 subcarriers with CV <7 % |
|---|---|---|---|---|---|---|
| **EMPTY** (operator out) | **37.28** | **2.71 %** | **44/44** | 9.52 | 5.22 % | 26/44 |
| **STILL** (operator lying still on bed) | 22.43 | 3.70 % | 30/44 | 9.67 | 5.02 % | 24/44 |
| **WALK** (operator pacing the room) | 31.77 | **12.50 %** | 0/44 | 7.15 | **29.72 %** | 0/44 |
Observations:
* **Node 1 separates all three states** by mean amplitude alone: 37 →
22 → 32. The body lying still blocks the direct path
(40 % amplitude drop), then motion adds reflections back. The CV
ladder 2.71 → 3.70 → 12.50 % is a second independent feature.
* **Node 2 separates STILL+EMPTY from WALK** by CV (5 → 30 %). Its
geometry doesn't pick up a still body, only motion.
* **Compare to ADR-110** where empty/sit/walk differed by ±0.02 inside
±0.10 noise — we now have inter-state separation ratios of **×3.4 on
node 1 and ×5.9 on node 2**. The signal is no longer dominated by
baseline drift.
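The quoted ratios match the WALK-to-STILL CV ratio read off the table above (our reading; the ADR does not spell out the definition):

$$
\left.\frac{\mathrm{CV}_\text{walk}}{\mathrm{CV}_\text{still}}\right|_\text{node 1} = \frac{12.50}{3.70} \approx 3.4,
\qquad
\left.\frac{\mathrm{CV}_\text{walk}}{\mathrm{CV}_\text{still}}\right|_\text{node 2} = \frac{29.72}{5.02} \approx 5.9
$$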
## Files Touched
```
firmware/esp32-csi-node/main/csi_collector.c # gain-lock module + hook
v2/crates/wifi-densepose-sensing-server/static/raw.html # already from ADR-110
docs/adr/ADR-100-gain-lock-baseline-stabilization.md # this ADR
```
## Open Items
* ✅ **NBVI subcarrier selection** — closed in ADR-102 (server-side
port with quiet-window finder).
* ✅ **Server-side RSSI parsing** — fixed by parallel agent in commit
`3393c1e8` (parse_esp32_frame offset realignment + carrying RSSI
through feature_state packets).
* ✅ **Calibration latency on an idle channel** — closed in ADR-106
by the built-in managed-`ping` keepalive (drives sensor RX at
25 pkt/s/node out of the box).
* ⏳ **NVS target_ip is hardcoded** — still open. Tailscale-target
option not implemented; sensors still send to the Mac's TP-Link-
side IP (192.168.0.103). Mac roaming still breaks the CSI stream.
## References
* ADR-039 — Edge intelligence pipeline (host DSP path).
* ADR-098 — Earlier ESP32-S3 deployment fixes.
* ADR-110 — TP-Link WISP deployment + first RSSI-Δ attempt (this ADR
supersedes the threshold table in ADR-110, D3 — the RSSI MAD-Δ
detector is left in place but no longer the primary signal).
* Francesco Pace, *How I Turned My Wi-Fi Into a Motion Sensor — Part 2*,
Dec 2025 — source of the gain-lock recipe.
* `francescopace/espectre`, `components/espectre/gain_controller.{h,cpp}`
on GitHub — reference implementation (GPLv3).


@ -0,0 +1,147 @@
# ADR-101 — Raw-Amplitude Presence/Motion Classifier
**Status**: Accepted
**Date**: 2026-05-17
**Scope**: `v2/crates/wifi-densepose-sensing-server/src/main.rs`
(`amp_presence_override`, `amp_classify_from_latest`,
`amp_node_level`, `amp_node_snapshot`).
## Context
After ADR-100 the AGC drift is gone and the broadband baseline is clean.
Before this ADR the live `classification.motion_level` was being driven
by the legacy DSP (variance + motion_band_power thresholds) plus an
RSSI MAD-Δ override from ADR-110. Both failed on the operator's
deployment: variance overlaps empty/sit/walk within noise, and RSSI
MAD-Δ overlaps within ±0.03 of 0.49 across all three states. The
operator could lie still in the path between AP and sensor and the
detector would silently report `absent`.
The 30 sec × 3 controlled captures done on 2026-05-17 (lying between
TP-Link AP and sensor 1, see ADR-100 *Verified Acceptance*) showed
that **the broadband CV of mean amplitude separates the three states
by 3-6× on this geometry**. EMPTY = 2.7-5 %, STILL = 3.7-5 %,
WALK = 12.5-29.7 %. EMPTY vs STILL are best separated by the
**mean-amplitude drop** (37 → 22 on the active sensor, -40 %).
This ADR replaces the RSSI MAD-Δ classifier with a pure-amplitude one
that uses both signals: CV for motion, baseline drop for static body.
## Decisions
### D1 — `amp_presence_override` per-node classifier
For each frame received on the raw-CSI path:
1. Push current full amplitude vector into the NBVI ranking buffer
(`nbvi_history`, capacity 600 frames ≈ 30 s).
2. Periodically (every `NBVI_REFRESH_TICKS=200` calls, ~5 s) rank
subcarriers by NBVI (see ADR-102) and pick the top-12.
3. Compute **broadband_mean** as the average of NBVI-selected
subcarriers. Falls back to all non-zero subcarriers during warmup.
4. Push to two rolling windows:
- `short` (90 samples ≈ 4.5 s) — for CV.
- `long` (1200 samples ≈ 60 s) — for the rolling-fallback 95 %ile
baseline.
5. Compute `cv = std(short) / mean(short)`.
6. Compute `baseline` — see ADR-103 for the persistent-override path.
7. Stash `(cv, mean_short, baseline)` per node in `AMP_LATEST` for
cross-node fusion.
8. Run `amp_classify_from_latest` (D2 below) to produce the global
`(level, presence, confidence)`.
Returns `None` until the short window is full so the very first
seconds after boot don't emit garbage.
### D2 — Cross-node fusion in `amp_classify_from_latest`
The deployment has two sensors with very different SNR (node 1 mean
amplitude ~22, node 2 mean ~9 on the operator's TP-Link). A single
bursty node should not flip the whole detector. We use:
* **MAX CV** across nodes for the motion gate. Any node seeing
movement is enough — body modulates only the line-of-sight path
it crosses, the other node may stay clean.
* **ANY baseline drop**`present_still`. One well-placed node
seeing the body is enough.
Decision (universal-threshold normalized — see ADR-103 D3):
```
norm_max_cv = max_cv / baseline_cv (when calibration loaded)
gates: fallback when no calibration:
norm ≥ 6.0 → "active" max_cv ≥ 0.22
norm ≥ 3.0 → "present_moving" max_cv ≥ 0.10
any drop → "present_still" (same)
otherwise → "absent" (same)
```
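The gate table translates directly into a fusion function. The thresholds are the ones listed above; the enum, the per-node `drop` flags (computed elsewhere from the baseline comparison) and the convention that `baseline_cv <= 0` means "no calibration loaded" are assumptions of this sketch.

```c
#include <stdbool.h>

typedef enum { LVL_ABSENT, LVL_PRESENT_STILL, LVL_PRESENT_MOVING, LVL_ACTIVE } amp_level_t;

/* Hypothetical fusion: MAX CV across nodes gates motion, ANY drop gates still presence. */
static amp_level_t amp_fuse(const float *cv, const bool *drop, int n_nodes, float baseline_cv)
{
    float max_cv = 0.0f;
    bool any_drop = false;
    for (int i = 0; i < n_nodes; i++) {
        if (cv[i] > max_cv) max_cv = cv[i];
        any_drop = any_drop || drop[i];
    }

    if (baseline_cv > 0.0f) { /* calibrated: normalize by baseline CV */
        float norm = max_cv / baseline_cv;
        if (norm >= 6.0f) return LVL_ACTIVE;
        if (norm >= 3.0f) return LVL_PRESENT_MOVING;
    } else { /* fallback absolute gates */
        if (max_cv >= 0.22f) return LVL_ACTIVE;
        if (max_cv >= 0.10f) return LVL_PRESENT_MOVING;
    }
    return any_drop ? LVL_PRESENT_STILL : LVL_ABSENT;
}
```

Because only the maximum CV matters, one quiet node cannot veto motion seen by the other, which is the behaviour D2 asks for.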
### D3 — Sticky 3-second motion hysteresis
After each fusion pass, a global `AMP_HOLD` counter is reset to
`AMP_MOTION_HOLD_TICKS = 120` whenever the candidate is `moving` /
`active`. Each subsequent quiet tick decrements the counter; the
prior motion label is kept until it expires (≈ 3 s at the ~40
combined classifier ticks/s). This bridges the brief CV dips between
walking steps so the GLOBAL doesn't flicker between `moving` and
`absent`.
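The hold logic is a small counter. This sketch assumes the caller passes an integer label plus a flag saying whether it is a motion-class label. After a motion tick, the next 120 quiet ticks keep reporting the last motion label and the 121st lets the quiet label through. `amp_hold_t` and `amp_apply_hold` are illustrative names, not the server's.

```c
#include <stdbool.h>

#define AMP_MOTION_HOLD_TICKS 120 /* ~3 s at ~40 ticks/s, per D3 */

typedef struct {
    int hold;        /* quiet ticks left before motion may be dropped */
    int last_motion; /* last motion-class label seen */
} amp_hold_t;

/* Apply the sticky hold to this tick's fused candidate label. */
static int amp_apply_hold(amp_hold_t *h, int candidate, bool is_motion)
{
    if (is_motion) {
        h->hold = AMP_MOTION_HOLD_TICKS; /* reset on every motion tick */
        h->last_motion = candidate;
        return candidate;
    }
    if (h->hold > 0) {
        h->hold--;             /* quiet tick: count down */
        return h->last_motion; /* keep the prior motion label */
    }
    return candidate; /* hold expired: quiet label passes */
}
```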
### D4 — `amp_classify_from_latest` read-only entry point
The server has multiple `SensingUpdate` producers — the raw-CSI path
runs the full pipeline above, but the feature_state path (0xC5110006)
arrives without raw amplitudes. We expose a parallel read-only
classifier that pulls the latest stashed per-node `(cv, mean, baseline)`
from `AMP_LATEST` and runs the same fusion. The feature_state path
calls it so its emitted `classification` agrees with the raw-CSI
path's — no flicker between the two SensingUpdate sources.
### D5 — Per-node labels in `build_node_features`
`PerNodeFeatureInfo.classification` is overridden via
`amp_node_snapshot(node_id)`, which runs the same per-node
classifier (without cross-node fusion or hysteresis) against the
stashed `(cv, mean, baseline)` for that node alone. UI consumers
(raw.html badges) see each sensor's independent decision plus the
global fused one — useful for finding sensor placement without
moving them.
## Files Touched
```
v2/crates/wifi-densepose-sensing-server/src/main.rs # ~230 lines added
v2/crates/wifi-densepose-sensing-server/static/raw.html # per-node badges
```
## Verified Acceptance
| State | GLOBAL | CV | Per-node detail |
|---|---|---|---|
| EMPTY | `absent` | 4-6 % | both nodes baseline mean, low CV |
| STILL (lying, in node 1 path) | `present_still` | 3-8 % | node 1 mean drops 70 %, RSSI -20 dB |
| WALK | `active` | 12-36 % | node 2 CV explodes, RSSI swings ±5 dB |
Cross-state separation ratio = 3.4× on node 1 broadband mean, 5.9×
on node 2 CV. The old RSSI MAD-Δ classifier from ADR-110 managed only
a ±0.02 shift buried inside ±0.10 noise.
## Open Items
* ✅ **Per-subcarrier baseline-drop** — closed in ADR-104 (per-sub
drift channel with 10 % gate, triggers `present_still` even when
broadband doesn't move).
* ✅ **Off-axis sit doesn't trigger** — closed in ADR-104 (drift
channel catches off-line-of-sight body presence).
* ⏳ **CV saturates above ~30 %** — still open. Heavy-motion granularity
(run vs jog vs jump) lost above the `active` gate. Would need a
log-CV or rank-based metric to extend the dynamic range.
## References
* ADR-110 — first RSSI MAD-Δ attempt (superseded for `motion_level` /
`presence` / `confidence`; helper kept as `#[allow(dead_code)]`).
* ADR-100 — gain lock that makes this classifier possible.
* ADR-102 — NBVI subcarrier selection that drives the CV computation.
* ADR-103 — persistent baseline + universal threshold normalization.
* [`docs/references/espectre-techniques.md`](../references/espectre-techniques.md)
— full RuView ↔ ESPectre comparison.


@ -0,0 +1,136 @@
# ADR-102 — NBVI Subcarrier Selection (server-side)
**Status**: Accepted
**Date**: 2026-05-17
**Scope**: `v2/crates/wifi-densepose-sensing-server/src/main.rs`
(`AmpState.nbvi_*`, `nbvi_select_top_k`).
## Context
Each ESP32-S3 CSI frame carries 56 active subcarriers on an HT20
(20 MHz) channel. The amplitudes per subcarrier have very different
SNR depending on frequency-selective fading: in the operator's
deployment subcarriers `k=6..11` and `k=22..26` sit at CV ≈ 6 % when
the room is empty, while subcarriers `k=38..43` (middle of the band,
near the LTF nulls) sit at CV ≈ 11 % — pure channel noise, no
information about the room.
ADR-101's classifier computes broadband-mean CV. Averaging over all
56 subcarriers means the noisy ones drag the baseline CV up to
5-7 %. That blunted the motion gates and we had to push them up to
`15 % / 30 %`, losing sensitivity to subtle motion.
## Decisions
### D1 — Port Francesco Pace's NBVI to the server (not the FW)
Formula (ESPectre, GPLv3):
```
NBVI(k) = α · (σ_k / μ_k²) + (1 - α) · (σ_k / μ_k), α = 0.5
```
* `σ_k / μ_k²` — penalises weak subcarriers (a quiet bin with mean ≈ 0
gets `∞` and is filtered out).
* `σ_k / μ_k` — standard coefficient of variation; rewards stability.
* `α = 0.5` — empirically balanced (per Pace's α-sweep tests).
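The score itself is one line of arithmetic. The Rust sketch below uses a hypothetical `nbvi_score` name and assumes lower scores are better, which follows from both terms growing with noise:

```rust
/// NBVI score for one subcarrier (lower = better).
/// A bin whose mean is ~0 gets +∞ so it can never be selected.
pub fn nbvi_score(mean: f64, std: f64, alpha: f64) -> f64 {
    if mean.abs() < 1e-9 {
        return f64::INFINITY;
    }
    alpha * (std / (mean * mean)) + (1.0 - alpha) * (std / mean)
}
```

For example, bins (μ = 10, σ = 1) and (μ = 20, σ = 2) both have a CV of 10 %. The stronger bin scores lower (0.0525 vs 0.055) because of the σ/μ² term.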
**Where**: in the server, not in FW. Pros: trivial to retune per
deployment, no flash cycle, single source of truth across two FW
variants we ship (`runbot_csi_node` and `esp32s3_csi_capture`). Cons:
we lose the ability to *only emit* selected subcarriers (would save
UDP bandwidth) — but at ~25 fps × 56 × 2 bytes = 2.8 KB/s per node,
bandwidth isn't a concern.
### D2 — Top-K with K = 12
Selected at server boot once `nbvi_history` has 90+ samples; then
re-selected every `NBVI_REFRESH_TICKS = 200` calls (~5 s of combined
classifier ticks). The selected indices live in
`AmpState.nbvi_selected`.
K=12 matches ESPectre's default. Smaller K averages fewer bins, so
the mean is noisier; larger K drags in worse subcarriers.
### D3 — Dead-zone gate at 25 % of median mean
Before NBVI scoring, drop any subcarrier whose mean amplitude is
below `0.25 × median(all subcarrier means)`. Guard tones (FW reports
amp[0] = 0 for DC), edge bins, and dead frequencies are excluded so
their degenerate scores (σ/μ² → ∞, or 0/0 for an all-zero bin) cannot
distort the ranking.
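D2 and D3 together amount to a filter-then-rank pass. The hypothetical `select_top_k` below assumes per-subcarrier means and standard deviations have already been computed over the chosen window. It is a sketch, not the server's `nbvi_select_top_k`:

```rust
/// Sketch of D3 + D2: dead-zone gate, then keep the K lowest NBVI scores.
/// `means[k]` / `stds[k]` are per-subcarrier stats over the quiet window.
/// Returns selected subcarrier indices in ascending order.
pub fn select_top_k(means: &[f64], stds: &[f64], k: usize, alpha: f64) -> Vec<usize> {
    assert_eq!(means.len(), stds.len(), "one std per mean");
    if means.is_empty() {
        return Vec::new();
    }
    // Median of all subcarrier means (guard tones included).
    let mut sorted = means.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let n = sorted.len();
    let median = if n % 2 == 1 {
        sorted[n / 2]
    } else {
        0.5 * (sorted[n / 2 - 1] + sorted[n / 2])
    };
    let floor = 0.25 * median;

    // Score only bins that survive the dead-zone gate (and have a positive mean).
    let mut scored: Vec<(usize, f64)> = (0..n)
        .filter(|&i| means[i] >= floor && means[i] > 0.0)
        .map(|i| {
            let (m, s) = (means[i], stds[i]);
            (i, alpha * (s / (m * m)) + (1.0 - alpha) * (s / m))
        })
        .collect();
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));

    let mut picked: Vec<usize> = scored.into_iter().take(k).map(|(i, _)| i).collect();
    picked.sort_unstable();
    picked
}
```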
### D4 — ESPectre Step 1: quiet-window finder
Naive NBVI ranking over the *entire* history is biased if a body
walked through during the calibration buffer. ADR-102 v2 adds the
quiet-window finder from Pace's Step 1:
1. Slide an `AMP_SHORT_WIN=90`-sample window across `nbvi_history`
with stride `AMP_SHORT_WIN/3 = 30`.
2. For each window, compute the CV of its per-frame broadband mean.
3. The window with the lowest CV is "quietest".
4. Per-subcarrier mean and std for NBVI scoring use **only that
window**.
If history is smaller than one window, the whole buffer is used.
Stride 30 (a 60-sample overlap) keeps the cost trivial: a 600-frame
history needs only 18 windows.
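A sketch of the quiet-window search with the parameters above (window 90, stride 30). The function name and the `(start, end)` return convention are assumptions:

```rust
/// Sketch of ESPectre Step 1: slide a `win`-sample window with `stride` over
/// the per-frame broadband means and return the [start, end) range with the
/// lowest CV. Histories shorter than one window are used whole.
pub fn quietest_window(broadband: &[f64], win: usize, stride: usize) -> (usize, usize) {
    if win == 0 || stride == 0 || broadband.len() < win {
        return (0, broadband.len());
    }
    let cv = |w: &[f64]| -> f64 {
        let n = w.len() as f64;
        let mean = w.iter().sum::<f64>() / n;
        let var = w.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
        if mean.abs() < 1e-12 { f64::INFINITY } else { var.sqrt() / mean }
    };
    let (mut best_start, mut best_cv) = (0, f64::INFINITY);
    let mut start = 0;
    while start + win <= broadband.len() {
        let c = cv(&broadband[start..start + win]);
        if c < best_cv {
            best_start = start;
            best_cv = c;
        }
        start += stride;
    }
    (best_start, best_start + win)
}
```

The per-subcarrier means and standard deviations for NBVI scoring are then computed only over the returned range.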
### D5 — `mean_for_baseline` uses FULL broadband, not NBVI
NBVI top-K re-selects between server restarts (different "quietest"
window may give different ranking). That made the persisted baseline
value incomparable across restarts (see ADR-103 D1). Fix: ADR-101
classifier keeps a parallel `short_full` ring buffer of FULL
broadband means (all non-zero subcarriers, no NBVI filter). When
ADR-103's persistent override is active, the baseline-drop check
compares full-broadband short window to full-broadband baseline.
NBVI subset is still used for CV (motion sensitivity is what NBVI
shines at — full broadband mean is just the integral level).
## Files Touched
```
v2/crates/wifi-densepose-sensing-server/src/main.rs
- struct AmpState
- nbvi_select_top_k()
- amp_presence_override() (broadband_mean computation)
```
## Verified Acceptance (operator's deployment, 2026-05-17)
Idle empty-room CV, sensing-server with 2 pps housekeeping ping:
| Node | Full 56 subcarriers | NBVI top-12 |
|---|---|---|
| node 1 (rssi -53 dBm) | ~5.0 % | **3.1 %** |
| node 2 (rssi -67 dBm) | ~7.0 % | **3.9 %** |
Reduction 38-44 %. The lower baseline let ADR-101 gates be tightened
from `15 % / 30 %` down to `10 % / 22 %` for moving/active without
raising the false-positive rate — subtler motions like waving while
sitting near a sensor now trigger.
## Open Items
* ✅ **Step 3 FP-rate validation** — closed in ADR-104 D4 (commit
`6212b17e`). K ∈ {6,8,10,12,16,20} sweep, smallest-FP wins; ties
broken by smallest total-NBVI score.
* **Persist NBVI selection**: `AMP_BASELINE_OVERRIDE` (ADR-103)
persists baseline scalar but not the chosen subcarrier indices.
After server restart NBVI re-ranks from scratch; in deployments
where the channel changes over hours we'd want to re-rank anyway,
so for now this is correct, not an open item.
* **FW boot-time NBVI freeze**: ESPectre freezes the NBVI selection
for the lifetime of a boot, a trade-off against our adaptive rolling
refresh. Worth exploring if the FP rate is a problem in real homes.
## References
* ADR-100 — gain lock (gives NBVI a stable per-subcarrier baseline).
* ADR-101 — classifier that consumes NBVI selection.
* ADR-103 — persistent baseline + universal threshold normalization.
* [Pace's *Part 2*](https://medium.com/@francesco.pace/how-i-turned-my-wi-fi-into-a-motion-sensor-part-2-62038130e530)
+ [francescopace/espectre](https://github.com/francescopace/espectre)
on GitHub (GPLv3).
* [`docs/references/espectre-techniques.md`](../references/espectre-techniques.md).


@ -0,0 +1,180 @@
# ADR-103 — Persistent Empty-Room Baseline + Universal Threshold
**Status**: Accepted
**Date**: 2026-05-17
**Scope**: `v2/crates/wifi-densepose-sensing-server/src/main.rs`
(`AMP_BASELINE_OVERRIDE`, `AMP_BASELINE_CV`, `load_baseline_file`,
`amp_node_level`), `v2/data/baseline.json`, `scripts/record-baseline.py`.
## Context
ADR-101's classifier relies on a `baseline` value per node — the
mean amplitude the room exhibits when empty. Pre-ADR-103 the baseline
was the rolling 95th percentile of the last 1 200 samples (≈ 60 s) of
broadband mean. That meant every server restart triggered a "step
outside for 60 seconds" ritual before the detector worked, and if
the operator stayed in the room longer than ~4 min the baseline
silently drifted down to the *occupied* amplitude — making
`present_still` under-trigger forever after.
Additionally, motion gates were hard-coded to the operator's
deployment (10 % / 22 % CV) and would not transfer to a room with a
different noise floor.
## Decisions
### D1 — Persistent baseline file at `data/baseline.json`
JSON schema **v2** (per node):
```json
{
"version": 2,
"captured_at": "ISO-8601",
"duration_sec": 90.0,
"trim_head_sec": 15.0,
"trim_tail_sec": 15.0,
"clean_window_sec": 30.0,
"method": "record → trim head/tail → find lowest-CV sub-window → FULL-broadband stats per node",
"nodes": {
"1": {
"full_broadband_mean": 26.11,
"full_broadband_p50": 26.16,
"full_broadband_p95": 27.04, ← used as `baseline`
"full_broadband_std": 0.68,
"full_broadband_cv_pct": 2.62, ← used to normalize gates (D3)
"rssi_dbm": -52.3,
"n_samples": 149,
"per_subcarrier_mean": [..56 floats..]
}
}
}
```
Loader (`load_baseline_file`) reads at server startup. Path is
`$RUVIEW_BASELINE_FILE` or `data/baseline.json` by default. Missing
or unparseable file = log warning + fall back to rolling p95 (= old
ADR-101 behaviour, no breaking change).
Per-node lookup priority: `full_broadband_p95` → `full_broadband_mean`
→ legacy `p95_amp` → legacy `mean_amp`. v1 baselines load but emit a
warning about NBVI-drift incompatibility.
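The lookup priority is a first-match search over key names. In this sketch a `HashMap` stands in for one parsed `nodes["<id>"]` object (the real loader parses JSON). Returning the matched key lets a caller emit the legacy-v1 warning. `baseline_for_node` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Returns the baseline value and the key it came from; first match wins.
/// A "p95_amp"/"mean_amp" hit means a legacy v1 file (caller should warn).
pub fn baseline_for_node(node: &HashMap<&str, f64>) -> Option<(f64, &'static str)> {
    const PRIORITY: [&str; 4] = [
        "full_broadband_p95",  // v2, preferred
        "full_broadband_mean", // v2 fallback
        "p95_amp",             // legacy v1
        "mean_amp",            // legacy v1
    ];
    PRIORITY
        .iter()
        .find_map(|&key| node.get(key).map(|&v| (v, key)))
}
```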
### D2 — FULL broadband for baseline comparison, NBVI for CV
The persisted baseline must be comparable across server restarts.
NBVI top-12 re-selects on each boot (ADR-102 D4), so an NBVI-subset
mean recorded today doesn't match a NBVI-subset mean tomorrow even
in the same empty room. Fix:
`amp_presence_override` keeps two short windows:
| Field | Source | Used for |
|---|---|---|
| `short` | NBVI-subset broadband mean | CV (motion sensitivity) |
| `short_full` | **all non-zero subcarriers** mean | baseline drop check |
The recording script also computes full-broadband statistics from
the captured frames. Both sides of the `mean / baseline` ratio are
full-broadband, so the ratio stays stable across NBVI re-selection.
### D3 — Universal threshold via baseline-CV normalization
(Pace's Problem #3.) Hard-coded gates are deployment-tuned. Fix:
normalize the runtime CV by the empty-room CV measured during
calibration:
```
norm_cv = current_cv / baseline_cv
gates: norm_cv ≥ 3.0 → present_moving
norm_cv ≥ 6.0 → active
```
Both `amp_node_level` (per-node) and `amp_classify_from_latest`
(global) use the same normalization. When no calibration is loaded,
fall back to absolute gates `0.10 / 0.22` (the deployment-tuned
values) — keeps backwards compatibility.
`AMP_BASELINE_CV` is a separate per-node map loaded alongside
`AMP_BASELINE_OVERRIDE`. The CV value is the FULL-broadband CV % from
the calibration file divided by 100.
### D4 — Recording script `scripts/record-baseline.py`
CLI helper (Python 3, requires `pip install websockets`). Connects
to the live `ws://localhost:8765/ws/sensing`, records `duration` (90
s default), then:
1. Trim `trim_head_sec` (15 s default) and `trim_tail_sec` (15 s
default) to discard door-open / re-entry transients.
2. Slide a `clean_window_sec` (30 s default) sub-window across the
trimmed buffer, pick the one with the lowest broadband CV.
3. Per node, compute full-broadband mean / median / p95 / std / CV %
and rssi mean over that cleanest window.
4. Also compute per-subcarrier mean across the cleanest window (saved
as diagnostic for future per-subcarrier delta classifier).
5. Write `v2/data/baseline.json` (path overridable via `--out`).
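Step 3 reduces to ordinary summary statistics. The Rust sketch below shows what the per-node numbers mean. It assumes nearest-rank percentiles and a population standard deviation. The Python script may use a different percentile method (for example numpy's default linear interpolation), so its values can differ slightly. `NodeStats` and `node_stats` are hypothetical names.

```rust
/// Summary stats for one node over the cleanest window (step 3).
#[derive(Debug)]
pub struct NodeStats {
    pub mean: f64,
    pub p50: f64,
    pub p95: f64, // would become `full_broadband_p95`, the runtime `baseline`
    pub std: f64,
    pub cv_pct: f64, // would become `full_broadband_cv_pct`, used by D3
}

/// Nearest-rank percentile of an ascending-sorted, non-empty slice.
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[rank.clamp(1, sorted.len()) - 1]
}

/// Assumes positive amplitudes (cv_pct divides by the mean).
pub fn node_stats(broadband: &[f64]) -> Option<NodeStats> {
    if broadband.is_empty() {
        return None;
    }
    let n = broadband.len() as f64;
    let mean = broadband.iter().sum::<f64>() / n;
    let std = (broadband.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n).sqrt();
    let mut sorted = broadband.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    Some(NodeStats {
        mean,
        p50: percentile(&sorted, 50.0),
        p95: percentile(&sorted, 95.0),
        std,
        cv_pct: 100.0 * std / mean,
    })
}
```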
Operator workflow now: step out, run script, come back, restart
server. One-time per deployment (or after room rearrangement). No
recurring ritual.
## Files Touched
```
v2/crates/wifi-densepose-sensing-server/src/main.rs # ~120 lines added
v2/data/baseline.json # new, gitignored?
scripts/record-baseline.py # new helper
docs/adr/ADR-103-persistent-baseline.md # this ADR
```
## Verified Acceptance (operator's deployment, 2026-05-17)
```
boot: baseline: loaded 2 node overrides from data/baseline.json
(node1=27.04, node2=14.72;
node1_cv=2.62%, node2_cv=3.65%)
```
Empty room, immediately after restart (no warmup wait):
```
GLOBAL: absent CV=5.0%
node 1 ratio=0.93, norm_cv=0.80×
node 2 ratio=0.93, norm_cv=0.83×
```
Sitting in node 2 path (off-axis from node 1):
```
GLOBAL: present_still CV=8.1%
node 1 ratio=1.05, norm_cv=1.2× (not in path, no drop)
node 2 ratio=0.70, norm_cv=1.7× ← drop fires present_still
```
Walking:
```
GLOBAL: active CV=28-36%
node 1 norm_cv=4-6×, node 2 norm_cv=7-9× ← well above 6× gate
```
Universal-threshold gates `3.0 / 6.0` map to about 11 % / 22 %
absolute CV on node 2 (3 × 3.65 % and 6 × 3.65 %), close to the
10 % / 22 % we hand-tuned earlier, but now any-room-portable.
## Open Items
* ✅ **REST endpoint POST /api/v1/baseline/calibrate** — closed in
ADR-107 D3 + UI button D6.
* ✅ **Per-subcarrier baseline comparison** — closed in ADR-104
(per-sub drift channel consumes `per_subcarrier_mean`).
* ✅ **Auto-recalibrate on long quiet periods** — closed in ADR-107 D5
(30-min quiet + 1-h cooldown defaults).
## References
* ADR-100 — gain lock.
* ADR-101 — classifier consumes the baseline.
* ADR-102 — NBVI selection drift was the root cause of D1/D2.
* [`docs/references/espectre-techniques.md`](../references/espectre-techniques.md)
— Pace's full technique catalogue including Problem #3 normalization.


@ -0,0 +1,179 @@
# ADR-104 — Per-Subcarrier Drift Presence Channel + NBVI FP-Rate Validation
**Status**: Accepted
**Date**: 2026-05-17
**Scope**: `v2/crates/wifi-densepose-sensing-server/src/main.rs`
(`AMP_BASELINE_PER_SUB`, `AMP_DRIFT`, `amp_drift_for_node`,
`amp_drift_max`, `amp_node_level`, `amp_classify_from_latest`,
`nbvi_select_top_k` Step 3), `scripts/record-baseline.py`
(`per_subcarrier_mean` already saved).
## Context
After ADR-103 the classifier triggers `present_still` only when the
**full-broadband mean** (ADR-103 D2) drops by ≥ 25 % from
the loaded baseline. This works when the operator's body crosses the
line of sight between AP and sensor — direct-component attenuation
dominates. But:
1. **Off-axis presence**: the operator sitting at a desk to the side
of the AP-sensor line modulates only a handful of subcarriers
(the ones whose Fresnel zone happens to brush their body). The
*broadband* mean barely shifts, so the ADR-103 classifier reports
`absent` even though someone is clearly in the room.
2. **NBVI Step 3**: Pace's full NBVI pipeline picks top-K by raw NBVI
score, then **validates** each candidate K by counting false
positives the motion detector would produce on the calibration
buffer, and keeps the K with the lowest FP rate. We were taking
the raw top-12 without validation — fragile if one of the chosen
subcarriers happens to overlap a noise source.
## Decisions
### D1 — Spectral drift score as a second presence channel
`amp_presence_override` per node now also computes a **spectral
drift score**:
```
drift_k = (current_amp[k] - baseline_amp[k]).abs() / baseline_amp[k] for baseline[k] > 1.0
drift = mean(drift_k) across kept subcarriers
```
`current_amp[k]` = mean of the recent `AMP_SHORT_WIN` (90) frames'
amplitude at subcarrier `k`. `baseline_amp[k]` = the
`per_subcarrier_mean` vector saved by ADR-103's recording script.
Per-node drift is stashed in `AMP_DRIFT: HashMap<u8, f64>` so
`amp_node_level` (per-node) and `amp_classify_from_latest` (global)
can use it. Threshold `AMP_DRIFT_PRESENCE_THRESH = 0.10` (10 %
average per-subcarrier deviation) is empirical and consistent with
the broadband-ratio trigger (drop ≥ 25 %, drift ≥ 10 %).
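The drift formula translates directly into Rust. `drift_score` is a hypothetical name. It returns 0.0 when no subcarrier has a baseline above 1.0, mirroring the behaviour described below for a missing `per_subcarrier_mean`:

```rust
/// Presence threshold from D1 (10 % average per-subcarrier deviation).
pub const AMP_DRIFT_PRESENCE_THRESH: f64 = 0.10;

/// Spectral drift for one node. Subcarriers whose baseline amplitude is
/// <= 1.0 are skipped; returns 0.0 when nothing is comparable.
pub fn drift_score(current: &[f64], baseline: &[f64]) -> f64 {
    let (sum, kept) = current
        .iter()
        .zip(baseline)
        .filter(|&(_, &b)| b > 1.0)
        .fold((0.0_f64, 0usize), |(s, n), (&c, &b)| (s + (c - b).abs() / b, n + 1));
    if kept == 0 { 0.0 } else { sum / kept as f64 }
}
```

In the example below, subcarrier 2 (baseline 0.5) is skipped, and the other three deviate by 0 %, 10 % and 20 %. The average of 0.10 is exactly at the presence threshold.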
### D2 — Trigger order in classifier
Per node (`amp_node_snapshot`):
```
1. CV ≥ 6× baseline_cv → active
2. CV ≥ 3× baseline_cv → present_moving
3. drift ≥ 10 % → present_still ← ADR-104 (off-axis)
4. mean / baseline < 0.75 → present_still ← ADR-101 (in-path)
5. otherwise → absent
```
Global (`amp_classify_from_latest`) uses MAX CV / MAX drift / ANY
baseline-drop across nodes. Either drop OR drift fires `present_still`.
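The per-node order above can be written as a single function. This sketch assumes a calibrated `baseline_cv > 0` and omits the uncalibrated fallback gates. `node_label` is a hypothetical name:

```rust
/// Per-node trigger order (no cross-node fusion, no hysteresis).
pub fn node_label(cv: f64, baseline_cv: f64, drift: f64, mean: f64, baseline: f64) -> &'static str {
    let norm_cv = cv / baseline_cv; // assumes a calibrated baseline_cv > 0
    if norm_cv >= 6.0 {
        "active"
    } else if norm_cv >= 3.0 {
        "present_moving"
    } else if drift >= 0.10 {
        "present_still" // off-axis: per-subcarrier drift (ADR-104)
    } else if baseline > 0.0 && mean / baseline < 0.75 {
        "present_still" // in-path: broadband drop >= 25 % (ADR-101)
    } else {
        "absent"
    }
}
```

The motion checks come first, so a moving person who also causes drift is still reported as moving.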
### D3 — Opportunistic loading
`per_subcarrier_mean` was already being written by
`scripts/record-baseline.py` (line ~132, written as a list of
~56 floats per node) but the server ignored it. Now `load_baseline_file`
parses it and populates `AMP_BASELINE_PER_SUB`. If absent (older
`baseline.json` from before this ADR) → drift stays 0.0 → no behaviour
change. Re-run calibration (ADR-107 REST endpoint or auto-recalibrate)
to populate the field and activate the drift channel.
### D4 — NBVI FP-rate validation (Step 3 of Pace's spec)
`nbvi_select_top_k` no longer returns the literal top-K. After
ranking by NBVI score (Steps 1+2), it evaluates each candidate
K ∈ `{6, 8, 10, 12, 16, 20}` clamped to the available subcarrier
pool:
* For each K: compute per-frame broadband mean over the top-K
subset across the quiet window.
* Slide a sub-window (length `AMP_SHORT_WIN/3 ≈ 30` samples, stride
`sub_window/2`) and count windows where rolling CV exceeds the
moving-gate threshold (0.10).
* Pick the K with the **smallest FP count**. Ties broken by smallest
total NBVI score (less noisy subset wins).
Result: a subset that's stable AND non-FP-producing on the calibration
window. If a top-12 NBVI candidate sneaks in a subcarrier overlapping
a noise source, the FP count surfaces it and a smaller K wins instead.
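A sketch of the Step 3 loop. It assumes `ranked` is already sorted best-first by NBVI and `frames` holds the quiet window. All names are hypothetical. Because NBVI scores are non-negative, a summed-score tie-break favours the smallest K among candidates with equal FP counts.

```rust
/// Sketch of NBVI Step 3 (FP-rate validation).
/// `ranked`: subcarrier indices sorted best-first by NBVI (Steps 1+2).
/// `scores[i]`: NBVI score of `ranked[i]`.
/// `frames[t][k]`: amplitude of subcarrier k in frame t of the quiet window.
/// `sub_win`: sub-window length (≈ AMP_SHORT_WIN / 3); stride is sub_win / 2.
pub fn pick_k(ranked: &[usize], scores: &[f64], frames: &[Vec<f64>], sub_win: usize) -> usize {
    const CANDIDATES: [usize; 6] = [6, 8, 10, 12, 16, 20];
    const MOVING_GATE: f64 = 0.10; // absolute moving-gate CV
    let sub_win = sub_win.max(1);
    let stride = (sub_win / 2).max(1);
    let mut best: Option<(usize, usize, f64)> = None; // (k, fp_count, total_score)

    for &cand in CANDIDATES.iter() {
        let k = cand.min(ranked.len()); // clamp to the available pool
        if k == 0 {
            continue;
        }
        // Per-frame broadband mean over the top-k subset.
        let series: Vec<f64> = frames
            .iter()
            .map(|f| ranked[..k].iter().map(|&i| f[i]).sum::<f64>() / k as f64)
            .collect();
        // Count sub-windows whose CV would trip the moving gate in an empty room.
        let mut fp = 0usize;
        let mut start = 0;
        while start + sub_win <= series.len() {
            let w = &series[start..start + sub_win];
            let mean = w.iter().sum::<f64>() / sub_win as f64;
            let var = w.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / sub_win as f64;
            if mean > 0.0 && var.sqrt() / mean > MOVING_GATE {
                fp += 1;
            }
            start += stride;
        }
        // Fewest FPs wins; ties go to the smaller total NBVI score.
        let total: f64 = scores[..k].iter().sum();
        let better = match best {
            None => true,
            Some((_, best_fp, best_total)) => fp < best_fp || (fp == best_fp && total < best_total),
        };
        if better {
            best = Some((k, fp, total));
        }
    }
    best.map_or(0, |(k, _, _)| k)
}
```

In the test case, fluctuations in the top-6 and next-6 subcarriers are anti-correlated. They cancel enough to stay under the gate only once at least 10 subcarriers are averaged, so K=10 wins over K=6 and K=8.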
## Files Touched
```
v2/crates/wifi-densepose-sensing-server/src/main.rs
- statics: AMP_BASELINE_PER_SUB, AMP_DRIFT
- helpers: amp_baseline_per_sub_init, amp_drift_init,
amp_drift_for_node, amp_drift_max
- load_baseline_file: parse per_subcarrier_mean → AMP_BASELINE_PER_SUB
- amp_presence_override: drift computation + stash
- amp_node_level: drift trigger (uses MAX for cross-node)
- amp_node_snapshot: per-node drift trigger (overrides MAX)
- amp_classify_from_latest: any-node drift trigger in global fusion
- nbvi_select_top_k: Step 3 FP-rate validation
docs/adr/ADR-104-per-subcarrier-drift-presence.md (this)
```
Implementation commit: `6212b17e`.
## Verified Acceptance
Server boot log (using existing v1 baseline.json without
`per_subcarrier_mean`):
```
baseline: loaded 2 node overrides from data/baseline.json
(node1=27.04, node2=14.72; node1_cv=2.62%, node2_cv=3.65%)
```
Without `per_subcarrier_mean` in the file, drift is identically 0
and the classifier behaves exactly as ADR-103. To activate the
drift channel: re-record via the ADR-107 REST endpoint or wait for
auto-recalibrate; new `baseline.json` carries the
`per_subcarrier_mean` vector and drift becomes live.
NBVI Step 3 validation runs on every refresh tick. With K=12 being
the "safe" default that always passes (clean low-CV window in the
operator's deployment) and smaller Ks not improving FP=0, the picker
keeps K=12 in steady state. Defends against future drift in channel
conditions where a previously-clean subcarrier picks up interference.
## Open Items
(none — see Closed below)
## Closed
* **Phase-domain drift**: `scripts/record-baseline.py` and the
in-process `capture_baseline_to_disk` now emit per-subcarrier
`per_subcarrier_phase_mean` + `per_subcarrier_phase_var` (circular
mean + variance) when the WS stream carries phases (ADR-106). The
server loads them into `PHASE_BASELINE_PER_SUB`, `phase_drift_update`
computes a per-tick circular-distance score over subcarriers whose
baseline variance is below `PHASE_BASELINE_VAR_MAX = 0.30`. Score
surfaces in `PerNodeFeatureInfo.phase_drift_score` (skip-if-none).
Falls back gracefully — legacy baselines without phase fields keep
amplitude-only behaviour.
* **Per-subcarrier baseline AGE check**: `baseline_staleness_watch`
background task warns when on-disk baseline is older than
`--baseline-stale-age-sec` (default 4 h) AND per-sub drift exceeds
1.5× presence threshold for ≥3 consecutive 5-min ticks while the
classifier reports `absent`. Rate-limited via
`--baseline-stale-warn-cooldown-sec` (default 1 h). Independent
from `auto_recalibrate_task`: that path needs a quiet room; this
one fires when the operator is *in* the room while the channel
itself has shifted. (commit eec3ca6c)
* **Per-subcarrier delta in UI**: `raw.html` now shows a per-node
drift sparkline below the RSSI/broadband trace, fixed Y range
[0, 0.30] with dashed presence (0.10) and warning (0.15)
thresholds. Numeric "drift" stat pill in the per-node header.
Backed by a new `drift_score: Option<f64>` field on
`PerNodeFeatureInfo` (skip-if-none — distinguishes "no per-sub
baseline loaded" from "loaded and stable at 0.0"). (commit eec3ca6c)
## References
* ADR-101 — broadband classifier; this ADR adds a parallel channel.
* ADR-102 — NBVI; this ADR adds Step 3 validation per Pace's spec.
* ADR-103 — persistent baseline; `per_subcarrier_mean` already written.
* ADR-107 — REST calibrate endpoint; how the operator refreshes the
per-sub vector on demand.
* [`docs/references/espectre-techniques.md`](../references/espectre-techniques.md)
§1.Step 3.


@ -0,0 +1,192 @@
# ADR-105 — No Synthetic Data in Production Runtime
**Status**: Accepted
**Date**: 2026-05-17
**Scope**: `v2/crates/wifi-densepose-sensing-server/src/main.rs`
(REST handlers under `/api/v1/pose/*`, `/api/v1/info`,
`derive_pose_from_sensing`, `generate_signal_field`).
## Context
After we pulled the upstream Docker UI (`ruvnet/wifi-densepose:latest`)
and pointed it at our backend via `--ui-path /tmp/wdp_ui/ui`, the
operator inspected the rich SPA and noticed several panels showing
data we have no business showing:
* **Pose dashboard rendered a 17-keypoint skeleton** even though no
DensePose model is loaded. Trace: `derive_pose_from_sensing` →
`derive_single_person_pose` synthesised a geometric placeholder
with keypoint `confidence = 0.0` but plausible-looking coordinates.
* **`/api/v1/pose/stats.average_confidence` was the literal `0.87`**
hard-coded in the handler.
* **`/api/v1/pose/zones/summary` invented four zones** (`zone_1..4`)
marked `clear`, even though no zone configuration exists on this
deployment.
* **`/api/v1/info.features.pose_estimation` was permanently `true`**
regardless of whether a model was actually loaded.
* **`SignalField` (the 20×20 room-heatmap in WS payload) was
procedurally generated** by mapping subcarrier index `k` to angle
`2π·k/N` and dropping Gaussian hotspots at radius proportional to
variance. A single sensor has no directional information — the
resulting heatmap had no correspondence to where anything actually
was in the room. UI rendered a believable spatial visual that was
entirely a fiction.
All five were cosmetic noise hiding the real capability gap. The
operator asked for boots-on-the-ground honesty: surface real ESP32-derived
state and nothing else.
## Decisions
### D1 — `derive_pose_from_sensing` returns empty
The function body is now `Vec::new()`. The legacy heuristic
(`derive_single_person_pose` + bone-length tables) is unreachable
from production paths but left in the source for the day a real
trained pose model is wired in. All call sites compile unchanged
and just get an empty vector when there is no model.
### D2 — `/api/v1/pose/current` gated on `model_loaded`
```rust
let persons = if s.model_loaded {
s.latest_update.as_ref().and_then(|u| u.persons.clone()).unwrap_or_default()
} else {
Vec::new()
};
```
Response now includes `"model_loaded": false` so the UI can decide
whether to render a placeholder ("No pose model loaded") or hide the
panel entirely.
### D3 — `/api/v1/pose/stats` drops the fake confidence
The hard-coded `"average_confidence": 0.87` is removed. Only
counters that come from real frame ingest remain
(`total_detections`, `frames_processed`) plus `model_loaded`.
### D4 — `/api/v1/pose/zones/summary` reports actual zone state
```json
{ "presence": <real>, "zones_configured": 0, "zones": {} }
```
No more invented `zone_1..4`. When the operator configures real
zones (open work), they get added here.
### D5 — `/api/v1/info.features.pose_estimation` reflects reality
```rust
"pose_estimation": s.model_loaded,
```
### D6 — `generate_signal_field` returns zero-filled grid
The body is now:
```rust
let grid = 20usize;
return SignalField {
grid_size: [grid, 1, grid],
values: vec![0.0; grid * grid],
};
```
UI renders blank instead of a synthesised spatial map. This is the
truthful state until a real multistatic localizer is wired (per
ADR-008 multi-AP attention or the `MultistaticFuser` already in
state). 77 lines of procedural-art code deleted.
## Files Touched
```
v2/crates/wifi-densepose-sensing-server/src/main.rs
- fn api_info (D5)
- fn pose_current (D2)
- fn pose_stats (D3)
- fn pose_zones_summary (D4)
- fn derive_pose_from_sensing (D1)
- fn generate_signal_field (D6)
docs/adr/ADR-105-no-synthetic-data-in-production-runtime.md (this)
```
Two commits:
* `9aa027e9` — D1..D5 (REST handlers + `derive_pose_from_sensing`)
* `30244d27` — D6 (`generate_signal_field` stub)
## Verified Acceptance
`/api/v1/sensing/latest` snapshot, deployment idle:
```
signal_field grid=[20,1,20], 400 values, 0 non-zero (was: random hotspots)
pose_keypoints null (was: 17-point heuristic)
persons null (was: synthesised array)
posture null (was: heuristic string)
signal_quality_score null
enhanced_motion null
vital_signs.br_bpm null (smoothed_br ≤ 1.0)
vital_signs.hr_bpm null
— still real —
features.mean_rssi -59 dBm ✓
features.variance 8.64 ✓
classification absent / present_still / present_moving / active per ADR-101
```
`/api/v1/pose/current`:
```json
{"persons": [], "total_persons": 0, "model_loaded": false, "source": "esp32"}
```
`/api/v1/info`:
```json
{"features": {..., "pose_estimation": false, ...}}
```
## Out of scope (already correct or developer-mode)
* `--source simulate` already exits with code 2 (parallel agent change).
* `--pretrain` / `--train` synthetic-fallback paths are explicit
dev-mode CLI flags. They never touch the runtime sensing path and
are out of scope for this ADR.
* `vital_signs` was already gated: `breathing_rate_bpm = Some(_)` only
when smoothed value > 1.0 BPM; otherwise `None`. No spurious BPM
reported.
* `enhanced_motion` / `enhanced_breathing` / `bssid_count` come from
`pipeline.process(&multi_ap_frame)` which consumes real CSI. When
the multi-BSSID pipeline is inactive they are `None`. Left alone.
## Open Items
* **UI badges for "no model"**: `raw.html` already renders correctly
on empty pose data; the richer Docker UI still tries to render a
skeleton from `pose_current` even when the array is empty. Need
a small UI patch: hide the pose canvas when `model_loaded == false`.
## Closed
* **Honest `enhanced_*` fields** — both `enhanced_motion` and
`enhanced_breathing` now carry a uniform `n_aps_used: u8` field
alongside the legacy `contributing_bssids` / `bssid_count`
counts. Consumers can gate on `n_aps_used >= 2` before trusting a
multi-AP enhancement. (commit 598a4b2f)
* **Real signal_field via multistatic fusion** — shipped in ADR-112.
When ≥ 2 ESP32 nodes are active, `MultistaticFuser` output drives
a coverage × activity 20×20 heatmap (isotropic Gaussian per node
position, gated by `cv²(fused_amplitude) × cross_node_coherence`).
Single-sensor / fusion-fail paths still return ADR-105's zero
grid. Map is honestly framed as coverage, not target position.
## References
* ADR-101 — classifier (only emits real-derived `motion_level`).
* ADR-103 — persistent baseline (only emits real-derived
baseline/threshold).
* [`docs/references/espectre-gap-analysis.md`](../references/espectre-gap-analysis.md)
— separate item list for what would replace each of the now-empty
outputs with real data.


@ -0,0 +1,161 @@
# ADR-106 — Full Complex CSI in WS + Managed-Ping Keepalive
**Status**: Accepted
**Date**: 2026-05-17
**Scope**: `v2/crates/wifi-densepose-sensing-server/src/main.rs`
(`NodeInfo` struct, `NodeState`, `udp_receiver_task`,
`csi_keepalive_task`, CLI `--csi-keepalive-pps`).
## Context
The operator's instruction: *"work without a model for now, but make
sure the sensors give us everything described in the parent repo so
the future model — and fine-motion detection right now — has full
signal."* Two gaps stood between the live deployment and that goal:
1. **WS NodeInfo carried only amplitude.** The 56-bin per-subcarrier
`amplitude` vector was exposed, but the equally-important
`phases` vector (radians, `atan2(Q, I)`) was parsed by
`parse_esp32_frame` and then silently dropped. Vital-signs FFT on
phase, MERIDIAN-style hardware normalization, and any future
DensePose-class model expect the full complex `H[k] = A_k · e^{jφ_k}`.
2. **Raw CSI rate depended on an ad-hoc shell `ping`.** With nothing
sending unicast traffic to the sensors, beacon-only rate dropped
to ~0.3 fps — too slow even for breathing-band FFT. The operator
was running `ping -i 0.05 192.168.0.101 &` by hand; whenever the Mac
switched networks, that ping died.
## Decisions
### D1 — Expose phases + noise_floor + n_antennas + µs timestamp in `NodeInfo`
Four new fields, each `#[serde(skip_serializing_if = empty/zero)]` so
feature_state ticks (no raw CSI) stay slim:
```rust
phases: Vec<f64>, // atan2(Q, I), radians
n_antennas: u8, // RX antenna count
noise_floor_dbm: i8, // RX noise floor
timestamp_us: u64, // sensor-side µs timestamp
```
This is the same data we already parse out of `0xC511_0001` frames
in `parse_esp32_frame`; previously we threw `phases` away and never
even surfaced `noise_floor` to the WS envelope. Consumers
reconstruct the complex CSI with `H[k] = amplitude[k] · (cos(phases[k]) + j·sin(phases[k]))`.
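A sketch of both directions in Rust, holding the complex value as an `(re, im)` tuple to stay dependency-free. Taking the amplitude as `hypot(I, Q)` is the usual convention and an assumption here; only the phase formula is stated above. `to_complex` and `from_iq` are hypothetical names.

```rust
/// Rebuild H[k] = A_k · e^{jφ_k} as (re, im) pairs from the WS `amplitude`
/// and `phases` vectors. `zip` stops at the shorter of the two inputs.
pub fn to_complex(amplitude: &[f64], phases: &[f64]) -> Vec<(f64, f64)> {
    amplitude
        .iter()
        .zip(phases)
        .map(|(&a, &p)| (a * p.cos(), a * p.sin()))
        .collect()
}

/// Inverse direction for one bin: (amplitude, phase) from raw I/Q.
pub fn from_iq(i: f64, q: f64) -> (f64, f64) {
    (i.hypot(q), q.atan2(i)) // phase = atan2(Q, I), radians
}
```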
### D2 — Per-node stash on `NodeState`
`NodeState` gains four new fields:
`latest_phases: Option<Vec<f64>>`, `latest_noise_floor: i8`,
`latest_timestamp_us: u64`, `latest_n_antennas: u8`. Populated on
every raw-CSI frame in the second raw-CSI path
(`udp_receiver_task` → raw CSI branch). `build_node_features` and
the raw-CSI SensingUpdate builder both read from this stash to
populate the new `NodeInfo` fields uniformly. Avoids carrying a
full per-subcarrier phase history buffer — we only need the most
recent vector for the UI / classifier; FFT consumers can build their
own window.
### D3 — Built-in keepalive via managed `ping` children
`csi_keepalive_task` async task:
1. Watches `NODE_ADDRS` (per-node sender address, populated on every
recv_from via a cheap magic-byte peek).
2. For each known node, spawns one `ping -i <interval> <ip>` child
process (`/sbin/ping` on macOS, `/usr/bin/ping` on Linux).
3. Re-spawns the child if it dies or if the sensor's IP changes
(DHCP rotation).
4. Default rate `--csi-keepalive-pps 25` → `-i 0.040` for `ping`.
`--csi-keepalive-pps 0` disables.
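A sketch of the interval arithmetic and the command construction using only `std::process::Command`. It builds the child without spawning it. The function names are hypothetical, and the respawn and IP-change logic of steps 1 and 3 is left out.

```rust
use std::path::Path;
use std::process::Command;

/// `--csi-keepalive-pps` → `ping -i` interval string; None disables keepalive.
pub fn keepalive_interval(pps: u32) -> Option<String> {
    if pps == 0 {
        None
    } else {
        Some(format!("{:.3}", 1.0 / f64::from(pps))) // 25 pps -> "0.040"
    }
}

/// Build (but do not spawn) the per-node ping child, probing the two paths
/// named in the text. Returns None if disabled or no ping binary exists.
pub fn keepalive_command(pps: u32, ip: &str) -> Option<Command> {
    let interval = keepalive_interval(pps)?;
    let ping = ["/sbin/ping", "/usr/bin/ping"]
        .into_iter()
        .find(|p| Path::new(p).exists())?;
    let mut cmd = Command::new(ping);
    cmd.arg("-i").arg(interval).arg(ip);
    Some(cmd)
}
```

Calling `.spawn()` on the returned `Command` would start the child. Keeping construction separate makes the argument list easy to check without network access.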
### D4 — Why ICMP, not UDP
We first tried a UDP-based keepalive (`sock.send_to(&[0], src_addr)`
to the sensor's ephemeral source port). On the operator's deployment
(ESP32-S3 + TP-Link WISP) it did **not** drive raw CSI: the sensor's
UDP stack rejected the closed-port packet before the CSI callback
fired in the WiFi RX path. ICMP echo bypasses user-space port logic
entirely — kernel WiFi RX handles it and the CSI callback fires
regardless of any listener.
Trade-off accepted: shelling out to `/sbin/ping` is platform-
specific. Linux containers must include `iputils-ping`; macOS has
`/sbin/ping` built-in. We probe both paths at startup. A pure-Rust
raw-socket ICMP would avoid the dependency but needs root /
`CAP_NET_RAW`.
## Files Touched
```
v2/crates/wifi-densepose-sensing-server/src/main.rs
- struct NodeInfo (+4 fields, helpers is_zero_*)
- struct NodeState (+4 latest_* fields)
- static NODE_ADDRS (per-node source address map)
- fn csi_keepalive_task (managed ping pool)
- udp_receiver_task (NODE_ADDRS populate via magic peek)
- all NodeInfo {...} sites (5 — populate new fields)
- Args { csi_keepalive_pps } (CLI flag, default 25)
docs/adr/ADR-106-full-complex-csi-keepalive.md (this)
```
Two implementation commits on the branch:
* `4daa2c9b` — D1 + D2 (WS struct, per-node stash, NodeInfo builders)
* `8489efe9` — D3 + D4 (keepalive task, NODE_ADDRS, CLI flag)
## Verified Acceptance
Live, server fresh-restart, no shell `ping` running:
```
boot: CSI keepalive: 25 ICMP pkt/s/node (interval 0.040s)
boot: keepalive: learned address for node 1 = 192.168.0.101:60492
boot: keepalive: learned address for node 2 = 192.168.0.100:51664
+2 s: keepalive: ping -i 0.040 192.168.0.101 for node 1
+2 s: keepalive: ping -i 0.040 192.168.0.100 for node 2
WS sample (5 s):
node 1: 67.6 Hz updates, 55.6 Hz amp-bearing raw CSI
node 2: 67.6 Hz updates, 55.6 Hz amp-bearing raw CSI
```
NodeInfo per node now carries `amplitude[56]`, `phases[56]`,
`rssi_dbm`, `noise_floor_dbm=-91`, `n_antennas=1`, plus the
empty/zero-suppressed `timestamp_us` (FW doesn't yet emit it —
left as a 0 placeholder).
Sampling rate 55 Hz comfortably covers breathing band (0.1-0.5 Hz)
and heart-rate band (0.8-2 Hz) for FFT; with the phase vector now
on the wire, those FFTs can run on phase as well as amplitude,
which is more sensitive to chest-wall micrometric motion.
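Here is a worked check of the band-coverage claim as a sketch. The 30 s analysis window is an assumption, since this ADR does not state one, and `band_coverage` is a hypothetical helper. At 55 Hz the Nyquist frequency is 27.5 Hz, far above 2 Hz. Frequency resolution depends only on the window length: 1/30 Hz at 30 s, i.e. 2 breaths or beats per minute.

```rust
/// Which FFT bins fall inside [lo_hz, hi_hz] for a window of `window_s`
/// seconds sampled at `fs_hz`. Bin k sits at k / window_s Hz.
#[derive(Debug, PartialEq)]
struct BandCoverage {
    nyquist_hz: f64,
    resolution_hz: f64,
    first_bin: usize,
    last_bin: usize,
}

fn band_coverage(fs_hz: f64, window_s: f64, lo_hz: f64, hi_hz: f64) -> Option<BandCoverage> {
    let nyquist_hz = fs_hz / 2.0;
    if lo_hz < 0.0 || lo_hz > hi_hz || hi_hz > nyquist_hz {
        return None; // malformed band, or not representable at this sample rate
    }
    // Resolution is fs / N with N = fs * window_s samples, i.e. 1 / window_s.
    let resolution_hz = 1.0 / window_s;
    // A small epsilon keeps exact band edges (0.1 Hz * 30 s = bin 3) from
    // being pushed into the neighbouring bin by floating-point rounding.
    const EPS: f64 = 1e-9;
    let first_bin = (lo_hz * window_s - EPS).ceil() as usize;
    let last_bin = (hi_hz * window_s + EPS).floor() as usize;
    (first_bin <= last_bin).then(|| BandCoverage { nyquist_hz, resolution_hz, first_bin, last_bin })
}
```

With a 30 s window, the breathing band spans bins 3-15 (13 bins) and the heart-rate band spans bins 24-60 (37 bins).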
## Out of scope / open
* ✅ **FW-side µs timestamp** — closed in commit `b787f40a`. FW now
appends `info->rx_ctrl.timestamp` (u32 LE) as 4 trailing bytes
after I/Q data; server parses opportunistically (None for older
FW). NodeInfo.timestamp_us now carries sensor monotonic µs when
available, falls back to server SystemTime otherwise.
* **Per-frame antenna selection** when ESP32-S3 reports >1 antenna —
current FW hard-codes `n_antennas=1` in `csi_collector.c`. Single-
antenna deployments are unaffected.
* **TP-Link queue limits** — at 55 Hz × 2 nodes = 110 raw frames/s,
plus 25 pings/s × 2 = 50 ICMP/s, all going through one consumer-
grade AP. Watching for saturation. Reduce `--csi-keepalive-pps` if
the AP starts dropping.
* **Channel hopping** (ADR-029) would give frequency diversity. Single-
channel works fine for one room.
## References
* ADR-100 — gain lock (the stability baseline keepalive needs).
* ADR-101 — classifier (consumes per-node amplitudes; a future
micro-motion detector will pull phase too).
* ADR-103 — persistent baseline (loaded at server boot, unaffected
by keepalive rate).
* ADR-105 — no synthetic data (this ADR adds *more* real data, not
more synthetic).
* [`docs/references/espectre-gap-analysis.md`](../references/espectre-gap-analysis.md)
— phase-aware processing is a prerequisite for several open items.

@ -0,0 +1,186 @@
# ADR-107 — REST Baseline Calibration + Auto-Recalibrate
**Status**: Accepted
**Date**: 2026-05-17
**Scope**: `v2/crates/wifi-densepose-sensing-server/src/main.rs`
(`baseline_get`, `baseline_calibrate`, `auto_recalibrate_task`,
`capture_baseline_to_disk`, `BASELINE_BUS`), `static/raw.html`
(`calibrate empty` button), CLI flags
`--auto-recalibrate-quiet-sec` / `--auto-recalibrate-min-age-sec`.
## Context
ADR-103 introduced a persistent empty-room baseline at
`data/baseline.json` so the classifier no longer needed a 60 s warm-up
after every server restart. To refresh it the operator had to:
1. Step out of the room.
2. SSH / open a terminal, run `python scripts/record-baseline.py
--duration 90`.
3. Wait for the "saved" message.
4. Restart the sensing-server (so it reloads the file).
5. Walk back in.
Steps 2 and 4 are the friction. The operator asked to remove them so
that a fresh device that just needs to monitor a room requires neither
a CLI nor a restart. Two changes:
* **`POST /api/v1/baseline/calibrate`** — fires the same record-and-
trim pipeline from inside the server, hot-reloads the override map
on success. UI button in `raw.html` triggers it.
* **Auto-recalibrate background task** — silently refreshes the
baseline when the classifier reports `absent` and CV stays low for
a long-enough window, without any operator action.
## Decisions
### D1 — `capture_baseline_to_disk` in-process
Pure-Rust port of `scripts/record-baseline.py`:
1. Subscribe to `BASELINE_BUS` (a `tokio::sync::broadcast::Sender<String>`
that mirrors every WS JSON message published by the broadcaster).
2. Collect `duration_sec` of per-node `(t, amplitudes, rssi)`.
3. Trim `trim_sec` from head and tail.
4. Slide `clean_window_sec` window across, pick lowest-CV chunk per
node.
5. Compute FULL-broadband mean/p50/p95/std/CV% (same schema as
ADR-103 v2; reload uses the same `load_baseline_file`).
6. Write `data/baseline.json` (configurable via JSON body `out`).
7. Call `load_baseline_file(path)` to hot-reload `AMP_BASELINE_OVERRIDE`
and `AMP_BASELINE_CV`.
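Steps 3-5 for a single node can be sketched as follows. The sketch rests on four assumptions:

* Each frame has already been reduced to one broadband amplitude scalar.
* Timestamps are sorted seconds.
* Percentiles use a rounded-index rule.
* The window slides one sample at a time. This is O(n²), which is fine for a 90 s capture.

`BroadbandStats`, `stats` and `lowest_cv_window` are hypothetical names, and the real `capture_baseline_to_disk` may differ in detail.

```rust
#[derive(Debug)]
struct BroadbandStats {
    n_samples: usize,
    mean: f64,
    p50: f64,
    p95: f64,
    std_dev: f64,
    cv_pct: f64,
}

/// Mean, rounded-index percentiles, population std and CV% of one chunk.
fn stats(values: &[f64]) -> Option<BroadbandStats> {
    if values.is_empty() {
        return None;
    }
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    let std_dev = (values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n).sqrt();
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let pct = |p: f64| sorted[((p / 100.0) * (sorted.len() - 1) as f64).round() as usize];
    let cv_pct = if mean != 0.0 { std_dev / mean.abs() * 100.0 } else { f64::INFINITY };
    Some(BroadbandStats { n_samples: values.len(), mean, p50: pct(50.0), p95: pct(95.0), std_dev, cv_pct })
}

/// `samples` are (t_sec, broadband_amplitude) pairs sorted by time.
fn lowest_cv_window(samples: &[(f64, f64)], trim_sec: f64, window_sec: f64) -> Option<BroadbandStats> {
    let t_first = samples.first()?.0;
    let t_last = samples.last()?.0;
    // Step 3: drop head and tail (typically the operator walking out / back in).
    let kept: Vec<(f64, f64)> = samples
        .iter()
        .copied()
        .filter(|&(t, _)| t >= t_first + trim_sec && t <= t_last - trim_sec)
        .collect();
    let t_end = kept.last()?.0;
    let mut best: Option<BroadbandStats> = None;
    // Step 4: slide a window_sec window and keep the chunk with the lowest CV.
    for (i, &(start, _)) in kept.iter().enumerate() {
        if start + window_sec > t_end {
            break; // remaining windows would be shorter than window_sec
        }
        let chunk: Vec<f64> = kept[i..]
            .iter()
            .take_while(|&&(t, _)| t < start + window_sec)
            .map(|&(_, v)| v)
            .collect();
        // Step 5: summary statistics for this candidate chunk.
        if let Some(s) = stats(&chunk) {
            if best.as_ref().map_or(true, |b| s.cv_pct < b.cv_pct) {
                best = Some(s);
            }
        }
    }
    best
}
```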
### D2 — `BASELINE_BUS` broadcast forwarder
Decouples baseline capture from individual WS clients. A small task
spawned at startup subscribes to `AppState.tx` and re-publishes every
message into `BASELINE_BUS`. Capture subscribers don't need a WS
connection or any external network path.
### D3 — `POST /api/v1/baseline/calibrate`
Optional JSON body: `{ duration_sec, trim_sec, clean_window_sec, out }`.
Defaults: 90 / 15 / 30 s and `data/baseline.json`. Returns immediately
with `{ "started": true, "hint": "..." }`. Subsequent calls while a
job is running return `{ "started": false, "reason": "calibration
already running" }`.
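The single-flight behaviour of D3 can be sketched with one atomic flag. `CALIBRATION_RUNNING`, `try_start_calibration` and `calibrate_response` are hypothetical names. The response strings omit the `hint` field for brevity.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Process-wide "a calibration job is running" flag (hypothetical name).
static CALIBRATION_RUNNING: AtomicBool = AtomicBool::new(false);

/// Atomically claim the single calibration slot. Returns true if this caller
/// won and is responsible for calling `finish_calibration` afterwards.
fn try_start_calibration() -> bool {
    CALIBRATION_RUNNING
        .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
        .is_ok()
}

/// Release the slot; must run on success AND on error.
fn finish_calibration() {
    CALIBRATION_RUNNING.store(false, Ordering::Release);
}

/// D3 response body, built as a plain string to stay dependency-free.
fn calibrate_response() -> String {
    if try_start_calibration() {
        // A real handler would spawn the capture task here and return at once.
        r#"{"started":true}"#.to_string()
    } else {
        r#"{"started":false,"reason":"calibration already running"}"#.to_string()
    }
}
```

`compare_exchange` makes check-and-set a single atomic step, so two concurrent POSTs cannot both start a capture.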
### D4 — `GET /api/v1/baseline`
```json
{
"nodes": { "1": {"full_broadband_p95": …, "full_broadband_cv_pct": …}, … },
"last_written_sec_ago": <i64>,
"calibration_status": "idle" | "running" | "running (auto)"
| "complete" | "complete (auto)" | "error: …"
}
```
UI polls this every 2 s while a calibration is running to drive the
button state machine.
### D5 — Auto-recalibrate background task
Wakes every 5 s. State machine:
* Read latest `classification.motion_level` and `confidence` (=CV).
* `quiet = (motion_level == "absent") && (cv < 0.08)`.
* If `quiet` is true continuously for `--auto-recalibrate-quiet-sec`
(default 1800 = 30 min) **AND** the last baseline write is older than
`--auto-recalibrate-min-age-sec` (default 3600 = 1 h), kick off
`capture_baseline_to_disk(90, 5, 45, "data/baseline.json")` in the
background.
* On error, log + set `calibration_status` so the UI surfaces it.
The 30-minute / 1-hour defaults are conservative: a person briefly
walking through doesn't trigger a baseline rewrite, while long-term
drift from WiFi reconfiguration or furniture rearrangement does. Passing
`--auto-recalibrate-quiet-sec 0` disables the task entirely.
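The D5 state machine can be sketched as a small struct ticked every 5 s. `AutoRecal` and its fields are hypothetical names. The 0.08 CV threshold, the `absent` gate and the two CLI knobs come from the text. Clearing `quiet_since` after a trigger is an assumption; the min-age gate alone would already prevent back-to-back captures.

```rust
/// Hypothetical state for the auto-recalibrate loop.
struct AutoRecal {
    quiet_sec_required: u64, // --auto-recalibrate-quiet-sec (0 = disabled)
    min_age_sec: u64,        // --auto-recalibrate-min-age-sec
    quiet_since: Option<u64>,
}

impl AutoRecal {
    fn new(quiet_sec_required: u64, min_age_sec: u64) -> Self {
        Self { quiet_sec_required, min_age_sec, quiet_since: None }
    }

    /// One tick. `now_sec` is a monotonic clock, `cv` the classifier
    /// confidence, `baseline_age_sec` the time since the last baseline write.
    /// Returns true when a capture should be kicked off.
    fn tick(&mut self, now_sec: u64, motion_level: &str, cv: f64, baseline_age_sec: u64) -> bool {
        if self.quiet_sec_required == 0 {
            return false; // feature disabled
        }
        let quiet = motion_level == "absent" && cv < 0.08;
        if !quiet {
            self.quiet_since = None; // any non-quiet tick restarts the clock
            return false;
        }
        let since = *self.quiet_since.get_or_insert(now_sec);
        let quiet_long_enough = now_sec.saturating_sub(since) >= self.quiet_sec_required;
        let baseline_old_enough = baseline_age_sec > self.min_age_sec;
        if quiet_long_enough && baseline_old_enough {
            self.quiet_since = None; // require a fresh quiet run next time
            return true;
        }
        false
    }
}
```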
### D6 — `raw.html` button
`calibrate empty` next to the existing `reset` button. Click →
`confirm()` reminds operator to step out → POSTs the endpoint → polls
status every 2 s, updating the inline pill from `recording… 12/90 s` to
`baseline updated ✓` on success. Disables itself while running.
## Files Touched
```
v2/crates/wifi-densepose-sensing-server/src/main.rs
- statics: BASELINE_LAST_WRITTEN, BASELINE_CALIBRATION_STATUS, BASELINE_BUS
- fn capture_baseline_to_disk (D1)
- fn auto_recalibrate_task (D5)
- fn baseline_get (D4)
- fn baseline_calibrate (D3)
- routes /api/v1/baseline + /api/v1/baseline/calibrate
- Args { auto_recalibrate_quiet_sec, auto_recalibrate_min_age_sec }
- main(): bus init + auto-recalibrate spawn
v2/crates/wifi-densepose-sensing-server/static/raw.html
- <button id="calibrateBtn"> (D6)
- <span id="calibStatus" class="pill"> (D6)
- JS: startCalibrate(), polling loop
docs/adr/ADR-107-auto-recalibrate-and-rest-baseline.md (this)
```
One impl commit so far: `0f373467`. UI button + ADR are in this
follow-up.
## Verified Acceptance
Boot log shows the new task wired:
```
baseline: loaded 2 node overrides from data/baseline.json
(node1=27.04, node2=14.72; node1_cv=2.62%, node2_cv=3.65%)
Auto-recalibrate enabled: trigger after 1800s of `absent`+low-CV,
min 3600s between writes
CSI keepalive: 25 ICMP pkt/s/node (interval 0.040s)
```
REST endpoints live:
```
GET /api/v1/baseline → current state + last_written_sec_ago
POST /api/v1/baseline/calibrate → { "started": true }
```
End-to-end smoke test (5 s capture window for speed):
```
POST → { started: true, duration_sec: 5 }
… 8 s elapsed …
GET → { calibration_status: "complete", last_written_sec_ago: 13 }
file: /tmp/test_baseline.json contains n_samples=86 per node + full_broadband_*
```
The hot-reload was visible immediately: the `nodes` field of
`GET /api/v1/baseline` showed the new (capture-window) values before
any server restart.
## Out of scope / open
* **UI: progress bar instead of pill text** — current state shows
textual `recording… 12/90 s`. Could be a thin progress bar.
* **Multiple baseline profiles** — only one `data/baseline.json` per
server. Future: name-scoped baselines for different deployment
contexts (day / night, summer / winter).
* **Quiet detection that uses CV alone** — currently AND-gated with
`motion_level == "absent"` which itself depends on the loaded
baseline. Risk: if the loaded baseline is *bad*, classifier may
never report `absent`, auto-recalibrate never fires. Mitigation:
REST endpoint stays available; first call out of the box is always
manual via the UI button.
## References
* ADR-100 — gain lock (the prerequisite that makes baseline meaningful).
* ADR-101 — classifier whose `motion_level`/`confidence` drives the
quiet-detector.
* ADR-103 — persistent baseline file (this ADR adds two ways to
refresh it).
* ADR-105 — no synthetic data (auto-recalibrate is *real* data, not
synthesized — it just runs without operator intervention).
* ADR-106 — keepalive (ensures the capture window has enough raw CSI
frames to give a meaningful percentile).
* [`scripts/record-baseline.py`](../../scripts/record-baseline.py)
— original CLI workflow, kept for headless use.

@ -0,0 +1,177 @@
# ADR-108 — FW NVS Persistence of Gain-Lock Values
**Status**: Accepted
**Date**: 2026-05-17
**Scope**: `firmware/esp32-csi-node/main/csi_collector.c`
(`rv_gain_load_from_nvs`, `rv_gain_save_to_nvs`, NVS hook in
`rv_gain_lock_process`).
## Context
ADR-100 introduced the FW-side gain-lock (AGC + FFT scale) but the
calibration runs on *every* boot:
1. Collect 300 packets (~3 s at 100 pps, but realistically 6-12 s
in production where keepalive drives only 25 pps).
2. Take the median of AGC and FFT samples.
3. Call `phy_force_rx_gain` / `phy_fft_scale_force` to freeze.
This means after every reboot — OTA, power blip, watchdog — the chip
goes through 6-12 s where CSI is generated with **unlocked AGC** that
drifts ±20-30 % (the very artefact gain-lock was meant to suppress).
The operator's classifier, ADR-101's NBVI selector, and ADR-103's
baseline comparison all see noisy data during that warm-up.
Pace's ESPectre persists everything calibration-related to NVS so
post-reboot the sensor is back in detect mode in well under a
second. This ADR ports the gain-lock half of that policy
(NBVI lives server-side in RuView, so that half does not apply).
## Decisions
### D1 — NVS namespace + keys
```c
#define RV_GAIN_NVS_NS "csi_cfg"
#define RV_GAIN_NVS_K_AGC "gl_agc" // u8
#define RV_GAIN_NVS_K_FFT "gl_fft" // i8
```
`csi_cfg` is the same namespace the WiFi creds / collector IP / node_id
live in (so it's already initialised + checked by `nvs_config_load`).
Two single-byte values — minimal NVS footprint.
### D2 — Two thin helpers
```c
static esp_err_t rv_gain_load_from_nvs(uint8_t *agc, int8_t *fft);
static void rv_gain_save_to_nvs(uint8_t agc, int8_t fft);
```
Both are local to `csi_collector.c`. Load returns `ESP_ERR_NVS_NOT_FOUND`
on a fresh chip; save logs a warning but never blocks the boot path
if NVS write fails.
### D3 — One-shot NVS load at top of `rv_gain_lock_process`
A static `s_nvs_checked` flag triggers exactly **one** load attempt
on the first packet after boot:
```c
if (!s_nvs_checked) {
s_nvs_checked = true;
uint8_t agc; int8_t fft;
if (rv_gain_load_from_nvs(&agc, &fft) == ESP_OK
&& agc >= RV_GAIN_MIN_SAFE_AGC)
{
phy_fft_scale_force(true, fft);
phy_force_rx_gain(1, (int)agc);
s_gain_locked = true;
ESP_LOGI(TAG, "gain-lock RESTORED from NVS: AGC=%u FFT=%d", agc, fft);
return;
}
}
```
The `agc >= RV_GAIN_MIN_SAFE_AGC` guard preserves ADR-100's "skip if
signal too strong" safety: a stale low-AGC value that would freeze
the RX path is rejected even if it's in NVS.
### D4 — Save after every successful lock
The existing `phy_*_force` branch in `rv_gain_lock_process` is wrapped
with a save call:
```c
phy_fft_scale_force(true, s_gain_fft_value);
phy_force_rx_gain(1, (int)s_gain_agc_value);
rv_gain_save_to_nvs(s_gain_agc_value, s_gain_fft_value);
ESP_LOGI(TAG, "gain-lock PERSISTED to NVS (%s/%s, %s)",
RV_GAIN_NVS_NS, RV_GAIN_NVS_K_AGC, RV_GAIN_NVS_K_FFT);
```
So the first boot ever does the full 300-packet calibration **and**
saves; every subsequent boot loads instantly from D3.
### D5 — Invalidation policy
Stored values are tied to: this sensor's physical location + this AP's
MAC + this channel + this antenna orientation. If any of those change,
the saved AGC/FFT may be slightly off-optimal, but **not dangerous**:
the WiFi PHY keeps delivering CSI, just less well scaled, and the host
sees higher baseline noise until the operator triggers a re-calibration.
Today: erase via `idf.py erase-flash` over USB, or `nvs_flash_erase()`
called from a future REST endpoint. No automatic invalidation — the
operator decides when a deployment change is significant enough.
## Files Touched
```
firmware/esp32-csi-node/main/csi_collector.c
- #include "nvs.h" / "nvs_flash.h"
- rv_gain_load_from_nvs / rv_gain_save_to_nvs (D2)
- s_nvs_checked one-shot in rv_gain_lock_process (D3)
- save call after lock branch (D4)
docs/adr/ADR-108-fw-nvs-persist-gain-lock.md (this)
```
Implementation commit: `3779bb76`. Flashed to both sensors via OTA
(no USB) — `python3 scripts/ota-deploy.sh`.
## Verified Acceptance
Test sequence:
1. OTA flash new FW to both nodes (first boot, NVS empty).
2. Wait 15 s for FW to complete first calibration + write to NVS.
3. OTA flash the SAME binary again (forces a reboot; new FW has
values in NVS from step 2).
4. Sample WS amplitude rate in the first 3 s after the second boot.
Before this ADR: ~5-12 s gap between boot and first amp-bearing WS
frame (waiting for fresh calibration). After this ADR: WS shows
**44 Hz raw CSI in the first 3 s** — instant resume.
Logs from a chip that has values in NVS:
```
I (335) main: boot: reset_reason=SW running_partition=ota_1
I (520) csi_collector: gain-lock RESTORED from NVS: AGC=44 FFT=-33
(0-packet calibration; clear NVS to recalibrate)
```
vs first-boot ever:
```
I (335) main: boot: reset_reason=POWERON running_partition=ota_0
I (4980) csi_collector: gain-lock APPLIED: AGC=44 FFT=-33
(median of 300 packets)
I (4980) csi_collector: gain-lock PERSISTED to NVS (csi_cfg/gl_agc, gl_fft)
```
## Open Items
* **Per-channel cache** (`csi_cfg/gl_<chan>_agc`). If the channel hop
table (ADR-029) is reactivated, each channel needs its own values.
~1 h FW. Deferred — channel hopping is out of scope for the current
single-channel deployment.
## Closed
* **REST endpoint to clear gain-lock NVS** — shipped via
`POST /ota/recalibrate` in ADR-109.
* **Track AP MAC alongside AGC/FFT** — shipped via `gl_ap_mac` NVS key
+ boot-time comparison in ADR-109.
## References
* ADR-100 — gain-lock implementation that this ADR persists.
* ADR-101 — classifier that suffers during the 6-12 s warm-up gap
that this ADR closes.
* `docs/references/ota-pipeline.md` — the WiFi flash flow used to
deploy this FW change without USB.
* Francesco Pace, *How I Turned My Wi-Fi Into a Motion Sensor —
Part 2*, "Persisted calibration" — the upstream pattern this ADR
ports (their NVS payload also includes NBVI indices + baseline,
which RuView keeps server-side).

@ -0,0 +1,145 @@
# ADR-109 — FW Gain-Lock Invalidation (REST trigger + AP-MAC binding)
**Status**: Accepted
**Date**: 2026-05-17
**Scope**: `firmware/esp32-csi-node/main/ota_update.c`,
`firmware/esp32-csi-node/main/csi_collector.c`. Closes both Open Items in
ADR-108.
## Context
ADR-108 persists the FW-side gain-lock (AGC + FFT scale) to NVS so a
reboot resumes detect mode in ~0.5 s. Two follow-ups remained:
1. **No way to clear the cache without USB.** When an operator moved a
sensor or swapped the AP, they had to plug the device in and run
`idf.py erase-flash` to force a re-calibration. Defeats the whole
point of OTA-only ops.
2. **No automatic invalidation on AP swap.** Gain-lock is tied to a
specific RF path (AP location, distance, multipath). Connecting the
same sensor to a different AP and re-using the cached AGC/FFT yields
either over-saturated or under-amplified CSI for the whole session
until manual intervention.
## Decisions
### D1 — `POST /ota/recalibrate` REST trigger
New HTTP handler registered on the existing port 8032 next to `/ota`
and `/ota/status`. Same Bearer-token auth path as the firmware upload
endpoint (reuses `ota_check_auth`).
Behaviour:
1. Open NVS namespace `csi_cfg` RW.
2. Erase three keys: `gl_agc`, `gl_fft`, `gl_ap_mac` (D2).
3. `nvs_commit` + close.
4. Send `200 OK {status:"ok"}` JSON.
5. `vTaskDelay(1 s)` to flush the response, then `esp_restart()`.
Next boot: `rv_gain_load_from_nvs` returns `ESP_ERR_NVS_NOT_FOUND`, so
the existing 300-packet calibration runs as on a never-calibrated chip.
### D2 — `gl_ap_mac` NVS key (6-byte blob)
Stored alongside `gl_agc` / `gl_fft` whenever the calibration writes
back. Source: `esp_wifi_sta_get_ap_info(&ap).bssid`. Read at the same
moment as AGC/FFT during the one-shot NVS short-circuit at the top of
`rv_gain_lock_process`.
Comparison rule on boot:
| Saved MAC | Current AP MAC | Action |
|--------------------|-------------------------|---------------------------------------|
| all-zero (legacy) | any | Use cached gain-lock (wildcard match) |
| matches current | same | Use cached gain-lock |
| differs | any | Log warning, fall through to full cal |
| any | AP info unavailable | Defensive: fall through to full cal |
The all-zero wildcard is the one-time upgrade case: NVS blobs written
by ADR-108 builds predate ADR-109 and have no MAC. Treating them as
match-anything avoids forcing every existing deployment to re-calibrate
on the first ADR-109 boot. The next save (post-re-cal or at the next
natural calibration trigger) populates the real MAC, after which the
strict comparison applies.
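The D2 comparison table reduces to a small decision function. This is a logic sketch in Rust, not the firmware's C, and `GainCacheDecision` and `decide` are hypothetical names. The table does not say which row wins when the saved MAC is all-zero and AP info is unavailable. The sketch checks rows top to bottom, so the legacy wildcard wins; that precedence is an assumption.

```rust
/// Boot-time outcome of the gain-lock cache check (hypothetical names).
#[derive(Debug, PartialEq)]
enum GainCacheDecision {
    UseCached,
    Recalibrate,
}

const ZERO_MAC: [u8; 6] = [0; 6];

/// `saved` is the `gl_ap_mac` blob (all-zero for ADR-108-era entries);
/// `current` is the associated AP's BSSID, or None if AP info is unavailable.
fn decide(saved: [u8; 6], current: Option<[u8; 6]>) -> GainCacheDecision {
    if saved == ZERO_MAC {
        return GainCacheDecision::UseCached; // legacy wildcard (row 1)
    }
    match current {
        Some(mac) if mac == saved => GainCacheDecision::UseCached, // row 2
        Some(_) => GainCacheDecision::Recalibrate,                 // row 3: AP swapped
        None => GainCacheDecision::Recalibrate,                    // row 4: defensive
    }
}
```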
### D3 — `rv_gain_save_to_nvs` writes MAC too
Signature changes from `(uint8_t agc, int8_t fft)` to
`(uint8_t agc, int8_t fft, const uint8_t mac[6])`. The caller reads
`ap.bssid` at save time so the saved MAC reflects the AP the
calibration actually ran against (not whatever AP the sensor is
connected to N seconds later, which on a roaming-capable mesh could
differ).
If the save-time AP MAC is unavailable (extremely rare — the gain-lock
hook only fires from a CSI callback, and CSI callbacks require an
established WiFi link), the saved MAC is left as all-zero. The next
boot then takes the wildcard path, preserving the current behaviour
rather than failing closed.
### D4 — Recalibrate handler also clears `gl_ap_mac`
Even though removing only AGC/FFT would force a re-cal by virtue of
the missing keys, also erasing `gl_ap_mac` is cleaner: the next write
will repopulate it from the current AP, and there's no stale MAC
sitting in NVS that could be partially restored by a future bug.
## Trade-offs
* **One-time false re-cal on first ADR-109 boot for chips that ever
saw an AP swap before this ADR shipped.** Acceptable: gain-lock
re-cal takes 6-12 s and produces a brief noise spike, but it's a
one-time event and the result is correct from that point onward.
* **No multi-AP cache.** If a sensor roams between two APs (rare in
this deployment: each sensor is parked next to a fixed TP-Link)
it will re-calibrate on every AP swap. Multi-AP storage would need
per-AP-MAC sub-keys (`gl_agc:<bssid>`, etc.) — explicitly out of
scope; cross-references ADR-108's per-channel cache item which has
the same "wait until needed" disposition.
* **The `gl_ap_mac` blob grows the gain-lock bundle in NVS from 2 bytes
to 8 bytes.** Negligible: the shared `csi_cfg` namespace already holds
SSID/password/IP and a few other keys totalling a few hundred bytes.
## Files Touched
```
firmware/esp32-csi-node/main/ota_update.c
- ota_recalibrate_handler (D1, D4)
- register POST /ota/recalibrate
firmware/esp32-csi-node/main/csi_collector.c
- RV_GAIN_NVS_K_AP_MAC define (D2)
- rv_gain_load_from_nvs: optional MAC out-param + wildcard support
- rv_gain_save_to_nvs: MAC in-param + nvs_set_blob (D3)
- rv_gain_lock_process: AP-MAC comparison branch (D2)
- rv_gain_lock_process: read current bssid before save (D3)
docs/adr/ADR-109-fw-gain-lock-invalidation.md (this)
```
## Verified Acceptance
1. `idf.py build` clean (only the pre-existing `wifi_promiscuous_cb`
unused warning, unchanged by this ADR).
2. After OTA flash of both nodes:
* `curl -X POST http://192.168.0.100:8032/ota/recalibrate`
* `curl -X POST http://192.168.0.101:8032/ota/recalibrate`
Both return `{"status":"ok","message":"gain-lock NVS cleared; rebooting"}`.
3. Boot log on next reboot shows `gain-lock APPLIED:` (full cal) +
`gain-lock PERSISTED to NVS (AGC=N FFT=M AP=…)` instead of the
`gain-lock RESTORED from NVS:` line that fast-path boots produce.
4. AP-swap path verified by manually flipping the WiFi credentials to
a different SSID via `provision.py`, re-flashing, and confirming
the boot log shows `gain-lock NVS MISS: saved AP=… → current=…
Re-calibrating.` followed by a full cal.
## References
* ADR-108 — NVS persistence of gain-lock. Both Open Items in ADR-108
resolved by this ADR (REST trigger, AP-MAC binding).
* ADR-050 — OTA Bearer-token auth. Same `ota_check_auth` shared with
the new endpoint.
* `docs/references/ota-pipeline.md` — port 8032 recipe; gains a new
bullet for `/ota/recalibrate`.

@ -0,0 +1,160 @@
# ADR-110 — TP-Link WISP Deployment + RSSI-Δ Presence Detector
**Status**: Accepted
**Date**: 2026-05-15
**Scope**: `v2/crates/wifi-densepose-sensing-server/`,
deployment of TP-Link TL-WR841N as a dedicated CSI AP for room01/room02.
## Context
After ADR-098 made the RuView FW boot cleanly and FW5.47 fallback gave real
motion, the deployed sensors still produced unreliable presence in the
operator's home environment. Investigation revealed two compounding factors:
1. **Ambient WiFi noise.** Both sensors were associated with the main
household AP (`Tran Thanh T3`), which is heavily used by neighbouring
networks on the same channel. Per-frame broadband variance in an *empty*
room measured higher than when the operator was sitting at the desk
— the multipath geometry plus neighbour traffic dominated the CSI
signal.
2. **The wrong feature.** Even on a clean channel, CSI variance does not
monotonically track human presence at multi-meter range. A stationary
body modifies multipath consistently (variance drops), while an empty
room exhibits more multipath spread (variance rises). The host DSP
features `variance`, `motion_band_power`, and `spectral_power` all
showed this inversion at the deployed sensor locations.
Three one-minute measurements collected with TP-Link as the isolated AP,
sensors connected only to it:
| Feature | STILL (sitting) | WALK (room loop) | EMPTY |
|---|---|---|---|
| `variance` mean | 29.7 | 33.7 | **35.8** |
| `motion_band_power` mean | 49.8 | 54.6 | **57.4** |
| `spectral_power` mean | 161 | 172 | 172 |
| `mean_rssi` mean (dBm) | -59.13 | -59.12 | -58.98 |
| **`mean_rssi` std** | **0.60** | **1.02** | **0.35** |
Only **standard deviation of mean_rssi** monotonically separates the three
states. The human body physically perturbs RF path loss to the sensor:
absent → flat RSSI, still → small fluctuations from breathing/microtremor,
walking → large per-second swings.
## Decisions
### D1 — Isolate sensors on a dedicated AP (TP-Link TL-WR841N, WISP mode)
The household AP serves dozens of clients across multiple channels and is
constantly retransmitting management frames for neighbours and BT-coex
overlay. We deployed a TP-Link TL-WR841N in **WISP mode**:
* TP-Link associates with `Tran Thanh T3` over WiFi as a single client.
* TP-Link runs its own NAT and broadcasts a clean SSID (`TP-Link_8340`,
WPA2-PSK, fixed channel) on the 2.4 GHz band.
* Sensors are provisioned to associate only with `TP-Link_8340`.
* TP-Link's NAT forwards their UDP/5006 packets to the Mac on the
household subnet (Mac stays connected to `Tran Thanh T3` for internet,
no LAN reconfiguration on the host side).
Empirical effect: per-minute broadband variance in an empty room dropped
from **50.7** (on `Tran Thanh T3`) to **35.8** (on `TP-Link_8340`).
### D2 — Replace CSI-variance presence detector with rolling RSSI MAD-Δ
The host-side classifier in `sensing-server` runs `extract_features_from_frame`
→ `smooth_and_classify` and outputs `motion_level` ∈ {`absent`, `present_still`,
`present_moving`, `active`} based on a `motion_score` derived from CSI
amplitude variance + temporal change-points. On the deployed geometry the
score crosses thresholds for body-far-from-sensor cases but not for body-near-
sensor stationary cases; the `present_still` band especially is unreliable.
We add an **RSSI-based override** layered after the existing classifier:
* Per-node rolling window of the last 120 frame RSSI samples (~10 s at
12 Hz).
* Metric: **mean absolute delta of consecutive RSSI values** (MAD-Δ).
This is more robust than standard deviation for the int8-quantised RSSI
the WiFi driver reports — a single 1-dB step in a quiet window
inflates std but contributes minimally to MAD-Δ.
* Thresholds (calibrated empirically; see D3):
* `d < 0.20` → `absent`
* `0.20 ≤ d < 0.55` → `present_still`
* `0.55 ≤ d < 1.10` → `present_moving`
* `d ≥ 1.10` → `active`
* Confidence is surfaced as the raw `d` value during the tuning phase so
that downstream UIs (the calibration console at `static/spectrum.html`)
can drive threshold refinement on new deployments.
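The D2 detector can be sketched as a fixed-capacity rolling window plus a threshold ladder. `RssiWindow` and `classify_mad_delta` are hypothetical names. The 120-sample window, the MAD-Δ definition and the four thresholds come from the text.

```rust
use std::collections::VecDeque;

/// Per-node rolling RSSI window (hypothetical name).
struct RssiWindow {
    cap: usize,
    samples: VecDeque<i8>,
}

impl RssiWindow {
    fn new(cap: usize) -> Self {
        Self { cap, samples: VecDeque::with_capacity(cap) }
    }

    /// Append one frame's RSSI, evicting the oldest sample when full.
    fn push(&mut self, rssi_dbm: i8) {
        if self.samples.len() == self.cap {
            self.samples.pop_front();
        }
        self.samples.push_back(rssi_dbm);
    }

    /// Mean of |r[i] - r[i-1]| over the window; None until two samples exist.
    fn mad_delta(&self) -> Option<f64> {
        if self.samples.len() < 2 {
            return None;
        }
        let sum: f64 = self
            .samples
            .iter()
            .zip(self.samples.iter().skip(1))
            .map(|(a, b)| (f64::from(*b) - f64::from(*a)).abs())
            .sum();
        Some(sum / (self.samples.len() - 1) as f64)
    }
}

/// Map d to a motion level using the shipped (pre-tuning) thresholds.
fn classify_mad_delta(d: f64) -> &'static str {
    if d < 0.20 {
        "absent"
    } else if d < 0.55 {
        "present_still"
    } else if d < 1.10 {
        "present_moving"
    } else {
        "active"
    }
}
```

Take a quiet 120-sample window with a single 1 dB step halfway through. Its standard deviation is 0.5 dB, but its MAD-Δ is only 1/119 ≈ 0.008. That is the quantisation-robustness argument in numbers.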
The CSI-based features are preserved in the `features.*` block so that
downstream consumers (vital signs, signal-quality estimator, multi-node
fusion) continue to operate.
### D3 — Threshold calibration via UI-assisted "tell me your state" protocol
Tunable thresholds are per-deployment. The procedure documented for the
operator:
1. Open `http://localhost:8091/spectrum.html` (also reachable via Tailscale
at the Mac's `100.x.y.z:8091`).
2. Confidence on that page shows the raw RSSI-Δ for the user's environment.
3. With a stopwatch:
* Leave the room for 60 s. Record median `d`.
* Sit at the workstation for 60 s. Record median `d`.
* Walk the loop for 60 s. Record median `d`.
4. Thresholds = midpoints between consecutive medians.
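Step 4 can be sketched as below; `thresholds_from_medians` is a hypothetical helper. The strict-ordering check is an addition of this sketch. If a noisier state had a higher median than a quieter one (for example, an empty room on a busy channel), midpoints would be meaningless. Three captured states yield two boundaries.

```rust
/// Midpoints between consecutive per-state medians of `d`.
/// `medians` must be ordered absent < still < moving (< ...).
fn thresholds_from_medians(medians: &[f64]) -> Result<Vec<f64>, String> {
    for pair in medians.windows(2) {
        if pair[1] <= pair[0] {
            return Err(format!("medians not strictly increasing: {} then {}", pair[0], pair[1]));
        }
    }
    Ok(medians.windows(2).map(|p| (p[0] + p[1]) / 2.0).collect())
}
```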
For the operator's room (TP-Link AP at `192.168.1.14`, sensors at .17 / .19):
| State | `d` median (target) | `d` measured (operator) |
|---|---|---|
| absent | should be near 0 | **0.49** (empty room) |
The operator's empty-room baseline of `d ≈ 0.49` is *higher* than the
heuristic 0.20 threshold the code currently ships with. This is consistent
with the int8 quantisation: even an empty channel jitters by ±1 dB
across consecutive frames. Final threshold tuning for this deployment is
**still pending** — the captures for `sit` and `walk` are needed to set
the boundaries. The code surfaces `d` via `confidence` to let the
operator capture those next two states.
## Files Touched
```
v2/crates/wifi-densepose-sensing-server/src/main.rs # RSSI MAD-Δ + override
v2/crates/wifi-densepose-sensing-server/static/spectrum.html # live console
v2/crates/wifi-densepose-sensing-server/static/calibrate.html # peak-tracker view
docs/adr/ADR-110-tplink-wisp-deployment-and-rssi-presence.md # this ADR
```
## Verified Acceptance
| Criterion | Result |
|---|---|
| Sensors associate only with TP-Link AP (no `Tran Thanh T3` direct) | ✅ |
| Mac receives UDP/5006 packets via TP-Link NAT | ✅ (~12 Hz combined) |
| Empty-room ambient noise reduced vs household AP | ✅ (variance 50.7 → 35.8) |
| `confidence` field carries raw RSSI-Δ for live tuning | ✅ |
| Vital signs (breathing 9-11 BPM) continue to populate when occupied | ✅ |
## Open Items
* Threshold final-tune (sit + walk medians not yet measured on TP-Link).
* Replace MAD-Δ with `quantile(|Δ|, 0.9) - quantile(|Δ|, 0.1)` if
occasional packet-rate hiccups inflate the simple mean.
* The TP-Link runs WISP NAT — all sensor source IPs collapse to one
(`192.168.1.14` on the household side). The server discriminates nodes
by **MAC address** parsed from the `CSI_LEAN` payload, not by source IP,
so this works today. If we later switch FW back to raw `0xC5110001`
binary frames (which carry MAC) the same discrimination holds. If
`parse_esp32_vitals` (0xC5110002) becomes the upstream format,
per-node state tracking needs a separate MAC-bearing field added to
that packet.
* On longer test sessions: the `motion_band_power` and `variance` features
remain present in `features.*` and are useful for vital-sign signal-quality
estimation; do not strip them.
## References
* ADR-039 — Edge intelligence pipeline (host DSP path).
* ADR-098 — Earlier ESP32-S3 deployment fixes (CSI callback, OTA, mobile UI).
* RuView issue thread on RSSI-vs-CSI presence inversion (this ADR).

@ -0,0 +1,154 @@
# ADR-112 — Multi-AP `signal_field` via `MultistaticFuser`
**Status**: Accepted
**Date**: 2026-05-17
**Scope**: `v2/crates/wifi-densepose-sensing-server/src/main.rs`
(`signal_field_from_multistatic`, two ESP32 vitals call sites). Closes
the "Real signal_field via multistatic fusion" Open Item in ADR-105.
## Context
ADR-105 D6 stripped the synthetic `signal_field` paint and left a 20×20
zero grid in its place. The honesty contract was: never emit visual
positional output without a physically grounded source. A real
multistatic fuser (`MultistaticFuser` in `wifi-densepose-signal`) is
already wired into the server via `multistatic_bridge::fuse_or_fallback`
and consumed by `compute_person_score_from_amplitudes` — but its
output didn't feed the `signal_field` heatmap.
This ADR consumes that fusion output to produce a *coverage × activity*
spatial map when ≥ 2 ESP32 nodes are simultaneously active.
## What the new map honestly is (and isn't)
* **Is**: a 20×20 floor-plane heatmap where each cell value =
Σ over active nodes of `global_activity · exp(-d² / (2σ²))`, with `d`
the Euclidean distance from the cell to that node's configured
position, σ a fixed radius, and `global_activity` =
`cv²(fused_amplitude) · cross_node_coherence`. Both factors live in
`[0, 1]`; their product gates the field on simultaneous CSI
modulation AND inter-node agreement.
* **Is not**: a person-location estimate. Commodity ESP32s have no
phase-coherent ranging (no UWB, no two-way ranging); any "target
position" would be fabrication. The map shows *where the active
sensors' coverage zones overlap when they collectively see
modulation*. That's a real, derivable quantity. A "where is the
person" claim is not, and is deliberately withheld.
## Decisions
### D1 — `signal_field_from_multistatic(fuser, node_states) -> SignalField`
New function in `main.rs`. Re-runs `multistatic_bridge::fuse_or_fallback`
(cheap — attention-weighted mean across O(N_nodes × N_subcarriers)),
discards the count-fallback path, and proceeds only when:
* `fused.active_nodes >= 2`, AND
* `fused.node_positions` non-empty, AND
* `fused.fused_amplitude` non-empty, AND
* `global_activity > 1e-3` (everything below is rounding noise).
Otherwise returns the same zero-filled grid `generate_signal_field`
produces. This preserves ADR-105's contract on single-sensor
deployments and degenerate fusion failures.
### D2 — Render constants
* Grid `20 × 1 × 20` (matches the existing `SignalField` shape and the
UI's heatmap consumer).
* `ROOM_EXTENT_M = 3.0` m (half-width of the square the grid spans —
6 m × 6 m floor). Matches the typical "operator room" dimension and
the placement of the two physical sensors.
* `SIGMA_M = ROOM_EXTENT_M / 4.0 = 0.75 m` for the isotropic Gaussian.
Borrowed from Pace's ESPectre heuristic (his code uses ~room/4 for
a similar overlap-rendering pass).
* `(grid_x, grid_y) → (x, z)` projection — the WiFi sensors live in
3D position space `[x, y, z]` where `y` is height, but the heatmap
is a floor-plan view, so we ignore `y` and use `(x, z)`.
### D3 — `cv² × coherence` as the activity scalar
Two factors so that EITHER a quiet channel (low cv²) OR disagreeing
sensors (low coherence) collapses the field to zeros. This means:
* Empty room (low cv²) → blank map. Truthful.
* One sensor saw a transient (high cv² for one node, low coherence
across nodes) → blank map. Truthful — no multistatic signal.
* All sensors see synchronized modulation → bright map. Truthful —
there really is something in the shared coverage.
The product is bounded in `[0, 1]`; we clamp each cell to `[0, 1]`
post-sum because two overlapping Gaussians can sum to > 1 in their
shared region.
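D1-D3 together can be sketched as one function, under four assumptions:

* Node positions are already projected to floor `(x, z)`.
* Grid cells are centred inside `[-ROOM_EXTENT_M, +ROOM_EXTENT_M]` on both axes.
* The node count stands in for `fused.active_nodes`.
* Both activity factors are clamped to `[0, 1]` defensively.

`coverage_field` is a hypothetical name, not the real `signal_field_from_multistatic`.

```rust
const GRID: usize = 20;
const ROOM_EXTENT_M: f64 = 3.0;
const SIGMA_M: f64 = ROOM_EXTENT_M / 4.0;

/// Coverage-times-activity map for node floor positions (x, z) in metres.
/// Returns GRID*GRID values in row-major (z, x) order; all zeros when fewer
/// than two nodes are given or the activity scalar is at or below 1e-3.
fn coverage_field(node_xz: &[(f64, f64)], cv: f64, coherence: f64) -> Vec<f64> {
    let mut values = vec![0.0; GRID * GRID];
    let activity = (cv * cv).clamp(0.0, 1.0) * coherence.clamp(0.0, 1.0);
    // `!(activity > 1e-3)` also rejects NaN inputs.
    if node_xz.len() < 2 || !(activity > 1e-3) {
        return values;
    }
    let cell = 2.0 * ROOM_EXTENT_M / GRID as f64; // 0.3 m per cell
    for gz in 0..GRID {
        for gx in 0..GRID {
            let x = -ROOM_EXTENT_M + (gx as f64 + 0.5) * cell;
            let z = -ROOM_EXTENT_M + (gz as f64 + 0.5) * cell;
            let sum: f64 = node_xz
                .iter()
                .map(|&(nx, nz)| {
                    let d2 = (x - nx).powi(2) + (z - nz).powi(2);
                    activity * (-d2 / (2.0 * SIGMA_M * SIGMA_M)).exp()
                })
                .sum();
            values[gz * GRID + gx] = sum.min(1.0); // overlapping Gaussians can exceed 1
        }
    }
    values
}
```

Because each Gaussian peaks at its node, the brightest cells sit on the sensors, not between them. This is the intended "coverage glow" rather than a target blob.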
### D4 — Call-site contract: prefer multistatic, else zero
Both ESP32 vitals paths build the field as:
```rust
let multi = signal_field_from_multistatic(&s.multistatic_fuser, &s.node_states);
if multi.values.iter().any(|&v| v > 0.0) { multi } else { /* zero */ }
```
A `multi` that is all-zero — either because `< 2` nodes are active or
because the activity threshold wasn't met — gets discarded and the
existing `generate_signal_field` zero is emitted. This keeps the
output identical to today's behavior when the multistatic path can't
produce signal, so no consumer is surprised.
The Windows WiFi / multi-BSSID paths (`windows_wifi_task`) are not
touched: they have no per-node spatial positions, so the multistatic
approach doesn't apply and they keep their zero grid.
## Trade-offs
* **Node positions must be configured.** The `--node-positions`
CLI flag (`SENSING_NODE_POSITIONS` env) is the source of truth.
If unset, `multistatic_fuser` has empty positions, so this ADR
silently degrades to zero output — no user-visible regression.
* **Coverage map ≠ target map.** Operators looking at the heatmap
will be tempted to read it as "the person is here." Mitigation:
the field is brightest *at the nodes themselves*, not between
them, so the visual signature is "sensor coverage glow," not "blob
in the middle of the room." A future ADR (e.g. RF
tomography or RSSI MUSIC) could replace this with a real
localizer; this ADR is the honest baseline that holds until then.
* **σ is fixed.** A room-sized parameter should arguably scale with
the inter-node distance, but until we have more than two sensors
in one deployment that's premature parameter sprawl. The
`ROOM_EXTENT_M` / `SIGMA_M` constants are intentionally
hard-coded in one place to be easy to find and tune.
## Files Touched
```
v2/crates/wifi-densepose-sensing-server/src/main.rs
- signal_field_from_multistatic (D1, D2, D3)
- two vitals-path call sites adopt the prefer-multistatic-else-zero
contract (D4)
docs/adr/ADR-112-multi-ap-signal-field.md (this)
docs/adr/ADR-105-no-synthetic-data-in-production-runtime.md
- close "Real signal_field via multistatic fusion" Open Item
```
## Verified Acceptance
* `cargo build --release -p wifi-densepose-sensing-server` clean.
* `cargo test --release -p wifi-densepose-sensing-server
--no-default-features` — 313 tests pass (no regressions).
* With one sensor active, `signal_field.values` are all zero —
matches ADR-105 behaviour.
* With two sensors active and a person moving in shared coverage,
the field is non-zero with bright cells overlapping at each
sensor's footprint and tapering between them.
## References
* ADR-105 D6 — the "no synthetic signal_field" honesty contract.
* `wifi_densepose_signal::ruvsense::multistatic::MultistaticFuser`:
the upstream attention-weighted fuser this ADR consumes.
* `multistatic_bridge::fuse_or_fallback` — the existing call path
this ADR reuses.
* Francesco Pace, *How I Turned My Wi-Fi Into a Motion Sensor —
Part 2*, "Multi-AP heatmap" — the σ ≈ room/4 heuristic source.


@ -0,0 +1,156 @@
# ADR-113 — Multiple Baseline Profiles (Day/Night)
**Status**: Accepted
**Date**: 2026-05-17
**Scope**: `v2/crates/wifi-densepose-sensing-server/src/main.rs`
(`resolve_baseline_profile`, `baseline_profile_watch`,
`--baseline-profile` CLI flag). Closes the "Multiple baseline profiles"
item in CHECKLIST.
## Context
The empty-room baseline that ADR-103 / ADR-104 store in
`data/baseline.json` is captured at one point in time. The channel state
it reflects is sensitive to:
* People walking through corridors / adjacent apartments at night vs.
day (different building-wide ambient WiFi traffic).
* AC / refrigerator compressor duty cycles (broadband noise at the
~Hz scale whose pattern varies with time of day).
* Sunlight on building walls (~mm-scale thermal expansion that
changes the multipath).
In the current deployment we observe the `absent` baseline mean shift
by ~3-5 % between 14:00 and 04:00 — small but enough to push the CV
of a stationary subcarrier across the ADR-103 threshold and trigger
false `present_still` flags overnight.
A single baseline can't model both regimes simultaneously. The lowest-
complexity fix is to keep two: a day baseline and a night baseline,
loaded at startup and hot-swapped at the day/night boundary.
## Decisions
### D1 — `--baseline-profile` selector with four modes
```
--baseline-profile {single,auto,day,night} (default: single)
```
| Mode | Behaviour |
|----------|--------------------------------------------------------------------------------------------|
| `single` | Legacy. Load `RUVIEW_BASELINE_FILE` or `data/baseline.json`. No watch task. **Default.** |
| `auto` | Pick day/night by local hour. Hot-reload at 07:00 / 21:00 transitions. |
| `day` | Force `data/baseline.day.json`. No auto switching. |
| `night` | Force `data/baseline.night.json`. No auto switching. |
Default is `single` so existing deployments don't have to migrate.
Operators opt in by recording two profiles + flipping the flag.
### D2 — Day window: 07:00–20:59 local
Hard-coded for now. The split matches the ambient-WiFi pattern in
this deployment (residential building, no commercial traffic).
Tunable in code (future ADR can parameterise if a second deployment
needs different hours), but a flag is premature parameter sprawl.
`chrono::Local::now().hour()` drives the choice — no UTC offset
arithmetic; the OS provides the local hour directly.
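The D2 window reduces to a half-open hour range. In this sketch the hour is passed in rather than read from `chrono::Local::now().hour()` (which needs the `chrono::Timelike` trait in scope), so it has no dependencies; `profile_for_hour` is a hypothetical name.

```rust
/// Profile tag for a local hour (0..=23): day covers 07:00–20:59 (D2).
fn profile_for_hour(hour: u32) -> &'static str {
    // 7..21 is half-open, so 20:59 is still "day" and 21:00 is "night".
    if (7..21).contains(&hour) { "day" } else { "night" }
}
```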
### D3 — Filename convention
```
data/baseline.day.json
data/baseline.night.json
data/baseline.json (legacy / single-profile fallback)
```
Same JSON schema as ADR-103 v2 (`full_broadband_*`,
`per_subcarrier_mean`, optionally `per_subcarrier_phase_mean` per
ADR-104). The recording script and REST endpoint can write to any of
the three paths via `--out` / `out` body field — no schema change.
### D4 — Missing-file fallback to `data/baseline.json`
If a requested profile file doesn't exist (e.g., operator set
`--baseline-profile auto` but only recorded `baseline.json`), the
server logs a warning and loads the legacy single-baseline file
instead. This makes the migration path "set the flag, then start
recording per-profile baselines one at a time" — no big-bang switch.
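A sketch of the D3 naming convention plus the D4 fallback. `file_or_fallback` is hypothetical. The existence check is injected as a closure so the logic can be tested without touching disk (in production it might be `|p| std::path::Path::new(p).exists()`). The warning text mirrors the log line quoted under Verified Acceptance.

```rust
/// Legacy single-profile baseline (D3 / D4).
const LEGACY_BASELINE: &str = "data/baseline.json";

/// Map a profile tag to its file (D3), falling back to the legacy file when
/// the profile file is missing (D4). `exists` is injected for testability.
fn file_or_fallback(tag: &str, exists: impl Fn(&str) -> bool) -> String {
    let wanted = match tag {
        "day" | "night" => format!("data/baseline.{tag}.json"),
        // `single` (or anything else) always uses the legacy file.
        _ => return LEGACY_BASELINE.to_string(),
    };
    if exists(wanted.as_str()) {
        wanted
    } else {
        eprintln!("baseline-profile {tag}: file {wanted} not found, falling back to {LEGACY_BASELINE}");
        LEGACY_BASELINE.to_string()
    }
}
```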
### D5 — Hot-reload via `baseline_profile_watch`
Background task fires every 5 min, re-resolves the profile, and if the
profile tag changed (day → night or vice versa) calls
`load_baseline_file` on the new path. `load_baseline_file` already
hot-swaps in place — the per-node override maps and per-subcarrier
baselines update without touching live frame ingest.
5 min cadence means transitions land within 5 min of the schedule —
acceptable lag for a baseline whose channel-side variance is on the
~hour timescale.
A `static` `CURRENT_BASELINE_PROFILE` mutex tracks the loaded tag so
the watch avoids redundant disk reads when nothing changed.
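The D5 change check can be sketched with a const-initialised `Mutex`. The ADR does not show the real static's type or its init helper, so `Mutex<Option<String>>` is an assumption here; the point is that the loader only runs when the resolved tag differs from the one already loaded.

```rust
use std::sync::Mutex;

/// Tag of the baseline profile currently loaded; `None` before first load.
/// (Type assumed; the ADR only says it is a static mutex.)
static CURRENT_BASELINE_PROFILE: Mutex<Option<String>> = Mutex::new(None);

/// Record `resolved` and return true only if it differs from the loaded tag,
/// i.e. only when the watch should call `load_baseline_file`.
fn profile_changed(resolved: &str) -> bool {
    let mut current = CURRENT_BASELINE_PROFILE.lock().unwrap();
    if current.as_deref() == Some(resolved) {
        return false; // same profile: skip the redundant disk read
    }
    *current = Some(resolved.to_string());
    true
}
```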
### D6 — Watch is a no-op outside `auto`
`single`, `day`, and `night` modes don't need switching — those are
"set once at startup". The watch task logs a one-line "disabled"
message and returns immediately. Saves a tokio task slot and
suppresses log noise on the common single-profile deployment.
## Trade-offs
* **Operator has to record two baselines.** Twice the operator time
(~5 min × 2). Unavoidable for the use case.
* **Hard-coded 07:00 / 21:00 split.** A different deployment (office,
shift-work) would want different hours. Defer to a future ADR; for
this deployment the residential cadence works.
* **No smooth interpolation between profiles.** At 20:59 we use day,
at 21:00 we use night — a step transition. For amplitude/baseline
comparison the step is fine (the classifier already smooths over
multiple frames). A weighted blend across the transition window
would be feasible but adds complexity for limited gain.
* **No more than two profiles.** Seasonal (summer/winter), weekday/
weekend etc. would need either more flags or a config-file driven
approach. Out of scope.
## Files Touched
```
v2/crates/wifi-densepose-sensing-server/src/main.rs
- --baseline-profile CLI flag (D1)
- resolve_baseline_profile (D1, D2, D3, D4)
- baseline_profile_file_or_fallback (D4)
- baseline_profile_watch background task (D5, D6)
- CURRENT_BASELINE_PROFILE static + init helper (D5)
- startup uses resolve_baseline_profile (D1)
- spawn baseline_profile_watch alongside other watches (D5)
docs/adr/ADR-113-baseline-profiles.md (this)
```
## Verified Acceptance
* `cargo build --release -p wifi-densepose-sensing-server` clean.
* `cargo test --release -p wifi-densepose-sensing-server
--no-default-features` — 326 tests pass.
* `sensing-server --help` shows the new `--baseline-profile` flag
with the four-mode help text.
* Running with `--baseline-profile single` (default) keeps the
existing log line `baseline-profile: starting in 'single' mode →
data/baseline.json` and disables the watch task with `Baseline
profile watch disabled (--baseline-profile single)`.
* Running with `--baseline-profile auto` while no `baseline.day.json`
exists logs `baseline-profile day: file data/baseline.day.json not
found, falling back to data/baseline.json` then proceeds.
## References
* ADR-103 — persistent baseline storage + JSON schema this ADR reuses.
* ADR-104 — per-subcarrier amplitude + phase drift; both consume
whatever baseline the active profile loads.
* ADR-107 — `POST /api/v1/baseline/calibrate` can write into any of
the three paths via the `out` body field, so operators can record
each profile via the same UI button.


@ -0,0 +1,162 @@
# ADR-114 — 2000-Packet Replay Regression Suite
**Status**: Accepted
**Date**: 2026-05-17
**Scope**: `v2/crates/wifi-densepose-sensing-server/src/main.rs`
(`replay_tests` module under `#[cfg(test)]`),
`v2/crates/wifi-densepose-sensing-server/tests/fixtures/replay_*.jsonl`,
`scripts/generate-replay-fixtures.py`. Closes the "2 000-packet fixed-
replay test suite" item in CHECKLIST.
## Context
Up to now the amplitude classifier has been protected by per-function
unit tests (cv calculation, NBVI selection, baseline drop trigger) but
not by an end-to-end regression test that feeds a known-good stream
through the full `amp_presence_override` pipeline and checks that the
labels still look right.
Without that, a refactor of NBVI selection or a threshold tweak could
silently regress classifier behaviour on real deployments — the unit
tests would all pass while the production output flipped.
Pace's ESPectre has a similar pattern: 1000 idle + 1000 motion frames,
checked into the repo, replayed in CI on every PR.
## Decisions
### D1 — Fixture format: line-delimited JSON, `{node_id, amplitude[]}`
```jsonl
{"node_id":1,"amplitude":[28.842, 19.333, ...]}
{"node_id":2,"amplitude":[15.601, 17.220, ...]}
...
```
Minimal: just the two fields the classifier reads. Round-robined across
nodes (500 per node × 2 nodes = 1000 frames per fixture file). 1000
frames per file × 2 files = 2000 packets total.
### D2 — Fixtures live in-repo under `tests/fixtures/`
```
v2/crates/wifi-densepose-sensing-server/tests/fixtures/
replay_idle.jsonl (1000 lines)
replay_motion.jsonl (1000 lines)
```
Co-located with the test that consumes them. `cargo test` picks them up
via `env!("CARGO_MANIFEST_DIR")`. The fixture files are ~1.5 MB total
(text JSON) — small enough for the repo, not so small that the test
loses statistical power.
### D3 — Synthetic but parameter-matched to live data
The fixtures are generated by `scripts/generate-replay-fixtures.py` with
two deterministic seeds (42 and 43). Parameters chosen to mirror the
live deployment:
* Baseline mean amplitudes per node taken from `data/baseline.json`
(node 1: 27.04, node 2: 14.72).
* Idle: per-frame Gaussian noise σ = 1.8 % of the per-subcarrier mean.
* Motion: ±40 % slow envelope (0.15 Hz sinusoid, 6.7 s cycle, longer
than the classifier's 4.5 s `AMP_SHORT_WIN`) + 5 % per-frame noise.
Mimics a body slowly modulating the channel during walking.
This is deliberately *synthetic*. Capturing 1000 real frames of
"empty room" requires the operator to step out and stay out for ~50 s,
and capturing "motion" requires walking through the room — neither is
something this session could do without manual operator labour. The
synthetic-but-realistic alternative gives deterministic regression
coverage today, with the option to swap in live captures (same JSONL
schema, same filenames) when time allows.
### D4 — Test lives inside `main.rs` under `#[cfg(test)] mod replay_tests`
`amp_presence_override` is private to the binary crate, so the test
can't sit in `tests/` (which is for integration tests against
`lib.rs`). Putting it under `#[cfg(test)]` in `main.rs` keeps the
helper visibility minimal and exercises the exact function path
production uses.
### D5 — Test resets per-node history before each fixture run
`amp_presence_override` accumulates per-node state in
`OnceLock<Mutex<HashMap<…>>>` statics. The test clears those between
the idle and motion runs so each fixture starts with a fresh classifier
(no cross-contamination from the previous fixture's frames sitting in
the rolling window).
It also clears the per-subcarrier baseline (`amp_baseline_per_sub`)
because the synthetic fixtures don't share a per-subcarrier profile
with whatever real recording lives in `data/baseline.json` — leaving
the live per-sub baseline in place would make the drift channel
saturate and obscure the CV-threshold path we're actually testing.
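A minimal sketch of the D5 reset pattern, assuming a `OnceLock<Mutex<HashMap<…>>>` keyed by node id. `history`, `push_frame`, and `reset_history` are illustrative names, not the statics `amp_presence_override` actually uses.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Per-node rolling history keyed by node_id (illustrative stand-in).
fn history() -> &'static Mutex<HashMap<u8, Vec<f64>>> {
    static HIST: OnceLock<Mutex<HashMap<u8, Vec<f64>>>> = OnceLock::new();
    HIST.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Append one frame's value to a node's window.
fn push_frame(node_id: u8, value: f64) {
    history().lock().unwrap().entry(node_id).or_default().push(value);
}

/// Called between the idle and motion fixtures so each run starts empty.
fn reset_history() {
    history().lock().unwrap().clear();
}
```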
### D6 — F1 threshold: 0.85
Convention from Pace's ESPectre CI gate. Current value on the synthetic
fixtures with this deployment's baseline is `F1 = 1.000` (tp=822,
fp=0, tn=822, fn=0; 178 warmup frames excluded per fixture). The 0.15
headroom gives room for legitimate classifier evolution without
forcing a fixture re-record on every tuning change.
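The D6 gate uses the standard F1 on the presence class, `2·TP / (2·TP + FP + FN)`. True negatives do not enter the formula, so `tn=822` has no effect on the score. A minimal sketch (`f1_score` and `F1_GATE` are illustrative names):

```rust
/// Pass threshold from D6.
const F1_GATE: f64 = 0.85;

/// F1 = 2·TP / (2·TP + FP + FN). Returns 0.0 when all three counts are zero
/// to avoid 0/0. `fn_` because `fn` is a Rust keyword.
fn f1_score(tp: u32, fp: u32, fn_: u32) -> f64 {
    let denom = 2 * tp + fp + fn_;
    if denom == 0 { 0.0 } else { 2.0 * tp as f64 / denom as f64 }
}
```

With the counts reported above, `f1_score(822, 0, 0)` is exactly 1.0, above the 0.85 gate.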
### D7 — Test loads the deployment baseline at startup
Without `data/baseline.json` loaded, the classifier compares raw CV
against thresholds of 3.0 (300 %) and 6.0 — values no realistic signal
reaches. The test discovers the baseline via a couple of canonical
relative paths (`../../data/baseline.json` from the crate dir, etc.)
and exits early with a clear `eprintln!` hint if none are found.
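The D7 discovery step amounts to "first existing candidate wins". A sketch with an injected existence check: `../../data/baseline.json` comes from the ADR, while the second candidate is a hypothetical stand-in for the elided "etc.".

```rust
/// Return the first candidate baseline path that exists, or None.
/// The second candidate is hypothetical; only the first is from the ADR.
fn find_baseline(crate_dir: &str, exists: impl Fn(&str) -> bool) -> Option<String> {
    let candidates = [
        format!("{crate_dir}/../../data/baseline.json"),
        format!("{crate_dir}/data/baseline.json"), // hypothetical
    ];
    candidates.into_iter().find(|p| exists(p.as_str()))
}
```

In the test this could be called as `find_baseline(env!("CARGO_MANIFEST_DIR"), |p| std::path::Path::new(p).exists())`, printing the `eprintln!` hint and returning early on `None`.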
## Trade-offs
* **Synthetic fixtures don't catch sensor-specific bugs.** A
Kconfig-level FW regression that produced subtly different amplitude
scaling would not be caught — the synthetic fixtures encode the
*expected* scaling, not whatever the FW currently emits. The witness
bundle (ADR-028) still covers that end of the pipeline.
* **`replay_2000` runs only when explicitly named or via the full
suite.** No filtering hides it from CI. It runs in well under a
second so cost is negligible.
* **F1 currently 1.0 — too clean to detect subtle regressions.** A
follow-up with live captures may bring the natural F1 to ~0.9, at
which point the 0.85 threshold becomes a real gate. For now it's
primarily a contract test: "the classifier still emits something
reasonable on a known input".
## Files Touched
```
scripts/generate-replay-fixtures.py (new)
v2/crates/wifi-densepose-sensing-server/tests/fixtures/
replay_idle.jsonl (new)
replay_motion.jsonl (new)
v2/crates/wifi-densepose-sensing-server/src/main.rs
- replay_tests module (D4, D5, D7)
docs/adr/ADR-114-replay-regression-suite.md (this)
```
## Verified Acceptance
```
$ cargo test --release -p wifi-densepose-sensing-server \
--no-default-features --bin sensing-server replay_2000 -- --nocapture
replay_2000 F1=1.000 tp=822 fp=0 tn=822 fn=0
test replay_tests::replay_2000_packets_f1_above_threshold ... ok
test result: ok. 1 passed; 0 failed; 0 ignored;
```
Full `wifi-densepose-sensing-server` suite: 327 tests pass (was 326 + this one).
## References
* ADR-101 — raw-amplitude classifier this test exercises.
* ADR-102 — NBVI subcarrier selection that feeds CV calculation.
* ADR-103 — persistent baseline that drives the universal-threshold
normalization the test relies on.
* ADR-028 — witness bundle (the other end-to-end regression
mechanism; ADR-114 covers classifier code paths, ADR-028 covers
the deterministic-CSI proof pipeline).
* Francesco Pace, *How I Turned My Wi-Fi Into a Motion Sensor —
Part 2*, "Replay regression test" — the upstream pattern.


@ -0,0 +1,161 @@
# ADR-115 — FW REST endpoint to repoint CSI aggregator without USB
**Status**: Accepted
**Date**: 2026-05-17
**Scope**: `firmware/esp32-csi-node/main/ota_update.c`
(`ota_set_target_handler`, `parse_ip_port`, URI registration on port 8032).
## Context
After moving the Mac from Tran Thanh T3 (192.168.1.x) to TP-Link_8340
(192.168.0.x) for low-latency sensor proximity, both ESP32-S3 nodes
held a stale `csi_cfg/target_ip` in NVS — they were silently streaming
CSI into the previous LAN and the new server on `0.0.0.0:5005` saw
zero frames for ~5 minutes despite both nodes being WiFi-reachable
and responding on `:8032/ota/status`.
Existing tools didn't cover this:
* `provision.py` writes `target_ip` via USB serial — requires
physical access to the sensor.
* `/ota/recalibrate` (ADR-109) only erases gain-lock keys
(`gl_agc/gl_fft/gl_ap_mac`) — intentionally doesn't touch
network config.
* Rebuilding FW with a new `CONFIG_CSI_TARGET_IP` would only help if
NVS is also wiped, since the NVS override always beats the
compile-time default.
Recurring operational need: every Mac IP change, every network
move, every router swap requires the operator to crawl behind the
sensor with a USB cable. Not acceptable.
## Decisions
### D1 — `POST /ota/set-target` HTTP endpoint
New handler on the existing OTA HTTP server (port 8032). Body is
plain text `"IPv4:PORT"` with optional trailing CR/LF, e.g.
`192.168.0.103:5005`. No JSON dependency — `cJSON` is not used
elsewhere in this FW.
```
POST /ota/set-target HTTP/1.1
Content-Type: text/plain
Authorization: Bearer <psk> # only if ota_psk provisioned
192.168.0.103:5005
```
Response:
```json
{"status":"ok","target_ip":"192.168.0.103","target_port":5005,"message":"rebooting"}
```
The handler then waits ~1 s (`vTaskDelay`) and calls `esp_restart()`, so
the new value is picked up by `nvs_config_load` on the next boot.
### D2 — Strict body parser (no `inet_pton` dependency)
`parse_ip_port` validates:
* Exactly 4 dot-separated octets, each 0–255.
* Single `:` separator.
* Port 1–65535, at most 5 digits.
* Trailing whitespace/CR/LF tolerated.
Rejects malformed input with HTTP 400 *before* touching NVS — a
sensor with an unparseable IP would lose its only network identity.
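For illustration, the D2 rules translated into Rust. This sketches the validation logic only and is not the firmware's C `parse_ip_port`; it returns the parsed octets and port, or `None` for any input the rules reject.

```rust
/// Parse "a.b.c.d:port" per D2. Only trailing whitespace / CR / LF is
/// tolerated; anything else malformed yields None.
fn parse_ip_port(body: &str) -> Option<([u8; 4], u16)> {
    let s = body.trim_end();
    let (ip, port) = s.split_once(':')?;
    // Exactly one ':' separator.
    if port.contains(':') {
        return None;
    }
    // Port: 1..=5 ASCII digits, value 1..=65535.
    if port.is_empty() || port.len() > 5 || !port.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let port: u32 = port.parse().ok()?;
    if !(1..=65535).contains(&port) {
        return None;
    }
    // Exactly four dot-separated decimal octets, each 0..=255.
    let mut octets = [0u8; 4];
    let mut n = 0;
    for part in ip.split('.') {
        if n == 4 || part.is_empty() || part.len() > 3 || !part.bytes().all(|b| b.is_ascii_digit()) {
            return None;
        }
        let v: u16 = part.parse().ok()?;
        if v > 255 {
            return None;
        }
        octets[n] = v as u8;
        n += 1;
    }
    if n != 4 {
        return None;
    }
    Some((octets, port as u16))
}
```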
### D3 — Same NVS namespace + keys that `nvs_config.c` reads
```c
nvs_handle_t h;
nvs_open("csi_cfg", NVS_READWRITE, &h);
nvs_set_str(h, "target_ip", ip);      // e.g. "192.168.0.103"
nvs_set_u16(h, "target_port", port);  // e.g. 5005
nvs_commit(h);                        // single commit for both keys
nvs_close(h);
```
Matches the keys already read by `nvs_config_load` at boot, so the
change is picked up without any FW code change beyond this handler.
### D4 — Auth model identical to `/ota/recalibrate`
Uses the same `ota_check_auth` PSK gate (ADR-050). If
`security/ota_psk` is empty, the endpoint is open (dev mode); when
set, requires `Authorization: Bearer <psk>`. Same threat model and
permissive default as `/ota` itself.
### D5 — No partial-write atomicity gymnastics
We write `target_ip`, then `target_port`, then commit. If a power
cut happens between `set_str` and `set_u16`, NVS keeps the previous
`target_port` (since uncommitted writes don't persist) — safe
behaviour. No need for a temp-key + rename dance.
## Files Touched
```
firmware/esp32-csi-node/main/ota_update.c
+ #include "nvs_config.h" (NVS_CFG_IP_MAX)
+ parse_ip_port helper
+ ota_set_target_handler
+ URI registration in ota_update_start_server
+ log line in startup banner
docs/adr/ADR-115-fw-set-target-rest.md (this)
```
Binary size delta: `esp32-csi-node.bin` 854 KB → 855 KB (+~1 KB).
58 % of OTA partition free, plenty of margin.
## Verified Acceptance
Sequence on both live nodes (192.168.0.100, 192.168.0.101):
1. `scripts/ota-deploy.sh 192.168.0.100 192.168.0.101` →
`running_partition` flipped on both (`ota_1↔ota_0`).
2. `curl -X POST -d '192.168.0.103:5005' .../ota/set-target` →
`{"status":"ok","target_ip":"192.168.0.103","target_port":5005,...}`
on both nodes.
3. After 25 s reboot+WiFi+CSI startup, sensing-server log:
```
keepalive: learned address for node 2 = 192.168.0.100:63940
keepalive: ping -i 0.040 192.168.0.100 for node 2
keepalive: learned address for node 1 = 192.168.0.101:63844
keepalive: ping -i 0.040 192.168.0.101 for node 1
```
4. `GET /api/v1/sensing/latest` → live classification
(`motion_level: active`, presence: true) with non-zero
per-node features (`drift_score: 0.41`, `dominant_freq_hz: 6.3`,
`mean_rssi: -57`).
End-to-end recovery time from broken stream → live CSI: **~3 min**
(build: 0 s, since the FW was already built; flash: 17 s; set-target +
reboot: ~25 s; first ping-driven CSI batch: ~5 s).
## Open Items
* **Persist last-known-good target as fallback** — if a bad
`target_ip` is committed (e.g. operator types Mac's old IP) the
sensor goes silent until the next set-target call. A
`csi_cfg/target_ip_lkg` snapshot updated on every successful
keepalive-driven UDP send would let the sensor self-revert after
N silent seconds. ~1 h FW.
* **Track AP MAC alongside target** — ADR-108 / ADR-109 already
invalidate gain-lock on AP change; same pattern could
auto-invalidate target on subnet change (sensor sees its DHCP
lease is on a different /24 than `target_ip` → blank target,
refuse to send until operator confirms). ~1 h FW.
* **REST endpoint to read current target**: `GET /ota/target`
returning `{"target_ip":..., "target_port":...}`. Operator can
diagnose "where is this sensor pointed?" without USB. ~15 min FW.
## References
* ADR-050 — OTA PSK auth that gates this endpoint
* ADR-110 — TP-Link WISP deployment that triggered the Mac-IP move
* ADR-108 — FW NVS persistence patterns (same namespace, same approach)
* ADR-109 — `/ota/recalibrate` precedent (same handler shape, same
reboot semantics)
* `scripts/provision.py` — original USB-only NVS provisioning path
that this ADR replaces for the network-config case


@ -0,0 +1,224 @@
# ADR-116 — WiFlow-v1 Supervised Pose Loader (Rust)
**Status**: Accepted (integration), needs fine-tune (output quality)
**Date**: 2026-05-17
**Scope**: `v2/crates/wifi-densepose-sensing-server/src/wiflow_v1.rs` (new,
~430 lines incl. tests), `src/main.rs` (CLI flag + load + 5 tick-site hooks +
`pose_current` keypoint path), `src/lib.rs` (module export).
## Context
Until this ADR `/api/v1/pose/*` always returned an empty `persons` array
(ADR-105 — no synthetic fallback when no real model is loaded). HuggingFace
`ruv/ruview/wiflow-v1/wiflow-v1.json` is the project's official supervised
pose model (Apache-2.0, 974 KB, 92.9 % PCK@20 on its training set). It just
sat on disk because there was no Rust loader — the only reference impl is
`scripts/train-wiflow-supervised.js` (JS, training script, not deployment).
This ADR ports the JS inference path to Rust so sensing-server can serve
real 17-keypoint COCO skeletons in production.
## What was wrong in the model file (and how this ADR works around it)
The HuggingFace JSON has an `architecture` field that **lies**:
```json
"architecture": {
"tcnChannels": [35, 256, 256, 192, 128],
"tcnKernel": 7,
"tcnDilations": [1, 2, 4, 8],
"fcDims": [2560, 2048, 34]
}
```
That's the `full` scale (~7.7 M params). The file is actually the **lite**
scale (186,946 params — confirmed by `totalParams` field). The exporter at
`train-wiflow-supervised.js:1599` hardcodes the full-scale dict for every
scale. The loader trusts `totalParams` and ignores `architecture`.
Lite topology (recovered from `SCALE.lite` at `train-wiflow-supervised.js:135`
and verified by exact param count = 186,946):
* 2 TCN blocks (NOT 4), kernel = 3 (NOT 7), dilations [1, 2] (NOT [1,2,4,8])
* TCN channels: 35 → 32 → 32
* Per block: causal_conv → BN → ReLU → causal_conv → BN + residual → ReLU
(1×1 projection on residual when in_ch ≠ out_ch, only block 0)
* Flatten 32 × 20 = 640 → fc1 (640→256) → ReLU → fc2 (256→34)
* Sigmoid on final 34-dim → 17 (x, y) keypoints in [0, 1]
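The 186,946 figure can be reproduced from the lite topology above. A sketch, assuming the weight shapes implied by the D2 stream order (conv weight `in_ch × k × out_ch`, residual projection `in_ch × out_ch`, BN stores only gamma and beta); the function names are hypothetical.

```rust
/// Parameters in one TCN block: two causal convs (weight + bias), two BNs
/// (gamma + beta only), plus a 1×1 residual projection when in_ch != out_ch.
fn tcn_block_params(in_ch: usize, out_ch: usize, k: usize) -> usize {
    let conv1 = in_ch * k * out_ch + out_ch;
    let conv2 = out_ch * k * out_ch + out_ch;
    let bn = 2 * out_ch;
    let res = if in_ch != out_ch { in_ch * out_ch + out_ch } else { 0 };
    conv1 + bn + conv2 + bn + res
}

/// Fully connected layer: weight matrix + bias.
fn dense_params(inp: usize, out: usize) -> usize {
    inp * out + out
}

/// Lite scale: TCN 35 → 32 → 32 (k = 3), T = 20, FC 640 → 256 → 34.
fn lite_total_params() -> usize {
    let t = 20;
    let tcn = tcn_block_params(35, 32, 3) + tcn_block_params(32, 32, 3);
    tcn + dense_params(32 * t, 256) + dense_params(256, 34)
}
```

Note that `fc1` alone accounts for 164,096 of the parameters; the TCN is comparatively tiny.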
## Decisions
### D1 — Pure-Rust forward pass, no new crates
`wiflow_v1.rs` is self-contained: Vec<f32> math by hand, inline base64
decoder (50 LoC), no `ndarray`, no `candle`, no `base64` crate added. The
inference is small enough (~0.45 M multiply-accumulates per forward) that hand-written `Vec<f32>`
loops are clearer than pulling in a tensor framework for one model.
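A minimal sketch of the kind of dependency-free decoder D1 describes: standard alphabet, ASCII whitespace skipped, decoding stops at `=` padding. It is illustrative, not the loader's code. `bytes_to_f32_le` assumes the weights are packed as little-endian `f32`, which this ADR does not state.

```rust
/// Decode standard-alphabet base64, skipping ASCII whitespace and stopping
/// at '=' padding. Returns None on any other invalid character.
fn b64_decode(input: &str) -> Option<Vec<u8>> {
    fn val(c: u8) -> Option<u32> {
        match c {
            b'A'..=b'Z' => Some((c - b'A') as u32),
            b'a'..=b'z' => Some((c - b'a' + 26) as u32),
            b'0'..=b'9' => Some((c - b'0' + 52) as u32),
            b'+' => Some(62),
            b'/' => Some(63),
            _ => None,
        }
    }
    let mut out = Vec::with_capacity(input.len() * 3 / 4);
    let mut acc: u32 = 0; // pending bits, always fewer than 8 between steps
    let mut bits: u32 = 0;
    for &c in input.as_bytes() {
        if c.is_ascii_whitespace() {
            continue;
        }
        if c == b'=' {
            break;
        }
        acc = (acc << 6) | val(c)?;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
            acc &= (1 << bits) - 1; // keep only the not-yet-emitted bits
        }
    }
    Some(out)
}

/// Reinterpret decoded bytes as little-endian f32s (packing is assumed).
fn bytes_to_f32_le(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```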
### D2 — Weight stream order matches `collectParams()` in the JS trainer
```
for each TCN block:
conv1.weight (in_ch * k * out_ch f32s)
conv1.bias (out_ch)
bn1.gamma (out_ch)
bn1.beta (out_ch)
conv2.weight, conv2.bias, bn2.gamma, bn2.beta
(if in_ch != out_ch: res.weight, res.bias)
fc1.weight, fc1.bias, fc2.weight, fc2.bias
```
Loader asserts the stream is fully consumed (`Cursor::remaining() == 0`)
after fc2 — catches silent topology mismatches. Param count check
(`totalParams == 186_946`) catches scale mismatch before unpacking.
### D3 — BatchNorm uses per-window mean/var (matches JS impl)
`train-wiflow-supervised.js:770` computes mean/var across the T axis at
inference time, ignoring `runMean/runVar` accumulated during training.
Loader skips running stats entirely (only 2 params per channel stored:
gamma + beta). This is unusual but consistent — the network was trained
this way, so we infer this way.
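D3 in code: each channel is normalised with the mean and (biased) variance of its own 20-step window, then scaled by `gamma` and shifted by `beta`. The `[channel][time]` row-major layout and `eps = 1e-5` are assumptions of this sketch.

```rust
/// BatchNorm with per-window statistics: for each channel, mean and
/// variance are computed across the `t` time steps of `x` (row-major
/// [channel][time]); no running statistics are used.
fn batch_norm_window(x: &mut [f32], channels: usize, t: usize, gamma: &[f32], beta: &[f32]) {
    const EPS: f32 = 1e-5; // assumed epsilon
    for c in 0..channels {
        let row = &mut x[c * t..(c + 1) * t];
        let mean = row.iter().sum::<f32>() / t as f32;
        let var = row.iter().map(|v| (v - mean) * (v - mean)).sum::<f32>() / t as f32;
        let inv_std = 1.0 / (var + EPS).sqrt();
        for v in row.iter_mut() {
            *v = gamma[c] * (*v - mean) * inv_std + beta[c];
        }
    }
}
```

A consequence of this choice: a channel that is constant over the window always outputs exactly `beta`, whatever its absolute level.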
### D4 — Input prep: top-35 subcarriers by NBVI, raw amplitudes
`build_input_from_history` (in `wiflow_v1.rs`):
1. Take last 20 frames from any node's `AmpState.nbvi_history` (Vec<Vec<f64>>).
2. Rank subcarriers by NBVI score (`α·σ/μ² + (1−α)·σ/μ`, α = 0.5) — same
formula the classifier uses, but pick K = 35 (model input), not K = 12
(classifier).
3. Apply 25th-percentile dead-zone gate to skip guard tones / null bins.
4. Build flat `[35 * 20]` row-major tensor of raw amplitudes (no z-score —
training data wasn't normalised either, BN handles it).
If fewer than 20 frames or all subcarriers gated out → return `None`,
inference skipped this tick, `pose_keypoints: None` in SensingUpdate.
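Steps 2 and 3 can be sketched as follows. Two readings are assumptions here: the dead-zone gate drops subcarriers whose window mean falls below the 25th-percentile mean, and ranking is ascending (lowest NBVI first). Ties fall back to subcarrier index so the selection is deterministic. Function names are hypothetical.

```rust
/// NBVI for one subcarrier over a window: α·σ/μ² + (1−α)·σ/μ.
/// Returns None for an empty window or a non-finite / non-positive mean.
fn nbvi(samples: &[f64], alpha: f64) -> Option<f64> {
    let n = samples.len() as f64;
    let mu = samples.iter().sum::<f64>() / n;
    if !(mu.is_finite() && mu > 0.0) {
        return None;
    }
    let var = samples.iter().map(|v| (v - mu).powi(2)).sum::<f64>() / n;
    let sigma = var.sqrt();
    Some(alpha * sigma / (mu * mu) + (1.0 - alpha) * sigma / mu)
}

/// Pick up to `k` subcarrier indices from `frames` (frames × subcarriers).
fn select_subcarriers(frames: &[Vec<f64>], k: usize, alpha: f64) -> Vec<usize> {
    let n_sub = frames.first().map_or(0, |f| f.len());
    // One subcarrier's amplitudes across all frames in the window.
    let column = |s: usize| -> Vec<f64> { frames.iter().map(|f| f[s]).collect() };
    let means: Vec<f64> = (0..n_sub)
        .map(|s| column(s).iter().sum::<f64>() / frames.len() as f64)
        .collect();
    // Dead-zone gate (assumed reading): 25th percentile of the finite means.
    let mut sorted: Vec<f64> = means.iter().copied().filter(|m| m.is_finite()).collect();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let p25 = match sorted.get(sorted.len() / 4) {
        Some(&v) => v,
        None => return Vec::new(),
    };
    let mut scored: Vec<(f64, usize)> = (0..n_sub)
        .filter(|&s| means[s] >= p25)
        .filter_map(|s| nbvi(&column(s), alpha).map(|score| (score, s)))
        .collect();
    // Ascending score (assumed direction), ties by index.
    scored.sort_by(|a, b| a.0.total_cmp(&b.0).then(a.1.cmp(&b.1)));
    scored.into_iter().take(k).map(|(_, s)| s).collect()
}
```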
### D5 — Per-tick inference, longest-history node
`run_wiflow_inference()` at every `broadcast_tick_task` step (5 sites total
in `main.rs`):
* Picks the node with longest `nbvi_history` (ties broken by smallest
node_id — deterministic).
* Cost: ~0.45 M multiply-accumulates per forward on the lite scale (4 causal convs, one 1×1 residual projection, 2 FCs, plus BN).
Measured 0.4 ms on the Mac M1 — well under the 100 ms tick budget.
* Returns `Vec<[f64; 4]>` of length 17 (`[x, y, z=0, conf=1]`).
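The D5 selection rule as a sketch: `max_by` on history length, with the id comparison reversed so that among equal lengths the smallest `node_id` wins. `pick_node` and the `HashMap<u8, usize>` input (node id to history length) are illustrative.

```rust
use std::collections::HashMap;

/// Node with the longest history; ties go to the smallest node_id, so the
/// result does not depend on HashMap iteration order.
fn pick_node(history_len: &HashMap<u8, usize>) -> Option<u8> {
    history_len
        .iter()
        .max_by(|(id_a, len_a), (id_b, len_b)| len_a.cmp(len_b).then(id_b.cmp(id_a)))
        .map(|(&id, _)| id)
}
```

This is the rule as first shipped. As ADR-117 notes, it ignores recency, so a dead sensor with a long history can still be picked.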
### D6 — `pose_current` reads `pose_keypoints` directly
Pre-ADR: `/api/v1/pose/current` read `latest_update.persons`. The tracker
populated `persons` from `derive_pose_from_sensing` (signal-derived,
synthetic) regardless of `model_loaded`. Loader-output `pose_keypoints`
was only read by the WS broadcaster.
This ADR makes `pose_current` prefer `pose_keypoints` when 17-len and
present, building a single `PersonDetection` with COCO joint names. Falls
back to tracker `persons` only when `pose_keypoints` is `None` (cold
start). Keeps the ADR-105 honesty gate: empty array if `model_loaded =
false`.
### D7 — Honest about output quality
The loaded model produces **17 keypoints**, but the **numerical values
are saturated** (most x/y near 0 or 1, the sigmoid's extremes), meaning the
network has no learned response to our specific deployment's CSI
distribution. This is expected: the model was trained on a different
ESP32 setup, different room, different person, with camera ground truth
we don't have here. **The integration is correct; the model needs
deployment-specific fine-tune to produce useful keypoints.**
Two paths to usable output, left as follow-ups (Pack E):
1. **Apply `node-1.json` / `node-2.json` LoRA adapters** (future ADR)
— they're shipped alongside `wiflow-v1.json` in the same HuggingFace
repo, rank=8, alpha=16, target the encoder + task heads. Loader stub +
forward fold ~2 h.
2. **Re-train via `scripts/train-wiflow-supervised.js` with new ground-
truth capture** (~30 min capture + 19 min training per the model card).
Operator-side work.
## Files Touched
```
v2/crates/wifi-densepose-sensing-server/src/wiflow_v1.rs (new, ~430 LoC)
v2/crates/wifi-densepose-sensing-server/src/lib.rs (+ pub mod)
v2/crates/wifi-densepose-sensing-server/src/main.rs:
+ use wiflow_v1::{self, WiflowModel}
+ Args.wiflow_model: Option<PathBuf>
+ static WIFLOW_MODEL: OnceLock<Option<WiflowModel>>
+ main() — load before existing --model/--load-rvf path
+ fn run_wiflow_inference() -> Option<Vec<[f64;4]>> (right after csi_keepalive_task)
+ 5 × `pose_keypoints: run_wiflow_inference()` at SensingUpdate sites
+ pose_current — prefer pose_keypoints when 17-len; fall back to persons
docs/adr/ADR-116-wiflow-v1-supervised-pose-loader.md (this)
```
Binary size delta: 3.0 MB → 3.1 MB.
## Verified Acceptance
Live test on the operator's TP-Link deployment (server on 192.168.0.103,
nodes on 192.168.0.100 and 192.168.0.101):
```
$ ./target/release/sensing-server --source esp32 --csi-keepalive-pps 25 \
--wiflow-model data/models/ruview/wiflow-v1/wiflow-v1.json
...
ADR-116 wiflow-v1 loaded from data/models/ruview/wiflow-v1/wiflow-v1.json
(lite scale, 186946 params)
keepalive: learned address for node 2 = 192.168.0.100:63940
keepalive: learned address for node 1 = 192.168.0.101:63844
$ curl :8080/api/v1/info → "pose_estimation": true
$ curl :8080/api/v1/pose/stats → "model_loaded": true, frames_processed: 2699
$ curl :8080/api/v1/pose/current
{ persons: [{id: 1, keypoints: [17 × {name, x, y, z, confidence}], ...}],
total_persons: 1, model_loaded: true }
```
End-to-end: model on disk → loader → forward pass → 17 keypoints → REST &
WS payload. UI's pose canvas (un-gated by ADR-105 D4) now draws what the
model emits.
## Cargo tests
`wiflow_v1` ships 3 unit tests covering the most-likely-to-rot bits:
* `base64_round_trip_alphabet` — alphabet, padding, whitespace tolerance
* `sigmoid_bounds` — numerical stability at ±10 inputs
* `build_input_zero_history` — empty-history early return
`cargo test -p wifi-densepose-sensing-server wiflow_v1` → 3 passed.
## Open Items
* **Pack E.1 — LoRA adapter loader.** `node-1.json` / `node-2.json` rank-8
adapters from the same HF repo, ~21 KB each. The trainer encodes them
in the same custom format as `wiflow-v1.json` (different `format` tag),
so the loader plumbing is small. ~2 h.
* **Pack E.2 — Camera-supervised retraining for this room.** Run
`scripts/collect-ground-truth.py` against this Mac's webcam +
TP-Link/.100/.101 CSI for 5 min, then
`scripts/train-wiflow-supervised.js --scale lite`. Should drop sigmoid saturation and produce
spatially-coherent keypoints. ~1 h operator + 19 min train.
* **Inference rate-limiting.** Currently runs every tick (10 fps). With
multiple WS clients connected, each tick still computes once and the
result is shared, so cost does not grow with the client count. If the
model grows to the small/medium scale (~200K/800K params), that per-tick
caching must be kept and inference may need to run below the tick rate.
* **Per-node pose tracks.** Right now a single virtual person is emitted;
the broadcaster places it in `zone_1` with a fixed bbox. If/when LoRA
adapters disambiguate per-node viewpoints, fan out to one
`PersonDetection` per node (left/right of the room).
## References
* `scripts/train-wiflow-supervised.js` — JS reference implementation
* HuggingFace `ruv/ruview` — model file + LoRA adapters (Apache-2.0)
* ADR-079 — camera ground-truth training pipeline (the trainer this
loader was built against)
* ADR-105 — "no synthetic data in production runtime"; this ADR keeps
the gate but feeds it real model output
* ADR-115 — `/ota/set-target` (the prerequisite that got the CSI stream
flowing again so this loader has data to consume)


@ -0,0 +1,245 @@
# ADR-117 — Process Hygiene, Pose Path Honesty, and Audit Follow-ups
**Status**: Accepted
**Date**: 2026-05-17
**Scope**: `v2/crates/wifi-densepose-sensing-server/src/{main.rs,wiflow_v1.rs}`,
`v2/crates/wifi-densepose-sensing-server/tests/multi_node_test.rs`,
`ui/index.html`, `ui/components/LiveDemoTab.js`, `CHECKLIST.md`,
`docs/adr/ADR-115-fw-set-target-rest.md`,
`docs/references/{espectre-gap-analysis.md,ota-pipeline.md}`.
## Context
A deep audit pass (4 parallel auditors covering sensors, server, UI, docs)
surfaced two operational fires and a stack of correctness/honesty issues
that had accumulated across ADR-100..116. This ADR collects the immediate
fixes.
### Fire 1 — Runaway ping zombies
Live `ps` showed **250+ `/sbin/ping -i 0.040` processes** on the Mac, most
parented to PID 1 (orphans from prior server lifetimes) and **8 fresh
pings to `127.0.0.1` parented to the current server**.
Root cause: a `cargo test --workspace` run sent UDP packets to
`127.0.0.1:5005` from `tests/multi_node_test.rs::test_multi_node_udp_send`
while the production server was bound to `0.0.0.0:5005`. The integration
test injects 55 synthetic frames with `node_ids = [1, 2, 3, 5, 7]`. Each
distinct `node_id` byte in a CSI magic packet triggered a fresh entry in
`NODE_ADDRS`, and the keepalive task spawned exactly one `ping` child
per entry. Combined with macOS not propagating parent death to children
(killed servers leave ping orphans), the count accumulated rapidly.
### Fire 2 — Per-node feature divergence on node 2
Node 2 (192.168.0.100) showed `dominant_freq_hz: 0.05` vs node 1 (.101)
`6.30` — a 126× split in the same room. Pointed to stale gain-lock on
node 2 from a different AP/orientation. Cleared via
`POST /ota/recalibrate` (ADR-109) — sensor re-runs the 300-packet
calibration sampler at next boot.
### Correctness issues (server auditor)
* `run_wiflow_inference` hardcoded keypoint `confidence: 1.0` — lied about
data quality. Real signal: the runtime classifier's `confidence`.
* `wiflow_v1.rs` zero-pad path duplicated subcarrier index 0 instead of
zero-padding when fewer than 35 finite subcarriers were available: the
comment said "zero the rest", the code did the opposite.
* `nbvi_history.clone()` cloned the entire 600-deep VecDeque (≈270 KB) on
every inference, while only the last 20 frames are used.
* `run_wiflow_inference` picked the node with longest history regardless
of recency — stale data from a dead sensor would keep producing pose.
### UI issues (UI auditor)
* `/` served a static API-index HTML page; users typing `localhost:8080`
never reached the SPA at `/ui/index.html`.
* `<section id="sensing">` was empty; `app.js::SensingTab.mount` queried
`#sensing-container` and rendered into nothing — the Sensing tab was
permanently blank.
* `LiveDemoTab.fetchModels` unconditionally overwrote `activeModelId =
'wiflow-v1'` whenever `/api/v1/info` reported `pose_estimation: true`,
even when the operator had just loaded an RVF model. Dropdown silently
flipped back to WiFlow on every refresh.
### Docs issues (docs auditor)
* `CHECKLIST.md` header: `head c827cde6`, count `43 Done` — stale
by 4 commits and 2 ADRs.
* `ADR-115 References` cited "ADR-100 — TP-Link WISP" (it's ADR-110)
and "ADR-108 / ADR-111" (ADR-111 doesn't exist — folded into ADR-109).
* `espectre-gap-analysis.md::Still open` table listed 8 items as open
that had already shipped (ADR-104, ADR-109, ADR-112, ADR-114).
* `ota-pipeline.md` documented OTA flashing but never mentioned
`/ota/set-target` (ADR-115) or `/ota/recalibrate` (ADR-109) — operator
hitting the "Mac moved networks" scenario wouldn't find the recovery
path.
## Decisions
### D1 — UDP receiver filters loopback before NODE_ADDRS
`main.rs::udp_receiver_task` now rejects loopback, unspecified, multicast,
and broadcast source addresses before inserting into `NODE_ADDRS`. Packets
still parse and feed the classifier — only the keepalive registration
is gated. Defends against any local sender (tests, simulators, future
tooling) accidentally driving ping spawn.
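A minimal sketch of the D1 gate, assuming the receiver has each packet's source `SocketAddr`. The helper name `is_registrable_source` is hypothetical; the real check lives inside `udp_receiver_task`. Note that `Ipv4Addr::is_broadcast` only matches the limited broadcast `255.255.255.255`; catching subnet-directed broadcasts such as `192.168.0.255` would also need the interface netmask.

```rust
use std::net::{IpAddr, SocketAddr};

/// Hypothetical helper: true only for sources allowed to register a
/// keepalive entry. The caller still parses and classifies every packet;
/// this only gates insertion into the node-address table.
pub fn is_registrable_source(src: &SocketAddr) -> bool {
    let ip = src.ip();
    // Local senders (tests, simulators) and non-unicast sources never get a ping child.
    if ip.is_loopback() || ip.is_unspecified() || ip.is_multicast() {
        return false;
    }
    match ip {
        // Only the limited broadcast 255.255.255.255 is detected here.
        IpAddr::V4(v4) => !v4.is_broadcast(),
        IpAddr::V6(_) => true,
    }
}
```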
### D2 — Keepalive pre-reap at startup
`main.rs::csi_keepalive_task` runs `pkill -f "/sbin/ping -i 0.040"` and
`pkill -f "/usr/bin/ping -i 0.040"` once at task entry. Cleans up
orphans from prior server lifetimes without operator action. Cost: two
`pkill` invocations at startup, ~10 ms total. Idempotent.
### D3 — Real keypoint confidence
`run_wiflow_inference` now stamps the runtime classifier's confidence (from
`amp_classify_from_latest`) onto all 17 keypoints instead of a hardcoded `1.0`.
The lite-scale wiflow has no per-keypoint uncertainty head, so this signal
is the most honest stand-in. It currently reads **0.037** on the live
deployment, an accurate reflection of "wiflow output is saturated, don't
trust these coords".
### D4 — Zero-pad fix in wiflow_v1
`build_input_from_history` now pushes `None` into `picks` for dead slots
and writes `0.0f32` into those rows. Prior code pushed `0usize` → all
unused channels read subcarrier-0 amplitudes, feeding the network 35×
the same signal.
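The D4 fix can be sketched as follows, assuming each model input channel maps to at most one subcarrier index per frame (35 channels in the ADR). `build_picks` and `fill_channels` are hypothetical names, not the crate's API; the point is that an unfilled slot stays `None` and becomes `0.0` rather than aliasing subcarrier 0.

```rust
/// Assigns finite subcarrier indices to channels in order; leftover
/// channels get `None` instead of a default index of 0.
pub fn build_picks(frame: &[f64], n_channels: usize) -> Vec<Option<usize>> {
    let mut finite = frame
        .iter()
        .enumerate()
        .filter(|(_, a)| a.is_finite())
        .map(|(i, _)| i);
    (0..n_channels).map(|_| finite.next()).collect()
}

/// Writes one row of model input: picked amplitudes, or 0.0 for dead slots.
pub fn fill_channels(frame: &[f64], picks: &[Option<usize>]) -> Vec<f32> {
    picks
        .iter()
        .map(|pick| match pick {
            Some(idx) => frame.get(*idx).copied().unwrap_or(0.0) as f32,
            None => 0.0f32,
        })
        .collect()
}
```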
### D5 — Tail-clone optimisation
`run_wiflow_inference` snapshots only the last 20 entries from
`nbvi_history` while holding the lock, not the full 600-deep deque, so lock
hold time per tick now scales with 20 entry copies instead of 600.
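A sketch of the D5 snapshot, assuming the history is a `VecDeque` behind a `std::sync::Mutex` (the crate may use a different lock type). `snapshot_tail` is a hypothetical helper: only the newest `n` entries are cloned before the guard is released.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Clones only the newest `n` entries while the lock is held.
pub fn snapshot_tail<T: Clone>(history: &Mutex<VecDeque<T>>, n: usize) -> Vec<T> {
    let guard = history.lock().expect("history mutex poisoned");
    // Skip everything except the last `n` items (or fewer if the deque is short).
    let skip = guard.len().saturating_sub(n);
    let tail: Vec<T> = guard.iter().skip(skip).cloned().collect();
    // `guard` drops here, releasing the lock after at most `n` clones.
    tail
}
```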
### D6 — `/` → `/ui/index.html` permanent redirect
`main.rs::root_redirect` returns HTTP 308. API-index HTML moves to `/api`
for operators / curl debugging. Users typing the bare host land on the
SPA.
### D7 — Sensing tab container restored
`ui/index.html`: `<section id="sensing">` now contains `<div
id="sensing-container">` matching `app.js::SensingTab.mount`'s query
selector.
### D8 — LiveDemoTab WiFlow inject only when no model active
`LiveDemoTab.fetchModels` wraps the `activeModelId = 'wiflow-v1'`
assignment in `if (!this.modelState.activeModelId)`. RVF model loads
keep their displayed name.
### D9 — Multi-node test guards against external :5005 owner
`tests/multi_node_test.rs::test_multi_node_udp_send` probes
`127.0.0.1:5005` with a transient bind; if the bind fails, the test
skips its UDP send rather than polluting whoever owns the port. Belt-
and-braces with the server-side filter (D1).
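A minimal sketch of the D9 probe using `std::net::UdpSocket`. It assumes the process that owns the port did not set `SO_REUSEADDR`/`SO_REUSEPORT`, in which case a second bind to the same address fails; `udp_port_is_free` is a hypothetical name.

```rust
use std::net::UdpSocket;

/// Transient bind: succeeds only if nothing else currently owns `addr`.
pub fn udp_port_is_free(addr: &str) -> bool {
    // The socket is dropped immediately, so the port is released again.
    UdpSocket::bind(addr).is_ok()
}
```

In the test body, an early `if !udp_port_is_free("127.0.0.1:5005") { return; }` before the UDP send gives the skip behaviour described above.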
### D10 — Docs sweep
* `CHECKLIST.md`: header to `head 0ec1e4b0`, count to **47 Done**,
explicit note that ADR-111 is intentionally absent. Reference table
range to `001-117`.
* `ADR-115`: "ADR-100" → "ADR-110", "ADR-108 / ADR-111" → "ADR-108 / ADR-109".
* `espectre-gap-analysis.md::Still open` table: 8 shipped items marked
✓ Done with commit hashes; remaining items annotated Deferred with
reason or carry a Pack assignment. New items 15-16 added (ADR-115,
ADR-117).
* `ota-pipeline.md`: new "Operator REST endpoints" section listing
`/ota/status`, `/ota`, `/ota/recalibrate`, `/ota/set-target` with
curl examples both unauthed and bearer-token authed.
## Files Touched
```
v2/crates/wifi-densepose-sensing-server/src/main.rs:
+ udp_receiver_task: loopback/unspecified/multicast/broadcast filter (D1)
+ csi_keepalive_task: pre-reap pkill at task entry (D2)
+ run_wiflow_inference: real classifier confidence (D3) + tail clone (D5)
+ Router: GET / → root_redirect (308), GET /api → info_page (D6)
+ info_page: expanded with new endpoints listed
v2/crates/wifi-densepose-sensing-server/src/wiflow_v1.rs:
+ build_input_from_history: None-pad → 0.0f32, not subcarrier-0 dup (D4)
v2/crates/wifi-densepose-sensing-server/tests/multi_node_test.rs:
+ ADR-117 guard: skip if 127.0.0.1:5005 is owned (D9)
ui/index.html:
+ <div id="sensing-container"> inside #sensing section (D7)
ui/components/LiveDemoTab.js:
+ fetchModels: guard wiflow inject behind !activeModelId (D8)
CHECKLIST.md:
+ header refresh + ADR range correction (D10)
docs/adr/ADR-115-fw-set-target-rest.md:
+ typo fixes ADR-100 → ADR-110, ADR-111 → ADR-109 (D10)
docs/references/espectre-gap-analysis.md:
+ Still-open table refresh — 8 items ✓ Done, 14/15 reclassified (D10)
docs/references/ota-pipeline.md:
+ Operator REST endpoints section (D10)
docs/adr/ADR-117-process-hygiene-and-audit-followups.md (this)
```
Binary size delta: 3.0 MB → 3.1 MB (no significant change).
## Verified Acceptance
After restart with the new binary (PID 97903):
```
$ ps -axo pid,ppid,command | grep "ping.*-i.*0\.040" | grep -v grep | wc -l
2
$ ps -axo pid,ppid,command | grep "ping.*-i.*0\.040" | grep -v grep
97921 97903 /sbin/ping -i 0.040 192.168.0.100
97922 97903 /sbin/ping -i 0.040 192.168.0.101
```
Exactly two ping children — one per real sensor — parented to the
running server. No 127.0.0.1, no orphans.
```
$ curl -sI http://localhost:8080/
HTTP/1.1 308 Permanent Redirect
location: /ui/index.html
$ curl http://localhost:8080/api/v1/pose/current | jq '.persons[0].keypoints[0]'
{ "name": "nose", "x": 0.999, "y": 0.0, "z": 0, "confidence": 0.037 }
```
`confidence: 0.037` — real runtime classifier signal, not hardcoded 1.0.
`cargo test --workspace` (release): 13 passed / 0 failed / 5 ignored.
## Out of Scope (intentional non-fixes)
* **Health endpoint fake constants** (cpu:2.5, mem:1.8, disk:15.0) —
flagged by the auditor as critical. Replacing them with the `sysinfo` crate
would add a dependency for low-value telemetry, and the orchestrator
readiness probe is currently used only by Docker Compose, not as a
Kubernetes liveness probe. Deferred. The proper fix is for `/health/ready`
to report only `model_loaded` + `node_count > 0`.
* **`derive_pose_from_sensing` call-site cleanup** — function returns
`Vec::new()` since ADR-105; removing the 5 call sites is a no-op
refactor with no behaviour change. Skipped to keep diff focused.
* **`tracker_bridge:10` unused imports warning** — module is integrated
via `tracker_bridge::tracker_update` (4 callers), the import list
just has dead names. Cosmetic. `cargo fix` deferred.
* **CLI training flags** (`--train`, `--dataset`, `--epochs`,
`--checkpoint-dir`, `--pretrain*`) — silent no-ops; training is via
REST. Removing the flags would break any operator script that passes
them harmlessly. Deferred to a separate flag-audit pass.
* **OTA PSK provisioning** — operator workflow change, not a code
change. Note added to ADR-115 open items. Operator can set
`security/ota_psk` via USB provision.py whenever convenient.
## References
* ADR-105 — no synthetic data in production runtime; this ADR extends
the principle to keypoint confidence (was synthesised, now real).
* ADR-109 — gain-lock recalibrate REST; same endpoint used to fix node 2
feature divergence as part of this audit pass.
* ADR-115 — set-target REST; typos fixed here.
* ADR-116 — WiFlow-v1 loader; the auditor's findings landed against
this ADR's just-shipped integration.
* `tests/multi_node_test.rs` — the test whose accidental cross-talk with
the production server triggered the 250+ ping zombie incident.

@ -0,0 +1,193 @@
# ADR-118 — Feature Decorrelation + Multi-node Extractor (Adaptive Classifier)
**Status**: Accepted
**Date**: 2026-05-18
**Scope**: `v2/crates/wifi-densepose-sensing-server/src/adaptive_classifier.rs`
(`N_FEATURES`, `features_from_frame`, `features_from_runtime`), call sites in
`main.rs::adaptive_override`, `main.rs:~6200` per-node loop, and
`csi.rs::adaptive_override`.
## Context
After ADR-117 the adaptive_classifier produced **40.4% accuracy** on a
2-node, 7-class training set (52,857 frames). Adding 4 more sensors and
recording the same 7 classes at 6 nodes increased the set to **151,329 frames
(2.9× more data)** but accuracy only moved to **44.4%** (+4 pts).
Diagnostic Python audit (run against both datasets) found three architectural
defects in the feature pipeline, not the data:
| Defect | 2-node set | 6-node set |
|---|---|---|
| Constant feature (`amp_min = 0.00` across all frames — HT20 null subcarrier) | ✗ dead | ✗ dead |
| Multicollinear pairs `\|r\| > 0.85` | 17 pairs | 21 pairs |
| Top F-stat vs accuracy | F=1,516, acc 40.4% | F=15,497, acc 44.4% |
The 10× higher F-stat on 6-node data confirmed the **signal was getting
stronger** but the classifier couldn't extract it. Root cause:
`features_from_frame` used only `nodes.first()` — 5 of 6 sensors carried
**zero weight** in the feature vector. Adding nodes physically helped, but
only via the small contribution to the 7 aggregated server-level features.
Within a single node, the 8 subcarrier scalars were 90-99% correlated with
each other (mean ≈ std ≈ max ≈ p25/75/90 — they all measure "amplitude
level"). And the 4 energy features (variance, motion_band_power,
breathing_band_power, spectral_power) were 87-99% correlated. The 15-feature
space had effective rank ≈ 5.
## Decisions
### D1 — Drop the dead and redundant features
* **Dropped**: `amp_min` (constant 0), `amp_range = max - min ≡ max`
(collinear, since `min` is always 0), `motion_band_power`/`breathing_band_power`/`spectral_power`
(all r > 0.95 with `variance`), `amp_mean`/`amp_max`/`amp_iqr`/`amp_kurt`
(all r > 0.90 with `amp_std`).
* **Kept (globally)**: `variance`, `mean_rssi`, `dominant_freq_hz`,
`change_points` — the 4 server-level features that retained marginal
independence.
### D2 — Per-node features × all 6 nodes
For each node id `N ∈ {1..6}`, extract 3 features:
* `amp_std` — multipath spread (motion-sensitive)
* `amp_skew` — distribution asymmetry (sensitive to dominant scatterer
position relative to this sensor)
* `amp_entropy` — spectral diversity (normalised to [0, 1])
Total: `4 + 6 × 3 = 22 features`. Each node's contribution lives at a fixed
offset (`base = 4 + (node_id - 1) × 3`) so 5 of 6 sensors are no longer
discarded.
Missing-node features are zero-padded; z-score normalisation (already in
the model from ADR-117 era) treats them consistently across train and
classify.
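The D2 layout can be sketched as below. The offset rule `base = 4 + (node_id - 1) × 3` and the zero-padding come from the ADR. The exact skew formula and the normalised entropy (amplitudes treated as a probability mass, divided by `ln(n)`) are assumptions, and `per_node_stats`/`assemble_features` here are illustrative rather than the crate's code.

```rust
pub const N_GLOBAL_FEATURES: usize = 4;
pub const N_PER_NODE_FEATURES: usize = 3;
pub const MAX_NODES: usize = 6;
pub const N_FEATURES: usize = N_GLOBAL_FEATURES + MAX_NODES * N_PER_NODE_FEATURES; // 22

/// (population std, skewness, entropy normalised to [0, 1]) for one node.
pub fn per_node_stats(amps: &[f64]) -> [f64; 3] {
    let n = amps.len();
    if n < 2 {
        return [0.0; 3];
    }
    let nf = n as f64;
    let mean = amps.iter().sum::<f64>() / nf;
    let std = (amps.iter().map(|a| (a - mean).powi(2)).sum::<f64>() / nf).sqrt();
    let skew = if std > 1e-12 {
        amps.iter().map(|a| ((a - mean) / std).powi(3)).sum::<f64>() / nf
    } else {
        0.0
    };
    // Assumed entropy: Shannon entropy of |a| / sum|a|, divided by ln(n).
    let total: f64 = amps.iter().map(|a| a.abs()).sum();
    let entropy = if total > 0.0 {
        let h: f64 = amps
            .iter()
            .map(|a| a.abs() / total)
            .filter(|&p| p > 0.0)
            .map(|p| -p * p.ln())
            .sum();
        h / nf.ln()
    } else {
        0.0
    };
    [std, skew, entropy]
}

/// Globals first, then each node's 3 stats at its fixed offset.
/// Missing nodes stay zero; ids outside 1..=MAX_NODES are ignored.
pub fn assemble_features(
    global: [f64; N_GLOBAL_FEATURES],
    per_node_amps: &[(u8, &[f64])],
) -> [f64; N_FEATURES] {
    let mut out = [0.0; N_FEATURES];
    out[..N_GLOBAL_FEATURES].copy_from_slice(&global);
    for &(node_id, amps) in per_node_amps {
        let id = node_id as usize;
        if id == 0 || id > MAX_NODES {
            continue;
        }
        let base = N_GLOBAL_FEATURES + (id - 1) * N_PER_NODE_FEATURES;
        out[base..base + N_PER_NODE_FEATURES].copy_from_slice(&per_node_stats(amps));
    }
    out
}
```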
### D3 — `features_from_runtime` signature change
Old:
```rust
pub fn features_from_runtime(feat: &Value, amps: &[f64]) -> [f64; 15]
```
New:
```rust
pub fn features_from_runtime(
    feat: &Value,
    per_node_amps: &[(u8, &[f64])],
) -> [f64; 22]
```
Three call sites updated:
1. `main.rs::adaptive_override` (global state path) — new helper
`current_per_node_amps()` reads `AMP_HIST.nbvi_history.back()` for each
active node, then passes the slice.
2. `main.rs:~6200` (per-node loop in the broadcast tick task) — same
helper, called once per tick.
3. `csi.rs::adaptive_override` (legacy, no live callers) — degraded to
single-node fallback with `[(1u8, amps)]`; documented as emergency only.
### D4 — Old 15-feature model file is incompatible
`AdaptiveModel` serializes `[f64; N_FEATURES]` arrays. Loading a 15-array
into a 22-slot field fails. `data/adaptive_model.json` removed at deploy
time; first start re-runs `train_from_recordings` over the existing 7 train
files.
## Files Touched
```
v2/crates/wifi-densepose-sensing-server/src/adaptive_classifier.rs:
* N_FEATURES: 15 → 22
* New constants N_GLOBAL_FEATURES=4, N_PER_NODE_FEATURES=3, MAX_NODES=6
* features_from_frame rewritten — multi-node + decorrelated
* features_from_runtime signature changed
* per_node_stats helper (3 scalars: std/skew/entropy)
* Old subcarrier_stats removed
v2/crates/wifi-densepose-sensing-server/src/main.rs:
+ current_per_node_amps() helper (snapshots AMP_HIST.nbvi_history.back())
+ 2 call sites updated to pass &[(u8, &[f64])] instead of &[f64]
v2/crates/wifi-densepose-sensing-server/src/csi.rs:
+ adaptive_override updated to new signature (dead code path, kept for ABI)
data/adaptive_model.json: removed (15-feature incompatible)
docs/adr/ADR-118-feature-decorrelation-multinode.md (this)
```
## Verified Acceptance
Re-ran `POST /api/v1/adaptive/train` against the same 151,329-frame 6-node
recording set:
```
2-node, 15 features: 40.4%
6-node, 15 features: 44.4% (+4.0 from more data)
6-node, 22 features: 49.58% (+5.2 from feature engineering)
```
Total improvement: **+9.2 percentage points** from the baseline, on the
same hardware in the same room.
Live confidence distribution (10s samples post-retrain):
```
absent: conf 0.30-0.85 (was 0.04-0.10 pre-ADR-118)
present_still: conf 0.40-0.85
present_moving: conf 0.30-0.50
active: conf 0.27-0.45
transition: conf 0.84-0.86 (high — model has clear signal for this)
waving: conf — class not active during sample window
```
Confidence is now meaningful (model has separation), whereas pre-ADR-118 the
near-uniform 0.04-0.10 indicated the classifier was essentially flipping a
coin.
### Per-feature class separability (post-train; sep_ratio = between-class spread / within-class std)
| Feature | sep_ratio | Verdict |
|---|---|---|
| `n6_std` | 0.60 ★ | best — node 6 near door catches both motion + door state |
| `n2_std` | 0.35 | second — node 2 far from AP, high modulation |
| `n6_skew` | 0.25 | useful |
| `n3_skew` | 0.26 | useful |
| `n2_skew` | 0.18 | marginal |
| `n4_std` | 0.14 | marginal |
| `n1_*` | 0.01-0.06 | near AP — almost no class signal |
| `n5_*` | 0.01-0.05 | similar to n1 |
| all `entropy` features | 0.01-0.02 | **dead** — distribution shape doesn't vary by activity |
| `variance` (global) | 0.11 | weak |
| `mean_rssi` (global) | 0.01 | dead at this scale |
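One plausible reading of `sep_ratio`, sketched for a single feature: the standard deviation of the per-class means divided by the mean within-class standard deviation. The exact definition used in the audit is an assumption, and `sep_ratio` here is a hypothetical helper.

```rust
/// `classes[c]` holds one feature's values for every frame of class `c`.
pub fn sep_ratio(classes: &[Vec<f64>]) -> f64 {
    fn mean(xs: &[f64]) -> f64 {
        xs.iter().sum::<f64>() / xs.len() as f64
    }
    fn pop_std(xs: &[f64]) -> f64 {
        let m = mean(xs);
        (xs.iter().map(|x| (x - m).powi(2)).sum::<f64>() / xs.len() as f64).sqrt()
    }
    let nonempty: Vec<&Vec<f64>> = classes.iter().filter(|c| !c.is_empty()).collect();
    if nonempty.len() < 2 {
        return 0.0;
    }
    // Between-class spread: how far apart the class centres are.
    let means: Vec<f64> = nonempty.iter().map(|c| mean(c)).collect();
    let between = pop_std(&means);
    // Within-class spread: average noise inside each class.
    let within = nonempty.iter().map(|c| pop_std(c)).sum::<f64>() / nonempty.len() as f64;
    if within > 1e-12 { between / within } else { f64::INFINITY }
}
```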
## Open Items
* **`*_entropy` features carry no signal** (sep_ratio ~0.01 across all 6
nodes). Could be dropped: 22 → 16 features. Marginal expected gain (~1%),
not worth a follow-up ADR right now.
* **Aggregated server features all sub-0.11**`mean_rssi` / `dom_hz` /
`change_pts` could go too. Would reduce to 12-13 truly useful features.
* **Logistic regression ceiling**`n6_std` alone has sep_ratio 0.60 but
a linear classifier can't fully exploit non-linear class boundaries.
Next big lever is replacing the LogReg with a small MLP or random forest.
Out of scope here.
* **`standing` and `sitting` recordings collapse to one class** — file
naming maps both to `present_still`. They're physically distinct
signatures (different RF profile) but the trainer treats them as one.
Separating them in `classify_recording_name` would add a class but might
lower accuracy due to inherent confusability — TBD via experiment.
* **Sensor placement matters more than algorithm tweaks** — n1/n5 (near AP)
carry almost no class signal. Reposition them away from the AP if
possible (closer to walking zone, farther from the line-of-sight to AP).
## References
* ADR-101 — raw amplitude classifier (the runtime classifier this adaptive
model can override)
* ADR-117 — process hygiene + previous training infrastructure
* `data/recordings/archive_2node_2026-05-17/` — earlier 2-node training
set, kept for comparison; not used by trainer (outside `recordings/`
root scope)

@ -0,0 +1,161 @@
# ADR-119 — MLP Replaces Logistic Regression in Adaptive Classifier
**Status**: Accepted
**Date**: 2026-05-18
**Scope**: `v2/crates/wifi-densepose-sensing-server/src/adaptive_classifier.rs`
(new `MlpModel` struct, `train_mlp_classifier`, `eval_mlp`; modified
`AdaptiveModel::classify` + `train_from_recordings`).
## Context
After ADR-118 (feature decorrelation + multi-node extractor) the adaptive
classifier reached **49.58% accuracy** on a 6-node, 7-class, 151,329-frame
training set. Per-feature audit showed `n6_std` sep_ratio = 0.60 — i.e. the
underlying signal *can* separate the classes — but logistic regression was
limited to linear decision boundaries and couldn't model interactions like:
* `walking`: `n2_std` high **AND** `n6_std` high **AND** `dom_hz ≈ 3 Hz`
* `waving`: `n1_std` high **BUT** `n2_std` low (only close sensors fire)
* `sitting` vs `standing`: same global features, differ in `n6_std` pattern
LogReg sums weighted features; it cannot represent "AND/BUT" combinations.
A small MLP can: hidden units learn intermediate concepts, then the output
layer combines them.
## Decisions
### D1 — Single-hidden-layer MLP, 22 → 32 → 6
* Input: the same 22-feature vector from ADR-118.
* Hidden: 32 ReLU units. ~0.9k parameters (22×32 + 32 + 32×6 + 6 = 934),
enough capacity for 6 classes but small enough to train in seconds on the
151k-frame set.
* Output: softmax over `n_classes` (discovered dynamically at train time).
* Z-score normalisation: identical to the LogReg path — same
`global_mean` / `global_std` populated by `train_from_recordings`.
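A sketch of the D1 forward pass, assuming row-major weights (`w1[hidden][input]`, `w2[class][hidden]`). The struct `Mlp` is illustrative, not the crate's `MlpModel`. `param_count` shows how the size works out for the 22 → 32 → 6 shape.

```rust
/// Illustrative single-hidden-layer MLP (input -> ReLU hidden -> softmax).
pub struct Mlp {
    pub w1: Vec<Vec<f64>>, // [hidden][input]
    pub b1: Vec<f64>,      // [hidden]
    pub w2: Vec<Vec<f64>>, // [class][hidden]
    pub b2: Vec<f64>,      // [class]
}

impl Mlp {
    /// Returns class probabilities for one z-normalised feature vector.
    pub fn forward(&self, x: &[f64]) -> Vec<f64> {
        // Hidden layer: ReLU(W1 · x + b1)
        let hidden: Vec<f64> = self
            .w1
            .iter()
            .zip(&self.b1)
            .map(|(row, b)| (row.iter().zip(x).map(|(w, xi)| w * xi).sum::<f64>() + b).max(0.0))
            .collect();
        // Output logits: W2 · h + b2
        let logits: Vec<f64> = self
            .w2
            .iter()
            .zip(&self.b2)
            .map(|(row, b)| row.iter().zip(&hidden).map(|(w, h)| w * h).sum::<f64>() + b)
            .collect();
        softmax(&logits)
    }
}

/// Numerically stable softmax: subtract the max logit before exponentiating.
pub fn softmax(logits: &[f64]) -> Vec<f64> {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|z| (z - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Weights plus biases for an `inputs -> hidden -> classes` network.
pub fn param_count(inputs: usize, hidden: usize, classes: usize) -> usize {
    inputs * hidden + hidden + hidden * classes + classes
}
```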
### D2 — Manual backprop, no external ML crate
`tch` (LibTorch) or `candle` would pull in ~50-200 MB of native deps for a
network of under 1k parameters. The forward + backward passes are ~150 LoC
of pure Rust; SGD + momentum + cosine LR decay another ~30. Built-in `f64`
arithmetic is fast enough: a full train completes in ~10 seconds on an M1
Mac.
Optimiser: SGD with momentum 0.9, weight decay 1e-4, base LR 0.05 with
half-cosine decay to 0, batch size 64, 30 epochs. He initialisation
(`N(0, sqrt(2/fan_in))`) on weights, zero on biases.
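The optimiser schedule above can be sketched as two small functions. This uses one common formulation (weight decay added to the gradient as an L2 term, velocity-style momentum); whether the crate folds decay in the same way is an assumption. He initialisation is omitted because it needs a random number generator.

```rust
use std::f64::consts::PI;

/// Half-cosine decay: `base_lr` at epoch 0, falling smoothly to 0 at `total_epochs`.
pub fn cosine_lr(base_lr: f64, epoch: usize, total_epochs: usize) -> f64 {
    let t = epoch as f64 / total_epochs as f64;
    0.5 * base_lr * (1.0 + (PI * t).cos())
}

/// One SGD + momentum step with L2 weight decay; updates weights in place.
pub fn sgd_momentum_step(
    weights: &mut [f64],
    velocity: &mut [f64],
    grads: &[f64],
    lr: f64,
    momentum: f64,
    weight_decay: f64,
) {
    for ((w, v), g) in weights.iter_mut().zip(velocity.iter_mut()).zip(grads) {
        // v <- mu * v - lr * (g + wd * w);  w <- w + v
        *v = momentum * *v - lr * (g + weight_decay * *w);
        *w += *v;
    }
}
```

With the ADR's values, each batch would call something like `sgd_momentum_step(&mut w, &mut v, &g, cosine_lr(0.05, epoch, 30), 0.9, 1e-4)`.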
### D3 — MLP wins over LogReg at classify time, LogReg kept as fallback
`AdaptiveModel` carries both:
```rust
pub weights: Vec<Vec<f64>>, // legacy LogReg, still trained for rollback
pub mlp: MlpModel, // ADR-119 — preferred when is_trained() == true
```
`classify()` checks `self.mlp.is_trained()`; if yes uses MLP forward pass,
otherwise falls back to LogReg softmax. Old `data/adaptive_model.json`
files (LogReg only, no `mlp` field) load with `#[serde(default)]` on `mlp` →
`MlpModel::default()` returns empty fields → `is_trained() == false` →
graceful degradation to the LogReg path.
### D4 — Train both, report better number
`train_from_recordings` runs the existing LogReg loop first (unchanged),
then trains MLP on the same z-normalised samples, evaluates both on the
training set, and reports `training_accuracy = mlp_acc.max(logreg_acc)`.
Per-class accuracy from both classifiers is logged side-by-side for
diagnostic comparison.
## Verified Acceptance
```
LogReg: 49.58% overall
MLP: 53.53% overall (+3.95 pts)
Per-class (LogReg → MLP):
absent 40% → 41% (+1)
present_still 99% → 99% (tied — 2× sample count)
transition 29% → 36% (+7)
active 22% → 30% (+8)
waving 34% → 38% (+4)
present_moving 24% → 33% (+9)
```
Notes:
* `present_still` class is a merged bucket: both `train_standing_*` and
`train_sitting_*` map to `present_still` via `classify_recording_name`.
Hence 43,242 samples vs 21,500 average for the other classes — the
classifier biases strongly toward this dominant class. The 99% is
honest but partially inflated by class imbalance.
* The +3.95 pts is concentrated on motion classes — exactly where the
hypothesis predicted MLP would help (non-linear combinations of per-
node features differentiate similar motion types).
* MLP loss flatlined around 1.15 after epoch 10. Suggests the current
22-feature representation has hit its information ceiling for frame-
level classification. Going higher needs temporal context (sliding
window classifier, LSTM, TCN) — see Open Items.
Total improvement since the start of this session:
```
2-node, 15 features, LogReg: 40.4% (baseline)
6-node, 15 features, LogReg: 44.4% +4.0 from more data
6-node, 22 features, LogReg: 49.58% +5.2 from feature engineering (ADR-118)
6-node, 22 features, MLP: 53.53% +3.95 from non-linear classifier (ADR-119)
─────
Total cumulative: +13.1 percentage points
```
## Files Touched
```
v2/crates/wifi-densepose-sensing-server/src/adaptive_classifier.rs:
+ const MLP_HIDDEN: usize = 32
+ pub struct MlpModel { w1, b1, w2, b2, n_classes } + serde
+ impl MlpModel { is_trained, forward }
+ AdaptiveModel.mlp field (serde-default for backward compat)
+ AdaptiveModel::classify prefers MLP when trained
+ train_mlp_classifier (~150 LoC manual backprop)
+ eval_mlp helper
+ train_from_recordings calls MLP path and picks max accuracy
docs/adr/ADR-119-mlp-classifier.md (this)
```
`data/adaptive_model.json` removed at deploy time — the MLP fields need
populating, the old file has none.
## Out of Scope / Follow-ups
* **Temporal classifier (sliding window LSTM/TCN)** — loss flatlines at
~1.15 with the current feature set; this is the frame-level ceiling.
A model that consumes a 1-second window (10-20 frames) would catch
the temporal signature of `transition` (sit-stand cycle ≈ 0.5 Hz),
`walking` (step rate ≈ 2 Hz), `active` (bursty), `waving` (limb
cadence ≈ 1-2 Hz). A realistic estimate for these inherently temporal
classes is +15-25 pts. ~3-4 hours of code.
* **Class imbalance fix**`present_still` has 2× samples. Either
oversample the minority classes during training, or weight loss by
inverse class frequency. Marginal — ~2-3 pts.
* **Drop dead features** — 6 entropy features (sep_ratio 0.01-0.02) and
3 weak globals (`mean_rssi`, `dom_hz`, `change_pts` all <0.11)
contribute noise. Reducing 22 → ~13 features would simplify training
but probably not move accuracy more than 1-2 pts.
* **Hidden size sweep** — tried only 32. Could try 16 (faster, less
overfitting risk) or 64 (more capacity). Cosmetic.
* **Split `sitting` and `standing` into separate classes** — they're
physically distinct RF signatures but currently merged. Adding them as
separate classes would test whether the model can disambiguate them.
Likely lowers `present_still` accuracy but separates a useful
distinction. Experiment-grade.
## References
* ADR-118 — feature decorrelation + multi-node extractor (the 22-feature
basis this ADR uses)
* ADR-117 — earlier process hygiene pass; introduced standardisation
(`global_mean`/`global_std`) that this ADR's MLP also relies on
* ADR-101 — raw amplitude classifier (the runtime path that calls
`AdaptiveModel::classify`)

@ -0,0 +1,209 @@
# ADR-120 — Windowed Temporal Classifier (W-MLP)
**Status**: Accepted
**Date**: 2026-05-18
**Scope**: `v2/crates/wifi-densepose-sensing-server/src/adaptive_classifier.rs`
(`WindowedMlpModel`, `train_windowed_mlp_classifier`, `eval_windowed_mlp`,
`AdaptiveModel::classify_window`); `main.rs` (`AppStateInner.feature_window`,
`push_feature_window`, `adaptive_override` switching to window path).
## Context
ADR-119 added a small MLP (22 → 32 → 6) that improved accuracy from 49.58%
(LogReg) to **53.53%**. Loss flatlined at ~1.15 around epoch 10 of 30 —
clear signal that the **frame-level information ceiling** had been
reached for the 22-feature representation.
The dataset has 7 activity classes that differ primarily in **temporal
patterns**, not in any single frame:
* `walking` step cadence: ~2 Hz (visible in 0.5-second window)
* `transition` (sit-stand): ~0.5 Hz (visible in 2-second window)
* `waving` limb cadence: 1-2 Hz
* `active` (jumping): bursty / quasi-periodic at ~3 Hz
* `present_still` (sitting + standing merged): no temporal signature
Per frame, `walking`, `active`, and `waving` all look "moving" with
similar amplitude std/skew; they're disambiguated only by HOW the
amplitude pattern evolves over 1-2 seconds. A classifier that sees a
single frame can't tell them apart no matter how good the per-frame
features are.
## Decisions
### D1 — Stack 20 consecutive frames into a 440-d input
```
WINDOW_FRAMES = 20 (~2 seconds at ~10 Hz tick rate)
N_FEATURES = 22 (from ADR-118)
WINDOWED_INPUT = 20 × 22 = 440
WINDOWED_HIDDEN = 64
```
Network: `440 → 64 ReLU → n_classes softmax`. ~28.6k parameters total,
far larger than the frame-level MLP's ~0.9k, but still small enough to train
in <60s and serialize as JSON.
Training samples are built by sliding a window of 20 frames with **stride
5** within each recording (4× overlap). Windows do **not** cross recording
boundaries — each window inherits its source recording's class label.
On the 6-node 151k-frame set:
* 7 recordings × ~21.6k frames each ≈ 151k frames total
* (~21.6k - 20) / 5 + 1 ≈ 4,300 windows per recording
* Total: ~30k windowed samples
* Class balance is roughly preserved (each recording is one class)
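The windowing above can be sketched as follows, assuming each recording is a `Vec` of 22-wide feature frames paired with a class index. Because windows are built per recording, none can straddle two labels. With 151,329 / 7 ≈ 21,618 frames per recording this yields 4,320 windows per recording, in line with the ≈4,300 estimate. Function names are hypothetical.

```rust
/// Start offsets of length-`window` slices taken every `stride` frames.
pub fn window_starts(n_frames: usize, window: usize, stride: usize) -> Vec<usize> {
    if window == 0 || stride == 0 || n_frames < window {
        return Vec::new();
    }
    (0..=n_frames - window).step_by(stride).collect()
}

/// Flattens `window` consecutive frames (each F wide) into one input row.
pub fn flatten_window<const F: usize>(frames: &[[f64; F]], start: usize, window: usize) -> Vec<f64> {
    frames[start..start + window]
        .iter()
        .flat_map(|f| f.iter().copied())
        .collect()
}

/// (input, label) pairs built recording by recording, so labels never mix.
pub fn build_windowed_samples<const F: usize>(
    recordings: &[(usize, Vec<[f64; F]>)],
    window: usize,
    stride: usize,
) -> Vec<(Vec<f64>, usize)> {
    let mut out = Vec::new();
    for (label, frames) in recordings {
        for s in window_starts(frames.len(), window, stride) {
            out.push((flatten_window(frames, s, window), *label));
        }
    }
    out
}
```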
### D2 — Manual backprop, same recipe as MLP
Same SGD + momentum 0.9 + weight decay 1e-4 + cosine LR decay. Base LR
lowered to 0.03 (vs MLP's 0.05) because the network is bigger. 25 epochs.
He initialisation, ReLU activation, softmax output, cross-entropy loss.
### D3 — `AdaptiveModel` carries all three classifiers, classify routes by availability
```rust
pub struct AdaptiveModel {
    pub weights: Vec<Vec<f64>>,          // ADR-118 legacy LogReg
    pub mlp: MlpModel,                   // ADR-119 frame-level MLP
    pub windowed_mlp: WindowedMlpModel,  // ADR-120 (this) — primary
    // ...
}
```
`classify_window()` (new API) prefers `windowed_mlp` when trained AND
the caller has a 20-frame buffer. Falls through to frame-level MLP
when called with insufficient history. Old JSON model files load with
`MlpModel::default()` and `WindowedMlpModel::default()` filling absent
fields — backward compatible.
### D4 — Rolling buffer in `AppStateInner`, pushed per tick
```rust
struct AppStateInner {
    feature_window: VecDeque<[f64; N_FEATURES]>,  // capacity = WINDOW_FRAMES
    // ...
}
```
New helper `push_feature_window(&mut s, &features)` computes the 22-d
feature vector from current per-node amps, pushes to the back of the
buffer, evicts oldest when over capacity. Called at all three tick
sites where `adaptive_override` runs:
* `main.rs:~3030` — multi-BSSID tick handler
* `main.rs:~3225` — WiFi fallback tick handler
* `main.rs:~6510` — per-node loop in the broadcast tick task
`adaptive_override` (read-only over state) builds the 440-d input by
copying the buffer's last 19 entries + the current frame's features,
then calls `model.classify_window(&flat)`. Cold-start (buffer < 20)
falls back to `model.classify(&feat_arr)` — frame-level MLP.
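A sketch of the buffer handling above, assuming a `VecDeque` of fixed-size feature arrays. `push_bounded` and `window_input` are hypothetical stand-ins for `push_feature_window` and the window assembly inside `adaptive_override`; returning `None` signals the cold-start fallback to the frame-level `classify`.

```rust
use std::collections::VecDeque;

pub const WINDOW_FRAMES: usize = 20;
pub const N_FEATURES: usize = 22;

/// Appends the newest feature vector and evicts the oldest beyond capacity.
pub fn push_bounded(buf: &mut VecDeque<[f64; N_FEATURES]>, feat: [f64; N_FEATURES]) {
    buf.push_back(feat);
    while buf.len() > WINDOW_FRAMES {
        buf.pop_front();
    }
}

/// Last 19 buffered frames + `current`, flattened to 440 values.
/// `None` while the buffer holds fewer than WINDOW_FRAMES entries.
pub fn window_input(
    buf: &VecDeque<[f64; N_FEATURES]>,
    current: &[f64; N_FEATURES],
) -> Option<Vec<f64>> {
    if buf.len() < WINDOW_FRAMES {
        return None;
    }
    let mut flat = Vec::with_capacity(WINDOW_FRAMES * N_FEATURES);
    for f in buf.iter().skip(buf.len() - (WINDOW_FRAMES - 1)) {
        flat.extend_from_slice(f);
    }
    flat.extend_from_slice(current);
    Some(flat)
}
```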
## Verified Acceptance
Retrained on the same 6-node, 151,329-frame set used since ADR-118:
```
LogReg: 49.58%
MLP: 53.53% (+3.95 vs LogReg)
W-MLP: 90.40% (+36.87 vs MLP)
```
Per-class (frame-level MLP → W-MLP):
```
absent 41% → 100% +59
present_still 99% → 100% +1 (already saturated)
transition 36% → 86% +50 (sit-stand cadence captured)
active 30% → 74% +44 (jumping cadence captured)
waving 38% → 90% +52 (gesture cadence captured)
present_moving 33% → 82% +49 (walking step cadence captured)
```
Loss curve confirms breakout from the frame-level plateau:
```
MLP: epoch 0 → 1.28 → epoch 29 → 1.14 (flat plateau)
W-MLP: epoch 0 → 1.01 → epoch 24 → 0.25 (still trending)
```
Total cumulative improvement vs the start-of-session 2-node 15-feature
LogReg baseline:
```
40.4% → 90.40% = +50.0 percentage points
```
## Caveat — training vs generalization
90.40% is **training accuracy**. The W-MLP has ~28,600 parameters trained
on ~30,200 windowed samples — capacity is comparable to dataset size,
so some overfitting is expected. True generalization performance will
only be measurable once an independent test set is captured.
Mitigations already in place:
* Weight decay 1e-4 regularises against memorisation
* Cosine LR decay with smooth annealing
* Stride 5 in window construction reduces near-duplicate samples
* Architecture stays small (one hidden layer) — limits overfit capacity
Recommended follow-up: record a 60-second held-out session per class
(separate from training), evaluate W-MLP cold, compare to training
accuracy. Expected drop: 5-15 pts for a healthy model.
## Files Touched
```
v2/crates/wifi-densepose-sensing-server/src/adaptive_classifier.rs:
+ const WINDOW_FRAMES = 20, WINDOWED_INPUT = 440, WINDOWED_HIDDEN = 64
+ pub const N_FEATURES_PUB (for external buffer sizing)
+ pub struct WindowedMlpModel { w1, b1, w2, b2, n_classes }
+ impl WindowedMlpModel::{is_trained, forward}
+ AdaptiveModel.windowed_mlp field (serde-default)
+ AdaptiveModel::classify_window method
+ train_from_recordings builds recording_groups, slides windows,
calls train_windowed_mlp_classifier
+ train_windowed_mlp_classifier (~150 LoC manual backprop)
+ eval_windowed_mlp helper
+ #[derive(Clone)] on Sample (for recording_groups Vec)
v2/crates/wifi-densepose-sensing-server/src/main.rs:
+ AppStateInner.feature_window: VecDeque<[f64; N_FEATURES_PUB]>
+ push_feature_window helper
+ adaptive_override switches to classify_window when buffer is full
+ 3 tick sites call push_feature_window before adaptive_override
docs/adr/ADR-120-windowed-temporal-classifier.md (this)
```
## Out of Scope / Follow-ups
* **Held-out test set** — must record fresh data and evaluate the saved
model cold. Critical to confirm 90% is not training-set memorisation.
* **TCN replacing stacked-MLP** — true 1D convolutions over time would
use weights more efficiently (~5k vs 28k) and generalise better.
Stack-MLP works but is parameter-heavy. Worth a follow-up if data
scales 10×.
* **Sliding output smoothing**`classify_window` emits one decision
per tick (~10 Hz). Adjacent windows are 19/20 identical, so adjacent
predictions should agree. They mostly do (98%+) but flicker at class
boundaries — could apply a 3-tick majority filter.
* **`sitting` vs `standing` split** — both currently merge into
`present_still`. The W-MLP gets them both right at 100% as a combined
class. Splitting them would test whether temporal RF signatures
differ between sitting (chair anchor) and standing (free body).
* **Class imbalance**`present_still` has 2× the windows of other
classes (sitting + standing both contribute). Acceptable since it's
the "neutral" class, but oversampling minority classes might lift
accuracy 1-2 pts further.
* **Smaller window size experiments** — 20 frames = 2 sec at ~10 Hz.
Could try 10 frames (1 sec, faster reaction) or 30 (3 sec, more
context). 20 was a reasonable first guess.
## References
* ADR-118 — feature decorrelation + multi-node (22-feature basis)
* ADR-119 — frame-level MLP (sibling classifier, fallback at cold start)
* ADR-101 — raw amplitude classifier (the path that calls
`AdaptiveModel` via `adaptive_override`)
* ADR-105 — no synthetic data in production runtime; this ADR's
confidence output is real model softmax probability, not a
hardcoded value


@@ -0,0 +1,188 @@
# ESPectre Gap Analysis (full Pace Part-2 vs. RuView as of 2026-05-17)
Companion to [`espectre-techniques.md`](espectre-techniques.md). That
doc is the technique catalogue; this one is the **what's still
missing** breakdown, structured exactly along the sections of Pace's
*How I Turned My Wi-Fi Into a Motion Sensor — Part 2*.
## Problem #1: NBVI subcarrier selection
| Pace step | Status in RuView |
|---|---|
| Formula `α·σ/μ² + (1-α)·σ/μ`, α = 0.5 | ✅ ADR-102 |
| Step 1: quiet-window finder | ✅ ADR-102 v2 |
| Step 2: 25 %-percentile dead-zone gate | ✅ ADR-102 |
| **Step 3: rank + validate** | ✅ ADR-104 D4 (commit `6212b17e`) — K ∈ {6,8,10,12,16,20} sweep, smallest-FP wins, ties by smallest total-NBVI |
| Step 4: pick top-K (K=12) | ✅ ADR-102 |
| Amplitude only (no phase) | ✅ same |
All four NBVI steps shipped. If a noisy neighbour's energy overlaps the
top-K subcarriers, the validator counts false positives over the quiet
window and a tighter (or different) K wins.
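The Step 3 rule in the table above can be sketched in Python as follows. It sweeps K ∈ {6,8,10,12,16,20}, picks the configuration with the fewest false positives, and breaks ties by the smallest total NBVI. `pick_k` and the `count_false_positives` callback are illustrative stand-ins for the server's Rust code. The callback is assumed to replay the quiet window through the motion detector using only the given subcarriers.

```python
from typing import Callable, List, Sequence, Tuple

CANDIDATE_KS = (6, 8, 10, 12, 16, 20)


def pick_k(
    ranked: Sequence[Tuple[int, float]],
    count_false_positives: Callable[[List[int]], int],
    candidate_ks: Sequence[int] = CANDIDATE_KS,
) -> List[int]:
    """Return the subcarrier indices of the winning top-K configuration.

    `ranked` holds (subcarrier_index, nbvi) pairs sorted by NBVI ascending.
    """
    best_key = None
    best_subset: List[int] = []
    for k in candidate_ks:
        if k > len(ranked):
            continue  # not enough usable subcarriers for this K
        top = ranked[:k]
        subset = [idx for idx, _ in top]
        fp = count_false_positives(subset)
        total_nbvi = sum(score for _, score in top)
        # Fewest false positives wins; ties broken by smallest total NBVI.
        key = (fp, total_nbvi)
        if best_key is None or key < best_key:
            best_key, best_subset = key, subset
    return best_subset
```

NBVI scores are positive, so a tie on the false-positive count always resolves to the smaller K.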
## Problem #2: Gain Lock (AGC + FFT)
**All done** — ADR-100. Median over 300 packets, `MIN_SAFE_AGC=30`
skip-on-strong-signal safety, ESP32-S3/C3/C6 platform guards.
## Problem #3: Universal threshold via baseline-variance normalization
**Done** — ADR-103 D3. Pace's `scale = 0.25 / baseline_variance`
implemented as `norm_cv = cv / baseline_cv` with universal gates
`3×` (moving) / `6×` (active). Falls back to absolute gates when no
calibration loaded.
## Two-phase boot calibration (~10 s total)
Pace runs both phases as a single atomic boot sequence on the device:
```
PHASE 1 (3 s) collect AGC/FFT → median → lock
PHASE 2 (7 s) rank subcarriers with gain locked → save top-K to NVS
```
| Phase | Status in RuView |
|---|---|
| Phase 1 in FW | ✅ ADR-100 (`csi_collector.c::rv_gain_lock_process`) |
| **Phase 2 in FW after Phase 1** | ⏳ NBVI intentionally in server as rolling refresh (adapts to slow channel drift). Not planned in FW. |
| **NVS save of gain-lock** | ✅ ADR-108 (commit `3779bb76`) — `csi_cfg/gl_agc` + `gl_fft` |
| **NVS save of NBVI selection** | ⏳ NBVI lives server-side, doesn't apply |
After ADR-108 the FW boots to CSI-ready in ~0.5 s (NVS restore) instead
of ~10 s (full 300-packet calibration). The flip side is that adapting
to a room change now requires clearing those NVS keys to force a fresh
calibration. ADR-109 exposes that as `POST /ota/recalibrate`.
## Persisted calibration (NVS on the sensor)
Pace stores **everything** the algorithm needs in NVS on first boot,
so post-reboot the sensor is back in detect mode in well under a
second:
* AGC lock value
* FFT lock value
* Selected subcarrier indices
* Baseline variance
* User-tuned threshold
| Item | Status in RuView |
|---|---|
| WiFi creds + collector IP in NVS | ✅ `csi_cfg` namespace |
| **Gain lock NVS persistence** | ✅ ADR-108 (`csi_cfg/gl_agc` + `gl_fft`) |
| **NBVI selection NVS persistence** | ⏳ server-side rolling, intentional |
| **Baseline NVS persistence** | ✅ on host disk via ADR-103 (`data/baseline.json`); not on sensor — server is required |
| **Threshold NVS persistence** | ✅ derives from baseline_cv loaded by ADR-103 |
If we ever ship to operators who don't run the Rust server (pure FW
+ HA), the server-side bits (NBVI / baseline / threshold) would have
to migrate to the sensor's NVS. Not on the current roadmap.
## The Game (Web Serial calibration UI)
**Not done.** Pace ships a browser-based reaction game at
`espectre.dev/game` that talks to the ESP32 directly over Web Serial
API (USB-CDC). The game shows a live motion bar, lets the user tune
threshold while playing, and persists the chosen threshold to NVS.
Our closest analogue is the read-only `raw.html` calibration console
(per-node amplitude bars + RSSI traces + classification badges)
served by sensing-server on `/static/raw.html`. No interactive
threshold tuning; no Web Serial path; no game.
## Testing
| Pace ships | RuView has |
|---|---|
| 500+ unit tests | small smoke tests in some crates |
| 90 % code coverage | not tracked |
| Fixed 2 000-packet reference capture (1 000 idle + 1 000 motion) | ✅ ADR-114 (`96225e27`) replay fixtures, F1 = 1.000 |
| PlatformIO + pytest + ESPHome + Codecov on every push | partial — Rust `cargo test` only; 2 parser regression tests added by parallel agent (`csi.rs:751`) |
ADR-114 closed the largest reliability gap. The fixed 2 000-packet replay
now guards the classifier and NBVI against silent regressions when we
re-tune thresholds or refactor NBVI. Unit-test breadth and coverage
tracking still lag well behind Pace.
## Native Home Assistant integration via ESPHome
**Not done.** Pace's sensor shows up in HA the moment it's
flashed — `binary_sensor.motion_<room>` entity with attributes.
ESPHome handles MQTT / native API / device discovery automatically.
RuView publishes via WebSocket and REST only; would need either an
ESPHome component, an MQTT bridge, or a custom HA integration.
## Hardware support
* Pace supports ESP32-S3, ESP32-C3, ESP32-C5, ESP32-C6. Gain-lock is
guarded on these targets only; ESP32 + ESP32-S2 fall back to no
gain lock.
* RuView gain-lock code has the same `#if` guard so the same
hardware list works — but we only have hands-on test data for
ESP32-S3.
## What Pace announces for Part 3 (not yet shipped, not yet on our radar either)
* Gesture recognition
* Fall detection
* Person vs. pet classification
## Priority for RuView — current state
### ✅ Done in this session
| Item | Where |
|---|---|
| NVS persistence of gain-lock | ADR-108 (`3779bb76`) |
| FP-rate validation of NBVI (Step 3) | ADR-104 D4 (`6212b17e`) |
| `POST /api/v1/baseline/calibrate` + UI button | ADR-107 (`0f373467`, `45c1464c`) |
| Auto-recalibrate on long-quiet periods | ADR-107 (`0f373467`) |
| Per-subcarrier baseline comparison | ADR-104 (`6212b17e`) |
| Full complex CSI in WS (amp+phase+meta) | ADR-106 (`4daa2c9b`) |
| Sensor µs timestamp from FW | ADR-106 (`b787f40a`) |
| Managed-ping CSI keepalive (no manual ping) | ADR-106 (`8489efe9`) |
| No synthetic data in production runtime | ADR-105 (`9aa027e9`, `30244d27`) |
| OTA flash via WiFi (port 8032) | `ota-pipeline.md` (`274984d3`) |
### ⏳ Still open / deferred, by impact
**Updated 2026-05-17.** Most of the original "still open" items shipped
during this session. Their rows below are struck through and kept for
traceability. The rows still open fall into two groups. Some are **out
of session scope** (HA / ESPHome / Web Serial / channel hopping, per
operator constraints). Others need operator action (camera-side
training capture).
| # | Item | Net benefit | Estimate | Status |
|---|---|---|---|---|
| 1 | **HA via MQTT** | sensor as HA entity, ecosystem reach | 1 day | Deferred (operator said: no new integrations) |
| 2 | ~~Fixed-replay test suite (2 000 packets)~~ | regression protection over the classifier + NBVI | — | ✓ **Done** — ADR-114 (`96225e27`); F1 = 1.000 on 1000 idle + 1000 motion fixtures |
| 3 | ~~Per-sub delta sparkline in `raw.html`~~ | operator sees off-axis drift channel firing in real time | — | ✓ **Done** — ADR-104 (`eec3ca6c`) drift sparkline + ADR-107 D6 progress bar (`432753e1`) |
| 4 | ~~`POST /ota/recalibrate` (clear NVS gain-lock)~~ | reset gain-lock without USB after AP swap or relocation | — | ✓ **Done** — ADR-109 (`f92807cd`) |
| 5 | ~~Track AP MAC in NVS alongside AGC/FFT~~ | auto-invalidate stale gain-lock on AP change | — | ✓ **Done** — folded into ADR-109 (`gl_ap_mac` key, same commit) |
| 6 | ~~Multi-AP signal_field via `MultistaticFuser`~~ | physically real spatial map | — | ✓ **Done** — ADR-112 (`c8ac60f6`); 320/400 cells non-zero on two live sensors |
| 7 | ~~Per-subcarrier baseline AGE check~~ | flag for re-calibration when channel slowly drifts | — | ✓ **Done** — ADR-104 staleness watch (`eec3ca6c`) — warns when baseline > 14400 s AND drift > 0.15 for ≥3 ticks |
| 8 | ~~Phase-domain drift (vs amplitude-only today)~~ | sub-mm chest-wall motion detection for vitals | — | ✓ **Done** — ADR-104 phase channel (`47dafab4`); requires empty-room re-record to activate (`per_subcarrier_phase_mean` not in current `baseline.json` v1 schema) |
| 9 | **Tailscale-target in NVS** | sensor stream keeps working when Mac roams networks | 30 min provision + reflash | Deferred (Mac stable on TP-Link, low ROI). **Alternative shipped: ADR-115 `/ota/set-target`** lets operator repoint via REST without USB/Tailscale. |
| 10 | **ESPHome native component (instead of MQTT bridge)** | tighter HA integration than #1 | 2-3 days | Deferred (operator said: no new integrations) |
| 11 | **Web Serial calibration game** | playful threshold tuning | 1 day | Deferred (operator said: no new integrations) |
| 12 | **Boot-time NBVI freeze in FW** | trade-off vs adaptive: don't adopt unless FP issues in real homes | 2 h | Deferred (server-side rolling NBVI working; no observed FP problem) |
| 13 | **Per-channel NVS cache for gain-lock** | only needed if channel hopping (ADR-029) re-activated | 1 h | Deferred (channel hopping not active) |
| 14 | **DensePose model train + load** | unlock pose estimation | 1-3 days | **Mostly done** — model loader shipped in **ADR-116** (`7cdd8f69`) with `ruv/ruview/wiflow-v1`. Output requires per-deployment fine-tune (camera-supervised capture) — operator-side work, scoped as Pack B / Pack E. |
| 15 | **`/ota/set-target` REST** *(new this session)* | repoint CSI aggregator without USB after Mac-IP / router change | — | ✓ **Done** — ADR-115 (`7d3e0c2d`) |
| 16 | **Process-hygiene + audit follow-ups** *(new this session)* | UDP loopback filter, ping pre-reap, `/` redirect, wiflow zero-pad, lock-clone optim, sensing-tab container, test-isolation guard, ADR/CHECKLIST consistency | — | ✓ **Done** — ADR-117 (this PR) |
## References
* [`espectre-techniques.md`](espectre-techniques.md) — technique catalogue
* [`ota-pipeline.md`](ota-pipeline.md) — WiFi-OTA recipe (port 8032)
* [ADR-100](../adr/ADR-100-gain-lock-baseline-stabilization.md) — gain lock
* [ADR-101](../adr/ADR-101-raw-amplitude-classifier.md) — classifier
* [ADR-102](../adr/ADR-102-nbvi-subcarrier-selection.md) — NBVI
* [ADR-103](../adr/ADR-103-persistent-baseline.md) — baseline persistence
* [ADR-104](../adr/ADR-104-per-subcarrier-drift-presence.md) — per-sub drift + NBVI FP-validation
* [ADR-105](../adr/ADR-105-no-synthetic-data-in-production-runtime.md) — no synthetic data
* [ADR-106](../adr/ADR-106-full-complex-csi-keepalive.md) — full complex CSI + keepalive
* [ADR-107](../adr/ADR-107-auto-recalibrate-and-rest-baseline.md) — REST + auto-recalibrate
* [ADR-108](../adr/ADR-108-fw-nvs-persist-gain-lock.md) — FW NVS persist gain-lock
* Pace, *How I Turned My Wi-Fi Into a Motion Sensor — Part 2*, Dec 2025
* `francescopace/espectre` on GitHub (GPLv3)


@@ -0,0 +1,199 @@
# ESPectre (Francesco Pace) — Technique Reference
Source: Pace's *Part 2* (Dec 2025) +
[francescopace/espectre](https://github.com/francescopace/espectre)
(GPLv3). Living checklist of techniques + RuView adoption status;
update when items move.
## 1. Gain Lock (AGC + FFT scale)
The ESP32 PHY applies automatic gain control per packet. For normal
WiFi reception that keeps decoding optimal; for CSI sensing it
manifests as a 20-30 % slow drift in amplitude even in an empty
room, masking real body modulation. Two undocumented PHY routines
freeze the gain:
```c
extern void phy_fft_scale_force(bool force_en, int8_t force_value);
extern void phy_force_rx_gain(int force_en, int force_value);
```
Recipe:
1. After WiFi association, collect AGC and FFT gain values from
each CSI packet.
2. At packet 300 (~3 s at 100 pps), take the **median** of each
(more robust than mean against outliers).
3. Call the two PHY routines with the medians to lock the radio.
4. Safety branch: if median AGC < 30, skip the lock, because forcing a
   low gain freezes the RX path. The sensor must be moved further from the AP.
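The recipe's decision logic is modelled below in Python, purely for illustration. It collects 300 samples, takes the median of each gain, and skips the lock when the median AGC is below 30. On the device this runs in C inside `rv_gain_lock_process`; the class and names here are hypothetical. Using `median_low` keeps the locked value an actually observed gain rather than a fractional midpoint. That choice is an assumption, not a documented ESPectre detail.

```python
from statistics import median_low
from typing import List, Optional, Tuple

SAMPLES_NEEDED = 300  # ~3 s at 100 packets/s
MIN_SAFE_AGC = 30     # below this, forcing a low gain freezes the RX path

Decision = Tuple[str, int, int]  # (action, agc_median, fft_median)


class GainLockSampler:
    """Illustrative host-side model of the boot-time gain-lock decision."""

    def __init__(self) -> None:
        self.agc: List[int] = []
        self.fft: List[int] = []
        self.decision: Optional[Decision] = None

    def on_packet(self, agc_gain: int, fft_gain: int) -> Optional[Decision]:
        if self.decision is not None:
            return self.decision  # decided once; later packets are ignored
        self.agc.append(agc_gain)
        self.fft.append(fft_gain)
        if len(self.agc) < SAMPLES_NEEDED:
            return None
        # median_low always returns one of the observed values.
        agc_med = median_low(self.agc)
        fft_med = median_low(self.fft)
        if agc_med < MIN_SAFE_AGC:
            # Sensor sits too close to the AP: leave AGC running.
            self.decision = ("skip", agc_med, fft_med)
        else:
            # On the device, the two PHY routines would be called here.
            self.decision = ("lock", agc_med, fft_med)
        return self.decision
```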
Supported targets: ESP32-S3, ESP32-C3, ESP32-C5, ESP32-C6. Older
parts have no access to these PHY hooks.
**RuView status — DONE.** ADR-100 (commit `8aef8206`).
Implemented in `firmware/esp32-csi-node/main/csi_collector.c` as
`rv_gain_lock_process`. Boot log on both sensors:
`gain-lock APPLIED: AGC=42/44, FFT=-31/-42 (median of 300 packets)`.
Empty-room CV dropped from ~10 % (full broadband) to 3-4 % after
NBVI also kicked in.
## 2. NBVI — Normalized Baseline Variability Index
Per-subcarrier score that picks the K most useful subcarriers
automatically.
```
NBVI(k) = α · (σ_k / μ_k²) + (1 - α) · (σ_k / μ_k), α = 0.5
```
* `σ_k / μ_k²` penalises weak subcarriers (low μ → high score → bad).
* `σ_k / μ_k` is the standard coefficient of variation; rewards
stability.
* α = 0.5 balances the two terms. Pure σ/μ² favours loud bins even
  when they are relatively noisy, while pure σ/μ favours relatively
  stable bins even when they are weak.
* Amplitude-only (no phase) — phase has Temporal Phase Rotation
artefacts that need extra calibration; amplitude is calibration-
free.
Four-step pipeline at boot:
| Step | What | Detail |
|---|---|---|
| 1 | **Find quiet moments** | Slide a window across the calibration buffer, pick the windows with the lowest aggregate variance via percentile detection. Tolerates someone walking through during boot. |
| 2 | **Dead-zone gate** | Drop any subcarrier with mean amplitude below the 25th percentile across all subcarriers. Guard tones + null bins are excluded so they don't "win" σ/μ² → ∞. |
| 3 | **Rank + validate** | Sort by NBVI ascending. Run the motion detector on each candidate config, measure false-positive rate, take the config with the lowest FP. |
| 4 | **Pick winners** | Top-K by lowest NBVI (typically K = 12 for HT20). |
Memory: O(N) running with on-the-fly mean/variance, ≈ 256 B for 64
subcarriers. Time: O(N · L) per recompute, ms on a $10 device.
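A compact Python sketch of Steps 1, 2 and 4 follows. It assumes the calibration buffer is a list of per-packet amplitude vectors, with one float per subcarrier. Step 1 is simplified to "lowest mean per-subcarrier variance" rather than Pace's percentile detection. The sketch also uses whole-window statistics instead of the O(N) running mean/variance described above. All names are illustrative, not RuView's or ESPectre's code.

```python
from statistics import fmean, pstdev, quantiles
from typing import List, Sequence

Frame = Sequence[float]  # one amplitude per subcarrier for a single packet


def nbvi(amplitudes: Sequence[float], alpha: float = 0.5) -> float:
    """alpha * sigma/mu^2 + (1 - alpha) * sigma/mu for one subcarrier."""
    mu = fmean(amplitudes)
    sigma = pstdev(amplitudes, mu)
    return alpha * sigma / mu**2 + (1 - alpha) * sigma / mu


def quietest_window(frames: List[Frame], length: int) -> List[Frame]:
    """Step 1, simplified: window with the lowest mean per-subcarrier variance."""
    def aggregate_variance(window: List[Frame]) -> float:
        return fmean(pstdev(column) ** 2 for column in zip(*window))

    starts = range(len(frames) - length + 1)
    best = min(starts, key=lambda s: aggregate_variance(frames[s:s + length]))
    return frames[best:best + length]


def select_subcarriers(window: List[Frame], k: int = 12) -> List[int]:
    """Steps 2 + 4: drop the bottom-25 % dead zone, keep the K lowest NBVI."""
    columns = [list(column) for column in zip(*window)]
    means = [fmean(column) for column in columns]
    p25 = quantiles(means, n=4)[0]  # 25th percentile of per-subcarrier means
    alive = [i for i, m in enumerate(means) if m >= p25 and m > 0]
    alive.sort(key=lambda i: nbvi(columns[i]))
    return sorted(alive[:k])  # returned in subcarrier-index order
```

Without the dead-zone gate, a near-silent bin with tiny absolute jitter (mean 1, σ 0.001) would score an NBVI of 0.001 and rank first. The 25th-percentile cut removes it before ranking.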
**RuView status — DONE (all 4 NBVI steps).** Server-side: ADR-102
(`2f12a223`, `f4119924`) covers Steps 1+2+4; ADR-104 D4 (`6212b17e`)
closes Step 3 (K ∈ {6,8,10,12,16,20} sweep, smallest-FP wins). FW-
side boot freeze remains intentionally absent — server-side rolling
refresh adapts to slow channel drift (ADR-102 D6).
Empirically on the operator's deployment NBVI alone gave a 1.5-2× CV
reduction:
| | Full 56 subc | NBVI top-12 |
|---|---|---|
| node 1 idle CV | 5.0 % | 3.1 % |
| node 2 idle CV | 7.0 % | 3.9 % |
## 3. Baseline-variance threshold normalization
Pace's third problem was that `threshold = 1.0` meant different
things on different devices. Fix:
```python
# Only rooms noisier than the 0.25 quiet-room reference are scaled down.
if baseline_variance > 0.25:
    scale = 0.25 / baseline_variance
else:
    scale = 1.0
```
Reference 0.25 is what a quiet room typically measures during NBVI
calibration. Apply the scale to the live motion score, so the user-
facing threshold (`= 1.0`) is universal across rooms.
**RuView status — DONE.** ADR-103 D3 (commit `2f4b2d53`).
`amp_node_level` and `amp_classify_from_latest` divide live CV by
`baseline_cv` loaded from `data/baseline.json` and gate at universal
`3×` (moving) / `6×` (active). Falls back to absolute gates
`0.10 / 0.22` when no calibration loaded — backwards compatible.
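Below is a Python sketch of the gate logic just described. It assumes the live and baseline CV have already been computed for a node. `classify_level` and the `"quiet"` label are hypothetical, and the `>=` comparisons at the gates are an assumption. The real `amp_node_level` / `amp_classify_from_latest` are Rust, and the ADR-101 classifier also applies hysteresis, which this sketch omits.

```python
from typing import Optional

# Universal gates on normalised CV (multiples of the calibrated baseline CV).
MOVING_GATE_X = 3.0
ACTIVE_GATE_X = 6.0
# Absolute CV gates used when no baseline calibration is loaded.
FALLBACK_MOVING_CV = 0.10
FALLBACK_ACTIVE_CV = 0.22


def classify_level(cv: float, baseline_cv: Optional[float]) -> str:
    """Map live CV to a motion level, normalising by baseline CV when available."""
    if baseline_cv and baseline_cv > 0:
        score = cv / baseline_cv
        moving, active = MOVING_GATE_X, ACTIVE_GATE_X
    else:
        score = cv
        moving, active = FALLBACK_MOVING_CV, FALLBACK_ACTIVE_CV
    if score >= active:
        return "active"
    if score >= moving:
        return "moving"
    return "quiet"
```

The same live CV of 0.12 reads as `moving` against a 0.03 baseline but as `quiet` against a noisier 0.05 baseline. That difference is exactly what the normalisation is for.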
## 4. Two-phase boot calibration
```
PHASE 1: GAIN LOCK (3 s, 300 packets)
Collect AGC/FFT → median → lock.
PHASE 2: NBVI CALIBRATION (7 s, 700 packets)
With gain locked, rank subcarriers → pick top-K.
Total ≈ 10 s. Room must be mostly quiet during this window.
```
**RuView status — SPLIT.** Phase 1 is in FW (ADR-100). Phase 2 lives
in the server as a rolling refresh, not a boot-time fix-point. See
NBVI section above for the implications.
## 5. Persisted baseline / device threshold
After NBVI calibration, ESPectre writes the AGC/FFT lock values, the
chosen subcarrier set, the baseline variance, and the threshold into
NVS so reboots don't need re-calibration.
**RuView status — DONE.** Two-layer persistence:
* **Server side (ADR-103, commits `f4119924`, `2f4b2d53`)**:
`data/baseline.json` keeps per-node full-broadband mean/p95/CV +
per-subcarrier means, loaded on server boot via `load_baseline_file`.
* **FW side (ADR-108, commit `3779bb76`)**: gain-lock AGC + FFT
saved to NVS namespace `csi_cfg` keys `gl_agc`/`gl_fft` after the
first calibration; subsequent boots restore instantly (skip the
300-packet sampler). NBVI selection is **intentionally** server-
side rolling, not persisted — design choice, not a gap.
## 6. Interactive Web Serial game (`espectre.dev/game`)
Browser ↔ ESP32 over USB Web Serial API. Shows live motion as a bar,
lets user tune `threshold` while playing a reaction game. Settings
persist via NVS.
**RuView status — NOT DONE.** Closest analogue is our `raw.html`
calibration console (per-node bars + RSSI trace), but it's read-only.
## 7. Native Home Assistant integration via ESPHome
Sensor exposes occupancy/motion entities directly to HA.
**RuView status — NOT DONE.** No HA integration path. Could be added
via MQTT or a custom ESPHome component.
## 8. Test suite
Pace ships 500+ unit tests, 90 % coverage, validated against a fixed
2000-packet capture (1000 idle + 1000 motion). CI runs PlatformIO,
pytest, ESPHome build, Codecov on every push.
**RuView status — PARTIAL.** Two regression tests cover the binary CSI
frame parser (`csi.rs:751`). ADR-114 (`96225e27`) adds a fixed
2 000-packet replay (1 000 idle + 1 000 motion) for the amplitude
classifier and NBVI. Unit-test breadth and coverage tracking still lag.
## Comparison summary (what RuView has, doesn't have, has differently)
| Item | Pace / ESPectre | RuView |
|---|---|---|
| Gain lock | FW, 300 pkt median, AGC+FFT, AGC<30 skip | ✅ ADR-100 |
| NBVI formula α=0.5, top-12, dead-zone gate | ✅ | ✅ ADR-102 |
| Quiet-window finder (Step 1) | ✅ | ✅ ADR-102 v2 |
| FP-rate validation (Step 3) | ✅ | ✅ ADR-104 D4 (K sweep) |
| Boot-time NBVI freeze | FW, ~7 s post-lock | ❌ server-side rolling (by design) |
| Baseline variance normalization (universal threshold) | ✅ | ✅ ADR-103 D3 |
| Persisted baseline to disk | NVS | ✅ ADR-103 D1 (`data/baseline.json`) |
| NVS persistence of FW calibration | ✅ | ✅ ADR-108 (gain-lock only; NBVI stays server-side) |
| Calibration UI | Web Serial game | ❌ read-only `raw.html` |
| HA / ESPHome integration | ✅ | ❌ none |
| Test suite | 500+ tests, 90 % cov | partial: 2 parser tests + ADR-114 2 000-packet replay |
| Phase / amplitude | amplitude only | ✅ same |
## Open items (full gap-by-section: [`espectre-gap-analysis.md`](espectre-gap-analysis.md))
1. ~~**REST `POST /api/v1/baseline/calibrate`**~~ — drives the recording
   script from a button in `raw.html` instead of CLI. ~30 min.
   ✓ **Done**: ADR-107 (`0f373467`, `45c1464c`).
2. ~~**FP-rate validation of NBVI pick**~~ — defense against the top-12
   accidentally overlapping a noise source. ~1 h.
   ✓ **Done**: ADR-104 D4 (`6212b17e`).
3. ~~**Per-subcarrier baseline comparison (ADR-104 draft)**~~ — uses the
   already-saved `per_subcarrier_mean` in `baseline.json` for L2
   distance instead of broadband mean ratio. Better off-axis
   presence sensing. ~1 h. ✓ **Done**: ADR-104 (`6212b17e`).
4. ~~**Auto-recalibrate on long quiet periods**~~ — if classifier sees
   `absent` with low variance for 30 min, refresh baseline in
   background. Eliminates manual script step entirely. ~1 h.
   ✓ **Done**: ADR-107 (`0f373467`).
5. **FW-side NBVI boot-freeze + NVS persistence** — full
   reproducibility, sub-second post-boot ready. Trade-off: doesn't
   adapt to room changes. ~2 h. Deferred: server-side rolling NBVI
   works and no FP problem has been observed.
6. **HA / ESPHome integration** — sensor as HA entity. ~1 day.
   Deferred (operator said: no new integrations).
7. ~~**Test suite vs fixed 2 000-packet replay**~~ — regression
   protection for the classifier + NBVI. ~1 day.
   ✓ **Done**: ADR-114 (`96225e27`).
8. **Web Serial calibration game** — nice-to-have. ~1 day.
   Deferred (operator said: no new integrations).

View File

@ -0,0 +1,367 @@
# OTA Pipeline — Full Reproduction Recipe
Verbatim agent contribution (2026-05-17), saved as authoritative
reference for the WiFi-OTA flow on this RuView fork. Kept whole
deliberately — splitting it would lose the diagnostic flowchart.
## TL;DR
OTA works because **three FW-side fixes** are in place. Without them
the chip receives the firmware, reboots, **panics during early boot
of the new partition**, the bootloader rolls back, and from outside
it looks like "OTA didn't work" even though the upload succeeded.
Most agents focus on the network side (curl, gh-action) and miss it,
because the bug lives inside the firmware.
---
## 0 · Prerequisites (without them OTA = panic loop)
These three things **must already be in the firmware running on the
chip** (i.e. in ota_0/factory before the first OTA). If they're not
there, fix once via USB-flash; after that, OTA works.
### A. `OTA_SIZE_UNKNOWN` instead of `OTA_WITH_SEQUENTIAL_WRITES`
**File:** `firmware/esp32-csi-node/main/ota_update.c:137`
```c
esp_err_t err = esp_ota_begin(update_partition, OTA_SIZE_UNKNOWN, &ota_handle);
```
**Why:** `OTA_WITH_SEQUENTIAL_WRITES` erases 4 KB pages on the fly
as it writes. If the new binary (~870 KB) is smaller than the previous
one in the same partition (~1.1 MB), **tail of the old code stays in
the partition**. The SHA-image-verify in `esp_ota_end()` only checks
the declared image-header length — residual code isn't covered. After
reboot the new app may jump into IRAM / a .literal pool address
overlapped by stale code → **Guru Meditation Error** → bootloader
rolls back.
`OTA_SIZE_UNKNOWN` forces a **full partition erase before write**
(~1.5 s overhead, unnoticeable).
### B. `config.stack_size = 8192` for httpd
**File:** `firmware/esp32-csi-node/main/ota_update.c:225`
```c
httpd_config_t config = HTTPD_DEFAULT_CONFIG(); // default stack_size = 4096
config.server_port = OTA_PORT;
config.max_uri_handlers = 12;
config.recv_wait_timeout = 30;
config.stack_size = 8192; // ← critical
```
**Why:** `esp_ota_end()` streams a SHA-256 verify over the entire
image and walks the mmap segments, which needs >5 KB of local
variables. On the standard 4 KB httpd-task stack that causes a **stack
overflow** at validation time. The chip panics **inside the handler**, before
`esp_ota_set_boot_partition()`. From outside you see
`{"status":"ok"}` (it's sent before `esp_ota_end`), but the partition
doesn't switch.
### C. Reset reason logged in `app_main`
**File:** `firmware/esp32-csi-node/main/main.c:130-153`
```c
static const char *reset_reason_str(esp_reset_reason_t r) {
switch (r) {
case ESP_RST_PANIC: return "PANIC";
case ESP_RST_TASK_WDT: return "TASK_WDT";
case ESP_RST_SW: return "SW";
...
}
}
void app_main(void) {
esp_reset_reason_t rr = esp_reset_reason();
const esp_partition_t *running = esp_ota_get_running_partition();
ESP_LOGI(TAG, "boot: reset_reason=%s running_partition=%s",
reset_reason_str(rr),
running ? running->label : "?");
...
}
```
**Why:** Without this line you **cannot tell** "new image booted
cleanly after OTA" from "new image panicked → rolled back". `/ota/status`
looks the same (or suspicious) in both cases. With this line the
first UART line after boot tells the truth:
- `reset_reason=SW running_partition=ota_1` → OTA OK, new image in ota_1.
- `reset_reason=PANIC running_partition=ota_0` → new image panicked,
rollback worked. **This is the case other agents get stuck in —
without the log it's impossible to diagnose.**
---
## 1 · Wire format of POST /ota
**Endpoint:** `POST http://<node-ip>:8032/ota`
**Headers:**
- `Content-Type: application/octet-stream` (required)
- `Content-Length: <bytes>` (curl/urllib sets it)
- `Authorization: Bearer <psk>` (only if `security/ota_psk` is in NVS)
**Body:** raw bytes of `build/esp32-csi-node.bin` — no multipart, no base64.
**Response on success:**
```json
{"status":"ok","message":"OTA update successful. Rebooting..."}
```
**Important about the response:** the chip sends it **before
`esp_restart()`**, but `vTaskDelay(1000ms)` between response and
restart **does not guarantee delivery**. On macOS / Linux curl will see:
- `{"status":"ok"...}`, or
- `Connection reset by peer` (TCP RST from the dying side), or
- `Recv failure`.
**All three are upload success.** The real check is NOT curl's
status — it's a **second GET `/ota/status` after reboot**.
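For reference, a minimal Python client for this wire format might look like the following. It is a sketch, not the actual code in `scripts/ota-deploy.sh`, and both function names are hypothetical. It sends raw bytes with the two required headers. A connection reset while the reply is being read counts as "probably rebooting", not as failure.

```python
import urllib.request
from typing import Optional

OTA_PORT = 8032


def build_ota_request(node_ip: str, image: bytes,
                      psk: Optional[str] = None) -> urllib.request.Request:
    """POST /ota with the raw image as the body (no multipart, no base64)."""
    headers = {"Content-Type": "application/octet-stream"}
    if psk:  # only needed when security/ota_psk is set in NVS
        headers["Authorization"] = f"Bearer {psk}"
    # urllib adds Content-Length itself because the body is a bytes object.
    return urllib.request.Request(f"http://{node_ip}:{OTA_PORT}/ota",
                                  data=image, headers=headers, method="POST")


def send_ota(node_ip: str, image: bytes, psk: Optional[str] = None,
             timeout: float = 60.0) -> Optional[str]:
    """Upload an image; returns the reply text, or None on a reset reply.

    None means the connection was reset while reading the reply, the
    expected side effect of esp_restart(). Confirm with GET /ota/status
    after the reboot. Other errors (refused connection, timeout) raise.
    """
    try:
        with urllib.request.urlopen(build_ota_request(node_ip, image, psk),
                                    timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except ConnectionResetError:
        return None
```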
---
## 2 · Chip's path through the handler
```
HTTP POST /ota
ota_check_auth(req) ← if PSK in NVS, verifies Authorization header
esp_ota_get_next_update_partition(NULL)
│ ← running in ota_0 → returns ota_1, and vice-versa
esp_ota_begin(part, OTA_SIZE_UNKNOWN, &handle)
│ ← full erase of target partition (~1.5 s)
loop {
received = httpd_req_recv(req, buf, 1024)
esp_ota_write(handle, buf, received)
} ← writes in 1 KB chunks
esp_ota_end(handle) ← SHA-256 verify over the entire image (>5 KB stack)
esp_ota_set_boot_partition(part) ← writes "boot from target" into otadata
httpd_resp_send(JSON) ← replies {"status":"ok"...}
vTaskDelay(1000ms) ← window so TCP flush goes out (best-effort)
esp_restart() ← soft reset via RTC_SW_CPU_RST
[bootloader picks ota_1 from otadata → loads new image → app_main]
"I (335) main: boot: reset_reason=SW running_partition=ota_1"
```
---
## 3 · Flashing via `scripts/ota-deploy.sh`
```bash
# Scenario A — deploy to all nodes on local /24 (auto-discover):
scripts/ota-deploy.sh
# Scenario B — specific IPs:
scripts/ota-deploy.sh 192.168.0.100 192.168.0.101
# Scenario C — build before deploy:
scripts/ota-deploy.sh --build
# Scenario D — with auth:
OTA_PSK=your_token scripts/ota-deploy.sh
```
**What the script does under the hood (4 phases):**
### Phase 1 — discovery
```
arp -a -n → ['192.168.0.100', '192.168.0.101', ...]
# parallel GET /ota/status:8032 (timeout 1.5s)
# only IPs that return valid JSON survive
```
If ARP is empty (fresh Mac boot), the script falls back to a ping sweep of `.100` to `.110`.
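Phase 1 could be sketched in Python like this. The sketch assumes the BSD/macOS `arp -a -n` line format shown in the comment, and the helper names are hypothetical. Instead of ping-sweeping and re-reading ARP, it probes the `.100` to `.110` range directly, which is a simplification of what the real script does.

```python
import http.client
import json
import re
import subprocess
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional

# Assumed BSD/macOS `arp -a -n` line format:
#   ? (192.168.0.100) at aa:bb:cc:dd:ee:ff on en0 ifscope [ethernet]
#   ? (192.168.0.105) at (incomplete) on en0 ifscope [ethernet]
_RESOLVED = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3})\) at (?!\(incomplete\))")


def parse_arp(output: str) -> List[str]:
    """IPv4 addresses of resolved ARP entries, in output order."""
    return _RESOLVED.findall(output)


def sweep_range(prefix: str = "192.168.0.", first: int = 100, last: int = 110) -> List[str]:
    """Fallback candidates when the ARP cache is empty."""
    return [f"{prefix}{i}" for i in range(first, last + 1)]


def probe(ip: str, timeout: float = 1.5) -> Optional[dict]:
    """GET /ota/status; None unless the node answers with a JSON object."""
    try:
        with urllib.request.urlopen(f"http://{ip}:8032/ota/status", timeout=timeout) as resp:
            body = json.loads(resp.read())
    except (OSError, ValueError, http.client.HTTPException):
        return None  # offline, timed out, not HTTP, or not JSON
    return body if isinstance(body, dict) else None


def discover() -> List[str]:
    """Probe ARP candidates (or the fallback range) in parallel; needs `arp` on PATH."""
    arp = subprocess.run(["arp", "-a", "-n"], capture_output=True, text=True, check=False)
    candidates = parse_arp(arp.stdout) or sweep_range()
    with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
        answers = list(pool.map(probe, candidates))
    return [ip for ip, status in zip(candidates, answers) if status is not None]
```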
### Phase 2 — snapshot before
```
GET /ota/status:8032 on each node
→ remember running_partition (ota_0 or ota_1)
```
### Phase 3 — parallel upload
```
ThreadPoolExecutor(max_workers=len(targets))
for each node:
urllib POST with body = read_bytes(esp32-csi-node.bin)
ConnectionResetError caught as expected (that's the reboot)
```
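A sketch of Phase 3 in Python follows, assuming `send(ip, data)` performs one POST /ota. For example, it could handle the expected connection reset as described in the wire-format section. `parallel_upload` is a hypothetical name. Each worker reads the image on its own, so no file object is shared between threads, which avoids one of the pitfalls listed further down.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


def parallel_upload(targets: List[str], image_path: str,
                    send: Callable[[str, bytes], T]) -> Dict[str, T]:
    """Upload the same image to every node at once; returns {ip: send result}."""
    def upload_one(ip: str):
        # Read per worker instead of sharing one open file across threads.
        data = Path(image_path).read_bytes()
        return ip, send(ip, data)

    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        return dict(pool.map(upload_one, targets))
```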
### Phase 4 — verify
```
sleep 10 ← wait for boot to finish
for each node (up to 6 retries, 3-s delay):
GET /ota/status:8032
new_part != old_part → ✓
new_part == old_part → ✗ FAIL (panicked)
exit 0 if all OK, 1 if any node didn't confirm
```
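The verify step's decision rule is expressed below as a small Python function. The function is hypothetical, and `fetch_running_partition` is assumed to GET `/ota/status` and return its running-partition label, or `None` while the node is silent. It retries only while the node is silent. The first real answer decides the outcome, because an unchanged partition means the new image panicked and was rolled back. `sleep` is injectable so the logic can be exercised without waiting.

```python
import time
from typing import Callable, Optional


def verify_switch(
    old_partition: str,
    fetch_running_partition: Callable[[], Optional[str]],
    initial_wait: float = 10.0,
    attempts: int = 6,
    delay: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """True once the node answers with a running partition != old_partition."""
    sleep(initial_wait)  # let the new image finish booting
    for attempt in range(attempts):
        current = fetch_running_partition()
        if current is not None:
            # Same partition as before: new image panicked and was rolled back.
            return current != old_partition
        if attempt < attempts - 1:
            sleep(delay)
    return False  # never answered: treat as failure
```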
---
## 4 · Diagnosis when "OTA doesn't work"
Flowchart that catches **every observable failure mode** on ESP32-S3
in this FW:
```
GET /ota/status works?
├── 404/timeout → node offline / wrong network / IP changed (check `arp -a`)
├── 200, time=OLD → OTA didn't take (see below)
└── 200, time=NEW → OTA OK ✓
OTA didn't take — diagnose via UART (USB!):
See "boot: reset_reason=..." in UART?
├── reset_reason=POWERON → chip didn't reboot — POST didn't arrive, check curl
├── reset_reason=SW AND running_partition=ota_X → OTA OK, may be server-side cache
├── reset_reason=PANIC AND running_partition=ota_0
│ → NEW image panics at boot
│ → causes (most likely first):
│ 1. OTA_WITH_SEQUENTIAL_WRITES → tail of old code (fix A above)
│ 2. esp_ota_end stack overflow (fix B above)
│ 3. ABI mismatch bootloader vs new app (USB-flash bootloader.bin)
│ 4. real bug in new code (read the backtrace before PANIC)
├── reset_reason=TASK_WDT → handler hung mid-upload
└── reset_reason=BROWNOUT → power supply browned out under stress
(USB on bus power?)
```
If UART is unavailable (no USB) but HTTP works: POST then GET
`/ota/status` three times at 5 s intervals. If `next_partition`
flip-flops, the chip is in a panic loop. That's a definitive diagnosis.
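The HTTP-only check can be sketched in Python as below. It applies the flip-flop rule stated above: three `/ota/status` reads 5 s apart, with a panic loop suspected if `next_partition` changes between answered samples. Both functions are hypothetical. `fetch_status` is assumed to return the parsed status JSON, or `None` when the node does not answer.

```python
import time
from typing import Callable, List, Optional, Sequence


def sample_next_partition(fetch_status: Callable[[], Optional[dict]],
                          samples: int = 3, interval: float = 5.0,
                          sleep: Callable[[float], None] = time.sleep) -> List[Optional[str]]:
    """Read `next_partition` from /ota/status `samples` times, `interval` s apart."""
    seen: List[Optional[str]] = []
    for i in range(samples):
        status = fetch_status()
        seen.append(status.get("next_partition") if status else None)
        if i < samples - 1:
            sleep(interval)
    return seen


def looks_like_panic_loop(next_partitions: Sequence[Optional[str]]) -> bool:
    """True if the answered samples disagree, i.e. next_partition flip-flops."""
    answered = [p for p in next_partitions if p is not None]
    return any(a != b for a, b in zip(answered, answered[1:]))
```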
---
## 5 · Why other agents fail (common pitfalls)
| Pitfall | Symptom | Fix |
|---|---|---|
| Treat OTA as a pure network problem, never look at FW | "POST returned 200 but time doesn't change" → endless curl-header experiments | **Verify the three FW prerequisites first**, before any curl |
| Use `OTA_WITH_SEQUENTIAL_WRITES` (it's in IDF examples) | OTA works once, stops working after binary size changes | Switch to `OTA_SIZE_UNKNOWN` |
| Leave httpd stack at 4 KB | Sometimes works (fast SHA), sometimes doesn't — looks flaky | `config.stack_size = 8192` |
| Enable `CONFIG_BOOTLOADER_APP_ROLLBACK_ENABLE=y` "for safety" | Every OTA rolled back because nobody calls `esp_ota_mark_app_valid_cancel_rollback()` | Either disable, or call the API after 10 s |
| `curl` without `--data-binary` (only `-d`) | Binary corrupted: `-d @file` strips CR and LF bytes from the file | Use `--data-binary @file.bin` or urllib bytes |
| Measure success by HTTP response code | Connection reset = normal (esp_restart kills socket), not failure | Re-check via **GET /ota/status after reboot** |
| Don't wait 10 s after reboot before verify | Verify times out, agent thinks OTA failed | `sleep 10` (or backoff retries) |
| Ignore that mDNS names drift | Flash the wrong node, or stale ARP cache | Auto-discover by IP **at deploy time**, not by hostname |
| Share a single file descriptor across upload threads | Race conditions, partial reads | Each upload-thread opens its own file |
| Rely on bootloader rollback instead of explicit app_valid | Image sometimes flagged BAD, OTA becomes non-idempotent | If rollback enabled, MUST call `esp_ota_mark_app_valid_cancel_rollback()` |
---
## 6 · Things other agents do **wrong**
From recurring patterns in others' logs:
1. **Rely on `idf.py flash --port .../ota`** — that mode does NOT
exist in idf.py. OTA is only via the HTTP handler.
2. **Send via `ssh esp32 'esp_ota_write ...'`** — ESP32 has no shell;
OTA is only via the HTTP endpoint.
3. **Run MQTT-based OTA** — this FW has no MQTT client; only HTTP
POST on 8032.
4. **Use ESP RainMaker / esp_https_ota** — those require HTTPS +
cert; we serve plain HTTP. Don't confuse the APIs.
5. **Re-use an old build of
`firmware/esp32-csi-node/build/esp32-csi-node.bin`** — forget to
run `idf.py build`. The script's `--build` solves that.
---
## 7 · Quick reference (for the next agent)
```bash
# Once over USB if the nodes still run pre-fix firmware:
cd /Users/arsen/Desktop/RuView/firmware/esp32-csi-node
source ~/esp/esp-idf-v5.2/export.sh
idf.py build
# Hold BOOT+RESET on the device
cd build
esptool.py --chip esp32s3 --port /dev/cu.usbmodem... -b 460800 \
--before default-reset --after hard-reset write-flash \
--flash-mode dio --flash-size 8MB --flash-freq 80m \
0x0 bootloader/bootloader.bin \
0x8000 partition_table/partition-table.bin \
0xf000 ota_data_initial.bin \
0x20000 esp32-csi-node.bin
# Forever after, over WiFi:
scripts/ota-deploy.sh --build
# (auto-discover, parallel POST, verify, exit code)
```
## Operator REST endpoints on the running FW (port 8032)
After the first OTA the FW exposes three control endpoints alongside
the `/ota` upload itself; all four are listed below. They share the
same Bearer-PSK auth as `/ota` (open when the `security/ota_psk` NVS
key is unset, gated when set). All accept plain HTTP, and none needs
JSON on the FW side.
| Method | Path | Body | Purpose | ADR |
|---|---|---|---|---|
| `GET` | `/ota/status` | — | Version, date, running/next partition, max image size | ADR-045 |
| `POST` | `/ota` | image bin | Upload + flash (auth-gated) | ADR-045 |
| `POST` | `/ota/recalibrate` | — | Clear `csi_cfg/gl_agc` + `gl_fft` + `gl_ap_mac`, reboot — forces fresh gain-lock at next boot | ADR-109 |
| `POST` | `/ota/set-target` | `IPv4:PORT` plain text | Write `csi_cfg/target_ip` + `target_port` to NVS, reboot — repoints the CSI aggregator after Mac IP move / router swap without USB | ADR-115 |
Examples (operator side, no USB):
```bash
# After moving Mac to a new LAN / changing routers:
curl -s -X POST -d '192.168.0.103:5005' http://192.168.0.100:8032/ota/set-target
curl -s -X POST -d '192.168.0.103:5005' http://192.168.0.101:8032/ota/set-target
# Each returns {"status":"ok","target_ip":"...","target_port":...,"message":"rebooting"}
# After AP swap that changed the indoor path geometry:
curl -X POST http://192.168.0.100:8032/ota/recalibrate
# Sensor reboots, re-runs the 300-packet gain-lock sampler (~3-12 s).
# Sanity probe:
curl http://192.168.0.100:8032/ota/status
```
With auth provisioned (`security/ota_psk` in NVS):
```bash
curl -X POST -H "Authorization: Bearer $RUVIEW_OTA_PSK" \
-d '192.168.0.103:5005' \
http://192.168.0.100:8032/ota/set-target
```
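The FW writes the `set-target` body to NVS and reboots. A client-side guard like the Python helper below can therefore catch a malformed `IPv4:PORT` string before it is sent. The helper is hypothetical and is not part of the firmware or the scripts. Like `curl -d`, urllib sends `Content-Type: application/x-www-form-urlencoded` by default when a body is given.

```python
import ipaddress
import urllib.request
from typing import Optional


def set_target_body(target: str) -> bytes:
    """Validate an `IPv4:PORT` string and return it as the plain-text body."""
    host, sep, port_text = target.rpartition(":")
    if not sep:
        raise ValueError("expected IPv4:PORT")
    ip = ipaddress.IPv4Address(host)  # AddressValueError is a ValueError
    port = int(port_text)             # ValueError on non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return f"{ip}:{port}".encode("ascii")


def set_target_request(node_ip: str, target: str,
                       psk: Optional[str] = None) -> urllib.request.Request:
    """POST /ota/set-target, with the Bearer header only when a PSK is set."""
    headers = {"Authorization": f"Bearer {psk}"} if psk else {}
    return urllib.request.Request(f"http://{node_ip}:8032/ota/set-target",
                                  data=set_target_body(target), headers=headers,
                                  method="POST")
```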
---
**Bottom line:** OTA is not "send a file via curl", it's an
**end-to-end protocol** between the on-chip handler and the host
tooling. 80 % of the work lives on the FW side (correct erase,
correct stack, correct log). The network part is trivial
(`urllib.request.urlopen(POST)`). Agents who "can't" usually stopped
at the network layer and didn't realise the chip is panicking.


@@ -275,6 +275,11 @@ static void emit_feature_state(void)
pkt.presence_score = obs.presence_score;
pkt.anomaly_score = obs.anomaly_score;
pkt.node_coherence = obs.node_coherence;
/* ADR-100 D3: ship median RSSI through feature_state so the server
* UI's RSSI trace has something other than the -50 fallback. The
* value comes from radio_ops::get_health() which medians rx_ctrl.rssi
* across the recent capture window. 0 means "not measured yet". */
pkt.rssi_dbm = obs.rssi_median_dbm;
}
/* Fill vitals from edge_processing's latest packet. */
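
The RSSI comment above says `radio_ops::get_health()` takes the median of `rx_ctrl.rssi` over the recent capture window, and that 0 means "not measured yet". That function is not shown here. The following is only a sketch of one way to compute such a windowed median in C. The window size `RSSI_WIN` and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define RSSI_WIN 32  /* hypothetical window length */

/* Hypothetical ring of recent rx_ctrl.rssi samples (dBm). */
typedef struct {
    int8_t  samples[RSSI_WIN];
    uint8_t count;  /* valid samples, saturates at RSSI_WIN */
    uint8_t next;   /* ring write index */
} rssi_window_t;

static void rssi_window_push(rssi_window_t *w, int8_t rssi_dbm)
{
    assert(w != NULL);
    w->samples[w->next] = rssi_dbm;
    w->next = (uint8_t)((w->next + 1) % RSSI_WIN);
    if (w->count < RSSI_WIN) w->count++;
}

static int cmp_i8(const void *a, const void *b)
{
    return (int)*(const int8_t *)a - (int)*(const int8_t *)b;
}

/* Median of the valid samples (upper median for even counts).
 * Returns 0 ("not measured yet") while the window is empty. */
static int8_t rssi_window_median(const rssi_window_t *w)
{
    if (w->count == 0) return 0;
    int8_t tmp[RSSI_WIN];
    /* Until the ring wraps, valid samples occupy indices 0..count-1. */
    memcpy(tmp, w->samples, w->count);
    qsort(tmp, w->count, sizeof tmp[0], cmp_i8);
    return tmp[w->count / 2];
}
```

A median rather than a mean keeps a single strong reflection or deep fade from dragging the UI's RSSI trace.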


@@ -17,9 +17,12 @@
#include "edge_processing.h"
#include <string.h>
#include <stdlib.h>
#include "esp_log.h"
#include "esp_wifi.h"
#include "esp_timer.h"
#include "nvs.h"
#include "nvs_flash.h"
#include "sdkconfig.h"
/* ADR-060: Access the global NVS config for MAC filter and channel override. */
@@ -52,6 +55,231 @@ static bool s_filter_mac_set = false;
static const char *TAG = "csi_collector";
/* ──────────────────────────────────────────────────────────────────
* ADR-100: Gain Lock (AGC + FFT scale).
*
* ESP32 WiFi PHY applies automatic gain control per packet, which
* manifests as a slow 20-30 % drift in CSI amplitude even in a
* completely static room, masking the real modulation caused by
* body motion. Ported from Francesco Pace's ESPectre (GPLv3,
* https://github.com/francescopace/espectre).
*
* The first ~300 packets after boot are sampled. We take the median
* AGC + FFT gain values and freeze them with two undocumented PHY
* routines from the IDF blob. If the median AGC is below the safe
* threshold (sensor sits very close to the AP), we *don't* lock:
* forcing a low gain causes the RX path to freeze.
* Supported targets: ESP32-S3 / C3 / C6. Older parts skip silently.
* */
#if CONFIG_IDF_TARGET_ESP32S3 || CONFIG_IDF_TARGET_ESP32C3 || CONFIG_IDF_TARGET_ESP32C6
#define RV_GAIN_LOCK_SUPPORTED 1
/* Overlay struct on wifi_csi_info_t.rx_ctrl exposing the hidden agc/fft fields. */
typedef struct {
unsigned : 32; unsigned : 32; unsigned : 32;
unsigned : 32; unsigned : 32; unsigned : 16;
signed fft_gain : 8;
unsigned agc_gain : 8;
unsigned : 32; unsigned : 32;
unsigned : 32; unsigned : 32; unsigned : 32;
unsigned : 32;
} rv_phy_rx_ctrl_t;
extern void phy_fft_scale_force(bool force_en, int8_t force_value);
extern void phy_force_rx_gain(int force_en, int force_value);
/* ── ADR-108: NVS persistence of gain-lock values ────────────────
* After the first successful gain-lock, save AGC/FFT medians into NVS
* (namespace "csi_cfg", keys "gl_agc"/"gl_fft"). On subsequent boots
* the FW loads them and immediately forces the gain: reboot → CSI
* ready in ~0.5 s instead of ~3 s waiting for 300 calibration packets.
*
* Stored values are tied to: this sensor location + this AP MAC +
* this channel + this antenna orientation. If any of those change,
* the saved values may be wrong but harmless: the WiFi PHY will
* just receive slightly off-optimal CSI until the operator triggers
* a re-calibration (clear NVS and reboot, or POST /ota/recalibrate, ADR-109).
*/
#define RV_GAIN_NVS_NS "csi_cfg"
#define RV_GAIN_NVS_K_AGC "gl_agc"
#define RV_GAIN_NVS_K_FFT "gl_fft"
/* ADR-111: BSSID of the AP that gain-lock was calibrated against.
* 6-byte blob. On boot, if the currently-connected AP MAC differs from
* the saved value, the cached AGC/FFT are ignored and a full calibration
* runs (gain-lock is tied to a specific AP path; swapping APs invalidates
* it). The new MAC is written alongside AGC/FFT after re-calibration. */
#define RV_GAIN_NVS_K_AP_MAC "gl_ap_mac"
static esp_err_t rv_gain_load_from_nvs(uint8_t *agc_out, int8_t *fft_out,
uint8_t mac_out[6])
{
nvs_handle_t h;
esp_err_t err = nvs_open(RV_GAIN_NVS_NS, NVS_READONLY, &h);
if (err != ESP_OK) return err;
uint8_t agc = 0;
int8_t fft = 0;
err = nvs_get_u8(h, RV_GAIN_NVS_K_AGC, &agc);
if (err == ESP_OK) err = nvs_get_i8(h, RV_GAIN_NVS_K_FFT, &fft);
/* AP MAC is optional — older NVS blobs predate ADR-111 and have only
* AGC+FFT. Treat a missing MAC as a wildcard match so a one-time
* upgrade doesn't force every node to do a full re-cal. */
if (err == ESP_OK && mac_out != NULL) {
size_t want = 6;
esp_err_t mac_err = nvs_get_blob(h, RV_GAIN_NVS_K_AP_MAC, mac_out, &want);
if (mac_err != ESP_OK || want != 6) {
memset(mac_out, 0, 6);
}
}
nvs_close(h);
if (err == ESP_OK) { *agc_out = agc; *fft_out = fft; }
return err;
}
static void rv_gain_save_to_nvs(uint8_t agc, int8_t fft, const uint8_t mac[6])
{
nvs_handle_t h;
esp_err_t err = nvs_open(RV_GAIN_NVS_NS, NVS_READWRITE, &h);
if (err != ESP_OK) {
ESP_LOGW("csi_collector", "gain-lock NVS save: nvs_open failed: %s",
esp_err_to_name(err));
return;
}
nvs_set_u8(h, RV_GAIN_NVS_K_AGC, agc);
nvs_set_i8(h, RV_GAIN_NVS_K_FFT, fft);
if (mac != NULL) {
nvs_set_blob(h, RV_GAIN_NVS_K_AP_MAC, mac, 6);
}
nvs_commit(h);
nvs_close(h);
}
#define RV_GAIN_CAL_PACKETS 300u
#define RV_GAIN_MIN_SAFE_AGC 30u /* < 30 → forcing freezes RX. */
static uint8_t s_agc_samples[RV_GAIN_CAL_PACKETS];
static int8_t s_fft_samples[RV_GAIN_CAL_PACKETS];
static uint16_t s_gain_pkt_count = 0;
static bool s_gain_locked = false;
static bool s_gain_skipped_strong = false;
static uint8_t s_gain_agc_value = 0;
static int8_t s_gain_fft_value = 0;
static int rv_cmp_u8(const void *a, const void *b) {
return (int)*(const uint8_t *)a - (int)*(const uint8_t *)b;
}
static int rv_cmp_i8(const void *a, const void *b) {
return (int)*(const int8_t *)a - (int)*(const int8_t *)b;
}
static void rv_gain_lock_process(const wifi_csi_info_t *info)
{
if (s_gain_locked || info == NULL) return;
/* ADR-108: short-circuit calibration if previous values are in NVS.
* ADR-111: also compare the saved BSSID with the currently-connected
* AP. If they differ, the cached gain is invalid (different AP path →
* different multipath → different optimal AGC), so discard it and run
* a full calibration against the new AP. */
static bool s_nvs_checked = false;
if (!s_nvs_checked) {
s_nvs_checked = true;
uint8_t agc = 0; int8_t fft = 0; uint8_t saved_mac[6] = {0};
if (rv_gain_load_from_nvs(&agc, &fft, saved_mac) == ESP_OK &&
agc >= RV_GAIN_MIN_SAFE_AGC)
{
/* Read the current AP MAC. If we can't (not connected yet), the
* gain-lock callback should not be firing at all, but be defensive
* and skip the cache if AP info is unavailable. */
wifi_ap_record_t ap;
bool ap_ok = (esp_wifi_sta_get_ap_info(&ap) == ESP_OK);
bool wildcard = true;
for (int i = 0; i < 6; i++) {
if (saved_mac[i] != 0) { wildcard = false; break; }
}
if (ap_ok && (wildcard ||
memcmp(saved_mac, ap.bssid, 6) == 0))
{
phy_fft_scale_force(true, fft);
phy_force_rx_gain(1, (int)agc);
s_gain_agc_value = agc;
s_gain_fft_value = fft;
s_gain_locked = true;
ESP_LOGI("csi_collector",
"gain-lock RESTORED from NVS: AGC=%u FFT=%d "
"AP=%02x:%02x:%02x:%02x:%02x:%02x%s",
(unsigned)agc, (int)fft,
ap.bssid[0], ap.bssid[1], ap.bssid[2],
ap.bssid[3], ap.bssid[4], ap.bssid[5],
wildcard ? " (legacy NVS, no MAC stored)" : "");
return;
}
if (ap_ok) {
ESP_LOGW("csi_collector",
"gain-lock NVS MISS: saved AP=%02x:%02x:%02x:%02x:%02x:%02x "
"→ current=%02x:%02x:%02x:%02x:%02x:%02x. Re-calibrating.",
saved_mac[0], saved_mac[1], saved_mac[2],
saved_mac[3], saved_mac[4], saved_mac[5],
ap.bssid[0], ap.bssid[1], ap.bssid[2],
ap.bssid[3], ap.bssid[4], ap.bssid[5]);
}
}
}
const rv_phy_rx_ctrl_t *phy = (const rv_phy_rx_ctrl_t *)info;
if (s_gain_pkt_count < RV_GAIN_CAL_PACKETS) {
s_agc_samples[s_gain_pkt_count] = phy->agc_gain;
s_fft_samples[s_gain_pkt_count] = phy->fft_gain;
s_gain_pkt_count++;
if (s_gain_pkt_count == RV_GAIN_CAL_PACKETS / 4 ||
s_gain_pkt_count == RV_GAIN_CAL_PACKETS / 2 ||
s_gain_pkt_count == (3u * RV_GAIN_CAL_PACKETS) / 4u) {
ESP_LOGI(TAG, "gain-lock cal %u%% (%u/%u, AGC=%u FFT=%d)",
(unsigned)((s_gain_pkt_count * 100u) / RV_GAIN_CAL_PACKETS),
(unsigned)s_gain_pkt_count, (unsigned)RV_GAIN_CAL_PACKETS,
(unsigned)phy->agc_gain, (int)phy->fft_gain);
}
return;
}
/* Reached the calibration target — compute medians, lock or skip. */
qsort(s_agc_samples, RV_GAIN_CAL_PACKETS, sizeof(uint8_t), rv_cmp_u8);
qsort(s_fft_samples, RV_GAIN_CAL_PACKETS, sizeof(int8_t), rv_cmp_i8);
s_gain_agc_value = s_agc_samples[RV_GAIN_CAL_PACKETS / 2];
s_gain_fft_value = s_fft_samples[RV_GAIN_CAL_PACKETS / 2];
if (s_gain_agc_value < RV_GAIN_MIN_SAFE_AGC) {
s_gain_skipped_strong = true;
ESP_LOGW(TAG,
"gain-lock SKIPPED: AGC median=%u < %u (signal too strong, "
"forcing would freeze RX). Move sensor 2-3 m from AP.",
(unsigned)s_gain_agc_value, (unsigned)RV_GAIN_MIN_SAFE_AGC);
} else {
phy_fft_scale_force(true, s_gain_fft_value);
phy_force_rx_gain(1, (int)s_gain_agc_value);
ESP_LOGI(TAG,
"gain-lock APPLIED: AGC=%u FFT=%d (median of %u packets) — "
"baseline drift should now collapse.",
(unsigned)s_gain_agc_value, (int)s_gain_fft_value,
(unsigned)RV_GAIN_CAL_PACKETS);
/* ADR-108: persist for next boot — short-circuit calibration.
* ADR-111: also persist the AP BSSID this calibration ran against
* so the boot-time short-circuit can detect AP swaps and discard
* stale gain values. */
uint8_t cur_mac[6] = {0};
wifi_ap_record_t ap;
if (esp_wifi_sta_get_ap_info(&ap) == ESP_OK) {
memcpy(cur_mac, ap.bssid, 6);
}
rv_gain_save_to_nvs(s_gain_agc_value, s_gain_fft_value, cur_mac);
ESP_LOGI(TAG,
"gain-lock PERSISTED to NVS (AGC=%u FFT=%d AP=%02x:%02x:%02x:%02x:%02x:%02x)",
(unsigned)s_gain_agc_value, (int)s_gain_fft_value,
cur_mac[0], cur_mac[1], cur_mac[2],
cur_mac[3], cur_mac[4], cur_mac[5]);
}
s_gain_locked = true;
}
#else
static inline void rv_gain_lock_process(const wifi_csi_info_t *info) { (void)info; }
#endif
static uint32_t s_sequence = 0;
static uint32_t s_cb_count = 0;
static uint32_t s_send_ok = 0;
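
The gain-lock hunk above reduces to two pure decisions. First, take the median of the calibration samples and refuse to lock when the AGC median is below the safe floor. Second, at boot, accept a cached value only if its saved BSSID is all-zero (legacy NVS) or equals the current AP's. The sketch below isolates those decisions from the undocumented PHY calls so they can be unit-tested on a host. The function and type names are hypothetical. The FW's extra requirement that `esp_wifi_sta_get_ap_info()` succeeds is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MIN_SAFE_AGC 30u  /* mirrors RV_GAIN_MIN_SAFE_AGC */

/* Hypothetical result type for the calibration step. */
typedef struct {
    bool    lock;  /* true → FW would call phy_force_rx_gain()/phy_fft_scale_force() */
    uint8_t agc;   /* median AGC gain */
    int8_t  fft;   /* median FFT scale */
} gain_decision_t;

static int cmp_u8(const void *a, const void *b)
{
    return (int)*(const uint8_t *)a - (int)*(const uint8_t *)b;
}

static int cmp_i8(const void *a, const void *b)
{
    return (int)*(const int8_t *)a - (int)*(const int8_t *)b;
}

/* Sorts both sample arrays in place (as the FW does) and takes the
 * element at index n/2 as the median. */
static gain_decision_t gain_lock_decide(uint8_t *agc, int8_t *fft, size_t n)
{
    assert(agc != NULL && fft != NULL && n > 0);
    qsort(agc, n, sizeof *agc, cmp_u8);
    qsort(fft, n, sizeof *fft, cmp_i8);
    gain_decision_t d = { .lock = false, .agc = agc[n / 2], .fft = fft[n / 2] };
    d.lock = d.agc >= MIN_SAFE_AGC;  /* signal too strong → skip the lock */
    return d;
}

/* Boot-time cache check: an all-zero saved BSSID is a legacy entry and
 * matches any AP; otherwise it must equal the current BSSID. */
static bool gain_cache_matches(const uint8_t saved[6], const uint8_t current[6])
{
    static const uint8_t legacy[6] = {0};
    if (memcmp(saved, legacy, 6) == 0) return true;
    return memcmp(saved, current, 6) == 0;
}
```

Keeping these decisions free of ESP-IDF types makes it cheap to check edge cases such as the strong-signal skip and the legacy-NVS wildcard.
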
@@ -64,7 +292,10 @@ static uint32_t s_rate_skip = 0;
* We cap the send rate to avoid exhausting lwIP packet buffers (ENOMEM).
* Default: 20 ms = 50 Hz max send rate.
*/
#define CSI_MIN_SEND_INTERVAL_US (20 * 1000)
/* Send rate cap reduced from 20 ms to 4 ms (250 Hz) so the host calibration
* UI can show every available frame. The real ceiling is whatever rate the
* WiFi CSI callback actually fires at (usually 5-50 Hz on a quiet LAN). */
#define CSI_MIN_SEND_INTERVAL_US (4 * 1000)
static int64_t s_last_send_us = 0;
/**
@@ -116,6 +347,10 @@ static esp_timer_handle_t s_hop_timer = NULL;
* [17] Noise floor (i8)
* [18..19] Reserved
* [20..] I/Q data (raw bytes from ESP-IDF callback)
* [20+iq_len .. 20+iq_len+3] ADR-106: sensor timestamp_us (u32 LE)
* from info->rx_ctrl.timestamp. The server
* parses these trailing 4 bytes opportunistically;
* older servers tolerate the extra bytes.
*/
size_t csi_serialize_frame(const wifi_csi_info_t *info, uint8_t *buf, size_t buf_len)
{
@@ -127,7 +362,7 @@ size_t csi_serialize_frame(const wifi_csi_info_t *info, uint8_t *buf, size_t buf
uint16_t iq_len = (uint16_t)info->len;
uint16_t n_subcarriers = iq_len / (2 * n_antennas);
size_t frame_size = CSI_HEADER_SIZE + iq_len;
size_t frame_size = CSI_HEADER_SIZE + iq_len + 4 /* ADR-106 trailing timestamp_us */;
if (frame_size > buf_len) {
ESP_LOGW(TAG, "Buffer too small: need %u, have %u", (unsigned)frame_size, (unsigned)buf_len);
return 0;
@@ -180,6 +415,13 @@ size_t csi_serialize_frame(const wifi_csi_info_t *info, uint8_t *buf, size_t buf
/* I/Q data */
memcpy(&buf[CSI_HEADER_SIZE], info->buf, iq_len);
/* ADR-106: trailing sensor µs timestamp from rx_ctrl.timestamp.
* This is monotonic µs since FW boot (per ESP-IDF docs) and lets
* the host align frames across nodes within ~µs once the boot
* offsets are learned. Old server ignores trailing bytes. */
uint32_t ts_us = info->rx_ctrl.timestamp;
memcpy(&buf[CSI_HEADER_SIZE + iq_len], &ts_us, 4);
return frame_size;
}
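
On the host, the ADR-106 trailer can be read back as below. This sketch assumes two things: the caller already knows `iq_len` from the header (the header field that carries it is not shown in this hunk), and frames from one node arrive in order. The trailer is a u32 microsecond count, so it wraps about every 71.6 minutes (2^32 µs). The hypothetical `ts_unwrap()` helper therefore extends it to 64 bits using modular subtraction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CSI_HEADER_SIZE 20u  /* I/Q starts at byte 20 per the layout comment */

/* Hypothetical host-side reader: returns true and writes *ts_us when the
 * datagram carries the 4-byte little-endian ADR-106 trailer. */
static bool csi_trailing_ts(const uint8_t *pkt, size_t pkt_len,
                            size_t iq_len, uint32_t *ts_us)
{
    assert(pkt != NULL && ts_us != NULL);
    size_t off = CSI_HEADER_SIZE + iq_len;
    if (pkt_len < off + 4) return false;  /* older firmware: no trailer */
    *ts_us = (uint32_t)pkt[off]
           | (uint32_t)pkt[off + 1] << 8
           | (uint32_t)pkt[off + 2] << 16
           | (uint32_t)pkt[off + 3] << 24;
    return true;
}

/* Hypothetical per-node unwrap state; zero-initialize one per node. */
typedef struct { bool seeded; uint32_t last; uint64_t ext; } ts_unwrap_t;

static uint64_t ts_unwrap(ts_unwrap_t *u, uint32_t ts)
{
    if (!u->seeded) {
        u->seeded = true;
        u->last = ts;
        u->ext = ts;
        return u->ext;
    }
    u->ext += (uint32_t)(ts - u->last);  /* modular delta absorbs the wrap */
    u->last = ts;
    return u->ext;
}
```

The unwrap is only valid while consecutive frames from a node are less than ~71 minutes apart. A node reboot resets the counter and needs a fresh `ts_unwrap_t`.
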
@@ -208,6 +450,11 @@ static void wifi_csi_callback(void *ctx, wifi_csi_info_t *info)
}
}
/* ADR-100: feed the gain-lock calibrator. No-op once locked / on
* unsupported targets. Runs before the heavy work so calibration
* happens during the first ~6 s after boot regardless of host traffic. */
rv_gain_lock_process(info);
s_cb_count++;
if (s_cb_count <= 3 || (s_cb_count % 100) == 0) {
@@ -351,25 +598,15 @@ void csi_collector_init(void)
ESP_LOGI(TAG, "WiFi modem sleep disabled (WIFI_PS_NONE) for CSI capture");
}
/* Enable promiscuous mode — required for reliable CSI callbacks.
* Without this, CSI only fires on frames destined to this station,
* which may be very infrequent on a quiet network. */
ESP_ERROR_CHECK(esp_wifi_set_promiscuous(true));
ESP_ERROR_CHECK(esp_wifi_set_promiscuous_rx_cb(wifi_promiscuous_cb));
/* MGMT-only promiscuous filter + active probe injection (RuView#396).
*
* DATA frames cause 100-500+ WiFi HW interrupts/sec which crashes Core 0
* in wDev_ProcessFiq (SPI flash cache race in ESP-IDF WiFi blob).
* MGMT-only gives ~10 Hz (beacons). Probe request injection at 10 Hz
* adds ~10 Hz probe responses from APs → ~20 Hz total, matching the
* edge processing designed sample rate of 20 Hz. */
wifi_promiscuous_filter_t filt = {
.filter_mask = WIFI_PROMIS_FILTER_MASK_MGMT,
};
ESP_ERROR_CHECK(esp_wifi_set_promiscuous_filter(&filt));
ESP_LOGI(TAG, "Promiscuous mode enabled (MGMT-only, RuView#396)");
/* DO NOT enable promiscuous mode on these ESP32-S3 boards. Empirically,
* setting esp_wifi_set_promiscuous(true) while STA is connected suppresses
* the CSI RX callback entirely on this hardware revision: adaptive_ctrl
* reports yield=0pps forever. FW5.47 (esp32s3_csi_capture) works on the
* same boards using plain STA-mode CSI (no promiscuous), so we mirror
* that approach here. CSI fires for every frame the STA actually
* receives (beacons + unicast, ~10-20 Hz, which is what edge_processing
* expects). */
ESP_LOGI(TAG, "Promiscuous mode SKIPPED (CSI via STA-only, broken otherwise on this board)");
wifi_csi_config_t csi_config = {
.lltf_en = true,


@@ -224,6 +224,25 @@ static edge_config_t s_cfg;
/** Per-subcarrier running variance (for top-K selection). */
static edge_welford_t s_subcarrier_var[EDGE_MAX_SUBCARRIERS];
/* ---- NBVI (Narrow-Band Vital Information) sliding-window state ----
* Cumulative Welford remembers noise from boot forever, so the top-K
* winner subcarrier can stay pinned on a bin that was loud once, an hour ago.
* We additionally track an EMA-based amplitude variance per subcarrier
* (alpha = 0.02 → tau ≈ 50 frames ≈ 10 s at 5 pps) and use it to identify
* a "stable bins" subset: bins whose amplitude wobble is at or *below* the
* across-band median. The plan was for broad_mean_amp_history (the
* production motion source, Step 8) to average over this subset instead
* of all 128 subcarriers, aiming to cut CV in STILL by ~2-3× without
* affecting motion or vital-band sensitivity. process_frame() currently
* keeps the full-band mean there (a quiet-bins-only mean zeroed
* motion_energy), so this state is not yet used for motion detection.
* ADR-100/ADR-101 follow-up. */
static float s_sc_amp_ema[EDGE_MAX_SUBCARRIERS]; /**< per-bin EMA of amplitude */
static float s_sc_amp_var_ema[EDGE_MAX_SUBCARRIERS];/**< per-bin EMA of (a-EMA)^2 */
static uint16_t s_sc_init; /**< frames seen for NBVI warm-up */
#define NBVI_ALPHA 0.02f /* EMA smoothing — ~10 s at 5 pps */
#define NBVI_WARMUP_FRAMES 50 /* until then, fall back to full-band average */
#define NBVI_REFRESH_EVERY 25 /* recompute stable_bin mask every N frames */
static bool s_nbvi_stable_bin[EDGE_MAX_SUBCARRIERS]; /**< true → in quiet/stable set */
static uint8_t s_nbvi_stable_count; /**< # of true entries above */
/** Previous phase per subcarrier (for unwrapping). */
static float s_prev_phase[EDGE_MAX_SUBCARRIERS];
static bool s_phase_initialized;
@@ -234,9 +253,31 @@ static uint8_t s_top_k_count;
/** Phase history for the primary (highest-variance) subcarrier. */
static float s_phase_history[EDGE_PHASE_HISTORY_LEN];
/** Amplitude history for the primary subcarrier (issue #555: motion source).
* Unwrapped phase drifts monotonically (thermal/oscillator/doppler), so
* variance-of-phase is dominated by drift slope rather than motion.
* Amplitudes are stable in calm rooms and spike on body motion. */
static float s_amp_history[EDGE_PHASE_HISTORY_LEN];
static uint16_t s_history_len;
static uint16_t s_history_idx;
/* ---- Broadband amplitude history (issue #555 — production motion source) ----
* 20-sample ring of per-frame *mean amplitude across all subcarriers*. Used by
* Step 8 as the motion_energy source because empirical measurements on this
* hardware (UART DBG_DSP capture, 2026-05-14) showed broadband variance
* separates still vs. motion much more reliably than primary-subcarrier
* variance:
* still room: bvar median ~0.08, max ~1.6
* walking 2 m: bvar median ~3.5, max ~14
* walk/still ratio: ~44×
* Compare primary-subcarrier amp variance: still ~1.3, walk ~24, ratio ~18×
* with spurious spikes in stillness when the top-K winner subcarrier flips. */
#define EDGE_BROAD_HISTORY_LEN 20
static float s_broad_mean_amp_history[EDGE_BROAD_HISTORY_LEN];
static uint16_t s_broad_mean_amp_idx;
/** Biquad filters for breathing and heart rate. */
static edge_biquad_t s_bq_breathing;
static edge_biquad_t s_bq_heartrate;
@@ -709,7 +750,24 @@ static void send_feature_vector(void)
static void process_frame(const edge_ring_slot_t *slot)
{
uint16_t n_subcarriers = slot->iq_len / 2;
if (n_subcarriers == 0 || n_subcarriers > EDGE_MAX_SUBCARRIERS) return;
if (n_subcarriers == 0) return;
/* Issue #555 root cause: ESP32-S3 with lltf+htltf+stbc+ltf_merge yields
* 384 B I/Q (192 subcarriers) per CSI callback, while EDGE_MAX_SUBCARRIERS
* is 128. The previous `> EDGE_MAX_SUBCARRIERS → return` check made process_frame
* silently bail on every frame, so s_motion_energy stayed pinned at its
* init value (0.0). Truncate instead: the first 128 subcarriers cover
* the L-LTF + first half of HT-LTF, which is plenty for motion / vitals. */
if (n_subcarriers > EDGE_MAX_SUBCARRIERS) {
static bool s_warned_trunc;
if (!s_warned_trunc) {
ESP_LOGW(TAG, "CSI %u subcarriers > EDGE_MAX_SUBCARRIERS=%u — "
"truncating (one-shot warning)",
(unsigned)n_subcarriers,
(unsigned)EDGE_MAX_SUBCARRIERS);
s_warned_trunc = true;
}
n_subcarriers = EDGE_MAX_SUBCARRIERS;
}
s_frame_count++;
s_latest_rssi = slot->rssi;
@@ -746,14 +804,110 @@ static void process_frame(const edge_ring_slot_t *slot)
if (s_top_k_count == 0) return;
/* --- Step 5: Phase of primary (highest-variance) subcarrier --- */
/* --- Step 5: Phase + amplitude of primary (highest-variance) subcarrier --- */
float primary_phase = phases[s_top_k[0]];
/* Store in phase history ring buffer. */
/* Amplitude of primary subcarrier — drift-free motion proxy (issue #555). */
uint8_t primary_sc = s_top_k[0];
int8_t pi_val = (int8_t)slot->iq_data[primary_sc * 2];
int8_t pq_val = (int8_t)slot->iq_data[primary_sc * 2 + 1];
float primary_amp = sqrtf((float)(pi_val * pi_val + pq_val * pq_val));
/* Store in phase + amplitude history ring buffers. */
s_phase_history[s_history_idx] = primary_phase;
s_amp_history[s_history_idx] = primary_amp;
s_history_idx = (s_history_idx + 1) % EDGE_PHASE_HISTORY_LEN;
if (s_history_len < EDGE_PHASE_HISTORY_LEN) s_history_len++;
/* --- Broadband + NBVI probe (always on, feeds Step 8) ---
*
* One pass over all subcarriers does three jobs:
* (a) sum |I+jQ| for the full-band average, which is what Step 8
* consumes;
* (b) per-bin EMA of amplitude and amplitude-variance (alpha = NBVI_ALPHA,
* tau ≈ 10 s) so we can rank bins by recent noise level;
* (c) periodically (every NBVI_REFRESH_EVERY frames) recompute the
* "stable bins" mask = bins whose EMA variance is at or below the
* across-band median. The mask was meant to feed a *quiet-bins-only*
* mean, but s_broad_mean_amp_history still receives the full-band mean
* (see the IMPORTANT note at the end of this block).
*
* Intended effect (ADR-100/ADR-101 follow-up): drive per-node CV in
* STILL down by averaging over the bins that are least responsive to
* mid-room thermal/oscillator noise while still tracking body presence
* in the baseline shift (a person blocks Fresnel multipath uniformly
* across the band, so quiet bins still see the level drop). */
{
float band_amp_sum = 0.0f;
for (uint16_t sc = 0; sc < n_subcarriers; sc++) {
int8_t iv = (int8_t)slot->iq_data[sc * 2];
int8_t qv = (int8_t)slot->iq_data[sc * 2 + 1];
float a = sqrtf((float)(iv * iv + qv * qv));
band_amp_sum += a;
/* Update per-bin EMA and EMA of (a - EMA)^2. */
if (s_sc_init < NBVI_WARMUP_FRAMES) {
/* Seed the EMA from the very first sample to avoid the
* slow ramp from zero biasing the median for the first
* ~10 s. */
if (s_sc_amp_ema[sc] == 0.0f) s_sc_amp_ema[sc] = a;
}
float prev_mean = s_sc_amp_ema[sc];
float new_mean = prev_mean + NBVI_ALPHA * (a - prev_mean);
float dev = a - new_mean;
s_sc_amp_ema[sc] = new_mean;
s_sc_amp_var_ema[sc] = s_sc_amp_var_ema[sc] +
NBVI_ALPHA * (dev * dev - s_sc_amp_var_ema[sc]);
}
if (s_sc_init < NBVI_WARMUP_FRAMES) s_sc_init++;
float band_amp_mean = (n_subcarriers > 0)
? band_amp_sum / (float)n_subcarriers : 0.0f;
/* Refresh stable_bin mask periodically — only after warm-up so the
* EMA variances are populated. */
if (s_sc_init >= NBVI_WARMUP_FRAMES
&& (s_frame_count % NBVI_REFRESH_EVERY) == 0)
{
/* Median EMA variance across active subcarriers (n_subcarriers ≤ 128).
* Stack copy is cheap: at most 128 floats (512 bytes). */
float scratch[EDGE_MAX_SUBCARRIERS];
for (uint16_t i = 0; i < n_subcarriers; i++) scratch[i] = s_sc_amp_var_ema[i];
/* Tiny in-place selection sort up to the median index. For n=128 a
* full selection sort is ~8 k comparisons (n(n-1)/2), fine on Core 1
* every 25 frames (≈ 5 s); stopping at the median is cheaper still. */
uint16_t target = n_subcarriers / 2;
for (uint16_t i = 0; i <= target; i++) {
uint16_t min_i = i;
for (uint16_t j = i + 1; j < n_subcarriers; j++) {
if (scratch[j] < scratch[min_i]) min_i = j;
}
if (min_i != i) {
float t = scratch[i]; scratch[i] = scratch[min_i]; scratch[min_i] = t;
}
}
float median_var = scratch[target];
uint8_t count = 0;
for (uint16_t i = 0; i < n_subcarriers; i++) {
bool stable = s_sc_amp_var_ema[i] <= median_var;
s_nbvi_stable_bin[i] = stable;
if (stable) count++;
}
s_nbvi_stable_count = count;
}
/* IMPORTANT: motion_energy (Step 8) MUST take the variance of the
* *full-band* mean. Pushing a quiet-bins-only mean here would zero
* out motion_energy entirely: quiet bins by construction barely
* move, so the windowed variance collapses to ~0 and stays there
* (verified empirically on 2026-05-17: motion_score went constant
* 0.013/0.021 with std=0 across 125 frames). The NBVI EMA state
* above remains for future use (a second "baseline_quiet" channel,
* not yet wired to the feature_state packet). */
s_broad_mean_amp_history[s_broad_mean_amp_idx] = band_amp_mean;
s_broad_mean_amp_idx = (s_broad_mean_amp_idx + 1) % EDGE_BROAD_HISTORY_LEN;
}
/* --- Step 6: Biquad bandpass filtering --- */
float br_val = biquad_process(&s_bq_breathing, primary_phase);
float hr_val = biquad_process(&s_bq_heartrate, primary_phase);
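
The NBVI bookkeeping in the hunk above has three parts: a per-bin EMA of amplitude, an EMA of the squared deviation, and a mask of bins at or below the median variance. The sketch below exercises that bookkeeping in isolation. It differs from the FW in three ways. It uses 8 bins instead of `EDGE_MAX_SUBCARRIERS`. It seeds every bin's mean from the first frame (the FW seeds per bin while the mean is still 0 during warm-up). It finds the median with `qsort()` instead of the FW's partial selection sort. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define NBINS 8       /* illustration only; FW uses EDGE_MAX_SUBCARRIERS */
#define ALPHA 0.02f   /* mirrors NBVI_ALPHA */

/* Hypothetical per-bin EMA state. */
typedef struct { float mean[NBINS]; float var[NBINS]; bool seeded; } nbvi_t;

/* Same update as the FW: EMA of amplitude, then EMA of (a - EMA)^2. */
static void nbvi_update(nbvi_t *s, const float amp[NBINS])
{
    assert(s != NULL && amp != NULL);
    for (int i = 0; i < NBINS; i++) {
        if (!s->seeded) s->mean[i] = amp[i];  /* avoid a slow ramp from 0 */
        float m = s->mean[i] + ALPHA * (amp[i] - s->mean[i]);
        float d = amp[i] - m;
        s->mean[i] = m;
        s->var[i] += ALPHA * (d * d - s->var[i]);
    }
    s->seeded = true;
}

static int cmp_f(const void *a, const void *b)
{
    float x = *(const float *)a, y = *(const float *)b;
    return (x > y) - (x < y);
}

/* Marks bins whose EMA variance is <= the element at index NBINS/2 of the
 * sorted variances (same index the FW uses); returns the stable count. */
static int nbvi_stable_mask(const nbvi_t *s, bool stable[NBINS])
{
    float tmp[NBINS];
    memcpy(tmp, s->var, sizeof tmp);
    qsort(tmp, NBINS, sizeof tmp[0], cmp_f);
    float median = tmp[NBINS / 2];
    int count = 0;
    for (int i = 0; i < NBINS; i++) {
        stable[i] = s->var[i] <= median;
        count += stable[i];
    }
    return count;
}
```

Because the comparison is `<=`, at least half the bins are always marked stable, and ties at the median all land in the stable set.
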
@@ -783,20 +937,49 @@ static void process_frame(const edge_ring_slot_t *slot)
if (hr_bpm >= 40.0f && hr_bpm <= 180.0f) s_heartrate_bpm = hr_bpm;
}
/* --- Step 8: Motion energy (variance of recent phases) --- */
/* --- Step 8: Motion energy (broadband amplitude variance) ---
*
* Issue #555 evolution:
* v1: variance of unwrapped *phase*. Dominated by thermal/oscillator
* drift, so it stays constant and non-zero regardless of motion.
* v2: variance of *primary subcarrier* amplitude. Better, but the
* top-K winner subcarrier flips occasionally (winner_changed=1
* in DBG_DSP), causing spurious spikes in stillness: measured
* pvar ~1.3 with bursts to 22 when nothing was moving.
* v3 (current): variance of *band-wide mean amplitude*. Averaging
* across all 128 subcarriers cancels per-subcarrier noise; what
* remains is the overall multipath energy level, which moves
* coherently with body presence in the Fresnel zone.
*
* Empirical numbers from 2026-05-14 capture (room02, 2 m, person):
* still: bvar median 0.08, max 1.6
* walking: bvar median 3.5, max 14.3
* walk/still ratio: ~44× (vs ~18× for primary-subcarrier variance)
*
* Normalization: motion_energy = clamp(bvar / 5.0, 0, 1), matching the
* divisor in the code below:
* still 0.08 → 0.016 (under the <0.05 spec)
* still 1.6 → 0.32 (rare transient, acceptable)
* walk 1.6 → 0.32 (just over the >0.3 spec)
* walk 3.5 → 0.70
* walk 5.0+ → 1.0 (saturated, presence definite) */
if (s_history_len >= 10) {
float sum = 0.0f, sum2 = 0.0f;
uint16_t window = (s_history_len < 20) ? s_history_len : 20;
for (uint16_t i = 0; i < window; i++) {
uint16_t ri = (s_history_idx + EDGE_PHASE_HISTORY_LEN
- window + i) % EDGE_PHASE_HISTORY_LEN;
float v = s_phase_history[ri];
sum += v;
for (uint16_t i = 0; i < EDGE_BROAD_HISTORY_LEN; i++) {
float v = s_broad_mean_amp_history[i];
sum += v;
sum2 += v * v;
}
float mean = sum / (float)window;
s_motion_energy = (sum2 / (float)window) - (mean * mean);
if (s_motion_energy < 0.0f) s_motion_energy = 0.0f;
float mean = sum / (float)EDGE_BROAD_HISTORY_LEN;
float var = (sum2 / (float)EDGE_BROAD_HISTORY_LEN) - mean * mean;
if (var < 0.0f) var = 0.0f;
/* Divisor sized for sensor deployment with 1-3 m line-of-sight to
* the activity zone. At that range multipath averages out and
* broadband variance is small (~0.1-2.0 empty, ~1-10 walking).
* Lower divisor = higher sensitivity but more saturation if a
* sensor is moved close to the body (50 cm). */
float energy = var / 5.0f;
if (energy > 1.0f) energy = 1.0f;
s_motion_energy = energy;
}
/* --- Step 9: Presence detection --- */
@@ -1000,6 +1183,18 @@ esp_err_t edge_processing_init(const edge_config_t *cfg)
memset(&s_ring, 0, sizeof(s_ring));
memset(s_subcarrier_var, 0, sizeof(s_subcarrier_var));
memset(s_prev_phase, 0, sizeof(s_prev_phase));
memset(s_phase_history, 0, sizeof(s_phase_history));
memset(s_amp_history, 0, sizeof(s_amp_history));
memset(s_broad_mean_amp_history, 0, sizeof(s_broad_mean_amp_history));
s_broad_mean_amp_idx = 0;
/* NBVI sliding-window state — recomputed from fresh on each init so
* the stable_bin mask doesn't carry over stale stats from a previous
* deployment / room. */
memset(s_sc_amp_ema, 0, sizeof(s_sc_amp_ema));
memset(s_sc_amp_var_ema, 0, sizeof(s_sc_amp_var_ema));
memset(s_nbvi_stable_bin, 0, sizeof(s_nbvi_stable_bin));
s_sc_init = 0;
s_nbvi_stable_count = 0;
s_phase_initialized = false;
s_top_k_count = 0;
s_history_len = 0;
@@ -1034,12 +1229,18 @@ esp_err_t edge_processing_init(const edge_config_t *cfg)
}
/* Design biquad bandpass filters.
* Sampling rate ~20 Hz (typical ESP32 CSI callback rate). */
const float fs = 20.0f;
*
* fs must match the sample_rate used by estimate_bpm_zero_crossing()
* in process_frame() (currently 10.0 Hz; see the RuView#396 comment near
* the `sample_rate` literal). Designing biquads at 20 Hz while feeding
* them 10 Hz data effectively halves every corner frequency: the
* "0.1-0.5 Hz breathing" filter became 0.05-0.25 Hz, which attenuates
* much of 12-18 BPM (0.2-0.3 Hz), the bulk of human respiration. */
const float fs = 10.0f;
biquad_bandpass_design(&s_bq_breathing, fs, 0.1f, 0.5f);
biquad_bandpass_design(&s_bq_heartrate, fs, 0.8f, 2.0f);
/* Design per-person filters. */
/* Design per-person filters at the same fs. */
for (uint8_t p = 0; p < EDGE_MAX_PERSONS; p++) {
biquad_bandpass_design(&s_person_bq_br[p], fs, 0.1f, 0.5f);
biquad_bandpass_design(&s_person_bq_hr[p], fs, 0.8f, 2.0f);


@@ -17,6 +17,7 @@
#include "esp_log.h"
#include "nvs_flash.h"
#include "esp_app_desc.h"
#include "esp_ota_ops.h" /* esp_ota_get_running_partition — issue #556 boot diag */
#include "sdkconfig.h"
#include "csi_collector.h"
@@ -127,8 +128,39 @@ static void wifi_init_sta(void)
}
}
/* Issue #556 OTA debug: log how we got here. After an OTA upload the new
* image should boot with reset_reason=ESP_RST_SW from esp_restart() and
* run from the partition esp_ota_set_boot_partition() picked. If we see
* ESP_RST_PANIC / ESP_RST_TASK_WDT / ESP_RST_INT_WDT from the OTA-flashed
* slot, the new image crashed in early boot; that's the failure mode the
* "/ota/status still shows old time" symptom is masking. */
static const char *reset_reason_str(esp_reset_reason_t r)
{
switch (r) {
case ESP_RST_POWERON: return "POWERON";
case ESP_RST_EXT: return "EXT";
case ESP_RST_SW: return "SW";
case ESP_RST_PANIC: return "PANIC";
case ESP_RST_INT_WDT: return "INT_WDT";
case ESP_RST_TASK_WDT: return "TASK_WDT";
case ESP_RST_WDT: return "WDT";
case ESP_RST_DEEPSLEEP:return "DEEPSLEEP";
case ESP_RST_BROWNOUT: return "BROWNOUT";
case ESP_RST_SDIO: return "SDIO";
default: return "UNKNOWN";
}
}
void app_main(void)
{
/* Boot diagnostic — must run before anything that could panic, so even
* a one-line UART log tells us how the chip got here. */
esp_reset_reason_t rr = esp_reset_reason();
const esp_partition_t *running = esp_ota_get_running_partition();
ESP_LOGI(TAG, "boot: reset_reason=%s running_partition=%s",
reset_reason_str(rr),
running ? running->label : "?");
/* Initialize NVS */
esp_err_t ret = nvs_flash_init();
if (ret == ESP_ERR_NVS_NO_FREE_PAGES || ret == ESP_ERR_NVS_NEW_VERSION_FOUND) {
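
The boot diagnostic line can also be consumed on the host, for example by a script that watches the UART after an OTA upload. This C sketch assumes the line format from the `ESP_LOGI` call above. It also assumes ANSI color codes are disabled or stripped, because ESP-IDF's colored log output appends an escape sequence that `%16s` would otherwise capture. The type and function names, and the log prefix used in the example, are hypothetical. Following the comment above, only PANIC, TASK_WDT, and INT_WDT are flagged as a crashed image.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of "boot: reset_reason=%s running_partition=%s". */
typedef struct {
    char reason[16];
    char partition[17];  /* partition labels are at most 16 chars */
    bool crashed;
} boot_info_t;

static bool parse_boot_line(const char *line, boot_info_t *out)
{
    assert(line != NULL && out != NULL);
    /* Skip the ESP_LOG prefix ("I (ms) tag: ") by searching for the key. */
    const char *p = strstr(line, "boot: reset_reason=");
    if (p == NULL) return false;
    if (sscanf(p, "boot: reset_reason=%15s running_partition=%16s",
               out->reason, out->partition) != 2) {
        return false;
    }
    /* Reset reasons the firmware comment treats as "new image crashed". */
    static const char *const crash[] = { "PANIC", "INT_WDT", "TASK_WDT" };
    out->crashed = false;
    for (size_t i = 0; i < sizeof crash / sizeof crash[0]; i++) {
        if (strcmp(out->reason, crash[i]) == 0) out->crashed = true;
    }
    return true;
}
```

After a successful OTA, the expected pair is `reset_reason=SW` together with the partition that `esp_ota_set_boot_partition()` selected.
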


@@ -17,6 +17,7 @@
#include "esp_app_desc.h"
#include "nvs_flash.h"
#include "nvs.h"
#include "nvs_config.h" /* NVS_CFG_IP_MAX */
static const char *TAG = "ota_update";
@@ -96,6 +97,180 @@ static esp_err_t ota_status_handler(httpd_req_t *req)
return ESP_OK;
}
/**
* POST /ota/recalibrate: clear cached gain-lock NVS keys and reboot.
*
* ADR-109: lets the operator force a full gain-lock re-calibration from
* the server without a USB connection. Erases csi_cfg/gl_agc, gl_fft, and
* gl_ap_mac (ADR-111), then calls esp_restart(). Next boot finds no NVS
* cache and runs the 300-packet calibration as if it were a fresh device.
*/
static esp_err_t ota_recalibrate_handler(httpd_req_t *req)
{
if (!ota_check_auth(req)) {
ESP_LOGW(TAG, "/ota/recalibrate rejected: authentication failed");
httpd_resp_send_err(req, HTTPD_403_FORBIDDEN,
"Authentication required. Use: Authorization: Bearer <psk>");
return ESP_FAIL;
}
nvs_handle_t h;
esp_err_t err = nvs_open("csi_cfg", NVS_READWRITE, &h);
if (err != ESP_OK) {
ESP_LOGE(TAG, "/ota/recalibrate: nvs_open(csi_cfg) failed: %s",
esp_err_to_name(err));
httpd_resp_send_err(req, HTTPD_500_INTERNAL_SERVER_ERROR,
"NVS open failed");
return ESP_FAIL;
}
/* Erase all three keys defensively — ignore individual ESP_ERR_NVS_NOT_FOUND
* (key already absent on a never-calibrated device). */
(void)nvs_erase_key(h, "gl_agc");
(void)nvs_erase_key(h, "gl_fft");
(void)nvs_erase_key(h, "gl_ap_mac");
err = nvs_commit(h);
nvs_close(h);
if (err != ESP_OK) {
ESP_LOGE(TAG, "/ota/recalibrate: nvs_commit failed: %s",
esp_err_to_name(err));
httpd_resp_send_err(req, HTTPD_500_INTERNAL_SERVER_ERROR,
"NVS commit failed");
return ESP_FAIL;
}
ESP_LOGI(TAG, "/ota/recalibrate: gain-lock NVS cleared; rebooting in 1s");
const char *resp =
"{\"status\":\"ok\",\"message\":\"gain-lock NVS cleared; rebooting\"}";
httpd_resp_set_type(req, "application/json");
httpd_resp_send(req, resp, strlen(resp));
vTaskDelay(pdMS_TO_TICKS(1000));
esp_restart();
return ESP_OK; /* unreachable */
}
/**
* POST /ota/set-target: write csi_cfg/target_ip + target_port to NVS, reboot.
*
* ADR-115: lets the operator point sensors at a new aggregator (Mac IP
* change, network move) without USB. Body is plain text "IP:PORT" with
* trailing newline tolerated, e.g. "192.168.0.103:5005". IP validated
* by inet_pton-like check (4 dot-separated octets, each 0-255); port 1-65535.
*
* Persists into the same `csi_cfg` namespace that `nvs_config.c` reads
* at boot, so the next reboot picks up the new target.
*/
static bool parse_ip_port(const char *s, char *ip_out, size_t ip_cap, uint16_t *port_out)
{
/* Tolerate trailing whitespace/CR/LF. */
size_t n = strlen(s);
while (n > 0 && (s[n - 1] == '\n' || s[n - 1] == '\r' || s[n - 1] == ' ' || s[n - 1] == '\t')) {
n--;
}
const char *colon = NULL;
for (size_t i = 0; i < n; i++) {
if (s[i] == ':') { colon = &s[i]; break; }
}
if (!colon) return false;
size_t ip_len = (size_t)(colon - s);
if (ip_len == 0 || ip_len >= ip_cap) return false;
memcpy(ip_out, s, ip_len);
ip_out[ip_len] = '\0';
/* Validate 4 octets, each 0-255. */
int oct_count = 0, val = -1;
for (size_t i = 0; i <= ip_len; i++) {
char c = ip_out[i];
if (c == '.' || c == '\0') {
if (val < 0 || val > 255) return false;  /* empty or out-of-range octet */
oct_count++;
val = -1;
} else if (c >= '0' && c <= '9') {
val = (val < 0 ? 0 : val) * 10 + (c - '0');
/* Reject early so a long digit run cannot overflow int. */
if (val > 255) return false;
} else {
return false;
}
}
if (oct_count != 4) return false;
/* Parse port. */
long port = 0;
const char *p = colon + 1;
size_t plen = n - ip_len - 1;
if (plen == 0 || plen > 5) return false;
for (size_t i = 0; i < plen; i++) {
if (p[i] < '0' || p[i] > '9') return false;
port = port * 10 + (p[i] - '0');
}
if (port < 1 || port > 65535) return false;
*port_out = (uint16_t)port;
return true;
}
static esp_err_t ota_set_target_handler(httpd_req_t *req)
{
if (!ota_check_auth(req)) {
ESP_LOGW(TAG, "/ota/set-target rejected: authentication failed");
httpd_resp_send_err(req, HTTPD_403_FORBIDDEN,
"Authentication required. Use: Authorization: Bearer <psk>");
return ESP_FAIL;
}
/* Body is short: "IPv4:PORT" (at most 21 chars, "255.255.255.255:65535") plus optional CRLF; 40 bytes is plenty. */
char body[40] = {0};
int total = 0;
while (total < (int)sizeof(body) - 1) {
int r = httpd_req_recv(req, body + total, sizeof(body) - 1 - total);
if (r <= 0) {
if (r == HTTPD_SOCK_ERR_TIMEOUT) continue;
break;
}
total += r;
}
body[total < 0 ? 0 : total] = '\0';
char ip[NVS_CFG_IP_MAX] = {0};
uint16_t port = 0;
if (!parse_ip_port(body, ip, sizeof(ip), &port)) {
ESP_LOGW(TAG, "/ota/set-target rejected: invalid body '%s'", body);
httpd_resp_send_err(req, HTTPD_400_BAD_REQUEST,
"Body must be 'IPv4:PORT', e.g. '192.168.0.103:5005'");
return ESP_FAIL;
}
nvs_handle_t h;
esp_err_t err = nvs_open("csi_cfg", NVS_READWRITE, &h);
if (err != ESP_OK) {
ESP_LOGE(TAG, "/ota/set-target: nvs_open(csi_cfg) failed: %s",
esp_err_to_name(err));
httpd_resp_send_err(req, HTTPD_500_INTERNAL_SERVER_ERROR, "NVS open failed");
return ESP_FAIL;
}
err = nvs_set_str(h, "target_ip", ip);
if (err == ESP_OK) err = nvs_set_u16(h, "target_port", port);
if (err == ESP_OK) err = nvs_commit(h);
nvs_close(h);
if (err != ESP_OK) {
ESP_LOGE(TAG, "/ota/set-target: NVS write failed: %s", esp_err_to_name(err));
httpd_resp_send_err(req, HTTPD_500_INTERNAL_SERVER_ERROR, "NVS write failed");
return ESP_FAIL;
}
ESP_LOGI(TAG, "/ota/set-target: csi_cfg/target_ip=%s target_port=%u; rebooting in 1s",
ip, (unsigned)port);
char resp[120];
int rlen = snprintf(resp, sizeof(resp),
"{\"status\":\"ok\",\"target_ip\":\"%s\",\"target_port\":%u,\"message\":\"rebooting\"}",
ip, (unsigned)port);
httpd_resp_set_type(req, "application/json");
httpd_resp_send(req, resp, rlen);
vTaskDelay(pdMS_TO_TICKS(1000));
esp_restart();
return ESP_OK; /* unreachable */
}
/**
* POST /ota: receive and flash a firmware binary.
*/
@ -125,7 +300,16 @@ static esp_err_t ota_upload_handler(httpd_req_t *req)
}
esp_ota_handle_t ota_handle;
/* Issue #556: use OTA_SIZE_UNKNOWN (full partition erase) instead of
* OTA_WITH_SEQUENTIAL_WRITES. When the new image is smaller than the
* one previously written to the target slot, sequential writes leave
* the tail of the old code in place. The image header SHA covers
* only the declared image span, but residual code at stale offsets
* can still be reached via IRAM jump tables / .literal pools on some
* v5.2 ABIs and crash the new app on first boot, which then looks
* like "OTA didn't take". Full erase up-front avoids this entirely
* at the cost of one extra ~1.5 s erase before write starts. */
esp_err_t err = esp_ota_begin(update_partition, OTA_SIZE_UNKNOWN, &ota_handle);
if (err != ESP_OK) {
ESP_LOGE(TAG, "esp_ota_begin failed: %s", esp_err_to_name(err));
httpd_resp_send_err(req, HTTPD_500_INTERNAL_SERVER_ERROR,
@ -207,6 +391,13 @@ static esp_err_t ota_start_server(httpd_handle_t *out_handle)
config.max_uri_handlers = 12; /* Extra slots for WASM endpoints (ADR-040). */
/* Increase receive timeout for large uploads. */
config.recv_wait_timeout = 30;
/* Issue #556: httpd default stack is 4096 B, which overflows during
* esp_ota_end()'s image-verify (SHA256 streaming + mmap segment walk
* eats ~3 KB on top of the request handler frame). Empirically observed
* "***ERROR*** A stack overflow in task httpd has been detected"
* immediately after esp_image: segment dumps when OTA reaches verify.
* 8 KB gives a clean margin without hurting the typical idle case. */
config.stack_size = 8192;
httpd_handle_t server = NULL;
esp_err_t err = httpd_start(&server, &config);
@ -233,9 +424,29 @@ static esp_err_t ota_start_server(httpd_handle_t *out_handle)
};
httpd_register_uri_handler(server, &upload_uri);
/* ADR-109: REST trigger for full gain-lock re-calibration. */
httpd_uri_t recalibrate_uri = {
.uri = "/ota/recalibrate",
.method = HTTP_POST,
.handler = ota_recalibrate_handler,
.user_ctx = NULL,
};
httpd_register_uri_handler(server, &recalibrate_uri);
/* ADR-115: REST endpoint to change CSI aggregator target without USB. */
httpd_uri_t set_target_uri = {
.uri = "/ota/set-target",
.method = HTTP_POST,
.handler = ota_set_target_handler,
.user_ctx = NULL,
};
httpd_register_uri_handler(server, &set_target_uri);
ESP_LOGI(TAG, "OTA HTTP server started on port %d", OTA_PORT);
ESP_LOGI(TAG, " GET /ota/status — firmware version info");
ESP_LOGI(TAG, " POST /ota — upload new firmware binary");
ESP_LOGI(TAG, " POST /ota/recalibrate — clear gain-lock NVS + reboot");
ESP_LOGI(TAG, " POST /ota/set-target — set CSI target IP:port in NVS + reboot");
if (out_handle) *out_handle = server;
return ESP_OK;
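Operators retargeting a node will usually want to reject a malformed `IP:PORT` before the node does. The Python sketch below approximates the checks `parse_ip_port()` performs (four decimal octets 0-255, port 1-65535, trailing whitespace tolerated) and builds the authenticated POST. `build_set_target_request` is a hypothetical helper, not part of `scripts/ota-deploy.sh`, and port 8032 is assumed from that script's `PORT` constant.

```python
from __future__ import annotations

import urllib.request

# Taken from scripts/ota-deploy.sh (PORT = 8032); assumed to equal the
# firmware's OTA_PORT, since /ota/set-target is registered on the OTA server.
OTA_PORT = 8032


def parse_ip_port(s: str) -> tuple[str, int] | None:
    """Host-side approximation of the firmware's parse_ip_port() rules.

    Accepts "A.B.C.D:PORT" with optional trailing whitespace/CR/LF,
    four decimal octets 0-255 and a port 1-65535. Returns None on error.
    Unlike the firmware, it does not enforce NVS_CFG_IP_MAX.
    """
    s = s.rstrip(" \t\r\n")
    ip, sep, port_s = s.partition(":")
    if not sep:
        return None
    octets = ip.split(".")
    if len(octets) != 4:
        return None
    for o in octets:
        # isascii() rules out non-ASCII digits such as "²" that isdigit() accepts.
        if not (o.isascii() and o.isdigit()) or int(o) > 255:
            return None
    if not (port_s.isascii() and port_s.isdigit()) or len(port_s) > 5:
        return None
    port = int(port_s)
    if not 1 <= port <= 65535:
        return None
    return ip, port


def build_set_target_request(node_ip: str, target: str,
                             psk: str | None = None) -> urllib.request.Request:
    """Hypothetical helper: validated POST /ota/set-target request for one node."""
    parsed = parse_ip_port(target)
    if parsed is None:
        raise ValueError(f"target must be 'IPv4:PORT', got {target!r}")
    ip, port = parsed
    req = urllib.request.Request(
        f"http://{node_ip}:{OTA_PORT}/ota/set-target",
        data=f"{ip}:{port}\n".encode("ascii"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )
    if psk:
        # Same header the firmware's ota_check_auth() expects.
        req.add_header("Authorization", f"Bearer {psk}")
    return req
```

Sending it with `urllib.request.urlopen(req, timeout=10)` should return the handler's `{"status":"ok",...}` JSON about one second before the node restarts; as with `/ota`, a connection reset at that point is plausible and can be treated as a soft pass.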


@ -65,7 +65,11 @@ typedef struct __attribute__((packed)) {
float env_shift_score; /**< 0..1, baseline drift. */
float node_coherence; /**< 0..1, multi-link agreement. */
uint16_t quality_flags; /**< RV_QFLAG_* bitmap. */
int8_t rssi_dbm; /**< Median RSSI over the emit window (i8, dBm). 0 = not measured.
ADR-100 D3: previously the same byte was `reserved` but downstream
UI/classifier needs RSSI per node and the legacy raw-CSI parse path
(0xC5110001) is no longer hot on this FW. Server reads buf[54] as i8. */
uint8_t reserved; /**< Padding/aux byte; keep zero until next protocol bump. */
uint32_t crc32; /**< IEEE CRC32 over bytes [0..end-4]. */
} rv_feature_state_t;
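Downstream, the `rssi_dbm` byte has to be read as a signed value, since RSSI in dBm is negative, and 0 has to be mapped to "not measured". A minimal Python sketch of that decode plus the CRC check, assuming the offset 54 stated in the field comment, a little-endian trailing `crc32`, and that the firmware's CRC32 matches zlib's standard IEEE variant:

```python
from __future__ import annotations

import struct
import zlib

# Offset from the header comment ("Server reads buf[54] as i8").
RSSI_OFFSET = 54


def crc_ok(frame: bytes) -> bool:
    """Check the trailing crc32 against IEEE CRC32 of bytes [0..end-4].

    Assumes the u32 is stored little-endian (the ESP32's native byte order).
    """
    if len(frame) < 4:
        return False
    (stored,) = struct.unpack_from("<I", frame, len(frame) - 4)
    return (zlib.crc32(frame[:-4]) & 0xFFFFFFFF) == stored


def rssi_dbm(frame: bytes) -> int | None:
    """Signed RSSI in dBm from the i8 at RSSI_OFFSET; 0 means 'not measured'."""
    (value,) = struct.unpack_from("<b", frame, RSSI_OFFSET)
    return None if value == 0 else value
```

Reading the byte with `"<b"` rather than `"<B"` is what turns a raw `0xBD` into -67 instead of 189.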


@ -34,3 +34,14 @@ CONFIG_ESP_MAIN_TASK_STACK_SIZE=8192
# Extra WiFi IRAM placement (defense-in-depth for RuView#396 SPI cache race)
CONFIG_ESP_WIFI_EXTRA_IRAM_OPT=y
# ----- Local overrides for room01/room02 deployment -----
# EDGE_TIER kept at project default (=2, full vitals pipeline).
# Mac aggregator IP
CONFIG_CSI_TARGET_IP="192.168.1.21"
CONFIG_CSI_TARGET_PORT=5006
# Disable AMOLED display (no display on room sensors, init panics on missing
# TCA9554 expander → Tmr Svc stack overflow).
CONFIG_DISPLAY_ENABLE=n
# Increase Tmr Svc stack to fit adaptive_controller tick (default 2048 overflows).
CONFIG_FREERTOS_TIMER_TASK_STACK_DEPTH=8192


@ -0,0 +1,108 @@
#!/usr/bin/env python3
"""ADR-114: generate 1000 idle + 1000 motion CSI replay fixtures.
Two files are written under
`v2/crates/wifi-densepose-sensing-server/tests/fixtures/`:
* `replay_idle.jsonl`: 1000 frames of empty-room baseline +
per-frame Gaussian noise (low CV).
* `replay_motion.jsonl`: 1000 frames of the same baseline + a slow
0.15 Hz coherent ±40 % envelope + per-frame Gaussian
noise (high CV).
Format: one JSON object per line:
{"node_id": <u8>, "amplitude": [<f64>; 56]}
These are *synthetic but parameter-matched to live data* (baseline
mean = 27.04 / 14.72 from data/baseline.json, CV 2.6 / 3.6 %).
They exist to provide deterministic regression coverage of the
amp_presence_override classifier. Real captured-from-sensor fixtures
can replace them in-place (same filename, same line format) without
changing the test code.
Deterministic by seed so the test result is reproducible across
machines. Re-run only when you want to regenerate.
"""
from __future__ import annotations
import json
import math
import random
from pathlib import Path
OUT_DIR = (
Path(__file__).resolve().parent.parent
/ "v2"
/ "crates"
/ "wifi-densepose-sensing-server"
/ "tests"
/ "fixtures"
)
# Per-node baseline mean amplitude pulled from a real recording of
# this deployment (data/baseline.json). Holding them in code keeps
# the fixture script self-contained.
NODE_BASELINES = {1: 27.04, 2: 14.72}
N_SUB = 56
FRAMES_PER_NODE = 500 # 500 × 2 nodes = 1000 per fixture file
def gen_subcarrier_profile(rng: random.Random, mean: float) -> list[float]:
"""Static per-subcarrier mean profile — same for the whole capture."""
return [max(1.0, mean * rng.uniform(0.7, 1.3)) for _ in range(N_SUB)]
def write_fixture(path: Path, motion: bool, seed: int) -> int:
rng = random.Random(seed)
profiles = {
nid: gen_subcarrier_profile(rng, mean) for nid, mean in NODE_BASELINES.items()
}
count = 0
with path.open("w") as f:
# Interleave nodes round-robin so the test driver gets per-node
# streams of the same length, like a real WS feed.
for i in range(FRAMES_PER_NODE):
for nid, profile in profiles.items():
t = i / 20.0 # 20 Hz tick
# AMP_SHORT_WIN in the server is 90 frames = 4.5 s.
# Idle: small per-frame noise → rolling-window CV stays
# well below the universal threshold.
# Motion: a slow ~0.15 Hz coherent envelope (6.7 s cycle,
# longer than the 4.5 s averaging window) drives the
# broadband mean up/down by ±40 %, producing a high
# rolling CV. Mimics body position changes during
# walking — the channel response shifts slowly relative
# to the classifier window.
if motion:
envelope = 1.0 + 0.40 * math.sin(2 * math.pi * 0.15 * t)
else:
envelope = 1.0
amps: list[float] = []
for mu in profile:
noise_sigma = mu * (0.05 if motion else 0.018)
n = rng.gauss(0.0, noise_sigma)
amps.append(round(mu * envelope + n, 3))
f.write(json.dumps({"node_id": nid, "amplitude": amps}) + "\n")
count += 1
return count
def main() -> None:
OUT_DIR.mkdir(parents=True, exist_ok=True)
idle_path = OUT_DIR / "replay_idle.jsonl"
motion_path = OUT_DIR / "replay_motion.jsonl"
n_idle = write_fixture(idle_path, motion=False, seed=42)
n_motion = write_fixture(motion_path, motion=True, seed=43)
print(f"wrote {n_idle} idle frames → {idle_path}")
print(f"wrote {n_motion} motion frames → {motion_path}")
print()
print("These fixtures are SYNTHETIC parameter-matched to live data —")
print("the cargo test that consumes them measures classifier")
print("consistency, not real-world accuracy. Replace with live")
print("captures (same line format, same filenames) when operator")
print("time allows for a true empty-vs-walking ground-truth pair.")
if __name__ == "__main__":
main()
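To sanity-check that the two fixtures actually separate before running the cargo test, one option is to replay them through a rolling coefficient of variation. This is a hypothetical offline helper, not the server's `amp_presence_override` logic; it only assumes the 90-frame window (`AMP_SHORT_WIN`) quoted in the script comment and the non-zero-subcarrier broadband mean used by `record-baseline.py`.

```python
from __future__ import annotations

import json
import statistics


def load_frames(path: str) -> list[dict]:
    """Read one {"node_id", "amplitude"} object per line."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


def broadband_mean(amps: list[float]) -> float:
    """Mean over non-zero subcarriers, as in record-baseline.py."""
    valid = [v for v in amps if v > 0]
    return sum(valid) / len(valid) if valid else 0.0


def rolling_cv(frames: list[dict], node_id: int, win: int = 90) -> list[float]:
    """CV (pstdev / mean) of one node's broadband mean over each `win`-frame window."""
    series = [broadband_mean(fr["amplitude"]) for fr in frames if fr["node_id"] == node_id]
    cvs = []
    for end in range(win, len(series) + 1):
        window = series[end - win:end]
        mu = statistics.fmean(window)
        cvs.append(statistics.pstdev(window) / mu if mu > 0 else 0.0)
    return cvs
```

For example, `max(rolling_cv(load_frames("v2/crates/wifi-densepose-sensing-server/tests/fixtures/replay_motion.jsonl"), 1))` would be expected to sit well above the same figure for `replay_idle.jsonl`; if it does not, the generator parameters need another look before the classifier does.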

275
scripts/ota-deploy.sh Executable file

@ -0,0 +1,275 @@
#!/usr/bin/env python3
"""
scripts/ota-deploy.sh — push esp32-csi-node.bin to one or more sensor nodes
over WiFi. Talks to the on-device /ota endpoint (ADR-045, port 8032,
handler in firmware/esp32-csi-node/main/ota_update.c).
Usage:
scripts/ota-deploy.sh # auto-discover via ARP, deploy to all
scripts/ota-deploy.sh 192.168.0.100 # one node
scripts/ota-deploy.sh 192.168.0.100 192.168.0.101
scripts/ota-deploy.sh --build # idf.py build first, then deploy
scripts/ota-deploy.sh --no-verify ... # skip post-reboot /ota/status check
Auth: set env OTA_PSK=<token> to send "Authorization: Bearer <token>"
(matches the on-device check in ota_update.c::ota_check_auth).
Exit codes:
0 — all targeted nodes confirmed running_partition flipped
1 — one or more nodes failed verification or were unreachable
2 — build or argument error
"""
from __future__ import annotations
import argparse
import concurrent.futures as cf
import json
import os
import re
import shutil
import subprocess
import sys
import time
import urllib.error
import urllib.request
from pathlib import Path
from typing import Iterable
REPO_ROOT = Path(__file__).resolve().parent.parent
FW_DIR = REPO_ROOT / "firmware" / "esp32-csi-node"
BIN_PATH = FW_DIR / "build" / "esp32-csi-node.bin"
PORT = 8032
UPLOAD_TIMEOUT_S = 120
REBOOT_WAIT_S = 10
VERIFY_RETRIES = 6
VERIFY_DELAY_S = 3
# ---- ANSI logging helpers ----------------------------------------------------
def _c(code: str, msg: str) -> str:
if not sys.stdout.isatty():
return msg
return f"\033[{code}m{msg}\033[0m"
def log(msg: str) -> None: print(_c("36", "[ota-deploy] ") + msg, flush=True)
def warn(msg: str) -> None: print(_c("33", "[ota-deploy] ") + msg, file=sys.stderr, flush=True)
def err(msg: str) -> None: print(_c("31", "[ota-deploy] ") + msg, file=sys.stderr, flush=True)
# ---- helpers -----------------------------------------------------------------
def http_get(url: str, timeout: float = 4.0) -> str | None:
try:
with urllib.request.urlopen(url, timeout=timeout) as r:
return r.read().decode("utf-8", errors="replace")
except (urllib.error.URLError, urllib.error.HTTPError, TimeoutError, OSError):
return None
def get_ota_status(ip: str) -> dict | None:
body = http_get(f"http://{ip}:{PORT}/ota/status")
if not body:
return None
try:
return json.loads(body)
except json.JSONDecodeError:
return None
def local_subnet_prefix() -> str | None:
"""Return e.g. '192.168.0' from en0 (macOS) or first non-loopback IP."""
try:
out = subprocess.check_output(
["ipconfig", "getifaddr", "en0"], stderr=subprocess.DEVNULL, text=True
).strip()
if out:
return out.rsplit(".", 1)[0]
except (subprocess.CalledProcessError, FileNotFoundError):
pass
# Linux fallback
try:
out = subprocess.check_output(["hostname", "-I"], text=True).strip()
if out:
return out.split()[0].rsplit(".", 1)[0]
except (subprocess.CalledProcessError, FileNotFoundError):
pass
return None
def discover_nodes() -> list[str]:
"""ARP-prefilter + parallel /ota/status probe to find live sensor nodes."""
prefix = local_subnet_prefix()
if not prefix:
err("could not determine local /24 — pass node IPs explicitly")
return []
log(f"scanning {prefix}.0/24 for /ota/status responders ...")
candidates: list[str] = []
try:
arp_out = subprocess.check_output(
["arp", "-a", "-n"], text=True, stderr=subprocess.DEVNULL
)
for line in arp_out.splitlines():
m = re.search(rf"\(({re.escape(prefix)}\.\d+)\)", line)
if m and "incomplete" not in line:
ip = m.group(1)
if not ip.endswith(".1"): # skip gateway
candidates.append(ip)
except (subprocess.CalledProcessError, FileNotFoundError):
pass
if not candidates:
warn(f"no ARP hits; falling back to /ota/status probes of {prefix}.100-110")
candidates = [f"{prefix}.{i}" for i in range(100, 111)]
candidates = sorted(set(candidates))
found: list[str] = []
with cf.ThreadPoolExecutor(max_workers=32) as pool:
futs = {pool.submit(get_ota_status, ip): ip for ip in candidates}
for fut in cf.as_completed(futs):
ip = futs[fut]
try:
if fut.result():
found.append(ip)
except Exception:
pass
return sorted(found, key=lambda x: tuple(int(o) for o in x.split(".")))
def upload_one(ip: str, payload: bytes, psk: str | None) -> tuple[bool, float, str]:
"""POST the firmware to one node. Returns (success, elapsed_s, body_snippet)."""
req = urllib.request.Request(
f"http://{ip}:{PORT}/ota",
data=payload,
headers={"Content-Type": "application/octet-stream"},
method="POST",
)
if psk:
req.add_header("Authorization", f"Bearer {psk}")
t0 = time.monotonic()
try:
with urllib.request.urlopen(req, timeout=UPLOAD_TIMEOUT_S) as r:
body = r.read().decode("utf-8", errors="replace")[:200]
return True, time.monotonic() - t0, body
except (urllib.error.HTTPError, urllib.error.URLError,
TimeoutError, ConnectionResetError, OSError) as e:
# ConnectionReset is *expected* when the chip restarts before flushing
# the response. We treat it as a soft pass and verify via /ota/status.
return (isinstance(e, ConnectionResetError),
time.monotonic() - t0,
f"{type(e).__name__}: {e}")
def build_firmware() -> int:
log("building firmware via idf.py ...")
if "IDF_PATH" not in os.environ:
export = Path.home() / "esp" / "esp-idf-v5.2" / "export.sh"
if not export.is_file():
err("IDF_PATH not set and ~/esp/esp-idf-v5.2/export.sh not found")
return 2
# source the env in a child shell
rc = subprocess.call(
["bash", "-lc", f". '{export}' >/dev/null 2>&1 && cd '{FW_DIR}' && idf.py build"]
)
else:
rc = subprocess.call(["idf.py", "build"], cwd=str(FW_DIR))
if rc != 0:
err("build failed")
return 2
return 0
# ---- main --------------------------------------------------------------------
def main(argv: list[str]) -> int:
ap = argparse.ArgumentParser(
prog="ota-deploy.sh",
description="Push esp32-csi-node.bin to one or more sensor nodes over WiFi.",
)
ap.add_argument("targets", nargs="*",
help="node IPs; auto-discover if omitted")
ap.add_argument("--build", action="store_true",
help="idf.py build before deploying")
ap.add_argument("--no-verify", action="store_true",
help="skip post-reboot /ota/status confirmation")
args = ap.parse_args(argv)
if args.build:
rc = build_firmware()
if rc != 0:
return rc
if not BIN_PATH.is_file():
err(f"firmware binary not found: {BIN_PATH} — pass --build first")
return 2
payload = BIN_PATH.read_bytes()
log(f"firmware: {BIN_PATH} ({len(payload)} bytes)")
targets = args.targets or discover_nodes()
if not targets:
err("no nodes given and none discovered")
return 1
log(f"targets: {' '.join(targets)}")
# snapshot before
before: dict[str, str] = {}
for ip in targets:
st = get_ota_status(ip)
if not st:
warn(f"{ip}: not reachable before upload")
before[ip] = "UNREACHABLE"
continue
before[ip] = st.get("running_partition", "UNKNOWN")
log(f"{ip} before: running_partition={before[ip]} time={st.get('time')}")
psk = os.environ.get("OTA_PSK") or None
if psk:
log("OTA_PSK set — sending Bearer token")
# upload in parallel
log("uploading in parallel ...")
results: dict[str, tuple[bool, float, str]] = {}
with cf.ThreadPoolExecutor(max_workers=max(2, len(targets))) as pool:
futs = {pool.submit(upload_one, ip, payload, psk): ip for ip in targets}
for fut in cf.as_completed(futs):
ip = futs[fut]
ok, dt, body = fut.result()
results[ip] = (ok, dt, body)
tag = _c("32", "ok") if ok else _c("31", "ERR")
log(f"{ip} upload {tag} in {dt:.1f}s body={body[:120]}")
if args.no_verify:
log("--no-verify — done")
return 0 if all(v[0] for v in results.values()) else 1
# verify
log(f"waiting {REBOOT_WAIT_S}s for reboot ...")
time.sleep(REBOOT_WAIT_S)
fail = False
for ip in targets:
new_st: dict | None = None
for _ in range(VERIFY_RETRIES):
new_st = get_ota_status(ip)
if new_st:
break
time.sleep(VERIFY_DELAY_S)
if not new_st:
err(f"{ip}: not reachable after reboot — DEAD or panic loop")
fail = True
continue
new_part = new_st.get("running_partition", "?")
new_time = new_st.get("time", "?")
if new_part == before.get(ip):
err(f"{ip}: running_partition still {new_part} — OTA did NOT take "
"(likely panic on first boot from new slot)")
fail = True
else:
log(f"{ip}: {before[ip]} → {_c('32', new_part)} (time={new_time}) ✓")
return 1 if fail else 0
if __name__ == "__main__":
try:
sys.exit(main(sys.argv[1:]))
except KeyboardInterrupt:
err("interrupted")
sys.exit(130)
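The ARP prefilter in `discover_nodes()` is easier to unit-test once it is a pure function of the `arp -a -n` text. A refactor sketch with the same regex, `incomplete` filter and `.1` gateway skip (the latter assumes the router sits at `.1`); `arp_candidates` is a hypothetical name. Both the macOS and Linux net-tools `arp -a -n` formats put the IP in parentheses, which is what the regex relies on.

```python
from __future__ import annotations

import re


def arp_candidates(arp_out: str, prefix: str) -> list[str]:
    """Resolved IPs in `prefix`.0/24 from `arp -a -n` output, gateway (.1) skipped."""
    pat = re.compile(rf"\(({re.escape(prefix)}\.\d+)\)")
    found: set[str] = set()
    for line in arp_out.splitlines():
        m = pat.search(line)
        # Unresolved entries show "(incomplete)" instead of a MAC address.
        if m and "incomplete" not in line:
            ip = m.group(1)
            if not ip.endswith(".1"):
                found.add(ip)
    # Numeric sort so .11 comes before .100, as discover_nodes() does.
    return sorted(found, key=lambda x: tuple(int(o) for o in x.split(".")))
```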

241
scripts/record-baseline.py Executable file

@ -0,0 +1,241 @@
#!/usr/bin/env python3
"""
Record an empty-room baseline for the RuView sensing-server.
ADR-103 v2: a persistent baseline override that stays stable across NBVI
re-selection between server restarts. The baseline is computed from the FULL
amplitude vector (all non-zero subcarriers), not from the dynamic NBVI
top-K subset.
Usage:
1. Operator steps out of the room.
2. Run: scripts/record-baseline.py [--duration 90] [--server localhost]
3. Wait for the "saved" message. Operator can come back.
4. Restart sensing-server to pick up the new baseline.
The script connects to the live WebSocket stream, records `duration`
seconds of per-node amplitudes, trims the first and last 15 seconds
(catches door-opening transients), then for each node finds the most
stable 30-second sub-window (lowest broadband CV) and writes per-node
full-broadband mean / median / p95 to data/baseline.json.
"""
import argparse
import asyncio
import json
import math
import statistics
import sys
import time
from datetime import datetime, timezone
from pathlib import Path
try:
import websockets
except ImportError:
print("error: pip install websockets", file=sys.stderr)
sys.exit(2)
def full_broadband_mean(amps):
"""Mean over all non-zero subcarriers (skips guard tones)."""
valid = [v for v in amps if v > 0]
return (sum(valid) / len(valid)) if valid else 0.0
def circular_mean_var(phases):
"""ADR-104 phase-domain: circular mean (radians) and circular variance
(1 - |R|, in [0, 1]) over a list of unwrapped/atan2 phase samples.
Variance close to 0 = phases tightly clustered (stable subcarrier,
suitable for baseline-comparison). Close to 1 = phases scattered
(subcarrier is noisy; baseline reference unreliable).
"""
n = len(phases)
if n == 0:
return (0.0, 1.0)
sx = sum(math.sin(p) for p in phases) / n
cx = sum(math.cos(p) for p in phases) / n
r = math.sqrt(sx * sx + cx * cx)
mean = math.atan2(sx, cx)
var = 1.0 - r
return (mean, var)
async def record(server: str, duration: float, port: int):
# Per-node frame log: (t_sec, amps, phases, rssi).
# ADR-104 phase-domain: phases captured alongside amplitudes when the
# WS payload carries `phases` (ADR-106 full complex CSI). Missing or
# empty phase vectors → trim_and_clean writes only amplitude baseline.
by_node: dict[int, list[tuple[float, list[float], list[float], float]]] = {}
url = f"ws://{server}:{port}/ws/sensing"
start = time.time()
print(f"connecting to {url} — recording {duration:.0f}s …", flush=True)
async with websockets.connect(url) as ws:
async for msg in ws:
d = json.loads(msg)
if d.get("type") != "sensing_update":
continue
t = time.time() - start
for n in d.get("nodes") or []:
a = n.get("amplitude") or []
if not a:
continue
ph = n.get("phases") or []
by_node.setdefault(n["node_id"], []).append(
(t, a, ph, n.get("rssi_dbm", 0.0))
)
if time.time() - start >= duration:
break
return by_node
def trim_and_clean(frames, trim_head_sec=15.0, trim_tail_sec=15.0, clean_window_sec=30.0):
"""Trim head/tail transients, then scan for the cleanest sub-window.
`frames` is a list of (t_sec, amps, phases, rssi). `phases` may be an
empty list when the server hasn't been upgraded to emit them — in
that case the resulting baseline omits the phase-domain fields and
the server falls back to amplitude-only drift (ADR-104 baseline mode).
"""
if not frames:
return None
t0 = frames[0][0]
t1 = frames[-1][0]
dur = t1 - t0
if dur < trim_head_sec + trim_tail_sec + clean_window_sec / 2:
head = dur / 6
tail = dur / 6
else:
head = trim_head_sec
tail = trim_tail_sec
trimmed = [f for f in frames if t0 + head <= f[0] <= t1 - tail]
if not trimmed:
return None
win = clean_window_sec
if (trimmed[-1][0] - trimmed[0][0]) <= win:
chunk = trimmed
else:
best = None # (cv, frames)
step = 5.0
cursor = trimmed[0][0]
while cursor + win <= trimmed[-1][0]:
window = [f for f in trimmed if cursor <= f[0] <= cursor + win]
if len(window) >= 5:
bms = [full_broadband_mean(a) for _, a, _, _ in window]
mu = statistics.mean(bms)
if mu > 0:
sd = statistics.pstdev(bms)
cv = sd / mu
if best is None or cv < best[0]:
best = (cv, window)
cursor += step
if best is None or not best[1]:
return None
chunk = best[1]
# ── Compute per-node stats on the clean window ───────────────
full_means = [full_broadband_mean(a) for _, a, _, _ in chunk]
rssis = [r for _, _, _, r in chunk if r != 0]
sorted_full = sorted(full_means)
# Per-subcarrier mean across the clean window (for diagnostic + future
# subcarrier-level comparison if the server gets that capability).
n_sub = min(len(a) for _, a, _, _ in chunk)
per_sub_means = []
for k in range(n_sub):
vs = [a[k] for _, a, _, _ in chunk if k < len(a) and a[k] > 0]
per_sub_means.append(statistics.mean(vs) if vs else 0.0)
# ADR-104 phase-domain: per-subcarrier circular mean + variance of the
# captured phase samples. Only included if the WS stream carried
# phases — server tolerates either schema.
have_phases = any(ph for _, _, ph, _ in chunk)
per_sub_phase_means: list[float] = []
per_sub_phase_vars: list[float] = []
if have_phases:
n_phase_sub = min(
(len(ph) for _, _, ph, _ in chunk if ph),
default=0,
)
for k in range(n_phase_sub):
samples = [ph[k] for _, _, ph, _ in chunk if k < len(ph)]
if not samples:
per_sub_phase_means.append(0.0)
per_sub_phase_vars.append(1.0)
continue
mean, var = circular_mean_var(samples)
per_sub_phase_means.append(mean)
per_sub_phase_vars.append(var)
result = {
# Persistent fields the server reads:
"full_broadband_mean": statistics.mean(full_means),
"full_broadband_p50": sorted_full[len(sorted_full)//2],
"full_broadband_p95": sorted_full[int(len(sorted_full)*0.95)],
"full_broadband_std": statistics.pstdev(full_means),
"full_broadband_cv_pct": 100*statistics.pstdev(full_means)/statistics.mean(full_means)
if statistics.mean(full_means) else 0.0,
# Reference:
"rssi_dbm": statistics.mean(rssis) if rssis else 0.0,
"n_samples": len(full_means),
"window_start_sec": chunk[0][0],
"window_end_sec": chunk[-1][0],
# Per-subcarrier diagnostic (kept so future server versions can do
# subcarrier-level comparison without re-recording):
"per_subcarrier_mean": [round(v, 3) for v in per_sub_means],
}
if per_sub_phase_means:
# Rounding: 4 decimals on mean phase (radian), 3 on variance
# — phase variance is in [0,1] so 3 decimals is plenty.
result["per_subcarrier_phase_mean"] = [round(v, 4) for v in per_sub_phase_means]
result["per_subcarrier_phase_var"] = [round(v, 3) for v in per_sub_phase_vars]
return result
def main():
ap = argparse.ArgumentParser(description=__doc__.splitlines()[1])
ap.add_argument("--duration", type=float, default=90.0, help="seconds to record (default 90)")
ap.add_argument("--server", default="localhost", help="sensing-server host")
ap.add_argument("--port", type=int, default=8765, help="ws port (default 8765)")
ap.add_argument("--out", type=Path, default=Path("v2/data/baseline.json"))
ap.add_argument("--trim-head", type=float, default=15.0)
ap.add_argument("--trim-tail", type=float, default=15.0)
ap.add_argument("--clean-window", type=float, default=30.0)
args = ap.parse_args()
by_node = asyncio.run(record(args.server, args.duration, args.port))
if not by_node:
print("no data received from server", file=sys.stderr)
sys.exit(1)
out = {
"version": 2,
"captured_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
"duration_sec": args.duration,
"trim_head_sec": args.trim_head,
"trim_tail_sec": args.trim_tail,
"clean_window_sec": args.clean_window,
"method": "record → trim head/tail → find lowest-CV sub-window → FULL-broadband stats per node",
"nodes": {},
}
print()
for nid, frames in sorted(by_node.items()):
result = trim_and_clean(frames, args.trim_head, args.trim_tail, args.clean_window)
if not result:
print(f"node {nid}: not enough data for cleaning (skipped)")
continue
out["nodes"][str(nid)] = result
print(f"node {nid}: {len(frames)} raw frames, kept cleanest {result['n_samples']}-sample window")
print(f" FULL broadband: mean={result['full_broadband_mean']:.2f} std={result['full_broadband_std']:.2f} CV={result['full_broadband_cv_pct']:.2f}%")
print(f" full p50={result['full_broadband_p50']:.2f} p95={result['full_broadband_p95']:.2f} rssi={result['rssi_dbm']:.1f}")
args.out.parent.mkdir(parents=True, exist_ok=True)
args.out.write_text(json.dumps(out, indent=2))
print(f"\nsaved → {args.out}")
print("restart sensing-server to load the new baseline.")
if __name__ == "__main__":
main()
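The reason `record-baseline.py` uses a circular mean for phases rather than `statistics.mean` shows up at the ±π wrap point of `atan2` output: two nearly identical phases on either side of the wrap average arithmetically to 0, the opposite direction. A small demonstration using the same formula as `circular_mean_var()`:

```python
from __future__ import annotations

import math
import statistics


def circular_mean_var(phases: list[float]) -> tuple[float, float]:
    """Circular mean (radians) and circular variance 1 - |R|, as in record-baseline.py."""
    n = len(phases)
    if n == 0:
        return (0.0, 1.0)
    sx = sum(math.sin(p) for p in phases) / n
    cx = sum(math.cos(p) for p in phases) / n
    return (math.atan2(sx, cx), 1.0 - math.hypot(sx, cx))


# Two phase samples just either side of the +pi/-pi wrap point.
samples = [3.1, -3.1]
naive = statistics.mean(samples)                  # 0.0: points the opposite way
circ_mean, circ_var = circular_mean_var(samples)  # about +/-pi, variance near 0
print(f"naive={naive:.3f} circular={circ_mean:.3f} var={circ_var:.4f}")
```

The near-zero circular variance is what marks such a subcarrier as stable enough to serve as a baseline reference, even though its raw samples look far apart numerically.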


@ -1515,6 +1515,40 @@ export class LiveDemoTab {
} catch (error) {
this.logger.warn('Could not fetch models', { error: error.message });
}
// ADR-116 / ADR-117: surface WiFlow-v1 in the Model Control dropdown
// when the server reports `pose_estimation: true` via /api/v1/info.
// WiFlow is loaded outside the RVF model registry path (--wiflow-model
// flag) so listModels() above doesn't return it. We add a virtual
// entry and mark it active ONLY when no RVF model is already active
// — otherwise the dropdown would silently flip from the operator's
// chosen RVF model to "WiFlow-v1" every fetch.
try {
const r = await fetch('/api/v1/info');
if (r.ok) {
const info = await r.json();
if (info?.features?.pose_estimation) {
if (!this.modelState.models.some(m => m.id === 'wiflow-v1')) {
this.modelState.models.unshift({
id: 'wiflow-v1',
name: 'WiFlow-v1 (lite, 186K params, --wiflow-model)',
});
}
if (!this.modelState.activeModelId) {
this.modelState.activeModelId = 'wiflow-v1';
this.modelState.activeModelInfo = {
model_id: 'wiflow-v1',
name: 'WiFlow-v1',
version: 'lite',
pck_score: 0.929, // from model card; eval-set, not this deployment
};
}
this.populateModelSelector();
this.updateModelUI();
}
}
} catch (e) {
this.logger.warn('ADR-116 info probe failed', { error: e.message });
}
}
populateModelSelector() {


@ -51,6 +51,17 @@ export class PoseDetectionCanvas {
this.showTrail = false;
this.maxTrailLength = 10;
// ADR-105 / ADR-113: model-load gating. The canvas refuses to draw
// skeletons until /api/v1/pose/stats reports model_loaded === true,
// so an empty/zero-confidence keypoint stream from a model-less
// server doesn't paint a misleading "phantom" pose.
//
// null = "haven't asked yet" (treated as not-loaded for rendering).
this.modelLoaded = null;
this.modelStatusUrl = options.modelStatusUrl || '/api/v1/pose/stats';
this.modelStatusPollMs = options.modelStatusPollMs || 30000;
this.modelStatusTimer = null;
// Initialize component
this.initializeComponent();
}
@ -79,9 +90,79 @@ export class PoseDetectionCanvas {
// Set up pose service subscription
this.setupPoseServiceSubscription();
// ADR-105: poll model_loaded so we can hide the canvas when no
// trained pose model is on the server.
this.checkModelStatus();
this.modelStatusTimer = setInterval(
() => this.checkModelStatus(),
this.modelStatusPollMs
);
this.logger.info('PoseDetectionCanvas component initialized successfully');
}
/**
* Fetch `/api/v1/pose/stats` and update `this.modelLoaded`. On a
* transition to not-loaded (null → false, true → false) we hide the
* pose canvas and overlay a "No model loaded" notice so the operator
* isn't fooled by an empty skeleton renderer; a transition to true
* shows the canvas again. Network errors leave the state unchanged.
*/
async checkModelStatus() {
try {
const resp = await fetch(this.modelStatusUrl, { cache: 'no-store' });
if (!resp.ok) {
// Server reachable but not surfacing pose stats — be safe.
this.setModelLoaded(false, 'pose-stats endpoint error');
return;
}
const json = await resp.json();
const loaded = json && json.model_loaded === true;
this.setModelLoaded(loaded, null);
} catch (e) {
// Network blip — don't flip-flop the UI on a transient failure.
this.logger.debug('model-status poll failed', { err: e.message });
}
}
setModelLoaded(loaded, errOrNull) {
if (this.modelLoaded === loaded) return;
this.modelLoaded = loaded;
this.logger.info('model-loaded state changed', { loaded, note: errOrNull });
this.updateCanvasVisibility();
}
updateCanvasVisibility() {
if (!this.canvas) return;
const wrap = this.canvas.parentElement; // .pose-canvas-container
const overlayId = `model-overlay-${this.containerId}`;
let overlay = document.getElementById(overlayId);
if (this.modelLoaded === true) {
this.canvas.style.visibility = 'visible';
if (overlay) overlay.style.display = 'none';
return;
}
// No model — hide the canvas and show a clear notice.
this.canvas.style.visibility = 'hidden';
if (!overlay && wrap) {
overlay = document.createElement('div');
overlay.id = overlayId;
overlay.className = 'pose-model-missing';
overlay.style.cssText =
'position:absolute;inset:0;display:flex;align-items:center;' +
'justify-content:center;color:#888;font-family:JetBrains Mono,monospace;' +
'font-size:13px;text-align:center;padding:20px;background:#0d1117;';
overlay.innerHTML =
'No trained pose model loaded.<br>' +
'<span style="color:#555;font-size:11px;">' +
'Pose rendering disabled — sensing channels still active in ' +
'the Sensing / Hardware tabs (ADR-105).</span>';
wrap.style.position = 'relative';
wrap.appendChild(overlay);
} else if (overlay) {
overlay.style.display = 'flex';
}
}
createDOMStructure() {
this.container.innerHTML = `
<div class="pose-detection-canvas-wrapper">
@ -516,6 +597,13 @@ export class PoseDetectionCanvas {
if (!this.renderer || !this.state.isActive) {
return;
}
// ADR-105: refuse to paint anything when the server has no trained
// pose model — empty/zero-confidence keypoints would otherwise show
// up as a misleading skeleton. The overlay from
// updateCanvasVisibility() already tells the operator why.
if (this.modelLoaded !== true) {
return;
}
try {
// Render trail before the current frame if enabled
@ -1535,6 +1623,12 @@ export class PoseDetectionCanvas {
this.unsubscribeFunctions.forEach(unsubscribe => unsubscribe());
this.unsubscribeFunctions = [];
// ADR-105: stop the model-status poll.
if (this.modelStatusTimer) {
clearInterval(this.modelStatusTimer);
this.modelStatusTimer = null;
}
// Clean up resize observer
if (this.resizeObserver) {
this.resizeObserver.disconnect();


@@ -488,8 +488,10 @@
</div>
</section>
<!-- Sensing Tab -->
<section id="sensing" class="tab-content"></section>
<!-- Sensing Tab (ADR-117: container div required by app.js SensingTab.mount) -->
<section id="sensing" class="tab-content">
<div id="sensing-container"></div>
</section>
<!-- Training Tab -->
<section id="training" class="tab-content">

@@ -1,3 +1,8 @@
export const WS_PATH = '/api/v1/stream/pose';
// RuView sensing-server (Rust+Axum) exposes the live stream at /ws/sensing on
// its dedicated WebSocket port (default 8765). The legacy wifi-densepose v1
// path (/api/v1/stream/pose) is kept as a fallback in case the mobile app is
// pointed at an old FastAPI backend.
export const WS_PATH = '/ws/sensing';
export const WS_PORT = 8765;
export const RECONNECT_DELAYS = [1000, 2000, 4000, 8000, 16000];
export const MAX_RECONNECT_ATTEMPTS = 10;
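The reconnect constants describe a capped backoff. The lookup that consumes them is not part of this diff, so the Rust sketch below makes two assumptions. First, attempts past the end of `RECONNECT_DELAYS` reuse the last delay. Second, the client gives up once `MAX_RECONNECT_ATTEMPTS` is reached, and after this change it then stays disconnected. `next_delay_ms` is a hypothetical helper, not part of the service.

```rust
/// Delays mirrored from RECONNECT_DELAYS (milliseconds).
const RECONNECT_DELAYS: [u64; 5] = [1000, 2000, 4000, 8000, 16000];
/// Mirrors MAX_RECONNECT_ATTEMPTS.
const MAX_RECONNECT_ATTEMPTS: usize = 10;

/// Hypothetical helper: delay before reconnect attempt `attempt` (0-based),
/// or `None` once the client should give up and stay disconnected.
/// Assumption: attempts beyond the table reuse the last (largest) delay.
fn next_delay_ms(attempt: usize) -> Option<u64> {
    if attempt >= MAX_RECONNECT_ATTEMPTS {
        return None;
    }
    let idx = attempt.min(RECONNECT_DELAYS.len() - 1);
    Some(RECONNECT_DELAYS[idx])
}
```

Under those assumptions, the client waits 1 + 2 + 4 + 8 + 6 × 16 = 111 s in total before giving up.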

@@ -124,8 +124,11 @@ export const MATScreen = () => {
const { height } = useWindowDimensions();
const webHeight = Math.max(240, Math.floor(height * 0.5));
const showOverlay = dataSource === 'simulated' && !simulationAcknowledged;
const showBanner = dataSource === 'simulated' && simulationAcknowledged;
// Simulation overlay/banner removed — UI shows only real signals from the
// sensing-server. The `dataSource === 'simulated'` branch is never reached
// in production builds (server refuses --source simulate).
const showOverlay = false;
const showBanner = false;
return (
<ThemedView style={{ flex: 1, backgroundColor: colors.bg, padding: spacing.md }}>

@@ -60,7 +60,7 @@ export default function VitalsScreen() {
<ConnectionBanner status={bannerStatus} />
<ScrollView contentContainerStyle={styles.content} showsVerticalScrollIndicator={false}>
<View style={styles.headerRow}>{isSimulated ? <ModeBadge mode="SIM" /> : null}</View>
<View style={styles.headerRow}>{/* SIM badge removed: production shows only real signals. */}</View>
<View style={styles.gaugesRow}>
<View style={styles.gaugeCard}>

@@ -1,7 +1,5 @@
import { SIMULATION_TICK_INTERVAL_MS } from '@/constants/simulation';
import { MAX_RECONNECT_ATTEMPTS, RECONNECT_DELAYS, WS_PATH } from '@/constants/websocket';
import { MAX_RECONNECT_ATTEMPTS, RECONNECT_DELAYS, WS_PATH, WS_PORT } from '@/constants/websocket';
import { usePoseStore } from '@/stores/poseStore';
import { generateSimulatedData } from '@/services/simulation.service';
import type { ConnectionStatus, SensingFrame } from '@/types/sensing';
type FrameListener = (frame: SensingFrame) => void;
@@ -11,7 +9,6 @@ class WsService {
private listeners = new Set<FrameListener>();
private reconnectAttempt = 0;
private reconnectTimer: ReturnType<typeof setTimeout> | null = null;
private simulationTimer: ReturnType<typeof setInterval> | null = null;
private targetUrl = '';
private active = false;
private status: ConnectionStatus = 'disconnected';
@@ -22,8 +19,9 @@ class WsService {
this.reconnectAttempt = 0;
if (!url) {
this.handleStatusChange('simulated');
this.startSimulation();
// No server URL configured — stay disconnected. Production builds
// never fall back to synthetic data.
this.handleStatusChange('disconnected');
return;
}
@@ -40,7 +38,6 @@ class WsService {
socket.onopen = () => {
this.reconnectAttempt = 0;
this.stopSimulation();
this.handleStatusChange('connected');
};
@@ -78,7 +75,6 @@ class WsService {
disconnect(): void {
this.active = false;
this.clearReconnectTimer();
this.stopSimulation();
if (this.ws) {
this.ws.close(1000, 'client disconnect');
this.ws = null;
@@ -100,7 +96,9 @@ class WsService {
private buildWsUrl(rawUrl: string): string {
const parsed = new URL(rawUrl);
const proto = parsed.protocol === 'https:' || parsed.protocol === 'wss:' ? 'wss:' : 'ws:';
return `${proto}//${parsed.host}${WS_PATH}`;
// RuView sensing-server runs WS on a separate port (WS_PORT, default 8765),
// independent of the HTTP API port. Build the WS URL with that port.
return `${proto}//${parsed.hostname}:${WS_PORT}${WS_PATH}`;
}
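`buildWsUrl` keeps only the host of the configured server URL, substitutes the dedicated WebSocket port and path, and upgrades to `wss:` for HTTPS. The std-only Rust sketch below shows the same mapping for illustration. `ws_url` is a hypothetical helper. Its hand-rolled parsing ignores percent-encoding and other URL corner cases that the browser's `URL` class handles.

```rust
/// Hypothetical helper mirroring `buildWsUrl`: derive the WebSocket URL from
/// the configured HTTP server URL by keeping only the host and substituting
/// the dedicated WS port and path. Minimal parsing only; a real URL parser
/// is preferable in practice.
fn ws_url(raw: &str, ws_port: u16, ws_path: &str) -> Option<String> {
    let (scheme, rest) = raw.split_once("://")?;
    // https/wss map to secure WebSocket; everything else uses plain ws.
    let proto = match scheme.to_ascii_lowercase().as_str() {
        "https" | "wss" => "wss",
        _ => "ws",
    };
    // The authority ends at the first '/', '?' or '#'.
    let authority = rest.split(|c: char| c == '/' || c == '?' || c == '#').next()?;
    // Drop any "user:pass@" prefix.
    let host_port = authority.rsplit_once('@').map_or(authority, |(_, h)| h);
    // Bracketed IPv6 literals keep their brackets; otherwise cut at the port colon.
    let host = if host_port.starts_with('[') {
        &host_port[..=host_port.find(']')?]
    } else {
        host_port.split(':').next()?
    };
    if host.is_empty() {
        return None;
    }
    Some(format!("{proto}://{host}:{ws_port}{ws_path}"))
}
```

Because the HTTP port is discarded, pointing Settings at `http://host:8080` still yields a socket on 8765.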
private handleStatusChange(status: ConnectionStatus): void {
@@ -118,8 +116,8 @@ class WsService {
}
if (this.reconnectAttempt >= MAX_RECONNECT_ATTEMPTS) {
this.handleStatusChange('simulated');
this.startSimulation();
// Give up — stay disconnected. No synthetic fallback.
this.handleStatusChange('disconnected');
return;
}
@@ -130,27 +128,6 @@ class WsService {
this.reconnectTimer = null;
this.connect(this.targetUrl);
}, delay);
this.startSimulation();
}
private startSimulation(): void {
if (this.simulationTimer) {
return;
}
this.simulationTimer = setInterval(() => {
this.handleStatusChange('simulated');
const frame = generateSimulatedData();
this.listeners.forEach((listener) => {
listener(frame);
});
}, SIMULATION_TICK_INTERVAL_MS);
}
private stopSimulation(): void {
if (this.simulationTimer) {
clearInterval(this.simulationTimer);
this.simulationTimer = null;
}
}
private clearReconnectTimer(): void {

@@ -26,8 +26,8 @@ export const useMatStore = create<MatState>((set) => ({
survivors: [],
alerts: [],
selectedEventId: null,
dataSource: 'simulated',
simulationAcknowledged: false,
dataSource: 'real',
simulationAcknowledged: true,
upsertEvent: (event) => {
set((state) => {

@@ -18,7 +18,9 @@ export interface SettingsState {
export const useSettingsStore = create<SettingsState>()(
persist(
(set) => ({
serverUrl: 'http://localhost:3000',
// Defaults to the Mac's Tailscale IP so the phone can reach the
// sensing-server from any network. Override in Settings if needed.
serverUrl: 'http://100.123.189.10:8080',
rssiScanEnabled: false,
theme: 'system',
alertSoundEnabled: true,

@@ -1,4 +1,4 @@
use std::net::{SocketAddr, UdpSocket};
use std::net::{IpAddr, Ipv4Addr, SocketAddr, UdpSocket};
use std::time::Duration;
use mdns_sd::{ServiceDaemon, ServiceEvent};
@@ -37,13 +37,15 @@ pub async fn discover_nodes(
) -> Result<Vec<DiscoveredNode>, String> {
let timeout_duration = Duration::from_millis(timeout_ms.unwrap_or(3000));
// Run mDNS and UDP discovery concurrently
let (mdns_nodes, udp_nodes) = tokio::join!(
discover_via_mdns(timeout_duration),
discover_via_udp(timeout_duration),
);
// Current RuView FW doesn't advertise mDNS `_ruview._udp.local.` and
// doesn't respond to UDP broadcast beacons, so those two paths return
// nothing on every poll and just burn CPU/network. HTTP sweep alone
// suffices for our deployment.
let http_nodes = discover_via_http_sweep(timeout_duration).await;
let mdns_nodes: Result<Vec<DiscoveredNode>, String> = Ok(Vec::new());
let udp_nodes: Result<Vec<DiscoveredNode>, String> = Ok(Vec::new());
// Merge results, deduplicating by MAC address
// Merge results, deduplicating by MAC address (or IP for HTTP-only nodes)
let mut registry = NodeRegistry::new();
for node in mdns_nodes.unwrap_or_default() {
@@ -58,7 +60,23 @@ pub async fn discover_nodes(
}
}
let http_vec = http_nodes.unwrap_or_default();
let _ = std::fs::OpenOptions::new().create(true).append(true)
.open("/tmp/ruview-discovery.log")
.map(|mut f| { use std::io::Write; let _ = writeln!(f, "[discover] http_vec.len()={}", http_vec.len()); });
for node in http_vec {
// HTTP-swept nodes may lack a MAC; key those by an IP-derived pseudo-MAC.
let key = node.mac.clone().unwrap_or_else(|| format!("ip:{}", node.ip));
let _ = std::fs::OpenOptions::new().create(true).append(true)
.open("/tmp/ruview-discovery.log")
.map(|mut f| { use std::io::Write; let _ = writeln!(f, "[discover] upsert key={} ip={}", key, node.ip); });
registry.upsert(MacAddress::new(&key), node);
}
let nodes: Vec<DiscoveredNode> = registry.all().into_iter().cloned().collect();
let _ = std::fs::OpenOptions::new().create(true).append(true)
.open("/tmp/ruview-discovery.log")
.map(|mut f| { use std::io::Write; let _ = writeln!(f, "[discover] returning {} nodes", nodes.len()); });
// Update global state
{
@@ -219,6 +237,155 @@ async fn discover_via_udp(timeout_duration: Duration) -> Result<Vec<DiscoveredNo
/// Parse a UDP beacon response into a DiscoveredNode.
/// Format: RUVIEW_BEACON|<mac>|<node_id>|<version>|<chip>|<role>|<tdm_slot>|<tdm_total>
/// Discover nodes via HTTP probes on port 8032 across the local /24 subnet.
///
/// Strategy:
/// 1. Detect the host IPv4 by "connecting" a UDP socket to 8.8.8.8 (no packets
///    are sent; the OS only selects the outbound interface).
/// 2. For each host octet in 2..=60 (excluding self), send
///    `GET http://X.X.X.X:8032/status` (FW5.47) and, if that fails,
///    `GET http://X.X.X.X:8032/ota/status` (RuView), each with a short
///    per-request timeout.
/// 3. If the response body parses as JSON, treat the device as a RuView CSI
///    node and build a `DiscoveredNode` (`version` becomes the firmware
///    version).
///
/// MAC is `None` unless the response carries a `node` field; UI manual add
/// or a future FW field could fill it in.
async fn discover_via_http_sweep(timeout_duration: Duration) -> Result<Vec<DiscoveredNode>, String> {
// 1. Detect host IPv4
let host_ip = match detect_host_ipv4() {
Some(ip) => ip,
None => {
tracing::warn!("HTTP sweep: could not determine host IPv4");
return Ok(Vec::new());
}
};
let octets = host_ip.octets();
let base = (octets[0], octets[1], octets[2]);
tracing::info!("HTTP sweep on {}.{}.{}.0/24 (self={})", base.0, base.1, base.2, host_ip);
// 2. Build HTTP client with per-request timeout
// Per-request timeout: generous enough for the ESP32 HTTP server to respond
// even under WiFi contention. All probes run in parallel, but each one may
// make two sequential requests (/status, then /ota/status), so a silent
// host can take up to ~2 × per_req_timeout ≈ 3 s to resolve.
let per_req_timeout = std::cmp::min(timeout_duration, Duration::from_millis(1500));
let client = match reqwest::Client::builder()
.timeout(per_req_timeout)
.build()
{
Ok(c) => c,
Err(e) => {
tracing::warn!("HTTP sweep: client build failed: {}", e);
return Ok(Vec::new());
}
};
// 3. Probe all candidate hosts in parallel (one spawned task per host)
let mut tasks: Vec<tokio::task::JoinHandle<Option<DiscoveredNode>>> = Vec::new();
// Scan only the low end of the /24 (2..=60), a typical home/office DHCP
// pool for IoT devices. Sweeping all 254 hosts on every poll saturated the
// tokio runtime and caused UI lag. Operators with sensors at higher host
// offsets should widen this range.
for h in 2u8..=60u8 {
if h == octets[3] {
continue; // skip self
}
let ip = format!("{}.{}.{}.{}", base.0, base.1, base.2, h);
let client = client.clone();
tasks.push(tokio::spawn(async move {
// Probe FW5.47 /status first, then RuView /ota/status fallback.
let url1 = format!("http://{}:8032/status", ip);
let body: String = match client.get(&url1).send().await {
Ok(r) if r.status().is_success() => match r.text().await {
Ok(t) => t,
Err(_) => return None,
},
_ => {
let url2 = format!("http://{}:8032/ota/status", ip);
match client.get(&url2).send().await {
Ok(r) if r.status().is_success() => match r.text().await {
Ok(t) => t,
Err(_) => return None,
},
_ => return None,
}
}
};
let _ = std::fs::OpenOptions::new().create(true).append(true)
.open("/tmp/ruview-discovery.log")
.map(|mut f| { use std::io::Write; let _ = writeln!(f, "[probe] {} OK len={}", ip, body.len()); });
let v: serde_json::Value = match serde_json::from_str(&body) {
Ok(v) => v,
Err(e) => {
let _ = std::fs::OpenOptions::new().create(true).append(true)
.open("/tmp/ruview-discovery.log")
.map(|mut f| { use std::io::Write; let _ = writeln!(f, "[probe] {} json err: {}", ip, e); });
return None;
}
};
// FW5.47 reports `version`, `fw`, `node`; RuView reports `version`,
// `running_partition`. Prefer a string `version`; fall back to the JSON
// rendering of a non-string value (e.g. a number).
let version = v.get("version").and_then(|x| x.as_str()).map(String::from)
.or_else(|| v.get("version").map(|x| x.to_string()))
.unwrap_or_else(|| "unknown".to_string());
let mac = v.get("node").and_then(|x| x.as_str()).map(String::from);
Some(DiscoveredNode {
ip,
mac,
hostname: None,
node_id: 0,
firmware_version: Some(version),
health: HealthStatus::Online,
last_seen: chrono::Utc::now().to_rfc3339(),
chip: Chip::Esp32s3,
mesh_role: MeshRole::Node,
discovery_method: DiscoveryMethod::HttpSweep,
tdm_slot: None,
tdm_total: None,
edge_tier: None,
uptime_secs: None,
capabilities: Some(NodeCapabilities {
wasm: false,
ota: true,
csi: true,
}),
friendly_name: None,
notes: None,
})
}));
}
// 4. Wait with overall budget
// Wait for ALL tasks to settle in parallel, bounded by the overall budget.
// Previously used a sequential `for task in tasks { select! }` which awaited
// tasks in IP order — a non-responding 192.168.1.1 blocked discovery of
// 192.168.1.17/19 even though those completed in ~50 ms.
let join_all_fut = futures::future::join_all(tasks);
let results = match tokio::time::timeout(timeout_duration, join_all_fut).await {
Ok(rs) => rs,
Err(_) => {
tracing::info!("HTTP sweep timeout — partial results lost");
Vec::new()
}
};
let mut found = Vec::new();
for r in results {
if let Ok(Some(node)) = r {
tracing::info!("HTTP sweep found {} fw={:?}", node.ip, node.firmware_version);
found.push(node);
}
}
Ok(found)
}
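Wrapping `join_all` in `tokio::time::timeout` is all-or-nothing. When the budget expires, every probe result is dropped, including results from probes that already succeeded. A probe that falls back from `/status` to `/ota/status` can run for up to two per-request timeouts, so short budgets make this likely. The std-only sketch below shows one way to keep partial results: each probe reports over a channel, and the collector stops reading at the deadline. It uses OS threads and `mpsc` purely for illustration; in the async code, an equivalent would drain a `futures::stream::FuturesUnordered` under a deadline. `collect_until_deadline` is a hypothetical helper.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: run every probe concurrently and return the results
/// that arrive before `budget` elapses. Probes still running at the deadline
/// are abandoned; their later sends fail silently.
fn collect_until_deadline<T, F>(probes: Vec<F>, budget: Duration) -> Vec<T>
where
    T: Send + 'static,
    F: FnOnce() -> Option<T> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    let expected = probes.len();
    for probe in probes {
        let tx = tx.clone();
        thread::spawn(move || {
            // A send error only means the collector already gave up.
            let _ = tx.send(probe());
        });
    }
    drop(tx); // only the probe threads hold senders now
    let deadline = Instant::now() + budget;
    let mut found = Vec::new();
    for _ in 0..expected {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(Some(item)) => found.push(item), // probe found a node
            Ok(None) => {}                      // probe answered "not a node"
            Err(_) => break,                    // deadline hit, or remaining probes panicked
        }
    }
    found
}
```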
/// Determine the primary IPv4 of this host by "connecting" a UDP socket to a
/// public address (8.8.8.8). UDP connect sends no packets; it only makes the
/// OS pick the outbound interface, whose address `local_addr` then reports.
fn detect_host_ipv4() -> Option<Ipv4Addr> {
let sock = UdpSocket::bind("0.0.0.0:0").ok()?;
sock.connect("8.8.8.8:80").ok()?;
let local = sock.local_addr().ok()?;
match local.ip() {
IpAddr::V4(v4) if !v4.is_loopback() => Some(v4),
_ => None,
}
}
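Steps 1 and 2 of the sweep reduce to enumerating candidate addresses in the host's /24. The std-only sketch below shows that enumeration. `sweep_targets` is a hypothetical helper. The `2..=60` bounds come from the loop in `discover_via_http_sweep`, not from any standard DHCP range.

```rust
use std::net::Ipv4Addr;

/// Hypothetical helper: list the /24 addresses to probe, mirroring the loop in
/// `discover_via_http_sweep` (host octets `first..=last`, skipping this host).
fn sweep_targets(host: Ipv4Addr, first: u8, last: u8) -> Vec<Ipv4Addr> {
    let [a, b, c, own] = host.octets();
    (first..=last)
        .filter(|&h| h != own) // never probe ourselves
        .map(|h| Ipv4Addr::new(a, b, c, h))
        .collect()
}
```

With the default bounds, a host at `.17` probes 58 addresses. Widening the range to `1..=254` would mean 253 probes per poll.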
fn parse_beacon_response(data: &[u8], addr: SocketAddr) -> Option<DiscoveredNode> {
let text = std::str::from_utf8(data).ok()?;
let parts: Vec<&str> = text.split('|').collect();

@@ -101,17 +101,47 @@ pub async fn start_server(
if let Some(port) = config.udp_port {
cmd.args(["--udp-port", &port.to_string()]);
}
if let Some(ref bind_addr) = config.bind_address {
cmd.args(["--bind", bind_addr]);
}
// Bind address: default to 0.0.0.0 so LAN-connected ESP32 nodes can reach us.
let bind_addr = config
.bind_address
.as_deref()
.unwrap_or("0.0.0.0");
cmd.args(["--bind-addr", bind_addr]);
// Pass log level via RUST_LOG env (sensing-server reads tracing_subscriber env).
if let Some(ref log_level) = config.log_level {
cmd.args(["--log-level", log_level]);
cmd.env("RUST_LOG", log_level);
}
// Set data source (default to "simulate" if not specified for demo mode)
let source = config.source.as_deref().unwrap_or("simulate");
// Set data source (default to "esp32" for real CSI ingest; UI may override)
let source = config.source.as_deref().unwrap_or("esp32");
cmd.args(["--source", source]);
// Auto-load bundled vital-signs RVF model if present next to the binary.
// Searches: <exe_dir>/wifi-densepose-v1.rvf, then <resource_dir>/wifi-densepose-v1.rvf.
let mut model_path: Option<std::path::PathBuf> = None;
if let Ok(exe) = std::env::current_exe() {
if let Some(dir) = exe.parent() {
let candidate = dir.join("wifi-densepose-v1.rvf");
if candidate.exists() {
model_path = Some(candidate);
}
}
}
if model_path.is_none() {
if let Ok(resource_dir) = app.path().resource_dir() {
let candidate = resource_dir.join("wifi-densepose-v1.rvf");
if candidate.exists() {
model_path = Some(candidate);
}
}
}
if let Some(p) = model_path {
tracing::info!("Auto-loading vital-signs RVF model: {}", p.display());
cmd.args(["--load-rvf", &p.to_string_lossy()]);
} else {
tracing::warn!("No wifi-densepose-v1.rvf found next to binary or in resources; vital signs disabled");
}
// Redirect stdout/stderr to pipes for monitoring
cmd.stdout(Stdio::piped());
cmd.stderr(Stdio::piped());
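The two nested lookups for `wifi-densepose-v1.rvf` are a first-match search over an ordered list of directories. The std-only sketch below shows that pattern. `find_first_existing` is hypothetical, and the caller is assumed to supply the executable directory and the Tauri resource directory in that order.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: return `dir/file_name` for the first directory in
/// `dirs` that contains the file, preserving the caller's precedence order.
fn find_first_existing(dirs: &[&Path], file_name: &str) -> Option<PathBuf> {
    dirs.iter()
        .map(|dir| dir.join(file_name))
        .find(|candidate| candidate.exists())
}
```

Keeping the candidates in one ordered slice makes the precedence explicit. It also makes adding a third location a one-line change.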

@@ -1,12 +1,12 @@
{
"name": "ruview-desktop-ui",
"version": "0.3.0",
"version": "0.4.4",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "ruview-desktop-ui",
"version": "0.3.0",
"version": "0.4.4",
"dependencies": {
"@tauri-apps/api": "^2.0.0",
"@tauri-apps/plugin-dialog": "^2.6.0",
@@ -53,7 +53,6 @@
"integrity": "sha512-CGOfOJqWjg2qW/Mb6zNsDm+u5vFQ8DxXfbM09z69p5Z6+mE1ikP2jUXw+j42Pf1XTYED2Rni5f95npYeuwMDQA==",
"dev": true,
"license": "MIT",
"peer": true,
"dependencies": {
"@babel/code-frame": "^7.29.0",
"@babel/generator": "^7.29.0",
@@ -1247,7 +1246,6 @@
"integrity": "sha512-z9VXpC7MWrhfWipitjNdgCauoMLRdIILQsAEV+ZesIzBq/oUlxk0m3ApZuMFCXdnS4U7KrI+l3WRUEGQ8K1QKw==",
"dev": true,
"license": "MIT",
"peer": true,
"dependencies": {
"@types/prop-types": "*",
"csstype": "^3.2.2"
@@ -1317,7 +1315,6 @@
}
],
"license": "MIT",
"peer": true,
"dependencies": {
"baseline-browser-mapping": "^2.9.0",
"caniuse-lite": "^1.0.30001759",
@@ -1587,7 +1584,6 @@
"integrity": "sha512-5gTmgEY/sqK6gFXLIsQNH19lWb4ebPDLA4SdLP7dsWkIXHWlG66oPuVvXSGFPppYZz8ZDZq0dYYrbHfBCVUb1Q==",
"dev": true,
"license": "MIT",
"peer": true,
"engines": {
"node": ">=12"
},
@@ -1629,7 +1625,6 @@
"resolved": "https://registry.npmjs.org/react/-/react-18.3.1.tgz",
"integrity": "sha512-wS+hAgJShR0KhEvPJArfuPVN1+Hz1t0Y6n5jLrGQbkb4urgPE/0Rve+1kMB1v/oWgHgm4WIcV+i7F2pTVj+2iQ==",
"license": "MIT",
"peer": true,
"dependencies": {
"loose-envify": "^1.1.0"
},
@@ -1802,7 +1797,6 @@
"integrity": "sha512-+Oxm7q9hDoLMyJOYfUYBuHQo+dkAloi33apOPP56pzj+vsdJDzr+j1NISE5pyaAuKL4A3UD34qd0lx5+kfKp2g==",
"dev": true,
"license": "MIT",
"peer": true,
"dependencies": {
"esbuild": "^0.25.0",
"fdir": "^6.4.4",

@@ -3,7 +3,7 @@ import { invoke } from "@tauri-apps/api/core";
import type { Node } from "../types";
interface UseNodesOptions {
/** Auto-poll interval in milliseconds. Set to 0 to disable. Default: 10000 */
/** Auto-poll interval in milliseconds. Set to 0 to disable. Default: 30000 */
pollInterval?: number;
/** Whether to start scanning on mount. Default: false */
autoScan?: boolean;
@@ -23,7 +23,7 @@ interface UseNodesReturn {
}
export function useNodes(options: UseNodesOptions = {}): UseNodesReturn {
const { pollInterval = 10_000, autoScan = false } = options;
const { pollInterval = 30_000, autoScan = false } = options;
const [nodes, setNodes] = useState<Node[]>([]);
const [isScanning, setIsScanning] = useState(false);
@@ -37,9 +37,15 @@ export function useNodes(options: UseNodesOptions = {}): UseNodesReturn {
try {
const discovered = await invoke<Node[]>("discover_nodes", {
timeoutMs: 5000,
timeoutMs: 8000,
});
setNodes(discovered);
// Discovery is flaky on busy LANs — overall timeout races with the
// per-request reqwest timeouts and sometimes returns 0 even when
// sensors are reachable. Keep the last good list rather than
// flashing to "no nodes".
if (discovered.length > 0) {
setNodes(discovered);
}
} catch (err) {
const message =
err instanceof Error ? err.message : String(err);

@@ -5,11 +5,11 @@ import type { ServerConfig, ServerStatus } from "../types";
const DEFAULT_CONFIG: ServerConfig = {
http_port: 8080,
ws_port: 8765,
udp_port: 5005,
udp_port: 5006,
static_dir: null,
model_dir: null,
log_level: "info",
source: "simulate",
source: "esp32",
};
interface UseServerOptions {

@@ -36,9 +36,18 @@ const Dashboard: React.FC<DashboardProps> = ({ onNavigate }) => {
setScanError(null);
try {
const { invoke } = await import("@tauri-apps/api/core");
const found = await invoke<DiscoveredNode[]>("discover_nodes", { timeoutMs: 3000 });
setNodes(found);
if (found.length === 0) {
const found = await invoke<DiscoveredNode[]>("discover_nodes", { timeoutMs: 8000 });
// Merge with existing list — discovery on busy LANs sometimes misses
// a node it found in the previous round. Add new entries, refresh
// ones we see again, keep previously-found ones.
if (found.length > 0) {
setNodes((prev) => {
const byIp = new Map(prev.map((n) => [n.ip, n]));
for (const n of found) byIp.set(n.ip, n);
return Array.from(byIp.values());
});
setScanError(null);
} else if (nodes.length === 0) {
setScanError("No nodes found. Ensure ESP32 devices are powered on and connected to the network.");
}
} catch (err) {

@@ -68,7 +68,14 @@ const NetworkDiscovery: React.FC<NetworkDiscoveryProps> = ({ onNavigate }) => {
const found = await invoke<DiscoveredNode[]>("discover_nodes", {
timeoutMs: scanDuration,
});
setNodes(found);
// Merge with existing — flaky LAN scans sometimes miss a node that
// was found a moment ago. Add new entries, refresh ones we see again,
// keep previously-found ones (incl. manual-added).
setNodes((prev) => {
const byIp = new Map(prev.map((n) => [n.ip, n]));
for (const n of found) byIp.set(n.ip, n);
return Array.from(byIp.values());
});
} catch (err) {
setError(err instanceof Error ? err.message : String(err));
} finally {
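Both `Dashboard` and `NetworkDiscovery` now merge each scan into the previous list through a `Map` keyed by IP, relying on `Map`'s insertion order. The Rust sketch below performs the same merge, with a hypothetical two-field `Node` standing in for `DiscoveredNode`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `DiscoveredNode`.
#[derive(Debug, Clone, PartialEq)]
struct Node {
    ip: String,
    firmware_version: Option<String>,
}

/// Merge a fresh scan into the previous list, keyed by IP: previously seen
/// nodes keep their position, re-seen nodes are refreshed in place, and new
/// nodes are appended in scan order (the ordering a JS `Map` gives).
fn merge_by_ip(prev: &[Node], found: &[Node]) -> Vec<Node> {
    let mut merged: Vec<Node> = prev.to_vec();
    let mut index: HashMap<String, usize> =
        merged.iter().enumerate().map(|(i, n)| (n.ip.clone(), i)).collect();
    for n in found {
        let existing = index.get(&n.ip).copied();
        match existing {
            Some(i) => merged[i] = n.clone(), // refresh in place
            None => {
                index.insert(n.ip.clone(), merged.len());
                merged.push(n.clone()); // new node goes to the end
            }
        }
    }
    merged
}
```

One consequence of this design is that a node that disappears is never dropped from the list. Any expiry, for example on a stale `last_seen`, would have to be added separately.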

@@ -303,7 +303,7 @@ export const Sensing: React.FC = () => {
const [stopping, setStopping] = useState(false);
// Data source selection
const [dataSource, setDataSource] = useState<DataSource>("simulate");
const [dataSource, setDataSource] = useState<DataSource>("esp32");
// Log viewer state
const [logEntries, setLogEntries] = useState<LogEntry[]>([]);
@@ -557,7 +557,6 @@ export const Sensing: React.FC = () => {
opacity: isRunning ? 0.6 : 1,
}}
>
<option value="simulate">Simulate</option>
<option value="esp32">ESP32 (Real)</option>
<option value="wifi">WiFi (RSSI)</option>
<option value="auto">Auto Detect</option>

@@ -170,7 +170,7 @@ export interface WasmModule {
// Sensing Server
// ---------------------------------------------------------------------------
export type DataSource = "auto" | "wifi" | "esp32" | "simulate";
export type DataSource = "auto" | "wifi" | "esp32";
export interface ServerConfig {
http_port: number;

@@ -21,94 +21,116 @@ use std::path::{Path, PathBuf};
// ── Feature vector ───────────────────────────────────────────────────────────
/// Extended feature vector: 7 server features + 8 subcarrier-derived features = 15.
const N_FEATURES: usize = 15;
/// ADR-118: feature vector redesigned for multi-node use + multicollinearity
/// reduction. Audit on 7-class training set showed:
/// * 17-21 multicollinear pairs (|r|>0.85) — energy features and amplitude
/// scalars were highly redundant.
/// * `amp_min` constant 0.0 across all frames (null subcarrier of HT20),
/// making `amp_range = amp_max - 0` fully redundant with `amp_max`.
/// * On 6-node data the F-statistic was 10× higher than on 2-node data, yet
///   classifier accuracy barely moved (40% → 44%) because the prior
///   15-feature pipeline used only `nodes.first()`: 5 of 6 sensors carried
///   zero weight.
///
/// New 22-feature layout:
/// [0..4] global signal features:
/// variance, mean_rssi, dominant_freq_hz, change_points
/// [4..22] per-node features (6 nodes × 3 features each):
/// per node id N∈{1..6}, base = 4 + (N-1)*3:
/// base+0: amp_std — motion / multipath spread
/// base+1: amp_skew — distribution asymmetry (where strong scatterers are)
/// base+2: amp_entropy — spectral diversity (normalised)
/// Total: 22 features.
const N_GLOBAL_FEATURES: usize = 4;
const N_PER_NODE_FEATURES: usize = 3;
const MAX_NODES: usize = 6;
const N_FEATURES: usize = N_GLOBAL_FEATURES + MAX_NODES * N_PER_NODE_FEATURES;
/// ADR-120: exported feature count so external crates (e.g. the main
/// crate's AppStateInner) can size their rolling buffers correctly.
pub const N_FEATURES_PUB: usize = N_FEATURES;
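The 22-slot layout maps node `N` and statistic `s` to index `4 + (N - 1) * 3 + s`. The sketch below restates that mapping and its inverse, which can help when labelling per-feature diagnostics such as the multicollinearity audit. The constants are copied from this file. `per_node_index`, `feature_name`, and the `nodeN.stat` label format are hypothetical.

```rust
const N_GLOBAL_FEATURES: usize = 4;
const N_PER_NODE_FEATURES: usize = 3;
const MAX_NODES: usize = 6;
const N_FEATURES: usize = N_GLOBAL_FEATURES + MAX_NODES * N_PER_NODE_FEATURES;

const GLOBAL_NAMES: [&str; N_GLOBAL_FEATURES] =
    ["variance", "mean_rssi", "dominant_freq_hz", "change_points"];
const PER_NODE_NAMES: [&str; N_PER_NODE_FEATURES] = ["amp_std", "amp_skew", "amp_entropy"];

/// Slot of per-node statistic `stat` (0 = std, 1 = skew, 2 = entropy) for the
/// 1-based `node_id`, or None if either is out of range.
fn per_node_index(node_id: usize, stat: usize) -> Option<usize> {
    if node_id == 0 || node_id > MAX_NODES || stat >= N_PER_NODE_FEATURES {
        return None;
    }
    Some(N_GLOBAL_FEATURES + (node_id - 1) * N_PER_NODE_FEATURES + stat)
}

/// Inverse mapping: a readable label for feature slot `i`.
fn feature_name(i: usize) -> Option<String> {
    if i < N_GLOBAL_FEATURES {
        return Some(GLOBAL_NAMES[i].to_string());
    }
    if i >= N_FEATURES {
        return None;
    }
    let k = i - N_GLOBAL_FEATURES;
    let node_id = k / N_PER_NODE_FEATURES + 1;
    Some(format!("node{}.{}", node_id, PER_NODE_NAMES[k % N_PER_NODE_FEATURES]))
}
```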
/// Default class names for backward compatibility with old saved models.
const DEFAULT_CLASSES: &[&str] = &["absent", "present_still", "present_moving", "active"];
/// Extract extended feature vector from a JSONL frame (features + raw amplitudes).
/// Extract extended feature vector from a JSONL frame (features + per-node amplitudes).
/// Missing-node features are zero-padded; z-score normalisation later treats
/// them consistently.
pub fn features_from_frame(frame: &serde_json::Value) -> [f64; N_FEATURES] {
let feat = frame.get("features").cloned().unwrap_or(serde_json::Value::Null);
let nodes = frame.get("nodes").and_then(|n| n.as_array());
let amps: Vec<f64> = nodes
.and_then(|ns| ns.first())
.and_then(|n| n.get("amplitude"))
.and_then(|a| a.as_array())
.map(|arr| arr.iter().filter_map(|v| v.as_f64()).collect())
.unwrap_or_default();
let mut out = [0.0f64; N_FEATURES];
// Server-computed features (0-6).
let variance = feat.get("variance").and_then(|v| v.as_f64()).unwrap_or(0.0);
let mbp = feat.get("motion_band_power").and_then(|v| v.as_f64()).unwrap_or(0.0);
let bbp = feat.get("breathing_band_power").and_then(|v| v.as_f64()).unwrap_or(0.0);
let sp = feat.get("spectral_power").and_then(|v| v.as_f64()).unwrap_or(0.0);
let df = feat.get("dominant_freq_hz").and_then(|v| v.as_f64()).unwrap_or(0.0);
let cp = feat.get("change_points").and_then(|v| v.as_f64()).unwrap_or(0.0);
let rssi = feat.get("mean_rssi").and_then(|v| v.as_f64()).unwrap_or(0.0);
// ── Global signal features (0..4) ──
out[0] = feat.get("variance").and_then(|v| v.as_f64()).unwrap_or(0.0);
out[1] = feat.get("mean_rssi").and_then(|v| v.as_f64()).unwrap_or(0.0);
out[2] = feat.get("dominant_freq_hz").and_then(|v| v.as_f64()).unwrap_or(0.0);
out[3] = feat.get("change_points").and_then(|v| v.as_f64()).unwrap_or(0.0);
// Subcarrier-derived features (7-14).
let (amp_mean, amp_std, amp_skew, amp_kurt, amp_iqr, amp_entropy, amp_max, amp_range) =
subcarrier_stats(&amps);
[
variance, mbp, bbp, sp, df, cp, rssi,
amp_mean, amp_std, amp_skew, amp_kurt, amp_iqr, amp_entropy, amp_max, amp_range,
]
// ── Per-node features (4..22) ──
if let Some(nodes) = frame.get("nodes").and_then(|n| n.as_array()) {
for node_obj in nodes {
let nid = node_obj.get("node_id").and_then(|v| v.as_u64()).unwrap_or(0) as usize;
if nid == 0 || nid > MAX_NODES { continue; }
let amps: Vec<f64> = node_obj.get("amplitude")
.or_else(|| node_obj.get("amplitudes"))
.and_then(|a| a.as_array())
.map(|arr| arr.iter().filter_map(|v| v.as_f64()).collect())
.unwrap_or_default();
let (std_a, skew_a, entropy_a) = per_node_stats(&amps);
let base = N_GLOBAL_FEATURES + (nid - 1) * N_PER_NODE_FEATURES;
out[base] = std_a;
out[base + 1] = skew_a;
out[base + 2] = entropy_a;
}
}
out
}
/// Also keep a simpler version for runtime (no JSONL, just FeatureInfo + amps).
pub fn features_from_runtime(feat: &serde_json::Value, amps: &[f64]) -> [f64; N_FEATURES] {
let variance = feat.get("variance").and_then(|v| v.as_f64()).unwrap_or(0.0);
let mbp = feat.get("motion_band_power").and_then(|v| v.as_f64()).unwrap_or(0.0);
let bbp = feat.get("breathing_band_power").and_then(|v| v.as_f64()).unwrap_or(0.0);
let sp = feat.get("spectral_power").and_then(|v| v.as_f64()).unwrap_or(0.0);
let df = feat.get("dominant_freq_hz").and_then(|v| v.as_f64()).unwrap_or(0.0);
let cp = feat.get("change_points").and_then(|v| v.as_f64()).unwrap_or(0.0);
let rssi = feat.get("mean_rssi").and_then(|v| v.as_f64()).unwrap_or(0.0);
let (amp_mean, amp_std, amp_skew, amp_kurt, amp_iqr, amp_entropy, amp_max, amp_range) =
subcarrier_stats(amps);
[
variance, mbp, bbp, sp, df, cp, rssi,
amp_mean, amp_std, amp_skew, amp_kurt, amp_iqr, amp_entropy, amp_max, amp_range,
]
/// Runtime variant: callers pass the already-aggregated feature struct and a
/// slice of (node_id, &amplitudes) pairs. Compatible with the broadcast tick
/// task which has access to all live nodes simultaneously.
pub fn features_from_runtime(
feat: &serde_json::Value,
per_node_amps: &[(u8, &[f64])],
) -> [f64; N_FEATURES] {
let mut out = [0.0f64; N_FEATURES];
out[0] = feat.get("variance").and_then(|v| v.as_f64()).unwrap_or(0.0);
out[1] = feat.get("mean_rssi").and_then(|v| v.as_f64()).unwrap_or(0.0);
out[2] = feat.get("dominant_freq_hz").and_then(|v| v.as_f64()).unwrap_or(0.0);
out[3] = feat.get("change_points").and_then(|v| v.as_f64()).unwrap_or(0.0);
for (nid, amps) in per_node_amps {
let nid = *nid as usize;
if nid == 0 || nid > MAX_NODES { continue; }
let (std_a, skew_a, entropy_a) = per_node_stats(amps);
let base = N_GLOBAL_FEATURES + (nid - 1) * N_PER_NODE_FEATURES;
out[base] = std_a;
out[base + 1] = skew_a;
out[base + 2] = entropy_a;
}
out
}
/// Compute statistical features from raw subcarrier amplitudes.
fn subcarrier_stats(amps: &[f64]) -> (f64, f64, f64, f64, f64, f64, f64, f64) {
/// Compute the 3 per-node statistics used in the new feature vector:
/// std (motion / multipath spread), skew (distribution asymmetry),
/// entropy (spectral diversity, normalised to [0, 1]).
fn per_node_stats(amps: &[f64]) -> (f64, f64, f64) {
if amps.is_empty() {
return (0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0);
return (0.0, 0.0, 0.0);
}
let n = amps.len() as f64;
let mean = amps.iter().sum::<f64>() / n;
let var = amps.iter().map(|a| (a - mean).powi(2)).sum::<f64>() / n;
let std = var.sqrt().max(1e-9);
// Skewness (asymmetry).
let skew = amps.iter().map(|a| ((a - mean) / std).powi(3)).sum::<f64>() / n;
// Kurtosis (peakedness).
let kurt = amps.iter().map(|a| ((a - mean) / std).powi(4)).sum::<f64>() / n - 3.0;
// IQR (inter-quartile range).
let mut sorted = amps.to_vec();
sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
let q1 = sorted[sorted.len() / 4];
let q3 = sorted[3 * sorted.len() / 4];
let iqr = q3 - q1;
// Spectral entropy (normalised).
let total_power: f64 = amps.iter().map(|a| a * a).sum::<f64>().max(1e-9);
let entropy: f64 = amps.iter()
.map(|a| {
let p = (a * a) / total_power;
if p > 1e-12 { -p * p.ln() } else { 0.0 }
})
.sum::<f64>() / n.ln().max(1e-9); // normalise to [0,1]
let max_val = sorted.last().copied().unwrap_or(0.0);
let range = max_val - sorted.first().copied().unwrap_or(0.0);
(mean, std, skew, kurt, iqr, entropy, max_val, range)
.sum::<f64>() / n.ln().max(1e-9);
(std, skew, entropy)
}
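Written out, `per_node_stats` computes the following for amplitudes $a_1, \dots, a_n$. These are population moments, matching the divisions by `n` in the code:

$$
\mu = \frac{1}{n}\sum_{k=1}^{n} a_k,\qquad
\sigma = \max\!\Big(\sqrt{\tfrac{1}{n}\textstyle\sum_{k=1}^{n} (a_k-\mu)^2},\;10^{-9}\Big),\qquad
\gamma = \frac{1}{n}\sum_{k=1}^{n}\Big(\frac{a_k-\mu}{\sigma}\Big)^{3}
$$

$$
p_k = \frac{a_k^2}{\max\big(\sum_{j=1}^{n} a_j^2,\;10^{-9}\big)},\qquad
H = \frac{-\sum_{k:\,p_k>10^{-12}} p_k \ln p_k}{\max(\ln n,\;10^{-9})}
$$

Whenever the total power exceeds the $10^{-9}$ floor, the $p_k$ sum to 1, so $0 \le H \le 1$. For example, four equal amplitudes give $p_k = 1/4$ and $H = \ln 4 / \ln 4 = 1$. A single non-zero amplitude gives $H = 0$.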
// ── Per-class statistics ─────────────────────────────────────────────────────
@@ -121,15 +143,164 @@ pub struct ClassStats {
pub stddev: [f64; N_FEATURES],
}
/// ADR-119: MLP (multi-layer perceptron) hidden-layer width.
/// 32 units give enough capacity for our 22-feature × 6-class problem
/// (22·32 + 32 + 32·6 + 6 = 934 parameters) while staying small enough
/// to train in <60 s on the 151k-frame dataset and load instantly at
/// runtime.
const MLP_HIDDEN: usize = 32;
/// ADR-120: temporal window size (number of consecutive frames stacked
/// into the windowed-MLP input). At the broadcast tick rate (~10 fps),
/// 20 frames = 2 seconds of context — enough to capture walking step
/// cadence (2 Hz), sit-stand transition cycles (0.5 Hz), and breathing
/// modulation. Chosen to match WiFlow's training-time window so amplitude
/// history buffers can be reused.
pub const WINDOW_FRAMES: usize = 20;
/// ADR-120: windowed-MLP input dimensionality = WINDOW_FRAMES × N_FEATURES.
const WINDOWED_INPUT: usize = WINDOW_FRAMES * N_FEATURES;
/// ADR-120: windowed-MLP hidden width. Larger than MLP_HIDDEN because
/// input is 20× wider (440 vs 22). 64 keeps params under 30k.
const WINDOWED_HIDDEN: usize = 64;
/// ADR-119: trained MLP classifier. Single hidden layer, ReLU activation,
/// softmax output. Stored alongside the LogReg weights — when `is_trained()`
/// returns true, `AdaptiveModel::classify` uses the MLP; otherwise it falls
/// back to logistic regression (the legacy path from before ADR-119).
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct MlpModel {
/// Layer 1 weights, row-major `[N_FEATURES × MLP_HIDDEN]`.
#[serde(default)]
pub w1: Vec<f64>,
/// Layer 1 bias, `[MLP_HIDDEN]`.
#[serde(default)]
pub b1: Vec<f64>,
/// Layer 2 weights, row-major `[MLP_HIDDEN × n_classes]`.
#[serde(default)]
pub w2: Vec<f64>,
/// Layer 2 bias, `[n_classes]`.
#[serde(default)]
pub b2: Vec<f64>,
/// Number of output classes (== len(b2) when trained).
#[serde(default)]
pub n_classes: usize,
}
impl MlpModel {
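pub fn is_trained(&self) -> bool {
// Also check the layer-1 shape so a model saved with a different
// feature count cannot make `forward` index out of bounds.
!self.w1.is_empty()
&& self.n_classes > 0
&& self.b2.len() == self.n_classes
&& self.w1.len() == N_FEATURES * MLP_HIDDEN
}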
/// Forward pass. Input is already z-score normalised by the caller.
/// Returns softmax probabilities of length `n_classes`.
pub fn forward(&self, x: &[f64; N_FEATURES]) -> Vec<f64> {
// Layer 1: h = ReLU(x · W1 + b1)
let mut h = vec![0.0f64; MLP_HIDDEN];
for j in 0..MLP_HIDDEN {
let mut s = self.b1[j];
for i in 0..N_FEATURES {
s += x[i] * self.w1[i * MLP_HIDDEN + j];
}
h[j] = s.max(0.0);
}
// Layer 2: logits = h · W2 + b2
let mut logits = vec![0.0f64; self.n_classes];
for c in 0..self.n_classes {
let mut s = self.b2[c];
for j in 0..MLP_HIDDEN {
s += h[j] * self.w2[j * self.n_classes + c];
}
logits[c] = s;
}
// Softmax.
let m = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
let exp_sum: f64 = logits.iter().map(|z| (z - m).exp()).sum();
logits.iter().map(|z| (z - m).exp() / exp_sum).collect()
}
}
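The model sizes quoted for both MLPs can be checked with the usual parameter count for one hidden layer with biases: (inputs × hidden + hidden) + (hidden × classes + classes). This matches the `w1`/`b1`/`w2`/`b2` shapes documented above. `mlp_param_count` is a hypothetical helper.

```rust
/// Hypothetical helper: trainable parameters of a one-hidden-layer MLP,
/// i.e. len(w1) + len(b1) + len(w2) + len(b2).
fn mlp_param_count(inputs: usize, hidden: usize, classes: usize) -> usize {
    inputs * hidden + hidden + hidden * classes + classes
}
```

With 6 classes, this gives 934 parameters for the 22-input, 32-unit MLP and 28,614 for the 440-input, 64-unit windowed MLP. With 7 classes, the counts are 967 and 28,679.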
/// ADR-120: Windowed MLP — same architecture as MlpModel but takes a
/// 20-frame × 22-feature stack (440-d input) instead of a single frame.
/// Captures temporal patterns (walking step cadence, sit-stand cycles,
/// breathing modulation) that frame-level classifiers miss.
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct WindowedMlpModel {
/// Layer 1 weights, row-major `[WINDOWED_INPUT × WINDOWED_HIDDEN]`.
#[serde(default)]
pub w1: Vec<f64>,
/// Layer 1 bias, `[WINDOWED_HIDDEN]`.
#[serde(default)]
pub b1: Vec<f64>,
/// Layer 2 weights, row-major `[WINDOWED_HIDDEN × n_classes]`.
#[serde(default)]
pub w2: Vec<f64>,
/// Layer 2 bias, `[n_classes]`.
#[serde(default)]
pub b2: Vec<f64>,
/// Number of output classes (== len(b2) when trained).
#[serde(default)]
pub n_classes: usize,
}
impl WindowedMlpModel {
pub fn is_trained(&self) -> bool {
!self.w1.is_empty()
&& self.n_classes > 0
&& self.b2.len() == self.n_classes
&& self.w1.len() == WINDOWED_INPUT * WINDOWED_HIDDEN
}
/// Forward pass. `window` is `WINDOW_FRAMES × N_FEATURES` flat,
/// row-major (oldest-frame-first), already z-score normalised.
/// Returns softmax probabilities of length `n_classes`.
pub fn forward(&self, window: &[f64]) -> Vec<f64> {
debug_assert_eq!(window.len(), WINDOWED_INPUT);
// Layer 1: h = ReLU(window · W1 + b1)
let mut h = vec![0.0f64; WINDOWED_HIDDEN];
for j in 0..WINDOWED_HIDDEN {
let mut s = self.b1[j];
for i in 0..WINDOWED_INPUT {
s += window[i] * self.w1[i * WINDOWED_HIDDEN + j];
}
h[j] = s.max(0.0);
}
// Layer 2: logits = h · W2 + b2
let mut logits = vec![0.0f64; self.n_classes];
for c in 0..self.n_classes {
let mut s = self.b2[c];
for j in 0..WINDOWED_HIDDEN {
s += h[j] * self.w2[j * self.n_classes + c];
}
logits[c] = s;
}
let m = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
let exp_sum: f64 = logits.iter().map(|z| (z - m).exp()).sum();
logits.iter().map(|z| (z - m).exp() / exp_sum).collect()
}
}
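`classify_window` expects its input ordered oldest frame first and frame-major (`window[f * N_FEATURES + i]`). A minimal sketch of a caller-side buffer that produces that layout follows. `FrameWindow` and its methods are hypothetical names. Returning `None` until the buffer is full is a design assumption: passing a shorter slice to `classify_window` also works, but it then falls back to frame-level `classify` on the last `N_FEATURES` values.

```rust
use std::collections::VecDeque;

/// Hypothetical helper: keeps the last `capacity` feature vectors and flattens
/// them oldest-first, frame-major, i.e. `flat[f * N + i]`.
struct FrameWindow<const N: usize> {
    frames: VecDeque<[f64; N]>,
    capacity: usize,
}

impl<const N: usize> FrameWindow<N> {
    fn new(capacity: usize) -> Self {
        Self { frames: VecDeque::with_capacity(capacity), capacity }
    }

    /// Push the newest frame, evicting the oldest once the window is full.
    fn push(&mut self, frame: [f64; N]) {
        if self.frames.len() == self.capacity {
            self.frames.pop_front();
        }
        self.frames.push_back(frame);
    }

    /// Flattened window, or `None` until `capacity` frames have arrived.
    fn flatten(&self) -> Option<Vec<f64>> {
        if self.frames.len() < self.capacity {
            return None;
        }
        Some(self.frames.iter().flat_map(|f| f.iter().copied()).collect())
    }
}
```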
// ── Trained model ────────────────────────────────────────────────────────────
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct AdaptiveModel {
/// Per-class feature statistics (centroid + spread).
pub class_stats: Vec<ClassStats>,
/// ADR-119: legacy logistic regression weights, kept as fallback.
/// Shape: `[n_classes × (N_FEATURES + 1)]` (last column = bias); the outer
/// Vec length equals the number of discovered classes.
/// When `mlp.is_trained()` returns true, the MLP wins and these weights are
/// unused at classify time, but `train_from_recordings` still updates them
/// so rollback is a one-line change.
pub weights: Vec<Vec<f64>>,
/// ADR-119: trained MLP (frame-level fallback, used when WindowedMlp
/// has no data yet — e.g. cold start before 20 frames accumulated).
#[serde(default)]
pub mlp: MlpModel,
/// ADR-120: trained Windowed MLP (preferred classifier when trained
/// AND a 20-frame window of fresh features is available at classify
/// time). Captures temporal patterns the frame-level MLP can't see.
#[serde(default)]
pub windowed_mlp: WindowedMlpModel,
/// Global feature normalisation: mean and stddev across all training data.
pub global_mean: [f64; N_FEATURES],
pub global_std: [f64; N_FEATURES],
@ -153,6 +324,8 @@ impl Default for AdaptiveModel {
Self {
class_stats: Vec::new(),
weights: vec![vec![0.0; N_FEATURES + 1]; n_classes],
mlp: MlpModel::default(),
windowed_mlp: WindowedMlpModel::default(),
global_mean: [0.0; N_FEATURES],
global_std: [1.0; N_FEATURES],
trained_frames: 0,
@ -164,39 +337,86 @@ impl Default for AdaptiveModel {
}
impl AdaptiveModel {
/// ADR-120: classify using a temporal window of recent frames.
/// `window` is `WINDOW_FRAMES × N_FEATURES` flat, row-major (oldest frame
/// first), in raw (un-normalised) units; this fn applies z-score
/// normalisation internally using the model's `global_mean`/`global_std`.
/// Falls back to frame-level `classify()` on the most recent frame when the
/// windowed MLP isn't trained or `window.len() != WINDOWED_INPUT`. If the
/// window holds fewer than `N_FEATURES` values, that fallback sees an
/// all-zero raw frame.
pub fn classify_window(&self, window: &[f64]) -> (String, f64) {
if self.windowed_mlp.is_trained() && window.len() == WINDOWED_INPUT {
let mut norm = vec![0.0f64; WINDOWED_INPUT];
for f in 0..WINDOW_FRAMES {
for i in 0..N_FEATURES {
let idx = f * N_FEATURES + i;
norm[idx] = (window[idx] - self.global_mean[i]) / (self.global_std[i] + 1e-9);
}
}
let probs = self.windowed_mlp.forward(&norm);
let (best_c, best_p) = probs.iter().enumerate()
.max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
.unwrap();
let label = if best_c < self.class_names.len() {
self.class_names[best_c].clone()
} else {
"present_still".to_string()
};
return (label, *best_p);
}
// Cold-start fallback: most recent frame via frame-level classifier.
let mut last_frame = [0.0f64; N_FEATURES];
if window.len() >= N_FEATURES {
let off = window.len() - N_FEATURES;
last_frame.copy_from_slice(&window[off..off + N_FEATURES]);
}
self.classify(&last_frame)
}
/// Classify a raw feature vector. Returns (class_label, confidence).
/// ADR-119: prefers MLP when trained; falls back to logistic regression
/// otherwise. ADR-120: temporal-context API is `classify_window` —
/// prefer it when callers have a recent feature buffer.
pub fn classify(&self, raw_features: &[f64; N_FEATURES]) -> (String, f64) {
// Normalise features once (shared by MLP and LogReg).
let mut x = [0.0f64; N_FEATURES];
for i in 0..N_FEATURES {
x[i] = (raw_features[i] - self.global_mean[i]) / (self.global_std[i] + 1e-9);
}
// ADR-119: MLP path (preferred when trained).
if self.mlp.is_trained() {
let probs = self.mlp.forward(&x);
let (best_c, best_p) = probs.iter().enumerate()
.max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
.unwrap();
let label = if best_c < self.class_names.len() {
self.class_names[best_c].clone()
} else {
"present_still".to_string()
};
return (label, *best_p);
}
// Legacy logistic regression fallback.
let n_classes = self.weights.len();
if n_classes == 0 || self.class_stats.is_empty() {
return ("present_still".to_string(), 0.5);
}
let mut logits: Vec<f64> = vec![0.0; n_classes];
for c in 0..n_classes {
let w = &self.weights[c];
let mut z = w[N_FEATURES]; // bias
for i in 0..N_FEATURES {
z += w[i] * x[i];
}
logits[c] = z;
}
// Softmax.
let max_logit = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
let exp_sum: f64 = logits.iter().map(|z| (z - max_logit).exp()).sum();
let mut probs: Vec<f64> = vec![0.0; n_classes];
for c in 0..n_classes {
probs[c] = ((logits[c] - max_logit).exp()) / exp_sum;
}
// Pick argmax.
let (best_c, best_p) = probs.iter().enumerate()
.max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
.unwrap();
@ -226,6 +446,7 @@ impl AdaptiveModel {
// ── Training ─────────────────────────────────────────────────────────────────
/// A labeled training sample.
#[derive(Clone)]
struct Sample {
features: [f64; N_FEATURES],
class_idx: usize,
@ -314,13 +535,18 @@ pub fn train_from_recordings(recordings_dir: &Path) -> Result<AdaptiveModel, Str
}
// Second pass: load recordings with the discovered class indices.
// ADR-120: keep recordings grouped so windowed-MLP training can slide
// a temporal window WITHIN each recording (not across recording
// boundaries — would mix classes).
let mut samples: Vec<Sample> = Vec::new();
let mut recording_groups: Vec<Vec<Sample>> = Vec::new();
for (path, fname, class_name) in &file_classes {
let class_idx = class_map[class_name];
let loaded = load_recording(path, class_idx);
eprintln!(" Loaded {}: {} frames → class '{}'",
fname, loaded.len(), class_name);
samples.extend(loaded.clone());
recording_groups.push(loaded);
}
if samples.is_empty() {
@ -499,22 +725,428 @@ pub fn train_from_recordings(recordings_dir: &Path) -> Result<AdaptiveModel, Str
}
for c in 0..n_classes {
let tot = class_total[c].max(1);
eprintln!(" LogReg {}: {}/{} ({:.0}%)", class_names[c], class_correct[c], tot,
class_correct[c] as f64 / tot as f64 * 100.0);
}
// ── ADR-119: train MLP on the same normalised samples ──
eprintln!("Training MLP (22 → {} → {}) ...", MLP_HIDDEN, n_classes);
let mlp = train_mlp_classifier(&norm_samples, n_classes);
let (mlp_acc, mlp_per_class) = eval_mlp(&mlp, &norm_samples, n_classes);
eprintln!("MLP accuracy: {:.2}% (LogReg was {:.2}%)",
mlp_acc * 100.0, accuracy * 100.0);
for c in 0..n_classes {
let tot = class_total[c].max(1);
let corr = mlp_per_class[c];
eprintln!(" MLP {}: {}/{} ({:.0}%)",
class_names[c], corr, tot, corr as f64 / tot as f64 * 100.0);
}
// ── ADR-120: Windowed MLP training ──
// Build temporal-window samples within each recording (no cross-recording
// mixing). Slide a WINDOW_FRAMES-long window with a fixed stride, trading
// sample count against redundancy between overlapping windows.
eprintln!("Building temporal windows ({} frames × {} features → {} dims)...",
WINDOW_FRAMES, N_FEATURES, WINDOWED_INPUT);
let window_stride = 5usize; // 4× overlap; ~28k windows total on 151k frames
let mut win_samples: Vec<(Vec<f64>, usize)> = Vec::new();
for group in &recording_groups {
if group.len() < WINDOW_FRAMES { continue; }
let class_idx = group[0].class_idx;
let mut start = 0usize;
while start + WINDOW_FRAMES <= group.len() {
let mut flat: Vec<f64> = Vec::with_capacity(WINDOWED_INPUT);
for f in 0..WINDOW_FRAMES {
let frame = &group[start + f];
for i in 0..N_FEATURES {
let z = (frame.features[i] - global_mean[i]) / (global_std[i] + 1e-9);
flat.push(z);
}
}
win_samples.push((flat, class_idx));
start += window_stride;
}
}
eprintln!("Total windowed samples: {}", win_samples.len());
// Count per-class windowed samples.
let mut win_class_total = vec![0usize; n_classes];
for (_, c) in &win_samples { win_class_total[*c] += 1; }
eprintln!("Training Windowed MLP ({} → {} → {}) ...", WINDOWED_INPUT, WINDOWED_HIDDEN, n_classes);
let windowed_mlp = train_windowed_mlp_classifier(&win_samples, n_classes);
let (win_acc, win_per_class) = eval_windowed_mlp(&windowed_mlp, &win_samples, n_classes);
eprintln!("Windowed MLP accuracy: {:.2}% (frame-level MLP was {:.2}%)",
win_acc * 100.0, mlp_acc * 100.0);
for c in 0..n_classes {
let tot = win_class_total[c].max(1);
let corr = win_per_class[c];
eprintln!(" W-MLP {}: {}/{} ({:.0}%)",
class_names[c], corr, tot, corr as f64 / tot as f64 * 100.0);
}
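// Report the best of the three training accuracies. The MLP and windowed-MLP
// figures are measured on the same samples they were trained on, so this is
// not a held-out estimate.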
let final_accuracy = win_acc.max(mlp_acc).max(accuracy);
Ok(AdaptiveModel {
class_stats,
weights,
mlp,
windowed_mlp,
global_mean,
global_std,
trained_frames: n,
training_accuracy: final_accuracy,
version: 1,
class_names,
})
}
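The windowing loop in `train_from_recordings` slides only inside each recording. The sketch below isolates that indexing; the helpers `window_starts` and `window_count` are illustrative, not crate functions. A recording of `len` frames yields `floor((len - WINDOW_FRAMES) / stride) + 1` windows, or none if it is shorter than one window. With 20-frame windows and `stride = 5`, each interior frame lands in 20 / 5 = 4 windows, which is the "4× overlap" the stride comment refers to.

```rust
/// Hypothetical helper mirroring the training loop: windows of `window` frames
/// start every `stride` frames and never cross a recording boundary.
/// Returns (recording index, start frame) for each window.
fn window_starts(recording_lens: &[usize], window: usize, stride: usize) -> Vec<(usize, usize)> {
    assert!(window > 0 && stride > 0);
    let mut out = Vec::new();
    for (r, &len) in recording_lens.iter().enumerate() {
        // Recordings shorter than one window contribute nothing.
        let mut start = 0;
        while start + window <= len {
            out.push((r, start));
            start += stride;
        }
    }
    out
}

/// Closed form of the per-recording count: floor((len - window) / stride) + 1.
fn window_count(len: usize, window: usize, stride: usize) -> usize {
    if len < window { 0 } else { (len - window) / stride + 1 }
}
```

For recordings of 100, 19 and 24 frames this gives 17 + 0 + 1 = 18 windows.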
// ── ADR-119: MLP training (manual backprop, no external ML crate) ────────────
/// Train a single-hidden-layer MLP on already-z-score-normalised samples.
/// Architecture: N_FEATURES → MLP_HIDDEN → n_classes (ReLU + softmax).
/// Optimiser: SGD + momentum 0.9 + weight decay 1e-4 + cosine LR decay.
fn train_mlp_classifier(samples: &[([f64; N_FEATURES], usize)], n_classes: usize) -> MlpModel {
let n_w1 = N_FEATURES * MLP_HIDDEN;
let n_w2 = MLP_HIDDEN * n_classes;
// He initialisation: w ~ N(0, σ²) with σ = sqrt(2/fan_in), sampled via Box–Muller.
let mut rng_state: u64 = 1337;
let mut rng_u01 = move || -> f64 {
rng_state = rng_state.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
((rng_state >> 33) as f64) / ((u64::MAX >> 33) as f64)
};
let mut he_init = |n: usize, fan_in: usize| -> Vec<f64> {
let s = (2.0 / fan_in as f64).sqrt();
let mut v = Vec::with_capacity(n);
let mut k = 0;
while k < n {
let u1 = rng_u01().max(1e-12);
let u2 = rng_u01();
let z0 = (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos() * s;
let z1 = (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).sin() * s;
v.push(z0);
k += 1;
if k < n { v.push(z1); k += 1; }
}
v
};
let mut w1 = he_init(n_w1, N_FEATURES);
let mut b1 = vec![0.0f64; MLP_HIDDEN];
let mut w2 = he_init(n_w2, MLP_HIDDEN);
let mut b2 = vec![0.0f64; n_classes];
let mut mw1 = vec![0.0f64; n_w1];
let mut mb1 = vec![0.0f64; MLP_HIDDEN];
let mut mw2 = vec![0.0f64; n_w2];
let mut mb2 = vec![0.0f64; n_classes];
let momentum = 0.9f64;
let weight_decay = 1e-4f64;
let base_lr = 0.05f64;
let batch_size = 64usize;
let epochs = 30usize;
let n = samples.len();
// Shuffle index buffer (avoid cloning sample arrays).
let mut idx: Vec<usize> = (0..n).collect();
let mut shuf_state: u64 = 7;
let mut shuf_next = move || -> u64 {
shuf_state = shuf_state.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
shuf_state >> 33
};
for epoch in 0..epochs {
for i in (1..idx.len()).rev() {
let j = (shuf_next() as usize) % (i + 1);
idx.swap(i, j);
}
let lr = base_lr * 0.5 * (1.0 + (std::f64::consts::PI * epoch as f64 / epochs as f64).cos());
let mut epoch_loss = 0.0f64;
let mut h_pre = vec![0.0f64; MLP_HIDDEN];
let mut h = vec![0.0f64; MLP_HIDDEN];
let mut logits = vec![0.0f64; n_classes];
let mut k = 0usize;
while k < n {
let bend = (k + batch_size).min(n);
let mut gw1 = vec![0.0f64; n_w1];
let mut gb1 = vec![0.0f64; MLP_HIDDEN];
let mut gw2 = vec![0.0f64; n_w2];
let mut gb2 = vec![0.0f64; n_classes];
let bs = (bend - k) as f64;
for &si in &idx[k..bend] {
let (x, target) = &samples[si];
// Forward.
for j in 0..MLP_HIDDEN {
let mut s = b1[j];
for i in 0..N_FEATURES { s += x[i] * w1[i * MLP_HIDDEN + j]; }
h_pre[j] = s;
h[j] = s.max(0.0);
}
for c in 0..n_classes {
let mut s = b2[c];
for j in 0..MLP_HIDDEN { s += h[j] * w2[j * n_classes + c]; }
logits[c] = s;
}
let mx = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
let ex_sum: f64 = logits.iter().map(|z| (z - mx).exp()).sum();
// d_logits = softmax - one_hot
let mut d_logits = vec![0.0f64; n_classes];
for c in 0..n_classes {
let p = (logits[c] - mx).exp() / ex_sum;
d_logits[c] = p - if c == *target { 1.0 } else { 0.0 };
if c == *target { epoch_loss += -(p.max(1e-15)).ln(); }
}
// Gradients.
for c in 0..n_classes {
gb2[c] += d_logits[c];
for j in 0..MLP_HIDDEN {
gw2[j * n_classes + c] += h[j] * d_logits[c];
}
}
// Backprop through Layer-2 to hidden.
let mut d_h = [0.0f64; MLP_HIDDEN];
for j in 0..MLP_HIDDEN {
if h_pre[j] <= 0.0 { continue; }
let mut s = 0.0;
for c in 0..n_classes { s += w2[j * n_classes + c] * d_logits[c]; }
d_h[j] = s;
}
for j in 0..MLP_HIDDEN {
gb1[j] += d_h[j];
for i in 0..N_FEATURES { gw1[i * MLP_HIDDEN + j] += x[i] * d_h[j]; }
}
}
// SGD + momentum + weight decay.
for q in 0..n_w1 {
let g = gw1[q] / bs + weight_decay * w1[q];
mw1[q] = momentum * mw1[q] + g;
w1[q] -= lr * mw1[q];
}
for q in 0..MLP_HIDDEN {
let g = gb1[q] / bs;
mb1[q] = momentum * mb1[q] + g;
b1[q] -= lr * mb1[q];
}
for q in 0..n_w2 {
let g = gw2[q] / bs + weight_decay * w2[q];
mw2[q] = momentum * mw2[q] + g;
w2[q] -= lr * mw2[q];
}
for q in 0..n_classes {
let g = gb2[q] / bs;
mb2[q] = momentum * mb2[q] + g;
b2[q] -= lr * mb2[q];
}
k = bend;
}
if epoch % 5 == 0 || epoch == epochs - 1 {
eprintln!(" MLP epoch {epoch:2}/{}: loss = {:.4}, lr = {:.4}",
epochs, epoch_loss / n as f64, lr);
}
}
MlpModel { w1, b1, w2, b2, n_classes }
}
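Both trainers share the same learning-rate schedule and update rule. The sketch below pulls them out as free functions with hypothetical names (`cosine_lr`, `sgd_momentum_step`) so the arithmetic can be checked. The learning rate starts at `base_lr` and halves at the midpoint. It approaches but never reaches zero, because the last epoch evaluated is `epochs - 1`. Weight decay is folded into the batch-averaged gradient before it enters the momentum buffer; the training loops apply it to weights only, not biases.

```rust
use std::f64::consts::PI;

/// Cosine-decayed learning rate: lr(epoch) = base_lr * 0.5 * (1 + cos(pi * epoch / epochs)).
fn cosine_lr(base_lr: f64, epoch: usize, epochs: usize) -> f64 {
    base_lr * 0.5 * (1.0 + (PI * epoch as f64 / epochs as f64).cos())
}

/// One SGD step with heavy-ball momentum and L2 weight decay, in the same
/// order as the training loops: decay is added to the averaged gradient,
/// the result is accumulated into the momentum buffer, then applied.
fn sgd_momentum_step(
    w: &mut [f64],
    m: &mut [f64],
    grad_sum: &[f64],
    batch_size: f64,
    lr: f64,
    momentum: f64,
    weight_decay: f64,
) {
    for q in 0..w.len() {
        let g = grad_sum[q] / batch_size + weight_decay * w[q];
        m[q] = momentum * m[q] + g;
        w[q] -= lr * m[q];
    }
}
```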
/// Evaluate MLP accuracy and per-class correct counts on normalised samples.
fn eval_mlp(mlp: &MlpModel, samples: &[([f64; N_FEATURES], usize)], n_classes: usize)
-> (f64, Vec<usize>)
{
let mut correct = 0usize;
let mut per_class = vec![0usize; n_classes];
for (x, target) in samples {
let probs = mlp.forward(x);
let pred = probs.iter().enumerate()
.max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
.unwrap().0;
if pred == *target { correct += 1; per_class[*target] += 1; }
}
(correct as f64 / samples.len() as f64, per_class)
}
// ── ADR-120: Windowed MLP training ──────────────────────────────────────────
/// Train a windowed MLP on temporal-window samples.
/// Each sample is a 440-d flat vector (20 frames × 22 features) labeled
/// with a class index. Architecture: 440 → 64 ReLU → n_classes softmax.
/// Same SGD + momentum + cosine-decay recipe as MLP, fewer epochs because
/// each window is a richer training signal than a single frame.
fn train_windowed_mlp_classifier(
samples: &[(Vec<f64>, usize)],
n_classes: usize,
) -> WindowedMlpModel {
let n_w1 = WINDOWED_INPUT * WINDOWED_HIDDEN;
let n_w2 = WINDOWED_HIDDEN * n_classes;
let mut rng_state: u64 = 24601;
let mut rng_u01 = move || -> f64 {
rng_state = rng_state.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
((rng_state >> 33) as f64) / ((u64::MAX >> 33) as f64)
};
let mut he_init = |n: usize, fan_in: usize| -> Vec<f64> {
let s = (2.0 / fan_in as f64).sqrt();
let mut v = Vec::with_capacity(n);
let mut k = 0;
while k < n {
let u1 = rng_u01().max(1e-12);
let u2 = rng_u01();
let z0 = (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos() * s;
let z1 = (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).sin() * s;
v.push(z0); k += 1;
if k < n { v.push(z1); k += 1; }
}
v
};
let mut w1 = he_init(n_w1, WINDOWED_INPUT);
let mut b1 = vec![0.0f64; WINDOWED_HIDDEN];
let mut w2 = he_init(n_w2, WINDOWED_HIDDEN);
let mut b2 = vec![0.0f64; n_classes];
let mut mw1 = vec![0.0f64; n_w1];
let mut mb1 = vec![0.0f64; WINDOWED_HIDDEN];
let mut mw2 = vec![0.0f64; n_w2];
let mut mb2 = vec![0.0f64; n_classes];
let momentum = 0.9f64;
let weight_decay = 1e-4f64;
let base_lr = 0.03f64; // smaller LR for larger network (vs MLP's 0.05)
let batch_size = 32usize;
let epochs = 25usize;
let n = samples.len();
let mut idx: Vec<usize> = (0..n).collect();
let mut shuf_state: u64 = 11;
let mut shuf_next = move || -> u64 {
shuf_state = shuf_state.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
shuf_state >> 33
};
let mut h_pre = vec![0.0f64; WINDOWED_HIDDEN];
let mut h = vec![0.0f64; WINDOWED_HIDDEN];
let mut logits = vec![0.0f64; n_classes];
for epoch in 0..epochs {
for i in (1..idx.len()).rev() {
let j = (shuf_next() as usize) % (i + 1);
idx.swap(i, j);
}
let lr = base_lr * 0.5 * (1.0 + (std::f64::consts::PI * epoch as f64 / epochs as f64).cos());
let mut epoch_loss = 0.0f64;
let mut k = 0usize;
while k < n {
let bend = (k + batch_size).min(n);
let mut gw1 = vec![0.0f64; n_w1];
let mut gb1 = vec![0.0f64; WINDOWED_HIDDEN];
let mut gw2 = vec![0.0f64; n_w2];
let mut gb2 = vec![0.0f64; n_classes];
let bs = (bend - k) as f64;
for &si in &idx[k..bend] {
let (x, target) = &samples[si];
debug_assert_eq!(x.len(), WINDOWED_INPUT);
// Forward.
for j in 0..WINDOWED_HIDDEN {
let mut s = b1[j];
for i in 0..WINDOWED_INPUT { s += x[i] * w1[i * WINDOWED_HIDDEN + j]; }
h_pre[j] = s;
h[j] = s.max(0.0);
}
for c in 0..n_classes {
let mut s = b2[c];
for j in 0..WINDOWED_HIDDEN { s += h[j] * w2[j * n_classes + c]; }
logits[c] = s;
}
let mx = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
let ex_sum: f64 = logits.iter().map(|z| (z - mx).exp()).sum();
let mut d_logits = vec![0.0f64; n_classes];
for c in 0..n_classes {
let p = (logits[c] - mx).exp() / ex_sum;
d_logits[c] = p - if c == *target { 1.0 } else { 0.0 };
if c == *target { epoch_loss += -(p.max(1e-15)).ln(); }
}
for c in 0..n_classes {
gb2[c] += d_logits[c];
for j in 0..WINDOWED_HIDDEN {
gw2[j * n_classes + c] += h[j] * d_logits[c];
}
}
let mut d_h = vec![0.0f64; WINDOWED_HIDDEN];
for j in 0..WINDOWED_HIDDEN {
if h_pre[j] <= 0.0 { continue; }
let mut s = 0.0;
for c in 0..n_classes { s += w2[j * n_classes + c] * d_logits[c]; }
d_h[j] = s;
}
for j in 0..WINDOWED_HIDDEN {
gb1[j] += d_h[j];
for i in 0..WINDOWED_INPUT { gw1[i * WINDOWED_HIDDEN + j] += x[i] * d_h[j]; }
}
}
for q in 0..n_w1 {
let g = gw1[q] / bs + weight_decay * w1[q];
mw1[q] = momentum * mw1[q] + g;
w1[q] -= lr * mw1[q];
}
for q in 0..WINDOWED_HIDDEN {
let g = gb1[q] / bs;
mb1[q] = momentum * mb1[q] + g;
b1[q] -= lr * mb1[q];
}
for q in 0..n_w2 {
let g = gw2[q] / bs + weight_decay * w2[q];
mw2[q] = momentum * mw2[q] + g;
w2[q] -= lr * mw2[q];
}
for q in 0..n_classes {
let g = gb2[q] / bs;
mb2[q] = momentum * mb2[q] + g;
b2[q] -= lr * mb2[q];
}
k = bend;
}
if epoch % 3 == 0 || epoch == epochs - 1 {
eprintln!(" W-MLP epoch {epoch:2}/{}: loss = {:.4}, lr = {:.4}",
epochs, epoch_loss / n as f64, lr);
}
}
WindowedMlpModel { w1, b1, w2, b2, n_classes }
}
/// Evaluate Windowed MLP accuracy + per-class correct counts.
fn eval_windowed_mlp(
mlp: &WindowedMlpModel,
samples: &[(Vec<f64>, usize)],
n_classes: usize,
) -> (f64, Vec<usize>) {
let mut correct = 0usize;
let mut per_class = vec![0usize; n_classes];
for (x, target) in samples {
let probs = mlp.forward(x);
let pred = probs.iter().enumerate()
.max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
.unwrap().0;
if pred == *target { correct += 1; per_class[*target] += 1; }
}
(correct as f64 / samples.len() as f64, per_class)
}
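The argmax calls in `classify`, `classify_window`, `eval_mlp` and `eval_windowed_mlp` use `max_by(|a, b| a.1.partial_cmp(b.1).unwrap())`, which panics if any probability is NaN (for example after loading corrupted weights). One alternative, sketched below, is `f64::total_cmp` (stable since Rust 1.62). It defines a total order over every `f64`, including NaN, so the argmax cannot panic. The helper name `argmax` is hypothetical.

```rust
/// Argmax over class probabilities that cannot panic on NaN.
/// `total_cmp` orders all f64 values (a positive NaN sorts above +inf),
/// so a NaN may win the argmax instead of crashing the classifier.
/// Returns `None` for an empty slice.
fn argmax(probs: &[f64]) -> Option<(usize, f64)> {
    probs
        .iter()
        .copied()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(&b.1))
}
```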
/// Default path for the saved adaptive model.
pub fn model_path() -> PathBuf {
PathBuf::from("data/adaptive_model.json")


@ -10,6 +10,68 @@ use crate::vital_signs::VitalSigns;
// ── ESP32 UDP frame parsers ─────────────────────────────────────────────────
/// Parse a 60-byte ADR-081 feature_state packet (magic 0xC511_0006).
///
/// Converts the on-wire rv_feature_state_t into an Esp32VitalsPacket so the
/// existing vitals processing pipeline can consume it directly. Mapping:
/// motion_score → motion_energy; motion flag set if > 0.05
/// presence_score → presence_score; presence flag set only if
/// quality_flags bit 0 (presence valid) is set AND score > 0.5
/// respiration_bpm → breathing_rate_bpm
/// heartbeat_bpm → heartrate_bpm
/// quality_flags bit 3 (anomaly) → fall_detected (approximation)
/// byte 54 (median RSSI) → rssi; 0 ("not yet measured") → -50 dBm
pub fn parse_rv_feature_state(buf: &[u8]) -> Option<Esp32VitalsPacket> {
if buf.len() < 60 { return None; }
let magic = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]);
if magic != 0xC511_0006 { return None; }
let node_id = buf[4];
let _mode = buf[5];
let _seq = u16::from_le_bytes([buf[6], buf[7]]);
let ts_us = u64::from_le_bytes([
buf[8], buf[9], buf[10], buf[11], buf[12], buf[13], buf[14], buf[15],
]);
let motion_score = f32::from_le_bytes([buf[16], buf[17], buf[18], buf[19]]);
let presence_score = f32::from_le_bytes([buf[20], buf[21], buf[22], buf[23]]);
let respiration_bpm = f32::from_le_bytes([buf[24], buf[25], buf[26], buf[27]]);
let _respiration_conf = f32::from_le_bytes([buf[28], buf[29], buf[30], buf[31]]);
let heartbeat_bpm = f32::from_le_bytes([buf[32], buf[33], buf[34], buf[35]]);
let _heartbeat_conf = f32::from_le_bytes([buf[36], buf[37], buf[38], buf[39]]);
let _anomaly_score = f32::from_le_bytes([buf[40], buf[41], buf[42], buf[43]]);
let _env_shift_score = f32::from_le_bytes([buf[44], buf[45], buf[46], buf[47]]);
let _node_coherence = f32::from_le_bytes([buf[48], buf[49], buf[50], buf[51]]);
let quality_flags = u16::from_le_bytes([buf[52], buf[53]]);
// ADR-100 D3: FW ships median RSSI in byte 54 (was `reserved`); 0 means
// "not yet measured" → keep the historical -50 fallback so the UI's
// RSSI trace isn't pinned at a misleading 0 dBm. Stays in sync with
// the duplicate parser in main.rs (must remain identical).
let rssi_byte = buf[54] as i8;
let rssi: i8 = if rssi_byte == 0 { -50 } else { rssi_byte };
// Bit 0 of quality_flags = presence valid
let presence_valid = (quality_flags & (1 << 0)) != 0;
let presence = presence_valid && presence_score > 0.5;
// Bit 3 = anomaly triggered → treat as fall (approximation)
let fall_detected = (quality_flags & (1 << 3)) != 0;
let motion = motion_score > 0.05;
// Single-node feature_state doesn't tell us number of persons; surface 1 when present.
let n_persons = if presence { 1 } else { 0 };
Some(Esp32VitalsPacket {
node_id,
presence,
fall_detected,
motion,
breathing_rate_bpm: respiration_bpm as f64,
heartrate_bpm: heartbeat_bpm as f64,
rssi,
n_persons,
motion_energy: motion_score,
presence_score,
timestamp_ms: (ts_us / 1000) as u32,
})
}
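The flag mapping in `parse_rv_feature_state` is easy to get subtly wrong, so here is a self-contained sketch that builds a packet at the same byte offsets and decodes only the flag-derived fields with the same rules. Presence requires bit 0 and `presence_score > 0.5`, bit 3 maps to a fall, `motion_score > 0.05` sets motion, and an RSSI byte of 0 falls back to -50 dBm. `build_feature_state` and `decode_flags` are hypothetical test helpers, not crate APIs.

```rust
use std::convert::TryInto;

/// Hypothetical test helper: a 60-byte feature_state packet laid out at the
/// offsets the parser reads. Fields the parser ignores stay zero.
fn build_feature_state(node_id: u8, motion: f32, presence: f32, quality_flags: u16, rssi: i8) -> [u8; 60] {
    let mut buf = [0u8; 60];
    buf[0..4].copy_from_slice(&0xC511_0006u32.to_le_bytes()); // magic
    buf[4] = node_id;
    buf[16..20].copy_from_slice(&motion.to_le_bytes());
    buf[20..24].copy_from_slice(&presence.to_le_bytes());
    buf[52..54].copy_from_slice(&quality_flags.to_le_bytes());
    buf[54] = rssi as u8; // two's-complement byte; 0 = "not yet measured"
    buf
}

/// Decode (presence, fall_detected, motion, rssi) with the parser's rules.
fn decode_flags(buf: &[u8]) -> Option<(bool, bool, bool, i8)> {
    if buf.len() < 60 || u32::from_le_bytes(buf[0..4].try_into().ok()?) != 0xC511_0006 {
        return None;
    }
    let motion = f32::from_le_bytes(buf[16..20].try_into().ok()?);
    let presence_score = f32::from_le_bytes(buf[20..24].try_into().ok()?);
    let flags = u16::from_le_bytes(buf[52..54].try_into().ok()?);
    let presence = (flags & 1) != 0 && presence_score > 0.5; // bit 0 gates presence
    let fall = (flags & (1 << 3)) != 0; // bit 3 (anomaly) reported as a fall
    let rssi = match buf[54] as i8 { 0 => -50, r => r }; // 0 falls back to -50 dBm
    Some((presence, fall, motion > 0.05, rssi))
}
```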
/// Parse a 32-byte edge vitals packet (magic 0xC511_0002).
pub fn parse_esp32_vitals(buf: &[u8]) -> Option<Esp32VitalsPacket> {
if buf.len() < 32 { return None; }
@ -67,14 +129,32 @@ pub fn parse_esp32_frame(buf: &[u8]) -> Option<Esp32Frame> {
let magic = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]);
if magic != 0xC511_0001 { return None; }
// On-wire layout — must stay in lockstep with
// firmware/esp32-csi-node/main/csi_collector.c::serialize_csi_frame().
// ADR-100 D3 fix: the previous version of this parser had every field
// after `n_antennas` shifted by 2 bytes (n_subcarriers read as u8,
// freq_mhz/sequence misaligned, rssi read from buf[14] instead of
// buf[16]). That made `mean_rssi` random noise (a byte taken from
// mid-sequence) which the saturating_neg() workaround then forced
// negative — hiding the bug from cursory log inspection while keeping
// RSSI traces useless. Layout below matches the FW byte-for-byte.
// [0..4] magic (u32 LE)
// [4] node_id (u8)
// [5] n_antennas (u8)
// [6..8] n_subcarriers(u16 LE)
// [8..12] freq_mhz (u32 LE)
// [12..16] sequence (u32 LE)
// [16] rssi (i8)
// [17] noise_floor (i8)
// [18..20] reserved
// [20..] I/Q payload
let node_id = buf[4];
let n_antennas = buf[5];
let n_subcarriers = u16::from_le_bytes([buf[6], buf[7]]) as u8; // `as u8` keeps only the low byte: counts > 255 would wrap
let freq_mhz = u16::from_le_bytes([buf[8], buf[9]]); // upper bytes always 0 in practice
let sequence = u32::from_le_bytes([buf[12], buf[13], buf[14], buf[15]]);
let rssi = buf[16] as i8; // signed dBm as written by the FW; no sign fix-up needed
let noise_floor = buf[17] as i8;
let iq_start = 20;
let n_pairs = n_antennas as usize * n_subcarriers as usize;
@ -401,9 +481,16 @@ pub fn smooth_and_classify_node(ns: &mut NodeState, raw: &mut ClassificationInfo
raw.confidence = (0.4 + sm * 0.6).clamp(0.0, 1.0);
}
/// ADR-118: legacy single-node override variant kept for API compatibility.
/// New callers should query per-node amps from AMP_HIST and pass the full
/// `&[(u8, &[f64])]` slice. This variant degrades to "node 1 only" which
/// produces a feature vector with 5 zero-padded node slots — usable for
/// emergency fallback but the trained model expects the full multi-node
/// vector.
pub fn adaptive_override(state: &AppStateInner, features: &FeatureInfo, classification: &mut ClassificationInfo) {
if let Some(ref model) = state.adaptive_model {
let amps_owned: Vec<f64> = state.frame_history.back().cloned().unwrap_or_default();
let per_node_refs: Vec<(u8, &[f64])> = vec![(1u8, amps_owned.as_slice())];
let feat_arr = adaptive_classifier::features_from_runtime(
&serde_json::json!({
"variance": features.variance,
@ -414,7 +501,7 @@ pub fn adaptive_override(state: &AppStateInner, features: &FeatureInfo, classifi
"change_points": features.change_points,
"mean_rssi": features.mean_rssi,
}),
&per_node_refs,
);
let (label, conf) = model.classify(&feat_arr);
classification.motion_level = label.to_string();
@ -673,3 +760,63 @@ pub fn chrono_timestamp() -> u64 {
.map(|d| d.as_secs())
.unwrap_or(0)
}
#[cfg(test)]
mod tests {
use super::*;
/// Regression test for ADR-100 D3: parse_esp32_frame must extract
/// fields from the exact offsets the firmware writes in
/// csi_collector.c::serialize_csi_frame(). A previous version
/// shifted every field after `n_antennas` by 2 bytes, making RSSI
/// random noise. This test builds a synthetic frame with distinctive
/// values for every header field and asserts the parser recovers
/// each one.
#[test]
fn parse_esp32_frame_header_offsets_match_firmware() {
let n_sub: u16 = 64;
let freq_mhz: u32 = 2462; // channel 11
let sequence: u32 = 0x1122_3344;
let rssi: i8 = -57;
let noise_floor: i8 = -95;
let n_pairs = 1 * n_sub as usize;
let mut buf = vec![0u8; 20 + n_pairs * 2];
buf[0..4].copy_from_slice(&0xC511_0001u32.to_le_bytes());
buf[4] = 7; // node_id
buf[5] = 1; // n_antennas
buf[6..8].copy_from_slice(&n_sub.to_le_bytes()); // u16
buf[8..12].copy_from_slice(&freq_mhz.to_le_bytes()); // u32
buf[12..16].copy_from_slice(&sequence.to_le_bytes()); // u32
buf[16] = rssi as u8;
buf[17] = noise_floor as u8;
// [18..20] reserved zeros
// I/Q: leave zeros — parser still needs them present
let f = parse_esp32_frame(&buf).expect("frame parses");
assert_eq!(f.node_id, 7);
assert_eq!(f.n_antennas, 1);
assert_eq!(f.n_subcarriers as u16, n_sub);
assert_eq!(f.freq_mhz, freq_mhz as u16); // parser narrows to u16 (upper bytes always 0 in WiFi)
assert_eq!(f.sequence, sequence);
assert_eq!(f.rssi, -57, "rssi must come from byte 16, not 14");
assert_eq!(f.noise_floor, -95, "noise_floor must come from byte 17, not 15");
assert_eq!(f.amplitudes.len(), n_pairs);
}
/// Boundary case: minimum-size frame (20 B header, zero I/Q pairs)
/// must not panic and must still expose RSSI correctly.
#[test]
fn parse_esp32_frame_min_size_rssi_only() {
let mut buf = vec![0u8; 20];
buf[0..4].copy_from_slice(&0xC511_0001u32.to_le_bytes());
buf[5] = 0; // 0 antennas → 0 IQ pairs
buf[6..8].copy_from_slice(&0u16.to_le_bytes());
buf[16] = (-71i8) as u8;
buf[17] = (-92i8) as u8;
let f = parse_esp32_frame(&buf).expect("min frame parses");
assert_eq!(f.rssi, -71);
assert_eq!(f.noise_floor, -92);
assert!(f.amplitudes.is_empty());
}
}


@ -19,3 +19,5 @@ pub mod sona;
pub mod sparse_inference;
#[allow(dead_code)]
pub mod embedding;
/// ADR-116: WiFlow-v1 supervised pose model loader + Rust forward pass.
pub mod wiflow_v1;

File diff suppressed because it is too large


@ -0,0 +1,473 @@
//! ADR-116: WiFlow-v1 supervised pose model loader + inference.
//!
//! Ports `scripts/train-wiflow-supervised.js` inference path to Rust so
//! sensing-server can serve real keypoints on `/api/v1/pose/*` instead of
//! returning empty arrays per ADR-105 gate.
//!
//! The model on HuggingFace (`ruv/ruview/wiflow-v1/wiflow-v1.json`) is the
//! **lite** scale (186,946 params), not the `full` scale described by the
//! `architecture` field, which the exporter hardcodes. We trust
//! `totalParams` to disambiguate.
//!
//! Topology (lite):
//! * 2 TCN blocks, kernel=3, dilations=[1,2]
//! * Per block: causal_conv1 → bn1 → relu → causal_conv2 → bn2
//! + residual (1×1 projection if in_ch ≠ out_ch) → relu
//! * tcnChannels: 35 → 32 → 32
//! * Flatten (32 × 20 = 640) → fc1 (640→256) → relu → fc2 (256→34)
//! * Sigmoid on final 34-dim vector → 17 (x,y) keypoints in [0, 1]
//!
//! Weight order (collectParams in train script):
//! for each tcn block:
//! conv1.weight, conv1.bias, bn1.gamma, bn1.beta,
//! conv2.weight, conv2.bias, bn2.gamma, bn2.beta,
//! (if in_ch ≠ out_ch: res.weight, res.bias)
//! fc1.weight, fc1.bias, fc2.weight, fc2.bias
//!
//! All weights are f32 little-endian, base64-encoded in `weightsBase64`.
use std::path::Path;
const TIME_STEPS: usize = 20;
const INPUT_DIM: usize = 35;
const NUM_KP: usize = 17;
const OUT_DIM: usize = NUM_KP * 2; // 34
const TCN_CH: [usize; 3] = [INPUT_DIM, 32, 32]; // chain: 35 → 32 → 32
const TCN_K: usize = 3;
const TCN_DIL: [usize; 2] = [1, 2];
const HIDDEN: usize = 256;
const FLAT_DIM: usize = 32 * TIME_STEPS; // 640
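The `totalParams == 186_946` check in `load_from_json` follows directly from the lite topology in the module docs. The arithmetic is written out below as hypothetical helper functions that count parameters in the documented `collectParams` order: conv weight + bias, BN gamma + beta, and the optional 1×1 residual.

```rust
/// Conv1d parameters: out_ch * in_ch * k weights + out_ch biases.
fn conv_params(in_ch: usize, out_ch: usize, k: usize) -> usize {
    out_ch * in_ch * k + out_ch
}

/// One TCN block: conv1, bn1, conv2, bn2, plus a 1x1 projection when in_ch != out_ch.
fn tcn_block_params(in_ch: usize, out_ch: usize, k: usize) -> usize {
    let bn = 2 * out_ch; // gamma + beta only; running stats are not serialized
    let res = if in_ch != out_ch { conv_params(in_ch, out_ch, 1) } else { 0 };
    conv_params(in_ch, out_ch, k) + bn + conv_params(out_ch, out_ch, k) + bn + res
}

/// Linear layer: in_dim * out_dim weights + out_dim biases.
fn linear_params(in_dim: usize, out_dim: usize) -> usize {
    in_dim * out_dim + out_dim
}

fn wiflow_lite_params() -> usize {
    let (t, k) = (20, 3);
    tcn_block_params(35, 32, k)       // block 0: 7,776 (includes the 35 -> 32 projection)
        + tcn_block_params(32, 32, k) // block 1: 6,336
        + linear_params(32 * t, 256)  // fc1: 164,096
        + linear_params(256, 17 * 2)  // fc2: 8,738
}
```

Since every weight is an `f32`, the decoded `weightsBase64` payload must be exactly 4 × 186,946 = 747,784 bytes, which is the second length check the loader performs.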
/// CausalConv1d weights: `weight[oc*(in_ch*k) + ic*k + tap]`, bias `[oc]`.
#[derive(Debug, Clone)]
struct Conv1d {
in_ch: usize,
out_ch: usize,
kernel: usize,
dilation: usize,
weight: Vec<f32>,
bias: Vec<f32>,
}
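To make the `Conv1d` weight layout concrete, here is a sketch of a dilated causal convolution over channel-major input (`x[ic * T + t]`). The tap alignment is an assumption not stated in this file: tap `k - 1` is taken to read the current step, and tap `tap` reads `t - (k - 1 - tap) * dilation`, with history before `t = 0` treated as zero padding. If the JS trainer aligns taps the other way, the weight order within each kernel would be reversed. `causal_conv1d` is a hypothetical name.

```rust
/// Sketch: dilated causal 1-D convolution, weight layout
/// `weight[oc*(in_ch*k) + ic*k + tap]`, input/output channel-major.
/// ASSUMPTION: tap k-1 aligns with the current step t.
fn causal_conv1d(
    x: &[f32],
    in_ch: usize,
    t_len: usize,
    weight: &[f32],
    bias: &[f32],
    k: usize,
    dilation: usize,
) -> Vec<f32> {
    let out_ch = bias.len();
    assert_eq!(x.len(), in_ch * t_len);
    assert_eq!(weight.len(), out_ch * in_ch * k);
    let mut y = vec![0.0f32; out_ch * t_len];
    for oc in 0..out_ch {
        for t in 0..t_len {
            let mut s = bias[oc];
            for ic in 0..in_ch {
                for tap in 0..k {
                    let back = (k - 1 - tap) * dilation;
                    if back > t {
                        continue; // falls in the zero left-padding
                    }
                    s += weight[oc * in_ch * k + ic * k + tap] * x[ic * t_len + (t - back)];
                }
            }
            y[oc * t_len + t] = s;
        }
    }
    y
}
```

Under this convention the output at step `t` depends only on inputs at or before `t`, which is what makes the TCN usable on a live 20-frame buffer.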
/// BatchNorm1d: 2 params per channel (gamma, beta). Running stats are NOT
/// serialized — JS impl re-computes mean/var per window at inference time.
#[derive(Debug, Clone)]
struct BatchNorm {
channels: usize,
gamma: Vec<f32>,
beta: Vec<f32>,
}
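Because only `gamma` and `beta` are serialized, inference has to normalise with statistics from the current window. A sketch of that follows, under explicit assumptions that this file does not pin down: per-channel statistics over the time axis of one channel-major window, biased (1/T) variance, and `eps = 1e-5`. `batch_norm_per_window` is a hypothetical name. One consequence worth noting is that a constant channel collapses to `beta`.

```rust
/// Sketch: batch norm with statistics recomputed per window.
/// ASSUMPTIONS: stats per channel over the time axis of `x[c * t_len + t]`,
/// biased variance, eps = 1e-5.
fn batch_norm_per_window(x: &mut [f32], t_len: usize, gamma: &[f32], beta: &[f32]) {
    const EPS: f32 = 1e-5;
    let channels = gamma.len();
    assert_eq!(x.len(), channels * t_len);
    assert_eq!(beta.len(), channels);
    for c in 0..channels {
        let row = &mut x[c * t_len..(c + 1) * t_len];
        let mean = row.iter().sum::<f32>() / t_len as f32;
        let var = row.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / t_len as f32;
        let inv_std = 1.0 / (var + EPS).sqrt();
        for v in row.iter_mut() {
            *v = gamma[c] * (*v - mean) * inv_std + beta[c];
        }
    }
}
```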
#[derive(Debug, Clone)]
struct TcnBlock {
conv1: Conv1d,
bn1: BatchNorm,
conv2: Conv1d,
bn2: BatchNorm,
res: Option<Conv1d>, // 1×1 projection when in_ch ≠ out_ch
}
#[derive(Debug, Clone)]
struct Linear {
in_dim: usize,
out_dim: usize,
/// Row-major `[in_dim, out_dim]` — matches JS `weight[i*outDim + j]`.
weight: Vec<f32>,
bias: Vec<f32>,
}
#[derive(Debug, Clone)]
pub struct WiflowModel {
blocks: [TcnBlock; 2],
fc1: Linear,
fc2: Linear,
}
#[derive(Debug)]
pub struct LoadError(pub String);
impl std::fmt::Display for LoadError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "wiflow_v1 load: {}", self.0)
}
}
impl std::error::Error for LoadError {}
impl WiflowModel {
pub fn load_from_json(path: &Path) -> Result<Self, LoadError> {
let raw = std::fs::read_to_string(path)
.map_err(|e| LoadError(format!("read {}: {e}", path.display())))?;
let v: serde_json::Value = serde_json::from_str(&raw)
.map_err(|e| LoadError(format!("json parse: {e}")))?;
let total = v.get("totalParams").and_then(|x| x.as_u64()).unwrap_or(0) as usize;
if total != 186_946 {
return Err(LoadError(format!(
"totalParams={total}, expected 186946 (lite scale). The exporter \
hardcodes the `architecture` field to the full scale; \
totalParams is the only reliable signal."
)));
}
let b64 = v.get("weightsBase64").and_then(|x| x.as_str())
.ok_or_else(|| LoadError("missing weightsBase64".into()))?;
let bytes = base64_decode(b64)
.map_err(|e| LoadError(format!("base64: {e}")))?;
if bytes.len() != total * 4 {
return Err(LoadError(format!(
"bytes={}, expected {} (totalParams*4)", bytes.len(), total * 4)));
}
let floats: Vec<f32> = bytes.chunks_exact(4)
.map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
.collect();
let mut cur = Cursor::new(&floats);
let block0 = TcnBlock::take(&mut cur, TCN_CH[0], TCN_CH[1], TCN_K, TCN_DIL[0])?;
let block1 = TcnBlock::take(&mut cur, TCN_CH[1], TCN_CH[2], TCN_K, TCN_DIL[1])?;
let fc1 = Linear::take(&mut cur, FLAT_DIM, HIDDEN)?;
let fc2 = Linear::take(&mut cur, HIDDEN, OUT_DIM)?;
if cur.remaining() != 0 {
return Err(LoadError(format!(
"weight stream has {} unread floats after fc2 — topology mismatch",
cur.remaining()
)));
}
Ok(Self { blocks: [block0, block1], fc1, fc2 })
}
/// Forward pass.
/// `input` is `[INPUT_DIM × TIME_STEPS]` row-major (channel-major):
/// `input[c * TIME_STEPS + t]`.
/// Returns 17 keypoints as (x, y) in [0, 1].
pub fn forward(&self, input: &[f32]) -> [(f32, f32); NUM_KP] {
debug_assert_eq!(input.len(), INPUT_DIM * TIME_STEPS);
let mut x: Vec<f32> = input.to_vec();
// TCN blocks
x = self.blocks[0].forward(&x, TIME_STEPS);
x = self.blocks[1].forward(&x, TIME_STEPS);
// Flatten — channels-major matches JS `c * T + t` linearisation.
debug_assert_eq!(x.len(), FLAT_DIM);
// fc1 + relu
let mut h = self.fc1.forward(&x);
for v in h.iter_mut() { if *v < 0.0 { *v = 0.0; } }
// fc2
let out = self.fc2.forward(&h);
// sigmoid → 17 (x, y)
let mut kp = [(0.0f32, 0.0f32); NUM_KP];
for i in 0..NUM_KP {
kp[i].0 = sigmoid(out[i * 2]);
kp[i].1 = sigmoid(out[i * 2 + 1]);
}
kp
}
}
// ── Internal layer impls ─────────────────────────────────────────────────────
struct Cursor<'a> {
data: &'a [f32],
offset: usize,
}
impl<'a> Cursor<'a> {
fn new(d: &'a [f32]) -> Self { Self { data: d, offset: 0 } }
fn take(&mut self, n: usize) -> Result<Vec<f32>, LoadError> {
if self.offset + n > self.data.len() {
return Err(LoadError(format!(
"weight underrun: need {}, have {}", n, self.data.len() - self.offset)));
}
let out = self.data[self.offset..self.offset + n].to_vec();
self.offset += n;
Ok(out)
}
fn remaining(&self) -> usize { self.data.len() - self.offset }
}
impl Conv1d {
fn take(c: &mut Cursor<'_>, in_ch: usize, out_ch: usize, k: usize, dil: usize)
-> Result<Self, LoadError>
{
let weight = c.take(in_ch * k * out_ch)?;
let bias = c.take(out_ch)?;
Ok(Self { in_ch, out_ch, kernel: k, dilation: dil, weight, bias })
}
/// Causal conv with implicit zero left-padding. Input layout: `[in_ch * T]` row-major.
/// Tap `k` reads `input[t - k * dilation]` (tap 0 = current step); taps that would
/// reach before `t = 0` fall into the zero padding and contribute nothing. This tap
/// order has to stay in sync with the JS implementation the weights were exported from.
fn forward(&self, input: &[f32], t_steps: usize) -> Vec<f32> {
let mut out = vec![0.0f32; self.out_ch * t_steps];
for oc in 0..self.out_ch {
for t in 0..t_steps {
let mut sum = self.bias[oc];
for ic in 0..self.in_ch {
for k in 0..self.kernel {
let back = k * self.dilation;
if back > t { continue; } // zero-padded region before t = 0
let t_src = t - back;
let w_idx = oc * (self.in_ch * self.kernel) + ic * self.kernel + k;
sum += self.weight[w_idx] * input[ic * t_steps + t_src];
}
}
out[oc * t_steps + t] = sum;
}
}
out
}
}
impl BatchNorm {
fn take(c: &mut Cursor<'_>, channels: usize) -> Result<Self, LoadError> {
let gamma = c.take(channels)?;
let beta = c.take(channels)?;
Ok(Self { channels, gamma, beta })
}
/// Per-window normalisation matching JS impl: mean/var computed across
/// the T axis at inference time (not from saved running stats).
fn forward(&self, x: &mut [f32], t_steps: usize) {
let eps = 1e-5f32;
for c in 0..self.channels {
let base = c * t_steps;
let mut mean = 0.0f32;
for t in 0..t_steps { mean += x[base + t]; }
mean /= t_steps as f32;
let mut var = 0.0f32;
for t in 0..t_steps {
let d = x[base + t] - mean;
var += d * d;
}
var /= t_steps as f32;
let inv_std = 1.0f32 / (var + eps).sqrt();
let g = self.gamma[c];
let b = self.beta[c];
for t in 0..t_steps {
x[base + t] = g * (x[base + t] - mean) * inv_std + b;
}
}
}
}
impl TcnBlock {
fn take(c: &mut Cursor<'_>, in_ch: usize, out_ch: usize, k: usize, dil: usize)
-> Result<Self, LoadError>
{
let conv1 = Conv1d::take(c, in_ch, out_ch, k, dil)?;
let bn1 = BatchNorm::take(c, out_ch)?;
let conv2 = Conv1d::take(c, out_ch, out_ch, k, dil)?;
let bn2 = BatchNorm::take(c, out_ch)?;
let res = if in_ch != out_ch {
Some(Conv1d::take(c, in_ch, out_ch, 1, 1)?)
} else { None };
Ok(Self { conv1, bn1, conv2, bn2, res })
}
fn forward(&self, input: &[f32], t_steps: usize) -> Vec<f32> {
let mut x = self.conv1.forward(input, t_steps);
self.bn1.forward(&mut x, t_steps);
for v in x.iter_mut() { if *v < 0.0 { *v = 0.0; } } // relu
let mut y = self.conv2.forward(&x, t_steps);
self.bn2.forward(&mut y, t_steps);
// Residual
let res: Vec<f32> = if let Some(r) = &self.res {
r.forward(input, t_steps)
} else {
input.to_vec()
};
debug_assert_eq!(y.len(), res.len());
for (yv, rv) in y.iter_mut().zip(res.iter()) { *yv += *rv; }
for v in y.iter_mut() { if *v < 0.0 { *v = 0.0; } } // relu after residual
y
}
}
impl Linear {
fn take(c: &mut Cursor<'_>, in_dim: usize, out_dim: usize) -> Result<Self, LoadError> {
let weight = c.take(in_dim * out_dim)?;
let bias = c.take(out_dim)?;
Ok(Self { in_dim, out_dim, weight, bias })
}
fn forward(&self, input: &[f32]) -> Vec<f32> {
let mut out = vec![0.0f32; self.out_dim];
for j in 0..self.out_dim {
let mut s = self.bias[j];
for i in 0..self.in_dim {
s += input[i] * self.weight[i * self.out_dim + j];
}
out[j] = s;
}
out
}
}
fn sigmoid(x: f32) -> f32 {
if x >= 0.0 {
let e = (-x).exp();
1.0 / (1.0 + e)
} else {
let e = x.exp();
e / (1.0 + e)
}
}
// ── Inline base64 decoder ────────────────────────────────────────────────────
//
// Standard alphabet (A–Z, a–z, 0–9, +, /). Padding `=` is tolerated (decoding
// stops at the first `=`). Whitespace (including newlines) is ignored in case an
// exporter wraps the base64 string across lines. Avoids pulling in the `base64`
// crate just for one decode.
fn base64_decode(s: &str) -> Result<Vec<u8>, String> {
let mut out = Vec::with_capacity(s.len() * 3 / 4 + 4);
let mut buf: u32 = 0;
let mut bits: u32 = 0;
for ch in s.bytes() {
let v: u32 = match ch {
b'A'..=b'Z' => (ch - b'A') as u32,
b'a'..=b'z' => (ch - b'a' + 26) as u32,
b'0'..=b'9' => (ch - b'0' + 52) as u32,
b'+' => 62,
b'/' => 63,
b'=' => break,
b' ' | b'\n' | b'\r' | b'\t' => continue,
_ => return Err(format!("invalid base64 char {:#x}", ch)),
};
buf = (buf << 6) | v;
bits += 6;
if bits >= 8 {
bits -= 8;
out.push((buf >> bits) as u8);
buf &= (1 << bits) - 1;
}
}
Ok(out)
}
// ── Convenience input helpers ────────────────────────────────────────────────
/// Build the `[INPUT_DIM × TIME_STEPS]` input tensor from the most recent
/// `TIME_STEPS` per-frame amplitude vectors of a single node. Picks the
/// `INPUT_DIM` (35) subcarriers with smallest NBVI score (most useful), using
/// the same per-subcarrier `α·σ/μ² + (1−α)·σ/μ` formula the classifier uses,
/// but with K=35 instead of NBVI_TOP_K=12 — model expects 35 channels.
///
/// Returns `None` if the history has fewer than `TIME_STEPS` frames or all
/// subcarriers are zero / unusable.
pub fn build_input_from_history(
history: &std::collections::VecDeque<Vec<f64>>,
) -> Option<Vec<f32>> {
let n = history.len();
if n < TIME_STEPS { return None; }
// Take the last 20 frames.
let recent: Vec<&Vec<f64>> = history.iter().rev().take(TIME_STEPS).collect();
// recent is reverse-chronological; we want chronological for forward pass.
let recent: Vec<&Vec<f64>> = recent.into_iter().rev().collect();
let n_sub = recent[0].len();
if n_sub == 0 { return None; }
// Per-subcarrier mean and std over the 20 frames.
let mut score: Vec<(usize, f64)> = (0..n_sub).map(|k| {
let mut sum = 0.0f64;
for f in &recent { sum += f.get(k).copied().unwrap_or(0.0); }
let mu = sum / TIME_STEPS as f64;
if mu.abs() < 1e-9 { return (k, f64::INFINITY); }
let mut var = 0.0f64;
for f in &recent {
let d = f.get(k).copied().unwrap_or(0.0) - mu;
var += d * d;
}
let sigma = (var / TIME_STEPS as f64).sqrt();
// NBVI (α = 0.5): 0.5 * (σ/μ²) + 0.5 * (σ/μ)
let mu2 = mu * mu;
let nbvi = 0.5 * (sigma / mu2) + 0.5 * (sigma / mu.abs());
(k, nbvi)
}).collect();
// 25th-percentile dead-zone gate (drop subcarriers with mean amplitude
// below the lower quartile).
let mut means: Vec<f64> = (0..n_sub).map(|k| {
let mut s = 0.0f64;
for f in &recent { s += f.get(k).copied().unwrap_or(0.0); }
s / TIME_STEPS as f64
}).collect();
means.sort_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal));
let q25_idx = (n_sub as f64 * 0.25) as usize;
let dead_thresh = means.get(q25_idx).copied().unwrap_or(0.0);
for (k, s) in score.iter_mut() {
// Re-compute mean for this k to gate (means above is sorted, indices lost).
let mut sum = 0.0f64;
for f in &recent { sum += f.get(*k).copied().unwrap_or(0.0); }
let mu = sum / TIME_STEPS as f64;
if mu < dead_thresh { *s = f64::INFINITY; }
}
score.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(std::cmp::Ordering::Equal));
if score.is_empty() || !score[0].1.is_finite() { return None; }
// Pick the top INPUT_DIM (35) subcarriers by lowest NBVI. If fewer than 35
// scores are finite, zero-fill the remaining channels. An earlier version
// pushed subcarrier index `0` into `picks` instead, which silently copied
// subcarrier 0 into every unfilled slot, fed the network repeated data, and
// made the saturation worse.
let mut picks: Vec<Option<usize>> = score.iter()
.filter(|(_, s)| s.is_finite())
.take(INPUT_DIM)
.map(|(k, _)| Some(*k))
.collect();
if picks.is_empty() { return None; }
while picks.len() < INPUT_DIM { picks.push(None); } // ← zero-pad, not dup
// Raw amplitudes pass through unscaled. The training script
// (`scripts/train-wiflow-supervised.js::loadJsonl`) feeds raw values; the
// BatchNorm layers in both TCN blocks normalise per channel per window at
// inference time, so the absolute scale (the 5–50 ESP32 amplitude range) is
// handled by the network itself.
let mut out = vec![0.0f32; INPUT_DIM * TIME_STEPS];
for (ci, pick) in picks.iter().enumerate() {
match pick {
Some(k) => {
for (t, f) in recent.iter().enumerate() {
out[ci * TIME_STEPS + t] = f.get(*k).copied().unwrap_or(0.0) as f32;
}
}
None => { /* zero-padded channel, already 0.0 from vec init */ }
}
}
Some(out)
}
// ── Tests ────────────────────────────────────────────────────────────────────
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn base64_round_trip_alphabet() {
// "Man" -> "TWFu"
assert_eq!(base64_decode("TWFu").unwrap(), b"Man");
// padding
assert_eq!(base64_decode("TWE=").unwrap(), b"Ma");
assert_eq!(base64_decode("TQ==").unwrap(), b"M");
// whitespace tolerated
assert_eq!(base64_decode("T W\nF u").unwrap(), b"Man");
}
#[test]
fn sigmoid_bounds() {
assert!((sigmoid(0.0) - 0.5).abs() < 1e-6);
assert!(sigmoid(10.0) > 0.999);
assert!(sigmoid(-10.0) < 0.001);
}
#[test]
fn build_input_zero_history() {
let h = std::collections::VecDeque::new();
assert!(build_input_from_history(&h).is_none());
}
}
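The literal `186_946` checked in `load_from_json` follows directly from the topology constants and the `collectParams` weight order documented at the top of the module. A minimal sketch that recomputes it at compile time; the helper functions (`conv_params`, `tcn_block_params`, `total_params`, `receptive_field`) are hypothetical and not part of the crate. It also computes the temporal receptive field, assuming (as `TcnBlock::take` does) that both convs in a block share that block's dilation:

```rust
// Topology constants copied from the loader (lite scale).
const TIME_STEPS: usize = 20;
const INPUT_DIM: usize = 35;
const NUM_KP: usize = 17;
const OUT_DIM: usize = NUM_KP * 2;
const TCN_CH: [usize; 3] = [INPUT_DIM, 32, 32];
const TCN_K: usize = 3;
const TCN_DIL: [usize; 2] = [1, 2];
const HIDDEN: usize = 256;
const FLAT_DIM: usize = 32 * TIME_STEPS;

/// Weights + bias of a 1-D conv with `k` taps (hypothetical helper).
const fn conv_params(in_ch: usize, out_ch: usize, k: usize) -> usize {
    in_ch * out_ch * k + out_ch
}

/// Weights + bias of a dense layer (hypothetical helper).
const fn linear_params(in_dim: usize, out_dim: usize) -> usize {
    in_dim * out_dim + out_dim
}

/// One TCN block in `collectParams` order: conv1, bn1, conv2, bn2,
/// then the optional 1x1 residual projection.
const fn tcn_block_params(in_ch: usize, out_ch: usize, k: usize) -> usize {
    let mut n = conv_params(in_ch, out_ch, k) + 2 * out_ch // conv1 + bn1 (gamma, beta)
        + conv_params(out_ch, out_ch, k) + 2 * out_ch;     // conv2 + bn2
    if in_ch != out_ch {
        n += conv_params(in_ch, out_ch, 1); // 1x1 projection
    }
    n
}

/// Expected number of f32 values in the `weightsBase64` stream.
const fn total_params() -> usize {
    tcn_block_params(TCN_CH[0], TCN_CH[1], TCN_K)
        + tcn_block_params(TCN_CH[1], TCN_CH[2], TCN_K)
        + linear_params(FLAT_DIM, HIDDEN)
        + linear_params(HIDDEN, OUT_DIM)
}

/// Receptive field in time steps: every causal conv widens it by
/// (k - 1) * dilation, and each block holds two convs.
const fn receptive_field() -> usize {
    let mut rf = 1;
    let mut b = 0;
    while b < TCN_DIL.len() {
        rf += 2 * (TCN_K - 1) * TCN_DIL[b];
        b += 1;
    }
    rf
}

// Compile-time check: the topology reproduces the loader's hard-coded count.
const _: () = assert!(total_params() == 186_946);
```

With these numbers each TCN output step sees the current frame plus 12 frames of history (13 of the 20), and `fc1` then mixes all 20 positions. Deriving the expected count this way, rather than hard-coding it, is one option for keeping the loader's check in step if the constants change.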

@ -0,0 +1,509 @@
<!doctype html>
<html lang="en"><head>
<meta charset="utf-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<title>RuView — Raw Signals</title>
<style>
:root { color-scheme: dark; }
body { margin:0; padding:14px; font-family:-apple-system,Inter,system-ui,sans-serif;
background:#0a0e13; color:#e6edf3; font-size:12px; }
h1 { font-size:15px; font-weight:600; margin:0 0 2px; }
.sub { font-size:11px; color:#888; margin:0 0 12px; }
.topbar { display:flex; gap:14px; align-items:center; margin-bottom:10px; flex-wrap:wrap; }
.pill { padding:4px 10px; border-radius:4px; font-family:JetBrains Mono,monospace; font-size:11px;
background:#1c2128; }
.pill.dis { background:#3a1418; color:#ff6a6a; }
.pill.ok { background:#0e2a1a; color:#7ce38b; }
button { background:#21262d; color:#e6edf3; border:1px solid #30363d; border-radius:4px;
padding:4px 10px; font-size:11px; cursor:pointer; }
.node { background:#161b22; border:1px solid #30363d; border-radius:6px;
padding:10px 12px; margin-bottom:10px; }
.node h2 { margin:0 0 6px; font-size:12px; font-weight:600; color:#7cb6ff;
font-family:JetBrains Mono,monospace; display:flex; gap:14px; align-items:baseline; }
.node h2 .stat { color:#888; font-weight:normal; font-size:11px; }
.node h2 .stat b { color:#e6edf3; font-weight:600; }
.badge { font-family:JetBrains Mono,monospace; font-size:11px; padding:2px 8px; border-radius:3px; }
.badge.absent { background:#21262d; color:#888; }
.badge.present_still { background:#1c3a55; color:#7cb6ff; }
.badge.present_moving{ background:#3a5520; color:#90d36b; }
.badge.active { background:#552020; color:#ff7a7a; }
.row { display:grid; grid-template-columns: 1fr 360px; gap:10px; }
@media (max-width: 900px) { .row { grid-template-columns: 1fr; } }
canvas { display:block; width:100%; background:#0a0e13; border-radius:3px; }
canvas.bars { height: 130px; }
canvas.trace { height: 130px; }
canvas.spark { height: 48px; margin-top: 6px; }
.lbl { color:#666; font-size:10px; font-family:JetBrains Mono,monospace; margin:2px 0 0; }
.controls { display:flex; gap:8px; margin-left:auto; }
.controls label { font-size:11px; color:#aaa; }
</style>
</head>
<body>
<h1>RuView — Raw CSI signals</h1>
<p class="sub">Per-node subcarrier amplitudes + RSSI/broadband traces. No DSP, no classification. Streamed straight from the sensor.</p>
<div class="topbar">
<span id="status" class="pill dis">disconnected</span>
<span class="pill" id="rate">0 fps</span>
<span class="pill" id="lastTs">last: --</span>
<span class="badge absent" id="globalBadge" style="font-size:13px;padding:4px 12px;">absent</span>
<span class="pill" id="globalCV">CV 0%</span>
<div class="controls">
<label>peak-hold <input type="checkbox" id="peakHold" checked></label>
<label>log-y <input type="checkbox" id="logY"></label>
<button onclick="resetState()">reset</button>
<button id="calibrateBtn" onclick="startCalibrate()" title="Step out of the room, click, wait 90 s">calibrate empty</button>
<span class="pill" id="calibStatus" style="display:none"></span>
<!-- ADR-107: visible progress bar shown while baseline capture runs. -->
<div id="calibProgress" style="display:none; position:relative; width:140px; height:14px;
border:1px solid #30363d; border-radius:7px; overflow:hidden;
background:#0a0e13;">
<div id="calibProgressFill" style="position:absolute; left:0; top:0; bottom:0; width:0%;
background:linear-gradient(90deg,#1f6feb,#3fb950);
transition: width 0.4s linear;"></div>
<span id="calibProgressLabel" style="position:absolute; inset:0; display:flex;
align-items:center; justify-content:center;
font-size:10px; font-family:JetBrains Mono,monospace;
color:#e6edf3; text-shadow:0 0 2px #000;"></span>
</div>
</div>
</div>
<div id="nodes"></div>
<script>
// ── State ──────────────────────────────────────────────────────────
const TRACE_SEC = 30; // seconds of history per node
const TRACE_MAX_PTS = 1200; // safety cap
const state = new Map(); // node_id -> { amp, peak, rssiHist[], meanAmpHist[], driftHist[], lastTs, frames, fps }
let frameCount = 0;
let lastRateTs = performance.now();
let rateFps = 0;
let logY = false;
let peakHold = true;
function resetState() {
state.clear();
document.getElementById('nodes').innerHTML = '';
frameCount = 0;
}
document.getElementById('peakHold').addEventListener('change', e => { peakHold = e.target.checked; });
document.getElementById('logY').addEventListener('change', e => { logY = e.target.checked; });
// ── Per-node block factory ─────────────────────────────────────────
function ensureNodeBlock(nodeId) {
if (state.has(nodeId)) return state.get(nodeId);
const ent = {
amp: [],
peak: [],
rssiHist: [], // { t, v }
meanAmpHist: [],
driftHist: [], // { t, v } — ADR-104 per-sub drift score
lastTs: 0,
frames: 0,
lastFrameWall: performance.now(),
fps: 0,
};
state.set(nodeId, ent);
const wrap = document.createElement('div');
wrap.className = 'node';
wrap.id = 'node-' + nodeId;
wrap.innerHTML = `
<h2>
Node ${nodeId}
<span class="badge absent" id="n${nodeId}-badge">absent</span>
<span class="stat">CV <b id="n${nodeId}-cv">0%</b></span>
<span class="stat">subc <b id="n${nodeId}-sub">0</b></span>
<span class="stat">rssi <b id="n${nodeId}-rssi">--</b> dBm</span>
<span class="stat">mean A <b id="n${nodeId}-meanA">0</b></span>
<span class="stat">peak A <b id="n${nodeId}-peakA">0</b></span>
<span class="stat">drift <b id="n${nodeId}-drift">--</b></span>
<span class="stat">node fps <b id="n${nodeId}-fps">0</b></span>
</h2>
<div class="row">
<div>
<canvas class="bars" id="n${nodeId}-bars"></canvas>
<p class="lbl">subcarrier amplitude bars (left → low freq, right → high freq)</p>
</div>
<div>
<canvas class="trace" id="n${nodeId}-trace"></canvas>
<p class="lbl"><span style="color:#8b949e">RSSI</span> &nbsp; <span style="color:#3fb950">broadband mean amplitude</span> &nbsp; (last ${TRACE_SEC}s)</p>
<canvas class="spark" id="n${nodeId}-driftSpark"></canvas>
<p class="lbl"><span style="color:#d29922">per-sub drift</span>: off-axis presence channel (ADR-104); dashed lines = presence threshold 0.10 and warning threshold 0.15</p>
</div>
</div>`;
document.getElementById('nodes').appendChild(wrap);
return ent;
}
// ── Drawing ────────────────────────────────────────────────────────
function drawBars(canvas, amps, peaks) {
const w = canvas.clientWidth, h = canvas.clientHeight;
if (canvas.width !== w || canvas.height !== h) { canvas.width = w; canvas.height = h; }
const ctx = canvas.getContext('2d');
ctx.fillStyle = '#0a0e13'; ctx.fillRect(0, 0, w, h);
if (!amps.length) return;
// Determine scale
let maxV = peakHold && peaks.length
? Math.max(...peaks)
: Math.max(...amps);
if (!isFinite(maxV) || maxV <= 0) maxV = 1;
const n = amps.length;
const bw = w / n;
const margin = 4;
// Bars
for (let i = 0; i < n; i++) {
let v = amps[i];
let pv = peaks[i] || 0;
if (logY) {
v = v > 0 ? Math.log10(v + 1) : 0;
pv = pv > 0 ? Math.log10(pv + 1) : 0;
}
const scaleMax = logY ? Math.log10(maxV + 1) : maxV;
const bh = Math.max(1, (v / scaleMax) * (h - margin));
const ph = Math.max(1, (pv / scaleMax) * (h - margin));
const x = i * bw;
// peak (faint)
if (peakHold && pv > 0) {
ctx.fillStyle = '#1f3a5a';
ctx.fillRect(x, h - ph, Math.max(1, bw - 1), 1.5);
}
// bar (active)
const hue = 200 + (i / n) * 100;
ctx.fillStyle = `hsl(${hue}, 70%, 55%)`;
ctx.fillRect(x, h - bh, Math.max(1, bw - 1), bh);
}
// Y-axis label
ctx.fillStyle = '#555'; ctx.font = '9px monospace';
ctx.fillText('max=' + maxV.toFixed(0), 4, 10);
ctx.fillText('n=' + n, w - 40, 10);
}
function drawTrace(canvas, rssiHist, meanAmpHist) {
const w = canvas.clientWidth, h = canvas.clientHeight;
if (canvas.width !== w || canvas.height !== h) { canvas.width = w; canvas.height = h; }
const ctx = canvas.getContext('2d');
ctx.fillStyle = '#0a0e13'; ctx.fillRect(0, 0, w, h);
const now = performance.now() / 1000;
const t0 = now - TRACE_SEC;
const drawSeries = (arr, color, getRange) => {
if (arr.length < 2) return;
const visible = arr.filter(p => p.t >= t0);
if (visible.length < 2) return;
const { min, max } = getRange(visible);
const span = (max - min) || 1;
ctx.strokeStyle = color; ctx.lineWidth = 1.5; ctx.beginPath();
for (let i = 0; i < visible.length; i++) {
const p = visible[i];
const x = ((p.t - t0) / TRACE_SEC) * w;
const y = h - ((p.v - min) / span) * (h - 8) - 4;
if (i === 0) ctx.moveTo(x, y); else ctx.lineTo(x, y);
}
ctx.stroke();
// Return the range so the caller can label the series.
return { min, max };
};
const rssiR = drawSeries(rssiHist, '#8b949e', arr => {
const vals = arr.map(p => p.v);
return { min: Math.min(...vals), max: Math.max(...vals) };
});
const ampR = drawSeries(meanAmpHist, '#3fb950', arr => {
const vals = arr.map(p => p.v);
return { min: 0, max: Math.max(...vals) };
});
// labels
ctx.font = '9px monospace';
if (rssiR) { ctx.fillStyle = '#8b949e'; ctx.fillText(`rssi ${rssiR.min.toFixed(0)}…${rssiR.max.toFixed(0)} dBm`, 4, 10); }
if (ampR) { ctx.fillStyle = '#3fb950'; ctx.fillText(`A ${ampR.min.toFixed(0)}…${ampR.max.toFixed(0)}`, 4, 22); }
// grid line at now
ctx.strokeStyle = '#1c2128'; ctx.beginPath();
ctx.moveTo(w - 1, 0); ctx.lineTo(w - 1, h); ctx.stroke();
}
// ADR-104: per-sub drift sparkline. Fixed Y range [0, 0.30] so the
// presence threshold (0.10, dashed) and warning threshold (0.15) are
// directly readable across nodes — re-scaling per node would make it
// impossible to tell "Node 0 fired" from "Node 1 fired" at a glance.
const DRIFT_PRESENCE_THRESH = 0.10;
const DRIFT_WARN_THRESH = 0.15;
const DRIFT_MAX = 0.30;
function drawDriftSpark(canvas, hist) {
const w = canvas.clientWidth, h = canvas.clientHeight;
if (canvas.width !== w || canvas.height !== h) { canvas.width = w; canvas.height = h; }
const ctx = canvas.getContext('2d');
ctx.fillStyle = '#0a0e13'; ctx.fillRect(0, 0, w, h);
const now = performance.now() / 1000;
const t0 = now - TRACE_SEC;
const yOf = v => h - (Math.min(v, DRIFT_MAX) / DRIFT_MAX) * (h - 4) - 2;
// Threshold lines.
ctx.setLineDash([3, 3]);
ctx.strokeStyle = '#5a4a1a'; ctx.lineWidth = 1; ctx.beginPath();
ctx.moveTo(0, yOf(DRIFT_PRESENCE_THRESH)); ctx.lineTo(w, yOf(DRIFT_PRESENCE_THRESH));
ctx.stroke();
ctx.strokeStyle = '#7a3030'; ctx.beginPath();
ctx.moveTo(0, yOf(DRIFT_WARN_THRESH)); ctx.lineTo(w, yOf(DRIFT_WARN_THRESH));
ctx.stroke();
ctx.setLineDash([]);
const visible = hist.filter(p => p.t >= t0);
if (visible.length >= 2) {
ctx.strokeStyle = '#d29922'; ctx.lineWidth = 1.5; ctx.beginPath();
for (let i = 0; i < visible.length; i++) {
const p = visible[i];
const x = ((p.t - t0) / TRACE_SEC) * w;
const y = yOf(p.v);
if (i === 0) ctx.moveTo(x, y); else ctx.lineTo(x, y);
}
ctx.stroke();
}
// Axis text.
ctx.fillStyle = '#666'; ctx.font = '9px monospace';
ctx.fillText('0', 2, h - 2);
ctx.fillText(DRIFT_MAX.toFixed(2), 2, 10);
}
// ── Frame ingestion ────────────────────────────────────────────────
function handleSensingUpdate(d) {
const nodes = d.nodes || [];
const ts = d.timestamp || (Date.now() / 1000);
const now = performance.now() / 1000;
for (const n of nodes) {
const id = n.node_id;
const amps = n.amplitude || [];
// Skip empty-amp ticks (feature_state path doesn't carry raw CSI).
// Bars/traces only refresh on real raw-CSI frames so what you see
// is always a live snapshot, not a repeated stale vector.
if (!amps.length) continue;
const ent = ensureNodeBlock(id);
ent.amp = amps;
// peak-hold update
if (ent.peak.length !== amps.length) ent.peak = amps.slice();
else for (let i = 0; i < amps.length; i++) if (amps[i] > ent.peak[i]) ent.peak[i] = amps[i];
const meanA = amps.reduce((s, x) => s + x, 0) / amps.length;
// Only push valid (non-zero) RSSI samples so the trace doesn't
// jump between real dBm values and the "0 = no data" sentinel.
if (n.rssi_dbm && n.rssi_dbm !== 0) {
ent.rssiHist.push({ t: now, v: n.rssi_dbm });
}
ent.meanAmpHist.push({ t: now, v: meanA });
const cutoff = now - TRACE_SEC;
while (ent.rssiHist.length && ent.rssiHist[0].t < cutoff) ent.rssiHist.shift();
while (ent.meanAmpHist.length && ent.meanAmpHist[0].t < cutoff) ent.meanAmpHist.shift();
if (ent.rssiHist.length > TRACE_MAX_PTS) ent.rssiHist.splice(0, ent.rssiHist.length - TRACE_MAX_PTS);
if (ent.meanAmpHist.length > TRACE_MAX_PTS) ent.meanAmpHist.splice(0, ent.meanAmpHist.length - TRACE_MAX_PTS);
// per-node fps: count frames in the last second, refresh once a sec
// (instantaneous 1/dt was wildly noisy because multiple WS paths
// emit duplicate per-node updates back-to-back).
ent.fpsCounter = (ent.fpsCounter || 0) + 1;
const nowMs = performance.now();
if (!ent.fpsWindowStart) ent.fpsWindowStart = nowMs;
if (nowMs - ent.fpsWindowStart >= 1000) {
ent.fps = ent.fpsCounter * 1000 / (nowMs - ent.fpsWindowStart);
ent.fpsCounter = 0;
ent.fpsWindowStart = nowMs;
}
ent.lastFrameWall = nowMs;
ent.frames++;
ent.lastTs = ts;
document.getElementById(`n${id}-sub`).textContent = amps.length;
// n.rssi_dbm comes from sensing_update.nodes[]; it can be 0 on
// early ticks (history not yet populated). Coerce to "--" so the
// operator doesn't think the AP is dead.
const rssiVal = (n.rssi_dbm && Number.isFinite(n.rssi_dbm) && n.rssi_dbm !== 0)
? n.rssi_dbm.toFixed(1)
: '--';
document.getElementById(`n${id}-rssi`).textContent = rssiVal;
document.getElementById(`n${id}-meanA`).textContent = meanA.toFixed(1);
document.getElementById(`n${id}-peakA`).textContent = Math.max(...ent.peak).toFixed(1);
document.getElementById(`n${id}-fps`).textContent = ent.fps.toFixed(1);
}
document.getElementById('lastTs').textContent = 'last: ' + new Date(ts * 1000).toLocaleTimeString();
// Global classification badge (ADR-101 fused).
const gcl = d.classification || {};
const glvl = gcl.motion_level || 'absent';
const gb = document.getElementById('globalBadge');
if (gb) { gb.textContent = glvl; gb.className = 'badge ' + glvl; gb.style.fontSize = '13px'; gb.style.padding = '4px 12px'; }
const gcv = document.getElementById('globalCV');
if (gcv) gcv.textContent = 'CV ' + ((gcl.confidence || 0) * 100).toFixed(1) + '%';
// Per-node level badge from node_features[i].classification (ADR-101).
const nfNow = performance.now() / 1000;
const nf = d.node_features || [];
for (const f of nf) {
const id = f.node_id;
const cls = f.classification || {};
const lvl = cls.motion_level || 'absent';
const badge = document.getElementById(`n${id}-badge`);
if (badge) {
badge.textContent = lvl;
badge.className = 'badge ' + lvl;
}
const cvEl = document.getElementById(`n${id}-cv`);
if (cvEl) cvEl.textContent = ((cls.confidence || 0) * 100).toFixed(1) + '%';
// ADR-104 per-sub drift score (off-axis presence). May be absent
// when no per-sub baseline is loaded for this node — show '--'
// instead of '0.000' so the operator can tell the channel is
// unknown vs. known and stable.
const driftEl = document.getElementById(`n${id}-drift`);
const driftLive = state.get(id);
if (typeof f.drift_score === 'number' && Number.isFinite(f.drift_score)) {
if (driftEl) driftEl.textContent = f.drift_score.toFixed(3);
if (driftLive) {
driftLive.driftHist.push({ t: nfNow, v: f.drift_score });
const cutoff = nfNow - TRACE_SEC;
while (driftLive.driftHist.length && driftLive.driftHist[0].t < cutoff) {
driftLive.driftHist.shift();
}
if (driftLive.driftHist.length > TRACE_MAX_PTS) {
driftLive.driftHist.splice(0, driftLive.driftHist.length - TRACE_MAX_PTS);
}
}
} else if (driftEl) {
driftEl.textContent = '--';
}
}
frameCount++;
}
function renderTick() {
for (const [id, ent] of state) {
const bars = document.getElementById('n' + id + '-bars');
const trace = document.getElementById('n' + id + '-trace');
const spark = document.getElementById('n' + id + '-driftSpark');
if (bars) drawBars(bars, ent.amp, ent.peak);
if (trace) drawTrace(trace, ent.rssiHist, ent.meanAmpHist);
if (spark) drawDriftSpark(spark, ent.driftHist);
}
// fps pill
const now = performance.now();
if (now - lastRateTs > 500) {
rateFps = (frameCount * 1000) / (now - lastRateTs);
document.getElementById('rate').textContent = rateFps.toFixed(1) + ' fps total';
frameCount = 0;
lastRateTs = now;
}
requestAnimationFrame(renderTick);
}
requestAnimationFrame(renderTick);
// ── ADR-107: baseline calibrate button + progress bar ─────────────
let calibPollTimer = null;
const CALIB_DURATION_SEC = 90;
function setCalibProgress(pct, label) {
const bar = document.getElementById('calibProgress');
const fill = document.getElementById('calibProgressFill');
const txt = document.getElementById('calibProgressLabel');
if (!bar || !fill || !txt) return;
bar.style.display = pct < 0 ? 'none' : 'inline-block';
fill.style.width = Math.max(0, Math.min(100, pct)) + '%';
txt.textContent = label || '';
}
async function startCalibrate() {
if (!confirm(`Step OUT of the room now. Calibration will record for ${CALIB_DURATION_SEC} s.\nClick OK when you are out.`)) return;
const btn = document.getElementById('calibrateBtn');
const stat = document.getElementById('calibStatus');
btn.disabled = true; btn.textContent = 'recording…';
// Hide the text-pill while the progress bar is the primary indicator;
// it reappears only on terminal status messages (error / complete).
stat.style.display = 'none';
setCalibProgress(0, 'starting…');
try {
const res = await fetch('/api/v1/baseline/calibrate', {
method: 'POST',
headers: {'Content-Type': 'application/json'},
body: JSON.stringify({ duration_sec: CALIB_DURATION_SEC, trim_sec: 15, clean_window_sec: 30 }),
});
const j = await res.json();
if (!j.started) {
setCalibProgress(-1, '');
stat.style.display = 'inline-block';
stat.textContent = j.reason || 'failed to start';
btn.disabled = false; btn.textContent = 'calibrate empty';
return;
}
} catch (e) {
setCalibProgress(-1, '');
stat.style.display = 'inline-block';
stat.textContent = 'network error';
btn.disabled = false; btn.textContent = 'calibrate empty';
return;
}
if (calibPollTimer) clearInterval(calibPollTimer);
let elapsed = 0;
calibPollTimer = setInterval(async () => {
elapsed += 2;
try {
const r = await fetch('/api/v1/baseline'); const j = await r.json();
const s = j.calibration_status || 'idle';
if (s.startsWith('running')) {
const pct = Math.min(99, (elapsed / CALIB_DURATION_SEC) * 100);
setCalibProgress(pct, `${elapsed}/${CALIB_DURATION_SEC} s`);
} else {
clearInterval(calibPollTimer); calibPollTimer = null;
btn.disabled = false; btn.textContent = 'calibrate empty';
if (s === 'complete') {
setCalibProgress(100, 'done');
stat.style.display = 'inline-block';
stat.textContent = 'baseline updated ✓';
setTimeout(() => setCalibProgress(-1, ''), 3000);
} else {
setCalibProgress(-1, '');
stat.style.display = 'inline-block';
stat.textContent = s;
}
}
} catch (e) {}
}, 2000);
}
// ── WS ─────────────────────────────────────────────────────────────
function connect() {
const ws = new WebSocket('ws://' + location.hostname + ':8765/ws/sensing');
ws.onopen = () => {
const p = document.getElementById('status');
p.textContent = 'connected'; p.className = 'pill ok';
};
ws.onclose = () => {
const p = document.getElementById('status');
p.textContent = 'disconnected — reconnecting'; p.className = 'pill dis';
setTimeout(connect, 1500);
};
ws.onmessage = (e) => {
try {
const d = JSON.parse(e.data);
if (d.type === 'sensing_update') handleSensingUpdate(d);
} catch (_) {}
};
}
connect();
</script>
</body></html>
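`handleSensingUpdate` trims each trace buffer (`rssiHist`, `meanAmpHist`, `driftHist`) in two stages: drop samples older than `TRACE_SEC`, then cap the length at `TRACE_MAX_PTS`. A minimal Rust sketch of the same policy; the `TimedHistory` type is hypothetical and not part of the server crate:

```rust
use std::collections::VecDeque;

/// Hypothetical time-windowed sample buffer mirroring the page's trace arrays.
struct TimedHistory {
    window_sec: f64,
    max_pts: usize,
    samples: VecDeque<(f64, f64)>, // (timestamp in seconds, value)
}

impl TimedHistory {
    fn new(window_sec: f64, max_pts: usize) -> Self {
        Self { window_sec, max_pts, samples: VecDeque::new() }
    }

    /// Append a sample stamped `t`, then apply both trims.
    fn push(&mut self, t: f64, v: f64) {
        self.samples.push_back((t, v));
        // Stage 1: age-based trim, like `while (hist[0].t < cutoff) hist.shift()`.
        let cutoff = t - self.window_sec;
        while matches!(self.samples.front(), Some(&(ts, _)) if ts < cutoff) {
            self.samples.pop_front();
        }
        // Stage 2: hard cap, like `hist.splice(0, hist.length - TRACE_MAX_PTS)`.
        while self.samples.len() > self.max_pts {
            self.samples.pop_front();
        }
    }

    fn len(&self) -> usize {
        self.samples.len()
    }
}
```

`VecDeque::pop_front` removes from the front in O(1). The count cap only matters when samples arrive faster than `TRACE_MAX_PTS / TRACE_SEC` (40 per second with the page's values).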

File diff suppressed because it is too large

File diff suppressed because it is too large

@ -122,9 +122,30 @@ fn test_different_nodes_produce_different_frames() {
/// Send multiple frames from different nodes to a UDP port.
/// This test verifies the packet format is accepted by a real server
/// if one is running, but doesn't fail if no server is available.
///
/// ADR-117: previously this test sent to `127.0.0.1:5005` unconditionally,
/// hitting any live server on the same port. With `node_ids = [1,2,3,5,7]`,
/// 10 frames per node, and 5 vitals packets, it sent 55 packets that registered
/// spurious node_ids in the server's NODE_ADDRS; the keepalive task then spawned
/// one `ping` child process per unique nid, accumulating 250+ ping zombies in
/// production.
/// Mitigation is two-layered: server now filters loopback at the UDP
/// receiver, AND this test refuses to fire if anything is already bound
/// to 127.0.0.1:5005.
#[test]
fn test_multi_node_udp_send() {
// ADR-117 guard: if some other process is bound to 127.0.0.1:5005 (most
// commonly a live sensing-server during dev), skip the send so we don't
// pollute that process's state. The bind probe is the cheapest signal —
// if we can bind even briefly, nobody owns the port; if not, skip the test.
match UdpSocket::bind("127.0.0.1:5005") {
Ok(probe) => drop(probe),
Err(_) => {
eprintln!("test_multi_node_udp_send: 127.0.0.1:5005 already in use — skipping (ADR-117)");
return;
}
};
// Try to bind to a random port and send to localhost:5005.
// This is a smoke test — it verifies frames can be sent without panic.
let sock = UdpSocket::bind("0.0.0.0:0").expect("bind");
sock.set_write_timeout(Some(Duration::from_millis(100))).ok();