fix(sensing-server): WS broadcast emits effective_source() not hardcoded "esp32" (closes #618) (#621)

Reported by @ArnonEnbar with a complete reproduction.

broadcast_tick_task() re-emits the cached `latest_update` every tick so
pose WS clients keep getting data even when ESP32 pauses between
frames. The `source` field of that cached update was set to "esp32" at
the moment a fresh ESP32 frame was last decoded (main.rs:3885, :4136).

After the ESP32 loses power or network, no fresh frame is decoded —
the cached `latest_update` is still re-broadcast every tick with the
stale source: "esp32" baked in. UI's "Sensing" tab keeps showing
"LIVE — ESP32 HARDWARE Connected" with frozen vitals/features/
classification re-broadcast indefinitely. REST `/health` correctly
reports source: "esp32:offline" (via effective_source(), which checks
last_esp32_frame elapsed time against ESP32_OFFLINE_TIMEOUT=5s) — but
the WS broadcast path was the one consumer that didn't call it.

Fix: clone the cached update per tick, overwrite source with
s.effective_source(), then serialize and broadcast. UI now switches to
"esp32:offline" on the same 5s budget as the REST surface.

cargo build -p wifi-densepose-sensing-server --no-default-features:
17s, no errors (1 pre-existing unused-import warning unchanged).
This commit is contained in:
rUv 2026-05-18 08:18:18 -04:00 committed by GitHub
parent 72bbd256e7
commit b2e2e6d6fd
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
2 changed files with 13 additions and 1 deletions

View File

@ -18,6 +18,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
Pre-fix, unauthenticated callers could read `../../etc/passwd`-style paths, write arbitrary JSONL files, load attacker-controlled `.rvf` model files, or delete arbitrary files the server process could touch. 9 unit tests in `path_safety::tests` exercise the rejection envelope (empty, too-long, path separators, parent-dir traversal, null byte, whitespace/specials, non-ASCII).
### Fixed
- **WebSocket `/ws/sensing` now reports `esp32:offline` when ESP32 hardware goes stale** (closes #618). `broadcast_tick_task` was re-emitting the cached `latest_update` with a frozen `source: "esp32"` field forever after the hardware lost power or network. The REST `/health` endpoint already called `effective_source()` (which returns `"esp32:offline"` after `ESP32_OFFLINE_TIMEOUT` = 5 s with no UDP frames), but the WS broadcast path was the one consumer that didn't. Result: the UI's "LIVE — ESP32 HARDWARE Connected" banner stayed green long after the hardware went away, and `vital_signs`/`features`/`classification` re-broadcasted the last-seen values indefinitely. Fix: clone the cached `latest_update` per tick, overwrite `source` with `s.effective_source()`, then serialize and broadcast. UI can now switch to an offline state on the same 5-second budget the REST surface uses.
- **Proof replay (`archive/v1/data/proof/verify.py`) is now cross-platform deterministic** (closes #560). Three changes together: (1) `features_to_bytes()` now `np.round(.., HASH_QUANTIZATION_DECIMALS=6)`s each feature array before packing as little-endian f64, collapsing ULP-level drift from scipy.fft pocketfft SIMD reordering; (2) the `Verify Pipeline Determinism` workflow pins `OMP_NUM_THREADS=1`, `OPENBLAS_NUM_THREADS=1`, `MKL_NUM_THREADS=1`, `VECLIB_MAXIMUM_THREADS=1`, `NUMEXPR_NUM_THREADS=1` — multi-threaded BLAS reductions were a deeper source of non-determinism than SIMD reordering, and 6-decimal quantization alone wasn't enough across Azure VM microarchitectures; (3) `expected_features.sha256` regenerated under the new conditions. CI now passes the determinism check (same hash across consecutive runs on canonical Linux x86_64 CI runner: `667eb054c44ac510342665bf9c93d608868a8ead948ae8774b2796ebce6f8fe7`). `scripts/probe-fft-platform.py` updated to mirror `HASH_QUANTIZATION_DECIMALS=6` for cross-machine spot-checks.
- **`archive/v1/src/services/pose_service.py:223` calls the right method on `PhaseSanitizer`** (closes #612). The call was `self.phase_sanitizer.sanitize(phase_data)`, but `PhaseSanitizer`'s full-pipeline entry point is named `sanitize_phase()` (`unwrap_phase` + `remove_outliers` + `smooth_phase` chained, see `archive/v1/src/core/phase_sanitizer.py:266`). The shorter `sanitize` name doesn't exist on the class, so any path that reached this branch raised `AttributeError` and crashed the pose service mid-frame.
- **`adaptive_classifier.rs:94` no longer panics on NaN feature values** (closes #611).

View File

@ -4331,7 +4331,18 @@ async fn broadcast_tick_task(state: SharedState, tick_ms: u64) {
if s.tx.receiver_count() > 0 {
// Re-broadcast the latest sensing_update so pose WS clients
// always get data even when ESP32 pauses between frames.
if let Ok(json) = serde_json::to_string(update) {
//
// Issue #618: overwrite `source` with `effective_source()`
// before each broadcast so a stale latest_update (frozen
// payload from a now-offline ESP32) is emitted with
// `source: "esp32:offline"` instead of `source: "esp32"`.
// The REST `/health` endpoint already does this; before
// this fix the WS path was the only consumer that didn't,
// so the UI's "LIVE — ESP32 HARDWARE Connected" banner
// stayed green long after the hardware went away.
let mut tagged = update.clone();
tagged.source = s.effective_source();
if let Ok(json) = serde_json::to_string(&tagged) {
let _ = s.tx.send(json);
}
}