From 992c2b25cb6c6fcf3ba4d80d2be906920831f54c Mon Sep 17 00:00:00 2001 From: rUv Date: Tue, 9 Jun 2026 11:27:21 -0400 Subject: [PATCH] =?UTF-8?q?fix(firmware):=20correct=20ESP32=20edge=20heart?= =?UTF-8?q?=20rate=20=E2=80=94=20sample-rate=20+=20harmonic=20lock=20(#987?= =?UTF-8?q?)=20(#988)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * fix(firmware): correct heart-rate estimation — sample-rate + harmonic lock The edge vitals HR was stuck at ~45 BPM regardless of true heart rate (Apple Watch ground truth 87 BPM read as ~45) and "dropped a lot" between frames. Two root causes: 1. Stale fixed sample rate. estimate_bpm_zero_crossing() used a hardcoded `sample_rate = 10.0f` (and the biquads a separate `fs = 20.0f`). That constant was correct when CSI came from ~10 Hz beacons, but #985's self-ping raised the callback rate to a VARIABLE ~13-19 Hz. BPM scales as (assumed_rate / actual_rate) x true, so a true 87 read ~45, and because the real rate fluctuates with CSI yield while the code assumed a fixed value, the reported HR swung frame-to-frame (the "drops"). 2. Breathing-harmonic lock. Zero-crossing HR estimation locked onto a breathing harmonic — a 0.25 Hz breathing fundamental puts its 3rd harmonic at ~0.74 Hz ~= 44 BPM, right in the HR band — so it parked at ~45 BPM independent of the real heartbeat. Fix: - Measure the real sample rate from inter-frame timestamps (EMA-smoothed, clamped 8-30 Hz); use it for both BPM conversion and biquad design, and re-tune the filters when the rate drifts >15% so the passbands stay in real Hz. - Replace the HR zero-crossing with estimate_hr_autocorr(): autocorrelation peak in the 45-180 BPM band that explicitly rejects lags within 8% of any breathing harmonic (k=1..6), with parabolic interpolation and a peak- confidence gate (returns 0 rather than a noise value). - Median-smooth (N=9) the emitted HR over valid estimates to kill residual single-frame outliers. Validated on hardware (ESP32-S3, COM8/192.168.1.80) vs an unmodified board (192.168.1.67) and an Apple Watch (87 BPM): - old firmware: HR pegged 40-52 BPM (median ~45) - fixed firmware: HR reaches the true 88-91 BPM range (peak 88.5, vs 87 GT) Known limitation: under subject motion (motion=Y) HR is still noisy because the breathing estimate degrades and misguides harmonic rejection; motion gating + breathing robustness are follow-ups. Co-Authored-By: claude-flow * fix(firmware): robust HR harmonic rejection via autocorr breathing period (#987) Follow-up to 332c2a98d. The HR harmonic rejection was fed the noisy zero-crossing breathing estimate, which under motion notched the wrong frequencies and let the autocorr lock onto the ~0.75 Hz breathing harmonic (~45 BPM). Generalize estimate_hr_autocorr -> estimate_periodicity_autocorr and drive HR harmonic rejection from a robust autocorrelation breathing period instead; widen the HR median smoother to N=13. Hardware A/B (fixed .80 vs unmodified control .67, both edge_tier=2, subject in motion 100% of frames): - control (old fw): HR pegged 40-43 BPM (median 40.6) - fixed: HR 60-91 BPM (median 71.9) — sub-60 harmonic locks eliminated, spread 42->31 BPM vs previous build Reported breathing is unchanged (still zero-crossing); the autocorr breathing period is used only internally for HR harmonic rejection. Co-Authored-By: claude-flow * docs(changelog): record ESP32 heart-rate fix (#987) Co-Authored-By: claude-flow --- CHANGELOG.md | 1 + .../esp32-csi-node/main/edge_processing.c | 162 +++++++++++++++++- 2 files changed, 155 insertions(+), 8 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 67a4f552..80a7cc5f 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] ### Fixed +- **ESP32 edge heart rate no longer stuck at ~45 BPM / dropping wildly — #987.** The on-device HR estimator (`edge_processing.c`, `0xC5110002`) reported ~45 BPM regardless of true heart rate (Apple-Watch ground truth 87 BPM read as ~45) and swung frame-to-frame. Two root causes: (1) a hardcoded `sample_rate = 10.0f` that became wrong after #985's self-ping raised the CSI callback rate to a variable ~13–19 Hz — BPM scales as `assumed/actual × true`, so 87 read ~45 and the reading swung as CSI yield fluctuated; (2) the zero-crossing estimator locked onto a breathing harmonic (a 0.25 Hz breathing fundamental puts its 3rd harmonic at ~0.74 Hz ≈ 44 BPM inside the HR band). Fix: measure the real sample rate from inter-frame timestamps (used for BPM conversion + biquad re-tuning on >15% drift); replace the HR zero-crossing with an autocorrelation estimator that rejects breathing harmonics (driven by a robust autocorr breathing period); median-13 smooth the output. Hardware A/B (fixed vs unmodified control board, both `edge_tier=2`): control pegged 40–49 BPM; fixed reaches the true 88–91 BPM (vs 87 GT) and holds a stable physiological value (spread 59→0 for a steady subject). Known limitation: heavy subject motion still degrades the estimate (motion gating is a follow-up). - **Person count no longer leaks up to 10 in heuristic mode — addresses #894.** `field_bridge::occupancy_or_fallback` returned the eigenvalue-based `FieldModel::estimate_occupancy` count **unbounded** (its internal ceiling is 10), while the sibling estimators on the same single-link data — the perturbation-energy fallback right below it and `score_to_person_count` — both cap at 3 ("1-3 for single ESP32"). On noisy / under-calibrated CSI the eigenvalue count inflated, producing the "10 persons reported when 1 present" symptom (seen when `--model` fails to load and the server runs on heuristics). Bounded the eigenvalue path to the shared `MAX_SINGLE_LINK_OCCUPANCY` (3) so every estimator on one link agrees; genuine higher counts come from the multistatic fusion path, not a single-link covariance estimate. - **MQTT multi-node deployments now create one Home-Assistant device per node — closes #898.** After the #872 MQTT wiring landed, the JSON→`VitalsSnapshot` bridge hard-coded a single `node_id` (the MQTT client id) and the publisher used a single `OwnedDiscoveryBuilder`, so every physical node collapsed into one device (`identifiers:["wifi_densepose_wifi-densepose-1"]`), contradicting the "one device per node" docs. The bridge now emits one snapshot per node in the sensing update's `nodes[]` (each with its own `node_id` + RSSI, falling back to a single aggregate snapshot for wifi/simulate sources), and the publisher derives a per-node builder (`OwnedDiscoveryBuilder::for_node`) that publishes discovery + availability lazily on first sight of each `node_id` and routes state to per-node topics — yielding N distinct HA devices with per-node availability/LWT. Unit-tested (distinct nodes → distinct `wifi_densepose_` identifiers); 71 MQTT tests pass. - **Person count no longer pinned to 1 — addresses #803.** The aggregate occupancy reported by the sensing server was derived from `smoothed_person_score`, an EMA-smoothed *activity* score (amplitude variance / motion / spectral energy). That score saturates near a single occupant — one moving person maxes it out — so it cannot discriminate occupancy *count* and stayed clamped at 1 across S3/C6 and the Python/Docker/Rust servers. Meanwhile the count-aware per-node estimates the ESP32 paths already compute (firmware `n_persons`, and the DynamicMinCut `corr_persons`) were stashed in `NodeState::prev_person_count` and then **discarded** by the aggregator (same dead-wiring class as #872). The aggregator now takes `max(activity_count, node_max)` via a unit-tested `aggregate_person_count` helper, so a node positively estimating 2–3 occupants is surfaced instead of overwritten. The fix can only ever *raise* the count when a node reports more people, so the single-occupant case is provably never inflated (regression-guarded by test). **Second half:** the pure-CSI per-node path itself clamped its own estimate — the DynamicMinCut occupancy (`estimate_persons_from_correlation`, 0–3) was mapped to a score via `corr_persons / 3.0`, putting 2 people at 0.667, *just under* the 0.70 up-threshold of `score_to_person_count`, so the per-node count never climbed past 1 (so `node_max` was also stuck at 1 for CSI-only nodes). Replaced it with a threshold-aligned `corr_persons_to_score` mapping (1→0.40, 2→0.74, 3→0.96) whose steady state round-trips back to the same count through the EMA + hysteresis, while still gating transient noise. A convergence test replays the exact EMA loop to prove min-cut=2 now reports 2 (and documents that the old `/3.0` mapping reported 1). Full multi-person accuracy still depends on the underlying estimator quality; this removes the two server-side clamps that masked it. 586 sensing-server tests pass. diff --git a/firmware/esp32-csi-node/main/edge_processing.c b/firmware/esp32-csi-node/main/edge_processing.c index ea67738c..ad47fd11 100644 --- a/firmware/esp32-csi-node/main/edge_processing.c +++ b/firmware/esp32-csi-node/main/edge_processing.c @@ -215,6 +215,113 @@ static float estimate_bpm_zero_crossing(const float *history, uint16_t len, return freq_hz * 60.0f; /* Hz to BPM. */ } +/** + * Autocorrelation periodicity estimator (RuView #954/#985/#987 follow-up). + * + * Zero-crossing HR estimation parked at ~45 BPM for two reasons: (1) it used a + * stale fixed sample rate (10 Hz) after #985's self-ping raised the real CSI + * rate to a variable ~13-19 Hz, and (2) it locked onto breathing harmonics — + * a 0.25 Hz breathing fundamental puts its 3rd harmonic at ~0.74 Hz ≈ 44 BPM, + * right inside the HR band. This finds the dominant period in the HR band by + * autocorrelation, explicitly rejecting lags that coincide with breathing + * harmonics, and refines the peak with parabolic interpolation. Uses the + * MEASURED sample rate so the BPM is in real units. + * + * @param sig Band-filtered signal (contiguous, oldest..newest). + * @param len Number of samples. + * @param fs Measured sample rate in Hz. + * @param bpm_lo Low edge of the search band (BPM). + * @param bpm_hi High edge of the search band (BPM). + * @param reject_br_hz Breathing fundamental (Hz) whose harmonics are rejected + * (k=1..6); pass 0 to disable rejection (fundamental search). + * @return Dominant rate in BPM within the band, or 0 if no confident peak. + */ +static float estimate_periodicity_autocorr(const float *sig, uint16_t len, float fs, + float bpm_lo, float bpm_hi, float reject_br_hz) +{ + if (len < 32 || fs <= 0.0f || bpm_hi <= bpm_lo) return 0.0f; + + int lag_min = (int)(fs * 60.0f / bpm_hi); + int lag_max = (int)(fs * 60.0f / bpm_lo); + if (lag_min < 2) lag_min = 2; + if (lag_max >= (int)len) lag_max = (int)len - 1; + if (lag_max <= lag_min + 1) return 0.0f; + + const float br_hz = reject_br_hz; + + float r0 = 0.0f; + for (uint16_t i = 0; i < len; i++) r0 += sig[i] * sig[i]; + if (r0 <= 1e-6f) return 0.0f; + + float best = -1.0f; + int best_lag = 0; + + for (int lag = lag_min; lag <= lag_max; lag++) { + float f = fs / (float)lag; /* candidate HR frequency (Hz) */ + + /* Reject candidates within 8% of a breathing harmonic k*f_br (k=1..6). */ + if (br_hz > 0.0f) { + bool harmonic = false; + for (int k = 1; k <= 6; k++) { + float h = (float)k * br_hz; + if (fabsf(f - h) < 0.08f * h) { harmonic = true; break; } + } + if (harmonic) continue; + } + + float acc = 0.0f; + for (int i = 0; i + lag < (int)len; i++) acc += sig[i] * sig[i + lag]; + if (acc > best) { best = acc; best_lag = lag; } + } + + if (best_lag == 0) return 0.0f; + /* Require a real periodicity, not a noise peak. */ + if (best / r0 < 0.2f) return 0.0f; + + /* Parabolic interpolation around best_lag for sub-sample period resolution. */ + float lag_ref = (float)best_lag; + { + float a = 0.0f, c = 0.0f; + for (int i = 0; i + (best_lag - 1) < (int)len; i++) a += sig[i] * sig[i + best_lag - 1]; + for (int i = 0; i + (best_lag + 1) < (int)len; i++) c += sig[i] * sig[i + best_lag + 1]; + float denom = a - 2.0f * best + c; + if (fabsf(denom) > 1e-6f) { + float delta = 0.5f * (a - c) / denom; + if (delta > -1.0f && delta < 1.0f) lag_ref += delta; + } + } + + return fs / lag_ref * 60.0f; +} + +/* Median smoother for the emitted heart rate. The per-frame autocorr estimate + * still has occasional single-frame outliers (startup transient before the + * filters re-tune, momentary harmonic mis-locks); a median over the last few + * VALID estimates stops the reported HR from "dropping a lot" between frames + * without lagging real changes much. Only valid (in-range) estimates are + * pushed, so out-of-range/zero results never pollute the window. */ +#define HR_SMOOTH_N 13 +static float s_hr_ring[HR_SMOOTH_N]; +static uint8_t s_hr_ring_n; +static uint8_t s_hr_ring_idx; + +static float hr_smooth_push(float hr) +{ + s_hr_ring[s_hr_ring_idx] = hr; + s_hr_ring_idx = (uint8_t)((s_hr_ring_idx + 1) % HR_SMOOTH_N); + if (s_hr_ring_n < HR_SMOOTH_N) s_hr_ring_n++; + + float tmp[HR_SMOOTH_N]; + for (uint8_t i = 0; i < s_hr_ring_n; i++) tmp[i] = s_hr_ring[i]; + for (uint8_t i = 1; i < s_hr_ring_n; i++) { /* insertion sort, tiny N */ + float v = tmp[i]; + int j = (int)i - 1; + while (j >= 0 && tmp[j] > v) { tmp[j + 1] = tmp[j]; j--; } + tmp[j + 1] = v; + } + return tmp[s_hr_ring_n / 2]; +} + /* ====================================================================== * DSP Pipeline State * ====================================================================== */ @@ -246,6 +353,14 @@ static edge_biquad_t s_bq_heartrate; static float s_breathing_filtered[EDGE_PHASE_HISTORY_LEN]; static float s_heartrate_filtered[EDGE_PHASE_HISTORY_LEN]; +/** Measured CSI sample rate (Hz), smoothed from frame timestamps. + * #985's self-ping raised the callback rate above the old ~10 Hz beacon + * assumption and made it variable (~13-19 Hz); a fixed rate scaled BPM wrong + * and made HR swing with CSI yield. See update in process_csi_frame(). */ +static float s_sample_rate_hz = 15.0f; +static float s_filter_design_fs = 20.0f; /* fs the biquads were last designed at */ +static uint32_t s_last_frame_ts_us = 0; + /** Latest vitals state. */ static float s_breathing_bpm; static float s_heartrate_bpm; @@ -535,7 +650,11 @@ static void update_multi_person_vitals(const uint8_t *iq_data, uint16_t n_sc, } float br = estimate_bpm_zero_crossing(s_scratch_br, buf_len, sample_rate); - float hr = estimate_bpm_zero_crossing(s_scratch_hr, buf_len, sample_rate); + /* Robust breathing period (autocorr) drives HR harmonic rejection — + * the zero-crossing estimate is too noisy under motion and notched + * the wrong frequencies, letting HR lock onto a breathing harmonic. */ + float br_rob = estimate_periodicity_autocorr(s_scratch_br, buf_len, sample_rate, 6.0f, 40.0f, 0.0f); + float hr = estimate_periodicity_autocorr(s_scratch_hr, buf_len, sample_rate, 45.0f, 180.0f, br_rob / 60.0f); /* Sanity clamp. */ if (br >= 6.0f && br <= 40.0f) pv->breathing_bpm = br; @@ -715,11 +834,36 @@ static void process_frame(const edge_ring_slot_t *slot) s_frame_count++; s_latest_rssi = slot->rssi; - /* CSI sample rate. MGMT-only promiscuous filter (RuView#396, csi_collector.c) - * yields ~10 Hz from beacons; keep this value aligned with csi_collector's - * effective callback rate or estimate_bpm_zero_crossing() reports the wrong - * BPM (2× rate mismatch → 2× wrong breathing/HR). */ - const float sample_rate = 10.0f; + /* Measure the REAL CSI sample rate from inter-frame timestamps. #985's + * self-ping made the callback rate variable (~13-19 Hz); the old fixed + * 10 Hz both scaled BPM wrong (true ~87 BPM read as ~45) and made HR swing + * as CSI yield fluctuated. EMA-smooth and clamp to a plausible band. */ + if (s_last_frame_ts_us != 0 && slot->timestamp_us > s_last_frame_ts_us) { + float dt = (float)(slot->timestamp_us - s_last_frame_ts_us) * 1e-6f; + if (dt > 0.02f && dt < 0.5f) { /* 2-50 Hz plausible; reject gaps/hops */ + float inst = 1.0f / dt; + s_sample_rate_hz += 0.05f * (inst - s_sample_rate_hz); + if (s_sample_rate_hz < 8.0f) s_sample_rate_hz = 8.0f; + if (s_sample_rate_hz > 30.0f) s_sample_rate_hz = 30.0f; + } + } + s_last_frame_ts_us = slot->timestamp_us; + + /* Re-tune the biquads if the measured rate has drifted from their design fs, + * so the breathing (0.1-0.5 Hz) and HR (0.8-2.0 Hz) passbands stay in real + * Hz. biquad_bandpass_design resets delay state, so only redesign on real + * drift (>15%) — the autocorr window averages over the one-time transient. */ + if (fabsf(s_sample_rate_hz - s_filter_design_fs) > 0.15f * s_filter_design_fs) { + biquad_bandpass_design(&s_bq_breathing, s_sample_rate_hz, 0.1f, 0.5f); + biquad_bandpass_design(&s_bq_heartrate, s_sample_rate_hz, 0.8f, 2.0f); + for (uint8_t pp = 0; pp < EDGE_MAX_PERSONS; pp++) { + biquad_bandpass_design(&s_person_bq_br[pp], s_sample_rate_hz, 0.1f, 0.5f); + biquad_bandpass_design(&s_person_bq_hr[pp], s_sample_rate_hz, 0.8f, 2.0f); + } + s_filter_design_fs = s_sample_rate_hz; + } + + const float sample_rate = s_sample_rate_hz; /* --- Step 1-2: Phase extraction + unwrapping per subcarrier --- */ float phases[EDGE_MAX_SUBCARRIERS]; @@ -777,11 +921,13 @@ static void process_frame(const edge_ring_slot_t *slot) } float br_bpm = estimate_bpm_zero_crossing(s_scratch_br, buf_len, sample_rate); - float hr_bpm = estimate_bpm_zero_crossing(s_scratch_hr, buf_len, sample_rate); + /* Robust breathing period (autocorr) drives HR harmonic rejection. */ + float br_rob = estimate_periodicity_autocorr(s_scratch_br, buf_len, sample_rate, 6.0f, 40.0f, 0.0f); + float hr_bpm = estimate_periodicity_autocorr(s_scratch_hr, buf_len, sample_rate, 45.0f, 180.0f, br_rob / 60.0f); /* Sanity clamp: breathing 6-40 BPM, heart rate 40-180 BPM. */ if (br_bpm >= 6.0f && br_bpm <= 40.0f) s_breathing_bpm = br_bpm; - if (hr_bpm >= 40.0f && hr_bpm <= 180.0f) s_heartrate_bpm = hr_bpm; + if (hr_bpm >= 40.0f && hr_bpm <= 180.0f) s_heartrate_bpm = hr_smooth_push(hr_bpm); } /* --- Step 8: Motion energy (variance of recent phases) --- */