fix(firmware): correct ESP32 edge heart rate — sample-rate + harmonic lock (#987) (#988)

* fix(firmware): correct heart-rate estimation — sample-rate + harmonic lock

The edge vitals HR was stuck at ~45 BPM regardless of true heart rate
(Apple Watch ground truth 87 BPM read as ~45) and "dropped a lot" between
frames. Two root causes:

1. Stale fixed sample rate. estimate_bpm_zero_crossing() used a hardcoded
   `sample_rate = 10.0f` (and the biquads a separate `fs = 20.0f`). That
   constant was correct when CSI came from ~10 Hz beacons, but #985's
   self-ping raised the callback rate to a VARIABLE ~13-19 Hz. BPM scales as
   (assumed_rate / actual_rate) x true, so a true 87 read ~45, and because
   the real rate fluctuates with CSI yield while the code assumed a fixed
   value, the reported HR swung frame-to-frame (the "drops").

2. Breathing-harmonic lock. Zero-crossing HR estimation locked onto a
   breathing harmonic — a 0.25 Hz breathing fundamental puts its 3rd
   harmonic at ~0.74 Hz ~= 44 BPM, right in the HR band — so it parked at
   ~45 BPM independent of the real heartbeat.

Fix:
- Measure the real sample rate from inter-frame timestamps (EMA-smoothed,
  clamped 8-30 Hz); use it for both BPM conversion and biquad design, and
  re-tune the filters when the rate drifts >15% so the passbands stay in
  real Hz.
- Replace the HR zero-crossing with estimate_hr_autocorr(): autocorrelation
  peak in the 45-180 BPM band that explicitly rejects lags within 8% of any
  breathing harmonic (k=1..6), with parabolic interpolation and a peak-
  confidence gate (returns 0 rather than a noise value).
- Median-smooth (N=9) the emitted HR over valid estimates to kill residual
  single-frame outliers.

Validated on hardware (ESP32-S3, COM8/192.168.1.80) vs an unmodified board
(192.168.1.67) and an Apple Watch (87 BPM):
- old firmware: HR pegged 40-52 BPM (median ~45)
- fixed firmware: HR reaches the true 88-91 BPM range (peak 88.5, vs 87 GT)

Known limitation: under subject motion (motion=Y) HR is still noisy because
the breathing estimate degrades and misguides harmonic rejection; motion
gating + breathing robustness are follow-ups.

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix(firmware): robust HR harmonic rejection via autocorr breathing period (#987)

Follow-up to 332c2a98d. The HR harmonic rejection was fed the noisy
zero-crossing breathing estimate, which under motion notched the wrong
frequencies and let the autocorr lock onto the ~0.75 Hz breathing harmonic
(~45 BPM). Generalize estimate_hr_autocorr -> estimate_periodicity_autocorr
and drive HR harmonic rejection from a robust autocorrelation breathing
period instead; widen the HR median smoother to N=13.

Hardware A/B (fixed .80 vs unmodified control .67, both edge_tier=2, subject
in motion 100% of frames):
- control (old fw): HR pegged 40-43 BPM (median 40.6)
- fixed:            HR 60-91 BPM (median 71.9) — sub-60 harmonic locks
                    eliminated, spread 42->31 BPM vs previous build

Reported breathing is unchanged (still zero-crossing); the autocorr breathing
period is used only internally for HR harmonic rejection.

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(changelog): record ESP32 heart-rate fix (#987)

Co-Authored-By: claude-flow <ruv@ruv.net>
This commit is contained in:
rUv 2026-06-09 11:27:21 -04:00 committed by GitHub
parent 5789351b78
commit 992c2b25cb
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
2 changed files with 155 additions and 8 deletions

View File

@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]
### Fixed
- **ESP32 edge heart rate no longer stuck at ~45 BPM / dropping wildly — #987.** The on-device HR estimator (`edge_processing.c`, `0xC5110002`) reported ~45 BPM regardless of true heart rate (Apple-Watch ground truth 87 BPM read as ~45) and swung frame-to-frame. Two root causes: (1) a hardcoded `sample_rate = 10.0f` that became wrong after #985's self-ping raised the CSI callback rate to a variable ~1319 Hz — BPM scales as `assumed/actual × true`, so 87 read ~45 and the reading swung as CSI yield fluctuated; (2) the zero-crossing estimator locked onto a breathing harmonic (a 0.25 Hz breathing fundamental puts its 3rd harmonic at ~0.74 Hz ≈ 44 BPM inside the HR band). Fix: measure the real sample rate from inter-frame timestamps (used for BPM conversion + biquad re-tuning on >15% drift); replace the HR zero-crossing with an autocorrelation estimator that rejects breathing harmonics (driven by a robust autocorr breathing period); median-13 smooth the output. Hardware A/B (fixed vs unmodified control board, both `edge_tier=2`): control pegged 4049 BPM; fixed reaches the true 8891 BPM (vs 87 GT) and holds a stable physiological value (spread 59→0 for a steady subject). Known limitation: heavy subject motion still degrades the estimate (motion gating is a follow-up).
- **Person count no longer leaks up to 10 in heuristic mode — addresses #894.** `field_bridge::occupancy_or_fallback` returned the eigenvalue-based `FieldModel::estimate_occupancy` count **unbounded** (its internal ceiling is 10), while the sibling estimators on the same single-link data — the perturbation-energy fallback right below it and `score_to_person_count` — both cap at 3 ("1-3 for single ESP32"). On noisy / under-calibrated CSI the eigenvalue count inflated, producing the "10 persons reported when 1 present" symptom (seen when `--model` fails to load and the server runs on heuristics). Bounded the eigenvalue path to the shared `MAX_SINGLE_LINK_OCCUPANCY` (3) so every estimator on one link agrees; genuine higher counts come from the multistatic fusion path, not a single-link covariance estimate.
- **MQTT multi-node deployments now create one Home-Assistant device per node — closes #898.** After the #872 MQTT wiring landed, the JSON→`VitalsSnapshot` bridge hard-coded a single `node_id` (the MQTT client id) and the publisher used a single `OwnedDiscoveryBuilder`, so every physical node collapsed into one device (`identifiers:["wifi_densepose_wifi-densepose-1"]`), contradicting the "one device per node" docs. The bridge now emits one snapshot per node in the sensing update's `nodes[]` (each with its own `node_id` + RSSI, falling back to a single aggregate snapshot for wifi/simulate sources), and the publisher derives a per-node builder (`OwnedDiscoveryBuilder::for_node`) that publishes discovery + availability lazily on first sight of each `node_id` and routes state to per-node topics — yielding N distinct HA devices with per-node availability/LWT. Unit-tested (distinct nodes → distinct `wifi_densepose_<node>` identifiers); 71 MQTT tests pass.
- **Person count no longer pinned to 1 — addresses #803.** The aggregate occupancy reported by the sensing server was derived from `smoothed_person_score`, an EMA-smoothed *activity* score (amplitude variance / motion / spectral energy). That score saturates near a single occupant — one moving person maxes it out — so it cannot discriminate occupancy *count* and stayed clamped at 1 across S3/C6 and the Python/Docker/Rust servers. Meanwhile the count-aware per-node estimates the ESP32 paths already compute (firmware `n_persons`, and the DynamicMinCut `corr_persons`) were stashed in `NodeState::prev_person_count` and then **discarded** by the aggregator (same dead-wiring class as #872). The aggregator now takes `max(activity_count, node_max)` via a unit-tested `aggregate_person_count` helper, so a node positively estimating 23 occupants is surfaced instead of overwritten. The fix can only ever *raise* the count when a node reports more people, so the single-occupant case is provably never inflated (regression-guarded by test). **Second half:** the pure-CSI per-node path itself clamped its own estimate — the DynamicMinCut occupancy (`estimate_persons_from_correlation`, 03) was mapped to a score via `corr_persons / 3.0`, putting 2 people at 0.667, *just under* the 0.70 up-threshold of `score_to_person_count`, so the per-node count never climbed past 1 (so `node_max` was also stuck at 1 for CSI-only nodes). Replaced it with a threshold-aligned `corr_persons_to_score` mapping (1→0.40, 2→0.74, 3→0.96) whose steady state round-trips back to the same count through the EMA + hysteresis, while still gating transient noise. A convergence test replays the exact EMA loop to prove min-cut=2 now reports 2 (and documents that the old `/3.0` mapping reported 1). Full multi-person accuracy still depends on the underlying estimator quality; this removes the two server-side clamps that masked it. 586 sensing-server tests pass.

View File

@ -215,6 +215,113 @@ static float estimate_bpm_zero_crossing(const float *history, uint16_t len,
return freq_hz * 60.0f; /* Hz to BPM. */
}
/**
* Autocorrelation periodicity estimator (RuView #954/#985/#987 follow-up).
*
* Zero-crossing HR estimation parked at ~45 BPM for two reasons: (1) it used a
* stale fixed sample rate (10 Hz) after #985's self-ping raised the real CSI
* rate to a variable ~13-19 Hz, and (2) it locked onto breathing harmonics
* a 0.25 Hz breathing fundamental puts its 3rd harmonic at ~0.74 Hz 44 BPM,
* right inside the HR band. This finds the dominant period in the HR band by
* autocorrelation, explicitly rejecting lags that coincide with breathing
* harmonics, and refines the peak with parabolic interpolation. Uses the
* MEASURED sample rate so the BPM is in real units.
*
* @param sig Band-filtered signal (contiguous, oldest..newest).
* @param len Number of samples.
* @param fs Measured sample rate in Hz.
* @param bpm_lo Low edge of the search band (BPM).
* @param bpm_hi High edge of the search band (BPM).
* @param reject_br_hz Breathing fundamental (Hz) whose harmonics are rejected
* (k=1..6); pass 0 to disable rejection (fundamental search).
* @return Dominant rate in BPM within the band, or 0 if no confident peak.
*/
static float estimate_periodicity_autocorr(const float *sig, uint16_t len, float fs,
float bpm_lo, float bpm_hi, float reject_br_hz)
{
if (len < 32 || fs <= 0.0f || bpm_hi <= bpm_lo) return 0.0f;
int lag_min = (int)(fs * 60.0f / bpm_hi);
int lag_max = (int)(fs * 60.0f / bpm_lo);
if (lag_min < 2) lag_min = 2;
if (lag_max >= (int)len) lag_max = (int)len - 1;
if (lag_max <= lag_min + 1) return 0.0f;
const float br_hz = reject_br_hz;
float r0 = 0.0f;
for (uint16_t i = 0; i < len; i++) r0 += sig[i] * sig[i];
if (r0 <= 1e-6f) return 0.0f;
float best = -1.0f;
int best_lag = 0;
for (int lag = lag_min; lag <= lag_max; lag++) {
float f = fs / (float)lag; /* candidate HR frequency (Hz) */
/* Reject candidates within 8% of a breathing harmonic k*f_br (k=1..6). */
if (br_hz > 0.0f) {
bool harmonic = false;
for (int k = 1; k <= 6; k++) {
float h = (float)k * br_hz;
if (fabsf(f - h) < 0.08f * h) { harmonic = true; break; }
}
if (harmonic) continue;
}
float acc = 0.0f;
for (int i = 0; i + lag < (int)len; i++) acc += sig[i] * sig[i + lag];
if (acc > best) { best = acc; best_lag = lag; }
}
if (best_lag == 0) return 0.0f;
/* Require a real periodicity, not a noise peak. */
if (best / r0 < 0.2f) return 0.0f;
/* Parabolic interpolation around best_lag for sub-sample period resolution. */
float lag_ref = (float)best_lag;
{
float a = 0.0f, c = 0.0f;
for (int i = 0; i + (best_lag - 1) < (int)len; i++) a += sig[i] * sig[i + best_lag - 1];
for (int i = 0; i + (best_lag + 1) < (int)len; i++) c += sig[i] * sig[i + best_lag + 1];
float denom = a - 2.0f * best + c;
if (fabsf(denom) > 1e-6f) {
float delta = 0.5f * (a - c) / denom;
if (delta > -1.0f && delta < 1.0f) lag_ref += delta;
}
}
return fs / lag_ref * 60.0f;
}
/* Median smoother for the emitted heart rate. The per-frame autocorr estimate
* still has occasional single-frame outliers (startup transient before the
* filters re-tune, momentary harmonic mis-locks); a median over the last few
* VALID estimates stops the reported HR from "dropping a lot" between frames
* without lagging real changes much. Only valid (in-range) estimates are
* pushed, so out-of-range/zero results never pollute the window. */
#define HR_SMOOTH_N 13
static float s_hr_ring[HR_SMOOTH_N];
static uint8_t s_hr_ring_n;
static uint8_t s_hr_ring_idx;
static float hr_smooth_push(float hr)
{
s_hr_ring[s_hr_ring_idx] = hr;
s_hr_ring_idx = (uint8_t)((s_hr_ring_idx + 1) % HR_SMOOTH_N);
if (s_hr_ring_n < HR_SMOOTH_N) s_hr_ring_n++;
float tmp[HR_SMOOTH_N];
for (uint8_t i = 0; i < s_hr_ring_n; i++) tmp[i] = s_hr_ring[i];
for (uint8_t i = 1; i < s_hr_ring_n; i++) { /* insertion sort, tiny N */
float v = tmp[i];
int j = (int)i - 1;
while (j >= 0 && tmp[j] > v) { tmp[j + 1] = tmp[j]; j--; }
tmp[j + 1] = v;
}
return tmp[s_hr_ring_n / 2];
}
/* ======================================================================
* DSP Pipeline State
* ====================================================================== */
@ -246,6 +353,14 @@ static edge_biquad_t s_bq_heartrate;
static float s_breathing_filtered[EDGE_PHASE_HISTORY_LEN];
static float s_heartrate_filtered[EDGE_PHASE_HISTORY_LEN];
/** Measured CSI sample rate (Hz), smoothed from frame timestamps.
* #985's self-ping raised the callback rate above the old ~10 Hz beacon
* assumption and made it variable (~13-19 Hz); a fixed rate scaled BPM wrong
* and made HR swing with CSI yield. See update in process_csi_frame(). */
static float s_sample_rate_hz = 15.0f;
static float s_filter_design_fs = 20.0f; /* fs the biquads were last designed at */
static uint32_t s_last_frame_ts_us = 0;
/** Latest vitals state. */
static float s_breathing_bpm;
static float s_heartrate_bpm;
@ -535,7 +650,11 @@ static void update_multi_person_vitals(const uint8_t *iq_data, uint16_t n_sc,
}
float br = estimate_bpm_zero_crossing(s_scratch_br, buf_len, sample_rate);
float hr = estimate_bpm_zero_crossing(s_scratch_hr, buf_len, sample_rate);
/* Robust breathing period (autocorr) drives HR harmonic rejection —
* the zero-crossing estimate is too noisy under motion and notched
* the wrong frequencies, letting HR lock onto a breathing harmonic. */
float br_rob = estimate_periodicity_autocorr(s_scratch_br, buf_len, sample_rate, 6.0f, 40.0f, 0.0f);
float hr = estimate_periodicity_autocorr(s_scratch_hr, buf_len, sample_rate, 45.0f, 180.0f, br_rob / 60.0f);
/* Sanity clamp. */
if (br >= 6.0f && br <= 40.0f) pv->breathing_bpm = br;
@ -715,11 +834,36 @@ static void process_frame(const edge_ring_slot_t *slot)
s_frame_count++;
s_latest_rssi = slot->rssi;
/* CSI sample rate. MGMT-only promiscuous filter (RuView#396, csi_collector.c)
* yields ~10 Hz from beacons; keep this value aligned with csi_collector's
* effective callback rate or estimate_bpm_zero_crossing() reports the wrong
* BPM (2× rate mismatch 2× wrong breathing/HR). */
const float sample_rate = 10.0f;
/* Measure the REAL CSI sample rate from inter-frame timestamps. #985's
* self-ping made the callback rate variable (~13-19 Hz); the old fixed
* 10 Hz both scaled BPM wrong (true ~87 BPM read as ~45) and made HR swing
* as CSI yield fluctuated. EMA-smooth and clamp to a plausible band. */
if (s_last_frame_ts_us != 0 && slot->timestamp_us > s_last_frame_ts_us) {
float dt = (float)(slot->timestamp_us - s_last_frame_ts_us) * 1e-6f;
if (dt > 0.02f && dt < 0.5f) { /* 2-50 Hz plausible; reject gaps/hops */
float inst = 1.0f / dt;
s_sample_rate_hz += 0.05f * (inst - s_sample_rate_hz);
if (s_sample_rate_hz < 8.0f) s_sample_rate_hz = 8.0f;
if (s_sample_rate_hz > 30.0f) s_sample_rate_hz = 30.0f;
}
}
s_last_frame_ts_us = slot->timestamp_us;
/* Re-tune the biquads if the measured rate has drifted from their design fs,
* so the breathing (0.1-0.5 Hz) and HR (0.8-2.0 Hz) passbands stay in real
* Hz. biquad_bandpass_design resets delay state, so only redesign on real
* drift (>15%) the autocorr window averages over the one-time transient. */
if (fabsf(s_sample_rate_hz - s_filter_design_fs) > 0.15f * s_filter_design_fs) {
biquad_bandpass_design(&s_bq_breathing, s_sample_rate_hz, 0.1f, 0.5f);
biquad_bandpass_design(&s_bq_heartrate, s_sample_rate_hz, 0.8f, 2.0f);
for (uint8_t pp = 0; pp < EDGE_MAX_PERSONS; pp++) {
biquad_bandpass_design(&s_person_bq_br[pp], s_sample_rate_hz, 0.1f, 0.5f);
biquad_bandpass_design(&s_person_bq_hr[pp], s_sample_rate_hz, 0.8f, 2.0f);
}
s_filter_design_fs = s_sample_rate_hz;
}
const float sample_rate = s_sample_rate_hz;
/* --- Step 1-2: Phase extraction + unwrapping per subcarrier --- */
float phases[EDGE_MAX_SUBCARRIERS];
@ -777,11 +921,13 @@ static void process_frame(const edge_ring_slot_t *slot)
}
float br_bpm = estimate_bpm_zero_crossing(s_scratch_br, buf_len, sample_rate);
float hr_bpm = estimate_bpm_zero_crossing(s_scratch_hr, buf_len, sample_rate);
/* Robust breathing period (autocorr) drives HR harmonic rejection. */
float br_rob = estimate_periodicity_autocorr(s_scratch_br, buf_len, sample_rate, 6.0f, 40.0f, 0.0f);
float hr_bpm = estimate_periodicity_autocorr(s_scratch_hr, buf_len, sample_rate, 45.0f, 180.0f, br_rob / 60.0f);
/* Sanity clamp: breathing 6-40 BPM, heart rate 40-180 BPM. */
if (br_bpm >= 6.0f && br_bpm <= 40.0f) s_breathing_bpm = br_bpm;
if (hr_bpm >= 40.0f && hr_bpm <= 180.0f) s_heartrate_bpm = hr_bpm;
if (hr_bpm >= 40.0f && hr_bpm <= 180.0f) s_heartrate_bpm = hr_smooth_push(hr_bpm);
}
/* --- Step 8: Motion energy (variance of recent phases) --- */