15 KiB
ADR-039: ESP32-S3 Edge Intelligence — On-Device Signal Processing and RuVector Integration
| Field | Value |
|---|---|
| Status | Proposed |
| Date | 2026-03-03 |
| Depends on | ADR-018 (binary frame format), ADR-014 (SOTA signal processing), ADR-021 (vital sign extraction), ADR-029 (multistatic sensing), ADR-030 (persistent field model), ADR-031 (RuView sensing-first RF) |
| Supersedes | None |
Context
The current ESP32-S3 firmware (1,018 lines, 7 files) is a "dumb sensor" — it captures raw CSI frames and streams them unprocessed over UDP at ~20 Hz. All signal processing, feature extraction, presence detection, vital sign estimation, and pose inference happen server-side in the Rust crates.
This creates several limitations:
- Bandwidth waste — raw CSI frames are 128-384 bytes each at 20 Hz = ~60 KB/s per node. Most of this is noise.
- Latency — round-trip to server adds 5-50ms depending on network.
- Server dependency — nodes are useless without an active aggregator.
- Scalability ceiling — 6-node mesh at 20 Hz = 120 frames/s = server bottleneck.
- No local alerting — fall detection, breathing anomaly, or intrusion must wait for server roundtrip.
The ESP32-S3 has significant untapped compute:
- Dual-core Xtensa LX7 at 240 MHz
- 512 KB SRAM + optional 8 MB PSRAM (our board has 8 MB flash)
- Vector/DSP instructions (PIE — Processor Instruction Extensions)
- FPU — hardware single-precision floating point
- ~80% idle CPU — current firmware uses <20% (WiFi + CSI callback + UDP send)
Decision
Implement a 3-tier edge intelligence pipeline on the ESP32-S3 firmware, progressively offloading signal processing from the server to the device. Each tier is independently toggleable via NVS configuration.
Tier 1: Smart Filtering & Compression (Firmware C)
Lightweight processing in the CSI callback path. Zero additional latency.
| Feature | Source ADR | Algorithm | Memory | CPU |
|---|---|---|---|---|
| Phase sanitization | ADR-014 | Linear phase unwrap + conjugate multiply | 256 B | <1% |
| Amplitude normalization | ADR-014 | Per-subcarrier running mean/std (Welford) | 512 B | <1% |
| Subcarrier selection | ADR-016 (ruvector-mincut) | Top-K variance subcarriers | 128 B | <1% |
| Static environment suppression | ADR-030 | Exponential moving average subtraction | 512 B | <1% |
| Adaptive frame decimation | New | Skip frames when CSI variance < threshold | 8 B | <1% |
| Delta compression | New | XOR + RLE vs. previous frame | 512 B | <2% |
Bandwidth reduction: 60-80% (send only changed, high-variance subcarriers).
ADR-018 v2 frame extension (backward-compatible):
Existing 20-byte header unchanged.
New optional trailer (if magic bit set):
[N*2] Compressed I/Q (delta-coded, only selected subcarriers)
[2] Subcarrier bitmap (which of 64 subcarriers included)
[1] Frame flags: bit0=compressed, bit1=phase-sanitized, bit2=amplitude-normed
[1] Motion score (0-255)
[1] Presence confidence (0-255)
[1] Reserved
Tier 2: On-Device Vital Signs & Presence (Firmware C + fixed-point DSP)
Runs as a FreeRTOS task on Core 1 (CSI collection on Core 0), processing a sliding window of CSI frames.
| Feature | Source ADR | Algorithm | Memory | CPU (Core 1) |
|---|---|---|---|---|
| Presence detection | ADR-029 | Variance threshold on amplitude envelope | 2 KB | 5% |
| Motion scoring | ADR-014 | Subcarrier correlation coefficient | 1 KB | 3% |
| Breathing rate | ADR-021 | Bandpass 0.1-0.5 Hz + peak detection on CSI phase | 8 KB | 10% |
| Heart rate | ADR-021 | Bandpass 0.8-2.0 Hz + autocorrelation on CSI phase | 8 KB | 15% |
| Fall detection | ADR-029 | Sudden variance spike + sustained stillness | 1 KB | 2% |
| Room occupancy count | ADR-037 | CSI rank estimation (eigenvalue spread) | 4 KB | 8% |
| Coherence gate | ADR-029 (ruvsense) | Z-score coherence, accept/reject/recalibrate | 1 KB | 2% |
Total memory: ~25 KB (fits in SRAM, no PSRAM needed). Total CPU: ~45% of Core 1.
Output: Compact vital-signs UDP packet (32 bytes) at 1 Hz:
Offset Size Field
0 4 Magic: 0xC5110002 (vitals packet)
4 1 Node ID
5 1 Packet type (0x02 = vitals)
6 2 Sequence (LE u16)
8 1 Presence (0=empty, 1=present, 2=moving)
9 1 Motion score (0-255)
10 1 Occupancy estimate (0-8 persons)
11 1 Coherence gate (0=reject, 1=predict, 2=accept, 3=recalibrate)
12 2 Breathing rate (BPM * 100, LE u16) — 0 if not detected
14 2 Heart rate (BPM * 100, LE u16) — 0 if not detected
16 2 Breathing confidence (0-10000, LE u16)
18 2 Heart rate confidence (0-10000, LE u16)
20 1 Fall detected (0/1)
21 1 Anomaly flags (bitfield)
22 2 Ambient RSSI mean (LE i16)
24 4 CSI frame count since last report (LE u32)
28 4 Uptime seconds (LE u32)
Tier 3: Lightweight Feature Extraction (Firmware C + optional PSRAM)
Pre-compute features that the server-side neural network needs, reducing server CPU by 60-80%.
| Feature | Source ADR | Algorithm | Memory | CPU |
|---|---|---|---|---|
| Phase difference matrix | ADR-014 | Adjacent subcarrier phase diff | 4 KB | 5% |
| Amplitude spectrogram | ADR-014 | 64-bin FFT on 1s window per subcarrier | 32 KB | 15% |
| Doppler-time map | ADR-029 | 2D FFT across subcarriers × time | 16 KB | 10% |
| Fresnel zone crossing | ADR-014 | First Fresnel radius + fade count | 1 KB | 2% |
| Cross-link correlation | ADR-029 | Pearson correlation between TX-RX pairs | 2 KB | 5% |
| Environment fingerprint | ADR-027 (MERIDIAN) | PCA-compressed 16-dim CSI signature | 4 KB | 5% |
| Gesture template match | ADR-029 (ruvsense) | DTW on 8-dim feature vector | 8 KB | 10% |
Total memory: ~67 KB (SRAM) or up to 256 KB with PSRAM. Total CPU: ~52% of Core 1.
Output: Feature vector UDP packet (variable size, ~200-500 bytes) at 4 Hz:
Offset Size Field
0 4 Magic: 0xC5110003 (feature packet)
4 1 Node ID
5 1 Packet type (0x03 = features)
6 2 Feature bitmap (which features included)
8 4 Timestamp ms (LE u32)
12 N Feature payloads (concatenated, lengths determined by bitmap)
NVS Configuration
All tiers controllable via NVS without reflashing:
| NVS Key | Type | Default | Description |
|---|---|---|---|
edge_tier |
u8 | 0 | 0=raw only, 1=smart filter, 2=+vitals, 3=+features |
decim_thresh |
u16 | 100 | Adaptive decimation variance threshold |
subk_count |
u8 | 32 | Top-K subcarriers to keep (Tier 1) |
vital_window |
u16 | 300 | Vital sign window frames (15s at 20 Hz) |
vital_interval |
u16 | 1000 | Vital report interval ms |
feature_hz |
u8 | 4 | Feature extraction rate |
fall_thresh |
u16 | 500 | Fall detection variance spike threshold |
presence_thresh |
u16 | 50 | Presence detection threshold |
Provisioning:
python firmware/esp32-csi-node/provision.py --port COM7 \
--edge-tier 2 --vital-window 300 --presence-thresh 50
Implementation Plan
Phase 1: Infrastructure (1 week)
-
Dual-core task architecture
- Core 0: WiFi + CSI callback (existing)
- Core 1: Edge processing task (new FreeRTOS task)
- Lock-free ring buffer between cores (producer-consumer)
-
Ring buffer design
#define RING_BUF_FRAMES 64 // ~3.2s at 20 Hz typedef struct { wifi_csi_info_t info; int8_t iq_data[384]; // Max I/Q payload uint32_t timestamp_ms; uint8_t tx_mac[6]; } csi_ring_entry_t; -
NVS config extension — add
edge_tierand tier-specific params -
ADR-018 v2 header — backward-compatible extension bit
Phase 2: Tier 1 — Smart Filtering (1 week)
- Phase unwrap — O(N) linear scan, in-place
- Welford running stats — per-subcarrier mean/variance, O(1) update
- Top-K subcarrier selection — partial sort, O(N) with selection algorithm
- Delta compression — XOR vs previous frame, RLE encode
- Adaptive decimation — skip frame if total variance < threshold
Phase 3: Tier 2 — Vital Signs (2 weeks)
- Presence detector — amplitude variance over 1s window
- Motion scorer — correlation coefficient between consecutive frames
- Breathing extractor — port from
wifi-densepose-vitals::BreathingExtractor::esp32_default()- Bandpass via biquad IIR filter (0.1-0.5 Hz)
- Peak detection with parabolic interpolation
- Fixed-point arithmetic (Q15.16) for efficiency
- Heart rate extractor — port from
wifi-densepose-vitals::HeartRateExtractor::esp32_default()- Bandpass via biquad IIR (0.8-2.0 Hz)
- Autocorrelation peak search
- Fall detection — variance spike (>5σ) followed by sustained stillness (>3s)
- Coherence gate — port from
ruvsense::coherence_gate(Z-score threshold)
Phase 4: Tier 3 — Feature Extraction (2 weeks)
- FFT engine — fixed-point 64-point FFT (radix-2 DIT, no library needed)
- Amplitude spectrogram — 1s sliding window FFT per subcarrier
- Doppler-time map — 2D FFT across subcarrier × time dimensions
- Phase difference matrix — adjacent subcarrier Δφ
- Environment fingerprint — online PCA (incremental SVD, 16 components)
- Gesture DTW — 8 stored templates, dynamic time warping on 8-dim feature
Phase 5: CI/CD + Testing (1 week)
- GitHub Actions firmware build — Docker
espressif/idf:v5.2on every PR - Host-side unit tests — compile edge processing functions on x86 with mock CSI data
- Credential leak check — binary string scan in CI
- Binary size tracking — fail CI if firmware exceeds 90% of partition
- QEMU smoke test — boot verification, NVS load, task creation
ESP32-S3 Resource Budget
| Resource | Available | Tier 1 | Tier 2 | Tier 3 | Remaining |
|---|---|---|---|---|---|
| SRAM | 512 KB | 2 KB | 25 KB | 67 KB | 418 KB |
| Core 0 CPU | 100% | 5% | 0% | 0% | 75% (WiFi uses ~20%) |
| Core 1 CPU | 100% | 0% | 45% | 52% | 3% (Tier 2+3 exclusive) |
| Flash | 1 MB partition | 4 KB code | 12 KB code | 20 KB code | 964 KB |
Note: Tier 2 and Tier 3 run on Core 1 but are time-multiplexed — vitals at 1 Hz, features at 4 Hz. Combined peak load is ~60% of Core 1.
Mapping to Existing ADRs
| Existing ADR | Capability | Edge Tier | Implementation |
|---|---|---|---|
| ADR-014 (SOTA signal) | Phase sanitization | 1 | Linear unwrap in CSI callback |
| ADR-014 | Amplitude normalization | 1 | Welford running stats |
| ADR-014 | Feature extraction | 3 | FFT spectrogram + phase diff matrix |
| ADR-014 | Fresnel zone detection | 3 | Fade counting + first Fresnel radius |
| ADR-016 (RuVector) | Subcarrier selection | 1 | Top-K variance (simplified mincut) |
| ADR-021 (Vitals) | Breathing rate | 2 | Biquad IIR + peak detect |
| ADR-021 | Heart rate | 2 | Biquad IIR + autocorrelation |
| ADR-021 | Anomaly detection | 2 | Z-score on vital readings |
| ADR-027 (MERIDIAN) | Environment fingerprint | 3 | Online PCA, 16-dim signature |
| ADR-029 (RuvSense) | Coherence gate | 2 | Z-score coherence scoring |
| ADR-029 | Multistatic correlation | 3 | Pearson cross-link correlation |
| ADR-029 | Gesture recognition | 3 | DTW template matching |
| ADR-030 (Field model) | Static suppression | 1 | EMA background subtraction |
| ADR-031 (RuView) | Sensing-first NDP | Existing | Already in firmware (stub) |
| ADR-037 (Multi-person) | Occupancy counting | 2 | CSI rank estimation |
Server-Side Changes
The Rust aggregator (wifi-densepose-hardware) needs to handle the new packet types:
match magic {
0xC5110001 => parse_raw_csi_frame(buf), // Existing
0xC5110002 => parse_vitals_packet(buf), // New: Tier 2
0xC5110003 => parse_feature_packet(buf), // New: Tier 3
_ => Err(ParseError::UnknownMagic(magic)),
}
When edge tier ≥ 1, the server can skip its own phase sanitization and amplitude normalization. When edge tier = 3, the server skips feature extraction entirely and feeds pre-computed features directly to the neural network.
Testing Strategy
| Test Type | Tool | What |
|---|---|---|
| Host unit tests | gcc + Unity + mock CSI data | Phase unwrap, Welford stats, IIR filter, peak detect, DTW |
| QEMU smoke test | Docker QEMU | Boot, NVS load, task creation, ring buffer |
| Hardware regression | ESP32-S3 + serial log | Full pipeline: CSI → edge processing → UDP → server |
| Accuracy validation | Python reference impl | Compare edge vitals vs. server vitals on same CSI data |
| Stress test | 6-node mesh | Tier 3 at 20 Hz sustained, no frame drops |
Alternatives Considered
-
Rust on ESP32 (esp-rs) — More type-safe, could share code with server crates. Rejected: larger binary, longer compile times, less mature ESP-IDF support for CSI APIs.
-
MicroPython on ESP32 — Easier prototyping. Rejected: too slow for 20 Hz real-time processing, no fixed-point DSP.
-
External co-processor (FPGA/DSP) — Maximum throughput. Rejected: cost ($50+ per node), defeats the $8 ESP32 value proposition.
-
Server-only processing — Keep firmware dumb. Rejected: doesn't solve bandwidth, latency, or standalone operation requirements.
Risks
| Risk | Mitigation |
|---|---|
| Core 1 processing exceeds real-time budget | Adaptive quality: reduce feature_hz or fall back to lower tier |
| Fixed-point arithmetic introduces accuracy drift | Validate against Rust f64 reference on same CSI data; track error bounds |
| NVS config complexity overwhelms users | Sensible defaults; provision.py presets: --preset home, --preset medical, --preset security |
| ADR-018 v2 header breaks old aggregators | Backward-compatible: old magic = old format. New bit in flags field signals extension |
| Memory fragmentation from ring buffer | Static allocation only; no malloc in edge processing path |
Success Criteria
- Tier 1 reduces bandwidth by ≥60% with <1 dB SNR loss
- Tier 2 breathing rate within ±1 BPM of server-side estimate
- Tier 2 heart rate within ±3 BPM of server-side estimate
- Tier 2 fall detection latency <500ms (vs. ~2s server roundtrip)
- Tier 2 presence detection accuracy ≥95%
- Tier 3 feature extraction matches server output within 5% RMSE
- All tiers: zero frame drops at 20 Hz sustained on single node
- Firmware binary stays under 90% of 1 MB app partition
- SRAM usage stays under 400 KB (leave headroom for WiFi stack)
- CI pipeline: build + host unit tests + binary size check on every PR