wifi-densepose/examples/through-wall/README.md

136 lines
6.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# WiFlow Browser Trainer (`wiflow_browser.html`)
A **single self-contained HTML page** that does the entire camera-supervised
WiFi-pose loop **in your browser, in your laptop camera's coordinate frame**, as
a **4-stage gated flow** with a progress stepper (each stage unlocks the next):
0. **CALIBRATE** *(ADR-151 empty-room baseline)* — you step OUT of the space; the
page captures ~10 s of the quiescent CSI and computes a per-feature running
**mean + std (Welford)** over the 410-d vector. Every CSI vector afterwards is
expressed as **deviation from baseline**
(`x_norm = (x base_mean) / (base_std + ε)`), so a body's perturbation stands
out from the static channel. Persisted to IndexedDB. *Can't capture without it.*
1. **CAPTURE** — MediaPipe Pose runs on your laptop camera → 17 COCO keypoints
(the *label*), paired with the **baseline-normalized** 410-d ESP32 CSI vector
(the *input*). A **guided, balanced routine** cycles big on-screen prompts
(stand / turn / walk / arms / crouch / sit / reach) with a countdown, and a
**per-pose coverage meter** so you build a balanced dataset, not 2 000 frames
of standing.
2. **TRAIN** — a TensorFlow.js MLP learns `CSI → pose` in-browser. Honest
held-out PCK@0.10 / PCK@0.05 / MPJPE, plus a **mean-pose baseline** the model
must beat (the project's whole ethos — no baseline-beating signal, it says so).
*Can't train with <200 samples.*
3. **INFER** — the trained model drives a skeleton **from WiFi CSI only**
(baseline-normalized → standardized → model), drawn over the **same** camera
frame it trained in — so the inferred skeleton **aligns** with the camera
image. That alignment is the entire point of doing this in-browser instead of
with a separate Python camera. *Can't infer without a model.*
## Why in-browser
The Python pipeline (`wiflow_capture.py` → `wiflow_train.py``wiflow_infer.py`)
proved the signal is real (held-out PCK@0.10 ≈ 59.5% vs a 50% mean-pose baseline
= +9.4 pp). But it trained in a *different* camera's frame, so the inferred
skeleton never lined up with the laptop camera. Doing capture + train + infer all
in the browser with the **same** camera makes the training frame and the
inference frame identical → the skeleton aligns.
## Compute backends (WebGPU / WASM / WebGL)
Training and inference run on TensorFlow.js. The page selects the backend at
startup, preferring the fastest available:
- **WebGPU** (Chrome / Edge, secure context — `localhost` qualifies) — GPU compute.
- **WASM-SIMD** fallback (`tfjs-backend-wasm`, SIMD enabled, `.wasm` from the CDN).
- **WebGL** last-resort fallback (ships inside tfjs core).
The **active backend is shown as a badge in the header** (`compute: WebGPU` /
`WASM-SIMD` / `WebGL`) so it's honest about what's actually running. The model
code is backend-agnostic — tf.js abstracts the device.
## Honesty (baked in)
- The **CAPTURE** skeleton (blue) is the camera = ground truth, labeled as such.
- The **INFER** skeleton (green) is **CSI-only**, labeled, and **coarse** — the
real measured held-out PCK is shown, not a marketing number.
- The **mean-pose baseline** is always computed and shown in TRAIN; the verdict
states plainly whether the model **beats** it (real signal) or **does not**
(no usable signal). This guards against the project's retracted 92.9% that
failed exactly this check.
- Status banner is strict and mutually exclusive:
**LIVE** (real `source: "esp32"`) / **SIMULATED — not real** (any other source)
/ **NO-CSI-SERVER**. The page never invents frames.
## How to run
### 1. Start the real sensing-server (provides the CSI WebSocket on :8765)
```bash
cd v2
cargo build -p wifi-densepose-sensing-server
./target/debug/sensing-server.exe --ws-port 8765 --udp-port 5005
```
A real ESP32-S3 must be provisioned and streaming for `source` to read `esp32`
(see `CLAUDE.local.md` for the firmware build/provision steps). The page expects
the verified live endpoint **`ws://localhost:8765/ws/sensing`** with
`source:"esp32"`, nodes `[9, 13]`, `features.*`, `node_features[].features.*`,
and `signal_field.values` (400 floats).
### 2. Serve this page over localhost (camera + WebGPU need a localhost/secure origin)
Any static localhost server works. For example:
```bash
python -m http.server 8099
# then open: http://localhost:8099/examples/through-wall/wiflow_browser.html
```
(8099 is just the static file server — 8765 is a separate process, the CSI
WebSocket.) Allow camera access when the browser prompts.
Point at a CSI server on another host with `?ws=`:
```
http://localhost:8099/examples/through-wall/wiflow_browser.html?ws=ws://192.168.1.20:8765/ws/sensing
```
### 3. Use it
1. **CAPTURE** tab → *enable laptop camera**start recording*. Follow the guided
routine (stand / turn / walk / arms / crouch / sit). A pair is stored only when
a confident pose AND a fresh live `esp32` CSI frame coexist. Aim for a few
thousand samples. Samples persist in IndexedDB across refreshes.
2. **TRAIN** tab → *train model*. Watch the live loss curve, held-out PCK, and the
baseline verdict. The model saves to IndexedDB.
3. **INFER** tab → the green skeleton is now driven by WiFi CSI only, aligned over
your camera. Toggle *hide camera* to see the CSI-only skeleton on black.
## The 410-d CSI vector (matches the Python pipeline exactly)
```
[ mean_rssi, variance, motion_band_power, breathing_band_power ] # 4 (features.*)
+ for node 9 then node 13: [ mean_rssi, variance, motion_band_power ] # 6 (node_features[].features.*)
+ signal_field.values, padded / truncated to 400 # 400
= 410-d
```
Verified against a real live frame: the in-browser `csiVector()` produces the
identical 410 vector as `wiflow_capture.py`'s `csi_vector()` (node 9 first, then
node 13; field zero-padded).
## Libraries (CDN only, no bundler)
| Library | CDN |
|---|---|
| TensorFlow.js core | `@tensorflow/tfjs@4.22.0/dist/tf.min.js` |
| TF.js WebGPU backend | `@tensorflow/tfjs-backend-webgpu@4.22.0/dist/tf-backend-webgpu.min.js` |
| TF.js WASM backend | `@tensorflow/tfjs-backend-wasm@4.22.0/dist/tf-backend-wasm.min.js` |
| MediaPipe Pose 0.5 (legacy solutions) | `@mediapipe/pose@0.5/pose.js` |
## Scope / honesty caveats
Same person, same room, same session. **Not** validated cross-day, cross-room, or
through-wall. The inferred pose is coarse (PCK@0.05 is typically weak). If the
model does not beat the mean-pose baseline, the page says so — that is a feature.