# Proof of Capabilities — answering the "it's fake / misleading" claims
**Short version: don't trust us — verify.** Every claim below comes with a command you can
run yourself in minutes. Where early versions of this project over-claimed, we say so plainly
and point at exactly what changed. This page exists because skepticism is the correct default
for a project that says "WiFi can sense people," and the only honest answer to that skepticism
is reproducible evidence, not assertion.

---
## 1. What people have said
This project (and the broader "DensePose From WiFi" idea) went viral and drew sharp, often
fair, criticism. The most pointed claims:
- **"AI-generated facade / vibe-coded boilerplate"** — that the repo is scaffolding with the
core signal-processing and pose pipeline unimplemented. ([Hacker News](https://news.ycombinator.com/item?id=46388904),
[Cybernews](https://cybernews.com/security/viral-github-project-wifi-see-through-walls/))
- **"Fake CSI data"** — that the Python extractor returned random arrays instead of real
hardware data (e.g. `csi_extractor.py` returning random amplitude/phase). ([audit fork](https://github.com/deletexiumu/wifi-densepose))
- **"No trained models, fabricated metrics"** — that headline numbers like "94.2% pose
accuracy," "96.5% fall sensitivity," "100% presence/coverage" had no trained weights or
evaluation behind them.
- **"Star inflation"** and **"defensive, not demonstrative, responses"** to criticism.
- **"Reads like ad copy"** — emoji-heavy AI documentation that conveys little.

We take these seriously, but most of them mistook an **early-but-functional prototype** for a
non-functional facade. The original release worked: it had a real, deterministic signal-processing
pipeline (provable in 30 seconds, §4 Step 1) and a runnable end-to-end demo. What it *also* had,
like most sensing tools, was a **simulate / no-hardware mode** so you can run it without a NIC,
plus a few genuinely overstated headline metrics. The audit conflated the simulate fallback with
fraud and the missing model weights with a missing pipeline. Here is the honest accounting, then
the proof.

---
## 2. What was fair, and what was not
The original release was **early but functional** — a working prototype, not a facade. Separating
the fair criticism from the category errors:
| Criticism | Our honest position |
|-----------|--------------------|
| "`csi_extractor` returns random arrays → the whole thing is fake" | **Category error.** Those arrays are the **simulate / no-hardware mode**: the path that lets you run a demo with no NIC attached (most sensing projects ship one). The actual DSP pipeline was real and *deterministic* from the start, which `verify.py` proves bit-for-bit (§4 Step 1). A pipeline that emitted unseeded random output could not reproduce a published hash. |
| "Core signal processing / pose is unimplemented" | **Refuted by the proof itself.** `verify.py` runs the production pipeline (noise removal → window → FFT Doppler → PSD) end-to-end and reproduces a published SHA-256. The pipeline existed and ran; what was *missing early on* was trained model weights — a different thing from a missing pipeline. |
| "100% presence accuracy" was unsupported | **Fair — formally retracted.** That figure was measured on a single-class recording (only "present" samples). It's replaced everywhere by an honest **82.3% held-out temporal-triplet** accuracy. See the in-place retraction in `README.md` / `docs/user-guide.md`. |
| Some headline metrics (94.2% pose, 96.5% fall) lacked published evaluation early on | **Fair at the time.** Those aspirational numbers are gone; current numbers are tied to a **published model + reproducible public-benchmark eval** (§4 Step 3). |
| Docs read like AI ad copy | **Partly fair.** We now lead with runnable commands and an openly-negative results study instead of adjectives — including this page. |
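
The "100% presence accuracy" retraction in the table can be made concrete with a small sketch. On a recording that contains only "present" samples, a detector that always answers "present" scores 100%, so the number carries no information; a per-class metric cannot even be computed. The labels and the trivial detector below are made up for illustration and are not taken from the project's evaluation code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall; None when a class has no samples to score."""
    recalls = []
    for cls in (0, 1):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        if not idx:
            return None  # single-class data: the metric is undefined
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def always_present(n):
    """Hypothetical detector that ignores its input and always reports 'present' (1)."""
    return [1] * n


single_class = [1] * 50              # recording with only "present" samples
two_class = [1] * 50 + [0] * 50      # recording that also contains an empty room

acc_single = accuracy(single_class, always_present(50))            # looks perfect
acc_mixed = accuracy(two_class, always_present(100))               # chance level
bal_single = balanced_accuracy(single_class, always_present(50))   # undefined
bal_mixed = balanced_accuracy(two_class, always_present(100))      # chance level
```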

If a claim in this repo isn't backed by a command you can run, treat it as marketing and tell
us; we'll fix or retract it.

---
## 3. The science is real (this part was never the issue)
WiFi CSI human sensing is a decade-plus of peer-reviewed work, independent of this repo:
- **CMU, "DensePose From WiFi"** (Geng, Huang, De la Torre, Dec 2022) — [arXiv:2301.00250](https://arxiv.org/abs/2301.00250).
- **MIT CSAIL RF-Pose / RF-Pose3D** (Zhao et al.) — through-wall skeletal pose from radio.
- **IEEE 802.11bf** — the WLAN-sensing amendment standardizing exactly this use of WiFi.
- **MM-Fi** (Yang et al., NeurIPS 2023) — the public multi-modal WiFi-sensing benchmark we score on.

The legitimate question was never "is WiFi sensing real?" but "does *this implementation*
actually do it?" The rest of this page answers that.

---
## 4. Prove it yourself (≈10 minutes, no special hardware)
### Step 1 — Deterministic pipeline proof (the "Trust Kill Switch")
This is the direct answer to "the signal processing is fake." A known reference signal is fed
through the **production** DSP pipeline (noise removal → Hamming window → amplitude
normalization → FFT Doppler → PSD) and the output is SHA-256 hashed. If the pipeline returned
unseeded random values, the hash could not be reproduced from one run to the next.
```bash
python archive/v1/data/proof/verify.py
# Expect: VERDICT: PASS
# Pipeline hash: f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a
```
The published expected hash is committed at `archive/v1/data/proof/expected_features.sha256`.
Run it on your machine: the hashed features reproduce **bit-for-bit across platforms** (verified
identical on Windows, two independent Linux hosts, and the GitHub Azure CI runner). One feature
*isn't* bit-stable: the peak-normalized Doppler spectrum, whose argmax flips under
cross-microarchitecture FFT reordering. The proof excludes it from the hash and additionally
checks all remaining features against a committed reference vector within a strict relative
tolerance (`expected_features_reference.npz`), so a genuine regression still fails while
CPU-level float noise does not. Five features (amplitude mean, amplitude variance, phase
difference, correlation matrix, and the FFT-based PSD) carry the deterministic proof.
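
The two checks described above (an exact hash plus a tolerance comparison) can be sketched in a few lines. This is a minimal illustration using only the standard library, not the repo's `verify.py`: the toy feature extractor, the function names, and the tolerance values are all assumptions chosen for the example.

```python
import hashlib
import math
import random
import struct


def make_reference_signal(n=64, seed=42):
    """Seeded synthetic amplitudes: a 4-cycle sinusoid plus reproducible noise."""
    rng = random.Random(seed)
    return [1.0 + 0.5 * math.sin(2 * math.pi * 4 * i / n) + 0.05 * rng.gauss(0.0, 1.0)
            for i in range(n)]


def extract_features(signal):
    """Toy stand-in for a DSP pipeline: mean, variance, Hamming-windowed PSD."""
    n = len(signal)
    mean = sum(signal) / n
    variance = sum((x - mean) ** 2 for x in signal) / n
    windowed = [(x - mean) * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(signal)]
    psd = []
    for k in range(n // 2):  # naive DFT, adequate for a 64-sample sketch
        re = sum(w * math.cos(2 * math.pi * k * i / n) for i, w in enumerate(windowed))
        im = -sum(w * math.sin(2 * math.pi * k * i / n) for i, w in enumerate(windowed))
        psd.append((re * re + im * im) / n)
    return {"amplitude_mean": [mean], "amplitude_variance": [variance], "psd": psd}


def feature_hash(features):
    """SHA-256 over feature names and little-endian float64 bytes, in sorted name order."""
    digest = hashlib.sha256()
    for name in sorted(features):
        values = features[name]
        digest.update(name.encode("utf-8"))
        digest.update(struct.pack(f"<{len(values)}d", *values))
    return digest.hexdigest()


def within_tolerance(features, reference, rel_tol=1e-9, abs_tol=1e-12):
    """True when every value matches the reference up to float noise."""
    return all(
        len(features[name]) == len(reference[name])
        and all(math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
                for a, b in zip(features[name], reference[name]))
        for name in reference
    )


reference = extract_features(make_reference_signal())
rerun = extract_features(make_reference_signal())
# Simulated float noise versus a simulated 1% regression.
noisy = {k: [v * (1 + 1e-12) for v in vals] for k, vals in reference.items()}
broken = {k: [v * 1.01 for v in vals] for k, vals in reference.items()}
```

In this sketch the hash catches any bit-level change, while the tolerance check separates float noise (`noisy` passes) from a real regression (`broken` fails).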

**On the "fake data" allegation specifically:** the reference signal is *deliberately
synthetic* and **labels itself as such**. `archive/v1/data/proof/sample_csi_meta.json` says:
```json
{ "is_synthetic": true, "is_real_capture": false, "numpy_seed": 42, ... }
```
and `generate_reference_signal.py` states in its header: *"It is NOT a real WiFi capture."*
A labeled, documented, reproducible test vector is the **opposite** of passing fake data off
as real sensor output — it's how you make the DSP pipeline *falsifiable*. Conflating the two
was the central error in the "fake CSI" audit.
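
One way to keep a labeled test vector from being mistaken for sensor output is to check its metadata before use. The sketch below reads only the three fields quoted above; the function name `classify_capture` and the returned labels are hypothetical and not part of the repo.

```python
import json


def classify_capture(meta_json):
    """Return 'synthetic' or 'real'; reject missing or contradictory labels."""
    meta = json.loads(meta_json)
    synthetic = meta.get("is_synthetic")
    real = meta.get("is_real_capture")
    if not isinstance(synthetic, bool) or not isinstance(real, bool):
        raise ValueError("metadata must state is_synthetic and is_real_capture")
    if synthetic == real:
        raise ValueError("contradictory metadata: a capture is either synthetic or real")
    return "synthetic" if synthetic else "real"


# Only the fields shown in the excerpt above; the real file contains more.
sample_meta = '{"is_synthetic": true, "is_real_capture": false, "numpy_seed": 42}'
label = classify_capture(sample_meta)
```

A loader that refuses unlabeled or contradictory input makes it hard to pass a test vector off as a capture by accident.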
### Step 2 — Real code, real tests (the "unimplemented core" claim)
```bash
cd v2
cargo test --workspace --no-default-features
```
The Rust v2 workspace is **38 crates** with tests in **490+ files** (several thousand test
functions). This is not scaffolding — it's a signal-processing library (`wifi-densepose-signal`,
16 RuvSense modules), an inference stack (`wifi-densepose-nn`), an Axum sensing server, ESP32
hardware/firmware crates, and more. The test run *is* the proof — don't take the count on
faith, run it.
### Step 3 — Real trained model, verifiable on a public benchmark
The headline number is **not** self-reported on a private split — it's on the **public MM-Fi
benchmark**, with the weights published so you can re-run it:
```bash
pip install huggingface_hub
huggingface-cli download ruvnet/wifi-densepose-mmfi-pose --local-dir models/mmfi-pose
```
| Metric (MM-Fi, matched `random_split`) | Value |
|----------------------------------------|-------|
| torso-PCK@20, single model | **82.69%** |
| torso-PCK@20, 3-model ensemble + TTA | **83.59%** |
| 75K-param micro (edge) variant | 74.30% |
| Prior published SOTA — MultiFormer (2025) | 72.25% |
| Prior — CSI2Pose | 68.41% |
- Model card: [`ruvnet/wifi-densepose-mmfi-pose`](https://huggingface.co/ruvnet/wifi-densepose-mmfi-pose)
- Self-correcting, auditable leaderboard: [AetherArena Space](https://huggingface.co/spaces/ruvnet/aether-arena)
- Pretrained encoder (82.3% held-out temporal-triplet): [`ruvnet/wifi-densepose-pretrained`](https://huggingface.co/ruvnet/wifi-densepose-pretrained)
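
For readers unfamiliar with the metric, PCK@20 (Percentage of Correct Keypoints) counts a predicted keypoint as correct when it lies within 20% of a reference length from the ground truth. The sketch below assumes the reference length is a torso size measured between two ground-truth keypoints. The exact normalization and keypoint indices used in the MM-Fi evaluation may differ, so treat `torso_pair` and the function name as illustrative.

```python
import math


def torso_pck(pred, truth, torso_pair=(0, 1), threshold=0.20):
    """Fraction of keypoints within threshold * torso length of the ground truth.

    pred, truth: lists of (x, y) keypoints for one person, in the same order.
    torso_pair: indices of the two ground-truth keypoints spanning the torso (assumed).
    """
    torso = math.dist(truth[torso_pair[0]], truth[torso_pair[1]])
    if torso == 0:
        raise ValueError("degenerate torso length")
    correct = sum(math.dist(p, t) <= threshold * torso for p, t in zip(pred, truth))
    return correct / len(truth)


# Torso length is 10, so a keypoint counts as correct when its error is at most 2.
truth = [(0.0, 0.0), (0.0, 10.0), (5.0, 5.0), (-5.0, 5.0)]
pred = [(0.5, 0.0), (0.0, 11.0), (5.0, 8.0), (-5.0, 5.0)]
score = torso_pck(pred, truth)  # errors 0.5, 1.0, 3.0, 0.0: three of four are correct
```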
### Step 4 — Real CSI from real hardware
A $9 ESP32-S3 produces genuine 802.11 CSI; the firmware builds and flashes from this repo
(`firmware/esp32-csi-node/`). The data path is ESP-IDF CSI callbacks (or nexmon_csi `.pcap` on a
Raspberry Pi via the [rvCSI](https://github.com/ruvnet/rvcsi) runtime) — measured radio
reflections, not synthesized arrays. Build/flash/provision steps are in
[`docs/user-guide.md`](user-guide.md) and `CLAUDE.local.md`.
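
To make "measured radio reflections" concrete: CSI arrives as one complex sample per subcarrier, and amplitude and phase are derived from it. The sketch below assumes the buffer layout described in the ESP-IDF Wi-Fi documentation, where each subcarrier is stored as two signed bytes, imaginary part first and then real part; check this against your ESP-IDF version. The function name `csi_to_amplitude_phase` is hypothetical and is not the repo's parser.

```python
import math
import struct


def csi_to_amplitude_phase(buf):
    """Convert raw CSI bytes (int8 imag, int8 real per subcarrier) to (amplitude, phase)."""
    if len(buf) % 2:
        raise ValueError("CSI buffer must hold an even number of bytes")
    samples = struct.unpack(f"{len(buf)}b", buf)  # interpret bytes as signed 8-bit
    pairs = []
    for imag, real in zip(samples[0::2], samples[1::2]):
        pairs.append((math.hypot(real, imag), math.atan2(imag, real)))
    return pairs


# Two made-up subcarriers: (imag=3, real=4) and (imag=-5, real=12).
raw = bytes([3, 4, 0xFB, 0x0C])
pairs = csi_to_amplitude_phase(raw)
```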

---
## 5. Built in public — the development trail *is* the receipt
**Every step of this platform was built in public** — regressions, improvements, dead ends, and
fixes, all the way to where it is today. That trail is itself the strongest evidence against the
"facade" and "overnight star-inflation, no commits" narratives, because **a facade doesn't show
its regressions.** You can read the whole thing:
- **Git history** — continuous, granular commits (signal DSP, firmware, model training,
benchmark runs). Not a README drop followed by silence.
- **96 ADRs** ([`docs/adr/`](adr/README.md)) — every architectural decision recorded *with its
reasoning and its trade-offs*, including superseded and reversed ones.
- **CHANGELOG** — additions, fixes, and reversals dated in place (e.g. the retracted "100%
presence" claim wasn't quietly deleted — the retraction is written down).
- **Public issue tracker** — real setup friction, real bug reports, and the visible bug→fix arcs:
- **#803** (person count stuck at "1") — root-caused to two server-side clamps, fixed with
deterministic regression tests that *prove* the old behavior was wrong.
- **#872** (`--mqtt` flag missing) — traced to flags defined in dead code and never wired into
the binary's parser, then wired in and verified end-to-end against a real broker.

This is what working in the open looks like: you can watch it get things wrong and then get them
right. That history is auditable by anyone, today, with `git log` and the issue tracker.

A facade hides its failures. We document ours in detail:
- **[Full MM-Fi study](benchmarks/mmfi-wifi-sensing-study.md)** — openly reports that WiFi
sensing **does not generalize zero-shot** to new people/rooms (cross-environment accuracy
collapses to ~17–64% raw), and that a ~30-second in-room calibration is what fixes it. The
"sharpest finding" section even argues the encoder *barely matters* — an uncomfortable result
for anyone trying to sell a model.
- **[Efficiency frontier](benchmarks/wifi-pose-efficiency-frontier.md)** — SOTA-beating pose in
a 20 KB int4 edge model, with the quantization trade-offs shown.
- **Retractions** — the "100% presence" figure was withdrawn in-place rather than quietly
edited away.
- **[ADR-147 benchmark proof](adr/ADR-147-benchmark-proof.md)** and
**[WITNESS-LOG-028](WITNESS-LOG-028.md)** — how the numbers are produced and a 33-row
per-claim attestation matrix.
---
## 6. Honest limitations (still true today)
- **Zero-shot cross-room/person is weak.** Plan on ~30 s of in-room calibration per deployment.
- **Single-node spatial resolution is limited.** Use 2+ ESP32 nodes (or add a Cognitum Seed)
for multi-person / localization.
- **Multi-person counting is hard.** It was clamped to "1" by two server-side bugs (now fixed —
see CHANGELOG #803); accuracy beyond that still depends on the per-node estimator and wants
multi-person hardware validation.
- **Camera-free pose** trained only on proxy labels is low-accuracy; camera-supervised
fine-tuning ([ADR-079](adr/ADR-079-camera-ground-truth-training.md)) is the path to good pose.
- **Beta software.** APIs and firmware change.
---
## 7. Sources
- Carnegie Mellon, "DensePose From WiFi" — https://arxiv.org/abs/2301.00250
- IEEE 802.11bf WLAN Sensing — https://www.ieee802.org/11/Reports/tgbf_update.htm
- MM-Fi benchmark — https://github.com/ybhbingo/MMFi_dataset
- Hacker News discussion — https://news.ycombinator.com/item?id=46388904
- Cybernews coverage — https://cybernews.com/security/viral-github-project-wifi-see-through-walls/
- byteiota, "Real or AI-Generated Hype?" — https://byteiota.com/wifi-densepose-hits-github-2-real-or-ai-generated-hype/
- agentpedia, "RuView and the Reproducibility Question" — https://agentpedia.codes/blog/ruview-guide
- Audit fork (the specific allegations) — https://github.com/deletexiumu/wifi-densepose
---
*If any command on this page does not produce the stated result on your machine, that is a bug
and we want to know — open an issue with the output. Reproducibility is the whole point.*