wifi-densepose/docs/proof-of-capabilities.md

Proof of Capabilities — answering the "it's fake / misleading" claims

Short version: don't trust us — verify. Every claim below comes with a command you can run yourself in minutes. Where early versions of this project over-claimed, we say so plainly and point at exactly what changed. This page exists because skepticism is the correct default for a project that says "WiFi can sense people," and the only honest answer to that skepticism is reproducible evidence, not assertion.


1. What people have said

This project (and the broader "DensePose From WiFi" idea) went viral and drew sharp, often fair, criticism. The most pointed claims:

  • "AI-generated facade / vibe-coded boilerplate" — that the repo is scaffolding with the core signal-processing and pose pipeline unimplemented. (Hacker News, Cybernews)
  • "Fake CSI data" — that the Python extractor returned random arrays instead of real hardware data (e.g. csi_extractor.py returning random amplitude/phase). (audit fork)
  • "No trained models, fabricated metrics" — that headline numbers like "94.2% pose accuracy," "96.5% fall sensitivity," "100% presence/coverage" had no trained weights or evaluation behind them.
  • "Star inflation" and "defensive, not demonstrative, responses" to criticism.
  • "Reads like ad copy" — emoji-heavy AI documentation that conveys little.

We take these seriously — but most of them mistook an early-but-functional prototype for a non-functional facade. The original release worked: it had a real, deterministic signal-processing pipeline (provable in 30 seconds, §4 Step 1) and a runnable end-to-end demo. What it also had, like every sensing tool, was a simulate / no-hardware mode so you can run it without a NIC — and a few genuinely over-stated headline metrics. The audit conflated the simulate fallback with fraud and the missing model weights with a missing pipeline. Here is the honest accounting, then the proof.


2. What was fair, and what was not

The original release was early but functional — a working prototype, not a facade. Separating the fair criticism from the category errors:

| Criticism | Our honest position |
|---|---|
| "csi_extractor returns random arrays → the whole thing is fake" | Category error. Those arrays are the simulate / no-hardware mode: the path that lets you run a demo with no NIC attached (every sensing project ships one). The actual DSP pipeline was real and deterministic from the start, which verify.py proves bit-for-bit (§4 Step 1). A reproducible hash is impossible from random data. |
| "Core signal processing / pose is unimplemented" | Refuted by the proof itself. verify.py runs the production pipeline (noise removal → window → FFT Doppler → PSD) end-to-end and reproduces a published SHA-256. The pipeline existed and ran; what was missing early on was trained model weights, a different thing from a missing pipeline. |
| "100% presence accuracy" was unsupported | Fair, and formally retracted. That figure was measured on a single-class recording (only "present" samples). It's replaced everywhere by an honest 82.3% held-out temporal-triplet accuracy. See the in-place retraction in README.md / docs/user-guide.md. |
| Some headline metrics (94.2% pose, 96.5% fall) lacked published evaluation early on | Fair at the time. Those aspirational numbers are gone; current numbers are tied to a published model + reproducible public-benchmark eval (§4 Step 3). |
| Docs read like AI ad copy | Partly fair. We now lead with runnable commands and an openly-negative results study instead of adjectives, including on this page. |
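
Why a single-class recording cannot support an accuracy claim can be shown in a few lines. The sketch below is illustrative only: the labels, sample counts, and the `always_present` function are hypothetical and are not taken from the project's evaluation code.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def always_present(_sample):
    """A degenerate 'detector' that ignores its input entirely."""
    return "present"


# Hypothetical single-class recording: every sample is labeled "present".
single_class_labels = ["present"] * 200
single_class_preds = [always_present(None) for _ in single_class_labels]

# Hypothetical balanced recording containing both classes.
balanced_labels = ["present"] * 100 + ["absent"] * 100
balanced_preds = [always_present(None) for _ in balanced_labels]

single_class_acc = accuracy(single_class_preds, single_class_labels)
balanced_acc = accuracy(balanced_preds, balanced_labels)

print(f"single-class: {single_class_acc:.1%}, balanced: {balanced_acc:.1%}")
# prints "single-class: 100.0%, balanced: 50.0%"
```

A detector that never looks at the signal scores 100% on the single-class set and chance on the balanced one, which is why a figure measured that way says nothing about sensing ability.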

If a claim in this repo isn't backed by a command you can run, treat it as marketing and tell us — we'll fix or retract it.


3. The science is real (this part was never the issue)

WiFi CSI human sensing is a decade-plus of peer-reviewed work, independent of this repo:

  • CMU, "DensePose From WiFi" (Geng, Huang, De la Torre, Dec 2022) — arXiv:2301.00250.
  • MIT CSAIL RF-Pose / RF-Pose3D (Zhao et al.) — through-wall skeletal pose from radio.
  • IEEE 802.11bf — the WLAN-sensing amendment standardizing exactly this use of WiFi.
  • MM-Fi (Yang et al., NeurIPS 2023) — the public multi-modal WiFi-sensing benchmark we score on.

The legitimate question was never "is WiFi sensing real?" — it's "does this implementation actually do it?" The rest of this page answers that.


4. Prove it yourself (≈10 minutes, no special hardware)

Step 1 — Deterministic pipeline proof (the "Trust Kill Switch")

This is the direct answer to "the signal processing is fake." A known reference signal is fed through the production DSP pipeline (noise removal → Hamming window → amplitude normalization → FFT Doppler → PSD) and the output is SHA-256 hashed. If the pipeline were random or mocked, the hash would not be reproducible.

python archive/v1/data/proof/verify.py
# Expect:  VERDICT: PASS
# Pipeline hash: f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a
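
The idea behind a hash-based proof can be sketched with the standard library alone. This is not the repo's verify.py and it will not reproduce the hash above: the signal shape, the naive DFT, and every function name here are assumptions made for illustration. Only the seed value 42 and the Hamming-window and PSD steps come from this page.

```python
import hashlib
import math
import random
import struct


def reference_signal(seed, n=64):
    """Seeded synthetic test vector (hypothetical stand-in for the real one)."""
    rng = random.Random(seed)
    return [
        math.sin(2 * math.pi * 5 * i / n) + 0.1 * rng.gauss(0.0, 1.0)
        for i in range(n)
    ]


def hamming(n):
    """Standard Hamming window coefficients."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(x):
    """Naive O(n^2) DFT power spectrum; slow, but enough for a sketch."""
    n = len(x)
    out = []
    for k in range(n // 2 + 1):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        out.append((re * re + im * im) / n)
    return out


def pipeline_hash(signal):
    """Window the signal, compute its PSD, and hash the raw float64 bytes."""
    windowed = [s * w for s, w in zip(signal, hamming(len(signal)))]
    psd = power_spectrum(windowed)
    payload = struct.pack(f"<{len(psd)}d", *psd)
    return hashlib.sha256(payload).hexdigest()


first = pipeline_hash(reference_signal(seed=42))
second = pipeline_hash(reference_signal(seed=42))
other = pipeline_hash(reference_signal(seed=43))

print(first == second)  # same input, same pipeline: identical hash
print(first == other)   # any change to the input changes the hash
```

Because the digest covers every output byte, a pipeline that returned fresh random values on each run could not match a committed hash twice.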

The published expected hash is committed at archive/v1/data/proof/expected_features.sha256. Run it on your machine — it reproduces bit-for-bit across platforms (verified identical on Windows, two independent Linux hosts, and the GitHub Azure CI runner). For the one feature that isn't bit-stable — the peak-normalized Doppler spectrum, whose argmax flips under cross-microarchitecture FFT reordering — the proof excludes it from the hash and additionally checks every other feature against a committed reference vector within a strict relative tolerance (expected_features_reference.npz), so a genuine regression still fails while CPU-level float noise does not. Five features (amplitude mean/variance, phase difference, correlation matrix, and the FFT-based PSD) carry the deterministic proof.
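
The tolerance check described above can be sketched as follows. The feature names, the tolerance values, and the `check_against_reference` helper are hypothetical; the real proof reads its reference from expected_features_reference.npz, whose layout is not shown on this page.

```python
import math

# Hypothetical name for the feature that is excluded because it is not bit-stable.
UNSTABLE_FEATURES = {"doppler_spectrum"}


def check_against_reference(features, reference, rel_tol=1e-9):
    """Return names of stable features that drift beyond rel_tol (assumed value)."""
    failures = []
    for name, expected in reference.items():
        if name in UNSTABLE_FEATURES:
            continue  # skipped here, mirroring its exclusion from the hash
        actual = features[name]
        same_length = len(actual) == len(expected)
        if not same_length or not all(
            math.isclose(a, e, rel_tol=rel_tol, abs_tol=1e-12)
            for a, e in zip(actual, expected)
        ):
            failures.append(name)
    return failures


reference = {
    "amplitude_mean": [1.0, 2.0],
    "psd": [0.5, 0.25],
    "doppler_spectrum": [0.0, 1.0],
}

# CPU-level float noise plus a flipped argmax in the unstable feature.
float_noise = {
    "amplitude_mean": [1.0 + 1e-15, 2.0],
    "psd": [0.5, 0.25],
    "doppler_spectrum": [1.0, 0.0],
}

# A genuine regression in a stable feature.
regression = {
    "amplitude_mean": [1.0, 2.0],
    "psd": [0.5, 0.26],
    "doppler_spectrum": [0.0, 1.0],
}

print(check_against_reference(float_noise, reference))  # → []
print(check_against_reference(regression, reference))   # → ['psd']
```

The design choice is that last-bit float differences pass while any real change in a stable feature is reported by name.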

On the "fake data" allegation specifically: the reference signal is deliberately synthetic and labels itself as such. archive/v1/data/proof/sample_csi_meta.json says:

{ "is_synthetic": true, "is_real_capture": false, "numpy_seed": 42, ... }

and generate_reference_signal.py states in its header: "It is NOT a real WiFi capture." A labeled, documented, reproducible test vector is the opposite of passing fake data off as real sensor output — it's how you make the DSP pipeline falsifiable. Conflating the two was the central error in the "fake CSI" audit.
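
A consumer of the test vector can enforce that labeling rather than trust it. The sketch below uses only the three fields quoted above; the `assert_labeled_synthetic` helper is hypothetical, and the real metadata file contains further fields that are not reproduced here.

```python
import json


def assert_labeled_synthetic(meta):
    """Reject a test vector unless its metadata marks it as synthetic."""
    if meta.get("is_synthetic") is not True:
        raise ValueError("reference signal is not labeled synthetic")
    if meta.get("is_real_capture") is not False:
        raise ValueError("reference signal claims to be a real capture")
    return meta["numpy_seed"]


# Only the fields quoted on this page; the real file has more.
meta = json.loads(
    '{"is_synthetic": true, "is_real_capture": false, "numpy_seed": 42}'
)
seed = assert_labeled_synthetic(meta)
print(seed)  # → 42
```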

Step 2 — Real code, real tests (the "unimplemented core" claim)

cd v2
cargo test --workspace --no-default-features

The Rust v2 workspace is 38 crates with tests in 490+ files (several thousand test functions). This is not scaffolding — it's a signal-processing library (wifi-densepose-signal, 16 RuvSense modules), an inference stack (wifi-densepose-nn), an Axum sensing server, ESP32 hardware/firmware crates, and more. The test run is the proof — don't take the count on faith, run it.

Step 3 — Real trained model, verifiable on a public benchmark

The headline number is not self-reported on a private split — it's on the public MM-Fi benchmark, with the weights published so you can re-run it:

pip install huggingface_hub
huggingface-cli download ruvnet/wifi-densepose-mmfi-pose --local-dir models/mmfi-pose

| Metric (MM-Fi, matched random_split) | Value |
|---|---|
| torso-PCK@20, single model | 82.69% |
| torso-PCK@20, 3-model ensemble + TTA | 83.59% |
| 75K-param micro (edge) variant | 74.30% |
| Prior published SOTA: MultiFormer (2025) | 72.25% |
| Prior: CSI2Pose | 68.41% |
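
For readers unfamiliar with the metric, PCK@20 counts a predicted keypoint as correct when it lies within 20% of a reference body length of the ground truth. The sketch below assumes the torso length is the distance between two chosen joints; which joints the MM-Fi evaluation uses is not stated on this page, so `torso_pair` and the toy coordinates are hypothetical.

```python
import math


def pck(pred, truth, torso_pair, threshold=0.2):
    """Fraction of keypoints within threshold * torso length of ground truth.

    pred, truth: lists of (x, y) keypoints in the same order.
    torso_pair: indices of the two joints that define torso length (assumed).
    """
    a, b = torso_pair
    torso = math.dist(truth[a], truth[b])
    hits = sum(
        math.dist(p, t) <= threshold * torso for p, t in zip(pred, truth)
    )
    return hits / len(truth)


# Toy pose: torso length is 10, so the acceptance radius is 2.
truth = [(0.0, 0.0), (0.0, 10.0), (5.0, 5.0), (10.0, 0.0)]
pred = [(0.0, 1.0), (0.0, 10.0), (5.0, 8.0), (10.0, 0.5)]

score = pck(pred, truth, torso_pair=(0, 1))
print(f"PCK@20 = {score:.2%}")  # prints "PCK@20 = 75.00%"
```

Three of the four keypoints fall inside the radius; the third is 3 units off and is counted as a miss. A benchmark figure averages this per-pose score over the evaluation split.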

Step 4 — Real CSI from real hardware

A $9 ESP32-S3 produces genuine 802.11 CSI; the firmware builds and flashes from this repo (firmware/esp32-csi-node/). The data path is ESP-IDF CSI callbacks (or nexmon_csi .pcap on a Raspberry Pi via the rvCSI runtime) — measured radio reflections, not synthesized arrays. Build/flash/provision steps are in docs/user-guide.md and CLAUDE.local.md.


5. Built in public — the development trail is the receipt

Every step of this platform was built in public — regressions, improvements, dead ends, and fixes, all the way to where it is today. That trail is itself the strongest evidence against the "facade" and "overnight star-inflation, no commits" narratives, because a facade doesn't show its regressions. You can read the whole thing:

  • Git history — continuous, granular commits (signal DSP, firmware, model training, benchmark runs). Not a README drop followed by silence.
  • 96 ADRs (docs/adr/) — every architectural decision recorded with its reasoning and its trade-offs, including superseded and reversed ones.
  • CHANGELOG — additions, fixes, and reversals dated in place (e.g. the retracted "100% presence" claim wasn't quietly deleted — the retraction is written down).
  • Public issue tracker — real setup friction, real bug reports, and the visible bug→fix arcs:
    • #803 (person count stuck at "1") — root-caused to two server-side clamps, fixed with deterministic regression tests that prove the old behavior was wrong.
    • #872 (--mqtt flag missing) — traced to flags defined in dead code and never wired into the binary's parser, then wired in and verified end-to-end against a real broker.

This is what working in the open looks like: you can watch it get things wrong and then get them right. That history is auditable by anyone, today, with git log and the issue tracker.

A facade hides its failures. We document ours in detail:

  • Full MM-Fi study — openly reports that WiFi sensing does not generalize zero-shot to new people/rooms (cross-environment accuracy collapses to ~1764% raw), and that a ~30-second in-room calibration is what fixes it. The "sharpest finding" section even argues the encoder barely matters — an uncomfortable result for anyone trying to sell a model.
  • Efficiency frontier — SOTA-beating pose in a 20 KB int4 edge model, with the quantization trade-offs shown.
  • Retractions — the "100% presence" figure was withdrawn in-place rather than quietly edited away.
  • ADR-147 benchmark proof and WITNESS-LOG-028 — how the numbers are produced and a 33-row per-claim attestation matrix.

6. Honest limitations (still true today)

  • Zero-shot cross-room/person is weak. Plan on ~30 s of in-room calibration per deployment.
  • Single-node spatial resolution is limited. Use 2+ ESP32 nodes (or add a Cognitum Seed) for multi-person / localization.
  • Multi-person counting is hard. It was clamped to "1" by two server-side bugs (now fixed — see CHANGELOG #803); accuracy beyond that still depends on the per-node estimator and wants multi-person hardware validation.
  • Camera-free pose trained only on proxy labels is low-accuracy; camera-supervised fine-tuning (ADR-079) is the path to good pose.
  • Beta software. APIs and firmware change.

7. Sources


If any command on this page does not produce the stated result on your machine, that is a bug and we want to know — open an issue with the output. Reproducibility is the whole point.