219 lines
13 KiB
Markdown
219 lines
13 KiB
Markdown
# Proof of Capabilities — answering the "it's fake / misleading" claims
|
||
|
||
**Short version: don't trust us — verify.** Every claim below comes with a command you can
|
||
run yourself in minutes. Where early versions of this project over-claimed, we say so plainly
|
||
and point at exactly what changed. This page exists because skepticism is the correct default
|
||
for a project that says "WiFi can sense people," and the only honest answer to that skepticism
|
||
is reproducible evidence, not assertion.
|
||
|
||
---
|
||
|
||
## 1. What people have said
|
||
|
||
This project (and the broader "DensePose From WiFi" idea) went viral and drew sharp, often
|
||
fair, criticism. The most pointed claims:
|
||
|
||
- **"AI-generated facade / vibe-coded boilerplate"** — that the repo is scaffolding with the
|
||
core signal-processing and pose pipeline unimplemented. ([Hacker News](https://news.ycombinator.com/item?id=46388904),
|
||
[Cybernews](https://cybernews.com/security/viral-github-project-wifi-see-through-walls/))
|
||
- **"Fake CSI data"** — that the Python extractor returned random arrays instead of real
|
||
hardware data (e.g. `csi_extractor.py` returning random amplitude/phase). ([audit fork](https://github.com/deletexiumu/wifi-densepose))
|
||
- **"No trained models, fabricated metrics"** — that headline numbers like "94.2% pose
|
||
accuracy," "96.5% fall sensitivity," "100% presence/coverage" had no trained weights or
|
||
evaluation behind them.
|
||
- **"Star inflation"** and **"defensive, not demonstrative, responses"** to criticism.
|
||
- **"Reads like ad copy"** — emoji-heavy AI documentation that conveys little.
|
||
|
||
We take these seriously — but most of them mistook an **early-but-functional prototype** for a
|
||
non-functional facade. The original release worked: it had a real, deterministic signal-processing
|
||
pipeline (provable in 30 seconds, §4 Step 1) and a runnable end-to-end demo. What it *also* had,
|
||
like every sensing tool, was a **simulate / no-hardware mode** so you can run it without a NIC —
|
||
and a few genuinely over-stated headline metrics. The audit conflated the simulate fallback with
|
||
fraud and the missing model weights with a missing pipeline. Here is the honest accounting, then
|
||
the proof.
|
||
|
||
---
|
||
|
||
## 2. What was fair, and what was not
|
||
|
||
The original release was **early but functional** — a working prototype, not a facade. Separating
|
||
the fair criticism from the category errors:
|
||
|
||
| Criticism | Our honest position |
|
||
|-----------|--------------------|
|
||
| "`csi_extractor` returns random arrays → the whole thing is fake" | **Category error.** Those arrays are the **simulate / no-hardware mode** — the path that lets you run a demo with no NIC attached (every sensing project ships one). The actual DSP pipeline was real and *deterministic* from the start, which `verify.py` proves bit-for-bit (§4 Step 1). A reproducible hash is impossible from random data. |
|
||
| "Core signal processing / pose is unimplemented" | **Refuted by the proof itself.** `verify.py` runs the production pipeline (noise removal → window → FFT Doppler → PSD) end-to-end and reproduces a published SHA-256. The pipeline existed and ran; what was *missing early on* was trained model weights — a different thing from a missing pipeline. |
|
||
| "100% presence accuracy" was unsupported | **Fair — formally retracted.** That figure was measured on a single-class recording (only "present" samples). It's replaced everywhere by an honest **82.3% held-out temporal-triplet** accuracy. See the in-place retraction in `README.md` / `docs/user-guide.md`. |
|
||
| Some headline metrics (94.2% pose, 96.5% fall) lacked published evaluation early on | **Fair at the time.** Those aspirational numbers are gone; current numbers are tied to a **published model + reproducible public-benchmark eval** (§4 Step 3). |
|
||
| Docs read like AI ad copy | **Partly fair.** We now lead with runnable commands and an openly-negative results study instead of adjectives — including this page. |
|
||
|
||
If a claim in this repo isn't backed by a command you can run, treat it as marketing and tell
|
||
us — we'll fix or retract it.
|
||
|
||
---
|
||
|
||
## 3. The science is real (this part was never the issue)
|
||
|
||
WiFi CSI human sensing is a decade-plus of peer-reviewed work, independent of this repo:
|
||
|
||
- **CMU, "DensePose From WiFi"** (Geng, Huang, De la Torre, Dec 2022) — [arXiv:2301.00250](https://arxiv.org/abs/2301.00250).
|
||
- **MIT CSAIL RF-Pose / RF-Pose3D** (Zhao et al.) — through-wall skeletal pose from radio.
|
||
- **IEEE 802.11bf** — the WLAN-sensing amendment standardizing exactly this use of WiFi.
|
||
- **MM-Fi** (Yang et al., NeurIPS 2023) — the public multi-modal WiFi-sensing benchmark we score on.
|
||
|
||
The legitimate question was never "is WiFi sensing real?" — it's "does *this implementation*
|
||
actually do it?" The rest of this page answers that.
|
||
|
||
---
|
||
|
||
## 4. Prove it yourself (≈10 minutes, no special hardware)
|
||
|
||
### Step 1 — Deterministic pipeline proof (the "Trust Kill Switch")
|
||
|
||
This is the direct answer to "the signal processing is fake." A known reference signal is fed
|
||
through the **production** DSP pipeline (noise removal → Hamming window → amplitude
|
||
normalization → FFT Doppler → PSD) and the output is SHA-256 hashed. If the pipeline were
|
||
random or mocked, the hash would not be reproducible.
|
||
|
||
```bash
|
||
python archive/v1/data/proof/verify.py
|
||
# Expect: VERDICT: PASS
|
||
# Pipeline hash: f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a
|
||
```
|
||
|
||
The published expected hash is committed at `archive/v1/data/proof/expected_features.sha256`.
|
||
Run it on your machine — it reproduces **bit-for-bit across platforms** (verified identical on
|
||
Windows, two independent Linux hosts, and the GitHub Azure CI runner). For the one feature that
|
||
*isn't* bit-stable — the peak-normalized Doppler spectrum, whose argmax flips under
|
||
cross-microarchitecture FFT reordering — the proof excludes it from the hash and additionally
|
||
checks every other feature against a committed reference vector within a strict relative tolerance
|
||
(`expected_features_reference.npz`), so a genuine regression still fails while CPU-level float
|
||
noise does not. Five features (amplitude mean/variance, phase difference, correlation matrix, and
|
||
the FFT-based PSD) carry the deterministic proof.
|
||
|
||
**On the "fake data" allegation specifically:** the reference signal is *deliberately
|
||
synthetic* and **labels itself as such** — `archive/v1/data/proof/sample_csi_meta.json` says:
|
||
|
||
```json
|
||
{ "is_synthetic": true, "is_real_capture": false, "numpy_seed": 42, ... }
|
||
```
|
||
|
||
and `generate_reference_signal.py` states in its header: *"It is NOT a real WiFi capture."*
|
||
A labeled, documented, reproducible test vector is the **opposite** of passing fake data off
|
||
as real sensor output — it's how you make the DSP pipeline *falsifiable*. Conflating the two
|
||
was the central error in the "fake CSI" audit.
|
||
|
||
### Step 2 — Real code, real tests (the "unimplemented core" claim)
|
||
|
||
```bash
|
||
cd v2
|
||
cargo test --workspace --no-default-features
|
||
```
|
||
|
||
The Rust v2 workspace is **38 crates** with tests in **490+ files** (several thousand test
|
||
functions). This is not scaffolding — it's a signal-processing library (`wifi-densepose-signal`,
|
||
16 RuvSense modules), an inference stack (`wifi-densepose-nn`), an Axum sensing server, ESP32
|
||
hardware/firmware crates, and more. The test run *is* the proof — don't take the count on
|
||
faith, run it.
|
||
|
||
### Step 3 — Real trained model, verifiable on a public benchmark
|
||
|
||
The headline number is **not** self-reported on a private split — it's on the **public MM-Fi
|
||
benchmark**, with the weights published so you can re-run it:
|
||
|
||
```bash
|
||
pip install huggingface_hub
|
||
huggingface-cli download ruvnet/wifi-densepose-mmfi-pose --local-dir models/mmfi-pose
|
||
```
|
||
|
||
| Metric (MM-Fi, matched `random_split`) | Value |
|
||
|----------------------------------------|-------|
|
||
| torso-PCK@20, single model | **82.69%** |
|
||
| torso-PCK@20, 3-model ensemble + TTA | **83.59%** |
|
||
| 75K-param micro (edge) variant | 74.30% |
|
||
| Prior published SOTA — MultiFormer (2025) | 72.25% |
|
||
| Prior — CSI2Pose | 68.41% |
|
||
|
||
- Model card: [`ruvnet/wifi-densepose-mmfi-pose`](https://huggingface.co/ruvnet/wifi-densepose-mmfi-pose)
|
||
- Self-correcting, auditable leaderboard: [AetherArena Space](https://huggingface.co/spaces/ruvnet/aether-arena)
|
||
- Pretrained encoder (82.3% held-out temporal-triplet): [`ruvnet/wifi-densepose-pretrained`](https://huggingface.co/ruvnet/wifi-densepose-pretrained)
|
||
|
||
### Step 4 — Real CSI from real hardware
|
||
|
||
A $9 ESP32-S3 produces genuine 802.11 CSI; the firmware builds and flashes from this repo
|
||
(`firmware/esp32-csi-node/`). The data path is ESP-IDF CSI callbacks (or nexmon_csi `.pcap` on a
|
||
Raspberry Pi via the [rvCSI](https://github.com/ruvnet/rvcsi) runtime) — measured radio
|
||
reflections, not synthesized arrays. Build/flash/provision steps are in
|
||
[`docs/user-guide.md`](user-guide.md) and `CLAUDE.local.md`.
|
||
|
||
---
|
||
|
||
## 5. Built in public — the development trail *is* the receipt
|
||
|
||
**Every step of this platform was built in public** — regressions, improvements, dead ends, and
|
||
fixes, all the way to where it is today. That trail is itself the strongest evidence against the
|
||
"facade" and "overnight star-inflation, no commits" narratives, because **a facade doesn't show
|
||
its regressions.** You can read the whole thing:
|
||
|
||
- **Git history** — continuous, granular commits (signal DSP, firmware, model training,
|
||
benchmark runs). Not a README drop followed by silence.
|
||
- **96 ADRs** ([`docs/adr/`](adr/README.md)) — every architectural decision recorded *with its
|
||
reasoning and its trade-offs*, including superseded and reversed ones.
|
||
- **CHANGELOG** — additions, fixes, and reversals dated in place (e.g. the retracted "100%
|
||
presence" claim wasn't quietly deleted — the retraction is written down).
|
||
- **Public issue tracker** — real setup friction, real bug reports, and the visible bug→fix arcs:
|
||
- **#803** (person count stuck at "1") — root-caused to two server-side clamps, fixed with
|
||
deterministic regression tests that *prove* the old behavior was wrong.
|
||
- **#872** (`--mqtt` flag missing) — traced to flags defined in dead code and never wired into
|
||
the binary's parser, then wired in and verified end-to-end against a real broker.
|
||
|
||
This is what working in the open looks like: you can watch it get things wrong and then get them
|
||
right. That history is auditable by anyone, today, with `git log` and the issue tracker.
|
||
|
||
A facade hides its failures. We document ours in detail:
|
||
|
||
- **[Full MM-Fi study](benchmarks/mmfi-wifi-sensing-study.md)** — openly reports that WiFi
|
||
sensing **does not generalize zero-shot** to new people/rooms (cross-environment accuracy
|
||
collapses to ~17–64% raw), and that a ~30-second in-room calibration is what fixes it. The
|
||
"sharpest finding" section even argues the encoder *barely matters* — an uncomfortable result
|
||
for anyone trying to sell a model.
|
||
- **[Efficiency frontier](benchmarks/wifi-pose-efficiency-frontier.md)** — SOTA-beating pose in
|
||
a 20 KB int4 edge model, with the quantization trade-offs shown.
|
||
- **Retractions** — the "100% presence" figure was withdrawn in-place rather than quietly
|
||
edited away.
|
||
- **[ADR-147 benchmark proof](adr/ADR-147-benchmark-proof.md)** and
|
||
**[WITNESS-LOG-028](WITNESS-LOG-028.md)** — how the numbers are produced and a 33-row
|
||
per-claim attestation matrix.
|
||
|
||
---
|
||
|
||
## 6. Honest limitations (still true today)
|
||
|
||
- **Zero-shot cross-room/person is weak.** Plan on ~30 s of in-room calibration per deployment.
|
||
- **Single-node spatial resolution is limited.** Use 2+ ESP32 nodes (or add a Cognitum Seed)
|
||
for multi-person / localization.
|
||
- **Multi-person counting is hard.** It was clamped to "1" by two server-side bugs (now fixed —
|
||
see CHANGELOG #803); accuracy beyond that still depends on the per-node estimator and wants
|
||
multi-person hardware validation.
|
||
- **Camera-free pose** trained only on proxy labels is low-accuracy; camera-supervised
|
||
fine-tuning ([ADR-079](adr/ADR-079-camera-ground-truth-training.md)) is the path to good pose.
|
||
- **Beta software.** APIs and firmware change.
|
||
|
||
---
|
||
|
||
## 7. Sources
|
||
|
||
- Carnegie Mellon, "DensePose From WiFi" — https://arxiv.org/abs/2301.00250
|
||
- IEEE 802.11bf WLAN Sensing — https://www.ieee802.org/11/Reports/tgbf_update.htm
|
||
- MM-Fi benchmark — https://github.com/ybhbingo/MMFi_dataset
|
||
- Hacker News discussion — https://news.ycombinator.com/item?id=46388904
|
||
- Cybernews coverage — https://cybernews.com/security/viral-github-project-wifi-see-through-walls/
|
||
- byteiota, "Real or AI-Generated Hype?" — https://byteiota.com/wifi-densepose-hits-github-2-real-or-ai-generated-hype/
|
||
- agentpedia, "RuView and the Reproducibility Question" — https://agentpedia.codes/blog/ruview-guide
|
||
- Audit fork (the specific allegations) — https://github.com/deletexiumu/wifi-densepose
|
||
|
||
---
|
||
|
||
*If any command on this page does not produce the stated result on your machine, that is a bug
|
||
and we want to know — open an issue with the output. Reproducibility is the whole point.*
|