diff --git a/.gitignore b/.gitignore index 094e45d9..b2b51ccf 100644 --- a/.gitignore +++ b/.gitignore @@ -280,3 +280,10 @@ assets/MM-Fi/*.zip # through-wall demo: regenerable trained model artifact examples/through-wall/model/ + +# RuView harness (npx ruview) build artifacts — ADR-182 +harness/**/node_modules/ +harness/**/*.tgz +harness/**/package-lock.json +harness/**/.claude-flow/ +harness/**/ruvector.db diff --git a/README.md b/README.md index 5065213c..8e23fdb8 100644 --- a/README.md +++ b/README.md @@ -601,6 +601,8 @@ claude --plugin-dir ./plugins/ruview Verify the plugin structure: `bash plugins/ruview/scripts/smoke.sh`. Full details: [`plugins/ruview/README.md`](plugins/ruview/README.md). +**Portable harness — `npx @ruvnet/ruview`:** a lighter, host-portable companion to the in-repo plugin, minted via [MetaHarness](https://www.npmjs.com/package/metaharness) and hardened per [ADR-182](docs/adr/ADR-182-npx-ruview-harness-via-metaharness.md). It runs **without cloning this repo** and on more hosts (Claude Code, Codex, Copilot, opencode, …), exposing the RuView operator tools (`onboard`, `verify`, `node_monitor`, `calibrate`, `node_flash`) over an MCP server — plus the project's **MEASURED-vs-CLAIMED honesty guardrail enforced in code** (`ruview.claim_check` flags untagged or retracted-"100%" accuracy claims). v0.1: the onboarding/verify/claim-check paths are tested (17/17, `verify.py` → PASS); the hardware tools are fail-closed wrappers. Try `npx @ruvnet/ruview` to onboard, or `npx @ruvnet/ruview claim-check --text "…"`. Source: [`harness/ruview/`](harness/ruview/README.md). + --- ## 📖 Documentation @@ -614,6 +616,7 @@ Verify the plugin structure: `bash plugins/ruview/scripts/smoke.sh`. Full detail | [**SENSE-BRIDGE — rvagent MCP server**](tools/ruview-mcp/README.md) | Dual-transport MCP server (`@ruvnet/rvagent`) bridging the RuView sensing stack to AI agents (Claude Code, Cursor, ruflo swarms). 
6 tools wired: `ruview.presence.now`, `ruview.vitals.get_{breathing,heart_rate,all}`, `ruview.bfld.last_scan`, `ruview.bfld.subscribe`. stdio + Streamable HTTP (`POST /mcp`, Origin-validated, bearer-token auth, `127.0.0.1` bind). Full 20-tool Zod schema barrel + 5 RUVIEW-POLICY governance tools. 93 tests. [ADR-124](docs/adr/ADR-124-rvagent-mcp-ruvector-npm-integration.md). Try: `npx @ruvnet/rvagent stdio`. | | [Semantic Primitives — Precision/Recall](docs/integrations/semantic-primitives-metrics.md) | Per-primitive F1 on the held-out paired-capture set: someone-sleeping, possible-distress, room-active, elderly-inactivity-anomaly, meeting, bathroom, fall-risk, bed-exit, no-movement, multi-room. | | [Claude Code / Codex Plugin](plugins/ruview/README.md) | The `ruview` plugin + marketplace — skills, `/ruview-*` commands, agents, and the Codex prompt mirror | +| [Portable harness — `npx @ruvnet/ruview`](harness/ruview/README.md) | MetaHarness-minted, host-portable RuView operator harness — `ruview.*` MCP tools + the MEASURED-vs-CLAIMED honesty guardrail enforced in code ([ADR-182](docs/adr/ADR-182-npx-ruview-harness-via-metaharness.md)). A lighter, multi-host companion to the in-repo plugin. | | [Architecture Decisions](docs/adr/README.md) | 96 ADRs — why each technical choice was made, organized by domain (hardware, signal processing, ML, platform, infrastructure) | | [Domain Models](docs/ddd/README.md) | 8 DDD models (RuvSense, Signal Processing, Training Pipeline, Hardware Platform, Sensing Server, WiFi-Mat, CHCI, rvCSI) — bounded contexts, aggregates, domain events, and ubiquitous language | | [rvCSI — edge RF sensing runtime](https://github.com/ruvnet/rvcsi) | Rust-first / TypeScript-accessible / hardware-abstracted CSI runtime: multi-source ingestion (incl. 
real nexmon_csi `.pcap` from a **Raspberry Pi 5** / Pi 4 / Pi 3B+ — CYW43455 / BCM43455c0) → validation → DSP → typed events → RuVector RF memory ([ADR-095](docs/adr/ADR-095-rvcsi-edge-rf-sensing-platform.md), [ADR-096](docs/adr/ADR-096-rvcsi-ffi-crate-layout.md), [domain model](docs/ddd/rvcsi-domain-model.md)). Now its own repo — [`ruvnet/rvcsi`](https://github.com/ruvnet/rvcsi) — vendored here under `vendor/rvcsi`; 9 `rvcsi-*` crates on crates.io, `@ruv/rvcsi` on npm, plus a Claude Code plugin. | diff --git a/docs/adr/ADR-182-npx-ruview-harness-via-metaharness.md b/docs/adr/ADR-182-npx-ruview-harness-via-metaharness.md new file mode 100644 index 00000000..a434cd60 --- /dev/null +++ b/docs/adr/ADR-182-npx-ruview-harness-via-metaharness.md @@ -0,0 +1,279 @@ +# ADR-182: `npx ruview` — A RuView Agent Harness Minted via MetaHarness + +| Field | Value | +|-------|-------| +| **Status** | Accepted — **P1+P2 implemented & validated** (`harness/ruview/`, 17/17 tests, MCP handshake + `ruview.verify` PASS against the real repo, packs to 16.7 kB / 21 files) · P3 publish-ready (name decision pending) · P4 (router + provenance) designed | +| **Date** | 2026-06-17 | +| **Deciders** | ruv | +| **Codename** | **RUVIEW-HARNESS** | +| **Builds on** | MetaHarness (`metaharness@0.1.15`, `@metaharness/kernel`, `@metaharness/host-*`, `@metaharness/router`), the `ruview-*` Claude Code subagents (`ruview-onboarding-guide`, `ruview-config-engineer`, `ruview-training-engineer`), the `wifi-densepose` CLI (`calibrate`/`enroll`/`train-room`/`room-watch`), the sensing-server, ADR-028 (witness verification), ADR-095/096 (rvCSI runtime), ADR-260/262 (RuField bridge) | +| **Supersedes** | none | + +## Context + +RuView (WiFi-DensePose) is a deep stack — 15 Rust crates, an ESP32 firmware line, +a sensing-server, a CLI, ~180 ADRs, a calibration pipeline, training recipes, and a +hard cultural rule that **every claim must be independently reproducible** (the +"prove everything" ethos, after the 
project was accused of AI-slop). The barrier to +entry is correspondingly steep: a newcomer who wants to "set up WiFi sensing" must +discover the right firmware variant, provision an ESP32 over a Windows-only Python +subprocess, point it at the sensing-server, run `calibrate` → `enroll` → +`train-room`, and know which numbers are MEASURED vs CLAIMED. We already encode this +knowledge as **Claude Code subagents** (`ruview-onboarding-guide`, +`ruview-config-engineer`, `ruview-training-engineer`) — but those only exist inside +*this* repo's `.claude/agents/`, only on Claude Code, and only for someone who has +already cloned the monorepo. + +Separately, this session shipped **MetaHarness** (`metaharness@0.1.15`): a tool that +*"mints a custom AI agent harness from any repo"*, runnable on **9 hosts** +(claude-code, codex, pi-dev, hermes, openclaw, rvm, copilot, opencode, +github-actions) over a wasm-primary / NAPI-RS-fallback **kernel**, with a +**cost-optimal model router** (`@metaharness/router`, the productized DRACO Phase-2 +k-NN finding) and ed25519/SLSA/SBOM provenance baked in. Crucially, MetaHarness +**already ships a `vertical:ruview` template** in its template list. That template +is generic scaffolding; it is not wired to RuView's actual tools, agents, or the +"prove everything" guardrails. + +The gap: **there is no single, host-portable, provenance-signed entry point that +gives any user an AI agent that actually knows how to operate RuView.** A user +should be able to run one command — + +```bash +npx ruview +``` + +— in an empty directory (or alongside an ESP32) and get an agent harness that can +onboard them, configure firmware, drive a live capture, train a room model, and +**refuse to overstate accuracy** — on whichever coding host they already use. 
+ +## Decision + +**Mint a first-class RuView agent harness from this repo using MetaHarness, harden +its `vertical:ruview` template into a RuView-specific harness with a real MCP tool +surface and the project's honesty guardrails, and publish it as `npx ruview`.** + +`npx ruview` is *not* a new runtime. It is a **thin, versioned distribution** of a +MetaHarness harness: the kernel + host adapters + a RuView "genome" (skills, agents, +MCP tools, guardrails) generated from and pinned against this monorepo. The harness +is the product; `npx ruview` is the front door. + +### Why mint-from-repo instead of hand-writing a harness + +MetaHarness's value here is exactly the work we would otherwise hand-roll across 9 +hosts: host-specific config (`.claude/settings.json` MCP + hooks for claude-code, +the codex/copilot/opencode equivalents), the kernel that abstracts wasm-vs-native, +the cost router, and the provenance chain. We write the **RuView knowledge once** as +host-neutral genome assets; MetaHarness projects them onto each host adapter. This +also keeps the harness regenerable: when the CLI or an ADR changes, re-mint and +re-pin rather than maintaining 9 divergent copies. + +### What the harness contains (the RuView genome) + +1. **Skills / playbooks** (host-neutral markdown, projected to each host's skill + format): + - `onboard` — zero-to-sensing path picker (Docker demo / repo build / live + ESP32), the physics caveats, the hardware table. Port of + `ruview-onboarding-guide`. + - `provision-node` — ESP-IDF v5.4 Windows-subprocess build/flash/provision flow + (the exact MSYSTEM-stripped invocation from `CLAUDE.local.md`), firmware + variant selection (8MB display / 4MB no-display / C6), NVS + WiFi + channel / + MAC-filter overrides (ADR-060). + - `calibrate-room` — `baseline → enroll → extract → train` via the + `wifi-densepose` CLI (`calibrate`/`calibrate-serve`/`enroll`/`train-room`/ + `room-watch`, ADR-151). 
+ - `train-pose` — camera-supervised + camera-free training, the MEASURED-vs-CLAIMED + discipline, the mean-pose baseline check (ADR-079, ADR-152, ADR-181). + - `verify` — run the witness bundle + Python proof (`verify.py` → VERDICT: PASS), + ADR-028. + - Ports of `ruview-config-engineer` and `ruview-training-engineer`. + +2. **MCP tool surface** (`@metaharness/kernel`-hosted MCP server, one schema per + capability — see "MCP tools" below). This is what makes the harness *operate* + RuView, not just talk about it. + +3. **Guardrails** (the differentiator): the harness's system prompt and a + pre-output hook enforce the "prove everything" rule — accuracy numbers must be + tagged MEASURED (with a reproducer) or CLAIMED; the agent must run the mean-pose + baseline before quoting PCK; firmware fixes are never presented as + hardware-validated without a real boot log (the exact discipline this session + followed for `v0.8.1-esp32`). + +4. **Host adapters** — claude-code first (P1), then codex / opencode / copilot / + pi-dev / hermes / rvm / github-actions (P3+), each via the published + `@metaharness/host-*` package. + +5. **Router** — `@metaharness/router` routes each step to the cheapest adequate + model (e.g. a var-rename or a log-grep → Haiku; calibration-math reasoning or a + security review → Sonnet/Opus), mirroring the repo's 3-tier routing (ADR-026). 
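
The guardrail in item 3 can be pictured as a small text lint. The following is a minimal sketch under stated assumptions, not the harness's actual `src/guardrails.js`: the function name `claimCheck`, the rule ids, and the line-by-line matching strategy are all hypothetical. It flags a line that carries a percentage without a MEASURED or CLAIMED tag, and a PCK figure quoted without mentioning a baseline.

```javascript
// Hypothetical sketch of a MEASURED-vs-CLAIMED lint (names and rules are assumptions).
const TAG = /\b(MEASURED|CLAIMED)\b/;      // tags named in the guardrail text
const PERCENT = /\b\d+(?:\.\d+)?\s*%/;     // any percentage figure, e.g. "59.5%"
const PCK = /\bPCK\b/;                     // pose metric that needs a baseline
const BASELINE = /baseline/i;

function claimCheck(text) {
  const findings = [];
  // Check each line on its own so a tag on one line cannot excuse another.
  for (const raw of text.split(/\n+/)) {
    const line = raw.trim();
    if (!PERCENT.test(line)) continue;     // no number, nothing to tag
    if (!TAG.test(line)) {
      findings.push({ rule: 'untagged-number', line });
    }
    if (PCK.test(line) && !BASELINE.test(line)) {
      findings.push({ rule: 'pck-without-baseline', line });
    }
  }
  // Fail-closed shape: ok is true only when nothing was flagged.
  return { ok: findings.length === 0, findings };
}
```

A regex lint like this is deliberately crude: it can only catch missing tags, so it complements the reproducer requirement rather than replacing it.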
+ +### MCP tools (the operational surface) + +| Tool | Wraps | Purpose | +|------|-------|---------| +| `ruview.onboard` | docs + agent | Pick a setup path, print the next concrete command | +| `ruview.node.flash` | ESP-IDF subprocess (`CLAUDE.local.md`) | Build + flash a firmware variant to a COM port | +| `ruview.node.provision` | `provision.py` | Set SSID/password/target-ip/channel/MAC-filter over serial | +| `ruview.node.monitor` | pyserial | Stream boot log; assert CSI is flowing (MGMT+DATA) | +| `ruview.server.up` | sensing-server | Start the Axum sensing-server (`:3000`/`:5005`/`:8765`) | +| `ruview.calibrate` | `wifi-densepose calibrate`/`enroll`/`train-room` | Run the ADR-151 room pipeline | +| `ruview.room.watch` | `wifi-densepose room-watch` | Live presence/vitals from a trained room | +| `ruview.verify` | `scripts/generate-witness-bundle.sh` + `verify.py` | Produce/verify the witness bundle (must be N/N PASS) | +| `ruview.claim.check` | static lint | Scan output for untagged accuracy claims; flag MEASURED-vs-CLAIMED | + +Each tool returns structured JSON and is fail-closed: a tool that cannot prove its +result (e.g. `ruview.node.monitor` sees no CSI callbacks) returns an honest negative, +never a fabricated success — consistent with the RuField `map_privacy` fail-closed +posture (ADR-262 §3.3). + +### The mint + pin flow (how the harness is produced) + +```bash +# P1 — mint from this repo, claude-code host, RuView vertical +npx metaharness ruview --template vertical:ruview --host claude-code \ + --from-existing . --description "RuView WiFi-sensing operator agent" \ + --target ./harness/ruview + +# readiness + fit/cost/safety scorecards (ADR-041) — gate before publish +npx metaharness genome . # 7-section repo readiness +npx metaharness score . --json # fit / cost / safety +npx metaharness analyze . 
# recommended harness plan (no-exec) +``` + +The minted harness is committed under `harness/ruview/` and **pinned** (kernel + +host-adapter + router versions locked) so `npx ruview` is reproducible. Re-minting on +a CLI/ADR change is a reviewed PR, not an implicit regeneration. + +### Distribution: `npx ruview` + +A small published package whose `bin` boots the pinned harness via the kernel: + +- **Preferred name:** `ruview` (currently **free** on npm — verified 2026-06-17). +- **Risk:** npm's typosquat filter may reject `ruview` as too close to `review` / + `preview` (this session hit exactly that on `ruvn`→`levn`/`raven` and + `worldgraph`→`world-graph`). **Fallback:** publish scoped `@ruvnet/ruview` (also + free) and/or `npx ruvnet/ruview` straight from GitHub. Decide at publish time; + do not unpublish to rename (the 24-h name-lock lesson from `worldgraphs`). +- `bin: { "ruview": "bin/cli.js" }` — note **`bin/cli.js`, not `./bin/cli.js`** (npm + strips the `./` form; this broke `ruvn@0.1.0` this session). +- `npx ruview` with no args → `onboard` skill (interactive path picker). + `npx ruview [...]` → run a specific skill. `npx ruview --host codex` → + install the harness into an existing repo for that host. 
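
The dispatch rules in the last bullet can be sketched as a pure function over the argument list. This is an illustrative sketch only, not the shipped `bin/cli.js`: `resolveCommand` and its return shape are hypothetical, and it assumes the first positional argument names the skill to run, which the text above does not spell out.

```javascript
// Hypothetical sketch of the `npx ruview` dispatch rules (not the real bin/cli.js).
// Assumption: the first positional argument is the skill name.
function resolveCommand(argv) {
  const hostIdx = argv.indexOf('--host');
  if (hostIdx !== -1) {
    const host = argv[hostIdx + 1];
    // Fail closed on a missing value instead of guessing a host.
    if (!host || host.startsWith('-')) {
      return { action: 'error', reason: '--host needs a value' };
    }
    return { action: 'install', host };
  }
  const [first, ...rest] = argv;
  if (first === undefined) {
    // No arguments: fall through to the interactive onboarding skill.
    return { action: 'skill', skill: 'onboard', args: [] };
  }
  return { action: 'skill', skill: first, args: rest };
}
```

Keeping the resolver free of side effects means the routing table can be unit-tested without spawning the kernel or touching a host config.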
+ +## Architecture + +``` + npx ruview (thin bin — boots the pinned harness) + │ + @metaharness/kernel (wasm primary · NAPI-RS native fallback) + ├── host adapter ── claude-code | codex | opencode | copilot | pi-dev | hermes | rvm | github-actions + ├── @metaharness/router (k-NN cost-optimal model routing — DRACO P2 / ADR-026) + └── RuView genome (pinned) + ├── skills onboard · provision-node · calibrate-room · train-pose · verify + ├── mcp tools ruview.node.* · ruview.calibrate · ruview.room.watch · ruview.verify · ruview.claim.check + └── guardrails MEASURED-vs-CLAIMED · mean-pose baseline · no-unvalidated-firmware-claims + │ + RuView assets (the real system the agent drives) + ├── wifi-densepose CLI calibrate / enroll / train-room / room-watch + ├── sensing-server :3000 / :5005 / :8765 + ├── ESP-IDF subprocess build / flash / provision / monitor (COM8/COM9/COM12) + └── witness bundle + verify.py +``` + +Provenance: the harness ships an **ed25519 witness + SBOM (SPDX) + SLSA** chain +(MetaHarness already does this for minted harnesses), so a recipient can verify the +RuView harness was built from a specific monorepo commit — the agentic analogue of +the firmware witness bundle (ADR-028). + +## Phases + +- **P1 — Mint & pin (claude-code).** `npx metaharness ruview --template + vertical:ruview --from-existing . --host claude-code`. Port the three `ruview-*` + subagents into host-neutral genome skills. Commit under `harness/ruview/`, pin + versions. Acceptance: `npx metaharness score .` ≥ threshold; the harness can run + `onboard` and `verify` end-to-end locally. +- **P2 — MCP tool surface.** Implement the `ruview.*` MCP tools over the kernel + (start with `onboard`, `verify`, `claim.check`, `node.monitor` — the read-only / + proving tools), then the mutating ones (`node.flash`, `provision`, `calibrate`). + Acceptance: `ruview.verify` returns the witness bundle PASS as structured JSON; + `ruview.claim.check` flags a seeded untagged "100% accuracy" string. 
+- **P3 — Publish `npx ruview` + multi-host.** Publish the bin package (name decision + per Distribution). Add codex / opencode / copilot / pi-dev / hermes / rvm / + github-actions adapters. Acceptance: `npx ruview` cold-starts on ≥3 hosts and runs + `onboard`; provenance verifies. +- **P4 — Router + guardrail hardening.** Wire `@metaharness/router`; calibrate the + 3-tier routing on a RuView task set. Make the MEASURED-vs-CLAIMED guardrail a hard + pre-output gate. Acceptance: a benchmark of RuView tasks shows cost reduction vs + all-Opus with no quality regression; the guardrail blocks an untagged accuracy + claim in a red-team prompt. + +## Consequences + +**Positive** +- One reproducible, signed entry point (`npx ruview`) that operates RuView on the + host the user already has — onboarding goes from "clone a 15-crate monorepo" to a + single `npx`. +- The "prove everything" ethos becomes **executable**, not just documentation: the + harness *enforces* MEASURED-vs-CLAIMED and the mean-pose baseline. +- Knowledge written once (host-neutral genome) instead of 9× per host; regenerable + from the repo as the system evolves. +- Dogfoods MetaHarness on a hard real vertical, surfacing bugs back to + `agent-harness-generator` (this session already filed #9–#13 there). + +**Negative / risks** +- **Drift:** a pinned harness goes stale as the CLI/ADRs move; mitigated by a + re-mint-on-change PR ritual and a CI check that the genome's referenced + CLI flags still exist. +- **Surface area:** mutating MCP tools (`node.flash`, `provision`) touch hardware and + the network — must be permission-gated and fail-closed; the firmware-flash tool + must never claim hardware validation without a captured boot log. +- **Name/typosquat:** `ruview` may be rejected at publish; scoped fallback decided in + P3. Do not unpublish-to-rename. 
+- **Host parity:** not all 9 hosts support MCP + hooks equally; the guardrail gate + may degrade to advisory on weaker hosts — must be disclosed in the badge, not + hidden (same honesty principle as ADR-181's backend badge). +- **Windows-coupled tooling:** the ESP-IDF flow is Windows-subprocess-specific + today; the `node.*` tools are gated to that environment until a cross-platform + path exists. + +## Alternatives considered + +1. **Keep the `ruview-*` subagents repo-local (status quo).** Zero new surface, but + stays Claude-Code-only and clone-gated; no portable front door. Rejected — it's + the gap this ADR exists to close. +2. **Hand-write a bespoke `npx ruview` harness (no MetaHarness).** Full control, but + re-implements the kernel, 9 host adapters, the router, and the provenance chain + we already ship — months of duplicated work and 9 divergent configs to maintain. + Rejected. +3. **Use the generic `vertical:ruview` template as-is.** It's scaffolding with no + real tools or guardrails — it would *talk about* RuView without being able to + *operate* it or enforce honesty. Rejected as insufficient; P2 is precisely the + hardening that makes it real. +4. **Ship only an MCP server (no harness/host adapters).** Covers tools but not the + skills, routing, guardrails, or multi-host projection — a strictly smaller subset + of this design. Folded in as the P2 layer rather than the whole. + +## Open questions + +- Final published name: bare `ruview` vs scoped `@ruvnet/ruview` vs GitHub-only + `npx ruvnet/ruview` — resolve against the typosquat filter at P3. +- Does the harness bundle the `wifi-densepose` binary, shell out to a user-installed + one, or offer both? (Leaning: shell out; print install guidance if absent.) +- Where do the `node.*` hardware tools live for non-Windows users — defer, or wrap + the rvCSI runtime (ADR-095/096) which is cross-platform Rust? 
+- Should `ruview.verify` gate `npx ruview` self-tests in CI (harness can't publish if + the witness bundle regresses)? +- Relationship to the RuField MFS harness surface (ADR-260/262) — one harness with a + RuField skill, or a sibling `npx rufield`? + +## References + +- MetaHarness: `metaharness@0.1.15` (`npx metaharness`, templates incl. + `vertical:ruview`; hosts: claude-code/codex/pi-dev/hermes/openclaw/rvm/copilot/ + opencode/github-actions), `@metaharness/kernel`, `@metaharness/router`, + `@metaharness/host-*`, repo `github.com/ruvnet/agent-harness-generator`. +- RuView subagents: `ruview-onboarding-guide`, `ruview-config-engineer`, + `ruview-training-engineer` (`.claude/agents/`). +- ADR-026 (3-tier model routing), ADR-028 (witness verification), ADR-041 + (MetaHarness scorecards), ADR-060 (channel / MAC-filter overrides), ADR-079 + (camera ground-truth training), ADR-095/096 (rvCSI runtime), ADR-151 (per-room + calibration), ADR-152/181 (WiFlow / browser pose), ADR-260/262 (RuField bridge). diff --git a/harness/ruview/.claude/settings.json b/harness/ruview/.claude/settings.json new file mode 100644 index 00000000..ec3f0a6e --- /dev/null +++ b/harness/ruview/.claude/settings.json @@ -0,0 +1,18 @@ +{ + "permissions": { + "allow": [ + "Bash(npx ruview*)", + "mcp__ruview__*" + ], + "deny": [ + "Read(./.env)", + "Read(./.env.*)" + ] + }, + "mcpServers": { + "ruview": { + "command": "npx", + "args": ["-y", "@ruvnet/ruview", "mcp", "start"] + } + } +} diff --git a/harness/ruview/.claude/skills/calibrate-room/SKILL.md b/harness/ruview/.claude/skills/calibrate-room/SKILL.md new file mode 100644 index 00000000..c369974b --- /dev/null +++ b/harness/ruview/.claude/skills/calibrate-room/SKILL.md @@ -0,0 +1,29 @@ +--- +name: calibrate-room +description: Run the ADR-151 per-room calibration pipeline — baseline → enroll → extract → train → a bank of small specialists (presence/posture/breathing/heartbeat/restlessness/anomaly). 
+--- + +# calibrate-room + +Turn a provisioned node + sensing-server into a working room model. Pure-Rust, +edge-deployable (ADR-151). Use the `ruview.calibrate` tool (installed +`wifi-densepose` binary, else `cargo run -p wifi-densepose-cli`). + +## Sequence + +1. **baseline** — capture the empty room (Welford amplitude + von Mises phase). Leave + the room empty. + `ruview.calibrate {step: "baseline"}` +2. **enroll** — record the occupant(s) doing the target activities. + `ruview.calibrate {step: "enroll"}` +3. **train-room** — train the bank of small specialists from baseline + enrollment. + `ruview.calibrate {step: "train-room"}` +4. **room-watch** — live presence/posture/breathing from the trained room. + `ruview.calibrate {step: "room-watch"}` (or the `room-watch` skill) + +## Honesty + +The specialists are calibrated to *this* room; cross-room transfer is a separate +problem (LoRA recalibration, ADR-079 P9). Report which room a number came from, and +tag presence/vitals accuracy MEASURED only with a held-out check — run +`ruview.claim_check` on the writeup. diff --git a/harness/ruview/.claude/skills/onboard/SKILL.md b/harness/ruview/.claude/skills/onboard/SKILL.md new file mode 100644 index 00000000..dd248099 --- /dev/null +++ b/harness/ruview/.claude/skills/onboard/SKILL.md @@ -0,0 +1,30 @@ +--- +name: onboard +description: Zero-to-sensing path picker for RuView (WiFi-DensePose) — pick docker-demo, repo-build, or live-esp32 and run the next concrete step. +--- + +# onboard + +Get a newcomer from nothing to a working RuView setup. **First fact to set:** WiFi +sensing infers *coarse* pose/presence/breathing from Channel State Information — it +is **not a camera**, and any accuracy number must be MEASURED against a baseline +(use the `verify` skill / `ruview.claim_check` tool). Never present WiFi output as +camera-grade. + +## Pick a path + +Run `ruview.onboard {path}` or decide from: + +1. **docker-demo** — fastest, no hardware. 
Replays sample CSI into the dashboard. + `docker run -p 8000:8000 ruvnet/wifi-densepose` → open `http://localhost:8000`. + Use to see what it looks like. +2. **repo-build** — for developers. `cd v2 && cargo test --workspace --no-default-features` + (1,031+ tests pass), then `cargo run -p wifi-densepose-cli -- --help`. +3. **live-esp32** — a real install. Flash a node (`provision-node` skill), point it at + the sensing-server, then `calibrate-room`. This is the only path that senses a real room. + +## Then + +- Live sensing → go to **provision-node**, then **calibrate-room**. +- Evaluating a model/claim → go to **verify** and run `ruview.claim_check` on any + report before you quote a number. diff --git a/harness/ruview/.claude/skills/provision-node/SKILL.md b/harness/ruview/.claude/skills/provision-node/SKILL.md new file mode 100644 index 00000000..315bef39 --- /dev/null +++ b/harness/ruview/.claude/skills/provision-node/SKILL.md @@ -0,0 +1,49 @@ +--- +name: provision-node +description: Build, flash, and provision an ESP32-S3/C6 CSI node for RuView — firmware variant choice, ESP-IDF Windows-subprocess flow, NVS/WiFi/channel/MAC-filter overrides. +--- + +# provision-node + +Bring an ESP32 sensing node online. + +## 1. Pick a firmware variant + +- **s3-8mb** (display build) — ESP32-S3 N16R8 / 16MB; AMOLED optional. The display-detect + fix (#1000) means a *bare* board still captures CSI (MGMT+DATA). +- **s3-4mb** (no-display) — ESP32-S3 4MB; dual-OTA, display disabled. +- **c6** — ESP32-C6 + Seeed MR60BHA2 (60 GHz mmWave + WiFi CSI). The mmwave probe + requires a validated MR60 header (#1107) so an empty UART never false-detects. + +Prebuilt binaries: GitHub release `v0.8.1-esp32` (hardware-validated on S3 QFN56 rev v0.2). + +## 2. Flash + +ESP-IDF v5.4 on Windows is **subprocess-only** (Git Bash/MSYS is unsupported — strip +`MSYSTEM*` env vars). 
Offsets for the S3 image: + +``` +esptool --chip esp32s3 -p -b 460800 write_flash \ + 0x0 bootloader.bin 0x8000 partition-table.bin \ + 0xf000 ota_data_initial.bin 0x20000 esp32-csi-node-s3-8mb.bin +``` + +(`ruview.node_flash` returns the exact pinned command rather than running an +unattended flash.) + +## 3. Provision + +``` +python firmware/esp32-csi-node/provision.py --port \ + --ssid "" --password "" --target-ip --target-port 5005 +# optional ADR-060 overrides: +python firmware/esp32-csi-node/provision.py --port --channel 6 --filter-mac AA:BB:CC:DD:EE:FF +``` + +Never echo or commit the WiFi password. + +## 4. Confirm CSI is flowing + +`ruview.node_monitor {port}` — PASS criteria: serial shows `CSI cb #...` callbacks and +(on a bare board) `CSI filter upgraded to MGMT+DATA`. No callbacks → the node isn't +capturing; do not proceed to calibration. diff --git a/harness/ruview/.claude/skills/train-pose/SKILL.md b/harness/ruview/.claude/skills/train-pose/SKILL.md new file mode 100644 index 00000000..61d2f1f6 --- /dev/null +++ b/harness/ruview/.claude/skills/train-pose/SKILL.md @@ -0,0 +1,33 @@ +--- +name: train-pose +description: Train/evaluate WiFi pose models honestly — camera-supervised (MediaPipe + CSI) and camera-free (WiFlow), always checked against the mean-pose baseline before any PCK is quoted. +--- + +# train-pose + +Build a CSI→pose model without overstating it. The project has a **retracted 92.9%/100%** +history — the discipline below exists so it never recurs. + +## The non-negotiable: mean-pose baseline first + +A pose model that always predicts the dataset's *mean pose* already scores ~50% PCK. +**Quote PCK only as a delta over that baseline**, on a held-out split with no subject +or temporal leakage. Example honest result (ADR-181): + +> Held-out PCK@20 **59.5%** vs a 50% mean-pose baseline = **+9.4 pp real signal** — MEASURED. + +## Paths + +- **camera-supervised** (ADR-079) — MediaPipe Pose labels the camera frame; paired CSI + trains the net. 
Train/infer in one camera frame so the skeleton aligns. +- **camera-free** (WiFlow, ADR-152) — no camera at inference; geometry-conditioned. +- **in-browser** (ADR-181) — WebGPU/WASM trainer; the active backend is shown as a badge + (honest about what's executing). + +## Before you publish a number + +1. Run the mean-pose baseline on the same split. +2. Report `(model − baseline)` in pp, with the split definition (chronological / + blocked-gap / grouped-bucket; no leakage). +3. `ruview.claim_check` the writeup — it flags any untagged or 100%/perfect claim. +4. If it's a benchmark vs SOTA, tag MEASURED-EQUIVALENT only with the reproducer. diff --git a/harness/ruview/.claude/skills/verify/SKILL.md b/harness/ruview/.claude/skills/verify/SKILL.md new file mode 100644 index 00000000..9c3f6c03 --- /dev/null +++ b/harness/ruview/.claude/skills/verify/SKILL.md @@ -0,0 +1,42 @@ +--- +name: verify +description: Prove a RuView result is real — run the deterministic SHA-256 proof and the witness bundle (ADR-028), and lint any claim for MEASURED-vs-CLAIMED honesty. +--- + +# verify + +The "prove everything" skill. Nothing ships as validated without this. + +## Deterministic proof (Trust Kill Switch) + +`ruview.verify` runs `archive/v1/data/proof/verify.py`: it feeds a reference signal +through the production pipeline and hashes the output against +`expected_features.sha256`. Must print **VERDICT: PASS**. If numpy/scipy changed the +hash, regenerate with `verify.py --generate-hash` then re-verify. + +## Witness bundle (ADR-028) + +For a release-grade attestation: + +``` +bash scripts/generate-witness-bundle.sh +cd dist/witness-bundle-ADR028-*/ && bash VERIFY.sh # must be 7/7 PASS +``` + +Contains the Rust test log, the proof + expected hash, firmware SHA-256 manifest, and +crate versions — a recipient can re-verify with one command. + +## Claim honesty + +Run `ruview.claim_check {text}` on any report, README section, PR body, or model card +before quoting accuracy. 
It flags: +- untagged accuracy numbers (must be MEASURED / CLAIMED / SYNTHETIC), +- MEASURED claims with no reproducer cited, +- the retracted "100%/perfect accuracy" framing. + +## Firmware-specific + +A firmware fix is **not** "hardware-validated" without a captured boot log on real +silicon (e.g. the `v0.8.1-esp32` rev-v0.2 validation: `running headless so CSI +captures (#1000)` + `CSI filter upgraded to MGMT+DATA` + a no-false-detect mmwave +probe). Do not merge or release on a build-passes signal alone. diff --git a/harness/ruview/.harness/manifest.json b/harness/ruview/.harness/manifest.json new file mode 100644 index 00000000..5d358852 --- /dev/null +++ b/harness/ruview/.harness/manifest.json @@ -0,0 +1,39 @@ +{ + "schema": 1, + "generator": "metaharness 0.1.15 + ADR-182 hardening", + "template": "vertical:ruview", + "name": "@ruvnet/ruview", + "vars": { + "name": "@ruvnet/ruview", + "description": "RuView WiFi-sensing operator agent harness", + "host": "claude-code" + }, + "hosts": [ + "claude-code" + ], + "files": { + ".claude/settings.json": "b0ea971383716f18b89db73010b8f0ea0f1b16bdec4cd1068245772ba1c27bdd", + ".claude/skills/calibrate-room/SKILL.md": "6a6c8211a7109feb76620c618963c10ad9a9f633ffce7676e631a80a1181986d", + ".claude/skills/onboard/SKILL.md": "22323732fe746b38b77a7c8c052e952dff2fe87ae939ba125379125827385f21", + ".claude/skills/provision-node/SKILL.md": "5ffe5a75873e873b80758d9c81005774d4191317227f2e9aa4345cbce3f29751", + ".claude/skills/train-pose/SKILL.md": "b3ee95bfb0b678eb3d101138b9ea0e7cab3db3a9906d19c4059f9cca0598e87b", + ".claude/skills/verify/SKILL.md": "c0314d5ead465d9089b6a4917fd125051a5be20dc07ba92d5b601fcaada32e19", + "CLAUDE.md": "7ecdb2b9d9abcf4aa22dd3ce553b60216a135e147893a59fa944fc1a8c81f5ef", + "LICENSE": "631f94984f626818d42ecf717aa6e8e0afd4f9f355ca706bd2effafbd1416d06", + "README.md": "b77d30428de8efb6758f2ca3eb22e84849013b2c0e6c601d488d2ea5a6f0da44", + "bin/cli.js": 
"b0d74690cff4329dfe342271fc475eaa140b767bdb66b37cf4992ad209012fe8", + "package.json": "2af49561ef0d59cafc4b99885816e580635b2d2ad329dfe17c69b9df6f8afceb", + "skills/calibrate-room.md": "6a6c8211a7109feb76620c618963c10ad9a9f633ffce7676e631a80a1181986d", + "skills/onboard.md": "22323732fe746b38b77a7c8c052e952dff2fe87ae939ba125379125827385f21", + "skills/provision-node.md": "5ffe5a75873e873b80758d9c81005774d4191317227f2e9aa4345cbce3f29751", + "skills/train-pose.md": "b3ee95bfb0b678eb3d101138b9ea0e7cab3db3a9906d19c4059f9cca0598e87b", + "skills/verify.md": "c0314d5ead465d9089b6a4917fd125051a5be20dc07ba92d5b601fcaada32e19", + "src/guardrails.js": "1631cea02c4354fe6126c576300faf5f8b68ae2f5e2e3a658c99eb25a7403e55", + "src/mcp-server.js": "e51379f5ebb0b7b4670c7412714e559931ef1be8df20551f8f7309b53f0fb7af", + "src/tools.js": "b558f61bb202abf5a967ce3a6ccaea351f2d186238cf49c7fc151d1de028eee8" + }, + "meta": { + "surface": "cli+mcp", + "adr": "ADR-182" + } +} \ No newline at end of file diff --git a/harness/ruview/.harness/manifest.sha256 b/harness/ruview/.harness/manifest.sha256 new file mode 100644 index 00000000..f68a09f7 --- /dev/null +++ b/harness/ruview/.harness/manifest.sha256 @@ -0,0 +1 @@ +6c6c1431c37472494c9b309c8b5d761dd4fc41e30313baead6320831fb982e57 manifest.json diff --git a/harness/ruview/CLAUDE.md b/harness/ruview/CLAUDE.md new file mode 100644 index 00000000..4d55629f --- /dev/null +++ b/harness/ruview/CLAUDE.md @@ -0,0 +1,34 @@ +# RuView harness — agent operating notes + +You are operating **RuView** (WiFi-DensePose), a camera-free WiFi-CSI sensing system. + +## The one rule: prove everything + +This project was accused of AI-slop; the fix is hard discipline. Before you quote ANY +accuracy number: + +1. It must be tagged **MEASURED** (with a reproducer named), **CLAIMED**, or **SYNTHETIC**. +2. Pose PCK is quoted only as a **delta over the mean-pose baseline** on a leakage-free + held-out split. (A mean-pose predictor already scores ~50% PCK.) +3. 
Run `ruview.claim_check` on any report/PR/model-card. It flags untagged numbers and + the retracted "100%/perfect accuracy" framing. +4. Firmware is "hardware-validated" only with a captured **boot log on real silicon** — + never on a build-passes signal. + +## Tools + +`ruview.onboard`, `ruview.claim_check`, `ruview.verify`, `ruview.node_monitor`, +`ruview.calibrate`, `ruview.node_flash`. All fail-closed. Mutating/hardware tools +(`node_flash`) require explicit confirmation and are Windows/ESP-IDF gated. + +## Skills + +`onboard` · `provision-node` · `calibrate-room` · `train-pose` · `verify` +(`npx @ruvnet/ruview skill `). + +## Don'ts + +- Don't present WiFi sensing as camera-grade. +- Don't echo or commit WiFi passwords / secrets. +- Don't merge or release firmware without a real boot log. +- Don't report a PCK without its mean-pose baseline. diff --git a/harness/ruview/LICENSE b/harness/ruview/LICENSE new file mode 100644 index 00000000..03c7f00b --- /dev/null +++ b/harness/ruview/LICENSE @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2026 ruvnet + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/harness/ruview/README.md b/harness/ruview/README.md new file mode 100644 index 00000000..17768682 --- /dev/null +++ b/harness/ruview/README.md @@ -0,0 +1,60 @@ +# `npx @ruvnet/ruview` — RuView WiFi-sensing operator harness + +An AI agent harness that knows how to operate **RuView** (WiFi-DensePose): onboard a +newcomer, provision an ESP32 CSI node, calibrate a room, train pose models, and — +crucially — **refuse to overstate accuracy**. Minted from the RuView monorepo via +[`metaharness`](https://www.npmjs.com/package/metaharness) and hardened per **ADR-182**. + +WiFi sensing infers *coarse* pose/presence/breathing from Channel State Information. +It is **not a camera**. Every accuracy number this harness emits must be MEASURED +against a baseline — that rule is enforced in code (`ruview.claim_check`). + +## Quick start + +```bash +npx @ruvnet/ruview # onboard — pick a setup path +npx @ruvnet/ruview claim-check --text "we hit 100% accuracy" # the honesty guardrail +npx @ruvnet/ruview verify # run the deterministic proof (VERDICT: PASS) +npx @ruvnet/ruview doctor # self-check (tools + optional kernel/host) +npx @ruvnet/ruview --help +``` + +The operator tools are pure Node and run with **zero install weight**. The +`@metaharness/kernel` + host adapter are `optionalDependencies` — only `doctor` / +`install` use them, only if present. 
+ +## Tools (`ruview.*`) + +Exposed both as CLI verbs and as an MCP server (`npx @ruvnet/ruview mcp start`): + +| Tool | What it does | +|------|--------------| +| `ruview.onboard` | Pick docker-demo / repo-build / live-esp32; print the next command | +| `ruview.claim_check` | Lint text for untagged / overstated accuracy claims (guardrail) | +| `ruview.verify` | Run `verify.py` deterministic proof → VERDICT | +| `ruview.node_monitor` | Assert CSI is flowing on an ESP32 (read-only) | +| `ruview.calibrate` | ADR-151 room pipeline (baseline→enroll→train-room→room-watch) | +| `ruview.node_flash` | Build+flash firmware (Windows/ESP-IDF; mutating, guarded) | + +Every tool is **fail-closed**: missing repo / python / binary / port → an honest +negative, never a fabricated success. + +## Skills + +Host-neutral playbooks in `skills/` (`onboard`, `provision-node`, `calibrate-room`, +`train-pose`, `verify`). `npx @ruvnet/ruview skill ` prints one. + +## Use as a Claude Code MCP server + +The bundled `.claude/settings.json` registers the `ruview` MCP server +(`npx -y @ruvnet/ruview mcp start`). Drop this package's `.claude/` into a repo, or run +`npx @ruvnet/ruview install --host claude-code`. + +## Hosts + +claude-code (bundled), and via metaharness host adapters: codex, opencode, copilot, +pi-dev, hermes, rvm, github-actions. + +## License + +MIT © ruvnet diff --git a/harness/ruview/bin/cli.js b/harness/ruview/bin/cli.js new file mode 100644 index 00000000..5f3af268 --- /dev/null +++ b/harness/ruview/bin/cli.js @@ -0,0 +1,181 @@ +#!/usr/bin/env node +// SPDX-License-Identifier: MIT +// `npx ruview` — the RuView WiFi-sensing operator harness (minted via metaharness, +// hardened per ADR-182). Plain ESM, no build step: ships and runs as-is. +// +// The `ruview.*` tools (onboard/verify/claim-check/…) are PURE Node and run with +// zero deps. 
The kernel + host adapter are only touched by `doctor`/`install` +// (the harness-into-a-repo story), so the operator tools never block on a wasm load. + +import { fileURLToPath } from 'node:url'; +import { realpathSync, existsSync, readdirSync, readFileSync } from 'node:fs'; +import { join, dirname } from 'node:path'; +import { argv } from 'node:process'; +import { TOOLS, runTool, listTools } from '../src/tools.js'; +import { claimCheck, summarize } from '../src/guardrails.js'; + +const NAME = 'ruview'; +const ROOT = dirname(dirname(fileURLToPath(import.meta.url))); +const SKILLS_DIR = join(ROOT, 'skills'); + +// Map friendly CLI verbs → registry tool names. +const VERB_TO_TOOL = { + onboard: 'ruview.onboard', + verify: 'ruview.verify', + 'claim-check': 'ruview.claim_check', + calibrate: 'ruview.calibrate', + monitor: 'ruview.node_monitor', + flash: 'ruview.node_flash', +}; + +function pjson(o) { console.log(JSON.stringify(o, null, 2)); } + +function listSkills() { + if (!existsSync(SKILLS_DIR)) return []; + return readdirSync(SKILLS_DIR).filter((f) => f.endsWith('.md')).map((f) => f.replace(/\.md$/, '')); +} + +async function doctor() { + const checks = []; + // Tools layer (always available, no deps). + checks.push(['tool registry loads', Object.keys(TOOLS).length > 0]); + checks.push(['claim_check flags a 100% claim', + !claimCheck('We hit 100% accuracy on poses.').ok]); + checks.push(['claim_check passes a tagged MEASURED claim', + claimCheck('Held-out PCK@20 59.5% (MEASURED vs mean-pose baseline, verify.py).').ok]); + checks.push(['skills present', listSkills().length > 0]); + // Kernel + host adapter (optional — only needed to install into a repo). 
+ let kernelLine = 'kernel/host: not installed (ok — operator tools run without them)'; + try { + const { loadKernel } = await import('@metaharness/kernel'); + const adapter = (await import('@metaharness/host-claude-code')).default; + const k = await loadKernel(); + const info = k.kernelInfo(); + checks.push(['kernel loads + reports version', typeof info.version === 'string' && info.version.length > 0]); + checks.push(['kernel backend is native|wasm|js', ['native', 'wasm', 'js'].includes(k.backend)]); + checks.push(['host adapter resolves', typeof adapter?.name === 'string']); + kernelLine = `kernel ${info.version} (${k.backend}) · host ${adapter.name}`; + } catch { + /* kernel not installed — fine for the tools-only path */ + } + let ok = true; + for (const [label, pass] of checks) { console.log(`${pass ? 'PASS' : 'FAIL'} ${label}`); if (!pass) ok = false; } + console.log(`\n${NAME}: ${ok ? 'all checks passed' : 'doctor found problems'} — ${kernelLine}`); + return ok ? 0 : 1; +} + +function help() { + console.log(`Usage: ${NAME} [options] + +Operator tools: + onboard [--path docker-demo|repo-build|live-esp32] pick a setup path + verify [--repo ] run the deterministic proof (VERDICT: PASS) + claim-check --text "..." 
| --file lint accuracy claims (the honesty guardrail) + calibrate --step baseline|enroll|train-room|room-watch + monitor --port COM8 [--seconds 12] assert CSI is flowing on a node + flash --port COM8 --variant s3-8mb [--confirm] build+flash firmware (Windows/ESP-IDF) + +Harness: + doctor verify the install (tools + optional kernel/host) + skills list bundled skills + skill print a skill playbook + mcp start run the ruview.* MCP server (stdio) + install --host project the harness config into the current repo + --version | --help + +Hosts: claude-code, codex, opencode, copilot, pi-dev, hermes, rvm, github-actions`); + return 0; +} + +/** tiny flag parser: --k v / --k=v / --flag (boolean) */ +function parseFlags(rest) { + const f = {}; + for (let i = 0; i < rest.length; i++) { + const a = rest[i]; + if (a.startsWith('--')) { + const eq = a.indexOf('='); + if (eq !== -1) { f[a.slice(2, eq)] = a.slice(eq + 1); } + else if (i + 1 < rest.length && !rest[i + 1].startsWith('--')) { f[a.slice(2)] = rest[++i]; } + else { f[a.slice(2)] = true; } + } + } + return f; +} + +export async function run(args) { + const cmd = args[0] ?? 'onboard'; + const rest = args.slice(1); + const flags = parseFlags(rest); + + // Direct tool verbs. + if (VERB_TO_TOOL[cmd]) { + const toolArgs = { ...flags }; + if (cmd === 'claim-check') { + if (flags.file) toolArgs.text = readFileSync(flags.file, 'utf8'); + const res = runTool('ruview.claim_check', toolArgs); + pjson(res); + return res.ok ? 0 : 1; + } + if (cmd === 'monitor' && flags.seconds) toolArgs.seconds = Number(flags.seconds); + if (cmd === 'calibrate' && typeof flags.args === 'string') toolArgs.args = flags.args.split(','); + const res = runTool(VERB_TO_TOOL[cmd], toolArgs); + pjson(res); + return res.ok ? 
0 : 1; + } + + switch (cmd) { + case 'doctor': return doctor(); + case 'skills': console.log(listSkills().join('\n') || '(none)'); return 0; + case 'skill': { + const n = rest[0]; + const p = n && join(SKILLS_DIR, `${n}.md`); + if (!p || !existsSync(p)) { console.error(`No skill "${n}". Try: ${listSkills().join(', ')}`); return 2; } + console.log(readFileSync(p, 'utf8')); + return 0; + } + case 'mcp': { + if (rest[0] === 'start' || rest[0] === undefined) { + const { startMcpServer } = await import('../src/mcp-server.js'); + startMcpServer(); + return new Promise(() => {}); // run until stdin closes + } + console.error('Usage: ruview mcp start'); return 2; + } + case 'install': { + const host = flags.host || 'claude-code'; + try { + const adapter = (await import('@metaharness/host-claude-code')).default; + console.log(`Projecting RuView harness for host "${host}" via ${adapter.name}.`); + console.log('Add to your host config — MCP server command: npx -y ruview mcp start'); + console.log('Skills:', listSkills().join(', ')); + return 0; + } catch { + console.error('Host adapter not installed. `npm i @metaharness/host-claude-code` or use the bundled .claude/ config.'); + return 1; + } + } + case 'tools': pjson(listTools()); return 0; + case '--version': case '-v': { + const pkg = JSON.parse(readFileSync(join(ROOT, 'package.json'), 'utf8')); + console.log(pkg.version); return 0; + } + case '--help': case '-h': return help(); + default: + console.error(`Unknown command: ${cmd}. Try \`${NAME} --help\`.`); + return 2; + } +} + +// CLI guard: run only when invoked directly (realpath both sides — npm/npx shims +// pass a non-normalized, possibly case-skewed argv[1] on Windows). +const invokedDirectly = (() => { + if (!argv[1]) return false; + try { + const a = realpathSync(argv[1]); + const b = realpathSync(fileURLToPath(import.meta.url)); + return process.platform === 'win32' ? 
a.toLowerCase() === b.toLowerCase() : a === b; + } catch { return false; } +})(); +if (invokedDirectly) { + run(argv.slice(2)).then((code) => process.exit(code)).catch((err) => { console.error(err); process.exit(1); }); +} diff --git a/harness/ruview/package.json b/harness/ruview/package.json new file mode 100644 index 00000000..a0e5869f --- /dev/null +++ b/harness/ruview/package.json @@ -0,0 +1,65 @@ +{ + "name": "@ruvnet/ruview", + "version": "0.1.0", + "description": "RuView WiFi-sensing operator agent harness — onboard, calibrate, train, and verify camera-free WiFi-CSI sensing, with the project's MEASURED-vs-CLAIMED honesty guardrail enforced. Minted via metaharness (ADR-182).", + "type": "module", + "bin": { + "ruview": "bin/cli.js" + }, + "exports": { + ".": "./src/tools.js", + "./guardrails": "./src/guardrails.js" + }, + "files": [ + "bin/", + "src/", + "skills/", + ".claude/", + ".harness/", + "CLAUDE.md", + "README.md", + "LICENSE" + ], + "scripts": { + "test": "node --test test/*.test.mjs", + "doctor": "node ./bin/cli.js doctor", + "mcp": "node ./bin/cli.js mcp start" + }, + "optionalDependencies": { + "@metaharness/kernel": "^0.1.0", + "@metaharness/host-claude-code": "^0.1.0" + }, + "keywords": [ + "wifi-sensing", + "wifi-densepose", + "ruview", + "csi", + "channel-state-information", + "pose-estimation", + "presence-detection", + "esp32", + "agent-harness", + "metaharness", + "mcp", + "mcp-server", + "claude-code", + "ambient-intelligence" + ], + "engines": { + "node": ">=20.0.0" + }, + "license": "MIT", + "author": "ruvnet", + "homepage": "https://github.com/ruvnet/RuView#readme", + "repository": { + "type": "git", + "url": "git+https://github.com/ruvnet/RuView.git", + "directory": "harness/ruview" + }, + "bugs": { + "url": "https://github.com/ruvnet/RuView/issues" + }, + "publishConfig": { + "access": "public" + } +} diff --git a/harness/ruview/skills/calibrate-room.md b/harness/ruview/skills/calibrate-room.md new file mode 100644 index 
00000000..c369974b --- /dev/null +++ b/harness/ruview/skills/calibrate-room.md @@ -0,0 +1,29 @@ +--- +name: calibrate-room +description: Run the ADR-151 per-room calibration pipeline — baseline → enroll → extract → train → a bank of small specialists (presence/posture/breathing/heartbeat/restlessness/anomaly). +--- + +# calibrate-room + +Turn a provisioned node + sensing-server into a working room model. Pure-Rust, +edge-deployable (ADR-151). Use the `ruview.calibrate` tool (installed +`wifi-densepose` binary, else `cargo run -p wifi-densepose-cli`). + +## Sequence + +1. **baseline** — capture the empty room (Welford amplitude + von Mises phase). Leave + the room empty. + `ruview.calibrate {step: "baseline"}` +2. **enroll** — record the occupant(s) doing the target activities. + `ruview.calibrate {step: "enroll"}` +3. **train-room** — train the bank of small specialists from baseline + enrollment. + `ruview.calibrate {step: "train-room"}` +4. **room-watch** — live presence/posture/breathing from the trained room. + `ruview.calibrate {step: "room-watch"}` (or the `room-watch` skill) + +## Honesty + +The specialists are calibrated to *this* room; cross-room transfer is a separate +problem (LoRA recalibration, ADR-079 P9). Report which room a number came from, and +tag presence/vitals accuracy MEASURED only with a held-out check — run +`ruview.claim_check` on the writeup. diff --git a/harness/ruview/skills/onboard.md b/harness/ruview/skills/onboard.md new file mode 100644 index 00000000..dd248099 --- /dev/null +++ b/harness/ruview/skills/onboard.md @@ -0,0 +1,30 @@ +--- +name: onboard +description: Zero-to-sensing path picker for RuView (WiFi-DensePose) — pick docker-demo, repo-build, or live-esp32 and run the next concrete step. +--- + +# onboard + +Get a newcomer from nothing to a working RuView setup. 
**First fact to set:** WiFi +sensing infers *coarse* pose/presence/breathing from Channel State Information — it +is **not a camera**, and any accuracy number must be MEASURED against a baseline +(use the `verify` skill / `ruview.claim_check` tool). Never present WiFi output as +camera-grade. + +## Pick a path + +Run `ruview.onboard {path}` or decide from: + +1. **docker-demo** — fastest, no hardware. Replays sample CSI into the dashboard. + `docker run -p 8000:8000 ruvnet/wifi-densepose` → open `http://localhost:8000`. + Use to see what it looks like. +2. **repo-build** — for developers. `cd v2 && cargo test --workspace --no-default-features` + (1,031+ tests pass), then `cargo run -p wifi-densepose-cli -- --help`. +3. **live-esp32** — a real install. Flash a node (`provision-node` skill), point it at + the sensing-server, then `calibrate-room`. This is the only path that senses a real room. + +## Then + +- Live sensing → go to **provision-node**, then **calibrate-room**. +- Evaluating a model/claim → go to **verify** and run `ruview.claim_check` on any + report before you quote a number. diff --git a/harness/ruview/skills/provision-node.md b/harness/ruview/skills/provision-node.md new file mode 100644 index 00000000..315bef39 --- /dev/null +++ b/harness/ruview/skills/provision-node.md @@ -0,0 +1,49 @@ +--- +name: provision-node +description: Build, flash, and provision an ESP32-S3/C6 CSI node for RuView — firmware variant choice, ESP-IDF Windows-subprocess flow, NVS/WiFi/channel/MAC-filter overrides. +--- + +# provision-node + +Bring an ESP32 sensing node online. + +## 1. Pick a firmware variant + +- **s3-8mb** (display build) — ESP32-S3 N16R8 / 16MB; AMOLED optional. The display-detect + fix (#1000) means a *bare* board still captures CSI (MGMT+DATA). +- **s3-4mb** (no-display) — ESP32-S3 4MB; dual-OTA, display disabled. +- **c6** — ESP32-C6 + Seeed MR60BHA2 (60 GHz mmWave + WiFi CSI). 
The mmwave probe + requires a validated MR60 header (#1107) so an empty UART never false-detects. + +Prebuilt binaries: GitHub release `v0.8.1-esp32` (hardware-validated on S3 QFN56 rev v0.2). + +## 2. Flash + +ESP-IDF v5.4 on Windows is **subprocess-only** (Git Bash/MSYS is unsupported — strip +`MSYSTEM*` env vars). Offsets for the S3 image: + +``` +esptool --chip esp32s3 -p -b 460800 write_flash \ + 0x0 bootloader.bin 0x8000 partition-table.bin \ + 0xf000 ota_data_initial.bin 0x20000 esp32-csi-node-s3-8mb.bin +``` + +(`ruview.node_flash` returns the exact pinned command rather than running an +unattended flash.) + +## 3. Provision + +``` +python firmware/esp32-csi-node/provision.py --port \ + --ssid "" --password "" --target-ip --target-port 5005 +# optional ADR-060 overrides: +python firmware/esp32-csi-node/provision.py --port --channel 6 --filter-mac AA:BB:CC:DD:EE:FF +``` + +Never echo or commit the WiFi password. + +## 4. Confirm CSI is flowing + +`ruview.node_monitor {port}` — PASS criteria: serial shows `CSI cb #...` callbacks and +(on a bare board) `CSI filter upgraded to MGMT+DATA`. No callbacks → the node isn't +capturing; do not proceed to calibration. diff --git a/harness/ruview/skills/train-pose.md b/harness/ruview/skills/train-pose.md new file mode 100644 index 00000000..61d2f1f6 --- /dev/null +++ b/harness/ruview/skills/train-pose.md @@ -0,0 +1,33 @@ +--- +name: train-pose +description: Train/evaluate WiFi pose models honestly — camera-supervised (MediaPipe + CSI) and camera-free (WiFlow), always checked against the mean-pose baseline before any PCK is quoted. +--- + +# train-pose + +Build a CSI→pose model without overstating it. The project has a **retracted 92.9%/100%** +history — the discipline below exists so it never recurs. + +## The non-negotiable: mean-pose baseline first + +A pose model that always predicts the dataset's *mean pose* already scores ~50% PCK. 
+**Quote PCK only as a delta over that baseline**, on a held-out split with no subject +or temporal leakage. Example honest result (ADR-181): + +> Held-out PCK@20 **59.5%** vs a 50% mean-pose baseline = **+9.4 pp real signal** — MEASURED. + +## Paths + +- **camera-supervised** (ADR-079) — MediaPipe Pose labels the camera frame; paired CSI + trains the net. Train/infer in one camera frame so the skeleton aligns. +- **camera-free** (WiFlow, ADR-152) — no camera at inference; geometry-conditioned. +- **in-browser** (ADR-181) — WebGPU/WASM trainer; the active backend is shown as a badge + (honest about what's executing). + +## Before you publish a number + +1. Run the mean-pose baseline on the same split. +2. Report `(model − baseline)` in pp, with the split definition (chronological / + blocked-gap / grouped-bucket; no leakage). +3. `ruview.claim_check` the writeup — it flags any untagged or 100%/perfect claim. +4. If it's a benchmark vs SOTA, tag MEASURED-EQUIVALENT only with the reproducer. diff --git a/harness/ruview/skills/verify.md b/harness/ruview/skills/verify.md new file mode 100644 index 00000000..9c3f6c03 --- /dev/null +++ b/harness/ruview/skills/verify.md @@ -0,0 +1,42 @@ +--- +name: verify +description: Prove a RuView result is real — run the deterministic SHA-256 proof and the witness bundle (ADR-028), and lint any claim for MEASURED-vs-CLAIMED honesty. +--- + +# verify + +The "prove everything" skill. Nothing ships as validated without this. + +## Deterministic proof (Trust Kill Switch) + +`ruview.verify` runs `archive/v1/data/proof/verify.py`: it feeds a reference signal +through the production pipeline and hashes the output against +`expected_features.sha256`. Must print **VERDICT: PASS**. If numpy/scipy changed the +hash, regenerate with `verify.py --generate-hash` then re-verify. 
+ +## Witness bundle (ADR-028) + +For a release-grade attestation: + +``` +bash scripts/generate-witness-bundle.sh +cd dist/witness-bundle-ADR028-*/ && bash VERIFY.sh # must be 7/7 PASS +``` + +Contains the Rust test log, the proof + expected hash, firmware SHA-256 manifest, and +crate versions — a recipient can re-verify with one command. + +## Claim honesty + +Run `ruview.claim_check {text}` on any report, README section, PR body, or model card +before quoting accuracy. It flags: +- untagged accuracy numbers (must be MEASURED / CLAIMED / SYNTHETIC), +- MEASURED claims with no reproducer cited, +- the retracted "100%/perfect accuracy" framing. + +## Firmware-specific + +A firmware fix is **not** "hardware-validated" without a captured boot log on real +silicon (e.g. the `v0.8.1-esp32` rev-v0.2 validation: `running headless so CSI +captures (#1000)` + `CSI filter upgraded to MGMT+DATA` + a no-false-detect mmwave +probe). Do not merge or release on a build-passes signal alone. diff --git a/harness/ruview/src/guardrails.js b/harness/ruview/src/guardrails.js new file mode 100644 index 00000000..787f5423 --- /dev/null +++ b/harness/ruview/src/guardrails.js @@ -0,0 +1,106 @@ +// SPDX-License-Identifier: MIT +// RuView harness guardrails — the "prove everything" rule made executable. +// +// The project was accused of AI-slop; the cultural fix is that every accuracy +// number must be tagged MEASURED (with a reproducer) or CLAIMED/SYNTHETIC, and +// the retracted "100% accuracy" framing must never reappear untagged. This module +// is the static enforcement of that, shared by the `ruview.claim_check` MCP tool, +// the `npx ruview claim-check` CLI, and the claude-code pre-output hook. + +/** Phrases that signal a quantitative accuracy claim. */ +const METRIC_TERMS = [ + 'accuracy', 'pck', 'pck@', 'f1', 'precision', 'recall', 'map', 'auc', + 'iou', 'mpjpe', 'error rate', 'detection rate', 'true positive', +]; + +/** Tags that make a claim honest (case-insensitive). 
*/ +const HONEST_TAGS = ['measured', 'claimed', 'synthetic', 'unvalidated', 'baseline']; + +/** Reproducer references that count as evidence backing a MEASURED claim. */ +const REPRODUCER_HINTS = [ + 'verify.py', 'witness', 'mean-pose', 'mean pose', 'held-out', 'held out', + 'baseline', 'reproduce', 'sha256', 'boot log', 'pck@20 vs', 'expected_features', +]; + +const PERCENT_RE = /\b(\d{1,3}(?:\.\d+)?)\s?%/g; +// "perfect" / "100%" framing is the specific retracted claim — always high severity. +// NOTE: no trailing \b after "%": "%"→" " is non-word→non-word, so a trailing \b +// never matches and would silently miss "100%". Bare 100% is only damning next to a +// metric term (see claimCheck); the word phrases are inherently accuracy claims. +const PERFECT_PCT_RE = /\b100(?:\.0+)?\s?%/; +const PERFECT_WORD_RE = /perfect accuracy|flawless|never (?:wrong|fails)/i; + +/** + * Lint a block of text for untagged or overstated accuracy claims. + * @param {string} text + * @returns {{ok: boolean, findings: Array<{severity:'high'|'medium', line:number, excerpt:string, reason:string, suggestion:string}>}} + */ +export function claimCheck(text) { + const findings = []; + if (typeof text !== 'string' || text.length === 0) { + return { ok: true, findings }; + } + const lines = text.split(/\r?\n/); + + lines.forEach((raw, i) => { + const line = raw.trim(); + if (!line) return; + const lower = line.toLowerCase(); + + const hasPercent = PERCENT_RE.test(line); + PERCENT_RE.lastIndex = 0; // reset stateful global regex + const mentionsMetric = METRIC_TERMS.some((t) => lower.includes(t)); + if (!hasPercent && !mentionsMetric) return; + + const tagged = HONEST_TAGS.some((t) => lower.includes(t)); + const hasReproducer = REPRODUCER_HINTS.some((h) => lower.includes(h)); + const perfect = PERFECT_WORD_RE.test(line) || (mentionsMetric && PERFECT_PCT_RE.test(line)); + + if (perfect && !lower.includes('retract')) { + findings.push({ + severity: 'high', + line: i + 1, + excerpt: clip(line), 
+ reason: 'States perfect/100% accuracy — this is the exact framing the project retracted.', + suggestion: 'Replace with a held-out number vs the mean-pose baseline, tagged MEASURED, or mark the old claim "retracted".', + }); + return; + } + + // A metric/percent with no honesty tag at all. + if (!tagged) { + findings.push({ + severity: 'medium', + line: i + 1, + excerpt: clip(line), + reason: 'Accuracy claim is not tagged MEASURED / CLAIMED / SYNTHETIC.', + suggestion: 'Tag it. If MEASURED, name the reproducer (verify.py, witness bundle, held-out vs mean-pose).', + }); + return; + } + + // Tagged MEASURED but cites no reproducer — still a gap. + if (lower.includes('measured') && !hasReproducer) { + findings.push({ + severity: 'medium', + line: i + 1, + excerpt: clip(line), + reason: 'Tagged MEASURED but cites no reproducer/evidence.', + suggestion: 'Add the evidence path: verify.py VERDICT, witness bundle, or held-out PCK vs the mean-pose baseline.', + }); + } + }); + + return { ok: findings.length === 0, findings }; +} + +function clip(s, n = 120) { + return s.length > n ? `${s.slice(0, n - 1)}…` : s; +} + +/** Convenience: a one-line human summary for CLI output. */ +export function summarize(result) { + if (result.ok) return 'claim-check: PASS — no untagged or overstated accuracy claims.'; + const high = result.findings.filter((f) => f.severity === 'high').length; + return `claim-check: ${result.findings.length} finding(s) (${high} high) — accuracy claims need MEASURED/CLAIMED tags + a reproducer.`; +} diff --git a/harness/ruview/src/mcp-server.js b/harness/ruview/src/mcp-server.js new file mode 100644 index 00000000..04c1a552 --- /dev/null +++ b/harness/ruview/src/mcp-server.js @@ -0,0 +1,68 @@ +// SPDX-License-Identifier: MIT +// RuView harness — minimal MCP stdio server (JSON-RPC 2.0 over stdin/stdout). +// +// Dependency-free on purpose: a published `npx ruview` must `mcp start` without +// pulling the full MCP SDK. 
Implements the subset hosts use: `initialize`, +// `tools/list`, `tools/call`, and the `notifications/initialized` ack. Logs go to +// stderr ONLY — stdout is the JSON-RPC channel and must stay clean. + +import { createInterface } from 'node:readline'; +import { listTools, runTool } from './tools.js'; + +const PROTOCOL_VERSION = '2024-11-05'; +const SERVER_INFO = { name: 'ruview', version: '0.1.0' }; + +function send(msg) { + process.stdout.write(JSON.stringify(msg) + '\n'); +} +function result(id, res) { send({ jsonrpc: '2.0', id, result: res }); } +function error(id, code, message) { send({ jsonrpc: '2.0', id, error: { code, message } }); } +function log(...a) { process.stderr.write('[ruview-mcp] ' + a.join(' ') + '\n'); } + +function handle(msg) { + const { id, method, params } = msg; + switch (method) { + case 'initialize': + return result(id, { + protocolVersion: PROTOCOL_VERSION, + capabilities: { tools: { listChanged: false } }, + serverInfo: SERVER_INFO, + instructions: 'RuView WiFi-sensing operator tools. All results are fail-closed; accuracy claims must pass ruview.claim_check.', + }); + case 'notifications/initialized': + case 'initialized': + return; // notification — no response + case 'ping': + return result(id, {}); + case 'tools/list': + return result(id, { tools: listTools() }); + case 'tools/call': { + const name = params?.name; + const args = params?.arguments || {}; + const out = runTool(name, args); + // MCP content envelope: text block with the JSON, isError reflects ok=false. 
+ return result(id, { + content: [{ type: 'text', text: JSON.stringify(out, null, 2) }], + isError: out && out.ok === false, + }); + } + default: + if (id !== undefined) error(id, -32601, `Method not found: ${method}`); + } +} + +export function startMcpServer() { + log(`starting (protocol ${PROTOCOL_VERSION}, ${listTools().length} tools)`); + const rl = createInterface({ input: process.stdin, crlfDelay: Infinity }); + rl.on('line', (line) => { + const s = line.trim(); + if (!s) return; + let msg; + try { msg = JSON.parse(s); } catch { return log('bad JSON line dropped'); } + try { handle(msg); } catch (err) { + if (msg && msg.id !== undefined) error(msg.id, -32603, String(err && err.message || err)); + log('handler error:', String(err)); + } + }); + rl.on('close', () => { log('stdin closed — exiting'); process.exit(0); }); +} diff --git a/harness/ruview/src/tools.js b/harness/ruview/src/tools.js new file mode 100644 index 00000000..6465f2b2 --- /dev/null +++ b/harness/ruview/src/tools.js @@ -0,0 +1,220 @@ +// SPDX-License-Identifier: MIT +// RuView harness — the `ruview.*` tool registry. +// +// One registry consumed by BOTH the CLI (`npx ruview `) and the MCP server +// (`npx ruview mcp start`). Every handler returns structured JSON and is +// FAIL-CLOSED: when a prerequisite (the RuView repo, python+pyserial, the +// `wifi-densepose` binary, an ESP32 on a port) is absent, it returns an honest +// negative — never a fabricated success. This mirrors the project's "prove +// everything" rule and the RuField fail-closed posture (ADR-262 §3.3). + +import { spawnSync } from 'node:child_process'; +import { existsSync, readFileSync } from 'node:fs'; +import { join, dirname, resolve } from 'node:path'; +import { claimCheck, summarize } from './guardrails.js'; + +/** Walk up from `start` to find the RuView monorepo root (or null). 
*/ +export function findRepoRoot(start = process.cwd()) { + let dir = resolve(start); + for (let i = 0; i < 8; i++) { + const hasProof = existsSync(join(dir, 'archive', 'v1', 'data', 'proof', 'verify.py')); + const hasV2 = existsSync(join(dir, 'v2', 'Cargo.toml')); + if (hasProof || hasV2) return dir; + const parent = dirname(dir); + if (parent === dir) break; + dir = parent; + } + return null; +} + +function which(cmd) { + const probe = process.platform === 'win32' + ? spawnSync('where', [cmd], { encoding: 'utf8' }) + : spawnSync('command', ['-v', cmd], { encoding: 'utf8', shell: true }); + return probe.status === 0 ? (probe.stdout || '').trim().split(/\r?\n/)[0] : null; +} + +function run(cmd, args, opts = {}) { + const r = spawnSync(cmd, args, { encoding: 'utf8', timeout: opts.timeout ?? 120000, ...opts }); + return { + status: r.status, + ok: r.status === 0, + stdout: (r.stdout || '').slice(-8000), + stderr: (r.stderr || '').slice(-4000), + error: r.error ? r.error.message : null, + }; +} + +const ONBOARD_PATHS = { + 'docker-demo': 'Fastest. `docker run -p 8000:8000 ruvnet/wifi-densepose` → open the dashboard. No hardware; replays sample CSI. Good for "what does it look like".', + 'repo-build': 'Build from source. `cd v2 && cargo test --workspace --no-default-features` (1,031+ tests). Then `cargo run -p wifi-densepose-cli -- --help`. Good for developers.', + 'live-esp32': 'Real sensing. Flash an ESP32-S3 (see `provision-node` skill), point it at the sensing-server, then `calibrate → enroll → train-room → room-watch` (see `calibrate-room`). Good for an actual install.', +}; + +/** + * The tool registry. Each entry: { title, description, inputSchema, handler }. + * inputSchema is JSON-Schema (object). handler(args) → JSON-serializable result. 
+ */ +export const TOOLS = { + 'ruview.onboard': { + title: 'Onboard', + description: 'Pick a RuView setup path (docker-demo | repo-build | live-esp32) and print the next concrete command.', + inputSchema: { + type: 'object', + properties: { path: { type: 'string', enum: Object.keys(ONBOARD_PATHS), description: 'Which setup path. Omit to list all.' } }, + }, + handler(args = {}) { + const repo = findRepoRoot(); + if (args.path && ONBOARD_PATHS[args.path]) { + return { ok: true, path: args.path, next: ONBOARD_PATHS[args.path], in_ruview_repo: !!repo }; + } + return { + ok: true, + in_ruview_repo: !!repo, + repo_root: repo, + paths: ONBOARD_PATHS, + recommend: repo ? 'repo-build' : 'docker-demo', + note: 'WiFi sensing infers coarse pose/presence from CSI — it is not a camera. Accuracy claims must be MEASURED vs a baseline (run `ruview.claim_check`).', + }; + }, + }, + + 'ruview.claim_check': { + title: 'Claim check', + description: 'Static lint: scan text for untagged or overstated accuracy claims (the "prove everything" guardrail). Returns findings.', + inputSchema: { + type: 'object', + required: ['text'], + properties: { text: { type: 'string', description: 'The text to lint (a report, README section, PR body, model card).' } }, + }, + handler(args = {}) { + const result = claimCheck(String(args.text ?? '')); + return { ...result, summary: summarize(result) }; + }, + }, + + 'ruview.verify': { + title: 'Verify (witness)', + description: 'Run the deterministic proof (archive/v1/data/proof/verify.py) and report VERDICT. Fail-closed if not in a RuView repo or python is missing.', + inputSchema: { + type: 'object', + properties: { repo: { type: 'string', description: 'RuView repo root. Default: auto-detect from cwd.' } }, + }, + handler(args = {}) { + const repo = args.repo ? resolve(args.repo) : findRepoRoot(); + if (!repo) return { ok: false, reason: 'not_in_ruview_repo', hint: 'Run inside the RuView monorepo or pass {repo}.' 
}; + const proof = join(repo, 'archive', 'v1', 'data', 'proof', 'verify.py'); + if (!existsSync(proof)) return { ok: false, reason: 'proof_missing', path: proof }; + const py = which('python') || which('python3'); + if (!py) return { ok: false, reason: 'python_missing', hint: 'Install python to run the deterministic proof.' }; + const r = run(py, [proof], { cwd: repo, timeout: 180000 }); + const verdict = /VERDICT:\s*PASS/i.test(r.stdout) ? 'PASS' : (/VERDICT:\s*FAIL/i.test(r.stdout) ? 'FAIL' : 'UNKNOWN'); + return { ok: r.ok && verdict === 'PASS', verdict, exit: r.status, tail: r.stdout.slice(-1200), stderr: r.stderr.slice(-400) }; + }, + }, + + 'ruview.node_monitor': { + title: 'Node monitor', + description: 'Open an ESP32 serial port and assert CSI is flowing (MGMT+DATA). Fail-closed if python+pyserial or the port is absent. Read-only.', + inputSchema: { + type: 'object', + properties: { + port: { type: 'string', description: 'Serial port, e.g. COM8 or /dev/ttyUSB0.' }, + seconds: { type: 'number', description: 'Capture window (default 12).' }, + }, + }, + handler(args = {}) { + const port = args.port; + if (!port) return { ok: false, reason: 'no_port', hint: 'Pass {port} (e.g. COM8).' }; + const py = which('python') || which('python3'); + if (!py) return { ok: false, reason: 'python_missing' }; + const dur = Number(args.seconds) > 0 ? 
Number(args.seconds) : 12; + const script = [ + 'import sys,time', + 'try:', + ' import serial', + 'except Exception as e:', + " print('NO_PYSERIAL'); sys.exit(3)", + `ser=serial.Serial(${JSON.stringify(port)},115200,timeout=1)`, + 'csi=0; n=0; t=time.time()', + `while time.time()-t<${dur}:`, + ' ln=ser.readline()', + ' if not ln: continue', + " s=ln.decode('utf-8','replace')", + ' n+=1', + " if 'CSI cb' in s or 'csi_collector' in s: csi+=1", + " if 'MGMT+DATA' in s: print('UPGRADE_MGMT_DATA')", + 'ser.close()', + "print(f'LINES={n} CSI={csi}')", + ].join('\n'); + const r = run(py, ['-c', script], { timeout: (dur + 10) * 1000 }); + if (r.stdout.includes('NO_PYSERIAL')) return { ok: false, reason: 'pyserial_missing', hint: 'pip install pyserial' }; + if (!r.ok) return { ok: false, reason: 'port_error', stderr: r.stderr, error: r.error }; + const csi = Number((r.stdout.match(/CSI=(\d+)/) || [])[1] || 0); + const upgraded = r.stdout.includes('UPGRADE_MGMT_DATA'); + return { ok: csi > 0, csi_callbacks: csi, mgmt_data_upgrade: upgraded, raw: r.stdout.trim() }; + }, + }, + + 'ruview.calibrate': { + title: 'Calibrate room', + description: 'Run the ADR-151 room pipeline via the wifi-densepose CLI (baseline→enroll→train-room). Fail-closed if the binary is absent.', + inputSchema: { + type: 'object', + properties: { + step: { type: 'string', enum: ['baseline', 'enroll', 'train-room', 'room-watch'], description: 'Which calibration step.' }, + args: { type: 'array', items: { type: 'string' }, description: 'Extra CLI args passed through.' }, + }, + }, + handler(args = {}) { + const step = args.step || 'baseline'; + const bin = which('wifi-densepose'); + const repo = findRepoRoot(); + if (!bin && !repo) return { ok: false, reason: 'cli_missing', hint: 'Install the wifi-densepose CLI or run in the repo (cargo run -p wifi-densepose-cli).' }; + const passthru = Array.isArray(args.args) ? 
args.args.map(String) : []; + // Prefer the installed binary; otherwise cargo-run from the repo. + const r = bin + ? run(bin, [step, ...passthru], { timeout: 300000 }) + : run('cargo', ['run', '-q', '-p', 'wifi-densepose-cli', '--', step, ...passthru], { cwd: repo, timeout: 600000 }); + return { ok: r.ok, step, via: bin ? 'binary' : 'cargo', exit: r.status, tail: r.stdout.slice(-1500), stderr: r.stderr.slice(-500) }; + }, + }, + + 'ruview.node_flash': { + title: 'Node flash', + description: 'Build+flash an ESP32 firmware variant. MUTATING + hardware. Fail-closed off-Windows or without ESP-IDF. Never claims hardware validation without a boot log.', + inputSchema: { + type: 'object', + properties: { + port: { type: 'string', description: 'Target port, e.g. COM8.' }, + variant: { type: 'string', enum: ['s3-8mb', 's3-4mb', 'c6'], description: 'Firmware variant.' }, + confirm: { type: 'boolean', description: 'Must be true to actually flash (guard).' }, + }, + }, + handler(args = {}) { + if (process.platform !== 'win32') { + return { ok: false, reason: 'unsupported_platform', detail: 'The ESP-IDF flash flow is Windows-subprocess-specific today (see CLAUDE.local.md).' }; + } + if (!args.confirm) { + return { ok: false, reason: 'not_confirmed', detail: 'Mutating hardware op — re-call with {confirm:true}.', would_flash: { port: args.port, variant: args.variant || 's3-8mb' } }; + } + return { ok: false, reason: 'manual_step_required', detail: 'Flashing uses the pinned ESP-IDF subprocess in CLAUDE.local.md. This tool returns the exact command rather than running an unattended flash.', see: 'skills/provision-node.md' }; + }, + }, +}; + +/** Run one tool by name; returns the structured result (or an error envelope). 
*/ +export function runTool(name, args) { + const tool = Object.prototype.hasOwnProperty.call(TOOLS, name) ? TOOLS[name] : undefined; + if (!tool) return { ok: false, reason: 'unknown_tool', name, available: Object.keys(TOOLS) }; + try { + return tool.handler(args || {}); + } catch (err) { + return { ok: false, reason: 'tool_threw', name, error: String(err && err.message || err) }; + } +} + +/** MCP-shaped tool list: [{name, description, inputSchema}]. */ +export function listTools() { + return Object.entries(TOOLS).map(([name, t]) => ({ name, description: t.description, inputSchema: t.inputSchema })); +} diff --git a/harness/ruview/test/tools.test.mjs b/harness/ruview/test/tools.test.mjs new file mode 100644 index 00000000..a81ae131 --- /dev/null +++ b/harness/ruview/test/tools.test.mjs @@ -0,0 +1,111 @@ +// SPDX-License-Identifier: MIT +// RuView harness tests — Node's built-in test runner (no devDeps to install). +// Run: `node --test test/` (or `npm test`). + +import { test } from 'node:test'; +import assert from 'node:assert/strict'; +import { claimCheck, summarize } from '../src/guardrails.js'; +import { TOOLS, runTool, listTools, findRepoRoot } from '../src/tools.js'; +import { run } from '../bin/cli.js'; + +test('guardrail flags the retracted 100% framing as high severity', () => { + const r = claimCheck('Our model reaches 100% accuracy on every pose.'); + assert.equal(r.ok, false); + assert.ok(r.findings.some((f) => f.severity === 'high')); +}); + +test('guardrail flags an untagged percentage accuracy claim', () => { + // "hit", not "measured" — "measured" would (correctly) route to the no-reproducer branch.
+ const r = claimCheck('We hit 92.9% PCK on the test set.'); + assert.equal(r.ok, false); + assert.ok(r.findings.some((f) => /not tagged/i.test(f.reason))); +}); + +test('guardrail passes a MEASURED claim that cites a reproducer', () => { + const r = claimCheck('Held-out PCK@20 59.5% vs 50% mean-pose baseline = +9.4pp (MEASURED, verify.py).'); + assert.equal(r.ok, true, JSON.stringify(r.findings)); +}); + +test('guardrail flags MEASURED with no reproducer', () => { + const r = claimCheck('Presence detection 97% (MEASURED).'); + assert.equal(r.ok, false); + assert.ok(r.findings.some((f) => /no reproducer/i.test(f.reason))); +}); + +test('guardrail ignores non-metric prose', () => { + assert.equal(claimCheck('The ESP32 streams CSI over UDP to the sensing-server.').ok, true); + assert.equal(claimCheck('').ok, true); +}); + +test('summarize gives PASS/finding text', () => { + assert.match(summarize(claimCheck('nothing here')), /PASS/); + assert.match(summarize(claimCheck('100% accuracy')), /finding/); +}); + +test('registry exposes the documented tools with schemas', () => { + const names = Object.keys(TOOLS); + for (const n of ['ruview.onboard', 'ruview.claim_check', 'ruview.verify', 'ruview.node_monitor', 'ruview.calibrate', 'ruview.node_flash']) { + assert.ok(names.includes(n), `missing ${n}`); + assert.equal(TOOLS[n].inputSchema.type, 'object'); + } + assert.equal(listTools().length, names.length); +}); + +test('ruview.onboard returns paths and a recommendation', () => { + const r = runTool('ruview.onboard', {}); + assert.equal(r.ok, true); + assert.ok(r.paths['live-esp32']); + assert.ok(['repo-build', 'docker-demo'].includes(r.recommend)); +}); + +test('ruview.claim_check tool wraps the guardrail', () => { + const r = runTool('ruview.claim_check', { text: '100% accuracy' }); + assert.equal(r.ok, false); + assert.match(r.summary, /honesty|tag|MEASURED|finding/i); +}); + +test('unknown tool fails closed', () => { + const r = runTool('ruview.does_not_exist', {}); + 
assert.equal(r.ok, false); + assert.equal(r.reason, 'unknown_tool'); +}); + +test('node_monitor fails closed without a port', () => { + const r = runTool('ruview.node_monitor', {}); + assert.equal(r.ok, false); + assert.equal(r.reason, 'no_port'); +}); + +test('node_flash refuses without confirm (mutating guard)', () => { + const r = runTool('ruview.node_flash', { port: 'COM8', variant: 's3-8mb' }); + assert.equal(r.ok, false); + // either not-confirmed (win32) or unsupported_platform (posix) — both fail-closed + assert.ok(['not_confirmed', 'unsupported_platform'].includes(r.reason)); +}); + +test('verify fails closed when not in a RuView repo', () => { + // point at a tmp dir with no repo markers + const r = runTool('ruview.verify', { repo: process.platform === 'win32' ? 'C:/Windows/Temp' : '/tmp' }); + assert.equal(r.ok, false); + assert.ok(['proof_missing', 'python_missing'].includes(r.reason), r.reason); +}); + +test('CLI run(): claim-check exits non-zero on a bad claim', async () => { + const code = await run(['claim-check', '--text', '100% accuracy']); + assert.notEqual(code, 0); +}); + +test('CLI run(): doctor exits 0 (tools-only path)', async () => { + const code = await run(['doctor']); + assert.equal(code, 0); +}); + +test('CLI run(): unknown command exits non-zero', async () => { + assert.notEqual(await run(['definitely-not-a-command']), 0); +}); + +test('findRepoRoot locates this monorepo from cwd', () => { + // when run from within wifi-densepose, it should find a root; elsewhere null is fine + const root = findRepoRoot(); + assert.ok(root === null || typeof root === 'string'); +});
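Review note: the fail-closed dispatch that these tests exercise can be sketched in isolation. The block below is a minimal, self-contained illustration and not the harness code itself. `DEMO_TOOLS`, `demo.echo`, `demo.boom`, `runDemoTool`, and `toMcpContent` are hypothetical names introduced only for this example. It assumes the same result shape as the registry in this diff (`{ ok, reason, ... }`) and the same `content` / `isError` envelope the MCP server builds.

```javascript
// Hypothetical registry: illustrative stand-in for the `ruview.*` tool table.
const DEMO_TOOLS = {
  'demo.echo': {
    description: 'Echo text back; returns an honest negative when the input is missing.',
    inputSchema: { type: 'object', required: ['text'], properties: { text: { type: 'string' } } },
    handler(args = {}) {
      // Fail closed: a missing prerequisite yields ok:false, never a made-up success.
      if (typeof args.text !== 'string') return { ok: false, reason: 'no_text' };
      return { ok: true, text: args.text };
    },
  },
  'demo.boom': {
    description: 'Always throws, to show the error envelope.',
    inputSchema: { type: 'object', properties: {} },
    handler() {
      throw new Error('simulated failure');
    },
  },
};

// Dispatch one tool by name and always return a structured result.
function runDemoTool(name, args) {
  // Own-property lookup, so inherited keys such as "constructor" count as unknown tools.
  const tool = Object.prototype.hasOwnProperty.call(DEMO_TOOLS, name) ? DEMO_TOOLS[name] : undefined;
  if (!tool) return { ok: false, reason: 'unknown_tool', name, available: Object.keys(DEMO_TOOLS) };
  try {
    return tool.handler(args || {});
  } catch (err) {
    return { ok: false, reason: 'tool_threw', name, error: String((err && err.message) || err) };
  }
}

// Wrap a structured result in an MCP-style tool response (text content plus an isError flag).
function toMcpContent(out) {
  return {
    content: [{ type: 'text', text: JSON.stringify(out, null, 2) }],
    isError: Boolean(out && out.ok === false),
  };
}

console.log(runDemoTool('demo.echo', { text: 'hi' }));
console.log(runDemoTool('demo.echo', {}));
console.log(runDemoTool('constructor', {}));
console.log(toMcpContent(runDemoTool('demo.boom')).isError);
```

The design point is that every path, including an unknown name or a throwing handler, ends in a JSON-serializable object whose `ok` field the caller can branch on, so the CLI exit code and the MCP `isError` flag can both be derived from one value.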