From 29233db6d5e3ed8d5fb3f9e8406fe28ed92ca593 Mon Sep 17 00:00:00 2001 From: ruv Date: Sun, 24 May 2026 12:20:52 -0400 Subject: [PATCH] =?UTF-8?q?docs(adr-118):=20BFLD=20=E2=80=94=20Beamforming?= =?UTF-8?q?=20Feedback=20Layer=20for=20Detection=20(6=20ADRs=20+=20researc?= =?UTF-8?q?h=20bundle)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Introduce the Beamforming Feedback Layer for Detection: the RuView safety layer that ingests WiFi BFI, measures identity-leakage risk, and structurally prevents identity-correlated data from leaving the node by default. ADRs (6): - ADR-118: umbrella decision, crate scaffolding, 6-phase rollout (~10.5 wk) - ADR-119: BfldFrame wire format, magic 0xBF1D_0001, deterministic serialization - ADR-120: 4 privacy classes, BLAKE3 keyed-hash rotation, #[must_classify] default-deny - ADR-121: 9-feature identity-risk scoring, coherence gate with hysteresis - ADR-122: 6 HA entities, 3 Matter clusters, mosquitto ACL, cognitum-v0 federation - ADR-123: Pi 5 / Nexmon production capture, AX210 dev path, ESP32-S3 self-only fallback Research bundle (docs/research/BFLD/, 13,544 words): - SOTA survey covering BFId (KIT, ACM CCS 2025) and LeakyBeam (NDSS 2025) - Architectural soul: defensive sensing primitive, not surveillance lens - Six-adversary threat model with attack trees and mitigations - Privacy-gating mechanics with structural cross-site isolation proof - Automation/integration surface (HA, Matter, MQTT, federation) - Concrete implementation plan with reuse map - Evaluation strategy with red-team protocol on KIT BFId dataset - Draft ADR, GitHub issue, and public gist Three structural invariants enforced by the type system, not policy: I1 — Raw BFI never exits the node I2 — Identity embedding is in-RAM-only (no Serialize impl) I3 — Cross-site identity correlation is cryptographically impossible (per-site BLAKE3 keyed-hash with daily epoch rotation) References: 
https://publikationen.bibliothek.kit.edu/1000185756 (BFId) https://www.ndss-symposium.org/wp-content/uploads/2025-5-paper.pdf (LeakyBeam) Co-Authored-By: claude-flow --- ...eamforming-feedback-layer-for-detection.md | 181 +++++++++++ ...119-bfld-frame-format-and-wire-protocol.md | 163 ++++++++++ ...20-bfld-privacy-class-and-hash-rotation.md | 179 +++++++++++ .../adr/ADR-121-bfld-identity-risk-scoring.md | 169 ++++++++++ .../ADR-122-bfld-ruview-ha-matter-exposure.md | 191 ++++++++++++ ...-123-bfld-capture-path-nexmon-and-esp32.md | 186 +++++++++++ docs/research/BFLD/01-sota-survey.md | 293 ++++++++++++++++++ docs/research/BFLD/02-soul.md | 141 +++++++++ .../research/BFLD/03-security-threat-model.md | 278 +++++++++++++++++ docs/research/BFLD/04-privacy-gating.md | 279 +++++++++++++++++ .../BFLD/05-automation-integration.md | 239 ++++++++++++++ docs/research/BFLD/06-implementation-plan.md | 253 +++++++++++++++ .../BFLD/07-benchmarks-and-evaluation.md | 196 ++++++++++++ docs/research/BFLD/08-adr-draft.md | 214 +++++++++++++ docs/research/BFLD/09-github-issue.md | 111 +++++++ docs/research/BFLD/10-gist.md | 136 ++++++++ docs/research/BFLD/README.md | 58 ++++ 17 files changed, 3267 insertions(+) create mode 100644 docs/adr/ADR-118-bfld-beamforming-feedback-layer-for-detection.md create mode 100644 docs/adr/ADR-119-bfld-frame-format-and-wire-protocol.md create mode 100644 docs/adr/ADR-120-bfld-privacy-class-and-hash-rotation.md create mode 100644 docs/adr/ADR-121-bfld-identity-risk-scoring.md create mode 100644 docs/adr/ADR-122-bfld-ruview-ha-matter-exposure.md create mode 100644 docs/adr/ADR-123-bfld-capture-path-nexmon-and-esp32.md create mode 100644 docs/research/BFLD/01-sota-survey.md create mode 100644 docs/research/BFLD/02-soul.md create mode 100644 docs/research/BFLD/03-security-threat-model.md create mode 100644 docs/research/BFLD/04-privacy-gating.md create mode 100644 docs/research/BFLD/05-automation-integration.md create mode 100644 
docs/research/BFLD/06-implementation-plan.md create mode 100644 docs/research/BFLD/07-benchmarks-and-evaluation.md create mode 100644 docs/research/BFLD/08-adr-draft.md create mode 100644 docs/research/BFLD/09-github-issue.md create mode 100644 docs/research/BFLD/10-gist.md create mode 100644 docs/research/BFLD/README.md diff --git a/docs/adr/ADR-118-bfld-beamforming-feedback-layer-for-detection.md b/docs/adr/ADR-118-bfld-beamforming-feedback-layer-for-detection.md new file mode 100644 index 00000000..d6d0a62d --- /dev/null +++ b/docs/adr/ADR-118-bfld-beamforming-feedback-layer-for-detection.md @@ -0,0 +1,181 @@ +# ADR-118: BFLD — Beamforming Feedback Layer for Detection + +| Field | Value | +|-------|-------| +| **Status** | Proposed | +| **Date** | 2026-05-24 | +| **Deciders** | ruv | +| **Codename** | **BFLD** — Beamforming Feedback Layer for Detection | +| **Relates to** | [ADR-024](ADR-024-contrastive-csi-embedding-model.md) (AETHER), [ADR-027](ADR-027-cross-environment-domain-generalization.md) (MERIDIAN), [ADR-028](ADR-028-esp32-capability-audit.md) (witness), [ADR-029](ADR-029-ruvsense-multistatic-sensing-mode.md) (multistatic), [ADR-030](ADR-030-ruvsense-persistent-field-model.md) (field model), [ADR-031](ADR-031-ruview-sensing-first-rf-mode.md) (sensing-first), [ADR-032](ADR-032-multistatic-mesh-security-hardening.md) (mesh security), [ADR-095](ADR-095-rvcsi-edge-rf-sensing-platform.md) (rvCSI), [ADR-115](ADR-115-home-assistant-integration.md) (HA), [ADR-116](ADR-116-cog-ha-matter-seed.md) (Matter), [ADR-117](ADR-117-pip-wifi-densepose-modernization.md) (pip) | +| **Sub-ADRs** | [ADR-119](ADR-119-bfld-frame-format-and-wire-protocol.md) (frame), [ADR-120](ADR-120-bfld-privacy-class-and-hash-rotation.md) (privacy), [ADR-121](ADR-121-bfld-identity-risk-scoring.md) (risk), [ADR-122](ADR-122-bfld-ruview-ha-matter-exposure.md) (RuView), [ADR-123](ADR-123-bfld-capture-path-nexmon-and-esp32.md) (capture) | +| **Research bundle** | 
[`docs/research/BFLD/`](../research/BFLD/) (11 files, 13,544 words) | +| **Tracking issue** | TBD | + +--- + +## 1. Context + +### 1.1 The plaintext BFI problem + +IEEE 802.11ac and 802.11ax beamforming feedback (BFI) is exchanged between client stations (STA) and access points (AP) in **unencrypted management-plane frames**. The STA compresses the channel response into a Givens-rotation angle matrix (Φ/ψ) and transmits it as a VHT/HE Compressed Beamforming Report (CBFR). Any device in WiFi monitor mode within range can passively sniff these frames without joining the network. + +Two independent 2024–2025 research results establish the severity of this exposure: + +1. **BFId** (KIT, ACM CCS 2025) — re-identifies 197 individuals from BFI alone with >90% accuracy from 5 s of capture. https://publikationen.bibliothek.kit.edu/1000185756 +2. **LeakyBeam** (NDSS 2025) — detects occupancy through walls at 20 m with 82.7% TPR / 96.7% TNR using only plaintext BFI. https://www.ndss-symposium.org/wp-content/uploads/2025-5-paper.pdf + +Capture tooling is freely available: **Wi-BFI** (pip-installable), **PicoScenes**, **Nexmon BFI patches** for BCM43455c0 (Raspberry Pi 5 / 4 / 3B+). + +### 1.2 Gap in the existing RuView pipeline + +The wifi-densepose / RuView pipeline processes CSI via the rvCSI runtime (ADR-095/096) and emits presence, pose, vitals, and zone-activity events. **No layer in the existing pipeline measures whether the data it is processing is capable of identifying individuals.** All CSI is treated as equivalent from a privacy standpoint regardless of operating regime. + +This gap becomes a compliance and liability issue at deployment scale. An operator placing RuView in a care home, hotel, shared office, or rental property has no instrument to verify that the system is operating anonymously. 
+ +### 1.3 BFI as a sensing signal + +BFI is not only a threat vector — its compressed angle matrices carry multipath geometry useful for presence and motion detection, particularly in single-AP deployments where MIMO CSI is unavailable. BFLD treats BFI as an **optional input alongside CSI**, not a replacement. + +### 1.4 What this ADR is *not* + +- Not a removal of the CSI pipeline. ADR-095/096 rvCSI stays authoritative for CSI. +- Not a port of any external sniffer into the repo. The Nexmon capture path lives in a separate adapter (see ADR-123). +- Not a Matter SDK ship — Matter exposure is filtered through the ADR-116 `cog-ha-matter` boundary. + +--- + +## 2. Decision + +Create a new Rust crate **`wifi-densepose-bfld`** in `v2/crates/` that: + +1. **Ingests** BFI angle matrices (Φ/ψ) from CBFR frames, optionally fused with CSI. +2. **Computes** nine named features and an `identity_risk_score` (separability × temporal_stability × cross_perspective_consistency × sample_confidence). +3. **Gates** all output through a `privacy_class` byte that **structurally prevents** identity-correlated data from being published at classes 2 (anonymous) and 3 (restricted). +4. **Emits** `BfldEvent` JSON over MQTT under `ruview//bfld/*` with per-class topic routing. +5. **Enforces three invariants structurally, not by policy**: + - **I1**: Raw BFI never exits the node. + - **I2**: Identity embedding is in-RAM-only (no disk, no network). + - **I3**: Cross-site identity correlation is cryptographically impossible via per-site keyed BLAKE3 hash rotation with a daily epoch. 
+ +The umbrella implementation is decomposed into five sub-ADRs: + +| Sub-ADR | Scope | +|---------|-------| +| **ADR-119** | `BfldFrame` wire format, magic `0xBF1D_0001`, deterministic serialization, CRC32 | +| **ADR-120** | `privacy_class` semantics, BLAKE3 hash rotation, default-deny field classification | +| **ADR-121** | Identity risk scoring formula, coherence gate, leakage estimator | +| **ADR-122** | RuView surface: HA entities, Matter cluster boundary, MQTT topic ACL | +| **ADR-123** | Capture path: Pi 5 / Nexmon adapter + ESP32-S3 BFI feasibility | + +### 2.1 Crate module layout + +``` +v2/crates/wifi-densepose-bfld/ +├── Cargo.toml +└── src/ + ├── lib.rs + ├── frame.rs # BfldFrame (ADR-119) + ├── extractor.rs # CBFR parser → BfiCapture + ├── features.rs # 9 features + ├── identity_risk.rs # risk score (ADR-121) + ├── privacy_gate.rs # privacy_class enforcement (ADR-120) + ├── hash_rotation.rs # BLAKE3 per-site rotation (ADR-120) + ├── emitter.rs # BfldEvent → MQTT + ├── mqtt.rs # topic routing (ADR-122) + └── ffi.rs # PyO3 bindings (ADR-117 pattern) +``` + +### 2.2 Reuse map + +| BFLD module | Depends on | +|---|---| +| `features.rs` | `wifi-densepose-signal/src/ruvsense/coherence.rs`, `multistatic.rs` | +| `identity_risk.rs` | `wifi-densepose-ruvector/src/viewpoint/attention.rs`, `coherence.rs` | +| `privacy_gate.rs` | (new) — no upstream dependency | +| `hash_rotation.rs` | `blake3 = "1.5"` (keyed mode) | +| `extractor.rs` | `vendor/rvcsi/crates/rvcsi-adapter-nexmon` (ADR-095/096) | + +--- + +## 3. Consequences + +### Positive + +- First explicit, auditable RF-layer privacy primitive in the wifi-densepose ecosystem. +- `identity_risk_score` doubles as an anomaly signal (sudden spike → new AP firmware / nearby attacker-grade sniffer / unusual propagation). +- BFI fusion augments presence/motion in single-AP deployments. +- Deterministic frame hashes extend the ADR-028 witness-bundle pattern to the new surface. 
+- Cross-site isolation is **structural, not policy-dependent** — a stronger guarantee than ACLs. + +### Negative + +- ESP32-S3 cannot directly capture CBFR via the Espressif WiFi API. Full BFLD pipeline requires a Pi 5 / Nexmon host sniffer (cognitum-v0 available; see ADR-123). +- `identity_risk_score` calibration requires the KIT BFId dataset (non-commercial research agreement). +- Estimated effort: ~10.5 engineer-weeks across the six ADRs. + +### Neutral + +- BFLD does not prevent passive BFI capture by an external attacker (LeakyBeam-class). It only ensures the **node's own output** is non-identifying. Operators must understand this distinction. +- Daily hash rotation prevents multi-day analytics correlating individual signatures across the day boundary. Acceptable for privacy goals; may surprise analytics use-cases. + +--- + +## 4. Alternatives Considered + +### Alt 1: Skip BFI entirely (CSI-only) + +Rejected because: (a) leaves the identity-leakage gap open for the CSI pipeline; (b) as BFI tooling becomes ubiquitous (Wi-BFI, PicoScenes), the absence of a privacy layer becomes more conspicuous for operators. + +### Alt 2: Publish `identity_risk_score` publicly by default + +Rejected: the risk score itself is privacy-sensitive (reveals presence via timing correlation). Default is opt-in. + +### Alt 3: Cloud ML on raw BFI + +Rejected: violates I1. Cloud training creates an off-node store of angle matrices reconstructible into identity profiles. + +### Alt 4: Differential privacy noise on BFI at ingress + +Deferred to a follow-up ADR. DP sensitivity analysis and its interaction with `identity_risk_score` calibration are not yet complete. Current design achieves privacy through structural impossibility, not noise injection. + +--- + +## 5. Acceptance Criteria + +- [ ] **AC1**: Extractor parses BFI from 802.11ac and 802.11ax captures, 20/40/80/160 MHz, 2×2 through 4×4 MIMO. +- [ ] **AC2**: Presence detection latency ≤ 1 s p95 from first non-empty BFI frame. 
+- [ ] **AC3**: Motion score published at ≥ 1 Hz on `ruview//bfld/motion/state`. +- [ ] **AC4**: Raw BFI bytes never present in any serialized `BfldFrame` payload at any `privacy_class` value. +- [ ] **AC5**: With `privacy_mode` enabled, all identity-derived fields are absent from outbound events. +- [ ] **AC6**: Identical `BfiCapture` inputs produce bit-identical `BfldFrame` serialization (deterministic hash). +- [ ] **AC7**: Pipeline produces valid `BfldEvent` outputs without `csi_matrix` (BFI-only mode). + +Per-sub-ADR acceptance criteria are defined in ADR-119 through ADR-123. + +--- + +## 6. Phased Rollout + +| Phase | ADR | Scope | Effort | +|-------|-----|-------|--------| +| **P1** | 119 | Frame format + extractor stub | 1.5 wk | +| **P2** | 121 | Features + identity_risk_score | 2.0 wk | +| **P3** | 120 | Privacy gate + hash rotation | 1.5 wk | +| **P4** | 122 (a) | MQTT emitter + HA discovery | 1.5 wk | +| **P5** | 122 (b) | Matter cluster boundary in `cog-ha-matter` | 1.5 wk | +| **P6** | 123 | Pi 5 / Nexmon capture adapter | 2.5 wk | +| **Total** | | | **10.5 wk** | + +--- + +## 7. Related ADRs + +See header table. 
Cross-references in body cite the structural reuse of: +- ADR-024 (AETHER embedding for identity_risk computation) +- ADR-027 (MERIDIAN's no-cross-site assumption is now structurally enforced by I3) +- ADR-028 (witness-bundle extends to BFLD surface) +- ADR-029/030 (`multistatic.rs`, `cross_room.rs` reused) +- ADR-095/096 (rvCSI Nexmon adapter for BFI capture) +- ADR-115 (HA surface extension) +- ADR-116 (`cog-ha-matter` boundary filter) +- ADR-117 (PyO3 bindings pattern) diff --git a/docs/adr/ADR-119-bfld-frame-format-and-wire-protocol.md b/docs/adr/ADR-119-bfld-frame-format-and-wire-protocol.md new file mode 100644 index 00000000..985cf7ec --- /dev/null +++ b/docs/adr/ADR-119-bfld-frame-format-and-wire-protocol.md @@ -0,0 +1,163 @@ +# ADR-119: BFLD Frame Format and Wire Protocol + +| Field | Value | +|-------|-------| +| **Status** | Proposed | +| **Date** | 2026-05-24 | +| **Deciders** | ruv | +| **Parent** | [ADR-118](ADR-118-bfld-beamforming-feedback-layer-for-detection.md) | +| **Relates to** | [ADR-028](ADR-028-esp32-capability-audit.md) (witness/deterministic proof), [ADR-095](ADR-095-rvcsi-edge-rf-sensing-platform.md) (rvCSI `CsiFrame` schema) | +| **Tracking issue** | TBD | + +--- + +## 1. Context + +The BFLD pipeline (ADR-118) emits an over-the-wire `BfldFrame` consumed by the RuView aggregator, HA bridge, and witness bundle. The frame must be: + +1. **Deterministic** — identical input ⇒ bit-identical output, so witness hashes survive verification (ADR-028 pattern). +2. **Self-describing** — magic + version so future BFLD revisions don't silently corrupt aggregator state. +3. **Privacy-classified at the byte level** — the receiver must know the data class before it even parses the payload, so it can drop frames it isn't authorized to handle. +4. **Compact** — BFLD nodes may emit at up to 10 Hz; the frame must be small enough for unsharded MQTT and ESP-NOW transport. +5. 
**Endianness-stable** — captures from x86_64 (ruvultra), aarch64 (cognitum-v0, Pi 5 cluster), and Xtensa (ESP32-S3) must produce identical bytes. + +The existing rvCSI `CsiFrame` (ADR-095) is the closest precedent. BFLD reuses the same little-endian convention and the same "validate-before-FFI" posture. + +--- + +## 2. Decision + +### 2.1 `BfldFrame` header (86 bytes, little-endian, packed) + +```rust +#[repr(C, packed)] +pub struct BfldFrameHeader { + pub magic: u32, // 0xBF1D_0001 + pub version: u16, // 1 + pub flags: u16, // bit0=has_csi_delta, bit1=privacy_mode, bit2-15 reserved + pub timestamp_ns: u64, // monotonic capture clock + + pub ap_hash: [u8; 16], // BLAKE3-keyed(site_salt, ap_mac)[0..16] + pub sta_hash: [u8; 16], // BLAKE3-keyed(site_salt ‖ day_epoch, sta_mac)[0..16] + pub session_id: [u8; 16], // ephemeral, rotated on capture-session boundary + + pub channel: u16, // 802.11 channel number + pub bandwidth_mhz: u16, // 20 | 40 | 80 | 160 + pub rssi_dbm: i16, + pub noise_floor_dbm: i16, + + pub n_subcarriers: u16, + pub n_tx: u8, + pub n_rx: u8, + pub quantization: u8, // 0=f32, 1=i16, 2=i8, 3=packed (4-bit nibbles) + pub privacy_class: u8, // 0=raw, 1=derived, 2=anonymous, 3=restricted (default 2) + + pub payload_len: u32, + pub payload_crc32: u32, // CRC-32/ISO-HDLC over payload bytes only +} +``` + +Total header size: 86 bytes, the sum of the five field groups above (16 + 48 + 8 + 6 + 8), validated by `static_assertions::const_assert_eq!`. + +### 2.2 Payload structure + +Payload is a length-prefixed sequence of typed sections in this exact order: + +``` +payload = compressed_angle_matrix + ‖ amplitude_proxy + ‖ phase_proxy + ‖ snr_vector + ‖ optional_csi_delta (present iff flags.bit0 set) + ‖ optional_vendor_extension (length 0 allowed) +``` + +Each section is `[u32 len_le][bytes...]`. The CRC32 covers all section bytes including length prefixes, but **not** the header. 
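The section framing and CRC scope from §2.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the crate's implementation: `encode_payload`, `crc32_iso_hdlc`, and `payload_with_crc` are hypothetical names, and the CRC is written out bitwise (reflected polynomial `0xEDB88320`, initial value and final XOR of `0xFFFFFFFF`) so that the example has no dependencies, whereas the ADR itself points at the `crc` crate.

```rust
/// Hypothetical helper: frame each section as `[u32 len_le][bytes...]`
/// and concatenate the sections in the order supplied by the caller.
pub fn encode_payload(sections: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    for section in sections {
        // A zero-length section still emits its 4-byte length prefix.
        // (Sketch only: lengths above u32::MAX are not handled.)
        out.extend_from_slice(&(section.len() as u32).to_le_bytes());
        out.extend_from_slice(section);
    }
    out
}

/// Bitwise CRC-32/ISO-HDLC (the common "zlib" CRC-32), dependency-free.
pub fn crc32_iso_hdlc(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical helper: the CRC covers section bytes and their length
/// prefixes, but no header bytes.
pub fn payload_with_crc(sections: &[&[u8]]) -> (Vec<u8>, u32) {
    let payload = encode_payload(sections);
    let crc = crc32_iso_hdlc(&payload);
    (payload, crc)
}
```

Under this framing, a two-byte section followed by an empty one encodes to ten bytes (`02 00 00 00`, the two data bytes, then `00 00 00 00`), so a section that is absent at `privacy_class` 2 still contributes its zero length prefix to the CRC input.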
+ +### 2.3 Privacy-class gating at serialization + +The serializer enforces these rules **before** writing any payload bytes: + +| `privacy_class` | `compressed_angle_matrix` | Identity-derived fields | Notes | +|-----------------|---------------------------|-------------------------|-------| +| 0 (`raw`) | full | full | **Local-only**, never serialized to a network sink | +| 1 (`derived`) | downsampled to 8-bit, top-k subcarriers | full | Operator-acknowledged research mode | +| 2 (`anonymous`, **default**) | absent (zero-length section) | absent | Production default | +| 3 (`restricted`) | absent | absent + diagnostic-only | Equivalent to class 2 + suppresses `identity_risk_score` on the bus | + +The serializer returns `Err(BfldError::PrivacyViolation)` if the caller attempts to publish a class-0 frame through a network sink. This is enforced by a sink-type marker trait (`LocalSink` vs `NetworkSink`). + +### 2.4 Deterministic serialization + +Three guarantees: + +1. **Field order is fixed** by `#[repr(C, packed)]`. +2. **Float quantization is canonical** — `quantization` byte values 1/2/3 use specified round-half-to-even with documented saturation; f32 (value 0) is forbidden over the wire (local-only). +3. **CRC32 is computed last**, after all section bytes are placed. + +The witness test in `tests/determinism.rs` captures a 200-frame BFI fixture, serializes it 1,000 times across two threads, and verifies the BLAKE3 of the resulting byte stream is bit-identical. + +### 2.5 Magic value rationale + +`0xBF1D_0001` is chosen so that `bf1d` reads as "BFLD" when the magic is printed as a hexadecimal `u32`, easing Wireshark / xxd debugging (in a raw hex dump the little-endian byte order is `01 00 1d bf`). The final `0001` is the major version; minor revisions bump the `version` field. + +--- + +## 3. Consequences + +### Positive + +- 86-byte header + compact payload fits comfortably in a 1500-byte MTU even at 4×4 MIMO with 256 subcarriers. +- Serialization is `no_std`-compatible — same code can run on ESP32-S3 (when ESP-NOW transport is added under ADR-123 P2). 
+- Witness-bundle integration is direct: the existing `archive/v1/data/proof/verify.py` pattern extends to a `bfld_verify.py` that consumes the same SHA-256 expected-hash file format. + +### Negative + +- `#[repr(C, packed)]` on the header means consumers must use `read_unaligned` — small ergonomic cost, mitigated by a `#[derive(BfldFrameAccess)]` proc-macro. +- Reserved flag bits 2-15 lock in future-extension order; any new bit assignment is a version bump. + +### Neutral + +- The vendor-extension section allows downstream RuView cogs (e.g., `cog-pose-estimation`) to attach metadata without a header change, at the cost of CRC scope creep. Vendor sections are explicitly outside the witness hash. + +--- + +## 4. Alternatives Considered + +### Alt 1: Protobuf / FlatBuffers + +Rejected: schema evolution overhead, witness-hash instability across protoc versions, ~3× wire bloat for the small fixed-shape fields. + +### Alt 2: CBOR + +Rejected: deterministic CBOR (RFC 8949 §4.2) is achievable but the parser surface is large and tag handling is a footgun for the `no_std` ESP32 path. + +### Alt 3: Variable-width magic / no magic + +Rejected: receivers must distinguish BFLD frames from rvCSI `CsiFrame` and other RuView payloads on shared transports. + +### Alt 4: Move CRC32 to header + +Rejected: CRC must be computed after the payload, so its value would otherwise force a header rewrite; placing it last avoids a buffer-pass-back. + +--- + +## 5. Acceptance Criteria + +- [ ] **AC1**: `BfldFrameHeader` size is exactly 86 bytes on x86_64, aarch64, and xtensa-esp32s3. +- [ ] **AC2**: 1,000 serializations of a fixed `BfiCapture` fixture produce a bit-identical BLAKE3 hash. +- [ ] **AC3**: `privacy_class = 0` frame returned through `NetworkSink::publish()` returns `Err(BfldError::PrivacyViolation)`. +- [ ] **AC4**: Payload CRC32 mismatch causes `BfldFrame::parse()` to return `Err(BfldError::Crc)` without exposing partial payload state. 
+- [ ] **AC5**: Round-trip serialize/parse preserves all header fields exactly. +- [ ] **AC6**: A frame with `flags.bit0 = 0` (no CSI delta) and an unexpected CSI-delta section is rejected. +- [ ] **AC7**: Bench: serialization throughput ≥ 50k frames/sec on a 2025-era M1/M2 / Pi 5 core. + +--- + +## 6. References + +- ADR-118 §2 (umbrella decision) +- ADR-095 `CsiFrame` (`vendor/rvcsi/crates/rvcsi-core/src/frame.rs`) +- CRC-32/ISO-HDLC: `crc = "3"` crate +- BLAKE3 keyed mode: `blake3 = "1.5"` +- IEEE 802.11-2020 §19.3.12 (Compressed Beamforming Report) diff --git a/docs/adr/ADR-120-bfld-privacy-class-and-hash-rotation.md b/docs/adr/ADR-120-bfld-privacy-class-and-hash-rotation.md new file mode 100644 index 00000000..592f32f0 --- /dev/null +++ b/docs/adr/ADR-120-bfld-privacy-class-and-hash-rotation.md @@ -0,0 +1,179 @@ +# ADR-120: BFLD Privacy Class and Hash Rotation + +| Field | Value | +|-------|-------| +| **Status** | Proposed | +| **Date** | 2026-05-24 | +| **Deciders** | ruv | +| **Parent** | [ADR-118](ADR-118-bfld-beamforming-feedback-layer-for-detection.md) | +| **Relates to** | [ADR-027](ADR-027-cross-environment-domain-generalization.md) (MERIDIAN no-cross-site), [ADR-032](ADR-032-multistatic-mesh-security-hardening.md) (mesh security), [ADR-106](ADR-106-dp-sgd-and-primitive-isolation.md) (primitive isolation), [ADR-115](ADR-115-home-assistant-integration.md) (privacy mode) | +| **Tracking issue** | TBD | + +--- + +## 1. Context + +ADR-118 declares three structural invariants for BFLD: + +- **I1**: Raw BFI never exits the node. +- **I2**: Identity embedding is in-RAM-only. +- **I3**: Cross-site identity correlation is cryptographically impossible. + +I1/I2 are enforced by sink typing and module visibility (ADR-119 §2.3). I3 requires a hash-rotation scheme that makes the same physical person produce **different** `rf_signature_hash` values across sites and across day boundaries, without any out-of-band coordination between sites. 
+ +The existing `HA-PRIVACY` mode in ADR-115 already toggles between "full" and "anonymous" surfaces, but at a per-event granularity — not at a per-byte-field granularity. BFLD requires the latter because the `BfldFrame` payload mixes sensing data (publishable) and identity-derived data (non-publishable) in the same struct. + +The BFId paper (KIT, ACM CCS 2025) demonstrates that even a few minutes of BFI capture across the same site is sufficient to build a persistent biometric. The mitigation must be **structural**, not policy-dependent. + +--- + +## 2. Decision + +### 2.1 The four privacy classes + +A single `privacy_class: u8` byte in the `BfldFrame` header (ADR-119 §2.1) selects one of four classes. The crate enforces field availability statically through marker types. + +| Class | Name | Use case | Available fields | +|-------|------|----------|------------------| +| **0** | `raw` | Local-only research, never networked | All fields, full-precision BFI matrix, identity embedding | +| **1** | `derived` | Operator-acknowledged research over LAN | Downsampled angle matrix, full features, identity_risk_score, identity_embedding | +| **2** | `anonymous` (**default**) | Production deployment | Aggregate sensing only: presence, motion, person_count, zone_id, confidence | +| **3** | `restricted` | Care-home / regulated deployment | Class 2 minus `identity_risk_score` and `rf_signature_hash` | + +Default for new RuView nodes is class **2**. Operators must explicitly opt-down to class 1 via the existing `--research-mode` flag (ADR-115 §7); class 0 is reserved for `cargo test` and is unreachable from `wifi-densepose-sensing-server`. 
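A receiver has to turn the raw `privacy_class` byte into one of the four classes before acting on a frame. The sketch below shows one way to do that; the `PrivacyClass` enum, its method names, and the choice of `u8` as the error type are illustrative assumptions rather than the crate's actual API. Only the numeric values, the class names, and the class-2 default come from the table above.

```rust
use std::convert::TryFrom;

/// Hypothetical typed view of the `privacy_class` header byte.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
#[repr(u8)]
pub enum PrivacyClass {
    Raw = 0,
    Derived = 1,
    Anonymous = 2,
    Restricted = 3,
}

impl Default for PrivacyClass {
    /// New nodes start at class 2 (`anonymous`).
    fn default() -> Self {
        PrivacyClass::Anonymous
    }
}

impl TryFrom<u8> for PrivacyClass {
    /// The rejected byte is returned so the caller can log it.
    type Error = u8;

    fn try_from(byte: u8) -> Result<Self, Self::Error> {
        match byte {
            0 => Ok(PrivacyClass::Raw),
            1 => Ok(PrivacyClass::Derived),
            2 => Ok(PrivacyClass::Anonymous),
            3 => Ok(PrivacyClass::Restricted),
            other => Err(other), // unknown class: drop the frame
        }
    }
}

impl PrivacyClass {
    /// Class 0 is local-only; every other class may reach a network sink.
    pub fn network_publishable(self) -> bool {
        self != PrivacyClass::Raw
    }

    /// Per the table, only classes 0 and 1 list the identity embedding.
    pub fn carries_identity_embedding(self) -> bool {
        self <= PrivacyClass::Derived
    }
}
```

Rejecting unknown byte values, instead of mapping them to a default, keeps a corrupted or future-version frame from being treated as publishable.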
+ +### 2.2 Enforcement via marker types + +```rust +pub trait Sink {} + +pub trait LocalSink: Sink {} // Allowed: classes 0,1,2,3 +pub trait NetworkSink: Sink {} // Allowed: classes 1,2,3 (NOT class 0) +pub trait MatterSink: NetworkSink {} // Allowed: class 2,3 + cluster-filter (ADR-122) + +impl Emitter { + // The `S: NetworkSink` bound is the compile-time half of the check. + pub fn publish<S: NetworkSink>(&self, sink: &S, frame: BfldFrame) + -> Result<(), BfldError> + { + // Runtime half: class 0 must never reach a network sink. + if frame.header.privacy_class == 0 { + return Err(BfldError::PrivacyViolation { + reason: "class 0 to NetworkSink", + }); + } + // ... serialize and write + Ok(()) + } +} +``` + +The two checks are complementary. The `NetworkSink` bound makes the compiler reject any call to `publish` whose sink does not implement the marker trait, and the runtime check rejects class-0 frames on sinks that do. Cross-sink frame routing requires an explicit class transition (see §2.4). + +### 2.3 BLAKE3 keyed hash rotation for `rf_signature_hash` + +The signature hash is computed as: + +```rust +pub fn rf_signature_hash( + site_salt: &[u8; 32], // generated on first boot, persisted in TPM/KMS + day_epoch: u32, // floor(unix_time_utc / 86400) + features: &IdentityFeatures, +) -> blake3::Hash { + let mut hasher = blake3::Hasher::new_keyed(site_salt); + hasher.update(&day_epoch.to_le_bytes()); + hasher.update(&features.canonical_bytes()); + hasher.finalize() +} +``` + +**Structural cross-site isolation**: because `site_salt` is a 256-bit random secret unique to each node and never transmitted, two sites observing the same physical person produce uncorrelated hashes. There is no key the operator (or an attacker who compromises one node) can use to bridge sites. This is stronger than a policy-based "do not share" rule because the bridge **cannot be computed**. + +**Daily rotation**: `day_epoch` flipping at UTC midnight forces the hash of the same person to change once per day. Multi-day correlation requires re-acquiring the biometric, which the rotation actively breaks. 
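The `day_epoch` input is plain integer arithmetic on UTC seconds, which makes the rotation boundary easy to test. The following is a small sketch under that assumption; `day_epoch`, `seconds_until_rotation`, and `current_day_epoch` are hypothetical helper names and are not taken from the crate.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

pub const SECONDS_PER_DAY: u64 = 86_400;

/// floor(unix_time_utc / 86400), as in the `rf_signature_hash` comment.
pub fn day_epoch(unix_time_utc: u64) -> u32 {
    (unix_time_utc / SECONDS_PER_DAY) as u32
}

/// Seconds remaining until the next UTC-midnight rotation.
/// Returns a full day when called exactly on the boundary.
pub fn seconds_until_rotation(unix_time_utc: u64) -> u64 {
    SECONDS_PER_DAY - (unix_time_utc % SECONDS_PER_DAY)
}

/// Hypothetical convenience wrapper around the system clock.
/// Falls back to epoch 0 if the clock reports a pre-1970 time.
pub fn current_day_epoch() -> u32 {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    day_epoch(now)
}
```

Two captures one second apart, at 86,399 s and 86,400 s, fall in epochs 0 and 1 respectively, so the keyed hash of identical features changes across that boundary even though nothing else differs.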
+ +### 2.4 Class-transition transformer + +The only way a frame moves to a more restrictive class (a numerically higher `privacy_class`) is through `PrivacyGate::demote(frame, target_class)`. This function: + +1. Asserts that the target class number is greater than or equal to the input class number. +2. Zeroes the disallowed fields with `zeroize::Zeroize`. +3. Re-computes `payload_crc32`. +4. Returns the new frame. + +There is no `promote` operation — a class-2 frame cannot be turned back into a class-1 frame, because the dropped fields were not retained anywhere reachable from the gate. + +### 2.5 `identity_embedding` lifecycle + +The embedding (output of the AETHER encoder, ADR-024) is held in a `zeroize::Zeroizing<[f32; 128]>` ring buffer of 64 entries (64 × 128 × 4 B = 32 KiB). Entries are: + +1. Written by the encoder on each capture window. +2. Consumed by `identity_risk_score` computation (ADR-121). +3. **Never** written to disk, MQTT, or any other I/O sink — there is no `Serialize` impl on the type. +4. Overwritten by the ring (FIFO). + +A compile-time `#[forbid(serde::Serialize)]` check on `IdentityEmbedding` (a custom lint, since rustc has no built-in equivalent) ensures a future PR cannot accidentally add a `Serialize` derive. + +### 2.6 Default-deny field classification + +Every new field added to `BfldFrame` or `BfldEvent` must be tagged with `#[must_classify]` (a custom attribute macro). The macro fails compilation if the field is not listed in the per-class allow-list table. This forces future contributors to make an explicit privacy decision on every new field. + +--- + +## 3. Consequences + +### Positive + +- Cross-site identity correlation is **computationally impossible**, not merely "prohibited by policy". This is the strongest form of privacy guarantee available without a TEE. +- Default-deny via `#[must_classify]` prevents the common pattern of "a new field shipped, then six months later we noticed it was identity-leaky". +- `identity_embedding` cannot be serialized by accident — the type system carries the constraint. 
+- The class transition transformer makes the data lifecycle explicit and auditable. + +### Negative + +- `site_salt` storage requires either a TPM (ADR-095/096 rvCSI platform feature gap) or a secrets file with strict mode. Loss of `site_salt` makes historical witness comparisons impossible — by design, but a documentation hazard. +- `#[must_classify]` is a custom proc-macro; another moving part in the build. +- Operators wanting multi-day analytics must work in aggregates only, not on per-individual signatures. + +### Neutral + +- Class 0 is `cargo test`-only. Some CI runners may need an explicit feature flag to compile class-0 paths. + +--- + +## 4. Alternatives Considered + +### Alt 1: Single boolean `privacy_mode` flag (status quo from ADR-115) + +Rejected: insufficient granularity. The frame mixes publishable sensing with non-publishable identity, so the gate must operate at field-level, not event-level. + +### Alt 2: SHA-256 instead of BLAKE3 + +Rejected: BLAKE3 keyed-hash mode is ~5× faster on the ESP32-S3 / Cortex-M cores and the security margin is equivalent for this use case. SHA-256 has no keyed-hash mode (HMAC-SHA256 is the alternative; works but is slower). + +### Alt 3: Hash rotation on the hour, not the day + +Rejected: hourly rotation breaks legitimate "person was here in the morning, came back in the afternoon" use-cases that operators may want. Day boundary is the compromise. + +### Alt 4: Per-event nonces instead of daily epoch + +Rejected: per-event nonces would force the consumer to track which events came from the same person within a session, which leaks identity information by structure. The day epoch preserves a coarse temporal grouping without leaking finer-grained identity. + +--- + +## 5. Acceptance Criteria + +- [ ] **AC1**: Calling `Emitter::publish` with a `privacy_class = 0` frame on a `NetworkSink` returns `BfldError::PrivacyViolation`. 
+- [ ] **AC2**: Two BFLD nodes with different `site_salt` values observing the same simulated person produce `rf_signature_hash` values whose Hamming distance is ≥ 120 bits over 100 trials (statistical isolation test). +- [ ] **AC3**: A frame with `privacy_class = 3` has both `identity_risk_score` and `rf_signature_hash` absent from the serialized payload. +- [ ] **AC4**: `PrivacyGate::demote(class_1_frame, target=0)` fails to compile (compile-fail test). +- [ ] **AC5**: A PR adding a new field to `BfldEvent` without `#[must_classify]` fails the build. +- [ ] **AC6**: `IdentityEmbedding` has no `Serialize` impl reachable from any public function. +- [ ] **AC7**: Dropping an `IdentityEmbedding` value zeroizes its memory (verified by a debugger-readable test under `cargo test --features zeroize-validation`). + +--- + +## 6. References + +- ADR-118 (umbrella) +- ADR-119 (frame format; `privacy_class` byte location) +- KIT BFId (ACM CCS 2025): https://publikationen.bibliothek.kit.edu/1000185756 +- NDSS LeakyBeam (2025): https://www.ndss-symposium.org/wp-content/uploads/2025-5-paper.pdf +- BLAKE3 keyed-hash: https://github.com/BLAKE3-team/BLAKE3 +- `zeroize::Zeroize` for memory hygiene diff --git a/docs/adr/ADR-121-bfld-identity-risk-scoring.md b/docs/adr/ADR-121-bfld-identity-risk-scoring.md new file mode 100644 index 00000000..427ddae2 --- /dev/null +++ b/docs/adr/ADR-121-bfld-identity-risk-scoring.md @@ -0,0 +1,169 @@ +# ADR-121: BFLD Identity Risk Scoring and Coherence Gate + +| Field | Value | +|-------|-------| +| **Status** | Proposed | +| **Date** | 2026-05-24 | +| **Deciders** | ruv | +| **Parent** | [ADR-118](ADR-118-bfld-beamforming-feedback-layer-for-detection.md) | +| **Relates to** | [ADR-024](ADR-024-contrastive-csi-embedding-model.md) (AETHER), [ADR-027](ADR-027-cross-environment-domain-generalization.md) (MERIDIAN), [ADR-029](ADR-029-ruvsense-multistatic-sensing-mode.md) (multistatic fusion), [ADR-086](ADR-086-edge-novelty-gate.md) (novelty gate 
precedent), [ADR-120](ADR-120-bfld-privacy-class-and-hash-rotation.md) (privacy class) | +| **Tracking issue** | TBD | + +--- + +## 1. Context + +BFLD's distinguishing primitive is the `identity_risk_score` — a scalar that says **"is this capture window currently capable of identifying a specific person?"**. The score has two consumers: + +1. **The operator** — exposed as an HA diagnostic sensor (ADR-122). A spike from the long-term baseline indicates the RF environment has shifted toward a higher-leakage regime (new AP firmware, denser MIMO, attacker-grade sniffer in range). +2. **The privacy gate** (ADR-120) — when the score crosses a configurable threshold, the gate downgrades the active `privacy_class` automatically (e.g., 2 → 3) until the score recovers. + +The score must be: +- **Bounded** in `[0, 1]` for HA gauge entities. +- **Calibrated** against actual re-ID success rate, ideally on the KIT BFId dataset. +- **Computable on-device** at ≥ 1 Hz on a Pi 5 core or an aarch64 cognitum-v0. +- **Stable** — small environmental changes should not produce wild swings; the score is for slow-moving regime detection, not per-frame chatter. + +ADR-086 (edge novelty gate) establishes a precedent for an on-device gate primitive. BFLD's risk scoring borrows the gate-pattern but with identity leakage as the trigger condition. + +--- + +## 2. 
Decision + +### 2.1 Nine features (from BFLD spec §5) + +The features are computed over a sliding window of `W = 32` BFI frames (≈3 s at 10 Hz): + +| Feature | Definition | Source | +|---------|------------|--------| +| `mean_angle_delta` | mean( ‖ Φ_t − Φ_{t-1} ‖ over subcarriers ) | extractor | +| `subcarrier_variance` | var( ‖ Φ ‖ over subcarrier axis ) | extractor | +| `temporal_entropy` | Shannon entropy of angle-bin histogram over W | extractor | +| `doppler_proxy` | FFT peak magnitude of mean-angle time series | features.rs | +| `path_stability` | 1 − ‖ Φ_t − median(Φ_{t-W..t}) ‖ / scale | features.rs | +| `cross_antenna_correlation` | mean Pearson correlation across n_tx × n_rx pairs | features.rs | +| `burst_motion_score` | high-pass-filtered angular velocity, soft-thresholded | features.rs | +| `stationarity_score` | 1 − rolling KL divergence over W/2 vs W | features.rs | +| `identity_separability_score` | top-1 cosine to nearest AETHER cluster centroid | identity_risk.rs | + +The first eight are sensing features (also used by the presence/motion pipeline). Only the ninth depends on the AETHER embedding and therefore on `identity_class >= 1`. + +### 2.2 Identity risk formula + +```rust +pub fn identity_risk_score( + sep: f32, // identity_separability_score, [0, 1] + stab: f32, // temporal_stability, [0, 1] = ema(path_stability, alpha=0.1) + consist: f32,// cross_perspective_consistency, [0, 1] = multistatic.rs + conf: f32, // sample_confidence, [0, 1] = f(SNR, n_subcarriers, n_rx) +) -> f32 { + // Clamp inputs, then multiplicative combination — any factor near 0 dominates. + let s = sep.clamp(0.0, 1.0); + let t = stab.clamp(0.0, 1.0); + let p = consist.clamp(0.0, 1.0); + let c = conf.clamp(0.0, 1.0); + (s * t * p * c).clamp(0.0, 1.0) +} +``` + +Multiplicative combination is chosen so that **any** weak factor (e.g., very low SNR ⇒ low `conf`) collapses the score toward 0. 
This matches the privacy intent: when the system is uncertain, the score should be low and the operator should not be alarmed. + +### 2.3 Calibration target + +The score is calibrated against re-ID success rate on a held-out test split of the KIT BFId dataset. A piecewise-linear isotonic regression maps raw scores into a calibrated `[0, 1]` band where `score ≥ 0.8` corresponds to `>80%` re-ID accuracy on a 5-second window in the calibration dataset. + +Calibration parameters live in `v2/crates/wifi-densepose-bfld/data/risk_calibration.toml` and are versioned independently of the code. A regression update is a content-only PR. + +### 2.4 Coherence gate + +The coherence gate (per ADR-029 `coherence_gate.rs` pattern) consumes the risk score and emits one of four actions: + +```rust +pub enum GateAction { + Accept, // score < 0.5, publish normally + PredictOnly, // 0.5 <= score < 0.7, publish but flag confidence + Reject, // 0.7 <= score < 0.9, drop the event + Recalibrate, // score >= 0.9, drop AND rotate site_salt +} +``` + +The `Recalibrate` action triggers a forced site-salt rotation — an aggressive response to a sustained high-risk regime. It costs the operator continuity of long-term aggregate analytics but is the right answer to an attacker-grade sniffer arriving in range. + +### 2.5 Hysteresis + +To prevent oscillation around the gate thresholds, the gate uses ±0.05 hysteresis and a 5-second debounce. A score must cross the boundary by the hysteresis margin and persist for the debounce window before the gate action changes. 
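The hysteresis and debounce rule in §2.5 can be sketched as a small state machine. This is a minimal sketch under stated assumptions, not the `coherence_gate.rs` implementation: `HysteresisGate`, `candidate`, and the caller-supplied `now_s` timestamp are hypothetical names, and the sketch assumes the debounce timer restarts whenever the requested action changes. It reads §2.5 literally, so a score has to clear a boundary by the full 0.05 margin before the 5-second timer starts.

```rust
/// Gate actions, ordered from least to most severe (mirrors the enum in §2.4).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum GateAction {
    Accept,
    PredictOnly,
    Reject,
    Recalibrate,
}

const ACTIONS: [GateAction; 4] = [
    GateAction::Accept,
    GateAction::PredictOnly,
    GateAction::Reject,
    GateAction::Recalibrate,
];
const THRESHOLDS: [f32; 3] = [0.5, 0.7, 0.9]; // boundaries between adjacent actions
const MARGIN: f32 = 0.05; // hysteresis margin from §2.5
const DEBOUNCE_S: f64 = 5.0; // debounce window from §2.5, in seconds

/// Action the score asks for, given the action currently held.
fn candidate(current: GateAction, score: f32) -> GateAction {
    let mut level = current as usize;
    // Escalate only once the score clears the next boundary by the margin.
    while level < THRESHOLDS.len() && score >= THRESHOLDS[level] + MARGIN {
        level += 1;
    }
    // Relax only once the score drops below the lower boundary by the margin.
    while level > 0 && score < THRESHOLDS[level - 1] - MARGIN {
        level -= 1;
    }
    ACTIONS[level]
}

/// Hypothetical gate state: the held action plus a pending change and its start time.
pub struct HysteresisGate {
    current: GateAction,
    pending: Option<(GateAction, f64)>,
}

impl HysteresisGate {
    pub fn new() -> Self {
        Self { current: GateAction::Accept, pending: None }
    }

    /// Feed one score sample taken at `now_s` (seconds, monotonic).
    pub fn update(&mut self, score: f32, now_s: f64) -> GateAction {
        let cand = candidate(self.current, score.clamp(0.0, 1.0));
        if cand == self.current {
            // The score is back inside the held band: cancel any pending change.
            self.pending = None;
            return self.current;
        }
        match self.pending {
            Some((p, since)) if p == cand => {
                // Same request as before: commit once it has persisted long enough.
                if now_s - since >= DEBOUNCE_S {
                    self.current = cand;
                    self.pending = None;
                }
            }
            // New or changed request: (re)start the debounce timer.
            _ => self.pending = Some((cand, now_s)),
        }
        self.current
    }
}
```

Under this reading a sustained score of 0.92 never escalates from `Reject` to `Recalibrate`, because it does not clear 0.9 by the margin. That is stricter than AC4 as written in §5, so the two would need to be reconciled before either is treated as normative.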
+ +### 2.6 Compute budget + +| Stage | Target latency | Implementation | +|-------|----------------|----------------| +| Feature extraction (8 features) | < 3 ms per window | ndarray + nalgebra; vectorized over subcarriers | +| Separability (cosine to centroids) | < 5 ms per window | RuVector RaBitQ index (ADR-085) over ≤ 1k centroids | +| Risk score | < 0.1 ms | scalar multiplicative | +| Gate decision + hysteresis | < 0.1 ms | scalar | + +Total p95 ≤ 10 ms per window on a Pi 5 core (8 ms target). Headroom on cognitum-v0 (Pi 5 + Hailo) is ample; ESP32-S3 hosts only the extraction stage (features computed; risk score is host-side per ADR-123). + +--- + +## 3. Consequences + +### Positive + +- The risk score becomes a first-class diagnostic surface for operators and a structural input to the privacy gate — both consumers from a single computation. +- Multiplicative combination is conservative under uncertainty; the system is biased toward "report low risk when unsure", which is the right default. +- Calibration is a content-only update — no recompile needed when the calibration file changes. +- The recalibration gate action gives the system a self-healing response to a sniffer arrival without operator intervention. + +### Negative + +- Calibration requires the KIT BFId dataset; without it the score is uncalibrated and serves only as an internal trigger, not a publishable signal. +- Multiplicative scoring can be dominated by `sample_confidence`, which is sensitive to channel conditions. A persistent low-SNR environment will keep the published score near 0 even when the underlying separability is high — an under-reporting failure mode that the documentation must call out. +- The recalibrate action breaks historical hash continuity by design; an operator who wants long-term aggregates needs to know they will see a discontinuity on recalibrate events. + +### Neutral + +- The nine features overlap with the existing CSI pipeline. 
BFLD computes them on BFI; the CSI pipeline computes them on CSI. Both can be fused via `cross_perspective_consistency`. + +--- + +## 4. Alternatives Considered + +### Alt 1: Additive scoring (`(s + t + p + c) / 4`) + +Rejected: a sample with high separability but very low confidence would still produce a moderate score, which over-reports risk in degraded RF conditions. + +### Alt 2: Maximum scoring (`max(s, t, p, c)`) + +Rejected: over-reports risk because any single high factor pins the output, even if the others contradict it. + +### Alt 3: Learned scoring (a small MLP) + +Rejected for this ADR: introduces an opaque model whose output cannot be audited from first principles. The multiplicative formula is simple, conservative, and directly explainable to operators. A learned model is a future option once enough calibration data is in hand. + +### Alt 4: Per-feature thresholds instead of a continuous score + +Rejected: continuous score is needed for the HA gauge entity and for downstream calibration. Per-feature thresholds would force operators to interpret nine separate binaries. + +--- + +## 5. Acceptance Criteria + +- [ ] **AC1**: All nine features are computed in `< 8 ms` p95 per window on a Pi 5 core. +- [ ] **AC2**: `identity_risk_score` is monotonic non-decreasing in any single input when the other three are held constant. +- [ ] **AC3**: Calibration regression on the KIT BFId test split: `score ≥ 0.8` corresponds to ≥ 80% re-ID accuracy ± 5%. +- [ ] **AC4**: The coherence gate emits `Recalibrate` if score is ≥ 0.9 for ≥ 5 seconds. +- [ ] **AC5**: Hysteresis prevents action oscillation across ± 0.05 of a threshold within a 5-second window. +- [ ] **AC6**: At `privacy_class = 3`, the risk score is computed but not published to MQTT (kept local for the gate only). +- [ ] **AC7**: A reproducible 1,000-frame synthetic fixture produces a deterministic score sequence (bit-identical across runs). + +--- + +## 6. 
References + +- ADR-118 (umbrella) +- ADR-024 (AETHER encoder for separability) +- ADR-029 (`coherence_gate.rs` precedent) +- ADR-086 (edge novelty gate pattern) +- ADR-120 §2.4 (class transition consumed by gate) +- KIT BFId dataset: https://publikationen.bibliothek.kit.edu/1000185756 diff --git a/docs/adr/ADR-122-bfld-ruview-ha-matter-exposure.md b/docs/adr/ADR-122-bfld-ruview-ha-matter-exposure.md new file mode 100644 index 00000000..56d46352 --- /dev/null +++ b/docs/adr/ADR-122-bfld-ruview-ha-matter-exposure.md @@ -0,0 +1,191 @@ +# ADR-122: BFLD RuView Surface — Home Assistant, Matter, MQTT Exposure + +| Field | Value | +|-------|-------| +| **Status** | Proposed | +| **Date** | 2026-05-24 | +| **Deciders** | ruv | +| **Parent** | [ADR-118](ADR-118-bfld-beamforming-feedback-layer-for-detection.md) | +| **Relates to** | [ADR-031](ADR-031-ruview-sensing-first-rf-mode.md) (sensing-first), [ADR-100](ADR-100-cog-packaging-specification.md) (cog packaging), [ADR-115](ADR-115-home-assistant-integration.md) (HA-DISCO + HA-MIND), [ADR-116](ADR-116-cog-ha-matter-seed.md) (Matter cog), [ADR-120](ADR-120-bfld-privacy-class-and-hash-rotation.md) (privacy class) | +| **Tracking issue** | TBD | + +--- + +## 1. Context + +ADR-115 shipped the RuView Home Assistant surface (21 entities, MQTT auto-discovery, mTLS, privacy mode) on the `wifi-densepose-sensing-server` Rust binary. ADR-116 is packaging this as the `cog-ha-matter` Cognitum Seed cog. BFLD must integrate into this surface without expanding the privacy-sensitive footprint already in production. + +The integration must: + +1. **Extend HA-DISCO** to advertise BFLD entities via the existing MQTT-discovery scheme. +2. **Reject identity fields at the Matter boundary** — Matter exposes occupancy/motion/people-count only, never `identity_risk_score` or `rf_signature_hash`. +3. 
**Route MQTT topics by privacy class** — class-2/3 events on the public topic tree, class-1 events on a gated `research/` subtree, class-0 events nowhere. +4. **Federate cleanly into cognitum-v0** — BFLD events from multiple nodes flow through `cognitum-rvf-agent` (port 9004 per CLAUDE.local.md) for cross-node analytics, but identity-derived fields are stripped at the **publishing-node boundary**, not at the federation hub. + +--- + +## 2. Decision + +### 2.1 HA entity surface (six new entities per node) + +The cog republishes the existing 21 ADR-115 entities and adds: + +| Entity ID | Type | Source field | Class gate | Diagnostic | +|-----------|------|--------------|------------|------------| +| `binary_sensor._bfld_presence` | occupancy | `BfldEvent.presence` | ≥ 2 | no | +| `sensor._bfld_motion` | gauge `[0,1]` | `BfldEvent.motion` | ≥ 2 | no | +| `sensor._bfld_person_count` | int | `BfldEvent.person_count` | ≥ 2 | no | +| `sensor._bfld_zone_activity` | enum | `BfldEvent.zone_activity` | ≥ 2 | no | +| `sensor._bfld_identity_risk` | gauge `[0,1]` | `BfldEvent.identity_risk_score` | == 2 only | **yes** | +| `sensor._bfld_confidence` | gauge `[0,1]` | `BfldEvent.confidence` | ≥ 2 | yes | + +The `identity_risk` entity is exposed only under privacy class 2 and is flagged `entity_category: diagnostic` so HA dashboards do not promote it to a main-card sensor by default. Under class 3 it is computed but not published (per ADR-121 §2.4). + +MQTT discovery payload follows the ADR-115 schema, plus a `bfld_version` attribute matching the `BfldFrameHeader::version` field. 
+ +### 2.2 MQTT topic tree + +``` +ruview//bfld/presence/state # class >= 2 +ruview//bfld/motion/state # class >= 2 +ruview//bfld/person_count/state # class >= 2 +ruview//bfld/zone_activity/state # class >= 2 +ruview//bfld/confidence/state # class >= 2 +ruview//bfld/identity_risk/state # class == 2 only +ruview//bfld/raw # class 1, OFF by default +ruview//bfld/availability # online/offline marker +``` + +`raw` (class-1 derived BFI) is **not present** in the discovery payload at all — operators must explicitly subscribe and acknowledge the research-mode caveat. The publishing crate emits `MQTT_RAW_DISABLED` to availability when `privacy_class < 1`. + +### 2.3 Mosquitto ACL example + +``` +# Default-deny everything not explicitly granted +pattern read ruview/+/bfld/+/state +pattern read ruview/+/bfld/availability + +# Public roles cannot read identity_risk or raw +user public +deny read ruview/+/bfld/identity_risk/state +deny read ruview/+/bfld/raw + +# Operator role can read identity_risk for diagnostics +user operator +allow read ruview/+/bfld/identity_risk/state + +# Research role can read raw (requires class-1 operation) +user research +allow read ruview/+/bfld/raw +``` + +The cog ships a default ACL template under `cog-ha-matter/etc/mosquitto.acl.d/bfld.conf` for operators who use the embedded broker (ADR-116 §2.2). 
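The class gates annotated on the topic tree in §2.2 can be sketched as a single routing function that returns no topic at all when the active privacy class forbids publication. This is an illustrative sketch only: `BfldTopic`, `topic_for`, the `node_id` path segment, and the `raw_enabled` opt-in flag are assumptions made for the example, not names from the publishing crate.

```rust
/// Hypothetical enumeration of the per-entity topics listed in §2.2.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BfldTopic {
    Presence,
    Motion,
    PersonCount,
    ZoneActivity,
    Confidence,
    IdentityRisk,
    Raw,
}

/// Returns the MQTT topic to publish on, or `None` when the active privacy
/// class does not allow this field to leave the node.
pub fn topic_for(
    node_id: &str,
    topic: BfldTopic,
    privacy_class: u8,
    raw_enabled: bool, // explicit research-mode opt-in; off by default
) -> Option<String> {
    use BfldTopic::*;
    let allowed = match topic {
        // Raw derived BFI: class 1 only, and only after explicit opt-in.
        Raw => privacy_class == 1 && raw_enabled,
        // Identity risk: class 2 only (computed but not published at class 3).
        IdentityRisk => privacy_class == 2,
        // Sensing outputs: class 2 or 3.
        Presence | Motion | PersonCount | ZoneActivity | Confidence => {
            (2..=3).contains(&privacy_class)
        }
    };
    if !allowed {
        return None; // class 0 never matches any arm above
    }
    let leaf = match topic {
        Presence => "presence/state",
        Motion => "motion/state",
        PersonCount => "person_count/state",
        ZoneActivity => "zone_activity/state",
        Confidence => "confidence/state",
        IdentityRisk => "identity_risk/state",
        Raw => "raw",
    };
    Some(format!("ruview/{node_id}/bfld/{leaf}"))
}
```

Returning `Option<String>` rather than a topic plus a separate permission check keeps the deny path structural: a caller that has no topic has nothing to publish to. The broker ACL in §2.3 then acts as a second, independent layer.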
+ +### 2.4 Matter cluster boundary + +`cog-ha-matter` exposes BFLD via **three Matter clusters** only: + +| Matter cluster | Source entity | Notes | +|---|---|---| +| Occupancy Sensing (0x0406) | `binary_sensor._bfld_presence` | reports binary occupancy + uncertainty (mapped from `confidence`) | +| Boolean State (0x0045) | `sensor._bfld_motion >= 0.3` | thresholded; raw motion not exposed | +| Occupancy Sensing extension | `sensor._bfld_person_count` | uses occupancy-sensor count where Matter spec supports | + +**Explicitly NOT exposed via Matter**: + +- `identity_risk_score` +- `rf_signature_hash` +- `identity_embedding` +- `raw` BFI +- `zone_activity` (zone IDs are site-specific and Matter is a cross-site surface) +- `confidence` (HA-only diagnostic) + +The Matter filter is implemented in `cog-ha-matter/src/matter/bfld_filter.rs` as a `MatterSink` trait impl that rejects classes 0 and 1 at compile time (via ADR-120 §2.2 marker types). + +### 2.5 Federation with cognitum-v0 + +`cognitum-rvf-agent` (port 9004) receives BFLD events from multiple nodes. The events arriving at the federation hub are **already class-2/3** — identity-derived fields were stripped at each publishing node. The hub does not see and cannot reconstruct raw BFI or identity embeddings. + +The federation contract: + +| At publishing node | At cognitum-rvf-agent | +|---|---| +| Strip class-0/1 fields per ADR-120 | Receive class-2/3 events only | +| Rotate `rf_signature_hash` per ADR-120 §2.3 | Aggregate counts; **do not** correlate hashes across sites | +| Sign event with node Ed25519 key | Verify signature; reject unsigned events | + +A `federation-witness` script (extending ADR-028) runs nightly on the hub and proves that no class-0/1 fields appeared in any received event over the previous 24 h. + +### 2.6 HA blueprints (shipped with the cog) + +Three operator-ready blueprints under `cog-ha-matter/blueprints/`: + +1. 
**Presence-driven lighting** — `binary_sensor.*_bfld_presence` ⇒ `light.turn_on/off` with configurable hold time. +2. **Motion-aware HVAC** — `sensor.*_bfld_motion > 0.3` ⇒ raise HVAC setpoint by ΔT. +3. **Identity-risk anomaly notification** — `sensor.*_bfld_identity_risk` exceeds rolling z-score threshold ⇒ HA `notify.*` to the operator with the originating node and the 7-day baseline. + +--- + +## 3. Consequences + +### Positive + +- Six new HA entities give operators a complete BFLD diagnostic dashboard without leaking identity. +- Matter exposure is structurally narrow — the cluster-filter implementation cannot accidentally expose identity fields because the type system rejects them. +- The default ACL template gives operators a working privacy posture out of the box. +- The federation contract makes it explicit that the hub cannot reconstruct identity even from the union of all node events. + +### Negative + +- The `identity_risk` HA entity exists only under class 2. Operators who run class 3 deployments cannot see the score even in their own dashboard. This is correct but may surprise care-home installers; documentation must be clear. +- Three Matter clusters is conservative — some HA users may want the count exposed as a percentage or rate, which Matter does not support natively. +- HA-blueprint coverage is intentionally small; operators wanting custom automations must work through the YAML surface. + +### Neutral + +- The federation witness script runs nightly. A short-duration leak between witnesses is possible but bounded — any successful exfiltration of class-1 fields would still need to be reconstructed into identity, which the daily hash rotation breaks. + +--- + +## 4. Alternatives Considered + +### Alt 1: Expose `identity_risk` over Matter (Generic Sensor cluster) + +Rejected: Matter is a cross-vendor surface; exposing identity-risk there leaks the score to every Matter controller in the home, including third-party hubs the operator may not control. 
Keep it HA-internal. + +### Alt 2: One unified MQTT topic `ruview//bfld` with JSON payload + +Rejected: per-entity topics are the HA-DISCO convention (ADR-115) and let ACLs be field-specific. A unified topic forces an all-or-nothing read policy. + +### Alt 3: Federate raw BFI to cognitum-v0 for cross-node analytics + +Rejected: violates ADR-120 I1 (raw never leaves the node). Aggregates are sufficient for cross-node analytics; raw centralization is a hard no. + +### Alt 4: Default `entity_category: diagnostic = false` for `identity_risk` + +Rejected: promoting `identity_risk` to a main-card sensor would surprise operators with an identity-adjacent gauge on their main dashboard. Diagnostic category is the right default. + +--- + +## 5. Acceptance Criteria + +- [ ] **AC1**: HA auto-discovery publishes six new entities per node on first connect; HA recognizes all six. +- [ ] **AC2**: Under privacy class 3, `sensor._bfld_identity_risk` is absent from the MQTT discovery payload. +- [ ] **AC3**: `MatterSink::publish` rejects any frame at compile time when the source has `privacy_class < 2`. +- [ ] **AC4**: The default mosquitto ACL denies `read ruview/+/bfld/identity_risk/state` to the `public` user role. +- [ ] **AC5**: Three HA blueprints install cleanly into a fresh HA install and trigger their configured actions against a mock BFLD event stream. +- [ ] **AC6**: The federation-witness script detects an injected class-1 field in a synthetic event and exits non-zero. +- [ ] **AC7**: Matter occupancy-sensing cluster reports presence within 1 s of an HA `binary_sensor.*_bfld_presence` state change. + +--- + +## 6. 
References + +- ADR-115 (HA-DISCO entity scheme) +- ADR-116 (`cog-ha-matter` cog packaging) +- ADR-120 (privacy class enforcement) +- ADR-121 (identity risk source) +- ADR-100 (cog packaging spec) +- Mosquitto ACL reference: https://mosquitto.org/man/mosquitto-conf-5.html +- Matter spec — Occupancy Sensing cluster (0x0406) +- Cognitum V0 appliance dashboard: `http://cognitum-v0:9000/` diff --git a/docs/adr/ADR-123-bfld-capture-path-nexmon-and-esp32.md b/docs/adr/ADR-123-bfld-capture-path-nexmon-and-esp32.md new file mode 100644 index 00000000..fc7170a0 --- /dev/null +++ b/docs/adr/ADR-123-bfld-capture-path-nexmon-and-esp32.md @@ -0,0 +1,186 @@ +# ADR-123: BFLD Capture Path — Pi 5 / Nexmon Adapter and ESP32-S3 Feasibility + +| Field | Value | +|-------|-------| +| **Status** | Proposed | +| **Date** | 2026-05-24 | +| **Deciders** | ruv | +| **Parent** | [ADR-118](ADR-118-bfld-beamforming-feedback-layer-for-detection.md) | +| **Relates to** | [ADR-022](ADR-022-multi-bssid-wifi-scanning.md) (multi-BSSID scan), [ADR-028](ADR-028-esp32-capability-audit.md) (capability audit), [ADR-095](ADR-095-rvcsi-edge-rf-sensing-platform.md) (rvCSI), [ADR-096](ADR-096-rvcsi-ffi-crate-layout.md) (rvCSI FFI), [ADR-110](ADR-110-esp32-c6-firmware-extension.md) (C6 firmware), [ADR-119](ADR-119-bfld-frame-format-and-wire-protocol.md) (BfldFrame) | +| **Tracking issue** | TBD | + +--- + +## 1. Context + +ADR-118 declares that BFLD captures BFI from commodity WiFi 5/6 traffic. The question this sub-ADR answers is: **on which hardware, with which adapter, and against which firmware limitations**. + +### 1.1 ESP32-S3 BFI capability gap + +The ESP32 capability audit (ADR-028) and the ESP32-S3 / C6 firmware (`firmware/esp32-csi-node/`, ADR-110) confirm that the Espressif WiFi API exposes **CSI** capture (`esp_wifi_set_csi_*`) but does not expose **raw 802.11 management-frame capture** in monitor mode for non-self-addressed CBFR reports. 
The S3 sees the CBFR frames its own AP-link generates (when it acts as a beamformer), but it cannot promiscuously sniff CBFR frames between other STA/AP pairs in the neighborhood. + +The C6 (ESP32-C6 with RISC-V + Wi-Fi 6) has a more flexible RF subsystem but the same software-API constraint at the time of writing. + +### 1.2 Pi 5 / Nexmon as the production capture host + +The rvCSI platform (ADR-095/096) already vendors a Nexmon-based adapter (`rvcsi-adapter-nexmon`) that captures CSI from BCM43455c0 chips (Pi 5 / Pi 4 / Pi 3B+). Nexmon patches the firmware to surface CSI to userspace and **also surface CBFR frames** — the BFI extension is the same code path with a different filter. + +cognitum-v0 (Pi 5 in the fleet, per CLAUDE.local.md) is already running Nexmon + the rvCSI runtime. It is the natural BFLD capture host. + +### 1.3 What we need from each hardware tier + +| Tier | Role | BFI capture | CSI capture | Notes | +|------|------|-------------|-------------|-------| +| ESP32-S3 / C6 | Sensing leaf | **no** | yes | Continues providing CSI to the existing pipeline | +| Pi 5 / Nexmon | BFLD host | **yes** | yes (via Nexmon) | Primary BFLD capture | +| ruvultra (RTX 5080 + AX210) | Training / dev | yes (via AX210 monitor mode) | yes | Dev capture; not production | +| cognitum-v0 (Pi 5) | Appliance | **yes** (production) | yes | Production BFLD host | + +--- + +## 2. Decision + +### 2.1 Production capture path: Pi 5 / Nexmon + +The BFLD production capture path is implemented as a new module in the vendored rvCSI submodule: + +``` +vendor/rvcsi/crates/rvcsi-adapter-nexmon/ +└── src/ + ├── lib.rs + ├── csi.rs # existing CSI capture + └── bfi.rs # NEW — CBFR capture, exports BfiCapture +``` + +The new `bfi.rs` parses CBFR frames (VHT or HE) from the Nexmon-patched firmware's userspace stream, extracts Φ/ψ angle matrices, and emits a `BfiCapture` struct that feeds the BFLD crate's extractor (ADR-118 §2.1, ADR-119). 
+ +The patch lives in the rvcsi submodule (`github.com/ruvnet/rvcsi`) and is shipped as `rvcsi-adapter-nexmon ^0.3.5` to crates.io. The wifi-densepose workspace consumes the published crate (or the submodule path during development). + +### 2.2 BFLD crate adapter trait + +`wifi-densepose-bfld` defines a `BfiCaptureAdapter` trait: + +```rust +pub trait BfiCaptureAdapter: Send + 'static { + type Error: std::error::Error + Send + Sync + 'static; + // NOTE: the generic argument of the return type was lost in extraction; + // `Option<BfiCapture>` (None = no frame available yet) is an assumption. + fn capture(&mut self) -> Result<Option<BfiCapture>, Self::Error>; + fn capabilities(&self) -> AdapterCapabilities; +} + +pub struct AdapterCapabilities { + pub supports_he: bool, // 802.11ax (Wi-Fi 6) + pub supports_160mhz: bool, + pub max_n_rx: u8, + pub host_kind: HostKind, // Pi5Nexmon | Ax210Linux | EspS3Local | Mock +} +``` + +Three impls ship initially: + +- `NexmonBfiAdapter`: Pi 5 / Nexmon (production) +- `Ax210BfiAdapter`: Linux + AX210 in monitor mode (dev / training, ruvultra) +- `MockBfiAdapter`: replay fixture for tests and CI + +A future fourth impl (`EspS3LocalAdapter`) is reserved for the day Espressif exposes promiscuous CBFR; it captures only the S3's own AP-link BFI for local self-reporting. + +### 2.3 Capture-side privacy boundary + +Per ADR-120 I1, raw BFI never leaves the capturing host. The adapter must therefore live on **the same physical box** as the BFLD crate's extractor and privacy gate. The architecture pattern: + +``` +[ Pi 5 / cognitum-v0 ] +├── nexmon firmware (kernel) +├── rvcsi-adapter-nexmon (userspace, captures BFI) +├── wifi-densepose-bfld (extracts, scores, gates) +│ └── privacy_gate → class-2/3 frames only +└── wifi-densepose-sensing-server (publishes MQTT + Matter) +``` + +A network-mode adapter that streams raw BFI from a remote capture host is **explicitly forbidden**. The adapter trait does not include any "remote URL" parameter. + +### 2.4 Channel / bandwidth coverage + +The Nexmon adapter is configured by the existing `rvcsi-adapter-nexmon` channel-hopping schedule (ADR-095 §3.2). 
For BFLD it adds: + +- Filter for VHT CBFR (action frame, category 21, action 0) and HE CBFR (category 30, action 0). +- Per-channel BFI session-tracking: the same beamformer/beamformee pair across a channel hop is reconciled by AP MAC + STA MAC. + +### 2.5 ESP32-S3 local self-reporting (deferred) + +For deployments without a Pi 5 / cognitum-v0 nearby, a degraded BFLD mode runs on the ESP32-S3 itself: + +- Captures only its own AP-link CBFR (self-addressed). +- Computes features over the limited window. +- Reports a coarsened `presence` + `motion` only, with no `identity_risk_score` (insufficient sample diversity). +- Emits `BfldFrame` at `privacy_class = 2` with a `flags.bit3 = self_only` marker. + +This path is implemented in firmware as part of P2 / P3 of the ADR-118 rollout, after the Pi 5 path is stable. Effort is small (firmware path reuses the existing CSI capture loop) but the value is also low until ESP32 firmware exposes promiscuous CBFR, which is an Espressif ESP-IDF roadmap item and not under project control. + +### 2.6 Dev path: ruvultra / AX210 + +For local dev iteration on the Windows / ruvultra box, the AX210 adapter provides a workable capture path on Linux (ruvultra is Ubuntu 6.17 per CLAUDE.local.md). The AX210 supports 802.11ax + monitor mode with the `iwlwifi` driver patches that have landed upstream. This path is for training-data collection and dev testing, not production. + +--- + +## 3. Consequences + +### Positive + +- BFLD ships as a production-ready surface on cognitum-v0 from day one, with no new hardware procurement. +- The adapter-trait design lets new capture paths (AX211, MediaTek Filogic, etc.) slot in without changes to the BFLD crate. +- The capture-side privacy boundary is structural: there is no remote-capture code path, so a future PR cannot accidentally introduce one. +- ruvultra's AX210 path unblocks training and dev iteration on Linux without depending on the Pi 5 fleet. 
+ +### Negative + +- BFLD's full pipeline depends on cognitum-v0 (or another Pi 5 / Nexmon host) being present in the deployment. Operators without a Pi 5 get only the degraded ESP32-S3 self-reporting path (limited utility). +- Nexmon is a third-party kernel module; tracking upstream patches is ongoing maintenance. +- The CBFR frame format differs between VHT (802.11ac) and HE (802.11ax); the parser must support both, and any 802.11be (Wi-Fi 7) deployment will require an additional parser path. + +### Neutral + +- ruvultra dev path uses AX210; the AX210 is not the production NIC, so dev/prod parity is via the fixture replay + the Nexmon adapter on cognitum-v0. + +--- + +## 4. Alternatives Considered + +### Alt 1: Centralized capture host streams raw BFI to RuView nodes + +Rejected: violates ADR-120 I1 (raw never leaves the capture host). The capture host **is** the BFLD node; there is no separation. + +### Alt 2: Wait for Espressif promiscuous CBFR support + +Rejected: indefinite timeline outside project control. The Pi 5 / Nexmon path is shippable today. + +### Alt 3: Custom Pi 5 firmware fork instead of Nexmon + +Rejected: forking BCM firmware is a huge maintenance burden and Nexmon already does what we need. + +### Alt 4: Only ship the ESP32-S3 self-reporting path + +Rejected: insufficient sample diversity for `identity_risk_score`. The whole point of BFLD is to measure identity leakage; a self-only path cannot do that meaningfully. + +--- + +## 5. Acceptance Criteria + +- [ ] **AC1**: `NexmonBfiAdapter` captures ≥ 100 valid CBFR frames per minute from a 2-AP-3-STA test bench on a Pi 5 (cognitum-v0). +- [ ] **AC2**: VHT (802.11ac) and HE (802.11ax) CBFR frames are both parsed; mixed-PHY captures produce correctly-typed `BfiCapture` outputs. +- [ ] **AC3**: 20/40/80/160 MHz channel widths are all supported (one fixture each in `tests/`). +- [ ] **AC4**: `BfiCaptureAdapter` trait has no method accepting a remote URL or socket address. 
+- [ ] **AC5**: ESP32-S3 self-only adapter compiles `#[no_std]` and produces a `BfldFrame` with `flags.bit3 = self_only` set, no `identity_risk_score` field. +- [ ] **AC6**: AX210 adapter on ruvultra captures CBFR for at least one fixture-generating dev session. +- [ ] **AC7**: Capture loop sustains 10 Hz BFI frame rate on cognitum-v0 without dropping frames over a 10-minute soak test. + +--- + +## 6. References + +- ADR-095 / ADR-096 (rvCSI Nexmon adapter) +- ADR-028 (ESP32 capability audit) +- ADR-110 (ESP32-C6 firmware) +- Nexmon BCM43455c0 patches: https://github.com/seemoo-lab/nexmon +- Wi-BFI: https://arxiv.org/abs/2309.04408 +- IEEE 802.11-2020 §19.3.12 (VHT CBFR), §27.3.11 (HE CBFR) +- cognitum-v0 fleet entry: `CLAUDE.local.md` (Tailscale fleet table) diff --git a/docs/research/BFLD/01-sota-survey.md b/docs/research/BFLD/01-sota-survey.md new file mode 100644 index 00000000..4fbd16ee --- /dev/null +++ b/docs/research/BFLD/01-sota-survey.md @@ -0,0 +1,293 @@ +# BFLD SOTA Survey — Beamforming Feedback: State of the Art + +## 1. BFI vs CSI: Physical-Layer Differences and Leakage Profiles + +### 1.1 Channel State Information (CSI) + +CSI is the raw complex channel frequency response (CFR) measured at the receiver across +all subcarriers and antenna pairs. Extracting CSI requires either (a) firmware +modifications on the receiving NIC (Atheros CSI Tool, Nexmon CSI patch for BCM43455c0 +on Raspberry Pi 4/5) or (b) a specialized radio (software-defined radio with 802.11 +decoders). The resulting matrix is typically Ntx × Nrx × Nsubcarrier complex floats — +dense, high-dimensional, and not transmitted over the air in standard operation. + +This project's existing rvCSI runtime (`vendor/rvcsi/`) captures CSI via the Nexmon +firmware patch on Raspberry Pi hardware (ADR-095/096). 
The ESP32-S3 on COM9 cannot +produce CSI in the format needed for the full pipeline: it lacks the antenna count +and the firmware support for per-subcarrier phase extraction at the fidelity rvcsi +expects. + +### 1.2 Beamforming Feedback Information (BFI) + +BFI is fundamentally different: it is the compressed representation of the channel that +a STA (station/client) sends back to an AP (access point) so the AP can steer its beam +toward the client. The standard (IEEE 802.11ac/ax, section 9.4.1.52) defines the +compressed beamforming procedure as follows: + +1. The AP transmits a Null Data Packet (NDP) sounding frame. +2. The STA measures the channel from the NDP, computes the singular-value decomposition + of the channel matrix, H = U Σ V^H, then compresses the right singular vectors V + using a series of Givens rotations. +3. The Givens rotation produces a set of angles: Phi (φ) angles in [0, 2π) and Psi (ψ) + angles in [0, π/2). In 802.11ac these are quantized to 7 and 5 bits respectively; in + 802.11ax the default is 4 bits for φ and 2 bits for ψ. +4. The STA transmits a VHT/HE Compressed Beamforming frame (CBFR) containing those + quantized angles, one set per active subcarrier (or per compressed subcarrier group), + plus an SNR field per stream. + +The CBFR is a **management-plane 802.11 frame, not an 802.3 data frame**. It is +transmitted before association encryption is negotiated; in WPA2/WPA3 deployments, the +beamforming sounding and feedback exchange happens in the clear because WPA2/WPA3 +encrypt data frames only. Even 802.11ax (Wi-Fi 6/6E) with Protected Management Frames +(PMF) enabled does NOT encrypt action frames in the beamforming exchange by default on +commodity APs as of 2025 (NDSS 2025 finding, "Lend Me Your Beam", +https://www.ndss-symposium.org/ndss-paper/lend-me-your-beam-privacy-implications-of-plaintext-beamforming-feedback-in-wifi/). 
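To make the quantization in step 3 concrete, the sketch below maps a quantized index back to an angle in radians. It assumes the usual 802.11 compressed-beamforming quantization rule, φ = kπ/2^(b_φ−1) + π/2^(b_φ) and ψ = kπ/2^(b_ψ+1) + π/2^(b_ψ+2), which should be checked against the standard before being relied on. The function names are hypothetical and are not taken from `extractor.rs` or Wi-BFI.

```rust
use std::f64::consts::PI;

/// Reconstruct φ (radians) from its quantized index `k` at `b_phi` bits.
/// Assumed rule: φ = k·π / 2^(b_phi − 1) + π / 2^(b_phi), k in 0..2^b_phi.
pub fn dequantize_phi(k: u32, b_phi: u32) -> Option<f64> {
    if b_phi == 0 || b_phi > 16 || k >= (1u32 << b_phi) {
        return None; // index out of range for this bit width
    }
    let step = PI / f64::from(1u32 << (b_phi - 1));
    // Each index maps to the midpoint of its quantization bin.
    Some(f64::from(k) * step + step / 2.0)
}

/// Reconstruct ψ (radians) from its quantized index `k` at `b_psi` bits.
/// Assumed rule: ψ = k·π / 2^(b_psi + 1) + π / 2^(b_psi + 2), k in 0..2^b_psi.
pub fn dequantize_psi(k: u32, b_psi: u32) -> Option<f64> {
    if b_psi == 0 || b_psi > 16 || k >= (1u32 << b_psi) {
        return None;
    }
    let step = PI / f64::from(1u32 << (b_psi + 1));
    Some(f64::from(k) * step + step / 2.0)
}
```

With the 7-bit φ and 5-bit ψ widths quoted above, every reconstructed φ falls strictly inside (0, 2π) and every ψ strictly inside (0, π/2), which matches the ranges given in step 3.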
+ +**Key asymmetry**: extracting CSI requires physical access to a device and firmware +modification; extracting BFI requires only a WiFi adapter in monitor mode and a parser +for the CBFR frame format. Wi-BFI (Haque, Meneghello, Restuccia; ACM WiNTECH 2023, +https://arxiv.org/abs/2309.04408) is an open-source pip-installable tool that does +exactly this. + +### 1.3 Why BFI Is Uniquely Dangerous + +CSI is a research instrument — accessing it requires deliberate effort. BFI is a +production protocol artifact that any 802.11ac/ax STA broadcasts periodically as a +matter of course. The attack-surface implications: + +- **No firmware modification needed** on the target device or AP. +- **Passive capture** is sufficient. Frames are broadcast in all directions, not + beamformed, so a nearby attacker receives them at essentially the same SNR as the AP. +- **Structured leakage**: the Phi/Psi angle matrices encode a compressed but + non-trivially-invertible representation of the spatial channel, which includes + multipath geometry that is body-shaped — the human body is a dielectric obstacle whose + shape and movement modulate the channel. +- **Regularity**: sounding happens at the AP's request, typically at 5–40 Hz in modern + 802.11ax deployments. A 60-second capture at 10 Hz produces 600 CBFR frames — + sufficient for the BFId classifier to achieve >90% re-identification accuracy (ACM CCS + 2025, https://dl.acm.org/doi/10.1145/3719027.3765062). + +--- + +## 2. Compressed Angle Matrices: The Identity Surface + +### 2.1 Givens Rotation Reconstruction + +The Phi/Psi angles encode a unitary matrix via the Givens rotation decomposition: + + V = G(N, N-1, φ_{N,N-1}, ψ_{N,N-1}) · G(N, N-2, ...) · ... · G(2,1, φ_{2,1}, ψ_{2,1}) · D + +where D is a diagonal phase matrix. For a 2×2 MIMO system this is two angles; for a +4×4 system this is 12 angles. Each "column" in the BFI payload corresponds to one +subcarrier group (or every 4th subcarrier in 802.11ax, every 2nd in 802.11ac). 
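
The angle counts quoted above (two for 2×2, twelve for 4×4) follow from how many rotations are needed to reduce a matrix with orthonormal columns. The sketch below reproduces those counts; the helper names are hypothetical, and the bit widths in the usage note are the ones this survey quotes, not values read from a capture.

```python
from typing import Tuple


def givens_angle_counts(n_rows: int, n_cols: int) -> Tuple[int, int]:
    """Return (n_phi, n_psi) for an n_rows x n_cols feedback matrix V.

    Column i (1-based) contributes (n_rows - i) phi angles and the same
    number of psi angles, for i up to min(n_cols, n_rows - 1).
    """
    if n_rows < 1 or n_cols < 1 or n_cols > n_rows:
        raise ValueError("expected 1 <= n_cols <= n_rows")
    steps = min(n_cols, n_rows - 1)
    per_kind = sum(n_rows - i for i in range(1, steps + 1))
    return per_kind, per_kind


def bits_per_subcarrier(n_rows: int, n_cols: int,
                        phi_bits: int, psi_bits: int) -> int:
    """Payload bits needed for one subcarrier (or subcarrier group)."""
    n_phi, n_psi = givens_angle_counts(n_rows, n_cols)
    return n_phi * phi_bits + n_psi * psi_bits
```

For example, `givens_angle_counts(4, 4)` gives six φ and six ψ angles, and `bits_per_subcarrier(4, 4, 7, 5)` gives the per-group payload at the 7/5-bit widths mentioned earlier.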
+ +The resulting per-subcarrier angle sequence is a time-varying signature of the spatial +channel. Because the human body modulates the multipath channel, this sequence encodes +body-specific geometry. The BFId paper (https://dl.acm.org/doi/10.1145/3719027.3765062) +demonstrates that a supervised classifier trained on these sequences achieves identity +recognition on a 197-person dataset. + +### 2.2 The AI/ML Compression Feedback Loop + +IEEE 802.11 standardization is actively exploring AI/ML-based compression for +beamforming feedback (IEEE 802.11bn / Wi-Fi 8 study group, "Toward AIML Enabled WiFi +Beamforming CSI Feedback Compression", https://arxiv.org/html/2503.00412v1). This work +proposes neural codebooks that reduce feedback overhead. An important side effect: the +learned latent space of a neural BFI compressor may be *more* identity-discriminative +than the raw angles, because neural compression tends to preserve class-discriminative +variance. BFLD must be designed to handle compressed BFI encodings, not just the raw +Phi/Psi format. + +--- + +## 3. Tooling Landscape + +### 3.1 Wi-BFI + +- **Source**: https://arxiv.org/abs/2309.04408 / https://github.com/kfoysalhaque/MU-MIMO-Beamforming-Feedback-Extraction-IEEE802.11ac +- **Capabilities**: real-time and offline extraction of BFAs from 802.11ac and 802.11ax; + 20/40/80/160 MHz; SU-MIMO and MU-MIMO; pip-installable. +- **Relevance to BFLD**: the BFLD extractor module (`extractor.rs`) must produce + semantically equivalent output to Wi-BFI — i.e., per-subcarrier Phi/Psi angle arrays + plus per-stream SNR — so that research results from the Wi-BFI ecosystem can be + replicated on BFLD captures. + +### 3.2 PicoScenes + +- **Source**: https://www.semanticscholar.org/paper/Eliminating-the-Barriers-Demystifying-Wi-Fi-Baseband-Jiang-Zhou/... +- **Capabilities**: cross-NIC CSI and CBFR measurement platform; supports Intel AX200, + AX210, Atheros AR9300, QCA6174; runs on Linux with custom kernel modules. 
+- **Relevance to BFLD**: PicoScenes can simultaneously capture CSI and BFI from the + same frame sequence, enabling the CSI+BFI fusion path described in the BFLD spec + (`csi_matrix` optional input). The rvcsi adapter layer (`vendor/rvcsi/`) already + handles the Nexmon PCap format; a PicoScenes adapter is a future extension. + +### 3.3 Nexmon CSI (BCM43455c0) + +- **Source**: https://github.com/seemoo-lab/nexmon_csi +- **Hardware**: Raspberry Pi 4/5 with BCM43455c0 chip — the same hardware used in + `cognitum-v0` (Pi 5 appliance in this fleet, see CLAUDE.local.md). +- **Capabilities**: per-subcarrier complex CSI in monitor mode; 4×4 MIMO on Pi 5 with + BCM43456. +- **Relevance to BFLD**: the rvcsi nexmon adapter already routes PCap frames from this + hardware into the wifi-densepose pipeline. BFI extraction on the same hardware requires + an additional sniffer for CBFR frames alongside the CSI sniffer. + +### 3.4 Atheros CSI Tool / iwlwifi CSI + +- Legacy tools for Intel and Atheros NICs; require kernel module injection. Not relevant + to the current hardware fleet (ESP32-S3 + Raspberry Pi 5), but documented here for + completeness and for future Intel AX210-based deployments. + +--- + +## 4. Identity Inference Attacks + +### 4.1 BFId (ACM CCS 2025) + +**Reference**: Todt, Morsbach, Strufe; KIT. ACM CCS 2025. +https://dl.acm.org/doi/10.1145/3719027.3765062 +https://publikationen.bibliothek.kit.edu/1000185756 +Dataset: https://ps.tm.kit.edu/english/bfid-dataset/index.php + +BFId is the first published identity-inference attack that uses BFI exclusively (no +CSI). The methodology: + +1. **Dataset**: 197 individuals, multiple sessions, multiple AP angles. Each subject + walked a defined path while their STA continuously triggered BFI exchanges. CSI + was also recorded simultaneously for comparison. +2. **Feature extraction**: temporal sequences of Phi/Psi angle matrices, windowed at + varying lengths. 
Basic statistical features (mean, variance, cross-subcarrier + correlation) fed a shallow classifier. +3. **Results**: re-identification accuracy >90% with as little as 5 seconds of BFI. + Performance was robust to different walking styles and viewing angles — consistent + with the hypothesis that anthropometric body shape (torso width, stride, limb + geometry) rather than gait phase is the primary discriminator. +4. **Comparison to CSI**: BFI-only accuracy was comparable to CSI-only accuracy for + identity tasks, despite BFI being a compressed representation. This confirms that + the Givens angle compression preserves identity-discriminative variance. + +### 4.2 LeakyBeam (NDSS 2025) + +**Reference**: Xiao, Chen, He, Han, Han; Zhejiang U., NTU, KAIST. NDSS 2025. +https://www.ndss-symposium.org/ndss-paper/lend-me-your-beam-privacy-implications-of-plaintext-beamforming-feedback-in-wifi/ + +LeakyBeam targets occupancy detection (is a person present?) rather than identity. +Key findings: + +- BFI is detectable through walls at 20 m range with commodity hardware. +- True positive rate 82.7%, true negative rate 96.7% in real-world evaluation. +- The attack works because BFI encodes motion-induced channel perturbations even through + obstacles — the Phi/Psi angle variance changes measurably when a body enters the room. +- The defense (obfuscating BFI before transmission) requires minimal hardware changes. + +**Implication for BFLD**: if a passive attacker with no relationship to the AP can +detect occupancy, then the BFLD node is implicitly broadcasting presence information +unless active obfuscation is deployed at the STA firmware level. BFLD cannot prevent +this passive attack — it can only ensure the *node's own output* does not additionally +leak identity. 
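
The variance effect that LeakyBeam relies on is the same signal a BFLD node uses for its own presence output. A minimal sketch of the principle, assuming a single ψ angle tracked on one subcarrier (ψ does not wrap, so a plain variance is meaningful) and a hand-picked threshold; the function names and the threshold are illustrative, not taken from either paper or from the BFLD spec:

```python
from statistics import pvariance
from typing import List, Sequence


def windowed_variance(samples: Sequence[float], window: int) -> List[float]:
    """Population variance over each sliding window of `window` samples."""
    if window < 1 or window > len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    return [pvariance(samples[i:i + window])
            for i in range(len(samples) - window + 1)]


def motion_flags(samples: Sequence[float], window: int,
                 threshold: float) -> List[bool]:
    """True wherever the windowed variance exceeds the threshold."""
    return [v > threshold for v in windowed_variance(samples, window)]
```

A real detector would need per-environment calibration of the threshold and aggregation across subcarriers; this only shows why a static channel and a perturbed channel are separable at all.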
+ +### 4.3 Prior RF-Based Gait and Biometric Inference + +Before BFI-specific attacks, the threat landscape was already established through +CSI-based attacks: + +- **Gait from CSI**: WiGait (2017), Wi-Gait (ScienceDirect 2023, + https://www.sciencedirect.com/science/article/abs/pii/S1389128623001962), + Gait+Respiration ID (IEEE Xplore 2021, + https://ieeexplore.ieee.org/document/9488277) all demonstrate >90% gait-based + re-identification from standard WiFi. +- **Breathing biometrics**: Respiration rate and depth are person-specific at a + population level. IEEE 802.11 CSI captures breathing as amplitude oscillations at + 0.1–0.5 Hz. +- **Anthropometric inference**: Hand size, torso width, and limb geometry modulate the + channel; classifiers trained on activity data have been shown to leak anthropometrics + as a side effect. + +The BFId finding that BFI achieves comparable accuracy to CSI for identity is consistent +with this prior body of work — it simply demonstrates the attack is achievable with a +lower barrier to entry. + +--- + +## 5. Privacy-Preserving Sensing: Current State of the Art + +### 5.1 Differential Privacy on RF Embeddings + +"Differentially Private Feature Release for Wireless Sensing: Adaptive Privacy Budget +Allocation on CSI Spectrograms" (https://arxiv.org/pdf/2512.20323) applies Laplace/ +Gaussian mechanisms to CSI spectrograms, calibrating epsilon per subcarrier based on +empirical sensitivity. Results show meaningful reduction in identity-inference accuracy +while preserving activity-recognition utility at epsilon = 1.0–4.0. + +BFLD's `identity_risk_score` could be used as an adaptive epsilon selector: high-risk +frames receive a tighter privacy budget (more noise), low-risk frames pass unmodified. +This is a forward-looking integration not in the current spec. 
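
Since this integration is not in the current spec, the following is only a sketch of what an adaptive epsilon selector could look like. The linear mapping, the pass-through cutoff, and all names are assumptions; the 1.0 to 4.0 epsilon range is the one reported above. Note that frames passed through unmodified carry no differential-privacy guarantee.

```python
import random
from typing import List, Sequence

EPS_LOW_RISK = 4.0    # loosest budget, taken from the range cited above
EPS_HIGH_RISK = 1.0   # tightest budget, taken from the range cited above


def epsilon_for_risk(risk: float) -> float:
    """Linearly map a risk score in [0, 1] to an epsilon in [1.0, 4.0]."""
    r = min(max(risk, 0.0), 1.0)
    return EPS_LOW_RISK + (EPS_HIGH_RISK - EPS_LOW_RISK) * r


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(values: Sequence[float], risk: float, sensitivity: float,
              rng: random.Random, pass_below: float = 0.2) -> List[float]:
    """Add Laplace noise scaled by sensitivity / epsilon(risk).

    Frames whose risk is below `pass_below` (a hypothetical cutoff) are
    returned unmodified, mirroring the "low-risk frames pass" idea.
    """
    if risk < pass_below:
        return list(values)
    scale = sensitivity / epsilon_for_risk(risk)
    return [v + laplace_noise(scale, rng) for v in values]
```

Passing a seeded `random.Random` keeps the sketch reproducible in tests; a deployment would need a cryptographically secure noise source instead.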
+ +### 5.2 Federated / Local-Only Inference + +The consensus across 2024–2025 literature on wireless federated learning +(https://arxiv.org/pdf/2603.19040, https://arxiv.org/pdf/2109.09142) is that +local differential privacy (LDP) with gradient perturbation is achievable on resource- +constrained edge devices. For BFLD's use case the critical property is simpler: the +identity embedding never needs to leave the node. There is no federated learning step +for identity. The risk score is a local computation whose output is published; the +embedding that produced it is not. + +### 5.3 ZK Attestation for Sensing + +ZK-SenseLM (https://arxiv.org/pdf/2510.25677) proposes zero-knowledge proofs that a +sensing model's output derives from legitimate data. This is architecturally close to +ADR-028's witness-bundle approach. Future BFLD work could use ZK proofs to attest that +the identity_risk_score was computed from the claimed input without revealing the input. + +### 5.4 "Protecting Human Activity Signatures in Compressed IEEE 802.11 CSI Feedback" + +(https://arxiv.org/pdf/2512.18529) — This 2024 paper directly addresses activity- +signature leakage in CBFR frames and proposes perturbation of Phi/Psi angles at the STA +before transmission. The defense is the dual of BFLD's approach: BFLD detects leakage +at the receiver; this paper proposes suppression at the transmitter. Both approaches +are complementary. + +--- + +## 6. Relationship to Existing Project ADRs + +**ADR-027 (MERIDIAN cross-environment generalization)**: BFLD's cross-room hash +rotation directly instantiates the "no cross-site correlation" invariant that MERIDIAN +assumes for privacy-safe multi-room deployment. + +**ADR-028 (ESP32 capability audit + witness verification)**: The deterministic-proof +pattern (`verify.py` + SHA-256 expected hash) is the template for BFLD's own acceptance +test. BFLD must produce a deterministic frame hash given the same input — acceptance +criterion 6 in the spec. 
+ +**ADR-024 (AETHER contrastive CSI embedding)**: BFLD reuses the AETHER embedding +infrastructure for its identity_risk measurement. The risk score is a function of how +separable the current embedding is from the population of known embeddings. + +**ADR-029/030 (RuvSense multistatic + field model)**: BFLD's `cross_perspective_ +consistency` component of the risk formula requires correlation across multiple sensor +viewpoints — the multistatic infrastructure from ADR-029 provides this. + +**ADR-032 (multistatic mesh security hardening)**: The BFLD threat model is a +superset of the security model in ADR-032. ADR-032 covers mesh compromise; BFLD adds +the passive sniffing threat at the management-plane layer. + +--- + +## 7. Open Technical Questions + +1. **BFI capture on ESP32-S3**: The ESP32-S3's `esp_wifi_csi_set_config` API provides + CSI via the vendor-specific Espressif HT20 format. It does not expose VHT/HE CBFR + frames. BFI capture on this hardware likely requires host-side sniffing (Pi 5 + + Nexmon in monitor mode, already available on cognitum-v0). + +2. **Quantization resolution degradation**: At 4 bits for φ and 2 bits for ψ (802.11ax + defaults), the angle resolution is coarser than in 802.11ac (7/5 bits). The BFId + paper used 802.11ac hardware. BFLD must validate that the identity_risk_score + calibration remains valid at lower quantization. + +3. **WiFi 7 (802.11be) changes**: 802.11be introduces multi-link operation (MLO) and + may change the sounding/feedback cadence. BFLD's frame format (magic 0xBF1D_0001, + version byte) is designed to accommodate future protocol versions. diff --git a/docs/research/BFLD/02-soul.md b/docs/research/BFLD/02-soul.md new file mode 100644 index 00000000..671531ba --- /dev/null +++ b/docs/research/BFLD/02-soul.md @@ -0,0 +1,141 @@ +# BFLD Soul — Architectural Intent and Ethical Stance + +## 1. 
The Central Metaphor: Immune System, Not Surveillance Lens + +An immune system does not catalog every pathogen it encounters. It classifies threats +by type, responds proportionally, and keeps its detailed records local to the organism. +When the immune system flags a cell as dangerous, it does not broadcast the cell's +identity to the outside world — it takes local action. + +BFLD is built around this same principle. Its job is to detect when RF data is crossing +from the realm of "ambient sensing" into the realm of "identity record" — and to respond +locally: raise the risk score, restrict what leaves the node, rotate identifiers. It does +not produce identity; it guards against the accidental production of identity. + +This distinction matters because the same physical signal that drives BFLD's presence +detection is also the signal that academic attackers (BFId, LeakyBeam) exploit for +re-identification. BFLD cannot suppress the underlying physics. What it can do is make +the node's *output* non-identifying, even when the node's *input* is capable of +supporting identification. + +--- + +## 2. Distinguishing Identity from the Rest of WiFi Sensing + +WiFi sensing produces a spectrum of information: + +| Output | Privacy class | Reversibility | +|--------|--------------|---------------| +| Presence (yes/no) | 2 — anonymous | Not reversible to identity | +| Motion magnitude (0..1) | 1 — derived | Not reversible to identity | +| Person count (integer) | 1 — derived | Not reversible to identity | +| Zone activity | 1 — derived | Not reversible to identity | +| Identity risk score | 1 — derived | Risk score, not identity | +| RF signature hash | 1 — derived | Hash rotates daily; not reversible | +| Identity embedding | 0 — raw | Directly reversible to biometric | +| Raw BFI matrix | 0 — raw | Directly reversible to biometric | + +BFLD's design follows this table structurally: the outputs in privacy class 0 never +leave the node. 
The outputs in class 1 leave the node only after explicit operator opt-in +for the sensitive ones (identity_risk_score). The outputs in class 2 flow freely. + +This table is not a policy list — it is wired into the frame format. The `privacy_class` +byte in every `BfldFrame` is checked at the emitter boundary before any byte leaves the +node. Code that wants to send class-0 data must positively bypass a compile-time safety +check, not merely forget to set a flag. + +--- + +## 3. Three Non-Negotiable Invariants + +These are not configurable options. They are structural properties of BFLD that +hold regardless of operator configuration: + +### Invariant 1: Raw BFI Never Leaves the Node + +The BFI matrix, once ingested by the BFLD extractor, is consumed locally and never +serialized to any outbound channel. This is enforced in two ways: + +1. The `BfldFrame` struct's `bfi_matrix` field is not part of the serializable payload + — it exists only as a private field in `extractor.rs` and is dropped after + feature extraction completes. +2. The MQTT emitter (`mqtt.rs`) has no code path that serializes a BFI matrix. + The `ruview//bfld/raw/state` topic is disabled by default and, when + enabled, publishes only a metadata summary (subcarrier count, timestamp, SNR range), + not the angle matrices. + +### Invariant 2: Identity Embedding Is Local-Only + +The embedding computed by the RuVector pipeline (used to calculate `identity_risk_score`) +lives in an in-RAM ring buffer with a configurable retention window (default: 10 minutes). +It is never written to disk. It is never serialized to any MQTT topic. It is never +included in any `BfldFrame` payload even at `privacy_class = 0` — raw means raw angles, +not the derived embedding. + +The mathematical property that enables this: `identity_risk_score` can be computed as a +scalar from the embedding (separability × temporal_stability × cross_perspective_ +consistency × sample_confidence) without revealing the embedding itself. 
The score is a +projection onto a scalar; the full vector is not required by any downstream consumer. + +### Invariant 3: Cross-Site Identity Matching Is Structurally Impossible + +The `rf_signature_hash` is computed as: + + blake3(site_salt ‖ day_epoch ‖ ephemeral_features) + +where `site_salt` is a secret generated at first boot, stored in NVS, and never +transmitted. Two BFLD nodes at two different sites will produce hashes in disjoint +hash spaces by construction. Even an adversary who obtains the hash stream from +both nodes cannot determine whether the same person visited both sites, because the +site_salt is unknown and different. + +The daily rotation (`day_epoch` = floor(timestamp_ns / 86400e9)) means that even within +a single site, the hash of the same person changes each day. Hashes older than 24 hours +have zero correlation with hashes produced today. + +This is structural impossibility, not policy. The invariant holds even if the operator +misconfigures the system, because it derives from the cryptographic property of blake3 +with a secret key, not from access-control rules. + +--- + +## 4. Relationship to RuView's Ambient Intelligence Positioning + +The project memory records RuView's positioning as "ambient intelligence platform, not +sensor; packaging (HA, Docker, mDNS, blueprints) is the bottleneck." This framing is +load-bearing for BFLD's design. + +A "sensor" in the Home Assistant model is a device that reports measurements. A "sensor" +is allowed to identify who is present — facial recognition cameras are sensors. BFLD +explicitly rejects this model: the node is an ambient intelligence node that knows +something about the environment (motion, occupancy, activity level) but structurally +cannot know *who* is in the environment. + +This positioning enables deployment in spaces where identity-tracking would be +unacceptable: shared workspaces, guest accommodations, hotel rooms, care facilities. 
+The argument to an operator at a care facility is not "trust us, we won't log who your +patients are." It is: "the system is architecturally incapable of logging who your +patients are, because the identifier rotates daily with a site-specific secret we don't +hold." + +--- + +## 5. Why This Layer Must Exist Before WiFi 7 Ships + +802.11be (Wi-Fi 7) is entering mass market deployment in 2025–2026. It introduces +multi-link operation (MLO), which dramatically increases the frequency of beamforming +sounding exchanges. Where 802.11ax sounding might occur at 10–40 Hz, MLO sounding on +multiple links simultaneously could produce 3–5× more CBFR frames per second. + +More frames mean more training data for identity classifiers. The BFId result at 5 +seconds of 802.11ac data will almost certainly improve with 5 seconds of 802.11be MLO +data. The attack surface is not static. + +BFLD's frame format (magic 0xBF1D_0001, version byte for extension) is designed to +remain valid across protocol generations. The feature extraction modules are pluggable: +a WiFi 7 BFI extractor can be added without changing the privacy gate, the hash rotation, +or the MQTT emitter. The invariants remain invariant. + +The window to establish safe defaults is now, before the installed base is hundreds of +millions of unprotected nodes. BFLD is the layer that carries those safe defaults into +every deployment from day one. diff --git a/docs/research/BFLD/03-security-threat-model.md b/docs/research/BFLD/03-security-threat-model.md new file mode 100644 index 00000000..b388a65c --- /dev/null +++ b/docs/research/BFLD/03-security-threat-model.md @@ -0,0 +1,278 @@ +# BFLD Security Threat Model + +## 1. Adversary Classes + +### A1 — Passive Sniffer (Curious Neighbor) + +**Capability**: WiFi adapter in monitor mode; consumer laptop running Wi-BFI or +tcpdump with CBFR filter. No special access, no relationship to the target network. 
+ +**Goal**: Determine occupancy or identity of persons in an adjacent apartment/office. + +**Effort**: Low. Wi-BFI is pip-installable. Monitor mode is available on commodity +Linux laptops. No prior knowledge of the target network required — CBFR frames are +broadcast in all directions. + +**Relevance to BFLD**: A1 is the LeakyBeam threat (NDSS 2025). BFLD cannot prevent +A1 from capturing BFI from the air. BFLD's job is to ensure its own output does not +make A1's work easier by publishing identity-correlated data on reachable channels. + +### A2 — Targeted Stalker + +**Capability**: A1 capabilities plus knowledge of the target's device MAC address +(obtainable from BSSID probe requests) and time correlation with known schedules. + +**Goal**: Track a specific individual's presence across time or across locations. + +**Effort**: Medium. Requires sustained monitoring (hours to days) and a correlation +step. + +**Relevance to BFLD**: If rf_signature_hash were stable over time, A2 could correlate +hash sequences across sessions to confirm a specific person's schedule. The daily hash +rotation (Invariant 3) severs this correlation. + +### A3 — ISP / Operator + +**Capability**: Access to MQTT broker, HA instance, or cloud integration receiving +BFLD events. + +**Goal**: Build behavioral profiles of occupants across many homes/installations. + +**Effort**: Low if raw or identity-correlated fields are published to the broker. + +**Relevance to BFLD**: BFLD restricts what reaches the broker. An operator cannot +accidentally publish identity-correlated data because the privacy gate blocks it at +the node boundary. + +### A4 — Nation-State / Law Enforcement + +**Capability**: Compelled access to cloud storage, MQTT broker logs, or HA history. +Physical access to the BFLD node with forensic tools. + +**Goal**: Retrospectively identify who was present at a location and when. + +**Effort**: Depends on what data was logged. 
If BFLD's invariants hold, the broker +holds only: presence events (boolean), motion scores (float), person counts (integer), +and rotated hashes. None of these are individually re-identifiable. + +**Relevant mitigation**: The daily hash rotation means that even log retention is +privacy-preserving: a hash from Monday and a hash from Tuesday, even from the same +person at the same node, are in disjoint hash spaces. + +### A5 — Compromised AP Firmware + +**Capability**: Malicious AP firmware that modifies the sounding schedule to extract +more identity-discriminative BFI, or that responds to specially crafted packets with +high-resolution channel feedback. + +**Goal**: Improve passive capture quality from the node's BFI stream. + +**Relevance to BFLD**: BFLD ingests BFI as captured from the air. If the AP is +compromised to produce unusually high-resolution BFI, BFLD's identity_risk_score +will correctly detect the elevated separability and flag the frames at higher risk. +The system is self-normalizing to the quality of what is captured. + +### A6 — Supply-Chain Compromise of RuView Node + +**Capability**: Modified BFLD binary with the privacy gate removed or with an +exfiltration path added. + +**Goal**: Long-term silent collection of identity embeddings or raw BFI. + +**Mitigation**: ADR-028's witness-bundle pattern — deterministic SHA-256 of the +pipeline output. A compromised binary would produce different output for the same +input, failing the verify.py check. The BFLD acceptance criterion 6 (deterministic +frame hashes) is the direct countermeasure. + +--- + +## 2. 
Attack Trees + +### AT-1: Passive BFI Capture → Identity Inference + +``` +Attacker Goal: Re-identify a specific person via BFI +| ++-- Step 1: Place WiFi adapter in monitor mode (A1) +| | +| +-- CBFR frames arrive unencrypted (established by NDSS 2025 / BFId) +| ++-- Step 2: Parse Phi/Psi angles using Wi-BFI or equivalent +| | +| +-- No modification of target device required (Wi-BFI passive) +| ++-- Step 3: Collect 5-60 seconds of frames +| | +| +-- BFId: 5s sufficient at 10 Hz sounding rate for >90% accuracy +| ++-- Step 4: Run identity classifier (BFId architecture or similar) +| | +| +-- Requires enrollment (prior reference capture) +| | | +| | +-- OR: exploit BFLD's rf_signature_hash as a correlation anchor +| | (mitigated by daily rotation — AT-2 below) +| ++-- Outcome: Identity label with >90% confidence +``` + +BFLD mitigation: BFLD does not prevent AT-1 at the air interface. It ensures that +BFLD's own output does not provide the "correlation anchor" in step 4. + +### AT-2: Cross-Site Correlation via rf_signature_hash Leak + +``` +Attacker Goal: Confirm person X visited site A and site B on the same day +| ++-- Prerequisite: Attacker has read access to MQTT broker at both sites +| ++-- Step 1: Collect rf_signature_hash sequences from site A and site B +| ++-- Step 2: Look for matching hashes within the same day_epoch +| | +| +-- BLOCKED: site_salt is site-specific and secret. +| blake3(salt_A ‖ day ‖ features) != blake3(salt_B ‖ day ‖ features) +| even if features are identical. +| Two sites with the same person produce hashes in disjoint spaces. +| ++-- Outcome: No match possible. Attack fails structurally. 
+``` + +### AT-3: Timing Side-Channel on identity_risk_score + +``` +Attacker Goal: Infer when a known person is present by monitoring risk score changes +| ++-- Prerequisite: Read access to MQTT topic ruview//bfld/identity_risk/state +| ++-- Step 1: Baseline: collect identity_risk_score during known-empty periods +| ++-- Step 2: Monitor for anomalous spikes correlated with known schedules +| | +| +-- Partial mitigation: risk score is not published by default. +| | Operator must explicitly enable it. +| | +| +-- Residual risk: even with publication enabled, the score measures risk of +| identification, not identity itself. A high risk score means "this frame +| is identity-discriminative" not "person X is present." +| ++-- Mitigation: MQTT ACL restricts identity_risk to local broker by default. ++-- Mitigation: privacy_class=3 (restricted) zeros the risk score on output. +``` + +### AT-4: MQTT Topic Enumeration + +``` +Attacker Goal: Discover what BFLD data is published and harvest it +| ++-- Step 1: Connect to broker without TLS (if TLS not configured) +| ++-- Step 2: Subscribe to ruview/# wildcard +| ++-- Mitigation: Default mosquitto ACL denies wildcard subscription to anonymous clients. ++-- Mitigation: TLS + client certificates recommended for all BFLD deployments. ++-- Mitigation: ruview//bfld/raw/state is disabled by default. +``` + +### AT-5: Matter Cluster Abuse + +``` +Attacker Goal: Extract identity-correlated data via the Matter protocol integration +| ++-- Step 1: Join the Matter fabric as a legitimate controller +| ++-- Step 2: Read clusters exposed by the BFLD Matter endpoint +| | +| +-- Available: OccupancySensing (presence), MotionSensor (motion), +| PeopleCount (person_count) +| | +| +-- NOT AVAILABLE: identity_risk_score, rf_signature_hash, raw_bfi, +| identity_embedding — these are rejected at the Matter boundary. +| ++-- Outcome: Attacker gets presence/motion/count — same as any occupancy sensor. 
+ No identity-correlated data is accessible via Matter. +``` + +--- + +## 3. Trust Boundary Diagram + +``` +┌────────────────────────────────────────────────────────────────────────┐ +│ BFLD NODE (local) │ +│ │ +│ WiFi air interface │ +│ │ CBFR frames (unencrypted, passively sniffable by any A1) │ +│ ▼ │ +│ ┌──────────────┐ raw BFI ┌──────────────┐ │ +│ │ BFI │──────────────│ Feature │ │ +│ │ Extractor │ (local RAM) │ Extractor │ │ +│ └──────────────┘ └──────┬───────┘ │ +│ │ features (not BFI) │ +│ ▼ │ +│ ┌──────────────┐ embedding │ +│ │ Identity │──────────────┐ │ +│ │ Risk Engine │ (local RAM │ │ +│ └──────┬───────┘ ring buf) │ │ +│ │ risk_score │ │ +│ ▼ │ │ +│ ┌───────────────────────────────────────────────────────┐ │ │ +│ │ Privacy Gate │ │ │ +│ │ privacy_class check | hash rotation | field masking │ │ │ +│ └───────┬──────────────────────────────────────────────┘ │ │ +│ │ filtered BfldFrame [embedding │ │ +│ │ (no raw BFI, no embedding) NEVER exits │ │ +│ ▼ this box] │ │ +│ ┌──────────────┐ │ │ +│ │ MQTT │ presence/motion/person_count/risk(opt) │ │ +│ │ Emitter │────────────────────────────────────────► │ │ +│ └──────────────┘ [TLS recommended] │ │ +│ │ │ +└──────────────────────────────────────────────────────────────┘─────────┘ + │ + │ MQTT (TLS) + ▼ +┌─────────────────────┐ ┌──────────────────────────────────────┐ +│ Local Broker │ │ cognitum-v0 federation endpoint │ +│ (mosquitto) │──────► │ (identity fields STRIPPED at node │ +└────────┬────────────┘ │ boundary before federation) │ + │ └──────────────────────────────────────┘ + │ + ▼ +┌─────────────────────┐ ┌──────────────────────────────────────┐ +│ Home Assistant │──────► │ Matter Fabric │ +│ (presence/motion/ │ │ (OccupancySensing / MotionSensor / │ +│ person_count only)│ │ PeopleCount ONLY) │ +└─────────────────────┘ └──────────────────────────────────────┘ +``` + +--- + +## 4. 
Threat Profile per privacy_class Value + +| privacy_class | Value | Data exposed outbound | Residual threats | +|--------------|-------|----------------------|-----------------| +| raw | 0 | Derived angles + amplitude proxy + phase proxy + SNR. Never BFI matrix. | Angle sequences are identity-discriminative; use only in controlled research environments. Never default. | +| derived | 1 | All BFLD output fields including identity_risk_score and rf_signature_hash. | Risk score timing side-channel (AT-3). Hash must remain rotated. | +| anonymous | 2 | presence, motion, person_count, zone_activity, confidence. No identity-correlated fields. | Temporal occupancy patterns may leak schedule information. Not identity. | +| restricted | 3 | presence only (binary). All other fields zeroed or suppressed. | Minimal. On/off presence is equivalent to a passive IR sensor. | + +--- + +## 5. Witness / Attestation Strategy + +Following ADR-028's pattern, BFLD should produce a deterministic proof bundle: + +1. **Reference input**: a fixed seed synthetic BFI matrix (512 bytes, PRNG seed=117) + stored alongside the test suite. +2. **Expected output hash**: SHA-256 of the serialized `BfldFrame` produced from that + input, committed to the repository. +3. **CI check**: `verify_bfld.py` — same structure as `archive/v1/data/proof/verify.py` + — runs in CI and locally. A compromised binary (A6 threat) would change the output + hash and immediately fail this check. +4. **Witness log**: extend `docs/WITNESS-LOG-028.md` with a BFLD section covering the + privacy gate and hash rotation. + +This attestation does not prevent a runtime compromise, but it raises the cost +significantly: a supply-chain attacker must either (a) match the expected output hash +while also exfiltrating data (computationally infeasible for a hash adversary), or +(b) accept that the tampered binary will be detected on the next verify run. 
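
The four attestation steps above can be sketched as follows. Everything here is a stand-in: `serialize_frame_stub` is a hypothetical placeholder for the real `BfldFrame` serializer, the little-endian magic encoding is an assumption, and the Python PRNG is used only to get a repeatable 512-byte input (a real fixture would be committed as a file so that the Rust implementation and the checker read identical bytes).

```python
import hashlib
import random


def reference_input(seed: int = 117, size: int = 512) -> bytes:
    """Deterministic synthetic input; seed and size follow the text above."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(size))


def serialize_frame_stub(bfi: bytes) -> bytes:
    """Hypothetical stand-in for BfldFrame serialization.

    Emits the magic number followed by a digest of the input, so the
    output depends on every input byte without containing any of them.
    """
    magic = (0xBF1D0001).to_bytes(4, "little")  # byte order is an assumption
    return magic + hashlib.sha256(bfi).digest()


def frame_hash(bfi: bytes) -> str:
    """SHA-256 of the serialized frame, as committed to the repository."""
    return hashlib.sha256(serialize_frame_stub(bfi)).hexdigest()


def verify(expected_hex: str) -> bool:
    """Recompute the hash from the reference input and compare."""
    return frame_hash(reference_input()) == expected_hex
```

In CI the expected value would be read from a committed file rather than recomputed, so that a modified pipeline cannot also regenerate its own reference.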
diff --git a/docs/research/BFLD/04-privacy-gating.md b/docs/research/BFLD/04-privacy-gating.md new file mode 100644 index 00000000..6d001be4 --- /dev/null +++ b/docs/research/BFLD/04-privacy-gating.md @@ -0,0 +1,279 @@ +# BFLD Privacy Gating — Mechanisms in Depth + +## 1. The privacy_class Byte: Concrete Data Exposure Tables + +The `privacy_class` byte is the single authoritative classifier for what a BFLD node +is permitted to emit. It is set by the privacy gate module (`privacy_gate.rs`) on every +outbound `BfldFrame` based on the computed `identity_risk_score` and operator configuration. + +### Class 0 — raw + +Intended exclusively for local research captures and red-team validation. Not a +deployable configuration. + +| Field | Published | Notes | +|-------|-----------|-------| +| presence | Yes | Boolean | +| motion | Yes | 0..1 float | +| person_count | Yes | u8 | +| identity_risk_score | Yes | f32 | +| rf_signature_hash | Yes | Rotated blake3, 32 bytes hex | +| zone_activity | Yes | | +| confidence | Yes | | +| compressed_angle_matrix | Yes | Phi/Psi per subcarrier — the sensitive surface | +| amplitude_proxy | Yes | | +| phase_proxy | Yes | | +| snr_vector | Yes | | +| bfi_matrix (raw) | NEVER | Dropped before serialization; not in wire format | +| identity_embedding | NEVER | Local RAM only; not in wire format | + +### Class 1 — derived + +Default for operator-opted-in diagnostics. Includes identity_risk_score and hash but +no angle matrices. 
+ +| Field | Published | Notes | +|-------|-----------|-------| +| presence | Yes | | +| motion | Yes | | +| person_count | Yes | | +| identity_risk_score | Yes | Diagnostic; not in HA default entities | +| rf_signature_hash | Yes | Rotated hash only | +| zone_activity | Yes | | +| confidence | Yes | | +| compressed_angle_matrix | No | Zeroed | +| amplitude_proxy | No | | +| phase_proxy | No | | +| snr_vector | Yes | Per-stream aggregate only | +| bfi_matrix (raw) | NEVER | | +| identity_embedding | NEVER | | + +### Class 2 — anonymous + +Default for all standard deployments. No identity-correlated fields. + +| Field | Published | Notes | +|-------|-----------|-------| +| presence | Yes | | +| motion | Yes | | +| person_count | Yes | | +| identity_risk_score | No | Suppressed | +| rf_signature_hash | No | Suppressed | +| zone_activity | Yes | | +| confidence | Yes | | +| All angle/amplitude/phase fields | No | Zeroed | +| bfi_matrix (raw) | NEVER | | +| identity_embedding | NEVER | | + +### Class 3 — restricted + +Maximum privacy. Suitable for care facilities, medical deployments, guest spaces. + +| Field | Published | Notes | +|-------|-----------|-------| +| presence | Yes | | +| motion | No | Suppressed | +| person_count | No | Suppressed | +| All other fields | No | | +| bfi_matrix (raw) | NEVER | | +| identity_embedding | NEVER | | + +--- + +## 2. 
rf_signature_hash Rotation Algorithm + +### Construction + +``` +site_salt := blake3_keyed_hash(secret="bfld-site-seed", data=node_mac_address) + # Generated once at first boot, stored in NVS, never transmitted + # 32 bytes + +day_epoch := floor(timestamp_ns / 86_400_000_000_000) + # One new epoch per UTC day + +ephemeral := mean_angle_delta ‖ subcarrier_variance ‖ burst_motion_score + # A small fixed-length summary of the current window's features + # Not identity-specific — any of several persons could produce + # similar values + +rf_signature_hash := BLAKE3( + key = site_salt, # 32 bytes; site-specific secret key + input = day_epoch_bytes(8) ‖ ephemeral_features(24) +) +``` + +### Why cross-site re-identification is structurally impossible + +Two BFLD nodes at sites A and B produce: + +``` +hash_A = BLAKE3(key=salt_A, input=day ‖ features) +hash_B = BLAKE3(key=salt_B, input=day ‖ features) +``` + +Keyed BLAKE3 is assumed to be a PRF (pseudorandom function family) keyed on site_salt. Given identical +`day ‖ features` inputs, hash_A and hash_B are pseudorandom and independent because +salt_A != salt_B. An adversary who observes hash_A and hash_B cannot determine whether +they correspond to the same person without knowing both salts. + +This is not a security proof; it is a consequence of BLAKE3's PRF security assumption, +which holds as long as the site_salt remains secret. The salt must therefore not be +derivable from public values alone: the node MAC address is visible on the air and the +seed string is fixed, so the derivation needs to include random bytes generated at first boot. + +### Why within-site, within-day tracking is safe + +Within a single day at a single site, two frames from the same person will produce +similar ephemeral features. A keyed hash does not preserve similarity: two inputs that +differ in a single bit yield unrelated outputs. The hash values therefore match only when +the hashed feature bytes are identical, which requires removing frame-to-frame variation +(for example by quantizing the features) before hashing. Matching hashes are intentional: they allow +clustering of same-person events within a session without enabling identity recovery. + +The hash is NOT the identity. It is a pseudonym within the scope of (site, day). A +person who visits the same site on two different days gets different pseudonyms on each +day.
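The construction above can be sketched in Python. This is an illustration under stated assumptions rather than the crate's implementation: the standard library has no BLAKE3, so keyed BLAKE2b from `hashlib` is swapped in as the PRF; the `quantize` step and its 0.05 bucket width are hypothetical; and the byte layout (big-endian, 8 + 24 bytes) is one possible reading of `day_epoch_bytes(8) ‖ ephemeral_features(24)`.

```python
import hashlib
import struct
from typing import Sequence

NS_PER_DAY = 86_400_000_000_000


def day_epoch(timestamp_ns: int) -> int:
    """Number of whole UTC days since the Unix epoch."""
    return timestamp_ns // NS_PER_DAY


def quantize(value: float, step: float = 0.05) -> int:
    """Assumed bucketing so near-identical features hash to identical bytes."""
    return round(value / step)


def rf_signature_hash(site_salt: bytes, timestamp_ns: int,
                      features: Sequence[float]) -> bytes:
    """Keyed hash over day epoch plus three quantized features."""
    if len(site_salt) != 32:
        raise ValueError("site_salt must be 32 bytes")
    if len(features) != 3:
        raise ValueError("expected three ephemeral features")
    payload = struct.pack(">Q", day_epoch(timestamp_ns))             # 8 bytes
    payload += struct.pack(">3q", *(quantize(f) for f in features))  # 24 bytes
    # Stand-in PRF: keyed BLAKE2b from the standard library, NOT BLAKE3.
    return hashlib.blake2b(payload, key=site_salt, digest_size=32).digest()


salt_a = bytes(range(32))        # placeholder salts for illustration only
salt_b = bytes(range(32, 64))
now_ns = 20_000 * NS_PER_DAY + 5
same_person = rf_signature_hash(salt_a, now_ns, (0.42, 0.10, 0.71))
print(same_person.hex())
```

With these inputs, a second frame whose features fall in the same buckets produces the same pseudonym, while a different salt or the next day produces an unrelated one.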
+ +### Daily rotation schedule + +``` +epoch_0 = 0 # day 0 (unix epoch: 1970-01-01) +epoch_k = k * 86_400_000_000_000 # day k in nanoseconds +rotation_time = epoch_{k+1} # midnight UTC +``` + +At rotation time, all existing rf_signature_hash values become cryptographically +disconnected from future values. Logs from before rotation cannot be correlated with +logs after rotation even by the node operator. + +--- + +## 3. Identity Embedding Lifecycle + +``` +BFI frame arrives + | + v +Feature extraction (identity_risk.rs) + | + v +RuVector embedding computed: Vec + | + +-------> identity_risk_score (scalar projection) + | Published (class 1) or suppressed (class 2/3) + | + v +In-RAM ring buffer (EmbeddingRingBuf) + - capacity: 600 frames (default 10 minutes at 1 Hz) + - implemented as VecDeque in heap memory + - NEVER written to disk (no serde, no file I/O in the type) + - NEVER serialized to any MQTT or HTTP path + - Cleared on node restart (RAM is volatile) + | + v [after retention window] +Dropped from ring buffer +``` + +The ring buffer serves two purposes: (1) temporal_stability calculation requires +comparing the current embedding to recent embeddings; (2) the coherence gate +(`coherence_gate.rs`, from `v2/crates/wifi-densepose-signal/src/ruvsense/`) uses +recent frames to determine whether a new frame is a continuation of an existing +trajectory or a new event. + +Both purposes require only that the embeddings exist in RAM during the computation. +Neither purpose requires persistence. + +--- + +## 4. Privacy-Mode Wire-Format Diff + +The following shows what changes in the serialized `BfldFrame` payload when the node +transitions from class 1 (derived) to class 2 (anonymous), which is the transition +that happens when `privacy_mode` is enabled by the operator. 
+ +``` +BfldFrame { + magic: 0xBF1D_0001, // unchanged + version: 1, // unchanged + ap_id: blake3(node_mac ‖ "ap"), // unchanged (already hashed at ingress) + sta_id: ephemeral_u64, // unchanged (already ephemeral) + session_id: u64, // unchanged + quantization: 0x02, // unchanged (i8 in class 1) + privacy_class: 0x01 -> 0x02, // CHANGED + + // Payload (compressed): + compressed_angle_matrix: [...], // unchanged (already zeroed at class 1; present only at class 0) + amplitude_proxy: [...], // unchanged (already omitted at class 1) + phase_proxy: [...], // unchanged (already omitted at class 1) + snr_vector: [...], // class 1: present; class 2: present (aggregate) + + // Event (JSON within payload or outer envelope): + presence: true, // unchanged + motion: 0.42, // unchanged + person_count: 1, // unchanged + identity_risk_score: 0.71, // class 1: present; class 2: OMITTED + rf_signature_hash: "a3f2...", // class 1: present; class 2: OMITTED + zone_activity: "living_room", // unchanged + confidence: 0.88, // unchanged + payload_crc32: // recomputed after changes +} +``` + +The wire-format diff is verified by the acceptance test suite: the same input must +produce a deterministic output for each privacy_class value. + +--- + +## 5. Default-Deny Posture for Future Fields + +Every new field added to `BfldFrame` or the BFLD event JSON in the future MUST be +classified before it ships. The process: + +1. New field is added to `BfldFrame` struct. +2. A `#[privacy_class(minimum = N)]` attribute annotation (or equivalent runtime + check in `privacy_gate.rs`) declares the minimum privacy class at which this + field is suppressed. +3. Unit test asserts that serializing at class < N includes the field and at class ≥ N + omits it. +4. The PR that adds the field cannot pass CI without the classification annotation. + +This is enforced by a custom `#[must_classify]` lint in the crate: any public field +on `BfldFrame` without a classification attribute produces a compile warning that +becomes a CI error.
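The default-deny rule above can be sketched as a runtime check in Python. This is a hypothetical analogue of the planned compile-time lint, not the `privacy_gate.rs` code: the registry name, the exception type, and the sentinel value 4 for "never suppressed" are assumptions, and the per-field thresholds are read off the class tables in Section 1.

```python
# Minimum privacy class at which each field is suppressed (classes run 0..3).
# The value 4 is an assumed sentinel meaning "never suppressed".
FIELD_MIN_SUPPRESS_CLASS = {
    "presence": 4,
    "motion": 3,
    "person_count": 3,
    "zone_activity": 3,
    "confidence": 3,
    "identity_risk_score": 2,
    "rf_signature_hash": 2,
}


class UnclassifiedFieldError(Exception):
    """Raised when an event carries a field with no classification entry."""


def gate(event: dict, privacy_class: int) -> dict:
    """Return only the fields that may be published at privacy_class."""
    published = {}
    for name, value in event.items():
        if name not in FIELD_MIN_SUPPRESS_CLASS:
            # Default-deny: an unclassified field blocks the whole event.
            raise UnclassifiedFieldError(name)
        if privacy_class < FIELD_MIN_SUPPRESS_CLASS[name]:
            published[name] = value
    return published


event = {
    "presence": True,
    "motion": 0.42,
    "person_count": 1,
    "identity_risk_score": 0.71,
    "rf_signature_hash": "a3f2",
    "zone_activity": "living_room",
    "confidence": 0.88,
}
print(gate(event, 2))
```

Raising on an unknown field, instead of silently dropping it, mirrors the intent that a new field cannot ship until someone has classified it.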
+ +--- + +## 6. Auditability: Verifying That Raw BFI Never Left the Node + +An operator who wants to verify that no raw BFI or identity data has been transmitted +from their BFLD node can use the following procedure: + +### 6.1 Network-level audit (tcpdump) + +```bash +# On the node or a port-mirrored switch; stop the capture with Ctrl-C. +# Only plaintext MQTT (port 1883) can be inspected; port 8883 traffic is TLS-encrypted. +tcpdump -i eth0 -w bfld_audit.pcap port 1883 or port 8883 + +# After capture, count occurrences of the BfldFrame magic bytes in the PCAP. +# Magic 0xBF1D_0001 in big-endian is bytes BF 1D 00 01 (GNU grep is required for -P). +# A non-zero count means binary BfldFrames were sent; inspect their privacy_class byte. +LC_ALL=C grep -a -o -P '\xBF\x1D\x00\x01' bfld_audit.pcap | wc -l + +# Count printable strings that name identity-correlated fields. +strings bfld_audit.pcap | grep -c "identity_risk\|rf_signature_hash\|compressed_angle_matrix\|identity_embedding" +# Expected at privacy_class >= 2: 0 +``` + +### 6.2 Node self-check command + +```bash +# RuView CLI (planned for P3): +wifi-densepose bfld audit --duration 60s +# Output: "60 frames processed. 0 frames with raw_bfi in payload. +# 0 frames with identity_embedding in payload. +# privacy_class distribution: {2: 57, 3: 3}" +``` + +### 6.3 CI deterministic hash check + +```bash +python python/wifi_densepose/verify_bfld.py +# Must print: VERDICT: PASS +# If a modified binary is exfiltrating raw BFI as part of the payload, +# the output hash will differ from the committed expected hash. +``` diff --git a/docs/research/BFLD/05-automation-integration.md b/docs/research/BFLD/05-automation-integration.md new file mode 100644 index 00000000..1fbd88ce --- /dev/null +++ b/docs/research/BFLD/05-automation-integration.md @@ -0,0 +1,239 @@ +# BFLD Automation & Ecosystem Integration + +## 1.
Home Assistant Integration + +### 1.1 Entities Exposed by BFLD + +BFLD extends the sensing-server's existing HA entity set (ADR-115, 21 entities) with +the following new entities: + +| Entity | Type | HA Platform | privacy_class | Default | +|--------|------|-------------|--------------|---------| +| `binary_sensor.bfld_presence` | Boolean | binary_sensor | 2 — anonymous | ON | +| `sensor.bfld_motion` | Float 0..1 | sensor | 2 — anonymous | ON | +| `sensor.bfld_person_count` | Integer | sensor | 2 — anonymous | ON | +| `sensor.bfld_confidence` | Float 0..1 | sensor | 2 — anonymous | ON | +| `sensor.bfld_identity_risk` | Float 0..1 | sensor (diagnostic) | 1 — derived | OFF | +| `sensor.bfld_zone_activity` | String | sensor | 2 — anonymous | ON | + +`bfld_identity_risk` is classified as a diagnostic entity in the HA model: it is +hidden by default in the UI and not included in recorder history unless explicitly +enabled. This matches the operator opt-in posture for class-1 fields. + +### 1.2 MQTT Discovery Payload (example for presence sensor) + +```json +{ + "name": "BFLD Presence", + "unique_id": "bfld_presence_", + "state_topic": "ruview//bfld/presence/state", + "device_class": "occupancy", + "payload_on": "true", + "payload_off": "false", + "device": { + "identifiers": ["ruview_"], + "name": "RuView BFLD Node", + "model": "wifi-densepose-bfld", + "manufacturer": "RuView" + } +} +``` + +Topic: `homeassistant/binary_sensor/bfld_/presence/config` + +### 1.3 HA Blueprints + +**Blueprint 1: Presence-driven lighting** + +Trigger: `binary_sensor.bfld_presence` changes to `on`. +Condition: Time is between sunset and sunrise. +Action: Turn on `light.living_room` at 40% brightness. +Exit: `binary_sensor.bfld_presence` off for 5 minutes → turn off light. + +This blueprint uses only class-2 (anonymous) data. No identity information is required. + +**Blueprint 2: Motion-aware HVAC** + +Trigger: `sensor.bfld_motion` rises above 0.3 (active movement threshold).
+Action: Set `climate.living_room` to comfort mode. +Trigger: `sensor.bfld_motion` stays below 0.1 for 20 minutes (room settled). +Action: Set `climate.living_room` to eco mode. + +**Blueprint 3: Identity-risk anomaly notification** + +Trigger: `sensor.bfld_identity_risk` rises above 0.8 (high-risk threshold). +Condition: privacy mode is NOT enabled. +Action: Notify user via HA mobile app: "BFLD: High identity-leakage risk detected. +Consider enabling privacy mode." + +This blueprint is the only one that touches a class-1 field. The notification is +a privacy-protective action — it alerts the operator that the sensing environment +has changed (e.g., new router firmware, new AP nearby, changed room geometry) in +a way that makes the RF channel more identity-discriminative. + +--- + +## 2. Matter Exposure + +Matter clusters expose the absolute minimum set of BFLD outputs. The constraint is +intentional: Matter fabrics can include cloud bridges, and identity-correlated data +must never reach cloud endpoints. + +### 2.1 Permitted Matter Clusters + +| Matter Cluster | Cluster ID | BFLD Source | Notes | +|----------------|-----------|-------------|-------| +| Occupancy Sensing | 0x0406 | `presence` | `OccupancySensing` attribute `Occupancy` bit 0 | +| Motion Detection | 0x040E (proposed) | `motion` | Published as motion event cluster | +| People Count | — (vendor extension) | `person_count` | No standard cluster yet; use vendor attribute | + +### 2.2 Rejected Matter Fields + +The following BFLD fields MUST NOT be exposed via Matter regardless of operator +configuration: + +- `identity_risk_score` +- `rf_signature_hash` +- `raw_bfi` +- `identity_embedding` +- `compressed_angle_matrix` +- Any future field classified at privacy_class < 2 + +This rejection is enforced in the `cog-ha-matter` crate (`v2/crates/cog-ha-matter/`), +which filters `BfldFrame` events before populating Matter attribute reports. 
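The Matter boundary filter described above can be sketched as an allow-list in Python. This is illustrative only and not the `cog-ha-matter` code: the function name and the `vendor.*` attribute keys are hypothetical, and the report is built by copying permitted fields, so a field that is neither permitted nor listed as rejected still never reaches the fabric.

```python
# Fields named in Section 2.2; listed here only so drops can be logged.
MATTER_REJECTED = frozenset({
    "identity_risk_score",
    "rf_signature_hash",
    "raw_bfi",
    "identity_embedding",
    "compressed_angle_matrix",
})


def to_matter_report(event: dict):
    """Build a Matter attribute report from permitted fields only."""
    report = {}
    if "presence" in event:
        # Occupancy is a bitmap; bit 0 carries presence.
        report["Occupancy"] = 0x01 if event["presence"] else 0x00
    if "motion" in event:
        report["vendor.motion"] = float(event["motion"])              # hypothetical key
    if "person_count" in event:
        report["vendor.person_count"] = int(event["person_count"])    # hypothetical key
    dropped = sorted(MATTER_REJECTED & set(event))
    return report, dropped


report, dropped = to_matter_report({
    "presence": True,
    "motion": 0.42,
    "person_count": 1,
    "identity_risk_score": 0.71,
    "rf_signature_hash": "a3f2",
    "new_field": 1,
})
print(report, dropped)
```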
+ +### 2.3 Matter Endpoint Configuration + +``` +Endpoint 1: BFLD Occupancy + - Cluster: Occupancy Sensing (0x0406) + - Attribute 0x0000 Occupancy: 0x01 (bitmask, bit 0 = presence) + - Attribute 0x0001 OccupancySensorType: value TBD (the enum defines PIR, Ultrasonic, PIRAndUltrasonic, PhysicalContact; 0x03 means PhysicalContact, not WiFi RF) + - Cluster: Basic Information (0x0028) + - NodeLabel: "BFLD-" + - ProductName: "wifi-densepose-bfld" +``` + +--- + +## 3. MQTT Topic Structure and ACL Recommendations + +### 3.1 Topic Tree + +``` +ruview//bfld/ + presence/state # "true" | "false" — class 2 + motion/state # "0.42" — class 2 + person_count/state # "1" — class 2 + identity_risk/state # "0.71" — class 1, disabled by default + raw/state # disabled by default, class 0 metadata only + zone_activity/state # "living_room" — class 2 + confidence/state # "0.88" — class 2 + events/bfld_update # Full JSON event payload — class 2 fields only by default +``` + +### 3.2 Mosquitto ACL Recommendations + +``` +# /etc/mosquitto/acl.conf (example) + +# BFLD node publishes to its own subtree +user bfld_node_ +topic write ruview//bfld/# + +# Home Assistant reads presence, motion, count, zone, confidence +user homeassistant +topic read ruview/+/bfld/presence/state +topic read ruview/+/bfld/motion/state +topic read ruview/+/bfld/person_count/state +topic read ruview/+/bfld/zone_activity/state +topic read ruview/+/bfld/confidence/state +topic read ruview/+/bfld/events/bfld_update + +# HA diagnostic access (operator opt-in required to add this rule): +# topic read ruview/+/bfld/identity_risk/state + +# DENY all wildcard subscriptions for anonymous clients: +# (mosquitto default: anonymous clients get no access) + +# DENY raw topic for all non-admin users: +# raw/state is never written by default; no read ACL needed +``` + +### 3.3 TLS Configuration + +BFLD should use TLS for all MQTT connections. The BFLD node connects as a TLS client; +the broker must present a certificate that chains to the expected CA. The sensing-server +already supports mTLS (ADR-115).
BFLD inherits this configuration. + +--- + +## 4. Node-RED and OpenHAB Compatibility + +BFLD publishes standard MQTT payloads with consistent topic structure. No Node-RED +or OpenHAB plugin is required; standard MQTT input/output nodes work directly. + +**Node-RED example flow**: + +```json +[ + {"id": "bfld-in", "type": "mqtt in", + "topic": "ruview/+/bfld/presence/state", "qos": "1"}, + {"id": "filter", "type": "switch", + "property": "payload", "rules": [{"t": "eq", "v": "true"}]}, + {"id": "notify", "type": "http request", + "url": "http://ha/api/events/bfld_presence_on"} +] +``` + +**OpenHAB MQTT binding** (items file): + +``` +Switch BfldPresence "BFLD Presence" {mqtt="<[broker:ruview/node1/bfld/presence/state:state:default]"} +Number BfldMotion "BFLD Motion" {mqtt="<[broker:ruview/node1/bfld/motion/state:state:default]"} +``` + +--- + +## 5. cognitum-v0 Federation + +The cognitum-v0 appliance (Pi 5, running ruview-mcp-brain on port 9876, +cognitum-rvf-agent on port 9004, ruvector-hailo-worker on port 50051 — see +CLAUDE.local.md) is the fleet coordinator for multi-room correlation. + +BFLD events from individual nodes flow to cognitum-v0 via the federation path. +The critical constraint: **identity fields are stripped at the node boundary before +federation**. The stripping happens in the local BFLD emitter (`mqtt.rs`), not in +cognitum-v0. By the time a BFLD event reaches the broker that cognitum-v0 subscribes to, +it contains only class-2 (anonymous) or class-3 (restricted) fields. + +### 5.1 Federation Topics + +``` +# Node-local (not federated): +ruview//bfld/identity_risk/state +ruview//bfld/raw/state + +# Federated (forwarded to cognitum-v0 broker): +ruview//bfld/presence/state +ruview//bfld/motion/state +ruview//bfld/person_count/state +ruview//bfld/events/bfld_update +``` + +### 5.2 cognitum-rvf-agent Role + +The `cognitum-rvf-agent` (port 9004) handles cross-node RVF (RuView Frame) container +events. 
For BFLD, it receives federated presence/motion/count events and can correlate +them for multi-room occupancy (e.g., "person moved from living room node to kitchen +node"). It does not receive or need identity information to perform this correlation — +it uses temporal and spatial proximity, not identity. + +### 5.3 Hailo Inference (Future) + +The `ruvector-hailo-worker` (port 50051) on cognitum-v0 runs vector similarity on the +Hailo-8 AI accelerator. A future extension could offload BFLD's identity_risk_score +computation to the Hailo worker, keeping the identity embedding local to cognitum-v0 +while giving individual nodes the benefit of a larger enrollment pool for risk +calibration. This is explicitly out of scope for the current BFLD spec — it is noted +here as an integration-compatible extension point. diff --git a/docs/research/BFLD/06-implementation-plan.md b/docs/research/BFLD/06-implementation-plan.md new file mode 100644 index 00000000..90e5aebb --- /dev/null +++ b/docs/research/BFLD/06-implementation-plan.md @@ -0,0 +1,253 @@ +# BFLD Implementation Plan + +## 1. New Crate: wifi-densepose-bfld + +Location: `v2/crates/wifi-densepose-bfld/` + +This crate slots between `wifi-densepose-signal` (BFI normalization, temporal windowing) +and `wifi-densepose-sensing-server` (MQTT/HA integration). It does not depend on the +training pipeline (`wifi-densepose-train`) or the neural-network inference crate +(`wifi-densepose-nn`) in the default build — feature flags activate those paths. 
+ +### 1.1 Module Layout + +``` +v2/crates/wifi-densepose-bfld/ + Cargo.toml + src/ + lib.rs # Public API: BfldPipeline, BfldFrame, BfldEvent + frame.rs # BfldFrame struct, serialization, CRC32, magic bytes + extractor.rs # BFI packet capture interface, Phi/Psi parsing, + # 802.11ac/ax CBFR format decoder + features.rs # Feature computation: mean_angle_delta, + # subcarrier_variance, temporal_entropy, + # doppler_proxy, path_stability, + # cross_antenna_correlation, burst_motion_score, + # stationarity_score, identity_separability_score + identity_risk.rs # identity_risk_score formula, EmbeddingRingBuf, + # in-RAM-only lifecycle enforcement + privacy_gate.rs # privacy_class assignment, field masking, + # #[must_classify] lint check + emitter.rs # BfldEvent construction, JSON serialization + mqtt.rs # MQTT topic publishing, ACL, per-class topic routing + tests/ + frame_roundtrip.rs # BfldFrame serialization + CRC32 determinism + privacy_gate.rs # Per-class field suppression assertions + hash_rotation.rs # Cross-site isolation + daily rotation proofs + identity_risk.rs # Risk score bounded [0,1], local-only embedding + acceptance.rs # All 7 acceptance criteria as named tests + benches/ + pipeline_throughput.rs # Frame processing at 40 Hz +``` + +### 1.2 Public API Sketch + +```rust +// lib.rs — primary entry points + +pub struct BfldPipeline { + config: BfldConfig, + extractor: BfiExtractor, + feature_engine: FeatureEngine, + identity_risk: IdentityRiskEngine, + privacy_gate: PrivacyGate, + emitter: BfldEmitter, +} + +impl BfldPipeline { + pub fn new(config: BfldConfig) -> Result; + pub fn process_frame(&mut self, raw: RawBfiCapture) -> Option; + pub fn current_privacy_class(&self) -> PrivacyClass; + pub fn enable_privacy_mode(&mut self); // forces class 3 +} + +pub struct BfldEvent { + pub timestamp_ns: u64, + pub presence: bool, + pub motion: f32, // 0.0..1.0 + pub person_count: u8, + pub identity_risk_score: Option, // None if privacy_class >= 2 + pub 
rf_signature_hash: Option<[u8; 32]>, // None if privacy_class >= 2 + pub zone_id: Option, + pub confidence: f32, + pub privacy_class: PrivacyClass, +} + +#[repr(u8)] +pub enum PrivacyClass { + Raw = 0, + Derived = 1, + Anonymous = 2, + Restricted = 3, +} +``` + +--- + +## 2. Reuse Map: Existing Crates and Modules + +### 2.1 RuvSense Modules (wifi-densepose-signal) + +Path: `v2/crates/wifi-densepose-signal/src/ruvsense/` + +| Module | Used by BFLD | Purpose | +|--------|-------------|---------| +| `coherence_gate.rs` | `identity_risk.rs` | Accept/reject frame based on coherence score; gates embeddings fed into risk calculation | +| `multistatic.rs` | `features.rs` | Attention-weighted fusion for cross_perspective_consistency component of risk score | +| `cross_room.rs` | `privacy_gate.rs` | Environment fingerprinting — confirms that the site_salt corresponds to the current room geometry | +| `longitudinal.rs` | `identity_risk.rs` | Welford stats for temporal_stability component | +| `adversarial.rs` | `extractor.rs` | Physically-impossible signal detection — flags frames that may be from a compromised AP (A5 threat) | + +Not used by BFLD: `pose_tracker.rs`, `intention.rs`, `gesture.rs`, `tomography.rs`, +`field_model.rs` — these operate above the identity-risk layer. + +### 2.2 RuVector v2.0.4 Crates + +| Crate | BFLD Usage | Rationale | +|-------|-----------|-----------| +| `ruvector-attention` | `identity_risk.rs` | Spatial attention over subcarrier dimension for embedding computation | +| `ruvector-mincut` | `features.rs` | Person separation score as input to person_count feature | +| `ruvector-temporal-tensor` | `extractor.rs` | Temporal windowing + compression of BFI angle sequences | + +Not used: `ruvector-attn-mincut`, `ruvector-solver` — spectrogram and sparse +interpolation are not needed in the BFI pipeline. 
+ +### 2.3 Cross-Viewpoint Fusion (wifi-densepose-ruvector) + +Path: `v2/crates/wifi-densepose-ruvector/src/viewpoint/` + +| Module | BFLD Usage | +|--------|-----------| +| `coherence.rs` | Cross-viewpoint phase coherence for cross_perspective_consistency risk component | +| `geometry.rs` | Fisher Information / Cramér-Rao bounds for confidence estimation | +| `attention.rs` | GeometricBias-weighted attention for multi-AP BFI fusion | +| `fusion.rs` | MultistaticArray aggregate root — BFLD subscribes to domain events here | + +--- + +## 3. ESP32 Firmware Additions + +### 3.1 ESP32-S3 BFI Capability Assessment + +The ESP32-S3 firmware's CSI module (`csi_collector.c` in `firmware/esp32-csi-node/main/`) +uses `esp_wifi_csi_set_config()` and the `wifi_csi_cb_t` callback. This produces +Espressif HT20 CSI in a vendor-specific format — amplitude + phase per subcarrier, +not the VHT/HE Compressed Beamforming frames (CBFR) that contain Phi/Psi angles. + +The ESP32-S3 does NOT have a public API to generate or capture CBFR frames. Its radio +is 802.11 b/g/n only, so it does not take part in VHT/HE sounding at all, and nothing +comparable is exposed via the CSI callback. + +**Consequence**: BFI capture for BFLD requires host-side sniffing, not ESP32 firmware +modification. + +### 3.2 Host-Side BFI Capture Path + +Recommended capture hardware: Raspberry Pi 5 with BCM43456 chip running Nexmon CSI +patch. This is already present in the fleet as `cognitum-v0` (Pi 5, Tailscale IP +100.77.59.83 per CLAUDE.local.md). + +Capture path: +1. Nexmon monitor mode captures all 802.11 frames on the target channel. +2. A filter pass extracts CBFR frames (management frames of subtype Action or Action No Ack, + category VHT or HE, action Compressed Beamforming). +3. The rvcsi adapter (`vendor/rvcsi/`) already handles Nexmon PCap format; add a + BFI extractor alongside the existing CSI extractor. +4. Frames are forwarded to the BFLD pipeline via the existing UDP stream path + (`stream_sender.c` / sensing-server).
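The filter pass in step 2 of the capture path above can be sketched in Python. This is a minimal illustration, not the rvcsi extractor: it assumes the radiotap header has already been stripped, and the category codes (VHT = 21, HE = 30) and action value 0 are given as I understand them from IEEE 802.11 and should be checked against the standard before use.

```python
MGMT_TYPE = 0
SUBTYPE_ACTION = 13
SUBTYPE_ACTION_NO_ACK = 14
CATEGORY_VHT = 21                  # assumed category code, verify against 802.11
CATEGORY_HE = 30                   # assumed category code, verify against 802.11
ACTION_COMPRESSED_BEAMFORMING = 0
MGMT_HEADER_LEN = 24               # frame control through sequence control
HT_CONTROL_LEN = 4                 # present when the +HTC/Order flag is set


def is_cbfr(frame: bytes) -> bool:
    """True if the 802.11 MAC frame carries a VHT/HE compressed beamforming report."""
    if len(frame) < MGMT_HEADER_LEN + 2:
        return False
    fc0, fc1 = frame[0], frame[1]
    frame_type = (fc0 >> 2) & 0x3
    subtype = (fc0 >> 4) & 0xF
    if frame_type != MGMT_TYPE:
        return False
    if subtype not in (SUBTYPE_ACTION, SUBTYPE_ACTION_NO_ACK):
        return False
    if fc1 & 0x40:
        # Protected frame: the body is encrypted and cannot be classified here.
        return False
    body = MGMT_HEADER_LEN + (HT_CONTROL_LEN if fc1 & 0x80 else 0)
    if len(frame) < body + 2:
        return False
    category, action = frame[body], frame[body + 1]
    return (category in (CATEGORY_VHT, CATEGORY_HE)
            and action == ACTION_COMPRESSED_BEAMFORMING)


# Synthetic Action No Ack frame: zeroed addresses, VHT category, action 0.
vht_report = bytes([0xE0, 0x00]) + bytes(22) + bytes([21, 0]) + bytes(8)
print(is_cbfr(vht_report))
```

Frames that pass this test would then go to the Phi/Psi angle decoder; everything else on the channel is discarded before it reaches the BFLD pipeline.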
+ +### 3.3 Firmware Changes Required (Minimal) + +The only firmware change needed in `firmware/esp32-csi-node/main/` is to the +`stream_sender.c` protocol: add a packet type byte to the stream header to distinguish +CSI frames from BFI frames. The BFI frames originate on the Pi-side host, not the +ESP32; the ESP32 stream is unchanged. + +```c +// stream_sender.h — add packet type +#define STREAM_PKT_TYPE_CSI 0x01 +#define STREAM_PKT_TYPE_BFI 0x02 // new: BFI frames from host capture +``` + +--- + +## 4. Test Plan: 7 Acceptance Criteria Mapped to Rust Tests + +| AC | Criterion | Test in `acceptance.rs` | +|----|-----------|------------------------| +| AC1 | Commodity WiFi 5/6 capture (80/160 MHz, 2×2 MIMO minimum) | `ac1_commodity_wifi_capture`: assert BfiExtractor parses 80 MHz VHT CBFR sample fixture | +| AC2 | Presence detection latency ≤ 1s from first non-empty BFI frame | `ac2_presence_latency`: replay 10-frame window, assert first `BfldEvent` with `presence=true` within 1,000 ms wall time | +| AC3 | Motion score published at ≥ 1 Hz on `motion/state` topic | `ac3_motion_hz`: mock MQTT sink, run at 5 Hz input, assert ≥ 1 motion event per second | +| AC4 | Raw BFI bytes never appear in serialized output | `ac4_raw_bfi_absent`: fuzz 1,000 random BfiCaptures, assert no bfi_matrix bytes in serialized BfldFrame for any privacy_class | +| AC5 | Privacy-mode suppresses all identity-derived fields | `ac5_privacy_mode`: enable privacy_mode, assert BfldEvent fields identity_risk_score and rf_signature_hash are None | +| AC6 | Deterministic frame hash for identical inputs | `ac6_deterministic_hash`: run same BfiCapture 100 times, assert all output hashes identical | +| AC7 | CSI-optional fusion: pipeline runs without csi_matrix | `ac7_csi_optional`: run BfldPipeline with None csi_matrix, assert no panic and presence event produced | + +Additionally, `tests/hash_rotation.rs` must include: +- `cross_site_isolation`: two BfldPipelines with different site_salts, identical 
inputs → hashes must differ +- `daily_rotation`: same salt, frames 1 second before/after midnight → hashes must differ + +--- + +## 5. Phased Rollout + +### P1 — Frame Format + Extractor Stub (2 weeks) + +Deliverables: +- `frame.rs`: `BfldFrame` struct, serialization, CRC32, magic, version +- `extractor.rs`: CBFR parser for 802.11ac VHT + 802.11ax HE formats +- AC1, AC6 tests passing +- `Cargo.toml` with workspace integration + +Effort: 1 engineer, 2 weeks. + +### P2 — Feature Extraction + Identity Risk (3 weeks) + +Deliverables: +- `features.rs`: all 9 named features (mean_angle_delta through identity_separability_score) +- `identity_risk.rs`: risk formula, EmbeddingRingBuf, coherence gate integration +- AC4, AC7 tests passing (raw-absent, CSI-optional) +- Integration with `ruvector-attention` and `ruvector-temporal-tensor` + +Effort: 1 engineer, 3 weeks. + +### P3 — Privacy Gate + MQTT (2 weeks) + +Deliverables: +- `privacy_gate.rs`: privacy_class assignment, field masking, `#[must_classify]` lint +- `mqtt.rs`: per-class topic routing, discovery payloads, ACL documentation +- AC2, AC3, AC5 tests passing (latency, Hz, privacy-mode) +- Hash rotation: `hash_rotation.rs` tests passing +- Deterministic proof bundle: `verify_bfld.py` equivalent + +Effort: 1 engineer, 2 weeks. + +### P4 — Home Assistant Integration (1 week) + +Deliverables: +- MQTT discovery payloads for all 6 entities +- 3 HA blueprints +- `sensor.bfld_identity_risk` marked diagnostic + hidden by default +- Update `wifi-densepose-sensing-server` to include BFLD event routing + +Effort: 0.5 engineer, 1 week. + +### P5 — Matter Exposure (1 week) + +Deliverables: +- `cog-ha-matter` crate updated to filter BfldFrame → Matter attribute reports +- OccupancySensing cluster populated from `presence` +- Rejection list for identity fields enforced at Matter boundary + +Effort: 0.5 engineer, 1 week. 
+ +### P6 — cognitum Federation (1 week) + +Deliverables: +- Topic routing in `mqtt.rs` for federated vs local topics +- Documentation for cognitum-rvf-agent BFLD event subscription +- End-to-end test: Pi 5 (cognitum-v0) receives federated events, identity fields absent + +Effort: 0.5 engineer, 1 week. + +**Total estimate**: the phase figures above sum to 8.5 engineer-weeks over 10 calendar weeks +across 6 phases, roughly 2.5 calendar months with one engineer. diff --git a/docs/research/BFLD/07-benchmarks-and-evaluation.md b/docs/research/BFLD/07-benchmarks-and-evaluation.md new file mode 100644 index 00000000..f83cbe85 --- /dev/null +++ b/docs/research/BFLD/07-benchmarks-and-evaluation.md @@ -0,0 +1,196 @@ +# BFLD Benchmarks and Evaluation Strategy + +## 1. Datasets + +### 1.1 BFId Dataset (Primary) + +**Reference**: Todt, Morsbach, Strufe; KIT. ACM CCS 2025. +https://dl.acm.org/doi/10.1145/3719027.3765062 +https://ps.tm.kit.edu/english/bfid-dataset/index.php + +197 individuals. BFI and CSI recorded simultaneously. Multiple sessions, multiple AP +angles. Available to researchers for non-commercial use on request from KIT. + +**Use in BFLD evaluation**: The BFId dataset provides the ground-truth identity labels +needed to calibrate `identity_risk_score`. Specifically: given BFId's known re-ID +accuracy as a function of time window, BFLD's identity_risk_score should correlate +with BFId's success rate. High-risk frames (score > 0.7) should correspond to windows +where BFId achieves > 80% accuracy; low-risk frames (score < 0.2) should correspond +to windows where BFId accuracy approaches chance. + +### 1.2 Wi-Pose and MM-Fi (Context) + +**MM-Fi**: Multi-modal WiFi sensing dataset used by this project (ADR-015). Contains +synchronized WiFi CSI, mmWave, and camera pose data. Does not contain BFI separately, +but can be used to validate BFLD's CSI-optional path (AC7). + +**Wi-Pose**: Academic benchmark for WiFi pose estimation. CSI only; used for +person_count and motion accuracy baselines.
+ +### 1.3 Proposed In-House Multi-Site Capture Protocol + +**Purpose**: Validate cross-site isolation (Invariant 3) and daily rotation. + +**Setup**: +- Site A: ruvultra (RTX 5080 workstation, Tailscale 100.104.125.72) with USB WiFi + adapter in monitor mode. +- Site B: cognitum-v0 (Pi 5, Tailscale 100.77.59.83) with Nexmon monitor mode. +- Subject pool: 5–10 volunteers. +- Protocol: Each subject walks a fixed path at each site on 3 consecutive days. + BFI captured simultaneously at both sites using Wi-BFI. + +**Analysis**: +1. Can the BFId classifier re-identify subjects within a site? (Baseline — should + confirm BFId's published results.) +2. Can any classifier re-identify subjects across sites using BFLD's + rf_signature_hash? (Should fail — cross-site isolation test.) +3. Can any classifier re-identify across days using BFLD's rf_signature_hash? (Should + fail — daily rotation test.) + +--- + +## 2. Metrics + +### 2.1 Presence Detection + +| Metric | Definition | Target | +|--------|-----------|--------| +| Latency p50 | Time from first non-empty BFI frame to first `presence=true` event | < 500 ms | +| Latency p95 | | < 1000 ms (AC2) | +| False positive rate | Presence=true when room is confirmed empty | < 5% | +| False negative rate | Presence=false when person confirmed present | < 2% | + +Measurement method: camera ground-truth (ruvultra webcam via MediaPipe Pose, same +as ADR-079 collection protocol) for empty/occupied labels. 
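The presence metrics in the table above can be computed as in the following Python sketch. It is an illustration with made-up sample values, not measured results; the nearest-rank percentile definition and the function names are assumptions.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def presence_rates(predicted, truth):
    """Return (false_positive_rate, false_negative_rate) against camera labels."""
    false_pos = sum(p and not t for p, t in zip(predicted, truth))
    false_neg = sum(t and not p for p, t in zip(predicted, truth))
    empty = sum(not t for t in truth)
    occupied = sum(bool(t) for t in truth)
    fp_rate = false_pos / empty if empty else 0.0
    fn_rate = false_neg / occupied if occupied else 0.0
    return fp_rate, fn_rate


# Illustrative latencies in milliseconds (not real measurements).
latencies_ms = [120, 300, 450, 480, 510, 640, 700, 820, 900, 990]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
print(p50 < 500, p95 < 1000)
```

The false positive rate is normalized by confirmed-empty windows and the false negative rate by confirmed-occupied windows, matching the definitions in the table.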
+ +### 2.2 Motion Score + +| Metric | Definition | Target | +|--------|-----------|--------| +| MAE vs ground truth | Mean absolute error of motion score vs camera-derived motion magnitude | < 0.1 | +| Publish rate at sustained operation | Events published per second on `motion/state` | >= 1 Hz (AC3) | +| Latency p95 | 95th-percentile time from motion onset (camera) to motion event | < 750 ms | + +### 2.3 Person Count + +| Metric | Definition | Target | +|--------|-----------|--------| +| Count accuracy | Fraction of windows where BFLD person_count == camera count | > 85% for 1–3 persons | +| Count MAE | Mean absolute error of BFLD person_count vs camera count | < 0.5 for counts 1–4 | + +Person count is harder than presence. The target is expected to be achievable with +MinCut separation (`ruvector-mincut`), but it requires multi-AP coverage for 4+ persons. + +### 2.4 Identity Risk Calibration + +This is BFLD's novel evaluation dimension: to our knowledge, no prior system has +explicitly quantified it. + +**Calibration definition**: Let `r(t)` = BFLD's identity_risk_score at time t. +Let `acc(t)` = BFId classifier's re-identification accuracy when trained on frames +around time t. The identity_risk_score is *calibrated* if: + + E[acc(t) | r(t) = v] is monotonically increasing in v + +In other words: higher risk scores should correspond to frames where identity inference +is genuinely easier. + +**Evaluation protocol**: +1. Run BFId classifier in sliding 5-second windows on the BFId dataset. +2. Record per-window BFId accuracy (using leave-one-out cross-validation). +3. Run BFLD's identity_risk_score computation on the same windows. +4. Compute Spearman correlation between risk scores and BFId accuracy. +5. Target: Spearman rho > 0.5 (positive monotonic correlation). + +### 2.5 Privacy-Mode False Positive Rate + +When `privacy_mode` is enabled (privacy_class = 3), all identity-correlated fields +must be suppressed. The false positive rate is the fraction of outbound events +that inadvertently include an identity-correlated field despite privacy_mode being +active. 
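The rate defined above can be measured directly over a batch of outbound events. A minimal sketch, assuming each event has been decoded into a Python dict (for example from the JSON MQTT payload) and that the identity-correlated fields are the three named in AC5. The helper names are hypothetical, and treating a key as leaked whenever it is present, whatever its value, is an assumption of this sketch.

```python
# Field names taken from AC5; the tuple itself is illustrative.
IDENTITY_FIELDS = ("identity_risk_score", "rf_signature_hash", "identity_embedding")


def leaked_fields(event):
    """Return the identity-correlated keys present in one outbound event."""
    return [name for name in IDENTITY_FIELDS if name in event]


def privacy_mode_false_positive_rate(events):
    """Fraction of outbound events that carry at least one identity field."""
    if not events:
        return 0.0
    leaking = sum(1 for event in events if leaked_fields(event))
    return leaking / len(events)


# Illustrative events: the second one leaks a hash and would fail the check.
clean = {"presence": True, "motion": 0.42, "person_count": 1, "privacy_class": 3}
leaky = {"presence": True, "rf_signature_hash": "deadbeef", "privacy_class": 3}
rate = privacy_mode_false_positive_rate([clean, leaky, clean, clean])
```

Because the requirement is a hard one, a test harness would assert that the rate is exactly zero rather than compare it to a tolerance.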
+ +**Target**: 0% (this is a hard correctness requirement, not a statistical target). +Verified by the AC5 fuzz test in `acceptance.rs`. + +--- + +## 3. Red-Team Protocol + +### 3.1 Hash Re-identification Attack + +**Question**: Can an attacker re-identify a person across rotated hashes? + +**Setup**: +- Run BFLD pipeline for person X across 3 days. +- Collect `rf_signature_hash` values for each day: H_1, H_2, H_3. +- Adversary has access to H_1, H_2, H_3 and knows they are from the same site. +- Adversary attempts to confirm H_1, H_2, H_3 are from the same person. + +**Attack success condition**: adversary achieves a confirmation rate above chance +(1/N for N subjects). + +**Expected result**: The attack fails, by construction of the hash rotation with site_salt. +Because day_epoch changes daily and site_salt is fixed but unknown to the adversary, +the hash behaves as a keyed PRF from the adversary's point of view. The adversary holds +three random-looking 32-byte values with no structural relationship, so the success +rate should be indistinguishable from random guessing. + +**Quantitative target**: attack success rate <= 1/N + 0.05 (within 5 percentage points +of chance). + +### 3.2 Cross-Site Re-identification Attack + +**Question**: Can an attacker confirm person X visited both site A and site B? + +**Setup**: Same as Section 1.3 in-house protocol. Adversary has BFLD event streams +from both sites. + +**Method**: Attempt to match rf_signature_hash values from site A and site B on the +same day. Alternatively, train a classifier on BFI features (using the raw angle +sequences from the captured data) and attempt cross-site re-ID. + +**Expected result**: Hash-based matching fails by construction. Classifier-based +re-ID may succeed if the adversary has raw angle data (which BFLD does not publish), +but it should not succeed using only BFLD's published output. + +**Quantitative target**: hash-based cross-site match rate <= 1/N + 0.05. + +### 3.3 Timing Side-Channel Attack + +**Question**: Can an attacker infer a person's schedule by monitoring +identity_risk_score over time? 
+ +**Method**: Record identity_risk_score time series. Correlate with known schedule +(person X leaves at 8am, returns at 6pm). Compute mutual information between +schedule and risk score time series. + +**Expected result**: Some correlation exists (risk score rises when person enters), +but the attacker learns "someone is present" — equivalent to the presence sensor — +not identity. This is acceptable: presence information is already published at +class 2. + +--- + +## 4. Comparison Baselines + +| Baseline | Description | Presence F1 | Motion MAE | Identity leak | +|----------|-------------|------------|-----------|--------------| +| Raw CSI pipeline | Existing wifi-densepose pipeline (no BFLD) | ~0.95 (est.) | ~0.08 (est.) | Unquantified — no risk gating | +| BFI-only (no BFLD) | Wi-BFI + threshold presence | ~0.82 (from LeakyBeam) | N/A | Angle matrices published | +| BFI+CSI fusion (no BFLD) | Combined pipeline, ungated | ~0.97 (est.) | ~0.06 (est.) | Unquantified | +| **BFLD (BFI+CSI, class 2)** | Full BFLD with anonymous privacy class | target 0.93 | target 0.10 | 0% (class 2 gate) | +| BFLD (BFI-only, class 2) | BFLD without CSI input (AC7) | target 0.85 | target 0.12 | 0% (class 2 gate) | + +The BFLD privacy-class guarantee reduces the raw sensing accuracy by a small margin +versus an ungated BFI+CSI pipeline (target F1 0.93 vs estimated 0.97). This is the +explicit trade-off: identity safety for a modest utility cost. + +--- + +## 5. Continuous Evaluation in CI + +Three tests run on every PR that touches the BFLD crate: + +1. **Deterministic hash test** (AC6): same input → same output across platforms. +2. **Privacy-mode field suppression fuzz** (AC5): 1,000 random inputs → no identity + fields in class-2 output. +3. **Latency smoke test** (AC2): 100-frame replay → first presence event < 200 ms + (tighter than the 1s AC target, to keep CI fast). 
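The determinism test (AC6) and the rotation properties exercised in Sections 3.1 and 3.2 can be illustrated with a small keyed-hash sketch. Several assumptions apply: Python's standard library has no BLAKE3, so `hashlib.blake2b` in keyed mode stands in for the BLAKE3 keyed hash; the input layout (8-byte big-endian day epoch followed by the signature bytes) and the names `day_epoch` and `rf_signature_hash` as functions are hypothetical; the real construction is specified elsewhere in the bundle and may differ.

```python
import hashlib

SECONDS_PER_DAY = 86_400


def day_epoch(unix_seconds):
    """Number of whole UTC days since the Unix epoch (assumed rotation unit)."""
    return unix_seconds // SECONDS_PER_DAY


def rf_signature_hash(site_salt, epoch, signature):
    """Keyed 32-byte hash of (epoch, signature) under the per-site secret.

    Stand-in: BLAKE2b keyed mode replaces BLAKE3 keyed hashing here.
    """
    if len(site_salt) != 32:
        raise ValueError("site_salt must be 32 bytes")
    hasher = hashlib.blake2b(key=site_salt, digest_size=32)
    hasher.update(epoch.to_bytes(8, "big"))  # fixed-width, platform-independent
    hasher.update(signature)
    return hasher.digest()


# Illustrative inputs; real salts would come from a CSPRNG at first boot.
site_a = bytes(range(32))
site_b = bytes(range(1, 33))
signature = b"quantized-feature-bytes"

same_day_a = rf_signature_hash(site_a, 20_000, signature)
next_day_a = rf_signature_hash(site_a, 20_001, signature)
same_day_b = rf_signature_hash(site_b, 20_000, signature)

deterministic = same_day_a == rf_signature_hash(site_a, 20_000, signature)
rotates_daily = same_day_a != next_day_a
differs_across_sites = same_day_a != same_day_b
```

Fixed-width big-endian encoding of the epoch is what keeps the output identical across platforms in this sketch; a native-endian or variable-width encoding would undermine the determinism check.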
diff --git a/docs/research/BFLD/08-adr-draft.md b/docs/research/BFLD/08-adr-draft.md new file mode 100644 index 00000000..79dae045 --- /dev/null +++ b/docs/research/BFLD/08-adr-draft.md @@ -0,0 +1,214 @@ +# ADR-118: BFLD — Beamforming Feedback Layer for Detection + +> This file is a draft. When approved, copy to: +> `docs/adr/ADR-118-bfld-beamforming-feedback-layer-for-detection.md` + +| Field | Value | +|-------|-------| +| **Status** | Proposed | +| **Date** | 2026-05-24 | +| **Deciders** | ruv | +| **Codename** | **BFLD** — Beamforming Feedback Layer for Detection | +| **Relates to** | [ADR-024](ADR-024-contrastive-csi-embedding-model.md) (AETHER contrastive embedding), [ADR-027](ADR-027-cross-environment-domain-generalization.md) (MERIDIAN cross-environment), [ADR-028](ADR-028-esp32-capability-audit.md) (capability audit / witness), [ADR-029](ADR-029-ruvsense-multistatic-sensing-mode.md) (RuvSense multistatic), [ADR-030](ADR-030-ruvsense-persistent-field-model.md) (persistent field model), [ADR-031](ADR-031-ruview-sensing-first-rf-mode.md) (sensing-first RF mode), [ADR-032](ADR-032-multistatic-mesh-security-hardening.md) (mesh security hardening), [ADR-095](ADR-095-rvcsi-edge-rf-sensing-platform.md) (rvCSI platform), [ADR-115](ADR-115-home-assistant-integration.md) (HA integration), [ADR-116](ADR-116-cog-ha-matter-seed.md) (Matter seed packaging), [ADR-117](ADR-117-pip-wifi-densepose-modernization.md) (pip modernization) | +| **Tracking issue** | TBD | + +--- + +## 1. Context + +### 1.1 The Plaintext BFI Problem + +IEEE 802.11ac and 802.11ax beamforming feedback information (BFI) is exchanged between +client stations (STA) and access points (AP) in unencrypted management-plane frames. +The STA compresses the channel response into a matrix of Givens rotation angles (Phi/Psi) +and transmits them in a VHT/HE Compressed Beamforming Report (CBFR) frame. 
These frames +are passively sniffable by any device in WiFi monitor mode without any access to the +target network. + +Two independent 2024–2025 research papers establish the severity of this exposure: + +1. **BFId** (Todt, Morsbach, Strufe; KIT; ACM CCS 2025, + https://dl.acm.org/doi/10.1145/3719027.3765062): demonstrates re-identification of + 197 individuals using BFI alone, with >90% accuracy from 5 seconds of capture. +2. **LeakyBeam** (Xiao et al.; Zhejiang U., NTU, KAIST; NDSS 2025, + https://www.ndss-symposium.org/ndss-paper/lend-me-your-beam-privacy-implications-of-plaintext-beamforming-feedback-in-wifi/): + demonstrates occupancy detection through walls at 20 m range using BFI, with 82.7% + TPR and 96.7% TNR. + +Tooling for passive BFI capture is freely available. Wi-BFI +(https://arxiv.org/abs/2309.04408) is pip-installable and supports 802.11ac/ax, +SU/MU-MIMO, 20/40/80/160 MHz channels. + +### 1.2 Gap in Existing Pipeline + +The wifi-densepose sensing pipeline processes CSI via the rvCSI runtime (ADR-095/096) +and produces presence, pose, vitals, and zone-activity events. No layer explicitly +measures whether the data being processed is capable of identifying specific individuals. +The pipeline treats all CSI as equivalent from a privacy standpoint, regardless of +whether it is operating in a high-separability (identity-leaky) or low-separability +(anonymous) regime. + +This gap becomes a compliance and liability issue as WiFi sensing deployments scale. +An operator deploying this system in a care facility, hotel, or shared office has no +instrument to verify that the system is operating anonymously. + +### 1.3 The BFI Opportunity + +BFI is not only a threat vector — it is a complementary sensing signal. 
Because BFI +encodes the channel response as a structured compressed matrix, it carries multipath +geometry that can augment CSI-based presence and motion detection, particularly in +scenarios where only one AP is available (fewer antenna pairs than a full MIMO CSI +capture). The BFLD design treats BFI as an optional input alongside CSI, not as a +replacement. + +--- + +## 2. Decision + +We will create a new crate `wifi-densepose-bfld` (to live in `v2/crates/`) that: + +1. **Ingests** raw BFI (Phi/Psi angle matrices from CBFR frames) as input and optionally + fuses CSI when available. +2. **Computes** nine named features and derives an `identity_risk_score` using a + separability × temporal_stability × cross_perspective_consistency × sample_confidence + formula. +3. **Gates** all output through a `privacy_class` mechanism that structurally prevents + identity-correlated data from being published at privacy classes 2 and 3. +4. **Emits** `BfldEvent` structs on MQTT topics under `ruview//bfld/` with + per-class topic routing. +5. **Enforces** three invariants structurally (not by policy): + - Raw BFI never exits the node. + - Identity embedding is in-RAM-only. + - Cross-site identity correlation is made cryptographically impossible via per-site + keyed BLAKE3 hash rotation with a daily epoch. + +The `BfldFrame` wire format carries magic `0xBF1D_0001`, a version byte, hashed AP/STA +identifiers, a quantization byte, a privacy_class byte, compressed feature payload, and +a CRC32. + +Matter exposure is limited to: OccupancySensing (presence), MotionSensor (motion), +PeopleCount (person_count). Identity fields are rejected at the Matter boundary in the +`cog-ha-matter` crate. + +--- + +## 3. Consequences + +### Positive + +- Operators gain an explicit, auditable measure of privacy compliance at the RF layer — + the first such primitive in the wifi-densepose ecosystem. 
+- The identity_risk_score doubles as an anomaly signal: unexpected spikes indicate + environmental changes (new AP firmware, nearby attacker-grade sniffer, unusual + propagation geometry) that warrant investigation. +- BFI fusion augments presence and motion accuracy in single-AP deployments, partially + compensating for lower CSI antenna counts. +- The crate's deterministic frame hashes enable the ADR-028 witness-bundle pattern to + extend to the new sensing surface, preserving the existing audit trail model. +- Cross-site identity isolation is structural, not policy-dependent. This is a stronger + guarantee than access-control rules. + +### Negative + +- BFI capture on ESP32-S3 hardware is not directly possible via the Espressif WiFi API. + The full BFLD pipeline requires a Pi 5 / Nexmon host-side sniffer (cognitum-v0 is + available for this purpose, but it adds a fleet dependency for the BFI path). +- The identity_risk_score calibration (correlation with actual re-ID success rate) + requires the BFId dataset, which requires non-commercial research agreement with KIT. +- ~10.5 engineer-weeks of implementation effort. + +### Neutral + +- BFLD does not prevent passive BFI capture by an external attacker (A1 / LeakyBeam + threat). It only ensures the node's own output is non-identifying. Operators should + be informed of this distinction. +- The daily hash rotation means that occupant-counting analytics that span multiple + days cannot correlate individual signatures across the day boundary. This is a privacy + benefit that some analytics use-cases may find inconvenient. + +--- + +## 4. Alternatives Considered + +### Alt 1: Skip BFI entirely, CSI-only pipeline + +The rvCSI pipeline (ADR-095/096) already handles CSI without BFI. This alternative +requires no new crate and no change to the ESP32 firmware. 
+ +**Rejected because**: (a) it leaves the identity-leakage detection gap open for the +existing CSI pipeline, and (b) as BFI capture tooling becomes more widespread (Wi-BFI, +PicoScenes), the absence of a privacy layer becomes more conspicuous for operators. + +### Alt 2: Publish identity_risk_score publicly (default-on) + +Treat the risk score as a diagnostic metric that operators and the public can observe. + +**Rejected because**: the risk score is itself a privacy-sensitive signal (it reveals +when a specific person is present via timing correlation). The default should be +opt-in, with the operator explicitly acknowledging the trade-off. + +### Alt 3: Use raw BFI in cloud ML training + +Send raw BFI angle matrices to a cloud training service to improve model quality. + +**Rejected because**: this violates Invariant 1. Cloud training on raw BFI would +create an off-node store of angle matrices that could be reconstructed into identity +profiles. The on-device-only constraint is not negotiable. + +### Alt 4: Differential privacy noise injection on BFI before any processing + +Add calibrated Laplace/Gaussian noise to the angle matrices at ingress to provide +epsilon-differential privacy on all downstream computations. + +**Rejected for this ADR** (noted as future extension): DP noise calibration requires +sensitivity analysis that is not yet complete, and the interaction between DP noise +and the identity_risk_score formula requires separate validation. The current design +achieves privacy through structural impossibility (local-only, hash rotation) rather +than noise injection. + +--- + +## 5. Acceptance Criteria + +- [ ] **AC1**: The extractor parses BFI from commodity WiFi 5 (802.11ac) and WiFi 6 + (802.11ax) captures, supporting 20/40/80/160 MHz channel bandwidth and 2×2 through + 4×4 MIMO configurations. +- [ ] **AC2**: Presence detection latency is ≤ 1s p95 from the first non-empty BFI + frame in a new occupancy event. 
+- [ ] **AC3**: Motion score is published at ≥ 1 Hz on the `ruview//bfld/motion/state` + MQTT topic during sustained occupancy. +- [ ] **AC4**: Raw BFI bytes (Phi/Psi angle matrices) are never present in any + serialized `BfldFrame` payload at any `privacy_class` value. +- [ ] **AC5**: When `privacy_mode` is enabled, all identity-derived fields + (`identity_risk_score`, `rf_signature_hash`, `identity_embedding`) are absent from + all outbound events. +- [ ] **AC6**: Given identical `BfiCapture` inputs, the `BfldFrame` serialization + produces bit-identical output (deterministic hash) across runs and across platforms. +- [ ] **AC7**: The pipeline produces valid `BfldEvent` outputs when `csi_matrix` is + absent (BFI-only mode), without panic or degraded presence/motion reporting beyond + the documented accuracy bounds. + +--- + +## 6. Related ADRs + +- **ADR-024**: AETHER contrastive CSI embedding — BFLD reuses the AETHER embedding + infrastructure for identity_risk computation. +- **ADR-027**: MERIDIAN cross-environment — BFLD's cross-site isolation instantiates + the "no cross-site correlation" assumption that MERIDIAN requires. +- **ADR-028**: Witness verification — BFLD extends the deterministic proof pattern. +- **ADR-029**: RuvSense multistatic — BFLD uses `multistatic.rs` for + cross_perspective_consistency. +- **ADR-030**: Persistent field model — BFLD uses `cross_room.rs` for + environment fingerprinting in the hash rotation. +- **ADR-031**: Sensing-first RF mode — BFLD is a new sensing primitive alongside + the CSI-based sensing. +- **ADR-032**: Mesh security hardening — BFLD's threat model is a superset. +- **ADR-095/096**: rvCSI platform — BFLD shares the BFI capture path with rvCSI's + Nexmon adapter. +- **ADR-115**: HA integration — BFLD extends the 21-entity HA surface with 6 new + entities. +- **ADR-116**: Matter seed packaging — BFLD's Matter boundary filter is implemented + in `cog-ha-matter`. 
+- **ADR-117**: pip modernization — BFLD's Python bindings (PyO3) will follow the + pattern established in ADR-117. diff --git a/docs/research/BFLD/09-github-issue.md b/docs/research/BFLD/09-github-issue.md new file mode 100644 index 00000000..1f8cbc3e --- /dev/null +++ b/docs/research/BFLD/09-github-issue.md @@ -0,0 +1,111 @@ +# GitHub Issue Draft + +**Title**: feat: BFLD — Beamforming Feedback Layer for Detection (privacy-gated WiFi sensing) + +**Labels**: `enhancement`, `privacy`, `security`, `area/signal`, `area/firmware` + +**Milestone**: (TBD — suggest: v0.8.0) + +--- + +## Summary + +Add a new crate `wifi-densepose-bfld` that turns raw 802.11 Beamforming Feedback +Information (BFI) into bounded, privacy-gated sensing outputs. BFLD detects when RF +data crosses from "ambient sensing" into "identity record" and structurally prevents +identity-correlated data from leaving the node. + +This is the safety layer that was missing from the CSI pipeline. As passive BFI sniffing +tools (Wi-BFI, PicoScenes) become widely available and academic attacks (BFId at ACM CCS +2025, LeakyBeam at NDSS 2025) demonstrate >90% re-identification from commodity WiFi, +the wifi-densepose ecosystem needs an explicit privacy layer before scaling deployment. + +## Motivation + +1. **BFI is plaintext and passively sniffable.** IEEE 802.11ac/ax CBFR frames are + transmitted before WPA2/WPA3 encryption is applied. Any nearby device in monitor mode + can capture them (NDSS 2025: https://www.ndss-symposium.org/ndss-paper/lend-me-your-beam-privacy-implications-of-plaintext-beamforming-feedback-in-wifi/). + +2. **BFI enables re-identification.** The KIT BFId paper (ACM CCS 2025: + https://dl.acm.org/doi/10.1145/3719027.3765062) demonstrates >90% identity + recognition from 5 seconds of BFI, from a dataset of 197 individuals, using only + the Phi/Psi Givens rotation angles. + +3. 
**The existing pipeline has no identity-leakage measurement.** The rvCSI pipeline + produces presence/motion/pose events without any indication of whether those outputs + were derived from identity-discriminative data. An operator deploying in a care + facility or shared office has no way to verify the system is behaving anonymously. + +4. **WiFi 7 will make this worse.** 802.11be (Wi-Fi 7) multi-link operation increases + sounding frequency 3–5×. The attack surface is not static. + +## Proposed Solution + +New crate at `v2/crates/wifi-densepose-bfld/` with the following pipeline: + +``` +BFI capture (CBFR frames, Pi 5 / Nexmon monitor mode) + → BFI extractor (Phi/Psi parser, 802.11ac/ax) + → Normalization + temporal windowing + → Feature extraction (9 named features) + → Identity risk engine (in-RAM embeddings, coherence gate) + → Privacy gate (privacy_class byte, field masking) + → MQTT emitter (per-class topic routing) +``` + +Three structural invariants (not configurable, not policy): +1. Raw BFI never leaves the node. +2. Identity embedding is in-RAM-only (VecDeque, never persisted). +3. Cross-site identity matching is cryptographically impossible via per-site BLAKE3 + keyed hash with daily rotation. + +Output events published on `ruview//bfld/{presence,motion,person_count,...}/state`. + +Matter and HA expose only: presence, motion, person_count. Identity fields are rejected +at both boundaries. + +## Acceptance Criteria + +- [ ] **AC1**: Parser handles 802.11ac VHT and 802.11ax HE CBFR frames at 20/40/80/160 MHz, + 2×2 through 4×4 MIMO. +- [ ] **AC2**: Presence detection latency ≤ 1s p95 from first non-empty BFI frame in + a new occupancy event. +- [ ] **AC3**: Motion score published at ≥ 1 Hz on `ruview//bfld/motion/state` + during sustained occupancy. +- [ ] **AC4**: Raw BFI bytes (Phi/Psi angle matrices) are never present in any + serialized output at any `privacy_class` value. 
+- [ ] **AC5**: Privacy mode suppresses all identity-derived fields (`identity_risk_score`, + `rf_signature_hash`, `identity_embedding`) from all outbound events. +- [ ] **AC6**: Identical `BfiCapture` input → bit-identical `BfldFrame` output + (deterministic, cross-platform). +- [ ] **AC7**: Pipeline produces valid `BfldEvent` with `csi_matrix = None` (BFI-only + mode), without panic or significant accuracy degradation. + +## References + +- BFId paper: https://dl.acm.org/doi/10.1145/3719027.3765062 +- KIT BFId dataset: https://ps.tm.kit.edu/english/bfid-dataset/index.php +- LeakyBeam (NDSS 2025): https://www.ndss-symposium.org/ndss-paper/lend-me-your-beam-privacy-implications-of-plaintext-beamforming-feedback-in-wifi/ +- Wi-BFI tool: https://arxiv.org/abs/2309.04408 +- Protecting activity signatures in CSI feedback: https://arxiv.org/pdf/2512.18529 +- Research bundle: `docs/research/BFLD/` (this repo) +- Draft ADR: `docs/research/BFLD/08-adr-draft.md` → ADR-118 + +## Out of Scope + +- Preventing passive BFI capture by external attackers (hardware-level problem, not + software). +- Differential privacy noise injection (noted as future extension in ADR-118). +- Federated identity learning (local-only is sufficient for the current use case). +- BFI capture directly from ESP32-S3 firmware (Espressif API does not expose CBFR; + host-side Pi 5 / Nexmon capture is the implementation path). +- WiFi 7 / 802.11be multi-link BFI (frame format versioning accommodates it; not + in scope for v1 implementation). 
+ +## Related Issues / PRs + +- ADR-028 witness bundle (ref: this repo's `docs/WITNESS-LOG-028.md`) +- ADR-115 HA integration (21 entities; BFLD adds 6 more) +- ADR-116 Matter seed packaging (`cog-ha-matter` crate needs Matter boundary update) +- ADR-117 pip modernization (PyO3 pattern reused for BFLD Python bindings) +- rvCSI platform (ADR-095/096): Nexmon adapter shared with BFLD BFI capture path diff --git a/docs/research/BFLD/10-gist.md b/docs/research/BFLD/10-gist.md new file mode 100644 index 00000000..d9133c84 --- /dev/null +++ b/docs/research/BFLD/10-gist.md @@ -0,0 +1,136 @@ +# BFLD: The Privacy Layer Your WiFi Sensing Stack Has Been Missing + +Your WiFi devices are broadcasting your identity in plaintext. Here is the layer that +catches it. + +--- + +## The Problem + +While your phone or laptop is connected to a WiFi 5 or WiFi 6 router, it periodically +transmits a Compressed Beamforming Report (CBFR frame). This frame contains the compressed +channel matrix the router needs to aim its antennas at your device. The compression uses +Givens rotations, stored as sets of angles (Phi and Psi) per active subcarrier, that encode +the spatial geometry of the wireless channel around your body. + +Here is the catch: these frames are sent outside WPA2/WPA3 encryption. +They are plaintext management frames, passively readable by any WiFi adapter in monitor +mode within roughly 20 meters. + +Two papers published in 2024–2025 confirm the threat is real: + +- **BFId** (KIT, ACM CCS 2025): re-identifies 197 people from beamforming feedback alone, + >90% accuracy from just 5 seconds of capture. Tools needed: a WiFi adapter, a pip + install, and no access to the target network. + (https://dl.acm.org/doi/10.1145/3719027.3765062) + +- **LeakyBeam** (Zhejiang U. / NTU / KAIST, NDSS 2025): detects occupancy through walls + at 20 m range using beamforming feedback, with an 82.7% true-positive rate. 
+ (https://www.ndss-symposium.org/ndss-paper/lend-me-your-beam-privacy-implications-of-plaintext-beamforming-feedback-in-wifi/) + +WiFi sensing systems — including this project — process these same signals to detect +presence, count people, and track motion. Without a privacy layer, there is no way to +know whether the sensing output is derived from anonymizable motion data or from +identity-discriminative data. + +--- + +## What BFLD Does + +BFLD (Beamforming Feedback Layer for Detection) is a new Rust crate in the +wifi-densepose workspace that adds one thing: an explicit, continuous measurement of +whether the beamforming data currently being processed is capable of identifying +individuals. + +It outputs a small, structured event on every sensing cycle: + +```json +{ + "timestamp_ns": 1748092800000000000, + "presence": true, + "motion": 0.42, + "person_count": 1, + "identity_risk_score": 0.71, + "rf_signature_hash": "a3f2c1...e9b4", + "zone_id": "living_room", + "confidence": 0.88, + "privacy_class": 1 +} +``` + +High `identity_risk_score` (approaching 1.0) means the current sensing environment is +producing data from which an attacker could re-identify individuals. Low score means +the data is effectively anonymous. + +The score is computed from four components: how separable the current RF embedding is +from a population distribution, how stable that separability is over time, how +consistent it is across multiple sensor viewpoints, and how confident the current sample +is. Multiply them together, clamp to [0, 1]. + +--- + +## Three Invariants That Cannot Be Turned Off + +BFLD enforces three properties structurally — not as settings, not as policies: + +**1. Raw BFI never leaves the node.** The Phi/Psi angle matrices are consumed locally +and dropped after feature extraction. They are not in the wire format. They are not in +the MQTT payload. There is no code path to serialize them outbound. + +**2. 
Identity embeddings are RAM-only.** The vector embedding used to compute the risk +score lives in a fixed-size ring buffer (default: 10 minutes). It is never written to +disk. When the node restarts, the buffer is gone. + +**3. Cross-site re-identification is cryptographically impossible.** The +`rf_signature_hash` is computed with a per-site secret key (generated at first boot, +stored in local NVS, never transmitted) and a per-day epoch. Two nodes at two +different sites, even receiving signals from the same person on the same day, hash +under different keys and therefore produce unrelated values. No amount of hash-list +comparison can reveal a cross-site visit. + +--- + +## What Reaches Home Assistant and Matter + +BFLD publishes to MQTT and HA. The following entities reach HA: + +- `binary_sensor.bfld_presence` +- `sensor.bfld_motion` +- `sensor.bfld_person_count` +- `sensor.bfld_confidence` + +The Matter bridge exposes only presence (OccupancySensing), motion, and person count. +Identity risk score, rf_signature_hash, and all raw fields are rejected at both the HA +and Matter boundaries. + +--- + +## Seven Acceptance Criteria + +The implementation is done when these seven tests pass: + +1. Parse 802.11ac and 802.11ax BFI at 20–160 MHz bandwidth, 2×2 to 4×4 MIMO. +2. Presence latency ≤ 1 second p95. +3. Motion published at ≥ 1 Hz. +4. Raw BFI bytes absent from all output (verified by fuzz test). +5. Privacy mode suppresses all identity fields. +6. Identical input → identical output hash (cross-platform determinism). +7. Pipeline runs without CSI input (BFI-only mode). + +--- + +## BFLD Is an Immune System, Not a Surveillance Lens + +The framing matters. BFLD does not produce identity; it measures identity risk and +uses that measurement to gate what leaves the node. An immune system does not broadcast +the identity of pathogens it encounters; it classifies, responds locally, and keeps +detailed records inside the organism. + +WiFi 7 / 802.11be is deploying now. 
Multi-link operation will increase beamforming +sounding frequency 3–5x. The passive attack surface will grow. The time to establish +safe defaults in WiFi sensing stacks is before that installed base is in place. + +BFLD is that default. + +Full research bundle: `docs/research/BFLD/` in the wifi-densepose repository. +Draft ADR: `docs/research/BFLD/08-adr-draft.md` (ADR-118). diff --git a/docs/research/BFLD/README.md b/docs/research/BFLD/README.md new file mode 100644 index 00000000..5832d219 --- /dev/null +++ b/docs/research/BFLD/README.md @@ -0,0 +1,58 @@ +# BFLD Research Bundle — Beamforming Feedback Layer for Detection + +BFLD is the safety layer that detects when RF data becomes identifying. It sits between +raw 802.11 beamforming feedback (BFI) and every downstream consumer — home automation, +MQTT, Matter, cloud — measuring the identity-leakage potential of each frame and gating +what leaves the node. It does not produce identity; it guards against accidental or +adversarial exposure of identity. 
+ +--- + +## Table of Contents + +| File | Purpose | +|------|---------| +| [01-sota-survey.md](01-sota-survey.md) | State-of-the-art literature: BFI vs CSI, attack tooling, identity-inference research, privacy-preserving techniques | +| [02-soul.md](02-soul.md) | Architectural intent, ethical stance, three non-negotiable invariants | +| [03-security-threat-model.md](03-security-threat-model.md) | Adversary classes, attack trees, mitigations, trust-boundary diagram, per-privacy-class analysis | +| [04-privacy-gating.md](04-privacy-gating.md) | privacy_class byte semantics, hash rotation algorithm, embedding lifecycle, wire-format diffs | +| [05-automation-integration.md](05-automation-integration.md) | Home Assistant entities, Matter clusters, MQTT ACLs, cognitum federation | +| [06-implementation-plan.md](06-implementation-plan.md) | New crate layout, reuse map, ESP32 additions, test plan, phased rollout | +| [07-benchmarks-and-evaluation.md](07-benchmarks-and-evaluation.md) | Datasets, metrics, red-team protocol, comparison baselines | +| [08-adr-draft.md](08-adr-draft.md) | Draft ADR-118 for formal project adoption | +| [09-github-issue.md](09-github-issue.md) | GitHub issue draft for tracking implementation | +| [10-gist.md](10-gist.md) | Public-facing one-pager / blog summary | + +--- + +## Executive Summary + +1. **Problem.** IEEE 802.11ac/ax beamforming feedback (BFI) — the compressed angle matrices + (Phi/Psi, Givens rotation) exchanged between client and AP — is transmitted unencrypted + on the management plane. Academic work (BFId at ACM CCS 2025, LeakyBeam at NDSS 2025) + demonstrates that a passive sniffer with commodity hardware can re-identify individuals + and infer occupancy through walls using only these frames. Existing CSI-based sensing + pipelines have no explicit layer to detect when their output crosses from "motion event" + into "identity record." + +2. 
**Approach.** BFLD is a new crate (`wifi-densepose-bfld`) that wraps the BFI extraction + and normalization path in an identity-leakage estimator. Every output frame carries a + computed `identity_risk_score` and a `privacy_class` byte; downstream consumers decide + whether to act based on those tags rather than on raw measurements. + +3. **Novel contribution.** BFLD does not try to suppress identity inference — it tries to + *measure* it continuously and make the measurement explicit in every event. This + transforms a latent, silent risk into an observable, auditable signal. The combination + of per-day per-site hash rotation and a local-only identity embedding creates structural + impossibility of cross-site re-identification — not merely a policy promise. + +4. **Security posture.** Raw BFI never leaves the node. Identity embeddings live only in + an in-RAM ring buffer. The rf_signature_hash rotates daily using a per-site blake3 + keyed-hash that is never transmitted. Matter and HA expose only presence, motion, and + person_count — never risk scores or embeddings. + +5. **Integration plan.** Six phases: P1 frame format + extractor stub, P2 feature + extraction + identity_risk, P3 privacy gate + MQTT, P4 HA integration, P5 Matter + exposure, P6 cognitum federation. Each phase maps to a numbered acceptance criterion. + The crate slots into the existing workspace between `wifi-densepose-signal` and + `wifi-densepose-sensing-server`.
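To make the approach and security-posture items above concrete, the following is a minimal Python sketch of the two ideas they rely on: a risk score formed as the clamped product of four components (as described in the ADR draft and the gist), and a gate that strips identity-correlated fields at privacy classes 2 and 3. The function names, the dict-based event, and the `privacy_class >= 2` rule are assumptions made for illustration; the crate itself is planned in Rust and enforces this through its type system rather than a runtime filter.

```python
# Identity-correlated field names, taken from AC5 in the ADR draft.
IDENTITY_FIELDS = frozenset(
    {"identity_risk_score", "rf_signature_hash", "identity_embedding"}
)


def identity_risk_score(separability, temporal_stability,
                        cross_perspective_consistency, sample_confidence):
    """Product of the four components, clamped to [0, 1]."""
    product = (separability * temporal_stability
               * cross_perspective_consistency * sample_confidence)
    return min(1.0, max(0.0, product))


def gate_event(event, privacy_class):
    """Return a copy of the event that is safe to publish at privacy_class.

    Assumption: classes 2 and 3 drop every identity-correlated field,
    while lower classes pass the event through unchanged.
    """
    if privacy_class >= 2:
        return {k: v for k, v in event.items() if k not in IDENTITY_FIELDS}
    return dict(event)


# Illustrative event with made-up component values.
event = {
    "presence": True,
    "motion": 0.42,
    "person_count": 1,
    "identity_risk_score": identity_risk_score(0.9, 0.8, 1.0, 0.5),
    "rf_signature_hash": "a3f2c1",
}
published_class_1 = gate_event(event, privacy_class=1)
published_class_2 = gate_event(event, privacy_class=2)
```

Because the score is a product, any single component near zero pulls the whole score toward zero, so a low-confidence sample cannot produce a high risk reading on its own.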