docs(adr-118): BFLD — Beamforming Feedback Layer for Detection (6 ADRs + research bundle)

Introduce the Beamforming Feedback Layer for Detection: the RuView safety layer
that ingests WiFi BFI, measures identity-leakage risk, and structurally prevents
identity-correlated data from leaving the node by default.

ADRs (6):
- ADR-118: umbrella decision, crate scaffolding, 6-phase rollout (~10.5 wk)
- ADR-119: BfldFrame wire format, magic 0xBF1D_0001, deterministic serialization
- ADR-120: 4 privacy classes, BLAKE3 keyed-hash rotation, #[must_classify] default-deny
- ADR-121: 9-feature identity-risk scoring, coherence gate with hysteresis
- ADR-122: 6 HA entities, 3 Matter clusters, mosquitto ACL, cognitum-v0 federation
- ADR-123: Pi 5 / Nexmon production capture, AX210 dev path, ESP32-S3 self-only fallback

Research bundle (docs/research/BFLD/, 13,544 words):
- SOTA survey covering BFId (KIT, ACM CCS 2025) and LeakyBeam (NDSS 2025)
- Architectural soul: defensive sensing primitive, not surveillance lens
- Six-adversary threat model with attack trees and mitigations
- Privacy-gating mechanics with structural cross-site isolation proof
- Automation/integration surface (HA, Matter, MQTT, federation)
- Concrete implementation plan with reuse map
- Evaluation strategy with red-team protocol on KIT BFId dataset
- Draft ADR, GitHub issue, and public gist

Three structural invariants enforced by the type system, not policy:
  I1 — Raw BFI never exits the node
  I2 — Identity embedding is in-RAM-only (no Serialize impl)
  I3 — Cross-site identity correlation is cryptographically impossible
       (per-site BLAKE3 keyed-hash with daily epoch rotation)

References:
  https://publikationen.bibliothek.kit.edu/1000185756 (BFId)
  https://www.ndss-symposium.org/wp-content/uploads/2025-5-paper.pdf (LeakyBeam)

Co-Authored-By: claude-flow <ruv@ruv.net>
ruv 2026-05-24 12:20:52 -04:00
parent be4efecbcd
commit 29233db6d5
17 changed files with 3267 additions and 0 deletions


@ -0,0 +1,181 @@
# ADR-118: BFLD — Beamforming Feedback Layer for Detection
| Field | Value |
|-------|-------|
| **Status** | Proposed |
| **Date** | 2026-05-24 |
| **Deciders** | ruv |
| **Codename** | **BFLD** — Beamforming Feedback Layer for Detection |
| **Relates to** | [ADR-024](ADR-024-contrastive-csi-embedding-model.md) (AETHER), [ADR-027](ADR-027-cross-environment-domain-generalization.md) (MERIDIAN), [ADR-028](ADR-028-esp32-capability-audit.md) (witness), [ADR-029](ADR-029-ruvsense-multistatic-sensing-mode.md) (multistatic), [ADR-030](ADR-030-ruvsense-persistent-field-model.md) (field model), [ADR-031](ADR-031-ruview-sensing-first-rf-mode.md) (sensing-first), [ADR-032](ADR-032-multistatic-mesh-security-hardening.md) (mesh security), [ADR-095](ADR-095-rvcsi-edge-rf-sensing-platform.md) (rvCSI), [ADR-115](ADR-115-home-assistant-integration.md) (HA), [ADR-116](ADR-116-cog-ha-matter-seed.md) (Matter), [ADR-117](ADR-117-pip-wifi-densepose-modernization.md) (pip) |
| **Sub-ADRs** | [ADR-119](ADR-119-bfld-frame-format-and-wire-protocol.md) (frame), [ADR-120](ADR-120-bfld-privacy-class-and-hash-rotation.md) (privacy), [ADR-121](ADR-121-bfld-identity-risk-scoring.md) (risk), [ADR-122](ADR-122-bfld-ruview-ha-matter-exposure.md) (RuView), [ADR-123](ADR-123-bfld-capture-path-nexmon-and-esp32.md) (capture) |
| **Research bundle** | [`docs/research/BFLD/`](../research/BFLD/) (11 files, 13,544 words) |
| **Tracking issue** | TBD |
---
## 1. Context
### 1.1 The plaintext BFI problem
IEEE 802.11ac and 802.11ax beamforming feedback information (BFI) is exchanged between client stations (STA) and access points (AP) in **unencrypted management-plane frames**. The STA compresses the channel response into a Givens-rotation angle matrix (Φ/ψ) and transmits it as a VHT/HE Compressed Beamforming Report (CBFR). Any device in WiFi monitor mode within range can passively sniff these frames without joining the network.
Two independent 2024–2025 research results establish the severity of this exposure:
1. **BFId** (KIT, ACM CCS 2025) — re-identifies 197 individuals from BFI alone with >90% accuracy from 5 s of capture. https://publikationen.bibliothek.kit.edu/1000185756
2. **LeakyBeam** (NDSS 2025) — detects occupancy through walls at 20 m with 82.7% TPR / 96.7% TNR using only plaintext BFI. https://www.ndss-symposium.org/wp-content/uploads/2025-5-paper.pdf
Capture tooling is freely available: **Wi-BFI** (pip-installable), **PicoScenes**, **Nexmon BFI patches** for BCM43455c0 (Raspberry Pi 5 / 4 / 3B+).
### 1.2 Gap in the existing RuView pipeline
The wifi-densepose / RuView pipeline processes CSI via the rvCSI runtime (ADR-095/096) and emits presence, pose, vitals, and zone-activity events. **No layer in the existing pipeline measures whether the data it is processing is capable of identifying individuals.** All CSI is treated as equivalent from a privacy standpoint regardless of operating regime.
This gap becomes a compliance and liability issue at deployment scale. An operator placing RuView in a care home, hotel, shared office, or rental property has no instrument to verify that the system is operating anonymously.
### 1.3 BFI as a sensing signal
BFI is not only a threat vector — its compressed angle matrices carry multipath geometry useful for presence and motion detection, particularly in single-AP deployments where MIMO CSI is unavailable. BFLD treats BFI as an **optional input alongside CSI**, not a replacement.
### 1.4 What this ADR is *not*
- Not a removal of the CSI pipeline. ADR-095/096 rvCSI stays authoritative for CSI.
- Not a port of any external sniffer into the repo. The Nexmon capture path lives in a separate adapter (see ADR-123).
- Not a Matter SDK ship — Matter exposure is filtered through the ADR-116 `cog-ha-matter` boundary.
---
## 2. Decision
Create a new Rust crate **`wifi-densepose-bfld`** in `v2/crates/` that:
1. **Ingests** BFI angle matrices (Φ/ψ) from CBFR frames, optionally fused with CSI.
2. **Computes** nine named features and an `identity_risk_score` (separability × temporal_stability × cross_perspective_consistency × sample_confidence).
3. **Gates** all output through a `privacy_class` byte that **structurally prevents** identity-correlated data from being published at classes 2 (anonymous) and 3 (restricted).
4. **Emits** `BfldEvent` JSON over MQTT under `ruview/<node_id>/bfld/*` with per-class topic routing.
5. **Enforces three invariants structurally, not by policy**:
   - **I1**: Raw BFI never exits the node.
   - **I2**: Identity embedding is in-RAM-only (no disk, no network).
   - **I3**: Cross-site identity correlation is cryptographically impossible via per-site keyed BLAKE3 hash rotation with a daily epoch.
The umbrella implementation is decomposed into five sub-ADRs:
| Sub-ADR | Scope |
|---------|-------|
| **ADR-119** | `BfldFrame` wire format, magic `0xBF1D_0001`, deterministic serialization, CRC32 |
| **ADR-120** | `privacy_class` semantics, BLAKE3 hash rotation, default-deny field classification |
| **ADR-121** | Identity risk scoring formula, coherence gate, leakage estimator |
| **ADR-122** | RuView surface: HA entities, Matter cluster boundary, MQTT topic ACL |
| **ADR-123** | Capture path: Pi 5 / Nexmon adapter + ESP32-S3 BFI feasibility |
### 2.1 Crate module layout
```
v2/crates/wifi-densepose-bfld/
├── Cargo.toml
└── src/
    ├── lib.rs
    ├── frame.rs            # BfldFrame (ADR-119)
    ├── extractor.rs        # CBFR parser → BfiCapture
    ├── features.rs         # 9 features
    ├── identity_risk.rs    # risk score (ADR-121)
    ├── privacy_gate.rs     # privacy_class enforcement (ADR-120)
    ├── hash_rotation.rs    # BLAKE3 per-site rotation (ADR-120)
    ├── emitter.rs          # BfldEvent → MQTT
    ├── mqtt.rs             # topic routing (ADR-122)
    └── ffi.rs              # PyO3 bindings (ADR-117 pattern)
```
### 2.2 Reuse map
| BFLD module | Depends on |
|---|---|
| `features.rs` | `wifi-densepose-signal/src/ruvsense/coherence.rs`, `multistatic.rs` |
| `identity_risk.rs` | `wifi-densepose-ruvector/src/viewpoint/attention.rs`, `coherence.rs` |
| `privacy_gate.rs` | (new) — no upstream dependency |
| `hash_rotation.rs` | `blake3 = "1.5"` (keyed mode) |
| `extractor.rs` | `vendor/rvcsi/crates/rvcsi-adapter-nexmon` (ADR-095/096) |
---
## 3. Consequences
### Positive
- First explicit, auditable RF-layer privacy primitive in the wifi-densepose ecosystem.
- `identity_risk_score` doubles as an anomaly signal (sudden spike → new AP firmware / nearby attacker-grade sniffer / unusual propagation).
- BFI fusion augments presence/motion in single-AP deployments.
- Deterministic frame hashes extend the ADR-028 witness-bundle pattern to the new surface.
- Cross-site isolation is **structural, not policy-dependent** — a stronger guarantee than ACLs.
### Negative
- ESP32-S3 cannot directly capture CBFR via the Espressif WiFi API. Full BFLD pipeline requires a Pi 5 / Nexmon host sniffer (cognitum-v0 available; see ADR-123).
- `identity_risk_score` calibration requires the KIT BFId dataset (non-commercial research agreement).
- Estimated effort: ~10.5 engineer-weeks across the six ADRs.
### Neutral
- BFLD does not prevent passive BFI capture by an external attacker (LeakyBeam-class). It only ensures the **node's own output** is non-identifying. Operators must understand this distinction.
- Daily hash rotation prevents multi-day analytics correlating individual signatures across the day boundary. Acceptable for privacy goals; may surprise analytics use-cases.
---
## 4. Alternatives Considered
### Alt 1: Skip BFI entirely (CSI-only)
Rejected because: (a) leaves the identity-leakage gap open for the CSI pipeline; (b) as BFI tooling becomes ubiquitous (Wi-BFI, PicoScenes), the absence of a privacy layer becomes more conspicuous for operators.
### Alt 2: Publish `identity_risk_score` publicly by default
Rejected: the risk score itself is privacy-sensitive (reveals presence via timing correlation). Default is opt-in.
### Alt 3: Cloud ML on raw BFI
Rejected: violates I1. Cloud training creates an off-node store of angle matrices reconstructible into identity profiles.
### Alt 4: Differential privacy noise on BFI at ingress
Deferred to a follow-up ADR. DP sensitivity analysis and its interaction with `identity_risk_score` calibration are not yet complete. Current design achieves privacy through structural impossibility, not noise injection.
---
## 5. Acceptance Criteria
- [ ] **AC1**: Extractor parses BFI from 802.11ac and 802.11ax captures, 20/40/80/160 MHz, 2×2 through 4×4 MIMO.
- [ ] **AC2**: Presence detection latency ≤ 1 s p95 from first non-empty BFI frame.
- [ ] **AC3**: Motion score published at ≥ 1 Hz on `ruview/<node_id>/bfld/motion/state`.
- [ ] **AC4**: Raw BFI bytes never present in any serialized `BfldFrame` payload at any `privacy_class` value.
- [ ] **AC5**: With `privacy_mode` enabled, all identity-derived fields are absent from outbound events.
- [ ] **AC6**: Identical `BfiCapture` inputs produce bit-identical `BfldFrame` serialization (deterministic hash).
- [ ] **AC7**: Pipeline produces valid `BfldEvent` outputs without `csi_matrix` (BFI-only mode).
Per-sub-ADR acceptance criteria are defined in ADR-119 through ADR-123.
---
## 6. Phased Rollout
| Phase | ADR | Scope | Effort |
|-------|-----|-------|--------|
| **P1** | 119 | Frame format + extractor stub | 1.5 wk |
| **P2** | 121 | Features + identity_risk_score | 2.0 wk |
| **P3** | 120 | Privacy gate + hash rotation | 1.5 wk |
| **P4** | 122 (a) | MQTT emitter + HA discovery | 1.5 wk |
| **P5** | 122 (b) | Matter cluster boundary in `cog-ha-matter` | 1.5 wk |
| **P6** | 123 | Pi 5 / Nexmon capture adapter | 2.5 wk |
| **Total** | | | **10.5 wk** |
---
## 7. Related ADRs
See header table. Cross-references in body cite the structural reuse of:
- ADR-024 (AETHER embedding for identity_risk computation)
- ADR-027 (MERIDIAN's no-cross-site assumption is now structurally enforced by I3)
- ADR-028 (witness-bundle extends to BFLD surface)
- ADR-029/030 (`multistatic.rs`, `cross_room.rs` reused)
- ADR-095/096 (rvCSI Nexmon adapter for BFI capture)
- ADR-115 (HA surface extension)
- ADR-116 (`cog-ha-matter` boundary filter)
- ADR-117 (PyO3 bindings pattern)


@ -0,0 +1,163 @@
# ADR-119: BFLD Frame Format and Wire Protocol
| Field | Value |
|-------|-------|
| **Status** | Proposed |
| **Date** | 2026-05-24 |
| **Deciders** | ruv |
| **Parent** | [ADR-118](ADR-118-bfld-beamforming-feedback-layer-for-detection.md) |
| **Relates to** | [ADR-028](ADR-028-esp32-capability-audit.md) (witness/deterministic proof), [ADR-095](ADR-095-rvcsi-edge-rf-sensing-platform.md) (rvCSI `CsiFrame` schema) |
| **Tracking issue** | TBD |
---
## 1. Context
The BFLD pipeline (ADR-118) emits an over-the-wire `BfldFrame` consumed by the RuView aggregator, HA bridge, and witness bundle. The frame must be:
1. **Deterministic** — identical input ⇒ bit-identical output, so witness hashes survive verification (ADR-028 pattern).
2. **Self-describing** — magic + version so future BFLD revisions don't silently corrupt aggregator state.
3. **Privacy-classified at the byte level** — the receiver must know the data class before it even parses the payload, so it can drop frames it isn't authorized to handle.
4. **Compact** — BFLD nodes may emit at up to 10 Hz; the frame must be small enough for unsharded MQTT and ESP-NOW transport.
5. **Endianness-stable** — captures from x86_64 (ruvultra), aarch64 (cognitum-v0, Pi 5 cluster), and Xtensa (ESP32-S3) must produce identical bytes.
The existing rvCSI `CsiFrame` (ADR-095) is the closest precedent. BFLD reuses the same little-endian convention and the same "validate-before-FFI" posture.
---
## 2. Decision
### 2.1 `BfldFrame` header (40 bytes, little-endian, packed)
```rust
#[repr(C, packed)]
pub struct BfldFrameHeader {
    pub magic: u32,            // 0xBF1D_0001
    pub version: u16,          // 1
    pub flags: u16,            // bit0=has_csi_delta, bit1=privacy_mode, bit2-15 reserved
    pub timestamp_ns: u64,     // monotonic capture clock
    pub ap_hash: [u8; 16],     // BLAKE3-keyed(site_salt, ap_mac)[0..16]
    pub sta_hash: [u8; 16],    // BLAKE3-keyed(site_salt ‖ day_epoch, sta_mac)[0..16]
    pub session_id: [u8; 16],  // ephemeral, rotated on capture-session boundary
    pub channel: u16,          // 802.11 channel number
    pub bandwidth_mhz: u16,    // 20 | 40 | 80 | 160
    pub rssi_dbm: i16,
    pub noise_floor_dbm: i16,
    pub n_subcarriers: u16,
    pub n_tx: u8,
    pub n_rx: u8,
    pub quantization: u8,      // 0=f32, 1=i16, 2=i8, 3=packed (4-bit nibbles)
    pub privacy_class: u8,     // 0=raw, 1=derived, 2=anonymous, 3=restricted (default 2)
    pub payload_len: u32,
    pub payload_crc32: u32,    // CRC-32/ISO-HDLC over payload bytes only
}
```
Total header size: 40 bytes (validated by `static_assertions::const_assert_eq!`).
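A quick way to check the size claim is to let the compiler report it. The sketch below copies the field list from the §2.1 listing as written; summing the field widths gives 86 bytes rather than 40, so the field list and the stated header size appear to need reconciling before the size assertion can pass. `HEADER_SIZE` is a name introduced here for illustration, and the plain `const` assertion stands in for `static_assertions::const_assert_eq!`.

```rust
// Field list copied from the section 2.1 listing; the trailing comments are
// running byte totals. HEADER_SIZE is an illustrative name, not part of the ADR.
#[allow(dead_code)]
#[repr(C, packed)]
pub struct BfldFrameHeader {
    pub magic: u32,           //  4
    pub version: u16,         //  6
    pub flags: u16,           //  8
    pub timestamp_ns: u64,    // 16
    pub ap_hash: [u8; 16],    // 32
    pub sta_hash: [u8; 16],   // 48
    pub session_id: [u8; 16], // 64
    pub channel: u16,         // 66
    pub bandwidth_mhz: u16,   // 68
    pub rssi_dbm: i16,        // 70
    pub noise_floor_dbm: i16, // 72
    pub n_subcarriers: u16,   // 74
    pub n_tx: u8,             // 75
    pub n_rx: u8,             // 76
    pub quantization: u8,     // 77
    pub privacy_class: u8,    // 78
    pub payload_len: u32,     // 82
    pub payload_crc32: u32,   // 86
}

/// Size of the packed header; `packed` means no padding is inserted.
pub const HEADER_SIZE: usize = core::mem::size_of::<BfldFrameHeader>();

// Compile-time guard: the build fails if the layout drifts from this value.
const _: () = assert!(HEADER_SIZE == 86);
```

If 40 bytes is the intended budget, the three 16-byte hash/session fields alone (48 bytes) already exceed it, so they would have to shrink or move into the payload.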
### 2.2 Payload structure
Payload is a length-prefixed sequence of typed sections in this exact order:
```
payload = compressed_angle_matrix
‖ amplitude_proxy
‖ phase_proxy
‖ snr_vector
‖ optional_csi_delta (present iff flags.bit0 set)
‖ optional_vendor_extension (length 0 allowed)
```
Each section is `[u32 len_le][bytes...]`. The CRC32 covers all section bytes including length prefixes, but **not** the header.
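The section framing and CRC scope can be sketched with the standard library alone. This is a minimal illustration, assuming the payload is just the concatenation of `[u32 len_le][bytes...]` sections; `encode_payload` and `crc32_iso_hdlc` are hypothetical helper names, and the bitwise CRC stands in for whatever the `crc` crate would provide in the real crate.

```rust
/// CRC-32/ISO-HDLC computed bit by bit: reflected polynomial 0xEDB88320,
/// initial value 0xFFFF_FFFF, final XOR 0xFFFF_FFFF.
pub fn crc32_iso_hdlc(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Concatenates sections as `[u32 len_le][bytes...]` and returns the payload
/// together with the CRC over every payload byte, length prefixes included.
pub fn encode_payload(sections: &[&[u8]]) -> (Vec<u8>, u32) {
    let mut payload = Vec::new();
    for section in sections {
        payload.extend_from_slice(&(section.len() as u32).to_le_bytes());
        payload.extend_from_slice(section);
    }
    let crc = crc32_iso_hdlc(&payload);
    (payload, crc)
}
```

An absent section (for example the angle matrix at class 2) is still written as a four-byte zero length, which keeps the section order fixed and the byte stream deterministic.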
### 2.3 Privacy-class gating at serialization
The serializer enforces these rules **before** writing any payload bytes:
| `privacy_class` | `compressed_angle_matrix` | Identity-derived fields | Notes |
|-----------------|---------------------------|-------------------------|-------|
| 0 (`raw`) | full | full | **Local-only**, never serialized to a network sink |
| 1 (`derived`) | downsampled to 8-bit, top-k subcarriers | full | Operator-acknowledged research mode |
| 2 (`anonymous`, **default**) | absent (zero-length section) | absent | Production default |
| 3 (`restricted`) | absent | absent + diagnostic-only | Equivalent to class 2 + suppresses `identity_risk_score` on the bus |
The serializer returns `Err(BfldError::PrivacyViolation)` if the caller attempts to publish a class-0 frame through a network sink. This is enforced by a sink-type marker trait (`LocalSink` vs `NetworkSink`).
### 2.4 Deterministic serialization
Three guarantees:
1. **Field order is fixed** by `#[repr(C, packed)]`.
2. **Float quantization is canonical**: `quantization` byte values 1/2/3 use specified round-half-to-even with documented saturation; f32 (value 0) is forbidden over the wire (local-only).
3. **CRC32 is computed last**, after all section bytes are placed.
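To make the second guarantee concrete, here is a minimal sketch of round-half-to-even quantization to `i8` with saturation. The `scale` parameter and both function names are assumptions for illustration; the ADR does not specify the scaling, only the rounding mode and that saturation is documented.

```rust
/// Rounds to the nearest integer, resolving exact .5 ties toward the even
/// neighbour (so 0.5 -> 0, 1.5 -> 2, 2.5 -> 2).
pub fn round_half_even(x: f32) -> f32 {
    if (x - x.trunc()).abs() == 0.5 {
        // x is an exact tie; the nearest even integer is 2 * round(x / 2).
        2.0 * (x / 2.0).round()
    } else {
        x.round()
    }
}

/// Quantizes `value * scale` to i8, saturating at the i8 range.
/// `scale` is a hypothetical per-section scale factor.
pub fn quantize_i8(value: f32, scale: f32) -> i8 {
    let rounded = round_half_even(value * scale);
    // Clamp explicitly so the saturation rule is visible in the code.
    rounded.clamp(i8::MIN as f32, i8::MAX as f32) as i8
}
```

Because every platform applies the same tie rule and the same clamp, the quantized bytes do not depend on the host's default float-to-int behaviour.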
The witness test in `tests/determinism.rs` captures a 200-frame BFI fixture, serializes it 1,000 times across two threads, and verifies the BLAKE3 of the resulting byte stream is bit-identical.
### 2.5 Magic value rationale
`0xBF1D_0001` is chosen so that `bf1d` reads as "BFLD" in hex-dump output, easing Wireshark / xxd debugging. The final `0001` is the major version; minor revisions bump the `version` field.
---
## 3. Consequences
### Positive
- 40-byte header + compact payload fits comfortably in a 1500-byte MTU even at 4×4 MIMO with 256 subcarriers.
- Serialization is `no_std`-compatible, so the same code can run on ESP32-S3 (when ESP-NOW transport is added under ADR-123 P2).
- Witness-bundle integration is direct: the existing `archive/v1/data/proof/verify.py` pattern extends to a `bfld_verify.py` that consumes the same SHA-256 expected-hash file format.
### Negative
- `#[repr(C, packed)]` on the header means consumers must use `read_unaligned` — small ergonomic cost, mitigated by a `#[derive(BfldFrameAccess)]` proc-macro.
- Reserved flag bits 2-15 lock in future-extension order; any new bit assignment is a version bump.
### Neutral
- The vendor-extension section allows downstream RuView cogs (e.g., `cog-pose-estimation`) to attach metadata without a header change, at the cost of CRC scope creep. Vendor sections are explicitly outside the witness hash.
---
## 4. Alternatives Considered
### Alt 1: Protobuf / FlatBuffers
Rejected: schema evolution overhead, witness-hash instability across protoc versions, ~3× wire bloat for the small fixed-shape fields.
### Alt 2: CBOR
Rejected: deterministic CBOR (RFC 8949 §4.2) is achievable but the parser surface is large and tag handling is a footgun for the `no_std` ESP32 path.
### Alt 3: Variable-width magic / no magic
Rejected: receivers must distinguish BFLD frames from rvCSI `CsiFrame` and other RuView payloads on shared transports.
### Alt 4: Move CRC32 to header
Rejected: CRC must be computed after the payload, so its value would otherwise force a header rewrite; placing it last avoids a buffer-pass-back.
---
## 5. Acceptance Criteria
- [ ] **AC1**: `BfldFrameHeader` size is exactly 40 bytes on x86_64, aarch64, and xtensa-esp32s3.
- [ ] **AC2**: 1,000 serializations of a fixed `BfiCapture` fixture produce a bit-identical BLAKE3 hash.
- [ ] **AC3**: Passing a `privacy_class = 0` frame to `NetworkSink::publish()` returns `Err(BfldError::PrivacyViolation)`.
- [ ] **AC4**: Payload CRC32 mismatch causes `BfldFrame::parse()` to return `Err(BfldError::Crc)` without exposing partial payload state.
- [ ] **AC5**: Round-trip serialize/parse preserves all header fields exactly.
- [ ] **AC6**: A frame with `flags.bit0 = 0` (no CSI delta) and an unexpected CSI-delta section is rejected.
- [ ] **AC7**: Bench: serialization throughput ≥ 50k frames/sec on a 2025-era M1/M2 / Pi 5 core.
---
## 6. References
- ADR-118 §2 (umbrella decision)
- ADR-095 `CsiFrame` (`vendor/rvcsi/crates/rvcsi-core/src/frame.rs`)
- CRC-32/ISO-HDLC: `crc = "3"` crate
- BLAKE3 keyed mode: `blake3 = "1.5"`
- IEEE 802.11-2020 §19.3.12 (Compressed Beamforming Report)


@ -0,0 +1,179 @@
# ADR-120: BFLD Privacy Class and Hash Rotation
| Field | Value |
|-------|-------|
| **Status** | Proposed |
| **Date** | 2026-05-24 |
| **Deciders** | ruv |
| **Parent** | [ADR-118](ADR-118-bfld-beamforming-feedback-layer-for-detection.md) |
| **Relates to** | [ADR-027](ADR-027-cross-environment-domain-generalization.md) (MERIDIAN no-cross-site), [ADR-032](ADR-032-multistatic-mesh-security-hardening.md) (mesh security), [ADR-106](ADR-106-dp-sgd-and-primitive-isolation.md) (primitive isolation), [ADR-115](ADR-115-home-assistant-integration.md) (privacy mode) |
| **Tracking issue** | TBD |
---
## 1. Context
ADR-118 declares three structural invariants for BFLD:
- **I1**: Raw BFI never exits the node.
- **I2**: Identity embedding is in-RAM-only.
- **I3**: Cross-site identity correlation is cryptographically impossible.
I1/I2 are enforced by sink typing and module visibility (ADR-119 §2.3). I3 requires a hash-rotation scheme that makes the same physical person produce **different** `rf_signature_hash` values across sites and across day boundaries, without any out-of-band coordination between sites.
The existing `HA-PRIVACY` mode in ADR-115 already toggles between "full" and "anonymous" surfaces, but at a per-event granularity — not at a per-byte-field granularity. BFLD requires the latter because the `BfldFrame` payload mixes sensing data (publishable) and identity-derived data (non-publishable) in the same struct.
The BFId paper (KIT, ACM CCS 2025) demonstrates that even a few minutes of BFI capture across the same site is sufficient to build a persistent biometric. The mitigation must be **structural**, not policy-dependent.
---
## 2. Decision
### 2.1 The four privacy classes
A single `privacy_class: u8` byte in the `BfldFrame` header (ADR-119 §2.1) selects one of four classes. The crate enforces field availability statically through marker types.
| Class | Name | Use case | Available fields |
|-------|------|----------|------------------|
| **0** | `raw` | Local-only research, never networked | All fields, full-precision BFI matrix, identity embedding |
| **1** | `derived` | Operator-acknowledged research over LAN | Downsampled angle matrix, full features, identity_risk_score, identity_embedding |
| **2** | `anonymous` (**default**) | Production deployment | Aggregate sensing only: presence, motion, person_count, zone_id, confidence |
| **3** | `restricted` | Care-home / regulated deployment | Class 2 minus `identity_risk_score` and `rf_signature_hash` |
Default for new RuView nodes is class **2**. Operators must explicitly opt-down to class 1 via the existing `--research-mode` flag (ADR-115 §7); class 0 is reserved for `cargo test` and is unreachable from `wifi-densepose-sensing-server`.
### 2.2 Enforcement via marker types
```rust
pub trait Sink {}
pub trait LocalSink: Sink {}          // Allowed: classes 0,1,2,3
pub trait NetworkSink: Sink {}        // Allowed: classes 1,2,3 (NOT class 0)
pub trait MatterSink: NetworkSink {}  // Allowed: class 2,3 + cluster-filter (ADR-122)

impl Emitter {
    pub fn publish<S: NetworkSink>(&self, sink: &S, frame: BfldFrame)
        -> Result<(), BfldError>
    {
        if frame.header.privacy_class == 0 {
            return Err(BfldError::PrivacyViolation {
                reason: "class 0 to NetworkSink",
            });
        }
        // ... serialize and write
    }
}
```
Two checks work together. At compile time, the `S: NetworkSink` bound rejects any call to `publish` with a sink that does not implement `NetworkSink`. At run time, the `privacy_class == 0` check rejects class-0 frames even on a valid `NetworkSink`. Cross-sink frame routing requires an explicit class transition (see §2.4).
### 2.3 BLAKE3 keyed hash rotation for `rf_signature_hash`
The signature hash is computed as:
```rust
pub fn rf_signature_hash(
    site_salt: &[u8; 32],        // generated on first boot, persisted in TPM/KMS
    day_epoch: u32,              // floor(unix_time_utc / 86400)
    features: &IdentityFeatures,
) -> blake3::Hash {
    // Keyed mode: the 32-byte site salt is the key, so no other site can reproduce the hash.
    let mut hasher = blake3::Hasher::new_keyed(site_salt);
    hasher.update(&day_epoch.to_le_bytes());
    hasher.update(&features.canonical_bytes());
    hasher.finalize()
}
```
**Structural cross-site isolation**: because `site_salt` is a 256-bit random secret unique to each node and never transmitted, two sites observing the same physical person produce uncorrelated hashes. There is no key the operator (or an attacker who compromises one node) can use to bridge sites. This is stronger than a policy-based "do not share" rule because the bridge **cannot be computed**.
**Daily rotation**: `day_epoch` flipping at UTC midnight forces the hash of the same person to change once per day. Multi-day correlation requires re-acquiring the biometric, which the rotation actively breaks.
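The epoch arithmetic is small enough to show directly. This is a sketch under the definition given in the listing above (`floor(unix_time_utc / 86400)`); `day_epoch` and `hash_input_prefix` are illustrative helper names, and the keyed BLAKE3 call itself is omitted so the example needs only the standard library.

```rust
pub const SECONDS_PER_DAY: u64 = 86_400;

/// Day counter since the Unix epoch, in UTC. Integer division is the floor
/// for non-negative timestamps.
pub fn day_epoch(unix_time_utc: u64) -> u32 {
    (unix_time_utc / SECONDS_PER_DAY) as u32
}

/// Bytes fed to the keyed hasher ahead of the canonical feature bytes:
/// the epoch as four little-endian bytes, matching `day_epoch.to_le_bytes()`.
pub fn hash_input_prefix(unix_time_utc: u64) -> [u8; 4] {
    day_epoch(unix_time_utc).to_le_bytes()
}
```

Two captures one second apart on either side of UTC midnight get different prefixes, so their hashes differ even when the feature bytes are identical.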
### 2.4 Class-transition transformer
The only way a frame moves to a more restrictive class (a higher class number) is through `PrivacyGate::demote(frame, target_class)`. This function:
1. Asserts that the target class number is greater than or equal to the input class number.
2. Zeroes the disallowed fields with `zeroize::Zeroize`.
3. Re-computes `payload_crc32`.
4. Returns the new frame.
There is no `promote` operation — a class-2 frame cannot be turned back into a class-1 frame, because the dropped fields were not retained anywhere reachable from the gate.
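The transition rules can be sketched as a plain function. This is an illustration only: `Frame`, `GateError`, and the field set are simplified stand-ins for `BfldFrame`, the direction check is done at run time rather than at compile time, the CRC recomputation is left out, and the manual `fill(0)` stands in for the `zeroize` crate (which additionally prevents the compiler from optimizing the wipe away).

```rust
#[derive(Debug, PartialEq)]
pub enum GateError {
    /// The target class has a lower number (less restrictive) than the input.
    WouldPromote { from: u8, to: u8 },
    UnknownClass(u8),
}

/// Simplified stand-in for `BfldFrame`; only the fields needed here.
#[derive(Debug, Clone, PartialEq)]
pub struct Frame {
    pub privacy_class: u8,
    pub angle_matrix: Vec<u8>,
    pub identity_risk_score: Option<f32>,
    pub rf_signature_hash: Option<[u8; 32]>,
}

/// Moves a frame to an equal or more restrictive class, dropping the fields
/// that the target class does not allow.
pub fn demote(mut frame: Frame, target: u8) -> Result<Frame, GateError> {
    if target > 3 {
        return Err(GateError::UnknownClass(target));
    }
    if target < frame.privacy_class {
        return Err(GateError::WouldPromote { from: frame.privacy_class, to: target });
    }
    if target >= 2 {
        // Classes 2 and 3 carry no angle matrix: wipe, then truncate.
        frame.angle_matrix.fill(0);
        frame.angle_matrix.clear();
    }
    if target == 3 {
        // Class 3 also suppresses the risk score and the signature hash.
        frame.identity_risk_score = None;
        frame.rf_signature_hash = None;
    }
    frame.privacy_class = target;
    Ok(frame)
}
```

Taking `frame` by value means the caller gives up the original, so the dropped fields are not left reachable through an old reference.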
### 2.5 `identity_embedding` lifecycle
The embedding (output of the AETHER encoder, ADR-024) is held in a ring buffer of 64 `zeroize::Zeroizing<[f32; 128]>` entries (64 × 128 × 4 bytes = 32 KiB). Entries are:
1. Written by the encoder on each capture window.
2. Consumed by `identity_risk_score` computation (ADR-121).
3. **Never** written to disk, MQTT, or any other I/O sink — there is no `Serialize` impl on the type.
4. Overwritten by the ring (FIFO).
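The lifecycle above amounts to a fixed-capacity FIFO ring. The sketch below shows the overwrite behaviour and the memory footprint with plain arrays; `EmbeddingRing` and its methods are hypothetical names, and the zeroize-on-drop wrapper is omitted to keep the example free of external crates.

```rust
pub const EMBEDDING_DIM: usize = 128;
pub const RING_CAPACITY: usize = 64;
/// Bytes held by a full ring: 64 entries x 128 floats x 4 bytes.
pub const RING_BYTES: usize = RING_CAPACITY * EMBEDDING_DIM * std::mem::size_of::<f32>();

/// Fixed-capacity ring of embeddings. Deliberately has no serialization impl.
pub struct EmbeddingRing {
    slots: Vec<[f32; EMBEDDING_DIM]>,
    next: usize, // index the next push will write to
    len: usize,  // number of valid entries, capped at RING_CAPACITY
}

impl EmbeddingRing {
    pub fn new() -> Self {
        Self { slots: vec![[0.0; EMBEDDING_DIM]; RING_CAPACITY], next: 0, len: 0 }
    }

    /// Writes one embedding; once the ring is full this overwrites the oldest entry.
    pub fn push(&mut self, embedding: [f32; EMBEDDING_DIM]) {
        self.slots[self.next] = embedding;
        self.next = (self.next + 1) % RING_CAPACITY;
        self.len = (self.len + 1).min(RING_CAPACITY);
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Most recently written embedding, if any.
    pub fn latest(&self) -> Option<&[f32; EMBEDDING_DIM]> {
        if self.len == 0 {
            return None;
        }
        let idx = (self.next + RING_CAPACITY - 1) % RING_CAPACITY;
        Some(&self.slots[idx])
    }
}
```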
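A compile-time guard on `IdentityEmbedding` ensures a future PR cannot accidentally add a `Serialize` derive. Because `#[forbid(...)]` accepts only lint names, this guard is expected to take the form of a negative-impl assertion (for example `static_assertions::assert_not_impl_any!`) or a custom lint rather than a literal `#[forbid(serde::Serialize)]` attribute.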
### 2.6 Default-deny field classification
Every new field added to `BfldFrame` or `BfldEvent` must be tagged with `#[must_classify]` (a custom attribute macro). The macro fails compilation if the field is not listed in the per-class allow-list table. This forces future contributors to make an explicit privacy decision on every new field.
---
## 3. Consequences
### Positive
- Cross-site identity correlation is **computationally impossible**, not merely "prohibited by policy". This is the strongest form of privacy guarantee available without a TEE.
- Default-deny via `#[must_classify]` prevents the common pattern of "a new field shipped, then six months later we noticed it was identity-leaky".
- `identity_embedding` cannot be serialized by accident — the type system carries the constraint.
- The class transition transformer makes the data lifecycle explicit and auditable.
### Negative
- `site_salt` storage requires either a TPM (ADR-095/096 rvCSI platform feature gap) or a secrets file with strict mode. Loss of `site_salt` makes historical witness comparisons impossible — by design, but a documentation hazard.
- `#[must_classify]` is a custom proc-macro; another moving part in the build.
- Operators wanting multi-day analytics must work in aggregates only, not on per-individual signatures.
### Neutral
- Class 0 is `cargo test`-only. Some CI runners may need an explicit feature flag to compile class-0 paths.
---
## 4. Alternatives Considered
### Alt 1: Single boolean `privacy_mode` flag (status quo from ADR-115)
Rejected: insufficient granularity. The frame mixes publishable sensing with non-publishable identity, so the gate must operate at field-level, not event-level.
### Alt 2: SHA-256 instead of BLAKE3
Rejected: BLAKE3 keyed-hash mode is ~5× faster on the ESP32-S3 / Cortex-M cores and the security margin is equivalent for this use case. SHA-256 has no keyed-hash mode (HMAC-SHA256 is the alternative; works but is slower).
### Alt 3: Hash rotation on the hour, not the day
Rejected: hourly rotation breaks legitimate "person was here in the morning, came back in the afternoon" use-cases that operators may want. Day boundary is the compromise.
### Alt 4: Per-event nonces instead of daily epoch
Rejected: per-event nonces would force the consumer to track which events came from the same person within a session, which leaks identity information by structure. The day epoch preserves a coarse temporal grouping without leaking finer-grained identity.
---
## 5. Acceptance Criteria
- [ ] **AC1**: Calling `Emitter::publish` with a `privacy_class = 0` frame on a `NetworkSink` returns `BfldError::PrivacyViolation`.
- [ ] **AC2**: Two BFLD nodes with different `site_salt` values observing the same simulated person produce `rf_signature_hash` values whose Hamming distance is ≥ 120 bits over 100 trials (statistical isolation test).
- [ ] **AC3**: A frame with `privacy_class = 3` has both `identity_risk_score` and `rf_signature_hash` absent from the serialized payload.
- [ ] **AC4**: `PrivacyGate::demote(class_1_frame, target=0)` fails to compile (compile-fail test).
- [ ] **AC5**: A PR adding a new field to `BfldEvent` without `#[must_classify]` fails the build.
- [ ] **AC6**: `IdentityEmbedding` has no `Serialize` impl reachable from any public function.
- [ ] **AC7**: Dropping an `IdentityEmbedding` value zeroizes its memory (verified by a debugger-readable test under `cargo test --features zeroize-validation`).
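For the statistical isolation test in AC2, the distance measure is a bitwise Hamming distance over the 32-byte hash. A minimal sketch, with `hamming_distance` as an illustrative helper name:

```rust
/// Number of differing bits between two 256-bit hashes.
pub fn hamming_distance(a: &[u8; 32], b: &[u8; 32]) -> u32 {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| (x ^ y).count_ones())
        .sum()
}
```

For two independent 256-bit values the expected distance is 128 bits with a standard deviation of 8 bits, so the 120-bit threshold reads most naturally as a bound on the mean over the 100 trials; individual trials would be expected to fall below 120 fairly often.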
---
## 6. References
- ADR-118 (umbrella)
- ADR-119 (frame format; `privacy_class` byte location)
- KIT BFId (ACM CCS 2025): https://publikationen.bibliothek.kit.edu/1000185756
- NDSS LeakyBeam (2025): https://www.ndss-symposium.org/wp-content/uploads/2025-5-paper.pdf
- BLAKE3 keyed-hash: https://github.com/BLAKE3-team/BLAKE3
- `zeroize::Zeroize` for memory hygiene


@ -0,0 +1,169 @@
# ADR-121: BFLD Identity Risk Scoring and Coherence Gate
| Field | Value |
|-------|-------|
| **Status** | Proposed |
| **Date** | 2026-05-24 |
| **Deciders** | ruv |
| **Parent** | [ADR-118](ADR-118-bfld-beamforming-feedback-layer-for-detection.md) |
| **Relates to** | [ADR-024](ADR-024-contrastive-csi-embedding-model.md) (AETHER), [ADR-027](ADR-027-cross-environment-domain-generalization.md) (MERIDIAN), [ADR-029](ADR-029-ruvsense-multistatic-sensing-mode.md) (multistatic fusion), [ADR-086](ADR-086-edge-novelty-gate.md) (novelty gate precedent), [ADR-120](ADR-120-bfld-privacy-class-and-hash-rotation.md) (privacy class) |
| **Tracking issue** | TBD |
---
## 1. Context
BFLD's distinguishing primitive is the `identity_risk_score` — a scalar that says **"is this capture window currently capable of identifying a specific person?"**. The score has two consumers:
1. **The operator** — exposed as an HA diagnostic sensor (ADR-122). A spike from the long-term baseline indicates the RF environment has shifted toward a higher-leakage regime (new AP firmware, denser MIMO, attacker-grade sniffer in range).
2. **The privacy gate** (ADR-120) — when the score crosses a configurable threshold, the gate downgrades the active `privacy_class` automatically (e.g., 2 → 3) until the score recovers.
The score must be:
- **Bounded** in `[0, 1]` for HA gauge entities.
- **Calibrated** against actual re-ID success rate, ideally on the KIT BFId dataset.
- **Computable on-device** at ≥ 1 Hz on a Pi 5 core or an aarch64 cognitum-v0.
- **Stable** — small environmental changes should not produce wild swings; the score is for slow-moving regime detection, not per-frame chatter.
ADR-086 (edge novelty gate) establishes a precedent for an on-device gate primitive. BFLD's risk scoring borrows the gate-pattern but with identity leakage as the trigger condition.
---
## 2. Decision
### 2.1 Nine features (from BFLD spec §5)
The features are computed over a sliding window of `W = 32` BFI frames (≈3 s at 10 Hz):
| Feature | Definition | Source |
|---------|------------|--------|
| `mean_angle_delta` | mean( ‖ Φ_t − Φ_{t-1} ‖ over subcarriers ) | extractor |
| `subcarrier_variance` | var( ‖ Φ ‖ over subcarrier axis ) | extractor |
| `temporal_entropy` | Shannon entropy of angle-bin histogram over W | extractor |
| `doppler_proxy` | FFT peak magnitude of mean-angle time series | features.rs |
| `path_stability` | 1 − ‖ Φ_t − median(Φ_{t-W..t}) ‖ / scale | features.rs |
| `cross_antenna_correlation` | mean Pearson correlation across n_tx × n_rx pairs | features.rs |
| `burst_motion_score` | high-pass-filtered angular velocity, soft-thresholded | features.rs |
| `stationarity_score` | 1 − rolling KL divergence over W/2 vs W | features.rs |
| `identity_separability_score` | top-1 cosine to nearest AETHER cluster centroid | identity_risk.rs |
The first eight are sensing features (also used by the presence/motion pipeline). Only the ninth depends on the AETHER embedding and therefore on `identity_class >= 1`.
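Two of the table rows are simple enough to sketch directly. The versions below are illustrative only: they treat each frame as a flat slice of angles with one value per subcarrier, ignore angle wrap-around, and take a pre-binned histogram as input. Function signatures are assumptions, not the crate's API.

```rust
/// `mean_angle_delta`: mean absolute angle change between consecutive frames,
/// averaged over subcarriers. Both slices must have the same length.
pub fn mean_angle_delta(prev: &[f32], curr: &[f32]) -> f32 {
    assert_eq!(prev.len(), curr.len(), "frames must cover the same subcarriers");
    if curr.is_empty() {
        return 0.0;
    }
    let sum: f32 = prev.iter().zip(curr).map(|(p, c)| (c - p).abs()).sum();
    sum / curr.len() as f32
}

/// `temporal_entropy`: Shannon entropy (in bits) of an angle-bin histogram
/// accumulated over the window. Empty bins contribute nothing.
pub fn temporal_entropy(histogram: &[u32]) -> f32 {
    let total: u32 = histogram.iter().sum();
    if total == 0 {
        return 0.0;
    }
    histogram
        .iter()
        .filter(|&&count| count > 0)
        .map(|&count| {
            let p = count as f32 / total as f32;
            -p * p.log2()
        })
        .sum()
}
```

A histogram concentrated in one bin gives zero entropy (a static scene), while a uniform histogram over four bins gives two bits.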
### 2.2 Identity risk formula
```rust
pub fn identity_risk_score(
    sep: f32,     // identity_separability_score, [0, 1]
    stab: f32,    // temporal_stability, [0, 1] = ema(path_stability, alpha=0.1)
    consist: f32, // cross_perspective_consistency, [0, 1] = multistatic.rs
    conf: f32,    // sample_confidence, [0, 1] = f(SNR, n_subcarriers, n_rx)
) -> f32 {
    // Clamp inputs, then combine multiplicatively: any factor near 0 dominates.
    let s = sep.clamp(0.0, 1.0);
    let t = stab.clamp(0.0, 1.0);
    let p = consist.clamp(0.0, 1.0);
    let c = conf.clamp(0.0, 1.0);
    (s * t * p * c).clamp(0.0, 1.0)
}
```
Multiplicative combination is chosen so that **any** weak factor (e.g., very low SNR ⇒ low `conf`) collapses the score toward 0. This matches the privacy intent: when the system is uncertain, the score should be low and the operator should not be alarmed.
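A small worked comparison may make the collapse behaviour concrete. The sketch below restates the multiplicative formula and sets it next to an additive mean (Alt 1 in section 4); the helper names `risk_mul` and `risk_add` are hypothetical and exist only for this example.

```rust
/// Multiplicative combination, restated so the example is self-contained.
pub fn risk_mul(s: f32, t: f32, p: f32, c: f32) -> f32 {
    [s, t, p, c].iter().map(|&x| x.clamp(0.0, 1.0)).product()
}

/// Additive mean, shown only for contrast.
pub fn risk_add(s: f32, t: f32, p: f32, c: f32) -> f32 {
    [s, t, p, c].iter().map(|&x| x.clamp(0.0, 1.0)).sum::<f32>() / 4.0
}
```

For `sep = stab = consist = 0.9` and `conf = 0.05`, the product is 0.9³ × 0.05 ≈ 0.036, while the additive mean is (2.7 + 0.05) / 4 = 0.6875. The single weak factor pulls the multiplicative score to near zero but barely moves the mean.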
### 2.3 Calibration target
The score is calibrated against re-ID success rate on a held-out test split of the KIT BFId dataset. A piecewise-linear isotonic regression maps raw scores into a calibrated `[0, 1]` band where `score ≥ 0.8` corresponds to `>80%` re-ID accuracy on a 5-second window in the calibration dataset.
Calibration parameters live in `v2/crates/wifi-densepose-bfld/data/risk_calibration.toml` and are versioned independently of the code. A regression update is a content-only PR.
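One way the calibrated mapping could be applied at runtime is linear interpolation between monotone knots loaded from the calibration file. The sketch below assumes the knots are available as sorted `(raw, calibrated)` pairs; that layout, the function name `calibrate`, and the knot values in the usage note are illustrative assumptions, not the actual schema of `risk_calibration.toml`.

```rust
/// Map a raw score through monotone (raw, calibrated) knots by linear
/// interpolation. `knots` is assumed to be sorted by raw value.
pub fn calibrate(raw: f32, knots: &[(f32, f32)]) -> f32 {
    let x = raw.clamp(0.0, 1.0);
    let (first, last) = match (knots.first(), knots.last()) {
        (Some(f), Some(l)) => (*f, *l),
        _ => return x, // no calibration data: pass the raw score through
    };
    if x <= first.0 {
        return first.1;
    }
    if x >= last.0 {
        return last.1;
    }
    for w in knots.windows(2) {
        let ((xa, ya), (xb, yb)) = (w[0], w[1]);
        if x >= xa && x <= xb && xb > xa {
            return ya + (yb - ya) * (x - xa) / (xb - xa);
        }
    }
    last.1
}
```

With made-up knots `[(0.0, 0.0), (0.5, 0.2), (1.0, 1.0)]`, a raw score of 0.75 maps to 0.2 + 0.8 × 0.5 = 0.6. Because the knots are monotone, the mapping preserves the ordering of raw scores.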
### 2.4 Coherence gate
The coherence gate (per ADR-029 `coherence_gate.rs` pattern) consumes the risk score and emits one of four actions:
```rust
pub enum GateAction {
    Accept,      // score < 0.5, publish normally
    PredictOnly, // 0.5 <= score < 0.7, publish but flag confidence
    Reject,      // 0.7 <= score < 0.9, drop the event
    Recalibrate, // score >= 0.9, drop AND rotate site_salt
}
```
The `Recalibrate` action triggers a forced site-salt rotation — an aggressive response to a sustained high-risk regime. It costs the operator continuity of long-term aggregate analytics but is the right answer to an attacker-grade sniffer arriving in range.
### 2.5 Hysteresis
To prevent oscillation around the gate thresholds, the gate uses ±0.05 hysteresis and a 5-second debounce. A score must cross the boundary by the hysteresis margin and persist for the debounce window before the gate action changes.
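The hysteresis and debounce rules can be sketched as a small state machine. This is one possible reading of the rule, not the implementation in `coherence_gate.rs`: a boundary counts as crossed only when the score passes it by the full margin (so moving up past 0.9 needs a score of at least 0.95), and a candidate action must stay the same for the whole debounce window. The struct name `HysteresisGate`, its fields, and the caller-supplied timestamp in seconds are assumptions for this example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GateAction {
    Accept,
    PredictOnly,
    Reject,
    Recalibrate,
}

const BOUNDS: [f32; 3] = [0.5, 0.7, 0.9];
const ACTIONS: [GateAction; 4] = [
    GateAction::Accept,
    GateAction::PredictOnly,
    GateAction::Reject,
    GateAction::Recalibrate,
];

pub struct HysteresisGate {
    current: GateAction,
    pending: Option<(GateAction, f64)>, // candidate action and when it was first seen
    margin: f32,
    debounce_s: f64,
}

impl HysteresisGate {
    pub fn new() -> Self {
        Self { current: GateAction::Accept, pending: None, margin: 0.05, debounce_s: 5.0 }
    }

    /// Action the score points at, starting from the current level and
    /// moving only across boundaries that are cleared by the margin.
    fn target(&self, score: f32) -> GateAction {
        let mut lvl = ACTIONS.iter().position(|a| *a == self.current).unwrap_or(0);
        while lvl < BOUNDS.len() && score >= BOUNDS[lvl] + self.margin {
            lvl += 1;
        }
        while lvl > 0 && score < BOUNDS[lvl - 1] - self.margin {
            lvl -= 1;
        }
        ACTIONS[lvl]
    }

    /// Feed one score sample taken at `now_s` seconds; returns the active action.
    pub fn update(&mut self, score: f32, now_s: f64) -> GateAction {
        let target = self.target(score);
        if target == self.current {
            self.pending = None;
        } else {
            match self.pending {
                Some((cand, since)) if cand == target => {
                    if now_s - since >= self.debounce_s {
                        self.current = target;
                        self.pending = None;
                    }
                }
                // A new candidate (or a changed one) restarts the debounce timer.
                _ => self.pending = Some((target, now_s)),
            }
        }
        self.current
    }
}
```

In this sketch a score held at 0.96 from `t = 0` keeps the gate at `Accept` until `t = 5`, when it switches to `Recalibrate`; a later dip to 0.88 does not leave `Recalibrate`, because the score has not fallen below 0.85.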
### 2.6 Compute budget
| Stage | Target latency | Implementation |
|-------|----------------|----------------|
| Feature extraction (8 features) | < 3 ms per window | ndarray + nalgebra; vectorized over subcarriers |
| Separability (cosine to centroids) | < 5 ms per window | RuVector RaBitQ index (ADR-085) over 1k centroids |
| Risk score | < 0.1 ms | scalar multiplicative |
| Gate decision + hysteresis | < 0.1 ms | scalar |
Total p95 ≤ 10 ms per window on a Pi 5 core (8 ms target). Headroom on cognitum-v0 (Pi 5 + Hailo) is ample; ESP32-S3 hosts only the extraction stage (features computed; risk score is host-side per ADR-123).
---
## 3. Consequences
### Positive
- The risk score becomes a first-class diagnostic surface for operators and a structural input to the privacy gate — both consumers from a single computation.
- Multiplicative combination is conservative under uncertainty; the system is biased toward "report low risk when unsure", which is the right default.
- Calibration is a content-only update — no recompile needed when the calibration file changes.
- The recalibration gate action gives the system a self-healing response to a sniffer arrival without operator intervention.
### Negative
- Calibration requires the KIT BFId dataset; without it the score is uncalibrated and serves only as an internal trigger, not a publishable signal.
- Multiplicative scoring can be dominated by `sample_confidence`, which is sensitive to channel conditions. A persistent low-SNR environment will keep the published score near 0 even when the underlying separability is high — an under-reporting failure mode that the documentation must call out.
- The recalibrate action breaks historical hash continuity by design; an operator who wants long-term aggregates needs to know they will see a discontinuity on recalibrate events.
### Neutral
- The nine features overlap with the existing CSI pipeline. BFLD computes them on BFI; the CSI pipeline computes them on CSI. Both can be fused via `cross_perspective_consistency`.
---
## 4. Alternatives Considered
### Alt 1: Additive scoring (`(s + t + p + c) / 4`)
Rejected: a sample with high separability but very low confidence would still produce a moderate score, which over-reports risk in degraded RF conditions.
### Alt 2: Maximum scoring (`max(s, t, p, c)`)
Rejected: over-reports risk because any single high factor pins the output, even if the others contradict it.
### Alt 3: Learned scoring (a small MLP)
Rejected for this ADR: introduces an opaque model whose output cannot be audited from first principles. The multiplicative formula is simple, conservative, and directly explainable to operators. A learned model is a future option once enough calibration data is in hand.
### Alt 4: Per-feature thresholds instead of a continuous score
Rejected: continuous score is needed for the HA gauge entity and for downstream calibration. Per-feature thresholds would force operators to interpret nine separate binaries.
---
## 5. Acceptance Criteria
- [ ] **AC1**: All nine features are computed in `< 8 ms` p95 per window on a Pi 5 core.
- [ ] **AC2**: `identity_risk_score` is monotonic non-decreasing in any single input when the other three are held constant.
- [ ] **AC3**: Calibration regression on the KIT BFId test split: `score ≥ 0.8` corresponds to ≥ 80% re-ID accuracy ± 5%.
- [ ] **AC4**: The coherence gate emits `Recalibrate` if score is ≥ 0.9 for ≥ 5 seconds.
- [ ] **AC5**: Hysteresis prevents action oscillation across ± 0.05 of a threshold within a 5-second window.
- [ ] **AC6**: At `privacy_class = 3`, the risk score is computed but not published to MQTT (kept local for the gate only).
- [ ] **AC7**: A reproducible 1,000-frame synthetic fixture produces a deterministic score sequence (bit-identical across runs).
---
## 6. References
- ADR-118 (umbrella)
- ADR-024 (AETHER encoder for separability)
- ADR-029 (`coherence_gate.rs` precedent)
- ADR-086 (edge novelty gate pattern)
- ADR-120 §2.4 (class transition consumed by gate)
- KIT BFId dataset: https://publikationen.bibliothek.kit.edu/1000185756


@ -0,0 +1,191 @@
# ADR-122: BFLD RuView Surface — Home Assistant, Matter, MQTT Exposure
| Field | Value |
|-------|-------|
| **Status** | Proposed |
| **Date** | 2026-05-24 |
| **Deciders** | ruv |
| **Parent** | [ADR-118](ADR-118-bfld-beamforming-feedback-layer-for-detection.md) |
| **Relates to** | [ADR-031](ADR-031-ruview-sensing-first-rf-mode.md) (sensing-first), [ADR-100](ADR-100-cog-packaging-specification.md) (cog packaging), [ADR-115](ADR-115-home-assistant-integration.md) (HA-DISCO + HA-MIND), [ADR-116](ADR-116-cog-ha-matter-seed.md) (Matter cog), [ADR-120](ADR-120-bfld-privacy-class-and-hash-rotation.md) (privacy class) |
| **Tracking issue** | TBD |
---
## 1. Context
ADR-115 shipped the RuView Home Assistant surface (21 entities, MQTT auto-discovery, mTLS, privacy mode) on the `wifi-densepose-sensing-server` Rust binary. ADR-116 is packaging this as the `cog-ha-matter` Cognitum Seed cog. BFLD must integrate into this surface without expanding the privacy-sensitive footprint already in production.
The integration must:
1. **Extend HA-DISCO** to advertise BFLD entities via the existing MQTT-discovery scheme.
2. **Reject identity fields at the Matter boundary** — Matter exposes occupancy/motion/people-count only, never `identity_risk_score` or `rf_signature_hash`.
3. **Route MQTT topics by privacy class** — class-2/3 events on the public topic tree, class-1 events on a gated `research/` subtree, class-0 events nowhere.
4. **Federate cleanly into cognitum-v0** — BFLD events from multiple nodes flow through `cognitum-rvf-agent` (port 9004 per CLAUDE.local.md) for cross-node analytics, but identity-derived fields are stripped at the **publishing-node boundary**, not at the federation hub.
---
## 2. Decision
### 2.1 HA entity surface (six new entities per node)
The cog republishes the existing 21 ADR-115 entities and adds:
| Entity ID | Type | Source field | Class gate | Diagnostic |
|-----------|------|--------------|------------|------------|
| `binary_sensor.<node>_bfld_presence` | occupancy | `BfldEvent.presence` | ≥ 2 | no |
| `sensor.<node>_bfld_motion` | gauge `[0,1]` | `BfldEvent.motion` | ≥ 2 | no |
| `sensor.<node>_bfld_person_count` | int | `BfldEvent.person_count` | ≥ 2 | no |
| `sensor.<node>_bfld_zone_activity` | enum | `BfldEvent.zone_activity` | ≥ 2 | no |
| `sensor.<node>_bfld_identity_risk` | gauge `[0,1]` | `BfldEvent.identity_risk_score` | == 2 only | **yes** |
| `sensor.<node>_bfld_confidence` | gauge `[0,1]` | `BfldEvent.confidence` | ≥ 2 | yes |
The `identity_risk` entity is exposed only under privacy class 2 and is flagged `entity_category: diagnostic` so HA dashboards do not promote it to a main-card sensor by default. Under class 3 it is computed but not published (per ADR-121 AC6).
MQTT discovery payload follows the ADR-115 schema, plus a `bfld_version` attribute matching the `BfldFrameHeader::version` field.
### 2.2 MQTT topic tree
```
ruview/<node_id>/bfld/presence/state # class >= 2
ruview/<node_id>/bfld/motion/state # class >= 2
ruview/<node_id>/bfld/person_count/state # class >= 2
ruview/<node_id>/bfld/zone_activity/state # class >= 2
ruview/<node_id>/bfld/confidence/state # class >= 2
ruview/<node_id>/bfld/identity_risk/state # class == 2 only
ruview/<node_id>/bfld/raw # class 1, OFF by default
ruview/<node_id>/bfld/availability # online/offline marker
```
`raw` (class-1 derived BFI) is **not present** in the discovery payload at all — operators must explicitly subscribe and acknowledge the research-mode caveat. The publishing crate emits `MQTT_RAW_DISABLED` to availability when `privacy_class < 1`.
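The class gates in the topic tree can be expressed as a single routing function that returns no topic when the active class forbids publication. The sketch below is illustrative: the `Field` enum and `topic_for` function are hypothetical names, and the explicit operator opt-in required for `raw` is not modelled.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Field {
    Presence,
    Motion,
    PersonCount,
    ZoneActivity,
    Confidence,
    IdentityRisk,
    Raw,
}

/// Return the MQTT topic for `field` at the given privacy class, or `None`
/// if that field must not be published at that class.
pub fn topic_for(node_id: &str, field: Field, privacy_class: u8) -> Option<String> {
    let public = matches!(privacy_class, 2 | 3);
    let (suffix, allowed) = match field {
        Field::Presence => ("presence/state", public),
        Field::Motion => ("motion/state", public),
        Field::PersonCount => ("person_count/state", public),
        Field::ZoneActivity => ("zone_activity/state", public),
        Field::Confidence => ("confidence/state", public),
        Field::IdentityRisk => ("identity_risk/state", privacy_class == 2),
        Field::Raw => ("raw", privacy_class == 1),
    };
    allowed.then(|| format!("ruview/{}/bfld/{}", node_id, suffix))
}
```

Returning `Option<String>` keeps the "no topic exists for this class" case explicit at every call site, so a publisher cannot fall through to a default topic.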
### 2.3 Mosquitto ACL example
```
# Mosquitto denies any topic that is not explicitly granted below.
# "topic deny" requires Mosquitto 2.0 or later.
# Pattern rules apply to every client.
pattern read ruview/+/bfld/+/state
pattern read ruview/+/bfld/availability
# Public role cannot read identity_risk or raw
user public
topic deny ruview/+/bfld/identity_risk/state
topic deny ruview/+/bfld/raw
# Operator role can read identity_risk for diagnostics
user operator
topic read ruview/+/bfld/identity_risk/state
# Research role can read raw (requires class-1 operation)
user research
topic read ruview/+/bfld/raw
```
The cog ships a default ACL template under `cog-ha-matter/etc/mosquitto.acl.d/bfld.conf` for operators who use the embedded broker (ADR-116 §2.2).
### 2.4 Matter cluster boundary
`cog-ha-matter` exposes BFLD via **three Matter clusters** only:
| Matter cluster | Source entity | Notes |
|---|---|---|
| Occupancy Sensing (0x0406) | `binary_sensor.<node>_bfld_presence` | reports binary occupancy + uncertainty (mapped from `confidence`) |
| Boolean State (0x0045) | `sensor.<node>_bfld_motion >= 0.3` | thresholded; raw motion not exposed |
| Occupancy Sensing extension | `sensor.<node>_bfld_person_count` | uses occupancy-sensor count where Matter spec supports |
**Explicitly NOT exposed via Matter**:
- `identity_risk_score`
- `rf_signature_hash`
- `identity_embedding`
- `raw` BFI
- `zone_activity` (zone IDs are site-specific and Matter is a cross-site surface)
- `confidence` (HA-only diagnostic)
The Matter filter is implemented in `cog-ha-matter/src/matter/bfld_filter.rs` as a `MatterSink` trait impl that rejects classes 0 and 1 at compile time (via ADR-120 §2.2 marker types).
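A minimal sketch of how marker types can make the class check a compile-time property is shown below. The type names (`Class0` to `Class3`, `MatterSafe`, `Frame`) and the shape of `MatterSink::publish` are assumptions for illustration; the actual ADR-120 §2.2 marker types may differ, and a production version would likely seal the trait so downstream crates cannot implement it for other classes.

```rust
use std::marker::PhantomData;

// Zero-sized marker types, one per privacy class (hypothetical names).
pub struct Class0;
pub struct Class1;
pub struct Class2;
pub struct Class3;

/// Implemented only for classes allowed across the Matter boundary.
pub trait MatterSafe {}
impl MatterSafe for Class2 {}
impl MatterSafe for Class3 {}

/// A frame tagged with its privacy class at the type level.
pub struct Frame<C> {
    pub presence: bool,
    _class: PhantomData<C>,
}

impl<C> Frame<C> {
    pub fn new(presence: bool) -> Self {
        Self { presence, _class: PhantomData }
    }
}

pub struct MatterSink {
    pub published: usize,
}

impl MatterSink {
    /// Accepts only frames whose class implements `MatterSafe`.
    pub fn publish<C: MatterSafe>(&mut self, frame: &Frame<C>) -> bool {
        self.published += 1;
        frame.presence
    }
}

// The following line would fail to compile, because `Class1` does not
// implement `MatterSafe`:
// MatterSink { published: 0 }.publish(&Frame::<Class1>::new(true));
```

Because the bound is checked by the compiler, a class-0 or class-1 frame cannot reach `publish` through any code path, which is the property AC3 asks for.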
### 2.5 Federation with cognitum-v0
`cognitum-rvf-agent` (port 9004) receives BFLD events from multiple nodes. The events arriving at the federation hub are **already class-2/3** — identity-derived fields were stripped at each publishing node. The hub does not see and cannot reconstruct raw BFI or identity embeddings.
The federation contract:
| At publishing node | At cognitum-rvf-agent |
|---|---|
| Strip class-0/1 fields per ADR-120 | Receive class-2/3 events only |
| Rotate `rf_signature_hash` per ADR-120 §2.3 | Aggregate counts; **do not** correlate hashes across sites |
| Sign event with node Ed25519 key | Verify signature; reject unsigned events |
A `federation-witness` script (extending ADR-028) runs nightly on the hub and proves that no class-0/1 fields appeared in any received event over the previous 24 h.
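The core check of such a witness can be sketched as a scan over the field names of each received event. This is a simplified illustration: the forbidden-field list, the `witness` function, and the representation of an event as a list of top-level field names are all assumptions, and a real script would parse the stored events and inspect nested fields as well.

```rust
/// Field names that must never appear at the hub (assumed list for this sketch).
const FORBIDDEN: [&str; 2] = ["raw_bfi", "identity_embedding"];

/// Check every event's field names. Returns the number of events checked,
/// or the list of (event index, offending field) violations.
pub fn witness(events: &[Vec<&str>]) -> Result<usize, Vec<(usize, String)>> {
    let mut violations = Vec::new();
    for (i, fields) in events.iter().enumerate() {
        for f in fields {
            if FORBIDDEN.contains(f) {
                violations.push((i, (*f).to_string()));
            }
        }
    }
    if violations.is_empty() {
        Ok(events.len())
    } else {
        Err(violations)
    }
}
```

A caller would map `Err` to a non-zero exit status, which is the behaviour AC6 tests with an injected class-1 field.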
### 2.6 HA blueprints (shipped with the cog)
Three operator-ready blueprints under `cog-ha-matter/blueprints/`:
1. **Presence-driven lighting**: `binary_sensor.*_bfld_presence` ⇒ `light.turn_on/off` with configurable hold time.
2. **Motion-aware HVAC**: `sensor.*_bfld_motion > 0.3` ⇒ raise HVAC setpoint by ΔT.
3. **Identity-risk anomaly notification**: `sensor.*_bfld_identity_risk` exceeds rolling z-score threshold ⇒ HA `notify.*` to the operator with the originating node and the 7-day baseline.
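The z-score test behind the third blueprint is simple enough to sketch. The blueprint itself would be written in HA YAML; the Rust below only illustrates the arithmetic, and the function names and the use of the sample standard deviation are assumptions.

```rust
/// z-score of `latest` against a baseline window (for example, 7 days of samples).
/// Returns `None` when the baseline is too short or has zero spread.
pub fn z_score(baseline: &[f32], latest: f32) -> Option<f32> {
    if baseline.len() < 2 {
        return None;
    }
    let n = baseline.len() as f32;
    let mean = baseline.iter().sum::<f32>() / n;
    let var = baseline.iter().map(|x| (x - mean).powi(2)).sum::<f32>() / (n - 1.0);
    let sd = var.sqrt();
    if sd == 0.0 {
        None
    } else {
        Some((latest - mean) / sd)
    }
}

/// True when `latest` exceeds the baseline mean by more than `threshold` standard deviations.
pub fn is_anomalous(baseline: &[f32], latest: f32, threshold: f32) -> bool {
    z_score(baseline, latest).map_or(false, |z| z > threshold)
}
```

For a baseline of `[0.1, 0.2, 0.3]` (mean 0.2, sample standard deviation 0.1), a new reading of 0.5 has a z-score of 3 and would trip a threshold of 2.5.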
---
## 3. Consequences
### Positive
- Six new HA entities give operators a complete BFLD diagnostic dashboard without leaking identity.
- Matter exposure is structurally narrow — the cluster-filter implementation cannot accidentally expose identity fields because the type system rejects them.
- The default ACL template gives operators a working privacy posture out of the box.
- The federation contract makes it explicit that the hub cannot reconstruct identity even from the union of all node events.
### Negative
- The `identity_risk` HA entity exists only under class 2. Operators who run class 3 deployments cannot see the score even in their own dashboard. This is correct but may surprise care-home installers; documentation must be clear.
- Three Matter clusters is conservative — some HA users may want the count exposed as a percentage or rate, which Matter does not support natively.
- HA-blueprint coverage is intentionally small; operators wanting custom automations must work through the YAML surface.
### Neutral
- The federation witness script runs nightly. A short-duration leak between witnesses is possible but bounded — any successful exfiltration of class-1 fields would still need to be reconstructed into identity, which the daily hash rotation breaks.
---
## 4. Alternatives Considered
### Alt 1: Expose `identity_risk` over Matter (Generic Sensor cluster)
Rejected: Matter is a cross-vendor surface; exposing identity-risk there leaks the score to every Matter controller in the home, including third-party hubs the operator may not control. Keep it HA-internal.
### Alt 2: One unified MQTT topic `ruview/<node>/bfld` with JSON payload
Rejected: per-entity topics are the HA-DISCO convention (ADR-115) and let ACLs be field-specific. A unified topic forces an all-or-nothing read policy.
### Alt 3: Federate raw BFI to cognitum-v0 for cross-node analytics
Rejected: violates ADR-120 I1 (raw never leaves the node). Aggregates are sufficient for cross-node analytics; raw centralization is a hard no.
### Alt 4: Default `entity_category: diagnostic = false` for `identity_risk`
Rejected: promoting `identity_risk` to a main-card sensor would surprise operators with an identity-adjacent gauge on their main dashboard. Diagnostic category is the right default.
---
## 5. Acceptance Criteria
- [ ] **AC1**: HA auto-discovery publishes six new entities per node on first connect; HA recognizes all six.
- [ ] **AC2**: Under privacy class 3, `sensor.<node>_bfld_identity_risk` is absent from the MQTT discovery payload.
- [ ] **AC3**: `MatterSink::publish` rejects any frame at compile time when the source has `privacy_class < 2`.
- [ ] **AC4**: The default mosquitto ACL denies `read ruview/+/bfld/identity_risk/state` to the `public` user role.
- [ ] **AC5**: Three HA blueprints install cleanly into a fresh HA install and trigger their configured actions against a mock BFLD event stream.
- [ ] **AC6**: The federation-witness script detects an injected class-1 field in a synthetic event and exits non-zero.
- [ ] **AC7**: Matter occupancy-sensing cluster reports presence within 1 s of an HA `binary_sensor.*_bfld_presence` state change.
---
## 6. References
- ADR-115 (HA-DISCO entity scheme)
- ADR-116 (`cog-ha-matter` cog packaging)
- ADR-120 (privacy class enforcement)
- ADR-121 (identity risk source)
- ADR-100 (cog packaging spec)
- Mosquitto ACL reference: https://mosquitto.org/man/mosquitto-conf-5.html
- Matter spec — Occupancy Sensing cluster (0x0406)
- Cognitum V0 appliance dashboard: `http://cognitum-v0:9000/`


@ -0,0 +1,186 @@
# ADR-123: BFLD Capture Path — Pi 5 / Nexmon Adapter and ESP32-S3 Feasibility
| Field | Value |
|-------|-------|
| **Status** | Proposed |
| **Date** | 2026-05-24 |
| **Deciders** | ruv |
| **Parent** | [ADR-118](ADR-118-bfld-beamforming-feedback-layer-for-detection.md) |
| **Relates to** | [ADR-022](ADR-022-multi-bssid-wifi-scanning.md) (multi-BSSID scan), [ADR-028](ADR-028-esp32-capability-audit.md) (capability audit), [ADR-095](ADR-095-rvcsi-edge-rf-sensing-platform.md) (rvCSI), [ADR-096](ADR-096-rvcsi-ffi-crate-layout.md) (rvCSI FFI), [ADR-110](ADR-110-esp32-c6-firmware-extension.md) (C6 firmware), [ADR-119](ADR-119-bfld-frame-format-and-wire-protocol.md) (BfldFrame) |
| **Tracking issue** | TBD |
---
## 1. Context
ADR-118 declares that BFLD captures BFI from commodity WiFi 5/6 traffic. The question this sub-ADR answers is: **on which hardware, with which adapter, and against which firmware limitations**.
### 1.1 ESP32-S3 BFI capability gap
The ESP32 capability audit (ADR-028) and the ESP32-S3 / C6 firmware (`firmware/esp32-csi-node/`, ADR-110) confirm that the Espressif WiFi API exposes **CSI** capture (`esp_wifi_set_csi_*`) but does not expose **raw 802.11 management-frame capture** in monitor mode for non-self-addressed CBFR reports. The S3 sees the CBFR frames its own AP-link generates (when it acts as a beamformer), but it cannot promiscuously sniff CBFR frames between other STA/AP pairs in the neighborhood.
The C6 (ESP32-C6 with RISC-V + Wi-Fi 6) has a more flexible RF subsystem but the same software-API constraint at the time of writing.
### 1.2 Pi 5 / Nexmon as the production capture host
The rvCSI platform (ADR-095/096) already vendors a Nexmon-based adapter (`rvcsi-adapter-nexmon`) that captures CSI from BCM43455c0 chips (Pi 5 / Pi 4 / Pi 3B+). Nexmon patches the firmware to surface CSI to userspace and **also surface CBFR frames** — the BFI extension is the same code path with a different filter.
cognitum-v0 (Pi 5 in the fleet, per CLAUDE.local.md) is already running Nexmon + the rvCSI runtime. It is the natural BFLD capture host.
### 1.3 What we need from each hardware tier
| Tier | Role | BFI capture | CSI capture | Notes |
|------|------|-------------|-------------|-------|
| ESP32-S3 / C6 | Sensing leaf | **no** | yes | Continues providing CSI to the existing pipeline |
| Pi 5 / Nexmon | BFLD host | **yes** | yes (via Nexmon) | Primary BFLD capture |
| ruvultra (RTX 5080 + AX210) | Training / dev | yes (via AX210 monitor mode) | yes | Dev capture; not production |
| cognitum-v0 (Pi 5) | Appliance | **yes** (production) | yes | Production BFLD host |
---
## 2. Decision
### 2.1 Production capture path: Pi 5 / Nexmon
The BFLD production capture path is implemented as a new module in the vendored rvCSI submodule:
```
vendor/rvcsi/crates/rvcsi-adapter-nexmon/
└── src/
├── lib.rs
├── csi.rs # existing CSI capture
└── bfi.rs # NEW — CBFR capture, exports BfiCapture
```
The new `bfi.rs` parses CBFR frames (VHT or HE) from the Nexmon-patched firmware's userspace stream, extracts Φ/ψ angle matrices, and emits a `BfiCapture` struct that feeds the BFLD crate's extractor (ADR-118 §2.1, ADR-119).
The patch lives in the rvcsi submodule (`github.com/ruvnet/rvcsi`) and is shipped as `rvcsi-adapter-nexmon ^0.3.5` to crates.io. The wifi-densepose workspace consumes the published crate (or the submodule path during development).
### 2.2 BFLD crate adapter trait
`wifi-densepose-bfld` defines a `BfiCaptureAdapter` trait:
```rust
pub trait BfiCaptureAdapter: Send + 'static {
    type Error: std::error::Error + Send + Sync + 'static;
    fn capture(&mut self) -> Result<Option<BfiCapture>, Self::Error>;
    fn capabilities(&self) -> AdapterCapabilities;
}

pub struct AdapterCapabilities {
    pub supports_he: bool,     // 802.11ax (Wi-Fi 6)
    pub supports_160mhz: bool,
    pub max_n_rx: u8,
    pub host_kind: HostKind,   // Pi5Nexmon | Ax210Linux | EspS3Local | Mock
}
```
Three impls ship initially:
- `NexmonBfiAdapter` — Pi 5 / Nexmon (production)
- `Ax210BfiAdapter` — Linux + AX210 in monitor mode (dev / training, ruvultra)
- `MockBfiAdapter` — replay fixture for tests and CI
A future fourth impl (`EspS3LocalAdapter`) is reserved for the day Espressif exposes promiscuous CBFR — it captures only the S3's own AP-link BFI for local self-reporting.
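To show how an impl of the trait might look, here is a sketch of a replay-style mock adapter. The trait and `AdapterCapabilities` are restated from above so the example is self-contained; the `BfiCapture` fields, the `from_fixture` constructor, and the capability values reported by the mock are assumptions for illustration only.

```rust
use std::collections::VecDeque;
use std::convert::Infallible;

/// Hypothetical capture payload: flattened phi/psi angles for one CBFR frame.
#[derive(Debug, Clone, PartialEq)]
pub struct BfiCapture {
    pub phi: Vec<f32>,
    pub psi: Vec<f32>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HostKind {
    Pi5Nexmon,
    Ax210Linux,
    EspS3Local,
    Mock,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AdapterCapabilities {
    pub supports_he: bool,
    pub supports_160mhz: bool,
    pub max_n_rx: u8,
    pub host_kind: HostKind,
}

pub trait BfiCaptureAdapter: Send + 'static {
    type Error: std::error::Error + Send + Sync + 'static;
    fn capture(&mut self) -> Result<Option<BfiCapture>, Self::Error>;
    fn capabilities(&self) -> AdapterCapabilities;
}

/// Replays a fixed list of frames, then reports `None` (no frame available).
pub struct MockBfiAdapter {
    queue: VecDeque<BfiCapture>,
}

impl MockBfiAdapter {
    pub fn from_fixture(frames: Vec<BfiCapture>) -> Self {
        Self { queue: frames.into() }
    }
}

impl BfiCaptureAdapter for MockBfiAdapter {
    type Error = Infallible; // replay from memory cannot fail

    fn capture(&mut self) -> Result<Option<BfiCapture>, Self::Error> {
        Ok(self.queue.pop_front())
    }

    fn capabilities(&self) -> AdapterCapabilities {
        AdapterCapabilities {
            supports_he: true,
            supports_160mhz: true,
            max_n_rx: 4, // arbitrary value for the mock
            host_kind: HostKind::Mock,
        }
    }
}
```

Note that neither the trait nor the mock takes a URL or socket address, in line with the capture-side privacy boundary described in §2.3.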
### 2.3 Capture-side privacy boundary
Per ADR-120 I1, raw BFI never leaves the capturing host. The adapter must therefore live on **the same physical box** as the BFLD crate's extractor and privacy gate. The architecture pattern:
```
[ Pi 5 / cognitum-v0 ]
├── nexmon firmware (kernel)
├── rvcsi-adapter-nexmon (userspace, captures BFI)
├── wifi-densepose-bfld (extracts, scores, gates)
│ └── privacy_gate → class-2/3 frames only
└── wifi-densepose-sensing-server (publishes MQTT + Matter)
```
A network-mode adapter that streams raw BFI from a remote capture host is **explicitly forbidden**. The adapter trait does not include any "remote URL" parameter.
### 2.4 Channel / bandwidth coverage
The Nexmon adapter is configured by the existing `rvcsi-adapter-nexmon` channel-hopping schedule (ADR-095 §3.2). For BFLD it adds:
- Filter for VHT CBFR (action frame, category 21, action 0) and HE CBFR (category 30, action 0).
- Per-channel BFI session-tracking — the same beamformer/beamformee pair across a channel hop is reconciled by AP MAC + STA MAC.
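The filter and the session key described above can be sketched as follows. The function assumes it is handed the action-frame body (the bytes after the 802.11 MAC header), whose first two octets are the category and action codes; `CbfrKind`, `classify_cbfr`, `SessionKey`, and `record_frame` are hypothetical names for this example.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CbfrKind {
    Vht,
    He,
}

/// Classify an action-frame body by its category and action octets.
pub fn classify_cbfr(action_body: &[u8]) -> Option<CbfrKind> {
    match action_body {
        [21, 0, ..] => Some(CbfrKind::Vht), // VHT category, Compressed Beamforming
        [30, 0, ..] => Some(CbfrKind::He),  // HE category, Compressed Beamforming
        _ => None,
    }
}

/// Session identity that survives a channel hop: the AP and STA MAC addresses.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct SessionKey {
    pub ap_mac: [u8; 6],
    pub sta_mac: [u8; 6],
}

/// Count one more frame for the session and return the running total.
pub fn record_frame(sessions: &mut HashMap<SessionKey, u32>, key: SessionKey) -> u32 {
    let n = sessions.entry(key).or_insert(0);
    *n += 1;
    *n
}
```

Keying on the MAC pair rather than on the channel means frames from the same beamformer/beamformee pair accumulate in one session regardless of which channel they were captured on.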
### 2.5 ESP32-S3 local self-reporting (deferred)
For deployments without a Pi 5 / cognitum-v0 nearby, a degraded BFLD mode runs on the ESP32-S3 itself:
- Captures only its own AP-link CBFR (self-addressed).
- Computes features over the limited window.
- Reports a coarsened `presence` + `motion` only — no `identity_risk_score` (insufficient sample diversity).
- Emits `BfldFrame` at `privacy_class = 2` with a `flags.bit3 = self_only` marker.
This path is implemented in firmware as part of P2 / P3 of the ADR-118 rollout, after the Pi 5 path is stable. Effort is small (firmware path reuses the existing CSI capture loop) but the value is also low until ESP32 firmware exposes promiscuous CBFR, which is an Espressif ESP-IDF roadmap item, not under project control.
### 2.6 Dev path: ruvultra / AX210
For local dev iteration on the Windows / ruvultra box, the AX210 adapter provides a workable capture path on Linux (ruvultra is Ubuntu 6.17 per CLAUDE.local.md). The AX210 supports 802.11ax + monitor mode with the `iwlwifi` driver patches that have landed upstream. This path is for training-data collection and dev testing, not production.
---
## 3. Consequences
### Positive
- BFLD ships as a production-ready surface on cognitum-v0 day one — no new hardware procurement.
- The adapter-trait design lets new capture paths (AX211, MediaTek Filogic, etc.) slot in without changes to the BFLD crate.
- The capture-side privacy boundary is structural: there is no remote-capture code path, so a future PR cannot accidentally introduce one.
- ruvultra's AX210 path unblocks training and dev iteration on Linux without depending on the Pi 5 fleet.
### Negative
- BFLD's full pipeline depends on cognitum-v0 (or another Pi 5 / Nexmon host) being present in the deployment. Operators without a Pi 5 get only the degraded ESP32-S3 self-reporting path (limited utility).
- Nexmon is a third-party kernel module; tracking upstream patches is ongoing maintenance.
- The CBFR frame format differs between VHT (802.11ac) and HE (802.11ax); the parser must support both, and any 802.11be (Wi-Fi 7) deployment will require an additional parser path.
### Neutral
- ruvultra dev path uses AX210; the AX210 is not the production NIC, so dev/prod parity is via the fixture replay + the Nexmon adapter on cognitum-v0.
---
## 4. Alternatives Considered
### Alt 1: Centralized capture host streams raw BFI to RuView nodes
Rejected: violates ADR-120 I1 (raw never leaves the capture host). The capture host **is** the BFLD node; there is no separation.
### Alt 2: Wait for Espressif promiscuous CBFR support
Rejected: indefinite timeline outside project control. The Pi 5 / Nexmon path is shippable today.
### Alt 3: Custom Pi 5 firmware fork instead of Nexmon
Rejected: forking BCM firmware is a huge maintenance burden and Nexmon already does what we need.
### Alt 4: Only ship the ESP32-S3 self-reporting path
Rejected: insufficient sample diversity for `identity_risk_score`. The whole point of BFLD is to measure identity leakage; a self-only path cannot do that meaningfully.
---
## 5. Acceptance Criteria
- [ ] **AC1**: `NexmonBfiAdapter` captures ≥ 100 valid CBFR frames per minute from a 2-AP-3-STA test bench on a Pi 5 (cognitum-v0).
- [ ] **AC2**: VHT (802.11ac) and HE (802.11ax) CBFR frames are both parsed; mixed-PHY captures produce correctly-typed `BfiCapture` outputs.
- [ ] **AC3**: 20/40/80/160 MHz channel widths are all supported (one fixture each in `tests/`).
- [ ] **AC4**: `BfiCaptureAdapter` trait has no method accepting a remote URL or socket address.
- [ ] **AC5**: ESP32-S3 self-only adapter compiles as `#![no_std]` and produces a `BfldFrame` with `flags.bit3 = self_only` set and no `identity_risk_score` field.
- [ ] **AC6**: AX210 adapter on ruvultra captures CBFR for at least one fixture-generating dev session.
- [ ] **AC7**: Capture loop sustains 10 Hz BFI frame rate on cognitum-v0 without dropping frames over a 10-minute soak test.
---
## 6. References
- ADR-095 / ADR-096 (rvCSI Nexmon adapter)
- ADR-028 (ESP32 capability audit)
- ADR-110 (ESP32-C6 firmware)
- Nexmon BCM43455c0 patches: https://github.com/seemoo-lab/nexmon
- Wi-BFI: https://arxiv.org/abs/2309.04408
- IEEE 802.11-2020 §19.3.12 (VHT CBFR), §27.3.11 (HE CBFR)
- cognitum-v0 fleet entry: `CLAUDE.local.md` (Tailscale fleet table)


@ -0,0 +1,293 @@
# BFLD SOTA Survey — Beamforming Feedback: State of the Art
## 1. BFI vs CSI: Physical-Layer Differences and Leakage Profiles
### 1.1 Channel State Information (CSI)
CSI is the raw complex channel frequency response (CFR) measured at the receiver across
all subcarriers and antenna pairs. Extracting CSI requires either (a) firmware
modifications on the receiving NIC (Atheros CSI Tool, Nexmon CSI patch for BCM43455c0
on Raspberry Pi 4/5) or (b) a specialized radio (software-defined radio with 802.11
decoders). The resulting matrix is typically Ntx × Nrx × Nsubcarrier complex floats —
dense, high-dimensional, and not transmitted over the air in standard operation.
This project's existing rvCSI runtime (`vendor/rvcsi/`) captures CSI via the Nexmon
firmware patch on Raspberry Pi hardware (ADR-095/096). The ESP32-S3 on COM9 cannot
produce CSI in the format needed for the full pipeline — it lacks the antenna count
and the firmware support for per-subcarrier phase extraction at the fidelity rvcsi
expects.
### 1.2 Beamforming Feedback Information (BFI)
BFI is fundamentally different: it is the compressed representation of the channel that
a STA (station/client) sends back to an AP (access point) so the AP can steer its beam
toward the client. The standard (IEEE 802.11ac/ax, section 9.4.1.52) defines the
compressed beamforming format as:
1. The AP transmits a Null Data Packet (NDP) sounding frame.
2. The STA measures the channel matrix H from the NDP, computes the singular-value decomposition
H = U Σ V^H, then compresses the right singular vectors V using a series of Givens
rotations.
3. The Givens rotation produces a set of angles: Phi (φ) angles in [0, 2π) and Psi (ψ)
angles in [0, π/2). In 802.11ac these are quantized to 7 and 5 bits respectively; in
802.11ax the default is 4 bits for φ and 2 bits for ψ.
4. The STA transmits a VHT/HE Compressed Beamforming frame (CBFR) containing those
quantized angles, one set per active subcarrier (or per compressed subcarrier group),
plus an SNR field per stream.
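Step 3 can be made concrete with a dequantization sketch that turns a quantized index back into an angle. The mid-point formulas used here, φ = π(2k + 1) / 2^b_φ and ψ = π(2k + 1) / 2^(b_ψ + 2), follow the angle quantization convention of the 802.11 compressed beamforming format as commonly cited; treat them as an assumption to verify against the standard, and the function names as hypothetical.

```rust
use std::f32::consts::PI;

/// Reconstruct phi in [0, 2*PI) from its quantized index `k` and bit width.
pub fn dequantize_phi(k: u16, bits: u8) -> f32 {
    PI * (2.0 * k as f32 + 1.0) / (1u32 << bits) as f32
}

/// Reconstruct psi in [0, PI/2) from its quantized index `k` and bit width.
pub fn dequantize_psi(k: u16, bits: u8) -> f32 {
    PI * (2.0 * k as f32 + 1.0) / (1u32 << (bits + 2)) as f32
}
```

With 4 bits for φ, index 0 maps to π/16 and the largest index, 15, maps to 31π/16, which stays below 2π. With 2 bits for ψ, the largest index, 3, maps to 7π/16, which stays below π/2.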
The CBFR is a **management-plane 802.11 frame, not an 802.3 data frame**. It is
transmitted before association encryption is negotiated; in WPA2/WPA3 deployments, the
beamforming sounding and feedback exchange happens in the clear because WPA2/WPA3
encrypt data frames only. Even 802.11ax (Wi-Fi 6/6E) with Protected Management Frames
(PMF) enabled does NOT encrypt action frames in the beamforming exchange by default on
commodity APs as of 2025 (NDSS 2025 finding, "Lend Me Your Beam",
https://www.ndss-symposium.org/ndss-paper/lend-me-your-beam-privacy-implications-of-plaintext-beamforming-feedback-in-wifi/).
**Key asymmetry**: extracting CSI requires physical access to a device and firmware
modification; extracting BFI requires only a WiFi adapter in monitor mode and a parser
for the CBFR frame format. Wi-BFI (Haque, Meneghello, Restuccia; ACM WiNTECH 2023,
https://arxiv.org/abs/2309.04408) is an open-source pip-installable tool that does
exactly this.
### 1.3 Why BFI Is Uniquely Dangerous
CSI is a research instrument — accessing it requires deliberate effort. BFI is a
production protocol artifact that any 802.11ac/ax STA broadcasts periodically as a
matter of course. The attack-surface implications:
- **No firmware modification needed** on the target device or AP.
- **Passive capture** is sufficient. Frames are broadcast in all directions, not
beamformed, so a nearby attacker receives them at essentially the same SNR as the AP.
- **Structured leakage**: the Phi/Psi angle matrices encode a compressed but
non-trivially-invertible representation of the spatial channel, which includes
multipath geometry that is body-shaped — the human body is a dielectric obstacle whose
shape and movement modulate the channel.
- **Regularity**: sounding happens at the AP's request, typically at 5–40 Hz in modern
802.11ax deployments. A 60-second capture at 10 Hz produces 600 CBFR frames —
sufficient for the BFId classifier to achieve >90% re-identification accuracy (ACM CCS
2025, https://dl.acm.org/doi/10.1145/3719027.3765062).
---
## 2. Compressed Angle Matrices: The Identity Surface
### 2.1 Givens Rotation Reconstruction
The Phi/Psi angles encode a unitary matrix via the Givens rotation decomposition:
V = G(N, N-1, φ_{N,N-1}, ψ_{N,N-1}) · G(N, N-2, ...) · ... · G(2,1, φ_{2,1}, ψ_{2,1}) · D
where D is a diagonal phase matrix. For a 2×2 MIMO system this is two angles; for a
4×4 system this is 12 angles. Each "column" in the BFI payload corresponds to one
subcarrier group (or every 4th subcarrier in 802.11ax, every 2nd in 802.11ac).
The resulting per-subcarrier angle sequence is a time-varying signature of the spatial
channel. Because the human body modulates the multipath channel, this sequence encodes
body-specific geometry. The BFId paper (https://dl.acm.org/doi/10.1145/3719027.3765062)
demonstrates that a supervised classifier trained on these sequences achieves identity
recognition on a 197-person dataset.
### 2.2 The AI/ML Compression Feedback Loop
IEEE 802.11 standardization is actively exploring AI/ML-based compression for
beamforming feedback (IEEE 802.11bn / Wi-Fi 8 study group, "Toward AIML Enabled WiFi
Beamforming CSI Feedback Compression", https://arxiv.org/html/2503.00412v1). This work
proposes neural codebooks that reduce feedback overhead. An important side effect: the
learned latent space of a neural BFI compressor may be *more* identity-discriminative
than the raw angles, because neural compression tends to preserve class-discriminative
variance. BFLD must be designed to handle compressed BFI encodings, not just the raw
Phi/Psi format.
---
## 3. Tooling Landscape
### 3.1 Wi-BFI
- **Source**: https://arxiv.org/abs/2309.04408 / https://github.com/kfoysalhaque/MU-MIMO-Beamforming-Feedback-Extraction-IEEE802.11ac
- **Capabilities**: real-time and offline extraction of BFAs from 802.11ac and 802.11ax;
20/40/80/160 MHz; SU-MIMO and MU-MIMO; pip-installable.
- **Relevance to BFLD**: the BFLD extractor module (`extractor.rs`) must produce
semantically equivalent output to Wi-BFI — i.e., per-subcarrier Phi/Psi angle arrays
plus per-stream SNR — so that research results from the Wi-BFI ecosystem can be
replicated on BFLD captures.
### 3.2 PicoScenes
- **Source**: https://www.semanticscholar.org/paper/Eliminating-the-Barriers-Demystifying-Wi-Fi-Baseband-Jiang-Zhou/...
- **Capabilities**: cross-NIC CSI and CBFR measurement platform; supports Intel AX200,
AX210, Atheros AR9300, QCA6174; runs on Linux with custom kernel modules.
- **Relevance to BFLD**: PicoScenes can simultaneously capture CSI and BFI from the
same frame sequence, enabling the CSI+BFI fusion path described in the BFLD spec
(`csi_matrix` optional input). The rvcsi adapter layer (`vendor/rvcsi/`) already
handles the Nexmon PCap format; a PicoScenes adapter is a future extension.
### 3.3 Nexmon CSI (BCM43455c0)
- **Source**: https://github.com/seemoo-lab/nexmon_csi
- **Hardware**: Raspberry Pi 4/5 with BCM43455c0 chip — the same hardware used in
`cognitum-v0` (Pi 5 appliance in this fleet, see CLAUDE.local.md).
- **Capabilities**: per-subcarrier complex CSI in monitor mode. The BCM43455c0 on the
Pi is a single-stream (1×1) radio; nexmon_csi's 4×4 MIMO support applies to the
BCM4366c0, not to the Pi.
- **Relevance to BFLD**: the rvcsi nexmon adapter already routes PCap frames from this
hardware into the wifi-densepose pipeline. BFI extraction on the same hardware requires
an additional sniffer for CBFR frames alongside the CSI sniffer.
### 3.4 Atheros CSI Tool / iwlwifi CSI
- Legacy tools for Intel and Atheros NICs; require kernel module injection. Not relevant
to the current hardware fleet (ESP32-S3 + Raspberry Pi 5), but documented here for
completeness and for future Intel AX210-based deployments.
---
## 4. Identity Inference Attacks
### 4.1 BFId (ACM CCS 2025)
**Reference**: Todt, Morsbach, Strufe; KIT. ACM CCS 2025.
https://dl.acm.org/doi/10.1145/3719027.3765062
https://publikationen.bibliothek.kit.edu/1000185756
Dataset: https://ps.tm.kit.edu/english/bfid-dataset/index.php
BFId is the first published identity-inference attack that uses BFI exclusively (no
CSI). The methodology:
1. **Dataset**: 197 individuals, multiple sessions, multiple AP angles. Each subject
walked a defined path while their STA continuously triggered BFI exchanges. CSI
was also recorded simultaneously for comparison.
2. **Feature extraction**: temporal sequences of Phi/Psi angle matrices, windowed at
varying lengths. Basic statistical features (mean, variance, cross-subcarrier
correlation) fed a shallow classifier.
3. **Results**: re-identification accuracy >90% with as little as 5 seconds of BFI.
Performance was robust to different walking styles and viewing angles — consistent
with the hypothesis that anthropometric body shape (torso width, stride, limb
geometry) rather than gait phase is the primary discriminator.
4. **Comparison to CSI**: BFI-only accuracy was comparable to CSI-only accuracy for
identity tasks, despite BFI being a compressed representation. This confirms that
the Givens angle compression preserves identity-discriminative variance.
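The statistical features named in step 2 can be sketched as follows. This illustrates the feature types only and is not the BFId authors' code; the window layout (frames × subcarriers, one angle each) and the choice of adjacent-subcarrier Pearson correlation are assumptions. It also treats angles as plain numbers and ignores wrap-around.

```python
from statistics import fmean, pvariance


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def window_features(window):
    """window[t][k] is one angle (radians) at frame t, subcarrier k."""
    n_sub = len(window[0])
    columns = [[frame[k] for frame in window] for k in range(n_sub)]
    return {
        "mean": [fmean(c) for c in columns],
        "variance": [pvariance(c) for c in columns],
        "adjacent_corr": [
            pearson(columns[k], columns[k + 1]) for k in range(n_sub - 1)
        ],
    }
```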
### 4.2 LeakyBeam (NDSS 2025)
**Reference**: Xiao, Chen, He, Han, Han; Zhejiang U., NTU, KAIST. NDSS 2025.
https://www.ndss-symposium.org/ndss-paper/lend-me-your-beam-privacy-implications-of-plaintext-beamforming-feedback-in-wifi/
LeakyBeam targets occupancy detection (is a person present?) rather than identity.
Key findings:
- BFI is detectable through walls at 20 m range with commodity hardware.
- True positive rate 82.7%, true negative rate 96.7% in real-world evaluation.
- The attack works because BFI encodes motion-induced channel perturbations even through
obstacles — the Phi/Psi angle variance changes measurably when a body enters the room.
- The defense (obfuscating BFI before transmission) requires minimal hardware changes.
**Implication for BFLD**: if a passive attacker with no relationship to the AP can
detect occupancy, then the BFLD node is implicitly broadcasting presence information
unless active obfuscation is deployed at the STA firmware level. BFLD cannot prevent
this passive attack — it can only ensure the *node's own output* does not additionally
leak identity.
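The variance effect described above can be illustrated with a toy detector. This is not the LeakyBeam pipeline; the empty-room baseline and the `ratio_threshold` parameter are hypothetical tuning values.

```python
from statistics import pvariance


def occupancy_flag(angle_window, empty_room_variance, ratio_threshold=3.0):
    """Return True when the window's angle variance exceeds the empty-room
    baseline by more than ratio_threshold (a hypothetical tuning parameter)."""
    if empty_room_variance <= 0:
        raise ValueError("baseline variance must be positive")
    return pvariance(angle_window) / empty_room_variance > ratio_threshold


if __name__ == "__main__":
    quiet = [0.50, 0.51, 0.49, 0.50]   # angles barely move
    moving = [0.20, 0.90, 0.40, 1.10]  # angles swing as a body moves
    print(occupancy_flag(quiet, 1e-4), occupancy_flag(moving, 1e-4))
```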
### 4.3 Prior RF-Based Gait and Biometric Inference
Before BFI-specific attacks, the threat landscape was already established through
CSI-based attacks:
- **Gait from CSI**: WiGait (2017), Wi-Gait (ScienceDirect 2023,
https://www.sciencedirect.com/science/article/abs/pii/S1389128623001962),
Gait+Respiration ID (IEEE Xplore 2021,
https://ieeexplore.ieee.org/document/9488277) all demonstrate >90% gait-based
re-identification from standard WiFi.
- **Breathing biometrics**: Respiration rate and depth are person-specific at a
population level. IEEE 802.11 CSI captures breathing as amplitude oscillations at
0.1–0.5 Hz.
- **Anthropometric inference**: Hand size, torso width, and limb geometry modulate the
channel; classifiers trained on activity data have been shown to leak anthropometrics
as a side effect.
The BFId finding that BFI achieves comparable accuracy to CSI for identity is consistent
with this prior body of work — it simply demonstrates the attack is achievable with a
lower barrier to entry.
---
## 5. Privacy-Preserving Sensing: Current State of the Art
### 5.1 Differential Privacy on RF Embeddings
"Differentially Private Feature Release for Wireless Sensing: Adaptive Privacy Budget
Allocation on CSI Spectrograms" (https://arxiv.org/pdf/2512.20323) applies Laplace/
Gaussian mechanisms to CSI spectrograms, calibrating epsilon per subcarrier based on
empirical sensitivity. Results show meaningful reduction in identity-inference accuracy
while preserving activity-recognition utility at epsilon = 1.0–4.0.
BFLD's `identity_risk_score` could be used as an adaptive epsilon selector: high-risk
frames receive a tighter privacy budget (more noise), low-risk frames pass unmodified.
This is a forward-looking integration not in the current spec.
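A minimal sketch of that adaptive selector, assuming a linear map from risk onto the epsilon range 1.0 to 4.0 and a hypothetical `pass_below` threshold under which frames pass unmodified. The Laplace mechanism is standard; the mapping and threshold are illustrative choices, not part of the BFLD spec or the cited paper.

```python
import random


def epsilon_for_risk(risk: float, eps_min: float = 1.0, eps_max: float = 4.0) -> float:
    """Map risk in [0, 1] linearly onto [eps_max, eps_min]: higher risk, smaller epsilon."""
    risk = min(1.0, max(0.0, risk))
    return eps_max - risk * (eps_max - eps_min)


def privatize(value: float, risk: float, sensitivity: float,
              rng: random.Random, pass_below: float = 0.2) -> float:
    """Add Laplace noise with scale sensitivity / epsilon; low-risk values pass unchanged."""
    if risk < pass_below:
        return value
    scale = sensitivity / epsilon_for_risk(risk)
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return value + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
```

A smaller epsilon means a larger noise scale, so the highest-risk frames receive the most perturbation.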
### 5.2 Federated / Local-Only Inference
The consensus across 2024–2025 literature on wireless federated learning
(https://arxiv.org/pdf/2603.19040, https://arxiv.org/pdf/2109.09142) is that
local differential privacy (LDP) with gradient perturbation is achievable on resource-
constrained edge devices. For BFLD's use case the critical property is simpler: the
identity embedding never needs to leave the node. There is no federated learning step
for identity. The risk score is a local computation whose output is published; the
embedding that produced it is not.
### 5.3 ZK Attestation for Sensing
ZK-SenseLM (https://arxiv.org/pdf/2510.25677) proposes zero-knowledge proofs that a
sensing model's output derives from legitimate data. This is architecturally close to
ADR-028's witness-bundle approach. Future BFLD work could use ZK proofs to attest that
the identity_risk_score was computed from the claimed input without revealing the input.
### 5.4 "Protecting Human Activity Signatures in Compressed IEEE 802.11 CSI Feedback"
(https://arxiv.org/pdf/2512.18529): this paper (its arXiv identifier dates it to
December 2025) directly addresses activity-signature leakage in CBFR frames and proposes
perturbing the Phi/Psi angles at the STA before transmission. The defense is the dual of
BFLD's approach: BFLD detects leakage at the receiver, while this paper proposes
suppression at the transmitter. The two approaches are complementary.
---
## 6. Relationship to Existing Project ADRs
**ADR-027 (MERIDIAN cross-environment generalization)**: BFLD's cross-room hash
rotation directly instantiates the "no cross-site correlation" invariant that MERIDIAN
assumes for privacy-safe multi-room deployment.
**ADR-028 (ESP32 capability audit + witness verification)**: The deterministic-proof
pattern (`verify.py` + SHA-256 expected hash) is the template for BFLD's own acceptance
test. BFLD must produce a deterministic frame hash given the same input — acceptance
criterion 6 in the spec.
**ADR-024 (AETHER contrastive CSI embedding)**: BFLD reuses the AETHER embedding
infrastructure for its identity_risk measurement. The risk score is a function of how
separable the current embedding is from the population of known embeddings.
**ADR-029/030 (RuvSense multistatic + field model)**: BFLD's
`cross_perspective_consistency` component of the risk formula requires correlation
across multiple sensor viewpoints; the multistatic infrastructure from ADR-029 provides
this.
**ADR-032 (multistatic mesh security hardening)**: The BFLD threat model is a
superset of the security model in ADR-032. ADR-032 covers mesh compromise; BFLD adds
the passive sniffing threat at the management-plane layer.
---
## 7. Open Technical Questions
1. **BFI capture on ESP32-S3**: The ESP32-S3's `esp_wifi_set_csi_config` API provides
CSI via the vendor-specific Espressif HT20 format. It does not expose VHT/HE CBFR
frames. BFI capture on this hardware likely requires host-side sniffing (Pi 5 +
Nexmon in monitor mode, already available on cognitum-v0).
2. **Quantization resolution degradation**: At 4 bits for φ and 2 bits for ψ (802.11ax
defaults), the angle resolution is coarser than in 802.11ac (7/5 bits). The BFId
paper used 802.11ac hardware. BFLD must validate that the identity_risk_score
calibration remains valid at lower quantization.
3. **WiFi 7 (802.11be) changes**: 802.11be introduces multi-link operation (MLO) and
may change the sounding/feedback cadence. BFLD's frame format (magic 0xBF1D_0001,
version byte) is designed to accommodate future protocol versions.
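To put numbers on the quantization question, the sketch below computes the angle step for the two bit-width pairs named in question 2. It assumes the standard 802.11 dequantization rule, φ = π(2k+1)/2^bφ over [0, 2π) and ψ = π(2k+1)/2^(bψ+2) over [0, π/2); the function names are illustrative.

```python
import math


def dequantize_phi(k: int, bits: int) -> float:
    """Centre of quantization bin k for phi, which spans [0, 2*pi)."""
    return math.pi * (2 * k + 1) / 2 ** bits


def dequantize_psi(k: int, bits: int) -> float:
    """Centre of quantization bin k for psi, which spans [0, pi/2)."""
    return math.pi * (2 * k + 1) / 2 ** (bits + 2)


def step_degrees(bits_phi: int, bits_psi: int) -> tuple:
    """Bin width in degrees for (phi, psi) at the given bit widths."""
    return (360.0 / 2 ** bits_phi, 90.0 / 2 ** bits_psi)


if __name__ == "__main__":
    print(step_degrees(4, 2))  # coarse pair from question 2
    print(step_degrees(7, 5))  # fine pair from question 2
```

Under this rule the coarse pair has bins eight times wider than the fine pair for both angle types, which is the gap the risk-score calibration would need to tolerate.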

@ -0,0 +1,141 @@
# BFLD Soul — Architectural Intent and Ethical Stance
## 1. The Central Metaphor: Immune System, Not Surveillance Lens
An immune system does not catalog every pathogen it encounters. It classifies threats
by type, responds proportionally, and keeps its detailed records local to the organism.
When the immune system flags a cell as dangerous, it does not broadcast the cell's
identity to the outside world — it takes local action.
BFLD is built around this same principle. Its job is to detect when RF data is crossing
from the realm of "ambient sensing" into the realm of "identity record" — and to respond
locally: raise the risk score, restrict what leaves the node, rotate identifiers. It does
not produce identity; it guards against the accidental production of identity.
This distinction matters because the same physical signal that drives BFLD's presence
detection is also the signal that academic attackers (BFId, LeakyBeam) exploit for
re-identification. BFLD cannot suppress the underlying physics. What it can do is make
the node's *output* non-identifying, even when the node's *input* is capable of
supporting identification.
---
## 2. Distinguishing Identity from the Rest of WiFi Sensing
WiFi sensing produces a spectrum of information:
| Output | Privacy class | Reversibility |
|--------|--------------|---------------|
| Presence (yes/no) | 2 — anonymous | Not reversible to identity |
| Motion magnitude (0..1) | 1 — derived | Not reversible to identity |
| Person count (integer) | 1 — derived | Not reversible to identity |
| Zone activity | 1 — derived | Not reversible to identity |
| Identity risk score | 1 — derived | Risk score, not identity |
| RF signature hash | 1 — derived | Hash rotates daily; not reversible |
| Identity embedding | 0 — raw | Directly reversible to biometric |
| Raw BFI matrix | 0 — raw | Directly reversible to biometric |
BFLD's design follows this table structurally: the outputs in privacy class 0 never
leave the node. The outputs in class 1 leave the node only after explicit operator opt-in
for the sensitive ones (identity_risk_score). The outputs in class 2 flow freely.
This table is not a policy list — it is wired into the frame format. The `privacy_class`
byte in every `BfldFrame` is checked at the emitter boundary before any byte leaves the
node. Code that wants to send class-0 data must positively bypass a compile-time safety
check, not merely forget to set a flag.
---
## 3. Three Non-Negotiable Invariants
These are not configurable options. They are structural properties of BFLD that
hold regardless of operator configuration:
### Invariant 1: Raw BFI Never Leaves the Node
The BFI matrix, once ingested by the BFLD extractor, is consumed locally and never
serialized to any outbound channel. This is enforced in two ways:
1. The `BfldFrame` struct's `bfi_matrix` field is not part of the serializable payload
— it exists only as a private field in `extractor.rs` and is dropped after
feature extraction completes.
2. The MQTT emitter (`mqtt.rs`) has no code path that serializes a BFI matrix.
The `ruview/<node_id>/bfld/raw/state` topic is disabled by default and, when
enabled, publishes only a metadata summary (subcarrier count, timestamp, SNR range),
not the angle matrices.
### Invariant 2: Identity Embedding Is Local-Only
The embedding computed by the RuVector pipeline (used to calculate `identity_risk_score`)
lives in an in-RAM ring buffer with a configurable retention window (default: 10 minutes).
It is never written to disk. It is never serialized to any MQTT topic. It is never
included in any `BfldFrame` payload even at `privacy_class = 0` — raw means raw angles,
not the derived embedding.
The mathematical property that enables this: `identity_risk_score` can be computed as a
scalar from the embedding (separability × temporal_stability ×
cross_perspective_consistency × sample_confidence) without revealing the embedding
itself. The score is a projection onto a scalar; the full vector is not required by any
downstream consumer.
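The product form can be sketched directly. The sketch assumes each of the four factors is already normalized to [0, 1] and clamps out-of-range inputs; how each factor is derived from the embedding is outside its scope.

```python
def identity_risk_score(separability: float,
                        temporal_stability: float,
                        cross_perspective_consistency: float,
                        sample_confidence: float) -> float:
    """Multiply the four factors, clamping each to [0, 1] first (an assumption)."""
    score = 1.0
    for factor in (separability, temporal_stability,
                   cross_perspective_consistency, sample_confidence):
        score *= min(1.0, max(0.0, factor))
    return score
```

Because the factors multiply, any single factor near zero (for example, low sample confidence) pulls the whole score toward zero.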
### Invariant 3: Cross-Site Identity Matching Is Structurally Impossible
The `rf_signature_hash` is computed as:
blake3(site_salt ‖ day_epoch ‖ ephemeral_features)
where `site_salt` is a secret generated at first boot, stored in NVS, and never
transmitted. Two BFLD nodes at two different sites will produce hashes in disjoint
hash spaces by construction. Even an adversary who obtains the hash stream from
both nodes cannot determine whether the same person visited both sites, because the
site_salt is unknown and different.
The daily rotation (`day_epoch` = floor(timestamp_ns / 86400e9)) means that even within
a single site, the hash of the same person changes each day. A hash from an earlier
`day_epoch` is computationally unlinkable to a hash produced today, provided the
site_salt stays secret.
This is structural impossibility, not policy. The invariant holds even if the operator
misconfigures the system, because it derives from the cryptographic property of blake3
with a secret key, not from access-control rules.
---
## 4. Relationship to RuView's Ambient Intelligence Positioning
The project memory records RuView's positioning as "ambient intelligence platform, not
sensor; packaging (HA, Docker, mDNS, blueprints) is the bottleneck." This framing is
load-bearing for BFLD's design.
A "sensor" in the Home Assistant model is a device that reports measurements. A "sensor"
is allowed to identify who is present — facial recognition cameras are sensors. BFLD
explicitly rejects this model: the node is an ambient intelligence node that knows
something about the environment (motion, occupancy, activity level) but structurally
cannot know *who* is in the environment.
This positioning enables deployment in spaces where identity-tracking would be
unacceptable: shared workspaces, guest accommodations, hotel rooms, care facilities.
The argument to an operator at a care facility is not "trust us, we won't log who your
patients are." It is: "the system is architecturally incapable of logging who your
patients are, because the identifier rotates daily with a site-specific secret we don't
hold."
---
## 5. Why This Layer Must Exist Before WiFi 7 Ships
802.11be (Wi-Fi 7) is entering mass market deployment in 2025–2026. It introduces
multi-link operation (MLO), which dramatically increases the frequency of beamforming
sounding exchanges. Where 802.11ax sounding might occur at 10–40 Hz, MLO sounding on
multiple links simultaneously could produce 3–5× more CBFR frames per second.
More frames means more training data for identity classifiers. The BFId result at 5
seconds of 802.11ac data will almost certainly improve with 5 seconds of 802.11be MLO
data. The attack surface is not static.
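As a rough worked example, take the figures above as 10 to 40 Hz for 802.11ax sounding and a 3 to 5 times MLO multiplier. These are this document's estimates, not measurements, and the helper name is illustrative.

```python
def frames_in_window(rate_hz: float, seconds: float = 5.0, multiplier: float = 1.0) -> float:
    """Number of CBFR frames an observer collects in a capture window."""
    return rate_hz * multiplier * seconds


if __name__ == "__main__":
    # Baseline range for a 5-second window, then the same window under MLO.
    print(frames_in_window(10), frames_in_window(40))
    print(frames_in_window(10, multiplier=3), frames_in_window(40, multiplier=5))
```

Under these assumptions a 5-second capture grows from 50 to 200 frames to between 150 and 1000 frames.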
BFLD's frame format (magic 0xBF1D_0001, version byte for extension) is designed to
remain valid across protocol generations. The feature extraction modules are pluggable:
a WiFi 7 BFI extractor can be added without changing the privacy gate, the hash rotation,
or the MQTT emitter. The invariants remain invariant.
The window to establish safe defaults is now, before the installed base is hundreds of
millions of unprotected nodes. BFLD is the layer that carries those safe defaults into
every deployment from day one.

@ -0,0 +1,278 @@
# BFLD Security Threat Model
## 1. Adversary Classes
### A1 — Passive Sniffer (Curious Neighbor)
**Capability**: WiFi adapter in monitor mode; consumer laptop running Wi-BFI or
tcpdump with CBFR filter. No special access, no relationship to the target network.
**Goal**: Determine occupancy or identity of persons in an adjacent apartment/office.
**Effort**: Low. Wi-BFI is pip-installable. Monitor mode is available on commodity
Linux laptops. No prior knowledge of the target network is required, because CBFR
frames are sent unencrypted and radiate in all directions.
**Relevance to BFLD**: A1 is the LeakyBeam threat (NDSS 2025). BFLD cannot prevent
A1 from capturing BFI from the air. BFLD's job is to ensure its own output does not
make A1's work easier by publishing identity-correlated data on reachable channels.
### A2 — Targeted Stalker
**Capability**: A1 capabilities plus knowledge of the target's device MAC address
(obtainable from the source address of probe requests) and time correlation with known
schedules.
**Goal**: Track a specific individual's presence across time or across locations.
**Effort**: Medium. Requires sustained monitoring (hours to days) and a correlation
step.
**Relevance to BFLD**: If rf_signature_hash were stable over time, A2 could correlate
hash sequences across sessions to confirm a specific person's schedule. The daily hash
rotation (Invariant 3) severs this correlation.
### A3 — ISP / Operator
**Capability**: Access to MQTT broker, HA instance, or cloud integration receiving
BFLD events.
**Goal**: Build behavioral profiles of occupants across many homes/installations.
**Effort**: Low if raw or identity-correlated fields are published to the broker.
**Relevance to BFLD**: BFLD restricts what reaches the broker. An operator cannot
accidentally publish identity-correlated data because the privacy gate blocks it at
the node boundary.
### A4 — Nation-State / Law Enforcement
**Capability**: Compelled access to cloud storage, MQTT broker logs, or HA history.
Physical access to the BFLD node with forensic tools.
**Goal**: Retrospectively identify who was present at a location and when.
**Effort**: Depends on what data was logged. If BFLD's invariants hold, the broker
holds only: presence events (boolean), motion scores (float), person counts (integer),
and rotated hashes. None of these are individually re-identifiable.
**Relevant mitigation**: The daily hash rotation means that even log retention is
privacy-preserving: a hash from Monday and a hash from Tuesday, even from the same
person at the same node, are in disjoint hash spaces.
### A5 — Compromised AP Firmware
**Capability**: Malicious AP firmware that modifies the sounding schedule to extract
more identity-discriminative BFI, or that responds to specially crafted packets with
high-resolution channel feedback.
**Goal**: Improve passive capture quality from the node's BFI stream.
**Relevance to BFLD**: BFLD ingests BFI as captured from the air. If the AP is
compromised to produce unusually high-resolution BFI, BFLD's identity_risk_score
will correctly detect the elevated separability and flag the frames at higher risk.
The system is self-normalizing to the quality of what is captured.
### A6 — Supply-Chain Compromise of RuView Node
**Capability**: Modified BFLD binary with the privacy gate removed or with an
exfiltration path added.
**Goal**: Long-term silent collection of identity embeddings or raw BFI.
**Mitigation**: ADR-028's witness-bundle pattern — deterministic SHA-256 of the
pipeline output. A compromised binary would produce different output for the same
input, failing the verify.py check. The BFLD acceptance criterion 6 (deterministic
frame hashes) is the direct countermeasure.
---
## 2. Attack Trees
### AT-1: Passive BFI Capture → Identity Inference
```
Attacker Goal: Re-identify a specific person via BFI
|
+-- Step 1: Place WiFi adapter in monitor mode (A1)
| |
| +-- CBFR frames arrive unencrypted (established by NDSS 2025 / BFId)
|
+-- Step 2: Parse Phi/Psi angles using Wi-BFI or equivalent
| |
| +-- No modification of target device required (Wi-BFI passive)
|
+-- Step 3: Collect 5-60 seconds of frames
| |
| +-- BFId: 5s sufficient at 10 Hz sounding rate for >90% accuracy
|
+-- Step 4: Run identity classifier (BFId architecture or similar)
| |
| +-- Requires enrollment (prior reference capture)
| | |
| | +-- OR: exploit BFLD's rf_signature_hash as a correlation anchor
| | (mitigated by daily rotation — AT-2 below)
|
+-- Outcome: Identity label with >90% confidence
```
BFLD mitigation: BFLD does not prevent AT-1 at the air interface. It ensures that
BFLD's own output does not provide the "correlation anchor" in step 4.
### AT-2: Cross-Site Correlation via rf_signature_hash Leak
```
Attacker Goal: Confirm person X visited site A and site B on the same day
|
+-- Prerequisite: Attacker has read access to MQTT broker at both sites
|
+-- Step 1: Collect rf_signature_hash sequences from site A and site B
|
+-- Step 2: Look for matching hashes within the same day_epoch
| |
| +-- BLOCKED: site_salt is site-specific and secret.
| blake3(salt_A ‖ day ‖ features) != blake3(salt_B ‖ day ‖ features)
| even if features are identical.
| Two sites with the same person produce hashes in disjoint spaces.
|
+-- Outcome: No match possible. Attack fails structurally.
```
### AT-3: Timing Side-Channel on identity_risk_score
```
Attacker Goal: Infer when a known person is present by monitoring risk score changes
|
+-- Prerequisite: Read access to MQTT topic ruview/<node_id>/bfld/identity_risk/state
|
+-- Step 1: Baseline: collect identity_risk_score during known-empty periods
|
+-- Step 2: Monitor for anomalous spikes correlated with known schedules
| |
| +-- Partial mitigation: risk score is not published by default.
| | Operator must explicitly enable it.
| |
| +-- Residual risk: even with publication enabled, the score measures risk of
| identification, not identity itself. A high risk score means "this frame
| is identity-discriminative" not "person X is present."
|
+-- Mitigation: MQTT ACL restricts identity_risk to local broker by default.
+-- Mitigation: privacy_class=2 (anonymous) or 3 (restricted) suppresses the risk score on output.
```
### AT-4: MQTT Topic Enumeration
```
Attacker Goal: Discover what BFLD data is published and harvest it
|
+-- Step 1: Connect to broker without TLS (if TLS not configured)
|
+-- Step 2: Subscribe to ruview/# wildcard
|
+-- Mitigation: Default mosquitto ACL denies wildcard subscription to anonymous clients.
+-- Mitigation: TLS + client certificates recommended for all BFLD deployments.
+-- Mitigation: ruview/<node_id>/bfld/raw/state is disabled by default.
```
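Why a single `ruview/#` subscription is enough to harvest every topic follows from the MQTT topic-filter rules. The matcher below is a small sketch of those rules (`+` matches one level, `#` matches the rest, including the parent level); it ignores the special handling of topics that start with `$`.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a concrete topic name."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, level in enumerate(f_levels):
        if level == "#":
            # '#' is only valid as the last level and matches everything below.
            return i == len(f_levels) - 1
        if i >= len(t_levels):
            return False
        if level != "+" and level != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)


if __name__ == "__main__":
    print(topic_matches("ruview/#", "ruview/node1/bfld/raw/state"))
    print(topic_matches("ruview/+/bfld/identity_risk/state",
                        "ruview/node1/bfld/raw/state"))
```

An ACL that grants only fully specified topics therefore closes the enumeration path that the wildcard opens.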
### AT-5: Matter Cluster Abuse
```
Attacker Goal: Extract identity-correlated data via the Matter protocol integration
|
+-- Step 1: Join the Matter fabric as a legitimate controller
|
+-- Step 2: Read clusters exposed by the BFLD Matter endpoint
| |
| +-- Available: OccupancySensing (presence), MotionSensor (motion),
| PeopleCount (person_count)
| |
| +-- NOT AVAILABLE: identity_risk_score, rf_signature_hash, raw_bfi,
| identity_embedding — these are rejected at the Matter boundary.
|
+-- Outcome: Attacker gets presence/motion/count — same as any occupancy sensor.
No identity-correlated data is accessible via Matter.
```
---
## 3. Trust Boundary Diagram
```
┌──────────────────────────────────────────────────────────────────────────┐
│ BFLD NODE (local)                                                        │
│                                                                          │
│ WiFi air interface                                                       │
│   │ CBFR frames (unencrypted, passively sniffable by any A1)             │
│   ▼                                                                      │
│ ┌──────────────┐   raw BFI    ┌──────────────┐                           │
│ │ BFI          │─────────────►│ Feature      │                           │
│ │ Extractor    │ (local RAM)  │ Extractor    │                           │
│ └──────────────┘              └──────┬───────┘                           │
│                                      │ features (not BFI)                │
│                                      ▼                                   │
│                               ┌──────────────┐  embedding                │
│                               │ Identity     │──────────────┐            │
│                               │ Risk Engine  │ (local RAM   │            │
│                               └──────┬───────┘  ring buf)   │            │
│                                      │ risk_score           │            │
│                                      ▼                      │            │
│ ┌───────────────────────────────────────────────────────┐   │            │
│ │ Privacy Gate                                          │   │            │
│ │ privacy_class check | hash rotation | field masking   │   │            │
│ └───────┬───────────────────────────────────────────────┘   │            │
│         │ filtered BfldFrame                                │            │
│         │ (no raw BFI, no embedding)                        ▼            │
│         ▼                               [embedding NEVER exits this box] │
│ ┌──────────────┐                                                         │
│ │ MQTT         │  presence/motion/person_count/risk(opt)                 │
│ │ Emitter      │────────────────────────────────────────►                │
│ └───────┬──────┘  [TLS recommended]                                      │
│         │                                                                │
└─────────┬────────────────────────────────────────────────────────────────┘
          │ MQTT (TLS)
          ▼
┌─────────────────────┐        ┌──────────────────────────────────────┐
│ Local Broker        │        │ cognitum-v0 federation endpoint      │
│ (mosquitto)         │──────► │ (identity fields STRIPPED at node    │
└─────────┬───────────┘        │ boundary before federation)          │
          │                    └──────────────────────────────────────┘
          ▼
┌─────────────────────┐        ┌──────────────────────────────────────┐
│ Home Assistant      │──────► │ Matter Fabric                        │
│ (presence/motion/   │        │ (OccupancySensing / MotionSensor /   │
│ person_count only)  │        │ PeopleCount ONLY)                    │
└─────────────────────┘        └──────────────────────────────────────┘
```
---
## 4. Threat Profile per privacy_class Value
| privacy_class | Value | Data exposed outbound | Residual threats |
|--------------|-------|----------------------|-----------------|
| raw | 0 | Derived angles + amplitude proxy + phase proxy + SNR. Never BFI matrix. | Angle sequences are identity-discriminative; use only in controlled research environments. Never default. |
| derived | 1 | presence, motion, person_count, zone_activity, confidence, plus identity_risk_score, rf_signature_hash, and aggregate SNR. No angle, amplitude, or phase fields. | Risk score timing side-channel (AT-3). Hash must remain rotated. |
| anonymous | 2 | presence, motion, person_count, zone_activity, confidence. No identity-correlated fields. | Temporal occupancy patterns may leak schedule information. Not identity. |
| restricted | 3 | presence only (binary). All other fields zeroed or suppressed. | Minimal. On/off presence is equivalent to a passive IR sensor. |
---
## 5. Witness / Attestation Strategy
Following ADR-028's pattern, BFLD should produce a deterministic proof bundle:
1. **Reference input**: a fixed seed synthetic BFI matrix (512 bytes, PRNG seed=117)
stored alongside the test suite.
2. **Expected output hash**: SHA-256 of the serialized `BfldFrame` produced from that
input, committed to the repository.
3. **CI check**: `verify_bfld.py` — same structure as `archive/v1/data/proof/verify.py`
— runs in CI and locally. A compromised binary (A6 threat) would change the output
hash and immediately fail this check.
4. **Witness log**: extend `docs/WITNESS-LOG-028.md` with a BFLD section covering the
privacy gate and hash rotation.
This attestation does not prevent a runtime compromise, but it raises the cost
significantly: a supply-chain attacker must either (a) match the expected output hash
while also exfiltrating data (computationally infeasible for a hash adversary), or
(b) accept that the tampered binary will be detected on the next verify run.
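The check in steps 1 to 3 could look roughly like the sketch below. It is not the project's `verify_bfld.py`; the `serialize` callable is a hypothetical stand-in for the real pipeline, and the way the 512 reference bytes are drawn from seed 117 is an assumption about how the fixture is generated.

```python
import hashlib
import random


def reference_input(seed: int = 117, size: int = 512) -> bytes:
    """Deterministic synthetic BFI bytes from a fixed PRNG seed."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(size))


def frame_digest(serialized_frame: bytes) -> str:
    """SHA-256 hex digest of a serialized frame."""
    return hashlib.sha256(serialized_frame).hexdigest()


def verify(serialize, expected_hex: str) -> bool:
    """Run the pipeline stand-in on the reference input and compare digests.

    `serialize` is a hypothetical callable: raw BFI bytes in, serialized frame out.
    """
    return frame_digest(serialize(reference_input())) == expected_hex
```

Any change to the bytes the pipeline emits for the reference input changes the digest, so the comparison fails.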

@ -0,0 +1,279 @@
# BFLD Privacy Gating — Mechanisms in Depth
## 1. The privacy_class Byte: Concrete Data Exposure Tables
The `privacy_class` byte is the single authoritative classifier for what a BFLD node
is permitted to emit. It is set by the privacy gate module (`privacy_gate.rs`) on every
outbound `BfldFrame` based on the computed `identity_risk_score` and operator configuration.
### Class 0 — raw
Intended exclusively for local research captures and red-team validation. Not a
deployable configuration.
| Field | Published | Notes |
|-------|-----------|-------|
| presence | Yes | Boolean |
| motion | Yes | 0..1 float |
| person_count | Yes | u8 |
| identity_risk_score | Yes | f32 |
| rf_signature_hash | Yes | Rotated blake3, 32 bytes, hex-encoded |
| zone_activity | Yes | |
| confidence | Yes | |
| compressed_angle_matrix | Yes | Phi/Psi per subcarrier — the sensitive surface |
| amplitude_proxy | Yes | |
| phase_proxy | Yes | |
| snr_vector | Yes | |
| bfi_matrix (raw) | NEVER | Dropped before serialization; not in wire format |
| identity_embedding | NEVER | Local RAM only; not in wire format |
### Class 1 — derived
Default for operator-opted-in diagnostics. Includes identity_risk_score and hash but
no angle matrices.
| Field | Published | Notes |
|-------|-----------|-------|
| presence | Yes | |
| motion | Yes | |
| person_count | Yes | |
| identity_risk_score | Yes | Diagnostic; not in HA default entities |
| rf_signature_hash | Yes | Rotated hash only |
| zone_activity | Yes | |
| confidence | Yes | |
| compressed_angle_matrix | No | Zeroed |
| amplitude_proxy | No | |
| phase_proxy | No | |
| snr_vector | Yes | Per-stream aggregate only |
| bfi_matrix (raw) | NEVER | |
| identity_embedding | NEVER | |
### Class 2 — anonymous
Default for all standard deployments. No identity-correlated fields.
| Field | Published | Notes |
|-------|-----------|-------|
| presence | Yes | |
| motion | Yes | |
| person_count | Yes | |
| identity_risk_score | No | Suppressed |
| rf_signature_hash | No | Suppressed |
| zone_activity | Yes | |
| confidence | Yes | |
| All angle/amplitude/phase fields | No | Zeroed |
| bfi_matrix (raw) | NEVER | |
| identity_embedding | NEVER | |
### Class 3 — restricted
Maximum privacy. Suitable for care facilities, medical deployments, guest spaces.
| Field | Published | Notes |
|-------|-----------|-------|
| presence | Yes | |
| motion | No | Suppressed |
| person_count | No | Suppressed |
| All other fields | No | |
| bfi_matrix (raw) | NEVER | |
| identity_embedding | NEVER | |
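The four tables can be read as a per-class allowlist. The sketch below encodes them that way, using a plain dict as a stand-in for `BfldFrame`. It drops disallowed fields rather than zeroing them, which is a simplification of the "Zeroed" entries above, and it rejects unknown class values outright.

```python
_SUMMARY = {"presence", "motion", "person_count", "zone_activity", "confidence"}
_DIAGNOSTIC = {"identity_risk_score", "rf_signature_hash", "snr_vector"}
_ANGLES = {"compressed_angle_matrix", "amplitude_proxy", "phase_proxy"}

ALLOWED_FIELDS = {
    0: _SUMMARY | _DIAGNOSTIC | _ANGLES,  # raw: research captures only
    1: _SUMMARY | _DIAGNOSTIC,            # derived
    2: set(_SUMMARY),                     # anonymous
    3: {"presence"},                      # restricted
}
NEVER_EMIT = {"bfi_matrix", "identity_embedding"}


def gate(frame: dict, privacy_class: int) -> dict:
    """Return only the fields the given privacy class may emit (default-deny)."""
    if privacy_class not in ALLOWED_FIELDS:
        raise ValueError(f"unknown privacy_class {privacy_class}")
    allowed = ALLOWED_FIELDS[privacy_class] - NEVER_EMIT
    return {name: value for name, value in frame.items() if name in allowed}
```

Subtracting `NEVER_EMIT` on every call is redundant with the table contents, but it keeps the two never-published fields out even if an allowlist entry is edited by mistake.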
---
## 2. rf_signature_hash Rotation Algorithm
### Construction
```
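site_salt := blake3_keyed_hash(key=boot_entropy, data=node_mac_address)
# boot_entropy: 32 random bytes drawn from the hardware RNG at first boot.
# A key fixed in firmware (such as the label "bfld-site-seed") would let anyone
# who learns the node MAC recompute the salt, so it cannot serve as the secret.
# Generated once at first boot, stored in NVS, never transmitted
# 32 bytes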
day_epoch := floor(timestamp_ns / 86_400_000_000_000)
# One new epoch per UTC day
ephemeral := mean_angle_delta ‖ subcarrier_variance ‖ burst_motion_score
# A small fixed-length summary of the current window's features
# Not identity-specific — any of several persons could produce
# similar values
rf_signature_hash := BLAKE3(
key = site_salt, // 32 bytes; site-specific secret key
input = day_epoch_bytes(8) ‖ ephemeral_features(24)
)
```
### Why cross-site re-identification is structurally impossible
Two BFLD nodes at sites A and B produce:
```
hash_A = BLAKE3(key=salt_A, input=day ‖ features)
hash_B = BLAKE3(key=salt_B, input=day ‖ features)
```
BLAKE3 is a PRF (pseudorandom function family) keyed on site_salt. Given identical
`day ‖ features` inputs, hash_A and hash_B are pseudorandom and independent because
salt_A != salt_B. An adversary who observes hash_A and hash_B cannot determine whether
they correspond to the same person without knowing both salts.
This is not a security proof; it is a consequence of BLAKE3's PRF security assumption,
which holds as long as the site_salt remains secret.
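The construction can be exercised with a short sketch. Python's standard library has no BLAKE3, so keyed BLAKE2b from `hashlib` is swapped in here as a stand-in keyed hash; the byte order of `day_epoch` and the `new_site_salt` helper are likewise assumptions made for illustration.

```python
import hashlib
import os

NS_PER_DAY = 86_400_000_000_000


def new_site_salt() -> bytes:
    """32 random bytes, as would be generated once at first boot."""
    return os.urandom(32)


def rf_signature_hash(site_salt: bytes, timestamp_ns: int,
                      ephemeral_features: bytes) -> bytes:
    """Keyed hash over day_epoch (8 bytes) followed by ephemeral features (24 bytes).

    Stand-in: keyed BLAKE2b instead of BLAKE3. Big-endian epoch is an assumption.
    """
    if len(site_salt) != 32 or len(ephemeral_features) != 24:
        raise ValueError("expected a 32-byte salt and 24 bytes of features")
    day_epoch = timestamp_ns // NS_PER_DAY
    message = day_epoch.to_bytes(8, "big") + ephemeral_features
    return hashlib.blake2b(message, key=site_salt, digest_size=32).digest()
```

With identical features and timestamps, two different salts give unrelated digests, and the same salt gives a different digest once `day_epoch` advances.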
### Why within-site, within-day tracking is safe
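Within a single day at a single site, two frames from the same person produce the same
hash only when their ephemeral features serialize to identical bytes. BLAKE3, like any
cryptographic hash, maps inputs that differ in a single bit to unrelated outputs, so
features that are merely similar do not yield similar hashes; the features must be
coarsely quantized before hashing for same-person frames to collide. This is intentional:
it allows clustering of same-person events within a session without enabling identity
recovery.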
The hash is NOT the identity. It is a pseudonym within the scope of (site, day). A
person who visits the same site on two different days gets different pseudonyms on each
day.
### Daily rotation schedule
```
epoch_0 = 0 # day 0 (unix epoch: 1970-01-01)
epoch_k = k * 86_400_000_000_000 # day k in nanoseconds
rotation_time = epoch_{k+1} # midnight UTC
```
At rotation time, all existing rf_signature_hash values become cryptographically
disconnected from future values. Logs from before rotation cannot be correlated with
logs after rotation even by the node operator.
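The epoch arithmetic and the 32-byte hash input from this section can be sketched as follows. This is a minimal illustration, not the crate's code: the function names are hypothetical, and the encoding of each ephemeral feature as a little-endian `f64` (3 × 8 = 24 bytes) is an assumption, since the text only fixes the total length. The keyed BLAKE3 call itself is left out so the sketch stays standard-library only.

```rust
/// Nanoseconds in one UTC day, matching the 86_400_000_000_000 constant above.
const NS_PER_DAY: u64 = 86_400_000_000_000;

/// Day index since the Unix epoch. Integer division floors for unsigned values.
fn day_epoch(timestamp_ns: u64) -> u64 {
    timestamp_ns / NS_PER_DAY
}

/// Assemble the hash input: day_epoch (8 bytes) ‖ ephemeral features (24 bytes).
/// Assumption: features are mean_angle_delta, subcarrier_variance and
/// burst_motion_score, each encoded as a little-endian f64.
fn hash_input(timestamp_ns: u64, features: [f64; 3]) -> [u8; 32] {
    let mut buf = [0u8; 32];
    buf[..8].copy_from_slice(&day_epoch(timestamp_ns).to_le_bytes());
    for (i, f) in features.iter().enumerate() {
        let start = 8 + i * 8;
        buf[start..start + 8].copy_from_slice(&f.to_le_bytes());
    }
    buf
}
```

Two frames with identical features taken one second before and one second after midnight UTC differ only in the first 8 bytes of this input, which is enough for the keyed hash to produce unrelated outputs.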
---
## 3. Identity Embedding Lifecycle
```
BFI frame arrives
|
v
Feature extraction (identity_risk.rs)
|
v
RuVector embedding computed: Vec<f32, 128>
|
+-------> identity_risk_score (scalar projection)
| Published (class 1) or suppressed (class 2/3)
|
v
In-RAM ring buffer (EmbeddingRingBuf)
- capacity: 600 frames (default 10 minutes at 1 Hz)
- implemented as VecDeque<Embedding> in heap memory
- NEVER written to disk (no serde, no file I/O in the type)
- NEVER serialized to any MQTT or HTTP path
- Cleared on node restart (RAM is volatile)
|
v [after retention window]
Dropped from ring buffer
```
The ring buffer serves two purposes: (1) temporal_stability calculation requires
comparing the current embedding to recent embeddings; (2) the coherence gate
(`coherence_gate.rs`, from `v2/crates/wifi-densepose-signal/src/ruvsense/`) uses
recent frames to determine whether a new frame is a continuation of an existing
trajectory or a new event.
Both purposes require only that the embeddings exist in RAM during the computation.
Neither purpose requires persistence.
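A bounded in-RAM buffer with these properties might look like the sketch below. The type names follow the diagram, but the method names, the cosine-similarity helper, and the use of a plain `[f32; 128]` array are assumptions made for illustration; the actual temporal_stability formula is not specified here.

```rust
use std::collections::VecDeque;

/// 128-dimensional embedding. It derives no serialization trait, so it cannot
/// be handed to a serializer by accident.
struct Embedding([f32; 128]);

/// Bounded ring buffer; the oldest embedding is dropped once capacity is reached.
struct EmbeddingRingBuf {
    capacity: usize,
    frames: VecDeque<Embedding>,
}

impl EmbeddingRingBuf {
    fn new(capacity: usize) -> Self {
        Self { capacity, frames: VecDeque::with_capacity(capacity) }
    }

    /// Push a new embedding, evicting the oldest one when the buffer is full.
    fn push(&mut self, e: Embedding) {
        if self.capacity == 0 {
            return;
        }
        if self.frames.len() == self.capacity {
            self.frames.pop_front();
        }
        self.frames.push_back(e);
    }

    fn len(&self) -> usize {
        self.frames.len()
    }

    /// Mean cosine similarity between `current` and every buffered embedding.
    /// Hypothetical stand-in for the temporal_stability comparison.
    fn mean_similarity(&self, current: &Embedding) -> Option<f32> {
        if self.frames.is_empty() {
            return None;
        }
        let sum: f32 = self.frames.iter().map(|e| cosine(&e.0, &current.0)).sum();
        Some(sum / self.frames.len() as f32)
    }
}

/// Cosine similarity of two vectors; returns 0.0 if either has zero norm.
fn cosine(a: &[f32; 128], b: &[f32; 128]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}
```

With a capacity of 600, pushing a 601st frame silently evicts the first, so retention is bounded by construction rather than by a cleanup task.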
---
## 4. Privacy-Mode Wire-Format Diff
The following shows what changes in the serialized `BfldFrame` payload when the node
transitions from class 1 (derived) to class 2 (anonymous), which is the transition
that happens when `privacy_mode` is enabled by the operator.
```
BfldFrame {
magic: 0xBF1D_0001, // unchanged
version: 1, // unchanged
ap_id: blake3(node_mac ‖ "ap"), // unchanged (already hashed at ingress)
sta_id: ephemeral_u64, // unchanged (already ephemeral)
session_id: u64, // unchanged
quantization: 0x02, // unchanged (i8 in class 1)
privacy_class: 0x01 -> 0x02, // CHANGED
// Payload (compressed):
compressed_angle_matrix: [...], // class 1: present; class 2: zeroed + omitted
amplitude_proxy: [...], // class 1: present; class 2: omitted
phase_proxy: [...], // class 1: present; class 2: omitted
snr_vector: [...], // class 1: present; class 2: present (aggregate)
// Event (JSON within payload or outer envelope):
presence: true, // unchanged
motion: 0.42, // unchanged
person_count: 1, // unchanged
identity_risk_score: 0.71, // class 1: present; class 2: OMITTED
rf_signature_hash: "a3f2...", // class 1: present; class 2: OMITTED
zone_activity: "living_room", // unchanged
confidence: 0.88, // unchanged
payload_crc32: <recomputed> // recomputed after changes
}
```
The wire-format diff is verified by the acceptance test suite: the same input must
produce a deterministic output for each privacy_class value.
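The event-level part of this diff can be sketched as a pure function over a reduced event struct. The struct below is a trimmed, hypothetical stand-in for the real event type, and `apply_privacy_class` is an illustrative name; only the rule that the two identity-derived fields disappear at class 2 and above is taken from the diff.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
#[repr(u8)]
enum PrivacyClass {
    Raw = 0,
    Derived = 1,
    Anonymous = 2,
    Restricted = 3,
}

/// Reduced event: only the fields needed to show the masking rule.
#[derive(Debug, Clone, PartialEq)]
struct Event {
    presence: bool,
    motion: f32,
    person_count: u8,
    identity_risk_score: Option<f32>,
    rf_signature_hash: Option<[u8; 32]>,
    privacy_class: PrivacyClass,
}

/// Re-stamp the event with `class` and drop identity-derived fields at class >= 2.
fn apply_privacy_class(mut ev: Event, class: PrivacyClass) -> Event {
    ev.privacy_class = class;
    if class >= PrivacyClass::Anonymous {
        ev.identity_risk_score = None;
        ev.rf_signature_hash = None;
    }
    ev
}
```

Because the function has no hidden state, the same input and class always yield the same output, which is the property the acceptance test relies on.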
---
## 5. Default-Deny Posture for Future Fields
Every new field added to `BfldFrame` or the BFLD event JSON in the future MUST be
classified before it ships. The process:
1. New field is added to `BfldFrame` struct.
2. A `#[privacy_class(minimum = N)]` attribute annotation (or equivalent runtime
check in `privacy_gate.rs`) declares the minimum privacy class at which this
field is suppressed.
3. Unit test asserts that serializing at any class < N includes the field and at any
class >= N omits it.
4. The PR that adds the field cannot pass CI without the classification annotation.
This is enforced by a custom `#[must_classify]` lint in the crate — any public field
on `BfldFrame` without a classification attribute produces a compile warning that
becomes a CI error.
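The "equivalent runtime check" mentioned in step 2 could take the shape of a lookup table that denies anything it does not know about. The sketch below is an assumption about how such a check might look: the table contents are illustrative, `may_emit` is a hypothetical name, and `None` is used here to mean "never suppressed".

```rust
/// (field name, minimum privacy class at which the field is suppressed).
/// `None` means the field is emitted at every class. Values are illustrative.
const CLASSIFIED: &[(&str, Option<u8>)] = &[
    ("presence", None),
    ("motion", None),
    ("identity_risk_score", Some(2)),
    ("rf_signature_hash", Some(2)),
    ("compressed_angle_matrix", Some(2)),
];

/// Decide whether `field` may be serialized at privacy class `class`.
/// A field that is missing from the table is denied at every class.
fn may_emit(field: &str, class: u8) -> bool {
    match CLASSIFIED.iter().find(|(name, _)| *name == field) {
        None => false, // unclassified: default deny
        Some((_, None)) => true,
        Some((_, Some(min))) => class < *min,
    }
}
```

A newly added field that nobody classified is dropped from every output until someone adds a table entry, which is the runtime counterpart of the compile-time lint.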
---
## 6. Auditability: Verifying That Raw BFI Never Left the Node
An operator who wants to verify that no raw BFI or identity data has been transmitted
from their BFLD node can use the following procedure:
### 6.1 Network-level audit (tcpdump)
```bash
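# On the node or a port-mirrored switch (stop the capture with Ctrl-C):
tcpdump -i eth0 -w bfld_audit.pcap 'port 1883 or port 8883'
# Note: payloads on port 8883 are TLS-encrypted, so the checks below only
# see plaintext for traffic on port 1883.
# After capture, search for the BFI frame magic bytes in the PCAP:
# Magic 0xBF1D_0001 in big-endian is bytes BF 1D 00 01
# If these bytes appear in the MQTT payload, raw BFI may be present.
# They should NOT appear: BFLD strips the angle matrix at privacy_class >= 2.
# (Requires GNU grep built with -P support.)
LC_ALL=C grep -obUaP '\xBF\x1D\x00\x01' bfld_audit.pcap | wc -l
# Expected: 0 matches.
# Count printable strings that mention none of the expected event keys,
# then review any that remain:
strings bfld_audit.pcap | grep -v "presence\|motion\|person_count" | wc -l
# Expected: only presence/motion/person_count keys in the MQTT payloads.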
```
### 6.2 Node self-check command
```bash
# RuView CLI (planned for P3):
wifi-densepose bfld audit --duration 60s
# Output: "60 frames processed. 0 frames with raw_bfi in payload.
# 0 frames with identity_embedding in payload.
# privacy_class distribution: {2: 57, 3: 3}"
```
### 6.3 CI deterministic hash check
```bash
python python/wifi_densepose/verify_bfld.py
# Must print: VERDICT: PASS
# If a modified binary is exfiltrating raw BFI as part of the payload,
# the output hash will differ from the committed expected hash.
```


@ -0,0 +1,239 @@
# BFLD Automation & Ecosystem Integration
## 1. Home Assistant Integration
### 1.1 Entities Exposed by BFLD
BFLD extends the sensing-server's existing HA entity set (ADR-115, 21 entities) with
the following new entities:
| Entity | Type | HA Platform | privacy_class | Default |
|--------|------|-------------|--------------|---------|
| `binary_sensor.bfld_presence` | Boolean | binary_sensor | 2 — anonymous | ON |
| `sensor.bfld_motion` | Float 0..1 | sensor | 2 — anonymous | ON |
| `sensor.bfld_person_count` | Integer | sensor | 1 — derived | ON |
| `sensor.bfld_confidence` | Float 0..1 | sensor | 2 — anonymous | ON |
| `sensor.bfld_identity_risk` | Float 0..1 | sensor (diagnostic) | 1 — derived | OFF |
| `sensor.bfld_zone_activity` | String | sensor | 2 — anonymous | ON |
`bfld_identity_risk` is classified as a diagnostic entity in the HA model — it is
hidden by default in the UI and not included in recorder history unless explicitly
enabled. This matches the operator opt-in posture for class-1 fields.
### 1.2 MQTT Discovery Payload (example for presence sensor)
```json
{
"name": "BFLD Presence",
"unique_id": "bfld_presence_<node_id_hash>",
"state_topic": "ruview/<node_id>/bfld/presence/state",
"device_class": "occupancy",
"payload_on": "true",
"payload_off": "false",
"device": {
"identifiers": ["ruview_<node_id_hash>"],
"name": "RuView BFLD Node",
"model": "wifi-densepose-bfld",
"manufacturer": "RuView"
}
}
```
Topic: `homeassistant/binary_sensor/bfld_<node_id_hash>/presence/config`
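For illustration, the topic and payload above could be generated on the node with a small helper like the one below. The function name is hypothetical, and the sketch assumes both identifiers are plain alphanumeric strings, because it performs no JSON escaping.

```rust
/// Build the Home Assistant MQTT discovery topic and payload for the presence
/// sensor. Assumes `node_id` and `node_id_hash` need no JSON escaping.
fn presence_discovery(node_id: &str, node_id_hash: &str) -> (String, String) {
    let topic = format!(
        "homeassistant/binary_sensor/bfld_{}/presence/config",
        node_id_hash
    );
    // Doubled braces are literal braces in a format string.
    let payload = format!(
        r#"{{
  "name": "BFLD Presence",
  "unique_id": "bfld_presence_{hash}",
  "state_topic": "ruview/{id}/bfld/presence/state",
  "device_class": "occupancy",
  "payload_on": "true",
  "payload_off": "false",
  "device": {{
    "identifiers": ["ruview_{hash}"],
    "name": "RuView BFLD Node",
    "model": "wifi-densepose-bfld",
    "manufacturer": "RuView"
  }}
}}"#,
        hash = node_id_hash,
        id = node_id
    );
    (topic, payload)
}
```

Publishing the payload to the returned topic with the retain flag set is the usual way to let Home Assistant rediscover the entity after a restart.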
### 1.3 HA Blueprints
**Blueprint 1: Presence-driven lighting**
Trigger: `binary_sensor.bfld_presence` changes to `on`.
Condition: Time is between sunset and sunrise.
Action: Turn on `light.living_room` at 40% brightness.
Exit: `binary_sensor.bfld_presence` off for 5 minutes → turn off light.
This blueprint uses only class-2 (anonymous) data. No identity information is required.
**Blueprint 2: Motion-aware HVAC**
Trigger: `sensor.bfld_motion` rises above 0.3 (active movement threshold).
Action: Set `climate.living_room` to comfort mode.
Trigger: `sensor.bfld_motion` stays below 0.1 for 20 minutes (room settled).
Action: Set `climate.living_room` to eco mode.
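The two thresholds in Blueprint 2 form a hysteresis band. A minimal sketch of that logic, outside Home Assistant, is shown below; the type and method names are hypothetical, the initial mode is assumed to be eco, and a reading between 0.1 and 0.3 is assumed to restart the 20-minute timer because motion no longer "stays below 0.1".

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum HvacMode {
    Comfort,
    Eco,
}

struct MotionHvac {
    mode: HvacMode,
    /// Time (seconds) at which motion first dropped below the settled threshold.
    quiet_since_s: Option<u64>,
}

impl MotionHvac {
    const ACTIVE: f32 = 0.3; // active movement threshold
    const SETTLED: f32 = 0.1; // room settled threshold
    const SETTLE_SECS: u64 = 20 * 60; // 20 minutes

    fn new() -> Self {
        Self { mode: HvacMode::Eco, quiet_since_s: None }
    }

    /// Feed one motion reading taken at `now_s` and return the resulting mode.
    fn update(&mut self, now_s: u64, motion: f32) -> HvacMode {
        if motion > Self::ACTIVE {
            self.mode = HvacMode::Comfort;
            self.quiet_since_s = None;
        } else if motion < Self::SETTLED {
            let since = *self.quiet_since_s.get_or_insert(now_s);
            if now_s.saturating_sub(since) >= Self::SETTLE_SECS {
                self.mode = HvacMode::Eco;
            }
        } else {
            // Inside the band: keep the current mode and restart the quiet timer.
            self.quiet_since_s = None;
        }
        self.mode
    }
}
```

The gap between the two thresholds keeps the HVAC from toggling when the motion score hovers around a single cut-off.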
**Blueprint 3: Identity-risk anomaly notification**
Trigger: `sensor.bfld_identity_risk` rises above 0.8 (high-risk threshold).
Condition: privacy mode is NOT enabled.
Action: Notify user via HA mobile app: "BFLD: High identity-leakage risk detected.
Consider enabling privacy mode."
This blueprint is the only one that touches a class-1 field. The notification is
a privacy-protective action — it alerts the operator that the sensing environment
has changed (e.g., new router firmware, new AP nearby, changed room geometry) in
a way that makes the RF channel more identity-discriminative.
---
## 2. Matter Exposure
Matter clusters expose the absolute minimum set of BFLD outputs. The constraint is
intentional: Matter fabrics can include cloud bridges, and identity-correlated data
must never reach cloud endpoints.
### 2.1 Permitted Matter Clusters
| Matter Cluster | Cluster ID | BFLD Source | Notes |
|----------------|-----------|-------------|-------|
| Occupancy Sensing | 0x0406 | `presence` | `OccupancySensing` attribute `Occupancy` bit 0 |
| Motion Detection | 0x040E (proposed) | `motion` | Published as motion event cluster |
| People Count | — (vendor extension) | `person_count` | No standard cluster yet; use vendor attribute |
### 2.2 Rejected Matter Fields
The following BFLD fields MUST NOT be exposed via Matter regardless of operator
configuration:
- `identity_risk_score`
- `rf_signature_hash`
- `raw_bfi`
- `identity_embedding`
- `compressed_angle_matrix`
- Any future field classified at privacy_class < 2
This rejection is enforced in the `cog-ha-matter` crate (`v2/crates/cog-ha-matter/`),
which filters `BfldFrame` events before populating Matter attribute reports.
### 2.3 Matter Endpoint Configuration
```
Endpoint 1: BFLD Occupancy
- Cluster: Occupancy Sensing (0x0406)
- Attribute 0x0000 Occupancy: 0x01 (bitmask, bit 0 = presence)
- Attribute 0x0001 OccupancySensorType: 0x03 (PhysicalContact in the standard enum, which has no WiFi RF value)
- Cluster: Basic Information (0x0028)
- NodeLabel: "BFLD-<node_id_short>"
- ProductName: "wifi-densepose-bfld"
```
---
## 3. MQTT Topic Structure and ACL Recommendations
### 3.1 Topic Tree
```
ruview/<node_id>/bfld/
presence/state # "true" | "false" — class 2
motion/state # "0.42" — class 2
person_count/state # "1" — class 1
identity_risk/state # "0.71" — class 1, disabled by default
raw/state # disabled by default, class 0 metadata only
zone_activity/state # "living_room" — class 2
confidence/state # "0.88" — class 2
events/bfld_update # Full JSON event payload — class 2 fields only by default
```
### 3.2 Mosquitto ACL Recommendations
```
# /etc/mosquitto/acl.conf (example)
# BFLD node publishes to its own subtree
user bfld_node_<node_id>
topic write ruview/<node_id>/bfld/#
# Home Assistant reads presence, motion, count, zone, confidence
user homeassistant
topic read ruview/+/bfld/presence/state
topic read ruview/+/bfld/motion/state
topic read ruview/+/bfld/person_count/state
topic read ruview/+/bfld/zone_activity/state
topic read ruview/+/bfld/confidence/state
topic read ruview/+/bfld/events/bfld_update
# HA diagnostic access (operator opt-in required to add this rule):
# topic read ruview/+/bfld/identity_risk/state
# DENY all wildcard subscriptions for anonymous clients:
# (mosquitto default: anonymous clients get no access)
# DENY raw topic for all non-admin users:
# raw/state is never written by default; no read ACL needed
```
### 3.3 TLS Configuration
BFLD should use TLS for all MQTT connections. The BFLD node connects as a TLS client;
the broker must present a certificate that chains to the expected CA. The sensing-server
already supports mTLS (ADR-115). BFLD inherits this configuration.
---
## 4. Node-RED and OpenHAB Compatibility
BFLD publishes standard MQTT payloads with consistent topic structure. No Node-RED
or OpenHAB plugin is required; standard MQTT input/output nodes work directly.
**Node-RED example flow**:
```json
[
{"id": "bfld-in", "type": "mqtt in",
"topic": "ruview/+/bfld/presence/state", "qos": "1",
"wires": [["filter"]]},
{"id": "filter", "type": "switch",
"property": "payload", "rules": [{"t": "eq", "v": "true"}],
"wires": [["notify"]]},
{"id": "notify", "type": "http request", "method": "POST",
"url": "http://ha/api/events/bfld_presence_on"}
]
```
**OpenHAB MQTT binding** (items file):
```
Switch BfldPresence "BFLD Presence" {mqtt="<[broker:ruview/node1/bfld/presence/state:state:default]"}
Number BfldMotion "BFLD Motion" {mqtt="<[broker:ruview/node1/bfld/motion/state:state:default]"}
```
---
## 5. cognitum-v0 Federation
The cognitum-v0 appliance (Pi 5, running ruview-mcp-brain on port 9876,
cognitum-rvf-agent on port 9004, ruvector-hailo-worker on port 50051 — see
CLAUDE.local.md) is the fleet coordinator for multi-room correlation.
BFLD events from individual nodes flow to cognitum-v0 via the federation path.
The critical constraint: **identity fields are stripped at the node boundary before
federation**. The stripping happens in the local BFLD emitter (`mqtt.rs`), not in
cognitum-v0. By the time a BFLD event reaches the broker that cognitum-v0 subscribes to,
it contains only class-2 (anonymous) or class-3 (restricted) fields.
### 5.1 Federation Topics
```
# Node-local (not federated):
ruview/<node_id>/bfld/identity_risk/state
ruview/<node_id>/bfld/raw/state
# Federated (forwarded to cognitum-v0 broker):
ruview/<node_id>/bfld/presence/state
ruview/<node_id>/bfld/motion/state
ruview/<node_id>/bfld/person_count/state
ruview/<node_id>/bfld/events/bfld_update
```
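The split above amounts to an allow-list applied at the node boundary. The sketch below shows one way the emitter could decide; `is_federated` is a hypothetical name, and treating every topic that appears in neither list (for example `zone_activity/state`) as node-local is an assumption in keeping with the default-deny posture.

```rust
/// Return true if `topic` may be forwarded to the cognitum-v0 broker.
/// Expects topics of the form ruview/<node_id>/bfld/<leaf>.
/// Anything not explicitly listed stays node-local (default deny).
fn is_federated(topic: &str) -> bool {
    let parts: Vec<&str> = topic.split('/').collect();
    if parts.len() < 5 || parts[0] != "ruview" || parts[2] != "bfld" {
        return false;
    }
    let leaf = parts[3..].join("/");
    matches!(
        leaf.as_str(),
        "presence/state" | "motion/state" | "person_count/state" | "events/bfld_update"
    )
}
```

Keeping the decision in one function makes the end-to-end federation test simple: every message seen on the federated broker must satisfy `is_federated`.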
### 5.2 cognitum-rvf-agent Role
The `cognitum-rvf-agent` (port 9004) handles cross-node RVF (RuView Frame) container
events. For BFLD, it receives federated presence/motion/count events and can correlate
them for multi-room occupancy (e.g., "person moved from living room node to kitchen
node"). It does not receive or need identity information to perform this correlation —
it uses temporal and spatial proximity, not identity.
### 5.3 Hailo Inference (Future)
The `ruvector-hailo-worker` (port 50051) on cognitum-v0 runs vector similarity on the
Hailo-8 AI accelerator. A future extension could offload BFLD's identity_risk_score
computation to the Hailo worker, keeping the identity embedding local to cognitum-v0
while giving individual nodes the benefit of a larger enrollment pool for risk
calibration. This is explicitly out of scope for the current BFLD spec — it is noted
here as an integration-compatible extension point.


@ -0,0 +1,253 @@
# BFLD Implementation Plan
## 1. New Crate: wifi-densepose-bfld
Location: `v2/crates/wifi-densepose-bfld/`
This crate slots between `wifi-densepose-signal` (BFI normalization, temporal windowing)
and `wifi-densepose-sensing-server` (MQTT/HA integration). It does not depend on the
training pipeline (`wifi-densepose-train`) or the neural-network inference crate
(`wifi-densepose-nn`) in the default build — feature flags activate those paths.
### 1.1 Module Layout
```
v2/crates/wifi-densepose-bfld/
Cargo.toml
src/
lib.rs # Public API: BfldPipeline, BfldFrame, BfldEvent
frame.rs # BfldFrame struct, serialization, CRC32, magic bytes
extractor.rs # BFI packet capture interface, Phi/Psi parsing,
# 802.11ac/ax CBFR format decoder
features.rs # Feature computation: mean_angle_delta,
# subcarrier_variance, temporal_entropy,
# doppler_proxy, path_stability,
# cross_antenna_correlation, burst_motion_score,
# stationarity_score, identity_separability_score
identity_risk.rs # identity_risk_score formula, EmbeddingRingBuf,
# in-RAM-only lifecycle enforcement
privacy_gate.rs # privacy_class assignment, field masking,
# #[must_classify] lint check
emitter.rs # BfldEvent construction, JSON serialization
mqtt.rs # MQTT topic publishing, ACL, per-class topic routing
tests/
frame_roundtrip.rs # BfldFrame serialization + CRC32 determinism
privacy_gate.rs # Per-class field suppression assertions
hash_rotation.rs # Cross-site isolation + daily rotation proofs
identity_risk.rs # Risk score bounded [0,1], local-only embedding
acceptance.rs # All 7 acceptance criteria as named tests
benches/
pipeline_throughput.rs # Frame processing at 40 Hz
```
### 1.2 Public API Sketch
```rust
// lib.rs — primary entry points
pub struct BfldPipeline {
config: BfldConfig,
extractor: BfiExtractor,
feature_engine: FeatureEngine,
identity_risk: IdentityRiskEngine,
privacy_gate: PrivacyGate,
emitter: BfldEmitter,
}
impl BfldPipeline {
pub fn new(config: BfldConfig) -> Result<Self, BfldError>;
pub fn process_frame(&mut self, raw: RawBfiCapture) -> Option<BfldEvent>;
pub fn current_privacy_class(&self) -> PrivacyClass;
pub fn enable_privacy_mode(&mut self); // forces class 3
}
pub struct BfldEvent {
pub timestamp_ns: u64,
pub presence: bool,
pub motion: f32, // 0.0..1.0
pub person_count: u8,
pub identity_risk_score: Option<f32>, // None if privacy_class >= 2
pub rf_signature_hash: Option<[u8; 32]>, // None if privacy_class >= 2
pub zone_id: Option<ZoneId>,
pub confidence: f32,
pub privacy_class: PrivacyClass,
}
#[repr(u8)]
pub enum PrivacyClass {
Raw = 0,
Derived = 1,
Anonymous = 2,
Restricted = 3,
}
```
---
## 2. Reuse Map: Existing Crates and Modules
### 2.1 RuvSense Modules (wifi-densepose-signal)
Path: `v2/crates/wifi-densepose-signal/src/ruvsense/`
| Module | Used by BFLD | Purpose |
|--------|-------------|---------|
| `coherence_gate.rs` | `identity_risk.rs` | Accept/reject frame based on coherence score; gates embeddings fed into risk calculation |
| `multistatic.rs` | `features.rs` | Attention-weighted fusion for cross_perspective_consistency component of risk score |
| `cross_room.rs` | `privacy_gate.rs` | Environment fingerprinting — confirms that the site_salt corresponds to the current room geometry |
| `longitudinal.rs` | `identity_risk.rs` | Welford stats for temporal_stability component |
| `adversarial.rs` | `extractor.rs` | Physically-impossible signal detection — flags frames that may be from a compromised AP (A5 threat) |
Not used by BFLD: `pose_tracker.rs`, `intention.rs`, `gesture.rs`, `tomography.rs`,
`field_model.rs` — these operate above the identity-risk layer.
### 2.2 RuVector v2.0.4 Crates
| Crate | BFLD Usage | Rationale |
|-------|-----------|-----------|
| `ruvector-attention` | `identity_risk.rs` | Spatial attention over subcarrier dimension for embedding computation |
| `ruvector-mincut` | `features.rs` | Person separation score as input to person_count feature |
| `ruvector-temporal-tensor` | `extractor.rs` | Temporal windowing + compression of BFI angle sequences |
Not used: `ruvector-attn-mincut`, `ruvector-solver` — spectrogram and sparse
interpolation are not needed in the BFI pipeline.
### 2.3 Cross-Viewpoint Fusion (wifi-densepose-ruvector)
Path: `v2/crates/wifi-densepose-ruvector/src/viewpoint/`
| Module | BFLD Usage |
|--------|-----------|
| `coherence.rs` | Cross-viewpoint phase coherence for cross_perspective_consistency risk component |
| `geometry.rs` | Fisher Information / Cramer-Rao bounds for confidence estimation |
| `attention.rs` | GeometricBias-weighted attention for multi-AP BFI fusion |
| `fusion.rs` | MultistaticArray aggregate root — BFLD subscribes to domain events here |
---
## 3. ESP32 Firmware Additions
### 3.1 ESP32-S3 BFI Capability Assessment
The ESP32-S3's WiFi driver (`csi_collector.c` in `firmware/esp32-csi-node/main/`)
uses `esp_wifi_csi_set_config()` and the `wifi_csi_cb_t` callback. This produces
Espressif HT20 CSI in a vendor-specific format — amplitude + phase per subcarrier,
not the VHT/HE Compressed Beamforming frames (CBFR) that contain Phi/Psi angles.
The ESP32-S3 does NOT have a public API to generate or capture CBFR frames. Espressif's
802.11 implementation does receive and process CBFR frames internally (for beamforming
its own transmissions), but these are not exposed via the CSI callback.
**Consequence**: BFI capture for BFLD requires host-side sniffing, not ESP32 firmware
modification.
### 3.2 Host-Side BFI Capture Path
Recommended capture hardware: Raspberry Pi 5 with BCM43456 chip running Nexmon CSI
patch. This is already present in the fleet as `cognitum-v0` (Pi 5, Tailscale IP
100.77.59.83 per CLAUDE.local.md).
Capture path:
1. Nexmon monitor mode captures all 802.11 frames on the target channel.
2. A filter pass extracts CBFR frames (frame type = Action, subtype = VHT/HE CBFR).
3. The rvcsi adapter (`vendor/rvcsi/`) already handles Nexmon PCap format; add a
BFI extractor alongside the existing CSI extractor.
4. Frames are forwarded to the BFLD pipeline via the existing UDP stream path
(`stream_sender.c` / sensing-server).
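The filter pass in step 2 could be sketched as a check on the 802.11 header. This is an illustration under several assumptions: the buffer starts at the MAC header (any radiotap header already removed), the frame is a management Action (subtype 13) or Action No Ack (subtype 14) frame, the category codes are 21 for VHT and 30 for HE with action value 0 for the compressed beamforming report, and a set Order bit adds a 4-byte HT Control field. The function name is hypothetical, and the values should be checked against the 802.11 standard before use.

```rust
/// Return true if `frame` looks like a VHT or HE compressed beamforming report.
/// `frame` must start at the 802.11 MAC header.
fn is_cbfr(frame: &[u8]) -> bool {
    if frame.len() < 2 {
        return false;
    }
    let fc0 = frame[0];
    let version = fc0 & 0b11;
    let ftype = (fc0 >> 2) & 0b11; // 0 = management
    let subtype = (fc0 >> 4) & 0b1111; // 13 = Action, 14 = Action No Ack
    if version != 0 || ftype != 0 || !(subtype == 13 || subtype == 14) {
        return false;
    }
    // Management header is 24 bytes, plus 4 if the Order bit signals HT Control.
    let hdr_len = if frame[1] & 0x80 != 0 { 28 } else { 24 };
    if frame.len() < hdr_len + 2 {
        return false;
    }
    let category = frame[hdr_len];
    let action = frame[hdr_len + 1];
    // (21, 0): VHT Compressed Beamforming; (30, 0): HE Compressed Beamforming/CQI.
    matches!((category, action), (21, 0) | (30, 0))
}
```

Frames that pass this check would then go to the Phi/Psi angle decoder; everything else is dropped before it reaches the BFLD pipeline.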
### 3.3 Firmware Changes Required (Minimal)
The only firmware change needed in `firmware/esp32-csi-node/main/` is to the
`stream_sender.c` protocol: add a packet type byte to the stream header to distinguish
CSI frames from BFI frames. The ESP32 keeps sending CSI frames only, now tagged with
the CSI type; BFI frames originate on the Pi-side host and carry the BFI type.
```c
// stream_sender.h — add packet type
#define STREAM_PKT_TYPE_CSI 0x01
#define STREAM_PKT_TYPE_BFI 0x02 // new: BFI frames from host capture
```
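On the receiving side, the type byte lets the sensing-server route each packet to the right pipeline. The sketch below assumes the type byte is the first byte of the packet, which the header definition above does not specify; the enum and function names are hypothetical.

```rust
/// A stream packet after the type byte has been stripped.
#[derive(Debug, PartialEq)]
enum StreamPacket<'a> {
    Csi(&'a [u8]),
    Bfi(&'a [u8]),
}

/// Split off the leading type byte (assumed position) and classify the packet.
/// Unknown types and empty buffers yield `None`.
fn parse_stream_packet(buf: &[u8]) -> Option<StreamPacket<'_>> {
    let (&ty, body) = buf.split_first()?;
    match ty {
        0x01 => Some(StreamPacket::Csi(body)), // STREAM_PKT_TYPE_CSI
        0x02 => Some(StreamPacket::Bfi(body)), // STREAM_PKT_TYPE_BFI
        _ => None,
    }
}
```

Rejecting unknown type values keeps older receivers from misinterpreting packet types added later.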
---
## 4. Test Plan: 7 Acceptance Criteria Mapped to Rust Tests
| AC | Criterion | Test in `acceptance.rs` |
|----|-----------|------------------------|
| AC1 | Commodity WiFi 5/6 capture (80/160 MHz, 2×2 MIMO minimum) | `ac1_commodity_wifi_capture`: assert BfiExtractor parses 80 MHz VHT CBFR sample fixture |
| AC2 | Presence detection latency ≤ 1s from first non-empty BFI frame | `ac2_presence_latency`: replay 10-frame window, assert first `BfldEvent` with `presence=true` within 1,000 ms wall time |
| AC3 | Motion score published at ≥ 1 Hz on `motion/state` topic | `ac3_motion_hz`: mock MQTT sink, run at 5 Hz input, assert ≥ 1 motion event per second |
| AC4 | Raw BFI bytes never appear in serialized output | `ac4_raw_bfi_absent`: fuzz 1,000 random BfiCaptures, assert no bfi_matrix bytes in serialized BfldFrame for any privacy_class |
| AC5 | Privacy-mode suppresses all identity-derived fields | `ac5_privacy_mode`: enable privacy_mode, assert BfldEvent fields identity_risk_score and rf_signature_hash are None |
| AC6 | Deterministic frame hash for identical inputs | `ac6_deterministic_hash`: run same BfiCapture 100 times, assert all output hashes identical |
| AC7 | CSI-optional fusion: pipeline runs without csi_matrix | `ac7_csi_optional`: run BfldPipeline with None csi_matrix, assert no panic and presence event produced |
Additionally, `tests/hash_rotation.rs` must include:
- `cross_site_isolation`: two BfldPipelines with different site_salts, identical inputs → hashes must differ
- `daily_rotation`: same salt, frames 1 second before/after midnight → hashes must differ
---
## 5. Phased Rollout
### P1 — Frame Format + Extractor Stub (2 weeks)
Deliverables:
- `frame.rs`: `BfldFrame` struct, serialization, CRC32, magic, version
- `extractor.rs`: CBFR parser for 802.11ac VHT + 802.11ax HE formats
- AC1, AC6 tests passing
- `Cargo.toml` with workspace integration
Effort: 1 engineer, 2 weeks.
### P2 — Feature Extraction + Identity Risk (3 weeks)
Deliverables:
- `features.rs`: all 9 named features (mean_angle_delta through identity_separability_score)
- `identity_risk.rs`: risk formula, EmbeddingRingBuf, coherence gate integration
- AC4, AC7 tests passing (raw-absent, CSI-optional)
- Integration with `ruvector-attention` and `ruvector-temporal-tensor`
Effort: 1 engineer, 3 weeks.
### P3 — Privacy Gate + MQTT (2 weeks)
Deliverables:
- `privacy_gate.rs`: privacy_class assignment, field masking, `#[must_classify]` lint
- `mqtt.rs`: per-class topic routing, discovery payloads, ACL documentation
- AC2, AC3, AC5 tests passing (latency, Hz, privacy-mode)
- Hash rotation: `hash_rotation.rs` tests passing
- Deterministic proof bundle: `verify_bfld.py` equivalent
Effort: 1 engineer, 2 weeks.
### P4 — Home Assistant Integration (1 week)
Deliverables:
- MQTT discovery payloads for all 6 entities
- 3 HA blueprints
- `sensor.bfld_identity_risk` marked diagnostic + hidden by default
- Update `wifi-densepose-sensing-server` to include BFLD event routing
Effort: 0.5 engineer, 1 week.
### P5 — Matter Exposure (1 week)
Deliverables:
- `cog-ha-matter` crate updated to filter BfldFrame → Matter attribute reports
- OccupancySensing cluster populated from `presence`
- Rejection list for identity fields enforced at Matter boundary
Effort: 0.5 engineer, 1 week.
### P6 — cognitum Federation (1 week)
Deliverables:
- Topic routing in `mqtt.rs` for federated vs local topics
- Documentation for cognitum-rvf-agent BFLD event subscription
- End-to-end test: Pi 5 (cognitum-v0) receives federated events, identity fields absent
Effort: 0.5 engineer, 1 week.
**Total estimate**: ~10.5 engineer-weeks across 6 phases, approximately 3 calendar months
with one engineer.


@ -0,0 +1,196 @@
# BFLD Benchmarks and Evaluation Strategy
## 1. Datasets
### 1.1 BFId Dataset (Primary)
**Reference**: Todt, Morsbach, Strufe; KIT. ACM CCS 2025.
https://dl.acm.org/doi/10.1145/3719027.3765062
https://ps.tm.kit.edu/english/bfid-dataset/index.php
197 individuals. BFI and CSI recorded simultaneously. Multiple sessions, multiple AP
angles. Available to researchers for non-commercial use on request from KIT.
**Use in BFLD evaluation**: The BFId dataset provides the ground-truth identity labels
needed to calibrate `identity_risk_score`. Specifically: given BFId's known re-ID
accuracy as a function of time window, BFLD's identity_risk_score should correlate
with BFId's success rate. High-risk frames (score > 0.7) should correspond to windows
where BFId achieves > 80% accuracy; low-risk frames (score < 0.2) should correspond
to windows where BFId accuracy approaches chance.
### 1.2 Wi-Pose and MM-Fi (Context)
**MM-Fi**: Multi-modal WiFi sensing dataset used by this project (ADR-015). Contains
synchronized WiFi CSI, mmWave, and camera pose data. Does not contain BFI separately,
but can be used to validate BFLD's CSI-optional path (AC7).
**Wi-Pose**: Academic benchmark for WiFi pose estimation. CSI only; used for
person_count and motion accuracy baselines.
### 1.3 Proposed In-House Multi-Site Capture Protocol
**Purpose**: Validate cross-site isolation (Invariant 3) and daily rotation.
**Setup**:
- Site A: ruvultra (RTX 5080 workstation, Tailscale 100.104.125.72) with USB WiFi
adapter in monitor mode.
- Site B: cognitum-v0 (Pi 5, Tailscale 100.77.59.83) with Nexmon monitor mode.
- Subject pool: 5-10 volunteers.
- Protocol: Each subject walks a fixed path at each site on 3 consecutive days.
BFI captured simultaneously at both sites using Wi-BFI.
**Analysis**:
1. Can the BFId classifier re-identify subjects within a site? (Baseline — should
confirm BFId's published results.)
2. Can any classifier re-identify subjects across sites using BFLD's
rf_signature_hash? (Should fail — cross-site isolation test.)
3. Can any classifier re-identify across days using BFLD's rf_signature_hash? (Should
fail — daily rotation test.)
---
## 2. Metrics
### 2.1 Presence Detection
| Metric | Definition | Target |
|--------|-----------|--------|
| Latency p50 | Time from first non-empty BFI frame to first `presence=true` event | < 500 ms |
| Latency p95 | | < 1000 ms (AC2) |
| False positive rate | Presence=true when room is confirmed empty | < 5% |
| False negative rate | Presence=false when person confirmed present | < 2% |
Measurement method: camera ground-truth (ruvultra webcam via MediaPipe Pose, same
as ADR-079 collection protocol) for empty/occupied labels.
### 2.2 Motion Score
| Metric | Definition | Target |
|--------|-----------|--------|
| MAE vs ground truth | Mean absolute error of motion score vs camera-derived motion magnitude | < 0.1 |
| Hz at sustained operation | Events published per second on `motion/state` | >= 1 Hz (AC3) |
| Latency p95 | Time from motion onset (camera) to motion event | < 750 ms |
### 2.3 Person Count
| Metric | Definition | Target |
|--------|-----------|--------|
| Count accuracy | Fraction of windows where BFLD person_count == camera count | > 85% for 1-3 persons |
| Count MAE | | < 0.5 for counts 1-4 |
Person count is harder than presence. The target is achievable with MinCut separation
(`ruvector-mincut`) but requires multi-AP coverage for 4+ persons.
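Both person-count metrics can be computed from paired per-window counts. The helper below is a small sketch with a hypothetical name; it assumes one predicted and one camera-derived count per window.

```rust
/// Return (count accuracy, count MAE) for paired per-window counts.
/// Returns `None` if the slices are empty or differ in length.
fn count_metrics(pred: &[u8], truth: &[u8]) -> Option<(f64, f64)> {
    if pred.is_empty() || pred.len() != truth.len() {
        return None;
    }
    let n = pred.len() as f64;
    // Fraction of windows where the predicted count matches exactly.
    let exact = pred.iter().zip(truth).filter(|(p, t)| p == t).count() as f64;
    // Mean absolute difference between predicted and true counts.
    let abs_err: f64 = pred
        .iter()
        .zip(truth)
        .map(|(p, t)| (*p as f64 - *t as f64).abs())
        .sum();
    Some((exact / n, abs_err / n))
}
```

For example, predictions `[1, 2, 2, 3]` against camera counts `[1, 2, 3, 3]` give an accuracy of 0.75 and an MAE of 0.25.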
### 2.4 Identity Risk Calibration
This is BFLD's novel evaluation dimension — no prior system has explicitly quantified
this.
**Calibration definition**: Let `r(t)` = BFLD's identity_risk_score at time t.
Let `acc(t)` = BFId classifier's re-identification accuracy when trained on frames
around time t. The identity_risk_score is *calibrated* if:
E[acc(t) | r(t) = v] is monotonically increasing in v
In other words: higher risk scores should correspond to frames where identity inference
is genuinely easier.
**Evaluation protocol**:
1. Run BFId classifier in sliding 5-second windows on the BFId dataset.
2. Record per-window BFId accuracy (using leave-one-out cross-validation).
3. Run BFLD's identity_risk_score computation on the same windows.
4. Compute Spearman correlation between risk scores and BFId accuracy.
5. Target: Spearman rho > 0.5 (positive monotonic correlation).
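Step 4 of the protocol can be sketched as follows: Spearman's rho is the Pearson correlation of the two rank vectors, with tied values sharing their average rank. The function names are hypothetical, and the inputs are assumed to be per-window risk scores and per-window BFId accuracies of equal length with no NaN values.

```rust
/// Average ranks (1-based); tied values share the mean of their positions.
fn ranks(xs: &[f64]) -> Vec<f64> {
    let mut idx: Vec<usize> = (0..xs.len()).collect();
    idx.sort_by(|&a, &b| xs[a].partial_cmp(&xs[b]).expect("NaN in input"));
    let mut r = vec![0.0; xs.len()];
    let mut i = 0;
    while i < idx.len() {
        // Extend j over the run of values equal to xs[idx[i]].
        let mut j = i;
        while j + 1 < idx.len() && xs[idx[j + 1]] == xs[idx[i]] {
            j += 1;
        }
        let avg = (i + j) as f64 / 2.0 + 1.0;
        for k in i..=j {
            r[idx[k]] = avg;
        }
        i = j + 1;
    }
    r
}

/// Spearman's rho; `None` for mismatched lengths, fewer than two samples,
/// or a constant input (rho is undefined there).
fn spearman(x: &[f64], y: &[f64]) -> Option<f64> {
    if x.len() != y.len() || x.len() < 2 {
        return None;
    }
    let (rx, ry) = (ranks(x), ranks(y));
    let n = x.len() as f64;
    let mx = rx.iter().sum::<f64>() / n;
    let my = ry.iter().sum::<f64>() / n;
    let (mut cov, mut vx, mut vy) = (0.0, 0.0, 0.0);
    for (a, b) in rx.iter().zip(&ry) {
        cov += (a - mx) * (b - my);
        vx += (a - mx).powi(2);
        vy += (b - my).powi(2);
    }
    if vx == 0.0 || vy == 0.0 {
        None
    } else {
        Some(cov / (vx * vy).sqrt())
    }
}
```

The calibration target then reads `spearman(&risk_scores, &bfid_accuracy).map_or(false, |rho| rho > 0.5)`.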
### 2.5 Privacy-Mode False Positive Rate
When `privacy_mode` is enabled (privacy_class = 3), all identity-correlated fields
should be suppressed. The false positive rate is the fraction of outbound events
that inadvertently include an identity-correlated field despite privacy_mode being
active.
**Target**: 0% (this is a hard correctness requirement, not a statistical target).
Verified by the AC5 fuzz test in `acceptance.rs`.
---
## 3. Red-Team Protocol
### 3.1 Hash Re-identification Attack
**Question**: Can an attacker re-identify a person across rotated hashes?
**Setup**:
- Run BFLD pipeline for person X across 3 days.
- Collect `rf_signature_hash` values for each day: H_1, H_2, H_3.
- Adversary has access to H_1, H_2, H_3 and knows they are from the same site.
- Adversary attempts to confirm H_1, H_2, H_3 are from the same person.
**Success condition**: adversary achieves confirmation rate > chance (1/N for N subjects).
**Expected result**: FAIL (by construction of the hash rotation with site_salt).
Since day_epoch changes daily and site_salt is fixed but unknown to the adversary,
the hash function is a keyed PRF. The adversary has three random-looking 32-byte
values with no structural relationship. Success rate should be indistinguishable from
random guessing.
**Quantitative target**: success rate <= 1/N + 0.05 (within 5 percentage points of chance).
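As a worked check of this target, the helper below compares an observed confirmation rate with the 1/N + 0.05 bound. The function name is hypothetical, and the sketch applies the bound to the raw rate without any statistical test.

```rust
/// Return Some(true) if the adversary's success rate is within the
/// 1/N + 0.05 bound, Some(false) if it exceeds it, and None for empty input.
fn within_chance(successes: u32, trials: u32, n_subjects: u32) -> Option<bool> {
    if trials == 0 || n_subjects == 0 {
        return None;
    }
    let rate = successes as f64 / trials as f64;
    let bound = 1.0 / n_subjects as f64 + 0.05;
    Some(rate <= bound)
}
```

With N = 10 subjects the bound is 0.15, so 14 confirmations in 100 attempts passes and 20 in 100 fails.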
### 3.2 Cross-Site Re-identification Attack
**Question**: Can an attacker confirm person X visited both site A and site B?
**Setup**: Same as Section 1.3 in-house protocol. Adversary has BFLD event streams
from both sites.
**Method**: Attempt to match rf_signature_hash values from site A and site B on the
same day. Alternatively, train a classifier on BFI features (using the raw angle
sequences from the captured data) and attempt cross-site re-ID.
**Expected result**: Hash-based matching fails by construction. Classifier-based
re-ID may succeed if the adversary has raw angle data (which BFLD does not publish)
but not using BFLD's published output.
**Success condition**: hash-based cross-site match rate <= 1/N + 0.05.
### 3.3 Timing Side-Channel Attack
**Question**: Can an attacker infer a person's schedule by monitoring
identity_risk_score over time?
**Method**: Record identity_risk_score time series. Correlate with known schedule
(person X leaves at 8am, returns at 6pm). Compute mutual information between
schedule and risk score time series.
**Expected result**: Some correlation exists (risk score rises when person enters),
but the attacker learns "someone is present" — equivalent to the presence sensor —
not identity. This is acceptable: presence information is already published at
class 2.
---
## 4. Comparison Baselines
| Baseline | Description | Presence F1 | Motion MAE | Identity leak |
|----------|-------------|------------|-----------|--------------|
| Raw CSI pipeline | Existing wifi-densepose pipeline (no BFLD) | ~0.95 (est.) | ~0.08 (est.) | Unquantified — no risk gating |
| BFI-only (no BFLD) | Wi-BFI + threshold presence | ~0.82 (from LeakyBeam) | N/A | Angle matrices published |
| BFI+CSI fusion (no BFLD) | Combined pipeline, ungated | ~0.97 (est.) | ~0.06 (est.) | Unquantified |
| **BFLD (BFI+CSI, class 2)** | Full BFLD with anonymous privacy class | target 0.93 | target 0.10 | 0% (class 2 gate) |
| BFLD (BFI-only, class 2) | BFLD without CSI input (AC7) | target 0.85 | target 0.12 | 0% (class 2 gate) |
The BFLD privacy-class guarantee reduces the raw sensing accuracy by a small margin
versus an ungated BFI+CSI pipeline (target F1 0.93 vs estimated 0.97). This is the
explicit trade-off: identity safety for a modest utility cost.
---
## 5. Continuous Evaluation in CI
Three tests run on every PR that touches the BFLD crate:
1. **Deterministic hash test** (AC6): same input → same output across platforms.
2. **Privacy-mode field suppression fuzz** (AC5): 1,000 random inputs → no identity
fields in class-2 output.
3. **Latency smoke test** (AC2): 100-frame replay → first presence event < 200 ms
(tighter than the 1s AC target, to keep CI fast).
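
As an illustration of the second test, here is a minimal sketch of a field-suppression fuzz loop. The `BfldEvent` shape, `apply_privacy_gate`, and the xorshift generator are hypothetical stand-ins rather than the crate's actual API; the sketch only assumes that identity-derived fields are optional and must be absent at class 2 and above.

```rust
/// Hypothetical event shape; only the fields needed for the check are shown.
#[derive(Debug, Clone, PartialEq)]
pub struct BfldEvent {
    pub presence: bool,
    pub motion: f32,
    pub identity_risk_score: Option<f32>,
    pub rf_signature_hash: Option<[u8; 8]>,
    pub privacy_class: u8,
}

/// Strip identity-derived fields when the event is at class 2 or above.
pub fn apply_privacy_gate(mut ev: BfldEvent) -> BfldEvent {
    if ev.privacy_class >= 2 {
        ev.identity_risk_score = None;
        ev.rf_signature_hash = None;
    }
    ev
}

/// Small xorshift64 generator so the loop is reproducible without extra crates.
/// The seed must be non-zero.
pub struct XorShift64(pub u64);

impl XorShift64 {
    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Generate `n` pseudo-random class-2 events, gate each one, and count
/// how many still carry an identity field afterwards.
pub fn count_identity_leaks(seed: u64, n: usize) -> usize {
    let mut rng = XorShift64(seed);
    let mut leaks = 0;
    for _ in 0..n {
        let r = rng.next_u64();
        let ev = BfldEvent {
            presence: (r & 1) == 1,
            motion: ((r >> 8) & 0xFF) as f32 / 255.0,
            identity_risk_score: Some(((r >> 16) & 0xFF) as f32 / 255.0),
            rf_signature_hash: Some(r.to_le_bytes()),
            privacy_class: 2,
        };
        let gated = apply_privacy_gate(ev);
        if gated.identity_risk_score.is_some() || gated.rf_signature_hash.is_some() {
            leaks += 1;
        }
    }
    leaks
}
```

A CI test built on this sketch would assert that `count_identity_leaks(seed, 1_000)` returns zero for a fixed seed, which keeps the run deterministic.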


@ -0,0 +1,214 @@
# ADR-118: BFLD — Beamforming Feedback Layer for Detection
> This file is a draft. When approved, copy to:
> `docs/adr/ADR-118-bfld-beamforming-feedback-layer-for-detection.md`
| Field | Value |
|-------|-------|
| **Status** | Proposed |
| **Date** | 2026-05-24 |
| **Deciders** | ruv |
| **Codename** | **BFLD** — Beamforming Feedback Layer for Detection |
| **Relates to** | [ADR-024](ADR-024-contrastive-csi-embedding-model.md) (AETHER contrastive embedding), [ADR-027](ADR-027-cross-environment-domain-generalization.md) (MERIDIAN cross-environment), [ADR-028](ADR-028-esp32-capability-audit.md) (capability audit / witness), [ADR-029](ADR-029-ruvsense-multistatic-sensing-mode.md) (RuvSense multistatic), [ADR-030](ADR-030-ruvsense-persistent-field-model.md) (persistent field model), [ADR-031](ADR-031-ruview-sensing-first-rf-mode.md) (sensing-first RF mode), [ADR-032](ADR-032-multistatic-mesh-security-hardening.md) (mesh security hardening), [ADR-095](ADR-095-rvcsi-edge-rf-sensing-platform.md) (rvCSI platform), [ADR-115](ADR-115-home-assistant-integration.md) (HA integration), [ADR-116](ADR-116-cog-ha-matter-seed.md) (Matter seed packaging), [ADR-117](ADR-117-pip-wifi-densepose-modernization.md) (pip modernization) |
| **Tracking issue** | TBD |
---
## 1. Context
### 1.1 The Plaintext BFI Problem
IEEE 802.11ac and 802.11ax beamforming feedback information (BFI) is exchanged between
client stations (STA) and access points (AP) in unencrypted management-plane frames.
The STA compresses the channel response into a matrix of Givens rotation angles (Phi/Psi)
and transmits them in a VHT/HE Compressed Beamforming Report (CBFR) frame. These frames
are passively sniffable by any device in WiFi monitor mode without any access to the
target network.
Two independent 2024–2025 research papers establish the severity of this exposure:
1. **BFId** (Todt, Morsbach, Strufe; KIT; ACM CCS 2025,
https://dl.acm.org/doi/10.1145/3719027.3765062): demonstrates re-identification of
197 individuals using BFI alone, with >90% accuracy from 5 seconds of capture.
2. **LeakyBeam** (Xiao et al.; Zhejiang U., NTU, KAIST; NDSS 2025,
https://www.ndss-symposium.org/ndss-paper/lend-me-your-beam-privacy-implications-of-plaintext-beamforming-feedback-in-wifi/):
demonstrates occupancy detection through walls at 20 m range using BFI, with 82.7%
TPR and 96.7% TNR.
Tooling for passive BFI capture is freely available. Wi-BFI
(https://arxiv.org/abs/2309.04408) is pip-installable and supports 802.11ac/ax,
SU/MU-MIMO, 20/40/80/160 MHz channels.
### 1.2 Gap in Existing Pipeline
The wifi-densepose sensing pipeline processes CSI via the rvCSI runtime (ADR-095/096)
and produces presence, pose, vitals, and zone-activity events. No layer explicitly
measures whether the data being processed is capable of identifying specific individuals.
The pipeline treats all CSI as equivalent from a privacy standpoint, regardless of
whether it is operating in a high-separability (identity-leaky) or low-separability
(anonymous) regime.
This gap becomes a compliance and liability issue as WiFi sensing deployments scale.
An operator deploying this system in a care facility, hotel, or shared office has no
instrument to verify that the system is operating anonymously.
### 1.3 The BFI Opportunity
BFI is not only a threat vector — it is a complementary sensing signal. Because BFI
encodes the channel response as a structured compressed matrix, it carries multipath
geometry that can augment CSI-based presence and motion detection, particularly in
scenarios where only one AP is available (fewer antenna pairs than a full MIMO CSI
capture). The BFLD design treats BFI as an optional input alongside CSI, not as a
replacement.
---
## 2. Decision
We will create a new crate `wifi-densepose-bfld` (to live in `v2/crates/`) that:
1. **Ingests** raw BFI (Phi/Psi angle matrices from CBFR frames) as input and optionally
fuses CSI when available.
2. **Computes** nine named features and derives an `identity_risk_score` using a
separability × temporal_stability × cross_perspective_consistency × sample_confidence
formula.
3. **Gates** all output through a `privacy_class` mechanism that structurally prevents
identity-correlated data from being published at privacy classes 2 and 3.
4. **Emits** `BfldEvent` structs on MQTT topics under `ruview/<node_id>/bfld/` with
per-class topic routing.
5. **Enforces** three invariants structurally (not by policy):
- Raw BFI never exits the node.
- Identity embedding is in-RAM-only.
- Cross-site identity correlation is made cryptographically impossible via per-site
keyed BLAKE3 hash rotation with a daily epoch.
The `BfldFrame` wire format carries magic `0xBF1D_0001`, a version byte, hashed AP/STA
identifiers, a quantization byte, a privacy_class byte, compressed feature payload, and
a CRC32.
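
The field order above can be sketched as a deterministic serializer. Everything beyond the magic value and the field list is an assumption made for illustration: little-endian byte order, 8-byte hashed identifiers, a 4-byte payload length prefix, and the IEEE 802.3 CRC-32 polynomial. ADR-119 is the authority on the real layout.

```rust
pub const BFLD_MAGIC: u32 = 0xBF1D_0001;

/// Hypothetical in-memory frame; field widths are assumptions.
pub struct BfldFrame {
    pub version: u8,
    pub ap_hash: [u8; 8],
    pub sta_hash: [u8; 8],
    pub quantization: u8,
    pub privacy_class: u8,
    pub payload: Vec<u8>, // compressed feature payload, never raw Phi/Psi
}

/// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

impl BfldFrame {
    /// Serialize in a fixed field order with fixed endianness, then append
    /// a CRC-32 computed over every preceding byte.
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(27 + self.payload.len() + 4);
        out.extend_from_slice(&BFLD_MAGIC.to_le_bytes());
        out.push(self.version);
        out.extend_from_slice(&self.ap_hash);
        out.extend_from_slice(&self.sta_hash);
        out.push(self.quantization);
        out.push(self.privacy_class);
        out.extend_from_slice(&(self.payload.len() as u32).to_le_bytes());
        out.extend_from_slice(&self.payload);
        let crc = crc32(&out);
        out.extend_from_slice(&crc.to_le_bytes());
        out
    }
}
```

Because the sketch uses no timestamps, map iteration, or platform-dependent widths, two calls on the same frame yield the same bytes, which is the property AC6 asks for.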
Matter exposure is limited to: OccupancySensing (presence), MotionSensor (motion),
PeopleCount (person_count). Identity fields are rejected at the Matter boundary in the
`cog-ha-matter` crate.
---
## 3. Consequences
### Positive
- Operators gain an explicit, auditable measure of privacy compliance at the RF layer —
the first such primitive in the wifi-densepose ecosystem.
- The identity_risk_score doubles as an anomaly signal: unexpected spikes indicate
environmental changes (new AP firmware, nearby attacker-grade sniffer, unusual
propagation geometry) that warrant investigation.
- BFI fusion augments presence and motion accuracy in single-AP deployments, partially
compensating for lower CSI antenna counts.
- The crate's deterministic frame hashes enable the ADR-028 witness-bundle pattern to
extend to the new sensing surface, preserving the existing audit trail model.
- Cross-site identity isolation is structural, not policy-dependent. This is a stronger
guarantee than access-control rules.
### Negative
- BFI capture on ESP32-S3 hardware is not directly possible via the Espressif WiFi API.
The full BFLD pipeline requires a Pi 5 / Nexmon host-side sniffer (cognitum-v0 is
available for this purpose, but it adds a fleet dependency for the BFI path).
- The identity_risk_score calibration (correlation with actual re-ID success rate)
requires the BFId dataset, which in turn requires a non-commercial research agreement with KIT.
- ~10.5 engineer-weeks of implementation effort.
### Neutral
- BFLD does not prevent passive BFI capture by an external attacker (A1 / LeakyBeam
threat). It only ensures the node's own output is non-identifying. Operators should
be informed of this distinction.
- The daily hash rotation means that occupant-counting analytics that span multiple
days cannot correlate individual signatures across the day boundary. This is a privacy
benefit that some analytics use-cases may find inconvenient.
---
## 4. Alternatives Considered
### Alt 1: Skip BFI entirely, CSI-only pipeline
The rvCSI pipeline (ADR-095/096) already handles CSI without BFI. This alternative
requires no new crate and no change to the ESP32 firmware.
**Rejected because**: (a) it leaves the identity-leakage detection gap open for the
existing CSI pipeline, and (b) as BFI capture tooling becomes more widespread (Wi-BFI,
PicoScenes), the absence of a privacy layer becomes more conspicuous for operators.
### Alt 2: Publish identity_risk_score publicly (default-on)
Treat the risk score as a diagnostic metric that operators and the public can observe.
**Rejected because**: the risk score is itself a privacy-sensitive signal (it reveals
when a specific person is present via timing correlation). The default should be
opt-in, with the operator explicitly acknowledging the trade-off.
### Alt 3: Use raw BFI in cloud ML training
Send raw BFI angle matrices to a cloud training service to improve model quality.
**Rejected because**: this violates Invariant 1. Cloud training on raw BFI would
create an off-node store of angle matrices that could be reconstructed into identity
profiles. The on-device-only constraint is not negotiable.
### Alt 4: Differential privacy noise injection on BFI before any processing
Add calibrated Laplace/Gaussian noise to the angle matrices at ingress to provide
epsilon-differential privacy on all downstream computations.
**Rejected for this ADR** (noted as future extension): DP noise calibration requires
sensitivity analysis that is not yet complete, and the interaction between DP noise
and the identity_risk_score formula requires separate validation. The current design
achieves privacy through structural impossibility (local-only, hash rotation) rather
than noise injection.
---
## 5. Acceptance Criteria
- [ ] **AC1**: The extractor parses BFI from commodity WiFi 5 (802.11ac) and WiFi 6
(802.11ax) captures, supporting 20/40/80/160 MHz channel bandwidth and 2×2 through
4×4 MIMO configurations.
- [ ] **AC2**: Presence detection latency is ≤ 1s p95 from the first non-empty BFI
frame in a new occupancy event.
- [ ] **AC3**: Motion score is published at ≥ 1 Hz on the `ruview/<node_id>/bfld/motion/state`
MQTT topic during sustained occupancy.
- [ ] **AC4**: Raw BFI bytes (Phi/Psi angle matrices) are never present in any
serialized `BfldFrame` payload at any `privacy_class` value.
- [ ] **AC5**: When `privacy_mode` is enabled, all identity-derived fields
(`identity_risk_score`, `rf_signature_hash`, `identity_embedding`) are absent from
all outbound events.
- [ ] **AC6**: Given identical `BfiCapture` inputs, the `BfldFrame` serialization
produces bit-identical output (deterministic hash) across runs and across platforms.
- [ ] **AC7**: The pipeline produces valid `BfldEvent` outputs when `csi_matrix` is
absent (BFI-only mode), without panic or degraded presence/motion reporting beyond
the documented accuracy bounds.
---
## 6. Related ADRs
- **ADR-024**: AETHER contrastive CSI embedding — BFLD reuses the AETHER embedding
infrastructure for identity_risk computation.
- **ADR-027**: MERIDIAN cross-environment — BFLD's cross-site isolation instantiates
the "no cross-site correlation" assumption that MERIDIAN requires.
- **ADR-028**: Witness verification — BFLD extends the deterministic proof pattern.
- **ADR-029**: RuvSense multistatic — BFLD uses `multistatic.rs` for
cross_perspective_consistency.
- **ADR-030**: Persistent field model — BFLD uses `cross_room.rs` for
environment fingerprinting in the hash rotation.
- **ADR-031**: Sensing-first RF mode — BFLD is a new sensing primitive alongside
the CSI-based sensing.
- **ADR-032**: Mesh security hardening — BFLD's threat model is a superset.
- **ADR-095/096**: rvCSI platform — BFLD shares the BFI capture path with rvCSI's
Nexmon adapter.
- **ADR-115**: HA integration — BFLD extends the 21-entity HA surface with 6 new
entities.
- **ADR-116**: Matter seed packaging — BFLD's Matter boundary filter is implemented
in `cog-ha-matter`.
- **ADR-117**: pip modernization — BFLD's Python bindings (PyO3) will follow the
pattern established in ADR-117.


@ -0,0 +1,111 @@
# GitHub Issue Draft
**Title**: feat: BFLD — Beamforming Feedback Layer for Detection (privacy-gated WiFi sensing)
**Labels**: `enhancement`, `privacy`, `security`, `area/signal`, `area/firmware`
**Milestone**: (TBD — suggest: v0.8.0)
---
## Summary
Add a new crate `wifi-densepose-bfld` that turns raw 802.11 Beamforming Feedback
Information (BFI) into bounded, privacy-gated sensing outputs. BFLD detects when RF
data crosses from "ambient sensing" into "identity record" and structurally prevents
identity-correlated data from leaving the node.
This is the safety layer that was missing from the CSI pipeline. Passive BFI sniffing
tools (Wi-BFI, PicoScenes) are becoming widely available, and academic attacks on commodity
WiFi demonstrate re-identification with >90% accuracy (BFId, ACM CCS 2025) and through-wall
occupancy detection (LeakyBeam, NDSS 2025). The wifi-densepose ecosystem needs an explicit
privacy layer before scaling deployment.
## Motivation
1. **BFI is plaintext and passively sniffable.** IEEE 802.11ac/ax CBFR frames are
transmitted before WPA2/WPA3 encryption is applied. Any nearby device in monitor mode
can capture them (NDSS 2025: https://www.ndss-symposium.org/ndss-paper/lend-me-your-beam-privacy-implications-of-plaintext-beamforming-feedback-in-wifi/).
2. **BFI enables re-identification.** The KIT BFId paper (ACM CCS 2025:
https://dl.acm.org/doi/10.1145/3719027.3765062) demonstrates >90% identity
recognition from 5 seconds of BFI, from a dataset of 197 individuals, using only
the Phi/Psi Givens rotation angles.
3. **The existing pipeline has no identity-leakage measurement.** The rvCSI pipeline
produces presence/motion/pose events without any indication of whether those outputs
were derived from identity-discriminative data. An operator deploying in a care
facility or shared office has no way to verify the system is behaving anonymously.
4. **WiFi 7 will make this worse.** 802.11be (Wi-Fi 7) multi-link operation increases
sounding frequency 3–5×. The attack surface is not static.
## Proposed Solution
New crate at `v2/crates/wifi-densepose-bfld/` with the following pipeline:
```
BFI capture (CBFR frames, Pi 5 / Nexmon monitor mode)
→ BFI extractor (Phi/Psi parser, 802.11ac/ax)
→ Normalization + temporal windowing
→ Feature extraction (9 named features)
→ Identity risk engine (in-RAM embeddings, coherence gate)
→ Privacy gate (privacy_class byte, field masking)
→ MQTT emitter (per-class topic routing)
```
Three structural invariants (not configurable, not policy):
1. Raw BFI never leaves the node.
2. Identity embedding is in-RAM-only (VecDeque, never persisted).
3. Cross-site identity matching is cryptographically impossible via per-site BLAKE3
keyed hash with daily rotation.
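
The daily rotation in invariant 3 can be sketched as follows. This is an illustration under stated assumptions: the rotation boundary is taken to be UTC midnight, the hash input is the epoch day followed by the signature bytes, and the function names are hypothetical. The keyed-hash step itself is left as a comment because it needs the external `blake3` crate.

```rust
pub const SECS_PER_DAY: u64 = 86_400;

/// Day index since the Unix epoch, in UTC.
/// Using UTC midnight as the rotation boundary is an assumption.
pub fn epoch_day(unix_secs: u64) -> u64 {
    unix_secs / SECS_PER_DAY
}

/// Build the message to be hashed under the per-site key:
/// the epoch day (little-endian) followed by the signature bytes.
pub fn hash_input(unix_secs: u64, signature_bytes: &[u8]) -> Vec<u8> {
    let mut msg = Vec::with_capacity(8 + signature_bytes.len());
    msg.extend_from_slice(&epoch_day(unix_secs).to_le_bytes());
    msg.extend_from_slice(signature_bytes);
    msg
}

// With the `blake3` crate, the final step would be along the lines of
// `blake3::keyed_hash(&site_key, &msg)`, where `site_key: [u8; 32]` is the
// per-site secret that never leaves the node.
```

Two captures on the same UTC day share an epoch value and therefore hash consistently within one site; once the day index increments, the same signature bytes map to an unrelated hash.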
Output events published on `ruview/<node_id>/bfld/{presence,motion,person_count,...}/state`.
Matter and HA expose only: presence, motion, person_count. Identity fields are rejected
at both boundaries.
## Acceptance Criteria
- [ ] **AC1**: Parser handles 802.11ac VHT and 802.11ax HE CBFR frames at 20/40/80/160 MHz,
2×2 through 4×4 MIMO.
- [ ] **AC2**: Presence detection latency ≤ 1s p95 from first non-empty BFI frame in
a new occupancy event.
- [ ] **AC3**: Motion score published at ≥ 1 Hz on `ruview/<node_id>/bfld/motion/state`
during sustained occupancy.
- [ ] **AC4**: Raw BFI bytes (Phi/Psi angle matrices) are never present in any
serialized output at any `privacy_class` value.
- [ ] **AC5**: Privacy mode suppresses all identity-derived fields (`identity_risk_score`,
`rf_signature_hash`, `identity_embedding`) from all outbound events.
- [ ] **AC6**: Identical `BfiCapture` input → bit-identical `BfldFrame` output
(deterministic, cross-platform).
- [ ] **AC7**: Pipeline produces valid `BfldEvent` with `csi_matrix = None` (BFI-only
mode), without panic or accuracy degradation beyond the documented bounds.
## References
- BFId paper: https://dl.acm.org/doi/10.1145/3719027.3765062
- KIT BFId dataset: https://ps.tm.kit.edu/english/bfid-dataset/index.php
- LeakyBeam (NDSS 2025): https://www.ndss-symposium.org/ndss-paper/lend-me-your-beam-privacy-implications-of-plaintext-beamforming-feedback-in-wifi/
- Wi-BFI tool: https://arxiv.org/abs/2309.04408
- Protecting activity signatures in CSI feedback: https://arxiv.org/pdf/2512.18529
- Research bundle: `docs/research/BFLD/` (this repo)
- Draft ADR: `docs/research/BFLD/08-adr-draft.md` → ADR-118
## Out of Scope
- Preventing passive BFI capture by external attackers (hardware-level problem, not
software).
- Differential privacy noise injection (noted as future extension in ADR-118).
- Federated identity learning (local-only is sufficient for the current use case).
- BFI capture directly from ESP32-S3 firmware (Espressif API does not expose CBFR;
host-side Pi 5 / Nexmon capture is the implementation path).
- WiFi 7 / 802.11be multi-link BFI (frame format versioning accommodates it; not
in scope for v1 implementation).
## Related Issues / PRs
- ADR-028 witness bundle (ref: this repo's `docs/WITNESS-LOG-028.md`)
- ADR-115 HA integration (21 entities — BFLD adds 6 more)
- ADR-116 Matter seed packaging (`cog-ha-matter` crate needs Matter boundary update)
- ADR-117 pip modernization (PyO3 pattern reused for BFLD Python bindings)
- rvCSI platform (ADR-095/096) — Nexmon adapter shared with BFLD BFI capture path


@ -0,0 +1,136 @@
# BFLD: The Privacy Layer Your WiFi Sensing Stack Has Been Missing
Your WiFi router is broadcasting your identity in plaintext. Here is the layer that
catches it.
---
## The Problem
Every time your phone or laptop connects to a WiFi 5 or WiFi 6 router, it periodically
transmits a Beamforming Feedback Report (CBFR frame). This frame contains the compressed
channel matrix the router needs to aim its antennas at your device. The compression uses
Givens rotations — a pair of angles (Phi and Psi) per active subcarrier — that encode
the spatial geometry of the wireless channel around your body.
Here is the catch: these frames are transmitted before WPA2/WPA3 encryption is applied.
They are plaintext management frames, passively readable by any WiFi adapter in monitor
mode within roughly 20 meters.
Two papers published in 2024–2025 confirm the threat is real:
- **BFId** (KIT, ACM CCS 2025): re-identifies 197 people from beamforming feedback alone,
with >90% accuracy from just 5 seconds of capture. Tools needed: a WiFi adapter, a pip
install, and no access to the target network.
(https://dl.acm.org/doi/10.1145/3719027.3765062)
- **LeakyBeam** (Zhejiang U. / NTU / KAIST, NDSS 2025): detects occupancy through walls
at 20 m range using beamforming feedback, with an 82.7% true-positive rate.
(https://www.ndss-symposium.org/ndss-paper/lend-me-your-beam-privacy-implications-of-plaintext-beamforming-feedback-in-wifi/)
WiFi sensing systems — including this project — process these same signals to detect
presence, count people, and track motion. Without a privacy layer, there is no way to
know whether the sensing output is derived from anonymizable motion data or from
identity-discriminative data.
---
## What BFLD Does
BFLD (Beamforming Feedback Layer for Detection) is a new Rust crate in the
wifi-densepose workspace that adds one thing: an explicit, continuous measurement of
whether the beamforming data currently being processed is capable of identifying
individuals.
It outputs a small, structured event on every sensing cycle:
```json
{
"timestamp_ns": 1748092800000000000,
"presence": true,
"motion": 0.42,
"person_count": 1,
"identity_risk_score": 0.71,
"rf_signature_hash": "a3f2c1...e9b4",
"zone_id": "living_room",
"confidence": 0.88,
"privacy_class": 1
}
```
High `identity_risk_score` (approaching 1.0) means the current sensing environment is
producing data from which an attacker could re-identify individuals. Low score means
the data is effectively anonymous.
The score is computed from four components: how separable the current RF embedding is
from a population distribution, how stable that separability is over time, how
consistent it is across multiple sensor viewpoints, and how confident the current sample
is. Multiply them together, clamp to [0, 1].
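
A minimal sketch of that multiply-and-clamp step is shown here. The struct and function names are hypothetical, and treating a non-finite product as maximum risk is an assumption of this sketch, not something the design specifies.

```rust
/// Hypothetical container for the four components described above.
#[derive(Debug, Clone, Copy)]
pub struct RiskComponents {
    pub separability: f32,
    pub temporal_stability: f32,
    pub cross_perspective_consistency: f32,
    pub sample_confidence: f32,
}

/// Multiply the four components and clamp the result to [0, 1].
/// A NaN or infinite product is reported as 1.0 (fail closed); that
/// choice is an assumption made for this sketch.
pub fn identity_risk_score(c: RiskComponents) -> f32 {
    let product = c.separability
        * c.temporal_stability
        * c.cross_perspective_consistency
        * c.sample_confidence;
    if product.is_finite() {
        product.clamp(0.0, 1.0)
    } else {
        1.0
    }
}
```

Because the components are multiplied, any single component near zero pulls the whole score toward zero. For example, components of 0.9, 0.8, 1.0, and 0.5 give a score of about 0.36.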
---
## Three Invariants That Cannot Be Turned Off
BFLD enforces three properties structurally — not as settings, not as policies:
**1. Raw BFI never leaves the node.** The Phi/Psi angle matrices are consumed locally
and dropped after feature extraction. They are not in the wire format. They are not in
the MQTT payload. There is no code path to serialize them outbound.
**2. Identity embeddings are RAM-only.** The vector embedding used to compute the risk
score lives in a fixed-size ring buffer (default: 10 minutes). It is never written to
disk. When the node restarts, the buffer is gone.
**3. Cross-site re-identification is cryptographically impossible.** The
`rf_signature_hash` is computed with a per-site secret key (generated at first boot,
stored in local NVS, never transmitted) and a per-day epoch. Two nodes at two
different sites, even receiving signals from the same person on the same day, produce
hash values in completely disjoint hash spaces. No amount of hash-list comparison can
reveal a cross-site visit.
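
To make the second invariant concrete, here is a minimal sketch of a RAM-only embedding window. The `EmbeddingRing` type is hypothetical; it assumes the buffer is bounded by age (10 minutes by default) and that timestamps arrive in non-decreasing order. The point of the sketch is what is missing: there is no serialization or file I/O anywhere on the type.

```rust
use std::collections::VecDeque;

/// RAM-only window of recent embeddings (hypothetical type).
/// It deliberately has no serialization or persistence code.
pub struct EmbeddingRing {
    retention_ns: u64,
    entries: VecDeque<(u64, Vec<f32>)>, // (timestamp_ns, embedding)
}

impl EmbeddingRing {
    pub fn new(retention_ns: u64) -> Self {
        Self { retention_ns, entries: VecDeque::new() }
    }

    /// Default retention of 10 minutes, matching the window described above.
    pub fn with_default_retention() -> Self {
        Self::new(10 * 60 * 1_000_000_000)
    }

    /// Append an embedding, then evict everything older than the retention
    /// window. Assumes timestamps are non-decreasing.
    pub fn push(&mut self, timestamp_ns: u64, embedding: Vec<f32>) {
        self.entries.push_back((timestamp_ns, embedding));
        let cutoff = timestamp_ns.saturating_sub(self.retention_ns);
        while self.entries.front().map_or(false, |(ts, _)| *ts < cutoff) {
            self.entries.pop_front();
        }
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    pub fn is_empty(&self) -> bool {
        self.entries.is_empty()
    }
}
```

Dropping the struct, or restarting the process, discards every embedding, since nothing in the type ever touches storage.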
---
## What Reaches Home Assistant and Matter
BFLD publishes to MQTT and HA. The following entities reach HA:
- `binary_sensor.bfld_presence`
- `sensor.bfld_motion`
- `sensor.bfld_person_count`
- `sensor.bfld_confidence`
The Matter bridge exposes only OccupancySensing (presence), motion, and person_count.
Identity risk score, rf_signature_hash, and all raw fields are rejected at both the HA
and Matter boundaries.
---
## Seven Acceptance Criteria
The implementation is done when these seven tests pass:
1. Parse 802.11ac and 802.11ax BFI at 20–160 MHz bandwidth, 2×2 to 4×4 MIMO.
2. Presence latency ≤ 1 second p95.
3. Motion published at ≥ 1 Hz.
4. Raw BFI bytes absent from all output (verified by fuzz test).
5. Privacy mode suppresses all identity fields.
6. Identical input → identical output hash (cross-platform determinism).
7. Pipeline runs without CSI input (BFI-only mode).
---
## BFLD Is an Immune System, Not a Surveillance Lens
The framing matters. BFLD does not produce identity — it measures identity risk and
uses that measurement to gate what leaves the node. An immune system does not broadcast
the identity of pathogens it encounters; it classifies, responds locally, and keeps
detailed records inside the organism.
WiFi 7 / 802.11be is deploying now. Multi-link operation will increase beamforming
sounding frequency 3–5x. The passive attack surface will grow. The time to establish
safe defaults in WiFi sensing stacks is before that installed base is in place.
BFLD is that default.
Full research bundle: `docs/research/BFLD/` in the wifi-densepose repository.
Draft ADR: `docs/research/BFLD/08-adr-draft.md` (ADR-118).


@ -0,0 +1,58 @@
# BFLD Research Bundle — Beamforming Feedback Layer for Detection
BFLD is the safety layer that detects when RF data becomes identifying. It sits between
raw 802.11 beamforming feedback (BFI) and every downstream consumer — home automation,
MQTT, Matter, cloud — measuring the identity-leakage potential of each frame and gating
what leaves the node. It does not produce identity; it guards against accidental or
adversarial exposure of identity.
---
## Table of Contents
| File | Purpose |
|------|---------|
| [01-sota-survey.md](01-sota-survey.md) | State-of-the-art literature: BFI vs CSI, attack tooling, identity-inference research, privacy-preserving techniques |
| [02-soul.md](02-soul.md) | Architectural intent, ethical stance, three non-negotiable invariants |
| [03-security-threat-model.md](03-security-threat-model.md) | Adversary classes, attack trees, mitigations, trust-boundary diagram, per-privacy-class analysis |
| [04-privacy-gating.md](04-privacy-gating.md) | privacy_class byte semantics, hash rotation algorithm, embedding lifecycle, wire-format diffs |
| [05-automation-integration.md](05-automation-integration.md) | Home Assistant entities, Matter clusters, MQTT ACLs, cognitum federation |
| [06-implementation-plan.md](06-implementation-plan.md) | New crate layout, reuse map, ESP32 additions, test plan, phased rollout |
| [07-benchmarks-and-evaluation.md](07-benchmarks-and-evaluation.md) | Datasets, metrics, red-team protocol, comparison baselines |
| [08-adr-draft.md](08-adr-draft.md) | Draft ADR-118 for formal project adoption |
| [09-github-issue.md](09-github-issue.md) | GitHub issue draft for tracking implementation |
| [10-gist.md](10-gist.md) | Public-facing one-pager / blog summary |
---
## Executive Summary
1. **Problem.** IEEE 802.11ac/ax beamforming feedback (BFI) — the compressed angle matrices
(Phi/Psi, Givens rotation) exchanged between client and AP — is transmitted unencrypted
on the management plane. Academic work (BFId at ACM CCS 2025, LeakyBeam at NDSS 2025)
demonstrates that a passive sniffer with commodity hardware can re-identify individuals
and infer occupancy through walls using only these frames. Existing CSI-based sensing
pipelines have no explicit layer to detect when their output crosses from "motion event"
into "identity record."
2. **Approach.** BFLD is a new crate (`wifi-densepose-bfld`) that wraps the BFI extraction
and normalization path in an identity-leakage estimator. Every output frame carries a
computed `identity_risk_score` and a `privacy_class` byte; downstream consumers decide
whether to act based on those tags rather than on raw measurements.
3. **Novel contribution.** BFLD does not try to suppress identity inference — it tries to
*measure* it continuously and make the measurement explicit in every event. This
transforms a latent, silent risk into an observable, auditable signal. The combination
of per-day per-site hash rotation and a local-only identity embedding creates structural
impossibility of cross-site re-identification — not merely a policy promise.
4. **Security posture.** Raw BFI never leaves the node. Identity embeddings live only in
an in-RAM ring buffer. The rf_signature_hash rotates daily using a per-site BLAKE3
keyed-hash that is never transmitted. Matter and HA expose only presence, motion, and
person_count — never risk scores or embeddings.
5. **Integration plan.** Six phases: P1 frame format + extractor stub, P2 feature
extraction + identity_risk, P3 privacy gate + MQTT, P4 HA integration, P5 Matter
exposure, P6 cognitum federation. Each phase maps to a numbered acceptance criterion.
The crate slots into the existing workspace between `wifi-densepose-signal` and
`wifi-densepose-sensing-server`.