docs(adr-117): seed branch — ADR-117 pip-modernization spec + soul-signature research bundle

Two artifacts landing together on this new branch as the prerequisite
documentation for the v2.0.0 Python wheel modernization work:

1. **docs/adr/ADR-117-pip-wifi-densepose-modernization.md** (644 lines)
   — Plan to bring the 2025-published `wifi-densepose` PyPI package
   (last release v1.1.0, 2025-06-07, 11.5 months out of sync) up to
   the current Rust v2/ workspace SOTA. Recommends PyO3 + maturin
   with abi3-py310 (one binary covers Python 3.10–3.13 per OS/arch),
   first-wheel scope = core + vitals + signal crates (~5 MB), v1.99.0
   tombstone + 90-day un-yank window for v1.1.0, v2.0.0 hard break.
   Open questions catalogued; phases P1–P6+ laid out with concrete
   acceptance criteria.

2. **docs/research/soul/** (5 files, ~1,450 lines) — Soul Signature
   research spec: 7-channel electromagnetic biometric fingerprint
   (AETHER 128-dim + cardiac HR/HRV + cardiac waveform morphology +
   respiratory pattern + gait timing + skeletal proportions +
   subcarrier reflection profile), fused into one RVF graph file.
   Includes 60s scanning protocol, 5-layer security model,
   threat-model + mitigations, references to existing ADRs (014,
   021, 024, 027, 030, 039, 079, 106, 108, 109, 110, 115). Marked
   "Research Specification (Pre-Implementation)". Explicit "what
   this is NOT" disclaimers preempt pseudoscience drift; every
   discriminative-power claim either cites a measurement or is
   marked "open research; baseline TBD".

Branch off main at HEAD; ready for /loop 10m implementation
iterations.

Co-Authored-By: claude-flow <ruv@ruv.net>
ruv 2026-05-24 10:36:06 -04:00
parent be4efecbcd
commit fd8b9c30e7
6 changed files with 2096 additions and 0 deletions


@@ -0,0 +1,644 @@
# ADR-117: pip `wifi-densepose` modernization via PyO3 + maturin bindings
| Field | Value |
|-------|-------|
| **Status** | Proposed |
| **Date** | 2026-05-24 |
| **Deciders** | ruv |
| **Codename** | **PIP-PHOENIX** — rising from a pure-Python server to Rust-core Python bindings |
| **Relates to** | [ADR-021](ADR-021-esp32-vitals.md) (ESP32 vitals), [ADR-028](ADR-028-esp32-capability-audit.md) (capability audit / witness), [ADR-115](ADR-115-home-assistant-integration.md) (HA-DISCO + HA-MIND MQTT semantics), [ADR-116](ADR-116-cog-ha-matter-seed.md) (HA-COG Seed packaging) |
| **Tracking issue** | TBD — file under RuView issue tracker |
---
## 1. Context
### 1.1 What the pip package is today
`wifi-densepose` v1.1.0 was published to PyPI on **2025-06-07** (two releases the same
day: 1.0.0 at 13:24 UTC, 1.1.0 at 17:02 UTC). Both wheels carry the tag
`py3-none-any` — no compiled extension, no platform-specific code. The package is a
**pure-Python server application** sourced entirely from `archive/v1/`.
The package installs a 40-dependency stack including FastAPI, PyTorch, SQLAlchemy,
Redis, Celery, OpenCV, asyncpg, psycopg2, and Scapy (`archive/v1/setup.py:46–87`).
The declared entry points are:
```
wifi-densepose = src.cli:cli
wdp = src.cli:cli
```
(`archive/v1/setup.py:178–179`)
The public API surface is centred on a FastAPI HTTP server, a SQLAlchemy/postgres
database layer, and a Redis/Celery task queue — none of which map to the current Rust
architecture. The `__init__.py` exports `app` (FastAPI), `CSIProcessor`,
`PhaseSanitizer`, `PoseEstimator`, `RouterInterface`, `ServiceOrchestrator`,
`HealthCheckService`, and `MetricsService` (`archive/v1/src/__init__.py:54–68`).
### 1.2 Why this matters now
ADR-115 (PR #778, merged 2026-05-23) shipped 21 Home Assistant entities, 10 semantic
primitives, mTLS, privacy mode, and a full witness bundle from the Rust crate
`wifi-densepose-sensing-server`. ADR-116 is packaging this as a Cognitum Seed cog.
Neither surface is reachable from `pip install wifi-densepose` — the pip package cannot
import a CsiFrame, decode an edge-vitals packet, call a DSP stage, verify a witness
bundle, or subscribe to the sensing server's MQTT or WebSocket endpoints. The ecosystem
split is now wide enough that the pip package actively misleads new users about what
the project does.
Three concrete customer pain points:
1. A Python user who `pip install wifi-densepose` expecting to consume live pose/vitals
data gets a FastAPI server that requires postgres + redis, not a library they can
script against.
2. Integrators writing HA automations or Node-RED flows in Python have no idiomatic
Python API for the v0.7 telemetry surface (ADR-115 entities, semantic primitives).
3. The ADR-028 witness chain (deterministic pipeline proof) is Python-based and
exercised via `archive/v1/data/proof/verify.py`, but it imports from the v1 stack —
it cannot witness the Rust pipeline that is now the production implementation.
### 1.3 What this ADR is *not*
- Not a removal of `archive/v1/` from the repository. The v1 codebase stays as a
research archive and its proof bundle stays in `archive/v1/data/proof/`.
- Not a port of the Rust crates to Python. The Rust workspace (`v2/`) is authoritative
and unmodified by this ADR.
- Not a replacement of the `wifi-densepose-sensing-server` Rust binary. The pip
package wraps the binary or acts as a client to it; it does not reimplement it.
- Not an overlap with ADR-116 (Seed cog packaging). ADR-116 ships a Seed-installable
artifact; ADR-117 ships a Python developer library for scripting, automation, and
prototyping against the Rust stack.
---
## 2. Current state — evidence
| Artifact | Value | Source |
|---|---|---|
| Latest PyPI version | **1.1.0** | `pypi.org/pypi/wifi-densepose/json` |
| First release date | 2025-06-07T13:24:53Z | PyPI JSON metadata |
| Latest release date | 2025-06-07T17:02:40Z | PyPI JSON metadata |
| Months since last release | **~11.5 months** | as of 2026-05-24 |
| Wheel tag | `py3-none-any` | PyPI simple index |
| Hard dependencies | 40 (torch, fastapi, sqlalchemy, redis, celery, …) | `setup.py:46–87` |
| Entry point | `src.cli:cli` | `setup.py:178` |
| Python requires | `>=3.9` | `setup.py:108` |
| Classifiers Python versions | 3.9, 3.10, 3.11, 3.12 | PyPI JSON classifiers |
| Classifiers status | Beta (4) | PyPI JSON classifiers |
| Current Rust workspace version | **0.3.0** | `v2/Cargo.toml:version` |
| Rust crates in workspace | 20+ | `v2/Cargo.toml` members |
| ADR-115 shipped | 2026-05-23 | PR #778 |
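
The "~11.5 months" row can be reproduced from the two dates the table cites. A minimal sketch, assuming a mean Gregorian month length as the conversion factor (that constant is an assumption of this sketch, not something the table states):

```python
from datetime import date

# Dates taken from the evidence table above.
last_release = date(2025, 6, 7)
as_of = date(2026, 5, 24)

# Assumption: mean Gregorian month length (365.2425 / 12 days).
DAYS_PER_MONTH = 365.2425 / 12

gap_days = (as_of - last_release).days
gap_months = gap_days / DAYS_PER_MONTH

print(f"{gap_days} days = {gap_months:.1f} months")
```
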
The v1 source package (`archive/v1/setup.py:112–215`) was clearly designed as an
all-in-one server application, not a reusable library. The `find_packages` call at
line 134 searches from `"."` (the archive root), meaning the wheel ships `src.*` as the
importable namespace. The proof bundle (`archive/v1/data/proof/verify.py:56–57`) imports
`src.hardware.csi_extractor.CSIData` and `src.core.csi_processor.CSIProcessor` — v1 pure
Python only.
**PyPI org presence check:** a search for other `ruvnet`-published PyPI packages
(`ruvector`, `claude-flow`) returned no matches in the PyPI simple index as of this
writing. The `wifi-densepose` package is currently the only Python entry point for this
project's ecosystem.
---
## 3. Gap analysis
| Capability | Rust crate(s) | pip v1.1.0 status | Gap severity |
|---|---|---|---|
| `CsiFrame` / `CsiMetadata` core types | `wifi-densepose-core` (`types.rs`) | Not present — v1 uses `CSIData` Python class | **Critical** |
| HR/BR extraction from CSI buffer | `wifi-densepose-vitals` (4-stage pipeline: preprocessor → breathing → heartrate → anomaly) | Stub Python (`src/hardware/csi_extractor.py`) with no DSP | **Critical** |
| Phase sanitization / noise removal | `wifi-densepose-signal` (`phase_sanitizer`, `csi_processor`, `hampel`) | Python stubs in `src/core/phase_sanitizer.py` | **Critical** |
| Motion detection + presence scoring | `wifi-densepose-signal` (`motion.rs`, `MotionDetector`) | Not present | **Critical** |
| RuvSense multistatic sensing (13 modules) | `wifi-densepose-signal/src/ruvsense/` | Not present — ADR-029 post-dates v1 | **Critical** |
| 17-keypoint pose estimation | `wifi-densepose-nn`, `wifi-densepose-mat` | Stub `PoseEstimator` wrapping a `torch.nn.Module` that requires model weights | **High** |
| MQTT publisher (21 HA entities) | `wifi-densepose-sensing-server/src/mqtt/` | Not present — ADR-115 post-dates v1 | **High** |
| Semantic primitives (10 types) | `wifi-densepose-sensing-server/src/semantic/` | Not present | **High** |
| Matter bridge | `wifi-densepose-sensing-server/src/matter/` | Not present | **High** |
| WS/REST client for sensing-server | `wifi-densepose-sensing-server` (Axum) | v1 has a separate FastAPI server; no client | **High** |
| Witness bundle verification | ADR-028 / `scripts/generate-witness-bundle.sh` | `archive/v1/data/proof/verify.py` — proves v1 pipeline only | **High** |
| ESP32-C6 firmware telemetry (ADR-110) | `wifi-densepose-hardware` + `wifi-densepose-sensing-server` | Not present | **Medium** |
| Cross-viewpoint fusion (RuVector) | `wifi-densepose-ruvector/src/viewpoint/` | Not present | **Medium** |
| Semantic-primitive MQTT payload | `wifi-densepose-sensing-server/src/semantic/bus.rs` | Not present | **Medium** |
| PostgreSQL + Redis server mode | `archive/v1/` | Present (v1 only) | Low (not SOTA) |
| FastAPI HTTP REST server | `archive/v1/src/app.py` | Present (v1 only) | Low (not SOTA) |
---
## 4. Decision
Adopt **PyO3 + maturin Python extension bindings** as the primary modernization path,
shipping the pip package as a platform-native wheel (`manylinux`, `macosx`, `win-amd64`)
with compiled Rust extension modules, plus a pure-Python WS/MQTT client layer that talks
to a running `wifi-densepose-sensing-server` instance.
This path is called **PIP-PHOENIX**.
### 4.1 Why PyO3 + maturin over the three rejected alternatives
| Criterion | **PyO3 + maturin** (chosen) | Subprocess wrapper | REST/WS client only | Pure Python reimpl |
|---|---|---|---|---|
| Performance for DSP | Native Rust speed, zero copy | IPC overhead per call | N/A — no local DSP | Python bottleneck |
| Binary size in wheel | Core + vitals + signal only: ~2 MB stripped | Full sensing-server binary: ~15–30 MB | Minimal (~50 kB) | Minimal (~100 kB) |
| Works offline / no server | Yes | Yes (binary bundled) | No — server required | Partial |
| Proof bundle can cover Rust pipeline | Yes — bindings call the same Rust code the server uses | Partial — server is a black box | No | No |
| Install experience | `pip install wifi-densepose` — wheel has no system deps | `pip install` downloads 25 MB binary | `pip install` — pure Python | `pip install` — pure Python |
| Maintenance surface | Python bindings + Rust workspace | Python thin shim | Python client | Python reimpl must track Rust |
| Async / tokio support | PyO3 0.28 with `pyo3-async-runtimes` (the maintained successor to `pyo3-asyncio`) for async export; sync entry points for the DSP hot path | N/A | Native asyncio on client | N/A |
| GIL concern | DSP-heavy calls release GIL via `py.allow_threads`; tokio runtime per module | N/A | None | N/A |
| Fits existing architecture | Core + vitals + signal already have clean public APIs (`lib.rs` re-exports) | Requires sensing-server to be running | Requires sensing-server | Forks the domain model |
**Subprocess wrapper** is rejected because shipping a 25 MB pre-built server binary
inside every pip wheel is an unacceptably heavy install, and it makes offline scripting
impossible without starting the server.
**REST/WS client only** is rejected because it provides zero DSP utility offline and
cannot close the witness gap — the proof bundle must exercise the same pipeline code.
**Pure Python reimplementation** is the root cause of the current drift and is
explicitly rejected.
The chosen path starts small: **bind only the three crates with the highest Python
utility** (`wifi-densepose-core`, `wifi-densepose-vitals`, `wifi-densepose-signal`),
ship the pure-Python WS/MQTT client layer as a separate sub-module of the same wheel, and
grow from there.
---
## 5. Detailed design
### 5.1 Rust crates bound in v2.0 (first wheel)
Three crates are in scope for the initial binding. They were chosen because they have
no heavy system dependencies (no libtorch, no ONNX runtime), have stable `pub` re-export
surfaces in `lib.rs`, and directly address the three most-requested missing capabilities.
| Crate | Exported Python types / functions | Binding rationale |
|---|---|---|
| `wifi-densepose-core` | `CsiFrame`, `CsiMetadata`, `Keypoint`, `KeypointType`, `PersonPose`, `PoseEstimate`, `Confidence`, `BoundingBox` | Foundation types shared by all other crates; without these users can't even describe a frame |
| `wifi-densepose-vitals` | `CsiVitalPreprocessor`, `BreathingExtractor`, `HeartRateExtractor`, `VitalAnomalyDetector`, `VitalSignStore`, `VitalReading`, `VitalEstimate`, `AnomalyAlert` | The most-asked-for surface: HR/BR from a CSI buffer in 4 lines of Python |
| `wifi-densepose-signal` | `CsiProcessor`, `CsiProcessorConfig`, `PhaseSanitizer`, `MotionDetector`, `MotionScore`, `FeatureExtractor`, `HardwareNormalizer` | DSP pipeline that produces the features vitals and pose estimation consume |
Crates **deferred to P6+**: `wifi-densepose-nn` (requires libtorch or candle — wheel
size risk), `wifi-densepose-mat` (depends on nn), `wifi-densepose-ruvector` (RuVector
GNN types — high value but adds ruvector-gnn 2.0.5 link dependency),
`wifi-densepose-hardware` (ESP32 HAL — not Python-scripting friendly).
### 5.2 New workspace member: `python/`
A new crate `python/` is added as a workspace member at `v2/crates/wifi-densepose-py/`.
It is a `cdylib` that re-exports the three bound crates behind a single maturin module
named `wifi_densepose._core`.
```toml
# v2/crates/wifi-densepose-py/Cargo.toml (sketch)
[package]
name = "wifi-densepose-py"
version.workspace = true
edition.workspace = true
[lib]
name = "_core"
crate-type = ["cdylib"]
[dependencies]
pyo3 = { version = "0.28", features = ["extension-module", "abi3-py310"] }
wifi-densepose-core = { path = "../wifi-densepose-core", features = ["serde"] }
wifi-densepose-vitals = { path = "../wifi-densepose-vitals" }
wifi-densepose-signal = { path = "../wifi-densepose-signal" }
```
The `abi3-py310` feature locks the stable ABI to CPython 3.10+, so one wheel binary
works across 3.10, 3.11, 3.12, and 3.13 without recompilation.
PyO3 bindings pattern (example for `CsiFrame`):
```rust
// v2/crates/wifi-densepose-py/src/core_types.rs
use pyo3::prelude::*;
use wifi_densepose_core::CsiFrame as RustCsiFrame;

#[pyclass(name = "CsiFrame")]
#[derive(Clone)]
pub struct PyCsiFrame {
    inner: RustCsiFrame,
}

#[pymethods]
impl PyCsiFrame {
    #[new]
    fn new(amplitudes: Vec<f32>, phases: Vec<f32>, n_subcarriers: usize,
           sample_index: u64, sample_rate_hz: f32) -> Self {
        Self { inner: RustCsiFrame { amplitudes, phases, n_subcarriers,
                                     sample_index, sample_rate_hz } }
    }

    #[getter] fn amplitudes(&self) -> Vec<f32> { self.inner.amplitudes.clone() }
    #[getter] fn phases(&self) -> Vec<f32> { self.inner.phases.clone() }
    #[getter] fn n_subcarriers(&self) -> usize { self.inner.n_subcarriers }
}
```
DSP calls that execute >1 ms release the GIL:
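```rust
use pyo3::exceptions::PyRuntimeError;
use pyo3::prelude::*;

#[pymethods]
impl PyCsiProcessor {
    fn process<'py>(&mut self, py: Python<'py>, frame: &PyCsiFrame)
        -> PyResult<Option<PyProcessedSignal>>
    {
        // Release the GIL while the Rust DSP runs. PyO3 0.26 renamed
        // `Python::allow_threads` to `Python::detach`; prefer the new name on 0.28.
        py.allow_threads(|| self.inner.process(&frame.inner))
            .map(|opt| opt.map(PyProcessedSignal::from))
            .map_err(|e| PyRuntimeError::new_err(e.to_string()))
    }
}
```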
### 5.3 pip package layout
```
wifi-densepose/                  ← PyPI package name (unchanged)
  wifi_densepose/                ← importable namespace
    __init__.py                  ← re-exports core types + version
    _core.pyd / _core.so         ← compiled PyO3 extension (maturin build output)
    vitals.py                    ← thin Python wrapper + docstrings over _core vitals types
    signal.py                    ← thin Python wrapper over _core signal types
    client/
      __init__.py
      ws.py                      ← asyncio WebSocket client for sensing-server /ws/sensing
      mqtt.py                    ← paho-mqtt wrapper for ruview/<node_id>/raw/* topics
      ha.py                      ← helpers for HA-DISCO payloads (read-only, mirrors ADR-115 §3.2)
    witness/
      __init__.py
      verify.py                  ← Python-callable witness verifier (re-creates ADR-028 proof
                                   over the Rust pipeline via PyO3 bindings, not archive/v1/)
    compat/
      v1.py                      ← import shim that raises MigrationError (see §7)
    py.typed                     ← PEP 561 marker
```
The import path intentionally maps to Rust crate names:
```python
from wifi_densepose import CsiFrame # core types
from wifi_densepose.vitals import BreathingExtractor, HeartRateExtractor
from wifi_densepose.signal import CsiProcessor, MotionDetector
from wifi_densepose.client.ws import SensingClient
from wifi_densepose.witness import verify_bundle
```
### 5.4 PyPI distribution — wheel matrix
Published as `wifi-densepose==2.0.0` using **cibuildwheel** driven by GitHub Actions.
| Platform | Arch | CPython | Tag (stable ABI) |
|---|---|---|---|
| `manylinux_2_28` | x86_64 | 3.10+ | `cp310-abi3-manylinux_2_28_x86_64` |
| `manylinux_2_28` | aarch64 | 3.10+ | `cp310-abi3-manylinux_2_28_aarch64` |
| `macosx_11_0` | x86_64 | 3.10+ | `cp310-abi3-macosx_11_0_x86_64` |
| `macosx_11_0` | arm64 | 3.10+ | `cp310-abi3-macosx_11_0_arm64` |
| `win` | amd64 | 3.10+ | `cp310-abi3-win_amd64` |
| sdist | — | — | source fallback |
The `abi3-py310` flag means **one binary per OS/arch** covers all supported Python
versions — 5 wheels total plus an sdist, compared to the 20-wheel matrix that would be
needed without stable ABI.
```yaml
# .github/workflows/pip-release.yml (sketch)
- uses: pypa/cibuildwheel@v2
  with:
    package-dir: v2/crates/wifi-densepose-py
    output-dir: dist
  env:
    CIBW_BUILD: "cp310-*"
    CIBW_ARCHS_LINUX: "x86_64 aarch64"
    CIBW_ARCHS_MACOS: "x86_64 arm64"
    CIBW_ARCHS_WINDOWS: "AMD64"
    CIBW_BEFORE_BUILD: "pip install maturin"
    CIBW_BUILD_FRONTEND: "build[uv]"
```
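
The wheel-count arithmetic in this section (5 wheels with the stable ABI versus 20 without) can be checked by enumerating tags. A minimal sketch: the platform strings are copied from the matrix above, while the helper names `abi3_tags` and `per_version_tags` are hypothetical and exist only for this illustration.

```python
from itertools import product

# Platform tags and interpreter versions copied from the wheel matrix above.
PLATFORMS = [
    "manylinux_2_28_x86_64",
    "manylinux_2_28_aarch64",
    "macosx_11_0_x86_64",
    "macosx_11_0_arm64",
    "win_amd64",
]
PYTHONS = ["cp310", "cp311", "cp312", "cp313"]


def abi3_tags(platforms, floor="cp310"):
    """One stable-ABI wheel per platform, tagged with the lowest supported CPython."""
    return [f"{floor}-abi3-{plat}" for plat in platforms]


def per_version_tags(platforms, pythons):
    """Without abi3: one wheel per (interpreter, platform) pair."""
    return [f"{py}-{py}-{plat}" for py, plat in product(pythons, platforms)]


print(len(abi3_tags(PLATFORMS)))                  # stable ABI
print(len(per_version_tags(PLATFORMS, PYTHONS)))  # version-specific ABI
```

Each additional CPython release adds five wheels to the version-specific matrix and none to the abi3 one, which is the main reason the stable ABI keeps CI time flat.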
### 5.5 CLI parity
The pip wheel installs a `wifi-densepose` console script. In v2 this script is a thin
Python shim that:
1. Checks whether `wifi-densepose-sensing-server` binary is on `PATH` (installed
separately via a platform-specific binary distribution or `cargo install`).
2. If found: proxies `wifi-densepose serve`, `wifi-densepose stream`, etc. to the Rust
binary via `subprocess.run`.
3. If not found: falls back to the PyO3 module for offline DSP commands
(`wifi-densepose vitals --file recording.jsonl`).
This is explicitly **not** a reimplementation of the CLI — the Rust binary
(`wifi-densepose-cli/src/main.rs`, currently exposes `mat` and `version` subcommands)
is the authoritative CLI. The pip shim is a discovery/convenience layer.
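
The three-step discovery logic above could be sketched as follows. This is an illustration under assumptions, not the shipped shim: `choose_backend` and `main` are hypothetical names, and the offline branch only prints what it would do because the PyO3 module does not exist yet.

```python
import shutil
import subprocess
import sys

SERVER_BINARY = "wifi-densepose-sensing-server"  # binary name from the text above


def choose_backend(which=shutil.which):
    """Return ("binary", path) if the Rust server is on PATH, else ("pyo3", None)."""
    path = which(SERVER_BINARY)
    return ("binary", path) if path else ("pyo3", None)


def main(argv, which=shutil.which):
    """Dispatch one CLI invocation; returns the process exit code."""
    if not argv:
        print("usage: wifi-densepose <command> [args...]", file=sys.stderr)
        return 2
    backend, path = choose_backend(which)
    if backend == "binary":
        # Hand the whole command line to the Rust binary and mirror its exit code.
        return subprocess.run([path, *argv]).returncode
    # Hypothetical hook: offline DSP commands would call into wifi_densepose._core here.
    print(f"offline mode: would run {argv[0]!r} via the PyO3 module")
    return 0


# Demonstrate the fallback without touching the real PATH.
print(choose_backend(which=lambda name: None))
```

Passing `which` as a parameter keeps the PATH lookup injectable, so the fallback branch can be unit-tested on machines that do have the binary installed.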
### 5.6 WS/MQTT client layer
`wifi_densepose.client.ws.SensingClient` is a pure-Python asyncio client wrapping the
sensing-server WebSocket at `/ws/sensing`:
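```python
import asyncio

from wifi_densepose.client.ws import SensingClient

async def main() -> None:
    async with SensingClient("ws://localhost:8765/ws/sensing") as client:
        async for msg in client.stream():
            if msg.type == "edge_vitals":
                print(msg.breathing_rate_bpm, msg.heartrate_bpm)

asyncio.run(main())
```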
`wifi_densepose.client.mqtt.RuViewMqttClient` wraps paho-mqtt and subscribes to
`ruview/<node_id>/raw/+` as defined in ADR-115 §3.2.
Both clients are **pure Python** (no PyO3) and are optional dependencies (`pip install
wifi-densepose[client]`). They depend on `websockets>=12` and `paho-mqtt>=2` respectively.
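
Because the `+` wildcard in `ruview/<node_id>/raw/+` matches exactly one topic level, every message the MQTT client receives has a four-segment topic. A small sketch of how a handler might split it; `RawTopic`, `parse_raw_topic`, and the sample node and channel names are hypothetical, since this ADR does not list the channels ADR-115 defines.

```python
from typing import NamedTuple, Optional


class RawTopic(NamedTuple):
    node_id: str
    channel: str


def parse_raw_topic(topic: str) -> Optional[RawTopic]:
    """Split 'ruview/<node_id>/raw/<channel>' into its parts, or return None."""
    parts = topic.split("/")
    if len(parts) != 4 or parts[0] != "ruview" or parts[2] != "raw":
        return None
    if not parts[1] or not parts[3]:
        return None  # empty node id or channel level
    return RawTopic(node_id=parts[1], channel=parts[3])


# Hypothetical topics, for illustration only.
print(parse_raw_topic("ruview/node-01/raw/vitals"))
print(parse_raw_topic("ruview/node-01/state"))
```
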
### 5.7 Witness chain (re-rooted to the Rust pipeline)
`wifi_densepose.witness.verify_bundle(path)` replaces the v1 proof verification with a
new chain that exercises the Rust pipeline via PyO3:
```python
from wifi_densepose.witness import verify_bundle
result = verify_bundle("dist/witness-bundle-ADR028-*/")
assert result.verdict == "PASS", result.detail
```
Internally it:
1. Loads the 1,000-frame reference JSON from the bundle.
2. Feeds each frame through `PyCsiProcessor` (PyO3 binding of the Rust `CsiProcessor`).
3. Hashes the output using the same SHA-256 scheme as `archive/v1/data/proof/verify.py`.
4. Compares against the published hash in `expected_features.sha256`.
The v1 proof (`archive/v1/data/proof/verify.py`) is **preserved unchanged** — it
continues to prove the v1 pipeline. The new `witness/verify.py` proves the v2/Rust pipeline.
Both can coexist; the ADR-028 witness bundle ships with both.
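
The four internal steps can be sketched in plain Python. Everything here is a stand-in: `stand_in_processor` replaces the PyO3 `CsiProcessor`, and packing each feature as a little-endian `f32` before hashing is an assumption of this sketch, since the actual byte layout used by `archive/v1/data/proof/verify.py` is not reproduced in this ADR.

```python
import hashlib
import struct


def stand_in_processor(frame):
    """Hypothetical placeholder for the bound CsiProcessor: mean amplitude per frame."""
    amps = frame["amplitudes"]
    return [sum(amps) / len(amps)]


def hash_features(frames, process=stand_in_processor):
    """SHA-256 over the processor output, each value packed as little-endian f32."""
    digest = hashlib.sha256()
    for frame in frames:
        for value in process(frame):
            digest.update(struct.pack("<f", value))
    return digest.hexdigest()


def verify(frames, expected_hex, process=stand_in_processor):
    """Compare the recomputed hash with the published one."""
    actual = hash_features(frames, process)
    verdict = "PASS" if actual == expected_hex else "FAIL"
    return {"verdict": verdict, "detail": actual}


frames = [{"amplitudes": [1.0, 2.0, 3.0]}, {"amplitudes": [0.5, 0.5]}]
expected = hash_features(frames)  # stands in for expected_features.sha256
print(verify(frames, expected)["verdict"])
```

Fixing the byte encoding explicitly matters more than the choice of hash: any change in float width or endianness between the generator and the verifier would produce a spurious `FAIL`.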
---
## 6. Migration path (phased)
```
P1 ──► P2 ──► P3 ──► P4 ──► P5 ──► P6+
scaffold core vitals+ client publish deferred
types signal layer v2.0.0
```
### P1 — Scaffold (1 week)
- [ ] Add `v2/crates/wifi-densepose-py/` as workspace member.
- [ ] `Cargo.toml`: `crate-type = ["cdylib"]`, pyo3 0.28 + `abi3-py310`, no
workspace deps yet (empty module compiles and imports).
- [ ] `pyproject.toml` at repo root `python/` with `[build-system] requires =
["maturin>=1.8"]` and `[tool.maturin] features = ["pyo3/extension-module"]`.
- [ ] CI job: `maturin develop` on ubuntu-latest in a Python 3.12 venv; import
`wifi_densepose._core` succeeds.
- [ ] Publish `wifi-densepose==1.99.0` to PyPI with a migration notice in the
module body (see §7.2; no new features, just the tombstone release).
### P2 — Core type bindings (1 week)
- [ ] Bind `CsiFrame`, `CsiMetadata`, `Confidence`, `Keypoint`, `KeypointType`,
`BoundingBox`, `PoseEstimate`, `PersonPose` from `wifi-densepose-core`.
- [ ] All types: `__repr__`, `__eq__`, `__hash__` where meaningful; serde JSON
round-trip via `pyo3-serde` or manual `to_dict()` / `from_dict()`.
- [ ] Add `py.typed` + stub `.pyi` file generated by `pyo3-stub-gen`.
- [ ] Unit tests: `tests/test_core.py` — construct each type, round-trip JSON.
### P3 — Vitals + signal DSP bindings (2 weeks)
- [ ] Bind the full 4-stage vitals pipeline:
`CsiVitalPreprocessor`, `BreathingExtractor`, `HeartRateExtractor`,
`VitalAnomalyDetector`, `VitalSignStore`, `VitalReading`, `VitalEstimate`,
`AnomalyAlert`.
- [ ] Bind signal DSP entry points: `CsiProcessor`, `CsiProcessorConfig`,
`PhaseSanitizer`, `MotionDetector`, `HardwareNormalizer`.
- [ ] GIL release (`py.allow_threads`) on all calls >0.5 ms (measured in bench).
- [ ] Integration test: feed 1,000 frames from `archive/v1/data/proof/sample_csi_data.json`
through the PyO3 vitals pipeline; assert output is deterministic across runs.
- [ ] Re-implement `witness/verify.py` using P3 bindings; compare SHA-256 against the
v1 expected hash. **Note:** the hash will differ because the Rust and Python
processors are not identical — generate and publish a new `expected_features_v2.sha256`.
### P4 — WS/MQTT client layer (1 week)
- [ ] Implement `wifi_densepose.client.ws.SensingClient` (asyncio, `websockets>=12`).
- [ ] Implement `wifi_densepose.client.mqtt.RuViewMqttClient` (paho-mqtt 2.x).
- [ ] Add `wifi_densepose.client.ha` helpers that parse ADR-115 MQTT discovery payloads
into Python dataclasses.
- [ ] Integration test: spin up `sensing-server` in Docker with `--mock-frames`;
assert `SensingClient` receives `edge_vitals` messages.
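
For the `wifi_densepose.client.ha` item in this phase, parsing a discovery payload into a dataclass might look like the sketch below. The keys used (`unique_id`, `name`, `state_topic`, `device_class`, `unit_of_measurement`) are standard Home Assistant MQTT discovery fields; which of them ADR-115 actually emits is an assumption, and `DiscoveredEntity`, `parse_discovery`, and the sample payload are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DiscoveredEntity:
    unique_id: str
    name: str
    state_topic: str
    device_class: Optional[str] = None
    unit_of_measurement: Optional[str] = None


def parse_discovery(payload: str) -> DiscoveredEntity:
    """Parse one discovery config payload; raises KeyError if a required key is absent."""
    data = json.loads(payload)
    return DiscoveredEntity(
        unique_id=data["unique_id"],
        name=data["name"],
        state_topic=data["state_topic"],
        device_class=data.get("device_class"),
        unit_of_measurement=data.get("unit_of_measurement"),
    )


# Hypothetical payload, for illustration only.
sample = json.dumps({
    "unique_id": "node01_breathing_rate",
    "name": "Breathing rate",
    "state_topic": "ruview/node01/raw/breathing_rate",
    "unit_of_measurement": "bpm",
})
print(parse_discovery(sample))
```

A frozen dataclass keeps the helper read-only by construction, in line with the tentative answer to the `client.ha` open question.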
### P5 — First cibuildwheel publish as v2.0.0 (1 week)
- [ ] `.github/workflows/pip-release.yml` — cibuildwheel matrix (5 wheels + sdist).
- [ ] `requires-python = ">=3.10"` in `pyproject.toml` (stable ABI base).
- [ ] Populate `pyproject.toml` with minimal runtime `dependencies`: PyO3 is compiled into
the extension, so it is a build-time dependency only. Runtime extras: `[client]` adds `websockets>=12,paho-mqtt>=2`.
- [ ] `pip install wifi-densepose==2.0.0` and smoke-test on each CI platform.
- [ ] PyPI publish via Trusted Publisher (OIDC, no API token in secrets).
- [ ] Announce: `wifi-densepose==1.99.0` tombstone already on PyPI; `v2.0.0` replaces
it in search results.
### P6+ — Deferred
- [ ] `wifi-densepose-nn` bindings (libtorch / candle wheel size TBD; see Open
questions, §11 item 2).
- [ ] `wifi-densepose-ruvector` bindings (RuVector attention types).
- [ ] MQTT/Matter integration helpers (`wifi_densepose.client.matter`).
- [ ] Deprecation notice on `wifi-densepose==1.x` releases (PyPI yank; see §7.3).
- [ ] `wifi-densepose-sensing-server` binary distribution via pip extra
(`pip install wifi-densepose[server]` fetches pre-built binary for the platform).
- [ ] HACS Python integration built on top of the pip client layer (follow-on to
ADR-115 §6.A).
---
## 7. Compatibility and deprecation
### 7.1 Version bump strategy
`wifi-densepose==2.0.0` is a **hard major-version break**. The 1.x import namespace
`src.*` is incompatible with the 2.x namespace `wifi_densepose.*`. There is no shim
that can bridge them transparently.
### 7.2 Tombstone release: v1.99.0
Before publishing v2.0.0, publish `wifi-densepose==1.99.0` as a pure-Python sdist/wheel
whose sole content is:
```python
# wifi_densepose/__init__.py (v1.99.0)
raise ImportError(
    "wifi-densepose 1.x has been superseded by v2.0.0 which wraps "
    "the Rust-based stack. Run:\n\n"
    "    pip install wifi-densepose==2.0.0\n\n"
    "Migration guide: https://github.com/ruvnet/RuView/blob/main/docs/pip-migration.md\n"
    "Legacy v1 source: archive/v1/ in the repository"
)
```
This ensures any project pinned to `wifi-densepose>=1` that upgrades to 1.99.0 gets a
clear error rather than a silent broken import.
### 7.3 PyPI yank strategy
After v2.0.0 is stable (90-day observation window):
- Yank `wifi-densepose==1.0.0` — never had a separate stable release period; was
superseded 4 hours after publication.
- Leave `wifi-densepose==1.1.0` un-yanked but deprecated in the description.
- Keep `wifi-densepose==1.99.0` (published before v2.0.0, see §7.2) as the canonical 1.x landing page (raises `ImportError`).
Yanked versions remain installable through an exact pin such as
`pip install wifi-densepose==1.0.0` (pip warns but still installs, per PEP 592), so users
with reproducible builds pinned to exact versions are not broken silently.
### 7.4 Semver
| Version | Content |
|---|---|
| 1.0.0 – 1.1.0 | Legacy Python server (archive/v1/) |
| **1.99.0** | Tombstone: ImportError migration notice |
| **2.0.0** | PyO3 Rust bindings + WS/MQTT client |
| 2.x.y | Additive bindings + client improvements |
| 3.0.0 | If/when nn bindings added (libtorch wheel size may force a separate package) |
---
## 8. Alternatives considered and rejected
### Alt-A: Subprocess wrapper
Package the pre-built `wifi-densepose-sensing-server` Rust binary inside the pip wheel.
Python calls it via `subprocess`. **Rejected** because: the binary is 15–30 MB stripped;
the install footprint is prohibitive; offline DSP scripting still requires the server to
be running; the witness chain cannot exercise Rust code through a black-box binary.
### Alt-B: REST/WS client only
Ship a pure-Python package that is purely a client to a running `sensing-server`
instance. **Rejected** because: it provides zero offline utility; it cannot host the
witness chain over the Rust pipeline; it solves the "Python access to telemetry" problem
but not the "Python DSP / prototyping" problem that academic and embedded users need.
### Alt-C: Pure Python reimplementation
Rewrite the DSP pipeline in pure Python/NumPy to reach parity with the Rust
implementation. **Rejected explicitly** — this is the root cause of the current 11-month
drift and the pattern this ADR is designed to exit. Any Python reimplementation will
immediately begin drifting again as the Rust stack evolves.
---
## 9. Risks
| Risk | Likelihood | Severity | Mitigation |
|---|---|---|---|
| **Build matrix complexity** — 5 target triples × cibuildwheel setup; CI time; QEMU for aarch64 cross-compile | High | Medium | Use `abi3-py310` (5 wheels not 20); QEMU aarch64 emulation available in GitHub Actions; maturin handles auditwheel automatically |
| **Binary size** — future nn/ONNX bindings may push wheel past 50 MB | Medium | High | Keep nn bindings in a separate `wifi-densepose-nn` PyPI package; keep core+vitals+signal wheel lean (~2 MB stripped) |
| **GIL / async issues** — PyO3 wrapping tokio crates requires careful runtime management; `py.allow_threads` must be used around all blocking Rust calls | High | High | Restrict initial bindings to synchronous Rust APIs (vitals, signal, core are all sync); async sensing-server client stays in pure-Python `client/ws.py` |
| **Maintainer overhead** — two languages, two build systems, one PyPI package | Medium | Medium | maturin unifies the build; CI handles publishing; start with 3 bound crates only |
| **1.x user breakage** — users pinned to `wifi-densepose>=1,<2` will get the tombstone | Low | Medium | 1.99.0 tombstone gives a clear error; maintain 1.1.0 on PyPI un-yanked for 90 days post-v2 |
| **Windows Rust toolchain in CI** — linking PyO3 on Windows requires MSVC or mingw; extra CI complexity | Medium | Medium | GitHub Actions `windows-latest` has MSVC; maturin + cibuildwheel handle this natively |
| **Stable ABI limitations**: `abi3` precludes some advanced PyO3 features (e.g. `Buffer` protocol) | Low | Low | Core/vitals/signal types are scalar/Vec<f32>, so the buffer protocol is not needed in P2–P3 |
| **PyPI name ownership** — we own `wifi-densepose` on PyPI (confirmed via rUv author field) | Low | Low | Confirm with `pypi.org/user/ruvnet` before publishing |
---
## 10. Acceptance criteria
The following checks must all pass before ADR-117 is considered Accepted:
- [ ] `pip install wifi-densepose==2.0.0` succeeds on Python 3.10, 3.11, 3.12, 3.13
on linux/x86_64, macos/arm64, and windows/amd64 in a clean venv with no extra build tools.
- [ ] `python -c "import wifi_densepose; print(wifi_densepose.__version__)"` prints `2.0.0`.
- [ ] `python -c "from wifi_densepose import CsiFrame; f = CsiFrame([1.0]*56, [0.0]*56, 56, 0, 100.0); print(f)"` produces a non-error repr.
- [ ] The 4-stage vitals pipeline processes 1,000 frames in under 500 ms on a
reference machine (CPython 3.12, linux x86_64, no GPU).
- [ ] `wifi_densepose.witness.verify_bundle(path)` returns `verdict="PASS"` for a
freshly generated witness bundle from `scripts/generate-witness-bundle.sh`.
- [ ] `wifi_densepose.client.ws.SensingClient` receives at least one `edge_vitals`
message from a `sensing-server --mock-frames` instance within 5 seconds.
- [ ] `pip install wifi-densepose==1.99.0` raises `ImportError` with the migration URL.
- [ ] The compiled `_core` extension has no unresolved dynamic library dependencies
beyond libc/msvcrt (verified by `auditwheel show` on Linux, `delocate-listdeps` on macOS).
- [ ] Type stubs (`wifi_densepose/*.pyi`) are present; `mypy --strict` passes on the
example code in `examples/vitals_from_buffer.py`.
- [ ] Total wheel size for core+vitals+signal: `≤ 5 MB` per platform.
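
The throughput criterion (1,000 frames in under 500 ms) needs a repeatable timing harness. A minimal sketch, in which `stand_in_pipeline` is a hypothetical placeholder for the bound 4-stage vitals pipeline and the 56-subcarrier frame size is borrowed from the `CsiFrame` one-liner in the criteria:

```python
import time

BUDGET_MS = 500.0    # from the acceptance criterion above
N_FRAMES = 1_000
N_SUBCARRIERS = 56   # matches the CsiFrame example in the criteria


def stand_in_pipeline(amplitudes):
    """Hypothetical placeholder for the bound pipeline: mean and peak amplitude."""
    return sum(amplitudes) / len(amplitudes), max(amplitudes)


def time_frames_ms(pipeline, frames):
    """Wall-clock time, in milliseconds, to push every frame through `pipeline`."""
    start = time.perf_counter()
    for frame in frames:
        pipeline(frame)
    return (time.perf_counter() - start) * 1000.0


frames = [[float(i % 7)] * N_SUBCARRIERS for i in range(N_FRAMES)]
elapsed_ms = time_frames_ms(stand_in_pipeline, frames)
print(f"{N_FRAMES} frames in {elapsed_ms:.2f} ms (budget {BUDGET_MS:.0f} ms)")
```

In CI the comparison against `BUDGET_MS` would run only on the reference machine named in the criterion, since shared runners vary too much for a hard timing assertion.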
---
## 11. Open questions
1. **Stable ABI base version**: `abi3-py310` drops support for Python 3.9, which v1.1.0
declared. Is Python 3.9 EOL-enough (EOL 2025-10-05) to drop cleanly? *Tentative: yes,
drop 3.9. Use abi3-py310.*
2. **Package name for nn bindings**: if `wifi-densepose-nn` bindings require a 30 MB
libtorch wheel, should they live at `wifi-densepose-nn` (separate PyPI package) or
as an optional heavy extra of `wifi-densepose[nn]`? *Tentative: separate package to
avoid polluting the lean wheel.*
3. **Witness hash continuity**: the Rust pipeline will produce a different SHA-256 than
the v1 Python pipeline for the same input frames. The new `expected_features_v2.sha256`
must be generated and committed before v2.0.0 ships. Who generates it, and how is
the generation process itself witnessed? *Tentative: generate in CI, commit hash to
`archive/v1/data/proof/`, include in ADR-028 matrix.*
4. **`ruv-neural` crate**: `v2/crates/ruv-neural/` exists in the workspace. Is it a
candidate for early Python bindings (useful for training-loop scripting), or should
it wait for the nn/train tier? *Tentative: defer — it depends on training backends.*
5. **Tokio runtime**: `wifi-densepose-sensing-server` is tokio-based, but the three
crates bound in P2–P3 (`core`, `vitals`, `signal`) are synchronous. Are there any
hidden tokio dependencies that would force a runtime into the extension module?
*Tentative: inspect each crate's Cargo.toml for tokio deps before P1 scaffold.*
6. **`pyo3-stub-gen` vs manual stubs**: automated stub generation from PyO3 has rough
edges for generics and newtype patterns. Should we hand-write `.pyi` stubs for the
first release? *Tentative: use `pyo3-stub-gen` for scaffolding, hand-tune for public
API.*
7. **`wifi_densepose` vs `wifi-densepose` namespace**: the pip package name uses a dash
(`wifi-densepose`) but Python imports use underscores (`wifi_densepose`). The v1
package shipped under `src.*`, not `wifi_densepose.*`. Is there any tooling that
hardcodes the `src` namespace? *Tentative: the `src.*` namespace was specific to
`archive/v1/` and is cleanly dropped.*
8. **cibuildwheel version**: the current stable is cibuildwheel v2.x. Does the
project's existing GitHub Actions config need updates for maturin builds vs
the current `cargo build` / `build.py` patterns? *Tentative: yes, add a separate
`pip-release.yml` workflow; do not modify existing Rust CI.*
9. **RuVector bindings timeline**: the `wifi-densepose-ruvector` crate (`v2/crates/`)
depends on `ruvector-gnn = "2.0.5"`. Does ruvector-gnn ship as a pre-built static
lib or require linking at build time? This directly affects the P6+ wheel size.
*Tentative: investigate ruvector-gnn link strategy before committing to a timeline.*
10. **`wifi_densepose.client.ha` conflict with ADR-115/116**: the `ha.py` helper module
should not duplicate the ADR-115 MQTT discovery logic in Python. Should it be read-only
(parse HA discovery JSON → Python dataclasses) or also write (publish discovery JSON)?
*Tentative: read-only for v2.0. Write path deferred to the HACS integration follow-on
(ADR-115 §6.A).*
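For question 10, the read-only option could look roughly like the sketch below: it parses one Home Assistant MQTT discovery message into a frozen dataclass and never publishes anything. The names `DiscoveredEntity` and `parse_discovery`, and the example topic and payload, are hypothetical; the keys follow Home Assistant's generic MQTT discovery payload, and the actual ADR-115 topic layout may differ.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DiscoveredEntity:
    """Read-only view of one discovery config message (hypothetical type)."""
    component: str
    object_id: str
    name: Optional[str]
    unique_id: Optional[str]
    state_topic: Optional[str]


def parse_discovery(topic: str, payload: str) -> DiscoveredEntity:
    """Parse '<prefix>/<component>/[<node_id>/]<object_id>/config' and its JSON body."""
    parts = topic.split("/")
    if len(parts) not in (4, 5) or parts[-1] != "config":
        raise ValueError(f"not a discovery config topic: {topic!r}")
    body = json.loads(payload)
    return DiscoveredEntity(
        component=parts[1],
        object_id=parts[-2],
        name=body.get("name"),
        unique_id=body.get("unique_id"),
        state_topic=body.get("state_topic"),
    )


# Example values are made up for illustration.
entity = parse_discovery(
    "homeassistant/sensor/ruview_node1/heart_rate/config",
    '{"name": "Heart rate", "unique_id": "ruview_node1_hr", '
    '"state_topic": "ruview/node1/hr"}',
)
```

Because the dataclass is frozen and the module has no publish function, the write path stays out of the v2.0 surface by construction.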
---
## 12. References
- **PyPI package (current)**: https://pypi.org/project/wifi-densepose/ — v1.1.0, released 2025-06-07
- **PyPI JSON metadata**: https://pypi.org/pypi/wifi-densepose/json
- **Local source**: `archive/v1/setup.py`, `archive/v1/src/__init__.py`, `archive/v1/data/proof/verify.py`
- **Rust workspace**: `v2/Cargo.toml`, `v2/crates/wifi-densepose-core/src/lib.rs`,
`v2/crates/wifi-densepose-vitals/src/lib.rs`, `v2/crates/wifi-densepose-signal/src/lib.rs`,
`v2/crates/wifi-densepose-sensing-server/src/lib.rs`
- **PyO3 docs**: https://pyo3.rs/ — v0.28.3 stable, Rust ≥1.83 required
- **maturin docs**: https://maturin.rs/ — supports Python 3.8+ on Linux/macOS/Windows/FreeBSD
- **cibuildwheel docs**: https://cibuildwheel.pypa.io/
- **ADR-021**: ESP32 vitals — defines the HR/BR extraction pipeline this ADR exposes in Python
- **ADR-028**: ESP32 capability audit — defines the witness bundle format `witness/verify.py` must re-verify
- **ADR-115**: HA-DISCO + HA-MIND + HA-FABRIC — defines the MQTT topic structure the `client/mqtt.py` helper consumes
- **ADR-116**: HA-COG cog packaging — parallel effort; ADR-117 pip library is the developer-facing Python surface; ADR-116 is the Seed-installable artifact

@ -0,0 +1,116 @@
# Soul Signature — Research Specification
**Status:** Research Specification (Pre-Implementation)
**Date:** 2026-05-24
**Maintainer:** ruv
---
## What Is a Soul Signature
A Soul Signature is a fused multi-modal biometric identity vector derived entirely
from passive electromagnetic measurement of a person inside a room equipped with
WiFi-DensePose / RuView sensing nodes. No wearable, no camera, no explicit
scan-time consent moment is required for recognition once a person has enrolled.
The word "soul" is deliberate product framing for a scientifically defensible concept:
the same relationship a fingerprint bears to identity in forensic science, or FaceID
to phone authentication, but extended to a new sensing dimension — passive RF at
distance, through walls, at room scale. Seven orthogonal electromagnetic observables,
fused into a single content-addressed RVF graph file, constitute the signature.
The claim is not mystical. Every channel is grounded in published physics and prior
WiFi sensing literature. Every assertion about discriminative power either cites a
peer-reviewed result or is explicitly marked "open research; baseline TBD."
---
## What a Soul Signature Is NOT
- It is NOT a replacement for fingerprint scanners, iris scanners, or FaceID on
accuracy-per-attempt measures. Current RF biometrics are less mature than those
modalities. See `security.md` for the honest error-rate picture.
- It is NOT a single number, hash, or deterministic bit string. It is a
probabilistic match against a stored graph with a calibrated false-accept rate.
- It is NOT medically diagnostic. It detects biophysical proxies, not conditions.
"Gait asymmetry increased 18% over 14 days" is the output, never "Parkinson's."
- It is NOT equivalent to explicit-consent biometrics in regulated contexts. GDPR
and HIPAA modes are defined and mandatory for healthcare deployments.
- It is NOT currently deployable as a legal evidence instrument.
- It is NOT snake oil, energy healing, or anything outside measurable electrophysics.
---
## Document Map
| File | Contents |
|------|----------|
| `specification.md` | Typed RVF graph schema; all node types, edge types, serialization format; aggregator vs stored profile distinction |
| `scanning-process.md` | Structured 60-second enrollment protocol; hardware requirements; quality gates; fast-scan and continuous modes; re-scan cadence |
| `security.md` | Full threat model; five adversaries; mitigations; cryptographic primitive choices; GDPR/HIPAA mode; open research items |
| `references.md` | All cited ADRs, papers, datasets, standards |
---
## Conceptual Graph (ASCII)
The following depicts one example soul signature as a graph stored in a single
RVF container. Each box is an RVF node (a SEG_EMBED or SEG_META segment). Each
arrow is a typed edge stored in the graph manifest.
```
+-----------------------+
| AETHER_Embedding      |  128-dim f32, L2-normalized (ADR-024)
| contrastive CSI       |  HNSW-searchable via ruvector-core
| backbone embedding    |
+----------+------------+
           | derived_from
           v
+----------+------------+          +------------------------+
| FieldModel_Residual   +---fuses--+ Subcarrier_Reflection  |
| ADR-030 perturbation  |          | per-angle multipath    |
| eigenmode projection  |          | amplitude + phase      |
+----------+------------+          +------------------------+
           | correlates_with
           v
+----------+------------+          +------------------------+
| Cardiac_HR_Profile    +--links---+ Cardiac_Waveform_      |
| baseline_bpm, HRV_LF  |          | Morphology (wavelet    |
| HRV_HF, rhythm_class  |          | coefficients)          |
+----------+------------+          +------------------------+
           | temporally_colocated
           v
+----------+------------+
| Respiratory_Pattern   |
| baseline_bpm, depth,  |
| apnea_index, HRV_RSA  |
+----------+------------+
           | temporally_colocated
           v
+----------+------------+          +------------------------+
| Gait_Timing           +--links---+ Skeletal_Proportions   |
| cadence, stride_var,  |          | torso/limb ratios      |
| double_support_pct,   |          | from ADR-079 keypoints |
| asymmetry_index       |          +------------------------+
+----------+------------+
           | attested_by
           v
+----------+------------+
| WitnessChain          |  Ed25519 over (content_hash ||
| ADR-110 attestation   |  timestamp || device_id) per ADR-110
+-----------------------+
```
File naming convention: `signature-<sha256-of-rvf-content>.rvf`
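A minimal sketch of the naming convention, assuming the hash is taken over the exact bytes of the container as written to disk (the specification does not say here whether that is before or after encryption). The helper names `signature_filename` and `verify_filename` are hypothetical.

```python
import hashlib


def signature_filename(rvf_bytes: bytes) -> str:
    """Return 'signature-<sha256 hex>.rvf' for the given container bytes."""
    digest = hashlib.sha256(rvf_bytes).hexdigest()
    return f"signature-{digest}.rvf"


def verify_filename(name: str, rvf_bytes: bytes) -> bool:
    """Check that a file name still matches the content it claims to address."""
    return name == signature_filename(rvf_bytes)
```

Content addressing means any change to the container, including a re-publish from the aggregator, yields a new file name, so a stale or tampered file is detectable from its name alone.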
---
## Implementation Status
This is a **research specification**. None of the soul-signature-specific graph
container logic is implemented yet. The constituent ADRs (AETHER, MERIDIAN,
RuvSense field model, ADR-039 vitals, ADR-110 witness chain) provide the substrate.
The soul signature is the composition layer above them.
A future implementation ADR should reference this document and assign acceptance
tests derived from the quality gates defined in `scanning-process.md`.

@ -0,0 +1,138 @@
# Soul Signature — References
**Status:** Research Specification (Pre-Implementation)
**Date:** 2026-05-24
**Author:** ruv
---
## 1. Internal Architecture Decision Records
All ADRs are located at `docs/adr/ADR-XXX-*.md` in this repository.
| ADR | Title | Relevance to soul signature |
|---|---|---|
| ADR-003 | RVF Cognitive Containers for CSI Data | RVF container format used by soul signature |
| ADR-004 | HNSW Vector Search for Signal Fingerprinting | HNSW index for person_track embedding search |
| ADR-005 | SONA Self-Learning Pose Estimation | LoRA adaptation, EWC regularization, environment profiles |
| ADR-007 | Post-Quantum Cryptography Secure Sensing | PQC cryptographic context; foundation for ADR-108/109 |
| ADR-010 | Witness Chains Audit Trail Integrity | Witness chain design; Ed25519 over frame bundles |
| ADR-014 | SOTA Signal Processing Algorithms | RuvSense pipeline: conjugate multiplication, Hampel filter, spectrogram, BVP |
| ADR-021 | Vital Sign Detection via rvdna Pipeline | Cardiac HR / respiratory extraction; bandpass filters; ADR-039 vitals packet |
| ADR-023 | Trained DensePose Model with RuVector Pipeline | CsiToPoseTransformer backbone; MPJPE baseline 91.7 mm |
| ADR-024 | Project AETHER — Contrastive CSI Embedding Model | Primary soul signature identity channel; 128-dim L2-normalized embedding; HNSW person_track index (>80% mAP target at 5 subjects) |
| ADR-027 | Project MERIDIAN — Cross-Environment Domain Generalization | Environment-disentangled embeddings; HardwareNormalizer; multi-room portability |
| ADR-029 | RuvSense Multistatic Sensing Mode | Multi-node mesh; 20 Hz DensePose; <30 mm jitter; person separation |
| ADR-030 | RuvSense Persistent Field Model | Field normal modes; SVD eigenstructure; perturbation extraction; longitudinal drift; adversarial detection; cross-room continuity |
| ADR-039 | ESP32-S3 Edge Intelligence Pipeline | Vitals packet wire format (magic `0xC511_0002`); HR/BR on-device extraction |
| ADR-075 | MinCut Person Separation | ruvector-mincut for multi-person track assignment |
| ADR-079 | Camera Ground-Truth Training | Paired camera + CSI training; skeletal proportions accuracy |
| ADR-082 | Pose Tracker Confirmed Output Filter | Pose tracker output confidence filtering |
| ADR-100 | Cog Packaging Specification | Ed25519 firmware signing; supply chain integrity |
| ADR-105 | Federated CSI Training | Federated AETHER fine-tuning; secure aggregation |
| ADR-106 | DP-SGD and Primitive Isolation | Differential privacy at training; biometric primitive isolation; (ε, δ)-DP budget |
| ADR-107 | Cross-Installation Federation | Cross-installation secure aggregation; DH key exchange |
| ADR-108 | Kyber Post-Quantum Key Exchange | Kyber-768 (NIST FIPS 203); hybrid X25519 + Kyber during migration |
| ADR-109 | Dilithium PQC Signatures | Dilithium-3 (NIST FIPS 204); hybrid Ed25519 + Dilithium; cog signing |
| ADR-110 | ESP32-C6 Firmware Extension | Wi-Fi 6 HE-LTF CSI (242 subcarriers); 802.15.4 time-sync; TWT; Ed25519 witness chain per-frame |
| ADR-113 | Multistatic Placement Strategy | Node placement geometry; coverage analysis |
| ADR-115 | Home Assistant Integration (HA-DISCO + HA-MIND) | Privacy mode; MQTT auto-discovery; semantic primitives layer under which soul signature operates |
---
## 2. AETHER and Contrastive Embedding Foundations
- Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). **A Simple Framework for Contrastive Learning of Visual Representations** (SimCLR). *ICML 2020*. arXiv:2002.05709.
- Chen, T., Kornblith, S., Swersky, K., Norouzi, M., & Hinton, G. (2020). **Big Self-Supervised Models are Strong Semi-Supervised Learners** (SimCLR v2). *NeurIPS 2020*. arXiv:2006.10029.
- Bardes, A., Ponce, J., & LeCun, Y. (2022). **VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning**. *ICLR 2022*. arXiv:2105.04906.
- Grill, J.-B., et al. (2020). **Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning** (BYOL). *NeurIPS 2020*. arXiv:2006.07733.
- Wang, T. & Isola, P. (2020). **Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere**. *ICML 2020*. arXiv:2005.10242.
---
## 3. WiFi CSI Biometric Identification (Prior Art)
- **IdentiFi** (2025): Self-supervised WiFi-based identity recognition in multi-user smart environments. Contrastive pretraining in the signal domain produces identity-discriminative embeddings without spatial labels. *PMC:12115556*.
- **WhoFi** (2025): Transformer-based WiFi CSI encoding for person re-identification. 95.5% accuracy on NTU-Fi (18 subjects). Validates transformer backbones for CSI re-ID. arXiv:2507.12869.
- **Wi-PER81** (2025): Benchmark dataset of 162K wireless packets for WiFi-based person re-identification using Siamese networks. *Nature Scientific Data*, 2025. doi:10.1038/s41597-025-05804-0.
- **CAPC** (Context-Aware Predictive Coding, 2024): CPC + Barlow Twins for WiFi sensing. 24.7% accuracy improvement on unseen environments. arXiv:2410.01825.
- **SSL for WiFi HAR Survey** (2025): Comprehensive evaluation of SimCLR, VICReg, Barlow Twins, SimSiam on WiFi CSI. arXiv:2506.12052.
---
## 4. WiFi Sensing SOTA (Pose, Vitals, Gait)
- Geng, J., Huang, D., & De la Torre, F. (2022). **DensePose From WiFi**. *CMU*. arXiv:2301.00250.
- Adib, F., Kabelac, Z., Katabi, D., & Miller, R.C. (2014). **3D Tracking via Body Radio Reflections** (WiTrack). *NSDI 2014*.
- Zheng, Y., Zhang, Y., Qian, K., Zhang, G., Liu, Y., Wu, C., & Yang, Z. (2019). **Widar 3.0: Zero-Effort Cross-Domain Gesture Recognition with Wi-Fi**. *MobiSys 2019*.
- Zhao, M., Li, T., Abu Alsheikh, M., Tian, Y., Zhao, H., Torralba, A., & Katabi, D. (2018). **Through-Wall Human Pose Estimation Using Radio Signals**. *CVPR 2018*.
- Zhao, M., Adib, F., & Katabi, D. (2016). **Emotion Recognition Using Wireless Signals** (EQ-Radio). *MobiCom 2016*. [HRV from RF reflections; cardiac biometric baseline]
- **PerceptAlign** (Chen et al., 2026): Geometry-conditioned cross-layout WiFi pose estimation. >60% cross-domain error reduction. Dataset: 21 subjects, 5 scenes, 18 actions. arXiv:2601.12252.
- **Person-in-WiFi 3D** (Yan et al., 2024): Multi-person 3D pose from WiFi. 91.7 mm MPJPE (single-person). *CVPR 2024*.
- **DGSense** (Zhou et al., 2025): Domain-invariant features for WiFi/mmWave/acoustic sensing. arXiv:2502.08155.
- **X-Fi** (Chen & Yang, 2025): Modality-invariant foundation model for human sensing. 24.8% MPJPE improvement on MM-Fi. *ICLR 2025*. arXiv:2410.10167.
- **AM-FM** (2026): First WiFi foundation model, pretrained on 9.2M CSI samples, 20 device types, 439 days. arXiv:2602.11200.
- Ma, Y., Zhou, G., Wang, S., Zhao, H., & Jung, W. (2018). **SignFi: Sign Language Recognition Using WiFi**. *ACM IMWUT*. arXiv:1806.04583.
---
## 5. Training Datasets Referenced
- **MM-Fi** (2022): Multi-Modal Non-Intrusive 4D Human Dataset — WiFi CSI, mmWave, LiDAR, RGB-D. 40 subjects, 27 actions, 5 environments, 320K samples. 56-subcarrier CSI, 17 COCO keypoints. [github.com/ybhbingo/MMFi_dataset]
- **Wi-Pose** (2022): WiFi-based 3D pose estimation dataset. Used in ADR-015.
- **NTU-Fi** (2022): 56 activities, WiFi CSI, 75 Hz sampling. Used for WhoFi evaluation.
---
## 6. Differential Privacy
- Abadi, M., Chu, A., Goodfellow, I., McMahan, H.B., Mironov, I., Talwar, K., & Zhang, L. (2016). **Deep Learning with Differential Privacy**. *CCS 2016*. [Moments Accountant; DP-SGD formulation used in ADR-106]
- Mironov, I. (2017). **Rényi Differential Privacy**. *CSF 2017*. [Alternative DP accounting; referenced in ADR-106 as future enhancement]
- Shokri, R., Stronati, M., Song, C., & Shmatikov, V. (2017). **Membership Inference Attacks Against Machine Learning Models**. *IEEE S&P 2017*. [Motivation for DP-SGD in ADR-106]
---
## 7. Cryptographic Standards
- **RFC 8032** (2017): Edwards-Curve Digital Signature Algorithm (EdDSA). [Ed25519; used in ADR-110 witness chain]
- **RFC 8439** (2018): ChaCha20 and Poly1305 for IETF Protocols. [At-rest encryption primitive specified in security.md §5]
- **RFC 9106** (2021): Argon2 Memory-Hard Function. [KDF for soul signature at-rest key derivation]
- **NIST FIPS 203** (2024): Module-Lattice-Based Key-Encapsulation Mechanism Standard (ML-KEM / Kyber). [ADR-108; post-quantum key exchange]
- **NIST FIPS 204** (2024): Module-Lattice-Based Digital Signature Standard (ML-DSA / Dilithium). [ADR-109; post-quantum signatures]
- **NIST SP 800-132 Draft** (2024): Recommendation for Password-Based Key Derivation. [Argon2id parameter guidance]
---
## 8. Biometric Standards (for Standards Awareness)
The soul signature is not currently certified to any of these standards but the
specification is designed with awareness of the relevant frameworks.
- **ISO/IEC 19794-1:2011**: Biometric data interchange formats — Part 1: Framework.
[Top-level; soul signature's node/edge schema follows the typed-attribute-record
philosophy of this standard]
- **ISO/IEC 19794-2:2011**: Biometric data interchange formats — Part 2: Finger
minutiae data. [Structural analog for how the soul signature encodes per-channel
discriminative features]
- **ISO/IEC 19794-4:2011**: Biometric data interchange formats — Part 4: Finger image data.
[Image-container analog; soul signature extends the concept to vector-valued
multi-channel templates]
- **ISO/IEC 29794-1:2016**: Biometric sample quality — Part 1: Framework.
[Quality scoring framework; soul signature's per-node `confidence` field
is conceptually analogous to ISO 29794 quality scores]
- **ISO/IEC 30107-3:2023**: Biometric presentation attack detection — Part 3:
Testing and reporting. [Presentation attack (anti-spoofing) framework;
the adversarial.rs module is the soul signature's PAD implementation]
---
## 9. Reading List for RF Biometrics Newcomers
Ordered from most accessible to most technical.
1. Adib, F. (2017). **Using Radio Reflections to See the World**. MIT PhD thesis. [Most accessible introduction to using RF for human sensing; covers WiVi, WiTrack, EQ-Radio]
2. Ma, Y., et al. (2019). **WiFi Sensing with Channel State Information: A Survey**. *ACM Computing Surveys*. doi:10.1145/3310194. [Comprehensive survey of CSI-based sensing approaches through 2019]
3. Wang, X., et al. (2023). **A Survey on WiFi Sensing: From Signal to Action**. *IEEE Internet of Things Journal*. [Updated survey through 2023; covers contrastive learning approaches]
4. Chen, T., et al. (2020). **A Simple Framework for Contrastive Learning** (SimCLR). arXiv:2002.05709. [Best starting point for understanding the contrastive learning approach used in AETHER]
5. Geng, J., et al. (2022). **DensePose From WiFi**. arXiv:2301.00250. [Direct ancestor of this codebase; describes the cross-modal CSI → DensePose mapping]
6. Abadi, M., et al. (2016). **Deep Learning with Differential Privacy**. CCS 2016. [Essential reading before any deployment collecting biometric data at training time]

@ -0,0 +1,306 @@
# Soul Signature — Scanning Process
**Status:** Research Specification (Pre-Implementation)
**Date:** 2026-05-24
**Author:** ruv
---
## 1. Hardware Prerequisites
### 1.1 Full Protocol (N ≥ 3 Nodes)
| Component | Minimum | Recommended | Notes |
|---|---|---|---|
| Sensing nodes | 3 × ESP32-S3 (ADR-028) | 5+ nodes | Multi-node triangulation reduces angle-dependent blind spots; ADR-029 multistatic mesh |
| Compute appliance | Cognitum Seed (Pi 5 + Hailo) | Same | Runs the field model, AETHER inference, vitals pipeline |
| Network link | 2.4 GHz or 5 GHz AP | Dedicated sensing AP | Shared AP with user traffic degrades CSI frame rate |
| Firmware version | ADR-110 v0.7.0+ | Same | Ed25519 witness chain required for attestation |
| Clock sync | 802.15.4 time-sync (ESP32-C6) or NTP fallback | 802.15.4 preferred | ±100 µs alignment per ADR-110; NTP gives ±5 ms |
### 1.2 Degraded Mode (1 Node)
A single-node enrollment produces an incomplete signature:
- Skeletal proportions: degraded (single-angle view)
- Subcarrier reflection profile: single orientation only (3-orientation protocol collapses to 1)
- AETHER embedding: usable but lower confidence
- Cardiac / respiratory: unaffected (single-node sufficient)
- Gait timing: usable if node placement allows bidirectional walk
Single-node signatures MUST be tagged `degraded_mode: true` in the manifest. The
match score uses only the channels that met minimum confidence thresholds. The
soul signature is technically valid but should be re-enrolled with multi-node
hardware when possible.
### 1.3 ESP32-C6 Uplift (Wi-Fi 6 HE-LTF)
When at least one ESP32-C6 node is present (ADR-110), the subcarrier count
expands from 52 (HT-LTF, S3) to up to 242 (HE-LTF, C6). The MERIDIAN
HardwareNormalizer (ADR-027) maps all nodes to a canonical 56-subcarrier
representation for the AETHER backbone. The full 242-subcarrier profile is
preserved in the SubcarrierReflectionProfile node for higher-fidelity matching
when available. The C6's 802.15.4 time-sync (±100 µs) also improves multistatic
coherence relative to NTP-only S3 meshes.
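As an illustration of the normalization step only, the sketch below resamples a per-subcarrier amplitude vector onto 56 evenly spaced bins by linear interpolation. This is an assumption about what such a mapping could look like, not the ADR-027 HardwareNormalizer algorithm; the function name `resample_subcarriers` is hypothetical, and phase values would need unwrapping before any interpolation.

```python
def resample_subcarriers(values, target=56):
    """Linearly interpolate a per-subcarrier vector onto `target` evenly spaced bins."""
    n = len(values)
    if n < 2 or target < 2:
        raise ValueError("need at least two source and two target bins")
    out = []
    for i in range(target):
        pos = i * (n - 1) / (target - 1)  # fractional index into the source vector
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(values[lo] * (1.0 - frac) + values[hi] * frac)
    return out


# Both node types end up with the same canonical length.
s3_profile = resample_subcarriers([1.0] * 52)    # HT-LTF, ESP32-S3
c6_profile = resample_subcarriers([1.0] * 242)   # HE-LTF, ESP32-C6
```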
---
## 2. Structured 60-Second Enrollment Protocol
The enrollment protocol produces exactly one `.rvf` soul signature file. The
protocol is structured into five phases with exact timing. A human-readable
prompt sequence should be delivered to the subject via audio or display.
### Phase 0 — Empty-Room Field Recalibration (T+0 to T+10)
Before the subject enters the sensing zone, the room must be empty and the
ADR-030 field model must be current.
```
T+0s   : System checks field model age. Maximum age: 4 hours.
         If stale or absent → run field recalibration:
           Collect 1,200 CSI frames at 20 Hz (60 seconds of empty room)
           Compute per-link Welford mean and covariance
           Run SVD on covariance matrix → top-K=8 eigenmode vectors
           Store in field_model.rs::FieldNormalMode
T+0–10s: Quiet sampling of empty-room field state. No subject present.
         Operator prompt: "Please ensure the room is empty."
         System: verifies presence score < 0.1 (ADR-039 Tier 2 presence detection).
         Failure: if presence score ≥ 0.1, abort and report FAIL_ROOM_NOT_EMPTY.
```
This phase is skipped (not aborted) if the field model was updated within the
last 4 hours AND the current empty-room sampling confirms presence score < 0.05.
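The Phase 0 branching can be sketched as a small decision function. The thresholds (4 hours, 0.1, 0.05) come from the text above; the enum, the function name, and the order in which the checks run are assumptions made for illustration.

```python
from enum import Enum

MAX_FIELD_MODEL_AGE_S = 4 * 3600  # "Maximum age: 4 hours"


class Phase0Outcome(Enum):
    SKIP = "skip"                              # fresh model, room confirmed empty
    RECALIBRATE = "recalibrate"                # stale or absent field model
    RUN_QUIET_SAMPLING = "run_quiet_sampling"  # fresh model, presence in [0.05, 0.1)
    FAIL_ROOM_NOT_EMPTY = "FAIL_ROOM_NOT_EMPTY"


def phase0_decision(model_age_s, presence_score):
    """Decide what Phase 0 does; model_age_s is None when no field model exists."""
    if presence_score >= 0.1:
        return Phase0Outcome.FAIL_ROOM_NOT_EMPTY
    if model_age_s is None or model_age_s > MAX_FIELD_MODEL_AGE_S:
        return Phase0Outcome.RECALIBRATE
    if presence_score < 0.05:
        return Phase0Outcome.SKIP
    return Phase0Outcome.RUN_QUIET_SAMPLING
```

Note the two presence thresholds: skipping needs a stricter empty-room reading (< 0.05) than merely passing the phase (< 0.1).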
### Phase 1 — Deep Breathing Baseline (T+10 to T+25)
Subject enters the sensing zone and performs five deep breathing cycles.
```
T+10s   : Subject enters scan zone. System detects presence.
          Operator prompt: "Please stand still and breathe slowly and deeply."
T+10–25s: Subject stands at zone center, facing node cluster.
          Five complete breath cycles, each ≥ 4 seconds.
          System collects:
            - ADR-021 BreathingExtractor: baseline_bpm, depth_amplitude,
              inspiration_expiration_ratio, HRV_RSA
            - ADR-021 HeartRateExtractor: initial HR, HRV_SDNN (partial)
            - AETHER embedding: accumulates over 300 CSI frames (20 Hz × 15s)
          Quality gate: BreathingExtractor VitalCoherenceGate must emit
            PERMIT for ≥ 10 of the 15 seconds. Failure → FAIL_POOR_BREATHING_SIGNAL.
```
### Phase 2 — Seated Rest (T+25 to T+35)
Subject sits to minimize motion and allow cardiac signal isolation.
```
T+25s   : Operator prompt: "Please sit down and rest quietly."
T+25–35s: Subject seated, minimal movement.
          System collects:
            - HeartRateExtractor: HR baseline, HRV_SDNN, HRV_RMSSD,
              LF/HF ratio, sinus rhythm classification
            - Cardiac_Waveform_Morphology: 64-coefficient wavelet decomposition
              of bandpass-filtered cardiac phase signal (0.8–2.0 Hz)
          Quality gate: HR confidence ≥ 0.6 for ≥ 7 of 10 seconds.
            Failure → FAIL_POOR_CARDIAC_SIGNAL (soft failure: cardiac nodes
            marked low-confidence; signature proceeds without them if AETHER
            and gait nodes pass their own thresholds).
```
### Phase 3 — Gait Walk (T+35 to T+50)
Subject walks a 2-meter line twice in each direction.
```
T+35s   : Operator prompt: "Please walk a straight line of 2 meters back and
          forth twice at your natural pace."
T+35–50s: Subject walks: A→B, B→A, A→B, B→A (four transits, ≥ 8 strides total).
          System collects (via pose_tracker.rs, ADR-029 Sect 2.7):
            - GaitTimingNode: cadence, stride_period_variance,
              double_support_pct, asymmetry_index, step_width_m
            - SkeletalProportionsNode: torso/limb ratios from 17-keypoint
              trajectory accumulated over ≥ 8 strides
            - AETHER embedding: continues accumulating (300 more frames)
          Quality gate: ≥ 8 strides detected with confidence ≥ 0.7 per stride.
            Failure → FAIL_INSUFFICIENT_GAIT_DATA.
          Note: the ruvector-mincut DynamicPersonMatcher must confirm only one
            person is tracked. If two tracks are active → FAIL_MULTIPLE_SUBJECTS.
```
### Phase 4 — Standing Orientation Scan (T+50 to T+60)
Subject stands at three orientations to capture the subcarrier reflection profile.
```
T+50s   : Operator prompt: "Please stand facing the wall. I will ask you to
          rotate in place twice."
T+50–53s: Orientation 0° (subject faces primary node cluster).
          System collects: SubcarrierReflectionProfile at 0°
            (ADR-030 field-subtracted, 56 subcarriers, amplitude + phase).
T+53s   : Operator prompt: "Please turn 90 degrees to your right."
T+53–56s: Orientation 90°.
          System collects: SubcarrierReflectionProfile at 90°.
T+56s   : Operator prompt: "Please turn 90 degrees to your right again."
T+56–60s: Orientation 180°.
          System collects: SubcarrierReflectionProfile at 180°.
          Body_Field_Coupling: computed from AETHER attention map weighted
            by ADR-030 top-K=8 eigenvectors (final computation at T=60s).
T+60s   : Enrollment window closes.
          AETHER embedding finalized: mean pool over all ~1,200 accumulated frames.
          All node confidence values computed.
```
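The phase timings above translate directly into a per-phase frame budget at the stated 20 Hz CSI rate. The sketch below encodes the five phase windows as data and checks that they tile the 60-second window without gaps; the phase labels and the function name `frame_budget` are hypothetical.

```python
CSI_RATE_HZ = 20  # frame rate stated in the protocol

# (phase label, start second, end second), copied from the phase headings
PHASES = [
    ("phase0_empty_room", 0, 10),
    ("phase1_deep_breathing", 10, 25),
    ("phase2_seated_rest", 25, 35),
    ("phase3_gait_walk", 35, 50),
    ("phase4_orientation_scan", 50, 60),
]


def frame_budget(phases=PHASES, rate_hz=CSI_RATE_HZ):
    """Return {phase label: expected CSI frame count}; raise on a gap or overlap."""
    budget = {}
    previous_end = phases[0][1]
    for name, start, end in phases:
        if start != previous_end or end <= start:
            raise ValueError(f"timeline gap or overlap at {name}")
        budget[name] = (end - start) * rate_hz
        previous_end = end
    return budget


budget = frame_budget()
```

A check like this is a cheap guard against the timing table and the frame counts quoted inside each phase drifting apart as the protocol is revised.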
---
## 3. Quality Gates
The enrollment FAILS and emits a structured error code if any of the following
conditions are met. Failed enrollments do not produce a stored `.rvf` file.
| Gate | Condition for FAIL | Error code |
|---|---|---|
| Room occupied | Presence score ≥ 0.1 at Phase 0 end | `FAIL_ROOM_NOT_EMPTY` |
| Multiple subjects | ≥ 2 active pose tracks during Phases 1–4 | `FAIL_MULTIPLE_SUBJECTS` |
| Intermittent presence | Subject exits sensing zone for > 3 consecutive seconds | `FAIL_SUBJECT_LEFT_ZONE` |
| AETHER confidence low | Final embedding confidence < 0.6 (HNSW search confidence) | `FAIL_AETHER_LOW_CONFIDENCE` |
| Breathing signal absent | VitalCoherenceGate emits PERMIT for fewer than 10 of the 15 seconds in Phase 1 | `FAIL_POOR_BREATHING_SIGNAL` |
| Gait data insufficient | Fewer than 8 strides detected with confidence ≥ 0.7 | `FAIL_INSUFFICIENT_GAIT_DATA` |
| Field model dirty | Field model age > 4 hours and recalibration refused | `FAIL_STALE_FIELD_MODEL` |
| Adversarial detection | RuvSense adversarial.rs flags physically impossible signal | `FAIL_ADVERSARIAL_SIGNAL` |
| Node count below minimum | Fewer than 2 nodes online during Phases 3–4 | `WARN_DEGRADED_MODE` (not a hard fail; produces degraded signature) |
Soft failures (cardiac signal only) do not abort the enrollment; they mark those
nodes as low-confidence and reduce the match weight for those channels at
recognition time.
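One way to keep the gate table and the code in step is to express each gate as data. The sketch below is illustrative: the error codes and thresholds are taken from the table and phase descriptions, the breathing gate uses the Phase 1 wording (PERMIT seconds out of 15), and every metric key in the dictionaries is a hypothetical name.

```python
# Each gate is (error code, predicate over a metrics dict). Metric keys are hypothetical.
HARD_GATES = [
    ("FAIL_ROOM_NOT_EMPTY", lambda m: m["phase0_presence_score"] >= 0.1),
    ("FAIL_MULTIPLE_SUBJECTS", lambda m: m["max_active_tracks"] >= 2),
    ("FAIL_SUBJECT_LEFT_ZONE", lambda m: m["longest_absence_s"] > 3),
    ("FAIL_AETHER_LOW_CONFIDENCE", lambda m: m["aether_confidence"] < 0.6),
    ("FAIL_POOR_BREATHING_SIGNAL", lambda m: m["breathing_permit_s"] < 10),
    ("FAIL_INSUFFICIENT_GAIT_DATA", lambda m: m["confident_strides"] < 8),
    ("FAIL_STALE_FIELD_MODEL",
     lambda m: m["field_model_age_h"] > 4 and m["recalibration_refused"]),
    ("FAIL_ADVERSARIAL_SIGNAL", lambda m: m["adversarial_flag"]),
]
SOFT_GATES = [
    ("WARN_DEGRADED_MODE", lambda m: m["nodes_online"] < 2),
    ("FAIL_POOR_CARDIAC_SIGNAL", lambda m: m["hr_confident_s"] < 7),
]


def evaluate_gates(metrics):
    """Return which gates tripped and whether an .rvf file may be stored."""
    hard = [code for code, tripped in HARD_GATES if tripped(metrics)]
    soft = [code for code, tripped in SOFT_GATES if tripped(metrics)]
    return {"store_rvf": not hard, "hard_failures": hard, "soft_flags": soft}


# A run that passes every gate (values are made up for illustration).
clean_run = {
    "phase0_presence_score": 0.02, "max_active_tracks": 1, "longest_absence_s": 0,
    "aether_confidence": 0.8, "breathing_permit_s": 13, "confident_strides": 10,
    "field_model_age_h": 1.0, "recalibration_refused": False,
    "adversarial_flag": False, "nodes_online": 3, "hr_confident_s": 9,
}
```

Soft flags never block storage in this sketch; they would only lower the weight of the affected channels at recognition time.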
---
## 4. Fast Scan (10-Second Degraded Identification)
A fast scan produces a partial query embedding, not a stored profile. It is used
for recognition of already-enrolled subjects, not for new enrollment.
```
T+0s   : System checks whether field model is current (age < 4 hours).
         If stale: recognition accuracy degraded; warn operator.
T+0–10s: Subject stands still at zone center, natural breathing.
         System collects: AETHER embedding (200 frames, 10s at 20 Hz).
         Cardiac HR: partial (confidence typically < 0.5).
         Gait: not available.
         Subcarrier reflection: 1 orientation only.
T+10s  : Query issued against all stored profiles in HNSW index.
         Match score computed using available channels only.
         Cardiac, gait, and skeletal proportions excluded from denominator
         (availability factor = 0 for absent channels).
```
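The "available channels only" rule can be read as a weighted mean whose denominator counts just the channels that were actually measured. The sketch below assumes per-channel similarities in [0, 1] and uses made-up weights; neither the weights nor the function name `match_score` comes from the specification.

```python
def match_score(similarities, weights):
    """Weighted mean of per-channel similarities over available channels only.

    `similarities` maps channel name -> similarity in [0, 1], or None if absent.
    """
    numerator = 0.0
    denominator = 0.0
    for channel, weight in weights.items():
        sim = similarities.get(channel)
        if sim is None:
            continue  # availability factor 0: dropped from numerator and denominator
        numerator += weight * sim
        denominator += weight
    if denominator == 0.0:
        raise ValueError("no channel available for matching")
    return numerator / denominator


# Hypothetical channel weights, for illustration only.
WEIGHTS = {"aether": 0.5, "cardiac": 0.2, "gait": 0.3}
fast_scan = match_score({"aether": 0.9, "cardiac": 0.5, "gait": None}, WEIGHTS)
```

Dropping absent channels from the denominator keeps a fast scan from being penalized for data it never collected, at the cost of a score that rests on fewer independent channels.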
Fast scan is acceptable for:
- Returning resident recognition (already enrolled, low-friction use case)
- Home automation triggers (occupancy attribution per ADR-115 HA-MIND)
Fast scan is NOT acceptable for:
- Initial enrollment
- High-assurance access control
- Healthcare identification
---
## 5. Continuous Mode — Implicit Signature Refinement
In continuous operating mode, the system incrementally updates the online
aggregator for enrolled persons as they go about their normal activities. The
stored profile is re-published from the aggregator every 90 days (or on the
re-scan cadence, whichever comes first). This means a deployed system becomes
more accurate over time, not less.
Convergence property: the Welford online statistics in the aggregator are
numerically stable and, provided the person's underlying signal is stationary
over the observation period, converge to that person's long-run mean/variance
as observation count increases. The AETHER embedding accumulated over thousands
of natural-activity windows is more representative than a single 60-second
enrollment. The stored profile is replaced (not amended) on each re-publish; the
old profile is archived (not deleted) per the forward-secrecy requirements in
`security.md`.
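For readers unfamiliar with it, Welford's algorithm updates a running mean and variance one observation at a time without storing the history. The sketch below is the standard scalar form; the aggregator's actual per-channel layout is not specified here, and the class name is illustrative.

```python
class Welford:
    """Running mean and sample variance, updated one observation at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance (n - 1 denominator); 0.0 until two samples arrive."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = Welford()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)
```

The update avoids the catastrophic cancellation of the naive sum-of-squares formula, which matters when the aggregator runs for months on values with a large mean and a small spread.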
The continuous mode raises a consent concern: a person is effectively being
re-enrolled continuously without explicit action. This is addressed in
`security.md §4` (Consent Architecture).
---
## 6. Multi-Room Enrollment
When a person moves across multiple sensing zones (e.g., living room and bedroom
each with a Cognitum Seed node cluster), the cross-room signature works as follows:
1. Full 60-second enrollment is performed in the primary room. This produces the
initial stored profile with `environment_normalized: false` in the manifest.
2. When the MERIDIAN domain generalization layer (ADR-027) is active, the
HardwareNormalizer maps the enrollment embedding to the environment-invariant
subspace. The stored profile is updated to `environment_normalized: true`.
3. In subsequent rooms, a fast scan (10s) is sufficient to attribute identity. The
MERIDIAN-normalized AETHER embedding handles the room shift.
4. For healthcare deployments requiring room-by-room re-enrollment for regulatory
reasons, a per-room enrollment protocol runs in each room and the signatures
are linked by the opaque `person_id` field (never by raw PII).
---
## 7. Re-Scan Cadence
| Deployment context | Re-scan interval | Rationale |
|---|---|---|
| Healthy adult (residential) | 90 days | Anatomy stable; continuous mode refines continuously |
| Child (growing skeleton) | 30 days | Skeletal proportions change; gait timing changes |
| Healthcare / clinical | Per clinical event | Post-surgery, post-illness, post-significant weight change |
| Post-exercise monitoring | 7 days during active programs | Body composition changes affect RF backscatter |
| Any | On drift alert from longitudinal.rs (ADR-030 Tier 4) | System-initiated; shown to user as "calibration recommended" |
The `longitudinal.rs` module monitors five drift metrics (GaitSymmetry,
StabilityIndex, BreathingRegularity, MicroTremor, ActivityLevel) using Welford
statistics over daily observations. When any metric exceeds 2-sigma deviation
sustained for 3 consecutive days, a `DriftAlert` is emitted. The system
displays this as "signature drift detected — re-scan recommended," not as a
health diagnosis.
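The alert rule (more than 2 sigma, sustained for 3 consecutive days) can be sketched as a run-length check. The function below assumes the baseline mean and standard deviation are supplied by the caller; its name and signature are hypothetical and do not describe the `longitudinal.rs` API.

```python
def drift_alert(daily_values, baseline_mean, baseline_std, sigma=2.0, days=3):
    """Return True once `days` consecutive values deviate by more than `sigma` std."""
    if baseline_std <= 0:
        raise ValueError("baseline_std must be positive")
    run = 0
    for value in daily_values:
        if abs(value - baseline_mean) > sigma * baseline_std:
            run += 1
            if run >= days:
                return True
        else:
            run = 0  # a single in-range day resets the streak
    return False
```

Requiring a consecutive streak rather than any three outliers keeps a single bad day (illness, a moved piece of furniture) from triggering a re-scan prompt.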
---
## 8. Output Artifact
On successful completion, the enrollment pipeline produces:
1. `signature-<sha256>.rvf` — the binary soul signature container. Content-addressed.
Encrypted with the person's key (see `security.md §5`) before writing to disk.
2. `signature-<sha256>.json` — the JSON-LD sidecar for human inspection and audit.
Does not contain raw vector data. Safe to log.
3. A row in the local HNSW index (`ruvector-core::VectorIndex`, `person_track`
subindex per ADR-024 §2.4) linking the person_id to the AETHER embedding.
This index is used for O(log n) recognition queries.
4. An Ed25519 witness entry per ADR-110, signing
`(rvf_sha256 || timestamp_ns || enrolled_by_device_id)`. Stored in the
RVF SEG_WITNESS segment AND in the node's local audit log.
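The signed tuple in item 4 is a byte concatenation. The sketch below builds that message under assumed encodings: a raw 32-byte SHA-256 digest, the timestamp as a big-endian unsigned 64-bit integer, and the device ID as opaque bytes. ADR-110 defines the real layout, and the Ed25519 signing step itself is left to a cryptography library.

```python
import hashlib
import struct


def witness_message(rvf_bytes: bytes, timestamp_ns: int, device_id: bytes) -> bytes:
    """Concatenate rvf_sha256 || timestamp_ns || enrolled_by_device_id."""
    rvf_sha256 = hashlib.sha256(rvf_bytes).digest()   # 32 bytes
    timestamp = struct.pack(">Q", timestamp_ns)        # 8 bytes, big-endian u64 (assumed)
    return rvf_sha256 + timestamp + device_id          # device_id length is an assumption


message = witness_message(b"", 1, b"\x00\x01")
```

Keeping the first two fields fixed-width means the concatenation parses unambiguously even if device IDs vary in length.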
The enrollment process does NOT:
- Transmit raw CSI or raw biometrics to any external server.
- Publish the soul signature to MQTT or Matter unless explicitly configured with
`--privacy-mode disabled` (see `security.md §6`).
- Store PII (name, email, account linkage) in the `.rvf` file. The `person_id`
field is an opaque u64. PII linkage, if any, lives in the application layer
and is governed by separate access control.

@ -0,0 +1,367 @@
# Soul Signature — Security, Privacy, and Threat Model
**Status:** Research Specification (Pre-Implementation)
**Date:** 2026-05-24
**Author:** ruv
---
## 1. Scope
This document defines the threat model, mitigations, cryptographic primitive
choices, privacy architecture, and open security research items for the Soul
Signature system. It is intended to be reviewed by a security engineer or
privacy counsel before any production deployment.
The soul signature is a passive biometric system. The security bar is:
**attacker cost to achieve a false accept must exceed the value of the
protected resource for the relevant threat model**. The soul signature does
not claim to be unbreakable. It claims to be hard enough.
---
## 2. What We Explicitly Do NOT Claim
- Not equal to fingerprint scanners on FBI-tier datasets in EER terms. RF
biometrics are a younger discipline. No independent benchmark with the soul
signature's specific multi-channel fusion exists yet.
- Not legal evidence. Passive RF biometric identification has no established
legal precedent in any jurisdiction.
- Not a replacement for explicit consent in regulated contexts (healthcare,
employment, border control).
- Not unbreakable under a nation-state adversary with full physical access to
the sensing infrastructure.
- Not validated at scale beyond the constituent ADR baselines. The AETHER
channel (ADR-024) targets >80% mAP at 5 subjects; at 100+ subjects the
false-accept rate is open research.
---
## 3. Threat Model
### 3.1 Attacker: Passive Eavesdropper on the WiFi Medium
**Capability:** An attacker near the WiFi sensing zone can observe CSI of any
person who passes through. With enough CSI, the attacker could construct an
unauthorized soul signature enrollment of an unconsenting bystander.
**Impact:** Unauthorized enrollment → unauthorized recognition → attribution of
presence to a person who did not consent.
**Mitigation:**
- Ambient CSI capture does NOT trigger enrollment. Enrollment requires the
explicit 60-second structured protocol. Ambient bystander CSI produces
`unauthenticated` pose tracks tagged as `person_id: NULL`.
- Unauthenticated embeddings are pruned from the HNSW index after 24 hours (see §4.2).
- The enrollment protocol requires presence confirmation from at least two
sensing nodes simultaneously, making drive-by enrollment geometrically
harder to achieve without physical proximity.
**Residual risk:** An attacker who can be physically present in the scanning
zone for 60 seconds, under the observation of the scanning protocol, can cause
enrollment of a fake person. This requires physical co-location and is
equivalent to the threat model for any in-person biometric registration.
### 3.2 Attacker: Active Replay
**Capability:** An attacker records a CSI stream from a legitimate enrollment
or recognition event and replays it to a sensing node to impersonate the
enrolled person.
**Impact:** False positive recognition; unauthorized access or presence attribution.
**Mitigation:**
- Each enrollment is bound to the room's ADR-030 field model eigenstate at
enrollment time. The `environment_id` field in every vector node is a
SHA-256 of the field model's eigenmode matrix. A replay in a different room
produces a different `environment_id` and a dramatically different
Subcarrier_Reflection_Profile — the cross-validation between these two
signed fields fails.
- The Ed25519 witness chain (ADR-110) includes a monotonic timestamp
(`timestamp_ns`). A replay of an old signature is detected by the timestamp
freshness check at recognition time (configurable; default: reject any
signature older than 7 days for high-assurance contexts).
- The ADR-030 field model continuously updates. Even if the replay is in the
same room, the field model's eigenstate changes as furniture is moved or
temperature shifts the propagation medium; cross-validation degrades over
time.
**Residual risk:** Replay within the same room within a short time window
(< 4 hours, before the field model rotates) by an attacker who has recorded the
original CSI with high fidelity remains a plausible attack vector. This is not
defended against by the current architecture. It requires a future ADR for
challenge-response liveness detection.
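The timestamp and environment checks described in the mitigations can be sketched as a single pre-match gate. This is an illustration under stated assumptions: `check_replay` and `ReplayCheck` are hypothetical names, the future-timestamp rejection is an addition not stated above, and `environment_id` values are compared as plain strings.

```rust
/// Outcome of the pre-match replay checks.
#[derive(Debug, PartialEq, Eq)]
pub enum ReplayCheck {
    Pass,
    FutureTimestamp,
    StaleTimestamp,
    EnvironmentMismatch,
}

/// Default freshness window from the text: 7 days, in nanoseconds.
pub const MAX_AGE_NS: u64 = 7 * 24 * 60 * 60 * 1_000_000_000;

/// Rejects witnesses that are too old or bound to a different room.
/// Rejecting timestamps ahead of `now_ns` is an extra assumption here.
pub fn check_replay(
    witness_ts_ns: u64,
    now_ns: u64,
    max_age_ns: u64,
    enrolled_env_id: &str,
    current_env_id: &str,
) -> ReplayCheck {
    if witness_ts_ns > now_ns {
        return ReplayCheck::FutureTimestamp;
    }
    if now_ns - witness_ts_ns > max_age_ns {
        return ReplayCheck::StaleTimestamp;
    }
    if enrolled_env_id != current_env_id {
        return ReplayCheck::EnvironmentMismatch;
    }
    ReplayCheck::Pass
}
```

Neither check addresses the same-room, short-window replay described in the residual risk; that case passes both conditions by construction.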
### 3.3 Attacker: Phased-Array Vest / RF Body Emulator
**Capability:** An attacker wears a device capable of emitting RF signals that
mimic another person's backscatter profile, allowing them to be recognized as
the enrolled person.
**Impact:** The strongest impersonation attack; if successful, bypasses all
electromagnetic biometric channels simultaneously.
**Mitigation:**
- The RuvSense `adversarial.rs` module (ADR-030 Tier 7) enforces four
physics-based consistency checks:
1. Multi-link consistency: a real body perturbs all mesh links passing
through its location. A vest emitting signals affects only the targeted
link(s). Detection: at least 4 links must show correlated perturbation.
2. Field model constraints: the perturbation must lie within the span of
the room's eigenmode structure. Artificially injected signals produce
perturbations inconsistent with room geometry.
3. Temporal continuity: real movement is smooth in embedding space; injected
signals can produce discontinuities flagged by the embedding velocity
monitor.
4. Energy conservation: total perturbation energy across all links must be
consistent with the number and geometry of bodies present.
- The adversarial detector fires `FAIL_ADVERSARIAL_SIGNAL` before the soul
signature match is considered.
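Check 1 (multi-link consistency) can be illustrated with a simplified stand-in. The actual `adversarial.rs` logic is not shown in this document; the sketch below assumes each link exposes a perturbation time series, uses the first link as the reference, and counts links whose Pearson correlation with it exceeds a threshold. All names and the thresholds in the usage are hypothetical.

```rust
/// Pearson correlation of two equal-length series; None if undefined.
fn pearson(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.len() < 2 {
        return None;
    }
    let n = a.len() as f32;
    let ma = a.iter().sum::<f32>() / n;
    let mb = b.iter().sum::<f32>() / n;
    let (mut cov, mut va, mut vb) = (0.0f32, 0.0f32, 0.0f32);
    for (x, y) in a.iter().zip(b) {
        cov += (x - ma) * (y - mb);
        va += (x - ma).powi(2);
        vb += (y - mb).powi(2);
    }
    if va == 0.0 || vb == 0.0 {
        return None; // a flat series has no defined correlation
    }
    Some(cov / (va.sqrt() * vb.sqrt()))
}

/// True when at least `min_links` links (the reference included) show a
/// perturbation correlated with the reference link, |r| >= `min_corr`.
pub fn multi_link_consistent(links: &[Vec<f32>], min_links: usize, min_corr: f32) -> bool {
    let reference = match links.first() {
        Some(r) => r.as_slice(),
        None => return false,
    };
    let correlated = links
        .iter()
        .filter(|l| pearson(reference, l.as_slice()).map_or(false, |r| r.abs() >= min_corr))
        .count();
    correlated >= min_links
}
```

With `min_links = 4`, a perturbation that appears on one link while the others stay flat fails the check, which is the single-link injection case the text describes.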
**Residual risk:** A sophisticated attacker with a calibrated phased-array
system who also knows the room's eigenmode structure and the enrolled person's
exact multi-link backscatter pattern could in principle construct a convincing
emulation. This is a high-capability, high-cost attack. Practical countermeasure:
require multi-node confirmation (ADR-029 multistatic), which forces the emulator
to satisfy more links at once; how steeply attack cost grows with node count has
not been measured.
### 3.4 Attacker: Insider with Broker Access
**Capability:** A privileged operator or compromised service with read access
to the stored `.rvf` files and the HNSW person_track index.
**Impact:** Exfiltration of biometric signatures; linkage of person_id to PII
if linkage tables also accessible; replay or cross-site re-enrollment.
**Mitigation:**
- At-rest encryption: all `.rvf` files are encrypted with ChaCha20-Poly1305
using a key derived via Argon2id from a user-provided passphrase (or a FIDO2
hardware token binding). The Cognitum Seed appliance NEVER stores the
decryption key; it is re-derived from the passphrase on each access.
- The opaque `person_id` (u64) in the `.rvf` file is not PII. PII linkage, if
any, requires access to a separate application-layer database not stored on
the sensing appliance.
- The HNSW index stores only the 128-dim AETHER embedding, not raw CSI or full
soul signatures. Exfiltration of the index exposes the embedding but not the
full biometric record.
- Differential privacy (ADR-106 DP-SGD) applies at training time when AETHER
is fine-tuned on enrolled-person data in federation rounds, bounding how much
a membership inference attack can recover about training samples from model
weights. Local continuous-mode fine-tuning is not yet covered (see §7.4).
**Residual risk:** If the passphrase is weak or the FIDO2 token is compromised,
the at-rest encryption fails. Key management is a deployment responsibility.
### 3.5 Attacker: Manufacturer / Firmware Supply Chain
**Capability:** A malicious firmware update to the ESP32 node or Cognitum Seed
appliance could silently exfiltrate soul signatures or CSI streams.
**Impact:** Large-scale passive surveillance; biometric data exfiltration across
all installed appliances.
**Mitigation:**
- All firmware releases are signed with Ed25519 (ADR-100 cog packaging) and
verified by the appliance before installation. A Dilithium-3 post-quantum
co-signature is added in the transition window (ADR-109).
- The Ed25519 witness chain (ADR-110) signs each CSI frame bundle at the
sensor level. A firmware change that alters the witness chain is detectable
by downstream audit.
- Network egress from the Cognitum Seed in `--privacy-mode enabled` is blocked
for raw CSI and soul signatures by default. Only MQTT auto-discovery messages
(ADR-115) and OTA metadata are permitted outbound.
- Open-source firmware. The ESP32 firmware and Cognitum Seed Rust crates are
open source (this repository). Independent audit is possible.
**Residual risk:** A zero-day exploit in the ESP-IDF WiFi stack or the Rust
codebase could bypass these controls. This is mitigated by regular security
audits (run `npx @claude-flow/cli@latest security scan` per CLAUDE.md) but not
eliminated.
---
## 4. Consent Architecture
### 4.1 The Enrollment-vs-Recognition Distinction
The soul signature system enforces a hard distinction:
| Action | Consent required | Mechanism |
|---|---|---|
| Enrollment | Explicit, active | 60-second protocol with operator confirmation; produces signed `.rvf` |
| Recognition of enrolled person | Implicit (enrollment = consent for recognition) | Continuous mode; HNSW match |
| Ambient sensing of unenrolled person | No — but data is transient and pruned | Unauthenticated tracks; 24h TTL |
| Updating stored profile from continuous mode | Implicit (set at enrollment time) | Aggregator auto-refresh; configurable |
The system operator is responsible for obtaining appropriate consent from
persons before performing enrollment. The technical system enforces that
enrollment cannot happen accidentally or from drive-by sensing.
### 4.2 Bystander Protection
Persons who pass through a sensing zone without being enrolled are sensed but
not persistently identified. Their data flow:
1. Pose tracker produces a track tagged `person_id: NULL`.
2. AETHER embedding is computed for motion detection and occupancy counting
(ADR-115 HA-MIND).
3. The embedding is written to the `temporal_baseline` HNSW index with a 24-hour
TTL and `authenticated: false`.
4. After 24 hours, the entry is automatically pruned by the `EmbeddingIndex::prune()`
method (ADR-024 §2.4).
5. No `.rvf` file is created. No persistent record exists.
This architecture is designed to align with the GDPR principles of data
minimization (Article 5(1)(c)) and storage limitation (Article 5(1)(e)) for
bystander data: the retention period is bounded, the data is not linked to
an identity, and the storage is proportionate to the functional purpose
(occupancy counting).
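The 24-hour pruning in step 4 amounts to a retain filter over index entries. The sketch below is a standalone illustration, not the `EmbeddingIndex::prune()` implementation; `TrackEntry` and its fields are hypothetical, and timestamps are assumed to be Unix seconds.

```rust
/// Hypothetical index entry; only the fields needed for pruning are shown.
pub struct TrackEntry {
    pub authenticated: bool,
    pub created_at_s: u64, // Unix seconds (assumption)
}

/// Bystander retention bound from the text: 24 hours.
pub const BYSTANDER_TTL_S: u64 = 24 * 60 * 60;

/// Drops unauthenticated entries older than the TTL and returns how many
/// were removed. Authenticated (enrolled) entries are never pruned here.
pub fn prune(entries: &mut Vec<TrackEntry>, now_s: u64) -> usize {
    let before = entries.len();
    entries.retain(|e| {
        e.authenticated || now_s.saturating_sub(e.created_at_s) < BYSTANDER_TTL_S
    });
    before - entries.len()
}
```

Using `saturating_sub` keeps an entry with a timestamp slightly ahead of the clock from underflowing; such an entry is simply treated as fresh.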
### 4.3 GDPR / HIPAA Mode
When `--privacy-mode enabled` (from ADR-115 HA-MIND §privacy):
1. Soul signatures are computed and stored locally only. They are NEVER
published to MQTT topics, Matter clusters, or any external endpoint.
2. The local REST API for accessing soul signatures requires a valid bearer
token (ADR-028 bearer_auth.rs). No unauthenticated endpoint exposes
biometric data.
3. The JSON-LD sidecar is written to the local encrypted store only. It is not
included in MQTT auto-discovery payloads.
4. The longitudinal drift metrics (ADR-030 Tier 4) are published to MQTT in
aggregated form only (e.g., `drift_detected: true`, never raw metric values
that could be used for medical inference).
5. A data deletion endpoint must be implemented: `DELETE /api/v1/persons/{id}`
removes the `.rvf` file, the HNSW index entry, the JSON-LD sidecar, and all
longitudinal Welford statistics for that person_id.
---
## 5. Cryptographic Primitives
All primitives are chosen from NIST-approved or widely-audited standards.
| Purpose | Primitive | Rationale |
|---|---|---|
| Content integrity (per-segment) | CRC32 (IEEE 802.3) | Already implemented in `rvf_container.rs:line 70`. Corruption detection, not security. |
| Content addressing | SHA-256 | File name derivation; collision resistance prevents two distinct files sharing a name |
| Ed25519 signatures | Ed25519 (RFC 8032) | ADR-110 witness chain; 64-byte signatures; 128-bit security |
| At-rest encryption | ChaCha20-Poly1305 (RFC 8439) | AEAD; fast and constant-time in software without AES hardware instructions; 256-bit key |
| Key derivation from passphrase | Argon2id (RFC 9106) | Memory-hard KDF; resistant to GPU/ASIC brute-force; winner of the Password Hashing Competition (2015) and recommended by OWASP for password storage |
| DP-SGD noise | Gaussian N(0, σ²C²I) per ADR-106 | (ε, δ)-DP per Abadi et al. 2016 Moments Accountant |
| Post-quantum key exchange (future) | Kyber-768 (standardized as ML-KEM-768, NIST FIPS 203, 2024) | ADR-108; ~AES-192 security (NIST category 3). Note that NSA CNSA 2.0 specifies the larger ML-KEM-1024 parameter set |
| Post-quantum signatures (future) | Dilithium-3 (standardized as ML-DSA-65, NIST FIPS 204, 2024) | ADR-109; hybrid mode with Ed25519 during transition window |
### 5.1 Argon2id Parameters
Default parameters for soul signature key derivation:
```
m_cost = 65536 (64 MB memory)
t_cost = 3 (3 iterations)
p_cost = 4 (4 parallel lanes)
output_len = 32 bytes (256-bit key for ChaCha20-Poly1305)
salt = 16 random bytes stored alongside encrypted blob (NOT the person_id)
```
These parameters provide ~100ms KDF time on a Pi 5, which is acceptable for
enrollment (one-time) and recognition (HNSW match precedes decryption, so
decryption is only triggered after a candidate match).
### 5.2 Forward Secrecy
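Old soul signature files are NOT keys for new ones. Each file's key is derived
from the user's passphrase and that file's own random salt (§5.1), never from a
previous file, so recovering the key of a 90-day-old `.rvf` file does not unlock
the current profile. Strictly, this is key independence between files rather
than forward secrecy in the protocol sense: a compromised passphrase still
unlocks every file encrypted under it until the passphrase is rotated.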
Archived files (kept for audit purposes) are re-encrypted on passphrase rotation
if the operator elects to do so via the `soul-signature re-encrypt --all` CLI
command (not yet implemented; specified here for future ADR).
---
## 6. Privacy Mode Integration (ADR-115)
The `--privacy-mode` flag defined in ADR-115 HA-MIND §9 is extended to cover
soul signature data:
| Privacy mode | MQTT publish | REST API | Local storage | HNSW index |
|---|---|---|---|---|
| `disabled` (default for home users) | Aggregated presence/count only | Authenticated bearer required | Encrypted at rest | Local only |
| `enabled` | Nothing biometric | Authenticated bearer required | Encrypted at rest | Local only |
| `research` (explicit opt-in) | Full soul signature nodes (anonymized person_id) | Open (for research deployments only) | Encrypted at rest | Exportable |
The `research` mode requires a separate `--research-consent-token` flag and is
intended for academic data collection under IRB approval. It must never be the
default.
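The table above can be encoded as a small policy type so that each publish or export path asks one question. This is a sketch only; the type and method names are hypothetical, and the rule that `research` without a consent token is rejected outright (rather than downgraded) is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PrivacyMode {
    Disabled,
    Enabled,
    Research,
}

#[derive(Debug, PartialEq, Eq)]
pub enum MqttScope {
    AggregatedPresenceOnly,
    NothingBiometric,
    FullAnonymizedNodes,
}

impl PrivacyMode {
    /// What may be published to MQTT in this mode.
    pub fn mqtt_scope(self) -> MqttScope {
        match self {
            PrivacyMode::Disabled => MqttScope::AggregatedPresenceOnly,
            PrivacyMode::Enabled => MqttScope::NothingBiometric,
            PrivacyMode::Research => MqttScope::FullAnonymizedNodes,
        }
    }

    /// The REST API is open only in research deployments.
    pub fn rest_requires_bearer(self) -> bool {
        !matches!(self, PrivacyMode::Research)
    }

    /// The HNSW index is exportable only in research mode.
    pub fn index_exportable(self) -> bool {
        matches!(self, PrivacyMode::Research)
    }
}

/// Research mode is refused unless a non-empty consent token is supplied.
pub fn resolve_mode(
    requested: PrivacyMode,
    research_consent_token: Option<&str>,
) -> Result<PrivacyMode, &'static str> {
    match (requested, research_consent_token) {
        (PrivacyMode::Research, Some(t)) if !t.is_empty() => Ok(PrivacyMode::Research),
        (PrivacyMode::Research, _) => Err("research mode requires --research-consent-token"),
        (mode, _) => Ok(mode),
    }
}
```

At-rest encryption is not modelled because it is identical in all three modes.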
---
## 7. Open Research and Outstanding Security Work
The following items are known security gaps or open research questions. Each
warrants a future ADR before production deployment at scale.
**7.1 Challenge-Response Liveness Detection**
Replay attacks within a short time window (see §3.2 residual risk) are not
defended against. A future mechanism should issue a random challenge (e.g.,
"please raise your left hand") and verify the CSI response matches the challenge
before accepting a recognition. A recording made before the challenge was issued
cannot contain the matching response, so this would make replay of pre-recorded
CSI impractical. Future ADR: ADR-120 (proposed).
**7.2 False-Accept Rate at Scale (N > 20 subjects)**
The AETHER baseline (ADR-024) targets >80% mAP at 5 subjects. For household
deployments this scale is sufficient. For building-scale deployments (50-500 subjects),
the FAR is open research. Independent benchmarking on a dataset of 20+ subjects
with the full 7-channel fusion is required before building-scale deployment can
be recommended. Publication target: co-locate with ADR-027 MERIDIAN evaluation.
**7.3 Side-Channel Leakage from Encrypted RVF Files**
The file size of an encrypted `.rvf` blob is observable by an attacker with
filesystem access. File size is a function of the number of nodes present, which
reveals whether the cardiac channel was captured (high-SNR enrollment vs
low-SNR enrollment). This is a minor information leak. Mitigation: pad all
`.rvf` files to a fixed 64 KB boundary. Future ADR: append to ADR-106.
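The padding rule can be sketched as a round-up to the next 64 KiB multiple. `padded_len` is a hypothetical helper; the choice to pad an empty input to one full block is an assumption, as is applying the padding to the plaintext before encryption so that ciphertext lengths also fall into fixed buckets.

```rust
/// Padding granularity proposed in the text: 64 KiB.
pub const PAD_BLOCK: usize = 64 * 1024;

/// Returns the padded length for a plaintext of `len` bytes: the smallest
/// non-zero multiple of PAD_BLOCK that is >= len.
pub fn padded_len(len: usize) -> usize {
    let blocks = ((len + PAD_BLOCK - 1) / PAD_BLOCK).max(1);
    blocks * PAD_BLOCK
}
```

A signature that ever exceeds one block would still leak which size bucket it falls in, so the leak is reduced to bucket granularity rather than removed.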
**7.4 Membership Inference in Continuous Mode**
In continuous mode, the AETHER model is fine-tuned on the enrolled person's
data over months. An adversary with access to the model weights before and after
a re-train cycle could infer that a specific enrollment occurred, even without
the soul signature file, via membership inference (Shokri et al. 2017).
ADR-106 DP-SGD mitigates this for federation round deltas but not for local
single-device fine-tuning. Extension of DP-SGD to the local continuous-mode
update is required. Future ADR: extend ADR-106.
**7.5 Physical Access to Sensing Nodes**
An attacker with physical access to an ESP32 node can dump the firmware and
attempt to extract the Ed25519 signing key (if the key is stored in ESP32
NVS without protection). ADR-110 uses NVS for key storage. A future ADR should
mandate secure element storage (e.g., ATECC608A co-processor on the Cognitum
Seed) for the signing key. Note that the ATECC608A signs with ECDSA P-256 rather
than Ed25519, so that ADR must also settle the signature scheme or select a part
with Ed25519 support. Future ADR: ADR-121 (proposed).
**7.6 Federated Learning Linkability**
When AETHER is retrained via federated learning (ADR-105), the LoRA weight
deltas carry information about enrolled persons. ADR-106 applies DP-SGD to
these deltas, but the post-quantum migration path (ADR-108 Kyber-768) is not
yet integrated with the federation protocol. Until ADR-108 Phase 2 ships, the
federation link is classically encrypted and vulnerable to harvest-now-decrypt-later
attacks by quantum-capable adversaries. Assessed risk: low until 2027.
---
## 8. Summary Security Properties Table
| Property | Status | Evidence |
|---|---|---|
| At-rest encryption | Specified (ChaCha20-Poly1305 + Argon2id) | This document §5 |
| Ed25519 attestation | Implemented | ADR-110 witness chain |
| Replay resistance (cross-room) | Implemented | ADR-030 field model environment_id binding |
| Replay resistance (same-room, short window) | Open gap | §7.1 |
| Anti-spoofing (single-link injection) | Implemented | adversarial.rs multi-link consistency |
| Anti-spoofing (phased-array vest) | Partial | adversarial.rs + energy conservation; residual risk documented |
| Bystander protection | Specified | 24h TTL on unauthenticated tracks; §4.2 |
| DP-SGD training privacy | Implemented (federation) | ADR-106 |
| DP-SGD training privacy (local continuous mode) | Open gap | §7.4 |
| GDPR data deletion | Specified | §4.3 `DELETE /api/v1/persons/{id}` |
| Post-quantum migration path | Specified (Kyber-768, Dilithium-3) | ADR-108, ADR-109 |
| Firmware supply chain integrity | Implemented (Ed25519 cog signing) | ADR-100, ADR-109 hybrid |
| False-accept rate at scale | Open research | §7.2 |
| Liveness detection | Open gap | §7.1 |
| Secure element key storage | Open gap | §7.5 |


@ -0,0 +1,525 @@
# Soul Signature — Technical Specification
**Status:** Research Specification (Pre-Implementation)
**Date:** 2026-05-24
**Author:** ruv
---
## 1. Overview
A Soul Signature is a typed, content-addressed RVF graph encoding seven
electromagnetic observables extracted from a person in a WiFi-DensePose sensing
zone. The graph is stored as a single `.rvf` binary blob using the existing RVF
container format (`v2/crates/wifi-densepose-sensing-server/src/rvf_container.rs`)
extended with two new segment types defined below. A human-readable JSON sidecar
accompanies the blob for inspection and provenance.
The signature is probabilistic, not deterministic. Matching computes a weighted
cosine similarity across graph dimensions, producing a score in [0, 1] that is
compared against a threshold chosen for a target false-accept rate (FAR). The FAR
at a given threshold is an open research question; the AETHER person
re-identification target (ADR-024 §2.8, >80% mAP at 5 subjects) is the only
reference point so far, and only for the primary embedding channel.
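A minimal sketch of the weighted cosine fusion follows. The mapping of cosine similarity from [-1, 1] to [0, 1] via `(1 + cos) / 2`, the renormalization over channels that are actually usable, and all names are assumptions made for illustration; the document does not fix them.

```rust
/// One channel's enrolled and probe vectors plus its fusion weight.
pub struct ChannelMatch<'a> {
    pub weight: f32,
    pub enrolled: &'a [f32],
    pub probe: &'a [f32],
}

/// Cosine similarity; None for mismatched, empty, or zero-norm vectors.
fn cosine(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        return None;
    }
    Some((dot / (na * nb)).clamp(-1.0, 1.0))
}

/// Weighted aggregate in [0, 1]. Channels whose similarity is undefined
/// are skipped and the weights are renormalized over the rest.
pub fn fused_score(channels: &[ChannelMatch<'_>]) -> Option<f32> {
    let (mut num, mut den) = (0.0f32, 0.0f32);
    for c in channels {
        if let Some(cos) = cosine(c.enrolled, c.probe) {
            num += c.weight * (cos + 1.0) / 2.0;
            den += c.weight;
        }
    }
    if den > 0.0 {
        Some((num / den).clamp(0.0, 1.0))
    } else {
        None
    }
}
```

The score only becomes an accept or reject decision once a threshold has been calibrated against measured FAR, which this sketch does not attempt.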
---
## 2. Design Principles
### 2.1 Per-Individual
The signature encodes features that are structurally unique to one person at the
sensing resolution of commodity WiFi hardware. Discriminative dimensions include:
cardiac timing (R-R interval structure), respiratory mechanics (tidal depth,
inspiration-to-expiration ratio), skeletal proportions (limb ratios from 17-keypoint
pose, ADR-079), gait cadence variability, and the RF backscatter profile shaped by
body mass distribution and geometry.
### 2.2 Passive at Recognition Time
No explicit action from the subject is required at recognition time after
enrollment. Recognition fires whenever an enrolled person is detected in a sensing
zone. Enrollment itself requires a 60-second structured protocol (see
`scanning-process.md`). This is a deliberate asymmetry: active enrollment followed
by passive recognition, broadly the model FaceID uses (unlock after an initial
face setup).
The passivity of post-enrollment recognition is a privacy concern addressed in full
in `security.md` §4.
### 2.3 Multi-Modal
Seven orthogonal channels contribute. Orthogonality matters: if one channel
degrades (e.g., cardiac is masked by motion), the remaining six carry the match.
No single channel is necessary for a positive identification above threshold;
the fused score is a weighted aggregate.
### 2.4 Persistent Across Time
The stored signature is valid over weeks to months for adults with stable anatomy
and health. Re-scan cadence is prescribed in `scanning-process.md`. The
`longitudinal.rs` module (ADR-030 Tier 4) provides the drift detection that
flags when a re-scan is necessary.
### 2.5 Defensible False-Accept Rate
The security model is not "unbreakable." It is "attacker cost exceeds value of
attack for the relevant threat model." See `security.md` §3.
---
## 3. Signature as a Typed RVF Graph
### 3.1 Container Format
The soul signature reuses the RVF binary container defined in
`v2/crates/wifi-densepose-sensing-server/src/rvf_container.rs` (lines 1660).
Existing segment types used:
| Segment type | Const | Purpose in soul signature |
|---|---|---|
| `SEG_MANIFEST` | `0x05` | Graph metadata: schema version, enroll timestamp, device ID, person_id (opaque u64) |
| `SEG_VEC` | `0x01` | AETHER 128-dim embedding weights (backbone + projection head) |
| `SEG_META` | `0x07` | JSON overlay: all non-vector node attributes |
| `SEG_WITNESS` | `0x0A` | Ed25519 signature over `(content_hash_sha256 || timestamp_ns || enrolled_by_device_id)` |
| `SEG_EMBED` | `0x0C` | AETHER embedding config + projection head weights (ADR-024 Phase 7) |
| `SEG_LORA` | `0x0D` | Per-environment LoRA deltas for environment-adapted query |
Two new segment types are proposed for the soul signature extension:
| Segment type | Const | Purpose |
|---|---|---|
| `SEG_SOUL_GRAPH` | `0x10` | JSON-serialized graph: node list + edge list + attribute schemas |
| `SEG_SOUL_INDEX` | `0x11` | Per-node HNSW index serialization for fast graph-level query |
The `SegmentHeader` structure is unchanged. Each segment is 64-byte aligned
(field `alignment_pad` at offset `0x3C`). The CRC32 content hash at offset `0x28`
covers the payload, providing corruption detection per the existing implementation
at `rvf_container.rs:line 70`. CRC32 is not a security control; tamper evidence
comes from the Ed25519 signature in `SEG_WITNESS`.
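For reference, the two container mechanics mentioned here can be sketched with the standard library alone: the number of pad bytes needed to reach a 64-byte boundary, and a bitwise CRC32 with the IEEE 802.3 polynomial. This is an independent illustration, not the code in `rvf_container.rs`; whether the real pad is computed over the payload alone is an assumption.

```rust
/// Segment alignment used by the RVF container.
pub const SEGMENT_ALIGN: usize = 64;

/// Pad bytes needed to bring `payload_len` up to the next 64-byte boundary.
pub fn pad_to_alignment(payload_len: usize) -> usize {
    (SEGMENT_ALIGN - payload_len % SEGMENT_ALIGN) % SEGMENT_ALIGN
}

/// Bitwise CRC32 (IEEE 802.3, reflected polynomial 0xEDB88320).
/// Slower than a table-driven version but easy to audit.
pub fn crc32_ieee(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all ones when the low bit is set, zero otherwise
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}
```

The standard check value for this CRC variant is `0xCBF43926` over the ASCII bytes `123456789`, which makes a convenient self-test.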
### 3.2 Node Types
Each node is a typed struct. Serialized into SEG_META as a JSON object with a
`node_type` discriminator string. Vector fields (f32 arrays) are co-located in
a SEG_VEC segment indexed by the node's `vec_segment_id` field.
#### Node: AETHER_Embedding
Primary identity anchor. The contrastive CSI embedding from ADR-024.
```rust
pub struct AetherEmbeddingNode {
pub node_type: &'static str, // "AETHER_Embedding"
pub vec_segment_id: u64, // references SEG_VEC containing 128 f32s
pub embedding_dim: usize, // 128
pub backbone: String, // "csi-to-pose-transformer"
pub pretrain_method: String, // "simclr+vicreg"
pub alignment_score: f32, // Alignment metric (Wang & Isola 2020) at enrollment time
pub uniformity_score: f32, // Hypersphere uniformity at enrollment time
pub enrollment_frames: u32, // Number of CSI windows averaged into this node
pub environment_id: String, // SHA-256 of field model eigenstate at enrollment
pub confidence: f32, // HNSW search confidence against person_track index
}
```
Stored size: 128 × 4 = 512 bytes in SEG_VEC; JSON metadata ~200 bytes in SEG_META.
Per ADR-024 §2.8, the person re-identification target is >80% mAP at 5 subjects.
At 10+ subjects the accuracy is open research; baseline TBD.
#### Node: Cardiac_HR_Profile
Extracted from the ADR-039 vitals pipeline (magic `0xC511_0002`, fields offset 6-11:
breathing_rate at `u16 LE` BPM×100, heart_rate at `u32 LE` BPM×10000).
For the soul signature, cardiac extraction uses the ADR-021 bandpass pipeline
(0.8–2.0 Hz) over a minimum 30-second rest window.
```rust
pub struct CardiacHRProfileNode {
pub node_type: &'static str, // "Cardiac_HR_Profile"
pub baseline_bpm: f32, // mean HR over enrollment window (40–180 BPM range)
pub hrv_sdnn_ms: f32, // SDNN: std dev of R-R intervals (ms)
pub hrv_rmssd_ms: f32, // RMSSD: root mean square successive differences
pub hrv_lf_power: f32, // LF band power (0.04–0.15 Hz), normalized
pub hrv_hf_power: f32, // HF band power (0.15–0.4 Hz), normalized
pub hrv_lf_hf_ratio: f32, // LF/HF ratio (autonomic balance marker)
pub sinus_rhythm_class: u8, // 0=regular, 1=irregular, 2=indeterminate
pub confidence: f32, // from ADR-021 VitalCoherenceGate PERMIT fraction
pub window_seconds: u32, // duration of the measurement window
}
```
WiFi CSI-based HRV extraction is an active research area. The SDNN and RMSSD values
are discriminative at group level (Zhao et al. 2017, Widar 3.0 2019) but per-person
uniqueness has not been independently validated at scale. Status: open research.
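The two time-domain HRV fields have standard definitions that can be computed directly from a list of R-R intervals. The sketch below assumes the intervals are already extracted and given in milliseconds, and uses the sample (n-1) standard deviation for SDNN; function names are hypothetical.

```rust
/// SDNN: sample standard deviation of R-R intervals, in ms.
/// Returns None when fewer than two intervals are available.
pub fn sdnn_ms(rr_ms: &[f32]) -> Option<f32> {
    if rr_ms.len() < 2 {
        return None;
    }
    let n = rr_ms.len() as f32;
    let mean = rr_ms.iter().sum::<f32>() / n;
    let ss: f32 = rr_ms.iter().map(|x| (x - mean).powi(2)).sum();
    Some((ss / (n - 1.0)).sqrt())
}

/// RMSSD: root mean square of successive R-R differences, in ms.
pub fn rmssd_ms(rr_ms: &[f32]) -> Option<f32> {
    if rr_ms.len() < 2 {
        return None;
    }
    let ss: f32 = rr_ms.windows(2).map(|w| (w[1] - w[0]).powi(2)).sum();
    Some((ss / (rr_ms.len() - 1) as f32).sqrt())
}
```

For the intervals 800, 810, 790, 800 ms, SDNN is sqrt(200/3), about 8.16 ms, and RMSSD is sqrt(200), about 14.14 ms. A 30-second window yields only a few dozen intervals, so both estimates are noisy at enrollment.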
#### Node: Cardiac_Waveform_Morphology
Wavelet decomposition of the bandpass-filtered cardiac phase signal. Captures the
shape of the cardiac waveform, not just its rate. More discriminative than HR alone
but requires higher SNR and longer measurement window.
```rust
pub struct CardiacWaveformMorphologyNode {
pub node_type: &'static str, // "Cardiac_Waveform_Morphology"
pub vec_segment_id: u64, // references SEG_VEC: 64 f32 wavelet coefficients
pub wavelet_family: String, // "db4" (Daubechies 4, standard for cardiac)
pub decomposition_levels: u8, // 4 levels
pub snr_db: f32, // measured SNR at enrollment; low-SNR nodes down-weighted
pub confidence: f32,
}
```
Wavelet coefficient dimension: 64 floats = 256 bytes in SEG_VEC. Waveform
morphology from CSI is highly environment-dependent; the ADR-030 field model
subtraction must run before this measurement is taken to isolate body perturbation
from room standing-wave artifacts.
#### Node: Respiratory_Pattern
Extracted by the ADR-021 BreathingExtractor (0.1–0.5 Hz bandpass) plus the
ADR-030 persistence layer that accumulates statistics over the enrollment window.
```rust
pub struct RespiratoryPatternNode {
pub node_type: &'static str, // "Respiratory_Pattern"
pub baseline_bpm: f32, // mean RR (normal adult: 12–20 BPM)
pub depth_amplitude_normalized: f32, // tidal depth proxy from CSI variance
pub inspiration_expiration_ratio: f32, // I:E ratio (1:1.5 to 1:3 typical)
pub hrv_rsa_power: f32, // respiratory sinus arrhythmia spectral power
pub apnea_index: f32, // events per hour of significant pauses
pub waveform_regularity: f32, // coefficient of variation of breath intervals
pub confidence: f32,
pub window_seconds: u32,
}
```
Note: the `apnea_index` field is a biophysical proxy signal (pause events in
the signal), not a clinical AHI score. It is provided for signature
discriminability, not diagnostic use.
#### Node: Gait_Timing
Extracted from the 17-keypoint Kalman pose tracker (`pose_tracker.rs`, ADR-029
Sect 2.7) during the gait phase of the enrollment protocol. The tracker uses
ruvector-mincut for person separation and AETHER re-ID for identity continuity.
```rust
pub struct GaitTimingNode {
pub node_type: &'static str, // "Gait_Timing"
pub cadence_steps_per_min: f32, // steps per minute
pub stride_period_variance: f32, // coefficient of variation of stride period
pub double_support_pct: f32, // fraction of gait cycle in double support
pub asymmetry_index: f32, // |left_stride - right_stride| / mean_stride
pub step_width_m: f32, // lateral distance between foot strikes (proxy)
pub velocity_variance: f32, // gait speed variability
pub confidence: f32,
pub stride_count: u32, // number of strides captured during enrollment
}
```
Gait-based identification from WiFi CSI is documented in WiWho (Zeng et al., IPSN
2016) and WifiU (Wang et al., UbiComp 2016); WiGait (Hsu et al., CHI 2017)
extracts gait velocity and stride length from RF reflections. Discrimination
across 10+ subjects in the same household is an open research question for the
WiFi-only modality.
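Two of the `GaitTimingNode` fields follow directly from the formulas in the struct comments. The sketch assumes per-side mean stride periods in seconds and takes `mean_stride` to be the average of the two sides; both helper names are hypothetical.

```rust
/// asymmetry_index = |left_stride - right_stride| / mean_stride,
/// with mean_stride taken as the average of the two sides (assumption).
pub fn asymmetry_index(left_stride_s: f32, right_stride_s: f32) -> Option<f32> {
    let mean = (left_stride_s + right_stride_s) / 2.0;
    if mean <= 0.0 {
        return None;
    }
    Some((left_stride_s - right_stride_s).abs() / mean)
}

/// Coefficient of variation (sample std dev / mean), as used for
/// stride_period_variance. None for fewer than two samples or zero mean.
pub fn coefficient_of_variation(samples: &[f32]) -> Option<f32> {
    if samples.len() < 2 {
        return None;
    }
    let n = samples.len() as f32;
    let mean = samples.iter().sum::<f32>() / n;
    if mean == 0.0 {
        return None;
    }
    let ss: f32 = samples.iter().map(|x| (x - mean).powi(2)).sum();
    Some((ss / (n - 1.0)).sqrt() / mean)
}
```

Both quantities are dimensionless, which keeps them comparable across walking speeds.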
#### Node: Skeletal_Proportions
Derived from the ADR-079 camera + CSI paired keypoint pipeline when available,
or from CSI-only pose estimation (ADR-023 CsiToPoseTransformer) in camera-free
deployments. Encodes body geometry as ratios (not absolute values) for scale
invariance.
```rust
pub struct SkeletalProportionsNode {
pub node_type: &'static str, // "Skeletal_Proportions"
pub torso_to_leg_ratio: f32, // torso height / leg length
pub shoulder_to_hip_ratio: f32, // shoulder width / hip width
pub upper_to_lower_arm_ratio: f32, // upper arm / forearm
pub upper_to_lower_leg_ratio: f32, // thigh / shin
pub head_to_torso_ratio: f32, // head height / torso height
pub arm_span_to_height_ratio: f32, // Vitruvian ratio (close to 1.0 for most adults)
pub confidence: f32,
pub keypoint_source: String, // "camera_paired" | "csi_only" | "fused"
}
```
CSI-only skeletal proportion estimation has ~15–25% error on individual ratio
values (open research; baseline from ADR-023 MPJPE ~91.7 mm at best, per
Person-in-WiFi 3D, CVPR 2024). Camera-paired values (ADR-079) are substantially
more accurate. The node degrades gracefully when only CSI is available.
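Two of the ratios can be sketched from a 17-keypoint pose in COCO order (shoulders at indices 5 and 6, hips at 11 and 12, knees at 13 and 14, ankles at 15 and 16). The 3-D metre coordinates, the left/right averaging, and the function names are assumptions for illustration.

```rust
/// 17 COCO keypoints, each an (x, y, z) position in metres (assumption).
pub type Keypoints = [[f32; 3]; 17];

const L_SHOULDER: usize = 5;
const R_SHOULDER: usize = 6;
const L_HIP: usize = 11;
const R_HIP: usize = 12;
const L_KNEE: usize = 13;
const R_KNEE: usize = 14;
const L_ANKLE: usize = 15;
const R_ANKLE: usize = 16;

/// Euclidean distance between two keypoints.
fn dist(a: [f32; 3], b: [f32; 3]) -> f32 {
    ((a[0] - b[0]).powi(2) + (a[1] - b[1]).powi(2) + (a[2] - b[2]).powi(2)).sqrt()
}

/// Shoulder width / hip width. None if the hip width is degenerate.
pub fn shoulder_to_hip_ratio(kp: &Keypoints) -> Option<f32> {
    let hip = dist(kp[L_HIP], kp[R_HIP]);
    if hip <= f32::EPSILON {
        return None;
    }
    Some(dist(kp[L_SHOULDER], kp[R_SHOULDER]) / hip)
}

/// Thigh / shin, averaged over the left and right legs.
pub fn upper_to_lower_leg_ratio(kp: &Keypoints) -> Option<f32> {
    let thigh = (dist(kp[L_HIP], kp[L_KNEE]) + dist(kp[R_HIP], kp[R_KNEE])) / 2.0;
    let shin = (dist(kp[L_KNEE], kp[L_ANKLE]) + dist(kp[R_KNEE], kp[R_ANKLE])) / 2.0;
    if shin <= f32::EPSILON {
        return None;
    }
    Some(thigh / shin)
}
```

Because each output is a ratio of two lengths from the same pose, a uniform scale error in the pose estimate cancels out; independent per-joint errors do not.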
#### Node: Subcarrier_Reflection_Profile
The per-subcarrier amplitude attenuation and phase shift profile measured when
the subject stands still at three orientations (0°, 90°, 180° rotation). This
encodes the body's RF backscatter cross-section shape, which is determined by
body mass distribution, limb geometry, and clothing/material factors.
```rust
pub struct SubcarrierReflectionProfileNode {
pub node_type: &'static str, // "Subcarrier_Reflection_Profile"
pub vec_segment_id: u64, // SEG_VEC: 56 × 3 × 2 = 336 f32s
// (56 subcarriers × 3 orientations ×
// [amplitude_attenuation, phase_shift])
pub n_subcarriers: u8, // 56 (HT-LTF) or up to 242 (HE-LTF, ADR-110 C6)
pub n_orientations: u8, // 3
pub frequency_mhz: u32, // center frequency at measurement time
pub environment_id: String, // references field model used for subtraction
pub confidence: f32,
}
```
This node directly exploits the ADR-030 field model: the empty-room baseline
eigenstate is subtracted before computing the reflection profile, isolating the
person's contribution. Without ADR-030 field subtraction, the profile is too
environment-coupled to be transferable across rooms. With MERIDIAN (ADR-027),
the hardware-normalizer layer maps ESP32-S3 (52 subcarriers HT-LTF) and
ESP32-C6 (242 subcarriers HE-LTF per ADR-110) into a canonical 56-subcarrier
representation before this measurement.
Stored: 336 × 4 = 1,344 bytes in SEG_VEC.
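The 336-value vector needs an agreed flattening order. The sketch below assumes subcarrier-major layout (subcarrier, then orientation, then component), which this document does not specify; `profile_index` and the constants are hypothetical names.

```rust
pub const N_SUBCARRIERS: usize = 56;
pub const N_ORIENTATIONS: usize = 3; // 0°, 90°, 180°
pub const N_COMPONENTS: usize = 2; // 0 = amplitude_attenuation, 1 = phase_shift
pub const PROFILE_LEN: usize = N_SUBCARRIERS * N_ORIENTATIONS * N_COMPONENTS;

/// Flat index into the SEG_VEC payload for one measurement, assuming
/// subcarrier-major ordering. None when any coordinate is out of range.
pub fn profile_index(subcarrier: usize, orientation: usize, component: usize) -> Option<usize> {
    if subcarrier >= N_SUBCARRIERS || orientation >= N_ORIENTATIONS || component >= N_COMPONENTS {
        return None;
    }
    Some((subcarrier * N_ORIENTATIONS + orientation) * N_COMPONENTS + component)
}
```

Whatever order is chosen should be recorded in the node metadata, since two profiles flattened differently would compare as unrelated vectors.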
#### Node: Body_Field_Coupling
The AETHER attention map cells weighted by the ADR-030 room eigenmode structure.
Encodes how strongly the person's body couples to each dominant electromagnetic
mode of the room. This is the most physics-grounded node: it captures the
person's interaction with the actual electromagnetic geometry of the space.
```rust
pub struct BodyFieldCouplingNode {
pub node_type: &'static str, // "Body_Field_Coupling"
pub vec_segment_id: u64, // SEG_VEC: n_eigenmodes × n_keypoints f32s
pub n_eigenmodes: u8, // top-K SVD modes from field_model.rs (default K=8)
pub n_keypoints: u8, // 17 (COCO)
pub eigenmode_energy_fractions: Vec<f32>, // fraction of total variance per mode
pub environment_id: String, // must match SubcarrierReflectionProfile env
pub confidence: f32,
}
```
This node is only valid when the same room's field model is available. For
cross-room recognition, MERIDIAN's environment-disentangled embedding (ADR-027)
is used instead. The BodyFieldCoupling node provides additional discriminative
power in single-room deployments and degrades to optional in multi-room contexts.
---
### 3.3 Edge Types
Edges are stored in the SEG_SOUL_GRAPH JSON array. Each edge has a typed
relationship that constrains how the nodes may be used in matching.
| Edge type | Source node(s) | Target node(s) | Semantics |
|---|---|---|---|
| `derived_from` | FieldModel_Residual (implicit) | AetherEmbedding | The embedding was computed after field model subtraction |
| `correlates_with` | Cardiac_HR_Profile | Respiratory_Pattern | Cardiorespiratory coupling at measurement time; correlation coefficient stored as edge weight |
| `temporally_colocated` | Any pair | Any pair | Both nodes were measured in the same time window; ensures consistency |
| `temporally_after` | Post-gait node | Pre-gait node | Nodes acquired sequentially during enrollment protocol |
| `requires_field_model` | SubcarrierReflectionProfile | BodyFieldCoupling | Matching this node requires the same room's ADR-030 field model |
| `fuses` | AetherEmbedding | SubcarrierReflectionProfile | MERIDIAN-normalized fusion: both mapped to environment-invariant space |
| `attested_by` | Any leaf node | WitnessChain | Ed25519 witness covers this node's content hash |
| `derived_by_keypoint_tracker` | GaitTiming | SkeletalProportions | Both extracted from the same pose_tracker.rs output |
| `environment_normalized` | Any node with `environment_id` | MERIDIAN manifest | MERIDIAN (ADR-027) was applied; signature is cross-room capable |
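
As a rough illustration of how the edge-type strings in the table could be mapped to a typed enum when reading the SEG_SOUL_GRAPH JSON array, the sketch below uses `FromStr`. It assumes the JSON stores each edge type under exactly the snake_case names listed above. The enum mirrors the `SoulEdgeType` sketch in §3.7, and the `String` error type is chosen only for illustration.

```rust
use std::str::FromStr;

/// Edge types from the table above (mirrors the `SoulEdgeType` sketch in §3.7).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SoulEdgeType {
    DerivedFrom,
    CorrelatesWith,
    TemporallyColocated,
    TemporallyAfter,
    RequiresFieldModel,
    Fuses,
    AttestedBy,
    DerivedByKeypointTracker,
    EnvironmentNormalized,
}

impl FromStr for SoulEdgeType {
    // Assumption: a plain String error; a real implementation would likely
    // use a dedicated error type.
    type Err = String;

    /// Parse the snake_case edge-type name used in the table.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "derived_from" => Ok(Self::DerivedFrom),
            "correlates_with" => Ok(Self::CorrelatesWith),
            "temporally_colocated" => Ok(Self::TemporallyColocated),
            "temporally_after" => Ok(Self::TemporallyAfter),
            "requires_field_model" => Ok(Self::RequiresFieldModel),
            "fuses" => Ok(Self::Fuses),
            "attested_by" => Ok(Self::AttestedBy),
            "derived_by_keypoint_tracker" => Ok(Self::DerivedByKeypointTracker),
            "environment_normalized" => Ok(Self::EnvironmentNormalized),
            other => Err(format!("unknown edge type: {other}")),
        }
    }
}
```

Rejecting unknown strings, rather than silently skipping them, keeps a malformed or newer-schema graph from being matched with constraints the reader does not understand.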
---
### 3.4 The Aggregator vs. the Stored Profile
Two distinct graph instances exist in the runtime:
**Online Aggregator** — a mutable, in-memory graph that accumulates measurements
across multiple sensing windows. Nodes are incrementally updated with Welford
online statistics (`field_model.rs::WelfordStats`). Confidence fields grow toward
1.0 as more frames accumulate. The aggregator never writes to disk during
normal operation.
**Stored Profile** — an immutable, content-addressed `.rvf` file on disk. It is
generated from the aggregator at the end of the enrollment protocol, when all node
confidence fields exceed their minimum thresholds. The stored profile is the
canonical soul signature.
```
Online Aggregator (RAM)                 Stored Profile (disk / secure enclave)
+----------------------+ +---------------------------+
| AETHER_Embedding | enrollment | signature-<sha256>.rvf |
| accumulated over | completion | SEG_MANIFEST |
| 60-second protocol +-------------> | SEG_VEC (embedding + refl)|
| Confidence: 0.0→1.0 | when all | SEG_META (all node attrs) |
| | gates pass | SEG_EMBED (AETHER config) |
| Cardiac_HR_Profile | | SEG_WITNESS (Ed25519) |
| accumulated 30s rest | | SEG_SOUL_GRAPH (graph) |
+----------------------+ +---------------------------+
```
The aggregator pattern ensures that a partial scan (e.g., subject leaves after
20 seconds) never produces a stored profile — the quality gates prevent premature
commitment (see `scanning-process.md §5`).
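
The two mechanisms in this section, incremental Welford statistics and the all-gates-pass commit rule, can be sketched as follows. This is a minimal illustration under stated assumptions: `Welford` is a stand-in and may not match the actual `field_model.rs::WelfordStats` API, and `NodeGate` and `ready_to_commit` are hypothetical names. The real quality gates are defined in `scanning-process.md §5`.

```rust
/// Running mean and variance via Welford's online algorithm.
/// Illustrative stand-in; `field_model.rs::WelfordStats` may differ.
#[derive(Debug, Default, Clone)]
pub struct Welford {
    count: u64,
    mean: f64,
    m2: f64, // sum of squared deviations from the running mean
}

impl Welford {
    /// Fold one new sample into the running statistics.
    pub fn update(&mut self, x: f64) {
        self.count += 1;
        let delta = x - self.mean;
        self.mean += delta / self.count as f64;
        self.m2 += delta * (x - self.mean);
    }

    pub fn count(&self) -> u64 {
        self.count
    }

    pub fn mean(&self) -> f64 {
        self.mean
    }

    /// Sample variance; `None` until at least two samples have been seen.
    pub fn variance(&self) -> Option<f64> {
        if self.count < 2 {
            None
        } else {
            Some(self.m2 / (self.count - 1) as f64)
        }
    }
}

/// Hypothetical per-node gate: current confidence and its minimum threshold.
pub struct NodeGate {
    pub confidence: f32,
    pub min_confidence: f32,
}

/// True only when every node meets or exceeds its threshold.
/// An empty gate list never commits, so a scan with no nodes yields no profile.
pub fn ready_to_commit(gates: &[NodeGate]) -> bool {
    !gates.is_empty() && gates.iter().all(|g| g.confidence >= g.min_confidence)
}
```

Because Welford's update needs only the count, mean, and squared-deviation sum, the aggregator can keep per-node statistics in constant memory without retaining raw frames.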
---
### 3.5 Serialization
**Binary container:** RVF blob, per `rvf_container.rs`. All numeric data is
little-endian, f32 IEEE 754. Segment alignment: 64 bytes. CRC32 (IEEE 802.3
polynomial) over each segment payload.
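
To make the container rules concrete, here is a small sketch of the three pieces named above: little-endian f32 encoding, 64-byte alignment, and CRC32 with the IEEE 802.3 polynomial. The function names are hypothetical and `rvf_container.rs` remains the authority. In particular, whether padding bytes are zero and whether the CRC covers padding are not stated here, so this sketch only computes the padded length and checksums the unpadded payload.

```rust
/// Segment alignment in bytes, as stated in the serialization rules.
pub const SEGMENT_ALIGN: usize = 64;

/// CRC-32 with the IEEE 802.3 polynomial (reflected form 0xEDB88320),
/// initial value 0xFFFF_FFFF and final bit inversion. Bitwise for clarity;
/// a table-driven version would be faster.
pub fn crc32_ieee(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Encode f32 values as little-endian IEEE 754 bytes.
pub fn encode_f32_payload(values: &[f32]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Round a payload length up to the next multiple of `SEGMENT_ALIGN`.
pub fn padded_len(len: usize) -> usize {
    (len + SEGMENT_ALIGN - 1) / SEGMENT_ALIGN * SEGMENT_ALIGN
}
```

For example, a 128-dimensional f32 embedding encodes to 512 bytes, which is already a multiple of 64 and needs no padding.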
**Content addressing:** The file name is:
```
signature-<sha256-hex-of-rvf-bytes>.rvf
```
SHA-256 is computed over the complete RVF byte stream returned by
`RvfBuilder::build()`. This hash is distinct from the per-segment CRC32: the
CRC32 detects corruption within a segment, while the SHA-256 provides
content-based addressing and enables deduplication.
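
A minimal sketch of the naming rule, assuming the 32-byte digest has already been computed by a SHA-256 implementation (the Rust standard library does not provide one) and assuming lowercase hex. `to_hex`, `profile_file_name`, and `name_matches_hash` are hypothetical helper names.

```rust
/// Lowercase hex encoding of a byte slice.
pub fn to_hex(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        out.push_str(&format!("{b:02x}"));
    }
    out
}

/// Build `signature-<sha256-hex>.rvf` from a precomputed SHA-256 digest.
pub fn profile_file_name(content_hash: &[u8; 32]) -> String {
    format!("signature-{}.rvf", to_hex(content_hash))
}

/// Check that a file name agrees with the digest of the bytes it holds.
/// A mismatch means the file was renamed or its contents changed.
pub fn name_matches_hash(file_name: &str, content_hash: &[u8; 32]) -> bool {
    file_name == profile_file_name(content_hash)
}
```

Re-deriving the name from the file contents on load gives a cheap integrity check on top of the per-segment CRC32.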
**JSON-LD sidecar:** An optional `signature-<sha256>.json` file with the same
base name. Structure:
```json
{
  "@context": "https://ruv.net/soul-signature/v1",
  "schema_version": "0.1.0",
  "person_id": "<opaque_u64_hex>",
  "enrolled_at": "2026-05-24T00:00:00Z",
  "enrolled_by_device_id": "<mac_or_device_fingerprint>",
  "rvf_sha256": "<content_hash>",
  "nodes": [
    { "node_type": "AETHER_Embedding", "confidence": 0.92, ... },
    { "node_type": "Cardiac_HR_Profile", "confidence": 0.85, ... },
    ...
  ],
  "edges": [...],
  "witness": {
    "algorithm": "Ed25519",
    "public_key": "<hex>",
    "signature": "<hex>",
    "signed_fields": ["rvf_sha256", "enrolled_at", "enrolled_by_device_id"]
  }
}
```
The JSON-LD sidecar is human-readable and intended for audit and provenance.
It does not contain raw biometric vectors; those stay in the RVF blob.
**ISO/IEC 19794-4 alignment:** The soul signature's graph-based vector template
is conceptually analogous to the ISO/IEC 19794-4 finger image data format
and the ISO/IEC 19794-2 finger minutiae data format. The node/edge schema is
not binary-compatible with ISO/IEC 19794, but the design intent (typed
attribute records, quality scores, creator provenance) follows the same
principles as those standards. Future work may include a conformance layer if
regulatory certification is sought.
---
### 3.6 Matching Algorithm
Given a stored profile `P` and a query `Q` derived from a live sensing window,
the match score is the availability-weighted mean of per-channel cosine
similarities:
```
a_i         = availability(P.channel_i, Q.channel_i)
match_score = sum_i ( w_i * a_i * cosine_sim(P.channel_i, Q.channel_i) )
            / sum_i ( w_i * a_i )
```
Where `availability` is 1.0 if both nodes are present and 0.0 if either is
absent, so a channel that cannot be measured in the query window drops out of
both sums (graceful degradation).
Default weights (open research; these are design intent, not validated):
| Channel | Weight | Rationale |
|---|---|---|
| AETHER_Embedding | 0.35 | Primary identity anchor; best-studied channel |
| Subcarrier_Reflection_Profile | 0.20 | Body geometry; angle-stable |
| Cardiac_HR_Profile | 0.15 | Physiologically stable in healthy adults |
| Gait_Timing | 0.15 | Well-studied biometric; discriminative |
| Respiratory_Pattern | 0.10 | More variable than cardiac |
| Skeletal_Proportions | 0.05 | Proxy for body shape; CSI-only is noisy |
| Body_Field_Coupling | 0.10 (single-room, field model available) / 0.00 (cross-room) | Valid only when room field model available |
| Cardiac_Waveform_Morphology | 0.05 (supplementary) | High SNR requirement |
The threshold for a positive match is a deployment-specific parameter with a
documented FAR/FRR trade-off. For the AETHER channel alone, the target is >80%
mAP at 5 subjects (ADR-024 §2.8). The fused multi-channel score is expected to
exceed this; the exact improvement is open research, baseline TBD.
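
The weighted score described in this section can be sketched as below. This is a hedged illustration, not the planned `SoulSignature::match_score` signature: `ChannelPair` is a hypothetical type, each channel is represented as a flat `f32` slice, and a channel whose cosine similarity is undefined (length mismatch or zero norm) is treated as unavailable, which is an assumption of this sketch.

```rust
/// Cosine similarity of two vectors.
/// Returns `None` when lengths differ, a vector is empty, or a norm is zero.
pub fn cosine_sim(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Hypothetical view of one channel in the stored profile and the query.
/// `None` means the node is absent on that side.
pub struct ChannelPair<'a> {
    pub weight: f32,
    pub stored: Option<&'a [f32]>,
    pub query: Option<&'a [f32]>,
}

/// Weighted mean of per-channel cosine similarity over available channels.
/// Returns `None` when no channel is available on both sides.
pub fn match_score(channels: &[ChannelPair<'_>]) -> Option<f32> {
    let mut numerator = 0.0f32;
    let mut denominator = 0.0f32;
    for ch in channels {
        if let (Some(p), Some(q)) = (ch.stored, ch.query) {
            if let Some(sim) = cosine_sim(p, q) {
                numerator += ch.weight * sim;
                denominator += ch.weight;
            }
        }
    }
    if denominator > 0.0 {
        Some(numerator / denominator)
    } else {
        None
    }
}
```

Returning `None` instead of 0.0 when nothing is comparable lets the caller distinguish "no evidence" from "strong mismatch", which matters when the result is compared against a FAR/FRR threshold.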
---
### 3.7 Rust Type Sketch
The following sketch shows how the soul signature types would integrate with
the existing codebase. This is a design sketch, not implemented code.
```rust
// Proposed future location: v2/crates/wifi-densepose-sensing-server/src/soul_signature.rs

pub const SEG_SOUL_GRAPH: u8 = 0x10;
pub const SEG_SOUL_INDEX: u8 = 0x11;

/// Complete soul signature as a graph container.
pub struct SoulSignature {
    /// Content-addressed identifier: SHA-256 of the RVF blob bytes.
    pub content_hash: [u8; 32],
    /// Opaque person identifier (never PII directly).
    pub person_id: u64,
    /// Unix timestamp of enrollment completion (nanoseconds).
    pub enrolled_at_ns: u64,
    /// Device that performed enrollment.
    pub enrolled_by_device_id: String,
    /// All graph nodes, typed.
    pub nodes: SoulNodes,
    /// All graph edges.
    pub edges: Vec<SoulEdge>,
    /// Ed25519 witness chain (per ADR-110).
    pub witness: WitnessChain,
}

pub struct SoulNodes {
    pub aether_embedding: Option<AetherEmbeddingNode>,
    pub cardiac_hr: Option<CardiacHRProfileNode>,
    pub cardiac_waveform: Option<CardiacWaveformMorphologyNode>,
    pub respiratory: Option<RespiratoryPatternNode>,
    pub gait_timing: Option<GaitTimingNode>,
    pub skeletal_proportions: Option<SkeletalProportionsNode>,
    pub subcarrier_reflection: Option<SubcarrierReflectionProfileNode>,
    pub body_field_coupling: Option<BodyFieldCouplingNode>,
}

pub struct SoulEdge {
    pub edge_type: SoulEdgeType,
    pub source_node_type: String,
    pub target_node_type: String,
    pub weight: f32, // edge attribute (e.g., correlation coefficient)
}

pub enum SoulEdgeType {
    DerivedFrom,
    CorrelatesWith,
    TemporallyColocated,
    TemporallyAfter,
    RequiresFieldModel,
    Fuses,
    AttestedBy,
    DerivedByKeypointTracker,
    EnvironmentNormalized,
}

impl SoulSignature {
    // Bodies are left as `todo!()`: this is a design sketch, not implemented code.

    /// Serialize to an RVF binary blob.
    pub fn to_rvf(&self) -> Vec<u8> {
        todo!()
    }

    /// Deserialize from an RVF binary blob.
    pub fn from_rvf(data: &[u8]) -> Result<Self, SoulError> {
        todo!()
    }

    /// Compute the weighted match score against a query.
    pub fn match_score(&self, query: &SoulQuery, weights: &MatchWeights) -> f32 {
        todo!()
    }

    /// Check whether all required nodes meet minimum confidence thresholds.
    pub fn is_complete(&self, policy: &CompletenessPolicy) -> bool {
        todo!()
    }
}
```
---
### 3.8 What the Signature Is NOT
- Not a fingerprint of the room (that is the ADR-030 field model, a separate object).
- Not a waveform recording (the enrolled vectors are statistics and embeddings, not raw CSI).
- Not invertible to the original CSI stream (the AETHER projection head's information bottleneck prevents reconstruction; see ADR-024 §4 Negative consequences).
- Not a single scalar. Reducing to one number for threshold comparison is a deployment decision; the underlying object is a 7-channel graph.
- Not equal to a stored pose. The AETHER embedding captures body dynamics over many windows, not a single body pose at one instant.