368 lines
18 KiB
Markdown
368 lines
18 KiB
Markdown
# Soul Signature — Security, Privacy, and Threat Model
|
||
|
||
**Status:** Research Specification (Pre-Implementation)
|
||
**Date:** 2026-05-24
|
||
**Author:** ruv
|
||
|
||
---
|
||
|
||
## 1. Scope
|
||
|
||
This document defines the threat model, mitigations, cryptographic primitive
|
||
choices, privacy architecture, and open security research items for the Soul
|
||
Signature system. It is intended to be reviewed by a security engineer or
|
||
privacy counsel before any production deployment.
|
||
|
||
The soul signature is a passive biometric system. The security bar is:
|
||
**attacker cost to achieve a false accept must exceed the value of the
|
||
protected resource for the relevant threat model**. The soul signature does
|
||
not claim to be unbreakable. It claims to be hard enough.
|
||
|
||
---
|
||
|
||
## 2. What We Explicitly Do NOT Claim
|
||
|
||
- Not equal to fingerprint scanners on FBI-tier datasets in EER terms. RF
|
||
biometrics are a younger discipline. No independent benchmark with the soul
|
||
signature's specific multi-channel fusion exists yet.
|
||
- Not legal evidence. Passive RF biometric identification has no established
|
||
legal precedent in any jurisdiction.
|
||
- Not a replacement for explicit consent in regulated contexts (healthcare,
|
||
employment, border control).
|
||
- Not unbreakable under a nation-state adversary with full physical access to
|
||
the sensing infrastructure.
|
||
- Not validated at scale beyond the constituent ADR baselines. The AETHER
|
||
channel (ADR-024) targets >80% mAP at 5 subjects; at 100+ subjects the
|
||
false-accept rate is open research.
|
||
|
||
---
|
||
|
||
## 3. Threat Model
|
||
|
||
### 3.1 Attacker: Passive Eavesdropper on the WiFi Medium
|
||
|
||
**Capability:** An attacker near the WiFi sensing zone can observe CSI of any
|
||
person who passes through. With enough CSI, the attacker could construct an
|
||
unauthorized soul signature enrollment of an unconsenting bystander.
|
||
|
||
**Impact:** Unauthorized enrollment → unauthorized recognition → attribution of
|
||
presence to a person who did not consent.
|
||
|
||
**Mitigation:**
|
||
- Ambient CSI capture does NOT trigger enrollment. Enrollment requires the
|
||
explicit 60-second structured protocol. Ambient bystander CSI produces
|
||
`unauthenticated` pose tracks tagged as `person_id: NULL`.
|
||
- Unauthenticated RVF nodes are pruned from the HNSW index after 24 hours.
|
||
- The enrollment protocol requires presence confirmation from at least two
|
||
sensing nodes simultaneously, making drive-by enrollment geometrically
|
||
harder to achieve without physical proximity.
|
||
|
||
**Residual risk:** An attacker who can be physically present in the scanning
|
||
zone for 60 seconds, under the observation of the scanning protocol, can cause
|
||
enrollment of a fake person. This requires physical co-location and is
|
||
equivalent to the threat model for any in-person biometric registration.
|
||
|
||
### 3.2 Attacker: Active Replay
|
||
|
||
**Capability:** An attacker records a CSI stream from a legitimate enrollment
|
||
or recognition event and replays it to a sensing node to impersonate the
|
||
enrolled person.
|
||
|
||
**Impact:** False positive recognition; unauthorized access or presence attribution.
|
||
|
||
**Mitigation:**
|
||
- Each enrollment is bound to the room's ADR-030 field model eigenstate at
|
||
enrollment time. The `environment_id` field in every vector node is a
|
||
SHA-256 of the field model's eigenmode matrix. A replay in a different room
|
||
produces a different `environment_id` and a dramatically different
|
||
Subcarrier_Reflection_Profile — the cross-validation between these two
|
||
signed fields fails.
|
||
- The Ed25519 witness chain (ADR-110) includes a monotonic timestamp
|
||
(`timestamp_ns`). A replay of an old signature is detected by the timestamp
|
||
freshness check at recognition time (configurable; default: reject any
|
||
signature older than 7 days for high-assurance contexts).
|
||
- The ADR-030 field model continuously updates. Even if the replay is in the
|
||
same room, the field model's eigenstate changes as furniture is moved or
|
||
temperature shifts the propagation medium; cross-validation degrades over
|
||
time.
|
||
|
||
**Residual risk:** Replay within the same room within a short time window
|
||
(< 4 hours, before the field model rotates) by an attacker who has recorded the
|
||
original CSI with high fidelity remains a plausible attack vector. This is not
|
||
defended against by the current architecture. It requires a future ADR for
|
||
challenge-response liveness detection.
|
||
|
||
### 3.3 Attacker: Phased-Array Vest / RF Body Emulator
|
||
|
||
**Capability:** An attacker wears a device capable of emitting RF signals that
|
||
mimic another person's backscatter profile, allowing them to be recognized as
|
||
the enrolled person.
|
||
|
||
**Impact:** The strongest impersonation attack; if successful, bypasses all
|
||
electromagnetic biometric channels simultaneously.
|
||
|
||
**Mitigation:**
|
||
- The RuvSense `adversarial.rs` module (ADR-030 Tier 7) enforces four
|
||
physics-based consistency checks:
|
||
1. Multi-link consistency: a real body perturbs all mesh links passing
|
||
through its location. A vest emitting signals affects only the targeted
|
||
link(s). Detection: at least 4 links must show correlated perturbation.
|
||
2. Field model constraints: the perturbation must lie within the span of
|
||
the room's eigenmode structure. Artificially injected signals produce
|
||
perturbations inconsistent with room geometry.
|
||
3. Temporal continuity: real movement is smooth in embedding space; injected
|
||
signals can produce discontinuities flagged by the embedding velocity
|
||
monitor.
|
||
4. Energy conservation: total perturbation energy across all links must be
|
||
consistent with the number and geometry of bodies present.
|
||
- The adversarial detector fires `FAIL_ADVERSARIAL_SIGNAL` before the soul
|
||
signature match is considered.
|
||
|
||
**Residual risk:** A sophisticated attacker with a calibrated phased-array
|
||
system who also knows the room's eigenmode structure and the enrolled person's
|
||
exact multi-link backscatter pattern could in principle construct a convincing
|
||
emulation. This is a high-capability, high-cost attack. Practical countermeasure:
|
||
require multi-node confirmation (ADR-029 multistatic) which raises the
|
||
geometric complexity of the emulation exponentially with node count.
|
||
|
||
### 3.4 Attacker: Insider with Broker Access
|
||
|
||
**Capability:** A privileged operator or compromised service with read access
|
||
to the stored `.rvf` files and the HNSW person_track index.
|
||
|
||
**Impact:** Exfiltration of biometric signatures; linkage of person_id to PII
|
||
if linkage tables also accessible; replay or cross-site re-enrollment.
|
||
|
||
**Mitigation:**
|
||
- At-rest encryption: all `.rvf` files are encrypted with ChaCha20-Poly1305
|
||
using a key derived via Argon2id from a user-provided passphrase (or a FIDO2
|
||
hardware token binding). The Cognitum Seed appliance NEVER stores the
|
||
decryption key; it is re-derived from the passphrase on each access.
|
||
- The opaque `person_id` (u64) in the `.rvf` file is not PII. PII linkage, if
|
||
any, requires access to a separate application-layer database not stored on
|
||
the sensing appliance.
|
||
- The HNSW index stores only the 128-dim AETHER embedding, not raw CSI or full
|
||
soul signatures. Exfiltration of the index exposes the embedding but not the
|
||
full biometric record.
|
||
- Differential privacy (ADR-106 DP-SGD) applies at training time when AETHER
|
||
is fine-tuned on enrolled-person data, preventing membership inference attacks
|
||
that could recover training samples from model weights.
|
||
|
||
**Residual risk:** If the passphrase is weak or the FIDO2 token is compromised,
|
||
the at-rest encryption fails. Key management is a deployment responsibility.
|
||
|
||
### 3.5 Attacker: Manufacturer / Firmware Supply Chain
|
||
|
||
**Capability:** A malicious firmware update to the ESP32 node or Cognitum Seed
|
||
appliance could silently exfiltrate soul signatures or CSI streams.
|
||
|
||
**Impact:** Large-scale passive surveillance; biometric data exfiltration across
|
||
all installed appliances.
|
||
|
||
**Mitigation:**
|
||
- All firmware releases are signed with Ed25519 (ADR-100 cog packaging) and
|
||
verified by the appliance before installation. A Dilithium-3 post-quantum
|
||
co-signature is added in the transition window (ADR-109).
|
||
- The Ed25519 witness chain (ADR-110) signs each CSI frame bundle at the
|
||
sensor level. A firmware change that alters the witness chain is detectable
|
||
by downstream audit.
|
||
- Network egress from the Cognitum Seed in `--privacy-mode` is blocked for
|
||
raw CSI and soul signatures by default. Only MQTT auto-discovery messages
|
||
(ADR-115) and OTA metadata are permitted outbound.
|
||
- Open-source firmware. The ESP32 firmware and Cognitum Seed Rust crates are
|
||
open source (this repository). Independent audit is possible.
|
||
|
||
**Residual risk:** A zero-day exploit in the ESP-IDF WiFi stack or the Rust
|
||
codebase could bypass these controls. This is mitigated by regular security
|
||
audits (run `npx @claude-flow/cli@latest security scan` per CLAUDE.md) but not
|
||
eliminated.
|
||
|
||
---
|
||
|
||
## 4. Consent Architecture
|
||
|
||
### 4.1 The Enrollment-vs-Recognition Distinction
|
||
|
||
The soul signature system enforces a hard distinction:
|
||
|
||
| Action | Consent required | Mechanism |
|
||
|---|---|---|
|
||
| Enrollment | Explicit, active | 60-second protocol with operator confirmation; produces signed `.rvf` |
|
||
| Recognition of enrolled person | Implicit (enrollment = consent for recognition) | Continuous mode; HNSW match |
|
||
| Ambient sensing of unenrolled person | No — but data is transient and pruned | Unauthenticated tracks; 24h TTL |
|
||
| Updating stored profile from continuous mode | Implicit (set at enrollment time) | Aggregator auto-refresh; configurable |
|
||
|
||
The system operator is responsible for obtaining appropriate consent from
|
||
persons before performing enrollment. The technical system enforces that
|
||
enrollment cannot happen accidentally or from drive-by sensing.
|
||
|
||
### 4.2 Bystander Protection
|
||
|
||
Persons who pass through a sensing zone without being enrolled are sensed but
|
||
not persistently identified. Their data flow:
|
||
1. Pose tracker produces a track tagged `person_id: NULL`.
|
||
2. AETHER embedding is computed for motion detection and occupancy counting
|
||
(ADR-115 HA-MIND).
|
||
3. The embedding is written to the `temporal_baseline` HNSW index with a 24-hour
|
||
TTL and `authenticated: false`.
|
||
4. After 24 hours, the entry is automatically pruned by the `EmbeddingIndex::prune()`
|
||
method (ADR-024 §2.4).
|
||
5. No `.rvf` file is created. No persistent record exists.
|
||
|
||
This architecture satisfies the GDPR principle of data minimization (Article 5(1)(c))
|
||
for bystander data: the retention period is bounded, the data is not linked to
|
||
an identity, and the storage is proportionate to the functional purpose
|
||
(occupancy counting).
|
||
|
||
### 4.3 GDPR / HIPAA Mode
|
||
|
||
When `--privacy-mode enabled` (from ADR-115 HA-MIND §privacy):
|
||
|
||
1. Soul signatures are computed and stored locally only. They are NEVER
|
||
published to MQTT topics, Matter clusters, or any external endpoint.
|
||
2. The local REST API for accessing soul signatures requires a valid bearer
|
||
token (ADR-028 bearer_auth.rs). No unauthenticated endpoint exposes
|
||
biometric data.
|
||
3. The JSON-LD sidecar is written to the local encrypted store only. It is not
|
||
included in MQTT auto-discovery payloads.
|
||
4. The longitudinal drift metrics (ADR-030 Tier 4) are published to MQTT in
|
||
aggregated form only (e.g., `drift_detected: true`, never raw metric values
|
||
that could be used for medical inference).
|
||
5. A data deletion endpoint must be implemented: `DELETE /api/v1/persons/{id}`
|
||
removes the `.rvf` file, the HNSW index entry, the JSON-LD sidecar, and all
|
||
longitudinal Welford statistics for that person_id.
|
||
|
||
---
|
||
|
||
## 5. Cryptographic Primitives
|
||
|
||
All primitives are chosen from NIST-approved or widely-audited standards.
|
||
|
||
| Purpose | Primitive | Rationale |
|
||
|---|---|---|
|
||
| Content integrity (per-segment) | CRC32 (IEEE 802.3) | Already implemented in `rvf_container.rs:line 70`. Corruption detection, not security. |
|
||
| Content addressing | SHA-256 | File name derivation; pre-image resistance prevents name collisions |
|
||
| Ed25519 signatures | Ed25519 (RFC 8032) | ADR-110 witness chain; 64-byte signatures; 128-bit security |
|
||
| At-rest encryption | ChaCha20-Poly1305 (RFC 8439) | AEAD; software-friendly; no timing-attack surface like AES-CBC; 256-bit key |
|
||
| Key derivation from passphrase | Argon2id (RFC 9106) | Memory-hard KDF; resistant to GPU/ASIC brute-force; recommended by NIST SP 800-132 draft (2024) |
|
||
| DP-SGD noise | Gaussian N(0, σ²C²I) per ADR-106 | (ε, δ)-DP per Abadi et al. 2016 Moments Accountant |
|
||
| Post-quantum key exchange (future) | Kyber-768 (NIST FIPS 203, 2024) | ADR-108; ~AES-192 security; NIST CNSA 2.0 recommended |
|
||
| Post-quantum signatures (future) | Dilithium-3 (NIST FIPS 204, 2024) | ADR-109; hybrid mode with Ed25519 during transition window |
|
||
|
||
### 5.1 Argon2id Parameters
|
||
|
||
Default parameters for soul signature key derivation:
|
||
|
||
```
|
||
m_cost = 65536 (64 MB memory)
|
||
t_cost = 3 (3 iterations)
|
||
p_cost = 4 (4 parallel lanes)
|
||
output_len = 32 bytes (256-bit key for ChaCha20-Poly1305)
|
||
salt = 16 random bytes stored alongside encrypted blob (NOT the person_id)
|
||
```
|
||
|
||
These parameters provide ~100ms KDF time on a Pi 5, which is acceptable for
|
||
enrollment (one-time) and recognition (HNSW match precedes decryption, so
|
||
decryption is only triggered after a candidate match).
|
||
|
||
### 5.2 Forward Secrecy
|
||
|
||
Old soul signature files are NOT keys for new ones. Compromise of a 90-day-old
|
||
`.rvf` file does not unlock the current profile. The key is derived from the
|
||
user's passphrase each time, not derived from the previous file.
|
||
|
||
Archived files (kept for audit purposes) are re-encrypted on passphrase rotation
|
||
if the operator elects to do so via the `soul-signature re-encrypt --all` CLI
|
||
command (not yet implemented; specified here for future ADR).
|
||
|
||
---
|
||
|
||
## 6. Privacy Mode Integration (ADR-115)
|
||
|
||
The `--privacy-mode` flag defined in ADR-115 HA-MIND §9 is extended to cover
|
||
soul signature data:
|
||
|
||
| Privacy mode | MQTT publish | REST API | Local storage | HNSW index |
|
||
|---|---|---|---|---|
|
||
| `disabled` (default for home users) | Aggregated presence/count only | Authenticated bearer required | Encrypted at rest | Local only |
|
||
| `enabled` | Nothing biometric | Authenticated bearer required | Encrypted at rest | Local only |
|
||
| `research` (explicit opt-in) | Full soul signature nodes (anonymized person_id) | Open (for research deployments only) | Encrypted at rest | Exportable |
|
||
|
||
The `research` mode requires a separate `--research-consent-token` flag and is
|
||
intended for academic data collection under IRB approval. It must never be the
|
||
default.
|
||
|
||
---
|
||
|
||
## 7. Open Research and Outstanding Security Work
|
||
|
||
The following items are known security gaps or open research questions. Each
|
||
warrants a future ADR before production deployment at scale.
|
||
|
||
**7.1 Challenge-Response Liveness Detection**
|
||
Replay attacks within a short time window (see §3.2 residual risk) are not
|
||
defended against. A future mechanism should issue a random challenge (e.g.,
|
||
"please raise your left hand") and verify the CSI response matches the challenge
|
||
before accepting a recognition. This eliminates replay as a practical attack
|
||
vector. Future ADR: ADR-120 (proposed).
|
||
|
||
**7.2 False-Accept Rate at Scale (N > 20 subjects)**
|
||
The AETHER baseline (ADR-024) is tested at 5 subjects (>80% mAP). For household
|
||
deployments this is sufficient. For building-scale deployments (50-500 subjects),
|
||
the FAR is open research. Independent benchmarking on a dataset of 20+ subjects
|
||
with the full 7-channel fusion is required before building-scale deployment can
|
||
be recommended. Publication target: co-locate with ADR-027 MERIDIAN evaluation.
|
||
|
||
**7.3 Side-Channel Leakage from Encrypted RVF Files**
|
||
The file size of an encrypted `.rvf` blob is observable by an attacker with
|
||
filesystem access. File size is a function of the number of nodes present, which
|
||
reveals whether the cardiac channel was captured (high-SNR enrollment vs
|
||
low-SNR enrollment). This is a minor information leak. Mitigation: pad all
|
||
`.rvf` files to a fixed 64 KB boundary. Future ADR: append to ADR-106.
|
||
|
||
**7.4 Membership Inference in Continuous Mode**
|
||
In continuous mode, the AETHER model is fine-tuned on the enrolled person's
|
||
data over months. An adversary with access to the model weights before and after
|
||
a re-train cycle could infer that a specific enrollment occurred, even without
|
||
the soul signature file, via membership inference (Shokri et al. 2017).
|
||
ADR-106 DP-SGD mitigates this for federation round deltas but not for local
|
||
single-device fine-tuning. Extension of DP-SGD to the local continuous-mode
|
||
update is required. Future ADR: extend ADR-106.
|
||
|
||
**7.5 Physical Access to Sensing Nodes**
|
||
An attacker with physical access to an ESP32 node can extract the firmware and
|
||
attempt to reverse the Ed25519 signing key (if the key is stored in ESP32
|
||
NVS without protection). ADR-110 uses NVS for key storage. A future ADR should
|
||
mandate secure element storage (e.g., ATECC608A co-processor on the Cognitum
|
||
Seed) for the signing key. Future ADR: ADR-121 (proposed).
|
||
|
||
**7.6 Federated Learning Linkability**
|
||
When AETHER is retrained via federated learning (ADR-105), the LoRA weight
|
||
deltas carry information about enrolled persons. ADR-106 applies DP-SGD to
|
||
these deltas, but the post-quantum migration path (ADR-108 Kyber-768) is not
|
||
yet integrated with the federation protocol. Until ADR-108 Phase 2 ships, the
|
||
federation link is classically encrypted and vulnerable to harvest-now-decrypt-later
|
||
attacks by quantum-capable adversaries. Assessed risk: low until 2027.
|
||
|
||
---
|
||
|
||
## 8. Summary Security Properties Table
|
||
|
||
| Property | Status | Evidence |
|
||
|---|---|---|
|
||
| At-rest encryption | Specified (ChaCha20-Poly1305 + Argon2id) | This document §5 |
|
||
| Ed25519 attestation | Implemented | ADR-110 witness chain |
|
||
| Replay resistance (cross-room) | Implemented | ADR-030 field model environment_id binding |
|
||
| Replay resistance (same-room, short window) | Open gap | §7.1 |
|
||
| Anti-spoofing (single-link injection) | Implemented | adversarial.rs multi-link consistency |
|
||
| Anti-spoofing (phased-array vest) | Partial | adversarial.rs + energy conservation; residual risk documented |
|
||
| Bystander protection | Specified | 24h TTL on unauthenticated tracks; §4.2 |
|
||
| DP-SGD training privacy | Implemented (federation) | ADR-106 |
|
||
| DP-SGD training privacy (local continuous mode) | Open gap | §7.4 |
|
||
| GDPR data deletion | Specified | §4.3 `DELETE /api/v1/persons/{id}` |
|
||
| Post-quantum migration path | Specified (Kyber-768, Dilithium-3) | ADR-108, ADR-109 |
|
||
| Firmware supply chain integrity | Implemented (Ed25519 cog signing) | ADR-100, ADR-109 hybrid |
|
||
| False-accept rate at scale | Open research | §7.2 |
|
||
| Liveness detection | Open gap | §7.1 |
|
||
| Secure element key storage | Open gap | §7.5 |
|