diff --git a/CHANGELOG.md b/CHANGELOG.md index acd094db..0e5bc9a9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Security - **`ruview-swarm` beyond-SOTA security + correctness review (ADR-148 drone swarm control plane; needs ADR slot 176) — 4 real fail-open / DoS bugs fixed in the NaN-state-poisoning class, each pinned fails-on-old / passes-on-new; 5 dimensions confirmed clean with evidence.** The shared theme is **IEEE-754 NaN/Inf silently defeating a safety comparison** on data that crosses the untrusted swarm-comm trust boundary (`SwarmOrchestrator::receive_peer_state` / `receive_peer_detection` accept full `DroneState`/`CsiDetection` whose f64/f32 fields deserialize with no finite-check; the integer-encoded MAVLink wire formats in `mavlink_messages.rs` cannot carry NaN, but the serde struct path can). **(1) HIGH — `failsafe::FailSafeMachine::tick` collision-avoidance + battery fail-open** (`failsafe/mod.rs:51,75`). `nearest_neighbor_dist < collision_dist_m` and `battery_pct <= rth_pct` both evaluate `false` for a NaN operand, so a poisoned peer position (→ NaN `nearest_peer_distance` via `Position3D::distance_to`) **silently disabled collision avoidance** and a NaN battery reading kept a drone Nominal — the worst failure for a physical airframe. Fixed to fail CLOSED (`!is_finite() ||` → `EmergencyDiverge` / `ReturnToHome`). MEASURED fails-on-old: `test_nan_neighbor_distance_fails_closed_to_diverge` / `test_nan_battery_fails_closed_to_rth` both returned `Nominal` pre-fix. **(2) MEDIUM — `security::geofence::Geofence::check` NaN-altitude bypass** (`security/geofence.rs:33`). A NaN `z` (altitude) with valid x/y skipped the altitude breach (`NaN < min || NaN > max` = `false`) and returned **`Safe`** through the point-in-polygon path — a silent geofence bypass. Fixed with a leading non-finite-coordinate → `HardBreach` guard. MEASURED fails-on-old: `test_nan_altitude_fails_closed` returned `Safe` pre-fix. **(3) MEDIUM/DoS — `security::antijamming::FhssRadio` `% 0` panic on empty `channels_mhz`** (`security/antijamming.rs:65,71,102`). `FhssConfig` is `Deserialize`; an empty channel list (malformed/hostile config) made `next_hop`/`current_channel_mhz`/`evasive_hop`/`tick` panic with `remainder with a divisor of zero`, crashing the radio task. Fixed with `len == 0` early-returns (benign `0.0` sentinel). MEASURED fails-on-old: `test_empty_channels_does_not_panic` **panicked** (`divisor of zero`) pre-fix. **(4) LOW — `sensing::multiview::MultiViewFusion::fuse` NaN victim-position propagation** (`sensing/multiview.rs:70`). A NaN `victim_position` passed the `is_some()` filter and propagated through the confidence-weighted average into the fused "confirmed victim" location dispatched to the swarm. Fixed by requiring finite `confidence` + finite position components (fail-closed drop). MEASURED fails-on-old: `test_nan_victim_position_dropped_from_fusion` produced a non-finite fused position pre-fix. **Dimensions confirmed clean (with evidence):** (a) **MAVLink decode panic-safety** — `SwarmNodeState::decode(&[u8;20])` `try_into().unwrap()`s are over fixed const ranges of a fixed-size array (provably infallible; no arbitrary-length `&[u8]` path exists). (b) **UWB GPS anti-spoofing is NaN-safe** — `(gps_dist - uwb_dist).abs() <= tol` already fails CLOSED on a NaN range/position (counts as inconsistent → spoof rejected), verified by reasoning + existing `test_spoofed_gps_invalid`. (c) **Bounded grid / no allocation-from-length-field** — `ProbabilityGrid::update_bayesian`/`mark_scanned` bounds-check `cx >= width || cy >= height`; `pos_to_cell` uses saturating `as u32` (Rust `as` saturates, no UB). (d) **Mesh `nearest_k` NaN-safe sort** — `partial_cmp(..).unwrap_or(Equal)` cannot panic on NaN distances. (e) **No hardcoded secrets** — `MavlinkSigner` key is constructor-injected (`[u8;32]`), nothing embedded. **Documented-not-fixed (for ADR-176, not churned to avoid test-rewrite risk):** (i) **Raft `AppendEntries` lacks the Log-Matching consistency check** (`topology/raft.rs:187`) — a follower appends leader entries on `term >= current_term` without validating `prev_log_index`/`prev_log_term`, so a malformed/byzantine leader can corrupt a follower's log (a genuine consensus-safety gap; vote tallying is also delegated to the caller per the existing `handle_message` comment). (ii) **`MavlinkSigner::verify` uses a non-constant-time tag `==` and has no replay/timestamp-window rejection** (`security/mavlink_signing.rs:64`) — the doc comment already flags the replay limitation as a known demo/test simplification. `cargo test -p ruview-swarm --no-default-features`: **117 → 123 passed, 0 failed** (+6 pins). Workspace green; Python deterministic proof unchanged (`f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a`, bit-exact — `ruview-swarm` is off the signal proof path). +- **`nvsim` (ADR-089 NV-diamond magnetometer simulator) beyond-SOTA security review — two real degenerate-input findings fixed (config-induced panic/DoS + NaN-state-poisoning silent-corruption), each pinned by a fails-on-old test; determinism integrity, panic-free deserialisation, and RNG-seeding confirmed clean with evidence. Needs ADR slot 177.** Beyond-SOTA review of the standalone WASM-ready forward-only pipeline (`scene → source → propagation → NV ensemble → digitiser → MagFrame + SHA-256 witness`). The real risk surface is degenerate physical-parameter input (the scene + config are the external boundary, especially via the WASM `config_json`/`scene_json` entry points). **Finding 1 — NVSIM-DT-01 (config-induced panic / DoS, MEDIUM; `pipeline.rs:58,95`).** `dt` was derived as `config.dt_s.unwrap_or(1.0 / f_s_hz)`; an externally-supplied `f_s_hz == 0.0` makes `dt == +Inf`, `(dt*1e6) as u64` saturates to `u64::MAX`, and `(sample as u64) * dt_us` then **panics `attempt to multiply with overflow`** for `sample >= 2` (MEASURED: probe panicked at `pipeline.rs:95:30` under the debug/test profile; in `panic=abort` WASM this aborts the module, in release it silently wraps `t_us` to garbage). **Fixed** by sanitising `dt` (non-finite/non-positive → 1 µs fallback), capping the `u64` cast at `u64::MAX`, and using `saturating_mul` for the timestamp so no config can ever overflow it. **Finding 2 — NVSIM-NAN-01 (NaN-state-poisoning silent corruption, MEDIUM; funnel at `digitiser.rs::adc_quantise`).** A non-finite scene parameter (e.g. a `NaN`/`Inf` dipole **position**, `Inf` **moment**, or `NaN` loop **radius**) flows through `scene_field_at` and **bypasses the near-field clamp** — `NaN < R_MIN_M` is `false`, so the `1/r³` path is taken and produces a `NaN`/`Inf` field (MEASURED: `b=[NaN,NaN,NaN], sat=false`). At the ADC that non-finite value hit the `else` branch and **`NaN as i32 == 0`** (Rust saturating cast), emitting a frame with `b_pt=[0,0,0]` and **the `ADC_SATURATED` flag CLEAR** — a frame *indistinguishable from a legitimate zero-field reading* (MEASURED: `b_pt=[0.0,0.0,0.0] flags=0b0000`). This is the same NaN-poisoning class flagged across the calibration/vitals crates; the `propagation` module already guards its NaN paths, but the source→ADC path did not. **Fixed** at the single funnel point: `adc_quantise` now treats any non-finite input as out-of-range → clamps to code `0` **and raises the saturation flag**, so the corruption is visible downstream (the pipeline's existing `adc_sat` OR-reduction propagates `ADC_SATURATED` onto the frame). **Dimensions confirmed clean (with evidence):** (1) **Determinism integrity — clean.** The only RNG is `ChaCha20Rng::seed_from_u64(seed)` fully seeded from the caller's `u64` (grep: one `seed_from_u64`, zero `thread_rng`/`getrandom`/`SystemTime`/`Instant`/`HashMap`); Cargo.toml pins `rand`/`rand_chacha` with `default-features=false` (no OS-entropy path). Box–Muller draws from `gen_range(f64::EPSILON..=1.0)` (avoids `ln(0) = -Inf` by construction). Frame serialisation is fixed little-endian; source summation order is fixed by `Vec` order. The published cross-machine witness `cc8de9b0…93b4` (`proof::tests::proof_witness_publishes_a_known_value`) **still passes unchanged** after both fixes — the happy-path output is byte-identical, confirming the guards only affect degenerate inputs. (Attested caveat, not a finding: libm `cos`/`ln`/`sqrt` *could* differ x86↔wasm; witness is documented as x86_64-captured.) (2) **Panic-free deserialisation — clean.** `MagFrame::from_bytes` validates `len`/magic/version, then the per-field `buf[a..b].try_into().expect(...)` calls are over **fixed sub-ranges of an already-length-checked 60-byte buffer** → provably infallible, not reachable panic vectors. No `unsafe`, no `panic!`/`unreachable!` in production code; every other `unwrap`/`expect` is `#[cfg(test)]`. (3) **Div-by-zero — clean.** `dipole_field`/`current_loop_field` clamp `r_norm < R_MIN_M` (1 mm) before the `1/r³`/`1/r²` divide (finite inputs); `shot_noise_floor` guards `denom <= 0.0 → f64::INFINITY`; `vec3_normalise` guards `n < 1e-20`. (The only gap was the NaN case that *bypasses* the `r_norm` clamp — fixed at the ADC funnel above.) **Pinning tests (fails-on-old / passes-on-new, MEASURED):** `pipeline::degenerate_zero_sample_rate_does_not_panic` (panicked on old code; now finite frames), `pipeline::non_finite_scene_input_flags_frame_instead_of_silently_zeroing` (old: `flags=0b0000`; now `ADC_SATURATED` set, `b_pt` finite), `digitiser::adc_quantise_flags_non_finite_as_saturated` (old: `(0,false)` for NaN; now `(0,true)`). `cargo test -p nvsim --no-default-features`: **50 → 53 passed, 0 failed**. Workspace green; Python deterministic proof unchanged (`f8e76f21…46f7a`, bit-exact — nvsim is off the signal proof path). Needs ADR slot 177. - **`wifi-densepose-core` + `wifi-densepose-cli` beyond-SOTA security review (ADR-127 note; CLI needs ADR slot) — NaN-state-poisoning bug class does NOT originate in core (verdict: no + evidence); both crates confirmed clean on all reviewed dimensions; 4 regression pins added locking in two real DoS guards.** **Load-bearing question — verdict NO (with evidence, MEASURED).** The NaN-state-poisoning class that hit `wifi-densepose-calibration`/`-vitals`/`-geo` (a non-finite input latching into a persistent IIR/Welford/von-Mises/voxel accumulator → silent permanent feature failure) does **not** live in a shared `wifi-densepose-core` primitive: core exposes **no stateful accumulator at all** — no Welford/running-mean, no von-Mises/circular-mean, no IIR/biquad filter state, no voxel grid. Grep over `core/src` for `welford|von_mises|biquad|y1|y2|running_mean|accumulat|voxel|self.*+=` matched only the `InvalidState` *error* enum, "reset state" doc comments, and a test-only LCG — zero stateful logic (MEASURED). Each downstream crate rolls its own accumulator, so each fix is correctly local; corroborated by `wifi-densepose-calibration::Features::from_series`, which **already** filters non-finite samples and returns `Features::ZERO` (the downstream re-implementation of the fix). The only float math in core's hot path is construction-time projection (`CsiFrame::new` → `amplitude`/`phase` via `mapv`) and pure stateless `utils` functions — none persists state across frames. **Dimensions confirmed clean (with evidence):** (1) **panic-on-adversarial-input = 0** — `CsiFrame::from_canonical_bytes` (the replay/forward deserialisation boundary) returns a typed `CanonicalDecodeError` for every malformed input (truncation, bad discriminant, non-UTF-8 device id, nonzero reserved bytes, shape/payload mismatch, trailing bytes); the CLI UDP parser `parse_csi_packet` (the widest CLI attack surface — bound to `0.0.0.0` by default) returns `None` on any malformed datagram. Both proven panic-free over deterministic-LCG fuzz sweeps (new pins). (2) **`Confidence::new` rejects NaN** (`!(0.0..=1.0).contains(&NaN)` ⇒ `true` ⇒ `Err`); `compute_bounding_box`/`to_flat_array` are NaN-tolerant (f32 `min`/`max` ignore NaN). (3) **`amplitude_variance`/`mean_amplitude` panic-free on empty frames** — ndarray 0.17 `var(0.0)`/`mean()` return finite/`None` (handled), not a panic (MEASURED via throwaway probe). (4) **Unbounded-memory DoS bounded** in both deserialisers: `from_canonical_bytes` guards the `Vec::with_capacity(rows*cols)` with `rows.saturating_mul(cols).saturating_mul(16) <= bytes.len()`; `parse_csi_packet` guards `Array2::zeros` with `buf.len() < 20 + n_pairs*2` — a header lying about an enormous `rows×cols` / `n_antennas×n_subcarriers` is rejected before allocation. (5) **CLI path-traversal** in `calibrate-serve` already defended by `sanitize_room_id` (keeps `[A-Za-z0-9_-]`, caps 64 chars, with tests) on every client-supplied `room_id`/`bank`/`baseline` name that reaches a file path; bearer-auth gate + non-loopback-bind warning present. (6) **No hardcoded secrets** (`--token` read from `CALIBRATE_TOKEN` env, never embedded). **Regression pins added (fails-on-old / passes-on-new):** core `canonical_decode_oversized_shape_is_bounded_not_allocated` (MEASURED: with the saturating guard removed it panics `capacity overflow` at `types.rs:801`; passes with the guard) and `canonical_decode_never_panics_on_arbitrary_bytes`; CLI `test_parse_csi_packet_oversized_claim_is_rejected_not_allocated` (a 255×65535-pair claim ≈ 33 MB in a 2 KB datagram → `None`, never OOMs) and `test_parse_csi_packet_never_panics_on_arbitrary_bytes`. `cargo test -p wifi-densepose-core`: **35 → 37** lib tests, 0 failed; `cargo test -p wifi-densepose-cli --no-default-features`: **24 → 26**, 0 failed. Workspace green; Python deterministic proof unchanged (`f8e76f21…46f7a`, bit-exact — core/cli are off the signal proof path). No production code changed — review is clean-with-evidence plus pins. - **`homecore` foundational state-machine review (ADR-127) — one real concurrency bug fixed (state-set TOCTOU dropping/reordering `state_changed` events) + two hardening fixes (entity_id memory-DoS, service-handler panic isolation), each pinned by a fails-on-old test; event-bus lag & lock discipline confirmed clean with evidence.** Beyond-SOTA security+concurrency review of the crate every other HOMECORE module builds on (state store `state.rs`, event bus `bus.rs`, service/entity registries, the `HomeCore` coordinator), un-covered by the ADR-154–159 sweep — a bug here is high-blast-radius. **HC-RACE-01 (state-set TOCTOU, the crux — race/lost-event).** `StateMachine::set` did `get()` (releasing the DashMap shard lock) → compute the next snapshot + the no-op/`last_changed` decision → `insert()` (re-acquiring the lock) → `send()`; the read-modify-write was **not atomic** w.r.t. a concurrent writer on the same entity, contradicting ADR-127 §2.1's promise that "the writer atomically replaces the map entry." A writer that read a **stale `old`** could mis-classify a genuine transition as a no-op and **silently drop its `state_changed` event** (a missed automation trigger) or fire an event whose `new_state` duplicated the previously delivered one (a spurious trigger for any automation keyed on `old_state != new_state`). **Fixed** by holding the shard write-lock across the whole read→decide→insert→fire sequence via `entry()`/`insert_entry()` — `tx.send` is non-blocking, non-async, and never re-enters the map, so firing under the shard lock cannot deadlock and keeps global event order in lock-step with global commit order. Pinned by `concurrent_set_fires_no_duplicate_adjacent_events` (4 writers toggling one entity A/B; asserts no two consecutive fired events carry an identical `new_state` — impossible under correct serialisation; an instrumented probe observed ~93k such duplicate-adjacent events across 200 trials on the racy code, **zero** on the fix; the test fails reliably on the first trial pre-fix). **HC-EID-LEN-01 (unbounded `entity_id`, memory-DoS).** `homecore-api/src/rest.rs` parses untrusted REST path segments straight through `EntityId::parse`; with no length cap an otherwise-valid id (`a.` + many MB of `[a-z0-9_]`) was accepted, and a `POST /api/states/` would persist it into the DashMap state store (permanent growth across distinct ids). **Fixed** by rejecting ids longer than `MAX_ENTITY_ID_LEN` (255, HA-compatible) up front in `parse()`, before any per-char scan, with a new `EntityIdError::TooLong` — fail-closed at the boundary type protects every caller (REST, registry deserialize, automation). Pinned by `entity_id_length_boundary` (exactly-MAX accepted; MAX+1 and a 4 MiB id rejected — oversized parses `Ok` on old code). **HC-SVC-PANIC-01 (service-handler panic not isolated).** `ServiceRegistry::call` already ran handlers **outside** the registry lock (the `Arc` is cloned out of the read guard first → no `RwLock` poisoning, no blocking of other callers — clean), but a panicking handler unwound through `call()` into the caller's task (the task driving the engine). **Hardened** by wrapping the handler future in `AssertUnwindSafe` + `catch_unwind`, converting a panic to `ServiceError::HandlerPanicked`; the registry stays fully usable (a sibling healthy service still returns, the bad service stays registered). Pinned by `panicking_handler_is_isolated_and_registry_survives` (unwinds through `call` on old code). **Dimensions confirmed clean (with evidence, no invented issues):** (1) **event-bus bounds / lag** (the homecore-api WS lag-DoS class) — both `StateMachine` and `EventBus` use **bounded** `tokio::sync::broadcast` (capacity 4,096); a slow subscriber gets a recoverable `Lagged(n)` (drop-oldest + re-sync) while `fire_*` is non-blocking and never waits on slow receivers, so a lagging subscriber **cannot block the publisher, grow the channel without bound, or kill a fast subscriber** (evidenced by `slow_subscriber_does_not_block_publisher_or_kill_the_bus` — fire 3× capacity at an idle subscriber, publisher unblocked, bus stays live, fresh fast subscriber receives, lagged one recovers); (2) **lock ordering / lock-across-await** (deadlock) — no code path holds two of `{state DashMap, registry RwLock, service RwLock}` simultaneously, so no inconsistent-ordering deadlock can exist; every `tokio::sync::RwLock` guard in `registry.rs`/`service.rs` is used in one synchronous statement and dropped before any `.await` (`call` explicitly scopes the read guard out before awaiting the handler); the only guard held across a send is the DashMap shard lock in `set`, across a **synchronous** broadcast send — safe; (3) **panic-on-input** — no reachable `unwrap`/`expect`/index in non-test code beyond the safe `send().unwrap_or(0)` and the dead-but-harmless `split_once(...).unwrap_or(...)` fallbacks on already-validated ids. `cargo test -p homecore --no-default-features`: **20 → 24 passed, 0 failed** (+4 pins). Workspace green; Python deterministic proof unchanged (`f8e76f21…46f7a`, bit-exact — `homecore` is off the signal proof path). Review notes appended to ADR-127 §9. - **`homecore-migrate` security review (ADR-165 surfaces) — one real secret-leak fix; traversal / data-loss / panic / injection dimensions confirmed clean with evidence.** Beyond-SOTA review of the Home-Assistant `.storage`/`secrets.yaml`/`automations.yaml` migrator, the two sharp surfaces being secret handling (`secrets.rs`) and untrusted-file parsing. **Finding + fix (secret-leak, `secrets.rs`):** a malformed `secrets.yaml` whose offending scalar fails a typed-tag coercion (e.g. `port: !!int `) produced a `serde_yaml` error whose message **embeds the scalar verbatim** — `invalid value: string ""`. The old code wrapped that message into `MigrateError::YamlParse { source }`; the error propagates out of `read_secrets`, is `?`-returned by the `InspectSecrets` CLI path in `main.rs`, and printed to stderr by `anyhow` — **leaking a secret value despite the CLI's deliberate `` design** (`main.rs` only ever prints keys as ` = `). Fix: `secrets.yaml` parse failures now map to a dedicated redacting variant `MigrateError::SecretsParse { path, line, column }` carrying only the file path + a coarse location (from `serde_yaml::Error::location()`), never the scalar; other (non-secret) YAML files keep `YamlParse`. **Pinned** by `secrets::tests::malformed_secrets_error_never_contains_secret_value`, which asserts the rendered error **and its full `#[source]` chain** never contain the secret value and that the error is still the structured `SecretsParse` (fail-closed) — it **fails on the old `YamlParse` path** (observed leak: `... invalid value: string "s3cr3t_TOKEN_VALUE" ...`) and passes on the fix; plus `malformed_secrets_error_reports_location` (still locatable). **Confirmed clean with evidence:** *secret leakage elsewhere* — the only secret sink is the value map; `main.rs` redacts values, and the `MissingField`/`Io` paths surface only the path, never content. *Source mutation / data-loss* — **structurally impossible**: there is no `fs::write`/`fs::remove`/`fs::create`/`File::create`/`OpenOptions` anywhere in the crate; P1 reads source and writes nothing (`import-entities` is in-memory only), so re-runs are trivially idempotent and the HA source is never touched. *Path traversal* — CLI takes a `--config-dir`/`--storage` dir and joins **fixed** filenames (`secrets.yaml`, `core.entity_registry`, …); no user-controlled path component, no `..`/absolute escape beyond the user's own privileges. *Panic-on-input* — probed duplicate-key, bad-indent, tab/control-char, multi-doc, non-mapping-root, unterminated-flow, `!input` blueprint tags, deep nesting, anchors: **every** malformed/typed/truncated input **errors, never panics** (all production code is panic-free; every `unwrap`/`expect` is `#[cfg(test)]`). *Fail-closed versioning* — unknown storage `minor_version` hard-errors (no silent fallback to an older parser). *Injection* — no SQL/shell/path interpolation; the tool emits diagnostics only and persists nothing in P1. `homecore-migrate` **19 → 21** tests (`--no-default-features`), 0 failed. Behaviour otherwise unchanged; Python deterministic proof PASS, hash unchanged (`homecore-migrate` is off the signal proof path). diff --git a/docs/adr/ADR-177-nvsim-degenerate-input-hardening.md b/docs/adr/ADR-177-nvsim-degenerate-input-hardening.md new file mode 100644 index 00000000..cf65f907 --- /dev/null +++ b/docs/adr/ADR-177-nvsim-degenerate-input-hardening.md @@ -0,0 +1,92 @@ +# ADR-177: `nvsim` Degenerate-Input Hardening (NV-Diamond Simulator) + +| Field | Value | +|-------|-------| +| **Status** | Accepted — 2 real MEDIUM bugs fixed + pinned; determinism preserved | +| **Date** | 2026-06-15 | +| **Deciders** | ruv | +| **Codename** | **NVSIM-FAILCLOSED** | +| **Reviews** | ADR-089 (`nvsim` NV-diamond magnetometer pipeline simulator) | +| **Milestone** | #9 (ungated-crate security sweep) — crate 2 of 4 | + +## Context + +`nvsim` (ADR-089) is a standalone, **WASM-ready** deterministic NV-diamond +magnetometer pipeline simulator — a forward-only leaf: +`scene → source → propagation → NV ensemble → digitiser → MagFrame + SHA-256 +witness`. It has no network surface, so the real attack surface is **degenerate +physical-parameter input** crossing the external boundary — specifically the +WASM `config_json` / `scene_json` entry points. + +Two properties matter for this crate that don't for others: it is billed +**deterministic** (a published cross-machine witness must reproduce bit-exactly), +and under `panic=abort` WASM any panic **aborts the whole module**. So a +config-induced panic is a denial-of-service, and a silent numeric corruption +defeats the simulator's entire purpose. + +## Decision + +Fix the two reachable degenerate-input bugs at their funnel points, each pinned +by a fails-on-old test, **without perturbing the deterministic happy path** (the +guards fire only on non-finite / degenerate input; the published witness is +unchanged). + +### Findings fixed (both MEASURED-reproduced) + +| # | Severity | Location | Issue | Fix | +|---|----------|----------|-------|-----| +| NVSIM-DT-01 | MEDIUM (DoS) | `pipeline.rs:58,95` | `dt = config.dt_s.unwrap_or(1.0 / f_s_hz)`; an external `f_s_hz == 0.0` → `dt = +Inf` → `(dt*1e6) as u64` saturates to `u64::MAX` → `(sample as u64) * dt_us` **panics `attempt to multiply with overflow`** at `sample ≥ 2` (debug/WASM-abort; garbage `t_us` in release). MEASURED: panic at `pipeline.rs:95:30`. | Sanitise `dt` (non-finite/non-positive → 1 µs fallback), cap the `u64` cast at `u64::MAX`, `saturating_mul` the timestamp — no config can overflow it. | +| NVSIM-NAN-01 | MEDIUM (silent corruption) | funnel `digitiser.rs::adc_quantise` (root: near-field clamp bypass in `source.rs`) | A non-finite scene param (NaN/Inf dipole position, Inf moment, NaN loop radius) **bypasses the near-field clamp** (`NaN < R_MIN_M == false` → the `1/r³` path runs → NaN field), and at the ADC `NaN as i32 == 0` (Rust saturating cast) emits a frame `b_pt=[0,0,0]` with **`ADC_SATURATED` CLEAR** — indistinguishable from a legitimate zero-field reading. MEASURED: `b=[NaN,NaN,NaN] sat=false` → `b_pt=[0,0,0] flags=0b0000`. | `adc_quantise`: any non-finite input → code `0` **with the saturation flag raised**; the pipeline's existing `adc_sat` OR-reduction propagates `ADC_SATURATED` onto the frame, making the corruption visible downstream. | + +This is the same **NaN-fail-open / NaN-poisoning** family seen across +calibration/vitals/geo and ruview-swarm — non-finite input defeating a guard — +but bounded here to a single frame (no cross-timestep accumulator). + +### Dimensions confirmed clean (with evidence) + +1. **Determinism integrity — clean.** One RNG only: `ChaCha20Rng::seed_from_u64(seed)`, + fully caller-seeded (grep: one `seed_from_u64`, **zero** `thread_rng`/`getrandom`/ + `SystemTime`/`Instant`/`HashMap`); `Cargo.toml` pins `rand`/`rand_chacha` + `default-features=false` (no OS entropy). Box–Muller draws + `gen_range(f64::EPSILON..=1.0)` (avoids `ln(0)=-Inf` by construction). Frame + bytes fixed LE; source summation order fixed by `Vec` order. **The published + cross-machine witness `cc8de9b0…93b4` (`proof_witness_publishes_a_known_value`) + passes UNCHANGED after both fixes** — the happy path is byte-identical; guards + touch only degenerate inputs. *Attested caveat (not a finding): libm + `cos`/`ln`/`sqrt` could differ x86↔wasm; the witness is documented as + x86_64-captured.* +2. **Panic-free deserialisation — clean.** `MagFrame::from_bytes` validates + len/magic/version, then per-field `buf[a..b].try_into().expect(...)` are over + fixed sub-ranges of an already-length-checked 60-byte buffer (provably + infallible). No `unsafe`, no `panic!`/`unreachable!` in production; every other + `unwrap`/`expect` is `#[cfg(test)]`. +3. **Div-by-zero / numerical landmines — clean.** `dipole_field`/`current_loop_field` + clamp `r_norm < R_MIN_M` before `1/r³`,`1/r²` (finite inputs); `shot_noise_floor` + guards `denom <= 0`; `vec3_normalise` guards `n < 1e-20`. The only hole was the + NaN *bypass* of the clamp — closed at the ADC funnel (NVSIM-NAN-01). + +## Validation + +- `cargo test -p nvsim --no-default-features` → **50 → 53** passed, 0 failed (+3 pins: + `degenerate_zero_sample_rate_does_not_panic`, + `non_finite_scene_input_flags_frame_instead_of_silently_zeroing`, + `adc_quantise_flags_non_finite_as_saturated`). +- `cargo test --workspace --no-default-features` → **exit 0**, 0 failed. +- `python archive/v1/data/proof/verify.py` → **VERDICT: PASS**, hash + `f8e76f21…46f7a` unchanged (nvsim off the signal proof path). +- nvsim's own cross-machine witness `cc8de9b0…93b4` reproduces unchanged. + +## Consequences + +### Positive +- A config-induced WASM-abort DoS and a silent NaN→fake-zero-field corruption are + closed at their funnel points, each regression-pinned, with the deterministic + witness proven intact. + +### Negative / Neutral +- None. Guards affect only degenerate inputs; happy-path output is byte-identical. + +## Links +- ADR-089 — `nvsim` NV-diamond magnetometer simulator +- ADR-176 — `ruview-swarm` (sibling NaN-fail-open review) +- ADR-172 — core/cli (where the NaN-bug-class root was settled NO) diff --git a/v2/crates/nvsim/src/digitiser.rs b/v2/crates/nvsim/src/digitiser.rs index bfa83267..a315d9c7 100644 --- a/v2/crates/nvsim/src/digitiser.rs +++ b/v2/crates/nvsim/src/digitiser.rs @@ -39,7 +39,20 @@ pub const DEFAULT_SAMPLE_RATE_HZ: f64 = 10_000.0; pub const DEFAULT_F_MOD_HZ: f64 = 1_000.0; /// Quantise one input sample (T) to a signed ADC code. Returns `(code, saturated)`. +/// +/// A **non-finite** input (`NaN` / `±Inf`) is treated as an out-of-range +/// condition: it clamps to code `0` and raises the saturation flag. This is +/// the funnel point that stops the NaN-state-poisoning class — a non-finite +/// physical field (e.g. produced by a degenerate scene with a NaN dipole +/// position) would otherwise coerce silently to code `0` *with the saturation +/// flag clear*, yielding a frame indistinguishable from a legitimate +/// zero-field reading. Flagging it preserves the "every frame is honest about +/// its own validity" contract the proof bundle relies on. pub fn adc_quantise(b_in_t: f64) -> (i32, bool) { + if !b_in_t.is_finite() { + // Non-finite => not representable on the ±FS scale; mark saturated. + return (0, true); + } let code_f = (b_in_t / ADC_LSB_T).round(); let max_code = (1_i32 << (ADC_BITS - 1)) - 1; // 32_767 for 16-bit signed let min_code = -max_code; // symmetric @@ -153,6 +166,23 @@ mod tests { } } + #[test] + fn adc_quantise_flags_non_finite_as_saturated() { + // Security pinning (NaN-state-poisoning guard): a non-finite field + // value must clamp to code 0 AND raise the saturation flag, so the + // pipeline can flag the frame rather than emitting it as a silent, + // indistinguishable zero-field reading. Pre-fix this returned + // (0, false) for NaN — a silent corruption. + for bad in [f64::NAN, f64::INFINITY, f64::NEG_INFINITY] { + let (code, sat) = adc_quantise(bad); + assert_eq!(code, 0, "non-finite input {bad} must clamp to code 0"); + assert!(sat, "non-finite input {bad} must raise the saturation flag"); + } + // A finite in-range value is unaffected (no false positives). + let (_, sat) = adc_quantise(1.0e-7); + assert!(!sat, "a finite in-range value must NOT be flagged saturated"); + } + #[test] fn adc_saturates_above_full_scale() { let (code_pos, sat_pos) = adc_quantise(20.0e-6); diff --git a/v2/crates/nvsim/src/pipeline.rs b/v2/crates/nvsim/src/pipeline.rs index a2e5a43b..8b2fbcd6 100644 --- a/v2/crates/nvsim/src/pipeline.rs +++ b/v2/crates/nvsim/src/pipeline.rs @@ -51,11 +51,28 @@ impl Pipeline { /// (sensor × sample) — i.e. `n_samples · scene.sensors.len()` frames /// in scene-major / sample-minor order. pub fn run(&self, n_samples: usize) -> Vec { - let dt = self + // `dt` is derived from caller-supplied config — an external boundary + // (e.g. the WASM `config_json`). A degenerate `f_s_hz == 0` makes + // `1.0 / f_s_hz == +Inf`; a non-finite or non-positive `dt_s` is + // equally hostile. Sanitise before any arithmetic that could panic. + let raw_dt = self .config .dt_s .unwrap_or(1.0 / self.config.digitiser.f_s_hz); - let dt_us = (dt * 1.0e6) as u64; + // Fall back to a 1 µs step (the smallest physically meaningful + // sample interval here) when `dt` is non-finite or non-positive, so + // the run produces well-defined frames instead of garbage / a panic. + let dt = if raw_dt.is_finite() && raw_dt > 0.0 { + raw_dt + } else { + 1.0e-6 + }; + // `dt` is now finite & positive, so `dt * 1e6` is finite. Cap the + // `u64` cast defensively (a huge but finite `dt` could still exceed + // `u64::MAX`) and use `saturating_mul` for the per-sample timestamp so + // a pathological config can never trigger a multiply-with-overflow + // panic (debug / WASM panic=abort) or wrap to a garbage timestamp. + let dt_us = (dt * 1.0e6).min(u64::MAX as f64) as u64; let nv = NvSensor::new(self.config.sensor); let mut out: Vec = @@ -92,7 +109,7 @@ impl Pipeline { ]; let mut frame = MagFrame::empty(sensor_idx as u16); - frame.t_us = (sample as u64) * dt_us; + frame.t_us = (sample as u64).saturating_mul(dt_us); frame.b_pt = b_pt; frame.sigma_pt = sigma_pt; frame.noise_floor_pt_sqrt_hz = (reading.noise_floor_t_sqrt_hz * 1.0e12) as f32; @@ -205,6 +222,62 @@ mod tests { } } + #[test] + fn degenerate_zero_sample_rate_does_not_panic() { + // Security pinning (panic / DoS guard): an externally-supplied + // `f_s_hz == 0` makes `1/f_s_hz == +Inf`; pre-fix that produced + // `dt_us == u64::MAX`, and `sample * dt_us` panicked with + // "attempt to multiply with overflow" (debug / WASM panic=abort) at + // sample >= 2, or wrapped to a garbage timestamp in release. The + // sanitised `dt` + `saturating_mul` must keep the run finite. + let scene = fixture_scene(); + let cfg = PipelineConfig { + digitiser: crate::digitiser::DigitiserConfig { + f_s_hz: 0.0, + f_mod_hz: 1000.0, + }, + ..PipelineConfig::default() + }; + let frames = Pipeline::new(scene, cfg, 42).run(8); + assert_eq!(frames.len(), 8); + for f in &frames { + // Timestamps are monotone-well-defined, not garbage. + assert!(f.t_us < u64::MAX); + } + } + + #[test] + fn non_finite_scene_input_flags_frame_instead_of_silently_zeroing() { + // Security pinning (NaN-state-poisoning guard): a NaN dipole position + // makes `r_norm` NaN, which bypasses the near-field clamp + // (`NaN < R_MIN_M` is false) and yields a NaN field. Pre-fix the + // digitiser silently coerced that NaN to code 0 with the saturation + // flag CLEAR — a frame indistinguishable from a real zero-field + // reading. Post-fix the frame must carry ADC_SATURATED so the + // corruption is visible downstream. + let mut scene = Scene::new(); + scene.add_dipole(DipoleSource::new([f64::NAN, 0.0, 0.5], [0.0, 0.0, 1.0e-3])); + scene.add_sensor([0.0, 0.0, 0.0]); + let cfg = PipelineConfig { + sensor: NvSensorConfig { + shot_noise_disabled: true, + ..NvSensorConfig::default() + }, + ..PipelineConfig::default() + }; + let frames = Pipeline::new(scene, cfg, 0).run(4); + for f in &frames { + assert!( + f.has_flag(flag::ADC_SATURATED), + "non-finite field must raise ADC_SATURATED, not emit a silent zero frame" + ); + // And the emitted value is a defined number, not NaN. + for b in f.b_pt { + assert!(b.is_finite()); + } + } + } + #[test] fn adc_saturation_flag_fires_above_full_scale() { // Place a dipole close enough to drive the field above ±10 µT FS.