From 4a083999e5b072cf42e54fc3fc11df2614ac0d09 Mon Sep 17 00:00:00 2001
From: rUv <ruv@ruv.net>
Date: Mon, 15 Jun 2026 09:55:40 -0400
Subject: [PATCH] security(ruview-swarm): fail-closed on NaN/Inf at the
 swarm-comm trust boundary + ADR-176 (#1096)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* fix(ruview-swarm): fail-closed on NaN/Inf at swarm-comm trust boundary (ADR-148)

Beyond-SOTA security review of the ADR-148 drone swarm control plane found
four IEEE-754 NaN/Inf fail-open / DoS bugs on data crossing the untrusted
swarm-comm boundary (receive_peer_state / receive_peer_detection accept full
DroneState/CsiDetection whose f64/f32 fields deserialize with no finite-check).

- HIGH: failsafe::tick collision-avoidance + battery checks fail-open on NaN
  (NaN < threshold == false silently disabled collision avoidance / kept a
  NaN-battery drone Nominal). Now fails closed to EmergencyDiverge / RTH.
- MED: geofence::check NaN-altitude bypass returned Safe through the
  point-in-polygon path. Now leading non-finite-coordinate guard -> HardBreach.
- MED/DoS: antijamming FhssRadio panicked with "% 0" on an empty deserialized
  channels_mhz. Now len==0 early-returns (benign 0.0 sentinel).
- LOW: multiview::fuse propagated a NaN victim_position into the fused
  "confirmed victim" location. Now requires finite confidence + position.

Each fix pinned by a fails-on-old / passes-on-new test (MEASURED: old code
returned Nominal/Safe or panicked). cargo test -p ruview-swarm
--no-default-features: 117 -> 123 passed, 0 failed. Workspace green; Python
deterministic proof unchanged (f8e76f21...46f7a, off the signal path).

Documented-not-fixed (ADR slot 176): Raft AppendEntries lacks Log-Matching
consistency check (topology/raft.rs); MavlinkSigner::verify uses non-constant
-time tag compare + no replay-window rejection (already doc-flagged).

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(adr): ADR-176 — ruview-swarm NaN-fail-open safety review

Records the 4 MEASURED fail-open safety bugs fixed in f671000d7 (collision
avoidance, battery RTH, geofence, anti-jamming %0 panic — all NaN/Inf
defeating a safety comparison at the swarm-comm trust boundary) + 6 pins,
5 clean-with-evidence dimensions, and the 2 genuine issues deferred to a
focused follow-up (Raft AppendEntries log-matching; MAVLink signer
constant-time + replay window).

Co-Authored-By: claude-flow <ruv@ruv.net>
---
 CHANGELOG.md                                  |   1 +
 ...uview-swarm-nan-fail-open-safety-review.md | 103 ++++++++++++++++++
 v2/crates/ruview-swarm/src/failsafe/mod.rs    |  52 ++++++++-
 .../ruview-swarm/src/security/antijamming.rs  |  39 ++++++-
 .../ruview-swarm/src/security/geofence.rs     |  35 ++++++
 .../ruview-swarm/src/sensing/multiview.rs     |  61 ++++++++++-
 6 files changed, 282 insertions(+), 9 deletions(-)
 create mode 100644 docs/adr/ADR-176-ruview-swarm-nan-fail-open-safety-review.md

diff --git a/CHANGELOG.md b/CHANGELOG.md
index b16fdc59..acd094db 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
 
 ### Security
+- **`ruview-swarm` beyond-SOTA security + correctness review (ADR-148 drone swarm control plane; needs ADR slot 176) — 4 real fail-open / DoS bugs fixed in the NaN-state-poisoning class, each pinned fails-on-old / passes-on-new; 5 dimensions confirmed clean with evidence.** The shared theme is **IEEE-754 NaN/Inf silently defeating a safety comparison** on data that crosses the untrusted swarm-comm trust boundary (`SwarmOrchestrator::receive_peer_state` / `receive_peer_detection` accept full `DroneState`/`CsiDetection` whose f64/f32 fields deserialize with no finite-check; the integer-encoded MAVLink wire formats in `mavlink_messages.rs` cannot carry NaN, but the serde struct path can). **(1) HIGH — `failsafe::FailSafeMachine::tick` collision-avoidance + battery fail-open** (`failsafe/mod.rs:51,75`). `nearest_neighbor_dist < collision_dist_m` and `battery_pct <= rth_pct` both evaluate `false` for a NaN operand, so a poisoned peer position (→ NaN `nearest_peer_distance` via `Position3D::distance_to`) **silently disabled collision avoidance** and a NaN battery reading kept a drone Nominal — the worst failure for a physical airframe. Fixed to fail CLOSED (`!is_finite() ||` → `EmergencyDiverge` / `ReturnToHome`). MEASURED fails-on-old: `test_nan_neighbor_distance_fails_closed_to_diverge` / `test_nan_battery_fails_closed_to_rth` both returned `Nominal` pre-fix. **(2) MEDIUM — `security::geofence::Geofence::check` NaN-altitude bypass** (`security/geofence.rs:33`). A NaN `z` (altitude) with valid x/y skipped the altitude breach (`NaN < min || NaN > max` = `false`) and returned **`Safe`** through the point-in-polygon path — a silent geofence bypass. Fixed with a leading non-finite-coordinate → `HardBreach` guard. MEASURED fails-on-old: `test_nan_altitude_fails_closed` returned `Safe` pre-fix. **(3) MEDIUM/DoS — `security::antijamming::FhssRadio` `% 0` panic on empty `channels_mhz`** (`security/antijamming.rs:65,71,102`). `FhssConfig` is `Deserialize`; an empty channel list (malformed/hostile config) made `next_hop`/`current_channel_mhz`/`evasive_hop`/`tick` panic with `remainder with a divisor of zero`, crashing the radio task. Fixed with `len == 0` early-returns (benign `0.0` sentinel). MEASURED fails-on-old: `test_empty_channels_does_not_panic` **panicked** (`divisor of zero`) pre-fix. **(4) LOW — `sensing::multiview::MultiViewFusion::fuse` NaN victim-position propagation** (`sensing/multiview.rs:70`). A NaN `victim_position` passed the `is_some()` filter and propagated through the confidence-weighted average into the fused "confirmed victim" location dispatched to the swarm. Fixed by requiring finite `confidence` + finite position components (fail-closed drop). MEASURED fails-on-old: `test_nan_victim_position_dropped_from_fusion` produced a non-finite fused position pre-fix. **Dimensions confirmed clean (with evidence):** (a) **MAVLink decode panic-safety** — `SwarmNodeState::decode(&[u8;20])` `try_into().unwrap()`s are over fixed const ranges of a fixed-size array (provably infallible; no arbitrary-length `&[u8]` path exists). (b) **UWB GPS anti-spoofing is NaN-safe** — `(gps_dist - uwb_dist).abs() <= tol` already fails CLOSED on a NaN range/position (counts as inconsistent → spoof rejected), verified by reasoning + existing `test_spoofed_gps_invalid`. (c) **Bounded grid / no allocation-from-length-field** — `ProbabilityGrid::update_bayesian`/`mark_scanned` bounds-check `cx >= width || cy >= height`; `pos_to_cell` uses saturating `as u32` (Rust `as` saturates, no UB). (d) **Mesh `nearest_k` NaN-safe sort** — `partial_cmp(..).unwrap_or(Equal)` cannot panic on NaN distances. (e) **No hardcoded secrets** — `MavlinkSigner` key is constructor-injected (`[u8;32]`), nothing embedded. **Documented-not-fixed (for ADR-176, not churned to avoid test-rewrite risk):** (i) **Raft `AppendEntries` lacks the Log-Matching consistency check** (`topology/raft.rs:187`) — a follower appends leader entries on `term >= current_term` without validating `prev_log_index`/`prev_log_term`, so a malformed/byzantine leader can corrupt a follower's log (a genuine consensus-safety gap; vote tallying is also delegated to the caller per the existing `handle_message` comment). (ii) **`MavlinkSigner::verify` uses a non-constant-time tag `==` and has no replay/timestamp-window rejection** (`security/mavlink_signing.rs:64`) — the doc comment already flags the replay limitation as a known demo/test simplification. `cargo test -p ruview-swarm --no-default-features`: **117 → 123 passed, 0 failed** (+6 pins). Workspace green; Python deterministic proof unchanged (`f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a`, bit-exact — `ruview-swarm` is off the signal proof path).
 - **`wifi-densepose-core` + `wifi-densepose-cli` beyond-SOTA security review (ADR-127 note; CLI needs ADR slot) — NaN-state-poisoning bug class does NOT originate in core (verdict: no + evidence); both crates confirmed clean on all reviewed dimensions; 4 regression pins added locking in two real DoS guards.** **Load-bearing question — verdict NO (with evidence, MEASURED).** The NaN-state-poisoning class that hit `wifi-densepose-calibration`/`-vitals`/`-geo` (a non-finite input latching into a persistent IIR/Welford/von-Mises/voxel accumulator → silent permanent feature failure) does **not** live in a shared `wifi-densepose-core` primitive: core exposes **no stateful accumulator at all** — no Welford/running-mean, no von-Mises/circular-mean, no IIR/biquad filter state, no voxel grid. Grep over `core/src` for `welford|von_mises|biquad|y1|y2|running_mean|accumulat|voxel|self.*+=` matched only the `InvalidState` *error* enum, "reset state" doc comments, and a test-only LCG — zero stateful logic (MEASURED). Each downstream crate rolls its own accumulator, so each fix is correctly local; corroborated by `wifi-densepose-calibration::Features::from_series`, which **already** filters non-finite samples and returns `Features::ZERO` (the downstream re-implementation of the fix). The only float math in core's hot path is construction-time projection (`CsiFrame::new` → `amplitude`/`phase` via `mapv`) and pure stateless `utils` functions — none persists state across frames. **Dimensions confirmed clean (with evidence):** (1) **panic-on-adversarial-input = 0** — `CsiFrame::from_canonical_bytes` (the replay/forward deserialisation boundary) returns a typed `CanonicalDecodeError` for every malformed input (truncation, bad discriminant, non-UTF-8 device id, nonzero reserved bytes, shape/payload mismatch, trailing bytes); the CLI UDP parser `parse_csi_packet` (the widest CLI attack surface — bound to `0.0.0.0` by default) returns `None` on any malformed datagram. Both proven panic-free over deterministic-LCG fuzz sweeps (new pins). (2) **`Confidence::new` rejects NaN** (`!(0.0..=1.0).contains(&NaN)` ⇒ `true` ⇒ `Err`); `compute_bounding_box`/`to_flat_array` are NaN-tolerant (f32 `min`/`max` ignore NaN). (3) **`amplitude_variance`/`mean_amplitude` panic-free on empty frames** — ndarray 0.17 `var(0.0)`/`mean()` return finite/`None` (handled), not a panic (MEASURED via throwaway probe). (4) **Unbounded-memory DoS bounded** in both deserialisers: `from_canonical_bytes` guards the `Vec::with_capacity(rows*cols)` with `rows.saturating_mul(cols).saturating_mul(16) <= bytes.len()`; `parse_csi_packet` guards `Array2::zeros` with `buf.len() < 20 + n_pairs*2` — a header lying about an enormous `rows×cols` / `n_antennas×n_subcarriers` is rejected before allocation. (5) **CLI path-traversal** in `calibrate-serve` already defended by `sanitize_room_id` (keeps `[A-Za-z0-9_-]`, caps 64 chars, with tests) on every client-supplied `room_id`/`bank`/`baseline` name that reaches a file path; bearer-auth gate + non-loopback-bind warning present. (6) **No hardcoded secrets** (`--token` read from `CALIBRATE_TOKEN` env, never embedded). **Regression pins added (fails-on-old / passes-on-new):** core `canonical_decode_oversized_shape_is_bounded_not_allocated` (MEASURED: with the saturating guard removed it panics `capacity overflow` at `types.rs:801`; passes with the guard) and `canonical_decode_never_panics_on_arbitrary_bytes`; CLI `test_parse_csi_packet_oversized_claim_is_rejected_not_allocated` (a 255×65535-pair claim ≈ 33 MB in a 2 KB datagram → `None`, never OOMs) and `test_parse_csi_packet_never_panics_on_arbitrary_bytes`. `cargo test -p wifi-densepose-core`: **35 → 37** lib tests, 0 failed; `cargo test -p wifi-densepose-cli --no-default-features`: **24 → 26**, 0 failed. Workspace green; Python deterministic proof unchanged (`f8e76f21…46f7a`, bit-exact — core/cli are off the signal proof path). No production code changed — review is clean-with-evidence plus pins.
 - **`homecore` foundational state-machine review (ADR-127) — one real concurrency bug fixed (state-set TOCTOU dropping/reordering `state_changed` events) + two hardening fixes (entity_id memory-DoS, service-handler panic isolation), each pinned by a fails-on-old test; event-bus lag & lock discipline confirmed clean with evidence.** Beyond-SOTA security+concurrency review of the crate every other HOMECORE module builds on (state store `state.rs`, event bus `bus.rs`, service/entity registries, the `HomeCore` coordinator), un-covered by the ADR-154–159 sweep — a bug here is high-blast-radius. **HC-RACE-01 (state-set TOCTOU, the crux — race/lost-event).** `StateMachine::set` did `get()` (releasing the DashMap shard lock) → compute the next snapshot + the no-op/`last_changed` decision → `insert()` (re-acquiring the lock) → `send()`; the read-modify-write was **not atomic** w.r.t. a concurrent writer on the same entity, contradicting ADR-127 §2.1's promise that "the writer atomically replaces the map entry." A writer that read a **stale `old`** could mis-classify a genuine transition as a no-op and **silently drop its `state_changed` event** (a missed automation trigger) or fire an event whose `new_state` duplicated the previously delivered one (a spurious trigger for any automation keyed on `old_state != new_state`). **Fixed** by holding the shard write-lock across the whole read→decide→insert→fire sequence via `entry()`/`insert_entry()` — `tx.send` is non-blocking, non-async, and never re-enters the map, so firing under the shard lock cannot deadlock and keeps global event order in lock-step with global commit order. Pinned by `concurrent_set_fires_no_duplicate_adjacent_events` (4 writers toggling one entity A/B; asserts no two consecutive fired events carry an identical `new_state` — impossible under correct serialisation; an instrumented probe observed ~93k such duplicate-adjacent events across 200 trials on the racy code, **zero** on the fix; the test fails reliably on the first trial pre-fix). **HC-EID-LEN-01 (unbounded `entity_id`, memory-DoS).** `homecore-api/src/rest.rs` parses untrusted REST path segments straight through `EntityId::parse`; with no length cap an otherwise-valid id (`a.` + many MB of `[a-z0-9_]`) was accepted, and a `POST /api/states/<giant>` would persist it into the DashMap state store (permanent growth across distinct ids). **Fixed** by rejecting ids longer than `MAX_ENTITY_ID_LEN` (255, HA-compatible) up front in `parse()`, before any per-char scan, with a new `EntityIdError::TooLong` — fail-closed at the boundary type protects every caller (REST, registry deserialize, automation). Pinned by `entity_id_length_boundary` (exactly-MAX accepted; MAX+1 and a 4 MiB id rejected — oversized parses `Ok` on old code). **HC-SVC-PANIC-01 (service-handler panic not isolated).** `ServiceRegistry::call` already ran handlers **outside** the registry lock (the `Arc<dyn ServiceHandler>` is cloned out of the read guard first → no `RwLock` poisoning, no blocking of other callers — clean), but a panicking handler unwound through `call()` into the caller's task (the task driving the engine). **Hardened** by wrapping the handler future in `AssertUnwindSafe` + `catch_unwind`, converting a panic to `ServiceError::HandlerPanicked`; the registry stays fully usable (a sibling healthy service still returns, the bad service stays registered). Pinned by `panicking_handler_is_isolated_and_registry_survives` (unwinds through `call` on old code). **Dimensions confirmed clean (with evidence, no invented issues):** (1) **event-bus bounds / lag** (the homecore-api WS lag-DoS class) — both `StateMachine` and `EventBus` use **bounded** `tokio::sync::broadcast` (capacity 4,096); a slow subscriber gets a recoverable `Lagged(n)` (drop-oldest + re-sync) while `fire_*` is non-blocking and never waits on slow receivers, so a lagging subscriber **cannot block the publisher, grow the channel without bound, or kill a fast subscriber** (evidenced by `slow_subscriber_does_not_block_publisher_or_kill_the_bus` — fire 3× capacity at an idle subscriber, publisher unblocked, bus stays live, fresh fast subscriber receives, lagged one recovers); (2) **lock ordering / lock-across-await** (deadlock) — no code path holds two of `{state DashMap, registry RwLock, service RwLock}` simultaneously, so no inconsistent-ordering deadlock can exist; every `tokio::sync::RwLock` guard in `registry.rs`/`service.rs` is used in one synchronous statement and dropped before any `.await` (`call` explicitly scopes the read guard out before awaiting the handler); the only guard held across a send is the DashMap shard lock in `set`, across a **synchronous** broadcast send — safe; (3) **panic-on-input** — no reachable `unwrap`/`expect`/index in non-test code beyond the safe `send().unwrap_or(0)` and the dead-but-harmless `split_once(...).unwrap_or(...)` fallbacks on already-validated ids. `cargo test -p homecore --no-default-features`: **20 → 24 passed, 0 failed** (+4 pins). Workspace green; Python deterministic proof unchanged (`f8e76f21…46f7a`, bit-exact — `homecore` is off the signal proof path). Review notes appended to ADR-127 §9.
 - **`homecore-migrate` security review (ADR-165 surfaces) — one real secret-leak fix; traversal / data-loss / panic / injection dimensions confirmed clean with evidence.** Beyond-SOTA review of the Home-Assistant `.storage`/`secrets.yaml`/`automations.yaml` migrator, the two sharp surfaces being secret handling (`secrets.rs`) and untrusted-file parsing. **Finding + fix (secret-leak, `secrets.rs`):** a malformed `secrets.yaml` whose offending scalar fails a typed-tag coercion (e.g. `port: !!int <value>`) produced a `serde_yaml` error whose message **embeds the scalar verbatim** — `invalid value: string "<the-secret-value>"`. The old code wrapped that message into `MigrateError::YamlParse { source }`; the error propagates out of `read_secrets`, is `?`-returned by the `InspectSecrets` CLI path in `main.rs`, and printed to stderr by `anyhow` — **leaking a secret value despite the CLI's deliberate `<redacted>` design** (`main.rs` only ever prints keys as `<key> = <redacted>`). Fix: `secrets.yaml` parse failures now map to a dedicated redacting variant `MigrateError::SecretsParse { path, line, column }` carrying only the file path + a coarse location (from `serde_yaml::Error::location()`), never the scalar; other (non-secret) YAML files keep `YamlParse`. **Pinned** by `secrets::tests::malformed_secrets_error_never_contains_secret_value`, which asserts the rendered error **and its full `#[source]` chain** never contain the secret value and that the error is still the structured `SecretsParse` (fail-closed) — it **fails on the old `YamlParse` path** (observed leak: `... invalid value: string "s3cr3t_TOKEN_VALUE" ...`) and passes on the fix; plus `malformed_secrets_error_reports_location` (still locatable). **Confirmed clean with evidence:** *secret leakage elsewhere* — the only secret sink is the value map; `main.rs` redacts values, and the `MissingField`/`Io` paths surface only the path, never content. *Source mutation / data-loss* — **structurally impossible**: there is no `fs::write`/`fs::remove`/`fs::create`/`File::create`/`OpenOptions` anywhere in the crate; P1 reads source and writes nothing (`import-entities` is in-memory only), so re-runs are trivially idempotent and the HA source is never touched. *Path traversal* — CLI takes a `--config-dir`/`--storage` dir and joins **fixed** filenames (`secrets.yaml`, `core.entity_registry`, …); no user-controlled path component, no `..`/absolute escape beyond the user's own privileges. *Panic-on-input* — probed duplicate-key, bad-indent, tab/control-char, multi-doc, non-mapping-root, unterminated-flow, `!input` blueprint tags, deep nesting, anchors: **every** malformed/typed/truncated input **errors, never panics** (all production code is panic-free; every `unwrap`/`expect` is `#[cfg(test)]`). *Fail-closed versioning* — unknown storage `minor_version` hard-errors (no silent fallback to an older parser). *Injection* — no SQL/shell/path interpolation; the tool emits diagnostics only and persists nothing in P1. `homecore-migrate` **19 → 21** tests (`--no-default-features`), 0 failed. Behaviour otherwise unchanged; Python deterministic proof PASS, hash unchanged (`homecore-migrate` is off the signal proof path).
diff --git a/docs/adr/ADR-176-ruview-swarm-nan-fail-open-safety-review.md b/docs/adr/ADR-176-ruview-swarm-nan-fail-open-safety-review.md
new file mode 100644
index 00000000..b4151587
--- /dev/null
+++ b/docs/adr/ADR-176-ruview-swarm-nan-fail-open-safety-review.md
@@ -0,0 +1,103 @@
+# ADR-176: `ruview-swarm` NaN-Fail-Open Safety Review
+
+| Field | Value |
+|-------|-------|
+| **Status** | Accepted — 4 real safety bugs fixed + pinned; 2 issues documented for follow-up |
+| **Date** | 2026-06-15 |
+| **Deciders** | ruv |
+| **Codename** | **SWARM-FAILCLOSED** |
+| **Reviews** | ADR-148 (`ruview-swarm` drone swarm control plane) |
+| **Milestone** | #9 (ungated-crate security sweep) — crate 1 of 4 |
+
+## Context
+
+`ruview-swarm` (ADR-148) is the drone swarm control plane — hierarchical-mesh
+topology, Raft consensus, MARL, CSI sensing payload, MAVLink/PX4 command
+dispatch. It is the highest-stakes of the four never-reviewed v2 crates: a defect
+here can produce an **unsafe physical drone command**. It had no prior security
+ADR.
+
+### Trust-boundary map
+Untrusted input enters via `SwarmOrchestrator::receive_peer_state` /
+`receive_peer_detection`, which accept full `DroneState` / `CsiDetection` serde
+structs with **f64/f32 fields and no finite-check**, and via
+`SwarmConfig`/`FhssConfig`/`Geofence` deserialization. The MAVLink wire formats in
+`mavlink_messages.rs` are **integer-encoded** (i32 mm / u8) and provably cannot
+carry NaN — so the NaN class is reachable through the **serde struct path, not the
+MAVLink decode path**. Commands flow out to a `FlightController` (PX4/ArduPilot).
+
+The unifying bug class found: **IEEE-754 NaN/Inf silently defeating a safety
+comparison** (`NaN < threshold` evaluates to `false`), causing safety logic to
+**fail OPEN**. This is distinct from — but rhymes with — the NaN-state-poisoning
+class found earlier in calibration/vitals/geo (there, NaN latched into persistent
+state; here, NaN slips through a one-shot guard). Both are "non-finite input
+defeats logic," and the fix discipline is the same: **reject non-finite at the
+trust boundary, fail CLOSED.**
+
+## Decision
+
+Fix the four reachable fail-open bugs by making each safety predicate
+non-finite-aware and fail-closed, each pinned by a fails-on-old test. Document
+two further genuine issues that need larger, riskier changes rather than churning
+them in a security pass.
+
+### Findings fixed (all MEASURED fails-on-old)
+
+| # | Severity | File:line | Issue | Fix | Pin (old behavior) |
+|---|----------|-----------|-------|-----|--------------------|
+| F1a | **HIGH** | `failsafe/mod.rs:51` | `nearest_neighbor_dist < collision_dist_m` fails open on a NaN peer position → **collision avoidance silently disabled** | `!is_finite() ||` → `EmergencyDiverge` | `test_nan_neighbor_distance_fails_closed_to_diverge` (old → `Nominal`) |
+| F1b | **HIGH** | `failsafe/mod.rs:75` | NaN `battery_pct` bypasses every battery check → drone stays Nominal on unknown battery | `!is_finite() ||` → `ReturnToHome` | `test_nan_battery_fails_closed_to_rth` (old → `Nominal`) |
+| F2 | **MEDIUM** | `security/geofence.rs:33` | NaN `z` altitude skips the altitude-breach check and point-in-polygon returns `Safe` → silent geofence bypass | leading non-finite coord → `HardBreach` | `test_nan_altitude_fails_closed` (old → `Safe`) |
+| F3 | **MEDIUM/DoS** | `security/antijamming.rs:65,71,102` | empty deserialized `channels_mhz` → `% 0` **panic** in `next_hop`/`current_channel_mhz`/`evasive_hop`/`tick`, crashing the radio task | `len == 0` early-return (`0.0` sentinel) | `test_empty_channels_does_not_panic` (old → panic `divisor of zero`) |
+| F4 | **LOW** | `sensing/multiview.rs:70` | NaN `victim_position` passes the `is_some()` filter and propagates into the fused "confirmed victim" location dispatched to the swarm | require finite confidence + position (drop) | `test_nan_victim_position_dropped_from_fusion` (old → non-finite fused position) |
+
+### Dimensions confirmed clean (with evidence)
+- **MAVLink decode panic-safety** — `SwarmNodeState::decode(&[u8;20])` `try_into().unwrap()`s are over fixed const ranges of a fixed-size array → provably infallible; no arbitrary-length `&[u8]` decode path exists.
+- **UWB/GPS anti-spoofing NaN-safe** — `(gps_dist - uwb_dist).abs() <= tol` already fails CLOSED on a NaN range (counts as inconsistent → spoof rejected); covered by `test_spoofed_gps_invalid`.
+- **Bounded grid / no allocate-from-length-field** — `ProbabilityGrid` bounds-checks `cx/cy`; `pos_to_cell` uses saturating `as u32` (no UB).
+- **Mesh `nearest_k` NaN-safe sort** — `partial_cmp(..).unwrap_or(Equal)` cannot panic on NaN.
+- **No hardcoded secrets** — `MavlinkSigner` key is constructor-injected `[u8;32]`; grep-confirmed nothing embedded.
+
+### Documented, not fixed (genuine — deferred to avoid churn/regression risk)
+
+1. **Raft `AppendEntries` lacks the Log-Matching consistency check**
+   (`topology/raft.rs:187`). A follower appends a leader's entries when
+   `term >= current_term` **without validating `prev_log_index`/`prev_log_term`**,
+   so a malformed/byzantine leader can corrupt a follower's log — a genuine
+   consensus-safety gap. A correct fix reworks the log-append plus the
+   caller-side vote-tally contract (the existing `handle_message` delegates
+   tallying to the caller) — a larger change with test-rewrite risk, so it is
+   recorded here rather than rushed in a security pass.
+2. **`MavlinkSigner::verify` uses a non-constant-time tag `==` and has no
+   replay/timestamp-window rejection** (`security/mavlink_signing.rs:64`). The
+   module doc already flags the replay limitation as a demo/test simplification.
+   Hardening (constant-time compare + monotonic timestamp window) is a focused
+   follow-up.
+
+These two are the recommended scope of the next `ruview-swarm` hardening pass.
+
+## Validation
+
+- `cargo test -p ruview-swarm --no-default-features` → **117 → 123** passed, 0 failed (+6 pins).
+- All 6 new tests MEASURED fails-on-old (2× `Nominal`, `Safe`, panic `divisor of zero`, non-finite fused position); pass on the fix.
+- `cargo test --workspace --no-default-features` → **exit 0**, 0 failed.
+- `python archive/v1/data/proof/verify.py` → **VERDICT: PASS**, hash
+  `f8e76f21…46f7a` unchanged (ruview-swarm off the signal proof path).
+
+## Consequences
+
+### Positive
+- Four reachable fail-open paths in a *physical-safety* control plane (collision
+  avoidance, battery RTH, geofence, anti-jamming radio task) now fail CLOSED on
+  hostile/degenerate input, each regression-pinned.
+- Extends the "non-finite input defeats logic" defense from the state-poisoning
+  variant (calibration/vitals/geo) to the fail-open-comparison variant.
+
+### Negative / Neutral
+- Two genuine issues (Raft log-matching, MAVLink signer) remain open by choice —
+  see Documented-not-fixed; they define the next hardening pass.
+
+## Links
+- ADR-148 — `ruview-swarm` drone swarm control system
+- ADR-172 — core/cli review (where the NaN bug-class root question was settled NO)
+- ADR-127 — homecore review (sibling NaN/concurrency hardening)
diff --git a/v2/crates/ruview-swarm/src/failsafe/mod.rs b/v2/crates/ruview-swarm/src/failsafe/mod.rs
index 41b36710..bcfa380e 100644
--- a/v2/crates/ruview-swarm/src/failsafe/mod.rs
+++ b/v2/crates/ruview-swarm/src/failsafe/mod.rs
@@ -47,8 +47,18 @@ impl FailSafeMachine {
         link_alive: bool,
         nearest_neighbor_dist: f64,
     ) -> FailSafeState {
-        // Collision avoidance has highest priority
-        if nearest_neighbor_dist < self.collision_dist_m {
+        // Collision avoidance has highest priority.
+        //
+        // Fail CLOSED on a non-finite neighbour distance. `nearest_neighbor_dist`
+        // is derived from peer positions (see
+        // `SwarmOrchestrator::nearest_peer_distance`), which arrive over the
+        // untrusted swarm comm layer as `DroneState` values whose f64 position
+        // fields can deserialize to NaN/Inf. A naive `NaN < collision_dist_m`
+        // evaluates to `false`, silently DISABLING collision avoidance — the
+        // worst possible failure for a physical drone. Treat a non-finite
+        // distance as "too close" so the swarm diverges rather than trusting a
+        // poisoned reading.
+        if !nearest_neighbor_dist.is_finite() || nearest_neighbor_dist < self.collision_dist_m {
             self.state = FailSafeState::EmergencyDiverge;
             return self.state.clone();
         }
@@ -71,8 +81,11 @@ impl FailSafeMachine {
             }
         }
 
-        // Battery checks
-        if state.battery_pct <= self.battery_rth_pct {
+        // Battery checks. A non-finite battery reading (NaN/Inf from a corrupt or
+        // forged telemetry/peer message) must fail CLOSED: `NaN <= threshold` is
+        // `false`, which would otherwise let a drone with an unknown battery
+        // level keep flying nominally. Treat a non-finite reading as critical.
+        if !state.battery_pct.is_finite() || state.battery_pct <= self.battery_rth_pct {
             self.state = FailSafeState::ReturnToHome;
         } else if state.battery_pct <= self.battery_warn_pct {
             self.state = FailSafeState::LowBatteryWarn;
@@ -144,4 +157,35 @@ mod tests {
         let result = fsm.tick(&s, true, 0.5); // too close
         assert_eq!(result, FailSafeState::EmergencyDiverge);
     }
+
+    /// Security: a NaN neighbour distance (poisoned peer position over the swarm
+    /// comm layer) must NOT silently disable collision avoidance. Fails on old
+    /// code where `NaN < collision_dist_m` is `false` and the state stays Nominal.
+    #[test]
+    fn test_nan_neighbor_distance_fails_closed_to_diverge() {
+        let mut fsm = FailSafeMachine::new();
+        let s = good_state();
+        let result = fsm.tick(&s, true, f64::NAN);
+        assert_eq!(
+            result,
+            FailSafeState::EmergencyDiverge,
+            "non-finite neighbour distance must fail closed to EmergencyDiverge"
+        );
+    }
+
+    /// Security: a NaN battery reading must fail closed to ReturnToHome rather
+    /// than being treated as a healthy battery. Fails on old code where
+    /// `NaN <= battery_rth_pct` is `false` and the drone stays Nominal.
+    #[test]
+    fn test_nan_battery_fails_closed_to_rth() {
+        let mut fsm = FailSafeMachine::new();
+        let mut s = good_state();
+        s.battery_pct = f32::NAN;
+        let result = fsm.tick(&s, true, 10.0);
+        assert_eq!(
+            result,
+            FailSafeState::ReturnToHome,
+            "non-finite battery must fail closed to ReturnToHome"
+        );
+    }
 }
diff --git a/v2/crates/ruview-swarm/src/security/antijamming.rs b/v2/crates/ruview-swarm/src/security/antijamming.rs
index 1a175250..e79060b4 100644
--- a/v2/crates/ruview-swarm/src/security/antijamming.rs
+++ b/v2/crates/ruview-swarm/src/security/antijamming.rs
@@ -59,8 +59,16 @@ impl FhssRadio {
     }
 
     /// Returns the current active channel frequency in MHz.
+    ///
+    /// `FhssConfig` is `Deserialize`, so `channels_mhz` can arrive empty from a
+    /// malformed or hostile config. An empty channel list would make `% n`
+    /// (n = 0) panic with a divide-by-zero. Guard it and return a benign `0.0`
+    /// sentinel instead of crashing the radio task (DoS-resistance).
     pub fn current_channel_mhz(&self) -> f64 {
         let n = self.config.channels_mhz.len();
+        if n == 0 {
+            return 0.0;
+        }
         // XOR node seed into hop index so each node uses a different offset
         let idx = (self.hop_index ^ (self.node_seed as usize)) % n;
         self.config.channels_mhz[idx]
@@ -68,7 +76,11 @@ impl FhssRadio {
 
     /// Advance the hop sequence by one step (call at hop_rate_hz).
     pub fn next_hop(&mut self) {
-        self.hop_index = (self.hop_index + 1) % self.config.channels_mhz.len();
+        let n = self.config.channels_mhz.len();
+        if n == 0 {
+            return; // no channels configured — nothing to hop (avoid `% 0` panic)
+        }
+        self.hop_index = (self.hop_index + 1) % n;
     }
 
     /// Update with latest RSSI measurement. Drives jamming detection.
@@ -97,9 +109,13 @@ impl FhssRadio {
             .wrapping_mul(lcg_a)
             .wrapping_add(self.evasion_count)
             .wrapping_add(lcg_c);
-        let n = self.config.channels_mhz.len() as u64;
+        let len = self.config.channels_mhz.len();
+        if len == 0 {
+            return; // no channels configured — avoid `% 0` panic
+        }
+        let n = len as u64;
         let offset = (seed % n / 4 + 3) as usize;
-        self.hop_index = (self.hop_index + offset) % self.config.channels_mhz.len();
+        self.hop_index = (self.hop_index + offset) % len;
         self.evasion_count += 1;
         self.rssi_history.clear();
     }
@@ -165,6 +181,23 @@ mod tests {
         assert_eq!(radio.hop_index, (initial_idx + 2) % 50);
     }
 
+    /// Security/DoS: an empty `channels_mhz` (deserialized from a malformed or
+    /// hostile config) must not panic with a `% 0` divide-by-zero. Fails on old
+    /// code, where `next_hop`/`current_channel_mhz`/`evasive_hop`/`tick` all do
+    /// modulo / index by `channels_mhz.len()`.
+    #[test]
+    fn test_empty_channels_does_not_panic() {
+        let cfg = FhssConfig { channels_mhz: vec![], jamming_detect_window: 1, ..Default::default() };
+        let mut radio = FhssRadio::new(7, cfg);
+        // None of these may panic.
+        let _ = radio.current_channel_mhz();
+        radio.next_hop();
+        radio.observe_rssi(-99.0); // window=1 → jamming_detected() true → evasive_hop()
+        radio.tick(100.0);
+        radio.evasive_hop();
+        assert_eq!(radio.current_channel_mhz(), 0.0, "empty channel list returns sentinel");
+    }
+
     #[test]
     fn test_channel_in_valid_range() {
         let cfg = FhssConfig::default();
diff --git a/v2/crates/ruview-swarm/src/security/geofence.rs b/v2/crates/ruview-swarm/src/security/geofence.rs
index f2a5e8df..47599ea0 100644
--- a/v2/crates/ruview-swarm/src/security/geofence.rs
+++ b/v2/crates/ruview-swarm/src/security/geofence.rs
@@ -27,6 +27,16 @@ pub enum GeofenceResult {
 impl Geofence {
     /// Check a position against this geofence.
     pub fn check(&self, pos: &Position3D) -> GeofenceResult {
+        // Fail CLOSED on a non-finite position. A NaN/Inf component (from a
+        // corrupt GPS/EKF estimate or a forged position) makes every subsequent
+        // comparison false: `NaN < min || NaN > max` is `false`, so the altitude
+        // breach is skipped, and a NaN altitude with otherwise-valid x/y would
+        // return `Safe` — a silent geofence bypass on a flight-safety boundary.
+        // Treat any non-finite coordinate as a hard breach.
+        if !pos.x.is_finite() || !pos.y.is_finite() || !pos.z.is_finite() {
+            return GeofenceResult::HardBreach;
+        }
+
         let altitude_m = -pos.z; // NED: negative z = altitude above ground
 
         // Altitude check
@@ -146,4 +156,29 @@ mod tests {
         let pos = Position3D { x: 50.0, y: 50.0, z: -200.0 }; // 200m altitude
         assert_eq!(f.check(&pos), GeofenceResult::HardBreach);
     }
+
+    /// Security: a NaN altitude with an otherwise in-bounds x/y must fail closed
+    /// to HardBreach. Fails on old code where `NaN < min || NaN > max` is `false`,
+    /// the altitude check is skipped, and the point-in-polygon path returns Safe —
+    /// a silent geofence bypass.
+    #[test]
+    fn test_nan_altitude_fails_closed() {
+        let f = square_fence();
+        let pos = Position3D { x: 50.0, y: 50.0, z: f64::NAN };
+        assert_eq!(f.check(&pos), GeofenceResult::HardBreach);
+    }
+
+    /// Security: NaN/Inf horizontal coordinates must also fail closed.
+    #[test]
+    fn test_nonfinite_horizontal_fails_closed() {
+        let f = square_fence();
+        assert_eq!(
+            f.check(&Position3D { x: f64::NAN, y: 50.0, z: -30.0 }),
+            GeofenceResult::HardBreach
+        );
+        assert_eq!(
+            f.check(&Position3D { x: 50.0, y: f64::INFINITY, z: -30.0 }),
+            GeofenceResult::HardBreach
+        );
+    }
 }
diff --git a/v2/crates/ruview-swarm/src/sensing/multiview.rs b/v2/crates/ruview-swarm/src/sensing/multiview.rs
index 62ea9ca4..daa7f95b 100644
--- a/v2/crates/ruview-swarm/src/sensing/multiview.rs
+++ b/v2/crates/ruview-swarm/src/sensing/multiview.rs
@@ -64,10 +64,25 @@ impl MultiViewFusion {
         detections: &[CsiDetection],
         drone_positions: &[(NodeId, Position3D)],
     ) -> Option<FusedDetection> {
-        // Filter by confidence and require estimated position
+        // Filter by confidence and require a FINITE estimated position.
+        //
+        // A peer detection (received via `receive_peer_detection`) carries f32/f64
+        // fields that can deserialize to NaN/Inf. A NaN `victim_position` passes
+        // `is_some()` and would propagate through the confidence-weighted average
+        // into the fused position — dispatching a NaN "confirmed victim" location
+        // to the swarm. A NaN `confidence` is already rejected by `>= min_confidence`
+        // (NaN comparisons are false), but we make that explicit and also require
+        // the victim position components to be finite. Fail CLOSED: drop poisoned
+        // detections rather than fusing them.
         let valid: Vec<(&CsiDetection, &Position3D)> = detections
             .iter()
-            .filter(|d| d.confidence >= self.min_confidence && d.victim_position.is_some())
+            .filter(|d| {
+                d.confidence.is_finite()
+                    && d.confidence >= self.min_confidence
+                    && d.victim_position
+                        .map(|p| p.x.is_finite() && p.y.is_finite() && p.z.is_finite())
+                        .unwrap_or(false)
+            })
             .filter_map(|d| {
                 let drone_pos = drone_positions
                     .iter()
@@ -177,4 +192,46 @@ mod tests {
             result.uncertainty_m
         );
     }
+
+    /// Security: a detection with a NaN victim position (poisoned peer report)
+    /// must be dropped, not fused. Fails on old code where the NaN propagates
+    /// into the confidence-weighted average and the fused position is NaN.
+    #[test]
+    fn test_nan_victim_position_dropped_from_fusion() {
+        let fusion = MultiViewFusion { min_viewpoints: 2, min_confidence: 0.5 };
+        let detections = vec![
+            CsiDetection {
+                drone_id: NodeId(0),
+                confidence: 0.9,
+                victim_position: Some(Position3D { x: 50.0, y: 50.0, z: 0.0 }),
+                timestamp_ms: 0,
+            },
+            CsiDetection {
+                drone_id: NodeId(1),
+                confidence: 0.9,
+                victim_position: Some(Position3D { x: f64::NAN, y: 50.0, z: 0.0 }),
+                timestamp_ms: 0,
+            },
+            CsiDetection {
+                drone_id: NodeId(2),
+                confidence: 0.9,
+                victim_position: Some(Position3D { x: 50.0, y: 50.0, z: 0.0 }),
+                timestamp_ms: 0,
+            },
+        ];
+        let positions = vec![
+            (NodeId(0), Position3D { x: 0.0, y: 0.0, z: -30.0 }),
+            (NodeId(1), Position3D { x: 100.0, y: 0.0, z: -30.0 }),
+            (NodeId(2), Position3D { x: 50.0, y: 86.6, z: -30.0 }),
+        ];
+        // Two finite viewpoints remain → still fuses, but the result must be finite.
+        let result = fusion.fuse(&detections, &positions).unwrap();
+        assert!(
+            result.estimated_position.x.is_finite()
+                && result.estimated_position.y.is_finite()
+                && result.estimated_position.z.is_finite(),
+            "fused position must be finite when a NaN detection is present"
+        );
+        assert!(!result.contributing_drones.contains(&NodeId(1)), "NaN detection must be excluded");
+    }
 }