wifi-densepose

Author SHA1 Message Date

Author	SHA1	Message	Date
rUv	42dcf49f4d	fix(adr): resolve duplicate ADR numbers + close ADR-080 security + ADR-154 M1 signal backlog (#1051 ) * fix(signal): circular phase variance for ghost-tap guard (ADR-154 §7.4 #1) `phase_variance` computed a LINEAR sample variance over phase angles that wrap at ±π, so a tightly-clustered set straddling the branch cut reported spuriously HIGH dispersion — false-tripping the `> TAU` ghost-tap guard on real, tightly-clustered CIR taps. Replace with Mardia's circular variance V = 1 − R̄, bounded [0,1] and invariant to where the cluster sits on the circle. Re-derive the guard against the bounded metric via a named const `GHOST_TAP_CIRCULAR_VARIANCE_MAX` (the old TAU-scaled threshold is meaningless on [0,1]). Grade: metric fix MEASURED; threshold value DATA-GATED — a clean single-path ramp also sweeps the circle, so V alone cannot separate clean from unsanitized without labelled frames. Conservative default (0.99) errs toward never false-rejecting, strictly more permissive at the wrap boundary than the buggy linear guard. Fails-on-old test: `phase_variance_circular_not_fooled_by_branch_cut` — inlines the old linear variance to show it exceeds TAU on wrap-straddling phases while circular V≈0 and the guard no longer trips. Plus `phase_variance_circular_is_bounded_and_extremal` (V∈[0,1], V≈0 identical, V≈1 uniform). cargo test -p wifi-densepose-signal --no-default-features --features cir --lib → 432 passed, 0 failed. Co-Authored-By: claude-flow <ruv@ruv.net> * fix(signal): pin Welford n=0/n=1 finiteness guard (ADR-154 §7.4 #10) The shared `WelfordStats` (field_model.rs, used by longitudinal.rs and others) relies on `count < 2` guards in `variance`/`sample_variance`/`std_dev`/ `z_score` to stay finite at the boundaries. The guards existed but the n=0 boundary was UNTESTED — exactly the §4 divide-by-(n−1) family the ADR groups this with. Add `welford_finite_at_n0_and_n1` asserting every statistic is finite and returns the documented sentinel (0.0) at n=0 and n=1, plus load-bearing doc comments on the two guards. Fails-on-old proof: with the `sample_variance` guard removed, the test FAILS with "attempt to subtract with overflow" at the `(self.count - 1)` underflow (0usize − 1); `variance` would similarly yield 0.0/0.0 = NaN. The guard is restored; the test pins it so a future regression is caught. Grade: MEASURED (boundary finiteness is asserted; the guard is the §4-family fix made testable). cargo test -p wifi-densepose-signal --no-default-features --lib field_model → 22 passed, 0 failed. Co-Authored-By: claude-flow <ruv@ruv.net> * refactor(signal): de-magic adversarial thresholds + boundary tests (ADR-154 §7.4 #13) Lift the bare numeric literals buried in `check`/`check_consistency` into named, documented module consts (FIELD_MODEL_GINI_VIOLATION=0.8, ENERGY_RATIO_HIGH_VIOLATION=2.0, ENERGY_RATIO_LOW_VIOLATION=0.1, CONSISTENCY_ACTIVE_FRACTION_OF_MEAN=0.1, SCORE_W_* weights). VALUES UNCHANGED — each const equals the original literal; only names + pinning tests are new. Grade: DATA-GATED. The operating values stay empirical (defensible values need labelled spoofed/clean CSI — Wi-Spoof, §6.2/§7.3). The de-magicking + characterization tests are MEASURED: `tuning_consts_unchanged_from_literals`, `energy_ratio_high_boundary`, `energy_ratio_low_boundary`, `field_model_gini_boundary`, `consistency_active_fraction_boundary` pin the decision boundaries at/just-below/just-above each threshold, so a future data-driven retune is a visible, tested change. Fails-on-change proof: bumping ENERGY_RATIO_HIGH_VIOLATION 2.0→3.0 makes `energy_ratio_high_boundary` FAIL (restored). Operating values explicitly NOT changed. cargo test -p wifi-densepose-signal --no-default-features --lib ruvsense::adversarial → 20 passed, 0 failed. Co-Authored-By: claude-flow <ruv@ruv.net> * refactor(signal): de-magic coherence drift/gate thresholds (ADR-154 §7.4 #9) Lift the bare detection literals in `coherence.rs::classify_drift` (DRIFT_STABLE_SCORE=0.85, DRIFT_STEP_CHANGE_MAX_STALE=10) and the `coherence_gate.rs` Default impl (DEFAULT_ACCEPT_THRESHOLD=0.85, DEFAULT_REJECT_THRESHOLD=0.5, DEFAULT_MAX_STALE_FRAMES=200, DEFAULT_PREDICT_ONLY_NOISE=3.0) into named, documented consts. VALUES UNCHANGED. The gate already exposed these via GatePolicyConfig (config seam); this names + pins the defaults. Grade: DATA-GATED. Operating values stay empirical (defensible Z-score thresholds need labelled stable/drifting coherence traces). De-magicking + boundary tests are MEASURED: `classify_drift_stable_score_boundary`, `classify_drift_stale_count_boundary` pin the at/just-below/just-above decisions; `drift_consts_unchanged_from_literals` / `gate_default_consts_unchanged_from_literals` pin the values. Operating values explicitly NOT changed. cargo test -p wifi-densepose-signal --no-default-features --lib ruvsense::coherence → 40 passed, 0 failed. Co-Authored-By: claude-flow <ruv@ruv.net> * docs(adr-154): mark §7.4 P1 backlog cleared — Milestone-1 (#1,#10 RESOLVED; #9,#13 DATA-GATED) Update ADR-154 §7.4 backlog rows #1, #9, #10, #13 with commit refs + grades, the §7.4 intro count (four P1 items cleared, ~41 P2/P3 remain), the Horizon-ledger one-liner (Milestone-1 DONE), and the §8 honest-limits #1 line (metric now correct; threshold still DATA-GATED). Add CHANGELOG [Unreleased] entry. Grades: #1 RESOLVED (MEASURED metric / DATA-GATED threshold), #10 RESOLVED (MEASURED), #9 & #13 RESOLVED-PARTIAL (DATA-GATED — de-magicked + boundary tested, operating values unchanged). Validation: cargo test --workspace --no-default-features → 2057 passed, 0 failed; wifi-densepose-signal lib → 442 passed (no-default + --features cir); python archive/v1/data/proof/verify.py → VERDICT: PASS, hash f8e76f21…46f7a UNCHANGED (CIR ghost-tap guard is not on the deterministic proof path). Co-Authored-By: claude-flow <ruv@ruv.net> * fix(sensing-server): stop leaking internal errors in HTTP responses (ADR-080 #2) Six handlers in `main.rs` serialized the internal error `Display` straight into the JSON response body, leaking server internals to any client (ADR-080 finding #2, CWE-209; reframed onto the Rust boundary by ADR-164 G11): - edge_registry_endpoint: a panicked spawn_blocking `JoinError` ("task … panicked") in a 500, and the raw upstream error in a 503 - delete_model / delete_recording / start_recording: std::io::Error strings carrying OS detail / filesystem paths - calibration_start / calibration_stop: the FieldModel error chain New `error_response` module: `internal_error` / `internal_error_json` / `upstream_unavailable` log the full detail server-side only (tagged with a correlation id) and return a generic body (`{"error":"internal_error","correlation_id":…}`) — no `panicked`, no file paths, no Debug chain. The correlation id lets an operator join a client report to the exact server log line without ever shipping the detail. Pinned by 5 error_response tests, incl. a leak-substring guard (internal_error_body_does_not_leak_detail) verified to FAIL on the reverted old body (returns the panic message / path / "os error"). The HOMECORE sweep (ADR-161) covered homecore-server, not this crate. Co-Authored-By: claude-flow <ruv@ruv.net> * test(sensing-server): pin XFF-immunity + no-query-token (ADR-080 #1, #3) Findings #1 (XFF-spoofing bypass) and #3 (JWT-in-URL, CWE-598) were logged against the Python v1 API but are VERIFIED ABSENT on the current Rust sensing-server, so they get regression tests rather than redundant fixes: - #1 XFF: there is no IP-based rate-limiter or IP-allowlist to bypass, and neither security middleware reads a forwarded header. Added bearer_auth::xff_header_never_affects_auth_decision (spoofed X-Forwarded-For never flips a 401<->200 decision) and host_validation::forwarded_headers_never_bypass_host_allowlist (spoofed X-Forwarded-Host: localhost never lets Host: evil.com past the allowlist). - #3 JWT-in-URL: require_bearer reads the token only from the Authorization header; WS handlers take no query token; the sole Query extractor (EdgeRegistryParams) is a non-secret refresh flag. Added bearer_auth::query_string_token_is_never_accepted — ?token= / ?access_token= in the URL never authenticates (stays 401) while the header path still 200s. Verified to FAIL when a query-token path is injected into require_bearer. Co-Authored-By: claude-flow <ruv@ruv.net> * docs(adr-080): mark P0 security findings #1-#3 RESOLVED; close ADR-164 G11 - ADR-080: Status note + per-finding closure (#1 XFF and #3 JWT-in-URL verified absent + regression-pinned; #2 leaked errors fixed via the error_response module). Records the v1-vs-Rust boundary distinction explicitly: v1 paths remain archived; this closure governs the shipped Rust sensing-server. - ADR-164: Gap Register G11 and the Open/Gated Backlog entry marked RESOLVED with the fix + branch reference. - CHANGELOG: [Unreleased] -> ### Security entry covering all three findings. Co-Authored-By: claude-flow <ruv@ruv.net> * docs(adr): renumber 6 displaced ADRs to resolve duplicate-number collisions (ADR-164 G1) Resolves the 5 duplicate ADR numbers (6 displaced files) flagged by ADR-164 Gap Register item G1. Canonical keeper per number = first file committed at that number (date tie-broken by inbound cross-reference count / parent-appendix relationship). Displaced files renumbered to the next free numbers (166-171): 050 keeps provisioning-tool-enhancements (5 refs vs 1) -> ADR-166-quality-engineering-security-hardening 052 keeps tauri-desktop-frontend (parent ADR) -> ADR-167-ddd-bounded-contexts (its appendix) 147 keeps nvidia-cosmos/OccWorld (the actual ADR, has Status header) -> ADR-168-benchmark-proof (proof companion, no Status) -> ADR-169-adam-mode-light-theme (was untracked) 148 keeps drone-swarm-control-system (committed #862) -> ADR-170-yoga-mode-pose-system (was untracked) 149 keeps public-community-leaderboard-huggingface (committed 16:47 vs 17:38) -> ADR-171-swarm-benchmarking-evaluation-methodology Updates in-file `# ADR-NNN` headers and intra-file self-references (yoga-modes * docs(adr): repoint inbound cross-references to renumbered ADRs (166-171) Follow-up to the ADR renumbering (ADR-164 G1). Updates every inbound reference that pointed at a displaced ADR, disambiguating shared numbers by title/slug so only references to the DISPLACED topic move and keeper references stay put. ADR-168 (was 147 benchmark-proof): README, CHANGELOG, user-guide, proof-of-capabilities, research docs 00/03 — all path/label refs updated. ADR-169 (was 147 adam-mode) / ADR-170 (was 148 yoga-mode): docs/adr/README index. ADR-171 (was 149 swarm-benchmarking): all ruview-swarm eval code+docs (Cargo.toml, evals/, eval_swarm.rs, metrics/mod/report/runner.rs), research doc 03 (every §-ref matched ADR-171 sections, not AetherArena), 00-system-review, series README, CHANGELOG, and ADR-148's forward/"open issues" pointers. ADR-166 (was 050 quality-engineering / security-hardening): disambiguated from the ADR-050 provisioning KEEPER by topic. The HMAC/secure_tdm, directory-traversal, bind-address, and OTA-PSK-auth references in code comments (wifi-densepose-hardware Cargo.toml + secure_tdm.rs, sensing-server main.rs) and in ADR-052-tauri / ADR-167 all describe the security-hardening ADR -> ADR-166. ADR-167 (was 052 ddd-appendix): inbound appendix references. Index/registry updates: docs/adr/README.md, gap-analysis/census.md (rows + header count), gap-analysis/lens-findings.md (collision table marked RESOLVED), and ADR-164 Gap Register G1 marked RESOLVED with the full renumber map. Keeper references deliberately untouched: all ADR-147 OccWorld code, all ADR-148 drone-swarm code/docs, all ADR-149 AetherArena refs (incl. ADR-150's SSL/resampling refs, which ADR-150 explicitly binds to the AetherArena benchmark), ADR-050 provisioning refs, ADR-052 tauri refs. The frozen GitHub blob URLs in docs/adr/.issue-177-body.md (pinned to an old branch) are left as historical. Comment-only code edits; no behavior change. wifi-densepose-hardware compiles clean; the sensing-server build's sole blocker is the pre-existing upstream midstreamer-temporal-compare@0.2.1 registry crate, unrelated to these edits. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-13 14:31:38 -04:00
rUv	0d3d835bf8	feat(swarm): add ruview-swarm crate — drone swarm control system (ADR-148) (#862 ) * feat(swarm): add wifi-densepose-swarm crate implementing ADR-148 drone swarm control system New crate `wifi-densepose-swarm` with hierarchical-mesh swarm topology, Raft consensus, MAPPO MARL, CSI sensing integration, and ITAR-gated coordination features. Closes 3 of 7 milestones (M1, M2, M5) with 5/5 ADR-148 SOTA performance targets met. ## Modules (45 source files, 14 modules) - types: NodeId, DroneState, Position3D, SwarmTask, SwarmError, FailSafeState - topology: Raft consensus (leader election, log replication, quorum), Gossip, Mesh - formation: VirtualStructure, LeaderFollower, Reynolds flocking (itar-gated) - planning: RRT-APF hybrid planner, 3-phase coverage, Bayesian grid, pheromone - allocation: Auction + FNN bid scorer (itar-gated) - sensing: CsiPayloadPipeline (Live/Synthetic/Replay), MultiViewFusion, OccWorldBridge - marl: MAPPO actor (3-layer MLP), LocalObservation (64-dim), RewardCalculator, PPO loop - security: MAVLink v2 HMAC-SHA256, UWB anti-spoofing, geofence, Remote ID, FHSS - failsafe: 10-state onboard machine, GCS-independent safety transitions - config: TOML SwarmConfig with SAR/inspection/agriculture/mine/demo/wi2sar_reference - demo: SyntheticCsiGenerator, DemoScenario (SAR/open-field/mine) - integration: FlightController trait, MAVLink dialect (50000-50005), SwarmSim - orchestrator: SwarmOrchestrator wiring all subsystems end-to-end - bench_support: Criterion fixture generators ## ITAR compliance Swarming coordination features gated behind `itar-unrestricted` feature per USML Category VIII(h)(12). Default build compiles clean stubs. ## Benchmark results (criterion, release mode) - MARL actor inference: 3.3 µs (target ≤ 5 ms — 1,516× headroom) - RRT-APF planning (100 iter): 0.043 ms (target < 300 ms — 6,946× headroom) - MultiView CSI fusion (3 UAVs): 58.5 ns (target < 10 ms — 171,000× headroom) - 3-view localization: 1.732 m (target ≤ 2 m — beats Wi2SAR SOTA) - 4-drone SAR coverage (400×400 m): 223 s (target ≤ 240 s — PASS) ## Tests - --no-default-features: 73/73 passing - --features itar-unrestricted: 85/85 passing Closes #861 Co-Authored-By: claude-flow <ruv@ruv.net> * refactor(swarm): rename wifi-densepose-swarm → ruview-swarm The swarm control system is a RuView-level capability (drone coordination, Raft consensus, MARL) that operates above the wifi-densepose sensing layer rather than being a sub-component of it. Rename aligns with the project identity and separates coordination infrastructure from sensing modules. Co-Authored-By: claude-flow <ruv@ruv.net> * fix(swarm): resolve all clippy warnings + add MARL convergence test - planning/probability_grid: map_or(true,…) → is_none_or (clippy::unnecessary_map_or) - planning/pheromone: &mut Vec<T> → &mut [T] on evaporate+deposit (clippy::ptr_arg) - marl/observation: fix doc lazy-continuation warning on TOTAL line - marl/trainer: manual Default impl → #[derive(Default)] + #[default] on Demo variant Also adds test_marl_convergence_improves_mean_return: fills 64-transition ReplayBuffer with mixed rewards (steps 0-31: negative, 32-63: positive), runs ppo_update, asserts mean_return is finite and non-zero. Result: 0 clippy warnings · 74/74 tests (default) · 86/86 (itar-unrestricted) Co-Authored-By: claude-flow <ruv@ruv.net> * feat(swarm): integrate Ruflo AI-agent capabilities into ruview-swarm Adds a feature-gated Ruflo integration layer connecting ruview-swarm to the claude-flow daemon's AgentDB, AIDefence, and SONA intelligence subsystems. Default build is unaffected (all paths behind `Option<Box<dyn RufloBackend>>`). ## New module: src/ruflo/ - backend.rs: RufloBackend trait (9 async methods) + RufloError, MissionMemoryEntry, PatternEntry, MavlinkScanResult types (always compiled) - mock_backend.rs: MockRufloBackend in-memory impl for testing (always compiled, 5 tests) - http_backend.rs: HttpRufloBackend — JSON-RPC 2.0 → claude-flow daemon localhost:3000 (gated behind `ruflo` feature, requires reqwest) - mission_summary.rs: MissionSummary serializer with pattern description + confidence scoring from victim recall, coverage %, collision penalty (always compiled, 3 tests) ## 4 capability areas 1. MissionMemory → memory_store / memory_search (cross-mission victim memory) 2. PatternLearner → agentdb_pattern-store / -search (HNSW SONA trajectory patterns) 3. MavlinkDefence → aidefence_is_safe / aidefence_scan (scan MAVLink before accepting) 4. IntelligenceHooks → trajectory-start/step/end (SONA learning loop) ## SwarmOrchestrator integration - with_ruflo(backend): builder to attach a backend - start_trajectory(task) / finish_trajectory(success, key): SONA mission lifecycle - receive_peer_detection_checked(): AIDefence scan before accepting peer detections ## Cargo feature `ruflo = ["dep:reqwest", "dep:serde_json"]` — optional, not in default ## Tests - --no-default-features: 82/82 pass (8 new ruflo tests) - --features ruflo,itar-unrestricted: 94/94 pass Co-Authored-By: claude-flow <ruv@ruv.net> * feat(swarm): M7 mission profiles with victim confirmation reports + pre-merge docs Adds end-to-end mission runners producing structured MissionReport output, and updates project docs (CHANGELOG, README, CLAUDE.md) per pre-merge checklist. ## M7 Mission Profiles (integration/mission_report.rs + swarm_sim.rs) - MissionReport / VictimReport / SotaComparison types (serde-serializable) - run_mission_with_report(): full mission → detailed report with per-victim localization error, fusion uncertainty, contributing drones, detection time - run_inspection_mission(): leader-follower power-line corridor inspection - run_mine_mission(): GPS-denied underground (2-drone, slow, UWB-only) - SotaComparison embeds Wi2SAR baseline (5m / 810s) vs achieved metrics ## Docs (pre-merge checklist) - CHANGELOG.md: ruview-swarm + Ruflo integration + performance entries - README.md: ruview-swarm row - CLAUDE.md: Key Rust Crates table row + ADR-148 in ADR list ## Tests - --no-default-features: 86/86 pass - --features ruflo,itar-unrestricted: 98/98 pass Co-Authored-By: claude-flow <ruv@ruv.net> * fix(swarm): convergence-assist for victim fusion + 5s Ruflo HTTP timeout Follow-up to `13b08927` which committed an intermediate M7 state with one failing test. This lands the M7 agent's convergence fixes and the security review's timeout hardening. ## Fixes - swarm_sim.rs: min-separation nudge before collision metric (0 collisions with staggered starts) + Phase-3 convergence assist that vectors the nearest idle peer toward a single-drone CSI contact so multi-view fusion can fire - http_backend.rs: add 5s request timeout to reqwest client (security review Medium finding — a dead daemon would otherwise hang the swarm step loop) ## Security review verdict (HttpRufloBackend) Safe to merge. No credentials in requests, serde_json prevents injection, fail-open on daemon-down is documented and appropriate for SAR missions, MAVLink passed as structured text (not raw bytes). Timeout fix applied. ## Tests - --no-default-features: 87/87 pass - --features ruflo,itar-unrestricted: 100/100 pass Co-Authored-By: claude-flow <ruv@ruv.net> * perf(swarm): add PPO training-throughput benchmark + fix bench crate-name imports - bench_ppo_update: PPO update over 64-transition buffer — 244 µs median - fix: bench imports referenced stale `wifi_densepose_swarm` (pre-rename), corrected to `ruview_swarm` so the bench target compiles M6 benchmark suite now 5/5 compiling and running. Tests unchanged: 87/100. Co-Authored-By: claude-flow <ruv@ruv.net> * feat(swarm): real Candle autodiff PPO + A-MAPPO role attention + GPU training (M4) Replaces the finite-difference PPO placeholder with a real GPU-capable Candle 0.9 autodiff trainer, adds A-MAPPO heterogeneous-role attention, a runnable training binary, and right-sized GCP/local launch scripts. This is the unlock that makes "GPU long training cycles" actually mean something — the previous ppo_update did no gradient descent. ## Real autodiff PPO (feature `train`, optional `cuda`) - candle_ppo.rs: CandleActorCritic (64→128→64 MLP + action/value heads + learnable log_std), CandlePpoConfig, CandleTrainer with GAE and a genuine optimizer.backward_step over the network. select_device() picks CUDA when built --features cuda and a GPU is present, else CPU. - Verified: 5-episode CPU smoke run shows value_loss 12643→12375 (critic actually learning); safetensors checkpoint saved. Placeholder never moved weights. ## A-MAPPO heterogeneous-role attention (role_attention.rs, always compiled) Addresses the four sensor-vs-relay edge cases: - relay attention floor (prevents collapse — relays produce no CSI) - role-segmented sensor/relay attention pools (variable neighbor cardinality) - sensor-gated triangulation-geometry penalty (protects 3-view fusion baseline, ADR-148 §4.2 — relays not dragged into triangulation geometry) - one-hot role embeddings for keys ## Training binary - src/bin/train_marl.rs (required-features=["train"], excluded from default build) - CLI: --episodes --drones --profile --steps --checkpoint-dir --checkpoint-every - Wires CandleTrainer to the SwarmOrchestrator rollout loop; GAE + PPO update per episode; periodic safetensors checkpoints ## Right-sized launch (scripts/gcp/) - provision_marl.sh: g2-standard-16 (1× L4, 16 vCPU, ~$1.40/hr) — NOT the $29/hr A100×8 box. MARL is rollout-bound not matmul-bound; ~21× cheaper. - run_marl_train.sh: GCP rsync + train + checkpoint pull - run_marl_train_local.sh: local RTX 5080, $0 - A100×8 provision_training.sh left for OccWorld (which saturates the GPUs) ## Tests - --no-default-features: 91/91 (87 + 4 role_attention) - --features train: 96/96 (+ 5 candle_ppo, incl. real-autodiff verification) - --features ruflo,itar-unrestricted: 104/104 - default build stays light: train_marl excluded via required-features Co-Authored-By: claude-flow <ruv@ruv.net> * docs(adr-148): mark M4 complete — real GPU autodiff training; overall 98% Co-Authored-By: claude-flow <ruv@ruv.net> * feat(swarm): training visualizer — JSONL telemetry + self-contained HTML viewer Adds an offline, dependency-free visualization for the drone training system: a top-down swarm replay synced with training-metric curves, fed by a JSONL telemetry log the trainer emits. No server, no build step, no CDN. ## Telemetry recorder (integration/telemetry.rs, always compiled, no new deps) - TelemetryRecorder writes newline-delimited JSON: one `meta` (profile, area, ground-truth victims), many `step` (per-tick drone x/y/heading/battery/detection + coverage%), and per-episode `episode` (mean_return, policy_loss, value_loss). - Written by hand (no serde_json) so it stays in the default build; 2 tests. ## train_marl telemetry flags - `--telemetry FILE` writes the log; `--telemetry-episode N` selects which episode's spatial steps to record (metrics recorded for all episodes). ## Visualizer (viz/swarm_viz.html — single file, vanilla JS + canvas) - LEFT: top-down replay — heading-oriented drone triangles (cyan/lime on detection), victim markers, growing coverage heatmap, detection pulse rings, play/pause/scrub/speed controls + live coverage/detection readout. - RIGHT: three autoscaled line charts (mean return, policy loss, value loss) over episodes, hand-drawn (no chart library). - Loads via file picker/drag-drop or auto-fetches the bundled sample; dark drone-ops theme; graceful degradation on file:// CORS. - viz/sample_telemetry.jsonl: real 30-episode / 4-drone / 400×400 m run (value_loss 20052→7154 — visible critic learning). Parses 1 meta / 60 step / 30 episode. ## Usage cargo run --release -p ruview-swarm --features train,cuda --bin train_marl -- \ --episodes 5000 --telemetry run.jsonl open v2/crates/ruview-swarm/viz/swarm_viz.html # load run.jsonl Tests unchanged (91 default / 96 train / 104 ruflo+itar); telemetry adds 2. Co-Authored-By: claude-flow <ruv@ruv.net> * feat(swarm): selectable flight + self-learning patterns, wired into training + viz Adds multiple flight/coverage-optimization strategies and self-learning strategies, selectable from the trainer, and fixes drone clustering — the demo sweep now covers 36% of the area (was ~0.9%) with 4 disjoint strips. ## Flight patterns (planning/patterns.rs) — `FlightPattern` - PartitionedLawnmower (new default): area split into per-drone strips → no overlap, coverage scales ~linearly with swarm size (clustering fix) - Boustrophedon (baseline), Spiral, Pheromone (stigmergic), PotentialField, LevyFlight. from_str/name/all + next_target(&PatternContext). ## Self-learning patterns (marl/learning.rs) — `LearningPattern` - Mappo (CTDE centralized critic), Ippo (independent, jamming-robust), MappoCuriosity (count-based intrinsic novelty), MetaRl (MAML fast-adapt). - CuriosityModule (visit_bonus = beta/sqrt(count), novelty decays on revisit), MetaAdapter (base + fast-weights, reset_fast/consolidate), shaped_reward(). ## Trainer wiring (bin/train_marl.rs) - --flight-pattern {boustrophedon\|partitioned\|spiral\|pheromone\|potential\|levy} - --learn-pattern {mappo\|ippo\|curiosity\|meta} - Rollout now moves each drone per the selected FlightPattern (PatternContext with visited trail + live peers), curiosity-shapes the reward, and logs CTDE vs independent. Telemetry meta profile carries the pattern labels so the viewer header shows `flight=… · learn=…`. ## Verification - Browser pass (viz at localhost:8777): partitioned run renders 4 distinct serpentine coverage bands, header shows the patterns, final coverage 36.3%, scrubber/speed/playback work, ZERO console errors. Screenshot confirmed. - Regenerated viz/sample_telemetry.jsonl: 1 meta / 120 step / 30 episode, coverage 0.9% → 36.3%. ## Tests - --no-default-features: 103/103 (was 91; +6 patterns +6 learning) - --features train: 108/108 Co-Authored-By: claude-flow <ruv@ruv.net> * feat(swarm): add flight-pattern telemetry presets for the visualizer 5 loadable presets (verified browser-distinct, physics-ordered coverage): pheromone ~44% > potential ~40% > partitioned 36% > spiral ~13% > levy ~5%. Load any in viz/swarm_viz.html to compare flight strategies without retraining. Co-Authored-By: claude-flow <ruv@ruv.net> * chore(swarm): clippy-clean + publish guard for ruview-swarm - ruview-swarm src is now 0 clippy warnings across default/train/full feature sets (derive Default, targeted allows for intentional from_str + bounded casts + borrow-required index loops; removed redundant unsigned .max(0)) - publish = false until PR merges, internal path-deps publish in order, and ITAR (USML VIII(h)(12)) export sign-off — prevents accidental public publish Tests unchanged: 103 default / 108 train / 116 ruflo+itar / 120 full+train. (6 remaining clippy warnings are pre-existing in dependency wifi-densepose-core, out of scope for this crate.) Co-Authored-By: claude-flow <ruv@ruv.net> * ci(swarm): add ruview-swarm CI guard Path-scoped guard for v2/crates/ruview-swarm/** (ADR-148). Complements the main ci.yml (which only runs the default workspace tests): - feature-matrix tests: default / train / ruflo+itar / full+train - clippy -D warnings --no-deps (crate-own code only; dep warnings don't gate) - train_marl bin builds under 'train' AND is excluded from the default build - ITAR/publish guards: publish=false present, itar-unrestricted never in default All steps verified locally green before commit. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-30 16:00:59 -04:00

rUv

42dcf49f4d

fix(adr): resolve duplicate ADR numbers + close ADR-080 security + ADR-154 M1 signal backlog (#1051 )

* fix(signal): circular phase variance for ghost-tap guard (ADR-154 §7.4 #1)

`phase_variance` computed a LINEAR sample variance over phase angles that
wrap at ±π, so a tightly-clustered set straddling the branch cut reported
spuriously HIGH dispersion — false-tripping the `> TAU` ghost-tap guard on
real, tightly-clustered CIR taps.

Replace with Mardia's circular variance V = 1 − R̄, bounded [0,1] and
invariant to where the cluster sits on the circle. Re-derive the guard
against the bounded metric via a named const
`GHOST_TAP_CIRCULAR_VARIANCE_MAX` (the old TAU-scaled threshold is
meaningless on [0,1]).

Grade: metric fix MEASURED; threshold value DATA-GATED — a clean single-path
ramp also sweeps the circle, so V alone cannot separate clean from
unsanitized without labelled frames. Conservative default (0.99) errs toward
never false-rejecting, strictly more permissive at the wrap boundary than the
buggy linear guard.

Fails-on-old test: `phase_variance_circular_not_fooled_by_branch_cut` —
inlines the old linear variance to show it exceeds TAU on wrap-straddling
phases while circular V≈0 and the guard no longer trips. Plus
`phase_variance_circular_is_bounded_and_extremal` (V∈[0,1], V≈0 identical,
V≈1 uniform).

cargo test -p wifi-densepose-signal --no-default-features --features cir --lib
→ 432 passed, 0 failed.

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix(signal): pin Welford n=0/n=1 finiteness guard (ADR-154 §7.4 #10)

The shared `WelfordStats` (field_model.rs, used by longitudinal.rs and others)
relies on `count < 2` guards in `variance`/`sample_variance`/`std_dev`/
`z_score` to stay finite at the boundaries. The guards existed but the n=0
boundary was UNTESTED — exactly the §4 divide-by-(n−1) family the ADR groups
this with.

Add `welford_finite_at_n0_and_n1` asserting every statistic is finite and
returns the documented sentinel (0.0) at n=0 and n=1, plus load-bearing doc
comments on the two guards.

Fails-on-old proof: with the `sample_variance` guard removed, the test FAILS
with "attempt to subtract with overflow" at the `(self.count - 1)` underflow
(0usize − 1); `variance` would similarly yield 0.0/0.0 = NaN. The guard is
restored; the test pins it so a future regression is caught.

Grade: MEASURED (boundary finiteness is asserted; the guard is the §4-family
fix made testable).

cargo test -p wifi-densepose-signal --no-default-features --lib field_model
→ 22 passed, 0 failed.

Co-Authored-By: claude-flow <ruv@ruv.net>

* refactor(signal): de-magic adversarial thresholds + boundary tests (ADR-154 §7.4 #13)

Lift the bare numeric literals buried in `check`/`check_consistency` into
named, documented module consts (FIELD_MODEL_GINI_VIOLATION=0.8,
ENERGY_RATIO_HIGH_VIOLATION=2.0, ENERGY_RATIO_LOW_VIOLATION=0.1,
CONSISTENCY_ACTIVE_FRACTION_OF_MEAN=0.1, SCORE_W_* weights). VALUES UNCHANGED —
each const equals the original literal; only names + pinning tests are new.

Grade: DATA-GATED. The operating values stay empirical (defensible values need
labelled spoofed/clean CSI — Wi-Spoof, §6.2/§7.3). The de-magicking +
characterization tests are MEASURED: `tuning_consts_unchanged_from_literals`,
`energy_ratio_high_boundary`, `energy_ratio_low_boundary`,
`field_model_gini_boundary`, `consistency_active_fraction_boundary` pin the
decision boundaries at/just-below/just-above each threshold, so a future
data-driven retune is a visible, tested change.

Fails-on-change proof: bumping ENERGY_RATIO_HIGH_VIOLATION 2.0→3.0 makes
`energy_ratio_high_boundary` FAIL (restored). Operating values explicitly
NOT changed.

cargo test -p wifi-densepose-signal --no-default-features --lib ruvsense::adversarial
→ 20 passed, 0 failed.

Co-Authored-By: claude-flow <ruv@ruv.net>

* refactor(signal): de-magic coherence drift/gate thresholds (ADR-154 §7.4 #9)

Lift the bare detection literals in `coherence.rs::classify_drift`
(DRIFT_STABLE_SCORE=0.85, DRIFT_STEP_CHANGE_MAX_STALE=10) and the
`coherence_gate.rs` Default impl (DEFAULT_ACCEPT_THRESHOLD=0.85,
DEFAULT_REJECT_THRESHOLD=0.5, DEFAULT_MAX_STALE_FRAMES=200,
DEFAULT_PREDICT_ONLY_NOISE=3.0) into named, documented consts. VALUES
UNCHANGED. The gate already exposed these via GatePolicyConfig (config seam);
this names + pins the defaults.

Grade: DATA-GATED. Operating values stay empirical (defensible Z-score
thresholds need labelled stable/drifting coherence traces). De-magicking +
boundary tests are MEASURED: `classify_drift_stable_score_boundary`,
`classify_drift_stale_count_boundary` pin the at/just-below/just-above
decisions; `drift_consts_unchanged_from_literals` /
`gate_default_consts_unchanged_from_literals` pin the values. Operating values
explicitly NOT changed.

cargo test -p wifi-densepose-signal --no-default-features --lib ruvsense::coherence
→ 40 passed, 0 failed.

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(adr-154): mark §7.4 P1 backlog cleared — Milestone-1 (#1,#10 RESOLVED; #9,#13 DATA-GATED)

Update ADR-154 §7.4 backlog rows #1, #9, #10, #13 with commit refs + grades,
the §7.4 intro count (four P1 items cleared, ~41 P2/P3 remain), the
Horizon-ledger one-liner (Milestone-1 DONE), and the §8 honest-limits #1 line
(metric now correct; threshold still DATA-GATED). Add CHANGELOG [Unreleased]
entry.

Grades: #1 RESOLVED (MEASURED metric / DATA-GATED threshold), #10 RESOLVED
(MEASURED), #9 & #13 RESOLVED-PARTIAL (DATA-GATED — de-magicked + boundary
tested, operating values unchanged).

Validation: cargo test --workspace --no-default-features → 2057 passed, 0
failed; wifi-densepose-signal lib → 442 passed (no-default + --features cir);
python archive/v1/data/proof/verify.py → VERDICT: PASS, hash f8e76f21…46f7a
UNCHANGED (CIR ghost-tap guard is not on the deterministic proof path).

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix(sensing-server): stop leaking internal errors in HTTP responses (ADR-080 #2)

Six handlers in `main.rs` serialized the internal error `Display` straight
into the JSON response body, leaking server internals to any client (ADR-080
finding #2, CWE-209; reframed onto the Rust boundary by ADR-164 G11):

  - edge_registry_endpoint: a panicked spawn_blocking `JoinError`
    ("task … panicked") in a 500, and the raw upstream error in a 503
  - delete_model / delete_recording / start_recording: std::io::Error
    strings carrying OS detail / filesystem paths
  - calibration_start / calibration_stop: the FieldModel error chain

New `error_response` module: `internal_error` / `internal_error_json` /
`upstream_unavailable` log the full detail server-side only (tagged with a
correlation id) and return a generic body
(`{"error":"internal_error","correlation_id":…}`) — no `panicked`, no file
paths, no Debug chain. The correlation id lets an operator join a client
report to the exact server log line without ever shipping the detail.

Pinned by 5 error_response tests, incl. a leak-substring guard
(internal_error_body_does_not_leak_detail) verified to FAIL on the reverted
old body (returns the panic message / path / "os error"). The HOMECORE sweep
(ADR-161) covered homecore-server, not this crate.

Co-Authored-By: claude-flow <ruv@ruv.net>

* test(sensing-server): pin XFF-immunity + no-query-token (ADR-080 #1, #3)

Findings #1 (XFF-spoofing bypass) and #3 (JWT-in-URL, CWE-598) were logged
against the Python v1 API but are VERIFIED ABSENT on the current Rust
sensing-server, so they get regression tests rather than redundant fixes:

  - #1 XFF: there is no IP-based rate-limiter or IP-allowlist to bypass, and
    neither security middleware reads a forwarded header. Added
    bearer_auth::xff_header_never_affects_auth_decision (spoofed
    X-Forwarded-For never flips a 401<->200 decision) and
    host_validation::forwarded_headers_never_bypass_host_allowlist (spoofed
    X-Forwarded-Host: localhost never lets Host: evil.com past the allowlist).

  - #3 JWT-in-URL: require_bearer reads the token only from the Authorization
    header; WS handlers take no query token; the sole Query extractor
    (EdgeRegistryParams) is a non-secret refresh flag. Added
    bearer_auth::query_string_token_is_never_accepted — ?token= / ?access_token=
    in the URL never authenticates (stays 401) while the header path still 200s.
    Verified to FAIL when a query-token path is injected into require_bearer.

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(adr-080): mark P0 security findings #1-#3 RESOLVED; close ADR-164 G11

- ADR-080: Status note + per-finding closure (#1 XFF and #3 JWT-in-URL
  verified absent + regression-pinned; #2 leaked errors fixed via the
  error_response module). Records the v1-vs-Rust boundary distinction
  explicitly: v1 paths remain archived; this closure governs the shipped
  Rust sensing-server.
- ADR-164: Gap Register G11 and the Open/Gated Backlog entry marked
  RESOLVED with the fix + branch reference.
- CHANGELOG: [Unreleased] -> ### Security entry covering all three findings.

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(adr): renumber 6 displaced ADRs to resolve duplicate-number collisions (ADR-164 G1)

Resolves the 5 duplicate ADR numbers (6 displaced files) flagged by ADR-164
Gap Register item G1. Canonical keeper per number = first file committed at
that number (date tie-broken by inbound cross-reference count / parent-appendix
relationship). Displaced files renumbered to the next free numbers (166-171):

  050 keeps provisioning-tool-enhancements (5 refs vs 1)
    -> ADR-166-quality-engineering-security-hardening
  052 keeps tauri-desktop-frontend (parent ADR)
    -> ADR-167-ddd-bounded-contexts (its appendix)
  147 keeps nvidia-cosmos/OccWorld (the actual ADR, has Status header)
    -> ADR-168-benchmark-proof (proof companion, no Status)
    -> ADR-169-adam-mode-light-theme (was untracked)
  148 keeps drone-swarm-control-system (committed #862)
    -> ADR-170-yoga-mode-pose-system (was untracked)
  149 keeps public-community-leaderboard-huggingface (committed 16:47 vs 17:38)
    -> ADR-171-swarm-benchmarking-evaluation-methodology

Updates in-file `# ADR-NNN` headers and intra-file self-references (yoga-modes

* docs(adr): repoint inbound cross-references to renumbered ADRs (166-171)

Follow-up to the ADR renumbering (ADR-164 G1). Updates every inbound reference
that pointed at a displaced ADR, disambiguating shared numbers by title/slug so
only references to the DISPLACED topic move and keeper references stay put.

ADR-168 (was 147 benchmark-proof): README, CHANGELOG, user-guide,
  proof-of-capabilities, research docs 00/03 — all path/label refs updated.
ADR-169 (was 147 adam-mode) / ADR-170 (was 148 yoga-mode): docs/adr/README index.
ADR-171 (was 149 swarm-benchmarking): all ruview-swarm eval code+docs
  (Cargo.toml, evals/, eval_swarm.rs, metrics/mod/report/runner.rs), research
  doc 03 (every §-ref matched ADR-171 sections, not AetherArena), 00-system-review,
  series README, CHANGELOG, and ADR-148's forward/"open issues" pointers.
ADR-166 (was 050 quality-engineering / security-hardening): disambiguated from the
  ADR-050 provisioning KEEPER by topic. The HMAC/secure_tdm, directory-traversal,
  bind-address, and OTA-PSK-auth references in code comments
  (wifi-densepose-hardware Cargo.toml + secure_tdm.rs, sensing-server main.rs) and
  in ADR-052-tauri / ADR-167 all describe the security-hardening ADR -> ADR-166.
ADR-167 (was 052 ddd-appendix): inbound appendix references.

Index/registry updates: docs/adr/README.md, gap-analysis/census.md (rows +
header count), gap-analysis/lens-findings.md (collision table marked RESOLVED),
and ADR-164 Gap Register G1 marked RESOLVED with the full renumber map.

Keeper references deliberately untouched: all ADR-147 OccWorld code, all ADR-148
drone-swarm code/docs, all ADR-149 AetherArena refs (incl. ADR-150's SSL/resampling
refs, which ADR-150 explicitly binds to the AetherArena benchmark), ADR-050
provisioning refs, ADR-052 tauri refs. The frozen GitHub blob URLs in
docs/adr/.issue-177-body.md (pinned to an old branch) are left as historical.

Comment-only code edits; no behavior change. wifi-densepose-hardware compiles
clean; the sensing-server build's sole blocker is the pre-existing upstream
midstreamer-temporal-compare@0.2.1 registry crate, unrelated to these edits.

Co-Authored-By: claude-flow <ruv@ruv.net>

2026-06-13 14:31:38 -04:00

rUv

0d3d835bf8

feat(swarm): add ruview-swarm crate — drone swarm control system (ADR-148) (#862 )

* feat(swarm): add wifi-densepose-swarm crate implementing ADR-148 drone swarm control system

New crate `wifi-densepose-swarm` with hierarchical-mesh swarm topology,
Raft consensus, MAPPO MARL, CSI sensing integration, and ITAR-gated
coordination features. Closes 3 of 7 milestones (M1, M2, M5) with 5/5
ADR-148 SOTA performance targets met.

## Modules (45 source files, 14 modules)

- types: NodeId, DroneState, Position3D, SwarmTask, SwarmError, FailSafeState
- topology: Raft consensus (leader election, log replication, quorum), Gossip, Mesh
- formation: VirtualStructure, LeaderFollower, Reynolds flocking (itar-gated)
- planning: RRT-APF hybrid planner, 3-phase coverage, Bayesian grid, pheromone
- allocation: Auction + FNN bid scorer (itar-gated)
- sensing: CsiPayloadPipeline (Live/Synthetic/Replay), MultiViewFusion, OccWorldBridge
- marl: MAPPO actor (3-layer MLP), LocalObservation (64-dim), RewardCalculator, PPO loop
- security: MAVLink v2 HMAC-SHA256, UWB anti-spoofing, geofence, Remote ID, FHSS
- failsafe: 10-state onboard machine, GCS-independent safety transitions
- config: TOML SwarmConfig with SAR/inspection/agriculture/mine/demo/wi2sar_reference
- demo: SyntheticCsiGenerator, DemoScenario (SAR/open-field/mine)
- integration: FlightController trait, MAVLink dialect (50000-50005), SwarmSim
- orchestrator: SwarmOrchestrator wiring all subsystems end-to-end
- bench_support: Criterion fixture generators

## ITAR compliance

Swarming coordination features gated behind `itar-unrestricted` feature
per USML Category VIII(h)(12). Default build compiles clean stubs.

## Benchmark results (criterion, release mode)

- MARL actor inference: 3.3 µs (target ≤ 5 ms — 1,516× headroom)
- RRT-APF planning (100 iter): 0.043 ms (target < 300 ms — 6,946× headroom)
- MultiView CSI fusion (3 UAVs): 58.5 ns (target < 10 ms — 171,000× headroom)
- 3-view localization: 1.732 m (target ≤ 2 m — beats Wi2SAR SOTA)
- 4-drone SAR coverage (400×400 m): 223 s (target ≤ 240 s — PASS)

## Tests

- --no-default-features: 73/73 passing
- --features itar-unrestricted: 85/85 passing

Closes #861

Co-Authored-By: claude-flow <ruv@ruv.net>

* refactor(swarm): rename wifi-densepose-swarm → ruview-swarm

The swarm control system is a RuView-level capability (drone coordination,
Raft consensus, MARL) that operates above the wifi-densepose sensing layer
rather than being a sub-component of it. Rename aligns with the project
identity and separates coordination infrastructure from sensing modules.

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix(swarm): resolve all clippy warnings + add MARL convergence test

- planning/probability_grid: map_or(true,…) → is_none_or (clippy::unnecessary_map_or)
- planning/pheromone: &mut Vec<T> → &mut [T] on evaporate+deposit (clippy::ptr_arg)
- marl/observation: fix doc lazy-continuation warning on TOTAL line
- marl/trainer: manual Default impl → #[derive(Default)] + #[default] on Demo variant

Also adds test_marl_convergence_improves_mean_return: fills 64-transition
ReplayBuffer with mixed rewards (steps 0-31: negative, 32-63: positive),
runs ppo_update, asserts mean_return is finite and non-zero.

Result: 0 clippy warnings · 74/74 tests (default) · 86/86 (itar-unrestricted)

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(swarm): integrate Ruflo AI-agent capabilities into ruview-swarm

Adds a feature-gated Ruflo integration layer connecting ruview-swarm to the
claude-flow daemon's AgentDB, AIDefence, and SONA intelligence subsystems.
Default build is unaffected (all paths behind `Option<Box<dyn RufloBackend>>`).

## New module: src/ruflo/

- backend.rs: RufloBackend trait (9 async methods) + RufloError, MissionMemoryEntry,
  PatternEntry, MavlinkScanResult types (always compiled)
- mock_backend.rs: MockRufloBackend in-memory impl for testing (always compiled, 5 tests)
- http_backend.rs: HttpRufloBackend — JSON-RPC 2.0 → claude-flow daemon localhost:3000
  (gated behind `ruflo` feature, requires reqwest)
- mission_summary.rs: MissionSummary serializer with pattern description + confidence
  scoring from victim recall, coverage %, collision penalty (always compiled, 3 tests)

## 4 capability areas

1. MissionMemory   → memory_store / memory_search       (cross-mission victim memory)
2. PatternLearner  → agentdb_pattern-store / -search     (HNSW SONA trajectory patterns)
3. MavlinkDefence  → aidefence_is_safe / aidefence_scan  (scan MAVLink before accepting)
4. IntelligenceHooks → trajectory-start/step/end          (SONA learning loop)

## SwarmOrchestrator integration

- with_ruflo(backend): builder to attach a backend
- start_trajectory(task) / finish_trajectory(success, key): SONA mission lifecycle
- receive_peer_detection_checked(): AIDefence scan before accepting peer detections

## Cargo feature

`ruflo = ["dep:reqwest", "dep:serde_json"]` — optional, not in default

## Tests

- --no-default-features: 82/82 pass (8 new ruflo tests)
- --features ruflo,itar-unrestricted: 94/94 pass

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(swarm): M7 mission profiles with victim confirmation reports + pre-merge docs

Adds end-to-end mission runners producing structured MissionReport output,
and updates project docs (CHANGELOG, README, CLAUDE.md) per pre-merge checklist.

## M7 Mission Profiles (integration/mission_report.rs + swarm_sim.rs)

- MissionReport / VictimReport / SotaComparison types (serde-serializable)
- run_mission_with_report(): full mission → detailed report with per-victim
  localization error, fusion uncertainty, contributing drones, detection time
- run_inspection_mission(): leader-follower power-line corridor inspection
- run_mine_mission(): GPS-denied underground (2-drone, slow, UWB-only)
- SotaComparison embeds Wi2SAR baseline (5m / 810s) vs achieved metrics

## Docs (pre-merge checklist)

- CHANGELOG.md: ruview-swarm + Ruflo integration + performance entries
- README.md: ruview-swarm row
- CLAUDE.md: Key Rust Crates table row + ADR-148 in ADR list

## Tests
- --no-default-features: 86/86 pass
- --features ruflo,itar-unrestricted: 98/98 pass

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix(swarm): convergence-assist for victim fusion + 5s Ruflo HTTP timeout

Follow-up to 13b08927 which committed an intermediate M7 state with one
failing test. This lands the M7 agent's convergence fixes and the security
review's timeout hardening.

## Fixes
- swarm_sim.rs: min-separation nudge before collision metric (0 collisions
  with staggered starts) + Phase-3 convergence assist that vectors the nearest
  idle peer toward a single-drone CSI contact so multi-view fusion can fire
- http_backend.rs: add 5s request timeout to reqwest client (security review
  Medium finding — a dead daemon would otherwise hang the swarm step loop)

## Security review verdict (HttpRufloBackend)
Safe to merge. No credentials in requests, serde_json prevents injection,
fail-open on daemon-down is documented and appropriate for SAR missions,
MAVLink passed as structured text (not raw bytes). Timeout fix applied.

## Tests
- --no-default-features: 87/87 pass
- --features ruflo,itar-unrestricted: 100/100 pass

Co-Authored-By: claude-flow <ruv@ruv.net>

* perf(swarm): add PPO training-throughput benchmark + fix bench crate-name imports

- bench_ppo_update: PPO update over 64-transition buffer — 244 µs median
- fix: bench imports referenced stale `wifi_densepose_swarm` (pre-rename),
  corrected to `ruview_swarm` so the bench target compiles

M6 benchmark suite now 5/5 compiling and running. Tests unchanged: 87/100.

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(swarm): real Candle autodiff PPO + A-MAPPO role attention + GPU training (M4)

Replaces the finite-difference PPO placeholder with a real GPU-capable Candle
0.9 autodiff trainer, adds A-MAPPO heterogeneous-role attention, a runnable
training binary, and right-sized GCP/local launch scripts. This is the unlock
that makes "GPU long training cycles" actually mean something — the previous
ppo_update did no gradient descent.

## Real autodiff PPO (feature `train`, optional `cuda`)
- candle_ppo.rs: CandleActorCritic (64→128→64 MLP + action/value heads +
  learnable log_std), CandlePpoConfig, CandleTrainer with GAE and a genuine
  optimizer.backward_step over the network. select_device() picks CUDA when
  built --features cuda and a GPU is present, else CPU.
- Verified: 5-episode CPU smoke run shows value_loss 12643→12375 (critic
  actually learning); safetensors checkpoint saved. Placeholder never moved weights.

## A-MAPPO heterogeneous-role attention (role_attention.rs, always compiled)
Addresses the four sensor-vs-relay edge cases:
- relay attention floor (prevents collapse — relays produce no CSI)
- role-segmented sensor/relay attention pools (variable neighbor cardinality)
- sensor-gated triangulation-geometry penalty (protects 3-view fusion baseline,
  ADR-148 §4.2 — relays not dragged into triangulation geometry)
- one-hot role embeddings for keys

## Training binary
- src/bin/train_marl.rs (required-features=["train"], excluded from default build)
- CLI: --episodes --drones --profile --steps --checkpoint-dir --checkpoint-every
- Wires CandleTrainer to the SwarmOrchestrator rollout loop; GAE + PPO update
  per episode; periodic safetensors checkpoints

## Right-sized launch (scripts/gcp/)
- provision_marl.sh: g2-standard-16 (1× L4, 16 vCPU, ~$1.40/hr) — NOT the
  $29/hr A100×8 box. MARL is rollout-bound not matmul-bound; ~21× cheaper.
- run_marl_train.sh: GCP rsync + train + checkpoint pull
- run_marl_train_local.sh: local RTX 5080, $0
- A100×8 provision_training.sh left for OccWorld (which saturates the GPUs)

## Tests
- --no-default-features: 91/91 (87 + 4 role_attention)
- --features train: 96/96 (+ 5 candle_ppo, incl. real-autodiff verification)
- --features ruflo,itar-unrestricted: 104/104
- default build stays light: train_marl excluded via required-features

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(adr-148): mark M4 complete — real GPU autodiff training; overall 98%

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(swarm): training visualizer — JSONL telemetry + self-contained HTML viewer

Adds an offline, dependency-free visualization for the drone training system:
a top-down swarm replay synced with training-metric curves, fed by a JSONL
telemetry log the trainer emits. No server, no build step, no CDN.

## Telemetry recorder (integration/telemetry.rs, always compiled, no new deps)
- TelemetryRecorder writes newline-delimited JSON: one `meta` (profile, area,
  ground-truth victims), many `step` (per-tick drone x/y/heading/battery/detection
  + coverage%), and per-episode `episode` (mean_return, policy_loss, value_loss).
- Written by hand (no serde_json) so it stays in the default build; 2 tests.

## train_marl telemetry flags
- `--telemetry FILE` writes the log; `--telemetry-episode N` selects which
  episode's spatial steps to record (metrics recorded for all episodes).

## Visualizer (viz/swarm_viz.html — single file, vanilla JS + canvas)
- LEFT: top-down replay — heading-oriented drone triangles (cyan/lime on
  detection), victim markers, growing coverage heatmap, detection pulse rings,
  play/pause/scrub/speed controls + live coverage/detection readout.
- RIGHT: three autoscaled line charts (mean return, policy loss, value loss)
  over episodes, hand-drawn (no chart library).
- Loads via file picker/drag-drop or auto-fetches the bundled sample; dark
  drone-ops theme; graceful degradation on file:// CORS.
- viz/sample_telemetry.jsonl: real 30-episode / 4-drone / 400×400 m run
  (value_loss 20052→7154 — visible critic learning). Parses 1 meta / 60 step / 30 episode.

## Usage
  cargo run --release -p ruview-swarm --features train,cuda --bin train_marl -- \
      --episodes 5000 --telemetry run.jsonl
  open v2/crates/ruview-swarm/viz/swarm_viz.html  # load run.jsonl

Tests unchanged (91 default / 96 train / 104 ruflo+itar); telemetry adds 2.

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(swarm): selectable flight + self-learning patterns, wired into training + viz

Adds multiple flight/coverage-optimization strategies and self-learning
strategies, selectable from the trainer, and fixes drone clustering — the
demo sweep now covers 36% of the area (was ~0.9%) with 4 disjoint strips.

## Flight patterns (planning/patterns.rs) — `FlightPattern`
- PartitionedLawnmower (new default): area split into per-drone strips → no
  overlap, coverage scales ~linearly with swarm size (clustering fix)
- Boustrophedon (baseline), Spiral, Pheromone (stigmergic), PotentialField,
  LevyFlight. from_str/name/all + next_target(&PatternContext).

## Self-learning patterns (marl/learning.rs) — `LearningPattern`
- Mappo (CTDE centralized critic), Ippo (independent, jamming-robust),
  MappoCuriosity (count-based intrinsic novelty), MetaRl (MAML fast-adapt).
- CuriosityModule (visit_bonus = beta/sqrt(count), novelty decays on revisit),
  MetaAdapter (base + fast-weights, reset_fast/consolidate), shaped_reward().

## Trainer wiring (bin/train_marl.rs)
- --flight-pattern {boustrophedon|partitioned|spiral|pheromone|potential|levy}
- --learn-pattern  {mappo|ippo|curiosity|meta}
- Rollout now moves each drone per the selected FlightPattern (PatternContext
  with visited trail + live peers), curiosity-shapes the reward, and logs
  CTDE vs independent. Telemetry meta profile carries the pattern labels so the
  viewer header shows `flight=… · learn=…`.

## Verification
- Browser pass (viz at localhost:8777): partitioned run renders 4 distinct
  serpentine coverage bands, header shows the patterns, final coverage 36.3%,
  scrubber/speed/playback work, ZERO console errors. Screenshot confirmed.
- Regenerated viz/sample_telemetry.jsonl: 1 meta / 120 step / 30 episode,
  coverage 0.9% → 36.3%.

## Tests
- --no-default-features: 103/103 (was 91; +6 patterns +6 learning)
- --features train: 108/108

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(swarm): add flight-pattern telemetry presets for the visualizer

5 loadable presets (verified browser-distinct, physics-ordered coverage):
pheromone ~44% > potential ~40% > partitioned 36% > spiral ~13% > levy ~5%.
Load any in viz/swarm_viz.html to compare flight strategies without retraining.

Co-Authored-By: claude-flow <ruv@ruv.net>

* chore(swarm): clippy-clean + publish guard for ruview-swarm

- ruview-swarm src is now 0 clippy warnings across default/train/full feature
  sets (derive Default, targeted allows for intentional from_str + bounded
  casts + borrow-required index loops; removed redundant unsigned .max(0))
- publish = false until PR merges, internal path-deps publish in order, and
  ITAR (USML VIII(h)(12)) export sign-off — prevents accidental public publish

Tests unchanged: 103 default / 108 train / 116 ruflo+itar / 120 full+train.
(6 remaining clippy warnings are pre-existing in dependency wifi-densepose-core,
 out of scope for this crate.)

Co-Authored-By: claude-flow <ruv@ruv.net>

* ci(swarm): add ruview-swarm CI guard

Path-scoped guard for v2/crates/ruview-swarm/** (ADR-148). Complements the
main ci.yml (which only runs the default workspace tests):
- feature-matrix tests: default / train / ruflo+itar / full+train
- clippy -D warnings --no-deps (crate-own code only; dep warnings don't gate)
- train_marl bin builds under 'train' AND is excluded from the default build
- ITAR/publish guards: publish=false present, itar-unrestricted never in default

All steps verified locally green before commit.

Co-Authored-By: claude-flow <ruv@ruv.net>

2026-05-30 16:00:59 -04:00

2 Commits