wifi-densepose


ruv 4f004e018b feat(swarm): real Candle autodiff PPO + A-MAPPO role attention + GPU training (M4)

Replaces the finite-difference PPO placeholder with a real GPU-capable Candle 0.9 autodiff trainer. It also adds A-MAPPO heterogeneous-role attention, a runnable training binary, and right-sized GCP/local launch scripts. This is the change that makes "GPU long training cycles" meaningful: the previous ppo_update did no gradient descent.

## Real autodiff PPO (feature `train`, optional `cuda`)

- candle_ppo.rs: CandleActorCritic (64→128→64 MLP + action/value heads + learnable log_std), CandlePpoConfig, and CandleTrainer with GAE and a genuine optimizer.backward_step over the network. select_device() picks CUDA when built with --features cuda and a GPU is present; otherwise it uses the CPU.
- Verified: a 5-episode CPU smoke run shows value_loss 12643→12375 (the critic is actually learning), and a safetensors checkpoint was saved. The placeholder never moved the weights.

## A-MAPPO heterogeneous-role attention (role_attention.rs, always compiled)

Addresses the four sensor-vs-relay edge cases:
- relay attention floor (prevents collapse; relays produce no CSI)
- role-segmented sensor/relay attention pools (variable neighbor cardinality)
- sensor-gated triangulation-geometry penalty (protects the 3-view fusion baseline, ADR-148 §4.2; relays are not dragged into triangulation geometry)
- one-hot role embeddings for keys

## Training binary

- src/bin/train_marl.rs (required-features=["train"], excluded from the default build)
- CLI: --episodes --drones --profile --steps --checkpoint-dir --checkpoint-every
- Wires CandleTrainer to the SwarmOrchestrator rollout loop, with a GAE + PPO update per episode and periodic safetensors checkpoints

## Right-sized launch (scripts/gcp/)

- provision_marl.sh: g2-standard-16 (1× L4, 16 vCPU, ~$1.40/hr), NOT the $29/hr A100×8 box. MARL is rollout-bound, not matmul-bound, so this is ~21× cheaper.
- run_marl_train.sh: GCP rsync + train + checkpoint pull
- run_marl_train_local.sh: local RTX 5080, $0
- A100×8 provision_training.sh is kept for OccWorld (which saturates the GPUs)

## Tests

- --no-default-features: 91/91 (87 + 4 role_attention)
- --features train: 96/96 (+ 5 candle_ppo, incl. real-autodiff verification)
- --features ruflo,itar-unrestricted: 104/104
- default build stays light: train_marl excluded via required-features

Co-Authored-By: claude-flow <ruv@ruv.net>

2026-05-30 12:43:56 -04:00
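The commit runs a "GAE + PPO update per episode." A minimal, dependency-free Rust sketch can illustrate the GAE half of that step. It is not the code in candle_ppo.rs. The function name `compute_gae`, the slice layout (one extra bootstrap value at the end of `values`), and the γ/λ values in the example are hypothetical choices made for illustration. The sketch shows the standard backward recursion that produces two outputs: advantages for the PPO policy loss, and return targets for the critic's value loss.

```rust
/// Generalized Advantage Estimation (GAE) over one rollout.
/// Hypothetical layout (not taken from candle_ppo.rs): `values` holds
/// V(s_0..s_n), i.e. one more entry than `rewards`, so the last entry is the
/// bootstrap value; `dones[t]` is true when step t ended the episode.
fn compute_gae(
    rewards: &[f32],
    values: &[f32],
    dones: &[bool],
    gamma: f32,
    lambda: f32,
) -> (Vec<f32>, Vec<f32>) {
    assert_eq!(values.len(), rewards.len() + 1, "need a bootstrap value");
    assert_eq!(dones.len(), rewards.len());

    let n = rewards.len();
    let mut advantages = vec![0.0f32; n];
    let mut gae = 0.0f32;

    // Walk backwards so each A_t can reuse A_{t+1}.
    for t in (0..n).rev() {
        // Mask that cuts bootstrapping and the recursion at episode ends.
        let not_done = if dones[t] { 0.0 } else { 1.0 };
        // TD residual: delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
        let delta = rewards[t] + gamma * values[t + 1] * not_done - values[t];
        // A_t = delta_t + gamma * lambda * (1 - done_t) * A_{t+1}
        gae = delta + gamma * lambda * not_done * gae;
        advantages[t] = gae;
    }

    // Critic targets: R_t = A_t + V(s_t) (zip stops after n elements).
    let returns: Vec<f32> = advantages
        .iter()
        .zip(values.iter())
        .map(|(a, v)| a + v)
        .collect();

    (advantages, returns)
}

fn main() {
    // Three unit rewards; the episode terminates on the last step.
    let rewards = [1.0, 1.0, 1.0];
    let values = [0.0, 0.0, 0.0, 0.0];
    let dones = [false, false, true];
    let (adv, ret) = compute_gae(&rewards, &values, &dones, 0.99, 0.95);
    println!("advantages = {:?}", adv);
    println!("returns    = {:?}", ret);
}
```

Masking with `dones[t]` keeps advantages from leaking across episode boundaries. In a full trainer, these vectors would typically be normalized and converted to tensors before the clipped PPO loss is formed and passed to the optimizer.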
cosmos_eval.sh	feat(worldmodel): Candle Rust port + GCP GPU scripts (ADR-147 Phase 4+6)	2026-05-29 20:52:51 -04:00
provision_cosmos.sh	feat(worldmodel): Candle Rust port + GCP GPU scripts (ADR-147 Phase 4+6)	2026-05-29 20:52:51 -04:00
provision_marl.sh	feat(swarm): real Candle autodiff PPO + A-MAPPO role attention + GPU training (M4)	2026-05-30 12:43:56 -04:00
provision_training.sh	feat(worldmodel): Candle Rust port + GCP GPU scripts (ADR-147 Phase 4+6)	2026-05-29 20:52:51 -04:00
run_marl_train.sh	feat(swarm): real Candle autodiff PPO + A-MAPPO role attention + GPU training (M4)	2026-05-30 12:43:56 -04:00
run_marl_train_local.sh	feat(swarm): real Candle autodiff PPO + A-MAPPO role attention + GPU training (M4)	2026-05-30 12:43:56 -04:00
run_training.sh	feat(worldmodel): Candle Rust port + GCP GPU scripts (ADR-147 Phase 4+6)	2026-05-29 20:52:51 -04:00
teardown.sh	feat(worldmodel): Candle Rust port + GCP GPU scripts (ADR-147 Phase 4+6)	2026-05-29 20:52:51 -04:00