Commit Graph

2 Commits

Author SHA1 Message Date
ruv 4f004e018b feat(swarm): real Candle autodiff PPO + A-MAPPO role attention + GPU training (M4)
Replaces the finite-difference PPO placeholder with a real GPU-capable Candle
0.9 autodiff trainer, adds A-MAPPO heterogeneous-role attention, a runnable
training binary, and right-sized GCP/local launch scripts. This is the unlock
that makes "GPU long training cycles" actually mean something — the previous
ppo_update did no gradient descent.

## Real autodiff PPO (feature `train`, optional `cuda`)
- candle_ppo.rs: CandleActorCritic (64→128→64 MLP + action/value heads +
  learnable log_std), CandlePpoConfig, CandleTrainer with GAE and a genuine
  optimizer.backward_step over the network. select_device() picks CUDA when
  built --features cuda and a GPU is present, else CPU.
- Verified: 5-episode CPU smoke run shows value_loss 12643→12375 (critic
  actually learning); safetensors checkpoint saved. Placeholder never moved weights.

## A-MAPPO heterogeneous-role attention (role_attention.rs, always compiled)
Addresses the four sensor-vs-relay edge cases:
- relay attention floor (prevents collapse — relays produce no CSI)
- role-segmented sensor/relay attention pools (variable neighbor cardinality)
- sensor-gated triangulation-geometry penalty (protects 3-view fusion baseline,
  ADR-148 §4.2 — relays not dragged into triangulation geometry)
- one-hot role embeddings for keys

## Training binary
- src/bin/train_marl.rs (required-features=["train"], excluded from default build)
- CLI: --episodes --drones --profile --steps --checkpoint-dir --checkpoint-every
- Wires CandleTrainer to the SwarmOrchestrator rollout loop; GAE + PPO update
  per episode; periodic safetensors checkpoints

## Right-sized launch (scripts/gcp/)
- provision_marl.sh: g2-standard-16 (1× L4, 16 vCPU, ~$1.40/hr) — NOT the
  $29/hr A100×8 box. MARL is rollout-bound not matmul-bound; ~21× cheaper.
- run_marl_train.sh: GCP rsync + train + checkpoint pull
- run_marl_train_local.sh: local RTX 5080, $0
- A100×8 provision_training.sh left for OccWorld (which saturates the GPUs)

## Tests
- --no-default-features: 91/91 (87 + 4 role_attention)
- --features train: 96/96 (+ 5 candle_ppo, incl. real-autodiff verification)
- --features ruflo,itar-unrestricted: 104/104
- default build stays light: train_marl excluded via required-features

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-30 12:43:56 -04:00
ruv 9ad550d95f feat(worldmodel): Candle Rust port + GCP GPU scripts (ADR-147 Phase 4+6)
Candle native port — wifi-densepose-occworld-candle v0.3.0:
- config.rs: OccWorldConfig (14 params matching occworld.py)
- vqvae.rs: ClassEmbedding(18→64), VQCodebook(512×512, squared-L2),
  QuantConv/PostQuantConv(1×1 Conv2d), fold_3d_to_2d helpers
  ResNet encoder/decoder are documented stubs (Phase 5 checkpoint pending)
- transformer.rs: full Candle MHA transformer (2 layers, temporal+spatial
  cross-attention, FFN, pre-norm residuals)
- inference.rs: OccWorldCandle::dummy() + ::load() + predict()
  InferenceOutput: sem_pred(1,15,200,200,16) + trajectory_priors
- 14/14 tests pass (12 lib + 2 doctests)

GCP GPU scripts — scripts/gcp/:
- provision_training.sh: a2-highgpu-8g (8×A100 40GB) for Phase 5 retraining
- run_training.sh: rsync + torchrun 8-GPU train + checkpoint download
- provision_cosmos.sh: a2-ultragpu-1g (A100 80GB) for Cosmos evaluation
- cosmos_eval.sh: run Cosmos-Transfer2.5 inference, download results
- teardown.sh: safe checkpoint download + instance delete

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-29 20:52:51 -04:00