From 13f43004c8c53bf1766e1559eb7e0978059571b2 Mon Sep 17 00:00:00 2001
From: ruv <ruv@ruv.net>
Date: Mon, 11 May 2026 13:09:49 -0400
Subject: [PATCH] docs(meridian): iteration 3 plan + GPU pre-train wiring stub
 (#68)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Closes the prototype's "iter 3 = plan + wiring documented" item (ADR-027 §2.0):

  - scripts/pretrain-mae-gcloud.sh — GCloud GPU driver for the MAE pre-train: a
    thin, reviewable mirror of scripts/gcloud-train.sh that provisions a VM in
    cognitum-20260110, builds wifi-densepose-train --features tch-backend,cuda,
    runs the `pretrain-mae` binary, downloads the .ot variable store, and tears the
    VM down. Currently drives SyntheticCsiDataset (the smoke path); the one TODO
    is the --data-dir/--datasets plumbing for the real heterogeneous corpus.
    NOT run as part of this prototype. Also supports --dry-run (local synthetic
    pre-train, needs LibTorch).
  - ADR-027 §2.0 — added the "Iteration 3 plan" subsection: heterogeneous-CSI
    ingest (own recordings + MM-Fi + Wi-Pose + multi-band virtual sub-carriers,
    normalised to 56 sub-carriers), the GPU run, lifting the v0 limits
    (per-sample masking, transformer blocks, circular phase loss), the fine-tune
    handoff (load the CsiMae encoder into WiFiDensePoseModel via a
    `--init-encoder <mae.ot>` flag, then train the §2.x heads as regularisers),
    cross-domain eval (§4.6 protocol), and shipping the encoder as an RVF segment.
  - wifi-densepose-train/README.md — new "MERIDIAN-MAE" section pointing at the
    csi_mae module, the pretrain-mae binary, the gcloud script, and ADR-027 §2.0.
  - csi_mae.rs module doc — updated the iteration-status block.
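
The heterogeneous-CSI ingest in that plan normalises every source to 56
sub-carriers and rescales amplitude per frame. Below is a minimal sketch of
that normalisation. It assumes one `f32` amplitude row per antenna pair and
uses linear interpolation. `resample_subcarriers` and `normalise_frame` are
hypothetical helpers, not the crate's `subcarrier::interpolate_subcarriers`.
The peak-to-1.0 scaling is also an assumption, because the plan does not say
which norm "per-frame amplitude scale" uses.

```rust
/// Hypothetical helper: linearly resample one CSI amplitude row from
/// `src.len()` sub-carriers onto `n_out` evenly spaced bins. The first and
/// last input sub-carriers map exactly onto the first and last output bins.
fn resample_subcarriers(src: &[f32], n_out: usize) -> Vec<f32> {
    assert!(!src.is_empty() && n_out > 0, "need at least one input and one output bin");
    if src.len() == 1 || n_out == 1 {
        // Degenerate cases: nothing to interpolate between, so repeat the first value.
        return vec![src[0]; n_out];
    }
    let last = src.len() - 1;
    let scale = last as f32 / (n_out - 1) as f32;
    (0..n_out)
        .map(|i| {
            let pos = i as f32 * scale; // fractional index into `src`
            let lo = (pos.floor() as usize).min(last); // clamp guards f32 rounding
            let hi = (lo + 1).min(last);
            let frac = pos - lo as f32;
            src[lo] * (1.0 - frac) + src[hi] * frac
        })
        .collect()
}

/// Hypothetical per-frame amplitude normalisation. ASSUMPTION: scale the frame
/// so its peak absolute amplitude becomes 1.0 (the plan does not fix the norm).
/// An all-zero frame is left unchanged.
fn normalise_frame(frame: &mut [f32]) {
    let peak = frame.iter().fold(0.0_f32, |m, &x| m.max(x.abs()));
    if peak > 0.0 {
        frame.iter_mut().for_each(|x| *x /= peak);
    }
}
```

Endpoint alignment maps MM-Fi's first and last of 114 sub-carriers onto the
first and last of the 56 output bins. Whether the crate's interpolator aligns
endpoints or bin centres is not stated in this patch.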

cargo test -p wifi-densepose-train --no-default-features → 121 lib tests pass.

This completes the MERIDIAN CSI-MAE *prototype* (iter 1 masking pipeline +
iter 2 tch model/pretrain loop/bin + iter 3 plan/wiring). Real cross-domain
results need the heterogeneous ingest plus a GPU pre-train run (iter 3
execution), both of which are out of scope for the prototype.
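
The iter 1 masking pipeline hides a fraction of each window's tokens from the
encoder. The GCloud script's `--mask-ratio` defaults to 0.75. Below is a
minimal sketch of plain random masking. `random_token_mask` and the xorshift
PRNG are hypothetical stand-ins, written so the example needs no crates. They
are not the crate's `mask_csi_window` / `MaskStrategy::Random`, whose token
layout and RNG may differ.

```rust
/// Minimal xorshift64 PRNG so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Hypothetical helper: choose exactly round(ratio * n_tokens) token indices to
/// mask, at random (up to negligible modulo bias) but deterministically for a
/// given seed. Returns a per-token mask where `true` means "hidden from the encoder".
fn random_token_mask(n_tokens: usize, ratio: f32, seed: u64) -> Vec<bool> {
    assert!((0.0..=1.0).contains(&ratio), "mask ratio must be in [0, 1]");
    let n_masked = (ratio * n_tokens as f32).round() as usize;
    let mut idx: Vec<usize> = (0..n_tokens).collect();
    // xorshift never leaves the all-zero state, so force a non-zero seed.
    let mut rng = XorShift64(seed | 1);
    // Partial Fisher-Yates shuffle: the first `n_masked` slots become the sample.
    for i in 0..n_masked {
        let j = i + (rng.next_u64() % (n_tokens - i) as u64) as usize;
        idx.swap(i, j);
    }
    let mut mask = vec![false; n_tokens];
    for &t in &idx[..n_masked] {
        mask[t] = true;
    }
    mask
}
```

Drawing exactly round(ratio × n) indices, rather than flipping an independent
coin per token, keeps the visible-token count constant across windows. A
fixed-shape encoder input, like the v0 model's fixed `n_tokens`, would need
that.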

Co-Authored-By: claude-flow <ruv@ruv.net>
---
 ...cross-environment-domain-generalization.md |  17 ++
 scripts/pretrain-mae-gcloud.sh                | 162 ++++++++++++++++++
 v2/crates/wifi-densepose-train/README.md      |  18 ++
 v2/crates/wifi-densepose-train/src/csi_mae.rs |  13 +-
 4 files changed, 206 insertions(+), 4 deletions(-)
 create mode 100644 scripts/pretrain-mae-gcloud.sh

diff --git a/docs/adr/ADR-027-cross-environment-domain-generalization.md b/docs/adr/ADR-027-cross-environment-domain-generalization.md
index 0533ca19..8f44163d 100644
--- a/docs/adr/ADR-027-cross-environment-domain-generalization.md
+++ b/docs/adr/ADR-027-cross-environment-domain-generalization.md
@@ -90,6 +90,23 @@ Five concurrent lines of research have converged on the domain generalization pr
 - ◻ **Iteration 3+**: pool & ingest heterogeneous CSI (own recordings + MM-Fi + Wi-Pose + multi-band virtual sub-carriers); real pre-train run (GPU — `scripts/gcloud-train.sh` / the cognitum project); per-sample masking + self-attention transformer blocks (lift the v0 limits); fine-tune the §2.x heads on top of the pre-trained encoder; cross-domain eval (§4.6 protocol); ship the encoder as an RVF segment (§4.7).
 - ⏸ **Out of scope here**: the per-room SFDA adaptation (stage 3) — its own ADR.
 
+#### Iteration 3 plan — heterogeneous-CSI ingest, GPU pre-train, fine-tune handoff
+
+The remaining prototype work (the parts that can't run on the dev box):
+
+1. **Heterogeneous-CSI ingest.** A `csi_mae`-adjacent loader that pools every reachable CSI source into a uniform `[T, tx, rx, sub]` window stream, normalising the sub-carrier count to 56 (via `wifi_densepose_train::subcarrier::interpolate_subcarriers`) and the amplitude scale per frame. It is implemented as a `CsiDataset` impl (e.g. `PooledCsiDataset`) that round-robins or weights its sources; `pretrain-mae` gains a `--datasets <spec>` flag that selects it instead of `SyntheticCsiDataset`.
+   *Thesis (arXiv:2511.18792): the breadth of this pool (devices, bands, rooms) is what buys cross-domain generalisation; the model stays small.* Sources:
+   - own captures: `data/recordings/*.csi.jsonl` and overnight recordings;
+   - `MmFiDataset` (ADR-015, NeurIPS 2023 MM-Fi; 114 sub-carriers, interpolated to 56);
+   - Wi-Pose (ADR-015);
+   - multi-band virtual sub-carriers from `ruvsense/multiband.rs` (3 channels × 56 → 168), treated as extra tokens rather than extra streams;
+   - public CSI corpora as available.
+2. **GPU pre-train run.** `scripts/pretrain-mae-gcloud.sh` (added this iteration as a thin mirror of `scripts/gcloud-train.sh`) provisions a GCloud VM in `cognitum-20260110`, builds `wifi-densepose-train` with `--features tch-backend,cuda`, runs `pretrain-mae`, downloads the `.ot` variable store, and tears the VM down. It currently drives `SyntheticCsiDataset` (the smoke path); the `--data-dir`/`--datasets` plumbing for the real corpus is the one TODO in that script. *Not run as part of this prototype.*
+3. **Lift the v0 model limits.** Per-sample masking (gather/scatter so each window in a batch can have its own mask); self-attention transformer blocks in the encoder/decoder, replacing the residual MLPs and the flatten-to-latent bottleneck (which also removes the fixed-`n_tokens` constraint); and a circular (wrap-aware) phase-reconstruction loss.
+4. **Fine-tune handoff.** Load the pre-trained `CsiMae` encoder weights into the `model::WiFiDensePoseModel` front-end (the `ModalityTranslator` slot), freeze for a warm-up, then unfreeze; train the 17-keypoint / DensePose-UV heads, the AETHER contrastive embedding (ADR-024), and the §2.1–§2.6 domain-adversarial / geometry-conditioned layers *as regularisers on top of the pre-trained representation*. A `train` sub-command flag (`--init-encoder <mae.ot>`) wires this.
+5. **Cross-domain eval.** Run §4.6's protocol (leave-one-room-out / leave-one-device-out) on the fine-tuned model vs. the from-scratch baseline; the win condition is landing in the +2.2 % to +15.7 % cross-domain improvement band that arXiv:2511.18792 reports for MAE pre-training.
+6. **Ship the encoder** as an RVF segment (§4.7) so deployments load a pre-trained backbone and only carry the small task head + per-room adapter (stage 3 / the SFDA ADR).
+
 The remainder of this ADR (§2.1 onward) describes the **fine-tune-stage architecture** — read it as "the head and regularisers that sit on top of the §2.0 pre-trained encoder", not as a from-scratch design.
 
 ### 2.1 Architecture: Environment-Disentangled Dual-Path Transformer
diff --git a/scripts/pretrain-mae-gcloud.sh b/scripts/pretrain-mae-gcloud.sh
new file mode 100644
index 00000000..b9b44df1
--- /dev/null
+++ b/scripts/pretrain-mae-gcloud.sh
@@ -0,0 +1,162 @@
+#!/bin/bash
+# ==============================================================================
+# GCloud GPU driver for the MERIDIAN CSI masked-autoencoder pre-train (ADR-027 §2.0)
+# ==============================================================================
+#
+# Creates a GCloud VM with a GPU, builds wifi-densepose-train with the
+# `tch-backend` (+ `cuda`) feature, runs the `pretrain-mae` binary, downloads
+# the pre-trained variable store (`.ot`), and tears the VM down.
+#
+# STATUS: prototype wiring stub (ADR-027 §2.0, iteration 3). The `pretrain-mae`
+# binary currently drives the *deterministic SyntheticCsiDataset* — that's the
+# end-to-end smoke path. The real heterogeneous-CSI pre-train (MM-Fi + Wi-Pose +
+# data/recordings/ + multi-band virtual sub-carriers) needs the ingest pipeline
+# tracked in ADR-027 §2.0 "Iteration 3 plan"; the TODO markers below show where
+# it plugs in. This script is intentionally a thin, reviewable mirror of
+# scripts/gcloud-train.sh, and it has NOT been run.
+#
+# Usage:
+#   bash scripts/pretrain-mae-gcloud.sh [OPTIONS]
+#
+# Options:
+#   --gpu        l4|a100|h100   GPU type (default: l4)
+#   --zone       ZONE           GCloud zone (default: us-central1-a)
+#   --hours      N              Max VM lifetime in hours (default: 3)
+#   --epochs     N              Pre-train epochs (default: 20)
+#   --samples    N              Synthetic sample count, used until the real ingest lands (default: 4096)
+#   --batch      N              Mini-batch size (default: 64)
+#   --mask-ratio R              Token mask ratio (default: 0.75)
+#   --lr         R              Adam learning rate (default: 1e-3)
+#   --out        FILE           Local path for the downloaded .ot (default: data/models/mae-pretrained.ot)
+#   --data-dir   DIR            Heterogeneous CSI corpus to upload; not yet consumed by pretrain-mae (see TODO below)
+#   --dry-run                   Build + run a tiny pre-train locally with synthetic data; no VM
+#   --keep-vm                   Do not delete the VM after the run
+#   --instance   NAME           Custom VM instance name
+#
+# Prerequisites (same as gcloud-train.sh):
+#   - gcloud CLI authenticated:  gcloud auth login
+#   - Project set:               gcloud config set project cognitum-20260110
+#   - GPU quota in the chosen zone
+#
+# Cost (same envelope as gcloud-train.sh):
+#   L4 ~$0.80/hr (prototyping) · A100 40GB ~$3.60/hr (full pre-train) · H100 80GB ~$11/hr
+# ==============================================================================
+
+set -euo pipefail
+
+# ── Defaults ──────────────────────────────────────────────────────────────────
+PROJECT="cognitum-20260110"
+GPU_TYPE="l4"
+ZONE="us-central1-a"
+HOURS=3
+EPOCHS=20
+SAMPLES=4096
+BATCH=64
+MASK_RATIO=0.75
+LR="1e-3"
+OUT="data/models/mae-pretrained.ot"
+DATA_DIR=""
+DRY_RUN=0
+KEEP_VM=0
+INSTANCE="meridian-mae-$(date +%s)"
+
+# ── Arg parse ─────────────────────────────────────────────────────────────────
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --gpu)        GPU_TYPE="$2"; shift 2;;
+    --zone)       ZONE="$2"; shift 2;;
+    --hours)      HOURS="$2"; shift 2;;
+    --epochs)     EPOCHS="$2"; shift 2;;
+    --samples)    SAMPLES="$2"; shift 2;;
+    --batch)      BATCH="$2"; shift 2;;
+    --mask-ratio) MASK_RATIO="$2"; shift 2;;
+    --lr)         LR="$2"; shift 2;;
+    --out)        OUT="$2"; shift 2;;
+    --data-dir)   DATA_DIR="$2"; shift 2;;
+    --dry-run)    DRY_RUN=1; shift;;
+    --keep-vm)    KEEP_VM=1; shift;;
+    --instance)   INSTANCE="$2"; shift 2;;
+    -h|--help)    sed -n '2,43p' "$0"; exit 0;;
+    *) echo "unknown option: $1" >&2; exit 2;;
+  esac
+done
+
+case "$GPU_TYPE" in
+  l4)   ACCEL="type=nvidia-l4,count=1";        MACHINE="g2-standard-8";;
+  a100) ACCEL="type=nvidia-tesla-a100,count=1"; MACHINE="a2-highgpu-1g";;
+  h100) ACCEL="type=nvidia-h100-80gb,count=1";  MACHINE="a3-highgpu-1g";;
+  *) echo "unknown --gpu: $GPU_TYPE (l4|a100|h100)" >&2; exit 2;;
+esac
+
+PRETRAIN_ARGS="--epochs $EPOCHS --samples $SAMPLES --batch $BATCH --mask-ratio $MASK_RATIO --lr $LR --save mae-pretrained.ot"
+
+# ── Dry run: build + tiny pre-train locally (synthetic data), no VM ───────────
+if [[ "$DRY_RUN" -eq 1 ]]; then
+  echo "[dry-run] cargo run -p wifi-densepose-train --features tch-backend --bin pretrain-mae -- --epochs 2 --samples 64 --batch 8"
+  echo "[dry-run] (requires LibTorch — set LIBTORCH or use a tch download-libtorch feature build)"
+  cd "$(dirname "$0")/../v2"
+  cargo run -p wifi-densepose-train --features tch-backend --bin pretrain-mae -- --epochs 2 --samples 64 --batch 8
+  exit 0
+fi
+
+# ── Provision VM ──────────────────────────────────────────────────────────────
+echo "==> Project: $PROJECT  Zone: $ZONE  GPU: $GPU_TYPE  Machine: $MACHINE  Instance: $INSTANCE"
+gcloud config set project "$PROJECT" >/dev/null
+gcloud compute instances create "$INSTANCE" \
+  --zone="$ZONE" --machine-type="$MACHINE" \
+  --accelerator="$ACCEL" --maintenance-policy=TERMINATE \
+  --image-family=pytorch-latest-gpu --image-project=deeplearning-platform-release \
+  --boot-disk-size=128GB --metadata="install-nvidia-driver=True" \
+  --max-run-duration="${HOURS}h" --instance-termination-action=DELETE
+
+cleanup() {
+  if [[ "$KEEP_VM" -eq 0 ]]; then
+    echo "==> Deleting VM $INSTANCE"
+    gcloud compute instances delete "$INSTANCE" --zone="$ZONE" --quiet || true
+  else
+    echo "==> --keep-vm set; VM $INSTANCE left running (remember to delete it)."
+  fi
+}
+trap cleanup EXIT
+
+run_remote() { gcloud compute ssh "$INSTANCE" --zone="$ZONE" --command="$1"; }
+
+echo "==> Waiting for SSH..."
+for _ in $(seq 1 30); do run_remote "true" 2>/dev/null && break; sleep 10; done
+
+echo "==> Provisioning toolchain on the VM"
+run_remote 'set -e
+  curl --proto "=https" --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
+  source "$HOME/.cargo/env"
+  # The pytorch-latest-gpu image ships libtorch; point tch at it.
+  TORCH_DIR="$(python -c "import torch,os;print(os.path.dirname(torch.__file__))")"
+  echo "export LIBTORCH=$TORCH_DIR" >> "$HOME/.bashrc"
+  echo "export LD_LIBRARY_PATH=$TORCH_DIR/lib:\$LD_LIBRARY_PATH" >> "$HOME/.bashrc"
+  sudo apt-get update -qq && sudo apt-get install -y -qq git build-essential pkg-config'
+
+echo "==> Uploading repo"
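+# Copy v2/, scripts/ and docs/ (plain scp: no excludes, so a local v2/target/ is copied too).
+REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"; run_remote "mkdir -p ~/ruview"
+gcloud compute scp --recurse --zone="$ZONE" "$REPO_ROOT/v2" "$REPO_ROOT/scripts" "$REPO_ROOT/docs" "$INSTANCE":~/ruview/ >/dev/null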
+
+# TODO (ADR-027 §2.0, iter 3 ingest): when --data-dir is given, upload the
+# heterogeneous CSI corpus and point pretrain-mae at it instead of the synthetic
+# dataset (needs a `--data-dir`/`--datasets` flag on the bin first — see the plan).
+if [[ -n "$DATA_DIR" ]]; then
+  echo "==> Uploading CSI corpus from $DATA_DIR"
+  gcloud compute scp --recurse --zone="$ZONE" "$DATA_DIR" "$INSTANCE":~/ruview/csi-corpus/ >/dev/null
+  PRETRAIN_ARGS="$PRETRAIN_ARGS # TODO: --data-dir ~/ruview/csi-corpus"
+fi
+
+echo "==> Building + running pre-train on the VM"
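+run_remote "set -e; source \$HOME/.cargo/env; export LIBTORCH=\"\$(python -c 'import torch,os;print(os.path.dirname(torch.__file__))')\"; export LD_LIBRARY_PATH=\"\$LIBTORCH/lib:\${LD_LIBRARY_PATH:-}\"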
+  cd ~/ruview/v2
+  cargo build --release -p wifi-densepose-train --features tch-backend,cuda
+  cargo run --release -p wifi-densepose-train --features tch-backend,cuda --bin pretrain-mae -- $PRETRAIN_ARGS"
+
+echo "==> Downloading pre-trained variable store → $OUT"
+mkdir -p "$(dirname "$OUT")"
+gcloud compute scp --zone="$ZONE" "$INSTANCE":~/ruview/v2/mae-pretrained.ot "$OUT"
+
+echo "==> Done. Pre-trained encoder: $OUT"
+echo "    Next: fine-tune the ADR-027 §2.x heads on top of it (see §2.0 'Iteration 3 plan')."
diff --git a/v2/crates/wifi-densepose-train/README.md b/v2/crates/wifi-densepose-train/README.md
index 4610f7b0..d8f620c0 100644
--- a/v2/crates/wifi-densepose-train/README.md
+++ b/v2/crates/wifi-densepose-train/README.md
@@ -82,6 +82,24 @@ wifi-densepose-train/src/
   trainer.rs        -- (tch) Training loop orchestrator         [feature-gated]
 ```
 
+## MERIDIAN-MAE — masked-autoencoder pre-training (ADR-027 §2.0)
+
+The `csi_mae` module implements a CIG-MAE-style **dual-stream (amplitude + phase)** masked
+autoencoder for cross-domain CSI pre-training. The thesis (2026-Q2 SOTA survey, arXiv:2511.18792)
+is that cross-room generalisation is a *data-breadth* problem, not a bigger-pose-net problem:
+pre-train one CSI encoder on heterogeneous captures, then attach a small task head.
+
+* Pure-Rust (always built): `MaeConfig`, `MaskStrategy` (`Random` / `InfoGuided` — the latter
+  variance-weights token selection so high-information tokens are masked), `TokenLayout`,
+  `mask_csi_window`, `reassemble_tokens`. Dependency-free deterministic masking.
+* `csi_mae::model` (feature `tch-backend`): `CsiMae` (encoder over visible tokens → latent →
+  decoder reconstructs masked amplitude+phase), `reconstruction_loss`, `MaeBatch`, `pretrain_step`.
+* Driver: `cargo run -p wifi-densepose-train --features tch-backend --bin pretrain-mae -- --epochs 5`
+  (synthetic data). GPU run: `bash scripts/pretrain-mae-gcloud.sh` (prototype wiring stub).
+
+See `docs/adr/ADR-027-cross-environment-domain-generalization.md` §2.0 for the full plan
+(heterogeneous-CSI ingest, GPU pre-train, fine-tune handoff, cross-domain eval).
+
 ## Related Crates
 
 | Crate | Role |
diff --git a/v2/crates/wifi-densepose-train/src/csi_mae.rs b/v2/crates/wifi-densepose-train/src/csi_mae.rs
index a2e1b934..554fb0a6 100644
--- a/v2/crates/wifi-densepose-train/src/csi_mae.rs
+++ b/v2/crates/wifi-densepose-train/src/csi_mae.rs
@@ -32,10 +32,15 @@
 //!
 //! # Status
 //!
-//! Prototype, iteration 1: masking pipeline + config + tests + ADR §2.0. The
-//! `model` submodule is a v0 skeleton (MLP encoder/decoder, batch-level masking)
-//! — transformer blocks, per-sample masking, information-guided masking, and a
-//! `pretrain-mae` binary land in subsequent iterations.
+//! Prototype. **iter 1**: masking pipeline + config + tests + ADR §2.0.
+//! **iter 2a**: information-guided masking ([`MaskStrategy::InfoGuided`]).
+//! **iter 2b**: the `model` submodule with `CsiMae` (MLP-based v0 dual-stream
+//! encoder/decoder, batch-shared masking), `reconstruction_loss`, `MaeBatch`,
+//! `pretrain_step`, plus the `pretrain-mae` binary (`bin/pretrain_mae.rs`,
+//! `--features tch-backend`). **iter 3+** (see ADR-027 §2.0 "Iteration 3 plan"
+//! and `scripts/pretrain-mae-gcloud.sh`): heterogeneous-CSI ingest, the real
+//! GPU pre-train run, per-sample masking + self-attention transformer blocks
+//! (lifting the v0 limits), and the fine-tune handoff into the §2.x heads.
 //!
 //! [CIG-MAE]: https://arxiv.org/html/2512.04723v1