# ADR-120 — Windowed Temporal Classifier (W-MLP)

**Status**: Accepted
**Date**: 2026-05-18
**Scope**: `v2/crates/wifi-densepose-sensing-server/src/adaptive_classifier.rs` (`WindowedMlpModel`, `train_windowed_mlp_classifier`, `eval_windowed_mlp`, `AdaptiveModel::classify_window`); `main.rs` (`AppStateInner.feature_window`, `push_feature_window`, `adaptive_override` switching to window path).

## Context

ADR-119 added a small MLP (22 → 32 → 6) that improved accuracy from 49.58% (LogReg) to **53.53%**. Loss flatlined at ~1.15 around epoch 10 of 30, a clear signal that the **frame-level information ceiling** had been reached for the 22-feature representation.

The dataset covers 7 recorded activities (6 classes once sitting and standing are merged into `present_still`) that differ primarily in **temporal patterns**, not in any single frame:

* `walking` (class `present_moving`) step cadence: ~2 Hz (visible in 0.5-second window)
* `transition` (sit-stand): ~0.5 Hz (visible in 2-second window)
* `waving` limb cadence: 1-2 Hz
* `active` (jumping): bursty / quasi-periodic at ~3 Hz
* `present_still` (sitting + standing merged): no temporal signature

Per frame, `walking`, `active`, and `waving` all look "moving", with similar amplitude std/skew; they are disambiguated only by how the amplitude pattern evolves over 1-2 seconds. A classifier that sees a single frame cannot tell them apart, no matter how good the per-frame features are.

## Decisions

### D1 — Stack 20 consecutive frames into a 440-d input

```
WINDOW_FRAMES   = 20   (~2 seconds at ~10 Hz tick rate)
N_FEATURES      = 22   (from ADR-118)
WINDOWED_INPUT  = 20 × 22 = 440
WINDOWED_HIDDEN = 64
```

Network: `440 → 64 ReLU → n_classes softmax`. ~28k weights total, much larger than the frame-level MLP's ~1k (22 → 32 → 6), but still small enough to train in <60s and serialize as JSON.

Training samples are built by sliding a window of 20 frames with **stride 5** within each recording (4× overlap). Windows do **not** cross recording boundaries; each window inherits its source recording's class label.

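
The window construction described above can be sketched as follows. This is a minimal illustration, not the code in `adaptive_classifier.rs`: the `Recording` struct, the `build_windows` function, and the `STRIDE` constant are hypothetical names, and frames are assumed to be flattened oldest first.

```rust
/// Frames per window and features per frame, as stated in D1.
const WINDOW_FRAMES: usize = 20;
const N_FEATURES: usize = 22;
const WINDOWED_INPUT: usize = WINDOW_FRAMES * N_FEATURES; // 440
/// Hop between consecutive window starts (assumed name for the stride of 5).
const STRIDE: usize = 5;

/// Hypothetical container: one recording is one class label plus its frames.
pub struct Recording {
    pub label: usize,
    pub frames: Vec<[f64; N_FEATURES]>,
}

/// Slide a 20-frame window with stride 5 over each recording separately, so
/// no window ever spans two recordings. Returns (440-d input, label) pairs.
pub fn build_windows(recordings: &[Recording]) -> Vec<(Vec<f64>, usize)> {
    let mut samples = Vec::new();
    for rec in recordings {
        // `windows` yields nothing when the recording is shorter than
        // WINDOW_FRAMES, so short recordings contribute no samples.
        for window in rec.frames.windows(WINDOW_FRAMES).step_by(STRIDE) {
            let mut flat = Vec::with_capacity(WINDOWED_INPUT);
            for frame in window {
                // Append this frame's 22 features, oldest frame first.
                flat.extend_from_slice(frame);
            }
            // Every window inherits the label of its source recording.
            samples.push((flat, rec.label));
        }
    }
    samples
}
```

Iterating per recording is what keeps windows from crossing recording boundaries; concatenating all frames first and windowing afterwards would mix labels at each seam.
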
On the 6-node 151k-frame set:

* 7 recordings × ~21k frames each = 151k frames total
* (21k − 20) / 5 ≈ 4,300 windows per recording
* Total: ~30k windowed samples
* Class balance is roughly preserved (each recording carries a single class label)

### D2 — Manual backprop, same recipe as MLP

Same SGD + momentum 0.9 + weight decay 1e-4 + cosine LR decay. Base LR lowered to 0.03 (vs MLP's 0.05) because the network is bigger. 25 epochs. He initialisation, ReLU activation, softmax output, cross-entropy loss.

### D3 — `AdaptiveModel` carries all three classifiers, classify routes by availability

```rust
pub struct AdaptiveModel {
    pub weights: Vec<Vec<f64>>,          // ADR-118 legacy LogReg
    pub mlp: MlpModel,                   // ADR-119 frame-level MLP
    pub windowed_mlp: WindowedMlpModel,  // ADR-120 (this): primary
    // ...
}
```

`classify_window()` (new API) prefers `windowed_mlp` when it is trained AND the caller has a 20-frame buffer. It falls through to the frame-level MLP when called with insufficient history.

Old JSON model files load with `MlpModel::default()` and `WindowedMlpModel::default()` filling the absent fields, so loading stays backward compatible.

### D4 — Rolling buffer in `AppStateInner`, pushed per tick

```rust
struct AppStateInner {
    feature_window: VecDeque<[f64; N_FEATURES]>,  // capacity = WINDOW_FRAMES
    // ...
}
```

New helper `push_feature_window(&mut s, &features)` computes the 22-d feature vector from current per-node amps, pushes it to the back of the buffer, and evicts the oldest entry when over capacity. It is called at all three tick sites where `adaptive_override` runs:

* `main.rs:~3030` — multi-BSSID tick handler
* `main.rs:~3225` — WiFi fallback tick handler
* `main.rs:~6510` — per-node loop in the broadcast tick task

`adaptive_override` (read-only over state) builds the 440-d input by copying the buffer's last 19 entries + the current frame's features, then calls `model.classify_window(&flat)`. During cold start (buffer < 20) it falls back to `model.classify(&feat_arr)`, the frame-level MLP.

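
The buffer handling in D4 can be sketched as below. This is an illustration under stated assumptions rather than the code in `main.rs`: `push_frame` and `window_input` are hypothetical names, the real helper takes the application state instead of a bare `VecDeque`, and returning `None` stands in for the cold-start fallback to the frame-level path.

```rust
use std::collections::VecDeque;

const WINDOW_FRAMES: usize = 20;
const N_FEATURES: usize = 22;

/// Push the newest feature vector and evict the oldest entries once the
/// buffer holds more than WINDOW_FRAMES frames.
pub fn push_frame(buf: &mut VecDeque<[f64; N_FEATURES]>, features: [f64; N_FEATURES]) {
    buf.push_back(features);
    while buf.len() > WINDOW_FRAMES {
        buf.pop_front();
    }
}

/// Build the flattened 440-d input from the last 19 buffered frames plus the
/// current frame. Returns None during cold start (fewer than 20 buffered
/// frames), which tells the caller to use the frame-level classifier.
pub fn window_input(
    buf: &VecDeque<[f64; N_FEATURES]>,
    current: &[f64; N_FEATURES],
) -> Option<Vec<f64>> {
    if buf.len() < WINDOW_FRAMES {
        return None;
    }
    let mut flat = Vec::with_capacity(WINDOW_FRAMES * N_FEATURES);
    // Skip everything except the newest WINDOW_FRAMES - 1 buffered frames.
    let skip = buf.len() - (WINDOW_FRAMES - 1);
    for frame in buf.iter().skip(skip) {
        flat.extend_from_slice(frame);
    }
    // The current frame always occupies the last 22 slots.
    flat.extend_from_slice(current);
    Some(flat)
}
```

A caller would match on the result: `Some(flat)` goes to `classify_window`, `None` goes to the frame-level `classify`. Keeping the builder read-only over the buffer mirrors the note that `adaptive_override` does not mutate state.
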
## Verified Acceptance

Retrained on the same 6-node, 151,329-frame set used since ADR-118:

```
LogReg:  49.58%
MLP:     53.53%  (+3.95 vs LogReg)
W-MLP:   90.40%  (+36.87 vs MLP)
```

Per-class (frame-level MLP → W-MLP):

```
absent          41% → 100%   +59
present_still   99% → 100%   +1   (already saturated)
transition      36% →  86%   +50  (sit-stand cadence captured)
active          30% →  74%   +44  (jumping cadence captured)
waving          38% →  90%   +52  (gesture cadence captured)
present_moving  33% →  82%   +49  (walking step cadence captured)
```

Loss curve confirms breakout from the frame-level plateau:

```
MLP:    epoch 0 → 1.28 → epoch 29 → 1.14  (flat plateau)
W-MLP:  epoch 0 → 1.01 → epoch 24 → 0.25  (still trending)
```

Total cumulative improvement vs the start-of-session 2-node 15-feature LogReg baseline:

```
40.4% → 90.40% = +50.0 percentage points
```

## Caveat — training vs generalization

90.40% is **training accuracy**. The W-MLP has ~28,600 weights trained on ~30,200 windowed samples; capacity is comparable to dataset size, so some overfitting is expected. True generalization performance will only be measurable once an independent test set is captured.

Mitigations already in place:

* Weight decay 1e-4 regularises against memorisation
* Cosine LR decay with smooth annealing
* Stride 5 in window construction reduces near-duplicate samples
* Architecture stays small (one hidden layer), which limits overfit capacity

Recommended follow-up: record a 60-second held-out session per class (separate from training), evaluate the W-MLP cold, and compare to training accuracy. Expected drop: 5-15 pts for a healthy model.

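
The capacity-versus-data comparison in this caveat can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: `mlp_params` and `windows_per_recording` are hypothetical helpers, the six output classes are taken from the per-class table, and the recordings are assumed to be of equal length (151,329 / 7, rounded down to 21,618 frames).

```rust
/// Parameter count of a one-hidden-layer MLP: both weight matrices plus
/// both bias vectors.
pub fn mlp_params(input: usize, hidden: usize, n_classes: usize) -> usize {
    input * hidden + hidden + hidden * n_classes + n_classes
}

/// Number of windows of length `window` taken every `stride` frames from a
/// recording of `frames` frames (0 if the recording is too short).
pub fn windows_per_recording(frames: usize, window: usize, stride: usize) -> usize {
    if frames < window {
        0
    } else {
        (frames - window) / stride + 1
    }
}

/// Total windowed samples for `n_recordings` equal-length recordings
/// (the equal-length split is an assumption made for this estimate).
pub fn total_windows(n_recordings: usize, frames_each: usize) -> usize {
    n_recordings * windows_per_recording(frames_each, 20, 5)
}
```

Under these assumptions, `mlp_params(440, 64, 6)` gives 28,614 parameters and `total_windows(7, 21_618)` gives 30,240 samples, so the model has close to one parameter per training window.
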
## Files Touched

```
v2/crates/wifi-densepose-sensing-server/src/adaptive_classifier.rs:
  + const WINDOW_FRAMES = 20, WINDOWED_INPUT = 440, WINDOWED_HIDDEN = 64
  + pub const N_FEATURES_PUB (for external buffer sizing)
  + pub struct WindowedMlpModel { w1, b1, w2, b2, n_classes }
  + impl WindowedMlpModel::{is_trained, forward}
  + AdaptiveModel.windowed_mlp field (serde-default)
  + AdaptiveModel::classify_window method
  + train_from_recordings builds recording_groups, slides windows,
    calls train_windowed_mlp_classifier
  + train_windowed_mlp_classifier (~150 LoC manual backprop)
  + eval_windowed_mlp helper
  + #[derive(Clone)] on Sample (for recording_groups Vec)

v2/crates/wifi-densepose-sensing-server/src/main.rs:
  + AppStateInner.feature_window: VecDeque<[f64; N_FEATURES_PUB]>
  + push_feature_window helper
  + adaptive_override switches to classify_window when buffer is full
  + 3 tick sites call push_feature_window before adaptive_override

docs/adr/ADR-120-windowed-temporal-classifier.md (this)
```

## Out of Scope / Follow-ups

* **Held-out test set** — record fresh data, evaluate cold to confirm 90% isn't memorisation.
* **TCN instead of stacked-MLP** — 1D conv over time would use weights more efficiently (~5k vs 28k). Worth pursuing if dataset scales 10×.
* **Output smoothing** — shipped via two-layer mode+confirm filter on the adaptive output, see ADR-120 follow-up commits.
* **Split `sitting`/`standing`** — currently merged into `present_still`; separating them would test whether the temporal RF signatures differ.
* **Class imbalance** — `present_still` has 2× windows; oversampling minority classes might lift accuracy 1-2 pts.
* **Window size experiments** — 20 frames is a reasonable first guess; 10 (faster) or 30 (more context) untested.

## References

* ADR-118 — feature decorrelation + multi-node (22-feature basis)
* ADR-119 — frame-level MLP (sibling classifier, fallback at cold start)
* ADR-101 — raw amplitude classifier (the path that calls `AdaptiveModel` via `adaptive_override`)
* ADR-105 — no synthetic data in production runtime; this ADR's confidence output is real model softmax probability, not a hardcoded value