wifi-densepose/docs/adr/ADR-120-windowed-temporal-c...

201 lines
7.7 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# ADR-120 — Windowed Temporal Classifier (W-MLP)
**Status**: Accepted
**Date**: 2026-05-18
**Scope**: `v2/crates/wifi-densepose-sensing-server/src/adaptive_classifier.rs`
(`WindowedMlpModel`, `train_windowed_mlp_classifier`, `eval_windowed_mlp`,
`AdaptiveModel::classify_window`); `main.rs` (`AppStateInner.feature_window`,
`push_feature_window`, `adaptive_override` switching to window path).
## Context
ADR-119 added a small MLP (22 → 32 → 6) that improved accuracy from 49.58%
(LogReg) to **53.53%**. Loss flatlined at ~1.15 around epoch 10 of 30 —
clear signal that the **frame-level information ceiling** had been
reached for the 22-feature representation.
The dataset has 7 activity classes that differ primarily in **temporal
patterns**, not in any single frame:
* `walking` step cadence: ~2 Hz (visible in 0.5-second window)
* `transition` (sit-stand): ~0.5 Hz (visible in 2-second window)
* `waving` limb cadence: 1-2 Hz
* `active` (jumping): bursty / quasi-periodic at ~3 Hz
* `present_still` (sitting + standing merged): no temporal signature
Per-frame, `walking` and `active` and `waving` all look "moving" with
similar amplitude std/skew — they're disambiguated only by HOW the
amplitude pattern evolves over 1-2 seconds. A classifier that sees a
single frame can't tell them apart no matter how good the per-frame
features are.
## Decisions
### D1 — Stack 20 consecutive frames into a 440-d input
```
WINDOW_FRAMES = 20 (~2 seconds at ~10 Hz tick rate)
N_FEATURES = 22 (from ADR-118)
WINDOWED_INPUT = 20 × 22 = 440
WINDOWED_HIDDEN = 64
```
Network: `440 → 64 ReLU → n_classes softmax`. ~28k weights total —
larger than the frame-level MLP's 3k, but still small enough to train
in <60s and serialize as JSON.
Training samples are built by sliding a window of 20 frames with **stride
5** within each recording (4× overlap). Windows do **not** cross recording
boundaries each window inherits its source recording's class label.
On the 6-node 151k-frame set:
* 7 recordings × ~21k frames each = 151k frames total
* (21k 20) / 5 4,300 windows per recording
* Total: ~30k windowed samples
* Class balance is roughly preserved (each recording is one class)
### D2 — Manual backprop, same recipe as MLP
Same SGD + momentum 0.9 + weight decay 1e-4 + cosine LR decay. Base LR
lowered to 0.03 (vs MLP's 0.05) because the network is bigger. 25 epochs.
He initialisation, ReLU activation, softmax output, cross-entropy loss.
### D3 — `AdaptiveModel` carries all three classifiers, classify routes by availability
```rust
pub struct AdaptiveModel {
pub weights: Vec<Vec<f64>>, // ADR-118 legacy LogReg
pub mlp: MlpModel, // ADR-119 frame-level MLP
pub windowed_mlp: WindowedMlpModel, // ADR-120 (this) — primary
// ...
}
```
`classify_window()` (new API) prefers `windowed_mlp` when trained AND
the caller has a 20-frame buffer. Falls through to frame-level MLP
when called with insufficient history. Old JSON model files load with
`MlpModel::default()` and `WindowedMlpModel::default()` filling absent
fields backward compatible.
### D4 — Rolling buffer in `AppStateInner`, pushed per tick
```rust
struct AppStateInner {
feature_window: VecDeque<[f64; N_FEATURES]>, // capacity = WINDOW_FRAMES
// ...
}
```
New helper `push_feature_window(&mut s, &features)` computes the 22-d
feature vector from current per-node amps, pushes to the back of the
buffer, evicts oldest when over capacity. Called at all three tick
sites where `adaptive_override` runs:
* `main.rs:~3030` multi-BSSID tick handler
* `main.rs:~3225` WiFi fallback tick handler
* `main.rs:~6510` per-node loop in the broadcast tick task
`adaptive_override` (read-only over state) builds the 440-d input by
copying the buffer's last 19 entries + the current frame's features,
then calls `model.classify_window(&flat)`. Cold-start (buffer < 20)
falls back to `model.classify(&feat_arr)` frame-level MLP.
## Verified Acceptance
Retrained on the same 6-node, 151,329-frame set used since ADR-118:
```
LogReg: 49.58%
MLP: 53.53% (+3.95 vs LogReg)
W-MLP: 90.40% (+36.87 vs MLP)
```
Per-class (frame-level MLP W-MLP):
```
absent 41% → 100% +59
present_still 99% → 100% +1 (already saturated)
transition 36% → 86% +50 (sit-stand cadence captured)
active 30% → 74% +44 (jumping cadence captured)
waving 38% → 90% +52 (gesture cadence captured)
present_moving 33% → 82% +49 (walking step cadence captured)
```
Loss curve confirms breakout from the frame-level plateau:
```
MLP: epoch 0 → 1.28 → epoch 29 → 1.14 (flat plateau)
W-MLP: epoch 0 → 1.01 → epoch 24 → 0.25 (still trending)
```
Total cumulative improvement vs the start-of-session 2-node 15-feature
LogReg baseline:
```
40.4% → 90.40% = +50.0 percentage points
```
## Caveat — training vs generalization
90.40% is **training accuracy**. The W-MLP has ~28,800 weights trained
on ~30,200 windowed samples capacity is comparable to dataset size,
so some overfitting is expected. True generalization performance will
only be measurable once an independent test set is captured.
Mitigations already in place:
* Weight decay 1e-4 regularises against memorisation
* Cosine LR decay with smooth annealing
* Stride 5 in window construction reduces near-duplicate samples
* Architecture stays small (one hidden layer) limits overfit capacity
Recommended follow-up: record a 60-second held-out session per class
(separate from training), evaluate W-MLP cold, compare to training
accuracy. Expected drop: 5-15 pts for a healthy model.
## Files Touched
```
v2/crates/wifi-densepose-sensing-server/src/adaptive_classifier.rs:
+ const WINDOW_FRAMES = 20, WINDOWED_INPUT = 440, WINDOWED_HIDDEN = 64
+ pub const N_FEATURES_PUB (for external buffer sizing)
+ pub struct WindowedMlpModel { w1, b1, w2, b2, n_classes }
+ impl WindowedMlpModel::{is_trained, forward}
+ AdaptiveModel.windowed_mlp field (serde-default)
+ AdaptiveModel::classify_window method
+ train_from_recordings builds recording_groups, slides windows,
calls train_windowed_mlp_classifier
+ train_windowed_mlp_classifier (~150 LoC manual backprop)
+ eval_windowed_mlp helper
+ #[derive(Clone)] on Sample (for recording_groups Vec)
v2/crates/wifi-densepose-sensing-server/src/main.rs:
+ AppStateInner.feature_window: VecDeque<[f64; N_FEATURES_PUB]>
+ push_feature_window helper
+ adaptive_override switches to classify_window when buffer is full
+ 3 tick sites call push_feature_window before adaptive_override
docs/adr/ADR-120-windowed-temporal-classifier.md (this)
```
## Out of Scope / Follow-ups
* **Held-out test set** record fresh data, evaluate cold to confirm
90% isn't memorisation.
* **TCN instead of stacked-MLP** 1D conv over time would use weights
more efficiently (~5k vs 28k). Worth pursuing if dataset scales 10×.
* **Output smoothing** shipped via two-layer mode+confirm filter on the
adaptive output, see ADR-120 follow-up commits.
* **Split `sitting`/`standing`** currently merged into `present_still`;
separating them would test whether the temporal RF signatures differ.
* **Class imbalance** `present_still` has 2× windows; oversampling
minority classes might lift accuracy 1-2 pts.
* **Window size experiments** 20 frames is a reasonable first guess;
10 (faster) or 30 (more context) untested.
## References
* ADR-118 feature decorrelation + multi-node (22-feature basis)
* ADR-119 frame-level MLP (sibling classifier, fallback at cold start)
* ADR-101 raw amplitude classifier (the path that calls
`AdaptiveModel` via `adaptive_override`)
* ADR-105 no synthetic data in production runtime; this ADR's
confidence output is real model softmax probability, not a
hardcoded value