ADR-119 — MLP Replaces Logistic Regression in Adaptive Classifier

Status: Accepted
Date: 2026-05-18
Scope: v2/crates/wifi-densepose-sensing-server/src/adaptive_classifier.rs (new MlpModel struct, train_mlp_classifier, eval_mlp; modified AdaptiveModel::classify and train_from_recordings).

Context

After ADR-118 (feature decorrelation + multi-node extractor) the adaptive classifier reached 49.58% accuracy on a 6-node, 7-class, 151,329-frame training set. Per-feature audit showed n6_std sep_ratio = 0.60 — i.e. the underlying signal can separate the classes — but logistic regression was limited to linear decision boundaries and couldn't model interactions like:

  • walking: n2_std high AND n6_std high AND dom_hz ≈ 3 Hz
  • waving: n1_std high BUT n2_std low (only close sensors fire)
  • sitting vs standing: same global features, differ in n6_std pattern

LogReg sums weighted features; it cannot represent "AND/BUT" combinations. A small MLP can: hidden units learn intermediate concepts, then the output layer combines them.
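To make the "AND" point concrete: for continuous features, min(a, b) is large only when both inputs are large. It is not a weighted sum of a and b, yet three ReLU hidden units and a linear output reproduce it exactly. The sketch below is illustrative only; the function names are hypothetical and do not come from adaptive_classifier.rs.

```rust
/// Rectified linear unit.
fn relu(x: f64) -> f64 {
    x.max(0.0)
}

/// "Both high" detector for two standardised features, built only from
/// ReLU hidden units and a linear output layer:
///   h1 = relu(a), h2 = relu(-a), h3 = relu(a - b)
///   out = h1 - h2 - h3 = a - relu(a - b) = min(a, b)
pub fn and_via_relu(a: f64, b: f64) -> f64 {
    let h1 = relu(a);
    let h2 = relu(-a);
    let h3 = relu(a - b);
    h1 - h2 - h3
}
```

The output is high only when both inputs are high. Thresholding it at zero carves out a quadrant of feature space, which a single linear boundary (a half-plane) cannot match.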

Decisions

D1 — Single-hidden-layer MLP, 22 → 32 → 6

  • Input: the same 22-feature vector from ADR-118.
  • Hidden: 32 ReLU units. With 6 output classes that is 934 parameters (22×32 + 32×6 = 896 weights plus 38 biases), enough capacity for 6 classes but small enough to train in seconds on the 151k-frame set.
  • Output: softmax over n_classes (discovered dynamically at train time).
  • Z-score normalisation: identical to the LogReg path — same global_mean / global_std populated by train_from_recordings.
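The shape described in D1 can be sketched as follows. Field names follow this ADR, but the row-major Vec<Vec<f64>> layout, the zeros constructor, param_count, and the 1e-9 floor on the standard deviation are assumptions made for illustration, not the shipped code.

```rust
const N_FEATURES: usize = 22; // input width (ADR-118 feature vector)
const MLP_HIDDEN: usize = 32; // hidden ReLU units

#[derive(Default)]
pub struct MlpModel {
    pub w1: Vec<Vec<f64>>, // assumed layout: [MLP_HIDDEN][N_FEATURES]
    pub b1: Vec<f64>,      // [MLP_HIDDEN]
    pub w2: Vec<Vec<f64>>, // assumed layout: [n_classes][MLP_HIDDEN]
    pub b2: Vec<f64>,      // [n_classes]
    pub n_classes: usize,
}

/// Z-score a raw feature vector with the stored global statistics.
/// The 1e-9 floor is an assumption to avoid division by zero.
pub fn z_score(raw: &[f64], mean: &[f64], std: &[f64]) -> Vec<f64> {
    raw.iter()
        .zip(mean)
        .zip(std)
        .map(|((r, m), s)| (r - m) / s.max(1e-9))
        .collect()
}

/// Numerically stable softmax (subtracts the max logit first).
fn softmax(logits: &[f64]) -> Vec<f64> {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|z| (z - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

impl MlpModel {
    /// Hypothetical helper: a correctly shaped model with all-zero parameters.
    pub fn zeros(n_classes: usize) -> Self {
        MlpModel {
            w1: vec![vec![0.0; N_FEATURES]; MLP_HIDDEN],
            b1: vec![0.0; MLP_HIDDEN],
            w2: vec![vec![0.0; MLP_HIDDEN]; n_classes],
            b2: vec![0.0; n_classes],
            n_classes,
        }
    }

    /// Empty parameter vectors (the Default value) mean "not trained".
    pub fn is_trained(&self) -> bool {
        !self.w1.is_empty() && !self.w2.is_empty()
    }

    /// Total number of weights and biases.
    pub fn param_count(&self) -> usize {
        self.w1.iter().map(Vec::len).sum::<usize>()
            + self.b1.len()
            + self.w2.iter().map(Vec::len).sum::<usize>()
            + self.b2.len()
    }

    /// Forward pass on an already z-scored input: ReLU hidden layer, softmax output.
    pub fn forward(&self, x: &[f64]) -> Vec<f64> {
        let hidden: Vec<f64> = self
            .w1
            .iter()
            .zip(&self.b1)
            .map(|(row, b)| (row.iter().zip(x).map(|(w, xi)| w * xi).sum::<f64>() + b).max(0.0))
            .collect();
        let logits: Vec<f64> = self
            .w2
            .iter()
            .zip(&self.b2)
            .map(|(row, b)| row.iter().zip(&hidden).map(|(w, h)| w * h).sum::<f64>() + b)
            .collect();
        softmax(&logits)
    }
}
```

With six classes, MlpModel::zeros(6).param_count() evaluates to 934, and the all-zero model returns a uniform distribution of 1/6 per class.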

D2 — Manual backprop, no external ML crate

tch (LibTorch) or candle would pull in ~50-200 MB of native deps for a network with roughly 1k parameters. The forward and backward passes are ~150 LoC of pure Rust; SGD with momentum and cosine LR decay add another ~30. Built-in f64 arithmetic is fast enough: a full training run completes in ~10 seconds on an M1 Mac.
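For readers unfamiliar with hand-written backprop, a single-sample gradient for this architecture could look like the sketch below. It is not the code in train_mlp_classifier; the Params struct, the function names, and the parameter layout are assumptions. It relies on the standard result that, for softmax with cross-entropy loss, the gradient with respect to the logits is the predicted probabilities minus the one-hot label.

```rust
#[derive(Clone)]
pub struct Params {
    pub w1: Vec<Vec<f64>>, // [hidden][inputs]
    pub b1: Vec<f64>,
    pub w2: Vec<Vec<f64>>, // [classes][hidden]
    pub b2: Vec<f64>,
}

/// y = W x + b for a row-major weight matrix.
fn affine(w: &[Vec<f64>], b: &[f64], x: &[f64]) -> Vec<f64> {
    w.iter()
        .zip(b)
        .map(|(row, bi)| row.iter().zip(x).map(|(wi, xi)| wi * xi).sum::<f64>() + bi)
        .collect()
}

/// Returns (hidden activations after ReLU, softmax probabilities).
pub fn forward(p: &Params, x: &[f64]) -> (Vec<f64>, Vec<f64>) {
    let h: Vec<f64> = affine(&p.w1, &p.b1, x).into_iter().map(|z| z.max(0.0)).collect();
    let logits = affine(&p.w2, &p.b2, &h);
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|z| (z - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    let probs = exps.iter().map(|e| e / sum).collect();
    (h, probs)
}

/// Cross-entropy loss for one labelled sample.
pub fn loss(p: &Params, x: &[f64], label: usize) -> f64 {
    -forward(p, x).1[label].ln()
}

/// Gradients of the loss for one sample, stored in the same shape as the parameters.
pub fn backward(p: &Params, x: &[f64], label: usize) -> Params {
    let (h, probs) = forward(p, x);

    // dL/dlogit_k = p_k - 1[k == label]
    let mut dz2 = probs;
    dz2[label] -= 1.0;

    // Output layer: dL/dw2[k][j] = dz2[k] * h[j]
    let gw2: Vec<Vec<f64>> = dz2.iter().map(|d| h.iter().map(|hj| d * hj).collect()).collect();

    // Hidden layer: propagate through w2, then gate by the ReLU derivative.
    let dz1: Vec<f64> = (0..h.len())
        .map(|j| {
            if h[j] > 0.0 {
                p.w2.iter().zip(&dz2).map(|(row, d)| row[j] * d).sum::<f64>()
            } else {
                0.0
            }
        })
        .collect();

    // Input layer: dL/dw1[j][i] = dz1[j] * x[i]
    let gw1: Vec<Vec<f64>> = dz1.iter().map(|d| x.iter().map(|xi| d * xi).collect()).collect();

    Params { w1: gw1, b1: dz1, w2: gw2, b2: dz2 }
}
```

A finite-difference check on a few weights (perturb one parameter by a small epsilon and compare the change in loss against the analytic gradient) is a cheap way to validate this kind of code before training on real data.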

Optimiser: SGD with momentum 0.9, weight decay 1e-4, base LR 0.05 with half-cosine decay to 0, batch size 64, 30 epochs. He initialisation (zero-mean Gaussian with standard deviation sqrt(2/fan_in)) on weights, zero on biases.
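The optimiser settings above can be sketched as three small helpers. The hyperparameter values come from this ADR. Whether the schedule advances per step or per epoch, and whether weight decay is applied as an L2 term added to the gradient, are assumptions here, as are the function names.

```rust
use std::f64::consts::PI;

pub const BASE_LR: f64 = 0.05;
pub const MOMENTUM: f64 = 0.9;
pub const WEIGHT_DECAY: f64 = 1e-4;

/// Half-cosine decay from `base_lr` at step 0 to 0 at `total_steps`.
pub fn cosine_lr(base_lr: f64, step: usize, total_steps: usize) -> f64 {
    let progress = (step as f64 / total_steps.max(1) as f64).min(1.0);
    0.5 * base_lr * (1.0 + (PI * progress).cos())
}

/// Standard deviation used by He initialisation for a layer with `fan_in` inputs.
pub fn he_std(fan_in: usize) -> f64 {
    (2.0 / fan_in as f64).sqrt()
}

/// One SGD-with-momentum update over a flat parameter slice.
/// Assumption: weight decay enters as an L2 term added to the gradient.
pub fn sgd_momentum_step(params: &mut [f64], velocity: &mut [f64], grads: &[f64], lr: f64) {
    for ((w, v), g) in params.iter_mut().zip(velocity.iter_mut()).zip(grads) {
        *v = MOMENTUM * *v - lr * (g + WEIGHT_DECAY * *w);
        *w += *v;
    }
}
```

Under this schedule the learning rate is 0.05 at the start, 0.025 at the halfway point, and 0 at the end of training. For the two layers in D1, he_std gives about 0.30 for fan_in = 22 and exactly 0.25 for fan_in = 32.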

D3 — MLP wins over LogReg at classify time, LogReg kept as fallback

AdaptiveModel carries both:

pub weights: Vec<Vec<f64>>,   // legacy LogReg, still trained for rollback
pub mlp: MlpModel,            // ADR-119 — preferred when is_trained() == true

classify() checks self.mlp.is_trained(); if true it uses the MLP forward pass, otherwise it falls back to the LogReg softmax. Old data/adaptive_model.json files (15-feature LogReg) load with #[serde(default)] on mlp: MlpModel::default() returns empty fields → is_trained() == false → graceful degradation to the LogReg path.
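The dispatch can be sketched without serde as below. The Route enum, the affine and argmax helpers, and the bias-free LogReg scoring are simplifications introduced for illustration; in the real struct the mlp field carries #[serde(default)], which yields the same empty MlpModel that Default produces here.

```rust
#[derive(Default)]
pub struct MlpModel {
    pub w1: Vec<Vec<f64>>,
    pub b1: Vec<f64>,
    pub w2: Vec<Vec<f64>>,
    pub b2: Vec<f64>,
    pub n_classes: usize,
}

#[derive(Default)]
pub struct AdaptiveModel {
    pub weights: Vec<Vec<f64>>, // legacy LogReg, one row per class
    pub mlp: MlpModel,          // empty when loaded from a pre-ADR-119 file
}

/// Hypothetical marker for which classifier produced the answer.
#[derive(Debug, PartialEq)]
pub enum Route {
    Mlp,
    LogReg,
}

fn affine(w: &[Vec<f64>], b: &[f64], x: &[f64]) -> Vec<f64> {
    w.iter()
        .zip(b)
        .map(|(row, bi)| row.iter().zip(x).map(|(wi, xi)| wi * xi).sum::<f64>() + bi)
        .collect()
}

/// Index of the largest score; softmax is monotonic, so this matches the
/// argmax of the class probabilities.
fn argmax(v: &[f64]) -> usize {
    v.iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(i, _)| i)
        .unwrap_or(0)
}

impl MlpModel {
    pub fn is_trained(&self) -> bool {
        !self.w1.is_empty() && !self.w2.is_empty()
    }

    fn logits(&self, x: &[f64]) -> Vec<f64> {
        let h: Vec<f64> = affine(&self.w1, &self.b1, x).into_iter().map(|z| z.max(0.0)).collect();
        affine(&self.w2, &self.b2, &h)
    }
}

impl AdaptiveModel {
    /// Returns which path ran and the winning class index.
    pub fn classify(&self, x: &[f64]) -> (Route, usize) {
        if self.mlp.is_trained() {
            (Route::Mlp, argmax(&self.mlp.logits(x)))
        } else {
            // Bias terms omitted for brevity in this sketch.
            let zero_bias = vec![0.0; self.weights.len()];
            (Route::LogReg, argmax(&affine(&self.weights, &zero_bias, x)))
        }
    }
}
```

A model that has only LogReg weights takes the LogReg route; once the MLP fields are populated, the same call switches to the MLP without any change at the call site.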

D4 — Train both, report better number

train_from_recordings runs the existing LogReg loop first (unchanged), then trains MLP on the same z-normalised samples, evaluates both on the training set, and reports training_accuracy = mlp_acc.max(logreg_acc). Per-class accuracy from both classifiers is logged side-by-side for diagnostic comparison.
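The reporting step might look like the following sketch. Only the expression mlp_acc.max(logreg_acc) comes from this ADR; the helper names and the choice to report 0.0 for a class with no samples are assumptions.

```rust
/// Fraction of predictions that match the labels.
pub fn accuracy(preds: &[usize], labels: &[usize]) -> f64 {
    if labels.is_empty() {
        return 0.0;
    }
    let hits = preds.iter().zip(labels).filter(|(p, l)| p == l).count();
    hits as f64 / labels.len() as f64
}

/// Accuracy restricted to each true class (recall per class).
/// Classes with no samples report 0.0 (an arbitrary choice for this sketch).
pub fn per_class_accuracy(preds: &[usize], labels: &[usize], n_classes: usize) -> Vec<f64> {
    let mut hits = vec![0usize; n_classes];
    let mut totals = vec![0usize; n_classes];
    for (&p, &l) in preds.iter().zip(labels) {
        totals[l] += 1;
        if p == l {
            hits[l] += 1;
        }
    }
    hits.iter()
        .zip(&totals)
        .map(|(&h, &t)| if t == 0 { 0.0 } else { h as f64 / t as f64 })
        .collect()
}

/// The number stored as training_accuracy: the better of the two classifiers.
pub fn reported_training_accuracy(logreg_acc: f64, mlp_acc: f64) -> f64 {
    mlp_acc.max(logreg_acc)
}
```

Because both numbers are measured on the training set, the reported value is an upper bound on what either classifier achieves on unseen recordings.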

Verified Acceptance

LogReg:    49.58% overall
MLP:       53.53% overall  (+3.95 pts)

Per-class (LogReg → MLP):
  absent          40% → 41%   (+1)
  present_still   99% → 99%   (tied — 2× sample count)
  transition      29% → 36%   (+7)
  active          22% → 30%   (+8)
  waving          34% → 38%   (+4)
  present_moving  24% → 33%   (+9)

Notes:

  • present_still class is a merged bucket: both train_standing_* and train_sitting_* map to present_still via classify_recording_name. Hence 43,242 samples vs 21,500 average for the other classes — the classifier biases strongly toward this dominant class. The 99% is honest but partially inflated by class imbalance.
  • The +3.95 pts is concentrated on motion classes, exactly where the hypothesis predicted the MLP would help (non-linear combinations of per-node features differentiate similar motion types).
  • MLP loss flatlined around 1.15 after epoch 10. This suggests the current 22-feature representation has hit its information ceiling for frame-level classification. Going higher needs temporal context (sliding window classifier, LSTM, TCN); see Out of Scope / Follow-ups.
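To put the 1.15 plateau in context, two reference points are easy to compute, assuming the logged loss is the mean natural-log cross-entropy without the weight-decay term (the ADR does not say). The helper names are illustrative.

```rust
/// Mean cross-entropy of a classifier that always outputs the uniform
/// distribution over `n_classes`: ln(n_classes).
pub fn chance_loss(n_classes: usize) -> f64 {
    (n_classes as f64).ln()
}

/// Geometric-mean probability assigned to the true class for a given
/// mean cross-entropy: exp(-loss).
pub fn geometric_mean_true_prob(mean_loss: f64) -> f64 {
    (-mean_loss).exp()
}
```

For six classes the chance-level loss is ln 6 ≈ 1.79, so a plateau at 1.15 is clearly better than guessing. It still corresponds to a geometric-mean probability of only about 0.32 on the correct class, which is consistent with the modest per-class accuracies above.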

Total improvement since the start of this session:

2-node, 15 features, LogReg:    40.4%   (baseline)
6-node, 15 features, LogReg:    44.4%   +4.0 from more data
6-node, 22 features, LogReg:    49.58%  +5.2 from feature engineering (ADR-118)
6-node, 22 features, MLP:       53.53%  +3.95 from non-linear classifier (ADR-119)
                                ─────
Total cumulative:               +13.1 percentage points

Files Touched

v2/crates/wifi-densepose-sensing-server/src/adaptive_classifier.rs:
  + const MLP_HIDDEN: usize = 32
  + pub struct MlpModel { w1, b1, w2, b2, n_classes } + serde
  + impl MlpModel { is_trained, forward }
  + AdaptiveModel.mlp field (serde-default for backward compat)
  + AdaptiveModel::classify prefers MLP when trained
  + train_mlp_classifier (~150 LoC manual backprop)
  + eval_mlp helper
  + train_from_recordings calls MLP path and picks max accuracy
docs/adr/ADR-119-mlp-classifier.md  (this)

data/adaptive_model.json is removed at deploy time: the old file has no MLP fields, so the model has to be retrained to populate them.

Out of Scope / Follow-ups

  • Temporal classifier (sliding window LSTM/TCN): loss flatlines at ~1.15 with the current feature set, which is the frame-level ceiling. A model that consumes a 1-second window (10-20 frames) would catch the temporal signature of transition (sit-stand cycle ≈ 0.5 Hz), walking (step rate ≈ 2 Hz), active (bursty), and waving (limb cadence ≈ 1-2 Hz). Realistic estimate: +15-25 pts for these inherently temporal classes, for ~3-4 hours of code.
  • Class imbalance fix: present_still has 2× the samples of the other classes. Either oversample the minority classes during training, or weight the loss by inverse class frequency. Marginal gain, ~2-3 pts.
  • Drop dead features: 6 entropy features (sep_ratio 0.01-0.02) and 3 weak globals (mean_rssi, dom_hz, change_pts, all <0.11) contribute noise. Reducing 22 → ~13 features would simplify training but probably not move accuracy by more than 1-2 pts.
  • Hidden size sweep: only 32 was tried. Could try 16 (faster, less overfitting risk) or 64 (more capacity). Cosmetic.
  • Split sitting and standing into separate classes: they have physically distinct RF signatures but are currently merged. Separating them would test whether the model can disambiguate them. This likely lowers present_still accuracy but recovers a useful distinction. Experiment-grade.
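Of the follow-ups above, the inverse-frequency loss weighting is the smallest change. One possible form, not yet implemented, is sketched below. The normalisation w_c = N / (K · n_c) is one common convention and is an assumption here, as are the function names.

```rust
/// Inverse-frequency class weights, w_c = N / (K * n_c), where N is the total
/// sample count and K the number of classes. With this normalisation the
/// weighted sample count still sums to N. Empty classes get weight 0.
pub fn inverse_frequency_weights(counts: &[usize]) -> Vec<f64> {
    let total: usize = counts.iter().sum();
    let k = counts.len() as f64;
    counts
        .iter()
        .map(|&n| if n == 0 { 0.0 } else { total as f64 / (k * n as f64) })
        .collect()
}

/// Class-weighted cross-entropy for one sample, given softmax probabilities.
pub fn weighted_nll(probs: &[f64], label: usize, weights: &[f64]) -> f64 {
    -weights[label] * probs[label].ln()
}
```

With 43,242 present_still samples and the quoted average of 21,500 used as a placeholder for each of the other five classes, the minority classes receive roughly twice the weight of present_still (about 1.17 versus 0.58).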

References

  • ADR-118 — feature decorrelation + multi-node extractor (the 22-feature basis this ADR uses)
  • ADR-117 — earlier process hygiene pass; introduced standardisation (global_mean/global_std) that this ADR's MLP also relies on
  • ADR-101 — raw amplitude classifier (the runtime path that calls AdaptiveModel::classify)