6.9 KiB
ADR-119 — MLP Replaces Logistic Regression in Adaptive Classifier
Status: Accepted
Date: 2026-05-18
Scope: v2/crates/wifi-densepose-sensing-server/src/adaptive_classifier.rs
(new MlpModel struct, train_mlp_classifier, eval_mlp; modified
AdaptiveModel::classify + train_from_recordings).
Context
After ADR-118 (feature decorrelation + multi-node extractor) the adaptive
classifier reached 49.58% accuracy on a 6-node, 7-class, 151,329-frame
training set. Per-feature audit showed n6_std sep_ratio = 0.60 — i.e. the
underlying signal can separate the classes — but logistic regression was
limited to linear decision boundaries and couldn't model interactions like:
walking:n2_stdhigh ANDn6_stdhigh ANDdom_hz ≈ 3 Hzwaving:n1_stdhigh BUTn2_stdlow (only close sensors fire)sittingvsstanding: same global features, differ inn6_stdpattern
LogReg sums weighted features; it cannot represent "AND/BUT" combinations. A small MLP can: hidden units learn intermediate concepts, then the output layer combines them.
Decisions
D1 — Single-hidden-layer MLP, 22 → 32 → 6
- Input: the same 22-feature vector from ADR-118.
- Hidden: 32 ReLU units. ~3k weights, enough capacity for 6 classes but small enough to train in seconds on the 151k-frame set.
- Output: softmax over
n_classes(discovered dynamically at train time). - Z-score normalisation: identical to the LogReg path — same
global_mean/global_stdpopulated bytrain_from_recordings.
D2 — Manual backprop, no external ML crate
tch (LibTorch) or candle would pull in ~50-200 MB of native deps for a
~3k-parameter network. The forward + backward passes are ~150 LoC of pure
Rust; SGD + momentum + cosine LR decay another ~30. Built-in f64
arithmetic is fast enough — full train completes in ~10 seconds on M1
Mac.
Optimiser: SGD with momentum 0.9, weight decay 1e-4, base LR 0.05 with
half-cosine decay to 0, batch size 64, 30 epochs. He initialisation
(N(0, sqrt(2/fan_in))) on weights, zero on biases.
D3 — MLP wins over LogReg at classify time, LogReg kept as fallback
AdaptiveModel carries both:
pub weights: Vec<Vec<f64>>, // legacy LogReg, still trained for rollback
pub mlp: MlpModel, // ADR-119 — preferred when is_trained() == true
classify() checks self.mlp.is_trained(); if yes uses MLP forward pass,
otherwise falls back to LogReg softmax. Old data/adaptive_model.json
files (15-feature LogReg) loaded with #[serde(default)] on mlp →
MlpModel::default() returns empty fields → is_trained() == false →
graceful degradation to LogReg path.
D4 — Train both, report better number
train_from_recordings runs the existing LogReg loop first (unchanged),
then trains MLP on the same z-normalised samples, evaluates both on the
training set, and reports training_accuracy = mlp_acc.max(logreg_acc).
Per-class accuracy from both classifiers is logged side-by-side for
diagnostic comparison.
Verified Acceptance
LogReg: 49.58% overall
MLP: 53.53% overall (+3.95 pts)
Per-class (LogReg → MLP):
absent 40% → 41% (+1)
present_still 99% → 99% (tied — 2× sample count)
transition 29% → 36% (+7)
active 22% → 30% (+8)
waving 34% → 38% (+4)
present_moving 24% → 33% (+9)
Notes:
present_stillclass is a merged bucket: bothtrain_standing_*andtrain_sitting_*map topresent_stillviaclassify_recording_name. Hence 43,242 samples vs 21,500 average for the other classes — the classifier biases strongly toward this dominant class. The 99% is honest but partially inflated by class imbalance.- The +3.95 pts is concentrated on motion classes — exactly where the hypothesis predicted MLP would help (non-linear combinations of per- node features differentiate similar motion types).
- MLP loss flatlined around 1.15 after epoch 10. Suggests the current 22-feature representation has hit its information ceiling for frame- level classification. Going higher needs temporal context (sliding window classifier, LSTM, TCN) — see Open Items.
Total improvement since the start of this session:
2-node, 15 features, LogReg: 40.4% (baseline)
6-node, 15 features, LogReg: 44.4% +4.0 from more data
6-node, 22 features, LogReg: 49.58% +5.2 from feature engineering (ADR-118)
6-node, 22 features, MLP: 53.53% +3.95 from non-linear classifier (ADR-119)
─────
Total cumulative: +13.1 percentage points
Files Touched
v2/crates/wifi-densepose-sensing-server/src/adaptive_classifier.rs:
+ const MLP_HIDDEN: usize = 32
+ pub struct MlpModel { w1, b1, w2, b2, n_classes } + serde
+ impl MlpModel { is_trained, forward }
+ AdaptiveModel.mlp field (serde-default for backward compat)
+ AdaptiveModel::classify prefers MLP when trained
+ train_mlp_classifier (~150 LoC manual backprop)
+ eval_mlp helper
+ train_from_recordings calls MLP path and picks max accuracy
docs/adr/ADR-119-mlp-classifier.md (this)
data/adaptive_model.json removed at deploy time — the MLP fields need
populating, the old file has none.
Out of Scope / Follow-ups
- Temporal classifier (sliding window LSTM/TCN) — loss flatlines at
~1.15 with the current feature set; this is the frame-level ceiling.
A model that consumes a 1-second window (10-20 frames) would catch
the temporal signature of
transition(sit-stand cycle ≈ 0.5 Hz),walking(step rate ≈ 2 Hz),active(bursty),waving(limb cadence ≈ 1-2 Hz). Estimated +15-25 pts realistic for these inherently-temporal classes. ~3-4 hours of code. - Class imbalance fix —
present_stillhas 2× samples. Either oversample the minority classes during training, or weight loss by inverse class frequency. Marginal — ~2-3 pts. - Drop dead features — 6 entropy features (sep_ratio 0.01-0.02) and
3 weak globals (
mean_rssi,dom_hz,change_ptsall <0.11) contribute noise. Reducing 22 → ~13 features would simplify training but probably not move accuracy more than 1-2 pts. - Hidden size sweep — tried only 32. Could try 16 (faster, less overfitting risk) or 64 (more capacity). Cosmetic.
- Split
sittingandstandinginto separate classes — they're physically distinct RF signatures but currently merged. Adding them as separate classes would test whether the model can disambiguate them. Likely lowerspresent_stillaccuracy but separates a useful distinction. Experiment-grade.
References
- ADR-118 — feature decorrelation + multi-node extractor (the 22-feature basis this ADR uses)
- ADR-117 — earlier process hygiene pass; introduced standardisation
(
global_mean/global_std) that this ADR's MLP also relies on - ADR-101 — raw amplitude classifier (the runtime path that calls
AdaptiveModel::classify)