wifi-densepose/v2/crates/wifi-densepose-nn
rUv 1d12e8831a
refactor(beyond-sota): ADR-155 M2 — host-verifiable §8 closeout (7 de-magic, 9 boundary tests, native-conv honest-null) (#1059)
* refactor(train): ADR-155 M2 §8 — de-magic train non-tch tuning constants + boundary tests

Lift bare numeric literals used as thresholds / guard epsilons in the
non-tch (host-verifiable) train surface into named, documented consts and
pin each set with a *_consts_unchanged_from_literals test. Values are
bit-identical to the prior inline literals — cleanup, no behaviour change.

De-magicked (const + pin test):
- metrics_core.rs: VISIBILITY_THRESHOLD (0.5), MIN_REFERENCE_EXTENT (1e-6),
  OKS_FALLBACK_SIGMA (0.07)
- ruview_metrics.rs: NUM_KEYPOINTS (17), VISIBILITY_THRESHOLD (0.5),
  PCK_THRESHOLD (0.2), MIN_BBOX_DIAG (1e-3), MIN_DURATION_MINUTES (1e-6)
- subcarrier.rs: SPARSE_BASIS_SIGMA (0.15), SPARSE_BASIS_THRESHOLD (1e-4),
  SPARSE_REGULARIZATION_LAMBDA (0.1), SPARSE_COO_PRUNE_EPS (1e-8),
  SPARSE_SOLVER_TOL (1e-5 f64), SPARSE_SOLVER_MAX_ITERS (500)
- eval.rs: MIN_POSITIVE_MPJPE (1e-10)
- domain.rs: LAYER_NORM_EPS (1e-5)
- virtual_aug.rs: BOX_MULLER_U1_FLOOR (1e-10), MIN_ROOM_SCALE (1e-10)

Boundary / characterization tests (pin CURRENT behaviour):
- visibility_threshold_boundary_is_inclusive (>= 0.5 at the edge)
- degenerate_extent_below_floor_is_unscoreable ((0,0,0.0)/0.0, not perfect)
- tracking_zero_duration_does_not_divide_by_zero
- oks_short_array_is_bounded_at_keypoint_count (16 rows, no panic)
- compute_interp_weights_single_target_is_index_zero (target_sc==1)
- sparse_interp_single_target_is_finite
- domain_gap_infinite_when_in_domain_perfect_but_cross_nonzero
- domain_gap_unity_when_everything_perfect
- augment_frame_zero_room_scale_passes_amplitude_finite

Doc-only (no behaviour change):
- rapid_adapt.rs: correct module-doc O(eps) -> O(eps^2) for central differences
- geometry.rs: add # Panics to DeepSets::encode (documents existing assert!)

train --no-default-features: 191 lib (was 176), 303 total (was 288), 0 failed.

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(nn): ADR-155 M2 §3 — pure-Rust LinearHead::try_new input guard + de-magic softplus threshold

ADR-155 §3 found rf_encoder.rs has no adversarial checkpoint-deserialization
assert — its assert_eq!s in LinearHead::new are construction-time API contracts
on programmer-supplied vectors. This adds the honest, in-scope improvement the
M2 task allows: a pure-Rust *fallible* constructor so weights from an untrusted /
deserialized checkpoint can be shape-validated without panicking.

- Add RfHeadError (WeightShape / BiasShape / VarWeightShape) + Display + Error.
- Add LinearHead::try_new returning Result<Self, RfHeadError>; on success the
  head is byte-identical to LinearHead::new. new() is unchanged (still asserts;
  now documents # Panics and points to try_new) — no behaviour change for
  existing callers.
- De-magic softplus's bare 20.0 overflow threshold into
  SOFTPLUS_LINEAR_THRESHOLD (value unchanged) + pin test.

Tests: try_new_accepts_valid_and_rejects_each_bad_shape (valid == new forward;
each bad shape → typed error), softplus_threshold_unchanged_from_literal.

nn --no-default-features lib: 37 passed (was 35), 0 failed.

Co-Authored-By: claude-flow <ruv@ruv.net>

* perf(nn): ADR-155 M2 §4 — native-conv bench-first → MEASURED-INCONCLUSIVE (no perf change shipped)

The §8 "native-conv naive-loop rewrite" backlog item: DensePoseHead::
apply_conv_layer is a pure-Rust 6-nested-loop conv (benchable on this host, not
tch/ort-gated). Bench-first per the §0 PROOF discipline.

- Add committed criterion bench benches/native_conv_bench.rs measuring forward()
  through the naive conv on representative single-layer configs (--no-default-
  features; no ort download).
- Prototyped a bit-identical range-clamped variant (hoist the per-tap in-bounds
  branch by pre-clamping kh/kw ranges; same ic→kh→kw MAC order ⇒ bit-identical).
  MEASURED before/after on this host: ~35% faster on padding-heavy small-channel
  maps (4.40→2.84 ms) but a ~3% *regression* on channel-heavy maps (11.09→11.48
  ms), all inside a ±20% run-to-run noise floor. Verdict: INCONCLUSIVE — the
  benefit is not robustly positive, so the rewrite is NOT shipped and NOT a
  fabricated speedup. Reverted to the naive loop; honestly deferred (ADR-155 §8).
- Add native_conv_matches_reference: a hand-computed characterization anchor
  (1×1 = scalar MAC; same-padded 3×3 ones = truncated-window sums 9/6/4) pinning
  CURRENT conv behaviour for any future rewrite.

nn --no-default-features lib: 38 passed (was 37), 0 failed. No behaviour change.

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(adr-155): M2 §8.2 — enumerated host-verifiable P3 backlog clearance + CHANGELOG

Replace the §8 bulk "~40 lower-severity findings" line with the real, enumerated
M2 resolution (§8.2): 7 de-magicked (const + pin == prior literal), 9 boundary
tests, 1 input guard (rf_encoder try_new), 2 doc-only, 1 perf bench-first
MEASURED-INCONCLUSIVE (not shipped). Mark native-conv + rf_encoder RESOLVED;
state which §8 items stay data-gated (GraphPose-Fi/INT4/CSI-JEPA) or tch-gated
(proof/trainer/model panic sites, metrics *_v2 dead code) and ONNX read-lock
upstream-gated — blocked, not dropped. Declare the non-tch-verifiable subset of
§8 cleared.

Validation: train --no-default-features 303 passed (was 288); nn lib 38 (was 35);
workspace --no-default-features 3,293 passed, 0 failed; Python proof VERDICT PASS,
hash f8e76f21…46f7a UNCHANGED bit-exact.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-14 00:07:56 -04:00
..
benches refactor(beyond-sota): ADR-155 M2 — host-verifiable §8 closeout (7 de-magic, 9 boundary tests, native-conv honest-null) (#1059) 2026-06-14 00:07:56 -04:00
src refactor(beyond-sota): ADR-155 M2 — host-verifiable §8 closeout (7 de-magic, 9 boundary tests, native-conv honest-null) (#1059) 2026-06-14 00:07:56 -04:00
tests/fixtures perf(nn): zero-copy ORT input (~1.48x) + dynamic-dim guard + concurrency bench (ADR-155 §Tier-3) 2026-06-11 19:57:53 -04:00
Cargo.toml refactor(beyond-sota): ADR-155 M2 — host-verifiable §8 closeout (7 de-magic, 9 boundary tests, native-conv honest-null) (#1059) 2026-06-14 00:07:56 -04:00
README.md chore(repo): rename rust-port/wifi-densepose-rs → v2/ (flatten to one level) (#427) 2026-04-25 21:28:13 -04:00

README.md

wifi-densepose-nn

Crates.io Documentation License

Multi-backend neural network inference for WiFi-based DensePose estimation.

Overview

wifi-densepose-nn provides the inference engine that maps processed WiFi CSI features to DensePose body surface predictions. It supports three backends -- ONNX Runtime (default), PyTorch via tch-rs, and Candle -- so models can run on CPU, CUDA GPU, or TensorRT depending on the deployment target.

The crate implements two key neural components:

  • DensePose Head -- Predicts 24 body part segmentation masks and per-part UV coordinate regression.
  • Modality Translator -- Translates CSI feature embeddings into visual feature space, bridging the domain gap between WiFi signals and image-based pose estimation.

Features

  • ONNX Runtime backend (default) -- Load and run .onnx models with CPU or GPU execution providers.
  • PyTorch backend (tch-backend) -- Native PyTorch inference via libtorch FFI.
  • Candle backend (candle-backend) -- Pure-Rust inference with candle-core and candle-nn.
  • CUDA acceleration (cuda) -- GPU execution for supported backends.
  • TensorRT optimization (tensorrt) -- INT8/FP16 optimized inference via ONNX Runtime.
  • Batched inference -- Process multiple CSI frames in a single forward pass.
  • Model caching -- Memory-mapped model weights via memmap2.

Feature flags

Flag Default Description
onnx yes ONNX Runtime backend
tch-backend no PyTorch (tch-rs) backend
candle-backend no Candle pure-Rust backend
cuda no CUDA GPU acceleration
tensorrt no TensorRT via ONNX Runtime
all-backends no Enable onnx + tch + candle together

Quick Start

use wifi_densepose_nn::{InferenceEngine, DensePoseConfig, OnnxBackend};

// Create inference engine with ONNX backend
let config = DensePoseConfig::default();
let backend = OnnxBackend::from_file("model.onnx")?;
let engine = InferenceEngine::new(backend, config)?;

// Run inference on a CSI feature tensor
let input = ndarray::Array4::zeros((1, 256, 64, 64));
let output = engine.infer(&input)?;

println!("Body parts: {}", output.body_parts.shape()[1]); // 24

Architecture

wifi-densepose-nn/src/
  lib.rs          -- Re-exports, constants (NUM_BODY_PARTS=24), prelude
  densepose.rs    -- DensePoseHead, DensePoseConfig, DensePoseOutput
  inference.rs    -- Backend trait, InferenceEngine, InferenceOptions
  onnx.rs         -- OnnxBackend, OnnxSession (feature-gated)
  tensor.rs       -- Tensor, TensorShape utilities
  translator.rs   -- ModalityTranslator (CSI -> visual space)
  error.rs        -- NnError, NnResult
Crate Role
wifi-densepose-core Foundation types and NeuralInference trait
wifi-densepose-signal Produces CSI features consumed by inference
wifi-densepose-train Trains the models this crate loads
ort ONNX Runtime Rust bindings
tch PyTorch Rust bindings
candle-core Hugging Face pure-Rust ML framework

License

MIT OR Apache-2.0