wifi-densepose/v2/crates/wifi-densepose-nn
ruv 5cacb5fe0a perf(nn): zero-copy ORT input (~1.48x) + dynamic-dim guard + concurrency bench (ADR-155 §Tier-3)
- onnx.rs ORT input: arr.as_slice() single-memcpy fast path with iterator
  fallback for strided views. MEASURED [1,256,64,64]: 1.972ms -> 1.336ms
  (~1.48x). Repro: cargo bench -p wifi-densepose-nn --no-default-features
  --features onnx --bench onnx_bench -- onnx_input_copy
- onnx.rs checked_output_dims: reject ONNX dim <= 0 (incl. unresolved -1) before
  allocation (config-OOM class) + test.
- onnx_concurrency bench: empirically proves the per-inference write lock
  serializes (throughput drops with more threads). The intended read-lock win is
  NOT landable on ort 2.0.0-rc.11 (safe Session::run is &mut self, verified) and
  is deferred to the backlog with the upgrade path documented in-code.

New committed fixture tests/fixtures/tiny_conv.onnx (666 B, not gitignored).

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-11 19:57:53 -04:00
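
The two onnx.rs changes described in the commit message above can be sketched without ort or ndarray. This is a minimal illustration using only the standard library, not the crate's actual code: `StridedView` and `to_input_buffer` are hypothetical stand-ins for an ndarray view and the input-copy path, and the body of `checked_output_dims` is an assumption that borrows only the function name and the "reject dim <= 0" rule from the commit message.

```rust
/// Hypothetical stand-in for a tensor view: a flat buffer read with a stride.
/// (The real code works on ndarray views; this type is illustrative only.)
pub struct StridedView<'a> {
    data: &'a [f32],
    len: usize,
    stride: usize,
}

impl<'a> StridedView<'a> {
    pub fn new(data: &'a [f32], len: usize, stride: usize) -> Self {
        assert!(stride >= 1, "stride must be at least 1");
        assert!(len == 0 || (len - 1) * stride < data.len(), "view out of bounds");
        Self { data, len, stride }
    }

    /// Returns Some(slice) only when the elements are contiguous in memory.
    pub fn as_slice(&self) -> Option<&'a [f32]> {
        if self.stride == 1 {
            Some(&self.data[..self.len])
        } else {
            None
        }
    }

    /// Element-wise traversal that also works for strided layouts.
    pub fn iter(&self) -> impl Iterator<Item = f32> + '_ {
        self.data.iter().step_by(self.stride).take(self.len).copied()
    }
}

/// Copy a view into an owned buffer: one bulk copy when contiguous,
/// element-by-element iteration otherwise.
pub fn to_input_buffer(view: &StridedView<'_>) -> Vec<f32> {
    match view.as_slice() {
        Some(slice) => slice.to_vec(),
        None => view.iter().collect(),
    }
}

/// Reject non-positive dims (including an unresolved dynamic dim of -1)
/// before any buffer is sized from them.
pub fn checked_output_dims(dims: &[i64]) -> Result<Vec<usize>, String> {
    let mut out = Vec::with_capacity(dims.len());
    for &d in dims {
        if d <= 0 {
            return Err(format!("invalid output dim {d}"));
        }
        let d = usize::try_from(d).map_err(|_| format!("dim {d} does not fit in usize"))?;
        out.push(d);
    }
    Ok(out)
}
```

In this sketch a contiguous view takes the bulk-copy branch and a strided view falls back to the iterator. A shape such as [1, -1, 64, 64] is rejected before an allocation size is computed from it.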
Name            Last commit                            Date
benches         5cacb5fe0a (perf(nn) commit above)     2026-06-11 19:57:53 -04:00
src             5cacb5fe0a (perf(nn) commit above)     2026-06-11 19:57:53 -04:00
tests/fixtures  5cacb5fe0a (perf(nn) commit above)     2026-06-11 19:57:53 -04:00
Cargo.toml      5cacb5fe0a (perf(nn) commit above)     2026-06-11 19:57:53 -04:00
README.md       chore(repo): rename rust-port/wifi-densepose-rs → v2/ (flatten to one level) (#427)  2026-04-25 21:28:13 -04:00

README.md

wifi-densepose-nn


Multi-backend neural network inference for WiFi-based DensePose estimation.

Overview

wifi-densepose-nn provides the inference engine that maps processed WiFi CSI features to DensePose body surface predictions. It supports three backends -- ONNX Runtime (default), PyTorch via tch-rs, and Candle -- so models can run on CPU, on CUDA GPUs, or through TensorRT, depending on the deployment target.

The crate implements two key neural components:

  • DensePose Head -- Predicts 24 body part segmentation masks and per-part UV coordinate regression.
  • Modality Translator -- Translates CSI feature embeddings into visual feature space, bridging the domain gap between WiFi signals and image-based pose estimation.
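
As a rough illustration of how the DensePose Head outputs fit together, the sketch below decodes a single pixel by picking the highest-scoring body part and returning that part's UV estimate. The argmax decoding rule and the `decode_pixel` helper are assumptions made for illustration; only the part count of 24 comes from this README.

```rust
/// Number of body parts, matching the NUM_BODY_PARTS constant named in lib.rs.
pub const NUM_BODY_PARTS: usize = 24;

/// Decode one pixel: choose the part with the highest score and return
/// (part_index, u, v) using that part's regressed UV pair.
/// Hypothetical helper; the crate's real decoding may differ.
pub fn decode_pixel(
    part_scores: &[f32; NUM_BODY_PARTS],
    uv: &[(f32, f32); NUM_BODY_PARTS],
) -> (usize, f32, f32) {
    let mut best = 0;
    for (i, &score) in part_scores.iter().enumerate() {
        if score > part_scores[best] {
            best = i;
        }
    }
    let (u, v) = uv[best];
    (best, u, v)
}
```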

Features

  • ONNX Runtime backend (default) -- Load and run .onnx models with CPU or GPU execution providers.
  • PyTorch backend (tch-backend) -- Native PyTorch inference via libtorch FFI.
  • Candle backend (candle-backend) -- Pure-Rust inference with candle-core and candle-nn.
  • CUDA acceleration (cuda) -- GPU execution for supported backends.
  • TensorRT optimization (tensorrt) -- INT8/FP16 optimized inference via ONNX Runtime.
  • Batched inference -- Process multiple CSI frames in a single forward pass.
  • Model caching -- Memory-mapped model weights via memmap2.
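
Batched inference amounts to stacking several equally sized CSI feature frames into one [N, C, H, W] buffer before a single forward pass. The following is a minimal sketch of that stacking step using only the standard library; `stack_frames` is a hypothetical helper, not part of this crate's API.

```rust
/// Stack flattened frames (each of length C*H*W) into one contiguous batch buffer.
/// Returns the buffer and the batch size N, or an error if a frame has the wrong length.
pub fn stack_frames(frames: &[Vec<f32>], frame_len: usize) -> Result<(Vec<f32>, usize), String> {
    let mut batch = Vec::with_capacity(frames.len() * frame_len);
    for (i, frame) in frames.iter().enumerate() {
        if frame.len() != frame_len {
            return Err(format!(
                "frame {i} has {} elements, expected {frame_len}",
                frame.len()
            ));
        }
        batch.extend_from_slice(frame);
    }
    Ok((batch, frames.len()))
}
```

The returned batch size would become the leading dimension of the input tensor handed to the backend.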

Feature flags

Flag            Default  Description
onnx            yes      ONNX Runtime backend
tch-backend     no       PyTorch (tch-rs) backend
candle-backend  no       Candle pure-Rust backend
cuda            no       CUDA GPU acceleration
tensorrt        no       TensorRT via ONNX Runtime
all-backends    no       Enable onnx + tch + candle together
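
Cargo feature flags like these are typically consumed through `cfg` checks in the code. The sketch below shows one way a build could report which backends were compiled in. `enabled_backends` is a hypothetical function and not part of this crate; the feature names are taken from the table above. On the command line, a single backend can be selected with `--no-default-features --features onnx`, as in the benchmark command quoted in the commit message.

```rust
/// Names of the backends compiled into this build, derived from Cargo feature flags.
/// Hypothetical helper for illustration only.
pub fn enabled_backends() -> Vec<&'static str> {
    let mut backends = Vec::new();
    if cfg!(feature = "onnx") {
        backends.push("onnx");
    }
    if cfg!(feature = "tch-backend") {
        backends.push("tch");
    }
    if cfg!(feature = "candle-backend") {
        backends.push("candle");
    }
    backends
}
```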

Quick Start

use wifi_densepose_nn::{DensePoseConfig, InferenceEngine, OnnxBackend};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create an inference engine with the ONNX backend
    let config = DensePoseConfig::default();
    let backend = OnnxBackend::from_file("model.onnx")?;
    let engine = InferenceEngine::new(backend, config)?;

    // Run inference on a CSI feature tensor shaped [batch, channels, height, width]
    let input = ndarray::Array4::zeros((1, 256, 64, 64));
    let output = engine.infer(&input)?;

    println!("Body parts: {}", output.body_parts.shape()[1]); // 24
    Ok(())
}

Architecture

wifi-densepose-nn/src/
  lib.rs          -- Re-exports, constants (NUM_BODY_PARTS=24), prelude
  densepose.rs    -- DensePoseHead, DensePoseConfig, DensePoseOutput
  inference.rs    -- Backend trait, InferenceEngine, InferenceOptions
  onnx.rs         -- OnnxBackend, OnnxSession (feature-gated)
  tensor.rs       -- Tensor, TensorShape utilities
  translator.rs   -- ModalityTranslator (CSI -> visual space)
  error.rs        -- NnError, NnResult
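
The split between a backend trait and an engine that wraps it can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual definitions in inference.rs: the method names, the `&[f32]` signatures, and `DoublingBackend` are all hypothetical.

```rust
/// Hypothetical minimal backend interface; the real trait in inference.rs may differ.
pub trait Backend {
    fn name(&self) -> &str;
    fn run(&self, input: &[f32]) -> Result<Vec<f32>, String>;
}

/// Engine that validates input and delegates to whichever backend it was built with.
pub struct InferenceEngine<B: Backend> {
    backend: B,
}

impl<B: Backend> InferenceEngine<B> {
    pub fn new(backend: B) -> Self {
        Self { backend }
    }

    pub fn backend_name(&self) -> &str {
        self.backend.name()
    }

    pub fn infer(&self, input: &[f32]) -> Result<Vec<f32>, String> {
        if input.is_empty() {
            return Err("empty input".to_string());
        }
        self.backend.run(input)
    }
}

/// Toy backend used only to exercise the sketch: doubles every element.
pub struct DoublingBackend;

impl Backend for DoublingBackend {
    fn name(&self) -> &str {
        "doubling"
    }

    fn run(&self, input: &[f32]) -> Result<Vec<f32>, String> {
        Ok(input.iter().map(|x| x * 2.0).collect())
    }
}
```

Keeping the engine generic over the trait means a different backend can be swapped in without changing the calling code.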

Related Crates

Crate                  Role
wifi-densepose-core    Foundation types and NeuralInference trait
wifi-densepose-signal  Produces CSI features consumed by inference
wifi-densepose-train   Trains the models this crate loads
ort                    ONNX Runtime Rust bindings
tch                    PyTorch Rust bindings
candle-core            Hugging Face pure-Rust ML framework

License

MIT OR Apache-2.0