ruv 5cacb5fe0a, 2026-06-11 19:57:53 -04:00

perf(nn): zero-copy ORT input (~1.48x) + dynamic-dim guard + concurrency bench (ADR-155 §Tier-3)

- onnx.rs ORT input: arr.as_slice() single-memcpy fast path with iterator fallback for strided views. MEASURED [1,256,64,64]: 1.972ms -> 1.336ms (~1.48x). Repro: cargo bench -p wifi-densepose-nn --no-default-features --features onnx --bench onnx_bench -- onnx_input_copy
- onnx.rs checked_output_dims: reject ONNX dim <= 0 (incl. unresolved -1) before allocation (config-OOM class) + test.
- onnx_concurrency bench: empirically proves the per-inference write lock serializes (throughput drops with more threads). The intended read-lock win is NOT landable on ort 2.0.0-rc.11 (safe Session::run is &mut self, verified) and is deferred to the backlog with the upgrade path documented in-code.

New committed fixture tests/fixtures/tiny_conv.onnx (666 B, not gitignored).

Co-Authored-By: claude-flow <ruv@ruv.net>
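
The dynamic-dim guard mentioned in the commit can be pictured with a small std-only sketch. This is not the crate's `checked_output_dims`; the function name `checked_dims`, its signature, and the extra overflow check are assumptions made for illustration. It only shows the idea of rejecting non-positive dimensions before any buffer is sized from them.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for the guard described above: converts ONNX-style
/// signed dims to `usize`, returning the shape and the total element count.
fn checked_dims(dims: &[i64]) -> Result<(Vec<usize>, usize), String> {
    let mut shape = Vec::with_capacity(dims.len());
    let mut elems: usize = 1;
    for (axis, &d) in dims.iter().enumerate() {
        // -1 marks an unresolved dynamic dimension; 0 and other negatives
        // are rejected as well, before anything is allocated.
        if d <= 0 {
            return Err(format!("axis {axis}: invalid dim {d}"));
        }
        let d = usize::try_from(d)
            .map_err(|_| format!("axis {axis}: dim {d} does not fit in usize"))?;
        // Assumption: also refuse shapes whose element count overflows.
        elems = elems
            .checked_mul(d)
            .ok_or_else(|| format!("axis {axis}: element count overflows usize"))?;
        shape.push(d);
    }
    Ok((shape, elems))
}

fn main() {
    let (shape, elems) = checked_dims(&[1, 256, 64, 64]).unwrap();
    println!("{shape:?} -> {elems} elements");

    // An unresolved dynamic axis is refused instead of being cast to a size.
    assert!(checked_dims(&[1, -1, 64, 64]).is_err());
}
```

Validating before allocation matters because casting -1 to an unsigned size yields an enormous value, which is the out-of-memory class the commit refers to.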

wifi-densepose-nn

Multi-backend neural network inference for WiFi-based DensePose estimation.

Overview

wifi-densepose-nn provides the inference engine that maps processed WiFi CSI features to DensePose body surface predictions. It supports three backends -- ONNX Runtime (default), PyTorch via tch-rs, and Candle -- so models can run on CPU, CUDA GPU, or TensorRT depending on the deployment target.

The crate implements two key neural components:

DensePose Head -- Predicts segmentation masks for 24 body parts and regresses per-part UV coordinates.
Modality Translator -- Translates CSI feature embeddings into visual feature space, bridging the domain gap between WiFi signals and image-based pose estimation.
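
To make the DensePose Head output more concrete, here is a minimal sketch of how a per-pixel body part and its UV coordinate could be read from such an output. The `HeadOutput` struct, its part-major flat layout, and the absence of a background class are assumptions for illustration only; the crate's actual `DensePoseOutput` may be organized differently.

```rust
/// Number of body parts, as stated in this README.
const NUM_BODY_PARTS: usize = 24;

/// Hypothetical flat layout: each buffer holds NUM_BODY_PARTS planes of
/// `pixels` values, stored part-major.
struct HeadOutput {
    pixels: usize,
    part_logits: Vec<f32>,
    u: Vec<f32>,
    v: Vec<f32>,
}

impl HeadOutput {
    /// Returns (part index, u, v) for one pixel by picking the
    /// highest-scoring part and reading that part's UV planes.
    fn decode(&self, pixel: usize) -> (usize, f32, f32) {
        let at = |part: usize| part * self.pixels + pixel;
        let mut best = 0;
        for part in 1..NUM_BODY_PARTS {
            if self.part_logits[at(part)] > self.part_logits[at(best)] {
                best = part;
            }
        }
        (best, self.u[at(best)], self.v[at(best)])
    }
}

fn main() {
    let pixels = 4;
    let n = NUM_BODY_PARTS * pixels;
    let mut out = HeadOutput {
        pixels,
        part_logits: vec![0.0; n],
        u: vec![0.0; n],
        v: vec![0.0; n],
    };
    // Make part 7 win at pixel 2 and give it a UV value.
    out.part_logits[7 * pixels + 2] = 3.5;
    out.u[7 * pixels + 2] = 0.25;
    out.v[7 * pixels + 2] = 0.75;

    let (part, u, v) = out.decode(2);
    println!("pixel 2 -> part {part}, uv = ({u}, {v})");
}
```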

Features

ONNX Runtime backend (default) -- Load and run .onnx models with CPU or GPU execution providers.
PyTorch backend (tch-backend) -- Native PyTorch inference via libtorch FFI.
Candle backend (candle-backend) -- Pure-Rust inference with candle-core and candle-nn.
CUDA acceleration (cuda) -- GPU execution for supported backends.
TensorRT optimization (tensorrt) -- INT8/FP16 optimized inference via ONNX Runtime.
Batched inference -- Process multiple CSI frames in a single forward pass.
Model caching -- Memory-mapped model weights via memmap2.
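
Batched inference amounts to stacking several frames along a leading batch axis before a single forward pass. The sketch below shows one way to do that packing with plain vectors; `pack_batch` is a hypothetical helper, not part of this crate, and the per-frame (C, H, W) layout is an assumption.

```rust
/// Packs equally sized frames into one contiguous buffer and returns it
/// together with its [N, C, H, W] shape. The (C, H, W) frame layout is an
/// assumption made for this sketch.
fn pack_batch(
    frames: &[Vec<f32>],
    chw: [usize; 3],
) -> Result<(Vec<f32>, [usize; 4]), String> {
    if frames.is_empty() {
        return Err("empty batch".into());
    }
    let frame_len = chw[0] * chw[1] * chw[2];
    let mut data = Vec::with_capacity(frames.len() * frame_len);
    for (i, frame) in frames.iter().enumerate() {
        // Every frame must match the declared shape, otherwise the batch
        // tensor would be misaligned.
        if frame.len() != frame_len {
            return Err(format!(
                "frame {i}: expected {frame_len} values, got {}",
                frame.len()
            ));
        }
        data.extend_from_slice(frame);
    }
    Ok((data, [frames.len(), chw[0], chw[1], chw[2]]))
}

fn main() {
    let chw = [2, 3, 3];
    let frames = vec![vec![0.0_f32; 18], vec![1.0_f32; 18]];
    let (data, shape) = pack_batch(&frames, chw).unwrap();
    println!("shape {:?}, {} values", shape, data.len());
}
```

Keeping the batch in one contiguous buffer is what allows a backend to hand it to the runtime with a single copy rather than one copy per frame.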

Feature flags

| Flag | Default | Description |
|------|---------|-------------|
| `onnx` | yes | ONNX Runtime backend |
| `tch-backend` | no | PyTorch (tch-rs) backend |
| `candle-backend` | no | Candle pure-Rust backend |
| `cuda` | no | CUDA GPU acceleration |
| `tensorrt` | no | TensorRT via ONNX Runtime |
| `all-backends` | no | Enable `onnx`, `tch-backend`, and `candle-backend` together |

Quick Start

use wifi_densepose_nn::{DensePoseConfig, InferenceEngine, OnnxBackend};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create inference engine with ONNX backend
    let config = DensePoseConfig::default();
    let backend = OnnxBackend::from_file("model.onnx")?;
    let engine = InferenceEngine::new(backend, config)?;

    // Run inference on a CSI feature tensor
    let input = ndarray::Array4::zeros((1, 256, 64, 64));
    let output = engine.infer(&input)?;

    println!("Body parts: {}", output.body_parts.shape()[1]); // 24
    Ok(())
}

Architecture

wifi-densepose-nn/src/
  lib.rs          -- Re-exports, constants (NUM_BODY_PARTS=24), prelude
  densepose.rs    -- DensePoseHead, DensePoseConfig, DensePoseOutput
  inference.rs    -- Backend trait, InferenceEngine, InferenceOptions
  onnx.rs         -- OnnxBackend, OnnxSession (feature-gated)
  tensor.rs       -- Tensor, TensorShape utilities
  translator.rs   -- ModalityTranslator (CSI -> visual space)
  error.rs        -- NnError, NnResult
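
The split between a backend trait and an engine in inference.rs can be illustrated with a small std-only sketch. Everything here is hypothetical: the trait methods, the `DoublingBackend` stand-in, and the `Engine` fields are invented for the example and do not reproduce the crate's `Backend` or `InferenceEngine` signatures.

```rust
/// Hypothetical minimal backend interface: a flat f32 buffer in, a flat
/// f32 buffer out. `run` takes `&mut self`, as some runtimes require.
trait Backend {
    fn name(&self) -> &str;
    fn run(&mut self, input: &[f32]) -> Result<Vec<f32>, String>;
}

/// Stand-in backend that doubles every value, so the sketch runs
/// without a model file.
struct DoublingBackend;

impl Backend for DoublingBackend {
    fn name(&self) -> &str {
        "doubling"
    }

    fn run(&mut self, input: &[f32]) -> Result<Vec<f32>, String> {
        Ok(input.iter().map(|x| x * 2.0).collect())
    }
}

/// Engine generic over the backend. It owns input validation so that
/// each backend only has to implement the forward pass.
struct Engine<B: Backend> {
    backend: B,
    expected_len: usize,
}

impl<B: Backend> Engine<B> {
    fn new(backend: B, expected_len: usize) -> Self {
        Engine { backend, expected_len }
    }

    fn backend_name(&self) -> &str {
        self.backend.name()
    }

    fn infer(&mut self, input: &[f32]) -> Result<Vec<f32>, String> {
        if input.len() != self.expected_len {
            return Err(format!(
                "expected {} values, got {}",
                self.expected_len,
                input.len()
            ));
        }
        self.backend.run(input)
    }
}

fn main() {
    let mut engine = Engine::new(DoublingBackend, 3);
    let out = engine.infer(&[1.0, 2.0, 3.0]).unwrap();
    println!("{} -> {:?}", engine.backend_name(), out);
}
```

A generic engine resolves the backend at compile time, which fits a crate where feature flags decide which backends exist in a given build.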

| Crate | Role |
|-------|------|
| `wifi-densepose-core` | Foundation types and `NeuralInference` trait |
| `wifi-densepose-signal` | Produces CSI features consumed by inference |
| `wifi-densepose-train` | Trains the models this crate loads |
| `ort` | ONNX Runtime Rust bindings |
| `tch` | PyTorch Rust bindings |
| `candle-core` | Hugging Face pure-Rust ML framework |

License

MIT OR Apache-2.0