docs(adr): ADR-057 Desktop Training & RuVector Integration
Proposes a new Training page in the desktop app with tabs: - Datasets: Download/manage training datasets (MM-Fi, Wi-Pose) - Models: Browse architectures, load checkpoints, export ONNX - Training: Configure and run training jobs with GPU support - RuVector: Configure signal processing modules, live testing - Metrics: View loss curves, evaluation results Integrates wifi-densepose-train crate and 5 RuVector crates into the Tauri desktop application. Co-Authored-By: claude-flow <ruv@ruv.net>
This commit is contained in:
parent
ff91d4e8cf
commit
9e860c3a7a
|
|
@ -0,0 +1,240 @@
|
|||
# ADR-057: Desktop App Training & RuVector Integration
|
||||
|
||||
| Field | Value |
|
||||
|-------|-------|
|
||||
| Status | Proposed |
|
||||
| Date | 2026-03-10 |
|
||||
| Authors | RuView Team |
|
||||
| Reviewers | - |
|
||||
| Related | ADR-016, ADR-017, ADR-024, ADR-027 |
|
||||
|
||||
## Context
|
||||
|
||||
The RuView desktop application currently provides device discovery, firmware flashing, OTA updates, and real-time sensing visualization. However, users cannot train models or configure RuVector signal processing modules directly from the desktop app.
|
||||
|
||||
The following crates exist in the workspace but are not exposed in the desktop UI:
|
||||
|
||||
### Training Crate (`wifi-densepose-train`)
|
||||
- Dataset management (MM-Fi, Wi-Pose formats)
|
||||
- Model architectures (CSI encoder, pose decoder)
|
||||
- Training loops with metrics tracking
|
||||
- Checkpoint save/load
|
||||
- ruview_metrics integration
|
||||
|
||||
### RuVector Crates (5 modules)
|
||||
1. **ruvector-mincut** - Graph-based person segmentation, DynamicPersonMatcher
|
||||
2. **ruvector-attn-mincut** - Attention-weighted antenna selection
|
||||
3. **ruvector-temporal-tensor** - Temporal CSI compression, breathing detection
|
||||
4. **ruvector-solver** - Sparse interpolation, triangulation
|
||||
5. **ruvector-attention** - Spatial attention, BVP extraction
|
||||
|
||||
## Decision
|
||||
|
||||
Add a new **"Training"** page to the desktop application with tabbed navigation:
|
||||
|
||||
### Tab Structure
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Training & Models │
|
||||
├──────────┬──────────┬──────────┬──────────┬────────────────┤
|
||||
│ Datasets │ Models │ Training │ RuVector │ Metrics │
|
||||
└──────────┴──────────┴──────────┴──────────┴────────────────┘
|
||||
```
|
||||
|
||||
### Tab 1: Datasets
|
||||
- **Download** standard datasets (MM-Fi, Wi-Pose)
|
||||
- **Import** custom CSI recordings
|
||||
- **Preview** dataset samples (CSI heatmaps, labels)
|
||||
- **Split** into train/val/test sets
|
||||
- **Statistics** - sample counts, class distribution
|
||||
|
||||
### Tab 2: Models
|
||||
- **Browse** available architectures:
|
||||
- CSI Encoder (CNN, Transformer)
|
||||
- Pose Decoder (LSTM, GRU)
|
||||
- AETHER embedding network (ADR-024)
|
||||
- MERIDIAN domain adaptor (ADR-027)
|
||||
- **Load** checkpoints from disk
|
||||
- **View** model summary (params, layers, memory)
|
||||
- **Export** to ONNX/TorchScript
|
||||
|
||||
### Tab 3: Training
|
||||
- **Configure** training:
|
||||
- Learning rate, batch size, epochs
|
||||
- Optimizer (Adam, SGD, AdamW)
|
||||
- Loss function selection
|
||||
- Data augmentation toggles
|
||||
- **GPU Detection** - CUDA/Metal availability
|
||||
- **Start/Stop** training jobs
|
||||
- **Progress** - live loss curves, ETA
|
||||
- **Checkpointing** - auto-save best model
|
||||
|
||||
### Tab 4: RuVector
|
||||
- **Module Configuration**:
|
||||
- MinCut graph parameters
|
||||
- Attention weights
|
||||
- Temporal compression ratio
|
||||
- Solver interpolation settings
|
||||
- **Live Testing** - apply to real-time CSI stream
|
||||
- **Comparison** - A/B test configurations
|
||||
- **Export** - save optimal config
|
||||
|
||||
### Tab 5: Metrics
|
||||
- **Loss Curves** - training/validation over epochs
|
||||
- **Evaluation** - PCK, mAP, IoU scores
|
||||
- **Confusion Matrix** - per-joint accuracy
|
||||
- **Export** - CSV, JSON, TensorBoard format
|
||||
|
||||
## Architecture
|
||||
|
||||
### Backend (Rust/Tauri)
|
||||
|
||||
```
|
||||
wifi-densepose-desktop/
|
||||
├── src/
|
||||
│ ├── commands/
|
||||
│ │ ├── training.rs # NEW: Training job management
|
||||
│ │ ├── datasets.rs # NEW: Dataset download/import
|
||||
│ │ ├── models.rs # NEW: Model loading/export
|
||||
│ │ ├── ruvector.rs # NEW: RuVector config
|
||||
│ │ └── metrics.rs # NEW: Metrics retrieval
|
||||
│ └── domain/
|
||||
│ ├── training.rs # Training state machine
|
||||
│ └── ruvector.rs # RuVector config types
|
||||
```
|
||||
|
||||
### Frontend (React/TypeScript)
|
||||
|
||||
```
|
||||
ui/src/pages/
|
||||
├── Training/
|
||||
│ ├── index.tsx # Tab container
|
||||
│ ├── DatasetsTab.tsx # Dataset management
|
||||
│ ├── ModelsTab.tsx # Model browser
|
||||
│ ├── TrainingTab.tsx # Training control
|
||||
│ ├── RuVectorTab.tsx # Signal processing config
|
||||
│ └── MetricsTab.tsx # Visualization
|
||||
```
|
||||
|
||||
### Tauri Commands
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `list_datasets` | Get available datasets |
|
||||
| `download_dataset` | Download standard dataset |
|
||||
| `import_dataset` | Import custom recordings |
|
||||
| `list_models` | Get model architectures |
|
||||
| `load_checkpoint` | Load model weights |
|
||||
| `export_model` | Export to ONNX |
|
||||
| `detect_gpu` | Check CUDA/Metal |
|
||||
| `start_training` | Begin training job |
|
||||
| `stop_training` | Cancel training |
|
||||
| `training_progress` | Get current status |
|
||||
| `get_ruvector_config` | Load RuVector settings |
|
||||
| `set_ruvector_config` | Update settings |
|
||||
| `test_ruvector_live` | Apply to live CSI |
|
||||
| `get_metrics` | Retrieve training metrics |
|
||||
|
||||
### Event System
|
||||
|
||||
Training progress updates via Tauri events:
|
||||
|
||||
```rust
|
||||
#[derive(Serialize, Clone)]
|
||||
pub struct TrainingProgress {
|
||||
pub epoch: u32,
|
||||
pub total_epochs: u32,
|
||||
pub batch: u32,
|
||||
pub total_batches: u32,
|
||||
pub train_loss: f32,
|
||||
pub val_loss: Option<f32>,
|
||||
pub learning_rate: f32,
|
||||
pub eta_secs: u64,
|
||||
pub gpu_memory_mb: Option<u64>,
|
||||
}
|
||||
|
||||
// Emit every batch
|
||||
app.emit("training:progress", progress)?;
|
||||
|
||||
// Emit on completion
|
||||
app.emit("training:complete", result)?;
|
||||
```
|
||||
|
||||
## Implementation Plan
|
||||
|
||||
### Phase 1: Foundation (Week 1-2)
|
||||
1. Create `Training` page skeleton with tabs
|
||||
2. Implement `detect_gpu` command
|
||||
3. Add dataset listing/download commands
|
||||
4. Design TypeScript types for all entities
|
||||
|
||||
### Phase 2: Dataset Management (Week 3)
|
||||
1. MM-Fi dataset downloader
|
||||
2. Wi-Pose dataset downloader
|
||||
3. Custom dataset import (CSV/NPZ)
|
||||
4. Dataset preview component
|
||||
|
||||
### Phase 3: Model Management (Week 4)
|
||||
1. Model architecture browser
|
||||
2. Checkpoint loading
|
||||
3. Model summary display
|
||||
4. ONNX export
|
||||
|
||||
### Phase 4: Training Loop (Week 5-6)
|
||||
1. Training configuration UI
|
||||
2. Background training thread
|
||||
3. Progress event emission
|
||||
4. Checkpoint auto-save
|
||||
5. Training history persistence
|
||||
|
||||
### Phase 5: RuVector Integration (Week 7)
|
||||
1. RuVector config UI
|
||||
2. Live CSI testing
|
||||
3. A/B comparison mode
|
||||
4. Config export/import
|
||||
|
||||
### Phase 6: Metrics & Polish (Week 8)
|
||||
1. Loss curve visualization (Chart.js/Recharts)
|
||||
2. Evaluation metrics display
|
||||
3. Export functionality
|
||||
4. Error handling & edge cases
|
||||
|
||||
## Risks & Mitigations
|
||||
|
||||
| Risk | Probability | Impact | Mitigation |
|
||||
|------|-------------|--------|------------|
|
||||
| No GPU available | Medium | High | CPU fallback with warning |
|
||||
| Large dataset downloads | High | Medium | Resume support, progress UI |
|
||||
| Training crashes | Medium | High | Checkpoint recovery, error reporting |
|
||||
| Memory exhaustion | Low | High | Batch size auto-tuning |
|
||||
| UI blocking | Medium | High | All training in background thread |
|
||||
|
||||
## Success Criteria
|
||||
|
||||
1. User can download MM-Fi dataset from UI
|
||||
2. User can start training with GPU detection
|
||||
3. Live progress updates without UI freeze
|
||||
4. Training can be paused/resumed
|
||||
5. RuVector config changes apply to live CSI
|
||||
6. Metrics display updates in real-time
|
||||
7. Models can be exported to ONNX
|
||||
|
||||
## Alternatives Considered
|
||||
|
||||
### 1. Separate Training App
|
||||
- **Rejected**: Fragments user experience, duplicates code
|
||||
|
||||
### 2. Web-based Training Dashboard
|
||||
- **Rejected**: Requires server, no offline support
|
||||
|
||||
### 3. CLI-only Training
|
||||
- **Rejected**: Poor UX for non-technical users
|
||||
|
||||
## References
|
||||
|
||||
- ADR-016: RuVector Training Pipeline Integration
|
||||
- ADR-017: RuVector Signal + MAT Integration
|
||||
- ADR-024: AETHER Contrastive CSI Embedding
|
||||
- ADR-027: MERIDIAN Domain Generalization
|
||||
- Tauri v2 Events: https://v2.tauri.app/develop/calling-rust/#events
|
||||
Loading…
Reference in New Issue