113 lines
8.4 KiB
Markdown
113 lines
8.4 KiB
Markdown
# ADR-080: QE Analysis Remediation Plan
|
||
|
||
- **Status:** Proposed — P0 security findings #1–#3 **RESOLVED** on the shipped Rust sensing-server boundary (2026-06-13; closes ADR-164 G11)
|
||
- **Date:** 2026-04-06
|
||
- **Source:** [QE Analysis Gist (2026-04-05)](https://gist.github.com/proffesor-for-testing/a6b84d7a4e26b7bbef0cf12f932925b7)
|
||
- **Full Reports:** [proffesor-for-testing/RuView `qe-reports` branch](https://github.com/proffesor-for-testing/RuView/tree/qe-reports/docs/qe-reports)
|
||
|
||
## Context
|
||
|
||
An 8-agent QE swarm analyzed ~305K lines across Rust, Python, C firmware, and TypeScript on 2026-04-05. The overall score was **55/100 (C+) — Quality Gate FAILED**. This ADR captures the findings and establishes a remediation plan.
|
||
|
||
## Decision
|
||
|
||
Address the 15 prioritized issues from the QE analysis in three waves: P0 (immediate), P1 (this sprint), P2 (this quarter).
|
||
|
||
## Security P0 closure note (2026-06-13) — Rust sensing-server boundary
|
||
|
||
The three P0 security findings below were logged against the **Python v1** API
|
||
(`archive/v1/src/…`). ADR-164 G11 re-scoped them to the *shipped* boundary:
|
||
`wifi-densepose-sensing-server` (Rust). They were verified against the current
|
||
Rust crate and closed on branch `fix/adr-080-sensing-server-security`. Each fix
|
||
(or already-fixed finding) is pinned by a test that fails on the old behavior.
|
||
**The Python v1 paths remain as-is** — v1 is archived and not the shipped
|
||
surface; this closure governs the live Rust server only.
|
||
|
||
## P0 — Fix Immediately
|
||
|
||
### 1. Rate Limiter Bypass / XFF spoofing (Security HIGH) — **RESOLVED (verified absent on Rust boundary)**
|
||
|
||
- **Original location (v1):** `archive/v1/src/middleware/rate_limit.py:200-206`
|
||
- **Problem:** Trusts `X-Forwarded-For` without validation. Any client bypasses rate limits via header spoofing.
|
||
- **Rust verification (2026-06-13):** The Rust sensing-server has **no XFF-trusting control to bypass** — there is no IP-based rate-limiter and no IP-allowlist, and neither security middleware reads a forwarded header. `bearer_auth.rs` authenticates on the token alone (`require_bearer` inspects only the `AUTHORIZATION` header); `host_validation.rs` decides on the `Host` header only. A repo-wide grep for `x-forwarded-for|forwarded|peer_addr|client_ip|real-ip` over `wifi-densepose-sensing-server` returns nothing. The only "rate limiter" is the MQTT *sample-rate* gate (`mqtt/state.rs`), a per-entity publish throttle with no IP/header input.
|
||
- **Resolution:** No code change needed (no vulnerable surface). Regression tests pin the immunity: `bearer_auth::tests::xff_header_never_affects_auth_decision` (spoofed XFF never flips a 401↔200 decision) and `host_validation::tests::forwarded_headers_never_bypass_host_allowlist` (spoofed `X-Forwarded-Host: localhost` never lets a foreign `Host: evil.com` past the allowlist). Residual: if an IP-based control is ever added, it must derive the peer from the socket (`ConnectInfo<SocketAddr>`) and only honor XFF from an explicit `--trusted-proxy` CIDR — captured as guidance in the test docstrings.
|
||
|
||
### 2. Exception Details Leaked in Responses (Security HIGH, CWE-209) — **RESOLVED**
|
||
|
||
- **Original location (v1):** `archive/v1/src/api/routers/pose.py:140`, `stream.py:297`, +5 endpoints
|
||
- **Problem:** Internal error/stack-trace detail serialized into client responses.
|
||
- **Rust finding (2026-06-13):** Six handlers in `wifi-densepose-sensing-server/src/main.rs` serialized the internal error `Display` into the JSON body: `edge_registry_endpoint` returned a panicked `spawn_blocking` `JoinError` (`"task … panicked"`) in a `500` and the raw upstream error in a `503`; `delete_model`/`delete_recording`/`start_recording` returned `std::io::Error` strings (OS detail / path); `calibration_start`/`calibration_stop` returned the `FieldModel` error chain.
|
||
- **Fix:** New `src/error_response.rs` module — `internal_error` / `internal_error_json` / `upstream_unavailable` log the full detail **server-side only** (tagged with a correlation id) and return a generic body (`{"error":"internal_error","correlation_id":…}`) with no `panicked`, no file paths, no Debug chain. All six call-sites rewired. Pinned by `error_response::tests::internal_error_body_does_not_leak_detail` (leak-substring guard, verified to fail on the reverted old body) + 4 sibling tests.
|
||
|
||
### 3. WebSocket JWT in URL (Security HIGH, CWE-598) — **RESOLVED (verified absent on Rust boundary)**
|
||
|
||
- **Original location (v1):** `archive/v1/src/api/routers/stream.py:74`, `archive/v1/src/middleware/auth.py:243`
|
||
- **Problem:** Tokens in query strings visible in logs/proxies/browser history.
|
||
- **Rust verification (2026-06-13):** The Rust sensing-server never reads a token from the URL. `require_bearer` (`bearer_auth.rs`) inspects only the `Authorization` header; the WebSocket handlers (`ws_sensing_handler`/`ws_introspection_handler`/`ws_pose_handler`) take a bare `WebSocketUpgrade` with no `Query` extractor; the single `Query` in the crate (`EdgeRegistryParams`) is a non-secret `refresh` flag.
|
||
- **Resolution:** No code change needed (no query-token path exists). Regression test `bearer_auth::tests::query_string_token_is_never_accepted` proves `?token=`/`?access_token=` in the URL never authenticates (stays `401`) while the same token in the header succeeds (`200`) — verified to fail if a query-token path is re-introduced.
|
||
|
||
### 4. Rust Tests Not in CI
|
||
|
||
- **Problem:** 2,618 tests across 153K lines of Rust — zero run in any GitHub Actions workflow. Regressions ship undetected.
|
||
- **Fix:** Add `cargo test --workspace --no-default-features` to CI. 1-2 hour task.
|
||
|
||
### 5. WebSocket Path Mismatch (Bug)
|
||
|
||
- **Location:** `ui/mobile/src/services/ws.service.ts:104` constructs `/ws/sensing`, but `constants/websocket.ts:1` defines `WS_PATH = '/api/v1/stream/pose'`.
|
||
- **Problem:** Mobile WebSocket silently fails.
|
||
- **Fix:** Align paths. Verify which endpoint the server actually serves.
|
||
|
||
## P1 — Fix This Sprint
|
||
|
||
| # | Issue | Location | Impact |
|
||
|---|-------|----------|--------|
|
||
| 6 | God file: 4,846 lines, CC=121 | `sensing-server/src/main.rs` | Untestable monolith |
|
||
| 7 | O(L×V) voxel scan per frame | `ruvsense/tomography.rs:345-383` | ~10ms wasted; use DDA ray march |
|
||
| 8 | Sequential neural inference | `wifi-densepose-nn inference.rs:334-336` | 2-4× GPU latency penalty |
|
||
| 9 | 720 `.unwrap()` in Rust | Workspace-wide | Each = potential panic in RT paths |
|
||
| 10 | 112KB alloc/frame in Python | `csi_processor.py:412-414` | Deque→list→numpy every frame |
|
||
|
||
## P2 — Fix This Quarter
|
||
|
||
| # | Issue | Impact |
|
||
|---|-------|--------|
|
||
| 11 | 11/12 Python modules have zero unit tests (12,280 LOC) | Services, middleware, DB untested |
|
||
| 12 | Firmware at 19% coverage (WASM runtime, OTA, swarm) | Security-critical code untested |
|
||
| 13 | MAT screen auto-falls back to simulated data | Disaster responders could monitor fake data |
|
||
| 14 | Token blacklist never consulted during auth | Revoked tokens remain valid |
|
||
| 15 | 50ms frame budget never benchmarked | Real-time requirement unverified |
|
||
|
||
## Bright Spots
|
||
|
||
- 79 ADRs (exceptional governance)
|
||
- Witness bundle system (ADR-028) with SHA-256 proof
|
||
- 2,618 Rust tests with mathematical rigor
|
||
- Daily security scanning (Bandit, Semgrep, Safety)
|
||
- Ed25519 WASM signature verification on firmware
|
||
- Clean mobile state management with good test coverage
|
||
|
||
## Full QE Reports (9 files, 4,914 lines)
|
||
|
||
| Report | What it covers |
|
||
|--------|---------------|
|
||
| `EXECUTIVE-SUMMARY.md` | Top-level synthesis with all scores and priority matrix |
|
||
| `00-qe-queen-summary.md` | Master coordination, quality posture, test pyramid |
|
||
| `01-code-quality-complexity.md` | Cyclomatic complexity, code smells, top 20 hotspots |
|
||
| `02-security-review.md` | 15 security findings (3 HIGH, 7 MEDIUM), OWASP coverage |
|
||
| `03-performance-analysis.md` | 23 perf findings (4 CRITICAL), frame budget analysis |
|
||
| `04-test-analysis.md` | 3,353 tests inventoried, duplication, quality grading |
|
||
| `05-quality-experience.md` | API/CLI/Mobile/DX UX assessment |
|
||
| `06-product-assessment-sfdipot.md` | SFDIPOT analysis, 57 test ideas, 14 session charters |
|
||
| `07-coverage-gaps.md` | Coverage matrix, top 20 risk gaps, 8-week roadmap |
|
||
|
||
## Consequences
|
||
|
||
- **P0 fixes** eliminate 3 security vulnerabilities and 2 functional bugs
|
||
- **P1 fixes** improve performance, reliability, and maintainability
|
||
- **P2 fixes** close coverage gaps and harden the system for production
|
||
- Target score improvement: 55 → 75+ after P0+P1 completion
|
||
|
||
---
|
||
|
||
*Generated from QE swarm analysis (fleet-02558e91) on 2026-04-05*
|