Commit Graph

3 Commits

Author SHA1 Message Date
rUv 1d12e8831a
refactor(beyond-sota): ADR-155 M2 — host-verifiable §8 closeout (7 de-magic, 9 boundary tests, native-conv honest-null) (#1059)
* refactor(train): ADR-155 M2 §8 — de-magic train non-tch tuning constants + boundary tests

Lift bare numeric literals used as thresholds / guard epsilons in the
non-tch (host-verifiable) train surface into named, documented consts and
pin each set with a *_consts_unchanged_from_literals test. Values are
bit-identical to the prior inline literals — cleanup, no behaviour change.

De-magicked (const + pin test):
- metrics_core.rs: VISIBILITY_THRESHOLD (0.5), MIN_REFERENCE_EXTENT (1e-6),
  OKS_FALLBACK_SIGMA (0.07)
- ruview_metrics.rs: NUM_KEYPOINTS (17), VISIBILITY_THRESHOLD (0.5),
  PCK_THRESHOLD (0.2), MIN_BBOX_DIAG (1e-3), MIN_DURATION_MINUTES (1e-6)
- subcarrier.rs: SPARSE_BASIS_SIGMA (0.15), SPARSE_BASIS_THRESHOLD (1e-4),
  SPARSE_REGULARIZATION_LAMBDA (0.1), SPARSE_COO_PRUNE_EPS (1e-8),
  SPARSE_SOLVER_TOL (1e-5 f64), SPARSE_SOLVER_MAX_ITERS (500)
- eval.rs: MIN_POSITIVE_MPJPE (1e-10)
- domain.rs: LAYER_NORM_EPS (1e-5)
- virtual_aug.rs: BOX_MULLER_U1_FLOOR (1e-10), MIN_ROOM_SCALE (1e-10)

Boundary / characterization tests (pin CURRENT behaviour):
- visibility_threshold_boundary_is_inclusive (>= 0.5 at the edge)
- degenerate_extent_below_floor_is_unscoreable ((0,0,0.0)/0.0, not perfect)
- tracking_zero_duration_does_not_divide_by_zero
- oks_short_array_is_bounded_at_keypoint_count (16 rows, no panic)
- compute_interp_weights_single_target_is_index_zero (target_sc==1)
- sparse_interp_single_target_is_finite
- domain_gap_infinite_when_in_domain_perfect_but_cross_nonzero
- domain_gap_unity_when_everything_perfect
- augment_frame_zero_room_scale_passes_amplitude_finite

Doc-only (no behaviour change):
- rapid_adapt.rs: correct module-doc O(eps) -> O(eps^2) for central differences
- geometry.rs: add # Panics to DeepSets::encode (documents existing assert!)

train --no-default-features: 191 lib (was 176), 303 total (was 288), 0 failed.

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(nn): ADR-155 M2 §3 — pure-Rust LinearHead::try_new input guard + de-magic softplus threshold

ADR-155 §3 found rf_encoder.rs has no adversarial checkpoint-deserialization
assert — its assert_eq!s in LinearHead::new are construction-time API contracts
on programmer-supplied vectors. This adds the honest, in-scope improvement the
M2 task allows: a pure-Rust *fallible* constructor so weights from an untrusted /
deserialized checkpoint can be shape-validated without panicking.

- Add RfHeadError (WeightShape / BiasShape / VarWeightShape) + Display + Error.
- Add LinearHead::try_new returning Result<Self, RfHeadError>; on success the
  head is byte-identical to LinearHead::new. new() is unchanged (still asserts;
  now documents # Panics and points to try_new) — no behaviour change for
  existing callers.
- De-magic softplus's bare 20.0 overflow threshold into
  SOFTPLUS_LINEAR_THRESHOLD (value unchanged) + pin test.

Tests: try_new_accepts_valid_and_rejects_each_bad_shape (valid == new forward;
each bad shape → typed error), softplus_threshold_unchanged_from_literal.

nn --no-default-features lib: 37 passed (was 35), 0 failed.

Co-Authored-By: claude-flow <ruv@ruv.net>

* perf(nn): ADR-155 M2 §4 — native-conv bench-first → MEASURED-INCONCLUSIVE (no perf change shipped)

The §8 "native-conv naive-loop rewrite" backlog item: DensePoseHead::
apply_conv_layer is a pure-Rust 6-nested-loop conv (benchable on this host, not
tch/ort-gated). Bench-first per the §0 PROOF discipline.

- Add committed criterion bench benches/native_conv_bench.rs measuring forward()
  through the naive conv on representative single-layer configs (--no-default-
  features; no ort download).
- Prototyped a bit-identical range-clamped variant (hoist the per-tap in-bounds
  branch by pre-clamping kh/kw ranges; same ic→kh→kw MAC order ⇒ bit-identical).
  MEASURED before/after on this host: ~35% faster on padding-heavy small-channel
  maps (4.40→2.84 ms) but a ~3% *regression* on channel-heavy maps (11.09→11.48
  ms), all inside a ±20% run-to-run noise floor. Verdict: INCONCLUSIVE — the
  benefit is not robustly positive, so the rewrite is NOT shipped and NOT a
  fabricated speedup. Reverted to the naive loop; honestly deferred (ADR-155 §8).
- Add native_conv_matches_reference: a hand-computed characterization anchor
  (1×1 = scalar MAC; same-padded 3×3 ones = truncated-window sums 9/6/4) pinning
  CURRENT conv behaviour for any future rewrite.

nn --no-default-features lib: 38 passed (was 37), 0 failed. No behaviour change.

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(adr-155): M2 §8.2 — enumerated host-verifiable P3 backlog clearance + CHANGELOG

Replace the §8 bulk "~40 lower-severity findings" line with the real, enumerated
M2 resolution (§8.2): 7 de-magicked (const + pin == prior literal), 9 boundary
tests, 1 input guard (rf_encoder try_new), 2 doc-only, 1 perf bench-first
MEASURED-INCONCLUSIVE (not shipped). Mark native-conv + rf_encoder RESOLVED;
state which §8 items stay data-gated (GraphPose-Fi/INT4/CSI-JEPA) or tch-gated
(proof/trainer/model panic sites, metrics *_v2 dead code) and ONNX read-lock
upstream-gated — blocked, not dropped. Declare the non-tch-verifiable subset of
§8 cleared.

Validation: train --no-default-features 303 passed (was 288); nn lib 38 (was 35);
workspace --no-default-features 3,293 passed, 0 failed; Python proof VERDICT PASS,
hash f8e76f21…46f7a UNCHANGED bit-exact.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-14 00:07:56 -04:00
rUv 9b07dff298
feat(beyond-sota): ADR-155 metric unification + ADR-156 RaBitQ Pass-2 (honest negative + latent topk bugfix) (#1053)
* refactor(train): hoist canonical PCK/OKS to un-gated metrics_core; fold test_metrics onto production (ADR-155 M1 §8)

ADR-155 §8 deferred item: test_metrics.rs reference kernels validated
production against their OWN reimplementation — a test that cannot catch a
canonical-impl bug (both could be wrong the same way).

- Extract canonical_torso_size / pck_canonical / oks_canonical / sigmas /
  bounding_box_diagonal into a new NON-tch-gated `metrics_core` module, so
  the single metric definition is reachable under
  `cargo test --no-default-features` (the `metrics` module is tch-gated).
  `metrics` re-exports every item → still exactly ONE implementation.
- Rewrite tests/test_metrics.rs to assert the PRODUCTION pck_canonical /
  oks_canonical equal hand-computed fixtures (not a reimplementation):
  canonical_pck_matches_hand_computed_fixture (corr=3/total=4/pck=0.75),
  hip↔hip normalizer pin, zero-visible⇒0.0, OKS perfect⇒1.0, fake-Gold pin.
- Keep an INDEPENDENT raw-threshold reference kernel only as a differential
  cross-check: test_kernel_agrees_with_canonical asserts it AGREES with
  canonical where torso==1.0 (genuine cross-check, not duplication).

Grade: MEASURED. test_metrics 10→12 tests, 0 failed.

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix(sensing-server): relabel divergent live PCK/OKS so they're never conflated with canonical (ADR-155 M1 §2.1/§8 Goal C)

Goal C named training_api.rs:804 (torso-HEIGHT PCK). Auditing it surfaced
TWO findings the ADR-155 §1 table missed:

1. training_api.rs is an ORPHAN file — not declared `mod` in lib.rs OR main.rs,
   so it does NOT compile into the crate. It does not drive the live server.
2. The REAL live `best_pck`/`best_oks` (main.rs training path → RVF metadata
   JSON read by model_manager.rs) come from trainer.rs:
   - `pck_at_threshold` = RAW-threshold PCK, NO torso normalization (the most
     divergent kind), printed/serialized as bare "PCK@0.2".
   - `oks_map` calls `oks_single(area=1.0)` = the EXACT fake-Gold pattern
     ADR-155 §2.1 claimed closed elsewhere — still live here, inflating best_oks.

Resolution = RELABEL (torso/raw math is load-bearing on different data; the
pub fns can't be renamed without breaking API; sensing-server has no train/
ndarray dep). Honest unify is a tracked §8 backlog item.

- training_api.rs: `compute_pck` → `compute_pck_torso_height` + divergence doc;
  val_pck/best_pck/val_oks struct fields documented as torso-HEIGHT proxies;
  logs say `pck_torso_h@0.2`. Test torso_pck_is_labelled_distinctly_from_canonical.
- trainer.rs (LIVE): `pck_at_threshold` documented raw-unnormalized; `oks_map`
  area=1.0 flagged fake-Gold; test pck_at_threshold_is_raw_unnormalized_not_canonical.
- main.rs: live print relabelled `pck_raw@0.2` / `oks_map(area=1.0 proxy)`.

No wire-format field renames (back-compat); no pub-API rename (no silent break).
Grade: MEASURED (relabel + divergence pinned). sensing-server 450→451 lib tests, 0 failed.

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(adr-155): mark §8 metric items RESOLVED + audit map + honest §1 under-count correction (M1b Goals A/D)

- §8.1: full PCK/OKS audit map (every def: file:line, basis, canonical/
  legacy/distinct), the two §8 items marked RESOLVED with resolution+why.
- Honest finding: §1's "seven divergent metrics" was an UNDER-count —
  sensing-server's LIVE trainer.rs has a raw-unnormalized PCK and an
  area=1.0 fake-Gold OKS the table omitted, and the file §8 named
  (training_api.rs) is orphaned dead code. §9 honest-limits updated.
- Goal D: metrics.rs *_v2 variants confirmed caller-less + deprecated;
  noted for future cleanup, NOT deleted (public API, tch-gated).
- CHANGELOG [Unreleased] Fixed entry.

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(ruvector): RaBitQ Pass-2 randomized rotation + topk bugfix (ADR-156 §8)

Implements the deferred "Multi-bit / Extended RaBitQ Pass 2" backlog item
from ADR-156 §8: a deterministic randomized orthogonal rotation applied
before sign-quantization, the published RaBitQ construction (Gao & Long,
SIGMOD 2024).

Rotation construction: Fast Hadamard Transform + seeded ±1 sign flips
("HD" / randomized Hadamard), O(d log d) time and O(d) memory — a dense
d×d rotation is O(d²) and infeasible at the 65,535-d the wire format
provisions for. Pads to the next power of two; SplitMix64 seeds the sign
stream so index-time and query-time rotations are bit-identical.

API is additive and backward-compatible: Pass 1 (`from_embedding`) is
untouched; Pass 2 is opt-in via `Sketch::from_embedding_rotated` and
`SketchBank::with_rotation` (+ `insert_embedding` / `topk_embedding` /
`novelty_embedding` helpers that rotate consistently). Default behaviour
is unchanged.

While building the Pass-2 coverage harness, found and fixed a PRE-EXISTING
correctness bug in `SketchBank::topk`: the n>k heap path used
`BinaryHeap<Reverse<(d,id)>>` (a min-heap) but treated its peek as the
max, so it returned the k FARTHEST sketches as "nearest". The shipped unit
tests only exercised the n≤k fast path, so it went unnoticed. Fixed to a
plain max-heap; pinned by `topk_heap_path_returns_nearest` and
`tight_clusters_give_high_coverage_with_overfetch` (the latter measured
0.072 on the old code).

New tests (+17, 100→117 in the crate): rotation determinism/norm-preservation
(`rotation_is_deterministic_for_seed`, `rotation_preserves_norm`), Pass-2
shape-compatibility, `pass2_coverage_not_worse_than_pass1`, and a
deterministic coverage report.

MEASURED top-K coverage (anisotropic planted-cluster fixture, cosine ground
truth; dim=128 N=2048 K=8 64 clusters noise=0.35 128 queries):
  candidate_k=K=8 : Pass1 36.13% -> Pass2 46.39%  (both << 90% bar)
  candidate_k=24  : Pass1 83.89% -> Pass2 91.60%  (Pass2 clears 90%)
  candidate_k=32  : Pass1/Pass2 100%
Honest result: rotation consistently helps (+10pp at strict K), but neither
pass clears the ADR-084 90% bar at candidate_k==K on this distribution.
Pass 2 reaches 90% only with ~3x over-fetch (the ADR-084 "candidate set"
deployment pattern). Multi-bit Pass 3 evaluated separately.

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(ruvector): multi-bit Pass-3 experiment + ADR-156/084 measured results

Adds the multi-bit half of the ADR-156 §8 "Multi-bit / Extended RaBitQ"
item as a MEASURED experiment (coverage::measure_multibit): rotate, then
b-bit uniform scalar-quantize each coord, rank by L1 over codes — the
natural multi-bit generalization of hamming. Measures the bit/coverage
tradeoff the backlog item asked for.

MEASURED at the strict bar (candidate_k=K=8, anisotropic planted-cluster
fixture, cosine ground truth):
  Pass1 (1-bit, no rot)  36.13%   16 B/vec
  Pass2 (1-bit, rot)     46.39%   16 B/vec
  Pass3 (rot, 2-bit)     54.39%   32 B/vec
  Pass3 (rot, 3-bit)     66.70%   48 B/vec
  Pass3 (rot, 4-bit)     74.22%   64 B/vec
Honest: multi-bit monotonically helps but even 4-bit (4x memory) reaches
only 74% at the strict bar — neither rotation nor <=4-bit multi-bit clears
the strict-K 90% bar on this distribution. The bar is met via over-fetch
(Pass2 @ candidate_k=24). Tests: multibit_tradeoff_report,
multibit_1bit_matches_pass2_approx (+ sanity that 1-bit ~= Pass-2).

Docs:
- ADR-156 §8 item #2 marked RESOLVED-PARTIAL; §5 #2 grade CLAIMED ->
  MEASURED-on-our-hardware; new §10 with full measured tables, the topk
  bugfix disclosure, and graded deferred sub-items.
- ADR-084: "Pass 2" section answering the rotation open-question with
  measured numbers + the topk bug note.
- CHANGELOG [Unreleased]: Added (Pass-2 milestone) + Fixed (topk heap).

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-13 16:02:18 -04:00
ruv ea5ead7fb7 docs(adr): ADR-155 NN/training beyond-SOTA sweep — Milestone 1
Records the integrity-critical fixes (unified canonical metric, leak-free
subject-disjoint split + synthetic-val disclosure, rapid_adapt real gradients,
proof margin + committed-hash rigor), the Tier-2 correctness/security fixes, the
measured Tier-3 perf win, the NN SOTA landscape graded MEASURED/CLAIMED/
THEORETICAL (GraphPose-Fi as top ACCEPTED-future candidate; INT4; CSI-JEPA-vs-MAE
with the honest "no JEPA/MAE-on-WiFi-pose yet" caveat; "Mamba-CSI-pose does not
exist"), and the ~45-finding deferred backlog. Discloses the libtorch/tch-gating
limitation and that the Rust proof is honestly in SKIP until a baseline is
committed.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-11 19:57:54 -04:00