diff --git a/CHANGELOG.md b/CHANGELOG.md index d78f2846..5bdebc1b 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Added - **ADR-261: RuVector graph-ANN index — a real HNSW baseline + a SymphonyQG-style quantized variant, MEASURED (honest negative).** Closes the [ADR-156 §5 #1](docs/adr/ADR-156-ruvector-fusion-beyond-sota.md) gap: the SymphonyQG (SIGMOD 2025) **3.5–17× QPS-over-HNSW** claim was CLAIMED-only because **no HNSW baseline existed to compare against**. This adds one. New pure-Rust, `--no-default-features`-buildable modules in `wifi-densepose-ruvector`: `hnsw.rs` (a correct float HNSW — Malkov & Yashunin: multi-layer NSW graph, `ef_construction`/`ef_search`, Algorithm-4 neighbour selection, **seeded-deterministic** level assignment via SplitMix64, L2 + cosine, full degenerate-case guards), `hnsw_quantized.rs` (the SymphonyQG-style variant — the **same** graph traversed by a cheap **1-bit Hamming** score over the RaBitQ Pass-2 rotated sign code, then **exact-float rerank**), `ann_measure.rs` + `benches/ann_bench.rs` (one shared deterministic planted-cluster fixture; the `ann_bench_report` test is the source of truth). **MEASURED (dim=128, N=10k, K=10, `--release`):** float HNSW = **~25× QPS over linear scan at recall ≥0.99** (the baseline this gap needed; recall@10 correctness gate ≥0.95 holds, L2 + cosine). **Honest negative:** the 1-bit quantized traversal is **too coarse to beat float HNSW at equal recall at this scale** — its best recall is **0.738**, never reaching the ≥0.90 equal-recall point, so there is **no QPS win** over float HNSW; the 3.5–17× is **not reproduced** by our 1-bit construction here. The recall gate also **caught a real index-out-of-bounds bug** in the insert path (disclosed in ADR-261 §4). 
Caveat: this is **our** HNSW + **our** 1-bit quant, not SymphonyQG's exact system — it tests the *direction* of the claim, with the expected crossover at large N + a multi-bit traversal code. **We did not tune to manufacture a speedup.** +20 tests (ruvector lib 131→151, 0 failed). ADR-156 §5 #1 / §8 backlog: CLAIMED → **MEASURED-direction-tested**. Python deterministic proof unchanged (off the signal proof path). +- **ADR-261 Milestone-2: multi-bit quantized HNSW traversal + large-N scaling study — MEASURED (honest negative).** Extends ADR-261's quantized index from 1-bit to **`b`-bit-per-dimension** (`b ∈ {1,2,4}`, 16/32/64 B/node) over the Pass-2 rotated coordinates, and runs a deterministic scaling study (N ∈ {10k, 100k, 250k}) to test M1's *prediction* of a large-N crossover. **Result: no crossover at any measured (N, b), and the trend refutes the prediction.** At N=10k more bits lift the equal-recall QPS ratio (0.19×→0.46×→0.48×), and on the §11 sweep every b clears the 0.90 recall bar (1-bit included, whose M1 best was 0.738), but quant stays slower than float HNSW at equal recall. At N=100k/250k quant recall *collapses* (b=4 best recall 1.000→0.788→0.624 across N=10k→100k→250k, never ≥0.90 beyond 10k) while float holds ≥0.92: in a denser graph the low-bit codes can't separate near-neighbours, so the beam goes off-path faster than the float-distance saving repays. Caveat: our HNSW + our per-node multi-bit code, not SymphonyQG's RaBitQ-fused graph — refutes the *direction* at ≤250k, not their million-scale numbers. ruvector lib **151→156** (+5 tests; `scaling_report` `#[ignore]` produced the table). A published negative with the mechanism explained. ADR-261 §11. 
- **ADR-260: RuField MFS — the open specification for camera-free multimodal field sensing.** A common event / tensor / calibration / privacy / provenance model that sits *above* WiFi CSI/CIR/BFLD, UWB, BLE Channel Sounding, mmWave radar, ultrasound, subsonic, infrared, and future quantum sensors (each modality emits a normalized `FieldEvent` → `FieldTensor` → `FusionGraph` → `PrivacyClass` → `ProvenanceReceipt`). Published as a **standalone repo** [`ruvnet/rufield`](https://github.com/ruvnet/rufield) and vendored here as the `vendor/rufield` submodule (the `vendor/rvcsi` pattern — not a `v2/` workspace member). The v0.1 reference stack is a self-contained 6-crate Rust workspace (`rufield-core`, `-provenance` [sha256 + ed25519], `-privacy` [P0–P5 guard], `-adapters` [deterministic `SyntheticSim` across wifi_csi/mmwave_radar/infrared_thermal], `-fusion` [graph + TOML weighted-Bayes rules → 7 room-state inferences], `-bench` [deterministic runner + the §31 acceptance test]). **60 tests / 0 failed, clippy-clean.** §27 acceptance criteria 1–8 and 10 PASS; the live dashboard (9) is deferred. **All benchmark metrics are SYNTHETIC** (scored against the simulator's own ground truth — presence/breathing/bed_exit/room_transition F1 = 1.000, nocturnal_scratch 0.923 reported honestly, p95 latency ~0.01 ms, provenance coverage 100%, 0 privacy violations) — they prove the pipeline recovers known truth, **not** field accuracy; real hardware adapters (ESP32 CSI, mmWave, thermal IR) are a documented roadmap item, none validated in v0.1. The Python deterministic proof is unchanged (rufield is off the signal-processing proof path). ### Security diff --git a/docs/adr/ADR-261-ruvector-graph-ann-index.md b/docs/adr/ADR-261-ruvector-graph-ann-index.md index acf76568..c50132dc 100644 --- a/docs/adr/ADR-261-ruvector-graph-ann-index.md +++ b/docs/adr/ADR-261-ruvector-graph-ann-index.md @@ -139,7 +139,7 @@ Fixture: planted-cluster synthetic, **dim=128, N=10,000, 64 clusters, 200 querie ## 8. 
Validation -- **`cd v2 && cargo test -p wifi-densepose-ruvector --no-default-features --lib`** — **151 passed / 0 failed** (was 131; +20 new tests: 10 `hnsw`, 7 `hnsw_quantized`, 3 `ann_measure`). +- **`cd v2 && cargo test -p wifi-densepose-ruvector --no-default-features --lib`** — **156 passed / 0 failed, 1 ignored** (M1 added 20: 10 `hnsw`, 7 `hnsw_quantized`, 3 `ann_measure`; M2 added 5 multi-bit/scaling tests; `scaling_report` is the `#[ignore]` measurement that produced the §11 table). - **`cargo test --workspace --no-default-features`** — GREEN (see §10 for the count). - **Correctness gate verified to bite:** the recall@10 gate **panicked** on the first (buggy) insert path (§4); after the fix it passes at 0.99+ recall (L2 and cosine). - **`cargo test -p wifi-densepose-ruvector --no-default-features --release ann_bench_report -- --nocapture`** — prints the §6 table; the numbers above are copied verbatim from that run. @@ -154,10 +154,13 @@ Fixture: planted-cluster synthetic, **dim=128, N=10,000, 64 clusters, 200 querie **Negative / honest.** The 1-bit quantized variant is **not** an equal-recall QPS win at our scale; it is shipped as a measured experiment with a clearly-stated ceiling, not as a recommended default. Anyone reaching for it must read §7. +**Resolved by Milestone-2 (§11, MEASURED — no longer deferred).** +- **Multi-bit traversal score** — implemented (`b ∈ {1,2,4}` bits/dim over the Pass-2 rotated coordinates) and measured. It *does* help, but not enough: at N=10k it lifts the equal-recall QPS ratio (0.19× → 0.46× → 0.48× for b = 1 / 2 / 4), and at N=100k it lifts the best quant recall (0.207 → 0.346 → 0.788). It still does not beat float HNSW QPS at equal recall. +- **Large-N crossover measurement** — measured at N ∈ {10k, 100k, 250k}. **The predicted large-N crossover did NOT materialize — it moved the wrong way** (quant recall *collapses* as N grows). See §11.
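The multi-bit traversal score is an L1 distance over unpacked per-dimension codes. At b = 1 it should reduce exactly to packed POPCNT Hamming. The sketch below checks that equivalence under two assumptions: one `u8` code per padded dimension, and an MSB-first bit layout, matching what the diff's `Code` and the removed 1-bit `encode` appear to use. `code_l1`, `pack_sign_bits` and `packed_hamming` are hypothetical helpers written for illustration. They are not the crate's API.

```rust
/// Per-dimension L1 distance between two unpacked `b`-bit codes
/// (hypothetical helper mirroring the idea of `Code::l1`).
fn code_l1(a: &[u8], b: &[u8]) -> u32 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| (x as i32 - y as i32).unsigned_abs())
        .sum()
}

/// Pack a `{0, 1}` code MSB-first into bytes (assumed 1-bit storage layout).
fn pack_sign_bits(codes: &[u8]) -> Vec<u8> {
    let mut packed = vec![0u8; codes.len().div_ceil(8)];
    for (i, &c) in codes.iter().enumerate() {
        if c == 1 {
            packed[i / 8] |= 1 << (7 - (i % 8));
        }
    }
    packed
}

/// POPCNT Hamming distance over packed bytes.
fn packed_hamming(a: &[u8], b: &[u8]) -> u32 {
    a.iter().zip(b).map(|(&x, &y)| (x ^ y).count_ones()).sum()
}

fn main() {
    // Two 1-bit codes that differ in 5 of 10 dimensions.
    let a = [1u8, 0, 1, 1, 0, 0, 1, 0, 1, 1];
    let b = [0u8, 0, 1, 0, 1, 0, 1, 1, 1, 0];
    let l1 = code_l1(&a, &b);
    let ham = packed_hamming(&pack_sign_bits(&a), &pack_sign_bits(&b));
    assert_eq!(l1, ham);
    println!("1-bit: L1 = {l1}, packed Hamming = {ham}"); // both are 5

    // With 4-bit codes, L1 measures how many levels apart two codes are,
    // not just whether they differ: |0-3| + |7-7| + |15-12| = 6.
    println!("4-bit L1 = {}", code_l1(&[0, 7, 15], &[3, 7, 12]));
}
```

Keeping codes unpacked trades memory for branch-free integer arithmetic. This is why the ADR reports the *packed* footprint as the per-node cost rather than the in-memory `Vec<u8>` length.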
+ **Deferred (not silently dropped).** -- **Multi-bit / RaBitQ-estimator traversal score.** Replace 1-bit Hamming traversal with a ≤4-bit code or the `estimator.rs` unbiased rescale (ADR-156 §10/§11) — the lever most likely to lift quantized recall to the equal-recall regime. -- **Large-N crossover measurement.** Re-run §6 at N=100k–1M (`ANN_BENCH_N`) to find where quantization's per-node saving starts to dominate. - **Wiring HNSW into the live re-ID path** (AETHER hot-cache / sketch prefilter) behind a flag. +- **N ≥ 1M + SymphonyQG's exact RaBitQ-fused construction** — our impl refutes the *direction* at ≤250k; a true 1:1 reproduction at million-scale with their fused codes remains a separate, larger build. --- @@ -170,3 +173,28 @@ Fixture: planted-cluster synthetic, **dim=128, N=10,000, 64 clusters, 200 querie - `lib.rs` — `pub mod hnsw / hnsw_quantized / ann_measure`; re-export `HnswIndex`, `HnswParams`, `Metric`, `QuantizedHnswIndex`. - `ADR-156-ruvector-fusion-beyond-sota.md` §5 #1 + §8 backlog — SymphonyQG regraded **CLAIMED → MEASURED-direction-tested (refuted at N=10k for our 1-bit construction)**, pointing here. - `CHANGELOG.md` — `[Unreleased]` entry. + +--- + +## 11. Milestone-2 — multi-bit traversal + large-N scaling study (MEASURED) + +M1 (§7) refuted the SymphonyQG direction at N=10k with a 1-bit code, and *predicted* a crossover at "large N + a higher-bit code." M2 builds both levers and measures them — so the prediction is tested, not assumed. + +**Built:** `hnsw_quantized.rs` generalized from 1-bit to a **`b`-bit-per-dimension** code (`b ∈ {1,2,4}`, a mid-rise quantizer over the same `RANGE=3.0` rotated coordinates as ADR-156 §10's `measure_multibit`); `ann_measure.rs` gained `run_scaling_study` / `best_float_op` / `best_quant_op` + a deterministic `scaling_report` (`#[ignore]`, `--release`) and a CI-safe `scaling_study_small_is_consistent`. Memory: **16 / 32 / 64 bytes/node** for b = 1 / 2 / 4. 
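The quantizer and the 16 / 32 / 64 B/node figures can be reproduced in a few lines. The sketch below is an illustration that mirrors the diff's `encode` arithmetic for a single coordinate. It uses sign quantization for b = 1 and, otherwise, `2^b` levels over `[-RANGE, RANGE]` with round-to-nearest. It also mirrors `packed_bytes` (`ceil(D·b/8)`). The free function `quantize` is a hypothetical name, not the crate's.

```rust
/// Clamp range in rotated-coordinate units (the `RANGE = 3.0` used in the diff).
const RANGE: f32 = 3.0;

/// Quantize one rotated coordinate to a `bits`-bit code: sign for `bits == 1`,
/// otherwise a uniform grid of `2^bits` levels over `[-RANGE, RANGE]`.
fn quantize(x: f32, bits: u32) -> u8 {
    if bits == 1 {
        return u8::from(x >= 0.0);
    }
    let levels = 1u32 << bits;
    let t = ((x + RANGE) / (2.0 * RANGE)).clamp(0.0, 1.0); // map to [0, 1]
    let code = (t * (levels - 1) as f32).round() as u32;
    code.min(levels - 1) as u8
}

/// Packed bytes a node's code occupies over padded length `D`: ceil(D * bits / 8).
fn packed_bytes(padded_dim: usize, bits: u32) -> usize {
    (padded_dim * bits as usize).div_ceil(8)
}

fn main() {
    let d = 128usize.next_power_of_two(); // dim = 128 is already a power of two
    for bits in [1u32, 2, 4] {
        println!("b={bits}: {} B/node", packed_bytes(d, bits)); // 16, 32, 64
    }
    // 2-bit levels sit at x = -3, -1, +1, +3; out-of-range inputs clamp to the end codes.
    println!("{:?}", [-3.5f32, -1.2, 0.4, 2.9].map(|x| quantize(x, 2))); // [0, 1, 2, 3]
}
```

With an even number of levels the grid has no level at exactly zero (the mid-rise property). Doubling `b` doubles the per-node footprint, while the number of representable levels grows as `2^b`.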
+ +**MEASURED** (dim=128, 64 clusters, 200 queries, K=10, L2, M=16, ef_construction=200, seeded, `--release`, this box; target recall ≥ 0.90): + +| N | bits | B/node | quant best recall | float @ target | quant @ target | quant/float | +|--:|--:|--:|--:|--|--|--:| +| 10,000 | 1 | 16 | 1.000 | 23,155 QPS @ r=0.995 | 4,482 QPS @ r=0.965 | **0.19×** | +| 10,000 | 2 | 32 | 1.000 | 23,155 QPS @ r=0.995 | 10,658 QPS @ r=0.908 | **0.46×** | +| 10,000 | 4 | 64 | 1.000 | 23,155 QPS @ r=0.995 | 11,217 QPS @ r=0.946 | **0.48×** | +| 100,000 | 1 / 2 / 4 | 16/32/64 | 0.207 / 0.346 / 0.788 | 2,493 QPS @ r=0.938 | none (never ≥ 0.90) | — | +| 250,000 | 1 / 2 / 4 | 16/32/64 | 0.108 / 0.210 / 0.624 | 1,593 QPS @ r=0.925 | none | — | + +**Verdict — NO crossover at any measured (N, b) up to 250k, and the trend REFUTES the large-N prediction:** +1. **Multi-bit helps at small N but not enough.** At N=10k, more bits lift the equal-recall QPS ratio 0.19× → 0.46× → 0.48×. On this sweep every b clears the 0.90 bar, including 1-bit, whose M1 best was 0.738. But quant stays **below 1.0×**, i.e. slower than float HNSW at equal recall. +2. **The predicted large-N crossover moved the wrong way.** As N grows 10k → 100k → 250k, quant's best achievable recall **collapses** (b=4: 1.000 → 0.788 → 0.624) and, beyond N=10k, never reaches the 0.90 comparison point, while float HNSW holds ≥0.92. A denser graph packs near-neighbours whose low-bit codes are nearly identical, so the approximate score steers the beam off-path faster than the bigger float-distance savings can repay. The "crossover at millions" intuition is **not supported by our construction's trend**; if anything, the two diverge. +3. **Caveat unchanged:** this is our HNSW + our per-node multi-bit code, not SymphonyQG's RaBitQ-fused graph. The result refutes the *direction* for our construction at ≤250k; it does not disprove their published numbers on their system at their scale. A real 1:1 reproduction is the deferred million-scale build.
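The quant/float column follows from the table's operating points. The sketch below recomputes it using the selection rule that `best_float_op` / `best_quant_op` apply in the diff: pick the fastest op whose recall meets the target (tie-breaking may differ). It also applies the `(Some, Some)` ratio match from `run_scaling_study`. The `Op` alias and both functions are simplified, hypothetical stand-ins for the crate's `BestOp` plumbing. The only inputs are figures from the table.

```rust
/// A swept operating point as `(qps, recall)` (simplified stand-in for `BestOp`).
type Op = (f64, f64);

/// Fastest op whose recall meets `target`, or `None` if none does.
fn best_op(ops: &[Op], target: f64) -> Option<Op> {
    ops.iter()
        .copied()
        .filter(|&(_, recall)| recall >= target)
        .max_by(|a, b| a.0.total_cmp(&b.0))
}

/// quant/float QPS ratio at equal recall; `None` unless both methods met the target.
fn equal_recall_ratio(float_op: Option<Op>, quant_op: Option<Op>) -> Option<f64> {
    match (float_op, quant_op) {
        (Some((fqps, _)), Some((qqps, _))) => Some(qqps / fqps),
        _ => None,
    }
}

fn main() {
    let target = 0.90;
    // N = 10k rows of the table: one float op against the b = 1 / 2 / 4 quant ops.
    let float_10k = best_op(&[(23_155.0, 0.995)], target);
    let quant_10k = [(1, (4_482.0, 0.965)), (2, (10_658.0, 0.908)), (4, (11_217.0, 0.946))];
    for (bits, quant) in quant_10k {
        let ratio = equal_recall_ratio(float_10k, best_op(&[quant], target));
        println!("N=10k b={bits}: {:.2}x", ratio.unwrap()); // 0.19x, 0.46x, 0.48x
    }
    // N = 100k: float meets the target but quant never does, so there is no ratio ("—").
    let float_100k = best_op(&[(2_493.0, 0.938)], target);
    println!("N=100k: {:?}", equal_recall_ratio(float_100k, None)); // None
}
```

A crossover would mean a ratio above 1.0. Every measured ratio is below 1.0, and at N ≥ 100k the ratio is undefined because the quant ladder never meets the target.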
+ +This is a **published negative with the mechanism explained** — the multi-bit + scaling levers were built and measured rather than asserted, and the honest outcome (no crossover, trend diverging) is recorded, not hidden. diff --git a/v2/crates/wifi-densepose-ruvector/benches/ann_bench.rs b/v2/crates/wifi-densepose-ruvector/benches/ann_bench.rs index d111f258..9711c586 100644 --- a/v2/crates/wifi-densepose-ruvector/benches/ann_bench.rs +++ b/v2/crates/wifi-densepose-ruvector/benches/ann_bench.rs @@ -16,12 +16,17 @@ //! so the bench and the report can never measure different graphs. use criterion::{black_box, criterion_group, criterion_main, Criterion}; -use wifi_densepose_ruvector::ann_measure::{build_indices, queries, AnnBenchParams}; +use wifi_densepose_ruvector::ann_measure::{ + build_indices, build_quant_bits, queries, AnnBenchParams, +}; fn bench_ann(c: &mut Criterion) { // Modest N so the bench builds quickly; the report covers the larger N. let p = AnnBenchParams::default_fixture(10_000); - let (float_idx, quant_idx, _v) = build_indices(p); + let (float_idx, quant_idx, vectors) = build_indices(p); + // Multi-bit quant variants over the SAME graph/fixture (ADR-261 §11). + let quant_2bit = build_quant_bits(p, &vectors, 2); + let quant_4bit = build_quant_bits(p, &vectors, 4); let qs = queries(p); let k = p.k; @@ -52,10 +57,10 @@ fn bench_ann(c: &mut Criterion) { }); } - // Quantized HNSW at matched beam widths + rerank. + // Quantized HNSW (1-bit) at matched beam widths + rerank. for &ef in &[64usize, 128] { let rr = k * 5; - group.bench_function(format!("quant_hnsw_ef{ef}_rr{rr}"), |b| { + group.bench_function(format!("quant_hnsw_1bit_ef{ef}_rr{rr}"), |b| { b.iter(|| { let mut sink = 0u64; for q in &qs { @@ -67,6 +72,25 @@ fn bench_ann(c: &mut Criterion) { }); } + // Multi-bit quant HNSW (ADR-261 §11): 2-bit and 4-bit traversal codes at a + // mid beam width, so the criterion medians show the per-bit QPS cost the + // scaling study reports against recall. 
+ for (label, idx) in [("2bit", &quant_2bit), ("4bit", &quant_4bit)] { + for &ef in &[64usize, 128] { + let rr = k * 5; + group.bench_function(format!("quant_hnsw_{label}_ef{ef}_rr{rr}"), |b| { + b.iter(|| { + let mut sink = 0u64; + for q in &qs { + sink = sink + .wrapping_add(idx.search_quantized(black_box(q), k, ef, rr).len() as u64); + } + black_box(sink) + }) + }); + } + } + group.finish(); } diff --git a/v2/crates/wifi-densepose-ruvector/src/ann_measure.rs b/v2/crates/wifi-densepose-ruvector/src/ann_measure.rs index 8719d8a9..768c8d61 100644 --- a/v2/crates/wifi-densepose-ruvector/src/ann_measure.rs +++ b/v2/crates/wifi-densepose-ruvector/src/ann_measure.rs @@ -229,8 +229,24 @@ pub fn measure_quantized_hnsw( } /// Build both indices for `p` (shared insertion order + graph seed so the float -/// and quantized graphs are identical — the only variable is scoring). +/// and quantized graphs are identical — the only variable is scoring). The +/// quantized index uses the legacy **1-bit** code (ADR-261 §6); use +/// [`build_indices_bits`] for the multi-bit scaling study (§11). pub fn build_indices(p: AnnBenchParams) -> (HnswIndex, QuantizedHnswIndex, Vec>) { + build_indices_bits(p, 1) +} + +/// Build the float HNSW + a `bits`-bit quantized HNSW over the same fixture, +/// sharing the graph seed and insertion order so the *only* variable between the +/// float and quantized search is the traversal score. `bits ∈ {1, 2, 4}` (clamped +/// in [`QuantizedHnswIndex::build_bits`]). The float index is **independent of +/// `bits`** — callers sweeping `bits` should build the float index once and reuse +/// it (the quantized graph is identical across `bits`; only the per-node code +/// changes). 
+pub fn build_indices_bits( + p: AnnBenchParams, + bits: u32, +) -> (HnswIndex, QuantizedHnswIndex, Vec>) { let vectors = fixture(p); let params = HnswParams { m: 16, @@ -242,11 +258,140 @@ pub fn build_indices(p: AnnBenchParams) -> (HnswIndex, QuantizedHnswIndex, Vec], bits: u32) -> QuantizedHnswIndex { + let params = HnswParams { + m: 16, + ef_construction: 200, + ef_search: 64, + seed: p.graph_seed, + }; + QuantizedHnswIndex::build_bits(vectors, p.dim, Metric::L2, params, p.rot_seed, bits, p.k * 4) +} + +/// The fastest operating point of a method that meets `target` recall, as +/// `(qps, recall, label)`; `None` if no swept op met it. +type BestOp = Option<(f64, f64, String)>; + +/// Sweep float HNSW over a fixed `ef` ladder; return the fastest op meeting +/// `target` recall. +pub fn best_float_op( + idx: &HnswIndex, + qs: &[Vec], + truth: &[HashSet], + k: usize, + target: f64, +) -> BestOp { + let mut best: BestOp = None; + for &ef in &[16usize, 32, 64, 128, 256] { + let r = measure_float_hnsw(idx, qs, truth, k, ef); + if r.recall >= target && best.as_ref().map(|b| r.qps > b.0).unwrap_or(true) { + best = Some((r.qps, r.recall, format!("ef={ef}"))); + } + } + best +} + +/// Sweep quant HNSW over a fixed `(ef, rerank)` ladder; return the fastest op +/// meeting `target` recall, plus the best recall reached anywhere on the ladder +/// (so a not-found verdict can report how close it got). 
+pub fn best_quant_op( + qidx: &QuantizedHnswIndex, + qs: &[Vec], + truth: &[HashSet], + k: usize, + target: f64, +) -> (BestOp, f64) { + let mut best: BestOp = None; + let mut best_recall_seen = 0.0f64; + for &ef in &[32usize, 64, 128, 256, 512] { + for &rr in &[k * 2, k * 5, k * 10, k * 20] { + let r = measure_quantized_hnsw(qidx, qs, truth, k, ef, rr); + best_recall_seen = best_recall_seen.max(r.recall); + if r.recall >= target && best.as_ref().map(|b| r.qps > b.0).unwrap_or(true) { + best = Some((r.qps, r.recall, format!("ef={ef} rr={rr}"))); + } + } + } + (best, best_recall_seen) +} + +/// One row of the ADR-261 §11 scaling study: at a fixed `(N, b)`, the equal-recall +/// (≥ `target`) operating points for float vs quant HNSW and their QPS ratio. +#[derive(Debug, Clone)] +pub struct ScalingRow { + /// Indexed vector count. + pub n: usize, + /// Traversal-code bit-depth (1, 2, or 4). + pub bits: u32, + /// Packed bytes per node of the quant code at this `b`. + pub bytes_per_node: usize, + /// Fastest float-HNSW op meeting `target` recall (qps, recall, label). + pub float_op: BestOp, + /// Fastest quant-HNSW op meeting `target` recall (qps, recall, label). + pub quant_op: BestOp, + /// Best recall the quant ladder reached at this `(N, b)` (≤ `target` ⇒ no op). + pub quant_best_recall: f64, + /// quant/float QPS ratio at equal recall, if both met `target`. + pub ratio: Option, +} + +/// Run the ADR-261 §11 multi-bit scaling study: for each `N ∈ ns` and each +/// `b ∈ bits_set`, measure the equal-recall (≥ `target`) QPS ratio of quant-HNSW +/// vs float-HNSW on the shared fixture. Deterministic and `--no-default-features` +/// runnable. Returns one [`ScalingRow`] per `(N, b)`; the caller prints the table +/// and decides the crossover verdict. The float index is built once per `N` and +/// reused across `b` (the quant graph is identical across `b`). 
+pub fn run_scaling_study( + base: AnnBenchParams, + ns: &[usize], + bits_set: &[u32], + target: f64, +) -> Vec { + let mut rows = Vec::new(); + for &n in ns { + let p = AnnBenchParams { n, ..base }; + let (float_idx, _q1, vectors) = build_indices_bits(p, 1); + let qs = queries(p); + let truth = ground_truth(&float_idx, &qs, p.k); + let float_op = best_float_op(&float_idx, &qs, &truth, p.k, target); + for &b in bits_set { + let qidx = build_quant_bits(p, &vectors, b); + let (quant_op, quant_best_recall) = + best_quant_op(&qidx, &qs, &truth, p.k, target); + let ratio = match (&float_op, &quant_op) { + (Some((fqps, _, _)), Some((qqps, _, _))) => Some(qqps / fqps), + _ => None, + }; + rows.push(ScalingRow { + n, + bits: qidx.bits(), + bytes_per_node: qidx.bytes_per_node(), + float_op: float_op.clone(), + quant_op, + quant_best_recall, + ratio, + }); + } + } + rows +} + #[cfg(test)] mod tests { use super::*; @@ -397,4 +542,143 @@ mod tests { "best quant-HNSW recall {best_quant_recall:.4} below the 0.30 not-broken floor" ); } + + /// The ADR-261 §11 **multi-bit scaling study**. Sweeps `N` and `b ∈ {1,2,4}`, + /// printing the `(N, b) → recall / QPS / quant-vs-float ratio at equal recall` + /// surface and the crossover verdict. This is the source of truth for the §11 + /// table. Run for the published numbers with: + /// + /// ```text + /// cd v2 && ANN_SCALE_NS=10000,100000,250000 \ + /// cargo test -p wifi-densepose-ruvector --no-default-features --release \ + /// scaling_report -- --nocapture --ignored + /// ``` + /// + /// Marked `#[ignore]` so the default (debug) gate stays fast: it builds and + /// queries several indices up to large `N`, which is minutes under `--release` + /// and far too slow in debug. The CI-safe structural invariants are checked by + /// `scaling_study_small_is_consistent` below at tiny `N`. 
+ #[test] + #[ignore = "scaling study — run explicitly with --release --ignored; minutes at large N"] + fn scaling_report() { + // N ladder: default 10k→100k→250k (a clean 25× span that builds+queries in + // a few minutes under --release on the test box). Override with + // ANN_SCALE_NS=a,b,c. The largest feasible N is documented in the ADR with + // the measured build/query time at the cap. + let ns: Vec = std::env::var("ANN_SCALE_NS") + .ok() + .map(|s| s.split(',').filter_map(|x| x.trim().parse().ok()).collect()) + .unwrap_or_else(|| vec![10_000, 100_000, 250_000]); + let bits_set = [1u32, 2, 4]; + let target = 0.90f64; + let base = AnnBenchParams::default_fixture(ns[0]); + + println!("\n=== ADR-261 §11 multi-bit scaling study (planted-cluster synthetic) ==="); + println!( + "dim={} clusters={} queries={} K={} noise={} graph_seed=0x{:X} rot_seed=0x{:X}", + base.dim, base.clusters, base.n_queries, base.k, base.noise, base.graph_seed, base.rot_seed + ); + println!("metric=L2 M=16 ef_construction=200 target recall >= {target:.2} (use --release for QPS)"); + println!( + "{:<9} {:>4} {:>9} {:>10} {:>22} {:>22} {:>12}", + "N", "bits", "B/node", "q_best_rec", "float@target", "quant@target", "quant/float" + ); + + let rows = run_scaling_study(base, &ns, &bits_set, target); + for row in &rows { + let float_s = row + .float_op + .as_ref() + .map(|(q, r, l)| format!("{l} {q:.0}QPS r={r:.3}")) + .unwrap_or_else(|| "none".to_string()); + let quant_s = row + .quant_op + .as_ref() + .map(|(q, r, l)| format!("{l} {q:.0}QPS r={r:.3}")) + .unwrap_or_else(|| "none".to_string()); + let ratio_s = row + .ratio + .map(|x| format!("{x:.2}x")) + .unwrap_or_else(|| "—".to_string()); + println!( + "{:<9} {:>4} {:>9} {:>10.3} {:>22} {:>22} {:>12}", + row.n, row.bits, row.bytes_per_node, row.quant_best_recall, float_s, quant_s, ratio_s + ); + } + + // Crossover verdict: report whether the quant/float ratio EVER exceeds 1.0 + // at equal recall, and the per-bit trend of the 
best-quant-recall as N grows + // (is quant getting closer to the equal-recall regime, or not). + println!("\n--- crossover verdict (quant-HNSW > float-HNSW at equal recall?) ---"); + let crossover: Vec<&ScalingRow> = rows + .iter() + .filter(|r| r.ratio.map(|x| x > 1.0).unwrap_or(false)) + .collect(); + if crossover.is_empty() { + println!("NO crossover at any measured (N, b): quant never met target recall AND beat float QPS."); + } else { + for r in &crossover { + println!( + "CROSSOVER at N={} b={}: quant/float = {:.2}x at recall >= {target:.2}", + r.n, r.bits, r.ratio.unwrap() + ); + } + } + for &b in &bits_set { + let trend: Vec<(usize, f64)> = rows + .iter() + .filter(|r| r.bits == b) + .map(|r| (r.n, r.quant_best_recall)) + .collect(); + let trend_s: Vec = trend + .iter() + .map(|(n, r)| format!("N={n}:{r:.3}")) + .collect(); + println!("b={b} best-quant-recall trend: {}", trend_s.join(" ")); + } + println!("======================================================================\n"); + + // Structural invariants (gate-safe at any N): at least one float op met + // target at every N (the baseline must work), and quant recall is in range. + for &n in &ns { + let any_float = rows.iter().any(|r| r.n == n && r.float_op.is_some()); + assert!(any_float, "no float-HNSW op met target recall at N={n} — baseline broken"); + } + for r in &rows { + assert!( + (0.0..=1.0).contains(&r.quant_best_recall), + "quant recall out of range at N={} b={}: {}", + r.n, + r.bits, + r.quant_best_recall + ); + } + } + + /// CI-safe structural check for the scaling study at tiny `N` (debug-fast): + /// the study runs end-to-end, bytes/node scales with `b`, and the float + /// baseline meets target at the smallest N. Does **not** assert any crossover + /// (that is the §11 measured question, answered by `scaling_report`). 
+ #[test] + fn scaling_study_small_is_consistent() { + let base = AnnBenchParams::default_fixture(1500); + let ns = [1500usize, 3000]; + let bits_set = [1u32, 2, 4]; + let rows = run_scaling_study(base, &ns, &bits_set, 0.90); + assert_eq!(rows.len(), ns.len() * bits_set.len()); + // Bytes/node scales with b at dim=128 (D=128): 16 / 32 / 64. + for r in rows.iter().filter(|r| r.n == 1500) { + let expect = match r.bits { + 1 => 16, + 2 => 32, + _ => 64, + }; + assert_eq!(r.bytes_per_node, expect, "B/node wrong for b={}", r.bits); + } + // Float baseline must meet target at the smallest N. + assert!( + rows.iter().any(|r| r.n == 1500 && r.float_op.is_some()), + "float baseline failed target at small N" + ); + } } diff --git a/v2/crates/wifi-densepose-ruvector/src/hnsw_quantized.rs b/v2/crates/wifi-densepose-ruvector/src/hnsw_quantized.rs index 655b4764..91eaac14 100644 --- a/v2/crates/wifi-densepose-ruvector/src/hnsw_quantized.rs +++ b/v2/crates/wifi-densepose-ruvector/src/hnsw_quantized.rs @@ -1,4 +1,4 @@ -//! A **SymphonyQG-style quantized-traversal HNSW** — ADR-261. +//! A **SymphonyQG-style quantized-traversal HNSW** — ADR-261 (multi-bit, §11). //! //! # The SymphonyQG bet (what we are testing) //! @@ -25,20 +25,26 @@ //! float and quantized search is **how a candidate is scored during traversal**, //! so any QPS/recall difference is attributable to the quantization, not to a //! different graph. -//! - **Quantized score = 1-bit Hamming over the RaBitQ Pass-2 rotated sign code** -//! ([`crate::rotation`] + the sign-quantization in [`crate::sketch`]). Each -//! node stores its `ceil(D/8)`-byte sign code (`D = next_pow2(dim)`). During -//! traversal we compare query-code vs node-code by **POPCNT Hamming** — a few -//! machine words, no per-dimension float work. +//! - **Quantized score = `b`-bit code over the RaBitQ Pass-2 rotated coordinates** +//! ([`crate::rotation`] + the multi-bit scalar quantizer mirrored from +//! 
[ADR-156 §10](../../../../../docs/adr/ADR-156-ruvector-fusion-beyond-sota.md)'s +//! `coverage::measure_multibit`). Each node stores a `b`-bit-per-dimension code +//! over the padded rotation length `D = next_pow2(dim)`. During traversal we +//! compare query-code vs node-code by the **L1 distance over the per-dim +//! codes** — a few machine words of integer work, no per-dimension float work. +//! For `b == 1` the codes are `{0, 1}` and the L1 distance is **exactly the +//! 1-bit Hamming distance** of the original ADR-261 construction, so `b == 1` +//! is fully backward-compatible. //! - **Exact float rerank** of the final beam: the top `rerank` candidates by -//! Hamming are re-scored with the true float metric and the best `k` returned. +//! code-L1 are re-scored with the true float metric and the best `k` returned. //! -//! This trades a small recall hit (the 1-bit code is a coarse angle proxy — the -//! same ~46%-strict limitation ADR-156 §10 measured) for far cheaper per-node -//! scoring, recovered by the float rerank. **Whether that nets a QPS win at our -//! test scale is the measured question ADR-261 answers** — and at small N the -//! float distance is cheap enough that the Hamming saving may not pay off. We -//! report the real number, win or lose, and do not tune to manufacture a speedup. +//! Higher `b` keeps the traversal beam on-path better than 1-bit (ADR-156 §10 +//! measured 1/2/3/4-bit strict-K coverage at ~46/54/67/74%), at a memory cost +//! that scales linearly with `b` (bytes/node = `ceil(D·b/8)`). **Whether the +//! extra bits net a QPS win at equal recall — and at what N a crossover with +//! float HNSW appears, if any — is the measured question ADR-261 §11 answers.** +//! We report the real number, win or lose, and do not tune to manufacture a +//! speedup. //! //! # Determinism & robustness //! 
@@ -53,56 +59,95 @@ use std::collections::{BinaryHeap, HashSet}; use crate::hnsw::{HnswIndex, HnswParams, Metric}; use crate::rotation::Rotation; -/// A 1-bit Pass-2 sign code for one vector, over the padded rotation length `D`. -/// Stored as packed bytes; compared by POPCNT Hamming. +/// Symmetric clamp range for the uniform mid-rise scalar quantizer, in rotated- +/// coordinate units. The normalized FHT (`1/√D`) puts AETHER-shape rotated +/// coordinates roughly in `[-3, 3]`; out-of-range coords clamp to the end codes. +/// This is the **same `RANGE = 3.0`** as ADR-156 §10's `coverage::measure_multibit`, +/// so the multi-bit code here is the same scheme that module measured. +const RANGE: f32 = 3.0; + +/// A `b`-bit-per-dimension scalar code of a rotated embedding over the padded +/// length `D`, compared by per-dim L1. +/// +/// For `bits == 1` the per-dim code is `{0, 1}` (sign), and L1 over those codes +/// is exactly POPCNT Hamming — so the 1-bit case is bit-for-bit the original +/// ADR-261 construction. For `bits ∈ {2, 4}` the code is a uniform mid-rise +/// quantizer with `2^bits` levels over `[-RANGE, RANGE]`. #[derive(Debug, Clone)] struct Code { - bits: Vec, + /// Per-dimension codes (`0..2^bits`), one entry per padded dimension `D`. + /// Kept unpacked as `u8` for branch-free L1; the *reported* memory cost is + /// the packed footprint (`ceil(D·bits/8)`), since a production node would + /// store the packed form. (We measure the packed bytes/node explicitly in + /// [`QuantizedHnswIndex::bytes_per_node`].) + codes: Vec, } impl Code { - /// Hamming distance to another code of the same length (popcount of XOR). + /// L1 distance over the per-dimension codes — the multi-bit generalization + /// of Hamming. At `bits == 1` (codes in `{0,1}`) this equals the popcount of + /// the XOR, i.e. the 1-bit Hamming distance. 
#[inline] - fn hamming(&self, other: &Code) -> u32 { - let n = self.bits.len().min(other.bits.len()); + fn l1(&self, other: &Code) -> u32 { + let n = self.codes.len().min(other.codes.len()); let mut acc = 0u32; for i in 0..n { - acc += (self.bits[i] ^ other.bits[i]).count_ones(); + acc += (self.codes[i] as i32 - other.codes[i] as i32).unsigned_abs(); } acc } } -/// Build the packed 1-bit sign code of a rotated embedding over the padded -/// length `D = rotation.padded_dim()`. Bit set ⇒ rotated coord ≥ 0. -fn encode(embedding: &[f32], rotation: &Rotation) -> Code { +/// Quantize the rotated coordinates of `embedding` to a `bits`-bit-per-dimension +/// [`Code`] over the padded rotation length `D = rotation.padded_dim()`. +/// +/// `bits == 1` reduces to sign-quantization (code `1` iff the rotated coord ≥ 0), +/// preserving the original 1-bit construction; `bits ∈ {2, 4}` uses a uniform +/// mid-rise quantizer with `2^bits` levels over `[-RANGE, RANGE]`, identical to +/// ADR-156 §10's `measure_multibit`. +fn encode(embedding: &[f32], rotation: &Rotation, bits: u32) -> Code { let rotated = rotation.apply_padded(embedding); - let d = rotated.len(); - let mut bits = vec![0u8; d.div_ceil(8)]; - for (i, &c) in rotated.iter().enumerate() { - if c >= 0.0 { - bits[i / 8] |= 1 << (7 - (i % 8)); - } - } - Code { bits } + let levels = 1u32 << bits; // 2^bits codes per dim + let codes: Vec = rotated + .iter() + .map(|&x| { + if bits == 1 { + // Sign code: identical to the original 1-bit construction. + u8::from(x >= 0.0) + } else { + let t = ((x + RANGE) / (2.0 * RANGE)).clamp(0.0, 1.0); // → [0,1] + let code = (t * (levels - 1) as f32).round() as u32; + code.min(levels - 1) as u8 + } + }) + .collect(); + Code { codes } } -/// Min-heap node for the quantized beam (closest Hamming at the top). +/// Packed bytes a node's `bits`-bit code occupies over padded length `D`: +/// `ceil(D·bits/8)`. 
The memory cost reported by ADR-261 §11 (1-bit → `D/8`, +/// 2-bit → `D/4`, 4-bit → `D/2`). +#[inline] +fn packed_bytes(padded_dim: usize, bits: u32) -> usize { + (padded_dim * bits as usize).div_ceil(8) +} + +/// Min-heap node for the quantized beam (closest code-L1 at the top). #[derive(Debug, Clone, Copy)] struct HScored { - /// Hamming distance (quantized score) — the traversal key. - ham: u32, + /// Code-L1 distance (quantized score) — the traversal key. + dist: u32, id: u32, } impl PartialEq for HScored { fn eq(&self, other: &Self) -> bool { - self.ham == other.ham && self.id == other.id + self.dist == other.dist && self.id == other.id } } impl Eq for HScored {} impl Ord for HScored { fn cmp(&self, other: &Self) -> Ordering { - self.ham.cmp(&other.ham).then(self.id.cmp(&other.id)) + self.dist.cmp(&other.dist).then(self.id.cmp(&other.id)) } } impl PartialOrd for HScored { @@ -110,7 +155,7 @@ impl PartialOrd for HScored { Some(self.cmp(other)) } } -/// Reversed wrapper for a min-heap (smallest Hamming at the top). +/// Reversed wrapper for a min-heap (smallest code-L1 at the top). #[derive(Debug, Clone, Copy)] struct MinH(HScored); impl PartialEq for MinH { @@ -131,33 +176,34 @@ impl PartialOrd for MinH { } /// A SymphonyQG-style HNSW: the same graph as [`HnswIndex`], traversed by a -/// **cheap 1-bit Hamming score**, with a final **exact-float rerank**. +/// **cheap `b`-bit code-L1 score**, with a final **exact-float rerank**. /// /// Built by inserting the same vectors in the same order with the same seed as /// a float [`HnswIndex`], so the two indices share identical graph structure and /// only differ in how the beam is scored. The shared [`Rotation`] (seed + dim) -/// is the index/query frame for the 1-bit codes. +/// is the index/query frame for the `b`-bit codes. `bits ∈ {1, 2, 4}` selects +/// the traversal-code resolution; `bits == 1` is the original 1-bit Hamming +/// construction. 
 #[derive(Debug, Clone)]
 pub struct QuantizedHnswIndex {
     /// The underlying graph (built with the float metric for exact rerank).
     graph: HnswIndex,
-    /// Per-node 1-bit Pass-2 codes, indexed by id (parallel to graph vectors).
+    /// Per-node `b`-bit codes, indexed by id (parallel to graph vectors).
     codes: Vec<Code>,
     /// The rotation frame shared by index and query codes.
     rotation: Rotation,
+    /// Bits per dimension of the traversal code (`1`, `2`, or `4`).
+    bits: u32,
     /// Number of final candidates to exact-float rerank (≥ k at query time).
     default_rerank: usize,
 }
 
 impl QuantizedHnswIndex {
-    /// Build a quantized index over `vectors`, mirroring a float [`HnswIndex`]
-    /// built with the same `(dim, metric, params)` and insertion order. The
-    /// `rotation_seed` fixes the 1-bit code frame (index and query share it).
+    /// Build a 1-bit quantized index (the original ADR-261 construction).
     ///
-    /// `default_rerank` is how many top-Hamming candidates get an exact float
-    /// re-score before returning the best `k`; it is clamped to `≥ k` at query
-    /// time. A larger rerank recovers more recall at more float cost — the knob
-    /// that, alongside `ef`, sets the equal-recall operating point.
+    /// Equivalent to [`QuantizedHnswIndex::build_bits`] with `bits = 1`; kept as
+    /// the backward-compatible entry point so existing callers and tests are
+    /// unchanged.
     pub fn build(
         vectors: &[Vec<f32>],
         dim: usize,
@@ -166,17 +212,41 @@ impl QuantizedHnswIndex {
         rotation_seed: u64,
         default_rerank: usize,
     ) -> Self {
+        Self::build_bits(vectors, dim, metric, params, rotation_seed, 1, default_rerank)
+    }
+
+    /// Build a `bits`-bit quantized index over `vectors`, mirroring a float
+    /// [`HnswIndex`] built with the same `(dim, metric, params)` and insertion
+    /// order. The `rotation_seed` fixes the code frame (index and query share it).
+    ///
+    /// `bits` is clamped to `{1, 2, 4}` (the resolutions ADR-261 §11 sweeps): `0` and `3`
+    /// round up to `1` and `4`, and values above `4` saturate at `4`, so the constructor is
+    /// total. `default_rerank` is how many top-code-L1 candidates get an exact
+    /// float re-score before returning the best `k`; it is clamped to `≥ k` at
+    /// query time. A larger rerank recovers more recall at more float cost — the
+    /// knob that, alongside `ef`, sets the equal-recall operating point.
+    pub fn build_bits(
+        vectors: &[Vec<f32>],
+        dim: usize,
+        metric: Metric,
+        params: HnswParams,
+        rotation_seed: u64,
+        bits: u32,
+        default_rerank: usize,
+    ) -> Self {
+        let bits = clamp_bits(bits);
         let rotation = Rotation::new(rotation_seed, dim);
         let mut graph = HnswIndex::new(dim, metric, params);
         let mut codes = Vec::with_capacity(vectors.len());
         for v in vectors {
             graph.insert(v);
-            codes.push(encode(v, &rotation));
+            codes.push(encode(v, &rotation, bits));
         }
         Self {
             graph,
             codes,
             rotation,
+            bits,
             default_rerank: default_rerank.max(1),
         }
     }
@@ -207,9 +277,23 @@ impl QuantizedHnswIndex {
         self.default_rerank
     }
 
-    /// SymphonyQG-style search: traverse the graph scoring candidates by **1-bit
-    /// Hamming**, collect a beam of `ef`, then **exact-float rerank** the top
-    /// `rerank` (clamped ≥ k) and return the best `k` as `(id, float_dist)`.
+    /// Bits per dimension of the traversal code.
+    #[inline]
+    pub fn bits(&self) -> u32 {
+        self.bits
+    }
+
+    /// Bit-packed size, in bytes, of one node's traversal code (held unpacked in memory):
+    /// `ceil(D·bits/8)` where `D = next_pow2(dim)` is the padded rotation length.
+    /// This is the per-node cost ADR-261 §11 reports for each `b`.
+    #[inline]
+    pub fn bytes_per_node(&self) -> usize {
+        packed_bytes(self.rotation.padded_dim(), self.bits)
+    }
+
+    /// SymphonyQG-style search: traverse the graph scoring candidates by the
+    /// **`b`-bit code-L1**, collect a beam of `ef`, then **exact-float rerank**
+    /// the top `rerank` (clamped ≥ k) and return the best `k` as `(id, float_dist)`.
     ///
     /// Degenerate cases mirror [`HnswIndex::search`]: empty ⇒ empty; `k == 0` ⇒
     /// empty; `k > n` ⇒ all; never panics.
@@ -225,7 +309,7 @@ impl QuantizedHnswIndex {
         }
         let ef = ef.max(k).max(1);
         let rerank = rerank.max(k);
-        let q_code = encode(query, &self.rotation);
+        let q_code = encode(query, &self.rotation, self.bits);
 
         // Entry point: the graph's entry (highest-level node).
         let entry = match self.graph.entry_point() {
@@ -233,18 +317,18 @@ impl QuantizedHnswIndex {
             None => return Vec::new(),
         };
 
-        // Greedy-descend upper layers by Hamming, then beam-search layer 0.
+        // Greedy-descend upper layers by code-L1, then beam-search layer 0.
         let mut ep = entry;
         let mut layer = self.graph.top_level();
         while layer > 0 {
-            ep = self.greedy_hamming(&q_code, ep, layer);
+            ep = self.greedy_code(&q_code, ep, layer);
             layer -= 1;
         }
-        let beam = self.beam_hamming(&q_code, ep, ef);
+        let beam = self.beam_code(&q_code, ep, ef);
 
-        // Exact-float rerank of the top `rerank` Hamming candidates.
+        // Exact-float rerank of the top `rerank` code-L1 candidates.
         let mut cand: Vec<HScored> = beam;
-        cand.sort_by_key(|c| c.ham);
+        cand.sort_by_key(|c| c.dist);
         cand.truncate(rerank);
         let mut reranked: Vec<(u32, f32)> = cand
             .iter()
@@ -265,16 +349,16 @@ impl QuantizedHnswIndex {
         self.search_quantized(query, k, self.graph.params_ef_search(), self.default_rerank)
     }
 
-    /// Greedy single-best descent on a layer scored by Hamming.
-    fn greedy_hamming(&self, q_code: &Code, start: u32, layer: usize) -> u32 {
+    /// Greedy single-best descent on a layer scored by code-L1.
+    fn greedy_code(&self, q_code: &Code, start: u32, layer: usize) -> u32 {
         let mut best = start;
-        let mut best_h = self.codes[best as usize].hamming(q_code);
+        let mut best_d = self.codes[best as usize].l1(q_code);
         loop {
             let mut improved = false;
             for &nbr in self.graph.neighbours(best, layer) {
-                let h = self.codes[nbr as usize].hamming(q_code);
-                if h < best_h {
-                    best_h = h;
+                let d = self.codes[nbr as usize].l1(q_code);
+                if d < best_d {
+                    best_d = d;
                     best = nbr;
                     improved = true;
                 }
@@ -285,32 +369,32 @@ impl QuantizedHnswIndex {
         }
     }
 
-    /// Beam search on layer 0 scored by Hamming. Returns the `ef` best-Hamming
-    /// nodes (unsorted). Iterative — bounded by the visited set + the ef beam.
-    fn beam_hamming(&self, q_code: &Code, ep: u32, ef: usize) -> Vec<HScored> {
+    /// Beam search on layer 0 scored by code-L1. Returns the `ef` best-code nodes
+    /// (unsorted). Iterative — bounded by the visited set + the ef beam.
+    fn beam_code(&self, q_code: &Code, ep: u32, ef: usize) -> Vec<HScored> {
         let mut visited: HashSet<u32> = HashSet::new();
         let mut candidates: BinaryHeap<MinH> = BinaryHeap::new();
         let mut results: BinaryHeap<HScored> = BinaryHeap::new(); // max-heap: worst at top
 
-        let h0 = self.codes[ep as usize].hamming(q_code);
-        let s0 = HScored { ham: h0, id: ep };
+        let d0 = self.codes[ep as usize].l1(q_code);
+        let s0 = HScored { dist: d0, id: ep };
         visited.insert(ep);
         candidates.push(MinH(s0));
         results.push(s0);
 
         while let Some(MinH(cur)) = candidates.pop() {
-            let worst = results.peek().map(|s| s.ham).unwrap_or(u32::MAX);
-            if cur.ham > worst && results.len() >= ef {
+            let worst = results.peek().map(|s| s.dist).unwrap_or(u32::MAX);
+            if cur.dist > worst && results.len() >= ef {
                 break;
             }
             for &nbr in self.graph.neighbours(cur.id, 0) {
                 if !visited.insert(nbr) {
                     continue;
                 }
-                let h = self.codes[nbr as usize].hamming(q_code);
-                let worst = results.peek().map(|s| s.ham).unwrap_or(u32::MAX);
-                if results.len() < ef || h < worst {
-                    let s = HScored { ham: h, id: nbr };
+                let d = self.codes[nbr as usize].l1(q_code);
+                let worst = results.peek().map(|s| s.dist).unwrap_or(u32::MAX);
+                if results.len() < ef || d < worst {
+                    let s = HScored { dist: d, id: nbr };
                     candidates.push(MinH(s));
                     results.push(s);
                     while results.len() > ef {
@@ -323,6 +407,17 @@ impl QuantizedHnswIndex {
     }
 }
 
+/// Clamp a requested bit-depth to the supported `{1, 2, 4}` set: `0` → `1` and
+/// `3` → `4` round up to the next supported value, and `> 4` saturates at `4`.
+#[inline]
+fn clamp_bits(bits: u32) -> u32 {
+    match bits {
+        0 | 1 => 1,
+        2 => 2,
+        _ => 4,
+    }
+}
+
 #[cfg(test)]
 mod tests {
     use super::*;
@@ -463,4 +558,116 @@ mod tests {
         let r = idx.search_quantized(&[], 2, 16, 4);
         assert_eq!(r.len(), 2);
     }
+
+    // ----- multi-bit (ADR-261 §11) -----
+
+    /// The legacy `build` entry point must stay equivalent to `build_bits(…, 1, …)`:
+    /// same bit depth, same search output. Backward-compatibility pin.
+    #[test]
+    fn one_bit_build_bits_matches_legacy_build() {
+        let vectors = planted(32, 400, 8, 0x1B17);
+        let legacy = QuantizedHnswIndex::build(&vectors, 32, Metric::L2, params(0x5151), 0xC0DE, 40);
+        let viabits =
+            QuantizedHnswIndex::build_bits(&vectors, 32, Metric::L2, params(0x5151), 0xC0DE, 1, 40);
+        assert_eq!(legacy.bits(), 1);
+        assert_eq!(viabits.bits(), 1);
+        let q = &vectors[123];
+        assert_eq!(
+            legacy.search_quantized(q, 10, 64, 40),
+            viabits.search_quantized(q, 10, 64, 40),
+            "build_bits(…,1,…) must equal legacy build(…)"
+        );
+    }
+
+    /// Unsupported bit-depths snap to the supported `{1,2,4}` set (`0`/`3` round up,
+    /// `> 4` saturates at `4`) so the constructor is total (no panic, predictable resolution).
+    #[test]
+    fn bits_are_clamped_to_supported_set() {
+        let vectors = planted(16, 50, 4, 0xB175);
+        for (req, exp) in [(0u32, 1u32), (1, 1), (2, 2), (3, 4), (4, 4), (7, 4)] {
+            let idx = QuantizedHnswIndex::build_bits(
+                &vectors,
+                16,
+                Metric::L2,
+                params(0x9),
+                0xB,
+                req,
+                16,
+            );
+            assert_eq!(idx.bits(), exp, "bits {req} should clamp to {exp}");
+            // and it must still search without panic
+            assert!(!idx.search_quantized(&vectors[0], 5, 32, 20).is_empty());
+        }
+    }
+
+    /// Bytes/node scales linearly with `bits`: for a power-of-two dim `D`,
+    /// 1-bit → D/8, 2-bit → D/4, 4-bit → D/2.
+    #[test]
+    fn bytes_per_node_scales_with_bits() {
+        let vectors = planted(128, 20, 4, 0xBEEF);
+        let b1 = QuantizedHnswIndex::build_bits(&vectors, 128, Metric::L2, params(1), 0x5, 1, 16);
+        let b2 = QuantizedHnswIndex::build_bits(&vectors, 128, Metric::L2, params(1), 0x5, 2, 16);
+        let b4 = QuantizedHnswIndex::build_bits(&vectors, 128, Metric::L2, params(1), 0x5, 4, 16);
+        assert_eq!(b1.bytes_per_node(), 16, "128-d 1-bit = 16 B/node");
+        assert_eq!(b2.bytes_per_node(), 32, "128-d 2-bit = 32 B/node");
+        assert_eq!(b4.bytes_per_node(), 64, "128-d 4-bit = 64 B/node");
+    }
+
+    /// More bits should not *reduce* recall at a fixed (ef, rerank): each multi-bit
+    /// code refines the 1-bit sign split, so the traversal beam is expected to land
+    /// on equal-or-better candidates for the rerank to repair. This is the core
+    /// ADR-261 §11 hypothesis (multi-bit keeps the beam on-path better), pinned as a
+    /// regression gate; greedy search gives no strict guarantee, hence a small tolerance.
+    #[test]
+    fn more_bits_does_not_reduce_recall() {
+        let dim = 64;
+        let n = 3000;
+        let clusters = 32;
+        let seed = 0x7A11;
+        let vectors = planted(dim, n, clusters, seed);
+        let recall_for = |bits: u32| -> f64 {
+            let idx = QuantizedHnswIndex::build_bits(
+                &vectors,
+                dim,
+                Metric::L2,
+                params(0xA11A),
+                0x5EED,
+                bits,
+                // Modest rerank so traversal quality — not a huge rerank pool —
+                // is what drives the recall difference between bit depths.
+                20,
+            );
+            let mut total = 0.0f64;
+            let n_queries = 64;
+            for q in 0..n_queries {
+                let c = q % clusters;
+                let mut cs = seed ^ (0xC0FFEE_u64.wrapping_mul(c as u64 + 1));
+                let centre: Vec<f32> = (0..dim).map(|_| gauss(&mut cs) * 3.0).collect();
+                let mut s = seed ^ 0xDEAD_0000 ^ (q as u64).wrapping_mul(0x2545_F491);
+                let qv: Vec<f32> = (0..dim).map(|d| centre[d] + gauss(&mut s) * 0.35).collect();
+                let truth: HashSet<u32> = idx
+                    .graph()
+                    .brute_force(&qv, 10)
+                    .into_iter()
+                    .map(|(id, _)| id)
+                    .collect();
+                let got = idx.search_quantized(&qv, 10, 64, 20);
+                let hit = got.iter().filter(|(id, _)| truth.contains(id)).count();
+                total += hit as f64 / 10.0;
+            }
+            total / n_queries as f64
+        };
+        let r1 = recall_for(1);
+        let r2 = recall_for(2);
+        let r4 = recall_for(4);
+        // 2-bit and 4-bit must be at least as good as 1-bit (small tie tolerance).
+        assert!(
+            r2 + 0.02 >= r1,
+            "2-bit recall {r2:.4} regressed vs 1-bit {r1:.4}"
+        );
+        assert!(
+            r4 + 0.02 >= r1,
+            "4-bit recall {r4:.4} regressed vs 1-bit {r1:.4}"
+        );
+    }
 }
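The quantizer in `encode` and the `Code::l1` traversal score in this diff can be checked in isolation. Below is a minimal standalone sketch, not the crate's code: it skips the `Rotation` step (the input is assumed to be already rotated), uses a hypothetical `RANGE = 1.0` (the crate's real `RANGE` constant is defined outside this diff and may differ), and the helper names `quantize` and `code_l1` are illustrative.

```rust
/// Hypothetical clip range for the uniform quantizer; the crate's real
/// `RANGE` constant is defined outside this diff and may differ.
const RANGE: f32 = 1.0;

/// Quantize already-rotated coordinates to `bits`-bit codes, one `u8` per
/// dimension (unpacked), following the `encode` logic in the diff.
fn quantize(rotated: &[f32], bits: u32) -> Vec<u8> {
    let levels = 1u32 << bits; // 2^bits codes per dimension
    rotated
        .iter()
        .map(|&x| {
            if bits == 1 {
                // Sign code: 1 iff the coordinate is non-negative.
                u8::from(x >= 0.0)
            } else {
                // Map [-RANGE, RANGE] onto [0, 1], then round to the nearest level.
                let t = ((x + RANGE) / (2.0 * RANGE)).clamp(0.0, 1.0);
                let code = (t * (levels - 1) as f32).round() as u32;
                code.min(levels - 1) as u8
            }
        })
        .collect()
}

/// Code-L1 distance: sum of per-dimension absolute code differences.
/// On 0/1 sign codes this is exactly the Hamming distance.
fn code_l1(a: &[u8], b: &[u8]) -> u32 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| (x as i32 - y as i32).unsigned_abs())
        .sum()
}
```

On 0/1 sign codes each per-dimension difference is 0 or 1, so `code_l1` equals the Hamming distance; that is why `bits == 1` reproduces the original 1-bit traversal. With 2 or 4 bits, codes on adjacent levels cost less than codes far apart, which is the extra resolution the multi-bit variant adds.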
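The byte accounting in `packed_bytes` and the snapping rule in `clamp_bits` are small enough to reproduce directly. This sketch copies both and adds a hypothetical `padded_dim` helper that assumes, as the `bytes_per_node` doc states, that the padded rotation length is the next power of two of `dim`; the crate's `Rotation::padded_dim` is not shown in this diff and may compute it differently.

```rust
/// Bit-packed bytes for one node's code over padded length `padded_dim`:
/// ceil(padded_dim * bits / 8).
fn packed_bytes(padded_dim: usize, bits: u32) -> usize {
    (padded_dim * bits as usize).div_ceil(8)
}

/// Snap a requested depth onto {1, 2, 4}: 0 and 1 map to 1, 2 stays 2,
/// and everything from 3 upward maps to 4.
fn clamp_bits(bits: u32) -> u32 {
    match bits {
        0 | 1 => 1,
        2 => 2,
        _ => 4,
    }
}

/// Hypothetical helper: padded rotation length, assumed to be the next
/// power of two of `dim`.
fn padded_dim(dim: usize) -> usize {
    dim.next_power_of_two()
}
```

Under that assumption a 100-d embedding pads to 128, so at 4 bits it still costs 64 B/node: the padded length, not `dim`, sets the footprint.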
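The layer-0 loop in `beam_code` is a best-first beam: a min-heap frontier of unexpanded nodes, a max-heap holding the current `ef` best results, and an early exit once the closest frontier node is worse than the worst kept result while the beam is full. The sketch below reproduces that control flow over a toy adjacency list. It is illustrative only: `beam_search`, the `adj`/`score` parameters and the `(dist, id)` tuples are assumptions, and `std::cmp::Reverse` stands in for the diff's `MinH` wrapper.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashSet};

/// Beam search over one graph layer with integer (code-L1) scores.
/// `adj[i]` lists node i's neighbours; `score(i)` is node i's distance to
/// the query. Returns up to `ef` (dist, id) pairs, best first.
fn beam_search<F: Fn(u32) -> u32>(
    adj: &[Vec<u32>],
    score: F,
    ep: u32,
    ef: usize,
) -> Vec<(u32, u32)> {
    let ef = ef.max(1);
    let mut visited: HashSet<u32> = HashSet::new();
    // Frontier: min-heap (closest first) via Reverse.
    let mut candidates: BinaryHeap<Reverse<(u32, u32)>> = BinaryHeap::new();
    // Results: max-heap (worst at top) so the worst can be evicted cheaply.
    let mut results: BinaryHeap<(u32, u32)> = BinaryHeap::new();

    let d0 = score(ep);
    visited.insert(ep);
    candidates.push(Reverse((d0, ep)));
    results.push((d0, ep));

    while let Some(Reverse((d, id))) = candidates.pop() {
        let worst = results.peek().map(|&(w, _)| w).unwrap_or(u32::MAX);
        // Stop once the closest unexpanded node is worse than a full beam's worst.
        if d > worst && results.len() >= ef {
            break;
        }
        for &nbr in &adj[id as usize] {
            if !visited.insert(nbr) {
                continue;
            }
            let dn = score(nbr);
            let worst = results.peek().map(|&(w, _)| w).unwrap_or(u32::MAX);
            if results.len() < ef || dn < worst {
                candidates.push(Reverse((dn, nbr)));
                results.push((dn, nbr));
                if results.len() > ef {
                    results.pop(); // drop the current worst
                }
            }
        }
    }
    results.into_sorted_vec() // ascending: best first
}
```

Ordering the tuples as `(dist, id)` breaks ties by id, the same tie-break `HScored`'s `Ord` uses, so the result is deterministic for a fixed graph and entry point.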
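The tail of `search_quantized` sorts the beam by code distance, keeps the top `rerank` (never fewer than `k`), re-scores those with the exact float metric and returns the best `k`. A minimal sketch of that step, assuming squared L2 as the float metric (the index itself uses whatever `Metric` it was built with) and a hypothetical `rerank_beam` helper:

```rust
/// Exact-float rerank of a code-scored beam. `beam` holds (code_dist, id)
/// pairs and `vectors[id]` the stored float vectors. Squared L2 is assumed
/// as the float metric for this sketch.
fn rerank_beam(
    beam: &[(u32, u32)],
    vectors: &[Vec<f32>],
    query: &[f32],
    k: usize,
    rerank_n: usize,
) -> Vec<(u32, f32)> {
    let rerank_n = rerank_n.max(k); // never re-score fewer than k candidates
    let mut cand = beam.to_vec();
    cand.sort_by_key(|&(d, _)| d); // cheapest code distance first
    cand.truncate(rerank_n);
    let mut out: Vec<(u32, f32)> = cand
        .iter()
        .map(|&(_, id)| {
            let v = &vectors[id as usize];
            let d: f32 = v.iter().zip(query).map(|(a, b)| (a - b) * (a - b)).sum();
            (id, d)
        })
        .collect();
    // Best float distance first; ties broken by id for determinism.
    out.sort_by(|a, b| a.1.total_cmp(&b.1).then(a.0.cmp(&b.0)));
    out.truncate(k);
    out
}
```

The rerank can only reorder what the code ranking let through. With `k = 3` and `rerank_n = 1` in the test below, only the three best-coded candidates are re-scored, so a node that is close in float space but ranked last by its code never reaches the float stage.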
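`more_bits_does_not_reduce_recall` scores each query as the fraction of the exact top-10 ids (from `brute_force`) that the quantized search returns, then averages over queries. The helpers below are a hypothetical restatement of that recall@k metric, handy for checking numbers outside the test harness; their names and the empty-batch convention are assumptions.

```rust
use std::collections::HashSet;

/// Recall@k for one query: the share of the exact top-`k` ids that also
/// appear in the approximate top-`k`. Order within the top-`k` is ignored.
fn recall_at_k(truth: &[u32], got: &[u32], k: usize) -> f64 {
    assert!(k > 0, "recall@k needs k >= 1");
    let truth: HashSet<u32> = truth.iter().take(k).copied().collect();
    let hits = got.iter().take(k).filter(|&&id| truth.contains(&id)).count();
    hits as f64 / k as f64
}

/// Mean recall@k over a batch of (truth, got) id lists; 0.0 for an empty batch.
fn mean_recall_at_k(batch: &[(Vec<u32>, Vec<u32>)], k: usize) -> f64 {
    if batch.is_empty() {
        return 0.0;
    }
    let total: f64 = batch.iter().map(|(t, g)| recall_at_k(t, g, k)).sum();
    total / batch.len() as f64
}
```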