wifi-densepose

Commit Graph

Author	SHA1	Message	Date

ruv

237325a117

feat(temporal): weight-blob wire format (ADR-095 Phase 1, #513)

The training/firmware boundary needs a stable serialization for the
temporal head's weights, distinct from the kernel scaffold and the
firmware ABI. This commit defines that format on the host side. The
firmware-side mirrored loader will land once the esp Rust toolchain is
unblocked.

Format:
  - Header (24 B): magic 'RVNE' / version 1 / dtype flag
    (FP32 / FP16) / input_dim / n_q_heads / n_kv_heads / head_dim /
    n_layers / n_classes / weights_len.
  - Body: weights_len bytes of flat per-layer weights.
  - Footer (4 B): CRC-32 (IEEE 802.3) over all preceding bytes
    (header + body), using the same polynomial as temporal_task.c so a
    blob produced here parses on the firmware unchanged.
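A minimal sketch of the writer side follows. It assumes every header field between the magic and `weights_len` is a little-endian `u16` and `weights_len` is a `u32` (4 + 8 × 2 + 4 = 24 bytes, which matches the stated header size, but the real field widths and dtype flag values are not given here). `Dtype`, `write_blob`, and the dtype codes are hypothetical names. The CRC is the standard reflected IEEE CRC-32, whose check value for `"123456789"` is `0xCBF43926`.

```rust
// Hypothetical writer for the RVNE blob. Field widths are an assumption:
// u16 for version, dtype and the six dimensions; u32 for weights_len.

/// Hypothetical dtype codes; the commit does not state the flag values.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Dtype {
    Fp32 = 0,
    Fp16 = 1,
}

pub const MAGIC: [u8; 4] = *b"RVNE";
pub const VERSION: u16 = 1;
pub const HEADER_LEN: usize = 24;
pub const FOOTER_LEN: usize = 4;

/// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320,
/// initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF).
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg(); // all ones if the low bit is set
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Serializes header + body + CRC footer, all little-endian. `dims` is, in order:
/// input_dim, n_q_heads, n_kv_heads, head_dim, n_layers, n_classes.
pub fn write_blob(dtype: Dtype, dims: [u16; 6], weights: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + weights.len() + FOOTER_LEN);
    out.extend_from_slice(&MAGIC);
    out.extend_from_slice(&VERSION.to_le_bytes());
    out.extend_from_slice(&(dtype as u16).to_le_bytes());
    for d in dims {
        out.extend_from_slice(&d.to_le_bytes());
    }
    out.extend_from_slice(&(weights.len() as u32).to_le_bytes());
    debug_assert_eq!(out.len(), HEADER_LEN);
    out.extend_from_slice(weights);
    // The footer covers every byte written so far (header + body).
    let crc = crc32(&out);
    out.extend_from_slice(&crc.to_le_bytes());
    out
}
```

Because the CRC is computed over the finished header and body in one pass, a reader can verify integrity before interpreting any dimension fields.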

Layout decisions:
  - Little-endian throughout (Xtensa native).
  - Weights kept as Vec<u8> rather than Vec<f32>/Vec<f16> so the no_std
    firmware loader (which may not have the `half` crate) can mmap and
    read either dtype directly.
  - Versioning is a hard break: after a `version` bump, firmware built
    for the old version refuses to load the blob. Optional fields go
    behind reserved flag bits, never via field reordering. Documented
    inline.
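Keeping the weights as raw bytes means a loader only needs integer and bit operations to read them. The sketch below is one way to decode the little-endian body into `f32` without the `half` crate; the conversions use only `core` methods (`from_le_bytes`, `from_bits`), while collecting into a `Vec` needs `alloc` on a no_std target. `f16_bits_to_f32` and `decode_weights` are hypothetical helpers, not functions from the crate.

```rust
// Hypothetical loader helpers: view the flat little-endian weight bytes as
// f32 values without depending on the `half` crate.

/// Converts raw IEEE 754 binary16 bits to f32. Exact: every half value fits in f32.
pub fn f16_bits_to_f32(h: u16) -> f32 {
    let sign = ((h >> 15) as u32) << 31;
    let exp = ((h >> 10) & 0x1F) as u32;
    let mant = (h & 0x3FF) as u32;
    let bits = match (exp, mant) {
        (0, 0) => sign, // signed zero
        (0, mut m) => {
            // Subnormal half: shift until the implicit-1 position (bit 10) is set,
            // lowering the exponent once per shift.
            let mut e: u32 = 113; // f32 biased exponent of 2^-14
            while m & 0x400 == 0 {
                m <<= 1;
                e -= 1;
            }
            sign | (e << 23) | ((m & 0x3FF) << 13)
        }
        (0x1F, m) => sign | 0x7F80_0000 | (m << 13), // infinity or NaN
        (e, m) => sign | ((e + 112) << 23) | (m << 13), // rebias 15 -> 127
    };
    f32::from_bits(bits)
}

/// Decodes `bytes` as little-endian FP16 (`is_fp16 == true`) or FP32 values.
/// Returns None if the length is not a whole number of elements.
pub fn decode_weights(bytes: &[u8], is_fp16: bool) -> Option<Vec<f32>> {
    if is_fp16 {
        if bytes.len() % 2 != 0 {
            return None;
        }
        Some(bytes.chunks_exact(2).map(|c| f16_bits_to_f32(u16::from_le_bytes([c[0], c[1]]))).collect())
    } else {
        if bytes.len() % 4 != 0 {
            return None;
        }
        Some(bytes.chunks_exact(4).map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]])).collect())
    }
}
```

For example, the byte pair `[0x00, 0x3C]` is the half-precision value 1.0, so `decode_weights(&[0x00, 0x3C], true)` yields `Some(vec![1.0])`.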

Validation surface:
  - `WeightBlobHeader::validate()` catches zero dimensions, invalid GQA
    ratios (n_q_heads % n_kv_heads != 0), n_layers == 0, and
    n_classes < 2. `WeightBlob::parse()` runs the same checks, so the
    firmware can't accidentally accept a blob the host should have
    rejected.
  - `WeightBlob::parse()` enforces magic / version / size / CRC
    before exposing weights to the caller.
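The parse path can be sketched as below, checking magic, version, size, and CRC before handing out the body, then running the header's structural validation. This is not the crate's `WeightBlob::parse()`: the free function `parse`, the `BlobError` enum, the field offsets (same assumed `u16`/`u32` layout as a 24-byte header), and the placement of `validate()` after the CRC check are all assumptions.

```rust
// Hypothetical parse path. Assumed layout: magic at 0, u16 fields at offsets
// 4..20 (version, dtype, input_dim, n_q_heads, n_kv_heads, head_dim,
// n_layers, n_classes), u32 weights_len at 20, CRC-32 footer after the body.

#[derive(Debug, PartialEq, Eq)]
pub enum BlobError {
    TooShort,
    BadMagic,
    WrongVersion(u16),
    SizeMismatch { expected: usize, actual: usize },
    CrcMismatch { stored: u32, computed: u32 },
    Invalid(&'static str),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct WeightBlobHeader {
    pub version: u16,
    pub dtype: u16,
    pub input_dim: u16,
    pub n_q_heads: u16,
    pub n_kv_heads: u16,
    pub head_dim: u16,
    pub n_layers: u16,
    pub n_classes: u16,
    pub weights_len: u32,
}

impl WeightBlobHeader {
    /// The structural checks listed in the commit message.
    pub fn validate(&self) -> Result<(), BlobError> {
        let dims = [self.input_dim, self.n_q_heads, self.n_kv_heads, self.head_dim];
        if dims.contains(&0) {
            return Err(BlobError::Invalid("zero dimension"));
        }
        if self.n_q_heads % self.n_kv_heads != 0 {
            return Err(BlobError::Invalid("n_q_heads % n_kv_heads != 0"));
        }
        if self.n_layers == 0 {
            return Err(BlobError::Invalid("n_layers == 0"));
        }
        if self.n_classes < 2 {
            return Err(BlobError::Invalid("n_classes < 2"));
        }
        Ok(())
    }
}

/// Reflected CRC-32 (IEEE 802.3 polynomial 0xEDB88320).
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &b in data {
        crc ^= b as u32;
        for _ in 0..8 {
            crc = (crc >> 1) ^ (0xEDB8_8320 & (crc & 1).wrapping_neg());
        }
    }
    !crc
}

fn rd16(b: &[u8], at: usize) -> u16 { u16::from_le_bytes([b[at], b[at + 1]]) }
fn rd32(b: &[u8], at: usize) -> u32 { u32::from_le_bytes([b[at], b[at + 1], b[at + 2], b[at + 3]]) }

/// Returns the validated header and a borrowed view of the weight body.
pub fn parse(blob: &[u8]) -> Result<(WeightBlobHeader, &[u8]), BlobError> {
    const HEADER_LEN: usize = 24;
    const FOOTER_LEN: usize = 4;
    if blob.len() < HEADER_LEN + FOOTER_LEN {
        return Err(BlobError::TooShort);
    }
    if &blob[0..4] != b"RVNE" {
        return Err(BlobError::BadMagic);
    }
    let version = rd16(blob, 4);
    if version != 1 {
        return Err(BlobError::WrongVersion(version));
    }
    let h = WeightBlobHeader {
        version, dtype: rd16(blob, 6), input_dim: rd16(blob, 8),
        n_q_heads: rd16(blob, 10), n_kv_heads: rd16(blob, 12), head_dim: rd16(blob, 14),
        n_layers: rd16(blob, 16), n_classes: rd16(blob, 18), weights_len: rd32(blob, 20),
    };
    let body_len = blob.len() - HEADER_LEN - FOOTER_LEN; // no underflow after the length check
    if body_len != h.weights_len as usize {
        return Err(BlobError::SizeMismatch { expected: h.weights_len as usize, actual: body_len });
    }
    let body_end = HEADER_LEN + body_len;
    let (stored, computed) = (rd32(blob, body_end), crc32(&blob[..body_end]));
    if stored != computed {
        return Err(BlobError::CrcMismatch { stored, computed });
    }
    h.validate()?;
    Ok((h, &blob[HEADER_LEN..body_end]))
}
```

Comparing the body length against `weights_len` by subtraction (after the minimum-length check) avoids an addition that could overflow a 32-bit `usize` on the firmware side.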

Tests (8/8 passing; with the 5/5 sparse smoke tests, 13/13 total):
  - roundtrip_fp32, roundtrip_fp16
  - parse_rejects_bad_magic, _wrong_version, _size_mismatch,
    _crc_corruption, _invalid_gqa_ratio_in_header
  - header_constants_match_wire_layout (anchor)

What's deliberately NOT in this commit:
  - The firmware-side mirrored loader (deferred until the esp Rust
    toolchain is unblocked; there is no point shipping a parser that
    can't be compiled).
  - Per-layer weight ordering. The blob is a flat byte buffer; the
    interpretation of per-layer offsets is the kernel's contract,
    documented in the eventual model module (ADR-095 §3.2 follow-up).

Co-Authored-By: claude-flow <ruv@ruv.net>

2026-05-08 11:43:49 -04:00

ruv

bfb3fdee13

feat(temporal): scaffold wifi-densepose-temporal crate (ADR-096 Phases 1-3, #513)

Implements Phases 1-3 of the ADR-096 roadmap:

Phase 1: workspace integration
- Add `ruvllm_sparse_attention` as a path-vendored workspace dep against
  `vendor/ruvector/crates/ruvllm_sparse_attention`, default-features=false,
  features=["fp16"]. Mirrors the no_std posture ADR-095 will need on the
  firmware side so both consumers share a single feature set.
- Register `wifi-densepose-temporal` as a workspace member.

Phase 2: AETHER temporal head
- `AetherTemporalHead` facade dispatches to a `SparseGqa` backend wrapping
  `SubquadraticSparseAttention`. Selection rule from ADR-096 §4.4 enforced
  at forward(): MHA branch when q_heads == kv_heads, GQA branch otherwise.
- `Dense` backend is reserved: selecting it returns a typed
  `DenseBackendNotImplemented` error at config-validation time, so the
  failure is loud and early rather than surfacing at forward().
- `TemporalHeadConfig::default_aether()` matches the AETHER training
  default per ADR-096 §3.1 (window=32, block=16, q=4, kv=1 → MQA).
- Token 0 is always wired as a global anchor, preserving AETHER's
  contrastive "session-start reference" role per ADR-024.
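The selection rule and the config-time rejection of `Dense` can be sketched as follows. `TemporalHeadConfig`, `default_aether()`, `SparseGqa`, `Dense`, and `DenseBackendNotImplemented` are names from the message; the field layout, `Branch`, `ConfigError`, `validate`, and `kv_head_for` are hypothetical, and the contiguous head grouping in `kv_head_for` is one common GQA convention, assumed here. MQA (kv=1) is the GQA case where all query heads share a single KV head, so the AETHER default takes the GQA branch.

```rust
// Hypothetical config validation and MHA/GQA branch selection.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    SparseGqa,
    Dense, // reserved; rejected during validation
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Branch {
    Mha, // q_heads == kv_heads
    Gqa, // q_heads > kv_heads, including MQA (kv_heads == 1)
}

#[derive(Debug, PartialEq, Eq)]
pub enum ConfigError {
    DenseBackendNotImplemented,
    ZeroHeads,
    NonDivisibleGqaRatio { q_heads: usize, kv_heads: usize },
}

#[derive(Debug, Clone, Copy)]
pub struct TemporalHeadConfig {
    pub window: usize,
    pub block: usize,
    pub q_heads: usize,
    pub kv_heads: usize,
    pub backend: Backend,
}

impl TemporalHeadConfig {
    /// Values quoted for the AETHER default (ADR-096 §3.1).
    pub fn default_aether() -> Self {
        Self { window: 32, block: 16, q_heads: 4, kv_heads: 1, backend: Backend::SparseGqa }
    }

    /// Rejects unsupported configs up front and reports which branch forward() would take.
    pub fn validate(&self) -> Result<Branch, ConfigError> {
        if self.backend == Backend::Dense {
            return Err(ConfigError::DenseBackendNotImplemented);
        }
        if self.q_heads == 0 || self.kv_heads == 0 {
            return Err(ConfigError::ZeroHeads);
        }
        if self.q_heads % self.kv_heads != 0 {
            return Err(ConfigError::NonDivisibleGqaRatio { q_heads: self.q_heads, kv_heads: self.kv_heads });
        }
        Ok(if self.q_heads == self.kv_heads { Branch::Mha } else { Branch::Gqa })
    }
}

/// GQA grouping: consecutive runs of q_heads / kv_heads query heads share one KV head.
pub fn kv_head_for(q_head: usize, q_heads: usize, kv_heads: usize) -> usize {
    q_head / (q_heads / kv_heads)
}
```

With the default config, all four query heads map to KV head 0, and switching `kv_heads` to 4 flips the result to the MHA branch.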

Phase 3: smoke tests (5/5 passing)
- forward() at the AETHER default config, both MHA and GQA dispatch
  paths, rejection of the dense backend, rejection of a non-divisible
  GQA ratio, and the long-window roadmap target (N=1000, the 10 s @
  100 Hz case from ADR-096 §3.1). The last test shows the kernel runs
  at lengths where dense MHA costs ~10⁶ (N²) edge ops versus ~10⁴ for
  the sparse kernel.
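The order-of-magnitude gap can be checked with a short count. This sketch assumes a hypothetical pattern: each query attends causally to itself and the previous `window - 1` tokens, plus token 0 as a global anchor. The actual `SubquadraticSparseAttention` layout (including its block=16 tiling) is not described here and may count edges differently. With N=1000 and window=32 the count is 32,472 sparse edges against 1,000,000 dense ones, i.e. about 3×10⁴ versus 10⁶.

```rust
// Edge counts for dense vs a hypothetical sparse pattern
// (causal sliding window + global anchor at token 0).

/// Dense attention: every query attends to every key.
pub fn dense_edges(n: u64) -> u64 {
    n * n
}

/// Sparse attention under the assumed pattern.
pub fn sparse_edges(n: u64, window: u64) -> u64 {
    let mut total = 0;
    for i in 0..n {
        let start = (i + 1).saturating_sub(window); // first key inside the local window
        let local = i - start + 1; // keys start..=i
        let anchor = if start > 0 { 1 } else { 0 }; // token 0, if not already in the window
        total += local + anchor;
    }
    total
}
```

Per query the sparse cost is bounded by `window + 1` edges, so the total grows linearly in N while the dense count grows quadratically.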

Streaming `step()` is deferred: the KvCache lifecycle is tied to
PoseTrack per ADR-096 §8.5, so it lands together with the firmware-side
ABI (Phase 4+).

Co-Authored-By: claude-flow <ruv@ruv.net>

2026-05-08 09:26:18 -04:00

2 Commits