Commit Graph

3 Commits

Author SHA1 Message Date
rUv c353255672
fix: firmware cluster — wasm3 IDF v6.0 build (#946) + swarm TLS stack (#949) + Docker unauth default (#864) (#975)
* fix(firmware,docker): clear three high-severity bugs in one sweep

Closes #946 — wasm3 fails on Xtensa GCC 15.2.0 (ESP-IDF v6.0.1)

  cannot tail-call: machine description does not have a sibcall_epilogue
  instruction pattern

wasm3's `M3_MUSTTAIL return jumpOpImpl(...)` uses
`__attribute__((musttail))` which GCC 15 enforces strictly on Xtensa,
where the backend never reliably implemented sibling-call epilogues.
Define `M3_NO_MUSTTAIL=1` in the wasm3 component compile-defs so the
macro expands to plain `return` — slightly slower per opcode dispatch
but functionally identical, and the only change needed in this tree.
Older IDF / GCC builds accept the define as a no-op so the IDF v5.4
CI build is unchanged.
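
A minimal sketch of the macro switch described above, assuming the
guard has roughly this shape. It is not copied from the wasm3 headers,
and `op_halt` / `op_add_one` are hypothetical stand-ins for opcode
handlers. In the real build the define comes from the component's
compile definitions, not from a `#define` in the source.

```c
#include <assert.h>

/* Normally injected by the build system; defined inline here so the
 * sketch is self-contained. */
#define M3_NO_MUSTTAIL 1

/* Assumed guard shape: opt out first, otherwise use musttail only if
 * the compiler advertises it. */
#if defined(M3_NO_MUSTTAIL) && M3_NO_MUSTTAIL
#  define M3_MUSTTAIL                      /* expands to nothing: plain `return` */
#elif defined(__has_attribute)
#  if __has_attribute(musttail)
#    define M3_MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef M3_MUSTTAIL
#  define M3_MUSTTAIL                      /* compiler without musttail support */
#endif

/* Hypothetical opcode handlers, for illustration only. */
static int op_halt(int acc) { return acc; }

static int op_add_one(int acc)
{
    /* With M3_NO_MUSTTAIL set this is an ordinary call + return. */
    M3_MUSTTAIL return op_halt(acc + 1);
}

/* The result is the same with or without the tail-call guarantee. */
static void musttail_demo(void)
{
    assert(op_add_one(41) == 42);
}
```

Only the call mechanics change: without the attribute each dispatch may
push a frame instead of jumping, which is where the per-opcode cost
comes from.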

Closes #949 — swarm task stack overflow on Seed TLS init

The reporter provisioned with `--seed-url https://...` which exercises
TLS, and the task panicked with the FreeRTOS stack-fill sentinel
`0xa5a5a5a5` immediately after the bridge init line. `SWARM_TASK_STACK`
was 3 KB ("HTTP client uses ~2.5 KB" per the original comment) — fine
for plain HTTP, far too small for mbedTLS handshake which alone wants
4-6 KB for the cipher suite + cert chain + ECDH state, plus another
1.5-2 KB for esp_http_client. Bumped to 8192 with the why in the
comment. Plain-HTTP deployments waste ~5 KB headroom (negligible
PSRAM cost) but the bug class is closed.
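
The stack budget above can be checked with a few lines of arithmetic.
This is a sketch using the upper ends of the ranges quoted in this
message; the macro and function names other than `SWARM_TASK_STACK`
are illustrative, not taken from the firmware.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case figures quoted above (upper end of each range). */
#define TLS_HANDSHAKE_MAX_BYTES  (6u * 1024u)  /* mbedTLS handshake: 4-6 KB */
#define HTTP_CLIENT_MAX_BYTES    (2u * 1024u)  /* esp_http_client: 1.5-2 KB */

#define SWARM_TASK_STACK         8192u         /* new value */
#define SWARM_TASK_STACK_OLD     (3u * 1024u)  /* previous value */

/* Bytes left at worst case for a given stack size; negative means the
 * task would overflow. */
static int32_t stack_margin_bytes(uint32_t stack_bytes)
{
    assert(stack_bytes <= (uint32_t)INT32_MAX);
    return (int32_t)stack_bytes
         - (int32_t)(TLS_HANDSHAKE_MAX_BYTES + HTTP_CLIENT_MAX_BYTES);
}
```

With these figures the old 3 KB stack is 5120 bytes short, and 8192
leaves no margin if both upper bounds are hit at once, so a
`uxTaskGetStackHighWaterMark()` reading on hardware would be a useful
follow-up check.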

Closes #864 — Docker default exposes unauthenticated sensing API + WS

`docker-entrypoint.sh` started the sensing-server with `--bind-addr
0.0.0.0` AND empty `RUVIEW_API_TOKEN` AND docker-compose published
3000/3001/5005 — anyone on a reachable network segment could read
/api/v1/sensing/latest and the /ws/sensing live frame stream.

Now the entrypoint refuses to start when:
  RUVIEW_API_TOKEN is empty
  AND RUVIEW_ALLOW_UNAUTHENTICATED is not "1"
  AND RUVIEW_BIND_ADDR is not loopback / localhost / ::1

…and prints exactly which three escape hatches the operator can take
(set the token, opt in explicitly, or pin to loopback). Also wires
RUVIEW_BIND_ADDR through to --bind-addr so the loopback escape hatch
is one env var, not a flag override. cog-ha-matter / homecore routes
are excluded from this check since they own their own auth lifecycle.
This is a breaking change for unattended LAN deployments — exactly
what the reporter asked for.
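
The refusal rule above reduces to one predicate. The real check is
shell code in `docker-entrypoint.sh`; the sketch below restates the
same three-way AND in C purely for illustration. The function names
are hypothetical, and two details are assumptions: any `127.*` address
counts as loopback, and an unset bind address is treated as the old
`0.0.0.0` default.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: NULL (unset) means the previous 0.0.0.0 default. */
static bool is_loopback(const char *addr)
{
    if (addr == NULL) return false;
    return strncmp(addr, "127.", 4) == 0
        || strcmp(addr, "localhost") == 0
        || strcmp(addr, "::1") == 0;
}

/* True when the entrypoint should refuse to start. Arguments mirror
 * RUVIEW_API_TOKEN, RUVIEW_ALLOW_UNAUTHENTICATED and RUVIEW_BIND_ADDR. */
static bool must_refuse_start(const char *token,
                              const char *allow_unauth,
                              const char *bind_addr)
{
    bool token_empty = (token == NULL || token[0] == '\0');
    bool opted_in    = (allow_unauth != NULL && strcmp(allow_unauth, "1") == 0);
    return token_empty && !opted_in && !is_loopback(bind_addr);
}

/* The refusal case plus one case per escape hatch. */
static void entrypoint_gate_self_check(void)
{
    assert(must_refuse_start("", NULL, "0.0.0.0"));         /* refused */
    assert(!must_refuse_start("secret", NULL, "0.0.0.0"));  /* token set */
    assert(!must_refuse_start("", "1", "0.0.0.0"));         /* explicit opt-in */
    assert(!must_refuse_start("", NULL, "127.0.0.1"));      /* pinned to loopback */
}
```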

Validation

* `idf.py build` for esp32s3 target — succeeds (#946 fix doesn't
  affect default IDF v5.4 build path).
* `idf.py set-target esp32c6 && idf.py build` — succeeds, binary
  1015 KB / 45% partition free.
* Hardware flash to COM12 (C6) failed with "No serial data received":
  the XIAO C6 needs a manual BOOT-hold+RESET, which could not be driven
  without an operator. The code builds and passed review; runtime
  validation still needs an operator to press the BOOT button at flash
  time.
* docker-entrypoint.sh changes are shell-only and were not executed;
  they were checked by reading through the script for the refusal case
  and each of the three escape hatches.

Out of scope — cross-repo issues

Issues #935 (cognitum-agent mesh panics), #936 (CSI relay routing),
and #937 (cognitum-csi-capture --simulate default) reference
`cognitum-agent` / `csi-capture` / `csi-relay-routes.json` artifacts
that live in the cognitum-v0 appliance repo, not this tree.

Issue #954 (CSI callback never fires on S3 v0.6.5/v0.7.0) is not
addressed here. The reporter is on the S3 (COM8 in this lab), but the
hardware path needs an interactive debug session with a configurable
AP traffic source to pin the root cause (MGMT-only filter, traffic
filter MAC, or driver-level callback wiring). Will tackle in a
follow-up.

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix(firmware): bump LWIP UDP / WiFi TX buffer pools to ease ENOMEM

Hardware validation on COM8 (S3) and COM9 (C6) surfaced a v0.7.0
regression not captured in the existing issue tracker: stock IDF v5.4
defaults (UDP recv mbox = 6, TCPIP recv mbox = 32, WiFi dynamic TX
buffers = 32) are too small for the v0.7.0 packet mix once CSI
promiscuous mode is active. The boot trace showed
`stream_sender: sendto ENOMEM — backing off for 100 ms` repeating
every capture cycle, with the csi_collector path reporting `fail #1..5`
within seconds of associating to an AP.

Modest bumps applied (~3 KB extra heap each):

  CONFIG_LWIP_UDP_RECVMBOX_SIZE      6  → 32
  CONFIG_LWIP_TCPIP_RECVMBOX_SIZE   32  → 64
  CONFIG_ESP_WIFI_DYNAMIC_TX_BUFFER_NUM 32 → 64

Empirical 25 s measurement on S3 / COM8 post-fix:

  csi_collector fail #            : 1-5  → 0  (full path drained)
  stream_sender ENOMEM hits / sec : 8-15 → 8  (capped by 100 ms backoff)
  CSI cb rate                     : ~28 cb/s, yield max 18 pps
  feature_state emit failed       : still present

A second, more aggressive iteration (DYNAMIC_TX=128, PBUF_POOL=32, TCP
SND/WND=16384) was tested and reverted — the ENOMEM count was
identical to the modest bump. The residual 8/s is structural: it's the
100 ms backoff window ceiling × the adaptive_controller emit cadence
which currently fires roughly every 50 ms instead of the intended 1 Hz.
Bigger buffers don't fix that — only rate-limiting the emitter does.

Code-level rate-limit refactor is tracked separately to keep this PR
scoped to the bundle that landed mechanically.

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix(firmware): rate-limit feature_state emit from 5 Hz → 1 Hz

Completes the ENOMEM cure that the LWIP/WiFi buffer bumps started.

Root cause (verified on COM8 / S3 + COM9 / C6)

`fast_loop_cb` runs every 200 ms (5 Hz) and unconditionally called
`emit_feature_state()`. Combined with CSI capture in promiscuous mode
(radio mostly in RX), the WiFi TX airtime got saturated and every
100 ms backoff window had at least one ENOMEM. Bumping the LWIP/WiFi
buffer pools to 4× had no effect on the ENOMEM rate because the
bottleneck was radio TX time, not pool size.

The ADR-081 spec calls out "1–10 Hz" for feature_state; 5 Hz sat in the
middle of that range and was not necessary: operators consuming the
telemetry want one sample per second, not five. Dropping to 1 Hz
removes ~80 % of the feature_state TX traffic.

Measurement on COM8 (25 s windows, otherwise-idle environment)

  csi_collector lost sends     : 1-5 / 25 s  →  0 / 25 s   (✓ fixed)
  feature_state emit failed    : 75 / 25 s   →  25 / 25 s  (3× ↓)
  total sendto ENOMEM log lines: 200 / 25 s  →  212 / 25 s
                                 (unchanged; bound by 100 ms backoff
                                  window ceiling, not by emit rate)
  CSI yield                    : 18 pps (steady)

The unchanged total ENOMEM is a measurement artifact: the backoff
window emits exactly one ENOMEM record per 100 ms when *anything*
collides with a TX-busy moment. The packet-loss numbers (which are
what actually matter) all dropped to zero or near-zero on the CSI
path.

Implementation

Pure-static `s_emit_divider` counter in `fast_loop_cb`. Every 5th tick
calls the emit. Zero allocation, zero extra state, zero interaction
with the existing observation snapshot under `s_obs_lock`. Could be
made config-driven if any operator ever wants 2-5 Hz back — out of
scope here.
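
A minimal sketch of the divider described above, assuming a 200 ms
tick. The callback signature, the `g_emit_count` counter and
`simulate_ms` are illustrative scaffolding so the behaviour can be
checked off-target; only `fast_loop_cb`, `s_emit_divider` and
`emit_feature_state` are names from this change.

```c
#include <assert.h>
#include <stdint.h>

#define FAST_LOOP_PERIOD_MS  200u  /* 5 Hz tick */
#define EMIT_EVERY_N_TICKS   5u    /* 5 ticks x 200 ms = 1 s */

/* Demo-only counter standing in for the real UDP emit. */
static uint32_t g_emit_count;

static void emit_feature_state(void) { g_emit_count++; }

static void fast_loop_cb(void)
{
    /* Function-local static: persists across ticks, no allocation. */
    static uint8_t s_emit_divider;

    /* ... per-tick work continues to run at 5 Hz ... */

    if (++s_emit_divider >= EMIT_EVERY_N_TICKS) {
        s_emit_divider = 0;
        emit_feature_state();   /* now reached once per second */
    }
}

/* Drive the callback for duration_ms of simulated time and return the
 * number of emits that occurred. */
static uint32_t simulate_ms(uint32_t duration_ms)
{
    assert(duration_ms % FAST_LOOP_PERIOD_MS == 0u);
    g_emit_count = 0;
    for (uint32_t t = 0; t < duration_ms; t += FAST_LOOP_PERIOD_MS) {
        fast_loop_cb();
    }
    return g_emit_count;
}
```

Over a 25 s window this gives 125 ticks and 25 emits, i.e. one
feature_state packet per second instead of five.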

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-08 16:39:42 +02:00
rUv 00a234eda8
ADR-110: ESP32-C6 firmware extension (#764)
Closes the firmware-side ADR-110 design at v0.7.0-esp32 after a 38-iter /loop SOTA sprint.

Headline (bench, COM9+COM12 ESP32-C6):
- 99.56% cross-board RX, 104.1 µs smoothed offset stdev (≤100 µs §2.4 target met)
- 3.95× EMA suppression, 1.4 ppm crystal skew preserved

4 firmware releases: v0.6.7 / v0.6.8 / v0.6.9 / v0.7.0-esp32.
42 ADR-110 unit tests, 1761 v2 workspace tests, full Firmware CI + QEMU green.
2026-05-23 15:34:48 -04:00
rUv 2b8a7cc458
feat: happiness scoring pipeline + ESP32 swarm with Cognitum Seed (#285)
* feat: happiness scoring pipeline with ESP32 swarm + Cognitum Seed coordinator

ADR-065: Hotel guest happiness scoring from WiFi CSI physiological proxies.
ADR-066: ESP32 swarm with Cognitum Seed as coordinator for multi-zone analytics.

Firmware:
- swarm_bridge.c/h: FreeRTOS task on Core 0, HTTP client with Bearer auth,
  registers with Seed, sends heartbeats (30s) and happiness vectors (5s)
- nvs_config: seed_url, seed_token, zone_name, swarm intervals
- provision.py: --seed-url, --seed-token, --zone CLI args
- esp32-hello-world: capability discovery firmware for 4MB ESP32-S3 variant

WASM edge modules:
- exo_happiness_score.rs: 8-dim happiness vector from gait speed, stride
  regularity, movement fluidity, breathing calm, posture, dwell time
  (events 690-694, 11 tests, ESP32-optimized buffers + event decimation)
- ghost_hunter.rs standalone binary: 5.7 KB WASM, feature-gated default pipeline

RuView Live:
- --mode happiness dashboard with bar visualization
- --seed flag for Cognitum Seed bridge (urllib, background POST)
- HappinessScorer + SeedBridge classes (stdlib only, no deps)

Examples:
- seed_query.py: CLI tool (status, search, witness, monitor, report)
- provision_swarm.sh: batch provisioning for multi-node deployment
- happiness_vector_schema.json: 8-dim vector format documentation

Verified live: ESP32 on COM5 (4MB flash) registered with Seed at 10.1.10.236,
vectors flowing, witness chain growing (epoch 455, chain 1108).

Co-Authored-By: claude-flow <ruv@ruv.net>

* ci: raise firmware binary size gate to 1100 KB for HTTP client stack

The swarm bridge (ADR-066) adds esp_http_client for Seed communication,
which pulls in the HTTP/TLS stack (~150 KB). Binary grew from ~978 KB to
~1077 KB. Raise the gate from 950 KB to 1100 KB. Still fits comfortably
in both 4MB (1856 KB OTA slot, 43% free) and 8MB flash variants.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-03-20 18:46:34 -04:00