fix: first-run breakage (closes #559, #561) + #560 platform-aware diagnosis

Three related fixes — a fresh-clone user hitting any of these would
conclude the project doesn't work; #557's "feels like mock" narrative
is fed in part by these breakages.

## #559 — `./verify` pointed at removed `v1/` paths

The wrapper hard-coded `v1/data/proof` / `v1/src`, but the proof scripts
moved to `archive/v1/` long ago. A fresh clone failed before the
pipeline could even run. User `Fewmanism` provided the exact diff in
the issue. Applied verbatim across four hits (PROOF_DIR, V1_SRC, the
Phase 3 scan-message, and the SKIP-state recovery hint).

  ./verify  # now PASS end-to-end

## #561 — firmware README would misflash and point at the wrong provisioner

Two real bring-up bugs:

1. Manual flash command put the app at `0x10000`. The partition tables
   (`partitions_display.csv`, `partitions_4mb.csv`) define `ota_0` at
   `0x20000`. `0x10000` is the start of `phy_init` data — flashing
   the app binary there would corrupt the PHY init data and the app
   would never run. The QEMU section already had the right `0x20000`,
   so this was an internal contradiction. Both occurrences fixed.

   Also added `0xf000 ota_data_initial.bin` to the manual flash
   command — the release bundle ships this binary and without it the
   bootloader can refuse to boot after a factory wipe.

2. `python scripts/provision.py` referenced the wrong file. There are
   actually TWO `provision.py` files in the repo (`scripts/` — 275
   lines, stale; `firmware/esp32-csi-node/` — 348 lines, has the
   issue #391 full-replace semantics fix). The canonical one is in
   the firmware dir. Both README occurrences fixed to point at the
   canonical path. (The stale `scripts/provision.py` is a separate
   cleanup; the historical ADRs that reference it are intentionally
   not touched.)
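
A check along these lines could catch a README/partition-table mismatch before it reaches a user. It is a minimal sketch, not code from this commit: `partition_offsets` and `check_flash_args` are hypothetical helpers, and `SAMPLE_CSV` is an illustrative excerpt in ESP-IDF partition-table CSV format. Only the two offsets (`0xf000`, `0x20000`) come from this commit message; the sizes are placeholders. The sketch also assumes every row has an explicit offset.

```python
import csv
import io

# Hypothetical excerpt in ESP-IDF partition-table CSV format. Only the two
# offsets named in this commit (0xf000, 0x20000) come from the source; the
# sizes are placeholders.
SAMPLE_CSV = """\
# Name,   Type, SubType, Offset,  Size
otadata,  data, ota,     0xf000,  0x2000
ota_0,    app,  ota_0,   0x20000, 0x1E0000
"""


def partition_offsets(csv_text):
    """Map partition name -> integer offset, skipping comments and blanks."""
    rows = csv.reader(
        line for line in io.StringIO(csv_text)
        if line.strip() and not line.lstrip().startswith("#")
    )
    return {row[0].strip(): int(row[3].strip(), 16) for row in rows}


def check_flash_args(flash_args, offsets):
    """Return (name, flashed_addr, table_addr) for every mismatch."""
    problems = []
    for name, addr in flash_args.items():
        expected = offsets.get(name)
        if expected != addr:
            problems.append((name, addr, expected))
    return problems


offsets = partition_offsets(SAMPLE_CSV)
old_readme = {"ota_0": 0x10000}                     # address in the old README
new_readme = {"otadata": 0xF000, "ota_0": 0x20000}  # addresses after this fix

print(check_flash_args(old_readme, offsets))
print(check_flash_args(new_readme, offsets))
```

Run against the real `partitions_display.csv` / `partitions_4mb.csv`, the same comparison would flag the old `0x10000` app address and accept the corrected command.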

## #560 — proof hash mismatches on macOS arm64 / Accelerate

User `Fewmanism` reports that with the same pinned `numpy 1.26.4` /
`scipy 1.14.1` on macOS arm64, the proof's SHA-256 differs from the
published expected hash. The proof passes on linux-x86_64 and
windows-x86_64 (where wheels ship OpenBLAS); it mismatches on
darwin-arm64 (where numpy/scipy use Accelerate.framework). That is
not a code bug: Accelerate's FFT and BLAS produce output that is
bit-different from OpenBLAS's on identical IEEE 754 inputs, so the
proof's bit-exact contract cannot hold across backends.
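
The mechanism can be illustrated with the standard library alone. This is a sketch of the general effect, not a trace of what Accelerate or OpenBLAS actually do: floating-point addition is not associative, so two correct implementations that sum in a different order can disagree in the last bit, and a SHA-256 over the raw bytes then differs completely. `sha256_of_doubles` is a hypothetical helper, not part of `verify.py`.

```python
import hashlib
import struct


def sha256_of_doubles(values):
    """Hash the raw little-endian IEEE 754 bytes of a sequence of floats."""
    payload = struct.pack(f"<{len(values)}d", *values)
    return hashlib.sha256(payload).hexdigest()


# Same three inputs, two evaluation orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

same_bits = left_to_right == right_to_left
within_tolerance = abs(left_to_right - right_to_left) < 1e-15
same_hash = sha256_of_doubles([left_to_right]) == sha256_of_doubles([right_to_left])

print(same_bits, within_tolerance, same_hash)  # False True False
```

The two results agree to within `1e-15` but do not hash identically, which is the situation a bit-exact contract cannot tolerate and a tolerance-based comparison can.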

What this commit changes:

- `verify.py` now prints a RUNTIME ENVIRONMENT block before the
  pipeline runs: platform, machine, Python version, numpy BLAS
  backend. Users on a non-reference backend see the cause up front.
- The FAIL message reorders causes: platform BLAS/FFT backend is
  now the *primary* suspect (not "unlikely"), with a pointer to
  the printed RUNTIME ENVIRONMENT block.
- New `archive/v1/data/proof/REFERENCE_PLATFORMS.md` documents the
  reference platforms (linux-x86_64 + windows-x86_64 with OpenBLAS),
  the expected-MISMATCH platforms (darwin-arm64 with Accelerate,
  any MKL install), and three workable responses for users hitting
  a non-reference backend (run on a reference platform, generate a
  local-reference hash, or use tolerance-based comparison — that
  last one is the roadmap path).
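
The split between reference and expected-MISMATCH backends could be expressed as a small lookup like the one below. This is a hedged sketch, not code from this commit: `classify_backend` is a hypothetical helper, and matching on substrings of the reported backend name is an assumption about how numpy names its BLAS. It looks at the backend only; the documented contract also names the platform (linux-x86_64 / windows-x86_64).

```python
def classify_backend(blas_name):
    """Classify a reported BLAS backend name against the documented contract.

    Returns "reference", "expected-mismatch", or "unknown".
    """
    name = (blas_name or "").lower()
    if "openblas" in name:
        # Reference platforms ship OpenBLAS wheels.
        return "reference"
    if "accelerate" in name or "mkl" in name:
        # Documented as expected-MISMATCH backends.
        return "expected-mismatch"
    return "unknown"


for reported in ("scipy-openblas", "accelerate", "mkl", None):
    print(reported, "->", classify_backend(reported))
```

A FAIL message could use such a classification to say outright whether a mismatch is expected on the current backend.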

This converts #560 from "the proof is broken on my Mac" to "the proof
has a documented single-backend contract".

## Verification

- `./verify` (Windows x86_64 / OpenBLAS): VERDICT PASS, hash
  `8c0680d7…51c6` matches expected. RUNTIME ENVIRONMENT block prints
  numpy BLAS = `scipy-openblas`.
- `grep -E '0x10000|scripts/provision\.py' firmware/esp32-csi-node/README.md`:
  no matches.

Co-Authored-By: claude-flow <ruv@ruv.net>
ruv 2026-05-14 08:45:33 -04:00
parent 457f713702
commit 86f38c4fc6
4 changed files with 129 additions and 17 deletions

archive/v1/data/proof/REFERENCE_PLATFORMS.md

@ -0,0 +1,52 @@
# Reference platforms for `expected_features.sha256`
The hash in `expected_features.sha256` was generated on a specific BLAS / FFT
backend. Numpy + scipy delegate FFT/linear-algebra to platform-native
libraries, and those libraries produce **bit-different output on identical
IEEE 754 inputs** depending on the backend. This is not a bug in the proof
pipeline — it is a property of the underlying numerical libraries. (See
issue #560.)
## Platforms where the hash is expected to MATCH
| Platform | BLAS backend | Status |
|---|---|---|
| `linux-x86_64-gnu` (Python 3.11.x, numpy 1.26.4 from PyPI wheels, scipy 1.14.1) | OpenBLAS | ✅ Reference |
| `windows-x86_64-msvc` (Python 3.11.x / 3.13.x, numpy 1.26.4 from PyPI wheels, scipy 1.14.1) | OpenBLAS | ✅ Reference |
## Platforms where the hash is **expected to MISMATCH**
| Platform | BLAS backend | Why |
|---|---|---|
| `darwin-arm64` (macOS arm64, Apple Silicon) | Accelerate.framework | FFT + matrix kernels differ in last-bit positions; the SHA-256 will differ even with pinned `numpy 1.26.4` / `scipy 1.14.1`. |
| Any environment with MKL installed | Intel MKL | Same root cause as Accelerate: different vectorized FFT path. |
## What to do if you get MISMATCH on a non-reference platform
The pipeline is still correct on your platform — the *output* is bit-different
because the *backend* is bit-different, not because the proof code has a bug.
Three workable responses:
1. **Run the proof on a reference platform** (Linux x86_64 or Windows x86_64
   with the PyPI OpenBLAS wheels). This is what CI does.
2. **Generate a new local-reference hash** for your platform and check it
   against the same hash on a teammate's machine with the *same* backend:
   ```bash
   # Regenerate from your platform
   python archive/v1/data/proof/verify.py --generate-hash
   # Commit the new hash to a side file (do NOT overwrite expected_features.sha256
   # unless you are publishing a new cross-platform reference)
   ```
3. **Compare numerical output, not the hash.** A relaxed-tolerance comparison
   on the feature vectors (e.g. `np.allclose(features, reference, atol=1e-10)`)
   will pass across backends. This is on the roadmap (see issue #560).
## The `verify.py` runtime environment block
Every run of `verify.py` now prints a `RUNTIME ENVIRONMENT` block before the
pipeline runs. Include that block in any issue report — it identifies the
platform + numpy version + BLAS backend in one place.

archive/v1/data/proof/verify.py

@ -116,6 +116,48 @@ def print_source_provenance():
print()
def print_runtime_environment():
    """Print the platform + numpy/scipy BLAS backend.

    The proof pipeline's SHA-256 is sensitive to the BLAS / FFT backend
    behind numpy + scipy.fft. Different platforms ship different backends
    (OpenBLAS on Linux/Windows wheels, Accelerate.framework on macOS arm64,
    MKL when installed) and they produce bit-different output on identical
    IEEE 754 inputs. Surfacing the backend up front turns an unexplained
    MISMATCH into a one-line diagnosis -- see issue #560.
    """
    import platform
    print(" RUNTIME ENVIRONMENT:")
    print(f" Platform : {platform.platform()}")
    print(f" Machine : {platform.machine()}")
    print(f" Python : {platform.python_version()} ({platform.python_implementation()})")
    # numpy BLAS / LAPACK backend.
    try:
        blas_info = np.__config__.blas_ilp64_opt_info # type: ignore[attr-defined]
        backend = getattr(blas_info, "get", lambda *_: None)("libraries", None) or "unknown"
    except Exception:
        # Newer numpy (>= 1.26) reports via show_config(); fall back to a stringified dump.
        try:
            import io
            buf = io.StringIO()
            np.show_config(mode="dicts") if hasattr(np, "show_config") else None
            # `show_config(mode='dicts')` returns a dict in numpy >= 1.26.
            cfg = np.show_config(mode="dicts") if hasattr(np, "show_config") else {}
            if isinstance(cfg, dict):
                blas = cfg.get("Build Dependencies", {}).get("blas", {})
                backend = blas.get("name", "unknown")
            else:
                backend = "unknown"
        except Exception:
            backend = "unknown"
    print(f" numpy BLAS : {backend}")
    print(" (FFT/BLAS backend affects the hash -- see #560 if MISMATCH on")
    print(" macOS arm64 / Accelerate. Reference platforms: linux-x86_64,")
    print(" windows-x86_64 with OpenBLAS; see expected_features.sha256.)")
    print()
def load_reference_signal(data_path):
"""Load the reference CSI signal from JSON.
@ -417,6 +459,7 @@ def main():
# ---------------------------------------------------------------
print("[0/4] SOURCE PROVENANCE")
print_source_provenance()
print_runtime_environment()
# ---------------------------------------------------------------
# Step 1: Load and describe reference signal
@ -518,13 +561,23 @@ def main():
print()
print(" The pipeline output does NOT match the expected hash.")
print()
-print(" Possible causes:")
-print(" - Numpy/scipy version mismatch (check requirements)")
-print(" - Code change in CSI processor that alters numerical output")
-print(" - Platform floating-point differences (unlikely for IEEE 754)")
+print(" Likely causes, in order of probability:")
+print(" 1. Platform BLAS/FFT backend differs from the reference.")
+print(" The expected hash was generated on linux-x86_64 +")
+print(" windows-x86_64 with OpenBLAS. macOS arm64 ships with")
+print(" Accelerate.framework, which produces bit-different FFT")
+print(" output on identical inputs (issue #560). Inspect the")
+print(" RUNTIME ENVIRONMENT block printed at the top of this run.")
+print(" 2. Numpy/scipy version mismatch.")
+print(" Install pinned versions: pip install -r archive/v1/requirements-lock.txt")
+print(" 3. Real code change in the CSI processor that alters output.")
+print(" Investigate the diff against the reference commit.")
print()
-print(" To update the expected hash after intentional changes:")
+print(" To regenerate the expected hash on a NEW reference platform:")
print(" python verify.py --generate-hash")
+print(" (Only do this if you intend to publish a new reference; the")
+print(" single-platform contract of expected_features.sha256 is")
+print(" documented at the top of that file.)")
print("=" * 72)
sys.exit(1)

firmware/esp32-csi-node/README.md

@ -40,15 +40,21 @@ MSYS_NO_PATHCONV=1 docker run --rm \
```bash
python -m esptool --chip esp32s3 --port COM7 --baud 460800 \
write_flash --flash_mode dio --flash_size 8MB \
-0x0 firmware/esp32-csi-node/build/bootloader/bootloader.bin \
-0x8000 firmware/esp32-csi-node/build/partition_table/partition-table.bin \
-0x10000 firmware/esp32-csi-node/build/esp32-csi-node.bin
+0x0 firmware/esp32-csi-node/build/bootloader/bootloader.bin \
+0x8000 firmware/esp32-csi-node/build/partition_table/partition-table.bin \
+0xf000 firmware/esp32-csi-node/build/ota_data_initial.bin \
+0x20000 firmware/esp32-csi-node/build/esp32-csi-node.bin
```
> The app slot (`ota_0`) starts at `0x20000` per `partitions_display.csv` /
> `partitions_4mb.csv`. `ota_data_initial.bin` at `0xf000` initialises the OTA
> slot pointer; without it the bootloader can refuse to boot the app after a
> factory wipe.
### 3. Provision WiFi credentials (no reflash needed)
```bash
-python scripts/provision.py --port COM7 \
+python firmware/esp32-csi-node/provision.py --port COM7 \
--ssid "YourSSID" --password "YourPass" --target-ip 192.168.1.20
```
@ -254,9 +260,10 @@ Find your serial port: `COM7` on Windows, `/dev/ttyUSB0` on Linux, `/dev/cu.SLAB
```bash
python -m esptool --chip esp32s3 --port COM7 --baud 460800 \
write_flash --flash_mode dio --flash_size 8MB \
-0x0 firmware/esp32-csi-node/build/bootloader/bootloader.bin \
-0x8000 firmware/esp32-csi-node/build/partition_table/partition-table.bin \
-0x10000 firmware/esp32-csi-node/build/esp32-csi-node.bin
+0x0 firmware/esp32-csi-node/build/bootloader/bootloader.bin \
+0x8000 firmware/esp32-csi-node/build/partition_table/partition-table.bin \
+0xf000 firmware/esp32-csi-node/build/ota_data_initial.bin \
+0x20000 firmware/esp32-csi-node/build/esp32-csi-node.bin
```
### Serial Monitor
@ -285,7 +292,7 @@ All settings can be changed at runtime via Non-Volatile Storage (NVS) without re
The easiest way to write NVS settings:
```bash
-python scripts/provision.py --port COM7 \
+python firmware/esp32-csi-node/provision.py --port COM7 \
--ssid "MyWiFi" \
--password "MyPassword" \
--target-ip 192.168.1.20

verify

@ -19,9 +19,9 @@
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
-PROOF_DIR="${SCRIPT_DIR}/v1/data/proof"
+PROOF_DIR="${SCRIPT_DIR}/archive/v1/data/proof"
VERIFY_PY="${PROOF_DIR}/verify.py"
-V1_SRC="${SCRIPT_DIR}/v1/src"
+V1_SRC="${SCRIPT_DIR}/archive/v1/src"
# Colors (disabled if not a terminal)
if [ -t 1 ]; then
@ -136,7 +136,7 @@ echo ""
echo -e "${CYAN}[PHASE 3] PRODUCTION CODE INTEGRITY SCAN${RESET}"
echo ""
echo " Scanning ${V1_SRC} for np.random.rand / np.random.randn calls..."
-echo " (Excluding v1/src/testing/ -- test helpers are allowed to use random.)"
+echo " (Excluding archive/v1/src/testing/ -- test helpers are allowed to use random.)"
echo ""
MOCK_FINDINGS=0
@ -204,7 +204,7 @@ elif [ $PIPELINE_EXIT -eq 2 ]; then
echo -e " ${YELLOW}${BOLD}RESULT: SKIP${RESET}"
echo ""
echo " No expected hash file to compare against."
-echo " Run: python v1/data/proof/verify.py --generate-hash"
+echo " Run: python archive/v1/data/proof/verify.py --generate-hash"
echo ""
echo -e "${BOLD}======================================================================${RESET}"
exit 2