110 lines
5.0 KiB
Markdown
110 lines
5.0 KiB
Markdown
# ADR-095: Pull-based OTA Firmware Update
|
||
|
||
## Status
|
||
|
||
Proposed
|
||
|
||
## Context
|
||
|
||
ESP32 sensing nodes deployed in user homes need firmware updates without
|
||
operator-side push access. Push-based OTA (server initiates upgrades to a
|
||
known set of node IPs) is operationally heavy for consumer-grade deployments:
|
||
|
||
- Operators must enumerate every node's IP address and schedule rollouts.
|
||
- Nodes that come online intermittently or behind NAT get missed entirely.
|
||
- A node in a bad state (e.g. hung at startup) may never receive a push.
|
||
|
||
For a consumer sensing system where nodes are embedded in rooms and accessed
|
||
infrequently, this creates a support burden and leaves nodes on stale firmware.
|
||
|
||
## Decision
|
||
|
||
Adopt a pull-based OTA model: each node periodically polls a server manifest
|
||
endpoint and self-upgrades when a newer version is available. Operators publish
|
||
new firmware to the server; nodes fetch it at their next poll cycle.
|
||
|
||
## Architecture
|
||
|
||
### Server side — `firmware_registry` module
|
||
|
||
`v2/crates/wifi-densepose-sensing-server/src/firmware_registry.rs` provides
|
||
a pure-data, transport-agnostic registry:
|
||
|
||
- `FirmwareRegistry` — in-memory holder for the currently-blessed firmware
|
||
binary: version, SHA-256 hex digest, byte size, file path, compile time.
|
||
- `set_current(path)` — reads a file from disk, computes SHA-256, parses the
|
||
version string from either a sidecar `.manifest.json` or the filename
|
||
(patterns: `esp32-csi-node-0.8.0-watchdog.bin`).
|
||
- `is_update_available(running_version)` — simple string comparison helper.
|
||
- `sha256_bytes(&[u8])` + `sha256_file(Path)` — pure-Rust SHA-256 helpers
|
||
using the `sha2` crate.
|
||
- Minimum firmware size: 256 KB (rejects truncated uploads).
|
||
- 11 unit tests covering hex encoding, version parsing, manifest sidecar
|
||
priority, size rejection, missing-file rejection, and SHA-256 round-trips.
|
||
|
||
### Server HTTP endpoints (wired in `main.rs`)
|
||
|
||
| Method | Path | Purpose |
|
||
|--------|------|---------|
|
||
| `GET` | `/api/v1/firmware/latest` | Returns `{available, version, sha256, size, compile_time, download_url}` |
|
||
| `GET` | `/api/v1/firmware/download` | Streams binary with `X-Firmware-Version` + `X-Firmware-Sha256` headers |
|
||
| `POST` | `/api/v1/firmware/upload?version=X[&sha256=HEX]` | Operator uploads; server computes SHA-256, optionally verifies client-supplied hash, writes to `<firmware_dir>/esp32-csi-node-<version>.bin` |
|
||
|
||
On startup the server scans `--firmware-dir` (env `FIRMWARE_DIR`, default
|
||
`/app/data/firmware`) for the newest `.bin` by mtime and seeds the registry.
|
||
This is non-fatal — the server starts normally if no firmware is staged.
|
||
|
||
### Firmware client — `ota_pull` module
|
||
|
||
`firmware/esp32-csi-node/main/ota_pull.c` (+413 LOC):
|
||
|
||
1. `GET /api/v1/firmware/latest` — parse `{available, version, sha256, size}`.
|
||
2. Compare `version` with the compile-time `esp_app_desc.version`.
|
||
3. If newer: `GET /api/v1/firmware/download` — write binary to the ESP-IDF
|
||
OTA partition via `esp_ota_ops`.
|
||
4. Verify SHA-256 of downloaded bytes against the server-advertised hash.
|
||
5. Call `esp_ota_set_boot_partition` and `esp_restart()`.
|
||
|
||
Guards:
|
||
- Waits for `OTA_MIN_UPTIME_SEC` (300 s) before first check — avoids
|
||
boot-loop on a node that OTA'd to bad firmware.
|
||
- Stops BLE before flashing to prevent Core 1 StoreProhibited crash.
|
||
- Aborts if the download exceeds `OTA_MAX_SIZE`.
|
||
- Graceful failure on network error — retries on next poll cycle.
|
||
|
||
Poll interval: `OTA_CHECK_INTERVAL_SEC` = 300 s (configurable at compile time).
|
||
|
||
### Rollback (ESP-IDF built-in)
|
||
|
||
The ESP-IDF OTA partition scheme includes an application rollback mechanism.
|
||
After `esp_ota_set_boot_partition`, the new firmware must call
|
||
`esp_ota_mark_app_valid_cancel_rollback()` within a configurable window, or
|
||
the bootloader rolls back to the previous partition. `ota_pull.c` relies on
|
||
the existing `ota_update.c` canary task for this confirmation.
|
||
|
||
## Consequences
|
||
|
||
**Positive:**
|
||
- Zero operator action for routine upgrades; nodes that come online late catch
|
||
up automatically on their next poll cycle.
|
||
- Tolerates intermittent connectivity — retry is just the next poll tick.
|
||
- No inbound firewall holes required — nodes initiate all connections.
|
||
- Latecomers behind NAT/CGNAT are handled identically to nodes on the LAN.
|
||
|
||
**Negative:**
|
||
- Upgrade latency is up to one poll interval (default 5 minutes).
|
||
- The manifest endpoint is discoverable; anyone who can reach the server can
|
||
learn the current firmware version and download the binary. Mitigated by
|
||
network segmentation; manifest signing is out of scope for this ADR.
|
||
- Poll traffic at scale: 11 nodes × 1 req/5 min = ~2 req/min steady-state.
|
||
Negligible.
|
||
|
||
## Related
|
||
|
||
- Firmware client: `firmware/esp32-csi-node/main/ota_pull.c` + `ota_pull.h`
|
||
- Server registry: `v2/crates/wifi-densepose-sensing-server/src/firmware_registry.rs`
|
||
- Server wiring: `v2/crates/wifi-densepose-sensing-server/src/main.rs`
|
||
(routes `/api/v1/firmware/*`, `AppStateInner::firmware_registry`, `scan_firmware_dir`)
|
||
- ADR-018: ESP32 binary frame format (firmware identity)
|
||
- ADR-057: Firmware CSI build guard
|