diff --git a/docs/references/ota-pipeline.md b/docs/references/ota-pipeline.md
new file mode 100644
index 00000000..5eea2b0a
--- /dev/null
+++ b/docs/references/ota-pipeline.md
@@ -0,0 +1,329 @@
+# OTA Pipeline — Full Reproduction Recipe
+
+Verbatim agent contribution (2026-05-17), saved as authoritative
+reference for the WiFi-OTA flow on this RuView fork. Kept whole
+deliberately — splitting it would lose the diagnostic flowchart.
+
+## TL;DR
+
+OTA works because **three FW-side fixes** are in place. Without them
+the chip receives the firmware, reboots, **panics during early boot
+of the new partition**, the bootloader rolls back, and from outside
+it looks like "OTA didn't work" even though the upload succeeded.
+Most agents focus on the network side (curl, gh-action) and miss the
+real cause, because the bug lives inside the firmware.
+
+---
+
+## 0 · Prerequisites (without them OTA = panic loop)
+
+These three things **must already be in the firmware running on the
+chip** (i.e. in ota_0/factory before the first OTA). If they're not
+there, fix once via USB-flash; after that, OTA works.
+
+### A. `OTA_SIZE_UNKNOWN` instead of `OTA_WITH_SEQUENTIAL_WRITES`
+
+**File:** `firmware/esp32-csi-node/main/ota_update.c:137`
+
+```c
+esp_err_t err = esp_ota_begin(update_partition, OTA_SIZE_UNKNOWN, &ota_handle);
+```
+
+**Why:** `OTA_WITH_SEQUENTIAL_WRITES` erases 4 KB flash sectors on the
+fly as it writes. If the new binary (~870 KB) is smaller than the
+previous one in the same partition (~1.1 MB), **the tail of the old
+code stays in the partition**. The SHA-256 image verify in
+`esp_ota_end()` only covers the length declared in the image header,
+so the residual code is never checked. After reboot the new app may
+jump to an IRAM or `.literal` pool address overlapped by stale code →
+**Guru Meditation Error** → bootloader rolls back.
+
+`OTA_SIZE_UNKNOWN` forces a **full partition erase before write**
+(~1.5 s overhead, unnoticeable).
+
+### B. `config.stack_size = 8192` for httpd
+
+**File:** `firmware/esp32-csi-node/main/ota_update.c:225`
+
+```c
+httpd_config_t config = HTTPD_DEFAULT_CONFIG();   // default stack_size = 4096
+config.server_port = OTA_PORT;
+config.max_uri_handlers = 12;
+config.recv_wait_timeout = 30;
+config.stack_size = 8192;                          // ← critical
+```
+
+**Why:** `esp_ota_end()` streams a SHA-256 verify over the entire
+image and walks the mmap segments, which needs more than 5 KB of
+stack. On the standard 4 KB httpd-task stack this causes a **stack
+overflow** at validation time. The chip panics **inside the handler**,
+before `esp_ota_set_boot_partition()`. From outside you see
+`{"status":"ok"}` (it's sent before `esp_ota_end`), but the partition
+doesn't switch.
+
+### C. Reset reason logged in `app_main`
+
+**File:** `firmware/esp32-csi-node/main/main.c:130-153`
+
+```c
+static const char *reset_reason_str(esp_reset_reason_t r) {
+    switch (r) {
+        case ESP_RST_PANIC:    return "PANIC";
+        case ESP_RST_TASK_WDT: return "TASK_WDT";
+        case ESP_RST_SW:       return "SW";
+        ...
+    }
+}
+void app_main(void) {
+    esp_reset_reason_t rr = esp_reset_reason();
+    const esp_partition_t *running = esp_ota_get_running_partition();
+    ESP_LOGI(TAG, "boot: reset_reason=%s running_partition=%s",
+             reset_reason_str(rr),
+             running ? running->label : "?");
+    ...
+}
+```
+
+**Why:** Without this log line you **cannot tell** "new image booted
+cleanly after OTA" from "new image panicked → rolled back". `/ota/status`
+looks the same (or merely suspicious) in both cases. With the log in
+place, the first UART line after boot tells the truth:
+
+- `reset_reason=SW  running_partition=ota_1` → OTA OK, new image in ota_1.
+- `reset_reason=PANIC  running_partition=ota_0` → new image panicked,
+  rollback worked. **This is the case other agents get stuck in —
+  without the log it's impossible to diagnose.**
+
+---
+
+## 1 · Wire format of POST /ota
+
+**Endpoint:** `POST http://<node-ip>:8032/ota`
+
+**Headers:**
+- `Content-Type: application/octet-stream` (required)
+- `Content-Length: <bytes>` (curl and urllib set it automatically)
+- `Authorization: Bearer <psk>` (only if `security/ota_psk` is in NVS)
+
+**Body:** raw bytes of `build/esp32-csi-node.bin` — no multipart, no base64.
+
+**Response on success:**
+
+```json
+{"status":"ok","message":"OTA update successful. Rebooting..."}
+```
+
+**Important about the response:** the chip sends it **before
+`esp_restart()`**, but the 1000 ms `vTaskDelay()` between response and
+restart **does not guarantee delivery**. On macOS / Linux curl will see one of:
+
+- `{"status":"ok"...}`, or
+- `Connection reset by peer` (TCP RST from the dying side), or
+- `Recv failure`.
+
+**All three mean the upload succeeded.** The real check is NOT curl's
+status; it's a **second GET `/ota/status` after reboot**.
+
+---
+
+## 2 · Chip's path through the handler
+
+```
+HTTP POST /ota
+    │
+    ▼
+ota_check_auth(req)              ← if PSK in NVS, verifies Authorization header
+    │
+    ▼
+esp_ota_get_next_update_partition(NULL)
+    │                            ← running in ota_0 → returns ota_1, and vice-versa
+    ▼
+esp_ota_begin(part, OTA_SIZE_UNKNOWN, &handle)
+    │                            ← full erase of target partition (~1.5 s)
+    ▼
+loop {
+    received = httpd_req_recv(req, buf, 1024)
+    esp_ota_write(handle, buf, received)
+}                                ← writes in 1 KB chunks
+    │
+    ▼
+esp_ota_end(handle)              ← SHA-256 verify over the entire image (>5 KB stack)
+    │
+    ▼
+esp_ota_set_boot_partition(part) ← writes "boot from target" into otadata
+    │
+    ▼
+httpd_resp_send(JSON)            ← replies {"status":"ok"...}
+    │
+    ▼
+vTaskDelay(1000ms)               ← window so TCP flush goes out (best-effort)
+    │
+    ▼
+esp_restart()                    ← soft reset via RTC_SW_CPU_RST
+    │
+    ▼
+[bootloader picks ota_1 from otadata → loads new image → app_main]
+    │
+    ▼
+"I (335) main: boot: reset_reason=SW running_partition=ota_1"
+```
+
+---
+
+## 3 · Flashing via `scripts/ota-deploy.sh`
+
+```bash
+# Scenario A — deploy to all nodes on local /24 (auto-discover):
+scripts/ota-deploy.sh
+
+# Scenario B — specific IPs:
+scripts/ota-deploy.sh 192.168.0.100 192.168.0.101
+
+# Scenario C — build before deploy:
+scripts/ota-deploy.sh --build
+
+# Scenario D — with auth:
+OTA_PSK=your_token scripts/ota-deploy.sh
+```
+
+**What the script does under the hood (4 phases):**
+
+### Phase 1 — discovery
+
+```
+arp -a -n  →  ['192.168.0.100', '192.168.0.101', ...]
+# parallel GET http://<node-ip>:8032/ota/status (timeout 1.5s)
+# only IPs that return valid JSON survive
+```
+
+If ARP is empty (fresh Mac boot) → fallback ping-sweep `.100`–`.110`.
+
+### Phase 2 — snapshot before
+
+```
+GET http://<node-ip>:8032/ota/status on each node
+→  remember running_partition (ota_0 or ota_1)
+```
+
+### Phase 3 — parallel upload
+
+```
+ThreadPoolExecutor(max_workers=len(targets))
+for each node:
+    urllib POST with body = read_bytes(esp32-csi-node.bin)
+    ConnectionResetError caught as expected (that's the reboot)
+```
+
+### Phase 4 — verify
+
+```
+sleep 10  ← wait for boot to finish
+for each node (up to 6 retries, 3-s delay):
+    GET http://<node-ip>:8032/ota/status
+    new_part != old_part   →  ✓
+    new_part == old_part   →  ✗ FAIL (no switch; likely panicked and rolled back)
+exit 0 if all OK, 1 if any node didn't confirm
+```
+
+---
+
+## 4 · Diagnosis when "OTA doesn't work"
+
+Flowchart that catches **every observable failure mode** on ESP32-S3
+in this FW:
+
+```
+GET /ota/status works?
+├── 404/timeout    → node offline / wrong network / IP changed (check `arp -a`)
+├── 200, time=OLD  → OTA didn't take (see below)
+└── 200, time=NEW  → OTA OK ✓
+
+OTA didn't take — diagnose via UART (USB!):
+
+See "boot: reset_reason=..." in UART?
+├── reset_reason=POWERON  → chip didn't reboot — POST didn't arrive, check curl
+├── reset_reason=SW  AND  running_partition=ota_X  → OTA OK, may be server-side cache
+├── reset_reason=PANIC AND running_partition=ota_0
+│       → NEW image panics at boot
+│       → causes (most likely first):
+│           1. OTA_WITH_SEQUENTIAL_WRITES → tail of old code (fix A above)
+│           2. esp_ota_end stack overflow in the old image's handler (fix B above)
+│           3. ABI mismatch bootloader vs new app (USB-flash bootloader.bin)
+│           4. real bug in new code (read the backtrace before PANIC)
+├── reset_reason=TASK_WDT → handler hung mid-upload
+└── reset_reason=BROWNOUT → power supply browned out under stress
+                            (USB on bus power?)
+```
+
+If UART is unavailable (no USB) but HTTP works: POST then GET
+`/ota/status` three times at 5 s intervals. If `next_partition`
+flip-flops, the chip is in a panic loop. That's a definitive diagnosis.
+
+---
+
+## 5 · Why other agents fail (common pitfalls)
+
+| Pitfall | Symptom | Fix |
+|---|---|---|
+| Treat OTA as a pure network problem, never look at FW | "POST returned 200 but time doesn't change" → endless curl-header experiments | **Verify the three FW prerequisites first**, before any curl |
+| Use `OTA_WITH_SEQUENTIAL_WRITES` (it's in IDF examples) | OTA works once, stops working after binary size changes | Switch to `OTA_SIZE_UNKNOWN` |
+| Leave httpd stack at 4 KB | Sometimes works (fast SHA), sometimes doesn't — looks flaky | `config.stack_size = 8192` |
+| Enable `CONFIG_BOOTLOADER_APP_ROLLBACK_ENABLE=y` "for safety" | Every OTA rolled back because nobody calls `esp_ota_mark_app_valid_cancel_rollback()` | Either disable it, or call `esp_ota_mark_app_valid_cancel_rollback()` once the new image has run stably (e.g. after 10 s) |
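+| `curl` with `-d` instead of `--data-binary` | Binary corrupted: `-d @file` strips CR/LF bytes from the file and sends `Content-Type: application/x-www-form-urlencoded` | Use `--data-binary @file.bin` with `-H 'Content-Type: application/octet-stream'`, or urllib bytes |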
+| Measure success by HTTP response code | Connection reset = normal (esp_restart kills socket), not failure | Re-check via **GET /ota/status after reboot** |
+| Don't wait 10 s after reboot before verify | Verify times out, agent thinks OTA failed | `sleep 10` (or backoff retries) |
+| Ignore that mDNS names drift | Flash the wrong node, or stale ARP cache | Auto-discover by IP **at deploy time**, not by hostname |
+| Share a single file descriptor across upload threads | Race conditions, partial reads | Each upload-thread opens its own file |
+| Rely on bootloader rollback instead of explicit app_valid | Image sometimes flagged BAD, OTA becomes non-idempotent | If rollback enabled, MUST call `esp_ota_mark_app_valid_cancel_rollback()` |
+
+---
+
+## 6 · Things other agents do **wrong**
+
+From recurring patterns in others' logs:
+
+1. **Rely on `idf.py flash --port .../ota`** — that mode does NOT
+   exist in idf.py. OTA is only via the HTTP handler.
+2. **Send via `ssh esp32 'esp_ota_write ...'`** — ESP32 has no shell;
+   OTA is only via the HTTP endpoint.
+3. **Run MQTT-based OTA** — this FW has no MQTT client; only HTTP
+   POST on 8032.
+4. **Use ESP RainMaker / esp_https_ota** — those require HTTPS +
+   cert and pull the image from a server; we serve plain HTTP and accept a push. Don't confuse the APIs.
+5. **Re-use an old build of
+   `firmware/esp32-csi-node/build/esp32-csi-node.bin`** — forget to
+   run `idf.py build`. The script's `--build` solves that.
+
+---
+
+## 7 · Quick reference (for the next agent)
+
+```bash
+# Once over USB if the nodes still run pre-fix firmware:
+cd /Users/arsen/Desktop/RuView/firmware/esp32-csi-node
+source ~/esp/esp-idf-v5.2/export.sh
+idf.py build
+
+# Enter download mode: hold BOOT, tap RESET, then release BOOT
+cd build
+esptool.py --chip esp32s3 --port /dev/cu.usbmodem... -b 460800 \
+  --before default_reset --after hard_reset write_flash \
+  --flash_mode dio --flash_size 8MB --flash_freq 80m \
+  0x0 bootloader/bootloader.bin \
+  0x8000 partition_table/partition-table.bin \
+  0xf000 ota_data_initial.bin \
+  0x20000 esp32-csi-node.bin
+
+# Forever after, over WiFi:
+scripts/ota-deploy.sh --build
+# (auto-discover, parallel POST, verify, exit code)
+```
+
+---
+
+**Bottom line:** OTA is not "send a file via curl"; it's an
+**end-to-end protocol** between the on-chip handler and the host
+tooling. 80 % of the work lives on the FW side (correct erase,
+correct stack, correct log). The network part is trivial
+(`urllib.request.urlopen(POST)`). Agents who "can't" usually stopped
+at the network layer and didn't realise the chip is panicking.