# ADR-102: Edge Module Registry Integration - **Status:** Accepted - **Date:** 2026-05-19 - **Deciders:** ruv ## Context The Cognitum app ecosystem publishes a canonical app store catalog at: ``` https://storage.googleapis.com/cognitum-apps/app-registry.json ``` As of v2.1.0 (2026-05-13) the registry advertises **105 cogs across 11 categories** (health, security, building, retail, industrial, research, ai, swarm, signal, network, developer). Each entry carries `id`, `name`, `category`, `version`, `description`, `size_kb`, `difficulty`, `sha256`, `binary_size`, and a `config[]` schema describing the runtime parameters the appliance offers when installing the cog. RuView today has no live awareness of this catalog. The `README.md` capability table is hand-curated; the UI surfaces only the capabilities the dashboard's HTML knows about; nothing in `wifi-densepose-sensing-server` references the registry. Result: when Cognitum ships a new cog (the registry was last updated 6 days ago — a fast cadence), RuView stays unaware until someone manually edits the README. Customers running the RuView dashboard against a real appliance see a 10-capability bag in the UI while the appliance is actually capable of installing 105 cogs. Today's `cog-pose-estimation@0.0.1` release (PRs #642 / #643, ADR-100, ADR-101) is the first cog this repo ships to that registry. We need the discovery side to match. ## Decision `wifi-densepose-sensing-server` will fetch `app-registry.json` on demand, cache it in process memory with a TTL, and serve it back through a new endpoint: ``` GET /api/v1/edge/registry GET /api/v1/edge/registry?refresh=1 (force-bypass cache, log if abused) ``` The registry is **passively surfaced**, not modified. RuView is a presentation layer for the canonical Cognitum catalog; it never re-signs entries or re-hosts binaries. ### Module `v2/crates/wifi-densepose-sensing-server/src/edge_registry.rs` — small, ~150 lines. ```rust pub struct EdgeRegistry { cached: RwLock>, ttl: Duration, upstream_url: String, } struct CachedEntry { payload: serde_json::Value, fetched_at: Instant, upstream_sha256: String, } ``` Cache semantics: - TTL **3600 s (1 hour)** by default — registry updates land on a roughly-weekly cadence and a stale-by-an-hour catalog is fine. - `?refresh=1` bypasses the cache but writes a debug log so accidental abuse is visible. - On upstream fetch failure when the cache is non-empty, **serve the stale cached copy** with a `stale: true` marker in the response and a 200 status (preserve UI), not a 5xx. - On upstream fetch failure when the cache is empty, return 503 with the upstream error in the body. ### Response shape ```jsonc { "fetched_at": 1779200000, // server-side fetch timestamp "ttl_seconds": 3600, "stale": false, // true when serving past TTL because upstream is down "upstream_url": "https://storage.googleapis.com/cognitum-apps/app-registry.json", "upstream_sha256": "", "registry": { /* full canonical JSON as returned upstream */ } } ``` The `registry` field is the upstream JSON inlined verbatim so consumers don't need to make a second hop. `upstream_sha256` lets a paranoid consumer compare against a pinned hash. ### Trust / verification - Bucket is public-read with object versioning enabled (per ADR-100 §"GCS misconfiguration risks"). - The cog-level `binary_sha256` + `binary_signature` (ADR-100) are the trust roots for *installs*. The registry itself is not signed today. - We deliberately **do not** add a signature requirement to the registry JSON in this ADR — that would block the integration on a parallel infrastructure project. A future ADR can layer signature checks on top once the publisher pipeline emits them. ### UI surfacing New page `ui/edge-modules.html` renders the registry into category sections with cog cards. Each card links out to the Cognitum V0 appliance's `/cogs` page (`http://cognitum-v0:9000/cogs#`) for the install action — RuView itself never installs. The existing dashboard's "Capabilities" section continues to show RuView-native sensing capabilities (presence, breathing, pose, etc. — the things RuView itself runs); the new edge-modules page shows the broader Cognitum cog catalog. The two are distinct surfaces and shouldn't be merged. ### Failure modes | Scenario | Behaviour | |---|---| | Upstream returns 200 with valid JSON | Cache it, return it. | | Upstream returns 200 with invalid JSON | Treat as failure; serve stale if available else 503. Log the upstream sha + the parse error. | | Upstream returns 4xx / 5xx | Same as JSON-invalid: serve stale if available else 503. | | TLS / DNS / timeout error | Same. | | Upstream is permanently moved | Operator updates the `upstream_url` config (CLI flag added). No code change required to migrate registries. | ### Configuration - `--edge-registry-url ` — override the default (default: `https://storage.googleapis.com/cognitum-apps/app-registry.json`) - `--edge-registry-ttl-secs ` — override the cache TTL (default: 3600) - `--no-edge-registry` — disable the endpoint entirely (returns 404). For air-gapped deployments. ## Consequences ### Positive - One source of truth for the cog catalog across RuView + Cognitum dashboards. - Zero ongoing maintenance: when Cognitum publishes registry v2.2.0, RuView sees it within an hour without a release. - The endpoint is also useful for non-UI consumers (CI checks, fleet automation, third-party integrations). - Lets us deprecate the hand-curated README capability table in favour of generated content (separate PR). ### Negative - Adds an outbound HTTP dependency to the sensing-server. Air-gapped deployments must use `--no-edge-registry`. - Stale-but-served behaviour can mask upstream outages from operators. Mitigation: include `stale: true` + `fetched_at` in the response so the UI can render a "registry possibly out of date" badge. ### Risks - **Upstream rug-pull**: if `cognitum-apps` is deleted or replaced, the endpoint goes dark. The `--edge-registry-url` flag lets operators repoint without a code change. Long-term, RuView could mirror the registry into its own GCS bucket if the relationship requires it. - **Cache poisoning**: the upstream is public-read; an attacker who breaches Cognitum's GCS write could push a bad registry. The cog-level signatures (ADR-100) limit the blast radius — bad registry entries can't install bad binaries, only show wrong metadata. Acceptable until registry-level signing lands. ## Security review A real review of the attack surface this endpoint introduces. ### Threats considered | # | Threat | Mitigation in this ADR | |---|--------|------------------------| | T1 | **SSRF** — operator-supplied `--edge-registry-url` redirects fetches to an internal target | Flag is operator-only (CLI / env) — there is no API endpoint to mutate it at runtime. Operators are already trusted (they control the binary). | | T2 | **Outbound dependency reveals deployment** — a passive observer of the egress sees the appliance phoning home to GCS | Documented in the docstring + the runtime startup log. Operators wanting offline deployments use `--no-edge-registry`. | | T3 | **Malicious upstream registry** — Cognitum's GCS bucket is breached and a poisoned `app-registry.json` is served | Two layers absorb this: (a) the registry's role is **discovery only** — installs verify the per-cog `binary_sha256` + `binary_signature` (ADR-100); a wrong description string can mislead a human, but a wrong binary still has to pass Ed25519 against `COGNITUM_OWNER_SIGNING_KEY`. (b) The endpoint exposes `upstream_sha256` so a paranoid operator can pin the expected registry hash externally and alert on drift. | | T4 | **Response inflation** — upstream returns a multi-GB payload to exhaust memory | `MAX_PAYLOAD_BYTES = 8 MiB` cap (current registry is ~50–200 KB). Exceeding cap returns an error without buffering past the cap. | | T5 | **Slow upstream blocking server threads** — Slowloris-style stall on the fetch | 10-second wire timeout via `ureq::AgentBuilder`. Per-handler fetch runs inside `tokio::task::spawn_blocking` so a stalled fetch never blocks the async runtime. | | T6 | **Denial via `?refresh=1` abuse** — unauthenticated callers force-bypass the cache repeatedly | Cache lives in process; `?refresh=1` triggers a single upstream fetch behind a synchronous code path. A flood of refresh requests is rate-limited by the upstream's own throttling (GCS) and locally serialised by Rust's `RwLock`. Refresh requests are logged at `debug` so abuse is visible. **Follow-up:** add per-IP rate-limit middleware if seen abused (separate PR; tracked in #574-style follow-up). | | T7 | **JSON deserialisation panics** — malformed registry triggers a Rust panic | Payload is parsed as `serde_json::Value` (opaque untyped tree) — never coerced into a strongly-typed struct that could panic. Failure is propagated as `FetcherError::Network` which the handler maps to 503. | | T8 | **Stale-on-error masks outages from operators** | Response carries `stale: true` + `fetched_at` (unix timestamp). UI rendering MUST surface this badge — encoded as an explicit field, not an implicit silence. | | T9 | **TLS downgrade / MITM on the fetch** | `ureq` is built with the `tls` feature (rustls) by default. No `--insecure` flag exists. If the upstream uses LetsEncrypt the cert chain is system-trusted; certificate pinning is out of scope (would block the bucket from rotating certs). | | T10 | **Unauthenticated access exposes ‘what cogs exist’** | The registry is canonical-public information (already public-read on GCS via anonymous HTTP GET). Surfacing it on a local LAN HTTP API does not increase its disclosure. The endpoint stays under the project's existing `RUVIEW_API_TOKEN` Bearer auth — when set, the registry is gated like other `/api/v1/*` routes. | | T11 | **Configuration injection via env var** — `RUVIEW_EDGE_REGISTRY_URL` set to a malicious URL by an attacker who controls the process environment | If an attacker controls the env, they own the process; this is not a new threat surface. Documented in the CLI help. | | T12 | **Cache mutation across threads / poisoning** | The cache is `RwLock>`. Writes go through `cached.write()` once per fetch. Snapshot reads `clone()` the `CachedEntry` (cheap — `Value` is reference-counted internally for large strings) so concurrent readers don't share mutable state. Tests cover the multi-call path; no `unsafe` is used. | ### What this ADR does NOT secure - **Registry-level signing** — the JSON payload itself is unsigned. If/when Cognitum's publisher pipeline emits a registry sig (e.g. detached `.json.sig`), a follow-up ADR will require it. Today the per-cog binary signature (ADR-100) is the actual trust root for installs; the registry is metadata. - **Per-client rate-limiting on `?refresh=1`** — relies on the upstream's own throttling. If we see abuse we'll add a token-bucket middleware; not needed for v0.0.1. ### Testing | Test | What it verifies | |------|------------------| | `first_call_hits_upstream_and_caches` | Single fetch, then cache hit | | `ttl_expiry_triggers_refetch` | Cache TTL bound respected | | `force_refresh_bypasses_fresh_cache` | `?refresh=1` semantics | | `stale_serve_on_upstream_failure_after_cached_success` | T8 explicit (`stale: true` returned) | | `no_cache_no_upstream_returns_error` | T3/T5 — error propagated cleanly when nothing to fall back on | | `upstream_invalid_json_is_treated_as_error` | T7 — malformed payload doesn't panic | | `upstream_sha256_is_deterministic` | T3 — hash field is reliable for external pinning | All 7 tests in `src/edge_registry.rs::tests` pass. ## Migration 1. Land this ADR + the implementing PR. 2. UI: ship `ui/edge-modules.html` and link from `index.html`. 3. After two clean releases of the endpoint, remove the hand-curated "Capabilities" table from `README.md` and replace with a small "see the appliance for the full catalog" pointer. 4. Future ADR: registry signing once Cognitum's publisher pipeline emits a sig. ## See also - ADR-100: Cognitum Cog Packaging Specification (binary trust model). - ADR-101: Pose Estimation Cog (the first repo-shipped cog visible in the registry). - v0-appliance ADR-220: Cog management surface (where this registry is the input to install actions). - `docs/benchmarks/pose-estimation-cog.md`: the per-cog benchmark format this ADR's response shape complements.