feat(pointcloud): use MediaPipe Face Mesh for the live demo (ADR-094)

The previous synthetic procedural demo did not represent what the local fusion pipeline produces: a real depth-backprojected point cloud of the user's face and surroundings. This commit ports the closest browser equivalent: MediaPipe Face Mesh runs in-browser at ~30 fps and emits 478 3D landmarks per frame. Each visitor now sees the outline of their own face rendered as a point cloud, with a small floor + back wall for spatial context.

- Adds MediaPipe Face Mesh + Camera Utils via jsdelivr CDN.
- Adds an "▶ Enable camera" CTA so getUserMedia is gated on a user gesture (required by some browsers and good UX regardless).
- New face-mesh frame generator uses the same splat shape as the live /api/splats payload, so a single render path drives both modes.
- Mirrors x to match selfie convention; maps lm.z (relative depth) to the world-coord range used by the live pipeline.
- Falls back automatically to the procedural floor + walls + figure when the camera is denied, dismissed, or unavailable.
- Badge surfaces the new state: '● DEMO Your Face (MediaPipe)'.
- Bumps poll cadence to 4 Hz so face mesh updates feel live.
- ADR-094 updated to reflect the new default behavior.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-29 19:42:51 -04:00 · cbedbce9e3
parent 7343bdc4dd
commit cbedbce9e3
2 changed files with 177 additions and 17 deletions
--- a/docs/adr/ADR-094-pointcloud-github-pages-deployment.md
+++ b/docs/adr/ADR-094-pointcloud-github-pages-deployment.md
@@ -45,10 +45,22 @@ Ship **one** viewer that auto-selects its transport from URL parameters,
 and publish it to `gh-pages/pointcloud/` alongside the other demos:

 1. **Default mode** — when the viewer is opened with no query parameters
-   on `https://ruvnet.github.io/RuView/pointcloud/`, render a synthetic
-   in-browser scene (floor grid, walls, breathing/swaying figure, animated
-   17-keypoint skeleton) and label the badge `● DEMO Synthetic`. No
-   network calls are made. Renders forever, deterministic, ~200 splats.
+   on `https://ruvnet.github.io/RuView/pointcloud/`, present a "▶ Enable
+   camera" CTA. On click the viewer requests webcam access, runs
+   **MediaPipe Face Mesh** in-browser (~30 fps, 478 refined landmarks),
+   and renders the visitor's own face as a point cloud — the closest
+   browser equivalent of the local pipeline's depth-backprojected face
+   geometry that motivated this ADR (`I could see the outline of my face
+   in points`). The viewer mirrors x to match selfie convention and
+   maps Face Mesh's relative-z to the same world-coordinate range the
+   live `/api/splats` payload uses, so a single render path drives both.
+   Badge reads `● DEMO Your Face (MediaPipe)`. If the user denies
+   camera permission, dismisses the prompt, or visits on a device
+   without a webcam, the viewer falls back automatically to a
+   procedural scaffold (floor grid, walls, breathing figure, 17-keypoint
+   skeleton). All processing is client-side; no frames leave the
+   browser. Each frame carries 478 face splats plus 92 floor/wall
+   context splats.
 2. **Auto mode** (`?backend=auto`) — fetch from `/api/splats` on the same
   origin. This is the local-development case (`ruview-pointcloud serve`
   serves the viewer and the API together). On any failure (404, network
@@ -99,11 +111,14 @@ and nvsim deployments.

 ### Negative / tradeoffs

- **Synthetic ≠ real.** The demo figure is procedural, not recorded from
-  hardware, so visitors cannot see *real* CSI-derived poses without
-  supplying `?backend=`. We accept this — the alternatives (pre-recorded
-  JSON, on-page WASM inference) add maintenance cost and diverge the
-  render path.
+- **Face mesh ≠ CSI.** Browser webcam + MediaPipe gives real face
+  geometry but does not produce CSI-derived pose. Visitors who want to
+  see the *WiFi-driven* path still need `?backend=<their-host>`. The
+  procedural fallback is not WiFi-driven either; it is purely visual
+  scaffolding. We accept this — the goal of the hosted demo is to
+  convey the *shape* of what the local pipeline produces (a point
+  cloud of the user) rather than reproduce the WiFi physics in the
+  browser. The latter is a future ADR (WASM port of the fusion crate).
 - **CORS burden on remote mode.** Users who want to share their backend
  must add `Access-Control-Allow-Origin: https://ruvnet.github.io` (or
  `*`) to their `ruview-pointcloud serve` config. We document this in the
@@ -139,9 +154,11 @@ This ADR is **Implemented** when all of the following hold:
 1. Pushing to `main` with a viewer change triggers
   `pointcloud-pages.yml`, which deploys to `gh-pages/pointcloud/` in
   under 60 seconds.
-2. `https://ruvnet.github.io/RuView/pointcloud/` loads, renders the
-   synthetic scene, displays `● DEMO Synthetic` badge, and shows
-   non-zero splat + frame counts.
+2. `https://ruvnet.github.io/RuView/pointcloud/` loads, shows the
+   "Enable camera" CTA, and on accept renders the visitor's face as a
+   point cloud with badge `● DEMO Your Face (MediaPipe)` and non-zero
+   splat + frame counts. On camera denial, falls back to the
+   procedural scene with badge `● DEMO Synthetic`.
 3. Existing demos at `https://ruvnet.github.io/RuView/` and
   `…/pose-fusion.html` and `…/nvsim/` are still reachable after the
   first deploy (smoke-tested manually).
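
Reviewer sketch (not part of the patch): the mapping and splat budget that the ADR text describes can be checked in isolation. The function name `buildFaceSplats` is hypothetical; the constants and loop bounds are copied from the `viewer.html` hunk of this commit, and colour/opacity fields are omitted for brevity. With those bounds the scaffold comes to 72 floor + 20 wall = 92 context splats.

```javascript
// Sketch only: buildFaceSplats is a hypothetical name. Constants and loop
// bounds are taken from faceMeshFrame() in this commit's viewer.html.
function buildFaceSplats(landmarks) {
    var splats = [];
    var i, gx, gz, wy;
    for (i = 0; i < landmarks.length; i++) {
        splats.push({
            center: [
                (0.5 - landmarks[i].x) * 4.0, // mirror x about the image centre
                (0.5 - landmarks[i].y) * 3.0, // image y grows downward, scene y upward
                2.0 + landmarks[i].z * 4.0    // relative depth, offset to z = 2
            ],
            scale: [0.012, 0.012, 0.012]
        });
    }
    // Floor grid: 9 columns x 8 rows = 72 splats.
    for (gx = -4; gx <= 4; gx++) {
        for (gz = 1; gz <= 8; gz++) {
            splats.push({ center: [gx * 0.4, -1.4, gz * 0.4], scale: [0.05, 0.05, 0.05] });
        }
    }
    // Back wall: 5 columns x 4 rows = 20 splats.
    for (gx = -4; gx <= 4; gx += 2) {
        for (wy = -1; wy <= 2; wy++) {
            splats.push({ center: [gx * 0.4, wy * 0.5, 4.0], scale: [0.05, 0.05, 0.05] });
        }
    }
    return splats;
}

// One landmark at the image centre with zero relative depth.
var demo = buildFaceSplats([{ x: 0.5, y: 0.5, z: 0 }]);
console.log(demo[0].center, demo.length);
```

A centred landmark lands on (0, 0, 2), which is the point the viewer's camera looks at, so the face sits in the middle of the view by construction.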
--- a/v2/crates/wifi-densepose-pointcloud/src/viewer.html
+++ b/v2/crates/wifi-densepose-pointcloud/src/viewer.html
@@ -5,24 +5,32 @@
    <style>
        body { margin: 0; background: #0a0a0a; color: #e8a634; font-family: monospace; }
        canvas { display: block; }
-        #info { position: absolute; top: 10px; left: 10px; padding: 12px; background: rgba(0,0,0,0.85); border: 1px solid #e8a634; border-radius: 6px; min-width: 240px; font-size: 13px; line-height: 1.5; }
+        #info { position: absolute; top: 10px; left: 10px; padding: 12px; background: rgba(0,0,0,0.85); border: 1px solid #e8a634; border-radius: 6px; min-width: 240px; font-size: 13px; line-height: 1.5; z-index: 10; }
+        #cam-cta { position: absolute; bottom: 16px; left: 50%; transform: translateX(-50%); padding: 10px 18px; background: #e8a634; color: #0a0a0a; border: none; border-radius: 4px; font-family: monospace; font-size: 14px; font-weight: bold; cursor: pointer; z-index: 10; }
+        #cam-cta:hover { background: #ffc04d; }
+        #cam-cta.hidden { display: none; }
        .live { color: #4f4; } .demo { color: #f44; }
+        .face { color: #4cf; }
        .section { margin-top: 6px; padding-top: 6px; border-top: 1px solid #333; }
        .label { color: #888; }
    </style>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/three.js/r128/three.min.js"></script>
    <script src="https://cdn.jsdelivr.net/npm/three@0.128.0/examples/js/controls/OrbitControls.js"></script>
+    <!-- MediaPipe Face Mesh — runs in demo mode so each visitor sees their own face as a point cloud -->
+    <script src="https://cdn.jsdelivr.net/npm/@mediapipe/face_mesh@0.4/face_mesh.js"></script>
+    <script src="https://cdn.jsdelivr.net/npm/@mediapipe/camera_utils@0.3/camera_utils.js"></script>
 </head>
 <body>
    <div id="info">
        <h3 style="margin:0 0 8px 0">RuView Point Cloud</h3>
        <div id="stats">Loading...</div>
    </div>
+    <button id="cam-cta">▶ Enable camera — render your face as a point cloud</button>
    <script>
        var scene = new THREE.Scene();
        scene.background = new THREE.Color(0x0a0a0a);
        var camera = new THREE.PerspectiveCamera(75, window.innerWidth/window.innerHeight, 0.1, 100);
-        camera.position.set(0, 2, -4);
+        camera.position.set(0, 0.5, -1.5);
        camera.lookAt(0, 0, 2);

        var renderer = new THREE.WebGLRenderer({ antialias: true });
@@ -115,6 +123,118 @@
        var transportMode = "demo"; // resolved at first fetch: "live" | "remote" | "demo"
        var demoStartMs = Date.now();
        var demoFrameNum = 0;
+        var latestFaceLandmarks = null; // populated by MediaPipe when camera enabled
+        var faceMeshState = "idle"; // "idle" | "starting" | "running" | "denied" | "unavailable"
+
+        // ----- MediaPipe Face Mesh (browser equivalent of camera-depth backprojection) -----
+        // Locally, ruview-pointcloud serve fuses real camera depth + WiFi CSI. In the
+        // browser we don't have depth from a webcam, but Face Mesh produces 478
+        // 3D landmarks (x,y in [0,1], z roughly in [-0.5,0.5]) at ~30 fps, enough to
+        // reproduce the "I can see the outline of my face in points" experience. The
+        // landmarks feed into the same splat render path as live /api/splats data.
+        async function startFaceMesh() {
+            if (faceMeshState !== "idle") return;
+            if (!window.FaceMesh || !window.Camera) {
+                faceMeshState = "unavailable";
+                return;
+            }
+            faceMeshState = "starting";
+            try {
+                var videoEl = document.createElement("video");
+                videoEl.style.display = "none";
+                videoEl.autoplay = true;
+                videoEl.playsInline = true;
+                videoEl.muted = true;
+                document.body.appendChild(videoEl);
+
+                var fm = new FaceMesh({
+                    locateFile: function(file) {
+                        return "https://cdn.jsdelivr.net/npm/@mediapipe/face_mesh@0.4/" + file;
+                    }
+                });
+                fm.setOptions({
+                    maxNumFaces: 1,
+                    refineLandmarks: true,
+                    minDetectionConfidence: 0.5,
+                    minTrackingConfidence: 0.5
+                });
+                fm.onResults(function(results) {
+                    if (results.multiFaceLandmarks && results.multiFaceLandmarks[0]) {
+                        latestFaceLandmarks = results.multiFaceLandmarks[0];
+                    }
+                });
+
+                var mpCamera = new Camera(videoEl, {
+                    onFrame: async function() { await fm.send({ image: videoEl }); },
+                    width: 640,
+                    height: 480
+                });
+                await mpCamera.start();
+                faceMeshState = "running";
+                var btn = document.getElementById("cam-cta");
+                if (btn) btn.classList.add("hidden");
+            } catch (err) {
+                faceMeshState = "denied";
+                console.warn("Face mesh unavailable:", err);
+            }
+        }
+
+        function faceMeshFrame() {
+            if (faceMeshState !== "running" || !latestFaceLandmarks) return null;
+            var lms = latestFaceLandmarks;
+            var splats = [];
+            var i, lm, x, y, z;
+            // 468 (or 478 with refined landmarks) face points → splats. MediaPipe's
+            // selfie convention has x mirrored; we mirror back so left-of-screen = your
+            // left side. z is depth-relative-to-face-center, ~[-0.1,+0.1] in practice.
+            for (i = 0; i < lms.length; i++) {
+                lm = lms[i];
+                x = (0.5 - lm.x) * 4.0;
+                y = (0.5 - lm.y) * 3.0;
+                z = 2.0 + lm.z * 4.0;
+                splats.push({
+                    center: [x, y, z],
+                    color: [0.95, 0.65, 0.20],
+                    opacity: 1.0,
+                    scale: [0.012, 0.012, 0.012]
+                });
+            }
+            // Procedural floor + back wall for spatial context — same density as the
+            // local demo's room scaffold.
+            var gx, gz;
+            for (gx = -4; gx <= 4; gx++) {
+                for (gz = 1; gz <= 8; gz++) {
+                    splats.push({
+                        center: [gx * 0.4, -1.4, gz * 0.4],
+                        color: [0.15, 0.18, 0.22],
+                        opacity: 1.0,
+                        scale: [0.05, 0.05, 0.05]
+                    });
+                }
+            }
+            for (gx = -4; gx <= 4; gx += 2) {
+                for (var wy = -1; wy <= 2; wy++) {
+                    splats.push({
+                        center: [gx * 0.4, wy * 0.5, 4.0],
+                        color: [0.12, 0.20, 0.28],
+                        opacity: 1.0,
+                        scale: [0.05, 0.05, 0.05]
+                    });
+                }
+            }
+            demoFrameNum += 1;
+            return {
+                splats: splats,
+                count: splats.length,
+                frame: demoFrameNum,
+                live: false,
+                source: "face-mesh",
+                pipeline: {
+                    skeleton: null,
+                    vitals: { breathing_rate: 14, motion_score: 0.15 }
+                }
+            };
+        }

        function buildSplatsUrl() {
            if (backendArg === "demo") return null;
@@ -211,11 +331,16 @@
            };
        }

+        function pickDemoFrame() {
+            // Prefer real face-mesh data when the camera is running; else procedural.
+            return faceMeshFrame() || syntheticFrame();
+        }
+
        async function fetchCloud() {
            // Demo-only mode: never hit the network.
            if (backendArg === "demo") {
                transportMode = "demo";
-                handleData(syntheticFrame());
+                handleData(pickDemoFrame());
                return;
            }
            try {
@@ -231,7 +356,7 @@
                    return;
                }
                transportMode = "demo";
-                handleData(syntheticFrame());
+                handleData(pickDemoFrame());
            }
        }

@@ -262,6 +387,8 @@
                        mode = '<span class="live">&#9679; LIVE</span> Local Backend';
                    } else if (transportMode === "remote") {
                        mode = '<span class="live">&#9679; REMOTE</span> ' + backendArg;
+                    } else if (data.source === "face-mesh") {
+                        mode = '<span class="face">&#9679; DEMO</span> Your Face (MediaPipe)';
                    } else {
                        mode = '<span class="demo">&#9679; DEMO</span> Synthetic';
                    }
@@ -321,8 +448,24 @@
                }
            } catch(e) {}
        }
+        // Wire the camera CTA: shown only when we'll be rendering the demo path
+        // (auto-with-no-backend or explicit ?backend=demo). Hidden in live/remote.
+        (function wireCamCta() {
+            var btn = document.getElementById("cam-cta");
+            if (!btn) return;
+            // Hide CTA when user explicitly required live data.
+            if (requireLive || backendArg.startsWith("http")) {
+                btn.classList.add("hidden");
+                return;
+            }
+            btn.addEventListener("click", function() {
+                btn.textContent = "Starting camera…";
+                startFaceMesh();
+            });
+        })();
+
        fetchCloud();
-        setInterval(fetchCloud, 500);
+        setInterval(fetchCloud, 250); // 4 Hz — enough for face mesh, light on the network

        function updateSplats(splats) {
            if (pointsMesh) scene.remove(pointsMesh);
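
Reviewer sketch (not part of the patch): the fallback rule in `pickDemoFrame()` can be exercised without a browser by injecting the state. `makePicker`, its two getter parameters, and the `"synthetic"` source label are hypothetical stand-ins for illustration; only the `faceMeshFrame() || syntheticFrame()` shape and the state names come from this commit.

```javascript
// Sketch only: makePicker and its getters are hypothetical. The real viewer
// reads faceMeshState and latestFaceLandmarks from script-level variables.
function makePicker(getState, getLandmarks) {
    function faceMeshFrame() {
        // Same guard as the patch: no face frame unless running AND landmarks exist.
        if (getState() !== "running" || !getLandmarks()) return null;
        return { source: "face-mesh", count: getLandmarks().length };
    }
    function syntheticFrame() {
        // Placeholder for the procedural scene; the label is an assumption.
        return { source: "synthetic", count: 0 };
    }
    return function pickDemoFrame() {
        // Prefer face-mesh data; a null result falls through to the procedural frame.
        return faceMeshFrame() || syntheticFrame();
    };
}

var state = "idle";
var landmarks = null;
var pick = makePicker(function() { return state; }, function() { return landmarks; });

console.log(pick().source);          // camera not started yet
state = "running";
console.log(pick().source);          // running, but no landmarks delivered so far
landmarks = [{ x: 0.5, y: 0.5, z: 0 }];
console.log(pick().source);          // running with landmarks
state = "denied";
console.log(pick().source);          // permission refused
```

Because the guard checks both the state and the landmark buffer, the viewer keeps showing the procedural scene between the click and the first MediaPipe result, and in every failure state, without any extra branching in `fetchCloud()`.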