
# Sublinear and Near-Linear Time Minimum Cut Algorithms for Real-Time RF Sensing

**Date**: 2026-03-08
**Context**: RuVector v2.0.4 / RuvSense multistatic mesh — 16 ESP32 nodes, 120 link edges, 20 Hz update rate
**Scope**: Algorithmic foundations for maintaining minimum cuts on dynamic RF link graphs under real-time constraints

---

## Abstract

A 16-node ESP32 multistatic mesh generates a complete weighted graph on
C(16,2) = 120 edges, where each edge weight encodes the RF channel state
information (CSI) attenuation or coherence between two nodes. Human bodies,
moving objects, and environmental changes continuously perturb these weights.
The minimum cut of this graph partitions the sensing field into regions of
minimal RF coupling — directly useful for person segmentation, occupancy
counting, and anomaly detection.

At a 20 Hz update rate, each frame has a 50 ms wall-clock budget, which the
mincut computation shares with CSI processing and the rest of the pipeline.
On a resource-constrained coordinator (ESP32-S3 at 240 MHz or a modest ARM
host), it is not obvious that classical algorithms are fast or light enough.
This document surveys the algorithmic landscape from classical exact methods
through sublinear approximations, dynamic maintenance, streaming, and
sparsification — evaluating each for applicability to the RuVector RF sensing
pipeline.

Throughout, V = 16 and E = 120 (complete graph). While these are small by
general graph algorithm standards, the constraint is not problem size but
update frequency and platform limitations. The goal is not asymptotic
superiority but practical per-frame latency under 2 ms on the target hardware.

---

## 1. Classical Mincut Complexity

### 1.1 Problem Definition

Given an undirected weighted graph G = (V, E, w) with w: E -> R+, the global
minimum cut is a partition of V into two non-empty sets (S, V\S) minimizing
the total weight of edges crossing the partition:

    mincut(G) = min_{S subset V, S != empty, S != V} sum_{(u,v) in E, u in S, v in V\S} w(u,v)

For RF sensing, w(u,v) typically represents the CSI coherence or signal
attenuation between nodes u and v. A minimum cut identifies the partition
where RF coupling is weakest — corresponding to physical obstructions
(human bodies, walls, large objects) that attenuate the RF field.
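
As a reference point, the definition can be evaluated directly by enumerating every bipartition. For V = 16 that is 2^15 - 1 = 32,767 candidate sets, which is small enough to serve as a test oracle for the faster algorithms below. This is a minimal sketch, not part of the RuVector codebase; the function name `brute_force_mincut` and the dense `Vec<Vec<f64>>` matrix layout are assumptions made for illustration.

```rust
/// Brute-force global minimum cut on a dense symmetric weight matrix.
/// Returns (cut value, bitmask of side S), where bit i set means node i is in S.
/// Node 0 is always kept outside S, which halves the search without
/// missing any bipartition.
fn brute_force_mincut(w: &[Vec<f64>]) -> (f64, u32) {
    let n = w.len();
    assert!(n >= 2 && n <= 32, "bitmask encoding supports 2..=32 nodes");
    let mut best = (f64::INFINITY, 0u32);
    // Enumerate the non-empty subsets of nodes 1..n.
    for half in 1u64..(1u64 << (n - 1)) {
        let mask = (half << 1) as u32;
        let mut cut = 0.0f64;
        for i in 0..n {
            for j in (i + 1)..n {
                // Edge (i, j) crosses the cut when the two bits differ.
                if ((mask >> i) ^ (mask >> j)) & 1 == 1 {
                    cut += w[i][j];
                }
            }
        }
        if cut < best.0 {
            best = (cut, mask);
        }
    }
    best
}
```

The cost is O(2^V * V^2), so this is only useful for validation at V = 16 and not as a per-frame method.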

### 1.2 Stoer-Wagner Algorithm (1997)

The Stoer-Wagner algorithm computes exact global minimum cut in
O(VE + V^2 log V) time using a sequence of V-1 minimum s-t cut computations,
each performed via a maximum adjacency ordering.

**Procedure:**
1. Pick arbitrary start vertex.
2. Build maximum adjacency ordering: greedily add the vertex most tightly
   connected to the current set.
3. The last two vertices (s, t) in the ordering define a cut. Record its weight.
4. Merge s and t, reducing |V| by 1.
5. Repeat V-1 times. Return the minimum recorded cut.
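
The five steps above can be sketched in a few dozen lines. The version below is an illustrative, heap-allocating O(V^3) variant that uses a linear scan instead of a priority queue; the function name `stoer_wagner` and the dense matrix input are assumptions, and it is separate from the fixed-size implementation discussed in Section 8.

```rust
/// Stoer-Wagner global minimum cut on a dense symmetric weight matrix.
/// Returns (cut value, original nodes on one side of the cut).
fn stoer_wagner(mut w: Vec<Vec<f64>>) -> (f64, Vec<usize>) {
    let n = w.len();
    assert!(n >= 2);
    // members[v] lists the original nodes merged into super-vertex v.
    let mut members: Vec<Vec<usize>> = (0..n).map(|v| vec![v]).collect();
    let mut alive: Vec<usize> = (0..n).collect();
    let mut best: (f64, Vec<usize>) = (f64::INFINITY, Vec::new());

    while alive.len() > 1 {
        // One phase: maximum adjacency ordering over the live super-vertices.
        let mut key = vec![0.0f64; n];
        let mut added = vec![false; n];
        let (mut s, mut t) = (alive[0], alive[0]);
        for _ in 0..alive.len() {
            // Pick the unadded vertex most tightly connected to the set.
            let mut next = usize::MAX;
            for &v in &alive {
                if !added[v] && (next == usize::MAX || key[v] > key[next]) {
                    next = v;
                }
            }
            added[next] = true;
            s = t;
            t = next;
            for &v in &alive {
                if !added[v] {
                    key[v] += w[next][v];
                }
            }
        }
        // Cut of the phase: t against everything else, with weight key[t].
        if key[t] < best.0 {
            best = (key[t], members[t].clone());
        }
        // Merge t into s and fold its edge weights into s.
        let moved = std::mem::take(&mut members[t]);
        members[s].extend(moved);
        for &v in &alive {
            if v != s && v != t {
                w[s][v] += w[t][v];
                w[v][s] = w[s][v];
            }
        }
        alive.retain(|&v| v != t);
    }
    best
}
```

On a four-node graph with two tightly coupled pairs joined by four unit-weight links, this returns the cut that separates the pairs.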

**Complexity for our graph:**
- V = 16, E = 120
- O(VE + V^2 log V) = O(16 * 120 + 256 * 4) = O(2944)
- Per iteration: O(E + V log V) using a priority queue.

**Practical assessment:** For V = 16, Stoer-Wagner executes 15 phases, each
scanning at most 120 edges. Total work is roughly 1,800 edge scans plus
priority queue operations. On modern hardware this completes in microseconds.
On ESP32 at 240 MHz, estimated wall time is 50-200 us — well within budget.

This is the baseline. The algorithm is exact, deterministic, and simple to
implement. For V = 16, classical complexity is not actually the bottleneck.

### 1.3 Karger's Randomized Contraction (1993)

Karger's algorithm randomly contracts edges, merging endpoints, until two
vertices remain. The surviving edges form a cut. Repeating O(V^2 log V) times
yields the minimum cut with high probability.

**Single contraction trial:** O(E) time using union-find.
**Total for high-probability success:** O(V^2 log V * E) = O(V^2 E log V).
With the Karger-Stein recursive refinement (Section 1.4): O(V^2 log^3 V).

**For our graph:**
- Single contraction: O(120) ~ trivial
- Repetitions needed: O(256 * 4) ~ 1024 for 1/V failure probability
- Total: ~120,000 edge operations

**Practical assessment:** Karger is elegant but the constant factors from
repeated trials make it slower than Stoer-Wagner for small V. Its value
emerges at scale (V > 1000) where the randomized approach avoids worst-case
deterministic behavior.
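
A weighted contraction trial can be sketched as follows. This is an illustrative version that rescans the edge list at every contraction (O(VE) per trial, not the O(E) bound quoted above) and uses a tiny xorshift generator to avoid external crates; `XorShift`, `karger_trial`, and `karger_mincut` are hypothetical names, and the generator is not suitable for anything beyond demonstration.

```rust
/// Minimal xorshift64 generator (seed must be non-zero).
struct XorShift(u64);

impl XorShift {
    /// Uniform draw in [0, 1).
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Union-find lookup with path halving.
fn find(parent: &mut [usize], mut x: usize) -> usize {
    while parent[x] != x {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    x
}

/// One contraction trial; returns the weight of the cut between the two
/// super-vertices that survive.
fn karger_trial(n: usize, edges: &[(usize, usize, f64)], rng: &mut XorShift) -> f64 {
    let mut parent: Vec<usize> = (0..n).collect();
    let mut components = n;
    while components > 2 {
        // Total weight of edges still joining two different super-vertices.
        let mut total = 0.0f64;
        for &(u, v, w) in edges {
            if find(&mut parent, u) != find(&mut parent, v) {
                total += w;
            }
        }
        if total <= 0.0 {
            return 0.0; // graph is already disconnected
        }
        // Pick one such edge with probability proportional to its weight.
        let mut r = rng.next_f64() * total;
        for &(u, v, w) in edges {
            let (ru, rv) = (find(&mut parent, u), find(&mut parent, v));
            if ru == rv {
                continue;
            }
            r -= w;
            if r < 0.0 {
                parent[ru] = rv; // contract the chosen edge
                components -= 1;
                break;
            }
        }
    }
    let mut cut = 0.0f64;
    for &(u, v, w) in edges {
        if find(&mut parent, u) != find(&mut parent, v) {
            cut += w;
        }
    }
    cut
}

/// Repeat the trial and keep the smallest cut seen.
fn karger_mincut(n: usize, edges: &[(usize, usize, f64)], trials: usize, seed: u64) -> f64 {
    let mut rng = XorShift(seed.max(1));
    (0..trials)
        .map(|_| karger_trial(n, edges, &mut rng))
        .fold(f64::INFINITY, f64::min)
}
```

Every trial returns the weight of some valid cut, so the minimum over trials never undershoots the true mincut; it can only fail to reach it.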

### 1.4 Karger-Stein Recursive Contraction (1996)

Karger-Stein improves on Karger by contracting only to V/sqrt(2) vertices,
then recursing on two independently contracted copies. A single run takes
O(V^2 log V) time and finds the minimum cut with probability Omega(1/log V),
so O(log^2 V) runs suffice for high-probability success, giving
O(V^2 log^3 V) total time.

**For our graph:**
- O(256 * 4) = O(1024) total work — negligible
- Recursion depth: O(log V) = 4 levels

**Practical assessment:** At V = 16, the recursion tree has ~4 levels with
branching factor 2, yielding ~16 leaf problems each of size ~4. Total work
is dominated by the initial contraction steps. Fast in practice but adds
implementation complexity over Stoer-Wagner for no real benefit at this scale.

### 1.5 Why Classical Algorithms Are Sufficient (and Insufficient)

For a static 16-node graph, all classical algorithms complete in microseconds.
The real challenge is not single-computation cost but:

1. **Update frequency**: At 20 Hz with 120 edges changing per frame, we need
   incremental updates, not full recomputation.
2. **Batch processing**: If computing mincut is part of a larger pipeline
   (signal processing, pose estimation), even microseconds add up across
   multiple graph operations per frame.
3. **Scaling considerations**: Future deployments may use 32, 64, or 128
   nodes. At 128 nodes, E = 8128 edges, and Stoer-Wagner requires
   O(128 * 8128 + 16384 * 7) ~ O(1.15M) operations per frame.
4. **Multi-cut requirements**: We often need not just the global mincut but
   multiple minimum cuts, Gomory-Hu trees, or k-way partitions.

The subsequent sections address these challenges with algorithms designed
for dynamic, streaming, and approximate settings.

---

## 2. Sublinear Approximation

### 2.1 Motivation

A sublinear-time algorithm runs in o(m) time, where m = |E|. For our graph
with m = 120, "sublinear in m" means fewer than 120 edge reads. This is
useful when:

- Edge weights are expensive to compute (each requires CSI processing).
- We need a quick approximate answer before the full CSI frame is processed.
- The graph is much larger (future deployments).

### 2.2 Random Edge Sampling for Cut Estimation

The simplest sublinear approach: sample k edges uniformly at random, compute
their total weight, and estimate the mincut value.

**Karger's sampling theorem (1994):** If we sample each edge independently
with probability p = O(log V / (epsilon^2 * lambda)), where lambda is the
minimum cut value, then with high probability every cut in the sampled graph
has value within (1 +/- epsilon) of its value in the original graph, after
scaling by 1/p.
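
The sampling rate in the theorem is only meaningful once it is clamped to a valid probability. The helper below is a small sketch of that calculation; the function names are hypothetical, the logarithm is taken base 2 to match the arithmetic used elsewhere in this document, and the parameter `c` stands in for the constant hidden by the O(.) notation (assumed to be 1.0 in the usage note).

```rust
/// Sampling probability p = c * log V / (epsilon^2 * lambda), clamped to 1.
fn sampling_probability(num_vertices: usize, epsilon: f64, lambda: f64, c: f64) -> f64 {
    let log_v = (num_vertices as f64).log2();
    (c * log_v / (epsilon * epsilon * lambda)).min(1.0)
}

/// Expected number of edges kept when each of `num_edges` is sampled with probability p.
fn expected_sample_size(num_edges: usize, p: f64) -> f64 {
    num_edges as f64 * p
}
```

With V = 16 and epsilon = 0.1, any lambda below log V / epsilon^2 = 400 yields p = 1 (every edge is kept), while lambda = 1200 yields p of about 1/3, or roughly 40 of 120 edges.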

**For our setting:**
- lambda is at most the smallest weighted degree (the sum of the weakest
  node's incident edges).
- For epsilon = 0.1 and V = 16: p ~ log(16) / (0.01 * lambda) = 400 / lambda.
- If lambda ~ 10 (in normalized units), the formula gives p ~ 40, which
  exceeds 1 and is therefore clamped to p = 1: every edge is kept.

Sampling only saves work when lambda is well above log V / epsilon^2 = 400.
For example, lambda ~ 1200 gives p ~ 1/3, i.e. ~40 of 120 edges for a
(1 +/- 0.1)-approximation.

**Algorithm:**
```
1. Sample each edge with probability p
2. Run exact mincut on the sampled graph (Stoer-Wagner)
3. Scale result by 1/p
```

The key insight: when p < 1 applies, Stoer-Wagner on a sparse sample with
~40 edges and 16 vertices runs in O(16 * 40) = O(640) operations, faster
than on the full graph and with provable approximation guarantees. At
lambda ~ 10 no such saving is available for a graph this small.

### 2.3 Cut Sparsifiers

A cut sparsifier H of G is a sparse graph on the same vertex set where every
cut value is preserved within (1 +/- epsilon). Benczur and Karger (1996)
showed that O(V log V / epsilon^2) edges suffice.

For V = 16, epsilon = 0.1: 16 * 4 / 0.01 = 6400 edges (ignoring constants).
This exceeds our actual edge count of 120, so sparsification provides no
benefit at this scale. At epsilon = 0.1 the bound stays above the
complete-graph edge count until V reaches roughly 2,200. With a looser
tolerance the crossover comes much earlier. For epsilon = 0.5
(V log V / 0.25 edges):

- V = 64: E = 2016, sparsifier needs ~1536 edges (24% reduction)
- V = 128: E = 8128, sparsifier needs ~3584 edges (56% reduction)
- V = 256: E = 32640, sparsifier needs ~8192 edges (75% reduction)

### 2.4 Spectral Sparsification

Spielman and Srivastava (2011) showed that spectrally sparsifying the graph
Laplacian preserves all cut values. Their algorithm:

1. Compute effective resistances R_e for all edges.
2. Sample each edge with probability proportional to w_e * R_e.
3. Reweight sampled edges to preserve expected cut values.

Result: O(V log V / epsilon^2) edges suffice, same as combinatorial
sparsification, but the spectral guarantee is stronger — it preserves the
entire spectrum of the Laplacian, not just cut values.

**For RF sensing:** The graph Laplacian eigenvectors correspond to spatial
modes of the RF field. Spectral sparsification preserves these modes, which
is useful beyond mincut — it preserves the spatial structure needed for
tomography and field modeling (RuvSense `field_model.rs`).

### 2.5 Query-Based Sublinear Algorithms

Recent work by Rubinstein, Schramm, and Weinberg (2018) achieves
O(V polylog V)-time algorithms that query the graph adjacency/weight oracle
rather than reading all edges. For V = 16, this gives O(16 * 16) = O(256)
queries, roughly 2x more than simply reading all 120 edges, so it is not
useful at this scale. At V = 256 the same estimate gives 256 * 64 ~ 16,000
queries against 32640 edges, about a 2x reduction that grows with V.

---

## 3. Dynamic Mincut

### 3.1 Problem Setting

In the dynamic setting, the graph undergoes edge insertions, deletions, and
weight updates, and we must maintain the minimum cut value (and optionally
the cut itself) after each update.

For RF sensing, every CSI frame update changes all 120 edge weights
simultaneously. This is a batch-dynamic setting: 120 updates arrive together,
then we query the mincut.

### 3.2 Thorup's Dynamic Connectivity (2000)

Thorup showed that graph connectivity can be maintained in
O(log V * (log log V)^3) expected amortized time per edge update. Maintaining
edge connectivity (unweighted mincut), let alone weighted mincut, is harder,
so the O(polylog V) per-update figure used below is an optimistic estimate
rather than a proven bound.

**For our setting:**
- 120 updates per frame
- O(120 * polylog(16)) = O(120 * ~16) = O(1920) amortized work per frame
- Versus full recomputation: O(2944) with Stoer-Wagner

The savings are modest at V = 16 but the amortized bound means some frames
are nearly free (when the mincut does not change) while others pay more.

### 3.3 Fully Dynamic (1+epsilon)-Approximate Mincut

Goranci, Henzinger, and Thorup (2018) maintain a (1+epsilon)-approximate
minimum cut under edge insertions and deletions in O(polylog(V)/epsilon^2)
amortized update time.

**Key ideas:**
1. Maintain a hierarchy of cut sparsifiers at different granularities.
2. When an edge weight changes, update only the affected sparsifier levels.
3. The mincut value is read from the coarsest level.

**For our setting:**
- Update time: O(log^3(16) / 0.01) ~ O(6400) per edge update with
  epsilon = 0.1
- Batch of 120 updates: O(768,000) — worse than recomputation!

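This reveals an important practical point: dynamic algorithms have excellent
asymptotic behavior, but their polylog and 1/epsilon^2 factors dominate at
small V. For V = 16, full recomputation with Stoer-Wagner (~2944 operations)
beats this approximate dynamic scheme by more than two orders of magnitude
and is within a small factor of the optimistic estimate in Section 3.2.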

### 3.4 When Dynamic Algorithms Win

Dynamic algorithms become beneficial when:
1. **V > 1000** and E > 100,000 — amortized polylog update beats O(VE).
2. **Sparse updates** — only a few edges change per frame, not all 120.
3. **Incremental weight changes** — weights change by small deltas,
   allowing incremental sparsifier updates.

For our RF mesh, a practical middle ground is:

**Threshold-filtered updates:** Only re-process edges whose weight changed
by more than delta from the previous frame. If the RF field is relatively
stable (people move slowly relative to 20 Hz), most edges change minimally.
If only 10-20 edges exceed the delta threshold per frame, a partial
Stoer-Wagner restart or local repair becomes attractive.

### 3.5 Hybrid Approach: Lazy Recomputation

```
Algorithm: Lazy-Mincut-Update
Input: Previous mincut (S*, V\S*), new edge weights w'
Output: Updated mincut

1. Compute delta = sum of |w'(e) - w(e)| for edges crossing (S*, V\S*)
2. If delta < epsilon * mincut_value:
     Return (S*, V\S*) unchanged  // Cut value changed negligibly
3. Compute crossing_weight = sum w'(e) for edges crossing (S*, V\S*)
4. If |crossing_weight - mincut_value| <= epsilon * mincut_value:
     Update mincut_value = crossing_weight  // Same cut, adjusted value
     Return (S*, V\S*)
5. Else:
     Run full Stoer-Wagner on G' = (V, E, w')  // Recompute
     Return new mincut
```

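In practice, steps 1-4 handle >90% of frames (the minimum cut partition is
spatially stable, since people do not teleport), and full recomputation is
triggered only when someone crosses the cut boundary. This reduces average
per-frame cost to O(E) = O(120) for crossing-weight evaluation plus
occasional O(VE) recomputation. The check is a heuristic: it inspects only
edges crossing the cached cut, so a large weight drop elsewhere can create a
smaller cut that goes unnoticed until the next full recomputation.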
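
The crossing-weight evaluation at the heart of the lazy path can be sketched as below. It assumes the upper-triangular edge ordering and bitmask partition encoding introduced in Section 8.2; the names `crossing_weight`, `LazyDecision`, and `lazy_decision`, and the relative tolerance `epsilon`, are illustrative assumptions.

```rust
/// Total weight crossing the partition encoded in `side`
/// (bit i set means node i is in S). `weights` holds the v * (v - 1) / 2
/// edge weights in row-major upper-triangular order.
fn crossing_weight(v: usize, weights: &[f32], side: u32) -> f32 {
    let mut total = 0.0f32;
    let mut idx = 0;
    for i in 0..v {
        for j in (i + 1)..v {
            if ((side >> i) ^ (side >> j)) & 1 == 1 {
                total += weights[idx];
            }
            idx += 1;
        }
    }
    total
}

/// Outcome of the lazy check for one frame.
enum LazyDecision {
    /// Keep the cached partition and adopt the re-evaluated cut value.
    KeepCut { value: f32 },
    /// The cached cut moved too far; run the full algorithm.
    Recompute,
}

/// Keep the cached cut when its value moved by at most epsilon (relative).
fn lazy_decision(v: usize, old_value: f32, new_weights: &[f32], side: u32, epsilon: f32) -> LazyDecision {
    let new_value = crossing_weight(v, new_weights, side);
    if (new_value - old_value).abs() <= epsilon * old_value {
        LazyDecision::KeepCut { value: new_value }
    } else {
        LazyDecision::Recompute
    }
}
```

Like the pseudocode, this only examines the cached cut; it does not detect a new, smaller cut forming elsewhere in the graph.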

---

## 4. Streaming Algorithms

### 4.1 Motivation

In the streaming model, edges arrive one at a time (or in a stream from
multiple ESP32 nodes), and we must estimate the mincut using limited working
memory — ideally O(V polylog V) space rather than O(V^2).

This is relevant when:
- CSI data arrives asynchronously from 16 nodes via TDM (Time Division
  Multiplexing, see ADR-022).
- The coordinator cannot buffer all 120 edge weights before computing.
- Memory is constrained (ESP32-S3 has 512 KB SRAM).

### 4.2 Single-Pass Streaming

Ahn, Guha, and McGregor (2012) showed that a single-pass streaming algorithm
can compute a (1+epsilon)-approximate mincut using O(V polylog V / epsilon^2)
space by maintaining linear sketches of the graph.

**Sketch construction:**
1. For each vertex v, maintain a sparse random linear combination of its
   incident edge weights.
2. The sketch has size O(log^2 V / epsilon^2) per vertex.
3. From sketches, approximate the cut value for any partition.

**For our setting:**
- Space per vertex: O(16 / 0.01) = O(1600) numbers ~ 6.4 KB per vertex
  (4-byte values)
- Total space: 16 * 1600 = 25,600 numbers ~ 102 KB
- This fits in ESP32-S3 SRAM (about a fifth of 512 KB), though it is far
  larger than the 480 bytes needed to store all 120 edge weights directly.

### 4.3 Multi-Pass Streaming

With k passes over the stream, accuracy improves. Specifically, O(log V)
passes suffice to compute exact mincut with O(V polylog V) space.

**Practical algorithm (2-pass):**
```
Pass 1: Build a cut sparsifier by sampling edges with probability
         proportional to estimated effective resistance.
Pass 2: Refine the sparsifier using importance sampling based on
         first-pass estimates.
Result: (1+epsilon)-approximate mincut from the refined sparsifier.
```

For our TDM protocol, each complete CSI scan across all 16 nodes constitutes
one "pass." A two-pass approach means using two consecutive TDM cycles
(100 ms total at 20 Hz) to build and refine the sparsifier — acceptable
if we can tolerate 100 ms latency on the initial estimate.

### 4.4 Turnstile Streaming

In the turnstile model, edge weights can increase and decrease over time.
This matches our RF sensing setting where CSI coherence fluctuates.

Ahn, Guha, and McGregor (2013) extended their sketching approach to the
turnstile model. The key: L0-sampling sketches allow recovering edges from
the sketch difference, enabling dynamic cut estimation.

**Space complexity:** O(V * polylog(V) / epsilon^2) — same as the
insertion-only case.

**For RF sensing:** This means we can maintain a running sketch that
processes CSI weight updates as they arrive from each node, without needing
to store the full graph. The sketch naturally accommodates the continuous
weight fluctuations of the RF field.

### 4.5 Sketch-Based Architecture for ESP32 Mesh

```
ESP32 Node i:
  - Computes CSI for links to all other nodes
  - Constructs local sketch S_i of incident edges
  - Transmits S_i to coordinator (~400 bytes, which assumes epsilon ~ 0.4;
    the epsilon = 0.1 sketch of Section 4.2 is ~6.4 KB)

Coordinator:
  - Receives S_1, ..., S_16
  - Merges sketches: S = merge(S_1, ..., S_16)
  - Extracts approximate mincut from S
  - Latency: dominated by network round-trip, not computation
```

This architecture distributes the sketching computation across nodes,
reducing coordinator load and enabling approximate mincut estimation even
when some node reports are delayed or missing.

---

## 5. Graph Sparsification

### 5.1 Benczur-Karger Cut Sparsification (1996)

**Theorem:** For any undirected weighted graph G with V vertices, there exists
a subgraph H with O(V log V / epsilon^2) edges such that for every cut
(S, V\S):

    (1 - epsilon) * w_G(S, V\S) <= w_H(S, V\S) <= (1 + epsilon) * w_G(S, V\S)

**Construction algorithm:**
1. For each edge e, compute its strong connectivity c_e (the largest k such
   that e lies in a vertex-induced subgraph whose minimum cut is at least k).
2. Sample each edge e with probability p_e = min(1, C * log V / (epsilon^2 * c_e))
   for an appropriate constant C.
3. Reweight sampled edges: w_H(e) = w_G(e) / p_e.
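
Steps 2 and 3 can be sketched as a single pass over the edges, assuming the strong connectivities are already available. To keep the example deterministic and free of external crates, the random draws are supplied by the caller; the function name `sample_edges`, the slice-based layout, and the base-2 logarithm are assumptions made for illustration.

```rust
/// Benczur-Karger sampling and reweighting.
/// `strength[e]` is the (approximate) strong connectivity c_e and
/// `uniform[e]` is a caller-supplied draw from [0, 1).
/// Returns Some(reweighted weight) for kept edges and None for dropped ones.
fn sample_edges(
    weights: &[f64],
    strength: &[f64],
    uniform: &[f64],
    num_vertices: usize,
    epsilon: f64,
    c: f64,
) -> Vec<Option<f64>> {
    let log_v = (num_vertices as f64).log2();
    weights
        .iter()
        .zip(strength)
        .zip(uniform)
        .map(|((&w, &ce), &u)| {
            // p_e = min(1, C * log V / (epsilon^2 * c_e))
            let p = (c * log_v / (epsilon * epsilon * ce)).min(1.0);
            // Kept edges are scaled by 1 / p_e so expected cut values are preserved.
            if u < p { Some(w / p) } else { None }
        })
        .collect()
}
```

Edges with low strong connectivity get p_e = 1 and are always kept at their original weight; only well-connected edges are thinned out.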

**Computing strong connectivity:** This requires O(VE) time using max-flow
computations — as expensive as solving mincut directly. However, approximate
strong connectivity can be computed in O(E log^3 V) time using the
sparsification itself (bootstrapping).

### 5.2 Application to RF Graph

For our 16-node RF graph:

**Static sparsification** is unnecessary since E = 120 is already small.
However, sparsification is useful as a **noise filter**:

1. Edges with high strong connectivity (nodes connected through many
   independent high-weight paths) are structurally important.
2. Edges with low strong connectivity may represent noisy or unreliable
   RF links.
3. Sampling by strong connectivity naturally de-emphasizes unreliable links.

**Practical algorithm for RF:**
```
1. Compute approximate connectivity for each edge using 2-3 rounds
   of random spanning tree sampling.
2. Mark edges with connectivity below threshold as "unreliable."
3. Run mincut on the subgraph of reliable edges.
4. If mincut uses an unreliable edge, recompute on full graph.
```

This typically reduces effective edge count from 120 to 60-80 edges,
providing a 1.5-2x speedup on Stoer-Wagner.

### 5.3 Maintaining Sparsifiers Under Updates

When edge weights change (every CSI frame), the sparsifier must be updated.
Naive recomputation defeats the purpose. Efficient approaches:

**Incremental update (Abraham, Durfee, et al. 2016):**
- Maintain strong connectivity estimates incrementally.
- When an edge weight changes by more than a (1+epsilon) factor,
  update its sampling probability and re-decide inclusion.
- Amortized cost: O(polylog V) per edge update.

**Batch update strategy for RF:**
```
Every frame:
  1. Receive new edge weights w' from CSI processing.
  2. For each edge e in sparsifier:
     a. If |w'(e) - w(e)| / w(e) > epsilon: mark for re-evaluation.
  3. Re-evaluate marked edges (update sampling decision).
  4. Run mincut on updated sparsifier.
```

Expected re-evaluations per frame: 10-30 edges (most weights change
incrementally). Mincut on sparsifier with ~70 edges and 16 vertices:
O(16 * 70) = O(1120) operations.

### 5.4 Spectral Sparsification and the Laplacian

The graph Laplacian L_G of the RF mesh encodes the complete spatial coupling
structure. Its eigenvalues directly relate to cut values:

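- lambda_2 of the normalized Laplacian bounds the conductance phi(G) of the
  best cut (Cheeger's inequality): lambda_2 / 2 <= phi(G) <= sqrt(2 * lambda_2)
- The Fiedler vector (eigenvector of lambda_2) approximates that
  low-conductance partition, which is often, but not always, the mincut
  partition.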

**Spectral sparsification** preserves all eigenvalues, meaning:

    (1-epsilon) * L_G <= L_H <= (1+epsilon) * L_G  (Loewner order)

This is strictly stronger than cut sparsification and preserves:
- Cut values (for mincut computation)
- Effective resistances (for tomography in `field_model.rs`)
- Random walk distributions (for tracking in `pose_tracker.rs`)
- Heat kernel (for gesture recognition in `gesture.rs`)

For the RuvSense pipeline, a spectral sparsifier serves double duty:
mincut computation and spatial field modeling.

---

## 6. Local Partitioning

### 6.1 Motivation

Classical mincut algorithms are global — they examine the entire graph. Local
partitioning algorithms find cuts by exploring only a small region of the
graph, running in time proportional to the size of the smaller side of the
cut rather than the full graph.

For RF sensing, this is valuable when we want to detect a localized
obstruction (a person standing in one area) without scanning the entire
120-edge graph.

### 6.2 Spielman-Teng Local Partitioning (2004)

Spielman and Teng introduced local graph partitioning via truncated random
walks. Their algorithm:

1. Start a random walk from a seed vertex v.
2. At each step, compute the walk distribution vector p.
3. Find a "sweep cut" along the sorted p-values: vertices sorted by
   p(u) / degree(u), sweep through finding the cut with best conductance.
4. Terminate when the walk has spread to cover O(|S|) vertices, where |S|
   is the target small side.

**Complexity:** O(|S| * polylog V / phi), where phi is the target conductance.
The algorithm never examines vertices far from the seed.
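
The sweep cut in step 3 can be sketched independently of how the distribution p was obtained. The version below assumes a dense symmetric weight matrix with zero diagonal and strictly positive weighted degrees; the function name `sweep_cut` is hypothetical. It scans all V nodes, so it illustrates the sweep itself and not the truncated, strictly local variant.

```rust
/// Sweep cut: order nodes by p(u) / d(u) (descending) and return the prefix
/// with the lowest conductance as (conductance, nodes in the prefix).
fn sweep_cut(w: &[Vec<f64>], p: &[f64]) -> (f64, Vec<usize>) {
    let n = w.len();
    assert!(n >= 2);
    let degree: Vec<f64> = w.iter().map(|row| row.iter().sum::<f64>()).collect();
    let total_volume: f64 = degree.iter().sum();
    let mut order: Vec<usize> = (0..n).collect();
    order.sort_by(|&a, &b| {
        (p[b] / degree[b]).partial_cmp(&(p[a] / degree[a])).unwrap()
    });

    let mut in_set = vec![false; n];
    let mut volume = 0.0f64;
    let mut cut = 0.0f64;
    let mut best = (f64::INFINITY, 0usize);
    // Only proper prefixes are valid cuts, so stop before the last node.
    for (k, &u) in order.iter().enumerate().take(n - 1) {
        // Adding u: edges to nodes already inside stop crossing,
        // edges to nodes still outside start crossing.
        for v in 0..n {
            if v == u {
                continue;
            }
            if in_set[v] {
                cut -= w[u][v];
            } else {
                cut += w[u][v];
            }
        }
        in_set[u] = true;
        volume += degree[u];
        let conductance = cut / volume.min(total_volume - volume);
        if conductance < best.0 {
            best = (conductance, k + 1);
        }
    }
    (best.0, order[..best.1].to_vec())
}
```

The incremental update keeps the whole sweep at O(V^2) for a dense graph, instead of recomputing each prefix cut from scratch.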

**For RF sensing:**
- If we know (or suspect) a person is near nodes {3, 7, 8}, seed the walk
  from these nodes.
- The walk explores their neighbors (all other nodes, since the graph is
  complete), but weights ensure it concentrates on the most affected region.
- Expected work: O(4 * polylog(16) / phi) ~ O(64/phi). For phi = 0.3,
  this is ~200 operations.

### 6.3 Personalized PageRank Local Cuts

Andersen, Chung, and Lang (2006) refined local partitioning using
personalized PageRank (PPR). The algorithm:

```
ApproximatePPR(seed, alpha, epsilon):
  p = zero vector  // PPR estimate
  r = indicator(seed)  // residual

  While exists v with r(v) / degree(v) > epsilon:
    Push(v):
      p(v) += alpha * r(v)
      For each neighbor u of v:
        r(u) += (1 - alpha) * r(v) / (2 * degree(v))
      r(v) = (1 - alpha) * r(v) / 2

  Return p
```
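
A direct Rust rendering of the pseudocode, extended to weighted graphs, might look like the following. The extension is an assumption: each neighbor receives a share proportional to w(v, u) / degree(v), where degree is the weighted degree, in place of the unweighted 1 / degree(v). The function name `approximate_ppr` and the dense matrix input are also illustrative, and every node is assumed to have positive weighted degree.

```rust
/// Approximate personalized PageRank via push operations (lazy-walk variant).
/// `w` is a dense symmetric weight matrix with zero diagonal.
fn approximate_ppr(w: &[Vec<f64>], seed: usize, alpha: f64, epsilon: f64) -> Vec<f64> {
    let n = w.len();
    let degree: Vec<f64> = w.iter().map(|row| row.iter().sum::<f64>()).collect();
    let mut p = vec![0.0f64; n]; // PPR estimate
    let mut r = vec![0.0f64; n]; // residual mass
    r[seed] = 1.0;
    // Push while some node holds more than epsilon residual per unit degree.
    while let Some(v) = (0..n).find(|&v| r[v] > epsilon * degree[v]) {
        let rv = r[v];
        p[v] += alpha * rv;
        for u in 0..n {
            if u != v {
                // Half of the remaining mass spreads to neighbors by weight.
                r[u] += (1.0 - alpha) * rv * w[v][u] / (2.0 * degree[v]);
            }
        }
        // The other half stays at v (lazy walk).
        r[v] = (1.0 - alpha) * rv / 2.0;
    }
    p
}
```

Each push preserves the total of p plus r, so the entries of p sum to at most 1, with the shortfall bounded by epsilon times the total weighted degree.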

**Properties:**
- Runs in O(1 / (alpha * epsilon)) time, independent of graph size.
- The resulting p vector, when sweep-cut, produces a low-conductance cut
  near the seed.
- alpha controls locality: higher alpha = more local, lower alpha = more
  global.

**For RF sensing:**
- alpha = 0.15 (the teleport probability matching the standard PageRank
  damping factor of 0.85) produces semi-global cuts suitable for person
  segmentation.
- alpha = 0.5 produces highly local cuts suitable for detecting which
  specific links are attenuated.
- epsilon = 0.01 gives high accuracy with ~O(1/(0.15*0.01)) = O(667)
  push operations.

### 6.4 Integration with RuvSense Pose Tracker

The `pose_tracker.rs` module maintains a Kalman-filtered estimate of
person positions. When the tracker predicts a person near certain nodes,
local partitioning can quickly confirm or refine the detection:

```
1. Tracker predicts person near nodes {5, 9, 12}.
2. Run PPR from each predicted node with alpha = 0.3.
3. Sweep-cut the PPR vectors to find local cuts.
4. If local cut conductance < threshold:
   Person confirmed at predicted location.
5. Feed cut boundary back to tracker as measurement update.
```

This creates a feedback loop where the tracker guides the graph algorithm
and the graph algorithm refines the tracker, running in
O(1 / (alpha * epsilon)) time rather than O(VE) for full mincut.

### 6.5 Multi-Seed Local Partitioning

For multiple people, run local partitioning from multiple seeds
simultaneously. With k people and V = 16 nodes, each person's local
partition explores ~4-6 nodes, totaling ~O(k * 6 * degree) = O(k * 90)
work. For k = 3 people, this is O(270), roughly a tenth of the ~2944
operations of full Stoer-Wagner.

The challenge is handling overlapping partitions. Two approaches:

1. **Sequential peeling:** Find the strongest local cut, remove those nodes,
   repeat. O(k) rounds, each cheaper than the last.
2. **Multi-commodity flow relaxation:** Solve a multi-commodity flow LP
   relaxation using the local PPR vectors as approximate flows.
   More expensive but handles overlaps correctly.

---

## 7. Randomized Methods

### 7.1 Monte Carlo vs. Las Vegas

**Monte Carlo algorithms** return an answer that is correct with probability
>= 1 - delta. Running time is fixed, accuracy is probabilistic.

**Las Vegas algorithms** always return the correct answer. Running time is
probabilistic (expected polynomial), correctness is guaranteed.

For safety-critical RF sensing (mass casualty assessment via `wifi-densepose-mat`),
Las Vegas algorithms are preferred: the mincut answer is always correct, even
if occasionally slow.

### 7.2 Karger's Monte Carlo Mincut

Karger's contraction algorithm is Monte Carlo: a single trial finds the
mincut with probability >= 2/V^2 = 2/256 ~ 0.78%. Running O(V^2 log V)
trials boosts success probability to 1 - 1/V.

**Amplification for reliability:**
- For delta = 10^-6 failure probability:
  V^2 * ln(1/delta) / 2 = 256 * 14 / 2 = 1792 trials
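- Each trial: V - 2 = 14 contractions, counted here as ~16 unit operations
- Total: ~28,672 operations ~ 0.1 ms on modern hardware (at O(E) = 120 edge
  operations per trial, as in Section 1.3, the total is ~215,000)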
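
The trial count follows from (1 - 2/V^2)^T <= exp(-2T / V^2) <= delta, which gives T >= V^2 * ln(1/delta) / 2. A small helper makes the dependence on delta explicit; the function name `karger_trials` is hypothetical.

```rust
/// Number of independent contraction trials needed so that the probability
/// of missing the minimum cut is at most `delta`, using the per-trial
/// success bound 2 / V^2.
fn karger_trials(num_vertices: usize, delta: f64) -> u64 {
    let v = num_vertices as f64;
    (v * v * (1.0 / delta).ln() / 2.0).ceil() as u64
}
```

For V = 16 and delta = 10^-6 this evaluates to 1769; the figure of 1792 above comes from rounding ln(10^6) ~ 13.8 up to 14.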

### 7.3 Karger-Stein Monte Carlo with Early Termination

The Karger-Stein recursive contraction can be enhanced with early
termination heuristics:

```
Karger-Stein-ET(G, best_known_cut):
  If |V(G)| <= 6:
    Return exact mincut via brute force
  Repeat twice, with independent randomness:
    Contract G to G' with |V'| = |V| / sqrt(2) + 1
    If crossing_edges(G') > best_known_cut * (1 + epsilon):
      Prune this branch  // Cannot improve on best known
    Else recurse on G'
  Return minimum of recursive results

The pruning step eliminates branches early, reducing expected work. For our
graph, this rarely helps (V = 16 is already small), but for V > 100 it
can reduce the constant factor by 2-5x.

### 7.4 Las Vegas Mincut via Maxflow

Converting Karger's algorithm to Las Vegas: run Karger until a cut is found,
then verify it with max-flow. A single max-flow between one pair of vertices
separated by the cut only certifies a minimum s-t cut for that pair.
Certifying the global minimum requires fixing a source s and checking that
the max-flow from s to each of the other V - 1 vertices is at least the cut
value (by the max-flow min-cut theorem). Otherwise, continue.

**Verification cost:** O(V * E) per max-flow computation = O(1920), so
V - 1 = 15 max-flows cost ~28,800 operations per verification.
Expected number of verifications before success: O(V^2 / 2) = O(128).
This is expensive and not recommended for real-time use.

**Better approach:** Use Stoer-Wagner (deterministic, always correct) and
reserve randomized methods for approximate or multi-cut computations.

### 7.5 Reliability Analysis for Safety-Critical Systems

For MAT (Mass Casualty Assessment Tool, `wifi-densepose-mat`), mincut errors
could mean missing a survivor. Reliability requirements:

| Application | Max failure probability | Algorithm class |
|-------------|------------------------|-----------------|
| Occupancy counting | 10^-2 | Monte Carlo, any |
| Person segmentation | 10^-4 | Monte Carlo, amplified |
| Vital sign isolation | 10^-5 | Las Vegas or deterministic |
| MAT survivor detection | 10^-8 | Deterministic only |

**Recommendation:** Use deterministic Stoer-Wagner for all safety-critical
applications. Use Monte Carlo approximations only for non-critical tasks
like gesture recognition or activity classification where a missed frame
is acceptable.

### 7.6 Randomized Rounding for Multi-Way Cuts

Beyond 2-way mincut, k-way partitioning (separating k people) can use
randomized LP rounding:

1. Solve the LP relaxation of the k-way cut problem.
2. Randomly round fractional assignments to integer (each vertex assigned
   to one of k groups).
3. Expected approximation ratio: 2 - 2/k.

For k = 3 people, the approximation ratio is 4/3 ~ 1.33. For k = 5, it
is 8/5 = 1.6. This is practical for real-time person segmentation with
known person count.

---

## 8. Rust Implementation for RuVector Infrastructure

### 8.1 Design Principles

The implementation targets the `ruvector-mincut` crate, which already
provides a `DynamicPersonMatcher` in `metrics.rs`. The mincut algorithm
should integrate cleanly with existing infrastructure.

**Key constraints:**
- No heap allocation in the inner loop (ESP32 compatibility).
- Support `no_std` with optional `alloc` for embedded targets.
- Leverage Rust's type system for compile-time graph size verification.
- Use SIMD (via `std::simd` or `packed_simd2`) for batch edge weight updates.

### 8.2 Data Structures

**Fixed-size adjacency matrix:**
```rust
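/// Adjacency matrix for a complete graph with compile-time size.
/// V = 16 nodes, stored as upper triangular (120 entries).
/// Note: the `V * (V - 1) / 2` array length needs nightly
/// `#![feature(generic_const_exprs)]`; on stable Rust, pass the edge count
/// as a second const parameter instead.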
pub struct RfGraph<const V: usize> {
    /// Edge weights stored in upper-triangular order.
    /// Index for edge (i, j) where i < j: i * (2*V - i - 1) / 2 + (j - i - 1)
    weights: [f32; V * (V - 1) / 2],
    /// Cached mincut value (invalidated on weight update).
    cached_mincut: Option<f32>,
    /// Cached mincut partition (bitvector: bit i = 1 means node i in set S).
    cached_partition: Option<u32>,
}
```

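For V = 16, this uses 120 * 4 = 480 bytes for weights, plus 16 bytes for the
two cached `Option` fields (8 bytes each, since neither `f32` nor `u32` has a
spare niche value). Total: 496 bytes, which spans eight cache lines assuming
64-byte lines and fits comfortably in on-chip memory.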
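
The index formula in the struct comment and its inverse can be checked in isolation. The free functions below are a sketch with hypothetical signatures; Section 8.4 calls similarly named methods `edge_index` and `edge_vertices` whose definitions are not shown in this document.

```rust
/// Upper-triangular position of edge (i, j), with i < j, among v nodes.
fn edge_index(v: usize, i: usize, j: usize) -> usize {
    debug_assert!(i < j && j < v);
    i * (2 * v - i - 1) / 2 + (j - i - 1)
}

/// Inverse mapping: recover (i, j) from a valid storage index by walking rows.
fn edge_vertices(v: usize, mut idx: usize) -> (usize, usize) {
    let mut i = 0;
    // Row i holds the v - i - 1 edges (i, i + 1) through (i, v - 1).
    while idx >= v - i - 1 {
        idx -= v - i - 1;
        i += 1;
    }
    (i, i + 1 + idx)
}
```

For V = 16 the mapping covers indices 0 through 119, with (0, 1) at index 0 and (14, 15) at index 119. The row walk costs at most V - 1 steps; a lookup table of 120 pairs would trade 240 bytes for constant-time inversion.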

**Stoer-Wagner state:**
```rust
/// Reusable state for Stoer-Wagner algorithm.
/// Pre-allocated to avoid per-call allocation.
struct StoerWagnerState<const V: usize> {
    /// Merged vertex sets (union-find).
    parent: [u16; V],
    /// Key values for maximum adjacency ordering.
    key: [f32; V],
    /// Whether vertex is in the current working set.
    active: [bool; V],
    /// Best cut found so far.
    best_cut: f32,
    /// Best partition found so far.
    best_partition: u32,
}
```

### 8.3 Stoer-Wagner Implementation

```rust
impl<const V: usize> RfGraph<V> {
    /// Compute exact global minimum cut using Stoer-Wagner.
    /// Time: O(V^3) for dense graphs (V - 1 phases, O(V^2) work per phase).
    /// For V=16: ~4000 operations, estimated 10-50 us.
    pub fn minimum_cut(&mut self) -> (f32, u32) {
        // Serve from cache only when both value and partition are present.
        if let (Some(val), Some(partition)) = (self.cached_mincut, self.cached_partition) {
            return (val, partition);
        }

        let mut state = StoerWagnerState::new();
        let mut merged: [[f32; V]; V] = self.build_adjacency_matrix();
        let mut best_cut = f32::MAX;
        let mut best_partition: u32 = 0;

        for phase in 0..(V - 1) {
            let (s, t, cut_weight) = self.maximum_adjacency_phase(
                &mut merged, &mut state, V - phase
            );

            if cut_weight < best_cut {
                best_cut = cut_weight;
                best_partition = state.current_partition(t);
            }

            // Merge s and t
            self.merge_vertices(&mut merged, s, t);
        }

        self.cached_mincut = Some(best_cut);
        self.cached_partition = Some(best_partition);
        (best_cut, best_partition)
    }
}
```

### 8.4 Incremental Update Path

```rust
impl<const V: usize> RfGraph<V> {
    /// Update edge weight and determine if mincut needs recomputation.
    /// Returns true if the cached mincut is still valid.
    pub fn update_edge(&mut self, i: usize, j: usize, new_weight: f32) -> bool {
        let idx = self.edge_index(i, j);
        let old_weight = self.weights[idx];
        self.weights[idx] = new_weight;

        // Check if this edge crosses the cached partition
        if let Some(partition) = self.cached_partition {
            let i_side = (partition >> i) & 1;
            let j_side = (partition >> j) & 1;

            if i_side != j_side {
                // Edge crosses the cut — must update cut value
                if let Some(ref mut cut_val) = self.cached_mincut {
                    *cut_val += new_weight - old_weight;
                    // Cut value changed but partition might still be optimal
                    // unless the new cut value exceeds some other cut
                    // Conservative: invalidate if change > epsilon * cut_val
                    if (new_weight - old_weight).abs() > 0.1 * *cut_val {
                        self.cached_mincut = None;
                        self.cached_partition = None;
                        return false;
                    }
                    return true;
                }
            }
            // Edge does not cross the cut — partition still valid,
            // but cut value might no longer be minimum
            // Heuristic: if weight decreased significantly, invalidate
            if new_weight < old_weight * 0.8 {
                self.cached_mincut = None;
                self.cached_partition = None;
                return false;
            }
            return true;
        }
        false
    }

    /// Batch update all edges from new CSI frame.
    /// Uses lazy recomputation: only recomputes if cached cut is invalidated.
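    pub fn update_frame(&mut self, new_weights: &[f32; V * (V - 1) / 2]) {
        // With no cached cut there is nothing to keep valid.
        let mut needs_recompute = self.cached_partition.is_none();
        // Net change in the cached cut's value from crossing-edge updates.
        let mut cut_delta = 0.0f32;
        // Reference value for the relative threshold on crossing edges.
        let cut_ref = self.cached_mincut.unwrap_or(1.0);

        for idx in 0..new_weights.len() {
            let old = self.weights[idx];
            let new_w = new_weights[idx];
            self.weights[idx] = new_w;

            if needs_recompute {
                continue; // keep copying weights, skip further checks
            }
            if let Some(partition) = self.cached_partition {
                let (i, j) = self.edge_vertices(idx);
                let crosses = ((partition >> i) ^ (partition >> j)) & 1 == 1;

                if crosses {
                    cut_delta += new_w - old;
                    if (new_w - old).abs() > 0.05 * cut_ref {
                        needs_recompute = true;
                    }
                } else if new_w < old * 0.7 {
                    needs_recompute = true;
                }
            }
        }

        if needs_recompute {
            self.cached_mincut = None;
            self.cached_partition = None;
        } else if let Some(cut_val) = self.cached_mincut.as_mut() {
            // Keep the cached value in step with the new crossing weights.
            *cut_val += cut_delta;
        }
    }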
}
```
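
The thresholds above are heuristic, but two cases need no heuristic at all. Lowering the weight of an edge that crosses the cached cut, or raising the weight of an edge that does not, can never make a different cut cheaper than the cached one. The following is a minimal sketch of that check, assuming the partition is a `u32` bitmask as in the code above; `Validity` and `classify_update` are hypothetical names, not part of `RfGraph`.

```rust
/// Outcome of a single edge-weight change with respect to a cached minimum cut.
/// (Hypothetical helper type; not part of the RfGraph API above.)
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Validity {
    /// The cached partition is provably still a minimum cut.
    StillMinimal,
    /// Another cut may now be cheaper; apply a heuristic or recompute.
    Unknown,
}

/// Classify the update of edge (i, j) from `old` to `new` against a cached
/// partition bitmask (bit v set means vertex v is on side S).
pub fn classify_update(partition: u32, i: usize, j: usize, old: f32, new: f32) -> Validity {
    let crosses = ((partition >> i) ^ (partition >> j)) & 1 == 1;
    let delta = new - old;
    if crosses && delta <= 0.0 {
        // The cached cut gets cheaper by |delta|; every other cut gets
        // cheaper by at most |delta|, so the cached cut stays minimal.
        Validity::StillMinimal
    } else if !crosses && delta >= 0.0 {
        // The cached cut is unchanged and no other cut got cheaper.
        Validity::StillMinimal
    } else {
        Validity::Unknown
    }
}
```

In the `Unknown` case the cached partition still gives a valid upper bound on the minimum cut once the crossing-edge delta is applied, which is the property the heuristic thresholds rely on.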

### 8.5 SIMD-Accelerated Weight Updates

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

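impl<const V: usize> RfGraph<V>
where
    [(); V * (V - 1) / 2]:,
{
    /// Update 4 edge weights at once using SSE (x86_64 hosts only; the
    /// ESP32-S3 and ARM targets need the scalar path or a separate port).
    /// For V = 16, processes 120 edges in 30 SIMD iterations.
    ///
    /// Invalidation here is coarser than in `update_frame`: any edge whose
    /// weight moves by more than an absolute 0.05 drops the cache, whether
    /// or not it crosses the cached partition.
    #[cfg(target_arch = "x86_64")]
    pub unsafe fn update_weights_simd(
        &mut self,
        new_weights: &[f32; V * (V - 1) / 2]
    ) {
        const THRESHOLD: f32 = 0.05;
        let n = V * (V - 1) / 2;
        let mut i = 0;
        let mut invalidate = false;
        let sign_mask = _mm_set1_ps(-0.0);
        let threshold = _mm_set1_ps(THRESHOLD);

        while i + 4 <= n {
            let old = _mm_loadu_ps(self.weights.as_ptr().add(i));
            let new_v = _mm_loadu_ps(new_weights.as_ptr().add(i));
            _mm_storeu_ps(self.weights.as_mut_ptr().add(i), new_v);

            // Absolute difference: clear the sign bit of (new - old)
            let diff = _mm_sub_ps(new_v, old);
            let abs_diff = _mm_andnot_ps(sign_mask, diff);
            let exceeds = _mm_cmpgt_ps(abs_diff, threshold);
            invalidate |= _mm_movemask_ps(exceeds) != 0;

            i += 4;
        }

        // Scalar tail (n not a multiple of 4) with the same threshold check
        while i < n {
            invalidate |= (new_weights[i] - self.weights[i]).abs() > THRESHOLD;
            self.weights[i] = new_weights[i];
            i += 1;
        }

        if invalidate {
            self.cached_mincut = None;
            self.cached_partition = None;
        }
    }
}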
```
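
The SSE intrinsics above compile only on x86_64, while the abstract names an ESP32-S3 or a modest ARM host as the coordinator. A portable alternative is sketched below using `chunks_exact`, which leaves the compiler free to auto-vectorize where the target supports it (this is not guaranteed). The free function `update_weights_portable` is a hypothetical counterpart to `update_weights_simd`; it reports the threshold result and leaves cache handling to the caller.

```rust
/// Copy `new_weights` into `weights` and report whether any edge moved by
/// more than `threshold` (absolute). Hypothetical helper; it does not touch
/// any cache itself.
pub fn update_weights_portable(weights: &mut [f32], new_weights: &[f32], threshold: f32) -> bool {
    assert_eq!(weights.len(), new_weights.len());
    let mut exceeded = false;

    // Fixed-width chunks of 4 give the optimizer a vectorizable inner loop.
    let mut dst = weights.chunks_exact_mut(4);
    let mut src = new_weights.chunks_exact(4);
    for (d, s) in (&mut dst).zip(&mut src) {
        for lane in 0..4 {
            exceeded |= (s[lane] - d[lane]).abs() > threshold;
            d[lane] = s[lane];
        }
    }

    // Remainder (fewer than 4 edges) gets the same check.
    let tail = dst.into_remainder();
    for (d, s) in tail.iter_mut().zip(src.remainder()) {
        exceeded |= (*s - *d).abs() > threshold;
        *d = *s;
    }
    exceeded
}
```

A caller would clear `cached_mincut` and `cached_partition` when the function returns true, mirroring the SSE path.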

### 8.6 Parallelism with Rayon

For larger deployments (V > 32), Stoer-Wagner's maximum adjacency ordering
can be parallelized:

```rust
#[cfg(feature = "parallel")]
use rayon::prelude::*;

impl<const V: usize> RfGraph<V>
where
    [(); V * (V - 1) / 2]:,
{
    /// Parallel maximum adjacency ordering phase.
    /// Splits key-value computation across threads.
    #[cfg(feature = "parallel")]
    fn parallel_max_adjacency_phase(
        &self,
        merged: &[[f32; V]; V],
        active: &[bool; V],
        n_active: usize,
    ) -> (usize, usize, f32) {
        let mut in_set = [false; V];
        let mut key = [0.0f32; V];
        let mut order = Vec::with_capacity(n_active);

        // Start from first active vertex
        let start = active.iter().position(|&a| a).unwrap();
        in_set[start] = true;
        order.push(start);

        // Grow the ordering one vertex at a time
        for _ in 1..n_active {
            // Parallel key update: each active vertex not yet in the set
            // adds its edge weight to the most recently added vertex
            let last_added = *order.last().unwrap();

            key.par_iter_mut()
                .enumerate()
                .filter(|(v, _)| active[*v] && !in_set[*v])
                .for_each(|(v, k)| {
                    // Each task holds the unique &mut to its own key[v],
                    // so no unsafe aliasing is needed
                    *k += merged[v][last_added];
                });

            // Find max key sequentially (cheap next to the key update);
            // total_cmp avoids a panic if a weight is NaN
            let next = (0..V)
                .filter(|&v| active[v] && !in_set[v])
                .max_by(|&a, &b| key[a].total_cmp(&key[b]))
                .unwrap();

            in_set[next] = true;
            order.push(next);
        }

        let t = order[n_active - 1];
        let s = order[n_active - 2];
        let cut_weight = key[t];

        (s, t, cut_weight)
    }
}
```

### 8.7 Integration with DynamicPersonMatcher

The `DynamicPersonMatcher` in `ruvector-mincut/src/metrics.rs` uses mincut
for person segmentation. Integration:

```rust
use wifi_densepose_signal::rf_graph::RfGraph;

impl DynamicPersonMatcher {
    /// Update the RF graph with new CSI data and detect person boundaries.
    pub fn update_with_csi_frame(
        &mut self,
        csi_weights: &[f32; 120],  // 16-node complete graph
    ) -> Vec<PersonSegment> {
        // Update graph weights (lazy invalidation)
        self.rf_graph.update_frame(csi_weights);

        // Get current minimum cut
        let (cut_value, partition) = self.rf_graph.minimum_cut();

        // Convert partition bitmask to person segments
        let segments = self.partition_to_segments(partition, cut_value);

        // Feed segments to Kalman tracker
        for segment in &segments {
            self.pose_tracker.update_measurement(segment);
        }

        segments
    }

    /// Hierarchical multi-cut for multiple people.
    /// Recursively bisects the graph until all segments have
    /// internal connectivity above threshold.
    pub fn hierarchical_cut(
        &mut self,
        max_people: usize,
    ) -> Vec<PersonSegment> {
        let mut segments = vec![PersonSegment::all(16)];
        let mut result = Vec::new();

        while let Some(segment) = segments.pop() {
            // Each split adds one segment to the total (finished + pending +
            // the current one), so stop once that total reaches max_people.
            let total = result.len() + segments.len() + 1;
            if segment.size() <= 2 || total >= max_people {
                result.push(segment);
                continue;
            }

            // Build subgraph for this segment (mutable: minimum_cut caches)
            let mut subgraph = self.rf_graph.subgraph(&segment.nodes);
            let (cut_value, partition) = subgraph.minimum_cut();

            // Size-normalized cut: cut_value / min(|S|, |V\S|)
            let side = partition.count_ones();
            let smaller_side = side.min(segment.size() as u32 - side);
            let normalized_cut = cut_value / smaller_side as f32;

            if normalized_cut > self.connectivity_threshold {
                // Segment is internally well-connected — one person or empty
                result.push(segment);
            } else {
                // Split into two sub-segments and continue
                let (left, right) = segment.split(partition);
                segments.push(left);
                segments.push(right);
            }
        }

        result
    }
}
```
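
`partition_to_segments` and `segment.split` are not shown in this document. To illustrate the bitmask convention they would rely on, the sketch below splits a vertex range by a partition mask and computes the size-normalized cut used in `hierarchical_cut`. Both `split_by_bitmask` and `size_normalized_cut` are hypothetical helpers, and the `u32` mask width is an assumption carried over from the code above.

```rust
/// Split the first `n` vertices into the two sides of a partition bitmask.
/// Bit v set means vertex v belongs to the first returned side.
pub fn split_by_bitmask(mask: u32, n: usize) -> (Vec<usize>, Vec<usize>) {
    assert!(n <= 32, "a u32 bitmask covers at most 32 vertices");
    (0..n).partition(|&v| (mask >> v) & 1 == 1)
}

/// cut_value / min(|S|, |V \ S|), the quantity compared to the threshold.
/// Returns None if either side is empty, since that is not a valid cut.
pub fn size_normalized_cut(cut_value: f32, mask: u32, n: usize) -> Option<f32> {
    let (s, rest) = split_by_bitmask(mask, n);
    let smaller = s.len().min(rest.len());
    if smaller == 0 {
        None
    } else {
        Some(cut_value / smaller as f32)
    }
}
```

Returning `Option` makes the degenerate case explicit instead of dividing by zero when a mask leaves one side empty.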

### 8.8 Benchmarking and Performance Targets

| Operation | V=16 | V=32 | V=64 | V=128 |
|-----------|------|------|------|-------|
| Stoer-Wagner (full) | 15 us | 120 us | 1.2 ms | 15 ms |
| Lazy update (no recompute) | 0.5 us | 1 us | 3 us | 10 us |
| Lazy update (recompute) | 15 us | 120 us | 1.2 ms | 15 ms |
| PPR local cut | 5 us | 15 us | 40 us | 100 us |
| SIMD batch weight update | 0.2 us | 0.8 us | 3 us | 12 us |
| Hierarchical multi-cut (k=3) | 40 us | 300 us | 3 ms | 35 ms |

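**20 Hz budget: 50 ms per frame.** At V = 16, every operation fits within
budget by more than three orders of magnitude (40 us for the hierarchical
multi-cut against 50,000 us) and also meets the stricter 2 ms per-frame
target stated in the abstract. At V = 64, the hierarchical multi-cut (3 ms)
already exceeds that 2 ms target. At V = 128, it consumes 35 ms, or 70% of
the 50 ms budget, and would benefit from the streaming/approximate methods
described in earlier sections.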
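
No measurement setup is given for the table, so its figures are best read as targets to be confirmed on the actual coordinator. A minimal timing harness for doing so is sketched below, assuming only `std::time`; `mean_frame_time` and `budget_fraction` are hypothetical helper names.

```rust
use std::time::{Duration, Instant};

/// Run `frame` `frames` times and return the mean wall-clock time per call.
pub fn mean_frame_time<F: FnMut()>(frames: u32, mut frame: F) -> Duration {
    assert!(frames > 0);
    let start = Instant::now();
    for _ in 0..frames {
        frame();
    }
    start.elapsed() / frames
}

/// Fraction of the per-frame budget consumed at a given update rate.
/// At 20 Hz the budget is 1/20 s = 50 ms, so 35 ms maps to 0.7.
pub fn budget_fraction(per_frame: Duration, rate_hz: f64) -> f64 {
    per_frame.as_secs_f64() * rate_hz
}
```

When timing a real mincut call, passing its result through `std::hint::black_box` inside the closure helps keep the optimizer from discarding the work being measured.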

### 8.9 Testing Strategy

```rust
#[cfg(test)]
mod tests {
    use super::*;

    /// Verify Stoer-Wagner on known graph with documented mincut.
    #[test]
    fn test_stoer_wagner_known_graph() {
        let mut graph = RfGraph::<8>::from_edges(&[
            (0, 1, 2.0), (0, 4, 3.0), (1, 2, 3.0), (1, 4, 2.0),
            (1, 5, 2.0), (2, 3, 4.0), (2, 6, 2.0), (3, 6, 2.0),
            (3, 7, 2.0), (4, 5, 3.0), (5, 6, 1.0), (6, 7, 3.0),
        ]);
        let (cut_val, _) = graph.minimum_cut();
        assert!((cut_val - 4.0).abs() < 1e-6);
    }

    /// Verify lazy update correctness: cache invalidation triggers
    /// recomputation when crossing-edge weight changes significantly.
    #[test]
    fn test_lazy_update_invalidation() { /* ... */ }

    /// Verify SIMD and scalar paths produce identical results.
    #[test]
    fn test_simd_scalar_equivalence() { /* ... */ }

    /// Benchmark: 10,000 frames at 20 Hz with random weight perturbations.
    /// Verify average per-frame time < 100 us for V=16.
    #[test]
    fn bench_20hz_sustained() { /* ... */ }

    /// Property test: mincut value <= minimum weighted vertex degree
    /// (isolating any single vertex is itself a cut).
    #[test]
    fn prop_mincut_bounded_by_min_degree() { /* ... */ }
}
```
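
The test stubs above need an independent oracle to compare against. One option, sketched here under the assumption that test graphs stay small, is to enumerate every bipartition by brute force and to check the degree bound directly. `brute_force_mincut` and `min_weighted_degree` are hypothetical helpers that operate on a plain edge list rather than on `RfGraph`.

```rust
/// Brute-force global minimum cut by enumerating every bipartition.
/// O(2^n * m), so only suitable as a test oracle for small n.
/// Returns the cut weight and the bitmask of one side.
pub fn brute_force_mincut(n: usize, edges: &[(usize, usize, f32)]) -> (f32, u32) {
    assert!((2..=20).contains(&n));
    let mut best = (f32::INFINITY, 0u32);
    // Even masks keep vertex 0 on the unset side, so each cut is visited
    // once; starting at 2 keeps the set side non-empty.
    for mask in (2u32..(1u32 << n)).step_by(2) {
        let weight: f32 = edges
            .iter()
            .filter(|&&(u, v, _)| ((mask >> u) ^ (mask >> v)) & 1 == 1)
            .map(|&(_, _, w)| w)
            .sum();
        if weight < best.0 {
            best = (weight, mask);
        }
    }
    best
}

/// Smallest weighted degree; isolating that vertex is itself a cut,
/// so this bounds the minimum cut from above.
pub fn min_weighted_degree(n: usize, edges: &[(usize, usize, f32)]) -> f32 {
    let mut degree = vec![0.0f32; n];
    for &(u, v, w) in edges {
        degree[u] += w;
        degree[v] += w;
    }
    degree.into_iter().fold(f32::INFINITY, f32::min)
}
```

On the 8-vertex graph from `test_stoer_wagner_known_graph`, the oracle evaluates 127 bipartitions, which is cheap enough to run alongside randomized property tests.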

---

## 9. Summary and Recommendations

### 9.1 Algorithm Selection Matrix

| Criterion | Stoer-Wagner | Karger-Stein | Dynamic (Thorup) | Streaming | Local PPR | Lazy Hybrid |
|-----------|:---:|:---:|:---:|:---:|:---:|:---:|
| Exact result | Yes | Prob. | No (approx) | No (approx) | No (approx) | Heuristic |
| V=16 latency | 15 us | 25 us | 120 us | 50 us | 5 us | 0.5-15 us |
| V=128 latency | 15 ms | 8 ms | 2 ms | 1 ms | 100 us | 0.01-15 ms |
| Incremental | No | No | Yes | Yes | Yes | Yes |
| Safety-critical | Yes | No | No | No | No | Heuristic |
| Implementation complexity | Low | Medium | High | High | Medium | Low |

### 9.2 Recommended Architecture for RuVector

**Primary path (V <= 32):**
1. Receive CSI frame.
2. SIMD batch update edge weights.
3. Lazy check: if cached partition is still valid, return cached result.
4. If invalidated: run Stoer-Wagner (exact, deterministic, fast enough).
5. Cache result for next frame.

**Secondary path (V > 32 or multi-cut needed):**
1. Use PPR local partitioning seeded from tracker predictions.
2. If local cuts are low-conductance, return local result.
3. Otherwise, fall back to full Stoer-Wagner.

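**Safety-critical path (MAT/vital signs):**
1. Always use Stoer-Wagner (deterministic, exact).
2. Cross-validate with independent Karger-Stein trials. Every cut a trial
   returns is an upper bound on the true minimum, so a correct trial can only
   match or exceed the Stoer-Wagner value.
3. If a trial reports a smaller cut than Stoer-Wagner, treat it as an
   implementation or numerical fault and use the smaller cut value
   (conservative).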
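
The three paths above can be sketched as a small dispatch function. The names `CutPath` and `select_path` are hypothetical. Giving the safety-critical flag precedence over graph size is an assumption, since the text does not say how the paths interact when more than one condition holds.

```rust
/// Which mincut routine to run for a frame (hypothetical names).
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum CutPath {
    /// Lazy cache check, exact Stoer-Wagner on invalidation.
    LazyStoerWagner,
    /// PPR local partitioning with a Stoer-Wagner fallback.
    LocalPprWithFallback,
    /// Stoer-Wagner every frame, cross-checked by randomized trials.
    VerifiedStoerWagner,
}

/// Pick a path from the node count and the two mode flags.
/// Assumption: safety-critical mode overrides the size-based choice.
pub fn select_path(v: usize, multi_cut: bool, safety_critical: bool) -> CutPath {
    if safety_critical {
        CutPath::VerifiedStoerWagner
    } else if v > 32 || multi_cut {
        CutPath::LocalPprWithFallback
    } else {
        CutPath::LazyStoerWagner
    }
}
```

For the 16-node mesh with neither flag set, `select_path(16, false, false)` yields the lazy Stoer-Wagner path.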

### 9.3 Future Work

1. **Distributed mincut**: Each ESP32 node computes a sketch of its local
   view. The coordinator merges sketches for approximate global mincut.
   Reduces coordinator bottleneck and enables graceful degradation.

2. **GPU-accelerated mincut**: For cloud-hosted deployments, batch multiple
   frames into a GPU kernel for parallel Stoer-Wagner computation across
   time windows.

3. **Learning-augmented algorithms**: Train a small neural network to predict
   the mincut partition from CSI features, using exact Stoer-Wagner as
   ground truth. The network predicts in O(1) time; Stoer-Wagner verifies
   periodically.

4. **Hypergraph mincut**: Model multi-body RF interactions (where three or
   more nodes are simultaneously affected) as hyperedges. Hypergraph mincut
   algorithms capture higher-order spatial structure.

---

## References

1. Stoer, M. and Wagner, F. "A Simple Min-Cut Algorithm." JACM 44(4), 1997.
2. Karger, D. "Global Min-Cuts in RNC, and Other Ramifications of a Simple Min-Cut Algorithm." SODA, 1993.
3. Karger, D. and Stein, C. "A New Approach to the Minimum Cut Problem." JACM 43(4), 1996.
4. Benczúr, A. and Karger, D. "Approximating s-t Minimum Cuts in Õ(n^2) Time." STOC, 1996.
5. Spielman, D. and Teng, S. "Nearly-Linear Time Algorithms for Graph Partitioning, Graph Sparsification, and Solving Linear Systems." STOC, 2004.
6. Spielman, D. and Srivastava, N. "Graph Sparsification by Effective Resistances." STOC, 2008 / SICOMP, 2011.
7. Andersen, R., Chung, F., and Lang, K. "Local Graph Partitioning using PageRank Vectors." FOCS, 2006.
8. Ahn, K.J., Guha, S., and McGregor, A. "Analyzing Graph Structure via Linear Measurements." SODA, 2012.
9. Ahn, K.J., Guha, S., and McGregor, A. "Graph Sketches: Sparsification, Spanners, and Subgraphs." PODS, 2012.
10. Thorup, M. "Near-Optimal Fully-Dynamic Graph Connectivity." STOC, 2000.
11. Goranci, G., Henzinger, M., and Thorup, M. "Incremental Exact Min-Cut in Polylogarithmic Amortized Update Time." TALG, 2018.
12. Rubinstein, A., Schramm, T., and Weinberg, S.M. "Computing Exact Minimum Cuts Without Knowing the Graph." ITCS, 2018.
13. Abraham, I., Durfee, D., Koutis, I., Krinninger, S., and Peng, R. "On Fully Dynamic Graph Sparsifiers." FOCS, 2016.
14. Nanongkai, D., Saranurak, T., and Wulff-Nilsen, C. "Dynamic Minimum Spanning Forest with Subpolynomial Worst-Case Update Time." FOCS, 2017.