MultistaticArray::fuse / fuse_ungated cloned every viewpoint embedding twice per
fusion (once into `extracted`, again when building the attention input). Now the
embeddings are MOVED out of `extracted` (one clone per viewpoint instead of two),
capturing geometry/ids by Copy in the same pass. Correctness-neutral — all 100
viewpoint/mat lib tests pass unchanged.
MEASURED (new benches/fusion_bench.rs, embedding_extract A/B, 8 vp x 128-d):
before_double_clone 1.0029 us -> after_single_clone 461.6 ns (~2.17x)
End-to-end fusion_pipeline (8 vp): 202 us — marshalling is <1% of fusion
(n*n attention dominates), so end-to-end win is modest; the A/B isolates the
clone elimination. Reproduce:
cargo bench -p wifi-densepose-ruvector --bench fusion_bench
Co-Authored-By: claude-flow <ruv@ruv.net>