# Comprehensive Benchmark Guide This guide covers all benchmarks for the Midstream workspace's 6 production crates. ## Overview All benchmarks use Criterion.rs for statistical analysis and HTML report generation. Each crate has comprehensive benchmarks targeting specific performance goals. ## Running Benchmarks ### Run All Benchmarks ```bash cargo bench ``` ### Run Specific Crate Benchmarks ```bash # Temporal Compare (DTW, LCS, Edit Distance) cargo bench --bench temporal_bench # Nanosecond Scheduler cargo bench --bench scheduler_bench # Temporal Attractor Studio cargo bench --bench attractor_bench # Temporal Neural Solver cargo bench --bench solver_bench # Strange Loop (Meta-Learning) cargo bench --bench meta_bench # QUIC Multistream cargo bench --bench quic_bench ``` ### Run Specific Benchmark Groups ```bash # DTW performance tests cargo bench --bench temporal_bench dtw # Scheduler overhead tests cargo bench --bench scheduler_bench overhead # Phase space embedding cargo bench --bench attractor_bench embedding ``` ## Performance Targets ### 1. Temporal Compare (`temporal_bench.rs`) **Targets:** - DTW n=100: <10ms - LCS n=100: <5ms - Edit distance n=100: <3ms - Cache hit: <1μs **Benchmark Groups:** ```rust dtw_benches // DTW performance across sizes lcs_benches // LCS algorithm performance edit_benches // Edit distance operations cache_benches // Cache hit/miss scenarios memory_benches // Memory allocation patterns ``` **Key Metrics:** - Throughput (elements/second) - Mean execution time - Standard deviation - Memory allocations ### 2. Nanosecond Scheduler (`scheduler_bench.rs`) **Targets:** - Schedule overhead: <100ns - Task execution: <1μs - Stats calculation: <10μs - Multi-threaded scaling **Benchmark Groups:** ```rust overhead_benches // Schedule operation overhead latency_benches // Task execution latency queue_benches // Priority queue operations stats_benches // Statistics calculation threading_benches // Multi-threaded scenarios ``` **Key Scenarios:** - High/low contention - Priority variations - Batch operations - Concurrent scheduling ### 3. Temporal Attractor Studio (`attractor_bench.rs`) **Targets:** - Phase space embedding: <20ms (n=1000) - Lyapunov calculation: <500ms - Attractor detection: <100ms - Dimension estimation **Benchmark Groups:** ```rust embedding_benches // Phase space reconstruction lyapunov_benches // Lyapunov exponent calculation detection_benches // Attractor type detection trajectory_benches // Trajectory analysis dimension_benches // Dimension estimation chaos_benches // Chaos detection pipeline_benches // Complete analysis pipeline ``` **Test Attractors:** - Lorenz attractor - Rössler attractor - Hénon map - Periodic signals - Random data ### 4. Temporal Neural Solver (`solver_bench.rs`) **Targets:** - Formula encoding: <10ms - Verification: <100ms - Parsing: <5ms - State checking: <1μs **Benchmark Groups:** ```rust encoding_benches // LTL formula encoding parsing_benches // Formula parsing verification_benches // Trace verification state_benches // State operations neural_benches // Neural verification operator_benches // Temporal operators pipeline_benches // Complete pipeline ``` **LTL Operations:** - Next (X) - Globally (G) - Finally (F) - Until (U) - Boolean combinations ### 5. Strange Loop (`meta_bench.rs`) **Targets:** - Meta-learning iteration: <50ms - Pattern extraction: <20ms - Integration overhead: <100ms - Recursive optimization **Benchmark Groups:** ```rust learning_benches // Meta-learning iteration pattern_benches // Pattern extraction/matching hierarchy_benches // Multi-level learning integration_benches // Cross-crate integration recursive_benches // Self-referential operations pipeline_benches // Complete meta-learning cycle ``` **Integration Tests:** - With temporal-compare (DTW) - With nanosecond-scheduler - With attractor-studio - Cross-crate overhead ### 6. QUIC Multistream (`quic_bench.rs`) **Targets:** - Stream establishment: <1ms - Multiplexing overhead: <100μs - Throughput: >1GB/s - Connection setup: <10ms ## Benchmark Configuration ### Criterion Settings Each benchmark group uses optimized Criterion configuration: ```rust criterion_group! { name = benches; config = Criterion::default() .sample_size(100) // Statistical samples .measurement_time(Duration::from_secs(10)) // Per benchmark .warm_up_time(Duration::from_secs(3)); // Warmup period targets = ... } ``` ### Custom Configurations **Fast benchmarks** (overhead, parsing): - sample_size: 500-1000 - measurement_time: 5s **Slow benchmarks** (neural, integration): - sample_size: 30-50 - measurement_time: 15s ## Understanding Results ### HTML Reports After running benchmarks, view results at: ``` target/criterion/[benchmark_name]/report/index.html ``` ### Key Metrics 1. **Mean**: Average execution time 2. **Std Dev**: Consistency indicator 3. **Median**: Central tendency 4. **MAD**: Median Absolute Deviation 5. **Throughput**: Operations per second ### Regression Detection Criterion automatically detects performance regressions: - Green: Performance improved - Yellow: Within noise threshold - Red: Performance regressed ## Profiling Integration ### With perf ```bash cargo bench --bench temporal_bench -- --profile-time=10 perf record -g cargo bench --bench temporal_bench perf report ``` ### With flamegraph ```bash cargo install flamegraph cargo flamegraph --bench temporal_bench ``` ### With valgrind (memory) ```bash cargo bench --bench temporal_bench -- --profile-time=10 valgrind --tool=cachegrind target/release/temporal_bench ``` ## Best Practices ### 1. Consistent Environment - Close other applications - Disable CPU frequency scaling - Use consistent power settings - Run multiple times ### 2. Baseline Establishment ```bash # Create baseline cargo bench -- --save-baseline main # Compare against baseline git checkout feature-branch cargo bench -- --baseline main ``` ### 3. Statistical Validity - Minimum 30 samples for statistical significance - Watch for outliers (high std dev) - Multiple runs for consistency ### 4. Realistic Data - Use production-like data sizes - Include edge cases - Test boundary conditions - Vary input patterns ## CI/CD Integration ### GitHub Actions ```yaml - name: Run benchmarks run: cargo bench --no-fail-fast - name: Upload benchmark results uses: actions/upload-artifact@v3 with: name: benchmark-results path: target/criterion/ ``` ### Performance Tracking Store baseline results in repo: ```bash git add target/criterion/*/base/ git commit -m "Update benchmark baselines" ``` ## Optimization Workflow 1. **Identify bottlenecks**: Run benchmarks, check reports 2. **Profile**: Use flamegraph/perf for hotspots 3. **Optimize**: Make targeted improvements 4. **Verify**: Re-run benchmarks 5. **Compare**: Check against baseline 6. **Document**: Update if targets change ## Common Issues ### High Variance - System load too high - Thermal throttling - Background processes - Insufficient samples **Solution**: Increase sample size, close applications, check CPU frequency. ### Unexpected Regressions - Compiler version changes - Dependency updates - System configuration - Measurement noise **Solution**: Compare multiple runs, check git diff, validate hardware. ### Memory Benchmarks Inconsistent - GC timing (if applicable) - Allocator behavior - Page faults - Cache effects **Solution**: Increase warmup time, use fixed heap size, minimize allocations. ## Future Enhancements - [ ] Continuous benchmark tracking - [ ] Performance regression alerts - [ ] Cross-platform comparison - [ ] Memory profiling integration - [ ] Automated optimization suggestions - [ ] Benchmark result visualization - [ ] Historical trend analysis ## Resources - [Criterion.rs Documentation](https://bheisler.github.io/criterion.rs/book/) - [Rust Performance Book](https://nnethercote.github.io/perf-book/) - [Linux perf Tutorial](https://perf.wiki.kernel.org/index.php/Tutorial) --- **Summary**: All 6 crates now have comprehensive benchmarks covering core functionality, edge cases, and integration scenarios. Total ~2,800 lines of benchmark code targeting specific performance goals for each crate.