8.2 KiB
Comprehensive Benchmark Guide
This guide covers all benchmarks for the Midstream workspace's 6 production crates.
Overview
All benchmarks use Criterion.rs for statistical analysis and HTML report generation. Each crate has comprehensive benchmarks targeting specific performance goals.
Running Benchmarks
Run All Benchmarks
cargo bench
Run Specific Crate Benchmarks
# Temporal Compare (DTW, LCS, Edit Distance)
cargo bench --bench temporal_bench
# Nanosecond Scheduler
cargo bench --bench scheduler_bench
# Temporal Attractor Studio
cargo bench --bench attractor_bench
# Temporal Neural Solver
cargo bench --bench solver_bench
# Strange Loop (Meta-Learning)
cargo bench --bench meta_bench
# QUIC Multistream
cargo bench --bench quic_bench
Run Specific Benchmark Groups
# DTW performance tests
cargo bench --bench temporal_bench dtw
# Scheduler overhead tests
cargo bench --bench scheduler_bench overhead
# Phase space embedding
cargo bench --bench attractor_bench embedding
Performance Targets
1. Temporal Compare (temporal_bench.rs)
Targets:
- DTW n=100: <10ms
- LCS n=100: <5ms
- Edit distance n=100: <3ms
- Cache hit: <1μs
Benchmark Groups:
dtw_benches // DTW performance across sizes
lcs_benches // LCS algorithm performance
edit_benches // Edit distance operations
cache_benches // Cache hit/miss scenarios
memory_benches // Memory allocation patterns
Key Metrics:
- Throughput (elements/second)
- Mean execution time
- Standard deviation
- Memory allocations
2. Nanosecond Scheduler (scheduler_bench.rs)
Targets:
- Schedule overhead: <100ns
- Task execution: <1μs
- Stats calculation: <10μs
- Multi-threaded scaling
Benchmark Groups:
overhead_benches // Schedule operation overhead
latency_benches // Task execution latency
queue_benches // Priority queue operations
stats_benches // Statistics calculation
threading_benches // Multi-threaded scenarios
Key Scenarios:
- High/low contention
- Priority variations
- Batch operations
- Concurrent scheduling
3. Temporal Attractor Studio (attractor_bench.rs)
Targets:
- Phase space embedding: <20ms (n=1000)
- Lyapunov calculation: <500ms
- Attractor detection: <100ms
- Dimension estimation
Benchmark Groups:
embedding_benches // Phase space reconstruction
lyapunov_benches // Lyapunov exponent calculation
detection_benches // Attractor type detection
trajectory_benches // Trajectory analysis
dimension_benches // Dimension estimation
chaos_benches // Chaos detection
pipeline_benches // Complete analysis pipeline
Test Attractors:
- Lorenz attractor
- Rössler attractor
- Hénon map
- Periodic signals
- Random data
4. Temporal Neural Solver (solver_bench.rs)
Targets:
- Formula encoding: <10ms
- Verification: <100ms
- Parsing: <5ms
- State checking: <1μs
Benchmark Groups:
encoding_benches // LTL formula encoding
parsing_benches // Formula parsing
verification_benches // Trace verification
state_benches // State operations
neural_benches // Neural verification
operator_benches // Temporal operators
pipeline_benches // Complete pipeline
LTL Operations:
- Next (X)
- Globally (G)
- Finally (F)
- Until (U)
- Boolean combinations
5. Strange Loop (meta_bench.rs)
Targets:
- Meta-learning iteration: <50ms
- Pattern extraction: <20ms
- Integration overhead: <100ms
- Recursive optimization
Benchmark Groups:
learning_benches // Meta-learning iteration
pattern_benches // Pattern extraction/matching
hierarchy_benches // Multi-level learning
integration_benches // Cross-crate integration
recursive_benches // Self-referential operations
pipeline_benches // Complete meta-learning cycle
Integration Tests:
- With temporal-compare (DTW)
- With nanosecond-scheduler
- With attractor-studio
- Cross-crate overhead
6. QUIC Multistream (quic_bench.rs)
Targets:
- Stream establishment: <1ms
- Multiplexing overhead: <100μs
- Throughput: >1GB/s
- Connection setup: <10ms
Benchmark Configuration
Criterion Settings
Each benchmark group uses optimized Criterion configuration:
criterion_group! {
name = benches;
config = Criterion::default()
.sample_size(100) // Statistical samples
.measurement_time(Duration::from_secs(10)) // Per benchmark
.warm_up_time(Duration::from_secs(3)); // Warmup period
targets = ...
}
Custom Configurations
Fast benchmarks (overhead, parsing):
- sample_size: 500-1000
- measurement_time: 5s
Slow benchmarks (neural, integration):
- sample_size: 30-50
- measurement_time: 15s
Understanding Results
HTML Reports
After running benchmarks, view results at:
target/criterion/[benchmark_name]/report/index.html
Key Metrics
- Mean: Average execution time
- Std Dev: Consistency indicator
- Median: Central tendency
- MAD: Median Absolute Deviation
- Throughput: Operations per second
Regression Detection
Criterion automatically detects performance regressions:
- Green: Performance improved
- Yellow: Within noise threshold
- Red: Performance regressed
Profiling Integration
With perf
cargo bench --bench temporal_bench -- --profile-time=10
perf record -g cargo bench --bench temporal_bench
perf report
With flamegraph
cargo install flamegraph
cargo flamegraph --bench temporal_bench
With valgrind (memory)
cargo bench --bench temporal_bench -- --profile-time=10
valgrind --tool=cachegrind target/release/temporal_bench
Best Practices
1. Consistent Environment
- Close other applications
- Disable CPU frequency scaling
- Use consistent power settings
- Run multiple times
2. Baseline Establishment
# Create baseline
cargo bench -- --save-baseline main
# Compare against baseline
git checkout feature-branch
cargo bench -- --baseline main
3. Statistical Validity
- Minimum 30 samples for statistical significance
- Watch for outliers (high std dev)
- Multiple runs for consistency
4. Realistic Data
- Use production-like data sizes
- Include edge cases
- Test boundary conditions
- Vary input patterns
CI/CD Integration
GitHub Actions
- name: Run benchmarks
run: cargo bench --no-fail-fast
- name: Upload benchmark results
uses: actions/upload-artifact@v3
with:
name: benchmark-results
path: target/criterion/
Performance Tracking
Store baseline results in repo:
git add target/criterion/*/base/
git commit -m "Update benchmark baselines"
Optimization Workflow
- Identify bottlenecks: Run benchmarks, check reports
- Profile: Use flamegraph/perf for hotspots
- Optimize: Make targeted improvements
- Verify: Re-run benchmarks
- Compare: Check against baseline
- Document: Update if targets change
Common Issues
High Variance
- System load too high
- Thermal throttling
- Background processes
- Insufficient samples
Solution: Increase sample size, close applications, check CPU frequency.
Unexpected Regressions
- Compiler version changes
- Dependency updates
- System configuration
- Measurement noise
Solution: Compare multiple runs, check git diff, validate hardware.
Memory Benchmarks Inconsistent
- GC timing (if applicable)
- Allocator behavior
- Page faults
- Cache effects
Solution: Increase warmup time, use fixed heap size, minimize allocations.
Future Enhancements
- Continuous benchmark tracking
- Performance regression alerts
- Cross-platform comparison
- Memory profiling integration
- Automated optimization suggestions
- Benchmark result visualization
- Historical trend analysis
Resources
Summary: All 6 crates now have comprehensive benchmarks covering core functionality, edge cases, and integration scenarios. Total ~2,800 lines of benchmark code targeting specific performance goals for each crate.