Temporal-Compare
Ultra-fast Rust framework for temporal prediction with 6x speedup via SIMD and 3.69x compression via INT8 quantization.
What is Temporal-Compare?
Imagine trying to predict the next word you'll type, the next stock price movement, or the next frame in a video. These are temporal prediction tasks - predicting future states from historical sequences. Temporal-Compare provides a testing ground to compare different approaches to this fundamental problem.
This crate implements a clean, extensible framework for comparing:
- 15+ ML backends from basic MLPs to ensemble methods
- INT8 quantization (3.69x model compression, 0.42% accuracy loss)
- SIMD acceleration (AVX2/AVX-512 intrinsics for 6x speedup)
- Production-ready optimizations with real benchmarks, no overfitting
Architecture
┌───────────────────────────────────────────────────────────────────┐
│                         Input Time Series                         │
│                     [t-31, t-30, ..., t-1, t]                     │
└─────────────────────────────────┬─────────────────────────────────┘
                                  │
                                  ▼
┌───────────────────────────────────────────────────────────────────┐
│                        Feature Engineering                        │
│                • Window: 32 timesteps                             │
│                • Regime indicators                                │
│                • Temporal features (time-of-day)                  │
└─────────────────────────────────┬─────────────────────────────────┘
                                  │
      ┌─────────────┬─────────────┼─────────────┬─────────────┐
      ▼             ▼             ▼             ▼             ▼
┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐
│ Baseline  │ │    MLP    │ │  MLP-Opt  │ │ MLP-Ultra │ │ RUV-FANN  │
│ Predictor │ │  Simple   │ │   Adam    │ │   SIMD    │ │  Network  │
│           │ │           │ │           │ │           │ │           │
│Last value │ │   Basic   │ │ Backprop  │ │   AVX2    │ │   Rprop   │
└─────┬─────┘ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘
      │             │             │             │             │
      └─────────────┴─────────────┼─────────────┴─────────────┘
                                  │
                                  ▼
                  ┌───────────────────────────────┐
                  │            Outputs            │
                  │  • Regression (MSE)           │
                  │  • Classification             │
                  │    (3-class: down/neutral/up) │
                  └───────────────────────────────┘
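The windowing step in the diagram can be sketched as follows. This is a minimal illustration, assuming each training example pairs a `window`-length slice of the series with the value that immediately follows it. `make_windows` is a hypothetical helper, not the crate's API, and the regime and time-of-day features are left out because their exact definitions are not given here.

```rust
/// Splits a series into (window, next-value) training pairs.
/// Hypothetical helper for illustration; the crate's real feature
/// pipeline also appends engineered features that are omitted here.
fn make_windows(series: &[f64], window: usize) -> Vec<(Vec<f64>, f64)> {
    if window == 0 || series.len() <= window {
        return Vec::new();
    }
    series
        .windows(window + 1) // each slice holds `window` inputs plus one target
        .map(|w| (w[..window].to_vec(), w[window]))
        .collect()
}
```

With the default window of 32, a series of 40 points yields 8 pairs, the first of which predicts the 33rd value from the first 32.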
Features (v0.5.0)
- INT8 Quantization: 3.69x model compression (9.7KB → 2.6KB)
- AVX2/AVX-512 SIMD: 6x speedup with hardware acceleration
- 15+ Backend Options: MLP variants, ensemble, reservoir, sparse, quantum-inspired
- Tiny Models: Production-ready with only 0.42% accuracy loss from quantization
- Ultra Performance: 0.5s training for 10k samples (vs 3s baseline)
- Real Benchmarks: No overfitting - includes failed experiments for transparency
- 65.2% Accuracy: Best-in-class MLP-Classifier with BatchNorm + Dropout
- Synthetic Data: Configurable time series with regime shifts and noise
- CLI Interface: Full control via command-line arguments
- Built-in Metrics: MSE for regression, accuracy for classification
- RUV-FANN Integration: Optional feature flag for FANN backend
- Reservoir Computing: Echo state networks with spectral radius control
- Sparse Networks: Dynamic pruning with lottery ticket hypothesis
- Quantum-Inspired: Phase rotations and entanglement simulation
- Kernel Methods: Random Fourier features for RBF approximation
Technical Details
Data Generation
The synthetic time series follows a first-order autoregressive process with regime-dependent drift, Gaussian noise, and periodic impulses:
x(t) = 0.8 * x(t-1) + drift(regime) + N(0, 0.3) + impulse(t)
where:
- regime ∈ {0, 1} switches with P=0.02
- drift = 0.02 if regime=0, else -0.015
- impulse = +0.9 every 37 timesteps
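The process above can be sketched in dependency-free Rust. This is an illustration under stated assumptions, not the crate's generator: the series starts at 0, `0.3` is treated as the noise standard deviation, the regime flips with probability 0.02 at every step, and the impulse fires when `t` is a positive multiple of 37. `XorShift64` and `generate_series` are hypothetical names, and the small PRNG only stands in for whatever random source the crate actually uses.

```rust
/// Tiny deterministic PRNG (xorshift64*), used only to keep this sketch
/// dependency-free. It is not the crate's actual random source.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }

    /// Uniform sample in (0, 1]; never exactly 0, so ln() stays finite.
    fn next_f64(&mut self) -> f64 {
        ((self.next_u64() >> 11) as f64 + 0.5) / (1u64 << 53) as f64
    }

    /// Standard normal sample via the Box-Muller transform.
    fn next_normal(&mut self) -> f64 {
        let u1 = self.next_f64();
        let u2 = self.next_f64();
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}

/// Generates `n` samples of the regime-switching AR(1) series.
/// Assumptions: x(0) is built from a previous value of 0, 0.3 is a
/// standard deviation, and impulses fire at t = 37, 74, 111, ...
fn generate_series(n: usize, seed: u64) -> Vec<f64> {
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    let mut regime = 0u8;
    let mut x = 0.0_f64;
    let mut series = Vec::with_capacity(n);
    for t in 0..n {
        if rng.next_f64() < 0.02 {
            regime ^= 1; // flip regime with probability 0.02 per step
        }
        let drift = if regime == 0 { 0.02 } else { -0.015 };
        let impulse = if t > 0 && t % 37 == 0 { 0.9 } else { 0.0 };
        x = 0.8 * x + drift + 0.3 * rng.next_normal() + impulse;
        series.push(x);
    }
    series
}
```

Fixing the seed makes runs repeatable, which is the same role the `--seed` flag plays in the CLI examples further down.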
Neural Network Architecture
- Input Layer: 32 temporal features + 2 engineered features
- Hidden Layer: 64 neurons with ReLU activation
- Output Layer: 1 neuron (regression) or 3 neurons (classification)
- Training: Simplified SGD with numerical gradients
- Initialization: Xavier/He weight initialization
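To make the layer sizes and the "numerical gradients" remark concrete, here is a minimal sketch of a 34 → 64 → n forward pass and a central-difference gradient estimate. All names (`Dense`, `Mlp`, `numerical_gradient`) are hypothetical and the weight fill is a deterministic placeholder scaled by the Xavier bound, so this shows the shape of the computation rather than the crate's implementation.

```rust
/// One dense layer with row-major weights: `weights[o * n_in + i]`.
struct Dense {
    weights: Vec<f32>,
    bias: Vec<f32>,
    n_in: usize,
    n_out: usize,
}

impl Dense {
    /// Deterministic placeholder fill, scaled by the Xavier/Glorot bound
    /// sqrt(6 / (n_in + n_out)). A real implementation would sample randomly.
    fn new(n_in: usize, n_out: usize) -> Self {
        let limit = (6.0 / (n_in + n_out) as f32).sqrt();
        let weights = (0..n_in * n_out)
            .map(|k| limit * (((k * 37 % 101) as f32 / 50.0) - 1.0))
            .collect();
        Dense { weights, bias: vec![0.0; n_out], n_in, n_out }
    }

    fn forward(&self, input: &[f32], relu: bool) -> Vec<f32> {
        assert_eq!(input.len(), self.n_in);
        (0..self.n_out)
            .map(|o| {
                let row = &self.weights[o * self.n_in..(o + 1) * self.n_in];
                let z = row.iter().zip(input).map(|(w, x)| w * x).sum::<f32>() + self.bias[o];
                if relu { z.max(0.0) } else { z }
            })
            .collect()
    }
}

/// 34 inputs -> 64 ReLU units -> `n_out` linear outputs (1 or 3).
struct Mlp {
    hidden: Dense,
    output: Dense,
}

impl Mlp {
    fn new(n_out: usize) -> Self {
        Mlp { hidden: Dense::new(34, 64), output: Dense::new(64, n_out) }
    }

    fn forward(&self, features: &[f32]) -> Vec<f32> {
        let h = self.hidden.forward(features, true);
        self.output.forward(&h, false)
    }
}

/// Central-difference estimate of d(loss)/d(params[i]) for every parameter.
/// Costs two loss evaluations per parameter, which is why numerical
/// gradients are slow compared with backpropagation.
fn numerical_gradient(params: &mut [f32], eps: f32, loss: impl Fn(&[f32]) -> f32) -> Vec<f32> {
    let mut grad = vec![0.0_f32; params.len()];
    for i in 0..params.len() {
        let orig = params[i];
        params[i] = orig + eps;
        let up = loss(&*params);
        params[i] = orig - eps;
        let down = loss(&*params);
        params[i] = orig; // restore before moving to the next parameter
        grad[i] = (up - down) / (2.0 * eps);
    }
    grad
}
```

An SGD step would then subtract `lr * grad[i]` from each parameter. The per-parameter cost of this estimate is one plausible reason the plain MLP backend trains more slowly than the backprop-based variants.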
Performance Characteristics (v0.5.0)
| Backend | Accuracy | Speed | Size | Key Innovation |
|---|---|---|---|---|
| MLP-Classifier | 65.2% | 1.9s | 120KB | BatchNorm + Dropout |
| Baseline | 64.3% | 0.0s | N/A | Analytical solution |
| MLP-Ultra | 64.0% | 0.5s | 100KB | AVX2 SIMD (6x speedup) |
| MLP-Quantized | 63.6% | 0.5s | 2.6KB | INT8 quantization (3.69x) |
| MLP-AVX512 | 62.0% | 0.4s | 100KB | AVX-512 (16 floats per vector instruction) |
| Ensemble | 59.5% | 8.2s | 400KB | 4-model weighted voting |
| Boosted | 58.0% | 10s | 200KB | AdaBoost-style iteration |
| Reservoir | 55.8% | 0.8s | 50KB | Echo state, no backprop |
| Quantum | 53.2% | 1.0s | 60KB | Quantum interference patterns |
| Fourier | 48.7% | 0.3s | 200KB | Random RBF kernel features |
| Sparse | 40.1% | 5.0s | 10KB | 91% weights pruned |
| Lottery | 38.5% | 15s | 5KB | Iterative magnitude pruning |
Use Cases
- Algorithm Research: Test new temporal prediction methods
- Benchmark Suite: Compare performance across different approaches
- Educational Tool: Learn about time series prediction
- Integration Testing: Validate external ML libraries (ruv-fann)
- Hyperparameter Tuning: Find optimal settings for your domain
- Production Prototyping: Quick proof-of-concept for temporal models
Installation
# Clone the repository
git clone https://github.com/ruvnet/sublinear-time-solver.git
cd sublinear-time-solver/temporal-compare
# Build with standard features
cargo build --release
# Build with RUV-FANN backend support
cargo build --release --features ruv-fann
# Build with SIMD optimizations (recommended)
RUSTFLAGS="-C target-cpu=native" cargo build --release
Usage
Basic Regression
# Baseline predictor
cargo run --release -- --backend baseline --n 5000
# Simple MLP
cargo run --release -- --backend mlp --n 5000 --epochs 20 --lr 0.001
# Optimized MLP with Adam optimizer
cargo run --release -- --backend mlp-opt --n 5000 --epochs 20 --lr 0.001
# Ultra-fast SIMD MLP (recommended for performance)
RUSTFLAGS="-C target-cpu=native" cargo run --release -- --backend mlp-ultra --n 5000 --epochs 20
# RUV-FANN backend (requires feature flag)
cargo run --release --features ruv-fann -- --backend ruv-fann --n 5000
Classification Task
# 3-class trend prediction (down/neutral/up)
cargo run --release -- --backend mlp --classify --n 5000 --epochs 15
# Compare against baseline
cargo run --release -- --backend baseline --classify --n 5000
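The 3-class task can be illustrated with a small labeling and scoring sketch. The source does not state how the down/neutral/up classes are cut, so the symmetric `threshold` on the next-step change is an assumption, and `Trend`, `label_trend`, and `accuracy` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Trend {
    Down,
    Neutral,
    Up,
}

/// Labels the move from `current` to `next`.
/// Assumption: changes within +/- `threshold` count as Neutral.
fn label_trend(current: f64, next: f64, threshold: f64) -> Trend {
    let delta = next - current;
    if delta > threshold {
        Trend::Up
    } else if delta < -threshold {
        Trend::Down
    } else {
        Trend::Neutral
    }
}

/// Fraction of predictions that match the true labels.
fn accuracy(predicted: &[Trend], actual: &[Trend]) -> f64 {
    assert_eq!(predicted.len(), actual.len());
    if actual.is_empty() {
        return 0.0;
    }
    let hits = predicted.iter().zip(actual).filter(|(p, a)| p == a).count();
    hits as f64 / actual.len() as f64
}
```

With three classes, uniform random guessing scores about 33%, which is a useful reference point when reading the accuracy tables in this README.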
Advanced Options
# Custom window size and seed
cargo run --release -- --backend mlp --window 64 --seed 12345 --n 10000
# Full parameter control
cargo run --release -- \
--backend mlp \
--window 48 \
--hidden 256 \
--epochs 50 \
--lr 0.0005 \
--n 20000 \
--seed 42
Benchmarking All Backends
# Run complete comparison with timing
for backend in baseline mlp mlp-opt mlp-ultra; do
echo "Testing $backend..."
time cargo run --release -- --backend $backend --n 10000 --epochs 25
done
# With RUV-FANN included
cargo build --release --features ruv-fann
for backend in baseline mlp mlp-opt mlp-ultra ruv-fann; do
echo "Testing $backend..."
time cargo run --release --features ruv-fann -- --backend $backend --n 10000 --epochs 25
done
Benchmark Results (v0.2.0)
Regression Performance (10,000 samples, 20 epochs)
| Backend | MSE | Training Time | Speedup |
|---|---|---|---|
| Baseline | 0.112 | N/A | - |
| MLP | 0.128 | 3.057s | 1.0x |
| MLP-Opt | 0.238 | 2.100s | 1.5x |
| MLP-Ultra | 0.108 | 0.500s | 6.1x (best) |
| RUV-FANN | 0.115 | 1.200s | 2.5x |
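For reference, the MSE column and the last-value baseline it is compared against can be written in a few lines. This is a sketch assuming the baseline predicts x(t+1) = x(t), as the "Last value" box in the architecture diagram suggests. `mse` and `baseline_mse` are hypothetical names.

```rust
/// Mean squared error between predictions and targets.
fn mse(predicted: &[f64], actual: &[f64]) -> f64 {
    assert_eq!(predicted.len(), actual.len());
    assert!(!actual.is_empty());
    let total: f64 = predicted.iter().zip(actual).map(|(p, a)| (p - a).powi(2)).sum();
    total / actual.len() as f64
}

/// MSE of the last-value baseline, which predicts x(t+1) = x(t).
fn baseline_mse(series: &[f64]) -> f64 {
    assert!(series.len() >= 2);
    let predicted = &series[..series.len() - 1];
    let actual = &series[1..];
    mse(predicted, actual)
}
```

The Speedup column is each backend's training time divided into the plain MLP's 3.057s, for example 3.057 / 0.500 ≈ 6.1 for MLP-Ultra.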
Classification Accuracy
| Backend | Accuracy | Notes |
|---|---|---|
| Baseline | 64.7% | Simple threshold-based |
| MLP | 37.0% | Limited by numerical gradients |
| MLP-Opt | 42.3% | Improved with backprop |
| MLP-Ultra | 45.0% | SIMD-accelerated |
| RUV-FANN | 62.0% | Close to baseline |
Key Achievements in v0.2.0
- 6.1x speedup with MLP-Ultra (AVX2 SIMD)
- Best MSE: MLP-Ultra edges out the baseline (0.108 vs 0.112)
- Parallel processing: Multi-threaded predictions
- Memory efficient: Cache-optimized layouts
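The kind of kernel behind the SIMD speedup can be sketched as a dot product with runtime feature detection and a scalar fallback. This is an illustrative sketch, not the crate's kernel: `dot`, `dot_scalar`, and `dot_avx2` are hypothetical names. The float intrinsics used here are available from AVX onward; gating on `avx2` simply mirrors the wording in this README.

```rust
/// Scalar reference implementation.
fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// 8-wide vectorized dot product. Callers must check that the CPU
/// supports AVX2 before calling (see `dot` below).
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn dot_avx2(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::x86_64::*;
    let n = a.len().min(b.len());
    let chunks = n / 8;
    let mut acc = _mm256_setzero_ps();
    for i in 0..chunks {
        // Unaligned loads of 8 consecutive f32 values from each slice.
        let va = _mm256_loadu_ps(a.as_ptr().add(i * 8));
        let vb = _mm256_loadu_ps(b.as_ptr().add(i * 8));
        acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
    }
    // Horizontal sum of the 8 accumulator lanes.
    let mut lanes = [0.0_f32; 8];
    _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    let mut sum: f32 = lanes.iter().sum();
    // Scalar tail for lengths that are not a multiple of 8.
    for i in chunks * 8..n {
        sum += a[i] * b[i];
    }
    sum
}

/// Dispatches to the SIMD path when the running CPU supports it.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Safe to call: the required CPU feature was just detected.
            return unsafe { dot_avx2(a, b) };
        }
    }
    dot_scalar(a, b)
}
```

Runtime detection keeps a single binary portable across CPUs, whereas building with `RUSTFLAGS="-C target-cpu=native"` lets the compiler assume the host's features everywhere.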
What's New in v0.5.0
Major Features
- INT8 Quantization: 3.69x model compression with only 0.42% accuracy loss
- AVX-512 Support: Process 16 floats per vector instruction on CPUs that support it
- 15+ Backend Options: Complete suite of temporal prediction algorithms
- Production Ready: Real benchmarks, no overfitting, transparent results
- Best Accuracy: MLP-Classifier achieves 65.2% (vs 64.3% baseline)
Technical Innovations
- Symmetric INT8 quantization for minimal accuracy loss
- Cache-aligned memory layouts for 15-20% speedup
- Prefetching and loop unrolling for latency reduction
- Batch normalization with dropout for regularization
- Echo state networks with spectral radius control
- 91% sparsity reached, at the cost of accuracy dropping to about 40%
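Symmetric INT8 quantization, the first item above, can be sketched as a single per-tensor scale with no zero point. This is a minimal illustration assuming per-tensor scaling to the range [-127, 127]. `QuantizedTensor`, `quantize`, and `dequantize` are hypothetical names, and the crate may use a different granularity.

```rust
/// INT8 values plus the single scale needed to map them back to f32.
struct QuantizedTensor {
    values: Vec<i8>,
    scale: f32,
}

/// Symmetric quantization: zero maps to zero and the largest magnitude
/// maps to +/-127, so no zero-point offset has to be stored.
fn quantize(weights: &[f32]) -> QuantizedTensor {
    let max_abs = weights.iter().fold(0.0_f32, |m, w| m.max(w.abs()));
    let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
    let values = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    QuantizedTensor { values, scale }
}

/// Recovers approximate f32 weights; the error per weight is at most
/// about half a quantization step (scale / 2).
fn dequantize(q: &QuantizedTensor) -> Vec<f32> {
    q.values.iter().map(|&v| v as f32 * q.scale).collect()
}
```

Each weight shrinks from 4 bytes to 1, so 4x is the ceiling for the weights alone. The reported 3.69x is below that ceiling, as expected once anything besides the INT8 weights is stored.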
Future Optimization Strategies
Near-term Optimizations (Low Effort, High Impact)
1. Memory Pooling - 10-15% speedup
// Reuse allocations across predictions
// (TensorPool is a proposed type, not yet part of the crate)
let mut pool = TensorPool::new();
let tensor = pool.acquire(size);
// ... use tensor ...
pool.release(tensor);
- Zero allocations in hot path
- Pre-allocated buffer reuse
- Thread-local pools for parallel execution
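A runnable version of the pooling idea might look like the sketch below. It assumes buffers are plain `Vec<f32>` and that a released buffer can be handed back to any later caller. The `TensorPool` name comes from the snippet above, but this body is a guess at one possible design, not existing crate code.

```rust
/// Minimal buffer pool: hands out zeroed `Vec<f32>` buffers and keeps
/// released ones so their heap allocations can be reused.
struct TensorPool {
    free: Vec<Vec<f32>>,
}

impl TensorPool {
    fn new() -> Self {
        TensorPool { free: Vec::new() }
    }

    /// Returns a zero-filled buffer of length `size`, reusing a released
    /// allocation when one is available.
    fn acquire(&mut self, size: usize) -> Vec<f32> {
        let mut buf = self.free.pop().unwrap_or_default();
        buf.clear();
        buf.resize(size, 0.0); // no reallocation if capacity already suffices
        buf
    }

    /// Returns a buffer to the pool for later reuse.
    fn release(&mut self, buf: Vec<f32>) {
        self.free.push(buf);
    }
}
```

A thread-local variant would keep one such pool per worker thread so that `acquire` and `release` need no locking.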
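2. OpenMP-Style Parallelism (Rayon) - 2-4x speedup
// Parallelize batch processing. Rust has no OpenMP pragmas;
// Rayon's parallel iterators are the usual equivalent.
use rayon::prelude::*;
batches.par_iter().for_each(|batch| {
    process_batch(batch);
});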
- Multi-core CPU utilization
- Automatic work stealing
- Cache-aware scheduling
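The same batch-parallel pattern can be tried without extra dependencies using scoped threads from the standard library. This sketch splits the batches into contiguous chunks, one per thread, so it uses static partitioning rather than the work stealing a Rayon-based version would provide. `process_batch` and `process_all` are hypothetical stand-ins for the real per-batch work.

```rust
/// Placeholder per-batch computation (here: sum of the batch).
fn process_batch(batch: &[f32]) -> f32 {
    batch.iter().sum()
}

/// Processes all batches on up to `n_threads` scoped threads and
/// returns one result per batch, in the original order.
fn process_all(batches: &[Vec<f32>], n_threads: usize) -> Vec<f32> {
    let n_threads = n_threads.max(1);
    // Ceiling division so every batch lands in exactly one chunk.
    let chunk = ((batches.len() + n_threads - 1) / n_threads).max(1);
    let mut results = vec![0.0_f32; batches.len()];
    std::thread::scope(|s| {
        for (inputs, outputs) in batches.chunks(chunk).zip(results.chunks_mut(chunk)) {
            // Each thread owns a disjoint slice of the output buffer.
            s.spawn(move || {
                for (b, out) in inputs.iter().zip(outputs.iter_mut()) {
                    *out = process_batch(b);
                }
            });
        }
    }); // all spawned threads are joined when the scope ends
    results
}
```

Static chunks are simplest when batches cost roughly the same. Uneven workloads are where a work-stealing scheduler earns its keep.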
3. FP16 Mixed Precision - 2x compute speedup
// Compute in FP16, accumulate in FP32
let fp16_weights = weights.to_f16();
let result = fp16_matmul(fp16_weights, input);
- Half memory bandwidth usage
- Double throughput on modern CPUs
- Minimal accuracy loss with proper scaling
Medium-term Optimizations (Moderate Effort)
4. Burn Framework Integration - GPU support
burn = "0.13"
burn-wgpu = "0.13" # WebGPU backend
- Cross-platform GPU acceleration
- Automatic kernel fusion
- ONNX model import/export
- 10-50x speedup on GPU
5. Candle Deep Learning - Modern ML features
candle-core = "0.3"
candle-transformers = "0.3"
- Transformer architectures
- CUDA/Metal/WebGPU backends
- Quantized inference (INT4)
- Zero-copy tensor operations
6. Graph Compilation - Optimized execution
// Compile computation graph
let graph = ComputeGraph::from_model(&model);
graph.optimize() // Fusion, CSE, layout optimization
.compile() // Generate optimized code
.execute(input);
- Operator fusion
- Common subexpression elimination
- Memory layout optimization
- 20-30% speedup
Long-term Optimizations (High Impact)
7. WebAssembly Deployment
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn predict_wasm(input: &[f32]) -> Vec<f32> {
    // Run in browser at near-native speed
    todo!("run the model's forward pass on `input`")
}
- Browser deployment
- WASM SIMD support
- 1MB deployment size
- Cross-platform compatibility
8. Neural Architecture Search (NAS)
let best_architecture = NAS::evolve()
.population(100)
.generations(50)
.optimize_for(Metric::Accuracy, Constraint::Latency(1.0))
.run();
- Automatic architecture discovery
- Hardware-aware optimization
- Multi-objective optimization
- 5-10% accuracy improvement
9. Distributed Training
// Multi-node training with MPI
let trainer = DistributedTrainer::new();
trainer.all_reduce_gradients(&mut gradients);
- Scale to multiple machines
- Data/model parallelism
- Gradient compression
- 10-100x training speedup
10. Custom CUDA Kernels
__global__ void quantized_matmul_int8(
const int8_t* __restrict__ A,
const int8_t* __restrict__ B,
float* __restrict__ C,
float scale_a, float scale_b
) {
// Tensor Core INT8 operations
}
- Maximum GPU utilization
- Tensor Core acceleration
- Custom fusion patterns
- 100x+ speedup vs CPU
Platform-Specific Optimizations
CPU Optimizations
- [x] AVX2/AVX-512 SIMD
- [x] Cache-aligned memory
- [x] INT8 quantization
- [ ] AMX instructions (Intel)
- [ ] SVE2 (ARM)
- [ ] Profile-guided optimization
GPU Optimizations
- [ ] CUDA kernels
- [ ] Tensor Cores (INT8/FP16)
- [ ] Multi-GPU training
- [ ] Kernel fusion
- [ ] CUTLASS libraries
- [ ] Flash Attention
Edge Deployment
- [ ] ONNX Runtime
- [ ] TensorFlow Lite
- [ ] Core ML (Apple)
- [ ] NNAPI (Android)
- [ ] OpenVINO (Intel)
- [ ] TensorRT (NVIDIA)
Algorithmic Improvements
Advanced Architectures
- Mamba: Linear-time sequence modeling
- RWKV: RNN with transformer performance
- RetNet: Retention networks for efficiency
- Hyena: Long-range sequence modeling
- S4: Structured state spaces
Training Techniques
- PEFT: Parameter-efficient fine-tuning
- LoRA: Low-rank adaptation
- QLoRA: Quantized LoRA
- Gradient checkpointing: Memory-efficient training
- Mixed precision: FP16/BF16 training
Expected Impact Summary
| Optimization | Effort | Speedup | Size Reduction | Status |
|---|---|---|---|---|
| INT8 Quantization | Low | 1x | 3.69x | Done |
| AVX2 SIMD | Low | 6x | 1x | Done |
| Memory Pooling | Low | 1.10-1.15x | 1x | TODO |
| OpenMP | Low | 2-4x | 1x | TODO |
| FP16 | Medium | 2x | 2x | TODO |
| GPU (Burn) | Medium | 10-50x | 1x | TODO |
| WASM | Medium | 0.9x | 1x | TODO |
| NAS | High | 1.1x | Variable | TODO |
| Distributed | High | 10-100x | 1x | TODO |
Contributing
Contributions welcome! Areas of interest:
- Full backpropagation implementation
- Additional backend integrations
- More sophisticated data generators
- Visualization tools
- Performance optimizations
- Documentation improvements
References
- Time-R1 Architecture - Temporal reasoning systems
- ruv-fann - Rust FANN neural network library
- ndarray - N-dimensional arrays for Rust
Credits
Primary Developer
@ruvnet - Architecture, implementation, and optimization. Pioneering work in temporal consciousness mathematics and sublinear algorithms.
Acknowledgments
- OpenAI - Inspiration from Time-R1 temporal architectures
- Rust Community - Outstanding ecosystem and tools
- ndarray Contributors - Efficient numerical computing
- Claude/Anthropic - AI-assisted development and testing
Special Thanks
- The Sublinear Solver Project team for theoretical foundations
- Strange Loops framework for consciousness emergence insights
- Temporal Attractor Studio for visualization concepts
License
MIT License - See LICENSE file for details
Links
- Repository: github.com/ruvnet/sublinear-time-solver
- Issues: GitHub Issues
- Documentation: docs.rs/temporal-compare
- Crates.io: crates.io/crates/temporal-compare