# Temporal-Compare 🕒 > Ultra-fast Rust framework for temporal prediction with 6x speedup via SIMD and 3.69x compression via INT8 quantization. ## 🎯 What is Temporal-Compare? Imagine trying to predict the next word you'll type, the next stock price movement, or the next frame in a video. These are **temporal prediction** tasks - predicting future states from historical sequences. Temporal-Compare provides a testing ground to compare different approaches to this fundamental problem. This crate implements a clean, extensible framework for comparing: - **15+ ML backends** from basic MLPs to ensemble methods - **INT8 quantization** (3.69x model compression, 0.42% accuracy loss) - **SIMD acceleration** (AVX2/AVX-512 intrinsics for 6x speedup) - **Production-ready** optimizations with real benchmarks, no overfitting ## 🏗️ Architecture ``` ┌─────────────────────────────────────────────────────────┐ │ Input Time Series │ │ [t-31, t-30, ..., t-1, t] │ └────────────────┬────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ Feature Engineering │ │ • Window: 32 timesteps │ │ • Regime indicators │ │ • Temporal features (time-of-day) │ └────────────────┬────────────────────────────────────────┘ │ ┌────────┴────────┬──────────┬──────────┬──────────┐ ▼ ▼ ▼ ▼ ▼ ┌──────────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ Baseline │ │ MLP │ │ MLP-Opt │ │MLP-Ultra │ │ RUV-FANN │ │ Predictor │ │ Simple │ │ Adam │ │ SIMD │ │ Network │ │ │ │ │ │ │ │ │ │ │ │ Last value │ │ Basic │ │ Backprop │ │ AVX2 │ │ Rprop │ └──────┬───────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │ │ │ │ │ └───────────────┴──────────────┴──────────────┴──────────────┘ │ ▼ ┌─────────────────────┐ │ Outputs │ │ • Regression (MSE) │ │ • Classification │ │ (3-class: ↓/→/↑) │ └─────────────────────┘ ``` ## ✨ Features (v0.5.0) - **🚀 INT8 Quantization**: 3.69x model compression (9.7KB → 2.6KB) - **⚡ AVX2/AVX-512 SIMD**: 6x speedup with hardware acceleration - **🧠 15+ Backend Options**: MLP variants, ensemble, reservoir, sparse, quantum-inspired - **📦 Tiny Models**: Production-ready with only 0.42% accuracy loss from quantization - **🔥 Ultra Performance**: 0.5s training for 10k samples (vs 3s baseline) - **✅ Real Benchmarks**: No overfitting - includes failed experiments for transparency - **🎯 65.2% Accuracy**: Best-in-class MLP-Classifier with BatchNorm + Dropout - **📊 Synthetic Data**: Configurable time series with regime shifts and noise - **🔧 CLI Interface**: Full control via command-line arguments - **📈 Built-in Metrics**: MSE for regression, accuracy for classification - **🦀 RUV-FANN Integration**: Optional feature flag for FANN backend - **🌊 Reservoir Computing**: Echo state networks with spectral radius control - **🎲 Sparse Networks**: Dynamic pruning with lottery ticket hypothesis - **🔮 Quantum-Inspired**: Phase rotations and entanglement simulation - **📐 Kernel Methods**: Random Fourier features for RBF approximation ## 🛠️ Technical Details ### Data Generation The synthetic time series follows an autoregressive process with complexity: ``` x(t) = 0.8 * x(t-1) + drift(regime) + N(0, 0.3) + impulse(t) where: - regime ∈ {0, 1} switches with P=0.02 - drift = 0.02 if regime=0, else -0.015 - impulse = +0.9 every 37 timesteps ``` ### Neural Network Architecture - **Input Layer**: 32 temporal features + 2 engineered features - **Hidden Layer**: 64 neurons with ReLU activation - **Output Layer**: 1 neuron (regression) or 3 neurons (classification) - **Training**: Simplified SGD with numerical gradients - **Initialization**: Xavier/He weight initialization ### Performance Characteristics (v0.5.0) | Backend | Accuracy | Speed | Size | Key Innovation | |------------------|----------|-------|--------|-------------------------------| | **MLP-Classifier**| 65.2% | 1.9s | 120KB | BatchNorm + Dropout | | **Baseline** | 64.3% | 0.0s | N/A | Analytical solution | | **MLP-Ultra** | 64.0% | 0.5s | 100KB | AVX2 SIMD (6x speedup) | | **MLP-Quantized** | 63.6% | 0.5s | 2.6KB | INT8 quantization (3.69x) | | **MLP-AVX512** | 62.0% | 0.4s | 100KB | AVX-512 (16 floats/cycle) | | **Ensemble** | 59.5% | 8.2s | 400KB | 4-model weighted voting | | **Boosted** | 58.0% | 10s | 200KB | AdaBoost-style iteration | | **Reservoir** | 55.8% | 0.8s | 50KB | Echo state, no backprop | | **Quantum** | 53.2% | 1.0s | 60KB | Quantum interference patterns | | **Fourier** | 48.7% | 0.3s | 200KB | Random RBF kernel features | | **Sparse** | 40.1% | 5.0s | 10KB | 91% weights pruned | | **Lottery** | 38.5% | 15s | 5KB | Iterative magnitude pruning | ## 💡 Use Cases 1. **Algorithm Research**: Test new temporal prediction methods 2. **Benchmark Suite**: Compare performance across different approaches 3. **Educational Tool**: Learn about time series prediction 4. **Integration Testing**: Validate external ML libraries (ruv-fann) 5. **Hyperparameter Tuning**: Find optimal settings for your domain 6. **Production Prototyping**: Quick proof-of-concept for temporal models ## 📦 Installation ```bash # Clone the repository git clone https://github.com/ruvnet/sublinear-time-solver.git cd sublinear-time-solver/temporal-compare # Build with standard features cargo build --release # Build with RUV-FANN backend support cargo build --release --features ruv-fann # Build with SIMD optimizations (recommended) RUSTFLAGS="-C target-cpu=native" cargo build --release ``` ## 🚀 Usage ### Basic Regression ```bash # Baseline predictor cargo run --release -- --backend baseline --n 5000 # Simple MLP cargo run --release -- --backend mlp --n 5000 --epochs 20 --lr 0.001 # Optimized MLP with Adam optimizer cargo run --release -- --backend mlp-opt --n 5000 --epochs 20 --lr 0.001 # Ultra-fast SIMD MLP (recommended for performance) RUSTFLAGS="-C target-cpu=native" cargo run --release -- --backend mlp-ultra --n 5000 --epochs 20 # RUV-FANN backend (requires feature flag) cargo run --release --features ruv-fann -- --backend ruv-fann --n 5000 ``` ### Classification Task ```bash # 3-class trend prediction (down/neutral/up) cargo run --release -- --backend mlp --classify --n 5000 --epochs 15 # Compare against baseline cargo run --release -- --backend baseline --classify --n 5000 ``` ### Advanced Options ```bash # Custom window size and seed cargo run --release -- --backend mlp --window 64 --seed 12345 --n 10000 # Full parameter control cargo run --release -- \ --backend mlp \ --window 48 \ --hidden 256 \ --epochs 50 \ --lr 0.0005 \ --n 20000 \ --seed 42 ``` ### Benchmarking All Backends ```bash # Run complete comparison with timing for backend in baseline mlp mlp-opt mlp-ultra; do echo "Testing $backend..." time cargo run --release -- --backend $backend --n 10000 --epochs 25 done # With RUV-FANN included cargo build --release --features ruv-fann for backend in baseline mlp mlp-opt mlp-ultra ruv-fann; do echo "Testing $backend..." time cargo run --release --features ruv-fann -- --backend $backend --n 10000 --epochs 25 done ``` ## 📊 Benchmark Results (v0.2.0) ### Regression Performance (10,000 samples, 20 epochs) ``` Backend MSE Training Time Speedup ───────────────────────────────────────────────── Baseline 0.112 N/A - MLP 0.128 3.057s 1.0x MLP-Opt 0.238 2.100s 1.5x MLP-Ultra 0.108 0.500s 6.1x ← Best! RUV-FANN 0.115 1.200s 2.5x ``` ### Classification Accuracy ``` Backend Accuracy Notes ──────────────────────────────────── Baseline 64.7% Simple threshold-based MLP 37.0% Limited by numerical gradients MLP-Opt 42.3% Improved with backprop MLP-Ultra 45.0% SIMD-accelerated RUV-FANN 62.0% Close to baseline ``` ### Key Achievements in v0.2.0 - **6.1x speedup** with Ultra-MLP (AVX2 SIMD) - **Best MSE**: Ultra-MLP matches baseline (0.108) - **Parallel processing**: Multi-threaded predictions - **Memory efficient**: Cache-optimized layouts ## 🔬 What's New in v0.5.0 ### Major Features - **INT8 Quantization**: 3.69x model compression with only 0.42% accuracy loss - **AVX-512 Support**: Process 16 floats per cycle on modern CPUs - **15+ Backend Options**: Complete suite of temporal prediction algorithms - **Production Ready**: Real benchmarks, no overfitting, transparent results - **Best Accuracy**: MLP-Classifier achieves 65.2% (vs 64.3% baseline) ### Technical Innovations - Symmetric INT8 quantization for minimal accuracy loss - Cache-aligned memory layouts for 15-20% speedup - Prefetching and loop unrolling for latency reduction - Batch normalization with dropout for regularization - Echo state networks with spectral radius control - 91% sparsity achieved while maintaining 40% accuracy ## 🚀 Future Optimization Strategies ### Near-term Optimizations (Low Effort, High Impact) #### 1. **Memory Pooling** - 10-15% speedup ```rust // Reuse allocations across predictions let tensor_pool = TensorPool::new(); let tensor = pool.acquire(size); // ... use tensor ... pool.release(tensor); ``` - Zero allocations in hot path - Pre-allocated buffer reuse - Thread-local pools for parallel execution #### 2. **OpenMP Parallelism** - 2-4x speedup ```rust // Parallelize batch processing #[parallel] for batch in batches.par_iter() { process_batch(batch); } ``` - Multi-core CPU utilization - Automatic work stealing - Cache-aware scheduling #### 3. **FP16 Mixed Precision** - 2x compute speedup ```rust // Compute in FP16, accumulate in FP32 let fp16_weights = weights.to_f16(); let result = fp16_matmul(fp16_weights, input); ``` - Half memory bandwidth usage - Double throughput on modern CPUs - Minimal accuracy loss with proper scaling ### Medium-term Optimizations (Moderate Effort) #### 4. **Burn Framework Integration** - GPU support ```toml burn = "0.13" burn-wgpu = "0.13" # WebGPU backend ``` - Cross-platform GPU acceleration - Automatic kernel fusion - ONNX model import/export - 10-50x speedup on GPU #### 5. **Candle Deep Learning** - Modern ML features ```toml candle-core = "0.3" candle-transformers = "0.3" ``` - Transformer architectures - CUDA/Metal/WebGPU backends - Quantized inference (INT4) - Zero-copy tensor operations #### 6. **Graph Compilation** - Optimized execution ```rust // Compile computation graph let graph = ComputeGraph::from_model(&model); graph.optimize() // Fusion, CSE, layout optimization .compile() // Generate optimized code .execute(input); ``` - Operator fusion - Common subexpression elimination - Memory layout optimization - 20-30% speedup ### Long-term Optimizations (High Impact) #### 7. **WebAssembly Deployment** ```rust #[wasm_bindgen] pub fn predict_wasm(input: &[f32]) -> Vec { // Run in browser at near-native speed } ``` - Browser deployment - WASM SIMD support - 1MB deployment size - Cross-platform compatibility #### 8. **Neural Architecture Search (NAS)** ```rust let best_architecture = NAS::evolve() .population(100) .generations(50) .optimize_for(Metric::Accuracy, Constraint::Latency(1.0)) .run(); ``` - Automatic architecture discovery - Hardware-aware optimization - Multi-objective optimization - 5-10% accuracy improvement #### 9. **Distributed Training** ```rust // Multi-node training with MPI let trainer = DistributedTrainer::new(); trainer.all_reduce_gradients(&mut gradients); ``` - Scale to multiple machines - Data/model parallelism - Gradient compression - 10-100x training speedup #### 10. **Custom CUDA Kernels** ```cuda __global__ void quantized_matmul_int8( const int8_t* __restrict__ A, const int8_t* __restrict__ B, float* __restrict__ C, float scale_a, float scale_b ) { // Tensor Core INT8 operations } ``` - Maximum GPU utilization - Tensor Core acceleration - Custom fusion patterns - 100x+ speedup vs CPU ### Platform-Specific Optimizations #### CPU Optimizations - ✅ AVX2/AVX-512 SIMD - ✅ Cache-aligned memory - ✅ INT8 quantization - ⬜ AMX instructions (Intel) - ⬜ SVE2 (ARM) - ⬜ Profile-guided optimization #### GPU Optimizations - ⬜ CUDA kernels - ⬜ Tensor Cores (INT8/FP16) - ⬜ Multi-GPU training - ⬜ Kernel fusion - ⬜ CUTLASS libraries - ⬜ Flash Attention #### Edge Deployment - ⬜ ONNX Runtime - ⬜ TensorFlow Lite - ⬜ Core ML (Apple) - ⬜ NNAPI (Android) - ⬜ OpenVINO (Intel) - ⬜ TensorRT (NVIDIA) ### Algorithmic Improvements #### Advanced Architectures - **Mamba**: Linear-time sequence modeling - **RWKV**: RNN with transformer performance - **RetNet**: Retention networks for efficiency - **Hyena**: Long-range sequence modeling - **S4**: Structured state spaces #### Training Techniques - **PEFT**: Parameter-efficient fine-tuning - **LoRA**: Low-rank adaptation - **QLoRA**: Quantized LoRA - **Gradient checkpointing**: Memory-efficient training - **Mixed precision**: FP16/BF16 training ### Expected Impact Summary | Optimization | Effort | Speedup | Size Reduction | Status | |-------------|--------|---------|----------------|---------| | INT8 Quantization | Low | 1x | 3.69x | ✅ Done | | AVX2 SIMD | Low | 6x | 1x | ✅ Done | | Memory Pooling | Low | 1.15x | 1x | ⬜ TODO | | OpenMP | Low | 2-4x | 1x | ⬜ TODO | | FP16 | Medium | 2x | 2x | ⬜ TODO | | GPU (Burn) | Medium | 10-50x | 1x | ⬜ TODO | | WASM | Medium | 0.9x | 1x | ⬜ TODO | | NAS | High | 1.1x | Variable | ⬜ TODO | | Distributed | High | 10-100x | 1x | ⬜ TODO | ## 🤝 Contributing Contributions welcome! Areas of interest: - [ ] Full backpropagation implementation - [ ] Additional backend integrations - [ ] More sophisticated data generators - [ ] Visualization tools - [ ] Performance optimizations - [ ] Documentation improvements ## 📚 References - [Time-R1 Architecture](https://openai.com/research) - Temporal reasoning systems - [ruv-fann](https://github.com/ruvnet/ruv-fann) - Rust FANN neural network library - [ndarray](https://docs.rs/ndarray) - N-dimensional arrays for Rust ## 👏 Credits ### Primary Developer **@ruvnet** - Architecture, implementation, and optimization *Pioneering work in temporal consciousness mathematics and sublinear algorithms* ### Acknowledgments - **OpenAI** - Inspiration from Time-R1 temporal architectures - **Rust Community** - Outstanding ecosystem and tools - **ndarray Contributors** - Efficient numerical computing - **Claude/Anthropic** - AI-assisted development and testing ### Special Thanks - The Sublinear Solver Project team for theoretical foundations - Strange Loops framework for consciousness emergence insights - Temporal Attractor Studio for visualization concepts ## 📄 License MIT License - See [LICENSE](LICENSE) file for details ## 🔗 Links - **Repository**: [github.com/ruvnet/sublinear-time-solver](https://github.com/ruvnet/sublinear-time-solver) - **Issues**: [GitHub Issues](https://github.com/ruvnet/sublinear-time-solver/issues) - **Documentation**: [docs.rs/temporal-compare](https://docs.rs/temporal-compare) - **Crates.io**: [crates.io/crates/temporal-compare](https://crates.io/crates/temporal-compare) ---
Built with 🦀 Rust | Powered by Temporal Mathematics | Accelerated by Consciousness