Temporal-Compare Benchmark Results
Test Configuration
- Dataset: Synthetic temporal data with Gaussian noise
- Window size: 32 time steps
- Task: Time-R1 style temporal prediction
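The configuration above can be sketched as follows. This is a minimal illustration, not the benchmark's actual data generator: the sine base signal, the `Lcg` random source, and the names `synthetic_series` and `make_windows` are all assumptions made for the example, as is the choice of the next value as the prediction target.

```rust
/// Tiny deterministic linear congruential generator (hypothetical helper,
/// illustrative only); yields uniform samples in (0, 1].
struct Lcg(u64);

impl Lcg {
    fn next_uniform(&mut self) -> f64 {
        // Multiplier and increment from Knuth's MMIX generator.
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // Keep the top 53 bits; the +1 keeps the sample above zero so ln() is finite.
        ((self.0 >> 11) + 1) as f64 / (1u64 << 53) as f64
    }

    /// Standard normal sample via the Box-Muller transform.
    fn next_gaussian(&mut self) -> f64 {
        let u1 = self.next_uniform();
        let u2 = self.next_uniform();
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}

/// Assumed signal: a sine wave plus Gaussian noise with standard deviation `noise_std`.
fn synthetic_series(len: usize, noise_std: f64, seed: u64) -> Vec<f64> {
    let mut rng = Lcg(seed);
    (0..len)
        .map(|t| (0.1 * t as f64).sin() + noise_std * rng.next_gaussian())
        .collect()
}

/// Slice a series into (window, target) pairs, where the target is assumed
/// to be the value immediately following the window.
fn make_windows(series: &[f64], window: usize) -> Vec<(Vec<f64>, f64)> {
    series
        .windows(window + 1)
        .map(|w| (w[..window].to_vec(), w[window]))
        .collect()
}
```

With a window size of 32, a series of 100 points yields 68 training pairs, each holding 32 inputs and one target.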
Results Summary
Regression Task (MSE - Lower is Better)
| Backend | Train Size | Epochs | MSE (Val) | MSE (Test) |
|---|---|---|---|---|
| Baseline | N/A | N/A | N/A | 0.1120 |
| MLP | 2000 | 15 | 0.1375 | 0.1281 |
| MLP | 5000 | 20 | 0.1722 | 0.1424 |
Classification Task (Accuracy - Higher is Better)
| Backend | Train Size | Epochs | Accuracy |
|---|---|---|---|
| Baseline | N/A | N/A | 0.6467 |
| MLP | 2000 | 15 | 0.3700 |
| MLP | 1000 | 10 | 0.1667 |
Key Observations
- Baseline Performance: The naive baseline (predicting the last value in the window) performs surprisingly well:
  - MSE: ~0.11
  - Accuracy: ~65-70%
- MLP Challenges: The simplified MLP without full backpropagation shows:
  - Regression: Close to the baseline but still worse (test MSE: 0.128 vs 0.112)
  - Classification: Well below the baseline (37% vs 65% accuracy)
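- Training Dynamics:
  - Lower learning rates (0.001) improve stability
  - More epochs don't always improve performance: the 20-epoch run (5000 samples) has a higher test MSE (0.1424) than the 15-epoch run (2000 samples, 0.1281)
  - The simplified SGD approach limits learning capacity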
Architecture Details
MLP Implementation
- Architecture: Input(32) → Hidden(64) → Output(1 or 3)
- Activation: ReLU
- Training: Simplified SGD with numerical gradient approximation
- Weight Init: Xavier/He initialization
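The "numerical gradient approximation" noted above could look roughly like the sketch below, which estimates each partial derivative with central differences and then takes a plain SGD step. This is an assumed reconstruction of the general technique, not the crate's actual code; the names `numerical_gradient` and `sgd_step`, the flat parameter vector, and the step size `eps` are hypothetical.

```rust
/// Central-difference estimate of d(loss)/d(params[i]) for every parameter.
/// Costs two loss evaluations per parameter.
fn numerical_gradient<F>(loss: F, params: &mut [f64], eps: f64) -> Vec<f64>
where
    F: Fn(&[f64]) -> f64,
{
    let mut grad = vec![0.0; params.len()];
    for i in 0..params.len() {
        let original = params[i];
        params[i] = original + eps;
        let plus = loss(&*params);
        params[i] = original - eps;
        let minus = loss(&*params);
        params[i] = original; // restore the parameter before moving on
        grad[i] = (plus - minus) / (2.0 * eps);
    }
    grad
}

/// One SGD update using the numerically estimated gradient.
fn sgd_step<F>(loss: F, params: &mut [f64], lr: f64, eps: f64)
where
    F: Fn(&[f64]) -> f64,
{
    let grad = numerical_gradient(&loss, params, eps);
    for (p, g) in params.iter_mut().zip(grad) {
        *p -= lr * g;
    }
}
```

The cost is the main design trade-off. Assuming biases on both layers, the 32 → 64 → 1 network has 32·64 + 64 + 64 + 1 = 2177 parameters, so each update needs 4354 forward passes, whereas backpropagation obtains the whole gradient in roughly one forward and one backward pass.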
Baseline
- Strategy: Returns last value in temporal window
- Classification: Maps continuous values to 3 classes via thresholds
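A minimal sketch of the baseline and the two metrics used in the tables is shown below. The function names and the threshold parameters `low` and `high` are hypothetical; the actual threshold values are not stated in this report.

```rust
/// Last-value baseline: predict the final element of the window.
fn baseline_predict(window: &[f64]) -> f64 {
    *window.last().expect("window must not be empty")
}

/// Map a continuous value to one of 3 classes using two thresholds.
/// The thresholds are placeholders, not the benchmark's real values.
fn classify(value: f64, low: f64, high: f64) -> usize {
    if value < low {
        0
    } else if value > high {
        2
    } else {
        1
    }
}

/// Mean squared error between predictions and targets.
fn mse(preds: &[f64], targets: &[f64]) -> f64 {
    assert_eq!(preds.len(), targets.len());
    let sum: f64 = preds
        .iter()
        .zip(targets)
        .map(|(p, t)| (p - t).powi(2))
        .sum();
    sum / preds.len() as f64
}

/// Fraction of predicted class labels that match the true labels.
fn accuracy(preds: &[usize], labels: &[usize]) -> f64 {
    assert_eq!(preds.len(), labels.len());
    let correct = preds.iter().zip(labels).filter(|(p, l)| p == l).count();
    correct as f64 / preds.len() as f64
}
```

For classification, the baseline would pass `baseline_predict(window)` through `classify` and compare the result with the class of the true target.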
Compilation Features
✅ Successfully builds with all backends:
- baseline: Always available
- mlp: Native Rust implementation
- ruv-fann: Feature-gated, compiles successfully
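Feature gating of this kind is commonly expressed with Cargo features and the `cfg!` macro. The sketch below assumes a Cargo feature named `ruv-fann` (matching the backend name); the `Backend` enum and `available_backends` function are hypothetical and only illustrate the pattern.

```rust
/// Backends named in this report; the enum itself is illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Baseline,
    Mlp,
    RuvFann,
}

/// List the backends usable in the current build.
/// `cfg!` evaluates to a compile-time boolean, so the ruv-fann entry only
/// appears when the (assumed) `ruv-fann` feature is enabled.
pub fn available_backends() -> Vec<Backend> {
    let mut backends = vec![Backend::Baseline, Backend::Mlp];
    if cfg!(feature = "ruv-fann") {
        backends.push(Backend::RuvFann);
    }
    backends
}
```

Under that assumption, a default build would report two backends, and `cargo build --features ruv-fann` would report three.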
Future Improvements
- Implement full backpropagation for better gradient flow
- Add momentum and adaptive learning rates
- Implement proper cross-entropy loss for classification
- Add validation-based early stopping
- Integrate actual ruv-fann backend implementation
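As one possible starting point for the cross-entropy item, the sketch below implements a numerically stable softmax, the cross-entropy loss for a single labeled example, and its gradient with respect to the logits. The function names are hypothetical, and nothing here reflects code already in the project.

```rust
/// Numerically stable softmax: subtract the max logit before exponentiating.
fn softmax(logits: &[f64]) -> Vec<f64> {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|z| (z - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Cross-entropy loss for one example: -ln(p[label]).
fn cross_entropy(logits: &[f64], label: usize) -> f64 {
    -softmax(logits)[label].ln()
}

/// Gradient of the loss with respect to the logits: softmax(logits) - one_hot(label).
fn cross_entropy_grad(logits: &[f64], label: usize) -> Vec<f64> {
    let mut grad = softmax(logits);
    grad[label] -= 1.0;
    grad
}
```

With 3 classes, uniform logits give a loss of ln 3 ≈ 1.0986, which is a useful reference point for an untrained classifier. The gradient's closed form (probabilities minus the one-hot label) is what makes this loss pair naturally with full backpropagation.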