2.2 KiB

Raw Blame History

Temporal-Compare Benchmark Results

Test Configuration

Dataset: Synthetic temporal data with Gaussian noise
Window size: 32 time steps
Task: Time-R1 style temporal prediction

Results Summary

Regression Task (MSE - Lower is Better)

Backend	Train Size	Epochs	MSE (Val)	MSE (Test)
Baseline	N/A	N/A	N/A	0.1120
MLP	2000	15	0.1375	0.1281
MLP	5000	20	0.1722	0.1424

Classification Task (Accuracy - Higher is Better)

Backend	Train Size	Epochs	Accuracy
Baseline	N/A	N/A	0.6467
MLP	2000	15	0.3700
MLP	1000	10	0.1667

Key Observations

Baseline Performance: The naive baseline (predicting last value in window) performs surprisingly well:
- MSE: ~0.11
- Accuracy: ~65-70%
MLP Challenges: The simplified MLP without full backpropagation shows:
- Regression: Competitive with baseline (MSE: 0.128 vs 0.112)
- Classification: Underperforms baseline significantly (37% vs 65%)
Training Dynamics:
- Lower learning rates (0.001) improve stability
- More epochs don't always improve performance
- The simplified SGD approach limits learning capacity

Architecture Details

MLP Implementation

Architecture: Input(32) → Hidden(64) → Output(1 or 3)
Activation: ReLU
Training: Simplified SGD with numerical gradient approximation
Weight Init: Xavier/He initialization

Baseline

Strategy: Returns last value in temporal window
Classification: Maps continuous values to 3 classes via thresholds

Compilation Features

✅ Successfully builds with all backends:

baseline: Always available
mlp: Native Rust implementation
ruv-fann: Feature-gated, compiles successfully

Future Improvements

Implement full backpropagation for better gradient flow
Add momentum and adaptive learning rates
Implement proper cross-entropy loss for classification
Add validation-based early stopping
Integrate actual ruv-fann backend implementation