Replication — Eval

Goal

Report the test-set performance of the four models (three individual learners and their weighted ensemble) on the held-out final 12 months of district-month data. The numbers below are loaded from the metrics CSV produced by evaluator.py, so they match the numbers reported in the Results chapter.
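The loading step can be pictured with the sketch below. The column names (model, rmse, mae, r2, mape) are hypothetical, since the actual layout of the CSV written by evaluator.py is not shown on this page; the sample rows simply repeat the values from Table 1.

```python
import csv
import io

# Hypothetical column names; the real metrics CSV from evaluator.py may differ.
LOWER_IS_BETTER = ("rmse", "mae", "mape")
HIGHER_IS_BETTER = ("r2",)


def load_metrics(handle):
    """Parse a metrics CSV into {model name: {metric: float}}."""
    table = {}
    for row in csv.DictReader(handle):
        name = row.pop("model")
        table[name] = {metric: float(value) for metric, value in row.items()}
    return table


def best_per_metric(table):
    """Return the best-scoring model name for each metric."""
    best = {}
    for metric in LOWER_IS_BETTER:
        best[metric] = min(table, key=lambda m: table[m][metric])
    for metric in HIGHER_IS_BETTER:
        best[metric] = max(table, key=lambda m: table[m][metric])
    return best


# Values copied from Table 1; an in-memory string stands in for the file.
SAMPLE = """model,rmse,mae,r2,mape
Random Forest,131.43,72.10,0.8563,41.48
XGBoost,129.07,69.50,0.8614,44.64
MLP,128.07,71.45,0.8635,43.50
Weighted Ensemble,126.20,68.07,0.8675,41.01
"""

metrics = load_metrics(io.StringIO(SAMPLE))
print(best_per_metric(metrics))
```

With a real file, the same function can be given an open file handle, for example open(path, newline=""), in place of the StringIO object.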

Performance table

Table 1: Test-set performance across the four models on the held-out final 12 months.
Model               RMSE (cases)   MAE (cases)   R²       MAPE (%)
Random Forest       131.43         72.10         0.8563   41.48
XGBoost             129.07         69.50         0.8614   44.64
MLP                 128.07         71.45         0.8635   43.50
Weighted Ensemble   126.20         68.07         0.8675   41.01
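For reference, the four metrics in Table 1 can be computed as in the sketch below. It uses the standard textbook definitions. Whether evaluator.py handles zero-count rows in MAPE the same way, or evaluates on the original case scale, is not stated on this page, so both choices here are assumptions.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, R² and MAPE (%) for two equal-length lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    # Assumption: rows with an observed count of zero are skipped,
    # because the percentage error is undefined there.
    pct = [abs(e) / abs(t) for e, t in zip(errors, y_true) if t != 0]
    mape = 100.0 * sum(pct) / len(pct) if pct else float("nan")

    return {"rmse": rmse, "mae": mae, "r2": r2, "mape": mape}


# Toy values, not project data.
print(regression_metrics([100, 200, 300, 400], [110, 190, 310, 390]))
```

Because MAPE divides each error by the observed count, district-months with few cases dominate it, which is why a model can rank well on MAE and poorly on MAPE.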

Feature–target correlations

Table 2: Pearson correlation of each engineered feature with log(1 + cases), district-month panel.
Feature          Pearson r with log(1 + cases)
cases_lag1        0.731
temp_mean_lag1    0.599
temp_roll3        0.559
precip_lag1       0.313
monsoon           0.245
precip_roll3      0.208
humidity_lag1     0.180
flood_lag1        0.122
month_sin        -0.133
month_cos        -0.226
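Each entry in Table 2 is a Pearson correlation between one feature column and log(1 + cases). A minimal pure-Python sketch of that calculation follows; the function names are illustrative and the project itself may well use a pandas or NumPy routine instead.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlation_with_log_cases(feature, cases):
    """Correlate a feature with log(1 + cases), the target used in Table 2."""
    log_cases = [math.log1p(c) for c in cases]
    return pearson_r(feature, log_cases)


# Toy values, not project data.
cases = [0, 3, 10, 50, 120]
lagged = [1, 2, 12, 40, 90]
print(round(correlation_with_log_cases(lagged, cases), 3))
```

The log1p transform keeps zero-case months defined and compresses outbreak peaks, so a few extreme months do not dominate the coefficient.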

Interpretation

  • The Weighted Ensemble is the headline winner: R² = 0.8675, RMSE = 126.20, MAE = 68.07, MAPE = 41.01 %. It beats every individual model on every metric, though by small margins (1.87 cases of RMSE over the MLP, the best individual model on that metric). This is consistent with combining structurally distinct learners whose errors are partially uncorrelated.
  • All four models fall within 0.012 R² of each other (Random Forest 0.8563, XGBoost 0.8614, MLP 0.8635, Ensemble 0.8675; full spread 0.0112). A spread this narrow across different model families suggests that the predictive ceiling is set by the climate × autoregressive signal in the data rather than by the choice of model family.
  • XGBoost is the operational pick when only a single model can be deployed: it has the lowest MAE among the individual models (69.50), interpretable gain-based feature importance, graceful handling of missing values, and reproducible results under a fixed random seed. The trade-off is its MAPE (44.64 %), the highest of the four.
  • Random Forest has the lowest MAPE among the individual models (41.48 %), which suggests marginally better proportional accuracy where case counts are low, since MAPE weights errors on small counts most heavily. It remains the conservative baseline.
  • The MLP edges out the tree-based individual models on R² (0.8635) and RMSE (128.07), but at the cost of higher absolute-error variance and stricter input-scaling requirements.
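The ensemble idea in the first bullet can be illustrated with a weighted average of per-model predictions. This is only a sketch: the weights and the toy predictions below are hypothetical, and this page does not state how the project derives its ensemble weights.

```python
def weighted_ensemble(predictions, weights):
    """Combine per-model predictions with a normalised weighted average.

    predictions: dict mapping model name -> list of predictions
    weights:     dict mapping model name -> non-negative weight
    """
    if not predictions:
        raise ValueError("no predictions supplied")
    lengths = {len(p) for p in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all models must predict the same number of rows")
    total = sum(weights[name] for name in predictions)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_rows = lengths.pop()
    return [
        sum(weights[name] * preds[i] for name, preds in predictions.items()) / total
        for i in range(n_rows)
    ]


# Toy case: the two models err in opposite directions around 100 and 200,
# so averaging cancels the errors. Names and weights are hypothetical.
preds = {"model_a": [110, 190], "model_b": [90, 210]}
print(weighted_ensemble(preds, {"model_a": 1, "model_b": 1}))  # [100.0, 200.0]
```

Real errors are only partially uncorrelated, so the cancellation is partial too, which fits the small but consistent gains in Table 1.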

Where the trained pickles live

The pipeline persists trained models to code/output/models/ next to the source code (see Train). The file layout is:

code/output/models/
├── rf_model.pkl
├── xgb_model.pkl
├── mlp_model.pkl     (or lstm_model.h5 if TensorFlow is installed)
├── scaler_X.pkl
└── scaler_y.pkl

The directory is gitignored by default — model artifacts are regenerable from source and shouldn’t be committed.

To regenerate from scratch, see Train.
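Once the artifacts exist, they can be loaded along the lines of the sketch below. It assumes the files were written with Python's pickle module (if the pipeline uses joblib, swap in joblib.load) and that the working directory is the repository root. The helper name load_artifacts is illustrative, not part of the pipeline.

```python
import pickle
from pathlib import Path

MODEL_DIR = Path("code/output/models")

# File names taken from the layout above. lstm_model.h5 is left out
# because it is a TensorFlow file, not a pickle.
EXPECTED = (
    "rf_model.pkl",
    "xgb_model.pkl",
    "mlp_model.pkl",
    "scaler_X.pkl",
    "scaler_y.pkl",
)


def load_artifacts(model_dir=MODEL_DIR):
    """Unpickle whichever expected artifacts exist.

    Returns (loaded, missing): a dict keyed by file stem and a list of
    file names that were not found.
    """
    model_dir = Path(model_dir)
    loaded, missing = {}, []
    for filename in EXPECTED:
        path = model_dir / filename
        if not path.is_file():
            missing.append(filename)
            continue
        with path.open("rb") as fh:
            # Only unpickle files you generated yourself:
            # pickle can execute arbitrary code on load.
            loaded[path.stem] = pickle.load(fh)
    return loaded, missing


artifacts, not_found = load_artifacts()
print("loaded:", sorted(artifacts), "missing:", not_found)
```

Unpickling the real models also requires the libraries that produced them to be installed, ideally at the versions used for training.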

Tip

The figures supporting this evaluation — model comparison, correlation heatmap, ecological-zone distribution, national trend — are on the Figures page.