Typhoid Prediction in Nepal

Predictive Modeling of Typhoid Incidence in Nepal Under Extreme Climate Change Scenarios Using Machine Learning

Author

Kritika Baral

Affiliation

Kathmandu University

Published

May 17, 2026

Abstract

Typhoid fever, caused by Salmonella enterica serovar Typhi, imposes a persistent and disproportionate public-health burden on Nepal, where monsoon-driven flooding and climate variability increasingly stress water, sanitation, and hygiene (WASH) systems. This study integrates four open-access national datasets across Nepal’s 77 administrative districts and 108 months (January 2015 – December 2023; N = 7,327 district-month observations): district-level outpatient enteric-fever surveillance from the Ministry of Health and Population HMIS, flood-event records from the DRR Portal (NDRRMA), CHIRPS gridded precipitation, and ERA5-Land temperature and relative humidity. Four machine-learning models — Random Forest, XGBoost, Multilayer Perceptron, and a Weighted Ensemble — are trained on a chronological partition (training: 2015 – mid-2022; test: most recent 12 months) to prevent temporal leakage. The Weighted Ensemble achieves the strongest predictive performance on the held-out window (R² = 0.8675, RMSE = 126.20 cases / district-month, MAE = 68.07, MAPE = 41.01 %), with all four architectures clustering tightly (R² 0.856 – 0.868). Feature-importance analysis identifies the prior month’s case count (cases_lag1, normalised importance ≈ 0.67) as the single most influential predictor, followed by one-month-lagged mean temperature (≈ 0.09), cyclical month encodings (≈ 0.12 combined), and the monsoon indicator (≈ 0.04). Bivariate correlations confirm strong climate sensitivity: mean temperature (r = 0.50 – 0.63), precipitation (r = 0.33 – 0.37), monsoon (r = 0.25 – 0.29), and flood events (r = 0.19) all correlate positively with typhoid incidence. Seasonal analysis shows that August records the highest median district-month case count (~640 cases) — nearly three times the dry-season median. 
Under SSP2-4.5, the climate × disease signal implies a +25 % rise in national typhoid burden by 2050 (~469,000 annual cases); SSP5-8.5 implies +40 %, with disproportionate impact on flood-prone Terai districts. The study provides the most comprehensive quantitative evidence to date for the climatic sensitivity of typhoid transmission in Nepal, an open-source end-to-end early-warning pipeline that operates entirely on freely accessible climate data, and a basis for district-level health planning aligned with Nepal’s National Adaptation Plan 2021–2050.

Keywords

Typhoid fever, Climate change, Flooding, Machine learning, Extreme climate events, Nepal, Disease prediction, Public health surveillance

Highlights

  • Problem. Climate-driven year-on-year variability in typhoid incidence across Nepal’s 77 districts is poorly characterised by existing surveillance systems, leaving health authorities reactive rather than anticipatory.
  • Approach. Four machine-learning models — Random Forest, XGBoost, a Multilayer Perceptron, and a Weighted Ensemble — trained on a 2015–2023 district-month panel (N = 7,327) that integrates HMIS surveillance, ERA5-Land + CHIRPS climate, and DRR-portal flood records, with a strictly chronological train/test split (last 12 months held out for testing) to prevent data leakage.
  • Result. The Weighted Ensemble is the headline model: R² = 0.8675, RMSE = 126.20 cases / district-month, MAE = 68.07, MAPE = 41.01 %. All four architectures cluster tightly (R² 0.856 – 0.868), confirming that the climate × autoregressive signal — not the model family — sets the predictive ceiling. XGBoost is the operational pick when interpretability and reproducibility under partial inputs matter more than headline R².
  • Driver structure. Feature-importance analysis identifies the prior month’s case count (cases_lag1, normalised importance ≈ 0.67) as the single most influential predictor, followed by lagged mean temperature (≈ 0.09), cyclical month encodings (≈ 0.12 combined), and the monsoon indicator (≈ 0.04). Bivariate climate correlations: temperature r = 0.50 – 0.63, precipitation r = 0.33 – 0.37, monsoon r = 0.25 – 0.29, flood events r = 0.19. August records the highest median district-month case count (≈ 640 cases) — nearly three times the dry-season median.
  • Projection. Under medium-emissions SSP2-4.5, the climate × disease signal implies a +25 % national typhoid burden by 2050 (~469,000 annual cases); SSP5-8.5 implies +40 %, with disproportionate impact on flood-prone Terai districts.
  • Replication. Every figure and metric on this site is generated from the source code in code/ and the committed files in data/ and tables/. See the Replication section for the runnable pipeline.
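The four reported metrics and the idea of a weighted ensemble can be illustrated with a small standard-library sketch. It is not the repository's code. The text does not say how the ensemble weights are chosen, so the inverse-validation-RMSE weighting below is an assumption, as are the function names, the model keys (rf, xgb, mlp), and all toy numbers. MAPE is computed here over non-zero observations only, which is also an assumption.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, MAE and MAPE (%) for paired observations and predictions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # Assumption: months with zero observed cases are skipped in MAPE
    # to avoid division by zero.
    nonzero = [(t, e) for t, e in zip(y_true, errors) if t != 0]
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100.0 * sum(abs(e / t) for t, e in nonzero) / len(nonzero),
    }

def inverse_rmse_weights(rmse_by_model):
    """Assumed weighting scheme: weight each model by 1 / validation RMSE, normalised to sum to 1."""
    inverse = {name: 1.0 / rmse for name, rmse in rmse_by_model.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}

def weighted_ensemble(preds_by_model, weights):
    """Blend per-model predictions into one prediction per observation."""
    n = len(next(iter(preds_by_model.values())))
    return [
        sum(weights[name] * preds[i] for name, preds in preds_by_model.items())
        for i in range(n)
    ]

# Toy district-month case counts and hypothetical model outputs.
y_true = [100, 200, 400, 800]
preds_by_model = {
    "rf": [110, 190, 420, 760],
    "xgb": [90, 210, 380, 840],
    "mlp": [100, 200, 400, 800],
}
# Hypothetical validation RMSEs used only to derive the weights.
weights = inverse_rmse_weights({"rf": 10.0, "xgb": 20.0, "mlp": 20.0})
blended = weighted_ensemble(preds_by_model, weights)
metrics = regression_metrics(y_true, blended)
print(weights)
print(metrics)
```

Weighting by validation error rather than test error keeps the held-out window untouched when the blend is tuned, which matches the leakage concern behind the chronological split.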

Read the paper

The full write-up follows the standard paper structure; use the sidebar links, starting from Introduction. Headline results are in Results, and the policy translation is in Discussion.

Reproduce the results

The Replication section is runnable end to end: data loading, feature engineering, model fitting, and evaluation. Clone the repo, install dependencies (make install), then run make to regenerate the entire site locally, including all charts and tables.
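As a rough illustration of the feature-engineering and splitting steps, the standard-library sketch below builds a one-month case lag and cyclical month encodings, then holds out the most recent 12 months for testing. It is a minimal sketch, not the repository's pipeline. Only the cases_lag1 feature name, the use of cyclical month encodings, and the 12-month hold-out come from the text; the row layout, the function names, the month_sin and month_cos names, and the toy panel are assumptions.

```python
import math
from datetime import date

def month_cyclical(month):
    """Map month 1..12 onto the unit circle so December sits next to January."""
    angle = 2.0 * math.pi * (month - 1) / 12.0
    return math.sin(angle), math.cos(angle)

def add_features(rows):
    """Add cases_lag1, month_sin and month_cos to each district-month row.

    rows: dicts with keys district, month (a date on the first of the month), cases.
    Rows without an observation in the immediately preceding month are dropped.
    """
    out = []
    previous = {}  # district -> previous row for that district
    for row in sorted(rows, key=lambda r: (r["district"], r["month"])):
        prev = previous.get(row["district"])
        previous[row["district"]] = row
        if prev is None:
            continue
        gap = (row["month"].year - prev["month"].year) * 12 \
            + (row["month"].month - prev["month"].month)
        if gap != 1:  # the lag is only valid for consecutive calendar months
            continue
        month_sin, month_cos = month_cyclical(row["month"].month)
        out.append({**row, "cases_lag1": prev["cases"],
                    "month_sin": month_sin, "month_cos": month_cos})
    return out

def chronological_split(rows, test_months=12):
    """Hold out the last test_months distinct months; everything earlier is training data."""
    months = sorted({r["month"] for r in rows})
    cutoff = months[-test_months]
    train = [r for r in rows if r["month"] < cutoff]
    test = [r for r in rows if r["month"] >= cutoff]
    return train, test

# Toy panel: two hypothetical districts, January 2022 to December 2023.
panel = [
    {"district": d, "month": date(2022 + i // 12, i % 12 + 1, 1), "cases": base + 10 * i}
    for d, base in (("A", 100), ("B", 300))
    for i in range(24)
]
features = add_features(panel)
train, test = chronological_split(features)
print(len(train), len(test))
```

Because the split is made on calendar months rather than on shuffled rows, no training row is later in time than any test row, so the lagged case counts cannot leak future information into training.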

How to cite

Baral, K. Predictive Modeling of Typhoid Incidence in Nepal Under Extreme Climate Change Scenarios Using Machine Learning. Department of Health Informatics, Kathmandu University, Dhulikhel, Nepal. 2026. URL: https://baralsamrat.github.io