SARIMA and ARDL models for predicting leptospirosis in Anuradhapura district Sri Lanka.
Janith Niwanthaka Warnasekara, Suneth Agampodi, Abeynayake NR
Published in: PLoS ONE (2022)
Leptospirosis is considered a neglected tropical disease despite its considerable mortality and morbidity. The lack of prediction models remains a major reason why the disease is underestimated. Although many models have been developed, most have focused on districts in the wet zone because of the higher case numbers there. However, leptospirosis remains a major disease even in the dry zone of Sri Lanka. The objective of this study was to develop a time series model to predict leptospirosis in the Anuradhapura district, which is situated in the dry zone of Sri Lanka. Monthly leptospirosis incidence from January 2008 to December 2018, together with monthly rainfall, rainy days, temperature, and relative humidity, was used for model fitting. The first 72 months (55%) were used to fit the models, and the subsequent 60 months (45%) were used to validate them. The univariate seasonal ARIMA (SARIMA) model was fitted to the log-transformed dependent variable. Based on the stationarity of the mean of the five variables, the autoregressive distributed lag (ARDL) model was selected as the multivariate time series technique. Residuals were tested for normality, heteroskedasticity, and serial correlation to validate the models. The models with the lowest AIC and MAPE were selected as the best. Univariate models could not be fitted without adjusting for outliers, and models adjusted for seasonal outliers performed better than unadjusted models. The best-fitting univariate model was ARIMA(1,0,0)(0,1,1)12 (AIC: 1.08, MAPE: 19.8). The best-fitting ARDL model was ARDL(1, 3, 2, 1, 0) (AIC: 2.04, MAPE: 30.4). The number of patients reported in the previous month, rainfall, rainy days, and temperature were positively associated with leptospirosis, whereas relative humidity was negatively associated with it. Multivariate models fitted the original data better than univariate models did.
The best-fitting models indicate that other explanatory variables, such as patient, host, and epidemiological factors, need to be included to yield better predictions.
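The two selected specifications can be written out explicitly. The equations below are a sketch in standard backshift notation, not reproduced from the paper: y_t denotes the dependent monthly leptospirosis series in each model (log-transformed for the SARIMA model; the exact transform, such as any offset for zero counts, is not stated), B is the backshift operator (B y_t = y_{t-1}), and ε_t is white noise. Assigning x_1 to x_4 to rainfall, rainy days, temperature, and relative humidity in the order the abstract lists them is an assumption, as is the intercept c.

```latex
\begin{aligned}
\text{SARIMA}(1,0,0)(0,1,1)_{12}:&\quad (1 - \phi_1 B)(1 - B^{12})\, y_t = (1 + \Theta_1 B^{12})\, \varepsilon_t \\
\text{ARDL}(1,3,2,1,0):&\quad y_t = c + \lambda_1 y_{t-1}
  + \sum_{i=0}^{3} \beta_{1,i}\, x_{1,t-i}
  + \sum_{i=0}^{2} \beta_{2,i}\, x_{2,t-i}
  + \sum_{i=0}^{1} \beta_{3,i}\, x_{3,t-i}
  + \beta_{4,0}\, x_{4,t} + \varepsilon_t
\end{aligned}
```

In ARDL(p, q_1, ..., q_k) notation, p counts lags of the dependent variable and each q_j counts lags of regressor j in addition to its current value, so the final 0 means only the contemporaneous value of x_4 enters the model.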
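The validation design (a chronological 72/60 split of the 132 months, with MAPE computed on the hold-out window) can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and `back_transform` assumes a natural log with no offset, which the abstract does not specify.

```python
import math


def chronological_split(series, n_fit=72):
    """Split a monthly series into a fitting window and a validation window.

    Time order is preserved (no shuffling), so each model is validated
    only on months that come after the ones it was fitted to.
    """
    if not 0 < n_fit < len(series):
        raise ValueError("n_fit must leave at least one month on each side")
    return series[:n_fit], series[n_fit:]


def back_transform(log_forecasts):
    """Map forecasts from the log scale back to case counts.

    Assumption (hypothetical): the series was transformed as ln(y), no offset.
    """
    return [math.exp(v) for v in log_forecasts]


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        # Percentage errors are undefined for months with zero cases.
        raise ValueError("MAPE is undefined when an actual value is 0")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    months = list(range(132))  # January 2008 to December 2018
    fit, validate = chronological_split(months)
    print(len(fit), len(validate))  # 72 60
    print(mape([100, 200], [110, 180]))  # 10.0
```

A MAPE computed on log-scale values differs from one computed on back-transformed counts, so forecasts from different models should be compared on the same scale.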
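Serial correlation in the residuals, one of the three residual checks described in the abstract, is commonly assessed with the Ljung–Box Q statistic. The abstract does not name the test the authors used, so this is an illustrative choice, and the helper names are hypothetical. A minimal standard-library sketch:

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag (full-sample denominator)."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q = n(n + 2) * sum over k = 1..h of r_k^2 / (n - k)."""
    n = len(residuals)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must be between 1 and len(residuals) - 1")
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


if __name__ == "__main__":
    alternating = [1.0, -1.0, 1.0, -1.0]
    print(autocorrelation(alternating, 1))  # -0.75
    print(ljung_box_q(alternating, 1))  # 4.5
```

A large Q relative to a chi-square distribution with h degrees of freedom, less the number of estimated ARMA parameters (obtainable, for example, with `scipy.stats.chi2.sf`), suggests remaining autocorrelation, meaning the model has not captured all of the temporal structure.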