Respiratory Diseases, Malaria and Leishmaniasis: Temporal and Spatial Association with Fire Occurrences from Knowledge Discovery and Data Mining.

Lucas Schroeder, Mauricio Roberto Veronez, Eniuce Menezes de Souza, Diego Brum, Luiz Gonzaga, Vinicius Francisco Rofatto
Published in: International journal of environmental research and public health (2020)
The relationship between fire occurrences and diseases is an essential issue for public health policy and environmental protection strategy. Thanks to the Internet, a huge amount of health data and fire occurrence reports is now at our disposal. The challenge, therefore, is how to deal with the 4 Vs (volume, variety, velocity and veracity) associated with these data. To overcome this problem, in this paper we propose a method that combines techniques based on Data Mining and Knowledge Discovery from Databases (KDD) to discover spatial and temporal associations between diseases and fire occurrences. The case study addresses Malaria, Leishmaniasis and respiratory diseases in Brazil. Instead of spending a lot of time verifying the consistency of the database, the proposed method uses a Decision Tree, a supervised machine learning classifier, to manage the data quickly and extract only relevant and strategic information, together with knowledge of how reliable the database is. Namely, the States, Biomes and periods of the year (months) with the highest rate of fires could be identified with high success rates and in a few seconds. Then K-means, an unsupervised learning algorithm that solves the well-known clustering problem, is employed to identify the groups of cities where fire occurrences are most pronounced. Finally, the steps associated with KDD are performed to extract useful information from the mined data. In this case, Spearman's rank correlation coefficient, a nonparametric measure of rank correlation, is computed to infer the statistical dependence between fire occurrences and those diseases. Moreover, maps are generated to represent the distribution of the mined data. From the results, it was possible to identify that each region showed susceptibility to some disease as well as some degree of correlation with fire outbreaks, mainly in the drought period.
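As an illustration of the final step described above, the following is a minimal sketch of Spearman's rank correlation coefficient, computed as the Pearson correlation of the ranks with tied values given their average rank. It is not the authors' implementation; the function names (`average_ranks`, `spearman_rho`) and the monthly counts are hypothetical and serve only to show the calculation.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one series is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly counts for one region (illustrative only, not real data).
fire_counts = [12, 30, 45, 80, 150, 210]
disease_cases = [5, 9, 8, 20, 33, 41]

rho = spearman_rho(fire_counts, disease_cases)
print(f"Spearman's rho = {rho:.3f}")
```

Because the coefficient depends only on ranks, it captures any monotonic relationship between fire counts and case counts, not just a linear one, and it is less sensitive to extreme months than a correlation computed on the raw counts.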