Unsupervised Learning Applied to the Stratification of Preterm Birth Risk in Brazil with Socioeconomic Data.
Márcio L B Lopes, Raquel de Melo Barbosa, Marcelo A C Fernandes
Published in: International Journal of Environmental Research and Public Health (2022)
Preterm birth (PTB) poses risks and challenges for the survival of the newborn child. Despite many advances in research, not all causes of PTB are yet clear. PTB risk is understood to be multifactorial and can also be associated with socioeconomic factors. This article therefore uses unsupervised learning techniques to stratify PTB risk in Brazil using only socioeconomic data. From datasets made publicly available by the Federal Government of Brazil, a new dataset was generated with municipality-level socioeconomic data and a PTB occurrence rate. This dataset was processed using various unsupervised learning techniques, such as k-means, principal component analysis (PCA), and density-based spatial clustering of applications with noise (DBSCAN). After validation, four clusters with high levels of PTB occurrence were discovered, as well as three with low levels. The clusters with high PTB comprised mostly municipalities with lower levels of education, worse quality of public services (such as basic sanitation and garbage collection), and a smaller share of white residents. The regional distribution of the clusters was also observed, with clusters of high PTB located mostly in the North and Northeast regions of Brazil. The results indicate that quality of life and the provision of public services have a positive influence on reducing PTB risk.
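The stratification idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it covers only a hand-written k-means step (no PCA or DBSCAN), the data are synthetic, and the feature names (`literacy`, `sanitation`), the two-group structure, and the choice of `k=2` are all assumptions made for illustration. Clustering uses only the socioeconomic columns; the PTB rate is held back and used afterwards to describe each cluster.

```python
import random
import statistics

def standardize(rows):
    """Z-score each column so features on different scales are comparable."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centres drawn from the data
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centers[j] = [statistics.fmean(c) for c in zip(*members)]
    return labels

# Synthetic, hypothetical municipalities: (literacy %, sanitation coverage %)
# plus a PTB rate (%) that is NOT used for clustering.
rng = random.Random(42)
features, ptb = [], []
for _ in range(30):  # better-served group (assumed)
    features.append([min(100.0, rng.gauss(95, 1.5)), min(100.0, rng.gauss(90, 3))])
    ptb.append(rng.gauss(9.0, 0.5))
for _ in range(30):  # worse-served group (assumed)
    features.append([rng.gauss(78, 2), rng.gauss(45, 4)])
    ptb.append(rng.gauss(12.5, 0.5))

labels = kmeans(standardize(features), k=2)

# Describe each cluster by size, mean sanitation coverage, and mean PTB rate.
summary = {}
for j in sorted(set(labels)):
    idx = [i for i, lab in enumerate(labels) if lab == j]
    summary[j] = {
        "n": len(idx),
        "mean_sanitation": statistics.fmean(features[i][1] for i in idx),
        "mean_ptb": statistics.fmean(ptb[i] for i in idx),
    }

for j, stats in summary.items():
    print(j, stats)
```

In this toy setting the cluster with lower sanitation coverage also shows the higher mean PTB rate, because the synthetic data were generated that way. On real data, the number of clusters and their validity would need to be assessed separately, as the abstract notes.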