A Bayesian analysis for pseudo-compositional data with spatial structure.
Edson Zangiacomi Martinez, Jorge A. Achcar, Davi Casale Aragon, Marisa Afonso de Andrade Brunherotti. Published in: Statistical Methods in Medical Research (2019)
We propose a Bayesian analysis of pseudo-compositional data in the presence of a latent factor, assuming a spatial structure. This development was motivated by a dataset containing the number of newborns of primiparous mothers living in each microregion of the state of Sao Paulo, Brazil, in 2015, stratified by maternal age (15-18, 19-29, and 30 years or more). Because the newborns are not stochastically distributed among the three age groups but instead reflect the age structure of the female population, we adopt the term "pseudo-compositional data" for this data structure. The hypothesis of interest is that the age at first pregnancy is associated with the economic conditions of the geographic area where the mother lives. The incidence of poverty was included as an independent variable. Additive log-ratio (alr) and isometric log-ratio (ilr) transformations were considered, as is usual in the analysis of compositional data. The model included a spatial random effect assumed to follow a conditional autoregressive (CAR) structure. A Bayesian Markov chain Monte Carlo (MCMC) simulation procedure was used to obtain the posterior summaries of interest. The model based on the ilr transformation fitted the data well, showing that microregions with the highest incidence of poverty have higher proportions of women who have their first child in adolescence, while microregions with the lowest incidence of poverty have higher proportions of women who have their first child at age 30 or older. These results indicate that the Bayesian approach was useful for estimating the parameters of the proposed model. The proposed method should be broadly applicable to other problems involving pseudo-compositional data.
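To make the alr and ilr transformations concrete for a three-part composition (one proportion per maternal age group), here is a minimal sketch. The abstract does not state which reference part the alr used or which orthonormal basis defined the ilr, so this sketch assumes the last group (30 years or more) as the alr reference and a pivot (balance) basis for the ilr. The helper names (`closure`, `alr`, `ilr`, `pivot_basis`, and so on) are hypothetical, and the counts in the usage section are invented for illustration only; they are not the study data.

```python
import math


def closure(x):
    """Rescale positive parts so they sum to 1 (a point in the simplex)."""
    total = sum(x)
    return [xi / total for xi in x]


def alr(x):
    """Additive log-ratio: log of each part relative to the last part (assumed reference)."""
    return [math.log(xi / x[-1]) for xi in x[:-1]]


def alr_inv(y):
    """Inverse alr: exponentiate, append the reference part (exp(0) = 1), then close."""
    return closure([math.exp(yi) for yi in y] + [1.0])


def clr(x):
    """Centered log-ratio: log of each part minus the mean log."""
    logs = [math.log(xi) for xi in x]
    m = sum(logs) / len(logs)
    return [v - m for v in logs]


def pivot_basis(D):
    """D x (D-1) orthonormal basis of the clr plane (pivot/balance contrasts, an assumption)."""
    V = [[0.0] * (D - 1) for _ in range(D)]
    for i in range(D - 1):
        k = D - i - 1  # number of parts after the pivot part i
        V[i][i] = math.sqrt(k / (k + 1))
        for j in range(i + 1, D):
            V[j][i] = -1.0 / math.sqrt(k * (k + 1))
    return V


def ilr(x):
    """Isometric log-ratio: project clr(x) onto the orthonormal basis."""
    D = len(x)
    c, V = clr(x), pivot_basis(D)
    return [sum(V[j][i] * c[j] for j in range(D)) for i in range(D - 1)]


def ilr_inv(z):
    """Inverse ilr: map back to clr coordinates, exponentiate, then close."""
    D = len(z) + 1
    V = pivot_basis(D)
    c = [sum(V[j][i] * z[i] for i in range(D - 1)) for j in range(D)]
    return closure([math.exp(cj) for cj in c])


if __name__ == "__main__":
    # Invented newborn counts for one microregion: ages 15-18, 19-29, 30+.
    p = closure([120, 640, 240])
    print("proportions:", p)
    print("alr:", alr(p), "-> back:", alr_inv(alr(p)))
    print("ilr:", ilr(p), "-> back:", ilr_inv(ilr(p)))
```

Both transformations map the three proportions to two unconstrained real coordinates, which is what allows a Gaussian-type regression on poverty incidence, with a CAR spatial random effect, to be specified on the transformed scale. The alr is simpler but depends on the chosen reference part and is not isometric; the ilr coordinates are orthonormal, at the cost of less direct interpretation of each coordinate.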