A two-step log-linear procedure for graphical representation and inference of associations in cross-classified data for disease diagnosis.
José Fernando Vera Vera, José A Roldán-Nofuentes
Published in: Statistics in medicine (2023)
Biometrical sciences, and disease diagnosis in particular, are often concerned with the analysis of associations in cross-classified data, for which distance association models provide a graphical interpretation for non-sparse matrices with a low number of categories. In this framework, binary explanatory and response variables are usually present, and analysis based on individual profiles is of great interest. For saturated models, we show that the usual linear relationship for log-linear models is preserved in full dimension for the distance association parameterization. This enables a two-step procedure that facilitates the analysis and interpretation of associations in terms of unfolding after the overall and main effects are removed. The proposed procedure can handle cross-classified data for profiles defined by binary variables, and it is easy to implement using traditional statistical software. For disease diagnosis, the problems of a degenerate solution in the unfolding representation and of determining significant differences between the profile locations are addressed. A hypothesis test of independence based on odds ratios is considered. Furthermore, a procedure is proposed to determine the causes of the significance of the test, avoiding the problem of error propagation. The equivalence between a test for equality of odds ratio pairs and the test for equality of location for two profiles in the unfolding representation in disease diagnosis is shown. The results are applied to a real example on the diagnosis of coronary disease, relating the odds ratios to performance parameters of the diagnostic test.
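The link between odds ratios and performance parameters of a diagnostic test can be illustrated on a single 2x2 table. The sketch below is not the authors' two-step procedure. It shows only the textbook Wald test of independence on the log odds ratio, together with the identity OR = [Se/(1 - Se)] x [Sp/(1 - Sp)], where Se and Sp are sensitivity and specificity. The function name `two_by_two_summary` and the counts are hypothetical, chosen purely for illustration.

```python
import math

def two_by_two_summary(tp, fn, fp, tn):
    """Summarise a 2x2 table of diagnostic test result by disease status.

    tp, fn: diseased subjects with positive / negative test result
    fp, tn: non-diseased subjects with positive / negative test result
    All four counts are assumed to be strictly positive.
    """
    odds_ratio = (tp * tn) / (fp * fn)
    log_or = math.log(odds_ratio)
    # Large-sample standard error of the log odds ratio
    se_log_or = math.sqrt(1 / tp + 1 / fn + 1 / fp + 1 / tn)
    z = log_or / se_log_or
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "odds_ratio": odds_ratio,
        "log_or": log_or,
        "z": z,
        "p_value": p_value,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }

# Hypothetical counts, not taken from the coronary disease example
summary = two_by_two_summary(tp=80, fn=20, fp=30, tn=70)

# The odds ratio rebuilt from sensitivity and specificity alone
se, sp = summary["sensitivity"], summary["specificity"]
or_from_performance = (se / (1 - se)) * (sp / (1 - sp))

print(summary)
print(or_from_performance)
```

Under independence the odds ratio equals 1, so the test rejects when the log odds ratio is far from 0 relative to its standard error. In the saturated log-linear model for a 2x2 table, the same log odds ratio is determined by the interaction term, which is the part that remains once the overall and main effects are removed.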