Lung cancer clustering by identification of similarities and discrepancies of DNA copy numbers using maximal information coefficient.

Nezamoddin N Kachouie, Wejdan Deebani, Meshal Shutaywi, David C Christiani

Published in: PLOS ONE (2024)

Lung cancer is the second most commonly diagnosed cancer and the leading cause of cancer-related death for both men and women in the United States. Early detection is essential because patient survival remains suboptimal and the recurrence rate is high. Copy number (CN) changes in cancer populations have been broadly investigated to identify CN gains and deletions associated with cancer. In this research, similarities between cancer and paired peripheral blood samples are identified using the maximal information coefficient (MIC), and the spatial locations with substantially high MIC scores in each chromosome are used for clustering analysis. The results showed that a sizable reduction of the feature set can be obtained by using only the subset of locations with high MIC values. Clustering performance was evaluated using both true rate and normalized mutual information (NMI). Clustering with the reduced feature set outperformed clustering with the entire feature set in several chromosomes that are highly associated with lung cancer and contain several identified oncogenes.
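As an illustration of the NMI evaluation mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes the common normalization NMI = 2 I(U;V) / (H(U) + H(V)); the abstract does not say which normalization was used. The function names and the toy label vectors are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two label assignments of equal length."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    count_ab = Counter(zip(a, b))
    # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )


def nmi(a, b):
    """NMI with arithmetic-mean normalization (an assumption of this sketch)."""
    if len(a) != len(b):
        raise ValueError("label vectors must have the same length")
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 and h_b == 0:
        # Both assignments put every sample in one cluster: treat as identical.
        return 1.0
    return 2 * mutual_information(a, b) / (h_a + h_b)


# Hypothetical toy labels: 0 = blood sample, 1 = tumor sample.
true_labels = [0, 0, 0, 1, 1, 1]
cluster_ids = [1, 1, 1, 0, 0, 0]  # same partition, different cluster names
score = nmi(true_labels, cluster_ids)
```

Because NMI compares partitions rather than label names, a clustering that recovers the true grouping under different cluster identifiers still scores as a perfect match, which makes it a natural companion to a label-dependent measure such as true rate.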

Keyphrases