The use of medical data for machine learning, including unsupervised methods such as clustering, is often restricted by privacy regulations such as the Health Insurance Portability and Accountability Act (HIPAA). Medical data is sensitive and highly regulated, and anonymization alone is often insufficient to protect a patient's identity. Traditional clustering algorithms are also unsuitable for longitudinal behavioral health trials, which often have missing data and observe individual behaviors over varying time periods. In this work, we develop a new decentralized, federated, multiple-imputation-based fuzzy clustering algorithm for complex longitudinal behavioral trial data collected from multisite randomized controlled trials over different time periods. Federated learning (FL) preserves privacy by aggregating model parameters instead of raw data. Unlike previous FL methods, the proposed algorithm requires only two rounds of communication and accommodates clients whose incomplete longitudinal data span different numbers of time points. The model is evaluated on both empirical longitudinal dietary health data and simulated clusters with different numbers of clients, effect sizes, correlations, and sample sizes. The proposed algorithm converges rapidly and performs well on multiple clustering metrics. This new method supports targeted treatments for distinct patient groups while preserving their data privacy, and it opens the way to broader applications in the Internet of Medical Things.
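The idea of aggregating parameters rather than data can be illustrated with a minimal sketch of a single federated fuzzy c-means center update. This is not the algorithm proposed in this work: it omits multiple imputation, the two-round communication scheme, and any handling of varying time points. The function names (`memberships`, `client_statistics`, `server_aggregate`) and the fuzzifier `m=2.0` are assumptions chosen for illustration. Each client sends only per-cluster weighted sums, never individual records.

```python
import numpy as np

def memberships(X, centers, m=2.0, eps=1e-12):
    """Standard fuzzy c-means memberships; each row sums to 1."""
    # Squared Euclidean distances between points and centers, shape (n, c).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + eps
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def client_statistics(X, centers, m=2.0):
    """Computed locally (hypothetical helper): only these sums leave the site."""
    w = memberships(X, centers, m) ** m   # fuzzified weights, shape (n, c)
    return w.T @ X, w.sum(axis=0)         # shapes (c, d) and (c,)

def server_aggregate(stats):
    """Combine client sums into updated global centers (hypothetical helper)."""
    num = sum(s[0] for s in stats)
    den = sum(s[1] for s in stats)
    return num / den[:, None]

# Illustrative synthetic data: three sites with different sample sizes.
rng = np.random.default_rng(0)
clients = [rng.normal(size=(n, 3)) for n in (20, 35, 50)]
centers = rng.normal(size=(2, 3))
new_centers = server_aggregate([client_statistics(X, centers) for X in clients])
```

Because the center update is a ratio of sums over observations, adding up the client-level sums gives the same centers as running the update on the pooled data, which is why no raw records need to be shared in this simplified setting.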