Applying masked autoencoder-based self-supervised learning for high-capability vision transformers of electrocardiographies.
Shinnosuke Sawano, Satoshi Kodera, Naoto Setoguchi, Kengo Tanabe, Shunichi Kushida, Junji Kanda, Mike Saji, Mamoru Nanasato, Hisataka Maki, Hideo Fujita, Nahoko Kato, Hiroyuki Watanabe, Minami Suzuki, Masao Takahashi, Naoko Sawada, Masao Yamasaki, Masataka Sato, Susumu Katsushika, Hiroki Shinohara, Norifumi Takeda, Katsuhito Fujiu, Masao Daimon, Hiroshi Akazawa, Hiroyuki Morita, Issei Komuro. Published in: PLoS One (2024)
The generalization of deep neural network algorithms to a broader population is an important challenge in the medical field. We aimed to apply self-supervised learning with masked autoencoders (MAEs) to improve the performance of a 12-lead electrocardiography (ECG) analysis model trained on limited ECG data. We pretrained Vision Transformer (ViT) models by reconstructing masked ECG data with an MAE. We fine-tuned this MAE-pretrained ECG model on ECG-echocardiography data from The University of Tokyo Hospital (UTokyo) for the detection of left ventricular systolic dysfunction (LVSD), and then evaluated it on multicenter external validation data from seven institutions, using the area under the receiver operating characteristic curve (AUROC) for assessment. We included 38,245 ECG-echocardiography pairs from UTokyo and 229,439 pairs from all institutions. The performance of the MAE-based ECG models pretrained on ECG data from UTokyo was significantly higher than that of other deep neural network models across all external validation cohorts (AUROC, 0.913-0.962 for LVSD, p < 0.001). Moreover, the performance of the MAE-based ECG analysis model improved with model capacity and with the amount of training data. Additionally, the MAE-based ECG analysis model maintained high performance on the ECG benchmark dataset (PTB-XL). Our proposed method thus developed high-performance MAE-based ECG analysis models from limited ECG data.
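The abstract condenses the method into one sentence: pretrain a ViT by reconstructing masked ECG segments, then fine-tune the encoder for LVSD detection. The sketch below illustrates what such an MAE pretraining objective could look like; it is a minimal sketch assuming PyTorch, and the class ECGMaskedAutoencoder, its hyperparameters (patch length, a 0.75 mask ratio, encoder/decoder sizes), and all other names are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn


class ECGMaskedAutoencoder(nn.Module):
    """MAE-style pretraining for 12-lead ECG (illustrative sketch).

    An ECG of shape (batch, 12 leads, T samples) is cut into
    non-overlapping time patches spanning all leads, a random subset of
    patches is masked, a Transformer encoder processes only the visible
    patches, and a small decoder reconstructs every patch; the loss is
    mean squared error on the masked patches only.
    """

    def __init__(self, leads=12, patch_len=50, num_patches=100, dim=256,
                 depth=6, dec_dim=128, dec_depth=2, heads=8, mask_ratio=0.75):
        super().__init__()
        self.patch_len, self.mask_ratio = patch_len, mask_ratio
        patch_dim = leads * patch_len
        self.embed = nn.Linear(patch_dim, dim)
        self.pos = nn.Parameter(torch.zeros(1, num_patches, dim))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True), depth)
        self.enc_to_dec = nn.Linear(dim, dec_dim)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dec_dim))
        self.dec_pos = nn.Parameter(torch.zeros(1, num_patches, dec_dim))
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dec_dim, heads, batch_first=True),
            dec_depth)
        self.head = nn.Linear(dec_dim, patch_dim)

    def forward(self, ecg):
        # Patchify: (B, L, T) -> (B, N, L * patch_len).
        B, L, T = ecg.shape
        N = T // self.patch_len
        patches = (ecg[:, :, :N * self.patch_len]
                   .reshape(B, L, N, self.patch_len)
                   .permute(0, 2, 1, 3)
                   .reshape(B, N, L * self.patch_len))

        # Randomly keep (1 - mask_ratio) of the patches per sample.
        n_keep = max(1, int(N * (1 - self.mask_ratio)))
        keep_idx = torch.rand(B, N, device=ecg.device).argsort(1)[:, :n_keep]

        tokens = self.embed(patches) + self.pos[:, :N]
        visible = torch.gather(
            tokens, 1, keep_idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))
        encoded = self.encoder(visible)

        # Decoder input: visible tokens back in place, mask tokens elsewhere.
        dec = self.mask_token.expand(B, N, -1).clone()
        dec.scatter_(1, keep_idx.unsqueeze(-1).expand(-1, -1, dec.size(-1)),
                     self.enc_to_dec(encoded))
        recon = self.head(self.decoder(dec + self.dec_pos[:, :N]))

        # MSE averaged over masked patches only.
        masked = torch.ones(B, N, device=ecg.device)
        masked.scatter_(1, keep_idx, 0.0)
        loss = ((recon - patches) ** 2).mean(-1)
        return (loss * masked).sum() / masked.sum()


model = ECGMaskedAutoencoder()
loss = model(torch.randn(4, 12, 5000))  # four 10-second ECGs at 500 Hz
loss.backward()
```

Fine-tuning for LVSD detection would then discard the decoder and attach a classification head to the encoder, following the usual MAE recipe; the specific architecture and training details used in the study are given in the full paper.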