The proposed hybrid model reaches an accuracy of 91.34% and is more robust to interference than traditional neural network algorithms. Experimental results show that adding the self-attention mechanism to the pre-trained VGG16 model improves accuracy by 3.02%. Using the stacking ensemble learning model as the classifier raises accuracy further to 91.34%, exceeding each of the single classifiers: logistic regression (LR, 89.86%), support vector machine (SVM, 90.34%), and random forest (RF, 90.73%). The proposed hybrid method can therefore effectively improve the efficiency and accuracy of MASLD ultrasound image detection.
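
To illustrate the stacking step, the following is a minimal sketch using scikit-learn, with LR, SVM, and RF as base learners. It is not the authors' implementation. The text above does not specify the meta-learner, the hyperparameters, or the cross-validation scheme, so the logistic-regression meta-learner, `cv=5`, and all other settings are assumptions. Synthetic feature vectors from `make_classification` stand in for the features that a VGG16 backbone with self-attention would produce. The function names `build_stacking_classifier` and `evaluate_on_synthetic_features` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_stacking_classifier(seed=0):
    """Stack LR, SVM, and RF base learners under a meta-learner.

    Assumption: the meta-learner is logistic regression and the
    hyperparameters are illustrative defaults; the source does not state them.
    """
    base_learners = [
        # LR and SVM are scale-sensitive, so each gets its own scaler.
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("svm", make_pipeline(StandardScaler(), SVC())),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
    ]
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,  # out-of-fold base predictions are used to train the meta-learner
    )


def evaluate_on_synthetic_features(seed=0):
    """Fit the stack on synthetic features and return (model, test accuracy).

    Assumption: the synthetic vectors are placeholders for image features
    extracted by a VGG16 + self-attention backbone.
    """
    X, y = make_classification(
        n_samples=400, n_features=32, n_informative=8, random_state=seed
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    model = build_stacking_classifier(seed)
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


if __name__ == "__main__":
    _, accuracy = evaluate_on_synthetic_features()
    print(f"Stacking accuracy on synthetic features: {accuracy:.4f}")
```

The accuracy printed by this sketch reflects the synthetic data only and is unrelated to the reported 91.34%. In a real pipeline, `X` would be replaced by the extracted ultrasound image features and `y` by the MASLD labels.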