Art or Artifact: Evaluating the Accuracy, Appeal, and Educational Value of AI-Generated Imagery in DALL·E 3 for Illustrating Congenital Heart Diseases.

Mohammad-Hani TemsahAbdullah N AlhuzaimiMohammed Almansour Fadi Ajman Khalid A AlhasanMunirah A BatarfiIbraheem AltamimiAmani AlharbiAdel Abdulaziz AlsuhaibaniLeena AlwakeelAbdulrahman Abdulkhaliq AlzahraniKhaled B Alsulaim Amr A Jamal Afnan KhayatMohammed Hussien AlghamdiRabih Halwani Muhammad Khurram KhanAyman Al-EyadhyRakan Nazer

Published in: Journal of medical systems (2024)

Artificial Intelligence (AI), particularly AI-Generated Imagery, has the potential to impact medical and patient education. This research explores the use of AI-generated imagery, from text-to-images, in medical education, focusing on congenital heart diseases (CHD). Utilizing ChatGPT's DALL·E 3, the research aims to assess the accuracy and educational value of AI-created images for 20 common CHDs. In this study, we utilized DALL·E 3 to generate a comprehensive set of 110 images, comprising ten images depicting the normal human heart and five images for each of the 20 common CHDs. The generated images were evaluated by a diverse group of 33 healthcare professionals. This cohort included cardiology experts, pediatricians, non-pediatric faculty members, trainees (medical students, interns, pediatric residents), and pediatric nurses. Utilizing a structured framework, these professionals assessed each image for anatomical accuracy, the usefulness of in-picture text, its appeal to medical professionals, and the image's potential applicability in medical presentations. Each item was assessed on a Likert scale of three. The assessments produced a total of 3630 images' assessments. Most AI-generated cardiac images were rated poorly as follows: 80.8% of images were rated as anatomically incorrect or fabricated, 85.2% rated to have incorrect text labels, 78.1% rated as not usable for medical education. The nurses and medical interns were found to have a more positive perception about the AI-generated cardiac images compared to the faculty members, pediatricians, and cardiology experts. Complex congenital anomalies were found to be significantly more predicted to anatomical fabrication compared to simple cardiac anomalies. There were significant challenges identified in image generation. Based on our findings, we recommend a vigilant approach towards the use of AI-generated imagery in medical education at present, underscoring the imperative for thorough validation and the importance of collaboration across disciplines. While we advise against its immediate integration until further validations are conducted, the study advocates for future AI-models to be fine-tuned with accurate medical data, enhancing their reliability and educational utility.

Keyphrases