
Results and implications for generative AI in a large introductory biomedical and health informatics course.

William Hersh, Kate Fultz Hollis
Published in: NPJ digital medicine (2024)
Generative artificial intelligence (AI) systems have performed well at many biomedical tasks, but few studies have assessed their performance directly against students in higher-education courses. We compared student knowledge-assessment scores with scores obtained by prompting six large language model (LLM) systems as a typical student would use them in a large online introductory course in biomedical and health informatics taken by graduate, continuing education, and medical students. The state-of-the-art LLM systems were prompted to answer multiple-choice questions (MCQs) and final exam questions. We compared the scores of 139 students (30 graduate students, 85 continuing education students, and 24 medical students) to those of the LLM systems. All of the LLMs scored between the 50th and 75th percentiles of students for both MCQ and final exam questions. The performance of LLMs raises questions about student assessment in higher education, especially in courses that are knowledge-based and online.
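To make the percentile comparison concrete, the sketch below computes where a single LLM score falls within a distribution of student scores, using the common "mean" definition of percentile rank (ties counted as half). The score values are hypothetical placeholders for illustration, not the study's actual data, and this is only one plausible way to compute the statistic reported in the abstract.

```python
from bisect import bisect_left, bisect_right

def percentile_rank(student_scores, llm_score):
    """Percentage of student scores below the LLM score,
    with ties counted as half (the 'mean' definition)."""
    scores = sorted(student_scores)
    below = bisect_left(scores, llm_score)        # scores strictly below
    ties = bisect_right(scores, llm_score) - below  # scores equal to llm_score
    return 100.0 * (below + 0.5 * ties) / len(scores)

# Hypothetical exam scores (%) for illustration only.
students = [62, 68, 71, 74, 77, 80, 83, 85, 88, 92]
llm = 82
print(f"LLM score of {llm} falls at the "
      f"{percentile_rank(students, llm):.0f}th percentile of students")
```

With these placeholder numbers the LLM lands at the 60th percentile, i.e. within the 50th-to-75th-percentile band the abstract reports for all six systems.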
Keyphrases
  • high school
  • healthcare
  • artificial intelligence
  • medical students
  • medical education
  • big data
  • health information
  • public health
  • quality improvement
  • deep learning
  • mental health
  • social media
  • risk assessment
  • human health