Sample Size Considerations for Fine-Tuning Large Language Models for Named Entity Recognition Tasks: Methodological Study.

Zoltan P Majdik, S Scott Graham, Jade C Shiva Edward, Sabrina N Rodriguez, Martha Sue Karnes, Jared T Jensen, Joshua B Barbour, Justin F Rousseau
Published in: JMIR AI (2024)
Relatively modest sample sizes can be used to fine-tune large language models (LLMs) for named entity recognition (NER) tasks applied to biomedical text, and the entity density of the training data should approximate the entity density of the production data. Training data quality and a model architecture's intended use (text generation vs text processing or classification) may be as important as, or more important than, training data volume and model parameter size.
Keyphrases
  • electronic health record
  • big data
  • smoking cessation
  • air pollution
  • machine learning
  • working memory
  • virtual reality
  • deep learning
  • autism spectrum disorder
  • data analysis
  • artificial intelligence