Sample Size Considerations for Fine-Tuning Large Language Models for Named Entity Recognition Tasks: Methodological Study.
Zoltan P Majdik, S Scott Graham, Jade C Shiva Edward, Sabrina N Rodriguez, Martha Sue Karnes, Jared T Jensen, Joshua B Barbour, Justin F Rousseau
Published in: JMIR AI (2024)
Relatively modest sample sizes can be used to fine-tune LLMs for NER tasks applied to biomedical text, and the entity density of the training data should representatively approximate the entity density of the production data. Training data quality and a model architecture's intended use (text generation vs text processing or classification) may be as important as, or more important than, training data volume and model parameter size.
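To make the entity-density recommendation concrete, the sketch below computes the fraction of tokens carrying an entity label in a BIO-tagged corpus; comparing this figure between training and production data is one simple way to check that the two distributions approximately match. The helper function and the sample tags are illustrative assumptions, not taken from the paper.

```python
from typing import List


def entity_density(tag_sequences: List[List[str]]) -> float:
    """Fraction of tokens labeled as part of an entity (any BIO tag
    other than 'O') across a corpus of tagged sentences."""
    total_tokens = sum(len(tags) for tags in tag_sequences)
    entity_tokens = sum(
        1 for tags in tag_sequences for tag in tags if tag != "O"
    )
    return entity_tokens / total_tokens if total_tokens else 0.0


# Hypothetical BIO-tagged training sample (labels are illustrative).
train_tags = [
    ["O", "B-DRUG", "I-DRUG", "O", "O"],
    ["B-DISEASE", "O", "O"],
]
print(round(entity_density(train_tags), 3))  # 3 of 8 tokens are entities -> 0.375
```

In practice one would compute this density for both the annotated training set and a sample of production text, and rebalance the training sample (e.g., by adding entity-sparse or entity-rich documents) if the two figures diverge substantially.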