Quality Evaluation Scores are no more Reliable than Gestalt in Evaluating the Quality of Emergency Medicine Blogs: A METRIQ Study.
Brent Thoma, Stefanie S Sebok-Syer, Isabelle Colmers-Gray, Jonathan Sherbino, Felix Ankel, N Seth Trueger, Andrew Grock, Marshall Siemens, Michael Paddock, Eve Purdy, William Kenneth Milne, Teresa M Chan
Published in: Teaching and Learning in Medicine (2018)
The average scores of each blog post correlated strongly with gestalt ratings. However, neither ALiEM AIR nor METRIQ-8 showed higher reliability than gestalt. Improved reliability may be possible through rater training and instrument refinement.