Reproducibility of the Development and Validation Process of Standard Area Diagram by Two Laboratories: An Example Using the Botrytis cinerea/Gerbera jamesonii Pathosystem.
Vilma Pereira de Melo, Ana Claudia da Silva Mendonça, Hudson Sergio de Souza, Lorrant Cavanha Gabriel, Clive H Bock, Mahogani J Eaton, Kátia Regina Freitas Schwan-Estrada, William Mário de Carvalho Nunes
Published in: Plant Disease (2020)
Standard area diagrams (SADs) are plant disease severity assessment aids demonstrated to improve the accuracy and reliability of visual estimates of severity. Knowledge of the sources of variation that affect the outcome of SAD development and validation, including those specific to a lab (raters, procedures followed, instruction, image analysis software, image viewing time, etc.), can help improve standard operating practice for these assessment aids. Because reproducibility has not previously been explored in the development of SADs, we asked whether the lab in which a SAD was measured and validated affected the outcome of the process. Two different labs (Lab 1 and Lab 2) measured severity on the individual diagrams in a SAD and validated them independently for severity of gray mold (caused by Botrytis cinerea) on Gerbera daisy. Severity measurements of the 30 test images were also performed independently at the two labs. At each lab, a different group of 18 raters assessed the test images first without and then with the SADs, under independent instruction. Actual severity on the SADs as measured at the two labs differed by up to 5.18%. Furthermore, measurements of actual severity on the test images differed by 0 to 24.29%, depending on the image. Whereas at Lab 1 an equivalence test indicated no significant improvement in any measure of agreement with use of the SADs, at Lab 2 scale shift, generalized bias, and agreement were significantly improved with use of the SADs (P ≤ 0.05). An analysis of variance indicated that differences existed between labs, with use of the SADs, and in their interaction, depending on the agreement statistic. Based on an equivalence test, interrater reliability was significantly (P ≤ 0.05) improved at both Lab 1 and Lab 2 as a result of using SADs as an aid to severity estimation.
Gains in measures of agreement and reliability tended to be greatest for the least able raters at both Lab 1 and Lab 2. Absolute error was reduced at both labs when raters used SADs. The results confirm that SADs are a useful tool but demonstrate that aspects of the development and validation process in different labs may affect the outcome.
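The agreement measures named in the abstract (scale shift, generalized bias, and agreement) match the components of Lin's concordance correlation coefficient, which is widely used in SAD validation. The abstract does not name the statistic or give the computation, so the following is only a minimal sketch under that assumption; the function name `lin_ccc` and the example values are hypothetical and are not data from the study.

```python
import math


def lin_ccc(estimates, actual):
    """Components of Lin's concordance correlation coefficient.

    Assumption: this is the agreement statistic meant in the abstract.
    `estimates` are one rater's severity estimates (%), `actual` the
    measured severities (%) for the same images.
    """
    n = len(actual)
    if n != len(estimates) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    mean_x = sum(actual) / n
    mean_y = sum(estimates) / n
    # Population (divide-by-n) moments, as in Lin's original definition.
    var_x = sum((x - mean_x) ** 2 for x in actual) / n
    var_y = sum((y - mean_y) ** 2 for y in estimates) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(actual, estimates)) / n
    if var_x == 0 or var_y == 0:
        raise ValueError("both sequences must vary")

    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
    r = cov / (sd_x * sd_y)                        # precision (Pearson r)
    v = sd_y / sd_x                                # scale shift (1 = none)
    u = (mean_y - mean_x) / math.sqrt(sd_x * sd_y)  # location shift (0 = none)
    c_b = 2.0 / (v + 1.0 / v + u ** 2)             # generalized bias (1 = none)
    rho_c = r * c_b                                # agreement
    return {"r": r, "v": v, "u": u, "C_b": c_b, "rho_c": rho_c}


# Hypothetical rater who overestimates every image by 5 percentage points.
result = lin_ccc(estimates=[15.0, 25.0, 35.0], actual=[10.0, 20.0, 30.0])
```

In this hypothetical case precision is perfect (r = 1) and there is no scale shift (v = 1), so the constant overestimate alone lowers the generalized bias and therefore the agreement below 1.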