Observer-study-based approaches to quantitatively evaluate the realism of synthetic medical images.

Ziping Liu, Scott Wolfe, Zitong Yu, Richard Laforest, Joyce C Mhlanga, Tyler J Fraum, Malak Itani, Farrokh Dehdashti, Barry A Siegel, Abhinav K Jha
Published in: Physics in Medicine and Biology (2023)
Synthetic images generated by simulation studies have a well-recognized role in developing and evaluating imaging systems and methods. For clinically relevant development and evaluation, synthetic images must be clinically realistic and, ideally, have the same distribution as that of clinical images. Thus, mechanisms that can quantitatively evaluate this clinical realism and, ideally, similarity in distributions of real and synthetic images, are much needed.

We investigated two observer-study-based approaches to quantitatively evaluate the clinical realism of synthetic images. In the first approach, we presented a theoretical formalism for using an ideal observer to quantitatively evaluate the similarity in distributions between real and synthetic images. This formalism provides a direct relationship between the ideal-observer AUC and the distributions of real and synthetic images. The second approach uses human observers to quantitatively evaluate clinical realism. We developed web-based software to conduct two-alternative forced-choice (2-AFC) experiments with expert human readers. The usability of this software was evaluated by conducting a system usability scale (SUS) survey with seven expert readers and five observer-study designers. Further, we demonstrated the application of this software to evaluate a stochastic and physics-based image-synthesis technique for oncologic PET, where the 2-AFC study was performed by six expert readers who were highly experienced in reading PET scans.
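As a rough illustration of how a 2-AFC realism study can be scored, the sketch below computes the fraction of trials in which a reader correctly picked the real image, together with a Wilson score interval. The function name, the trial counts, and the choice of the Wilson interval are assumptions made for illustration; the abstract does not state which statistics the authors' software reports.

```python
import math

def two_afc_summary(n_correct, n_trials, z=1.96):
    """Fraction of 2-AFC trials answered correctly, with a Wilson score interval.

    Hypothetical helper: in a 2-AFC trial the reader sees one real and one
    synthetic image and picks the one believed to be real. A fraction correct
    near 0.5 means the reader is performing at chance.
    z = 1.96 corresponds to an approximate 95% interval.
    """
    if n_trials <= 0 or not 0 <= n_correct <= n_trials:
        raise ValueError("need 0 <= n_correct <= n_trials and n_trials > 0")
    p_hat = n_correct / n_trials
    denom = 1.0 + z * z / n_trials
    center = (p_hat + z * z / (2.0 * n_trials)) / denom
    half_width = (
        z
        * math.sqrt(p_hat * (1.0 - p_hat) / n_trials + z * z / (4.0 * n_trials**2))
        / denom
    )
    return p_hat, center - half_width, center + half_width

# Made-up example: a reader identifies the real image in 50 of 100 trials.
p_hat, lower, upper = two_afc_summary(50, 100)
print(f"fraction correct = {p_hat:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

An interval that contains 0.5 is consistent with a reader who cannot tell real from synthetic images, although with few trials the interval is wide and the evidence is correspondingly weak.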

In the first approach, we theoretically demonstrated that the ideal-observer AUC can be expressed in terms of the Bhattacharyya distance between the distributions of real and synthetic images. We showed that a decrease in the ideal-observer AUC indicates a decrease in the distance between the two image distributions. Moreover, reaching the lower bound of ideal-observer AUC = 0.5 implies that the distributions of synthetic and real images match exactly. In the second approach, results from the SUS survey demonstrated that our developed software is highly usable. As a secondary finding, evaluation of the PET image-synthesis technique using our software showed that expert readers were generally unable to distinguish the real and synthetic images.
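The link between ideal-observer AUC and Bhattacharyya distance can be made concrete in a textbook special case, which is not the general formalism of the paper: two one-dimensional Gaussian distributions with equal variance. There the Bhattacharyya distance is D_B = (mu_real - mu_syn)^2 / (8 sigma^2) and the ideal-observer AUC is Phi(2 sqrt(D_B)), where Phi is the standard normal CDF. The function names and numerical values below are illustrative assumptions.

```python
import math

def bhattacharyya_equal_variance(mu_real, mu_syn, sigma):
    """Bhattacharyya distance between N(mu_real, sigma^2) and N(mu_syn, sigma^2)."""
    return (mu_real - mu_syn) ** 2 / (8.0 * sigma**2)

def ideal_observer_auc_equal_variance(d_b):
    """Ideal-observer AUC for two equal-variance Gaussians, given D_B.

    Uses AUC = Phi(2 * sqrt(D_B)) with Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    Valid only for this special case (assumption of the sketch).
    """
    x = 2.0 * math.sqrt(d_b)
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# As the synthetic mean approaches the real mean, D_B shrinks and AUC falls to 0.5.
for mu_syn in (2.0, 1.0, 0.5, 0.0):
    d_b = bhattacharyya_equal_variance(0.0, mu_syn, 1.0)
    auc = ideal_observer_auc_equal_variance(d_b)
    print(f"mu_syn = {mu_syn:.1f}  D_B = {d_b:.4f}  AUC = {auc:.4f}")
```

In this special case the AUC is a monotonically increasing function of D_B and equals 0.5 exactly when D_B = 0, that is, when the two distributions coincide, which mirrors the behavior described above.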

This work addresses the important need for mechanisms to quantitatively evaluate the clinical realism of synthetic images. Our mathematical treatment shows that quantifying the similarity in distributions of real and synthetic images is theoretically possible with an ideal-observer-study-based approach. Our developed software provides a platform for designing and performing 2-AFC experiments with human observers in a highly accessible, efficient, and secure manner. Additionally, results on evaluation of the PET image-synthesis technique motivate the application of this technique to develop and evaluate a wide array of PET imaging methods.