Improving Generalizability of PET DL Algorithms: List-Mode Reconstructions Improve DOTATATE PET Hepatic Lesion Detection Performance.
Xinyi YangMichael SiloskyJonathan WehrendDaniel V LitwillerMuthiah NachiappanScott D MetzlerDebashis GhoshFuyong XingBennett B ChinPublished in: Bioengineering (Basel, Switzerland) (2024)
Deep learning (DL) algorithms used for DOTATATE PET lesion detection typically require large, well-annotated training datasets. These are difficult to obtain due to low incidence of gastroenteropancreatic neuroendocrine tumors (GEP-NETs) and the high cost of manual annotation. Furthermore, networks trained and tested with data acquired from site specific PET/CT instrumentation, acquisition and processing protocols have reduced performance when tested with offsite data. This lack of generalizability requires even larger, more diverse training datasets. The objective of this study is to investigate the feasibility of improving DL algorithm performance by better matching the background noise in training datasets to higher noise, out-of-domain testing datasets. 68 Ga-DOTATATE PET/CT datasets were obtained from two scanners: Scanner1, a state-of-the-art digital PET/CT (GE DMI PET/CT; n = 83 subjects), and Scanner2, an older-generation analog PET/CT (GE STE; n = 123 subjects). Set1 , the data set from Scanner1, was reconstructed with standard clinical parameters (5 min; Q.Clear) and list-mode reconstructions (VPFXS 2, 3, 4, and 5-min). Set2 , data from Scanner2 representing out-of-domain clinical scans, used standard iterative reconstruction (5 min; OSEM). A deep neural network was trained with each dataset: Network1 for Scanner1 and Network2 for Scanner2. DL performance (Network1) was tested with out-of-domain test data ( Set2 ). To evaluate the effect of training sample size, we tested DL model performance using a fraction (25%, 50% and 75%) of Set1 for training. Scanner1, list-mode 2-min reconstructed data demonstrated the most similar noise level compared that of Set2 , resulting in the best performance ( F 1 = 0.713). This was not significantly different compared to the highest performance, upper-bound limit using in-domain training for Network2 ( F 1 = 0.755; p -value = 0.103). Regarding sample size, the F 1 score significantly increased from 25% training data ( F 1 = 0.478) to 100% training data ( F 1 = 0.713; p < 0.001). List-mode data from modern PET scanners can be reconstructed to better match the noise properties of older scanners. Using existing data and their associated annotations dramatically reduces the cost and effort in generating these datasets and significantly improves the performance of existing DL algorithms. List-mode reconstructions can provide an efficient, low-cost method to improve DL algorithm generalizability.