Interrater Agreement and Reliability of PERCIST and Visual Assessment When Using 18F-FDG-PET/CT for Response Monitoring of Metastatic Breast Cancer.
Jonas S Sørensen, Mie Holm Vilstrup, Jorun Holm, Marianne Vogsen, Jakob L Bülow, Lasse Ljungstrøm, Poul-Erik Braad, Oke Gerke, Malene Grubbe Hildebrandt
Published in: Diagnostics (Basel, Switzerland) (2020)
Response evaluation at regular intervals is indicated during treatment of metastatic breast cancer (MBC). FDG-PET/CT has the potential to monitor treatment response accurately. Our purpose was to: (a) compare the interrater agreement and reliability of the semi-quantitative PERCIST criteria with those of qualitative visual assessment in response evaluation of MBC and (b) investigate the intrarater agreement between each rater's visual assessment and their own PERCIST assessment. We performed a retrospective study of FDG-PET/CT scans in women who received treatment for MBC. Three specialists in nuclear medicine categorized treatment response by qualitative visual assessment and by standardized one-lesion PERCIST assessment. The scans were categorized as complete metabolic response, partial metabolic response, stable metabolic disease, or progressive metabolic disease. Thirty-seven patients with 179 scans were included. Visual assessment categorization yielded moderate agreement, with an overall proportion of agreement (PoA) between raters of 0.52 (95% CI 0.44-0.66) and a Fleiss kappa estimate of 0.54 (95% CI 0.46-0.62). PERCIST response categorization yielded substantial agreement, with an overall PoA of 0.65 (95% CI 0.57-0.73) and a Fleiss kappa estimate of 0.68 (95% CI 0.60-0.75). The difference in overall PoA between PERCIST and visual assessment was 0.13 (95% CI 0.06-0.21; p = 0.001); the difference in kappa was 0.14 (95% CI 0.06-0.21; p < 0.001). The overall intrarater PoA was 0.80 (95% CI 0.75-0.84), with substantial agreement indicated by a Fleiss kappa of 0.74 (95% CI 0.69-0.79). Semi-quantitative PERCIST assessment achieved a significantly higher level of overall agreement and reliability than qualitative visual assessment among three raters. The high levels of intrarater agreement indicated no obvious conflicting elements between the two methods.
PERCIST assessment may, therefore, give more consistent interpretations between raters when using FDG-PET/CT for response evaluation in MBC.
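The two statistics reported above can be illustrated with a minimal Python sketch. It is not the authors' analysis code. It assumes that PoA means the fraction of scans on which all raters chose the same category (the abstract does not define it), and it uses the standard Fleiss kappa formula. The function names, the category abbreviations (CMR, PMR, SMD, PMD), and the five example ratings are hypothetical and are not taken from the study data.

```python
from collections import Counter

# Hypothetical abbreviations for the four response categories in the abstract.
CATEGORIES = ("CMR", "PMR", "SMD", "PMD")


def proportion_of_agreement(ratings):
    """Fraction of scans on which every rater chose the same category.

    Assumption: this is one plausible reading of 'overall PoA'.
    """
    return sum(len(set(row)) == 1 for row in ratings) / len(ratings)


def fleiss_kappa(ratings, categories=CATEGORIES):
    """Standard Fleiss kappa for a fixed number of raters per scan."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(row) for row in ratings]

    # Observed agreement: mean over scans of the share of agreeing rater pairs.
    per_subject = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement from the pooled category proportions.
    pooled = [
        sum(c[k] for c in counts) / (n_subjects * n_raters) for k in categories
    ]
    p_expected = sum(p * p for p in pooled)

    return (p_observed - p_expected) / (1 - p_expected)


# Made-up ratings: one tuple per scan, one entry per rater.
example_ratings = [
    ("CMR", "CMR", "CMR"),
    ("PMR", "PMR", "SMD"),
    ("PMD", "PMD", "PMD"),
    ("SMD", "PMR", "SMD"),
    ("PMD", "PMD", "PMD"),
]

if __name__ == "__main__":
    print("PoA:", round(proportion_of_agreement(example_ratings), 3))
    print("Fleiss kappa:", round(fleiss_kappa(example_ratings), 3))
```

Under this reading of PoA, a scan on which two of three raters agree counts as a disagreement for PoA but still contributes partial agreement to kappa, which is why kappa can exceed PoA, as it does in the reported estimates.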