Volume of hyperintense inflammation (VHI): A quantitative imaging biomarker of inflammation load in spondyloarthritis, enabled by human-machine cooperation.

Carolyna Hepburn, Alexis Jones, Alan Bainbridge, Coziana Ciurtin, Juan Eugenio Iglesias, Hui Zhang, Margaret Anne Hall-Craggs, Timothy James Pengilly Bray

Published in: PLOS ONE (2023)

Qualitative visual assessment of MRI scans is a key mechanism by which inflammation is assessed in clinical practice. For example, in axial spondyloarthritis (axSpA), visual assessment focuses on the identification of regions with increased signal in the bone marrow, known as bone marrow oedema (BMO), on water-sensitive images. The identification of BMO has an important role in the diagnosis, quantification and monitoring of disease in axSpA. However, BMO evaluation depends heavily on the experience and expertise of the image reader, creating substantial imprecision. Deep learning-based segmentation is a natural approach to addressing this imprecision, but purely automated solutions require large training sets that are not currently available, and deep learning solutions trained on limited data may not be sufficiently trustworthy for use in clinical practice. To address this, we propose a workflow for inflammation segmentation incorporating both deep learning and human input. With this 'human-machine cooperation' workflow, a preliminary segmentation is generated automatically by deep learning; a human reader then 'cleans' the segmentation by removing extraneous segmented voxels. The final cleaned segmentation defines the volume of hyperintense inflammation (VHI), which is proposed as a quantitative imaging biomarker (QIB) of inflammation load in axSpA. We implemented and evaluated the proposed human-machine workflow in a cohort of 29 patients with axSpA who had undergone prospective MRI scans before and after starting biologic therapy. The performance of the workflow was compared against purely visual assessment in terms of inter-observer/inter-method segmentation overlap, inter-observer agreement and assessment of response to biologic therapy. The human-machine workflow achieved higher inter-observer segmentation overlap than purely manual segmentation (Dice score 0.84 versus 0.56).
VHI measurements produced by the workflow showed similar or better inter-observer agreement than visual scoring, with similar response assessments. We conclude that the proposed human-machine workflow offers a mechanism to improve the consistency of inflammation assessment, and that VHI could be a valuable QIB of inflammation load in axSpA, as well as offering an exemplar of human-machine cooperation more broadly.
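To make the quantities in the abstract concrete, the sketch below shows, under stated assumptions, how a reader's 'cleaning' step, a VHI value and an inter-observer Dice score could be computed from binary voxel masks. It assumes the masks are NumPy boolean arrays of identical shape and that the reader can only remove voxels from the automated segmentation, as described above. The function names (`clean_segmentation`, `vhi_ml`, `dice`) and the voxel size used in the example are hypothetical illustrations, not the authors' code, and the deep learning model itself is not reproduced. The Dice score uses its standard definition, $\mathrm{DSC}(A, B) = 2|A \cap B| / (|A| + |B|)$.

```python
from typing import Sequence

import numpy as np


def clean_segmentation(auto_mask: np.ndarray, reader_removals: np.ndarray) -> np.ndarray:
    """Hypothetical 'cleaning' step: keep automatically segmented voxels
    that the human reader did not mark for removal.

    Assumption: the reader may only remove voxels, never add them, so the
    result is always a subset of the automated mask.
    """
    return np.logical_and(auto_mask.astype(bool), ~reader_removals.astype(bool))


def vhi_ml(mask: np.ndarray, voxel_size_mm: Sequence[float]) -> float:
    """Volume of the cleaned segmentation in millilitres:
    number of segmented voxels multiplied by the volume of one voxel."""
    voxel_volume_mm3 = float(np.prod(voxel_size_mm))
    return int(mask.astype(bool).sum()) * voxel_volume_mm3 / 1000.0  # 1 mL = 1000 mm^3


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks."""
    a = a.astype(bool)
    b = b.astype(bool)
    denom = int(a.sum()) + int(b.sum())
    if denom == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * int(np.logical_and(a, b).sum()) / denom


# Toy example (hypothetical data): an automated mask with one spurious voxel.
auto = np.zeros((4, 4, 4), dtype=bool)
auto[1:3, 1:3, 1:3] = True   # 8 voxels of "true" hyperintensity
auto[0, 0, 0] = True         # 1 extraneous voxel

removals_a = np.zeros_like(auto)
removals_a[0, 0, 0] = True   # reader A removes the spurious voxel

removals_b = removals_a.copy()
removals_b[2, 2, 2] = True   # reader B also removes one genuine voxel

clean_a = clean_segmentation(auto, removals_a)
clean_b = clean_segmentation(auto, removals_b)

print(vhi_ml(clean_a, (1.0, 1.0, 4.0)))  # hypothetical 1 x 1 x 4 mm voxels
print(dice(clean_a, clean_b))
```

Here reader A keeps 8 voxels and reader B keeps 7, so the inter-observer Dice score is 2 × 7 / (8 + 7) ≈ 0.93. Restricting the reader to removals mirrors the workflow described in the abstract: because both readers start from the same automated mask, their disagreements are confined to which voxels they delete, which is one plausible reason the cooperative workflow can yield higher overlap than fully manual segmentation.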
