Quality assessment of anatomical MRI images from generative adversarial networks: Human assessment and image quality metrics.

Treder, Matthias S; Codrai, Ryan; Tsvetanov, Kamen A

Accepted version

Peer-reviewed

Repository URI

https://www.repository.cam.ac.uk/handle/1810/335839

Repository DOI

https://doi.org/10.17863/CAM.83272

Files

Accepted version (13.3 MB)

Type

Article

Authors

Treder, Matthias S

Codrai, Ryan

Tsvetanov, Kamen A.

https://orcid.org/0000-0002-3178-6363

Abstract

BACKGROUND: Generative Adversarial Networks (GANs) can synthesize brain images from image or noise input. So far, the gold standard for assessing the quality of the generated images has been human expert ratings. However, due to limitations of human assessment in terms of cost, scalability, and the limited sensitivity of the human eye to more subtle statistical relationships, a more automated approach towards evaluating GANs is required.

NEW METHOD: We investigated to what extent visual quality can be assessed using image quality metrics and we used group analysis and spatial independent components analysis to verify that the GAN reproduces multivariate statistical relationships found in real data. Reference human data was obtained by recruiting neuroimaging experts to assess real Magnetic Resonance (MR) images and images generated by a GAN. Image quality was manipulated by exporting images at different stages of GAN training.

RESULTS: Experts were sensitive to changes in image quality as evidenced by ratings and reaction times, and the generated images reproduced group effects (age, gender) and spatial correlations moderately well. We also surveyed a number of image quality metrics. Overall, Fréchet Inception Distance (FID), Maximum Mean Discrepancy (MMD) and Naturalness Image Quality Evaluator (NIQE) showed sensitivity to image quality and good correspondence with the human data, especially for lower-quality images (i.e., images from early stages of GAN training). However, only a Deep Quality Assessment (QA) model trained on human ratings was able to reproduce the subtle differences between higher-quality images.

CONCLUSIONS: We recommend a combination of group analyses, spatial correlation analyses, and both distortion metrics (FID, MMD, NIQE) and perceptual models (Deep QA) for a comprehensive evaluation and comparison of brain images produced by GANs.
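To make one of the surveyed metrics concrete, the following is a minimal sketch of a Maximum Mean Discrepancy (MMD) estimate between two sets of feature vectors. It is an illustration only: the abstract does not state which features, kernel, or estimator the authors used, so the Gaussian kernel, the bandwidth `sigma`, the biased estimator, the synthetic data, and all function names here are assumptions.

```python
import math
import random


def rbf_kernel(u, v, sigma):
    # Gaussian (RBF) kernel between two feature vectors (assumed kernel choice).
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(xs, ys, sigma):
    # Average kernel value over all pairs (x, y), including identical pairs.
    total = sum(rbf_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, sigma=1.0):
    # Biased estimate of the squared MMD between samples xs and ys.
    return (
        mean_kernel(xs, xs, sigma)
        + mean_kernel(ys, ys, sigma)
        - 2.0 * mean_kernel(xs, ys, sigma)
    )


def sample(rng, n, dim, shift):
    # Hypothetical stand-in for image features: Gaussian vectors with a mean shift.
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
real = sample(rng, 100, 8, 0.0)   # stand-in for features of real images
early = sample(rng, 100, 8, 1.0)  # stand-in for an early training stage (large shift)
late = sample(rng, 100, 8, 0.1)   # stand-in for a late training stage (small shift)

mmd_early = mmd_squared(real, early, sigma=3.0)
mmd_late = mmd_squared(real, late, sigma=3.0)
print(f"MMD^2 real vs early: {mmd_early:.4f}")
print(f"MMD^2 real vs late:  {mmd_late:.4f}")
```

In this toy setting the sample that is shifted further from the reference yields the larger value, which mirrors the idea of a metric that grows as generated images depart from the real distribution. In practice the inputs would be features extracted from real and generated images rather than synthetic vectors.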

Keywords

Ageing, Deep learning, GAN, Generative Adversarial Network, Generative models, MRI, Machine learning, Quality assessment, Benchmarking, Brain, Humans, Image Processing, Computer-Assisted, Magnetic Resonance Imaging, Signal-To-Noise Ratio

Journal Title

J Neurosci Methods

Journal ISSN

0165-0270
1872-678X

Publisher

Elsevier BV

Publisher DOI

https://doi.org/10.1016/j.jneumeth.2022.109579

Sponsorship

Guarantors of Brain (Unknown)
Medical Research Council (MR/J009482/1)
Medical Research Council (MR/M008983/1)
Medical Research Council (MC_U105597119)
Medical Research Council (MC_UU_00005/12)

Collections

Cambridge University Research Outputs