==============================================================
Cross-content quality scaling of TID2013 image quality dataset
==============================================================

This dataset improves the accuracy and consistency of the quality scores
for the TID2013 image quality dataset:
http://www.ponomarenko.info/tid2013.htm
(Version 1.0)

The details of these improved quality scores can be found in the paper:

Aliaksei Mikhailiuk, María Pérez Ortiz and Rafał K. Mantiuk
"Psychometric scaling of TID2013 dataset"
Proc. of 10th International Conference on Quality of Multimedia
Experience (QoMEX 2018)
http://www.cl.cam.ac.uk/~rkm38/pdfs/mikhailiuk2018tid_psych_scaling.pdf

Benefits
--------

Although TID2013 is the largest and the most extensively measured image
quality dataset, it has several limitations:

1. TID2013 quality scores are not relative to the reference images

   Pairwise comparisons were used to measure quality for the TID2013
   dataset. However, the distorted images were never directly compared
   with the reference images. Because of that, it is impossible to say
   where the reference images lie on the quality scale. We cannot assume
   that reference images have the highest possible quality scores because
   the dataset contains enhanced images, whose quality was higher than
   that of the reference images.

   This dataset adds comparisons for 140 image pairs of reference and
   distorted (or enhanced) images, which were missing in the original
   dataset and are required to anchor the quality scores. The new quality
   scores put the reference at 0, assign positive scores to enhanced
   images, and negative scores to distorted images.

2. TID2013 does not capture quality differences between different content

   All comparisons in TID2013 were performed between images depicting the
   same content. Therefore, the results cannot tell whether noise added
   to a heavily textured image is less bothersome than the same amount of
   noise added to an image with faces and smooth regions.

   This dataset adds comparisons for 540 cross-content image pairs to
   align the quality values across images depicting different content.

3. No observer model was used to obtain TID2013 quality scores

   TID2013 quality scores count the mean number of times each image was
   selected over all other compared images. We refer to them as vote
   counts. Counting the number of wins provides the correct ordering of
   quality scores but does not ensure that the relative distances between
   scores reflect subjective preference.

   This dataset was scaled using a well-established Thurstone observer
   model with Case V assumptions (a brief numerical illustration is
   sketched after this list). The scaling was performed using publicly
   available software:
   https://github.com/mantiuk/pwcmp

   Further details on the scaling procedure can be found in:

   "A practical guide and software for analysing pairwise comparison
   experiments"
   Maria Perez-Ortiz and Rafal K. Mantiuk
   https://arxiv.org/abs/1712.03686

4. In addition to the above, we also included:

   * Comparisons for 20 within-content pairs.
   * In total, 15,000 additional pairwise comparison judgements collected
     from 16 observers.
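The scaling itself was performed with the pwcmp software linked above; the
MATLAB snippet below is only a rough sketch of the underlying Thurstone
Case V relationship for a single image pair, not the pwcmp code. It
assumes the JOD convention described in https://arxiv.org/abs/1712.03686,
in which a difference of 1 JOD corresponds to one condition being
preferred in about 75% of trials; the value of sigma_cv below follows from
that assumption.

    % Thurstone Case V (sketch): the probability p that image A is chosen
    % over image B relates to their quality difference d = q_A - q_B
    % through a cumulative normal, p = Phi(d / sigma_cv). pwcmp solves a
    % maximum-likelihood problem over all comparisons at once; this only
    % shows the core relationship for one pair.

    p_win    = 0.75;     % hypothetical: A preferred over B in 75% of trials
    sigma_cv = 1.4826;   % assumed so that d = 1 JOD gives p = 0.75

    % Inverse cumulative normal written with erfinv (base MATLAB, no
    % Statistics Toolbox required): Phi^-1(p) = sqrt(2)*erfinv(2p - 1)
    d = sigma_cv * sqrt(2) * erfinv(2*p_win - 1);

    fprintf('Estimated quality difference: %.2f JOD\n', d);   % ~1.00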
Copyright and credits
---------------------

Note that this dataset relies mostly on the measurements conducted by the
authors of the original TID2013 study, and they must be credited for their
work and contribution. If using this dataset, please cite one of the
papers listed on the TID2013 web page:
http://www.ponomarenko.info/tid2013.htm
in addition to:

Aliaksei Mikhailiuk, María Pérez Ortiz and Rafał K. Mantiuk
"Psychometric scaling of TID2013 dataset"
Proc. of 10th International Conference on Quality of Multimedia
Experience (QoMEX 2018)

The dataset - files
-------------------

The repository contains the following files:

* JOD.csv - contains JOD scores, confidence intervals and standard
  deviations for each distorted image. Reference images are assumed to
  have a score of 0. Higher scores correspond to higher perceived image
  quality, lower scores to lower quality. The scores were obtained from
  psychometric scaling using both the original comparisons from the
  TID2013 dataset and the newly collected data.

  Columns p_N contain percentiles from the distribution of the mean. The
  distribution was obtained by bootstrapping the experimental data.
  Column STD contains the standard deviation of the mean quality score.
  Note that the distribution of the mean is not always Gaussian, so the
  STD column may misrepresent the underlying distribution. It is
  recommended to use the percentile columns instead. (A minimal loading
  sketch is given after this list.)

  "JOD" means Just-Objectionable-Difference. For the full explanation,
  and why we do not call them JNDs, please refer to Section 5.3 in
  https://arxiv.org/abs/1712.03686

* rawexp_qomex.mat - a cell array with the raw data from the experiments.
  Each cell in the array contains the comparisons performed by one
  observer. Every observer has a unique ID from 1 to 16.

* example_of_usage.m - a MATLAB script that reads the data from
  rawexp_qomex.mat

* images.zip - for completeness, we include the distorted and reference
  images from the original TID2013 dataset. The archive contains
  subfolders corresponding to the 25 different contents. For each image
  in a subfolder, the name iXX_YY_Z.bmp corresponds to content XX,
  distortion type YY and distortion level Z. If YY and Z are not set,
  i.e. iXX.bmp, the image is the reference image for content XX.
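For convenience, the MATLAB sketch below shows one way to inspect these
files. The CSV column names and the variable name inside rawexp_qomex.mat
are treated as assumptions (check the actual CSV header and the .mat
contents); example_of_usage.m is the authoritative reference for reading
the raw data.

    % --- JOD scores ----------------------------------------------------
    T = readtable('JOD.csv');           % one row per distorted image
    disp(T.Properties.VariableNames)    % inspect the real column names

    % Prefer the p_N percentile columns over STD when reporting
    % uncertainty, as the distribution of the mean may be skewed.

    % --- Raw pairwise comparisons ---------------------------------------
    S  = load('rawexp_qomex.mat');      % cell array, one cell per observer
    fn = fieldnames(S);
    rawexp = S.(fn{1});                 % variable name inside the file is
                                        % not assumed; take the first one
    fprintf('Number of observers: %d\n', numel(rawexp));

    % --- Image file names -------------------------------------------------
    % iXX_YY_Z.bmp: content XX, distortion type YY, distortion level Z
    % (two-digit zero padding for XX and YY is an assumption)
    content = 3; dist_type = 8; dist_level = 4;
    fname     = sprintf('i%02d_%02d_%d.bmp', content, dist_type, dist_level);
    ref_fname = sprintf('i%02d.bmp', content);   % reference image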