==============================================================
Cross-content quality scaling of TID2013 image quality dataset
==============================================================

This dataset improves the accuracy and consistency of the quality scores
for the TID2013 image quality dataset:
http://www.ponomarenko.info/tid2013.htm
(Version 1.0)

The details of these improved quality scores can be found in the paper:

Aliaksei Mikhailiuk, María Pérez Ortiz and Rafał K. Mantiuk
"Psychometric scaling of TID2013 dataset"
Proc. of 10th International Conference on Quality of Multimedia
Experience (QoMEX 2018)
http://www.cl.cam.ac.uk/~rkm38/pdfs/mikhailiuk2018tid_psych_scaling.pdf

Benefits
--------

Although TID2013 is the largest and the most extensively measured image
quality dataset, it has several limitations:

1. TID2013 quality scores are not relative to the reference images

   Pairwise comparisons were used to measure quality for the TID2013
   dataset. However, the distorted images were never directly compared
   with the reference images. Because of that, it is impossible to say
   where the reference images lie on the quality scale. We cannot assume
   that reference images have the highest possible quality scores because
   the dataset contains enhanced images, whose quality was higher than
   that of the reference images.

   This dataset adds comparisons for 140 image pairs of reference and
   distorted (or enhanced) images, which were missing in the original
   dataset and are required to anchor the quality scores. The new quality
   scores put the reference at 0, assign positive scores to enhanced
   images, and negative scores to distorted images.

2. TID2013 does not capture quality differences between different content

   All comparisons in TID2013 were performed between images depicting the
   same content. Therefore, the results cannot tell whether noise added
   to a heavily textured image is less bothersome than the same amount of
   noise added to an image with faces and smooth regions.

   This dataset adds comparisons for 540 cross-content image pairs to
   align the quality values across images depicting different content.

3. No observer model was used to obtain TID2013 quality scores

   TID2013 quality scores count the mean number of times each image was
   selected over all other compared images. We refer to them as vote
   counts. Counting the number of wins provides the correct ordering of
   quality scores but does not ensure that the relative distances between
   scores reflect subjective preference.

   This dataset was scaled using a well-established Thurstone observer
   model with Case V assumptions (a brief numerical illustration is
   sketched after this list). The scaling was performed using publicly
   available software:
   https://github.com/mantiuk/pwcmp

   Further details on the scaling procedure can be found in:

   "A practical guide and software for analysing pairwise comparison
   experiments"
   Maria Perez-Ortiz and Rafal K. Mantiuk
   https://arxiv.org/abs/1712.03686

4. In addition to the above, we also included:

   * Comparisons for 20 within-content pairs.
   * In total, 15,000 additional pairwise comparison judgements collected
     from 16 observers.
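The scaling itself was performed with the pwcmp software linked above; the
MATLAB snippet below is only a rough sketch of the underlying Thurstone
Case V relationship for a single image pair, not the pwcmp code. It
assumes the JOD convention described in https://arxiv.org/abs/1712.03686,
in which a difference of 1 JOD corresponds to one condition being
preferred in about 75% of trials; the value of sigma_cv below follows from
that assumption.

    % Thurstone Case V (sketch): the probability p that image A is chosen
    % over image B relates to their quality difference d = q_A - q_B
    % through a cumulative normal, p = Phi(d / sigma_cv). pwcmp solves a
    % maximum-likelihood problem over all comparisons at once; this only
    % shows the core relationship for one pair.

    p_win    = 0.75;     % hypothetical: A preferred over B in 75% of trials
    sigma_cv = 1.4826;   % assumed so that d = 1 JOD gives p = 0.75

    % Inverse cumulative normal written with erfinv (base MATLAB, no
    % Statistics Toolbox required): Phi^-1(p) = sqrt(2)*erfinv(2p - 1)
    d = sigma_cv * sqrt(2) * erfinv(2*p_win - 1);

    fprintf('Estimated quality difference: %.2f JOD\n', d);   % ~1.00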
Copyright and credits
---------------------

Note that this dataset relies mostly on the measurements conducted by the
authors of the original TID2013 study, and they must be credited for their
work and contribution. If using this dataset, please cite one of the
papers listed on the TID2013 web page:
http://www.ponomarenko.info/tid2013.htm
in addition to:

Aliaksei Mikhailiuk, María Pérez Ortiz and Rafał K. Mantiuk
"Psychometric scaling of TID2013 dataset"
Proc. of 10th International Conference on Quality of Multimedia
Experience (QoMEX 2018)

The dataset - files
-------------------

The repository contains the following files:

* JOD.csv - contains JOD scores, confidence intervals and standard
  deviations for each distorted image. Reference images are assumed to
  have a score of 0. Higher scores correspond to higher perceived image
  quality, lower scores to lower quality. The scores were obtained from
  psychometric scaling using both the original comparisons from the
  TID2013 dataset and the newly collected data.

  Columns p_N contain percentiles from the distribution of the mean. The
  distribution was obtained by bootstrapping the experimental data.
  Column STD contains the standard deviation of the mean quality score.
  Note that the distribution of the mean is not always Gaussian, so the
  STD column may misrepresent the underlying distribution. It is
  recommended to use the percentile columns instead. (A minimal loading
  sketch is given after this list.)

  "JOD" means Just-Objectionable-Difference. For the full explanation,
  and why we do not call them JNDs, please refer to Section 5.3 in
  https://arxiv.org/abs/1712.03686

* rawexp_qomex.mat - a cell array with the raw data from the experiments.
  Each cell in the array contains the comparisons performed by one
  observer. Every observer has a unique ID from 1 to 16.

* example_of_usage.m - a MATLAB script that reads the data from
  rawexp_qomex.mat

* images.zip - for completeness, we include the distorted and reference
  images from the original TID2013 dataset. The archive contains
  subfolders corresponding to the 25 different contents. For each image
  in a subfolder, the name iXX_YY_Z.bmp corresponds to content XX,
  distortion type YY and distortion level Z. If YY and Z are not set,
  i.e. iXX.bmp, the image is the reference image for content XX.
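For convenience, the MATLAB sketch below shows one way to inspect these
files. The CSV column names and the variable name inside rawexp_qomex.mat
are treated as assumptions (check the actual CSV header and the .mat
contents); example_of_usage.m is the authoritative reference for reading
the raw data.

    % --- JOD scores ----------------------------------------------------
    T = readtable('JOD.csv');           % one row per distorted image
    disp(T.Properties.VariableNames)    % inspect the real column names

    % Prefer the p_N percentile columns over STD when reporting
    % uncertainty, as the distribution of the mean may be skewed.

    % --- Raw pairwise comparisons ---------------------------------------
    S  = load('rawexp_qomex.mat');      % cell array, one cell per observer
    fn = fieldnames(S);
    rawexp = S.(fn{1});                 % variable name inside the file is
                                        % not assumed; take the first one
    fprintf('Number of observers: %d\n', numel(rawexp));

    % --- Image file names -------------------------------------------------
    % iXX_YY_Z.bmp: content XX, distortion type YY, distortion level Z
    % (two-digit zero padding for XX and YY is an assumption)
    content = 3; dist_type = 8; dist_level = 4;
    fname     = sprintf('i%02d_%02d_%d.bmp', content, dist_type, dist_level);
    ref_fname = sprintf('i%02d.bmp', content);   % reference image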