
The concurrent validity of comparative judgement outcomes compared with marks

Published version



Gill, Tim 


In Comparative Judgement (CJ) exercises, examiners are asked to look at a selection of candidate scripts (with marks removed) and rank them according to their perceived quality. By including scripts from different examination sessions, the results of these exercises can be used to help with maintaining standards.

Results from previous CJ studies have demonstrated that the method appears to be valid and reliable in many contexts. However, it is not entirely clear whether CJ works as well as it does because of the physical and judgemental processes involved (i.e., placing two scripts next to each other and deciding which is better based on an intuitive, holistic, and relative judgement), or because CJ exercises capture a lot of individual paired comparison decisions quickly. This article adds to the research on this question by re-analysing data from previous CJ studies and comparing the concurrent validity of the outcomes of individual CJ paired comparisons with the concurrent validity of outcomes based on the original marks given to scripts.

The results show that for 16 out of the 20 data sets analysed, mark-based decisions had higher concurrent validity than CJ-based decisions. Two possible reasons for this finding are: CJ decisions reward different skills to marks; or individual CJ decisions are of lower quality than individual decisions based on marks. Either way, the implication is that the CJ method works because many individual paired comparison decisions are captured quickly, rather than because of the physical and psychological processes involved in making holistic judgements.
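As a rough illustration of the kind of comparison described above, the sketch below scores two sets of paired decisions on the same script pairs (one from CJ judges, one derived from the original marks) by how often each picks the script with the higher value on an external criterion. The abstract does not state which criterion measure or statistic the study used, so the `criterion` values, the agreement-rate statistic, the function names, and all the data are assumptions made up for illustration only; none of them come from the study.

```python
from typing import Dict, List, Tuple

Decision = Tuple[str, str]  # (winner_id, loser_id)


def agreement_rate(decisions: List[Decision], criterion: Dict[str, float]) -> float:
    """Share of paired decisions whose winner also has the higher criterion score.

    Pairs with equal criterion scores are ignored (an assumption of this sketch).
    """
    usable = [(w, l) for w, l in decisions if criterion[w] != criterion[l]]
    if not usable:
        raise ValueError("no pairs with distinct criterion scores")
    agree = sum(1 for w, l in usable if criterion[w] > criterion[l])
    return agree / len(usable)


def mark_based_decisions(pairs: List[Decision], marks: Dict[str, float]) -> List[Decision]:
    """For the same pairs, choose the winner by the higher original mark.

    Pairs with tied marks are skipped (again, an assumption of this sketch).
    """
    decisions = []
    for a, b in pairs:
        if marks[a] == marks[b]:
            continue
        decisions.append((a, b) if marks[a] > marks[b] else (b, a))
    return decisions


# Toy data, invented for illustration only.
marks = {"s1": 30, "s2": 25, "s3": 28, "s4": 20}      # original marks
criterion = {"s1": 62, "s2": 50, "s3": 55, "s4": 41}  # hypothetical external measure
cj_decisions = [("s1", "s2"), ("s2", "s3"), ("s3", "s4"), ("s4", "s2")]

mark_decisions = mark_based_decisions(cj_decisions, marks)
cj_rate = agreement_rate(cj_decisions, criterion)
mark_rate = agreement_rate(mark_decisions, criterion)
print(f"CJ agreement: {cj_rate:.2f}, mark-based agreement: {mark_rate:.2f}")
```

Using the same script pairs for both sets of decisions keeps the comparison like-for-like: any difference in agreement then reflects how each pair was decided, not which pairs were sampled.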



Comparative Judgement, Validity, Marking

Journal Title

Research Matters


Research Division, Cambridge University Press & Assessment
