The reliabilities of three potential methods of capturing expert judgement in determining grade boundaries
Published version
Peer-reviewed
Abstract
In England there is a strong public expectation that qualification standards should remain constant over time. At each examination session, awarding bodies must therefore set grade boundaries for their examinations that are equivalent to those of previous sessions. We investigated the reliabilities of three methods for capturing the expert judgement of the professional examiners responsible for maintaining examination standards from year to year. The three methods were those used in traditional (current) awarding, in Thurstone pairs, and in rank ordering.
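In the Thurstone pairs method, judges repeatedly decide which of two scripts is better, and the scripts are then placed on a common scale. The sketch below shows one way to turn such judgements into scale values. It assumes a Bradley-Terry (logistic) model as a stand-in for Thurstone's normal-ogive model and fits it with the standard minorisation-maximisation update. The function `fit_bradley_terry` and the example data are hypothetical and do not describe the study's own procedure.

```python
import math
from collections import defaultdict


def fit_bradley_terry(judgements, n_iter=200):
    """Estimate a log-strength (scale value) for each script.

    Hypothetical helper: Bradley-Terry model fitted by the MM algorithm,
    used here as an assumed logistic stand-in for Thurstone's model.
    `judgements` is a list of (winner, loser) script ids. Every script
    must win at least once and lose at least once, otherwise its
    estimate diverges.
    """
    scripts = sorted({s for pair in judgements for s in pair})
    wins = defaultdict(int)    # W_i: comparisons won by script i
    n_pair = defaultdict(int)  # n_ij: times scripts i and j were compared
    for winner, loser in judgements:
        wins[winner] += 1
        n_pair[frozenset((winner, loser))] += 1

    p = {s: 1.0 for s in scripts}  # strengths, all equal to start
    for _ in range(n_iter):
        new_p = {}
        for i in scripts:
            denom = 0.0
            for key, n in n_pair.items():
                if i in key:
                    (j,) = key - {i}  # the other script in this pairing
                    denom += n / (p[i] + p[j])
            new_p[i] = wins[i] / denom
        # Fix the scale origin: the geometric mean of strengths is 1.
        log_mean = sum(math.log(v) for v in new_p.values()) / len(new_p)
        p = {s: v / math.exp(log_mean) for s, v in new_p.items()}

    # Log-strengths act as the scripts' positions on the judged scale.
    return {s: math.log(v) for s, v in p.items()}


# Usage with invented judgements: A beats B 2-1, B beats C 2-1, A beats C 2-1.
example = [("A", "B"), ("A", "B"), ("B", "A"),
           ("B", "C"), ("B", "C"), ("C", "B"),
           ("A", "C"), ("A", "C"), ("C", "A")]
print(fit_bradley_terry(example))
```

Because only differences between scale values matter, the origin is pinned arbitrarily; comparing scripts from two sessions on one scale would require them to be judged together.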
In the context of setting grade boundaries in AS level Biology and GCSE English, we conducted a three-way comparison of the intra-method and inter-method reliabilities of the three methods. For each subject, three mutually exclusive sets of examination scripts were created and matched on mark. Three groups of ten 'judges' (examiners matched for their experience of the methods) made judgements using each of the three methods on a different set of scripts. For both subjects, the traditional awarding and Thurstone pairs methods generated very similar boundary marks, except at the biology A/B grade boundary. The boundary marks generated by rank ordering were all on the lenient side for biology, whereas for the English C/D grade boundary they were on the severe side.
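The abstract does not say how the three mark-matched sets were built. The sketch below shows one plausible approach, stated purely as an assumption: sort scripts by mark, take consecutive blocks of three scripts with adjacent marks, and deal one script from each block to each set in random order. The function `make_matched_sets` and the example marks are hypothetical.

```python
import random


def make_matched_sets(scripts, n_sets=3, seed=0):
    """Split (script_id, mark) pairs into n_sets disjoint, mark-matched sets.

    Hypothetical procedure (not necessarily the study's): sort by mark,
    take consecutive blocks of n_sets scripts, and shuffle each block
    before dealing one script to each set. Any leftover scripts
    (len(scripts) % n_sets) are dropped.
    """
    rng = random.Random(seed)  # seeded so the allocation is reproducible
    ordered = sorted(scripts, key=lambda s: s[1])
    sets = [[] for _ in range(n_sets)]
    for start in range(0, len(ordered) - n_sets + 1, n_sets):
        block = ordered[start:start + n_sets]
        rng.shuffle(block)  # random allocation within a block of similar marks
        for target, script in zip(sets, block):
            target.append(script)
    return sets


# Usage with invented marks: three blocks of near-identical marks.
scripts = [(f"s{i}", m) for i, m in enumerate([10, 11, 12, 20, 21, 22, 30, 31, 32])]
for group in make_matched_sets(scripts):
    print(group)
```

Dealing within blocks of adjacent marks keeps every set's mark distribution close to the others, so no method is handed a systematically stronger or weaker set of scripts.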
