Comparing small-sample equating with Angoff judgement for linking cut-scores on two tests
Abstract
The aim of this study was to compare, by simulation, the accuracy of mapping a cut-score from one test to another by expert judgement (using the Angoff method) with the accuracy of a small-sample equating method (chained linear equating). As expected, the standard-setting method resulted in more accurate equating when we assumed a higher correlation between simulated expert judgements of item difficulty and empirical difficulty. For small-sample equating with 90 examinees per test, simple random sampling yielded more accurate equating than cluster sampling at the same sample size. The overall equating error depended on where on the mark scale the cut-score was located. The simulations based on a realistic value for the correlation between judged and empirical difficulty (0.6) produced a similar overall error to small-sample equating with cluster sampling. Simulations of standard-setting based on a very optimistic correlation of 0.9 had the lowest error of all.
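To make the equating method named above concrete, the following is a minimal sketch of chained linear equating through a common anchor test: scores on test X are first linked linearly to the anchor V using group 1's statistics, and the result is then linked to test Y using group 2's statistics. All summary statistics and the cut-score below are invented for illustration and are not taken from the study.

```python
def linear_link(x, mu_x, sd_x, mu_y, sd_y):
    # Linear equating: map x so that the means and SDs of the two scales match.
    return mu_y + (sd_y / sd_x) * (x - mu_x)

def chained_linear(x, stats):
    # Chain two linear links: X -> anchor V (group 1), then V -> Y (group 2).
    v = linear_link(x, stats["mu_x1"], stats["sd_x1"],
                    stats["mu_v1"], stats["sd_v1"])
    return linear_link(v, stats["mu_v2"], stats["sd_v2"],
                       stats["mu_y2"], stats["sd_y2"])

# Hypothetical summary statistics for two examinee groups.
stats = dict(mu_x1=50, sd_x1=10, mu_v1=20, sd_v1=4,   # group 1: test X, anchor V
             mu_v2=22, sd_v2=4, mu_y2=55, sd_y2=11)   # group 2: anchor V, test Y

cut_on_x = 60
cut_on_y = chained_linear(cut_on_x, stats)
print(round(cut_on_y, 1))  # mapped cut-score on test Y's scale: 60.5
```

In practice the means and standard deviations would be estimated from the sampled examinees (e.g. the 90 per test in the study), so sampling design — simple random versus cluster — directly affects the stability of this mapping.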