Skills embeddings: A neural approach to multicomponent representations of students and tasks
Accepted version
Peer-reviewed
Abstract
Educational systems use models of student skill to inform decision-making processes. Defining such a model manually is challenging due to the large number of relevant factors. We introduce an alternative approach that learns multidimensional representations (embeddings) from student activity data. Such embeddings are fixed-length real vectors with three desirable characteristics: similar students and items are co-located in a vector space, magnitude increases with skill, and the absence of a skill can be represented. Based on the Multicomponent Latent Trait Model, we use a neural network with complementary trainable weights to learn these embeddings by backpropagation in an unsupervised manner. We evaluate the approach on synthetic student activity data, which provides a ground truth of student skills, in order to understand the impact of the number of students, question items, and knowledge components in the domain. We find that our data-mined parameter values can recreate the synthetic datasets up to the accuracy of the model that generated them for domains containing up to 10 simultaneously active knowledge components, and that such domains can be effectively mined using relatively small quantities of data (1000 students, 100 items). We describe a procedure to estimate the number of components in a domain, and propose a component-masking logic mechanism that improves performance on high-dimensional datasets.
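The abstract does not give the functional form of the network, so the following is only a minimal sketch of the conjunctive idea behind the Multicomponent Latent Trait Model, assuming its usual formulation: the probability of a correct response is the product, over components, of a logistic function of student skill minus item difficulty. The function names, the three-component vectors, and the numeric values are hypothetical and chosen for illustration; the paper's trainable weights and its component-masking mechanism are not modelled here.

```python
import numpy as np


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def mltm_probability(student, item):
    """Probability of a correct response under an assumed conjunctive model.

    student: skill embedding, one real value per knowledge component.
    item:    difficulty embedding with the same length.
    The per-component success probabilities are multiplied, so a
    deficit in any single component lowers the overall probability.
    """
    per_component = sigmoid(student - item)
    return per_component.prod(axis=-1)


# Hypothetical three-component domain.
item = np.array([0.0, 0.0, 0.0])
student_strong = np.array([2.0, 2.0, 2.0])   # skilled in every component
student_weak = np.array([2.0, 2.0, -2.0])    # lacks the third component

p_strong = mltm_probability(student_strong, item)
p_weak = mltm_probability(student_weak, item)
print(p_strong, p_weak)
```

In this sketch the second student matches the first on two components but still has a much lower success probability, which illustrates why a multiplicative model can separate skills that are simultaneously active in one item.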