Research data supporting "Towards an open-domain chatbot for language practice"

Tyen, Wen Hoi Gladys 
Brenchley, Mark 
Caines, Andrew 
Buttery, Paula 

This dataset is a set of dialogues generated by an artificial dialogue system, along with difficulty and quality annotations. The dialogue system used is a modified version of BlenderBot 1.0 (Roller et al., 2021). In each dialogue, the system is adjusted to generate messages at a particular difficulty level (denoted by CEFR levels). The system always responds to the previously generated message as if in a two-person conversation.

These dialogues are then shown to 10 English language examiners, who are asked to annotate the dialogues according to the difficulty and quality of the messages. They are asked to give an overall CEFR level for the dialogue, and to assign each individual message binary labels denoting whether the message is grammatical, sensible, and specific to the conversation.

The .json file contains a list of dictionaries, with the following keys:

  • "intended_cefr" refers to the CEFR level that was used when generating the dialogue.
  • "generation_method" refers to one of the five methods used in Tyen et al. (2022).
  • "dialogue_turns" is the list of generated dialogue messages.
  • "cefr_annotations" contains a dictionary of CEFR levels as determined by annotators.
  • "grammaticality_annotations" is a list of dictionaries containing binary labels, referring to whether an annotator considered a dialogue message to be grammatical. The order of the dictionaries corresponds to the order of the dialogue messages in "dialogue_turns".
  • "sensibleness_annotations" is structured in the same way as "grammaticality_annotations", but instead describes whether an annotator thought the message was sensible.
  • "specificity_annotations" is structured in the same way as "grammaticality_annotations", but instead describes whether an annotator thought the message was specific to the conversation.
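The structure described above can be illustrated with a short Python sketch. The record below is a synthetic example that follows the documented keys (the dialogue text, method name, annotator IDs, and label values are invented for illustration only; consult the actual .json file for real entries):

```python
import json
from statistics import mean

# Synthetic record mimicking one dictionary from the dataset's JSON list.
# Keys match the documentation above; all values here are made up.
record = {
    "intended_cefr": "B1",
    "generation_method": "example_method",  # hypothetical method name
    "dialogue_turns": ["Hi, how are you?", "I'm well, thanks. And you?"],
    "cefr_annotations": {"annotator_1": "B1", "annotator_2": "B2"},
    # One dict per message, in the same order as "dialogue_turns";
    # each dict maps an annotator to a binary label.
    "grammaticality_annotations": [
        {"annotator_1": 1, "annotator_2": 1},
        {"annotator_1": 1, "annotator_2": 0},
    ],
}

# In practice the file contains a list of such records, e.g.:
dialogues = json.loads(json.dumps([record]))  # stand-in for json.load(open(...))

# Per-message proportion of annotators who judged the message grammatical.
for d in dialogues:
    rates = [mean(labels.values()) for labels in d["grammaticality_annotations"]]
    print(d["intended_cefr"], rates)
```

The same aggregation works unchanged for "sensibleness_annotations" and "specificity_annotations", since they share the structure of "grammaticality_annotations".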

For more detailed descriptions of the adjustments used in each method, as well as definitions of grammaticality, sensibleness, and specificity, please see Tyen et al. (2022) or Adiwardana et al. (2020).

Tyen, G., Brenchley, M., Caines, A., & Buttery, P. (2022). Towards an open-domain chatbot for language practice. 17th Workshop on Innovative Use of NLP for Building Educational Applications.

Adiwardana, D., Luong, M.-T., So, D. R., Hall, J., Thoppilan, R., Yang, Z., Kulshreshtha, A., Nemade, G., Lu, Y., & Le, Q. V. (2020). Towards a human-like open-domain chatbot. arXiv:2001.09977

Roller, S., Dinan, E., Goyal, N., Ju, D., Williamson, M., Liu, Y., ... & Weston, J. (2021, April). Recipes for building an open-domain chatbot. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (pp. 300-325).

Software / Usage instructions

Keywords: chatbot, dialogue system, language learning, neural text generation, text complexity
This research was supported by Cambridge University Press & Assessment. This work was performed using resources provided by the Cambridge Service for Data Driven Discovery (CSD3) operated by the University of Cambridge Research Computing Service, provided by Dell EMC and Intel using Tier-2 funding from the Engineering and Physical Sciences Research Council (capital grant EP/P020259/1), and DiRAC funding from the Science and Technology Facilities Council.