Research data supporting "MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling"
MetadataShow full item record
Budzianowski, P., Wen, T., & Gasic, M. (2018). Research data supporting "MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling" [Dataset]. https://doi.org/10.17863/CAM.27632
Dataset contains the following json files: 1. data.json: the woz dialogue dataset, which contains the conversation users and wizards, as well as a set of coarse labels for each user turn. 2. restaurant_db.json: the Cambridge restaurant database file, containing restaurants in the Cambridge UK area and a set of attributes. 3. attraction_db.json: the Cambridge attraction database file, contining attractions in the Cambridge UK area and a set of attributes. 4. hotel_db.json: the Cambridge hotel database file, containing hotels in the Cambridge UK area and a set of attributes. 5. train_db.json: the Cambridge train (with artificial connections) database file, containing trains in the Cambridge UK area and a set of attributes. 6. hospital_db.json: the Cambridge hospital database file, contatining information about departments. 7. police_db.json: the Cambridge police station information. 8. taxi_db.json: slot-value list for taxi domain. 9. valListFile.json: list of dialogues for validation. 10. testListFile.json: list of dialogues for testing. 11. system_acts.json: system acts annotations 12. ontology.json: Data-based ontology.
The Multi-Domain Wizard-of-Oz dataset (MultiWOZ), a collection of human-human written conversations spanning over multiple domains and topics. The dataset was collected based on the Wizard of Oz experiment on Amazon MTurk. Each dialogue contains a goal label and several exchanges between a visitor and the system. Each system turn has labels from the set of slot-value pairs representing a coarse representation of dialogue state for both user and system. There are in total 10438 dialogues.
dialogue system, dataset, wizard of oz
Publication Reference: https://doi.org/10.18653/v1/D18-1547
The data collection was funded through Google Faculty Award.
This record's DOI: https://doi.org/10.17863/CAM.27632
Attribution 4.0 International (CC BY 4.0)
Licence URL: https://creativecommons.org/licenses/by/4.0/