Research data supporting "MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling"
Authors
Mihail, Eric
Rahul, Goel
Shachi, Paul
Sethi, Abhishek
Agarwal, Sanchit
Gao, Shuyang
Hakkani-Tur, Dilek
Publication Date
2019-07-10
Type
Dataset
Citation
Budzianowski, P., Mihail, E., Rahul, G., Shachi, P., Sethi, A., Agarwal, S., Gao, S., et al. (2019). Research data supporting "MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling" [Dataset]. https://doi.org/10.17863/CAM.41572
Description
The dataset contains the following json files:
1. data.json: the WOZ dialogue dataset, which contains the conversations between users and wizards, as well as a set of coarse labels for each user turn.
2. restaurant_db.json: the Cambridge restaurant database file, containing restaurants in the Cambridge UK area and a set of attributes.
3. attraction_db.json: the Cambridge attraction database file, containing attractions in the Cambridge UK area and a set of attributes.
4. hotel_db.json: the Cambridge hotel database file, containing hotels in the Cambridge UK area and a set of attributes.
5. train_db.json: the Cambridge train (with artificial connections) database file, containing trains in the Cambridge UK area and a set of attributes.
6. hospital_db.json: the Cambridge hospital database file, containing information about departments.
7. police_db.json: the Cambridge police station information.
8. taxi_db.json: slot-value list for the taxi domain.
9. valListFile.json: list of dialogues for validation.
10. testListFile.json: list of dialogues for testing.
11. system_acts.json: system act annotations.
12. ontology.json: data-based ontology.
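The file list above suggests how the dialogues might be partitioned for experiments. The following is a minimal sketch, not the authors' loading code. It rests on assumptions that this record does not confirm: that data.json is a single JSON object keyed by dialogue ID, that valListFile.json and testListFile.json hold one dialogue ID per line, and that every dialogue not listed in either file belongs to the training set. The helper names (read_id_list, split_dialogues, load_splits) are hypothetical.

```python
import json
from pathlib import Path


def read_id_list(path):
    """Read dialogue IDs from a list file.

    Assumption: one dialogue ID per line; blank lines are ignored.
    """
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def split_dialogues(dialogues, val_ids, test_ids):
    """Partition a {dialogue_id: dialogue} mapping into train/val/test.

    Assumption: any dialogue not named in either list is training data.
    """
    splits = {"train": {}, "val": {}, "test": {}}
    for dialogue_id, dialogue in dialogues.items():
        if dialogue_id in val_ids:
            splits["val"][dialogue_id] = dialogue
        elif dialogue_id in test_ids:
            splits["test"][dialogue_id] = dialogue
        else:
            splits["train"][dialogue_id] = dialogue
    return splits


def load_splits(data_dir):
    """Load data.json and the two list files from data_dir, then split."""
    data_dir = Path(data_dir)
    with open(data_dir / "data.json", encoding="utf-8") as f:
        # Assumption: the top level is an object keyed by dialogue ID.
        dialogues = json.load(f)
    val_ids = read_id_list(data_dir / "valListFile.json")
    test_ids = read_id_list(data_dir / "testListFile.json")
    return split_dialogues(dialogues, val_ids, test_ids)
```

Calling load_splits on the directory holding the unpacked files would return a dictionary with "train", "val" and "test" entries. If the list files turn out to be JSON arrays rather than line-separated text, read_id_list would need to use json.load instead.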
Important note: This dataset was previously entitled 'Research data supporting "MultiWOZ 2.1 - Multi-Domain Dialogue State Corrections and State Tracking Baselines"'. The change to the current title of this dataset was made at the request of the authors in July 2019.
Format
The Multi-Domain Wizard-of-Oz dataset (MultiWOZ) is a collection of human-human written conversations spanning multiple domains and topics. The dataset was collected using the Wizard-of-Oz setup on Amazon MTurk. Each dialogue contains a goal label and several exchanges between a visitor and the system. Each system turn is labelled with slot-value pairs that give a coarse representation of the dialogue state for both user and system. There are 10438 dialogues in total.
This dataset contains corrections to the MultiWOZ 2.0 dataset.
Keywords
dialogue, machine learning, conversational ai
Relationships
Publication Reference: https://arxiv.org/pdf/1907.01669.pdf
Identifiers
This record's DOI: https://doi.org/10.17863/CAM.41572
Rights
Attribution 4.0 International (CC BY 4.0)
Licence URL: https://creativecommons.org/licenses/by/4.0/