Research data supporting "MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling"

Name: Research data supporting "MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling"
Published: 2018-09-21T14:46:01Z
Keywords: dialogue system, dataset, wizard of oz

Budzianowski, PF; Wen, T-H; Gasic, M

doi:10.17863/CAM.27632

This is not the latest version of this item. A later version is listed under "Is previous version of" in Relationships.

Repository URI

https://www.repository.cam.ac.uk/handle/1810/280608

Repository DOI

https://doi.org/10.17863/CAM.27632

Files

MULTIWOZ2.zip (13.34 MB)

Type

Dataset

Authors

Budzianowski, PF

Wen, T-H

Gasic, Milica

https://orcid.org/0000-0003-0318-9147

Description

Dataset contains the following json files:

data.json: the WOZ dialogue dataset, which contains the conversations between users and wizards, as well as a set of coarse labels for each user turn.
restaurant_db.json: the Cambridge restaurant database file, containing restaurants in the Cambridge UK area and a set of attributes.
attraction_db.json: the Cambridge attraction database file, containing attractions in the Cambridge UK area and a set of attributes.
hotel_db.json: the Cambridge hotel database file, containing hotels in the Cambridge UK area and a set of attributes.
train_db.json: the Cambridge train database file (with artificial connections), containing trains in the Cambridge UK area and a set of attributes.
hospital_db.json: the Cambridge hospital database file, containing information about departments.
police_db.json: the Cambridge police station information.
taxi_db.json: slot-value list for the taxi domain.
valListFile.json: list of dialogues for validation.
testListFile.json: list of dialogues for testing.
system_acts.json: system act annotations.
ontology.json: Data-based ontology.
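
The file list above implies a train/validation/test split: dialogues named in valListFile.json and testListFile.json are held out, and the rest of data.json is left for training. A minimal Python sketch of that split follows. It rests on assumptions this record does not state: that data.json is a JSON object keyed by dialogue ID, and that the two list files hold one dialogue ID per line as plain text despite the .json extension. The function names are illustrative only.

```python
import json
from pathlib import Path


def read_id_list(path):
    """Read dialogue IDs, one per line, skipping blank lines.

    Assumption: the list files are plain text with one ID per line,
    even though their names end in .json.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def split_dialogues(dialogues, val_ids, test_ids):
    """Partition a {dialogue_id: dialogue} mapping into train/val/test.

    Any dialogue not named in val_ids or test_ids is treated as training data.
    """
    val_ids, test_ids = set(val_ids), set(test_ids)
    splits = {"train": {}, "val": {}, "test": {}}
    for dialogue_id, dialogue in dialogues.items():
        if dialogue_id in test_ids:
            splits["test"][dialogue_id] = dialogue
        elif dialogue_id in val_ids:
            splits["val"][dialogue_id] = dialogue
        else:
            splits["train"][dialogue_id] = dialogue
    return splits


def load_splits(data_dir):
    """Load data.json and both list files from the unpacked archive."""
    data_dir = Path(data_dir)
    # Assumption: data.json is a single JSON object keyed by dialogue ID.
    with open(data_dir / "data.json", encoding="utf-8") as f:
        dialogues = json.load(f)
    return split_dialogues(
        dialogues,
        read_id_list(data_dir / "valListFile.json"),
        read_id_list(data_dir / "testListFile.json"),
    )
```

Calling load_splits on the directory where MULTIWOZ2.zip was unpacked would return three dictionaries. If the list files turn out to use a different layout, only read_id_list needs to change.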

Software / Usage instructions

The Multi-Domain Wizard-of-Oz dataset (MultiWOZ) is a collection of human-human written conversations spanning multiple domains and topics. The dataset was collected using the Wizard-of-Oz setup on Amazon MTurk. Each dialogue contains a goal label and several exchanges between a visitor and the system. Each system turn is labelled with slot-value pairs that give a coarse representation of the dialogue state for both user and system. There are 10438 dialogues in total.
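
As an illustration of the "several exchanges between a visitor and the system" structure, the Python sketch below pairs consecutive turns of one dialogue into (user, system) exchanges. The key names "log" and "text", and the assumption that turns alternate starting with the user, are not given in this record; they are hypothetical defaults that should be checked against data.json before use.

```python
def iter_exchanges(dialogue, log_key="log", text_key="text"):
    """Yield (user_text, system_text) pairs from one dialogue.

    Assumptions (not stated in this record): the turns live in a list
    under `log_key`, each turn stores its utterance under `text_key`,
    and turns alternate user, system, user, system, ...
    A trailing user turn with no system reply is ignored.
    """
    turns = dialogue.get(log_key, [])
    for i in range(0, len(turns) - 1, 2):
        yield turns[i][text_key].strip(), turns[i + 1][text_key].strip()


def count_exchanges(dialogues, **kwargs):
    """Total number of (user, system) exchanges over a dialogue mapping."""
    return sum(
        sum(1 for _ in iter_exchanges(dialogue, **kwargs))
        for dialogue in dialogues.values()
    )
```

Both key names are parameters so the helper can be adapted without rewriting it if the actual field names differ.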

Keywords

dialogue system, dataset, wizard of oz

Rights and licensing

Except where otherwise noted, this item's license is described as Attribution 4.0 International (CC BY 4.0).

Sponsorship

The data collection was funded through a Google Faculty Award.

Relationships

Supplements:

https://doi.org/10.18653/v1/D18-1547

Is previous version of:

https://doi.org/10.17863/CAM.41572

Collections

Research Data - Engineering
