WMT16 Quality Estimation Shared Task Training and Development Data
- Title:
- WMT16 Quality Estimation Shared Task Training and Development Data
- Creator:
- Specia, Lucia; Logacheva, Varvara; Scarton, Carolina
- Contributor:
- European Union; H2020-ICT-2014-1-645452; QT21: Quality Translation 21; euFunds; info:eu-repo/grantAgreement/EC/H2020/645452
- Publisher:
- University of Sheffield
- Identifier:
- http://hdl.handle.net/11372/LRT-1646
- Subject:
- machine translation, quality estimation, and machine learning
- Type:
- text and corpus
- Description:
- Training and development data for the WMT16 QE task. Test data will be published as a separate item. This shared task will build on its previous four editions to further examine automatic methods for estimating the quality of machine translation output at run-time, without relying on reference translations. We include word-level, sentence-level and document-level estimation. The sentence- and word-level tasks will explore a large dataset produced from post-editions by professional translators (as opposed to crowdsourced translations as in the previous year). For the first time, the data will be domain-specific (IT domain). The document-level task will use, for the first time, entire documents, which have been human-annotated for quality indirectly in two ways: through reading comprehension tests and through a two-stage post-editing exercise. Our tasks have the following goals: (1) to advance work on sentence- and word-level quality estimation by providing domain-specific, larger and professionally annotated datasets; (2) to study the utility of detailed information logged during post-editing (time, keystrokes, actual edits) for different levels of prediction; (3) to analyse the effectiveness of different types of quality labels provided by humans for longer texts in document-level prediction. This year's shared task provides new training and test datasets for all tasks, and allows participants to explore any additional data and resources deemed relevant. An in-house MT system was used to produce translations for the sentence- and word-level tasks, and multiple MT systems were used to produce translations for the document-level task. Therefore, MT system-dependent information will be made available where possible.
- Language:
- English and German
- Rights:
- AGREEMENT ON THE USE OF DATA IN QT21
- https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21
- PUB
- Relation:
- info:eu-repo/grantAgreement/EC/H2020/645452
- http://hdl.handle.net/11372/LRT-1631
- http://hdl.handle.net/11372/LRT-1974
- Source:
- http://www.statmt.org/wmt16/quality-estimation-task.html
- Harvested from:
- LINDAT/CLARIAH-CZ repository
- Metadata only:
- false
- Date:
- 2016-02-29