WMT16 Quality Estimation Shared Task Training and Development Data

Title:: WMT16 Quality Estimation Shared Task Training and Development Data
Creator:: Specia, Lucia; Logacheva, Varvara; Scarton, Carolina
Contributor:: European Union; H2020-ICT-2014-1-645452; QT21: Quality Translation 21; euFunds; info:eu-repo/grantAgreement/EC/H2020/645452
Publisher:: University of Sheffield
Identifier:: http://hdl.handle.net/11372/LRT-1646
Subject:: machine translation, quality estimation, and machine learning
Type:: text and corpus
Description:: Training and development data for the WMT16 QE task. Test data will be published as a separate item. This shared task builds on its four previous editions to further examine automatic methods for estimating the quality of machine translation output at run-time, without relying on reference translations. We include word-level, sentence-level and document-level estimation. The sentence- and word-level tasks will explore a large dataset produced from post-edits by professional translators (as opposed to the crowdsourced translations used in the previous year). For the first time, the data will be domain-specific (IT domain). The document-level task will use, for the first time, entire documents, which have been human-annotated for quality indirectly in two ways: through reading comprehension tests and through a two-stage post-editing exercise. Our tasks have the following goals:
- To advance work on sentence- and word-level quality estimation by providing domain-specific, larger and professionally annotated datasets.
- To study the utility of detailed information logged during post-editing (time, keystrokes, actual edits) for different levels of prediction.
- To analyse the effectiveness of different types of human-provided quality labels for longer texts in document-level prediction.
This year's shared task provides new training and test datasets for all tasks, and allows participants to explore any additional data and resources deemed relevant. An in-house MT system was used to produce translations for the sentence- and word-level tasks, and multiple MT systems were used to produce translations for the document-level task. Where possible, MT system-dependent information will also be made available.
Language:: English and German
Rights:: AGREEMENT ON THE USE OF DATA IN QT21
https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21
PUB
Relation:: info:eu-repo/grantAgreement/EC/H2020/645452
http://hdl.handle.net/11372/LRT-1631
http://hdl.handle.net/11372/LRT-1974
Source:: http://www.statmt.org/wmt16/quality-estimation-task.html
Harvested from:: LINDAT/CLARIAH-CZ repository
Metadata only:: false
Date:: 2016-02-29
