Harvested from: LINDAT/CLARIAH-CZ repository / Rights: PUB / Subject: machine translation / Type: text

1. APE Shared Task WMT17: Human Post-edits Test Data DE-EN

Creator:: Turchi, Marco, Chatterjee, Rajen, and Negri, Matteo
Publisher:: Fondazione Bruno Kessler, Trento, Italy
Type:: text and corpus
Subject:: Human post-edits, machine translation, shared task, automatic post-editing, and post-editing
Language:: English
Description:: Human post-edited test sentences for the WMT 2017 Automatic post-editing task. The set consists of 2,000 English sentences from the IT domain, already tokenized. Source and target segments can be downloaded from: https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-2132. All data is provided by the EU project QT21 (http://www.qt21.eu/).
Rights:: AGREEMENT ON THE USE OF DATA IN QT21 APE Task, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21, and PUB

2. APE Shared Task WMT17: Human Post-edits Test Data EN-DE

Creator:: Turchi, Marco, Chatterjee, Rajen, and Negri, Matteo
Publisher:: Fondazione Bruno Kessler, Trento, Italy
Type:: text and corpus
Subject:: machine translation, human post-edits, shared task, automatic post-editing, and post-editing
Language:: German
Description:: Human post-edited test sentences for the WMT 2017 Automatic post-editing task. The set consists of 2,000 German sentences from the IT domain, already tokenized. Source and target segments can be downloaded from: https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-2133. All data is provided by the EU project QT21 (http://www.qt21.eu/).
Rights:: AGREEMENT ON THE USE OF DATA IN QT21 APE Task, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21, and PUB

3. Automatic Paraphrases of Czech Reference Sentences for WMT11, 13 and 14

Creator:: Barančíková, Petra and Tamchyna, Aleš
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: machine translation, automatic evaluation, and paraphrasing
Language:: Czech
Description:: This dataset contains automatic paraphrases of the official Czech reference translations for the Workshop on Statistical Machine Translation shared task. The data covers the years 2011, 2013 and 2014. For each sentence, at most 10,000 paraphrases were included (randomly selected from the full set). The dataset is intended to improve automatic evaluation of machine translation outputs. If you use this work, please cite the following paper: Tamchyna Aleš, Barančíková Petra: Automatic and Manual Paraphrases for MT Evaluation. In Proceedings of LREC, 2016.
Rights:: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB

4. DiscoMT 2015 Shared Task on Pronoun Translation

Creator:: Hardmeier, Christian, Tiedemann, Jörg, Nakov, Preslav, Stymne, Sara, and Versley, Yannick
Publisher:: Uppsala University
Type:: text and corpus
Subject:: machine translation, coreference resolution, anaphora resolution, and discourse
Language:: English and French
Description:: The data set includes training, development and test data from the shared tasks on pronoun-focused machine translation and cross-lingual pronoun prediction from the EMNLP 2015 workshop on Discourse in Machine Translation (DiscoMT2015). The release also contains the submissions to the pronoun-focused machine translation task, the manual annotations used for the official evaluation, and gold-standard annotations of pronoun coreference for the shared task test set.
Rights:: Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB

5. DiscoMT 2016 Shared Task on Cross-lingual Pronoun Prediction

Creator:: Guillou, Liane, Hardmeier, Christian, Nakov, Preslav, Stymne, Sara, Tiedemann, Jörg, Versley, Yannick, Cettolo, Mauro, Webber, Bonnie, and Popescu-Belis, Andrei
Publisher:: Uppsala University
Type:: text and corpus
Subject:: machine translation, coreference, discourse, and pronouns
Language:: English, French, and German
Description:: Files for the DiscoMT 2016 shared task on cross-lingual pronoun prediction.
Rights:: Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB

6. DiscoMT 2017 Shared Task on Cross-lingual Pronoun Prediction

Creator:: Loáiciga, Sharid, Stymne, Sara, Nakov, Preslav, Hardmeier, Christian, Tiedemann, Jörg, Cettolo, Mauro, and Versley, Yannick
Publisher:: Uppsala University
Type:: text and corpus
Subject:: machine translation, discourse, coreference, and pronouns
Language:: English, Spanish, German, and French
Description:: Data used in the 2017 shared task on cross-lingual pronoun prediction.
Rights:: Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB

7. English-Urdu Religious Parallel Corpus

Creator:: Jawaid, Bushra and Zeman, Daniel
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: parallel corpus, religious text, and machine translation
Language:: English and Urdu
Description:: The English-Urdu parallel corpus is a collection of religious texts (Quran, Bible) in English and Urdu with sentence alignments. The corpus can be used for experiments with statistical machine translation. Our modifications of the crawled data include but are not limited to the following: (1) manually corrected sentence alignment of the corpora; (2) our data split (training-development-test), so that our published experiments can be reproduced; (3) tokenization (optional, but needed to reproduce our experiments); (4) normalization (optional) of, e.g., European vs. Urdu numerals and European vs. Urdu punctuation, and removal of Urdu diacritics.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

8. Extended CLEF eHealth 2013-2015 IR Test Collection

Creator:: Pecina, Pavel and Saleh, Shadi
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: cross-lingual information retrieval and machine translation
Language:: English, Czech, French, German, Hungarian, Polish, Spanish, and Swedish
Description:: This package contains an extended version of the test collection used in the CLEF eHealth Information Retrieval tasks in 2013-2015. Compared to the original version, it provides complete query translations into Czech, French, German, Hungarian, Polish, Spanish and Swedish and additional relevance assessment.
Rights:: Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB

9. FAUST cs-en 0.5

Creator:: Hajič, Jan, Mareček, David, Fučíková, Eva, Cinková, Silvie, Štěpánek, Jan, Mikulová, Marie, and Popel, Martin
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: noisy texts, parallel corpus, and machine translation
Language:: English and Czech
Description:: This machine translation test set contains 2223 Czech sentences collected within the FAUST project (https://ufal.mff.cuni.cz/grants/faust, http://hdl.handle.net/11234/1-3308). Each original (noisy) sentence was normalized (clean1 and clean2) and translated to English independently by two translators.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

10. Khresmoi Query Translation Test Data 1.0

Creator:: Pecina, Pavel, Dušek, Ondřej, Hajič, Jan, and Urešová, Zdeňka
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: corpus, test data, medical, health, machine translation, Czech, French, German, and English
Language:: English, French, German, and Czech
Description:: This package contains data sets for development and testing of machine translation of short medical search queries between Czech, English, French, and German. The queries come from the general public and medical experts. This work was supported by the EU FP7 project Khresmoi (European Commission contract No. 257528). The language resources are distributed by the LINDAT/Clarin project of the Ministry of Education, Youth and Sports of the Czech Republic (project no. LM2010013). We thank Health on the Net Foundation for granting the license for the English general public queries, TRIP database for granting the license for the English medical expert queries, and three anonymous translators and three medical experts for translating and revising the data.
Rights:: Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0), http://creativecommons.org/licenses/by-nc/3.0/, and PUB
