Contributor: European Union, EC/H2020/825303, Bergamot - Browser-based Multilingual Translation, euFunds, info:eu-repo/grantAgreement/EC/H2020/825303 / Subject: machine translation


1. Machine Translation Testsuite for Gender-Consistent Translation

Creator:: Aires, João Paulo
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: machine translation, testsuite, evaluation, and gender
Language:: English and Czech
Description:: Document-level testsuite for evaluating gender translation consistency. The test set consists of selected English documents from the WMT21 newstest, annotated with gender information. Unannotated Czech references are also included for convenience. We semi-automatically annotated person names and pronouns to identify the gender of these elements as well as their coreferences. The annotation consists of three elements: (1) an ID, (2) an element class, and (3) gender. The ID identifies a person's name and links its occurrences (name and pronouns). The element class identifies whether the tag refers to a name or a pronoun. Finally, the gender information defines whether the element is masculine or feminine. We applied a series of NLP techniques to automatically identify person names and coreferences. This initial process resulted in a set of 45 documents to be manually annotated. We then proceeded with manual annotation of these documents to make sure they are correctly tagged. See README.md for more details.
Rights:: Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB
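
The three-element annotation described for this testsuite (ID, element class, gender) can be pictured as a small record type. The following is only a minimal sketch: the class name `Annotation`, its field names, the `text` field, the helper `inconsistent_entities`, and the sample mentions are all hypothetical and made up for illustration. The actual tag format is documented in the dataset's README.md and may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One annotated mention (hypothetical layout, not the dataset's real format)."""
    entity_id: int      # (1) ID shared by a name and its coreferent pronouns
    element_class: str  # (2) "name" or "pronoun"
    gender: str         # (3) "masculine" or "feminine"
    text: str           # surface form; an assumed extra field for readability


def inconsistent_entities(annotations):
    """Return the sorted IDs whose mentions carry more than one gender."""
    genders = defaultdict(set)
    for ann in annotations:
        genders[ann.entity_id].add(ann.gender)
    return sorted(eid for eid, seen in genders.items() if len(seen) > 1)


# Invented sample data: entity 2 is referred to with a mismatched pronoun.
mentions = [
    Annotation(1, "name", "feminine", "Anna Smith"),
    Annotation(1, "pronoun", "feminine", "she"),
    Annotation(2, "name", "masculine", "John Doe"),
    Annotation(2, "pronoun", "feminine", "her"),
]
print(inconsistent_entities(mentions))
```

Grouping mentions by ID is what makes a document-level consistency check possible: a translation is gender-consistent for a person only if every mention sharing that ID agrees on gender.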

2. Ptakopět data: the dataset for experiments on outbound translation

Creator:: Novák, Michal, Zouhar, Vilém, and Bojar, Ondřej
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: machine translation, interactive, and web forms
Language:: English and Czech
Description:: The dataset used for the Ptakopět experiment on outbound machine translation. It consists of screenshots of web forms with user queries entered. The queries are also available in text form. The dataset comprises two language versions: English and Czech. Whereas the English version has been fully post-processed (screenshots cropped, queries within the screenshots highlighted, dataset split by quality, etc.), the Czech version is raw, as collected by the annotators.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

3. WMT21 Marian translation model (ca-oc multi-task)

Creator:: Novák, Michal and Jon, Josef
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: other and toolService
Subject:: neural machine translation, machine translation, grapheme-to-phoneme conversion, and multi-task model
Language:: Catalan and Occitan (post 1500)
Description:: Marian NMT model for Catalan to Occitan translation. It is a multi-task model that also produces a phonemic transcription of the Catalan source. The model was submitted to the WMT'21 Shared Task on Multilingual Low-Resource Translation for Indo-European Languages as the CUNI-Contrastive system for Catalan to Occitan.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

4. WMT21 Marian translation model (ca-oc)

Creator:: Jon, Josef
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: other and toolService
Subject:: machine translation and neural machine translation
Language:: Catalan and Occitan (post 1500)
Description:: Marian NMT model for Catalan to Occitan translation. Primary CUNI submission for the WMT21 Shared Task on Multilingual Low-Resource Translation for Indo-European Languages.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
