Preamble 1.0 is a multilingual annotated corpus of the preamble of the EU REGULATION 2020/2092 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL. The corpus consists of four language versions of the preamble (Czech, English, French, Polish), each of them annotated with sentence subjects.
The data were annotated in the Brat tool (https://brat.nlplab.org/) and are distributed in the Brat native format, i.e. each annotated preamble is represented by the original plain text and a stand-off annotation file.
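As a minimal sketch of how the stand-off files might be read, the Python below parses text-bound annotation lines (those whose ID starts with `T`) from a Brat `.ann` file and checks them against the plain text. The sample sentence, the label name `Subject`, and the helper names `TextBound` and `parse_text_bound` are assumptions made for illustration and are not taken from the corpus.

```python
from typing import List, NamedTuple


class TextBound(NamedTuple):
    """One text-bound annotation from a Brat .ann file."""
    ann_id: str
    label: str
    start: int
    end: int
    text: str


def parse_text_bound(ann_content: str) -> List[TextBound]:
    """Parse lines of the form 'T1<TAB>Label start end<TAB>covered text'."""
    spans = []
    for line in ann_content.splitlines():
        if not line.startswith("T"):
            continue  # skip relations, attributes, notes, etc.
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # malformed line; ignore it in this sketch
        ann_id, type_and_offsets, covered = parts[0], parts[1], parts[2]
        label, offsets = type_and_offsets.split(" ", 1)
        # Discontinuous spans are written as 'start end;start end'.
        fragments = [tuple(int(x) for x in frag.split())
                     for frag in offsets.split(";")]
        start, end = fragments[0][0], fragments[-1][1]
        spans.append(TextBound(ann_id, label, start, end, covered))
    return spans


# Hypothetical sample data (not from the corpus).
sample_txt = "The Commission shall assess the situation."
sample_ann = "T1\tSubject 0 14\tThe Commission\n"

annotations = parse_text_bound(sample_ann)
for a in annotations:
    # For a contiguous span, the offsets index directly into the plain text.
    print(a.ann_id, a.label, repr(sample_txt[a.start:a.end]))
```

For contiguous spans the character offsets slice the plain-text file directly, so a mismatch between the slice and the covered text in the `.ann` file is a quick sign that the two files have drifted apart.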
The presented game is designed to teach the six most frequent English prepositions (to, of, in, for, on, and with) at the A1 to A2 levels of proficiency. Prep for Adventure is a single-player game comprising five separate tasks: a jumping puzzle, cooking, a town maze, lighting the goblets, and banter with a classmate. Their mechanics are then combined in the final task (The Final Fight) to elicit the correct responses from the subject.
The language used in the game is adjusted to the subjects’ level of proficiency. The game is fully voiced and offers a degree of customization. All tasks are based on gap-filling exercises in which subjects complete a sentence with a missing word, either by typing it in or by choosing it in one of several multiple-choice formats. The game is designed to improve the subjects’ performance in prepositional structures by exposing players to as much input as possible.
An average playthrough takes approximately 30-45 minutes. The game was created in the RPG Maker MV engine. RPG stands for role-playing game, a genre in which the player adopts the role or roles of one or more fictional characters in a (partly or fully) invented setting.
The game story:
The Grammar School of Witchcraft has been taken over by the Evil Preposition Magician, and the player is trying to win their school back alongside a young witch named Morphologina (the player’s guide).
Free electronic books available for download or online browsing; German-language literature makes up only part of the available e-books.
The dataset used for the Ptakopět experiment on outbound machine translation. It consists of screenshots of web forms with user queries entered. The queries are also available in text form. The dataset comprises two language versions: English and Czech. Whereas the English version has been fully post-processed (screenshots cropped, queries within the screenshots highlighted, dataset split based on its quality, etc.), the Czech version is raw, exactly as it was collected by the annotators.
Post-editing and MQM annotations produced by the QT21 project, as described in:
@InProceedings{specia-etal_MTSummit:2017,
author = {Specia, Lucia and Kim Harris and Frédéric Blain and Aljoscha Burchardt and Vivien Macketanz and Inguna Skadiņa and Matteo Negri and Marco Turchi},
title = {Translation Quality and Productivity: A Study on Rich Morphology Languages},
booktitle = {Proceedings of Machine Translation Summit XVI},
year = {2017},
pages = {55--71},
address = {Nagoya, Japan},
}
This corpus is part of Deliverable 5.5 of the European Commission project QTLeap FP7-ICT-2013.4.1-610516 (http://qtleap.eu).
The texts are Q&A interactions from the real-user scenario (batches 1 and 2). The interactions in this corpus are available in Basque, Bulgarian, Czech, English, Portuguese and Spanish.
The texts have been automatically annotated with NLP tools, including Word Sense Disambiguation, Named Entity Disambiguation and Coreference Resolution. See deliverable D5.6 at http://qtleap.eu/deliverables for more information.
A dataset collected from natural dialogs that makes it possible to test the ability of dialog systems to interactively learn new facts from user utterances throughout the dialog. The dataset, consisting of 1900 dialogs, allows simulation of the interactive acquisition of denotations and question explanations from users, which can be used for interactive learning.
The RuN corpus is a parallel corpus consisting of Norwegian, Russian and English texts. The texts are aligned at the sentence level and have been tagged for grammatical information at the word level.