The segment from a Degl film production company newsreel captures President Tomáš Garrigue Masaryk's first visit to the National Theatre. Cars carrying guests arrive at the side entrance of the building. The president leaves the theatre and steps into a car.
Experimental materials, data and R scripts used in the paper "Garden-path sentences and the diversity of their
(mis)representations" (Ceháková - Chromý, 2023).
Segment from Český zvukový týdeník Aktualita (Czech Aktualita Sound Newsreel) issue no. 5B from 1945 shows girls, supervised by instructors of the Board of Trustees for the Education of Youth, sewing gloves for the Czech men who were sent to dig trenches as part of the forced labour (Totaleinsatz) programme. The finished products were sent to labour camps.
The dataset used for the Ptakopět experiment on outbound machine translation. It consists of screenshots of web forms with user queries entered. The queries are also available in text form. The dataset comprises two language versions: English and Czech. The English version has been fully post-processed (screenshots cropped, queries within the screenshots highlighted, dataset split by quality, etc.), whereas the Czech version is raw, as collected by the annotators.
Post-editing and MQM annotations produced by the QT21 project. As described in
@InProceedings{specia-etal_MTSummit:2017,
author = {Specia, Lucia and Kim Harris and Frédéric Blain and Aljoscha Burchardt and Vivien Macketanz and Inguna Skadiņa and Matteo Negri and Marco Turchi},
title = {Translation Quality and Productivity: A Study on Rich Morphology Languages},
booktitle = {Proceedings of Machine Translation Summit XVI},
year = {2017},
pages = {55--71},
address = {Nagoya, Japan},
}
This corpus is part of Deliverable 5.5 of the European Commission project QTLeap FP7-ICT-2013.4.1-610516 (http://qtleap.eu).
The texts are Q&A interactions from the real-user scenario (batches 1 and 2). The interactions in this corpus are available in Basque, Bulgarian, Czech, English, Portuguese and Spanish.
The texts have been automatically annotated with NLP tools, including Word Sense Disambiguation, Named Entity Disambiguation and Coreference Resolution. See deliverable D5.6 at http://qtleap.eu/deliverables for more information.
Input data, individual experimental annotations, and a complete and detailed overview of the measured results related to the experiment described in the referenced paper.
Dataset collected from natural dialogs that makes it possible to test the ability of dialog systems to interactively learn new facts from user utterances throughout the dialog. The dataset, consisting of 1900 dialogs, allows simulation of interactively obtaining denotations and question explanations from users, which can be used for interactive learning.