This package contains an extended version of the test collection used in the CLEF eHealth Information Retrieval tasks in 2013--2015. Compared to the original version, it provides complete query translations into Czech, French, German, Hungarian, Polish, Spanish and Swedish and additional relevance assessment.
Annotation of extended textual coreference and bridging relations in the Prague Dependency Treebank 2.0 and project LINDAT-Clarin LM2010013, grant GAČR GA405/09/0729
This package contains the eye-tracker recordings of 8 subjects evaluating English-to-Czech machine translation quality using the WMT-style ranking of sentences.
We provide the set of sentences evaluated, the exact screens presented to the annotators (including bounding box information for every area of interest and even for individual letters in the text) and finally the raw EyeLink II files with gaze trajectories.
The description of the experiment can be found in the paper:
Ondřej Bojar, Filip Děchtěrenko, Maria Zelenina. A Pilot Eye-Tracking Study of WMT-Style Ranking Evaluation.
Proceedings of the LREC 2016 Workshop “Translation Evaluation – From Fragmented Tools
and Data Sets to an Integrated Ecosystem”, Georg Rehm, Aljoscha Burchardt et al. (eds.). pp. 20-26. May 2016, Portorož, Slovenia.
This work has received funding from the European Union's Horizon 2020 research
and innovation programme under grant agreement no. 645452 (QT21). This work was
partially financially supported by the Government of Russian Federation, Grant
074-U01.
This work has been using language resources developed, stored and distributed
by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of
the Czech Republic (project LM2010013).
Fairytale Child is a simple chatbot trying to simulate a curious child. It asks the user to tell a fairy tale, often interrupting to ask for details and clarifications. However, it remembers what it was told and tries to show it if possible.
The chatbot can communicate in Czech and in English. It analyzes the morphology of each sentence produced by the user with natural language processing tools, tries to identify potential questions to ask, and then asks one. A morphological generator is employed to generate correctly inflected sentences in Czech, so that the resulting sentences sound as natural as possible.
Pohádkové dítě je jednoduchý chatbot, simulující zvídavé dítě. Požádá uživatele, aby mu vyprávěl pohádku, ale často ho přerušuje, aby se zeptal na detaily a vysvětlení. Pamatuje si ale, co mu uživatel řekl, a snaží se to pokud možno dát najevo.
Chatbot umí komunikovat česky a anglicky. Analyzuje tvarosloví každé uživatelovy věty pomocí NLP nástrojů, pokusí se nalézt chodnou otázku, a tu pak položí. Aby tvořené české věty zněly co nejpřirozeněji, využívá se pro skloňování tvaroslovný generátor. and The work has been supported by GAUK 1572314 and SVV 260104.
It has been using language resources developed, stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2010013).
Fairytale Child is a simple chatbot trying to simulate a curious child. It asks the user to tell a fairy tale, often interrupting to ask for details and clarifications. However, it remembers what it was told and tries to show it if possible.
The chatbot can communicate in Czech and in English. It analyzes the morphology of each sentence produced by the user with natural language processing tools, tries to identify potential questions to ask, and then asks one. A morphological generator is employed to generate correctly inflected sentences in Czech, so that the resulting sentences sound as natural as possible.
Pohádkové dítě je jednoduchý chatbot, simulující zvídavé dítě. Požádá uživatele, aby mu vyprávěl pohádku, ale často ho přerušuje, aby se zeptal na detaily a vysvětlení. Pamatuje si ale, co mu uživatel řekl, a snaží se to pokud možno dát najevo.
Chatbot umí komunikovat česky a anglicky. Analyzuje tvarosloví každé uživatelovy věty pomocí NLP nástrojů, pokusí se nalézt chodnou otázku, a tu pak položí. Aby tvořené české věty zněly co nejpřirozeněji, využívá se pro skloňování tvaroslovný generátor. and The work has been supported by GAUK 1572314 and SVV 260104.
It has been using language resources developed, stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2010013).
Fairytale Child is a simple chatbot trying to simulate a curious child. It asks the user to tell a fairy tale, often interrupting to ask for details and clarifications. However, it remembers what it was told and tries to show it if possible.
The chatbot can communicate in Czech and in English. It analyzes the morphology of each sentence produced by the user with natural language processing tools, tries to identify potential questions to ask, and then asks one. A morphological generator is employed to generate correctly inflected sentences in Czech, so that the resulting sentences sound as natural as possible.
Pohádkové dítě je jednoduchý chatbot, simulující zvídavé dítě. Požádá uživatele, aby mu vyprávěl pohádku, ale často ho přerušuje, aby se zeptal na detaily a vysvětlení. Pamatuje si ale, co mu uživatel řekl, a snaží se to pokud možno dát najevo.
Chatbot umí komunikovat česky a anglicky. Analyzuje tvarosloví každé uživatelovy věty pomocí NLP nástrojů, pokusí se nalézt chodnou otázku, a tu pak položí. Aby tvořené české věty zněly co nejpřirozeněji, využívá se pro skloňování tvaroslovný generátor. and The work has been supported by GAUK 1572314 and SVV 260104.
It has been using language resources developed, stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2010013).
Fairytale Child is a simple chatbot trying to simulate a curious child. It asks the user to tell a fairy tale, often interrupting to ask for details and clarifications. However, it remembers what it was told and tries to show it if possible.
The chatbot can communicate in Czech and in English. It analyzes the morphology of each sentence produced by the user with natural language processing tools, tries to identify potential questions to ask, and then asks one. A morphological generator is employed to generate correctly inflected sentences in Czech, so that the resulting sentences sound as natural as possible.
Pohádkové dítě je jednoduchý chatbot, simulující zvídavé dítě. Požádá uživatele, aby mu vyprávěl pohádku, ale často ho přerušuje, aby se zeptal na detaily a vysvětlení. Pamatuje si ale, co mu uživatel řekl, a snaží se to pokud možno dát najevo.
Chatbot umí komunikovat česky a anglicky. Analyzuje tvarosloví každé uživatelovy věty pomocí NLP nástrojů, pokusí se nalézt chodnou otázku, a tu pak položí. Aby tvořené české věty zněly co nejpřirozeněji, využívá se pro skloňování tvaroslovný generátor. and The work has been supported by GAUK 1572314 and SVV 260104.
It has been using language resources developed, stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2010013).
Syntactic (including deep-syntactic - tectogrammatical) annotation of user-generated noisy sentences. The annotation was made on Czech-English and English-Czech Faust Dev/Test sets.
The English data includes manual annotations of English reference translations of Czech source texts. This texts were translated independently by two translators. After some necessary cleanings, 1000 segments were randomly selected for manual annotation. Both the reference translations were annotated, which means 2000 annotated segments in total.
The Czech data includes manual annotations of Czech reference translations of English source texts. This texts were translated independently by three translators. After some necessary cleanings, 1000 segments were randomly selected for manual annotation. All three reference translations were annotated, which means 3000 annotated segments in total.
Faust is part of PDT-C 1.0 (http://hdl.handle.net/11234/1-3185).
This machine translation test set contains 2223 Czech sentences collected within the FAUST project (https://ufal.mff.cuni.cz/grants/faust, http://hdl.handle.net/11234/1-3308).
Each original (noisy) sentence was normalized (clean1 and clean2) and translated to English independently by two translators.
ForFun is a database of linguistic forms and their syntactic functions built with the use of the multi-layer annotated corpora of Czech, the Prague Dependency Treebanks. The purpose of the Prague Database of Forms and Functions (ForFun) is to help the linguists to study the form-function relation, which we assume to be one of the principal tasks of both theoretical linguistics and natural language processing.
A prototypical question to be asked is "What purposes does a preposition 'po' serve for" or "What are the linguistic means in the sentence that can express the meaning 'a destination of an action'?". There are almost 1500 distinct forms (besides the 'po' preposition) and 65 distinct functions (besides the 'destination').