Search Results
652. Extended CLEF eHealth 2013-2015 IR Test Collection
- Creator:
- Pecina, Pavel and Saleh, Shadi
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- cross-lingual information retrieval and machine translation
- Language:
- English, Czech, French, German, Hungarian, Polish, Spanish, and Swedish
- Description:
- This package contains an extended version of the test collection used in the CLEF eHealth Information Retrieval tasks in 2013-2015. Compared to the original version, it provides complete query translations into Czech, French, German, Hungarian, Polish, Spanish and Swedish, as well as additional relevance assessments.
- Rights:
- Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB
653. Extended Morphosyntactic Testset for Word2Vec
- Creator:
- Kocmi, Tom and Bojar, Ondřej
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text, other, and lexicalConceptualResource
- Subject:
- syntactic questions
- Language:
- English
- Description:
- We have created a test set for syntactic questions, presented in the paper [1], which is more general than Mikolov's [2]. Since we were interested in morphosyntactic relations, we extended only the questions of the syntactic type, with the exception of nationality adjectives, which are already covered completely in Mikolov's test set. We constructed the pairs more or less manually, taking inspiration from the Czech side of the CzEng corpus [3], where explicit morphological annotation makes it possible to identify various pairs of Czech words (different grades of adjectives, words and their negations, etc.). The word-aligned English words often shared the same properties. Other pairs were acquired from various webpages, usually written for learners of English. For example, for verb tense we relied on a freely available list of English verbs and their morphological variations. We have included 100-1000 different pairs for each question set. The questions were constructed from the pairs in the same way as by Mikolov: by generating all possible pairs of pairs. This leads to millions of questions, so we randomly selected 1000 instances per question set to keep the test set in the same order of magnitude. Additionally, we decided to extend the set of questions on opposites to cover not only opposites of adjectives but also of nouns and verbs.
- Rights:
- Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB
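The construction described for entry 653 (combine all pairs of word pairs into questions, then randomly keep 1000 per question set) can be illustrated with a minimal sketch. This is not the authors' script: the function name `build_questions`, the fixed seed, the example word pairs, and the choice to treat pairs of pairs as ordered are all assumptions made for illustration.

```python
import itertools
import random


def build_questions(pairs, sample_size=1000, seed=0):
    """Combine word pairs into analogy questions and keep a random subset.

    Hypothetical helper: the ordering convention and the seed are
    assumptions, not taken from the dataset description.
    """
    # Each ordered pair of two distinct word pairs yields one question
    # of the form "a is to b as c is to d".
    questions = [
        (a, b, c, d)
        for (a, b), (c, d) in itertools.permutations(pairs, 2)
    ]
    # Small question sets are returned whole; larger ones are subsampled.
    if len(questions) <= sample_size:
        return questions
    rng = random.Random(seed)
    return rng.sample(questions, sample_size)


# Made-up example pairs for one question set (comparative adjectives).
pairs = [
    ("good", "better"),
    ("bad", "worse"),
    ("big", "bigger"),
    ("small", "smaller"),
]
questions = build_questions(pairs, sample_size=5)
```

With n pairs this produces n * (n - 1) candidate questions per set, which is why a set of 1000 pairs already approaches a million questions before sampling.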
654. Extended Textual Coreference and Bridging Relations in PDT 2.0
- Creator:
- Nedoluzhko, Anna and Mírovský, Jiří
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- bridging anaphora, textual coreference, and PDT
- Language:
- Czech
- Description:
- Annotation of extended textual coreference and bridging relations in the Prague Dependency Treebank 2.0. Supported by project LINDAT-Clarin LM2010013 and grant GAČR GA405/09/0729.
- Rights:
- Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), http://creativecommons.org/licenses/by-nc-sa/3.0/, and PUB
655. Extract
- Creator:
- Forsberg, Markus and Ranta, Aarne
- Publisher:
- Språkbanken, Dept. of Swedish Language, Göteborg University
- Type:
- toolService
- Subject:
- morphology extraction
- Description:
- Extract is a tool for supervised morphological lexicon extraction from raw text data.
- Rights:
- Not specified
656. Eye-Tracking Recordings from a Pilot Study of WMT-style MT Outputs Ranking
- Creator:
- Bojar, Ondřej, Děchtěrenko, Filip, and Zelenina, Maria
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- image and corpus
- Subject:
- eye-tracking and MT evaluation
- Language:
- Czech and English
- Description:
- This package contains the eye-tracker recordings of 8 subjects evaluating English-to-Czech machine translation quality using the WMT-style ranking of sentences. We provide the set of sentences evaluated, the exact screens presented to the annotators (including bounding box information for every area of interest and even for individual letters in the text) and finally the raw EyeLink II files with gaze trajectories. The description of the experiment can be found in the paper: Ondřej Bojar, Filip Děchtěrenko, Maria Zelenina. A Pilot Eye-Tracking Study of WMT-Style Ranking Evaluation. Proceedings of the LREC 2016 Workshop “Translation Evaluation – From Fragmented Tools and Data Sets to an Integrated Ecosystem”, Georg Rehm, Aljoscha Burchardt et al. (eds.). pp. 20-26. May 2016, Portorož, Slovenia. This work has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement no. 645452 (QT21). This work was partially financially supported by the Government of the Russian Federation, Grant 074-U01. This work has used language resources developed, stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2010013).
- Rights:
- Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB
657. F. Vaníček (film professional)
- Creator:
- Veselý, Bohumil
- Publisher:
- Národní filmový archiv
- Type:
- video and clip
- Subject:
- Galerie osobností and People::Vaníček F. (1911-1967)
- Language:
- No linguistic content
- Description:
- Footage from the wedding of film professional F. Vaníček.
- Rights:
- Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB
658. Facebook Data for Sentiment Analysis
- Creator:
- Habernal, Ivan, Ptáček, Tomáš, and Steinberger, Josef
- Publisher:
- University of West Bohemia
- Type:
- text and corpus
- Subject:
- sentiment analysis and opinion mining
- Language:
- Czech
- Description:
- Corpus consisting of 10,000 Facebook posts manually annotated for sentiment (2,587 positive, 5,174 neutral, 1,991 negative and 248 bipolar posts). The archive contains data and statistics in an Excel file (FBData.xlsx) and gold data in two text files, one with posts (gold-posts.txt) and one with labels (gols-labels.txt), aligned line by line.
- Rights:
- Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0), http://creativecommons.org/licenses/by-sa/3.0/, and PUB
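Since entry 658 stores posts and labels in two files aligned line by line, reading them amounts to zipping the two files together. The sketch below assumes UTF-8 encoding and one post per line; the function name `read_gold`, the demo file names, and the label strings in the demo are made up, as the actual label encoding is not stated in the record.

```python
import os
import tempfile
from collections import Counter


def read_gold(posts_path, labels_path):
    """Pair each post with the label on the same line number.

    Hypothetical helper; UTF-8 and one-post-per-line are assumptions.
    """
    with open(posts_path, encoding="utf-8") as posts_file:
        posts = [line.rstrip("\n") for line in posts_file]
    with open(labels_path, encoding="utf-8") as labels_file:
        labels = [line.strip() for line in labels_file]
    # The two files are only usable together if they have equal length.
    if len(posts) != len(labels):
        raise ValueError("posts and labels files differ in line count")
    return list(zip(posts, labels))


# Demo with tiny made-up files; the label values are placeholders only.
with tempfile.TemporaryDirectory() as tmp_dir:
    posts_path = os.path.join(tmp_dir, "posts-demo.txt")
    labels_path = os.path.join(tmp_dir, "labels-demo.txt")
    with open(posts_path, "w", encoding="utf-8") as f:
        f.write("post one\npost two\npost three\n")
    with open(labels_path, "w", encoding="utf-8") as f:
        f.write("p\nn\n0\n")
    gold = read_gold(posts_path, labels_path)

# Label distribution of the loaded data.
counts = Counter(label for _, label in gold)
```

On the real archive, the resulting label counts could be compared against the class sizes given in the description as a sanity check.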
659. Fairytale child
- Creator:
- Rosa, Rudolf
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- toolService and tool
- Subject:
- dialogue system, morphological generation, Treex, morphological analysis, and interactive
- Language:
- English and Czech
- Description:
- Fairytale Child is a simple chatbot trying to simulate a curious child. It asks the user to tell a fairy tale, often interrupting to ask for details and clarifications. However, it remembers what it was told and tries to show it if possible. The chatbot can communicate in Czech and in English. It analyzes the morphology of each sentence produced by the user with natural language processing tools, tries to identify potential questions to ask, and then asks one. A morphological generator is employed to generate correctly inflected sentences in Czech, so that the resulting sentences sound as natural as possible. The work has been supported by GAUK 1572314 and SVV 260104. It has used language resources developed, stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2010013).
- Rights:
- GNU General Public License, version 2, http://www.gnu.org/licenses/gpl-2.0.html, and PUB
660. Fairytale child (2014-09-26)
- Creator:
- Rosa, Rudolf
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- toolService and tool
- Subject:
- dialogue system, morphological generation, Treex, morphological analysis, and interactive
- Language:
- English and Czech
- Description:
- Fairytale Child is a simple chatbot trying to simulate a curious child. It asks the user to tell a fairy tale, often interrupting to ask for details and clarifications. However, it remembers what it was told and tries to show it if possible. The chatbot can communicate in Czech and in English. It analyzes the morphology of each sentence produced by the user with natural language processing tools, tries to identify potential questions to ask, and then asks one. A morphological generator is employed to generate correctly inflected sentences in Czech, so that the resulting sentences sound as natural as possible. The work has been supported by GAUK 1572314 and SVV 260104. It has used language resources developed, stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2010013).
- Rights:
- GNU General Public License, version 2, http://www.gnu.org/licenses/gpl-2.0.html, and PUB