Harvested from: LINDAT/CLARIAH-CZ repository / Language: Czech - LINDAT/CLARIAH-CZ Catalog Search Results

1. Annotation of Dramatic Situations in Theater Play Scripts

Creator:: Mareček, David, Nováková, Marie, Vosecká, Klára, Doležal, Josef, and Rosa, Rudolf
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) and The Academy of Performing Arts in Prague, Theatre Faculty (DAMU)
Type:: text and corpus
Subject:: theatre, play script, and dramatic situation
Language:: Czech
Description:: We defined 58 dramatic situations and annotated them in 19 play scripts. Then we selected only 5 well-recognized dramatic situations and annotated further 33 play scripts. In this version of the data, we release only play scripts that can be freely distributed, which is 9 play scripts. One play is annotated independently by three annotators.
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

2. Annotation of Dramatic Situations in Theater Play Scripts (2023)

Creator:: Mareček, David, Nováková, Marie, Vosecká, Klára, Doležal, Josef, and Rosa, Rudolf
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) and The Academy of Performing Arts in Prague, Theatre Faculty (DAMU)
Type:: text and corpus
Subject:: theatre, play script, and dramatic situation
Language:: Czech
Description:: We defined 58 dramatic situations and annotated them in 19 play scripts. Then we selected only 5 well-recognized dramatic situations and annotated further 33 play scripts. In the previous (first) version, we released 9 play scripts that could be freely distributed. In this (second) version of the data, we are adding another 10 plays for which we have obtained licenses from authors. In total, there are 19 play scripts available, and one of them is annotated three times - independently by three annotators.
Rights:: THEAITRE AI research only license, https://lindat.mff.cuni.cz/repository/xmlui/page/theaitre-license, and ACA

3. Czech Lexico-Semantic Database 0.1

Creator:: Tichy, Ondrej, Obstova, Zora, and Klegr, Ales
Publisher:: Charles University, Faculty of Arts
Type:: text, thesaurus, and lexicalConceptualResource
Subject:: onomasiological lexicography, thesaurus, lexico-semantic database, digitization, and Czech
Language:: Czech
Description:: A lexicographical project, whose aim is to digitize and align two Czech onomasiological dictionaries (Haller 1969–77; Klégr 2007) in order to create an integrated digital multi-purpose lexico-semantic database of Czech.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

4. MLASK: Multimodal Summarization of Video-based News Articles

Creator:: Krubiński, Mateusz and Pecina, Pavel
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: video and corpus
Subject:: Multimodal Summarization, Summarization, Video, and Image
Language:: Czech
Description:: The MLASK corpus consists of 41,243 multi-modal documents – video-based news articles in the Czech language – collected from Novinky.cz (https://www.novinky.cz/) and Seznam Zprávy (https://www.seznamzpravy.cz/). It was introduced in "MLASK: Multimodal Summarization of Video-based News Articles" (Krubiński & Pecina, EACL 2023). The articles' publication dates range from September 2016 to February 2022. The intended use case of the dataset is to model the task of multimodal summarization with multimodal output: based on a pair of a textual article and a short video, a textual summary is generated, and a single frame from the video is chosen as a pictorial summary. Each document consists of the following: - a .mp4 video - a single image (cover picture) - the article's text - the article's summary - the article's title - the article's publication date All of the videos are re-sampled to 25 fps and resized to the same resolution of 1280x720p. The maximum length of the video is 5 minutes, and the shortest one is 7 seconds. The average video duration is 86 seconds. The quantitative statistics of the lengths of titles, abstracts, and full texts (measured in the number of tokens) are below. Q1 and Q3 denote the first and third quartiles, respectively. / - / mean / Q1 / Median / Q3 / / Title / 11.16 ± 2.78 / 9 / 11 / 13 / / Abstract / 33.40 ± 13.86 / 22 / 32 / 43 / / Article / 276.96 ± 191.74 / 154 / 231 / 343 / The proposed training/dev/test split follows the chronological ordering based on publication data. We use the articles published in the first half (Jan-Jun) of 2021 for validation (2,482 instances) and the ones published in the second half (Jul-Dec) of 2021 and the beginning (Jan-Feb) of 2022 for testing (2,652 instances). The remaining data is used for training (36,109 instances). The textual data is shared as a single .tsv file. The visual data (video+image) is shared as a single archive for validation and test splits, and the one from the training split is partitioned based on the publication date.
Rights:: Seznam Dataset Licence, https://lindat.mff.cuni.cz/repository/xmlui/page/szn-dataset-licence, and RES

5. Quality and Efficiency of Manual Annotation: Data from the Pre-annotation Bias Experiment (part of the PDT-C 2.0 project)

Creator:: Mikulová, Marie, Straka, Milan, Štěpánek, Jan, Štěpánková, Barbora, and Hajič, Jan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: annotation, syntax, inter-annotator agreement, pre-annotation bias, annotation efficiency, and annotation quality
Language:: Czech
Description:: Input data, individual experimental annotations, and a complete and detailed overview of the measured results related to the experiment described in the referenced paper.
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

6. Semantically annotated sample of Czech and English conversion pairs of verbs and nouns

Creator:: Hledíková, Hana
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, wordList, and lexicalConceptualResource
Subject:: word-formation, morphology, conversion, semantics, and cognitive
Language:: English and Czech
Description:: Supplementary files for a comparative study of word-formation without the addition of derivational affixes (conversion) in English and Czech. The two .csv files contain 300 verb-noun conversion pairs in English and 300 verb-noun conversion pairs in Czech, i.e. pairs where either the noun is created from the verb or the verb is created from the noun without the use of derivational affixes. In English, the noun and verb in the conversion pair have the same form. In Czech, the noun and verb in the conversion pair differ in inflectional affixes. The pairs are supplied with manual semantic annotation based on cognitive event schemata. A file with the Appendix includes a list of dictionary definition phrases used as a basis for the semantic annotation.
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

7. SQAD 3.2

Creator:: Medveď, Marek
Publisher:: Masaryk University, NLP Centre
Type:: text and corpus
Subject:: QA, Question Answering, SQAD, and Czech QA
Language:: Czech
Description:: Simple question answering database version 3.2 (SQAD v3.2) created from Czech Wikipedia. The new version consists of more than 16000 records. Each record of SQAD consists of multiple files - question, answer extraction, answer selection, URL, question metadata, and in some cases, answer context.
Rights:: Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0), http://creativecommons.org/licenses/by-sa/3.0/, and PUB

8. SynSemClass Search Tool

Creator:: Petliak, Nataliia, Hajič, Jan, Urešová, Zdeňka, and Fučíková, Eva
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: ontology, lexical semantics, search tool, and language resource
Language:: English, Czech, and German
Description:: The SynSemClass Search Tool provides a web search tool for the SynSemClass 5.0 ontology. It includes several search options and criteria for building complex queries. The search results are rendered in a clear and user-friendly interactive representation.
Rights:: Not specified

1. Annotation of Dramatic Situations in Theater Play Scripts

2. Annotation of Dramatic Situations in Theater Play Scripts (2023)

3. Czech Lexico-Semantic Database 0.1

4. MLASK: Multimodal Summarization of Video-based News Articles

5. Quality and Efficiency of Manual Annotation: Data from the Pre-annotation Bias Experiment (part of the PDT-C 2.0 project)

6. Semantically annotated sample of Czech and English conversion pairs of verbs and nouns

7. SQAD 3.2

8. SynSemClass Search Tool

Limit your search

Show values starting with

Show values starting with

Show values starting with

Search

Search Constraints

Search Results

Limit your search

Contributor

Creator

Show values starting with

Language

Publisher

Rights

Show values starting with

Subject

Show values starting with

Type

Date

Original context has metadata only

Harvested from