Language: Czech / Rights: PUB - LINDAT/CLARIAH-CZ Catalog Search Results

301. Large-Scale Colloquial Persian 0.5

Creator:: Abdi Khojasteh, Hadi, Ansari, Ebrahim, and Bohlouli, Mahdi
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) and Institute for Advanced Studies in Basic Sciences (IASBS)
Type:: text and corpus
Subject:: PoS tagging, corpus, annotated corpus, multilingual, derivation, dependency parser, machine translation, informal language, spoken language, monolingual corpus, and bilingual corpus annotation
Language:: Persian, English, German, Czech, Italian, and Hindi
Description:: "Large Scale Colloquial Persian Dataset" (LSCP) is hierarchically organized in a semantic taxonomy that focuses on multi-task informal Persian language understanding as a comprehensive problem. LSCP includes 120M sentences from 27M casual Persian tweets with their dependency relations in syntactic annotation, part-of-speech tags, sentiment polarity, and automatic translation of the original Persian sentences into five different languages (EN, CS, DE, IT, HI).
Rights:: Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB

302. Lexico-Semantic Annotation of PDT using Czech WordNet

Creator:: Bejček, Eduard, Hoffmannová, Petra, Holub, Martin, Hučínová, Marie, Pecina, Pavel, Straňák, Pavel, Šidák, Pavel, and Hajič, Jan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: PDT and Czech WordNet
Language:: Czech
Description:: This dataset contains annotation of PDT using the Czech WordNet ontology (http://hdl.handle.net/11858/00-097C-0000-0001-4880-3). Data is stored in PML format. This is a stand-off annotation, and for most use cases it requires PDT 2.0 and the Czech WordNet 1.9 PDT that we used for annotation. Associated identifiers: 1ET100300517, 1ET201120505
Rights:: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), http://creativecommons.org/licenses/by-nc-sa/3.0/, and PUB

303. Lexicon of Czech and German Anaphoric Connectives

Creator:: Rysová, Kateřina, Poláková, Lucie, Rysová, Magdaléna, and Mírovský, Jiří
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, lexicon, and lexicalConceptualResource
Subject:: lexicon, discourse, and bilingual
Language:: Czech and German
Description:: GeCzLex 1.0 is an online electronic resource for translation equivalents of Czech and German discourse connectives. It contains anaphoric connectives for both languages and their possible translations documented in bilingual parallel corpora (not necessarily anaphoric). The entries have been interlinked via semantic annotation of the connectives (taken from the monolingual lexicons of connectives CzeDLex and DiMLex) according to the PDTB 3 sense taxonomy and translation possibilities acquired from the Czech and German parallel data of the Intercorp project. The lexicon is the first bilingual inventory of connectives with linkage on the level of individual pairs (connective + discourse sense).
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

304. LiFR-Law. Corpus of Paraphrased Czech Administrative Texts with Reading Comprehension for Readability Studies

Creator:: Cinková, Silvie, Chromý, Jan, Šamánková, Jana, Hořeňovská, Karolína, Kettnerová, Václava, Kolářová, Veronika, Kubištová, Hana, and Panevová, Jarmila
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: readability, legal texts, legal domain, reading comprehension, corpus, and survey
Language:: Czech
Description:: LiFR-Law is a corpus of Czech legal and administrative texts with measured reading comprehension and a subjective expert annotation of diverse textual properties based on the Hamburg Comprehensibility Concept (Langer, Schulz von Thun, Tausch, 1974). It has been built as a pilot data set to explore the Linguistic Factors of Readability (hence the LiFR acronym) in Czech administrative and legal texts, modeling their correlation with actually observed reading comprehension. The corpus comprises 18 documents in total: six different texts from the legal/administration domain, each in three versions (the original and two paraphrases). Each such document triple shares one reading-comprehension test administered to at least thirty readers of random gender, educational background, and age. The data set also captures basic demographic information about each reader, their familiarity with the topic, and their subjective assessment of the stylistic properties of the given document, roughly corresponding to the key text properties identified by the Hamburg Comprehensibility Concept.
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

305. LiFR-Law. Corpus of Paraphrased Czech Administrative Texts with Reading Comprehension for Readability Studies (2023-10-08)

Creator:: Cinková, Silvie, Chromý, Jan, Šamánková, Jana, Hořeňovská, Karolína, Kettnerová, Václava, Kolářová, Veronika, Kubištová, Hana, and Panevová, Jarmila
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: readability, legal texts, legal domain, reading comprehension, corpus, and survey
Language:: Czech
Description:: LiFR-Law is a corpus of Czech legal and administrative texts with measured reading comprehension and a subjective expert annotation of diverse textual properties based on the Hamburg Comprehensibility Concept (Langer, Schulz von Thun, Tausch, 1974). It has been built as a pilot data set to explore the Linguistic Factors of Readability (hence the LiFR acronym) in Czech administrative and legal texts, modeling their correlation with actually observed reading comprehension. The corpus comprises 18 documents in total: six different texts from the legal/administration domain, each in three versions (the original and two paraphrases). Each such document triple shares one reading-comprehension test administered to at least thirty readers of random gender, educational background, and age. The data set also captures basic demographic information about each reader, their familiarity with the topic, and their subjective assessment of the stylistic properties of the given document, roughly corresponding to the key text properties identified by the Hamburg Comprehensibility Concept.
Changes to the previous version and helpful comments:
• File names of the comprehension test results (self-explanatory)
• Corrected one erroneous automatic evaluation rule in the multiple-choice evaluation (zahradnici_3, TRUE and FALSE had been swapped)
• Evaluation protocols for both question types added to the folder lifr_formr_study_design
• Data has been cleaned: empty responses to multiple-choice questions were re-inserted. A survey is now considered complete if the reader's subjective text evaluation is complete (these questions were placed at the very end of each survey).
• Only complete surveys (all 7 content questions answered) are represented. We dropped the replies of six users who did not complete their surveys.
• A few missing responses to open questions have been detected and re-inserted.
• The demographic data contain all respondents who filled in the informed consent and the demographic details; respondents who did not complete any test survey (but provided their demographic details) are kept in a separate file. All other data have been cleaned to contain only responses by the regular respondents (at least one completed survey).
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

306. LiFR-Lite

Creator:: Cinková, Silvie, Chromý, Jan, Hořeňovská, Karolína, Kettnerová, Václava, Kolářová, Veronika, Panevová, Jarmila, and Ševčíková, Magda
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: education, readability, reading comprehension, and text corpora
Language:: Czech
Description:: Corpus of Czech educational texts for readability studies, with paraphrases, measured reading comprehension, and a multi-annotator subjective rating of selected text features based on the Hamburg Comprehensibility Concept
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

307. LiFR-Lite (2021-11-05)

Creator:: Cinková, Silvie, Chromý, Jan, Hořeňovská, Karolína, Kettnerová, Václava, Kolářová, Veronika, Kubištová, Hana, Panevová, Jarmila, and Ševčíková, Magda
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: education, readability, reading comprehension, and text corpora
Language:: Czech
Description:: Corpus of Czech educational texts for readability studies, with paraphrases, measured reading comprehension, and a multi-annotator subjective rating of selected text features based on the Hamburg Comprehensibility Concept
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

308. Lingua::Interset 2.026

Creator:: Zeman, Daniel
Publisher:: Charles University, Faculty of Mathematics and Physics
Type:: tool and toolService
Subject:: morphology, part of speech, conversion, and tagset
Language:: Arabic, Bulgarian, Bengali, Catalan, Czech, Danish, German, Modern Greek (1453-), English, Spanish, Estonian, Basque, Persian, Finnish, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Japanese, Multiple languages, and Portuguese
Description:: Lingua::Interset is a universal morphosyntactic feature set to which all tagsets of all corpora/languages can be mapped. Version 2.026 covers 37 different tagsets of 21 languages. Limited support for the older drivers for other languages (which are not included in this package but are available for download elsewhere) is also provided; these will be fully ported to Interset 2 in the future. Interset is implemented as Perl libraries. It is also available via CPAN.
Rights:: Artistic License (Perl) 1.0, http://opensource.org/licenses/Artistic-Perl-1.0, and PUB

309. Luděk Pachman (chess grand master)

Creator:: Krátký film and Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: chessboard, simultaneous chess, youngest chess players, Galerie osobností (Gallery of Personalities), People::Pachman Luděk (1924-), and Československý filmový týdeník 1954/43
Language:: Czech
Description:: Chess grand master Luděk Pachman in a simultaneous game with young chess players in a segment from Československý filmový týdeník (Czechoslovak Film Weekly Newsreel) 1954, issue no. 43.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

310. Machine Translation Testsuite for Gender-Consistent Translation

Creator:: Aires, João Paulo
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: machine translation, testsuite, evaluation, and gender
Language:: English and Czech
Description:: Document-level testsuite for evaluation of gender translation consistency. Our document-level test set consists of selected English documents from the WMT21 newstest annotated with gender information. Czech unannotated references are also added for convenience. We semi-automatically annotated person names and pronouns to identify the gender of these elements as well as coreferences. Our proposed annotation consists of three elements: (1) an ID, (2) an element class, and (3) gender. The ID identifies a person's name and its occurrences (name and pronouns). The element class identifies whether the tag refers to a name or a pronoun. Finally, the gender information defines whether the element is masculine or feminine. We applied a series of NLP techniques to automatically identify person names and coreferences. This initial process resulted in a set of 45 documents to be manually annotated. We then annotated these documents manually to make sure they are correctly tagged. See README.md for more details.
Rights:: Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB

