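Contributor: Ministerstvo školství, mládeže a tělovýchovy České republiky (Ministry of Education, Youth and Sports of the Czech Republic), project LM2015071, LINDAT/CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat (Institute for the Analysis, Processing and Distribution of Linguistic Data), national funds / Type: text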

31. ParaDi 2.0 (2018-01-24)

Creator:: Barančíková, Petra and Kettnerová, Václava
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, machineReadableDictionary, and lexicalConceptualResource
Subject:: multiword expressions, light verb construction, paraphrases, and idioms
Language:: Czech
Description:: ParaDi 2.0 is a dictionary of single-verb paraphrases of Czech verbal multiword expressions: light verb constructions and idiomatic verb constructions. It also provides an elaborate set of morphological, syntactic and semantic features, including information on aspectual counterparts of verbs and paraphrasability conditions of given verbs. The format of ParaDi has been designed with both human and machine readability in mind: the dictionary is represented as a plain table in TSV format, a flexible and language-independent data format.
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB
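
Because the description states that the dictionary is a plain TSV table, a reader could load it with a few lines of standard-library Python. The following is a minimal sketch, not the authors' tooling: the column names `mwe` and `paraphrase` and the sample row are hypothetical and made up for illustration, and the real ParaDi header may differ.

```python
import csv
import io


def read_tsv(handle):
    """Yield each row of a tab-separated table as a dict keyed by the header row."""
    # QUOTE_NONE keeps quotation marks inside cells as literal characters.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row


# Hypothetical one-row sample; column names and content are assumptions,
# not taken from the actual ParaDi file.
SAMPLE = "mwe\tparaphrase\ndát souhlas\tsouhlasit\n"

rows = list(read_tsv(io.StringIO(SAMPLE)))
for row in rows:
    print(row["mwe"], "->", row["paraphrase"])
```

For the real file, the `io.StringIO` object would be replaced by `open(path, encoding="utf-8", newline="")`, where `path` points at the downloaded table.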

32. PDT-Vallex: Czech Valency lexicon linked to treebanks 4.0 (PDT-Vallex 4.0)

Creator:: Urešová, Zdeňka, Bémová, Alevtina, Fučíková, Eva, Hajič, Jan, Kolářová, Veronika, Mikulová, Marie, Pajas, Petr, Panevová, Jarmila, and Štěpánek, Jan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, computationalLexicon, and lexicalConceptualResource
Subject:: verbal valency, valency, annotation, linguistic data, lexicon, lexical semantics, and PDT
Language:: Czech
Description:: The valency lexicon PDT-Vallex 4.0 has been built in close connection with the annotation of the Prague Dependency Treebank project (PDT) and its successors (mainly the Prague Czech-English Dependency Treebank project, PCEDT, the spoken language corpus (PDTSC) and the corpus of user-generated texts in the project Faust). It contains over 14500 valency frames for almost 8500 verbs which occurred in the PDT, PCEDT, PDTSC and Faust corpora. In addition, there are nouns, adjectives and adverbs, linked from the PDT part only, increasing the total to over 17000 valency frames for 13000 words. All the corpora were published in 2020 as the PDT-C 1.0 corpus with the PDT-Vallex 4.0 dictionary included; this is a copy of the dictionary published as a separate item for those not interested in the corpora themselves. It is available in an electronically processable format (XML), and also in a more human-readable form including corpus examples (see the WEBSITE link below, and the links to its main publications elsewhere in this metadata). The main feature of the lexicon is its linking to the annotated corpora: each occurrence of each verb is linked to the appropriate valency frame with additional (generalized) information about its usage and surface morphosyntactic form alternatives. It replaces the previously published unversioned edition of PDT-Vallex from 2014.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

33. Persian Morphologically Segmented Lexicon 0.5

Creator:: Ansari, Ebrahim, Žabokrtský, Zdeněk, Haghdoost, Hamid, and Nikravesh, Mahshid
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, lexicon, and lexicalConceptualResource
Subject:: morphological analysis, and lemmatization
Language:: Persian
Description:: This dataset includes 45300 Persian word forms which are manually segmented into sequences of morphemes.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

34. Prague Czech-English Dependency Treebank 2.0 - Russian translation

Creator:: Novák, Michal, Nedoluzhko, Anna, and Schwarz (Khoroshkina), Anna
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: multilingual and coreference
Language:: English, Czech, and Russian
Description:: Prague Czech-English Dependency Treebank - Russian translation (PCEDT-R) is a project of translating a subset of Prague Czech-English Dependency Treebank 2.0 (PCEDT 2.0) to Russian and linguistically annotating the Russian translations with emphasis on coreference and cross-lingual alignment of coreferential expressions. Cross-lingual comparison of coreference means is currently the purpose that drives development of this corpus. The current version 0.5 is a preliminary version, which contains (+ denotes new features):
* complete PCEDT 2.0 documents "wsj_1900"-"wsj_1949"
* Czech-English word alignment of coreferential expressions annotated manually mainly on the t-layer
+ Russian translations of the original English sentences
+ automatic tokenization, part-of-speech tagging and morphological analysis for Russian
+ automatic word alignment between all Czech and Russian words
+ manual alignment between Russian and the other two languages on possessive pronouns
Rights:: CC-BY-NC-SA + LDC99T42, https://lindat.mff.cuni.cz/repository/xmlui/page/license-pcedt2, and RES

35. Prague Czech-English Dependency Treebank 2.0 Coref

Creator:: Nedoluzhko, Anna, Novák, Michal, Cinková, Silvie, Mikulová, Marie, and Mírovský, Jiří
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: multilingual and coreference
Language:: English and Czech
Description:: The Prague Czech-English Dependency Treebank 2.0 Coref (PCEDT 2.0 Coref) is a parallel treebank building upon the original PCEDT 2.0 release and enriching it with the extended manual annotation of coreference, as well as with an improved automatic annotation of the coreferential expression alignment.
Rights:: CC-BY-NC-SA + LDC99T42, https://lindat.mff.cuni.cz/repository/xmlui/page/license-pcedt2, and RES

36. Prague Discourse Treebank 2.0

Creator:: Rysová, Magdaléna, Synková, Pavlína, Mírovský, Jiří, Hajičová, Eva, Nedoluzhko, Anna, Ocelák, Radek, Pergler, Jiří, Poláková, Lucie, Scheller, Veronika, Zdeňková, Jana, and Zikánová, Šárka
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: discourse, bridging relations, coreference, topic-focus articulation, treebank, dependency, tectogrammatics, and PDT
Language:: Czech
Description:: PDiT 2.0 is a new version of the Prague Discourse Treebank. It contains a complex annotation of discourse phenomena enriched by the annotation of secondary connectives.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

37. Retrograde Morphemic Dictionary of Czech

Creator:: Slavíčková, Eleonora
Publisher:: Academia
Type:: text, lexicon, and lexicalConceptualResource
Subject:: morphemes and morphology
Language:: Czech
Description:: The data contains the morphemic dictionary scanned in the PDF format. It is divided into 3 parts:
* introductions.pdf - pp. 11-102
* main_dictionary.pdf - pp. 113-506
* appendices.pdf - pp. 509-645
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

38. Retrograde Morphemic Dictionary of Czech - verbs

Creator:: Slavíčková, Eleonora, Hlaváčová, Jaroslava, and Pognan, Patrice
Publisher:: Academia
Type:: text, lexicon, and lexicalConceptualResource
Subject:: morphemes, morphology, prefix, and root
Language:: Czech
Description:: The file contains all Czech verbs included in the Retrograde Morphemic Dictionary of Czech Language (Slavíčková Eleonora, Academia 1975). The data was obtained by scanning the portion of the dictionary that contains words ending in -ci and -ti. Among them were 18 non-verbs, which were removed. The data was converted into plain text format using OCR, and the result was checked by two independent readers. If you nevertheless encounter an overlooked error, please report it.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

39. Slovak MorphoDiTa Models 170914

Creator:: Straka, Milan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, mlmodel, and languageDescription
Subject:: MorphoDiTa, Slovak, morphological analysis, morphological generation, and PoS tagging
Language:: Slovak
Description:: Slovak models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from MorfFlex SK 170914 and the PoS tagger is trained on automatically translated Prague Dependency Treebank 3.0 (PDT).
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

40. sqad 2.1

Creator:: Medveď, Marek, Horák, Aleš, and Kušniráková, Dáša
Publisher:: Natural Language Processing Centre, Faculty of Informatics, Masaryk University
Type:: text and corpus
Subject:: Czech, Simple Question Answering Database, and question answering
Language:: Czech
Description:: Simple question answering database version 2.1 (SQAD_v2.1) created from Czech Wikipedia. Each record of SQAD consists of four files (in vertical form provided with lemmatization and POS tagging) and two metadata files.
Rights:: GNU Library or "Lesser" General Public License 3.0 (LGPL-3.0), http://opensource.org/licenses/LGPL-3.0, and PUB
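
The "vertical form" mentioned in the description usually means one token per line, with annotation attributes in tab-separated columns and structural markup such as sentence tags on lines of their own. The sketch below reads such text under that assumption. The column order (word, lemma, tag), the `<s>` markup and the sample tokens with the placeholder tag `TAG` are hypothetical and are not taken from the SQAD files.

```python
def parse_vertical(text):
    """Split vertical text into sentences, each a list of (word, lemma, tag) tuples."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("<") and line.endswith(">"):
            # Structural markup (for example <s> or </s>) ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split("\t")
        # Pad missing columns with empty strings so every token has three fields.
        word, lemma, tag = (fields + ["", ""])[:3]
        current.append((word, lemma, tag))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample; "TAG" is a placeholder, not a real tagset value.
SAMPLE = "<s>\nKdo\tkdo\tTAG\nje\tbýt\tTAG\n</s>\n"

sentences = parse_vertical(SAMPLE)
for sentence in sentences:
    print([lemma for _, lemma, _ in sentence])
```

Returning plain tuples keeps the sketch independent of any corpus library; a real pipeline may prefer a dedicated reader for the format actually used in the release.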
