DeriNet is a lexical network which models derivational relations in the lexicon of Czech. Nodes of the network correspond to Czech lexemes, while edges represent derivational or compositional relations between a derived word and its base word / words. The present version, DeriNet 2.0, contains 1,027,665 lexemes (sampled from the MorfFlex dictionary) connected by 808682 derivational and 600 compositional links.
Compared to previous versions, version 2.0 uses a new format and contains new types of annotations: compounding, annotation of several morphological and other categories of lexemes, identification of root morphs of 244,198 lexemes, semantic labelling of 151,005 relations using five labels and identification of 13 fictitious lexemes.
DeriNet is a lexical network which models derivational relations in the lexicon of Czech. Nodes of the network correspond to Czech lexemes, while edges represent word-formational relations between a derived word and its base word / words. The present version, DeriNet 2.1, contains 1,039,012 lexemes (sampled from the MorfFlex CZ 2.0 dictionary) connected by 782,814 derivational, 50,533 orthographic variant, 1,952 compounding, 295 univerbation and 144 conversion relations.
Compared to the previous version, version 2.1 contains annotations of orthographic variants, full automatically generated annotation of affix morpheme boundaries (in addition to the roots annotated in 2.0), 202 affixoid lexemes serving as bases for compounding, annotation of corpus frequency of lexemes, annotation of verbal conjugation classes and a pilot annotation of univerbation. The set of part-of-speech tags was converted to Universal POS from the Universal Dependencies project.
Universal Derivations (UDer) is a collection of harmonized lexical networks capturing word-formation, especially derivational relations, in a cross-linguistically consistent annotation scheme for many languages. The annotation scheme is based on a rooted tree data structure, in which nodes correspond to lexemes, while edges represent derivational relations or compounding. The current version of the UDer collection contains twenty-seven harmonized resources covering twenty different languages.
Universal Derivations (UDer) is a collection of harmonized lexical networks capturing word-formation, especially derivational relations, in a cross-linguistically consistent annotation scheme for many languages. The annotation scheme is based on a rooted tree data structure, in which nodes correspond to lexemes, while edges represent derivational relations or compounding. The current version of the UDer collection contains thirty-one harmonized resources covering twenty-one different languages.
Universal Segmentations (UniSegments) is a collection of lexical resources capturing morphological segmentations harmonised into a cross-linguistically consistent annotation scheme for many languages. The annotation scheme consists of simple tab-separated columns that stores a word and its morphological segmentations, including pieces of information about the word and the segmented units, e.g., part-of-speech categories, type of morphs/morphemes etc. The current public version of the collection contains 38 harmonised segmentation datasets covering 30 different languages.
Grantová agentura České Republiky@@19-14534S@@Popis slovotvorné struktury českých slov na základě jazykových dat@@nationalFunds@@✖[remove]6
Ministerstvo školství, mládeže a tělovýchovy České republiky@@LM2015071@@LINDAT/CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat@@nationalFunds@@✖[remove]6