NomVallex 2.0 is a manually annotated valency lexicon of Czech nouns and adjectives, created in the theoretical framework of the Functional Generative Description and based on corpus data (the SYN series of corpora from the Czech National Corpus and the Araneum Bohemicum Maximum corpus). In total, NomVallex comprises 1027 lexical units contained in 570 lexemes, covering the following parts of speech and derivational categories: deverbal or deadjectival nouns, and deverbal, denominal, deadjectival or primary adjectives. Valency properties of a lexical unit are captured in a valency frame (modeled as a sequence of valency slots, each supplemented with a list of morphemic forms) and documented by corpus examples. To make it possible to study the relationship between the valency behavior of base words and their derivatives, lexical units of nouns and adjectives in NomVallex are linked to their respective base lexical units (contained either in NomVallex itself or, in the case of verbs, in the VALLEX lexicon), linking up to three parts of speech (i.e., noun – verb, adjective – verb, noun – adjective, and noun – adjective – verb).
In order to facilitate comparison, this submission also contains abbreviated entries of the base verbs of these nouns and adjectives from the VALLEX lexicon and simplified entries of the covered nouns and adjectives from the PDT-Vallex lexicon.
NomVallex is a manually annotated valency lexicon of Czech nouns and adjectives, created in the theoretical framework of the Functional Generative Description. In total, NomVallex 2.5 comprises 1337 lexical units contained in 730 lexemes. As for derivational categories, it covers deverbal, deadjectival or denominal nouns, and deverbal, denominal, deadjectival or primary adjectives. Valency properties of a lexical unit are captured in a valency frame (modeled as a sequence of valency slots, each supplemented with a list of morphemic forms) and documented by corpus examples (extracted from the SYN series of corpora from the Czech National Corpus or from the Araneum Bohemicum Maximum corpus). To enable analysis of the relationship between the valency behavior of base words and their derivatives, lexical units of nouns and adjectives in NomVallex are linked to their respective base lexical units (contained either in NomVallex itself or, in the case of verbs, in the VALLEX lexicon), linking together up to three parts of speech (i.e., noun–verb, e.g., vnímání ‘perception’ – vnímat ‘perceive’, adjective–verb, e.g., vnímatelný ‘perceivable’ – vnímat ‘perceive’, noun–adjective, e.g., vnímavost ‘perceptiveness’ – vnímavý ‘perceptive’, and noun–adjective–verb, e.g., vnímavost ‘perceptiveness’ – vnímavý ‘perceptive’ – vnímat ‘perceive’). NomVallex 2.5 is an enhanced edition of NomVallex 2.0; new developments in version 2.5 include an increase in the number of noun and adjective lexemes covered, treatment of negation (i.e., negative forms of nouns and adjectives), and annotation of reciprocity and reflexivity.
Annotators: Veronika Kolářová, Václava Kettnerová, Jana Klímová and Jakub Sláma.
Software and technical support: Jiří Mírovský and Anna Vernerová.
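The frame-and-link organization described above (a valency frame as a sequence of slots, each with a list of morphemic forms, and each lexical unit optionally linked to its base lexical unit) can be illustrated with a minimal Python sketch. The class names, field names, and the slot values below are assumptions made for illustration only; they do not reproduce the lexicon's actual data format or its actual frames. Only the three lemmas and their derivational order are taken from the description.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ValencySlot:
    """One slot of a valency frame (hypothetical representation)."""
    functor: str          # slot label, e.g. "ACT" or "PAT" (illustrative)
    forms: List[str]      # list of morphemic forms (illustrative values)


@dataclass
class LexicalUnit:
    """A lexical unit with its frame and a link to its base unit."""
    lemma: str
    pos: str                                   # "noun", "adjective" or "verb"
    frame: List[ValencySlot] = field(default_factory=list)
    examples: List[str] = field(default_factory=list)
    base: Optional["LexicalUnit"] = None       # base lexical unit, if any


def derivation_chain(unit: LexicalUnit) -> List[str]:
    """Follow base links from a derivative up to its ultimate base word."""
    chain = []
    current: Optional[LexicalUnit] = unit
    while current is not None:
        chain.append(current.lemma)
        current = current.base
    return chain


# The noun-adjective-verb chain mentioned in the description.
# The slot contents are placeholders, not data copied from the lexicon.
verb = LexicalUnit("vnímat", "verb", frame=[ValencySlot("ACT", ["form-a"])])
adjective = LexicalUnit("vnímavý", "adjective", base=verb)
noun = LexicalUnit("vnímavost", "noun", base=adjective)

chain = derivation_chain(noun)
print(" – ".join(chain))
```

Storing only a single link to the base unit keeps each entry small while still allowing chains of up to three parts of speech to be recovered by following the links.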
The NomVallex I. lexicon describes the valency of Czech deverbal nouns belonging to three semantic classes, i.e., Communication (dotaz 'question'), Mental Action (plán 'plan') and Psych State (nenávist 'hatred'). It covers both stem-nominals and root-nominals (dotazování se 'asking' and dotaz 'question', respectively). In total, the lexicon includes 505 lexical units in 248 lexemes. Valency properties are captured in the form of valency frames, specifying valency slots and their morphemic forms, and are exemplified by corpus examples.
In order to facilitate comparison, this submission also contains abbreviated entries of the source verbs of these nouns from the VALLEX lexicon and simplified entries of the covered nouns from the PDT-Vallex lexicon.
VALLEX 3.0 provides information on the valency structure (combinatorial potential) of verbs in their particular senses, which are characterized by glosses and examples. VALLEX 3.0 describes almost 4 600 Czech verbs in more than 10 800 lexical units, i.e., given verbs in the given senses.
VALLEX 3.0 is a collection of linguistically annotated data and documentation, resulting from an attempt at a formal description of valency frames of Czech verbs. To satisfy the different needs of different potential users, the lexicon is distributed (i) in an HTML version (which allows easy and fast navigation through the lexicon) and (ii) in a machine-tractable form as a single XML file, so that the VALLEX data can be used in NLP applications.
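As a rough illustration of how a single-file XML lexicon of this kind might be consumed in an NLP application, the following Python sketch reads valency frames from a small inline sample. The element and attribute names (lexicon, lexeme, lu, slot, form, lemma, id, functor) and the sample content are hypothetical and chosen only for this example; the actual VALLEX XML schema should be taken from the documentation distributed with the data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the tag and attribute names are assumptions,
# not the real VALLEX schema, and the frame is illustrative.
SAMPLE = """<lexicon>
  <lexeme lemma="vnímat">
    <lu id="lu-1">
      <slot functor="ACT"><form>1</form></slot>
      <slot functor="PAT"><form>4</form></slot>
    </lu>
  </lexeme>
</lexicon>"""


def read_frames(xml_text):
    """Map (lemma, lexical unit id) to a list of (functor, forms) pairs."""
    root = ET.fromstring(xml_text)
    frames = {}
    for lexeme in root.iter("lexeme"):
        for lu in lexeme.iter("lu"):
            slots = [
                (slot.get("functor"), [f.text for f in slot.iter("form")])
                for slot in lu.iter("slot")
            ]
            frames[(lexeme.get("lemma"), lu.get("id"))] = slots
    return frames


frames = read_frames(SAMPLE)
for key, slots in frames.items():
    print(key, slots)
```

For the full distributed file, the same function could be adapted to read from disk with ET.parse once the real element names are substituted.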