The Czech OOV Inflection Dataset is a Czech noun inflection dataset focused on evaluation in out-of-vocabulary (OOV) conditions. It consists of two parts: a standard lemma-disjoint train-dev-test split of a subset of noun paradigms from the existing Czech morphological dictionary MorfFlex 2.0 (files train, dev and test-MorfFlex); and a small set of neologisms from Čeština 2.0, annotated with their inflected forms (file test-neologisms).
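A lemma-disjoint split assigns whole paradigms, never individual forms, to train, dev or test, so every test lemma is unseen during training. The following is a minimal sketch of that idea only; the function name `lemma_disjoint_split`, the split fractions, the seed and the toy data are assumptions for illustration and do not reproduce the procedure used to build the dataset.

```python
import random


def lemma_disjoint_split(paradigms, dev_frac=0.1, test_frac=0.1, seed=0):
    """Split a {lemma: forms} mapping so that no lemma is shared between parts.

    The fractions and the seed are illustrative assumptions, not the
    values used for the actual dataset.
    """
    lemmas = sorted(paradigms)              # fixed order before shuffling
    random.Random(seed).shuffle(lemmas)     # reproducible shuffle
    n_test = int(len(lemmas) * test_frac)
    n_dev = int(len(lemmas) * dev_frac)
    test_lemmas = lemmas[:n_test]
    dev_lemmas = lemmas[n_test:n_test + n_dev]
    train_lemmas = lemmas[n_test + n_dev:]
    # Each part keeps complete paradigms for its own lemmas only.
    return tuple(
        {lemma: paradigms[lemma] for lemma in part}
        for part in (train_lemmas, dev_lemmas, test_lemmas)
    )


# Hypothetical toy paradigms; real paradigms would hold inflected noun forms.
toy = {f"lemma{i}": [f"form{i}a", f"form{i}b"] for i in range(10)}
train, dev, test = lemma_disjoint_split(toy, dev_frac=0.2, test_frac=0.2, seed=1)
```

Splitting on lemmas rather than on individual forms is what makes the evaluation out-of-vocabulary: a model cannot succeed on the test part by memorizing forms of lemmas it has already seen.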
A Czech morphological dictionary originally developed by Jan Hajič as a spelling-checker and lemmatization dictionary. It currently contains full morphological information for each covered wordform, as well as some derivational, semantic and named-entity information.
MorfFlex CZ 2.0 is the Czech morphological dictionary originally developed by Jan Hajič as a spelling-checker and lemmatization dictionary. MorfFlex is a flat list of lemma-tag-wordform triples. For each wordform, full inflectional information is encoded in a positional tag. Wordforms are organized into entries (paradigm instances, or paradigms for short) according to their formal morphological behavior. Each paradigm (a set of wordforms) is identified by a unique lemma. Apart from traditional morphological categories, the description also contains some semantic, stylistic and derivational information. For more details, see the comprehensive specification of the Czech morphological annotation: http://ufal.mff.cuni.cz/techrep/tr64.pdf .
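The relation between the flat list of triples and the paradigms can be sketched as a grouping by lemma. This is an illustration under stated assumptions: the tab-separated column order (lemma, tag, wordform), the function name `group_paradigms` and the sample rows are hypothetical and are not taken from the distributed files.

```python
from collections import defaultdict


def group_paradigms(lines):
    """Group lemma-tag-wordform triples into paradigms keyed by lemma.

    Assumes one triple per line with tab-separated fields in the order
    lemma, tag, wordform; the real file layout may differ.
    """
    paradigms = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue                      # skip blank lines
        lemma, tag, form = line.split("\t")
        paradigms[lemma].append((tag, form))
    return dict(paradigms)


# Illustrative rows only, written in the style of positional tags.
sample = [
    "hrad\tNNIS1-----A----\thrad",
    "hrad\tNNIS2-----A----\thradu",
    "žena\tNNFS1-----A----\tžena",
]
paradigms = group_paradigms(sample)
```

Because the lemma is unique per paradigm, it can serve directly as the dictionary key; each value then lists the (tag, wordform) pairs belonging to that paradigm.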
Among the results of Russian influence on Czech in the 19th century was the emergence of an active past participle in -(v)ší in Czech. Although not welcomed by all grammarians, this participle has persisted in Czech to this day, becoming mainly a device of archaic and bookish style. The present work studies the occurrence of the active past participle in -(v)ší in the largest subcorpus of the Czech National Corpus containing journalistic texts. The main result of the study is twofold. On the one hand, there is a large number of examples from different verbs that show the active past participle in -(v)ší only once or twice in the studied corpus, where it is indeed a device of archaic and bookish style, sometimes even of irony and humor. On the other hand, there is a small group of (mainly intransitive) verbs for which this participle occurs with considerable frequency in stylistically more neutral contexts of written Standard Czech as the only participle (sometimes as a stylistically more marked variant of the more frequent active past participle in -l). In these cases, it remains overwhelmingly a syntactically unextended direct attribute of a noun. This active past participle in -(v)ší is found most often in sports coverage, where it is formed from a set of verbs with terminological function.