A dictionary of morphologically segmented word forms in Czech. Rules of manual segmentation are described in Pelegrinová, K., Mačutek, J., Čech, R. (2021). The Menzerath-Altmann law as the relation between lengths of words and morphemes in Czech. Jazykovedný časopis, 72, 405-414. The dictionary is based on short stories, fairy tales, letters and studies written by Karel Čapek.
A dictionary of morphologically segmented word forms in Czech. Rules of manual segmentation are described in Pelegrinová, K., Mačutek, J., Čech, R. (2021). The Menzerath-Altmann law as the relation between lengths of words and morphemes in Czech. Jazykovedný časopis, 72, 405-414. The dictionary is based on short stories, fairy tales, letters and studies written by Karel Čapek.
MSTperl is a Perl reimplementation of the MST parser of Ryan McDonald (http://www.seas.upenn.edu/~strctlrn/MSTParser/MSTParser.html).
MST parser (Maximum Spanning Tree parser) is a state-of-the-art natural language dependency parser -- a tool that takes a sentence and returns its dependency tree.
In MSTperl, only some functionality was implemented; the limitations include the following:
the parser is a non-projective one, curently with no possibility of enforcing the requirement of projectivity of the parse trees;
only first-order features are supported, i.e. no second-order or third-order features are possible;
the implementation of MIRA is that of a single-best MIRA, with a closed-form update instead of using quadratic programming.
On the other hand, the parser supports several advanced features:
parallel features, i.e. enriching the parser input with word-aligned sentence in other language;
adding large-scale information, i.e. the feature set enriched with features corresponding to pointwise mutual information of word pairs in a large corpus (CzEng).
The MSTperl parser is tuned for parsing Czech. Trained models are available for Czech, English and German. We can train the parser for other languages on demand, or you can train it yourself -- the guidelines are part of the documentation.
The parser, together with detailed documentation, is avalable on CPAN (http://search.cpan.org/~rur/Treex-Parser-MSTperl/). and The research has been supported by the EU Seventh Framework Programme under grant agreement 247762 (Faust), and by the grants GAUK116310 and GA201/09/H057.
MSTperl is a Perl reimplementation of the MST parser of Ryan McDonald (http://www.seas.upenn.edu/~strctlrn/MSTParser/MSTParser.html).
MST parser (Maximum Spanning Tree parser) is a state-of-the-art natural language dependency parser -- a tool that takes a sentence and returns its dependency tree.
In MSTperl, only some functionality was implemented; the limitations include the following:
the parser is a non-projective one, curently with no possibility of enforcing the requirement of projectivity of the parse trees;
only first-order features are supported, i.e. no second-order or third-order features are possible;
the implementation of MIRA is that of a single-best MIRA, with a closed-form update instead of using quadratic programming.
On the other hand, the parser supports several advanced features:
parallel features, i.e. enriching the parser input with word-aligned sentence in other language;
adding large-scale information, i.e. the feature set enriched with features corresponding to pointwise mutual information of word pairs in a large corpus (CzEng);
weighted/unweighted parser model interpolation;
combination of several instances of the MSTperl parser (through MST algorithm);
combination of several existing parses from any parsers (through MST algorithm).
The MSTperl parser is tuned for parsing Czech. Trained models are available for Czech, English and German. We can train the parser for other languages on demand, or you can train it yourself -- the guidelines are part of the documentation.
The parser, together with detailed documentation, is avalable on CPAN (http://search.cpan.org/~rur/Treex-Parser-MSTperl/). and The research has been supported by the EU Seventh Framework Programme under grant agreement 247762 (Faust), and by the grants GAUK116310 and GA201/09/H057.
This dataset adds annotation of multiword expressions and multiword named entities to the original PDT 2.0 data. The annotation is stand-off, stored in the same PML format as the original PDT 2.0 data. It is to be used together with the PDT 2.0. and grant 1ET201120505 of the Academy of Sciences of the Czech Republic and grant MSM0021620838 of the Ministry of Youth, Education and Sport of The Czech Republic
NER models for NameTag 2, named entity recognition tool, for English, German, Dutch, Spanish and Czech. Model documentation including performance can be found here: https://ufal.mff.cuni.cz/nametag/2/models . These models are for NameTag 2, named entity recognition tool, which can be found here: https://ufal.mff.cuni.cz/nametag/2 .
NER models for NameTag 2, named entity recognition tool, for English, German, Dutch, Spanish and Czech. Model documentation including performance can be found here: https://ufal.mff.cuni.cz/nametag/2/models . These models are for NameTag 2, named entity recognition tool, which can be found here: https://ufal.mff.cuni.cz/nametag/2 .
Trailer for Eva tropí hlouposti (Eva Fools Around, dir. Martin Frič, 1939). Actresses Nataša Gollová and Adina Mandlová in Šťastnou cestu (Happy Journey, dir. Otakar Vávra, 1943).
Segment from Československý zvukový týdeník Aktualita (Czechoslovak Aktualita Sound Newsreel) 1942, issue no. 18, from an event called National Health Week, organized by the Ministry of the Interior together with the Health Institute of the Protectorate of Bohemia and Moravia on 3Ï10 May 1942. The official objective of the event was to highlight the importance of healthcare. The report discusses the establishment of anti-tuberculosis stations in a number of places in the Protectorate. Footage of height measurements and the weighing of patients. Examples of the use of an X-ray station in Moravská Ostrava. Footage showing handling of X-ray machines. A close-up of lung X-rays. The segment includes footage of X-ray vans used to care for child patients. Images from a solarium where children are treated using sunlamps.
NomVallex 2.0 is a manually annotated valency lexicon of Czech nouns and adjectives, created in the theoretical framework of the Functional Generative Description and based on corpus data (the SYN series of corpora from the Czech National Corpus and the Araneum Bohemicum Maximum corpus). In total, NomVallex is comprised of 1027 lexical units contained in 570 lexemes, covering the following parts-of-speech and derivational categories: deverbal or deadjectival nouns, and deverbal, denominal, deadjectival or primary adjectives. Valency properties of a lexical unit are captured in a valency frame (modeled as a sequence of valency slots, each supplemented with a list of morphemic forms) and documented by corpus examples. In order to make it possible to study the relationship between valency behavior of base words and their derivatives, lexical units of nouns and adjectives in NomVallex are linked to their respective base lexical units (contained either in NomVallex itself or, in case of verbs, in the VALLEX lexicon), linking up to three parts-of-speech (i.e., noun – verb, adjective – verb, noun – adjective, and noun – adjective – verb).
In order to facilitate comparison, this submission also contains abbreviated entries of the base verbs of these nouns and adjectives from the VALLEX lexicon and simplified entries of the covered nouns and adjectives from the PDT-Vallex lexicon.