DeriNet is a lexical network which models derivational relations in the lexicon of Czech. Nodes of the network correspond to Czech lexemes, while edges represent derivational or compositional relations between a derived word and its base word / words. The present version, DeriNet 2.0, contains 1,027,665 lexemes (sampled from the MorfFlex dictionary) connected by 808,682 derivational and 600 compositional links.
Compared to previous versions, version 2.0 uses a new format and contains new types of annotations: compounding, annotation of several morphological and other categories of lexemes, identification of root morphs of 244,198 lexemes, semantic labelling of 151,005 relations using five labels and identification of 13 fictitious lexemes.
DeriNet is a lexical network which models derivational relations in the lexicon of Czech. Nodes of the network correspond to Czech lexemes, while edges represent word-formational relations between a derived word and its base word / words. The present version, DeriNet 2.1, contains 1,039,012 lexemes (sampled from the MorfFlex CZ 2.0 dictionary) connected by 782,814 derivational, 50,533 orthographic variant, 1,952 compounding, 295 univerbation and 144 conversion relations.
Compared to the previous version, version 2.1 contains annotations of orthographic variants, full automatically generated annotation of affix morpheme boundaries (in addition to the roots annotated in 2.0), 202 affixoid lexemes serving as bases for compounding, annotation of corpus frequency of lexemes, annotation of verbal conjugation classes and a pilot annotation of univerbation. The set of part-of-speech tags was converted to Universal POS from the Universal Dependencies project.
DeriNet is a lexical network which models derivational and compositional relations in the lexicon of Czech. Nodes of the network correspond to Czech lexemes, while edges represent word-formational relations between a derived word and its base word / words.
The present version, DeriNet 2.2, contains:
- 1,040,127 lexemes (sampled from the MorfFlex CZ 2.0 dictionary), connected by
- 782,904 derivational,
- 50,511 orthographic variant,
- 6,336 compounding,
- 288 univerbation, and
- 135 conversion relations.
Compared to the previous version, version 2.2 contains an overhaul of the compounding annotation scheme, 4,384 extra compounds, 83 more affixoid lexemes serving as bases for compounding, more parts of speech serving as bases for compounding (adverbs, pronouns, numerals), and several minor corrections of derivational relations.
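The network model described above (lexemes as nodes, typed word-formational relations as edges from a derived word to one or more base words) can be pictured with a minimal Python sketch. This is not the DeriNet file format or the API of any DeriNet tooling: the class names, method names and relation labels are hypothetical, and the three example relations were chosen for illustration and not checked against the DeriNet data.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Lexeme:
    """A network node (hypothetical structure, not the DeriNet schema)."""
    lemma: str
    pos: str  # Universal POS tag, e.g. "NOUN" or "VERB"


class LexicalNetwork:
    """Toy network: each derived lexeme has one relation to its base word(s)."""

    def __init__(self):
        self.relation_of = {}                  # derived -> (relation type, bases)
        self.derived_from = defaultdict(list)  # base -> derived lexemes

    def add_relation(self, rel_type, derived, bases):
        # One base for derivation, several bases for compounding.
        self.relation_of[derived] = (rel_type, tuple(bases))
        for base in bases:
            self.derived_from[base].append(derived)

    def root(self, lexeme):
        """Follow the first base of each relation up to an underived lexeme."""
        while lexeme in self.relation_of:
            lexeme = self.relation_of[lexeme][1][0]
        return lexeme

    def relation_counts(self):
        """Count relations by type, as in the per-type totals listed above."""
        return Counter(rel for rel, _ in self.relation_of.values())


# Illustrative lexemes only; not taken from the DeriNet data.
ucit = Lexeme("učit", "VERB")
ucitel = Lexeme("učitel", "NOUN")
ucitelka = Lexeme("učitelka", "NOUN")
zeme = Lexeme("země", "NOUN")
psat = Lexeme("psát", "VERB")
zemepis = Lexeme("zeměpis", "NOUN")

net = LexicalNetwork()
net.add_relation("derivation", ucitel, [ucit])
net.add_relation("derivation", ucitelka, [ucitel])
net.add_relation("compounding", zemepis, [zeme, psat])

print(net.root(ucitelka).lemma)
print(dict(net.relation_counts()))
```

Storing the relation on the derived lexeme reflects the description's wording, in which an edge links a derived word to its base word or words; a compound simply carries more than one base.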
Diachronic corpus of Czech comprising 3.45 million words (i.e. 4.1 million tokens). It contains 116 texts from the 14th to the 20th century. The texts are transcribed, not transliterated. Diakorp v6 is provided in a CoNLL-U-like vertical format used as input to the Manatee query engine. The data thus correspond to the corpus available to registered users of the CNC via the KonText query interface at http://www.korpus.cz
Titles of courses possibly relevant to the Digital Humanities for 2017-2018, manually gathered from course catalogues of most Czech state colleges, including the names of the teachers, department and school names, and the school-unique course IDs. All this information was publicly available in the individual course catalogues accessed from the official websites of the individual colleges.
Segment from Český zvukový týdeník Aktualita (Czech Aktualita Sound Newsreel) issue no. 48B from 1943 is about an event of the Board of Trustees for the Education of Youth called Sewing Dolls, which was part of the mandatory service. Girls, supervised by instructors, made toys out of pieces of cloth for the children of the labourers working in the Reich.
Segment from Český zvukový týdeník Aktualita (Czech Aktualita Sound Newsreel) issue no. 52A, B from 1944 shows the distribution of Christmas presents to poor children, which was organised by Social Aid in collaboration with the Board of Trustees for the Education of Youth and took place in the Great Hall of Lucerna Palace on 18 December. The event was attended by Minister of Education and People's Enlightenment and Chairman of the Board Emanuel Moravec.
Segment from Český zvukový týdeník Aktualita (Czech Aktualita Sound Newsreel) issue no. 24B from 1944 was shot during the Days of Czech Youth organised by the Board of Trustees for the Education of Youth. The national event was preceded by district rounds. The segment depicts the event that took place in Kolín nad Labem. The official procession through the streets of Kolín was followed by the District Track and Field Championship at the A.F.K. Stadium. Girls in folk costumes danced to folk songs.
The segment of Československý zvukový týdeník Aktualita (Czechoslovak Aktualita Sound Newsreel), 1938, issue no. 34 captures a speech given by Eddy Sherwood, an American journalist, Protestant missionary and YMCA official, in which he talks about the brave stance adopted by the Czechoslovak nation in the critical days of 1938.
This software package includes three tools: a web frontend for machine translation featuring phonetic transcription of Ukrainian suitable for Czech speakers, an API server, and a tool for translating documents with markup (html, docx, odt, pptx, odp,...). These tools are used in the Charles Translator service (https://translator.cuni.cz).
This software was developed within the EdUKate project, which aims to help mitigate the language barriers that non-Czech-speaking children in the Czech Republic face in the Czech school system. The project focuses on the development and dissemination of multilingual digital learning materials for students in primary and secondary schools.