DeriNet is a lexical network which models derivational relations in the lexicon of Czech. Nodes of the network correspond to Czech lexemes, while edges represent word-formational relations between a derived word and its base word(s). The present version, DeriNet 2.1, contains 1,039,012 lexemes (sampled from the MorfFlex CZ 2.0 dictionary) connected by 782,814 derivational, 50,533 orthographic variant, 1,952 compounding, 295 univerbation and 144 conversion relations.
Compared to the previous version, version 2.1 contains annotations of orthographic variants, a complete automatically generated annotation of affix morpheme boundaries (in addition to the roots annotated in 2.0), 202 affixoid lexemes serving as bases for compounding, annotations of the corpus frequency of lexemes and of verbal conjugation classes, and a pilot annotation of univerbation. The part-of-speech tags were converted to the Universal POS tag set of the Universal Dependencies project.
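The network structure described above, with lexemes as nodes and typed edges pointing from a derived word to its base word(s), can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the dictionary layout, the relation labels and the helper names `derivational_root` and `derived_words` are hypothetical, and the toy entries are not taken from the DeriNet data files, whose actual format should be checked in the release documentation.

```python
from collections import defaultdict

# Hypothetical in-memory layout: each lexeme maps to a list of
# (base lemma, relation type) pairs. Toy entries, not copied from DeriNet.
BASES: dict[str, list[tuple[str, str]]] = {
    "učitel": [("učit", "derivation")],
    "učitelka": [("učitel", "derivation")],
    "vodopád": [("voda", "compounding"), ("pád", "compounding")],
}


def derivational_root(lemma: str) -> str:
    """Follow single-base derivation edges upwards until no such base remains."""
    seen = set()
    while lemma not in seen:  # the set guards against accidental cycles
        seen.add(lemma)
        bases = [b for b, rel in BASES.get(lemma, []) if rel == "derivation"]
        if len(bases) != 1:
            break
        lemma = bases[0]
    return lemma


def derived_words(lemma: str) -> list[str]:
    """Invert the edges to list the words formed directly from `lemma`."""
    children = defaultdict(list)
    for word, bases in BASES.items():
        for base, _relation in bases:
            children[base].append(word)
    return sorted(children[lemma])


print(derivational_root("učitelka"))  # učit
print(derived_words("voda"))          # ['vodopád']
```

Here `derivational_root("učitelka")` follows two derivation edges back to `učit`, while the toy compound `vodopád` has two bases of type `compounding` and is therefore not collapsed onto either of them.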
DeriNet is a lexical network which models derivational and compositional relations in the lexicon of Czech. Nodes of the network correspond to Czech lexemes, while edges represent word-formational relations between a derived word and its base word(s).
The present version, DeriNet 2.2, contains 1,040,127 lexemes (sampled from the MorfFlex CZ 2.0 dictionary), connected by:
- 782,904 derivational,
- 50,511 orthographic variant,
- 6,336 compounding,
- 288 univerbation, and
- 135 conversion relations.
Compared to the previous version, version 2.2 contains an overhaul of the compounding annotation scheme, 4,384 additional compounds, 83 more affixoid lexemes serving as bases for compounding, more parts of speech serving as bases for compounding (adverbs, pronouns, numerals), and several minor corrections of derivational relations.
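The change between releases can be checked against the counts given for DeriNet 2.1 and 2.2. The short Python sketch below only reuses the figures stated in the two descriptions; the dictionary keys and function names are hypothetical labels chosen for illustration.

```python
# Figures as stated in the DeriNet 2.1 and 2.2 descriptions; key names are hypothetical.
V2_1 = {"lexemes": 1_039_012, "derivation": 782_814, "variant": 50_533,
        "compounding": 1_952, "univerbation": 295, "conversion": 144}
V2_2 = {"lexemes": 1_040_127, "derivation": 782_904, "variant": 50_511,
        "compounding": 6_336, "univerbation": 288, "conversion": 135}


def deltas(old: dict[str, int], new: dict[str, int]) -> dict[str, int]:
    """Per-category change from the old release to the new one."""
    return {key: new[key] - old[key] for key in old}


def total_relations(counts: dict[str, int]) -> int:
    """Sum of all relation counts, excluding the number of lexemes."""
    return sum(value for key, value in counts.items() if key != "lexemes")


change = deltas(V2_1, V2_2)
print(change["compounding"])                         # 4384
print(total_relations(V2_1), total_relations(V2_2))  # 835738 840174
```

The compounding delta equals the 4,384 extra compounds, the lexicon grows by 1,115 lexemes, and the remaining relation types change only marginally.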
This diachronic corpus of Czech comprises 3.45 million words (4.1 million tokens) in 116 texts from the 14th to the 20th century. The texts are transcribed, not transliterated. Diakorp v6 is provided in a CoNLL-U-like vertical format used as input to the Manatee query engine. The data thus correspond to the corpus available to registered users of the Czech National Corpus (CNC) via the KonText query interface at http://www.korpus.cz.
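A vertical file of the kind consumed by Manatee typically holds one token per line with tab-separated attributes, while structural markup such as document or sentence boundaries sits on separate XML-like lines. A reader for such data can be sketched as below. The column names in `COLUMNS`, the `<doc>` attribute handling and the function name `read_vertical` are assumptions for illustration; the real attribute order and structure names of Diakorp v6 should be taken from the corpus documentation.

```python
import re
from typing import Iterable, Iterator

# Hypothetical positional attributes; check the real column order in the
# Diakorp v6 documentation before relying on these names.
COLUMNS = ("word", "lemma", "tag")
ATTR = re.compile(r'(\w+)="([^"]*)"')


def read_vertical(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per token line, carrying the attributes of the enclosing <doc>."""
    doc_attrs: dict[str, str] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line == "</doc>":
            doc_attrs = {}
            continue
        if line.startswith("<doc"):
            doc_attrs = dict(ATTR.findall(line))
            continue
        if line.startswith("<"):
            continue  # other structural lines, e.g. <s>, </s>, <p> or <g/>
        token = dict(zip(COLUMNS, line.split("\t")))
        token["doc"] = doc_attrs
        yield token
```

A file object opened with `encoding="utf-8"` can be passed directly, since iterating over it yields lines; the generator keeps memory use flat even for the full corpus.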
Phonological neighborhood density is known to influence lexical access as well as speech production and perception. Lexical competition is thought to be the central concept from which the neighborhood effect emanates: highly competitive neighborhoods are characterized by a large degree of phonemic co-activation, which can delay speech recognition and facilitate speech production. The present study investigates phonetic learning in English as a foreign language in relation to phonological neighborhood density and onset density, to determine whether dense or sparse neighborhoods are more conducive to the incorporation of novel phonetic detail. In addition, the effect of voice-contrasted minimal pairs (bat-pat) is explored. Results indicate that sparser neighborhoods with weaker lexical competition provide the most favorable phonological environment for phonetic learning. Moreover, novel phonetic details are incorporated faster in neighborhoods without minimal pairs. Overall, the findings suggest that lexical competition plays a role in the dissemination of phonetic updates in the lexicon of foreign language learners.
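Phonological neighborhood density is commonly operationalized as the number of words in a lexicon that differ from a target by a single phoneme substitution, addition or deletion. The Python sketch below implements that common definition, together with a check for voice-contrasted minimal pairs such as bat-pat. The transcriptions, the `VOICE_PAIRS` set and the function names are hypothetical; the lexicon and metrics used in the study itself may differ, and onset density is not modelled here.

```python
from typing import Iterable, Sequence

# Hypothetical set of voicing contrasts (voiceless, voiced); extend as needed.
VOICE_PAIRS = {frozenset(p) for p in [("p", "b"), ("t", "d"), ("k", "g"),
                                      ("f", "v"), ("s", "z")]}


def is_neighbor(a: Sequence[str], b: Sequence[str]) -> bool:
    """True if a and b differ by exactly one phoneme substitution, addition or deletion."""
    a, b = tuple(a), tuple(b)
    if a == b:
        return False
    if len(a) == len(b):
        # Substitution: exactly one position differs.
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    short, longer = (a, b) if len(a) < len(b) else (b, a)
    # Addition/deletion: removing one phoneme from the longer form yields the shorter one.
    return any(longer[:i] + longer[i + 1:] == short for i in range(len(longer)))


def neighborhood_density(word: Sequence[str], lexicon: Iterable[Sequence[str]]) -> int:
    """Count the lexicon entries that are one-phoneme neighbors of `word`."""
    return sum(is_neighbor(word, other) for other in lexicon)


def voice_minimal_pairs(word: Sequence[str], lexicon: Iterable[Sequence[str]]) -> list:
    """Return same-length entries differing from `word` only in one voicing contrast."""
    w = tuple(word)
    pairs = []
    for other in lexicon:
        o = tuple(other)
        if len(o) != len(w):
            continue
        diffs = [(x, y) for x, y in zip(w, o) if x != y]
        if len(diffs) == 1 and frozenset(diffs[0]) in VOICE_PAIRS:
            pairs.append(o)
    return pairs


# Toy lexicon in broad phonemic transcription (illustrative only).
LEXICON = [("b", "æ", "t"), ("p", "æ", "t"), ("b", "æ", "d"),
           ("æ", "t"), ("b", "æ", "t", "s"), ("d", "ɒ", "g")]
BAT = ("b", "æ", "t")
print(neighborhood_density(BAT, LEXICON))  # 4: pat, bad, at, bats
print(voice_minimal_pairs(BAT, LEXICON))   # [('p', 'æ', 't'), ('b', 'æ', 'd')]
```

In this toy lexicon, "bat" has a density of 4 and two voice-contrasted minimal pairs, so it would count as a relatively dense neighborhood containing minimal pairs, the condition the study associates with slower incorporation of novel phonetic detail.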
Titles of courses potentially relevant to the Digital Humanities in 2017-2018, manually gathered from the course catalogues of most Czech state colleges, together with the names of the teachers, departments and schools, and the school-unique course IDs. All of this information was publicly available in the individual course catalogues, accessed from the official websites of the respective colleges.
The aim of the course is to introduce digital humanities and to describe various aspects of digital content processing.
The course consists of 10 lessons comprising video material and a PowerPoint presentation covering the same content.
Every lesson contains a practical session: either a Jupyter Notebook for working in Python or a text file with a short description of the task. Most of the practical tasks consist of running the program and analysing the results.
Although the course does not focus on programming, the code can be reused easily in individual projects.
Some experience in running Python code is desirable but not required.
The data set includes training, development and test data from the shared tasks on pronoun-focused machine translation and cross-lingual pronoun prediction held at the EMNLP 2015 workshop on Discourse in Machine Translation (DiscoMT2015). The release also contains the submissions to the pronoun-focused machine translation task, along with the manual annotations used for the official evaluation, as well as gold-standard annotations of pronoun coreference for the shared task test set.
This segment from Český zvukový týdeník Aktualita (Czech Aktualita Sound Newsreel), issue no. 48B from 1943, covers an event called Sewing Dolls, held by the Board of Trustees for the Education of Youth as part of the mandatory service. Supervised by instructors, girls made toys out of pieces of cloth for the children of labourers working in the Reich.