Harvested from: LINDAT/CLARIAH-CZ repository / Language: Czech - LINDAT/CLARIAH-CZ Catalog Search Results

Active filters: Language: Czech; Harvested from: LINDAT/CLARIAH-CZ repository; Date: 2010

Creator:: Bejček, Eduard; Klyueva, Natalia; Straňák, Pavel; Šidák, Pavel; Šťastná, Eva; Vimmrová, Pavlína; Hajič, Jan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: MWE, multiword expressions, idiom, phraseme, and named entity
Language:: Czech
Description:: This dataset adds annotation of multiword expressions and multiword named entities to the original PDT 2.0 data. The annotation is stand-off and is stored in the same PML format as the original PDT 2.0 data; it is intended to be used together with PDT 2.0. Funding: grant 1ET201120505 of the Academy of Sciences of the Czech Republic and grant MSM0021620838 of the Ministry of Youth, Education and Sport of the Czech Republic.
Rights:: Creative Commons - Attribution 3.0 Unported (CC BY 3.0), http://creativecommons.org/licenses/by/3.0/, and PUB

Creator:: Křen, Michal; Bartoň, Tomáš; Hnátková, Milena; Jelínek, Tomáš; Petkevič, Vladimír; Procházka, Pavel; Skoumalová, Hana
Publisher:: Faculty of Arts, Institute of the Czech National Corpus, Charles University in Prague
Type:: text and corpus
Subject:: corpus and written language
Language:: Czech
Description:: Corpus of contemporary Czech newspapers and magazines, 700 million words in size. It contains various titles published between 1995 and 2007. The corpus is lemmatized and morphologically tagged by a combination of stochastic and rule-based methods. It is provided in a (semi-XML) vertical format used as input to the Manatee query engine. The data thus correspond to the corpus available via the query interface to registered users of the CNC, with one important exception: they are shuffled, i.e. divided into blocks of at most 100 words (respecting sentence boundaries) whose order was randomized within each document. Funding: MSM0021620823 (Český národní korpus a korpusy dalších jazyků; Czech National Corpus and Corpora of Other Languages).
Rights:: Czech National Corpus (Shuffled Corpus Data), https://lindat.mff.cuni.cz/repository/xmlui/page/license-cnc, and ACA
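
For readers unfamiliar with the "vertical" format mentioned in the description, the following is a minimal reading sketch, not the CNC's own tooling. It assumes the common vertical layout of one token per line with tab-separated attributes, and structural markup on lines of its own. The column order (word, lemma, tag), the `<s>` sentence element, and the function name `parse_vertical` are assumptions for illustration; the actual attribute set of this corpus should be checked against its documentation.

```python
def parse_vertical(lines):
    """Yield sentences as lists of attribute tuples from vertical-format lines.

    Assumptions (not taken from the record): one token per line, attributes
    separated by tabs, sentences delimited by <s> ... </s> lines.
    Simplification: any line starting with "<" is treated as markup, so a
    literal "<" token would be skipped by this sketch.
    """
    sentence = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("<"):
            # Structural line: close the current sentence on </s>.
            if line.strip() == "</s>" and sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(tuple(line.split("\t")))
    if sentence:
        # Tokens left over without a closing </s>.
        yield sentence


# Hypothetical sample; the tags "A", "N", "I" are placeholders, not real CNC tags.
sample = [
    "<doc id=\"example\">",
    "<s>",
    "Dobrý\tdobrý\tA",
    "den\tden\tN",
    "</s>",
    "<s>",
    "Ahoj\tahoj\tI",
    "</s>",
    "</doc>",
]
sentences = list(parse_vertical(sample))
```

Each yielded sentence is a plain list of tuples, which keeps the sketch independent of any corpus library.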

Creator:: Křen, Michal; Bartoň, Tomáš; Cvrček, Václav; Hnátková, Milena; Jelínek, Tomáš; Kocek, Jan; Novotná, Renata; Petkevič, Vladimír; Procházka, Pavel; Schmiedtová, Věra; Skoumalová, Hana
Publisher:: Faculty of Arts, Institute of the Czech National Corpus, Charles University in Prague
Type:: text and corpus
Subject:: balanced corpus and written language
Language:: Czech
Description:: Balanced corpus of contemporary written Czech, 100 million words in size. It was created to represent the written language of 2005–2009 and therefore contains a wide range of text types and genres (fiction, professional literature, newspapers, etc.) in balanced proportions. The corpus is lemmatized and morphologically tagged by a combination of stochastic and rule-based methods. It is provided in a (semi-XML) vertical format used as input to the Manatee query engine. The data thus correspond to the corpus available via the query interface to registered users of the CNC, with one important exception: they are shuffled, i.e. divided into blocks of at most 100 words (respecting sentence boundaries) whose order was randomized within each document. Funding: MSM0021620823 (Český národní korpus a korpusy dalších jazyků; Czech National Corpus and Corpora of Other Languages).
Rights:: Czech National Corpus (Shuffled Corpus Data), https://lindat.mff.cuni.cz/repository/xmlui/page/license-cnc, and ACA
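
The shuffling described in the Czech National Corpus records above (blocks of at most 100 words that respect sentence boundaries, with block order randomized within a document) can be sketched as follows. This is an illustrative reconstruction, not the procedure actually used by the CNC. The function names, the greedy grouping strategy, the `seed` parameter, and the handling of sentences longer than the limit are all assumptions.

```python
import random


def make_blocks(sentences, max_words=100):
    """Greedily group consecutive sentences into blocks of at most max_words tokens.

    Sentences are never split. Assumption: a single sentence longer than
    max_words becomes a block on its own.
    """
    blocks, current, count = [], [], 0
    for sent in sentences:
        if current and count + len(sent) > max_words:
            blocks.append(current)
            current, count = [], 0
        current.append(sent)
        count += len(sent)
    if current:
        blocks.append(current)
    return blocks


def shuffle_document(sentences, max_words=100, seed=None):
    """Return the document's blocks in randomized order.

    Sentence order inside each block is preserved; only the order of the
    blocks within this one document changes.
    """
    blocks = make_blocks(sentences, max_words)
    random.Random(seed).shuffle(blocks)
    return blocks


# Hypothetical document: five sentences of 40 tokens each.
doc = [[f"w{i}_{j}" for j in range(40)] for i in range(5)]
shuffled = shuffle_document(doc, max_words=100, seed=0)
```

With 40-token sentences and a 100-word limit, the greedy grouping puts two sentences in each full block, so local context within a block stays readable while the original document order cannot be recovered from the block sequence.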
