Creator: Novotná, Renata - LINDAT/CLARIAH-CZ Catalog Search Results


1. Jazyková potencialita: studium na bázi hapaxů legomenon

Creator:: Novotná, Renata
Format:: unmediated and volume
Type:: model:article and TEXT
Subject:: hapax legomenon, potentiality, suffix, diminutive, feminine noun, compound, potencialita, sufix, deminutivum, přechýlené slovo, and kompozitum
Language:: Czech
Description:: The paper deals with word forms that occur only once, twice or three times in the SYN corpus (1.2 billion words), the so-called hapax legomena, which provide a basis for the study of potentiality in language. Because the material was very large, only 20 samples were chosen, each containing 3,000 forms, i.e. 60,000 forms overall. Approximately 50% of the word forms were mistakes of various kinds, especially typing errors, or words from other languages. Therefore only the remaining 30,000 word forms were selected as the basis for this study. The analysis showed that the most relevant suffixes for hapax legomena are -ovský (e.g. jimmyreedovský), -ák (e.g. medvěďák), -ista (e.g. havlista), -ing/-ink (e.g. gardening, dancink) and -ovitý (e.g. kladivovitý), together with the type po vojensku, diminutives derived from abstract nouns (e.g. minulůstka) and names of women's professions (e.g. meduprodavačka). Moreover, compound words with the first parts dlouho- (e.g. dlouhorožec), gala- (e.g. galamenu), jino- (e.g. jinomluva), kino- (e.g. kinofajnšmekr), mega- (e.g. megakatastrofa), nízko- (e.g. nízkohlučný), polo- (e.g. poločitelný) and video- (e.g. videokomentář) were typical for new words.
Rights:: http://creativecommons.org/publicdomain/mark/1.0/ and policy:public

2. Profesor František Čermák sedmdesátiletý

Creator:: Novotná, Renata
Type:: text and jubilee articles
Subject:: Linguistics. Languages, Čermák, František, life anniversaries, linguists, Czech lands from 1993 to the present, language, writing, and Czechoslovakia 1918-1992
Language:: Czech
Rights:: unknown

3. SYN2005: balanced corpus of written Czech

Creator:: Čermák, František, Hlaváčová, Jaroslava, Hnátková, Milena, Jelínek, Tomáš, Kocek, Jan, Kopřivová, Marie, Křen, Michal, Novotná, Renata, Petkevič, Vladimír, Schmiedtová, Věra, Skoumalová, Hana, Spoustová, Johanka, Šulc, Michal, and Velíšek, Zdeněk
Publisher:: Faculty of Arts, Institute of the Czech National Corpus, Charles University in Prague
Type:: text and corpus
Subject:: balanced corpus and written language
Language:: Czech
Description:: Balanced corpus of contemporary written Czech comprising 100 million words (100 MW). It was created as a representation of the written language of 2000–2004 and therefore contains a wide range of text types and genres (fiction, professional literature, newspapers, etc.) in balanced proportions. The corpus is lemmatized and morphologically tagged by a combination of stochastic and rule-based methods. It is provided in a (semi-XML) vertical format used as input to the Manatee query engine. The data thus correspond to the corpus available via the query interface to registered users of the CNC, with one important exception: they are shuffled, i.e. divided into blocks of at most 100 words (respecting sentence boundaries) whose order was randomized within each document. MSM0021620823 – Český národní korpus a korpusy dalších jazyků (Czech National Corpus and Corpora of Other Languages)
Rights:: Czech National Corpus (Shuffled Corpus Data), https://lindat.mff.cuni.cz/repository/xmlui/page/license-cnc, and ACA

4. SYN2006PUB: corpus of Czech newspapers

Creator:: Čermák, František, Hlaváčová, Jaroslava, Hnátková, Milena, Jelínek, Tomáš, Kocek, Jan, Kopřivová, Marie, Křen, Michal, Novotná, Renata, Petkevič, Vladimír, Schmiedtová, Věra, Skoumalová, Hana, Spoustová, Johanka, Šulc, Michal, and Velíšek, Zdeněk
Publisher:: Faculty of Arts, Institute of the Czech National Corpus, Charles University in Prague
Type:: text and corpus
Subject:: corpus and written language
Language:: Czech
Description:: Corpus of contemporary Czech newspapers and magazines comprising 300 million words (300 MW). It contains various titles published between the end of 1989 and 2004. The corpus is lemmatized and morphologically tagged by a combination of stochastic and rule-based methods. It is provided in a (semi-XML) vertical format used as input to the Manatee query engine. The data thus correspond to the corpus available via the query interface to registered users of the CNC, with one important exception: they are shuffled, i.e. divided into blocks of at most 100 words (respecting sentence boundaries) whose order was randomized within each document. MSM0021620823 – Český národní korpus a korpusy dalších jazyků (Czech National Corpus and Corpora of Other Languages)
Rights:: Czech National Corpus (Shuffled Corpus Data), https://lindat.mff.cuni.cz/repository/xmlui/page/license-cnc, and ACA

5. SYN2010: balanced corpus of written Czech

Creator:: Křen, Michal, Bartoň, Tomáš, Cvrček, Václav, Hnátková, Milena, Jelínek, Tomáš, Kocek, Jan, Novotná, Renata, Petkevič, Vladimír, Procházka, Pavel, Schmiedtová, Věra, and Skoumalová, Hana
Publisher:: Faculty of Arts, Institute of the Czech National Corpus, Charles University in Prague
Type:: text and corpus
Subject:: balanced corpus and written language
Language:: Czech
Description:: Balanced corpus of contemporary written Czech comprising 100 million words (100 MW). It was created as a representation of the written language of 2005–2009 and therefore contains a wide range of text types and genres (fiction, professional literature, newspapers, etc.) in balanced proportions. The corpus is lemmatized and morphologically tagged by a combination of stochastic and rule-based methods. It is provided in a (semi-XML) vertical format used as input to the Manatee query engine. The data thus correspond to the corpus available via the query interface to registered users of the CNC, with one important exception: they are shuffled, i.e. divided into blocks of at most 100 words (respecting sentence boundaries) whose order was randomized within each document. MSM0021620823 – Český národní korpus a korpusy dalších jazyků (Czech National Corpus and Corpora of Other Languages)
Rights:: Czech National Corpus (Shuffled Corpus Data), https://lindat.mff.cuni.cz/repository/xmlui/page/license-cnc, and ACA

