Subject: text - LINDAT/CLARIAH-CZ Catalog Search Results

Search constraints: Subject = text, Date = Unknown

1. Czech Text Document Corpus v 2.0

Creator:: Král, Pavel and Lenc, Ladislav
Publisher:: European Language Resources Association (ELRA)
Type:: text and corpus
Subject:: corpus, Czech, document classification, multi-label, and text
Language:: Czech
Description:: BASIC INFORMATION
--------------------
Czech Text Document Corpus v 2.0 is a collection of text documents for automatic document classification in the Czech language. It is composed of text documents provided by the Czech News Agency and is freely available for research purposes. The corpus was created to facilitate straightforward comparison of document classification approaches on Czech data. It is particularly suited to evaluating multi-label document classification approaches, because a document is usually labelled with more than one label. Besides the document classes, the corpus is also annotated at the morphological layer.
The main part (for training and testing) is composed of 11,955 real newspaper articles. A development set of 2,735 additional articles is also provided, intended for tuning the hyper-parameters of the created models. There are 60 categories in total, of which the 37 most frequent are used for classification. This reduction keeps only the classes with enough occurrences to train the models.
Technical Details
------------------------
Text documents are stored in individual text files using UTF-8 encoding. Each filename is composed of a serial number and the list of category abbreviations, separated by underscores, followed by the .txt suffix. Serial numbers have five digits, and the numbering starts at one. For instance, the file 00046_kul_nab_mag.txt is document number 46, annotated with the categories kul (culture), nab (religion) and mag (magazine selection). The content of the document, i.e. the word tokens, is stored on one line, with tokens separated by spaces.
Every text document was also automatically morphologically analyzed. This analysis includes lemmatization, POS tagging and syntactic parsing. The fully annotated files are stored in .conll files. The lemmatized form is provided in files with the .lemma suffix, the corresponding POS tags in .pos files, and the tokenized version of the documents in .tok files.
This corpus is available free of charge for research purposes only. Commercial use in any form is strictly excluded.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
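
The filename convention described in the Technical Details above can be illustrated with a minimal sketch. It assumes only what the description states: a five-digit serial number, underscore-separated category abbreviations, a .txt suffix, and one line of space-separated tokens per file. The names `parse_filename` and `read_tokens` are hypothetical helpers introduced for illustration; they are not part of the corpus distribution.

```python
import re
from pathlib import Path

# Assumed pattern, taken from the description: five-digit serial number,
# one or more underscore-separated category abbreviations, ".txt" suffix.
FILENAME_RE = re.compile(r"^(\d{5})((?:_[^_.]+)+)\.txt$")


def parse_filename(name):
    """Return (serial_number, [category abbreviations]) for a corpus filename."""
    match = FILENAME_RE.match(Path(name).name)
    if match is None:
        raise ValueError(f"unexpected filename: {name!r}")
    serial = int(match.group(1))
    # group(2) looks like "_kul_nab_mag"; drop the leading underscore and split.
    categories = match.group(2).lstrip("_").split("_")
    return serial, categories


def read_tokens(path):
    """Read the space-separated word tokens stored in a document file."""
    return Path(path).read_text(encoding="utf-8").split()


if __name__ == "__main__":
    print(parse_filename("00046_kul_nab_mag.txt"))
```

Applied to the example from the description, `parse_filename("00046_kul_nab_mag.txt")` yields the serial number 46 and the labels kul, nab and mag, which is the form a multi-label classifier would typically consume.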

2. Dr. A. Dvořák: Svatební košile: ballada pro soli, smíšený sbor a velký orkestr na slova K.J. Erbena : op. 69 : s mnoha notovými příklady

Creator:: Piskáček, Adolf and Erben, Karel Jaromír
Publisher:: Fr.A. Urbánek
Format:: print and 38 pages : music notation ; 18 cm
Type:: model:monograph and TEXT
Subject:: vocal music, texts set to music, poetry set to music (about it), vocal-instrumental music, Svatební košile (by A. Dvořák), analyses of compositions, analyses of works, musical analyses, Czech vocal music, Czech music, 1880s, second half of the 19th century, studies, text, 19th century, cantatas, secular cantatas, 783.3+784.5, 784.5, 784, (048.8), and 9
Language:: Czech
Description:: Musical personalities.
Rights:: http://creativecommons.org/publicdomain/mark/1.0/ and policy:public

3. Kratičká Hystorye Augspurského vyznánj

Creator:: Tablic, Bohuslav
Publisher:: Antonjn Gotljb
Format:: print and 253 pages ; 18 cm
Type:: model:monograph and TEXT
Subject:: text, rare prints of the 19th century (collection), rare and historical prints of the MKP (collection), Christian churches, sects, denominations, Evangelical churches, church history, Slovakia (historical territory), Moravia (Czechia), essays, treatises, (049), 274/279, 27-9, (437.32), 5, and 271/279
Language:: Czech
Description:: A brief history of this church from 1530 onward is accompanied by a table listing the priests and teachers active in individual towns in Moravia and Slovakia. and Christian churches, communities and sects. Original churches. Eastern Christian churches. Russian Orthodox Church. Greek Church. Slavic Church. Roman Catholic Church. National episcopal churches. Anglican Church. Protestant churches and sects. Presbyterians. Baptists. Methodists. Unitarians. Other Christian communities, mostly without priests. Mormons. Quakers. Jehovah's Witnesses. Salvation Army. YMCA, YWCA. Religious associations. Calendars, yearbooks and religious literature of individual churches.
Rights:: http://creativecommons.org/publicdomain/mark/1.0/ and policy:public

4. Taroky a Piquet: příruční knížka pro příznivce a hráče obou těchto oblíbených her s návodem i se všemi pravidly

Creator:: Chirenský, J
Publisher:: Nákladem Aloise Hynka
Format:: print and 22 pages ; 20 cm
Type:: model:monograph and TEXT
Subject:: Sport. Games. Physical exercises, rare prints of the 20th century (collection), rare and historical prints of the MKP (collection), handbooks, card games, games (activity), text, 20, and 796
Language:: Czech
Description:: An explanation of the rules of two card games, with brief instructions for players. and Parlour games. Table and board games. Chess. Cards. Lotto. Games of chance. Single-player games. Magic tricks. Multimedia games.
Rights:: http://creativecommons.org/publicdomain/mark/1.0/ and policy:public
