This article uses empirical data to contextualise and summarise Czechs' attitudes towards Slovak and their perceptions of their own knowledge of Slovak. It also aims to shed light on the changes that have taken place since 1989 and to contribute more generally to existing knowledge about Czech-Slovak linguistic relations. At the same time, it seeks to highlight the difficulty of delimiting the status of two geographically adjacent contact languages whose identity speakers define in equal measure through shared political and historical experience (especially in the twentieth century) and through their ethnic, cultural and linguistic differences. The evidence is drawn primarily from two nationwide surveys conducted for the author at the Public Opinion Research Centre (CVVM) of the Institute of Sociology of the Czech Academy of Sciences: „Postoje českých mluvčích k lexikálním výpůjčkám“ (hereafter „Postoje“) and „Češi a slovenština“. The content and methodology of these surveys draw on a varied range of diachronic and synchronic data, in particular a 1971 study carried out at the Institute for Public Opinion Research (the predecessor of CVVM) and three large-scale European Union surveys. This study employs a range of up-to-date statistical information, including the findings of two nationwide surveys conducted on the author’s behalf, to evaluate current perceptions of Slovak in the Czech Republic. Where appropriate, the results are compared with the evidence of other questionnaires (including Tejnor 1971). (Tom Dickins)
A new version of the previously published Chroma corpus. Version 2023.04 includes six children. Two transcripts (Julie20221, Klara30424) were removed because they did not meet the criteria for dialogical format. The transcripts were revised (typing errors and inconsistencies in the transcription format were eliminated) and morphologically annotated with the automatic tool MorphoDiTa. The annotation of children's utterances was checked manually in detail; the annotation of adult data has not yet been checked. Files are in plain text with UTF-8 encoding. Each file represents one recording session of one of the target children and is named with the child's alias and their age at that session in the form YMMDD. Transcription rules and other details can be found on the homepage coczefla.ff.cuni.cz.
A new version of the previously published Chroma corpus with morphological annotation. Version 2023.07 differs from 2023.04 in that it includes all seven children and has undergone an additional careful check of consistency and conformity to the CHAT transcription principles.
Two transcripts (Julie20221, Klara30424) from the previous versions (2022.07, 2019.07) were removed because they did not meet our criteria for dialogical format. All transcripts of recordings made on the same day were combined into a single file. Thus, version 2023.07 consists of 183 files/transcripts. The numbers of utterances and tokens given here in LINDAT correspond to children's lines only.
Files are in plain text with UTF-8 encoding. Each file represents one recording session of one of the target children and is named with the child's alias and their age at that session in the form YMMDD. Transcription rules and other details can be found on the homepage coczefla.ff.cuni.cz.
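The naming scheme described above (alias followed by age as YMMDD) can be unpacked mechanically. The following is a minimal sketch, not part of the corpus distribution. It assumes that the alias consists of letters only, that the age is exactly five trailing digits, and that any file extension has already been stripped; the names `SessionId` and `parse_session_name` are hypothetical.

```python
import re
from typing import NamedTuple


class SessionId(NamedTuple):
    alias: str   # child's alias, e.g. "Julie"
    years: int   # Y
    months: int  # MM
    days: int    # DD


# Assumption: alias = leading letters, age = exactly five digits (Y, MM, DD).
_NAME_RE = re.compile(r"^(?P<alias>[^\W\d_]+)(?P<y>\d)(?P<mm>\d{2})(?P<dd>\d{2})$")


def parse_session_name(stem: str) -> SessionId:
    """Split a file stem such as 'Julie20221' into alias and age parts."""
    m = _NAME_RE.match(stem)
    if m is None:
        raise ValueError(f"not in alias+YMMDD form: {stem!r}")
    return SessionId(m["alias"], int(m["y"]), int(m["mm"]), int(m["dd"]))
```

Under these assumptions, the transcript name Julie20221 mentioned above reads as the child Julie at 2 years, 2 months and 21 days.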
This is the Czech Named Entity Corpus 1.0 transformed into the CoNLL format. The original corpus can be downloaded from: http://hdl.handle.net/11858/00-097C-0000-0023-1B04-C. The CoNLL transformation is described in this publication: https://link.springer.com/chapter/10.1007/978-3-642-40585-3_20.
This is the Czech Named Entity Corpus 2.0 transformed into the CoNLL format. The original corpus can be downloaded from: http://hdl.handle.net/11858/00-097C-0000-0023-1B22-8. The CoNLL transformation is described in this publication: https://link.springer.com/chapter/10.1007/978-3-642-40585-3_20.
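For readers unfamiliar with CoNLL-style files, the sketch below shows one common way to read them. It assumes the usual convention of one token per line with whitespace-separated columns, the entity tag in the last column, and a blank line between sentences. The exact column layout and tag set of these two corpora are not stated here, so the sample data and tag names (`B-PER`, `B-LOC`) are purely illustrative, and `read_conll` is a hypothetical helper.

```python
from typing import Iterable, Iterator, List, Tuple

Sentence = List[Tuple[str, str]]  # (token, tag) pairs


def read_conll(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield sentences from CoNLL-style lines (blank line = sentence break)."""
    sentence: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            if sentence:
                yield sentence
                sentence = []
            continue
        columns = line.split()
        # Assumption: first column is the token, last column is the tag.
        sentence.append((columns[0], columns[-1]))
    if sentence:
        yield sentence


# Illustrative sample only; not taken from the corpus.
SAMPLE = """Václav B-PER
Havel I-PER
navštívil O
Brno B-LOC
. O

Přijel O
včera O
. O
"""

sentences = list(read_conll(SAMPLE.splitlines()))
```

With real data, the same function can be given an open file handle in place of `SAMPLE.splitlines()`.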
Web corpus of Czech, created in 2011. It contains newspapers and magazines, discussions, and blogs. See http://www.lrec-conf.org/proceedings/lrec2012/summaries/120.html for details. Grant: GA405/09/0278
The Czech Contracts dataset was created as part of the thesis Low-resource Text Classification (2021), A. Szabó, MFF UK.
Contracts were obtained from the Hlídač Státu web portal. Labels in the development and training sets were assigned automatically using the keyword method described in the thesis Automatická klasifikace smluv pro portál HlidacSmluv.cz, J. Maroušek (2020), MFF UK. Because these automatic labels contain a certain amount of noise, the goal of classification is not to achieve 100% on the development set. The test set is manually annotated. The dataset contains a total of 97493 contracts.
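To illustrate why keyword-derived labels are noisy, here is a minimal sketch of keyword-based labelling in general. It is not the method from the cited thesis, whose details are not given here; the label names, the keyword lists and the function `keyword_label` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical label -> keywords mapping, for illustration only.
KEYWORDS: Dict[str, List[str]] = {
    "lease": ["nájem", "pronájem"],
    "employment": ["pracovní smlouva", "zaměstnanec"],
}


def keyword_label(text: str, keywords: Dict[str, List[str]] = KEYWORDS) -> Optional[str]:
    """Return the label whose keywords occur most often, or None if none occur."""
    lowered = text.lower()
    counts = {
        label: sum(lowered.count(k) for k in kws)
        for label, kws in keywords.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None
```

In this sketch, a text containing only the inflected form "nájmu" matches no keyword and receives no label, while a text that mentions keywords from several classes is assigned to whichever class scores highest. Both effects are typical sources of label noise in a keyword approach.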
The Czech Legal Text Treebank (CLTT) is a collection of 1133 manually annotated dependency trees. CLTT consists of two legal documents: The Accounting Act (563/1991 Coll., as amended) and Decree on Double-entry Accounting for undertakers (500/2002 Coll., as amended).
A lexicographical project, whose aim is to digitize and align two Czech onomasiological dictionaries (Haller 1969–77; Klégr 2007) in order to create an integrated digital multi-purpose lexico-semantic database of Czech.
Czech models for NameTag, providing recognition of named entities.
