Harvested from: LINDAT/CLARIAH-CZ repository / Language: Czech - LINDAT/CLARIAH-CZ Catalog Search Results

301. Khresmoi Summary Translation Test Data 1.1

Creator:: Dušek, Ondřej, Hajič, Jan, Hlaváčová, Jaroslava, Pecina, Pavel, Tamchyna, Aleš, and Urešová, Zdeňka
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: corpus, test data, medical, health, machine translation, Czech, French, German, and English
Language:: English, Czech, French, and German
Description:: This package contains data sets for development and testing of machine translation of sentences from summaries of medical articles between Czech, English, French, and German. This work was supported by the EU FP7 project Khresmoi (European Commission contract No. 257528). The language resources are distributed by the LINDAT/Clarin project of the Ministry of Education, Youth and Sports of the Czech Republic (project no. LM2010013). We thank all the data providers and copyright holders for providing the source data and anonymous experts for translating the sentences.
Rights:: Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0), http://creativecommons.org/licenses/by-nc/3.0/, and PUB

302. Khresmoi Summary Translation Test Data 2.0

Creator:: Dušek, Ondřej, Hajič, Jan, Hlaváčová, Jaroslava, Libovický, Jindřich, Pecina, Pavel, Tamchyna, Aleš, and Urešová, Zdeňka
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: corpus, test data, medical, health, machine translation, Czech, English, French, German, Hungarian, Polish, Spanish, and Swedish
Language:: Czech, English, French, German, Hungarian, Polish, Spanish, and Swedish
Description:: This package contains data sets for development (Section dev) and testing (Section test) of machine translation of sentences from summaries of medical articles between Czech, English, French, German, Hungarian, Polish, Spanish and Swedish. Version 2.0 extends the previous version by adding Hungarian, Polish, Spanish, and Swedish translations.
Rights:: Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB

303. KonText Web Demo

Creator:: Josífko, Michal
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: toolService and tool
Subject:: web service, corpus, parallel corpus, and demo
Language:: Czech and English
Description:: An interactive web demo for querying selected ÚFAL and LINDAT corpora. LINDAT/CLARIN KonText is a fork of ÚČNK KonText (https://github.com/czcorpus/kontext, maintained by Tomáš Machálek) that contains some modifications and additional features. KonText, in turn, is a fork of the Bonito 2.68 Python web interface to the corpus management tool Manatee (http://nlp.fi.muni.cz/trac/noske, created by Pavel Rychlý).
Rights:: GNU General Public License, version 2, http://www.gnu.org/licenses/gpl-2.0.html, and PUB

304. Korektor

Creator:: Richter, Michal
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: toolService and tool
Subject:: grammar checker and spellchecker
Language:: Czech
Description:: Statistical spellchecker and (occasional) grammar checker. There are three versions: a Unix command-line utility; an OS X SpellServer with a System Service that integrates with native OS X GUI applications; and a web service run by LINDAT-CLARIN that can be used either through a web form in a browser or by web applications through an API. This work was supported by the LINDAT-CLARIN project (LM2010013), which is fully supported by the Ministry of Education, Sports and Youth of the Czech Republic under the programme LM of "Large Infrastructures".
Rights:: BSD 2-Clause "Simplified" or "FreeBSD" license, http://opensource.org/licenses/BSD-2-Clause, and PUB

305. KUK 0.0

Creator:: Hladká, Barbora, Cinková, Silvie, Kuk, Michal, Mírovský, Jiří, Novotná, Tereza, and Zahálková, Kristýna Nguyen
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: legal texts and court decisions
Language:: Czech
Description:: KUK 0.0 is a pilot version of a corpus of Czech legal and administrative texts designated as data for manual and automatic assessment of accessibility (comprehensibility or clarity) of Czech legal texts.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

306. L2 Acquisition Barbara Schmiedtova

Publisher:: Max Planck Institute for Psycholinguistics
Type:: corpus
Language:: Czech, English, German, and Vietnamese
Description:: Language Acquisition corpus
Rights:: Not specified

307. Ladislav Boháč (actor)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: film Karel Hynek Mácha ukázka, Galerie osobností, People::Boháč Ladislav (1907-1978), People::Beneš Svatopluk (1918-2007), and Karel Hynek Mácha
Language:: Czech
Description:: Actor Ladislav Boháč with his colleague Svatopluk Beneš starring in Karel Hynek Mácha (dir. Zet Molas, 1937).
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

308. Languages in Migration

Creator:: Bučková, Aneta, Nekula, Marek, Lukeš, David, Woźniak, Michał, Wastl, Michael, and Polowy, Louisa
Publisher:: Faculty of Arts, Institute of the Czech National Corpus, Charles University in Prague and Universität Regensburg
Type:: text and corpus
Subject:: spoken language, bilingual, syntactic annotation, migrant language, narrative interviews, and language biography
Language:: German and Czech
Description:: LANGUAGES IN MIGRATION is designed as a representation of authentic spoken Czech and German as used in informal speech (private environment, spontaneity, unpreparedness etc.) by Czech-German bilingual speakers who were born in Czechoslovakia around 1955 and left for Germany after the age of 12. The corpus is composed of interviews conducted from 2018 to 2020 with 20 speakers on their language biographies, narrated in Czech and German respectively. 10 interviews were recorded with late (German) repatriates and 10 with Czech migrants. The corpus includes transcripts of ca. 14 hours of Czech recordings and ca. 13.5 hours of German recordings. It contains 217 650 orthographic words (i.e. a total of 286 533 tokens including punctuation). Metadata of LANGUAGES IN MIGRATION include basic sociolinguistically relevant speaker categories (gender, year of birth and of migration, level of education, and region of childhood and present residence). The transcription of LANGUAGES IN MIGRATION is linked to the corresponding audio track. The transcription was carried out on the orthographic tier and supplemented by an additional metalanguage tier. The corpus LANGUAGES IN MIGRATION is lemmatized and morphologically tagged in different formats for Czech and German (Stuttgart-Tübingen-Tagset). Deviations from the norm of the spoken Czech and German of the homeland, which are understood as the result of language contact and language isolation, are tagged in a further tier in both the Czech and the German subcorpora of LANGUAGES IN MIGRATION. The (anonymized) corpus is provided in the form of transcripts in EAF format, which can be viewed via the freely available ELAN program, and a (semi-XML) vertical format used as an input to the Manatee query engine. The data thus correspond to the corpus available via the KonText query engine to registered users of the CNC at http://www.korpus.cz.
Rights:: Czech National Corpus (Shuffled Corpus Data), https://lindat.mff.cuni.cz/repository/xmlui/page/license-cnc, and ACA

309. Large Corpus of Czech Parliament Plenary Hearings

Creator:: Kratochvíl, Jonáš, Polák, Peter, and Bojar, Ondřej
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: audio and corpus
Subject:: ASR and Czech
Language:: Czech
Description:: We present a large corpus of Czech parliament plenary sessions. The corpus consists of approximately 444 hours of speech data and corresponding text transcriptions. The whole corpus has been segmented into short audio snippets, making it suitable for both training and evaluation of automatic speech recognition (ASR) systems. The source language of the corpus is Czech, which makes it a valuable resource for future research, as only a few public datasets are available for the Czech language.
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

310. Large-Scale Colloquial Persian 0.5

Creator:: Abdi Khojasteh, Hadi, Ansari, Ebrahim, and Bohlouli, Mahdi
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) and Institute for Advanced Studies in Basic Sciences (IASBS)
Type:: text and corpus
Subject:: PoS tagging, corpus, annotated corpus, multilingual, derivation, dependency parser, machine translation, informal language, spoken language, monolingual corpus, and bilingual corpus annotation
Language:: Persian, English, German, Czech, Italian, and Hindi
Description:: "Large Scale Colloquial Persian Dataset" (LSCP) is hierarchically organized in a semantic taxonomy that focuses on multi-task informal Persian language understanding as a comprehensive problem. LSCP includes 120M sentences from 27M casual Persian tweets with their dependency relations in syntactic annotation, part-of-speech tags, sentiment polarity, and automatic translation of the original Persian sentences into five different languages (EN, CS, DE, IT, HI).
Rights:: Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB
