Search Results
42. EVALD 4.0 – Evaluator of Discourse
- Creator:
- Novák, Michal, Mírovský, Jiří, Rysová, Kateřina, Rysová, Magdaléna, and Hajičová, Eva
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- text coherence, discourse, automatic evaluation, and non-native speakers
- Language:
- Czech
- Description:
- EVALD 4.0 is a software tool for automatic evaluation of surface coherence (cohesion) in Czech texts written by native speakers of Czech.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
43. EVALD 4.0 for Beginners – Evaluator of Discourse
- Creator:
- Novák, Michal, Mírovský, Jiří, Rysová, Kateřina, Rysová, Magdaléna, and Hajičová, Eva
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- text coherence, discourse, automatic evaluation, and non-native speakers
- Language:
- Czech
- Description:
- EVALD 4.0 for Beginners is software for automatic evaluation of Czech texts written by non-native speakers of Czech who are language beginners.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
44. EVALD 4.0 for Foreigners – Evaluator of Discourse
- Creator:
- Novák, Michal, Mírovský, Jiří, Rysová, Kateřina, Rysová, Magdaléna, and Hajičová, Eva
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- text coherence, discourse, automatic evaluation, and non-native speakers
- Language:
- Czech
- Description:
- EVALD 4.0 for Foreigners is software for automatic evaluation of surface coherence (cohesion) in Czech texts written by non-native speakers of Czech.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
45. FAUST cs-en 0.5
- Creator:
- Hajič, Jan, Mareček, David, Fučíková, Eva, Cinková, Silvie, Štěpánek, Jan, Mikulová, Marie, and Popel, Martin
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- noisy texts, parallel corpus, and machine translation
- Language:
- English and Czech
- Description:
- This machine translation test set contains 2223 Czech sentences collected within the FAUST project (https://ufal.mff.cuni.cz/grants/faust, http://hdl.handle.net/11234/1-3308). Each original (noisy) sentence was normalized (clean1 and clean2) and translated into English independently by two translators.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
46. FERNET-C5
- Creator:
- Lehečka, Jan and Švec, Jan
- Publisher:
- University of West Bohemia, Department of Cybernetics
- Type:
- text, mlmodel, and languageDescription
- Subject:
- Czech and BERT
- Language:
- Czech
- Description:
- FERNET-C5 is a monolingual BERT language representation model trained from scratch on the Czech Colossal Clean Crawled Corpus (C5), a Czech counterpart of the English C4 dataset. The training data contained almost 13 billion words (93 GB of text). The model has the same architecture as the original BERT model, i.e. 12 transformer blocks, 12 attention heads and a hidden size of 768. In contrast to Google’s BERT models, we used SentencePiece tokenization instead of Google’s internal WordPiece tokenization. More details can be found in README.txt, and a more detailed description is available at https://arxiv.org/abs/2107.10042. The same models are also released at https://huggingface.co/fav-kky/FERNET-C5
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
47. FicTree 1.0
- Creator:
- Jelínek, Tomáš, Hnátková, Milena, and Skoumalová, Hana
- Publisher:
- Charles University, Faculty of Arts, Institute of Theoretical and Computational Linguistics
- Type:
- text and corpus
- Subject:
- treebank
- Language:
- Czech
- Description:
- FicTree is a dependency treebank of Czech fiction, manually annotated in the format of the analytical layer of the Prague Dependency Treebank (PDT). The treebank consists of 12,760 sentences (166,432 tokens). The texts come from eight literary works published in the Czech Republic between 1991 and 2007. The syntactic annotation was first performed by two distinct parsers (MSTParser and MaltParser) trained on the PDT training data and then manually corrected. Any differences between the two versions were resolved manually by another annotator. The corpus is provided in a vertical format, where sentence boundaries are marked with a blank line. Every word form is written on a separate line, followed by five tab-separated attributes: lemma, tag, ID (word index in the sentence), head and deprel (analytical function, afun in the PDT formalism). The texts are shuffled in random chunks of at most 100 words (respecting sentence boundaries). Each chunk is provided as a separate file, with the suggested division into train, dev and test sets indicated by the file prefix.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
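The FicTree vertical format described above can be read with a small parser. This is a minimal sketch, not an official reader: it assumes each token line has exactly six tab-separated columns (form, lemma, tag, ID, head, deprel), that ID and head are integers (with 0 assumed for the root), and that a blank line ends a sentence. The `Token` class and `read_vertical` function are hypothetical names introduced for illustration, and the sample input is invented, with simplified placeholder tags instead of full PDT positional tags.

```python
import io
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Token:
    form: str
    lemma: str
    tag: str
    idx: int     # word index within the sentence (the ID column)
    head: int    # index of the governing word; 0 is assumed to mean the root
    deprel: str  # analytical function (afun in the PDT formalism)


def read_vertical(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences as lists of Tokens from vertical-format text.

    Assumes six tab-separated columns per token line and a blank line
    between sentences (the layout described for FicTree).
    """
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line marks a sentence boundary.
            if sentence:
                yield sentence
                sentence = []
            continue
        form, lemma, tag, idx, head, deprel = line.split("\t")
        sentence.append(Token(form, lemma, tag, int(idx), int(head), deprel))
    if sentence:
        # The last sentence may not be followed by a blank line.
        yield sentence


if __name__ == "__main__":
    # Invented one-sentence sample; tags are simplified placeholders.
    sample = (
        "Pes\tpes\tN\t1\t2\tSb\n"
        "štěká\tštěkat\tV\t2\t0\tPred\n"
        ".\t.\tZ\t3\t0\tAuxK\n"
        "\n"
    )
    for sent in read_vertical(io.StringIO(sample)):
        print([(t.form, t.head, t.deprel) for t in sent])
```

Because `read_vertical` accepts any iterable of lines, it can be applied directly to an open chunk file; the train/dev/test split would then be chosen from each file's prefix, whose exact naming should be checked in the distribution itself.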
48. ForFun 1.0
- Creator:
- Mikulová, Marie and Bejček, Eduard
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- service and toolService
- Subject:
- form, function, database, and syntax
- Language:
- Czech
- Description:
- ForFun is a database of linguistic forms and their syntactic functions, built using the multi-layer annotated corpora of Czech, the Prague Dependency Treebanks. The purpose of the Prague Database of Forms and Functions (ForFun) is to help linguists study the form-function relation, which we assume to be one of the principal tasks of both theoretical linguistics and natural language processing. Prototypical questions are "What purposes does the preposition 'po' serve?" and "What linguistic means in a sentence can express the meaning 'destination of an action'?". The database contains almost 1,500 distinct forms (in addition to the preposition 'po') and 65 distinct functions (in addition to 'destination').
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
49. HaCzech: Dataset of Handwritten Czech
- Creator:
- Procházka, Štěpán and Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- image and corpus
- Subject:
- htr, ocr, manuscripts, chronicles, and handwriting
- Language:
- Czech
- Description:
- A dataset of handwritten Czech text lines sourced from two chronicles (a municipal chronicle, 1931-1944, and a school chronicle, 1913-1933). The dataset comprises 25k lines machine-extracted from scanned pages and provides manual annotation of the text content for a subset of 2k lines.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
50. High-Coverage Multi-Level Text Corpus for Non-Professional Voice Conservation
- Creator:
- Jůzová, Markéta, Tihelka, Daniel, and Matoušek, Jindřich
- Publisher:
- University of West Bohemia, Department of Cybernetics
- Type:
- text and corpus
- Subject:
- text-to-speech (TTS), voice conservation, voice banking, and text corpus
- Language:
- Czech
- Description:
- This text corpus contains a carefully optimized set of sentences that can be used to prepare a speech corpus for the development of a personalized text-to-speech system. It was designed primarily for the voice conservation procedure, which must be performed in a relatively short period before a person loses his/her own voice, typically because of total laryngectomy. Total laryngectomy is a radical treatment procedure that is often unavoidable to save the lives of patients diagnosed with severe laryngeal cancer. Although it is very effective as a primary treatment, it significantly handicaps patients through the permanent loss of their ability to use their voice and produce speech. Fortunately, modern methods of computer text-to-speech (TTS) synthesis make it possible to "digitally conserve" a patient's original voice for his/her future speech communication, a procedure called voice banking or voice conservation. Moreover, the banking procedure can be undertaken by anyone facing voice degradation or loss in the more distant future, or by anyone who simply wishes to keep his/her voice-print.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB