Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) / Rights: PUB

251. Public License Selector

Creator:: Sedlák, Michal, Straňák, Pavel, and Kamocki, Pawel
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: Legal and Licensing
Language:: English
Description:: Customizable tool that helps users select the right open license for their data or software
Rights:: The MIT License (MIT), http://opensource.org/licenses/mit-license.php, and PUB

252. QTLeap WSD/NED corpus

Creator:: Agirre, Eneko, Branco, António, Popel, Martin, and Simov, Kiril
Publisher:: University of the Basque Country, UPV/EHU, Faculty of Science, University of Lisbon, FCUL, Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), and Bulgarian Academy of Sciences, IICT-BAS
Type:: text and corpus
Subject:: annotated corpus and multilingual
Language:: Basque, Bulgarian, Czech, English, Portuguese, and Spanish
Description:: This corpus is part of Deliverable 5.5 of the European Commission project QTLeap FP7-ICT-2013.4.1-610516 (http://qtleap.eu). The texts are Q&A interactions from the real-user scenario (batches 1 and 2). The interactions in this corpus are available in Basque, Bulgarian, Czech, English, Portuguese and Spanish. The texts have been automatically annotated with NLP tools, including Word Sense Disambiguation, Named Entity Disambiguation and Coreference Resolution. See deliverable D5.6 at http://qtleap.eu/deliverables for more information.
Rights:: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), http://creativecommons.org/licenses/by-nc-sa/3.0/, and PUB

253. Quality and Efficiency of Manual Annotation: Data from the Pre-annotation Bias Experiment (part of the PDT-C 2.0 project)

Creator:: Mikulová, Marie, Straka, Milan, Štěpánek, Jan, Štěpánková, Barbora, and Hajič, Jan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: annotation, syntax, inter-annotator agreement, pre-annotation bias, annotation efficiency, and annotation quality
Language:: Czech
Description:: Input data, individual experimental annotations, and a complete and detailed overview of the measured results related to the experiment described in the referenced paper.
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

254. Question Dialogs Dataset

Creator:: Vodolán, Miroslav and Jurčíček, Filip
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, other, and lexicalConceptualResource
Subject:: question dialogs and interactive learning
Language:: English
Description:: Dataset collected from natural dialogs that makes it possible to test the ability of dialog systems to interactively learn new facts from user utterances throughout a dialog. The dataset, consisting of 1,900 dialogs, allows simulating the interactive acquisition of denotations and question explanations from users, which can then be used for interactive learning.
Rights:: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB

255. RobeCzech Base

Creator:: Straka, Milan, Náplava, Jakub, Straková, Jana, and Samuel, David
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, mlmodel, and languageDescription
Subject:: Czech, BERT, and RoBERTa
Language:: Czech
Description:: RobeCzech is a monolingual RoBERTa language representation model trained on Czech data. RoBERTa is a robustly optimized Transformer-based pretraining approach. We show that RobeCzech considerably outperforms equally-sized multilingual and Czech-trained contextualized language representation models, surpasses the current state of the art in all five evaluated NLP tasks, and reaches state-of-the-art results in four of them. The RobeCzech model is released publicly at https://hdl.handle.net/11234/1-3691 and https://huggingface.co/ufal/robeczech-base, for both PyTorch and TensorFlow.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

256. Self-paced reading experiments on explicit and implicit contrastive and temporal discourse relations in Czech

Creator:: Zikánová, Šárka and Smolík, Filip
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, other, and languageDescription
Subject:: discourse, psycholinguistic experiments, explicit discourse relations, implicit discourse relations, and self-paced reading
Language:: Czech
Description:: Supplementary materials for the paper “Processing of explicit and implicit contrastive and temporal discourse relations in Czech” (submitted to Discourse Processes)
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

257. Semantic annotation of noun/verb conversion in Czech

Creator:: Ševčíková, Magda, Kyjánek, Lukáš, Hledíková, Hana, and Staňková, Anna
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: other, text, and lexicalConceptualResource
Subject:: conversion, semantic, noun, verb, word formation, and Czech
Language:: Czech
Description:: The item contains a list of 2,058 noun/verb conversion pairs along with related formations (word-formation paradigms) provided with linguistic features, including semantic categories that characterize semantic relations between the noun and the verb in each conversion pair. Semantic categories were assigned manually by two human annotators based on a set of sentences containing the noun and the verb from individual conversion pairs. In addition to the list of paradigms, the item contains a set of 739 files (a separate file for each conversion pair) annotated by the annotators in parallel and a set of 2,058 files containing the final annotation, which is included in the list of paradigms.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

258. Semantically annotated sample of Czech and English conversion pairs of verbs and nouns

Creator:: Hledíková, Hana
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, wordList, and lexicalConceptualResource
Subject:: word-formation, morphology, conversion, semantics, and cognitive
Language:: English and Czech
Description:: Supplementary files for a comparative study of word-formation without the addition of derivational affixes (conversion) in English and Czech. The two .csv files contain 300 verb-noun conversion pairs in English and 300 verb-noun conversion pairs in Czech, i.e. pairs where either the noun is created from the verb or the verb is created from the noun without the use of derivational affixes. In English, the noun and verb in the conversion pair have the same form. In Czech, the noun and verb in the conversion pair differ in inflectional affixes. The pairs are supplied with manual semantic annotation based on cognitive event schemata. A file with the Appendix includes a list of dictionary definition phrases used as a basis for the semantic annotation.
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

259. Sentiment Analysis (Czech Model)

Creator:: Vysušilová, Petra and Straka, Milan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, mlmodel, and languageDescription
Subject:: sentiment analysis and BERT
Language:: Czech
Description:: Sentiment analysis models for the Czech language. The models are trained on three Czech sentiment analysis datasets (http://liks.fav.zcu.cz/sentiment/), Mall, CSFD, and Facebook, as well as on the joint data from all three, using RobeCzech, a Czech version of the BERT model. We present the best model for each dataset. The Mall and CSFD models set a new state of the art on their respective data. A demo Jupyter notebook is available on the project GitHub. These models are part of the master's thesis Czech NLP with Contextualized Embeddings.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

260. SiR 1.0

Creator:: Hladká, Barbora, Mírovský, Jiří, Kopp, Matyáš, and Moravec, Václav
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: news server articles, attribution, attribution signals, attribution sources, and annotation
Language:: Czech
Description:: SiR 1.0 is a corpus of Czech articles published on iRozhlas (https://www.irozhlas.cz/), the news server of Czech public radio. It is a collection of 1 718 articles (42 890 sentences, 614 995 words) with manually annotated attribution of citation phrases and sources. The sources are classified into several classes of named and unnamed sources. The corpus consists of three parts, depending on the quality of the annotations: (i) triple-annotated articles: 46 articles (933 sentences, 13 242 words) annotated independently by three annotators and subsequently curated by an arbiter, (ii) double-annotated articles: 543 articles (12 347 sentences, 180 622 words) annotated independently by two annotators and automatically unified, and (iii) single-annotated articles: 1 129 articles (29 610 sentences, 421 131 words), each annotated by a single annotator only. The data were annotated in the Brat tool (https://brat.nlplab.org/) and are distributed in the Brat native format, i.e. each article is represented by its original plain text and a stand-off annotation file. Please cite the following paper when using the corpus for your research: Hladká, Barbora, Jiří Mírovský, Matyáš Kopp, and Václav Moravec. Annotating Attribution in Czech News Server Articles. In: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), pages 1817–1823, Marseille, France, 20–25 June 2022.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
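Because SiR 1.0 is distributed in Brat's stand-off format (a plain-text file plus an `.ann` file per article), the following Python sketch shows one possible way to read an `.ann` file and check its character offsets against the matching plain text. It is a minimal sketch that handles only the generic Brat line types (text-bound `T`, relation `R`, attribute `A`/`M`). The helper names `parse_ann` and `offset_mismatches`, and the labels `Source`, `Signal`, `Attribution` and `Named` in the demo, are hypothetical and are not taken from the SiR 1.0 annotation scheme.

```python
"""Minimal reader for Brat stand-off (.ann) files.

Assumption: only generic Brat line types are handled; the concrete entity,
relation and attribute labels used in SiR 1.0 are NOT assumed here.
"""
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class TextBound:
    ann_id: str
    label: str
    spans: List[Tuple[int, int]]  # character offsets into the plain-text file
    text: str                     # covered text as stored in the .ann file


@dataclass
class Relation:
    ann_id: str
    label: str
    args: Dict[str, str]  # role (e.g. "Arg1") -> text-bound annotation ID


@dataclass
class AnnDocument:
    entities: Dict[str, TextBound] = field(default_factory=dict)
    relations: Dict[str, Relation] = field(default_factory=dict)
    # (attribute ID, attribute name, target ID, value or True for binary flags)
    attributes: List[Tuple[str, str, str, Union[str, bool]]] = field(default_factory=list)


def parse_ann(ann_text: str) -> AnnDocument:
    """Parse the contents of one .ann file into entities, relations and attributes."""
    doc = AnnDocument()
    for line in ann_text.splitlines():
        if not line.strip():
            continue
        ann_id, rest = line.split("\t", 1)
        kind = ann_id[0]
        if kind == "T":
            # T1<TAB>Label start end[;start end ...]<TAB>covered text
            header, covered = rest.split("\t", 1)
            label, offsets = header.split(" ", 1)
            spans = [tuple(int(x) for x in frag.split()) for frag in offsets.split(";")]
            doc.entities[ann_id] = TextBound(ann_id, label, spans, covered)
        elif kind == "R":
            # R1<TAB>Label Arg1:T1 Arg2:T2
            label, *args = rest.split()
            doc.relations[ann_id] = Relation(ann_id, label, dict(a.split(":", 1) for a in args))
        elif kind in ("A", "M"):
            # A1<TAB>Name T1 [Value]; a missing value marks a binary attribute
            parts = rest.split()
            value = parts[2] if len(parts) > 2 else True
            doc.attributes.append((ann_id, parts[0], parts[1], value))
        # Events (E), normalizations (N) and notes (#) are ignored in this sketch.
    return doc


def offset_mismatches(doc: AnnDocument, plain_text: str) -> List[str]:
    """Return IDs of text-bound annotations whose offsets do not match the text."""
    bad = []
    for ent in doc.entities.values():
        # Discontinuous fragments are compared joined by a single space,
        # which is how Brat conventionally stores them in the .ann file.
        covered = " ".join(plain_text[start:end] for start, end in ent.spans)
        if covered != ent.text:
            bad.append(ent.ann_id)
    return bad


if __name__ == "__main__":
    # Hypothetical example; the labels below are made up for illustration.
    text = "Ministr řekl, že rozpočet je vyrovnaný."
    ann = "T1\tSource 0 7\tMinistr\nT2\tSignal 8 12\třekl\nR1\tAttribution Arg1:T2 Arg2:T1\n"
    doc = parse_ann(ann)
    print(doc.relations["R1"].args, offset_mismatches(doc, text))
```

Offsets are treated as Python string indices, so a Czech character such as `ř` counts as one position; reading both files as UTF-8 text (not bytes) keeps this consistent, assuming the corpus follows Brat's usual character-offset convention.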
