Harvested from: LINDAT/CLARIAH-CZ repository and LINDAT/CLARIAH-CZ repository / Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)

Start Over Publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) Harvested from LINDAT/CLARIAH-CZ repository Harvested from LINDAT/CLARIAH-CZ repository

271. Self-paced reading experiments on explicit and implicit contrastive and temporal discourse relations in Czech

Creator:: Zikánová, Šárka and Smolík, Filip
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, other, and languageDescription
Subject:: discourse, psycholinguistic experiments, explicit discourse relations, implicit discourse relations, and self-paced reading
Language:: Czech
Description:: Supplementary materials for the paper “Processing of explicit and implicit contrastive and temporal discourse relations in Czech” (submitted to Discourse Processes)
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

272. Semantic annotation of noun/verb conversion in Czech

Creator:: Ševčíková, Magda, Kyjánek, Lukáš, Hledíková, Hana, and Staňková, Anna
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: other, text, and lexicalConceptualResource
Subject:: conversion, semantic, noun, verb, word formation, and Czech
Language:: Czech
Description:: The item contains a list of 2,058 noun/verb conversion pairs along with related formations (word-formation paradigms) provided with linguistic features, including semantic categories that characterize semantic relations between the noun and the verb in each conversion pair. Semantic categories were assigned manually by two human annotators based on a set of sentences containing the noun and the verb from individual conversion pairs. In addition to the list of paradigms, the item contains a set of 739 files (a separate file for each conversion pair) annotated by the annotators in parallel and a set of 2,058 files containing the final annotation, which is included in the list of paradigms.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), PUB, and http://creativecommons.org/licenses/by-nc-sa/4.0/

273. Semantically annotated sample of Czech and English conversion pairs of verbs and nouns

Creator:: Hledíková, Hana
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, wordList, and lexicalConceptualResource
Subject:: word-formation, morphology, conversion, semantics, and cognitive
Language:: English and Czech
Description:: Supplementary files for a comparative study of word-formation without the addition of derivational affixes (conversion) in English and Czech. The two .csv files contain 300 verb-noun conversion pairs in English and 300 verb-noun conversion pairs in Czech, i.e. pairs where either the noun is created from the verb or the verb is created from the noun without the use of derivational affixes. In English, the noun and verb in the conversion pair have the same form. In Czech, the noun and verb in the conversion pair differ in inflectional affixes. The pairs are supplied with manual semantic annotation based on cognitive event schemata. A file with the Appendix includes a list of dictionary definition phrases used as a basis for the semantic annotation.
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

274. Sentiment Analysis (Czech Model)

Creator:: Vysušilová, Petra and Straka, Milan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, mlmodel, and languageDescription
Subject:: sentiment analysis and BERT
Language:: Czech
Description:: Sentiment analysis models for Czech language. Models are three Czech sentiment analysis datasets(http://liks.fav.zcu.cz/sentiment/): Mall, CSFD, Facebook, and joint data from all three datasets above, using Czech version of BERT model, RobeCzech. We present the best model for every dataset. Mall and CSFD models are new state-of-the-art for respective data. Demo jupyter notebook is available on the project GitHub. These models are a part of Czech NLP with Contextualized Embeddings master thesis.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

275. SiR 1.0

Creator:: Hladká, Barbora, Mírovský, Jiří, Kopp, Matyáš, and Moravec, Václav
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: news server articles, attribution, attribution signals, attribution sources, and annotation
Language:: Czech
Description:: SiR 1.0 is a corpus of Czech articles published on iRozhlas, a news server of a Czech public radio (https://www.irozhlas.cz/). It is a collection of 1 718 articles (42 890 sentences, 614 995 words) with manually annotated attribution of citation phrases and sources. The sources are classified into several classes of named and unnamed sources. The corpus consists of three parts, depending on the quality of the annotations: (i) triple-annotated articles: 46 articles (933 sentences, 13 242 words) annotated independently by three annotators and subsequently curated by an arbiter, (ii) double-annotated articles: 543 articles (12 347 sentences, 180 622 words) annotated independently by two annotators and automatically unified, and (iii) single-annotated articles: 1 129 articles (29 610 sentences, 421 131 words) annotated each only by a single annotator. The data were annotated in the Brat tool (https://brat.nlplab.org/) and are distributed in the Brat native format, i.e. each article is represented by the original plain text and a stand-off annotation file. Please cite the following paper when using the corpus for your research: Hladká Barbora, Jiří Mírovský, Matyáš Kopp, Václav Moravec. Annotating Attribution in Czech News Server Articles. In: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), pages 1817–1823, Marseille, France 20-25 June 2022.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

276. Slavic Forest, Norwegian Wood (models)

Creator:: Rosa, Rudolf, Zeman, Daniel, Mareček, David, and Žabokrtský, Zdeněk
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: other and toolService
Subject:: parsing, dependency parser, cross-lingual parsing, and universal dependencies
Language:: Slovak, Croatian, and Norwegian
Description:: Trained models for UDPipe used to produce our final submission to the Vardial 2017 CLP shared task (https://bitbucket.org/hy-crossNLP/vardial2017). The SK model was trained on CS data, the HR model on SL data, and the SV model on a concatenation of DA and NO data. The scripts and commands used to create the models are part of separate submission (http://hdl.handle.net/11234/1-1970). The models were trained with UDPipe version 3e65d69 from 3rd Jan 2017, obtained from https://github.com/ufal/udpipe -- their functionality with newer or older versions of UDPipe is not guaranteed. We list here the Bash command sequences that can be used to reproduce our results submitted to VarDial 2017. The input files must be in CoNLLU format. The models only use the form, UPOS, and Universal Features fields (SK only uses the form). You must have UDPipe installed. The feats2FEAT.py script, which prunes the universal features, is bundled with this submission. SK -- tag and parse with the model: udpipe --tag --parse sk-translex.v2.norm.feats07.w2v.trainonpred.udpipe sk-ud-predPoS-test.conllu A slightly better after-deadline model (sk-translex.v2.norm.Case-feats07.w2v.trainonpred.udpipe), which we mention in the accompanying paper, is also included. It is applied in the same way (udpipe --tag --parse sk-translex.v2.norm.Case-feats07.w2v.trainonpred.udpipe sk-ud-predPoS-test.conllu). HR -- prune the Features to keep only Case and parse with the model: python3 feats2FEAT.py Case < hr-ud-predPoS-test.conllu | udpipe --parse hr-translex.v2.norm.Case.w2v.trainonpred.udpipe NO -- put the UPOS annotation aside, tag Features with the model, merge with the left-aside UPOS annotation, and parse with the model (this hassle is because UDPipe cannot be told to keep UPOS and only change Features): cut -f1-4 no-ud-predPoS-test.conllu > tmp udpipe --tag no-translex.v2.norm.tgttagupos.srctagfeats.Case.w2v.udpipe no-ud-predPoS-test.conllu | cut -f5- | paste tmp - | sed 's/^\t$//' | udpipe --parse no-translex.v2.norm.tgttagupos.srctagfeats.Case.w2v.udpipe
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

277. Slavic Forest, Norwegian Wood (scripts)

Creator:: Rosa, Rudolf, Zeman, Daniel, Mareček, David, and Žabokrtský, Zdeněk
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: suiteOfTools and toolService
Subject:: parsing, dependency parser, universal dependencies, and cross-lingual parsing
Language:: Czech, Slovak, Slovenian, Croatian, Danish, Swedish, and Norwegian
Description:: Tools and scripts used to create the cross-lingual parsing models submitted to VarDial 2017 shared task (https://bitbucket.org/hy-crossNLP/vardial2017), as described in the linked paper. The trained UDPipe models themselves are published in a separate submission (https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-1971). For each source (SS, e.g. sl) and target (TT, e.g. hr) language, you need to add the following into this directory: - treebanks (Universal Dependencies v1.4): SS-ud-train.conllu TT-ud-predPoS-dev.conllu - parallel data (OpenSubtitles from Opus): OpenSubtitles2016.SS-TT.SS OpenSubtitles2016.SS-TT.TT !!! If they are originally called ...TT-SS... instead of ...SS-TT..., you need to symlink them (or move, or copy) !!! - target tagging model TT.tagger.udpipe All of these can be obtained from https://bitbucket.org/hy-crossNLP/vardial2017 You also need to have: - Bash - Perl 5 - Python 3 - word2vec (https://code.google.com/archive/p/word2vec/); we used rev 41 from 15th Sep 2014 - udpipe (https://github.com/ufal/udpipe); we used commit 3e65d69 from 3rd Jan 2017 - Treex (https://github.com/ufal/treex); we used commit d27ee8a from 21st Dec 2016 The most basic setup is the sl-hr one (train_sl-hr.sh): - normalization of deprels - 1:1 word-alignment of parallel data with Monolingual Greedy Aligner - simple word-by-word translation of source treebank - pre-training of target word embeddings - simplification of morpho feats (use only Case) - and finally, training and evaluating the parser Both da+sv-no (train_ds-no.sh) and cs-sk (train_cs-sk.sh) add some cross-tagging, which seems to be useful only in specific cases (see paper for details). Moreover, cs-sk also adds more morpho features, selecting those that seem to be very often shared in parallel data. The whole pipeline takes tens of hours to run, and uses several GB of RAM, so make sure to use a powerful computer.
Rights:: GNU General Public License 2 or later (GPL-2.0), http://opensource.org/licenses/GPL-2.0, and PUB

278. Slovak MorphoDiTa Models 170914

Creator:: Straka, Milan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, mlmodel, and languageDescription
Subject:: MorphoDiTa, Slovak, morphological analysis, morphological generation, and PoS tagging
Language:: Slovak
Description:: Slovak models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from MorfFlex SK 170914 and the PoS tagger is trained on automatically translated Prague Dependency Treebank 3.0 (PDT).
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

279. Spoken corpus of Karel Makoň

Creator:: Krůza, Oldřich
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: audio and corpus
Subject:: single speaker, spoken corpus, christianity, and mystic
Language:: Czech
Description:: Talks of Karel Makoň given to his friends in the course of late sixties through early nineties of the 20th century. The topic is mostly christian mysticism.
Rights:: Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0), http://creativecommons.org/licenses/by-sa/3.0/, and PUB

280. Spoken corpus of Karel Makoň (2020-11-16)

Creator:: Krůza, Jan Oldřich
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: audio and corpus
Subject:: single speaker, christianity, mystic, and speech corpus
Language:: Czech
Description:: Talks of Karel Makoň given to his friends in the course of late sixties through early nineties of the 20th century. The topic is mostly christian mysticism.
Rights:: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB

« Previous
Next »
1
2
…
24
25
26
27
28
29
30
31
32
…
36
37

271. Self-paced reading experiments on explicit and implicit contrastive and temporal discourse relations in Czech

272. Semantic annotation of noun/verb conversion in Czech

273. Semantically annotated sample of Czech and English conversion pairs of verbs and nouns

274. Sentiment Analysis (Czech Model)

275. SiR 1.0

276. Slavic Forest, Norwegian Wood (models)

277. Slavic Forest, Norwegian Wood (scripts)

278. Slovak MorphoDiTa Models 170914

279. Spoken corpus of Karel Makoň

280. Spoken corpus of Karel Makoň (2020-11-16)

Limit your search

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Search

Search Constraints

Search Results

Limit your search

Contributor

Show values starting with

Creator

Show values starting with

Language

Show values starting with

Publisher

Show values starting with

Rights

Show values starting with

Subject

Show values starting with

Type

Show values starting with

Date

Original context has metadata only

Harvested from