Creator: Hajič, Jan / Rights: PUB - LINDAT/CLARIAH-CZ Catalog Search Results

31. MorfFlex SK 170914

Creator:: Hajič, Jan and Hric, Jan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, computationalLexicon, and lexicalConceptualResource
Subject:: Slovak and morphological dictionary
Language:: Slovak
Description:: Slovak morphological dictionary modeled after the Czech one. It consists of (word form, lemma, POS tag) triples, reusing the Czech morphological system for POS tags and lemma descriptions.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

32. Morpho-syntactically annotated corpora provided for the PARSEME Shared Task on Semi-Supervised Identification of Verbal Multiword Expressions (edition 1.2)

Creator:: Guillaume, Bruno, Ramisch, Carlos, Waszczuk, Jakub, Monti, Johanna, Di Buono, Maria Pia, Sangati, Federico, Speranza, Giulia, Carlino, Carola, Güngör, Tunga, Yirmibeşoğlu, Zeynep, Sak, Haşim, Saraçlar, Murat, Giouli, Voula, Foufi, Vassiliki, Ramisch, Renata, Rademaker, Alexandre, Vale, Oto, Wilkens, Rodrigo, Candito, Marie, Crabbé, Benoît, Segonne, Vincent, Liebeskind, Chaya, Stymne, Sara, Hajič, Jan, Ginter, Filip, Luotolahti, Juhani, Straka, Milan, Zeman, Daniel, Barbu Mititelu, Verginica, Cristescu, Mihaela, Vaidya, Ashwini, Bhatia, Archna, Lichte, Timm, Ehren, Rafael, Jiang, Menghan, Xu, Hongzhi, Walsh, Abigail, Irimia, Elena, and Dowling, Meghan
Publisher:: PARSEME
Type:: text and corpus
Subject:: morphosyntactic annotation, dependency trees, and morphological analysis
Language:: German, Modern Greek (1453-), Basque, French, Irish, Hebrew, Hindi, Italian, Polish, Portuguese, Romanian, Swedish, Turkish, and Chinese
Description:: This multilingual resource contains corpora for 14 languages, gathered on the occasion of the 1.2 edition of the PARSEME Shared Task on Semi-Supervised Identification of Verbal MWEs (2020). These corpora were meant to serve as additional "raw" corpora, to help discover unseen verbal MWEs. The corpora are provided in CoNLL-U (https://universaldependencies.org/format.html) format. They contain morphosyntactic annotations (parts of speech, lemmas, morphological features, and syntactic dependencies). Depending on the language, the information comes from treebanks (mostly Universal Dependencies v2.x) or from automatic parsers trained on UD v2.x treebanks (e.g., UDPipe). VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do). For the 1.2 shared task edition, the data covers 14 languages, for which VMWEs were annotated according to the universal guidelines. The corpora are provided in the cupt format, inspired by the CoNLL-U format. Morphological and syntactic information (not necessarily using UD tagsets), including parts of speech, lemmas, morphological features and/or syntactic dependencies, is also provided. Depending on the language, the information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). This item contains training, development and test data, as well as the evaluation tools used in the PARSEME Shared Task 1.2 (2020). The annotation guidelines are available online: http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.2
Rights:: PARSEME Shared Task Raw Corpus Data (v. 1.2) Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-mwe-1.2-raw, and PUB
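
The CoNLL-U format mentioned in the description stores one token per line in ten tab-separated columns, with comment lines starting with `#` and a blank line between sentences. A minimal reader might look like the sketch below. The function name `read_conllu` and the sample sentence are illustrative assumptions; the sample is not taken from the corpora, and the cupt variant with its additional MWE column is not handled here.

```python
# The ten CoNLL-U columns, in file order.
CONLLU_FIELDS = (
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
)


def read_conllu(lines):
    """Yield sentences as lists of token dicts from CoNLL-U lines."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comments (e.g. "# text = ...") are skipped here.
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        sentence.append(dict(zip(CONLLU_FIELDS, cols)))
    if sentence:
        # The last sentence may not be followed by a blank line.
        yield sentence


# Made-up sample sentence for illustration only.
SAMPLE = """# text = They gave up.
1\tThey\tthey\tPRON\t_\t_\t2\tnsubj\t_\t_
2\tgave\tgive\tVERB\t_\t_\t0\troot\t_\t_
3\tup\tup\tADP\t_\t_\t2\tcompound:prt\t_\t_
4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_
""".splitlines()

sentences = list(read_conllu(SAMPLE))
print([token["lemma"] for token in sentences[0]])
```

All column values are kept as strings, since token IDs in CoNLL-U can also be ranges or decimal numbers.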

33. Multiword expressions in the Prague Dependency Treebank 2.0

Creator:: Bejček, Eduard, Klyueva, Natalia, Straňák, Pavel, Šidák, Pavel, Šťastná, Eva, Vimmrová, Pavlína, and Hajič, Jan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: MWE, multiword expressions, idiom, phraseme, and named entity
Language:: Czech
Description:: This dataset adds annotation of multiword expressions and multiword named entities to the original PDT 2.0 data. The annotation is stand-off, stored in the same PML format as the original PDT 2.0 data. It is to be used together with PDT 2.0. Supported by grant 1ET201120505 of the Academy of Sciences of the Czech Republic and grant MSM0021620838 of the Ministry of Youth, Education and Sport of the Czech Republic.
Rights:: Creative Commons - Attribution 3.0 Unported (CC BY 3.0), http://creativecommons.org/licenses/by/3.0/, and PUB

34. Open SDP

Creator:: Flickinger, Dan, Hajič, Jan, Ivanova, Angelina, Kuhlmann, Marco, Miyao, Yusuke, Oepen, Stephan, and Zeman, Daniel
Publisher:: Oslo University and Charles University
Type:: text and corpus
Subject:: semantic dependency and parsing
Language:: English and Czech
Description:: The original SDP 2014 and 2015 data collections were made available under task-specific ‘evaluation’ licenses to registered SemEval participants. In mid-2016, all original data was bundled with system submissions, supporting software, an additional SDP-style collection of semantic dependency graphs, and additional background material (from which some of the SDP target representations were derived) for release through the Linguistic Data Consortium (with LDC catalogue number LDC2016T10). One of the four English target representations (viz. DM) and the entire Czech data (in the PSD target representation) are not derivative of LDC-licensed annotations and, thus, can be made available for direct download (Open SDP; version 1.1; April 2016) under a more permissive licensing scheme, viz. the Creative Commons Attribution-NonCommercial-ShareAlike license. This package also includes some ‘richer’ meaning representations from which the English bi-lexical DM graphs derive, viz. scope-underspecified logical forms and more abstract, non-lexicalized ‘semantic networks’. The latter of these are formally (if not linguistically) similar to Abstract Meaning Representation (AMR) and are available in a range of serializations, including in AMR-like syntax. Please use the following bibliographic reference for the SDP 2016 data:
@string{C:LREC = {{I}nternational {C}onference on {L}anguage {R}esources and {E}valuation}}
@string{LREC:16 = {Proceedings of the 10th } # C:LREC}
@string{L:LREC:16 = {Portoro\v{z}, Slovenia}}
@inproceedings{Oep:Kuh:Miy:16,
  author = {Oepen, Stephan and Kuhlmann, Marco and Miyao, Yusuke and Zeman, Daniel and Cinkov{\'a}, Silvie and Flickinger, Dan and Haji\v{c}, Jan and Ivanova, Angelina and Ure\v{s}ov{\'a}, Zde\v{n}ka},
  title = {Towards Comparability of Linguistic Graph Banks for Semantic Parsing},
  booktitle = LREC:16,
  year = 2016,
  address = L:LREC:16,
  pages = {3991--3995}
}
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

35. Open SDP 1.2

Creator:: Flickinger, Dan, Hajič, Jan, Ivanova, Angelina, Kuhlmann, Marco, Miyao, Yusuke, Oepen, Stephan, and Zeman, Daniel
Publisher:: Oslo University and Charles University
Type:: text and corpus
Subject:: semantic dependency and parsing
Language:: English and Czech
Description:: The original SDP 2014 and 2015 data collections were made available under task-specific ‘evaluation’ licenses to registered SemEval participants. In mid-2016, all original data was bundled with system submissions, supporting software, an additional SDP-style collection of semantic dependency graphs, and additional background material (from which some of the SDP target representations were derived) for release through the Linguistic Data Consortium (with LDC catalogue number LDC2016T10). One of the four English target representations (viz. DM) and the entire Czech data (in the PSD target representation) are not derivative of LDC-licensed annotations and, thus, can be made available for direct download (Open SDP; version 1.2; January 2017) under a more permissive licensing scheme, viz. the Creative Commons Attribution-NonCommercial-ShareAlike license. This package also includes some ‘richer’ meaning representations from which the English bi-lexical DM graphs derive, viz. scope-underspecified logical forms and more abstract, non-lexicalized ‘semantic networks’. The latter of these are formally (if not linguistically) similar to Abstract Meaning Representation (AMR) and are available in a range of serializations, including in AMR-like syntax. Version 1.1 was released in April 2016. Version 1.2 adds the 2015 Turku system, which was accidentally left out of version 1.1. Please use the following bibliographic reference for the SDP 2016 data:
@string{C:LREC = {{I}nternational {C}onference on {L}anguage {R}esources and {E}valuation}}
@string{LREC:16 = {Proceedings of the 10th } # C:LREC}
@string{L:LREC:16 = {Portoro\v{z}, Slovenia}}
@inproceedings{Oep:Kuh:Miy:16,
  author = {Oepen, Stephan and Kuhlmann, Marco and Miyao, Yusuke and Zeman, Daniel and Cinkov{\'a}, Silvie and Flickinger, Dan and Haji\v{c}, Jan and Ivanova, Angelina and Ure\v{s}ov{\'a}, Zde\v{n}ka},
  title = {Towards Comparability of Linguistic Graph Banks for Semantic Parsing},
  booktitle = LREC:16,
  year = 2016,
  address = L:LREC:16,
  pages = {3991--3995}
}
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

36. PDT-Vallex: Czech Valency lexicon linked to treebanks

Creator:: Urešová, Zdeňka, Štěpánek, Jan, Hajič, Jan, Panevová, Jarmila, and Mikulová, Marie
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, lexicon, and lexicalConceptualResource
Subject:: annotation, corpora, data, lexicon, semantics, valency, and PDT
Language:: Czech
Description:: The valency lexicon PDT-Vallex has been built in close connection with the annotation of the Prague Dependency Treebank project (PDT) and its successors (mainly the Prague Czech-English Dependency Treebank project, PCEDT). It contains over 11000 valency frames for more than 7000 verbs which occurred in the PDT or PCEDT. It is available in an electronically processable format (XML) together with the aforementioned treebanks (to be viewed and edited by TrEd, the PDT/PCEDT main annotation tool), and also in a more human-readable form including corpus examples (see the WEBSITE link below). The main feature of the lexicon is its linking to the annotated corpora: each occurrence of each verb is linked to the appropriate valency frame with additional (generalized) information about its usage and surface morphosyntactic form alternatives.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

37. PDT-Vallex: Czech Valency lexicon linked to treebanks 4.0 (PDT-Vallex 4.0)

Creator:: Urešová, Zdeňka, Bémová, Alevtina, Fučíková, Eva, Hajič, Jan, Kolářová, Veronika, Mikulová, Marie, Pajas, Petr, Panevová, Jarmila, and Štěpánek, Jan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, computationalLexicon, and lexicalConceptualResource
Subject:: verbal valency, valency, annotation, linguistic data, lexicon, lexical semantics, and PDT
Language:: Czech
Description:: The valency lexicon PDT-Vallex 4.0 has been built in close connection with the annotation of the Prague Dependency Treebank project (PDT) and its successors (mainly the Prague Czech-English Dependency Treebank project, PCEDT, the spoken language corpus (PDTSC) and the corpus of user-generated texts in the project Faust). It contains over 14500 valency frames for almost 8500 verbs which occurred in the PDT, PCEDT, PDTSC and Faust corpora. In addition, there are nouns, adjectives and adverbs, linked from the PDT part only, increasing the total to over 17000 valency frames for 13000 words. All the corpora were published in 2020 as the PDT-C 1.0 corpus with the PDT-Vallex 4.0 dictionary included; this is a copy of the dictionary published as a separate item for those not interested in the corpora themselves. It is available in an electronically processable format (XML), and also in a more human-readable form including corpus examples (see the WEBSITE link below, and the links to its main publications elsewhere in this metadata). The main feature of the lexicon is its linking to the annotated corpora: each occurrence of each verb is linked to the appropriate valency frame with additional (generalized) information about its usage and surface morphosyntactic form alternatives. It replaces the previously published unversioned edition of PDT-Vallex from 2014.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

38. Prague Arabic Dependency Treebank 1.0

Creator:: Hajič, Jan, Smrž, Otakar, Zemánek, Petr, Pajas, Petr, Šnaidauf, Jan, Beška, Emanuel, Kracmar, Jakub, and Hassanová, Kamila
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: corpus and Arabic
Language:: Arabic
Description:: The PADT project might be summarized as an open-ended activity of the Center for Computational Linguistics, the Institute of Formal and Applied Linguistics, and the Institute of Comparative Linguistics, Charles University in Prague, resting in multi-level annotation of Arabic language resources in the light of the theory of Functional Generative Description (Sgall et al., 1986; Hajičová and Sgall, 2003).
Rights:: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), http://creativecommons.org/licenses/by-nc-sa/3.0/, and PUB

39. Prague DaTabase of Spoken Czech 1.0

Creator:: Hajič, Jan, Pajas, Petr, Ircing, Pavel, Romportl, Jan, Peterek, Nino, Spousta, Miroslav, Mikulová, Marie, Grůber, Martin, and Legát, Milan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) and University of West Bohemia
Type:: audio and corpus
Subject:: spoken corpus, speech recognition, and speech reconstruction
Language:: Czech
Description:: PDTSC 1.0 is a multi-purpose corpus of spoken language. 768,888 tokens, 73,374 sentences and 7,324 minutes of spontaneous dialog speech have been recorded, transcribed and edited in several interlinked layers: audio recordings, automatic and manual transcription and manually reconstructed text. PDTSC 1.0 is a delayed release of data annotated in 2012. It is an update of Prague Dependency Treebank of Spoken Language (PDTSL) 0.5 (published in 2009). In 2017, Prague Dependency Treebank of Spoken Czech (PDTSC) 2.0 was published as an update of PDTSC 1.0.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

40. Prague Dependency Treebank - Consolidated 1.0 (PDT-C 1.0)

Creator:: Hajič, Jan, Bejček, Eduard, Bémová, Alevtina, Buráňová, Eva, Fučíková, Eva, Hajičová, Eva, Havelka, Jiří, Hlaváčová, Jaroslava, Homola, Petr, Ircing, Pavel, Kárník, Jiří, Kettnerová, Václava, Klyueva, Natalia, Kolářová, Veronika, Kučová, Lucie, Lopatková, Markéta, Mareček, David, Mikulová, Marie, Mírovský, Jiří, Nedoluzhko, Anna, Novák, Michal, Pajas, Petr, Panevová, Jarmila, Peterek, Nino, Poláková, Lucie, Popel, Martin, Popelka, Jan, Romportl, Jan, Rysová, Magdaléna, Semecký, Jiří, Sgall, Petr, Spoustová, Johanka, Straka, Milan, Straňák, Pavel, Synková, Pavlína, Ševčíková, Magda, Šindlerová, Jana, Štěpánek, Jan, Štěpánková, Barbora, Toman, Josef, Urešová, Zdeňka, Vidová Hladká, Barbora, Zeman, Daniel, Zikánová, Šárka, and Žabokrtský, Zdeněk
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: treebank, dependency, tectogrammatics, topic-focus articulation, multiword expressions, coreference, bridging relations, discourse, morphology, syntax, tokenization, lemmatization, semantic relations, lexical semantics, lexicon, valency, speech reconstruction, clauses, speech recognition, and spoken corpus
Language:: Czech
Description:: A richly annotated and genre-diversified language resource, the Prague Dependency Treebank – Consolidated 1.0 (PDT-C 1.0, hereafter PDT-C) is a consolidated release of the existing PDT corpora of Czech data, uniformly annotated using the standard PDT scheme. PDT corpora included in PDT-C: Prague Dependency Treebank (the original PDT contents, written newspaper and journal texts from three genres); Czech part of Prague Czech-English Dependency Treebank (financial texts translated from English); Prague Dependency Treebank of Spoken Czech (spoken data, including audio, transcripts and multiple speech reconstruction annotation); PDT-Faust (user-generated texts). The differences from the separately published original treebanks are as follows: it is published in one package, to allow easier data handling for all the datasets; the data is enhanced with manual linguistic annotation at the morphological layer, and a new version of the morphological dictionary is enclosed; a common valency lexicon for all four original parts is enclosed. The documentation provides two desktop tools for browsing and editing (TrEd and MEd), and the corpus is also available online for searching using PML-TQ.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
