Creator: Straňák, Pavel / Rights: PUB - LINDAT/CLARIAH-CZ Catalog Search Results

11. Multiword expressions in the Prague Dependency Treebank 2.0

Creator:: Bejček, Eduard, Klyueva, Natalia, Straňák, Pavel, Šidák, Pavel, Šťastná, Eva, Vimmrová, Pavlína, and Hajič, Jan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: MWE, multiword expressions, idiom, phraseme, and named entity
Language:: Czech
Description:: This dataset adds annotation of multiword expressions and multiword named entities to the original PDT 2.0 data. The annotation is stand-off and is stored in the same PML format as the original PDT 2.0 data. It is to be used together with PDT 2.0. Supported by grant 1ET201120505 of the Academy of Sciences of the Czech Republic and grant MSM0021620838 of the Ministry of Youth, Education and Sport of the Czech Republic.
Rights:: Creative Commons - Attribution 3.0 Unported (CC BY 3.0), http://creativecommons.org/licenses/by/3.0/, and PUB

12. ParCzech 3.0

Creator:: Kopp, Matyáš, Stankov, Vladislav, Bojar, Ondřej, Hladká, Barbora, and Straňák, Pavel
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: audio and corpus
Subject:: Parliament of the Czech Republic, Chamber of Deputies, stenographic protocols, TEI encoding, and speech corpus
Language:: Czech
Description:: The ParCzech 3.0 corpus is the third version of ParCzech, consisting of stenographic protocols that record the Chamber of Deputies' meetings held in the 7th term (2013-2017) and the current 8th term (2017-Mar 2021). The protocols are provided in their original HTML format, in the Parla-CLARIN TEI format, and in a format suitable for automatic speech recognition. The corpus is automatically enriched with morphological, syntactic, and named-entity annotations using the tools UDPipe 2 and NameTag 2. The audio files are aligned with the texts in the annotated TEI files.
Rights:: Public Domain Dedication (CC Zero), http://creativecommons.org/publicdomain/zero/1.0/, and PUB

13. ParCzech PS7 1.0

Creator:: Hladká, Barbora, Kopp, Matyáš, and Straňák, Pavel
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: Parliament of the Czech Republic, Chamber of Deputies, stenographic protocols, TEI encoding, and TEITOK
Language:: Czech
Description:: The ParCzech PS7 1.0 corpus is the very first member of the family of corpora built from data of the Parliament of the Czech Republic. It consists of stenographic protocols that record the Chamber of Deputies' meetings held in the 7th term (2013-2017). The audio recordings are available as well. Transcripts are provided in the original HTML as harvested, and are also converted into a TEI-derived XML format for use in the TEITOK corpus manager. The corpus is automatically enriched with morphological and named-entity annotations using the tools MorphoDiTa and NameTag.
Rights:: Public Domain Dedication (CC Zero), http://creativecommons.org/publicdomain/zero/1.0/, and PUB

14. ParCzech PS7 2.0

Creator:: Hladká, Barbora, Kopp, Matyáš, and Straňák, Pavel
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: Parliament of the Czech Republic, Chamber of Deputies, stenographic protocols, TEI encoding, and TEITOK
Language:: Czech
Description:: The ParCzech PS7 2.0 corpus is the second version of ParCzech PS7, consisting of stenographic protocols that record the Chamber of Deputies' meetings held in the 7th term (2013-2017). The protocols are provided in their original HTML format, in TEI format, and in a TEI-derived format that makes them searchable in the TEITOK corpus manager. Their audio recordings are available as well. The corpus is automatically enriched with morphological, syntactic, and named-entity annotations using the tools UDPipe 2 and NameTag 2.
Rights:: Public Domain Dedication (CC Zero), http://creativecommons.org/publicdomain/zero/1.0/, and PUB

15. Prague Dependency Treebank - Consolidated 1.0 (PDT-C 1.0)

Creator:: Hajič, Jan, Bejček, Eduard, Bémová, Alevtina, Buráňová, Eva, Fučíková, Eva, Hajičová, Eva, Havelka, Jiří, Hlaváčová, Jaroslava, Homola, Petr, Ircing, Pavel, Kárník, Jiří, Kettnerová, Václava, Klyueva, Natalia, Kolářová, Veronika, Kučová, Lucie, Lopatková, Markéta, Mareček, David, Mikulová, Marie, Mírovský, Jiří, Nedoluzhko, Anna, Novák, Michal, Pajas, Petr, Panevová, Jarmila, Peterek, Nino, Poláková, Lucie, Popel, Martin, Popelka, Jan, Romportl, Jan, Rysová, Magdaléna, Semecký, Jiří, Sgall, Petr, Spoustová, Johanka, Straka, Milan, Straňák, Pavel, Synková, Pavlína, Ševčíková, Magda, Šindlerová, Jana, Štěpánek, Jan, Štěpánková, Barbora, Toman, Josef, Urešová, Zdeňka, Vidová Hladká, Barbora, Zeman, Daniel, Zikánová, Šárka, and Žabokrtský, Zdeněk
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: treebank, dependency, tectogrammatics, topic-focus articulation, multiword expressions, coreference, bridging relations, discourse, morphology, syntax, tokenization, lemmatization, semantic relations, lexical semantics, lexicon, valency, speech reconstruction, clauses, speech recognition, and spoken corpus
Language:: Czech
Description:: A richly annotated and genre-diversified language resource, the Prague Dependency Treebank – Consolidated 1.0 (PDT-C 1.0, or PDT-C for short) is a consolidated release of the existing PDT corpora of Czech data, uniformly annotated using the standard PDT scheme. The PDT corpora included in PDT-C are: the Prague Dependency Treebank (the original PDT contents: written newspaper and journal texts from three genres); the Czech part of the Prague Czech-English Dependency Treebank (financial texts translated from English); the Prague Dependency Treebank of Spoken Czech (spoken data, including audio, transcripts, and multiple speech reconstruction annotations); and PDT-Faust (user-generated texts). PDT-C differs from the separately published original treebanks as follows: it is published in one package, allowing easier handling of all the datasets; the data is enhanced with manual linguistic annotation at the morphological layer, and a new version of the morphological dictionary is enclosed; and a common valency lexicon for all four original parts is enclosed. The documentation covers two desktop tools for browsing and editing the data (TrEd and MEd), and the corpus is also available online for searching with PML-TQ.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

16. Prague Dependency Treebank 2.5

Creator:: Bejček, Eduard, Hajič, Jan, Panevová, Jarmila, Mírovský, Jiří, Spoustová, Johanka, Štěpánek, Jan, Straňák, Pavel, Šidák, Pavel, Vimmrová, Pavlína, Šťastná, Eva, Ševčíková, Magda, Smejkalová, Lenka, Homola, Petr, Popelka, Jan, Lopatková, Markéta, Hrabalová, Lucie, Klyueva, Natalia, and Žabokrtský, Zdeněk
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: treebank, multiword expressions, clauses, tectogrammatics, dependency, and PDT
Language:: Czech
Description:: The Prague Dependency Treebank 2.5 annotates the same texts as PDT 2.0. The annotation on the original four layers was fixed or improved in various respects (see Documentation). Moreover, new information was added to the data: annotation of multiword expressions, pair/group meaning, and clause segmentation. Supported by Ministry of Education of the Czech Republic projects No. LM2010013, LC536, and MSM0021620838, and by Grant Agency of the Czech Republic grants No. P406/2010/0875, P202/10/1333, and P406/10/P193.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), http://creativecommons.org/licenses/by-nc-sa/3.0/, and PUB

17. Prague Dependency Treebank 3.5

Creator:: Hajič, Jan, Bejček, Eduard, Bémová, Alevtina, Buráňová, Eva, Hajičová, Eva, Havelka, Jiří, Homola, Petr, Kárník, Jiří, Kettnerová, Václava, Klyueva, Natalia, Kolářová, Veronika, Kučová, Lucie, Lopatková, Markéta, Mikulová, Marie, Mírovský, Jiří, Nedoluzhko, Anna, Pajas, Petr, Panevová, Jarmila, Poláková, Lucie, Rysová, Magdaléna, Sgall, Petr, Spoustová, Johanka, Straňák, Pavel, Synková, Pavlína, Ševčíková, Magda, Štěpánek, Jan, Urešová, Zdeňka, Vidová Hladká, Barbora, Zeman, Daniel, Zikánová, Šárka, and Žabokrtský, Zdeněk
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: treebank, dependency, tectogrammatics, topic-focus articulation, multiword expressions, coreference, bridging relations, discourse, morphology, syntax, tokenization, lemmatization, clauses, semantics, semantic relations, lexical semantics, and lexicon
Language:: Czech
Description:: The Prague Dependency Treebank 3.5 is the 2018 edition of the core Prague Dependency Treebank (PDT). It contains all PDT annotation made at the Institute of Formal and Applied Linguistics under various projects between 1996 and 2018 on the original texts, i.e., all annotation from PDT 1.0, PDT 2.0, PDT 2.5, PDT 3.0, PDiT 1.0 and PDiT 2.0, plus corrections, new structure of basic documentation and new list of authors covering all previous editions. The Prague Dependency Treebank 3.5 (PDT 3.5) contains the same texts as the previous versions since 2.0; there are 49,431 annotated sentences (832,823 words) on all layers, from tectogrammatical annotation to syntax to morphology. There are additional annotated sentences for syntax and morphology; the totals for the lower layers of annotation are: 87,913 sentences with 1,502,976 words at the analytical layer (surface dependency syntax) and 115,844 sentences with 1,956,693 words at the morphological layer of annotation (these totals include the annotation with the higher layers annotated as well). Closely linked to the tectogrammatical layer is the annotation of sentence information structure, multiword expressions, coreference, bridging relations and discourse relations.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

18. Public License Selector

Creator:: Sedlák, Michal, Straňák, Pavel, and Kamocki, Pawel
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: Legal and Licensing
Language:: English
Description:: A customizable tool that helps users select the right open license for their data or software.
Rights:: The MIT License (MIT), http://opensource.org/licenses/mit-license.php, and PUB