Search Results
2. Lexico-Semantic Annotation of PDT using Czech WordNet
- Creator:
- Bejček, Eduard, Hoffmannová, Petra, Holub, Martin, Hučínová, Marie, Pecina, Pavel, Straňák, Pavel, Šidák, Pavel, and Hajič, Jan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- PDT and Czech WordNet
- Language:
- Czech
- Description:
- This dataset contains an annotation of PDT using the Czech WordNet ontology (http://hdl.handle.net/11858/00-097C-0000-0001-4880-3). Data is stored in PML format. This is a stand-off annotation, and for most use cases it requires PDT 2.0 and the Czech WordNet 1.9 PDT that we used for annotation. Supported by projects 1ET100300517 and 1ET201120505.
- Rights:
- Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), http://creativecommons.org/licenses/by-nc-sa/3.0/, and PUB
3. PDT-Vallex: Czech Valency lexicon linked to treebanks
- Creator:
- Urešová, Zdeňka, Štěpánek, Jan, Hajič, Jan, Panevová, Jarmila, and Mikulová, Marie
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text, lexicon, and lexicalConceptualResource
- Subject:
- annotation, corpora, data, lexicon, semantics, valency, and PDT
- Language:
- Czech
- Description:
- The valency lexicon PDT-Vallex has been built in close connection with the annotation of the Prague Dependency Treebank project (PDT) and its successors (mainly the Prague Czech-English Dependency Treebank project, PCEDT). It contains over 11000 valency frames for more than 7000 verbs that occurred in the PDT or PCEDT. It is available in an electronically processable format (XML) together with the aforementioned treebanks (to be viewed and edited in TrEd, the main PDT/PCEDT annotation tool), and also in a more human-readable form including corpus examples (see the WEBSITE link below). The main feature of the lexicon is its linking to the annotated corpora: each occurrence of each verb is linked to the appropriate valency frame, with additional (generalized) information about its usage and surface morphosyntactic form alternatives.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
4. PDT-Vallex: Czech Valency lexicon linked to treebanks 4.0 (PDT-Vallex 4.0)
- Creator:
- Urešová, Zdeňka, Bémová, Alevtina, Fučíková, Eva, Hajič, Jan, Kolářová, Veronika, Mikulová, Marie, Pajas, Petr, Panevová, Jarmila, and Štěpánek, Jan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text, computationalLexicon, and lexicalConceptualResource
- Subject:
- verbal valency, valency, annotation, linguistic data, lexicon, lexical semantics, and PDT
- Language:
- Czech
- Description:
- The valency lexicon PDT-Vallex 4.0 has been built in close connection with the annotation of the Prague Dependency Treebank project (PDT) and its successors (mainly the Prague Czech-English Dependency Treebank project, PCEDT; the spoken language corpus, PDTSC; and the corpus of user-generated texts in the project Faust). It contains over 14500 valency frames for almost 8500 verbs that occurred in the PDT, PCEDT, PDTSC and Faust corpora. In addition, there are nouns, adjectives and adverbs, linked from the PDT part only, increasing the total to over 17000 valency frames for 13000 words. All the corpora were published in 2020 as the PDT-C 1.0 corpus with the PDT-Vallex 4.0 dictionary included; this is a copy of the dictionary published as a separate item for those not interested in the corpora themselves. It is available in an electronically processable format (XML), and also in a more human-readable form including corpus examples (see the WEBSITE link below, and the links to its main publications elsewhere in this metadata). The main feature of the lexicon is its linking to the annotated corpora: each occurrence of each verb is linked to the appropriate valency frame, with additional (generalized) information about its usage and surface morphosyntactic form alternatives. It replaces the previously published unversioned edition of PDT-Vallex from 2014.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
5. Prague Dependency Treebank 2.0 - sample data
- Creator:
- Hajič, Jan, Panevová, Jarmila, Sgall, Petr, Pajas, Petr, Štěpánek, Jan, Havelka, Jiří, Mikulová, Marie, Žabokrtský, Zdeněk, and Ševčíková-Razímová, Magda
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- treebank, dependency, and PDT
- Language:
- Czech
- Description:
- A small subset of PDT 2.0 made available under a permissive license. Prague Dependency Treebank 2.0 (PDT 2.0) contains a large amount of Czech texts with complex and interlinked morphological (2 million words), syntactic (1.5 MW) and complex semantic annotation (0.8 MW); in addition, certain properties of sentence information structure and coreference relations are annotated at the semantic level. PDT 2.0 is based on the long-standing Praguian linguistic tradition, adapted for the current Computational Linguistics research needs. The corpus itself uses the latest annotation technology. Software tools for corpus search, annotation and language analysis are included. Extensive documentation (in English) is provided as well. Supported by: Ministry of Education of the Czech Republic projects No. VS96151, LN00A063, 1P05ME752, MSM0021620838 and LC536; Grant Agency of the Czech Republic grants No. 405/96/0198, 405/96/K214 and 405/03/0913; research funds of the Faculty of Mathematics and Physics; Charles University, Prague, Czech Republic; Grant Agency of the Czech Academy of Science, Prague, Czech Republic projects No. 1ET101120503, 1ET101120413 and 1ET201120505; Grant Agency of the Charles University No. 489/04, 350/05, 352/05 and 375/05; the U.S. NSF Grant #IIS9732388.
- Rights:
- Creative Commons - Attribution 3.0 Unported (CC BY 3.0), http://creativecommons.org/licenses/by/3.0/, and PUB
6. Prague Dependency Treebank 2.5
- Creator:
- Bejček, Eduard, Hajič, Jan, Panevová, Jarmila, Mírovský, Jiří, Spoustová, Johanka, Štěpánek, Jan, Straňák, Pavel, Šidák, Pavel, Vimmrová, Pavlína, Šťastná, Eva, Ševčíková, Magda, Smejkalová, Lenka, Homola, Petr, Popelka, Jan, Lopatková, Markéta, Hrabalová, Lucie, Klyueva, Natalia, and Žabokrtský, Zdeněk
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- treebank, multiword expressions, clauses, tectogrammatics, dependency, and PDT
- Language:
- Czech
- Description:
- The Prague Dependency Treebank 2.5 annotates the same texts as the PDT 2.0. The annotation on the original four layers was fixed or improved in various aspects (see Documentation). Moreover, new information was added to the data: annotation of multiword expressions, pair/group meaning, and clause segmentation. Supported by: Ministry of Education of the Czech Republic projects No. LM2010013, LC536 and MSM0021620838; Grant Agency of the Czech Republic grants No. P406/2010/0875, P202/10/1333 and P406/10/P193.
- Rights:
- Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), http://creativecommons.org/licenses/by-nc-sa/3.0/, and PUB
7. Prague Dependency Treebank 3.0
- Creator:
- Bejček, Eduard, Hajičová, Eva, Hajič, Jan, Jínová, Pavlína, Kettnerová, Václava, Kolářová, Veronika, Mikulová, Marie, Mírovský, Jiří, Nedoluzhko, Anna, Panevová, Jarmila, Poláková, Lucie, Ševčíková, Magda, Štěpánek, Jan, and Zikánová, Šárka
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- treebank, dependency, tectogrammatics, topic-focus articulation, multiword expressions, coreference, bridging relations, discourse, and PDT
- Language:
- Czech
- Description:
- PDT 3.0 is a new version of the Prague Dependency Treebank. It contains a large amount of Czech texts with complex and interlinked morphological (2 million words), syntactic (1.5 MW) and semantic annotation (0.8 MW); in addition, certain properties of sentence information structure, multiword expressions, coreference, bridging relations and discourse relations are annotated at the semantic level. Supported by: the Grant Agency of the Czech Republic, grants P406/12/0658 "Coreference, discourse relations and information structure in a contrastive perspective", P406/2010/0875 "Computational Linguistics: Explicit description of language and annotated data focused on Czech", 405/09/0729 "From the structure of a sentence to textual relationships", and GPP406/12/P175 "Selected derivational relations for automatic processing of Czech"; the Ministry of Education, Youth and Sports of the Czech Republic, the KONTAKT project ME10018 "Towards a computational analysis of text structure" and the LINDAT-Clarin project LM2010013; the Grant Agency of Charles University in Prague, GAUK 103609 "Textual (Inter-sentential) Relations and their Representation in a Language Corpus" and GAUK 4383/2009 "Methods of coreference resolution".
- Rights:
- Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), http://creativecommons.org/licenses/by-nc-sa/3.0/, and PUB
8. Prague Discourse Treebank 2.0
- Creator:
- Rysová, Magdaléna, Synková, Pavlína, Mírovský, Jiří, Hajičová, Eva, Nedoluzhko, Anna, Ocelák, Radek, Pergler, Jiří, Poláková, Lucie, Scheller, Veronika, Zdeňková, Jana, and Zikánová, Šárka
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- discourse, bridging relations, coreference, topic-focus articulation, treebank, dependency, tectogrammatics, and PDT
- Language:
- Czech
- Description:
- PDiT 2.0 is a new version of the Prague Discourse Treebank. It contains a complex annotation of discourse phenomena enriched by the annotation of secondary connectives.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
9. Prague Discourse Treebank 3.0
- Creator:
- Synková, Pavlína, Rysová, Magdaléna, Mírovský, Jiří, Poláková, Lucie, Scheller, Veronika, Zdeňková, Jana, Zikánová, Šárka, and Hajičová, Eva
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- discourse, discourse annotation, treebank, PDT, and tectogrammatics
- Language:
- Czech
- Description:
- The Prague Discourse Treebank 3.0 (PDiT 3.0) is a new version of the annotation of discourse relations marked by primary and secondary discourse connectives in the data of the Prague Dependency Treebank. Compared with the previous versions, PDiT 3.0 brings a largely revised annotation of discourse relations and also offers the data in the Penn Discourse Treebank 3.0 (PDTB 3.0) format and sense taxonomy.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB