CzeDLex 0.5 is a pilot version of a lexicon of Czech discourse connectives. The lexicon contains connectives extracted partly automatically from the Prague Discourse Treebank 2.0 (PDiT 2.0), a large corpus annotated manually with discourse relations. The most frequent entries in the lexicon (covering more than two thirds of the discourse relations annotated in PDiT 2.0) have been manually checked, translated into English and supplemented with additional linguistic information.
CzeDLex 0.6 is the second development version of the lexicon of Czech discourse connectives. The lexicon contains connectives extracted partly automatically from the Prague Discourse Treebank 2.0 (PDiT 2.0), a large corpus annotated manually with discourse relations. The most frequent entries in the lexicon (76 of the 204 entries in total, covering more than 90% of the discourse relations annotated in PDiT 2.0) have been manually checked, translated into English and supplemented with additional linguistic information.
CzeDLex 0.7 is the third development version of the lexicon of Czech discourse connectives. The lexicon contains connectives extracted partly automatically from the Prague Discourse Treebank 2.0 (PDiT 2.0) and, as a supplementary resource, from the Czech part of the Prague Czech–English Dependency Treebank with discourse annotation projected from the Penn Discourse Treebank 3.0. The most frequent entries in the lexicon (131 of the 218 entries in total, covering more than 95% of the discourse relations annotated in PDiT 2.0) have been manually checked, translated into English and supplemented with additional linguistic information.
CzeDLex 1.0 is the first production version (the fourth development version) of the lexicon of Czech discourse connectives. The lexicon contains connectives extracted partly automatically from resources annotated manually with discourse relations. The primary resource is the Prague Discourse Treebank 2.0 (PDiT 2.0); the two supplementary resources are (i) the Czech part of the Prague Czech–English Dependency Treebank with discourse annotation projected from the Penn Discourse Treebank 3.0, and (ii) a thousand sentences selected from various novels and transcriptions of public speeches. All 200 entries in the lexicon have been manually checked, translated into English and supplemented with additional linguistic information.
Enriched discourse annotation of a subset of the Prague Discourse Treebank, adding implicit relations, entity-based relations, question-answer relations and other discourse-structuring phenomena.
The Prague Dependency Treebank 3.5 (PDT 3.5) is the 2018 edition of the core Prague Dependency Treebank (PDT). It contains all PDT annotation made at the Institute of Formal and Applied Linguistics under various projects between 1996 and 2018 on the original texts, i.e., all annotation from PDT 1.0, PDT 2.0, PDT 2.5, PDT 3.0, PDiT 1.0 and PDiT 2.0, plus corrections, a new structure of the basic documentation and a new list of authors covering all previous editions. PDT 3.5 contains the same texts as the previous versions since 2.0; there are 49,431 sentences (832,823 words) annotated on all layers, from tectogrammatical annotation to syntax to morphology. Additional sentences are annotated for syntax and morphology only; the totals for the lower layers of annotation are 87,913 sentences with 1,502,976 words at the analytical layer (surface dependency syntax) and 115,844 sentences with 1,956,693 words at the morphological layer (these totals include the sentences that are also annotated at the higher layers). Closely linked to the tectogrammatical layer is the annotation of sentence information structure, multiword expressions, coreference, bridging relations and discourse relations.
PDiT 2.0 is a new version of the Prague Discourse Treebank. It contains a complex annotation of discourse phenomena enriched by the annotation of secondary connectives.
The Prague Discourse Treebank 3.0 (PDiT 3.0) is a new version of the annotation of discourse relations marked by primary and secondary discourse connectives in the data of the Prague Dependency Treebank. Compared with the previous versions, PDiT 3.0 brings a largely revised annotation of discourse relations and also offers the data in the Penn Discourse Treebank 3.0 (PDTB 3.0) format and sense taxonomy.