The Prague Dependency Treebank 2.5 annotates the same texts as the PDT 2.0. The annotation on the original four layers was fixed or improved in various aspects (see Documentation). Moreover, new information was added to the data:
Annotation of multiword expressions
Pair/group meaning
Clause segmentation
Ministry of Education of the Czech Republic projects No.:
LM2010013
LC536
MSM0021620838
Grant Agency of the Czech Republic grants No.:
P406/2010/0875
P202/10/1333
P406/10/P193
PDT 3.0 is a new version of the Prague Dependency Treebank. It contains a large amount of Czech text with complex and interlinked morphological (2 million words), syntactic (1.5 million words) and semantic (0.8 million words) annotation; in addition, certain properties of sentence information structure, multiword expressions, coreference, bridging relations and discourse relations are annotated at the semantic level.
the Grant Agency of the Czech Republic: grants P406/12/0658 "Coreference, discourse relations and information structure in a contrastive perspective", P406/2010/0875 "Computational Linguistics: Explicit description of language and annotated data focused on Czech", 405/09/0729 "From the structure of a sentence to textual relationships", and GPP406/12/P175 "Selected derivational relations for automatic processing of Czech";
the Ministry of Education, Youth and Sports of the Czech Republic: the KONTAKT project ME10018 "Towards a computational analysis of text structure" and the LINDAT-Clarin project LM2010013;
the Grant Agency of Charles University in Prague: GAUK 103609 "Textual (Inter-sentential) Relations and their Representation in a Language Corpus" and GAUK 4383/2009 "Methods of coreference resolution".
The Prague Dependency Treebank 3.5 is the 2018 edition of the core Prague Dependency Treebank (PDT). It contains all PDT annotation made at the Institute of Formal and Applied Linguistics under various projects between 1996 and 2018 on the original texts, i.e., all annotation from PDT 1.0, PDT 2.0, PDT 2.5, PDT 3.0, PDiT 1.0 and PDiT 2.0, plus corrections, a new structure of the basic documentation and a new list of authors covering all previous editions. The Prague Dependency Treebank 3.5 (PDT 3.5) contains the same texts as the previous versions since 2.0; there are 49,431 sentences (832,823 words) annotated on all layers, from tectogrammatical annotation to syntax to morphology. Additional sentences are annotated for syntax and morphology only; the totals for the lower layers of annotation are 87,913 sentences with 1,502,976 words at the analytical layer (surface dependency syntax) and 115,844 sentences with 1,956,693 words at the morphological layer (these totals include the sentences that are also annotated at the higher layers). Closely linked to the tectogrammatical layer is the annotation of sentence information structure, multiword expressions, coreference, bridging relations and discourse relations.
The Prague Dependency Treebank of Spoken Czech 2.0 (PDTSC 2.0) is a corpus of spoken language, consisting of 742,316 tokens and 73,835 sentences, representing 7,324 minutes (over 120 hours) of spontaneous dialogs. The dialogs have been recorded, transcribed and edited in several interlinked layers: audio recordings, automatic and manual transcripts and manually reconstructed text. These layers were part of the first version of the corpus (PDTSC 1.0). Version 2.0 adds automatic dependency parsing at the analytical layer and manual annotation of “deep” syntax at the tectogrammatical layer, which contains semantic roles and relations as well as annotation of coreference.
The first edition of a speech corpus with a speech reconstruction layer (edited transcript).
The project of speech reconstruction of Czech and English was started at UFAL together with the PIRE project in 2005, and has gradually grown from ideas to a (first) annotation specification, annotation software and actual annotation. It is part of the Prague Dependency Treebank family of annotated corpus resources and tools, to which it adds the spoken language layer(s).
LC536; MSM0021620838; IST-034344; ME838
Annotation of discourse relations is a project related to the Prague Dependency Treebank 2.5. It represents a new manually annotated layer of language description, above the existing layers of the PDT, and it portrays linguistic phenomena from the perspective of discourse structure and coherence.
GACR P406/12/0658, GACR P406/2010/0875, GACR 405/09/0729, Ministry of Education ME10018, Ministry of Education LM2010013
PDiT 2.0 is a new version of the Prague Discourse Treebank. It contains a complex annotation of discourse phenomena enriched by the annotation of secondary connectives.
The Prague Discourse Treebank 3.0 (PDiT 3.0) is a new version of the annotation of discourse relations marked by primary and secondary discourse connectives in the data of the Prague Dependency Treebank. Compared with the previous versions, PDiT 3.0 brings a largely revised annotation of discourse relations and also offers the data in the Penn Discourse Treebank 3.0 (PDTB 3.0) format and sense taxonomy.
Segment from Československý zvukový týdeník Aktualita (Czechoslovak Aktualita Sound Newsreel) 1942, issue no. 10, captures the opening of a propagandistic anti-Soviet exhibition with the ironic title The Soviet Paradise (Das sowjet Paradies), held at the Prague Exhibition Grounds from 28 February to 28 March 1942. A view of the gate of the grounds with the sign The Soviet Paradise. The opening of the exhibition is attended by Reich Secretary Karl Hermann Frank, Prime Minister of the Protectorate Government Jaroslav Krejčí, and Minister of Education and People's Enlightenment Emanuel Moravec. Footage of the exhibits. Karl Hermann Frank and the others examine a Russian tank and a RATA fighter plane in front of the Industrial Palace. An image of a poster saying "People Who Gave Up Laughing".
Segment from Československý zvukový týdeník Aktualita (Czechoslovak Aktualita Sound Newsreel) 1942, issue no. 21, from a concert held as part of Prague Music Weeks at the German Opera in Prague (now State Opera Prague) on 15 May 1942. Footage of the exterior and interior of the building. Prague's German Philharmonic, conducted by Joseph Keilberth, performs the Finale of Anton Bruckner's Eighth Symphony (authentic sound). Acting Reich Protector Reinhard Heydrich with his wife Lina, and the chief of the Wehrmacht troops in Prague, General Rudolf Toussaint, are present in the audience.