PAWS is a multilingual parallel treebank with coreference annotation. It consists of English texts from the Wall Street Journal translated into Czech, Russian and Polish. In addition, the texts are syntactically parsed and word-aligned. PAWS is based on PCEDT 2.0 and continues the tradition of multilingual treebanks with coreference annotation. It offers linguistic material that can be further leveraged in cross-lingual studies, especially studies of coreference.
The Prague Czech-English Dependency Treebank - Russian translation (PCEDT-R) is a project that translates a subset of the Prague Czech-English Dependency Treebank 2.0 (PCEDT 2.0) into Russian and linguistically annotates the Russian translations, with emphasis on coreference and cross-lingual alignment of coreferential expressions. Development of the corpus is currently driven by the cross-lingual comparison of the means of expressing coreference.
The current version, 0.5, is preliminary and contains the following (+ denotes new features):
* complete PCEDT 2.0 documents "wsj_1900"-"wsj_1949"
* Czech-English word alignment of coreferential expressions, annotated manually, mainly on the t-layer
+ Russian translations of the original English sentences
+ automatic tokenization, part-of-speech tagging and morphological analysis for Russian
+ automatic word alignment between all Czech and Russian words
+ manual alignment between Russian and the other two languages on possessive pronouns
The Prague Czech-English Dependency Treebank 2.0 Coref (PCEDT 2.0 Coref) is a parallel treebank that builds upon the original PCEDT 2.0 release and enriches it with extended manual annotation of coreference, as well as with improved automatic alignment of coreferential expressions.