IDENTIC is an Indonesian-English parallel corpus for research purposes. The corpus is a bilingual corpus paired with English. The aim of this work is to build and provide researchers a proper Indonesian-English textual data set and also to promote research in this language pair. The corpus contains texts coming from different sources with different genres. and The research leading to these results has received funding from the European Commission’s 7th Framework Program under grant agreement no 238405 (CLARA) and by the grant LC536 Centrum Komputacni Lingvistiky of the Czech Ministry of Education.
The Prague Dependency Treebank 2.5 annotates the same texts as the PDT 2.0. The annotation on the original four layers was fixed or improved in various aspects (see Documentation). Moreover, new information was added to the data:
Annotation of multiword expressions
Pair/group meaning
Clause segmentation and Ministry of Education of the Czech Republic projects No.:
LM2010013
LC536
MSM0021620838
Grant Agency of the Czech Republic grants No.:
P406/2010/0875
P202/10/1333
P406/10/P193