Drawing on material from the SYN corpus, the article first describes the morphologically frozen expressions jakživ and jaktěživ, which have the adverbial meaning "never" in negative clauses and which, owing to their endings, agree in gender and number with the grammatical subject. Agreement in positive clauses, where the frozen expressions mean "ever (in one's life)", is also briefly mentioned. The principal aim of the article, however, is to show that the syntactic adverbialisation of these expressions in negative clauses disrupts this agreement, cf. jaktěživo neměl názor "never in his life did he have an opinion". This adverbialisation has two possible outcomes: the neuter forms jaktěživo, jakživo are more common in Bohemia, whereas the masculine forms jaktěživ, jakživ tend to be used in Moravia. The author interprets the frequencies of both concordant and non-concordant (frozen) expressions, listed in descending order of frequency in SYN.
Tamil Dependency Treebank version 0.1 (TamilTB.v0.1) is an attempt to develop a syntactically annotated corpus for Tamil. TamilTB.v0.1 contains 600 sentences enriched with manual annotation of morphology and dependency syntax in the style of the Prague Dependency Treebank. TamilTB.v0.1 was created at the Institute of Formal and Applied Linguistics, Charles University in Prague.
This is the first release of the UFAL Parallel Corpus of North Levantine, compiled by the Institute of Formal and Applied Linguistics (ÚFAL) at Charles University within the Welcome project (https://welcome-h2020.eu/). The corpus consists of 120,600 multiparallel sentences in English, French, German, Greek, Spanish, and Standard Arabic, selected from the OpenSubtitles2018 corpus [1] and manually translated into North Levantine Arabic. The corpus was created to train machine translation systems for North Levantine and the other languages.
We release a sizeable monolingual Urdu corpus automatically tagged with part-of-speech tags. We extend the work of Jawaid and Bojar (2012), who use three different taggers and then apply a voting scheme to disambiguate among the choices suggested by each tagger. We run this complex ensemble on a large monolingual corpus and release both the plain and the tagged corpora. This work is supported by the MosesCore project sponsored by the European Commission’s Seventh Framework Programme (Grant Number 288487).
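The voting step in the tagger ensemble described above could look roughly like the following minimal sketch. It assumes plain per-token majority voting with ties resolved in favour of the earliest tagger; the actual scheme of Jawaid and Bojar (2012) may differ. The function names and the example tags are hypothetical and serve only as an illustration.

```python
from collections import Counter
from typing import List, Sequence


def vote(tags: Sequence[str]) -> str:
    """Return the tag proposed by the most taggers for one token.

    Assumption: ties are broken in favour of the earliest tagger in `tags`.
    """
    if not tags:
        raise ValueError("at least one tagger output is required")
    counts = Counter(tags)
    best = max(counts.values())
    # Scan in tagger order so that the earliest tagger wins a tie.
    for tag in tags:
        if counts[tag] == best:
            return tag
    raise AssertionError("unreachable: some tag always has the top count")


def ensemble_tag(outputs: Sequence[Sequence[str]]) -> List[str]:
    """Combine the tag sequences of several taggers for one sentence."""
    if len({len(sequence) for sequence in outputs}) != 1:
        raise ValueError("all taggers must tag the same number of tokens")
    # zip(*outputs) groups the competing tags token by token.
    return [vote(column) for column in zip(*outputs)]


# Hypothetical outputs of three taggers for a three-token sentence.
tagger_a = ["NN", "VB", "ADJ"]
tagger_b = ["NN", "NN", "ADV"]
tagger_c = ["PN", "VB", "PSP"]

combined = ensemble_tag([tagger_a, tagger_b, tagger_c])
print(combined)
```

For the third token all three hypothetical taggers disagree, so the sketch falls back to the first tagger's choice. Putting the most reliable tagger first is one simple way to make that fallback sensible.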