This multilingual resource contains corpora in which verbal MWEs have been manually annotated, gathered at the occasion of the 1.2 edition of the PARSEME Shared Task on semi-supervised Identification of Verbal MWEs (2020).
VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do).
For the 1.2 shared task edition, the data covers 14 languages, for which VMWEs were annotated according to the universal guidelines. The corpora are provided in the cupt format, inspired by the CONLL-U format.
Morphological and syntactic information – not necessarily using UD tagsets – including parts of speech, lemmas, morphological features and/or syntactic dependencies are also provided. Depending on the language, the information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe).
This item contains training, development and test data, as well as the evaluation tools used in the PARSEME Shared Task 1.2 (2020). The annotation guidelines are available online: http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.2
ParaDi 2.0. is a dictionary of single verb paraphrases of Czech verbal multiword expressions - light verb constructions and idiomatic verb constructions. Moreover, it provides an elaborated set of morphological, syntactic and semantic features, including information on aspectual counterparts of verbs or paraphrasability conditions of given verbs.
The format of ParaDi has been designed with respect to both human and machine readability - the dictionary is represented as a plain table in TSV format, as it is a flexible and language-independent data format.
ParaDi 2.0. is a dictionary of single verb paraphrases of Czech verbal multiword expressions - light verb constructions and idiomatic verb constructions. Moreover, it provides an elaborated set of morphological, syntactic and semantic features, including information on aspectual counterparts of verbs or paraphrasability conditions of given verbs.
The format of ParaDi has been designed with respect to both human and machine readability - the dictionary is represented as a plain table in TSV format, as it is a flexible and language-independent data format.
This multilingual resource contains corpora in which verbal MWEs have been manually annotated. VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do). This is the first release of the corpora without an associated shared task. Previous version (1.2) was associated with the PARSEME Shared Task on semi-supervised Identification of Verbal MWEs (2020). The data covers 26 languages corresponding to the combination of the corpora for all previous three editions (1.0, 1.1 and 1.2) of the corpora. VMWEs were annotated according to the universal guidelines. The corpora are provided in the cupt format, inspired by the CONLL-U format. Morphological and syntactic information, including parts of speech, lemmas, morphological features and/or syntactic dependencies, are also provided. Depending on the language, the information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). All corpora are split into training, development and test data, following the splitting strategy adopted for the PARSEME Shared Task 1.2. The annotation guidelines are available online: https://parsemefr.lis-lab.fr/parseme-st-guidelines/1.3 The .cupt format is detailed here: https://multiword.sourceforge.net/cupt-format/