Tokenizer, POS Tagger, Lemmatizer and Parser models for 123 treebanks of 69 languages of Universal Depenencies 2.10 Treebanks, created solely using UD 2.10 data (https://hdl.handle.net/11234/1-4758). The model documentation including performance can be found at https://ufal.mff.cuni.cz/udpipe/2/models#universal_dependencies_210_models .
To use these models, you need UDPipe version 2.0, which you can download from https://ufal.mff.cuni.cz/udpipe/2 .
Tokenizer, POS Tagger, Lemmatizer and Parser models for 131 treebanks of 72 languages of Universal Depenencies 2.12 Treebanks, created solely using UD 2.12 data (https://hdl.handle.net/11234/1-5150). The model documentation including performance can be found at https://ufal.mff.cuni.cz/udpipe/2/models#universal_dependencies_212_models .
To use these models, you need UDPipe version 2.0, which you can download from https://ufal.mff.cuni.cz/udpipe/2 .
Tokenizer, POS Tagger, Lemmatizer and Parser models for 84 treebanks of 56 languages of Universal Depenencies 2.3 Treebanks, created solely using UD 2.3 data (http://hdl.handle.net/11234/1-2895). The model documentation including performance can be found at http://ufal.mff.cuni.cz/udpipe/models#universal_dependencies_23_models .
To use these models, you need UDPipe binary version at least 1.2, which you can download from http://ufal.mff.cuni.cz/udpipe .
In addition to models itself, all additional data and value of hyperparameters used for training are available in the second archive, allowing reproducible training.
Tokenizer, POS Tagger, Lemmatizer and Parser models for 90 treebanks of 60 languages of Universal Depenencies 2.4 Treebanks, created solely using UD 2.4 data (http://hdl.handle.net/11234/1-2988). The model documentation including performance can be found at http://ufal.mff.cuni.cz/udpipe/models#universal_dependencies_24_models .
To use these models, you need UDPipe binary version at least 1.2, which you can download from http://ufal.mff.cuni.cz/udpipe .
In addition to models itself, all additional data and value of hyperparameters used for training are available in the second archive, allowing reproducible training.
Tokenizer, POS Tagger, Lemmatizer and Parser models for 94 treebanks of 61 languages of Universal Depenencies 2.5 Treebanks, created solely using UD 2.5 data (http://hdl.handle.net/11234/1-3105). The model documentation including performance can be found at http://ufal.mff.cuni.cz/udpipe/models#universal_dependencies_25_models .
To use these models, you need UDPipe binary version at least 1.2, which you can download from http://ufal.mff.cuni.cz/udpipe .
In addition to models itself, all additional data and value of hyperparameters used for training are available in the second archive, allowing reproducible training.
Tokenizer, POS Tagger, Lemmatizer and Parser models for 99 treebanks of 63 languages of Universal Depenencies 2.6 Treebanks, created solely using UD 2.6 data (https://hdl.handle.net/11234/1-3226). The model documentation including performance can be found at https://ufal.mff.cuni.cz/udpipe/2/models#universal_dependencies_26_models .
To use these models, you need UDPipe version 2.0, which you can download from https://ufal.mff.cuni.cz/udpipe/2 .
VALLEX 3.0 provides information on the valency structure (combinatorial potential) of verbs in their particular senses, which are characterized by glosses and examples. VALLEX 3.0 describes almost 4 600 Czech verbs in more than 10 800 lexical units, i.e., given verbs in the given senses.
VALLEX 3.0 is a is a collection of linguistically annotated data and documentation, resulting from an attempt at formal description of valency frames of Czech verbs. In order to satisfy different needs of different potential users, the lexicon is distributed (i) in a HTML version (the data allows for an easy and fast navigation through the lexicon) and (ii) in a machine-tractable form as a single XML file, so that the VALLEX data can be used in NLP applications.
VALLEX 4.0 provides information on the valency structure (combinatorial potential) of verbs in their particular senses; each sense is by a gloss and examples. VALLEX 4.0 describes almost 4 700 Czech verbs in more than 11 000 lexical units, i.e., given verbs in the given senses. VALLEX 4.0 is a is a collection of linguistically annotated data and documentation, resulting from an attempt at formal description of valency frames of Czech verbs. In order to satisfy different needs of different potential users, the lexicon is distributed (i) in a HTML version (the data allows for an easy and fast navigation through the lexicon) and (ii) in a machine-tractable form, so that the VALLEX data can be used in NLP applications. VALLEX 4.0 provides (in addition to information from previous versions) also characteristics of verbs expressing reciprocity and reflexivity.
The data is provided in two formats: XML and JSON.
VALLEX 4.5 provides information on the valency structure (combinatorial potential) of Czech verbs in their particular senses (almost 4 700 verbs in more than 11 080 lexical units, supplemented with more than 290 nouns in more than 350 lexical units forming complex predicates with light verbs). VALLEX 4.5 is an enhanced successor of VALLEX 3.0, 3.5, and 4.0. In addition to the information stored there, VALLEX 4.5 provides a detailed description of reflexive verbs, i.e., verbs with the reflexive "se" or "si" as an obligatory part of their verb lexemes. VALLEX 4.5 covers 1 525 reflexive verbs in 1 545 lexical units (2 501 when aspectual counterparts counted separately). In order to satisfy different needs of different potential users, the lexicon is distributed (i) online in a HTML version (the data allows for an easy and fast navigation through the lexicon) and (ii) in this distribution in a machine-tractable form, so that the VALLEX data can be used in NLP applications.
We provide the Vietnamese version of the multi-lingual test set from WMT 2013 [1] competition. The Vietnamese version was manually translated from English. For completeness, this record contains the 3000 sentences in all the WMT 2013 original languages (Czech, English, French, German, Russian and Spanish), extended with our Vietnamese version. Test set is used in [2] to evaluate translation between Czech, English and Vietnamese.
References
1. http://www.statmt.org/wmt13/evaluation-task.html
2. Duc Tam Hoang and Ondřej Bojar, The Prague Bulletin of Mathematical Linguistics. Volume 104, Issue 1, Pages 75--86, ISSN 1804-0462. 9/2015