Baseline UDPipe models for CoNLL 2017 Shared Task in UD Parsing, and supplementary material.
The models require UDPipe version at least 1.1 and are evaluated using the official evaluation script.
The models are trained on a slightly different split of the official UD 2.0 CoNLL 2017 training data, the so-called baselinemodel split, to allow comparison of models even during the shared task. This baselinemodel split of the UD 2.0 CoNLL 2017 training data is available for download.
Furthermore, we provide UD 2.0 CoNLL 2017 training data with automatically predicted morphology. The development data are annotated with the baseline models, and the training data are annotated by 10-fold jack-knifing (each fold is predicted with a model trained on the rest of the folds).
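The jack-knifing procedure can be illustrated with a minimal Python sketch. It is not the script used to produce the released data: the function name jackknife, the train and predict callables, and the interleaved assignment of sentences to folds are all assumptions made for illustration. In practice, train and predict would wrap UDPipe model training and tagging.

```python
from typing import Callable, List, Sequence, TypeVar

S = TypeVar("S")  # one training sentence
M = TypeVar("M")  # a trained model
P = TypeVar("P")  # a predicted annotation for one sentence


def jackknife(
    sentences: Sequence[S],
    train: Callable[[List[S]], M],
    predict: Callable[[M, S], P],
    n_folds: int = 10,
) -> List[P]:
    """Annotate every sentence with a model that never saw it in training.

    Hypothetical helper: fold assignment is interleaved (sentence i goes to
    fold i % n_folds), which is an assumption, not the documented split.
    """
    folds = [list(range(start, len(sentences), n_folds)) for start in range(n_folds)]
    predictions: List[P] = [None] * len(sentences)  # type: ignore[list-item]
    for held_out in folds:
        held = set(held_out)
        # Train on all folds except the held-out one.
        rest = [s for i, s in enumerate(sentences) if i not in held]
        model = train(rest)
        # Predict only the held-out fold with that model.
        for i in held_out:
            predictions[i] = predict(model, sentences[i])
    return predictions


# Toy stand-ins: the "model" is just the set of sentences it was trained on,
# and the "prediction" reports whether the sentence was unseen during training.
def toy_train(train_sentences: List[str]) -> frozenset:
    return frozenset(train_sentences)


def toy_predict(model: frozenset, sentence: str) -> bool:
    return sentence not in model


data = [f"sent{i}" for i in range(25)]
result = jackknife(data, toy_train, toy_predict)
```

With the toy functions, every entry of result is True, confirming that no sentence is predicted by a model trained on it. This is the property that makes the predicted morphology on the training data resemble what a parser will see at test time.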
Finally, we supply all required data and hyperparameter values needed to replicate the baseline models.
Baseline UDPipe models for CoNLL 2018 Shared Task in UD Parsing, and supplementary material.
The models require UDPipe version at least 1.2 and are evaluated using the official evaluation script. The models were trained using a custom data split for treebanks where no development data is provided. We also trained an additional "Mixed" model, which uses 200 sentences from each training set. All information needed to replicate the model training (hyperparameters, modified train-dev split, and pre-computed word embeddings for the parser) is included in the archive.
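How data for such a "Mixed" model might be assembled can be sketched as follows. The sketch assumes CoNLL-U input, in which sentences are separated by blank lines, and it assumes the first 200 sentences of each training set are taken; the description above does not say how the 200 sentences were selected. The function names read_sentences and mixed_sample are hypothetical.

```python
from typing import Dict, List


def read_sentences(conllu_text: str) -> List[str]:
    """Split CoNLL-U text into sentence blocks (separated by blank lines)."""
    blocks = conllu_text.strip().split("\n\n")
    return [block.strip() for block in blocks if block.strip()]


def mixed_sample(treebanks: Dict[str, str], per_treebank: int = 200) -> str:
    """Concatenate up to per_treebank sentences from each treebank.

    Assumption: the first sentences are taken; the actual selection method
    used for the released "Mixed" model is not documented here.
    """
    mixed: List[str] = []
    for name in sorted(treebanks):  # sorted for a deterministic order
        mixed.extend(read_sentences(treebanks[name])[:per_treebank])
    return "\n\n".join(mixed) + "\n\n" if mixed else ""


# Tiny illustration with made-up two-column "sentences" instead of real CoNLL-U rows.
toy_treebanks = {
    "lang_a": "1\tA1\n\n1\tA2\n\n1\tA3\n\n",
    "lang_b": "1\tB1\n\n",
}
sample = mixed_sample(toy_treebanks, per_treebank=2)
```

Here sample contains the first two sentences of lang_a followed by the single sentence of lang_b.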
Additionally, we provide UD 2.2 CoNLL 2018 training data with automatically predicted morphology. The development data are annotated with the baseline models, and the training data are annotated by 10-fold jack-knifing (each fold is predicted with a model trained on the rest of the folds).
Lingua::Interset is a universal morphosyntactic feature set to which all tagsets of all corpora/languages can be mapped. Version 2.026 covers 37 different tagsets of 21 languages. Limited support for the older drivers for other languages (which are not included in this package but are available for download elsewhere) is also available; these will be fully ported to Interset 2 in the future.
Interset is implemented as Perl libraries. It is also available via CPAN.