This package contains an extended version of the test collection used in the CLEF eHealth Information Retrieval tasks in 2013--2015. Compared to the original version, it provides complete query translations into Czech, French, German, Hungarian, Polish, Spanish and Swedish and additional relevance assessment.
HamleDT 2.0 is a collection of 30 existing treebanks harmonized into a common annotation style, the Prague Dependencies, and further transformed into Stanford Dependencies, a treebank annotation style that became popular recently. We use the newest basic Universal Stanford Dependencies, without added language-specific subtypes.
HamleDT (HArmonized Multi-LanguagE Dependency Treebank) is a compilation of existing dependency treebanks (or dependency conversions of other treebanks), transformed so that they all conform to the same annotation style. This version uses Universal Dependencies as the common annotation style.
Update (November 1017): for a current collection of harmonized dependency treebanks, we recommend using the Universal Dependencies (UD). All of the corpora that are distributed in HamleDT in full are also part of the UD project; only some corpora from the Patch group (where HamleDT provides only the harmonizing scripts but not the full corpus data) are available in HamleDT but not in UD.
This package contains data sets for development and testing of machine translation of medical queries between Czech, English, French, German, Hungarian, Polish, Spanish ans Swedish. The queries come from general public and medical experts. This is version 2.0 extending the previous version by adding Hungarian, Polish, Spanish, and Swedish translations.
This package contains data sets for development (Section dev) and testing (Section test) of machine translation of sentences from summaries of medical articles between Czech, English, French, German, Hungarian, Polish, Spanish
and Swedish. Version 2.0 extends the previous version by adding Hungarian, Polish, Spanish, and Swedish translations.
Lingua::Interset is a universal morphosyntactic feature set to which all tagsets of all corpora/languages can be mapped. Version 2.026 covers 37 different tagsets of 21 languages. Limited support of the older drivers for other languages (which are not included in this package but are available for download elsewhere) is also available; these will be fully ported to Interset 2 in future.
Interset is implemented as Perl libraries. It is also available via CPAN.
NER models for NameTag 2, named entity recognition tool, for English, German, Dutch, Spanish and Czech. Model documentation including performance can be found here: https://ufal.mff.cuni.cz/nametag/2/models . These models are for NameTag 2, named entity recognition tool, which can be found here: https://ufal.mff.cuni.cz/nametag/2 .