This package contains data sets for development (Section dev) and testing (Section test) of machine translation of sentences from summaries of medical articles between Czech, English, French, German, Hungarian, Polish, Spanish
and Swedish. Version 2.0 extends the previous version by adding Hungarian, Polish, Spanish, and Swedish translations.
Lingua::Interset is a universal morphosyntactic feature set to which all tagsets of all corpora/languages can be mapped. Version 2.026 covers 37 different tagsets of 21 languages. Limited support of the older drivers for other languages (which are not included in this package but are available for download elsewhere) is also available; these will be fully ported to Interset 2 in future.
Interset is implemented as Perl libraries. It is also available via CPAN.
Multilingual lexical database that follows the model proposed by the EuroWordNet project. The MCR integrates into the same EuroWordNet framework wordnets from five different languages (together with four English WordNet versions). It also integrates WordNet Domains and new versions of the Base Concepts and Top Concept Ontology. Overall, it contains 1,642,389 semantic relations between synsets, most of them acquired by automatic means. Information contained: semantics, synonyms, antonyms, definition, equivalents, example of use, morphology.
NER models for NameTag 2, named entity recognition tool, for English, German, Dutch, Spanish and Czech. Model documentation including performance can be found here: https://ufal.mff.cuni.cz/nametag/2/models . These models are for NameTag 2, named entity recognition tool, which can be found here: https://ufal.mff.cuni.cz/nametag/2 .
NER models for NameTag 2, named entity recognition tool, for English, German, Dutch, Spanish and Czech. Model documentation including performance can be found here: https://ufal.mff.cuni.cz/nametag/2/models . These models are for NameTag 2, named entity recognition tool, which can be found here: https://ufal.mff.cuni.cz/nametag/2 .
This is a trained model for the supervised machine learning tool NameTag 3 (https://ufal.mff.cuni.cz/nametag/3/), trained jointly on several NE corpora: English CoNLL-2003, German CoNLL-2003, Dutch CoNLL-2002, Spanish CoNLL-2002, Ukrainian Lang-uk, and Czech CNEC 2.0, all harmonized to flat NEs with 4 labels PER, ORG, LOC, and MISC. NameTag 3 is an open-source tool for both flat and nested named entity recognition (NER). NameTag 3 identifies proper names in text and classifies them into a set of predefined categories, such as names of persons, locations, organizations, etc. The model documentation can be found at https://ufal.mff.cuni.cz/nametag/3/models#multilingual-conll.