Creator: Zeman, Daniel / Language: Multiple languages - LINDAT/CLARIAH-CZ Catalog Search Results

1. CoNLL 2017 Shared Task - Automatically Annotated Raw Texts and Word Embeddings

Creator:: Ginter, Filip, Hajič, Jan, Luotolahti, Juhani, Straka, Milan, and Zeman, Daniel
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, mlmodel, and languageDescription
Subject:: CoNLL 2017, word embeddings, and automatic annotation
Language:: Multiple languages
Description:: Automatic segmentation, tokenization, and morphological and syntactic annotations of raw texts in 45 languages, generated by UDPipe (http://ufal.mff.cuni.cz/udpipe), together with word embeddings of dimension 100 computed from lowercased texts by word2vec (https://code.google.com/archive/p/word2vec/). For each language, the automatic annotations in CoNLL-U format are provided in a separate archive. The word embeddings for all languages are distributed in one archive. Note that the CC BY-NC-SA 4.0 license applies to the automatically generated annotations and word embeddings, not to the underlying data, which may have a different license and impose additional restrictions. Update 2018-09-03: Added data in the 4 “surprise languages” from the 2017 ST: Buryat, Kurmanji, North Sami and Upper Sorbian. This had been promised earlier: during CoNLL-ST 2018 we gave the participants a link to this record, saying the data was here. It wasn't, sorry. But now it is.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
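
As a rough illustration of how the CoNLL-U annotations mentioned in the description might be read, the following minimal sketch splits a CoNLL-U stream into sentences and maps the ten tab-separated columns of each token line to field names. The function name `read_conllu` and the sample sentence are assumptions made for this example; they are not taken from the record or from the archives, and the sketch does no validation beyond counting columns.

```python
from typing import Dict, Iterable, Iterator, List

# The ten columns of a CoNLL-U token line, in order.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def read_conllu(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dictionaries.

    Hypothetical helper for illustration only.
    """
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comment lines (e.g. "# text = ...") are skipped.
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        sentence.append(dict(zip(CONLLU_FIELDS, cols)))
    if sentence:
        # Handle a final sentence that is not followed by a blank line.
        yield sentence


# Made-up sample, not taken from the distributed data.
SAMPLE = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)

sentences = list(read_conllu(SAMPLE.splitlines()))
for token in sentences[0]:
    print(token["id"], token["form"], token["upos"], token["head"], token["deprel"])
```

For real archives, the same function could be fed an open file handle instead of `SAMPLE.splitlines()`; how the files inside each archive are named and compressed is not stated in this record.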

2. Lingua::Interset 2.026

Creator:: Zeman, Daniel
Publisher:: Charles University, Faculty of Mathematics and Physics
Type:: tool and toolService
Subject:: morphology, part of speech, conversion, and tagset
Language:: Arabic, Bulgarian, Bengali, Catalan, Czech, Danish, German, Modern Greek (1453-), English, Spanish, Estonian, Basque, Persian, Finnish, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Japanese, Multiple languages, and Portuguese
Description:: Lingua::Interset is a universal morphosyntactic feature set to which all tagsets of all corpora/languages can be mapped. Version 2.026 covers 37 different tagsets of 21 languages. Limited support is also available for the older drivers for other languages (which are not included in this package but are available for download elsewhere); these will be fully ported to Interset 2 in the future. Interset is implemented as a set of Perl libraries and is also available via CPAN.
Rights:: Artistic License (Perl) 1.0, http://opensource.org/licenses/Artistic-Perl-1.0, and PUB
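
The idea of mapping tagsets through a shared feature set can be sketched as follows. This is a conceptual toy in Python only: it is not the Lingua::Interset API (which is a Perl library), and the two tagsets, their tags, the feature names, and the `convert` function are all invented for the example. A tag from a hypothetical tagset A is decoded into a feature structure, which is then encoded as the matching tag of a hypothetical tagset B.

```python
from typing import Dict, FrozenSet, Tuple

Features = Dict[str, str]

# Hypothetical tagset A: tag -> shared feature structure (decoding table).
TAGSET_A: Dict[str, Features] = {
    "NN": {"pos": "noun", "number": "sing"},
    "NNS": {"pos": "noun", "number": "plur"},
    "VB": {"pos": "verb"},
}

# Hypothetical tagset B: tag -> the same kind of feature structure.
TAGSET_B: Dict[str, Features] = {
    "N.sg": {"pos": "noun", "number": "sing"},
    "N.pl": {"pos": "noun", "number": "plur"},
    "V": {"pos": "verb"},
}


def _key(features: Features) -> FrozenSet[Tuple[str, str]]:
    """Make a feature structure hashable so it can serve as a lookup key."""
    return frozenset(features.items())


# Encoding table for tagset B: feature structure -> tag.
ENCODE_B = {_key(feats): tag for tag, feats in TAGSET_B.items()}


def convert(tag_a: str) -> str:
    """Convert a tag of tagset A to tagset B via the shared features."""
    features = TAGSET_A[tag_a]        # decode: tag -> features
    return ENCODE_B[_key(features)]   # encode: features -> tag


print(convert("NNS"))
```

With a shared intermediate representation, each tagset needs only one decoder and one encoder rather than a dedicated converter for every pair of tagsets. This toy requires an exact feature match; it does not show how a real converter would handle feature combinations that the target tagset cannot express.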
