Contributor: Ministerstvo školství, mládeže a tělovýchovy České republiky@@LM2015071@@LINDAT/CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat@@nationalFunds@@ / Language: Swedish / Original context has metadata only: false / Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)

1. CoNLL 2017 and 2018 Shared Task Blind and Preprocessed Test Data

Creator:: Zeman, Daniel and Straka, Milan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: tokenization, word segmentation, morphology, tagging, syntax, parsing, and universal dependencies
Language:: Afrikaans, Arabic, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Persian, Finnish, French, Old French (842-ca. 1400), Irish, Galician, Gothic, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Latin, Latvian, Dutch, Norwegian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Thai, Turkish, Uighur, Ukrainian, Urdu, Vietnamese, and Chinese
Description:: CoNLL 2017 and 2018 shared tasks: Multilingual Parsing from Raw Text to Universal Dependencies This package contains the test data in the form in which they ware presented to the participating systems: raw text files and files preprocessed by UDPipe. The metadata.json files contain lists of files to process and to output; README files in the respective folders describe the syntax of metadata.json. For full training, development and gold standard test data, see Universal Dependencies 2.0 (CoNLL 2017) Universal Dependencies 2.2 (CoNLL 2018) See the download links at http://universaldependencies.org/. For more information on the shared tasks, see http://universaldependencies.org/conll17/ http://universaldependencies.org/conll18/ Contents: conll17-ud-test-2017-05-09 ... CoNLL 2017 test data conll18-ud-test-2018-05-06 ... CoNLL 2018 test data conll18-ud-test-2018-05-06-for-conll17 ... CoNLL 2018 test data with metadata and filenames modified so that it is digestible by the 2017 systems.
Rights:: Licence Universal Dependencies v2.2, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.2, and PUB

2. Deep Universal Dependencies 2.4

Creator:: Zeman, Daniel and Droganova, Kira
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: semantic dependency and universal dependencies
Language:: Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, and Galician
Description:: Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-2988). It contains additional deep-syntactic and semantic annotations. Version of Deep UD corresponds to the version of UD it is based on. Note however that some UD treebanks have been omitted from Deep UD.
Rights:: Licence Universal Dependencies v2.4, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.4, and PUB

3. Deep Universal Dependencies 2.5

Creator:: Zeman, Daniel and Droganova, Kira
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: semantic dependency and universal dependencies
Language:: Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, Galician, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, and Skolt Sami
Description:: Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3105). It contains additional deep-syntactic and semantic annotations. Version of Deep UD corresponds to the version of UD it is based on. Note however that some UD treebanks have been omitted from Deep UD.
Rights:: Licence Universal Dependencies v2.5, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.5, and PUB

4. Khresmoi Query Translation Test Data 2.0

Creator:: Pecina, Pavel, Dušek, Ondřej, Hajič, Jan, Libovický, Jindřich, and Urešová, Zdeňka
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: corpus, test data, medical, health, machine translation, Czech, English, French, German, Hungarian, Polish, Spanish, and Swedish
Language:: Czech, English, French, German, Hungarian, Polish, Spanish, and Swedish
Description:: This package contains data sets for development and testing of machine translation of medical queries between Czech, English, French, German, Hungarian, Polish, Spanish ans Swedish. The queries come from general public and medical experts. This is version 2.0 extending the previous version by adding Hungarian, Polish, Spanish, and Swedish translations.
Rights:: Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB

5. Khresmoi Summary Translation Test Data 2.0

Creator:: Dušek, Ondřej, Hajič, Jan, Hlaváčová, Jaroslava, Libovický, Jindřich, Pecina, Pavel, Tamchyna, Aleš, and Urešová, Zdeňka
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: corpus, test data, medical, health, machine translation, Czech, English, French, German, Hungarian, Polish, Spanish, and Swedish
Language:: Czech, English, French, German, Hungarian, Polish, Spanish, and Swedish
Description:: This package contains data sets for development (Section dev) and testing (Section test) of machine translation of sentences from summaries of medical articles between Czech, English, French, German, Hungarian, Polish, Spanish and Swedish. Version 2.0 extends the previous version by adding Hungarian, Polish, Spanish, and Swedish translations.
Rights:: Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB

6. Slavic Forest, Norwegian Wood (scripts)

Creator:: Rosa, Rudolf, Zeman, Daniel, Mareček, David, and Žabokrtský, Zdeněk
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: suiteOfTools and toolService
Subject:: parsing, dependency parser, universal dependencies, and cross-lingual parsing
Language:: Czech, Slovak, Slovenian, Croatian, Danish, Swedish, and Norwegian
Description:: Tools and scripts used to create the cross-lingual parsing models submitted to VarDial 2017 shared task (https://bitbucket.org/hy-crossNLP/vardial2017), as described in the linked paper. The trained UDPipe models themselves are published in a separate submission (https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-1971). For each source (SS, e.g. sl) and target (TT, e.g. hr) language, you need to add the following into this directory: - treebanks (Universal Dependencies v1.4): SS-ud-train.conllu TT-ud-predPoS-dev.conllu - parallel data (OpenSubtitles from Opus): OpenSubtitles2016.SS-TT.SS OpenSubtitles2016.SS-TT.TT !!! If they are originally called ...TT-SS... instead of ...SS-TT..., you need to symlink them (or move, or copy) !!! - target tagging model TT.tagger.udpipe All of these can be obtained from https://bitbucket.org/hy-crossNLP/vardial2017 You also need to have: - Bash - Perl 5 - Python 3 - word2vec (https://code.google.com/archive/p/word2vec/); we used rev 41 from 15th Sep 2014 - udpipe (https://github.com/ufal/udpipe); we used commit 3e65d69 from 3rd Jan 2017 - Treex (https://github.com/ufal/treex); we used commit d27ee8a from 21st Dec 2016 The most basic setup is the sl-hr one (train_sl-hr.sh): - normalization of deprels - 1:1 word-alignment of parallel data with Monolingual Greedy Aligner - simple word-by-word translation of source treebank - pre-training of target word embeddings - simplification of morpho feats (use only Case) - and finally, training and evaluating the parser Both da+sv-no (train_ds-no.sh) and cs-sk (train_cs-sk.sh) add some cross-tagging, which seems to be useful only in specific cases (see paper for details). Moreover, cs-sk also adds more morpho features, selecting those that seem to be very often shared in parallel data. The whole pipeline takes tens of hours to run, and uses several GB of RAM, so make sure to use a powerful computer.
Rights:: GNU General Public License 2 or later (GPL-2.0), http://opensource.org/licenses/GPL-2.0, and PUB

10. Universal Derivations v1.0

Creator:: Kyjánek, Lukáš, Žabokrtský, Zdeněk, Vidra, Jonáš, and Ševčíková, Magda
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, lexicon, and lexicalConceptualResource
Subject:: universal derivations, uder, word-formation, derivation, derivational morphology, lexical network, and harmonization
Language:: Czech, English, Estonian, Finnish, German, French, Latin, Persian, Polish, Portuguese, Spanish, Catalan, Turkish, Scottish Gaelic, Russian, Swedish, Serbo-Croatian, Italian, Dutch, and Croatian
Description:: Universal Derivations (UDer) is a collection of harmonized lexical networks capturing word-formation, especially derivational relations, in a cross-linguistically consistent annotation scheme for many languages. The annotation scheme is based on a rooted tree data structure, in which nodes correspond to lexemes, while edges represent derivational relations or compounding. The current version of the UDer collection contains twenty-seven harmonized resources covering twenty different languages.
Rights:: Universal Derivations v1.0 License Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UDer-1.0, and PUB

1. CoNLL 2017 and 2018 Shared Task Blind and Preprocessed Test Data

2. Deep Universal Dependencies 2.4

3. Deep Universal Dependencies 2.5

4. Khresmoi Query Translation Test Data 2.0

5. Khresmoi Summary Translation Test Data 2.0

6. Slavic Forest, Norwegian Wood (scripts)

7. UDify Pretrained Model

8. Universal Dependencies 2.4 Models for UDPipe (2019-05-31)

9. Universal Dependencies 2.5 Models for UDPipe (2019-12-06)

10. Universal Derivations v1.0

Limit your search

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Search

Search Constraints

Search Results

Limit your search

Contributor

Show values starting with

Creator

Show values starting with

Language

Show values starting with

Publisher

Rights

Show values starting with

Subject

Show values starting with

Type

Original context has metadata only

Harvested from