Search Results
2. CoNLL 2017 and 2018 Shared Task Blind and Preprocessed Test Data
- Creator:
- Zeman, Daniel and Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- tokenization, word segmentation, morphology, tagging, syntax, parsing, and universal dependencies
- Language:
- Afrikaans, Arabic, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Persian, Finnish, French, Old French (842-ca. 1400), Irish, Galician, Gothic, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Latin, Latvian, Dutch, Norwegian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Thai, Turkish, Uighur, Ukrainian, Urdu, Vietnamese, and Chinese
- Description:
- CoNLL 2017 and 2018 shared tasks: Multilingual Parsing from Raw Text to Universal Dependencies. This package contains the test data in the form in which they were presented to the participating systems: raw text files and files preprocessed by UDPipe. The metadata.json files contain lists of files to process and to output; README files in the respective folders describe the syntax of metadata.json. For full training, development and gold standard test data, see Universal Dependencies 2.0 (CoNLL 2017) and Universal Dependencies 2.2 (CoNLL 2018); see the download links at http://universaldependencies.org/. For more information on the shared tasks, see http://universaldependencies.org/conll17/ and http://universaldependencies.org/conll18/. Contents: conll17-ud-test-2017-05-09 (CoNLL 2017 test data); conll18-ud-test-2018-05-06 (CoNLL 2018 test data); conll18-ud-test-2018-05-06-for-conll17 (CoNLL 2018 test data with metadata and filenames modified so that it is digestible by the 2017 systems).
- Rights:
- Licence Universal Dependencies v2.2, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.2, and PUB
3. CoNLL 2018 Shared Task System Outputs
- Creator:
- Zeman, Daniel, Potthast, Martin, Duthoo, Elie, Mesnard, Olivier, Rybak, Piotr, Wróblewska, Alina, Che, Wanxiang, Liu, Yijia, Wang, Yuxuan, Zheng, Bo, Liu, Ting, Li, Zuchao, He, Shexia, Zhang, Zhuosheng, Zhao, Hai, Wu, Yingting, Tong, Jia-Jun, Nguyen, Dat Quoc, Verspoor, Karin, Wan, Hui, Naseem, Tahira, Lee, Young-Suk, Castelli, Vittorio, Ballesteros, Miguel, Hershcovich, Daniel, Abend, Omri, Rappoport, Ari, Smith, Aaron, Bohnet, Bernd, de Lhoneux, Miryam, Nivre, Joakim, Shao, Yan, Stymne, Sara, Kırnap, Ömer, Dayanık, Erenay, Yuret, Deniz, Kanerva, Jenna, Ginter, Filip, Miekka, Niko, Leino, Akseli, Salakoski, Tapio, Lim, KyungTae, Park, Cheoneum, Lee, Changki, Poibeau, Thierry, Bhat, Riyaz Ahmad, Bhat, Irshad, Bangalore, Srinivas, Qi, Peng, Dozat, Timothy, Zhang, Yuhao, Manning, Christopher, Boroș, Tiberiu, Dumitrescu, Stefan Daniel, Burtica, Ruxandra, Arakelyan, Gor, Hambardzumyan, Karen, Khachatrian, Hrant, Rosa, Rudolf, Mareček, David, Straka, Milan, Seker, Amit, More, Amir, Tsarfaty, Reut, Önder, Berkay Furkan, Gümeli, Can, Jawahar, Ganesh, Muller, Benjamin, Fethi, Amal, Martin, Louis, Villemonte de la Clergerie, Eric, Sagot, Benoît, Seddah, Djamé, Özateş, Şaziye Betül, Özgür, Arzucan, Gungor, Tunga, Öztürk, Balkız, Ji, Tao, Liu, Yufang, Wang, Yijun, Wu, Yuanbin, Lan, Man, Chen, Danlu, Lin, Mengxiao, Hu, Zhifeng, and Qiu, Xipeng
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- parsed data, conllu, and universal dependencies
- Language:
- Afrikaans, Arabic, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Persian, Finnish, French, Old French (842-ca. 1400), Irish, Galician, Gothic, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Latin, Latvian, Dutch, Norwegian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Thai, Turkish, Uighur, Ukrainian, Urdu, Vietnamese, and Chinese
- Description:
- Test data parsed by systems submitted to the CoNLL 2018 UD parsing shared task.
- Rights:
- Licence Universal Dependencies v2.2, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.2, and PUB
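The system outputs in this record are described as parsed data in CoNLL-U, a plain-text format with ten tab-separated columns per token, `#` comment lines, and a blank line after each sentence. The following is a minimal reading sketch in Python, not the tooling used by the shared task; the function name `read_conllu` and the sample sentence are illustrative assumptions.

```python
from typing import Dict, Iterator, List

# The ten CoNLL-U columns, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(text: str) -> Iterator[List[Dict[str, str]]]:
    """Yield each sentence as a list of token dicts (hypothetical helper)."""
    sentence: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comment such as "# text = ..."; ignored here.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:
        # The input may end without a trailing blank line.
        yield sentence


# Made-up sample sentence, not taken from the shared task data.
SAMPLE = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)
sentences = list(read_conllu(SAMPLE))
```

For real evaluation work, an established CoNLL-U library or the official shared task evaluation script is likely a better choice than this sketch.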
4. DaMuEL 1.0: A Large Multilingual Dataset for Entity Linking
- Creator:
- Kubeša, David and Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- entity linking, NEL, NER, dataset, and knowledge base
- Language:
- Afrikaans, Arabic, Armenian, Basque, Belarusian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Latin, Latvian, Lithuanian, Maltese, Marathi, Modern Greek (1453-), Northern Sami, Norwegian Nynorsk, Persian, Polish, Portuguese, Romanian, Russian, Scottish Gaelic, Serbian, Slovak, Slovenian, Spanish, Swedish, Tamil, Telugu, Uighur, Ukrainian, Urdu, Vietnamese, and Wolof
- Description:
- We present DaMuEL, a large Multilingual Dataset for Entity Linking containing data in 53 languages. DaMuEL consists of two components: a knowledge base that contains language-agnostic information about entities, including their claims from Wikidata and named entity types (PER, ORG, LOC, EVENT, BRAND, WORK_OF_ART, MANUFACTURED); and Wikipedia texts with entity mentions linked to the knowledge base, along with language-specific text from Wikidata such as labels, aliases, and descriptions, stored separately for each language. The Wikidata QID is used as a persistent, language-agnostic identifier, enabling the combination of the knowledge base with language-specific texts and information for each entity. Wikipedia documents deliberately annotate only a single mention for every entity present; we further automatically detect all mentions of named entities linked from each document. The dataset contains 27.9M named entities in the knowledge base and 12.3G tokens from Wikipedia texts. The dataset is published under the CC BY-SA licence.
- Rights:
- Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB
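The DaMuEL description says that the Wikidata QID ties the language-agnostic knowledge base to the per-language texts. A minimal sketch of such a join in Python follows; the record layout, the field names (`type`, `label`, `aliases`) and the helper `merge_entity` are assumptions made for illustration and do not reflect the actual DaMuEL file format.

```python
from typing import Dict

# Hypothetical, simplified records keyed by Wikidata QID.
# The real DaMuEL file layout is not described in this record.
knowledge_base: Dict[str, dict] = {
    "Q42": {"type": "PER"},
}
english_texts: Dict[str, dict] = {
    "Q42": {"label": "Douglas Adams", "aliases": ["Douglas Noel Adams"]},
}
czech_texts: Dict[str, dict] = {
    "Q42": {"label": "Douglas Adams", "aliases": []},
}


def merge_entity(qid: str, kb: Dict[str, dict],
                 per_language: Dict[str, Dict[str, dict]]) -> dict:
    """Combine the language-agnostic entry with every language that has the QID."""
    entry = dict(kb.get(qid, {}))  # copy so the knowledge base is not mutated
    entry["texts"] = {
        lang: texts[qid]
        for lang, texts in per_language.items()
        if qid in texts
    }
    return entry


merged = merge_entity("Q42", knowledge_base,
                      {"en": english_texts, "cs": czech_texts})
```

Keeping the knowledge base and the language-specific texts in separate mappings mirrors the two-component design described above: an entity is stored once, and each language contributes only its own labels, aliases and descriptions.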
5. Deep Universal Dependencies 2.4
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, and Galician
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-2988). It contains additional deep-syntactic and semantic annotations. The version of Deep UD corresponds to the version of UD it is based on. Note, however, that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.4, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.4, and PUB
6. Deep Universal Dependencies 2.5
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, Galician, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, and Skolt Sami
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3105). It contains additional deep-syntactic and semantic annotations. The version of Deep UD corresponds to the version of UD it is based on. Note, however, that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.5, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.5, and PUB
7. Deep Universal Dependencies 2.6
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, Galician, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Icelandic, Albanian, and Persian
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3226). It contains additional deep-syntactic and semantic annotations. The version of Deep UD corresponds to the version of UD it is based on. Note, however, that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.6, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.6, and PUB
8. Deep Universal Dependencies 2.7
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, Galician, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Icelandic, Albanian, Persian, Akuntsu, Apurinã, Khunsari, Manx, Mundurukú, Nayini, Soi, South Levantine Arabic, and Tupinambá
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3424). It contains additional deep-syntactic and semantic annotations. The version of Deep UD corresponds to the version of UD it is based on. Note, however, that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.7, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.7, and PUB
9. Deep Universal Dependencies 2.8
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, Galician, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Icelandic, Albanian, Persian, Akuntsu, Apurinã, Khunsari, Manx, Mundurukú, Nayini, Soi, South Levantine Arabic, Tupinambá, Beja, Western Frisian, Urubú-Kaapor, Kangri, K'iche', Low German, Makuráp, Western Armenian, and Central Siberian Yupik
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3687). It contains additional deep-syntactic and semantic annotations. The version of Deep UD corresponds to the version of UD it is based on. Note, however, that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.8, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.8, and PUB
10. Deltacorpus
- Creator:
- Mareček, David, Yu, Zhiwei, Zeman, Daniel, and Žabokrtský, Zdeněk
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- part of speech, tagging, semi-supervised, and cross-language
- Language:
- Belarusian, Bosnian, Bulgarian, Czech, Serbo-Croatian, Croatian, Upper Sorbian, Macedonian, Polish, Russian, Slovak, Slovenian, Serbian, Ukrainian, Latvian, Lithuanian, Afrikaans, Danish, German, English, Faroese, Western Frisian, Swiss German, Icelandic, Limburgan, Luxembourgish, Low German, Dutch, Norwegian Nynorsk, Norwegian, Scots, Swedish, Yiddish, Aragonese, Asturian, Catalan, French, Galician, Haitian, Italian, Latin, Lombard, Neapolitan, Piemontese, Portuguese, Romanian, Spanish, Venetian, Walloon, Breton, Welsh, Scottish Gaelic, Irish, Modern Greek (1453-), Armenian, Albanian, Dimli (individual language), Persian, Gilaki, Kurdish, Tajik, Bengali, Bishnupriya, Gujarati, Fiji Hindi, Hindi, Marathi, Nepali (macrolanguage), Urdu, Amharic, Arabic, Egyptian Arabic, Hebrew, Estonian, Finnish, Hungarian, Basque, Georgian, Chuvash, Azerbaijani, Turkish, Uzbek, Kazakh, Tatar, Yakut, Korean, Mongolian, Telugu, Kannada, Malayalam, Tamil, Newari, Vietnamese, Indonesian, Javanese, Malagasy, Maori, Malay (macrolanguage), Pampanga, Sundanese, Tagalog, Waray (Philippines), Swahili (macrolanguage), Esperanto, Ido, Interlingua (International Auxiliary Language Association), and Volapük
- Description:
- Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), first 1,000,000 tokens per language, tagged by the delexicalized tagger described in Yu et al. (2016, LREC, Portorož, Slovenia).
- Rights:
- Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB
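The Deltacorpus description mentions keeping the first 1,000,000 tokens per language. A minimal sketch of that kind of truncation in Python follows, assuming whitespace-separated tokens (the corpus's actual tokenization is not stated here); the helper name `first_tokens` and the sample lines are illustrative.

```python
from itertools import islice
from typing import Iterable, Iterator


def first_tokens(lines: Iterable[str], limit: int) -> Iterator[str]:
    """Lazily yield at most `limit` whitespace-separated tokens (assumed tokenization)."""
    tokens = (tok for line in lines for tok in line.split())
    return islice(tokens, limit)


# Tiny made-up sample; a real run would pass an open file and limit=1_000_000.
sample = ["one two three", "four five", "six"]
kept = list(first_tokens(sample, 4))
```

Because both the generator and `islice` are lazy, reading stops as soon as the limit is reached, so a large input file does not need to be loaded in full.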
11. Deltacorpus 1.1
- Creator:
- Mareček, David, Yu, Zhiwei, Zeman, Daniel, and Žabokrtský, Zdeněk
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- part of speech, tagging, semi-supervised, and cross-language
- Language:
- Belarusian, Bosnian, Bulgarian, Czech, Serbo-Croatian, Croatian, Upper Sorbian, Macedonian, Polish, Russian, Slovak, Slovenian, Serbian, Ukrainian, Latvian, Lithuanian, Afrikaans, Danish, German, English, Faroese, Western Frisian, Swiss German, Icelandic, Limburgan, Luxembourgish, Low German, Dutch, Norwegian Nynorsk, Norwegian, Scots, Swedish, Yiddish, Aragonese, Asturian, Catalan, French, Galician, Haitian, Italian, Latin, Lombard, Neapolitan, Piemontese, Portuguese, Romanian, Spanish, Venetian, Walloon, Breton, Welsh, Scottish Gaelic, Irish, Modern Greek (1453-), Armenian, Albanian, Dimli (individual language), Persian, Gilaki, Kurdish, Tajik, Bengali, Bishnupriya, Gujarati, Fiji Hindi, Hindi, Marathi, Nepali (macrolanguage), Urdu, Amharic, Arabic, Egyptian Arabic, Hebrew, Estonian, Finnish, Hungarian, Basque, Georgian, Chuvash, Azerbaijani, Turkish, Uzbek, Kazakh, Tatar, Yakut, Korean, Mongolian, Telugu, Kannada, Malayalam, Tamil, Newari, Vietnamese, Indonesian, Javanese, Malagasy, Maori, Malay (macrolanguage), Pampanga, Sundanese, Tagalog, Waray (Philippines), Swahili (macrolanguage), Esperanto, Ido, Interlingua (International Auxiliary Language Association), and Volapük
- Description:
- Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), first 1,000,000 tokens per language, tagged by the delexicalized tagger described in Yu et al. (2016, LREC, Portorož, Slovenia). Changes in version 1.1: 1. The Universal Dependencies tagset is used instead of the older and smaller Google Universal POS tagset. 2. The SVM classifier was trained on Universal Dependencies 1.2 instead of HamleDT 2.0. 3. Balto-Slavic, Germanic and Romance languages were each tagged by a classifier trained only on the respective group of languages. Other languages were tagged by a classifier trained on all available languages. The "c7" combination from version 1.0 is no longer used.
- Rights:
- Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB
12. Plaintext Wikipedia dump 2018
- Creator:
- Rosa, Rudolf
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- Wikipedia, text corpora, and monolingual corpus
- Language:
- Abkhazian, Achinese, Adyghe, Afrikaans, Akan, Tosk Albanian, Amharic, Old English (ca. 450-1100), Arabic, Official Aramaic (700-300 BCE), Aragonese, Egyptian Arabic, Assamese, Asturian, Atikamekw, Avaric, Aymara, South Azerbaijani, Azerbaijani, Bashkir, Bambara, Bavarian, Central Bikol, Belarusian, Bengali, Bislama, Banjar, Tibetan, Bosnian, Bishnupriya, Breton, Buginese, Bulgarian, Russia Buriat, Catalan, Min Dong Chinese, Cebuano, Czech, Chamorro, Chechen, Cherokee, Church Slavic, Chuvash, Cheyenne, Central Kurdish, Cornish, Corsican, Cree, Crimean Tatar, Kashubian, Welsh, Danish, German, Dinka, Dimli (individual language), Dhivehi, Lower Sorbian, Dzongkha, Modern Greek (1453-), English, Esperanto, Estonian, Basque, Ewe, Extremaduran, Faroese, Persian, Fijian, Finnish, French, Arpitan, Northern Frisian, Western Frisian, Fulah, Friulian, Gagauz, Gan Chinese, Scottish Gaelic, Irish, Galician, Gilaki, Manx, Goan Konkani, Gothic, Guarani, Gujarati, Hakka Chinese, Haitian, Hausa, Hawaiian, Serbo-Croatian, Hebrew, Herero, Fiji Hindi, Hindi, Hiri Motu, Croatian, Upper Sorbian, Hungarian, Armenian, Igbo, Ido, Inuktitut, Interlingue, Iloko, Interlingua (International Auxiliary Language Association), Indonesian, Inupiaq, Icelandic, Italian, Jamaican Creole English, Javanese, Lojban, Japanese, Kara-Kalpak, Kabyle, Kalaallisut, Kannada, Kashmiri, Georgian, Kanuri, Kazakh, Kabardian, Kabiyè, Khmer, Kikuyu, Kinyarwanda, Kirghiz, Komi-Permyak, Komi, Kongo, Korean, Karachay-Balkar, Kölsch, Kurdish, Ladino, Lao, Latin, Latvian, Lak, Lezghian, Ligurian, Limburgan, Lingala, Lithuanian, Lombard, Northern Luri, Latgalian, Luxembourgish, Ganda, Literary Chinese, Marshallese, Maithili, Malayalam, Marathi, Moksha, Eastern Mari, Minangkabau, Macedonian, Malagasy, Maltese, Mongolian, Maori, Western Mari, Malay (macrolanguage), Creek, Mirandese, Burmese, Erzya, Mazanderani, Min Nan Chinese, Neapolitan, Nauru, Navajo, Ndonga, Low German, Nepali (macrolanguage), Newari, Dutch, Norwegian 
Nynorsk, Norwegian, Novial, Pedi, Nyanja, Occitan (post 1500), Livvi, Oriya (macrolanguage), Oromo, Ossetian, Pangasinan, Pampanga, Panjabi, Papiamento, Picard, Pennsylvania German, Pfaelzisch, Pitcairn-Norfolk, Pali, Piemontese, Western Panjabi, Pontic, Polish, Portuguese, Pushto, Quechua, Vlax Romani, Romansh, Romanian, Rusyn, Rundi, Macedo-Romanian, Russian, Sango, Yakut, Sanskrit, Sicilian, Scots, Samogitian, Sinhala, Slovak, Slovenian, Northern Sami, Samoan, Shona, Sindhi, Somali, Southern Sotho, Spanish, Albanian, Sardinian, Sranan Tongo, Serbian, Swati, Saterfriesisch, Sundanese, Swahili (macrolanguage), Swedish, Silesian, Tahitian, Tamil, Tatar, Tulu, Telugu, Tama (Colombia), Tetum, Tajik, Tagalog, Thai, Tigrinya, Tonga (Tonga Islands), Tok Pisin, Tswana, Tsonga, Turkmen, Tumbuka, Turkish, Twi, Tuvinian, Udmurt, Uighur, Ukrainian, Urdu, Uzbek, Venetian, Venda, Veps, Vietnamese, Vlaams, Volapük, Võro, Waray (Philippines), Walloon, Wolof, Wu Chinese, Kalmyk, Xhosa, Mingrelian, Yiddish, Yoruba, Yue Chinese, Zeeuws, Zhuang, Chinese, Zulu, and Dotyali
- Description:
- Wikipedia plain text data obtained from Wikipedia dumps with WikiExtractor in February 2018. The data come from all Wikipedias for which dumps could be downloaded at [https://dumps.wikimedia.org/]. This amounts to 297 Wikipedias, usually corresponding to individual languages and identified by their ISO codes. Several special Wikipedias are included, most notably "simple" (Simple English Wikipedia) and "incubator" (tiny hatching Wikipedias in various languages). For a list of all the Wikipedias, see [https://meta.wikimedia.org/wiki/List_of_Wikipedias]. The script that can be used to get a new version of the data is included, but note that Wikipedia limits the download speed when many dumps are downloaded, so it takes a few days to download all of them (one or a few can be downloaded quickly). Also, the format of the dumps changes from time to time, so the script will probably stop working eventually. The WikiExtractor tool [http://medialab.di.unipi.it/wiki/Wikipedia_Extractor] used to extract text from the Wikipedia dumps is not mine; I only modified it slightly to produce plaintext outputs [https://github.com/ptakopysk/wikiextractor].
- Rights:
- Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0), http://creativecommons.org/licenses/by-sa/3.0/, and PUB
13. The art of the Armenian book through the ages :
- Creator:
- Utidjian, Haig,
- Type:
- text, still image, exhibition catalogues, and monographs
- Subject:
- Early printed books, book culture, Armenian literature, manuscripts, Armenia, surveys of world history (chronological), history of the book, printing, publishing, and manuscripts and early printed books
- Language:
- English, Armenian, and Czech
- Description:
- Companion publication to the exhibition of the same name, held in Prague, 11-25 October 2016
- Rights:
- unknown
14. TITUS Old Armenian
- Format:
- text/html
- Type:
- corpus
- Language:
- Armenian
- Description:
- ca. 1.000.000 tokens; linked with relational database; XML-encoding in progress
- Rights:
- http://titus.uni-frankfurt.de/texte/texte2.htm#Estart
15. Treasures of the earliest Christian nation :
- Creator:
- Utidjian, Haig,
- Type:
- text and exhibition catalogues
- Subject:
- Manuscripts, incunabula, early printed books. Rare and remarkable works, medieval manuscripts, Armenians, illuminated manuscripts, book culture, Christian art, Armenia, world history of the Middle Ages (to 1492), and history of the book, printing, publishing
- Language:
- English, Czech, and Armenian
- Description:
- Published as a companion publication to the exhibition of the same name and In the preliminaries: Royal Canonry of Premonstratensians at Strahov
- Rights:
- unknown
16. UDify Pretrained Model
- Creator:
- Kondratyuk, Dan and Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- syntax, dependency parser, and universal dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, and Maltese
- Description:
- Pretrained model weights for the UDify model, and extracted BERT weights in pytorch-transformers format. Note that these weights slightly differ from those used in the paper.
- Rights:
- Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB
17. Universal Dependencies 2.10
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Ackermann, Elia, Aepli, Noëmi, Aghaei, Hamid, Agić, Željko, Ahmadi, Amir, Ahrenberg, Lars, Ajede, Chika Kennedy, Aleksandravičiūtė, Gabrielė, Alfina, Ika, Algom, Avner, Andersen, Erik, Antonsen, Lene, Aplonova, Katya, Aquino, Angelina, Aragon, Carolina, Aranes, Glyd, Aranzabe, Maria Jesus, Arıcan, Bilge Nas, Arnardóttir, Þórunn, Arutie, Gashaw, Arwidarasti, Jessica Naraiswari, Asahara, Masayuki, Aslan, Deniz Baran, Asmazoğlu, Cengiz, Ateyah, Luma, Atmaca, Furkan, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Balasubramani, Keerthana, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Barkarson, Starkaður, Basile, Rodolfo, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bedir, Seyyit Talha, Bengoetxea, Kepa, Ben Moshe, Yifat, Berk, Gözde, Berzak, Yevgeni, Bhat, Irshad Ahmad, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Bjarnadóttir, Kristín, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Braggaar, Anouck, Brokaitė, Kristina, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cassidy, Lauren, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. 
A., Čéplö, Slavomír, Cesur, Neslihan, Cetin, Savas, Çetinoğlu, Özlem, Chalub, Fabricio, Chauhan, Shweta, Chi, Ethan, Chika, Taishi, Cho, Yongseok, Choi, Jinho, Chun, Jayeol, Chung, Juyeon, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Corbetta, Daniela, Courtin, Marine, Cristescu, Mihaela, Daniel, Philemon, Davidson, Elizabeth, Dehouck, Mathieu, de Laurentiis, Martina, de Marneffe, Marie-Catherine, de Paiva, Valeria, Derin, Mehmet Oguz, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dinakaramani, Arawinda, Di Nuovo, Elisa, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eckhoff, Hanne, Eiche, Sandra, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Etienne, Aline, Evelyn, Wograine, Facundes, Sidney, Farkas, Richárd, Favero, Federica, Ferdaousi, Jannatul, Fernanda, Marília, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Gamba, Federica, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Gerardi, Fabrício Ferraz, Gerdes, Kim, Ginter, Filip, Godoy, Gustavo, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grobol, Loïc, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Güngör, Tunga, Habash, Nizar, Hafsteinsson, Hinrik, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, Hanifmuti, Muhammad Yudistira, Harada, Takahiro, Hardwick, Sam, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hellwig, Oliver, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Hwang, Jena, Ikeda, Takumi, Ingason, Anton Karl, Ion, Radu, Irimia, Elena, Ishola, Ọlájídé, Ito, Kaoru, Jannat, Siratun, Jelínek, Tomáš, Jha, Apoorva, Johannsen, Anders, Jónsdóttir, Hildur, Jørgensen, Fredrik, Juutinen, Markus, K, Sarveswaran, 
Kaşıkara, Hüner, Kaasen, Andre, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Kara, Neslihan, Karahóǧa, Ritván, Katz, Boris, Kayadelen, Tolga, Kenney, Jessica, Kettnerová, Václava, Kirchner, Jesse, Klementieva, Elena, Klyachko, Elena, Köhn, Arne, Köksal, Abdullatif, Kopacewicz, Kamil, Korkiakangas, Timo, Köse, Mehmet, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Krishnamurthy, Parameswari, Kübler, Sandra, Kuyrukçu, Oğuzhan, Kuzgun, Aslı, Kwak, Sookyoung, Laippala, Veronika, Lam, Lucia, Lambertino, Lorenzo, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Levina, Maria, Li, Cheuk Ying, Li, Josie, Li, Keying, Li, Yuan, Lim, KyungTae, Lima Padovani, Bruna, Lindén, Krister, Ljubešić, Nikola, Loginova, Olga, Lusito, Stefano, Luthfi, Andry, Luukko, Mikko, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Mahamdi, Menel, Maillard, Jean, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Marşan, Büşra, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Markantonatou, Stella, Martínez Alonso, Héctor, Martín Rodríguez, Lorena, Martins, André, Mašek, Jan, Matsuda, Hiroshi, Matsumoto, Yuji, Mazzei, Alessandro, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Merzhevich, Tatiana, Miekka, Niko, Mischenkova, Karina, Misirpashayeva, Margarita, Missilä, Anna, Mititelu, Cătălin, Mitrofan, Maria, Miyao, Yusuke, Mojiri Foroushani, AmirHossein, Molnár, Judit, Moloodi, Amirsaeid, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Moretti, Giovanni, Mori, Keiko Sophie, Mori, Shinsuke, Morioka, Tomohiko, Moro, Shigeki, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Nakhlé, Mariam, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nevaci, Manuela, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, 
Vitaly, Nitisaroj, Rattima, Nourian, Alireza, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Olúòkun, Adédayọ̀, Omura, Mai, Onwuegbuzia, Emeka, Ordan, Noam, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Özateş, Şaziye Betül, Özçelik, Merve, Özgür, Arzucan, Öztürk Başaran, Balkız, Paccosi, Teresa, Palmero Aprosio, Alessio, Park, Hyunji Hayley, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Paulino-Passos, Guilherme, Pedonese, Giulia, Peljak-Łapińska, Angelika, Peng, Siyao, Perez, Cenel-Augusto, Perkova, Natalia, Perrier, Guy, Petrov, Slav, Petrova, Daria, Peverelli, Andrea, Phelan, Jason, Piitulainen, Jussi, Pirinen, Tommi A, Pitler, Emily, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Rääbis, Andriela, Rademaker, Alexandre, Rahoman, Mizanur, Rama, Taraka, Ramasamy, Loganathan, Ramisch, Carlos, Rashel, Fam, Rasooli, Mohammad Sadegh, Ravishankar, Vinit, Real, Livy, Rebeja, Petru, Reddy, Siva, Regnault, Mathilde, Rehm, Georg, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rizqiyah, Putri, Rocha, Luisa, Rögnvaldsson, Eiríkur, Romanenko, Mykhailo, Rosa, Rudolf, Roșca, Valentin, Rovati, Davide, Rozonoyer, Ben, Rudina, Olga, Rueter, Jack, Rúnarsson, Kristján, Sadde, Shoval, Safari, Pegah, Sagot, Benoît, Sahala, Aleksi, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Sanıyar, Ezgi, Särg, Dage, Saulīte, Baiba, Sawanakunanon, Yanin, Saxena, Shefali, Scannell, Kevin, Scarlata, Salvatore, Schneider, Nathan, Schuster, Sebastian, Schwartz, Lane, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shahzadi, Syeda, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shishkina, Yana, Shohibussirri, Muh, Sichinava, Dmitry, Siewert, Janine, Sigurðsson, Einar Freyr, Silveira, Aline, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, 
Katalin, Šimková, Mária, Simov, Kiril, Skachedubova, Maria, Smith, Aaron, Soares-Bastos, Isabela, Sourov, Shafi, Spadine, Carolyn, Sprugnoli, Rachele, Stamou, Vivian, Steingrímsson, Steinþór, Stella, Antonio, Straka, Milan, Strickland, Emmett, Strnadová, Jana, Suhr, Alane, Sulestio, Yogi Lesmana, Sulubacak, Umut, Suzuki, Shingo, Swanson, Daniel, Szántó, Zsolt, Taguchi, Chihiro, Taji, Dima, Takahashi, Yuta, Tamburini, Fabio, Tan, Mary Ann C., Tanaka, Takaaki, Tanaya, Dipta, Tavoni, Mirko, Tella, Samson, Tellier, Isabelle, Testori, Marinella, Thomas, Guillaume, Tonelli, Sara, Torga, Liisi, Toska, Marsida, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Türk, Utku, Tyers, Francis, Uematsu, Sumire, Untilov, Roman, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vagnoni, Elena, Vajjala, Sowmya, van der Goot, Rob, Vanhove, Martine, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Vedenina, Uliana, Villemonte de la Clergerie, Eric, Vincze, Veronika, Vlasova, Natalia, Wakasa, Aya, Wallenberg, Joel C., Wallin, Lars, Walsh, Abigail, Wang, Jing Xian, Washington, Jonathan North, Wendt, Maximilan, Widmer, Paul, Wigderson, Shira, Wijono, Sri Hartati, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamashita, Kayo, Yamazaki, Naoki, Yan, Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yenice, Arife Betül, Yıldız, Olcay Taner, Yu, Zhuoran, Yuliawati, Arlisa, Žabokrtský, Zdeněk, Zahra, Shorouq, Zeldes, Amir, Zhou, He, Zhu, Hanzhi, Zhuravleva, Anna, and Ziane, Rayan
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Swiss German, Albanian, Icelandic, Akuntsu, Apurinã, Chukot, Khunsari, Manx, Mundurukú, Nayini, Old Turkish, Soi, South Levantine Arabic, Tupinambá, Beja, Western Frisian, Guajajára, Urubú-Kaapor, Kangri, K'iche', Low German, Makuráp, Central Siberian Yupik, Western Armenian, Bengali, Javanese, Karo (Brazil), Ligurian, Neapolitan, Tatar, Xibe, Yakut, Ancient Hebrew, Cebuano, Guarani, Hittite, Madi, Emerillon, and Umbrian
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.10, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.10, and PUB
18. Universal Dependencies 2.10 models for UDPipe 2 (2022-07-11)
- Creator:
- Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- tokenizer, POS tagger, lemmatization, tagger, parser, and dependency parser
- Language:
- Afrikaans, Arabic, Belarusian, Bulgarian, Catalan, Czech, Church Slavic, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Persian, Finnish, French, Old French (842-ca. 1400), Scottish Gaelic, Irish, Galician, Gothic, Ancient Greek (to 1453), Ancient Hebrew, Hebrew, Hindi, Croatian, Hungarian, Armenian, Western Armenian, Indonesian, Icelandic, Italian, Japanese, Korean, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Maltese, Dutch, Norwegian Nynorsk, Norwegian Bokmål, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Telugu, Turkish, Uighur, Ukrainian, Urdu, Vietnamese, Gambian Wolof, Wolof, and Chinese
- Description:
- Tokenizer, POS Tagger, Lemmatizer and Parser models for 123 treebanks of 69 languages of Universal Dependencies 2.10 treebanks, created solely using UD 2.10 data (https://hdl.handle.net/11234/1-4758). The model documentation, including performance, can be found at https://ufal.mff.cuni.cz/udpipe/2/models#universal_dependencies_210_models . To use these models, you need UDPipe version 2.0, which you can download from https://ufal.mff.cuni.cz/udpipe/2 .
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
19. Universal Dependencies 2.11
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Ackermann, Elia, Aepli, Noëmi, Aghaei, Hamid, Agić, Željko, Ahmadi, Amir, Ahrenberg, Lars, Ajede, Chika Kennedy, Akkurt, Salih Furkan, Aleksandravičiūtė, Gabrielė, Alfina, Ika, Algom, Avner, Alzetta, Chiara, Andersen, Erik, Antonsen, Lene, Aplonova, Katya, Aquino, Angelina, Aragon, Carolina, Aranes, Glyd, Aranzabe, Maria Jesus, Arıcan, Bilge Nas, Arnardóttir, Þórunn, Arutie, Gashaw, Arwidarasti, Jessica Naraiswari, Asahara, Masayuki, Ásgeirsdóttir, Katla, Aslan, Deniz Baran, Asmazoğlu, Cengiz, Ateyah, Luma, Atmaca, Furkan, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Balasubramani, Keerthana, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Barkarson, Starkaður, Basile, Rodolfo, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bedir, Seyyit Talha, Belieni, Juan, Bengoetxea, Kepa, Ben Moshe, Yifat, Berk, Gözde, Berzak, Yevgeni, Bhat, Irshad Ahmad, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Bjarnadóttir, Kristín, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Braggaar, Anouck, Brokaitė, Kristina, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cassidy, Lauren, Castro, Maria Clara, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. 
A., Čéplö, Slavomír, Cesur, Neslihan, Cetin, Savas, Çetinoğlu, Özlem, Chalub, Fabricio, Chamila, Liyanage, Chauhan, Shweta, Chi, Ethan, Chika, Taishi, Cho, Yongseok, Choi, Jinho, Chun, Jayeol, Chung, Juyeon, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Corbetta, Daniela, Courtin, Marine, Cristescu, Mihaela, Daniel, Philemon, Davidson, Elizabeth, de Alencar, Leonel Figueiredo, Dehouck, Mathieu, de Laurentiis, Martina, de Marneffe, Marie-Catherine, de Paiva, Valeria, Derin, Mehmet Oguz, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dinakaramani, Arawinda, Di Nuovo, Elisa, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Ebert, Christian, Eckhoff, Hanne, Eiche, Sandra, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Etienne, Aline, Evelyn, Wograine, Facundes, Sidney, Farkas, Richárd, Favero, Federica, Ferdaousi, Jannatul, Fernanda, Marília, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Gamba, Federica, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Gerardi, Fabrício Ferraz, Gerdes, Kim, Ginter, Filip, Godoy, Gustavo, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grobol, Loïc, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Güngör, Tunga, Habash, Nizar, Hafsteinsson, Hinrik, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, Hanifmuti, Muhammad Yudistira, Harada, Takahiro, Hardwick, Sam, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hellwig, Oliver, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Huerta Mendez, Marivel, Hwang, Jena, Ikeda, Takumi, Ingason, Anton Karl, Ion, Radu, Irimia, Elena, Ishola, Ọlájídé, Islamaj, Artan, Ito, Kaoru, Jannat, Siratun, Jelínek, Tomáš, Jha, 
Apoorva, Jiang, Katharine, Johannsen, Anders, Jónsdóttir, Hildur, Jørgensen, Fredrik, Juutinen, Markus, Kaşıkara, Hüner, Kaasen, Andre, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Kara, Neslihan, Karahóǧa, Ritván, Katz, Boris, Kayadelen, Tolga, Kengatharaiyer, Sarveswaran, Kenney, Jessica, Kettnerová, Václava, Kirchner, Jesse, Klementieva, Elena, Klyachko, Elena, Köhn, Arne, Köksal, Abdullatif, Kopacewicz, Kamil, Korkiakangas, Timo, Köse, Mehmet, Koshevoy, Alexey, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Krishnamurthy, Parameswari, Kübler, Sandra, Kuqi, Adrian, Kuyrukçu, Oğuzhan, Kuzgun, Aslı, Kwak, Sookyoung, Laippala, Veronika, Lam, Lucia, Lambertino, Lorenzo, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Levina, Maria, Li, Cheuk Ying, Li, Josie, Li, Keying, Li, Yixuan, Li, Yuan, Lim, KyungTae, Lima Padovani, Bruna, Lindén, Krister, Ljubešić, Nikola, Loginova, Olga, Lusito, Stefano, Luthfi, Andry, Luukko, Mikko, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Mahamdi, Menel, Maillard, Jean, Makarchuk, Ilya, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Marşan, Büşra, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Markantonatou, Stella, Martínez Alonso, Héctor, Martín Rodríguez, Lorena, Martins, André, Mašek, Jan, Matsuda, Hiroshi, Matsumoto, Yuji, Mazzei, Alessandro, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Merzhevich, Tatiana, Miekka, Niko, Mischenkova, Karina, Misirpashayeva, Margarita, Missilä, Anna, Mititelu, Cătălin, Mitrofan, Maria, Miyao, Yusuke, Mojiri Foroushani, AmirHossein, Molnár, Judit, Moloodi, Amirsaeid, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Moretti, Giovanni, Mori, Keiko Sophie, Mori, Shinsuke, Morioka, Tomohiko, Moro, Shigeki, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, 
Nainwani, Pinkey, Nakhlé, Mariam, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nevaci, Manuela, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, Vitaly, Nitisaroj, Rattima, Nourian, Alireza, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Óladóttir, Hulda, Olúòkun, Adédayọ̀, Omura, Mai, Onwuegbuzia, Emeka, Ordan, Noam, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Özateş, Şaziye Betül, Özçelik, Merve, Özgür, Arzucan, Öztürk Başaran, Balkız, Paccosi, Teresa, Palmero Aprosio, Alessio, Panova, Anastasia, Park, Hyunji Hayley, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Paulino-Passos, Guilherme, Pedonese, Giulia, Peljak-Łapińska, Angelika, Peng, Siyao, Perez, Cenel-Augusto, Perkova, Natalia, Perrier, Guy, Petrov, Slav, Petrova, Daria, Peverelli, Andrea, Phelan, Jason, Piitulainen, Jussi, Pintucci, Rodrigo, Pirinen, Tommi A, Pitler, Emily, Plamada, Magdalena, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Pugh, Robert, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Rääbis, Andriela, Rademaker, Alexandre, Rahoman, Mizanur, Rama, Taraka, Ramasamy, Loganathan, Ramisch, Carlos, Rashel, Fam, Rasooli, Mohammad Sadegh, Ravishankar, Vinit, Real, Livy, Rebeja, Petru, Reddy, Siva, Regnault, Mathilde, Rehm, Georg, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rizqiyah, Putri, Rocha, Luisa, Rögnvaldsson, Eiríkur, Roksandic, Ivan, Romanenko, Mykhailo, Rosa, Rudolf, Roșca, Valentin, Rovati, Davide, Rozonoyer, Ben, Rudina, Olga, Rueter, Jack, Rúnarsson, Kristján, Sadde, Shoval, Safari, Pegah, Sagot, Benoît, Sahala, Aleksi, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Sanıyar, Ezgi, Särg, Dage, Sartor, Marta, Sasaki, Mitsuya, Saulīte, Baiba, Sawanakunanon, Yanin, Saxena, Shefali, Scannell, Kevin, Scarlata, Salvatore, 
Schneider, Nathan, Schuster, Sebastian, Schwartz, Lane, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shahzadi, Syeda, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shishkina, Yana, Shohibussirri, Muh, Shvedova, Maria, Siewert, Janine, Sigurðsson, Einar Freyr, Silva, João Ricardo, Silveira, Aline, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Símonarson, Haukur Barri, Simov, Kiril, Sitchinava, Dmitri, Skachedubova, Maria, Smith, Aaron, Soares-Bastos, Isabela, Sonnenhauser, Barbara, Sourov, Shafi, Spadine, Carolyn, Sprugnoli, Rachele, Stamou, Vivian, Steingrímsson, Steinþór, Stella, Antonio, Stephen, Abishek, Straka, Milan, Strickland, Emmett, Strnadová, Jana, Suhr, Alane, Sulestio, Yogi Lesmana, Sulubacak, Umut, Suzuki, Shingo, Swanson, Daniel, Szántó, Zsolt, Taguchi, Chihiro, Taji, Dima, Takahashi, Yuta, Tamburini, Fabio, Tan, Mary Ann C., Tanaka, Takaaki, Tanaya, Dipta, Tavoni, Mirko, Tella, Samson, Tellier, Isabelle, Testori, Marinella, Thomas, Guillaume, Tonelli, Sara, Torga, Liisi, Toska, Marsida, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Türk, Utku, Tyers, Francis, Þórðarson, Sveinbjörn, Þorsteinsson, Vilhjálmur, Uematsu, Sumire, Untilov, Roman, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vagnoni, Elena, Vajjala, Sowmya, van der Goot, Rob, Vanhove, Martine, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Vedenina, Uliana, Venturi, Giulia, Villemonte de la Clergerie, Eric, Vincze, Veronika, Vlasova, Natalia, Wakasa, Aya, Wallenberg, Joel C., Wallin, Lars, Walsh, Abigail, Wang, Jing Xian, Washington, Jonathan North, Wendt, Maximilan, Widmer, Paul, Wigderson, Shira, Wijono, Sri Hartati, Wille, Vanessa Berwanger, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamashita, Kayo, Yamazaki, Naoki, Yan, Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yenice, Arife Betül, Yıldız, Olcay Taner, Yu, Zhuoran, Yuliawati, Arlisa, 
Žabokrtský, Zdeněk, Zahra, Shorouq, Zeldes, Amir, Zhou, He, Zhu, Hanzhi, Zhuravleva, Anna, and Ziane, Rayan
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Swiss German, Albanian, Icelandic, Akuntsu, Apurinã, Chukot, Khunsari, Manx, Mundurukú, Nayini, Old Turkish, Soi, South Levantine Arabic, Tupinambá, Beja, Western Frisian, Guajajára, Urubú-Kaapor, Kangri, K'iche', Low German, Makuráp, Central Siberian Yupik, Western Armenian, Bengali, Javanese, Karo (Brazil), Ligurian, Neapolitan, Tatar, Xibe, Yakut, Ancient Hebrew, Cebuano, Guarani, Hittite, Madi, Emerillon, Umbrian, Abaza, Gheg Albanian, Malayalam, Nhengatu, Sinhala, Zacatlán-Ahuacatlán-Tepetzintla Nahuatl, Xavánte, and Saya
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.11, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.11, and PUB
20. Universal Dependencies 2.12
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Ackermann, Elia, Aepli, Noëmi, Aghaei, Hamid, Agić, Željko, Ahmadi, Amir, Ahrenberg, Lars, Ajede, Chika Kennedy, Akkurt, Salih Furkan, Aleksandravičiūtė, Gabrielė, Alfina, Ika, Algom, Avner, Alnajjar, Khalid, Alzetta, Chiara, Andersen, Erik, Antonsen, Lene, Aoyama, Tatsuya, Aplonova, Katya, Aquino, Angelina, Aragon, Carolina, Aranes, Glyd, Aranzabe, Maria Jesus, Arıcan, Bilge Nas, Arnardóttir, Þórunn, Arutie, Gashaw, Arwidarasti, Jessica Naraiswari, Asahara, Masayuki, Ásgeirsdóttir, Katla, Aslan, Deniz Baran, Asmazoğlu, Cengiz, Ateyah, Luma, Atmaca, Furkan, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Avelãs, Mariana, Badmaeva, Elena, Balasubramani, Keerthana, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Barkarson, Starkaður, Basile, Rodolfo, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bedir, Seyyit Talha, Behzad, Shabnam, Bengoetxea, Kepa, Benli, İbrahim, Ben Moshe, Yifat, Berk, Gözde, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Bjarnadóttir, Kristín, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Braggaar, Anouck, Branco, António, Brokaitė, Kristina, Burchardt, Aljoscha, Campos, Marisa, Candito, Marie, Caron, Bernard, Caron, Gauthier, Carvalheiro, Catarina, Carvalho, Rita, Cassidy, Lauren, Castro, Maria Clara, Castro, Sérgio, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. 
A., Čéplö, Slavomír, Cesur, Neslihan, Cetin, Savas, Çetinoğlu, Özlem, Chalub, Fabricio, Chamila, Liyanage, Chauhan, Shweta, Chi, Ethan, Chika, Taishi, Cho, Yongseok, Choi, Jinho, Chun, Jayeol, Chung, Juyeon, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Corbetta, Daniela, Costa, Francisco, Courtin, Marine, Cristescu, Mihaela, Dale, Ingerid Løyning, Daniel, Philemon, Davidson, Elizabeth, de Alencar, Leonel Figueiredo, Dehouck, Mathieu, de Laurentiis, Martina, de Marneffe, Marie-Catherine, de Paiva, Valeria, Derin, Mehmet Oguz, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dinakaramani, Arawinda, Di Nuovo, Elisa, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Doyle, Adrian, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Ebert, Christian, Eckhoff, Hanne, Eguchi, Masaki, Eiche, Sandra, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Essaidi, Farah, Etienne, Aline, Evelyn, Wograine, Facundes, Sidney, Farkas, Richárd, Favero, Federica, Ferdaousi, Jannatul, Fernanda, Marília, Fernandez Alcalde, Hector, Fethi, Amal, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Gamba, Federica, Garcia, Marcos, Gärdenfors, Moa, Gerardi, Fabrício Ferraz, Gerdes, Kim, Gessler, Luke, Ginter, Filip, Godoy, Gustavo, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grobol, Loïc, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Güngör, Tunga, Habash, Nizar, Hafsteinsson, Hinrik, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, Hanifmuti, Muhammad Yudistira, Harada, Takahiro, Hardwick, Sam, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hellwig, Oliver, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Huerta Mendez, Marivel, Hwang, Jena, Ikeda, Takumi, Ingason, Anton Karl, Ion, Radu, 
Irimia, Elena, Ishola, Ọlájídé, Islamaj, Artan, Ito, Kaoru, Jannat, Siratun, Jelínek, Tomáš, Jha, Apoorva, Jiang, Katharine, Johannsen, Anders, Jónsdóttir, Hildur, Jørgensen, Fredrik, Juutinen, Markus, Kaşıkara, Hüner, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Kara, Neslihan, Karahóǧa, Ritván, Kåsen, Andre, Kayadelen, Tolga, Kengatharaiyer, Sarveswaran, Kettnerová, Václava, Kirchner, Jesse, Klementieva, Elena, Klyachko, Elena, Köhn, Arne, Köksal, Abdullatif, Kopacewicz, Kamil, Korkiakangas, Timo, Köse, Mehmet, Koshevoy, Alexey, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Krishnamurthy, Parameswari, Kübler, Sandra, Kuqi, Adrian, Kuyrukçu, Oğuzhan, Kuzgun, Aslı, Kwak, Sookyoung, Kyle, Kris, Laippala, Veronika, Lambertino, Lorenzo, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Levina, Maria, Levine, Lauren, Li, Cheuk Ying, Li, Josie, Li, Keying, Li, Yixuan, Li, Yuan, Lim, KyungTae, Lima Padovani, Bruna, Lin, Yi-Ju Jessica, Lindén, Krister, Liu, Yang Janet, Ljubešić, Nikola, Loginova, Olga, Lusito, Stefano, Luthfi, Andry, Luukko, Mikko, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Mahamdi, Menel, Maillard, Jean, Makarchuk, Ilya, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Marşan, Büşra, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Markantonatou, Stella, Martínez Alonso, Héctor, Martín Rodríguez, Lorena, Martins, André, Martins, Cláudia, Mašek, Jan, Matsuda, Hiroshi, Matsumoto, Yuji, Mazzei, Alessandro, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Merzhevich, Tatiana, Miekka, Niko, Miller, Aaron, Mischenkova, Karina, Missilä, Anna, Mititelu, Cătălin, Mitrofan, Maria, Miyao, Yusuke, Mojiri Foroushani, AmirHossein, Molnár, Judit, Moloodi, Amirsaeid, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Moretti, Giovanni, Mori, Shinsuke, Morioka, Tomohiko, Moro, Shigeki, 
Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Nakhlé, Mariam, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nevaci, Manuela, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, Vitaly, Nitisaroj, Rattima, Nourian, Alireza, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Óladóttir, Hulda, Olúòkun, Adédayọ̀, Omura, Mai, Onwuegbuzia, Emeka, Ordan, Noam, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Özateş, Şaziye Betül, Özçelik, Merve, Özgür, Arzucan, Öztürk Başaran, Balkız, Paccosi, Teresa, Palmero Aprosio, Alessio, Panova, Anastasia, Park, Hyunji Hayley, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Paulino-Passos, Guilherme, Pedonese, Giulia, Peljak-Łapińska, Angelika, Peng, Siyao, Peng, Siyao Logan, Pereira, Rita, Pereira, Sílvia, Perez, Cenel-Augusto, Perkova, Natalia, Perrier, Guy, Petrov, Slav, Petrova, Daria, Peverelli, Andrea, Phelan, Jason, Piitulainen, Jussi, Pinter, Yuval, Pinto, Clara, Pirinen, Tommi A, Pitler, Emily, Plamada, Magdalena, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Pugh, Robert, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Querido, Andreia, Rääbis, Andriela, Rademaker, Alexandre, Rahoman, Mizanur, Rama, Taraka, Ramasamy, Loganathan, Ramos, Joana, Rashel, Fam, Rasooli, Mohammad Sadegh, Ravishankar, Vinit, Real, Livy, Rebeja, Petru, Reddy, Siva, Regnault, Mathilde, Rehm, Georg, Riabi, Arij, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rizqiyah, Putri, Rocha, Luisa, Rögnvaldsson, Eiríkur, Roksandic, Ivan, Romanenko, Mykhailo, Rosa, Rudolf, Roșca, Valentin, Rovati, Davide, Rozonoyer, Ben, Rudina, Olga, Rueter, Jack, Rúnarsson, Kristján, Sadde, Shoval, Safari, Pegah, Sahala, Aleksi, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, 
Sanguinetti, Manuela, Sanıyar, Ezgi, Särg, Dage, Sartor, Marta, Sasaki, Mitsuya, Saulīte, Baiba, Sawanakunanon, Yanin, Saxena, Shefali, Scannell, Kevin, Scarlata, Salvatore, Schneider, Nathan, Schuster, Sebastian, Schwartz, Lane, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shahzadi, Syeda, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shishkina, Yana, Shohibussirri, Muh, Shvedova, Maria, Siewert, Janine, Sigurðsson, Einar Freyr, Silva, João, Silveira, Aline, Silveira, Natalia, Silveira, Sara, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Símonarson, Haukur Barri, Simov, Kiril, Sitchinava, Dmitri, Sither, Ted, Skachedubova, Maria, Smith, Aaron, Soares-Bastos, Isabela, Solberg, Per Erik, Sonnenhauser, Barbara, Sourov, Shafi, Sprugnoli, Rachele, Stamou, Vivian, Steingrímsson, Steinþór, Stella, Antonio, Stephen, Abishek, Straka, Milan, Strickland, Emmett, Strnadová, Jana, Suhr, Alane, Sulestio, Yogi Lesmana, Sulubacak, Umut, Suzuki, Shingo, Swanson, Daniel, Szántó, Zsolt, Taguchi, Chihiro, Taji, Dima, Tamburini, Fabio, Tan, Mary Ann C., Tanaka, Takaaki, Tanaya, Dipta, Tavoni, Mirko, Tella, Samson, Tellier, Isabelle, Testori, Marinella, Thomas, Guillaume, Tonelli, Sara, Torga, Liisi, Toska, Marsida, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Türk, Utku, Tyers, Francis, Þórðarson, Sveinbjörn, Þorsteinsson, Vilhjálmur, Uematsu, Sumire, Untilov, Roman, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vagnoni, Elena, Vajjala, Sowmya, Vak, Socrates, van der Goot, Rob, Vanhove, Martine, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Vedenina, Uliana, Venturi, Giulia, Vincze, Veronika, Vlasova, Natalia, Wakasa, Aya, Wallenberg, Joel C., Wallin, Lars, Walsh, Abigail, Washington, Jonathan North, Wendt, Maximilan, Widmer, Paul, Wigderson, Shira, Wijono, Sri Hartati, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamashita, Kayo, Yamazaki, Naoki, Yan, 
Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yenice, Arife Betül, Yıldız, Olcay Taner, Yu, Zhuoran, Yuliawati, Arlisa, Žabokrtský, Zdeněk, Zahra, Shorouq, Zeldes, Amir, Zhou, He, Zhu, Hanzhi, Zhu, Yilun, Zhuravleva, Anna, and Ziane, Rayan
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Swiss German, Albanian, Icelandic, Akuntsu, Apurinã, Chukot, Khunsari, Manx, Mundurukú, Nayini, Old Turkish, Soi, South Levantine Arabic, Tupinambá, Beja, Western Frisian, Guajajára, Urubú-Kaapor, Kangri, K'iche', Low German, Makuráp, Central Siberian Yupik, Western Armenian, Bengali, Javanese, Karo (Brazil), Ligurian, Neapolitan, Tatar, Xibe, Yakut, Ancient Hebrew, Cebuano, Guarani, Hittite, Madi, Emerillon, Umbrian, Abaza, Gheg Albanian, Malayalam, Nhengatu, Sinhala, Zacatlán-Ahuacatlán-Tepetzintla Nahuatl, Xavánte, Saya, Borôro, Kirghiz, Algerian Arabic, and Old Irish (to 900)
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.12, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.12, and PUB
21. Universal Dependencies 2.12 models for UDPipe 2 (2023-07-17)
- Creator:
- Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- tokenizer, POS tagger, lemmatization, tagger, parser, and dependency parser
- Language:
- Afrikaans, Arabic, Belarusian, Bulgarian, Catalan, Czech, Church Slavic, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Persian, Finnish, French, Old French (842-ca. 1400), Scottish Gaelic, Irish, Galician, Gothic, Ancient Greek (to 1453), Ancient Hebrew, Hebrew, Hindi, Croatian, Hungarian, Armenian, Western Armenian, Indonesian, Icelandic, Italian, Japanese, Korean, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Maltese, Dutch, Norwegian Nynorsk, Norwegian Bokmål, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Telugu, Turkish, Uighur, Ukrainian, Urdu, Vietnamese, Gambian Wolof, Wolof, Chinese, Norwegian, Erzya, and Manx
- Description:
- Tokenizer, POS Tagger, Lemmatizer and Parser models for 131 treebanks of 72 languages of Universal Dependencies 2.12 Treebanks, created solely using UD 2.12 data (https://hdl.handle.net/11234/1-5150). The model documentation, including performance, can be found at https://ufal.mff.cuni.cz/udpipe/2/models#universal_dependencies_212_models . To use these models, you need UDPipe version 2.0, which you can download from https://ufal.mff.cuni.cz/udpipe/2 .
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
22. Universal Dependencies 2.13
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Ackermann, Elia, Aepli, Noëmi, Aghaei, Hamid, Agić, Željko, Ahmadi, Amir, Ahrenberg, Lars, Ajede, Chika Kennedy, Akkurt, Salih Furkan, Aleksandravičiūtė, Gabrielė, Alfina, Ika, Algom, Avner, Alnajjar, Khalid, Alzetta, Chiara, Andersen, Erik, Antonsen, Lene, Aoyama, Tatsuya, Aplonova, Katya, Aquino, Angelina, Aragon, Carolina, Aranes, Glyd, Aranzabe, Maria Jesus, Arıcan, Bilge Nas, Arnardóttir, Þórunn, Arutie, Gashaw, Arwidarasti, Jessica Naraiswari, Asahara, Masayuki, Ásgeirsdóttir, Katla, Aslan, Deniz Baran, Asmazoğlu, Cengiz, Ateyah, Luma, Atmaca, Furkan, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Avelãs, Mariana, Badmaeva, Elena, Balasubramani, Keerthana, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Barkarson, Starkaður, Basile, Rodolfo, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bedir, Seyyit Talha, Behzad, Shabnam, Belieni, Juan, Bengoetxea, Kepa, Benli, İbrahim, Ben Moshe, Yifat, Berk, Gözde, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Bjarnadóttir, Kristín, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Braggaar, Anouck, Branco, António, Brokaitė, Kristina, Burchardt, Aljoscha, Campos, Marisa, Candito, Marie, Caron, Bernard, Caron, Gauthier, Carvalheiro, Catarina, Carvalho, Rita, Cassidy, Lauren, Castro, Maria Clara, Castro, Sérgio, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. 
A., Čéplö, Slavomír, Cesur, Neslihan, Cetin, Savas, Çetinoğlu, Özlem, Chalub, Fabricio, Chamila, Liyanage, Chauhan, Shweta, Chi, Ethan, Chika, Taishi, Cho, Yongseok, Choi, Jinho, Chun, Jayeol, Chung, Juyeon, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Corbetta, Claudia, Corbetta, Daniela, Costa, Francisco, Courtin, Marine, Crabbé, Benoît, Cristescu, Mihaela, Cvetkoski, Vladimir, Dale, Ingerid Løyning, Daniel, Philemon, Davidson, Elizabeth, de Alencar, Leonel Figueiredo, Dehouck, Mathieu, de Laurentiis, Martina, de Marneffe, Marie-Catherine, de Paiva, Valeria, Derin, Mehmet Oguz, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dinakaramani, Arawinda, Di Nuovo, Elisa, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Doyle, Adrian, Dozat, Timothy, Droganova, Kira, Duran, Magali Sanches, Dwivedi, Puneet, Ebert, Christian, Eckhoff, Hanne, Eguchi, Masaki, Eiche, Sandra, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Essaidi, Farah, Etienne, Aline, Evelyn, Wograine, Facundes, Sidney, Farkas, Richárd, Favero, Federica, Ferdaousi, Jannatul, Fernanda, Marília, Fernandez Alcalde, Hector, Fethi, Amal, Foster, Jennifer, Fransen, Theodorus, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Gamba, Federica, Garcia, Marcos, Gärdenfors, Moa, Gerardi, Fabrício Ferraz, Gerdes, Kim, Gessler, Luke, Ginter, Filip, Godoy, Gustavo, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grobol, Loïc, Grūzītis, Normunds, Guillaume, Bruno, Guiller, Kirian, Guillot-Barbance, Céline, Güngör, Tunga, Habash, Nizar, Hafsteinsson, Hinrik, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, Hanifmuti, Muhammad Yudistira, Harada, Takahiro, Hardwick, Sam, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hellwig, Oliver, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, 
Hociung, Florinel, Hohle, Petter, Huang, Yidi, Huerta Mendez, Marivel, Hwang, Jena, Ikeda, Takumi, Ingason, Anton Karl, Ion, Radu, Irimia, Elena, Ishola, Ọlájídé, Islamaj, Artan, Ito, Kaoru, Jagodzińska, Sandra, Jannat, Siratun, Jelínek, Tomáš, Jha, Apoorva, Jiang, Katharine, Johannsen, Anders, Jónsdóttir, Hildur, Jørgensen, Fredrik, Juutinen, Markus, Kaşıkara, Hüner, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Kara, Neslihan, Karahóǧa, Ritván, Kåsen, Andre, Kayadelen, Tolga, Kengatharaiyer, Sarveswaran, Kettnerová, Václava, Kharatyan, Lilit, Kirchner, Jesse, Klementieva, Elena, Klyachko, Elena, Kocharov, Petr, Köhn, Arne, Köksal, Abdullatif, Kopacewicz, Kamil, Korkiakangas, Timo, Köse, Mehmet, Koshevoy, Alexey, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Krishnamurthy, Parameswari, Kübler, Sandra, Kuqi, Adrian, Kuyrukçu, Oğuzhan, Kuzgun, Aslı, Kwak, Sookyoung, Kyle, Kris, Laan, Käbi, Laippala, Veronika, Lambertino, Lorenzo, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Levina, Maria, Levine, Lauren, Li, Cheuk Ying, Li, Josie, Li, Keying, Li, Yixuan, Li, Yuan, Lim, KyungTae, Lima Padovani, Bruna, Lin, Yi-Ju Jessica, Lindén, Krister, Liu, Yang Janet, Ljubešić, Nikola, Lobzhanidze, Irina, Loginova, Olga, Lopes, Lucelene, Lusito, Stefano, Luthfi, Andry, Luukko, Mikko, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Mahamdi, Menel, Maillard, Jean, Makarchuk, Ilya, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Marşan, Büşra, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Markantonatou, Stella, Martínez Alonso, Héctor, Martín Rodríguez, Lorena, Martins, André, Martins, Cláudia, Mašek, Jan, Matsuda, Hiroshi, Matsumoto, Yuji, Mazzei, Alessandro, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Merzhevich, Tatiana, Miekka, Niko, Miller, Aaron, Mischenkova, Karina, Missilä, Anna, Mititelu, Cătălin, 
Mitrofan, Maria, Miyao, Yusuke, Mojiri Foroushani, AmirHossein, Molnár, Judit, Moloodi, Amirsaeid, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Moretti, Giovanni, Mori, Shinsuke, Morioka, Tomohiko, Moro, Shigeki, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Nakhlé, Mariam, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nevaci, Manuela, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, Vitaly, Nitisaroj, Rattima, Nourian, Alireza, Nunes, Maria das Graças Volpe, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Óladóttir, Hulda, Olúòkun, Adédayọ̀, Omura, Mai, Onwuegbuzia, Emeka, Ordan, Noam, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Özateş, Şaziye Betül, Özçelik, Merve, Özgür, Arzucan, Öztürk Başaran, Balkız, Paccosi, Teresa, Palmero Aprosio, Alessio, Panova, Anastasia, Pardo, Thiago Alexandre Salgueiro, Park, Hyunji Hayley, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Paulino-Passos, Guilherme, Pedonese, Giulia, Peljak-Łapińska, Angelika, Peng, Siyao, Peng, Siyao Logan, Pereira, Rita, Pereira, Sílvia, Perez, Cenel-Augusto, Perkova, Natalia, Perrier, Guy, Petrov, Slav, Petrova, Daria, Peverelli, Andrea, Phelan, Jason, Pierre-Louis, Claudel, Piitulainen, Jussi, Pinter, Yuval, Pinto, Clara, Pintucci, Rodrigo, Pirinen, Tommi A, Pitler, Emily, Plamada, Magdalena, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Pugh, Robert, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Querido, Andreia, Rääbis, Andriela, Rademaker, Alexandre, Rahoman, Mizanur, Rama, Taraka, Ramasamy, Loganathan, Ramisch, Carlos, Ramos, Joana, Rashel, Fam, Rasooli, Mohammad Sadegh, Ravishankar, Vinit, Real, Livy, Rebeja, Petru, Reddy, Siva, Regnault, Mathilde, Rehm, Georg, Riabi, Arij, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, 
Rinaldi, Larissa, Rituma, Laura, Rizqiyah, Putri, Rocha, Luisa, Rögnvaldsson, Eiríkur, Roksandic, Ivan, Romanenko, Mykhailo, Rosa, Rudolf, Roșca, Valentin, Rovati, Davide, Rozonoyer, Ben, Rudina, Olga, Rueter, Jack, Rúnarsson, Kristján, Sadde, Shoval, Safari, Pegah, Sahala, Aleksi, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Sanıyar, Ezgi, Särg, Dage, Sartor, Marta, Sasaki, Mitsuya, Saulīte, Baiba, Savary, Agata, Sawanakunanon, Yanin, Saxena, Shefali, Scannell, Kevin, Scarlata, Salvatore, Schang, Emmanuel, Schneider, Nathan, Schuster, Sebastian, Schwartz, Lane, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shahzadi, Syeda, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shishkina, Yana, Shohibussirri, Muh, Shvedova, Maria, Siewert, Janine, Sigurðsson, Einar Freyr, Silva, João, Silveira, Aline, Silveira, Natalia, Silveira, Sara, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Símonarson, Haukur Barri, Simov, Kiril, Sitchinava, Dmitri, Sither, Ted, Skachedubova, Maria, Smith, Aaron, Soares-Bastos, Isabela, Solberg, Per Erik, Sonnenhauser, Barbara, Sourov, Shafi, Sprugnoli, Rachele, Stamou, Vivian, Steingrímsson, Steinþór, Stella, Antonio, Stephen, Abishek, Straka, Milan, Strickland, Emmett, Strnadová, Jana, Suhr, Alane, Sulestio, Yogi Lesmana, Sulubacak, Umut, Suzuki, Shingo, Swanson, Daniel, Szántó, Zsolt, Taguchi, Chihiro, Taji, Dima, Tamburini, Fabio, Tan, Mary Ann C., Tanaka, Takaaki, Tanaya, Dipta, Tavoni, Mirko, Tella, Samson, Tellier, Isabelle, Testori, Marinella, Thomas, Guillaume, Tonelli, Sara, Torga, Liisi, Toska, Marsida, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Türk, Utku, Tyers, Francis, Þórðarson, Sveinbjörn, Þorsteinsson, Vilhjálmur, Uematsu, Sumire, Untilov, Roman, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vagnoni, Elena, Vajjala, Sowmya, Vak, Socrates, van der Goot, Rob, Vanhove, Martine, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Vedenina, 
Uliana, Venturi, Giulia, Villemonte de la Clergerie, Eric, Vincze, Veronika, Vlasova, Natalia, Wakasa, Aya, Wallenberg, Joel C., Wallin, Lars, Walsh, Abigail, Washington, Jonathan North, Wendt, Maximilan, Widmer, Paul, Wigderson, Shira, Wijono, Sri Hartati, Wille, Vanessa Berwanger, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Wu, Qishen, Yako, Mary, Yamashita, Kayo, Yamazaki, Naoki, Yan, Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yenice, Arife Betül, Yıldız, Olcay Taner, Yu, Zhuoran, Yuliawati, Arlisa, Žabokrtský, Zdeněk, Zahra, Shorouq, Zeldes, Amir, Zhou, He, Zhu, Hanzhi, Zhu, Yilun, Zhuravleva, Anna, and Ziane, Rayan
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Swiss German, Albanian, Icelandic, Akuntsu, Apurinã, Chukot, Khunsari, Manx, Mundurukú, Nayini, Old Turkish, Soi, South Levantine Arabic, Tupinambá, Beja, Western Frisian, Guajajára, Urubú-Kaapor, Kangri, K'iche', Low German, Makuráp, Central Siberian Yupik, Western Armenian, Bengali, Javanese, Karo (Brazil), Ligurian, Neapolitan, Tatar, Xibe, Yakut, Ancient Hebrew, Cebuano, Guarani, Hittite, Madi, Emerillon, Umbrian, Abaza, Gheg Albanian, Malayalam, Nhengatu, Sinhala, Zacatlán-Ahuacatlán-Tepetzintla Nahuatl, Xavánte, Saya, Borôro, Kirghiz, Algerian Arabic, Old Irish (to 900), Classical Armenian, Georgian, Haitian, Highland Puebla Nahuatl, Macedonian, Middle French (ca. 1400-1600), and Veps
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.13, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.13, and PUB
23. Universal Dependencies 2.2
- Creator:
- Nivre, Joakim, Abrams, Mitchell, Agić, Željko, Ahrenberg, Lars, Antonsen, Lene, Aranzabe, Maria Jesus, Arutie, Gashaw, Asahara, Masayuki, Ateyah, Luma, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Bauer, John, Bellato, Sandra, Bengoetxea, Kepa, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Blokland, Rogier, Bobicev, Victoria, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cebiroğlu Eryiğit, Gülşen, Celano, Giuseppe G. A., Cetin, Savas, Chalub, Fabricio, Choi, Jinho, Cho, Yongseok, Chun, Jayeol, Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Courtin, Marine, Davidson, Elizabeth, de Marneffe, Marie-Catherine, de Paiva, Valeria, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erjavec, Tomaž, Etienne, Aline, Farkas, Richárd, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Gärdenfors, Moa, Gerdes, Kim, Ginter, Filip, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, Gonzáles Saavedra, Berta, Grioni, Matias, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Habash, Nizar, Hajič, Jan, Hajič jr., Jan, Hà Mỹ, Linh, Han, Na-Rae, Harris, Kim, Haug, Dag, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Hwang, Jena, Ion, Radu, Irimia, Elena, Jelínek, Tomáš, Johannsen, Anders, Jørgensen, Fredrik, Kaşıkara, Hüner, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Kayadelen, Tolga, Kettnerová, Václava, Kirchner, Jesse, Kotsyba, Natalia, Krek, Simon, Kwak, Sookyoung, Laippala, Veronika, Lambertino, Lorenzo, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê 
Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Li, Cheuk Ying, Li, Josie, Li, Keying, Lim, KyungTae, Ljubešić, Nikola, Loginova, Olga, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Martínez Alonso, Héctor, Martins, André, Mašek, Jan, Matsumoto, Yuji, McDonald, Ryan, Mendonça, Gustavo, Miekka, Niko, Missilä, Anna, Mititelu, Cătălin, Miyao, Yusuke, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Mori, Shinsuke, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikolaev, Vitaly, Nitisaroj, Rattima, Nurmi, Hanna, Ojala, Stina, Olúòkun, Adédayọ̀, Omura, Mai, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Peng, Siyao, Perez, Cenel-Augusto, Perrier, Guy, Petrov, Slav, Piitulainen, Jussi, Pitler, Emily, Plank, Barbara, Poibeau, Thierry, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Puolakainen, Tiina, Pyysalo, Sampo, Rääbis, Andriela, Rademaker, Alexandre, Ramasamy, Loganathan, Rama, Taraka, Ramisch, Carlos, Ravishankar, Vinit, Real, Livy, Reddy, Siva, Rehm, Georg, Rießler, Michael, Rinaldi, Larissa, Rituma, Laura, Rocha, Luisa, Romanenko, Mykhailo, Rosa, Rudolf, Rovati, Davide, Roșca, Valentin, Rudina, Olga, Sadde, Shoval, Saleh, Shadi, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Saulīte, Baiba, Sawanakunanon, Yanin, Schneider, Nathan, Schuster, Sebastian, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shen, Mo, Shimada, Atsuko, Shohibussirri, Muh, Sichinava, Dmitry, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Smith, Aaron, Soares-Bastos, 
Isabela, Stella, Antonio, Straka, Milan, Strnadová, Jana, Suhr, Alane, Sulubacak, Umut, Szántó, Zsolt, Taji, Dima, Takahashi, Yuta, Tanaka, Takaaki, Tellier, Isabelle, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Tyers, Francis, Uematsu, Sumire, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Vajjala, Sowmya, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Vincze, Veronika, Wallin, Lars, Washington, Jonathan North, Williams, Seyi, Wirén, Mats, Woldemariam, Tsegay, Wong, Tak-sum, Yan, Chunxiao, Yavrumyan, Marat M., Yu, Zhuoran, Žabokrtský, Zdeněk, Zeldes, Amir, Zeman, Daniel, Zhang, Manying, and Zhu, Hanzhi
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, and Yoruba
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.2, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.2, and PUB
24. Universal Dependencies 2.3
- Creator:
- Nivre, Joakim, Abrams, Mitchell, Agić, Željko, Ahrenberg, Lars, Antonsen, Lene, Aplonova, Katya, Aranzabe, Maria Jesus, Arutie, Gashaw, Asahara, Masayuki, Ateyah, Luma, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Basmov, Victoria, Bauer, John, Bellato, Sandra, Bengoetxea, Kepa, Berzak, Yevgeni, Bhat, Irshad Ahmad, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Blokland, Rogier, Bobicev, Victoria, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. A., Čéplö, Slavomír, Cetin, Savas, Chalub, Fabricio, Choi, Jinho, Cho, Yongseok, Chun, Jayeol, Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Courtin, Marine, Davidson, Elizabeth, de Marneffe, Marie-Catherine, de Paiva, Valeria, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erjavec, Tomaž, Etienne, Aline, Farkas, Richárd, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Gerdes, Kim, Ginter, Filip, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, Gonzáles Saavedra, Berta, Grioni, Matias, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Habash, Nizar, Hajič, Jan, Hajič jr., Jan, Hà Mỹ, Linh, Han, Na-Rae, Harris, Kim, Haug, Dag, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Hwang, Jena, Ion, Radu, Irimia, Elena, Ishola, Ọlájídé, Jelínek, Tomáš, Johannsen, Anders, Jørgensen, Fredrik, Kaşıkara, Hüner, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Katz, Boris, Kayadelen, Tolga, Kenney, Jessica, Kettnerová, 
Václava, Kirchner, Jesse, Kopacewicz, Kamil, Kotsyba, Natalia, Krek, Simon, Kwak, Sookyoung, Laippala, Veronika, Lambertino, Lorenzo, Lam, Lucia, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Li, Cheuk Ying, Li, Josie, Li, Keying, Lim, KyungTae, Ljubešić, Nikola, Loginova, Olga, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Martínez Alonso, Héctor, Martins, André, Mašek, Jan, Matsumoto, Yuji, McDonald, Ryan, Mendonça, Gustavo, Miekka, Niko, Misirpashayeva, Margarita, Missilä, Anna, Mititelu, Cătălin, Miyao, Yusuke, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Mori, Keiko Sophie, Mori, Shinsuke, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikolaev, Vitaly, Nitisaroj, Rattima, Nurmi, Hanna, Ojala, Stina, Olúòkun, Adédayọ̀, Omura, Mai, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Paulino-Passos, Guilherme, Peng, Siyao, Perez, Cenel-Augusto, Perrier, Guy, Petrov, Slav, Piitulainen, Jussi, Pitler, Emily, Plank, Barbara, Poibeau, Thierry, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Puolakainen, Tiina, Pyysalo, Sampo, Rääbis, Andriela, Rademaker, Alexandre, Ramasamy, Loganathan, Rama, Taraka, Ramisch, Carlos, Ravishankar, Vinit, Real, Livy, Reddy, Siva, Rehm, Georg, Rießler, Michael, Rinaldi, Larissa, Rituma, Laura, Rocha, Luisa, Romanenko, Mykhailo, Rosa, Rudolf, Rovati, Davide, Roșca, Valentin, Rudina, Olga, Rueter, Jack, Sadde, Shoval, Sagot, Benoît, Saleh, Shadi, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, 
Saulīte, Baiba, Sawanakunanon, Yanin, Schneider, Nathan, Schuster, Sebastian, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shen, Mo, Shimada, Atsuko, Shohibussirri, Muh, Sichinava, Dmitry, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Smith, Aaron, Soares-Bastos, Isabela, Spadine, Carolyn, Stella, Antonio, Straka, Milan, Strnadová, Jana, Suhr, Alane, Sulubacak, Umut, Szántó, Zsolt, Taji, Dima, Takahashi, Yuta, Tanaka, Takaaki, Tellier, Isabelle, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Tyers, Francis, Uematsu, Sumire, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Vajjala, Sowmya, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Villemonte de la Clergerie, Eric, Vincze, Veronika, Wallin, Lars, Wang, Jing Xian, Washington, Jonathan North, Williams, Seyi, Wirén, Mats, Woldemariam, Tsegay, Wong, Tak-sum, Yan, Chunxiao, Yavrumyan, Marat M., Yu, Zhuoran, Žabokrtský, Zdeněk, Zeldes, Amir, Zeman, Daniel, Zhang, Manying, and Zhu, Hanzhi
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, and Maltese
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.3, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.3, and PUB
25. Universal Dependencies 2.4
- Creator:
- Nivre, Joakim, Abrams, Mitchell, Agić, Željko, Ahrenberg, Lars, Aleksandravičiūtė, Gabrielė, Antonsen, Lene, Aplonova, Katya, Aranzabe, Maria Jesus, Arutie, Gashaw, Asahara, Masayuki, Ateyah, Luma, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Basmov, Victoria, Bauer, John, Bellato, Sandra, Bengoetxea, Kepa, Berzak, Yevgeni, Bhat, Irshad Ahmad, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Brokaitė, Kristina, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. A., Čéplö, Slavomír, Cetin, Savas, Chalub, Fabricio, Choi, Jinho, Cho, Yongseok, Chun, Jayeol, Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Courtin, Marine, Davidson, Elizabeth, de Marneffe, Marie-Catherine, de Paiva, Valeria, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eckhoff, Hanne, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erjavec, Tomaž, Etienne, Aline, Farkas, Richárd, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Gerdes, Kim, Ginter, Filip, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Grioni, Matias, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Habash, Nizar, Hajič, Jan, Hajič jr., Jan, Hà Mỹ, Linh, Han, Na-Rae, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Hwang, Jena, Ikeda, Takumi, Ion, Radu, Irimia, Elena, 
Ishola, Ọlájídé, Jelínek, Tomáš, Johannsen, Anders, Jørgensen, Fredrik, Kaşıkara, Hüner, Kaasen, Andre, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Katz, Boris, Kayadelen, Tolga, Kenney, Jessica, Kettnerová, Václava, Kirchner, Jesse, Köhn, Arne, Kopacewicz, Kamil, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Kwak, Sookyoung, Laippala, Veronika, Lambertino, Lorenzo, Lam, Lucia, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Li, Cheuk Ying, Li, Josie, Li, Keying, Lim, KyungTae, Li, Yuan, Ljubešić, Nikola, Loginova, Olga, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Martínez Alonso, Héctor, Martins, André, Mašek, Jan, Matsumoto, Yuji, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Miekka, Niko, Misirpashayeva, Margarita, Missilä, Anna, Mititelu, Cătălin, Miyao, Yusuke, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Mori, Keiko Sophie, Morioka, Tomohiko, Mori, Shinsuke, Moro, Shigeki, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, Vitaly, Nitisaroj, Rattima, Nurmi, Hanna, Ojala, Stina, Olúòkun, Adédayọ̀, Omura, Mai, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Paulino-Passos, Guilherme, Peljak-Łapińska, Angelika, Peng, Siyao, Perez, Cenel-Augusto, Perrier, Guy, Petrova, Daria, Petrov, Slav, Piitulainen, Jussi, Pirinen, Tommi A, Pitler, Emily, Plank, Barbara, Poibeau, Thierry, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Puolakainen, Tiina, Pyysalo, Sampo, Rääbis, Andriela, 
Rademaker, Alexandre, Ramasamy, Loganathan, Rama, Taraka, Ramisch, Carlos, Ravishankar, Vinit, Real, Livy, Reddy, Siva, Rehm, Georg, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rocha, Luisa, Romanenko, Mykhailo, Rosa, Rudolf, Rovati, Davide, Roșca, Valentin, Rudina, Olga, Rueter, Jack, Sadde, Shoval, Sagot, Benoît, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Särg, Dage, Saulīte, Baiba, Sawanakunanon, Yanin, Schneider, Nathan, Schuster, Sebastian, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shohibussirri, Muh, Sichinava, Dmitry, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Smith, Aaron, Soares-Bastos, Isabela, Spadine, Carolyn, Stella, Antonio, Straka, Milan, Strnadová, Jana, Suhr, Alane, Sulubacak, Umut, Suzuki, Shingo, Szántó, Zsolt, Taji, Dima, Takahashi, Yuta, Tamburini, Fabio, Tanaka, Takaaki, Tellier, Isabelle, Thomas, Guillaume, Torga, Liisi, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Tyers, Francis, Uematsu, Sumire, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Vajjala, Sowmya, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Villemonte de la Clergerie, Eric, Vincze, Veronika, Wallin, Lars, Walsh, Abigail, Wang, Jing Xian, Washington, Jonathan North, Wendt, Maximilan, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamazaki, Naoki, Yan, Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yu, Zhuoran, Žabokrtský, Zdeněk, Zeldes, Amir, Zeman, Daniel, Zhang, Manying, and Zhu, Hanzhi
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, and Mbyá Guaraní
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.4, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.4, and PUB
26. Universal Dependencies 2.4 Models for UDPipe (2019-05-31)
- Creator:
- Straka, Milan and Straková, Jana
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- tokenizer, POS tagger, lemmatization, tagger, parser, and dependency parser
- Language:
- Czech, Afrikaans, Arabic, Belarusian, Bulgarian, Catalan, Church Slavic, Coptic, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Persian, Finnish, French, Old French (842-ca. 1400), Irish, Galician, Gothic, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Korean, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Maltese, Dutch, Norwegian Nynorsk, Norwegian Bokmål, Old Russian, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Telugu, Turkish, Uighur, Ukrainian, Urdu, Vietnamese, Gambian Wolof, Wolof, and Chinese
- Description:
- Tokenizer, POS Tagger, Lemmatizer and Parser models for 90 treebanks of 60 languages of Universal Dependencies 2.4 Treebanks, created solely using UD 2.4 data (http://hdl.handle.net/11234/1-2988). The model documentation, including performance, can be found at http://ufal.mff.cuni.cz/udpipe/models#universal_dependencies_24_models . To use these models, you need UDPipe binary version at least 1.2, which you can download from http://ufal.mff.cuni.cz/udpipe . In addition to the models themselves, all additional data and the values of the hyperparameters used for training are available in the second archive, allowing reproducible training.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
27. Universal Dependencies 2.5
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Aepli, Noëmi, Agić, Željko, Ahrenberg, Lars, Aleksandravičiūtė, Gabrielė, Antonsen, Lene, Aplonova, Katya, Aranzabe, Maria Jesus, Arutie, Gashaw, Asahara, Masayuki, Ateyah, Luma, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bellato, Sandra, Bengoetxea, Kepa, Berzak, Yevgeni, Bhat, Irshad Ahmad, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Brokaitė, Kristina, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. A., Čéplö, Slavomír, Cetin, Savas, Chalub, Fabricio, Choi, Jinho, Cho, Yongseok, Chun, Jayeol, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Courtin, Marine, Davidson, Elizabeth, de Marneffe, Marie-Catherine, de Paiva, Valeria, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eckhoff, Hanne, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Etienne, Aline, Evelyn, Wograine, Farkas, Richárd, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Gerdes, Kim, Ginter, Filip, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Habash, Nizar, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, 
Harris, Kim, Haug, Dag, Heinecke, Johannes, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Hwang, Jena, Ikeda, Takumi, Ion, Radu, Irimia, Elena, Ishola, Ọlájídé, Jelínek, Tomáš, Johannsen, Anders, Jørgensen, Fredrik, Juutinen, Markus, Kaşıkara, Hüner, Kaasen, Andre, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Katz, Boris, Kayadelen, Tolga, Kenney, Jessica, Kettnerová, Václava, Kirchner, Jesse, Klementieva, Elena, Köhn, Arne, Kopacewicz, Kamil, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Kwak, Sookyoung, Laippala, Veronika, Lambertino, Lorenzo, Lam, Lucia, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Li, Cheuk Ying, Li, Josie, Li, Keying, Lim, KyungTae, Liovina, Maria, Li, Yuan, Ljubešić, Nikola, Loginova, Olga, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Martínez Alonso, Héctor, Martins, André, Mašek, Jan, Matsumoto, Yuji, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Miekka, Niko, Misirpashayeva, Margarita, Missilä, Anna, Mititelu, Cătălin, Mitrofan, Maria, Miyao, Yusuke, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Mori, Keiko Sophie, Morioka, Tomohiko, Mori, Shinsuke, Moro, Shigeki, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, Vitaly, Nitisaroj, Rattima, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Olúòkun, Adédayọ̀, Omura, Mai, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Paulino-Passos, Guilherme, Peljak-Łapińska, Angelika, Peng, 
Siyao, Perez, Cenel-Augusto, Perrier, Guy, Petrova, Daria, Petrov, Slav, Phelan, Jason, Piitulainen, Jussi, Pirinen, Tommi A, Pitler, Emily, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Rääbis, Andriela, Rademaker, Alexandre, Ramasamy, Loganathan, Rama, Taraka, Ramisch, Carlos, Ravishankar, Vinit, Real, Livy, Reddy, Siva, Rehm, Georg, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rocha, Luisa, Romanenko, Mykhailo, Rosa, Rudolf, Rovati, Davide, Roșca, Valentin, Rudina, Olga, Rueter, Jack, Sadde, Shoval, Sagot, Benoît, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Särg, Dage, Saulīte, Baiba, Sawanakunanon, Yanin, Schneider, Nathan, Schuster, Sebastian, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shohibussirri, Muh, Sichinava, Dmitry, Silveira, Aline, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Smith, Aaron, Soares-Bastos, Isabela, Spadine, Carolyn, Stella, Antonio, Straka, Milan, Strnadová, Jana, Suhr, Alane, Sulubacak, Umut, Suzuki, Shingo, Szántó, Zsolt, Taji, Dima, Takahashi, Yuta, Tamburini, Fabio, Tanaka, Takaaki, Tellier, Isabelle, Thomas, Guillaume, Torga, Liisi, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Tyers, Francis, Uematsu, Sumire, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vajjala, Sowmya, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Villemonte de la Clergerie, Eric, Vincze, Veronika, Wallin, Lars, Walsh, Abigail, Wang, Jing Xian, Washington, Jonathan North, Wendt, Maximilan, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamazaki, Naoki, Yan, Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yu, Zhuoran, Žabokrtský, 
Zdeněk, Zeldes, Amir, Zhang, Manying, and Zhu, Hanzhi
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, and Swiss German
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.5, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.5, and PUB
28. Universal Dependencies 2.5 Models for UDPipe (2019-12-06)
- Creator:
- Straka, Milan and Straková, Jana
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- tokenizer, POS tagger, lemmatization, tagger, parser, and dependency parser
- Language:
- Czech, Afrikaans, Arabic, Belarusian, Bulgarian, Catalan, Church Slavic, Coptic, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Persian, Finnish, French, Old French (842-ca. 1400), Irish, Galician, Gothic, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Korean, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Maltese, Dutch, Norwegian Nynorsk, Norwegian Bokmål, Old Russian, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Telugu, Turkish, Uighur, Ukrainian, Urdu, Vietnamese, Gambian Wolof, Wolof, Chinese, and Scottish Gaelic
- Description:
- Tokenizer, POS Tagger, Lemmatizer and Parser models for 94 treebanks of 61 languages of Universal Dependencies 2.5 Treebanks, created solely using UD 2.5 data (http://hdl.handle.net/11234/1-3105). The model documentation, including performance, can be found at http://ufal.mff.cuni.cz/udpipe/models#universal_dependencies_25_models . To use these models, you need UDPipe binary version at least 1.2, which you can download from http://ufal.mff.cuni.cz/udpipe . In addition to the models themselves, all additional data and the values of the hyperparameters used for training are available in the second archive, allowing reproducible training.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
29. Universal Dependencies 2.6
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Ackermann, Elia, Aepli, Noëmi, Agić, Željko, Ahrenberg, Lars, Ajede, Chika Kennedy, Aleksandravičiūtė, Gabrielė, Antonsen, Lene, Aplonova, Katya, Aquino, Angelina, Aranzabe, Maria Jesus, Arutie, Gashaw, Asahara, Masayuki, Ateyah, Luma, Atmaca, Furkan, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bengoetxea, Kepa, Berzak, Yevgeni, Bhat, Irshad Ahmad, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Brokaitė, Kristina, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. A., Čéplö, Slavomír, Cetin, Savas, Chalub, Fabricio, Chi, Ethan, Choi, Jinho, Cho, Yongseok, Chun, Jayeol, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Courtin, Marine, Davidson, Elizabeth, de Marneffe, Marie-Catherine, de Paiva, Valeria, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eckhoff, Hanne, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Etienne, Aline, Evelyn, Wograine, Farkas, Richárd, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Gerdes, Kim, Ginter, Filip, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grobol, Loïc, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Güngör, 
Tunga, Habash, Nizar, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hellwig, Oliver, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Hwang, Jena, Ikeda, Takumi, Ion, Radu, Irimia, Elena, Ishola, Ọlájídé, Jelínek, Tomáš, Johannsen, Anders, Jónsdóttir, Hildur, Jørgensen, Fredrik, Juutinen, Markus, Kaşıkara, Hüner, Kaasen, Andre, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Katz, Boris, Kayadelen, Tolga, Kenney, Jessica, Kettnerová, Václava, Kirchner, Jesse, Klementieva, Elena, Köhn, Arne, Köksal, Abdullatif, Kopacewicz, Kamil, Korkiakangas, Timo, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Kwak, Sookyoung, Laippala, Veronika, Lambertino, Lorenzo, Lam, Lucia, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Levina, Maria, Li, Cheuk Ying, Li, Josie, Li, Keying, Lim, KyungTae, Li, Yuan, Ljubešić, Nikola, Loginova, Olga, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Martínez Alonso, Héctor, Martins, André, Mašek, Jan, Matsuda, Hiroshi, Matsumoto, Yuji, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Miekka, Niko, Misirpashayeva, Margarita, Missilä, Anna, Mititelu, Cătălin, Mitrofan, Maria, Miyao, Yusuke, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Mori, Keiko Sophie, Morioka, Tomohiko, Mori, Shinsuke, Moro, Shigeki, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, Vitaly, Nitisaroj, Rattima, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Olúòkun, Adédayọ̀, Omura, 
Mai, Onwuegbuzia, Emeka, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Özateş, Şaziye Betül, Özgür, Arzucan, Öztürk Başaran, Balkız, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Paulino-Passos, Guilherme, Peljak-Łapińska, Angelika, Peng, Siyao, Perez, Cenel-Augusto, Perrier, Guy, Petrova, Daria, Petrov, Slav, Phelan, Jason, Piitulainen, Jussi, Pirinen, Tommi A, Pitler, Emily, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Rääbis, Andriela, Rademaker, Alexandre, Ramasamy, Loganathan, Rama, Taraka, Ramisch, Carlos, Ravishankar, Vinit, Real, Livy, Rebeja, Petru, Reddy, Siva, Rehm, Georg, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rocha, Luisa, Romanenko, Mykhailo, Rosa, Rudolf, Roșca, Valentin, Rovati, Davide, Rudina, Olga, Rueter, Jack, Sadde, Shoval, Sagot, Benoît, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Särg, Dage, Saulīte, Baiba, Sawanakunanon, Yanin, Scarlata, Salvatore, Schneider, Nathan, Schuster, Sebastian, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shohibussirri, Muh, Sichinava, Dmitry, Silveira, Aline, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Skachedubova, Maria, Smith, Aaron, Soares-Bastos, Isabela, Spadine, Carolyn, Stella, Antonio, Straka, Milan, Strnadová, Jana, Suhr, Alane, Sulubacak, Umut, Suzuki, Shingo, Szántó, Zsolt, Taji, Dima, Takahashi, Yuta, Tamburini, Fabio, Tanaka, Takaaki, Tella, Samson, Tellier, Isabelle, Thomas, Guillaume, Torga, Liisi, Toska, Marsida, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Türk, Utku, Tyers, Francis, Uematsu, Sumire, Untilov, Roman, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vajjala, Sowmya, van Niekerk, Daniel, van Noord, 
Gertjan, Varga, Viktor, Villemonte de la Clergerie, Eric, Vincze, Veronika, Wakasa, Aya, Wallin, Lars, Walsh, Abigail, Wang, Jing Xian, Washington, Jonathan North, Wendt, Maximilan, Widmer, Paul, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamashita, Kayo, Yamazaki, Naoki, Yan, Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yu, Zhuoran, Žabokrtský, Zdeněk, Zeldes, Amir, Zhu, Hanzhi, and Zhuravleva, Anna
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Swiss German, Albanian, and Icelandic
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.6, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.6, and PUB
30. Universal Dependencies 2.6 models for UDPipe 2 (2020-08-31)
- Creator:
- Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- tokenizer, POS tagger, lemmatization, tagger, parser, and dependency parser
- Language:
- Afrikaans, Arabic, Armenian, Belarusian, Bulgarian, Catalan, Czech, Church Slavic, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Persian, Finnish, French, Old French (842-ca. 1400), Scottish Gaelic, Irish, Galician, Gothic, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Korean, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Maltese, Dutch, Norwegian Nynorsk, Norwegian Bokmål, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Telugu, Turkish, Uighur, Ukrainian, Urdu, Vietnamese, Gambian Wolof, Wolof, and Chinese
- Description:
- Tokenizer, POS Tagger, Lemmatizer and Parser models for 99 treebanks of 63 languages of Universal Dependencies 2.6 Treebanks, created solely using UD 2.6 data (https://hdl.handle.net/11234/1-3226). The model documentation, including performance, can be found at https://ufal.mff.cuni.cz/udpipe/2/models#universal_dependencies_26_models . To use these models, you need UDPipe version 2.0, which you can download from https://ufal.mff.cuni.cz/udpipe/2 .
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
31. Universal Dependencies 2.7
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Ackermann, Elia, Aepli, Noëmi, Aghaei, Hamid, Agić, Željko, Ahmadi, Amir, Ahrenberg, Lars, Ajede, Chika Kennedy, Aleksandravičiūtė, Gabrielė, Alfina, Ika, Antonsen, Lene, Aplonova, Katya, Aquino, Angelina, Aragon, Carolina, Aranzabe, Maria Jesus, Arnardóttir, Þórunn, Arutie, Gashaw, Arwidarasti, Jessica Naraiswari, Asahara, Masayuki, Ateyah, Luma, Atmaca, Furkan, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Balasubramani, Keerthana, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bedir, Seyyit Talha, Bengoetxea, Kepa, Berk, Gözde, Berzak, Yevgeni, Bhat, Irshad Ahmad, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Bjarnadóttir, Kristín, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Brokaitė, Kristina, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. 
A., Čéplö, Slavomír, Cetin, Savas, Çetinoğlu, Özlem, Chalub, Fabricio, Chi, Ethan, Cho, Yongseok, Choi, Jinho, Chun, Jayeol, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Courtin, Marine, Davidson, Elizabeth, de Marneffe, Marie-Catherine, de Paiva, Valeria, Derin, Mehmet Oguz, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dinakaramani, Arawinda, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eckhoff, Hanne, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Etienne, Aline, Evelyn, Wograine, Facundes, Sidney, Farkas, Richárd, Fernanda, Marília, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Gerardi, Fabrício Ferraz, Gerdes, Kim, Ginter, Filip, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grobol, Loïc, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Güngör, Tunga, Habash, Nizar, Hafsteinsson, Hinrik, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, Hanifmuti, Muhammad Yudistira, Hardwick, Sam, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hellwig, Oliver, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Huber, Eva, Hwang, Jena, Ikeda, Takumi, Ingason, Anton Karl, Ion, Radu, Irimia, Elena, Ishola, Ọlájídé, Jelínek, Tomáš, Johannsen, Anders, Jónsdóttir, Hildur, Jørgensen, Fredrik, Juutinen, Markus, K, Sarveswaran, Kaşıkara, Hüner, Kaasen, Andre, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Katz, Boris, Kayadelen, Tolga, Kenney, Jessica, Kettnerová, Václava, Kirchner, Jesse, Klementieva, Elena, Köhn, Arne, Köksal, Abdullatif, Kopacewicz, Kamil, Korkiakangas, Timo, Kotsyba, Natalia, Kovalevskaitė, Jolanta, 
Krek, Simon, Krishnamurthy, Parameswari, Kwak, Sookyoung, Laippala, Veronika, Lam, Lucia, Lambertino, Lorenzo, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Levina, Maria, Li, Cheuk Ying, Li, Josie, Li, Keying, Li, Yuan, Lim, KyungTae, Lindén, Krister, Ljubešić, Nikola, Loginova, Olga, Luthfi, Andry, Luukko, Mikko, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Martínez Alonso, Héctor, Martins, André, Mašek, Jan, Matsuda, Hiroshi, Matsumoto, Yuji, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Miekka, Niko, Mischenkova, Karina, Misirpashayeva, Margarita, Missilä, Anna, Mititelu, Cătălin, Mitrofan, Maria, Miyao, Yusuke, Mojiri Foroushani, AmirHossein, Moloodi, Amirsaeid, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Mori, Keiko Sophie, Mori, Shinsuke, Morioka, Tomohiko, Moro, Shigeki, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Nakhlé, Mariam, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, Vitaly, Nitisaroj, Rattima, Nourian, Alireza, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Olúòkun, Adédayọ̀, Omura, Mai, Onwuegbuzia, Emeka, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Özateş, Şaziye Betül, Özgür, Arzucan, Öztürk Başaran, Balkız, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Paulino-Passos, Guilherme, Peljak-Łapińska, Angelika, Peng, Siyao, Perez, Cenel-Augusto, Perkova, Natalia, Perrier, Guy, Petrov, Slav, Petrova, Daria, Phelan, Jason, Piitulainen, Jussi, Pirinen, Tommi A, Pitler, Emily, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, 
Prokopis, Przepiórkowski, Adam, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Rääbis, Andriela, Rademaker, Alexandre, Rama, Taraka, Ramasamy, Loganathan, Ramisch, Carlos, Rashel, Fam, Rasooli, Mohammad Sadegh, Ravishankar, Vinit, Real, Livy, Rebeja, Petru, Reddy, Siva, Rehm, Georg, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rocha, Luisa, Rögnvaldsson, Eiríkur, Romanenko, Mykhailo, Rosa, Rudolf, Roșca, Valentin, Rovati, Davide, Rudina, Olga, Rueter, Jack, Rúnarsson, Kristján, Sadde, Shoval, Safari, Pegah, Sagot, Benoît, Sahala, Aleksi, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Särg, Dage, Saulīte, Baiba, Sawanakunanon, Yanin, Scannell, Kevin, Scarlata, Salvatore, Schneider, Nathan, Schuster, Sebastian, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shohibussirri, Muh, Sichinava, Dmitry, Sigurðsson, Einar Freyr, Silveira, Aline, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Skachedubova, Maria, Smith, Aaron, Soares-Bastos, Isabela, Spadine, Carolyn, Steingrímsson, Steinþór, Stella, Antonio, Straka, Milan, Strickland, Emmett, Strnadová, Jana, Suhr, Alane, Sulestio, Yogi Lesmana, Sulubacak, Umut, Suzuki, Shingo, Szántó, Zsolt, Taji, Dima, Takahashi, Yuta, Tamburini, Fabio, Tan, Mary Ann C., Tanaka, Takaaki, Tella, Samson, Tellier, Isabelle, Thomas, Guillaume, Torga, Liisi, Toska, Marsida, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Türk, Utku, Tyers, Francis, Uematsu, Sumire, Untilov, Roman, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vajjala, Sowmya, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Villemonte de la Clergerie, Eric, Vincze, Veronika, Wakasa, Aya, Wallenberg, Joel C., Wallin, Lars, Walsh, Abigail, Wang, Jing Xian, Washington, Jonathan North, Wendt, Maximilan, Widmer, Paul, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, 
Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamashita, Kayo, Yamazaki, Naoki, Yan, Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yu, Zhuoran, Žabokrtský, Zdeněk, Zahra, Shorouq, Zeldes, Amir, Zhu, Hanzhi, and Zhuravleva, Anna
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Swiss German, Albanian, Icelandic, Akuntsu, Apurinã, Chukot, Khunsari, Manx, Mundurukú, Nayini, Old Turkish, Soi, South Levantine Arabic, and Tupinambá
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.7, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.7, and PUB
32. Universal Dependencies 2.8
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Ackermann, Elia, Aepli, Noëmi, Aghaei, Hamid, Agić, Željko, Ahmadi, Amir, Ahrenberg, Lars, Ajede, Chika Kennedy, Aleksandravičiūtė, Gabrielė, Alfina, Ika, Antonsen, Lene, Aplonova, Katya, Aquino, Angelina, Aragon, Carolina, Aranzabe, Maria Jesus, Arıcan, Bilge Nas, Arnardóttir, Þórunn, Arutie, Gashaw, Arwidarasti, Jessica Naraiswari, Asahara, Masayuki, Aslan, Deniz Baran, Ateyah, Luma, Atmaca, Furkan, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Balasubramani, Keerthana, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Barkarson, Starkaður, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bedir, Seyyit Talha, Bengoetxea, Kepa, Berk, Gözde, Berzak, Yevgeni, Bhat, Irshad Ahmad, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Bjarnadóttir, Kristín, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Braggaar, Anouck, Brokaitė, Kristina, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cassidy, Lauren, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. 
A., Čéplö, Slavomír, Cesur, Neslihan, Cetin, Savas, Çetinoğlu, Özlem, Chalub, Fabricio, Chauhan, Shweta, Chi, Ethan, Chika, Taishi, Cho, Yongseok, Choi, Jinho, Chun, Jayeol, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Courtin, Marine, Cristescu, Mihaela, Daniel, Philemon., Davidson, Elizabeth, de Marneffe, Marie-Catherine, de Paiva, Valeria, Derin, Mehmet Oguz, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dinakaramani, Arawinda, Di Nuovo, Elisa, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eckhoff, Hanne, Eiche, Sandra, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Etienne, Aline, Evelyn, Wograine, Facundes, Sidney, Farkas, Richárd, Fernanda, Marília, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Gerardi, Fabrício Ferraz, Gerdes, Kim, Ginter, Filip, Godoy, Gustavo, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grobol, Loïc, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Güngör, Tunga, Habash, Nizar, Hafsteinsson, Hinrik, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, Hanifmuti, Muhammad Yudistira, Hardwick, Sam, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hellwig, Oliver, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Huber, Eva, Hwang, Jena, Ikeda, Takumi, Ingason, Anton Karl, Ion, Radu, Irimia, Elena, Ishola, Ọlájídé, Ito, Kaoru, Jelínek, Tomáš, Jha, Apoorva, Johannsen, Anders, Jónsdóttir, Hildur, Jørgensen, Fredrik, Juutinen, Markus, K, Sarveswaran, Kaşıkara, Hüner, Kaasen, Andre, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Kara, Neslihan, Katz, Boris, Kayadelen, Tolga, Kenney, 
Jessica, Kettnerová, Václava, Kirchner, Jesse, Klementieva, Elena, Köhn, Arne, Köksal, Abdullatif, Kopacewicz, Kamil, Korkiakangas, Timo, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Krishnamurthy, Parameswari, Kuyrukçu, Oğuzhan, Kuzgun, Aslı, Kwak, Sookyoung, Laippala, Veronika, Lam, Lucia, Lambertino, Lorenzo, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Levina, Maria, Li, Cheuk Ying, Li, Josie, Li, Keying, Li, Yuan, Lim, KyungTae, Lima Padovani, Bruna, Lindén, Krister, Ljubešić, Nikola, Loginova, Olga, Luthfi, Andry, Luukko, Mikko, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Marşan, Büşra, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Martínez Alonso, Héctor, Martins, André, Mašek, Jan, Matsuda, Hiroshi, Matsumoto, Yuji, Mazzei, Alessandro, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Miekka, Niko, Mischenkova, Karina, Misirpashayeva, Margarita, Missilä, Anna, Mititelu, Cătălin, Mitrofan, Maria, Miyao, Yusuke, Mojiri Foroushani, AmirHossein, Molnár, Judit, Moloodi, Amirsaeid, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Moretti, Giovanni, Mori, Keiko Sophie, Mori, Shinsuke, Morioka, Tomohiko, Moro, Shigeki, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Nakhlé, Mariam, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nevaci, Manuela, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, Vitaly, Nitisaroj, Rattima, Nourian, Alireza, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Olúòkun, Adédayọ̀, Omura, Mai, Onwuegbuzia, Emeka, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Özateş, Şaziye Betül, Özçelik, Merve, Özgür, Arzucan, Öztürk Başaran, Balkız, Park, Hyunji Hayley, Partanen, Niko, Pascual, Elena, Passarotti, Marco, 
Patejuk, Agnieszka, Paulino-Passos, Guilherme, Peljak-Łapińska, Angelika, Peng, Siyao, Perez, Cenel-Augusto, Perkova, Natalia, Perrier, Guy, Petrov, Slav, Petrova, Daria, Phelan, Jason, Piitulainen, Jussi, Pirinen, Tommi A, Pitler, Emily, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Rääbis, Andriela, Rademaker, Alexandre, Rama, Taraka, Ramasamy, Loganathan, Ramisch, Carlos, Rashel, Fam, Rasooli, Mohammad Sadegh, Ravishankar, Vinit, Real, Livy, Rebeja, Petru, Reddy, Siva, Rehm, Georg, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rocha, Luisa, Rögnvaldsson, Eiríkur, Romanenko, Mykhailo, Rosa, Rudolf, Roșca, Valentin, Rovati, Davide, Rudina, Olga, Rueter, Jack, Rúnarsson, Kristján, Sadde, Shoval, Safari, Pegah, Sagot, Benoît, Sahala, Aleksi, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Sanıyar, Ezgi, Särg, Dage, Saulīte, Baiba, Sawanakunanon, Yanin, Saxena, Shefali, Scannell, Kevin, Scarlata, Salvatore, Schneider, Nathan, Schuster, Sebastian, Schwartz, Lane, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shishkina, Yana, Shohibussirri, Muh, Sichinava, Dmitry, Siewert, Janine, Sigurðsson, Einar Freyr, Silveira, Aline, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Skachedubova, Maria, Smith, Aaron, Soares-Bastos, Isabela, Spadine, Carolyn, Sprugnoli, Rachele, Steingrímsson, Steinþór, Stella, Antonio, Straka, Milan, Strickland, Emmett, Strnadová, Jana, Suhr, Alane, Sulestio, Yogi Lesmana, Sulubacak, Umut, Suzuki, Shingo, Szántó, Zsolt, Taji, Dima, Takahashi, Yuta, Tamburini, Fabio, Tan, Mary Ann C., Tanaka, Takaaki, Tella, Samson, Tellier, Isabelle, Testori, Marinella, Thomas, Guillaume, Torga, Liisi, Toska, Marsida, Trosterud, Trond, Trukhina, 
Anna, Tsarfaty, Reut, Türk, Utku, Tyers, Francis, Uematsu, Sumire, Untilov, Roman, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vajjala, Sowmya, van der Goot, Rob, Vanhove, Martine, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Villemonte de la Clergerie, Eric, Vincze, Veronika, Vlasova, Natalia, Wakasa, Aya, Wallenberg, Joel C., Wallin, Lars, Walsh, Abigail, Wang, Jing Xian, Washington, Jonathan North, Wendt, Maximilan, Widmer, Paul, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamashita, Kayo, Yamazaki, Naoki, Yan, Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yenice, Arife Betül, Yıldız, Olcay Taner, Yu, Zhuoran, Žabokrtský, Zdeněk, Zahra, Shorouq, Zeldes, Amir, Zhu, Hanzhi, Zhuravleva, Anna, and Ziane, Rayan
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Swiss German, Albanian, Icelandic, Akuntsu, Apurinã, Chukot, Khunsari, Manx, Mundurukú, Nayini, Old Turkish, Soi, South Levantine Arabic, Tupinambá, Beja, Western Frisian, Guajajára, Urubú-Kaapor, Kangri, K'iche', Low German, Makuráp, Central Siberian Yupik, and Western Armenian
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.8, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.8, and PUB
33. Universal Dependencies 2.8.1
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Ackermann, Elia, Aepli, Noëmi, Aghaei, Hamid, Agić, Željko, Ahmadi, Amir, Ahrenberg, Lars, Ajede, Chika Kennedy, Aleksandravičiūtė, Gabrielė, Alfina, Ika, Antonsen, Lene, Aplonova, Katya, Aquino, Angelina, Aragon, Carolina, Aranzabe, Maria Jesus, Arıcan, Bilge Nas, Arnardóttir, Þórunn, Arutie, Gashaw, Arwidarasti, Jessica Naraiswari, Asahara, Masayuki, Aslan, Deniz Baran, Ateyah, Luma, Atmaca, Furkan, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Balasubramani, Keerthana, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Barkarson, Starkaður, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bedir, Seyyit Talha, Bengoetxea, Kepa, Berk, Gözde, Berzak, Yevgeni, Bhat, Irshad Ahmad, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Bjarnadóttir, Kristín, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Braggaar, Anouck, Brokaitė, Kristina, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cassidy, Lauren, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. 
A., Čéplö, Slavomír, Cesur, Neslihan, Cetin, Savas, Çetinoğlu, Özlem, Chalub, Fabricio, Chauhan, Shweta, Chi, Ethan, Chika, Taishi, Cho, Yongseok, Choi, Jinho, Chun, Jayeol, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Courtin, Marine, Cristescu, Mihaela, Daniel, Philemon., Davidson, Elizabeth, de Marneffe, Marie-Catherine, de Paiva, Valeria, Derin, Mehmet Oguz, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dinakaramani, Arawinda, Di Nuovo, Elisa, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eckhoff, Hanne, Eiche, Sandra, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Etienne, Aline, Evelyn, Wograine, Facundes, Sidney, Farkas, Richárd, Fernanda, Marília, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Gerardi, Fabrício Ferraz, Gerdes, Kim, Ginter, Filip, Godoy, Gustavo, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grobol, Loïc, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Güngör, Tunga, Habash, Nizar, Hafsteinsson, Hinrik, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, Hanifmuti, Muhammad Yudistira, Hardwick, Sam, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hellwig, Oliver, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Huber, Eva, Hwang, Jena, Ikeda, Takumi, Ingason, Anton Karl, Ion, Radu, Irimia, Elena, Ishola, Ọlájídé, Ito, Kaoru, Jelínek, Tomáš, Jha, Apoorva, Johannsen, Anders, Jónsdóttir, Hildur, Jørgensen, Fredrik, Juutinen, Markus, K, Sarveswaran, Kaşıkara, Hüner, Kaasen, Andre, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Kara, Neslihan, Katz, Boris, Kayadelen, Tolga, Kenney, 
Jessica, Kettnerová, Václava, Kirchner, Jesse, Klementieva, Elena, Köhn, Arne, Köksal, Abdullatif, Kopacewicz, Kamil, Korkiakangas, Timo, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Krishnamurthy, Parameswari, Kuyrukçu, Oğuzhan, Kuzgun, Aslı, Kwak, Sookyoung, Laippala, Veronika, Lam, Lucia, Lambertino, Lorenzo, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Levina, Maria, Li, Cheuk Ying, Li, Josie, Li, Keying, Li, Yuan, Lim, KyungTae, Lima Padovani, Bruna, Lindén, Krister, Ljubešić, Nikola, Loginova, Olga, Luthfi, Andry, Luukko, Mikko, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Marşan, Büşra, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Martínez Alonso, Héctor, Martins, André, Mašek, Jan, Matsuda, Hiroshi, Matsumoto, Yuji, Mazzei, Alessandro, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Miekka, Niko, Mischenkova, Karina, Misirpashayeva, Margarita, Missilä, Anna, Mititelu, Cătălin, Mitrofan, Maria, Miyao, Yusuke, Mojiri Foroushani, AmirHossein, Molnár, Judit, Moloodi, Amirsaeid, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Moretti, Giovanni, Mori, Keiko Sophie, Mori, Shinsuke, Morioka, Tomohiko, Moro, Shigeki, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Nakhlé, Mariam, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nevaci, Manuela, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, Vitaly, Nitisaroj, Rattima, Nourian, Alireza, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Olúòkun, Adédayọ̀, Omura, Mai, Onwuegbuzia, Emeka, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Özateş, Şaziye Betül, Özçelik, Merve, Özgür, Arzucan, Öztürk Başaran, Balkız, Park, Hyunji Hayley, Partanen, Niko, Pascual, Elena, Passarotti, Marco, 
Patejuk, Agnieszka, Paulino-Passos, Guilherme, Peljak-Łapińska, Angelika, Peng, Siyao, Perez, Cenel-Augusto, Perkova, Natalia, Perrier, Guy, Petrov, Slav, Petrova, Daria, Phelan, Jason, Piitulainen, Jussi, Pirinen, Tommi A, Pitler, Emily, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Rääbis, Andriela, Rademaker, Alexandre, Rama, Taraka, Ramasamy, Loganathan, Ramisch, Carlos, Rashel, Fam, Rasooli, Mohammad Sadegh, Ravishankar, Vinit, Real, Livy, Rebeja, Petru, Reddy, Siva, Rehm, Georg, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rocha, Luisa, Rögnvaldsson, Eiríkur, Romanenko, Mykhailo, Rosa, Rudolf, Roșca, Valentin, Rovati, Davide, Rudina, Olga, Rueter, Jack, Rúnarsson, Kristján, Sadde, Shoval, Safari, Pegah, Sagot, Benoît, Sahala, Aleksi, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Sanıyar, Ezgi, Särg, Dage, Saulīte, Baiba, Sawanakunanon, Yanin, Saxena, Shefali, Scannell, Kevin, Scarlata, Salvatore, Schneider, Nathan, Schuster, Sebastian, Schwartz, Lane, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shishkina, Yana, Shohibussirri, Muh, Sichinava, Dmitry, Siewert, Janine, Sigurðsson, Einar Freyr, Silveira, Aline, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Skachedubova, Maria, Smith, Aaron, Soares-Bastos, Isabela, Spadine, Carolyn, Sprugnoli, Rachele, Steingrímsson, Steinþór, Stella, Antonio, Straka, Milan, Strickland, Emmett, Strnadová, Jana, Suhr, Alane, Sulestio, Yogi Lesmana, Sulubacak, Umut, Suzuki, Shingo, Szántó, Zsolt, Taji, Dima, Takahashi, Yuta, Tamburini, Fabio, Tan, Mary Ann C., Tanaka, Takaaki, Tella, Samson, Tellier, Isabelle, Testori, Marinella, Thomas, Guillaume, Torga, Liisi, Toska, Marsida, Trosterud, Trond, Trukhina, 
Anna, Tsarfaty, Reut, Türk, Utku, Tyers, Francis, Uematsu, Sumire, Untilov, Roman, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vajjala, Sowmya, van der Goot, Rob, Vanhove, Martine, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Villemonte de la Clergerie, Eric, Vincze, Veronika, Vlasova, Natalia, Wakasa, Aya, Wallenberg, Joel C., Wallin, Lars, Walsh, Abigail, Wang, Jing Xian, Washington, Jonathan North, Wendt, Maximilan, Widmer, Paul, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamashita, Kayo, Yamazaki, Naoki, Yan, Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yenice, Arife Betül, Yıldız, Olcay Taner, Yu, Zhuoran, Žabokrtský, Zdeněk, Zahra, Shorouq, Zeldes, Amir, Zhu, Hanzhi, Zhuravleva, Anna, and Ziane, Rayan
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Swiss German, Albanian, Icelandic, Akuntsu, Apurinã, Chukot, Khunsari, Manx, Mundurukú, Nayini, Old Turkish, Soi, South Levantine Arabic, Tupinambá, Beja, Western Frisian, Guajajára, Urubú-Kaapor, Kangri, K'iche', Low German, Makuráp, Central Siberian Yupik, and Western Armenian
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008). Version 2.8.1 fixes a bug in 2.8 where a portion of the Dutch Alpino treebank was accidentally omitted.
- Rights:
- Licence Universal Dependencies v2.8, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.8, and PUB
34. Universal Dependencies 2.9
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Ackermann, Elia, Aepli, Noëmi, Aghaei, Hamid, Agić, Željko, Ahmadi, Amir, Ahrenberg, Lars, Ajede, Chika Kennedy, Aleksandravičiūtė, Gabrielė, Alfina, Ika, Antonsen, Lene, Aplonova, Katya, Aquino, Angelina, Aragon, Carolina, Aranzabe, Maria Jesus, Arıcan, Bilge Nas, Arnardóttir, Þórunn, Arutie, Gashaw, Arwidarasti, Jessica Naraiswari, Asahara, Masayuki, Aslan, Deniz Baran, Ateyah, Luma, Atmaca, Furkan, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Balasubramani, Keerthana, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Barkarson, Starkaður, Basile, Rodolfo, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bedir, Seyyit Talha, Bengoetxea, Kepa, Berk, Gözde, Berzak, Yevgeni, Bhat, Irshad Ahmad, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Bjarnadóttir, Kristín, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Braggaar, Anouck, Brokaitė, Kristina, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cassidy, Lauren, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. 
A., Čéplö, Slavomír, Cesur, Neslihan, Cetin, Savas, Çetinoğlu, Özlem, Chalub, Fabricio, Chauhan, Shweta, Chi, Ethan, Chika, Taishi, Cho, Yongseok, Choi, Jinho, Chun, Jayeol, Chung, Juyeon, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Courtin, Marine, Cristescu, Mihaela, Daniel, Philemon, Davidson, Elizabeth, de Marneffe, Marie-Catherine, de Paiva, Valeria, Derin, Mehmet Oguz, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dinakaramani, Arawinda, Di Nuovo, Elisa, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eckhoff, Hanne, Eiche, Sandra, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Etienne, Aline, Evelyn, Wograine, Facundes, Sidney, Farkas, Richárd, Ferdaousi, Jannatul, Fernanda, Marília, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Gerardi, Fabrício Ferraz, Gerdes, Kim, Ginter, Filip, Godoy, Gustavo, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grobol, Loïc, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Güngör, Tunga, Habash, Nizar, Hafsteinsson, Hinrik, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, Hanifmuti, Muhammad Yudistira, Hardwick, Sam, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hellwig, Oliver, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Huber, Eva, Hwang, Jena, Ikeda, Takumi, Ingason, Anton Karl, Ion, Radu, Irimia, Elena, Ishola, Ọlájídé, Ito, Kaoru, Jannat, Siratun, Jelínek, Tomáš, Jha, Apoorva, Johannsen, Anders, Jónsdóttir, Hildur, Jørgensen, Fredrik, Juutinen, Markus, K, Sarveswaran, Kaşıkara, Hüner, Kaasen, Andre, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Kara, 
Neslihan, Katz, Boris, Kayadelen, Tolga, Kenney, Jessica, Kettnerová, Václava, Kirchner, Jesse, Klementieva, Elena, Klyachko, Elena, Köhn, Arne, Köksal, Abdullatif, Kopacewicz, Kamil, Korkiakangas, Timo, Köse, Mehmet, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Krishnamurthy, Parameswari, Kübler, Sandra, Kuyrukçu, Oğuzhan, Kuzgun, Aslı, Kwak, Sookyoung, Laippala, Veronika, Lam, Lucia, Lambertino, Lorenzo, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Levina, Maria, Li, Cheuk Ying, Li, Josie, Li, Keying, Li, Yuan, Lim, KyungTae, Lima Padovani, Bruna, Lindén, Krister, Ljubešić, Nikola, Loginova, Olga, Lusito, Stefano, Luthfi, Andry, Luukko, Mikko, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Mahamdi, Menel, Maillard, Jean, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Marşan, Büşra, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Martínez Alonso, Héctor, Martín-Rodríguez, Lorena, Martins, André, Mašek, Jan, Matsuda, Hiroshi, Matsumoto, Yuji, Mazzei, Alessandro, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Merzhevich, Tatiana, Miekka, Niko, Mischenkova, Karina, Misirpashayeva, Margarita, Missilä, Anna, Mititelu, Cătălin, Mitrofan, Maria, Miyao, Yusuke, Mojiri Foroushani, AmirHossein, Molnár, Judit, Moloodi, Amirsaeid, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Moretti, Giovanni, Mori, Keiko Sophie, Mori, Shinsuke, Morioka, Tomohiko, Moro, Shigeki, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Nakhlé, Mariam, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nevaci, Manuela, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, Vitaly, Nitisaroj, Rattima, Nourian, Alireza, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Olúòkun, Adédayọ̀, Omura, Mai, Onwuegbuzia, Emeka, Osenova, 
Petya, Östling, Robert, Øvrelid, Lilja, Özateş, Şaziye Betül, Özçelik, Merve, Özgür, Arzucan, Öztürk Başaran, Balkız, Park, Hyunji Hayley, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Paulino-Passos, Guilherme, Peljak-Łapińska, Angelika, Peng, Siyao, Perez, Cenel-Augusto, Perkova, Natalia, Perrier, Guy, Petrov, Slav, Petrova, Daria, Phelan, Jason, Piitulainen, Jussi, Pirinen, Tommi A, Pitler, Emily, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Rääbis, Andriela, Rademaker, Alexandre, Rahoman, Mizanur, Rama, Taraka, Ramasamy, Loganathan, Ramisch, Carlos, Rashel, Fam, Rasooli, Mohammad Sadegh, Ravishankar, Vinit, Real, Livy, Rebeja, Petru, Reddy, Siva, Regnault, Mathilde, Rehm, Georg, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rizqiyah, Putri, Rocha, Luisa, Rögnvaldsson, Eiríkur, Romanenko, Mykhailo, Rosa, Rudolf, Roșca, Valentin, Rovati, Davide, Rudina, Olga, Rueter, Jack, Rúnarsson, Kristján, Sadde, Shoval, Safari, Pegah, Sagot, Benoît, Sahala, Aleksi, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Sanıyar, Ezgi, Särg, Dage, Saulīte, Baiba, Sawanakunanon, Yanin, Saxena, Shefali, Scannell, Kevin, Scarlata, Salvatore, Schneider, Nathan, Schuster, Sebastian, Schwartz, Lane, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shahzadi, Syeda, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shishkina, Yana, Shohibussirri, Muh, Sichinava, Dmitry, Siewert, Janine, Sigurðsson, Einar Freyr, Silveira, Aline, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Skachedubova, Maria, Smith, Aaron, Soares-Bastos, Isabela, Sourov, Shafi, Spadine, Carolyn, Sprugnoli, Rachele, Steingrímsson, Steinþór, Stella, Antonio, Straka, Milan, Strickland, Emmett, Strnadová, Jana, Suhr, Alane, 
Sulestio, Yogi Lesmana, Sulubacak, Umut, Suzuki, Shingo, Szántó, Zsolt, Taguchi, Chihiro, Taji, Dima, Takahashi, Yuta, Tamburini, Fabio, Tan, Mary Ann C., Tanaka, Takaaki, Tanaya, Dipta, Tella, Samson, Tellier, Isabelle, Testori, Marinella, Thomas, Guillaume, Torga, Liisi, Toska, Marsida, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Türk, Utku, Tyers, Francis, Uematsu, Sumire, Untilov, Roman, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vajjala, Sowmya, van der Goot, Rob, Vanhove, Martine, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Villemonte de la Clergerie, Eric, Vincze, Veronika, Vlasova, Natalia, Wakasa, Aya, Wallenberg, Joel C., Wallin, Lars, Walsh, Abigail, Wang, Jing Xian, Washington, Jonathan North, Wendt, Maximilan, Widmer, Paul, Wijono, Sri Hartati, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamashita, Kayo, Yamazaki, Naoki, Yan, Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yenice, Arife Betül, Yıldız, Olcay Taner, Yu, Zhuoran, Yuliawati, Arlisa, Žabokrtský, Zdeněk, Zahra, Shorouq, Zeldes, Amir, Zhou, He, Zhu, Hanzhi, Zhuravleva, Anna, and Ziane, Rayan
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Swiss German, Albanian, Icelandic, Akuntsu, Apurinã, Chukot, Khunsari, Manx, Mundurukú, Nayini, Old Turkish, Soi, South Levantine Arabic, Tupinambá, Beja, Western Frisian, Guajajára, Urubú-Kaapor, Kangri, K'iche', Low German, Makuráp, Central Siberian Yupik, Western Armenian, Bengali, Javanese, Karo (Brazil), Ligurian, Neapolitan, Tatar, Xibe, and Yakut
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008). Version 2.8.1 fixes a bug in 2.8 where a portion of the Dutch Alpino treebank was accidentally omitted.
- Rights:
- Licence Universal Dependencies v2.9, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.9, and PUB
35. Universal Segmentations 1.0 (UniSegments 1.0)
- Creator:
- Žabokrtský, Zdeněk, Bafna, Nyati, Bodnár, Jan, Kyjánek, Lukáš, Svoboda, Emil, Ševčíková, Magda, Vidra, Jonáš, Angle, Sachi, Ansari, Ebrahim, Arkhangelskiy, Timofey, Batsuren, Khuyagbaatar, Bella, Gábor, Bertinetto, Pier Marco, Bonami, Olivier, Celata, Chiara, Daniel, Michael, Fedorenko, Alexei, Filko, Matea, Giunchiglia, Fausto, Haghdoost, Hamid, Hathout, Nabil, Khomchenkova, Irina, Khurshudyan, Victoria, Levonian, Dmitri, Litta, Eleonora, Medvedeva, Maria, Muralikrishna, S. N., Namer, Fiammetta, Nikravesh, Mahshid, Padó, Sebastian, Passarotti, Marco, Plungian, Vladimir, Polyakov, Alexey, Potapov, Mihail, Pruthwik, Mishra, Rao B, Ashwath, Rubakov, Sergei, Samar, Husain, Sharma, Dipti Misra, Šnajder, Jan, Šojat, Krešimir, Štefanec, Vanja, Talamo, Luigi, Tribout, Delphine, Vodolazsky, Daniil, Vydrin, Arseniy, Zakirova, Aigul, and Zeller, Britta
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text, lexicon, and lexicalConceptualResource
- Subject:
- universal segmentations, morphological segmentation, word segmentation, segmentation, morphology, morphemes, morphological dictionary, unisegments, morph, and multilingual
- Language:
- Czech, Catalan, German, English, Persian, Finnish, French, Serbo-Croatian, Croatian, Hungarian, Italian, Komi-Zyrian, Latin, Moksha, Mari (Russia), Mongolian, Erzya, Polish, Portuguese, Russian, Spanish, Swedish, Tajik, Udmurt, Armenian, Bengali, Hindi, Malayalam, Marathi, and Kannada
- Description:
- Universal Segmentations (UniSegments) is a collection of lexical resources capturing morphological segmentations harmonised into a cross-linguistically consistent annotation scheme for many languages. The annotation scheme consists of simple tab-separated columns that store a word and its morphological segmentations, including pieces of information about the word and the segmented units, e.g., part-of-speech categories, types of morphs/morphemes, etc. The current public version of the collection contains 38 harmonised segmentation datasets covering 30 different languages.
- Rights:
- Universal Segmentations 1.0 License Terms, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-unisegs-1.0, and PUB
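The UniSegments description above mentions tab-separated columns holding a word and its segmentation. The record does not give the exact column layout, so the following is only a minimal sketch: the column order (word, part of speech, segmentation), the " + " morph separator, the sample rows, and the names `SegmentedWord` and `read_segmentations` are all assumptions made for illustration and should be checked against the documentation shipped with the dataset.

```python
import csv
import io
from typing import Iterator, NamedTuple


class SegmentedWord(NamedTuple):
    """One entry in the ASSUMED layout: word, POS tag, list of morphs."""
    word: str
    pos: str
    morphs: list[str]


def read_segmentations(stream, morph_sep: str = " + ") -> Iterator[SegmentedWord]:
    """Yield entries from a tab-separated stream.

    Hypothetical layout: column 0 = word, column 1 = part of speech,
    column 2 = segmentation with morphs joined by `morph_sep`.
    The real UniSegments files may order or encode columns differently.
    """
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or incomplete lines
        word, pos, segmentation = row[0], row[1], row[2]
        yield SegmentedWord(word, pos, segmentation.split(morph_sep))


# Made-up sample rows in the assumed layout (not taken from the dataset).
sample = "unhappiness\tNOUN\tun + happi + ness\ncats\tNOUN\tcat + s\n"
entries = list(read_segmentations(io.StringIO(sample)))
```

To read a real file, pass an open file object in place of the `io.StringIO` sample, and adjust the column indices and separator to match the actual format.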
36. W2C – Web to Corpus – Corpora
- Creator:
- Majliš, Martin
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- multilingual corpora
- Language:
- Afrikaans, Tosk Albanian, Amharic, Arabic, Aragonese, Egyptian Arabic, Asturian, Azerbaijani, Belarusian, Bengali, Bosnian, Bishnupriya, Breton, Buginese, Bulgarian, Catalan, Cebuano, Czech, Chuvash, Corsican, Welsh, Danish, German, Dimli (individual language), Modern Greek (1453-), English, Esperanto, Estonian, Basque, Faroese, Persian, Finnish, French, Western Frisian, Gan Chinese, Scottish Gaelic, Irish, Galician, Gilaki, Gujarati, Haitian, Serbo-Croatian, Hebrew, Fiji Hindi, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Ido, Interlingua (International Auxiliary Language Association), Indonesian, Icelandic, Italian, Javanese, Japanese, Kannada, Georgian, Kazakh, Korean, Kurdish, Latin, Latvian, Limburgan, Lithuanian, Lombard, Luxembourgish, Malayalam, Marathi, Macedonian, Malagasy, Mongolian, Maori, Malay (macrolanguage), Burmese, Neapolitan, Low German, Nepali (macrolanguage), Newari, Dutch, Norwegian Nynorsk, Norwegian, Occitan (post 1500), Ossetian, Pampanga, Piemontese, Polish, Portuguese, Quechua, Romanian, Russian, Yakut, Sicilian, Scots, Slovak, Slovenian, Spanish, Albanian, Serbian, Sundanese, Swahili (macrolanguage), Swedish, Tamil, Tatar, Telugu, Tajik, Tagalog, Thai, Turkish, Ukrainian, Urdu, Uzbek, Venetian, Vietnamese, Volapük, Waray (Philippines), Walloon, Yiddish, Yoruba, and Chinese
- Description:
- A set of corpora for 120 languages, automatically collected from Wikipedia and the web using the W2C toolset: http://hdl.handle.net/11858/00-097C-0000-0022-60D6-1
- Rights:
- Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0), http://creativecommons.org/licenses/by-sa/3.0/, and PUB