Search Results
2. Amara - universal subtitles
- Type:
- corpus
- Language:
- Arabic, Danish, Dutch, English, German, Modern Greek (1453-), Italian, Japanese, Korean, Portuguese, Russian, Spanish, and Turkish
- Description:
- Large set of subtitles in multiple languages, available for download. Can be used as a parallel corpus.
- Rights:
- Not specified
3. Annotated corpora and tools of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions (edition 1.0)
- Creator:
- Savary, Agata, Ramisch, Carlos, Cordeiro, Silvio Ricardo, Sangati, Federico, Vincze, Veronika, QasemiZadeh, Behrang, Candito, Marie, Cap, Fabienne, Giouli, Voula, Stoyanova, Ivelina, Doucet, Antoine, Adalı, Kübra, Barbu Mititelu, Verginica, Bejček, Eduard, El Maarouf, Ismail, Eryiğit, Gülşen, Galea, Luke, Ha-Cohen Kerner, Yaakov, Liebeskind, Chaya, Monti, Johanna, Parra Escartín, Carla, Kovalevskaitė, Jolanta, Krek, Simon, van der Plas, Lonneke, Aceta, Cristina, Aduriz, Itziar, Antoine, Jean-Yves, Attard, Greta, Azzopardi, Kirsty, Boizou, Loic, Bonnici, Janice, Boz, Mert, Bumbulienė, Ieva, Busuttil, Jael, Caruso, Valeria, Cherchi, Manuela, Constant, Matthieu, Czerepowicka, Monika, De Santis, Anna, Dimitrova, Tsvetana, Dinç, Tutkum, Elyovich, Hevi, Fabri, Ray, Farrugia, Alison, Findlay, Jamie, Fotopoulou, Aggeliki, Foufi, Vassiliki, Galea, Sara Anne, Gantar, Polona, Gatt, Albert, Gatt, Anabelle, Herrero, Carlos, Iñurrieta, Uxoa, Jagfeld, Glorianna, Hnátková, Milena, Ionescu, Mihaela, Klyueva, Natalia, Koeva, Svetla, Kovács, Viktória, Kuzman, Taja, Leseva, Svetlozara, Louisou, Sevi, Lynn, Teresa, Malka, Ruth, Martínez Alonso, Héctor, McCrae, John, de Medeiros Caseli, Helena, Miral, Ayşenur, Muscat, Amanda, Nivre, Joakim, Oakes, Michael, Onofrei, Mihaela, Parmentier, Yannick, Pasquer, Caroline, Pia di Buono, Maria, Priego Sanchez, Belem, Raffone, Annalisa, Ramisch, Renata, Rimkutė, Erika, Rizea, Monica-Mihaela, Simkó, Katalin, Spagnol, Michael, Stefanova, Valentina, Stymne, Sara, Sulubacak, Umut, Tabone, Nicole, Tanti, Marc, Todorova, Maria, Urešová, Zdenka, Villavicencio, Aline, and Zilio, Leonardo
- Publisher:
- PARSEME
- Type:
- text and corpus
- Subject:
- Multiword expressions, verbal multiword expressions, idioms, light-verb constructions, verb-particle constructions, and inherently reflexive verbs
- Language:
- Bulgarian, Czech, German, Modern Greek (1453-), Spanish, Persian, French, Hebrew, Hungarian, Italian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovenian, Swedish, and Turkish
- Description:
- The PARSEME shared task aims at identifying verbal multiword expressions (VMWEs) in running text. VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), and inherently reflexive verbs (French se suicider 'to commit suicide'). VMWEs were annotated according to the universal guidelines in 18 languages. The corpora are provided in the parsemetsv format, inspired by the CoNLL-U format. For most languages, paired files in the CoNLL-U format (not necessarily using UD tagsets) containing parts of speech, lemmas, morphological features and/or syntactic dependencies are also provided. Depending on the language, this information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). This item contains training and test data, tools, and the universal guidelines file.
- Rights:
- PARSEME Shared Task Data (v. 1.0) Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-mwe-1.0, and PUB
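The parsemetsv files of edition 1.0 are tab-separated, with one token per line and a blank line between sentences. The minimal sketch below reads them and groups tokens by VMWE identifier. It assumes the layout as we understand it (token rank, surface form, an `nsp` flag meaning "no space after", and a VMWE column holding `1:LVC` on the first token of an expression, a bare `1` on its later tokens, `_` for none, and `;` between overlapping expressions); `parse_parsemetsv` is a hypothetical helper, and the guidelines file in the package remains the authority on the format.

```python
def parse_parsemetsv(text):
    """Parse parsemetsv text into a list of (tokens, vmwes) pairs, one per sentence.

    tokens: surface forms in order.
    vmwes:  dict mapping a VMWE id to {"category": str or None, "tokens": [0-based indices]}.
    Assumed layout: rank<TAB>form<TAB>nsp<TAB>vmwe-codes.
    """
    sentences = []
    tokens, vmwes = [], {}
    # The trailing "" guarantees the last sentence is flushed even without a final blank line.
    for line in text.splitlines() + [""]:
        if line.startswith("#"):
            continue  # comment lines
        if not line.strip():
            if tokens:
                sentences.append((tokens, vmwes))
                tokens, vmwes = [], {}
            continue
        cols = line.split("\t")
        if "-" in cols[0]:
            continue  # assumed multiword-token range line such as "3-4"; not a token itself
        index = len(tokens)
        tokens.append(cols[1])
        codes = cols[3] if len(cols) > 3 else "_"
        if codes in ("_", ""):
            continue  # token is not part of any VMWE
        # Overlapping VMWEs are separated by ";"; "1:LVC" opens VMWE 1, a bare "1" continues it.
        for code in codes.split(";"):
            vmwe_id, _, category = code.partition(":")
            entry = vmwes.setdefault(vmwe_id, {"category": None, "tokens": []})
            if category:
                entry["category"] = category
            entry["tokens"].append(index)
    return sentences
```

For a sentence *John made a decision.* with `1:LVC` on *made* and `1` on *decision*, the function returns one VMWE, `{"1": {"category": "LVC", "tokens": [1, 3]}}`.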
4. Annotated corpora and tools of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions (edition 1.1)
- Creator:
- Ramisch, Carlos, Cordeiro, Silvio Ricardo, Savary, Agata, Vincze, Veronika, Barbu Mititelu, Verginica, Bhatia, Archna, Buljan, Maja, Candito, Marie, Gantar, Polona, Giouli, Voula, Güngör, Tunga, Hawwari, Abdelati, Iñurrieta, Uxoa, Kovalevskaitė, Jolanta, Krek, Simon, Lichte, Timm, Liebeskind, Chaya, Monti, Johanna, Parra Escartín, Carla, QasemiZadeh, Behrang, Ramisch, Renata, Schneider, Nathan, Stoyanova, Ivelina, Vaidya, Ashwini, Walsh, Abigail, Aceta, Cristina, Aduriz, Itziar, Antoine, Jean-Yves, Arhar Holdt, Špela, Berk, Gözde, Bielinskienė, Agnė, Blagus, Goranka, Boizou, Loic, Bonial, Claire, Caruso, Valeria, Čibej, Jaka, Constant, Matthieu, Cook, Paul, Diab, Mona, Dimitrova, Tsvetana, Ehren, Rafael, Elbadrashiny, Mohamed, Elyovich, Hevi, Erden, Berna, Estarrona, Ainara, Fotopoulou, Aggeliki, Foufi, Vassiliki, Geeraert, Kristina, van Gompel, Maarten, Gonzalez, Itziar, Gurrutxaga, Antton, Ha-Cohen Kerner, Yaakov, Ibrahim, Rehab, Ionescu, Mihaela, Jain, Kanishka, Jazbec, Ivo-Pavao, Kavčič, Teja, Klyueva, Natalia, Kocijan, Kristina, Kovács, Viktória, Kuzman, Taja, Leseva, Svetlozara, Ljubešić, Nikola, Malka, Ruth, Markantonatou, Stella, Martínez Alonso, Héctor, Matas, Ivana, McCrae, John, de Medeiros Caseli, Helena, Onofrei, Mihaela, Palka-Binkiewicz, Emilia, Papadelli, Stella, Parmentier, Yannick, Pascucci, Antonio, Pasquer, Caroline, Pia di Buono, Maria, Puri, Vandana, Raffone, Annalisa, Ratori, Shraddha, Riccio, Anna, Sangati, Federico, Shukla, Vishakha, Simkó, Katalin, Šnajder, Jan, Somers, Clarissa, Srivastava, Shubham, Stefanova, Valentina, Taslimipoor, Shiva, Theoxari, Natasa, Todorova, Maria, Urizar, Ruben, Villavicencio, Aline, and Zilio, Leonardo
- Publisher:
- PARSEME
- Type:
- text and corpus
- Subject:
- Multiword expressions, verbal multiword expressions, light-verb constructions, verb-particle constructions, inherently reflexive verbs, verbal idioms, and multi-verb constructions
- Language:
- Bulgarian, German, Modern Greek (1453-), Spanish, Persian, French, Hebrew, Hungarian, Italian, Lithuanian, Polish, Portuguese, Romanian, Slovenian, Turkish, Hindi, Basque, English, and Croatian
- Description:
- This multilingual resource contains corpora in which verbal MWEs have been manually annotated. VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do). VMWEs were annotated according to the universal guidelines in 19 languages. The corpora are provided in the cupt format, inspired by the CoNLL-U format. The corpora were used in the 1.1 edition of the PARSEME Shared Task (2018). For most languages, morphological and syntactic information (not necessarily using UD tagsets), including parts of speech, lemmas, morphological features and/or syntactic dependencies, is also provided. Depending on the language, this information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). This item contains training, development and test data, as well as the evaluation tools used in the PARSEME Shared Task 1.1 (2018). The annotation guidelines are available online: http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.1
- Rights:
- PARSEME Shared Task Data (v. 1.1) Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-mwe-1.1, and PUB
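The cupt format extends the ten CoNLL-U columns with an eleventh column, PARSEME:MWE. A minimal reader, assuming the conventions as we understand them for the 1.1 data (`*` for tokens outside any VMWE, `_` for tokens left unannotated, e.g. in blind test files, `id:CATEGORY` on the first token of a VMWE and the bare `id` on its other tokens, `;` between overlapping VMWEs). `read_cupt` is a hypothetical helper, not one of the tools shipped with this item.

```python
def read_cupt(text):
    """Return one dict per sentence: {"forms": [...], "mwes": {id: (category, [token ids])}}."""
    sentences, forms, mwes = [], [], {}
    for line in text.splitlines() + [""]:
        if line.startswith("#"):
            continue  # "# global.columns = ...", sent_id, text, and other comments
        if not line.strip():
            if forms:
                sentences.append({"forms": forms, "mwes": mwes})
                forms, mwes = [], {}
            continue
        cols = line.split("\t")
        tok_id = cols[0]
        if "-" in tok_id or "." in tok_id:
            continue  # skip multiword-token ranges and empty nodes
        forms.append(cols[1])
        mwe_col = cols[10]  # the PARSEME:MWE column
        if mwe_col in ("*", "_"):
            continue  # "*": no VMWE; "_": not annotated
        for part in mwe_col.split(";"):
            mwe_id, _, category = part.partition(":")
            old_category, token_ids = mwes.get(mwe_id, (None, []))
            # Keep the category seen on the first token; later tokens carry only the id.
            mwes[mwe_id] = (category or old_category, token_ids + [int(tok_id)])
    return sentences
```

Token IDs are kept as the 1-based CoNLL-U IDs so that they can be matched against the HEAD column.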
5. Annotated corpora and tools of the PARSEME Shared Task on Semi-Supervised Identification of Verbal Multiword Expressions (edition 1.2)
- Creator:
- Ramisch, Carlos, Guillaume, Bruno, Savary, Agata, Waszczuk, Jakub, Candito, Marie, Vaidya, Ashwini, Barbu Mititelu, Verginica, Bhatia, Archna, Iñurrieta, Uxoa, Giouli, Voula, Güngör, Tunga, Jiang, Menghan, Lichte, Timm, Liebeskind, Chaya, Monti, Johanna, Ramisch, Renata, Walsh, Abigail, Xu, Hongzhi, Palka-Binkiewicz, Emilia, Ehren, Rafael, Stymne, Sara, Constant, Matthieu, Pasquer, Caroline, Parmentier, Yannick, Antoine, Jean-Yves, Carlino, Carola, Caruso, Valeria, Di Buono, Maria Pia, Pascucci, Antonio, Raffone, Annalisa, Riccio, Anna, Sangati, Federico, Speranza, Giulia, Cordeiro, Silvio Ricardo, de Medeiros Caseli, Helena, Miranda, Isaac, Rademaker, Alexandre, Vale, Oto, Villavicencio, Aline, Wick Pedro, Gabriela, Wilkens, Rodrigo, Zilio, Leonardo, Rizea, Monica-Mihaela, Ionescu, Mihaela, Onofrei, Mihaela, Chen, Jia, Ge, Xiaomin, Hu, Fangyuan, Hu, Sha, Li, Minli, Liu, Siyuan, Qin, Zhenzhen, Sun, Ruilong, Wang, Chenweng, Xiao, Huangyang, Yan, Peiyi, Yih, Tsy, Yu, Ke, Yu, Songping, Zeng, Si, Zhang, Yongchen, Zhao, Yun, Foufi, Vassiliki, Fotopoulou, Aggeliki, Markantonatou, Stella, Papadelli, Stella, Louizou, Sevasti, Aduriz, Itziar, Estarrona, Ainara, Gonzalez, Itziar, Gurrutxaga, Antton, Uria, Larraitz, Urizar, Ruben, Foster, Jennifer, Lynn, Teresa, Elyovitch, Hevi, Ha-Cohen Kerner, Yaakov, Malka, Ruth, Jain, Kanishka, Puri, Vandana, Ratori, Shraddha, Shukla, Vishakha, Srivastava, Shubham, Berk, Gozde, Erden, Berna, and Yirmibeşoğlu, Zeynep
- Publisher:
- PARSEME
- Type:
- text and corpus
- Subject:
- multiword expressions, verbal multiword expressions, light-verb constructions, verb-particle constructions, inherently reflexive verbs, verbal idioms, and multi-verb constructions
- Language:
- German, Modern Greek (1453-), Basque, French, Irish, Hebrew, Hindi, Italian, Polish, Portuguese, Romanian, Swedish, Turkish, and Chinese
- Description:
- This multilingual resource contains corpora in which verbal MWEs have been manually annotated, gathered on the occasion of the 1.2 edition of the PARSEME Shared Task on Semi-Supervised Identification of Verbal MWEs (2020). VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do). For the 1.2 edition, the data covers 14 languages, in which VMWEs were annotated according to the universal guidelines. The corpora are provided in the cupt format, inspired by the CoNLL-U format. Morphological and syntactic information (not necessarily using UD tagsets), including parts of speech, lemmas, morphological features and/or syntactic dependencies, is also provided. Depending on the language, this information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). This item contains training, development and test data, as well as the evaluation tools used in the PARSEME Shared Task 1.2 (2020). The annotation guidelines are available online: http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.2
- Rights:
- PARSEME Shared Task Data (v. 1.2) Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-mwe-1.2, and PUB
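This item ships the official evaluation tools. To illustrate only the idea behind an MWE-based score, here is a simplified precision/recall/F1 in which a predicted VMWE counts as correct when its set of token IDs exactly matches a gold VMWE in the same sentence. It ignores categories and token-based scoring, so it is not a substitute for the shipped tools; the input layout (one set of frozensets of token IDs per sentence) and the name `mwe_based_prf` are our own assumptions.

```python
def mwe_based_prf(gold, pred):
    """Exact-match VMWE precision, recall and F1.

    gold, pred: lists with one entry per sentence; each entry is a set of
    frozensets, and each frozenset holds the token IDs of one VMWE.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and prediction must cover the same sentences")
    true_pos = n_gold = n_pred = 0
    for gold_mwes, pred_mwes in zip(gold, pred):
        true_pos += len(gold_mwes & pred_mwes)  # identical token sets only
        n_gold += len(gold_mwes)
        n_pred += len(pred_mwes)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With three gold and three predicted VMWEs of which two match exactly, precision, recall and F1 are all 2/3. A partially overlapping prediction earns no credit here, which is the main difference from a token-based score.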
6. C4Corpus (CC BY-NC part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malayalam, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Panjabi, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in 50+ languages, extracted from CommonCrawl and released under licenses from the Creative Commons family. CommonCrawl is the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB
7. C4Corpus (CC BY-NC-ND part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malayalam, Marathi, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Urdu, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in 50+ languages, extracted from CommonCrawl and released under licenses from the Creative Commons family. CommonCrawl is the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB
8. C4Corpus (CC BY-NC-SA part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Malayalam, Marathi, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Urdu, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in 50+ languages, extracted from CommonCrawl and released under licenses from the Creative Commons family. CommonCrawl is the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
9. C4Corpus (CC BY-ND part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Malayalam, Macedonian, Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in 50+ languages, extracted from CommonCrawl and released under licenses from the Creative Commons family. CommonCrawl is the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution-NoDerivatives 4.0 International (CC BY-ND 4.0), http://creativecommons.org/licenses/by-nd/4.0/, and PUB
10. C4Corpus (CC BY-SA part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malayalam, Marathi, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Panjabi, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Urdu, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in 50+ languages, extracted from CommonCrawl and released under licenses from the Creative Commons family. CommonCrawl is the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB
11. C4Corpus (CC BY part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malayalam, Marathi, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Panjabi, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Urdu, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in 50+ languages, extracted from CommonCrawl and released under licenses from the Creative Commons family. CommonCrawl is the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB
12. C4Corpus (publicdomain part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Croatian, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Dutch, Norwegian, Polish, Portuguese, Russian, Slovenian, Somali, Spanish, Swahili (macrolanguage), Swedish, Tagalog, Thai, Turkish, Ukrainian, Undetermined, and Vietnamese
- Description:
- A large web corpus (over 10 billion tokens) in 50+ languages, extracted from CommonCrawl and released under licenses from the Creative Commons family. CommonCrawl is the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Public Domain Mark (PD), http://creativecommons.org/publicdomain/mark/1.0/, and PUB
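The seven C4Corpus parts differ only in their Rights field, so choosing parts for a given use comes down to reading the Creative Commons license elements (BY, NC, ND, SA). The hypothetical helper below reduces a license string to four flags; it is a deliberate simplification of the license texts linked in each entry, and the flag names are our own.

```python
import re

def license_terms(license_text: str) -> dict:
    """Reduce a Creative Commons license string to simplified usage flags.

    Accepts codes such as "BY-NC-SA", "CC BY-ND 4.0" or "Public Domain Mark (PD)".
    """
    if "publicdomain" in license_text.lower().replace(" ", ""):
        # Public Domain Mark: no license conditions attached.
        return {"attribution": False, "commercial": True,
                "derivatives": True, "share_alike": False}
    elements = set(re.findall(r"\b(BY|NC|ND|SA)\b", license_text.upper()))
    if not elements:
        raise ValueError(f"no Creative Commons elements found in {license_text!r}")
    return {
        "attribution": "BY" in elements,     # credit the authors
        "commercial": "NC" not in elements,  # NC forbids commercial use
        "derivatives": "ND" not in elements, # ND forbids adapted versions
        "share_alike": "SA" in elements,     # SA requires the same license downstream
    }
```

For example, selecting the parts that permit both commercial use and derivative works from `["BY", "BY-SA", "BY-NC", "BY-NC-SA", "BY-ND", "BY-NC-ND", "publicdomain"]` leaves `BY`, `BY-SA` and `publicdomain`, with `BY-SA` additionally requiring share-alike.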
13. CLIPS: corpora e lessici di italiano parlato e scritto (corpora and lexicons of spoken and written Italian)
- Publisher:
- Università degli studi di Napoli Federico II
- Type:
- corpus
- Language:
- Italian
- Description:
- Audio files totaling about 100 hours of speech recorded in 15 Italian cities. Some of the recordings are transcribed, with the transcriptions available as PDF files.
- Rights:
- Not specified
14. CoNLL 2017 and 2018 Shared Task Blind and Preprocessed Test Data
- Creator:
- Zeman, Daniel and Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- tokenization, word segmentation, morphology, tagging, syntax, parsing, and universal dependencies
- Language:
- Afrikaans, Arabic, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Persian, Finnish, French, Old French (842-ca. 1400), Irish, Galician, Gothic, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Latin, Latvian, Dutch, Norwegian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Thai, Turkish, Uighur, Ukrainian, Urdu, Vietnamese, and Chinese
- Description:
- CoNLL 2017 and 2018 shared tasks: Multilingual Parsing from Raw Text to Universal Dependencies. This package contains the test data in the form in which they were presented to the participating systems: raw text files and files preprocessed by UDPipe. The metadata.json files list the files to process and to output; README files in the respective folders describe the syntax of metadata.json. For the full training, development and gold-standard test data, see Universal Dependencies 2.0 (CoNLL 2017) and Universal Dependencies 2.2 (CoNLL 2018); download links are at http://universaldependencies.org/. For more information on the shared tasks, see http://universaldependencies.org/conll17/ and http://universaldependencies.org/conll18/. Contents: conll17-ud-test-2017-05-09 (CoNLL 2017 test data); conll18-ud-test-2018-05-06 (CoNLL 2018 test data); conll18-ud-test-2018-05-06-for-conll17 (CoNLL 2018 test data with metadata and filenames modified so that the 2017 systems can process it).
- Rights:
- Licence Universal Dependencies v2.2, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.2, and PUB
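Each test folder's metadata.json tells a system which files to read and which outputs to produce. A minimal sketch of turning it into a run plan, assuming the file is a JSON list of objects; the key names `rawfile` and `outfile` are assumptions, exposed as parameters, and should be checked against the README in the folder before use.

```python
import json
from pathlib import Path

def load_metadata(test_dir):
    """Read test_dir/metadata.json, assumed to be a JSON list of objects."""
    with open(Path(test_dir) / "metadata.json", encoding="utf-8") as f:
        return json.load(f)

def plan_runs(entries, test_dir, input_key="rawfile", output_key="outfile"):
    """Pair each input file with the output file name expected by the task.

    input_key / output_key are ASSUMED key names; see the folder's README.
    """
    runs = []
    for entry in entries:
        runs.append((Path(test_dir) / entry[input_key], entry[output_key]))
    return runs
```

Keeping the key names as parameters means that switching to the UDPipe-preprocessed inputs only requires passing the corresponding key from the README.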
15. CoNLL 2017 Shared Task System Outputs
- Creator:
- Zeman, Daniel, Potthast, Martin, Straka, Milan, Popel, Martin, Dozat, Timothy, Qi, Peng, Manning, Christopher, Shi, Tianze, Wu, Felix G., Chen, Xilun, Cheng, Yao, Björkelund, Anders, Falenska, Agnieszka, Yu, Xiang, Kuhn, Jonas, Che, Wanxiang, Guo, Jiang, Wang, Yuxuan, Zheng, Bo, Zhao, Huaipeng, Liu, Yang, Teng, Dechuan, Liu, Ting, Lim, Kyungtae, Poibeau, Thierry, Sato, Motoki, Manabe, Hitoshi, Noji, Hiroshi, Matsumoto, Yuji, Kırnap, Ömer, Önder, Berkay Furkan, Yuret, Deniz, Straková, Jana, Vania, Clara, Zhang, Xingxing, Lopez, Adam, Heinecke, Johannes, Asadullah, Munshi, Kanerva, Jenna, Luotolahti, Juhani, Ginter, Filip, Kuan, Yu, Sofroniev, Pavel, Schill, Erik, Hinrichs, Erhard, Nguyen, Dat Quoc, Dras, Mark, Johnson, Mark, Qian, Xian, Vilares, David, Gómez-Rodríguez, Carlos, Aufrant, Lauriane, Wisniewski, Guillaume, Yvon, François, Dumitrescu, Stefan Daniel, Boroş, Tiberiu, Tufiş, Dan, Das, Ayan, Zaffar, Affan, Sarkar, Sudeshna, Wang, Hao, Zhao, Hai, Zhang, Zhisong, Hornby, Ryan, Taylor, Clark, Park, Jungyeul, de Lhoneux, Miryam, Shao, Yan, Basirat, Ali, Kiperwasser, Eliyahu, Stymne, Sara, Goldberg, Yoav, Nivre, Joakim, Akkuş, Burak Kerim, Azizoglu, Heval, Cakici, Ruket, Moor, Christophe, Merlo, Paola, Henderson, James, Wang, Haozhou, Ji, Tao, Wu, Yuanbin, Lan, Man, de la Clergerie, Eric, Sagot, Benoît, Seddah, Djamé, More, Amir, Tsarfaty, Reut, Kanayama, Hiroshi, Muraoka, Masayasu, Yoshikawa, Katsumasa, Garcia, Marcos, and Gamallo, Pablo
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- dependency parser and parsebank
- Language:
- Arabic, Bulgarian, Russia Buriat, Czech, Catalan, Church Slavic, Danish, German, Modern Greek (1453-), English, Spanish, Estonian, Basque, Persian, Finnish, French, Irish, Galician, Gothic, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Latin, Latvian, Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Northern Sami, Swedish, Turkish, Uighur, Ukrainian, Urdu, Vietnamese, and Chinese
- Description:
- This package contains the system outputs from the CoNLL 2017 Shared Task in Multilingual Parsing from Raw Text to Universal Dependencies.
- Rights:
- Licence Universal Dependencies v2.0, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.0, and PUB
16. CoNLL 2018 Shared Task System Outputs
- Creator:
- Zeman, Daniel, Potthast, Martin, Duthoo, Elie, Mesnard, Olivier, Rybak, Piotr, Wróblewska, Alina, Che, Wanxiang, Liu, Yijia, Wang, Yuxuan, Zheng, Bo, Liu, Ting, Li, Zuchao, He, Shexia, Zhang, Zhuosheng, Zhao, Hai, Wu, Yingting, Tong, Jia-Jun, Nguyen, Dat Quoc, Verspoor, Karin, Wan, Hui, Naseem, Tahira, Lee, Young-Suk, Castelli, Vittorio, Ballesteros, Miguel, Hershcovich, Daniel, Abend, Omri, Rappoport, Ari, Smith, Aaron, Bohnet, Bernd, de Lhoneux, Miryam, Nivre, Joakim, Shao, Yan, Stymne, Sara, Kırnap, Ömer, Dayanık, Erenay, Yuret, Deniz, Kanerva, Jenna, Ginter, Filip, Miekka, Niko, Leino, Akseli, Salakoski, Tapio, Lim, KyungTae, Park, Cheoneum, Lee, Changki, Poibeau, Thierry, Bhat, Riyaz Ahmad, Bhat, Irshad, Bangalore, Srinivas, Qi, Peng, Dozat, Timothy, Zhang, Yuhao, Manning, Christopher, Boroș, Tiberiu, Dumitrescu, Stefan Daniel, Burtica, Ruxandra, Arakelyan, Gor, Hambardzumyan, Karen, Khachatrian, Hrant, Rosa, Rudolf, Mareček, David, Straka, Milan, Seker, Amit, More, Amir, Tsarfaty, Reut, Önder, Berkay Furkan, Gümeli, Can, Jawahar, Ganesh, Muller, Benjamin, Fethi, Amal, Martin, Louis, Villemonte de la Clergerie, Eric, Sagot, Benoît, Seddah, Djamé, Özateş, Şaziye Betül, Özgür, Arzucan, Gungor, Tunga, Öztürk, Balkız, Ji, Tao, Liu, Yufang, Wang, Yijun, Wu, Yuanbin, Lan, Man, Chen, Danlu, Lin, Mengxiao, Hu, Zhifeng, and Qiu, Xipeng
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- parsed data, conllu, and universal dependencies
- Language:
- Afrikaans, Arabic, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Persian, Finnish, French, Old French (842-ca. 1400), Irish, Galician, Gothic, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Latin, Latvian, Dutch, Norwegian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Thai, Turkish, Uighur, Ukrainian, Urdu, Vietnamese, and Chinese
- Description:
- Test data parsed by systems submitted to the CoNLL 2018 UD parsing shared task.
- Rights:
- Licence Universal Dependencies v2.2, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.2, and PUB
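The system outputs are CoNLL-U files, so a quick sanity check is to score one of them against the gold test data distributed with Universal Dependencies 2.2. The sketch below computes UAS and LAS only when gold and system tokenization are identical; the official CoNLL 2018 evaluation script aligns differing tokenizations, which this sketch does not attempt. Following our understanding of the shared task's LAS, only the universal part of the relation label (before any `:`) is compared. Function names are hypothetical.

```python
def read_heads(conllu_text):
    """Return (head, universal deprel) for every syntactic word in a CoNLL-U string."""
    rows = []
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        rows.append((cols[6], cols[7].split(":")[0]))  # drop subtypes such as ":relcl"
    return rows

def attachment_scores(gold_text, system_text):
    """UAS and LAS for two CoNLL-U strings with identical tokenization."""
    gold, system = read_heads(gold_text), read_heads(system_text)
    if len(gold) != len(system) or not gold:
        raise ValueError("tokenizations differ or are empty; this sketch cannot align them")
    uas = sum(g[0] == s[0] for g, s in zip(gold, system)) / len(gold)
    las = sum(g == s for g, s in zip(gold, system)) / len(gold)
    return uas, las
```

For a three-word sentence where the system attaches every word correctly but mislabels one relation, this yields UAS 1.0 and LAS 2/3.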
17. Copenhagen Dependency Treebanks versions 1-3
- Publisher:
- Copenhagen Business School
- Format:
- application/octet-stream
- Type:
- corpus
- Subject:
- parallel treebank, POS annotation, discourse annotation, morphological annotation, syntactic annotation, and semantic annotation
- Language:
- Danish, English, German, Italian, and Spanish
- Description:
- Parallel treebanks with annotation of syntax, discourse, coreference, morphology, and semantics. Version 3 also includes the Danish Dependency Treebank (version 1) and the Danish-English Parallel Dependency Treebank (version 2).
- Rights:
- GNU General Public License
18. Corpus of Italian Emblem Books
- Publisher:
- University of Glasgow
- Type:
- corpus
- Language:
- Italian
- Description:
- Italian emblem books from the Stirling Maxwell Collection (University of Glasgow), with transcribed texts and photographic reproductions. Searchable and browsable online.
- Rights:
- Not specified
19. DaMuEL 1.0: A Large Multilingual Dataset for Entity Linking
- Creator:
- Kubeša, David and Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- entity linking, NEL, NER, dataset, and knowledge base
- Language:
- Afrikaans, Arabic, Armenian, Basque, Belarusian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Latin, Latvian, Lithuanian, Maltese, Marathi, Modern Greek (1453-), Northern Sami, Norwegian Nynorsk, Persian, Polish, Portuguese, Romanian, Russian, Scottish Gaelic, Serbian, Slovak, Slovenian, Spanish, Swedish, Tamil, Telugu, Uighur, Ukrainian, Urdu, Vietnamese, and Wolof
- Description:
- We present DaMuEL, a large Multilingual Dataset for Entity Linking containing data in 53 languages. DaMuEL consists of two components: a knowledge base that contains language-agnostic information about entities, including their claims from Wikidata and named entity types (PER, ORG, LOC, EVENT, BRAND, WORK_OF_ART, MANUFACTURED); and Wikipedia texts with entity mentions linked to the knowledge base, along with language-specific text from Wikidata such as labels, aliases, and descriptions, stored separately for each language. The Wikidata QID is used as a persistent, language-agnostic identifier, enabling the combination of the knowledge base with language-specific texts and information for each entity. Wikipedia documents deliberately annotate only a single mention for every entity present; we further automatically detect all mentions of named entities linked from each document. The dataset contains 27.9M named entities in the knowledge base and 12.3G tokens from Wikipedia texts. The dataset is published under the CC BY-SA licence.
- Rights:
- Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB
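DaMuEL uses the Wikidata QID as the join key between the language-agnostic knowledge base and the per-language texts. The sketch below shows that join on small in-memory records; the record layout (`qid`, `lang`, `surface`, `type`, `labels`) is hypothetical and does not reproduce the dataset's actual file format.

```python
def resolve_mentions(mentions, kb):
    """Attach knowledge-base information to each mention via its Wikidata QID.

    mentions: list of {"qid": str, "lang": str, "surface": str}  (hypothetical layout)
    kb:       dict QID -> {"type": str, "labels": {lang: label}} (hypothetical layout)
    """
    resolved = []
    for mention in mentions:
        entity = kb.get(mention["qid"])
        if entity is None:
            continue  # QID not present in the loaded part of the knowledge base
        labels = entity.get("labels", {})
        # Prefer a label in the mention's language, fall back to English.
        label = labels.get(mention["lang"]) or labels.get("en")
        resolved.append({"surface": mention["surface"], "qid": mention["qid"],
                         "type": entity.get("type"), "label": label})
    return resolved
```

Because the QID is language-agnostic, a Czech mention such as *Paříži* and an English mention *Paris* both resolve to the same entry, Q90.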
20. Deep Universal Dependencies 2.4
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, and Galician
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-2988). It contains additional deep-syntactic and semantic annotations. Each Deep UD version corresponds to the UD version it is based on. Note, however, that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.4, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.4, and PUB
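Deep UD treebanks are distributed in CoNLL-U. Assuming the additional deep-syntactic edges are stored as a graph in the DEPS column (check the Deep UD documentation for where each annotation lives), the sketch below collects those edges. Per the CoNLL-U convention, DEPS holds `head:relation` pairs separated by `|`; heads may be decimal IDs of empty nodes, and relations may themselves contain colons. The function names are hypothetical.

```python
def parse_deps(deps):
    """Parse a DEPS value like '2:nsubj|5:nsubj:xsubj' into (head, relation) pairs."""
    if deps == "_":
        return []
    pairs = []
    for item in deps.split("|"):
        head, _, relation = item.partition(":")  # split on the first colon only
        pairs.append((head, relation))  # head stays a string: it may be "8.1"
    return pairs

def deep_edges(conllu_sentence):
    """Collect (head, relation, dependent) triples from the DEPS column of one sentence."""
    edges = []
    for line in conllu_sentence.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0]:
            continue  # multiword-token ranges carry no DEPS; empty nodes are kept
        for head, relation in parse_deps(cols[8]):
            edges.append((head, relation, cols[0]))
    return edges
```

Unlike the basic tree in HEAD/DEPREL, a word may receive several incoming edges here, so the result is a list of triples rather than one head per word.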
21. Deep Universal Dependencies 2.5
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, Galician, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, and Skolt Sami
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3105). It contains additional deep-syntactic and semantic annotations. Each Deep UD version corresponds to the UD version it is based on. Note, however, that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.5, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.5, and PUB
22. Deep Universal Dependencies 2.6
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, Galician, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Icelandic, Albanian, and Persian
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3226). It contains additional deep-syntactic and semantic annotations. The version of Deep UD corresponds to the version of UD it is based on. Note, however, that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.6, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.6, and PUB
23. Deep Universal Dependencies 2.7
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, Galician, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Icelandic, Albanian, Persian, Akuntsu, Apurinã, Khunsari, Manx, Mundurukú, Nayini, Soi, South Levantine Arabic, and Tupinambá
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3424). It contains additional deep-syntactic and semantic annotations. The version of Deep UD corresponds to the version of UD it is based on. Note, however, that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.7, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.7, and PUB
24. Deep Universal Dependencies 2.8
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, Galician, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Icelandic, Albanian, Persian, Akuntsu, Apurinã, Khunsari, Manx, Mundurukú, Nayini, Soi, South Levantine Arabic, Tupinambá, Beja, Western Frisian, Urubú-Kaapor, Kangri, K'iche', Low German, Makuráp, Western Armenian, and Central Siberian Yupik
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3687). It contains additional deep-syntactic and semantic annotations. The version of Deep UD corresponds to the version of UD it is based on. Note, however, that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.8, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.8, and PUB
25. Deltacorpus
- Creator:
- Mareček, David, Yu, Zhiwei, Zeman, Daniel, and Žabokrtský, Zdeněk
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- part of speech, tagging, semi-supervised, and cross-language
- Language:
- Belarusian, Bosnian, Bulgarian, Czech, Serbo-Croatian, Croatian, Upper Sorbian, Macedonian, Polish, Russian, Slovak, Slovenian, Serbian, Ukrainian, Latvian, Lithuanian, Afrikaans, Danish, German, English, Faroese, Western Frisian, Swiss German, Icelandic, Limburgan, Luxembourgish, Low German, Dutch, Norwegian Nynorsk, Norwegian, Scots, Swedish, Yiddish, Aragonese, Asturian, Catalan, French, Galician, Haitian, Italian, Latin, Lombard, Neapolitan, Piemontese, Portuguese, Romanian, Spanish, Venetian, Walloon, Breton, Welsh, Scottish Gaelic, Irish, Modern Greek (1453-), Armenian, Albanian, Dimli (individual language), Persian, Gilaki, Kurdish, Tajik, Bengali, Bishnupriya, Gujarati, Fiji Hindi, Hindi, Marathi, Nepali (macrolanguage), Urdu, Amharic, Arabic, Egyptian Arabic, Hebrew, Estonian, Finnish, Hungarian, Basque, Georgian, Chuvash, Azerbaijani, Turkish, Uzbek, Kazakh, Tatar, Yakut, Korean, Mongolian, Telugu, Kannada, Malayalam, Tamil, Newari, Vietnamese, Indonesian, Javanese, Malagasy, Maori, Malay (macrolanguage), Pampanga, Sundanese, Tagalog, Waray (Philippines), Swahili (macrolanguage), Esperanto, Ido, Interlingua (International Auxiliary Language Association), and Volapük
- Description:
- Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), first 1,000,000 tokens per language, tagged by the delexicalized tagger described in Yu et al. (2016, LREC, Portorož, Slovenia).
- Rights:
- Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB
26. Deltacorpus 1.1
- Creator:
- Mareček, David, Yu, Zhiwei, Zeman, Daniel, and Žabokrtský, Zdeněk
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- part of speech, tagging, semi-supervised, and cross-language
- Language:
- Belarusian, Bosnian, Bulgarian, Czech, Serbo-Croatian, Croatian, Upper Sorbian, Macedonian, Polish, Russian, Slovak, Slovenian, Serbian, Ukrainian, Latvian, Lithuanian, Afrikaans, Danish, German, English, Faroese, Western Frisian, Swiss German, Icelandic, Limburgan, Luxembourgish, Low German, Dutch, Norwegian Nynorsk, Norwegian, Scots, Swedish, Yiddish, Aragonese, Asturian, Catalan, French, Galician, Haitian, Italian, Latin, Lombard, Neapolitan, Piemontese, Portuguese, Romanian, Spanish, Venetian, Walloon, Breton, Welsh, Scottish Gaelic, Irish, Modern Greek (1453-), Armenian, Albanian, Dimli (individual language), Persian, Gilaki, Kurdish, Tajik, Bengali, Bishnupriya, Gujarati, Fiji Hindi, Hindi, Marathi, Nepali (macrolanguage), Urdu, Amharic, Arabic, Egyptian Arabic, Hebrew, Estonian, Finnish, Hungarian, Basque, Georgian, Chuvash, Azerbaijani, Turkish, Uzbek, Kazakh, Tatar, Yakut, Korean, Mongolian, Telugu, Kannada, Malayalam, Tamil, Newari, Vietnamese, Indonesian, Javanese, Malagasy, Maori, Malay (macrolanguage), Pampanga, Sundanese, Tagalog, Waray (Philippines), Swahili (macrolanguage), Esperanto, Ido, Interlingua (International Auxiliary Language Association), and Volapük
- Description:
- Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), first 1,000,000 tokens per language, tagged by the delexicalized tagger described in Yu et al. (2016, LREC, Portorož, Slovenia). Changes in version 1.1: (1) the Universal Dependencies tagset is used instead of the older and smaller Google Universal POS tagset; (2) the SVM classifier is trained on Universal Dependencies 1.2 instead of HamleDT 2.0; (3) Balto-Slavic, Germanic and Romance languages were tagged by a classifier trained only on the respective language group, while other languages were tagged by a classifier trained on all available languages. The "c7" combination from version 1.0 is no longer used.
- Rights:
- Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB
27. HamleDT 2.0
- Creator:
- Zeman, Daniel, Mareček, David, Mašek, Jan, Popel, Martin, Ramasamy, Loganathan, Rosa, Rudolf, Štěpánek, Jan, and Žabokrtský, Zdeněk
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- treebank, Stanford dependencies, Prague dependencies, harmonization, common annotation style, and Interset
- Language:
- Arabic, Bulgarian, Bengali, Catalan, Czech, Danish, German, Modern Greek (1453-), English, Spanish, Estonian, Basque, Persian, Finnish, Ancient Greek (to 1453), Hindi, Hungarian, Italian, Japanese, Latin, Dutch, Portuguese, Romanian, Russian, Slovak, Slovenian, Swedish, Tamil, Telugu, and Turkish
- Description:
- HamleDT 2.0 is a collection of 30 existing treebanks harmonized into a common annotation style, the Prague Dependencies, and further transformed into Stanford Dependencies, a treebank annotation style that has recently become popular. We use the newest basic Universal Stanford Dependencies, without added language-specific subtypes.
- Rights:
- HamleDT 2.0 Licence Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-hamledt-2.0, and ACA
28. HamleDT 3.0
- Creator:
- Zeman, Daniel, Mareček, David, Mašek, Jan, Popel, Martin, Ramasamy, Loganathan, Rosa, Rudolf, Štěpánek, Jan, and Žabokrtský, Zdeněk
- Publisher:
- Charles University
- Type:
- text and corpus
- Subject:
- annotated corpus, morphology, syntax, dependency, treebank, harmonized annotation, and common annotation style
- Language:
- Arabic, Basque, Bengali, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Modern Greek (1453-), Ancient Greek (to 1453), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Persian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Tamil, Telugu, and Turkish
- Description:
- HamleDT (HArmonized Multi-LanguagE Dependency Treebank) is a compilation of existing dependency treebanks (or dependency conversions of other treebanks), transformed so that they all conform to the same annotation style. This version uses Universal Dependencies as the common annotation style. Update (November 2017): for a current collection of harmonized dependency treebanks, we recommend using Universal Dependencies (UD). All corpora that HamleDT distributes in full are also part of the UD project; only some corpora from the Patch group (for which HamleDT provides only the harmonizing scripts, not the full corpus data) are available in HamleDT but not in UD.
- Rights:
- HamleDT 3.0 License Terms, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-hamledt-3.0, and PUB
29. IWPT 2020 Shared Task Data and System Outputs
- Creator:
- Zeman, Daniel, Bouma, Gosse, and Seddah, Djamé
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, enhanced universal dependencies, shared task, and parsing
- Language:
- Arabic, Bulgarian, Czech, Dutch, English, Estonian, Finnish, French, Italian, Latvian, Lithuanian, Polish, Russian, Slovak, Swedish, Tamil, and Ukrainian
- Description:
- This package contains data used in the IWPT 2020 shared task. It contains training, development and test (evaluation) datasets. The data is based on a subset of Universal Dependencies release 2.5 (http://hdl.handle.net/11234/1-3105) but some treebanks contain additional enhanced annotations. Moreover, not all of these additions became part of Universal Dependencies release 2.6 (http://hdl.handle.net/11234/1-3226), which makes the shared task data unique and worth a separate release to enable later comparison with new parsing algorithms. The package also contains a number of Perl and Python scripts that have been used to process the data during preparation and during the shared task. Finally, the package includes the official primary submission of each team participating in the shared task.
- Rights:
- Licence Universal Dependencies v2.5, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.5, and PUB
30. IWPT 2021 Shared Task Data and System Outputs
- Creator:
- Zeman, Daniel, Bouma, Gosse, and Seddah, Djamé
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, enhanced universal dependencies, shared task, and parsing
- Language:
- Arabic, Bulgarian, Czech, Dutch, English, Estonian, Finnish, French, Italian, Latvian, Lithuanian, Polish, Russian, Slovak, Swedish, Tamil, and Ukrainian
- Description:
- This package contains data used in the IWPT 2021 shared task. It contains training, development and test (evaluation) datasets. The data is based on a subset of Universal Dependencies release 2.7 (http://hdl.handle.net/11234/1-3424) but some treebanks contain additional enhanced annotations. Moreover, not all of these additions became part of Universal Dependencies release 2.8 (http://hdl.handle.net/11234/1-3687), which makes the shared task data unique and worth a separate release to enable later comparison with new parsing algorithms. The package also contains a number of Perl and Python scripts that have been used to process the data during preparation and during the shared task. Finally, the package includes the official primary submission of each team participating in the shared task.
- Rights:
- Licence Universal Dependencies v2.7, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.7, and PUB
31. JRC-Acquis
- Publisher:
- Joint Research Centre of the EU
- Type:
- corpus
- Language:
- Bulgarian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Modern Greek (1453-), Hungarian, Italian, Latvian, Maltese, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, and Swedish
- Description:
- The largest parallel corpus; it contains EU law (the Acquis Communautaire) in 22 languages.
- Rights:
- Not specified
32. L2 Acquisition P-Moll Norbert Dittmar
- Publisher:
- Max Planck Institute for Psycholinguistics
- Type:
- corpus
- Language:
- German, Italian, and Polish
- Description:
- Language Acquisition corpus
- Rights:
- Not specified
33. LAC Italian Corpus
- Publisher:
- Max Planck Institute for Psycholinguistics
- Type:
- corpus
- Language:
- Italian
- Description:
- Language and Cognition corpus
- Rights:
- Not specified
34. Large-Scale Colloquial Persian 0.5
- Creator:
- Abdi Khojasteh, Hadi, Ansari, Ebrahim, and Bohlouli, Mahdi
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) and Institute for Advanced Studies in Basic Sciences (IASBS)
- Type:
- text and corpus
- Subject:
- PoS tagging, corpus, annotated corpus, multilingual, derivation, dependency parser, machine translation, informal language, spoken language, monolingual corpus, and bilingual corpus annotation
- Language:
- Persian, English, German, Czech, Italian, and Hindi
- Description:
- "Large Scale Colloquial Persian Dataset" (LSCP) is hierarchically organized in a semantic taxonomy that focuses on multi-task informal Persian language understanding as a comprehensive problem. LSCP includes 120M sentences from 27M casual Persian tweets, with their syntactic dependency relations, part-of-speech tags, sentiment polarity, and automatic translations of the original Persian sentences into five languages (EN, CS, DE, IT, HI).
- Rights:
- Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB
35. Morpho-syntactically annotated corpora provided for the PARSEME Shared Task on Semi-Supervised Identification of Verbal Multiword Expressions (edition 1.2)
- Creator:
- Guillaume, Bruno, Ramisch, Carlos, Waszczuk, Jakub, Monti, Johanna, Di Buono, Maria Pia, Sangati, Federico, Speranza, Giulia, Carlino, Carola, Güngör, Tunga, Yirmibeşoğlu, Zeynep, Sak, Haşim, Saraçlar, Murat, Giouli, Voula, Foufi, Vassiliki, Ramisch, Renata, Rademaker, Alexandre, Vale, Oto, Wilkens, Rodrigo, Candito, Marie, Crabbé, Benoît, Segonne, Vincent, Liebeskind, Chaya, Stymne, Sara, Hajič, Jan, Ginter, Filip, Luotolahti, Juhani, Straka, Milan, Zeman, Daniel, Barbu Mititelu, Verginica, Cristescu, Mihaela, Vaidya, Ashwini, Bhatia, Archna, Lichte, Timm, Ehren, Rafael, Jiang, Menghan, Xu, Hongzhi, Walsh, Abigail, Irimia, Elena, and Dowling, Meghan
- Publisher:
- PARSEME
- Type:
- text and corpus
- Subject:
- morphosyntactic annotation, dependency trees, and morphological analysis
- Language:
- German, Modern Greek (1453-), Basque, French, Irish, Hebrew, Hindi, Italian, Polish, Portuguese, Romanian, Swedish, Turkish, and Chinese
- Description:
- This multilingual resource contains corpora for 14 languages, gathered on the occasion of the 1.2 edition of the PARSEME Shared Task on Semi-Supervised Identification of Verbal MWEs (2020). These corpora were meant to serve as additional "raw" corpora to help discover unseen verbal MWEs. The corpora are provided in CoNLL-U format (https://universaldependencies.org/format.html). They contain morphosyntactic annotations (parts of speech, lemmas, morphological features, and syntactic dependencies). Depending on the language, the information comes from treebanks (mostly Universal Dependencies v2.x) or from automatic parsers trained on UD v2.x treebanks (e.g., UDPipe). VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do). For the 1.2 shared task edition, the data covers 14 languages, for which VMWEs were annotated according to the universal guidelines. The corpora are provided in the cupt format, inspired by the CoNLL-U format. Morphological and syntactic information (not necessarily using UD tagsets), including parts of speech, lemmas, morphological features and/or syntactic dependencies, is also provided. Depending on the language, the information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). This item contains training, development and test data, as well as the evaluation tools used in the PARSEME Shared Task 1.2 (2020). The annotation guidelines are available online: http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.2
- Rights:
- PARSEME Shared Task Raw Corpus Data (v. 1.2) Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-mwe-1.2-raw, and PUB
36. ParaCrawl Corpus version 1.0
- Creator:
- Koehn, Philipp, Heafield, Kenneth, Forcada, Mikel L., Esplà-Gomis, Miquel, Ortiz-Rojas, Sergio, Sánchez, Gema Ramírez, Cartagena, Víctor M. Sánchez, Haddow, Barry, Bañón, Marta, Střelec, Marek, Samiotou, Anna, and Kamran, Amir
- Publisher:
- ParaCrawl
- Type:
- text and corpus
- Subject:
- ParaCrawl, parallel corpus, CommonCrawl, machine translation, and text corpora
- Language:
- English, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Czech, Romanian, Finnish, Latvian, Russian, and Estonian
- Description:
- The January 2018 release of ParaCrawl is the first version of the corpus. It contains parallel corpora for 11 languages paired with English, crawled from a large number of websites. The selection of websites is based on CommonCrawl, but ParaCrawl is extracted from a new crawl that has much higher coverage of these selected websites than CommonCrawl. Since the data is fairly raw, it is released with two quality metrics that can be used for corpus filtering. An official "clean" version of each corpus uses one of the metrics. For more details and raw data downloads, please visit: http://paracrawl.eu/releases.html
- Rights:
- Public Domain Dedication (CC Zero), http://creativecommons.org/publicdomain/zero/1.0/, and PUB
37. PARSEME corpora annotated for verbal multiword expressions (version 1.3)
- Creator:
- Savary, Agata, Ramisch, Carlos, Guillaume, Bruno, Hawwari, Abdelati, Walsh, Abigail, Fotopoulou, Aggeliki, Bielinskienė, Agnė, Estarrona, Ainara, Gatt, Albert, Butler, Alexandra, Rademaker, Alexandre, Maldonado, Alfredo, Villavicencio, Aline, Farrugia, Alison, Muscat, Amanda, Gatt, Anabelle, Antić, Anđela, De Santis, Anna, Raffone, Annalisa, Riccio, Anna, Pascucci, Antonio, Gurrutxaga, Antton, Bhatia, Archna, Vaidya, Ashwini, Miral, Ayşenur, QasemiZadeh, Behrang, Priego Sanchez, Belem, Griciūtė, Bernadeta, Erden, Berna, Parra Escartín, Carla, Herrero, Carlos, Carlino, Carola, Pasquer, Caroline, Liebeskind, Chaya, Wang, Chenweng, Ben Khelil, Chérifa, Bonial, Claire, Somers, Clarissa, Aceta, Cristina, Krstev, Cvetana, Bejček, Eduard, Lindqvist, Ellinor, Erenmalm, Elsa, Palka-Binkiewicz, Emilia, Rimkute, Erika, Petterson, Eva, Cap, Fabienne, Hu, Fangyuan, Sangati, Federico, Wick Pedro, Gabriela, Speranza, Giulia, Jagfeld, Glorianna, Blagus, Goranka, Berk, Gözde, Attard, Greta, Eryiğit, Gülşen, Finnveden, Gustav, Martínez Alonso, Héctor, de Medeiros Caseli, Helena, Elyovich, Hevi, Xu, Hongzhi, Xiao, Huangyang, Miranda, Isaac, Jaknić, Isidora, El Maarouf, Ismail, Aduriz, Itziar, Gonzalez, Itziar, Matas, Ivana, Stoyanova, Ivelina, Jazbec, Ivo-Pavao, Busuttil, Jael, Waszczuk, Jakub, Findlay, Jamie, Bonnici, Janice, Šnajder, Jan, Antoine, Jean-Yves, Foster, Jennifer, Chen, Jia, Nivre, Joakim, Monti, Johanna, McCrae, John, Kovalevskaitė, Jolanta, Jain, Kanishka, Simkó, Katalin, Yu, Ke, Azzopardi, Kirsty, Adalı, Kübra, Uria, Larraitz, Zilio, Leonardo, Boizou, Loïc, van der Plas, Lonneke, Galea, Luke, Sarlak, Mahtab, Buljan, Maja, Cherchi, Manuela, Tanti, Marc, Di Buono, Maria Pia, Todorova, Maria, Candito, Marie, Constant, Matthieu, Shamsfard, Mehrnoush, Jiang, Menghan, Boz, Mert, Spagnol, Michael, Onofrei, Mihaela, Li, Minli, Elbadrashiny, Mohamed, Diab, Mona, Rizea, Monica-Mihaela, Hadj Mohamed, Najet, Theoxari, Natasa, Schneider, Nathan, Tabone, Nicole, Ljubešić, Nikola, Vale, Oto, Cook, Paul, Yan, Peiyi, Gantar, Polona, Ehren, Rafael, Fabri, Ray, Ibrahim, Rehab, Ramisch, Renata, Walles, Rinat, Wilkens, Rodrigo, Urizar, Ruben, Sun, Ruilong, Malka, Ruth, Galea, Sara Anne, Stymne, Sara, Louizou, Sevasti, Hu, Sha, Taslimipoor, Shiva, Ratori, Shraddha, Srivastava, Shubham, Cordeiro, Silvio Ricardo, Krek, Simon, Liu, Siyuan, Zeng, Si, Yu, Songping, Arhar Holdt, Špela, Markantonatou, Stella, Papadelli, Stella, Leseva, Svetlozara, Kuzman, Taja, Kavčič, Teja, Lynn, Teresa, Lichte, Timm, Pickard, Thomas, Dimitrova, Tsvetana, Yih, Tsy, Güngör, Tunga, Dinç, Tutkum, Iñurrieta, Uxoa, Tajalli, Vahide, Stefanova, Valentina, Caruso, Valeria, Puri, Vandana, Foufi, Vassiliki, Barbu Mititelu, Verginica, Vincze, Veronika, Kovács, Viktória, Shukla, Vishakha, Giouli, Voula, Ge, Xiaomin, Ha-Cohen Kerner, Yaakov, Öztürk, Yağmur, Yarandi, Yalda, Parmentier, Yannick, Zhang, Yongchen, Zhao, Yun, Urešová, Zdeňka, Yirmibeşoğlu, Zeynep, Qin, Zhenzhen, Stank, Cristescu, Mihaela, Zgreabăn, Bianca-Mădălina, Bărbulescu, Elena-Andreea, and Stanković, Ranka
- Publisher:
- PARSEME
- Type:
- text and corpus
- Subject:
- multiword expressions, verbal multiword expressions, light verb construction, verb-particle constructions, inherently reflexive verbs, verbal idioms, and multi-verb constructions
- Language:
- Arabic, Bulgarian, Czech, German, Modern Greek (1453-), English, Spanish, Basque, Persian, French, Irish, Hebrew, Hindi, Croatian, Hungarian, Lithuanian, Italian, Maltese, Polish, Portuguese, Romanian, Slovenian, Serbian, Swedish, Turkish, and Chinese
- Description:
- This multilingual resource contains corpora in which verbal MWEs have been manually annotated. VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do). This is the first release of the corpora without an associated shared task. The previous version (1.2) was associated with the PARSEME Shared Task on Semi-Supervised Identification of Verbal MWEs (2020). The data covers 26 languages, corresponding to the combination of the corpora from all three previous editions (1.0, 1.1 and 1.2). VMWEs were annotated according to the universal guidelines. The corpora are provided in the cupt format, inspired by the CoNLL-U format. Morphological and syntactic information, including parts of speech, lemmas, morphological features and/or syntactic dependencies, is also provided. Depending on the language, the information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). All corpora are split into training, development and test data, following the splitting strategy adopted for the PARSEME Shared Task 1.2. The annotation guidelines are available online: https://parsemefr.lis-lab.fr/parseme-st-guidelines/1.3. The .cupt format is described here: https://multiword.sourceforge.net/cupt-format/
- Rights:
- PARSEME Corpora v. 1.3 - Licence Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-mwe-1.3, and PUB
38. Plaintext Wikipedia dump 2018
- Creator:
- Rosa, Rudolf
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- Wikipedia, text corpora, and monolingual corpus
- Language:
- Abkhazian, Achinese, Adyghe, Afrikaans, Akan, Tosk Albanian, Amharic, Old English (ca. 450-1100), Arabic, Official Aramaic (700-300 BCE), Aragonese, Egyptian Arabic, Assamese, Asturian, Atikamekw, Avaric, Aymara, South Azerbaijani, Azerbaijani, Bashkir, Bambara, Bavarian, Central Bikol, Belarusian, Bengali, Bislama, Banjar, Tibetan, Bosnian, Bishnupriya, Breton, Buginese, Bulgarian, Russia Buriat, Catalan, Min Dong Chinese, Cebuano, Czech, Chamorro, Chechen, Cherokee, Church Slavic, Chuvash, Cheyenne, Central Kurdish, Cornish, Corsican, Cree, Crimean Tatar, Kashubian, Welsh, Danish, German, Dinka, Dimli (individual language), Dhivehi, Lower Sorbian, Dzongkha, Modern Greek (1453-), English, Esperanto, Estonian, Basque, Ewe, Extremaduran, Faroese, Persian, Fijian, Finnish, French, Arpitan, Northern Frisian, Western Frisian, Fulah, Friulian, Gagauz, Gan Chinese, Scottish Gaelic, Irish, Galician, Gilaki, Manx, Goan Konkani, Gothic, Guarani, Gujarati, Hakka Chinese, Haitian, Hausa, Hawaiian, Serbo-Croatian, Hebrew, Herero, Fiji Hindi, Hindi, Hiri Motu, Croatian, Upper Sorbian, Hungarian, Armenian, Igbo, Ido, Inuktitut, Interlingue, Iloko, Interlingua (International Auxiliary Language Association), Indonesian, Inupiaq, Icelandic, Italian, Jamaican Creole English, Javanese, Lojban, Japanese, Kara-Kalpak, Kabyle, Kalaallisut, Kannada, Kashmiri, Georgian, Kanuri, Kazakh, Kabardian, Kabiyè, Khmer, Kikuyu, Kinyarwanda, Kirghiz, Komi-Permyak, Komi, Kongo, Korean, Karachay-Balkar, Kölsch, Kurdish, Ladino, Lao, Latin, Latvian, Lak, Lezghian, Ligurian, Limburgan, Lingala, Lithuanian, Lombard, Northern Luri, Latgalian, Luxembourgish, Ganda, Literary Chinese, Marshallese, Maithili, Malayalam, Marathi, Moksha, Eastern Mari, Minangkabau, Macedonian, Malagasy, Maltese, Mongolian, Maori, Western Mari, Malay (macrolanguage), Creek, Mirandese, Burmese, Erzya, Mazanderani, Min Nan Chinese, Neapolitan, Nauru, Navajo, Ndonga, Low German, Nepali (macrolanguage), Newari, Dutch, Norwegian Nynorsk, Norwegian, Novial, Pedi, Nyanja, Occitan (post 1500), Livvi, Oriya (macrolanguage), Oromo, Ossetian, Pangasinan, Pampanga, Panjabi, Papiamento, Picard, Pennsylvania German, Pfaelzisch, Pitcairn-Norfolk, Pali, Piemontese, Western Panjabi, Pontic, Polish, Portuguese, Pushto, Quechua, Vlax Romani, Romansh, Romanian, Rusyn, Rundi, Macedo-Romanian, Russian, Sango, Yakut, Sanskrit, Sicilian, Scots, Samogitian, Sinhala, Slovak, Slovenian, Northern Sami, Samoan, Shona, Sindhi, Somali, Southern Sotho, Spanish, Albanian, Sardinian, Sranan Tongo, Serbian, Swati, Saterfriesisch, Sundanese, Swahili (macrolanguage), Swedish, Silesian, Tahitian, Tamil, Tatar, Tulu, Telugu, Tama (Colombia), Tetum, Tajik, Tagalog, Thai, Tigrinya, Tonga (Tonga Islands), Tok Pisin, Tswana, Tsonga, Turkmen, Tumbuka, Turkish, Twi, Tuvinian, Udmurt, Uighur, Ukrainian, Urdu, Uzbek, Venetian, Venda, Veps, Vietnamese, Vlaams, Volapük, Võro, Waray (Philippines), Walloon, Wolof, Wu Chinese, Kalmyk, Xhosa, Mingrelian, Yiddish, Yoruba, Yue Chinese, Zeeuws, Zhuang, Chinese, Zulu, and Dotyali
- Description:
- Wikipedia plain text data obtained from Wikipedia dumps with WikiExtractor in February 2018. The data come from all Wikipedias for which dumps could be downloaded at [https://dumps.wikimedia.org/]. This amounts to 297 Wikipedias, usually corresponding to individual languages and identified by their ISO codes. Several special Wikipedias are included, most notably "simple" (Simple English Wikipedia) and "incubator" (tiny hatching Wikipedias in various languages). For a list of all the Wikipedias, see [https://meta.wikimedia.org/wiki/List_of_Wikipedias]. A script for obtaining a new version of the data is included, but note that Wikipedia limits the download speed when many dumps are downloaded, so downloading all of them takes a few days (one or a few can be downloaded quickly). Also, the format of the dumps changes from time to time, so the script will probably stop working eventually. The WikiExtractor tool [http://medialab.di.unipi.it/wiki/Wikipedia_Extractor] used to extract text from the Wikipedia dumps is not mine; I only modified it slightly to produce plaintext outputs [https://github.com/ptakopysk/wikiextractor].
- Rights:
- Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0), http://creativecommons.org/licenses/by-sa/3.0/, and PUB
39. Project Gutenberg
- Type:
- corpus
- Language:
- Danish, Dutch, English, Finnish, French, German, Italian, Latin, Portuguese, Russian, Spanish, Swedish, and Telugu
- Description:
- Free electronic books available for download or online browsing; German-language literature makes up only part of the available e-books.
- Rights:
- Not specified
40. SenTube
- Publisher:
- Machine Learning and NLP group at Trento
- Type:
- corpus
- Subject:
- sentiment analysis
- Language:
- English and Italian
- Description:
- Sentiment analysis of YouTube videos with joint models of text and speech
- Rights:
- Not specified
41. SpeechDat-Car databases
- Type:
- corpus
- Language:
- Danish, Dutch, English, Finnish, French, German, Modern Greek (1453-), Italian, and Spanish
- Description:
- Nine speech databases for training and testing multilingual speech recognition applications in the car environment. They contain parallel four-channel in-car recordings plus a GSM channel, with phonetically rich material. All recordings are orthographically transcribed, speaker information (gender, age, accent) is included, and a pronunciation lexicon is provided.
- Rights:
- Not specified
42. Speecon databases
- Type:
- corpus
- Language:
- Czech, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Polish, Portuguese, Russian, Spanish, Swedish, Turkish, Chinese, Hebrew, Japanese, Korean, and Thai
- Description:
- 28 speech databases containing broadband recordings from 550 adults and 50 children per language, with phonetically rich material. All recordings are orthographically transcribed, speaker information (gender, age, accent) is included, and a pronunciation lexicon is provided.
- Rights:
- Not specified
43. The National Certificates corpus
- Publisher:
- Centre for Applied Language Studies, University of Jyväskylä
- Type:
- corpus
- Language:
- English, Finnish, French, German, Italian, Russian, Spanish, and Swedish
- Description:
- National Certificates (NC) test results, background information, and speaking and writing performances in nine foreign/second languages, provided as a web-based database of HTML files.
- Rights:
- Not specified
44. Universal Dependencies 1.0
- Creator:
- Nivre, Joakim, Bosco, Cristina, Choi, Jinho, de Marneffe, Marie-Catherine, Dozat, Timothy, Farkas, Richárd, Foster, Jennifer, Ginter, Filip, Goldberg, Yoav, Hajič, Jan, Kanerva, Jenna, Laippala, Veronika, Lenci, Alessandro, Lynn, Teresa, Manning, Christopher, McDonald, Ryan, Missilä, Anna, Montemagni, Simonetta, Petrov, Slav, Pyysalo, Sampo, Silveira, Natalia, Simi, Maria, Smith, Aaron, Tsarfaty, Reut, Vincze, Veronika, and Zeman, Daniel
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Czech, German, English, Spanish, Finnish, French, Irish, Italian, Swedish, and Hungarian
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Universal Dependencies 1.0 License Set, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-1.0, and PUB
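UD treebanks are distributed in the CoNLL-U format, where each token line has ten tab-separated columns (ID, FORM, LEMMA, universal POS tag, language-specific tag, FEATS, HEAD, DEPREL, DEPS, MISC), so the universal POS tags, morphological features and dependency relations described above each occupy one column. The minimal Python sketch below, not an official UD tool, reads one sentence into token records; the `Token` class and the helper names are hypothetical:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Token:
    id: str                 # token index, e.g. "1" (or "1-2" for multiword tokens)
    form: str               # word form as it appears in the text
    lemma: str
    upos: str               # universal part-of-speech tag
    xpos: str               # language-specific part-of-speech tag
    feats: Dict[str, str]   # morphological features, e.g. {"Number": "Sing"}
    head: Optional[int]     # index of the syntactic head (0 = root, None = "_")
    deprel: str             # dependency relation to the head


def parse_feats(field: str) -> Dict[str, str]:
    """Turn 'Case=Nom|Number=Sing' into a dict; '_' means no features."""
    if field == "_":
        return {}
    return dict(pair.split("=", 1) for pair in field.split("|"))


def parse_sentence(block: str) -> List[Token]:
    """Parse one CoNLL-U sentence (comment lines start with '#')."""
    tokens = []
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip comments such as '# sent_id = ...'
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        head = None if cols[6] == "_" else int(cols[6])
        tokens.append(Token(cols[0], cols[1], cols[2], cols[3], cols[4],
                            parse_feats(cols[5]), head, cols[7]))
    return tokens


SAMPLE = "\n".join([
    "# text = Dogs bark.",
    "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_",
    "2\tbark\tbark\tVERB\tVBP\tMood=Ind|Tense=Pres\t0\troot\t_\tSpaceAfter=No",
    "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_",
])

if __name__ == "__main__":
    for tok in parse_sentence(SAMPLE):
        print(tok.form, tok.upos, tok.head, tok.deprel)
```

A production reader would also need to handle DEPS and MISC, which this sketch leaves unparsed.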
45. Universal Dependencies 1.1
- Creator:
- Agić, Željko, Aranzabe, Maria Jesus, Atutxa, Aitziber, Bosco, Cristina, Choi, Jinho, de Marneffe, Marie-Catherine, Dozat, Timothy, Farkas, Richárd, Foster, Jennifer, Ginter, Filip, Goenaga, Iakes, Gojenola, Koldo, Goldberg, Yoav, Hajič, Jan, Johannsen, Anders Trærup, Kanerva, Jenna, Kuokkala, Juha, Laippala, Veronika, Lenci, Alessandro, Lindén, Krister, Ljubešić, Nikola, Lynn, Teresa, Manning, Christopher, Martínez, Héctor Alonso, McDonald, Ryan, Missilä, Anna, Montemagni, Simonetta, Nivre, Joakim, Nurmi, Hanna, Osenova, Petya, Petrov, Slav, Piitulainen, Jussi, Plank, Barbara, Prokopidis, Prokopis, Pyysalo, Sampo, Seeker, Wolfgang, Seraji, Mojgan, Silveira, Natalia, Simi, Maria, Simov, Kiril, Smith, Aaron, Tsarfaty, Reut, Vincze, Veronika, and Zeman, Daniel
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency syntax, morphology, harmonized annotation, interset, universal tagset, stanford dependencies, and universal dependencies
- Language:
- Basque, Bulgarian, Croatian, Czech, Danish, English, Finnish, French, German, Modern Greek (1453-), Hebrew, Hungarian, Indonesian, Irish, Italian, Persian, Spanish, and Swedish
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008). This is the second release of UD Treebanks, Version 1.1.
- Rights:
- Licence Universal Dependencies v1.1, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-1.1, and PUB
46. Universal Dependencies 1.2
- Creator:
- Nivre, Joakim, Agić, Željko, Aranzabe, Maria Jesus, Asahara, Masayuki, Atutxa, Aitziber, Ballesteros, Miguel, Bauer, John, Bengoetxea, Kepa, Bhat, Riyaz Ahmad, Bosco, Cristina, Bowman, Sam, Celano, Giuseppe G. A., Connor, Miriam, de Marneffe, Marie-Catherine, Diaz de Ilarraza, Arantza, Dobrovoljc, Kaja, Dozat, Timothy, Erjavec, Tomaž, Farkas, Richárd, Foster, Jennifer, Galbraith, Daniel, Ginter, Filip, Goenaga, Iakes, Gojenola, Koldo, Goldberg, Yoav, Gonzales, Berta, Guillaume, Bruno, Hajič, Jan, Haug, Dag, Ion, Radu, Irimia, Elena, Johannsen, Anders, Kanayama, Hiroshi, Kanerva, Jenna, Krek, Simon, Laippala, Veronika, Lenci, Alessandro, Ljubešić, Nikola, Lynn, Teresa, Manning, Christopher, Mărănduc, Cătălina, Mareček, David, Martínez Alonso, Héctor, Mašek, Jan, Matsumoto, Yuji, McDonald, Ryan, Missilä, Anna, Mititelu, Verginica, Miyao, Yusuke, Montemagni, Simonetta, Mori, Shunsuke, Nurmi, Hanna, Osenova, Petya, Øvrelid, Lilja, Pascual, Elena, Passarotti, Marco, Perez, Cenel-Augusto, Petrov, Slav, Piitulainen, Jussi, Plank, Barbara, Popel, Martin, Prokopidis, Prokopis, Pyysalo, Sampo, Ramasamy, Loganathan, Rosa, Rudolf, Saleh, Shadi, Schuster, Sebastian, Seeker, Wolfgang, Seraji, Mojgan, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Simov, Kiril, Smith, Aaron, Štěpánek, Jan, Suhr, Alane, Szántó, Zsolt, Tanaka, Takaaki, Tsarfaty, Reut, Uematsu, Sumire, Uria, Larraitz, Varga, Viktor, Vincze, Veronika, Žabokrtský, Zdeněk, Zeman, Daniel, and Zhu, Hanzhi
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, and Tamil
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v1.2, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-1.2, and PUB
47. Universal Dependencies 1.3
- Creator:
- Nivre, Joakim, Agić, Željko, Ahrenberg, Lars, Aranzabe, Maria Jesus, Asahara, Masayuki, Atutxa, Aitziber, Ballesteros, Miguel, Bauer, John, Bengoetxea, Kepa, Berzak, Yevgeni, Bhat, Riyaz Ahmad, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Cebiroğlu Eryiğit, Gülşen, Celano, Giuseppe G. A., Çöltekin, Çağrı, Connor, Miriam, de Marneffe, Marie-Catherine, Diaz de Ilarraza, Arantza, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Erjavec, Tomaž, Farkas, Richárd, Foster, Jennifer, Galbraith, Daniel, Garza, Sebastian, Ginter, Filip, Goenaga, Iakes, Gojenola, Koldo, Gokirmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, Gonzáles Saavedra, Berta, Grūzītis, Normunds, Guillaume, Bruno, Hajič, Jan, Haug, Dag, Hladká, Barbora, Ion, Radu, Irimia, Elena, Johannsen, Anders, Kaşıkara, Hüner, Kanayama, Hiroshi, Kanerva, Jenna, Katz, Boris, Kenney, Jessica, Krek, Simon, Laippala, Veronika, Lam, Lucia, Lenci, Alessandro, Ljubešić, Nikola, Lyashevskaya, Olga, Lynn, Teresa, Makazhanov, Aibek, Manning, Christopher, Mărănduc, Cătălina, Mareček, David, Martínez Alonso, Héctor, Mašek, Jan, Matsumoto, Yuji, McDonald, Ryan, Missilä, Anna, Mititelu, Verginica, Miyao, Yusuke, Montemagni, Simonetta, Mori, Keiko Sophie, Mori, Shunsuke, Muischnek, Kadri, Mustafina, Nina, Müürisep, Kaili, Nikolaev, Vitaly, Nurmi, Hanna, Osenova, Petya, Øvrelid, Lilja, Pascual, Elena, Passarotti, Marco, Perez, Cenel-Augusto, Petrov, Slav, Piitulainen, Jussi, Plank, Barbara, Popel, Martin, Pretkalniņa, Lauma, Prokopidis, Prokopis, Puolakainen, Tiina, Pyysalo, Sampo, Ramasamy, Loganathan, Rituma, Laura, Rosa, Rudolf, Saleh, Shadi, Saulīte, Baiba, Schuster, Sebastian, Seeker, Wolfgang, Seraji, Mojgan, Shakurova, Lena, Shen, Mo, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Simov, Kiril, Smith, Aaron, Spadine, Carolyn, Suhr, Alane, Sulubacak, Umut, Szántó, Zsolt, Tanaka, Takaaki, Tsarfaty, Reut, Tyers, Francis, Uematsu, Sumire, Uria, Larraitz, van Noord, Gertjan, Varga, Viktor, Vincze, 
Veronika, Wang, Jing Xian, Washington, Jonathan North, Žabokrtský, Zdeněk, Zeman, Daniel, and Zhu, Hanzhi
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, and Turkish
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v1.3, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-1.3, and PUB
48. Universal Dependencies 1.4
- Creator:
- Nivre, Joakim, Agić, Željko, Ahrenberg, Lars, Aranzabe, Maria Jesus, Asahara, Masayuki, Atutxa, Aitziber, Ballesteros, Miguel, Bauer, John, Bengoetxea, Kepa, Berzak, Yevgeni, Bhat, Riyaz Ahmad, Bick, Eckhard, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Cebiroğlu Eryiğit, Gülşen, Celano, Giuseppe G. A., Chalub, Fabricio, Çöltekin, Çağrı, Connor, Miriam, Davidson, Elizabeth, de Marneffe, Marie-Catherine, Diaz de Ilarraza, Arantza, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eli, Marhaba, Erjavec, Tomaž, Farkas, Richárd, Foster, Jennifer, Freitas, Claudia, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Ginter, Filip, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, Gonzáles Saavedra, Berta, Grioni, Matias, Grūzītis, Normunds, Guillaume, Bruno, Hajič, Jan, Hà Mỹ, Linh, Haug, Dag, Hladká, Barbora, Ion, Radu, Irimia, Elena, Johannsen, Anders, Jørgensen, Fredrik, Kaşıkara, Hüner, Kanayama, Hiroshi, Kanerva, Jenna, Katz, Boris, Kenney, Jessica, Kotsyba, Natalia, Krek, Simon, Laippala, Veronika, Lam, Lucia, Lê Hồng, Phương, Lenci, Alessandro, Ljubešić, Nikola, Lyashevskaya, Olga, Lynn, Teresa, Makazhanov, Aibek, Manning, Christopher, Mărănduc, Cătălina, Mareček, David, Martínez Alonso, Héctor, Martins, André, Mašek, Jan, Matsumoto, Yuji, McDonald, Ryan, Missilä, Anna, Mititelu, Verginica, Miyao, Yusuke, Montemagni, Simonetta, Mori, Keiko Sophie, Mori, Shunsuke, Moskalevskyi, Bohdan, Muischnek, Kadri, Mustafina, Nina, Müürisep, Kaili, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikolaev, Vitaly, Nurmi, Hanna, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Paiva, Valeria, Pascual, Elena, Passarotti, Marco, Perez, Cenel-Augusto, Petrov, Slav, Piitulainen, Jussi, Plank, Barbara, Popel, Martin, Pretkalniņa, Lauma, Prokopidis, Prokopis, Puolakainen, Tiina, Pyysalo, Sampo, Rademaker, Alexandre, Ramasamy, Loganathan, Real, Livy, Rituma, Laura, Rosa, 
Rudolf, Saleh, Shadi, Saulīte, Baiba, Schuster, Sebastian, Seeker, Wolfgang, Seraji, Mojgan, Shakurova, Lena, Shen, Mo, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Smith, Aaron, Spadine, Carolyn, Suhr, Alane, Sulubacak, Umut, Szántó, Zsolt, Tanaka, Takaaki, Tsarfaty, Reut, Tyers, Francis, Uematsu, Sumire, Uria, Larraitz, van Noord, Gertjan, Varga, Viktor, Vincze, Veronika, Wallin, Lars, Wang, Jing Xian, Washington, Jonathan North, Wirén, Mats, Žabokrtský, Zdeněk, Zeldes, Amir, Zeman, Daniel, and Zhu, Hanzhi
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Swedish Sign Language, Ukrainian, Uighur, and Vietnamese
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v1.4, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-1.4, and PUB
49. Universal Dependencies 2.0
- Creator:
- Nivre, Joakim, Agić, Željko, Ahrenberg, Lars, Aranzabe, Maria Jesus, Asahara, Masayuki, Atutxa, Aitziber, Ballesteros, Miguel, Bauer, John, Bengoetxea, Kepa, Bhat, Riyaz Ahmad, Bick, Eckhard, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Candito, Marie, Cebiroğlu Eryiğit, Gülşen, Celano, Giuseppe G. A., Chalub, Fabricio, Choi, Jinho, Çöltekin, Çağrı, Connor, Miriam, Davidson, Elizabeth, de Marneffe, Marie-Catherine, de Paiva, Valeria, Diaz de Ilarraza, Arantza, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eli, Marhaba, Erjavec, Tomaž, Farkas, Richárd, Foster, Jennifer, Freitas, Cláudia, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Ginter, Filip, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, Gonzáles Saavedra, Berta, Grioni, Matias, Grūzītis, Normunds, Guillaume, Bruno, Habash, Nizar, Hajič, Jan, Hà Mỹ, Linh, Haug, Dag, Hladká, Barbora, Hohle, Petter, Ion, Radu, Irimia, Elena, Johannsen, Anders, Jørgensen, Fredrik, Kaşıkara, Hüner, Kanayama, Hiroshi, Kanerva, Jenna, Kotsyba, Natalia, Krek, Simon, Laippala, Veronika, Lê Hồng, Phương, Lenci, Alessandro, Ljubešić, Nikola, Lyashevskaya, Olga, Lynn, Teresa, Makazhanov, Aibek, Manning, Christopher, Mărănduc, Cătălina, Mareček, David, Martínez Alonso, Héctor, Martins, André, Mašek, Jan, Matsumoto, Yuji, McDonald, Ryan, Missilä, Anna, Mititelu, Verginica, Miyao, Yusuke, Montemagni, Simonetta, More, Amir, Mori, Shunsuke, Moskalevskyi, Bohdan, Muischnek, Kadri, Mustafina, Nina, Müürisep, Kaili, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikolaev, Vitaly, Nurmi, Hanna, Ojala, Stina, Osenova, Petya, Øvrelid, Lilja, Pascual, Elena, Passarotti, Marco, Perez, Cenel-Augusto, Perrier, Guy, Petrov, Slav, Piitulainen, Jussi, Plank, Barbara, Popel, Martin, Pretkalniņa, Lauma, Prokopidis, Prokopis, Puolakainen, Tiina, Pyysalo, Sampo, Rademaker, Alexandre, Ramasamy, Loganathan, Real, Livy, Rituma, Laura, Rosa, Rudolf, Saleh, Shadi, Sanguinetti, Manuela, 
Saulīte, Baiba, Schuster, Sebastian, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shakurova, Lena, Shen, Mo, Sichinava, Dmitry, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Smith, Aaron, Suhr, Alane, Sulubacak, Umut, Szántó, Zsolt, Taji, Dima, Tanaka, Takaaki, Tsarfaty, Reut, Tyers, Francis, Uematsu, Sumire, Uria, Larraitz, van Noord, Gertjan, Varga, Viktor, Vincze, Veronika, Washington, Jonathan North, Žabokrtský, Zdeněk, Zeldes, Amir, Zeman, Daniel, and Zhu, Hanzhi
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, and Urdu
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008). This release is special in that the treebanks will be used as training/development data in the CoNLL 2017 shared task (http://universaldependencies.org/conll17/). Test data are not released, except for the few treebanks that do not take part in the shared task. The 64 treebanks in the shared task cover the following 45 languages: Ancient Greek, Arabic, Basque, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Gothic, Greek, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Kazakh, Korean, Latin, Latvian, Norwegian, Old Church Slavonic, Persian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Turkish, Ukrainian, Urdu, Uyghur, and Vietnamese. This release fixes a bug in http://hdl.handle.net/11234/1-1976. Changed files: ud-tools-v2.0.tgz (conllu_to_text.pl, conllu_to_conllx.pl; added text_without_spaces.pl), ud-treebanks-conll2017.tgz (fi_ftb-ud-train.txt, he-ud-train.txt, it-ud-train.txt, pt_br-ud-train.txt, es-ud-train.txt), and ud-treebanks-v2.0.tgz (fi_ftb-ud-train.txt, he-ud-train.txt, it-ud-train.txt, pt_br-ud-train.txt, es-ud-train.txt, ar_nyuad-ud-dev.txt, ar_nyuad-ud-test.txt, ar_nyuad-ud-train.txt, cop-ud-dev.txt, cop-ud-test.txt, cop-ud-train.txt, sa-ud-dev.txt, sa-ud-test.txt, sa-ud-train.txt).
- Rights:
- Licence Universal Dependencies v2.0, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.0, and PUB
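The changed-file list in the UD 2.0 entry follows a visible pattern: a language code, an optional treebank suffix after an underscore (e.g. `fi_ftb`, `ar_nyuad`), then `-ud-` and the data split. The Python sketch below splits such names into their parts. The regular expression is inferred from these file names alone and is not an official UD specification; accepting a `.conllu` extension as well as `.txt` is an added assumption, and `parse_ud_filename` is a hypothetical helper:

```python
import re
from typing import NamedTuple, Optional

# ASSUMED naming scheme, inferred from the file names listed above:
#   <language code>[_<treebank suffix>]-ud-<split>.<extension>
UD_FILE_RE = re.compile(
    r"^(?P<lang>[a-z]+)(?:_(?P<treebank>[a-z]+))?"
    r"-ud-(?P<split>train|dev|test)\.(?P<ext>txt|conllu)$"
)


class UDFile(NamedTuple):
    lang: str                 # language code, e.g. "fi"
    treebank: Optional[str]   # treebank suffix, e.g. "ftb"; None if absent
    split: str                # "train", "dev" or "test"
    ext: str                  # file extension without the dot


def parse_ud_filename(name: str) -> UDFile:
    """Split a UD-style file name into language, treebank, split and extension."""
    m = UD_FILE_RE.match(name)
    if m is None:
        raise ValueError(f"not a recognised UD file name: {name!r}")
    return UDFile(m["lang"], m["treebank"], m["split"], m["ext"])


if __name__ == "__main__":
    for name in ["fi_ftb-ud-train.txt", "pt_br-ud-train.txt", "cop-ud-test.txt"]:
        print(parse_ud_filename(name))
```

Returning a `NamedTuple` keeps the parts comparable and easy to group, for example to collect every split of one treebank.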
50. Universal Dependencies 2.0 – CoNLL 2017 Shared Task Development and Test Data
- Creator:
- Nivre, Joakim, Agić, Željko, Ahrenberg, Lars, Antonsen, Lene, Aranzabe, Maria Jesus, Asahara, Masayuki, Ateyah, Luma, Attia, Mohammed, Atutxa, Aitziber, Badmaeva, Elena, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Bauer, John, Bengoetxea, Kepa, Bhat, Riyaz Ahmad, Bick, Eckhard, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Burchardt, Aljoscha, Candito, Marie, Caron, Gauthier, Cebiroğlu Eryiğit, Gülşen, Celano, Giuseppe G. A., Cetin, Savas, Chalub, Fabricio, Choi, Jinho, Cho, Yongseok, Cinková, Silvie, Çöltekin, Çağrı, Connor, Miriam, de Marneffe, Marie-Catherine, de Paiva, Valeria, Diaz de Ilarraza, Arantza, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Eli, Marhaba, Elkahky, Ali, Erjavec, Tomaž, Farkas, Richárd, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Ginter, Filip, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, Gonzáles Saavedra, Berta, Grioni, Matias, Grūzītis, Normunds, Guillaume, Bruno, Habash, Nizar, Hajič, Jan, Hajič jr., Jan, Hà Mỹ, Linh, Harris, Kim, Haug, Dag, Hladká, Barbora, Hlaváčová, Jaroslava, Hohle, Petter, Ion, Radu, Irimia, Elena, Johannsen, Anders, Jørgensen, Fredrik, Kaşıkara, Hüner, Kanayama, Hiroshi, Kanerva, Jenna, Kayadelen, Tolga, Kettnerová, Václava, Kirchner, Jesse, Kotsyba, Natalia, Krek, Simon, Kwak, Sookyoung, Laippala, Veronika, Lambertino, Lorenzo, Lando, Tatiana, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Li, Cheuk Ying, Li, Josie, Ljubešić, Nikola, Loginova, Olga, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Martínez Alonso, Héctor, Martins, André, Mašek, Jan, Matsumoto, Yuji, McDonald, Ryan, Mendonça, Gustavo, Missilä, Anna, Mititelu, Verginica, Miyao, Yusuke, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Mori, Shunsuke, 
Moskalevskyi, Bohdan, Muischnek, Kadri, Mustafina, Nina, Müürisep, Kaili, Nainwani, Pinkey, Nedoluzhko, Anna, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikolaev, Vitaly, Nitisaroj, Rattima, Nurmi, Hanna, Ojala, Stina, Osenova, Petya, Øvrelid, Lilja, Pascual, Elena, Passarotti, Marco, Perez, Cenel-Augusto, Perrier, Guy, Petrov, Slav, Piitulainen, Jussi, Pitler, Emily, Plank, Barbara, Popel, Martin, Pretkalniņa, Lauma, Prokopidis, Prokopis, Puolakainen, Tiina, Pyysalo, Sampo, Rademaker, Alexandre, Real, Livy, Reddy, Siva, Rehm, Georg, Rinaldi, Larissa, Rituma, Laura, Rosa, Rudolf, Rovati, Davide, Saleh, Shadi, Sanguinetti, Manuela, Saulīte, Baiba, Sawanakunanon, Yanin, Schuster, Sebastian, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shakurova, Lena, Shen, Mo, Shimada, Atsuko, Shohibussirri, Muh, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Smith, Aaron, Stella, Antonio, Strnadová, Jana, Suhr, Alane, Sulubacak, Umut, Szántó, Zsolt, Taji, Dima, Tanaka, Takaaki, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Tyers, Francis, Uematsu, Sumire, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, van Noord, Gertjan, Varga, Viktor, Vincze, Veronika, Washington, Jonathan North, Yu, Zhuoran, Žabokrtský, Zdeněk, Zeman, Daniel, and Zhu, Hanzhi
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Northern Sami, Upper Sorbian, Russia Buriat, and Northern Kurdish
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008). This release contains the test data used in the CoNLL 2017 shared task on parsing Universal Dependencies. Because of the shared task, the test data were withheld and not released together with the training and development data of UD 2.0. This release therefore complements the UD 2.0 release (http://hdl.handle.net/11234/1-1983) to form a full release of UD treebanks. In addition, it contains 18 new parallel test sets and 4 test sets in surprise languages. It also includes the development data already released with UD 2.0. Unlike regular UD releases, it uses the folder and file structure that was visible to the systems participating in the shared task.
- Rights:
- Licence Universal Dependencies v2.0, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.0, and PUB
51. Universal Dependencies 2.0 alpha (obsolete)
- Creator:
- Nivre, Joakim, Agić, Željko, Ahrenberg, Lars, Aranzabe, Maria Jesus, Asahara, Masayuki, Atutxa, Aitziber, Ballesteros, Miguel, Bauer, John, Bengoetxea, Kepa, Bhat, Riyaz Ahmad, Bick, Eckhard, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Candito, Marie, Cebiroğlu Eryiğit, Gülşen, Celano, Giuseppe G. A., Chalub, Fabricio, Choi, Jinho, Çöltekin, Çağrı, Connor, Miriam, Davidson, Elizabeth, de Marneffe, Marie-Catherine, de Paiva, Valeria, Diaz de Ilarraza, Arantza, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eli, Marhaba, Erjavec, Tomaž, Farkas, Richárd, Foster, Jennifer, Freitas, Cláudia, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Ginter, Filip, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, Gonzáles Saavedra, Berta, Grioni, Matias, Grūzītis, Normunds, Guillaume, Bruno, Habash, Nizar, Hajič, Jan, Hà Mỹ, Linh, Haug, Dag, Hladká, Barbora, Hohle, Petter, Ion, Radu, Irimia, Elena, Johannsen, Anders, Jørgensen, Fredrik, Kaşıkara, Hüner, Kanayama, Hiroshi, Kanerva, Jenna, Kotsyba, Natalia, Krek, Simon, Laippala, Veronika, Lê Hồng, Phương, Lenci, Alessandro, Ljubešić, Nikola, Lyashevskaya, Olga, Lynn, Teresa, Makazhanov, Aibek, Manning, Christopher, Mărănduc, Cătălina, Mareček, David, Martínez Alonso, Héctor, Martins, André, Mašek, Jan, Matsumoto, Yuji, McDonald, Ryan, Missilä, Anna, Mititelu, Verginica, Miyao, Yusuke, Montemagni, Simonetta, More, Amir, Mori, Shunsuke, Moskalevskyi, Bohdan, Muischnek, Kadri, Mustafina, Nina, Müürisep, Kaili, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikolaev, Vitaly, Nurmi, Hanna, Ojala, Stina, Osenova, Petya, Øvrelid, Lilja, Pascual, Elena, Passarotti, Marco, Perez, Cenel-Augusto, Perrier, Guy, Petrov, Slav, Piitulainen, Jussi, Plank, Barbara, Popel, Martin, Pretkalniņa, Lauma, Prokopidis, Prokopis, Puolakainen, Tiina, Pyysalo, Sampo, Rademaker, Alexandre, Ramasamy, Loganathan, Real, Livy, Rituma, Laura, Rosa, Rudolf, Saleh, Shadi, Sanguinetti, Manuela, 
Saulīte, Baiba, Schuster, Sebastian, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shakurova, Lena, Shen, Mo, Sichinava, Dmitry, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Smith, Aaron, Suhr, Alane, Sulubacak, Umut, Szántó, Zsolt, Taji, Dima, Tanaka, Takaaki, Tsarfaty, Reut, Tyers, Francis, Uematsu, Sumire, Uria, Larraitz, van Noord, Gertjan, Varga, Viktor, Vincze, Veronika, Washington, Jonathan North, Žabokrtský, Zdeněk, Zeldes, Amir, Zeman, Daniel, and Zhu, Hanzhi
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, and Urdu
- Description:
- This release contains errors in several files. Please use http://hdl.handle.net/11234/1-1983 instead.
- Rights:
- Licence Universal Dependencies v2.0, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.0, and PUB
52. Universal Dependencies 2.1
- Creator:
- Nivre, Joakim, Agić, Željko, Ahrenberg, Lars, Antonsen, Lene, Aranzabe, Maria Jesus, Asahara, Masayuki, Ateyah, Luma, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Bauer, John, Bengoetxea, Kepa, Bhat, Riyaz Ahmad, Bick, Eckhard, Bobicev, Victoria, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Burchardt, Aljoscha, Candito, Marie, Caron, Gauthier, Cebiroğlu Eryiğit, Gülşen, Celano, Giuseppe G. A., Cetin, Savas, Chalub, Fabricio, Choi, Jinho, Cinková, Silvie, Çöltekin, Çağrı, Connor, Miriam, Davidson, Elizabeth, de Marneffe, Marie-Catherine, de Paiva, Valeria, Diaz de Ilarraza, Arantza, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eli, Marhaba, Elkahky, Ali, Erjavec, Tomaž, Farkas, Richárd, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Gärdenfors, Moa, Gerdes, Kim, Ginter, Filip, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, Gonzáles Saavedra, Berta, Grioni, Matias, Grūzītis, Normunds, Guillaume, Bruno, Habash, Nizar, Hajič, Jan, Hajič jr., Jan, Hà Mỹ, Linh, Harris, Kim, Haug, Dag, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Ion, Radu, Irimia, Elena, Jelínek, Tomáš, Johannsen, Anders, Jørgensen, Fredrik, Kaşıkara, Hüner, Kanayama, Hiroshi, Kanerva, Jenna, Kayadelen, Tolga, Kettnerová, Václava, Kirchner, Jesse, Kotsyba, Natalia, Krek, Simon, Laippala, Veronika, Lambertino, Lorenzo, Lando, Tatiana, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Li, Cheuk Ying, Li, Josie, Li, Keying, Ljubešić, Nikola, Loginova, Olga, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Martínez Alonso, Héctor, Martins, André, Mašek, Jan, 
Matsumoto, Yuji, McDonald, Ryan, Mendonça, Gustavo, Miekka, Niko, Missilä, Anna, Mititelu, Cătălin, Miyao, Yusuke, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Mori, Shinsuke, Moskalevskyi, Bohdan, Muischnek, Kadri, Müürisep, Kaili, Nainwani, Pinkey, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikolaev, Vitaly, Nurmi, Hanna, Ojala, Stina, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Pascual, Elena, Passarotti, Marco, Perez, Cenel-Augusto, Perrier, Guy, Petrov, Slav, Piitulainen, Jussi, Pitler, Emily, Plank, Barbara, Popel, Martin, Pretkalniņa, Lauma, Prokopidis, Prokopis, Puolakainen, Tiina, Pyysalo, Sampo, Rademaker, Alexandre, Ramasamy, Loganathan, Rama, Taraka, Ravishankar, Vinit, Real, Livy, Reddy, Siva, Rehm, Georg, Rinaldi, Larissa, Rituma, Laura, Romanenko, Mykhailo, Rosa, Rudolf, Rovati, Davide, Sagot, Benoît, Saleh, Shadi, Samardžić, Tanja, Sanguinetti, Manuela, Saulīte, Baiba, Schuster, Sebastian, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shen, Mo, Shimada, Atsuko, Sichinava, Dmitry, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Smith, Aaron, Stella, Antonio, Straka, Milan, Strnadová, Jana, Suhr, Alane, Sulubacak, Umut, Szántó, Zsolt, Taji, Dima, Tanaka, Takaaki, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Tyers, Francis, Uematsu, Sumire, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Vajjala, Sowmya, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Villemonte de la Clergerie, Eric, Vincze, Veronika, Wallin, Lars, Washington, Jonathan North, Wirén, Mats, Wong, Tak-sum, Yu, Zhuoran, Žabokrtský, Zdeněk, Zeldes, Amir, Zeman, Daniel, and Zhu, Hanzhi
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, and Telugu
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.1, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.1, and PUB
53. Universal Dependencies 2.10
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Ackermann, Elia, Aepli, Noëmi, Aghaei, Hamid, Agić, Željko, Ahmadi, Amir, Ahrenberg, Lars, Ajede, Chika Kennedy, Aleksandravičiūtė, Gabrielė, Alfina, Ika, Algom, Avner, Andersen, Erik, Antonsen, Lene, Aplonova, Katya, Aquino, Angelina, Aragon, Carolina, Aranes, Glyd, Aranzabe, Maria Jesus, Arıcan, Bilge Nas, Arnardóttir, Þórunn, Arutie, Gashaw, Arwidarasti, Jessica Naraiswari, Asahara, Masayuki, Aslan, Deniz Baran, Asmazoğlu, Cengiz, Ateyah, Luma, Atmaca, Furkan, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Balasubramani, Keerthana, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Barkarson, Starkaður, Basile, Rodolfo, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bedir, Seyyit Talha, Bengoetxea, Kepa, Ben Moshe, Yifat, Berk, Gözde, Berzak, Yevgeni, Bhat, Irshad Ahmad, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Bjarnadóttir, Kristín, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Braggaar, Anouck, Brokaitė, Kristina, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cassidy, Lauren, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. 
A., Čéplö, Slavomír, Cesur, Neslihan, Cetin, Savas, Çetinoğlu, Özlem, Chalub, Fabricio, Chauhan, Shweta, Chi, Ethan, Chika, Taishi, Cho, Yongseok, Choi, Jinho, Chun, Jayeol, Chung, Juyeon, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Corbetta, Daniela, Courtin, Marine, Cristescu, Mihaela, Daniel, Philemon, Davidson, Elizabeth, Dehouck, Mathieu, de Laurentiis, Martina, de Marneffe, Marie-Catherine, de Paiva, Valeria, Derin, Mehmet Oguz, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dinakaramani, Arawinda, Di Nuovo, Elisa, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eckhoff, Hanne, Eiche, Sandra, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Etienne, Aline, Evelyn, Wograine, Facundes, Sidney, Farkas, Richárd, Favero, Federica, Ferdaousi, Jannatul, Fernanda, Marília, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Gamba, Federica, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Gerardi, Fabrício Ferraz, Gerdes, Kim, Ginter, Filip, Godoy, Gustavo, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grobol, Loïc, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Güngör, Tunga, Habash, Nizar, Hafsteinsson, Hinrik, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, Hanifmuti, Muhammad Yudistira, Harada, Takahiro, Hardwick, Sam, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hellwig, Oliver, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Hwang, Jena, Ikeda, Takumi, Ingason, Anton Karl, Ion, Radu, Irimia, Elena, Ishola, Ọlájídé, Ito, Kaoru, Jannat, Siratun, Jelínek, Tomáš, Jha, Apoorva, Johannsen, Anders, Jónsdóttir, Hildur, Jørgensen, Fredrik, Juutinen, Markus, K, Sarveswaran, 
Kaşıkara, Hüner, Kaasen, Andre, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Kara, Neslihan, Karahóǧa, Ritván, Katz, Boris, Kayadelen, Tolga, Kenney, Jessica, Kettnerová, Václava, Kirchner, Jesse, Klementieva, Elena, Klyachko, Elena, Köhn, Arne, Köksal, Abdullatif, Kopacewicz, Kamil, Korkiakangas, Timo, Köse, Mehmet, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Krishnamurthy, Parameswari, Kübler, Sandra, Kuyrukçu, Oğuzhan, Kuzgun, Aslı, Kwak, Sookyoung, Laippala, Veronika, Lam, Lucia, Lambertino, Lorenzo, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Levina, Maria, Li, Cheuk Ying, Li, Josie, Li, Keying, Li, Yuan, Lim, KyungTae, Lima Padovani, Bruna, Lindén, Krister, Ljubešić, Nikola, Loginova, Olga, Lusito, Stefano, Luthfi, Andry, Luukko, Mikko, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Mahamdi, Menel, Maillard, Jean, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Marşan, Büşra, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Markantonatou, Stella, Martínez Alonso, Héctor, Martín Rodríguez, Lorena, Martins, André, Mašek, Jan, Matsuda, Hiroshi, Matsumoto, Yuji, Mazzei, Alessandro, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Merzhevich, Tatiana, Miekka, Niko, Mischenkova, Karina, Misirpashayeva, Margarita, Missilä, Anna, Mititelu, Cătălin, Mitrofan, Maria, Miyao, Yusuke, Mojiri Foroushani, AmirHossein, Molnár, Judit, Moloodi, Amirsaeid, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Moretti, Giovanni, Mori, Keiko Sophie, Mori, Shinsuke, Morioka, Tomohiko, Moro, Shigeki, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Nakhlé, Mariam, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nevaci, Manuela, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, 
Vitaly, Nitisaroj, Rattima, Nourian, Alireza, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Olúòkun, Adédayọ̀, Omura, Mai, Onwuegbuzia, Emeka, Ordan, Noam, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Özateş, Şaziye Betül, Özçelik, Merve, Özgür, Arzucan, Öztürk Başaran, Balkız, Paccosi, Teresa, Palmero Aprosio, Alessio, Park, Hyunji Hayley, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Paulino-Passos, Guilherme, Pedonese, Giulia, Peljak-Łapińska, Angelika, Peng, Siyao, Perez, Cenel-Augusto, Perkova, Natalia, Perrier, Guy, Petrov, Slav, Petrova, Daria, Peverelli, Andrea, Phelan, Jason, Piitulainen, Jussi, Pirinen, Tommi A, Pitler, Emily, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Rääbis, Andriela, Rademaker, Alexandre, Rahoman, Mizanur, Rama, Taraka, Ramasamy, Loganathan, Ramisch, Carlos, Rashel, Fam, Rasooli, Mohammad Sadegh, Ravishankar, Vinit, Real, Livy, Rebeja, Petru, Reddy, Siva, Regnault, Mathilde, Rehm, Georg, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rizqiyah, Putri, Rocha, Luisa, Rögnvaldsson, Eiríkur, Romanenko, Mykhailo, Rosa, Rudolf, Roșca, Valentin, Rovati, Davide, Rozonoyer, Ben, Rudina, Olga, Rueter, Jack, Rúnarsson, Kristján, Sadde, Shoval, Safari, Pegah, Sagot, Benoît, Sahala, Aleksi, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Sanıyar, Ezgi, Särg, Dage, Saulīte, Baiba, Sawanakunanon, Yanin, Saxena, Shefali, Scannell, Kevin, Scarlata, Salvatore, Schneider, Nathan, Schuster, Sebastian, Schwartz, Lane, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shahzadi, Syeda, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shishkina, Yana, Shohibussirri, Muh, Sichinava, Dmitry, Siewert, Janine, Sigurðsson, Einar Freyr, Silveira, Aline, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, 
Katalin, Šimková, Mária, Simov, Kiril, Skachedubova, Maria, Smith, Aaron, Soares-Bastos, Isabela, Sourov, Shafi, Spadine, Carolyn, Sprugnoli, Rachele, Stamou, Vivian, Steingrímsson, Steinþór, Stella, Antonio, Straka, Milan, Strickland, Emmett, Strnadová, Jana, Suhr, Alane, Sulestio, Yogi Lesmana, Sulubacak, Umut, Suzuki, Shingo, Swanson, Daniel, Szántó, Zsolt, Taguchi, Chihiro, Taji, Dima, Takahashi, Yuta, Tamburini, Fabio, Tan, Mary Ann C., Tanaka, Takaaki, Tanaya, Dipta, Tavoni, Mirko, Tella, Samson, Tellier, Isabelle, Testori, Marinella, Thomas, Guillaume, Tonelli, Sara, Torga, Liisi, Toska, Marsida, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Türk, Utku, Tyers, Francis, Uematsu, Sumire, Untilov, Roman, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vagnoni, Elena, Vajjala, Sowmya, van der Goot, Rob, Vanhove, Martine, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Vedenina, Uliana, Villemonte de la Clergerie, Eric, Vincze, Veronika, Vlasova, Natalia, Wakasa, Aya, Wallenberg, Joel C., Wallin, Lars, Walsh, Abigail, Wang, Jing Xian, Washington, Jonathan North, Wendt, Maximilan, Widmer, Paul, Wigderson, Shira, Wijono, Sri Hartati, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamashita, Kayo, Yamazaki, Naoki, Yan, Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yenice, Arife Betül, Yıldız, Olcay Taner, Yu, Zhuoran, Yuliawati, Arlisa, Žabokrtský, Zdeněk, Zahra, Shorouq, Zeldes, Amir, Zhou, He, Zhu, Hanzhi, Zhuravleva, Anna, and Ziane, Rayan
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Swiss German, Albanian, Icelandic, Akuntsu, Apurinã, Chukot, Khunsari, Manx, Mundurukú, Nayini, Old Turkish, Soi, South Levantine Arabic, Tupinambá, Beja, Western Frisian, Guajajára, Urubú-Kaapor, Kangri, K'iche', Low German, Makuráp, Central Siberian Yupik, Western Armenian, Bengali, Javanese, Karo (Brazil), Ligurian, Neapolitan, Tatar, Xibe, Yakut, Ancient Hebrew, Cebuano, Guarani, Hittite, Madi, Emerillon, and Umbrian
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.10, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.10, and PUB
54. Universal Dependencies 2.11
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Ackermann, Elia, Aepli, Noëmi, Aghaei, Hamid, Agić, Željko, Ahmadi, Amir, Ahrenberg, Lars, Ajede, Chika Kennedy, Akkurt, Salih Furkan, Aleksandravičiūtė, Gabrielė, Alfina, Ika, Algom, Avner, Alzetta, Chiara, Andersen, Erik, Antonsen, Lene, Aplonova, Katya, Aquino, Angelina, Aragon, Carolina, Aranes, Glyd, Aranzabe, Maria Jesus, Arıcan, Bilge Nas, Arnardóttir, Þórunn, Arutie, Gashaw, Arwidarasti, Jessica Naraiswari, Asahara, Masayuki, Ásgeirsdóttir, Katla, Aslan, Deniz Baran, Asmazoğlu, Cengiz, Ateyah, Luma, Atmaca, Furkan, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Balasubramani, Keerthana, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Barkarson, Starkaður, Basile, Rodolfo, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bedir, Seyyit Talha, Belieni, Juan, Bengoetxea, Kepa, Ben Moshe, Yifat, Berk, Gözde, Berzak, Yevgeni, Bhat, Irshad Ahmad, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Bjarnadóttir, Kristín, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Braggaar, Anouck, Brokaitė, Kristina, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cassidy, Lauren, Castro, Maria Clara, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. 
A., Čéplö, Slavomír, Cesur, Neslihan, Cetin, Savas, Çetinoğlu, Özlem, Chalub, Fabricio, Chamila, Liyanage, Chauhan, Shweta, Chi, Ethan, Chika, Taishi, Cho, Yongseok, Choi, Jinho, Chun, Jayeol, Chung, Juyeon, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Corbetta, Daniela, Courtin, Marine, Cristescu, Mihaela, Daniel, Philemon, Davidson, Elizabeth, de Alencar, Leonel Figueiredo, Dehouck, Mathieu, de Laurentiis, Martina, de Marneffe, Marie-Catherine, de Paiva, Valeria, Derin, Mehmet Oguz, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dinakaramani, Arawinda, Di Nuovo, Elisa, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Ebert, Christian, Eckhoff, Hanne, Eiche, Sandra, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Etienne, Aline, Evelyn, Wograine, Facundes, Sidney, Farkas, Richárd, Favero, Federica, Ferdaousi, Jannatul, Fernanda, Marília, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Gamba, Federica, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Gerardi, Fabrício Ferraz, Gerdes, Kim, Ginter, Filip, Godoy, Gustavo, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grobol, Loïc, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Güngör, Tunga, Habash, Nizar, Hafsteinsson, Hinrik, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, Hanifmuti, Muhammad Yudistira, Harada, Takahiro, Hardwick, Sam, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hellwig, Oliver, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Huerta Mendez, Marivel, Hwang, Jena, Ikeda, Takumi, Ingason, Anton Karl, Ion, Radu, Irimia, Elena, Ishola, Ọlájídé, Islamaj, Artan, Ito, Kaoru, Jannat, Siratun, Jelínek, Tomáš, Jha, 
Apoorva, Jiang, Katharine, Johannsen, Anders, Jónsdóttir, Hildur, Jørgensen, Fredrik, Juutinen, Markus, Kaşıkara, Hüner, Kaasen, Andre, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Kara, Neslihan, Karahóǧa, Ritván, Katz, Boris, Kayadelen, Tolga, Kengatharaiyer, Sarveswaran, Kenney, Jessica, Kettnerová, Václava, Kirchner, Jesse, Klementieva, Elena, Klyachko, Elena, Köhn, Arne, Köksal, Abdullatif, Kopacewicz, Kamil, Korkiakangas, Timo, Köse, Mehmet, Koshevoy, Alexey, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Krishnamurthy, Parameswari, Kübler, Sandra, Kuqi, Adrian, Kuyrukçu, Oğuzhan, Kuzgun, Aslı, Kwak, Sookyoung, Laippala, Veronika, Lam, Lucia, Lambertino, Lorenzo, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Levina, Maria, Li, Cheuk Ying, Li, Josie, Li, Keying, Li, Yixuan, Li, Yuan, Lim, KyungTae, Lima Padovani, Bruna, Lindén, Krister, Ljubešić, Nikola, Loginova, Olga, Lusito, Stefano, Luthfi, Andry, Luukko, Mikko, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Mahamdi, Menel, Maillard, Jean, Makarchuk, Ilya, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Marşan, Büşra, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Markantonatou, Stella, Martínez Alonso, Héctor, Martín Rodríguez, Lorena, Martins, André, Mašek, Jan, Matsuda, Hiroshi, Matsumoto, Yuji, Mazzei, Alessandro, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Merzhevich, Tatiana, Miekka, Niko, Mischenkova, Karina, Misirpashayeva, Margarita, Missilä, Anna, Mititelu, Cătălin, Mitrofan, Maria, Miyao, Yusuke, Mojiri Foroushani, AmirHossein, Molnár, Judit, Moloodi, Amirsaeid, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Moretti, Giovanni, Mori, Keiko Sophie, Mori, Shinsuke, Morioka, Tomohiko, Moro, Shigeki, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, 
Nainwani, Pinkey, Nakhlé, Mariam, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nevaci, Manuela, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, Vitaly, Nitisaroj, Rattima, Nourian, Alireza, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Óladóttir, Hulda, Olúòkun, Adédayọ̀, Omura, Mai, Onwuegbuzia, Emeka, Ordan, Noam, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Özateş, Şaziye Betül, Özçelik, Merve, Özgür, Arzucan, Öztürk Başaran, Balkız, Paccosi, Teresa, Palmero Aprosio, Alessio, Panova, Anastasia, Park, Hyunji Hayley, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Paulino-Passos, Guilherme, Pedonese, Giulia, Peljak-Łapińska, Angelika, Peng, Siyao, Perez, Cenel-Augusto, Perkova, Natalia, Perrier, Guy, Petrov, Slav, Petrova, Daria, Peverelli, Andrea, Phelan, Jason, Piitulainen, Jussi, Pintucci, Rodrigo, Pirinen, Tommi A, Pitler, Emily, Plamada, Magdalena, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Pugh, Robert, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Rääbis, Andriela, Rademaker, Alexandre, Rahoman, Mizanur, Rama, Taraka, Ramasamy, Loganathan, Ramisch, Carlos, Rashel, Fam, Rasooli, Mohammad Sadegh, Ravishankar, Vinit, Real, Livy, Rebeja, Petru, Reddy, Siva, Regnault, Mathilde, Rehm, Georg, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rizqiyah, Putri, Rocha, Luisa, Rögnvaldsson, Eiríkur, Roksandic, Ivan, Romanenko, Mykhailo, Rosa, Rudolf, Roșca, Valentin, Rovati, Davide, Rozonoyer, Ben, Rudina, Olga, Rueter, Jack, Rúnarsson, Kristján, Sadde, Shoval, Safari, Pegah, Sagot, Benoît, Sahala, Aleksi, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Sanıyar, Ezgi, Särg, Dage, Sartor, Marta, Sasaki, Mitsuya, Saulīte, Baiba, Sawanakunanon, Yanin, Saxena, Shefali, Scannell, Kevin, Scarlata, Salvatore, 
Schneider, Nathan, Schuster, Sebastian, Schwartz, Lane, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shahzadi, Syeda, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shishkina, Yana, Shohibussirri, Muh, Shvedova, Maria, Siewert, Janine, Sigurðsson, Einar Freyr, Silva, João Ricardo, Silveira, Aline, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Símonarson, Haukur Barri, Simov, Kiril, Sitchinava, Dmitri, Skachedubova, Maria, Smith, Aaron, Soares-Bastos, Isabela, Sonnenhauser, Barbara, Sourov, Shafi, Spadine, Carolyn, Sprugnoli, Rachele, Stamou, Vivian, Steingrímsson, Steinþór, Stella, Antonio, Stephen, Abishek, Straka, Milan, Strickland, Emmett, Strnadová, Jana, Suhr, Alane, Sulestio, Yogi Lesmana, Sulubacak, Umut, Suzuki, Shingo, Swanson, Daniel, Szántó, Zsolt, Taguchi, Chihiro, Taji, Dima, Takahashi, Yuta, Tamburini, Fabio, Tan, Mary Ann C., Tanaka, Takaaki, Tanaya, Dipta, Tavoni, Mirko, Tella, Samson, Tellier, Isabelle, Testori, Marinella, Thomas, Guillaume, Tonelli, Sara, Torga, Liisi, Toska, Marsida, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Türk, Utku, Tyers, Francis, Þórðarson, Sveinbjörn, Þorsteinsson, Vilhjálmur, Uematsu, Sumire, Untilov, Roman, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vagnoni, Elena, Vajjala, Sowmya, van der Goot, Rob, Vanhove, Martine, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Vedenina, Uliana, Venturi, Giulia, Villemonte de la Clergerie, Eric, Vincze, Veronika, Vlasova, Natalia, Wakasa, Aya, Wallenberg, Joel C., Wallin, Lars, Walsh, Abigail, Wang, Jing Xian, Washington, Jonathan North, Wendt, Maximilan, Widmer, Paul, Wigderson, Shira, Wijono, Sri Hartati, Wille, Vanessa Berwanger, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamashita, Kayo, Yamazaki, Naoki, Yan, Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yenice, Arife Betül, Yıldız, Olcay Taner, Yu, Zhuoran, Yuliawati, Arlisa, 
Žabokrtský, Zdeněk, Zahra, Shorouq, Zeldes, Amir, Zhou, He, Zhu, Hanzhi, Zhuravleva, Anna, and Ziane, Rayan
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Swiss German, Albanian, Icelandic, Akuntsu, Apurinã, Chukot, Khunsari, Manx, Mundurukú, Nayini, Old Turkish, Soi, South Levantine Arabic, Tupinambá, Beja, Western Frisian, Guajajára, Urubú-Kaapor, Kangri, K'iche', Low German, Makuráp, Central Siberian Yupik, Western Armenian, Bengali, Javanese, Karo (Brazil), Ligurian, Neapolitan, Tatar, Xibe, Yakut, Ancient Hebrew, Cebuano, Guarani, Hittite, Madi, Emerillon, Umbrian, Abaza, Gheg Albanian, Malayalam, Nhengatu, Sinhala, Zacatlán-Ahuacatlán-Tepetzintla Nahuatl, Xavánte, and Saya
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.11, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.11, and PUB
55. Universal Dependencies 2.12
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Ackermann, Elia, Aepli, Noëmi, Aghaei, Hamid, Agić, Željko, Ahmadi, Amir, Ahrenberg, Lars, Ajede, Chika Kennedy, Akkurt, Salih Furkan, Aleksandravičiūtė, Gabrielė, Alfina, Ika, Algom, Avner, Alnajjar, Khalid, Alzetta, Chiara, Andersen, Erik, Antonsen, Lene, Aoyama, Tatsuya, Aplonova, Katya, Aquino, Angelina, Aragon, Carolina, Aranes, Glyd, Aranzabe, Maria Jesus, Arıcan, Bilge Nas, Arnardóttir, Þórunn, Arutie, Gashaw, Arwidarasti, Jessica Naraiswari, Asahara, Masayuki, Ásgeirsdóttir, Katla, Aslan, Deniz Baran, Asmazoğlu, Cengiz, Ateyah, Luma, Atmaca, Furkan, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Avelãs, Mariana, Badmaeva, Elena, Balasubramani, Keerthana, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Barkarson, Starkaður, Basile, Rodolfo, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bedir, Seyyit Talha, Behzad, Shabnam, Bengoetxea, Kepa, Benli, İbrahim, Ben Moshe, Yifat, Berk, Gözde, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Bjarnadóttir, Kristín, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Braggaar, Anouck, Branco, António, Brokaitė, Kristina, Burchardt, Aljoscha, Campos, Marisa, Candito, Marie, Caron, Bernard, Caron, Gauthier, Carvalheiro, Catarina, Carvalho, Rita, Cassidy, Lauren, Castro, Maria Clara, Castro, Sérgio, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. 
A., Čéplö, Slavomír, Cesur, Neslihan, Cetin, Savas, Çetinoğlu, Özlem, Chalub, Fabricio, Chamila, Liyanage, Chauhan, Shweta, Chi, Ethan, Chika, Taishi, Cho, Yongseok, Choi, Jinho, Chun, Jayeol, Chung, Juyeon, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Corbetta, Daniela, Costa, Francisco, Courtin, Marine, Cristescu, Mihaela, Dale, Ingerid Løyning, Daniel, Philemon, Davidson, Elizabeth, de Alencar, Leonel Figueiredo, Dehouck, Mathieu, de Laurentiis, Martina, de Marneffe, Marie-Catherine, de Paiva, Valeria, Derin, Mehmet Oguz, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dinakaramani, Arawinda, Di Nuovo, Elisa, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Doyle, Adrian, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Ebert, Christian, Eckhoff, Hanne, Eguchi, Masaki, Eiche, Sandra, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Essaidi, Farah, Etienne, Aline, Evelyn, Wograine, Facundes, Sidney, Farkas, Richárd, Favero, Federica, Ferdaousi, Jannatul, Fernanda, Marília, Fernandez Alcalde, Hector, Fethi, Amal, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Gamba, Federica, Garcia, Marcos, Gärdenfors, Moa, Gerardi, Fabrício Ferraz, Gerdes, Kim, Gessler, Luke, Ginter, Filip, Godoy, Gustavo, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grobol, Loïc, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Güngör, Tunga, Habash, Nizar, Hafsteinsson, Hinrik, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, Hanifmuti, Muhammad Yudistira, Harada, Takahiro, Hardwick, Sam, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hellwig, Oliver, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Huerta Mendez, Marivel, Hwang, Jena, Ikeda, Takumi, Ingason, Anton Karl, Ion, Radu, 
Irimia, Elena, Ishola, Ọlájídé, Islamaj, Artan, Ito, Kaoru, Jannat, Siratun, Jelínek, Tomáš, Jha, Apoorva, Jiang, Katharine, Johannsen, Anders, Jónsdóttir, Hildur, Jørgensen, Fredrik, Juutinen, Markus, Kaşıkara, Hüner, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Kara, Neslihan, Karahóǧa, Ritván, Kåsen, Andre, Kayadelen, Tolga, Kengatharaiyer, Sarveswaran, Kettnerová, Václava, Kirchner, Jesse, Klementieva, Elena, Klyachko, Elena, Köhn, Arne, Köksal, Abdullatif, Kopacewicz, Kamil, Korkiakangas, Timo, Köse, Mehmet, Koshevoy, Alexey, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Krishnamurthy, Parameswari, Kübler, Sandra, Kuqi, Adrian, Kuyrukçu, Oğuzhan, Kuzgun, Aslı, Kwak, Sookyoung, Kyle, Kris, Laippala, Veronika, Lambertino, Lorenzo, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Levina, Maria, Levine, Lauren, Li, Cheuk Ying, Li, Josie, Li, Keying, Li, Yixuan, Li, Yuan, Lim, KyungTae, Lima Padovani, Bruna, Lin, Yi-Ju Jessica, Lindén, Krister, Liu, Yang Janet, Ljubešić, Nikola, Loginova, Olga, Lusito, Stefano, Luthfi, Andry, Luukko, Mikko, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Mahamdi, Menel, Maillard, Jean, Makarchuk, Ilya, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Marşan, Büşra, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Markantonatou, Stella, Martínez Alonso, Héctor, Martín Rodríguez, Lorena, Martins, André, Martins, Cláudia, Mašek, Jan, Matsuda, Hiroshi, Matsumoto, Yuji, Mazzei, Alessandro, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Merzhevich, Tatiana, Miekka, Niko, Miller, Aaron, Mischenkova, Karina, Missilä, Anna, Mititelu, Cătălin, Mitrofan, Maria, Miyao, Yusuke, Mojiri Foroushani, AmirHossein, Molnár, Judit, Moloodi, Amirsaeid, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Moretti, Giovanni, Mori, Shinsuke, Morioka, Tomohiko, Moro, Shigeki, 
Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Nakhlé, Mariam, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nevaci, Manuela, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, Vitaly, Nitisaroj, Rattima, Nourian, Alireza, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Óladóttir, Hulda, Olúòkun, Adédayọ̀, Omura, Mai, Onwuegbuzia, Emeka, Ordan, Noam, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Özateş, Şaziye Betül, Özçelik, Merve, Özgür, Arzucan, Öztürk Başaran, Balkız, Paccosi, Teresa, Palmero Aprosio, Alessio, Panova, Anastasia, Park, Hyunji Hayley, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Paulino-Passos, Guilherme, Pedonese, Giulia, Peljak-Łapińska, Angelika, Peng, Siyao, Peng, Siyao Logan, Pereira, Rita, Pereira, Sílvia, Perez, Cenel-Augusto, Perkova, Natalia, Perrier, Guy, Petrov, Slav, Petrova, Daria, Peverelli, Andrea, Phelan, Jason, Piitulainen, Jussi, Pinter, Yuval, Pinto, Clara, Pirinen, Tommi A, Pitler, Emily, Plamada, Magdalena, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Pugh, Robert, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Querido, Andreia, Rääbis, Andriela, Rademaker, Alexandre, Rahoman, Mizanur, Rama, Taraka, Ramasamy, Loganathan, Ramos, Joana, Rashel, Fam, Rasooli, Mohammad Sadegh, Ravishankar, Vinit, Real, Livy, Rebeja, Petru, Reddy, Siva, Regnault, Mathilde, Rehm, Georg, Riabi, Arij, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rizqiyah, Putri, Rocha, Luisa, Rögnvaldsson, Eiríkur, Roksandic, Ivan, Romanenko, Mykhailo, Rosa, Rudolf, Roșca, Valentin, Rovati, Davide, Rozonoyer, Ben, Rudina, Olga, Rueter, Jack, Rúnarsson, Kristján, Sadde, Shoval, Safari, Pegah, Sahala, Aleksi, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, 
Sanguinetti, Manuela, Sanıyar, Ezgi, Särg, Dage, Sartor, Marta, Sasaki, Mitsuya, Saulīte, Baiba, Sawanakunanon, Yanin, Saxena, Shefali, Scannell, Kevin, Scarlata, Salvatore, Schneider, Nathan, Schuster, Sebastian, Schwartz, Lane, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shahzadi, Syeda, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shishkina, Yana, Shohibussirri, Muh, Shvedova, Maria, Siewert, Janine, Sigurðsson, Einar Freyr, Silva, João, Silveira, Aline, Silveira, Natalia, Silveira, Sara, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Símonarson, Haukur Barri, Simov, Kiril, Sitchinava, Dmitri, Sither, Ted, Skachedubova, Maria, Smith, Aaron, Soares-Bastos, Isabela, Solberg, Per Erik, Sonnenhauser, Barbara, Sourov, Shafi, Sprugnoli, Rachele, Stamou, Vivian, Steingrímsson, Steinþór, Stella, Antonio, Stephen, Abishek, Straka, Milan, Strickland, Emmett, Strnadová, Jana, Suhr, Alane, Sulestio, Yogi Lesmana, Sulubacak, Umut, Suzuki, Shingo, Swanson, Daniel, Szántó, Zsolt, Taguchi, Chihiro, Taji, Dima, Tamburini, Fabio, Tan, Mary Ann C., Tanaka, Takaaki, Tanaya, Dipta, Tavoni, Mirko, Tella, Samson, Tellier, Isabelle, Testori, Marinella, Thomas, Guillaume, Tonelli, Sara, Torga, Liisi, Toska, Marsida, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Türk, Utku, Tyers, Francis, Þórðarson, Sveinbjörn, Þorsteinsson, Vilhjálmur, Uematsu, Sumire, Untilov, Roman, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vagnoni, Elena, Vajjala, Sowmya, Vak, Socrates, van der Goot, Rob, Vanhove, Martine, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Vedenina, Uliana, Venturi, Giulia, Vincze, Veronika, Vlasova, Natalia, Wakasa, Aya, Wallenberg, Joel C., Wallin, Lars, Walsh, Abigail, Washington, Jonathan North, Wendt, Maximilan, Widmer, Paul, Wigderson, Shira, Wijono, Sri Hartati, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamashita, Kayo, Yamazaki, Naoki, Yan, 
Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yenice, Arife Betül, Yıldız, Olcay Taner, Yu, Zhuoran, Yuliawati, Arlisa, Žabokrtský, Zdeněk, Zahra, Shorouq, Zeldes, Amir, Zhou, He, Zhu, Hanzhi, Zhu, Yilun, Zhuravleva, Anna, and Ziane, Rayan
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Swiss German, Albanian, Icelandic, Akuntsu, Apurinã, Chukot, Khunsari, Manx, Mundurukú, Nayini, Old Turkish, Soi, South Levantine Arabic, Tupinambá, Beja, Western Frisian, Guajajára, Urubú-Kaapor, Kangri, K'iche', Low German, Makuráp, Central Siberian Yupik, Western Armenian, Bengali, Javanese, Karo (Brazil), Ligurian, Neapolitan, Tatar, Xibe, Yakut, Ancient Hebrew, Cebuano, Guarani, Hittite, Madi, Emerillon, Umbrian, Abaza, Gheg Albanian, Malayalam, Nhengatu, Sinhala, Zacatlán-Ahuacatlán-Tepetzintla Nahuatl, Xavánte, Saya, Borôro, Kirghiz, Algerian Arabic, and Old Irish (to 900)
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.12, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.12, and PUB
56. Universal Dependencies 2.13
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Ackermann, Elia, Aepli, Noëmi, Aghaei, Hamid, Agić, Željko, Ahmadi, Amir, Ahrenberg, Lars, Ajede, Chika Kennedy, Akkurt, Salih Furkan, Aleksandravičiūtė, Gabrielė, Alfina, Ika, Algom, Avner, Alnajjar, Khalid, Alzetta, Chiara, Andersen, Erik, Antonsen, Lene, Aoyama, Tatsuya, Aplonova, Katya, Aquino, Angelina, Aragon, Carolina, Aranes, Glyd, Aranzabe, Maria Jesus, Arıcan, Bilge Nas, Arnardóttir, Þórunn, Arutie, Gashaw, Arwidarasti, Jessica Naraiswari, Asahara, Masayuki, Ásgeirsdóttir, Katla, Aslan, Deniz Baran, Asmazoğlu, Cengiz, Ateyah, Luma, Atmaca, Furkan, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Avelãs, Mariana, Badmaeva, Elena, Balasubramani, Keerthana, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Barkarson, Starkaður, Basile, Rodolfo, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bedir, Seyyit Talha, Behzad, Shabnam, Belieni, Juan, Bengoetxea, Kepa, Benli, İbrahim, Ben Moshe, Yifat, Berk, Gözde, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Bjarnadóttir, Kristín, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Braggaar, Anouck, Branco, António, Brokaitė, Kristina, Burchardt, Aljoscha, Campos, Marisa, Candito, Marie, Caron, Bernard, Caron, Gauthier, Carvalheiro, Catarina, Carvalho, Rita, Cassidy, Lauren, Castro, Maria Clara, Castro, Sérgio, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. 
A., Čéplö, Slavomír, Cesur, Neslihan, Cetin, Savas, Çetinoğlu, Özlem, Chalub, Fabricio, Chamila, Liyanage, Chauhan, Shweta, Chi, Ethan, Chika, Taishi, Cho, Yongseok, Choi, Jinho, Chun, Jayeol, Chung, Juyeon, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Corbetta, Claudia, Corbetta, Daniela, Costa, Francisco, Courtin, Marine, Crabbé, Benoît, Cristescu, Mihaela, Cvetkoski, Vladimir, Dale, Ingerid Løyning, Daniel, Philemon, Davidson, Elizabeth, de Alencar, Leonel Figueiredo, Dehouck, Mathieu, de Laurentiis, Martina, de Marneffe, Marie-Catherine, de Paiva, Valeria, Derin, Mehmet Oguz, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dinakaramani, Arawinda, Di Nuovo, Elisa, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Doyle, Adrian, Dozat, Timothy, Droganova, Kira, Duran, Magali Sanches, Dwivedi, Puneet, Ebert, Christian, Eckhoff, Hanne, Eguchi, Masaki, Eiche, Sandra, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Essaidi, Farah, Etienne, Aline, Evelyn, Wograine, Facundes, Sidney, Farkas, Richárd, Favero, Federica, Ferdaousi, Jannatul, Fernanda, Marília, Fernandez Alcalde, Hector, Fethi, Amal, Foster, Jennifer, Fransen, Theodorus, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Gamba, Federica, Garcia, Marcos, Gärdenfors, Moa, Gerardi, Fabrício Ferraz, Gerdes, Kim, Gessler, Luke, Ginter, Filip, Godoy, Gustavo, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grobol, Loïc, Grūzītis, Normunds, Guillaume, Bruno, Guiller, Kirian, Guillot-Barbance, Céline, Güngör, Tunga, Habash, Nizar, Hafsteinsson, Hinrik, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, Hanifmuti, Muhammad Yudistira, Harada, Takahiro, Hardwick, Sam, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hellwig, Oliver, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, 
Hociung, Florinel, Hohle, Petter, Huang, Yidi, Huerta Mendez, Marivel, Hwang, Jena, Ikeda, Takumi, Ingason, Anton Karl, Ion, Radu, Irimia, Elena, Ishola, Ọlájídé, Islamaj, Artan, Ito, Kaoru, Jagodzińska, Sandra, Jannat, Siratun, Jelínek, Tomáš, Jha, Apoorva, Jiang, Katharine, Johannsen, Anders, Jónsdóttir, Hildur, Jørgensen, Fredrik, Juutinen, Markus, Kaşıkara, Hüner, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Kara, Neslihan, Karahóǧa, Ritván, Kåsen, Andre, Kayadelen, Tolga, Kengatharaiyer, Sarveswaran, Kettnerová, Václava, Kharatyan, Lilit, Kirchner, Jesse, Klementieva, Elena, Klyachko, Elena, Kocharov, Petr, Köhn, Arne, Köksal, Abdullatif, Kopacewicz, Kamil, Korkiakangas, Timo, Köse, Mehmet, Koshevoy, Alexey, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Krishnamurthy, Parameswari, Kübler, Sandra, Kuqi, Adrian, Kuyrukçu, Oğuzhan, Kuzgun, Aslı, Kwak, Sookyoung, Kyle, Kris, Laan, Käbi, Laippala, Veronika, Lambertino, Lorenzo, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Levina, Maria, Levine, Lauren, Li, Cheuk Ying, Li, Josie, Li, Keying, Li, Yixuan, Li, Yuan, Lim, KyungTae, Lima Padovani, Bruna, Lin, Yi-Ju Jessica, Lindén, Krister, Liu, Yang Janet, Ljubešić, Nikola, Lobzhanidze, Irina, Loginova, Olga, Lopes, Lucelene, Lusito, Stefano, Luthfi, Andry, Luukko, Mikko, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Mahamdi, Menel, Maillard, Jean, Makarchuk, Ilya, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Marşan, Büşra, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Markantonatou, Stella, Martínez Alonso, Héctor, Martín Rodríguez, Lorena, Martins, André, Martins, Cláudia, Mašek, Jan, Matsuda, Hiroshi, Matsumoto, Yuji, Mazzei, Alessandro, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Merzhevich, Tatiana, Miekka, Niko, Miller, Aaron, Mischenkova, Karina, Missilä, Anna, Mititelu, Cătălin, 
Mitrofan, Maria, Miyao, Yusuke, Mojiri Foroushani, AmirHossein, Molnár, Judit, Moloodi, Amirsaeid, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Moretti, Giovanni, Mori, Shinsuke, Morioka, Tomohiko, Moro, Shigeki, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Nakhlé, Mariam, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nevaci, Manuela, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, Vitaly, Nitisaroj, Rattima, Nourian, Alireza, Nunes, Maria das Graças Volpe, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Óladóttir, Hulda, Olúòkun, Adédayọ̀, Omura, Mai, Onwuegbuzia, Emeka, Ordan, Noam, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Özateş, Şaziye Betül, Özçelik, Merve, Özgür, Arzucan, Öztürk Başaran, Balkız, Paccosi, Teresa, Palmero Aprosio, Alessio, Panova, Anastasia, Pardo, Thiago Alexandre Salgueiro, Park, Hyunji Hayley, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Paulino-Passos, Guilherme, Pedonese, Giulia, Peljak-Łapińska, Angelika, Peng, Siyao, Peng, Siyao Logan, Pereira, Rita, Pereira, Sílvia, Perez, Cenel-Augusto, Perkova, Natalia, Perrier, Guy, Petrov, Slav, Petrova, Daria, Peverelli, Andrea, Phelan, Jason, Pierre-Louis, Claudel, Piitulainen, Jussi, Pinter, Yuval, Pinto, Clara, Pintucci, Rodrigo, Pirinen, Tommi A, Pitler, Emily, Plamada, Magdalena, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Pugh, Robert, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Querido, Andreia, Rääbis, Andriela, Rademaker, Alexandre, Rahoman, Mizanur, Rama, Taraka, Ramasamy, Loganathan, Ramisch, Carlos, Ramos, Joana, Rashel, Fam, Rasooli, Mohammad Sadegh, Ravishankar, Vinit, Real, Livy, Rebeja, Petru, Reddy, Siva, Regnault, Mathilde, Rehm, Georg, Riabi, Arij, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, 
Rinaldi, Larissa, Rituma, Laura, Rizqiyah, Putri, Rocha, Luisa, Rögnvaldsson, Eiríkur, Roksandic, Ivan, Romanenko, Mykhailo, Rosa, Rudolf, Roșca, Valentin, Rovati, Davide, Rozonoyer, Ben, Rudina, Olga, Rueter, Jack, Rúnarsson, Kristján, Sadde, Shoval, Safari, Pegah, Sahala, Aleksi, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Sanıyar, Ezgi, Särg, Dage, Sartor, Marta, Sasaki, Mitsuya, Saulīte, Baiba, Savary, Agata, Sawanakunanon, Yanin, Saxena, Shefali, Scannell, Kevin, Scarlata, Salvatore, Schang, Emmanuel, Schneider, Nathan, Schuster, Sebastian, Schwartz, Lane, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shahzadi, Syeda, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shishkina, Yana, Shohibussirri, Muh, Shvedova, Maria, Siewert, Janine, Sigurðsson, Einar Freyr, Silva, João, Silveira, Aline, Silveira, Natalia, Silveira, Sara, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Símonarson, Haukur Barri, Simov, Kiril, Sitchinava, Dmitri, Sither, Ted, Skachedubova, Maria, Smith, Aaron, Soares-Bastos, Isabela, Solberg, Per Erik, Sonnenhauser, Barbara, Sourov, Shafi, Sprugnoli, Rachele, Stamou, Vivian, Steingrímsson, Steinþór, Stella, Antonio, Stephen, Abishek, Straka, Milan, Strickland, Emmett, Strnadová, Jana, Suhr, Alane, Sulestio, Yogi Lesmana, Sulubacak, Umut, Suzuki, Shingo, Swanson, Daniel, Szántó, Zsolt, Taguchi, Chihiro, Taji, Dima, Tamburini, Fabio, Tan, Mary Ann C., Tanaka, Takaaki, Tanaya, Dipta, Tavoni, Mirko, Tella, Samson, Tellier, Isabelle, Testori, Marinella, Thomas, Guillaume, Tonelli, Sara, Torga, Liisi, Toska, Marsida, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Türk, Utku, Tyers, Francis, Þórðarson, Sveinbjörn, Þorsteinsson, Vilhjálmur, Uematsu, Sumire, Untilov, Roman, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vagnoni, Elena, Vajjala, Sowmya, Vak, Socrates, van der Goot, Rob, Vanhove, Martine, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Vedenina, 
Uliana, Venturi, Giulia, Villemonte de la Clergerie, Eric, Vincze, Veronika, Vlasova, Natalia, Wakasa, Aya, Wallenberg, Joel C., Wallin, Lars, Walsh, Abigail, Washington, Jonathan North, Wendt, Maximilan, Widmer, Paul, Wigderson, Shira, Wijono, Sri Hartati, Wille, Vanessa Berwanger, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Wu, Qishen, Yako, Mary, Yamashita, Kayo, Yamazaki, Naoki, Yan, Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yenice, Arife Betül, Yıldız, Olcay Taner, Yu, Zhuoran, Yuliawati, Arlisa, Žabokrtský, Zdeněk, Zahra, Shorouq, Zeldes, Amir, Zhou, He, Zhu, Hanzhi, Zhu, Yilun, Zhuravleva, Anna, and Ziane, Rayan
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Swiss German, Albanian, Icelandic, Akuntsu, Apurinã, Chukot, Khunsari, Manx, Mundurukú, Nayini, Old Turkish, Soi, South Levantine Arabic, Tupinambá, Beja, Western Frisian, Guajajára, Urubú-Kaapor, Kangri, K'iche', Low German, Makuráp, Central Siberian Yupik, Western Armenian, Bengali, Javanese, Karo (Brazil), Ligurian, Neapolitan, Tatar, Xibe, Yakut, Ancient Hebrew, Cebuano, Guarani, Hittite, Madi, Emerillon, Umbrian, Abaza, Gheg Albanian, Malayalam, Nhengatu, Sinhala, Zacatlán-Ahuacatlán-Tepetzintla Nahuatl, Xavánte, Saya, Borôro, Kirghiz, Algerian Arabic, Old Irish (to 900), Classical Armenian, Georgian, Haitian, Highland Puebla Nahuatl, Macedonian, Middle French (ca. 1400-1600), and Veps
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.13, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.13, and PUB
57. Universal Dependencies 2.2
- Creator:
- Nivre, Joakim, Abrams, Mitchell, Agić, Željko, Ahrenberg, Lars, Antonsen, Lene, Aranzabe, Maria Jesus, Arutie, Gashaw, Asahara, Masayuki, Ateyah, Luma, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Bauer, John, Bellato, Sandra, Bengoetxea, Kepa, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Blokland, Rogier, Bobicev, Victoria, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cebiroğlu Eryiğit, Gülşen, Celano, Giuseppe G. A., Cetin, Savas, Chalub, Fabricio, Choi, Jinho, Cho, Yongseok, Chun, Jayeol, Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Courtin, Marine, Davidson, Elizabeth, de Marneffe, Marie-Catherine, de Paiva, Valeria, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erjavec, Tomaž, Etienne, Aline, Farkas, Richárd, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Gärdenfors, Moa, Gerdes, Kim, Ginter, Filip, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, Gonzáles Saavedra, Berta, Grioni, Matias, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Habash, Nizar, Hajič, Jan, Hajič jr., Jan, Hà Mỹ, Linh, Han, Na-Rae, Harris, Kim, Haug, Dag, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Hwang, Jena, Ion, Radu, Irimia, Elena, Jelínek, Tomáš, Johannsen, Anders, Jørgensen, Fredrik, Kaşıkara, Hüner, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Kayadelen, Tolga, Kettnerová, Václava, Kirchner, Jesse, Kotsyba, Natalia, Krek, Simon, Kwak, Sookyoung, Laippala, Veronika, Lambertino, Lorenzo, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê 
Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Li, Cheuk Ying, Li, Josie, Li, Keying, Lim, KyungTae, Ljubešić, Nikola, Loginova, Olga, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Martínez Alonso, Héctor, Martins, André, Mašek, Jan, Matsumoto, Yuji, McDonald, Ryan, Mendonça, Gustavo, Miekka, Niko, Missilä, Anna, Mititelu, Cătălin, Miyao, Yusuke, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Mori, Shinsuke, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikolaev, Vitaly, Nitisaroj, Rattima, Nurmi, Hanna, Ojala, Stina, Olúòkun, Adédayọ̀, Omura, Mai, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Peng, Siyao, Perez, Cenel-Augusto, Perrier, Guy, Petrov, Slav, Piitulainen, Jussi, Pitler, Emily, Plank, Barbara, Poibeau, Thierry, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Puolakainen, Tiina, Pyysalo, Sampo, Rääbis, Andriela, Rademaker, Alexandre, Ramasamy, Loganathan, Rama, Taraka, Ramisch, Carlos, Ravishankar, Vinit, Real, Livy, Reddy, Siva, Rehm, Georg, Rießler, Michael, Rinaldi, Larissa, Rituma, Laura, Rocha, Luisa, Romanenko, Mykhailo, Rosa, Rudolf, Rovati, Davide, Roșca, Valentin, Rudina, Olga, Sadde, Shoval, Saleh, Shadi, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Saulīte, Baiba, Sawanakunanon, Yanin, Schneider, Nathan, Schuster, Sebastian, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shen, Mo, Shimada, Atsuko, Shohibussirri, Muh, Sichinava, Dmitry, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Smith, Aaron, Soares-Bastos, 
Isabela, Stella, Antonio, Straka, Milan, Strnadová, Jana, Suhr, Alane, Sulubacak, Umut, Szántó, Zsolt, Taji, Dima, Takahashi, Yuta, Tanaka, Takaaki, Tellier, Isabelle, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Tyers, Francis, Uematsu, Sumire, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Vajjala, Sowmya, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Vincze, Veronika, Wallin, Lars, Washington, Jonathan North, Williams, Seyi, Wirén, Mats, Woldemariam, Tsegay, Wong, Tak-sum, Yan, Chunxiao, Yavrumyan, Marat M., Yu, Zhuoran, Žabokrtský, Zdeněk, Zeldes, Amir, Zeman, Daniel, Zhang, Manying, and Zhu, Hanzhi
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, and Yoruba
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.2, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.2, and PUB
58. Universal Dependencies 2.3
- Creator:
- Nivre, Joakim, Abrams, Mitchell, Agić, Željko, Ahrenberg, Lars, Antonsen, Lene, Aplonova, Katya, Aranzabe, Maria Jesus, Arutie, Gashaw, Asahara, Masayuki, Ateyah, Luma, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Basmov, Victoria, Bauer, John, Bellato, Sandra, Bengoetxea, Kepa, Berzak, Yevgeni, Bhat, Irshad Ahmad, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Blokland, Rogier, Bobicev, Victoria, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. A., Čéplö, Slavomír, Cetin, Savas, Chalub, Fabricio, Choi, Jinho, Cho, Yongseok, Chun, Jayeol, Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Courtin, Marine, Davidson, Elizabeth, de Marneffe, Marie-Catherine, de Paiva, Valeria, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erjavec, Tomaž, Etienne, Aline, Farkas, Richárd, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Gerdes, Kim, Ginter, Filip, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, Gonzáles Saavedra, Berta, Grioni, Matias, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Habash, Nizar, Hajič, Jan, Hajič jr., Jan, Hà Mỹ, Linh, Han, Na-Rae, Harris, Kim, Haug, Dag, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Hwang, Jena, Ion, Radu, Irimia, Elena, Ishola, Ọlájídé, Jelínek, Tomáš, Johannsen, Anders, Jørgensen, Fredrik, Kaşıkara, Hüner, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Katz, Boris, Kayadelen, Tolga, Kenney, Jessica, Kettnerová, 
Václava, Kirchner, Jesse, Kopacewicz, Kamil, Kotsyba, Natalia, Krek, Simon, Kwak, Sookyoung, Laippala, Veronika, Lambertino, Lorenzo, Lam, Lucia, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Li, Cheuk Ying, Li, Josie, Li, Keying, Lim, KyungTae, Ljubešić, Nikola, Loginova, Olga, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Martínez Alonso, Héctor, Martins, André, Mašek, Jan, Matsumoto, Yuji, McDonald, Ryan, Mendonça, Gustavo, Miekka, Niko, Misirpashayeva, Margarita, Missilä, Anna, Mititelu, Cătălin, Miyao, Yusuke, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Mori, Keiko Sophie, Mori, Shinsuke, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikolaev, Vitaly, Nitisaroj, Rattima, Nurmi, Hanna, Ojala, Stina, Olúòkun, Adédayọ̀, Omura, Mai, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Paulino-Passos, Guilherme, Peng, Siyao, Perez, Cenel-Augusto, Perrier, Guy, Petrov, Slav, Piitulainen, Jussi, Pitler, Emily, Plank, Barbara, Poibeau, Thierry, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Puolakainen, Tiina, Pyysalo, Sampo, Rääbis, Andriela, Rademaker, Alexandre, Ramasamy, Loganathan, Rama, Taraka, Ramisch, Carlos, Ravishankar, Vinit, Real, Livy, Reddy, Siva, Rehm, Georg, Rießler, Michael, Rinaldi, Larissa, Rituma, Laura, Rocha, Luisa, Romanenko, Mykhailo, Rosa, Rudolf, Rovati, Davide, Roșca, Valentin, Rudina, Olga, Rueter, Jack, Sadde, Shoval, Sagot, Benoît, Saleh, Shadi, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, 
Saulīte, Baiba, Sawanakunanon, Yanin, Schneider, Nathan, Schuster, Sebastian, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shen, Mo, Shimada, Atsuko, Shohibussirri, Muh, Sichinava, Dmitry, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Smith, Aaron, Soares-Bastos, Isabela, Spadine, Carolyn, Stella, Antonio, Straka, Milan, Strnadová, Jana, Suhr, Alane, Sulubacak, Umut, Szántó, Zsolt, Taji, Dima, Takahashi, Yuta, Tanaka, Takaaki, Tellier, Isabelle, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Tyers, Francis, Uematsu, Sumire, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Vajjala, Sowmya, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Villemonte de la Clergerie, Eric, Vincze, Veronika, Wallin, Lars, Wang, Jing Xian, Washington, Jonathan North, Williams, Seyi, Wirén, Mats, Woldemariam, Tsegay, Wong, Tak-sum, Yan, Chunxiao, Yavrumyan, Marat M., Yu, Zhuoran, Žabokrtský, Zdeněk, Zeldes, Amir, Zeman, Daniel, Zhang, Manying, and Zhu, Hanzhi
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, and Maltese
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.3, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.3, and PUB
59. Universal Dependencies 2.4
- Creator:
- Nivre, Joakim, Abrams, Mitchell, Agić, Željko, Ahrenberg, Lars, Aleksandravičiūtė, Gabrielė, Antonsen, Lene, Aplonova, Katya, Aranzabe, Maria Jesus, Arutie, Gashaw, Asahara, Masayuki, Ateyah, Luma, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Basmov, Victoria, Bauer, John, Bellato, Sandra, Bengoetxea, Kepa, Berzak, Yevgeni, Bhat, Irshad Ahmad, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Brokaitė, Kristina, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. A., Čéplö, Slavomír, Cetin, Savas, Chalub, Fabricio, Choi, Jinho, Cho, Yongseok, Chun, Jayeol, Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Courtin, Marine, Davidson, Elizabeth, de Marneffe, Marie-Catherine, de Paiva, Valeria, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eckhoff, Hanne, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erjavec, Tomaž, Etienne, Aline, Farkas, Richárd, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Gerdes, Kim, Ginter, Filip, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Grioni, Matias, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Habash, Nizar, Hajič, Jan, Hajič jr., Jan, Hà Mỹ, Linh, Han, Na-Rae, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Hwang, Jena, Ikeda, Takumi, Ion, Radu, Irimia, Elena, 
Ishola, Ọlájídé, Jelínek, Tomáš, Johannsen, Anders, Jørgensen, Fredrik, Kaşıkara, Hüner, Kaasen, Andre, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Katz, Boris, Kayadelen, Tolga, Kenney, Jessica, Kettnerová, Václava, Kirchner, Jesse, Köhn, Arne, Kopacewicz, Kamil, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Kwak, Sookyoung, Laippala, Veronika, Lambertino, Lorenzo, Lam, Lucia, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Li, Cheuk Ying, Li, Josie, Li, Keying, Lim, KyungTae, Li, Yuan, Ljubešić, Nikola, Loginova, Olga, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Martínez Alonso, Héctor, Martins, André, Mašek, Jan, Matsumoto, Yuji, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Miekka, Niko, Misirpashayeva, Margarita, Missilä, Anna, Mititelu, Cătălin, Miyao, Yusuke, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Mori, Keiko Sophie, Morioka, Tomohiko, Mori, Shinsuke, Moro, Shigeki, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, Vitaly, Nitisaroj, Rattima, Nurmi, Hanna, Ojala, Stina, Olúòkun, Adédayọ̀, Omura, Mai, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Paulino-Passos, Guilherme, Peljak-Łapińska, Angelika, Peng, Siyao, Perez, Cenel-Augusto, Perrier, Guy, Petrova, Daria, Petrov, Slav, Piitulainen, Jussi, Pirinen, Tommi A, Pitler, Emily, Plank, Barbara, Poibeau, Thierry, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Puolakainen, Tiina, Pyysalo, Sampo, Rääbis, Andriela, 
Rademaker, Alexandre, Ramasamy, Loganathan, Rama, Taraka, Ramisch, Carlos, Ravishankar, Vinit, Real, Livy, Reddy, Siva, Rehm, Georg, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rocha, Luisa, Romanenko, Mykhailo, Rosa, Rudolf, Rovati, Davide, Roșca, Valentin, Rudina, Olga, Rueter, Jack, Sadde, Shoval, Sagot, Benoît, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Särg, Dage, Saulīte, Baiba, Sawanakunanon, Yanin, Schneider, Nathan, Schuster, Sebastian, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shohibussirri, Muh, Sichinava, Dmitry, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Smith, Aaron, Soares-Bastos, Isabela, Spadine, Carolyn, Stella, Antonio, Straka, Milan, Strnadová, Jana, Suhr, Alane, Sulubacak, Umut, Suzuki, Shingo, Szántó, Zsolt, Taji, Dima, Takahashi, Yuta, Tamburini, Fabio, Tanaka, Takaaki, Tellier, Isabelle, Thomas, Guillaume, Torga, Liisi, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Tyers, Francis, Uematsu, Sumire, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Vajjala, Sowmya, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Villemonte de la Clergerie, Eric, Vincze, Veronika, Wallin, Lars, Walsh, Abigail, Wang, Jing Xian, Washington, Jonathan North, Wendt, Maximilan, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamazaki, Naoki, Yan, Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yu, Zhuoran, Žabokrtský, Zdeněk, Zeldes, Amir, Zeman, Daniel, Zhang, Manying, and Zhu, Hanzhi
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, and Mbyá Guaraní
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.4, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.4, and PUB
60. Universal Dependencies 2.5
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Aepli, Noëmi, Agić, Željko, Ahrenberg, Lars, Aleksandravičiūtė, Gabrielė, Antonsen, Lene, Aplonova, Katya, Aranzabe, Maria Jesus, Arutie, Gashaw, Asahara, Masayuki, Ateyah, Luma, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bellato, Sandra, Bengoetxea, Kepa, Berzak, Yevgeni, Bhat, Irshad Ahmad, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Brokaitė, Kristina, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. A., Čéplö, Slavomír, Cetin, Savas, Chalub, Fabricio, Choi, Jinho, Cho, Yongseok, Chun, Jayeol, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Courtin, Marine, Davidson, Elizabeth, de Marneffe, Marie-Catherine, de Paiva, Valeria, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eckhoff, Hanne, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Etienne, Aline, Evelyn, Wograine, Farkas, Richárd, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Gerdes, Kim, Ginter, Filip, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Habash, Nizar, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, 
Harris, Kim, Haug, Dag, Heinecke, Johannes, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Hwang, Jena, Ikeda, Takumi, Ion, Radu, Irimia, Elena, Ishola, Ọlájídé, Jelínek, Tomáš, Johannsen, Anders, Jørgensen, Fredrik, Juutinen, Markus, Kaşıkara, Hüner, Kaasen, Andre, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Katz, Boris, Kayadelen, Tolga, Kenney, Jessica, Kettnerová, Václava, Kirchner, Jesse, Klementieva, Elena, Köhn, Arne, Kopacewicz, Kamil, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Kwak, Sookyoung, Laippala, Veronika, Lambertino, Lorenzo, Lam, Lucia, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Li, Cheuk Ying, Li, Josie, Li, Keying, Lim, KyungTae, Liovina, Maria, Li, Yuan, Ljubešić, Nikola, Loginova, Olga, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Martínez Alonso, Héctor, Martins, André, Mašek, Jan, Matsumoto, Yuji, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Miekka, Niko, Misirpashayeva, Margarita, Missilä, Anna, Mititelu, Cătălin, Mitrofan, Maria, Miyao, Yusuke, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Mori, Keiko Sophie, Morioka, Tomohiko, Mori, Shinsuke, Moro, Shigeki, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, Vitaly, Nitisaroj, Rattima, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Olúòkun, Adédayọ̀, Omura, Mai, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Paulino-Passos, Guilherme, Peljak-Łapińska, Angelika, Peng, Siyao, Perez, Cenel-Augusto, Perrier, Guy, Petrova, Daria, Petrov, Slav, Phelan, Jason, Piitulainen, Jussi, Pirinen, Tommi A, Pitler, Emily, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Rääbis, Andriela, Rademaker, Alexandre, Ramasamy, Loganathan, Rama, Taraka, Ramisch, Carlos, Ravishankar, Vinit, Real, Livy, Reddy, Siva, Rehm, Georg, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rocha, Luisa, Romanenko, Mykhailo, Rosa, Rudolf, Rovati, Davide, Roșca, Valentin, Rudina, Olga, Rueter, Jack, Sadde, Shoval, Sagot, Benoît, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Särg, Dage, Saulīte, Baiba, Sawanakunanon, Yanin, Schneider, Nathan, Schuster, Sebastian, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shohibussirri, Muh, Sichinava, Dmitry, Silveira, Aline, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Smith, Aaron, Soares-Bastos, Isabela, Spadine, Carolyn, Stella, Antonio, Straka, Milan, Strnadová, Jana, Suhr, Alane, Sulubacak, Umut, Suzuki, Shingo, Szántó, Zsolt, Taji, Dima, Takahashi, Yuta, Tamburini, Fabio, Tanaka, Takaaki, Tellier, Isabelle, Thomas, Guillaume, Torga, Liisi, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Tyers, Francis, Uematsu, Sumire, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vajjala, Sowmya, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Villemonte de la Clergerie, Eric, Vincze, Veronika, Wallin, Lars, Walsh, Abigail, Wang, Jing Xian, Washington, Jonathan North, Wendt, Maximilan, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamazaki, Naoki, Yan, Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yu, Zhuoran, Žabokrtský, Zdeněk, Zeldes, Amir, Zhang, Manying, and Zhu, Hanzhi
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, and Swiss German
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.5, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.5, and PUB
61. Universal Dependencies 2.6
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Ackermann, Elia, Aepli, Noëmi, Agić, Željko, Ahrenberg, Lars, Ajede, Chika Kennedy, Aleksandravičiūtė, Gabrielė, Antonsen, Lene, Aplonova, Katya, Aquino, Angelina, Aranzabe, Maria Jesus, Arutie, Gashaw, Asahara, Masayuki, Ateyah, Luma, Atmaca, Furkan, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bengoetxea, Kepa, Berzak, Yevgeni, Bhat, Irshad Ahmad, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Brokaitė, Kristina, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. A., Čéplö, Slavomír, Cetin, Savas, Chalub, Fabricio, Chi, Ethan, Choi, Jinho, Cho, Yongseok, Chun, Jayeol, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Courtin, Marine, Davidson, Elizabeth, de Marneffe, Marie-Catherine, de Paiva, Valeria, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eckhoff, Hanne, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Etienne, Aline, Evelyn, Wograine, Farkas, Richárd, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Gerdes, Kim, Ginter, Filip, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grobol, Loïc, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Güngör, Tunga, Habash, Nizar, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hellwig, Oliver, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Hwang, Jena, Ikeda, Takumi, Ion, Radu, Irimia, Elena, Ishola, Ọlájídé, Jelínek, Tomáš, Johannsen, Anders, Jónsdóttir, Hildur, Jørgensen, Fredrik, Juutinen, Markus, Kaşıkara, Hüner, Kaasen, Andre, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Katz, Boris, Kayadelen, Tolga, Kenney, Jessica, Kettnerová, Václava, Kirchner, Jesse, Klementieva, Elena, Köhn, Arne, Köksal, Abdullatif, Kopacewicz, Kamil, Korkiakangas, Timo, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Kwak, Sookyoung, Laippala, Veronika, Lambertino, Lorenzo, Lam, Lucia, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Levina, Maria, Li, Cheuk Ying, Li, Josie, Li, Keying, Lim, KyungTae, Li, Yuan, Ljubešić, Nikola, Loginova, Olga, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Martínez Alonso, Héctor, Martins, André, Mašek, Jan, Matsuda, Hiroshi, Matsumoto, Yuji, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Miekka, Niko, Misirpashayeva, Margarita, Missilä, Anna, Mititelu, Cătălin, Mitrofan, Maria, Miyao, Yusuke, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Mori, Keiko Sophie, Morioka, Tomohiko, Mori, Shinsuke, Moro, Shigeki, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, Vitaly, Nitisaroj, Rattima, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Olúòkun, Adédayọ̀, Omura, Mai, Onwuegbuzia, Emeka, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Özateş, Şaziye Betül, Özgür, Arzucan, Öztürk Başaran, Balkız, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Paulino-Passos, Guilherme, Peljak-Łapińska, Angelika, Peng, Siyao, Perez, Cenel-Augusto, Perrier, Guy, Petrova, Daria, Petrov, Slav, Phelan, Jason, Piitulainen, Jussi, Pirinen, Tommi A, Pitler, Emily, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Rääbis, Andriela, Rademaker, Alexandre, Ramasamy, Loganathan, Rama, Taraka, Ramisch, Carlos, Ravishankar, Vinit, Real, Livy, Rebeja, Petru, Reddy, Siva, Rehm, Georg, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rocha, Luisa, Romanenko, Mykhailo, Rosa, Rudolf, Roșca, Valentin, Rovati, Davide, Rudina, Olga, Rueter, Jack, Sadde, Shoval, Sagot, Benoît, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Särg, Dage, Saulīte, Baiba, Sawanakunanon, Yanin, Scarlata, Salvatore, Schneider, Nathan, Schuster, Sebastian, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shohibussirri, Muh, Sichinava, Dmitry, Silveira, Aline, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Skachedubova, Maria, Smith, Aaron, Soares-Bastos, Isabela, Spadine, Carolyn, Stella, Antonio, Straka, Milan, Strnadová, Jana, Suhr, Alane, Sulubacak, Umut, Suzuki, Shingo, Szántó, Zsolt, Taji, Dima, Takahashi, Yuta, Tamburini, Fabio, Tanaka, Takaaki, Tella, Samson, Tellier, Isabelle, Thomas, Guillaume, Torga, Liisi, Toska, Marsida, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Türk, Utku, Tyers, Francis, Uematsu, Sumire, Untilov, Roman, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vajjala, Sowmya, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Villemonte de la Clergerie, Eric, Vincze, Veronika, Wakasa, Aya, Wallin, Lars, Walsh, Abigail, Wang, Jing Xian, Washington, Jonathan North, Wendt, Maximilan, Widmer, Paul, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamashita, Kayo, Yamazaki, Naoki, Yan, Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yu, Zhuoran, Žabokrtský, Zdeněk, Zeldes, Amir, Zhu, Hanzhi, and Zhuravleva, Anna
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Swiss German, Albanian, and Icelandic
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.6, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.6, and PUB
62. Universal Dependencies 2.7
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Ackermann, Elia, Aepli, Noëmi, Aghaei, Hamid, Agić, Željko, Ahmadi, Amir, Ahrenberg, Lars, Ajede, Chika Kennedy, Aleksandravičiūtė, Gabrielė, Alfina, Ika, Antonsen, Lene, Aplonova, Katya, Aquino, Angelina, Aragon, Carolina, Aranzabe, Maria Jesus, Arnardóttir, Þórunn, Arutie, Gashaw, Arwidarasti, Jessica Naraiswari, Asahara, Masayuki, Ateyah, Luma, Atmaca, Furkan, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Balasubramani, Keerthana, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bedir, Seyyit Talha, Bengoetxea, Kepa, Berk, Gözde, Berzak, Yevgeni, Bhat, Irshad Ahmad, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Bjarnadóttir, Kristín, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Brokaitė, Kristina, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. A., Čéplö, Slavomír, Cetin, Savas, Çetinoğlu, Özlem, Chalub, Fabricio, Chi, Ethan, Cho, Yongseok, Choi, Jinho, Chun, Jayeol, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Courtin, Marine, Davidson, Elizabeth, de Marneffe, Marie-Catherine, de Paiva, Valeria, Derin, Mehmet Oguz, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dinakaramani, Arawinda, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eckhoff, Hanne, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Etienne, Aline, Evelyn, Wograine, Facundes, Sidney, Farkas, Richárd, Fernanda, Marília, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Gerardi, Fabrício Ferraz, Gerdes, Kim, Ginter, Filip, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grobol, Loïc, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Güngör, Tunga, Habash, Nizar, Hafsteinsson, Hinrik, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, Hanifmuti, Muhammad Yudistira, Hardwick, Sam, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hellwig, Oliver, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Huber, Eva, Hwang, Jena, Ikeda, Takumi, Ingason, Anton Karl, Ion, Radu, Irimia, Elena, Ishola, Ọlájídé, Jelínek, Tomáš, Johannsen, Anders, Jónsdóttir, Hildur, Jørgensen, Fredrik, Juutinen, Markus, K, Sarveswaran, Kaşıkara, Hüner, Kaasen, Andre, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Katz, Boris, Kayadelen, Tolga, Kenney, Jessica, Kettnerová, Václava, Kirchner, Jesse, Klementieva, Elena, Köhn, Arne, Köksal, Abdullatif, Kopacewicz, Kamil, Korkiakangas, Timo, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Krishnamurthy, Parameswari, Kwak, Sookyoung, Laippala, Veronika, Lam, Lucia, Lambertino, Lorenzo, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Levina, Maria, Li, Cheuk Ying, Li, Josie, Li, Keying, Li, Yuan, Lim, KyungTae, Lindén, Krister, Ljubešić, Nikola, Loginova, Olga, Luthfi, Andry, Luukko, Mikko, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Martínez Alonso, Héctor, Martins, André, Mašek, Jan, Matsuda, Hiroshi, Matsumoto, Yuji, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Miekka, Niko, Mischenkova, Karina, Misirpashayeva, Margarita, Missilä, Anna, Mititelu, Cătălin, Mitrofan, Maria, Miyao, Yusuke, Mojiri Foroushani, AmirHossein, Moloodi, Amirsaeid, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Mori, Keiko Sophie, Mori, Shinsuke, Morioka, Tomohiko, Moro, Shigeki, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Nakhlé, Mariam, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, Vitaly, Nitisaroj, Rattima, Nourian, Alireza, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Olúòkun, Adédayọ̀, Omura, Mai, Onwuegbuzia, Emeka, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Özateş, Şaziye Betül, Özgür, Arzucan, Öztürk Başaran, Balkız, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Paulino-Passos, Guilherme, Peljak-Łapińska, Angelika, Peng, Siyao, Perez, Cenel-Augusto, Perkova, Natalia, Perrier, Guy, Petrov, Slav, Petrova, Daria, Phelan, Jason, Piitulainen, Jussi, Pirinen, Tommi A, Pitler, Emily, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Rääbis, Andriela, Rademaker, Alexandre, Rama, Taraka, Ramasamy, Loganathan, Ramisch, Carlos, Rashel, Fam, Rasooli, Mohammad Sadegh, Ravishankar, Vinit, Real, Livy, Rebeja, Petru, Reddy, Siva, Rehm, Georg, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rocha, Luisa, Rögnvaldsson, Eiríkur, Romanenko, Mykhailo, Rosa, Rudolf, Roșca, Valentin, Rovati, Davide, Rudina, Olga, Rueter, Jack, Rúnarsson, Kristján, Sadde, Shoval, Safari, Pegah, Sagot, Benoît, Sahala, Aleksi, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Särg, Dage, Saulīte, Baiba, Sawanakunanon, Yanin, Scannell, Kevin, Scarlata, Salvatore, Schneider, Nathan, Schuster, Sebastian, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shohibussirri, Muh, Sichinava, Dmitry, Sigurðsson, Einar Freyr, Silveira, Aline, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Skachedubova, Maria, Smith, Aaron, Soares-Bastos, Isabela, Spadine, Carolyn, Steingrímsson, Steinþór, Stella, Antonio, Straka, Milan, Strickland, Emmett, Strnadová, Jana, Suhr, Alane, Sulestio, Yogi Lesmana, Sulubacak, Umut, Suzuki, Shingo, Szántó, Zsolt, Taji, Dima, Takahashi, Yuta, Tamburini, Fabio, Tan, Mary Ann C., Tanaka, Takaaki, Tella, Samson, Tellier, Isabelle, Thomas, Guillaume, Torga, Liisi, Toska, Marsida, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Türk, Utku, Tyers, Francis, Uematsu, Sumire, Untilov, Roman, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vajjala, Sowmya, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Villemonte de la Clergerie, Eric, Vincze, Veronika, Wakasa, Aya, Wallenberg, Joel C., Wallin, Lars, Walsh, Abigail, Wang, Jing Xian, Washington, Jonathan North, Wendt, Maximilan, Widmer, Paul, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamashita, Kayo, Yamazaki, Naoki, Yan, Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yu, Zhuoran, Žabokrtský, Zdeněk, Zahra, Shorouq, Zeldes, Amir, Zhu, Hanzhi, and Zhuravleva, Anna
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Swiss German, Albanian, Icelandic, Akuntsu, Apurinã, Chukot, Khunsari, Manx, Mundurukú, Nayini, Old Turkish, Soi, South Levantine Arabic, and Tupinambá
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.7, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.7, and PUB
63. Universal Dependencies 2.8
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Ackermann, Elia, Aepli, Noëmi, Aghaei, Hamid, Agić, Željko, Ahmadi, Amir, Ahrenberg, Lars, Ajede, Chika Kennedy, Aleksandravičiūtė, Gabrielė, Alfina, Ika, Antonsen, Lene, Aplonova, Katya, Aquino, Angelina, Aragon, Carolina, Aranzabe, Maria Jesus, Arıcan, Bilge Nas, Arnardóttir, Þórunn, Arutie, Gashaw, Arwidarasti, Jessica Naraiswari, Asahara, Masayuki, Aslan, Deniz Baran, Ateyah, Luma, Atmaca, Furkan, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Balasubramani, Keerthana, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Barkarson, Starkaður, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bedir, Seyyit Talha, Bengoetxea, Kepa, Berk, Gözde, Berzak, Yevgeni, Bhat, Irshad Ahmad, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Bjarnadóttir, Kristín, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Braggaar, Anouck, Brokaitė, Kristina, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cassidy, Lauren, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. 
A., Čéplö, Slavomír, Cesur, Neslihan, Cetin, Savas, Çetinoğlu, Özlem, Chalub, Fabricio, Chauhan, Shweta, Chi, Ethan, Chika, Taishi, Cho, Yongseok, Choi, Jinho, Chun, Jayeol, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Courtin, Marine, Cristescu, Mihaela, Daniel, Philemon, Davidson, Elizabeth, de Marneffe, Marie-Catherine, de Paiva, Valeria, Derin, Mehmet Oguz, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dinakaramani, Arawinda, Di Nuovo, Elisa, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eckhoff, Hanne, Eiche, Sandra, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Etienne, Aline, Evelyn, Wograine, Facundes, Sidney, Farkas, Richárd, Fernanda, Marília, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Gerardi, Fabrício Ferraz, Gerdes, Kim, Ginter, Filip, Godoy, Gustavo, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grobol, Loïc, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Güngör, Tunga, Habash, Nizar, Hafsteinsson, Hinrik, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, Hanifmuti, Muhammad Yudistira, Hardwick, Sam, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hellwig, Oliver, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Huber, Eva, Hwang, Jena, Ikeda, Takumi, Ingason, Anton Karl, Ion, Radu, Irimia, Elena, Ishola, Ọlájídé, Ito, Kaoru, Jelínek, Tomáš, Jha, Apoorva, Johannsen, Anders, Jónsdóttir, Hildur, Jørgensen, Fredrik, Juutinen, Markus, K, Sarveswaran, Kaşıkara, Hüner, Kaasen, Andre, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Kara, Neslihan, Katz, Boris, Kayadelen, Tolga, Kenney, 
Jessica, Kettnerová, Václava, Kirchner, Jesse, Klementieva, Elena, Köhn, Arne, Köksal, Abdullatif, Kopacewicz, Kamil, Korkiakangas, Timo, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Krishnamurthy, Parameswari, Kuyrukçu, Oğuzhan, Kuzgun, Aslı, Kwak, Sookyoung, Laippala, Veronika, Lam, Lucia, Lambertino, Lorenzo, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Levina, Maria, Li, Cheuk Ying, Li, Josie, Li, Keying, Li, Yuan, Lim, KyungTae, Lima Padovani, Bruna, Lindén, Krister, Ljubešić, Nikola, Loginova, Olga, Luthfi, Andry, Luukko, Mikko, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Marşan, Büşra, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Martínez Alonso, Héctor, Martins, André, Mašek, Jan, Matsuda, Hiroshi, Matsumoto, Yuji, Mazzei, Alessandro, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Miekka, Niko, Mischenkova, Karina, Misirpashayeva, Margarita, Missilä, Anna, Mititelu, Cătălin, Mitrofan, Maria, Miyao, Yusuke, Mojiri Foroushani, AmirHossein, Molnár, Judit, Moloodi, Amirsaeid, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Moretti, Giovanni, Mori, Keiko Sophie, Mori, Shinsuke, Morioka, Tomohiko, Moro, Shigeki, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Nakhlé, Mariam, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nevaci, Manuela, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, Vitaly, Nitisaroj, Rattima, Nourian, Alireza, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Olúòkun, Adédayọ̀, Omura, Mai, Onwuegbuzia, Emeka, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Özateş, Şaziye Betül, Özçelik, Merve, Özgür, Arzucan, Öztürk Başaran, Balkız, Park, Hyunji Hayley, Partanen, Niko, Pascual, Elena, Passarotti, Marco, 
Patejuk, Agnieszka, Paulino-Passos, Guilherme, Peljak-Łapińska, Angelika, Peng, Siyao, Perez, Cenel-Augusto, Perkova, Natalia, Perrier, Guy, Petrov, Slav, Petrova, Daria, Phelan, Jason, Piitulainen, Jussi, Pirinen, Tommi A, Pitler, Emily, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Rääbis, Andriela, Rademaker, Alexandre, Rama, Taraka, Ramasamy, Loganathan, Ramisch, Carlos, Rashel, Fam, Rasooli, Mohammad Sadegh, Ravishankar, Vinit, Real, Livy, Rebeja, Petru, Reddy, Siva, Rehm, Georg, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rocha, Luisa, Rögnvaldsson, Eiríkur, Romanenko, Mykhailo, Rosa, Rudolf, Roșca, Valentin, Rovati, Davide, Rudina, Olga, Rueter, Jack, Rúnarsson, Kristján, Sadde, Shoval, Safari, Pegah, Sagot, Benoît, Sahala, Aleksi, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Sanıyar, Ezgi, Särg, Dage, Saulīte, Baiba, Sawanakunanon, Yanin, Saxena, Shefali, Scannell, Kevin, Scarlata, Salvatore, Schneider, Nathan, Schuster, Sebastian, Schwartz, Lane, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shishkina, Yana, Shohibussirri, Muh, Sichinava, Dmitry, Siewert, Janine, Sigurðsson, Einar Freyr, Silveira, Aline, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Skachedubova, Maria, Smith, Aaron, Soares-Bastos, Isabela, Spadine, Carolyn, Sprugnoli, Rachele, Steingrímsson, Steinþór, Stella, Antonio, Straka, Milan, Strickland, Emmett, Strnadová, Jana, Suhr, Alane, Sulestio, Yogi Lesmana, Sulubacak, Umut, Suzuki, Shingo, Szántó, Zsolt, Taji, Dima, Takahashi, Yuta, Tamburini, Fabio, Tan, Mary Ann C., Tanaka, Takaaki, Tella, Samson, Tellier, Isabelle, Testori, Marinella, Thomas, Guillaume, Torga, Liisi, Toska, Marsida, Trosterud, Trond, Trukhina, 
Anna, Tsarfaty, Reut, Türk, Utku, Tyers, Francis, Uematsu, Sumire, Untilov, Roman, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vajjala, Sowmya, van der Goot, Rob, Vanhove, Martine, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Villemonte de la Clergerie, Eric, Vincze, Veronika, Vlasova, Natalia, Wakasa, Aya, Wallenberg, Joel C., Wallin, Lars, Walsh, Abigail, Wang, Jing Xian, Washington, Jonathan North, Wendt, Maximilan, Widmer, Paul, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamashita, Kayo, Yamazaki, Naoki, Yan, Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yenice, Arife Betül, Yıldız, Olcay Taner, Yu, Zhuoran, Žabokrtský, Zdeněk, Zahra, Shorouq, Zeldes, Amir, Zhu, Hanzhi, Zhuravleva, Anna, and Ziane, Rayan
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Swiss German, Albanian, Icelandic, Akuntsu, Apurinã, Chukot, Khunsari, Manx, Mundurukú, Nayini, Old Turkish, Soi, South Levantine Arabic, Tupinambá, Beja, Western Frisian, Guajajára, Urubú-Kaapor, Kangri, K'iche', Low German, Makuráp, Central Siberian Yupik, and Western Armenian
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.8, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.8, and PUB
64. Universal Dependencies 2.8.1
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Ackermann, Elia, Aepli, Noëmi, Aghaei, Hamid, Agić, Željko, Ahmadi, Amir, Ahrenberg, Lars, Ajede, Chika Kennedy, Aleksandravičiūtė, Gabrielė, Alfina, Ika, Antonsen, Lene, Aplonova, Katya, Aquino, Angelina, Aragon, Carolina, Aranzabe, Maria Jesus, Arıcan, Bilge Nas, Arnardóttir, Þórunn, Arutie, Gashaw, Arwidarasti, Jessica Naraiswari, Asahara, Masayuki, Aslan, Deniz Baran, Ateyah, Luma, Atmaca, Furkan, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Balasubramani, Keerthana, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Barkarson, Starkaður, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bedir, Seyyit Talha, Bengoetxea, Kepa, Berk, Gözde, Berzak, Yevgeni, Bhat, Irshad Ahmad, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Bjarnadóttir, Kristín, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Braggaar, Anouck, Brokaitė, Kristina, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cassidy, Lauren, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. 
A., Čéplö, Slavomír, Cesur, Neslihan, Cetin, Savas, Çetinoğlu, Özlem, Chalub, Fabricio, Chauhan, Shweta, Chi, Ethan, Chika, Taishi, Cho, Yongseok, Choi, Jinho, Chun, Jayeol, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Courtin, Marine, Cristescu, Mihaela, Daniel, Philemon, Davidson, Elizabeth, de Marneffe, Marie-Catherine, de Paiva, Valeria, Derin, Mehmet Oguz, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dinakaramani, Arawinda, Di Nuovo, Elisa, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eckhoff, Hanne, Eiche, Sandra, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Etienne, Aline, Evelyn, Wograine, Facundes, Sidney, Farkas, Richárd, Fernanda, Marília, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Gerardi, Fabrício Ferraz, Gerdes, Kim, Ginter, Filip, Godoy, Gustavo, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grobol, Loïc, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Güngör, Tunga, Habash, Nizar, Hafsteinsson, Hinrik, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, Hanifmuti, Muhammad Yudistira, Hardwick, Sam, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hellwig, Oliver, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Huber, Eva, Hwang, Jena, Ikeda, Takumi, Ingason, Anton Karl, Ion, Radu, Irimia, Elena, Ishola, Ọlájídé, Ito, Kaoru, Jelínek, Tomáš, Jha, Apoorva, Johannsen, Anders, Jónsdóttir, Hildur, Jørgensen, Fredrik, Juutinen, Markus, K, Sarveswaran, Kaşıkara, Hüner, Kaasen, Andre, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Kara, Neslihan, Katz, Boris, Kayadelen, Tolga, Kenney, 
Jessica, Kettnerová, Václava, Kirchner, Jesse, Klementieva, Elena, Köhn, Arne, Köksal, Abdullatif, Kopacewicz, Kamil, Korkiakangas, Timo, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Krishnamurthy, Parameswari, Kuyrukçu, Oğuzhan, Kuzgun, Aslı, Kwak, Sookyoung, Laippala, Veronika, Lam, Lucia, Lambertino, Lorenzo, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Levina, Maria, Li, Cheuk Ying, Li, Josie, Li, Keying, Li, Yuan, Lim, KyungTae, Lima Padovani, Bruna, Lindén, Krister, Ljubešić, Nikola, Loginova, Olga, Luthfi, Andry, Luukko, Mikko, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Marşan, Büşra, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Martínez Alonso, Héctor, Martins, André, Mašek, Jan, Matsuda, Hiroshi, Matsumoto, Yuji, Mazzei, Alessandro, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Miekka, Niko, Mischenkova, Karina, Misirpashayeva, Margarita, Missilä, Anna, Mititelu, Cătălin, Mitrofan, Maria, Miyao, Yusuke, Mojiri Foroushani, AmirHossein, Molnár, Judit, Moloodi, Amirsaeid, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Moretti, Giovanni, Mori, Keiko Sophie, Mori, Shinsuke, Morioka, Tomohiko, Moro, Shigeki, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Nakhlé, Mariam, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nevaci, Manuela, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, Vitaly, Nitisaroj, Rattima, Nourian, Alireza, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Olúòkun, Adédayọ̀, Omura, Mai, Onwuegbuzia, Emeka, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Özateş, Şaziye Betül, Özçelik, Merve, Özgür, Arzucan, Öztürk Başaran, Balkız, Park, Hyunji Hayley, Partanen, Niko, Pascual, Elena, Passarotti, Marco, 
Patejuk, Agnieszka, Paulino-Passos, Guilherme, Peljak-Łapińska, Angelika, Peng, Siyao, Perez, Cenel-Augusto, Perkova, Natalia, Perrier, Guy, Petrov, Slav, Petrova, Daria, Phelan, Jason, Piitulainen, Jussi, Pirinen, Tommi A, Pitler, Emily, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Rääbis, Andriela, Rademaker, Alexandre, Rama, Taraka, Ramasamy, Loganathan, Ramisch, Carlos, Rashel, Fam, Rasooli, Mohammad Sadegh, Ravishankar, Vinit, Real, Livy, Rebeja, Petru, Reddy, Siva, Rehm, Georg, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rocha, Luisa, Rögnvaldsson, Eiríkur, Romanenko, Mykhailo, Rosa, Rudolf, Roșca, Valentin, Rovati, Davide, Rudina, Olga, Rueter, Jack, Rúnarsson, Kristján, Sadde, Shoval, Safari, Pegah, Sagot, Benoît, Sahala, Aleksi, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Sanıyar, Ezgi, Särg, Dage, Saulīte, Baiba, Sawanakunanon, Yanin, Saxena, Shefali, Scannell, Kevin, Scarlata, Salvatore, Schneider, Nathan, Schuster, Sebastian, Schwartz, Lane, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shishkina, Yana, Shohibussirri, Muh, Sichinava, Dmitry, Siewert, Janine, Sigurðsson, Einar Freyr, Silveira, Aline, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Skachedubova, Maria, Smith, Aaron, Soares-Bastos, Isabela, Spadine, Carolyn, Sprugnoli, Rachele, Steingrímsson, Steinþór, Stella, Antonio, Straka, Milan, Strickland, Emmett, Strnadová, Jana, Suhr, Alane, Sulestio, Yogi Lesmana, Sulubacak, Umut, Suzuki, Shingo, Szántó, Zsolt, Taji, Dima, Takahashi, Yuta, Tamburini, Fabio, Tan, Mary Ann C., Tanaka, Takaaki, Tella, Samson, Tellier, Isabelle, Testori, Marinella, Thomas, Guillaume, Torga, Liisi, Toska, Marsida, Trosterud, Trond, Trukhina, 
Anna, Tsarfaty, Reut, Türk, Utku, Tyers, Francis, Uematsu, Sumire, Untilov, Roman, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vajjala, Sowmya, van der Goot, Rob, Vanhove, Martine, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Villemonte de la Clergerie, Eric, Vincze, Veronika, Vlasova, Natalia, Wakasa, Aya, Wallenberg, Joel C., Wallin, Lars, Walsh, Abigail, Wang, Jing Xian, Washington, Jonathan North, Wendt, Maximilan, Widmer, Paul, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamashita, Kayo, Yamazaki, Naoki, Yan, Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yenice, Arife Betül, Yıldız, Olcay Taner, Yu, Zhuoran, Žabokrtský, Zdeněk, Zahra, Shorouq, Zeldes, Amir, Zhu, Hanzhi, Zhuravleva, Anna, and Ziane, Rayan
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Swiss German, Albanian, Icelandic, Akuntsu, Apurinã, Chukot, Khunsari, Manx, Mundurukú, Nayini, Old Turkish, Soi, South Levantine Arabic, Tupinambá, Beja, Western Frisian, Guajajára, Urubú-Kaapor, Kangri, K'iche', Low German, Makuráp, Central Siberian Yupik, and Western Armenian
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008). Version 2.8.1 fixes a bug in 2.8 where a portion of the Dutch Alpino treebank was accidentally omitted.
- Rights:
- Licence Universal Dependencies v2.8, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.8, and PUB
65. Universal Dependencies 2.9
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Ackermann, Elia, Aepli, Noëmi, Aghaei, Hamid, Agić, Željko, Ahmadi, Amir, Ahrenberg, Lars, Ajede, Chika Kennedy, Aleksandravičiūtė, Gabrielė, Alfina, Ika, Antonsen, Lene, Aplonova, Katya, Aquino, Angelina, Aragon, Carolina, Aranzabe, Maria Jesus, Arıcan, Bilge Nas, Arnardóttir, Þórunn, Arutie, Gashaw, Arwidarasti, Jessica Naraiswari, Asahara, Masayuki, Aslan, Deniz Baran, Ateyah, Luma, Atmaca, Furkan, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Balasubramani, Keerthana, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Barkarson, Starkaður, Basile, Rodolfo, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bedir, Seyyit Talha, Bengoetxea, Kepa, Berk, Gözde, Berzak, Yevgeni, Bhat, Irshad Ahmad, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Bjarnadóttir, Kristín, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Braggaar, Anouck, Brokaitė, Kristina, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cassidy, Lauren, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. 
A., Čéplö, Slavomír, Cesur, Neslihan, Cetin, Savas, Çetinoğlu, Özlem, Chalub, Fabricio, Chauhan, Shweta, Chi, Ethan, Chika, Taishi, Cho, Yongseok, Choi, Jinho, Chun, Jayeol, Chung, Juyeon, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Courtin, Marine, Cristescu, Mihaela, Daniel, Philemon, Davidson, Elizabeth, de Marneffe, Marie-Catherine, de Paiva, Valeria, Derin, Mehmet Oguz, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dinakaramani, Arawinda, Di Nuovo, Elisa, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eckhoff, Hanne, Eiche, Sandra, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Etienne, Aline, Evelyn, Wograine, Facundes, Sidney, Farkas, Richárd, Ferdaousi, Jannatul, Fernanda, Marília, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Gerardi, Fabrício Ferraz, Gerdes, Kim, Ginter, Filip, Godoy, Gustavo, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grobol, Loïc, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Güngör, Tunga, Habash, Nizar, Hafsteinsson, Hinrik, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, Hanifmuti, Muhammad Yudistira, Hardwick, Sam, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hellwig, Oliver, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Huber, Eva, Hwang, Jena, Ikeda, Takumi, Ingason, Anton Karl, Ion, Radu, Irimia, Elena, Ishola, Ọlájídé, Ito, Kaoru, Jannat, Siratun, Jelínek, Tomáš, Jha, Apoorva, Johannsen, Anders, Jónsdóttir, Hildur, Jørgensen, Fredrik, Juutinen, Markus, K, Sarveswaran, Kaşıkara, Hüner, Kaasen, Andre, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Kara, 
Neslihan, Katz, Boris, Kayadelen, Tolga, Kenney, Jessica, Kettnerová, Václava, Kirchner, Jesse, Klementieva, Elena, Klyachko, Elena, Köhn, Arne, Köksal, Abdullatif, Kopacewicz, Kamil, Korkiakangas, Timo, Köse, Mehmet, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Krishnamurthy, Parameswari, Kübler, Sandra, Kuyrukçu, Oğuzhan, Kuzgun, Aslı, Kwak, Sookyoung, Laippala, Veronika, Lam, Lucia, Lambertino, Lorenzo, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Levina, Maria, Li, Cheuk Ying, Li, Josie, Li, Keying, Li, Yuan, Lim, KyungTae, Lima Padovani, Bruna, Lindén, Krister, Ljubešić, Nikola, Loginova, Olga, Lusito, Stefano, Luthfi, Andry, Luukko, Mikko, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Mahamdi, Menel, Maillard, Jean, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Marşan, Büşra, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Martínez Alonso, Héctor, Martín-Rodríguez, Lorena, Martins, André, Mašek, Jan, Matsuda, Hiroshi, Matsumoto, Yuji, Mazzei, Alessandro, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Merzhevich, Tatiana, Miekka, Niko, Mischenkova, Karina, Misirpashayeva, Margarita, Missilä, Anna, Mititelu, Cătălin, Mitrofan, Maria, Miyao, Yusuke, Mojiri Foroushani, AmirHossein, Molnár, Judit, Moloodi, Amirsaeid, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Moretti, Giovanni, Mori, Keiko Sophie, Mori, Shinsuke, Morioka, Tomohiko, Moro, Shigeki, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Nakhlé, Mariam, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nevaci, Manuela, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, Vitaly, Nitisaroj, Rattima, Nourian, Alireza, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Olúòkun, Adédayọ̀, Omura, Mai, Onwuegbuzia, Emeka, Osenova, 
Petya, Östling, Robert, Øvrelid, Lilja, Özateş, Şaziye Betül, Özçelik, Merve, Özgür, Arzucan, Öztürk Başaran, Balkız, Park, Hyunji Hayley, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Paulino-Passos, Guilherme, Peljak-Łapińska, Angelika, Peng, Siyao, Perez, Cenel-Augusto, Perkova, Natalia, Perrier, Guy, Petrov, Slav, Petrova, Daria, Phelan, Jason, Piitulainen, Jussi, Pirinen, Tommi A, Pitler, Emily, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Rääbis, Andriela, Rademaker, Alexandre, Rahoman, Mizanur, Rama, Taraka, Ramasamy, Loganathan, Ramisch, Carlos, Rashel, Fam, Rasooli, Mohammad Sadegh, Ravishankar, Vinit, Real, Livy, Rebeja, Petru, Reddy, Siva, Regnault, Mathilde, Rehm, Georg, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rizqiyah, Putri, Rocha, Luisa, Rögnvaldsson, Eiríkur, Romanenko, Mykhailo, Rosa, Rudolf, Roșca, Valentin, Rovati, Davide, Rudina, Olga, Rueter, Jack, Rúnarsson, Kristján, Sadde, Shoval, Safari, Pegah, Sagot, Benoît, Sahala, Aleksi, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Sanıyar, Ezgi, Särg, Dage, Saulīte, Baiba, Sawanakunanon, Yanin, Saxena, Shefali, Scannell, Kevin, Scarlata, Salvatore, Schneider, Nathan, Schuster, Sebastian, Schwartz, Lane, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shahzadi, Syeda, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shishkina, Yana, Shohibussirri, Muh, Sichinava, Dmitry, Siewert, Janine, Sigurðsson, Einar Freyr, Silveira, Aline, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Skachedubova, Maria, Smith, Aaron, Soares-Bastos, Isabela, Sourov, Shafi, Spadine, Carolyn, Sprugnoli, Rachele, Steingrímsson, Steinþór, Stella, Antonio, Straka, Milan, Strickland, Emmett, Strnadová, Jana, Suhr, Alane, 
Sulestio, Yogi Lesmana, Sulubacak, Umut, Suzuki, Shingo, Szántó, Zsolt, Taguchi, Chihiro, Taji, Dima, Takahashi, Yuta, Tamburini, Fabio, Tan, Mary Ann C., Tanaka, Takaaki, Tanaya, Dipta, Tella, Samson, Tellier, Isabelle, Testori, Marinella, Thomas, Guillaume, Torga, Liisi, Toska, Marsida, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Türk, Utku, Tyers, Francis, Uematsu, Sumire, Untilov, Roman, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vajjala, Sowmya, van der Goot, Rob, Vanhove, Martine, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Villemonte de la Clergerie, Eric, Vincze, Veronika, Vlasova, Natalia, Wakasa, Aya, Wallenberg, Joel C., Wallin, Lars, Walsh, Abigail, Wang, Jing Xian, Washington, Jonathan North, Wendt, Maximilan, Widmer, Paul, Wijono, Sri Hartati, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamashita, Kayo, Yamazaki, Naoki, Yan, Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yenice, Arife Betül, Yıldız, Olcay Taner, Yu, Zhuoran, Yuliawati, Arlisa, Žabokrtský, Zdeněk, Zahra, Shorouq, Zeldes, Amir, Zhou, He, Zhu, Hanzhi, Zhuravleva, Anna, and Ziane, Rayan
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Swiss German, Albanian, Icelandic, Akuntsu, Apurinã, Chukot, Khunsari, Manx, Mundurukú, Nayini, Old Turkish, Soi, South Levantine Arabic, Tupinambá, Beja, Western Frisian, Guajajára, Urubú-Kaapor, Kangri, K'iche', Low German, Makuráp, Central Siberian Yupik, Western Armenian, Bengali, Javanese, Karo (Brazil), Ligurian, Neapolitan, Tatar, Xibe, and Yakut
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.9, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.9, and PUB
66. W2C – Web to Corpus – Corpora
- Creator:
- Majliš, Martin
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- multilingual corpora
- Language:
- Afrikaans, Tosk Albanian, Amharic, Arabic, Aragonese, Egyptian Arabic, Asturian, Azerbaijani, Belarusian, Bengali, Bosnian, Bishnupriya, Breton, Buginese, Bulgarian, Catalan, Cebuano, Czech, Chuvash, Corsican, Welsh, Danish, German, Dimli (individual language), Modern Greek (1453-), English, Esperanto, Estonian, Basque, Faroese, Persian, Finnish, French, Western Frisian, Gan Chinese, Scottish Gaelic, Irish, Galician, Gilaki, Gujarati, Haitian, Serbo-Croatian, Hebrew, Fiji Hindi, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Ido, Interlingua (International Auxiliary Language Association), Indonesian, Icelandic, Italian, Javanese, Japanese, Kannada, Georgian, Kazakh, Korean, Kurdish, Latin, Latvian, Limburgan, Lithuanian, Lombard, Luxembourgish, Malayalam, Marathi, Macedonian, Malagasy, Mongolian, Maori, Malay (macrolanguage), Burmese, Neapolitan, Low German, Nepali (macrolanguage), Newari, Dutch, Norwegian Nynorsk, Norwegian, Occitan (post 1500), Ossetian, Pampanga, Piemontese, Polish, Portuguese, Quechua, Romanian, Russian, Yakut, Sicilian, Scots, Slovak, Slovenian, Spanish, Albanian, Serbian, Sundanese, Swahili (macrolanguage), Swedish, Tamil, Tatar, Telugu, Tajik, Tagalog, Thai, Turkish, Ukrainian, Urdu, Uzbek, Venetian, Vietnamese, Volapük, Waray (Philippines), Walloon, Yiddish, Yoruba, and Chinese
- Description:
- A set of corpora for 120 languages, automatically collected from Wikipedia and the web using the W2C toolset: http://hdl.handle.net/11858/00-097C-0000-0022-60D6-1
- Rights:
- Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0), http://creativecommons.org/licenses/by-sa/3.0/, and PUB
67. Wortschatz
- Publisher:
- University of Leipzig
- Type:
- corpus
- Language:
- Afrikaans, Albanian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, German, Hungarian, Icelandic, Indonesian, Italian, Japanese, Korean, Latin, Latvian, Lithuanian, Malay (macrolanguage), Norwegian, Occitan (post 1500), Romanian, Russian, Slovak, Slovenian, Spanish, Sundanese, Swedish, Tagalog, Turkish, Vietnamese, and Welsh
- Description:
- Collected from newspaper texts, web crawling, and other sources. Provides words (with frequencies), co-occurrences (with graphs), left/right neighbours, and example sentences.
- Rights:
- Not specified