Search Results
2. A Human-Annotated Dataset for Language Modeling and Named Entity Recognition in Medieval Documents (2023-01-05)
- Creator:
- Novotný, Vít, Luger, Kristýna, Štefánik, Michal, Vrabcová, Tereza, and Horák, Aleš
- Publisher:
- Masaryk University, Brno
- Type:
- text and corpus
- Subject:
- NER, named entity recognition, and Medieval
- Language:
- Czech, English, German, and Latin
- Description:
- This is an open dataset of sentences from 19th and 20th century letterpress reprints of documents from the Hussite era. The dataset contains a corpus for language modeling and human annotations for named entity recognition (NER).
- Rights:
- Public Domain Dedication (CC Zero), http://creativecommons.org/publicdomain/zero/1.0/, and PUB
3. A Human-Annotated Dataset of Scanned Images and OCR Texts from Medieval Documents
- Creator:
- Novotný, Vít, Seidlová, Kristýna, Vrabcová, Tereza, and Horák, Aleš
- Publisher:
- Masaryk University, Brno
- Type:
- image and corpus
- Subject:
- ocr, optical character recognition, language identification, image super-resolution, sr, and Medieval
- Language:
- German, Czech, Latin, and English
- Description:
- This is an open dataset of scanned images and OCR texts from 19th and 20th century letterpress reprints of documents from the Hussite era. The dataset contains human annotations for layout analysis, OCR evaluation, and language identification.
- Rights:
- Public Domain Dedication (CC Zero), http://creativecommons.org/publicdomain/zero/1.0/, and PUB
4. A Human-Annotated Dataset of Scanned Images and OCR Texts from Medieval Documents: Supplementary Materials
- Creator:
- Novotný, Vít and Horák, Aleš
- Publisher:
- Masaryk University, Brno
- Type:
- text and corpus
- Subject:
- ocr, optical character recognition, language identification, image super-resolution, sr, and Medieval
- Language:
- Czech, English, German, and Latin
- Description:
- These are supplementary materials for an open dataset of scanned images and OCR texts from 19th and 20th century letterpress reprints of documents from the Hussite era. The dataset contains human annotations for layout analysis, OCR evaluation, and language identification and is available at http://hdl.handle.net/11234/1-4615. These supplementary materials contain OCR texts from different OCR engines for book pages for which we have both high-resolution scanned images and annotations for OCR evaluation.
- Rights:
- Public Domain Dedication (CC Zero), http://creativecommons.org/publicdomain/zero/1.0/, and PUB
5. Bosworth-Toller’s Anglo-Saxon Dictionary online
- Creator:
- Tichý, Ondřej, Roček, Martin, Bočková, Renata, Čermák, Matěj, Dragounová, Jolana, Filipová, Helena, Gilová, Lucie, Hejná, Michaela, Hladíková, Lenka, Hladká, Alena, Hubinová, Veronika, Krajcsovicsová, Vlaďena, Kupková, Tatiana, Lebedeva, Tatiana, Malečková, Nikola, Novotná, Alena, Pazderová, Tereza, Popelíková, Jiřina, Rumlová, Jana, Tyčová Ocelík, Dana, Volná, Veronika, and Zahradníková, Tereza
- Publisher:
- Charles University, Faculty of Arts, Department of English Language and ELT Methodology
- Type:
- text, lexicon, and lexicalConceptualResource
- Subject:
- English, Old English, Anglo-Saxon, dictionary, Bosworth, Toller, lexicography, digitalization, English history, Mediaeval, and Medieval
- Language:
- English, Old English (ca. 450-1100), Latin, Ancient Greek (to 1453), and Ancient Hebrew
- Description:
- This is an online edition of An Anglo-Saxon Dictionary, or a dictionary of "Old English". The dictionary records the state of the English language as it was used between ca. 700 and 1100 AD by the Anglo-Saxon inhabitants of the British Isles. This project is based on a digital edition of An Anglo-Saxon dictionary, based on the manuscript collections of the late Joseph Bosworth (the so-called Main Volume, first edition 1898) and its Supplement (first edition 1921), edited by Joseph Bosworth and T. Northcote Toller, today the largest complete dictionary of Old English (hopefully to be supplanted one day by the DOE). Alistair Campbell's "enlarged addenda and corrigenda" from 1972 are not public domain and are therefore not part of the online dictionary. Please see the front and back matter of the paper dictionary for further information, prefaces, and lists of references and contractions. The digitization project was initiated by Sean Crist in 2001 as a part of his Germanic Lexicon Project (GLP), and many individuals and institutions have contributed to it. Check out the original GLP webpage and the old Bosworth-Toller offline application webpage (to be updated). Currently the project is hosted by the Faculty of Arts, Charles University. In 2010, the data from the GLP were converted to create the current site. Care was taken to preserve the typography of the original dictionary while also providing a modern, user-friendly interface for contemporary users. In 2013, the entries were structurally re-tagged and the original typography was abandoned, though immediate access to the scans of the paper dictionary was preserved. Our aim is to reach beyond a simple digital edition and create an online environment for everyone interested in Old English and Anglo-Saxon culture. Feel free to join in editing the Dictionary, commenting on its numerous entries, or participating in the discussions at our forums.
We hope that by drawing the attention of the community of Anglo-Saxonists to our site and joining our resources, we may create a more useful tool for everybody. The most immediate project to draw on the corrected and tagged data of the Dictionary is a Morphological Analyzer of Old English (currently under development). We are grateful for the generous support of the Charles University Grant Agency and for the free hosting at the Faculty of Arts at Charles University. The site is currently maintained and developed by Ondřej Tichý et al. at the Department of English Language and ELT Methodology, Faculty of Arts, Charles University in Prague (Czech Republic).
- Rights:
- Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB
6. Deep Universal Dependencies 2.6
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, Galician, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Icelandic, Albanian, and Persian
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3226). It contains additional deep-syntactic and semantic annotations. The version of Deep UD corresponds to the version of UD it is based on. Note, however, that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.6, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.6, and PUB
7. Deep Universal Dependencies 2.7
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, Galician, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Icelandic, Albanian, Persian, Akuntsu, Apurinã, Khunsari, Manx, Mundurukú, Nayini, Soi, South Levantine Arabic, and Tupinambá
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3424). It contains additional deep-syntactic and semantic annotations. The version of Deep UD corresponds to the version of UD it is based on. Note, however, that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.7, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.7, and PUB
8. Deep Universal Dependencies 2.8
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, Galician, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Icelandic, Albanian, Persian, Akuntsu, Apurinã, Khunsari, Manx, Mundurukú, Nayini, Soi, South Levantine Arabic, Tupinambá, Beja, Western Frisian, Urubú-Kaapor, Kangri, K'iche', Low German, Makuráp, Western Armenian, and Central Siberian Yupik
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3687). It contains additional deep-syntactic and semantic annotations. The version of Deep UD corresponds to the version of UD it is based on. Note, however, that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.8, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.8, and PUB
9. EvaLatin 2020 models for UDPipe 2 (2020-08-31)
- Creator:
- Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- POS tagger, lemmatization, and tagger
- Language:
- Latin
- Description:
- POS Tagger and Lemmatizer models for EvaLatin2020 data (https://github.com/CIRCSE/LT4HALA). The model documentation including performance can be found at https://ufal.mff.cuni.cz/udpipe/2/models#evalatin20_models . To use these models, you need UDPipe version 2.0 or later, which you can download from https://ufal.mff.cuni.cz/udpipe/2 .
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
10. On-line Dictionary of medieval latin in the Czech lands
- Creator:
- Ctibor, Jan and Nývlt, Pavel
- Publisher:
- Institute of Philosophy of the Czech Academy of Sciences
- Type:
- text, lexicon, and lexicalConceptualResource
- Subject:
- dictionary, latin, Medieval, digital humanities, lexicography, and Medieval Latin
- Language:
- Latin and Czech
- Description:
- The Dictionary of Medieval Latin in the Czech Lands registers and explains the vocabulary of Medieval Latin as used in the Czech lands from the beginnings of Latin writing in this area (about 1000 CE) to 1500 CE, so far covering the letters A-M. For more information about the Dictionary, see the webpage of the Department of Medieval Lexicography of the Institute of Philosophy of the Czech Academy of Sciences. The uploaded data comprise the on-line version of the dictionary (API and XML data), making it possible to run the application on localhost.
- Rights:
- Dictionary of Medieval Latin in the Czech Lands - digital version 2.2 License Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/license-lb, and ACA
11. Universal Dependencies 2.10
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Ackermann, Elia, Aepli, Noëmi, Aghaei, Hamid, Agić, Željko, Ahmadi, Amir, Ahrenberg, Lars, Ajede, Chika Kennedy, Aleksandravičiūtė, Gabrielė, Alfina, Ika, Algom, Avner, Andersen, Erik, Antonsen, Lene, Aplonova, Katya, Aquino, Angelina, Aragon, Carolina, Aranes, Glyd, Aranzabe, Maria Jesus, Arıcan, Bilge Nas, Arnardóttir, Þórunn, Arutie, Gashaw, Arwidarasti, Jessica Naraiswari, Asahara, Masayuki, Aslan, Deniz Baran, Asmazoğlu, Cengiz, Ateyah, Luma, Atmaca, Furkan, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Balasubramani, Keerthana, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Barkarson, Starkaður, Basile, Rodolfo, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bedir, Seyyit Talha, Bengoetxea, Kepa, Ben Moshe, Yifat, Berk, Gözde, Berzak, Yevgeni, Bhat, Irshad Ahmad, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Bjarnadóttir, Kristín, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Braggaar, Anouck, Brokaitė, Kristina, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cassidy, Lauren, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. 
A., Čéplö, Slavomír, Cesur, Neslihan, Cetin, Savas, Çetinoğlu, Özlem, Chalub, Fabricio, Chauhan, Shweta, Chi, Ethan, Chika, Taishi, Cho, Yongseok, Choi, Jinho, Chun, Jayeol, Chung, Juyeon, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Corbetta, Daniela, Courtin, Marine, Cristescu, Mihaela, Daniel, Philemon, Davidson, Elizabeth, Dehouck, Mathieu, de Laurentiis, Martina, de Marneffe, Marie-Catherine, de Paiva, Valeria, Derin, Mehmet Oguz, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dinakaramani, Arawinda, Di Nuovo, Elisa, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eckhoff, Hanne, Eiche, Sandra, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Etienne, Aline, Evelyn, Wograine, Facundes, Sidney, Farkas, Richárd, Favero, Federica, Ferdaousi, Jannatul, Fernanda, Marília, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Gamba, Federica, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Gerardi, Fabrício Ferraz, Gerdes, Kim, Ginter, Filip, Godoy, Gustavo, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grobol, Loïc, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Güngör, Tunga, Habash, Nizar, Hafsteinsson, Hinrik, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, Hanifmuti, Muhammad Yudistira, Harada, Takahiro, Hardwick, Sam, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hellwig, Oliver, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Hwang, Jena, Ikeda, Takumi, Ingason, Anton Karl, Ion, Radu, Irimia, Elena, Ishola, Ọlájídé, Ito, Kaoru, Jannat, Siratun, Jelínek, Tomáš, Jha, Apoorva, Johannsen, Anders, Jónsdóttir, Hildur, Jørgensen, Fredrik, Juutinen, Markus, K, Sarveswaran, 
Kaşıkara, Hüner, Kaasen, Andre, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Kara, Neslihan, Karahóǧa, Ritván, Katz, Boris, Kayadelen, Tolga, Kenney, Jessica, Kettnerová, Václava, Kirchner, Jesse, Klementieva, Elena, Klyachko, Elena, Köhn, Arne, Köksal, Abdullatif, Kopacewicz, Kamil, Korkiakangas, Timo, Köse, Mehmet, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Krishnamurthy, Parameswari, Kübler, Sandra, Kuyrukçu, Oğuzhan, Kuzgun, Aslı, Kwak, Sookyoung, Laippala, Veronika, Lam, Lucia, Lambertino, Lorenzo, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Levina, Maria, Li, Cheuk Ying, Li, Josie, Li, Keying, Li, Yuan, Lim, KyungTae, Lima Padovani, Bruna, Lindén, Krister, Ljubešić, Nikola, Loginova, Olga, Lusito, Stefano, Luthfi, Andry, Luukko, Mikko, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Mahamdi, Menel, Maillard, Jean, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Marşan, Büşra, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Markantonatou, Stella, Martínez Alonso, Héctor, Martín Rodríguez, Lorena, Martins, André, Mašek, Jan, Matsuda, Hiroshi, Matsumoto, Yuji, Mazzei, Alessandro, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Merzhevich, Tatiana, Miekka, Niko, Mischenkova, Karina, Misirpashayeva, Margarita, Missilä, Anna, Mititelu, Cătălin, Mitrofan, Maria, Miyao, Yusuke, Mojiri Foroushani, AmirHossein, Molnár, Judit, Moloodi, Amirsaeid, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Moretti, Giovanni, Mori, Keiko Sophie, Mori, Shinsuke, Morioka, Tomohiko, Moro, Shigeki, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Nakhlé, Mariam, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nevaci, Manuela, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, 
Vitaly, Nitisaroj, Rattima, Nourian, Alireza, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Olúòkun, Adédayọ̀, Omura, Mai, Onwuegbuzia, Emeka, Ordan, Noam, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Özateş, Şaziye Betül, Özçelik, Merve, Özgür, Arzucan, Öztürk Başaran, Balkız, Paccosi, Teresa, Palmero Aprosio, Alessio, Park, Hyunji Hayley, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Paulino-Passos, Guilherme, Pedonese, Giulia, Peljak-Łapińska, Angelika, Peng, Siyao, Perez, Cenel-Augusto, Perkova, Natalia, Perrier, Guy, Petrov, Slav, Petrova, Daria, Peverelli, Andrea, Phelan, Jason, Piitulainen, Jussi, Pirinen, Tommi A, Pitler, Emily, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Rääbis, Andriela, Rademaker, Alexandre, Rahoman, Mizanur, Rama, Taraka, Ramasamy, Loganathan, Ramisch, Carlos, Rashel, Fam, Rasooli, Mohammad Sadegh, Ravishankar, Vinit, Real, Livy, Rebeja, Petru, Reddy, Siva, Regnault, Mathilde, Rehm, Georg, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rizqiyah, Putri, Rocha, Luisa, Rögnvaldsson, Eiríkur, Romanenko, Mykhailo, Rosa, Rudolf, Roșca, Valentin, Rovati, Davide, Rozonoyer, Ben, Rudina, Olga, Rueter, Jack, Rúnarsson, Kristján, Sadde, Shoval, Safari, Pegah, Sagot, Benoît, Sahala, Aleksi, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Sanıyar, Ezgi, Särg, Dage, Saulīte, Baiba, Sawanakunanon, Yanin, Saxena, Shefali, Scannell, Kevin, Scarlata, Salvatore, Schneider, Nathan, Schuster, Sebastian, Schwartz, Lane, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shahzadi, Syeda, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shishkina, Yana, Shohibussirri, Muh, Sichinava, Dmitry, Siewert, Janine, Sigurðsson, Einar Freyr, Silveira, Aline, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, 
Katalin, Šimková, Mária, Simov, Kiril, Skachedubova, Maria, Smith, Aaron, Soares-Bastos, Isabela, Sourov, Shafi, Spadine, Carolyn, Sprugnoli, Rachele, Stamou, Vivian, Steingrímsson, Steinþór, Stella, Antonio, Straka, Milan, Strickland, Emmett, Strnadová, Jana, Suhr, Alane, Sulestio, Yogi Lesmana, Sulubacak, Umut, Suzuki, Shingo, Swanson, Daniel, Szántó, Zsolt, Taguchi, Chihiro, Taji, Dima, Takahashi, Yuta, Tamburini, Fabio, Tan, Mary Ann C., Tanaka, Takaaki, Tanaya, Dipta, Tavoni, Mirko, Tella, Samson, Tellier, Isabelle, Testori, Marinella, Thomas, Guillaume, Tonelli, Sara, Torga, Liisi, Toska, Marsida, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Türk, Utku, Tyers, Francis, Uematsu, Sumire, Untilov, Roman, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vagnoni, Elena, Vajjala, Sowmya, van der Goot, Rob, Vanhove, Martine, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Vedenina, Uliana, Villemonte de la Clergerie, Eric, Vincze, Veronika, Vlasova, Natalia, Wakasa, Aya, Wallenberg, Joel C., Wallin, Lars, Walsh, Abigail, Wang, Jing Xian, Washington, Jonathan North, Wendt, Maximilan, Widmer, Paul, Wigderson, Shira, Wijono, Sri Hartati, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamashita, Kayo, Yamazaki, Naoki, Yan, Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yenice, Arife Betül, Yıldız, Olcay Taner, Yu, Zhuoran, Yuliawati, Arlisa, Žabokrtský, Zdeněk, Zahra, Shorouq, Zeldes, Amir, Zhou, He, Zhu, Hanzhi, Zhuravleva, Anna, and Ziane, Rayan
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Swiss German, Albanian, Icelandic, Akuntsu, Apurinã, Chukot, Khunsari, Manx, Mundurukú, Nayini, Old Turkish, Soi, South Levantine Arabic, Tupinambá, Beja, Western Frisian, Guajajára, Urubú-Kaapor, Kangri, K'iche', Low German, Makuráp, Central Siberian Yupik, Western Armenian, Bengali, Javanese, Karo (Brazil), Ligurian, Neapolitan, Tatar, Xibe, Yakut, Ancient Hebrew, Cebuano, Guarani, Hittite, Madi, Emerillon, and Umbrian
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.10, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.10, and PUB
12. Universal Dependencies 2.10 models for UDPipe 2 (2022-07-11)
- Creator:
- Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- tokenizer, POS tagger, lemmatization, tagger, parser, and dependency parser
- Language:
- Afrikaans, Arabic, Belarusian, Bulgarian, Catalan, Czech, Church Slavic, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Persian, Finnish, French, Old French (842-ca. 1400), Scottish Gaelic, Irish, Galician, Gothic, Ancient Greek (to 1453), Ancient Hebrew, Hebrew, Hindi, Croatian, Hungarian, Armenian, Western Armenian, Indonesian, Icelandic, Italian, Japanese, Korean, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Maltese, Dutch, Norwegian Nynorsk, Norwegian Bokmål, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Telugu, Turkish, Uighur, Ukrainian, Urdu, Vietnamese, Gambian Wolof, Wolof, and Chinese
- Description:
- Tokenizer, POS Tagger, Lemmatizer and Parser models for 123 treebanks of 69 languages of Universal Dependencies 2.10 Treebanks, created solely using UD 2.10 data (https://hdl.handle.net/11234/1-4758). The model documentation including performance can be found at https://ufal.mff.cuni.cz/udpipe/2/models#universal_dependencies_210_models . To use these models, you need UDPipe version 2.0, which you can download from https://ufal.mff.cuni.cz/udpipe/2 .
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
13. Universal Dependencies 2.11
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Ackermann, Elia, Aepli, Noëmi, Aghaei, Hamid, Agić, Željko, Ahmadi, Amir, Ahrenberg, Lars, Ajede, Chika Kennedy, Akkurt, Salih Furkan, Aleksandravičiūtė, Gabrielė, Alfina, Ika, Algom, Avner, Alzetta, Chiara, Andersen, Erik, Antonsen, Lene, Aplonova, Katya, Aquino, Angelina, Aragon, Carolina, Aranes, Glyd, Aranzabe, Maria Jesus, Arıcan, Bilge Nas, Arnardóttir, Þórunn, Arutie, Gashaw, Arwidarasti, Jessica Naraiswari, Asahara, Masayuki, Ásgeirsdóttir, Katla, Aslan, Deniz Baran, Asmazoğlu, Cengiz, Ateyah, Luma, Atmaca, Furkan, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Balasubramani, Keerthana, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Barkarson, Starkaður, Basile, Rodolfo, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bedir, Seyyit Talha, Belieni, Juan, Bengoetxea, Kepa, Ben Moshe, Yifat, Berk, Gözde, Berzak, Yevgeni, Bhat, Irshad Ahmad, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Bjarnadóttir, Kristín, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Braggaar, Anouck, Brokaitė, Kristina, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cassidy, Lauren, Castro, Maria Clara, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. 
A., Čéplö, Slavomír, Cesur, Neslihan, Cetin, Savas, Çetinoğlu, Özlem, Chalub, Fabricio, Chamila, Liyanage, Chauhan, Shweta, Chi, Ethan, Chika, Taishi, Cho, Yongseok, Choi, Jinho, Chun, Jayeol, Chung, Juyeon, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Corbetta, Daniela, Courtin, Marine, Cristescu, Mihaela, Daniel, Philemon, Davidson, Elizabeth, de Alencar, Leonel Figueiredo, Dehouck, Mathieu, de Laurentiis, Martina, de Marneffe, Marie-Catherine, de Paiva, Valeria, Derin, Mehmet Oguz, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dinakaramani, Arawinda, Di Nuovo, Elisa, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Ebert, Christian, Eckhoff, Hanne, Eiche, Sandra, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Etienne, Aline, Evelyn, Wograine, Facundes, Sidney, Farkas, Richárd, Favero, Federica, Ferdaousi, Jannatul, Fernanda, Marília, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Gamba, Federica, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Gerardi, Fabrício Ferraz, Gerdes, Kim, Ginter, Filip, Godoy, Gustavo, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grobol, Loïc, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Güngör, Tunga, Habash, Nizar, Hafsteinsson, Hinrik, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, Hanifmuti, Muhammad Yudistira, Harada, Takahiro, Hardwick, Sam, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hellwig, Oliver, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Huerta Mendez, Marivel, Hwang, Jena, Ikeda, Takumi, Ingason, Anton Karl, Ion, Radu, Irimia, Elena, Ishola, Ọlájídé, Islamaj, Artan, Ito, Kaoru, Jannat, Siratun, Jelínek, Tomáš, Jha, 
Apoorva, Jiang, Katharine, Johannsen, Anders, Jónsdóttir, Hildur, Jørgensen, Fredrik, Juutinen, Markus, Kaşıkara, Hüner, Kaasen, Andre, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Kara, Neslihan, Karahóǧa, Ritván, Katz, Boris, Kayadelen, Tolga, Kengatharaiyer, Sarveswaran, Kenney, Jessica, Kettnerová, Václava, Kirchner, Jesse, Klementieva, Elena, Klyachko, Elena, Köhn, Arne, Köksal, Abdullatif, Kopacewicz, Kamil, Korkiakangas, Timo, Köse, Mehmet, Koshevoy, Alexey, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Krishnamurthy, Parameswari, Kübler, Sandra, Kuqi, Adrian, Kuyrukçu, Oğuzhan, Kuzgun, Aslı, Kwak, Sookyoung, Laippala, Veronika, Lam, Lucia, Lambertino, Lorenzo, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Levina, Maria, Li, Cheuk Ying, Li, Josie, Li, Keying, Li, Yixuan, Li, Yuan, Lim, KyungTae, Lima Padovani, Bruna, Lindén, Krister, Ljubešić, Nikola, Loginova, Olga, Lusito, Stefano, Luthfi, Andry, Luukko, Mikko, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Mahamdi, Menel, Maillard, Jean, Makarchuk, Ilya, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Marşan, Büşra, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Markantonatou, Stella, Martínez Alonso, Héctor, Martín Rodríguez, Lorena, Martins, André, Mašek, Jan, Matsuda, Hiroshi, Matsumoto, Yuji, Mazzei, Alessandro, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Merzhevich, Tatiana, Miekka, Niko, Mischenkova, Karina, Misirpashayeva, Margarita, Missilä, Anna, Mititelu, Cătălin, Mitrofan, Maria, Miyao, Yusuke, Mojiri Foroushani, AmirHossein, Molnár, Judit, Moloodi, Amirsaeid, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Moretti, Giovanni, Mori, Keiko Sophie, Mori, Shinsuke, Morioka, Tomohiko, Moro, Shigeki, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, 
Nainwani, Pinkey, Nakhlé, Mariam, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nevaci, Manuela, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, Vitaly, Nitisaroj, Rattima, Nourian, Alireza, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Óladóttir, Hulda, Olúòkun, Adédayọ̀, Omura, Mai, Onwuegbuzia, Emeka, Ordan, Noam, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Özateş, Şaziye Betül, Özçelik, Merve, Özgür, Arzucan, Öztürk Başaran, Balkız, Paccosi, Teresa, Palmero Aprosio, Alessio, Panova, Anastasia, Park, Hyunji Hayley, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Paulino-Passos, Guilherme, Pedonese, Giulia, Peljak-Łapińska, Angelika, Peng, Siyao, Perez, Cenel-Augusto, Perkova, Natalia, Perrier, Guy, Petrov, Slav, Petrova, Daria, Peverelli, Andrea, Phelan, Jason, Piitulainen, Jussi, Pintucci, Rodrigo, Pirinen, Tommi A, Pitler, Emily, Plamada, Magdalena, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Pugh, Robert, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Rääbis, Andriela, Rademaker, Alexandre, Rahoman, Mizanur, Rama, Taraka, Ramasamy, Loganathan, Ramisch, Carlos, Rashel, Fam, Rasooli, Mohammad Sadegh, Ravishankar, Vinit, Real, Livy, Rebeja, Petru, Reddy, Siva, Regnault, Mathilde, Rehm, Georg, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rizqiyah, Putri, Rocha, Luisa, Rögnvaldsson, Eiríkur, Roksandic, Ivan, Romanenko, Mykhailo, Rosa, Rudolf, Roșca, Valentin, Rovati, Davide, Rozonoyer, Ben, Rudina, Olga, Rueter, Jack, Rúnarsson, Kristján, Sadde, Shoval, Safari, Pegah, Sagot, Benoît, Sahala, Aleksi, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Sanıyar, Ezgi, Särg, Dage, Sartor, Marta, Sasaki, Mitsuya, Saulīte, Baiba, Sawanakunanon, Yanin, Saxena, Shefali, Scannell, Kevin, Scarlata, Salvatore, 
Schneider, Nathan, Schuster, Sebastian, Schwartz, Lane, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shahzadi, Syeda, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shishkina, Yana, Shohibussirri, Muh, Shvedova, Maria, Siewert, Janine, Sigurðsson, Einar Freyr, Silva, João Ricardo, Silveira, Aline, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Símonarson, Haukur Barri, Simov, Kiril, Sitchinava, Dmitri, Skachedubova, Maria, Smith, Aaron, Soares-Bastos, Isabela, Sonnenhauser, Barbara, Sourov, Shafi, Spadine, Carolyn, Sprugnoli, Rachele, Stamou, Vivian, Steingrímsson, Steinþór, Stella, Antonio, Stephen, Abishek, Straka, Milan, Strickland, Emmett, Strnadová, Jana, Suhr, Alane, Sulestio, Yogi Lesmana, Sulubacak, Umut, Suzuki, Shingo, Swanson, Daniel, Szántó, Zsolt, Taguchi, Chihiro, Taji, Dima, Takahashi, Yuta, Tamburini, Fabio, Tan, Mary Ann C., Tanaka, Takaaki, Tanaya, Dipta, Tavoni, Mirko, Tella, Samson, Tellier, Isabelle, Testori, Marinella, Thomas, Guillaume, Tonelli, Sara, Torga, Liisi, Toska, Marsida, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Türk, Utku, Tyers, Francis, Þórðarson, Sveinbjörn, Þorsteinsson, Vilhjálmur, Uematsu, Sumire, Untilov, Roman, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vagnoni, Elena, Vajjala, Sowmya, van der Goot, Rob, Vanhove, Martine, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Vedenina, Uliana, Venturi, Giulia, Villemonte de la Clergerie, Eric, Vincze, Veronika, Vlasova, Natalia, Wakasa, Aya, Wallenberg, Joel C., Wallin, Lars, Walsh, Abigail, Wang, Jing Xian, Washington, Jonathan North, Wendt, Maximilan, Widmer, Paul, Wigderson, Shira, Wijono, Sri Hartati, Wille, Vanessa Berwanger, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamashita, Kayo, Yamazaki, Naoki, Yan, Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yenice, Arife Betül, Yıldız, Olcay Taner, Yu, Zhuoran, Yuliawati, Arlisa, 
Žabokrtský, Zdeněk, Zahra, Shorouq, Zeldes, Amir, Zhou, He, Zhu, Hanzhi, Zhuravleva, Anna, and Ziane, Rayan
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Swiss German, Albanian, Icelandic, Akuntsu, Apurinã, Chukot, Khunsari, Manx, Mundurukú, Nayini, Old Turkish, Soi, South Levantine Arabic, Tupinambá, Beja, Western Frisian, Guajajára, Urubú-Kaapor, Kangri, K'iche', Low German, Makuráp, Central Siberian Yupik, Western Armenian, Bengali, Javanese, Karo (Brazil), Ligurian, Neapolitan, Tatar, Xibe, Yakut, Ancient Hebrew, Cebuano, Guarani, Hittite, Madi, Emerillon, Umbrian, Abaza, Gheg Albanian, Malayalam, Nhengatu, Sinhala, Zacatlán-Ahuacatlán-Tepetzintla Nahuatl, Xavánte, and Saya
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.11, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.11, and PUB
14. Universal Dependencies 2.12
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Ackermann, Elia, Aepli, Noëmi, Aghaei, Hamid, Agić, Željko, Ahmadi, Amir, Ahrenberg, Lars, Ajede, Chika Kennedy, Akkurt, Salih Furkan, Aleksandravičiūtė, Gabrielė, Alfina, Ika, Algom, Avner, Alnajjar, Khalid, Alzetta, Chiara, Andersen, Erik, Antonsen, Lene, Aoyama, Tatsuya, Aplonova, Katya, Aquino, Angelina, Aragon, Carolina, Aranes, Glyd, Aranzabe, Maria Jesus, Arıcan, Bilge Nas, Arnardóttir, Þórunn, Arutie, Gashaw, Arwidarasti, Jessica Naraiswari, Asahara, Masayuki, Ásgeirsdóttir, Katla, Aslan, Deniz Baran, Asmazoğlu, Cengiz, Ateyah, Luma, Atmaca, Furkan, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Avelãs, Mariana, Badmaeva, Elena, Balasubramani, Keerthana, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Barkarson, Starkaður, Basile, Rodolfo, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bedir, Seyyit Talha, Behzad, Shabnam, Bengoetxea, Kepa, Benli, İbrahim, Ben Moshe, Yifat, Berk, Gözde, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Bjarnadóttir, Kristín, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Braggaar, Anouck, Branco, António, Brokaitė, Kristina, Burchardt, Aljoscha, Campos, Marisa, Candito, Marie, Caron, Bernard, Caron, Gauthier, Carvalheiro, Catarina, Carvalho, Rita, Cassidy, Lauren, Castro, Maria Clara, Castro, Sérgio, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. 
A., Čéplö, Slavomír, Cesur, Neslihan, Cetin, Savas, Çetinoğlu, Özlem, Chalub, Fabricio, Chamila, Liyanage, Chauhan, Shweta, Chi, Ethan, Chika, Taishi, Cho, Yongseok, Choi, Jinho, Chun, Jayeol, Chung, Juyeon, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Corbetta, Daniela, Costa, Francisco, Courtin, Marine, Cristescu, Mihaela, Dale, Ingerid Løyning, Daniel, Philemon, Davidson, Elizabeth, de Alencar, Leonel Figueiredo, Dehouck, Mathieu, de Laurentiis, Martina, de Marneffe, Marie-Catherine, de Paiva, Valeria, Derin, Mehmet Oguz, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dinakaramani, Arawinda, Di Nuovo, Elisa, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Doyle, Adrian, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Ebert, Christian, Eckhoff, Hanne, Eguchi, Masaki, Eiche, Sandra, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Essaidi, Farah, Etienne, Aline, Evelyn, Wograine, Facundes, Sidney, Farkas, Richárd, Favero, Federica, Ferdaousi, Jannatul, Fernanda, Marília, Fernandez Alcalde, Hector, Fethi, Amal, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Gamba, Federica, Garcia, Marcos, Gärdenfors, Moa, Gerardi, Fabrício Ferraz, Gerdes, Kim, Gessler, Luke, Ginter, Filip, Godoy, Gustavo, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grobol, Loïc, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Güngör, Tunga, Habash, Nizar, Hafsteinsson, Hinrik, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, Hanifmuti, Muhammad Yudistira, Harada, Takahiro, Hardwick, Sam, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hellwig, Oliver, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Huerta Mendez, Marivel, Hwang, Jena, Ikeda, Takumi, Ingason, Anton Karl, Ion, Radu, 
Irimia, Elena, Ishola, Ọlájídé, Islamaj, Artan, Ito, Kaoru, Jannat, Siratun, Jelínek, Tomáš, Jha, Apoorva, Jiang, Katharine, Johannsen, Anders, Jónsdóttir, Hildur, Jørgensen, Fredrik, Juutinen, Markus, Kaşıkara, Hüner, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Kara, Neslihan, Karahóǧa, Ritván, Kåsen, Andre, Kayadelen, Tolga, Kengatharaiyer, Sarveswaran, Kettnerová, Václava, Kirchner, Jesse, Klementieva, Elena, Klyachko, Elena, Köhn, Arne, Köksal, Abdullatif, Kopacewicz, Kamil, Korkiakangas, Timo, Köse, Mehmet, Koshevoy, Alexey, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Krishnamurthy, Parameswari, Kübler, Sandra, Kuqi, Adrian, Kuyrukçu, Oğuzhan, Kuzgun, Aslı, Kwak, Sookyoung, Kyle, Kris, Laippala, Veronika, Lambertino, Lorenzo, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Levina, Maria, Levine, Lauren, Li, Cheuk Ying, Li, Josie, Li, Keying, Li, Yixuan, Li, Yuan, Lim, KyungTae, Lima Padovani, Bruna, Lin, Yi-Ju Jessica, Lindén, Krister, Liu, Yang Janet, Ljubešić, Nikola, Loginova, Olga, Lusito, Stefano, Luthfi, Andry, Luukko, Mikko, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Mahamdi, Menel, Maillard, Jean, Makarchuk, Ilya, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Marşan, Büşra, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Markantonatou, Stella, Martínez Alonso, Héctor, Martín Rodríguez, Lorena, Martins, André, Martins, Cláudia, Mašek, Jan, Matsuda, Hiroshi, Matsumoto, Yuji, Mazzei, Alessandro, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Merzhevich, Tatiana, Miekka, Niko, Miller, Aaron, Mischenkova, Karina, Missilä, Anna, Mititelu, Cătălin, Mitrofan, Maria, Miyao, Yusuke, Mojiri Foroushani, AmirHossein, Molnár, Judit, Moloodi, Amirsaeid, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Moretti, Giovanni, Mori, Shinsuke, Morioka, Tomohiko, Moro, Shigeki, 
Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Nakhlé, Mariam, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nevaci, Manuela, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, Vitaly, Nitisaroj, Rattima, Nourian, Alireza, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Óladóttir, Hulda, Olúòkun, Adédayọ̀, Omura, Mai, Onwuegbuzia, Emeka, Ordan, Noam, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Özateş, Şaziye Betül, Özçelik, Merve, Özgür, Arzucan, Öztürk Başaran, Balkız, Paccosi, Teresa, Palmero Aprosio, Alessio, Panova, Anastasia, Park, Hyunji Hayley, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Paulino-Passos, Guilherme, Pedonese, Giulia, Peljak-Łapińska, Angelika, Peng, Siyao, Peng, Siyao Logan, Pereira, Rita, Pereira, Sílvia, Perez, Cenel-Augusto, Perkova, Natalia, Perrier, Guy, Petrov, Slav, Petrova, Daria, Peverelli, Andrea, Phelan, Jason, Piitulainen, Jussi, Pinter, Yuval, Pinto, Clara, Pirinen, Tommi A, Pitler, Emily, Plamada, Magdalena, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Pugh, Robert, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Querido, Andreia, Rääbis, Andriela, Rademaker, Alexandre, Rahoman, Mizanur, Rama, Taraka, Ramasamy, Loganathan, Ramos, Joana, Rashel, Fam, Rasooli, Mohammad Sadegh, Ravishankar, Vinit, Real, Livy, Rebeja, Petru, Reddy, Siva, Regnault, Mathilde, Rehm, Georg, Riabi, Arij, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rizqiyah, Putri, Rocha, Luisa, Rögnvaldsson, Eiríkur, Roksandic, Ivan, Romanenko, Mykhailo, Rosa, Rudolf, Roșca, Valentin, Rovati, Davide, Rozonoyer, Ben, Rudina, Olga, Rueter, Jack, Rúnarsson, Kristján, Sadde, Shoval, Safari, Pegah, Sahala, Aleksi, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, 
Sanguinetti, Manuela, Sanıyar, Ezgi, Särg, Dage, Sartor, Marta, Sasaki, Mitsuya, Saulīte, Baiba, Sawanakunanon, Yanin, Saxena, Shefali, Scannell, Kevin, Scarlata, Salvatore, Schneider, Nathan, Schuster, Sebastian, Schwartz, Lane, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shahzadi, Syeda, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shishkina, Yana, Shohibussirri, Muh, Shvedova, Maria, Siewert, Janine, Sigurðsson, Einar Freyr, Silva, João, Silveira, Aline, Silveira, Natalia, Silveira, Sara, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Símonarson, Haukur Barri, Simov, Kiril, Sitchinava, Dmitri, Sither, Ted, Skachedubova, Maria, Smith, Aaron, Soares-Bastos, Isabela, Solberg, Per Erik, Sonnenhauser, Barbara, Sourov, Shafi, Sprugnoli, Rachele, Stamou, Vivian, Steingrímsson, Steinþór, Stella, Antonio, Stephen, Abishek, Straka, Milan, Strickland, Emmett, Strnadová, Jana, Suhr, Alane, Sulestio, Yogi Lesmana, Sulubacak, Umut, Suzuki, Shingo, Swanson, Daniel, Szántó, Zsolt, Taguchi, Chihiro, Taji, Dima, Tamburini, Fabio, Tan, Mary Ann C., Tanaka, Takaaki, Tanaya, Dipta, Tavoni, Mirko, Tella, Samson, Tellier, Isabelle, Testori, Marinella, Thomas, Guillaume, Tonelli, Sara, Torga, Liisi, Toska, Marsida, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Türk, Utku, Tyers, Francis, Þórðarson, Sveinbjörn, Þorsteinsson, Vilhjálmur, Uematsu, Sumire, Untilov, Roman, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vagnoni, Elena, Vajjala, Sowmya, Vak, Socrates, van der Goot, Rob, Vanhove, Martine, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Vedenina, Uliana, Venturi, Giulia, Vincze, Veronika, Vlasova, Natalia, Wakasa, Aya, Wallenberg, Joel C., Wallin, Lars, Walsh, Abigail, Washington, Jonathan North, Wendt, Maximilan, Widmer, Paul, Wigderson, Shira, Wijono, Sri Hartati, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamashita, Kayo, Yamazaki, Naoki, Yan, 
Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yenice, Arife Betül, Yıldız, Olcay Taner, Yu, Zhuoran, Yuliawati, Arlisa, Žabokrtský, Zdeněk, Zahra, Shorouq, Zeldes, Amir, Zhou, He, Zhu, Hanzhi, Zhu, Yilun, Zhuravleva, Anna, and Ziane, Rayan
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Swiss German, Albanian, Icelandic, Akuntsu, Apurinã, Chukot, Khunsari, Manx, Mundurukú, Nayini, Old Turkish, Soi, South Levantine Arabic, Tupinambá, Beja, Western Frisian, Guajajára, Urubú-Kaapor, Kangri, K'iche', Low German, Makuráp, Central Siberian Yupik, Western Armenian, Bengali, Javanese, Karo (Brazil), Ligurian, Neapolitan, Tatar, Xibe, Yakut, Ancient Hebrew, Cebuano, Guarani, Hittite, Madi, Emerillon, Umbrian, Abaza, Gheg Albanian, Malayalam, Nhengatu, Sinhala, Zacatlán-Ahuacatlán-Tepetzintla Nahuatl, Xavánte, Saya, Borôro, Kirghiz, Algerian Arabic, and Old Irish (to 900)
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.12, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.12, and PUB
15. Universal Dependencies 2.6
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Ackermann, Elia, Aepli, Noëmi, Agić, Željko, Ahrenberg, Lars, Ajede, Chika Kennedy, Aleksandravičiūtė, Gabrielė, Antonsen, Lene, Aplonova, Katya, Aquino, Angelina, Aranzabe, Maria Jesus, Arutie, Gashaw, Asahara, Masayuki, Ateyah, Luma, Atmaca, Furkan, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bengoetxea, Kepa, Berzak, Yevgeni, Bhat, Irshad Ahmad, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Brokaitė, Kristina, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. A., Čéplö, Slavomír, Cetin, Savas, Chalub, Fabricio, Chi, Ethan, Choi, Jinho, Cho, Yongseok, Chun, Jayeol, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Courtin, Marine, Davidson, Elizabeth, de Marneffe, Marie-Catherine, de Paiva, Valeria, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eckhoff, Hanne, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Etienne, Aline, Evelyn, Wograine, Farkas, Richárd, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Gerdes, Kim, Ginter, Filip, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grobol, Loïc, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Güngör, 
Tunga, Habash, Nizar, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hellwig, Oliver, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Hwang, Jena, Ikeda, Takumi, Ion, Radu, Irimia, Elena, Ishola, Ọlájídé, Jelínek, Tomáš, Johannsen, Anders, Jónsdóttir, Hildur, Jørgensen, Fredrik, Juutinen, Markus, Kaşıkara, Hüner, Kaasen, Andre, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Katz, Boris, Kayadelen, Tolga, Kenney, Jessica, Kettnerová, Václava, Kirchner, Jesse, Klementieva, Elena, Köhn, Arne, Köksal, Abdullatif, Kopacewicz, Kamil, Korkiakangas, Timo, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Kwak, Sookyoung, Laippala, Veronika, Lambertino, Lorenzo, Lam, Lucia, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Levina, Maria, Li, Cheuk Ying, Li, Josie, Li, Keying, Lim, KyungTae, Li, Yuan, Ljubešić, Nikola, Loginova, Olga, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Martínez Alonso, Héctor, Martins, André, Mašek, Jan, Matsuda, Hiroshi, Matsumoto, Yuji, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Miekka, Niko, Misirpashayeva, Margarita, Missilä, Anna, Mititelu, Cătălin, Mitrofan, Maria, Miyao, Yusuke, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Mori, Keiko Sophie, Morioka, Tomohiko, Mori, Shinsuke, Moro, Shigeki, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, Vitaly, Nitisaroj, Rattima, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Olúòkun, Adédayọ̀, Omura, 
Mai, Onwuegbuzia, Emeka, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Özateş, Şaziye Betül, Özgür, Arzucan, Öztürk Başaran, Balkız, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Paulino-Passos, Guilherme, Peljak-Łapińska, Angelika, Peng, Siyao, Perez, Cenel-Augusto, Perrier, Guy, Petrova, Daria, Petrov, Slav, Phelan, Jason, Piitulainen, Jussi, Pirinen, Tommi A, Pitler, Emily, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Rääbis, Andriela, Rademaker, Alexandre, Ramasamy, Loganathan, Rama, Taraka, Ramisch, Carlos, Ravishankar, Vinit, Real, Livy, Rebeja, Petru, Reddy, Siva, Rehm, Georg, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rocha, Luisa, Romanenko, Mykhailo, Rosa, Rudolf, Roșca, Valentin, Rovati, Davide, Rudina, Olga, Rueter, Jack, Sadde, Shoval, Sagot, Benoît, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Särg, Dage, Saulīte, Baiba, Sawanakunanon, Yanin, Scarlata, Salvatore, Schneider, Nathan, Schuster, Sebastian, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shohibussirri, Muh, Sichinava, Dmitry, Silveira, Aline, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Skachedubova, Maria, Smith, Aaron, Soares-Bastos, Isabela, Spadine, Carolyn, Stella, Antonio, Straka, Milan, Strnadová, Jana, Suhr, Alane, Sulubacak, Umut, Suzuki, Shingo, Szántó, Zsolt, Taji, Dima, Takahashi, Yuta, Tamburini, Fabio, Tanaka, Takaaki, Tella, Samson, Tellier, Isabelle, Thomas, Guillaume, Torga, Liisi, Toska, Marsida, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Türk, Utku, Tyers, Francis, Uematsu, Sumire, Untilov, Roman, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vajjala, Sowmya, van Niekerk, Daniel, van Noord, 
Gertjan, Varga, Viktor, Villemonte de la Clergerie, Eric, Vincze, Veronika, Wakasa, Aya, Wallin, Lars, Walsh, Abigail, Wang, Jing Xian, Washington, Jonathan North, Wendt, Maximilan, Widmer, Paul, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamashita, Kayo, Yamazaki, Naoki, Yan, Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yu, Zhuoran, Žabokrtský, Zdeněk, Zeldes, Amir, Zhu, Hanzhi, and Zhuravleva, Anna
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Swiss German, Albanian, and Icelandic
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.6, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.6, and PUB
16. Universal Dependencies 2.6 models for UDPipe 2 (2020-08-31)
- Creator:
- Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- tokenizer, POS tagger, lemmatization, tagger, parser, and dependency parser
- Language:
- Afrikaans, Arabic, Armenian, Belarusian, Bulgarian, Catalan, Czech, Church Slavic, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Persian, Finnish, French, Old French (842-ca. 1400), Scottish Gaelic, Irish, Galician, Gothic, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Korean, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Maltese, Dutch, Norwegian Nynorsk, Norwegian Bokmål, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Telugu, Turkish, Uighur, Ukrainian, Urdu, Vietnamese, Gambian Wolof, Wolof, and Chinese
- Description:
- Tokenizer, POS tagger, lemmatizer, and parser models for 99 treebanks in 63 languages from Universal Dependencies 2.6, created solely using UD 2.6 data (https://hdl.handle.net/11234/1-3226). The model documentation, including performance, can be found at https://ufal.mff.cuni.cz/udpipe/2/models#universal_dependencies_26_models . To use these models, you need UDPipe version 2.0, which you can download from https://ufal.mff.cuni.cz/udpipe/2 .
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
17. Universal Dependencies 2.7
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Ackermann, Elia, Aepli, Noëmi, Aghaei, Hamid, Agić, Željko, Ahmadi, Amir, Ahrenberg, Lars, Ajede, Chika Kennedy, Aleksandravičiūtė, Gabrielė, Alfina, Ika, Antonsen, Lene, Aplonova, Katya, Aquino, Angelina, Aragon, Carolina, Aranzabe, Maria Jesus, Arnardóttir, Þórunn, Arutie, Gashaw, Arwidarasti, Jessica Naraiswari, Asahara, Masayuki, Ateyah, Luma, Atmaca, Furkan, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Balasubramani, Keerthana, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bedir, Seyyit Talha, Bengoetxea, Kepa, Berk, Gözde, Berzak, Yevgeni, Bhat, Irshad Ahmad, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Bjarnadóttir, Kristín, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Brokaitė, Kristina, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. 
A., Čéplö, Slavomír, Cetin, Savas, Çetinoğlu, Özlem, Chalub, Fabricio, Chi, Ethan, Cho, Yongseok, Choi, Jinho, Chun, Jayeol, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Courtin, Marine, Davidson, Elizabeth, de Marneffe, Marie-Catherine, de Paiva, Valeria, Derin, Mehmet Oguz, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dinakaramani, Arawinda, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eckhoff, Hanne, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Etienne, Aline, Evelyn, Wograine, Facundes, Sidney, Farkas, Richárd, Fernanda, Marília, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Gerardi, Fabrício Ferraz, Gerdes, Kim, Ginter, Filip, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grobol, Loïc, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Güngör, Tunga, Habash, Nizar, Hafsteinsson, Hinrik, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, Hanifmuti, Muhammad Yudistira, Hardwick, Sam, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hellwig, Oliver, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Huber, Eva, Hwang, Jena, Ikeda, Takumi, Ingason, Anton Karl, Ion, Radu, Irimia, Elena, Ishola, Ọlájídé, Jelínek, Tomáš, Johannsen, Anders, Jónsdóttir, Hildur, Jørgensen, Fredrik, Juutinen, Markus, K, Sarveswaran, Kaşıkara, Hüner, Kaasen, Andre, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Katz, Boris, Kayadelen, Tolga, Kenney, Jessica, Kettnerová, Václava, Kirchner, Jesse, Klementieva, Elena, Köhn, Arne, Köksal, Abdullatif, Kopacewicz, Kamil, Korkiakangas, Timo, Kotsyba, Natalia, Kovalevskaitė, Jolanta, 
Krek, Simon, Krishnamurthy, Parameswari, Kwak, Sookyoung, Laippala, Veronika, Lam, Lucia, Lambertino, Lorenzo, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Levina, Maria, Li, Cheuk Ying, Li, Josie, Li, Keying, Li, Yuan, Lim, KyungTae, Lindén, Krister, Ljubešić, Nikola, Loginova, Olga, Luthfi, Andry, Luukko, Mikko, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Martínez Alonso, Héctor, Martins, André, Mašek, Jan, Matsuda, Hiroshi, Matsumoto, Yuji, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Miekka, Niko, Mischenkova, Karina, Misirpashayeva, Margarita, Missilä, Anna, Mititelu, Cătălin, Mitrofan, Maria, Miyao, Yusuke, Mojiri Foroushani, AmirHossein, Moloodi, Amirsaeid, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Mori, Keiko Sophie, Mori, Shinsuke, Morioka, Tomohiko, Moro, Shigeki, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Nakhlé, Mariam, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, Vitaly, Nitisaroj, Rattima, Nourian, Alireza, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Olúòkun, Adédayọ̀, Omura, Mai, Onwuegbuzia, Emeka, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Özateş, Şaziye Betül, Özgür, Arzucan, Öztürk Başaran, Balkız, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Paulino-Passos, Guilherme, Peljak-Łapińska, Angelika, Peng, Siyao, Perez, Cenel-Augusto, Perkova, Natalia, Perrier, Guy, Petrov, Slav, Petrova, Daria, Phelan, Jason, Piitulainen, Jussi, Pirinen, Tommi A, Pitler, Emily, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, 
Prokopis, Przepiórkowski, Adam, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Rääbis, Andriela, Rademaker, Alexandre, Rama, Taraka, Ramasamy, Loganathan, Ramisch, Carlos, Rashel, Fam, Rasooli, Mohammad Sadegh, Ravishankar, Vinit, Real, Livy, Rebeja, Petru, Reddy, Siva, Rehm, Georg, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rocha, Luisa, Rögnvaldsson, Eiríkur, Romanenko, Mykhailo, Rosa, Rudolf, Roșca, Valentin, Rovati, Davide, Rudina, Olga, Rueter, Jack, Rúnarsson, Kristján, Sadde, Shoval, Safari, Pegah, Sagot, Benoît, Sahala, Aleksi, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Särg, Dage, Saulīte, Baiba, Sawanakunanon, Yanin, Scannell, Kevin, Scarlata, Salvatore, Schneider, Nathan, Schuster, Sebastian, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shohibussirri, Muh, Sichinava, Dmitry, Sigurðsson, Einar Freyr, Silveira, Aline, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Skachedubova, Maria, Smith, Aaron, Soares-Bastos, Isabela, Spadine, Carolyn, Steingrímsson, Steinþór, Stella, Antonio, Straka, Milan, Strickland, Emmett, Strnadová, Jana, Suhr, Alane, Sulestio, Yogi Lesmana, Sulubacak, Umut, Suzuki, Shingo, Szántó, Zsolt, Taji, Dima, Takahashi, Yuta, Tamburini, Fabio, Tan, Mary Ann C., Tanaka, Takaaki, Tella, Samson, Tellier, Isabelle, Thomas, Guillaume, Torga, Liisi, Toska, Marsida, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Türk, Utku, Tyers, Francis, Uematsu, Sumire, Untilov, Roman, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vajjala, Sowmya, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Villemonte de la Clergerie, Eric, Vincze, Veronika, Wakasa, Aya, Wallenberg, Joel C., Wallin, Lars, Walsh, Abigail, Wang, Jing Xian, Washington, Jonathan North, Wendt, Maximilan, Widmer, Paul, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, 
Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamashita, Kayo, Yamazaki, Naoki, Yan, Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yu, Zhuoran, Žabokrtský, Zdeněk, Zahra, Shorouq, Zeldes, Amir, Zhu, Hanzhi, and Zhuravleva, Anna
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Swiss German, Albanian, Icelandic, Akuntsu, Apurinã, Chukot, Khunsari, Manx, Mundurukú, Nayini, Old Turkish, Soi, South Levantine Arabic, and Tupinambá
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.7, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.7, and PUB
18. Universal Dependencies 2.8
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Ackermann, Elia, Aepli, Noëmi, Aghaei, Hamid, Agić, Željko, Ahmadi, Amir, Ahrenberg, Lars, Ajede, Chika Kennedy, Aleksandravičiūtė, Gabrielė, Alfina, Ika, Antonsen, Lene, Aplonova, Katya, Aquino, Angelina, Aragon, Carolina, Aranzabe, Maria Jesus, Arıcan, Bilge Nas, Arnardóttir, Þórunn, Arutie, Gashaw, Arwidarasti, Jessica Naraiswari, Asahara, Masayuki, Aslan, Deniz Baran, Ateyah, Luma, Atmaca, Furkan, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Balasubramani, Keerthana, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Barkarson, Starkaður, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bedir, Seyyit Talha, Bengoetxea, Kepa, Berk, Gözde, Berzak, Yevgeni, Bhat, Irshad Ahmad, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Bjarnadóttir, Kristín, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Braggaar, Anouck, Brokaitė, Kristina, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cassidy, Lauren, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. 
A., Čéplö, Slavomír, Cesur, Neslihan, Cetin, Savas, Çetinoğlu, Özlem, Chalub, Fabricio, Chauhan, Shweta, Chi, Ethan, Chika, Taishi, Cho, Yongseok, Choi, Jinho, Chun, Jayeol, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Courtin, Marine, Cristescu, Mihaela, Daniel, Philemon, Davidson, Elizabeth, de Marneffe, Marie-Catherine, de Paiva, Valeria, Derin, Mehmet Oguz, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dinakaramani, Arawinda, Di Nuovo, Elisa, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eckhoff, Hanne, Eiche, Sandra, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Etienne, Aline, Evelyn, Wograine, Facundes, Sidney, Farkas, Richárd, Fernanda, Marília, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Gerardi, Fabrício Ferraz, Gerdes, Kim, Ginter, Filip, Godoy, Gustavo, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grobol, Loïc, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Güngör, Tunga, Habash, Nizar, Hafsteinsson, Hinrik, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, Hanifmuti, Muhammad Yudistira, Hardwick, Sam, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hellwig, Oliver, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Huber, Eva, Hwang, Jena, Ikeda, Takumi, Ingason, Anton Karl, Ion, Radu, Irimia, Elena, Ishola, Ọlájídé, Ito, Kaoru, Jelínek, Tomáš, Jha, Apoorva, Johannsen, Anders, Jónsdóttir, Hildur, Jørgensen, Fredrik, Juutinen, Markus, K, Sarveswaran, Kaşıkara, Hüner, Kaasen, Andre, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Kara, Neslihan, Katz, Boris, Kayadelen, Tolga, Kenney, 
Jessica, Kettnerová, Václava, Kirchner, Jesse, Klementieva, Elena, Köhn, Arne, Köksal, Abdullatif, Kopacewicz, Kamil, Korkiakangas, Timo, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Krishnamurthy, Parameswari, Kuyrukçu, Oğuzhan, Kuzgun, Aslı, Kwak, Sookyoung, Laippala, Veronika, Lam, Lucia, Lambertino, Lorenzo, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Levina, Maria, Li, Cheuk Ying, Li, Josie, Li, Keying, Li, Yuan, Lim, KyungTae, Lima Padovani, Bruna, Lindén, Krister, Ljubešić, Nikola, Loginova, Olga, Luthfi, Andry, Luukko, Mikko, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Marşan, Büşra, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Martínez Alonso, Héctor, Martins, André, Mašek, Jan, Matsuda, Hiroshi, Matsumoto, Yuji, Mazzei, Alessandro, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Miekka, Niko, Mischenkova, Karina, Misirpashayeva, Margarita, Missilä, Anna, Mititelu, Cătălin, Mitrofan, Maria, Miyao, Yusuke, Mojiri Foroushani, AmirHossein, Molnár, Judit, Moloodi, Amirsaeid, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Moretti, Giovanni, Mori, Keiko Sophie, Mori, Shinsuke, Morioka, Tomohiko, Moro, Shigeki, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Nakhlé, Mariam, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nevaci, Manuela, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, Vitaly, Nitisaroj, Rattima, Nourian, Alireza, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Olúòkun, Adédayọ̀, Omura, Mai, Onwuegbuzia, Emeka, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Özateş, Şaziye Betül, Özçelik, Merve, Özgür, Arzucan, Öztürk Başaran, Balkız, Park, Hyunji Hayley, Partanen, Niko, Pascual, Elena, Passarotti, Marco, 
Patejuk, Agnieszka, Paulino-Passos, Guilherme, Peljak-Łapińska, Angelika, Peng, Siyao, Perez, Cenel-Augusto, Perkova, Natalia, Perrier, Guy, Petrov, Slav, Petrova, Daria, Phelan, Jason, Piitulainen, Jussi, Pirinen, Tommi A, Pitler, Emily, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Rääbis, Andriela, Rademaker, Alexandre, Rama, Taraka, Ramasamy, Loganathan, Ramisch, Carlos, Rashel, Fam, Rasooli, Mohammad Sadegh, Ravishankar, Vinit, Real, Livy, Rebeja, Petru, Reddy, Siva, Rehm, Georg, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rocha, Luisa, Rögnvaldsson, Eiríkur, Romanenko, Mykhailo, Rosa, Rudolf, Roșca, Valentin, Rovati, Davide, Rudina, Olga, Rueter, Jack, Rúnarsson, Kristján, Sadde, Shoval, Safari, Pegah, Sagot, Benoît, Sahala, Aleksi, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Sanıyar, Ezgi, Särg, Dage, Saulīte, Baiba, Sawanakunanon, Yanin, Saxena, Shefali, Scannell, Kevin, Scarlata, Salvatore, Schneider, Nathan, Schuster, Sebastian, Schwartz, Lane, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shishkina, Yana, Shohibussirri, Muh, Sichinava, Dmitry, Siewert, Janine, Sigurðsson, Einar Freyr, Silveira, Aline, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Skachedubova, Maria, Smith, Aaron, Soares-Bastos, Isabela, Spadine, Carolyn, Sprugnoli, Rachele, Steingrímsson, Steinþór, Stella, Antonio, Straka, Milan, Strickland, Emmett, Strnadová, Jana, Suhr, Alane, Sulestio, Yogi Lesmana, Sulubacak, Umut, Suzuki, Shingo, Szántó, Zsolt, Taji, Dima, Takahashi, Yuta, Tamburini, Fabio, Tan, Mary Ann C., Tanaka, Takaaki, Tella, Samson, Tellier, Isabelle, Testori, Marinella, Thomas, Guillaume, Torga, Liisi, Toska, Marsida, Trosterud, Trond, Trukhina, 
Anna, Tsarfaty, Reut, Türk, Utku, Tyers, Francis, Uematsu, Sumire, Untilov, Roman, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vajjala, Sowmya, van der Goot, Rob, Vanhove, Martine, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Villemonte de la Clergerie, Eric, Vincze, Veronika, Vlasova, Natalia, Wakasa, Aya, Wallenberg, Joel C., Wallin, Lars, Walsh, Abigail, Wang, Jing Xian, Washington, Jonathan North, Wendt, Maximilan, Widmer, Paul, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamashita, Kayo, Yamazaki, Naoki, Yan, Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yenice, Arife Betül, Yıldız, Olcay Taner, Yu, Zhuoran, Žabokrtský, Zdeněk, Zahra, Shorouq, Zeldes, Amir, Zhu, Hanzhi, Zhuravleva, Anna, and Ziane, Rayan
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Swiss German, Albanian, Icelandic, Akuntsu, Apurinã, Chukot, Khunsari, Manx, Mundurukú, Nayini, Old Turkish, Soi, South Levantine Arabic, Tupinambá, Beja, Western Frisian, Guajajára, Urubú-Kaapor, Kangri, K'iche', Low German, Makuráp, Central Siberian Yupik, and Western Armenian
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
- Rights:
- Licence Universal Dependencies v2.8, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.8, and PUB
19. Universal Dependencies 2.8.1
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Ackermann, Elia, Aepli, Noëmi, Aghaei, Hamid, Agić, Željko, Ahmadi, Amir, Ahrenberg, Lars, Ajede, Chika Kennedy, Aleksandravičiūtė, Gabrielė, Alfina, Ika, Antonsen, Lene, Aplonova, Katya, Aquino, Angelina, Aragon, Carolina, Aranzabe, Maria Jesus, Arıcan, Bilge Nas, Arnardóttir, Þórunn, Arutie, Gashaw, Arwidarasti, Jessica Naraiswari, Asahara, Masayuki, Aslan, Deniz Baran, Ateyah, Luma, Atmaca, Furkan, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Balasubramani, Keerthana, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Barkarson, Starkaður, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bedir, Seyyit Talha, Bengoetxea, Kepa, Berk, Gözde, Berzak, Yevgeni, Bhat, Irshad Ahmad, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Bjarnadóttir, Kristín, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Braggaar, Anouck, Brokaitė, Kristina, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cassidy, Lauren, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. 
A., Čéplö, Slavomír, Cesur, Neslihan, Cetin, Savas, Çetinoğlu, Özlem, Chalub, Fabricio, Chauhan, Shweta, Chi, Ethan, Chika, Taishi, Cho, Yongseok, Choi, Jinho, Chun, Jayeol, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Courtin, Marine, Cristescu, Mihaela, Daniel, Philemon, Davidson, Elizabeth, de Marneffe, Marie-Catherine, de Paiva, Valeria, Derin, Mehmet Oguz, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dinakaramani, Arawinda, Di Nuovo, Elisa, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eckhoff, Hanne, Eiche, Sandra, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Etienne, Aline, Evelyn, Wograine, Facundes, Sidney, Farkas, Richárd, Fernanda, Marília, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Gerardi, Fabrício Ferraz, Gerdes, Kim, Ginter, Filip, Godoy, Gustavo, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grobol, Loïc, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Güngör, Tunga, Habash, Nizar, Hafsteinsson, Hinrik, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, Hanifmuti, Muhammad Yudistira, Hardwick, Sam, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hellwig, Oliver, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Huber, Eva, Hwang, Jena, Ikeda, Takumi, Ingason, Anton Karl, Ion, Radu, Irimia, Elena, Ishola, Ọlájídé, Ito, Kaoru, Jelínek, Tomáš, Jha, Apoorva, Johannsen, Anders, Jónsdóttir, Hildur, Jørgensen, Fredrik, Juutinen, Markus, K, Sarveswaran, Kaşıkara, Hüner, Kaasen, Andre, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Kara, Neslihan, Katz, Boris, Kayadelen, Tolga, Kenney, 
Jessica, Kettnerová, Václava, Kirchner, Jesse, Klementieva, Elena, Köhn, Arne, Köksal, Abdullatif, Kopacewicz, Kamil, Korkiakangas, Timo, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Krishnamurthy, Parameswari, Kuyrukçu, Oğuzhan, Kuzgun, Aslı, Kwak, Sookyoung, Laippala, Veronika, Lam, Lucia, Lambertino, Lorenzo, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Levina, Maria, Li, Cheuk Ying, Li, Josie, Li, Keying, Li, Yuan, Lim, KyungTae, Lima Padovani, Bruna, Lindén, Krister, Ljubešić, Nikola, Loginova, Olga, Luthfi, Andry, Luukko, Mikko, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Marşan, Büşra, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Martínez Alonso, Héctor, Martins, André, Mašek, Jan, Matsuda, Hiroshi, Matsumoto, Yuji, Mazzei, Alessandro, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Miekka, Niko, Mischenkova, Karina, Misirpashayeva, Margarita, Missilä, Anna, Mititelu, Cătălin, Mitrofan, Maria, Miyao, Yusuke, Mojiri Foroushani, AmirHossein, Molnár, Judit, Moloodi, Amirsaeid, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Moretti, Giovanni, Mori, Keiko Sophie, Mori, Shinsuke, Morioka, Tomohiko, Moro, Shigeki, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Nakhlé, Mariam, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nevaci, Manuela, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, Vitaly, Nitisaroj, Rattima, Nourian, Alireza, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Olúòkun, Adédayọ̀, Omura, Mai, Onwuegbuzia, Emeka, Osenova, Petya, Östling, Robert, Øvrelid, Lilja, Özateş, Şaziye Betül, Özçelik, Merve, Özgür, Arzucan, Öztürk Başaran, Balkız, Park, Hyunji Hayley, Partanen, Niko, Pascual, Elena, Passarotti, Marco, 
Patejuk, Agnieszka, Paulino-Passos, Guilherme, Peljak-Łapińska, Angelika, Peng, Siyao, Perez, Cenel-Augusto, Perkova, Natalia, Perrier, Guy, Petrov, Slav, Petrova, Daria, Phelan, Jason, Piitulainen, Jussi, Pirinen, Tommi A, Pitler, Emily, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Rääbis, Andriela, Rademaker, Alexandre, Rama, Taraka, Ramasamy, Loganathan, Ramisch, Carlos, Rashel, Fam, Rasooli, Mohammad Sadegh, Ravishankar, Vinit, Real, Livy, Rebeja, Petru, Reddy, Siva, Rehm, Georg, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rocha, Luisa, Rögnvaldsson, Eiríkur, Romanenko, Mykhailo, Rosa, Rudolf, Roșca, Valentin, Rovati, Davide, Rudina, Olga, Rueter, Jack, Rúnarsson, Kristján, Sadde, Shoval, Safari, Pegah, Sagot, Benoît, Sahala, Aleksi, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Sanıyar, Ezgi, Särg, Dage, Saulīte, Baiba, Sawanakunanon, Yanin, Saxena, Shefali, Scannell, Kevin, Scarlata, Salvatore, Schneider, Nathan, Schuster, Sebastian, Schwartz, Lane, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shishkina, Yana, Shohibussirri, Muh, Sichinava, Dmitry, Siewert, Janine, Sigurðsson, Einar Freyr, Silveira, Aline, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Skachedubova, Maria, Smith, Aaron, Soares-Bastos, Isabela, Spadine, Carolyn, Sprugnoli, Rachele, Steingrímsson, Steinþór, Stella, Antonio, Straka, Milan, Strickland, Emmett, Strnadová, Jana, Suhr, Alane, Sulestio, Yogi Lesmana, Sulubacak, Umut, Suzuki, Shingo, Szántó, Zsolt, Taji, Dima, Takahashi, Yuta, Tamburini, Fabio, Tan, Mary Ann C., Tanaka, Takaaki, Tella, Samson, Tellier, Isabelle, Testori, Marinella, Thomas, Guillaume, Torga, Liisi, Toska, Marsida, Trosterud, Trond, Trukhina, 
Anna, Tsarfaty, Reut, Türk, Utku, Tyers, Francis, Uematsu, Sumire, Untilov, Roman, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vajjala, Sowmya, van der Goot, Rob, Vanhove, Martine, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Villemonte de la Clergerie, Eric, Vincze, Veronika, Vlasova, Natalia, Wakasa, Aya, Wallenberg, Joel C., Wallin, Lars, Walsh, Abigail, Wang, Jing Xian, Washington, Jonathan North, Wendt, Maximilan, Widmer, Paul, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamashita, Kayo, Yamazaki, Naoki, Yan, Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yenice, Arife Betül, Yıldız, Olcay Taner, Yu, Zhuoran, Žabokrtský, Zdeněk, Zahra, Shorouq, Zeldes, Amir, Zhu, Hanzhi, Zhuravleva, Anna, and Ziane, Rayan
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Swiss German, Albanian, Icelandic, Akuntsu, Apurinã, Chukot, Khunsari, Manx, Mundurukú, Nayini, Old Turkish, Soi, South Levantine Arabic, Tupinambá, Beja, Western Frisian, Guajajára, Urubú-Kaapor, Kangri, K'iche', Low German, Makuráp, Central Siberian Yupik, and Western Armenian
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008). Version 2.8.1 fixes a bug in 2.8 where a portion of the Dutch Alpino treebank was accidentally omitted.
- Rights:
- Licence Universal Dependencies v2.8, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.8, and PUB
20. Universal Dependencies 2.9
- Creator:
- Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell, Ackermann, Elia, Aepli, Noëmi, Aghaei, Hamid, Agić, Željko, Ahmadi, Amir, Ahrenberg, Lars, Ajede, Chika Kennedy, Aleksandravičiūtė, Gabrielė, Alfina, Ika, Antonsen, Lene, Aplonova, Katya, Aquino, Angelina, Aragon, Carolina, Aranzabe, Maria Jesus, Arıcan, Bilge Nas, Arnardóttir, Þórunn, Arutie, Gashaw, Arwidarasti, Jessica Naraiswari, Asahara, Masayuki, Aslan, Deniz Baran, Ateyah, Luma, Atmaca, Furkan, Attia, Mohammed, Atutxa, Aitziber, Augustinus, Liesbeth, Badmaeva, Elena, Balasubramani, Keerthana, Ballesteros, Miguel, Banerjee, Esha, Bank, Sebastian, Barbu Mititelu, Verginica, Barkarson, Starkaður, Basile, Rodolfo, Basmov, Victoria, Batchelor, Colin, Bauer, John, Bedir, Seyyit Talha, Bengoetxea, Kepa, Berk, Gözde, Berzak, Yevgeni, Bhat, Irshad Ahmad, Bhat, Riyaz Ahmad, Biagetti, Erica, Bick, Eckhard, Bielinskienė, Agnė, Bjarnadóttir, Kristín, Blokland, Rogier, Bobicev, Victoria, Boizou, Loïc, Borges Völker, Emanuel, Börstell, Carl, Bosco, Cristina, Bouma, Gosse, Bowman, Sam, Boyd, Adriane, Braggaar, Anouck, Brokaitė, Kristina, Burchardt, Aljoscha, Candito, Marie, Caron, Bernard, Caron, Gauthier, Cassidy, Lauren, Cavalcanti, Tatiana, Cebiroğlu Eryiğit, Gülşen, Cecchini, Flavio Massimiliano, Celano, Giuseppe G. 
A., Čéplö, Slavomír, Cesur, Neslihan, Cetin, Savas, Çetinoğlu, Özlem, Chalub, Fabricio, Chauhan, Shweta, Chi, Ethan, Chika, Taishi, Cho, Yongseok, Choi, Jinho, Chun, Jayeol, Chung, Juyeon, Cignarella, Alessandra T., Cinková, Silvie, Collomb, Aurélie, Çöltekin, Çağrı, Connor, Miriam, Courtin, Marine, Cristescu, Mihaela, Daniel, Philemon, Davidson, Elizabeth, de Marneffe, Marie-Catherine, de Paiva, Valeria, Derin, Mehmet Oguz, de Souza, Elvis, Diaz de Ilarraza, Arantza, Dickerson, Carly, Dinakaramani, Arawinda, Di Nuovo, Elisa, Dione, Bamba, Dirix, Peter, Dobrovoljc, Kaja, Dozat, Timothy, Droganova, Kira, Dwivedi, Puneet, Eckhoff, Hanne, Eiche, Sandra, Eli, Marhaba, Elkahky, Ali, Ephrem, Binyam, Erina, Olga, Erjavec, Tomaž, Etienne, Aline, Evelyn, Wograine, Facundes, Sidney, Farkas, Richárd, Ferdaousi, Jannatul, Fernanda, Marília, Fernandez Alcalde, Hector, Foster, Jennifer, Freitas, Cláudia, Fujita, Kazunori, Gajdošová, Katarína, Galbraith, Daniel, Garcia, Marcos, Gärdenfors, Moa, Garza, Sebastian, Gerardi, Fabrício Ferraz, Gerdes, Kim, Ginter, Filip, Godoy, Gustavo, Goenaga, Iakes, Gojenola, Koldo, Gökırmak, Memduh, Goldberg, Yoav, Gómez Guinovart, Xavier, González Saavedra, Berta, Griciūtė, Bernadeta, Grioni, Matias, Grobol, Loïc, Grūzītis, Normunds, Guillaume, Bruno, Guillot-Barbance, Céline, Güngör, Tunga, Habash, Nizar, Hafsteinsson, Hinrik, Hajič, Jan, Hajič jr., Jan, Hämäläinen, Mika, Hà Mỹ, Linh, Han, Na-Rae, Hanifmuti, Muhammad Yudistira, Hardwick, Sam, Harris, Kim, Haug, Dag, Heinecke, Johannes, Hellwig, Oliver, Hennig, Felix, Hladká, Barbora, Hlaváčová, Jaroslava, Hociung, Florinel, Hohle, Petter, Huber, Eva, Hwang, Jena, Ikeda, Takumi, Ingason, Anton Karl, Ion, Radu, Irimia, Elena, Ishola, Ọlájídé, Ito, Kaoru, Jannat, Siratun, Jelínek, Tomáš, Jha, Apoorva, Johannsen, Anders, Jónsdóttir, Hildur, Jørgensen, Fredrik, Juutinen, Markus, K, Sarveswaran, Kaşıkara, Hüner, Kaasen, Andre, Kabaeva, Nadezhda, Kahane, Sylvain, Kanayama, Hiroshi, Kanerva, Jenna, Kara, 
Neslihan, Katz, Boris, Kayadelen, Tolga, Kenney, Jessica, Kettnerová, Václava, Kirchner, Jesse, Klementieva, Elena, Klyachko, Elena, Köhn, Arne, Köksal, Abdullatif, Kopacewicz, Kamil, Korkiakangas, Timo, Köse, Mehmet, Kotsyba, Natalia, Kovalevskaitė, Jolanta, Krek, Simon, Krishnamurthy, Parameswari, Kübler, Sandra, Kuyrukçu, Oğuzhan, Kuzgun, Aslı, Kwak, Sookyoung, Laippala, Veronika, Lam, Lucia, Lambertino, Lorenzo, Lando, Tatiana, Larasati, Septina Dian, Lavrentiev, Alexei, Lee, John, Lê Hồng, Phương, Lenci, Alessandro, Lertpradit, Saran, Leung, Herman, Levina, Maria, Li, Cheuk Ying, Li, Josie, Li, Keying, Li, Yuan, Lim, KyungTae, Lima Padovani, Bruna, Lindén, Krister, Ljubešić, Nikola, Loginova, Olga, Lusito, Stefano, Luthfi, Andry, Luukko, Mikko, Lyashevskaya, Olga, Lynn, Teresa, Macketanz, Vivien, Mahamdi, Menel, Maillard, Jean, Makazhanov, Aibek, Mandl, Michael, Manning, Christopher, Manurung, Ruli, Marşan, Büşra, Mărănduc, Cătălina, Mareček, David, Marheinecke, Katrin, Martínez Alonso, Héctor, Martín-Rodríguez, Lorena, Martins, André, Mašek, Jan, Matsuda, Hiroshi, Matsumoto, Yuji, Mazzei, Alessandro, McDonald, Ryan, McGuinness, Sarah, Mendonça, Gustavo, Merzhevich, Tatiana, Miekka, Niko, Mischenkova, Karina, Misirpashayeva, Margarita, Missilä, Anna, Mititelu, Cătălin, Mitrofan, Maria, Miyao, Yusuke, Mojiri Foroushani, AmirHossein, Molnár, Judit, Moloodi, Amirsaeid, Montemagni, Simonetta, More, Amir, Moreno Romero, Laura, Moretti, Giovanni, Mori, Keiko Sophie, Mori, Shinsuke, Morioka, Tomohiko, Moro, Shigeki, Mortensen, Bjartur, Moskalevskyi, Bohdan, Muischnek, Kadri, Munro, Robert, Murawaki, Yugo, Müürisep, Kaili, Nainwani, Pinkey, Nakhlé, Mariam, Navarro Horñiacek, Juan Ignacio, Nedoluzhko, Anna, Nešpore-Bērzkalne, Gunta, Nevaci, Manuela, Nguyễn Thị, Lương, Nguyễn Thị Minh, Huyền, Nikaido, Yoshihiro, Nikolaev, Vitaly, Nitisaroj, Rattima, Nourian, Alireza, Nurmi, Hanna, Ojala, Stina, Ojha, Atul Kr., Olúòkun, Adédayọ̀, Omura, Mai, Onwuegbuzia, Emeka, Osenova, 
Petya, Östling, Robert, Øvrelid, Lilja, Özateş, Şaziye Betül, Özçelik, Merve, Özgür, Arzucan, Öztürk Başaran, Balkız, Park, Hyunji Hayley, Partanen, Niko, Pascual, Elena, Passarotti, Marco, Patejuk, Agnieszka, Paulino-Passos, Guilherme, Peljak-Łapińska, Angelika, Peng, Siyao, Perez, Cenel-Augusto, Perkova, Natalia, Perrier, Guy, Petrov, Slav, Petrova, Daria, Phelan, Jason, Piitulainen, Jussi, Pirinen, Tommi A, Pitler, Emily, Plank, Barbara, Poibeau, Thierry, Ponomareva, Larisa, Popel, Martin, Pretkalniņa, Lauma, Prévost, Sophie, Prokopidis, Prokopis, Przepiórkowski, Adam, Puolakainen, Tiina, Pyysalo, Sampo, Qi, Peng, Rääbis, Andriela, Rademaker, Alexandre, Rahoman, Mizanur, Rama, Taraka, Ramasamy, Loganathan, Ramisch, Carlos, Rashel, Fam, Rasooli, Mohammad Sadegh, Ravishankar, Vinit, Real, Livy, Rebeja, Petru, Reddy, Siva, Regnault, Mathilde, Rehm, Georg, Riabov, Ivan, Rießler, Michael, Rimkutė, Erika, Rinaldi, Larissa, Rituma, Laura, Rizqiyah, Putri, Rocha, Luisa, Rögnvaldsson, Eiríkur, Romanenko, Mykhailo, Rosa, Rudolf, Roșca, Valentin, Rovati, Davide, Rudina, Olga, Rueter, Jack, Rúnarsson, Kristján, Sadde, Shoval, Safari, Pegah, Sagot, Benoît, Sahala, Aleksi, Saleh, Shadi, Salomoni, Alessio, Samardžić, Tanja, Samson, Stephanie, Sanguinetti, Manuela, Sanıyar, Ezgi, Särg, Dage, Saulīte, Baiba, Sawanakunanon, Yanin, Saxena, Shefali, Scannell, Kevin, Scarlata, Salvatore, Schneider, Nathan, Schuster, Sebastian, Schwartz, Lane, Seddah, Djamé, Seeker, Wolfgang, Seraji, Mojgan, Shahzadi, Syeda, Shen, Mo, Shimada, Atsuko, Shirasu, Hiroyuki, Shishkina, Yana, Shohibussirri, Muh, Sichinava, Dmitry, Siewert, Janine, Sigurðsson, Einar Freyr, Silveira, Aline, Silveira, Natalia, Simi, Maria, Simionescu, Radu, Simkó, Katalin, Šimková, Mária, Simov, Kiril, Skachedubova, Maria, Smith, Aaron, Soares-Bastos, Isabela, Sourov, Shafi, Spadine, Carolyn, Sprugnoli, Rachele, Steingrímsson, Steinþór, Stella, Antonio, Straka, Milan, Strickland, Emmett, Strnadová, Jana, Suhr, Alane, 
Sulestio, Yogi Lesmana, Sulubacak, Umut, Suzuki, Shingo, Szántó, Zsolt, Taguchi, Chihiro, Taji, Dima, Takahashi, Yuta, Tamburini, Fabio, Tan, Mary Ann C., Tanaka, Takaaki, Tanaya, Dipta, Tella, Samson, Tellier, Isabelle, Testori, Marinella, Thomas, Guillaume, Torga, Liisi, Toska, Marsida, Trosterud, Trond, Trukhina, Anna, Tsarfaty, Reut, Türk, Utku, Tyers, Francis, Uematsu, Sumire, Untilov, Roman, Urešová, Zdeňka, Uria, Larraitz, Uszkoreit, Hans, Utka, Andrius, Vajjala, Sowmya, van der Goot, Rob, Vanhove, Martine, van Niekerk, Daniel, van Noord, Gertjan, Varga, Viktor, Villemonte de la Clergerie, Eric, Vincze, Veronika, Vlasova, Natalia, Wakasa, Aya, Wallenberg, Joel C., Wallin, Lars, Walsh, Abigail, Wang, Jing Xian, Washington, Jonathan North, Wendt, Maximilan, Widmer, Paul, Wijono, Sri Hartati, Williams, Seyi, Wirén, Mats, Wittern, Christian, Woldemariam, Tsegay, Wong, Tak-sum, Wróblewska, Alina, Yako, Mary, Yamashita, Kayo, Yamazaki, Naoki, Yan, Chunxiao, Yasuoka, Koichi, Yavrumyan, Marat M., Yenice, Arife Betül, Yıldız, Olcay Taner, Yu, Zhuoran, Yuliawati, Arlisa, Žabokrtský, Zdeněk, Zahra, Shorouq, Zeldes, Amir, Zhou, He, Zhu, Hanzhi, Zhuravleva, Anna, and Ziane, Rayan
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, morphology, harmonized annotation, interset, universal tagset, and stanford dependencies
- Language:
- Ancient Greek (to 1453), Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Modern Greek (1453-), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Church Slavic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil, Catalan, Chinese, Galician, Kazakh, Latvian, Russian, Turkish, Coptic, Sanskrit, Slovak, Ukrainian, Uighur, Vietnamese, Belarusian, Korean, Lithuanian, Urdu, Russia Buriat, Northern Kurdish, Northern Sami, Upper Sorbian, Afrikaans, Yue Chinese, Marathi, Serbian, Swedish Sign Language, Telugu, Amharic, Armenian, Breton, Faroese, Komi-Zyrian, Nigerian Pidgin, Old French (842-ca. 1400), Tagalog, Thai, Warlpiri, Yoruba, Akkadian, Bambara, Erzya, Maltese, Welsh, Wolof, Assyrian Neo-Aramaic, Literary Chinese, Old Russian, Karelian, Mbyá Guaraní, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Swiss German, Albanian, Icelandic, Akuntsu, Apurinã, Chukot, Khunsari, Manx, Mundurukú, Nayini, Old Turkish, Soi, South Levantine Arabic, Tupinambá, Beja, Western Frisian, Guajajára, Urubú-Kaapor, Kangri, K'iche', Low German, Makuráp, Central Siberian Yupik, Western Armenian, Bengali, Javanese, Karo (Brazil), Ligurian, Neapolitan, Tatar, Xibe, and Yakut
- Description:
- Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008). Version 2.8.1 fixes a bug in 2.8 where a portion of the Dutch Alpino treebank was accidentally omitted.
- Rights:
- Licence Universal Dependencies v2.9, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.9, and PUB
21. Universal Derivations v0.5
- Creator:
- Kyjánek, Lukáš, Žabokrtský, Zdeněk, Vidra, Jonáš, and Ševčíková, Magda
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text, lexicon, and lexicalConceptualResource
- Subject:
- universal derivations, uder, word-formation, derivation, derivational morphology, and lexical network
- Language:
- Czech, English, Estonian, Finnish, French, German, Latin, Persian, Polish, Portuguese, and Spanish
- Description:
- Universal Derivations (UDer) is a collection of harmonized lexical networks capturing word-formation, especially derivational relations, in a cross-linguistically consistent annotation scheme for many languages. The annotation scheme is based on a rooted tree data structure, in which nodes correspond to lexemes, while edges represent derivational relations or compounding. The current version of the UDer collection contains eleven harmonized resources covering eleven different languages.
- Rights:
- Universal Derivations v0.5 License Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UDer-0.5, and PUB
22. Universal Derivations v1.0
- Creator:
- Kyjánek, Lukáš, Žabokrtský, Zdeněk, Vidra, Jonáš, and Ševčíková, Magda
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text, lexicon, and lexicalConceptualResource
- Subject:
- universal derivations, uder, word-formation, derivation, derivational morphology, lexical network, and harmonization
- Language:
- Czech, English, Estonian, Finnish, German, French, Latin, Persian, Polish, Portuguese, Spanish, Catalan, Turkish, Scottish Gaelic, Russian, Swedish, Serbo-Croatian, Italian, Dutch, and Croatian
- Description:
- Universal Derivations (UDer) is a collection of harmonized lexical networks capturing word-formation, especially derivational relations, in a cross-linguistically consistent annotation scheme for many languages. The annotation scheme is based on a rooted tree data structure, in which nodes correspond to lexemes, while edges represent derivational relations or compounding. The current version of the UDer collection contains twenty-seven harmonized resources covering twenty different languages.
- Rights:
- Universal Derivations v1.0 License Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UDer-1.0, and PUB
23. Universal Derivations v1.1
- Creator:
- Kyjánek, Lukáš, Žabokrtský, Zdeněk, Vidra, Jonáš, and Ševčíková, Magda
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- lexicon, text, and lexicalConceptualResource
- Subject:
- universal derivations, uder, word-formation, derivation, derivational morphology, lexical network, and harmonization
- Language:
- Czech, English, Estonian, Finnish, German, French, Latin, Persian, Polish, Portuguese, Spanish, Catalan, Turkish, Scottish Gaelic, Russian, Swedish, Serbo-Croatian, Italian, Dutch, Croatian, and Slovenian
- Description:
- Universal Derivations (UDer) is a collection of harmonized lexical networks capturing word-formation, especially derivational relations, in a cross-linguistically consistent annotation scheme for many languages. The annotation scheme is based on a rooted tree data structure, in which nodes correspond to lexemes, while edges represent derivational relations or compounding. The current version of the UDer collection contains thirty-one harmonized resources covering twenty-one different languages.
- Rights:
- Universal Derivations v1.1 License Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UDer-1.1, and PUB
24. Universal Segmentations 1.0 (UniSegments 1.0)
- Creator:
- Žabokrtský, Zdeněk, Bafna, Nyati, Bodnár, Jan, Kyjánek, Lukáš, Svoboda, Emil, Ševčíková, Magda, Vidra, Jonáš, Angle, Sachi, Ansari, Ebrahim, Arkhangelskiy, Timofey, Batsuren, Khuyagbaatar, Bella, Gábor, Bertinetto, Pier Marco, Bonami, Olivier, Celata, Chiara, Daniel, Michael, Fedorenko, Alexei, Filko, Matea, Giunchiglia, Fausto, Haghdoost, Hamid, Hathout, Nabil, Khomchenkova, Irina, Khurshudyan, Victoria, Levonian, Dmitri, Litta, Eleonora, Medvedeva, Maria, Muralikrishna, S. N., Namer, Fiammetta, Nikravesh, Mahshid, Padó, Sebastian, Passarotti, Marco, Plungian, Vladimir, Polyakov, Alexey, Potapov, Mihail, Pruthwik, Mishra, Rao B, Ashwath, Rubakov, Sergei, Samar, Husain, Sharma, Dipti Misra, Šnajder, Jan, Šojat, Krešimir, Štefanec, Vanja, Talamo, Luigi, Tribout, Delphine, Vodolazsky, Daniil, Vydrin, Arseniy, Zakirova, Aigul, and Zeller, Britta
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text, lexicon, and lexicalConceptualResource
- Subject:
- universal segmentations, morphological segmentation, word segmentation, segmentation, morphology, morphemes, morphological dictionary, unisegments, morph, and multilingual
- Language:
- Czech, Catalan, German, English, Persian, Finnish, French, Serbo-Croatian, Croatian, Hungarian, Italian, Komi-Zyrian, Latin, Moksha, Mari (Russia), Mongolian, Erzya, Polish, Portuguese, Russian, Spanish, Swedish, Tajik, Udmurt, Armenian, Bengali, Hindi, Malayalam, Marathi, and Kannada
- Description:
- Universal Segmentations (UniSegments) is a collection of lexical resources capturing morphological segmentations harmonised into a cross-linguistically consistent annotation scheme for many languages. The annotation scheme consists of simple tab-separated columns that store a word and its morphological segmentations, together with information about the word and the segmented units, e.g., part-of-speech categories and types of morphs/morphemes. The current public version of the collection contains 38 harmonised segmentation datasets covering 30 different languages.
- Rights:
- Universal Segmentations 1.0 License Terms, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-unisegs-1.0, and PUB