Search Results
Creator:
Straka, Milan
Publisher:
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:
text , mlmodel , and languageDescription
Subject:
CoNLL 2017 , tokenizer , POS tagger , lemmatization , tagger , parser , dependency parser , morphology , and treebank
Language:
Multiple languages
Description:
Baseline UDPipe models for CoNLL 2017 Shared Task in UD Parsing, and supplementary material.
The models require UDPipe version at least 1.1 and are evaluated using the official evaluation script.
The models are trained on a slightly different split of the official UD 2.0 CoNLL 2017 training data, the so-called baselinemodel split, in order to allow comparison of models even during the shared task. This baselinemodel split of the UD 2.0 CoNLL 2017 training data is available for download.
Furthermore, we also provide UD 2.0 CoNLL 2017 training data with automatically predicted morphology. We utilize the baseline models on development data and perform 10-fold jack-knifing (each fold is predicted with a model trained on the rest of the folds) on the training data.
Finally, we supply all required data and hyperparameter values needed to replicate the baseline models.
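The 10-fold jack-knifing described above can be illustrated with a minimal Python sketch. It is not the code used to produce the released data: the `jackknife` function, its `train` and `predict` callbacks, and the round-robin assignment of sentences to folds are all assumptions made for illustration (the record does not say how the folds were formed). The only property it is meant to show is that every training sentence is annotated by a model trained on the other folds.

```python
from typing import Callable, List, Sequence, TypeVar

S = TypeVar("S")  # one sentence, e.g. a CoNLL-U block
M = TypeVar("M")  # a trained model (hypothetical type)


def jackknife(
    sentences: Sequence[S],
    train: Callable[[List[S]], M],
    predict: Callable[[M, S], S],
    folds: int = 10,
) -> List[S]:
    """Annotate each sentence with a model trained on the other folds."""
    predicted: List[S] = [None] * len(sentences)
    for k in range(folds):
        # Assumption: round-robin folds, i.e. sentence i belongs to fold i % folds.
        held_out = range(k, len(sentences), folds)
        held_out_set = set(held_out)
        rest = [s for i, s in enumerate(sentences) if i not in held_out_set]
        model = train(rest)  # the model never sees the held-out fold
        for i in held_out:
            predicted[i] = predict(model, sentences[i])
    return predicted
```

In practice, `train` and `predict` would wrap a tagger such as UDPipe; here they are left as callbacks so that the fold logic stays independent of any particular tool.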
Rights:
Licence Universal Dependencies v2.0 , https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.0 , and PUB
Creator:
Straka, Milan
Publisher:
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:
text , mlmodel , and languageDescription
Subject:
CoNLL 2018 , tokenizer , POS tagger , lemmatization , tagger , parser , dependency parser , morphology , and treebank
Language:
Multiple languages
Description:
Baseline UDPipe models for CoNLL 2018 Shared Task in UD Parsing, and supplementary material.
The models require UDPipe version at least 1.2 and are evaluated using the official evaluation script. The models were trained using a custom data split for treebanks where no development data is provided. Also, we trained an additional "Mixed" model, which uses 200 sentences from each training set. All information needed to replicate the model training (hyperparameters, modified train-dev split, and pre-computed word embeddings for the parser) is included in the archive.
Additionally, we provide UD 2.2 CoNLL 2018 training data with automatically predicted morphology. We utilize the baseline models on development data and perform 10-fold jack-knifing (each fold is predicted with a model trained on the rest of the folds) on the training data.
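One way to assemble training data for a "Mixed" model like the one mentioned above is sketched below in Python. This is an illustration only: the record does not say which 200 sentences were taken from each training set, so taking the first 200 is an assumption, and the helper names `split_conllu` and `build_mixed` are hypothetical. The sketch relies only on the CoNLL-U convention that sentences are separated by blank lines.

```python
from typing import Dict, List


def split_conllu(text: str) -> List[str]:
    """Split CoNLL-U text into sentence blocks (separated by blank lines)."""
    blocks = [block.strip("\n") for block in text.split("\n\n")]
    return [block for block in blocks if block.strip()]


def build_mixed(treebanks: Dict[str, str], per_treebank: int = 200) -> str:
    """Concatenate up to `per_treebank` sentences from every treebank.

    Assumption: the first sentences of each training file are used; the
    original selection procedure is not documented in this record.
    """
    mixed: List[str] = []
    for name in sorted(treebanks):  # fixed order keeps the output reproducible
        mixed.extend(split_conllu(treebanks[name])[:per_treebank])
    # Every CoNLL-U sentence is terminated by a blank line.
    return "".join(block + "\n\n" for block in mixed)
```

Treebanks with fewer than 200 training sentences simply contribute everything they have.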
Rights:
Licence Universal Dependencies v2.2 , https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.2 , and PUB
Creator:
Kríž, Vincent and Hladká, Barbora
Publisher:
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:
text and corpus
Subject:
treebank , Prague dependencies , named entities , and semantic relations
Language:
Czech
Description:
The Czech Legal Text Treebank 2.0 (CLTT 2.0) annotates the same texts as CLTT 1.0. These texts come from the legal domain and are manually syntactically annotated. The CLTT 2.0 annotation on the syntactic layer is more elaborate than that of CLTT 1.0 in various respects. In addition, new annotation layers were added to the data: (i) the layer of accounting entities, and (ii) the layer of semantic entity relations.
Rights:
Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) , http://creativecommons.org/licenses/by-nc-sa/4.0/ , and PUB
Creator:
Hajič, Jan , Mareček, David , Fučíková, Eva , Cinková, Silvie , Štěpánek, Jan , and Mikulová, Marie
Publisher:
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:
text and corpus
Subject:
tectogrammatics , treebank , parallel corpus , and noisy texts
Language:
English and Czech
Description:
Syntactic (including deep-syntactic - tectogrammatical) annotation of user-generated noisy sentences. The annotation was made on Czech-English and English-Czech Faust Dev/Test sets.
The English data includes manual annotations of English reference translations of Czech source texts. These texts were translated independently by two translators. After some necessary cleaning, 1000 segments were randomly selected for manual annotation. Both reference translations were annotated, which means 2000 annotated segments in total.
The Czech data includes manual annotations of Czech reference translations of English source texts. These texts were translated independently by three translators. After some necessary cleaning, 1000 segments were randomly selected for manual annotation. All three reference translations were annotated, which means 3000 annotated segments in total.
Faust is part of PDT-C 1.0 (http://hdl.handle.net/11234/1-3185).
Rights:
Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) , http://creativecommons.org/licenses/by-nc/4.0/ , and PUB
Creator:
Hajič, Jan , Bejček, Eduard , Bémová, Alevtina , Buráňová, Eva , Fučíková, Eva , Hajičová, Eva , Havelka, Jiří , Hlaváčová, Jaroslava , Homola, Petr , Ircing, Pavel , Kárník, Jiří , Kettnerová, Václava , Klyueva, Natalia , Kolářová, Veronika , Kučová, Lucie , Lopatková, Markéta , Mareček, David , Mikulová, Marie , Mírovský, Jiří , Nedoluzhko, Anna , Novák, Michal , Pajas, Petr , Panevová, Jarmila , Peterek, Nino , Poláková, Lucie , Popel, Martin , Popelka, Jan , Romportl, Jan , Rysová, Magdaléna , Semecký, Jiří , Sgall, Petr , Spoustová, Johanka , Straka, Milan , Straňák, Pavel , Synková, Pavlína , Ševčíková, Magda , Šindlerová, Jana , Štěpánek, Jan , Štěpánková, Barbora , Toman, Josef , Urešová, Zdeňka , Vidová Hladká, Barbora , Zeman, Daniel , Zikánová, Šárka , and Žabokrtský, Zdeněk
Publisher:
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:
text and corpus
Subject:
treebank , dependency , tectogrammatics , topic-focus articulation , multiword expressions , coreference , bridging relations , discourse , morphology , syntax , tokenization , lemmatization , semantic relations , lexical semantics , lexicon , valency , speech reconstruction , clauses , speech recognition , and spoken corpus
Language:
Czech
Description:
A richly annotated and genre-diversified language resource, the Prague Dependency Treebank – Consolidated 1.0 (PDT-C 1.0, hereafter PDT-C) is a consolidated release of the existing PDT corpora of Czech data, uniformly annotated using the standard PDT scheme. PDT corpora included in PDT-C: Prague Dependency Treebank (the original PDT contents, written newspaper and journal texts from three genres); Czech part of Prague Czech-English Dependency Treebank (financial texts translated from English); Prague Dependency Treebank of Spoken Czech (spoken data, including audio and transcripts and multiple speech reconstruction annotation); PDT-Faust (user-generated texts). The differences from the separately published original treebanks can be briefly described as follows: it is published in one package, to allow easier data handling for all the datasets; the data is enhanced with manual linguistic annotation at the morphological layer, and a new version of the morphological dictionary is enclosed; a common valency lexicon for all four original parts is enclosed. Documentation provides two browsing and editing desktop tools (TrEd and MEd) and the corpus is also available online for searching using PML-TQ.
Rights:
Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) , http://creativecommons.org/licenses/by-nc-sa/4.0/ , and PUB
Creator:
Rysová, Magdaléna , Synková, Pavlína , Mírovský, Jiří , Hajičová, Eva , Nedoluzhko, Anna , Ocelák, Radek , Pergler, Jiří , Poláková, Lucie , Scheller, Veronika , Zdeňková, Jana , and Zikánová, Šárka
Publisher:
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:
text and corpus
Subject:
discourse , bridging relations , coreference , topic-focus articulation , treebank , dependency , tectogrammatics , and PDT
Language:
Czech
Description:
PDiT 2.0 is a new version of the Prague Discourse Treebank. It contains a complex annotation of discourse phenomena enriched by the annotation of secondary connectives.
Rights:
Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) , http://creativecommons.org/licenses/by-nc-sa/4.0/ , and PUB
Creator:
Nivre, Joakim , Abrams, Mitchell , Agić, Željko , Ahrenberg, Lars , Aleksandravičiūtė, Gabrielė , Antonsen, Lene , Aplonova, Katya , Aranzabe, Maria Jesus , Arutie, Gashaw , Asahara, Masayuki , Ateyah, Luma , Attia, Mohammed , Atutxa, Aitziber , Augustinus, Liesbeth , Badmaeva, Elena , Ballesteros, Miguel , Banerjee, Esha , Bank, Sebastian , Barbu Mititelu, Verginica , Basmov, Victoria , Bauer, John , Bellato, Sandra , Bengoetxea, Kepa , Berzak, Yevgeni , Bhat, Irshad Ahmad , Bhat, Riyaz Ahmad , Biagetti, Erica , Bick, Eckhard , Bielinskienė, Agnė , Blokland, Rogier , Bobicev, Victoria , Boizou, Loïc , Borges Völker, Emanuel , Börstell, Carl , Bosco, Cristina , Bouma, Gosse , Bowman, Sam , Boyd, Adriane , Brokaitė, Kristina , Burchardt, Aljoscha , Candito, Marie , Caron, Bernard , Caron, Gauthier , Cebiroğlu Eryiğit, Gülşen , Cecchini, Flavio Massimiliano , Celano, Giuseppe G. A. , Čéplö, Slavomír , Cetin, Savas , Chalub, Fabricio , Choi, Jinho , Cho, Yongseok , Chun, Jayeol , Cinková, Silvie , Collomb, Aurélie , Çöltekin, Çağrı , Connor, Miriam , Courtin, Marine , Davidson, Elizabeth , de Marneffe, Marie-Catherine , de Paiva, Valeria , Diaz de Ilarraza, Arantza , Dickerson, Carly , Dione, Bamba , Dirix, Peter , Dobrovoljc, Kaja , Dozat, Timothy , Droganova, Kira , Dwivedi, Puneet , Eckhoff, Hanne , Eli, Marhaba , Elkahky, Ali , Ephrem, Binyam , Erjavec, Tomaž , Etienne, Aline , Farkas, Richárd , Fernandez Alcalde, Hector , Foster, Jennifer , Freitas, Cláudia , Fujita, Kazunori , Gajdošová, Katarína , Galbraith, Daniel , Garcia, Marcos , Gärdenfors, Moa , Garza, Sebastian , Gerdes, Kim , Ginter, Filip , Goenaga, Iakes , Gojenola, Koldo , Gökırmak, Memduh , Goldberg, Yoav , Gómez Guinovart, Xavier , González Saavedra, Berta , Grioni, Matias , Grūzītis, Normunds , Guillaume, Bruno , Guillot-Barbance, Céline , Habash, Nizar , Hajič, Jan , Hajič jr., Jan , Hà Mỹ, Linh , Han, Na-Rae , Harris, Kim , Haug, Dag , Heinecke, Johannes , Hennig, Felix , Hladká, Barbora , 
Hlaváčová, Jaroslava , Hociung, Florinel , Hohle, Petter , Hwang, Jena , Ikeda, Takumi , Ion, Radu , Irimia, Elena , Ishola, Ọlájídé , Jelínek, Tomáš , Johannsen, Anders , Jørgensen, Fredrik , Kaşıkara, Hüner , Kaasen, Andre , Kahane, Sylvain , Kanayama, Hiroshi , Kanerva, Jenna , Katz, Boris , Kayadelen, Tolga , Kenney, Jessica , Kettnerová, Václava , Kirchner, Jesse , Köhn, Arne , Kopacewicz, Kamil , Kotsyba, Natalia , Kovalevskaitė, Jolanta , Krek, Simon , Kwak, Sookyoung , Laippala, Veronika , Lambertino, Lorenzo , Lam, Lucia , Lando, Tatiana , Larasati, Septina Dian , Lavrentiev, Alexei , Lee, John , Lê Hồng, Phương , Lenci, Alessandro , Lertpradit, Saran , Leung, Herman , Li, Cheuk Ying , Li, Josie , Li, Keying , Lim, KyungTae , Li, Yuan , Ljubešić, Nikola , Loginova, Olga , Lyashevskaya, Olga , Lynn, Teresa , Macketanz, Vivien , Makazhanov, Aibek , Mandl, Michael , Manning, Christopher , Manurung, Ruli , Mărănduc, Cătălina , Mareček, David , Marheinecke, Katrin , Martínez Alonso, Héctor , Martins, André , Mašek, Jan , Matsumoto, Yuji , McDonald, Ryan , McGuinness, Sarah , Mendonça, Gustavo , Miekka, Niko , Misirpashayeva, Margarita , Missilä, Anna , Mititelu, Cătălin , Miyao, Yusuke , Montemagni, Simonetta , More, Amir , Moreno Romero, Laura , Mori, Keiko Sophie , Morioka, Tomohiko , Mori, Shinsuke , Moro, Shigeki , Mortensen, Bjartur , Moskalevskyi, Bohdan , Muischnek, Kadri , Murawaki, Yugo , Müürisep, Kaili , Nainwani, Pinkey , Navarro Horñiacek, Juan Ignacio , Nedoluzhko, Anna , Nešpore-Bērzkalne, Gunta , Nguyễn Thị, Lương , Nguyễn Thị Minh, Huyền , Nikaido, Yoshihiro , Nikolaev, Vitaly , Nitisaroj, Rattima , Nurmi, Hanna , Ojala, Stina , Olúòkun, Adédayọ̀ , Omura, Mai , Osenova, Petya , Östling, Robert , Øvrelid, Lilja , Partanen, Niko , Pascual, Elena , Passarotti, Marco , Patejuk, Agnieszka , Paulino-Passos, Guilherme , Peljak-Łapińska, Angelika , Peng, Siyao , Perez, Cenel-Augusto , Perrier, Guy , Petrova, Daria , Petrov, Slav , Piitulainen, Jussi , 
Pirinen, Tommi A , Pitler, Emily , Plank, Barbara , Poibeau, Thierry , Popel, Martin , Pretkalniņa, Lauma , Prévost, Sophie , Prokopidis, Prokopis , Przepiórkowski, Adam , Puolakainen, Tiina , Pyysalo, Sampo , Rääbis, Andriela , Rademaker, Alexandre , Ramasamy, Loganathan , Rama, Taraka , Ramisch, Carlos , Ravishankar, Vinit , Real, Livy , Reddy, Siva , Rehm, Georg , Rießler, Michael , Rimkutė, Erika , Rinaldi, Larissa , Rituma, Laura , Rocha, Luisa , Romanenko, Mykhailo , Rosa, Rudolf , Rovati, Davide , Roșca, Valentin , Rudina, Olga , Rueter, Jack , Sadde, Shoval , Sagot, Benoît , Saleh, Shadi , Salomoni, Alessio , Samardžić, Tanja , Samson, Stephanie , Sanguinetti, Manuela , Särg, Dage , Saulīte, Baiba , Sawanakunanon, Yanin , Schneider, Nathan , Schuster, Sebastian , Seddah, Djamé , Seeker, Wolfgang , Seraji, Mojgan , Shen, Mo , Shimada, Atsuko , Shirasu, Hiroyuki , Shohibussirri, Muh , Sichinava, Dmitry , Silveira, Natalia , Simi, Maria , Simionescu, Radu , Simkó, Katalin , Šimková, Mária , Simov, Kiril , Smith, Aaron , Soares-Bastos, Isabela , Spadine, Carolyn , Stella, Antonio , Straka, Milan , Strnadová, Jana , Suhr, Alane , Sulubacak, Umut , Suzuki, Shingo , Szántó, Zsolt , Taji, Dima , Takahashi, Yuta , Tamburini, Fabio , Tanaka, Takaaki , Tellier, Isabelle , Thomas, Guillaume , Torga, Liisi , Trosterud, Trond , Trukhina, Anna , Tsarfaty, Reut , Tyers, Francis , Uematsu, Sumire , Urešová, Zdeňka , Uria, Larraitz , Uszkoreit, Hans , Vajjala, Sowmya , van Niekerk, Daniel , van Noord, Gertjan , Varga, Viktor , Villemonte de la Clergerie, Eric , Vincze, Veronika , Wallin, Lars , Walsh, Abigail , Wang, Jing Xian , Washington, Jonathan North , Wendt, Maximilan , Williams, Seyi , Wirén, Mats , Wittern, Christian , Woldemariam, Tsegay , Wong, Tak-sum , Wróblewska, Alina , Yako, Mary , Yamazaki, Naoki , Yan, Chunxiao , Yasuoka, Koichi , Yavrumyan, Marat M. , Yu, Zhuoran , Žabokrtský, Zdeněk , Zeldes, Amir , Zeman, Daniel , Zhang, Manying , and Zhu, Hanzhi
Publisher:
Universal Dependencies Consortium
Type:
text and corpus
Subject:
treebank , dependency , syntax , morphology , harmonized annotation , interset , universal tagset , and stanford dependencies
Language:
Ancient Greek (to 1453) , Arabic , Basque , Bulgarian , Croatian , Czech , Danish , Dutch , English , Estonian , Finnish , French , German , Gothic , Modern Greek (1453-) , Hebrew , Hindi , Hungarian , Indonesian , Irish , Italian , Japanese , Latin , Norwegian , Church Slavic , Persian , Polish , Portuguese , Romanian , Slovenian , Spanish , Swedish , Tamil , Catalan , Chinese , Galician , Kazakh , Latvian , Russian , Turkish , Coptic , Sanskrit , Slovak , Ukrainian , Uighur , Vietnamese , Belarusian , Korean , Lithuanian , Urdu , Russia Buriat , Northern Kurdish , Northern Sami , Upper Sorbian , Afrikaans , Yue Chinese , Marathi , Serbian , Swedish Sign Language , Telugu , Amharic , Armenian , Breton , Faroese , Komi-Zyrian , Nigerian Pidgin , Old French (842-ca. 1400) , Tagalog , Thai , Warlpiri , Yoruba , Akkadian , Bambara , Erzya , Maltese , Welsh , Wolof , Assyrian Neo-Aramaic , Literary Chinese , Old Russian , Karelian , and Mbyá Guaraní
Description:
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
Rights:
Licence Universal Dependencies v2.4 , https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.4 , and PUB
Creator:
Zeman, Daniel , Nivre, Joakim , Abrams, Mitchell , Aepli, Noëmi , Agić, Željko , Ahrenberg, Lars , Aleksandravičiūtė, Gabrielė , Antonsen, Lene , Aplonova, Katya , Aranzabe, Maria Jesus , Arutie, Gashaw , Asahara, Masayuki , Ateyah, Luma , Attia, Mohammed , Atutxa, Aitziber , Augustinus, Liesbeth , Badmaeva, Elena , Ballesteros, Miguel , Banerjee, Esha , Bank, Sebastian , Barbu Mititelu, Verginica , Basmov, Victoria , Batchelor, Colin , Bauer, John , Bellato, Sandra , Bengoetxea, Kepa , Berzak, Yevgeni , Bhat, Irshad Ahmad , Bhat, Riyaz Ahmad , Biagetti, Erica , Bick, Eckhard , Bielinskienė, Agnė , Blokland, Rogier , Bobicev, Victoria , Boizou, Loïc , Borges Völker, Emanuel , Börstell, Carl , Bosco, Cristina , Bouma, Gosse , Bowman, Sam , Boyd, Adriane , Brokaitė, Kristina , Burchardt, Aljoscha , Candito, Marie , Caron, Bernard , Caron, Gauthier , Cavalcanti, Tatiana , Cebiroğlu Eryiğit, Gülşen , Cecchini, Flavio Massimiliano , Celano, Giuseppe G. A. , Čéplö, Slavomír , Cetin, Savas , Chalub, Fabricio , Choi, Jinho , Cho, Yongseok , Chun, Jayeol , Cignarella, Alessandra T. 
, Cinková, Silvie , Collomb, Aurélie , Çöltekin, Çağrı , Connor, Miriam , Courtin, Marine , Davidson, Elizabeth , de Marneffe, Marie-Catherine , de Paiva, Valeria , de Souza, Elvis , Diaz de Ilarraza, Arantza , Dickerson, Carly , Dione, Bamba , Dirix, Peter , Dobrovoljc, Kaja , Dozat, Timothy , Droganova, Kira , Dwivedi, Puneet , Eckhoff, Hanne , Eli, Marhaba , Elkahky, Ali , Ephrem, Binyam , Erina, Olga , Erjavec, Tomaž , Etienne, Aline , Evelyn, Wograine , Farkas, Richárd , Fernandez Alcalde, Hector , Foster, Jennifer , Freitas, Cláudia , Fujita, Kazunori , Gajdošová, Katarína , Galbraith, Daniel , Garcia, Marcos , Gärdenfors, Moa , Garza, Sebastian , Gerdes, Kim , Ginter, Filip , Goenaga, Iakes , Gojenola, Koldo , Gökırmak, Memduh , Goldberg, Yoav , Gómez Guinovart, Xavier , González Saavedra, Berta , Griciūtė, Bernadeta , Grioni, Matias , Grūzītis, Normunds , Guillaume, Bruno , Guillot-Barbance, Céline , Habash, Nizar , Hajič, Jan , Hajič jr., Jan , Hämäläinen, Mika , Hà Mỹ, Linh , Han, Na-Rae , Harris, Kim , Haug, Dag , Heinecke, Johannes , Hennig, Felix , Hladká, Barbora , Hlaváčová, Jaroslava , Hociung, Florinel , Hohle, Petter , Hwang, Jena , Ikeda, Takumi , Ion, Radu , Irimia, Elena , Ishola, Ọlájídé , Jelínek, Tomáš , Johannsen, Anders , Jørgensen, Fredrik , Juutinen, Markus , Kaşıkara, Hüner , Kaasen, Andre , Kabaeva, Nadezhda , Kahane, Sylvain , Kanayama, Hiroshi , Kanerva, Jenna , Katz, Boris , Kayadelen, Tolga , Kenney, Jessica , Kettnerová, Václava , Kirchner, Jesse , Klementieva, Elena , Köhn, Arne , Kopacewicz, Kamil , Kotsyba, Natalia , Kovalevskaitė, Jolanta , Krek, Simon , Kwak, Sookyoung , Laippala, Veronika , Lambertino, Lorenzo , Lam, Lucia , Lando, Tatiana , Larasati, Septina Dian , Lavrentiev, Alexei , Lee, John , Lê Hồng, Phương , Lenci, Alessandro , Lertpradit, Saran , Leung, Herman , Li, Cheuk Ying , Li, Josie , Li, Keying , Lim, KyungTae , Liovina, Maria , Li, Yuan , Ljubešić, Nikola , Loginova, Olga , Lyashevskaya, Olga , Lynn, Teresa 
, Macketanz, Vivien , Makazhanov, Aibek , Mandl, Michael , Manning, Christopher , Manurung, Ruli , Mărănduc, Cătălina , Mareček, David , Marheinecke, Katrin , Martínez Alonso, Héctor , Martins, André , Mašek, Jan , Matsumoto, Yuji , McDonald, Ryan , McGuinness, Sarah , Mendonça, Gustavo , Miekka, Niko , Misirpashayeva, Margarita , Missilä, Anna , Mititelu, Cătălin , Mitrofan, Maria , Miyao, Yusuke , Montemagni, Simonetta , More, Amir , Moreno Romero, Laura , Mori, Keiko Sophie , Morioka, Tomohiko , Mori, Shinsuke , Moro, Shigeki , Mortensen, Bjartur , Moskalevskyi, Bohdan , Muischnek, Kadri , Munro, Robert , Murawaki, Yugo , Müürisep, Kaili , Nainwani, Pinkey , Navarro Horñiacek, Juan Ignacio , Nedoluzhko, Anna , Nešpore-Bērzkalne, Gunta , Nguyễn Thị, Lương , Nguyễn Thị Minh, Huyền , Nikaido, Yoshihiro , Nikolaev, Vitaly , Nitisaroj, Rattima , Nurmi, Hanna , Ojala, Stina , Ojha, Atul Kr. , Olúòkun, Adédayọ̀ , Omura, Mai , Osenova, Petya , Östling, Robert , Øvrelid, Lilja , Partanen, Niko , Pascual, Elena , Passarotti, Marco , Patejuk, Agnieszka , Paulino-Passos, Guilherme , Peljak-Łapińska, Angelika , Peng, Siyao , Perez, Cenel-Augusto , Perrier, Guy , Petrova, Daria , Petrov, Slav , Phelan, Jason , Piitulainen, Jussi , Pirinen, Tommi A , Pitler, Emily , Plank, Barbara , Poibeau, Thierry , Ponomareva, Larisa , Popel, Martin , Pretkalniņa, Lauma , Prévost, Sophie , Prokopidis, Prokopis , Przepiórkowski, Adam , Puolakainen, Tiina , Pyysalo, Sampo , Qi, Peng , Rääbis, Andriela , Rademaker, Alexandre , Ramasamy, Loganathan , Rama, Taraka , Ramisch, Carlos , Ravishankar, Vinit , Real, Livy , Reddy, Siva , Rehm, Georg , Riabov, Ivan , Rießler, Michael , Rimkutė, Erika , Rinaldi, Larissa , Rituma, Laura , Rocha, Luisa , Romanenko, Mykhailo , Rosa, Rudolf , Rovati, Davide , Roșca, Valentin , Rudina, Olga , Rueter, Jack , Sadde, Shoval , Sagot, Benoît , Saleh, Shadi , Salomoni, Alessio , Samardžić, Tanja , Samson, Stephanie , Sanguinetti, Manuela , Särg, Dage , Saulīte, 
Baiba , Sawanakunanon, Yanin , Schneider, Nathan , Schuster, Sebastian , Seddah, Djamé , Seeker, Wolfgang , Seraji, Mojgan , Shen, Mo , Shimada, Atsuko , Shirasu, Hiroyuki , Shohibussirri, Muh , Sichinava, Dmitry , Silveira, Aline , Silveira, Natalia , Simi, Maria , Simionescu, Radu , Simkó, Katalin , Šimková, Mária , Simov, Kiril , Smith, Aaron , Soares-Bastos, Isabela , Spadine, Carolyn , Stella, Antonio , Straka, Milan , Strnadová, Jana , Suhr, Alane , Sulubacak, Umut , Suzuki, Shingo , Szántó, Zsolt , Taji, Dima , Takahashi, Yuta , Tamburini, Fabio , Tanaka, Takaaki , Tellier, Isabelle , Thomas, Guillaume , Torga, Liisi , Trosterud, Trond , Trukhina, Anna , Tsarfaty, Reut , Tyers, Francis , Uematsu, Sumire , Urešová, Zdeňka , Uria, Larraitz , Uszkoreit, Hans , Utka, Andrius , Vajjala, Sowmya , van Niekerk, Daniel , van Noord, Gertjan , Varga, Viktor , Villemonte de la Clergerie, Eric , Vincze, Veronika , Wallin, Lars , Walsh, Abigail , Wang, Jing Xian , Washington, Jonathan North , Wendt, Maximilan , Williams, Seyi , Wirén, Mats , Wittern, Christian , Woldemariam, Tsegay , Wong, Tak-sum , Wróblewska, Alina , Yako, Mary , Yamazaki, Naoki , Yan, Chunxiao , Yasuoka, Koichi , Yavrumyan, Marat M. , Yu, Zhuoran , Žabokrtský, Zdeněk , Zeldes, Amir , Zhang, Manying , and Zhu, Hanzhi
Publisher:
Universal Dependencies Consortium
Type:
text and corpus
Subject:
treebank , dependency , syntax , morphology , harmonized annotation , interset , universal tagset , and stanford dependencies
Language:
Ancient Greek (to 1453) , Arabic , Basque , Bulgarian , Croatian , Czech , Danish , Dutch , English , Estonian , Finnish , French , German , Gothic , Modern Greek (1453-) , Hebrew , Hindi , Hungarian , Indonesian , Irish , Italian , Japanese , Latin , Norwegian , Church Slavic , Persian , Polish , Portuguese , Romanian , Slovenian , Spanish , Swedish , Tamil , Catalan , Chinese , Galician , Kazakh , Latvian , Russian , Turkish , Coptic , Sanskrit , Slovak , Ukrainian , Uighur , Vietnamese , Belarusian , Korean , Lithuanian , Urdu , Russia Buriat , Northern Kurdish , Northern Sami , Upper Sorbian , Afrikaans , Yue Chinese , Marathi , Serbian , Swedish Sign Language , Telugu , Amharic , Armenian , Breton , Faroese , Komi-Zyrian , Nigerian Pidgin , Old French (842-ca. 1400) , Tagalog , Thai , Warlpiri , Yoruba , Akkadian , Bambara , Erzya , Maltese , Welsh , Wolof , Assyrian Neo-Aramaic , Literary Chinese , Old Russian , Karelian , Mbyá Guaraní , Bhojpuri , Komi-Permyak , Livvi , Moksha , Scottish Gaelic , Skolt Sami , and Swiss German
Description:
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
Rights:
Licence Universal Dependencies v2.5 , https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.5 , and PUB