Search Results
Creator:
Zeman, Daniel , Potthast, Martin , Straka, Milan , Popel, Martin , Dozat, Timothy , Qi, Peng , Manning, Christopher , Shi, Tianze , Wu, Felix G. , Chen, Xilun , Cheng, Yao , Björkelund, Anders , Falenska, Agnieszka , Yu, Xiang , Kuhn, Jonas , Che, Wanxiang , Guo, Jiang , Wang, Yuxuan , Zheng, Bo , Zhao, Huaipeng , Liu, Yang , Teng, Dechuan , Liu, Ting , Lim, Kyungtae , Poibeau, Thierry , Sato, Motoki , Manabe, Hitoshi , Noji, Hiroshi , Matsumoto, Yuji , Kırnap, Ömer , Önder, Berkay Furkan , Yuret, Deniz , Straková, Jana , Vania, Clara , Zhang, Xingxing , Lopez, Adam , Heinecke, Johannes , Asadullah, Munshi , Kanerva, Jenna , Luotolahti, Juhani , Ginter, Filip , Kuan, Yu , Sofroniev, Pavel , Schill, Erik , Hinrichs, Erhard , Nguyen, Dat Quoc , Dras, Mark , Johnson, Mark , Qian, Xian , Vilares, David , Gómez-Rodríguez, Carlos , Aufrant, Lauriane , Wisniewski, Guillaume , Yvon, François , Dumitrescu, Stefan Daniel , Boroş, Tiberiu , Tufiş, Dan , Das, Ayan , Zaffar, Affan , Sarkar, Sudeshna , Wang, Hao , Zhao, Hai , Zhang, Zhisong , Hornby, Ryan , Taylor, Clark , Park, Jungyeul , de Lhoneux, Miryam , Shao, Yan , Basirat, Ali , Kiperwasser, Eliyahu , Stymne, Sara , Goldberg, Yoav , Nivre, Joakim , Akkuş, Burak Kerim , Azizoglu, Heval , Cakici, Ruket , Moor, Christophe , Merlo, Paola , Henderson, James , Wang, Haozhou , Ji, Tao , Wu, Yuanbin , Lan, Man , de la Clergerie, Eric , Sagot, Benoît , Seddah, Djamé , More, Amir , Tsarfaty, Reut , Kanayama, Hiroshi , Muraoka, Masayasu , Yoshikawa, Katsumasa , Garcia, Marcos , and Gamallo, Pablo
Publisher:
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:
text and corpus
Subject:
dependency parser and parsebank
Language:
Arabic , Bulgarian , Russia Buriat , Czech , Catalan , Church Slavic , Danish , German , Modern Greek (1453-) , English , Spanish , Estonian , Basque , Persian , Finnish , French , Irish , Galician , Gothic , Ancient Greek (to 1453) , Hebrew , Hindi , Croatian , Upper Sorbian , Hungarian , Indonesian , Italian , Japanese , Kazakh , Northern Kurdish , Korean , Latin , Latvian , Dutch , Norwegian , Polish , Portuguese , Romanian , Russian , Slovak , Slovenian , Northern Sami , Swedish , Turkish , Uighur , Ukrainian , Urdu , Vietnamese , and Chinese
Description:
This package contains the system outputs from the CoNLL 2017 Shared Task in Multilingual Parsing from Raw Text to Universal Dependencies.
Rights:
Licence Universal Dependencies v2.0 , https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.0 , and PUB
Creator:
Popel, Martin , Novák, Michal , Žabokrtský, Zdeněk , Zeman, Daniel , Nedoluzhko, Anna , Acar, Kutay , Bamman, David , Bourgonje, Peter , Cinková, Silvie , Eckhoff, Hanne , Cebiroğlu Eryiğit, Gülşen , Hajič, Jan , Hardmeier, Christian , Haug, Dag , Jørgensen, Tollef , Kåsen, Andre , Krielke, Pauline , Landragin, Frédéric , Lapshinova-Koltunski, Ekaterina , Mæhlum, Petter , Martí, M. Antònia , Mikulová, Marie , Nøklestad, Anders , Ogrodniczuk, Maciej , Øvrelid, Lilja , Pamay Arslan, Tuğba , Recasens, Marta , Solberg, Per Erik , Stede, Manfred , Straka, Milan , Swanson, Daniel , Toldova, Svetlana , Vadász, Noémi , Velldal, Erik , Vincze, Veronika , Zeldes, Amir , and Žitkus, Voldemaras
Publisher:
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:
text and corpus
Subject:
coreference , bridging relations , harmonized annotation , dependency , and treebank
Language:
Ancient Greek (to 1453) , Ancient Hebrew , Catalan , Czech , English , French , German , Hungarian , Lithuanian , Norwegian , Church Slavic , Polish , Russian , Spanish , and Turkish
Description:
CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, the current version 1.2 of CorefUD consists of 25 datasets for 16 languages. The datasets are enriched with automatic morphological and syntactic annotations that are fully compliant with the standards of the Universal Dependencies project. All the datasets are stored in the CoNLL-U format, with coreference- and bridging-specific information captured by attribute-value pairs located in the MISC column.
The collection is divided into a public edition and a non-public (ÚFAL-internal) edition. The public edition is distributed via LINDAT-CLARIAH-CZ and contains 21 datasets for 15 languages (1 dataset for Ancient Greek, 1 for Ancient Hebrew, 1 for Catalan, 2 for Czech, 3 for English, 1 for French, 2 for German, 2 for Hungarian, 1 for Lithuanian, 2 for Norwegian, 1 for Old Church Slavonic, 1 for Polish, 1 for Russian, 1 for Spanish, and 1 for Turkish), excluding the test data. The non-public edition is available internally to ÚFAL members and contains 4 additional datasets for 2 languages (1 dataset for Dutch and 3 for English), which we are not allowed to distribute because of their original license limitations. It also contains the test data portions for all datasets. When using any of the harmonized datasets, please read its license (placed in the same directory as the data) and cite the original data resource, too.
Compared to the previous version 1.1, version 1.2 adds new languages and corpora, namely Ancient_Greek-PROIEL, Ancient_Hebrew-PTNK, English-LitBank, and Old_Church_Slavonic-PROIEL. In addition, English-GUM and Turkish-ITCC have been updated to newer versions, the conversion of zeros in Polish-PCC has been improved, and the conversion pipelines for several other datasets have been refined (a list of all changes in each dataset can be found in the corresponding README file).
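The description above says that coreference and bridging information is stored as attribute-value pairs in the MISC column of CoNLL-U files. As a minimal sketch of what reading that column could look like, the Python below splits the tenth tab-separated field of a token line on `|` and `=`. The function names and the sample line (including its `Entity` value) are hypothetical and chosen only for illustration; the attribute names actually used are documented with each dataset.

```python
def parse_misc(misc_field):
    """Split a CoNLL-U MISC value such as 'A=1|B=2' into a dict.

    An underscore means the column is empty. An item without '=' is
    kept as a key whose value is None.
    """
    if misc_field == "_":
        return {}
    pairs = {}
    for item in misc_field.split("|"):
        key, sep, value = item.partition("=")
        pairs[key] = value if sep else None
    return pairs


def misc_of_token_line(line):
    """Return the parsed MISC column (10th field) of one CoNLL-U token line."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 10:
        raise ValueError("expected 10 tab-separated columns")
    return parse_misc(columns[9])


# Hypothetical token line; the Entity value is illustrative only.
sample = "1\tShe\tshe\tPRON\t_\t_\t2\tnsubj\t_\tEntity=(e1)|SpaceAfter=No"
print(misc_of_token_line(sample))
```

Comment lines (starting with `#`) and blank sentence separators in a real file are not token lines, so they would need to be skipped before calling `misc_of_token_line`.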
Rights:
Licence CorefUD v1.2 , https://lindat.mff.cuni.cz/repository/xmlui/page/license-corefud-1.2 , and PUB
Creator:
Zeman, Daniel , Mareček, David , Mašek, Jan , Popel, Martin , Ramasamy, Loganathan , Rosa, Rudolf , Štěpánek, Jan , and Žabokrtský, Zdeněk
Publisher:
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:
text and corpus
Subject:
treebank , Stanford dependencies , Prague dependencies , harmonization , common annotation style , and Interset
Language:
Arabic , Bulgarian , Bengali , Catalan , Czech , Danish , German , Modern Greek (1453-) , English , Spanish , Estonian , Basque , Persian , Finnish , Ancient Greek (to 1453) , Hindi , Hungarian , Italian , Japanese , Latin , Dutch , Portuguese , Romanian , Russian , Slovak , Slovenian , Swedish , Tamil , Telugu , and Turkish
Description:
HamleDT 2.0 is a collection of 30 existing treebanks harmonized into a common annotation style, the Prague Dependencies, and further transformed into Stanford Dependencies, a treebank annotation style that became popular recently. We use the newest basic Universal Stanford Dependencies, without added language-specific subtypes.
Rights:
HamleDT 2.0 Licence Agreement , https://lindat.mff.cuni.cz/repository/xmlui/page/licence-hamledt-2.0 , and ACA
Creator:
Zeman, Daniel , Mareček, David , Mašek, Jan , Popel, Martin , Ramasamy, Loganathan , Rosa, Rudolf , Štěpánek, Jan , and Žabokrtský, Zdeněk
Publisher:
Charles University
Type:
text and corpus
Subject:
annotated corpus , morphology , syntax , dependency , treebank , harmonized annotation , and common annotation style
Language:
Arabic , Basque , Bengali , Bulgarian , Catalan , Croatian , Czech , Danish , Dutch , English , Estonian , Finnish , French , German , Modern Greek (1453-) , Ancient Greek (to 1453) , Hebrew , Hindi , Hungarian , Indonesian , Irish , Italian , Japanese , Latin , Persian , Polish , Portuguese , Romanian , Russian , Slovak , Slovenian , Spanish , Swedish , Tamil , Telugu , and Turkish
Description:
HamleDT (HArmonized Multi-LanguagE Dependency Treebank) is a compilation of existing dependency treebanks (or dependency conversions of other treebanks), transformed so that they all conform to the same annotation style. This version uses Universal Dependencies as the common annotation style.
Update (November 2017): for a current collection of harmonized dependency treebanks, we recommend using Universal Dependencies (UD). All of the corpora that are distributed in HamleDT in full are also part of the UD project; only some corpora from the Patch group (where HamleDT provides only the harmonizing scripts but not the full corpus data) are available in HamleDT but not in UD.
Rights:
HamleDT 3.0 License Terms , https://lindat.mff.cuni.cz/repository/xmlui/page/licence-hamledt-3.0 , and PUB
Creator:
Nivre, Joakim , Agić, Željko , Aranzabe, Maria Jesus , Asahara, Masayuki , Atutxa, Aitziber , Ballesteros, Miguel , Bauer, John , Bengoetxea, Kepa , Bhat, Riyaz Ahmad , Bosco, Cristina , Bowman, Sam , Celano, Giuseppe G. A. , Connor, Miriam , de Marneffe, Marie-Catherine , Diaz de Ilarraza, Arantza , Dobrovoljc, Kaja , Dozat, Timothy , Erjavec, Tomaž , Farkas, Richárd , Foster, Jennifer , Galbraith, Daniel , Ginter, Filip , Goenaga, Iakes , Gojenola, Koldo , Goldberg, Yoav , Gonzales, Berta , Guillaume, Bruno , Hajič, Jan , Haug, Dag , Ion, Radu , Irimia, Elena , Johannsen, Anders , Kanayama, Hiroshi , Kanerva, Jenna , Krek, Simon , Laippala, Veronika , Lenci, Alessandro , Ljubešić, Nikola , Lynn, Teresa , Manning, Christopher , Mărănduc, Cătălina , Mareček, David , Martínez Alonso, Héctor , Mašek, Jan , Matsumoto, Yuji , McDonald, Ryan , Missilä, Anna , Mititelu, Verginica , Miyao, Yusuke , Montemagni, Simonetta , Mori, Shunsuke , Nurmi, Hanna , Osenova, Petya , Øvrelid, Lilja , Pascual, Elena , Passarotti, Marco , Perez, Cenel-Augusto , Petrov, Slav , Piitulainen, Jussi , Plank, Barbara , Popel, Martin , Prokopidis, Prokopis , Pyysalo, Sampo , Ramasamy, Loganathan , Rosa, Rudolf , Saleh, Shadi , Schuster, Sebastian , Seeker, Wolfgang , Seraji, Mojgan , Silveira, Natalia , Simi, Maria , Simionescu, Radu , Simkó, Katalin , Simov, Kiril , Smith, Aaron , Štěpánek, Jan , Suhr, Alane , Szántó, Zsolt , Tanaka, Takaaki , Tsarfaty, Reut , Uematsu, Sumire , Uria, Larraitz , Varga, Viktor , Vincze, Veronika , Žabokrtský, Zdeněk , Zeman, Daniel , and Zhu, Hanzhi
Publisher:
Universal Dependencies Consortium
Type:
text and corpus
Subject:
treebank , dependency , syntax , morphology , harmonized annotation , interset , universal tagset , and stanford dependencies
Language:
Ancient Greek (to 1453) , Arabic , Basque , Bulgarian , Croatian , Czech , Danish , Dutch , English , Estonian , Finnish , French , German , Gothic , Modern Greek (1453-) , Hebrew , Hindi , Hungarian , Indonesian , Irish , Italian , Japanese , Latin , Norwegian , Church Slavic , Persian , Polish , Portuguese , Romanian , Slovenian , Spanish , Swedish , and Tamil
Description:
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
Rights:
Licence Universal Dependencies v1.2 , https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-1.2 , and PUB
Creator:
Nivre, Joakim , Agić, Željko , Ahrenberg, Lars , Aranzabe, Maria Jesus , Asahara, Masayuki , Atutxa, Aitziber , Ballesteros, Miguel , Bauer, John , Bengoetxea, Kepa , Berzak, Yevgeni , Bhat, Riyaz Ahmad , Bosco, Cristina , Bouma, Gosse , Bowman, Sam , Cebiroğlu Eryiğit, Gülşen , Celano, Giuseppe G. A. , Çöltekin, Çağrı , Connor, Miriam , de Marneffe, Marie-Catherine , Diaz de Ilarraza, Arantza , Dobrovoljc, Kaja , Dozat, Timothy , Droganova, Kira , Erjavec, Tomaž , Farkas, Richárd , Foster, Jennifer , Galbraith, Daniel , Garza, Sebastian , Ginter, Filip , Goenaga, Iakes , Gojenola, Koldo , Gokirmak, Memduh , Goldberg, Yoav , Gómez Guinovart, Xavier , Gonzáles Saavedra, Berta , Grūzītis, Normunds , Guillaume, Bruno , Hajič, Jan , Haug, Dag , Hladká, Barbora , Ion, Radu , Irimia, Elena , Johannsen, Anders , Kaşıkara, Hüner , Kanayama, Hiroshi , Kanerva, Jenna , Katz, Boris , Kenney, Jessica , Krek, Simon , Laippala, Veronika , Lam, Lucia , Lenci, Alessandro , Ljubešić, Nikola , Lyashevskaya, Olga , Lynn, Teresa , Makazhanov, Aibek , Manning, Christopher , Mărănduc, Cătălina , Mareček, David , Martínez Alonso, Héctor , Mašek, Jan , Matsumoto, Yuji , McDonald, Ryan , Missilä, Anna , Mititelu, Verginica , Miyao, Yusuke , Montemagni, Simonetta , Mori, Keiko Sophie , Mori, Shunsuke , Muischnek, Kadri , Mustafina, Nina , Müürisep, Kaili , Nikolaev, Vitaly , Nurmi, Hanna , Osenova, Petya , Øvrelid, Lilja , Pascual, Elena , Passarotti, Marco , Perez, Cenel-Augusto , Petrov, Slav , Piitulainen, Jussi , Plank, Barbara , Popel, Martin , Pretkalniņa, Lauma , Prokopidis, Prokopis , Puolakainen, Tiina , Pyysalo, Sampo , Ramasamy, Loganathan , Rituma, Laura , Rosa, Rudolf , Saleh, Shadi , Saulīte, Baiba , Schuster, Sebastian , Seeker, Wolfgang , Seraji, Mojgan , Shakurova, Lena , Shen, Mo , Silveira, Natalia , Simi, Maria , Simionescu, Radu , Simkó, Katalin , Simov, Kiril , Smith, Aaron , Spadine, Carolyn , Suhr, Alane , Sulubacak, Umut , Szántó, Zsolt , Tanaka, Takaaki , Tsarfaty, 
Reut , Tyers, Francis , Uematsu, Sumire , Uria, Larraitz , van Noord, Gertjan , Varga, Viktor , Vincze, Veronika , Wang, Jing Xian , Washington, Jonathan North , Žabokrtský, Zdeněk , Zeman, Daniel , and Zhu, Hanzhi
Publisher:
Universal Dependencies Consortium
Type:
text and corpus
Subject:
treebank , dependency , syntax , morphology , harmonized annotation , interset , universal tagset , and stanford dependencies
Language:
Ancient Greek (to 1453) , Arabic , Basque , Bulgarian , Croatian , Czech , Danish , Dutch , English , Estonian , Finnish , French , German , Gothic , Modern Greek (1453-) , Hebrew , Hindi , Hungarian , Indonesian , Irish , Italian , Japanese , Latin , Norwegian , Church Slavic , Persian , Polish , Portuguese , Romanian , Slovenian , Spanish , Swedish , Tamil , Catalan , Chinese , Galician , Kazakh , Latvian , Russian , and Turkish
Description:
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
Rights:
Licence Universal Dependencies v1.3 , https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-1.3 , and PUB
Creator:
Nivre, Joakim , Agić, Željko , Ahrenberg, Lars , Aranzabe, Maria Jesus , Asahara, Masayuki , Atutxa, Aitziber , Ballesteros, Miguel , Bauer, John , Bengoetxea, Kepa , Berzak, Yevgeni , Bhat, Riyaz Ahmad , Bick, Eckhard , Börstell, Carl , Bosco, Cristina , Bouma, Gosse , Bowman, Sam , Cebiroğlu Eryiğit, Gülşen , Celano, Giuseppe G. A. , Chalub, Fabricio , Çöltekin, Çağrı , Connor, Miriam , Davidson, Elizabeth , de Marneffe, Marie-Catherine , Diaz de Ilarraza, Arantza , Dobrovoljc, Kaja , Dozat, Timothy , Droganova, Kira , Dwivedi, Puneet , Eli, Marhaba , Erjavec, Tomaž , Farkas, Richárd , Foster, Jennifer , Freitas, Claudia , Gajdošová, Katarína , Galbraith, Daniel , Garcia, Marcos , Gärdenfors, Moa , Garza, Sebastian , Ginter, Filip , Goenaga, Iakes , Gojenola, Koldo , Gökırmak, Memduh , Goldberg, Yoav , Gómez Guinovart, Xavier , Gonzáles Saavedra, Berta , Grioni, Matias , Grūzītis, Normunds , Guillaume, Bruno , Hajič, Jan , Hà Mỹ, Linh , Haug, Dag , Hladká, Barbora , Ion, Radu , Irimia, Elena , Johannsen, Anders , Jørgensen, Fredrik , Kaşıkara, Hüner , Kanayama, Hiroshi , Kanerva, Jenna , Katz, Boris , Kenney, Jessica , Kotsyba, Natalia , Krek, Simon , Laippala, Veronika , Lam, Lucia , Lê Hồng, Phương , Lenci, Alessandro , Ljubešić, Nikola , Lyashevskaya, Olga , Lynn, Teresa , Makazhanov, Aibek , Manning, Christopher , Mărănduc, Cătălina , Mareček, David , Martínez Alonso, Héctor , Martins, André , Mašek, Jan , Matsumoto, Yuji , McDonald, Ryan , Missilä, Anna , Mititelu, Verginica , Miyao, Yusuke , Montemagni, Simonetta , Mori, Keiko Sophie , Mori, Shunsuke , Moskalevskyi, Bohdan , Muischnek, Kadri , Mustafina, Nina , Müürisep, Kaili , Nguyễn Thị, Lương , Nguyễn Thị Minh, Huyền , Nikolaev, Vitaly , Nurmi, Hanna , Osenova, Petya , Östling, Robert , Øvrelid, Lilja , Paiva, Valeria , Pascual, Elena , Passarotti, Marco , Perez, Cenel-Augusto , Petrov, Slav , Piitulainen, Jussi , Plank, Barbara , Popel, Martin , Pretkalniņa, Lauma , Prokopidis, Prokopis , Puolakainen, 
Tiina , Pyysalo, Sampo , Rademaker, Alexandre , Ramasamy, Loganathan , Real, Livy , Rituma, Laura , Rosa, Rudolf , Saleh, Shadi , Saulīte, Baiba , Schuster, Sebastian , Seeker, Wolfgang , Seraji, Mojgan , Shakurova, Lena , Shen, Mo , Silveira, Natalia , Simi, Maria , Simionescu, Radu , Simkó, Katalin , Šimková, Mária , Simov, Kiril , Smith, Aaron , Spadine, Carolyn , Suhr, Alane , Sulubacak, Umut , Szántó, Zsolt , Tanaka, Takaaki , Tsarfaty, Reut , Tyers, Francis , Uematsu, Sumire , Uria, Larraitz , van Noord, Gertjan , Varga, Viktor , Vincze, Veronika , Wallin, Lars , Wang, Jing Xian , Washington, Jonathan North , Wirén, Mats , Žabokrtský, Zdeněk , Zeldes, Amir , Zeman, Daniel , and Zhu, Hanzhi
Publisher:
Universal Dependencies Consortium
Type:
text and corpus
Subject:
treebank , dependency , syntax , morphology , harmonized annotation , interset , universal tagset , and stanford dependencies
Language:
Ancient Greek (to 1453) , Arabic , Basque , Bulgarian , Croatian , Czech , Danish , Dutch , English , Estonian , Finnish , French , German , Gothic , Modern Greek (1453-) , Hebrew , Hindi , Hungarian , Indonesian , Irish , Italian , Japanese , Latin , Norwegian , Church Slavic , Persian , Polish , Portuguese , Romanian , Slovenian , Spanish , Swedish , Tamil , Catalan , Chinese , Galician , Kazakh , Latvian , Russian , Turkish , Coptic , Sanskrit , Slovak , Swedish Sign Language , Ukrainian , Uighur , and Vietnamese
Description:
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
Rights:
Licence Universal Dependencies v1.4 , https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-1.4 , and PUB
Creator:
Nivre, Joakim , Agić, Željko , Ahrenberg, Lars , Aranzabe, Maria Jesus , Asahara, Masayuki , Atutxa, Aitziber , Ballesteros, Miguel , Bauer, John , Bengoetxea, Kepa , Bhat, Riyaz Ahmad , Bick, Eckhard , Bosco, Cristina , Bouma, Gosse , Bowman, Sam , Candito, Marie , Cebiroğlu Eryiğit, Gülşen , Celano, Giuseppe G. A. , Chalub, Fabricio , Choi, Jinho , Çöltekin, Çağrı , Connor, Miriam , Davidson, Elizabeth , de Marneffe, Marie-Catherine , de Paiva, Valeria , Diaz de Ilarraza, Arantza , Dobrovoljc, Kaja , Dozat, Timothy , Droganova, Kira , Dwivedi, Puneet , Eli, Marhaba , Erjavec, Tomaž , Farkas, Richárd , Foster, Jennifer , Freitas, Cláudia , Gajdošová, Katarína , Galbraith, Daniel , Garcia, Marcos , Ginter, Filip , Goenaga, Iakes , Gojenola, Koldo , Gökırmak, Memduh , Goldberg, Yoav , Gómez Guinovart, Xavier , Gonzáles Saavedra, Berta , Grioni, Matias , Grūzītis, Normunds , Guillaume, Bruno , Habash, Nizar , Hajič, Jan , Hà Mỹ, Linh , Haug, Dag , Hladká, Barbora , Hohle, Petter , Ion, Radu , Irimia, Elena , Johannsen, Anders , Jørgensen, Fredrik , Kaşıkara, Hüner , Kanayama, Hiroshi , Kanerva, Jenna , Kotsyba, Natalia , Krek, Simon , Laippala, Veronika , Lê Hồng, Phương , Lenci, Alessandro , Ljubešić, Nikola , Lyashevskaya, Olga , Lynn, Teresa , Makazhanov, Aibek , Manning, Christopher , Mărănduc, Cătălina , Mareček, David , Martínez Alonso, Héctor , Martins, André , Mašek, Jan , Matsumoto, Yuji , McDonald, Ryan , Missilä, Anna , Mititelu, Verginica , Miyao, Yusuke , Montemagni, Simonetta , More, Amir , Mori, Shunsuke , Moskalevskyi, Bohdan , Muischnek, Kadri , Mustafina, Nina , Müürisep, Kaili , Nguyễn Thị, Lương , Nguyễn Thị Minh, Huyền , Nikolaev, Vitaly , Nurmi, Hanna , Ojala, Stina , Osenova, Petya , Øvrelid, Lilja , Pascual, Elena , Passarotti, Marco , Perez, Cenel-Augusto , Perrier, Guy , Petrov, Slav , Piitulainen, Jussi , Plank, Barbara , Popel, Martin , Pretkalniņa, Lauma , Prokopidis, Prokopis , Puolakainen, Tiina , Pyysalo, Sampo , Rademaker, Alexandre , 
Ramasamy, Loganathan , Real, Livy , Rituma, Laura , Rosa, Rudolf , Saleh, Shadi , Sanguinetti, Manuela , Saulīte, Baiba , Schuster, Sebastian , Seddah, Djamé , Seeker, Wolfgang , Seraji, Mojgan , Shakurova, Lena , Shen, Mo , Sichinava, Dmitry , Silveira, Natalia , Simi, Maria , Simionescu, Radu , Simkó, Katalin , Šimková, Mária , Simov, Kiril , Smith, Aaron , Suhr, Alane , Sulubacak, Umut , Szántó, Zsolt , Taji, Dima , Tanaka, Takaaki , Tsarfaty, Reut , Tyers, Francis , Uematsu, Sumire , Uria, Larraitz , van Noord, Gertjan , Varga, Viktor , Vincze, Veronika , Washington, Jonathan North , Žabokrtský, Zdeněk , Zeldes, Amir , Zeman, Daniel , and Zhu, Hanzhi
Publisher:
Universal Dependencies Consortium
Type:
text and corpus
Subject:
treebank , dependency , syntax , morphology , harmonized annotation , interset , universal tagset , and stanford dependencies
Language:
Ancient Greek (to 1453) , Arabic , Basque , Bulgarian , Croatian , Czech , Danish , Dutch , English , Estonian , Finnish , French , German , Gothic , Modern Greek (1453-) , Hebrew , Hindi , Hungarian , Indonesian , Irish , Italian , Japanese , Latin , Norwegian , Church Slavic , Persian , Polish , Portuguese , Romanian , Slovenian , Spanish , Swedish , Tamil , Catalan , Chinese , Galician , Kazakh , Latvian , Russian , Turkish , Coptic , Sanskrit , Slovak , Ukrainian , Uighur , Vietnamese , Belarusian , Korean , Lithuanian , and Urdu
Description:
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
This release is special in that the treebanks will be used as training/development data in the CoNLL 2017 shared task (http://universaldependencies.org/conll17/). Test data are not released, except for the few treebanks that do not take part in the shared task. 64 treebanks will be in the shared task, and they correspond to the following 45 languages: Ancient Greek, Arabic, Basque, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Gothic, Greek, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Kazakh, Korean, Latin, Latvian, Norwegian, Old Church Slavonic, Persian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Turkish, Ukrainian, Urdu, Uyghur and Vietnamese.
This release fixes a bug in http://hdl.handle.net/11234/1-1976. Changed files: ud-tools-v2.0.tgz (conllu_to_text.pl, conllu_to_conllx.pl; added text_without_spaces.pl), ud-treebanks-conll2017.tgz (fi_ftb-ud-train.txt, he-ud-train.txt, it-ud-train.txt, pt_br-ud-train.txt, es-ud-train.txt) and ud-treebanks-v2.0.tgz (fi_ftb-ud-train.txt, he-ud-train.txt, it-ud-train.txt, pt_br-ud-train.txt, es-ud-train.txt, ar_nyuad-ud-dev.txt, ar_nyuad-ud-test.txt, ar_nyuad-ud-train.txt, cop-ud-dev.txt, cop-ud-test.txt, cop-ud-train.txt, sa-ud-dev.txt, sa-ud-test.txt, sa-ud-train.txt).
Rights:
Licence Universal Dependencies v2.0 , https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.0 , and PUB
Creator:
Nivre, Joakim , Agić, Željko , Ahrenberg, Lars , Antonsen, Lene , Aranzabe, Maria Jesus , Asahara, Masayuki , Ateyah, Luma , Attia, Mohammed , Atutxa, Aitziber , Badmaeva, Elena , Ballesteros, Miguel , Banerjee, Esha , Bank, Sebastian , Bauer, John , Bengoetxea, Kepa , Bhat, Riyaz Ahmad , Bick, Eckhard , Bosco, Cristina , Bouma, Gosse , Bowman, Sam , Burchardt, Aljoscha , Candito, Marie , Caron, Gauthier , Cebiroğlu Eryiğit, Gülşen , Celano, Giuseppe G. A. , Cetin, Savas , Chalub, Fabricio , Choi, Jinho , Cho, Yongseok , Cinková, Silvie , Çöltekin, Çağrı , Connor, Miriam , de Marneffe, Marie-Catherine , de Paiva, Valeria , Diaz de Ilarraza, Arantza , Dobrovoljc, Kaja , Dozat, Timothy , Droganova, Kira , Eli, Marhaba , Elkahky, Ali , Erjavec, Tomaž , Farkas, Richárd , Fernandez Alcalde, Hector , Foster, Jennifer , Freitas, Cláudia , Gajdošová, Katarína , Galbraith, Daniel , Garcia, Marcos , Ginter, Filip , Goenaga, Iakes , Gojenola, Koldo , Gökırmak, Memduh , Goldberg, Yoav , Gómez Guinovart, Xavier , Gonzáles Saavedra, Berta , Grioni, Matias , Grūzītis, Normunds , Guillaume, Bruno , Habash, Nizar , Hajič, Jan , Hajič jr., Jan , Hà Mỹ, Linh , Harris, Kim , Haug, Dag , Hladká, Barbora , Hlaváčová, Jaroslava , Hohle, Petter , Ion, Radu , Irimia, Elena , Johannsen, Anders , Jørgensen, Fredrik , Kaşıkara, Hüner , Kanayama, Hiroshi , Kanerva, Jenna , Kayadelen, Tolga , Kettnerová, Václava , Kirchner, Jesse , Kotsyba, Natalia , Krek, Simon , Kwak, Sookyoung , Laippala, Veronika , Lambertino, Lorenzo , Lando, Tatiana , Lê Hồng, Phương , Lenci, Alessandro , Lertpradit, Saran , Leung, Herman , Li, Cheuk Ying , Li, Josie , Ljubešić, Nikola , Loginova, Olga , Lyashevskaya, Olga , Lynn, Teresa , Macketanz, Vivien , Makazhanov, Aibek , Mandl, Michael , Manning, Christopher , Manurung, Ruli , Mărănduc, Cătălina , Mareček, David , Marheinecke, Katrin , Martínez Alonso, Héctor , Martins, André , Mašek, Jan , Matsumoto, Yuji , McDonald, Ryan , Mendonça, Gustavo , Missilä, Anna , 
Mititelu, Verginica , Miyao, Yusuke , Montemagni, Simonetta , More, Amir , Moreno Romero, Laura , Mori, Shunsuke , Moskalevskyi, Bohdan , Muischnek, Kadri , Mustafina, Nina , Müürisep, Kaili , Nainwani, Pinkey , Nedoluzhko, Anna , Nguyễn Thị, Lương , Nguyễn Thị Minh, Huyền , Nikolaev, Vitaly , Nitisaroj, Rattima , Nurmi, Hanna , Ojala, Stina , Osenova, Petya , Øvrelid, Lilja , Pascual, Elena , Passarotti, Marco , Perez, Cenel-Augusto , Perrier, Guy , Petrov, Slav , Piitulainen, Jussi , Pitler, Emily , Plank, Barbara , Popel, Martin , Pretkalniņa, Lauma , Prokopidis, Prokopis , Puolakainen, Tiina , Pyysalo, Sampo , Rademaker, Alexandre , Real, Livy , Reddy, Siva , Rehm, Georg , Rinaldi, Larissa , Rituma, Laura , Rosa, Rudolf , Rovati, Davide , Saleh, Shadi , Sanguinetti, Manuela , Saulīte, Baiba , Sawanakunanon, Yanin , Schuster, Sebastian , Seddah, Djamé , Seeker, Wolfgang , Seraji, Mojgan , Shakurova, Lena , Shen, Mo , Shimada, Atsuko , Shohibussirri, Muh , Silveira, Natalia , Simi, Maria , Simionescu, Radu , Simkó, Katalin , Šimková, Mária , Simov, Kiril , Smith, Aaron , Stella, Antonio , Strnadová, Jana , Suhr, Alane , Sulubacak, Umut , Szántó, Zsolt , Taji, Dima , Tanaka, Takaaki , Trosterud, Trond , Trukhina, Anna , Tsarfaty, Reut , Tyers, Francis , Uematsu, Sumire , Urešová, Zdeňka , Uria, Larraitz , Uszkoreit, Hans , van Noord, Gertjan , Varga, Viktor , Vincze, Veronika , Washington, Jonathan North , Yu, Zhuoran , Žabokrtský, Zdeněk , Zeman, Daniel , and Zhu, Hanzhi
Publisher:
Universal Dependencies Consortium
Type:
text and corpus
Subject:
treebank , dependency , syntax , morphology , harmonized annotation , interset , universal tagset , and stanford dependencies
Language:
Ancient Greek (to 1453) , Arabic , Basque , Bulgarian , Croatian , Czech , Danish , Dutch , English , Estonian , Finnish , French , German , Gothic , Modern Greek (1453-) , Hebrew , Hindi , Hungarian , Indonesian , Irish , Italian , Japanese , Latin , Norwegian , Church Slavic , Persian , Polish , Portuguese , Romanian , Slovenian , Spanish , Swedish , Tamil , Catalan , Chinese , Galician , Kazakh , Latvian , Russian , Turkish , Coptic , Sanskrit , Slovak , Ukrainian , Uighur , Vietnamese , Belarusian , Korean , Lithuanian , Urdu , Northern Sami , Upper Sorbian , Russia Buriat , and Northern Kurdish
Description:
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
This release contains the test data used in the CoNLL 2017 shared task on parsing Universal Dependencies. Because of the shared task, the test data were kept hidden and were not released together with the training and development data of UD 2.0. This release therefore complements the UD 2.0 release (http://hdl.handle.net/11234/1-1983); together they form a full release of the UD treebanks. In addition, the present release contains 18 new parallel test sets and 4 test sets in surprise languages. The present release also includes the development data already released with UD 2.0. Unlike regular UD releases, this one uses the folder-file structure that was visible to the systems participating in the shared task.
Rights:
Licence Universal Dependencies v2.0 , https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.0 , and PUB
Creator:
Nivre, Joakim , Agić, Željko , Ahrenberg, Lars , Aranzabe, Maria Jesus , Asahara, Masayuki , Atutxa, Aitziber , Ballesteros, Miguel , Bauer, John , Bengoetxea, Kepa , Bhat, Riyaz Ahmad , Bick, Eckhard , Bosco, Cristina , Bouma, Gosse , Bowman, Sam , Candito, Marie , Cebiroğlu Eryiğit, Gülşen , Celano, Giuseppe G. A. , Chalub, Fabricio , Choi, Jinho , Çöltekin, Çağrı , Connor, Miriam , Davidson, Elizabeth , de Marneffe, Marie-Catherine , de Paiva, Valeria , Diaz de Ilarraza, Arantza , Dobrovoljc, Kaja , Dozat, Timothy , Droganova, Kira , Dwivedi, Puneet , Eli, Marhaba , Erjavec, Tomaž , Farkas, Richárd , Foster, Jennifer , Freitas, Cláudia , Gajdošová, Katarína , Galbraith, Daniel , Garcia, Marcos , Ginter, Filip , Goenaga, Iakes , Gojenola, Koldo , Gökırmak, Memduh , Goldberg, Yoav , Gómez Guinovart, Xavier , Gonzáles Saavedra, Berta , Grioni, Matias , Grūzītis, Normunds , Guillaume, Bruno , Habash, Nizar , Hajič, Jan , Hà Mỹ, Linh , Haug, Dag , Hladká, Barbora , Hohle, Petter , Ion, Radu , Irimia, Elena , Johannsen, Anders , Jørgensen, Fredrik , Kaşıkara, Hüner , Kanayama, Hiroshi , Kanerva, Jenna , Kotsyba, Natalia , Krek, Simon , Laippala, Veronika , Lê Hồng, Phương , Lenci, Alessandro , Ljubešić, Nikola , Lyashevskaya, Olga , Lynn, Teresa , Makazhanov, Aibek , Manning, Christopher , Mărănduc, Cătălina , Mareček, David , Martínez Alonso, Héctor , Martins, André , Mašek, Jan , Matsumoto, Yuji , McDonald, Ryan , Missilä, Anna , Mititelu, Verginica , Miyao, Yusuke , Montemagni, Simonetta , More, Amir , Mori, Shunsuke , Moskalevskyi, Bohdan , Muischnek, Kadri , Mustafina, Nina , Müürisep, Kaili , Nguyễn Thị, Lương , Nguyễn Thị Minh, Huyền , Nikolaev, Vitaly , Nurmi, Hanna , Ojala, Stina , Osenova, Petya , Øvrelid, Lilja , Pascual, Elena , Passarotti, Marco , Perez, Cenel-Augusto , Perrier, Guy , Petrov, Slav , Piitulainen, Jussi , Plank, Barbara , Popel, Martin , Pretkalniņa, Lauma , Prokopidis, Prokopis , Puolakainen, Tiina , Pyysalo, Sampo , Rademaker, Alexandre , 
Ramasamy, Loganathan , Real, Livy , Rituma, Laura , Rosa, Rudolf , Saleh, Shadi , Sanguinetti, Manuela , Saulīte, Baiba , Schuster, Sebastian , Seddah, Djamé , Seeker, Wolfgang , Seraji, Mojgan , Shakurova, Lena , Shen, Mo , Sichinava, Dmitry , Silveira, Natalia , Simi, Maria , Simionescu, Radu , Simkó, Katalin , Šimková, Mária , Simov, Kiril , Smith, Aaron , Suhr, Alane , Sulubacak, Umut , Szántó, Zsolt , Taji, Dima , Tanaka, Takaaki , Tsarfaty, Reut , Tyers, Francis , Uematsu, Sumire , Uria, Larraitz , van Noord, Gertjan , Varga, Viktor , Vincze, Veronika , Washington, Jonathan North , Žabokrtský, Zdeněk , Zeldes, Amir , Zeman, Daniel , and Zhu, Hanzhi
Publisher:
Universal Dependencies Consortium
Type:
text and corpus
Subject:
treebank , dependency , syntax , morphology , harmonized annotation , interset , universal tagset , and stanford dependencies
Language:
Ancient Greek (to 1453) , Arabic , Basque , Bulgarian , Croatian , Czech , Danish , Dutch , English , Estonian , Finnish , French , German , Gothic , Modern Greek (1453-) , Hebrew , Hindi , Hungarian , Indonesian , Irish , Italian , Japanese , Latin , Norwegian , Church Slavic , Persian , Polish , Portuguese , Romanian , Slovenian , Spanish , Swedish , Tamil , Catalan , Chinese , Galician , Kazakh , Latvian , Russian , Turkish , Coptic , Sanskrit , Slovak , Ukrainian , Uighur , Vietnamese , Belarusian , Korean , Lithuanian , and Urdu
Description:
This release contains errors in several files. Please use http://hdl.handle.net/11234/1-1983 instead.
Rights:
Licence Universal Dependencies v2.0 , https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.0 , and PUB