Skip to search
Skip to main content
Skip to first result
Search
Search Results
Creator:
Ramisch, Carlos , Cordeiro, Silvio Ricardo , Savary, Agata , Vincze, Veronika , Barbu Mititelu, Verginica , Bhatia, Archna , Buljan, Maja , Candito, Marie , Gantar, Polona , Giouli, Voula , Güngör, Tunga , Hawwari, Abdelati , Iñurrieta, Uxoa , Kovalevskaitė, Jolanta , Krek, Simon , Lichte, Timm , Liebeskind, Chaya , Monti, Johanna , Parra Escartín, Carla , QasemiZadeh, Behrang , Ramisch, Renata , Schneider, Nathan , Stoyanova, Ivelina , Vaidya, Ashwini , Walsh, Abigail , Aceta, Cristina , Aduriz, Itziar , Antoine, Jean-Yves , Arhar Holdt, Špela , Berk, Gözde , Bielinskienė, Agnė , Blagus, Goranka , Boizou, Loic , Bonial, Claire , Caruso, Valeria , Čibej, Jaka , Constant, Matthieu , Cook, Paul , Diab, Mona , Dimitrova, Tsvetana , Ehren, Rafael , Elbadrashiny, Mohamed , Elyovich, Hevi , Erden, Berna , Estarrona, Ainara , Fotopoulou, Aggeliki , Foufi, Vassiliki , Geeraert, Kristina , van Gompel, Maarten , Gonzalez, Itziar , Gurrutxaga, Antton , Ha-Cohen Kerner, Yaakov , Ibrahim, Rehab , Ionescu, Mihaela , Jain, Kanishka , Jazbec, Ivo-Pavao , Kavčič, Teja , Klyueva, Natalia , Kocijan, Kristina , Kovács, Viktória , Kuzman, Taja , Leseva, Svetlozara , Ljubešić, Nikola , Malka, Ruth , Markantonatou, Stella , Martínez Alonso, Héctor , Matas, Ivana , McCrae, John , de Medeiros Caseli, Helena , Onofrei, Mihaela , Palka-Binkiewicz, Emilia , Papadelli, Stella , Parmentier, Yannick , Pascucci, Antonio , Pasquer, Caroline , Pia di Buono, Maria , Puri, Vandana , Raffone, Annalisa , Ratori, Shraddha , Riccio, Anna , Sangati, Federico , Shukla, Vishakha , Simkó, Katalin , Šnajder, Jan , Somers, Clarissa , Srivastava, Shubham , Stefanova, Valentina , Taslimipoor, Shiva , Theoxari, Natasa , Todorova, Maria , Urizar, Ruben , Villavicencio, Aline , and Zilio, Leonardo
Publisher:
PARSEME
Type:
text and corpus
Subject:
Multiword expressions , verbal multiword expressions , light-verb constructions , verb-particle constructions , inherently reflexive verbs , verbal idioms , and multi-verb constructions
Language:
Bulgarian , German , Modern Greek (1453-) , Spanish , Persian , French , Hebrew , Hungarian , Italian , Lithuanian , Polish , Portuguese , Romanian , Slovenian , Turkish , Hindi , Basque , English , and Croatian
Description:
This multilingual resource contains corpora in which verbal MWEs have been manually annotated. VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do). VMWEs were annotated according to the universal guidelines in 19 languages. The corpora are provided in the cupt format, inspired by the CONLL-U format. The corpora were used in the 1.1 edition of the PARSEME Shared Task (2018).
For most languages, morphological and syntactic information – not necessarily using UD tagsets – including parts of speech, lemmas, morphological features and/or syntactic dependencies are also provided. Depending on the language, the information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe).
This item contains training, development and test data, as well as the evaluation tools used in the PARSEME Shared Task 1.1 (2018).
The annotation guidelines are available online: http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.1
Rights:
PARSEME Shared Task Data (v. 1.1) Agreement , https://lindat.mff.cuni.cz/repository/xmlui/page/licence-mwe-1.1 , and PUB
Creator:
Gurevych, Iryna , Habernal, Ivan , and Zayed, Omnia
Publisher:
Technische Universität Darmstadt
Type:
text and corpus
Subject:
CommonCrawl , Creative Commons , Web corpus , and Amazon Web Services
Language:
Afrikaans , Arabic , Bengali , Bulgarian , Czech , Danish , German , Modern Greek (1453-) , English , Estonian , Persian , Finnish , French , Hebrew , Hindi , Croatian , Hungarian , Indonesian , Italian , Japanese , Kannada , Korean , Latvian , Lithuanian , Malayalam , Macedonian , Nepali (macrolanguage) , Dutch , Norwegian , Panjabi , Polish , Portuguese , Romanian , Russian , Slovak , Slovenian , Somali , Spanish , Albanian , Swahili (macrolanguage) , Swedish , Tamil , Telugu , Tagalog , Thai , Turkish , Ukrainian , Undetermined , Vietnamese , and Chinese
Description:
A large web corpus (over 10 billion tokens) licensed under CreativeCommons license family in 50+ languages that has been extracted from CommonCrawl, the largest publicly available general Web crawl to date with about 2 billion crawled URLs.
Rights:
Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) , http://creativecommons.org/licenses/by-nc/4.0/ , and PUB
Creator:
Gurevych, Iryna , Habernal, Ivan , and Zayed, Omnia
Publisher:
Technische Universität Darmstadt
Type:
text and corpus
Subject:
CommonCrawl , Creative Commons , Web corpus , and Amazon Web Services
Language:
Afrikaans , Arabic , Bengali , Bulgarian , Czech , Danish , German , Modern Greek (1453-) , English , Estonian , Persian , Finnish , French , Gujarati , Hebrew , Hindi , Croatian , Hungarian , Indonesian , Italian , Japanese , Kannada , Korean , Latvian , Lithuanian , Malayalam , Marathi , Macedonian , Nepali (macrolanguage) , Dutch , Norwegian , Polish , Portuguese , Romanian , Russian , Slovak , Slovenian , Somali , Spanish , Albanian , Swahili (macrolanguage) , Swedish , Tamil , Telugu , Tagalog , Thai , Turkish , Ukrainian , Undetermined , Urdu , Vietnamese , and Chinese
Description:
A large web corpus (over 10 billion tokens) licensed under CreativeCommons license family in 50+ languages that has been extracted from CommonCrawl, the largest publicly available general Web crawl to date with about 2 billion crawled URLs.
Rights:
Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) , http://creativecommons.org/licenses/by-nc-nd/4.0/ , and PUB
Creator:
Gurevych, Iryna , Habernal, Ivan , and Zayed, Omnia
Publisher:
Technische Universität Darmstadt
Type:
text and corpus
Subject:
CommonCrawl , Creative Commons , Web corpus , and Amazon Web Services
Language:
Afrikaans , Arabic , Bengali , Bulgarian , Czech , Danish , German , Modern Greek (1453-) , English , Estonian , Persian , Finnish , French , Gujarati , Hebrew , Hindi , Croatian , Hungarian , Indonesian , Italian , Japanese , Korean , Latvian , Lithuanian , Malayalam , Marathi , Macedonian , Nepali (macrolanguage) , Dutch , Norwegian , Polish , Portuguese , Romanian , Russian , Slovak , Slovenian , Somali , Spanish , Albanian , Swahili (macrolanguage) , Swedish , Tamil , Telugu , Tagalog , Thai , Turkish , Ukrainian , Undetermined , Urdu , Vietnamese , and Chinese
Description:
A large web corpus (over 10 billion tokens) licensed under CreativeCommons license family in 50+ languages that has been extracted from CommonCrawl, the largest publicly available general Web crawl to date with about 2 billion crawled URLs.
Rights:
Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) , http://creativecommons.org/licenses/by-nc-sa/4.0/ , and PUB
Creator:
Gurevych, Iryna , Habernal, Ivan , and Zayed, Omnia
Publisher:
Technische Universität Darmstadt
Type:
text and corpus
Subject:
CommonCrawl , Creative Commons , Web corpus , and Amazon Web Services
Language:
Afrikaans , Arabic , Bengali , Bulgarian , Czech , Danish , German , Modern Greek (1453-) , English , Estonian , Persian , Finnish , French , Gujarati , Hebrew , Hindi , Croatian , Hungarian , Indonesian , Italian , Japanese , Korean , Latvian , Lithuanian , Malayalam , Macedonian , Dutch , Norwegian , Polish , Portuguese , Romanian , Russian , Slovak , Slovenian , Somali , Spanish , Albanian , Swahili (macrolanguage) , Swedish , Tamil , Tagalog , Thai , Turkish , Ukrainian , Undetermined , Vietnamese , and Chinese
Description:
A large web corpus (over 10 billion tokens) licensed under CreativeCommons license family in 50+ languages that has been extracted from CommonCrawl, the largest publicly available general Web crawl to date with about 2 billion crawled URLs.
Rights:
Creative Commons - Attribution-NoDerivatives 4.0 International (CC BY-ND 4.0) , http://creativecommons.org/licenses/by-nc/4.0/ , and PUB
Creator:
Gurevych, Iryna , Habernal, Ivan , and Zayed, Omnia
Publisher:
Technische Universität Darmstadt
Type:
text and corpus
Subject:
CommonCrawl , Creative Commons , Web corpus , and Amazon Web Services
Language:
Afrikaans , Arabic , Bengali , Bulgarian , Czech , Danish , German , Modern Greek (1453-) , English , Estonian , Persian , Finnish , French , Gujarati , Hebrew , Hindi , Croatian , Hungarian , Indonesian , Italian , Japanese , Kannada , Korean , Latvian , Lithuanian , Malayalam , Marathi , Macedonian , Nepali (macrolanguage) , Dutch , Norwegian , Panjabi , Polish , Portuguese , Romanian , Russian , Slovak , Slovenian , Somali , Spanish , Albanian , Swahili (macrolanguage) , Swedish , Tamil , Telugu , Tagalog , Thai , Turkish , Ukrainian , Undetermined , Urdu , Vietnamese , and Chinese
Description:
A large web corpus (over 10 billion tokens) licensed under CreativeCommons license family in 50+ languages that has been extracted from CommonCrawl, the largest publicly available general Web crawl to date with about 2 billion crawled URLs.
Rights:
Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) , http://creativecommons.org/licenses/by-sa/4.0/ , and PUB
Creator:
Gurevych, Iryna , Habernal, Ivan , and Zayed, Omnia
Publisher:
Technische Universität Darmstadt
Type:
text and corpus
Subject:
CommonCrawl , Creative Commons , Web corpus , and Amazon Web Services
Language:
Afrikaans , Arabic , Bengali , Bulgarian , Czech , Danish , German , Modern Greek (1453-) , English , Estonian , Persian , Finnish , French , Gujarati , Hebrew , Hindi , Croatian , Hungarian , Indonesian , Italian , Japanese , Kannada , Korean , Latvian , Lithuanian , Malayalam , Marathi , Macedonian , Nepali (macrolanguage) , Dutch , Norwegian , Panjabi , Polish , Portuguese , Romanian , Russian , Slovak , Slovenian , Somali , Spanish , Albanian , Swahili (macrolanguage) , Swedish , Tamil , Telugu , Tagalog , Thai , Turkish , Ukrainian , Undetermined , Urdu , Vietnamese , and Chinese
Description:
A large web corpus (over 10 billion tokens) licensed under CreativeCommons license family in 50+ languages that has been extracted from CommonCrawl, the largest publicly available general Web crawl to date with about 2 billion crawled URLs.
Rights:
Creative Commons - Attribution 4.0 International (CC BY 4.0) , http://creativecommons.org/licenses/by/4.0/ , and PUB
Creator:
Gurevych, Iryna , Habernal, Ivan , and Zayed, Omnia
Publisher:
Technische Universität Darmstadt
Type:
text and corpus
Subject:
CommonCrawl , Creative Commons , Web corpus , and Amazon Web Services
Language:
Afrikaans , Arabic , Bulgarian , Czech , Danish , German , Modern Greek (1453-) , English , Estonian , Persian , Finnish , French , Croatian , Hungarian , Indonesian , Italian , Japanese , Korean , Latvian , Lithuanian , Dutch , Norwegian , Polish , Portuguese , Russian , Slovenian , Somali , Spanish , Swahili (macrolanguage) , Swedish , Tagalog , Thai , Turkish , Ukrainian , Undetermined , and Vietnamese
Description:
A large web corpus (over 10 billion tokens) licensed under CreativeCommons license family in 50+ languages that has been extracted from CommonCrawl, the largest publicly available general Web crawl to date with about 2 billion crawled URLs.
Rights:
Public Domain Mark (PD) , http://creativecommons.org/publicdomain/mark/1.0/ , and PUB
Type:
text , dokumenty , and edice
Subject:
Mezinárodní vztahy, světová politika , dějiny československé , vztahy mezinárodní , politika zahraniční , diplomacie , Československo 1945-1948 , and zahraniční politika, mezinárodní vztahy
Language:
Czech , English , Croatian , Polish , and Slovak
Rights:
unknown
Creator:
Kranjčević, Jasenka
Type:
text , statický obraz , and monografie
Subject:
Architektura , architekti čeští , turistika , hotely , domy lázeňské , české země 1848-1918 , Československo 1918-1938 , architektura, architekti , and jednotlivé sporty, turistika
Language:
Czech , English , and Croatian
Description:
Katalog výstavy
Rights:
unknown