1 - 100 of 145
Search Results
2. 1968 din primăvară până în toamnă :
- Creator:
- Retegan, Mihai,
- Publisher:
- RAO,
- Type:
- monograph
- Subject:
- International relations, world politics, Prague Spring (1968), Romanian-Czechoslovak relations, Czechoslovakia 1948-1969, Munich 1938, Prague Spring 1968, occupation 1939, 1968, Romania, world history from 1945 to the present, and foreign policy, international relations
- Language:
- Romanian
- Rights:
- unknown
3. 1968. Primăvara de la Praga. Documente diplomatice, ianuarie 1968 - aprilie 1969 /
- Creator:
- Preda, Dumitru
- Publisher:
- MondoMedia,
- Subject:
- Prague Spring (1968), Romanian-Czechoslovak relations, foreign policy, diplomatic documents, foreign policy, international relations, Munich 1938, Prague Spring 1968, occupation 1939, 1968, world history from 1945 to the present, Romania, and Czechoslovakia 1948-1969
- Language:
- Romanian
- Rights:
- unknown
4. Abatele Zavoral: vizitele unui prieten /
- Creator:
- Felix, Jiří,
- Subject:
- Zavoral, Method Jan, abbots, Premonstratensian order, Czech-Romanian relations, World War I (1914-1918), foreign policy, international relations, world history 1918-1945, Romania, and Czechoslovakia 1918-1945
- Language:
- Romanian
- Rights:
- unknown
5. Annotated corpora and tools of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions (edition 1.0)
- Creator:
- Savary, Agata, Ramisch, Carlos, Cordeiro, Silvio Ricardo, Sangati, Federico, Vincze, Veronika, QasemiZadeh, Behrang, Candito, Marie, Cap, Fabienne, Giouli, Voula, Stoyanova, Ivelina, Doucet, Antoine, Adalı, Kübra, Barbu Mititelu, Verginica, Bejček, Eduard, El Maarouf, Ismail, Eryiğit, Gülşen, Galea, Luke, Ha-Cohen Kerner, Yaakov, Liebeskind, Chaya, Monti, Johanna, Parra Escartín, Carla, Kovalevskaitė, Jolanta, Krek, Simon, van der Plas, Lonneke, Aceta, Cristina, Aduriz, Itziar, Antoine, Jean-Yves, Attard, Greta, Azzopardi, Kirsty, Boizou, Loic, Bonnici, Janice, Boz, Mert, Bumbulienė, Ieva, Busuttil, Jael, Caruso, Valeria, Cherchi, Manuela, Constant, Matthieu, Czerepowicka, Monika, De Santis, Anna, Dimitrova, Tsvetana, Dinç, Tutkum, Elyovich, Hevi, Fabri, Ray, Farrugia, Alison, Findlay, Jamie, Fotopoulou, Aggeliki, Foufi, Vassiliki, Galea, Sara Anne, Gantar, Polona, Gatt, Albert, Gatt, Anabelle, Herrero, Carlos, Iñurrieta, Uxoa, Jagfeld, Glorianna, Hnátková, Milena, Ionescu, Mihaela, Klyueva, Natalia, Koeva, Svetla, Kovács, Viktória, Kuzman, Taja, Leseva, Svetlozara, Louisou, Sevi, Lynn, Teresa, Malka, Ruth, Martínez Alonso, Héctor, McCrae, John, de Medeiros Caseli, Helena, Miral, Ayşenur, Muscat, Amanda, Nivre, Joakim, Oakes, Michael, Onofrei, Mihaela, Parmentier, Yannick, Pasquer, Caroline, Pia di Buono, Maria, Priego Sanchez, Belem, Raffone, Annalisa, Ramisch, Renata, Rimkutė, Erika, Rizea, Monica-Mihaela, Simkó, Katalin, Spagnol, Michael, Stefanova, Valentina, Stymne, Sara, Sulubacak, Umut, Tabone, Nicole, Tanti, Marc, Todorova, Maria, Urešová, Zdenka, Villavicencio, Aline, and Zilio, Leonardo
- Publisher:
- PARSEME
- Type:
- text and corpus
- Subject:
- Multiword expressions, verbal multiword expressions, idioms, light-verb constructions, verb-particle constructions, and inherently reflexive verbs
- Language:
- Bulgarian, Czech, German, Modern Greek (1453-), Spanish, Persian, French, Hebrew, Hungarian, Italian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovenian, Swedish, and Turkish
- Description:
- The PARSEME shared task aims at identifying verbal MWEs (VMWEs) in running texts. VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), and inherently reflexive verbs (se suicider 'to commit suicide' in French). VMWEs were annotated according to the universal guidelines in 18 languages. The corpora are provided in the parsemetsv format, inspired by the CoNLL-U format. For most languages, paired files in the CoNLL-U format (not necessarily using UD tagsets) containing parts of speech, lemmas, morphological features and/or syntactic dependencies are also provided. Depending on the language, the information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). This item contains training and test data, tools and the universal guidelines file.
- Rights:
- PARSEME Shared Task Data (v. 1.0) Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-mwe-1.0, and PUB
6. Annotated corpora and tools of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions (edition 1.1)
- Creator:
- Ramisch, Carlos, Cordeiro, Silvio Ricardo, Savary, Agata, Vincze, Veronika, Barbu Mititelu, Verginica, Bhatia, Archna, Buljan, Maja, Candito, Marie, Gantar, Polona, Giouli, Voula, Güngör, Tunga, Hawwari, Abdelati, Iñurrieta, Uxoa, Kovalevskaitė, Jolanta, Krek, Simon, Lichte, Timm, Liebeskind, Chaya, Monti, Johanna, Parra Escartín, Carla, QasemiZadeh, Behrang, Ramisch, Renata, Schneider, Nathan, Stoyanova, Ivelina, Vaidya, Ashwini, Walsh, Abigail, Aceta, Cristina, Aduriz, Itziar, Antoine, Jean-Yves, Arhar Holdt, Špela, Berk, Gözde, Bielinskienė, Agnė, Blagus, Goranka, Boizou, Loic, Bonial, Claire, Caruso, Valeria, Čibej, Jaka, Constant, Matthieu, Cook, Paul, Diab, Mona, Dimitrova, Tsvetana, Ehren, Rafael, Elbadrashiny, Mohamed, Elyovich, Hevi, Erden, Berna, Estarrona, Ainara, Fotopoulou, Aggeliki, Foufi, Vassiliki, Geeraert, Kristina, van Gompel, Maarten, Gonzalez, Itziar, Gurrutxaga, Antton, Ha-Cohen Kerner, Yaakov, Ibrahim, Rehab, Ionescu, Mihaela, Jain, Kanishka, Jazbec, Ivo-Pavao, Kavčič, Teja, Klyueva, Natalia, Kocijan, Kristina, Kovács, Viktória, Kuzman, Taja, Leseva, Svetlozara, Ljubešić, Nikola, Malka, Ruth, Markantonatou, Stella, Martínez Alonso, Héctor, Matas, Ivana, McCrae, John, de Medeiros Caseli, Helena, Onofrei, Mihaela, Palka-Binkiewicz, Emilia, Papadelli, Stella, Parmentier, Yannick, Pascucci, Antonio, Pasquer, Caroline, Pia di Buono, Maria, Puri, Vandana, Raffone, Annalisa, Ratori, Shraddha, Riccio, Anna, Sangati, Federico, Shukla, Vishakha, Simkó, Katalin, Šnajder, Jan, Somers, Clarissa, Srivastava, Shubham, Stefanova, Valentina, Taslimipoor, Shiva, Theoxari, Natasa, Todorova, Maria, Urizar, Ruben, Villavicencio, Aline, and Zilio, Leonardo
- Publisher:
- PARSEME
- Type:
- text and corpus
- Subject:
- Multiword expressions, verbal multiword expressions, light-verb constructions, verb-particle constructions, inherently reflexive verbs, verbal idioms, and multi-verb constructions
- Language:
- Bulgarian, German, Modern Greek (1453-), Spanish, Persian, French, Hebrew, Hungarian, Italian, Lithuanian, Polish, Portuguese, Romanian, Slovenian, Turkish, Hindi, Basque, English, and Croatian
- Description:
- This multilingual resource contains corpora in which verbal MWEs (VMWEs) have been manually annotated. VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do). VMWEs were annotated according to the universal guidelines in 19 languages. The corpora are provided in the cupt format, inspired by the CoNLL-U format, and were used in the 1.1 edition of the PARSEME Shared Task (2018). For most languages, morphological and syntactic information (not necessarily using UD tagsets), including parts of speech, lemmas, morphological features and/or syntactic dependencies, is also provided. Depending on the language, the information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). This item contains training, development and test data, as well as the evaluation tools used in the PARSEME Shared Task 1.1 (2018). The annotation guidelines are available online: http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.1
- Rights:
- PARSEME Shared Task Data (v. 1.1) Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-mwe-1.1, and PUB
7. Annotated corpora and tools of the PARSEME Shared Task on Semi-Supervised Identification of Verbal Multiword Expressions (edition 1.2)
- Creator:
- Ramisch, Carlos, Guillaume, Bruno, Savary, Agata, Waszczuk, Jakub, Candito, Marie, Vaidya, Ashwini, Barbu Mititelu, Verginica, Bhatia, Archna, Iñurrieta, Uxoa, Giouli, Voula, Güngör, Tunga, Jiang, Menghan, Lichte, Timm, Liebeskind, Chaya, Monti, Johanna, Ramisch, Renata, Stymme, Sara, Walsh, Abigail, Xu, Hongzhi, Palka-Binkiewicz, Emilia, Ehren, Rafael, Stymne, Sara, Constant, Matthieu, Pasquer, Caroline, Parmentier, Yannick, Antoine, Jean-Yves, Carlino, Carola, Caruso, Valeria, Di Buono, Maria Pia, Pascucci, Antonio, Raffone, Annalisa, Riccio, Anna, Sangati, Federico, Speranza, Giulia, Cordeiro, Silvio Ricardo, de Medeiros Caseli, Helena, Miranda, Isaac, Rademaker, Alexandre, Vale, Oto, Villavicencio, Aline, Wick Pedro, Gabriela, Wilkens, Rodrigo, Zilio, Leonardo, Rizea, Monica-Mihaela, Ionescu, Mihaela, Onofrei, Mihaela, Chen, Jia, Ge, Xiaomin, Hu, Fangyuan, Hu, Sha, Li, Minli, Liu, Siyuan, Qin, Zhenzhen, Sun, Ruilong, Wang, Chenweng, Xiao, Huangyang, Yan, Peiyi, Yih, Tsy, Yu, Ke, Yu, Songping, Zeng, Si, Zhang, Yongchen, Zhao, Yun, Foufi, Vassiliki, Fotopoulou, Aggeliki, Markantonatou, Stella, Papadelli, Stella, Louizou, Sevasti, Aduriz, Itziar, Estarrona, Ainara, Gonzalez, Itziar, Gurrutxaga, Antton, Uria, Larraitz, Urizar, Ruben, Foster, Jennifer, Lynn, Teresa, Elyovitch, Hevi, Ha-Cohen Kerner, Yaakov, Malka, Ruth, Jain, Kanishka, Puri, Vandana, Ratori, Shraddha, Shukla, Vishakha, Srivastava, Shubham, Berk, Gozde, Erden, Berna, and Yirmibeşoğlu, Zeynep
- Publisher:
- PARSEME
- Type:
- text and corpus
- Subject:
- multiword expressions, verbal multiword expressions, light verb construction, verb-particle constructions, inherently reflexive verbs, verbal idioms, and multi-verb constructions
- Language:
- German, Modern Greek (1453-), Basque, French, Irish, Hebrew, Hindi, Italian, Polish, Portuguese, Romanian, Swedish, Turkish, and Chinese
- Description:
- This multilingual resource contains corpora in which verbal MWEs (VMWEs) have been manually annotated, gathered on the occasion of the 1.2 edition of the PARSEME Shared Task on Semi-Supervised Identification of Verbal MWEs (2020). VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do). For the 1.2 edition, the data cover 14 languages, in which VMWEs were annotated according to the universal guidelines. The corpora are provided in the cupt format, inspired by the CoNLL-U format. Morphological and syntactic information (not necessarily using UD tagsets), including parts of speech, lemmas, morphological features and/or syntactic dependencies, is also provided. Depending on the language, the information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). This item contains training, development and test data, as well as the evaluation tools used in the PARSEME Shared Task 1.2 (2020). The annotation guidelines are available online: http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.2
- Rights:
- PARSEME Shared Task Data (v. 1.2) Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-mwe-1.2, and PUB
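The cupt format used by the 1.1 and 1.2 editions extends CoNLL-U with an eleventh column, PARSEME:MWE. The following is a minimal, hypothetical Python sketch of reading it. The column conventions it assumes (`*` for no VMWE, `_` for not annotated, `ID:CATEGORY` on the first token of an expression and a bare `ID` on later tokens) reflect our understanding of the format and should be checked against the documentation distributed with the data; `read_cupt` and `vmwes` are illustrative names, not the official PARSEME evaluation tools.

```python
# Hypothetical sketch of a .cupt reader; not the official PARSEME tooling.
# Assumed layout: 10 CoNLL-U columns plus a final PARSEME:MWE column holding
# '*' (no VMWE), '_' (not annotated), or ';'-separated "ID:CATEGORY" / "ID".
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Token = List[str]


def read_cupt(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences as lists of tab-split token rows, skipping comments."""
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            if sentence:  # a blank line closes the current sentence
                yield sentence
                sentence = []
        elif not line.startswith("#"):
            sentence.append(line.split("\t"))
    if sentence:  # the file may lack a trailing blank line
        yield sentence


def vmwes(sentence: List[Token]) -> Dict[str, Tuple[str, List[str]]]:
    """Map each VMWE number to (category, IDs of the tokens it covers)."""
    categories: Dict[str, str] = {}
    members: Dict[str, List[str]] = defaultdict(list)
    for cols in sentence:
        tag = cols[-1]
        if tag in ("*", "_"):
            continue
        for entry in tag.split(";"):
            mwe_id, _, category = entry.partition(":")
            if category:  # only the first token carries the category
                categories[mwe_id] = category
            members[mwe_id].append(cols[0])
    return {k: (categories.get(k, "?"), ids) for k, ids in members.items()}


# Invented one-sentence example containing a discontinuous LVC.
SAMPLE = (
    "# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC PARSEME:MWE\n"
    "# text = He made a decision .\n"
    "1\tHe\the\tPRON\t_\t_\t2\tnsubj\t_\t_\t*\n"
    "2\tmade\tmake\tVERB\t_\t_\t0\troot\t_\t_\t1:LVC.full\n"
    "3\ta\ta\tDET\t_\t_\t4\tdet\t_\t_\t*\n"
    "4\tdecision\tdecision\tNOUN\t_\t_\t2\tobj\t_\t_\t1\n"
    "5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\t*\n"
)

if __name__ == "__main__":
    for sent in read_cupt(SAMPLE.splitlines()):
        print(vmwes(sent))  # {'1': ('LVC.full', ['2', '4'])}
```

Collecting token IDs per expression number, rather than assuming contiguous spans, is what lets the sketch handle discontinuous expressions such as "made ... decision".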
8. Armata română în războiul antihitlerist :
- Type:
- text, collected volumes, and memoirs
- Subject:
- History of states and territories of the Balkan Peninsula, World War II (1939-1945), Romanian army, Romania, army, military units, soldiers, and world history 1939-1945
- Language:
- Romanian
- Rights:
- unknown
9. Aspecte din înarmarea României burghezo-moşiereşti împotriva URSS în perioada crizei economice mondiale. Afacerea Skoda /
- Creator:
- Safta, Stelian
- Type:
- text and study
- Subject:
- Foreign trade. International trade, arms industry, Czechoslovakia 1918-1938, industry, manufactories, mining, breweries, Romania, and world history 1918-1945
- Language:
- Romanian
- Rights:
- unknown
10. Atitudinea guvernului român faţă de Cehoslovacia în lunile premergătoare München-ului, mai-septembrie 1938 /
- Creator:
- Benditer, Janeta
- Type:
- text and study
- Subject:
- History of Czechia and Slovakia, foreign policy, Czechoslovakia 1918-1938, foreign policy, international relations, Romania, and world history 1918-1945
- Language:
- Romanian
- Rights:
- unknown
11. August 1944 - Mai 1945 :
- Creator:
- Bantea, Eugen
- Publisher:
- Edit. Militară,
- Type:
- monograph
- Subject:
- History of Czechia and Slovakia, liberation of Czechoslovakia, Romanian army, partisans, Romanian-Czech relations, Czechoslovakia 1938-1945, liberation of Czechoslovakia, Prague Uprising 1945, Romania, world history 1939-1945, and army, military units, soldiers
- Language:
- Romanian
- Rights:
- unknown
12. Bibliografia analitică a periodicelor românesti.
- Type:
- text and bibliography
- Subject:
- History of states and territories of the Balkan Peninsula, historical bibliographies, subject bibliographies, retrospective bibliographies, Romania, general surveys (thematic), world history 1789-1918, and subject and thematic bibliographies, periodical indexes
- Language:
- Romanian
- Rights:
- unknown
13. Bibliografia istorică a României pentru anii 1959 şi 1960. De 1 /
- Type:
- text and bibliography
- Subject:
- History of states and territories of the Balkan Peninsula, subject bibliographies, historiography, Romania, historiography, historical sciences, historians, world history from 1945 to the present, and subject and thematic bibliographies, periodical indexes
- Language:
- Romanian
- Rights:
- unknown
14. Bibliografia istorică a României. /
- Type:
- text and bibliography
- Subject:
- History of states and territories of the Balkan Peninsula, historiography, history of states, Romania, general surveys (thematic), general surveys of world history (chronological), and subject and thematic bibliographies, periodical indexes
- Language:
- Romanian
- Rights:
- unknown
15. Bibliotheca historica Romaniae :
- Type:
- text and monograph
- Subject:
- History of states and territories of the Balkan Peninsula, book series, history of states, Romania, general surveys (thematic), and general surveys of world history (chronological)
- Language:
- Romanian
- Description:
- Monograph series
- Rights:
- unknown
16. C4Corpus (CC BY-NC part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malayalam, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Panjabi, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in more than 50 languages, licensed under the Creative Commons family of licenses and extracted from CommonCrawl, the largest publicly available general web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB
17. C4Corpus (CC BY-NC-ND part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malayalam, Marathi, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Urdu, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in more than 50 languages, licensed under the Creative Commons family of licenses and extracted from CommonCrawl, the largest publicly available general web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB
18. C4Corpus (CC BY-NC-SA part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Malayalam, Marathi, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Urdu, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in more than 50 languages, licensed under the Creative Commons family of licenses and extracted from CommonCrawl, the largest publicly available general web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
19. C4Corpus (CC BY-ND part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Malayalam, Macedonian, Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in more than 50 languages, licensed under the Creative Commons family of licenses and extracted from CommonCrawl, the largest publicly available general web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution-NoDerivatives 4.0 International (CC BY-ND 4.0), http://creativecommons.org/licenses/by-nd/4.0/, and PUB
20. C4Corpus (CC BY-SA part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malayalam, Marathi, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Panjabi, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Urdu, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in more than 50 languages, licensed under the Creative Commons family of licenses and extracted from CommonCrawl, the largest publicly available general web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB
21. C4Corpus (CC-BY part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malayalam, Marathi, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Panjabi, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Urdu, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in more than 50 languages, licensed under the Creative Commons family of licenses and extracted from CommonCrawl, the largest publicly available general web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB
22. Când şi unde se ivesc Românii întâia dată în istorie? /
- Creator:
- Murnu, George,
- Type:
- text and study
- Subject:
- History of states and territories of the Balkan Peninsula, Romanians, Romanian settlement, Romanian colonization, Romania, social history, medieval world history (to 1492), and Europe beyond the borders of the ancient world
- Language:
- Romanian
- Rights:
- unknown
23. Carol al IV lea :
- Creator:
- Karel
- Type:
- text, sources, and translations
- Subject:
- Biographies, Karel, Václav, Czech rulers, Holy Roman rulers, autobiographies, legends of St. John, Czech lands 1306-1419, and literature, writers
- Language:
- Romanian
- Description:
- Viaţa lui Carol al IV-lea and Legenda Sfântului Venceslav
- Rights:
- unknown
24. Cehi, slovaci şi români în veacurile XIII-XVI /
- Creator:
- Dan, Mihail P.,
- Type:
- text and monograph
- Subject:
- Demography. Population, Romanian-Czech relations, Romanian-Slovak relations, Czech lands 1306-1526, foreign policy, international relations, Romania, medieval world history (to 1492), and Czech lands from the arrival of the Slavs to 1306
- Language:
- Romanian
- Rights:
- unknown
25. Cehi, slovaci şi români în veacurile XIII-XVI =
- Creator:
- Dan, Mihail P.,
- Type:
- text and monograph
- Subject:
- International relations, world politics, Czech-Romanian relations, Slovak-Romanian relations, international relations, cultural relations, foreign policy, international relations, medieval world history (to 1492), Romania, and Czech lands 1306-1526
- Language:
- Romanian
- Rights:
- unknown
26. Cehoslovacia între pragmatism intern şi interes international /
- Creator:
- Legrand, Veronica
- Publisher:
- Editura Ivan Krasko,
- Subject:
- history of states, Romanian Czechs, political history, politicians, world history from 1918 to the present, Romania, nationalities, inter-ethnic relations and national movements, migration, emigration, colonization, and Czechoslovakia 1918-1992
- Language:
- Romanian
- Rights:
- unknown
27. Cetatea Scheia :
- Creator:
- Diaconu, Gheorghe,
- Type:
- text and monograph
- Subject:
- Archaeology, medieval archaeology, castles, Romania, castles, hillforts, chateaux, fortified manors, courts, medieval world history (to 1492), and archaeological excavations, archaeology in museums and archives
- Language:
- Romanian
- Rights:
- unknown
28. Civilizatia turcă /
- Creator:
- Ekrem, Mehmet Ali
- Type:
- text and monograph
- Subject:
- History of the states of Western Asia. Near East, Turks, Turkish history, Turkey, general surveys of world history (chronological), general surveys (thematic), and Ottoman Empire
- Language:
- Romanian
- Rights:
- unknown
29. Conferinţa internaţională: "Promises of 1968" /
- Creator:
- Vasile, Cristian
- Subject:
- international conferences, Prague Spring (1968), political history, Munich 1938, Prague Spring 1968, occupation 1939, 1968, world history from 1945 to the present, foreign conferences, congresses, and Czechoslovakia 1948-1969
- Language:
- Romanian
- Description:
- [Washington, D.C., 6-7 November 2008]
- Rights:
- unknown
30. CoNLL 2017 and 2018 Shared Task Blind and Preprocessed Test Data
- Creator:
- Zeman, Daniel and Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- tokenization, word segmentation, morphology, tagging, syntax, parsing, and universal dependencies
- Language:
- Afrikaans, Arabic, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Persian, Finnish, French, Old French (842-ca. 1400), Irish, Galician, Gothic, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Latin, Latvian, Dutch, Norwegian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Thai, Turkish, Uighur, Ukrainian, Urdu, Vietnamese, and Chinese
- Description:
- CoNLL 2017 and 2018 shared tasks: Multilingual Parsing from Raw Text to Universal Dependencies. This package contains the test data in the form in which they were presented to the participating systems: raw text files and files preprocessed by UDPipe. The metadata.json files list the files to process and to output; README files in the respective folders describe the syntax of metadata.json. For the full training, development and gold-standard test data, see Universal Dependencies 2.0 (CoNLL 2017) and Universal Dependencies 2.2 (CoNLL 2018); download links are at http://universaldependencies.org/. For more information on the shared tasks, see http://universaldependencies.org/conll17/ and http://universaldependencies.org/conll18/. Contents: conll17-ud-test-2017-05-09 (CoNLL 2017 test data); conll18-ud-test-2018-05-06 (CoNLL 2018 test data); conll18-ud-test-2018-05-06-for-conll17 (CoNLL 2018 test data with metadata and filenames modified so that the 2017 systems can process it).
- Rights:
- Licence Universal Dependencies v2.2, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.2, and PUB
31. CoNLL 2017 Shared Task System Outputs
- Creator:
- Zeman, Daniel, Potthast, Martin, Straka, Milan, Popel, Martin, Dozat, Timothy, Qi, Peng, Manning, Christopher, Shi, Tianze, Wu, Felix G., Chen, Xilun, Cheng, Yao, Björkelund, Anders, Falenska, Agnieszka, Yu, Xiang, Kuhn, Jonas, Che, Wanxiang, Guo, Jiang, Wang, Yuxuan, Zheng, Bo, Zhao, Huaipeng, Liu, Yang, Teng, Dechuan, Liu, Ting, Lim, Kyungtae, Poibeau, Thierry, Sato, Motoki, Manabe, Hitoshi, Noji, Hiroshi, Matsumoto, Yuji, Kırnap, Ömer, Önder, Berkay Furkan, Yuret, Deniz, Straková, Jana, Vania, Clara, Zhang, Xingxing, Lopez, Adam, Heinecke, Johannes, Asadullah, Munshi, Kanerva, Jenna, Luotolahti, Juhani, Ginter, Filip, Kuan, Yu, Sofroniev, Pavel, Schill, Erik, Hinrichs, Erhard, Nguyen, Dat Quoc, Dras, Mark, Johnson, Mark, Qian, Xian, Vilares, David, Gómez-Rodríguez, Carlos, Aufrant, Lauriane, Wisniewski, Guillaume, Yvon, François, Dumitrescu, Stefan Daniel, Boroş, Tiberiu, Tufiş, Dan, Das, Ayan, Zaffar, Affan, Sarkar, Sudeshna, Wang, Hao, Zhao, Hai, Zhang, Zhisong, Hornby, Ryan, Taylor, Clark, Park, Jungyeul, de Lhoneux, Miryam, Shao, Yan, Basirat, Ali, Kiperwasser, Eliyahu, Stymne, Sara, Goldberg, Yoav, Nivre, Joakim, Akkuş, Burak Kerim, Azizoglu, Heval, Cakici, Ruket, Moor, Christophe, Merlo, Paola, Henderson, James, Wang, Haozhou, Ji, Tao, Wu, Yuanbin, Lan, Man, de la Clergerie, Eric, Sagot, Benoît, Seddah, Djamé, More, Amir, Tsarfaty, Reut, Kanayama, Hiroshi, Muraoka, Masayasu, Yoshikawa, Katsumasa, Garcia, Marcos, and Gamallo, Pablo
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- dependency parser and parsebank
- Language:
- Arabic, Bulgarian, Russia Buriat, Czech, Catalan, Church Slavic, Danish, German, Modern Greek (1453-), English, Spanish, Estonian, Basque, Persian, Finnish, French, Irish, Galician, Gothic, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Latin, Latvian, Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Northern Sami, Swedish, Turkish, Uighur, Ukrainian, Urdu, Vietnamese, and Chinese
- Description:
- This package contains the system outputs from the CoNLL 2017 Shared Task in Multilingual Parsing from Raw Text to Universal Dependencies.
- Rights:
- Licence Universal Dependencies v2.0, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.0, and PUB
32. CoNLL 2018 Shared Task System Outputs
- Creator:
- Zeman, Daniel, Potthast, Martin, Duthoo, Elie, Mesnard, Olivier, Rybak, Piotr, Wróblewska, Alina, Che, Wanxiang, Liu, Yijia, Wang, Yuxuan, Zheng, Bo, Liu, Ting, Li, Zuchao, He, Shexia, Zhang, Zhuosheng, Zhao, Hai, Wu, Yingting, Tong, Jia-Jun, Nguyen, Dat Quoc, Verspoor, Karin, Wan, Hui, Naseem, Tahira, Lee, Young-Suk, Castelli, Vittorio, Ballesteros, Miguel, Hershcovich, Daniel, Abend, Omri, Rappoport, Ari, Smith, Aaron, Bohnet, Bernd, de Lhoneux, Miryam, Nivre, Joakim, Shao, Yan, Stymne, Sara, Kırnap, Ömer, Dayanık, Erenay, Yuret, Deniz, Kanerva, Jenna, Ginter, Filip, Miekka, Niko, Leino, Akseli, Salakoski, Tapio, Lim, KyungTae, Park, Cheoneum, Lee, Changki, Poibeau, Thierry, Bhat, Riyaz Ahmad, Bhat, Irshad, Bangalore, Srinivas, Qi, Peng, Dozat, Timothy, Zhang, Yuhao, Manning, Christopher, Boroș, Tiberiu, Dumitrescu, Stefan Daniel, Burtica, Ruxandra, Arakelyan, Gor, Hambardzumyan, Karen, Khachatrian, Hrant, Rosa, Rudolf, Mareček, David, Straka, Milan, Seker, Amit, More, Amir, Tsarfaty, Reut, Önder, Berkay Furkan, Gümeli, Can, Jawahar, Ganesh, Muller, Benjamin, Fethi, Amal, Martin, Louis, Villemonte de la Clergerie, Eric, Sagot, Benoît, Seddah, Djamé, Özateş, Şaziye Betül, Özgür, Arzucan, Gungor, Tunga, Öztürk, Balkız, Ji, Tao, Liu, Yufang, Wang, Yijun, Wu, Yuanbin, Lan, Man, Chen, Danlu, Lin, Mengxiao, Hu, Zhifeng, and Qiu, Xipeng
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- parsed data, conllu, and universal dependencies
- Language:
- Afrikaans, Arabic, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Persian, Finnish, French, Old French (842-ca. 1400), Irish, Galician, Gothic, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Latin, Latvian, Dutch, Norwegian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Thai, Turkish, Uighur, Ukrainian, Urdu, Vietnamese, and Chinese
- Description:
- Test data parsed by systems submitted to the CoNLL 2018 UD parsing shared task.
- Rights:
- Licence Universal Dependencies v2.2, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.2, and PUB
33. ConsILR - Consortium for the Romanian Language: Resources & Tools
- Type:
- lexicalConceptualResource
- Language:
- English and Romanian
- Description:
- Resources and tools developed for Romanian
- Rights:
- Not specified
34. Contribuţia României la victoria asupra fascismului
- Type:
- text and monograph
- Subject:
- History of states and territories of the Balkan Peninsula, collected volumes, scientific conferences, anti-fascist movements, Romania, resistance, opposition, anti-fascism, anti-communism, world history 1939-1945, foreign periodicals and collected volumes, and foreign conferences, congresses
- Language:
- Romanian
- Rights:
- unknown
35. Corpus for training and evaluating diacritics restoration systems
- Creator:
- Náplava, Jakub, Straka, Milan, Hajič, Jan, and Straňák, Pavel
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- diacritical marks generation and natural language correction
- Language:
- Czech, Vietnamese, Romanian, Polish, Slovak, Spanish, Croatian, Irish, Latvian, Hungarian, French, and Turkish
- Description:
- Corpus of texts in 12 languages. For each language, we provide one training, one development and one testing set acquired from Wikipedia articles. In addition, each language dataset contains a (substantially larger) training set collected from (general) Web texts. All sets are disjoint, except for the Wikipedia and Web training sets, which may contain similar sentences. The data are segmented into sentences, which are further tokenized into words. All data in the corpus contain diacritics. To strip the diacritics, use the Python script diacritization_stripping.py contained in the attached stripping_diacritics.zip. The script has two modes; we generally recommend the method called uninames, which behaves better for some languages. The code for training a recurrent neural network-based model for diacritics restoration is available at https://github.com/arahusky/diacritics_restoration.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
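The corpus ships its own stripping script, whose two modes are not documented beyond their names. To show roughly what stripping involves, here is a minimal Python sketch of a different, commonly used technique: Unicode canonical decomposition (NFD) followed by removal of nonspacing combining marks. It is not the bundled diacritization_stripping.py, and its output may differ from that script's recommended uninames mode.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks after canonical decomposition (NFD).

    Generic sketch, not the corpus's diacritization_stripping.py.
    Letters without a canonical decomposition (e.g. Polish 'ł',
    Croatian or Vietnamese 'đ') are left unchanged.
    """
    decomposed = unicodedata.normalize("NFD", text)
    # Drop nonspacing combining marks (Unicode category "Mn"), such as the
    # acute, caron, breve, comma below, ogonek and ring above.
    base = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose any remaining sequences into canonical composed form.
    return unicodedata.normalize("NFC", base)


if __name__ == "__main__":
    print(strip_diacritics("Română, čeština, tiếng Việt"))
```

For example, strip_diacritics("Română, čeština, tiếng Việt") returns "Romana, cestina, tieng Viet". Letters such as ł or đ pass through unchanged, so decomposition alone does not remove every diacritic-like letter; pairs of stripped and original sentences are the kind of input a restoration model learns from.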
36. Cultura moldovenească în timpul lui Ştefan cel Mare :
- Type:
- text and monograph
- Subject:
- History of states and territories of the Balkan Peninsula, Štěpán, Berza, Mihai,, collected volumes, cultural history, Romania, history of science, art, culture and technology, cultural relations, world history of the Middle Ages (to 1492), world history 1492-1648, and foreign periodicals and collected volumes
- Language:
- Romanian
- Description:
- Moldavian culture in the time of Stephen the Great.
- Rights:
- unknown
37. DaMuEL 1.0: A Large Multilingual Dataset for Entity Linking
- Creator:
- Kubeša, David and Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- entity linking, NEL, NER, dataset, and knowledge base
- Language:
- Afrikaans, Arabic, Armenian, Basque, Belarusian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Latin, Latvian, Lithuanian, Maltese, Marathi, Modern Greek (1453-), Northern Sami, Norwegian Nynorsk, Persian, Polish, Portuguese, Romanian, Russian, Scottish Gaelic, Serbian, Slovak, Slovenian, Spanish, Swedish, Tamil, Telugu, Uighur, Ukrainian, Urdu, Vietnamese, and Wolof
- Description:
- We present DaMuEL, a large Multilingual Dataset for Entity Linking containing data in 53 languages. DaMuEL consists of two components: a knowledge base that contains language-agnostic information about entities, including their claims from Wikidata and named entity types (PER, ORG, LOC, EVENT, BRAND, WORK_OF_ART, MANUFACTURED); and Wikipedia texts with entity mentions linked to the knowledge base, along with language-specific text from Wikidata such as labels, aliases, and descriptions, stored separately for each language. The Wikidata QID is used as a persistent, language-agnostic identifier, enabling the combination of the knowledge base with language-specific texts and information for each entity. Wikipedia documents deliberately annotate only a single mention for every entity present; we further automatically detect all mentions of named entities linked from each document. The dataset contains 27.9M named entities in the knowledge base and 12.3G tokens from Wikipedia texts. The dataset is published under the CC BY-SA licence.
- Rights:
- Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB
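The description names the Wikidata QID as the key that links the language-agnostic knowledge base to the per-language texts. The following Python sketch shows such a join under stated assumptions: the record layouts and field names (qid, entity_type, label, mentions) are hypothetical and do not reflect DaMuEL's actual file format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class KBEntry:
    """Language-agnostic knowledge-base record (hypothetical layout)."""
    qid: str          # Wikidata QID, e.g. "Q42"
    entity_type: str  # e.g. PER, ORG, LOC, EVENT, BRAND, WORK_OF_ART, MANUFACTURED


@dataclass
class LangEntry:
    """Language-specific text for one entity (hypothetical layout)."""
    qid: str
    label: str
    mentions: List[str] = field(default_factory=list)


def join_by_qid(kb: List[KBEntry], texts: List[LangEntry]) -> Dict[str, dict]:
    """Attach knowledge-base facts to one language's entries via the shared QID."""
    kb_index = {entry.qid: entry for entry in kb}
    joined: Dict[str, dict] = {}
    for entry in texts:
        kb_entry = kb_index.get(entry.qid)
        if kb_entry is None:
            continue  # no knowledge-base record for this QID
        joined[entry.qid] = {
            "type": kb_entry.entity_type,
            "label": entry.label,
            "mentions": entry.mentions,
        }
    return joined
```

Because the QID does not depend on the language, the same kb_index can be reused unchanged for every language's texts.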
38. Dan al II-lea, Sigismund de Luxemburg şi cruciada târzie :
- Creator:
- Cîmpeanu, Liviu
- Type:
- text and study
- Subject:
- History of Europe, Zikmund Lucemburský,, Dan, Radu, Czech rulers, documents, crusades, Wallachian rulers, wars against the Turks, order, Teutonic Knights, Romania, world history of the Middle Ages (to 1492), military operations, wars, battles, and Czech lands 1419-1471
- Language:
- Romanian
- Description:
- Dan II, Sigismund of Luxembourg and the later crusades: a new document from the archive of the Teutonic Order.
- Rights:
- unknown
39. Deep Universal Dependencies 2.4
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, and Galician
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-2988). It contains additional deep-syntactic and semantic annotations. The version number of Deep UD matches the version of UD it is based on. Note, however, that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.4, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.4, and PUB
40. Deep Universal Dependencies 2.5
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, Galician, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, and Skolt Sami
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3105). It contains additional deep-syntactic and semantic annotations. The version number of Deep UD matches the version of UD it is based on. Note, however, that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.5, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.5, and PUB
41. Deep Universal Dependencies 2.6
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, Galician, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Icelandic, Albanian, and Persian
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3226). It contains additional deep-syntactic and semantic annotations. The version number of Deep UD matches the version of UD it is based on. Note, however, that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.6, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.6, and PUB
42. Deep Universal Dependencies 2.7
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, Galician, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Icelandic, Albanian, Persian, Akuntsu, Apurinã, Khunsari, Manx, Mundurukú, Nayini, Soi, South Levantine Arabic, and Tupinambá
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3424). It contains additional deep-syntactic and semantic annotations. The version number of Deep UD matches the version of UD it is based on. Note, however, that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.7, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.7, and PUB
43. Deep Universal Dependencies 2.8
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, Galician, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Icelandic, Albanian, Persian, Akuntsu, Apurinã, Khunsari, Manx, Mundurukú, Nayini, Soi, South Levantine Arabic, Tupinambá, Beja, Western Frisian, Urubú-Kaapor, Kangri, K'iche', Low German, Makuráp, Western Armenian, and Central Siberian Yupik
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3687). It contains additional deep-syntactic and semantic annotations. The version number of Deep UD matches the version of UD it is based on. Note, however, that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.8, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.8, and PUB
44. Deltacorpus
- Creator:
- Mareček, David, Yu, Zhiwei, Zeman, Daniel, and Žabokrtský, Zdeněk
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- part of speech, tagging, semi-supervised, and cross-language
- Language:
- Belarusian, Bosnian, Bulgarian, Czech, Serbo-Croatian, Croatian, Upper Sorbian, Macedonian, Polish, Russian, Slovak, Slovenian, Serbian, Ukrainian, Latvian, Lithuanian, Afrikaans, Danish, German, English, Faroese, Western Frisian, Swiss German, Icelandic, Limburgan, Luxembourgish, Low German, Dutch, Norwegian Nynorsk, Norwegian, Scots, Swedish, Yiddish, Aragonese, Asturian, Catalan, French, Galician, Haitian, Italian, Latin, Lombard, Neapolitan, Piemontese, Portuguese, Romanian, Spanish, Venetian, Walloon, Breton, Welsh, Scottish Gaelic, Irish, Modern Greek (1453-), Armenian, Albanian, Dimli (individual language), Persian, Gilaki, Kurdish, Tajik, Bengali, Bishnupriya, Gujarati, Fiji Hindi, Hindi, Marathi, Nepali (macrolanguage), Urdu, Amharic, Arabic, Egyptian Arabic, Hebrew, Estonian, Finnish, Hungarian, Basque, Georgian, Chuvash, Azerbaijani, Turkish, Uzbek, Kazakh, Tatar, Yakut, Korean, Mongolian, Telugu, Kannada, Malayalam, Tamil, Newari, Vietnamese, Indonesian, Javanese, Malagasy, Maori, Malay (macrolanguage), Pampanga, Sundanese, Tagalog, Waray (Philippines), Swahili (macrolanguage), Esperanto, Ido, Interlingua (International Auxiliary Language Association), and Volapük
- Description:
- Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), limited to the first 1,000,000 tokens per language and tagged by the delexicalized tagger described in Yu et al. (2016, LREC, Portorož, Slovenia).
- Rights:
- Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB
45. Deltacorpus 1.1
- Creator:
- Mareček, David, Yu, Zhiwei, Zeman, Daniel, and Žabokrtský, Zdeněk
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- part of speech, tagging, semi-supervised, and cross-language
- Language:
- Belarusian, Bosnian, Bulgarian, Czech, Serbo-Croatian, Croatian, Upper Sorbian, Macedonian, Polish, Russian, Slovak, Slovenian, Serbian, Ukrainian, Latvian, Lithuanian, Afrikaans, Danish, German, English, Faroese, Western Frisian, Swiss German, Icelandic, Limburgan, Luxembourgish, Low German, Dutch, Norwegian Nynorsk, Norwegian, Scots, Swedish, Yiddish, Aragonese, Asturian, Catalan, French, Galician, Haitian, Italian, Latin, Lombard, Neapolitan, Piemontese, Portuguese, Romanian, Spanish, Venetian, Walloon, Breton, Welsh, Scottish Gaelic, Irish, Modern Greek (1453-), Armenian, Albanian, Dimli (individual language), Persian, Gilaki, Kurdish, Tajik, Bengali, Bishnupriya, Gujarati, Fiji Hindi, Hindi, Marathi, Nepali (macrolanguage), Urdu, Amharic, Arabic, Egyptian Arabic, Hebrew, Estonian, Finnish, Hungarian, Basque, Georgian, Chuvash, Azerbaijani, Turkish, Uzbek, Kazakh, Tatar, Yakut, Korean, Mongolian, Telugu, Kannada, Malayalam, Tamil, Newari, Vietnamese, Indonesian, Javanese, Malagasy, Maori, Malay (macrolanguage), Pampanga, Sundanese, Tagalog, Waray (Philippines), Swahili (macrolanguage), Esperanto, Ido, Interlingua (International Auxiliary Language Association), and Volapük
- Description:
- Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), limited to the first 1,000,000 tokens per language and tagged by the delexicalized tagger described in Yu et al. (2016, LREC, Portorož, Slovenia). Changes in version 1.1: 1. The Universal Dependencies tagset is used instead of the older and smaller Google Universal POS tagset. 2. The SVM classifier is trained on Universal Dependencies 1.2 instead of HamleDT 2.0. 3. Balto-Slavic, Germanic and Romance languages were tagged by a classifier trained only on the respective language group; other languages were tagged by a classifier trained on all available languages. The "c7" combination from version 1.0 is no longer used.
- Rights:
- Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB
46. Documente privind istoria Romîniei.
- Type:
- text, sources, and editions
- Subject:
- History of Europe, Romania, overviews of world history (chronological), and overviews (thematic)
- Language:
- Romanian
- Rights:
- unknown
47. Documente privind revolutia de la 1848 in tările române :
- Type:
- text and documents
- Subject:
- History of states and territories of the Balkan Peninsula, revolution of 1848-1849, Romania, and world history 1789-1918
- Language:
- Romanian and German
- Rights:
- unknown
48. Eşuarea încercărilor monarhiei şi a reactiunii interne şi externe de a răsturna regimul democrat popular /
- Creator:
- Bâlteanu, Boris
- Type:
- text and study
- Subject:
- History of states and territories of the Balkan Peninsula, Romania, political history, politicians, and world history from 1945 to the present
- Language:
- Romanian
- Rights:
- unknown
49. Filozoful Jan Patočka /
- Creator:
- Dubský, Ivan,
- Type:
- text and biography
- Subject:
- Philosophy, Patočka, Jan,, Czech philosophers, Czechoslovakia 1945-1992, and philosophy, philosophers
- Language:
- Romanian
- Rights:
- unknown
50. Gabriel Bethlen (1613-1629) /
- Creator:
- Bunta Péter,
- Type:
- text, monograph, and biography
- Subject:
- History of Europe, Bethlen, Gabriel,, princes of Transylvania, anti-Habsburg uprisings, nobility, bourgeoisie, burghers, entrepreneurs, and world history 1492-1648
- Language:
- Romanian
- Rights:
- unknown
51. HamleDT 2.0
- Creator:
- Zeman, Daniel, Mareček, David, Mašek, Jan, Popel, Martin, Ramasamy, Loganathan, Rosa, Rudolf, Štěpánek, Jan, and Žabokrtský, Zdeněk
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- treebank, Stanford dependencies, Prague dependencies, harmonization, common annotation style, and Interset
- Language:
- Arabic, Bulgarian, Bengali, Catalan, Czech, Danish, German, Modern Greek (1453-), English, Spanish, Estonian, Basque, Persian, Finnish, Ancient Greek (to 1453), Hindi, Hungarian, Italian, Japanese, Latin, Dutch, Portuguese, Romanian, Russian, Slovak, Slovenian, Swedish, Tamil, Telugu, and Turkish
- Description:
- HamleDT 2.0 is a collection of 30 existing treebanks harmonized into a common annotation style, the Prague Dependencies, and further transformed into Stanford Dependencies, a treebank annotation style that has recently become popular. We use the newest version of the basic Universal Stanford Dependencies, without added language-specific subtypes.
- Rights:
- HamleDT 2.0 Licence Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-hamledt-2.0, and ACA
52. HamleDT 3.0
- Creator:
- Zeman, Daniel, Mareček, David, Mašek, Jan, Popel, Martin, Ramasamy, Loganathan, Rosa, Rudolf, Štěpánek, Jan, and Žabokrtský, Zdeněk
- Publisher:
- Charles University
- Type:
- text and corpus
- Subject:
- annotated corpus, morphology, syntax, dependency, treebank, harmonized annotation, and common annotation style
- Language:
- Arabic, Basque, Bengali, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Modern Greek (1453-), Ancient Greek (to 1453), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Persian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Tamil, Telugu, and Turkish
- Description:
- HamleDT (HArmonized Multi-LanguagE Dependency Treebank) is a compilation of existing dependency treebanks (or dependency conversions of other treebanks), transformed so that they all conform to the same annotation style. This version uses Universal Dependencies as the common annotation style. Update (November 2017): for a current collection of harmonized dependency treebanks, we recommend using Universal Dependencies (UD). All of the corpora that are distributed in HamleDT in full are also part of the UD project; only some corpora from the Patch group (for which HamleDT provides only the harmonizing scripts, not the full corpus data) are available in HamleDT but not in UD.
- Rights:
- HamleDT 3.0 License Terms, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-hamledt-3.0, and PUB
53. Iaşii :
- Creator:
- Andronic, Alexandru,
- Type:
- text and monograph
- Subject:
- History of states and territories of the Balkan Peninsula, medieval towns, urbanization, Romania, towns, municipalities, and world history of the Middle Ages (to 1492)
- Language:
- Romanian
- Rights:
- unknown
54. In tres partes divisa. Polonia şi criza Cehoslovacă în documente diplomatice Româneşti (septembrie-octombrie 1938) /
- Creator:
- Anghel, Florin
- Subject:
- Romanian diplomacy, diplomatic documents, international relations, foreign policy, Czechoslovak-Polish relations, Romanian-Czechoslovak relations, Munich crisis, political history, politicians, foreign policy, international relations, world history 1918-1945, Poland, Romania, and Czechoslovakia 1938-1945
- Language:
- Romanian
- Description:
- In tres partes divisa: Poland and the Czechoslovak crisis in Romanian diplomatic documents (September-October 1938).
- Rights:
- unknown
55. Începutul activitătii revolutionare a lui C. Dobrogeanu-Gherea /
- Creator:
- Haupt, Georges,
- Type:
- text and study
- Subject:
- History of states and territories of the Balkan Peninsula, Gherea-Dobrogeanu, Constantin,, labour movement, sociologists, journalists, Romania, Russia, working class, the poor, and world history 1789-1918
- Language:
- Romanian
- Rights:
- unknown
56. Internationala întîi şi Romînia /
- Creator:
- Deac, Augustin,
- Type:
- text and monograph
- Subject:
- Political parties and movements, international relations, First International (1864-1876), Romania, foreign policy, international relations, and world history 1789-1918
- Language:
- Romanian
- Description:
- Institutul de istorie a partidului de pe lîngă C.C. al P.M.R.
- Rights:
- unknown
57. Istoria ilustrată a Românilor /
- Creator:
- Giurescu, Dinu Constantin,
- Type:
- text and monograph
- Subject:
- History of states and territories of the Balkan Peninsula, history of states, Romania, overviews (thematic), and overviews of world history (chronological)
- Language:
- Romanian
- Rights:
- unknown
58. Istoria matematicii in România.
- Creator:
- Andonie, George Ștefan,
- Type:
- text and monograph
- Subject:
- Mathematics, history of science, exact sciences, mathematics, Romania, mathematics, cybernetics, and overviews of world history (chronological)
- Language:
- Romanian
- Rights:
- unknown
59. Istoria României în date /
- Type:
- text and handbook
- Subject:
- History of states and territories of the Balkan Peninsula, history, general overviews, Romania, political history, politicians, and overviews of world history (chronological)
- Language:
- Romanian
- Rights:
- unknown
60. Istoria României.
- Type:
- text and collective monograph
- Subject:
- History of states and territories of the Balkan Peninsula, history of states, Romania, overviews (thematic), and world history 1789-1918
- Language:
- Romanian
- Rights:
- unknown
61. Istoria sclavajului in Dacia romană /
- Creator:
- Tudor, David,
- Type:
- text and monograph
- Subject:
- History of the countries of the ancient world, antiquity, provinces, slaves, Romania, social structure, Etruscans, ancient Rome, and epigraphy
- Language:
- Romanian
- Rights:
- unknown
62. Istoria stiintelor în România :
- Type:
- text and collective monograph
- Subject:
- General biology, history of biology, biology, Romania, overviews of world history (chronological), and life sciences
- Language:
- Romanian
- Rights:
- unknown
63. Istoricii sunt oameni modesti, spre paguba breslei /
- Creator:
- Matej, Dorin
- Subject:
- Czech-Romanian relations, foreign policy, international relations, world history from 1918 to the present, Romania, and Czechoslovakia 1918-1992
- Language:
- Romanian
- Description:
- [interview with Radek Pech, Czech ambassador to Romania]
- Rights:
- unknown
64. JRC-Acquis
- Publisher:
- Joint Research Centre of the EU
- Type:
- corpus
- Language:
- Bulgarian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Modern Greek (1453-), Hungarian, Italian, Latvian, Maltese, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, and Swedish
- Description:
- A large parallel corpus containing EU law (the Acquis Communautaire) in 22 languages.
- Rights:
- Not specified
65. Juvenilie Josefa Macůrka /
- Creator:
- Macůrek, Josef,
- Type:
- text, jubilee volumes, and writings
- Subject:
- Historical science. Auxiliary historical sciences. Archival science, Macůrek, Josef,, historians (anniversaries, obituaries, etc.), Czechoslovakia 1918-1992, and historiography, historical sciences, historians
- Language:
- Czech, Romanian, and French
- Description:
- Czech, Romanian and French text
- Rights:
- unknown
66. Karel Zdeněk Líman :
- Type:
- text and monograph
- Subject:
- Architecture, Líman, Karel Zdeněk,, Carol, Czech architects, Czechs in Romania, architecture, Czech lands 1848-1918, Czechoslovakia 1918-1938, architecture, architects, Romania, world history 1789-1918, and world history 1918-1945
- Language:
- Romanian and English
- Rights:
- unknown
67. La industria checoeslovaca :
- Type:
- text and informational publication
- Subject:
- National economy and economic policy, industry, Czechoslovakia 1918-1938, and industry, manufactories, mining, breweries
- Language:
- Romanian
- Description:
- At the end of the book, [64] pages of advertisements and promotional texts in Czech, Spanish, English and German
- Rights:
- unknown
68. Lectii în ajutorul celor care studiaza istoria P. M. R. /
- Type:
- text and collected volumes
- Subject:
- History of states and territories of the Balkan Peninsula, labour movement, social democratic political parties, communist political parties, Romania, working class, the poor, world history 1789-1918, world history from 1918 to the present, and political parties and movements, elections
- Language:
- Romanian
- Rights:
- unknown
69. Les relations politiques roumano-francaises au début du XX siècle (1900-1916) /
- Creator:
- Vesa, Vasile,
- Type:
- text and monograph
- Subject:
- International relations, world politics, Romanian-French relations, foreign policy, Romania, France, foreign policy, international relations, and world history 1789-1918
- Language:
- French and Romanian
- Rights:
- unknown
70. Lupta pentru unitate naţională a tăriolor române 1590-1630 :
- Type:
- text and documents
- Subject:
- History of states and territories of the Balkan Peninsula, Turkish wars, political history, Romania, political history, politicians, and world history 1492-1648
- Language:
- Romanian
- Description:
- Inst. de istorie Nicolae Iorga
- Rights:
- unknown
71. MEBA word aligner
- Creator:
- Tufiş, Dan and Ceauşu, Alexandru
- Publisher:
- Research Institute for Artificial Intelligence, Romanian Academy of Sciences
- Type:
- toolService
- Subject:
- word aligner
- Language:
- English and Romanian
- Description:
- MEBA is a lexical aligner, implemented in C#, based on an iterative algorithm that relies on several preprocessing steps: sentence alignment (SAL, http://www.clarin.eu/tools/sal-sentence-aligner), tokenization, POS tagging and lemmatization (through TTL, http://www.clarin.eu/tools/ttl-tokenizing-tagging-and-lemmatizing-free-running-texts), and sentence chunking. Like the YAWA aligner, MEBA generates links step by step, beginning with the most probable ones (anchor links). Links added at any later step are supported or restricted by the links created in previous iterations. The aligner uses different weights and different significance thresholds for each feature and iteration. Each iteration can be configured to align a different category of tokens (named entities, dates and numbers, content words, function words, punctuation), in decreasing order of statistical evidence. MEBA has an individual F-measure of 81.71% and is currently integrated into the COWAL platform (http://www.clarin.eu/tools/cowal-combined-word-aligner). More detailed descriptions are available in the following papers (http://www.racai.ro/~tufis/papers): -- Dan Tufiş (2007). Exploiting Aligned Parallel Corpora in Multilingual Studies and Applications. In Toru Ishida, Susan R. Fussell, and Piek T.J.M. Vossen (eds.), Intercultural Collaboration. First International Workshop (IWIC 2007), volume 4568 of Lecture Notes in Computer Science, pp. 103-117. Springer-Verlag, August 2007. ISBN 978-3-540-73999-9. -- Dan Tufiş, Radu Ion, Alexandru Ceauşu, and Dan Ştefănescu (2006). Improved Lexical Alignment by Combining Multiple Reified Alignments. In Toru Ishida, Susan R. Fussell, and Piek T.J.M. Vossen (eds.), Proceedings of the 11th Conference EACL2006, pp. 153-160, Trento, Italy, April 2006. Association for Computational Linguistics. ISBN 1-9324-32-61-2. -- Dan Tufiş, Radu Ion, Alexandru Ceauşu, and Dan Ştefănescu (2005). Combined Aligners. In Proceedings of the ACL Workshop on Building and Using Parallel Texts: Data-Driven Machine Translation and Beyond, pp. 107-110, Ann Arbor, USA, June 2005. Association for Computational Linguistics. ISBN 978-973-703-208-9.
- Rights:
- Not specified
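The MEBA description outlines an anchor-first, iterative strategy in which earlier links support or restrict later ones. The Python sketch below is a toy illustration of that general idea only. It is not MEBA, which is implemented in C# and uses per-feature weights and token categories; the precomputed candidate scores, the bonus for diagonal neighbours, and the thresholds are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[int, int]  # (source token index, target token index)


def support(link: Link, links: Set[Link]) -> int:
    """Count already-accepted diagonal neighbours of a candidate link."""
    s, t = link
    return sum(1 for n in ((s - 1, t - 1), (s + 1, t + 1)) if n in links)


def iterative_align(
    scores: Dict[Link, float],
    thresholds: List[float],
    bonus: float = 0.1,
) -> Set[Link]:
    """Anchor-first, iterative 1:1 word alignment (toy sketch, not MEBA).

    Each iteration has its own threshold (strictest first). At the start of
    an iteration, candidates are re-ranked: links accepted in earlier
    iterations support a candidate (bonus per diagonal neighbour) or
    restrict it (its source or target token is already aligned).
    """
    links: Set[Link] = set()
    for threshold in thresholds:
        previous = set(links)  # only earlier iterations provide support
        ranked = sorted(
            ((score + bonus * support(link, previous), link)
             for link, score in scores.items()),
            reverse=True,  # most probable candidates first
        )
        for effective, (s, t) in ranked:
            if effective < threshold:
                break  # all remaining candidates are weaker
            if any(s == ls or t == lt for ls, lt in links):
                continue  # conflicts with an existing link
            links.add((s, t))
    return links
```

With scores {(0, 0): 0.9, (1, 1): 0.45, (1, 2): 0.5} and thresholds [0.8, 0.4], the anchor (0, 0) is fixed in the first iteration, and its support lets (1, 1) win over the slightly higher-scoring (1, 2) in the second.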
72. Misiunea militară a generalului Milan Rastislav Štefánik în România în lumina documentelor din arhivele române şi franceze /
- Creator:
- Kopecký, Peter,
- Type:
- text and study
- Subject:
- History of the Czech Republic and Slovakia, Štefánik, Milan Rastislav,, archival documents, Slovak politicians, travel abroad, Czechoslovak-Romanian relations, foreign policy, international relations, world history 1914-1918, Romania, foreign archives, and Czech lands 1914-1918
- Language:
- Romanian
- Rights:
- unknown
73. Morpho-syntactically annotated corpora provided for the PARSEME Shared Task on Semi-Supervised Identification of Verbal Multiword Expressions (edition 1.2)
- Creator:
- Guillaume, Bruno, Ramisch, Carlos, Waszczuk, Jakub, Monti, Johanna, Di Buono, Maria Pia, Sangati, Federico, Speranza, Giulia, Carlino, Carola, Güngör, Tunga, Yirmibeşoğlu, Zeynep, Sak, Haşim, Saraçlar, Murat, Giouli, Voula, Foufi, Vassiliki, Ramisch, Renata, Rademaker, Alexandre, Vale, Oto, Wilkens, Rodrigo, Candito, Marie, Crabbé, Benoît, Segonne, Vincent, Liebeskind, Chaya, Stymne, Sara, Hajič, Jan, Ginter, Filip, Luotolahti, Juhani, Straka, Milan, Zeman, Daniel, Barbu Mititelu, Verginica, Cristescu, Mihaela, Vaidya, Ashwini, Bhatia, Archna, Lichte, Timm, Ehren, Rafael, Jiang, Menghan, Xu, Hongzhi, Walsh, Abigail, Irimia, Elena, and Dowling, Meghan
- Publisher:
- PARSEME
- Type:
- text and corpus
- Subject:
- morphosyntactic annotation, dependency trees, and morphological analysis
- Language:
- German, Modern Greek (1453-), Basque, French, Irish, Hebrew, Hindi, Italian, Polish, Portuguese, Romanian, Swedish, Turkish, and Chinese
- Description:
- This multilingual resource contains corpora for 14 languages, gathered on the occasion of the 1.2 edition of the PARSEME Shared Task on semi-supervised Identification of Verbal MWEs (2020). These corpora were meant to serve as additional "raw" corpora to help discover unseen verbal MWEs. The corpora are provided in CoNLL-U (https://universaldependencies.org/format.html) format and contain morphosyntactic annotations (parts of speech, lemmas, morphological features, and syntactic dependencies). Depending on the language, the information comes from treebanks (mostly Universal Dependencies v2.x) or from automatic parsers trained on UD v2.x treebanks (e.g., UDPipe). VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do). For the 1.2 shared task edition, the data covers 14 languages, for which VMWEs were annotated according to the universal guidelines. The corpora are provided in the cupt format, inspired by the CoNLL-U format. Morphological and syntactic information (not necessarily using UD tagsets), including parts of speech, lemmas, morphological features and/or syntactic dependencies, is also provided. This item contains training, development and test data, as well as the evaluation tools used in the PARSEME Shared Task 1.2 (2020). The annotation guidelines are available online: http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.2
- Rights:
- PARSEME Shared Task Raw Corpus Data (v. 1.2) Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-mwe-1.2-raw, and PUB
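One way such raw CoNLL-U corpora might be mined for unseen VMWE candidates is sketched below in Python. It assumes the standard 10-column CoNLL-U layout, integer HEAD values and the UD `obj` relation; counting verb-object lemma pairs (a rough proxy for light-verb constructions and idioms such as "make decision") is an illustrative heuristic chosen here, not the shared task's method.

```python
from collections import Counter

def read_conllu(lines):
    """Yield sentences (lists of token dicts) from an iterable of CoNLL-U lines."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():          # a blank line ends a sentence
            if sentence:
                yield sentence
            sentence = []
            continue
        if line.startswith("#"):      # sentence-level comments (sent_id, text, ...)
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue                  # skip multiword-token ranges and empty nodes
        sentence.append({
            "id": int(cols[0]),
            "lemma": cols[2],
            "upos": cols[3],
            "head": int(cols[6]) if cols[6].isdigit() else 0,
            "deprel": cols[7],
        })
    if sentence:
        yield sentence

def verb_object_pairs(sentences):
    """Count (verb lemma, object lemma) pairs linked by the UD `obj` relation."""
    counts = Counter()
    for sent in sentences:
        by_id = {tok["id"]: tok for tok in sent}
        for tok in sent:
            head = by_id.get(tok["head"])
            # Subtypes such as "obj:lvc" are reduced to their base relation.
            if tok["deprel"].split(":")[0] == "obj" and head is not None \
                    and head["upos"] == "VERB":
                counts[(head["lemma"], tok["lemma"])] += 1
    return counts
```

For a corpus file, something like `verb_object_pairs(read_conllu(open(path, encoding="utf-8"))).most_common(20)` would list the most frequent candidate pairs; frequent pairs absent from the annotated training data are the interesting ones.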
74. Neologismos económicos en las lenguas románicas a través de la prensa
- Publisher:
- Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
- Type:
- lexicalConceptualResource
- Subject:
- terminology database
- Language:
- Catalan, French, Galician, Italian, Portuguese, Romanian, and Spanish
- Description:
- Multilingual terminological resource containing 3,875 entries from the economics, finance and banking domains.
- Rights:
- Not specified
75. Noi contribuţii privitoare la originea şi activitatea arhitectului Johann Freywald /
- Creator:
- Rădvan, Laurențiu,
- Type:
- text and study
- Subject:
- Architecture, History of states and territories of the Balkan Peninsula, Freywald, Johann, architects, urban architecture, neoclassicism, Czech-Romanian relations, cultural relations, Czech lands 1792-1847, Romania, and world history 1789-1918
- Language:
- Romanian
- Description:
- New contributions to the origin and work of the architect Johann Freywald.
- Rights:
- unknown
76. Noul sfat al animalelor /
- Creator:
- Flaška z Pardubic a Rychmburku, Smil,
- Type:
- text, sources, and translations
- Subject:
- Czech poetry, Flaška z Pardubic a Rychmburku, Smil, Czech literature, medieval literature, Czech lands 1306-1419, and literature, writers
- Language:
- Romanian
- Description:
- Translated from Czech.
- Rights:
- unknown
77. Ocuparea Cehoslovaciei (august 1968) reflectată de Radiodifuziunea Română /
- Creator:
- Denize, Eugen
- Subject:
- occupation, Romanian radio, public opinion, Munich 1938, Prague Spring 1968, occupation 1939, 1968, world history from 1945 to the present, Romania, Czechoslovakia 1948-1969, and television, radio
- Language:
- Romanian
- Rights:
- unknown
78. Omagiu lui Constantin Daicoviciu cu prilejul împlinirii a 60 de ani :
- Type:
- text and jubilee volumes
- Subject:
- Historical science. Auxiliary historical sciences. Archival science, Daicoviciu, Constantin, archaeology, foreign periodicals and proceedings, thematic overviews, world history - prehistory and antiquity, and world history of the Middle Ages (to 1492)
- Language:
- Romanian
- Rights:
- unknown
79. OmegaWiki
- Publisher:
- Universität Bamberg, World Language Documentation Centre
- Format:
- application/octet-stream
- Type:
- lexicalConceptualResource
- Language:
- Afrikaans, Arabic, Basque, Bulgarian, Catalan, Chinese, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, Georgian, Modern Greek (1453-), Hebrew, Hungarian, Icelandic, Indonesian, Interlingua (International Auxiliary Language Association), Irish, Italian, Japanese, Khmer, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Spanish, Swedish, Turkish, Ukrainian, and Welsh
- Rights:
- GFDL or CC and http://www.omegawiki.org/Licensing
80. ParaCrawl Corpus version 1.0
- Creator:
- Koehn, Philipp, Heafield, Kenneth, Forcada, Mikel L., Esplà-Gomis, Miquel, Ortiz-Rojas, Sergio, Sánchez, Gema Ramírez, Cartagena, Víctor M. Sánchez, Haddow, Barry, Bañón, Marta, Střelec, Marek, Samiotou, Anna, and Kamran, Amir
- Publisher:
- ParaCrawl
- Type:
- text and corpus
- Subject:
- ParaCrawl, parallel corpus, CommonCrawl, machine translation, and text corpora
- Language:
- English, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Czech, Romanian, Finnish, Latvian, Russian, and Estonian
- Description:
- The January 2018 release of ParaCrawl is the first version of the corpus. It contains parallel corpora for 11 languages paired with English, crawled from a large number of websites. The selection of websites is based on CommonCrawl, but ParaCrawl is extracted from a new crawl with much higher coverage of these selected websites than CommonCrawl. Since the data is fairly raw, it is released with two quality metrics that can be used for corpus filtering. An official "clean" version of each corpus uses one of these metrics. For more details and to download the raw data, visit: http://paracrawl.eu/releases.html
- Rights:
- Public Domain Dedication (CC Zero), http://creativecommons.org/publicdomain/zero/1.0/, and PUB
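Filtering raw parallel data by a quality score, as the description suggests, could look like the following Python sketch. The three-column TSV layout (source, target, score), the threshold value, and the extra length-ratio check are assumptions for illustration only; the actual file layout and metric names should be taken from the release page.

```python
import csv
import io

def filter_pairs(tsv_text, min_score, max_len_ratio=3.0):
    """Keep (source, target) pairs whose quality score reaches min_score.

    Assumes a hypothetical three-column TSV layout: source, target, score.
    """
    kept = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue                      # skip malformed lines
        src, tgt, score = row
        try:
            value = float(score)
        except ValueError:
            continue                      # skip lines with a non-numeric score
        # Extra heuristic (an assumption, not part of the release): drop pairs
        # whose token counts differ by more than max_len_ratio.
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src == 0 or n_tgt == 0:
            continue
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_len_ratio:
            continue
        if value >= min_score:
            kept.append((src, tgt))
    return kept

sample = ("Hello world\tHallo Welt\t0.9\n"
          "Buy now!!!\tJetzt kaufen\t0.2\n"
          "One\tEins zwei drei vier\t0.95\n")
print(filter_pairs(sample, 0.5))  # → [('Hello world', 'Hallo Welt')]
```

Raising `min_score` trades corpus size for precision; an official "clean" release corresponds to one such fixed cut-off on one of the two metrics.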
81. PARSEME corpora annotated for verbal multiword expressions (version 1.3)
- Creator:
- Savary, Agata, Ramisch, Carlos, Guillaume, Bruno, Hawwari, Abdelati, Walsh, Abigail, Fotopoulou, Aggeliki, Bielinskienė, Agnė, Estarrona, Ainara, Gatt, Albert, Butler, Alexandra, Rademaker, Alexandre, Maldonado, Alfredo, Villavicencio, Aline, Farrugia, Alison, Muscat, Amanda, Gatt, Anabelle, Antić, Anđela, De Santis, Anna, Raffone, Annalisa, Riccio, Anna, Pascucci, Antonio, Gurrutxaga, Antton, Bhatia, Archna, Vaidya, Ashwini, Miral, Ayşenur, QasemiZadeh, Behrang, Priego Sanchez, Belem, Griciūtė, Bernadeta, Erden, Berna, Parra Escartín, Carla, Herrero, Carlos, Carlino, Carola, Pasquer, Caroline, Liebeskind, Chaya, Wang, Chenweng, Ben Khelil, Chérifa, Bonial, Claire, Somers, Clarissa, Aceta, Cristina, Krstev, Cvetana, Bejček, Eduard, Lindqvist, Ellinor, Erenmalm, Elsa, Palka-Binkiewicz, Emilia, Rimkute, Erika, Petterson, Eva, Cap, Fabienne, Hu, Fangyuan, Sangati, Federico, Wick Pedro, Gabriela, Speranza, Giulia, Jagfeld, Glorianna, Blagus, Goranka, Berk, Gözde, Attard, Greta, Eryiğit, Gülşen, Finnveden, Gustav, Martínez Alonso, Héctor, de Medeiros Caseli, Helena, Elyovich, Hevi, Xu, Hongzhi, Xiao, Huangyang, Miranda, Isaac, Jaknić, Isidora, El Maarouf, Ismail, Aduriz, Itziar, Gonzalez, Itziar, Matas, Ivana, Stoyanova, Ivelina, Jazbec, Ivo-Pavao, Busuttil, Jael, Waszczuk, Jakub, Findlay, Jamie, Bonnici, Janice, Šnajder, Jan, Antoine, Jean-Yves, Foster, Jennifer, Chen, Jia, Nivre, Joakim, Monti, Johanna, McCrae, John, Kovalevskaitė, Jolanta, Jain, Kanishka, Simkó, Katalin, Yu, Ke, Azzopardi, Kirsty, Adalı, Kübra, Uria, Larraitz, Zilio, Leonardo, Boizou, Loïc, van der Plas, Lonneke, Galea, Luke, Sarlak, Mahtab, Buljan, Maja, Cherchi, Manuela, Tanti, Marc, Di Buono, Maria Pia, Todorova, Maria, Candito, Marie, Constant, Matthieu, Shamsfard, Mehrnoush, Jiang, Menghan, Boz, Mert, Spagnol, Michael, Onofrei, Mihaela, Li, Minli, Elbadrashiny, Mohamed, Diab, Mona, Rizea, Monica-Mihaela, Hadj Mohamed, Najet, Theoxari, Natasa, Schneider, Nathan, Tabone, Nicole, Ljubešić, Nikola, Vale, Oto, Cook, Paul, Yan, Peiyi, Gantar, Polona, Ehren, Rafael, Fabri, Ray, Ibrahim, Rehab, Ramisch, Renata, Walles, Rinat, Wilkens, Rodrigo, Urizar, Ruben, Sun, Ruilong, Malka, Ruth, Galea, Sara Anne, Stymne, Sara, Louizou, Sevasti, Hu, Sha, Taslimipoor, Shiva, Ratori, Shraddha, Srivastava, Shubham, Cordeiro, Silvio Ricardo, Krek, Simon, Liu, Siyuan, Zeng, Si, Yu, Songping, Arhar Holdt, Špela, Markantonatou, Stella, Papadelli, Stella, Leseva, Svetlozara, Kuzman, Taja, Kavčič, Teja, Lynn, Teresa, Lichte, Timm, Pickard, Thomas, Dimitrova, Tsvetana, Yih, Tsy, Güngör, Tunga, Dinç, Tutkum, Iñurrieta, Uxoa, Tajalli, Vahide, Stefanova, Valentina, Caruso, Valeria, Puri, Vandana, Foufi, Vassiliki, Barbu Mititelu, Verginica, Vincze, Veronika, Kovács, Viktória, Shukla, Vishakha, Giouli, Voula, Ge, Xiaomin, Ha-Cohen Kerner, Yaakov, Öztürk, Yağmur, Yarandi, Yalda, Parmentier, Yannick, Zhang, Yongchen, Zhao, Yun, Urešová, Zdeňka, Yirmibeşoğlu, Zeynep, Qin, Zhenzhen, Stank, Cristescu, Mihaela, Zgreabăn, Bianca-Mădălina, Bărbulescu, Elena-Andreea, and Stanković, Ranka
- Publisher:
- PARSEME
- Type:
- text and corpus
- Subject:
- multiword expressions, verbal multiword expressions, light verb construction, verb-particle constructions, inherently reflexive verbs, verbal idioms, and multi-verb constructions
- Language:
- Arabic, Bulgarian, Czech, German, Modern Greek (1453-), English, Spanish, Basque, Persian, French, Irish, Hebrew, Hindi, Croatian, Hungarian, Lithuanian, Italian, Maltese, Polish, Portuguese, Romanian, Slovenian, Serbian, Swedish, Turkish, and Chinese
- Description:
- This multilingual resource contains corpora in which verbal MWEs have been manually annotated. VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do). This is the first release of the corpora without an associated shared task. The previous version (1.2) was associated with the PARSEME Shared Task on semi-supervised Identification of Verbal MWEs (2020). The data covers 26 languages, corresponding to the combination of the corpora from all three previous editions (1.0, 1.1 and 1.2). VMWEs were annotated according to the universal guidelines. The corpora are provided in the cupt format, inspired by the CoNLL-U format. Morphological and syntactic information, including parts of speech, lemmas, morphological features and/or syntactic dependencies, is also provided. Depending on the language, the information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). All corpora are split into training, development and test data, following the splitting strategy adopted for the PARSEME Shared Task 1.2. The annotation guidelines are available online at https://parsemefr.lis-lab.fr/parseme-st-guidelines/1.3, and the .cupt format is detailed at https://multiword.sourceforge.net/cupt-format/
- Rights:
- PARSEME Corpora v. 1.3 - Licence Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-mwe-1.3, and PUB
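The Python sketch below groups tokens into annotated VMWEs using the last column of a .cupt file. It assumes the conventions of the cupt format page (worth checking there before relying on it): an 11th PARSEME:MWE column in which `*` marks tokens outside any VMWE, `_` marks underspecified tokens, and otherwise a semicolon-separated list of VMWE ids, with the category (e.g. `LVC.full`) attached as `id:CATEGORY` on the first token of each expression. The function names are illustrative.

```python
def read_cupt(text):
    """Yield sentences as lists of column lists, skipping comment lines."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():              # a blank line ends a sentence
            if sentence:
                yield sentence
            sentence = []
        elif not line.startswith("#"):
            sentence.append(line.split("\t"))
    if sentence:
        yield sentence

def mwes_in_sentence(rows):
    """Map each VMWE id to (category, token ids) using the PARSEME:MWE column."""
    mwes = {}
    for cols in rows:
        tok_id = cols[0]
        if "-" in tok_id or "." in tok_id:
            continue                      # multiword-token ranges, empty nodes
        code_field = cols[10] if len(cols) > 10 else "_"
        if code_field in ("*", "_"):
            continue                      # no VMWE, or underspecified
        for code in code_field.split(";"):
            mwe_id, _, category = code.partition(":")
            old_category, tokens = mwes.get(int(mwe_id), (None, []))
            # The category appears only on the first token of the VMWE.
            mwes[int(mwe_id)] = (category or old_category, tokens + [int(tok_id)])
    return mwes
```

For the sentence "They made a decision", with `1:LVC.full` on "made" and `1` on "decision", the result is `{1: ("LVC.full", [2, 4])}`.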
82. Pentru patrie /
- Creator:
- Lupăşteanu, Aurel
- Type:
- text and memoirs
- Subject:
- History of states and territories of the Balkan Peninsula, memoirs, anti-fascist movements, uprisings, Romania, resistance, opposition, anti-fascism, anti-communism, world history 1939-1945, and army, military forces, soldiers
- Language:
- Romanian
- Rights:
- unknown
83. Plaintext Wikipedia dump 2018
- Creator:
- Rosa, Rudolf
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- Wikipedia, text corpora, and monolingual corpus
- Language:
- Abkhazian, Achinese, Adyghe, Afrikaans, Akan, Tosk Albanian, Amharic, Old English (ca. 450-1100), Arabic, Official Aramaic (700-300 BCE), Aragonese, Egyptian Arabic, Assamese, Asturian, Atikamekw, Avaric, Aymara, South Azerbaijani, Azerbaijani, Bashkir, Bambara, Bavarian, Central Bikol, Belarusian, Bengali, Bislama, Banjar, Tibetan, Bosnian, Bishnupriya, Breton, Buginese, Bulgarian, Russia Buriat, Catalan, Min Dong Chinese, Cebuano, Czech, Chamorro, Chechen, Cherokee, Church Slavic, Chuvash, Cheyenne, Central Kurdish, Cornish, Corsican, Cree, Crimean Tatar, Kashubian, Welsh, Danish, German, Dinka, Dimli (individual language), Dhivehi, Lower Sorbian, Dzongkha, Modern Greek (1453-), English, Esperanto, Estonian, Basque, Ewe, Extremaduran, Faroese, Persian, Fijian, Finnish, French, Arpitan, Northern Frisian, Western Frisian, Fulah, Friulian, Gagauz, Gan Chinese, Scottish Gaelic, Irish, Galician, Gilaki, Manx, Goan Konkani, Gothic, Guarani, Gujarati, Hakka Chinese, Haitian, Hausa, Hawaiian, Serbo-Croatian, Hebrew, Herero, Fiji Hindi, Hindi, Hiri Motu, Croatian, Upper Sorbian, Hungarian, Armenian, Igbo, Ido, Inuktitut, Interlingue, Iloko, Interlingua (International Auxiliary Language Association), Indonesian, Inupiaq, Icelandic, Italian, Jamaican Creole English, Javanese, Lojban, Japanese, Kara-Kalpak, Kabyle, Kalaallisut, Kannada, Kashmiri, Georgian, Kanuri, Kazakh, Kabardian, Kabiyè, Khmer, Kikuyu, Kinyarwanda, Kirghiz, Komi-Permyak, Komi, Kongo, Korean, Karachay-Balkar, Kölsch, Kurdish, Ladino, Lao, Latin, Latvian, Lak, Lezghian, Ligurian, Limburgan, Lingala, Lithuanian, Lombard, Northern Luri, Latgalian, Luxembourgish, Ganda, Literary Chinese, Marshallese, Maithili, Malayalam, Marathi, Moksha, Eastern Mari, Minangkabau, Macedonian, Malagasy, Maltese, Mongolian, Maori, Western Mari, Malay (macrolanguage), Creek, Mirandese, Burmese, Erzya, Mazanderani, Min Nan Chinese, Neapolitan, Nauru, Navajo, Ndonga, Low German, Nepali (macrolanguage), Newari, Dutch, Norwegian Nynorsk, Norwegian, Novial, Pedi, Nyanja, Occitan (post 1500), Livvi, Oriya (macrolanguage), Oromo, Ossetian, Pangasinan, Pampanga, Panjabi, Papiamento, Picard, Pennsylvania German, Pfaelzisch, Pitcairn-Norfolk, Pali, Piemontese, Western Panjabi, Pontic, Polish, Portuguese, Pushto, Quechua, Vlax Romani, Romansh, Romanian, Rusyn, Rundi, Macedo-Romanian, Russian, Sango, Yakut, Sanskrit, Sicilian, Scots, Samogitian, Sinhala, Slovak, Slovenian, Northern Sami, Samoan, Shona, Sindhi, Somali, Southern Sotho, Spanish, Albanian, Sardinian, Sranan Tongo, Serbian, Swati, Saterfriesisch, Sundanese, Swahili (macrolanguage), Swedish, Silesian, Tahitian, Tamil, Tatar, Tulu, Telugu, Tama (Colombia), Tetum, Tajik, Tagalog, Thai, Tigrinya, Tonga (Tonga Islands), Tok Pisin, Tswana, Tsonga, Turkmen, Tumbuka, Turkish, Twi, Tuvinian, Udmurt, Uighur, Ukrainian, Urdu, Uzbek, Venetian, Venda, Veps, Vietnamese, Vlaams, Volapük, Võro, Waray (Philippines), Walloon, Wolof, Wu Chinese, Kalmyk, Xhosa, Mingrelian, Yiddish, Yoruba, Yue Chinese, Zeeuws, Zhuang, Chinese, Zulu, and Dotyali
- Description:
- Wikipedia plain text data obtained from Wikipedia dumps with WikiExtractor in February 2018. The data come from all Wikipedias for which dumps could be downloaded at [https://dumps.wikimedia.org/]. This amounts to 297 Wikipedias, usually corresponding to individual languages and identified by their ISO codes. Several special Wikipedias are included, most notably "simple" (Simple English Wikipedia) and "incubator" (tiny hatching Wikipedias in various languages). For a list of all the Wikipedias, see [https://meta.wikimedia.org/wiki/List_of_Wikipedias]. The script that can be used to obtain a new version of the data is included, but note that Wikipedia limits the download speed when many dumps are downloaded, so downloading all of them takes a few days (one or a few can be downloaded quickly). Also, the format of the dumps changes from time to time, so the script will probably stop working eventually. The WikiExtractor tool [http://medialab.di.unipi.it/wiki/Wikipedia_Extractor] used to extract text from the Wikipedia dumps is not mine; I only modified it slightly to produce plaintext outputs [https://github.com/ptakopysk/wikiextractor].
- Rights:
- Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0), http://creativecommons.org/licenses/by-sa/3.0/, and PUB
84. Populaţie şi societate.
- Type:
- text and collected volumes
- Subject:
- Demography. Population, historical demography, Romanians, and Romania
- Language:
- Romanian
- Rights:
- unknown
85. Preliminarii ale interventiei militare a Tratatului de la Varşovia in Cehoslovacia /
- Creator:
- Pop, Adrian
- Subject:
- occupation, Warsaw Pact (organization), Prague Spring (1968), reform movements, Munich 1938, Prague Spring 1968, occupation 1939, 1968, world history from 1945 to the present, and Czechoslovakia 1948-1969
- Language:
- Romanian
- Rights:
- unknown
86. Presa muncitorească şi socialistă din România.
- Type:
- text and monograph
- Subject:
- Political parties and movements, labor movement, socialist movement, press, Romania, journalism, and world history 1789-1918
- Language:
- Romanian
- Rights:
- unknown
87. Prietenia Cehoslovaco-Română :
- Creator:
- Konečný, Zdeněk,
- Publisher:
- Editura militară,
- Type:
- monograph
- Subject:
- History of Czechia and Slovakia, Romanian army, World War II (1939-1945), liberation of Czechoslovakia, Czechoslovakia 1938-1945, Prague Uprising 1945, Romania, world history 1939-1945, and army, military forces, soldiers
- Language:
- Romanian
- Rights:
- unknown
88. Primele asociatii muncitoreşti in Romînia /
- Creator:
- Haupt, Georges,
- Type:
- text and study
- Subject:
- Political parties and movements, labor movement, Romania, social welfare, trade unions, and world history 1789-1918
- Language:
- Romanian
- Description:
- The first workers' associations in Romania.
- Rights:
- unknown
89. Programul de lupta al lui Crixos in cadrul răscoalei lui Spartacus /
- Creator:
- Daicoviciu, Hadrian,
- Type:
- text and study
- Subject:
- History of the countries of the ancient world, Spartacus, Roman Empire, antiquity, slave revolts, social structure, and Etruscans, ancient Rome
- Language:
- Romanian
- Rights:
- unknown
90. Publikace o Československu v cizích jazycích =
- Type:
- text and bibliography
- Subject:
- Bibliography. Catalogues, foreign-language Bohemica, Czechoslovak history, and general bibliographies
- Language:
- Czech, Bosnian, English, French, German, Italian, Polish, Russian, and Romanian
- Rights:
- unknown
91. Relatiile cehoslovaco-romane dupa marele razboi /
- Creator:
- Tejchman, Miroslav,
- Subject:
- Czechoslovak-Romanian relations, foreign policy, international relations, world history from 1918 to the present, Romania, and Czechoslovakia 1918-1992
- Language:
- Romanian
- Rights:
- unknown
92. Renaşterea africană /
- Creator:
- Voiculescu, Elena
- Type:
- text and collective monograph
- Subject:
- History of Africa, African history, political history, chronological overviews of world history, and political history, politicians
- Language:
- Romanian
- Rights:
- unknown
93. Resources and Tools for Romanian NLP
- Type:
- corpus
- Language:
- Romanian
- Description:
- Resources and tools developed for Romanian
- Rights:
- Not specified
94. Rok 1989 - pád komunistických režimov v Rumunsku a na Slovensku :
- Type:
- text and conference proceedings
- Subject:
- History of Europe, revolutions (1989), foreign periodicals and proceedings, world history from 1945 to the present, and political history, politicians
- Language:
- Slovak and Romanian
- Description:
- Above the title: Slovak Academy of Sciences, Romanian Academy, Commission of Romanian and Slovak Historians
- Rights:
- unknown
96. Romanian Explanatory Dictionary
- Type:
- lexicalConceptualResource
- Language:
- Romanian
- Description:
- 292,792 definitions
- Rights:
- Not specified
97. Romanian stopwords
- Type:
- lexicalConceptualResource
- Language:
- Romanian
- Description:
- Approx. 500 entries, plain text (TXT)
- Rights:
- Not specified
98. Romanian word frequency list
- Type:
- lexicalConceptualResource
- Language:
- Romanian
- Description:
- Approx. 2 million words, plain text (TXT)
- Rights:
- Not specified
99. Romanian-English dictionary
- Type:
- lexicalConceptualResource
- Language:
- English and Romanian
- Description:
- 38,000 entries, XML
- Rights:
- Not specified
100. Romanian-English parallel texts
- Type:
- corpus
- Language:
- English and Romanian
- Description:
- 1 million words; sentence annotation
- Rights:
- Not specified