Search Results
2. Bibliografia slovenských kníh 1901-1918
- Type:
- text and bibliographies
- Subject:
- Collected works, Slovak literature, history of the book, book printing, publishing houses, Slovakia 1848-1918, and general bibliographies
- Language:
- Slovak, Hungarian, and German
- Description:
- Co-publisher: Univerzitná knižnica v Bratislave and Štátna vedecká knižnica v Košiciach
- Rights:
- unknown
3. Bitka pri Moháči - historický medzník v dejinách strednej Európy (490. výročie)
- Type:
- text and conference proceedings
- Subject:
- Military affairs. National defence. Armed forces, Battle of Mohács (1526), Turkish wars, and foreign periodicals and proceedings
- Language:
- Slovak, Czech, German, Hungarian, Latin, and Polish
- Rights:
- unknown
4. Bitka pri Moháči - historický medzník v dejinách strednej Európy (490. výročie)
- Type:
- text and conference proceedings
- Subject:
- Military affairs. National defence. Armed forces, Battle of Mohács (1526), Turkish wars, and foreign periodicals and proceedings
- Language:
- Slovak, Czech, German, Hungarian, Latin, and Polish
- Rights:
- unknown
5. Bretka
- Publisher:
- Vojenský zeměpisný ústav
- Format:
- map and 1 map : colour ; 39 x 51 cm on sheet 47 x 63 cm
- Type:
- model:map, cartographic, and IMAGE
- Subject:
- udc:913(4), Konspekt:7, udc:912, udc:913(437.6), udc:912.43, udc:(084.3), Konspekt:Geografie Evropy, reálie, cestování, Konspekt:Mapy. Atlasy. Glóby, and czenas:Bretka (Slovensko : oblast)
- Language:
- Czech, Slovak, and Hungarian
- Description:
- 4665, Legend, Edition according to the sheet index, and (Language) Place names in Hungarian, partly in Slovak
- Rights:
- http://creativecommons.org/publicdomain/mark/1.0/ and policy:public
6. C4Corpus (CC BY-NC part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malayalam, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Panjabi, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family and extracted from CommonCrawl, the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB
7. C4Corpus (CC BY-NC-ND part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malayalam, Marathi, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Urdu, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family and extracted from CommonCrawl, the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB
8. C4Corpus (CC BY-NC-SA part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Malayalam, Marathi, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Urdu, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family and extracted from CommonCrawl, the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
9. C4Corpus (CC BY-ND part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Malayalam, Macedonian, Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family and extracted from CommonCrawl, the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution-NoDerivatives 4.0 International (CC BY-ND 4.0), http://creativecommons.org/licenses/by-nd/4.0/, and PUB
10. C4Corpus (CC BY-SA part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malayalam, Marathi, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Panjabi, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Urdu, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family and extracted from CommonCrawl, the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB
11. C4Corpus (CC-BY part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malayalam, Marathi, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Panjabi, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Urdu, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family and extracted from CommonCrawl, the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB
12. Cejkov
- Publisher:
- Vojenský zeměpisný ústav
- Format:
- map and 1 map : colour ; 39 x 51 cm on sheet 48 x 62 cm
- Type:
- model:map, cartographic, and IMAGE
- Subject:
- udc:913(4), Konspekt:7, udc:912, udc:913(437.6), udc:912.43, udc:(084.3), Konspekt:Geografie Evropy, reálie, cestování, Konspekt:Mapy. Atlasy. Glóby, and czenas:Cejkov (Slovensko : oblast)
- Language:
- Czech, Slovak, and Hungarian
- Description:
- 4667, Legend, Edition according to the sheet index, and (Language) Place names in Slovak and Hungarian
- Rights:
- http://creativecommons.org/publicdomain/mark/1.0/ and policy:public
13. Cirkev a náboženstvo v Uhorsku v ranom novoveku
- Type:
- text, university textbooks, and collective monographs
- Subject:
- History of the Christian Church, Curricula. School subjects. Textbooks, church, religion, church history, Reformation, Catholicism, early modern society, interconfessional relations, and foreign periodicals and proceedings
- Language:
- Slovak and Hungarian
- Description:
- "Preklad - ÚJK CCKV Prešovskej univerzity"--Verso of the title page and Part title: Egyház és vallás a koraújkori Magyaroszágon
- Rights:
- unknown
14. Cirkev a náboženstvo v Uhorsku v ranom novoveku
- Type:
- text, university textbooks, and collective monographs
- Subject:
- History of the Christian Church, Curricula. School subjects. Textbooks, church, religion, church history, Reformation, Catholicism, early modern society, interconfessional relations, and foreign periodicals and proceedings
- Language:
- Slovak and Hungarian
- Description:
- "Preklad - ÚJK CCKV Prešovskej univerzity"--Verso of the title page and Part title: Egyház és vallás a koraújkori Magyaroszágon
- Rights:
- unknown
15. CoNLL 2017 and 2018 Shared Task Blind and Preprocessed Test Data
- Creator:
- Zeman, Daniel and Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- tokenization, word segmentation, morphology, tagging, syntax, parsing, and universal dependencies
- Language:
- Afrikaans, Arabic, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Persian, Finnish, French, Old French (842-ca. 1400), Irish, Galician, Gothic, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Latin, Latvian, Dutch, Norwegian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Thai, Turkish, Uighur, Ukrainian, Urdu, Vietnamese, and Chinese
- Description:
- CoNLL 2017 and 2018 shared tasks: Multilingual Parsing from Raw Text to Universal Dependencies. This package contains the test data in the form in which they were presented to the participating systems: raw text files and files preprocessed by UDPipe. The metadata.json files contain lists of files to process and to output; README files in the respective folders describe the syntax of metadata.json. For full training, development and gold standard test data, see Universal Dependencies 2.0 (CoNLL 2017) and Universal Dependencies 2.2 (CoNLL 2018); the download links are at http://universaldependencies.org/. For more information on the shared tasks, see http://universaldependencies.org/conll17/ and http://universaldependencies.org/conll18/. Contents: conll17-ud-test-2017-05-09 (CoNLL 2017 test data); conll18-ud-test-2018-05-06 (CoNLL 2018 test data); conll18-ud-test-2018-05-06-for-conll17 (CoNLL 2018 test data with metadata and filenames modified so that it is digestible by the 2017 systems).
- Rights:
- Licence Universal Dependencies v2.2, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.2, and PUB
16. CoNLL 2017 Shared Task System Outputs
- Creator:
- Zeman, Daniel, Potthast, Martin, Straka, Milan, Popel, Martin, Dozat, Timothy, Qi, Peng, Manning, Christopher, Shi, Tianze, Wu, Felix G., Chen, Xilun, Cheng, Yao, Björkelund, Anders, Falenska, Agnieszka, Yu, Xiang, Kuhn, Jonas, Che, Wanxiang, Guo, Jiang, Wang, Yuxuan, Zheng, Bo, Zhao, Huaipeng, Liu, Yang, Teng, Dechuan, Liu, Ting, Lim, Kyungtae, Poibeau, Thierry, Sato, Motoki, Manabe, Hitoshi, Noji, Hiroshi, Matsumoto, Yuji, Kırnap, Ömer, Önder, Berkay Furkan, Yuret, Deniz, Straková, Jana, Vania, Clara, Zhang, Xingxing, Lopez, Adam, Heinecke, Johannes, Asadullah, Munshi, Kanerva, Jenna, Luotolahti, Juhani, Ginter, Filip, Kuan, Yu, Sofroniev, Pavel, Schill, Erik, Hinrichs, Erhard, Nguyen, Dat Quoc, Dras, Mark, Johnson, Mark, Qian, Xian, Vilares, David, Gómez-Rodríguez, Carlos, Aufrant, Lauriane, Wisniewski, Guillaume, Yvon, François, Dumitrescu, Stefan Daniel, Boroş, Tiberiu, Tufiş, Dan, Das, Ayan, Zaffar, Affan, Sarkar, Sudeshna, Wang, Hao, Zhao, Hai, Zhang, Zhisong, Hornby, Ryan, Taylor, Clark, Park, Jungyeul, de Lhoneux, Miryam, Shao, Yan, Basirat, Ali, Kiperwasser, Eliyahu, Stymne, Sara, Goldberg, Yoav, Nivre, Joakim, Akkuş, Burak Kerim, Azizoglu, Heval, Cakici, Ruket, Moor, Christophe, Merlo, Paola, Henderson, James, Wang, Haozhou, Ji, Tao, Wu, Yuanbin, Lan, Man, de la Clergerie, Eric, Sagot, Benoît, Seddah, Djamé, More, Amir, Tsarfaty, Reut, Kanayama, Hiroshi, Muraoka, Masayasu, Yoshikawa, Katsumasa, Garcia, Marcos, and Gamallo, Pablo
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- dependency parser and parsebank
- Language:
- Arabic, Bulgarian, Russia Buriat, Czech, Catalan, Church Slavic, Danish, German, Modern Greek (1453-), English, Spanish, Estonian, Basque, Persian, Finnish, French, Irish, Galician, Gothic, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Latin, Latvian, Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Northern Sami, Swedish, Turkish, Uighur, Ukrainian, Urdu, Vietnamese, and Chinese
- Description:
- This package contains the system outputs from the CoNLL 2017 Shared Task in Multilingual Parsing from Raw Text to Universal Dependencies.
- Rights:
- Licence Universal Dependencies v2.0, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.0, and PUB
17. CoNLL 2018 Shared Task System Outputs
- Creator:
- Zeman, Daniel, Potthast, Martin, Duthoo, Elie, Mesnard, Olivier, Rybak, Piotr, Wróblewska, Alina, Che, Wanxiang, Liu, Yijia, Wang, Yuxuan, Zheng, Bo, Liu, Ting, Li, Zuchao, He, Shexia, Zhang, Zhuosheng, Zhao, Hai, Wu, Yingting, Tong, Jia-Jun, Nguyen, Dat Quoc, Verspoor, Karin, Wan, Hui, Naseem, Tahira, Lee, Young-Suk, Castelli, Vittorio, Ballesteros, Miguel, Hershcovich, Daniel, Abend, Omri, Rappoport, Ari, Smith, Aaron, Bohnet, Bernd, de Lhoneux, Miryam, Nivre, Joakim, Shao, Yan, Stymne, Sara, Kırnap, Ömer, Dayanık, Erenay, Yuret, Deniz, Kanerva, Jenna, Ginter, Filip, Miekka, Niko, Leino, Akseli, Salakoski, Tapio, Lim, KyungTae, Park, Cheoneum, Lee, Changki, Poibeau, Thierry, Bhat, Riyaz Ahmad, Bhat, Irshad, Bangalore, Srinivas, Qi, Peng, Dozat, Timothy, Zhang, Yuhao, Manning, Christopher, Boroș, Tiberiu, Dumitrescu, Stefan Daniel, Burtica, Ruxandra, Arakelyan, Gor, Hambardzumyan, Karen, Khachatrian, Hrant, Rosa, Rudolf, Mareček, David, Straka, Milan, Seker, Amit, More, Amir, Tsarfaty, Reut, Önder, Berkay Furkan, Gümeli, Can, Jawahar, Ganesh, Muller, Benjamin, Fethi, Amal, Martin, Louis, Villemonte de la Clergerie, Eric, Sagot, Benoît, Seddah, Djamé, Özateş, Şaziye Betül, Özgür, Arzucan, Gungor, Tunga, Öztürk, Balkız, Ji, Tao, Liu, Yufang, Wang, Yijun, Wu, Yuanbin, Lan, Man, Chen, Danlu, Lin, Mengxiao, Hu, Zhifeng, and Qiu, Xipeng
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- parsed data, conllu, and universal dependencies
- Language:
- Afrikaans, Arabic, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Persian, Finnish, French, Old French (842-ca. 1400), Irish, Galician, Gothic, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Latin, Latvian, Dutch, Norwegian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Thai, Turkish, Uighur, Ukrainian, Urdu, Vietnamese, and Chinese
- Description:
- Test data parsed by systems submitted to the CoNLL 2018 UD parsing shared task.
- Rights:
- Licence Universal Dependencies v2.2, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.2, and PUB
18. Coronatus Posonii...
- Type:
- text and exhibition catalogues
- Subject:
- Objects of precious metals, medals, tokens, coronations, Hungarian monarchs, medal making, Slovakia 1526-1780, Slovakia 1780-1847, and monarchs, ruling dynasties, courts
- Language:
- Slovak and Hungarian
- Rights:
- unknown
19. Corpus for training and evaluating diacritics restoration systems
- Creator:
- Náplava, Jakub, Straka, Milan, Hajič, Jan, and Straňák, Pavel
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- diacritical marks generation and natural language correction
- Language:
- Czech, Vietnamese, Romanian, Polish, Slovak, Spanish, Croatian, Irish, Latvian, Hungarian, French, and Turkish
- Description:
- Corpus of texts in 12 languages. For each language, we provide one training, one development and one testing set acquired from Wikipedia articles. Moreover, each language dataset contains a (substantially larger) training set collected from (general) Web texts. All sets are disjoint, except for the Wikipedia and Web training sets, which can contain similar sentences. Data are segmented into sentences, which are further word-tokenized. All data in the corpus contain diacritics. To strip diacritics from them, use the Python script diacritization_stripping.py contained in the attached stripping_diacritics.zip. This script has two modes. We generally recommend using the method called uninames, which behaves better for some languages. The code for training a recurrent neural network based model for diacritics restoration is located at https://github.com/arahusky/diacritics_restoration.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
20. DaMuEL 1.0: A Large Multilingual Dataset for Entity Linking
- Creator:
- Kubeša, David and Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- entity linking, NEL, NER, dataset, and knowledge base
- Language:
- Afrikaans, Arabic, Armenian, Basque, Belarusian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Latin, Latvian, Lithuanian, Maltese, Marathi, Modern Greek (1453-), Northern Sami, Norwegian Nynorsk, Persian, Polish, Portuguese, Romanian, Russian, Scottish Gaelic, Serbian, Slovak, Slovenian, Spanish, Swedish, Tamil, Telugu, Uighur, Ukrainian, Urdu, Vietnamese, and Wolof
- Description:
- We present DaMuEL, a large Multilingual Dataset for Entity Linking containing data in 53 languages. DaMuEL consists of two components: a knowledge base that contains language-agnostic information about entities, including their claims from Wikidata and named entity types (PER, ORG, LOC, EVENT, BRAND, WORK_OF_ART, MANUFACTURED); and Wikipedia texts with entity mentions linked to the knowledge base, along with language-specific text from Wikidata such as labels, aliases, and descriptions, stored separately for each language. The Wikidata QID is used as a persistent, language-agnostic identifier, enabling the combination of the knowledge base with language-specific texts and information for each entity. Wikipedia documents deliberately annotate only a single mention for every entity present; we further automatically detect all mentions of named entities linked from each document. The dataset contains 27.9M named entities in the knowledge base and 12.3G tokens from Wikipedia texts. The dataset is published under the CC BY-SA licence.
- Rights:
- Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB
21. Deep Universal Dependencies 2.4
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, and Galician
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-2988). It contains additional deep-syntactic and semantic annotations. Version of Deep UD corresponds to the version of UD it is based on. Note however that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.4, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.4, and PUB
22. Deep Universal Dependencies 2.5
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, Galician, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, and Skolt Sami
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3105). It contains additional deep-syntactic and semantic annotations. Version of Deep UD corresponds to the version of UD it is based on. Note however that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.5, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.5, and PUB
23. Deep Universal Dependencies 2.6
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, Galician, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Icelandic, Albanian, and Persian
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3226). It contains additional deep-syntactic and semantic annotations. Version of Deep UD corresponds to the version of UD it is based on. Note however that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.6, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.6, and PUB
24. Deep Universal Dependencies 2.7
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, Galician, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Icelandic, Albanian, Persian, Akuntsu, Apurinã, Khunsari, Manx, Mundurukú, Nayini, Soi, South Levantine Arabic, and Tupinambá
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3424). It contains additional deep-syntactic and semantic annotations. Version of Deep UD corresponds to the version of UD it is based on. Note however that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.7, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.7, and PUB
25. Deep Universal Dependencies 2.8
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, Galician, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Icelandic, Albanian, Persian, Akuntsu, Apurinã, Khunsari, Manx, Mundurukú, Nayini, Soi, South Levantine Arabic, Tupinambá, Beja, Western Frisian, Urubú-Kaapor, Kangri, K'iche', Low German, Makuráp, Western Armenian, and Central Siberian Yupik
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3687). It contains additional deep-syntactic and semantic annotations. Version of Deep UD corresponds to the version of UD it is based on. Note however that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.8, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.8, and PUB
26. Deltacorpus
- Creator:
- Mareček, David, Yu, Zhiwei, Zeman, Daniel, and Žabokrtský, Zdeněk
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- part of speech, tagging, semi-supervised, and cross-language
- Language:
- Belarusian, Bosnian, Bulgarian, Czech, Serbo-Croatian, Croatian, Upper Sorbian, Macedonian, Polish, Russian, Slovak, Slovenian, Serbian, Ukrainian, Latvian, Lithuanian, Afrikaans, Danish, German, English, Faroese, Western Frisian, Swiss German, Icelandic, Limburgan, Luxembourgish, Low German, Dutch, Norwegian Nynorsk, Norwegian, Scots, Swedish, Yiddish, Aragonese, Asturian, Catalan, French, Galician, Haitian, Italian, Latin, Lombard, Neapolitan, Piemontese, Portuguese, Romanian, Spanish, Venetian, Walloon, Breton, Welsh, Scottish Gaelic, Irish, Modern Greek (1453-), Armenian, Albanian, Dimli (individual language), Persian, Gilaki, Kurdish, Tajik, Bengali, Bishnupriya, Gujarati, Fiji Hindi, Hindi, Marathi, Nepali (macrolanguage), Urdu, Amharic, Arabic, Egyptian Arabic, Hebrew, Estonian, Finnish, Hungarian, Basque, Georgian, Chuvash, Azerbaijani, Turkish, Uzbek, Kazakh, Tatar, Yakut, Korean, Mongolian, Telugu, Kannada, Malayalam, Tamil, Newari, Vietnamese, Indonesian, Javanese, Malagasy, Maori, Malay (macrolanguage), Pampanga, Sundanese, Tagalog, Waray (Philippines), Swahili (macrolanguage), Esperanto, Ido, Interlingua (International Auxiliary Language Association), and Volapük
- Description:
- Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), first 1,000,000 tokens per language, tagged by the delexicalized tagger described in Yu et al. (2016, LREC, Portorož, Slovenia).
- Rights:
- Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB
27. Deltacorpus 1.1
- Creator:
- Mareček, David, Yu, Zhiwei, Zeman, Daniel, and Žabokrtský, Zdeněk
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- part of speech, tagging, semi-supervised, and cross-language
- Language:
- Belarusian, Bosnian, Bulgarian, Czech, Serbo-Croatian, Croatian, Upper Sorbian, Macedonian, Polish, Russian, Slovak, Slovenian, Serbian, Ukrainian, Latvian, Lithuanian, Afrikaans, Danish, German, English, Faroese, Western Frisian, Swiss German, Icelandic, Limburgan, Luxembourgish, Low German, Dutch, Norwegian Nynorsk, Norwegian, Scots, Swedish, Yiddish, Aragonese, Asturian, Catalan, French, Galician, Haitian, Italian, Latin, Lombard, Neapolitan, Piemontese, Portuguese, Romanian, Spanish, Venetian, Walloon, Breton, Welsh, Scottish Gaelic, Irish, Modern Greek (1453-), Armenian, Albanian, Dimli (individual language), Persian, Gilaki, Kurdish, Tajik, Bengali, Bishnupriya, Gujarati, Fiji Hindi, Hindi, Marathi, Nepali (macrolanguage), Urdu, Amharic, Arabic, Egyptian Arabic, Hebrew, Estonian, Finnish, Hungarian, Basque, Georgian, Chuvash, Azerbaijani, Turkish, Uzbek, Kazakh, Tatar, Yakut, Korean, Mongolian, Telugu, Kannada, Malayalam, Tamil, Newari, Vietnamese, Indonesian, Javanese, Malagasy, Maori, Malay (macrolanguage), Pampanga, Sundanese, Tagalog, Waray (Philippines), Swahili (macrolanguage), Esperanto, Ido, Interlingua (International Auxiliary Language Association), and Volapük
- Description:
- Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), first 1,000,000 tokens per language, tagged by the delexicalized tagger described in Yu et al. (2016, LREC, Portorož, Slovenia). Changes in version 1.1: 1. Universal Dependencies tagset instead of the older and smaller Google Universal POS tagset. 2. SVM classifier trained on Universal Dependencies 1.2 instead of HamleDT 2.0. 3. Balto-Slavic, Germanic, and Romance languages were each tagged by a classifier trained only on the respective group of languages. Other languages were tagged by a classifier trained on all available languages. The "c7" combination from version 1.0 is no longer used.
- Rights:
- Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB
28. Dunajská Streda
- Publisher:
- Vojenský zeměpisný ústav
- Format:
- map and 1 map : color ; 39 x 52 cm on sheet 48 x 62 cm
- Type:
- model:map, cartographic, and IMAGE
- Subject:
- udc:913(4), Konspekt:7, udc:912, udc:913(437.6), udc:912.43, udc:(084.3), Konspekt:Geografie Evropy, reálie, cestování, Konspekt:Mapy. Atlasy. Glóby, and czenas:Dunajská Streda (Slovensko : oblast)
- Language:
- Czech, Slovak, and Hungarian
- Description:
- 4859, Legend, Edition by sheet index, and (Language) Place names in Slovak and Hungarian
- Rights:
- http://creativecommons.org/publicdomain/mark/1.0/ and policy:public
29. Esterházy Jánosról a közép-európai dialógus jegyében =
- Type:
- text and sborníky konferenční
- Subject:
- Mezinárodní vztahy, světová politika, Esterházy, János, politici slovenští, menšina maďarská, zahraniční periodika a sborníky, Československo 1918-1992, politické dějiny, politici, and Maďaři
- Language:
- Slovak and Hungarian
- Rights:
- unknown
30. Felekezetek, egyházpolitika, identitás Magyarországon és Szlovákiában 1945 után. = Konfesie, cirkevná politika, identita na Slovensku a v Maďarsku po roku 1945 /
- Publisher:
- Kossuth Kiadó
- Subject:
- sborníky, vztahy slovensko-maďarské, politika církevní, and zahraniční periodika a sborníky
- Language:
- Slovak and Hungarian
- Description:
- [Parallel Slovak and Hungarian text]
- Rights:
- unknown
31. Gróf Imrich Thököly a jeho povstanie =
- Type:
- text and monografie
- Subject:
- Genealogie. Heraldika. Šlechta. Vlajky, Thőkőly, Imre, šlechtici uherští, povstání, sborníky zahraniční, světové dějiny 1648-1789, Maďarsko, šlechta, buržoazie, měšťanstvo, podnikatelé, zahraniční periodika a sborníky, and Slovensko 1606-1711
- Language:
- Slovak and Hungarian
- Rights:
- unknown
32. Hajnáčka
- Publisher:
- Vojenský zeměpisný ústav
- Format:
- map and 1 map : color ; 39 x 51 cm on sheet 48 x 63 cm
- Type:
- model:map, cartographic, and IMAGE
- Subject:
- udc:913(4), Konspekt:7, udc:912, udc:913(437.6), udc:912.43, udc:(084.3), Konspekt:Geografie Evropy, reálie, cestování, Konspekt:Mapy. Atlasy. Glóby, and czenas:Hajnáčka (Slovensko : oblast)
- Language:
- Czech, Slovak, and Hungarian
- Description:
- 4764, Legend, Edition by sheet index, and (Language) Place names in Slovak and Hungarian
- Rights:
- http://creativecommons.org/publicdomain/mark/1.0/ and policy:public
33. HamleDT 2.0
- Creator:
- Zeman, Daniel, Mareček, David, Mašek, Jan, Popel, Martin, Ramasamy, Loganathan, Rosa, Rudolf, Štěpánek, Jan, and Žabokrtský, Zdeněk
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- treebank, Stanford dependencies, Prague dependencies, harmonization, common annotation style, and Interset
- Language:
- Arabic, Bulgarian, Bengali, Catalan, Czech, Danish, German, Modern Greek (1453-), English, Spanish, Estonian, Basque, Persian, Finnish, Ancient Greek (to 1453), Hindi, Hungarian, Italian, Japanese, Latin, Dutch, Portuguese, Romanian, Russian, Slovak, Slovenian, Swedish, Tamil, Telugu, and Turkish
- Description:
- HamleDT 2.0 is a collection of 30 existing treebanks harmonized into a common annotation style, the Prague Dependencies, and further transformed into Stanford Dependencies, a treebank annotation style that became popular recently. We use the newest basic Universal Stanford Dependencies, without added language-specific subtypes.
- Rights:
- HamleDT 2.0 Licence Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-hamledt-2.0, and ACA
34. HamleDT 3.0
- Creator:
- Zeman, Daniel, Mareček, David, Mašek, Jan, Popel, Martin, Ramasamy, Loganathan, Rosa, Rudolf, Štěpánek, Jan, and Žabokrtský, Zdeněk
- Publisher:
- Charles University
- Type:
- text and corpus
- Subject:
- annotated corpus, morphology, syntax, dependency, treebank, harmonized annotation, and common annotation style
- Language:
- Arabic, Basque, Bengali, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Modern Greek (1453-), Ancient Greek (to 1453), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Persian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Tamil, Telugu, and Turkish
- Description:
- HamleDT (HArmonized Multi-LanguagE Dependency Treebank) is a compilation of existing dependency treebanks (or dependency conversions of other treebanks), transformed so that they all conform to the same annotation style. This version uses Universal Dependencies as the common annotation style. Update (November 2017): for a current collection of harmonized dependency treebanks, we recommend using the Universal Dependencies (UD). All of the corpora that are distributed in HamleDT in full are also part of the UD project; only some corpora from the Patch group (where HamleDT provides only the harmonizing scripts but not the full corpus data) are available in HamleDT but not in UD.
- Rights:
- HamleDT 3.0 License Terms, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-hamledt-3.0, and PUB
35. JRC-Acquis
- Publisher:
- Joint Research Centre of the EU
- Type:
- corpus
- Language:
- Bulgarian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Modern Greek (1453-), Hungarian, Italian, Latvian, Maltese, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, and Swedish
- Description:
- The largest parallel corpus; it contains EU law (the Acquis Communautaire) in 22 languages.
- Rights:
- Not specified
36. Komárno
- Publisher:
- Vojenský zeměpisný ústav
- Format:
- map and 1 map : color ; 39 x 52 cm on sheet 47 x 64 cm
- Type:
- model:map, cartographic, and IMAGE
- Subject:
- udc:913(4), Konspekt:7, udc:912, udc:913(437.6), udc:912.43, udc:(084.3), Konspekt:Geografie Evropy, reálie, cestování, Konspekt:Mapy. Atlasy. Glóby, and czenas:Komárno (Slovensko : oblast)
- Language:
- Czech, Slovak, and Hungarian
- Description:
- 4860, Legend, Edition by sheet index, and (Language) Place names in Slovak, partly in Hungarian
- Rights:
- http://creativecommons.org/publicdomain/mark/1.0/ and policy:public
37. Košice
- Publisher:
- Vojenský zeměpisný ústav
- Format:
- map and 1 map : color ; 39 x 51 cm on sheet 47 x 63 cm
- Type:
- model:map, cartographic, and IMAGE
- Subject:
- udc:913(4), Konspekt:7, udc:912, udc:913(437.6), udc:912.43, udc:(084.3), Konspekt:Geografie Evropy, reálie, cestování, Konspekt:Mapy. Atlasy. Glóby, and czenas:Košice (Slovensko : oblast)
- Language:
- Czech, Slovak, and Hungarian
- Description:
- 4566, Legend, Edition by sheet index, and (Language) Place names in Slovak, partly in Hungarian
- Rights:
- http://creativecommons.org/publicdomain/mark/1.0/ and policy:public
38. Kráľ. Chlumec (ČOP)
- Publisher:
- Vojenský zeměpisný ústav
- Format:
- map and 1 map : color ; 39 x 51 cm on sheet 48 x 62 cm
- Type:
- model:map, cartographic, and IMAGE
- Subject:
- udc:913(4), Konspekt:7, udc:912, udc:913(437.6), udc:912.43, udc:(084.3), Konspekt:Geografie Evropy, reálie, cestování, Konspekt:Mapy. Atlasy. Glóby, and czenas:Kráľovský Chlmec (Slovensko : oblast)
- Language:
- Czech, Slovak, and Hungarian
- Description:
- 4668, Legend, Edition by sheet index, and (Language) Place names in Slovak and Hungarian
- Rights:
- http://creativecommons.org/publicdomain/mark/1.0/ and policy:public
39. Levice
- Publisher:
- Vojenský zeměpisný ústav
- Format:
- map and 1 map : color ; 39 x 51 cm on sheet 47 x 64 cm
- Type:
- model:map, cartographic, and IMAGE
- Subject:
- udc:913(4), Konspekt:7, udc:912, udc:913(437.6), udc:912.43, udc:(084.3), Konspekt:Geografie Evropy, reálie, cestování, Konspekt:Mapy. Atlasy. Glóby, and czenas:Levice (Slovensko : oblast)
- Language:
- Czech, Slovak, and Hungarian
- Description:
- 4761, Legend, Edition by sheet index, and (Language) Place names in Slovak and Hungarian
- Rights:
- http://creativecommons.org/publicdomain/mark/1.0/ and policy:public
40. Likvidácia reholí a ich život v ilegalite v rokoch 1950-1989 :
- Publisher:
- Ústav pamäti národa
- Type:
- sborníky konferenční
- Subject:
- Politika a náboženství. Vztahy mezi církví a státem, sborníky konferenční, sborníky zahraniční, řády církevní, represe komunistické, zahraniční periodika a sborníky, Československo 1945-1992, perzekuce, politická emigrace, and církevní řády a kongregace, náboženská bratrstva, kláštery
- Language:
- Slovak, Czech, Hungarian, and Italian
- Description:
- Cover title: Rehole v ilegalite
- Rights:
- unknown
41. Likvidácia reholí a ich život v ilegalite v rokoch 1950-1989 :
- Publisher:
- Ústav pamäti národa
- Type:
- sborníky konferenční
- Subject:
- Politika a náboženství. Vztahy mezi církví a státem, sborníky konferenční, sborníky zahraniční, řády církevní, represe komunistické, zahraniční periodika a sborníky, Československo 1945-1992, perzekuce, politická emigrace, and církevní řády a kongregace, náboženská bratrstva, kláštery
- Language:
- Slovak, Czech, Hungarian, and Italian
- Description:
- Cover title: Rehole v ilegalite
- Rights:
- unknown
42. Od Uhorského kráľovstva k Československej republike :
- Creator:
- Bandoľová, Margita
- Type:
- text and dokumenty
- Subject:
- Dějiny Česka a Slovenska, vznik Československa (1918), Československo 1918-1938, and vznik Československa 1918
- Language:
- Slovak, Czech, German, and Hungarian
- Rights:
- unknown
43. OmegaWiki
- Publisher:
- Universität Bamberg, World Language Documentation Centre
- Format:
- application/octet-stream
- Type:
- lexicalConceptualResource
- Language:
- Afrikaans, Arabic, Basque, Bulgarian, Catalan, Chinese, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, Georgian, Modern Greek (1453-), Hebrew, Hungarian, Icelandic, Indonesian, Interlingua (International Auxiliary Language Association), Irish, Italian, Japanese, Khmer, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Spanish, Swedish, Turkish, Ukrainian, and Welsh
- Rights:
- GFDL or CC and http://www.omegawiki.org/Licensing
44. Ovidius redivivus :
- Creator:
- Polgár, Anikó
- Publisher:
- UKF and Kalligram
- Type:
- studie
- Subject:
- Lingvistika. Jazyky, Ovidius, překlady literární, překladatelství, jazyk maďarský, poezie latinská, Maďarsko, světové dějiny od r. 1918 do současnosti, literatura, spisovatelé, and Etruskové, starověký Řím
- Language:
- Slovak and Hungarian
- Description:
- Includes a bibliography and bibliographical references
- Rights:
- unknown
45. Plaintext Wikipedia dump 2018
- Creator:
- Rosa, Rudolf
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- Wikipedia, text corpora, and monolingual corpus
- Language:
- Abkhazian, Achinese, Adyghe, Afrikaans, Akan, Tosk Albanian, Amharic, Old English (ca. 450-1100), Arabic, Official Aramaic (700-300 BCE), Aragonese, Egyptian Arabic, Assamese, Asturian, Atikamekw, Avaric, Aymara, South Azerbaijani, Azerbaijani, Bashkir, Bambara, Bavarian, Central Bikol, Belarusian, Bengali, Bislama, Banjar, Tibetan, Bosnian, Bishnupriya, Breton, Buginese, Bulgarian, Russia Buriat, Catalan, Min Dong Chinese, Cebuano, Czech, Chamorro, Chechen, Cherokee, Church Slavic, Chuvash, Cheyenne, Central Kurdish, Cornish, Corsican, Cree, Crimean Tatar, Kashubian, Welsh, Danish, German, Dinka, Dimli (individual language), Dhivehi, Lower Sorbian, Dzongkha, Modern Greek (1453-), English, Esperanto, Estonian, Basque, Ewe, Extremaduran, Faroese, Persian, Fijian, Finnish, French, Arpitan, Northern Frisian, Western Frisian, Fulah, Friulian, Gagauz, Gan Chinese, Scottish Gaelic, Irish, Galician, Gilaki, Manx, Goan Konkani, Gothic, Guarani, Gujarati, Hakka Chinese, Haitian, Hausa, Hawaiian, Serbo-Croatian, Hebrew, Herero, Fiji Hindi, Hindi, Hiri Motu, Croatian, Upper Sorbian, Hungarian, Armenian, Igbo, Ido, Inuktitut, Interlingue, Iloko, Interlingua (International Auxiliary Language Association), Indonesian, Inupiaq, Icelandic, Italian, Jamaican Creole English, Javanese, Lojban, Japanese, Kara-Kalpak, Kabyle, Kalaallisut, Kannada, Kashmiri, Georgian, Kanuri, Kazakh, Kabardian, Kabiyè, Khmer, Kikuyu, Kinyarwanda, Kirghiz, Komi-Permyak, Komi, Kongo, Korean, Karachay-Balkar, Kölsch, Kurdish, Ladino, Lao, Latin, Latvian, Lak, Lezghian, Ligurian, Limburgan, Lingala, Lithuanian, Lombard, Northern Luri, Latgalian, Luxembourgish, Ganda, Literary Chinese, Marshallese, Maithili, Malayalam, Marathi, Moksha, Eastern Mari, Minangkabau, Macedonian, Malagasy, Maltese, Mongolian, Maori, Western Mari, Malay (macrolanguage), Creek, Mirandese, Burmese, Erzya, Mazanderani, Min Nan Chinese, Neapolitan, Nauru, Navajo, Ndonga, Low German, Nepali (macrolanguage), Newari, Dutch, Norwegian Nynorsk, Norwegian, Novial, Pedi, Nyanja, Occitan (post 1500), Livvi, Oriya (macrolanguage), Oromo, Ossetian, Pangasinan, Pampanga, Panjabi, Papiamento, Picard, Pennsylvania German, Pfaelzisch, Pitcairn-Norfolk, Pali, Piemontese, Western Panjabi, Pontic, Polish, Portuguese, Pushto, Quechua, Vlax Romani, Romansh, Romanian, Rusyn, Rundi, Macedo-Romanian, Russian, Sango, Yakut, Sanskrit, Sicilian, Scots, Samogitian, Sinhala, Slovak, Slovenian, Northern Sami, Samoan, Shona, Sindhi, Somali, Southern Sotho, Spanish, Albanian, Sardinian, Sranan Tongo, Serbian, Swati, Saterfriesisch, Sundanese, Swahili (macrolanguage), Swedish, Silesian, Tahitian, Tamil, Tatar, Tulu, Telugu, Tama (Colombia), Tetum, Tajik, Tagalog, Thai, Tigrinya, Tonga (Tonga Islands), Tok Pisin, Tswana, Tsonga, Turkmen, Tumbuka, Turkish, Twi, Tuvinian, Udmurt, Uighur, Ukrainian, Urdu, Uzbek, Venetian, Venda, Veps, Vietnamese, Vlaams, Volapük, Võro, Waray (Philippines), Walloon, Wolof, Wu Chinese, Kalmyk, Xhosa, Mingrelian, Yiddish, Yoruba, Yue Chinese, Zeeuws, Zhuang, Chinese, Zulu, and Dotyali
- Description:
- Wikipedia plain text data obtained from Wikipedia dumps with WikiExtractor in February 2018. The data come from all Wikipedias for which dumps could be downloaded at [https://dumps.wikimedia.org/]. This amounts to 297 Wikipedias, usually corresponding to individual languages and identified by their ISO codes. Several special Wikipedias are included, most notably "simple" (Simple English Wikipedia) and "incubator" (tiny hatching Wikipedias in various languages). For a list of all the Wikipedias, see [https://meta.wikimedia.org/wiki/List_of_Wikipedias]. The script that can be used to get a new version of the data is included, but note that Wikipedia limits the download speed when many dumps are downloaded, so it takes a few days to download all of them (one or a few can be downloaded quickly). Also, the format of the dumps changes from time to time, so the script will probably stop working eventually. The WikiExtractor tool [http://medialab.di.unipi.it/wiki/Wikipedia_Extractor] used to extract text from the Wikipedia dumps is not mine; I only modified it slightly to produce plaintext outputs [https://github.com/ptakopysk/wikiextractor].
- Rights:
- Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0), http://creativecommons.org/licenses/by-sa/3.0/, and PUB
46. Přirozený svět a fenomenologie
- Creator:
- Jan Patočka
- Publisher:
- 1.3, 48 pp. Essay. [Source text for the Slovak translation (see 1967/2). The openings of the texts differ slightly.] 2nd printing in: Fenomenologické spisy II (SS-7/Fen-II), Praha 2009, pp. 202–237 (see 2009/1).
- Type:
- Text
- Subject:
- 1967/2, 1969/8, 1970/10, 1972/1, 1972/2, 1976/7, 1980, 1988/29, 1989/16, 1991/2, 1996/7, 2003/23, 2004/10, 2009/1, cs, de, en, es, fr, hu, it, sk, SS-7/Fen-II, and stať
- Language:
- English, French, Italian, Hungarian, German, Slovak, Spanish, and Czech
- Rights:
- open access and Rights holder: Archiv Jana Patočky, z.s.
47. Rakousko-uherská monarchie. Habsburská říše 1867-1918 slovem a obrazem.
- Publisher:
- Slovart
- Subject:
- prameny obrazové, přehledná zpracování (tematicky), světové dějiny 1789-1918, and Habsburská monarchie
- Language:
- Czech, English, German, Hungarian, and Slovak
- Rights:
- unknown
48. Reformácia v strednej Európe =
- Type:
- text and sborníky konferenční
- Subject:
- Dějiny křesťanské církve, reformace, protestantismus, církve evangelické, and zahraniční periodika a sborníky
- Language:
- Slovak, English, German, and Hungarian
- Description:
- Papers from the international scientific conference "Reformácia v strednej a juhovýchodnej Európe" held 4-7 December 2017 in Prešov, Parallel title: Reformáció közép-Európában, and Parallel title: Reformation in mittel-Europa.
- Rights:
- unknown
49. Reformácia v strednej Európe =
- Type:
- text and sborníky konferenční
- Subject:
- Dějiny křesťanské církve, reformace, protestantismus, církve evangelické, and zahraniční periodika a sborníky
- Language:
- Slovak, English, German, and Hungarian
- Description:
- Papers from the international scientific conference "Reformácia v strednej a juhovýchodnej Európe" held 4-7 December 2017 in Prešov, Parallel title: Reformáció közép-Európában, and Parallel title: Reformation in mittel-Europa.
- Rights:
- unknown
50. Reformácia v strednej Európe =
- Type:
- text and sborníky konferenční
- Subject:
- Dějiny křesťanské církve, reformace, protireformace, protestantismus, církve evangelické, and zahraniční periodika a sborníky
- Language:
- Slovak, English, German, and Hungarian
- Description:
- Papers from the international scientific conference "Reformácia v strednej a juhovýchodnej Európe" held 4-7 December 2017 in Prešov, Parallel title: Reformáció közép-Európában, and Parallel title: Reformation in mittel-Europa
- Rights:
- unknown