Search Results
2. Adriaan van Roomen en Polen :
- Creator:
- Bockstaele, Paul
- Type:
- text and monograph
- Subject:
- History of Europe, Roomen, Adriaan van, Broscius, Johannes, cultural relations, history of science, exact sciences, mathematics, mathematicians, Poland, mathematics, cybernetics, and world history 1492-1648
- Language:
- Dutch
- Rights:
- unknown
3. Alpino Treebank
- Publisher:
- Center for Language and Cognition
- Format:
- application/xml
- Type:
- corpus
- Language:
- Dutch
- Description:
- A database of 7,000 syntactically analyzed Dutch sentences.
- Rights:
- Not specified
4. Amara - universal subtitles
- Type:
- corpus
- Language:
- Arabic, Danish, Dutch, English, German, Modern Greek (1453-), Italian, Japanese, Korean, Portuguese, Russian, Spanish, and Turkish
- Description:
- A large set of subtitles in multiple languages, available for download. It can be used as a parallel corpus.
- Rights:
- Not specified
5. Ancorae : Steunpunten voor studie en onderwijs :
- Creator:
- Aerts, Erik
- Type:
- text and documents
- Subject:
- History of Europe, economic history, settlement history, historical demography, economic development, trade, banking, economic history, and world history of the Middle Ages (until 1492)
- Language:
- Dutch
- Rights:
- unknown
6. Athenae Batavae, Leidse Universiteit :
7. Benedikta de Spinozy Spisy filosofické
- Creator:
- Benedictus de Spinoza, František Kalda, and Josef Hrůša
- Publisher:
- Česká akademie věd a umění
- Format:
- print and xii, 276 p.
- Type:
- text, volume, treatise, model:monograph, and TEXT
- Subject:
- Philosophical systems and viewpoints, Spinoza, Benedictus de, 1632-1677, Dutch philosophy, philosophy and religion, 14(492), 2:1, (049), 5, and 14
- Language:
- Czech, Latin, and Dutch
- Description:
- [Vol.] 2: Krátké pojednání o bohu, člověku a jeho blahu (Short Treatise on God, Man, and His Well-Being); Listy (Letters), Benedikt de Spinoza; translated from Dutch by František Kalda; translated from Latin by Josef Hrůša, and KČSN
- Rights:
- http://creativecommons.org/licenses/by-nc-sa/4.0/ and policy:public
8. Benedikta de Spinozy Spisy filosofické
- Creator:
- Benedictus de Spinoza, František Krejčí, Čestmír Stehlík, and Stejskal, Alois
- Publisher:
- Česká akademie věd a umění
- Format:
- print and xli, 339 p.
- Type:
- text, volume, biography, treatise, model:monograph, and TEXT
- Subject:
- Modern Western philosophy, Spinoza, Benedictus de, 1632-1677, 17th century, philosophers, Dutch philosophy, logic, 14(492), 15, (049), 5, and 14(100-15)"15/20"
- Language:
- Czech and Dutch
- Description:
- [Vol.] 1: Rozprava o zdokonalení rozumu (Treatise on the Emendation of the Intellect) and Ethika po geometricku vyložená (Ethics, Demonstrated in Geometrical Order), Benedikt de Spinoza; translated by Frant. Krejčí, Čestmír Stehlík, and Alois Stejskal, and KČSN
- Rights:
- http://creativecommons.org/licenses/by-nc-sa/4.0/ and policy:public
9. Benedikta de Spinozy Spisy filosofické
- Creator:
- Benedictus de Spinoza, František Krejčí, Čestmír Stehlík, and Stejskal, Alois
- Type:
- title, biography, treatise, model:monograph, and TEXT
- Language:
- Czech and Dutch
- Rights:
- http://creativecommons.org/licenses/by-nc-sa/4.0/ and policy:public
10. Bohemen en de Nederlanden in de zestiende eeuw :
- Creator:
- Mout, M. E. H. N.
- Type:
- text and monograph
- Subject:
- International relations, world politics, Czech-Dutch relations, the Netherlands, world history 1492-1648, foreign policy, international relations, Czech lands 1471-1526, and Czech lands 1526-1792
- Language:
- Dutch
- Rights:
- unknown
11. Bronnenmateriaal uit de Brugse Stadsrekeningen betreffende de Hongersnood van 1316 /
- Creator:
- Werveke, Hans van
- Type:
- text and study
- Subject:
- History of other European states, Belgian cities, social history, famines, Belgium, settlement history, regional history, and world history of the Middle Ages (until 1492)
- Language:
- Dutch
- Rights:
- unknown
12. Bůh filosofů a Bůh Pascalův. Na pomezí filosofie a theologie /
- Creator:
- Boer, Theo de
- Publisher:
- Eman
- Subject:
- Pascal, Blaise, Christian philosophy, philosophy of religion, theology, relations between philosophy and religion, French philosophy, world history 1492-1648, world history 1648-1789, France, and philosophy, philosophers
- Language:
- Czech and Dutch
- Rights:
- unknown
13. C4Corpus (CC BY-NC part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malayalam, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Panjabi, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons family of licenses and extracted from CommonCrawl, the largest publicly available general web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB
14. C4Corpus (CC BY-NC-ND part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malayalam, Marathi, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Urdu, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons family of licenses and extracted from CommonCrawl, the largest publicly available general web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB
15. C4Corpus (CC BY-NC-SA part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Malayalam, Marathi, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Urdu, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons family of licenses and extracted from CommonCrawl, the largest publicly available general web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
16. C4Corpus (CC BY-ND part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Malayalam, Macedonian, Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons family of licenses and extracted from CommonCrawl, the largest publicly available general web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution-NoDerivatives 4.0 International (CC BY-ND 4.0), http://creativecommons.org/licenses/by-nd/4.0/, and PUB
17. C4Corpus (CC BY-SA part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malayalam, Marathi, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Panjabi, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Urdu, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons family of licenses and extracted from CommonCrawl, the largest publicly available general web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB
18. C4Corpus (CC-BY part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malayalam, Marathi, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Panjabi, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Urdu, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons family of licenses and extracted from CommonCrawl, the largest publicly available general web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB
19. C4Corpus (publicdomain part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Croatian, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Dutch, Norwegian, Polish, Portuguese, Russian, Slovenian, Somali, Spanish, Swahili (macrolanguage), Swedish, Tagalog, Thai, Turkish, Ukrainian, Undetermined, and Vietnamese
- Description:
- A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons family of licenses and extracted from CommonCrawl, the largest publicly available general web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Public Domain Mark (PD), http://creativecommons.org/publicdomain/mark/1.0/, and PUB
20. CELEX (web version)
- Publisher:
- Max Planck Institute for Psycholinguistics
- Type:
- lexicalConceptualResource
- Language:
- Dutch, English, and German
- Rights:
- Not specified
21. Chrudim /
- Creator:
- Šulc, Ivo
- Publisher:
- Město Chrudim, Prezipp Chrudim
- Subject:
- towns, guides, urban history, artistic monuments, towns, municipalities, and overviews of the history of the Czech lands (chronological)
- Language:
- Czech, English, French, German, and Dutch
- Rights:
- unknown
22. Code-switching conversation corpus
- Publisher:
- Max Planck Institute for Psycholinguistics
- Type:
- corpus
- Language:
- Dutch
- Description:
- The code-switching corpus consists of five 30-minute conversations, each among four speakers (i.e., a total of 20 speakers). The speakers are bilingual in Papiamento (a creole language spoken in the Dutch Antilles) and Dutch. In their free conversations, they engage in code-switching, that is, they use both languages within the same utterance in systematic ways. The corpus is fully transcribed, glossed, and coded for language and word class in ELAN.
- Rights:
- Not specified
23. Comenius en Naarden :
- Creator:
- Goedhart, Pieter J.
- Type:
- text and commemorative publication
- Subject:
- Museums. Museology. Museum management. Exhibitions, Komenský, Jan Amos, mausoleums, museums, Czech philosophers, teachers, Czech-Dutch relations, foreign museums and galleries, Czech lands 1526-1620, history of science, art, culture, and technology, cultural relations, Czech lands 1620-1740, the Netherlands, and world history of the modern era (1492-1918)
- Language:
- Dutch and Czech
- Description:
- Published on the occasion of the 80th anniversary of the Comenius Mausoleum (8 May 1937 to 8 May 2017)
- Rights:
- unknown
24. Comenius in Nederland :
- Creator:
- Groenendijk, Leendert Frans
- Type:
- text and study
- Subject:
- History of civilization. Cultural history, Komenský, Jan Amos, pedagogical thought, Czech lands 1620-1740, and education, pedagogy, teachers, youth welfare
- Language:
- Dutch
- Rights:
- unknown
25. CoNLL 2017 and 2018 Shared Task Blind and Preprocessed Test Data
- Creator:
- Zeman, Daniel and Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- tokenization, word segmentation, morphology, tagging, syntax, parsing, and universal dependencies
- Language:
- Afrikaans, Arabic, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Persian, Finnish, French, Old French (842-ca. 1400), Irish, Galician, Gothic, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Latin, Latvian, Dutch, Norwegian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Thai, Turkish, Uighur, Ukrainian, Urdu, Vietnamese, and Chinese
- Description:
- CoNLL 2017 and 2018 shared tasks: Multilingual Parsing from Raw Text to Universal Dependencies. This package contains the test data in the form in which they were presented to the participating systems: raw text files and files preprocessed by UDPipe. The metadata.json files contain lists of files to process and to output; README files in the respective folders describe the syntax of metadata.json. For full training, development, and gold-standard test data, see Universal Dependencies 2.0 (CoNLL 2017) and Universal Dependencies 2.2 (CoNLL 2018); download links are at http://universaldependencies.org/. For more information on the shared tasks, see http://universaldependencies.org/conll17/ and http://universaldependencies.org/conll18/. Contents: conll17-ud-test-2017-05-09 (CoNLL 2017 test data); conll18-ud-test-2018-05-06 (CoNLL 2018 test data); conll18-ud-test-2018-05-06-for-conll17 (CoNLL 2018 test data with metadata and filenames modified so that the 2017 systems can process it).
- Rights:
- Licence Universal Dependencies v2.2, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.2, and PUB
26. CoNLL 2017 Shared Task System Outputs
- Creator:
- Zeman, Daniel, Potthast, Martin, Straka, Milan, Popel, Martin, Dozat, Timothy, Qi, Peng, Manning, Christopher, Shi, Tianze, Wu, Felix G., Chen, Xilun, Cheng, Yao, Björkelund, Anders, Falenska, Agnieszka, Yu, Xiang, Kuhn, Jonas, Che, Wanxiang, Guo, Jiang, Wang, Yuxuan, Zheng, Bo, Zhao, Huaipeng, Liu, Yang, Teng, Dechuan, Liu, Ting, Lim, Kyungtae, Poibeau, Thierry, Sato, Motoki, Manabe, Hitoshi, Noji, Hiroshi, Matsumoto, Yuji, Kırnap, Ömer, Önder, Berkay Furkan, Yuret, Deniz, Straková, Jana, Vania, Clara, Zhang, Xingxing, Lopez, Adam, Heinecke, Johannes, Asadullah, Munshi, Kanerva, Jenna, Luotolahti, Juhani, Ginter, Filip, Kuan, Yu, Sofroniev, Pavel, Schill, Erik, Hinrichs, Erhard, Nguyen, Dat Quoc, Dras, Mark, Johnson, Mark, Qian, Xian, Vilares, David, Gómez-Rodríguez, Carlos, Aufrant, Lauriane, Wisniewski, Guillaume, Yvon, François, Dumitrescu, Stefan Daniel, Boroş, Tiberiu, Tufiş, Dan, Das, Ayan, Zaffar, Affan, Sarkar, Sudeshna, Wang, Hao, Zhao, Hai, Zhang, Zhisong, Hornby, Ryan, Taylor, Clark, Park, Jungyeul, de Lhoneux, Miryam, Shao, Yan, Basirat, Ali, Kiperwasser, Eliyahu, Stymne, Sara, Goldberg, Yoav, Nivre, Joakim, Akkuş, Burak Kerim, Azizoglu, Heval, Cakici, Ruket, Moor, Christophe, Merlo, Paola, Henderson, James, Wang, Haozhou, Ji, Tao, Wu, Yuanbin, Lan, Man, de la Clergerie, Eric, Sagot, Benoît, Seddah, Djamé, More, Amir, Tsarfaty, Reut, Kanayama, Hiroshi, Muraoka, Masayasu, Yoshikawa, Katsumasa, Garcia, Marcos, and Gamallo, Pablo
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- dependency parser and parsebank
- Language:
- Arabic, Bulgarian, Russia Buriat, Czech, Catalan, Church Slavic, Danish, German, Modern Greek (1453-), English, Spanish, Estonian, Basque, Persian, Finnish, French, Irish, Galician, Gothic, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Latin, Latvian, Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Northern Sami, Swedish, Turkish, Uighur, Ukrainian, Urdu, Vietnamese, and Chinese
- Description:
- This package contains the system outputs from the CoNLL 2017 Shared Task in Multilingual Parsing from Raw Text to Universal Dependencies.
- Rights:
- Licence Universal Dependencies v2.0, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.0, and PUB
27. CoNLL 2018 Shared Task System Outputs
- Creator:
- Zeman, Daniel, Potthast, Martin, Duthoo, Elie, Mesnard, Olivier, Rybak, Piotr, Wróblewska, Alina, Che, Wanxiang, Liu, Yijia, Wang, Yuxuan, Zheng, Bo, Liu, Ting, Li, Zuchao, He, Shexia, Zhang, Zhuosheng, Zhao, Hai, Wu, Yingting, Tong, Jia-Jun, Nguyen, Dat Quoc, Verspoor, Karin, Wan, Hui, Naseem, Tahira, Lee, Young-Suk, Castelli, Vittorio, Ballesteros, Miguel, Hershcovich, Daniel, Abend, Omri, Rappoport, Ari, Smith, Aaron, Bohnet, Bernd, de Lhoneux, Miryam, Nivre, Joakim, Shao, Yan, Stymne, Sara, Kırnap, Ömer, Dayanık, Erenay, Yuret, Deniz, Kanerva, Jenna, Ginter, Filip, Miekka, Niko, Leino, Akseli, Salakoski, Tapio, Lim, KyungTae, Park, Cheoneum, Lee, Changki, Poibeau, Thierry, Bhat, Riyaz Ahmad, Bhat, Irshad, Bangalore, Srinivas, Qi, Peng, Dozat, Timothy, Zhang, Yuhao, Manning, Christopher, Boroș, Tiberiu, Dumitrescu, Stefan Daniel, Burtica, Ruxandra, Arakelyan, Gor, Hambardzumyan, Karen, Khachatrian, Hrant, Rosa, Rudolf, Mareček, David, Straka, Milan, Seker, Amit, More, Amir, Tsarfaty, Reut, Önder, Berkay Furkan, Gümeli, Can, Jawahar, Ganesh, Muller, Benjamin, Fethi, Amal, Martin, Louis, Villemonte de la Clergerie, Eric, Sagot, Benoît, Seddah, Djamé, Özateş, Şaziye Betül, Özgür, Arzucan, Gungor, Tunga, Öztürk, Balkız, Ji, Tao, Liu, Yufang, Wang, Yijun, Wu, Yuanbin, Lan, Man, Chen, Danlu, Lin, Mengxiao, Hu, Zhifeng, and Qiu, Xipeng
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- parsed data, conllu, and universal dependencies
- Language:
- Afrikaans, Arabic, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Persian, Finnish, French, Old French (842-ca. 1400), Irish, Galician, Gothic, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Latin, Latvian, Dutch, Norwegian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Thai, Turkish, Uighur, Ukrainian, Urdu, Vietnamese, and Chinese
- Description:
- Test data parsed by systems submitted to the CoNLL 2018 UD parsing shared task.
- Rights:
- Licence Universal Dependencies v2.2, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.2, and PUB
28. Coreference in Universal Dependencies 0.1 (CorefUD 0.1)
- Creator:
- Nedoluzhko, Anna, Novák, Michal, Popel, Martin, Žabokrtský, Zdeněk, and Zeman, Daniel
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- dependency, treebank, coreference, bridging relations, and harmonized annotation
- Language:
- Catalan, Czech, Dutch, English, French, German, Hungarian, Lithuanian, Polish, Russian, and Spanish
- Description:
- CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version 0.1 consists of 17 datasets for 11 languages. The datasets are enriched with automatic morphological and syntactic annotations that are fully compliant with the standards of the Universal Dependencies project. All the datasets are stored in the CoNLL-U format, with coreference- and bridging-specific information captured by attribute-value pairs located in the MISC column. The collection is divided into a public edition and a non-public (ÚFAL-internal) edition. The publicly available edition is distributed via LINDAT-CLARIAH-CZ and contains 13 datasets for 10 languages (1 dataset for Catalan, 2 for Czech, 2 for English, 1 for French, 2 for German, 1 for Hungarian, 1 for Lithuanian, 1 for Polish, 1 for Russian, and 1 for Spanish), excluding the test data. The non-public edition is available internally to ÚFAL members and contains 4 additional datasets for 2 languages (1 dataset for Dutch and 3 for English), which we are not allowed to distribute due to the license restrictions of the original resources. It also contains the test data portions for all datasets. When using any of the harmonized datasets, please read its license (placed in the same directory as the data) and also cite the original data resource.
  References to original resources whose harmonized versions are contained in the public edition of CorefUD 0.1:
  - Catalan-AnCora: Recasens, M. and Martí, M. A. (2010). AnCora-CO: Coreferentially Annotated Corpora for Spanish and Catalan. Language Resources and Evaluation, 44(4):315–345.
  - Czech-PCEDT: Nedoluzhko, A., Novák, M., Cinková, S., Mikulová, M., and Mírovský, J. (2016). Coreference in Prague Czech-English Dependency Treebank. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 169–176, Portorož, Slovenia. European Language Resources Association.
  - Czech-PDT: Hajič, J., Bejček, E., Hlaváčová, J., Mikulová, M., Straka, M., Štěpánek, J., and Štěpánková, B. (2020). Prague Dependency Treebank - Consolidated 1.0. In Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), pages 5208–5218, Marseille, France. European Language Resources Association.
  - English-GUM: Zeldes, A. (2017). The GUM Corpus: Creating Multilayer Resources in the Classroom. Language Resources and Evaluation, 51(3):581–612.
  - English-ParCorFull: Lapshinova-Koltunski, E., Hardmeier, C., and Krielke, P. (2018). ParCorFull: a Parallel Corpus Annotated with Full Coreference. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association.
  - French-Democrat: Landragin, F. (2016). Description, modélisation et détection automatique des chaînes de référence (DEMOCRAT). Bulletin de l'Association Française pour l'Intelligence Artificielle, (92):11–15.
  - German-ParCorFull: Lapshinova-Koltunski, E., Hardmeier, C., and Krielke, P. (2018). ParCorFull: a Parallel Corpus Annotated with Full Coreference. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association.
  - German-PotsdamCC: Bourgonje, P. and Stede, M. (2020). The Potsdam Commentary Corpus 2.2: Extending annotations for shallow discourse parsing. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 1061–1066, Marseille, France. European Language Resources Association.
  - Hungarian-SzegedKoref: Vincze, V., Hegedűs, K., Sliz-Nagy, A., and Farkas, R. (2018). SzegedKoref: A Hungarian Coreference Corpus. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association.
  - Lithuanian-LCC: Žitkus, V. and Butkienė, R. (2018). Coreference Annotation Scheme and Corpus for Lithuanian Language. In Fifth International Conference on Social Networks Analysis, Management and Security, SNAMS 2018, Valencia, Spain, October 15-18, 2018, pages 243–250. IEEE.
  - Polish-PCC: Ogrodniczuk, M., Glowińska, K., Kopeć, M., Savary, A., and Zawisławska, M. (2013). Polish coreference corpus. In Human Language Technology. Challenges for Computer Science and Linguistics - 6th Language and Technology Conference, LTC 2013, Poznań, Poland, December 7-9, 2013. Revised Selected Papers, volume 9561 of Lecture Notes in Computer Science, pages 215–226. Springer.
  - Russian-RuCor: Toldova, S., Roytberg, A., Ladygina, A. A., Vasilyeva, M. D., Azerkovich, I. L., Kurzukov, M., Sim, G., Gorshkov, D. V., Ivanova, A., Nedoluzhko, A., and Grishina, Y. (2014). Evaluating Anaphora and Coreference Resolution for Russian. In Komp’juternaja lingvistika i intellektual’nye tehnologii. Po materialam ezhegodnoj Mezhdunarodnoj konferencii Dialog, pages 681–695.
  - Spanish-AnCora: Recasens, M. and Martí, M. A. (2010). AnCora-CO: Coreferentially Annotated Corpora for Spanish and Catalan. Language Resources and Evaluation, 44(4):315–345.
  References to original resources whose harmonized versions are contained in the ÚFAL-internal edition of CorefUD 0.1:
  - Dutch-COREA: Hendrickx, I., Bouma, G., Coppens, F., Daelemans, W., Hoste, V., Kloosterman, G., Mineur, A.-M., Van Der Vloet, J., and Verschelde, J.-L. (2008). A coreference corpus and resolution system for Dutch. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association.
  - English-ARRAU: Uryupina, O., Artstein, R., Bristot, A., Cavicchio, F., Delogu, F., Rodriguez, K. J., and Poesio, M. (2020). Annotating a broad range of anaphoric phenomena, in a variety of genres: the ARRAU Corpus. Natural Language Engineering, 26(1):95–128.
  - English-OntoNotes: Weischedel, R., Hovy, E., Marcus, M., Palmer, M., Belvin, R., Pradhan, S., Ramshaw, L., and Xue, N. (2011). OntoNotes: A large training corpus for enhanced processing. In Handbook of Natural Language Processing and Machine Translation: DARPA Global Autonomous Language Exploitation, pages 54–63, New York. Springer-Verlag.
  - English-PCEDT: Nedoluzhko, A., Novák, M., Cinková, S., Mikulová, M., and Mírovský, J. (2016). Coreference in Prague Czech-English Dependency Treebank. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 169–176, Portorož, Slovenia. European Language Resources Association.
- Rights:
- Licence CorefUD v0.1, https://lindat.mff.cuni.cz/repository/xmlui/page/license-corefud-0.1, and PUB
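The MISC-column encoding described for CorefUD can be read with a few lines of standard-library code. This Python sketch assumes only the general CoNLL-U conventions (ten tab-separated fields per word line, MISC as the tenth field, `|`-separated `Name=Value` attributes, `_` for an empty field). The function names `parse_misc` and `misc_of_tokens` are hypothetical, and the `Entity=(e1)` attribute in the sample is illustrative only. It is not necessarily the attribute name or value syntax used in any given CorefUD release.

```python
from typing import Dict, List


def parse_misc(misc: str) -> Dict[str, str]:
    """Split a CoNLL-U MISC field into attribute-value pairs.

    "_" marks an empty MISC column; otherwise attributes are separated
    by "|" and written as Name=Value (only the first "=" splits).
    """
    if misc == "_":
        return {}
    pairs: Dict[str, str] = {}
    for item in misc.split("|"):
        key, sep, value = item.partition("=")
        # An attribute without "=" is kept with an empty value.
        pairs[key] = value if sep else ""
    return pairs


def misc_of_tokens(conllu_text: str) -> List[Dict[str, str]]:
    """Return the parsed MISC column of every 10-field line, in file order."""
    result: List[Dict[str, str]] = []
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        fields = line.split("\t")
        if len(fields) != 10:
            continue  # not a well-formed word line
        result.append(parse_misc(fields[9]))
    return result


# Hypothetical one-sentence sample; the Entity attribute is illustrative only.
SAMPLE = (
    "# sent_id = s1\n"
    "# text = John slept.\n"
    "1\tJohn\tJohn\tPROPN\t_\t_\t2\tnsubj\t_\tEntity=(e1)\n"
    "2\tslept\tsleep\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)

if __name__ == "__main__":
    for misc in misc_of_tokens(SAMPLE):
        print(misc)
```

The sketch keeps attribute values as raw strings. Interpreting the coreference and bridging values themselves depends on the annotation conventions of the specific CorefUD version.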
29. Coreference in Universal Dependencies 0.2 (CorefUD 0.2)
- Creator:
- Nedoluzhko, Anna, Novák, Michal, Popel, Martin, Žabokrtský, Zdeněk, and Zeman, Daniel
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- dependency, treebank, coreference, bridging relations, and harmonized annotation
- Language:
- Catalan, Czech, Dutch, English, French, German, Hungarian, Lithuanian, Polish, Russian, and Spanish
- Description:
- CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version 0.2 consists of 17 datasets for 11 languages. The datasets are enriched with automatic morphological and syntactic annotations that are fully compliant with the standards of the Universal Dependencies project. All the datasets are stored in the CoNLL-U format, with coreference- and bridging-specific information captured by attribute-value pairs located in the MISC column. The collection is divided into a public edition and a non-public (ÚFAL-internal) edition. The publicly available edition is distributed via LINDAT-CLARIAH-CZ and contains 13 datasets for 10 languages (1 dataset for Catalan, 2 for Czech, 2 for English, 1 for French, 2 for German, 1 for Hungarian, 1 for Lithuanian, 1 for Polish, 1 for Russian, and 1 for Spanish), excluding the test data. The non-public edition is available internally to ÚFAL members and contains 4 additional datasets for 2 languages (1 dataset for Dutch and 3 for English), which we are not allowed to distribute due to the license restrictions of the original resources. It also contains the test data portions for all datasets. When using any of the harmonized datasets, please read its license (placed in the same directory as the data) and also cite the original data resource. Version 0.2 consists of exactly the same datasets as version 0.1. All automatically parsed datasets were re-parsed for v0.2 using UDPipe 2 with models trained on UD 2.6. Catalan-AnCora, Spanish-AnCora, and English-GUM have been updated to match their UD 2.9 versions.
- Rights:
- Licence CorefUD v0.2, https://lindat.mff.cuni.cz/repository/xmlui/page/license-corefud-0.2, and PUB
30. Coreference in Universal Dependencies 1.0 (CorefUD 1.0)
- Creator:
- Nedoluzhko, Anna, Novák, Michal, Popel, Martin, Žabokrtský, Zdeněk, Zeldes, Amir, Zeman, Daniel, Bourgonje, Peter, Cinková, Silvie, Hajič, Jan, Hardmeier, Christian, Krielke, Pauline, Landragin, Frédéric, Lapshinova-Koltunski, Ekaterina, Martí, M. Antònia, Mikulová, Marie, Ogrodniczuk, Maciej, Recasens, Marta, Stede, Manfred, Straka, Milan, Toldova, Svetlana, Vincze, Veronika, and Žitkus, Voldemaras
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- dependency, treebank, coreference, bridging relations, and harmonized annotation
- Language:
- Catalan, Czech, Dutch, English, French, German, Hungarian, Lithuanian, Polish, Russian, and Spanish
- Description:
- CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version 1.0 consists of 17 datasets for 11 languages. The datasets are enriched with automatic morphological and syntactic annotations that are fully compliant with the standards of the Universal Dependencies project. All the datasets are stored in the CoNLL-U format, with coreference- and bridging-specific information captured by attribute-value pairs located in the MISC column. The collection is divided into a public edition and a non-public (ÚFAL-internal) edition. The publicly available edition is distributed via LINDAT-CLARIAH-CZ and contains 13 datasets for 10 languages (1 dataset for Catalan, 2 for Czech, 2 for English, 1 for French, 2 for German, 1 for Hungarian, 1 for Lithuanian, 1 for Polish, 1 for Russian, and 1 for Spanish), excluding the test data. The non-public edition is available internally to ÚFAL members and contains 4 additional datasets for 2 languages (1 dataset for Dutch and 3 for English), which we are not allowed to distribute due to the license restrictions of the original resources. It also contains the test data portions for all datasets. When using any of the harmonized datasets, please read its license (placed in the same directory as the data) and also cite the original data resource. Version 1.0 consists of the same corpora and languages as the previous version 0.2; however, the English GUM dataset has been updated to a newer and larger version, and in the Czech/English PCEDT dataset, the train-dev-test split has been changed to be compatible with OntoNotes. The main change, however, is in the file format: the MISC attributes have a new form and interpretation.
- Rights:
- Licence CorefUD v0.2, https://lindat.mff.cuni.cz/repository/xmlui/page/license-corefud-0.2, and PUB
31. CorpusExplorer
- Creator:
- Rüdiger, Jan Oliver
- Publisher:
- Jan Oliver Rüdiger
- Type:
- tool and toolService
- Subject:
- Corpus Linguistics, NLP, conll, tei, XML, nlp, Natural Language Processing, linguistics, Linguistics, Computational Linguistics, corpus processing, tagger, POS tagger, lemmatization, text cleaning, CommonCrawl, epub, JSON, Twitter, Pandoc, Wikipedia, digital data, DTA, DSpin, MySQL, ElasticSearch, TextGrid, text corpora, TigerXML, and WeblichtXML
- Language:
- German, English, French, Italian, Dutch, Spanish, Polish, Arabic, Chinese, and Portuguese
- Description:
- Software for corpus linguists and text/data mining enthusiasts. The CorpusExplorer combines over 45 interactive visualizations in a user-friendly interface. Routine tasks such as text acquisition, cleaning, or tagging are fully automated. The simple interface supports use in university teaching and leads users and students to substantial results quickly. The CorpusExplorer supports many standards (XML, CSV, JSON, R, etc.) and also offers its own software development kit (SDK). Source code is available at https://github.com/notesjor/corpusexplorer2.0
- Rights:
- Not specified
32. Correspondentie van Willem den Eerste, Prins van Oranje.
- Creator:
- Japikse, N.,
- Type:
- text and correspondence
- Subject:
- History of other European countries, William, correspondence, Dutch politicians, Netherlands, political history, politicians, and world history 1492-1648
- Language:
- Dutch
- Rights:
- unknown
33. CST's lemmatiser
- Publisher:
- Center for Sprogteknologi, University of Copenhagen
- Type:
- toolService
- Language:
- Danish, Dutch, English, German, Modern Greek (1453-), Icelandic, Norwegian, Russian, Slovenian, and Swedish
- Description:
- (1) Fully automatic rule-based lemmatization of inflected languages; (2) fully automatic training of lemmatization rules from a list of full forms and their lemmas
- Rights:
- Not specified
34. DaMuEL 1.0: A Large Multilingual Dataset for Entity Linking
- Creator:
- Kubeša, David and Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- entity linking, NEL, NER, dataset, and knowledge base
- Language:
- Afrikaans, Arabic, Armenian, Basque, Belarusian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Latin, Latvian, Lithuanian, Maltese, Marathi, Modern Greek (1453-), Northern Sami, Norwegian Nynorsk, Persian, Polish, Portuguese, Romanian, Russian, Scottish Gaelic, Serbian, Slovak, Slovenian, Spanish, Swedish, Tamil, Telugu, Uighur, Ukrainian, Urdu, Vietnamese, and Wolof
- Description:
- We present DaMuEL, a large Multilingual Dataset for Entity Linking containing data in 53 languages. DaMuEL consists of two components: a knowledge base that contains language-agnostic information about entities, including their claims from Wikidata and named entity types (PER, ORG, LOC, EVENT, BRAND, WORK_OF_ART, MANUFACTURED); and Wikipedia texts with entity mentions linked to the knowledge base, along with language-specific text from Wikidata such as labels, aliases, and descriptions, stored separately for each language. The Wikidata QID is used as a persistent, language-agnostic identifier, enabling the combination of the knowledge base with language-specific texts and information for each entity. Wikipedia documents deliberately annotate only a single mention for every entity present; we further automatically detect all mentions of named entities linked from each document. The dataset contains 27.9M named entities in the knowledge base and 12.3G tokens from Wikipedia texts. The dataset is published under the CC BY-SA licence.
- Rights:
- Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB
35. De banden tussen Nederland en Tsjechoslowakije /
- Creator:
- Polišenský, Josef,
- Type:
- text and surveys
- Subject:
- International relations, world politics, Czech-Dutch relations, survey treatments of the history of the Czech lands (chronological), foreign policy, international relations, Netherlands, and survey treatments of world history (chronological)
- Language:
- Dutch
- Rights:
- unknown
36. De Custen van Noorwegen, Finmarcken, Laplandt, Spitsbergen, Ian Mayen eylandt, Yslandt
- Creator:
- Goos, Pieter
- Publisher:
- Goos, Pieter
- Format:
- 1 map : 44.5 x 54.5 cm and cartographic document
- Type:
- model:map and IMAGE
- Language:
- Dutch
- Description:
- Hand-coloured map
- Rights:
- http://creativecommons.org/publicdomain/mark/1.0/ and policy:public
37. De economische ontwikkeling van Europa 950-1950 /
- Creator:
- Van der Wee, Herman,
- Type:
- text and monograph
- Subject:
- History of Europe, economic history, economics, economy, syntheses, economic history, and survey treatments of world history (chronological)
- Language:
- Dutch
- Rights:
- unknown
38. De kerk als minderheid /
- Type:
- text and collected volumes
- Subject:
- Religion, Reformed churches, Protestant (Evangelical) churches, Dutch churches, Czech churches, church history, church-state relations, survey treatments of world history (chronological), Netherlands, churches, sects, and survey treatments of the history of the Czech lands (chronological)
- Language:
- Dutch
- Description:
- Also published in Czech
- Rights:
- unknown
39. DE LEVENS-BESCHRYVINGEN DER NEDERLANDSCHE KONST-SCHILDERS EN KONST-SCHILDERESSEN, ...
- Creator:
- Weyerman, Jacob Campo and Blussé en zoon, Abraham
- Publisher:
- Blussé en zoon, Abraham
- Format:
- print and [4], 92, 475 [recte 465], [5] pp ; 4°
- Type:
- model:monograph and TEXT
- Subject:
- 18th century, Dutch painting, and biography
- Language:
- Dutch
- Description:
- VIERDE DEEL.
- Rights:
- http://creativecommons.org/publicdomain/mark/1.0/ and policy:public
40. DE LEVENS-BESCHRYVINGEN DER NEDERLANDSCHE KONST-SCHILDERS EN KONST-SCHILDERESSEN, ...
- Creator:
- Weyerman, Jacob Campo, Houbraken, Jacobus, Boucquet, Engelbrecht, Scheurleer, Henri, Boucquet, Frederik, and Jongh, Jacobus de
- Publisher:
- Boucquet, Engelbrecht, Scheurleer, Henri, Boucquet, Frederik, and Jongh, Jacobus de
- Format:
- print and [4], 446, [4] pp + [8] tab. ; 4°
- Type:
- model:monograph and TEXT
- Subject:
- 18th century, Dutch painting, and biography
- Language:
- Dutch
- Description:
- DERDE DEEL.
- Rights:
- http://creativecommons.org/publicdomain/mark/1.0/ and policy:public
41. DE LEVENS-BESCHRYVINGEN DER NEDERLANDSCHE KONST-SCHILDERS EN KONST-SCHILDERESSEN, ...
- Creator:
- Weyerman, Jakob Campo and Houbraken, Jacobus
- Type:
- model:monograph and TEXT
- Language:
- Dutch
- Rights:
- http://creativecommons.org/publicdomain/mark/1.0/ and policy:public
42. De socialisten, personen en stelsels.
- Creator:
- Quack, Hendrik P. G.,
- Type:
- text and monograph
- Subject:
- History of Europe, socialism, socialists, socialist movement, political history, politicians, and world history 1789-1918
- Language:
- Dutch
- Rights:
- unknown
43. De uitbarsting /
- Creator:
- Japikse, N.,
- Type:
- text and monograph
- Subject:
- International relations, world politics, First World War (1914-1918), foreign policy, world history 1914-1918, and foreign policy, international relations
- Language:
- Dutch
- Rights:
- unknown
44. Deep Universal Dependencies 2.4
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, and Galician
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-2988). It contains additional deep-syntactic and semantic annotations. The version number of Deep UD corresponds to the version of UD it is based on. Note, however, that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.4, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.4, and PUB
45. Deep Universal Dependencies 2.5
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, Galician, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, and Skolt Sami
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3105). It contains additional deep-syntactic and semantic annotations. The version number of Deep UD corresponds to the version of UD it is based on. Note, however, that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.5, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.5, and PUB
46. Deep Universal Dependencies 2.6
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, Galician, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Icelandic, Albanian, and Persian
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3226). It contains additional deep-syntactic and semantic annotations. The version number of Deep UD corresponds to the version of UD it is based on. Note, however, that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.6, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.6, and PUB
47. Deep Universal Dependencies 2.7
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, Galician, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Icelandic, Albanian, Persian, Akuntsu, Apurinã, Khunsari, Manx, Mundurukú, Nayini, Soi, South Levantine Arabic, and Tupinambá
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3424). It contains additional deep-syntactic and semantic annotations. The version number of Deep UD corresponds to the version of UD it is based on. Note, however, that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.7, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.7, and PUB
48. Deep Universal Dependencies 2.8
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, Galician, Bhojpuri, Komi-Permyak, Livvi, Moksha, Scottish Gaelic, Skolt Sami, Icelandic, Albanian, Persian, Akuntsu, Apurinã, Khunsari, Manx, Mundurukú, Nayini, Soi, South Levantine Arabic, Tupinambá, Beja, Western Frisian, Urubú-Kaapor, Kangri, K'iche', Low German, Makuráp, Western Armenian, and Central Siberian Yupik
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3687). It contains additional deep-syntactic and semantic annotations. The version number of Deep UD corresponds to the version of UD it is based on. Note, however, that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.8, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.8, and PUB
49. Delftse Bijbel 1477
- Publisher:
- NBG/DBNL/INL; Nicoline van der Sijs
- Type:
- corpus
- Language:
- Dutch
- Description:
- Digitised version of the Delftse Bijbel 1477
- Rights:
- Not specified
50. Deltacorpus
- Creator:
- Mareček, David, Yu, Zhiwei, Zeman, Daniel, and Žabokrtský, Zdeněk
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- part of speech, tagging, semi-supervised, and cross-language
- Language:
- Belarusian, Bosnian, Bulgarian, Czech, Serbo-Croatian, Croatian, Upper Sorbian, Macedonian, Polish, Russian, Slovak, Slovenian, Serbian, Ukrainian, Latvian, Lithuanian, Afrikaans, Danish, German, English, Faroese, Western Frisian, Swiss German, Icelandic, Limburgan, Luxembourgish, Low German, Dutch, Norwegian Nynorsk, Norwegian, Scots, Swedish, Yiddish, Aragonese, Asturian, Catalan, French, Galician, Haitian, Italian, Latin, Lombard, Neapolitan, Piemontese, Portuguese, Romanian, Spanish, Venetian, Walloon, Breton, Welsh, Scottish Gaelic, Irish, Modern Greek (1453-), Armenian, Albanian, Dimli (individual language), Persian, Gilaki, Kurdish, Tajik, Bengali, Bishnupriya, Gujarati, Fiji Hindi, Hindi, Marathi, Nepali (macrolanguage), Urdu, Amharic, Arabic, Egyptian Arabic, Hebrew, Estonian, Finnish, Hungarian, Basque, Georgian, Chuvash, Azerbaijani, Turkish, Uzbek, Kazakh, Tatar, Yakut, Korean, Mongolian, Telugu, Kannada, Malayalam, Tamil, Newari, Vietnamese, Indonesian, Javanese, Malagasy, Maori, Malay (macrolanguage), Pampanga, Sundanese, Tagalog, Waray (Philippines), Swahili (macrolanguage), Esperanto, Ido, Interlingua (International Auxiliary Language Association), and Volapük
- Description:
- Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), first 1,000,000 tokens per language, tagged by the delexicalized tagger described in Yu et al. (2016, LREC, Portorož, Slovenia).
- Rights:
- Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB
51. Deltacorpus 1.1
- Creator:
- Mareček, David, Yu, Zhiwei, Zeman, Daniel, and Žabokrtský, Zdeněk
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- part of speech, tagging, semi-supervised, and cross-language
- Language:
- Belarusian, Bosnian, Bulgarian, Czech, Serbo-Croatian, Croatian, Upper Sorbian, Macedonian, Polish, Russian, Slovak, Slovenian, Serbian, Ukrainian, Latvian, Lithuanian, Afrikaans, Danish, German, English, Faroese, Western Frisian, Swiss German, Icelandic, Limburgan, Luxembourgish, Low German, Dutch, Norwegian Nynorsk, Norwegian, Scots, Swedish, Yiddish, Aragonese, Asturian, Catalan, French, Galician, Haitian, Italian, Latin, Lombard, Neapolitan, Piemontese, Portuguese, Romanian, Spanish, Venetian, Walloon, Breton, Welsh, Scottish Gaelic, Irish, Modern Greek (1453-), Armenian, Albanian, Dimli (individual language), Persian, Gilaki, Kurdish, Tajik, Bengali, Bishnupriya, Gujarati, Fiji Hindi, Hindi, Marathi, Nepali (macrolanguage), Urdu, Amharic, Arabic, Egyptian Arabic, Hebrew, Estonian, Finnish, Hungarian, Basque, Georgian, Chuvash, Azerbaijani, Turkish, Uzbek, Kazakh, Tatar, Yakut, Korean, Mongolian, Telugu, Kannada, Malayalam, Tamil, Newari, Vietnamese, Indonesian, Javanese, Malagasy, Maori, Malay (macrolanguage), Pampanga, Sundanese, Tagalog, Waray (Philippines), Swahili (macrolanguage), Esperanto, Ido, Interlingua (International Auxiliary Language Association), and Volapük
- Description:
- Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), first 1,000,000 tokens per language, tagged by the delexicalized tagger described in Yu et al. (2016, LREC, Portorož, Slovenia). Changes in version 1.1: 1. The Universal Dependencies tagset is used instead of the older and smaller Google Universal POS tagset. 2. The SVM classifier was trained on Universal Dependencies 1.2 instead of HamleDT 2.0. 3. Balto-Slavic, Germanic, and Romance languages were tagged by a classifier trained only on the respective language group. Other languages were tagged by a classifier trained on all available languages. The "c7" combination from version 1.0 is no longer used.
- Rights:
- Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB
52. Dietrich von Nieheim, zyne opvatting van het Concilie en zyne Kronick :
- Creator:
- Mulder, Willem Johannes Maria,
- Type:
- text
- Subject:
- Christian theology. Dogmatic theology, Theodoricus de Nieheim,, reformers, Council of Constance (1414-1418), individuals (church history), and medieval world history (to 1492)
- Language:
- Dutch
- Rights:
- unknown
53. DPC (Dutch Parallel Corpus)
- Publisher:
- Katholieke Universiteit Leuven Campus Kortrijk, Hogeschool Gent
- Type:
- corpus
- Language:
- Dutch, English, and French
- Description:
- Parallel corpus with Dutch as the first language, 10M words (under construction). DPC is a STEVIN project.
- Rights:
- Not specified
54. Dutch Bilingualism Data Base (DBD)
- Publisher:
- Radboud University Nijmegen, Max Planck Institute for Psycholinguistics, Meertens Institute KNAW The Netherlands, and Babylon Centre for Studies of Multilingualism in the Multicultural Society
- Type:
- corpus
- Language:
- Arabic, Dutch, and Turkish
- Description:
- Audio recordings, transcripts,
- Rights:
- Not specified
55. DynaSAND (Dynamic Syntactic Atlas of the Dutch dialects)
- Publisher:
- Meertens Institute KNAW The Netherlands
- Type:
- corpus
- Language:
- Dutch
- Description:
- The Dynamic Syntactic Atlas of the Dutch dialects (DynaSAND) is an on-line tool for dialect syntax research. DynaSAND consists of a database, a search engine, a cartographic component and a bibliography.
- Rights:
- Not specified
56. Een belangrijk arabisch bericht over de slawische volken omstreeks 965 n. Ch. /
- Creator:
- Goeje, Michael Jan de,
- Type:
- text and study
- Subject:
- History of Europe, Ibrahim ibn Jákúb,, Al-Bekri,, Arab chroniclers, codices, Slavs, Arabic chronicles, editions, palaeography, philology, Arab-Slavic relations, historiography, historical sciences, historians, medieval world history (to 1492), and manuscripts
- Language:
- Dutch
- Description:
- Reprinted from the Verslagen en Medeelingen der Koninklijke Akademie van Wetenschappen, Afdeling Letterkunde, 2nd series, vol. IX.
- Rights:
- unknown
57. Essais d'arithmétique politique contenant trois traités sur la population de la province de Hollande et Frise occidentale /
- Creator:
- Kersseboom, Willem,
- Type:
- text and sources
- Subject:
- Demography. Population, historical demography, Dutch demographers, and history of historical demography and statistics, individuals
- Language:
- French and Dutch
- Rights:
- unknown
58. Europa in een boek /
- Creator:
- Presser, Jacob,
- Type:
- text and study
- Subject:
- Germanic literatures, European history, social history, social structure, and survey treatments of world history (chronological)
- Language:
- Dutch
- Rights:
- unknown
59. Franz Kafka en Praag /
- Creator:
- Salfellner, Harald,
- Publisher:
- Vitalis,
- Subject:
- Kafka, Franz,, German-language writers, Jews, biography, Czech lands 1848-1918, Czechoslovakia 1918-1938, and literature, writers
- Language:
- Dutch
- Rights:
- unknown
60. HamleDT 2.0
- Creator:
- Zeman, Daniel, Mareček, David, Mašek, Jan, Popel, Martin, Ramasamy, Loganathan, Rosa, Rudolf, Štěpánek, Jan, and Žabokrtský, Zdeněk
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- treebank, Stanford dependencies, Prague dependencies, harmonization, common annotation style, and Interset
- Language:
- Arabic, Bulgarian, Bengali, Catalan, Czech, Danish, German, Modern Greek (1453-), English, Spanish, Estonian, Basque, Persian, Finnish, Ancient Greek (to 1453), Hindi, Hungarian, Italian, Japanese, Latin, Dutch, Portuguese, Romanian, Russian, Slovak, Slovenian, Swedish, Tamil, Telugu, and Turkish
- Description:
- HamleDT 2.0 is a collection of 30 existing treebanks harmonized into a common annotation style, the Prague Dependencies, and further transformed into Stanford Dependencies, a treebank annotation style that has recently become popular. We use the newest basic Universal Stanford Dependencies, without added language-specific subtypes.
- Rights:
- HamleDT 2.0 Licence Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-hamledt-2.0, and ACA
61. HamleDT 3.0
- Creator:
- Zeman, Daniel, Mareček, David, Mašek, Jan, Popel, Martin, Ramasamy, Loganathan, Rosa, Rudolf, Štěpánek, Jan, and Žabokrtský, Zdeněk
- Publisher:
- Charles University
- Type:
- text and corpus
- Subject:
- annotated corpus, morphology, syntax, dependency, treebank, harmonized annotation, and common annotation style
- Language:
- Arabic, Basque, Bengali, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Modern Greek (1453-), Ancient Greek (to 1453), Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Persian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Tamil, Telugu, and Turkish
- Description:
- HamleDT (HArmonized Multi-LanguagE Dependency Treebank) is a compilation of existing dependency treebanks (or dependency conversions of other treebanks), transformed so that they all conform to the same annotation style. This version uses Universal Dependencies as the common annotation style. Update (November 2017): for a current collection of harmonized dependency treebanks, we recommend using Universal Dependencies (UD). All of the corpora distributed in full in HamleDT are also part of the UD project; only some corpora from the Patch group (where HamleDT provides only the harmonizing scripts but not the full corpus data) are available in HamleDT but not in UD.
- Rights:
- HamleDT 3.0 License Terms, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-hamledt-3.0, and PUB
62. Het gezicht der vrijheid /
- Creator:
- Mout, M. E. H. N.,
- Type:
- text and lectures
- Subject:
- Historical science. Auxiliary historical sciences. Archival science, Dutch historiography, estates' liberties, religious freedom, civil liberty, and terminology (historiography)
- Language:
- Dutch
- Description:
- Address marking the launch of the Chair of Central European Studies at Leiden University, 14 June 1991
- Rights:
- unknown
63. Het gezicht der vrijheid :
- Creator:
- Mout, M. E. H. N.,
- Type:
- text and speeches
- Subject:
- Historical science. Auxiliary historical sciences. Archival science, Dutch historiography, estates' liberty, religious freedom, civil liberty, terminology (historiography), Netherlands, modern world history (1492-1918), and political history, politicians
- Language:
- Dutch and German
- Description:
- Address marking the launch of the Chair of Central European Studies at Leiden University, 14 June 1991, and Parallel title: Gesicht der Freiheit : Vorlesung gehalten beim Antritt des Amts eines Professors für Mitteleuropäische Studien mit besonderer Berücksichtigung Österreichs errichtet von der Stichting Oostenrijkse Studiën an der Universität Leiden am 14. Juni 1991
- Rights:
- unknown
64. Het Nederlandsch Economisch-Historisch Archief /
- Creator:
- Wiersum, Eppe,
- Type:
- text and study
- Subject:
- Historical science. Auxiliary historical sciences. Archival science, Dutch archives, economic history, foreign archival science, Netherlands, world history 1789-1918, and historiography, historical sciences, historians
- Language:
- Dutch
- Rights:
- unknown
65. Het Noorderkwartier :
- Creator:
- Woude, Adrianus Maria van der,
- Type:
- text and monograph
- Subject:
- History of other European countries, demography, economic history, Netherlands, modern world history (1492-1918), economic history, and historical demography
- Language:
- Dutch
- Rights:
- unknown
66. Het onderwijsbeleid in Nederlands-Indie 1900-1940 :
- Type:
- text and documents
- Subject:
- History of Southeast Asian countries, foreign policy, colonialism, Netherlands, Indonesia, foreign policy, international relations, world history 1789-1918, and world history 1918-1945
- Language:
- Dutch
- Rights:
- unknown
67. Het Panslavisme bij de Tsjechen en Slovaken /
- Creator:
- Locher, Th. J. G.,
- Type:
- text and study
- Subject:
- History of the Czech Republic and Slovakia, Pan-Slavism, history of ideas, ideology, Czech lands 1792-1918, and nationalities, inter-ethnic relations and national movements
- Language:
- Dutch
- Description:
- Supplement to: Verslag van het eerste Congres van nederlandsche Historici gehouden te S'Gravenhage den 14 Mei 1932
- Rights:
- unknown
68. Historici in de politiek /
- Type:
- text and collective monograph
- Subject:
- Politics, Palacký, František,, Raffles, Thomas Stamford,, Mommsen, Theodor,, Bernstein, Eduard,, Hanotaux, Gabriel,, Miljukov, Pavel Nikolajevič,, historians, political life, world history 1789-1918, Czech lands 1792-1918, and history of science, art, culture and technology, cultural relations
- Language:
- Dutch
- Rights:
- unknown
69. Historisme in economisch denken :
- Creator:
- Krabbe, Jacob Jan,
- Type:
- text and monograph
- Subject:
- Economics, economic thought, economic history, sociology, psychology, sociologists, psychologists, and modern world history (1492-1918)
- Language:
- Dutch
- Rights:
- unknown
70. IFA dialog video corpus
- Publisher:
- IFA-groep, University of Amsterdam
- Type:
- corpus
- Language:
- Dutch
- Description:
- A video collection of spontaneous speech dialogues of 42 participants (14 male, 28 female)
- Rights:
- GNU GPL
71. IFA Dialog Video corpus
- Publisher:
- Amsterdam Centre for Language and Communication, University of Amsterdam
- Type:
- corpus
- Language:
- Dutch
- Description:
- annotated video recordings of friendly face-to-face dialogs
- Rights:
- GPL
72. IFA speech corpus
- Publisher:
- Instituut Fonetische Wetenschappen (IFA-groep) UvA
- Type:
- corpus
- Language:
- Dutch
- Description:
- Spoken corpus containing speech from 4 male and 4 female speakers; 50,000 words segmented at the phoneme level.
- Rights:
- GNU GPL
73. IFA Spoken Language Corpus
- Publisher:
- Amsterdam Centre for Language and Communication, University of Amsterdam
- Type:
- corpus
- Language:
- Dutch
- Description:
- hand-segmented speech
- Rights:
- GPL
74. In memoriam Jan Romein, 1893-1962 /
- Creator:
- Schöffer, I.
- Type:
- text and obituaries
- Subject:
- Historical science. Auxiliary historical sciences. Archival science, Romein, Jan,, Dutch historiography, Dutch historians, Netherlands, historiography, historical sciences, historians, world history from 1918 to the present, and historians (anniversaries, obituaries, etc.)
- Language:
- Dutch
- Rights:
- unknown
75. Intas corpus
- Publisher:
- Department of Languages, University of Jyväskylä
- Type:
- corpus
- Language:
- Dutch, Finnish, and Russian
- Description:
- A corpus of spontaneous discussions and read-aloud performances from native speakers of different ages. Parallel corpus in Russian, Finnish, and Dutch.
- Rights:
- Not specified
76. IWPT 2020 Shared Task Data and System Outputs
- Creator:
- Zeman, Daniel, Bouma, Gosse, and Seddah, Djamé
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, enhanced universal dependencies, shared task, and parsing
- Language:
- Arabic, Bulgarian, Czech, Dutch, English, Estonian, Finnish, French, Italian, Latvian, Lithuanian, Polish, Russian, Slovak, Swedish, Tamil, and Ukrainian
- Description:
- This package contains data used in the IWPT 2020 shared task. It contains training, development and test (evaluation) datasets. The data is based on a subset of Universal Dependencies release 2.5 (http://hdl.handle.net/11234/1-3105) but some treebanks contain additional enhanced annotations. Moreover, not all of these additions became part of Universal Dependencies release 2.6 (http://hdl.handle.net/11234/1-3226), which makes the shared task data unique and worth a separate release to enable later comparison with new parsing algorithms. The package also contains a number of Perl and Python scripts that have been used to process the data during preparation and during the shared task. Finally, the package includes the official primary submission of each team participating in the shared task.
- Rights:
- Licence Universal Dependencies v2.5, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.5, and PUB
77. IWPT 2021 Shared Task Data and System Outputs
- Creator:
- Zeman, Daniel, Bouma, Gosse, and Seddah, Djamé
- Publisher:
- Universal Dependencies Consortium
- Type:
- text and corpus
- Subject:
- treebank, dependency, syntax, enhanced universal dependencies, shared task, and parsing
- Language:
- Arabic, Bulgarian, Czech, Dutch, English, Estonian, Finnish, French, Italian, Latvian, Lithuanian, Polish, Russian, Slovak, Swedish, Tamil, and Ukrainian
- Description:
- This package contains data used in the IWPT 2021 shared task. It contains training, development and test (evaluation) datasets. The data is based on a subset of Universal Dependencies release 2.7 (http://hdl.handle.net/11234/1-3424) but some treebanks contain additional enhanced annotations. Moreover, not all of these additions became part of Universal Dependencies release 2.8 (http://hdl.handle.net/11234/1-3687), which makes the shared task data unique and worth a separate release to enable later comparison with new parsing algorithms. The package also contains a number of Perl and Python scripts that have been used to process the data during preparation and during the shared task. Finally, the package includes the official primary submission of each team participating in the shared task.
- Rights:
- Licence Universal Dependencies v2.7, https://lindat.mff.cuni.cz/repository/xmlui/page/license-ud-2.7, and PUB
78. Jaarboek van het europees genootschap voor munt- en penningkunde.
- Type:
- text and periodicals
- Subject:
- Serial publications. Periodicals, numismatics, coinage, coin minting, foreign collected volumes, and foreign periodicals and collected volumes
- Language:
- Dutch
- Rights:
- unknown
79. Jan Amos Comenius :
- Creator:
- Woldring, H. E. S.,
- Type:
- text and monographs
- Subject:
- Philosophy, Komenský, Jan Amos, philosophy, Czech lands 1526-1792, philosophy, philosophers, and world history of the modern era (1492-1918)
- Language:
- Dutch
- Rights:
- unknown
80. JRC-Acquis
- Publisher:
- Joint Research Centre of the EU
- Type:
- corpus
- Language:
- Bulgarian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Modern Greek (1453-), Hungarian, Italian, Latvian, Maltese, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, and Swedish
- Description:
- The largest parallel corpus, containing EU law (the Acquis Communautaire) in 22 languages.
- Rights:
- Not specified
81. L1 Acquisition Anke Jolink
- Publisher:
- Max Planck Institute for Psycholinguistics
- Type:
- corpus
- Language:
- Dutch
- Description:
- Language Acquisition corpus
- Rights:
- Not specified
82. L1 Acquisition Joost van de Weijer
- Publisher:
- Max Planck Institute for Psycholinguistics
- Type:
- corpus
- Language:
- Dutch
- Description:
- Language Acquisition corpus
- Rights:
- Not specified
83. L2 Acquisition Finiteness and Scope
- Publisher:
- Max Planck Institute for Psycholinguistics
- Type:
- corpus
- Language:
- Dutch, English, French, and German
- Description:
- Language Acquisition corpus
- Rights:
- Not specified
84. Leids Kunsthistorisch Jaarboek. (Rudolf II. and his court).
- Publisher:
- Delftsche Uitgevers Maatschappij B.V.,
- Subject:
- Rudolf, royal courts, art history, periodicals, art history, patronage, Czech lands 1526-1620, and foreign periodicals and collected volumes
- Language:
- English, Dutch, and German
- Rights:
- unknown
85. Living Oral History Workbench: Interviewproject Nederlandse Veteranen (IPNV)
- Publisher:
- The Netherlands Veterans Institute, Centre for Language and Speech Technology, Radboud University, and Data Archiving and Networked Services
- Format:
- text/plain
- Type:
- corpus
- Language:
- Dutch
- Description:
- The Netherlands Veterans Institute (VI) hosts about 250 audio interviews in which former Dutch military personnel speak about their experiences during World War II (interviews about the years 1935-1945) and decolonisation in the Dutch East Indies (1945-1950) and Dutch New Guinea (1960-1962). In the Living Oral History Workbench project, these interviews have been indexed using automatic speech recognition. The list of interviews and their metadata are available at the CLARIN Center; researchers may apply to the VI for access to the data.
- Rights:
- Not specified
86. Mapa Moravy Jana Amose Komenského =
- Type:
- text, old maps, and facsimiles
- Subject:
- Old maps, Komenský, Jan Amos, old maps, historical cartography, and history of historical cartography, individuals
- Language:
- Czech, English, German, and Dutch
- Description:
- Parallel English, German and Dutch text; parallel German and Dutch title in the colophon; and 1,200 copies printed.
- Rights:
- unknown
87. Materiële cultuur en levensstijl :
- Creator:
- Schuurman, A. J.,
- Type:
- text and monographs
- Subject:
- History of other European countries, household inventories, farmers, countryside, material culture, Netherlands, farmers, craftsmen, feudal subjects, and world history 1789-1918
- Language:
- Dutch
- Rights:
- unknown
88. MEDIATIC
- Publisher:
- Katholieke Universiteit Leuven Campus Kortrijk, Université Lille3
- Type:
- corpus
- Language:
- Dutch and French
- Description:
- Database of video fragments (Dutch and French), transcribed and translated (LINGUATIC project)
- Rights:
- Not specified
89. Memory-Based Shallow Parser (MBSP)
- Publisher:
- ILK, Tilburg University and CNTS - Language Technology Group, University of Antwerp
- Type:
- toolService
- Language:
- Dutch and English
- Description:
- MBSP is a set of linguistic tools based on the TiMBL and MBT memory-based learning applications developed at CNTS and ILK. It provides tools for part-of-speech tagging, chunking, lemmatization, relation finding, named entity recognition, and (for medical language) semantic tagging.
- Rights:
- Not specified
90. Michiel Adriaanszoon de Ruyter /
- Creator:
- Blok, Petrus Johannes,
- Type:
- text and monographs
- Subject:
- Military affairs. National defence. Armed forces, Ruyter, Michiel Adriaensz de, military commanders, politicians, Netherlands, army, military units, soldiers, and world history 1492-1648
- Language:
- Dutch
- Rights:
- unknown
91. Morphological Atlas of the Dutch Dialects (MAND)
- Publisher:
- Meertens Institute KNAW The Netherlands
- Type:
- corpus
- Language:
- Dutch
- Description:
- The Morphological Atlas of the Dutch Dialects (MAND) is based on phonetically transcribed speech. The speech recordings were made between 1980 and 1995.
- Rights:
- Not specified
92. MPI ESF Corpus
93. Multilingualism Marianne Gullberg & Peter Indefrey
- Publisher:
- Max Planck Institute for Psycholinguistics
- Type:
- corpus
- Language:
- Dutch, German, English, and French
- Description:
- Language Acquisition corpus
- Rights:
- Not specified
94. NameTag 2 Models (2020-08-31)
- Creator:
- Straková, Jana and Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text, mlmodel, and languageDescription
- Subject:
- named entity recognition
- Language:
- English, German, Dutch, Spanish, and Czech
- Description:
- NER models for NameTag 2, a named entity recognition tool, for English, German, Dutch, Spanish and Czech. Model documentation, including performance, is available at https://ufal.mff.cuni.cz/nametag/2/models. The NameTag 2 tool itself is available at https://ufal.mff.cuni.cz/nametag/2.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
95. NameTag 2 Models (2021-09-16)
- Creator:
- Straková, Jana and Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text, mlmodel, and languageDescription
- Subject:
- named entity recognition and NER
- Language:
- English, German, Dutch, Spanish, and Czech
- Description:
- NER models for NameTag 2, a named entity recognition tool, for English, German, Dutch, Spanish and Czech. Model documentation, including performance, is available at https://ufal.mff.cuni.cz/nametag/2/models. The NameTag 2 tool itself is available at https://ufal.mff.cuni.cz/nametag/2.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
96. NameTag service description
- Creator:
- Straková, Jana and Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- service and toolService
- Subject:
- named entity recognition, NameTag, and WeblichtXML
- Language:
- Czech, German, English, Spanish, and Dutch
- Description:
- Metadata description of NameTag (http://hdl.handle.net/11234/1-3633, https://lindat.mff.cuni.cz/services/nametag/) provided for WebLicht.
- Rights:
- Not specified
97. Namur Corpus
- Publisher:
- Katholieke Universiteit Leuven Campus Kortrijk
- Type:
- corpus
- Language:
- Dutch, English, and French
- Description:
- Trilingual parallel corpus with Dutch as the first language; 2 million words, aligned at paragraph level. It includes fiction and non-fiction texts.
- Rights:
- Not specified
98. Nederlandse Familienamen Databank (Dutch Database of Family Names)
- Publisher:
- Meertens Institute KNAW The Netherlands
- Format:
- application/octet-stream
- Type:
- toolService
- Language:
- Dutch
- Description:
- Enriched database of (mainly) Dutch family names, based on the 1947 census (in progress; currently 90,000 entries out of a maximum of 140,000)
- Rights:
- Meertens Institute KNAW The Netherlands
99. Nederlandse oude drukken in Bohemen, Moravië en Silezië (1500-1800) =
- Publisher:
- ATUT,
- Type:
- monographs
- Subject:
- Early printed books, Bibliography. Catalogues, early printed books, Dutch literature, Czech-Dutch relations, cultural relations, early printed books, Czech lands 1526-1792, history of the book, book printing, publishing, Netherlands, and world history of the modern era (1492-1918)
- Language:
- Dutch and English
- Rights:
- unknown
100. Nizozemské divadelní hry ve středověku :
- Creator:
- Bossaert, Benjamin,
- Type:
- text and collective monographs
- Subject:
- Germanic literatures (studies), medieval theatre, medieval literature, theatre plays, Netherlands, medieval world history (up to 1492), and theatre, film, photography
- Language:
- Czech and Dutch
- Description:
- Dutch theatre plays in the Middle Ages.
- Rights:
- unknown