Search Results
312. DaMuEL 1.0: A Large Multilingual Dataset for Entity Linking
- Creator:
- Kubeša, David and Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- entity linking, NEL, NER, dataset, and knowledge base
- Language:
- Afrikaans, Arabic, Armenian, Basque, Belarusian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Latin, Latvian, Lithuanian, Maltese, Marathi, Modern Greek (1453-), Northern Sami, Norwegian Nynorsk, Persian, Polish, Portuguese, Romanian, Russian, Scottish Gaelic, Serbian, Slovak, Slovenian, Spanish, Swedish, Tamil, Telugu, Uighur, Ukrainian, Urdu, Vietnamese, and Wolof
- Description:
- We present DaMuEL, a large Multilingual Dataset for Entity Linking containing data in 53 languages. DaMuEL consists of two components: a knowledge base that contains language-agnostic information about entities, including their claims from Wikidata and named entity types (PER, ORG, LOC, EVENT, BRAND, WORK_OF_ART, MANUFACTURED); and Wikipedia texts with entity mentions linked to the knowledge base, along with language-specific text from Wikidata such as labels, aliases, and descriptions, stored separately for each language. The Wikidata QID is used as a persistent, language-agnostic identifier, enabling the combination of the knowledge base with language-specific texts and information for each entity. Wikipedia documents deliberately annotate only a single mention for every entity present; we further automatically detect all mentions of named entities linked from each document. The dataset contains 27.9M named entities in the knowledge base and 12.3G tokens from Wikipedia texts. The dataset is published under the CC BY-SA licence.
- Rights:
- Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB
313. Danish Fungi 2020
- Creator:
- Picek, Lukáš, Šulc, Milan, Matas, Jiří, Jeppesen, Thomas S., Heilmann-Clausen, Jacob, Læssøe, Thomas, and Frøslev, Tobias
- Publisher:
- IEEE/CVF
- Type:
- IMAGE and corpus
- Subject:
- Fungi, image processing, Classification, and Fine-grained
- Description:
- Danish Fungi 2020 (DF20) is a fine-grained dataset and benchmark. The dataset, constructed from observations submitted to the Danish Fungal Atlas, is unique in its taxonomy-accurate class labels, small number of errors, highly unbalanced long-tailed class distribution, rich observation metadata, and well-defined class hierarchy. DF20 has zero overlap with ImageNet, allowing unbiased comparison of models fine-tuned from publicly available ImageNet checkpoints. The dataset has 1,604 different classes, with 248,466 training images and 27,608 test images.
- Rights:
- GNU Library or "Lesser" General Public License 3.0 (LGPL-3.0), http://opensource.org/licenses/LGPL-3.0, and PUB
314. Database of speech corpora of Czech laryngectomy patients
- Creator:
- Matoušek, Jindřich, Tihelka, Daniel, Jůzová, Markéta, Grůber, Martin, Vít, Jakub, and Řepová, Barbora
- Publisher:
- University of West Bohemia, Department of Cybernetics
- Type:
- audio and corpus
- Subject:
- speech corpus, voice conservation, laryngectomy, and text-to-speech synthesis
- Language:
- Czech
- Description:
- The corpus contains Czech speech of laryngectomy patients, recorded before the surgery that causes them to lose their voice, so that the voice is preserved and can later be used in a personalized text-to-speech system. Individual utterances were selected by a special algorithm to cover as many phonetic and prosodic features as possible.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
315. Day of Czech Youth in Prague
- Creator:
- Aktualita
- Publisher:
- Národní filmový archiv
- Type:
- video and clip
- Subject:
- akce Den české mládeže, znak Kuratorium pro výchovu mládeže, akce Kuratorium pro výchovu mládeže, Kuratorium pro výchovu mládeže, orchestr dechový v průvodu, průvod slavnostní, sbor pěvecký dětský Kühnův, sbormistr, značky sportovních klubů, běh štafetový, běh překážkový, vlajky Kuratorium pro výchovu mládeže, potlesk, stupně vítězů, start závodu běžeckého, projev Moravec Emanuel, projevy veřejné, vlajky stoupající, stadion nadhled, vlajkonoši v průvodu, tanec v krojích, kroje lidové, lidé v krojích, dívky v krojích, diváci na stadionu, sportovci nastoupení, atletika lehká, mikrofon, cvičenci nastoupení, hajlování, tance lidové, skok o tyči, náměstí zaplněné, diváci skandující, skok do dálky, Kuratorium, Places::Praha::Vinohrady::náměstí Míru, Places::Praha::Vinohrady::kostel sv. Ludmily::s vlajkou Kuratoria, Places::Praha::Nové Město::Václavské náměstí, Places::Praha::Nové Město::Na Příkopě, Places::Praha::Strahov::stadion, Places::Praha::Vinohrady::Jugoslávská, Places::Praha::Nové Město::Národní muzeum, People::Krejčí Jaroslav (1892-1956), People::Moravec Emanuel (1893-1945), People::Popelka August Adolf (1887-1951), People::Kliment Josef (1901-1978), People::Teuner František (1911-1978), and Český zvukový týdeník Aktualita::1943/38A
- Language:
- Czech
- Description:
- Segment from Český zvukový týdeník Aktualita (Czech Aktualita Sound Newsreel) issue no. 38A from 1943 contains footage from the Days of Czech Youth event organised by the Board of Trustees for the Education of Youth from 11 to 12 September. A concert of three brass bands, led by Miloš Kuba, and the Kühn Children's Choir was held on Peace Square at 5 pm on 11 September. A procession of the Board's members set out from Peace Square and continued through the streets of Prague. The event culminated with a track and field championship at Strahov Stadium where the winners of district rounds competed against each other. The spectators were welcomed by General Secretary of the Board František Teuner. The programme included a dance performance by girls in folk costumes. The event concluded with a speech by Minister of Education and People's Enlightenment and Chairman of the Board Emanuel Moravec, followed by a solemn oath "to the Führer and to the Fatherland".
- Rights:
- http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
316. Days of Czech Youth
- Creator:
- Aktualita
- Publisher:
- Národní filmový archiv
- Type:
- video and clip
- Subject:
- akce Dny české mládeže, Kuratorium pro výchovu mládeže, akce Kuratorium pro výchovu mládeže, atletika lehká, závody běžecké na stadionu, hod diskem, skok do dálky, hajlování, stadion sportovní, Kuratorium, Places::Kolín::stadion A.F.K., and Český zvukový týdeník Aktualita::1943/35
- Language:
- Czech
- Description:
- Segment from Český zvukový týdeník Aktualita (Czech Aktualita Sound Newsreel) issue no. 35A from 1943 captures the mood of the District Youth Track and Field Championship for Ages 10-18, which was organised by the Board of Trustees for the Education of Youth in eighty towns of the Protectorate as part of the Days of Czech Youth event held from 28 to 29 August 1943. At the A. F. K. Stadium in Kolín nad Labem, approximately 1,500 athletes qualified for the Track and Field Championship of Bohemia and Moravia.
- Rights:
- http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
317. De Latinae Linguae Reparatione treebank
- Creator:
- Gamba, Federica and Cecchini, Flavio Massimiliano
- Publisher:
- Università Cattolica del Sacro Cuore, Centro Interdisciplinare di Ricerche per la Computerizzazione dei Segni dell’Espressione (CIRCSE)
- Type:
- text and corpus
- Subject:
- treebank, universal dependencies, latin, and Renaissance
- Language:
- Latin
- Description:
- This corpus contains the text of De Latinae Linguae Reparatione authored by Marcus Antonius Sabellicus (1436–1506), annotated with respect to lemmas, part-of-speech tags, morphological features and syntactic dependencies according to the typological formalism of Universal Dependencies (UD).
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), PUB, and http://creativecommons.org/licenses/by-nc-sa/4.0/
318. Decorating Traditional Moravian Slovak Easter Eggs
- Creator:
- Aktualita
- Publisher:
- Národní filmový archiv
- Type:
- video and clip
- Subject:
- akce Kuratorium pro výchovu mládeže, zvyky lidové, kraslice výroba, výroba kraslic, kraslice vyškrabávání, akce Soutěž o nejhezčí kraslici, malérečky při práci, Kuratorium, and Český zvukový týdeník Aktualita::1945/17B
- Language:
- Czech
- Description:
- Segment from Český zvukový týdeník Aktualita (Czech Aktualita Sound Newsreel) issue no. 17B from 1945 shows a competition for the best decorated Easter egg, which was organised by girls from the Moravian Slovak branch of the Board of Trustees for the Education of Youth as part of the youth service of honour. Local women artisans, skilled in the traditional techniques, helped them with painting and etching patterns on Easter eggs.
- Rights:
- http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
319. Deep Sequoia corpus - PARSEME-FR corpus - FrSemCor
- Creator:
- Barque, Lucie, Candito, Marie, Constant, Matthieu, Cordeiro, Silvio Ricardo, Crabbé, Benoît, Fort, Karën, Guillaume, Bruno, Haas, Pauline, Huyghe, Richard, Perrier, Guy, Ramisch, Carlos, Ribeyre, Corentin, Savary, Agata, Seddah, Djamé, Segonne, Vincent, Tribout, Delphine, Villemonte de la Clergerie, Eric, Parmentier, Yannick, Pasquer, Caroline, and Antoine, Jean-Yves
- Publisher:
- ANR
- Type:
- text and corpus
- Subject:
- morpho-syntactic annotations, treebank, dependency syntax, semantic tagging, multiword expressions, and named entities
- Language:
- French
- Description:
- The Sequoia corpus is a set of 3,099 linguistically annotated French sentences, originating from four sources (Europarl, European Agency Reports, the French regional newspaper L'Est Républicain, and French Wikipedia). Several types of annotations were added over the years. The current release comprises: parts of speech (SEQUOIA ANR-08-EMER-013 project); syntactic dependency trees; deep syntactic dependency graphs (Deep sequoia project); multi-word expressions and named entities (PARSEME COST project and PARSEME-FR ANR-14-CERA-0001 project); and coarse semantic tags for nouns (FrSemCor project). See the deep sequoia page for a detailed description: https://deep-sequoia.inria.fr/
- Rights:
- Deep Sequoia Licence, https://lindat.mff.cuni.cz/repository/xmlui/page/deep-sequoia-licence, and PUB
320. Deep Universal Dependencies 2.4
- Creator:
- Zeman, Daniel and Droganova, Kira
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- semantic dependency and universal dependencies
- Language:
- Afrikaans, Assyrian Neo-Aramaic, Akkadian, Amharic, Arabic, Belarusian, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Mandarin Chinese, Coptic, Welsh, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Finnish, French, Irish, Gothic, Ancient Greek (to 1453), Mbyá Guaraní, Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Komi-Zyrian, Karelian, Latin, Latvian, Lithuanian, Literary Chinese, Marathi, Erzya, Dutch, Norwegian, Old Russian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Sanskrit, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Tamil, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Warlpiri, Wolof, Yoruba, and Galician
- Description:
- Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-2988). It contains additional deep-syntactic and semantic annotations. The version of Deep UD corresponds to the version of UD it is based on. Note, however, that some UD treebanks have been omitted from Deep UD.
- Rights:
- Licence Universal Dependencies v2.4, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.4, and PUB