Search Results
82. W2C – Web to Corpus – Corpora
- Creator:
- Majliš, Martin
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- multilingual corpora
- Language:
- Afrikaans, Tosk Albanian, Amharic, Arabic, Aragonese, Egyptian Arabic, Asturian, Azerbaijani, Belarusian, Bengali, Bosnian, Bishnupriya, Breton, Buginese, Bulgarian, Catalan, Cebuano, Czech, Chuvash, Corsican, Welsh, Danish, German, Dimli (individual language), Modern Greek (1453-), English, Esperanto, Estonian, Basque, Faroese, Persian, Finnish, French, Western Frisian, Gan Chinese, Scottish Gaelic, Irish, Galician, Gilaki, Gujarati, Haitian, Serbo-Croatian, Hebrew, Fiji Hindi, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Ido, Interlingua (International Auxiliary Language Association), Indonesian, Icelandic, Italian, Javanese, Japanese, Kannada, Georgian, Kazakh, Korean, Kurdish, Latin, Latvian, Limburgan, Lithuanian, Lombard, Luxembourgish, Malayalam, Marathi, Macedonian, Malagasy, Mongolian, Maori, Malay (macrolanguage), Burmese, Neapolitan, Low German, Nepali (macrolanguage), Newari, Dutch, Norwegian Nynorsk, Norwegian, Occitan (post 1500), Ossetian, Pampanga, Piemontese, Polish, Portuguese, Quechua, Romanian, Russian, Yakut, Sicilian, Scots, Slovak, Slovenian, Spanish, Albanian, Serbian, Sundanese, Swahili (macrolanguage), Swedish, Tamil, Tatar, Telugu, Tajik, Tagalog, Thai, Turkish, Ukrainian, Urdu, Uzbek, Venetian, Vietnamese, Volapük, Waray (Philippines), Walloon, Yiddish, Yoruba, and Chinese
- Description:
- A set of corpora for 120 languages, automatically collected from Wikipedia and the web. Collected using the W2C toolset: http://hdl.handle.net/11858/00-097C-0000-0022-60D6-1
- Rights:
- Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0), http://creativecommons.org/licenses/by-sa/3.0/, and PUB
83. Wikicorpus
- Publisher:
- Centro de Tecnologías y Aplicaciones del Lenguaje y del Habla (TALP)
- Type:
- corpus
- Subject:
- trilingual corpus
- Language:
- Catalan, English, and Spanish
- Description:
- Trilingual corpus (Catalan, Spanish, English) that contains large portions of Wikipedia (based on a 2006 dump) and has been automatically enriched with linguistic information. In its present version, it contains over 750 million words.
- Rights:
- Not specified
84. WMT21 Marian translation model (ca-oc multi-task)
- Creator:
- Novák, Michal and Jon, Josef
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- other and toolService
- Subject:
- neural machine translation, machine translation, grapheme-to-phoneme conversion, and multi-task model
- Language:
- Catalan and Occitan (post 1500)
- Description:
- Marian NMT model for Catalan to Occitan translation. It is a multi-task model that also produces a phonemic transcription of the Catalan source. The model was submitted to the WMT'21 Shared Task on Multilingual Low-Resource Translation for Indo-European Languages as a CUNI-Contrastive system for Catalan to Occitan.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
85. WMT21 Marian translation model (ca-oc)
- Creator:
- Jon, Josef
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- other and toolService
- Subject:
- machine translation and neural machine translation
- Language:
- Catalan and Occitan (post 1500)
- Description:
- Marian NMT model for Catalan to Occitan translation. Primary CUNI submission for the WMT21 Multilingual Low-Resource Translation for Indo-European Languages Shared Task.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
86. WMT21 Marian translation models (ca-ro,it,oc)
- Creator:
- Jon, Josef
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- other and toolService
- Subject:
- neural machine translation
- Language:
- Catalan, Occitan (post 1500), Italian, and Romanian
- Description:
- Marian multilingual translation model from Catalan into Romanian, Italian, and Occitan. Primary CUNI submission for the WMT21 Multilingual Low-Resource Translation for Indo-European Languages Shared Task.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
87. Wortschatz
- Publisher:
- University of Leipzig
- Type:
- corpus
- Language:
- Afrikaans, Albanian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, German, Hungarian, Icelandic, Indonesian, Italian, Japanese, Korean, Latin, Latvian, Lithuanian, Malay (macrolanguage), Norwegian, Occitan (post 1500), Romanian, Russian, Slovak, Slovenian, Spanish, Sundanese, Swedish, Tagalog, Turkish, Vietnamese, and Welsh
- Description:
- Collected from newspaper texts, web crawling, etc. Provides words (with frequency), co-occurrences (with graph), left/right neighbours, and example sentences.
- Rights:
- Not specified