Search Results
12. W2C – Web to Corpus – Corpora
- Creator:
- Majliš, Martin
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- multilingual corpora
- Language:
- Afrikaans, Tosk Albanian, Amharic, Arabic, Aragonese, Egyptian Arabic, Asturian, Azerbaijani, Belarusian, Bengali, Bosnian, Bishnupriya, Breton, Buginese, Bulgarian, Catalan, Cebuano, Czech, Chuvash, Corsican, Welsh, Danish, German, Dimli (individual language), Modern Greek (1453-), English, Esperanto, Estonian, Basque, Faroese, Persian, Finnish, French, Western Frisian, Gan Chinese, Scottish Gaelic, Irish, Galician, Gilaki, Gujarati, Haitian, Serbo-Croatian, Hebrew, Fiji Hindi, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Ido, Interlingua (International Auxiliary Language Association), Indonesian, Icelandic, Italian, Javanese, Japanese, Kannada, Georgian, Kazakh, Korean, Kurdish, Latin, Latvian, Limburgan, Lithuanian, Lombard, Luxembourgish, Malayalam, Marathi, Macedonian, Malagasy, Mongolian, Maori, Malay (macrolanguage), Burmese, Neapolitan, Low German, Nepali (macrolanguage), Newari, Dutch, Norwegian Nynorsk, Norwegian, Occitan (post 1500), Ossetian, Pampanga, Piemontese, Polish, Portuguese, Quechua, Romanian, Russian, Yakut, Sicilian, Scots, Slovak, Slovenian, Spanish, Albanian, Serbian, Sundanese, Swahili (macrolanguage), Swedish, Tamil, Tatar, Telugu, Tajik, Tagalog, Thai, Turkish, Ukrainian, Urdu, Uzbek, Venetian, Vietnamese, Volapük, Waray (Philippines), Walloon, Yiddish, Yoruba, and Chinese
- Description:
- A set of corpora for 120 languages, automatically collected from Wikipedia and the web. The corpora were collected using the W2C toolset: http://hdl.handle.net/11858/00-097C-0000-0022-60D6-1
- Rights:
- Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0), http://creativecommons.org/licenses/by-sa/3.0/, and PUB