Search Results
Creator:
Majliš, Martin
Publisher:
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:
text and corpus
Subject:
multilingual corpora
Language:
Afrikaans, Tosk Albanian, Amharic, Arabic, Aragonese, Egyptian Arabic, Asturian, Azerbaijani, Belarusian, Bengali, Bosnian, Bishnupriya, Breton, Buginese, Bulgarian, Catalan, Cebuano, Czech, Chuvash, Corsican, Welsh, Danish, German, Dimli (individual language), Modern Greek (1453-), English, Esperanto, Estonian, Basque, Faroese, Persian, Finnish, French, Western Frisian, Gan Chinese, Scottish Gaelic, Irish, Galician, Gilaki, Gujarati, Haitian, Serbo-Croatian, Hebrew, Fiji Hindi, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Ido, Interlingua (International Auxiliary Language Association), Indonesian, Icelandic, Italian, Javanese, Japanese, Kannada, Georgian, Kazakh, Korean, Kurdish, Latin, Latvian, Limburgan, Lithuanian, Lombard, Luxembourgish, Malayalam, Marathi, Macedonian, Malagasy, Mongolian, Maori, Malay (macrolanguage), Burmese, Neapolitan, Low German, Nepali (macrolanguage), Newari, Dutch, Norwegian Nynorsk, Norwegian, Occitan (post 1500), Ossetian, Pampanga, Piemontese, Polish, Portuguese, Quechua, Romanian, Russian, Yakut, Sicilian, Scots, Slovak, Slovenian, Spanish, Albanian, Serbian, Sundanese, Swahili (macrolanguage), Swedish, Tamil, Tatar, Telugu, Tajik, Tagalog, Thai, Turkish, Ukrainian, Urdu, Uzbek, Venetian, Vietnamese, Volapük, Waray (Philippines), Walloon, Yiddish, Yoruba, and Chinese
Description:
A set of corpora for 120 languages, automatically collected from Wikipedia and the web.
Collected using the W2C toolset: http://hdl.handle.net/11858/00-097C-0000-0022-60D6-1
Rights:
Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0), http://creativecommons.org/licenses/by-sa/3.0/, and PUB
Publisher:
University of Leipzig
Type:
corpus
Language:
Afrikaans, Albanian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, German, Hungarian, Icelandic, Indonesian, Italian, Japanese, Korean, Latin, Latvian, Lithuanian, Malay (macrolanguage), Norwegian, Occitan (post 1500), Romanian, Russian, Slovak, Slovenian, Spanish, Sundanese, Swedish, Tagalog, Turkish, Vietnamese, and Welsh
Description:
Collected from newspaper texts, web crawling, and other sources. Contents: words (with frequencies), co-occurrences (with a co-occurrence graph), left/right neighbours, and example sentences.
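To make the listed statistics concrete, here is a minimal sketch of how word frequencies, co-occurrences, and left/right neighbours can be counted from tokenized sentences. It is an illustration under stated assumptions, not the University of Leipzig's actual pipeline: it assumes co-occurrence means "appearing in the same sentence" and that neighbours are immediately adjacent tokens. The function `corpus_statistics` and the toy sentences are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy input: tokenized sentences (not real Leipzig corpus data).
sentences = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
]


def corpus_statistics(sentences):
    """Count words, sentence-level co-occurrences, and adjacent neighbours.

    Assumption: co-occurrence = both words appear in the same sentence;
    neighbours = tokens immediately to the left/right of a word.
    """
    words = Counter()   # word -> frequency
    cooc = Counter()    # (word_a, word_b), alphabetically ordered -> sentence count
    left = Counter()    # (word, left_neighbour) -> count
    right = Counter()   # (word, right_neighbour) -> count
    for sent in sentences:
        words.update(sent)
        # Each unordered pair of distinct words is counted once per sentence.
        for a, b in combinations(sorted(set(sent)), 2):
            cooc[(a, b)] += 1
        # Adjacent token pairs give left and right neighbours.
        for prev, nxt in zip(sent, sent[1:]):
            right[(prev, nxt)] += 1
            left[(nxt, prev)] += 1
    return words, cooc, left, right


def cooccurrence_graph(cooc):
    """Build an undirected adjacency map: word -> {neighbour: weight}."""
    graph = defaultdict(dict)
    for (a, b), count in cooc.items():
        graph[a][b] = count
        graph[b][a] = count
    return graph


words, cooc, left, right = corpus_statistics(sentences)
graph = cooccurrence_graph(cooc)
print(words.most_common(3))
print(graph["sat"])
```

Sentence-level co-occurrence is only one possible choice; a real resource might instead use a fixed-size window or apply a significance measure rather than raw counts.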
Rights:
Not specified