Search Results
Creator:
Majliš, Martin
Publisher:
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:
text and corpus
Subject:
multilingual corpora
Language:
Afrikaans , Tosk Albanian , Amharic , Arabic , Aragonese , Egyptian Arabic , Asturian , Azerbaijani , Belarusian , Bengali , Bosnian , Bishnupriya , Breton , Buginese , Bulgarian , Catalan , Cebuano , Czech , Chuvash , Corsican , Welsh , Danish , German , Dimli (individual language) , Modern Greek (1453-) , English , Esperanto , Estonian , Basque , Faroese , Persian , Finnish , French , Western Frisian , Gan Chinese , Scottish Gaelic , Irish , Galician , Gilaki , Gujarati , Haitian , Serbo-Croatian , Hebrew , Fiji Hindi , Hindi , Croatian , Upper Sorbian , Hungarian , Armenian , Ido , Interlingua (International Auxiliary Language Association) , Indonesian , Icelandic , Italian , Javanese , Japanese , Kannada , Georgian , Kazakh , Korean , Kurdish , Latin , Latvian , Limburgan , Lithuanian , Lombard , Luxembourgish , Malayalam , Marathi , Macedonian , Malagasy , Mongolian , Maori , Malay (macrolanguage) , Burmese , Neapolitan , Low German , Nepali (macrolanguage) , Newari , Dutch , Norwegian Nynorsk , Norwegian , Occitan (post 1500) , Ossetian , Pampanga , Piemontese , Polish , Portuguese , Quechua , Romanian , Russian , Yakut , Sicilian , Scots , Slovak , Slovenian , Spanish , Albanian , Serbian , Sundanese , Swahili (macrolanguage) , Swedish , Tamil , Tatar , Telugu , Tajik , Tagalog , Thai , Turkish , Ukrainian , Urdu , Uzbek , Venetian , Vietnamese , Volapük , Waray (Philippines) , Walloon , Yiddish , Yoruba , and Chinese
Description:
A set of corpora for 120 languages, automatically collected from Wikipedia and the web.
Collected using the W2C toolset: http://hdl.handle.net/11858/00-097C-0000-0022-60D6-1
Rights:
Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) , http://creativecommons.org/licenses/by-sa/3.0/ , and PUB
Creator:
Müller, Thomas and Schütze, Hinrich
Publisher:
Center for Information and Language Processing, University of Munich
Type:
text and corpus
Subject:
morphological dictionary , morphological analysis , and PoS tagging
Language:
English , German , Latin , Hungarian , Spanish , and Czech
Description:
Dictionaries with different representations for various languages. Representations include Brown clusters of different sizes and morphological dictionaries extracted using different morphological analyzers. All representations cover the 250,000 most frequent word types in the Wikipedia version of the respective language.
Analyzers used: MAGYARLANC (Hungarian, Zsibrita et al. (2013)), FREELING (English and Spanish, Padro and Stanilovsky (2012)), SMOR (German, Schmid et al. (2004)), an MA from Charles University (Czech, Hajic (2001)) and LATMOR (Latin, Springmann et al. (2014)).
Rights:
Creative Commons - Attribution 3.0 Unported (CC BY 3.0) , http://creativecommons.org/licenses/by/3.0/ , and PUB
Publisher:
University of Leipzig
Type:
corpus
Language:
Afrikaans , Albanian , Bulgarian , Catalan , Chinese , Croatian , Czech , Danish , Dutch , English , Esperanto , Estonian , Finnish , French , German , Hungarian , Icelandic , Indonesian , Italian , Japanese , Korean , Latin , Latvian , Lithuanian , Malay (macrolanguage) , Norwegian , Occitan (post 1500) , Romanian , Russian , Slovak , Slovenian , Spanish , Sundanese , Swedish , Tagalog , Turkish , Vietnamese , and Welsh
Description:
Collected from newspaper texts, web crawling, etc.: words (+frequency), co-occurrences (+graph), left/right neighbours, example sentences
Rights:
Not specified
Creator:
Hledíková, Zdeňka
Type:
study
Subject:
History of Czechia and Slovakia , Eliška Přemyslovna , queens , order, Cistercians , cartularies , editions , diplomatics , charters , rulers, ruling dynasties, courts , diplomatics, editions , and Czech lands 1306-1419
Language:
Czech and Latin
Description:
The legacy of Eliška Přemyslovna.
Rights:
unknown
Creator:
Kameníček, František
Type:
model:monograph and TEXT
Language:
Czech , Latin , and German
Rights:
http://creativecommons.org/publicdomain/mark/1.0/ and policy:public
Creator:
Šimák, Josef Vítězslav and Spolek historický v Praze
Type:
model:monograph and TEXT
Language:
Czech and Latin
Rights:
http://creativecommons.org/publicdomain/mark/1.0/ and policy:public