Search Results
Creator:
Rosa, Rudolf
Publisher:
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:
text and corpus
Subject:
Wikipedia , text corpora , and monolingual corpus
Language:
Abkhazian , Achinese , Adyghe , Afrikaans , Akan , Tosk Albanian , Amharic , Old English (ca. 450-1100) , Arabic , Official Aramaic (700-300 BCE) , Aragonese , Egyptian Arabic , Assamese , Asturian , Atikamekw , Avaric , Aymara , South Azerbaijani , Azerbaijani , Bashkir , Bambara , Bavarian , Central Bikol , Belarusian , Bengali , Bislama , Banjar , Tibetan , Bosnian , Bishnupriya , Breton , Buginese , Bulgarian , Russia Buriat , Catalan , Min Dong Chinese , Cebuano , Czech , Chamorro , Chechen , Cherokee , Church Slavic , Chuvash , Cheyenne , Central Kurdish , Cornish , Corsican , Cree , Crimean Tatar , Kashubian , Welsh , Danish , German , Dinka , Dimli (individual language) , Dhivehi , Lower Sorbian , Dzongkha , Modern Greek (1453-) , English , Esperanto , Estonian , Basque , Ewe , Extremaduran , Faroese , Persian , Fijian , Finnish , French , Arpitan , Northern Frisian , Western Frisian , Fulah , Friulian , Gagauz , Gan Chinese , Scottish Gaelic , Irish , Galician , Gilaki , Manx , Goan Konkani , Gothic , Guarani , Gujarati , Hakka Chinese , Haitian , Hausa , Hawaiian , Serbo-Croatian , Hebrew , Herero , Fiji Hindi , Hindi , Hiri Motu , Croatian , Upper Sorbian , Hungarian , Armenian , Igbo , Ido , Inuktitut , Interlingue , Iloko , Interlingua (International Auxiliary Language Association) , Indonesian , Inupiaq , Icelandic , Italian , Jamaican Creole English , Javanese , Lojban , Japanese , Kara-Kalpak , Kabyle , Kalaallisut , Kannada , Kashmiri , Georgian , Kanuri , Kazakh , Kabardian , Kabiyè , Khmer , Kikuyu , Kinyarwanda , Kirghiz , Komi-Permyak , Komi , Kongo , Korean , Karachay-Balkar , Kölsch , Kurdish , Ladino , Lao , Latin , Latvian , Lak , Lezghian , Ligurian , Limburgan , Lingala , Lithuanian , Lombard , Northern Luri , Latgalian , Luxembourgish , Ganda , Literary Chinese , Marshallese , Maithili , Malayalam , Marathi , Moksha , Eastern Mari , Minangkabau , Macedonian , Malagasy , Maltese , Mongolian , Maori , Western Mari , Malay (macrolanguage) , 
Creek , Mirandese , Burmese , Erzya , Mazanderani , Min Nan Chinese , Neapolitan , Nauru , Navajo , Ndonga , Low German , Nepali (macrolanguage) , Newari , Dutch , Norwegian Nynorsk , Norwegian , Novial , Pedi , Nyanja , Occitan (post 1500) , Livvi , Oriya (macrolanguage) , Oromo , Ossetian , Pangasinan , Pampanga , Panjabi , Papiamento , Picard , Pennsylvania German , Pfaelzisch , Pitcairn-Norfolk , Pali , Piemontese , Western Panjabi , Pontic , Polish , Portuguese , Pushto , Quechua , Vlax Romani , Romansh , Romanian , Rusyn , Rundi , Macedo-Romanian , Russian , Sango , Yakut , Sanskrit , Sicilian , Scots , Samogitian , Sinhala , Slovak , Slovenian , Northern Sami , Samoan , Shona , Sindhi , Somali , Southern Sotho , Spanish , Albanian , Sardinian , Sranan Tongo , Serbian , Swati , Saterfriesisch , Sundanese , Swahili (macrolanguage) , Swedish , Silesian , Tahitian , Tamil , Tatar , Tulu , Telugu , Tama (Colombia) , Tetum , Tajik , Tagalog , Thai , Tigrinya , Tonga (Tonga Islands) , Tok Pisin , Tswana , Tsonga , Turkmen , Tumbuka , Turkish , Twi , Tuvinian , Udmurt , Uighur , Ukrainian , Urdu , Uzbek , Venetian , Venda , Veps , Vietnamese , Vlaams , Volapük , Võro , Waray (Philippines) , Walloon , Wolof , Wu Chinese , Kalmyk , Xhosa , Mingrelian , Yiddish , Yoruba , Yue Chinese , Zeeuws , Zhuang , Chinese , Zulu , and Dotyali
Description:
Wikipedia plain text data obtained from Wikipedia dumps with WikiExtractor in February 2018.
The data come from all Wikipedias for which dumps could be downloaded at [https://dumps.wikimedia.org/]. This amounts to 297 Wikipedias, usually corresponding to individual languages and identified by their ISO codes. Several special Wikipedias are included, most notably "simple" (Simple English Wikipedia) and "incubator" (tiny hatching Wikipedias in various languages).
For a list of all the Wikipedias, see [https://meta.wikimedia.org/wiki/List_of_Wikipedias].
A script that can be used to obtain a new version of the data is included. Note that Wikipedia throttles bulk downloads, so downloading all the dumps takes a few days, although one or a few dumps can be downloaded quickly.
Also, the format of the dumps changes from time to time, so the script will probably stop working eventually.
The WikiExtractor tool [http://medialab.di.unipi.it/wiki/Wikipedia_Extractor] used to extract text from the Wikipedia dumps is not mine; I only modified it slightly to produce plain-text output [https://github.com/ptakopysk/wikiextractor].
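As a rough illustration of how a single dump could be located for download, the following minimal sketch builds the URL of the latest articles dump for one Wikipedia. It is not the included script: the helper name `dump_url` is hypothetical, and the URL layout (`<code>wiki/latest/<code>wiki-latest-pages-articles.xml.bz2`, with hyphens in the code replaced by underscores) is an assumption about the current dump site that may change, and a few Wikipedias use database names that do not follow it.

```python
# Hypothetical helper for illustration only; not part of the included script.
DUMPS_ROOT = "https://dumps.wikimedia.org"


def dump_url(wiki_code: str) -> str:
    """Return the assumed URL of the latest articles dump for one Wikipedia."""
    # Assumption: the dump directory is the Wikipedia code with hyphens
    # replaced by underscores, followed by "wiki" (e.g. "zh-min-nan" -> "zh_min_nanwiki").
    db_name = wiki_code.replace("-", "_") + "wiki"
    return f"{DUMPS_ROOT}/{db_name}/latest/{db_name}-latest-pages-articles.xml.bz2"


if __name__ == "__main__":
    # Print the assumed dump locations for a few example codes.
    for code in ["cs", "simple", "zh-min-nan"]:
        print(dump_url(code))
```

The sketch only constructs URLs and downloads nothing; fetching many dumps this way would still be subject to the download limits mentioned above.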
Rights:
Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) , http://creativecommons.org/licenses/by-sa/3.0/ , and PUB
Type:
text and jubilee collections
Subject:
Historical science. Auxiliary historical sciences. Archival science , Urbánková, Emma, , collections , Czech historians , life anniversaries , codicology , Czechoslovakia 1969-1989 , and Czech (Czechoslovak) collections and collective monographs
Language:
Czech , French , German , Latin , and Slovak
Rights:
unknown
Type:
text , sources , and editions
Subject:
Latin poetry, written in Latin , Campanus Vodňanský, Jan, , Šimon ze Slaného, , Cropacius, Kašpar, , Villaticus, Šimon, , Hanno, Martin, , Hasištejnský z Lobkovic, Bohuslav, , Hodějovský z Hodějova, Bohuslav, , Collinus, Matouš, , Kuthen ze Šprinsberka, Martin, , Mitis z Limuz, Tomáš, , Řípa, Václav, , Srnec z Varvažova, Jakub, , Vestonia, , Šentygar z Chotěřiny, Jan, , Czech literature , Latin literature , humanist poetry , Czech lands 1471-1526 , Czech lands 1526-1620 , and literature, writers
Language:
French and Latin
Description:
"Cultures d'Europe centrale, special issue."
Rights:
unknown
Creator:
Deml, Jakub,
Type:
text and correspondence
Subject:
Czech literature (about it) , Biography , Deml, Jakub, , Březina, Otokar, , Catholic priests , poets , Czech literature , Czech lands 1848-1918 , Czechoslovakia 1918-1938 , literature, writers , and individuals (church history)
Language:
Czech , French , German , Latin , and Slovenian
Description:
Partly translated from French, Latin, German, and Slovenian and Spine title: Jakub Deml - korespondence s Otokarem Březinou
Rights:
unknown
Type:
text , sources , and catalogues
Subject:
Early printed books , incunabula, paleotypes , Bohemica , Czech literature , and incunabula
Language:
Czech , French , German , Latin , and Polish
Description:
Cover title: Prameny k prvotiskům českého původu
Rights:
unknown
Creator:
Descartes, René,
Type:
text , writings , and correspondence
Subject:
Philosophy , Descartes, René, , Alžběta, , French philosophers , French philosophy , world history 1492-1648 , France , Czech lands 1620-1740 , sociology, psychology, sociologists, psychologists , and philosophy, philosophers
Language:
Czech , French , and Latin
Description:
Parallel text in Latin or French.
Rights:
unknown
Creator:
Descartes, René,
Type:
text , writings , and correspondence
Subject:
Philosophy , Descartes, René, , Alžběta, , French philosophers , French philosophy , France , sociology, psychology, sociologists, psychologists , Czech lands 1620-1740 , philosophy, philosophers , and world history 1492-1648
Language:
Czech , French , and Latin
Description:
Parallel text in Latin or French
Rights:
unknown
Type:
corpus
Language:
Danish , Dutch , English , Finnish , French , German , Italian , Latin , Portuguese , Russian , Spanish , Swedish , and Telugu
Description:
Free electronic books that can be downloaded or browsed online; German-language literature makes up only part of the available e-books
Rights:
Not specified
Type:
text and jubilee collections
Subject:
Philology , Nechutová, Jana, , Czech historians , life anniversaries , and Czech (Czechoslovak) collections and collective monographs
Language:
Czech , English , French , German , Latin , Polish , and Slovak
Description:
Published by the Faculty of Arts of Masaryk University, the Research Centre for the History of Central Europe: Sources, Lands, Culture, and Matice moravská, at the Matice moravská publishing house
Rights:
unknown
Type:
text and sources
Subject:
History of France and Monaco , diplomatics , French monarchs , France , political history, politicians , world history of the Middle Ages (to 1492) , and diplomatics, editions
Language:
French and Latin
Rights:
unknown