Creator:
Veselý, Bohumil
Publisher:
Národní filmový archiv
Type:
video and clip
Subject:
Galerie osobností , Places::Praha::Smíchov::Dientzenhoferův pavilon /ext./ , and People::Pták Bohumil (1869-1933)
Language:
No linguistic content
Description:
Opera singer Bohumil Pták by the Dientzenhofer Pavilion in Prague-Smíchov.
Rights:
http://creativecommons.org/licenses/by-nc-nd/4.0/ , PUB , and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Creator:
Veselý, Bohumil
Publisher:
Národní filmový archiv
Type:
video and clip
Subject:
Galerie osobností , Places::Praha::Nové Město::Školská::pavlač domu , and People::Ryba Bohumil (1900-1980)
Language:
No linguistic content
Description:
Philologist Bohumil Ryba on Bohumil Veselý's balcony.
Rights:
http://creativecommons.org/licenses/by-nc-nd/4.0/ , PUB , and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Creator:
Veselý, Bohumil
Publisher:
Národní filmový archiv
Type:
video and clip
Subject:
Galerie osobností , Places::Praha::Nové Město::Školská::pavlač domu , and People::Špidra Bohumil (1895-1964)
Language:
No linguistic content
Description:
Bohumil Špidra, conductor of the Čeští madrigalisté choir, on Bohumil Veselý's balcony.
Rights:
http://creativecommons.org/licenses/by-nc-nd/4.0/ , PUB , and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Creator:
Veselý, Bohumil
Publisher:
Národní filmový archiv
Type:
video and clip
Subject:
Galerie osobností , Places::Praha::Nové Město::Školská::pavlač domu , People::Urban Bohumil Stanislav (1903-1997) , and People::Urbanová Antonie S. (1905-1987)
Language:
No linguistic content
Description:
Painter Bohumil Stanislav Urban with his wife Antonie on Bohumil Veselý's balcony.
Rights:
http://creativecommons.org/licenses/by-nc-nd/4.0/ , PUB , and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Creator:
Veselý, Bohumil
Publisher:
Národní filmový archiv
Type:
video and clip
Subject:
Galerie osobností , Places::Praha::Michle::Jaurisova č. 11::byt Bohumila Střemchy , and People::Střemcha Bohumil (1878-1966)
Language:
No linguistic content
Description:
Photographer and scholar Bohumil Střemcha in his study.
Rights:
http://creativecommons.org/licenses/by-nc-nd/4.0/ , PUB , and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Creator:
Veselý, Bohumil
Publisher:
Národní filmový archiv
Type:
video and clip
Subject:
klobouk smeknutí , Galerie osobností , Places::Praha::Nové Město::Školská::pavlač domu , and People::Trnka Bohumil (1895-1984)
Language:
No linguistic content
Description:
Philologist Bohumil Trnka on Bohumil Veselý's balcony.
Rights:
http://creativecommons.org/licenses/by-nc-nd/4.0/ , PUB , and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Creator:
Veselý, Bohumil
Publisher:
Národní filmový archiv
Type:
video and clip
Subject:
kamera filmová , Galerie osobností , Places::Praha::Nové Město::Školská::pavlač domu , and People::Veselý Bohumil (1903-1971)
Language:
No linguistic content
Description:
Film collector Bohumil Veselý on the balcony of his own house on Školská Street in Prague.
Rights:
http://creativecommons.org/licenses/by-nc-nd/4.0/ , PUB , and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Creator:
Veselý, Bohumil
Publisher:
Národní filmový archiv
Type:
video and clip
Subject:
Galerie osobností , Places::Praha::Nové Město::Školská::pavlač domu , and People::Zámečník Alois (1897-1976)
Language:
No linguistic content
Description:
Footage of writer Bohumil Zahradník-Brodský, standing with a Virginia cigar, later sitting at a desk and writing.
Rights:
http://creativecommons.org/licenses/by-nc-nd/4.0/ , PUB , and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Creator:
Veselý, Bohumil
Publisher:
Národní filmový archiv
Type:
video and clip
Subject:
klobouk smeknutí , Galerie osobností , Places::Praha::Nové Město::Školská::pavlač Bohumila Veselého , and People::Lifka Bohumír (1900-1987)
Language:
No linguistic content
Description:
Librarian Bohumír Lifka on Bohumil Veselý's balcony.
Rights:
http://creativecommons.org/licenses/by-nc-nd/4.0/ , PUB , and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Creator:
Veselý, Bohumil
Publisher:
Národní filmový archiv
Type:
video and clip
Subject:
Galerie osobností and People::Šimek Bohumír (1896-1978)
Language:
No linguistic content
Description:
Film architect Bohumír Šimek on Bohumil Veselý's balcony.
Rights:
http://creativecommons.org/licenses/by-nc-nd/4.0/ , PUB , and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Creator:
Veselý, Bohumil
Publisher:
Národní filmový archiv
Type:
video and clip
Subject:
Galerie osobností , Places::Praha::Nové Město::Školská::pavlač domu , and People::Heran Bohuš (1907-1968)
Language:
No linguistic content
Description:
Cellist Bohuš Heran on Bohumil Veselý's balcony.
Rights:
http://creativecommons.org/licenses/by-nc-nd/4.0/ , PUB , and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Creator:
Veselý, Bohumil
Publisher:
Národní filmový archiv
Type:
video and clip
Subject:
Galerie osobností and People::Brauner Bohuslav (1855-1935)
Language:
No linguistic content
Description:
Professor and chemist Bohuslav Brauner with his grandson in the garden.
Rights:
http://creativecommons.org/licenses/by-nc-nd/4.0/ , PUB , and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Creator:
Veselý, Bohumil
Publisher:
Národní filmový archiv
Type:
video and clip
Subject:
Galerie osobností and People::Leopold Bohuslav (1888-1956)
Language:
No linguistic content
Description:
Violinist Bohuslav Leopold with his family in the garden. Leopold at his desk.
Rights:
http://creativecommons.org/licenses/by-nc-nd/4.0/ , PUB , and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Creator:
Elekta-journal and Veselý, Bohumil
Publisher:
Národní filmový archiv
Type:
video and clip
Subject:
nástroj hudební housle , hra na housle , kvarteto Ševčíkovo-Lhotského , Galerie osobností , and People::Lhotský Bohuslav (1879-1930)
Language:
No linguistic content
Description:
Violinist Bohuslav Lhotský playing the violin.
Rights:
http://creativecommons.org/licenses/by-nc-nd/4.0/ , PUB , and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Creator:
Veselý, Bohumil
Publisher:
Národní filmový archiv
Type:
video and clip
Subject:
pes pudl , Galerie osobností , People::Šula Bohuslav (1887-1967) , and People::Votrubová (neuvedeno-)
Language:
No linguistic content
Description:
Film architect Bohuslav Šula and an unidentified man with a dog in a garden.
Rights:
http://creativecommons.org/licenses/by-nc-nd/4.0/ , PUB , and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Publisher:
Department of Linguistics and Nordic Studies, University of Oslo
Type:
lexicalConceptualResource
Description:
65 000 entries with definitions, etymology, examples
Rights:
Not specified
Creator:
Veselý, Bohumil
Publisher:
Národní filmový archiv
Type:
video and clip
Subject:
Galerie osobností , Places::Praha::Nové Město::Školská::pavlač domu , and People::Vomáčka Boleslav (1887-1965)
Language:
No linguistic content
Description:
Composer Boleslav Vomáčka on Bohumil Veselý's balcony.
Rights:
http://creativecommons.org/licenses/by-nc-nd/4.0/ , PUB , and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Publisher:
Korpora.org and Fakultät Geisteswissenschaften, Universität Duisburg-Essen
Type:
corpus
Subject:
Germanistik
Language:
German
Description:
Digital, morphologically annotated (nouns, verbs, adjectives) part of the Bonn Corpus of Early New High German; it served as the material basis for volumes 3 (nouns), 4 (verbs) and 6 (adjectives) of the "Grammatik des Frühneuhochdeutschen".
Rights:
Not specified
Creator:
Veselý, Bohumil
Publisher:
Národní filmový archiv
Type:
video and clip
Subject:
Galerie osobností , Places::Praha::Nové Město::Školská::pavlač domu , and People::Rujan Bořek (1899-1955)
Language:
No linguistic content
Description:
Opera singer Bořek Rujan on Bohumil Veselý's balcony.
Rights:
http://creativecommons.org/licenses/by-nc-nd/4.0/ , PUB , and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Creator:
Veselý, Bohumil
Publisher:
Národní filmový archiv
Type:
video and clip
Subject:
Galerie osobností , Places::Praha::Nové Město::Školská::pavlač domu , and People::Milec Boris (1906-1984)
Language:
No linguistic content
Description:
Dancer Boris Milec on Bohumil Veselý's balcony.
Rights:
http://creativecommons.org/licenses/by-nc-nd/4.0/ , PUB , and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Creator:
Tichý, Ondřej , Roček, Martin , Bočková, Renata , Čermák, Matěj , Dragounová, Jolana , Filipová, Helena , Gilová, Lucie , Hejná, Michaela , Hladíková, Lenka , Hladká, Alena , Hubinová, Veronika , Krajcsovicsová, Vlaďena , Kupková, Tatiana , Lebedeva, Tatiana , Malečková, Nikola , Novotná, Alena , Pazderová, Tereza , Popelíková, Jiřina , Rumlová, Jana , Tyčová Ocelík, Dana , Volná, Veronika , and Zahradníková, Tereza
Publisher:
Charles University, Faculty of Arts, Department of English Language and ELT Methodology
Type:
text , lexicon , and lexicalConceptualResource
Subject:
English , Old English , Anglo-Saxon , dictionary , Bosworth , Toller , lexicography , digitalization , English history , Mediaeval , and Medieval
Language:
English , Old English (ca. 450-1100) , Latin , Ancient Greek (to 1453) , and Ancient Hebrew
Description:
This is an online edition of An Anglo-Saxon Dictionary, a dictionary of Old English. The dictionary records the state of the English language as it was used by the Anglo-Saxon inhabitants of the British Isles between ca. 700 and 1100 AD.
This project is based on a digital edition of An Anglo-Saxon Dictionary, based on the manuscript collections of the late Joseph Bosworth (the so-called Main Volume, first edition 1898), and its Supplement (first edition 1921), edited by Joseph Bosworth and T. Northcote Toller. It is today the largest complete dictionary of Old English, though it will hopefully one day be supplanted by the Dictionary of Old English (DOE). Alistair Campbell's "enlarged addenda and corrigenda" from 1972 are not in the public domain and are therefore not part of the online dictionary. Please see the front and back matter of the paper dictionary for further information, prefaces, and lists of references and contractions.
The digitization project was initiated by Sean Crist in 2001 as part of his Germanic Lexicon Project (GLP), and many individuals and institutions have contributed to it since. See the original GLP webpage and the old Bosworth-Toller offline application webpage (to be updated). The project is currently hosted by the Faculty of Arts, Charles University.
In 2010, the data from the GLP were converted to create the current site. Care was taken to preserve the typography of the original dictionary while also providing a modern, user-friendly interface for contemporary users.
In 2013, the entries were structurally re-tagged and the original typography was abandoned, though immediate access to the scans of the paper dictionary was preserved.
Our aim is to reach beyond a simple digital edition and create an online environment for everyone interested in Old English and Anglo-Saxon culture. Feel free to join in editing the Dictionary, commenting on its numerous entries, or participating in the discussions in our forums.
We hope that by drawing the attention of the community of Anglo-Saxonists to our site and pooling our resources, we can create a more useful tool for everybody. The first project to draw on the corrected and tagged data of the Dictionary is a Morphological Analyzer of Old English (currently under development).
We are grateful for the generous support of the Charles University Grant Agency and for the free hosting at the Faculty of Arts, Charles University. The site is currently maintained and developed by Ondřej Tichý et al. at the Department of English Language and ELT Methodology, Faculty of Arts, Charles University in Prague (Czech Republic).
Rights:
Creative Commons - Attribution 4.0 International (CC BY 4.0) , http://creativecommons.org/licenses/by/4.0/ , and PUB
Type:
corpus
Subject:
Germanistik
Language:
Chinese , Czech , English , French , German , Latin , and Spanish
Description:
Digital image copies of historical botanical papers from the Missouri Botanical Garden Library; German-language texts make up only part of the collection.
Rights:
Not specified
Creator:
Veselý, Bohumil
Publisher:
Národní filmový archiv
Type:
video and clip
Subject:
Galerie osobností , Places::Praha::Nové Město::Školská::pavlač domu , and People::Vronský Boža (1889-1955)
Language:
No linguistic content
Description:
Opera and operetta singer Boža Vronský on Bohumil Veselý's balcony.
Rights:
http://creativecommons.org/licenses/by-nc-nd/4.0/ , PUB , and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Creator:
Veselý, Bohumil
Publisher:
Národní filmový archiv
Type:
video and clip
Subject:
stroj psací , Galerie osobností , and People::Benešová Božena (1873-1936)
Language:
No linguistic content
Description:
Writer Božena Benešová at her typewriter in her study.
Rights:
http://creativecommons.org/licenses/by-nc-nd/4.0/ , PUB , and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Creator:
Veselý, Bohumil
Publisher:
Národní filmový archiv
Type:
video and clip
Subject:
Galerie osobností , Places::Praha::Nové Město::Školská::pavlač domu , and People::Petanová Božena (1888-1958)
Language:
No linguistic content
Description:
Opera singer Božena Petanová-Setunská with an unidentified man.
Rights:
http://creativecommons.org/licenses/by-nc-nd/4.0/ , PUB , and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Creator:
Veselý, Bohumil
Publisher:
Národní filmový archiv
Type:
video and clip
Subject:
Galerie osobností , Places::Praha::Nové Město::Školská::pavlač domu , and People::Klika Břetislav Maria (1884-1958)
Language:
No linguistic content
Description:
Editor Břetislav Maria Klika on Bohumil Veselý's balcony.
Rights:
http://creativecommons.org/licenses/by-nc-nd/4.0/ , PUB , and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Creator:
Veselý, Bohumil
Publisher:
Národní filmový archiv
Type:
video and clip
Subject:
kniha Bezruč Petr Slezské písně , Galerie osobností , and People::Pračka Břetislav (1881-1958)
Language:
No linguistic content
Description:
Břetislav Pračka, a collector of literature by and about the poet Petr Bezruč, with an unidentified woman on Bohumil Veselý's balcony. They're examining different editions of Slezské písně (Silesian Songs).
Rights:
http://creativecommons.org/licenses/by-nc-nd/4.0/ , PUB , and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Publisher:
Coventry University, University of Reading, University of Warwick
Format:
application/tei+xml
Type:
corpus
Language:
English
Description:
Transcribed recordings of 160 lectures and 39 seminars held in university departments, covering four broad disciplinary groups; 1,644,942 tokens in total.
Rights:
Not specified
Type:
corpus
Language:
English
Description:
General reference corpus; 100 million words; POS, lemma, descriptive metadata
Rights:
Not specified
Type:
lexicalConceptualResource
Subject:
Germanistik
Language:
German
Description:
5th edition, 1911; focuses on politics, economics, culture and technology at the beginning of the 20th century.
Rights:
Not specified
Creator:
Ouamer, Meriem , Bouzoubaa, Karim , and Tajmout, Rachida
Publisher:
ALELM research group
Type:
text , wordList , and lexicalConceptualResource
Subject:
Broken plural
Language:
Arabic
Description:
An LMF-conformant XML file containing a comprehensive list of Arabic broken plurals (BPs): 12,249 singular words with their corresponding BPs.
Rights:
Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) , http://creativecommons.org/licenses/by-nc-sa/4.0/ , and PUB
Creator:
Veselý, Bohumil
Publisher:
Národní filmový archiv
Type:
video and clip
Subject:
Galerie osobností , Places::Praha::Nové Město::Školská::pavlač domu , and People::Chorovič Bronislav (1888-1980)
Language:
No linguistic content
Description:
Opera singer Bronislav Chorovič on Bohumil Veselý's balcony.
Rights:
http://creativecommons.org/licenses/by-nc-nd/4.0/ , PUB , and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Publisher:
Academy of Sciences
Type:
corpus
Language:
Hungarian
Description:
BSI is a large-scale survey which provides reliable data on and analyses of the varieties of Hungarian spoken in Budapest.
Rights:
Not specified
Type:
corpus
Language:
Bulgarian
Description:
Written, synchronic, general (newspapers)
Rights:
Not specified
Type:
corpus
Language:
Bulgarian and Croatian
Description:
Written; domain-specific (newspapers); diachronic; bilingual; comparable; ca. 3,500,000 tokens (393K words Bulgarian; 3.1M words Croatian)
Rights:
Not specified
Type:
corpus
Language:
Bulgarian
Description:
HPSG-based annotation including: constituent structure, dependency relations, named entities (classified as person, organisation, location or other names), coreferential relations. Annotation in XML
Rights:
Not specified
Type:
lexicalConceptualResource
Language:
Bulgarian
Description:
The 100,000 most frequent Cyrillic tokens in the BulTreeBank text archive; a UTF-16 list of token-frequency pairs.
Rights:
Not specified
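The record does not document the exact file layout. As a minimal sketch, assuming one token and its frequency per line separated by a tab (a hypothetical layout), such a UTF-16 token-frequency list could be loaded in Python as follows; the sample data is invented for illustration.

```python
def read_token_frequencies(raw: bytes) -> dict[str, int]:
    """Parse a UTF-16 token-frequency list into {token: frequency}.

    Assumed (hypothetical) layout: one pair per line, token and count
    separated by a tab. The "utf-16" codec uses the byte-order mark to
    pick the byte order and drops it from the decoded text.
    """
    freqs: dict[str, int] = {}
    for line in raw.decode("utf-16").splitlines():
        if not line.strip():
            continue  # ignore blank lines
        token, count = line.rsplit("\t", 1)
        freqs[token] = int(count)
    return freqs


# Made-up sample in the assumed layout; the counts are not from the real list.
sample = "и\t1200\nна\t950\nда\t870\n".encode("utf-16")
freqs = read_token_frequencies(sample)
top_two = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)[:2]
print(top_two)
```

Splitting on the last tab (`rsplit`) keeps the parser robust if a token ever contains a tab-free but unusual character sequence; if the real file uses another separator, only that one call needs to change.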
Creator:
Simov, Kiril and Osenova, Petya
Publisher:
Linguistic Modeling Department, IPP, Bulgarian Academy of Sciences
Type:
toolService
Description:
It uses a morphological lexicon of Bulgarian (100,000 lemmas) compiled as a finite-state automaton in the CLaRK System. The text must first be tokenized; the analyzer is then applied to each token. It also includes guessers for unknown words and named-entity gazetteers. If corresponding resources are available for another language, the tool can be adapted to that language.
Rights:
Not specified
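The lookup-plus-guesser design described above can be illustrated with a small sketch. This is not the CLaRK implementation: it assumes a plain trie (an unminimized deterministic acyclic automaton) in place of the compiled lexicon, a hypothetical longest-suffix guesser, and invented entries and tags; the named-entity gazetteers are omitted.

```python
from typing import Optional


class LexiconFSA:
    """Word-form lexicon stored as a trie, i.e. a deterministic acyclic
    finite-state automaton (unminimized). States are dicts keyed by the
    next character; the key None marks an accepting state and holds the
    analyses of the word form that ends there."""

    def __init__(self) -> None:
        self.start: dict = {}

    def add(self, form: str, analysis: str) -> None:
        state = self.start
        for ch in form:
            state = state.setdefault(ch, {})
        state.setdefault(None, []).append(analysis)

    def lookup(self, form: str) -> Optional[list[str]]:
        state = self.start
        for ch in form:
            if ch not in state:
                return None  # no transition: the form is not in the lexicon
            state = state[ch]
        return state.get(None)  # None if the path ends in a non-accepting state


def guess(form: str, suffix_rules: dict[str, str]) -> list[str]:
    """Hypothetical fallback guesser for unknown words: longest matching suffix wins."""
    for length in range(len(form), 0, -1):
        tag = suffix_rules.get(form[-length:])
        if tag is not None:
            return [tag]
    return ["UNKNOWN"]


def analyze(tokens: list[str], lexicon: LexiconFSA,
            suffix_rules: dict[str, str]) -> list[tuple[str, list[str]]]:
    """Apply the lexicon to each already-tokenized token, guessing on misses."""
    return [(t, lexicon.lookup(t) or guess(t, suffix_rules)) for t in tokens]


# Hypothetical entries and tags; not the real lexicon or the BulTreeBank tagset.
lexicon = LexiconFSA()
lexicon.add("книга", "книга+N+F+Sg")
lexicon.add("книги", "книга+N+F+Pl")
rules = {"ост": "N+F+Sg?"}
result = analyze(["книга", "книги", "радост", "xyz"], lexicon, rules)
for token, analyses in result:
    print(token, analyses)
```

Because lookup runs one transition per character, the cost per token is linear in its length regardless of lexicon size, which is the main reason to compile a large lexicon into an automaton.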
Type:
corpus
Language:
Bulgarian
Description:
Written, synchronic, general, manually annotated; 1,000,000 tokens divided into three sets: 215,000 tokens used in the BulTreeBank HPSG Treebank, a further 300,000 tokens checked a second time, and the remaining ca. 480,000 tokens checked by the annotators. Morphosyntactic annotation with the BulTreeBank Tagset (http://www.bultreebank.org/TechRep/BTB-TR03.pdf), in XML; the annotation is described in the technical reports of the BulTreeBank project (http://www.bultreebank.org/TechRep).
Rights:
Not specified
Creator:
Simov, Kiril , Osenova, Petya , and Simov, Alex
Publisher:
Linguistic Modeling Department, IPP, Bulgarian Academy of Sciences
Type:
toolService
Description:
This is a hybrid system of rules, a neural network, and rules again. First, rules for the sure cases are applied; then a neural-network disambiguator is applied; finally, rules repair the most frequent errors of the neural network. The rules are implemented as constraints in the CLaRK System. The neural network is an additional module implemented in Java and is called from CLaRK. The tool requires morphologically annotated input.
Rights:
Not specified
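The three-stage control flow (sure-case rules, statistical disambiguator, repair rules) can be sketched as below. This is only an illustration under stated assumptions: the real rules are CLaRK constraints and the real disambiguator is a Java neural network, whereas here a hypothetical scoring function stands in for the network, and the tags, scores and repair rule are invented.

```python
from typing import Callable, Optional

# A token is (word form, candidate tags supplied by the morphological annotation).
Token = tuple[str, list[str]]
Scorer = Callable[[str, str], float]
RepairRule = Callable[[list[Token], list[str], int], Optional[str]]


def sure_rules(tokens: list[Token]) -> list[Optional[str]]:
    """Stage 1: settle only the sure cases (here: a single candidate)."""
    return [cands[0] if len(cands) == 1 else None for _, cands in tokens]


def statistical_stage(tokens: list[Token], partial: list[Optional[str]],
                      score: Scorer) -> list[str]:
    """Stage 2: stand-in for the neural disambiguator; picks the
    best-scoring candidate for every token stage 1 left open."""
    return [
        fixed if fixed is not None else max(cands, key=lambda tag: score(form, tag))
        for (form, cands), fixed in zip(tokens, partial)
    ]


def repair(tokens: list[Token], tags: list[str], rules: list[RepairRule]) -> list[str]:
    """Stage 3: rules that correct frequent stage-2 errors. A rule may only
    switch to a tag that the morphological annotation allows."""
    fixed = list(tags)
    for i, (_, cands) in enumerate(tokens):
        for rule in rules:
            new = rule(tokens, fixed, i)
            if new is not None and new in cands:
                fixed[i] = new
                break
    return fixed


def after_preposition(tokens: list[Token], tags: list[str], i: int) -> Optional[str]:
    """Hypothetical repair rule: a verb right after a preposition becomes a noun."""
    if i > 0 and tags[i - 1] == "PREP" and tags[i] == "VERB":
        return "NOUN"
    return None


def disambiguate(tokens: list[Token], score: Scorer, rules: list[RepairRule]) -> list[str]:
    return repair(tokens, statistical_stage(tokens, sure_rules(tokens), score), rules)


# Toy input with invented tags and scores.
priors = {"VERB": 0.6, "NOUN": 0.4, "ADJ": 0.3}
tokens = [("w1", ["PREP"]), ("w2", ["NOUN", "VERB"]), ("w3", ["VERB", "ADJ"])]
tags = disambiguate(tokens, lambda form, tag: priors.get(tag, 0.0), [after_preposition])
print(tags)  # ['PREP', 'NOUN', 'VERB']
```

Here the stand-in scorer wrongly prefers VERB for `w2`, and the repair stage corrects it, mirroring the stated purpose of the final rule layer.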
Type:
corpus
Language:
Bulgarian
Description:
Written, synchronic, general, manually annotated; 50 000 tokens, 2600 sentences extracted from the BulTreeBank Text Archive in order to contain the most frequent ambiguity classes in Bulgarian
Rights:
Not specified
Type:
lexicalConceptualResource
Language:
Bulgarian
Description:
805 stop words (prepositions, pronouns, etc.); a UTF-16 list of word forms.
Rights:
Not specified
Type:
corpus
Language:
Bulgarian
Description:
72 000 000 tokens, 15% fiction, 78% newspapers and 7% legal texts, government bulletins and others
Rights:
Not specified
Creator:
Simov, Kiril
Publisher:
Linguistic Modeling Department, IPP, Bulgarian Academy of Sciences
Type:
toolService
Description:
The tokenizer covers all languages that use the Latin-1, Latin-2, Latin-3 and Cyrillic tables of Unicode, and it can be extended to other Unicode tables if necessary. It is implemented as a cascaded regular grammar in CLaRK, recognizes over 60 token categories, and is easy to adapt to new token categories.
Rights:
Not specified
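A cascaded regular grammar feeds the output of one recognizer into the next. The sketch below shows the idea with two levels and only a handful of invented categories, far fewer than the 60+ in the CLaRK tokenizer: level 1 is a regular expression over characters (words are split into Latin, Cyrillic or mixed script and capitalized or lowercase), and level 2 rewrites the category sequence NUM ('.' | ',') NUM into a single DECIMAL token when the parts are adjacent. The category names are assumptions.

```python
import re
import unicodedata

# Level 1: a regular grammar over characters (whitespace is matched, then dropped).
LEVEL1 = re.compile(
    r"(?P<WORD>[^\W\d_]+)|(?P<NUM>\d+)|(?P<SPACE>\s+)|(?P<PUNCT>[^\w\s]|_)"
)

Span = tuple[str, str, int, int]  # (text, category, start, end)


def script_of(word: str) -> str:
    """LAT, CYR or MIXED, based on the Unicode character names."""
    scripts = {unicodedata.name(ch, "?").split(" ")[0] for ch in word}
    if scripts == {"LATIN"}:
        return "LAT"
    if scripts == {"CYRILLIC"}:
        return "CYR"
    return "MIXED"


def level1(text: str) -> list[Span]:
    spans: list[Span] = []
    for m in LEVEL1.finditer(text):
        kind = m.lastgroup
        if kind == "SPACE":
            continue
        if kind == "WORD":
            case = "CAP" if m.group()[0].isupper() else "LOW"
            kind = f"{script_of(m.group())}_{case}"
        spans.append((m.group(), kind, m.start(), m.end()))
    return spans


def level2(spans: list[Span]) -> list[Span]:
    """Level 2: a grammar over level-1 categories that merges adjacent
    NUM ('.' | ',') NUM into one DECIMAL token."""
    out: list[Span] = []
    i = 0
    while i < len(spans):
        window = spans[i:i + 3]
        if (len(window) == 3 and window[0][1] == "NUM"
                and window[1][0] in {".", ","} and window[2][1] == "NUM"
                and window[0][3] == window[1][2] and window[1][3] == window[2][2]):
            a, sep, b = window  # parts touch each other: no intervening space
            out.append((a[0] + sep[0] + b[0], "DECIMAL", a[2], b[3]))
            i += 3
        else:
            out.append(spans[i])
            i += 1
    return out


def tokenize(text: str) -> list[tuple[str, str]]:
    return [(tok, kind) for tok, kind, _, _ in level2(level1(text))]


print(tokenize("Цена 3.50 лв, Price 2,75!"))
```

Adding a category in this design means adding a group to level 1 or a rewrite to level 2, which reflects the record's claim that new token categories are easy to add.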
Publisher:
Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:
toolService
Language:
Catalan and Spanish
Description:
Tool for neologism extraction.
Rights:
Not specified
Creator:
Grác, Marek
Publisher:
Masaryk University, NLP Centre
Type:
text and corpus
Subject:
interannotator agreement , corpus , chunks , phrases , and clauses
Language:
Czech
Description:
Czech corpus of 10,000 sentences annotated for NP and clause chunks by 3 to 11 annotators, with an average inter-annotator agreement of 88%.
Rights:
Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0) , http://creativecommons.org/licenses/by-nc-nd/3.0/ , and PUB
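The record reports an average inter-annotator agreement of 88% but does not say which measure was used. As an illustration only, assuming simple observed agreement averaged over annotator pairs (no chance correction, one label per item rather than span matching), such a figure could be computed as follows; the labels are invented.

```python
from itertools import combinations


def average_pairwise_agreement(annotations: dict[str, dict[str, str]]) -> float:
    """annotations[annotator][item] -> label.

    For each pair of annotators, observed agreement is the share of items
    both of them labelled on which their labels match; the result is the
    mean over all pairs that share at least one item. No chance correction.
    """
    scores = []
    for a, b in combinations(sorted(annotations), 2):
        shared = annotations[a].keys() & annotations[b].keys()
        if not shared:
            continue  # these two annotators never saw the same item
        matches = sum(annotations[a][item] == annotations[b][item] for item in shared)
        scores.append(matches / len(shared))
    if not scores:
        raise ValueError("no pair of annotators shares an item")
    return sum(scores) / len(scores)


# Invented toy labels ("NP" = noun-phrase chunk, "CL" = clause), not corpus data.
toy = {
    "A": {"s1": "NP", "s2": "NP", "s3": "CL", "s4": "NP"},
    "B": {"s1": "NP", "s2": "CL", "s3": "CL", "s4": "NP"},
    "C": {"s1": "NP", "s2": "NP", "s3": "CL", "s4": "CL"},
}
print(round(average_pairwise_agreement(toy), 3))  # 0.667
```

Restricting each pair to shared items matters when, as here, the number of annotators varies per sentence (3 to 11).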
Publisher:
Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:
toolService
Language:
Catalan and Spanish
Description:
Terminology management
Rights:
Not specified
Publisher:
Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:
toolService
Language:
Catalan , English , and Spanish
Description:
Tool for querying the Technical Corpus of the Institut Universitari de Lingüística Aplicada.
Rights:
Not specified
Creator:
Gurevych, Iryna , Habernal, Ivan , and Zayed, Omnia
Publisher:
Technische Universität Darmstadt
Type:
text and corpus
Subject:
CommonCrawl , Creative Commons , Web corpus , and Amazon Web Services
Language:
Afrikaans , Arabic , Bengali , Bulgarian , Czech , Danish , German , Modern Greek (1453-) , English , Estonian , Persian , Finnish , French , Hebrew , Hindi , Croatian , Hungarian , Indonesian , Italian , Japanese , Kannada , Korean , Latvian , Lithuanian , Malayalam , Macedonian , Nepali (macrolanguage) , Dutch , Norwegian , Panjabi , Polish , Portuguese , Romanian , Russian , Slovak , Slovenian , Somali , Spanish , Albanian , Swahili (macrolanguage) , Swedish , Tamil , Telugu , Tagalog , Thai , Turkish , Ukrainian , Undetermined , Vietnamese , and Chinese
Description:
A large web corpus (over 10 billion tokens) of texts licensed under the Creative Commons license family, in 50+ languages, extracted from CommonCrawl, the largest publicly available general web crawl to date, with about 2 billion crawled URLs.
Rights:
Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) , http://creativecommons.org/licenses/by-nc/4.0/ , and PUB
Creator:
Gurevych, Iryna , Habernal, Ivan , and Zayed, Omnia
Publisher:
Technische Universität Darmstadt
Type:
text and corpus
Subject:
CommonCrawl , Creative Commons , Web corpus , and Amazon Web Services
Language:
Afrikaans , Arabic , Bengali , Bulgarian , Czech , Danish , German , Modern Greek (1453-) , English , Estonian , Persian , Finnish , French , Gujarati , Hebrew , Hindi , Croatian , Hungarian , Indonesian , Italian , Japanese , Kannada , Korean , Latvian , Lithuanian , Malayalam , Marathi , Macedonian , Nepali (macrolanguage) , Dutch , Norwegian , Polish , Portuguese , Romanian , Russian , Slovak , Slovenian , Somali , Spanish , Albanian , Swahili (macrolanguage) , Swedish , Tamil , Telugu , Tagalog , Thai , Turkish , Ukrainian , Undetermined , Urdu , Vietnamese , and Chinese
Description:
A large web corpus (over 10 billion tokens) of texts licensed under the Creative Commons license family, in 50+ languages, extracted from CommonCrawl, the largest publicly available general web crawl to date, with about 2 billion crawled URLs.
Rights:
Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) , http://creativecommons.org/licenses/by-nc-nd/4.0/ , and PUB