The data on the Carib language were collected by Dr. Berend Hoff between 1955 and 1965. See: B.J. Hoff, The Carib Language, Phonology, Morphology, Text and Word Index. Verhandelingen van het Koninklijk Instituut voor Taal-, Land-, en Volkenkunde (Royal Institute of Linguistics and Anthropology) Vol. 55 (1968), Martinus Nijhoff: The Hague.
This RESTful service lets users define a sub-corpus drawn from different annotated corpora. The service includes a POS tag harmonisation step in which the original tags are converted to the EAGLES/Parole format. The resulting sub-corpus is indexed using the IMS CWB tool. The user receives an ID that the CQP service can use to query the sub-corpus.
This RESTful service provides access to part of the Hemeroteca Digital de l’Arxiu Municipal de Girona (the digital press archive of the Girona city council), specifically the Catalan press from 2003. The service uses the SRU protocol.
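As a rough illustration of what a request to an SRU service looks like, the sketch below builds a searchRetrieve URL from the standard SRU 1.2 parameters. The base URL and the query term are hypothetical placeholders; the actual endpoint and the indexes supported by this service are not given in the description.

```python
from urllib.parse import urlencode


def build_sru_url(base_url, cql_query, start_record=1, maximum_records=10, version="1.2"):
    """Build an SRU searchRetrieve request URL from standard SRU parameters."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,  # a CQL query string
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and search term, for illustration only.
url = build_sru_url("https://sru.example.org/girona", "premsa", maximum_records=5)
print(url)
```

The response to such a request is an XML document; which record schemas the service returns would have to be checked against its own documentation.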
This corpus was originally created for performance testing of the CorpusExplorer server infrastructure. It contains the filtered, German-only portion of the CommonCrawl database (as of March 2018). First, the URLs were filtered by top-level domain (de, at, ch). The texts were then classified using NTextCat, and only texts identified unambiguously as German were included in the corpus. Finally, the texts were annotated using TreeTagger (token, lemma, part-of-speech). Size: 2.58 million documents, 232.87 million sentences, 3.021 billion tokens. You can use CorpusExplorer to convert this data into various other corpus formats (XML, JSON, WebLicht, TXM and many more).
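The first filtering step described above (keeping only URLs under the de, at and ch top-level domains) can be sketched as follows. This is a minimal illustration using the Python standard library, not the code actually used to build the corpus; the function name and the sample URLs are made up for the example.

```python
from urllib.parse import urlsplit

# Top-level domains named in the corpus description.
ALLOWED_TLDS = {"de", "at", "ch"}


def has_allowed_tld(url, allowed=ALLOWED_TLDS):
    """Return True if the URL's host name ends in one of the allowed TLDs."""
    host = urlsplit(url).hostname  # lower-cased by urlsplit; None if no host
    if not host:
        return False
    tld = host.rstrip(".").rsplit(".", 1)[-1]
    return tld in allowed


# Hypothetical sample URLs, for illustration only.
urls = [
    "https://example.de/artikel",
    "http://example.at/news",
    "https://example.com/page",
    "https://example.ch",
    "not a url",
]
kept = [u for u in urls if has_allowed_tld(u)]
print(kept)
```

A TLD filter of this kind only narrows the candidate set; the language classification step is still needed, since pages under these domains are not necessarily in German.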
Report from the celebration of the fourteenth anniversary of the Czechoslovak Republic, held in front of the Municipal House in Prague on 28 October 1932. The gathering was attended by troops and legionnaires. A Philips Radio broadcast vehicle stands in front of the entrance. The segment includes a silent recording of a speech given by František Soukup, former Secretary of the National Committee and current Chairman of the Senate.