Newly added
image
Description:
The key idea of our project is to convey to the widest possible readership detailed abstracts of the testimonies of Roma and Sinti and thus their personal and irreplaceable experience of the Second World War. We hope that ...
This record contains 1 file (6.53 MB).
Publicly Available
toolService
Description:
Tokenizer, POS Tagger, Lemmatizer and Parser models for 147 treebanks of 78 languages of Universal Dependencies 2.15 Treebanks, created solely using UD 2.15 data (https://hdl.handle.net/11234/1-5787). The model documentation ...
This record contains 1 file (8.53 GB).
Publicly Available
corpus
Description:
*** German version see below ***
The ‘Ancillary Monitor Corpus: Common Crawl - german web’ was designed with the aim of enabling a broad-based linguistic analysis of the German-language (visible) internet over time - ...
This record contains 272 files (53.6 GB).
Publicly Available
Most visited records
Over the past week
corpus
Description:
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and ...
This record contains 3 files (650.18 MB).
Publicly Available
corpus
Description:
The ParCzech 3.0 corpus is the third version of ParCzech consisting of stenographic protocols that record the Chamber of Deputies’ meetings held in the 7th term (2013-2017) and the current 8th term (2017-Mar 2021). The ...
This record contains 40 files (1064.79 GB).
Publicly Available
corpus
Description:
A set of corpora for 120 languages, automatically collected from Wikipedia and the web.
Collected using the W2C toolset: http://hdl.handle.net/11858/00-097C-0000-0022-60D6-1
This record contains 122 files (18.91 GB).
Publicly Available