Search Results
Type:
corpus
Language:
Czech, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Polish, Portuguese, Russian, Spanish, Swedish, Turkish, Chinese, Hebrew, Japanese, Korean, and Thai
Description:
28 speech databases containing broadband recordings from 550 adults and 50 children per language. Contains phonetically rich material, all orthographically transcribed. Speaker information is included for gender, age, and accent. A pronunciation lexicon is included.
Rights:
Not specified
Publisher:
Department of Informatics, Human Language Technology Group, University of Szeged
Format:
application/xml
Type:
corpus
Subject:
monolingual corpus, annotated corpus, and POS annotation
Language:
Hungarian
Description:
written, monolingual, general, manually POS annotated reference corpus; 1,247,546 tokens; MSD tagset, XML (TEIxLite) files
Rights:
Not specified
Publisher:
Department of Informatics, Human Language Technology Group, University of Szeged
Format:
application/xml
Type:
corpus
Subject:
monolingual corpus, annotated corpus, and POS annotation
Language:
Hungarian
Description:
written, monolingual, general, manually POS annotated reference corpus; 1,459,288 tokens; MSD tagset, XML (TEI P4) files
Rights:
Not specified
Publisher:
Department of Informatics, Human Language Technology Group, University of Szeged
Format:
application/xml
Type:
corpus
Language:
Hungarian
Description:
82,000 sentences with shallow syntactic annotation (NP-level).
Rights:
Not specified
Publisher:
Department of Informatics, Human Language Technology Group, University of Szeged
Format:
application/xml
Type:
corpus
Language:
Hungarian
Description:
82,000 sentences with full syntactic annotation.
Rights:
Not specified
Publisher:
University of Leipzig
Type:
corpus
Language:
Afrikaans, Albanian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, German, Hungarian, Icelandic, Indonesian, Italian, Japanese, Korean, Latin, Latvian, Lithuanian, Malay (macrolanguage), Norwegian, Occitan (post 1500), Romanian, Russian, Slovak, Slovenian, Spanish, Sundanese, Swedish, Tagalog, Turkish, Vietnamese, and Welsh
Description:
Collected from newspaper texts, web crawling, etc.: words (+frequency), co-occurrences (+graph), left/right neighbours, example sentences
Rights:
Not specified