Search Results
Publisher:
Academy of Sciences
Type:
corpus
Language:
Hungarian
Description:
BSI is a large-scale survey that provides reliable data on, and analyses of, the varieties of Hungarian spoken in Budapest.
Rights:
Not specified
Publisher:
Academy of Sciences
Format:
application/xml
Type:
corpus
Language:
Hungarian
Description:
Containing 27 million running words, the Hungarian Historical Corpus provides a valuable basis for research on the history of Hungarian words between the second half of the 18th century and 2000.
Rights:
Not specified
Publisher:
Academy of Sciences
Format:
application/xml
Type:
corpus
Subject:
synchronic corpus
Language:
Hungarian
Description:
Written general synchronic reference corpus; 190 million tokens; POS-annotated XML
Rights:
Not specified
Publisher:
Budapest University of Technology and Economics Media Research (BME MOKK)
Type:
corpus
Subject:
Web corpus
Language:
Hungarian
Description:
Monolingual written general; 700 million tokens; Segmentation, disambiguation
Rights:
Not specified
Publisher:
Joint Research Centre of the EU
Type:
corpus
Language:
Bulgarian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Modern Greek (1453-), Hungarian, Italian, Latvian, Maltese, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, and Swedish
Description:
The largest parallel corpus; contains EU law (the Acquis Communautaire) in 22 languages.
Rights:
Not specified
Publisher:
MTA-SZTE Research Group on Artificial Intelligence
Type:
corpus
Subject:
speech corpus
Language:
Hungarian
Description:
spoken, monolingual, manually segmented domain-specific corpus of numbers, 5857 recorded words
Rights:
Not specified
Type:
corpus
Subject:
These databases serve as an important resource for the performance of voice driven teleservice systems in practical implementations
Language:
Czech, Hungarian, Polish, Russian, and Slovak
Description:
5 telephone databases recorded over the PSTN. Contains interesting phonetically rich material. All orthographically transcribed. Speaker information included for gender, age, accent. Including pronunciation lexicon.
Rights:
Not specified
Type:
corpus
Language:
Czech, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Polish, Portuguese, Russian, Spanish, Swedish, Turkish, Chinese, Hebrew, Japanese, Korean, and Thai
Description:
28 speech databases containing broadband recordings from 550 adults and 50 children per language. Contains interesting phonetically rich material. All orthographically transcribed. Speaker information included for gender, age, accent. Including pronunciation lexicon.
Rights:
Not specified
Publisher:
Department of Informatics, Human Language Technology Group, University of Szeged
Format:
application/xml
Type:
corpus
Subject:
monolingual corpus, annotated corpus, and POS annotation
Language:
Hungarian
Description:
written, monolingual, general, manually POS annotated reference corpus; 1,247,546 tokens; MSD tagset, XML (TEIxLite) files
Rights:
Not specified
Publisher:
Department of Informatics, Human Language Technology Group, University of Szeged
Format:
application/xml
Type:
corpus
Subject:
monolingual corpus, annotated corpus, and POS annotation
Language:
Hungarian
Description:
written, monolingual, general, manually POS annotated reference corpus; 1,459,288 tokens; MSD tagset, XML (TEI P4) files
Rights:
Not specified