Contributor: Ministerstvo školství, mládeže a tělovýchovy České republiky, project 7E09003 (EuroMatrixPlus – Bringing Machine Translation for European Languages to the User), national funds / Creator: Bojar, Ondřej / Rights: Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0)

1. Hindi Web Texts

Creator:: Bojar, Ondřej; Straňák, Pavel; Zeman, Daniel
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: news and web texts
Language:: Hindi
Description:: A Hindi corpus of texts downloaded mostly from news sites. It contains both the original raw texts and an extensively cleaned-up, tokenized version suitable for language modeling. Size: 18M sentences, 308M tokens. Projects: FP7-ICT-2007-3-231720 (EuroMatrix Plus), 7E09003 (Czech part of EM+)
Rights:: Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0), http://creativecommons.org/licenses/by-nc/3.0/, and PUB
