This corpora is part of Deliverable 5.5 of the European Commission project QTLeap FP7-ICT-2013.4.1-610516 (http://qtleap.eu).
The texts are sentences from the Europarl parallel corpus (Koehn, 2005). We selected the monolingual sentences from parallel corpora for the following pairs: Bulgarian-English, Czech-English, Portuguese-English and Spanish-English. The English corpus is comprised by the English side of the Spanish-English corpus.
Basque is not in Europarl. In addition, it contains the Basque and English sides of the GNOME corpus.
The texts have been automatically annotated with NLP tools, including Word Sense Disambiguation, Named Entity Disambiguation and Coreference resolution. Please check deliverable D5.6 in http://qtleap.eu/deliverables for more information.
Údaje o názvu převzaty z obálky and Pod názvem "Version preliminar de un capítulo preparado para la Historia General de América, Comisión de Historia del I.P.G.H."
This package contains an extended version of the test collection used in the CLEF eHealth Information Retrieval tasks in 2013--2015. Compared to the original version, it provides complete query translations into Czech, French, German, Hungarian, Polish, Spanish and Swedish and additional relevance assessment.