This corpus is part of Deliverable 5.5 of the European Commission project QTLeap FP7-ICT-2013.4.1-610516 (http://qtleap.eu).
The texts are Q&A interactions from the real-user scenario (batches 1 and 2). The interactions in this corpus are available in Basque, Bulgarian, Czech, English, Portuguese and Spanish.
The texts have been automatically annotated with NLP tools, including word sense disambiguation, named entity disambiguation and coreference resolution. Please see deliverable D5.6 at http://qtleap.eu/deliverables for more information.
An XML-based file containing Arabic stop words, organized according to noun syntax: particle nouns, signal nouns, separated pronouns and connected nouns.
Citation: Driss Namly, Yasser Regragui, Karim Bouzoubaa. "Interoperable Arabic language resources building and exploitation in SAFAR platform". 13th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA) November 29th to December 2nd, 2016.
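As an illustration of how such an XML stop-word file might be consumed, here is a minimal Python sketch. The element names (`stopwords`, `category`, `word`), the `name` attribute and the sample entries are assumptions made for this example only; the actual layout of the distributed file may differ and should be checked before reuse.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: the element and attribute names below are assumed,
# not taken from the distributed file.
SAMPLE_XML = """<stopwords>
  <category name="signal_nouns">
    <word>هذا</word>
    <word>هذه</word>
  </category>
  <category name="separated_pronouns">
    <word>هو</word>
    <word>هي</word>
  </category>
</stopwords>"""


def load_stopwords(xml_text):
    """Return a dict mapping each category name to its set of stop words."""
    root = ET.fromstring(xml_text)
    result = {}
    for category in root.findall("category"):
        words = {w.text.strip() for w in category.findall("word") if w.text}
        result[category.get("name")] = words
    return result


def remove_stopwords(tokens, stopwords_by_category):
    """Drop every token that appears in any stop-word category."""
    all_stopwords = set().union(*stopwords_by_category.values())
    return [t for t in tokens if t not in all_stopwords]
```

Keeping the categories separate in the returned dictionary lets a caller filter on only some of them, for example pronouns but not signal nouns.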
Historical dictionary of the Swedish language. Includes information about pronunciation, inflexions, variant forms, etymologies, usages and definitions.
"Teaching Dutch in primary and secondary education: a video collection" is a collection of approx. 80 hours of filmed lessons in Dutch and Flemish primary and secondary schools. The lessons were collected in a searchable database and they were enriched with metadata and annotations.
AMALACH project component TMODS:ENG-CZE; machine translation of queries from Czech to English. This archive contains models for the Moses decoder (binarized, pruned to allow for real-time translation) and configuration files for the MTMonkey toolkit. The aim of this package is to provide a full service for Czech->English translation which can be easily utilized as a component in a larger software solution. (The required tools are freely available and an installation guide is included in the package.)
The translation models were trained on the CzEng 1.0 corpus and Europarl. The monolingual data used for language model (LM) estimation additionally contains WMT news crawls up to 2013.
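For readers who want to call the translation service from their own software, the following Python sketch builds a translation request body. The JSON field names (`action`, `sourceLang`, `targetLang`, `text`) are an assumption based on the MTMonkey JSON interface and should be verified against the installation guide included in the package; the helper name `build_translate_request` is hypothetical. Sending the request is left out because the host, port and path depend on the local installation.

```python
import json


def build_translate_request(text, source_lang="cs", target_lang="en"):
    """Serialize a translation request as a JSON string.

    Field names are assumed to follow the MTMonkey JSON interface;
    check them against the package's installation guide.
    """
    payload = {
        "action": "translate",
        "sourceLang": source_lang,
        "targetLang": target_lang,
        "text": text,
    }
    # ensure_ascii=False keeps Czech diacritics readable in the output.
    return json.dumps(payload, ensure_ascii=False)


if __name__ == "__main__":
    print(build_translate_request("Bolí mě hlava."))
```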
This is a state-of-the-art pipeline of Turkish NLP tools (sentence splitting, tokenisation, normalisation, deasciification, vowelisation, spelling correction, morphological analysis/disambiguation, named entity recognition, dependency parsing). The platform operates as SaaS (Software as a Service) and provides researchers and students with state-of-the-art NLP tools at several layers: preprocessing, morphology, syntax and entity recognition.
Users can communicate with the platform through three channels: a user-friendly web interface, file uploads, AP.