Speech corpus comprising 4608 spoken sentences recorded for speech timing research. The complete archive, available for download, includes a structured list of the sentences, the speech recordings, the label files, and full documentation.
eXist-db is an open-source database management system built entirely on XML technology. It stores XML data according to the XML data model and features efficient, index-based XQuery processing.
This reference corpus of written Slovenian is a precursor to the Gigafida corpora (see http://hdl.handle.net/11356/1320 for version 2.0).
It contains 600 million words and 738.5 million tokens. It is annotated with morphosyntactic descriptors (MSD tags) and lemmatised.