This dataset can serve as a training and evaluation corpus for the task of keyword detection with speaker direction estimation (keyword direction of arrival, KWDOA).
It was created by processing the existing Speech Commands dataset [1] with the PyroomAcoustics library so that the resulting speech recordings simulate the use of a circular microphone array with 4 microphones, with a distance of 57 mm between adjacent microphones. This design of the simulated microphone array was chosen to match the existing physical microphone array from the Seeeduino series.
[1] Warden, Pete. “Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition.” ArXiv.org, 2018, arxiv.org/abs/1804.03209
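The array geometry described above can be sketched in a few lines. This is only an illustrative calculation, not the script used to build the dataset: it assumes the 4 microphones are evenly spaced on a circle, so the 57 mm spacing is the chord between neighbours and the radius follows from chord = 2 R sin(pi / M). The function name and the `phi0` rotation parameter are hypothetical.

```python
import math

def circular_array_positions(num_mics, adjacent_spacing_m, phi0=0.0):
    """Return (radius, [(x, y), ...]) for mics evenly spaced on a circle.

    Assumption: adjacent_spacing_m is the straight-line (chord) distance
    between neighbouring microphones, so chord = 2 * R * sin(pi / M).
    """
    radius = adjacent_spacing_m / (2.0 * math.sin(math.pi / num_mics))
    positions = []
    for m in range(num_mics):
        angle = phi0 + 2.0 * math.pi * m / num_mics
        positions.append((radius * math.cos(angle), radius * math.sin(angle)))
    return radius, positions

# 4 microphones, 57 mm between adjacent microphones (values from the description)
radius, mics = circular_array_positions(4, 0.057)
```

Under these assumptions the radius comes out at roughly 40.3 mm. The resulting coordinates could then be passed to a room simulator as the microphone positions.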
SPRAAK (also Dutch for 'speech') is a speech recognition package. As such it is useful for transcription of speech, alignment of spoken and written language, annotation of corpora, etc. It is an efficient and flexible tool that combines many of the recent advancements in automatic speech recognition with a very efficient decoder in a proven HMM architecture. SPRAAK can be adapted for all languages, except tonal ones.
SpeechRecorder is a platform-independent multi-channel audio recording software. Its main features are a configurable recording script, Unicode text, image and audio prompts, hardware independence and localized language interfaces.
Talks given by Karel Makoň to his friends from the late 1960s through the early 1990s. The topic is mostly Christian mysticism.
Mainly written Swedish corpora (all time periods except Runic Swedish; various genres, including learner corpora) and lexicons; some non-Swedish corpora (Faroese, Old Icelandic, Latin, Spanish); Swedish corpora (approx. 200 MW); Swedish lexicons (approx. 220,000 entries total); non-Swedish corpora (approx. 15 MW)
The SQAD database consists of 3301 records obtained from Czech Wikipedia articles. Each record has the following structure:
- the original sentence(s) from Wikipedia
- a question that is directly answered in the text
- the expected answer to the question as it appears in the original text
- the URL of the Wikipedia web page from which the original text was extracted
- the name of the author of this SQAD record
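The record structure listed above can be pictured as a small data class. This is a minimal sketch for orientation only: the class name, field names and the `answer_in_text` helper are hypothetical and do not reflect the actual file layout of SQAD, and the sample values are invented for illustration rather than taken from the database.

```python
from dataclasses import dataclass

@dataclass
class SqadRecord:
    """One question-answer record; all field names are illustrative."""
    text: str      # the original sentence(s) from Wikipedia
    question: str  # a question that is directly answered in the text
    answer: str    # the expected answer as it appears in the text
    url: str       # the Wikipedia page the text was extracted from
    author: str    # the name of the author of the record

    def answer_in_text(self) -> bool:
        # The answer is expected to occur verbatim in the original text.
        return self.answer in self.text

# Invented sample values, not an actual SQAD record.
record = SqadRecord(
    text="Praha je hlavní město České republiky.",
    question="Jaké je hlavní město České republiky?",
    answer="Praha",
    url="https://cs.wikipedia.org/wiki/Praha",
    author="example-annotator",
)
```

A check such as `record.answer_in_text()` is one simple way to validate that the answer field really is a span of the source text.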
Simple question answering database version 2.1 (SQAD_v2.1) created from Czech Wikipedia. Each SQAD record consists of four files (in vertical format, with lemmatization and POS tagging) and two metadata files.
Simple question answering database version 3 (SQAD v3) created from Czech Wikipedia. The new version consists of 13477 records. Each SQAD record consists of multiple files: question, answer extraction, answer selection, URL, question metadata and, in some cases, answer context.