Vystadial 2013 is a dataset of telephone conversations in English and Czech, developed for training acoustic models for automatic speech recognition in spoken dialogue systems. It ships in three parts: Czech data, English data, and scripts.
The data comprise over 41 hours of speech in English and over 15 hours in Czech, plus orthographic transcriptions. The scripts implement data pre-processing and building acoustic models using the HTK and Kaldi toolkits.
This is the English data part of the dataset. and This research was funded by the Ministry of
Education, Youth and Sports of the Czech Republic under the grant agreement
LK11221.
Vystadial 2013 is a dataset of telephone conversations in English and Czech, developed for training acoustic models for automatic speech recognition in spoken dialogue systems. It ships in three parts: Czech data, English data, and scripts.
The data comprise over 41 hours of speech in English and over 15 hours in Czech, plus orthographic transcriptions. The scripts implement data pre-processing and building acoustic models using the HTK and Kaldi toolkits.
This is the scripts part of the dataset. and This research was funded by the Ministry of
Education, Youth and Sports of the Czech Republic under the grant agreement
LK11221.
This is the Czech data collected during the `VYSTADIAL` project. It is an extension of the 'Vystadial 2013' Czech part data release. The dataset comprises of telephone conversations in Czech, developed for training acoustic models for automatic speech recognition in spoken dialogue systems.
A set of corpora for 120 languages automatically collected from wikipedia and the web.
Collected using the W2C toolset: http://hdl.handle.net/11858/00-097C-0000-0022-60D6-1
A tool used to build multilingual corpora from wikipedia. Download the web pages, convert them to plain text, identify language, etc.
A set of 120 corpora collected using this tool is available at https://ufal-point.mff.cuni.cz/xmlui/handle/11858/00-097C-0000-0022-6133-9
entstanden zwischen 1830 und 1880; erarbeitet von Karl Friedrich Wilhelm Wander; Angabe von Sprichwörtern, in denen der jeweilige Suchbegriff vorkommt; neben deutschen Sprichwörtern auch Hinweis auf ähnliche (außer-)europäische Wendungen; enthält 250.000 Sprichwörter
Segment from Český zvukový týdeník Aktualita (Czech Aktualita Sound Newsreel) 1943 issue no. 26A from 1943 captures the Slavia vs. Pilsen water polo match that was a part of the Provincial Youth Swimming Championship organised by the Board of Trustees for the Education of Youth in cooperation with the Czech Amateur Swimming Union and held at the swimming pool in Prague-Barrandov on 3 and 4 July.
WaveSurfer is an Open Source tool for sound visualization and manipulation. It has been designed to suit both novice and advanced users. WaveSurfer has a simple and logical user interface that provides functionality in an intuitive way and which can be adapted to different tasks. It can be used as a stand-alone tool for a wide range of tasks in speech research and education. Typical applications are speech/sound analysis and sound annotation/transcription. WaveSurfer can also serve as a platform for more advanced/specialized applications. This is accomplished either through extending the WaveSurfer application with new custom plug-ins or by embedding WaveSurfer visualization components in other applications.