VIADAT-SEARCH, in connection with VIADAT-REPO, enables searching transcripts of oral history recordings. The recordings are preprocessed with language analysis, which makes it possible to search the full text by multiple criteria, such as names or different forms of the same word.
Developed in cooperation with ÚSD AV ČR and NFA.
VIADAT-STAT is a VIADAT module for statistical analysis of recordings stored by the platform.
Developed in cooperation with ÚSD AV ČR and NFA.
Vystadial 2013 is a dataset of telephone conversations in English and Czech, developed for training acoustic models for automatic speech recognition in spoken dialogue systems. It ships in three parts: Czech data, English data, and scripts.
The data comprise over 41 hours of speech in English and over 15 hours in Czech, plus orthographic transcriptions. The scripts implement data pre-processing and building acoustic models using the HTK and Kaldi toolkits.
This is the Czech data part of the dataset. This research was funded by the Ministry of Education, Youth and Sports of the Czech Republic under grant agreement LK11221.
Vystadial 2013 is a dataset of telephone conversations in English and Czech, developed for training acoustic models for automatic speech recognition in spoken dialogue systems. It ships in three parts: Czech data, English data, and scripts.
The data comprise over 41 hours of speech in English and over 15 hours in Czech, plus orthographic transcriptions. The scripts implement data pre-processing and building acoustic models using the HTK and Kaldi toolkits.
This is the scripts part of the dataset. This research was funded by the Ministry of Education, Youth and Sports of the Czech Republic under grant agreement LK11221.
This is the Czech data collected during the `VYSTADIAL` project. It extends the Czech part of the 'Vystadial 2013' data release. The dataset comprises telephone conversations in Czech, developed for training acoustic models for automatic speech recognition in spoken dialogue systems.
A set of corpora for 120 languages, automatically collected from Wikipedia and the web.
Collected using the W2C toolset: http://hdl.handle.net/11858/00-097C-0000-0022-60D6-1