Contributor: Ministry of Education, Youth and Sports of the Czech Republic, grant LK11221, "Vývoj metod pro návrh statistických mluvených dialogových systémů" (Development of Methods for Designing Statistical Spoken Dialogue Systems), national funds / Harvested from: LINDAT/CLARIAH-CZ repository / Type: audio


1. A Small Dataset for English-to-Czech Speech Translation in the Travel Domain

Creator:: Cífka, Ondřej; Bojar, Ondřej
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: audio and corpus
Subject:: speech corpus, ASR, and machine translation
Language:: English and Czech
Description:: This small dataset contains three speech corpora collected using the Alex Translate telephone service (https://ufal.mff.cuni.cz/alex#alex-translate). The "part1" and "part2" corpora contain English speech with transcriptions and Czech translations. These recordings were collected from users of the service. Part 1 contains earlier recordings, filtered to include only clean speech; Part 2 contains later recordings with no filtering applied. The "cstest" corpus contains recordings of artificially created sentences, each containing one or more Czech names of places in the Czech Republic. These were recorded by a multinational group of students studying in Prague.
Rights:: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB

2. Vystadial 2013 – Czech data

Creator:: Korvas, Matěj; Plátek, Ondřej; Dušek, Ondřej; Žilka, Lukáš; Jurčíček, Filip
Publisher:: Charles University, Faculty of Mathematics and Physics
Type:: audio and corpus
Subject:: acoustic data, speech corpus, spoken corpus, orthographic transcriptions, telephone speech, voip, and dialogue system
Language:: Czech
Description:: Vystadial 2013 is a dataset of telephone conversations in English and Czech, developed for training acoustic models for automatic speech recognition in spoken dialogue systems. It ships in three parts: Czech data, English data, and scripts. The data comprise over 41 hours of speech in English and over 15 hours in Czech, plus orthographic transcriptions. The scripts implement data pre-processing and building acoustic models using the HTK and Kaldi toolkits. This is the Czech data part of the dataset. This research was funded by the Ministry of Education, Youth and Sports of the Czech Republic under the grant agreement LK11221.
Rights:: Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0), http://creativecommons.org/licenses/by-sa/3.0/, and PUB

3. Vystadial 2013 – English data

Creator:: Korvas, Matěj; Plátek, Ondřej; Dušek, Ondřej; Žilka, Lukáš; Jurčíček, Filip
Publisher:: Charles University, Faculty of Mathematics and Physics
Type:: audio and corpus
Subject:: acoustic data, speech corpus, spoken corpus, orthographic transcriptions, telephone speech, voip, and dialogue system
Language:: English
Description:: Vystadial 2013 is a dataset of telephone conversations in English and Czech, developed for training acoustic models for automatic speech recognition in spoken dialogue systems. It ships in three parts: Czech data, English data, and scripts. The data comprise over 41 hours of speech in English and over 15 hours in Czech, plus orthographic transcriptions. The scripts implement data pre-processing and building acoustic models using the HTK and Kaldi toolkits. This is the English data part of the dataset. This research was funded by the Ministry of Education, Youth and Sports of the Czech Republic under the grant agreement LK11221.
Rights:: Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0), http://creativecommons.org/licenses/by-sa/3.0/, and PUB

4. Vystadial 2016 – Czech data

Creator:: Plátek, Ondřej; Dušek, Ondřej; Jurčíček, Filip
Publisher:: Charles University, Faculty of Mathematics and Physics
Type:: audio and corpus
Subject:: acoustic data, speech corpus, spoken corpus, telephone speech, voip, and dialogue system
Language:: Czech
Description:: This is the Czech data collected during the VYSTADIAL project. It extends the Czech part of the Vystadial 2013 data release. The dataset comprises telephone conversations in Czech, collected for training acoustic models for automatic speech recognition in spoken dialogue systems.
Rights:: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB
