dc.contributor.author Macháček, Dominik
dc.contributor.author Kratochvíl, Jonáš
dc.contributor.author Vojtěchová, Tereza
dc.contributor.author Bojar, Ondřej
dc.date.accessioned 2019-07-15T14:53:51Z
dc.date.available 2019-07-15T14:53:51Z
dc.date.issued 2019-07-13
dc.identifier.uri http://hdl.handle.net/11234/1-3023
dc.description We present a test corpus of audio recordings and transcriptions of presentations of students' enterprises, together with their slides and web pages. The corpus is intended for the evaluation of automatic speech recognition (ASR) systems, especially in conditions where prior availability of in-domain vocabulary and named entities is beneficial. The corpus consists of 39 presentations in English, each up to 90 seconds long, and slides and web pages in Czech, Slovak, English, German, Romanian, Italian or Spanish. The speakers are high school students from European countries with English as their second language. We benchmark three baseline ASR systems on the corpus and show their imperfections.
dc.language.iso eng
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.relation info:eu-repo/grantAgreement/EC/H2020/825460
dc.relation.isreferencedby https://doi.org/10.1007/978-3-030-31372-2_13
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri http://creativecommons.org/licenses/by/4.0/
dc.subject ASR
dc.subject ASR evaluation
dc.subject speech corpus
dc.subject non-native English
dc.subject speech recognition
dc.subject speech recognition evaluation
dc.subject speech and relevant texts
dc.subject European non-native English
dc.title A Speech Test Set of Practice Business Presentations with Additional Relevant Texts
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType audio
dc.rights.label PUB
has.files yes
branding LINDAT / CLARIAH-CZ
contact.person Macháček Dominik machacek@ufal.mff.cuni.cz Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
sponsor European Union H2020-ICT-2018-2-825460 ELITR - European Live Translator euFunds info:eu-repo/grantAgreement/EC/H2020/825460
sponsor Czech Science Foundation 19-26934X Neural Representations in Multi-modal and Multi-lingual Modelling nationalFunds
size.info 59 minutes
size.info 39 entries
files.size 929830594
files.count 1


 Files in this item

This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0).
Name antrecorp-final.zip
Size 886.76 MB
Format application/zip
Description zipped corpus
MD5 cefb646bf626af7094921204bbfa69ca
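
The record lists an MD5 checksum and a byte count for the archive, so a downloaded copy can be checked against them. The following is a minimal Python sketch of such a check, not a tool provided by the repository. It assumes that files.size (929830594) is the size of antrecorp-final.zip in bytes, which is consistent with the listed 886.76 MB when 1 MB is taken as 1024 * 1024 bytes. The function names and the local file path are hypothetical.

```python
import hashlib
from pathlib import Path

# Values copied from the record above.
EXPECTED_MD5 = "cefb646bf626af7094921204bbfa69ca"
# Assumption: files.size is the byte size of the single zip file.
EXPECTED_SIZE = 929830594


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Return True if the file matches both the expected size and MD5 digest."""
    p = Path(path)
    if p.stat().st_size != expected_size:
        return False
    return md5_of_file(p) == expected_md5.lower()


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the archive was saved.
    archive = Path("antrecorp-final.zip")
    if archive.exists():
        print("archive OK" if verify_download(archive) else "archive mismatch")
    else:
        print("archive not found")
```

MD5 is used here only because it is the checksum the record publishes. It detects accidental corruption during download, not deliberate tampering.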