Show simple item record

 
dc.contributor.author Lukeš, David
dc.contributor.author Kopřivová, Marie
dc.contributor.author Laubeová, Zuzana
dc.contributor.author Poukarová, Petra
dc.contributor.author Horký, Václav
dc.contributor.author Jelínek, Tomáš
dc.contributor.author Křivan, Jan
dc.contributor.author Waclawičová, Martina
dc.contributor.author Benešová, Lucie
dc.contributor.author Škarpová, Marie
dc.date.accessioned 2024-10-10T10:40:13Z
dc.date.available 2024-10-10T10:40:13Z
dc.date.issued 2024-07-15
dc.identifier.uri http://hdl.handle.net/11234/1-5686
dc.description ORTOFON v3 is a corpus of authentic spoken Czech used in informal situations (private environment, spontaneity, unpreparedness etc.) that covers the whole of the Czech Republic. The corpus is composed of 697 recordings from 2012–2020 and contains 2 445 793 orthographic words (i.e. a total of 2 976 742 tokens including punctuation); a total of 1 121 different speakers appear in the probes. ORTOFON v3 is partially balanced with regard to the basic sociolinguistic speaker categories (gender, age group, level of education and region of childhood residence). The transcription is linked to the corresponding audio track. Unlike in the ORAL-series corpora, the transcription was carried out on two main tiers, orthographic and phonetic, supplemented by an additional metalanguage tier. The (anonymized) transcriptions are provided in the XML-based ELAN Annotation Format; the audio (with corresponding anonymization beeps) is uncompressed 16-bit PCM WAV, mono, 16 kHz. The transcriptions are also available in an alternative format under the less restrictive CC BY-NC-SA license at http://hdl.handle.net/11234/1-5687
dc.language.iso ces
dc.publisher Charles University, Faculty of Arts, Institute of the Czech National Corpus
dc.relation.replaces http://hdl.handle.net/11234/1-2579
dc.rights License Agreement for Czech National Corpus Data
dc.rights.uri https://lindat.mff.cuni.cz/repository/xmlui/page/license-cnc-data
dc.source.uri https://wiki.korpus.cz/doku.php/en:cnk:ortofon
dc.subject spoken language
dc.subject informal language
dc.title ORTOFON v3: corpus of informal spoken Czech with multi-tier transcription (transcriptions & audio)
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType audio
dc.rights.label ACA
has.files yes
branding LINDAT / CLARIAH-CZ
demo.uri https://www.korpus.cz/kontext/query?corpname=ortofon_v3
contact.person Michal Křen michal.kren@ff.cuni.cz Charles University, Faculty of Arts, Institute of the Czech National Corpus
sponsor Ministerstvo školství, mládeže a tělovýchovy LM2023044 Český národní korpus nationalFunds
size.info 2400000 words
files.size 62236655384
files.count 1
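
The description above states that the audio is uncompressed 16-bit PCM WAV, mono, 16 kHz. A minimal sketch of how one might check an extracted recording against those parameters, using only Python's standard `wave` module; the function name `matches_stated_format` and the example path are hypothetical, and the layout of files inside the archive is not documented in this record:

```python
import wave


def matches_stated_format(path):
    """Return True if the WAV file is mono, 16-bit, 16 kHz, uncompressed PCM.

    The expected values come from the dc.description field of this record;
    the function name itself is illustrative only.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getnchannels() == 1          # mono
            and wav.getsampwidth() == 2      # 2 bytes per sample = 16-bit
            and wav.getframerate() == 16000  # 16 kHz
            and wav.getcomptype() == "NONE"  # uncompressed PCM
        )


if __name__ == "__main__":
    import sys

    # Usage (hypothetical path): python check_wav.py some_recording.wav
    for wav_path in sys.argv[1:]:
        status = "ok" if matches_stated_format(wav_path) else "MISMATCH"
        print(f"{wav_path}: {status}")
```

A mismatch only shows that a file differs from the stated parameters. It says nothing about the content of the recording.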


 Files in this item

This item is Academic Use and licensed under: License Agreement for Czech National Corpus Data (Attribution Required, Noncommercial)
Name ortofon_v3.tar.gz
Size 57.96 GB
Format application/x-gzip
Description Unknown
MD5 ac4887cda40b8523e33345b11b34d606
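
The MD5 value listed for `ortofon_v3.tar.gz` can be used to confirm that a download of roughly 58 GB completed without corruption. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `verify_download` are illustrative, and the file is read in chunks because the archive is far too large to load into memory at once:

```python
import hashlib

# Checksum as listed in this record for ortofon_v3.tar.gz.
EXPECTED_MD5 = "ac4887cda40b8523e33345b11b34d606"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    import sys

    # Usage: python verify_md5.py /path/to/ortofon_v3.tar.gz
    archive = sys.argv[1] if len(sys.argv) > 1 else "ortofon_v3.tar.gz"
    print("checksum ok" if verify_download(archive) else "checksum MISMATCH")
```

MD5 is adequate here for detecting accidental transfer errors. It is not a safeguard against deliberate tampering.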