dc.contributor.author |
Lukeš, David |
dc.contributor.author |
Kopřivová, Marie |
dc.contributor.author |
Laubeová, Zuzana |
dc.contributor.author |
Poukarová, Petra |
dc.contributor.author |
Horký, Václav |
dc.contributor.author |
Jelínek, Tomáš |
dc.contributor.author |
Křivan, Jan |
dc.contributor.author |
Waclawičová, Martina |
dc.contributor.author |
Benešová, Lucie |
dc.contributor.author |
Škarpová, Marie |
dc.date.accessioned |
2024-10-10T10:40:13Z |
dc.date.available |
2024-10-10T10:40:13Z |
dc.date.issued |
2024-07-15 |
dc.identifier.uri |
http://hdl.handle.net/11234/1-5686 |
dc.description |
ORTOFON v3 is a corpus of authentic spoken Czech used in informal situations (private environment, spontaneity, unpreparedness, etc.) that covers the whole of the Czech Republic. The corpus is composed of 697 recordings from 2012–2020 and contains 2 445 793 orthographic words (i.e. a total of 2 976 742 tokens including punctuation); a total of 1 121 different speakers appear in the recordings. ORTOFON v3 is partially balanced with regard to the basic sociolinguistic speaker categories (gender, age group, level of education and region of childhood residence). The transcription is linked to the corresponding audio track. Unlike the ORAL-series corpora, the transcription was carried out on two main tiers, orthographic and phonetic, supplemented by an additional metalanguage tier. The (anonymized) transcriptions are provided in the XML-based ELAN Annotation Format; the audio (with corresponding anonymization beeps) is in uncompressed 16-bit PCM WAV format, mono, 16 kHz. The transcriptions are also available in another format under the less restrictive CC BY-NC-SA license at http://hdl.handle.net/11234/1-5687 |
dc.language.iso |
ces |
dc.publisher |
Charles University, Faculty of Arts, Institute of the Czech National Corpus |
dc.relation.replaces |
http://hdl.handle.net/11234/1-2579 |
dc.rights |
License Agreement for Czech National Corpus Data |
dc.rights.uri |
https://lindat.mff.cuni.cz/repository/xmlui/page/license-cnc-data |
dc.source.uri |
https://wiki.korpus.cz/doku.php/en:cnk:ortofon |
dc.subject |
spoken language |
dc.subject |
informal language |
dc.title |
ORTOFON v3: corpus of informal spoken Czech with multi-tier transcription (transcriptions & audio) |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
audio |
dc.rights.label |
ACA |
has.files |
yes |
branding |
LINDAT / CLARIAH-CZ |
demo.uri |
https://www.korpus.cz/kontext/query?corpname=ortofon_v3 |
contact.person |
Michal Křen michal.kren@ff.cuni.cz Charles University, Faculty of Arts, Institute of the Czech National Corpus |
sponsor |
Ministry of Education, Youth and Sports (Ministerstvo školství, mládeže a tělovýchovy) LM2023044 Czech National Corpus (Český národní korpus) nationalFunds |
size.info |
2400000 words |
files.size |
62236655384 |
files.count |
1 |