Harvested from: LINDAT/CLARIAH-CZ repository / Language: Czech - LINDAT/CLARIAH-CZ Catalog Search Results

321. VIADAT-GIS

Creator:: Böhm, Stanislav, Hajič, Jan, Srdečný, Vojtěch, Toman, Josef, and Košarko, Ondřej
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: infrastructure and toolService
Subject:: oral history, speech, and search
Language:: Czech
Description:: A VIADAT module; VIADAT-GIS connects the platform with maps. Developed in cooperation with ÚSD AV ČR and NFA.
Rights:: BSD 3-Clause "New" or "Revised" license, http://opensource.org/licenses/BSD-3-Clause, and PUB

322. VIADAT-GIS (2019-12-31)

Creator:: Böhm, Stanislav, Hajič, Jan, Srdečný, Vojtěch, Toman, Josef, and Košarko, Ondřej
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: infrastructure and toolService
Subject:: oral history, speech, and search
Language:: Czech
Description:: A VIADAT module; VIADAT-GIS connects the platform with maps. Developed in cooperation with ÚSD AV ČR and NFA.
Rights:: BSD 3-Clause "New" or "Revised" license, http://opensource.org/licenses/BSD-3-Clause, and PUB

323. VIADAT-SEARCH

Creator:: Böhm, Stanislav and Hajič, Jan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: infrastructure and toolService
Subject:: oral history, speech, and search
Language:: Czech
Description:: VIADAT-SEARCH, in connection with VIADAT-REPO, enables searching transcripts of oral history recordings. Language analysis has been used to preprocess the recordings, which makes it possible to search the full text using multiple criteria, including names, different forms of the same word, etc. Developed in cooperation with ÚSD AV ČR and NFA.
Rights:: BSD 3-Clause "New" or "Revised" license, http://opensource.org/licenses/BSD-3-Clause, and PUB

327. Vystadial 2013 – Czech data

Creator:: Korvas, Matěj, Plátek, Ondřej, Dušek, Ondřej, Žilka, Lukáš, and Jurčíček, Filip
Publisher:: Charles University, Faculty of Mathematics and Physics
Type:: audio and corpus
Subject:: acoustic data, speech corpus, spoken corpus, orthographic transcriptions, telephone speech, voip, and dialogue system
Language:: Czech
Description:: Vystadial 2013 is a dataset of telephone conversations in English and Czech, developed for training acoustic models for automatic speech recognition in spoken dialogue systems. It ships in three parts: Czech data, English data, and scripts. The data comprise over 41 hours of speech in English and over 15 hours in Czech, plus orthographic transcriptions. The scripts implement data pre-processing and building acoustic models using the HTK and Kaldi toolkits. This is the Czech data part of the dataset. This research was funded by the Ministry of Education, Youth and Sports of the Czech Republic under grant agreement LK11221.
Rights:: Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0), http://creativecommons.org/licenses/by-sa/3.0/, and PUB

328. Vystadial 2013 – scripts

Creator:: Korvas, Matěj, Plátek, Ondřej, Dušek, Ondřej, Žilka, Lukáš, and Jurčíček, Filip
Publisher:: Charles University, Faculty of Mathematics and Physics
Type:: toolService and tool
Subject:: ASR, HTK, Kaldi, and acoustic model
Language:: English and Czech
Description:: Vystadial 2013 is a dataset of telephone conversations in English and Czech, developed for training acoustic models for automatic speech recognition in spoken dialogue systems. It ships in three parts: Czech data, English data, and scripts. The data comprise over 41 hours of speech in English and over 15 hours in Czech, plus orthographic transcriptions. The scripts implement data pre-processing and building acoustic models using the HTK and Kaldi toolkits. This is the scripts part of the dataset. This research was funded by the Ministry of Education, Youth and Sports of the Czech Republic under grant agreement LK11221.
Rights:: Apache License 2.0, http://opensource.org/licenses/Apache-2.0, and PUB

329. Vystadial 2016 – Czech data

Creator:: Plátek, Ondřej, Dušek, Ondřej, and Jurčíček, Filip
Publisher:: Charles University, Faculty of Mathematics and Physics
Type:: audio and corpus
Subject:: acoustic data, speech corpus, spoken corpus, telephone speech, voip, and dialogue system
Language:: Czech
Description:: This is the Czech data collected during the `VYSTADIAL` project. It extends the Czech part of the 'Vystadial 2013' data release. The dataset comprises telephone conversations in Czech, developed for training acoustic models for automatic speech recognition in spoken dialogue systems.
Rights:: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB

330. W2C – Web to Corpus – Corpora

Creator:: Majliš, Martin
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: multilingual corpora
Language:: Afrikaans, Tosk Albanian, Amharic, Arabic, Aragonese, Egyptian Arabic, Asturian, Azerbaijani, Belarusian, Bengali, Bosnian, Bishnupriya, Breton, Buginese, Bulgarian, Catalan, Cebuano, Czech, Chuvash, Corsican, Welsh, Danish, German, Dimli (individual language), Modern Greek (1453-), English, Esperanto, Estonian, Basque, Faroese, Persian, Finnish, French, Western Frisian, Gan Chinese, Scottish Gaelic, Irish, Galician, Gilaki, Gujarati, Haitian, Serbo-Croatian, Hebrew, Fiji Hindi, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Ido, Interlingua (International Auxiliary Language Association), Indonesian, Icelandic, Italian, Javanese, Japanese, Kannada, Georgian, Kazakh, Korean, Kurdish, Latin, Latvian, Limburgan, Lithuanian, Lombard, Luxembourgish, Malayalam, Marathi, Macedonian, Malagasy, Mongolian, Maori, Malay (macrolanguage), Burmese, Neapolitan, Low German, Nepali (macrolanguage), Newari, Dutch, Norwegian Nynorsk, Norwegian, Occitan (post 1500), Ossetian, Pampanga, Piemontese, Polish, Portuguese, Quechua, Romanian, Russian, Yakut, Sicilian, Scots, Slovak, Slovenian, Spanish, Albanian, Serbian, Sundanese, Swahili (macrolanguage), Swedish, Tamil, Tatar, Telugu, Tajik, Tagalog, Thai, Turkish, Ukrainian, Urdu, Uzbek, Venetian, Vietnamese, Volapük, Waray (Philippines), Walloon, Yiddish, Yoruba, and Chinese
Description:: A set of corpora for 120 languages, automatically collected from Wikipedia and the web. Collected using the W2C toolset: http://hdl.handle.net/11858/00-097C-0000-0022-60D6-1
Rights:: Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0), http://creativecommons.org/licenses/by-sa/3.0/, and PUB
