Language: Czech / Rights: PUB - LINDAT/CLARIAH-CZ Catalog Search Results

441. sqad 3.0

Creator:: Medveď, Marek and Horák, Aleš
Publisher:: Masaryk University, NLP Centre
Type:: text and corpus
Subject:: Simple Question Answering Database, Czech, and question answering
Language:: Czech
Description:: Simple question answering database version 3 (SQAD v3) created from Czech Wikipedia. The new version consists of 13477 records. Each record of SQAD consists of multiple files - question, answer extraction, answer selection, URL, question metadata, and in some cases, answer context.
Rights:: GNU Library or "Lesser" General Public License 3.0 (LGPL-3.0), http://opensource.org/licenses/LGPL-3.0, and PUB

442. SQAD 3.2

Creator:: Medveď, Marek
Publisher:: Masaryk University, NLP Centre
Type:: text and corpus
Subject:: QA, Question Answering, SQAD, and Czech QA
Language:: Czech
Description:: Simple question answering database version 3.2 (SQAD v3.2) created from Czech Wikipedia. The new version consists of more than 16000 records. Each record of SQAD consists of multiple files - question, answer extraction, answer selection, URL, question metadata, and in some cases, answer context.
Rights:: Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0), http://creativecommons.org/licenses/by-sa/3.0/, and PUB

443. SQAD v2

Creator:: Medveď, Marek, Horák, Aleš, and Šulganová, Terézia
Publisher:: Natural Language Processing Centre, Faculty of Informatics, Masaryk University
Type:: text and corpus
Subject:: question answering, Czech, and Simple Question Answering Database
Language:: Czech
Description:: Simple question answering database (SQAD) created from Czech Wikipedia. Each record of SQAD consists of four files (in vertical format, with lemmatization and POS tagging) and two metadata files.
Rights:: GNU General Public Licence, version 3, http://opensource.org/licenses/GPL-3.0, and PUB

444. State Defence Jubilee Fund

Creator:: Aktualita
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: Czechoslovak military exercises, State Defence Loan campaign, Czechoslovak Army, State Defence Loan collection, bombing, military aircraft, bombed city, campaigning in support of the army, Czechoslovak soldiers, State Defence Fund, České slovo newspaper, Prager Presse newspaper, Národní politika newspaper, Venkov newspaper, Pražské noviny newspaper, Munich Agreement, and Československý zvukový týdeník Aktualita::1938/26
Language:: Czech
Description:: The segment of Československý zvukový týdeník Aktualita (Czechoslovak Aktualita Sound Newsreel), 1938, issue no. 26 promotes the State Defence Jubilee Fund intended for modernizing the Czechoslovak Army, and asks for contributions to be made to account no. 400 at the Postal Savings Bank.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

445. STAZKA – Speech recordings from vehicles

Creator:: Šmídl, Luboš, Stanislav, Petr, and Radová, Vlasta
Publisher:: University of West Bohemia, Department of Cybernetics
Type:: audio and corpus
Subject:: speech corpus, noisy speech, voice activity detector, and speech recognition
Language:: Czech
Description:: The database contains two sets of recordings, both recorded in moving or stationary vehicles (passenger cars or trucks). All data were recorded within the project “Intelligent Electronic Record of the Operation and Vehicle Performance”, whose aim is to develop voice-operated software for registering vehicle operation data. The first part (full_noises.zip) consists of relatively long recordings from the vehicle cabin, containing spontaneous speech from the vehicle crew. The recordings are accompanied by detailed transcripts in the Transcriber XML-based format (.trs). Due to the recording settings, the audio contains many different noises, only sparsely interspersed with speech. As such, the set is suitable for robust estimation of voice activity detector parameters. The second set (prompts.zip) consists of short prompts that were recorded in a controlled setting – the speakers either answered simple questions or repeated commands and short phrases. The prompts were recorded by 26 different speakers. Each speaker recorded at least two sessions (with an identical set of prompts) – first in a stationary vehicle with a low level of noise (those recordings are marked by –A_ in the file name) and second while actually driving the car (marked by –B_ or, since several speakers recorded 3 sessions, by –C_). The recordings from this set are suitable mostly for training a robust domain-specific speech recognizer and also for ASR testing.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

446. Students Harvesting Potatoes

Creator:: Aktualita
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: potato work brigade, working students, sewing up sacks, food serving, food distribution, wagons of potatoes, potatoes, Protectorate agriculture, freight wagons, Kuratorium, and Český zvukový týdeník Aktualita::1943/46
Language:: Czech
Description:: Segment from Český zvukový týdeník Aktualita (Czech Aktualita Sound Newsreel) issue no. 46B from 1943 presents footage of the voluntary work and help with harvesting organised by the Board of Trustees for the Education of Youth as part of mandatory service. Older teenagers worked at railway stations, unloading potatoes.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

447. STYX

Creator:: Kučera, Ondřej
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: toolService
Subject:: education, morphology, and syntax
Language:: Czech
Description:: The STYX system is an electronic exercise book for practising Czech morphology and syntax, consisting of more than 11,000 sentences.
Rights:: GNU General Public Licence, version 3, http://opensource.org/licenses/GPL-3.0, and PUB

448. STYX 1.0

Creator:: Hladká, Barbora, Kučera, Ondřej, and Kuchyňová, Karolína
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: annotated corpus, syntax, and sentence diagramming
Language:: Czech
Description:: STYX 1.0 is a corpus of Czech sentences selected from the Prague Dependency Treebank (PDT). The criterion for including sentences in STYX was their suitability for practicing Czech morphology and syntax in elementary schools. The sentences contain both the PDT annotations and the school sentence analyses. The school sentence analyses were created by transforming the PDT annotations using handcrafted rules. Altogether, the STYX 1.0 corpus contains 11 655 sentences. Originally, the STYX 1.0 corpus was an inseparable part of the Styx system (http://hdl.handle.net/11858/00-097C-0000-0001-48FB-F).
Rights:: Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB

449. STYX 1.0 (2017-10-03)

Creator:: Hladká, Barbora, Kučera, Ondřej, and Kuchyňová, Karolína
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: annotated corpus, syntax, and sentence diagramming
Language:: Czech
Description:: STYX 1.0 is a corpus of Czech sentences selected from the Prague Dependency Treebank (PDT). The criterion for including sentences in STYX was their suitability for practicing Czech morphology and syntax in elementary schools. The sentences contain both the PDT annotations and the school sentence analyses. The school sentence analyses were created by transforming the PDT annotations using handcrafted rules. Altogether, the STYX 1.0 corpus contains 11 655 sentences. Originally, the STYX 1.0 corpus was an inseparable part of the Styx system (http://hdl.handle.net/11858/00-097C-0000-0001-48FB-F).
Rights:: Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB

450. SumeCzech

Creator:: Straka, Milan, Mediankin, Nikita, Kocmi, Tom, Žabokrtský, Zdeněk, Hudeček, Vojtěch, and Hajič, Jan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: summarization, SumeCzech, and Rouge
Language:: Czech
Description:: This entry contains the SumeCzech dataset and the metric RougeRAW used for evaluation. Both the dataset and the metric are described in the paper "SumeCzech: Large Czech News-Based Summarization Dataset" by Milan Straka et al. The dataset is distributed as a set of Python scripts which download the raw HTML pages from CommonCrawl and then process them into the required format. The MPL 2.0 license applies to the scripts downloading the dataset and to the RougeRAW implementation. Note: sumeczech-1.0-update-230225.zip is the updated release of the SumeCzech download script, including the original RougeRAW evaluation metric. The download script was modified to use the updated CommonCrawl download URL and to support Python 3.10 and Python 3.11. However, the downloaded dataset is still exactly the same. The original archive sumeczech-1.0.zip was renamed to sumeczech-1.0-obsolete-180213.zip and is kept for reference.
Rights:: Mozilla Public License 2.0, http://opensource.org/licenses/MPL-2.0, and PUB
