Creator: Švec, Jan - LINDAT/CLARIAH-CZ Catalog Search Results

Creator:: Galuščáková, Petra, Pecina, Pavel, Hoffmannová, Petra, Hajič, Jan, Ircing, Pavel, and Švec, Jan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: audio and corpus
Subject:: annotated corpus, corpus, speech corpus, annotation, audio, and multilingual
Language:: Czech, English, French, German, and Spanish
Description:: The package contains the Czech recordings of the Visual History Archive, which consists of interviews with Holocaust survivors. The archive comprises audio recordings, four types of automatic transcripts, manual annotations of selected topics, and interview metadata. In total, it contains 353 recordings and 592 hours of interviews.
Rights:: Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB

Creator:: Lehečka, Jan and Švec, Jan
Publisher:: University of West Bohemia, Department of Cybernetics
Type:: text, mlmodel, and languageDescription
Subject:: Czech and BERT
Language:: Czech
Description:: FERNET-C5 is a monolingual BERT language representation model trained from scratch on the Czech Colossal Clean Crawled Corpus (C5), a Czech counterpart of the English C4 dataset. The training data contained almost 13 billion words (93 GB of text). The model has the same architecture as the original BERT model, i.e. 12 Transformer blocks, 12 attention heads, and a hidden size of 768 neurons. In contrast to Google's BERT models, we used SentencePiece tokenization instead of Google's internal WordPiece tokenization. More details can be found in README.txt. A more detailed description is available at https://arxiv.org/abs/2107.10042. The same models are also released at https://huggingface.co/fav-kky/FERNET-C5
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
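
Example:: The FERNET-C5 record above quotes the model's architecture and points to a Hugging Face release. The sketch below restates those figures in Python and derives the per-head dimension from them. `BertShape` and `load_fernet` are hypothetical names introduced here for illustration; they are not part of the release. The loader assumes the `transformers` package (with a backend such as PyTorch and any tokenizer dependencies) is installed and that the Hub is reachable, so it is defined but not called.

```python
from dataclasses import dataclass

# Model identifier taken from the Hugging Face URL in the record above.
MODEL_ID = "fav-kky/FERNET-C5"


@dataclass(frozen=True)
class BertShape:
    """Architecture figures quoted in the FERNET-C5 description (hypothetical helper)."""
    num_layers: int = 12    # Transformer blocks
    num_heads: int = 12     # attention heads per block
    hidden_size: int = 768  # width of each token representation

    @property
    def head_dim(self) -> int:
        # Each attention head works on an equal slice of the hidden vector.
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden size must be divisible by the head count")
        return self.hidden_size // self.num_heads


def load_fernet(model_id: str = MODEL_ID):
    """Fetch the tokenizer and encoder from the Hub (assumption: transformers installed)."""
    # Imported lazily so the rest of this sketch runs without the dependency.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    return tokenizer, model


shape = BertShape()
print(shape.head_dim)  # 768 // 12 = 64
```

Calling `load_fernet()` downloads the released weights. Note that the CC BY-NC-SA 4.0 terms listed in the record apply to any use of the model.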
