Harvested from: LINDAT/CLARIAH-CZ repository / Language: Slovak - LINDAT/CLARIAH-CZ Catalog Search Results

Start Over Language Slovak Harvested from LINDAT/CLARIAH-CZ repository

31. Manually Ranked Translation Outputs

Creator:: Bojar, Ondřej and Galuščáková, Petra
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: machine translation, evaluation, and manual ranking
Language:: Slovak and Czech
Description:: Manually ranked outputs of Czech-Slovak translations. Three annotators manually ranked outputs of five MT systems (Česílko, Česílko2, Google Translate and two Moses setups) on three data sets (100 sentences randomly selected from books, 100 sentences randomly selected from Acquis corpus and 50 first sentences from WMT 2010 test set). Ranking was applied in MT systems comparison in [1]. References: [1] Ondřej Bojar, Petra Galuščáková, and Miroslav Týnovský. Evaluating Quality of Machine Translation from Czech to Slovak. In Markéta Lopatková, editor, Information Technologies - Applications and Theory, pages 3-9, September 2011 and This work has been supported by the grant Euro-MatrixPlus (FP7-ICT-2007-3-231720 of the EU and 7E09003 of the Czech Republic)
Rights:: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), http://creativecommons.org/licenses/by-nc-sa/3.0/, and PUB

36. Slavic Forest, Norwegian Wood (models)

Creator:: Rosa, Rudolf, Zeman, Daniel, Mareček, David, and Žabokrtský, Zdeněk
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: other and toolService
Subject:: parsing, dependency parser, cross-lingual parsing, and universal dependencies
Language:: Slovak, Croatian, and Norwegian
Description:: Trained models for UDPipe used to produce our final submission to the Vardial 2017 CLP shared task (https://bitbucket.org/hy-crossNLP/vardial2017). The SK model was trained on CS data, the HR model on SL data, and the SV model on a concatenation of DA and NO data. The scripts and commands used to create the models are part of separate submission (http://hdl.handle.net/11234/1-1970). The models were trained with UDPipe version 3e65d69 from 3rd Jan 2017, obtained from https://github.com/ufal/udpipe -- their functionality with newer or older versions of UDPipe is not guaranteed. We list here the Bash command sequences that can be used to reproduce our results submitted to VarDial 2017. The input files must be in CoNLLU format. The models only use the form, UPOS, and Universal Features fields (SK only uses the form). You must have UDPipe installed. The feats2FEAT.py script, which prunes the universal features, is bundled with this submission. SK -- tag and parse with the model: udpipe --tag --parse sk-translex.v2.norm.feats07.w2v.trainonpred.udpipe sk-ud-predPoS-test.conllu A slightly better after-deadline model (sk-translex.v2.norm.Case-feats07.w2v.trainonpred.udpipe), which we mention in the accompanying paper, is also included. It is applied in the same way (udpipe --tag --parse sk-translex.v2.norm.Case-feats07.w2v.trainonpred.udpipe sk-ud-predPoS-test.conllu). HR -- prune the Features to keep only Case and parse with the model: python3 feats2FEAT.py Case < hr-ud-predPoS-test.conllu | udpipe --parse hr-translex.v2.norm.Case.w2v.trainonpred.udpipe NO -- put the UPOS annotation aside, tag Features with the model, merge with the left-aside UPOS annotation, and parse with the model (this hassle is because UDPipe cannot be told to keep UPOS and only change Features): cut -f1-4 no-ud-predPoS-test.conllu > tmp udpipe --tag no-translex.v2.norm.tgttagupos.srctagfeats.Case.w2v.udpipe no-ud-predPoS-test.conllu | cut -f5- | paste tmp - | sed 's/^\t$//' | udpipe --parse no-translex.v2.norm.tgttagupos.srctagfeats.Case.w2v.udpipe
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

37. Slavic Forest, Norwegian Wood (scripts)

Creator:: Rosa, Rudolf, Zeman, Daniel, Mareček, David, and Žabokrtský, Zdeněk
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: suiteOfTools and toolService
Subject:: parsing, dependency parser, universal dependencies, and cross-lingual parsing
Language:: Czech, Slovak, Slovenian, Croatian, Danish, Swedish, and Norwegian
Description:: Tools and scripts used to create the cross-lingual parsing models submitted to VarDial 2017 shared task (https://bitbucket.org/hy-crossNLP/vardial2017), as described in the linked paper. The trained UDPipe models themselves are published in a separate submission (https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-1971). For each source (SS, e.g. sl) and target (TT, e.g. hr) language, you need to add the following into this directory: - treebanks (Universal Dependencies v1.4): SS-ud-train.conllu TT-ud-predPoS-dev.conllu - parallel data (OpenSubtitles from Opus): OpenSubtitles2016.SS-TT.SS OpenSubtitles2016.SS-TT.TT !!! If they are originally called ...TT-SS... instead of ...SS-TT..., you need to symlink them (or move, or copy) !!! - target tagging model TT.tagger.udpipe All of these can be obtained from https://bitbucket.org/hy-crossNLP/vardial2017 You also need to have: - Bash - Perl 5 - Python 3 - word2vec (https://code.google.com/archive/p/word2vec/); we used rev 41 from 15th Sep 2014 - udpipe (https://github.com/ufal/udpipe); we used commit 3e65d69 from 3rd Jan 2017 - Treex (https://github.com/ufal/treex); we used commit d27ee8a from 21st Dec 2016 The most basic setup is the sl-hr one (train_sl-hr.sh): - normalization of deprels - 1:1 word-alignment of parallel data with Monolingual Greedy Aligner - simple word-by-word translation of source treebank - pre-training of target word embeddings - simplification of morpho feats (use only Case) - and finally, training and evaluating the parser Both da+sv-no (train_ds-no.sh) and cs-sk (train_cs-sk.sh) add some cross-tagging, which seems to be useful only in specific cases (see paper for details). Moreover, cs-sk also adds more morpho features, selecting those that seem to be very often shared in parallel data. The whole pipeline takes tens of hours to run, and uses several GB of RAM, so make sure to use a powerful computer.
Rights:: GNU General Public License 2 or later (GPL-2.0), http://opensource.org/licenses/GPL-2.0, and PUB

38. Slovak Dependency Treebank

Creator:: Gajdošová, Katarína, Šimková, Mária, and et al.
Publisher:: Jazykovedný ústav Ľ. Štúra Slovenskej akadémie vied
Type:: text and corpus
Subject:: dependency, treebank, syntax, and morphology
Language:: Slovak
Description:: Slovak Dependency Treebank (Slovenský závislostný korpus) was created as part of the Slovak National Corpus at the Ľ. Štúr Institute of the Slovak Academy of Sciences. The annotation follows the guidelines of the Prague Dependency Treebank (Czech), slightly modified in the spirit of Slovak grammatical tradition. Morphological tags, lemmas and dependency relations have been assigned manually to every word. The present dataset is a subset of the original treebank. We automatically selected the sentences where the two human annotators 100% agreed on the analysis. This increases the quality and trustworthiness of the data but it also results in selecting short sentences most of the time. An extended version may be published in the future when manually merged and checked annotation is available. The selected sentences have been converted to the CoNLL-X file format (original token IDs are preserved in the FEATS column). This PDT-style annotation will serve as the source for the first Slovak dataset in the Universal Dependencies (to be published separately).
Rights:: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB

39. Slovak MorphoDiTa Models 170914

Creator:: Straka, Milan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, mlmodel, and languageDescription
Subject:: MorphoDiTa, Slovak, morphological analysis, morphological generation, and PoS tagging
Language:: Slovak
Description:: Slovak models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from MorfFlex SK 170914 and the PoS tagger is trained on automatically translated Prague Dependency Treebank 3.0 (PDT).
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

40. SpeechDat-East databases

Type:: corpus
Subject:: These databases serve as an important resource for the performance of voice driven teleservice systems in practical implementations
Language:: Czech, Hungarian, Polish, Russian, and Slovak
Description:: 5 telephone databases recorded over the PSTN. Contains interesting phonetically rich material. All orthographically transcribed. Speaker information included for gender, age, accent. Including pronunciation lexicon.
Rights:: Not specified

« Previous
Next »
1
2
3
4
5
6
7

31. Manually Ranked Translation Outputs

32. MorfFlex SK 170914

33. OmegaWiki

34. Plaintext Wikipedia dump 2018

35. skTenTen

36. Slavic Forest, Norwegian Wood (models)

37. Slavic Forest, Norwegian Wood (scripts)

38. Slovak Dependency Treebank

39. Slovak MorphoDiTa Models 170914

40. SpeechDat-East databases

Limit your search

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Search

Search Constraints

Search Results

Limit your search

Contributor

Show values starting with

Coverage

Creator

Show values starting with

Format

Language

Show values starting with

Publisher

Rights

Show values starting with

Subject

Show values starting with

Type

Show values starting with

Date

Original context has metadata only

Harvested from