Type: toolService - LINDAT/CLARIAH-CZ Catalog Search Results

211. Slavic Forest, Norwegian Wood (scripts)

Creator:: Rosa, Rudolf, Zeman, Daniel, Mareček, David, and Žabokrtský, Zdeněk
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: suiteOfTools and toolService
Subject:: parsing, dependency parser, universal dependencies, and cross-lingual parsing
Language:: Czech, Slovak, Slovenian, Croatian, Danish, Swedish, and Norwegian
Description:: Tools and scripts used to create the cross-lingual parsing models submitted to VarDial 2017 shared task (https://bitbucket.org/hy-crossNLP/vardial2017), as described in the linked paper. The trained UDPipe models themselves are published in a separate submission (https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-1971).
For each source (SS, e.g. sl) and target (TT, e.g. hr) language, you need to add the following into this directory:
- treebanks (Universal Dependencies v1.4): SS-ud-train.conllu TT-ud-predPoS-dev.conllu
- parallel data (OpenSubtitles from Opus): OpenSubtitles2016.SS-TT.SS OpenSubtitles2016.SS-TT.TT !!! If they are originally called ...TT-SS... instead of ...SS-TT..., you need to symlink them (or move, or copy) !!!
- target tagging model TT.tagger.udpipe
All of these can be obtained from https://bitbucket.org/hy-crossNLP/vardial2017
You also need to have:
- Bash
- Perl 5
- Python 3
- word2vec (https://code.google.com/archive/p/word2vec/); we used rev 41 from 15th Sep 2014
- udpipe (https://github.com/ufal/udpipe); we used commit 3e65d69 from 3rd Jan 2017
- Treex (https://github.com/ufal/treex); we used commit d27ee8a from 21st Dec 2016
The most basic setup is the sl-hr one (train_sl-hr.sh):
- normalization of deprels
- 1:1 word-alignment of parallel data with Monolingual Greedy Aligner
- simple word-by-word translation of source treebank
- pre-training of target word embeddings
- simplification of morpho feats (use only Case)
- and finally, training and evaluating the parser
Both da+sv-no (train_ds-no.sh) and cs-sk (train_cs-sk.sh) add some cross-tagging, which seems to be useful only in specific cases (see paper for details). Moreover, cs-sk also adds more morpho features, selecting those that seem to be very often shared in parallel data.
The whole pipeline takes tens of hours to run, and uses several GB of RAM, so make sure to use a powerful computer.
Rights:: GNU General Public License 2 or later (GPL-2.0), http://opensource.org/licenses/GPL-2.0, and PUB
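
The input requirements in the description above can be checked before starting a long run. The following is only a minimal sketch, not part of the published scripts: the function names `required_files` and `prepare_inputs` are hypothetical, and the file names simply follow the SS/TT patterns quoted in the description. It symlinks parallel-data files that use the reversed TT-SS naming and reports which inputs are still missing.

```python
import os
from pathlib import Path


def required_files(ss, tt):
    """File names the description lists for a source (ss) / target (tt) pair."""
    return [
        f"{ss}-ud-train.conllu",              # source treebank (UD v1.4)
        f"{tt}-ud-predPoS-dev.conllu",        # target dev treebank
        f"OpenSubtitles2016.{ss}-{tt}.{ss}",  # parallel data, source side
        f"OpenSubtitles2016.{ss}-{tt}.{tt}",  # parallel data, target side
        f"{tt}.tagger.udpipe",                # target tagging model
    ]


def prepare_inputs(directory, ss, tt):
    """Symlink TT-SS parallel files to SS-TT names; return names still missing.

    Hypothetical helper for illustration only.
    """
    directory = Path(directory)
    for lang in (ss, tt):
        wanted = directory / f"OpenSubtitles2016.{ss}-{tt}.{lang}"
        swapped = directory / f"OpenSubtitles2016.{tt}-{ss}.{lang}"
        if not wanted.exists() and swapped.exists():
            # Relative link target, so the directory can be moved as a whole.
            os.symlink(swapped.name, wanted)
    return [name for name in required_files(ss, tt)
            if not (directory / name).exists()]
```

Calling `prepare_inputs(".", "sl", "hr")` in the scripts directory would return an empty list once all inputs for the sl-hr setup are in place.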

212. SMOR - German morphology

Publisher:: University of Stuttgart
Type:: toolService
Language:: German
Description:: SMOR is a wide-coverage German computational morphology covering inflection, derivation, and compounding. The SMOR code, except for the stem lexicon, is available under the GNU license. SMOR (without a stem lexicon) comes with the SFST tools.
Rights:: Not specified

213. SOLC

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Language:: Catalan
Description:: An orthologic server for Catalan. A query system for the orthologic dictionary which allows making searches using dialectal and pragmatic variables.
Rights:: Not specified

214. Speech Processing, Recognition and Automatic Annotation Kit (SPRAAK)

Type:: toolService
Subject:: speech recognition
Description:: SPRAAK (also Dutch for 'speech') is a speech recognition package. As such it is useful for transcription of speech, alignment of spoken and written language, annotation of corpora, etc. It is an efficient and flexible tool that combines many of the recent advancements in automatic speech recognition with a very efficient decoder in a proven HMM architecture. SPRAAK can be adapted for all languages, except tonal ones.
Rights:: Not specified

215. SpeechRecorder

Type:: toolService
Description:: SpeechRecorder is platform-independent, multi-channel audio recording software. Its main features are a configurable recording script, Unicode text, image and audio prompts, hardware independence, and localized language interfaces.
Rights:: Not specified

216. Spejd

Publisher:: Institute of Computer Science, Polish Academy of Sciences
Type:: toolService
Description:: Tool for partial parsing and rule-based morphosyntactic disambiguation
Rights:: Not specified

217. Stuttgart Finite State Transducer Tools

Publisher:: University of Stuttgart
Type:: toolService
Description:: SFST is a finite state transducer toolkit for the implementation of morphologies and other applications of finite state transducers. SFST comprises a compiler and several tools for transforming, printing and applying transducers.
Rights:: Not specified

218. STYX

Creator:: Kučera, Ondřej
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: toolService
Subject:: education, morphology, and syntax
Language:: Czech
Description:: The STYX system is an electronic exercise book for practising Czech morphology and syntax, consisting of more than 11,000 sentences.
Rights:: GNU General Public Licence, version 3, http://opensource.org/licenses/GPL-3.0, and PUB

219. SVMTool

Publisher:: Centro de Tecnologías y Aplicaciones del Lenguaje y del Habla (TALP)
Type:: toolService
Language:: Catalan, English, and Spanish
Description:: Generator of sequential taggers based on Support Vector Machines.
Rights:: Not specified

220. Świgra

Publisher:: Institute of Computer Science, Polish Academy of Sciences
Type:: toolService
Language:: Polish
Description:: Implementation of Świdziński's formal grammar of Polish. Requires a parser (Birnam parser available as a separate tool) and a morphological analyser (no free analyser for Polish; Morfeusz can be used with restrictions - in this case the whole set is available for academic and non-commercial use only).
Rights:: Not specified

