Search Results
42. SALDO
- Publisher:
- Språkbanken, Dept. of Swedish Language, Göteborg University
- Type:
- toolService
- Language:
- Swedish
- Description:
- SALDO (Swedish Associative Thesaurus version 2) is an extensive lexicon resource for modern Swedish written language created for the purpose of language technology research and for the development of language technology applications. SALDO may be viewed as a basic lexical resource for a Swedish BLARK. SALDO builds on Swedish Associative Thesaurus, a semantic lexicon for Swedish.
- Rights:
- Not specified
43. SLäNDa
- Creator:
- Stymne, Sara and Östman, Carin
- Publisher:
- Uppsala University
- Type:
- text and corpus
- Subject:
- literature, literary fiction, dialogue, narrative, and cited materials
- Language:
- Swedish
- Description:
- SLäNDa, the Swedish literature corpus of narrative and dialogue, is a corpus made up of eight Swedish literary novels from the late 19th and early 20th centuries, manually annotated mainly for different aspects of dialogue. The full annotation also contains other cited materials, like thoughts, signs and letters. The main motivation for including these categories as well is to be able to identify the main narrative, which is all remaining unannotated text.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
44. SLäNDa 2.0
- Creator:
- Stymne, Sara and Östman, Carin
- Publisher:
- Uppsala University
- Type:
- text and corpus
- Subject:
- literature, literary fiction, dialogue, narrative, and cited materials
- Language:
- Swedish
- Description:
- SLäNDa, the Swedish literature corpus of narrative and dialogue, is a corpus made up of eight Swedish literary novels from the 19th and early 20th centuries, manually annotated mainly for different aspects of dialogue. The full annotation also contains other cited materials, like thoughts, signs and letters. The main motivation for including these categories as well is to be able to identify the main narrative, which is all remaining unannotated text. SLäNDa version 2.0 extends version 1.0 mainly by adding more data, but also by additional quality control and a slight modification of the annotation scheme. In addition, the data is organized into test sets with different types of speech marking: quotation marks, dashes, and no marking.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
45. Slavic Forest, Norwegian Wood (scripts)
- Creator:
- Rosa, Rudolf, Zeman, Daniel, Mareček, David, and Žabokrtský, Zdeněk
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- suiteOfTools and toolService
- Subject:
- parsing, dependency parser, universal dependencies, and cross-lingual parsing
- Language:
- Czech, Slovak, Slovenian, Croatian, Danish, Swedish, and Norwegian
- Description:
- Tools and scripts used to create the cross-lingual parsing models submitted to VarDial 2017 shared task (https://bitbucket.org/hy-crossNLP/vardial2017), as described in the linked paper. The trained UDPipe models themselves are published in a separate submission (https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-1971).
  For each source (SS, e.g. sl) and target (TT, e.g. hr) language, you need to add the following into this directory:
  - treebanks (Universal Dependencies v1.4): SS-ud-train.conllu, TT-ud-predPoS-dev.conllu
  - parallel data (OpenSubtitles from Opus): OpenSubtitles2016.SS-TT.SS, OpenSubtitles2016.SS-TT.TT. Note: if they are originally called ...TT-SS... instead of ...SS-TT..., you need to symlink them (or move, or copy).
  - target tagging model: TT.tagger.udpipe
  All of these can be obtained from https://bitbucket.org/hy-crossNLP/vardial2017
  You also need to have:
  - Bash
  - Perl 5
  - Python 3
  - word2vec (https://code.google.com/archive/p/word2vec/); we used rev 41 from 15th Sep 2014
  - udpipe (https://github.com/ufal/udpipe); we used commit 3e65d69 from 3rd Jan 2017
  - Treex (https://github.com/ufal/treex); we used commit d27ee8a from 21st Dec 2016
  The most basic setup is the sl-hr one (train_sl-hr.sh):
  - normalization of deprels
  - 1:1 word-alignment of parallel data with Monolingual Greedy Aligner
  - simple word-by-word translation of source treebank
  - pre-training of target word embeddings
  - simplification of morpho feats (use only Case)
  - and finally, training and evaluating the parser
  Both da+sv-no (train_ds-no.sh) and cs-sk (train_cs-sk.sh) add some cross-tagging, which seems to be useful only in specific cases (see paper for details). Moreover, cs-sk also adds more morpho features, selecting those that seem to be very often shared in parallel data.
  The whole pipeline takes tens of hours to run, and uses several GB of RAM, so make sure to use a powerful computer.
- Rights:
- GNU General Public License 2 or later (GPL-2.0), http://opensource.org/licenses/GPL-2.0, and PUB
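The file-naming requirements in the description of entry 45 can be checked before starting the long-running pipeline. The following is a minimal sketch, not part of the published scripts: the function names (required_files, link_reversed_parallel_data, missing_files) are hypothetical, and it assumes all input files sit directly in a single working directory, as the description suggests.

```python
from pathlib import Path


def required_files(src: str, tgt: str) -> list[str]:
    """File names the description lists for a source (SS) / target (TT) pair."""
    return [
        f"{src}-ud-train.conllu",                 # source treebank
        f"{tgt}-ud-predPoS-dev.conllu",           # target dev treebank
        f"OpenSubtitles2016.{src}-{tgt}.{src}",   # parallel data, source side
        f"OpenSubtitles2016.{src}-{tgt}.{tgt}",   # parallel data, target side
        f"{tgt}.tagger.udpipe",                   # target tagging model
    ]


def link_reversed_parallel_data(directory: Path, src: str, tgt: str) -> list[str]:
    """Symlink ...TT-SS... parallel files to the expected ...SS-TT... names.

    Returns the names of the links that were created.
    """
    created = []
    for lang in (src, tgt):
        expected = directory / f"OpenSubtitles2016.{src}-{tgt}.{lang}"
        reversed_name = directory / f"OpenSubtitles2016.{tgt}-{src}.{lang}"
        if not expected.exists() and reversed_name.exists():
            # Relative link, so the directory can be moved as a whole.
            expected.symlink_to(reversed_name.name)
            created.append(expected.name)
    return created


def missing_files(directory: Path, src: str, tgt: str) -> list[str]:
    """Names from required_files() that are not present in the directory."""
    return [
        name
        for name in required_files(src, tgt)
        if not (directory / name).exists()
    ]
```

For the sl-hr setup, one would call link_reversed_parallel_data(Path("."), "sl", "hr") first and then inspect missing_files(Path("."), "sl", "hr"); an empty list means every file named in the description is in place.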
46. Språkbanken (Swedish Language Bank)
- Type:
- corpus
- Language:
- Faroese, Icelandic, Spanish, and Swedish
- Description:
- Mainly written Swedish corpora (all time periods except Runic Swedish; various genres, including learner corpora) and lexicons; some non-Swedish corpora (Faroese, Old Icelandic, Latin, Spanish); Swedish corpora (appr. 200 MW); Swedish lexicons (appr. 220,000 entries total); non-Swedish corpora (appr. 15 MW)
- Rights:
- Not specified
47. SVANTE (SVenska ANdraspråksTExter)
- Type:
- corpus
- Language:
- Swedish
- Description:
- Interlanguage/Learner corpus (essays written by SL Swedish learners with many native languages); appr. 200 kW; POS tags; base forms of words (in TEI/XCES XML format)
- Rights:
- Not specified
48. Svenska ord/Lexin
- Type:
- lexicalConceptualResource
- Language:
- Swedish
- Description:
- appr. 20,000 entries, XML
- Rights:
- Not specified
49. Swedish NE annotator
- Type:
- languageDescription
- Language:
- Swedish
- Description:
- Swedish Named Entity annotator
- Rights:
- Not specified
50. Syntag
- Type:
- corpus
- Language:
- Swedish
- Description:
- appr. 100 kW, functional/dependency (one token per line plus its POS and syntactic annotation[s])
- Rights:
- Not specified