Search Results
812. SemTi-Kamols morphological analyser
- Type:
- toolService
- Subject:
- morphological analyzer
- Language:
- Latvian
- Description:
- A Java library for morphological analysis of Latvian. The lexicon covers ~50 000 lemmas. A set of robust derivation rules is also used.
- Rights:
- Not specified
813. SenTube
- Publisher:
- Machine Learning and NLP group at Trento
- Type:
- corpus
- Subject:
- sentiment analysis
- Language:
- English and Italian
- Description:
- Sentiment analysis of YouTube videos with joint models of text and speech
- Rights:
- Not specified
814. Shallow syntactically disambiguated corpus
- Type:
- corpus
- Language:
- Estonian
- Description:
- written general; 300 000 words; local tagset (POS, syntactic functions)
- Rights:
- Not specified
815. SIL FieldWorks
- Publisher:
- Summer Institute of Linguistics (SIL), Inc
- Type:
- toolService
- Subject:
- corpus management
- Description:
- FieldWorks consists of software tools that help you manage linguistic and cultural data. FieldWorks supports tasks ranging from the initial entry of collected data through to the preparation of data for publication: dictionary development; interlinearization of texts; cultural records, which can be categorized using the Outline of Cultural Materials; bulk editing of many fields; morphological analysis; complex non-Roman scripts using Unicode and SIL-developed Graphite; and multi-user editing capability over a local area network.
- Rights:
- Not specified
816. SiR 1.0
- Creator:
- Hladká, Barbora; Mírovský, Jiří; Kopp, Matyáš; and Moravec, Václav
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- news server articles, attribution, attribution signals, attribution sources, and annotation
- Language:
- Czech
- Description:
- SiR 1.0 is a corpus of Czech articles published on iRozhlas, a news server of a Czech public radio (https://www.irozhlas.cz/). It is a collection of 1 718 articles (42 890 sentences, 614 995 words) with manually annotated attribution of citation phrases and sources. The sources are classified into several classes of named and unnamed sources. The corpus consists of three parts, depending on the quality of the annotations: (i) triple-annotated articles: 46 articles (933 sentences, 13 242 words) annotated independently by three annotators and subsequently curated by an arbiter, (ii) double-annotated articles: 543 articles (12 347 sentences, 180 622 words) annotated independently by two annotators and automatically unified, and (iii) single-annotated articles: 1 129 articles (29 610 sentences, 421 131 words) annotated each only by a single annotator. The data were annotated in the Brat tool (https://brat.nlplab.org/) and are distributed in the Brat native format, i.e. each article is represented by the original plain text and a stand-off annotation file. Please cite the following paper when using the corpus for your research: Hladká Barbora, Jiří Mírovský, Matyáš Kopp, Václav Moravec. Annotating Attribution in Czech News Server Articles. In: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), pages 1817–1823, Marseille, France 20-25 June 2022.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
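The SiR 1.0 description says each article is distributed as plain text plus a Brat stand-off annotation file. As a minimal sketch of how such a pair might be read, the Python below parses only the text-bound ("T") lines of a stand-off file and checks each span against the plain text. The labels `Source` and `Signal` and the sample sentence are hypothetical placeholders, not taken from the corpus; the actual SiR label set should be looked up in the distributed data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextBound:
    ann_id: str   # e.g. "T1"
    label: str    # annotation type (corpus-specific)
    start: int    # character offset, inclusive
    end: int      # character offset, exclusive
    text: str     # covered text as stored in the .ann file


def parse_text_bound(ann_content: str) -> List[TextBound]:
    """Parse the text-bound ("T") lines of a Brat stand-off file.

    Other line types (relations, attributes, notes) and discontinuous
    spans are skipped in this sketch.
    """
    spans = []
    for line in ann_content.splitlines():
        if not line.startswith("T"):
            continue
        ann_id, type_and_offsets, text = line.split("\t", 2)
        if ";" in type_and_offsets:
            continue  # discontinuous span, not handled here
        label, start, end = type_and_offsets.split(" ")
        spans.append(TextBound(ann_id, label, int(start), int(end), text))
    return spans


if __name__ == "__main__":
    # Hypothetical article text and annotation, for illustration only.
    plain = "The minister said the budget is ready."
    ann = "T1\tSource 0 12\tThe minister\nT2\tSignal 13 17\tsaid\n"
    for span in parse_text_bound(ann):
        # The offsets should reproduce the stored covered text.
        assert plain[span.start:span.end] == span.text
        print(span.ann_id, span.label, repr(span.text))
```

Checking the offsets against the plain text is a cheap way to catch encoding or newline mismatches between the two files before any further processing.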
817. skTenTen
- Creator:
- Unknown author
- Publisher:
- Masaryk University, NLP Centre
- Type:
- text and corpus
- Subject:
- Slovak large corpus
- Language:
- Slovak
- Description:
- Slovak large web corpus skTenTen, comprising 876,003,720 tokens. The record also lists Lexical Computing Ltd.
- Rights:
- Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0), http://creativecommons.org/licenses/by-nc-nd/3.0/, and PUB
818. SLäNDa
- Creator:
- Stymne, Sara and Östman, Carin
- Publisher:
- Uppsala University
- Type:
- text and corpus
- Subject:
- literature, literary fiction, dialogue, narrative, and cited materials
- Language:
- Swedish
- Description:
- SLäNDa, the Swedish literature corpus of narrative and dialogue, is a corpus made up of eight Swedish literary novels from the late 19th and early 20th centuries, manually annotated mainly for different aspects of dialogue. The full annotation also contains other cited materials, such as thoughts, signs and letters. The main motivation for including these categories as well is to be able to identify the main narrative, which is all remaining unannotated text.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
819. SLäNDa 2.0
- Creator:
- Stymne, Sara and Östman, Carin
- Publisher:
- Uppsala University
- Type:
- text and corpus
- Subject:
- literature, literary fiction, dialogue, narrative, and cited materials
- Language:
- Swedish
- Description:
- SLäNDa, the Swedish literature corpus of narrative and dialogue, is a corpus made up of eight Swedish literary novels from the 19th and early 20th centuries, manually annotated mainly for different aspects of dialogue. The full annotation also contains other cited materials, such as thoughts, signs and letters. The main motivation for including these categories as well is to be able to identify the main narrative, which is all remaining unannotated text. SLäNDa version 2.0 extends version 1.0 mainly by adding more data, but also through additional quality control and a slight modification of the annotation scheme. In addition, the data is organized into test sets with different types of speech marking: quotation marks, dashes, and no marking.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
820. Slavic Forest, Norwegian Wood (models)
- Creator:
- Rosa, Rudolf; Zeman, Daniel; Mareček, David; and Žabokrtský, Zdeněk
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- other and toolService
- Subject:
- parsing, dependency parser, cross-lingual parsing, and universal dependencies
- Language:
- Slovak, Croatian, and Norwegian
- Description:
- Trained models for UDPipe used to produce our final submission to the VarDial 2017 CLP shared task (https://bitbucket.org/hy-crossNLP/vardial2017). The SK model was trained on CS data, the HR model on SL data, and the NO model on a concatenation of DA and SV data. The scripts and commands used to create the models are part of a separate submission (http://hdl.handle.net/11234/1-1970). The models were trained with UDPipe version 3e65d69 from 3rd Jan 2017, obtained from https://github.com/ufal/udpipe; their functionality with newer or older versions of UDPipe is not guaranteed.
  We list here the Bash command sequences that can be used to reproduce our results submitted to VarDial 2017. The input files must be in CoNLL-U format. The models only use the form, UPOS, and Universal Features fields (SK only uses the form). You must have UDPipe installed. The feats2FEAT.py script, which prunes the universal features, is bundled with this submission.
  SK: tag and parse with the model:
      udpipe --tag --parse sk-translex.v2.norm.feats07.w2v.trainonpred.udpipe sk-ud-predPoS-test.conllu
  A slightly better after-deadline model (sk-translex.v2.norm.Case-feats07.w2v.trainonpred.udpipe), which we mention in the accompanying paper, is also included. It is applied in the same way:
      udpipe --tag --parse sk-translex.v2.norm.Case-feats07.w2v.trainonpred.udpipe sk-ud-predPoS-test.conllu
  HR: prune the Features to keep only Case and parse with the model:
      python3 feats2FEAT.py Case < hr-ud-predPoS-test.conllu | udpipe --parse hr-translex.v2.norm.Case.w2v.trainonpred.udpipe
  NO: put the UPOS annotation aside, tag Features with the model, merge with the set-aside UPOS annotation, and parse with the model (this workaround is needed because UDPipe cannot be told to keep UPOS and only change Features):
      cut -f1-4 no-ud-predPoS-test.conllu > tmp
      udpipe --tag no-translex.v2.norm.tgttagupos.srctagfeats.Case.w2v.udpipe no-ud-predPoS-test.conllu | cut -f5- | paste tmp - | sed 's/^\t$//' | udpipe --parse no-translex.v2.norm.tgttagupos.srctagfeats.Case.w2v.udpipe
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
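The description above mentions pruning the Universal Features column so that only selected features (for example Case) remain before parsing. The bundled feats2FEAT.py is not shown in this record, so the following is only an illustrative Python sketch of what such pruning could look like on CoNLL-U input; the function name `prune_feats` and its behaviour are assumptions, not the authors' script.

```python
from typing import Iterable


def prune_feats(conllu: str, keep: Iterable[str]) -> str:
    """Keep only the named features in the FEATS column of CoNLL-U text.

    Hypothetical helper for illustration; not the bundled feats2FEAT.py.
    Comment lines and blank lines are passed through unchanged.
    """
    keep_set = set(keep)
    out = []
    for line in conllu.split("\n"):
        if not line or line.startswith("#"):
            out.append(line)
            continue
        cols = line.split("\t")
        # CoNLL-U token lines have 10 tab-separated columns; FEATS is the 6th.
        if len(cols) == 10 and cols[5] != "_":
            kept = [f for f in cols[5].split("|")
                    if f.split("=", 1)[0] in keep_set]
            cols[5] = "|".join(kept) if kept else "_"
        out.append("\t".join(cols))
    return "\n".join(out)


if __name__ == "__main__":
    sample = (
        "# text = psa\n"
        "1\tpsa\tpes\tNOUN\t_\tAnimacy=Anim|Case=Acc|Number=Sing\t0\troot\t_\t_\n"
    )
    print(prune_feats(sample, ["Case"]))
```

A token whose features are all pruned away gets `_` in the FEATS column, which is the CoNLL-U convention for an empty field.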