Search Results
852. Svenska ord/Lexin
- Type:
- lexicalConceptualResource
- Language:
- Swedish
- Description:
- approx. 20,000 entries, XML
- Rights:
- Not specified
853. SVMTool
- Publisher:
- Centro de Tecnologías y Aplicaciones del Lenguaje y del Habla (TALP)
- Type:
- toolService
- Language:
- Catalan, English, and Spanish
- Description:
- Generator of sequential taggers based on Support Vector Machines.
- Rights:
- Not specified
854. Swedish NE annotator
- Type:
- languageDescription
- Language:
- Swedish
- Description:
- Swedish Named Entity annotator
- Rights:
- Not specified
855. Świgra
- Publisher:
- Institute of Computer Science, Polish Academy of Sciences
- Type:
- toolService
- Language:
- Polish
- Description:
- Implementation of Świdziński's formal grammar of Polish. Requires a parser (Birnam parser available as a separate tool) and a morphological analyser (no free analyser for Polish; Morfeusz can be used with restrictions - in this case the whole set is available for academic and non-commercial use only).
- Rights:
- Not specified
856. SYN v4: large corpus of written Czech
- Creator:
- Křen, Michal, Cvrček, Václav, Čapka, Tomáš, Čermáková, Anna, Hnátková, Milena, Chlumská, Lucie, Jelínek, Tomáš, Kováříková, Dominika, Petkevič, Vladimír, Procházka, Pavel, Skoumalová, Hana, Škrabal, Michal, Truneček, Petr, Vondřička, Pavel, and Zasina, Adrian
- Publisher:
- Charles University, Faculty of Arts, Institute of the Czech National Corpus
- Type:
- text and corpus
- Subject:
- corpus and written language
- Language:
- Czech
- Description:
- Corpus of contemporary written (printed) Czech sized 3.6 GW (i.e. 4.3 billion tokens). It covers mostly the period of 1990–2014 and it is a traditional corpus (as opposed to the web-crawled corpora) with rich metadata containing bibliographical information etc. Although it contains a wide range of text types (fiction, non-fiction, newspapers), the newspapers prevail noticeably. The corpus is lemmatized and morphologically annotated by a combination of stochastic and rule-based methods. The corpus is provided in a (semi-XML) vertical format used as an input to the Manatee query engine. The data thus correspond to the corpus available via the KonText query interface to registered users of the CNC at http://www.korpus.cz with one important exception: the corpus is shuffled, i.e. divided into blocks sized max. 100 words (respecting the sentence boundaries) with ordering randomized within the given document.
- Rights:
- Czech National Corpus (Shuffled Corpus Data), https://lindat.mff.cuni.cz/repository/xmlui/page/license-cnc, and ACA
857. SYN v9: large corpus of written Czech
- Creator:
- Křen, Michal, Cvrček, Václav, Henyš, Jan, Hnátková, Milena, Jelínek, Tomáš, Kocek, Jan, Kováříková, Dominika, Křivan, Jan, Milička, Jiří, Petkevič, Vladimír, Procházka, Pavel, Skoumalová, Hana, Šindlerová, Jana, and Škrabal, Michal
- Publisher:
- Charles University, Faculty of Arts, Institute of the Czech National Corpus
- Type:
- text and corpus
- Subject:
- corpus and written language
- Language:
- Czech
- Description:
- Corpus of contemporary written (printed) Czech sized 4.7 GW (i.e. 5.7 billion tokens). It covers mostly the 1990-2019 period and features rich metadata including detailed bibliographical information, text-type classification etc. SYN v9 contains a wide variety of text types (fiction, non-fiction, newspapers), but the newspapers prevail noticeably. The corpus is lemmatized and morphologically tagged by the new CNC tagset first utilized for the annotation of the SYN2020 corpus. SYN v9 is provided in a CoNLL-U-like vertical format used as an input to the Manatee query engine. The data thus correspond to the corpus available via the KonText query interface to the registered users of CNC at http://www.korpus.cz with one important exception: the corpus is shuffled, i.e. divided into blocks sized max. 100 words (respecting the sentence boundaries) with ordering randomized within the given document.
- Rights:
- Czech National Corpus (Shuffled Corpus Data), https://lindat.mff.cuni.cz/repository/xmlui/page/license-cnc, and ACA
858. SYN2015: representative corpus of written Czech
- Creator:
- Křen, Michal, Cvrček, Václav, Čapka, Tomáš, Čermáková, Anna, Hnátková, Milena, Chlumská, Lucie, Kováříková, Dominika, Jelínek, Tomáš, Petkevič, Vladimír, Procházka, Pavel, Skoumalová, Hana, Škrabal, Michal, Truneček, Petr, Vondřička, Pavel, and Zasina, Adrian
- Publisher:
- Faculty of Arts, Institute of the Czech National Corpus, Charles University in Prague
- Type:
- text and corpus
- Subject:
- representative corpus and written language
- Language:
- Czech
- Description:
- Representative corpus of contemporary written Czech sized 100 MW. It was created as a representation of printed language from 2010–2014 containing a wide range of text types (fiction, professional literature, newspapers etc.). The corpus is lemmatized, morphologically and syntactically annotated by a combination of stochastic and rule-based methods. The corpus is provided in a (semi-XML) vertical format used as an input to the Manatee query engine. The data thus correspond to the corpus available via the KonText query interface to registered users of the CNC with one important exception: they are shuffled, i.e. divided into blocks sized max. 100 words (respecting the sentence boundaries) with ordering randomized within the given document.
- Rights:
- Czech National Corpus (Shuffled Corpus Data), https://lindat.mff.cuni.cz/repository/xmlui/page/license-cnc, and ACA
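The shuffling described for the SYN corpora above (blocks of at most 100 words that respect sentence boundaries, with block order randomized within each document) can be illustrated with a minimal Python sketch. This is not the CNC's actual procedure: the function names, the greedy grouping strategy, and the handling of sentences longer than the limit (each kept whole as its own block) are assumptions made for illustration only.

```python
import random
from typing import List, Optional

Sentence = List[str]    # one sentence = list of word tokens
Block = List[Sentence]  # one block = consecutive sentences


def make_blocks(sentences: List[Sentence], max_words: int = 100) -> List[Block]:
    """Greedily group consecutive sentences into blocks of at most max_words.

    Assumption: a sentence is never split, so a single sentence longer
    than max_words ends up alone in an oversized block.
    """
    blocks: List[Block] = []
    current: Block = []
    current_len = 0
    for sent in sentences:
        # Close the current block if adding this sentence would exceed the limit.
        if current and current_len + len(sent) > max_words:
            blocks.append(current)
            current, current_len = [], 0
        current.append(sent)
        current_len += len(sent)
    if current:
        blocks.append(current)
    return blocks


def shuffle_document(
    sentences: List[Sentence],
    max_words: int = 100,
    seed: Optional[int] = None,
) -> List[Block]:
    """Return the document's blocks in random order (hypothetical helper)."""
    blocks = make_blocks(sentences, max_words)
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    rng.shuffle(blocks)
    return blocks
```

Sentence order inside each block is preserved, so local context up to the block size remains readable, while the original order of the document as a whole cannot be recovered from the block sequence.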
859. Synpathy
- Publisher:
- Max Planck Institute for Psycholinguistics
- Type:
- toolService
- Subject:
- annotation tool
- Description:
- Synpathy is a tool for annotating, analyzing, and graphically editing the syntactic structure of sentences (e.g. linguistically annotated text corpora), developed at the Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands. The application is based on the SyntaxViewer from the TIGER search project developed by the IMS (Institut für Maschinelle Sprachverarbeitung, University of Stuttgart). Since all (non-)terminal node feature values are user-definable, a wide range of linguistic descriptions such as syntax trees, functional structures, dependency-style structures or predicate-argument structures can be accommodated. The annotated text together with its treebank graph information is stored separately from the list of labels used in the graph (features). Output is stored in the persistent TIGER-XML format, which facilitates further processing of the data by other linguistic applications (such as ELAN and ANNEX).
- Rights:
- Not specified
860. SynSemClass 1.0
- Creator:
- Urešová, Zdeňka, Fučíková, Eva, Hajičová, Eva, and Hajič, Jan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text, lexicon, and lexicalConceptualResource
- Subject:
- verbal valency, predicate argument structure, semantic roles, bilingual corpus annotation, translational equivalence, comparative syntax, and comparative semantics
- Language:
- English and Czech
- Description:
- The SynSemClass synonym verb lexicon is a result of a project investigating semantic ‘equivalence’ of verb senses and their valency behavior in parallel Czech-English language resources, i.e., relating verb meanings with respect to contextually-based verb synonymy. The lexicon entries are linked to PDT-Vallex (http://hdl.handle.net/11858/00-097C-0000-0023-4338-F), EngVallex (http://hdl.handle.net/11858/00-097C-0000-0023-4337-2), CzEngVallex (http://hdl.handle.net/11234/1-1512), FrameNet (https://framenet.icsi.berkeley.edu/fndrupal/), VerbNet (http://verbs.colorado.edu/verbnet/index.html), PropBank (http://verbs.colorado.edu/%7Empalmer/projects/ace.html), Ontonotes (http://verbs.colorado.edu/html_groupings/), and English Wordnet (https://wordnet.princeton.edu/). The dataset also includes files reflecting inter-annotator agreement.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB