Harvested from: LINDAT/CLARIAH-CZ repository - LINDAT/CLARIAH-CZ Catalog Search Results

241. BulTreeBank Morphosyntactic Corpus

Type:: corpus
Language:: Bulgarian
Description:: Written, synchronic, general, manually annotated corpus of 1 000 000 tokens divided into three sets: 215 000 tokens used in the BulTreeBank HPSG Treebank (see below), a further 300 000 checked a second time, and the remaining 480 000 or so checked by the annotators. Morphosyntactic annotation uses the BulTreeBank Tagset (http://www.bultreebank.org/TechRep/BTB-TR03.pdf). The format is XML, and the annotation is described in the technical reports of the BulTreeBank project (http://www.bultreebank.org/TechRep).
Rights:: Not specified

242. BulTreeBank Morphosyntactic Disambiguator

Creator:: Simov, Kiril, Osenova, Petya, and Simov, Alex
Publisher:: Linguistic Modeling Department, IPP, Bulgarian Academy of Sciences
Type:: toolService
Description:: A hybrid system with three stages: rules, neural network, rules. First, rules for the sure cases are applied; then a neural network disambiguator is applied; finally, rules repair the most frequent errors of the neural network. The rules are implemented as constraints in the CLaRK System. The neural network is an additional module implemented in Java and called from CLaRK. The system requires morphologically annotated input.
Rights:: Not specified
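
The three-stage design described above (sure rules, then a statistical disambiguator, then repair rules) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Token` class, the generic tags (`PREP`, `VERB`, `NOUN`, not the BulTreeBank Tagset), the two toy rules, and the `score` callback that stands in for the neural network. It does not reproduce the CLaRK constraints or the Java module.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Token:
    form: str                   # word form
    candidates: List[str]       # tags proposed by the morphological analysis
    tag: Optional[str] = None   # tag chosen so far (None = still ambiguous)


# Hypothetical scorer signature: (sentence, position, candidate tag) -> score.
Scorer = Callable[[List[Token], int, str], float]


def apply_sure_rules(tokens: List[Token]) -> None:
    """Stage 1: decide only the sure cases (toy rule: a single candidate)."""
    for tok in tokens:
        if len(tok.candidates) == 1:
            tok.tag = tok.candidates[0]


def apply_scorer(tokens: List[Token], score: Scorer) -> None:
    """Stage 2: stand-in for the neural network; picks the best-scoring candidate."""
    for i, tok in enumerate(tokens):
        if tok.tag is None:
            tok.tag = max(tok.candidates, key=lambda t: score(tokens, i, t))


def apply_repair_rules(tokens: List[Token]) -> None:
    """Stage 3: fix a frequent error (toy rule: no VERB right after a PREP)."""
    for prev, tok in zip(tokens, tokens[1:]):
        if prev.tag == "PREP" and tok.tag == "VERB" and "NOUN" in tok.candidates:
            tok.tag = "NOUN"


def disambiguate(tokens: List[Token], score: Scorer) -> List[str]:
    """Run the three stages in order and return one tag per token."""
    apply_sure_rules(tokens)
    apply_scorer(tokens, score)
    apply_repair_rules(tokens)
    return [tok.tag for tok in tokens]
```

With a toy scorer that always prefers `VERB`, the sentence `[Token("in", ["PREP"]), Token("run", ["VERB", "NOUN"])]` is first tagged `PREP`, `VERB` by stages 1 and 2, and stage 3 then repairs the second tag to `NOUN`.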

243. BulTreeBank POS Corpus

Type:: corpus
Language:: Bulgarian
Description:: Written, synchronic, general, manually annotated; 50 000 tokens in 2 600 sentences, extracted from the BulTreeBank Text Archive so as to cover the most frequent ambiguity classes in Bulgarian.
Rights:: Not specified

244. BulTreeBank Stopword List

Type:: lexicalConceptualResource
Language:: Bulgarian
Description:: 805 stop words (prepositions, pronouns, etc.), provided as a UTF-16 list of word forms.
Rights:: Not specified

245. BulTreeBank Text Archive

Type:: corpus
Language:: Bulgarian
Description:: 72 000 000 tokens: 15% fiction, 78% newspapers, and 7% legal texts, government bulletins, and other sources.
Rights:: Not specified

246. BulTreeBank Tokenizer

Creator:: Simov, Kiril
Publisher:: Linguistic Modeling Department, IPP, Bulgarian Academy of Sciences
Type:: toolService
Description:: The tokenizer covers all languages that use the Latin1, Latin2, Latin3 and Cyrillic tables of Unicode, and can be extended to other Unicode tables if necessary. It is implemented as a cascaded regular grammar in CLaRK. It recognizes over 60 token categories and is easy to adapt to new ones.
Rights:: Not specified
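
To give a rough idea of what a cascaded regular grammar for tokenization looks like, here is a small Python sketch with two levels: the first assigns primitive categories by character class, the second matches a pattern over the resulting category sequence. The five categories, the `DECIMAL` merge rule, and all function names are assumptions chosen for illustration; the actual CLaRK grammar and its 60+ token categories are not reproduced here.

```python
import re
from typing import List, Tuple

Tok = Tuple[str, str]  # (category, text)

# Level 1: primitive tokens by character class. The ranges are a rough
# approximation of the Cyrillic and Latin tables, not a complete definition.
PRIMITIVE = re.compile(
    r"(?P<CYRILLIC>[\u0400-\u04FF]+)"
    r"|(?P<LATIN>[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u024F]+)"
    r"|(?P<NUMBER>[0-9]+)"
    r"|(?P<SPACE>\s+)"
    r"|(?P<PUNCT>.)",
    re.DOTALL,
)


def primitive_tokens(text: str) -> List[Tok]:
    """Split text into (category, text) pairs; every character is covered."""
    return [(m.lastgroup, m.group()) for m in PRIMITIVE.finditer(text)]


def merge_decimals(tokens: List[Tok]) -> List[Tok]:
    """Level 2: rewrite NUMBER ('.' or ',') NUMBER as one DECIMAL token."""
    out: List[Tok] = []
    i = 0
    while i < len(tokens):
        if (
            i + 2 < len(tokens)
            and tokens[i][0] == "NUMBER"
            and tokens[i + 1] in (("PUNCT", "."), ("PUNCT", ","))
            and tokens[i + 2][0] == "NUMBER"
        ):
            out.append(("DECIMAL", "".join(t[1] for t in tokens[i:i + 3])))
            i += 3
        else:
            out.append(tokens[i])
            i += 1
    return out


def tokenize(text: str) -> List[Tok]:
    """Run both levels and drop whitespace tokens."""
    return [t for t in merge_decimals(primitive_tokens(text)) if t[0] != "SPACE"]
```

In this sketch, adding a new token category means adding another level-2 function that rewrites a sequence of lower-level categories, which is the sense in which such a cascade is easy to extend.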

247. BUSCANEO

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Language:: Catalan and Spanish
Description:: Tool for neologism extraction.
Rights:: Not specified

248. BushBank

Creator:: Grác, Marek
Publisher:: Masaryk University, NLP Centre
Type:: text and corpus
Subject:: interannotator agreement, corpus, chunks, phrases, and clauses
Language:: Czech
Description:: Czech corpus annotated for NP and clause chunks by 3-11 annotators (with average inter-annotator agreement at 88%). It consists of 10,000 sentences.
Rights:: Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0), http://creativecommons.org/licenses/by-nc-nd/3.0/, and PUB

249. Bústia Neològica Escolar

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Language:: Catalan and Spanish
Description:: Terminology management
Rights:: Not specified

250. Bwananet

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Language:: Catalan, English, and Spanish
Description:: Tool for querying the Technical Corpus of the Institut Universitari de Lingüística Aplicada.
Rights:: Not specified
