Search Results
Publisher:
Centro de Tecnologías y Aplicaciones del Lenguaje y del Habla (TALP)
Type:
toolService
Language:
Catalan, English, Galician, Italian, Portuguese, and Welsh
Description:
Open-source language analysis tool suite: tokenizer, stemmer/lemmatizer, named entity recognizer, chunker/segmenter, morphosyntactic tagger, syntactic tagger, corpus processor, morphological tagger, semantic tagger, analyzer, word sense disambiguator.
Rights:
Not specified
Publisher:
University of Cambridge
Format:
application/tei+xml
Type:
corpus
Language:
Welsh
Description:
Welsh texts from the period 1500-1850. Overall, the corpus contains around 420,000 words from 30 texts.
Rights:
Not specified
Publisher:
University of Leipzig
Type:
corpus
Language:
Afrikaans, Albanian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, German, Hungarian, Icelandic, Indonesian, Italian, Japanese, Korean, Latin, Latvian, Lithuanian, Malay (macrolanguage), Norwegian, Occitan (post 1500), Romanian, Russian, Slovak, Slovenian, Spanish, Sundanese, Swedish, Tagalog, Turkish, Vietnamese, and Welsh
Description:
Collected from newspaper texts, web crawling, etc. Contents: words (with frequency), co-occurrences (with graph), left/right neighbours, and example sentences.
Rights:
Not specified