Search Results
Publisher:
Shared initiative of the Institute of Computer Science at the Polish Academy of Sciences (IPI PAN), the Institute of Polish Language at the Polish Academy of Sciences, Polish Scientific Publishers PWN, and the Department of Computational and Corpus Linguistics at the University of Łódź
Type:
corpus
Language:
Polish
Description:
In (advanced) preparation: a reference corpus of the Polish language containing hundreds of millions of words.
Rights:
Not specified
Type:
toolService
Description:
Open source Python modules, linguistic data and documentation for research and development in natural language processing, supporting dozens of NLP tasks. NLTK includes the following software modules (~120k lines of Python code):
  Corpus readers: interfaces to many corpora
  Tokenizers: whitespace, newline, blankline, word, treebank, sexpr, regexp, Punkt sentence segmenter
  Stemmers: Porter, Lancaster, regexp
  Taggers: regexp, n-gram, backoff, Brill, HMM, TnT
  Chunkers: regexp, n-gram, named-entity
  Parsers: recursive descent, shift-reduce, chart, feature-based, probabilistic, dependency, ...
  Semantic interpretation: untyped lambda calculus, first-order models, DRT, glue semantics, hole semantics, parser interface
  WordNet: WordNet interface, lexical relations, similarity, interactive browser
  Classifiers: decision tree, maximum entropy, naive Bayes, Weka interface, megam
  Clusterers: expectation maximization, agglomerative, k-means
  Metrics: accuracy, precision, recall, windowdiff, distance metrics, inter-annotator agreement coefficients, word association measures, rank correlation
  Estimation: uniform, maximum likelihood, Lidstone, Laplace, expected likelihood, heldout, cross-validation, Good-Turing, Witten-Bell
  Miscellaneous: unification, chatbots, many utilities
  NLTK-Contrib (less mature): categorial grammar (Lambek, CCG), finite-state automata, hadoop (MapReduce), kimmo, readability, textual entailment, timex, TnT interface, inter-annotator agreement
Rights:
Not specified
Publisher:
Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:
lexicalConceptualResource
Subject:
terminology database
Language:
Catalan, French, Galician, Italian, Portuguese, Romanian, and Spanish
Description:
Multilingual terminological resource containing 3,875 entries from the Economics, Finance and Banking domains.
Rights:
Not specified
Publisher:
TALG Research Group (University of Vigo)
Type:
lexicalConceptualResource
Language:
Galician
Description:
Galician neology databank
Rights:
Not specified
Publisher:
Frisian Academy
Type:
corpus
Description:
A digital collection of Frisian books, scientific magazines and newspaper articles, which can be used to investigate various aspects of Frisian culture, including language and literature. The corpus contains more than 25 million words.
Rights:
Not specified
Publisher:
The Research Institute for the Languages of Finland
Type:
corpus
Language:
Finnish
Description:
Text corpus covering the period 1935–2007.
Rights:
Not specified
Publisher:
Newcastle University
Format:
application/tei+xml
Type:
corpus
Language:
English
Description:
A corpus of dialect speech from Tyneside in North-East England. It includes digitized audio, standard orthographic transcription, phonetic transcription, and part-of-speech tagging.
Rights:
Not specified
Publisher:
Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:
toolService
Description:
Ted Pedersen's Ngram Statistics Package (used to identify word Ngrams that appear in large corpora using standard tests of association such as Fisher's exact test, the log likelihood ratio, Pearson's chi-squared test, the Dice Coefficient, etc.).
Rights:
Not specified
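Note on the Ngram Statistics Package entry above: a minimal sketch of three of the association measures named in its description (the Dice coefficient, Pearson's chi-squared test, and the log likelihood ratio), computed for a single bigram from a 2x2 contingency table. This is not the package's own code or interface; the function names and the count parameters (`n11`, `n1p`, `np1`, `npp`) are hypothetical and chosen only for illustration.

```python
import math


def contingency(n11, n1p, np1, npp):
    """Build the 2x2 table for a bigram (w1, w2).

    Hypothetical parameter names:
      n11: count of the bigram (w1, w2)
      n1p: count of bigrams whose first word is w1
      np1: count of bigrams whose second word is w2
      npp: total number of bigrams in the corpus
    """
    n12 = n1p - n11              # w1 followed by something other than w2
    n21 = np1 - n11              # w2 preceded by something other than w1
    n22 = npp - n1p - np1 + n11  # neither w1 first nor w2 second
    return n11, n12, n21, n22


def dice(n11, n1p, np1):
    """Dice coefficient: 2 * joint count / sum of the marginal counts."""
    return 2 * n11 / (n1p + np1)


def chi_squared(n11, n12, n21, n22):
    """Pearson's chi-squared statistic for a 2x2 table (no continuity correction)."""
    n = n11 + n12 + n21 + n22
    numerator = n * (n11 * n22 - n12 * n21) ** 2
    denominator = (n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22)
    return numerator / denominator


def log_likelihood(n11, n12, n21, n22):
    """Log likelihood ratio G^2 = 2 * sum(observed * ln(observed / expected))."""
    n = n11 + n12 + n21 + n22
    rows = (n11 + n12, n21 + n22)
    cols = (n11 + n21, n12 + n22)
    observed = ((n11, n12), (n21, n22))
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            e = rows[i] * cols[j] / n  # expected count under independence
            if o > 0:                  # an empty cell contributes nothing
                total += o * math.log(o / e)
    return 2 * total


# Made-up counts, for illustration only.
table = contingency(n11=10, n1p=20, np1=30, npp=100)
print(dice(10, 20, 30), chi_squared(*table), log_likelihood(*table))
```

Higher values of each measure indicate a stronger association between the two words. When the observed counts equal the counts expected under independence, both the chi-squared statistic and the log likelihood ratio are zero.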
Publisher:
NO2014, University of Oslo
Type:
corpus
Language:
Norwegian Nynorsk
Rights:
Not specified
Type:
languageDescription
Description:
Covers all central constructions; lexicon of more than 80,000 lemmas; Lexical-Functional Grammar (LFG) and Minimal Recursion Semantics (MRS).
Rights:
Not specified