Search Results
Type:
toolService
Description:
Open source Python modules, linguistic data and documentation for research and development in natural language processing, supporting dozens of NLP tasks. NLTK includes the following software modules (~120k lines of Python code):
Corpus readers: interfaces to many corpora
Tokenizers: whitespace, newline, blankline, word, treebank, sexpr, regexp, Punkt sentence segmenter
Stemmers: Porter, Lancaster, regexp
Taggers: regexp, n-gram, backoff, Brill, HMM, TnT
Chunkers: regexp, n-gram, named-entity
Parsers: recursive descent, shift-reduce, chart, feature-based, probabilistic, dependency, ...
Semantic interpretation: untyped lambda calculus, first-order models, DRT, glue semantics, hole semantics, parser interface
WordNet: WordNet interface, lexical relations, similarity, interactive browser
Classifiers: decision tree, maximum entropy, naive Bayes, Weka interface, megam
Clusterers: expectation maximization, agglomerative, k-means
Metrics: accuracy, precision, recall, windowdiff, distance metrics, inter-annotator agreement coefficients, word association measures, rank correlation
Estimation: uniform, maximum likelihood, Lidstone, Laplace, expected likelihood, heldout, cross-validation, Good-Turing, Witten-Bell
Miscellaneous: unification, chatbots, many utilities
NLTK-Contrib (less mature): categorial grammar (Lambek, CCG), finite-state automata, hadoop (MapReduce), kimmo, readability, textual entailment, timex, TnT interface, inter-annotator agreement
Rights:
Not specified
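As a rough illustration of the "Estimation" entry above, the following sketch shows how Lidstone smoothing relates to maximum likelihood and Laplace estimation. It uses only the Python standard library and does not call NLTK; the function name `lidstone_probs`, the toy sentence, and the extra vocabulary word "dog" are assumptions made for this example.

```python
from collections import Counter


def lidstone_probs(tokens, gamma, vocabulary):
    """Lidstone estimate: (count + gamma) / (N + gamma * B).

    gamma = 0 gives the maximum likelihood estimate,
    gamma = 1 gives the Laplace (add-one) estimate.
    Hypothetical helper for illustration, not an NLTK function.
    """
    counts = Counter(tokens)
    total = len(tokens)      # N: number of observed tokens
    bins = len(vocabulary)   # B: number of possible outcomes
    return {
        word: (counts[word] + gamma) / (total + gamma * bins)
        for word in vocabulary
    }


tokens = "the cat sat on the mat".split()
# Add one unseen word so the effect of smoothing is visible.
vocabulary = sorted(set(tokens) | {"dog"})

mle = lidstone_probs(tokens, 0.0, vocabulary)
laplace = lidstone_probs(tokens, 1.0, vocabulary)

print("MLE     P(dog) =", mle["dog"])
print("Laplace P(dog) =", laplace["dog"])
```

With maximum likelihood the unseen word gets probability zero, while any positive gamma reserves some probability mass for it.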
Creator:
Veselý, Bohumil
Publisher:
Národní filmový archiv
Type:
video and clip
Subject:
divadlo exteriér (theatre exterior), Galerie osobností (Gallery of Personalities), People::Pollert Emil (1877-1935), People::Ostrčil Otakar (1879-1935), and Národní divadlo (National Theatre)
Language:
No linguistic content
Description:
The film segment shows opera singer Emil Pollert and composer Otakar Ostrčil near the back entrance to the National Theatre in Prague.
Rights:
Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB
Publisher:
Meertens Institute KNAW The Netherlands
Format:
application/octet-stream
Type:
toolService
Language:
Dutch
Description:
Enriched database of (mainly) Dutch family names, based on the 1947 census (in progress; currently 90,000 entries out of a maximum of 140,000).
Rights:
Meertens Institute KNAW The Netherlands
Publisher:
Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:
lexicalConceptualResource
Subject:
terminology database
Language:
Catalan, French, Galician, Italian, Portuguese, Romanian, and Spanish
Description:
Multilingual terminological resource containing 3,875 entries from the Economics, Finance and Banking domains.
Rights:
Not specified
Publisher:
TALG Research Group (University of Vigo)
Type:
lexicalConceptualResource
Language:
Galician
Description:
Galician neology databank
Rights:
Not specified
Creator:
Mírovský, Jiří and Ondruška, Roman
Publisher:
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:
toolService
Subject:
search and treebank
Description:
Netgraph is a graphically oriented client-server application for searching in linguistically annotated treebanks. The query language of Netgraph is simple and intuitive, yet powerful enough for treebanks with complex annotation schemes. The primary purpose of Netgraph is searching in the Prague Dependency Treebank 2.0; nevertheless, it can be used for other treebanks as well.
Rights:
GNU General Public Licence, version 3, http://opensource.org/licenses/GPL-3.0, and PUB
Publisher:
Frisian Academy
Type:
corpus
Description:
A digital collection of Frisian books, scientific magazines and newspaper articles, which can be used to investigate various aspects of Frisian culture, including language and literature. The corpus contains more than 25 million words.
Rights:
Not specified
Publisher:
The Research Institute for the Languages of Finland
Type:
corpus
Language:
Finnish
Description:
Text corpus covering the period 1935–2007.
Rights:
Not specified
Publisher:
Newcastle University
Format:
application/tei+xml
Type:
corpus
Language:
English
Description:
A corpus of dialect speech from Tyneside in North-East England. It includes digitized audio, standard orthographic transcription, phonetic transcription, and part-of-speech tagging.
Rights:
Not specified
Publisher:
Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:
toolService
Description:
Ted Pedersen's Ngram Statistics Package, used to identify word Ngrams that appear in large corpora using standard tests of association such as Fisher's exact test, the log likelihood ratio, Pearson's chi-squared test, and the Dice Coefficient.
Rights:
Not specified
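To make the idea of association tests for word Ngrams more concrete, here is a minimal sketch of two of the measures named above, the Dice Coefficient and Pearson's chi-squared statistic, computed for a single bigram from its 2x2 contingency table. It is plain Python and is not the Ngram Statistics Package itself; the function names and the toy sentence are assumptions made for this example.

```python
def bigram_table(tokens, w1, w2):
    """Build the 2x2 contingency table (n11, n12, n21, n22) for bigram (w1, w2).

    n11: w1 followed by w2        n12: w1 followed by another word
    n21: another word before w2   n22: neither w1 first nor w2 second
    """
    bigrams = list(zip(tokens, tokens[1:]))
    n11 = sum(1 for a, b in bigrams if a == w1 and b == w2)
    n12 = sum(1 for a, b in bigrams if a == w1 and b != w2)
    n21 = sum(1 for a, b in bigrams if a != w1 and b == w2)
    n22 = len(bigrams) - n11 - n12 - n21
    return n11, n12, n21, n22


def dice(n11, n12, n21, n22):
    """Dice coefficient: 2 * joint count / (sum of the two marginal counts)."""
    denominator = (n11 + n12) + (n11 + n21)
    return 2 * n11 / denominator if denominator else 0.0


def chi_squared(n11, n12, n21, n22):
    """Pearson's chi-squared statistic for a 2x2 table (no continuity correction)."""
    n = n11 + n12 + n21 + n22
    denominator = (n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22)
    if denominator == 0:
        return 0.0
    return n * (n11 * n22 - n12 * n21) ** 2 / denominator


tokens = "new york is big and new york is old".split()
table = bigram_table(tokens, "new", "york")
print("table:", table)
print("dice:", dice(*table))
print("chi-squared:", chi_squared(*table))
```

In this toy text "new" is always followed by "york" and "york" is always preceded by "new", so both measures reach their maximum for a table of this size. Real corpora would give far less clear-cut tables.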