Search Results
Publisher:
Centro de Tecnologías y Aplicaciones del Lenguaje y del Habla (TALP)
Type:
lexicalConceptualResource
Subject:
lexical database
Language:
Basque, Catalan, English, Galician, and Spanish
Description:
Multilingual lexical database that follows the model proposed by the EuroWordNet project. The MCR integrates into the same EuroWordNet framework wordnets from five different languages (together with four English WordNet versions). It also integrates WordNet Domains and new versions of the Base Concepts and Top Concept Ontology. Overall, it contains 1,642,389 semantic relations between synsets, most of them acquired by automatic means. Information contained: semantics, synonyms, antonyms, definition, equivalents, example of use, morphology.
Rights:
Not specified
Publisher:
University of Tampere
Format:
application/octet-stream
Type:
corpus
Subject:
parallel corpus and multilingual
Language:
English, German, Russian, and Swedish
Description:
International conventions and treaties arranged as a parallel corpus aligned at the paragraph level
Rights:
Not specified
Publisher:
Max Planck Institute for Psycholinguistics
Type:
corpus
Language:
Dutch, German, English, and French
Description:
Language Acquisition corpus
Rights:
Not specified
Type:
corpus
Language:
English, French, and Modern Greek (1453-)
Description:
Multilingual (EN, EL, FR); multimodal (Video, Text); parallel (EN, EL, FR subtitles); comparable (transcripts, subtitles); 120 hours
Rights:
Not specified
Creator:
Straka, Milan and Straková, Jana
Publisher:
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:
toolService and tool
Subject:
named entity recognizer
Language:
English
Description:
NameTag is an open-source tool for named entity recognition (NER). NameTag identifies proper names in text and classifies them into predefined categories, such as names of persons, locations, organizations, etc. NameTag is distributed as a standalone tool or a library, along with trained linguistic models. For Czech, NameTag achieves state-of-the-art performance (Straková et al. 2013). NameTag is free software under the LGPL license. The linguistic models are free for non-commercial use and distributed under the CC BY-NC-SA license, although for some models the original data used to create the model may impose additional licensing conditions.
Rights:
Not specified
Creator:
Straková, Jana and Straka, Milan
Publisher:
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:
service and toolService
Subject:
named entity recognition, NameTag, and WeblichtXML
Language:
Czech, German, English, Spanish, and Dutch
Description:
Metadata description of NameTag (http://hdl.handle.net/11234/1-3633, https://lindat.mff.cuni.cz/services/nametag/) provided for WebLicht.
Rights:
Not specified
Publisher:
Katholieke Universiteit Leuven Campus Kortrijk
Type:
corpus
Language:
Dutch, English, and French
Description:
Trilingual parallel corpus, with Dutch as first language. 2M words, aligned at paragraph level. It includes fiction and non-fiction texts.
Rights:
Not specified
Format:
application/pdf
Type:
lexicalConceptualResource
Description:
The Narragansett corpus contains a cultural linguistic dictionary and grammatical information on Narragansett, an extinct language of the USA.
Rights:
Not specified
Publisher:
Shared initiative of the Institute of Computer Science at the Polish Academy of Sciences (IPI PAN), Institute of Computer Science, Polish Academy of Sciences, Institute of Polish Language at the Polish Academy of Sciences, Polish Scientific Publishers PWN, and Department of Computational and Corpus Linguistics at the University of Łódź
Type:
corpus
Language:
Polish
Description:
In (advanced) preparation: a reference corpus of the Polish language containing hundreds of millions of words.
Rights:
Not specified
Type:
toolService
Description:
Open source Python modules, linguistic data and documentation for research and development in natural language processing, supporting dozens of NLP tasks. NLTK includes the following software modules (~120k lines of Python code):
  Corpus readers: interfaces to many corpora
  Tokenizers: whitespace, newline, blankline, word, treebank, sexpr, regexp, Punkt sentence segmenter
  Stemmers: Porter, Lancaster, regexp
  Taggers: regexp, n-gram, backoff, Brill, HMM, TnT
  Chunkers: regexp, n-gram, named-entity
  Parsers: recursive descent, shift-reduce, chart, feature-based, probabilistic, dependency, ...
  Semantic interpretation: untyped lambda calculus, first-order models, DRT, glue semantics, hole semantics, parser interface
  WordNet: WordNet interface, lexical relations, similarity, interactive browser
  Classifiers: decision tree, maximum entropy, naive Bayes, Weka interface, megam
  Clusterers: expectation maximization, agglomerative, k-means
  Metrics: accuracy, precision, recall, windowdiff, distance metrics, inter-annotator agreement coefficients, word association measures, rank correlation
  Estimation: uniform, maximum likelihood, Lidstone, Laplace, expected likelihood, heldout, cross-validation, Good-Turing, Witten-Bell
  Miscellaneous: unification, chatbots, many utilities
  NLTK-Contrib (less mature): categorial grammar (Lambek, CCG), finite-state automata, hadoop (MapReduce), kimmo, readability, textual entailment, timex, TnT interface, inter-annotator agreement
Rights:
Not specified