Search Results
Publisher:
Max Planck Institute for Psycholinguistics
Type:
corpus
Language:
German and Polish
Description:
Language Acquisition corpus
Rights:
Not specified
Creator:
Rüdiger, Jan Oliver
Publisher:
Jan Oliver Rüdiger
Type:
tool and toolService
Subject:
Corpus Linguistics , NLP , conll , tei , XML , nlp , Natural Language Processing , linguistics , Linguistics , Computational Linguistics , corpus processing , tagger , POS tagger , lemmatization , text cleaning , CommonCrawl , epub , JSON , Twitter , Pandoc , Wikipedia , digital data , DTA , DSpin , MySQL , ElasticSearch , TextGrid , text corpora , TigerXML , and WeblichtXML
Language:
German , English , French , Italian , Dutch , Spanish , Polish , Arabic , Chinese , and Portuguese
Description:
Software for corpus linguists and text/data mining enthusiasts. The CorpusExplorer combines over 45 interactive visualizations under a user-friendly interface. Routine tasks such as text acquisition, cleaning, or tagging are completely automated. The simple interface supports use in university teaching and leads users and students to fast and substantial results. The CorpusExplorer supports many standards (XML, CSV, JSON, R, etc.) and also offers its own software development kit (SDK).
Source code available at https://github.com/notesjor/corpusexplorer2.0
Rights:
Not specified
Publisher:
Joint Research Centre of the EU
Type:
corpus
Language:
Bulgarian , Czech , Danish , Dutch , English , Estonian , Finnish , French , German , Modern Greek (1453-) , Hungarian , Italian , Latvian , Maltese , Norwegian , Polish , Portuguese , Romanian , Slovak , Slovenian , Spanish , and Swedish
Description:
The largest parallel corpus; it contains EU law (the Acquis Communautaire) in 22 languages.
Rights:
Not specified
Publisher:
Max Planck Institute for Psycholinguistics
Type:
corpus
Subject:
language acquisition corpus
Language:
French and Polish
Description:
Language Acquisition corpus
Rights:
Not specified
Publisher:
Max Planck Institute for Psycholinguistics
Type:
corpus
Language:
German , Italian , and Polish
Description:
Language Acquisition corpus
Rights:
Not specified
Publisher:
Institute of Computer Science, Polish Academy of Sciences
Type:
toolService
Language:
Polish
Description:
Morfeusz is a morphological analyser (not a stemmer, not a tagger) for Polish, without a guesser, so it is a morphological dictionary of a kind. Note that it is a library, not a ready-made program. Modules developed by external authors allow Morfeusz to be used from Java and Python.
Rights:
Not specified
Publisher:
Shared initiative of Institute of Computer Science at Polish Academy of Sciences (IPI PAN) , Institute of Computer Science, Polish Academy of Sciences , Institute of Polish Language at the Polish Academy of Sciences , Polish Scientific Publishers PWN , and Department of Computational and Corpus Linguistics at the University of Łódź
Type:
corpus
Language:
Polish
Description:
In (advanced) preparation: a reference corpus of the Polish language containing hundreds of millions of words.
Rights:
Not specified
Publisher:
Universität Bamberg, World Language Documentation Centre
Format:
application/octet-stream
Type:
lexicalConceptualResource
Language:
Afrikaans , Arabic , Basque , Bulgarian , Catalan , Chinese , Czech , Danish , Dutch , English , Esperanto , Estonian , Finnish , French , Galician , Georgian , Modern Greek (1453-) , Hebrew , Hungarian , Icelandic , Indonesian , Interlingua (International Auxiliary Language Association) , Irish , Italian , Japanese , Khmer , Norwegian , Polish , Portuguese , Romanian , Russian , Serbian , Slovak , Spanish , Swedish , Turkish , Ukrainian , and Welsh
Rights:
GFDL or CC and http://www.omegawiki.org/Licensing
Publisher:
Research Institute for Artificial Intelligence, Romanian Academy of Sciences
Format:
application/xml
Type:
lexicalConceptualResource
Language:
Polish
Description:
Currently about 18 600 lexical units and about 11 000 synsets; planned by the end of 2008: 25 000 to 30 000 lexical units.
Rights:
Free for non-commercial use
Type:
corpus
Subject:
These databases serve as an important resource for the performance of voice driven teleservice systems in practical implementations
Language:
Czech , Hungarian , Polish , Russian , and Slovak
Description:
Five telephone databases recorded over the PSTN, containing phonetically rich material. All recordings are orthographically transcribed. Speaker information covers gender, age, and accent. A pronunciation lexicon is included.
Rights:
Not specified