1 - 10 of 10
Number of results to display per page
Search Results
2. CorpusExplorer
- Creator:
- Rüdiger, Jan Oliver
- Publisher:
- Jan Oliver Rüdiger
- Type:
- tool and toolService
- Subject:
- Corpus Linguisitics, NLP, conll, tei, XML, nlp, Natural Language Processing, linguistics, Linguistics, Computational Linguistics, corpus processing, tagger, POS tagger, lemmatization, text cleaning, CommonCrawl, epub, JSON, Twitter, Pandoc, Wikipedia, digital data, DTA, DSpin, MySQL, ElasticSearch, TextGrid, text corpora, TigerXML, and WeblichtXML
- Language:
- German, English, French, Italian, Dutch, Spanish, Polish, Arabic, Chinese, and Portuguese
- Description:
- Software for corpus linguists and text/data mining enthusiasts. The CorpusExplorer combines over 45 interactive visualizations under a user-friendly interface. Routine tasks such as text acquisition, cleaning or tagging are completely automated. The simple interface supports the use in university teaching and leads users/students to fast and substantial results. The CorpusExplorer is open for many standards (XML, CSV, JSON, R, etc.) and also offers its own software development kit (SDK). Source code available at https://github.com/notesjor/corpusexplorer2.0
- Rights:
- Not specified
3. JRC-Acquis
- Publisher:
- Joint Research Centre of the EU
- Type:
- corpus
- Language:
- Bulgarian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Modern Greek (1453-), Hungarian, Italian, Latvian, Maltese, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, and Swedish
- Description:
- The largest parallel corpus, contains EU law, the Acquis Communautaire in 22 languages.
- Rights:
- Not specified
4. NameTag service description
- Creator:
- Straková, Jana and Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- service and toolService
- Subject:
- named entity recognition, NameTag, and WeblichtXML
- Language:
- Czech, German, English, Spanish, and Dutch
- Description:
- Metadata description of nametag (http://hdl.handle.net/11234/1-3633, https://lindat.mff.cuni.cz/services/nametag/) provided for weblicht.
- Rights:
- Not specified
5. Project Gutenberg
- Type:
- corpus
- Language:
- Danish, Dutch, English, Finnish, French, German, Italian, Latin, Portuguese, Russian, Spanish, Swedish, and Telugu
- Description:
- Possibility to download or to browse free electronic books; Angebot: Download von und Online-Zugang zu frei verfügbaren E-Books; deutschsprachige Literatur stellt nur einen Teilbereich der verfügbaren E-Books dar
- Rights:
- Not specified
6. SpeechDat-Car databases
- Type:
- corpus
- Language:
- Danish, Dutch, English, Finnish, French, German, Modern Greek (1453-), Italian, and Spanish
- Description:
- 9 speech databases for training and testing multilingual speech recognition applications in the car environment. Contains parallel 4 channel in-car recordings and a GSM channel. Contains interesting phonetically rich material. All orthographically transcribed. Speaker information included for gender, age, accent. Including pronunciation lexicon.
- Rights:
- Not specified
7. Speecon databases
- Type:
- corpus
- Language:
- Czech, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Polish, Portuguese, Russian, Spanish, Swedish, Turkish, Chinese, Hebrew, Japanese, Korean, and Thai
- Description:
- 28 speech databases containing broadband recordings from 550 adults and 50 children per language. Contains interesting phonetically rich material. All orthographically transcribed. Speaker information included for gender, age, accent. Including pronunciation lexicon.
- Rights:
- Not specified
8. Subtitle Word Frequencies
- Publisher:
- Center for Reading Research, Ghent University
- Type:
- lexicalConceptualResource
- Language:
- Chinese, Dutch, English, German, Modern Greek (1453-), and Spanish
- Rights:
- Not specified
9. TreeTagger
- Publisher:
- University of Stuttgart
- Type:
- toolService
- Subject:
- POS tagger
- Language:
- Bulgarian, Dutch, English, French, German, Modern Greek (1453-), Italian, Portuguese, Russian, Spanish, and Swahili (macrolanguage)
- Description:
- A part-of-speech tagger and lemmatizer for several languages.
- Rights:
- Not specified
10. Wortschatz
- Publisher:
- University of Leipzig
- Type:
- corpus
- Language:
- Afrikaans, Albanian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, German, Hungarian, Icelandic, Indonesian, Italian, Japanese, Korean, Latin, Latvian, Lithuanian, Malay (macrolanguage), Norwegian, Occitan (post 1500), Romanian, Russian, Slovak, Slovenian, Spanish, Sundanese, Swedish, Tagalog, Turkish, Vietnamese, and Welsh
- Description:
- Collected from newspaper texts, webcrawling, etc.: words (+frequency), cooccurrences (+graph), left/right neighbours, example sentences
- Rights:
- Not specified