1 - 7 of 7
Number of results to display per page
Search Results
2. CST's lemmatiser
- Publisher:
- Center for Sprogteknologi, University of Copenhagen
- Type:
- toolService
- Language:
- Danish, Dutch, English, German, Modern Greek (1453-), Icelandic, Norwegian, Russian, Slovenian, and Swedish
- Description:
- 1) Fully automatic rule based lemmatization of inflected languages 2) Fully automatic training of lemmatization rules based on full form-lemma list
- Rights:
- Not specified
3. JRC-Acquis
- Publisher:
- Joint Research Centre of the EU
- Type:
- corpus
- Language:
- Bulgarian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Modern Greek (1453-), Hungarian, Italian, Latvian, Maltese, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, and Swedish
- Description:
- The largest parallel corpus, contains EU law, the Acquis Communautaire in 22 languages.
- Rights:
- Not specified
4. Project Gutenberg
- Type:
- corpus
- Language:
- Danish, Dutch, English, Finnish, French, German, Italian, Latin, Portuguese, Russian, Spanish, Swedish, and Telugu
- Description:
- Possibility to download or to browse free electronic books; Angebot: Download von und Online-Zugang zu frei verfügbaren E-Books; deutschsprachige Literatur stellt nur einen Teilbereich der verfügbaren E-Books dar
- Rights:
- Not specified
5. SpeechDat-Car databases
- Type:
- corpus
- Language:
- Danish, Dutch, English, Finnish, French, German, Modern Greek (1453-), Italian, and Spanish
- Description:
- 9 speech databases for training and testing multilingual speech recognition applications in the car environment. Contains parallel 4 channel in-car recordings and a GSM channel. Contains interesting phonetically rich material. All orthographically transcribed. Speaker information included for gender, age, accent. Including pronunciation lexicon.
- Rights:
- Not specified
6. Speecon databases
- Type:
- corpus
- Language:
- Czech, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Polish, Portuguese, Russian, Spanish, Swedish, Turkish, Chinese, Hebrew, Japanese, Korean, and Thai
- Description:
- 28 speech databases containing broadband recordings from 550 adults and 50 children per language. Contains interesting phonetically rich material. All orthographically transcribed. Speaker information included for gender, age, accent. Including pronunciation lexicon.
- Rights:
- Not specified
7. Wortschatz
- Publisher:
- University of Leipzig
- Type:
- corpus
- Language:
- Afrikaans, Albanian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, German, Hungarian, Icelandic, Indonesian, Italian, Japanese, Korean, Latin, Latvian, Lithuanian, Malay (macrolanguage), Norwegian, Occitan (post 1500), Romanian, Russian, Slovak, Slovenian, Spanish, Sundanese, Swedish, Tagalog, Turkish, Vietnamese, and Welsh
- Description:
- Collected from newspaper texts, webcrawling, etc.: words (+frequency), cooccurrences (+graph), left/right neighbours, example sentences
- Rights:
- Not specified