Harvested from: LINDAT/CLARIAH-CZ repository - LINDAT/CLARIAH-CZ Catalog Search Results

881. hunalign - sentence level aligner

Publisher:: Budapest University of Technology and Economics Media Research (BME MOKK)
Type:: toolService
Description:: Hunalign is a powerful, free sentence-level aligner for building parallel corpora. Its input is tokenized, sentence-segmented text in two languages.
Rights:: Not specified

882. Hungarian Historical Corpus

Publisher:: Academy of Sciences
Format:: application/xml
Type:: corpus
Language:: Hungarian
Description:: Containing 27 million running words, the Hungarian Historical Corpus provides a valuable basis for research on the history of Hungarian vocabulary from the second half of the 18th century to 2000.
Rights:: Not specified

883. Hungarian Lexical and Syntactic Resources for NooJ

Publisher:: Dept. of Language Technology, Research Institute for Linguistics
Type:: toolService
Language:: Hungarian
Description:: NooJ is a linguistic development environment that includes large-coverage dictionaries and grammars, and parses corpora in real time. The large-coverage lexical resources for Hungarian (morphological and syntactic grammars) can be applied to texts to locate morphological, lexical and syntactic patterns and to tag simple and compound words.
Rights:: Not specified

884. Hungarian National Corpus

Publisher:: Academy of Sciences
Format:: application/xml
Type:: corpus
Subject:: synchronic corpus
Language:: Hungarian
Description:: Written general synchronic reference corpus; 190 million tokens; POS-annotated XML
Rights:: Not specified

885. Hungarian Web Corpus

Publisher:: Budapest University of Technology and Economics Media Research (BME MOKK)
Type:: corpus
Subject:: Web corpus
Language:: Hungarian
Description:: Monolingual written general corpus; 700 million tokens; segmentation, disambiguation
Rights:: Not specified

886. Hunglish Corpus

Publisher:: Academy of Sciences and Budapest University of Technology and Economics Media Research (BME MOKK)
Type:: corpus
Subject:: parallel corpus
Language:: English and Hungarian
Description:: Bilingual written general corpus; 2 million sentences
Rights:: CC

887. hunner - named entity recognizer for Hungarian

Creator:: Varga, Dániel and Simon, Eszter
Publisher:: Budapest Technical University Media Research Centre
Type:: toolService
Description:: Hungarian named entity recognition with a maximum entropy approach
Rights:: Not specified

888. hunpos - a POS tagger

Publisher:: Budapest University of Technology and Economics Media Research (BME MOKK)
Type:: toolService
Description:: Hunpos is an open-source reimplementation of TnT, the well-known part-of-speech tagger by Thorsten Brants.
Rights:: Not specified

889. huntoken - tokenizer and sentence splitter

Creator:: Németh, László, Halácsy, Péter, and Kornai, András
Publisher:: Budapest Technical University Media Research Centre
Type:: toolService
Subject:: tokenizer
Description:: HunToken is a rule-based tokenizer and sentence boundary detector for Hungarian (and English) texts.
Rights:: GNU Library or "Lesser" General Public License 3.0 (LGPL-3.0), http://opensource.org/licenses/LGPL-3.0, and PUB

890. HWC2023 - Hamburg.de Website Corpus 2023

Creator:: Rüdiger, Jan Oliver
Publisher:: Leibniz-Institut für Deutsche Sprache
Type:: text and corpus
Subject:: corpus, Web corpus, web corpora, Germanistik, German, websites, crawling corpus, and CorpusExplorer
Language:: German
Description:: A petition for a referendum (called "Schluss mit Gendersprache in Verwaltung und Bildung", English: "abolition of gender language in administration and education") was launched in Hamburg in February 2023. The project "Empirical Gender Linguistics" at the Leibniz Institute for the German Language took this as an opportunity to scrape the entire "https://www.hamburg.de" website (except the list of ships in the Port of Hamburg and the Yellow Pages). The Hamburg.de website is the central digital point of contact for citizens. The scraped texts were cleaned, processed and annotated (POS and lemma information via TreeTagger) using http://www.CorpusExplorer.de. We use the corpus to analyze the use of words containing gender signs.
Rights:: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), PUB, and http://creativecommons.org/licenses/by-nc-sa/3.0/
