Subject: named entity recognition - LINDAT/CLARIAH-CZ Catalog Search Results

11. NameTag 3 Czech CNEC 2.0 Model

Creator:: Straková, Jana
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, mlmodel, and languageDescription
Subject:: named entity recognition, NER, NameTag, and Czech
Language:: Czech
Description:: This is a trained model for the supervised machine learning tool NameTag 3 (https://ufal.mff.cuni.cz/nametag/3/), trained on the Czech Named Entity Corpus 2.0 (https://ufal.mff.cuni.cz/cnec/cnec2.0). NameTag 3 is an open-source tool for both flat and nested named entity recognition (NER). NameTag 3 identifies proper names in text and classifies them into a set of predefined categories, such as names of persons, locations, organizations, etc. The model documentation can be found at https://ufal.mff.cuni.cz/nametag/3/models#czech-cnec2.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

12. NameTag 3 Multilingual CoNLL Model

Creator:: Straková, Jana
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, mlmodel, and languageDescription
Subject:: named entity recognition, NER, NameTag, and multilingual
Language:: English, German, Dutch, Spanish, Ukrainian, and Czech
Description:: This is a trained model for the supervised machine learning tool NameTag 3 (https://ufal.mff.cuni.cz/nametag/3/), trained jointly on several NE corpora: English CoNLL-2003, German CoNLL-2003, Dutch CoNLL-2002, Spanish CoNLL-2002, Ukrainian Lang-uk, and Czech CNEC 2.0, all harmonized to flat named entities with four labels: PER, ORG, LOC, and MISC. NameTag 3 is an open-source tool for both flat and nested named entity recognition (NER). NameTag 3 identifies proper names in text and classifies them into a set of predefined categories, such as names of persons, locations, organizations, etc. The model documentation can be found at https://ufal.mff.cuni.cz/nametag/3/models#multilingual-conll.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

13. NameTag service description

Creator:: Straková, Jana and Straka, Milan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: service and toolService
Subject:: named entity recognition, NameTag, and WeblichtXML
Language:: Czech, German, English, Spanish, and Dutch
Description:: Metadata description of the NameTag service (http://hdl.handle.net/11234/1-3633, https://lindat.mff.cuni.cz/services/nametag/) provided for WebLicht.
Rights:: Not specified

14. Parallel Global Voices, Czech-English NER+NEL

Creator:: Nevěřilová, Zuzana and Žižková, Hana
Publisher:: Masaryk University, Brno
Type:: text, other, and lexicalConceptualResource
Subject:: named entity recognition, named entities, named entity, named entity corpus, named entity linking, named entity disambiguation, and wikidata
Language:: English and Czech
Description:: Named entity annotation of the Czech-English (ces-eng) language pair of the existing Parallel Global Voices corpus. The named entity annotations distinguish four classes: Person, Organization, Location, and Misc. The annotation follows the IOB schema: every token is labeled, marking the beginning and the inside of multi-word entities. The named entity linking (NEL) annotation contains Wikidata Q-identifiers (Qnames).
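Because the IOB labels are attached per token, recovering whole (possibly multi-word) entities means grouping consecutive tokens. The following is a minimal sketch, assuming tags of the form `B-<class>`, `I-<class>`, and `O`; the short class suffixes `PER` and `LOC` in the example are hypothetical and may differ from the actual tag strings in the corpus. The function `iob_to_spans` is illustrative only and is not part of the dataset distribution.

```python
from typing import List, Optional, Tuple


def iob_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert per-token IOB tags into (start, end, label) spans; end is exclusive."""
    spans: List[Tuple[int, int, str]] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        if tag == "O":
            # Outside any entity: close the currently open entity, if there is one.
            if label is not None:
                spans.append((start, i, label))
            start, label = None, None
            continue
        prefix, _, cur = tag.partition("-")
        # A new entity starts at a B- tag, or at an I- tag that does not
        # continue the currently open entity (tolerates IOB1-style input).
        if prefix == "B" or cur != label:
            if label is not None:
                spans.append((start, i, label))
            start, label = i, cur
    # Close an entity that runs to the end of the sentence.
    if label is not None:
        spans.append((start, len(tags), label))
    return spans


if __name__ == "__main__":
    tokens = ["Jan", "Novák", "žije", "v", "Brně", "."]
    tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "O"]  # hypothetical tag strings
    print([" ".join(tokens[s:e]) + "/" + lab for s, e, lab in iob_to_spans(tags)])
    # ['Jan Novák/PER', 'Brně/LOC']
```

Treating a mismatched `I-` tag as the start of a new entity keeps the decoder from silently merging two differently labeled entities if the input is not strictly well formed.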
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

15. SumeCzech-NER

Creator:: Marek, Petr and Müller, Štěpán
Publisher:: Czech Technical University in Prague
Type:: text and corpus
Subject:: SumeCzech, named entity recognition, named entity corpus, and summarization
Language:: Czech
Description:: SumeCzech-NER contains named entity annotations of SumeCzech 1.0 (Straka et al. 2018, SumeCzech: Large Czech News-Based Summarization Dataset).

Format

The dataset is split into four files in JSONL format, with one JSON object on each line of a file. The most important fields of the JSON objects are:
- dataset: train, dev, test, or oodtest
- ne_abstract: list of named entity annotations of the article's abstract
- ne_headline: list of named entity annotations of the article's headline
- ne_text: list of named entity annotations of the article's text
- url: the article's URL, which can be used to match an article across SumeCzech and SumeCzech-NER

Annotations

We used a spaCy NER model trained on the CoNLL-based extended CNEC 2.0. The model achieved an F-score of 78.45 on the test set of that dataset. The annotations are in IOB2 format. The entity types are: Numbers in addresses, Geographical names, Institutions, Media names, Artifact names, Personal names, and Time expressions.

Tokenization

We used the following Python code for tokenization:

```python
from typing import List

from nltk.tokenize import word_tokenize  # requires NLTK's Punkt tokenizer data


def tokenize(text: str) -> List[str]:
    # Surround each punctuation mark with spaces so it is split off as a separate token.
    for mark in ('.', ',', '?', '!', '-', '–', '/'):
        text = text.replace(mark, f' {mark} ')
    tokens = word_tokenize(text)
    return tokens
```
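Since each file holds one JSON object per line and the `url` field links records to SumeCzech, loading the annotations mostly amounts to line-by-line JSON parsing plus a dictionary join. The following is a minimal sketch using only the standard library; it relies only on the `dataset` and `url` fields named above and does not assume anything about the internal structure of the `ne_*` lists. The function names are hypothetical helpers, not part of the distribution.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON object per non-empty line (works on an open file or a list of strings)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def group_by_split(records: Iterable[dict]) -> Dict[str, List[dict]]:
    """Bucket records by their 'dataset' field (train, dev, test, oodtest)."""
    splits: Dict[str, List[dict]] = defaultdict(list)
    for rec in records:
        splits[rec["dataset"]].append(rec)
    return dict(splits)


def join_by_url(ner_records: Iterable[dict], sumeczech_records: Iterable[dict]) -> List[Tuple[dict, dict]]:
    """Pair each SumeCzech-NER record with the SumeCzech record that has the same URL."""
    by_url = {rec["url"]: rec for rec in sumeczech_records}
    return [(ner, by_url[ner["url"]]) for ner in ner_records if ner["url"] in by_url]
```

With a real file, `read_jsonl` can be given the open file object directly, e.g. `group_by_split(read_jsonl(open(path, encoding="utf-8")))`, where `path` stands for whichever of the four files is being loaded. Records without a matching URL are simply skipped by `join_by_url`.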
Rights:: Mozilla Public License 2.0, http://opensource.org/licenses/MPL-2.0, and PUB