Language: Albanian / Rights: PUB - LINDAT/CLARIAH-CZ Catalog Search Results

Creator:: Çano, Erion
Publisher:: University of Vienna
Type:: text and corpus
Subject:: sentiment analysis, under-resourced language, and albanian language
Language:: Albanian
Description:: AlbMoRe is a sentiment analysis corpus of movie reviews in Albanian, consisting of 800 records in CSV format. Each record includes a text review retrieved from IMDb and translated in Albanian by the author. It also contains a 0 negative) or 1 (positive) label added by the author. The corpus is fully balanced, consisting of 400 positive and 400 negative reviews about 67 movies of different genres. AlbMoRe corpus is released under CC-BY license (https://creativecommons.org/licenses/by/4.0/). If using the data, please cite the following paper: Çano Erion. AlbMoRe: A Corpus of Movie Reviews for Sentiment Analysis in Albanian. CoRR, abs/2306.08526, 2023. URL https://arxiv.org/abs/2306.08526.
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), PUB, and http://creativecommons.org/licenses/by/4.0/

Creator:: Çano, Erion
Publisher:: University of Vienna
Type:: text and corpus
Subject:: named entity recognition, under-resourced languages, and albanian language
Language:: Albanian
Description:: AlbNER is a Named Entity Recognition corpus of Wikipedia sentences in Albanian, consisting of 900 records. The sentence tokens are manually labeled complying with the CoNLL-2003 shared task annotation scheme explained at https://aclanthology.org/W03-0419.pdf that uses I-ORG, B-ORG, I-PER, B-PER, I-LOC, B-LOC, I-MISC, B-MISC and O tags. AlbNER data are released under CC-BY license (https://creativecommons.org/licenses/by/4.0/). If using AlbMoRe corpus, please cite the following paper: Çano Erion. AlbNER: A Corpus for Named Entity Recognition in Albanian. CoRR, abs/2309.08741, 2023. URL https://arxiv.org/abs/2309.08741.
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

Creator:: Çano, Erion
Publisher:: University of Vienna
Type:: text and corpus
Subject:: under-resourced language, albanian language, and topic modeling
Language:: Albanian
Description:: AlbNews is a topic modeling corpus of news headlines in Albanian, consisting of 600 labeled samples and 2600 unlabeled samples. Each labeled sample includes a headline text retrieved from Albanian online news portals. It also contains one of the four labels: 'pol' for politics, 'cul' for culture, 'eco' for economy, and 'spo' for sport. Each of the unlabeled samples contain a headline text only.AlbTopic corpus is released under CC-BY 4.0 license (https://creativecommons.org/licenses/by/4.0/). If using the data, please cite the following paper: Çano Erion, Lamaj Dario. AlbNews: A Corpus of Headlines for Topic Modeling in Albanian. CoRR, abs/2402.04028, 2024. URL: https://arxiv.org/abs/2402.04028.
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

Limit your search