dc.contributor.author Osuský, Adam
dc.contributor.author Javorský, Dávid
dc.date.accessioned 2024-07-15T14:35:54Z
dc.date.available 2024-07-15T14:35:54Z
dc.date.issued 2024
dc.identifier.uri http://hdl.handle.net/11234/1-5520
dc.description This dataset comprises a corpus of 50 text contexts, each about 60 words in length, sourced from five distinct domains. Each context has been evaluated by multiple annotators who identified and ranked the most important words (up to 10% of each text) according to their perceived significance. The annotators followed specific guidelines to ensure consistency in word selection and ranking. For further details, please refer to the cited source.

---
rankings_task.csv
This CSV contains the contexts to be annotated. It has two columns:
- id: A unique identifier for each task.
- content: The context to be ranked.

---
rankings_ranking.csv
This CSV contains the rankings submitted for the assignments. It has four columns:
- id: A unique identifier for each ranking entry.
- score: The score assigned to the entry.
- word_order: A JSON value holding the positions of the words an annotator selected, in the order the annotator ranked them.
- assignment_id: A reference ID linking to the assignments.

---
rankings_assignment.csv
This CSV tracks the completion status of tasks by users. It has four columns:
- id: A unique identifier for each assignment entry.
- is_completed: A binary indicator (1 for completed, 0 for not completed).
- task_id: A reference ID linking to the tasks.
- user_id: The identifier of the user who should complete the task (rank the words).

---
Known Issues: Each annotator was intended to rank each context only once. However, due to a bug in the deployment of the annotation tool, some entries may be duplicated. Users of this dataset should be aware of this issue and verify the uniqueness of the annotations where necessary.

---
This dataset is part of the work for a bachelor thesis:
OSUSKÝ, Adam. Predicting Word Importance Using Pre-Trained Language Models. Bachelor thesis, supervisor Javorský, Dávid. Prague: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, 2024.
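The relationships between the three files, and the duplication issue noted under Known Issues, could be handled along the lines of the sketch below. It is not part of the dataset or the thesis code. It assumes that each CSV has a header row with the column names listed in the description, that word_order parses as a JSON list, and that a duplicate shows up as more than one ranking for the same (user_id, task_id) pair. The function names and the choice to keep the first ranking in file order are illustrative only.

```python
import csv
import json
from pathlib import Path


def read_rows(path):
    """Read one CSV file into a list of dicts (assumes a header row)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def join_rankings(rankings, assignments):
    """Attach task_id and user_id to each ranking via assignment_id."""
    assignment_by_id = {row["id"]: row for row in assignments}
    joined = []
    for ranking in rankings:
        assignment = assignment_by_id.get(ranking["assignment_id"])
        if assignment is None:
            continue  # skip rankings that reference no known assignment
        joined.append({
            "ranking_id": ranking["id"],
            "task_id": assignment["task_id"],
            "user_id": assignment["user_id"],
            "score": ranking["score"],
            # Assumption: word_order is a JSON list of word positions.
            "word_order": json.loads(ranking["word_order"]),
        })
    return joined


def drop_duplicates(joined):
    """Keep the first ranking seen for each (user_id, task_id) pair."""
    seen = set()
    unique = []
    for row in joined:
        key = (row["user_id"], row["task_id"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def load_unique_rankings(directory):
    """Load, join, and de-duplicate the rankings found in `directory`."""
    directory = Path(directory)
    rankings = read_rows(directory / "rankings_ranking.csv")
    assignments = read_rows(directory / "rankings_assignment.csv")
    return drop_duplicates(join_rankings(rankings, assignments))
```

Calling `load_unique_rankings(".")` in a directory that holds the downloaded files would return one row per annotator and context. Whether the first, the last, or some merged ranking should be kept for a duplicated pair depends on the analysis, so that policy is worth revisiting.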
dc.language.iso eng
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri http://creativecommons.org/licenses/by/4.0/
dc.subject word importance
dc.subject ranking
dc.subject importance ranking
dc.title Word Importance Dataset
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
dc.rights.label PUB
has.files yes
branding LINDAT / CLARIAH-CZ
contact.person Adam Osuský adam.osusky02@gmail.com Charles University, Faculty of Mathematics and Physics
size.info 2861 tokens
files.size 80421
files.count 3


Files in this item (78.54 KB in total)

This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0).

Name                     Size      Format   Description  MD5
rankings_assignment.csv  6.48 KB   Unknown  Unknown      d6bcd5b307765fe814ad854a2f18cb43
rankings_ranking.csv     58.17 KB  Unknown  Unknown      c48cd70ec0f3365f785dc647221604bb
rankings_task.csv        13.88 KB  Unknown  Unknown      ea77936a8a4d8a13e9e272044abc8dcf
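
The MD5 values listed for the three files can be used to check a download. The following is a minimal sketch using Python's standard hashlib module. The helper names and the assumption that the files sit in one local directory under their original names are illustrative, not something the repository prescribes.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed in this item record.
EXPECTED_MD5 = {
    "rankings_assignment.csv": "d6bcd5b307765fe814ad854a2f18cb43",
    "rankings_ranking.csv": "c48cd70ec0f3365f785dc647221604bb",
    "rankings_task.csv": "ea77936a8a4d8a13e9e272044abc8dcf",
}


def md5_of_bytes(data):
    """Return the hexadecimal MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(directory="."):
    """Map each expected file name to True if its checksum matches.

    A missing file is reported as False rather than raising an error.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results
```

`verify_download("path/to/files")` returns a dictionary keyed by file name. MD5 is adequate here for detecting a corrupted or truncated download, but it is not a safeguard against deliberate tampering.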
