
 
dc.contributor.author Çano, Erion
dc.date.accessioned 2020-07-02T12:35:47Z
dc.date.available 2020-07-02T12:35:47Z
dc.date.issued 2020-06-30
dc.identifier.uri http://hdl.handle.net/11234/1-3257
dc.description OAGL is a paper metadata dataset consisting of 17528680 records which comprise various scientific publication attributes like abstracts, titles, keywords, publication years, venues, etc. The last field of each record is the page length of the corresponding publication. Dataset records (samples) are stored as JSON lines in each text file. The data is derived from OAG data collection (https://aminer.org/open-academic-graph) which was released under ODC-BY license. This data (OAGL Paper Metadata Dataset) is released under CC-BY license (https://creativecommons.org/licenses/by/4.0/). If using it, please cite the following paper: Çano Erion, Bojar Ondřej: How Many Pages? Paper Length Prediction from the Metadata. NLPIR 2020, Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval, Seoul, Korea, December 2020.
dc.language.iso eng
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.relation info:eu-repo/grantAgreement/EC/H2020/825460
dc.relation.isreferencedby https://dl.acm.org/doi/10.1145/3443279.3443305
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri http://creativecommons.org/licenses/by/4.0/
dc.subject Paper Length Prediction
dc.subject Scientific Papers Corpus
dc.subject Scientific Publication Metadata
dc.title OAGL Paper Metadata Dataset
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
dc.rights.label PUB
has.files yes
branding LINDAT / CLARIAH-CZ
contact.person Erion Çano cano@ufal.mff.cuni.cz Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
sponsor European Union H2020-ICT-2018-2-825460 ELITR - European Live Translator euFunds info:eu-repo/grantAgreement/EC/H2020/825460
size.info 5 files
size.info 17528680 entries
size.info 22.9 gb
size.info 7.3 gb
files.size 7818553291
files.count 2


Files in this item

License category:
Publicly Available

License: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Distributed under Creative Commons Attribution Required
Name: oagl.zip
Size: 7.28 GB
Format: application/zip
Description: Data
MD5: e2d6dfc1a6d7c76499e4c1c27ad86a89
File preview:
  • oagl
    • val.txt (829 kB)
    • test.txt (1 MB)
    • val-test_bck.txt (274 MB)
    • train_bck.txt (22 GB)
    • train.txt (5 MB)

Name: README.txt
Size: 1.56 KB
Format: Text file
Description: readme
MD5: 8442f638fbb2ab4d45c6c28a846b70b5
File preview:

OAGL Paper Metadata Dataset
===========================

OAGL is a paper metadata dataset consisting
of 17528680 records which comprise various scientific 
publication attributes like abstracts, titles, keywords,
publication years, venues, etc. The last field of each
record is the page length of the corresponding publication. 
Dataset records (samples) are stored as JSON lines in each 
text file. 
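
Since each line of a data file is one JSON record whose last field is
the page length, reading the data can be sketched roughly as below.
This is a minimal sketch, not the authors' loader: it assumes each line
parses to a JSON object (or array), and the field names in the sample
("title", "year", "pages") are hypothetical placeholders, since the
actual field names are not listed here.

```python
import io
import json
from typing import Any, Iterator, TextIO, Tuple


def iter_records(handle: TextIO) -> Iterator[Any]:
    """Yield one parsed record per non-empty line of a JSON Lines stream."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


def split_last_field(record: Any) -> Tuple[Any, Any]:
    """Return (remaining fields, last field) for an object or array record."""
    if isinstance(record, dict):
        # json.loads keeps key order, so the last key is the last field
        # as written in the line.
        last_key = list(record)[-1]
        rest = {k: v for k, v in record.items() if k != last_key}
        return rest, record[last_key]
    return record[:-1], record[-1]


# Hypothetical two-record sample; field names are illustrative only.
sample = io.StringIO(
    '{"title": "A", "year": 2001, "pages": 8}\n'
    '{"title": "B", "year": 2005, "pages": 12}\n'
)
lengths = [split_last_field(r)[1] for r in iter_records(sample)]
print(lengths)  # [8, 12]
```

For the real data, the sample stream would be replaced by an opened
file such as oagl/train.txt (UTF-8 encoding is an assumption). Reading
line by line keeps memory use flat, which matters for the 22 GB file.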

The data is derived from OAG data collection 
(https://aminer.org/open-academic-graph) which was released 
under ODC-BY license. 

This data (OAGL Paper Metadata Dataset) is released under 
CC-BY license (https://creativecommons.org/licenses/by/4.0/). 


Download
--------

This dataset can be downloaded from:
http://hdl.handle.net/11234/1-3257
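
The record page lists an MD5 checksum for oagl.zip
(e2d6dfc1a6d7c76499e4c1c27ad86a89), so a finished download can be
checked before unpacking. The following is a minimal sketch of such a
check; the local file name "oagl.zip" in the current directory and the
1 MiB chunk size are assumptions.

```python
import hashlib
from pathlib import Path

# MD5 listed for oagl.zip in this record.
EXPECTED_MD5 = "e2d6dfc1a6d7c76499e4c1c27ad86a89"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a multi-gigabyte archive is never
        # held in memory all at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = Path("oagl.zip")  # assumed download location
    if archive.exists():
        ok = md5_of_file(archive) == EXPECTED_MD5
        print("checksum OK" if ok else "checksum MISMATCH")
    else:
        print("oagl.zip not found in the current directory")
```

A mismatch usually means the download was truncated or corrupted and
should be repeated.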


Publications
------------

If using it, please cite the following paper:

Çano Erion, Bojar Ondřej: How Many Pages? Paper Length Prediction from the Metadata. 
NLPIR 2020, Proceedings of the 4th International Confe . . .
                                            
