Files in this item

License category:
Publicly Available

License: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Distributed under Creative Commons Attribution Required
Name: oagl.zip
Size: 7.28 GB
Format: application/zip
Description: Data
MD5:
e2d6dfc1a6d7c76499e4c1c27ad86a89
File preview:
  • oagl
    • val.txt (829 kB)
    • test.txt (1 MB)
    • val-test_bck.txt (274 MB)
    • train_bck.txt (22 GB)
    • train.txt (5 MB)
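
The MD5 value listed for oagl.zip can be used to check that a download is intact. The following is a minimal sketch, assuming the archive has been saved locally; the helper names `md5_of_file` and `matches_expected` and the chunk size are illustrative choices, not part of the dataset distribution. The file is read in chunks because the archive is several gigabytes in size.

```python
import hashlib

# MD5 checksum published on this page for oagl.zip.
EXPECTED_MD5 = "e2d6dfc1a6d7c76499e4c1c27ad86a89"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Compare a file's MD5 digest with the expected hex string."""
    return md5_of_file(path) == expected.lower()
```

A call such as `matches_expected("oagl.zip")` (the local path is an assumption) returns `True` only when the computed digest equals the published one.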
Name: README.txt
Size: 1.56 KB
Format: Text file
Description: readme
MD5:
8442f638fbb2ab4d45c6c28a846b70b5
File preview:
OAGL Paper Metadata Dataset
===========================

OAGL is a paper metadata dataset consisting
of 17528680 records which comprise various scientific 
publication attributes like abstracts, titles, keywords,
publication years, venues, etc. The last field of each
record is the page length of the corresponding publication. 
Dataset records (samples) are stored as JSON lines in each 
text file. 
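
Since each line of a data file holds one JSON record, the files can be read line by line without loading them whole. The sketch below assumes each line is a JSON object and that "last field" means the last key in the object's order; the field names in the sample ("title", "year", "pages") are hypothetical and are not taken from the dataset.

```python
import io
import json
from typing import Any, Dict, Iterable, Iterator, Tuple


def iter_records(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per non-empty JSON line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def last_field(record: Dict[str, Any]) -> Tuple[str, Any]:
    """Return the (key, value) pair that appears last in the record."""
    # json.loads keeps keys in the order they appear on the line.
    return list(record.items())[-1]


# Hypothetical two-record sample standing in for a real data file.
sample = io.StringIO(
    '{"title": "A", "year": 2019, "pages": 8}\n'
    '{"title": "B", "year": 2020, "pages": 12}\n'
)
records = list(iter_records(sample))
page_lengths = [last_field(r)[1] for r in records]
print(page_lengths)  # [8, 12]
```

For a real file, an open file handle can be passed in place of the sample, for example `open("oagl/val.txt", encoding="utf-8")`, where the path and encoding are assumptions.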

The data is derived from the OAG data collection 
(https://aminer.org/open-academic-graph), which was released 
under the ODC-BY license. 

This data (OAGL Paper Metadata Dataset) is released under 
the CC-BY license (https://creativecommons.org/licenses/by/4.0/). 


Download
--------

This dataset can be downloaded from:
http://hdl.handle.net/11234/1-3257


Publications
------------

If you use this dataset, please cite the following paper:

Çano Erion, Bojar Ondřej: How Many Pages? Paper Length Prediction from the Metadata. 
NLPIR 2020, Proceedings of the 4th International Confe . . .