dc.contributor.author | Çano, Erion |
dc.date.accessioned | 2020-07-02T12:35:47Z |
dc.date.available | 2020-07-02T12:35:47Z |
dc.date.issued | 2020-06-30 |
dc.identifier.uri | http://hdl.handle.net/11234/1-3257 |
dc.description | OAGL is a paper metadata dataset consisting of 17528680 records which comprise various scientific publication attributes like abstracts, titles, keywords, publication years, venues, etc. The last field of each record is the page length of the corresponding publication. Dataset records (samples) are stored as JSON lines in each text file. The data is derived from the OAG data collection (https://aminer.org/open-academic-graph), which was released under the ODC-BY license. This data (OAGL Paper Metadata Dataset) is released under the CC-BY license (https://creativecommons.org/licenses/by/4.0/). If using it, please cite the following paper: Çano Erion, Bojar Ondřej: How Many Pages? Paper Length Prediction from the Metadata. NLPIR 2020, Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval, Seoul, Korea, December 2020. |
dc.language.iso | eng |
dc.publisher | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
dc.relation | info:eu-repo/grantAgreement/EC/H2020/825460 |
dc.relation.isreferencedby | https://dl.acm.org/doi/10.1145/3443279.3443305 |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ |
dc.subject | Paper Length Prediction |
dc.subject | Scientific Papers Corpus |
dc.subject | Scientific Publication Metadata |
dc.title | OAGL Paper Metadata Dataset |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
dc.rights.label | PUB |
has.files | yes |
branding | LINDAT / CLARIAH-CZ |
contact.person | Erion Çano cano@ufal.mff.cuni.cz Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
sponsor | European Union H2020-ICT-2018-2-825460 ELITR - European Live Translator euFunds info:eu-repo/grantAgreement/EC/H2020/825460 |
size.info | 5 files |
size.info | 17528680 entries |
size.info | 22.9 gb |
size.info | 7.3 gb |
files.size | 7818553291 |
files.count | 2 |
Files in this item
License category:
License: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Publicly Available
- Name: oagl.zip
- Size: 7.28 GB
- Format: application/zip
- Description: Data
- MD5: e2d6dfc1a6d7c76499e4c1c27ad86a89
- oagl
  - val.txt (829 kB)
  - test.txt (1 MB)
  - val-test_bck.txt (274 MB)
  - train_bck.txt (22 GB)
  - train.txt (5 MB)
- Name: README.txt
- Size: 1.56 KB
- Format: Text file
- Description: readme
- MD5: 8442f638fbb2ab4d45c6c28a846b70b5
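The MD5 values listed for each file can be used to check a download for corruption. The following is a minimal sketch, assuming the archive has been saved locally; the function names `md5_of_file` and `matches_listed_md5` are hypothetical helpers, not part of the dataset or the repository.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that a multi-gigabyte archive does not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, expected_hex):
    """Compare a file's digest with the value listed in the record."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_listed_md5("oagl.zip", "e2d6dfc1a6d7c76499e4c1c27ad86a89")` should return `True` for an intact copy of the archive, assuming it was saved under that name in the current directory.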
OAGL Paper Metadata Dataset
===========================

OAGL is a paper metadata dataset consisting of 17528680 records which comprise various scientific publication attributes like abstracts, titles, keywords, publication years, venues, etc. The last field of each record is the page length of the corresponding publication. Dataset records (samples) are stored as JSON lines in each text file. The data is derived from the OAG data collection (https://aminer.org/open-academic-graph), which was released under the ODC-BY license. This data (OAGL Paper Metadata Dataset) is released under the CC-BY license (https://creativecommons.org/licenses/by/4.0/).

Download
--------

This dataset can be downloaded from: http://hdl.handle.net/11234/1-3257

Publications
------------

If using it, please cite the following paper:
Çano Erion, Bojar Ondřej: How Many Pages? Paper Length Prediction from the Metadata. NLPIR 2020, Proceedings of the 4th International Confe . . .
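Since the records are described as JSON lines with the page length as the last field, reading them might look like the sketch below. It assumes that each non-empty line is a single JSON object and that the files are UTF-8 encoded; the record does not state either, nor does it name the fields, so the helpers `iter_records` and `split_last_field` are hypothetical and pick the target by position rather than by key name.

```python
import json


def iter_records(path):
    """Yield one parsed record per non-empty line of a JSON-lines file.
    Assumption: UTF-8 text, one JSON object per line."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def split_last_field(record):
    """Split a record into (metadata fields, value of the last field).
    json.loads keeps keys in the order they appear in the line, so the
    last key corresponds to the last field, described as the page length."""
    keys = list(record)
    features = {key: record[key] for key in keys[:-1]}
    return features, record[keys[-1]]
```

Because `iter_records` is a generator, it can stream the larger files line by line instead of loading them whole, for instance `for features, pages in map(split_last_field, iter_records("oagl/val.txt")): ...`, where the path is an assumption about where the archive was unpacked.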