dc.contributor.author | Galuščáková, Petra |
dc.contributor.author | Neužilová, Lucie |
dc.date.accessioned | 2018-03-02T07:07:04Z |
dc.date.available | 2018-03-02T07:07:04Z |
dc.date.issued | 2018-02-28 |
dc.identifier.uri | http://hdl.handle.net/11234/1-1952 |
dc.description | This package provides an evaluation framework together with training and test data for semi-automatic recognition of sections of historical diplomatic manuscripts. The data collection consists of 57 Latin charters of 7 different types issued by the Royal Chancellery. The documents were created in the era of John the Blind, King of Bohemia (1310–1346) and Count of Luxembourg. The manuscripts were digitized and transcribed, and the typical sections of medieval charters ('corroboratio', 'datatio', 'dispositio', 'inscriptio', 'intitulatio', 'narratio', and 'publicatio') were manually tagged. The manuscripts also contain additional metadata, such as manually marked named entities and short Czech abstracts. Recognition models are first trained on the manually marked sections of the training documents; the trained model can then be used to recognize the sections in the test data. The parsing script supports methods based on cosine distance, TF-IDF weighting, and an adapted Viterbi algorithm. |
dc.language.iso | lat |
dc.language.iso | ces |
dc.publisher | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
dc.rights | Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ |
dc.source.uri | http://ufal.mff.cuni.cz/Medieval-Charter-Sections-Corpus |
dc.subject | section detection |
dc.subject | segmentation |
dc.subject | information retrieval |
dc.title | Medieval Charter Sections Corpus |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
dc.rights.label | PUB |
has.files | yes |
branding | LINDAT / CLARIAH-CZ |
contact.person | Petra Galuščáková galuscakova@ufal.mff.cuni.cz Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
sponsor | Grantová agentura České republiky GAP103/12/G084 Center for Large Scale Multi-modal Data Interpretation nationalFunds |
sponsor | NSF 1618695 Safely Searching Among Sensitive Content Other |
size.info | 171 kb |
size.info | 57 files |
files.size | 171266 |
files.count | 1 |
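
The description mentions section recognition based on cosine distance and TF-IDF weighting. The following is a minimal sketch of that general idea, not the package's actual parsing script: it pools the training text of each section type into one TF-IDF vector and assigns a new segment to the most similar type. All function names (`tokenize`, `train`, `cosine`, `classify`) and the short Latin snippets are hypothetical illustrations and are not taken from the corpus. The adapted Viterbi step is not shown.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real charters need more care).
    return text.lower().split()


def train(labeled_sections):
    """Build one TF-IDF vector per section label.

    labeled_sections: list of (label, text) pairs.
    Returns (idf, centroids), where centroids maps label -> {term: weight}.
    """
    # Pool all tokens with the same label into one pseudo-document.
    pooled = defaultdict(Counter)
    for label, text in labeled_sections:
        pooled[label].update(tokenize(text))

    # Document frequency: in how many labels does each term occur?
    df = Counter()
    for counts in pooled.values():
        df.update(counts.keys())

    # Smoothed inverse document frequency.
    n = len(pooled)
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}

    centroids = {
        label: {t: c * idf[t] for t, c in counts.items()}
        for label, counts in pooled.items()
    }
    return idf, centroids


def cosine(a, b):
    # Cosine similarity of two sparse vectors; cosine distance is 1 - this value.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, idf, centroids):
    # Terms unseen in training are ignored.
    counts = Counter(tokenize(text))
    vec = {t: c * idf[t] for t, c in counts.items() if t in idf}
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical training snippets, invented for illustration only.
TRAINING = [
    ("intitulatio", "johannes dei gracia boemie rex ac lucemburgensis comes"),
    ("datatio", "datum prage anno domini millesimo trecentesimo"),
    ("corroboratio", "in cuius rei testimonium presentes litteras sigillo nostro"),
]

idf, centroids = train(TRAINING)
print(classify("datum brune anno domini", idf, centroids))
```

Pooling each section type into a single vector keeps the sketch short; comparing against individual training sections instead would be an equally plausible design.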
Files in this item
License category:
Licence: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Publicly Available
- Name
- Historical_Manuscript_Sections_Detection_v2.zip
- Size
- 167.25 KB
- Format
- application/zip
- Description
- Medieval Charter Sections Corpus
- MD5
- 53d7e68d98402166539b0a9aecdc49c8