dc.contributor.author | Galuščáková, Petra |
dc.contributor.author | Devaud, Romain |
dc.contributor.author | Gonzalez-Saez, Gabriela |
dc.contributor.author | Mulhem, Philippe |
dc.contributor.author | Goeuriot, Lorraine |
dc.contributor.author | Piroi, Florina |
dc.contributor.author | Popel, Martin |
dc.date.accessioned | 2023-02-21T09:48:04Z |
dc.date.available | 2023-02-21T09:48:04Z |
dc.date.issued | 2023-02-16 |
dc.identifier.uri | http://hdl.handle.net/11234/1-5010 |
dc.description | The collection consists of queries and documents provided by the Qwant search engine (https://www.qwant.com). The queries, which were issued by Qwant users, are based on selected trending topics. The documents in the collection were selected with respect to these queries using the Qwant click model. In addition to the documents selected using this model, the collection also contains randomly selected documents from the Qwant index. All the data were collected during June 2022. In total, the collection contains 672 train queries, with 9,656 corresponding assessments from the Qwant click model, and 98 held-out queries. The document set consists of 1,570,734 downloaded, cleaned, and filtered web pages. In addition to the original French versions, the collection also contains English translations of the web pages and queries. The collection serves as the official training collection for the 2023 LongEval Information Retrieval Lab (https://clef-longeval.github.io/) organised at CLEF. |
dc.language.iso | fra |
dc.language.iso | eng |
dc.publisher | Université Grenoble Alpes |
dc.publisher | Qwant |
dc.publisher | Research Studios Austria |
dc.publisher | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
dc.rights | Qwant LongEval Attribution-NonCommercial-ShareAlike License |
dc.rights.uri | https://lindat.mff.cuni.cz/repository/xmlui/page/Qwant_LongEval_BY-NC-SA_License |
dc.source.uri | https://clef-longeval.github.io/ |
dc.subject | information retrieval |
dc.subject | parallel corpus |
dc.subject | search |
dc.subject | automatic evaluation |
dc.title | LongEval Train Collection |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
dc.rights.label | PUB |
has.files | yes |
branding | LINDAT / CLARIAH-CZ |
contact.person | Petra Galuščáková galuscakova@gmail.com Université Grenoble Alpes |
sponsor | Ministry of Education, Youth and Sports of the Czech Republic LM2018101 LINDAT/CLARIAH-CZ: Digital Research Infrastructure for Language Technologies, Arts and Humanities nationalFunds |
sponsor | Ministry of Education, Youth and Sports of the Czech Republic LM2023062 LINDAT/CLARIAH-CZ: Digital Research Infrastructure for Language Technologies, Arts and Humanities nationalFunds |
sponsor | French Agence Nationale de la Recherche ANR-19-CE23-0029 Kodicare Other |
sponsor | Austrian Science Fund I4471-N Kodicare Other |
size.info | 1570734 articles |
size.info | 672 other |
files.size | 12561428096 |
files.count | 1 |
Files in this item
License category: Publicly Available
License: Qwant LongEval Attribution-NonCommercial-ShareAlike License
- Name: longeval-train-v2.tgz
- Size: 11.7 GB
- Format: application/x-gzip
- Description: data
- MD5: e34cf8b5e9b2de98628759bbd621a4ca
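Because the archive is large, it may be worth checking a downloaded copy against the byte count and MD5 checksum listed in this record before unpacking it. The following is a minimal sketch of such a check, not an official tool of the repository. The expected values are taken from the `files.size` and MD5 fields above; the local path `longeval-train-v2.tgz` in the current directory and the helper names `md5_of_file` and `verify_archive` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# Values copied from this record (files.size and the MD5 field).
EXPECTED_MD5 = "e34cf8b5e9b2de98628759bbd621a4ca"
EXPECTED_SIZE = 12561428096  # bytes


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Check the size first (cheap), then the MD5 digest (slow for large files)."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False
    return md5_of_file(path) == expected_md5.lower()


if __name__ == "__main__":
    # Hypothetical local path: assumes the archive sits in the current directory.
    archive = Path("longeval-train-v2.tgz")
    if archive.exists():
        print("OK" if verify_archive(archive) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use constant regardless of archive size, and comparing the size first avoids hashing a file that is obviously truncated.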