dc.contributor.author | Estève, Louis Clément |
dc.contributor.author | Savary, Agata |
dc.contributor.author | Lavergne, Thomas |
dc.date.accessioned | 2024-07-12T11:53:50Z |
dc.date.available | 2024-07-12T11:53:50Z |
dc.date.issued | 2024-06-07 |
dc.identifier.uri | http://hdl.handle.net/11234/1-5528 |
dc.description | This resource is a set of 14 vector spaces for single words and Verbal Multiword Expressions (VMWEs), one per language (German, Greek, Basque, French, Irish, Hebrew, Hindi, Italian, Polish, Brazilian Portuguese, Romanian, Swedish, Turkish, Chinese). They were trained with the skip-gram variant of the Word2Vec algorithm on PARSEME raw corpora automatically annotated for morpho-syntax (http://hdl.handle.net/11234/1-3367). These corpora were annotated for VMWEs by Seen2Seen, a rule-based VMWE identifier and one of the leading tools of the PARSEME shared task version 1.2. The tokens of each VMWE were then merged into a single token. The vector space files use the binary format of the original Word2Vec implementation by Mikolov et al. (2013). The distributed files are compressed with xz (.bin.xz). |
dc.language.iso | deu |
dc.language.iso | ell |
dc.language.iso | eus |
dc.language.iso | fra |
dc.language.iso | gle |
dc.language.iso | heb |
dc.language.iso | hin |
dc.language.iso | ita |
dc.language.iso | pol |
dc.language.iso | por |
dc.language.iso | ron |
dc.language.iso | swe |
dc.language.iso | tur |
dc.language.iso | zho |
dc.publisher | Université Paris-Saclay, CNRS, Laboratoire Interdisciplinaire des Sciences du Numérique |
dc.rights | PARSEME Shared Task Raw Corpus Data (v. 1.2) Agreement |
dc.rights.uri | https://lindat.mff.cuni.cz/repository/xmlui/page/licence-mwe-1.2-raw |
dc.source.uri | https://gitlab.com/parseme/corpora |
dc.subject | verbal multiword expressions |
dc.subject | word embeddings |
dc.subject | word2vec |
dc.title | Multilingual static embeddings for Verbal Multiword Expressions trained on PARSEME raw corpora |
dc.type | lexicalConceptualResource |
metashare.ResourceInfo#ContentInfo.mediaType | text |
metashare.ResourceInfo#ContentInfo.detailedType | computationalLexicon |
dc.rights.label | PUB |
has.files | yes |
branding | LINDAT / CLARIAH-CZ |
contact.person | Louis Estève louis.esteve@universite-paris-saclay.fr Université Paris-Saclay, CNRS, Laboratoire Interdisciplinaire des Sciences du Numérique |
sponsor | Université Paris-Saclay Plan blanc PhD grant nationalFunds |
size.info | 44412316 entries |
size.info | 17267 multiWordUnits |
files.size | 16752281832 |
files.count | 22 |
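The description states that the vector spaces use the binary format of the original Word2Vec implementation. The sketch below shows one way such a file could be read with the Python standard library only. It is not taken from the resource's own INSTALL.md or load_vectors.sh, whose contents are not shown here. The function names `read_word2vec_binary` and `load_xz_vectors` are hypothetical. The layout it assumes (an ASCII header line `vocab_size dim`, then for each entry a space-terminated token followed by `dim` 32-bit floats and an optional newline) and the little-endian byte order are assumptions about how the files were written.

```python
import lzma
import struct
from typing import BinaryIO, Dict, Iterator, List, Optional, Tuple


def read_word2vec_binary(stream: BinaryIO) -> Iterator[Tuple[str, List[float]]]:
    """Yield (token, vector) pairs from a stream in word2vec binary format.

    Assumed layout: header line "vocab_size dim", then per entry a token
    terminated by a space, followed by `dim` little-endian 32-bit floats.
    """
    header = stream.readline().decode("utf-8").split()
    vocab_size, dim = int(header[0]), int(header[1])
    vector_bytes = 4 * dim  # one 32-bit float per dimension

    for _ in range(vocab_size):
        token = bytearray()
        while True:
            ch = stream.read(1)
            if not ch:
                raise EOFError("unexpected end of data inside a token")
            if ch == b" ":
                break
            if ch != b"\n":  # skip the newline that may end the previous entry
                token.extend(ch)
        data = stream.read(vector_bytes)
        if len(data) != vector_bytes:
            raise EOFError("unexpected end of data inside a vector")
        vector = list(struct.unpack(f"<{dim}f", data))
        yield token.decode("utf-8", errors="replace"), vector


def load_xz_vectors(path: str, limit: Optional[int] = None) -> Dict[str, List[float]]:
    """Decompress an .xz file on the fly and load up to `limit` entries."""
    vectors: Dict[str, List[float]] = {}
    with lzma.open(path, "rb") as stream:
        for i, (token, vector) in enumerate(read_word2vec_binary(stream)):
            if limit is not None and i >= limit:
                break
            vectors[token] = vector
    return vectors
```

Under these assumptions, a call such as `load_xz_vectors("MWE_S2S_FR_typed_100d_skip-gram.bin.xz", limit=1000)` would load the first 1000 entries without decompressing the whole file to disk. Reading tokens byte by byte keeps the sketch short but is slow for full vocabularies; a dedicated word-embedding library is likely a better fit for real use.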
Files in this item
This item is Publicly Available and licensed under: PARSEME Shared Task Raw Corpus Data (v. 1.2) Agreement
- Name
- INSTALL.md
- Size
- 1.21 KB
- Format
- Unknown
- Description
- Unknown
- MD5
- 23fbf46cd30ccdae44893d3906946e9f
- Name
- MWE_S2S_DE_typed_100d_skip-gram.bin.xz
- Size
- 822.69 MB
- Format
- application/x-xz
- MD5
- c5968c97c89a332d08d4ec100cc794b1
- Name
- MWE_S2S_EL_typed_100d_skip-gram.bin.xz
- Size
- 472.67 MB
- Format
- application/x-xz
- MD5
- 1408874115eb749721791284e4c0ee1e
- Name
- MWE_S2S_EU_typed_100d_skip-gram.bin.xz
- Size
- 145.61 MB
- Format
- application/x-xz
- MD5
- 0d2536382ddda7a92c68d7e6ffde23d4
- Name
- MWE_S2S_FR_typed_100d_skip-gram.bin.xz
- Size
- 1.96 GB
- Format
- application/x-xz
- MD5
- e4ffbd8874d2ca4593036884c4f7fb0b
- Name
- MWE_S2S_GA_typed_100d_skip-gram.bin.xz
- Size
- 197.87 MB
- Format
- application/x-xz
- MD5
- 12804a1e814d9fd4f03608f821c73f08
- Name
- MWE_S2S_HE_typed_100d_skip-gram.bin.xz
- Size
- 117.52 MB
- Format
- application/x-xz
- MD5
- c382b1f98e4d18c1652ef65d14f0a06b
- Name
- MWE_S2S_HI_typed_100d_skip-gram.bin.xz
- Size
- 319.86 MB
- Format
- application/x-xz
- MD5
- b4fb6af09acf2fee1fa4c426f9b60b53
- Name
- MWE_S2S_IT_typed_100d_skip-gram.bin.xz
- Size
- 611.89 MB
- Format
- application/x-xz
- MD5
- 07a883f11b4442f2b0611e1fa29101f4
- Name
- MWE_S2S_PL_typed_100d_skip-gram.bin.xz
- Size
- 3.89 GB
- Format
- application/x-xz
- MD5
- e2c4349c12f2da5fe4c7a918f67c0ec9
- Name
- MWE_S2S_PT_typed_100d_skip-gram.bin.xz
- Size
- 1.51 GB
- Format
- application/x-xz
- MD5
- a40e2effa510c076c46e0eebe8c31bef
- Name
- MWE_S2S_RO_typed_100d_skip-gram.bin.xz
- Size
- 100.33 MB
- Format
- application/x-xz
- MD5
- 335a84cd858ad1854696cbb7cc41dd90
- Name
- MWE_S2S_SV_typed_100d_skip-gram.bin.xz
- Size
- 4.62 GB
- Format
- application/x-xz
- MD5
- 1c80c77c6eb64d9842fb3cae87d8dce8
- Name
- MWE_S2S_TR_typed_100d_skip-gram.bin.xz
- Size
- 237.35 MB
- Format
- application/x-xz
- MD5
- 4f193bcbc0c4d8dee5cbac6b753afe93
- Name
- MWE_S2S_ZH_typed_100d_skip-gram.bin.xz
- Size
- 679.84 MB
- Format
- application/x-xz
- MD5
- c60eee0a4faf6ed54705e6d080335881
- Name
- load_vectors.sh
- Size
- 153 bytes
- Format
- Unknown
- MD5
- 8219c98d3d1999d86660a9d8459395ca
- Name
- md5_checksums.txt
- Size
- 1022 bytes
- Format
- Text file
- MD5
- 0a2210bab3bd4160578317b8f9bd443a
c5968c97c89a332d08d4ec100cc794b1 *MWE_S2S_DE_typed_100d_skip-gram.bin.xz
1408874115eb749721791284e4c0ee1e *MWE_S2S_EL_typed_100d_skip-gram.bin.xz
0d2536382ddda7a92c68d7e6ffde23d4 *MWE_S2S_EU_typed_100d_skip-gram.bin.xz
e4ffbd8874d2ca4593036884c4f7fb0b *MWE_S2S_FR_typed_100d_skip-gram.bin.xz
12804a1e814d9fd4f03608f821c73f08 *MWE_S2S_GA_typed_100d_skip-gram.bin.xz
c382b1f98e4d18c1652ef65d14f0a06b *MWE_S2S_HE_typed_100d_skip-gram.bin.xz
b4fb6af09acf2fee1fa4c426f9b60b53 *MWE_S2S_HI_typed_100d_skip-gram.bin.xz
07a883f11b4442f2b0611e1fa29101f4 *MWE_S2S_IT_typed_100d_skip-gram.bin.xz
e2c4349c12f2da5fe4c7a918f67c0ec9 *MWE_S2S_PL_typed_100d_skip-gram.bin.xz
a40e2effa510c076c46e0eebe8c31bef *MWE_S2S_PT_typed_100d_skip-gram.bin.xz
335a84cd858ad1854696cbb7cc41dd90 *MWE_S2S_RO_typed_100d_skip-gram.bin.xz
1c80c77c6eb64d9842fb3cae87d8dce8 *MWE_S2S_SV_typed_100d_skip-gram.bin.xz
4f193bcbc0c4d8dee5cbac6b753afe93 *MWE_S2S_TR_typed_100d_skip-gram.bin.xz
c60eee0a4faf6ed54705e6d080335881 *MWE_S2S_ZH_typed_ . . .
- Name
- sha3_checksums.txt
- Size
- 1.33 KB
- Format
- Text file
- MD5
- 62c052404a4361b91df2464f21971885
e12d5f4d7539b161098d922ffb8935e9e9d350aec9a0f8aea110aac5 *MWE_S2S_DE_typed_100d_skip-gram.bin.xz
59a224848baee4c956565374935ccb8c53faa1151650ceb9af14f999 *MWE_S2S_EL_typed_100d_skip-gram.bin.xz
6a4f6b423d597db6b6b06942d452eac8aeffa6ef0d6e97d7e88d6c65 *MWE_S2S_EU_typed_100d_skip-gram.bin.xz
297d6cc656d909d62b063af92aada8b222b694e50e54a3e9ac984736 *MWE_S2S_FR_typed_100d_skip-gram.bin.xz
107bff85292d176ef0ac975e9f0625fd8376cba223e7b2f33bb03e7b *MWE_S2S_GA_typed_100d_skip-gram.bin.xz
d4951bd3322a635ab0971be28b91b630109ad9fd4878d8d8700b3984 *MWE_S2S_HE_typed_100d_skip-gram.bin.xz
6be3a825011a0213f33d5bce32e58e6aae26232d4a0dcfd4c266d477 *MWE_S2S_HI_typed_100d_skip-gram.bin.xz
dd436cd35d95395417eff750690b957aee6f906590b3ea238110cf96 *MWE_S2S_IT_typed_100d_skip-gram.bin.xz
c979d05a0d5c2658fc112d7555191b8f0601903a4eed47e83c561cd6 *MWE_S2S_PL_typed_100d_skip-gram.bin.xz
a64110568c0987dd152c80d2833e592523580da4991d763129dfb4f8 *MWE_S2S_PT_typed_100d_skip-gram.bin.xz
903c1f3e3d792625a064310623d317 . . .
- Name
- verify_checksums.sh
- Size
- 351 bytes
- Format
- Unknown
- MD5
- 570a61090da4e672e2d7309e0ab7086a
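The checksum listings above pair each digest with a file name in the `digest *filename` layout. The following sketch shows how downloaded files could be checked against such a listing in Python. It is an illustration only, not the content of verify_checksums.sh, which is not shown here. The function names `parse_checksum_lines`, `file_digest`, and `verify_checksums` are hypothetical. Using `"sha3_224"` for sha3_checksums.txt is an assumption based on the 56-character digests visible in its preview.

```python
import hashlib
from pathlib import Path
from typing import Dict


def parse_checksum_lines(text: str) -> Dict[str, str]:
    """Map file names to expected digests from 'digest *filename' lines."""
    expected: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.strip().split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        digest, name = parts
        if name.startswith("*"):  # '*' marks binary mode in this layout
            name = name[1:]
        expected[name] = digest.lower()
    return expected


def file_digest(path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte files never sit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_checksums(checksum_file, directory=".", algorithm: str = "md5") -> Dict[str, bool]:
    """Return {filename: True/False}; missing files count as failures."""
    listing = Path(checksum_file).read_text(encoding="utf-8")
    results: Dict[str, bool] = {}
    for name, digest in parse_checksum_lines(listing).items():
        path = Path(directory) / name
        results[name] = path.is_file() and file_digest(path, algorithm) == digest
    return results
```

For example, `verify_checksums("md5_checksums.txt")` would report, per file name, whether the copy in the current directory matches the listed MD5 digest. Passing `algorithm="sha3_224"` together with sha3_checksums.txt would do the same for the SHA-3 listing, if the assumption about the digest length holds.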