Best practices
Compiling files into collections in the LINDAT repository
This text is intended for users who want to make their local files available as collections in a public data repository.
We run the LINDAT repository, so we naturally recommend it over other repositories. LINDAT is a digital library designed for language data collections, both written and spoken. In it you can find corpora for linguistic research, collections of court decisions for automating procedures in the legal domain, audio recordings from vehicles for building speech recognition systems, and more. Collections in LINDAT are characterized by three attributes:
- data: machine-readable information
- metadata: data about data
- (meta)data license: the conditions under which the (meta)data can be used
Below are tips and recommendations for compiling your data files for deposit in the LINDAT repository. Your files can come in diverse formats, e.g. Excel spreadsheets, Word documents, audio/video recordings, manuscripts, or online dictionaries.
- Check the documentation describing how the files were created, and update it if needed
- Document the file contents, e.g. add column/row names to Excel spreadsheets
- Associate a license with your data; metadata is made freely available
- Keep the original files, and do not worry about whether they will still be machine-readable in the future
- Think about format conversion, and consider which formats the various search services work with
- Do not hesitate to store the data in multiple formats
- Document the compilation process; it will enrich these Best practices, and it may also become the basis of a nice paper
- Ask your colleagues whether the documentation is clear
See the instructions on how to deposit data in the LINDAT repository.
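Several of the tips above (name your columns, keep the original files, store the data in more than one format) can be illustrated with a small Python sketch that uses only the standard library. It assumes the table has already been read from its source (e.g. an exported spreadsheet) into a list of column names and a list of rows. The function `export_table`, the column names and the file names are hypothetical, and CSV and JSON are just two example target formats, not formats required by LINDAT.

```python
import csv
import json
from pathlib import Path


def export_table(columns, rows, out_dir, stem):
    """Write one table as CSV (with a header row) and as JSON records.

    Hypothetical helper for illustration only. The original source file is
    not read or modified; the exports are written as additional copies.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    csv_path = out / f"{stem}.csv"
    json_path = out / f"{stem}.json"

    # CSV: the first row names the columns, so the file documents its contents.
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        writer.writerows(rows)

    # JSON: one object per row, keyed by column name.
    records = [dict(zip(columns, row)) for row in rows]
    with json_path.open("w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)

    return csv_path, json_path
```

For example, `export_table(["id", "speaker", "duration_s"], rows, "export", "recordings")` writes `export/recordings.csv` and `export/recordings.json`, while the original spreadsheet stays untouched alongside them.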