Preservation Policy

LINDAT/CLARIAH-CZ is committed to the long-term care of items deposited in the repository, to preserving research outputs, and to helping keep research replicable. It strives to adopt current best practice in digital preservation (see the Mission Statement section in the FAQ). We follow best-practice guidelines, standards and regulations set forth by CLARIN, OAIS and Charles University.

To remain a reliable and trustworthy repository, we undergo periodic assessments by CLARIN ERIC and CoreTrustSeal (formerly Data Seal of Approval).

To fulfil our commitments, the repository ensures that datasets are ingested and distributed in accordance with their license (see License Agreement and Contracts in the FAQ). Sometimes (for licenses that do not permit public access) this means only authorized users can access the dataset.

The submission workflow (described in the FAQ: "What is the actual depositing/archiving procedure?") and the work of our editors ensure discoverability by requiring accurate metadata (see the Metadata Policy). Records can be found via our search engine, externally through OAI-PMH, and through page metadata for certain web crawlers. Metadata are freely accessible.
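As an illustration of external discovery through OAI-PMH, the following minimal sketch builds harvesting request URLs. The verbs (Identify, ListRecords, GetRecord) and the oai_dc metadata prefix come from the OAI-PMH protocol itself; the base URL and the helper name oai_request_url are hypothetical placeholders, not the repository's actual endpoint or tooling.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real OAI-PMH base URL is not stated in this policy.
BASE_URL = "https://repository.example.org/oai/request"


def oai_request_url(verb, base_url=BASE_URL, **params):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    # Skip arguments that were not supplied so the URL stays minimal.
    query.update({key: value for key, value in params.items() if value is not None})
    return f"{base_url}?{urlencode(query)}"


# Ask for all records in the Dublin Core format every OAI-PMH server must support.
list_url = oai_request_url("ListRecords", metadataPrefix="oai_dc")
print(list_url)
```

A harvester would fetch such a URL and parse the XML response; because metadata are freely accessible, no authentication step is assumed here.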

The datasets are kept within a private cloud at UFAL, that is, on internal server storage rather than on removable media such as DVDs. Physical access to the server rooms is restricted. The datasets are accessible online.

Various automated procedures, including fixity checks, ensure the integrity of the submitted datasets and the completeness of their metadata.
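A fixity check compares a file's current checksum with the checksum recorded at ingest. The sketch below shows the general idea only; the choice of SHA-256, the manifest layout (path mapped to expected digest) and the function names are assumptions for illustration, not a description of the repository's actual procedure.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def check_fixity(manifest):
    """Return the paths whose file is missing or whose digest differs.

    `manifest` is an assumed structure: {path: expected_hex_digest}.
    """
    failures = []
    for path, expected in manifest.items():
        target = Path(path)
        if not target.is_file() or file_digest(target) != expected:
            failures.append(str(path))
    return failures
```

An empty list from check_fixity means every file still matches its recorded digest; any returned path would be a candidate for restoration from backup.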

Disaster recovery of data is implemented via a multi-level backup scheme:

  • the first level is replication of the virtual systems between the cluster nodes used for the HA (high-availability) regime
  • the second level is weekly dumps of the virtual systems to a shared NFS volume
  • the third level is a weekly off-site differential backup to external hierarchical storage

We monitor the hardware platform as well as the real-time performance and status of our services, and we respond to reported incidents.

The whole software stack is based on open-source software. We aim to use a supported version of each of the components. The various export options and extensive documentation offered by the repository system (DSpace) ensure that data and their metadata are not locked in and can be moved to a different repository system.

We view data and tools as primary research outputs; each submission receives a persistent identifier (PID) for reference, and users are guided to use it. To support reproducibility of results obtained with a dataset, and to keep the meaning of a PID unambiguous, changes to a dataset after it has been published are not permitted. A new submission (a new version) is required instead. The old and the new submissions are linked through their metadata (see the FAQ: "How do I create a new version of a record?" for more details).
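The versioning rule can be pictured as immutable records that point at each other through metadata. The following is a minimal sketch under stated assumptions: the Record class, the replaces / is_replaced_by field names, the in-memory registry and the PID strings are all hypothetical and do not reflect DSpace's internal data model.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Record:
    """A published record; frozen so its fields cannot be modified in place."""
    pid: str
    title: str
    files: Tuple[str, ...]
    replaces: Optional[str] = None        # PID of the previous version, if any
    is_replaced_by: Optional[str] = None  # PID of the next version, if any


def publish_new_version(registry, old_pid, new_pid, files):
    """Register a new version under its own PID and link both records."""
    old = registry[old_pid]
    if old.is_replaced_by is not None:
        raise ValueError(f"{old_pid} already has a newer version")
    new = Record(pid=new_pid, title=old.title, files=tuple(files), replaces=old_pid)
    registry[new_pid] = new
    # Only the linking metadata of the old record changes; its files stay as published.
    registry[old_pid] = replace(old, is_replaced_by=new_pid)
    return new


# Made-up PIDs for illustration only.
registry = {"hdl:00000/1": Record("hdl:00000/1", "Example corpus", ("corpus-v1.txt",))}
publish_new_version(registry, "hdl:00000/1", "hdl:00000/2", ["corpus-v2.txt"])
```

The old PID keeps resolving to exactly the data that was cited, while its metadata tells readers that a newer version exists.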

The repository's policy is never to delete datasets, but in extreme cases it may be legally required to do so. In such a case, the repository provides a tombstone page for the removed dataset/item, and the PID is redirected to this page. The details shown on a tombstone page vary case by case, depending on why the item was removed. Where possible, we provide the title, the authors, and the reason for removal.

The metadata can change: they can be enriched automatically or manually. The helpdesk can be contacted with a metadata update request (e.g., to fix a typo in the metadata).

Through regular participation in CLARIN activities, Open Repositories and various other meetings, schools and conferences, the repository staff stay informed about new developments in technologies and initiatives.

The repository encourages the use of specific file formats as recommended by CLARIN. The guiding principles for format selection are:

  • open standards are preferred over proprietary standards
  • formats should be well-documented, verifiable, and proven
  • text-based formats are preferred over binary formats where possible
  • when digitizing an analogue signal, lossless or no compression is recommended

The preferred file formats will change over time, in which case the repository will make every effort to migrate the data to other formats while keeping the originals intact for reproducibility purposes (i.e., the migrated item will be a new repository record linked to the old one).

In the case of a withdrawal of funding, the repository's content would be transferred to another CLARIN centre. While the legal aspects of relocating data to another institution are being resolved, the hosting institute (UFAL) commits to preserving the data and providing access to it for at least 10 years after the funding of the RI is withdrawn.