About including Policies

About CLARINO Bergen Repository and Policies

Hosted and maintained at the University of Bergen by the IT Department, the Department of Linguistic, Literary and Aesthetic Studies, and the University Library.

Code developed and maintained by The Institute of Formal and Applied Linguistics (ÚFAL), Charles University in Prague, Czech Republic. We are using a git fork of clarin-dspace with minor changes.

Mission Statement
Terms of Service
About the Repository
License Agreement and Contracts
Intellectual Property Rights
Privacy Policy
Metadata Policy
Preservation Policy
Citing Data Policy
Contact Information

Mission Statement

The CLARINO Bergen Centre is one of four centres in the Norwegian national CLARINO research infrastructure, which in turn is Norway's contribution to the construction and operation of CLARIN ERIC. The aim of the CLARINO Bergen Centre is in line with that of CLARIN ERIC, i.e. to make existing and future language resources easily accessible for researchers, and to bring eScience to humanities disciplines. Thus the CLARINO Bergen Centre is restricted to the domain of language data and tools; datasets may be any collections or descriptions of language expressions (e.g. corpora, lexicons, termbases) or data about language (e.g. quantitative data obtained from field work or experiments).

The CLARINO Bergen Centre is operated by the University of Bergen (UiB) which is one of the partners in the CLARINO consortium, carrying out the CLARINO project funded by the Research Council of Norway (RCN).

The CLARINO Bergen Centre runs a repository (the CLARINO Bergen Repository) offering download and upload of resources. Furthermore the CLARINO Bergen Centre provides access to deposited resources for online search through the following two online systems for exploration: Corpuscle, a corpus exploration tool, and INESS, the Norwegian Infrastructure for the Exploration of Syntax and Semantics. The CLARINO Bergen Centre also provides COMEDI, an online component metadata editor. All data and services offered by the CLARINO Bergen Centre (the Repository, Corpuscle, INESS and COMEDI) adhere to CLARIN principles for metadata, PIDs, licensing and access.

The CLARINO Bergen Centre helps researchers working in the Humanities and the Cultural and Social Sciences who have a need for managing, preparing, storing, accessing or analyzing research data related to language. Typical producers of language data are researchers in corpus linguistics, translation, grammar, terminology, lexicography, sociolinguistics, psycholinguistics, philology, philosophy, historical linguistics, political sciences, anthropology, etc. Producers of language processing tools are researchers in natural language processing. Consumers of language data and tools in the CLARINO Bergen Centre are often the same groups as producers; they are interested in analyzing language data and using text processing tools available in the CLARIN infrastructure. Many resources in the CLARINO Bergen Centre originate from Norway, but the infrastructure is open to all CLARIN members, contingent upon licensing conditions.

To learn more about CLARIN ERIC, see CLARIN-ShortGuide.pdf.

Terms of Service

To achieve our mission, we set out some ground rules in the Terms of Service. By accessing or using any data or services provided by the Repository, you agree to abide by the terms contained in that document.

Data in the CLARINO repository are made available under the licence attached to the resources. If there is no licence, data are made freely available for access, printing and download for the purposes of non-commercial research or private study. In any publication, users must acknowledge the Deposited Work using a persistent identifier (see Citing Data Policy), its original author(s)/creator(s), and any publisher where applicable. Full items must not be harvested by robots except transiently for full-text indexing or citation analysis. Full items must not be sold commercially without formal permission of the copyright holders, unless this is explicitly granted by the attached licence.

About the Repository

The Repository is like a library for linguistic data and tools.

Search for data and tools and easily download them.
Deposit data and be sure that it is safely stored and that everyone can find it, use it, and cite it correctly (giving you credit).

The Repository performs basic curation of submissions by checking and editing the metadata. It uses the DSpace submission workflow, which foresees several curation steps before a submission is completed. Curation of submitted data involves assessment and revision of metadata and quality checks by experts affiliated with the CLARINO Bergen Centre. Automated tools help the editors verify and validate the metadata and the integrity of the submitted data.

The basic curation process foresees the possibility of returning the submission to the submitter for additional changes before the dataset is published in the repository.

On occasion the repository assists in enhanced curation, for example, conversion to a different format and enhancement of metadata. Customized services may require payment.

The University of Bergen started working on installing the Czech LINDAT repository in August 2013 with help from visiting staff from ÚFAL, sponsored by the CLARIN Mobility action. We make changes for our repository locally while trying to keep up with the frequent code updates from ÚFAL. In the future we will work on implementing the nationally recommended CLARINO CMDI profile.

License Agreement and Contracts

At the moment, CLARINO Bergen Repository distinguishes three types of contracts.

For every deposit, we enter into a standard contract with the submitter, the so-called "Deposition License Agreement", in which we describe our rights and duties, and in which the submitter acknowledges that they have the right to submit the data and gives us (the repository centre) the right to distribute the data on their behalf.
Everyone who downloads data is bound by the licence assigned to the item. In order to download protected data, one has to be authenticated and needs to sign the licence electronically. A list of available licences in our repository can be found here.
Submitters can also assign custom licences to items during the submission workflow.

Intellectual Property Rights

As mentioned in the section License Agreement and Contracts, we require the depositor of data or tools to sign a Deposition License Agreement, which specifies that they have the right to submit the data and gives us (the repository centre) the right to distribute the data on their behalf. This means that depositors are solely responsible for resolving IPR issues before publishing data or tools by submitting them to us.
Anyone who suspects that a dataset or tool in our repository violates Intellectual Property Rights should contact us immediately at our help desk.

Privacy Policy

Please read our Privacy Policy in order to learn how we manage personal data collected by the CLARINO Bergen Repository and services.

Metadata Policy

Deposited content must be accompanied by sufficient metadata describing its content, provenance and formats in order to support its preservation and dissemination. Metadata are freely accessible and are distributed in the public domain (under CC0). However, we reserve the right to be informed about commercial use of metadata from the CLARINO Bergen Repository; please send a description of your use case to the Help Desk.

Preservation Policy

The CLARINO Bergen Centre is committed to curation and the long-term archiving of items deposited in its repository, in order to preserve research results and to promote the replicability of research. We follow best practice guidelines, standards and regulations set forth by CLARIN, OAIS and/or the University of Bergen.

In order to be recognized as a reliable and trustworthy repository, the centre undergoes periodic assessments and certifications by CLARIN and CoreTrustSeal.

To fulfill its commitments, the repository ensures that datasets are ingested and distributed in accordance with their licence (see License Agreement and Contracts). We prefer open data, but where rightsholders impose restrictions, only authenticated users who meet the specified conditions can access the dataset.

The submission workflow, as described in the deposit guidelines, and the assistance of our editors ensure discoverability (by requiring accurate metadata) via our search engine, externally through OAI-PMH, and through page metadata for certain web crawlers. Metadata are freely accessible.
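As an illustration of external discovery through OAI-PMH, the following minimal sketch builds a ListRecords request URL. The `ListRecords` verb and the `oai_dc` metadata prefix come from the OAI-PMH protocol itself; the base URL and the function name `list_records_url` are hypothetical and must be replaced with the values for the repository actually being harvested.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the repository's actual OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai/request"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (illustrative helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional: restrict harvesting to one OAI-PMH set.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


print(list_records_url(BASE_URL))
```

The resulting URL can be fetched with any HTTP client; the response is an XML document containing metadata records only, in keeping with the policy that metadata are freely accessible.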

Various automated procedures, including fixity checks, ensure the integrity of the submitted datasets and the completeness of metadata. At the system level we employ various on-site and off-site backup strategies and hardware monitoring. The datasets are accessible and downloadable online to authorized users.
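A fixity check compares a file's current checksum with the checksum recorded when the file was ingested. The sketch below shows the general idea only; it is not the repository's actual procedure, and the choice of SHA-256 as well as the function names `file_checksum` and `fixity_ok` are assumptions made for illustration.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, recorded_checksum, algorithm="sha256"):
    """True if the file still matches the checksum recorded at ingest."""
    return file_checksum(path, algorithm) == recorded_checksum.lower()
```

Users who download a dataset can apply the same comparison locally whenever a checksum is published alongside the file.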

Each submission receives a Persistent IDentifier (PID) for reference and citation. No changes are permitted in a dataset after it has been published, but a new version can be submitted and linked to the old one through their metadata (see the FAQ for more details). Retraction is also possible.

Through regular participation in CLARINO and CLARIN activities, Open Repositories and various other meetings, schools and conferences, the repository staff is informed of new developments in technologies and/or initiatives.

The various export options offered by the repository system (DSpace) ensure that data and their metadata are not locked in and can be migrated to a different repository system.

The repository encourages the use of standards and formats recommended by CLARIN. The preferred file formats may change over time. In the case of migration to new formats, original items will be kept intact for reproducibility purposes, while new, migrated versions of datasets will be stored as new repository records linked to the old ones through their metadata. Open standards are preferred over proprietary ones, and formats should be well documented, verifiable and proven. Text-based formats are preferred over binary formats where possible, and in the case of digitization of analogue signals, lossless or no compression is recommended.

In the unlikely case of a withdrawal of funding, the repository’s content would be transferred to another CLARIN centre. While the legal and technical aspects of relocating data to another institution are being worked out, the University of Bergen Library offers at least 10 years of hosting for the CLARINO Bergen Centre Repository. During this period the University of Bergen Library, as technical system owner, will provide preservation of and access to the data.

Citing Data Policy

Data users must acknowledge and cite data sources properly in all publications and outputs.
To refer to resources deposited in our repository, use a handle as a persistent identifier instead of a URL.
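As a minimal sketch of how a handle can be turned into a citable, resolvable reference, the example below prefixes a handle with the Handle.Net proxy address. The handle value `00000/example-1` and the function name `handle_to_url` are placeholders for illustration, not real identifiers from the repository.

```python
HANDLE_PROXY = "https://hdl.handle.net/"


def handle_to_url(handle):
    """Return the resolver form of a handle such as 'prefix/suffix'."""
    handle = handle.strip()
    # Accept common written forms and reduce them to the bare handle.
    for lead in ("hdl:", "http://hdl.handle.net/", "https://hdl.handle.net/"):
        if handle.startswith(lead):
            handle = handle[len(lead):]
    prefix, separator, suffix = handle.partition("/")
    if not (prefix and separator and suffix):
        raise ValueError(f"not a handle of the form prefix/suffix: {handle!r}")
    return HANDLE_PROXY + handle


# Placeholder handle, used only to show the resulting form.
print(handle_to_url("hdl:00000/example-1"))
```

Unlike a direct repository URL, the resolver form keeps working if the item's location changes, which is why the handle is the preferred reference in citations.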

Contact Information

Submissions: Juliane.Tiemann@uib.no
Technical: Oyvind.Gjesdal@uib.no
University Library: post@ub.uib.no