Support of Enhanced Publications in CLARIN: Citation, Archiving and Access to research data

Verifying research results nowadays requires that the underlying data are available as well. Such data are cited in enhanced publications, which requires a unique persistent identifier (PID) based on the Handle System. These identifiers point to data held in repositories, which can be downloaded under certain conditions. Because repositories are permanent archiving installations, the data they hold can be cited and hence made visible. This allows reusing data, attributing the resource to its creator, and reproducing research results. Access to research data differs from repository to repository.

Especially relevant for

  • humanities scholars working with empirical speech data
  • developers from speech technology

Starting point:

in a publication we see the citation (PID) of the speech corpus PD2: 11858/00-1779-0000-001F-88A9-6


to download the version of the cited speech corpus


find the repository, authenticate and download
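
The first part of this solution, finding the repository from the cited PID, can be sketched in Python. This is only a sketch under assumptions: it takes the PID to be a handle of the form prefix/suffix and builds a link through the public handle proxy at hdl.handle.net. The function name handle_to_url is hypothetical, and the guide itself does not say which resolver to use.

```python
from urllib.parse import quote

# Assumption: the public Handle System proxy is used as the resolver.
HANDLE_PROXY = "https://hdl.handle.net/"


def handle_to_url(pid: str) -> str:
    """Turn a handle PID (prefix/suffix) into a URL on the handle proxy."""
    pid = pid.strip()
    # Accept the common "hdl:" notation as well as a bare handle.
    if pid.lower().startswith("hdl:"):
        pid = pid[4:]
    prefix, sep, suffix = pid.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a handle of the form prefix/suffix: {pid!r}")
    # Percent-encode anything in the suffix that is not URL-safe.
    return HANDLE_PROXY + prefix + "/" + quote(suffix)


# The PID cited in the starting point above.
print(handle_to_url("11858/00-1779-0000-001F-88A9-6"))
# prints https://hdl.handle.net/11858/00-1779-0000-001F-88A9-6
```

Opening the resulting link in a browser should redirect to the landing page of the repository that holds the resource; authentication and download then happen there.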

Related CLARIN-D tools and services

Short guide on how to download a speech corpus

  1. Start Chrome or Firefox and type in the following URL:
  2. A so-called landing page in the BAS CLARIN repository appears. Check the displayed metadata to verify that it does indeed describe the requested speech corpus PD2, and note which version it describes. You will see that the 'Access' is 'free for science'. Scrolling down, you will find links to all recording sessions in this corpus.
  3. To download data from the BAS CLARIN repository, you must authenticate yourself as an academic.
    • Click on the link 'Login via your institution' just below the CLARIN logo.
    • A selection page of European academic institutions appears.
    • Find your home institution or, if you do not have an AAI account at your home institution, select ' website account'.
    • A login page of your home institution should appear.
    • Log in with your university account.
    • The BAS CLARIN repository page should appear again; in the line below the BAS logo you should read 'You are authorized to have full access to the BAS repository'.
  4. Scroll down to the PD2 speech corpus and click on the link 'PD2'.
  5. The landing page of the PD2 corpus appears once again, but this time you will find a download section at the bottom. Note: check whether this is the same version of the speech corpus as the one cited; if not, find the link to the correct version.
  6. If it is not already filled in, enter a valid email address to which the repository can send the download information, acknowledge the terms of usage, and click on 'create and download .tar archive'.
  7. After a few seconds the repository should acknowledge the download request with the message 'An email containing the download link will be sent to: (your email address)'.
  8. The download package is assembled in the background, and the download link will be sent to you as soon as the package is available. The email looks something like:
    "The requested tar archive has been created on 2015-12-14T14:54:13.000Z.
    Please follow this download link:
    The archive will be available for 24 hours from now on."
  9. Click on the link or copy and paste the URL [....] into a web browser.
  10. The downloaded *.tgz archive should contain a subdirectory with the same name as the speech corpus. This subdirectory contains the documentation and a separate subdirectory for every recording session.
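
The layout described in the last step can be checked without unpacking the archive. The following Python sketch uses only the standard tarfile module; the function name corpus_layout and the archive file name in the usage comment are hypothetical, and the sketch assumes nothing about how the documentation or session subdirectories are named.

```python
import tarfile
from pathlib import PurePosixPath


def corpus_layout(archive_path, corpus_name):
    """Return (top-level names, subdirectories of the corpus directory)."""
    top_level = set()
    subdirs = set()
    # "r:*" lets tarfile detect the compression (e.g. gzip for *.tgz).
    with tarfile.open(archive_path, "r:*") as archive:
        for member in archive.getmembers():
            parts = PurePosixPath(member.name).parts
            if not parts:
                continue  # skip an entry for the archive root itself
            top_level.add(parts[0])
            if parts[0] != corpus_name or len(parts) < 2:
                continue
            # The second path component is a subdirectory if the member
            # is a directory entry or lies deeper inside that component.
            if member.isdir() or len(parts) > 2:
                subdirs.add(parts[1])
    return top_level, subdirs


# Usage (hypothetical file name for the downloaded archive):
# top, subdirs = corpus_layout("PD2.tgz", "PD2")
# print(sorted(top))      # expected to contain only the corpus name
# print(sorted(subdirs))  # documentation plus one entry per session
```

If the first set contains anything other than the corpus name, or the second set is empty, the download is probably incomplete or belongs to a different resource, and it is worth requesting the archive again before the link expires.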