CLARIN-D Blog

Data Management in the Humanities: Progress in the Standardisation of Metadata Formats for Language-related Research Data

In July 2019, the International Organization for Standardization (ISO) published a new standard that makes a significant and lasting contribution to describing language-related research data for archiving. ISO 24622-2, "Component Metadata Specification Language", standardises the procedure for defining description schemas tailored to the requirements of specific types of research data.

When research data is archived, information about the data is collected and published so that other researchers can find the data and judge its relevance from the description. Potential users can also get an idea of how to incorporate the data into their own research and use it to answer their own research questions. These descriptions are called metadata.
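To make the idea concrete, here is a minimal sketch of what such a description could look like as a data structure, and how it lets others find data. The class `MetadataRecord`, its fields, and the helper `find_records` are hypothetical illustrations; they are not taken from ISO 24622-2 or from any existing metadata format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MetadataRecord:
    """A hypothetical, simplified description of one archived dataset."""
    title: str
    language: str        # assumed here to be an ISO 639-3 code
    resource_type: str   # e.g. "corpus" or "experiment"
    keywords: List[str] = field(default_factory=list)


def find_records(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [
        record for record in records
        if all(getattr(record, name) == value for name, value in criteria.items())
    ]


# Two invented example descriptions, not real archive entries.
records = [
    MetadataRecord("Newspaper text collection", "deu", "corpus", ["grammar"]),
    MetadataRecord("Reading-time experiment", "eng", "experiment", ["psycholinguistics"]),
]

# A researcher looking for German corpora searches the descriptions, not the data.
german_corpora = find_records(records, language="deu", resource_type="corpus")
print([record.title for record in german_corpora])
```

The search only ever touches the descriptions, which is why their quality determines whether archived data can be found and assessed at all.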

Experience shows that, because types of research data and research questions differ, it is very difficult to find an all-encompassing, universal pattern, or schema, for creating these descriptions. For instance, psychological experiments (number of test persons, research question, free and bound variables, recording system, etc.) are described differently from collections of texts for grammatical investigations or for the creation of word embeddings (number of "words", language, length of texts, source of texts, age of texts, authors, etc.). Even libraries, despite their long tradition, use a variety of metadata formats, e.g. Dublin Core, MARC 21, PREMIS, MODS. Many metadata schemas have some fields, also called data categories, that resemble each other, as well as some areas where they differ. In order to enable both an adequate description of research data and the utilisation of similar metadata