Sample Use Cases of the CLARIN-D infrastructure

WebMAUS-Basic: Automatic phonetic labelling & segmentation of a single German recording with text

Interviews and conversations are often recorded and later transcribed. The web service WebMAUS Basic, which is available in the CLARIN infrastructure, automatically combines audio recordings and text transcriptions so that the phones, words and audio signals are time-aligned.

Especially relevant for

anybody who has audio signals and transcriptions, for example researchers working in:

linguistics
phonetics
anthropology
ethnology
media studies
educational research
conversation analysis
speech pathology
political science
speech technology

Further information and example >>

WebMAUS-Pipeline: Dealing with long video interviews with interlocutor speech, noise, long silence intervals etc.

Very long recordings (typically up to several hours in video interviews) are difficult to time-align. Therefore, the BAS offers a web service that automatically splits long recordings into so-called chunks, segments them individually, and combines the results into a common file, as demonstrated in this use case.
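
The split, segment and combine idea can be sketched in a few lines. This is only an illustration, not the BAS implementation: the names Segment, split_into_chunks and merge_chunk_segments are hypothetical, and fixed-length chunks are an assumption made to keep the example short (the service's actual chunking strategy is not described here). The key step is that segment times found inside a chunk are relative to that chunk and must be shifted by the chunk's start time before they are written into the common file.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    label: str
    start: float  # seconds
    end: float    # seconds


def split_into_chunks(total_duration, chunk_length):
    """Return (start, end) boundaries that cover the whole recording.

    Assumption: fixed-length chunks, chosen only for illustration.
    """
    chunks = []
    start = 0.0
    while start < total_duration:
        end = min(start + chunk_length, total_duration)
        chunks.append((start, end))
        start = end
    return chunks


def merge_chunk_segments(chunks, segments_per_chunk):
    """Shift chunk-relative segment times onto the recording's timeline."""
    merged = []
    for (chunk_start, _chunk_end), segments in zip(chunks, segments_per_chunk):
        for seg in segments:
            merged.append(
                Segment(seg.label, chunk_start + seg.start, chunk_start + seg.end)
            )
    return merged
```

In this sketch, a word found 0.5 seconds into the second 100-second chunk ends up at 100.5 seconds in the merged result.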

Especially relevant for

Linguistics
Phonetics
Phonology
Speech Technology

Further information and example >>

From text to phonological pronunciation

The orthography of many languages does not encode the precise pronunciation of the corresponding spoken utterance. In such cases, it is useful to be able to automatically transform a text into a phonological encoding (e.g. for speech synthesis). The CLARIN web service G2P provides such a tool for a multitude of languages.

Especially relevant for

Linguistics
Phonetics
Anthropology
Ethnology

Further information and example >>

WebMAUS-Basic: Automatic phonetic labelling and segmentation of multiple Hungarian recordings

Interviews and conversations are often recorded and later transcribed. The web service WebMAUS, which is available in the CLARIN infrastructure, provides tools to combine audio recordings and transcriptions so that the words and audio signals are time-aligned.

Especially relevant for

Linguistics
Phonetics
Speech Technology
Anthropology
Ethnology
Media Science

Further information and example >>

Cross-corpus search and download of recordings of the BAS CLARIN repository

Large collections of speech recordings and annotations contain different sub-corpora. CLARIN provides such datasets for academic research. Access requires authentication as a member of the academic community.

Especially relevant for

Humanities scholars interested in empirical speech data
Developers in speech technology

Further information and example >>

Support of Enhanced Publications in CLARIN: Citation, Archiving and Access to research data

Repositories contain research data that are available under certain conditions. Because repositories are permanent archives, the data they hold can be cited and hence made visible. This allows reusing data, attributing the resource to its creator, and reproducing research results. Access to research data differs from repository to repository.

Especially relevant for

Humanities scholars working with empirical speech data
Developers of speech technology

Further information and example >>

DiaCollo: Collocation analysis with a diachronic perspective

The meaning of a word can be revealed by the context in which it appears. Changes in a word's meaning will therefore often be directly associated with changes in its characteristic combinations (the set of words with which it typically occurs together, its collocates). DiaCollo is a software tool for the discovery, comparison, and interactive visualization of typical word combinations for user-specified target terms.
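
To make the idea of diachronic collocates concrete, here is a minimal sketch. It does not use or reproduce DiaCollo: the function name collocates_by_period, the window size, the grouping into decades, and the use of raw co-occurrence counts are all assumptions chosen for illustration (a real tool would likely rank collocates with a statistical association score rather than raw counts).

```python
from collections import Counter, defaultdict


def collocates_by_period(documents, target, window=2, period_length=10):
    """Count words occurring near `target`, grouped by time period.

    documents: iterable of (year, list_of_tokens) pairs.
    Returns a dict mapping the first year of each period to a Counter
    of collocates found within `window` tokens of the target word.
    """
    counts = defaultdict(Counter)
    for year, tokens in documents:
        period = year - year % period_length  # e.g. 1955 -> 1950
        for i, token in enumerate(tokens):
            if token != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[period][tokens[j]] += 1
    return dict(counts)
```

Comparing the counters of two periods then shows which typical combinations of the target word have appeared or disappeared over time.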

Especially relevant for

Historians
Political scientists
Philologists
Linguists

Further information and example >>

Using Automatic Annotation Tools for Transcription Files

The EXMARaLDA Partitur-Editor enables access to the web services provided by WebLicht and the CLARIN-D infrastructure. WebLicht as a service allows workflows to be defined and later re-used with just one click.

Especially relevant for

anybody who works with the EXMARaLDA Partitur-Editor and wants to automatically annotate his or her files, for example:

Linguists
Anthropologists
Political scientists, especially those working with video and audio files

Further information and example >>

Where do you say

Many linguistic resources contain geographic information, for example the location of the recording or the birthplace of a speaker. The tool Wo sagt man (German for "where do you say") uses external data from the Database of Spoken German (Datenbank für Gesprochenes Deutsch, DGD) and highlights on a map the areas in which an expression has been recorded.

Especially relevant for

Dialectologists
Historians interested in specific regions
Philologists

Further information and example >>

Context Search of Words in Distributed Corpora

The CLARIN Federated Content Search (CLARIN FCS) allows users to search language resources that are archived in different repositories. The aggregator converts the results so that they can be processed further in WebLicht, for example to perform Named Entity Recognition.

Especially relevant for

Linguists
Computational Linguists

Further Information and examples >>

Word-level-based comparative text analysis

Many questions in the Humanities that relate to specific text resources can be reduced to the analysis of vocabulary. Comparison is of particular interest. The aim of this use case is to demonstrate how scholars can answer their own research questions in this way.

Especially relevant for

All scholars who compare texts or vocabulary, including:

Historians
Political scientists
Philologists

Further information and example >>

Content analysis of biographical data supported by computational linguistics

Our web service "Textuelle Emigrationsanalyse" (German for 'textual emigration analysis') offers an example of how facts about emigration that were extracted from large textual corpora using computational linguistic techniques within the CLARIN infrastructure can be explored. The results can be viewed in tabular form, on a map with geographical information, or in a person-centred view.

Especially relevant for

Historians
Political scientists
Literary scholars

Further information and example >>

Automatic markup of personal and place names in textual sources

Books, articles, and manuscripts often contain information about people, geographical locations, and organizations. With this tool, names can automatically be marked and categorized.

Especially relevant for

Historians
Political scientists
Literary scholars

Further information and example >>
