Sample Use Cases of the CLARIN-D infrastructure

WebMAUS-Basic: Automatic phonetic labelling and segmentation of a single German recording and its transcription

Interviews and conversations are often recorded and later transcribed. The web service WebMAUS Basic, which is available in the CLARIN infrastructure, automatically combines an audio recording with its text transcription so that phones and words are time-aligned with the audio signal.

Especially relevant for

anybody who has audio recordings and their transcriptions, for example researchers working in:

  • linguistics
  • phonetics
  • anthropology
  • ethnology
  • media studies
  • educational research
  • conversation analysis
  • speech pathology
  • political science
  • speech technology

Further information and example >>

WebMAUS-Pipeline: Handling long video interviews with interlocutor speech, noise, long silent intervals, etc.

Very long recordings (typically up to several hours in video interviews) are difficult to time-align. The Bavarian Archive for Speech Signals (BAS) therefore offers a web service that automatically splits long recordings into so-called chunks, segments each chunk individually, and combines the results into a single file, as demonstrated in this use case.
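The final step of such a pipeline, combining per-chunk results into one file, essentially means shifting each chunk's segment times by the chunk's start offset in the full recording. The sketch below illustrates only that idea and is not the BAS implementation: the `(start, end, label)` tuple format and the functions `split_points` and `merge_chunks` are hypothetical, and the fixed-length splitting is a simplification (a real chunker would place split points in pauses so that no word is cut).

```python
from typing import List, Tuple

# Hypothetical segment format: (start_s, end_s, label),
# with times in seconds relative to the start of the chunk.
Segment = Tuple[float, float, str]


def split_points(total_s: float, chunk_s: float) -> List[float]:
    """Return chunk start times for simple fixed-length chunking (a simplification)."""
    starts = []
    t = 0.0
    while t < total_s:
        starts.append(t)
        t += chunk_s
    return starts


def merge_chunks(chunk_starts: List[float],
                 chunk_segments: List[List[Segment]]) -> List[Segment]:
    """Shift every chunk's segments by the chunk's offset and concatenate them."""
    if len(chunk_starts) != len(chunk_segments):
        raise ValueError("need exactly one start offset per chunk")
    merged = []
    for offset, segments in zip(chunk_starts, chunk_segments):
        for start, end, label in segments:
            merged.append((start + offset, end + offset, label))
    # Sort by start time so the combined file is in chronological order.
    merged.sort(key=lambda seg: seg[0])
    return merged
```

For example, a segment at 0.25 s in a chunk that starts at 600 s ends up at 600.25 s in the merged result.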

Especially relevant for

  • Linguistics
  • Phonetics
  • Phonology
  • Speech Technology

Further information and example >>

 

From text to phonological pronunciation

The orthography of many languages does not encode the precise pronunciation of the corresponding spoken utterance. In such cases, it is useful to transform a text automatically into a phonological transcription (e.g. for speech synthesis). The CLARIN web service G2P (grapheme-to-phoneme conversion) provides such a tool for many languages.

Especially relevant for

  • Linguistics
  • Phonetics
  • Anthropology
  • Ethnology

Further information and example >>

WebMAUS-Basic: Automatic phonetic labelling and segmentation of multiple Hungarian recordings 

Interviews and conversations are often recorded and later transcribed. The web service WebMAUS, which is available in the CLARIN infrastructure, provides tools to combine audio recordings and transcriptions so that the words are time-aligned with the audio signal.

Especially relevant for

  • Linguistics
  • Phonetics
  • Speech Technology
  • Anthropology
  • Ethnology
  • Media Science

Further information and example >>

 

Cross-corpus search and download of recordings of the BAS CLARIN repository 

 

Large collections of speech recordings and annotations often consist of several sub-corpora. CLARIN provides such datasets for academic research; access requires authentication as a member of an academic institution.

Especially relevant for

  • Humanities scholars interested in empirical speech data
  • Developers in speech technology

Further information and example >>

Support of Enhanced Publications in CLARIN: Citation, Archiving and Access to research data 

Repositories hold research data that is available under certain conditions. Because repositories are permanent archives, the data they contain can be cited and thus made visible. This makes it possible to reuse data, attribute a resource to its creator, and reproduce research results. Access conditions differ from repository to repository.

Especially relevant for

  • Humanities scholars working with empirical speech data
  • Developers of speech technology

Further information and example >>

 

DiaCollo: Collocation analysis with a diachronic perspective 

The meaning of a word can be revealed by the context in which it appears. Changes in a word's meaning will therefore often be directly associated with changes in its characteristic combinations (the set of words with which it typically occurs together, its collocates). DiaCollo is a software tool for the discovery, comparison, and interactive visualization of typical word combinations for user-specified target terms.
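The core idea, collecting a target word's neighbours separately for each time slice and then comparing the slices, can be illustrated with a few lines of code. This is a minimal sketch and not DiaCollo's method: it assumes a hypothetical input of `(year, tokens)` pairs, uses raw co-occurrence counts in a fixed window, and groups by decade, whereas DiaCollo works on indexed corpora and ranks collocates with its own association scores.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical input format: (publication year, tokenized sentence).
Doc = Tuple[int, List[str]]


def collocates_by_period(docs: Iterable[Doc], target: str,
                         window: int = 2, period: int = 10) -> Dict[int, Counter]:
    """Count words within +/- `window` tokens of `target`, grouped by time slice."""
    result: Dict[int, Counter] = defaultdict(Counter)
    for year, tokens in docs:
        slice_start = year - year % period  # e.g. 1953 -> 1950 for decades
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    result[slice_start][tokens[j]] += 1
    return dict(result)
```

Comparing the counters of different slices (for instance their `most_common()` entries) shows how the typical company of a word shifts over time.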

Especially relevant for

  • Historians
  • Political scientists
  • Philologists
  • Linguists

Further information and example >>

Using Automatic Annotation Tools for Transcription Files 

The EXMARaLDA Partitur-Editor provides access to the web services offered by WebLicht and the CLARIN-D infrastructure. With WebLicht as a Service (WaaS), workflows can be defined once and later reused with a single click.

Especially relevant for

anybody who works with the EXMARaLDA Partitur-Editor and wants to annotate their files automatically, for example:

  • Linguists
  • Anthropologists
  • Political scientists, especially those working with video and audio files

Further information and example >>

 

Where do you say 

Many linguistic resources contain geographic information, for example the location of a recording or the birthplace of a speaker. The tool Wo sagt man (German for "where do you say") uses data from the Database for Spoken German (Datenbank für Gesprochenes Deutsch, DGD) and highlights on a map the areas in which an expression has been recorded.

Especially relevant for

  • Dialectologists
  • Historians interested in specific regions
  • Philologists

Further information and example >>

 

Context Search of Words in Distributed Corpora

The CLARIN Federated Content Search (CLARIN FCS) allows users to search language resources that are archived in different repositories. The aggregator converts the results so that they can be processed further in WebLicht, for example to perform named entity recognition.
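Federated search works by sending the same query to several repository endpoints and collecting their answers. CLARIN FCS endpoints are based on the SRU protocol, so a request to one endpoint can be sketched as an SRU `searchRetrieve` URL. This is a sketch under stated assumptions: it assumes endpoints that accept SRU 1.2 parameters with a simple CQL term query (as in FCS 1.0), the endpoint URLs are hypothetical placeholders, and the functions only build the request URLs without contacting any server or parsing results.

```python
from typing import Dict, List
from urllib.parse import urlencode


def build_sru_request(endpoint: str, cql_query: str, max_records: int = 10) -> str:
    """Build an SRU 1.2 searchRetrieve URL for a single (hypothetical) endpoint."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,            # CQL query, e.g. a quoted search term
        "maximumRecords": str(max_records),
    }
    return f"{endpoint}?{urlencode(params)}"


def fan_out(endpoints: List[str], cql_query: str) -> Dict[str, str]:
    """Map each endpoint to the request URL an aggregator-like client would send."""
    return {ep: build_sru_request(ep, cql_query) for ep in endpoints}
```

An actual client would then fetch each URL, parse the XML responses, and merge the hits, which is roughly what the aggregator does before handing results to WebLicht.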

Especially relevant for

  • Linguists
  • Computational Linguists

Further information and example >>

  

Word-level-based comparative text analysis 

Many humanities research questions about specific text resources can be reduced to an analysis of vocabulary, and comparison is of central interest here. This use case demonstrates how to answer a research question of one's own in this way.
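One common way to compare the vocabulary of two texts is a keyness score that measures how strongly a word's frequency differs between them. The sketch below uses the log-likelihood statistic (G2), a widely used choice in corpus linguistics; it is an illustration of the general technique, not necessarily the method used in this use case, and `keywords` is a hypothetical helper working on already tokenized texts.

```python
import math
from collections import Counter
from typing import List, Tuple


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 for a word with frequency a in text 1 (size c) and b in text 2 (size d)."""
    total = a + b
    e1 = c * total / (c + d)  # expected frequency in text 1
    e2 = d * total / (c + d)  # expected frequency in text 2
    g2 = 0.0
    if a > 0:  # terms with zero frequency contribute 0
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2.0 * g2


def keywords(tokens1: List[str], tokens2: List[str],
             top: int = 10) -> List[Tuple[str, float]]:
    """Rank words by how strongly their frequency differs between two texts."""
    f1, f2 = Counter(tokens1), Counter(tokens2)
    c, d = len(tokens1), len(tokens2)
    scores = {w: log_likelihood(f1[w], f2[w], c, d) for w in set(f1) | set(f2)}
    # Highest scores first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
```

G2 indicates how strong a difference is but not its direction; comparing the relative frequencies `f1[w] / c` and `f2[w] / d` shows which text overuses a word.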

Especially relevant for

All scholars that compare texts or vocabulary, including:

  • Historians
  • Political scientists
  • Philologists

Further information and example >>

Content analysis of biographical data supported by computational linguistics

Our web service "Textuelle Emigrationsanalyse" (German for "textual emigration analysis") shows how facts about emigration, extracted from large text corpora with computational linguistic techniques within the CLARIN infrastructure, can be explored. The results can be viewed in tabular form, on a map, or per person.

Especially relevant for

  • Historians
  • Political scientists
  • Literary scholars

Further information and example >>

 

Automatic markup of personal and place names in textual sources 

Books, articles, and manuscripts often contain information about people, geographical locations, and organizations. With this tool, such names can be marked up and categorized automatically.

Especially relevant for

  • Historians
  • Political scientists
  • Literary scholars

Further information and example >>