Practical Training Stays for (Early-Stage) Researchers (Hosting)

  • Research stay in any CLARIN-D Centre
  • Contact in case of interest/questions: CLARIN-D Helpdesk

What can we offer?

  • hands-on experience and help related to the usage of language resources and/or language technology for a particular research project
  • financial aid: grant for travel costs, accommodation, ...

Target group

Requirements

  • a concrete research plan
  • a report about your stay
  • a presentation of your work at the centre
  • integration of the outcome of your work into CLARIN-D

Information about the centres

For each centre, the areas of expertise, research interests, and projects (where available) are listed below.
SFS, Tübingen

Areas of expertise:
  • design, collection, validation, and distribution of treebanks
  • design, collection, validation, and distribution of WordNet data
  • design and implementation of web services
  • integration of web services into service-oriented architectures (SOA)

Research interests:
  • corpus linguistics
  • word nets
  • natural language processing
  • web services
  • SOA
BAS, Munich

Areas of expertise:
  • design, collection, validation, and distribution of speech databases
  • integration of speech corpora
  • software development, consulting

Research interests:
  • speech database models
  • automatic annotation of spoken language
  • grapheme-to-phoneme conversion
  • recording and annotation via crowd-sourcing

Projects:
  • ALC (speech database of intoxicated speakers, DFG)
  • PERCY (development of a web-based experiment framework, internal)
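Grapheme-to-phoneme conversion, listed above among BAS's research interests, can be illustrated with a minimal rule-based sketch. The rules and SAMPA-like symbols below are illustrative only; BAS's actual tools use far richer rule sets and statistical models.

```python
# Minimal greedy rule-based grapheme-to-phoneme sketch for a few German
# letter sequences (illustrative; not BAS's actual system).
RULES = [          # ordered longest-first so "sch" wins over "s"
    ("sch", "S"),  # as in "Schule"
    ("ch", "x"),   # simplification: always the velar fricative here
    ("ei", "aI"),
    ("ie", "i:"),
    ("z", "ts"),
    ("w", "v"),
]

def g2p(word: str) -> str:
    """Greedy left-to-right rule application; unmatched letters pass through."""
    word = word.lower()
    out, i = [], 0
    while i < len(word):
        for graph, phon in RULES:
            if word.startswith(graph, i):
                out.append(phon)
                i += len(graph)
                break
        else:
            out.append(word[i])
            i += 1
    return " ".join(out)

print(g2p("Schwein"))  # S v aI n
```

Real systems additionally handle stress assignment, morpheme boundaries, and exceptions, typically with trained models rather than hand-written rules.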
BBAW, Berlin

Areas of expertise:
  • Creation, annotation, and maintenance of text corpora (as reference corpora) for historical and contemporary German
  • Curation and integration of digitized text resources
  • Quality assurance (DTAQ; inter alia via crowdsourcing)
  • Corpus-based lexicography assisted by computational linguistics
  • Linguistic search engine DDC, integrating the results of linguistic text analyses
  • (Automatic) linguistic analysis and annotation of text corpora
  • Handling of non-standard spellings (e.g. historical spellings, computer-mediated communication)
  • TEI/P5 schemata for the structural tagging and cataloguing of large text corpora (historical texts: DTA base format; computer-mediated communication: TEI-CMC)
  • Metadata conversion into different formats (e.g. CMDI)
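The TEI/P5 structural tagging mentioned above can be sketched with a few lines of Python that build a minimal TEI header. The element structure (`TEI`/`teiHeader`/`fileDesc`/`titleStmt`) is standard TEI P5; the DTA base format adds many constraints not reproduced here, and the title/author values are merely example data.

```python
# Sketch: wrapping minimal bibliographic metadata in a TEI/P5 header
# (standard TEI elements only; not the full DTA base format).
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace

def tei_header(title: str, author: str) -> ET.Element:
    """Build <TEI><teiHeader><fileDesc><titleStmt> with title and author."""
    tei = ET.Element(f"{{{TEI_NS}}}TEI")
    header = ET.SubElement(tei, f"{{{TEI_NS}}}teiHeader")
    fdesc = ET.SubElement(header, f"{{{TEI_NS}}}fileDesc")
    tstmt = ET.SubElement(fdesc, f"{{{TEI_NS}}}titleStmt")
    ET.SubElement(tstmt, f"{{{TEI_NS}}}title").text = title
    ET.SubElement(tstmt, f"{{{TEI_NS}}}author").text = author
    return tei

doc = tei_header("Faust. Eine Tragödie.", "Johann Wolfgang von Goethe")
print(ET.tostring(doc, encoding="unicode"))
```

Converting such headers into other metadata formats (e.g. CMDI) is then a matter of mapping these elements onto the target schema.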
IDS, Mannheim

Areas of expertise:
  • design, collection, validation, and distribution of large-scale written corpora
  • design, collection, validation, and distribution of spoken corpora
  • design, collection, validation, and distribution of electronic lexica

Research interests:
  • German linguistics
MPI, Nijmegen

Areas of expertise:
  • Language technology (http://www.mpi.nl/lat)
  • Online archiving of language resources (http://www.mpi.nl/tla)
  • Design, collection, validation, and dissemination of spoken, sign language, and multimodal corpora
  • Curation of a variety of language resources from linguistic and ethnological field work and from endangered languages
  • Corpora for research on language acquisition, comprehension, and production, language genetics, the neurobiology of language, sign language, etc.
  • Design and development of state-of-the-art technology for language annotation, archiving, and access (annotations, lexicons, metadata, etc.)
  • Design and development of infrastructures for language resources
  • Development and integration of audio/video recognizers
  • Design and implementation of standards (http://www.isocat.org/, LMF)

Research interests:
  • Language documentation
  • Experimental setups
  • Brain imaging-based language research
  • Multimedia annotations and gesture research
  • Language acquisition, comprehension, and production; language and genetics research
  • Neurobiology of language research
  • etc.
Projects of the Language Archive:
HZSK, Hamburg

Areas of expertise:
  • (advice on) corpus design and compilation, especially for spoken language data
  • software development (EXMARaLDA)
  • corpus distribution

Research interests (methods and tools for, and distribution of):
  • spoken language data used for conversational research (discourse analysis, conversation analysis, interactional linguistics)
  • sign language corpora and lexicons
  • spoken language data documenting first or second language acquisition and language usage by multilingual individuals (e.g. corpora of child language acquisition, second language acquisition, or language attrition in adults)
  • spoken language data documenting social and regional variation of a particular language (e.g. dialect corpora)
  • written language data allowing the study of multilingualism (e.g. parallel corpora)
ASV, Leipzig

Areas of expertise:
  • web crawling / collecting texts from the web
  • compilation of reference corpora and electronic dictionaries in numerous languages (preprocessing and calculation of statistical measures such as frequencies and sentence and proximity co-occurrences ... on large amounts of data)
  • (linguistic) web services
  • information retrieval and knowledge management
  • Semantic Web
  • text mining

Research interests (in general):
  • data, methods, and applications for the automatic semantic analysis of raw knowledge-base texts
  • tracing quotations (eTraces)
  • see also: areas of expertise
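The statistical measures mentioned above (frequencies and co-occurrence counts) can be sketched over a toy corpus. The three example sentences are invented; the actual ASV pipelines run such computations over web-crawled corpora of millions of sentences.

```python
# Sketch: word frequencies and sentence co-occurrence counts over a toy
# corpus (illustrative data; real pipelines add tokenization, filtering,
# and significance measures on much larger corpora).
from collections import Counter
from itertools import combinations

sentences = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat and a dog",
]

# Word frequencies across the whole corpus.
freq = Counter(w for s in sentences for w in s.split())

# Sentence co-occurrences: count sentences in which a word pair appears.
cooc = Counter()
for s in sentences:
    words = sorted(set(s.split()))      # each pair counted once per sentence
    for a, b in combinations(words, 2):
        cooc[(a, b)] += 1

print(freq["the"])           # 4
print(cooc[("cat", "sat")])  # 1
```

From such counts, association measures (e.g. log-likelihood) are typically derived to separate significant co-occurrences from chance.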
UdS, Saarbrücken

Areas of expertise:
  • compilation and annotation of corpora
  • empirical corpus linguistics

Research interests:
  • language variation and register analysis: synchronic, diachronic
  • multilingual corpora (parallel, comparable)
  • scientific corpora
IMS, Stuttgart

Areas of expertise (computational linguistics):
  • rule-based, machine-learning, and combined approaches to analysis and generation
  • morphology, syntax, semantics, discourse semantics, prosody
  • linguistic corpus annotation, lexical resources, representation and exchange formats, annotation standards
  • multilingual tools for robust tagging, morphological and syntactic analysis (dependency/constituent structure), semantic role labeling, relation extraction, coreference resolution
  • statistical machine translation

Research interests:
  • parsing and generation: linguistically informed data-driven models and techniques
  • tagging, morphological, syntactic, and semantic analysis: model combination, improved coverage and quality, enhanced representations in standard training resources
  • web service integration of a new family of data-driven analysis tools (lemmatizer, dependency parser, semantic role labeler)
  • parametrization of analysis tools for more targeted application (domain adaptation, adjustment to non-canonical language use, etc.)
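The data-driven tagging listed above can be illustrated with the usual most-frequent-tag baseline, the starting point against which trained taggers are measured. The toy training pairs below are invented; this is not IMS's actual tooling, which uses far richer statistical models trained on large annotated corpora.

```python
# Minimal most-frequent-tag baseline tagger (illustrative toy data;
# real taggers model context and are trained on large treebanks).
from collections import Counter, defaultdict

train = [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB"),
         ("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")]

# Count how often each word carries each tag in the training data.
counts = defaultdict(Counter)
for word, t in train:
    counts[word][t] += 1

def tag(word: str) -> str:
    """Return the word's most frequent training tag; default NOUN for unknowns."""
    if word in counts:
        return counts[word].most_common(1)[0][0]
    return "NOUN"

print([tag(w) for w in ["the", "cat", "sleeps"]])  # ['DET', 'NOUN', 'NOUN']
```

Improving on this baseline for unknown words and ambiguous contexts is exactly where the model combination and domain adaptation mentioned above come in.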