Practical Training Stays for (Early-Stage) Researchers (Hosting)

  • Research stay in any CLARIN-D Centre
  • Contact in case of interest/questions: CLARIN-D Helpdesk

What can we offer?

  • hands-on experience and help related to the usage of language resources and/or language technology for a particular research project
  • financial aid: grant for travel costs, accommodation, ...

Target group

Requirements

  • a concrete research plan
  • a report about your stay
  • a presentation of your work at the centre
  • integration of the outcome of your work into CLARIN-D

Information about the centres

For each centre, the areas of expertise, research interests, and projects (where available) are listed below.
SFS, Tübingen

Areas of expertise:
  • design, collection, validation, and distribution of treebanks
  • design, collection, validation, and distribution of WordNet data
  • design and implementation of web services
  • integration of web services into service-oriented architectures (SOA)

Research interests:
  • corpus linguistics
  • word nets
  • natural language processing
  • web services
  • SOA
BAS, Munich

Areas of expertise:
  • design, collection, validation, and distribution of speech databases
  • integration of speech corpora
  • software development, consulting

Research interests:
  • speech database models
  • automatic annotation of spoken language
  • grapheme-to-phoneme conversion
  • recording and annotation via crowd-sourcing

Projects:
  • ALC (speech database of intoxicated speakers, DFG)
  • PERCY (development of a web-based experiment framework, internal)
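Grapheme-to-phoneme conversion, listed above among BAS's research interests, can be illustrated with a minimal rule-based sketch. The rules and SAMPA-like symbols below are illustrative only; BAS's actual tools use far richer rule sets and statistical models.

```python
# Minimal greedy rule-based grapheme-to-phoneme sketch for a few German
# letter sequences (illustrative; not BAS's actual system).
RULES = [          # ordered longest-first so "sch" wins over "s"
    ("sch", "S"),  # as in "Schule"
    ("ch", "x"),   # simplification: always the velar fricative here
    ("ei", "aI"),
    ("ie", "i:"),
    ("z", "ts"),
    ("w", "v"),
]

def g2p(word: str) -> str:
    """Greedy left-to-right rule application; unmatched letters pass through."""
    word = word.lower()
    out, i = [], 0
    while i < len(word):
        for graph, phon in RULES:
            if word.startswith(graph, i):
                out.append(phon)
                i += len(graph)
                break
        else:
            out.append(word[i])
            i += 1
    return " ".join(out)

print(g2p("Schwein"))  # S v aI n
```

Real systems additionally handle stress assignment, morpheme boundaries, and exceptions, typically with trained models rather than hand-written rules.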
BBAW, Berlin

Areas of expertise:
  • Creation, annotation, and maintenance of text corpora (as reference corpora) for historical and contemporary German
  • Curation and integration of digitized text resources
  • Quality assurance (DTAQ; inter alia via crowdsourcing)
  • Corpus-based lexicography assisted by computational linguistics
  • Linguistic search engine DDC, integrating the results of linguistic text analyses
  • (Automatic) linguistic analysis and annotation of text corpora
  • Handling of non-standard spellings (e.g. historical spellings, computer-mediated communication)
  • TEI/P5 schemata for the structural tagging and cataloguing of large text corpora (historical texts: DTA base format; computer-mediated communication: TEI-CMC)
  • Metadata conversion into different formats (e.g. CMDI)
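The TEI/P5 structural tagging mentioned above can be sketched with a few lines of Python that build a minimal TEI header. The element structure (`TEI`/`teiHeader`/`fileDesc`/`titleStmt`) is standard TEI P5; the DTA base format adds many constraints not reproduced here, and the title/author values are merely example data.

```python
# Sketch: wrapping minimal bibliographic metadata in a TEI/P5 header
# (standard TEI elements only; not the full DTA base format).
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace

def tei_header(title: str, author: str) -> ET.Element:
    """Build <TEI><teiHeader><fileDesc><titleStmt> with title and author."""
    tei = ET.Element(f"{{{TEI_NS}}}TEI")
    header = ET.SubElement(tei, f"{{{TEI_NS}}}teiHeader")
    fdesc = ET.SubElement(header, f"{{{TEI_NS}}}fileDesc")
    tstmt = ET.SubElement(fdesc, f"{{{TEI_NS}}}titleStmt")
    ET.SubElement(tstmt, f"{{{TEI_NS}}}title").text = title
    ET.SubElement(tstmt, f"{{{TEI_NS}}}author").text = author
    return tei

doc = tei_header("Faust. Eine Tragödie.", "Johann Wolfgang von Goethe")
print(ET.tostring(doc, encoding="unicode"))
```

Converting such headers into other metadata formats (e.g. CMDI) is then a matter of mapping these elements onto the target schema.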
IDS, Mannheim

Areas of expertise:
  • design, collection, validation, and distribution of large-scale written corpora
  • design, collection, validation, and distribution of spoken corpora
  • design, collection, validation, and distribution of electronic lexica

Research interests:
  • German linguistics
MPI, Nijmegen

Areas of expertise:
  • Language technology (http://www.mpi.nl/lat)
  • Online archiving of language resources (http://www.mpi.nl/tla)
  • Design, collection, validation, and dissemination of spoken, sign language, and multimodal corpora
  • Curation of a variety of language resources from linguistic and ethnological field work and from endangered languages
  • Corpora for research on language acquisition, comprehension, and production, language genetics, the neurobiology of language, sign language, etc.
  • Design and development of state-of-the-art technology for language annotation, archiving, and access (annotations, lexicons, metadata, etc.)
  • Design and development of infrastructures for language resources
  • Development and integration of audio/video recognizers
  • Design and implementation of standards (http://www.isocat.org/, LMF)

Research interests:
  • Language documentation
  • Experimental setups
  • Brain imaging-based language research
  • Multimedia annotations and gesture research
  • Language acquisition, comprehension, and production; language and genetics research
  • Neurobiology of language research
  • etc.
Projects of the Language Archive:
HZSK, Hamburg

Areas of expertise:
  • (advice on) corpus design and compilation, especially for spoken language data
  • software development (EXMARaLDA)
  • corpus distribution

Research interests (methods and tools for, and distribution of):
  • spoken language data used for conversational research (discourse analysis, conversation analysis, interactional linguistics)
  • sign language corpora and lexicons
  • spoken language data documenting first or second language acquisition and language usage by multilingual individuals (e.g. corpora of child language acquisition, second language acquisition, or language attrition in adults)
  • spoken language data documenting social and regional variation of a particular language (e.g. dialect corpora)
  • written language data allowing the study of multilingualism (e.g. parallel corpora)
ASV, Leipzig

Areas of expertise:
  • web crawling / collecting texts from the web
  • compilation of reference corpora and electronic dictionaries in numerous languages (preprocessing and calculation of statistical measures such as frequencies and sentence and proximity co-occurrences ... on large amounts of data)
  • (linguistic) web services
  • information retrieval and knowledge management
  • Semantic Web
  • text mining

Research interests (in general):
  • data, methods, and applications for the automatic semantic analysis of raw knowledge-base texts
  • tracing quotations (eTraces)
  • see also: areas of expertise
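The statistical measures mentioned above (frequencies and co-occurrence counts) can be sketched over a toy corpus. The three example sentences are invented; the actual ASV pipelines run such computations over web-crawled corpora of millions of sentences.

```python
# Sketch: word frequencies and sentence co-occurrence counts over a toy
# corpus (illustrative data; real pipelines add tokenization, filtering,
# and significance measures on much larger corpora).
from collections import Counter
from itertools import combinations

sentences = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat and a dog",
]

# Word frequencies across the whole corpus.
freq = Counter(w for s in sentences for w in s.split())

# Sentence co-occurrences: count sentences in which a word pair appears.
cooc = Counter()
for s in sentences:
    words = sorted(set(s.split()))      # each pair counted once per sentence
    for a, b in combinations(words, 2):
        cooc[(a, b)] += 1

print(freq["the"])           # 4
print(cooc[("cat", "sat")])  # 1
```

From such counts, association measures (e.g. log-likelihood) are typically derived to separate significant co-occurrences from chance.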
UdS, Saarbrücken

Areas of expertise:
  • compilation and annotation of corpora
  • empirical corpus linguistics

Research interests:
  • language variation and register analysis: synchronic, diachronic
  • multilingual corpora (parallel, comparable)
  • scientific corpora
IMS, Stuttgart

Areas of expertise (computational linguistics):
  • rule-based, machine-learning, and combined approaches to analysis and generation
  • morphology, syntax, semantics, discourse semantics, prosody
  • linguistic corpus annotation, lexical resources, representation and exchange formats, annotation standards
  • multilingual tools for robust tagging, morphological and syntactic analysis (dependency/constituent structure), semantic role labeling, relation extraction, coreference resolution
  • statistical machine translation

Research interests:
  • parsing and generation: linguistically informed data-driven models and techniques
  • tagging, morphological, syntactic, and semantic analysis: model combination, improved coverage and quality, enhanced representations in standard training resources
  • web service integration of a new family of data-driven analysis tools (lemmatizer, dependency parser, semantic role labeler)
  • parametrization of analysis tools for more targeted application (domain adaptation, adjustment to non-canonical language use, etc.)
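The data-driven tagging listed above can be illustrated with the usual most-frequent-tag baseline, the starting point against which trained taggers are measured. The toy training pairs below are invented; this is not IMS's actual tooling, which uses far richer statistical models trained on large annotated corpora.

```python
# Minimal most-frequent-tag baseline tagger (illustrative toy data;
# real taggers model context and are trained on large treebanks).
from collections import Counter, defaultdict

train = [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB"),
         ("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")]

# Count how often each word carries each tag in the training data.
counts = defaultdict(Counter)
for word, t in train:
    counts[word][t] += 1

def tag(word: str) -> str:
    """Return the word's most frequent training tag; default NOUN for unknowns."""
    if word in counts:
        return counts[word].most_common(1)[0][0]
    return "NOUN"

print([tag(w) for w in ["the", "cat", "sleeps"]])  # ['DET', 'NOUN', 'NOUN']
```

Improving on this baseline for unknown words and ambiguous contexts is exactly where the model combination and domain adaptation mentioned above come in.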