Centre | Areas of expertise | Research interests | Projects |
SFS, Tübingen |
- design, collection, validation, and distribution of treebanks,
- design, collection, validation, and distribution of WordNet data,
- design and implementation of web services,
- integration of web services into service-oriented architectures (SOA)
|
- corpus linguistics,
- word nets,
- natural language processing,
- web services,
- SOA
|
|
BAS, Munich |
- design, collection, validation, and distribution of speech databases,
- integration of speech corpora,
- software development, consulting
|
- speech database models,
- automatic annotation of spoken language,
- grapheme-to-phoneme conversion,
- recording and annotation via crowd-sourcing
|
- ALC (speech database of intoxicated speakers, DFG),
- PERCY (development of a web-based experiment framework, internal)
|
BBAW |
- Creation, annotation, and maintenance of text corpora (as reference corpora) for the historical and contemporary German language
- Curation and integration of digitized text resources
- Quality assurance (DTAQ; among other means, via crowdsourcing)
- Corpus-based lexicography assisted by computational linguistics
- Linguistic search engine DDC, integrating the results of linguistic text analyses
|
- (Automatic) linguistic analysis and annotation of text corpora
- Handling of non-standard spellings (e.g. historical spellings, computer-mediated communication)
- TEI P5 schemas for the structural tagging and cataloguing of huge text corpora (historical texts: DTA base format; computer-mediated communication: TEI-CMC)
- Metadata conversion into different formats (e.g. CMDI)
|
|
IDS, Mannheim |
- design, collection, validation, and distribution of large-scale written corpora
- design, collection, validation, and distribution of spoken corpora
- design, collection, validation, and distribution of electronic lexica
|
|
|
MPI, Nijmegen |
- Language technology (http://www.mpi.nl/lat)
- Online archiving of language resources (http://www.mpi.nl/tla)
- Design, collection, validation, and dissemination of spoken, sign language, and multimodal corpora
- Curation of a variety of language resources from linguistic & ethnological field work and endangered languages
- Corpora on language acquisition/comprehension/production research, language genetics research, neurobiology of language research, sign language research, etc.
- Design and development of state-of-the-art language annotation, archiving and accessing technology (annotations, lexicons, metadata, etc.)
- Design and development of infrastructures for language resources
- Development and integration of audio/video recognizers
- Design and implementation of standards (http://www.isocat.org/, LMF)
|
- Language documentation
- Experimental setups
- Brain imaging-based language research
- Multimedia annotations & gesture research
- Language acquisition / comprehension / production, language & genetics research
- Neurobiology of language research
- etc.
|
Projects of the Language Archive:
|
HZSK, Hamburg |
- (Advice on) corpus design and compilation, especially for spoken language data,
- Software development (EXMARaLDA)
- Corpus distribution
|
Methods, tools for and distribution of:
- spoken language data used for conversational research (discourse analysis, conversation analysis, interactional linguistics),
- sign language corpora and lexicons,
- spoken language data documenting first or second language acquisition and language usage by multilingual individuals (e.g. corpus of child language acquisition, acquisition of second languages or language attrition in adults),
- spoken language data documenting social and regional variation of a particular language (e.g. dialect corpora),
- written language data allowing the study of multilingualism (e.g. parallel corpora)
|
|
ASV, Leipzig |
- Web-crawling / collecting texts from the web
- Compilation of reference corpora or electronic dictionaries in numerous languages (preprocessing and calculation of statistical measures such as frequencies, sentence and proximity co-occurrences ... on large amounts of data)
- (Linguistic) web services
- Information Retrieval and knowledge management
- Semantic Web
- Text mining
|
In general:
- data, methods and applications for the automatic semantic analysis of the raw knowledge base texts
- tracing quotations (eTraces)
- see also: areas of expertise
|
|
UdS, Saarbrücken |
- compilation and annotation of corpora;
- empirical corpus linguistics
|
- language variation, register analysis: synchronic, diachronic;
- multilingual corpora (parallel, comparable);
- scientific corpora
|
|
IMS, Stuttgart |
Computational Linguistics:
- rule-based, machine learning and combined approaches to analysis and generation
- morphology, syntax, semantics, discourse semantics, prosody
- linguistic corpus annotation, lexical resources, representation and exchange formats, annotation standards
- multilingual tools for robust tagging, morphological and syntactic analysis (dependency/constituent structure), semantic role labeling, relation extraction, and coreference resolution
- statistical machine translation
|
- parsing and generation: linguistically informed data-driven models and techniques
- tagging, morphological, syntactic and semantic analysis: model combination, improved coverage and quality, enhanced representations in standard training resources
- web service integration of a new family of data-driven analysis tools (lemmatizer, dependency parser, semantic role labeler)
- parametrization of analysis tools for more targeted application (domain adaptation, adjustment to non-canonical language use etc.)
|
|