Tvorba, funkce a využití Česko-německého paralelního korpusu

Title in English Construction, Functions and Usage of the Czech-German Parallel Corpus
Authors

PELOUŠKOVÁ Hana; KÁŇA Tomáš

Year of publication 2007
Type Article in Proceedings
Conference Gramatika a korpus / Grammar & Corpora 2005
MU Faculty or unit

Faculty of Education

Citation
Field Linguistics
Keywords corpus linguistics; Czech-German parallel corpus; contemporary Czech; contemporary German; contrastive research
Description Until 2001 there was no Czech-German parallel corpus; in that year the authors of this article began building the Czech-German Parallel Corpus (CNPK). The CNPK runs under the Bonito interface (the same as the CNC). It consists of two independent but linked parallel parts with a total of more than 6.5 million text words. The corpus is manually aligned and automatically tagged in both parts (the Ajka tagger in the Czech part; Tree-Tagger in the German part). The CNPK is a general synchronic corpus that tries to cover as many stylistic features as possible: texts are preferably no older than 50 years and are balanced in style, with 50% fiction and 50% non-fiction (scientific texts from potentially all fields, and public, especially journalistic, texts in the wider sense). There are no spoken-language texts. All texts are of either Czech or German origin (no "third language" texts). The CNPK will be partly accessible as part of the multilingual corpus INTERCORP (https://trnka.ff.cuni.cz/ucnk/intercorp/).