A parallel corpus is a collection of texts together with their
translations into one or more other languages.
The direction of the translation is irrelevant: some texts can be the result of
a translation from language A into language B, and some other texts in the
same corpus may have been translated from language B into language A. In
most cases, the direction of the translation is not known to the user.
Parallel corpora are a useful tool for modern research, since
they provide insight into the nature of translation, and probabilistic
MT systems can be trained on such corpora.
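
As a rough illustration of the definition above, the following Python sketch stores a tiny parallel corpus as aligned text pairs and looks up the other-language side of every pair that contains a query. The names `AlignedPair` and `find_equivalents`, the optional `source_lang` field, and the sample sentences are all assumptions made up for this example; they are not taken from any of the corpora listed below.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlignedPair:
    """One aligned unit: the same content in language A and language B."""
    text_a: str
    text_b: str
    # "a" or "b" if the original language is known, None if it is not.
    source_lang: Optional[str] = None


def find_equivalents(corpus: List[AlignedPair], query: str, lang: str) -> List[str]:
    """Return the other-language side of each pair whose `lang` side contains `query`.

    The match is a case-insensitive substring test; translation direction is ignored.
    """
    q = query.lower()
    hits = []
    for pair in corpus:
        if lang == "a" and q in pair.text_a.lower():
            hits.append(pair.text_b)
        elif lang == "b" and q in pair.text_b.lower():
            hits.append(pair.text_a)
    return hits


# Invented sample data: A = English, B = French.
corpus = [
    AlignedPair("The house is red.", "La maison est rouge.", source_lang="a"),
    AlignedPair("I see the house.", "Je vois la maison.", source_lang="b"),
    AlignedPair("Good morning.", "Bonjour."),  # direction unknown
]

for line in find_equivalents(corpus, "house", "a"):
    print(line)
```

The lookup never consults `source_lang`, which mirrors the point that the direction of translation does not matter for this kind of use and is often unknown anyway.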
Examples of parallel corpora (or texts) include:
The Canadian Hansard proceedings in English and French.
(Site where you type a word/phrase in
one language and get the equivalent in the other).
The English-Turkish Aligned Parallel Corpora
The PENDANT Project, Gothenburg, Sweden.
Search interface for language pairs of Swedish and English,
German, or French. The texts are mostly official EU documents and some press releases
from Volvo and Skanska.
The yearly declaration of the Swedish government issued in Swedish, English, French,
German, and Spanish.
WHO bilingual documents
Aligned texts English-French, English-Spanish
English-Norwegian Parallel Corpus: 50 original text extracts (10,000-15,000
words) and their translations to and from Norwegian/English. Demo search available.
More information on parallel corpora can be found on Michael Barlow's Corpus