KNAW

Research

Machine Translation When Exact Pattern Match Fails

Title: Machine Translation When Exact Pattern Match Fails
Period: 09 / 2010 - unknown
Status: Current
Data Supplier: NWO

Abstract

Translating between languages is a rare expertise attained through training and experience. Apart from ambiguity, translation is complicated by the divergence among languages in morphology, word order, and the idiomatic ways of expressing concepts. The field of Machine Translation (MT) is concerned with building effective models of automatic translation. With English in mind as the prototypical language, current state-of-the-art statistical MT models employ a "translation dictionary/table" that consists of fixed pattern pairs (called phrase pairs), harvested from a bilingual corpus of example translations (a parallel corpus). A phrase pair is used during translation only if its source side exactly matches a contiguous subsequence of the input. However, by assuming English as the prototype language, a major challenge is set aside: the notorious intra-language variation in morphological forms and the freer word order that is typical of many languages, e.g., Polish, Greek, Arabic, and Finnish. This proposal addresses machine translation from and into languages with substantial morpho-syntactic variation. Instead of a mere dictionary, it proposes a model that works with a probabilistic synchronous grammar over morpho-syntactic representations extracted from a parallel corpus. In addition to the pattern pairs found in the training data, this grammar can generate morphological and syntactic variants that are not currently available to state-of-the-art models. The probability estimates over variants are such that the least-deviating variants are preferred. Instead of translating by mere exact match, our model translates by the most likely variant.
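
The contrast between exact matching and variant-based matching can be illustrated with a minimal Python sketch. It is not the model proposed here: the phrase table, the Finnish entries, the function names, and the threshold are all hypothetical, and a plain string-similarity score stands in for the probability that a synchronous grammar would assign to a variant.

```python
from difflib import SequenceMatcher

# Hypothetical toy phrase table (source phrase -> target phrase).
# The entries are illustrative only and are not taken from the project.
PHRASE_TABLE = {
    ("talo",): "house",
    ("talossa",): "in the house",
    ("iso", "talo"): "big house",
}


def exact_match(tokens):
    """Return the target phrase only if the source side matches exactly."""
    return PHRASE_TABLE.get(tuple(tokens))


def similarity(a, b):
    """Surface similarity in [0, 1]; a crude stand-in for a variant probability."""
    return SequenceMatcher(None, " ".join(a), " ".join(b)).ratio()


def best_variant(tokens, threshold=0.6):
    """Return (source phrase, target phrase, score) for the least-deviating
    table entry of the same length, or None if no entry reaches the threshold."""
    tokens = tuple(tokens)
    scored = [
        (similarity(tokens, src), src)
        for src in PHRASE_TABLE
        if len(src) == len(tokens)
    ]
    if not scored:
        return None
    score, src = max(scored)
    if score < threshold:
        return None
    return src, PHRASE_TABLE[src], score


if __name__ == "__main__":
    unseen = ["talossani"]        # inflected form absent from the table
    print(exact_match(unseen))    # exact matching finds nothing
    print(best_variant(unseen))   # falls back to the closest known form
```

In this sketch the unseen form "talossani" has no exact match, so the lookup falls back to the closest stored entry, "talossa". The fallback only reuses an existing translation; the proposal goes further by generating morphological and syntactic variants and ranking them by probability.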

Related people

Project leader: Dr. K. Sima'an