Examples like the ones in ( ) below are familiar to translators, but the examples of colours ( c), and the Japanese examples in ( d) are particularly striking. The latter are striking because they show how languages can differ not only with respect to the fineness or `granularity' of the distinctions they make, but also with respect to the basis for the distinction: English chooses different verbs for the action/event of putting on, and the action/state of wearing. Japanese does not make this distinction, but differentiates according to the object that is worn. In the case of English to Japanese, a fairly simple test on the semantics of the NPs that accompany a verb may be sufficient to decide on the right translation. Some of the colour examples are similar, but more generally, investigation of colour vocabulary indicates that languages actually carve up the spectrum in rather different ways, and that deciding on the best translation may require knowledge that goes well beyond what is in the text, and may even be undecidable. In this sense, the translation of colour terminology begins to resemble the translation of terms for cultural artifacts (e.g. words like English cottage, Russian dacha, French château, etc., for which no adequate translation exists, and for which the human translator must decide between straight borrowing, neologism, and providing an explanation). In this area, translation is a genuinely creative act, which is well beyond the capacity of current computers.
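The `fairly simple test on the semantics of the NPs' mentioned above for the English to Japanese direction might look something like the following. This is only a minimal sketch: the semantic class names, the tiny noun lexicon and the function name `translate_wear` are assumptions made for illustration, and the verb choices shown are the usual textbook ones rather than a complete account of Japanese usage.

```python
# Hypothetical semantic classes for the OBJECT noun. A real system would
# read these from its lexicon rather than from a hand-written table.
SEMANTIC_CLASS = {
    "shirt": "upper_body",
    "trousers": "lower_body",
    "shoes": "lower_body",
    "hat": "head",
    "glasses": "eyewear",
    "gloves": "hands",
}

# Japanese verb selected by the class of the thing that is worn.
WEAR_VERB = {
    "upper_body": "kiru",
    "lower_body": "haku",
    "head": "kaburu",
    "eyewear": "kakeru",
    "hands": "hameru",
}


def translate_wear(object_noun):
    """Pick a Japanese verb for English 'wear' or 'put on' from its OBJECT.

    Returns None when the noun is unknown, so the caller can fall back
    to some other strategy instead of guessing.
    """
    sem_class = SEMANTIC_CLASS.get(object_noun)
    return WEAR_VERB.get(sem_class)


if __name__ == "__main__":
    for noun in ("shoes", "hat", "umbrella"):
        print(noun, "->", translate_wear(noun))
```

Note that the test looks only at the OBJECT, not at whether the English verb was wear or put on, which reflects the point that the two languages draw the distinction on a different basis.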
Calling cases such as those above lexical mismatches is not controversial. However, when one turns to cases of structural mismatch, classification is not so easy. This is because one may often think that the reason one language uses one construction where another language uses a different one lies in the stock of lexical items the two languages have. Thus, the distinction is to some extent a matter of taste and convenience.
A particularly obvious example of this involves problems arising from what are sometimes called lexical holes --- that is, cases where one language has to use a phrase to express what another language expresses in a single word. Examples of this include the `hole' that exists in English with respect to French ignorer (`to not know', `to be ignorant of'), and se suicider (`to suicide', i.e. `to commit suicide', `to kill oneself'). The problems raised by such lexical holes have a certain similarity to those raised by idioms: in both cases, one has phrases translating as single words. We will therefore postpone discussion of these until Section .
One kind of structural mismatch occurs where two languages use the same construction for different purposes, or use different constructions for what appears to be the same purpose.
Cases where the same structure is used for different purposes include the use of passive constructions in English and Japanese. In the example below, the Japanese particle wa, which we have glossed as `TOP' here, marks the `topic' of the sentence --- intuitively, what the sentence is about.
Example ( ) indicates that Japanese has a passive-like construction, i.e. a construction where the PATIENT, which is normally realized as an OBJECT, is realized as SUBJECT. It differs from the English passive in that in Japanese this construction tends to have an extra adversative nuance, which might make ( a) rather odd, since it suggests an interpretation where Mr Satoh did not want to be elected, or where election is somehow bad for him. This is not suggested by the English translation, of course. The translation problem from Japanese to English is one of those that looks unsolvable for MT, though one might try to convey the intended sense by adding an adverb such as unfortunately. The translation problem from English to Japanese is, on the other hand, within the scope of MT, since one need only choose another form. This is possible, since Japanese allows SUBJECTs to be omitted freely, so one can say the equivalent of elected Mr Satoh, and thus avoid having to mention an AGENT. However, in general, the result of this is that one cannot have simple rules like those described in Chapter for passives. In fact, unless one uses a very abstract structure indeed, the rules will be rather complicated.
We can see different constructions used for the same effect in cases like the following:
Figure: venir-de and have-just
The first example shows how English, German and French choose different methods for expressing `naming'. The other two examples show one language using an adverbial ADJUNCT (just, or graag (Dutch) `likingly' or `with pleasure'), where another uses a verbal construction. This is actually one of the most discussed problems in current MT, and it is worth examining why it is problematic. This can be seen by looking at the representations for () in Figure .
These representations are relatively abstract (e.g. the information about tense and aspect conveyed by the auxiliary verb have has been expressed in a feature), but they are still rather different. In particular, notice that while the main verb of (a) is see, the main verb of (b) is venir-de. Now notice what is involved in writing rules which relate these structures (we will look at the direction English to French).
All this is summarized in Figure and Figure .
Figure: Translating have-just into venir-de
Of course, given a complicated enough rule, all this can be stated. However, there will still be problems because writing a rule in isolation is not enough. One must also consider how the rule interacts with other rules. For example, there will be a rule somewhere that tells the system how see is to be translated, and what one should do with its SUBJECT and OBJECT. One must make sure that this rule still works (e.g. its application is not blocked by the fact that the SUBJECT is dealt with by the special rule above; or that it does not insert an extra SUBJECT into the translation, which would give *Sam vient de Sam voir Kim). One must also make sure that the rule works when there are other problematic phenomena around. For example, one might like to make sure the system produces ( b) as the translation of ( a).
Figure: The Representation of venir-de
We said above that everything except the SUBJECT and some of the tense information goes into the `lower' sentence in French. But this is clearly not true, since here the translation of probably actually becomes part of the main sentence, with the translation of (a) as its COMPLEMENT.
Of course, one could try to argue that the difference between English just and French venir de is only superficial. The argument could, for example, say that just should be treated as a verb at the semantic level. However, this is not very plausible. There are other cases where this does not seem possible. Examples like the following show that where English uses a `manner' verb and a directional adverb/prepositional phrase, French (and other Romance languages) use a directional verb and a manner adverb. That is, where English classifies the event described as `running', French classifies it as an `entering':
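The basic have-just to venir-de rule discussed above, without the complication introduced by probably, can be sketched as a small structure-to-structure mapping. This is a toy illustration under stated assumptions: the dictionary layout of the representations, the feature names (`head`, `subject`, `object`, `tense`, `adjuncts`, `complement`), the one-entry lexicon and the function name `transfer_have_just` are all hypothetical, and simply copying the tense value onto the venir-de clause is a simplification.

```python
# Hypothetical bilingual lexicon for HEAD verbs.
LEXICON = {"see": "voir"}


def transfer_have_just(src):
    """Map an English clause with the ADJUNCT 'just' onto a venir-de structure.

    Returns None when the rule does not apply, so that the ordinary
    rule for translating the verb can be used instead.
    """
    adjuncts = src.get("adjuncts", [])
    if "just" not in adjuncts:
        return None

    # The 'lower' clause is headed by the translation of the English verb.
    # Its SUBJECT slot is left empty so that no second SUBJECT is generated
    # (avoiding the equivalent of *Sam vient de Sam voir Kim).
    lower = {
        "head": LEXICON.get(src["head"], src["head"]),
        "subject": None,
        "object": src.get("object"),
        "adjuncts": [a for a in adjuncts if a != "just"],
    }

    # The 'upper' clause is headed by venir-de; it takes the SUBJECT and
    # (as an assumption of this sketch) the tense feature of the source.
    return {
        "head": "venir_de",
        "subject": src["subject"],
        "tense": src.get("tense"),
        "complement": lower,
    }


if __name__ == "__main__":
    english = {
        "head": "see",
        "subject": "Sam",
        "object": "Kim",
        "tense": "pres",
        "adjuncts": ["just"],
    }
    print(transfer_have_just(english))
```

The sketch moves every remaining ADJUNCT into the lower clause, which is exactly the assumption that the probably case shows to be wrong; handling that case would require deciding, per ADJUNCT, which clause it belongs to.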
The syntactic structures of these examples are very different, and it is hard to see how one can naturally reduce them to similar structures without using very abstract representations indeed.
A slightly different sort of structural mismatch occurs where two languages have `the same' construction (more precisely, similar constructions, with equivalent interpretations), but where different restrictions on the constructions mean that it is not always possible to translate in the most obvious way. The following is a relatively simple example of this.
What this shows is that English and French differ in that English permits prepositions to be `stranded' (i.e. to appear without their objects, as in ( a)). French normally requires the preposition and its object to appear together, as in ( d) --- of course, English allows this too. This will make translating ( a) into French difficult for many sorts of system (in particular, for systems that try to manage without fairly abstract syntactic representations). However, the general solution is fairly clear --- what one wants is to build a structure where ( a) is represented in the same way as ( c), since this will eliminate the translation problem. The most obvious representation would probably be something along the lines of ( a), or perhaps ( b).
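The idea of giving the stranded and non-stranded variants the same representation can be illustrated with a deliberately crude sketch that works on token lists rather than on real syntactic structures. Everything here is an assumption made for illustration: the function name `normalize_relative`, the small word lists, the dictionary layout, and the sample clauses, which are illustrative and not necessarily the original examples. In particular, treating any clause-final preposition as stranded is a simplification that a real analysis would not make.

```python
# Small illustrative word lists; a real system would use its lexicon.
PREPOSITIONS = {"to", "of", "about", "for", "with", "on", "in"}
REL_PRONOUNS = {"which", "whom"}


def normalize_relative(tokens):
    """Return one representation for both forms of a relative clause.

    'which I have already replied to' and 'to which I have already replied'
    both come out with the preposition stored next to the relative pronoun.
    Returns None if the tokens do not start like a relative clause.
    """
    tokens = list(tokens)
    if not tokens:
        return None

    # Preposition and relative pronoun already together at the front.
    if len(tokens) > 1 and tokens[0] in PREPOSITIONS and tokens[1] in REL_PRONOUNS:
        return {"prep": tokens[0], "rel": tokens[1], "body": tokens[2:]}

    if tokens[0] in REL_PRONOUNS:
        # Clause-final preposition: treat it as stranded (a simplification).
        if len(tokens) > 1 and tokens[-1] in PREPOSITIONS:
            return {"prep": tokens[-1], "rel": tokens[0], "body": tokens[1:-1]}
        return {"prep": None, "rel": tokens[0], "body": tokens[1:]}

    return None


if __name__ == "__main__":
    stranded = normalize_relative("which I have already replied to".split())
    together = normalize_relative("to which I have already replied".split())
    print(stranded == together)
```

Once both variants share a representation, a French generator only ever sees the preposition attached to the relative pronoun, which is the order French requires.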
While by no means a complete solution to the treatment of relative clause constructions, such an approach probably overcomes this particular translation problem. There are other cases which pose worse problems, however.
In general, relative clause constructions in English consist of a head noun (letters in the previous example), a relative pronoun (such as which), and a sentence with a `gap' in it. The relative pronoun (and hence the head noun) is understood as if it filled the gap --- this is the idea behind the representations in ( ). In English, there are restrictions on where the `gap' can occur. In particular, it cannot occur inside an indirect question, or a `reason' ADJUNCT. Thus, ( b) and ( d) are both ungrammatical. However, these restrictions are not exactly paralleled in other languages. For example, Italian allows the former, as in ( a), and Japanese the latter, as in ( c). These sorts of problem are beyond the scope of current MT systems --- in fact, they are difficult even for human translators.