These estimates of CEC translation costs are from [Patterson1982].

In fact, one can get perfect translations from one kind of system, but at the cost of radically restricting what an author can say, so one should perhaps think of such systems as (multilingual) text creation aids, rather than MT systems. The basic idea is similar to that of a phrase book, which provides the user with a collection of `canned' phrases to use. This is fine, provided the canned text contains what the user wants to say. Fortunately, there are some situations where this is the case.

Of course, the sorts of errors one finds in draft translations produced by a human translator will be rather different from those that one finds in translations produced by machine.

Of course, some languages have larger vocabularies than others, but this is mainly a matter of how many things the language is used to talk about (not surprisingly, the vocabulary which Shakespeare's contemporaries had for discussing high-energy physics was rather impoverished). In any case, all languages have ways of forming new words, and this has nothing to do with logical perfection.

Weaver used the analogy of people living in tall, closed towers who communicate (badly) by shouting to each other. However, the towers have a common foundation and basement, where communication is easy: ``Thus it may be true that the way to translate ... is not to attempt the direct route, shouting from tower to tower. Perhaps the way is to descend, from each language, down to the common base of human communication --- the real but as yet undiscovered universal language.''

Hatim and Mason [Hatim and Mason1990] give a number of very good examples where translation requires this sort of cultural mediation.

For some reason, linguists' trees are always written upside down, with the `root' at the top, and the leaves (the actual words) at the bottom.

In English, SUBJECTs can only be omitted in imperative sentences (for example, orders such as Clean the printer regularly) and in some embedded sentences, e.g. the boxed part of It is essential [...].

We have not specified the time-reference information: see Chapter [...].

Another possibility would be to have a further rule which puts the translated preposition immediately after the object of the verb, giving Turn the button back a position.

The names of these particular Semantic Relations should not be taken too seriously. In fact, of course, it does not much matter what the relations are called, so long as they are the same in the source and target grammars.

`Paper' here is intended to convey `intended for human readers', as opposed to `electronic', meaning `intended for use by computers'. Of course, it is possible for a paper dictionary to be stored on a computer like any other document, and our use of `paper' here is not supposed to exclude this. If one were being precise, one would distinguish `paper' dictionaries, `machine readable' dictionaries (conventional dictionaries which are stored on, and can therefore be accessed automatically by, a computer), and `machine usable' dictionaries.

The form of the monolingual entry is based on that used in the Oxford Advanced Learner's Dictionary (OALD); the bilingual entry is similar to what one finds in the Collins-Robert English-French dictionary.

One can also get some idea of the cost of dictionary construction from this. Even if one were able to write four entries an hour, and keep this up for 8 hours a day every working day, it would still take over three years to construct even a small dictionary. Of course, the time it takes to write a dictionary entry varies greatly, depending on how much of the work has already been done by other lexicographers.
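The arithmetic behind this estimate can be made explicit. The sketch below is a back-of-the-envelope calculation, not a costing model; the figure of 230 working days per year and the 25,000-entry dictionary size are hypothetical assumptions, not figures from the text.

```python
# Back-of-the-envelope estimate of dictionary construction time.
# ASSUMPTIONS (not from the text): 230 working days per year, and the
# hypothetical dictionary size used in the example below.

def entries_per_year(entries_per_hour: int = 4,
                     hours_per_day: int = 8,
                     working_days: int = 230) -> int:
    """Number of entries one lexicographer writes in a working year."""
    return entries_per_hour * hours_per_day * working_days


def years_needed(dictionary_size: int, **rate: int) -> float:
    """Years needed to write `dictionary_size` entries at the given rate."""
    return dictionary_size / entries_per_year(**rate)


if __name__ == "__main__":
    # 4 entries/hour * 8 hours/day * 230 days = 7,360 entries per year.
    print(entries_per_year())
    # A hypothetical 25,000-entry dictionary takes a little over three years.
    print(f"{years_needed(25_000):.1f} years")
```

Varying `working_days` or the entry rate shows how sensitive the estimate is to these assumptions.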

In fact, it is arguable that the vocabulary of a language like English, with relatively productive morphological processes, is infinite, in the sense that there is no longest word of the language. Even the supposedly longest word antidisestablishmentarianism can be made longer by adding a prefix such as crypto-, or a suffix such as -ist. The result may not be pretty, but it is arguably a possible word of English. The point is even clearer when one considers compound words (see Section [...]).

The restriction on the OBJECT of the verb actually concerns the thing which is buttoned, whether that appears as the OBJECT of an active sentence or as the SUBJECT of a passive sentence.

In this rule we write +finite for finite=+. We also ignore some issues about datatypes: in particular, on the right-hand side V stands for a string of characters, while on the left-hand (lexical) side it stands for the value of an attribute, which is probably an atom rather than a string.

More precisely, the rule is that the third person singular form is the base form plus s, except (i) when the base form ends in s, ch, sh, o, x or z, in which case es is added (for example, poach/poaches, push/pushes), and (ii) when the base form ends in a consonant followed by y, in which case ies is added to the base minus y (for example, try/tries, but play/plays).
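A minimal sketch of this spelling rule as a function on regular verbs. It restricts the y rule to a preceding consonant, as in try/tries versus play/plays; the function name is hypothetical, and irregular verbs (be, have) and consonant doubling (quiz/quizzes) are deliberately not handled.

```python
VOWELS = "aeiou"


def third_person_singular(base: str) -> str:
    """Return the 3rd person singular present form of a regular English verb.

    The y -> ies change is applied only after a consonant, so 'play' gives
    'plays', not 'plaies'. Irregular verbs ('be', 'have') and consonant
    doubling ('quiz' -> 'quizzes') are not handled.
    """
    # (ii) consonant + y: replace y with ies (try -> tries).
    if len(base) > 1 and base.endswith("y") and base[-2] not in VOWELS:
        return base[:-1] + "ies"
    # (i) endings s, ch, sh, o, x, z: add es (poach -> poaches).
    if base.endswith(("s", "ch", "sh", "o", "x", "z")):
        return base + "es"
    # Default: add s (walk -> walks).
    return base + "s"


for verb in ["poach", "push", "try", "play", "go", "walk"]:
    print(verb, third_person_singular(verb))
```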

Notice, however, that we still cannot expect morphological analysis and lexical lookup to come up with a single right answer straight away. Apart from anything else, a form like affects could be a noun rather than a verb. For another thing, just looking at the word form in isolation will not tell us which of several readings of a word is involved.
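The first point can be illustrated with a toy analyser that returns every analysis of a form rather than stopping at the first. The lexicon and the function name below are hypothetical, and the sketch deals only with category ambiguity, not with choosing between readings of a word.

```python
from typing import Dict, List, Tuple

# Hypothetical miniature lexicon: stem -> possible categories.
# ('affect' is also a noun, as in psychology: 'a flat affect'.)
LEXICON: Dict[str, List[str]] = {
    "affect": ["noun", "verb"],
    "print": ["noun", "verb"],
    "clean": ["adjective", "verb"],
}

Analysis = Tuple[str, str, str]  # (stem, category, inflection)


def analyse(form: str) -> List[Analysis]:
    """Return every analysis of a word form, not just the first one found."""
    results: List[Analysis] = []
    # Uninflected forms.
    for cat in LEXICON.get(form, []):
        results.append((form, cat, "base"))
    # -s forms: plural for nouns, 3rd person singular for verbs.
    if form.endswith("s"):
        stem = form[:-1]
        for cat in LEXICON.get(stem, []):
            if cat == "noun":
                results.append((stem, cat, "plural"))
            elif cat == "verb":
                results.append((stem, cat, "3sg present"))
    return results


# Both the noun (plural) and the verb (3sg present) analyses come back.
print(analyse("affects"))
```

Choosing among the returned analyses is left to later stages, such as syntactic analysis.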

Note that the category of the stem word is important, since there is another prefix un which combines with verbs to give verbs which mean `perform the reverse action to X' --- to unbutton is to reverse the effect of buttoning.
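A minimal sketch of how the stem's category can select between the two un- prefixes. The stem lexicon and the function name are hypothetical; the glosses follow the text ('not X' for adjectives, 'perform the reverse action to X' for verbs).

```python
from typing import Optional, Tuple

# Hypothetical stem lexicon recording each stem's category.
STEM_CATEGORY = {"happy": "adj", "kind": "adj", "button": "verb", "lock": "verb"}


def analyse_un(word: str) -> Optional[Tuple[str, str]]:
    """Analyse an un- word as (category, gloss), using the stem's category.

    un + adjective -> adjective meaning 'not X'
    un + verb      -> verb meaning 'perform the reverse action to X'
    """
    if not word.startswith("un"):
        return None
    stem = word[2:]
    category = STEM_CATEGORY.get(stem)
    if category == "adj":
        return ("adj", f"not {stem}")
    if category == "verb":
        return ("verb", f"perform the reverse action to {stem}")
    return None  # not analysable as un- + known stem (e.g. 'under')


print(analyse_un("unhappy"))
print(analyse_un("unbutton"))
```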

Where words have been fused together to form a compound, as is prototypically the case in German, an additional problem presents itself in the analysis of the compound, namely to decide exactly which words the compound consists of. The German word Wachtraum, for example, could have been formed by joining Wach and Traum, giving a composite meaning of day-dream. On the other hand, it could have been formed by joining Wacht to Raum, in which case the compound would mean guard-room.
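One simple way to expose this ambiguity is to enumerate every split of the compound into known words. The sketch below does this with memoised recursion over a hypothetical four-word lexicon; real German compounds also involve linking elements (such as the -s- in Arbeitsamt), which it ignores.

```python
from functools import lru_cache
from typing import FrozenSet, List, Tuple

# Hypothetical lexicon of compound constituents (lower-cased).
LEXICON: FrozenSet[str] = frozenset({"wach", "wacht", "traum", "raum"})


def segmentations(word: str, lexicon: FrozenSet[str] = LEXICON) -> List[Tuple[str, ...]]:
    """Return every way of splitting `word` into a sequence of lexicon words."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def split_from(i: int) -> Tuple[Tuple[str, ...], ...]:
        # All segmentations of word[i:]; the empty suffix has one (empty) split.
        if i == len(word):
            return ((),)
        results = []
        for j in range(i + 1, len(word) + 1):
            part = word[i:j]
            if part in lexicon:
                for rest in split_from(j):
                    results.append((part,) + rest)
        return tuple(results)

    return list(split_from(0))


print(segmentations("Wachtraum"))  # [('wach', 'traum'), ('wacht', 'raum')]
```

Both readings are returned; deciding between day-dream and guard-room is left to later analysis.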

Creative in the sense of `genuine invention which is not governed by rules', rather than the sense of `creating new things by following rules' --- computers have no problem with creating new things by following rules, of course.

This discussion of the Japanese passive is a slight simplification. The construction does sometimes occur without the adversive sense, but this is usually regarded as a `Europeanism', showing the influence of European languages.

This is a simplification, of course. For one thing, it could be used to refer to something outside the discourse, to some entity which is not mentioned but pointed at, for example. For another thing, there are some other potential antecedents, such as the back in ([...]a), and it could be that Speaker A is returning to the digression in sentence ([...]f). Though the discourse structure can help to resolve pronoun-antecedent relations, discovering the discourse structure itself poses serious problems.

Politeness dictates that giving by the hearer to the speaker is normally giving `downwards' (kureru), so this is the verb used to describe requests, and giving by the speaker to the hearer is normally giving `upwards' (ageru), so this is the verb used to describe offers, etc.

As noted above, knowledge about selectional restrictions is unusual in being defeasible in just this way: the restriction that the AGENT of eat is ANIMATE is only a preference, or default, and can be overridden. This leads some to think that it is not, strictly speaking, linguistic knowledge at all. In general, the distinction between linguistic and real-world knowledge is not always very clear.
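Treating a restriction as a preference means counting violations rather than rejecting readings. A minimal sketch, with a hypothetical type lexicon and function names: where there is a choice, the reading that satisfies the preference wins; where there is none, the violating reading is still accepted.

```python
from typing import Dict, List

# Hypothetical semantic types for a few nouns.
SEMANTIC_TYPE: Dict[str, str] = {
    "dog": "ANIMATE", "child": "ANIMATE",
    "printer": "INANIMATE", "car": "INANIMATE",
}

# Selectional restrictions stated as preferences: verb -> role -> preferred type.
PREFERENCES: Dict[str, Dict[str, str]] = {"eat": {"AGENT": "ANIMATE"}}


def violations(verb: str, roles: Dict[str, str]) -> int:
    """Count how many of the verb's preferences a role assignment violates."""
    count = 0
    for role, filler in roles.items():
        wanted = PREFERENCES.get(verb, {}).get(role)
        if wanted is not None and SEMANTIC_TYPE.get(filler) != wanted:
            count += 1
    return count


def best_reading(verb: str, readings: List[Dict[str, str]]) -> Dict[str, str]:
    """Prefer the reading with fewest violations; never reject one outright."""
    return min(readings, key=lambda roles: violations(verb, roles))


# With a choice, the preference decides (e.g. which noun 'it eats' refers to).
print(best_reading("eat", [{"AGENT": "car"}, {"AGENT": "dog"}]))
# Without a choice, the violating reading survives: 'the printer eats paper'.
print(best_reading("eat", [{"AGENT": "printer"}]))
```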

ASCII stands for American Standard Code for Information Interchange.

For example, suppose one has a printer manual marked up in this way, with special markup used for the names of printer components wherever they occur. It would be very easy to extract a list of printer parts automatically, together with surrounding text. This text might be a useful addition to a parts database. As regards consistency, it would be easy to check that each section conforms to a required pattern --- e.g. that it contains a list of all parts mentioned in the section.
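Both uses can be sketched on a small marked-up fragment. The tag names (<part>, <section>, <partlist>, <item>) and the function names are hypothetical, and regular expressions stand in for a proper SGML/XML parser purely to keep the example short.

```python
import re
from typing import List, Tuple

# Hypothetical markup: <part> marks component names; each <section> should
# end with a <partlist> of <item>s naming every part mentioned in it.
MANUAL = """
<section><title>Cleaning</title>
Remove the <part>toner cartridge</part> and wipe the <part>paper tray</part>.
<partlist><item>toner cartridge</item><item>paper tray</item></partlist>
</section>
<section><title>Loading paper</title>
Open the <part>paper tray</part> and lift the <part>release lever</part>.
<partlist><item>paper tray</item></partlist>
</section>
"""

PART = re.compile(r"<part>(.*?)</part>")
ITEM = re.compile(r"<item>(.*?)</item>")
SECTION = re.compile(r"<section>(.*?)</section>", re.DOTALL)


def all_parts(doc: str) -> List[str]:
    """Every distinct part name marked up anywhere in the document."""
    return sorted(set(PART.findall(doc)))


def unlisted_parts(doc: str) -> List[Tuple[int, List[str]]]:
    """Sections (numbered from 1) mentioning parts missing from their part list."""
    problems = []
    for number, body in enumerate(SECTION.findall(doc), start=1):
        missing = set(PART.findall(body)) - set(ITEM.findall(body))
        if missing:
            problems.append((number, sorted(missing)))
    return problems


print(all_parts(MANUAL))       # ['paper tray', 'release lever', 'toner cartridge']
print(unlisted_parts(MANUAL))  # [(2, ['release lever'])]
```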

Although most elements of the structure are exactly matched, there may sometimes be differences. For example, if the document element Paragraph is composed of document element Sentence(s), it is perhaps unwise to insist that each Sentence in each language is paired exactly with a single corresponding Sentence in every other language, since frequently there is a tendency to distribute information across sentences slightly differently in different languages. However, at least for technical purposes, it is usually perfectly safe to assume that the languages are paired Paragraph by Paragraph, even though these units may contain slightly different numbers of sentences for each language.
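A minimal sketch of pairing at the Paragraph level, assuming each document is already split into paragraphs and each paragraph into sentences (the function name and sample texts are hypothetical). Paragraphs are paired one-to-one, and the pairing is allowed to contain different numbers of sentences on each side.

```python
from typing import List, Tuple

Paragraph = List[str]  # a paragraph represented as its list of sentences


def align_paragraphs(source: List[Paragraph],
                     target: List[Paragraph]) -> List[Tuple[Paragraph, Paragraph]]:
    """Pair paragraphs one-to-one; sentences inside a pair are not aligned."""
    if len(source) != len(target):
        raise ValueError(
            f"paragraph counts differ: {len(source)} vs {len(target)}")
    return list(zip(source, target))


english = [["Switch off the printer.", "Open the front cover."],
           ["Remove the toner cartridge."]]
french = [["Éteignez l'imprimante et ouvrez le capot avant."],
          ["Retirez la cartouche de toner."]]

for en, fr in align_paragraphs(english, french):
    print(len(en), "English sentence(s) <->", len(fr), "French sentence(s)")
```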

`PACE' stands for `Perkins Approved Clear English'.

For an excellent discussion of the range of aspects that a good translation may need to take into account, see Hatim and Mason [Hatim and Mason1990].

This comes from the section on `Talking to the Tailor' in an English-Italian phrasebook of the 1920s.

`Declarative' here is to be contrasted with `procedural'. A declarative specification of a program states what the program should do, without considering the order in which it must be done. A procedural specification would specify both what is to be done, and when. Properties like Accuracy and Intelligibility are properties of a system which are independent of the dynamics of the system, or indeed of the way the system operates at all: hence `non-procedural', or `declarative'.

It would be nice to find possible problem areas by some sort of automatic scanning of bilingual texts, but the tools and techniques for this are not yet available.

Here `same value' is to be interpreted strongly, as token identity. This contrasts with the weaker sense in which, in a sentence with two nouns, there would be two objects with the `same' category value, namely the two nouns; this is often called `type' identity. In everyday usage, when we speak of two people having the `same' shirt, we normally mean type identity; token identity would involve them sharing one piece of clothing. On the other hand, when we speak of people having the same father, we mean token identity.
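Python happens to draw the same distinction: `==` tests equality of value (type identity) and `is` tests whether two names refer to one object (token identity). The sketch assumes that `same value' here refers to structure sharing of the kind used in unification-based grammars; the feature names are illustrative only.

```python
# Type identity: two distinct objects that happen to have equal values.
noun1 = {"cat": "n"}
noun2 = {"cat": "n"}
print(noun1 == noun2)  # True: same type (equal values)
print(noun1 is noun2)  # False: two different tokens

# Token identity: two features that share one and the same object, as in
# structure sharing (assumed to be the intended setting here).
agreement = {"num": "sg", "per": 3}
subject = {"cat": "np", "agr": agreement}
verb = {"cat": "v", "agr": agreement}
print(subject["agr"] is verb["agr"])  # True: one shared token

# Changing the shared value through one path changes it for both,
# like two people sharing a single shirt.
verb["agr"]["num"] = "pl"
print(subject["agr"]["num"])  # pl
```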

This may use the measure of Mutual Information, which takes into account (roughly) the amount of context that elements share.
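The text does not give a formula. One common variant is pointwise mutual information, which compares how often two elements share a context with how often they would by chance. A minimal sketch of that variant, with a hypothetical function name and hypothetical counts:

```python
import math


def pmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Pointwise mutual information (in bits) of elements x and y.

    count_xy: how often x and y occur in a shared context (e.g. a window);
    count_x, count_y: how often each occurs; total: number of contexts.
    """
    if count_xy == 0:
        return float("-inf")  # never seen together
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


# Hypothetical counts: x and y each occur in 10 of 100 contexts and share 5,
# i.e. 5 times as often as chance predicts, so PMI = log2(5), about 2.32 bits.
print(round(pmi(5, 10, 10, 100), 2))
```

A value near zero means the elements co-occur about as often as chance predicts; large positive values suggest a genuine association.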

Arnold D J
Thu Dec 21 10:52:49 GMT 1995