Re: [tlug] Free program translates Euro languages to/from English
- Date: Wed, 2 Nov 2005 16:54:29 +0900
- From: Michael Smith <smith@example.com>
- Subject: Re: [tlug] Free program translates Euro languages to/from English
- References: <20051101221020.37d85785.jep200404@example.com> <d8fcc0800511011954v6c8dd402k129d0fd5c3f6eb7d@example.com>
- User-agent: Mutt/1.5.9i
Josh Glover <jmglov@example.com> writes:

> Machine translation of Japanese (and, I would assume, Korean) is
> considered by some linguists to be an impossible problem. I
> disagree, but at the same time admit that the problem is
> decidedly non-trivial. It will take some serious AI-style shit
> to solve.
>
> I am of the opinion that anything can be modelled, provided you
> have a dense enough set of rules. These rules might be
> grammar-based, or they might be heuristics. In the case of
> Japanese (and probably Korean as well, though maybe to a
> slightly lesser degree), you also have to model a shit-tonne of
> context. And this is what current machine translation programs
> *do not* do well.

Actually, I think one problem with most machine translation tools is that they try to be _too_ smart. Sometimes (maybe usually) I don't want or need grammar translation. I just want the source translated word-by-word. Or even morpheme-by-morpheme.

For example, I might like to see 会いたかったの? (aitakatta no?) translated like this:

  "aitakatta no" = "meet/see [want] [past tense] [question]"

I think that even someone not familiar with Japanese at all would say, OK, looks as if that means something like "Wanted to meet?" or "Wanted to see?" in English. (Yeah, depending on the context, it could probably really mean something more like "Did you miss me?")

But if I put 会いたかったの? into Babelfish or Google translation, I get:

  When you want to meet?

Which is just plain wrong. Where the hell does it get "when" from?

So try 会いたかったんですか? (aitakattan desu ka?), and get:

  When we would like to meet, it is?

Huh?

So, try to keep it as simple as possible. Type in 会った。(atta) and 会いました。(aimashita).

  atta = It met.
  aimashita = It met.

Now 友達と会った。(tomodachi to atta) goes in. And out comes:

  tomodachi to atta = "It met with the friend."

So now it's doing the "No idea what the subject should be so I'll just use 'It'" and the "OK, we need an indefinite or definite article here, so I'll just choose 'the'" things.

At the very least, for these cases, no tool should be inventing an arbitrary subject or arbitrarily choosing an article. Better:

  tomodachi to atta = {subject elided} met with a/the friend(s).

But what would be much more helpful instead is:

  atta = meet/see [plain/informal past tense]
  aimashita = meet/see [polite past tense]
  tomodachi to atta = friend(s) with meet/see [plain past tense]

Of course the person reading that would need to understand that Japanese uses "subject object verb" order. But if I understand the word order, having it translated as "friend(s) with meet/see [plain past tense]" is much clearer to me than "It met with the friend."

I seem to remember once seeing a tool that did Japanese to English translation in a word-by-word sort of

  "aitakatta no" = "meet/see [want] [past tense] [question]"

way. (Or maybe it only did English to Japanese.) What it actually did was: Given some source text (a web page, maybe?), it would re-render the entire text, but with word-by-word rubi translations added above each line. I think it also created hyperlinks for each word -- dictionary links. So if a word had multiple meanings, you could see what those multiple meanings were, and figure out from the context it was in which meaning was the intended one.

That's another problem with most other machine translation tools: They don't preserve any of the ambiguity of the original text. For example, 会う (au) could be translated as both "meet" and "see". If most tools find a word with multiple possible translations, they just choose one and put that into the translated output. I would guess that in most cases, they are just choosing the most common translation of the word. I would much rather they just showed me all the possible translations.

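The morpheme-by-morpheme gloss described above, with every candidate translation kept rather than one being picked, can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the LEXICON table, the segment and gloss function names, and the greedy longest-match splitting of romanized text are all hypothetical, and it is not the half-remembered tool mentioned above. A real tool would need a proper morphological analyzer and a full dictionary.

```python
# Hypothetical toy lexicon: romanized morpheme -> every gloss known for it.
# The entries cover only the examples discussed in this message.
LEXICON = {
    "ai": ["meet", "see"],            # stem of au, as in aitakatta / aimashita
    "a": ["meet", "see"],             # stem of au before -tta
    "ta": ["[want]"],
    "katta": ["[past tense]"],
    "tta": ["[plain past tense]"],
    "mashita": ["[polite past tense]"],
    "no": ["[question]"],
    "tomodachi": ["friend(s)"],
    "to": ["with"],
}


def segment(word):
    """Split one romanized word into morphemes by greedy longest match."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, then shorter ones.
        for j in range(len(word), i, -1):
            if word[i:j] in LEXICON:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # Nothing matched: pass the single character through unglossed.
            pieces.append(word[i])
            i += 1
    return pieces


def gloss(text):
    """Gloss each morpheme in source order, keeping all alternatives."""
    out = []
    for word in text.split():
        for morpheme in segment(word):
            # Unknown pieces are marked with "?" instead of being guessed at.
            out.append("/".join(LEXICON.get(morpheme, ["?" + morpheme])))
    return " ".join(out)


print(gloss("aitakatta no"))       # meet/see [want] [past tense] [question]
print(gloss("tomodachi to atta"))  # friend(s) with meet/see [plain past tense]
```

Note that the sketch never invents a subject or an article and never reorders words; alternatives such as "meet/see" are left for the reader to resolve from context.
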
That said, I guess there is not nearly as much of an issue with ambiguity in translating most Japanese and Chinese text -- where most of the text is ideograms -- as there is in translating text that is in a language written in a phonetic alphabet.

  --Mike

-- 
Michael Smith
http://sideshowbarker.net/