Mailing List Archive



[tlug] Editing XML



Well, the acute problem, even after I've got memsource to work one way or another, is Bigger, and Almost interesting.

We have this vast mass* of mathematical teaching material in Japanese, in InDesign format, to be converted/translated into British English**. The Plan is to use Memsource... oh well, I already said:
> Of course, I do not think Memsource is the right tool for this job:
> if you have a manual, preferably translating between two
> Indo-european languages, it may be brilliant at picking up common,
> almost identical sentences. Generally for Japanese it strikes me as
> miserable, ...

* Could be why the project name is Xxxx_Mass_Textbook. Could be not.
** I swear no project I have ever been involved in in Japan with "BrE" has failed to be a disaster on some scale.

So the question is: What is the right tool? InDesign files (haven't actually managed to see one yet) appear to be XML, and I assume that Memsource has a generic XML cracker, which extracts "segments" of text from XML documents, and puts back the "translations" afterwards. Somehow it decides which tags to simply leave as markers within the "segment"; if these are present in the translation, the same graphics etc etc etc will appear. Past experience shows that even in ordinary documents, where these are relatively rare, every so often it is impossible to generate a (mindless; there, I said it) translation which fits into this scheme. Or you get two sentence fragments like

Fuji-san [[picture of mountain]] ni tuite hanashimasu.

And you just have to translate "Fuji-san" as "We'll talk about", and translate "ni tuite hanashimasu." as "Mount Fuji." In principle, if you are taking the translation memory seriously, this will cause chaos.
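
To make the "markers within the segment" idea concrete, here is a minimal sketch in Python. It is not how Memsource works internally (that is unknown here); the <p> and <img> tags, the {1} marker syntax, and the function names extract_segment and rebuild are all made up for illustration, and it only handles empty inline elements directly inside one paragraph.

```python
import re
import xml.etree.ElementTree as ET


def extract_segment(paragraph_xml):
    """Return (tag, segment, markers); each inline child becomes {1}, {2}, ..."""
    elem = ET.fromstring(paragraph_xml)
    parts = [elem.text or ""]
    markers = []
    for child in elem:
        markers.append(child)
        parts.append("{%d}" % len(markers))
        parts.append(child.tail or "")
    return elem.tag, "".join(parts), markers


def rebuild(tag, translated, markers):
    """Put the inline elements back wherever their {n} markers appear."""
    elem = ET.Element(tag)
    # re.split with a capturing group yields [text, number, text, number, ...]
    pieces = re.split(r"\{(\d+)\}", translated)
    elem.text = pieces[0]
    for i in range(1, len(pieces), 2):
        child = markers[int(pieces[i]) - 1]
        child.tail = pieces[i + 1]
        elem.append(child)
    return elem


# Hypothetical markup; a real IDML story looks quite different.
source = "<p>Fuji-san <img src='fuji.png'/> ni tuite hanashimasu.</p>"
tag, segment, markers = extract_segment(source)
print(segment)
result = rebuild(tag, "We'll talk about Mount Fuji {1}.", markers)
print(ET.tostring(result, encoding="unicode"))
```

If the whole sentence is kept as one segment with a movable marker, the translator can put the picture wherever English word order wants it. The chaos described above arises when the tool instead cuts the sentence in two at the tag.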

Recently I had the pleasure of dealing with LaTeX, for the first time. This is designed to be written/read by humans, and makes translation really easy. Memsource does not. The question is whether there is some other generic framework for cracking the text out of (specifically .idml) XML files for translation, in an intelligent and flexible way, capable of helping automation, rather than hindering it. For example, one global replace, something like (imagined example):

s/<char-special type='maru-suuji' value=$N>/($N)/

... would replace every circled number by the appropriate (n), supposing that this is the design decision. To do this in Memsource effectively means that every single numeral will be retyped, errors will occur, etc etc. COST.
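
For what it is worth, the same global replace could be sketched in Python roughly as follows. The char-special tag is the imagined one from the example above, not real InDesign markup, and the function name replace_circled_numbers is likewise invented; the pattern assumes the value is a plain decimal number, with or without quotes.

```python
import re

# Imagined tag from the example above; real .idml markup will differ.
MARU_SUUJI = re.compile(
    r"""<char-special\s+type='maru-suuji'\s+value=['"]?(\d+)['"]?\s*/?>"""
)


def replace_circled_numbers(xml_text):
    """Replace every imagined maru-suuji tag with its number in parentheses."""
    return MARU_SUUJI.sub(r"(\1)", xml_text)


sample = "<p>See <char-special type='maru-suuji' value='3'/> below.</p>"
print(replace_circled_numbers(sample))
```

Run once over the extracted XML before translation, this would mean nobody has to retype a single numeral.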

Or does anyone know of anyone who does this sort of thing, commercially? It seems to me that a proper approach could well halve the total translator time needed, which would pay for everything. (And save me mental anguish.)

It also occurs to me I might come down to Tokyo to talk to my customers (translation agency) on Friday afternoon, then turn up at the nomi-kai. It would be nice just to be able to talk to someone's shoulder.

Brian Chandler


