[tlug] Editing XML

Date: Thu, 19 Jun 2014 09:53:26 +0900
From: "Stephen J. Turnbull" <turnbull@example.com>
Subject: [tlug] Editing XML
References: <53A1E0CF.7050004@imaginatorium.org>

Brian Chandler writes:

 > So the question is: What is the right tool?

There probably isn't one that's compatible with using Memsource at
all.

 > InDesign files (haven't actually managed to see one yet)

These are the upstream publisher's format?

 > appear to be xml, and I assume that Memsource has a generic XML
 > cracker, which extracts "segments" of text from xml documents, and
 > puts back the "translations" afterwards.

I wouldn't bet on this model.  It may indeed do things this way, but
that's not obvious, and it may matter.

 > Fuji-san [[picture of mountain]] ni tuite hanashimasu.
 > 
 > And you just have to translate "Fuji-san" as "We'll talk about", and 
 > translate "ni tuite hanashimasu." as "Mount Fuji."

Is it possible to tell the customer that they're clueless, and that
the tool is just plain a loser for languages with such different
grammars?

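Even C's printf (!!) has facilities for doing this correctly: the
POSIX positional conversions (%1$s, %2$s, ...) let a translated format
string use its arguments in a different order.  It's not rocket
science.  If Memsource can't do that, it's not going to help you do
your job at all.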
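
A minimal sketch of the same idea in Python, using numbered str.format
fields in place of printf's positional conversions.  The TEMPLATES
table and the render() helper are made up for illustration; the point
is only that the translator handles the whole sentence and decides
where each placeholder goes.

```python
# Hypothetical per-language templates.  {0} is the mountain's name and
# {1} is the inline picture; each template may use them in any order.
TEMPLATES = {
    "ja": "{0} {1} ni tuite hanashimasu.",
    "en": "We'll talk about {0} {1}.",
}

def render(template, name, picture):
    """Fill a template; field order comes from the template, not the caller."""
    return template.format(name, picture)

PICTURE = "[[picture of mountain]]"
japanese = render(TEMPLATES["ja"], "Fuji-san", PICTURE)
english = render(TEMPLATES["en"], "Mount Fuji", PICTURE)
```

A template such as "{1}: {0}" would swap the two arguments without any
change to the calling code.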

 > really easy. Memsource does not. The question is whether there is some 
 > other generic framework for cracking the text out of (specifically 
 > .idml) xml files for translation, in an intelligent and flexible way, 
 > capable of helping automation, rather than hindering it.

Sure.  All the scripting languages (Python, Perl, Ruby, PHP) have XML
parser bindings.  For a simple find/replace, it would be easier to use
sed, though.
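
As a rough sketch of the parser route, here is Python's standard
xml.etree.ElementTree pulling text out of a document and putting
translations back.  The SAMPLE markup is a heavily simplified stand-in
that I am assuming resembles a story file in an .idml package (the
element names are an assumption, not checked against a real file), and
extract_segments()/replace_segments() are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified stand-in for one story from an .idml package.
SAMPLE = """<Story>
  <ParagraphStyleRange>
    <CharacterStyleRange><Content>Fuji-san</Content></CharacterStyleRange>
    <CharacterStyleRange><Content>ni tuite hanashimasu.</Content></CharacterStyleRange>
  </ParagraphStyleRange>
</Story>"""

def extract_segments(root, tag="Content"):
    """Return the text of every <tag> element, in document order."""
    return [el.text or "" for el in root.iter(tag)]

def replace_segments(root, translations, tag="Content"):
    """Write translations back into the <tag> elements, in document order."""
    elements = list(root.iter(tag))
    if len(elements) != len(translations):
        raise ValueError("segment count mismatch")
    for el, text in zip(elements, translations):
        el.text = text

root = ET.fromstring(SAMPLE)
segments = extract_segments(root)
replace_segments(root, ["We'll talk about", "Mount Fuji."])
result = ET.tostring(root, encoding="unicode")
```

Because the script owns both the extraction and the write-back, any
global rewrite can be applied to the segments in between.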

 > For example, one global replace, something like (imagined example):
 > 
 > s/<char-special type='maru-suuji' value=$N>/($N)/

This isn't a very helpful example, as maru-suuji are just plain old
Unicode characters, and sed will do the job fine (tr won't work, since
it maps one character to one character and "(1)" is three).  Emacs
might do the job better.
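
For comparison, a minimal Python sketch of that global replace.  It
assumes the circled numbers are the Unicode characters U+2460 through
U+2473 (circled 1 to 20); circled zero and the numbers above 20 live
in other blocks and are not handled.  The uncircle() name is made up.

```python
import re

# Circled digits 1-20 occupy the contiguous range U+2460..U+2473.
CIRCLED = re.compile("[\u2460-\u2473]")

def uncircle(text):
    """Replace each circled number with its parenthesised form, e.g. (1)."""
    return CIRCLED.sub(lambda m: "(%d)" % (ord(m.group()) - 0x2460 + 1), text)

converted = uncircle("\u2460 first, \u2461 second, \u2473 last")
```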

 > ... would replace every circled number by the appropriate (n), supposing 
 > that this is the design decision. To do this in Memsource effectively 
 > means that every single numeral will be retyped, errors will occur, etc 
 > etc. COST.

You've used Memsource before?

 > Or does anyone know of anyone who does this sort of thing,
 > commercially? 

This is precisely what Emacs is all about.  nxml-mode, which ships
with Emacs (and on which nxhtml is built), is a generic XML parser and
editor.  You might be well-advised to learn Emacs and a bit of Lisp.

 > It seems to me that a proper approach could well halve the total 
 > translator time needed, which would pay for everything. (And save me 
 > mental anguish.)
 > 
 > It also occurs to me I might come down to Tokyo to talk to my customers 
 > (translation agency) on Friday afternoon,

The customers are Japanese?  Uh-oh.

Footnotes: 
[1]  Sorry for the repetitive redundancy.
