Mailing List Archive


Re: carriage returns



Tony Laszlo (laszlo@example.com) wrote:

Tony> I use jvim and yudit for editing Japanese documents. 
Tony> I am having trouble with long strings of Japanese 
Tony> text which have no carriage returns. With Yudit, 

Let me take a wild guess and ask if you're getting these things from
people using Outhouse Excess?  I get mails like that in English from time to
time, and it's a PITA.

Tony> I think it might be better if I were to insert 
Tony> carriage returns into such data, so that it would 
Tony> be manageable. But I don't know how to do that and 

A sticky problem.  I have no experience with Yudit, but do use
vi a lot, and it has powerful pattern-matching search-and-replace
features.  You could use that feature to replace any occurrence of a
given character with a carriage return, but that might not help
very much.  You could try targeting the double-byte comma
(、) and see if that broke up the lines into more reasonable chunks.
I've never tried any pattern-matching text replacement on double-byte
text before, so I don't know if this will work, but it might be worth
a try.

To do it, maybe you can enter the comma using your J input method.
For the carriage return, you'll have to try using its ASCII code
with a backslash escape in front of it, I think (someone please
correct me if I'm wrong here).   I just did a test on a text file
and replaced every occurrence of 「日本語」 with "Japanese," so
double-byte search and replace seems to work OK.

To do this in vi and its counterparts such as jvim, enter the following
in command mode:

:%s/、/escape-code-for-carriage-return-and-escape-code-for-line-feed-here/g

That should (no guarantees, of course :-) replace every Japanese
double-byte comma with a carriage return+line feed (note that Unix
itself ends lines with a line feed alone).
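
If the vi route gives you trouble, the same replacement can be sketched
outside the editor.  This is only a rough illustration in Python, not
something I have run against jvim or Yudit files.  It assumes the text
has already been decoded to a string, and it keeps the comma and adds
the line break after it rather than replacing the comma outright.  The
function name break_after_comma is made up for this example.

```python
# Sketch: break a long line after every ideographic comma (、).
# Assumption: the text is already a decoded Python str, and a plain
# "\n" (Unix line feed) is the line ending you want.

def break_after_comma(text: str, comma: str = "、") -> str:
    """Return text with a newline inserted after each comma."""
    return text.replace(comma, comma + "\n")


if __name__ == "__main__":
    sample = "今日は晴れ、明日は雨、明後日は曇り。"
    print(break_after_comma(sample))
```

Keeping the comma is a design choice: the text stays intact and only
gains line breaks.  Drop the comma from the replacement string if you
really want it gone.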

The other way I can think of to do it is a good bit harder.  You
would have to write a program (Perl would probably be best for this)
to either A) do the same thing (in which case you're much better off
using vi as above), or B) arbitrarily insert a cr+lf at a set interval.
This would be pretty easy with ASCII text.  All you'd have to do is
count off (say) 60 characters, and see if the 61st one was white space.
If not, take the first white space character after the 60th one and
replace it with a cr+lf.
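
The ASCII case described above might look something like this.  It is a
minimal sketch in Python rather than Perl, the name wrap_ascii is
invented for the example, and it writes a bare "\n" instead of a cr+lf
on the assumption that the file lives on a Unix box.

```python
# Sketch: wrap ASCII text by replacing the first whitespace found at or
# after the 61st character of each line with a newline.
# Assumption: "\n" is the desired line ending; width counts characters.

def wrap_ascii(text: str, width: int = 60) -> str:
    lines = []
    start = 0
    n = len(text)
    while n - start > width:
        # text[start + width] is the (width + 1)th character of this line.
        cut = start + width
        while cut < n and not text[cut].isspace():
            cut += 1
        if cut >= n:
            break  # no whitespace left, keep the remainder as one line
        lines.append(text[start:cut])
        start = cut + 1  # skip the whitespace that became the line break
    lines.append(text[start:])
    return "\n".join(lines)


if __name__ == "__main__":
    print(wrap_ascii("the quick brown fox jumps over the lazy dog", 10))
```

Lines can come out longer than the chosen width, because the break only
happens at the next whitespace, exactly as in the description.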

With Japanese, things will be a lot more complicated. Spaces are a lot
less common, and will probably be double-byte spaces.  So the first
space after the 60th character could be another 60 characters down the line.
And of course, 60 bytes of double-byte text is only 30 human-readable
characters, so this must be accounted for.  This approach
would probably not work well.

The other approach would be to count off (say) 60 characters
(30 double-byte characters) and then do a test and either insert
a cr+lf at that point, or move over one single-byte position and
insert the cr+lf there, based on the results of the test.  The
test is what becomes the sticky part.  You need to determine the
answer to the question "If I insert a cr+lf right here, will I cut
a double-byte character in half and mangle my text?"  If the answer
is yes, move one single-byte space right and insert the cr+lf there.
If the answer is no, insert the cr+lf where you are.  If you want to
go for even prettier formatting, also test to see if the character
after your insertion point is white space.  If it is, remove it.
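
Here is one way the count-and-test idea could be sketched, with some big
assumptions: the file is EUC-JP (so any byte of 0x80 or above starts a
multi-byte character, and 0x8F starts a three-byte one), the double-byte
space is the byte pair A1 A1, and a bare "\n" is used as the line break.
The names char_len and wrap_euc_jp are invented for this example.
Shift_JIS or JIS files would need a different test.

```python
# Sketch: byte-oriented wrapping of EUC-JP text that never splits a
# multi-byte character.
# Assumptions: input is valid EUC-JP; "\n" is the line ending.

def char_len(lead: int) -> int:
    """Byte length of the EUC-JP character that starts with `lead`."""
    if lead < 0x80:
        return 1  # ASCII
    if lead == 0x8F:
        return 3  # three-byte form (JIS X 0212)
    return 2      # JIS X 0208, or half-width kana after 0x8E


def wrap_euc_jp(data: bytes, width: int = 60) -> bytes:
    out = bytearray()
    col = 0  # bytes already written on the current line
    i = 0
    while i < len(data):
        n = char_len(data[i])
        ch = data[i:i + n]
        if ch == b"\n":
            out += ch  # an existing line break resets the count
            col = 0
        elif col >= width:
            out += b"\n"
            col = 0
            if ch not in (b" ", b"\xa1\xa1"):
                out += ch  # keep the character unless it is a space
                col = n
        else:
            out += ch
            col += n
        i += n
    return bytes(out)


if __name__ == "__main__":
    raw = "これはEUC-JPのテストです".encode("euc_jp")
    print(wrap_euc_jp(raw, 10).decode("euc_jp"))
```

Because the loop walks the bytes from the start, it always knows whether
it is sitting on a character boundary.  When a double-byte character
straddles the limit, the whole character is written first and the break
goes after it, which is the "move one position right" case.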

I don't know how you would go about testing to see if you were going
to chop a character in half or not, but I bet it's probably difficult
or worse.  (Any Perl/double-byte gurus with nothing better to do on
Saturday than read TLUG, please chime in on this :-)  For people, it's
relatively easy, since we're looking at the human readable text
and can see where to manually hit the return key, but of course, your
whole goal is to avoid doing this :-)  Doing this with a program is
likely going to prove much more challenging.
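
One possible way around the chop-in-half test, offered as a sketch and
not as a tested recipe: let a library that understands the encoding turn
the bytes into whole characters first, then count characters instead of
bytes.  The example below leans on Python's built-in "euc_jp" and
"shift_jis" codecs.  The function name wrap_chars and the default of 30
characters per line are my own choices for illustration, and you need to
know which encoding the file really uses.

```python
# Sketch: decode first, wrap by character count, then re-encode.
# Assumptions: the encoding is known in advance; "\n" is the line ending.

def wrap_chars(data: bytes, encoding: str = "euc_jp", width: int = 30) -> bytes:
    text = data.decode(encoding)
    wrapped = []
    for line in text.split("\n"):
        # Slice each existing line into chunks of `width` characters.
        chunks = [line[s:s + width] for s in range(0, len(line), width)]
        wrapped.extend(chunks or [""])  # keep empty lines as they are
    return "\n".join(wrapped).encode(encoding)


if __name__ == "__main__":
    raw = "日本語テキスト".encode("shift_jis")
    print(wrap_chars(raw, "shift_jis", 3).decode("shift_jis"))
```

Since the slicing happens on decoded characters, a break can never land
in the middle of one, whatever the byte layout of the encoding.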

Jonathan






Tony> suspect it might not be so easy due to the mixed 
Tony> double-byte/single-byte text in Japanese documents. 
Tony> Any hints on how this cat might best be skinned 
Tony> would be most appreciated. 
Tony> 
Tony> (I would like to stay with the apps I have now, if 
Tony> possible). 
Tony> 
Tony> Thanks. 
Tony> 
Tony> 
Tony> 
Tony> -----------------------------------------------------------------------
Tony> Next Nomikai Meeting: October 20 (Fri) 19:00   Place: Tengu TokyoEkiMae
Tony> Next Technical Meeting: November 11 (Sat) 13:30  Place: LinuxProbe Hall
Tony> -----------------------------------------------------------------------
Tony> more info: http://www.tlug.gr.jp           Sponsor: Global Online Japan
Tony> 
