TLUG Mailing List

Re: carriage returns

To: tlug@example.com

Subject: Re: carriage returns

From: Frank BENNETT <bennett@example.com>

Date: Sat, 9 Sep 2000 16:43:20 +0900

Content-Transfer-Encoding: 7bit

Content-Type: text/plain; charset=iso-2022-jp

In-Reply-To: <20000909143758.C18072@example.com>; from Jonathan Q on Sat, Sep 09, 2000 at 02:37:59PM +0900

References: <Pine.LNX.4.10.10009091055180.6625-100000@example.com> <20000909143758.C18072@example.com>

On Sat, Sep 09, 2000 at 02:37:59PM +0900, Jonathan Q wrote:

> Tony> I think it might be better if I were to insert 
> Tony> carriage returns into such data, so that it would 
> Tony> be manageable. But I don't know how to do that and 

> I don't know how you would go about testing to see if you were going
> to chop a character in half or not, but I bet it's probably difficult
> or worse (any Perl/double-byte gurus with nothing better to do on 
> Saturday than read TLUG please chime in on this :-)  For people, it's
> relatively easy, since we're looking at the human readable text
> and can see where to manually hit the return key, but of course, your
> whole goal is to avoid doing this :-)  Doing this with a program is
> likely going to prove much more challenging.

Sorry, no gurus :)

Just last year I wrote a routine in (of all things) Tcl that does this.
The task was simplified by two assumptions:

  o That the Japanese text was all in EUC;

  o That any ASCII text occurred ONLY at the beginning of a line,
    and consisted ALWAYS of a string of two or more asterisks,
    or of numerals inside a set of one or more balanced
    forward slashes.

If that much is guaranteed, all you have to do is count off the ASCII
characters, and then count off the desired number of byte pairs (one pair
per Japanese character) up to the line break.  To prevent odd line breaks,
you might also want to add a check for punctuation symbols.
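
As a rough illustration of the simple case just described, here is a minimal
sketch in Python (not the Tcl routine itself).  The function name
wrap_euc_line and its parameters are made up for this example, and it assumes
the input is a single EUC line with no trailing newline, an ASCII-only prefix,
and nothing but two-byte characters after that.  It does no punctuation check.

```python
def wrap_euc_line(line: bytes, width: int) -> list[bytes]:
    """Split one EUC-JP line into pieces of at most `width` two-byte characters.

    Assumption (hypothetical helper): ASCII appears only as a prefix, and
    every byte after the prefix belongs to a two-byte EUC character.
    """
    # Count off the leading ASCII bytes (high bit clear).
    i = 0
    while i < len(line) and line[i] < 0x80:
        i += 1
    prefix, body = line[:i], line[i:]

    # Count off the desired number of byte pairs for each output line.
    step = 2 * width
    chunks = [body[j:j + step] for j in range(0, len(body), step)]
    if not chunks:
        return [prefix]

    # The ASCII prefix stays attached to the first piece.
    chunks[0] = prefix + chunks[0]
    return chunks


if __name__ == "__main__":
    sample = "**日本語のテキスト".encode("euc_jp")
    for piece in wrap_euc_line(sample, 3):
        print(piece.decode("euc_jp"))
```

Because every break falls on an even byte offset after the prefix, no
character gets chopped in half, but only for as long as the two assumptions
hold.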

The real world is not so nicely controlled, however; for your needs you will
want to be able at least to cope with ASCII strings in the middle of the
line.  That is going to cause some pain, because you can no longer simply
count bytes in pairs, and it is a good deal more complicated to determine
the break points in mixed text.

[I started to add snippets of the Tcl mentioned above here as an example,
but thought better of it; Perl undoubtedly provides some shorthand
mechanisms for dealing with multibyte character sets.  Wait for better
counsel.]

Cheers,
----
-x80
Frank G Bennett, Jr         @@
Faculty of Law, Nagoya Univ () email: bennett@example.com
Tel: +81[(0)52]789-2239     ()