Re: carriage returns
- To: tlug@example.com
- Subject: Re: carriage returns
- From: Frank BENNETT <bennett@example.com>
- Date: Sat, 9 Sep 2000 16:43:20 +0900
- Content-Transfer-Encoding: 7bit
- Content-Type: text/plain; charset=iso-2022-jp
- In-Reply-To: <20000909143758.C18072@example.com>; from Jonathan Q on Sat, Sep 09, 2000 at 02:37:59PM +0900
- References: <Pine.LNX.4.10.10009091055180.6625-100000@example.com> <20000909143758.C18072@example.com>
- Reply-To: tlug@example.com
- Resent-From: tlug@example.com
- Resent-Message-ID: <nODfFB.A.2WG.eweu5@example.com>
- Resent-Sender: tlug-request@example.com
On Sat, Sep 09, 2000 at 02:37:59PM +0900, Jonathan Q wrote:
> Tony> I think it might be better if I were to insert
> Tony> carriage returns into such data, so that it would
> Tony> be manageable. But I don't know how to do that and
> I don't know how you would go about testing to see if you were going
> to chop a character in half or not, but I bet it's probably difficult
> or worse (any Perl/double-byte gurus with nothing better to do on
> Saturday than read TLUG please chime in on this :-) For people, it's
> relatively easy, since we're looking at the human readable text
> and can see where to manually hit the return key, but of course, your
> whole goal is to avoid doing this :-) Doing this with a program is
> likely going to prove much more challenging.

Sorry, no gurus :)

I wrote an algorithm in (of all things) Tcl that does this, just last
year. The task was simplified by two assumptions:

  o That the Japanese text was all in EUC;
  o That any ASCII text occurred ONLY at the beginning of a line, and
    consisted ALWAYS of a string of two or more asterisks, or of
    numerals inside a set of one or more balanced forward slashes.

If that much is guaranteed, all you have to do is count off the ASCII
characters, and then count off the desired number of pairs to the
line break. To prevent linebreaking weirdness, you might want to add
a check for punctuation symbols.

The real world is not so nicely controlled, however; for your needs
you will want to be able at the least to cope with ASCII strings in
the middle of the line. That is going to cause some pain, because it
is a good deal more complicated to determine the break points in
mixed text.

[I started to add snippets of the Tcl mentioned above here as an
example, but thought better of it; Perl undoubtedly provides some
shorthand mechanisms for dealing with multibyte character sets.
Wait for better counsel.]

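For what it's worth, the counting idea above can be sketched roughly as follows. This is Python, not the Tcl mentioned, and it is not the original code: the function names euc_char_len and wrap_euc are made up for illustration, and the input is assumed to be a single line of well-formed EUC-JP bytes. It looks at the lead byte of each character to decide how many bytes to step over, so ASCII in the middle of the line is handled too. It does not include the punctuation check.

```python
def euc_char_len(lead):
    """Byte length of the EUC-JP character whose first byte is `lead`."""
    if lead < 0x80:
        return 1   # plain ASCII
    if lead == 0x8F:
        return 3   # SS3 prefix plus two bytes (JIS X 0212)
    return 2       # ordinary two-byte pair, or SS2 plus half-width katakana


def wrap_euc(data, width):
    """Split EUC-JP bytes into pieces of at most `width` bytes each,
    breaking only on character boundaries.

    A single character wider than `width` is kept whole rather than cut.
    """
    pieces = []
    start = pos = 0
    while pos < len(data):
        n = euc_char_len(data[pos])
        # Would this character push the current piece past the limit?
        if pos + n - start > width and pos > start:
            pieces.append(data[start:pos])
            start = pos
        pos += n
    if start < len(data):
        pieces.append(data[start:])
    return pieces


text = "**日本語のテキスト".encode("euc_jp")
for piece in wrap_euc(text, 7):
    print(piece.decode("euc_jp"))
```

With a limit of 7 bytes, the sample line comes out as three pieces of 6 bytes each ("**日本", "語のテ", "キスト"), since a seventh byte would land in the middle of a pair. Counting in bytes rather than display columns is a simplification made here to keep the sketch short.
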
Cheers,

--
Frank G Bennett, Jr
Faculty of Law, Nagoya Univ
email: bennett@example.com
Tel: +81[(0)52]789-2239
- References:
- carriage returns
- From: Tony Laszlo <laszlo@example.com>
- Re: carriage returns
- From: Jonathan Q <jq@example.com>