Re: carriage returns
- To: tlug@example.com
- Subject: Re: carriage returns
- From: Jonathan Q <jq@example.com>
- Date: Sat, 9 Sep 2000 14:37:59 +0900
- Content-Transfer-Encoding: 7bit
- Content-Type: text/plain; charset=iso-2022-jp
- In-Reply-To: <Pine.LNX.4.10.10009091055180.6625-100000@example.com>; from laszlo@example.com on Sat, Sep 09, 2000 at 11:01:01AM +0900
- References: <Pine.LNX.4.10.10009091055180.6625-100000@example.com>
- Reply-To: tlug@example.com
- Resent-From: tlug@example.com
- Resent-Message-ID: <hzQezB.A.TPG.WBdu5@example.com>
- Resent-Sender: tlug-request@example.com
- Sender: jq@example.com
Tony Laszlo (laszlo@example.com) wrote:

Tony> I use jvim and yudit for editing Japanese documents.
Tony> I am having trouble with long strings of Japanese
Tony> text which have no carriage returns. With Yudit,

Let me take a wild guess and ask if you're getting these things from people using Outhouse Excess? I get mails like that in English from time to time, and it's a PITA.

Tony> I think it might be better if I were to insert
Tony> carriage returns into such data, so that it would
Tony> be manageable. But I don't know how to do that and

A sticky problem. I have no experience with Yudit, but I do use vi a lot, and it has powerful pattern-matching search-and-replace features. You could use them to replace every occurrence of a given character with a carriage return, but that might not help very much unless the character you pick shows up at sensible intervals.

You could try targeting the double-byte comma (、) and see if that breaks the lines up into more reasonable chunks. I had never tried pattern-matching text replacement on double-byte text, so I didn't know if this would work, but it seemed worth a try. To do it, you can probably enter the comma using your J input method. For the carriage return, you'll have to try using its ASCII code with a backslash escape in front of it, I think (someone please correct me if I'm wrong here).

I just did a test on a text file and replaced every occurrence of 「日本語」 with "Japanese," so double-byte search and replace seems to work OK. To do this in vi and its counterparts such as jvim, enter the following in command mode:

:%s/、/escape-code-for-carriage-return-and-escape-code-for-line-feed-here/g

That should (no guarantees, of course :-) replace every Japanese double-byte comma with a line break (strictly speaking, a Unix-style line ending is a lone line feed, not carriage return + line feed). Note that the comma itself is replaced, so it disappears from the text unless you also put it in the replacement.

The other way I can think of to do it is (a good bit) harder.
You would have to write a program (Perl would probably be best for this) to either A) do the same thing (in which case you're much better off using vi as above), or B) arbitrarily insert a cr+lf at a set interval.

B would be pretty easy with ASCII text. All you'd have to do is count off (say) 60 characters and see if the 61st one is white space. If not, take the first white space character after the 60th one and replace it with a cr+lf.

With Japanese, things will be a lot more complicated. Spaces are a lot less common, and will probably be double-byte spaces. So the first space after the 60th character could be another 60 characters down the line. And of course, the 60th character in ASCII terms is the 30th human-readable one in double-byte terms, so this must be accounted for. This approach would probably not work well.

The other approach would be to count off (say) 60 characters (30 double-byte characters) and then do a test and either insert a cr+lf at that point, or move over one single-byte position and insert the cr+lf there, based on the results of the test.

The test is what becomes the sticky part. You need to determine the answer to the question "If I insert a cr+lf right here, will I cut a double-byte character in half and mangle my text?" If the answer is yes, move one single-byte space right and insert the cr+lf there. If the answer is no, insert the cr+lf where you are. If you want to go for even prettier formatting, also test to see if the character after your insertion point is white space. If it is, remove it.
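As a rough illustration of options A and B above, here is a minimal sketch in Python rather than Perl. It rests on assumptions that are not in the original message: the text has already been decoded into a Unicode string (for example by opening the file with `encoding="euc_jp"`), it is one long line with no existing line breaks, and the function names `break_at_commas` and `wrap_fixed` are made up for this example. Because the sketch counts decoded characters instead of bytes, a double-byte character can never be cut in half, which sidesteps the test described above instead of implementing it. It also means an ASCII letter and a kanji each count as one character, so line widths on screen will vary in mixed text.

```python
IDEOGRAPHIC_COMMA = "\u3001"  # the double-byte comma
IDEOGRAPHIC_SPACE = "\u3000"  # the double-byte space


def break_at_commas(text, keep_comma=True):
    """Option A: start a new line at every double-byte comma."""
    replacement = (IDEOGRAPHIC_COMMA if keep_comma else "") + "\n"
    return text.replace(IDEOGRAPHIC_COMMA, replacement)


def wrap_fixed(text, width=30):
    """Option B: start a new line every `width` characters.

    Assumes `text` is a single long line. If the character that would
    begin the next line is a space (single- or double-byte), it is dropped.
    """
    lines = []
    pos = 0
    while pos < len(text):
        lines.append(text[pos:pos + width])
        pos += width
        if pos < len(text) and text[pos] in (" ", IDEOGRAPHIC_SPACE):
            pos += 1  # skip one whitespace character right after the break
    return "\n".join(lines)
```

Reading the file with `open(path, encoding="euc_jp")` and writing the result back with the same encoding would complete the round trip; the encoding name is a guess about the data and should be changed to match whatever the documents actually use.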
I don't know how you would go about testing to see if you were going to chop a character in half or not, but I bet it's probably difficult or worse (any Perl/double-byte gurus with nothing better to do on Saturday than read TLUG please chime in on this :-)

For people, it's relatively easy, since we're looking at the human-readable text and can see where to manually hit the return key, but of course, your whole goal is to avoid doing this :-) Doing this with a program is likely going to prove much more challenging.

Jonathan

Tony> suspect it might not be so easy due to the mixed
Tony> double-byte/single-byte text in Japanese documents.
Tony> Any hints on how this cat might best be skinned
Tony> would be most appreciated.
Tony>
Tony> (I would like to stay with the apps I have now, if
Tony> possible).
Tony>
Tony> Thanks.
Tony>
Tony>
Tony>
Tony> -----------------------------------------------------------------------
Tony> Next Nomikai Meeting: October 20 (Fri) 19:00 Place: Tengu TokyoEkiMae
Tony> Next Technical Meeting: November 11 (Sat) 13:30 Place: LinuxProbe Hall
Tony> -----------------------------------------------------------------------
Tony> more info: http://www.tlug.gr.jp Sponsor: Global Online Japan
Tony>
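On the question raised in the message, whether a break at a given byte position would chop a character in half: one possible way to test it, sketched in Python, is to walk the bytes from the start of the line and step over one whole character at a time. This assumes the data is valid EUC-JP, which the original message does not state (ISO-2022-JP or Shift_JIS data would need different rules), and the function names are hypothetical. In EUC-JP a byte below 0x80 is a one-byte ASCII character, a lead byte of 0x8E starts a two-byte half-width katakana, 0x8F starts a three-byte character, and any other high byte starts a two-byte character.

```python
def euc_jp_char_len(lead):
    """Length in bytes of the EUC-JP character that starts with `lead`."""
    if lead < 0x80:
        return 1  # ASCII
    if lead == 0x8E:
        return 2  # half-width katakana
    if lead == 0x8F:
        return 3  # three-byte (JIS X 0212) character
    return 2      # ordinary double-byte character


def safe_break_offset(data, offset):
    """Return `offset` if it falls between characters, else the next boundary.

    Mirrors the "move right" rule from the message. Assumes `data` is
    valid EUC-JP and begins at a character boundary.
    """
    offset = min(offset, len(data))
    pos = 0
    while pos < offset:
        pos += euc_jp_char_len(data[pos])
    return min(pos, len(data))


def is_char_boundary(data, offset):
    """True if inserting a line break at `offset` splits no character."""
    return safe_break_offset(data, offset) == min(offset, len(data))
```

Scanning from the start of the line is deliberate: in EUC-JP the second byte of a double-byte character falls in the same range as a lead byte, so looking at a single byte in isolation cannot tell you which half of a character you are on.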
- Follow-Ups:
- Re: carriage returns
- From: "Stephen J. Turnbull" <turnbull@example.com>
- Re: carriage returns
- From: Frank BENNETT <bennett@example.com>
- References:
- carriage returns
- From: Tony Laszlo <laszlo@example.com>