Mailing List Archive



Re: [tlug] Japanese encoding problem



Jc,

--- Jean-Christian Imbeault <jean_christian@example.com> wrote:
> >From: BOTi <9915104t@example.com>
> >
> >Though it displays as a ^K, it's actually not that in the file.
> >Use an editor that can display it (Japanese). Or use some simple
> >unix tool, like sed.
> 
> Thanks for the tip but why would sed work if vim doesn't? Doesn't vim
> use sed?

In the old days of Unix (and perhaps even today in the BSDs), vi, ed, ex
and sed were closely related: vi and ex were one program, and sed grew
out of ed. But vim and sed are separate codebases, and vim does not
call sed.

> 
> And more importantly, even if I do use sed, how do I input that
> character (^K) into any sed command I might try?
> I tried:
> 
> sed -e "s/?/ /g" file.dat
> 
> but that didn't do anything to the "japanese" dots I wanted to replace.
> 
> Also please note that I am only surmising that the ^K is the japanese
> dot because that's what it looks like when I open the file 
> in a Windows editor.
> 
> If I do a /?/ I do get a few matches but none on the places where I
> "see" japanese dots in the original file. So I'm beginning to think
> that ^K is not really the japanese dot but something else, maybe even
> a "corrupted" character?

Could be. What you need to do is find out what the underlying character
codes are and work with them. 

You can use a hex editor for this, or do it the Unix way.

Pipe a line of the file containing the code to od:

cat foo.txt | od -c 

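Bytes that are not printable ASCII show up as three-digit octal codes,
so you should see the code for the separator there. A Japanese
character will usually appear as two or more such codes in a row.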
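
As a rough illustration of what to look for, here is a minimal sketch.
The sample bytes \241\246 are made up (assume they stand in for whatever
your separator really is), and dump_bytes is just a hypothetical helper
name. Setting LC_ALL=C should keep od from trying to print bytes above
127 as characters:

```shell
# Hypothetical helper: dump stdin byte by byte.
# LC_ALL=C makes od treat every byte above 127 as non-printable,
# so it is shown as a three-digit octal code.
dump_bytes() {
    LC_ALL=C od -c
}

# Sample line: "a", two made-up high bytes, "b", newline.
printf 'a\241\246b\n' | dump_bytes
```

In the output, the pair 241 246 sitting between a and b would be the
codes to use in the replacement.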

You can then replace it with something else like this:
perl -wp -e 's/\244/\n/g' < foo.txt > foo.converted

Replace \244 with the code (or codes) you got from od -c. 

> 
> Jc
> 
>

Regards,
Jake 
