Re: tlug: msword files

>>>>> "Hirotaka" == Hirotaka Yoshioka writes:

    Hirotaka> Shimpei Yamashita wrote:

    >> But does "strings" work on Japanese characters at all? I
    >> suspect not.

Yes.  As long as they're 7-bit JIS.  You'll get ESC-less JIS, but
that's easily repaired since you _will_ be left with the rest of the
kanji in-out sequences.  That won't work on Japanese MS Word files, of

    Hirotaka> Can we write a SJIS version of 'strings'?

No.  strings(1) simply (1) checks whether the file is an object or
library file and, if so, uses libbfd to find the data, otherwise takes
the whole file, and then (2) scans that byte by byte using isprint()
and spits out every run of 4 or more printable bytes.  Even if we had
a POSIX locale for Japan, which we don't, really, the
byte-orientedness would sink this idea.  Not to mention the fact that
since MS saw fit to pack GR with the half-wit kana, most executables
are going to be about 70-80% printable outside of the string data.
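
A minimal sketch of the scan in step (2), in Python rather than C.  It
assumes the C-locale isprint() range (0x20 through 0x7E) and skips the
libbfd step entirely; the name ascii_strings is made up for
illustration and is not how strings(1) itself is written.

```python
# Bytes that isprint() accepts in the C locale (assumption: no other locale).
PRINTABLE = set(range(0x20, 0x7F))


def ascii_strings(data: bytes, min_len: int = 4):
    """Yield each run of at least min_len printable bytes in data."""
    run = bytearray()
    for byte in data:
        if byte in PRINTABLE:
            run.append(byte)
            continue
        # A non-printable byte ends the current run.
        if len(run) >= min_len:
            yield bytes(run)
        run.clear()
    # Flush a run that reaches the end of the data.
    if len(run) >= min_len:
        yield bytes(run)


text = "日本語"
# 7-bit JIS (ISO-2022-JP) and Shift JIS encodings of the same text,
# each padded with a couple of binary bytes.
jis_sample = b"\x00\x01" + text.encode("iso2022_jp") + b"\x00"
sjis_sample = b"\x00\x01" + text.encode("shift_jis") + b"\x00"

print(list(ascii_strings(jis_sample)))
print(list(ascii_strings(sjis_sample)))
```

The first sample comes out as one run that still begins with "$B", the
kanji-in sequence minus its ESC, because ESC is the only non-printable
byte in it.  The second yields nothing: every Shift JIS double-byte
character starts with a byte above 0x7E, so no printable run in the
kanji text ever gets longer than one byte.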

I.e., a non-ASCII version of strings must know a lot more about file
structure; for Shimpei's application you really need to know what an
MS Word file looks like, and even MS Word doesn't know that from one
version to the next; I doubt you'll find anybody to maintain a version
of strings that handles shift JIS.

