Re: tlug: msword files

>>>>> "Hirotaka" == Hirotaka Yoshioka <> writes:

    Hirotaka> Now can we hack the code fragment which accepts one SJIS
    Hirotaka> character? Of course we need to look ahead one byte to
    Hirotaka> test if the byte sequence is a valid SJIS character.

Sure.  The problem is that almost any byte sequence is a valid SJIS
character, so most of the contents of most binary files would pass an
SJIS test.
For example, here are a few lines from "strings `which strings`":


[snipped symbol table here]

I don't know what that mojibake means, but a moderately large
executable will give you hundreds or thousands of lines of it.  And
that is with plain old ASCII as the filter; the effect would be much
worse for Shift JIS, which accepts far more byte values.
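
To put a rough number on "much worse", here is a back-of-the-envelope
sketch.  It assumes the plain JIS X 0208 byte ranges for Shift JIS
(lead bytes 0x81-0x9F and 0xE0-0xEF, trail bytes 0x40-0x7E and
0x80-0xFC, plus single-byte half-width katakana 0xA1-0xDF) and ignores
vendor extensions; the function names are mine, made up for
illustration.  It computes the chance that a uniformly random byte
position starts something the filter would accept.  Real binaries are
not uniformly random, so treat this as an order-of-magnitude argument
only.

```python
# Assumed byte ranges (JIS X 0208 only, no vendor extensions).
ASCII_PRINTABLE = set(range(0x20, 0x7F))                      # 95 values
HALFWIDTH_KANA = set(range(0xA1, 0xE0))                       # 63 values
SJIS_LEAD = set(range(0x81, 0xA0)) | set(range(0xE0, 0xF0))   # 47 values
SJIS_TRAIL = set(range(0x40, 0x7F)) | set(range(0x80, 0xFD))  # 188 values


def ascii_accept_rate():
    """Chance that one random byte is printable ASCII."""
    return len(ASCII_PRINTABLE) / 256


def sjis_accept_rate():
    """Chance that a random position starts an acceptable SJIS character.

    Counts over all 65536 (byte, next byte) pairs: a single-byte
    character is accepted whatever follows it, while a lead byte is
    accepted only when followed by a valid trail byte.
    """
    single = len(ASCII_PRINTABLE | HALFWIDTH_KANA) * 256
    double = len(SJIS_LEAD) * len(SJIS_TRAIL)
    return (single + double) / 65536


if __name__ == "__main__":
    print("ASCII: %.3f" % ascii_accept_rate())
    print("SJIS:  %.3f" % sjis_accept_rate())
```

Under those assumptions the ASCII filter accepts about 37% of random
positions and the SJIS filter about 75%, and each accepted double-byte
character swallows two bytes of garbage instead of one.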

    Hirotaka> Does anybody send me the source code of 'strings'? I
    Hirotaka> suppose it is not a large program.

I don't happen to have a copy at the moment but it's in GNU binutils.

    Hirotaka> Can we write a SJIS version of 'strings'?
    >> No.

    Hirotaka> I think 'no' is too strong word but not impossible. You
    Hirotaka> need to make some dirty hack :-)

Well, no.  It would be like trying to write `strings' for ISO-8859-1:
you just get so many false positives that you end up with 90% of the 

It might be good enough, but the chances aren't high enough that I'll
spend any more time on it ;-)
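
For the record, the naive version Hirotaka describes, with the
one-byte look-ahead, might look something like the sketch below.  This
is not the binutils code and not anything I have run against real Word
files; the name sjis_strings, the min_chars default of 4 and the byte
ranges (JIS X 0208 only) are all my assumptions.  It only checks that
bytes are structurally valid, which is exactly why it lets so much
garbage through.

```python
def is_single(b):
    """Printable ASCII or single-byte half-width katakana."""
    return 0x20 <= b <= 0x7E or 0xA1 <= b <= 0xDF


def is_lead(b):
    """First byte of a double-byte character (assumed JIS X 0208 ranges)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF


def is_trail(b):
    """Second byte of a double-byte character."""
    return 0x40 <= b <= 0x7E or 0x80 <= b <= 0xFC


def sjis_strings(data, min_chars=4):
    """Yield byte runs of at least min_chars structurally valid characters."""
    run = bytearray()
    count = 0          # characters in the current run, not bytes
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if is_single(b):
            run.append(b)
            count += 1
            i += 1
        elif is_lead(b) and i + 1 < n and is_trail(data[i + 1]):
            # Look ahead one byte before accepting a double-byte character.
            run += data[i:i + 2]
            count += 1
            i += 2
        else:
            # Not a character: flush the run if it is long enough.
            if count >= min_chars:
                yield bytes(run)
            run.clear()
            count = 0
            i += 1
    if count >= min_chars:
        yield bytes(run)
```

Each yielded run can be decoded for display with something like
run.decode("shift_jis", errors="replace").  Note that a sequence such
as 8A 40 8B 41 8C 42 8D 43 is accepted as four "characters" even though
it is just noise, and this sketch has no way to tell the difference.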

You could add lots more heuristics, but that wouldn't really be
`strings' any more, since you'd have to be really careful to avoid
stripping out stuff surrounded by MS Word formatting characters, which
is my point.

University of Tsukuba                Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Institute of Policy and Planning Sciences       Tel/fax: +81 (298) 53-5091
What are those two straight lines for?  "Free software rules."
