Re: tlug: msword files

>>>>> "Hirotaka" == Hirotaka Yoshioka writes:

    Hirotaka> Now can we hack the code fragment which accepts one SJIS
    Hirotaka> character? Of course we need to look ahead one byte to
    Hirotaka> test if the byte sequence is a valid SJIS character.

Sure.  The problem is that almost every byte sequence is a valid SJIS
character, so most of the contents of most binary files will pass
straight through an SJIS filter.  For example, here are a few lines
from "strings `which strings`":


[snipped symbol table here]



I don't know what that mojibake means, but a moderately large
executable will give you hundreds or thousands of lines of it.  This
is for plain old ASCII; the effect would be much worse for shift JIS.
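
To make the one-byte lookahead concrete, here is a minimal sketch in
Python of an acceptor for a single SJIS character.  The byte ranges are
the usual structural ones for Shift JIS (printable ASCII and half-width
katakana as single bytes; lead bytes 0x81-0x9F and 0xE0-0xEF followed
by a trail byte in 0x40-0x7E or 0x80-0xFC).  The function names
sjis_char_len and accepted_fraction are made up for this illustration,
and the check is purely structural: it does not ask whether a code
point is actually assigned.

```python
def sjis_char_len(data: bytes, i: int) -> int:
    """Return 1 or 2 if a structurally valid SJIS character starts at
    data[i], or 0 if none does.  (Hypothetical helper, not from binutils.)"""
    b = data[i]
    # Single-byte characters: printable ASCII range or half-width katakana.
    if 0x20 <= b <= 0x7E or 0xA1 <= b <= 0xDF:
        return 1
    # Double-byte characters: a lead byte, then look ahead one byte
    # to check that a valid trail byte follows.
    if 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF:
        if i + 1 < len(data):
            t = data[i + 1]
            if 0x40 <= t <= 0x7E or 0x80 <= t <= 0xFC:
                return 2
    return 0


def accepted_fraction(data: bytes) -> float:
    """Fraction of bytes consumed by accepted characters in a greedy
    left-to-right scan of data."""
    i = accepted = 0
    while i < len(data):
        n = sjis_char_len(data, i)
        if n:
            accepted += n
            i += n
        else:
            i += 1  # skip one rejected byte and try again
    return accepted / len(data) if data else 0.0
```

Under these assumed ranges, 205 of the 256 possible byte values can
begin a character, which is why so little of a binary file gets
rejected.  Running accepted_fraction over the contents of an
executable gives a rough feel for how much would leak through.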

    Hirotaka> Does anybody send me the source code of 'strings'? I
    Hirotaka> suppose it is not a large program.

I don't happen to have a copy at the moment, but it's in GNU binutils.

    Hirotaka> Can we write a SJIS version of 'strings'?
    >> No.

    Hirotaka> I think 'no' is too strong word but not impossible. You
    Hirotaka> need to make some dirty hack :-)

Well, no.  It would be like trying to write `strings' for ISO-8859-1:
you just get so many false positives that you end up with 90% of the
file.

It might be good enough, but the chances aren't high enough that I'll
spend any more time on it ;-)
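
The ISO-8859-1 comparison can be sketched as follows.  This is not the
binutils code, just a minimal Python illustration of the same idea:
collect runs of "printable" bytes of at least four in a row, where the
only thing that changes between the two cases is the predicate.  The
names extract_runs, ascii_printable and latin1_printable are
hypothetical, and treating all of 0xA0-0xFF as printable is the
assumption that stands in for an 8-bit character set.

```python
def extract_runs(data: bytes, is_printable, min_len: int = 4):
    """Yield each run of at least min_len consecutive bytes for which
    is_printable(byte) is true."""
    start = None  # index where the current run began, if any
    for i, b in enumerate(data):
        if is_printable(b):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                yield data[start:i]
            start = None
    # Flush a run that extends to the end of the data.
    if start is not None and len(data) - start >= min_len:
        yield data[start:]


def ascii_printable(b: int) -> bool:
    # Printable 7-bit ASCII only.
    return 0x20 <= b <= 0x7E


def latin1_printable(b: int) -> bool:
    # Assumption: every byte in the upper ISO-8859-1 range counts as text.
    return ascii_printable(b) or 0xA0 <= b <= 0xFF


sample = b"\x00\x01hello\x00\xe9\xf1\xc0\xff\xa5\x02ab\x00"
ascii_hits = list(extract_runs(sample, ascii_printable))
latin1_hits = list(extract_runs(sample, latin1_printable))
```

On the sample, the ASCII predicate finds only "hello", while the
ISO-8859-1 predicate also reports the five high bytes, which in a real
executable would almost always be code or data rather than text.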

You could add lots more heuristics, but that wouldn't really be
`strings' any more, since you'd have to be really careful to avoid
stripping out stuff surrounded by MS Word formatting characters, which
is my point.

University of Tsukuba                Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Institute of Policy and Planning Sciences       Tel/fax: +81 (298) 53-5091
What are those two straight lines for?  "Free software rules."
Next Technical Meeting: October 9 (Sat), 13:30   place: Temple Univ.
* Linux Internationalisation Initiative (Li18nux) speaker: Akio Kido
* Japanese TrueType Fonts                     speaker: Adrian Havill
Next Technical Meeting: November 13 (Sat), 13:30 place: Temple Univ.
* Network Security                               speaker: Steve Baur
Next Nomikai:  December 17 (Fri), 19:00 Tengu TokyoEkiMae 03-3275-3691
more info:        Sponsor: Global Online Japan
