Re: tlug: msword files
- To: tlug@example.com
- Subject: Re: tlug: msword files
- From: "Stephen J. Turnbull" <turnbull@example.com>
- Date: Fri, 8 Oct 1999 16:22:26 +0900 (JST)
- Content-Transfer-Encoding: 7bit
- Content-Type: text/plain; charset=us-ascii
- In-Reply-To: <37FD7E24.3ACE5B87@example.com>
- References: <19991007095454N.hbell@example.com> <Pine.LNX.4.10.9910071031110.1892-100000@example.com> <19991008064806.A1963@example.com> <37FD3924.EF3A79B@example.com> <14333.17985.440788.722125@example.com> <37FD7E24.3ACE5B87@example.com>
- Reply-To: tlug@example.com
- Sender: owner-tlug@example.com
>>>>> "Hirotaka" == Hirotaka Yoshioka <hyoshiok@example.com> writes:

    Hirotaka> Now can we hack the code fragment which accepts one SJIS
    Hirotaka> character? Of course we need to look ahead one byte to
    Hirotaka> test if the byte sequence is a valid SJIS character.

Sure.  The problem is that almost everything is a valid SJIS
character, so most of most binary files will get passed through SJIS.
For example, here are a few lines from "strings `which strings`":

    /lib/ld-linux.so.2
    __gmon_start__
    libbfd-2.9.1.0.25.so
    _DYNAMIC
    _GLOBAL_OFFSET_TABLE_
    _init
    _fini
    [snipped symbol table here]
    _end
    GLIBC_2.1
    GLIBC_2.0
    PTRh
    QVhp
    (<xt
    G<Pj
    ueh\
    80t-@example.com(@example.com#@
    WVSj/
    80t-@example.com(@example.com#@
    @@+E
    80t-@example.com(@example.com#@
    @@example.com$
    8#t+C8#t&C8#t!C
    80t-@example.com(@example.com#@
    @@example.com$
    @example.com$
    8#t+C8#t&C8#t!C
    80t-@example.com(@example.com#@
    cHJy
    version
    help
    target
    [snipped]

I don't know what that mojibake means, but a moderately large
executable will give you hundreds or thousands of lines of it.  This
is for plain old ASCII; the effect would be much worse for shift JIS.

    Hirotaka> Does anybody send me the source code of 'strings'? I
    Hirotaka> suppose it is not a large program.

I don't happen to have a copy at the moment but it's in GNU binutils.

    Hirotaka> Can we write a SJIS version of 'strings'?

    >> No.

    Hirotaka> I think 'no' is too strong word but not impossible. You
    Hirotaka> need to make some dirty hack :-)

Well, no.  It would be like trying to write `strings' for ISO-8859-1:
you just get so many false positives that you end up with 90% of the
file.  It might be good enough, but the chances aren't high enough
that I'll spend any more time on it ;-)

You could add lots more heuristics, but that wouldn't really be
`strings' any more, since you'd have to be really careful to avoid
stripping out stuff surrounded by MS Word formatting characters, which
is my point.
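To make the false-positive problem concrete, here is a minimal Python
sketch of the "look ahead one byte" scanner discussed above.  It is
not the GNU binutils code.  The function names are hypothetical, and
the byte ranges are the commonly documented Shift JIS ones (lead bytes
0x81-0x9F and 0xE0-0xFC, trail bytes 0x40-0x7E and 0x80-0xFC,
half-width katakana 0xA1-0xDF), which is an assumption about which
SJIS variant you care about.

```python
def is_single_byte(b):
    """Tab, printable ASCII, or a half-width katakana byte."""
    return b == 0x09 or 0x20 <= b <= 0x7E or 0xA1 <= b <= 0xDF


def is_lead(b):
    """First byte of a two-byte character (assumed vendor-extended range)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def is_trail(b):
    """Second byte of a two-byte character."""
    return 0x40 <= b <= 0x7E or 0x80 <= b <= 0xFC


def sjis_strings(data, min_chars=4):
    """Return runs of at least min_chars consecutive SJIS-looking characters."""
    runs = []
    start = 0   # byte offset where the current run began
    count = 0   # characters (not bytes) in the current run
    i = 0
    while i < len(data):
        b = data[i]
        if is_single_byte(b):
            step = 1
        elif is_lead(b) and i + 1 < len(data) and is_trail(data[i + 1]):
            step = 2    # the one-byte lookahead succeeded
        else:
            step = 0
        if step:
            if count == 0:
                start = i
            count += 1
            i += step
        else:
            # Run broken: keep it only if it was long enough.
            if count >= min_chars:
                runs.append(data[start:i])
            count = 0
            i += 1
    if count >= min_chars:
        runs.append(data[start:])
    return runs


def kept_fraction(data, min_chars=4):
    """Fraction of the input bytes that end up inside some reported run."""
    if not data:
        return 0.0
    return sum(len(r) for r in sjis_strings(data, min_chars)) / len(data)
```

Real Japanese text passes, as it should.  So does machine code: the
eight bytes 89 e5 8b 45 fc 83 c0 01 (a few ordinary x86 instructions)
come back as one seven-byte "string", because 89 e5, 8b 45 and fc 83
all look like valid two-byte characters and c0 looks like half-width
katakana.  Running kept_fraction over a real executable is a quick way
to see how much of the file such a scanner would keep.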
-- 
University of Tsukuba                Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Institute of Policy and Planning Sciences       Tel/fax: +81 (298) 53-5091
__________________________________________________________________________
__________________________________________________________________________
What are those two straight lines for?  "Free software rules."
-------------------------------------------------------------------
Next Technical Meeting: October 9 (Sat), 13:30 place: Temple Univ.
* Linux Internationalisation Initiative (Li18nux) speaker: Akio Kido
* Japanese TrueType Fonts speaker: Adrian Havill
Next Technical Meeting: November 13 (Sat), 13:30 place: Temple Univ.
* Network Security speaker: Steve Baur
Next Nomikai: December 17 (Fri), 19:00 Tengu TokyoEkiMae 03-3275-3691
-------------------------------------------------------------------
more info: http://www.tlug.gr.jp          Sponsor: Global Online Japan
- References:
  - Re: tlug: msword files
    - From: Tony Laszlo <laszlo@example.com>
  - Re: tlug: msword files
    - From: Shimpei Yamashita <shimpei@example.com>
  - Re: tlug: msword files
    - From: "Hirotaka Yoshioka" <hyoshiok@example.com>
  - Re: tlug: msword files
    - From: "Stephen J. Turnbull" <turnbull@example.com>
  - Re: tlug: msword files
    - From: "Hirotaka Yoshioka" <hyoshiok@example.com>