tlug: Japanese PDF file contents
- To: tlug@example.com
- Subject: tlug: Japanese PDF file contents
- From: "Frank Bennett (フランクべネット)" <bennett@example.com>
- Date: Mon, 31 Jan 2000 14:14:38 +0900
- Content-Transfer-Encoding: 7bit
- Content-Type: text/plain; charset=us-ascii
- Reply-To: tlug@example.com
- Sender: owner-tlug@example.com
Short version: Is there a PDF->text stripper of some sort that will work with Japanese PDF files encoded in Shift-JIS?

***

Long version: Another translation query. The Japanese government publishes amendments to the law in its official gazette, "Kanpo". The gazette is now online at:

http://kanpou.pb-mof.go.jp/

The actual text of the updates is distributed as a series of PDF files. A colleague and I were commiserating about how hard it is to apply these updates to the actual text of the law, given instructions like: 'In sections 1, 4(3) and 23.2(6)(ii) of the Dogcatcher Investment Assistance Act, replace the word "net" with the words "rope or collar".'

I suggested that it might be possible to generate a patch file from the Kanpo PDF. The phrasing is highly structured, so this might work well enough to save the community some work.

I've been playing with Tcl's HTTP facilities, and I can see that it will be a simple matter to walk through the menus and snatch the PDF files themselves on a daily basis. However, when I look inside the files, I can't find anything resembling intelligible text.

Does anyone know of a stripper that can dump just the text of a Japanese PDF to a file, so that a scripting language can do something useful with it?

Many thanks for any suggestions,
Frank Bennett

--------------------------------------------------------------------
Next Nomikai Meeting: February 18 (Fri) 19:00 Tengu TokyoEkiMae
Next Technical Meeting: March 11 (Sat) 13:00 Temple University Japan
* Topic: TBD
--------------------------------------------------------------------
more info: http://www.tlug.gr.jp
Sponsor: Global Online Japan
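The patch-file idea in the message rests on amendment instructions following a fixed template. A minimal sketch of that step follows, assuming the text has already been extracted from the PDF (the open question in the post) and that it follows the English template of the quoted example. Real Kanpo notices are in Japanese and would need their own pattern. The names `AMENDMENT_RE`, `Amendment`, `parse_amendment` and `apply_amendment` are hypothetical illustrations, not part of any existing tool.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical pattern for the English template quoted in the message:
#   In sections <list> of the <Act>, replace the word(s) "<old>" with the word(s) "<new>"
AMENDMENT_RE = re.compile(
    r'In sections? (?P<sections>.+?) of the (?P<act>.+?), '
    r'replace the words? "(?P<old>[^"]+)" with the words? "(?P<new>[^"]+)"'
)


@dataclass
class Amendment:
    act: str
    sections: list[str]
    old: str
    new: str


def parse_amendment(text: str) -> Amendment | None:
    """Turn one amendment sentence into a structured record, or None if it doesn't match."""
    m = AMENDMENT_RE.search(text)
    if m is None:
        return None
    # Split "1, 4(3) and 23.2(6)(ii)" into individual section references.
    parts = re.split(r',\s*|\s+and\s+', m.group("sections"))
    sections = [p.strip() for p in parts if p.strip()]
    return Amendment(m.group("act"), sections, m.group("old"), m.group("new"))


def apply_amendment(section_text: str, amendment: Amendment) -> str:
    """Apply the word substitution to the text of one section.

    Word boundaries (\\b) assume space-delimited English; Japanese text has
    no spaces between words, so it would need a different matching rule.
    """
    pattern = r'\b' + re.escape(amendment.old) + r'\b'
    # A function replacement keeps backslashes in the new text from being
    # interpreted as regex escapes.
    return re.sub(pattern, lambda _: amendment.new, section_text)


if __name__ == "__main__":
    a = parse_amendment(
        'In sections 1, 4(3) and 23.2(6)(ii) of the Dogcatcher Investment '
        'Assistance Act, replace the word "net" with the words "rope or collar".'
    )
    print(a)
    if a is not None:
        print(apply_amendment("Strays are caught in a net.", a))
```

Each parsed record names the act, the target sections and the substitution, which is roughly the information a patch generator would need to locate and rewrite each section. The word-boundary match avoids rewriting "net" inside words like "cabinet".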
- Follow-Ups:
- Re: tlug: Japanese PDF file contents
- From: smitimko@example.com
- tlug: Japanese PDF file contents
- From: "Stephen J. Turnbull" <turnbull@example.com>