TLUG Mailing List

tlug: Japanese PDF file contents

To: tlug@example.com

Subject: tlug: Japanese PDF file contents

From: "Frank Bennett (フランクべネット)" <bennett@example.com>

Date: Mon, 31 Jan 2000 14:14:38 +0900

Content-Transfer-Encoding: 7bit

Content-Type: text/plain; charset=us-ascii

Reply-To: tlug@example.com

Sender: owner-tlug@example.com

Short version:  Is there a PDF->text stripper of some sort that will work
with Japanese PDF files encoded in Shift-JIS?

***

Long version:

Another translation query.  The Japanese government issues updates to law
through an official registry called "Kanpo".  This publication is now
online, at: 

  http://kanpou.pb-mof.go.jp/

The actual text of the updates is distributed as a series of PDF files. 

A colleague and I were commiserating with one another about the
difficulties of applying such an update to the actual text of the law --- "In
sections 1, 4(3) and 23.2(6)(ii) of the Dogcatcher Investment Assistance
Act, replace the word 'net' with the words 'rope or collar'."

I suggested that it might be possible to generate a patch file from the
Kanpo PDF --- the phrasing is highly structured, so this might work well
enough to save some work for the community.
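
To make the patch idea concrete, here is a minimal sketch in Python. It
assumes an English-language instruction shaped like the Dogcatcher example
above; real Kanpo notices are in Japanese and would need their own patterns.
The names AMENDMENT, parse_amendment and apply_amendment are hypothetical,
and the law is assumed to be held as a dict mapping section ids to text.

```python
import re

# Hypothetical pattern modeled on the English example in this message.
# Real Kanpo notices are in Japanese and would need different patterns.
AMENDMENT = re.compile(
    r'In sections? (?P<sections>.+?) of the (?P<act>.+?), '
    r'replace the words? "(?P<old>[^"]+)" with the words? "(?P<new>[^"]+)"'
)


def parse_amendment(sentence):
    """Return (act, [section ids], old, new), or None if nothing matches."""
    m = AMENDMENT.search(sentence)
    if m is None:
        return None
    # Split "1, 4(3) and 23.2(6)(ii)" into individual section ids.
    parts = re.split(r',\s*(?:and\s+)?|\s+and\s+', m.group("sections"))
    sections = [p for p in parts if p]
    return m.group("act"), sections, m.group("old"), m.group("new")


def apply_amendment(law, amendment):
    """Return a copy of `law` (section id -> text) with the amendment applied.

    Sections named in the amendment but absent from `law` are ignored.
    """
    _act, sections, old, new = amendment
    # Whole-word match so that "net" does not also hit "network".
    # Word boundaries are an English-only convenience.
    word = re.compile(r'\b' + re.escape(old) + r'\b')
    patched = dict(law)
    for sec in sections:
        if sec in patched:
            patched[sec] = word.sub(lambda _m: new, patched[sec])
    return patched
```

Feeding the example sentence (with double quotes around the replaced words)
to parse_amendment and passing the result to apply_amendment should rewrite
only the listed sections and leave the rest of the law untouched. Emitting a
real patch file would then be a matter of diffing the old and new texts.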

I've been playing with Tcl's HTTP facilities, and I can see that it
will be a simple matter to walk through the menus and snatch the PDF
files themselves on a daily basis.  However, I can't find anything like
intelligible text in there.  Does anyone know if there is a stripper out
there that can dump just the text of a Japanese PDF to a file so that it
can be made useful to a scripting language?
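
One likely reason the raw file shows no readable text is that PDF content
streams are commonly compressed with the FlateDecode (zlib) filter. The
sketch below is a first step only, and rests on the assumption that the
Kanpo files use that filter: it inflates whatever streams zlib accepts. The
name inflate_streams is hypothetical, and the strings inside the inflated
streams may still be in a font-specific encoding rather than plain Shift-JIS.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords of a PDF object.
STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Return the inflated contents of every stream that zlib can decode.

    Streams using other filters (or no filter) are skipped. This does not
    parse the PDF object structure; it is a crude scan for inspection only.
    """
    results = []
    for m in STREAM.finditer(pdf_bytes):
        # decompressobj tolerates the end-of-line bytes before "endstream".
        inflater = zlib.decompressobj()
        try:
            results.append(inflater.decompress(m.group(1)))
        except zlib.error:
            continue
    return results
```

Reading a downloaded file in binary mode and passing its bytes to
inflate_streams gives something to inspect for text-showing operators such
as Tj and TJ before deciding how to decode the strings they carry.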

Many thanks for any suggestions,
Frank Bennett

--------------------------------------------------------------------
Next Nomikai Meeting: February 18 (Fri) 19:00 Tengu TokyoEkiMae
Next Technical Meeting:  March 11 (Sat) 13:00 Temple University Japan
* Topic: TBD
--------------------------------------------------------------------
more info: http://www.tlug.gr.jp        Sponsor: Global Online Japan
Follow-Ups:

Re: tlug: Japanese PDF file contents
From: smitimko@example.com

tlug: Japanese PDF file contents
From: "Stephen J. Turnbull" <turnbull@example.com>
