"My Kanpo" open law project
- To: tlug@example.com
- Subject: "My Kanpo" open law project
- From: "Frank BENNETT (フランク　ベネット)" <bennett@example.com>
- Date: Thu, 1 Mar 2001 15:48:34 +0900
- Content-Transfer-Encoding: 7bit
- Content-Type: text/plain; charset=iso-2022-jp
- Reply-To: tlug@example.com
- Resent-From: tlug@example.com
- Resent-Message-ID: <ytEMeD.A._K.LDfn6@example.com>
- Resent-Sender: tlug-request@example.com
Dear TLUG people,
Late last year and early this, I wrote to the list a few times about converting some access-restricted Japanese PDF files into plain text. As I may have mentioned, this was part of a larger scheme aimed at freeing up access to the laws of Japan -- for Japanese people as well as for gringos like myself. With your help, I have been able to get my tools into a form (just barely) fit for public inspection. This is a progress report cum publicity announcement cum invitation to do extra work^H get involved.
Apart from making Japan a better place, I have come up with an algorithm for converting vertically-formatted Japanese PDF text to reading-order text that can be viewed on a TTY console. It still has some rough spots, but with a bit more work, it promises to do a good job of conversion. If PDF irritates you as much as it does me, this may interest you.
The target text appears once each week at the following site:
    http://kanpou.pb-mof.go.jp/
This is "Kanpo", the Japanese official gazette, in which all laws, Ministry regulations and orders, Ministry announcements, and certain judicial determinations and orders are published. Contrary to the warning language published on the site, all of this material is excluded from copyright protection under section 13 of the Japanese Copyright Act. This has been confirmed by the Ministry of Finance Printing Bureau, both to me and to an editor at the Asahi Shinbun.
The files for each issue of Kanpo remain on the site for one week only, and are afflicted with Ad*b* Acr*b*t access control restrictions that prevent Acr*b*t from doing much more than displaying the text on your monitor. No save, no print, no cut and paste. (If they could have prevented the information from entering your brain, I suppose they would have imposed that restriction as well ...)
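For readers who have not met the vertical-text problem mentioned above, here is a minimal sketch of the general idea. It is not the algorithm in the My Kanpo sources. It assumes the glyphs have already been pulled out of the PDF with their page coordinates; the `Glyph` tuple, the `vertical_to_reading_order` function and the `column_tolerance` parameter are all hypothetical names chosen for illustration. Vertical Japanese text runs top to bottom within a column, and columns run right to left, so the sketch groups glyphs into columns by x position and then reads each column from the top.

```python
from typing import List, NamedTuple


class Glyph(NamedTuple):
    """Hypothetical record for one extracted character."""
    x: float   # horizontal position (PDF units, origin at bottom-left)
    y: float   # vertical position (larger y is higher on the page)
    char: str


def vertical_to_reading_order(glyphs: List[Glyph],
                              column_tolerance: float = 2.0) -> List[str]:
    """Return one string per vertical column, rightmost column first."""
    columns: List[List[Glyph]] = []
    # Visit glyphs from the right edge of the page towards the left.
    for g in sorted(glyphs, key=lambda g: -g.x):
        # A glyph joins the current column if its x is close to the
        # x of that column's first glyph; otherwise it starts a new one.
        if columns and abs(columns[-1][0].x - g.x) <= column_tolerance:
            columns[-1].append(g)
        else:
            columns.append([g])
    # Within each column, read from the top (largest y) downwards.
    return ["".join(g.char for g in sorted(col, key=lambda g: -g.y))
            for col in columns]


# Two short columns, supplied in scrambled order.
sample = [
    Glyph(88.0, 690.0, "律"),
    Glyph(100.0, 700.0, "官"),
    Glyph(88.5, 700.0, "法"),
    Glyph(100.0, 690.0, "報"),
]
for line in vertical_to_reading_order(sample):
    print(line)
```

Each output line corresponds to one column of the page, which is what a TTY console needs. A real converter would also have to cope with multi-tier page layouts, punctuation placement and rotated Latin text, none of which this sketch attempts.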
The objective is to create a persistent copy of the text content of these files, under a full text search engine, so that people with an interest in this material (journalists, lawyers, businessmen, members of the general public, and bureaucrats themselves) can experience the convenience of unrestricted access to the law as it rolls off the press.
I have written a (pretty ugly) Python script (my first) that automates the downloading, conversion and indexing of the archive. On March 10, the script and related patches will be "cited" in a short article that I have placed in the Tokyo Internet Law Review, edited by law students at the University of Tokyo. The "cite" will lead to the CVS archive for the suite on SourceForge, who have allowed the project in after full disclosure of what it's all about.
The docs and whatnot are still pretty rough, but anyone who is interested can grab a copy of the sources thus:
    cvs -d:pserver:anonymous@example.com:/cvsroot/mykanpo login
    cvs -d:pserver:anonymous@example.com:/cvsroot/mykanpo co dist
Beware, though, that I need to do some reorganization; the "dist" module includes the full sources of Python, freeWAIS-sf, kakasi, and some other weighty items. If you're on a narrow bandwidth line, you won't want to check the dist module out en bloc.
The CGI script in the suite will require lots of work to make it capable of handling the sort of load that an archive of this kind will eventually attract. Text is currently stored in disk files, and the WAIS search engine I used for development launches a fresh instance for every search. For production, everything should move into MySQL, accessed through FastCGI, and that will require lots of rewriting.
That's about it. Thanks to those who offered advice on this. Comments are, as ever, welcome. And if you know of anyone who might be interested in helping to push this forward, do please put them in touch.
Cheers,
Frank