Re: "My Kanpo" open law project
- To: tlug@example.com
- Subject: Re: "My Kanpo" open law project
- From: "Frank BENNETT (フランク　ベネット)" <bennett@example.com>
- Date: Thu, 1 Mar 2001 17:30:29 +0900
- Content-Transfer-Encoding: 7bit
- Content-Type: text/plain; charset=iso-2022-jp
- In-Reply-To: <200103010736.QAA26536@example.com>; from Jim Breen on Thu, Mar 01, 2001 at 04:36:24PM +0900
- References: <200103010736.QAA26536@example.com>
- Reply-To: tlug@example.com
- Resent-From: tlug@example.com
- Resent-Message-ID: <A6EIBB.A.Ui.rign6@example.com>
- Resent-Sender: tlug-request@example.com
On Thu, Mar 01, 2001 at 04:36:24PM +0900, Jim Breen wrote:
> >> Date: Thu, 1 Mar 2001 15:48:34 +0900
> >> From: "Frank BENNETT" <bennett@example.com>
> >>
> >> I have come up with an algorithm for converting vertically-formatted
> >> Japanese PDF text to reading-order text that can be viewed on a
> >> TTY console. It still has some rough spots, but with a bit more
> >> work, it promises to do a good job of conversion. If PDF irritates
> >> you as much as it does me, this may interest you.
>
> Sounds great. Any chance to see some samples?

Your wish is my command set. A partial converted archive is available
for viewing at:

http://www.nomolog.nagoya-u.ac.jp/~bennett/rippinhood/

Use username "just", password "lookin" to authorize a connection.
I'll leave offsite password access in place until the 10th -- I'm
planning to offer an advance view to the MOF Printing Bureau anyway,
so you-all can share a password with them until then.

I would appreciate it if this info were confined to members of TLUG,
though. It's not that I'm trying to keep this under wraps, but I
_don't_ have approval from our faculty to serve this stuff to the
world at large, and it _would_ be embarrassing if someone in
government complained to my Dean. If you want unrestricted access,
set up the software and run your own mirror. :-)

If you do take a look, the archive stops on 2 February, which is not
right. We do have source on file down to the present, but a bug in
the cascading conversion algorithm is holding things up. I haven't
gotten around to repairing it yet because this _is_ still just a test
suite. However, you can use what's there to get a feel for the search
engine and see what the conversion filter does with vertical PDF.

Our gateway is VERY slow at the moment. You might want to try the
connection morning-times, when things seem to be a little less
sluggish from this end ...

By the by, if you run a search for インターネット, the _last_ two
pages in the list returned show what the algorithm does for orthodox
vertically formatted pages arranged into ranks that do not change
height mid-page -- very nice. The other pages returned show what sort
of irritating broken weirdness happens when tables and other erratica
are thrown into the middle of the page -- common in Kanpo, so
something that I need to fix.

Most of the weirdness that results can be controlled, with a little
more effort and possibly some work (by someone other than ignorant
me) on the source code to xpdf's pdftotext filter. Ultimately, it
would be nice to see the entire formatting algorithm incorporated
into xpdf -- it is indifferent to text direction -- but Python is
working very nicely as a prototyping platform, so that can wait until
things stabilize.

Cheers,
Frank
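
The message does not show the conversion code, so what follows is only
a minimal sketch of the general idea for the "orthodox" case it
describes: ranks of vertical text whose height does not change
mid-page. Everything here is an assumption made for illustration: the
Glyph record, the split_on_gaps and reading_order helpers, the
rank_gap and column_gap thresholds, and a coordinate system with the
origin at the top left. It is not the author's algorithm and it does
not use pdftotext.

```python
from typing import NamedTuple


class Glyph(NamedTuple):
    """One character with its position (hypothetical record, not xpdf output)."""
    char: str
    x: float  # horizontal centre, growing rightwards (assumed)
    y: float  # vertical centre, growing downwards (assumed)


def split_on_gaps(values, gap):
    """Group numbers into sorted clusters separated by more than `gap`."""
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= gap:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return clusters


def reading_order(glyphs, rank_gap=15.0, column_gap=5.0):
    """Return one string per vertical column, in reading order.

    Ranks are read top to bottom, columns within a rank right to left,
    and characters within a column top to bottom. The thresholds are
    illustrative guesses, not measured values.
    """
    lines = []
    # A rank is a horizontal band bounded by large vertical gaps.
    for band in split_on_gaps({g.y for g in glyphs}, rank_gap):
        top, bottom = band[0], band[-1]
        in_rank = [g for g in glyphs if top <= g.y <= bottom]
        # Columns are clusters of glyphs sharing roughly the same x.
        columns = split_on_gaps({g.x for g in in_rank}, column_gap)
        for col in reversed(columns):  # rightmost column first
            left, right = col[0], col[-1]
            column = sorted(
                (g for g in in_rank if left <= g.x <= right),
                key=lambda g: g.y,
            )
            lines.append("".join(g.char for g in column))
    return lines
```

A table or other insert in the middle of a page would break the
assumption that a clean vertical gap separates every rank, which is
consistent with the broken output reported above for such pages.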
- References:
- Re: "My Kanpo" open law project
- From: jwb@example.com (Jim Breen)