Mailing List Archive


Re: location of pdf2txt



On Tue, Nov 28, 2000 at 01:27:00PM -0800, Drew Poulin wrote:
> Selva wrote:
> 
> > Does anyone know the latest location of pdf2txt?
> 
> I'm not sure if this is what you're looking for, but xpdf includes
> pdftotext, which does extract Japanese if you set that compile option.
> 
> http://www.foolabs.com/xpdf/

Motion seconded: this is what you want, Selva.

Xpdf now ships with the decryption patches that used to be housed at another
site.  It also supports Japanese (you'll get output in EUC-JP). If you are working
on vertically formatted docs, drop me a note -- last week I wrote a Python
script that munges the debugging output of pdftotext as applied to vertical
files into reading-order horizontal text.
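
For anyone curious what that kind of reordering involves, here is a minimal
Python sketch. It is not the script itself and it does not parse pdftotext's
debugging output (that format is not described here). It assumes you already
have each character as a hypothetical (x, y, char) tuple with y growing
downward, groups characters into columns by x position, and reads the columns
right to left, top to bottom. The function name and the column_tolerance
parameter are made up for illustration.

```python
from typing import List, Tuple

# Hypothetical glyph record: (x, y, character).
# Assumption: y grows downward, as in most page coordinate dumps.
Glyph = Tuple[float, float, str]


def vertical_to_horizontal(glyphs: List[Glyph],
                           column_tolerance: float = 5.0) -> List[str]:
    """Return one string per vertical column, in reading order.

    Columns are ordered right to left; characters within a column
    are ordered top to bottom.
    """
    columns: List[List[Glyph]] = []
    # Visit glyphs from the rightmost x position to the leftmost.
    for glyph in sorted(glyphs, key=lambda g: -g[0]):
        # A glyph joins the current column if its x is close to the
        # x of the first glyph placed in that column.
        if columns and abs(columns[-1][0][0] - glyph[0]) <= column_tolerance:
            columns[-1].append(glyph)
        else:
            columns.append([glyph])
    # Within each column, read from top (small y) to bottom (large y).
    return ["".join(ch for _, _, ch in sorted(col, key=lambda g: g[1]))
            for col in columns]


if __name__ == "__main__":
    sample = [
        (80, 30, "文"), (100, 10, "日"), (80, 10, "の"),
        (100, 50, "語"), (80, 50, "章"), (100, 30, "本"),
    ]
    for line in vertical_to_horizontal(sample):
        print(line)
```

Real pages need more care than this (ruby text, punctuation offsets, and
multi-block layouts, for example), so treat it as a starting point only.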

It would really make my day if someone would take the Python script's
algorithms and implement them inside pdftotext (and add the CMap entries for
vertically formatted text as well).  But the script, such as it is, is free
for the asking.

Cheers,
----
-x80
Frank G Bennett, Jr         @@
Faculty of Law, Nagoya Univ () email: bennett@example.com
Tel: +81[(0)52]789-2239     ()
