Encrypted PDF
- To: tlug@example.com
- Subject: Encrypted PDF
- From: Frank BENNETT <bennett@example.com>
- Date: Fri, 3 Nov 2000 17:09:29 +0900
- Content-Transfer-Encoding: 7bit
- Content-Type: text/plain; charset=iso-2022-jp
- Reply-To: tlug@example.com
- Resent-From: tlug@example.com
- Resent-Message-ID: <oLWTmB.A.5gC.sanA6@example.com>
- Resent-Sender: tlug-request@example.com
Yes, folks, it's time for that topic again. Periodically, I have come back to the list with requests for information about uncanning the text in Japanese PDF files. Here I am again.

This is a longish message. The short questions for those in a hurry are:

 o Does anyone here know of a tool other than xpdf/pdftotext for extracting text from an encrypted PDF file containing Japanese Shift-JIS encoded text?

 o Is anyone here friends with Derek Noonburg?

Now the full story. But first, a brief recap:

The last time we visited our hero, he was attempting to extract Japanese plain text from access-restricted PDF files published to the Web by the Printing Bureau of the Japanese Ministry of Finance. (His idea is that the law, of all things, should not be published in a form that restricts distribution, so he is trying to give the Japanese government a nudge in the right direction.) The text in the files is Shift-JIS encoded and set in vertical orientation. The original files can be found at:

 http://kanpou.pb-mof.go.jp/

The last time I tuned in to the list on this, someone (Shimpei?) suggested that I use xpdf. I fetched it, compiled pdftotext with the decryption patches (now merged into the main source tree as of version 0.91), and *thought* that it had pretty well solved the problem, apart from missing a few vertical-style characters that I can hack in on my own.

Today, I discovered that pdftotext stops processing many of the target files before the actual end of the text. This seems to be associated with the substitution of ASCII for special characters, such as "TM", "ae", "ff" and so forth, but no such characters exist in the PDF text at the point where pdftotext thinks it finds them. I have tried disabling these substitutions in the source of pdftotext, but the output still stops at the same point. This is now well beyond my meagre computing skills.
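For anyone trying to pin down where the spurious substitutions show up, here is a minimal Python sketch. It assumes the extracted output has been saved as raw Shift-JIS bytes; the function name `find_suspect_ascii` and the `SUSPECT` token list are hypothetical, taken only from the examples mentioned above rather than from pdftotext's actual substitution table.

```python
import re

# Hypothetical list: the ASCII expansions mentioned in the message.
# pdftotext's real substitution table is not reproduced here.
SUSPECT = ("TM", "ae", "ff")

def find_suspect_ascii(raw: bytes, encoding: str = "shift_jis"):
    """Decode extracted text and return (character offset, token) pairs
    for every suspect ASCII expansion found in the decoded text."""
    # errors="replace" keeps going past any malformed byte sequences
    text = raw.decode(encoding, errors="replace")
    pattern = re.compile("|".join(re.escape(s) for s in SUSPECT))
    return [(m.start(), m.group()) for m in pattern.finditer(text)]
```

In running Japanese legal text, any hit is likely to be an artifact, so the reported offsets give a starting point for comparing against the page where the output stops.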
I either need to find a way to fix pdftotext for use on this class of PDF file, find another decryption/extraction tool, or give up on the project as a serious republication effort.

Xpdf and pdftotext are written by Derek Noonburg. I patched his source in order to get around the access restriction on these files. He has this to say about that:

  I occasionally get email asking if I can explain how to crack a PDF file, or if I can help decrypt a PDF file. I won't help these people because I believe that an author's requests relating to the use of his/her work should be honored. I distribute source code (for Xpdf) under a particular license (the GPL) which depends entirely on users' goodwill for its effectiveness. If any of my users ever decided to violate the license, I would probably never even know about it, much less be able to do anything about it. The only thing I can do is trust the users. In light of this, it would be very hypocritical of me to, on one hand, ask people to honor my licensing restrictions, and, on the other hand, bypass (or assist others in bypassing) another author's requested restrictions.

I believe that this is a special case; the ultimate author of Japanese law is the Japanese public. All I am trying to do is make it available to them, via means which are legal in Japan to the best of my knowledge. I think it's a persuasive case, but since the author of xpdf doesn't know me from a box of apples, it would help if I could go to him with an introduction. ... ?

Cheers,
Frank