Mailing List Archive


Re: "My Kanpo" open law project



On Fri, Mar 02, 2001 at 01:11:10PM +0900, Jim Breen wrote:
> JB>> > What's the "[badchar]" problem? PDF-funnies?
> >> 
> FB>> Ah.  The PDF is CID encoded.  I hacked in EUC mappings to cope with the
> FB>> special vertical-text characters (I noticed that that came up in a recent
> FB>> GhostScript-related post as well) used by Adobe.  But CID also offers a lot of
> FB>> rare glyphs that don't have direct EUC-JP mappings.  
> 
> If they are kanji etc., Ken Lunde can probably tell you the mapping into
> JIS X 0212. You can graft in an image for them, but that makes it less
> useful as general text.

It will be least stressful to fix the conversion and mapping problem inside
the code of xpdf/pdftotext -- converting to Unicode and back again just to
check the validity of characters is, I freely confess, a wasteful breed of
madness. There are other anomalies in the output of pdftotext, too, which
could be addressed at the same time.  By a real programmer.[1] Someday.  :-)

Frank

[1] I'm a comparative lawyer.  My favored excuse for not knowing C, and for
many other shortcomings as well.
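
For readers wondering what the "convert to Unicode and back again" validity check amounts to, here is a minimal sketch in Python. It is not the code used for the project and not anything from xpdf/pdftotext. The function name mark_unmappable is hypothetical, and Python's built-in euc_jp codec is assumed as a stand-in for whatever mapping tables the original hack used. Each character is round-tripped through EUC-JP, and anything that does not survive is replaced with the "[badchar]" marker mentioned in the thread.

```python
def mark_unmappable(text, marker="[badchar]"):
    """Replace characters with no EUC-JP encoding by a visible marker.

    Hypothetical helper: Python's "euc_jp" codec stands in for the
    mapping tables a real converter would use.
    """
    out = []
    for ch in text:
        try:
            # Round trip: Unicode -> EUC-JP bytes -> Unicode.
            if ch.encode("euc_jp").decode("euc_jp") == ch:
                out.append(ch)
            else:
                out.append(marker)
        except UnicodeError:
            # The codec has no mapping for this character.
            out.append(marker)
    return "".join(out)


if __name__ == "__main__":
    # U+1F600 is an emoji, which EUC-JP cannot represent.
    print(mark_unmappable("官報 \U0001F600"))
```

Doing this as a separate pass over the output is exactly the wasteful part: a converter that knows its own CID-to-EUC-JP table could flag the unmapped glyphs at the point of conversion instead.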
