Mailing List Archive



Re: [tlug] EDICT dictionary on Kindle



On Sat, Dec 22, 2012 at 03:34:56PM +1100, Jim Breen wrote:
> On 21 December 2012 23:45, John Mettraux <jmettraux@example.com> wrote:
> > On Fri, Dec 21, 2012 at 10:19:49PM +1100, Jim Breen wrote:
>
> >> I have put my version at:
> >> http://www.csse.monash.edu.au/~jwb/edict2.mobi
> >> Can someone with a Kindle try it out and check on
> >> (a) and (b) above for me?
>
> > For (b) I counted 167060 <idx:entry.../>.
>
> Yep, I get that too (in the edict2.html file).
>
> > Now for the file at
> >
> >   http://www.csse.monash.edu.au/~jwb/edict2.mobi
> >
> > Regarding (a), lookup from the dictionary itself (magnifying glass) doesn't
> > work; I guess it's an encoding issue (hint: the attached jpeg).
> > Lookup from a Japanese document doesn't work either (the Kindle falls back
> > to its own wawa dictionary and disregards edict2.mobi).
>
> There doesn't seem to be a magnifying glass in Kindle Previewer.

Hello,

I downloaded the Kindle Previewer to double-check.

Indeed, there is no magnifying glass, and hitting "Go To" -> "Dictionary
Search" does nothing.


> When I created the edict2.html file
> (ruby to_opf.rb < edict2.txt > html/edict2.html)
> I noticed the entries came out looking like this:
>
> <idx:entry name="word" scriptable="yes">
>   <h2>金円【きんえん】</h2>
>   <idx:orth value="\351\207\221\345\206\206"></idx:orth>
>   <idx:orth value="\343\201\215\343\202\223\343\201\210\343\202\223"></idx:orth>
>   <p>(n) money</p>
> </idx:entry>
>
> In other words, the kanji/kana UTF8 indices are in octal literals.
> Should that not happen?
>
> It's being generated by the ruby script you have on github.
> Perhaps the ruby I have installed behaves differently?
> Mine is "ruby 1.8.7 (2010-01-10 patchlevel 249) [i486-linux]"
> and is the latest available for the Ubuntu I am running.

Yes, that should not happen. Sorry, I should have mentioned in the docs that
Ruby 1.9 is required.

I get the same result (\351\207...) if I use 1.8.7 on my system.
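
For what it's worth, those octal literals are just the UTF-8 bytes of the
kanji/kana written out one by one. A minimal sketch (the helper name
decode_octal_escapes is mine, not something in to_opf.rb, and it assumes
Ruby 1.9 or later and a value made up only of \NNN escapes) that turns one
of the values from your sample back into text:

```ruby
# Hypothetical helper, for illustration only (not part of to_opf.rb).
# Collects every \NNN octal escape in the string, converts each one to a
# byte, and reinterprets the resulting byte sequence as UTF-8.
# Characters that are not part of an escape are ignored.
def decode_octal_escapes(escaped)
  bytes = escaped.scan(/\\([0-7]{3})/).map { |(octal)| octal.to_i(8) }
  bytes.pack("C*").force_encoding("UTF-8")
end

# Single quotes keep the backslashes literal, as they appear in edict2.html.
puts decode_octal_escapes('\351\207\221\345\206\206')   # => 金円
```

So the data itself is intact; it is only the way 1.8.7 writes the strings
out that differs.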

Either 1.9.2 or 1.9.3 should work. There are probably Ubuntu packages for
them out there, or you can use RVM (http://rvm.io) or chruby
(https://github.com/postmodern/chruby). Following the instructions in
https://github.com/postmodern/chruby/wiki/MRI is probably the better and
more conventional route.
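
A guard along these lines at the top of the script would have made the
requirement obvious. This is only a sketch: ruby_new_enough? is a name I am
making up here, and nothing like it is in the script at the moment.

```ruby
# Hypothetical guard, for illustration only (not currently in to_opf.rb).
# Compares the numeric parts of a version string against a minimum,
# e.g. "1.8.7" -> [1, 8, 7], which sorts before [1, 9].
def ruby_new_enough?(version, minimum = [1, 9])
  parts = version.split(".").map { |part| part.to_i }
  (parts <=> minimum) >= 0
end

# Stop early with a clear message instead of writing octal escapes.
unless ruby_new_enough?(RUBY_VERSION)
  abort "Ruby 1.9 or later is required (running #{RUBY_VERSION})"
end
```

It avoids anything that only exists in 1.9, so it should also run (and
abort) under 1.8.7.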


> > Sorry, I didn't look at (b). It would be interesting to have a look at the
> > edict2.html output before it gets rolled into the .mobi file.
>
> Is the sample above enough? I can send you the whole thing
>
> I checked what's showing in Kindle Previewer. The last entry,
> marked "118523" is the last in EDICT2, so the other 50k
> may have been dropped internally. Does your Kindle display
> the number of entries?

I'm not sure that 118483 (my file) or 118523 (your file) is the number of
entries. My file goes from "getsu - cutting off the leg at the knee" to
"harubaru - from afar".

I think those numbers are a mobi/kindle numbering scheme (not pages, not
words, something else).

I see "no 118481" and "100%" on my Kindle when on the last "page" (harubaru).

I guess a real mobi book would give the reader hints about page numbering.
Those hints are missing here (they are not necessary for us), so the reader
falls back to whatever routine it has to determine those points.
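
If the goal is to know how many entries actually went in, counting the
<idx:entry> elements in edict2.html (as we did for (b)) is probably more
reliable than whatever number the reader displays. A rough sketch, with
count_idx_entries being a name I am inventing for the occasion and the
sample being a cut-down version of your excerpt:

```ruby
# Hypothetical helper, for illustration only.
# Counts opening <idx:entry ...> tags; closing </idx:entry> tags do not
# match because of the slash after "<".
def count_idx_entries(html)
  html.scan(/<idx:entry\b/).size
end

sample = <<HTML
<idx:entry name="word" scriptable="yes">
  <idx:orth value="a"></idx:orth>
</idx:entry>
<idx:entry name="word" scriptable="yes">
  <idx:orth value="b"></idx:orth>
</idx:entry>
HTML

puts count_idx_entries(sample)   # => 2

# For the real file (path as in your command line, assuming Ruby 1.9+):
#   count_idx_entries(File.read("html/edict2.html"))
```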


If you need more help, please tell me. Best regards,

John


