Mailing List Archive



[tlug] Subject: Retirement, part 2: Books ...



Steve,

Congratulations on your retirement. I second the recommendation of https://www.gonbei.jp/ as a domain registrar.

Regarding the books, I am just at the point of trying to do something similar, and have run across a service, ScanB, that destructively scans books by cutting off the binding and scanning the pages with a Canon DR-X10C.

     https://sg.canon/en/consumer/dr-x10c/product

The cost is 80 yen per book for up to 300 pages, or 160 yen for more than 300 pages, in grayscale with no OCR. Alternatively, it is 210 yen per book for unlimited pages, scanned in color with OCR; this latter pricing is for orders of 10 or more books. The data is downloadable on completion.

More details are on their Japanese-only home page: https://scanb.jp/howmuch

I just ordered this for one book, in grayscale, to test the concept. It will arrive in about 2 weeks, so unfortunately I can't yet give you a direct testimonial on the quality.

If you have a short time window to sort this out, there is a nominal fee for express service.

Searching for 本 スキャン ("book scan") on Google, I see a number of competing services.

You might give these a look for anything that you want to keep, but do not have the space to store.

As a side note, I will be using this for a research project. I take scanned definitive source books, mostly in Japanese, from the 1960s-1980s, use ImageMagick to make one JPG per page, and bulk upload the images to Google Cloud Vision, where the files are automatically OCR'd and the text is published to a stream. A short script can then subscribe to the stream and download the text for each image.
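
A minimal sketch of the page-splitting step, assuming the scan arrives as a single PDF and that ImageMagick 7 (with Ghostscript for PDF input) is installed. The function names, the 300 DPI density, the JPEG quality of 90, and the `page-%04d.jpg` naming are all illustrative assumptions, not settings from the actual project.

```python
import subprocess
from pathlib import Path


def build_split_command(pdf_path, out_dir, density=300, quality=90):
    """Return an ImageMagick command that writes one JPEG per PDF page."""
    out_pattern = str(Path(out_dir) / "page-%04d.jpg")
    return [
        "magick",                  # ImageMagick 7 binary; on ImageMagick 6 use "convert"
        "-density", str(density),  # rasterization resolution in DPI, given before the input
        str(pdf_path),
        "-quality", str(quality),  # JPEG compression quality
        out_pattern,               # %04d expands to the zero-based page index
    ]


def split_pdf(pdf_path, out_dir):
    """Create out_dir if needed and run the conversion, raising on failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_split_command(pdf_path, out_dir), check=True)
```

Calling `split_pdf("book.pdf", "pages")` would then leave `pages/page-0000.jpg`, `pages/page-0001.jpg`, and so on, ready for upload. The zero-padded names keep the pages in order when sorted by filename.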

This tutorial did not really work for me, but it describes the OCR service: https://cloud.google.com/functions/docs/tutorials/ocr  ChatGPT was better for setting this up via the CLI.
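
The subscribing script could look roughly like the sketch below. It assumes the "stream" is a Google Cloud Pub/Sub subscription and that each message is a JSON object with `filename` and `text` fields; both are assumptions to check against the real payload. The function names and arguments are hypothetical, and `listen` needs the `google-cloud-pubsub` package plus working credentials.

```python
import json
from pathlib import Path


def save_ocr_message(data, out_dir):
    """Decode one message payload and write its text to <out_dir>/<image stem>.txt."""
    payload = json.loads(data.decode("utf-8"))
    # "filename" and "text" are assumed field names; adjust to the real payload.
    stem = Path(payload["filename"]).stem
    out_path = Path(out_dir) / f"{stem}.txt"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(payload["text"], encoding="utf-8")
    return out_path


def listen(project_id, subscription_id, out_dir):
    """Pull messages until interrupted, saving each one to disk."""
    # Imported lazily so save_ocr_message needs only the standard library.
    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    path = subscriber.subscription_path(project_id, subscription_id)

    def callback(message):
        save_ocr_message(message.data, out_dir)
        message.ack()  # acknowledge only after the text is on disk

    future = subscriber.subscribe(path, callback=callback)
    future.result()  # blocks until the stream fails or is cancelled
```

Keeping the file-writing logic separate from the Pub/Sub wiring means it can be tried on a sample payload before any cloud setup is in place.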

After loading the text files into an LLM with a large context window, I should be able to run natural-language queries against the document contents. It is indeed an amazing time we live in.
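
One way to prepare the per-page text for that step is to join it into a single file with page markers, so answers can be traced back to a page. This is only a sketch: the `build_corpus` name, the `=== page ===` marker format, and the one-`.txt`-file-per-page layout are assumptions.

```python
from pathlib import Path


def build_corpus(text_dir, out_path):
    """Concatenate per-page .txt files in filename order, with a marker before each page.

    Keep out_path outside text_dir so a rerun does not pick up the previous output.
    """
    parts = []
    for page in sorted(Path(text_dir).glob("*.txt")):
        text = page.read_text(encoding="utf-8").strip()
        parts.append(f"=== {page.stem} ===\n{text}\n")
    corpus = "\n".join(parts)
    Path(out_path).write_text(corpus, encoding="utf-8")
    # Character count: a rough gauge of whether the book fits the context window.
    return len(corpus)
```

The returned character count is not a token count, but it gives a quick sense of whether a whole book is likely to fit in one prompt or needs to be split.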

Shaun

