tlug.jp Mailing List Archive
Re: [tlug] PDF to text converter (was: Anyone alive out here ?)
- Date: Fri, 06 Sep 2024 16:43:22 +0900
- From: Jim Blackson <blackson@example.com>
- Subject: Re: [tlug] PDF to text converter (was: Anyone alive out here ?)
- References: <2d9532be-b1af-42d1-a8cd-8eae13f9f9d5@codewiz.org> <f48593f1-8a3a-440d-9b29-33199b6dcef2@dcook.org>
On Thu, 5 Sep 2024 12:58:01 +0100,
Darren Cook <darren@example.com> wrote:
> Quite closely related, I've been wondering what the state of the
> art for open-source OCR is, particularly of Japanese text.
> ...
> This could then lead on to the greatest unsolved computing
> challenge of the 21st century, which is a PDF to text converter.

Here is a link that talks about "extracting tabular data from PDF
files and images of tables." (I have not tried the software.)
Apparently it uses Tesseract for OCR.

https://eihli.github.io/image-table-ocr/pdf_table_extraction_and_ocr.html

Here is another link, to a PDF-Extraction-Kit called MinerU.
(Sorry, I haven't tried this one either.)

https://github.com/opendatalab/MinerU

Hope this helps,
jimb.

Jim Blackson
- References:
  - Re: [tlug] Anyone alive out here ?
    - From: Bernie Innocenti
  - Re: [tlug] Anyone alive out here ?
    - From: Darren Cook