Mailing List Archive



Re: [tlug] Anyone alive out here ?



Later, I installed ollama and blogged about my experience with a few freely available LLMs:

   https://mstdn.io/@codewiz/112527717194517544

That was interesting. Which AMD GPU are you using?

Do you have a frontend speech-to-text model that spits out text as input to a regular LLM, then feed the output to a TTS model? ...
...
What's the current state of the art? Interfacing models with text has big limits...

Quite closely related, I've been wondering what the state of the art for open-source OCR is, particularly of Japanese text.

All the links go to Tesseract (or wrappers around it), which is simply not good enough, even for English, or to online tools where you upload your private data and pay for the privilege.

This could then lead on to the greatest unsolved computing challenge of the 21st century, which is a PDF-to-text converter. Yes, yes, I know toy examples work. I mean PDFs that were made in Microsoft Word and contain multiple columns, figures, and inset boxes, like a magazine or a company's annual report. The goal is to extract all the sections, with their headings, in a reasonable reading order (suitable for, say, a screen reader).
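To make "reasonable reading order" concrete, here is a toy sketch of the ordering step alone. It assumes some earlier stage has already produced text blocks with bounding boxes, which is the hard part and is not shown. The `Block` type, the fixed column count, and the 60% "full-width" threshold are all made up for illustration and do not come from any particular library.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical text block; y grows downward from the top of the page."""
    text: str
    x0: float  # left edge
    y0: float  # top edge
    x1: float  # right edge


def reading_order(blocks, page_width, n_columns=2):
    """Order blocks column by column, restarting after each full-width block.

    Assumes equal-width columns and that any block wider than 60% of the
    page (a title, a banner) spans all columns. Real layouts break both
    assumptions, which is exactly why this problem is hard.
    """
    col_width = page_width / n_columns

    def column_of(b):
        centre = (b.x0 + b.x1) / 2
        return min(int(centre // col_width), n_columns - 1)

    ordered, band = [], []

    def flush():
        # Within a band: left column top to bottom, then the next column.
        band.sort(key=lambda b: (column_of(b), b.y0))
        ordered.extend(band)
        band.clear()

    for b in sorted(blocks, key=lambda b: b.y0):
        if (b.x1 - b.x0) > 0.6 * page_width:
            flush()            # finish the columns above the wide block
            ordered.append(b)  # then emit the wide block itself
        else:
            band.append(b)
    flush()
    return ordered


if __name__ == "__main__":
    page = [
        Block("right 2", 320, 200, 580),
        Block("left 1", 20, 50, 280),
        Block("Title", 0, 0, 600),
        Block("right 1", 320, 50, 580),
        Block("left 2", 20, 200, 280),
    ]
    for b in reading_order(page, page_width=600):
        print(b.text)
```

Even this much already misreads inset boxes, figures with captions, and columns of unequal width, so treat it as a statement of the problem rather than a solution.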

I thought Google, OpenAI, etc. had solved it, as they train the LLMs on PDFs, and ChatGPT can be given a PDF and answer questions. But, as far as I've been able to find out, they either feed raw PDF bytes in, and hope, or they use pdf2text, and hope.

Darren


