Mailing List Archive



Re: [tlug] Anyone alive out here ?



On 2024/09/05 20:58, Darren Cook wrote:
>> Later, I installed ollama and blogged about my experience with a few
>> freely available LLMs:
>>
>>    https://mstdn.io/@codewiz/112527717194517544

> That was interesting. Which AMD GPU are you using?

Radeon RX 7900 XT. It's 2 years old, but very stable and performs well on Linux for desktop, gaming and compute workloads. AMD's CUDA-compatible stack no longer sucks the way it used to.

The main issue is having only 20GB of VRAM. Not sure when they will announce the next gen of desktop GPUs with at least 40GB, which would be very useful for LLM inference and training.

If you need more GPU RAM, I'd recommend getting a VM on Google Cloud or another provider that charges by the hour.
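
Back of the envelope, counting weights only and ignoring the KV cache
and activations (a deliberate simplification), this is roughly why
20GB is tight and 40GB would open up larger models:

  # Rough VRAM estimate for inference: weights only, ignoring the
  # KV cache and activation overhead.
  def vram_gb(params_billions, bits_per_weight):
      return params_billions * 1e9 * bits_per_weight / 8 / 1e9

  for params, bits in [(7, 16), (13, 8), (70, 4)]:
      print(f"{params}B at {bits}-bit: ~{vram_gb(params, bits):.0f} GB")

That prints ~14 GB for a 7B model at fp16 and ~35 GB for a 70B model
at 4-bit, so 20GB caps you well short of the larger models even before
counting the KV cache.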


>> What's the current state of the art? Interfacing models with text has
>> big limits...

> Quite closely related, I've been wondering what the state of the art
> for open-source OCR is, particularly of Japanese text.

I'm waiting for the first Llama-like LLM with image recognition similar to ChatGPT.

It's not the same as classic OCR and has obvious limitations, but a multimodal LLM can figure out things from context, like "oh, this is the handwriting of a young child... and this word is misspelled, I think they meant to write hamburger... pages 3 and 4 appear to be swapped".
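
As a rough sketch of what that could look like locally (assuming the
ollama Python client and a vision-capable model like llava already
pulled; the file name is made up):

  import ollama  # assumes the official ollama Python client

  # Ask a local vision model to transcribe a scan, with contextual
  # instructions a classic OCR engine can't follow.
  response = ollama.chat(
      model="llava",  # assumed: any vision-capable local model
      messages=[{
          "role": "user",
          "content": "Transcribe this handwritten page; fix obvious "
                     "misspellings and note anything out of order.",
          "images": ["page3.png"],  # hypothetical scanned page
      }],
  )
  print(response["message"]["content"])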

> All the links go to Tesseract (or wrappers around it), which is simply
> not good enough, even for English. Or to online tools where you upload
> your private data, and pay for the privilege.
>
> This could then lead on to the greatest unsolved computing challenge
> of the 21st century, which is a PDF to text converter. Yes, yes, I
> know toy examples work. I mean PDFs that were made in Microsoft Word,
> contain multiple columns and figures, and inset boxes, like a
> magazine, or a company's annual report. And extracting all the
> sections with headings, in a reasonable reading order (suitable for,
> say, a screen reader).
>
> I thought Google, OpenAI, etc. had solved it, as they train the LLMs
> on PDFs, and ChatGPT can be given a PDF and answer questions. But, as
> far as I've been able to find out, they either feed raw PDF bytes in,
> and hope, or they use pdf2text, and hope.
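
For reference, the "use pdf2text, and hope" baseline is roughly this
(a minimal sketch assuming pypdf; the file name is made up). It emits
text runs in whatever order the PDF stores them, which is exactly how
columns, inset boxes and reading order get mangled:

  from pypdf import PdfReader  # assumes pypdf is installed

  # Naive extraction: no layout analysis, no reading order.
  reader = PdfReader("annual_report.pdf")
  for page in reader.pages:
      print(page.extract_text())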


Ultimately, narrow models for OCR, speech recognition and translation can't improve beyond a certain point because they lack semantic understanding. They also can't be instructed in plain English to adjust the output (e.g. "use informal language in the dialogue part, but formal Japanese for the professor").

Just yesterday, I asked ChatGPT to compare a 5-page lease contract in Japanese with an older one from 5 years ago, then summarize the important differences along with any clauses I should watch out for.

I could have asked "translate this PDF to English", but then I would have to do all the hard work by myself :-)
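
Scripted against the API, that workflow might look roughly like this
(a sketch assuming the openai Python package and that the text has
already been extracted from both PDFs; file and model names are made
up):

  from openai import OpenAI  # assumes the official openai package

  client = OpenAI()  # reads OPENAI_API_KEY from the environment

  # Hypothetical inputs: plain text already extracted from the PDFs
  # (the extraction step is the hard part, as discussed above).
  old_text = open("lease_2019.txt").read()
  new_text = open("lease_2024.txt").read()

  response = client.chat.completions.create(
      model="gpt-4o",  # assumed model name
      messages=[{
          "role": "user",
          "content": "Compare these two Japanese lease contracts and "
                     "summarize the important differences in English, "
                     "flagging any clauses to watch out for.\n\n"
                     "--- OLD ---\n" + old_text +
                     "\n--- NEW ---\n" + new_text,
      }],
  )
  print(response.choices[0].message.content)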

--
_ // Bernie Innocenti
\X/  https://codewiz.org/

