Mailing List Archive
Re: [tlug] Unicode/ICU question about joining lines
- Date: Sun, 15 Aug 2021 13:35:32 +0900
- From: "Stephen J. Turnbull" <turnbull.stephen.fw@example.com>
- Subject: Re: [tlug] Unicode/ICU question about joining lines
- References: <CACaJP_QGLoO=qFPSQYUFp3PvZy7O7PTEBFvjBaom4-vPuHZLmw@mail.gmail.com> <24853.16777.99356.678602@turnbull.sk.tsukuba.ac.jp> <e3e10d14-67cd-8383-f6c2-b050d4ee153b@dcook.org>
Darren Cook writes:

> :-) The problem described by Travis is one that comes up in various
> places, but I struggle with it most in PDF files. Whenever I tell people
> that PDF to Text extraction is a hard problem, research-level problem,
> Another key place it needs to be dealt with is in OCR.

But those are really different problems from Travis's, since spaces
don't exist as coded characters, but rather as offsets in image space.

BTW I was amused that Travis pointed out you didn't have to go to
ancient languages to find inconsistency in use of word-separating
spaces in a single script. Of course it was Japanese!

Steve
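
As an illustration of the point about PDF and OCR text, where a space is only a horizontal gap between glyph boxes and not a coded character, here is a minimal sketch in Python. The `Glyph` type, the `join_glyphs` function, and the `gap_ratio` threshold of 0.3 are hypothetical names and values chosen for this example; they do not come from any PDF or OCR library, and real extractors deal with far messier input (kerning, multiple columns, vertical text).

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str     # the recognized character
    x: float      # left edge of the glyph box, in page units
    width: float  # width of the glyph box, in page units


def join_glyphs(glyphs, gap_ratio=0.3):
    """Rebuild one line of text from positioned glyphs.

    A space is emitted only when the horizontal gap to the previous
    glyph exceeds gap_ratio times that glyph's width (an assumed
    heuristic, not a standard rule).
    """
    out = []
    prev = None
    for g in sorted(glyphs, key=lambda item: item.x):
        if prev is not None:
            gap = g.x - (prev.x + prev.width)
            if gap > gap_ratio * prev.width:
                out.append(" ")
        out.append(g.char)
        prev = g
    return "".join(out)


# Three glyphs: "a" and "b" touch, "c" sits after a visible gap.
line = [Glyph("a", 0, 10), Glyph("b", 10, 10), Glyph("c", 26, 10)]
print(join_glyphs(line))
```

The threshold is the fragile part: a value that works for Latin text with proportional fonts may insert spurious spaces, or miss real ones, in a script such as Japanese where word-separating spaces are used inconsistently.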
- References:
- [tlug] Unicode/ICU question about joining lines
- From: Travis Cardwell
- [tlug] Unicode/ICU question about joining lines
- From: Stephen J. Turnbull
- Re: [tlug] Unicode/ICU question about joining lines
- From: Darren Cook