tlug.jp Mailing List Archive
Re: [tlug] Unicode/ICU question about joining lines
- Date: Fri, 13 Aug 2021 11:14:58 +0100
- From: Darren Cook <darren@example.com>
- Subject: Re: [tlug] Unicode/ICU question about joining lines
- References: <CACaJP_QGLoO=qFPSQYUFp3PvZy7O7PTEBFvjBaom4-vPuHZLmw@mail.gmail.com> <24853.16777.99356.678602@turnbull.sk.tsukuba.ac.jp>
> It is, it just doesn't get much traffic because all linguistic
> problems are straightforward. ;-)

:-)

The problem described by Travis is one that comes up in various places, but I struggle with it most in PDF files. Whenever I tell people that PDF-to-text extraction is a hard, research-level problem, and that there are even academic conferences devoted to it, they look at me as if I'm obviously an idiot who hasn't tried applying Algorithms to it. (https://xkcd.com/1831/)

Another key place it needs to be dealt with is OCR. It also comes up when pulling paragraphs out of emails. And now I'll add restoring git commits to the list!

Darren
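To make the thread's underlying question concrete, here is a deliberately naive line-joining heuristic. It is a hypothetical sketch, not Travis's approach and not ICU's API. It assumes paragraphs are separated by blank lines, and that a line break should become a space unless the character on either side of the break is CJK (ideographs, kana, ideographic or fullwidth punctuation). As a cheap stand-in for real script properties, it classifies characters by their Unicode names; the names `is_cjk` and `join_lines` are made up for illustration.

```python
import unicodedata

# Assumption: a character counts as "CJK" (no space needed at a break) if its
# Unicode name starts with one of these prefixes. This is a rough stand-in for
# proper Script / Line_Break properties, which a real tool (e.g. ICU) would use.
_CJK_PREFIXES = ("CJK ", "HIRAGANA", "KATAKANA", "FULLWIDTH", "IDEOGRAPHIC")


def is_cjk(ch: str) -> bool:
    """Return True if ch looks like a CJK character by its Unicode name."""
    return unicodedata.name(ch, "").startswith(_CJK_PREFIXES)


def join_lines(text: str) -> str:
    """Unwrap hard-wrapped lines inside each blank-line-separated paragraph."""
    out = []
    for para in text.split("\n\n"):
        # Drop surrounding whitespace and empty lines within the paragraph.
        lines = [ln.strip() for ln in para.split("\n") if ln.strip()]
        joined = ""
        for ln in lines:
            if not joined:
                joined = ln
            elif is_cjk(joined[-1]) or is_cjk(ln[0]):
                # Japanese/Chinese text is written without inter-word spaces.
                joined += ln
            else:
                # Space-separated scripts: the line break stood for a space.
                joined += " " + ln
        out.append(joined)
    return "\n\n".join(out)
```

For example, `join_lines("日本語の\n文章です。")` gives `"日本語の文章です。"`, while `join_lines("This is a\nwrapped line.")` gives `"This is a wrapped line."`. It already fails on plenty of real input: a word hyphenated across a break (`"implemen-\ntation"`) comes out as `"implemen- tation"`, and quoted email lines, lists, and code are all mangled. That is why the problem is harder than it first looks.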
- Follow-Ups:
- Re: [tlug] Unicode/ICU question about joining lines
- From: Stephen J. Turnbull
- References:
- [tlug] Unicode/ICU question about joining lines
- From: Travis Cardwell
- [tlug] Unicode/ICU question about joining lines
- From: Stephen J. Turnbull