tlug.jp Mailing List Archive
Re: [tlug] Unicode/ICU question about joining lines
- Date: Thu, 12 Aug 2021 16:16:32 +0900
- From: eizietheez@example.com
- Subject: Re: [tlug] Unicode/ICU question about joining lines
- References: <CACaJP_QGLoO=qFPSQYUFp3PvZy7O7PTEBFvjBaom4-vPuHZLmw@mail.gmail.com> <CAAhy3dufsFDgNaF0V5yq0-VxKSyC5kkF4kF7dLabnYKk8o67rQ@mail.gmail.com> <CADR0rncr_fhnuKBzk1qqx=3niZnHBEQp31k5XrjnuFzdNwR-Vg@mail.gmail.com> <CACaJP_QU4zS-NjzuX5mq4c+uuMCsOk6otJTD-GCig94k_ZQtmg@mail.gmail.com>
- User-agent: mblaze/1.1
Travis Cardwell <travis.cardwell@example.com> wrote:
> My goal is to create a function that determines how to join the
> lines/fragments of text automatically, based on the content. In my
> first post, I included some code that does this based on the Unicode
> block of neighboring characters. This strategy works, but it requires
> classifying the many Unicode blocks, and I hoped that there is an
> easier way.

Do you need to accept multilingual text? If not, then a simple hack would be to look for spaces in the input text and classify the language accordingly: text with spaces between words calls for joining lines with a space, while text without them (such as Japanese) calls for joining them directly. The probability of misdetection should decrease exponentially with the input length. Of course, even J語 sometimes contains spaces in practice, either by mistake or as a kind of "scare quote" emphasis around words.
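Below is a minimal Python sketch of the spaces hack, assuming monolingual input that arrives as a list of lines. The names `looks_space_delimited` and `join_lines`, the 0.5 threshold, and the per-line majority vote are hypothetical choices for illustration, not code from this thread. Voting per line means a few stray spaces in Japanese text only change the result if they show up in more than half of the lines.

```python
def looks_space_delimited(lines, threshold=0.5):
    """Hypothetical heuristic: each non-empty line votes on whether it
    contains an ASCII space between characters.

    Only U+0020 is checked, so an ideographic space (U+3000) in Japanese
    text does not count as a vote for a space-delimited language.
    """
    votes = [" " in line.strip() for line in lines if line.strip()]
    if not votes:
        return False
    # Ties (exactly half the lines contain a space) fall to "no spaces".
    return sum(votes) / len(votes) > threshold


def join_lines(lines):
    """Join line fragments with a space if the text looks space-delimited,
    otherwise concatenate them directly (e.g. for Japanese)."""
    parts = [line.strip() for line in lines if line.strip()]
    sep = " " if looks_space_delimited(parts) else ""
    return sep.join(parts)


if __name__ == "__main__":
    print(join_lines(["My goal is to create a function that",
                      "determines how to join the lines"]))
    print(join_lines(["日本語の 文章は", "単語の間に空白を", "入れません。"]))
```

In the second call only one of three lines contains a stray space, so the vote still selects direct concatenation. If stray spaces occurred independently per line (an assumption), the chance of an accidental majority would shrink quickly as the number of lines grows, which is one way to read the "exponentially" remark above.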
- Follow-Ups:
  - Re: [tlug] Unicode/ICU question about joining lines
    - From: Travis Cardwell
- References:
  - [tlug] Unicode/ICU question about joining lines
    - From: Travis Cardwell
  - Re: [tlug] Unicode/ICU question about joining lines
    - From: Raymond Wan
  - Re: [tlug] Unicode/ICU question about joining lines
    - From: Benjamin Kowarsch
  - Re: [tlug] Unicode/ICU question about joining lines
    - From: Travis Cardwell