Mailing List Archive
Re: [tlug] Unicode/ICU question about joining lines
- Date: Wed, 11 Aug 2021 09:47:59 +0100
- From: Darren Cook <darren@example.com>
- Subject: Re: [tlug] Unicode/ICU question about joining lines
- References: <CACaJP_QGLoO=qFPSQYUFp3PvZy7O7PTEBFvjBaom4-vPuHZLmw@mail.gmail.com>
> I have a Unicode question, and I am posting to this mailing list because
> it appears that the Lingo list is not used these days.

The [unicode] and/or [icu] tags on StackOverflow might help. A quick scan suggests there are more questions about going the other way, though.

> My problem is straightforward. Given a string containing a paragraph of
> text with "soft" line breaks, I want to output a string containing the
> text without line breaks. The way that lines are joined depends on the
> language. Many languages such as English require spaces, while many
> languages such as Japanese do not use spaces.

Do you need to handle all possible languages?

I'd probably start by using "" as the joining rule for lines that end in CJK block characters, as well as for lines that end in "-", and use " " as the default rule for all other characters, and adapt that as people complain.

But if your source text contains hyphens at the end of lines, people will start complaining very quickly. Knowing the difference between a non-hyphenated word that got split with a hyphen, and a hyphenated word that got split at its hyphen, is a big jump in complexity.

Darren
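
The starting rule suggested in the reply above could be sketched roughly as follows. This is only an illustration, not a tested solution: the function names (`ends_without_space`, `join_soft_breaks`) and the list of code point ranges are assumptions chosen for the example, and the ranges are far from exhaustive. Only the last character of each line is examined, and a trailing "-" is kept as-is, so a word that was hyphenated only because of the line break will come out with a stray hyphen, which is exactly the complaint anticipated above.

```python
# Hypothetical set of Unicode ranges treated as "no space needed".
# This list is an assumption for illustration; adapt it as people complain.
NO_SPACE_RANGES = (
    (0x3000, 0x303F),  # CJK Symbols and Punctuation
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xFF00, 0xFFEF),  # Halfwidth and Fullwidth Forms
)


def ends_without_space(line):
    """Return True if the line should be joined to the next one with ""."""
    if not line:
        return False
    last = line[-1]
    if last == "-":
        # The hyphen is kept; we cannot tell a soft hyphen from a real one.
        return True
    code_point = ord(last)
    return any(low <= code_point <= high for low, high in NO_SPACE_RANGES)


def join_soft_breaks(text):
    """Join the lines of a paragraph, using "" or " " per the rule above."""
    # Trim surrounding whitespace and drop empty lines.
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln]
    parts = []
    for index, line in enumerate(lines):
        parts.append(line)
        if index < len(lines) - 1:
            parts.append("" if ends_without_space(line) else " ")
    return "".join(parts)


if __name__ == "__main__":
    print(join_soft_breaks("Many languages such as\nEnglish require spaces."))
    print(join_soft_breaks("日本語では\nスペースを使いません。"))
    print(join_soft_breaks("a well-\nknown problem"))
```

Because the default is " ", a line that ends in Latin text and is followed by a line of Japanese still gets a space; deciding that case properly would need a rule that looks at both sides of the break.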
- Follow-Ups:
- Re: [tlug] Unicode/ICU question about joining lines
- From: Travis Cardwell
- References:
- [tlug] Unicode/ICU question about joining lines
- From: Travis Cardwell