TLUG Mailing List

Re: [tlug] Unicode/ICU question about joining lines

Date: Wed, 11 Aug 2021 09:47:59 +0100

From: Darren Cook <darren@example.com>

Subject: Re: [tlug] Unicode/ICU question about joining lines

References: <CACaJP_QGLoO=qFPSQYUFp3PvZy7O7PTEBFvjBaom4-vPuHZLmw@mail.gmail.com>

> I have a Unicode question, and I am posting to this mailing list because
> it appears that the Lingo list is not used these days.

The [unicode] and/or [icu] tags on Stack Overflow might help. From a
quick scan, it looks like there are more questions about going the
other way, though.

> My problem is straightforward.  Given a string containing a paragraph of
> text with "soft" line breaks, I want to output a string containing the
> text without line breaks.  The way that lines are joined depends on the
> language.  Many languages such as English require spaces, while many
> languages such as Japanese do not use spaces.

Do you need to handle all possible languages? I'd probably start with
a "" (empty string) join rule for lines that end in a CJK-block
character or in "-", use " " as the default rule for all other
characters, and adapt that as people complain.
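
As a rough sketch of that starting rule in Python: the names is_cjk and
join_lines are made up for illustration, and the code point ranges are
my own guess at what "CJK block characters" should cover (Han,
Hiragana, Katakana, CJK punctuation, fullwidth forms), not a complete
list.

```python
def is_cjk(ch):
    """Rough test for CJK characters; the ranges are an assumption, not exhaustive."""
    cp = ord(ch)
    return (
        0x3000 <= cp <= 0x30FF      # CJK Symbols and Punctuation, Hiragana, Katakana
        or 0x3400 <= cp <= 0x4DBF   # CJK Unified Ideographs Extension A
        or 0x4E00 <= cp <= 0x9FFF   # CJK Unified Ideographs
        or 0xF900 <= cp <= 0xFAFF   # CJK Compatibility Ideographs
        or 0xFF00 <= cp <= 0xFFEF   # Halfwidth and Fullwidth Forms
    )


def join_lines(text):
    """Join soft-wrapped lines: "" after a CJK character or "-", " " otherwise."""
    # Drop surrounding whitespace and skip empty lines.
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln]
    out = ""
    for ln in lines:
        if not out:
            out = ln
        elif is_cjk(out[-1]) or out[-1] == "-":
            out += ln          # the "" rule
        else:
            out += " " + ln    # the default " " rule
    return out
```

Only the last character of the previous line is checked, so a Japanese
line followed by an English one is joined without a space. That is the
kind of case I'd expect to adjust once people complain.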

But if your source text contains hyphens at the end of lines, people
will start complaining very quickly. Knowing the difference between a
non-hyphenated word that got split with a hyphen, and a hyphenated word
that got split at its hyphen, is a big jump in complexity.
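
To give an idea of where the complexity comes from, here is one naive
heuristic, assuming you have a word list to check against. KNOWN_WORDS
and rejoin are hypothetical names, and the tiny set stands in for a
real dictionary.

```python
# Hypothetical stand-in for a real dictionary / word list.
KNOWN_WORDS = {"example", "well", "known"}


def rejoin(left, right, known_words=KNOWN_WORDS):
    """Rejoin two fragments that were split at a line-end hyphen.

    left is the fragment before the hyphen, right the fragment that
    starts the next line.
    """
    # If the fragments form a known word without the hyphen, assume the
    # hyphen was only inserted by line wrapping and drop it.
    if (left + right).lower() in known_words:
        return left + right
    # Otherwise assume the hyphen belongs to the word and keep it.
    return left + "-" + right
```

With that word list, rejoin("exam", "ple") gives "example" and
rejoin("well", "known") gives "well-known". It still guesses wrong
whenever both forms are real words ("re-cover" vs. "recover"), and it
needs a dictionary per language, which is why I call it a big jump.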

Darren