Mailing List Archive
Re: [tlug] Unicode/ICU question about joining lines
- Date: Thu, 12 Aug 2021 14:32:17 +0900
- From: Benjamin Kowarsch <trijezdci@example.com>
- Subject: Re: [tlug] Unicode/ICU question about joining lines
- References: <CACaJP_QGLoO=qFPSQYUFp3PvZy7O7PTEBFvjBaom4-vPuHZLmw@mail.gmail.com> <CAAhy3dufsFDgNaF0V5yq0-VxKSyC5kkF4kF7dLabnYKk8o67rQ@mail.gmail.com>
On Thu, 12 Aug 2021 at 12:02, Raymond Wan <rwan.kyoto@example.com> wrote:
On Wed, Aug 11, 2021 at 4:28 PM Travis Cardwell wrote:
> My problem is straightforward. Given a string containing a paragraph of
> text with "soft" line breaks, I want to output a string containing the
> text without line breaks. The way that lines are joined depends on the
> language. Many languages such as English require spaces, while many
> languages such as Japanese do not use spaces.

Does the language need to be detected by analysing the text?
Or can the language be an input parameter?

If it can be an input parameter, this is quite trivial.

For languages without whitespace separation between words:
Simply process the input to delete the soft line breaks.

For languages with whitespace separation between words:
Simply process the input to replace each soft line break with a space,
UNLESS that soft break is preceded or followed by whitespace, in which
case you can simply delete the soft break.

If you need to detect the language, you might want to do this as a
preparatory step before doing the above.
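A minimal Python sketch of the rule described above, assuming the language is passed in by the caller rather than detected. `join_soft_breaks` and its `uses_spaces` flag are hypothetical names for illustration, not part of ICU or any library. It also assumes the input is a single paragraph with no blank lines. Python's `str.splitlines()` splits on `\n`, `\r\n` and Unicode line separators such as U+2028.

```python
def join_soft_breaks(paragraph: str, uses_spaces: bool) -> str:
    """Remove the soft line breaks from a single paragraph.

    uses_spaces is a hypothetical caller-supplied flag: True for languages
    that separate words with spaces (e.g. English), False for languages
    that do not (e.g. Japanese). No language detection is attempted.
    """
    lines = paragraph.splitlines()
    if not lines:
        return ""
    joined = lines[0]
    for line in lines[1:]:
        # Insert a space only for space-separated languages, and only when
        # neither side of the break already has whitespace; otherwise the
        # break is simply deleted.
        needs_space = (
            uses_spaces
            and joined != ""
            and line != ""
            and not joined[-1].isspace()
            and not line[0].isspace()
        )
        if needs_space:
            joined += " "
        joined += line
    return joined


print(join_soft_breaks("This is a\nparagraph of\ntext.", uses_spaces=True))
print(join_soft_breaks("これは\n日本語の\n文です。", uses_spaces=False))
```

Because the flag applies to the whole paragraph, mixed text (for example Japanese with embedded English words) is not handled; that is where a per-break decision, or the language-detection step mentioned above, would be needed.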
- Follow-Ups:
- Re: [tlug] Unicode/ICU question about joining lines
- From: Travis Cardwell
- References:
- [tlug] Unicode/ICU question about joining lines
- From: Travis Cardwell
- Re: [tlug] Unicode/ICU question about joining lines
- From: Raymond Wan