tlug.jp Mailing List Archive
Re: [tlug] Unicode/ICU question about joining lines
- Date: Thu, 12 Aug 2021 12:22:32 +0900
- From: Travis Cardwell <travis.cardwell@example.com>
- Subject: Re: [tlug] Unicode/ICU question about joining lines
- References: <CACaJP_QGLoO=qFPSQYUFp3PvZy7O7PTEBFvjBaom4-vPuHZLmw@mail.gmail.com> <CABHGxq5Ma58RaaruwK6x+5o_vBh66fkjHnTjWYpaMC5FYOgOTg@mail.gmail.com>
On Thu, Aug 12, 2021 at 11:07 AM Jim Breen wrote:
> > Given a string containing a paragraph of
> > text with "soft" line breaks,
>
> What exactly do you mean by a _"soft" line break_? Is it a specific
> character?

In document/markup languages, a soft line break is a line break in the
source code that does not represent a line break in the actual content.
The line breaks are the usual (`\n`), not a special character.

For example, (La)TeX allows you to write a single paragraph by
"wrapping" text across multiple lines (using "soft line breaks"). The
soft line breaks in the source do not determine where lines are broken
in the output. The term "soft" is used to distinguish this type of line
break from "hard" line breaks in the output (using `\\`, `\newline`, or
`\hfill \break` for example).

> > I want to output a string containing the
> > text without line breaks.
>
> Output to what? Write it to a file (as in fprintf() in C), display it
> on a screen, chisel it on stone, ...?

I wrote that with a function in mind. The input of the function is a
string that may contain newlines, and the output of the function is a
string that does not contain newlines. Such a function could be used
with input read from a file (or `STDIN`/API/database), and the output
could be written to a file (or `STDOUT`/API/database).

> > The way that lines are joined depends on the
> > language. Many languages such as English require spaces, while many
> > languages such as Japanese do not use spaces.
>
> Don't you really mean "[t]he way that lines are *broken* when
> displaying, printing, etc. depends ....."?

I think that examples may best illustrate the motivation. Consider the
following English sentence, which is split into two lines using a soft
line break:

    This is an
    English example.

The input string is `This is an\nEnglish example.` (which could have a
trailing line break, but that is unrelated to this problem). The
function should return `This is an English example.` in this case
because English uses spaces to separate words.

Here is a Japanese sentence, which is split into two lines using a soft
line break:

    これは日本語の
    例です。

The input string is `これは日本語の\n例です。`, and the function should
return `これは日本語の例です。` in this case because Japanese does not
use spaces to separate words.

ICU provides an API for breaking text, but I do not know of a good way
to "join" lines of text like this.

> Sorry if this is being difficult or pedantic, but I can't get my head
> around the question itself.

No problem at all! In my attempt to keep my question concise, I was not
very clear. Sorry about that!

Cheers,

Travis
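To make the two examples in the message concrete, here is a minimal sketch of such a joining function in Python. It is not an ICU API and not the approach the thread settled on. It rests on one loudly stated assumption: a break is joined without a space only when the characters on both sides have the Unicode East Asian Width property `W` (Wide) or `F` (Fullwidth), as reported by the standard `unicodedata` module. The names `join_soft_breaks` and `_is_spaceless` are hypothetical, chosen for illustration.

```python
import unicodedata

# ASSUMPTION (heuristic, not an ICU rule): characters whose East Asian
# Width is Wide ("W") or Fullwidth ("F") belong to text written without
# spaces between words.
_NO_SPACE_WIDTHS = {"W", "F"}


def _is_spaceless(ch: str) -> bool:
    """Return True if ch is Wide or Fullwidth per its East Asian Width."""
    return unicodedata.east_asian_width(ch) in _NO_SPACE_WIDTHS


def join_soft_breaks(text: str) -> str:
    """Join the lines of one paragraph into a single line.

    A space is inserted at each break unless the characters on both
    sides of the break are Wide/Fullwidth.
    """
    # Trim each line and drop empty ones (including a trailing newline).
    lines = [line.strip() for line in text.split("\n")]
    lines = [line for line in lines if line]
    if not lines:
        return ""

    result = lines[0]
    for line in lines[1:]:
        if _is_spaceless(result[-1]) and _is_spaceless(line[0]):
            result += line        # e.g. Japanese: no separator
        else:
            result += " " + line  # e.g. English: single space
    return result


if __name__ == "__main__":
    print(join_soft_breaks("This is an\nEnglish example."))
    print(join_soft_breaks("これは日本語の\n例です。"))
```

The heuristic handles the two examples from the message, but it is known to be incomplete: Hangul syllables are Wide although Korean separates words with spaces, and Thai is written without spaces although its characters are not Wide. A more faithful solution would probably need per-script or per-language rules.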
- Follow-Ups:
- Re: [tlug] Unicode/ICU question about joining lines
- From: Michael Paddon
- Re: [tlug] Unicode/ICU question about joining lines
- From: Jim Breen
- References:
- [tlug] Unicode/ICU question about joining lines
- From: Travis Cardwell
- Re: [tlug] Unicode/ICU question about joining lines
- From: Jim Breen