Mailing List Archive



Re: [tlug] Unicode/ICU question about joining lines



On Thu, Aug 12, 2021 at 4:21 PM <eizietheez@example.com> wrote:
> Do you accept multi-lingual text? If not, then a simple hack would be to just
> look for spaces in the input text and classify the language accordingly. The
> probability of mis-detection should decrease exponentially with the input
> length.

That is an interesting idea!  Thanks!

The software that I am working on does accept multilingual text, but
users can put the text on one (long) line in cases where lines are not
joined correctly, so this could be a viable option.
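
To make sure that I understand the suggestion, here is a rough sketch
of the heuristic in Python.  This is not the actual software: the
names uses_spaces and join_lines and the 0.1 threshold are all
assumptions of mine, chosen only for illustration.  It counts ASCII
spaces, guesses whether the text is in a space-delimited language, and
joins wrapped lines with or without a space accordingly.

```python
def uses_spaces(text: str, threshold: float = 0.1) -> bool:
    """Guess whether text is written in a space-delimited language.

    The threshold is an arbitrary assumption, not a tuned value.
    """
    # Ignore line breaks so that wrapping does not affect the ratio.
    chars = [c for c in text if c not in "\r\n"]
    if not chars:
        return True
    spaces = sum(1 for c in chars if c == " ")
    return spaces / len(chars) >= threshold


def join_lines(text: str) -> str:
    """Join wrapped lines, inserting a space only if the text uses spaces."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    separator = " " if uses_spaces(text) else ""
    return separator.join(lines)
```

As you say, short inputs are the risky case: a few stray spaces in a
short Japanese line can push the ratio close to whatever threshold is
chosen.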

> Of course, even J語 does sometimes contain spaces in practice, simply as a
> mistake or as a kind of "scare quote" emphasis around words.

At a company where I used to work, developers put spaces around all
ローマ字 (Latin-script) words in Japanese text.  I am not certain, but
I think that the practice originated because the ticketing system in
use required spaces around markup in order to parse it correctly:

    例えば、 @foldText@ は関数である。

I suspect that they then started to put spaces around all ローマ字
words for consistency.  An unfortunate result was that those spaces
often caused unsightly line wrapping in rendered text.

Cheers,

Travis

