Mailing List Archive


Re: [tlug] Unicode/ICU question about joining lines

On Thu, Aug 12, 2021 at 4:21 PM <> wrote:
> Do you accept multi-lingual text? If not, then a simple hack would be to just
> look for spaces in the input text and classify the language accordingly. The
> probability of mis-detection should decrease exponentially with the input
> length.

That is an interesting idea!  Thanks!

The software that I am working on does accept multilingual text, but
users can write the text in one (long) line in cases where lines are not
joined correctly, so this could be a viable option.
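For illustration, here is a minimal sketch of how that heuristic might look
in Python.  It is not the code from my software: the function names and the
0.1 threshold are assumptions made up for this example, and it only counts
ASCII spaces.

```python
def space_ratio(text: str) -> float:
    """Return the fraction of characters (excluding newlines) that are ASCII spaces."""
    chars = [c for c in text if c not in "\r\n"]
    if not chars:
        return 0.0
    return sum(c == " " for c in chars) / len(chars)


def uses_word_spaces(text: str, threshold: float = 0.1) -> bool:
    """Guess whether the text separates words with spaces.

    The threshold is an arbitrary assumption, not a measured value.
    """
    return space_ratio(text) >= threshold


def join_lines(lines: list[str], threshold: float = 0.1) -> str:
    """Join wrapped lines, inserting a space only if the text looks space-delimited."""
    stripped = [line.strip() for line in lines if line.strip()]
    # Classify on the whole paragraph, so longer input gives a more reliable guess.
    separator = " " if uses_word_spaces("".join(stripped), threshold) else ""
    return separator.join(stripped)
```

With this sketch, join_lines(["The quick brown", "fox jumps over"]) joins with
a space, while join_lines(["これは日本語の", "文章です。"]) joins without one.
Text that mixes both conventions can still be classified incorrectly.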

> Of course, even J語 does sometimes contain spaces in practice, simply as a
> mistake or as a kind of "scare quote" emphasis around words.

At a company where I worked, developers put spaces around all ローマ字
words in Japanese text.  I am not certain, but I think that the practice
originated because the ticketing system in use required spaces to
correctly parse markup:

    例えば、 @foldText@ は関数である。

I suspect that they started to put spaces around all such ローマ字 for
consistency.  An unfortunate result was that such spaces would often
cause unsightly line wrapping in rendered text.


