Mailing List Archive



UTF-8: each character is one byte . . . . . . (was: Re: Learn a Variety of Languages) [tlug]



Curt wrote:

> In UTF-8, all characters contain exactly one byte without the high bit set.

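Oh dear. Only the ASCII characters (U+0000 through U+007F) are encoded
with a byte that has the high bit clear; every byte of a multi-byte
UTF-8 character has the high bit set.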

When Ian started questioning Curt's code, 
that would have been a good time for Curt to check his assumptions.

> You can easily look up on the web how the encoding works.

Indeed. 

Especially since on Tue, 16 Jan 2007 09:05:44 -0500 I pointed out
which web page to read:

>    http://en.wikipedia.org/wiki/UTF-8

almost a day before Wed, 17 Jan 2007 17:08:03 +0900 (JST) when Curt wrote:

>      class String
>  	def utf8_char_count
>  	    split('').select { |c| c[0] < 128 }.length
>  	end
>      end
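
For contrast, here is a minimal sketch of a count that should hold up,
assuming the input is valid UTF-8 and a Ruby new enough to have
String#bytes (1.8.7 or later). The method names are mine, chosen for
illustration, not anything from Curt's post. The idea is to count every
byte that is not a continuation byte (bit pattern 10xxxxxx), since each
character has exactly one byte that starts it.

```ruby
# Hypothetical helper: count UTF-8 characters by counting the bytes
# that are NOT continuation bytes. Continuation bytes match 10xxxxxx,
# i.e. (byte & 0xC0) == 0x80. Assumes the string is valid UTF-8.
def utf8_char_count(str)
  str.bytes.count { |b| (b & 0xC0) != 0x80 }
end

# The test from the quoted code, restated on raw bytes for comparison:
# it counts only bytes below 128, which means only ASCII characters.
def ascii_byte_count(str)
  str.bytes.count { |b| b < 128 }
end

nihongo = "\u65E5\u672C\u8A9E"   # three kanji, nine bytes in UTF-8
puts utf8_char_count(nihongo)    # 3
puts ascii_byte_count(nihongo)   # 0
```

On pure ASCII input the two agree, which is presumably why the
original looked fine in casual testing.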


