TLUG Mailing List

Re: UTF-8: each character is one byte . . . . . . (was: Re: Learn a Variety of Languages) [tlug]

Date: Sat, 20 Jan 2007 13:31:23 +0900

From: "Guillaume Proux" <gproux@example.com>

Subject: Re: UTF-8: each character is one byte . . . . . . (was: Re: Learn a Variety of Languages) [tlug]

References: <45AAFDA9.90504@example.com> <19dd68ba0701160122i1b813c10jf34c0210d53fbbdd@example.com> <op.tl8roo02rtshzt@example.com> <19dd68ba0701160412y2eb95062r6235fed92b752784@example.com> <Pine.NEB.4.64.0701162139360.10912@example.com> <3156339d0701161820lb684aeubcd51914b19a87bf@example.com> <Pine.NEB.4.64.0701171657080.1515@example.com> <3156339d0701180035k2a4f2b70o3bbf00612501470@example.com> <Pine.NEB.4.64.0701201123230.1314@example.com> <20070119230346.6435923f.jep200404@example.com>

> In UTF-8, all characters contain exactly one byte without the high bit set.
Uh?
The Wikipedia page that was linked to shows one example:
"""
For example, the character aleph (א), which is Unicode U+05D0, is
encoded into UTF-8 in this way:
   * It falls into the range of U+0080 to U+07FF. The table shows it
will be encoded using two bytes, 110yyyyy 10zzzzzz.
   * Hexadecimal 0x05D0 is equivalent to binary 101-1101-0000.
   * The eleven bits are put in their order into the positions marked
by "y"-s and "z"-s: 11010111 10010000.
   * The final result is the two bytes, more conveniently expressed
as the two hexadecimal bytes 0xD7 0x90. That is the encoding of the
character aleph (א) in UTF-8.
"""
The U+05D0 code point is encoded as 11010111 10010000. Both bytes have
the high bit set.
Am I misunderstanding something, or should we check this again?
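One way to check the quoted claim directly is to count, for each character, how many bytes of its UTF-8 encoding have the high bit clear. The sketch below uses only Python's built-in encoder; the helper name low_bytes and the sample characters are arbitrary choices for illustration. Surrogates (U+D800 to U+DFFF) are skipped in the exhaustive loop because the strict UTF-8 codec refuses to encode them.

```python
def low_bytes(ch: str) -> int:
    """Count bytes in ch's UTF-8 encoding whose high bit is clear (< 0x80)."""
    return sum(1 for b in ch.encode("utf-8") if b < 0x80)

for ch in ["A", "\u05d0", "\u3042", "\U0001F600"]:
    enc = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}  {enc.hex(' ')}  high-bit-clear bytes: {low_bytes(ch)}")
# U+0041  41  high-bit-clear bytes: 1
# U+05D0  d7 90  high-bit-clear bytes: 0
# U+3042  e3 81 82  high-bit-clear bytes: 0
# U+1F600  f0 9f 98 80  high-bit-clear bytes: 0

# Exhaustive check over every encodable code point: only ASCII characters
# (U+0000..U+007F) contain a high-bit-clear byte, and then exactly one.
for cp in range(0x110000):
    if 0xD800 <= cp <= 0xDFFF:
        continue  # surrogates cannot be encoded as UTF-8
    assert low_bytes(chr(cp)) == (1 if cp < 0x80 else 0)
```

If this is right, the statement holds only for ASCII: a byte without the high bit always stands alone as a complete ASCII character, while every byte of a multi-byte character has the high bit set.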
Guillaume
