
Mailing List Archive
Re: [tlug] Website Question(s)
>>>>> "Lyle" == Lyle Saxon <Lyle> writes:
Lyle> The language angle is one thing I've been wondering about -
Lyle> the person having trouble with the links is in Portugal....
Not a problem. It can only be a problem in a language which uses
7-bit codes in a way incompatible with ASCII. That means JIS Roman,
ancient European codes for non-Romance languages (cf. Bjarne Stroustrup's
"The C++ Programming Language", 1st ed., where he discusses ANSI trigraphs), and
multibyte 7-bit ISO 2022 codes (in practice, used only by Japanese and
Koreans). (EBCDIC doesn't really count, here.)
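To make the JIS Roman case concrete: JIS X 0201 Roman matches ASCII except at two positions, 0x5C (yen sign instead of backslash) and 0x7E (overline instead of tilde). The Python sketch below shows how the same bytes of a URL path read under each interpretation. The helper decode_jis_roman is hypothetical, written only for this illustration; it is not a standard codec.

```python
# The two code positions where JIS X 0201 Roman differs from US-ASCII.
JIS_ROMAN_DIFFS = {
    0x5C: "\u00a5",  # YEN SIGN where ASCII has backslash
    0x7E: "\u203e",  # OVERLINE where ASCII has tilde
}

def decode_jis_roman(data: bytes) -> str:
    """Decode 7-bit bytes as JIS X 0201 Roman (hypothetical helper)."""
    chars = []
    for byte in data:
        if byte > 0x7F:
            raise ValueError("not a 7-bit code: 0x%02X" % byte)
        chars.append(JIS_ROMAN_DIFFS.get(byte, chr(byte)))
    return "".join(chars)

path = b"/~LLLtrs/"
ascii_view = path.decode("ascii")   # the tilde survives
jis_view = decode_jis_roman(path)   # byte 0x7E shows up as an overline
print(ascii_view, jis_view)
```

The bytes on the wire are identical either way; only the glyph the reader sees (and has to find on a keyboard) changes.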
[It always amazes me; every time you find a really bogus standard (or,
to be kinder, one that was written for an environment where the Intel
8008 was an "advanced single-chip microprocessor" and scratchpad
memory was implemented with paper), it turns out that the Japanese
have one just like it, and it's still in occasional use in 2005.]
Lyle> http://www5d.biglobe.ne.jp/~LLLtrs/PhotoGlryMain/pgb/Kurihama01a.html
Lyle> <http://www5d.biglobe.ne.jp/%7ELLLtrs/PhotoGlryMain/pgb/Kurihama01a.html>
It's weird that that second form works; I would think that the browser
should URL-encode the '%'.
Hmm. Better look it up. http://www.rfc-editor.org/rfc/rfc1738.txt
(which has been superseded) says:
    Octets must be encoded if they have no corresponding graphic
    character within the US-ASCII coded character set, if the use of the
    corresponding character is unsafe, or if the corresponding character
    is reserved for some other interpretation within the particular URL
    scheme.
The unsafe characters, including "~", are listed in the RFC, so we can
consider this to be a predefined list. Technically, then,
http://www5d.biglobe.ne.jp/~LLLtrs/PhotoGlryMain/pgb/Kurihama01a.html
is not a URL in the sense of RFC 1738. However, apparently it's
acceptable in HTTP URLs because of a special rule for HTTP (from RFC
2396, which superseded RFC 1738):
    In some cases, data that could be represented by an unreserved
    character may appear escaped; for example, some of the unreserved
    "mark" characters are automatically escaped by some systems. If the
    given URI scheme defines a canonicalization algorithm, then
    unreserved characters may be unescaped according to that algorithm.
    For example, "%7e" is sometimes used instead of "~" in an http URL
    path, but the two are equivalent for an http URL.
So you can legally write it either way. To know exactly what's going
on (i.e., what gets canonicalized where), you'd have to read the HTTP
RFC 2616.[1] The bottom line seems to be that the practice of escaping
"~" in HTTP URLs goes back to people trying to comply with RFC 1738,
or maybe (as Brett suggested) so that you can type the URL using only
characters appearing as labels on your keyboard. However, today you
can use either form, with "~" being recommended.
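A quick way to convince yourself that the two spellings name the same resource is to percent-decode the path of each and compare. This is only a sketch in Python using the standard urllib.parse module. Comparing fully decoded paths is a simplification that is safe here because "~" is unreserved; it would not be safe for an escaped reserved character such as "%2F".

```python
from urllib.parse import urlsplit, unquote

plain = "http://www5d.biglobe.ne.jp/~LLLtrs/PhotoGlryMain/pgb/Kurihama01a.html"
escaped = "http://www5d.biglobe.ne.jp/%7ELLLtrs/PhotoGlryMain/pgb/Kurihama01a.html"

def decoded_path(url: str) -> str:
    """Return the URL's path with percent-escapes decoded."""
    return unquote(urlsplit(url).path)

# Both spellings decode to the same path, and the hex digits are
# case-insensitive, so "%7e" and "%7E" behave alike.
same = decoded_path(plain) == decoded_path(escaped)
print(same)
```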
The authors go on to say:
    Because the percent "%" character always has the reserved purpose of
    being the escape indicator, it must be escaped as "%25" in order to
    be used as data within a URI. Implementers should be careful not to
    escape or unescape the same string more than once, since unescaping
    an already unescaped string might lead to misinterpreting a percent
    data character as another escaped character, or vice versa in the
    case of escaping an already escaped string.
Translation into language we can all understand: MUZUKASHII DA YO NE!!
(Japanese for "It's difficult, isn't it!!")
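The hazard in that last RFC passage is easier to see with a concrete string. The Python sketch below uses urllib.parse.quote and unquote on a piece of data that happens to consist of the three literal characters "%7E". The variable names are just for illustration.

```python
from urllib.parse import quote, unquote

data = "%7E"  # literal data: a percent sign, then "7", then "E"

# Escaping once turns the "%" into "%25"; unescaping once restores it.
escaped_once = quote(data)               # "%257E"
round_trip = unquote(escaped_once)       # "%7E", the original data

# Unescaping a second time misreads the data as an escape sequence.
unescaped_twice = unquote(round_trip)    # "~", no longer the original

# Escaping a second time means one unescape no longer restores the data.
escaped_twice = quote(escaped_once)      # "%25257E"
partial = unquote(escaped_twice)         # "%257E", still escaped

print(escaped_once, round_trip, unescaped_twice, escaped_twice, partial)
```

So each layer has to know exactly how many times a string has been escaped, which is the bookkeeping the RFC is warning about.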
Footnotes:
[1] Actually, you have to read between the lines, because 2396
doesn't define any "unsafe" characters but 2616 refers to the unsafe
characters as defined by 2396!
--
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.