tlug.jp Mailing List Archive
Re: [tlug] Website Question(s)
- Date: Fri, 05 Aug 2005 13:42:26 +0900
- From: "Stephen J. Turnbull" <stephen@example.com>
- Subject: Re: [tlug] Website Question(s)
- References: <42EF27DE.5060509@example.com><42F2A9B4.8080907@example.com><20050805004003.GC4441@example.com><42F2BA0E.4010501@example.com> <42F2BE68.8040600@example.com><42F2C0DA.60409@example.com> <42F2CDFA.2060607@example.com>
- Organization: The XEmacs Project
- User-agent: Gnus/5.1006 (Gnus v5.10.6) XEmacs/21.5 (corn, linux)
>>>>> "Lyle" == Lyle Saxon <Lyle> writes:

    Lyle> The language angle is one thing I've been wondering about -
    Lyle> the person having trouble with the links is in Portugal....

Not a problem. It can only be a problem in a language that uses a 7-bit code in a way incompatible with ASCII. That means JIS Roman, the ancient European national codes for non-Romance languages (cf. Bjarne Stroustrup's "The C++ Programming Language", 1st ed., where he discusses ANSI trigraphs), and multibyte 7-bit ISO 2022 codes (in practice, used only by Japanese and Koreans). (EBCDIC doesn't really count here.)

[It always amazes me: every time you find a really bogus standard (or, to be kinder, one that was written for an environment where the Intel 8008 was an "advanced single-chip microprocessor" and scratchpad memory was implemented with paper), it turns out that the Japanese have one just like it, and it's still in occasional use in 2005.]

    Lyle> http://www5d.biglobe.ne.jp/~LLLtrs/PhotoGlryMain/pgb/Kurihama01a.html
    Lyle> <http://www5d.biglobe.ne.jp/%7ELLLtrs/PhotoGlryMain/pgb/Kurihama01a.html>

It's weird that the second form works; I would have expected the browser to URL-encode the '%'. Hmm. Better look it up. RFC 1738 (http://www.rfc-editor.org/rfc/rfc1738.txt, which has since been superseded) says:

    Octets must be encoded if they have no corresponding graphic
    character within the US-ASCII coded character set, if the use of
    the corresponding character is unsafe, or if the corresponding
    character is reserved for some other interpretation within the
    particular URL scheme.

The unsafe characters, including "~", are listed in the RFC, so we can treat them as a predefined list. Technically, then,

    http://www5d.biglobe.ne.jp/~LLLtrs/PhotoGlryMain/pgb/Kurihama01a.html

is not a URL in the sense of RFC 1738.
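A minimal Python sketch of the RFC 1738 rule quoted above, assuming the "unsafe" set listed in section 2.2 of that RFC and assuming UTF-8 for non-ASCII octets (RFC 1738 itself names no charset). The function name rfc1738_escape_path is hypothetical. Python's own urllib.parse.quote follows the newer RFC 3986 and leaves "~" alone, so it would not reproduce this strict behavior.

```python
# Characters RFC 1738 calls "unsafe" (gateways may alter them), including "~" and "%".
UNSAFE = set(' <>"#%{}|\\^~[]`')


def rfc1738_escape_path(path: str) -> str:
    """Hypothetical helper: percent-encode unsafe, control, and non-ASCII octets.

    '/' is left alone because it separates path segments; other reserved
    characters are not handled in this sketch.
    """
    out = []
    for byte in path.encode("utf-8"):  # assumption: UTF-8 for non-ASCII
        ch = chr(byte)
        if byte < 0x21 or byte > 0x7E or ch in UNSAFE:
            out.append("%%%02X" % byte)  # e.g. "~" (0x7E) -> "%7E"
        else:
            out.append(ch)
    return "".join(out)


path = "/~LLLtrs/PhotoGlryMain/pgb/Kurihama01a.html"
print(rfc1738_escape_path(path))
# prints /%7ELLLtrs/PhotoGlryMain/pgb/Kurihama01a.html
```

Under a strict reading of RFC 1738, the second form in Lyle's message is the "legal" one; the sketch simply makes that rule mechanical.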
However, it is apparently acceptable in HTTP URLs because of a special rule for HTTP, from RFC 2396 (which superseded RFC 1738):

    In some cases, data that could be represented by an unreserved
    character may appear escaped; for example, some of the unreserved
    "mark" characters are automatically escaped by some systems.  If the
    given URI scheme defines a canonicalization algorithm, then
    unreserved characters may be unescaped according to that algorithm.
    For example, "%7e" is sometimes used instead of "~" in an http URL
    path, but the two are equivalent for an http URL.

So you can legally write it either way. To know exactly what's going on (i.e., what gets canonicalized where), you'd have to read the HTTP RFC, 2616.[1]

The bottom line seems to be that the practice of escaping "~" in HTTP URLs goes back to people trying to comply with RFC 1738, or maybe (as Brett suggested) to letting you type the URL using only characters that appear as labels on your keyboard. Today, however, you can use either form, with "~" being recommended. The authors go on to say:

    Because the percent "%" character always has the reserved purpose
    of being the escape indicator, it must be escaped as "%25" in order
    to be used as data within a URI.  Implementers should be careful not
    to escape or unescape the same string more than once, since
    unescaping an already unescaped string might lead to misinterpreting
    a percent data character as another escaped character, or vice
    versa in the case of escaping an already escaped string.

Translation into language we can all understand: MUZUKASHII DA YO NE!! ("Tricky, isn't it!!")

Footnotes:
[1] Actually, you have to read between the lines, because 2396 doesn't define any "unsafe" characters, yet 2616 refers to the unsafe characters as defined by 2396!

-- 
School of Systems and Information Engineering  http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba  Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.
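The two RFC 2396 passages quoted in the message can be illustrated with a small Python sketch. It assumes that canonicalization means only unescaping %XX sequences that encode an RFC 2396 "unreserved" character (alphanum or mark, section 2.3); the real HTTP rules in RFC 2616 may differ in detail, and the helper name canonicalize is hypothetical. The second half uses the standard urllib.parse.unquote to show the double-unescaping trap.

```python
import re
from urllib.parse import unquote

# RFC 2396, section 2.3: unreserved = alphanum | mark
UNRESERVED = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "-_.!~*'()"
)


def canonicalize(path):
    """Hypothetical helper: unescape %XX only when it encodes an unreserved character."""
    def fix(match):
        ch = chr(int(match.group(1), 16))
        # Reserved, unsafe, and non-ASCII octets (including "%25") stay escaped.
        return ch if ch in UNRESERVED else match.group(0)
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, path)


tilde = "/~LLLtrs/PhotoGlryMain/pgb/Kurihama01a.html"
escaped = "/%7eLLLtrs/PhotoGlryMain/pgb/Kurihama01a.html"
print(canonicalize(tilde) == canonicalize(escaped))  # True: "%7e" and "~" are equivalent

# The data "100%7E" with its literal '%' escaped as "%25":
data = "/100%257E"
once = unquote(data)   # "/100%7E": correct, the literal '%' is restored
twice = unquote(once)  # "/100~": wrong, the data '%' is misread as an escape
print(once, twice)     # prints /100%7E /100~
```

Because canonicalize never turns "%25" into "%", running it twice gives the same answer as running it once, which sidesteps the double-unescaping problem the RFC warns about.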
- References:
- [tlug] Website Question(s)
- From: Lyle (Hiroshi) Saxon
- Re: [tlug] Website Question(s)
- From: Lyle (Hiroshi) Saxon
- Re: [tlug] Website Question(s)
- From: Michael Smith
- Re: [tlug] Website Question(s)
- From: Matt Gushee
- Re: [tlug] Website Question(s)
- From: Lyle (Hiroshi) Saxon
- Re: [tlug] Website Question(s)
- From: Matt Gushee
- Re: [tlug] Website Question(s)
- From: Lyle (Hiroshi) Saxon