Re: tlug: Two Qs re translation project
- To: tlug@example.com
- Subject: Re: tlug: Two Qs re translation project
- From: Adrian Havill <havill@example.com>
- Date: Fri, 28 Jan 2000 15:48:44 +0900
- Content-Transfer-Encoding: 7bit
- Content-Type: text/plain; charset=iso-2022-jp
- Organization: TurboLinux Japan
- References: <20000128060241.A508@example.com>
- Reply-To: tlug@example.com
- Sender: owner-tlug@example.com
"Frank Bennett (フランクべネット )" wrote: > I also have a not-unrelated question that someone (Steve > Turnbull?) will be able to help with. The Jse data is stored > in EUC. In EUC encoding, could a one-byte search engine > capable to indexing 8-bit text be used? In other words, > if there is a string made up of four bytes: > > [A] [B] [C] [D] > > where A and C are the first bytes of two-byte characters > in EUC-JP encoding, and we run a search using a single-byte > search engine for a single arbitrary two-byte character, is it > possible that our character's underlying encoding could > be [B] [C]? Or is it logically impossible in EUC-JP > encoding to get crossed up in this way? > > In other words, what are the legal bounds of the first and > the second bytes in EUC-JP encoding? A1..FE for both bytes. So yes, it is possible for the trail and header bytes combined to be misinterpreted as a false positive. Also, if you index a lot of web data, which tends to use Latin 1 characters even in English (for degree signs and the occasional accented vowel (Pokemon! Sake!), you'll probably run into problems there as well, unless you're sure that the non JIS data will always be ASCII. UTF-8 doesn't suffer from this problem, btw. By design, the head byte is structurally different from the tail byte(s) so a 8-bit clean string search won't deliver a false positive. -------------------------------------------------------------------- Next Nomikai Meeting: February 18 (Fri) 19:00 Tengu TokyoEkiMae Next Technical Meeting: March 11 (Sat) 13:00 Temple University Japan * Topic: TBD -------------------------------------------------------------------- more info: http://www.tlug.gr.jp Sponsor: Global Online Japan
- References:
- tlug: Two Qs re translation project
- From: "Frank Bennett (フランクべネット )" <bennett@example.com>