tlug.jp Mailing List Archive
Re: [tlug] Do you whitelist or blacklist utf-8?
- Date: Thu, 24 Feb 2011 00:42:45 +0900
- From: Dave M G <dave@example.com>
- Subject: Re: [tlug] Do you whitelist or blacklist utf-8?
- References: <4D639689.1010302@example.com> <4D63EFBC.1020900@example.com> <4D64C5DD.1040607@example.com> <4D64CB49.10906@example.com>
- User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101208 Thunderbird/3.1.7
Shmuel, Josh, Peter,

Thank you for responding.

> I think that every character that is above the ascii range can be safely
> passed. So you don't need a huge array. just small one.

This sounds promising.

> but first you need to tell us something about your data. is the user
> allowed to enter HTML tags?

Nope. I want to be really strict. They get:

No punctuation at all.
Only spaces, no other white space (tabs, line feed characters, or anything else).
They can have 0-9a-zA-Z, and anything above the ASCII range (taking into account what you wrote above).

> or are you using different mark-down scheme?

I don't know what "mark-down scheme" means... so, uh... no? Maybe?

I looked at the pages Peter suggested (I had seen some of them before), and according to one of them, these might be the regular expressions I'm looking for:

\p{L} (any kind of letter from any language)
\p{N} (any number from any language)

There is also \p{Z} for "any kind of white space", but I'm not sure how to handle it. I don't want line feeds or tabs or anything like that, but since Japanese, as one example, has its own space character, I should allow that kind of space character from other languages.

So, I suck at regex, but maybe I want to do something like this:

^\p{L}\p{N}\p{Z}$

... and then blacklist the space characters I don't like:

^\n\r\t$

The only other thing I'm not confident about is whether this regular expression notation works the same way in PHP and JavaScript. The page Peter linked to mentions a ton of different languages, like Perl, Java, and PCRE, and gives separate notes for each of them, which gives me the impression that different languages have their own particulars.

Am I on the right track here?

--
Dave M G
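A minimal sketch of the whitelist described in this message, in JavaScript, assuming an ECMAScript 2018 or later engine (current Node.js or browsers; JavaScript regexes had no `\p{...}` support when this message was written). It illustrates three points about the proposed patterns. First, `^\p{L}\p{N}\p{Z}$` matches exactly three characters in that order (one letter, one number, one separator); a bracketed class with `+` is what expresses "one or more of any of these". Second, tab, LF and CR are control characters (Unicode category Cc), so they are never matched by `\p{Z}` and need no separate blacklist; as written, `^\n\r\t$` would only match the literal three-character sequence. Third, `\p{Z}` does include U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR, so `\p{Zs}` (space separators such as U+0020 and the ideographic space U+3000) is closer to "only spaces". The helper name `isAllowed` is hypothetical.

```javascript
// Sketch only: assumes an ES2018+ engine. Unicode property escapes such as
// \p{L} are recognised only when the regex carries the `u` flag.

// One or more characters, each a letter (\p{L}), a number (\p{N}), or a
// space separator (\p{Zs}: U+0020, U+3000 ideographic space, ...).
// TAB, LF and CR are category Cc, so they are rejected with no extra blacklist;
// \p{Zs} also leaves out U+2028/U+2029, which the broader \p{Z} would allow.
const ALLOWED = /^[\p{L}\p{N}\p{Zs}]+$/u;

// Hypothetical helper name, for illustration only.
function isAllowed(input) {
  return ALLOWED.test(input);
}

console.log(isAllowed("Tokyo 2011"));       // true
console.log(isAllowed("東京\u3000タワー"));  // true  (ideographic space U+3000)
console.log(isAllowed("line1\nline2"));     // false (LF is Cc)
console.log(isAllowed("tab\there"));        // false (TAB is Cc)
console.log(isAllowed("a\u2028b"));         // false (LINE SEPARATOR is Zl)
console.log(isAllowed("hello, world"));     // false (comma is punctuation)
console.log(isAllowed(""));                 // false ("+" needs at least one character)

// Without the `u` flag, \p is not a property escape: under the legacy
// (Annex B) grammar used by browsers and Node, /^\p{L}$/ matches the literal
// text "p{L}", so it silently fails on real letters.
console.log(/^\p{L}$/.test("日"));          // false
console.log(/^\p{L}$/u.test("日"));         // true
```

On the PHP side, the PCRE-based `preg_match` accepts the same escapes; the pattern would presumably be written as `'/^[\p{L}\p{N}\p{Zs}]+$/u'`, where the `/u` modifier makes PCRE treat the pattern and subject as UTF-8. Note that a `\p{L}`/`\p{N}` whitelist is stricter than "anything above the ASCII range": non-ASCII punctuation such as 「 or 、 is rejected by the former but would pass the latter.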
- Follow-Ups:
  - Re: [tlug] Do you whitelist or blacklist utf-8?
    - From: Josh Glover
  - Re: [tlug] Do you whitelist or blacklist utf-8?
    - From: Shmuel Fomberg
- References:
  - [tlug] Do you whitelist or blacklist utf-8?
    - From: Dave M G
  - Re: [tlug] Do you whitelist or blacklist utf-8?
    - From: Shmuel Fomberg
  - Re: [tlug] Do you whitelist or blacklist utf-8?
    - From: Dave M G
  - Re: [tlug] Do you whitelist or blacklist utf-8?
    - From: Shmuel Fomberg