Mailing List Archive
Re: [tlug] Do you whitelist or blacklist utf-8?
- Date: Tue, 22 Feb 2011 19:17:48 +0200
- From: Shmuel Fomberg <owner@example.com>
- Subject: Re: [tlug] Do you whitelist or blacklist utf-8?
- References: <4D639689.1010302@example.com>
- User-agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.13) Gecko/20101207 Thunderbird/3.1.7
Hi Dave.

Assuming that you want to avoid SQL injection attacks, convert the input to the target encoding before doing any filtering. Filtering in a different encoding is malpractice.

Once the input is in the target encoding, you probably want to examine only the characters in the ASCII range. If your encoding is UTF-8, you can write a tight loop that examines the MSB of each byte and passes the byte if it is set. Otherwise, whitelist or blacklist the byte. IMHO, only whitelist.

Of course, none of this is an excuse for not using pre-compiled SQL queries with placeholders, or whatever they are called in PHP.

Good luck,
Shmuel.

On 2011/02/22 12:57, Dave M G wrote:
> TLUG,
>
> I've been going a little mental today trying to figure out how to filter out possibly malicious characters from POST data going to my site. I want to block things like <, >, *, etc.
>
> The thing is that I also want to be able to allow CJK characters, and any other language with non-Latin characters. This is a snap to do if you just want to allow 0-9a-zA-Z. But once you get into Unicode land, it seems to be a whole other ballgame.
>
> I've got three stages I want to filter on. First I want to block characters on the client side with Javascript, so that the user is aware of what characters are permissible when entering names and whatnot. Then I want to block any bad characters on the server side in PHP to make sure no script kiddies have tried to POST anything nasty. And also, just for good measure, I want to ensure no nastiness is inserted into my MySQL.
>
> I'd like all three steps to be consistent with each other, so I'm trying to standardize a set of bad characters that I can filter for at each step.
>
> However, where I've broken down is whether I should blacklist bad characters (where I fear I might miss one), whitelist good characters (it seems tough to get a whitelist that's utf-8 compatible), or do something like make comparisons on HTML entities or with regex or something using built-in functions (PHP and Javascript differ on specific functions and their results).
>
> Since you guys are the go-to people for handling utf-8 text, I thought maybe you've encountered this before. How do you handle filtering malicious code from utf-8 text that contains CJK and other languages? And how do you do it in PHP and Javascript?
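
The byte loop suggested in the reply above could look roughly like the following in Javascript. This is only a sketch under assumptions: the names ASCII_WHITELIST and isAllowed are made up for illustration, and the set of allowed ASCII characters is an arbitrary example that would need adjusting per field. It relies on the fact that in UTF-8 every byte of a multi-byte sequence has its most significant bit set, so CJK and other non-Latin text passes untouched while each ASCII byte is checked against the whitelist.

```javascript
// Hypothetical whitelist of ASCII characters: letters, digits, space,
// and a few harmless punctuation marks. Adjust to the field being checked.
const ASCII_WHITELIST = new Set(
  Array.from(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 .,-_",
    (c) => c.charCodeAt(0)
  )
);

// Returns true when every ASCII byte in the UTF-8 encoding of `input`
// is on the whitelist. Bytes with the MSB set are part of multi-byte
// sequences (CJK, accented letters, etc.) and are passed without inspection.
function isAllowed(input) {
  // Convert to the target encoding (UTF-8) before filtering.
  const bytes = new TextEncoder().encode(input);
  for (const b of bytes) {
    if (b & 0x80) continue;                     // MSB set: non-ASCII, pass
    if (!ASCII_WHITELIST.has(b)) return false;  // ASCII: must be whitelisted
  }
  return true;
}
```

For example, isAllowed("山田 Taro") accepts the input, while isAllowed("<script>") and isAllowed("a*b") reject it. As the reply says, a check like this complements queries with placeholders and does not replace them.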
- Follow-Ups:
- Re: [tlug] Do you whitelist or blacklist utf-8?
- From: Dave M G
- Re: [tlug] Do you whitelist or blacklist utf-8?
- From: Josh Glover
- Re: [tlug] Do you whitelist or blacklist utf-8?
- From: Peter Brandt
- References:
- [tlug] Do you whitelist or blacklist utf-8?
- From: Dave M G