Mailing List Archive



Re: [tlug] utf form problems: solution



David Shanahan wrote:
Sounds like perl might be treating your string as bytes rather than as a
unicode string,
then mangling them on output.

You should read the docs on the Encode module;
perldoc Encode

You may need to do
$string = decode("utf8", param("stuff"));
to tell perl that the %E5%8A%A9 bytes from the form are actually a utf8 string

also look at the docs on the -C option ...
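
To see what the decode step changes, here is a minimal sketch. It uses only core modules: the regex substitution is a stand-in for uri_unescape from URI::Escape, and the variable names are made up for the illustration. The %E5%8A%A9 sample is the one from the form above.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption for this sketch: undo the percent-escapes with a regex
# instead of URI::Escape::uri_unescape, so only core modules are needed.
my $escaped = '%E5%8A%A9';
(my $bytes = $escaped) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

# At this point perl holds three separate bytes, not one character.
my $chars = decode('UTF-8', $bytes);

# After decode, perl holds a single character.
printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
printf "code point: U+%04X\n", ord($chars);
```

Without the decode call perl would treat each of the three bytes as its own character, which is how the output ends up mangled.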

I needed three things: -CO so that output to stdout is written as UTF-8, decode to tell perl that the input string was UTF-8, and 'use utf8' because the script file itself contains UTF-8 characters. This is my minimal debug script for testing the POST form.


#!.../perl -CO
# -CO: write stdout as UTF-8.  The input is ASCII (percent-encoded).
use utf8; # this source file contains UTF-8 characters.
use Encode;
use URI::Escape;
my $input;
read(STDIN, $input, $ENV{CONTENT_LENGTH}, 0);
# Undo the percent-escapes, then tell perl the resulting bytes are UTF-8.
my $unencoded = decode('utf8', uri_unescape($input));
# Note the ';' before charset; without it the charset parameter is malformed.
print  "Content-Type: text/plain; charset=UTF-8\n\n",
    "今日は世界\n",  # Japanese text ("hello world")
    $unencoded;

Next I'll work on reading stdin safely.
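
A rough sketch of what a more defensive read might look like, assuming "safely" means checking CONTENT_LENGTH, capping the body size, and noticing short reads. The read_body helper and the $MAX_BODY limit are hypothetical names chosen for this example, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical cap for this sketch; pick a limit that suits the form.
my $MAX_BODY = 64 * 1024;

# Read exactly the declared number of bytes from a handle, or die.
sub read_body {
    my ($fh, $declared) = @_;
    die "missing or non-numeric CONTENT_LENGTH\n"
        unless defined $declared && $declared =~ /\A\d+\z/;
    die "request body too large\n" if $declared > $MAX_BODY;

    binmode $fh;    # read raw bytes; decode them afterwards
    my ($buf, $got) = ('', 0);
    while ($got < $declared) {
        # read() reports how many bytes it actually got, so keep going
        # until the declared count is reached or the input ends.
        my $n = read($fh, $buf, $declared - $got, $got);
        die "read failed: $!\n" unless defined $n;
        last if $n == 0;    # end of input before the declared length
        $got += $n;
    }
    die "short read: got $got of $declared bytes\n" if $got != $declared;
    return $buf;
}

# In the CGI script this would be called as:
#   my $input = read_body(\*STDIN, $ENV{CONTENT_LENGTH});
```

The returned string is still raw bytes, so the uri_unescape and decode steps from the script above apply to it unchanged.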

Thanks everyone!

Steve S.

