Mailing List Archive



[tlug] unable to create local copy utf8 encoded Japanese MySQL data



TLUG,

    This question is more MySQL than specifically Linux, although I'm 
running MySQL exclusively on Linux. The thing is that I posted this 
question to the MySQL mailing list, but I'm seeing no action there 
because there don't seem to be many people familiar with encoding 
issues. So I'm turning to you guys in hopes that Japanese and utf-8 
encoding are things you can help with.

    I have a MySQL database that I am trying to copy from my hosting 
service, where it was created, to my home machine, which now runs 
Ubuntu Linux and where I want to create a full testing and development 
environment. The database is in utf8 encoding and has a mix of English 
and Japanese.

    It might be relevant that the data has been around for a few years, 
and when it was created, the hosting service was running MySQL 3.2 (not 
exactly sure, but somewhere in the 3 series), which did not have utf-8 
support. So the Japanese is stored in the database in a mix of ways, 
as described below.

    Currently, both the hosting service and my home computer are running 
MySQL 4.2, which has decent utf-8 support.

   So, to the specific problem. I've exported an .sql file from my 
hosting service, with structure and data, and copied it to my home machine.

   I can take the .sql file, open it in OpenOffice Writer as an encoded 
text file, and verify that it is encoded in utf-8. Most of the 
Japanese text shows up readable. Some of it, however, shows up as coded 
numbers (I'm not sure what the term is when utf-8 displays this way): 
&#12513;&#12540;&#12531;&#12539;. I think this might be "legacy" data, 
held over from the days when MySQL did not have utf8 support.
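
The check described above can be sketched in a few lines of Python. This is only a rough sketch under assumptions: it assumes the "coded numbers" are HTML-style numeric character references (for example `&#12513;`), and the function name `inspect_dump` is made up for illustration. It reports whether the bytes of a dump decode as UTF-8, and how much of the Japanese is stored as real characters versus numeric references.

```python
import re

# Matches decimal (&#12513;) and hexadecimal (&#x30E1;) numeric character references.
NCR_PATTERN = re.compile(r"&#(?:[0-9]+|[xX][0-9A-Fa-f]+);")


def inspect_dump(raw: bytes) -> dict:
    """Summarise how text is stored in the raw bytes of a dump (hypothetical helper)."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # The bytes are not valid UTF-8, so nothing else can be counted reliably.
        return {"valid_utf8": False, "numeric_refs": 0, "non_ascii_chars": 0}
    return {
        "valid_utf8": True,
        # Text stored as "coded numbers".
        "numeric_refs": len(NCR_PATTERN.findall(text)),
        # Text stored as actual (non-ASCII) characters, e.g. Japanese.
        "non_ascii_chars": sum(1 for ch in text if ord(ch) > 127),
    }


if __name__ == "__main__":
    sample = "INSERT INTO t VALUES ('メーン', '&#12513;&#12540;&#12531;');"
    print(inspect_dump(sample.encode("utf-8")))
```

For a real dump, the bytes would come from something like `open("dump.sql", "rb").read()`, where `dump.sql` is a placeholder file name.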

  When I import the .sql file into MySQL, I can look at it in phpMyAdmin 
and see that the text that displayed correctly as Japanese in OpenOffice 
still displays correctly as Japanese. The text that was in number form 
is also still in number form when viewed through phpMyAdmin. In short, 
phpMyAdmin sees it after import the same way that OpenOffice did before 
import.

  But when I view a PHP page in Firefox that accesses the 
database, the situation changes. The text that is encoded as numbers 
displays as correct Japanese. The text that displays as actual Japanese 
text in OpenOffice and phpMyAdmin now displays as question marks.

  Again, just to be clear, all Japanese characters, regardless of how 
they look in phpMyAdmin, display correctly when viewed from the hosting 
service.

    The ideal scenario is for the text that displays as proper Japanese 
characters in OpenOffice and phpMyAdmin to display as proper Japanese 
characters when viewed via PHP/Browser. I am willing to go through the 
database with a fine-tooth comb and replace the numbered utf8 characters 
with the correct Japanese. But I'm not willing to do the reverse, and 
make the database completely non-human readable when viewed through 
phpMyAdmin.
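
The replacement described above would not have to be done by hand. Below is a minimal Python sketch, again assuming the numbered characters are HTML-style numeric character references; `decode_numeric_refs` is a hypothetical name, not part of any existing tool. It turns each reference into the character it stands for and leaves everything else alone.

```python
import re

# Group 1 captures a decimal code point, group 2 a hexadecimal one.
NCR_PATTERN = re.compile(r"&#(?:([0-9]+)|[xX]([0-9A-Fa-f]+));")


def decode_numeric_refs(text: str) -> str:
    """Replace numeric character references with the characters they encode."""

    def _replace(match: re.Match) -> str:
        decimal, hexadecimal = match.group(1), match.group(2)
        try:
            if decimal is not None:
                code_point = int(decimal)
            else:
                code_point = int(hexadecimal, 16)
            return chr(code_point)
        except (ValueError, OverflowError):
            # Not a valid Unicode code point: keep the original text untouched.
            return match.group(0)

    return NCR_PATTERN.sub(_replace, text)


if __name__ == "__main__":
    print(decode_numeric_refs("&#12513;&#12540;&#12531;&#12539;"))
```

The sketch deliberately handles only numeric references rather than calling `html.unescape`, which would also rewrite named entities such as `&amp;` that may be stored in the data on purpose. It would need to be run over the dump (or over each text column) before import, and tried on a copy first.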

   I hope someone can shed some light on this.

   Thank you.

--
Dave M G

