Character encoding hell: Converting MySQL latin1 to utf8 for use in Django

This is one of those problems that developers like to pretend doesn't exist. Until we actually have to deal with it of coarse. What I'm talking about is Character encoding hell. Lots of programmers (me included) have many times spent endless hours trying to figure out the mysteries of garbled text. So the reason I'm writing this blog post is to maybe help one or two people out there that are trying to do the same thing as I did today. Converting a MySQL database full of Swedish text (latin1) to a more manageable utf8 format.

We basically have different character encodings* because we have lots of different spoken and written languages in the world, and it was hard for the brilliant computer scientists back in the day to get it all right the first time.

*Note: Please read A tutorial on character code issues for a fuller explanation on character encodings.

The problem

I've got a MySQL database dump containing some 5000 products with descriptions all written in Swedish. As you might know Swedish contains characters like å, ä and ö. The database was also using latin1 as it's character set. So when firing it up in the Django admin I got something looking like this:

Mysql latin1

I spent about 5 hours trying to get this to display correctly. Django rightfully uses utf8 throughout, so converting my database to utf8 should take care of the problem?

alter database DATABASE_NAME charset=utf8;

No, it's not as easy as that, because when we convert the database to utf8, the data within stays the same. So what if we convert the table and the field?

alter table TABLE_NAME charset utf8;
alter table TABLE_NAME modify FIELD_NAME varchar(255) character set utf8;

No no, nothing changes, it's all the same garbled mess. So how can we get MySQL to actually convert the content?

The solution

Convert the field to BLOB. That's right, the first thing we need to do is converting the field type to a binary type. Then we change the character set of the database to utf8, and lastly we convert our field back to text or varchar.

alter table TABLE_NAME modify FIELD_NAME blob;
alter database DATABASE_NAME charset=utf8;
alter table TABLE_NAME modify FIELD_NAME varchar(255) character set utf8;

This forces MySQL to convert the binary format into varchar using character set utf8.

Mysql utf8

This trick should work regardless if you're using Django or not, that's just what I happened to be developing in. And telling Django to use a different character encoding is, from what I've heard, a nightmare.

11 comments made for “Character encoding hell: Converting MySQL latin1 to utf8 for use in Django”

DECEMBER 21ST, 2008
By: Nilesh

Nice work man! I achieved similar using iconv's command line tools, but your method is so much more simple.

JANUARY 16TH, 2009
By: Penny Lane

Dude, thanks so much for your post, you've totally saved my life (well, at least several frustrating shaking-fist-at-the-world hours of it)! Can't thank you enough, seriously.

JANUARY 26TH, 2009
By: Hanky

Stabilt! :)

FEBRUARY 23RD, 2009
By: Ben

Thanks, thanks, thanks ! I was already imagining hours of trouble ahead... But your 3 lines made the whole process so easy! I still can't believe it :)

FEBRUARY 27TH, 2009
By: Kim

Cool! Also, I have some questions about POKE. If you have time please drop me an email so that I get yours :)

MARCH 17TH, 2009
By: Bastiaan

you can do it quicker:

ALTER TABLE `name` MODIFY COLUMN `title` VARCHAR(255) CHARACTER SET binary;
ALTER TABLE `name` MODIFY COLUMN `title` VARCHAR(255) CHARACTER SET utf8;

MARCH 18TH, 2009
By: antonio

wow.. great solution!

APRIL 20TH, 2009
By: Mattias

@Bastiaan That is surely a quicker solution. Thanks for sharing.

APRIL 30TH, 2009
By: marjan

Hi,

I'm still struggling to find the solution to convert the data in MySQL but with no success. This method doesn't solving my problems, although effect from this update is now characters are shown as question marks ???????? //

Any other hint is more than welcome
marjan

MAY 5TH, 2009
By: patrick

@Bastiaan

Possible to wildcard tables/columns to run it in one go?*

Thanks!

FEBRUARY 16TH, 2010
By: Jeroen Tomas

Thanks as lot!

will never be published
optional