I did a quick bit of research on Japanese character encodings and how functions in PHP handle the conversions between them.
The table below summarizes the results.
We can see the following:
- Although Shift-JIS (SJIS) is still the most common format in Japan, it is terrible at handling special “hankaku” (half-width) characters. It simply leaves out a lot of them, even ones that we would like to use quite frequently.
- The PHP `mb_convert_encoding` function gives up when it can’t find a matching character, and deletes the character. On the other hand, `iconv` does a pretty good job of finding a good substitute if we specify `//TRANSLIT`.
- Judging from the webpages that I can find on the subject, a lot of people seem to prefer `mb_convert_encoding` with the sjis-win encoding. This is a lousy solution if you are using special “hankaku” characters. It’s better to use `iconv` with CP932 encoding and `//TRANSLIT`. There is one snag with CP932 and `//TRANSLIT`, and that concerns the “hankaku” yen character (“¥”). Converting it to “yen” isn’t really a nice solution. However, you can see that `//TRANSLIT` always converts to ASCII, and “yen” is probably the only sensible way to convert the ¥ mark that way. Otherwise, it’s a good idea to use the “zenkaku” (full-width) “￥”.
- The micro mark “µ” is not supported in Shift-JIS, but the Greek mu “μ” is. Therefore, if you want to write a micro mark in Shift-JIS, you should use the Greek mu instead. Again, `//TRANSLIT` does the correct thing (converting it to “u”).
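
The recommendations above can be put together in a few lines of PHP. This is only a minimal sketch, not a definitive implementation: the function names `normalize_for_sjis` and `to_cp932` are made up for illustration, and the idea of swapping the yen and micro characters before conversion is one possible way of applying the advice above. The exact output of `//TRANSLIT` depends on the iconv library that PHP was built against, so it is worth checking the result on your own system.

```php
<?php
// Hypothetical helper (name invented for this example): replace characters
// that Shift-JIS/CP932 lacks with close equivalents that it does contain.
function normalize_for_sjis(string $utf8): string
{
    return strtr($utf8, [
        "\u{00B5}" => "\u{03BC}", // micro sign -> Greek small letter mu
        "\u{00A5}" => "\u{FFE5}", // half-width yen sign -> full-width yen sign
    ]);
}

// Hypothetical wrapper: convert UTF-8 text to CP932 and let iconv pick a
// substitute for anything that still has no direct match.
// iconv() returns false on failure, so callers should check the result.
function to_cp932(string $utf8)
{
    return iconv('UTF-8', 'CP932//TRANSLIT', normalize_for_sjis($utf8));
}

// For comparison, the mbstring route discussed above would be:
// $sjis = mb_convert_encoding($utf8, 'SJIS-win', 'UTF-8');
```

Calling `to_cp932("5µm ¥100")` would then pass “5μm ￥100” to `iconv`, so neither character has to go through the ASCII fallbacks (“u” and “yen”) that `//TRANSLIT` would otherwise produce.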