Encodings
liorean wrote:
Hello!
Just wondering if anybody has any real world data lying around covering what character encodings are necessary to support real world script content. UTF-8, UTF-16 and ISO-8859-1 are a given guess. What else?
My data relates to feeds, so it may not apply here, but in general UTF-16, while used internally in many places, is not widely supported as an interchange format. Here are the encodings that the feed validator does not mark as obscure:
'US-ASCII', 'ISO-8859-1', 'UTF-8', 'EUC-JP', 'ISO-8859-2', 'ISO-8859-15', 'ISO-8859-7', 'KOI8-R', 'SHIFT_JIS', 'WINDOWS-1250', 'WINDOWS-1251', 'WINDOWS-1252', 'WINDOWS-1254', 'WINDOWS-1255', 'WINDOWS-1256'
One other deserves special mention: 'GB18030'. It doesn't seem to be popular, but it is the Chinese government's mandatory standard.
- Sam Ruby
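For anyone who wants to apply that list in code, here is a minimal sketch in Python. It is not the feed validator's actual implementation; the `classify` function and its three return labels are made up for illustration, and the only real API used is `codecs.lookup` from the standard library.

```python
import codecs

# The encodings listed above as "not obscure", stored in upper case.
COMMON = frozenset([
    "US-ASCII", "ISO-8859-1", "UTF-8", "EUC-JP", "ISO-8859-2",
    "ISO-8859-15", "ISO-8859-7", "KOI8-R", "SHIFT_JIS",
    "WINDOWS-1250", "WINDOWS-1251", "WINDOWS-1252",
    "WINDOWS-1254", "WINDOWS-1255", "WINDOWS-1256",
])


def classify(label):
    """Hypothetical helper: sort an encoding label into one of three buckets."""
    name = label.strip().upper()
    if name in COMMON:
        return "common"
    try:
        # codecs.lookup raises LookupError for names Python does not know.
        codecs.lookup(name)
    except LookupError:
        return "unknown"
    return "obscure"


for label in ("utf-8", "Shift_JIS", "GB18030", "no-such-charset"):
    print(label, "->", classify(label))
```

Note that this only does a case-insensitive match on the label as written; it does not resolve aliases such as "latin1" for ISO-8859-1.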
There are many encodings for Chinese.
GB2312, GBK and GB18030 are the standards used in mainland China; GB2312 and GBK correspond to code page 936 in MS Windows, while GB18030 is code page 54936. Big5 is the industry standard in Taiwan and Hong Kong.
You can read en.wikipedia.org/wiki/Character_set#Popular_character_encodings for more info.
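To make the relationship between these encodings concrete, here is a small Python sketch using the codecs that ship with the standard library. The sample strings and the `can_encode` helper are my own choices for illustration, not anything taken from a spec.

```python
# Two common characters that exist in all four encodings.
text = "中文"

# GB2312, GBK and GB18030 are successive supersets, so characters from
# the GB2312 range should get the same bytes in all three.
gb = {name: text.encode(name) for name in ("gb2312", "gbk", "gb18030")}

# Big5 is an unrelated layout, so the same text gets different bytes.
big5 = text.encode("big5")


def can_encode(s, encoding):
    """Hypothetical helper: True if every character of s fits the encoding."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for name, data in gb.items():
    print(name, data.hex())
print("big5", big5.hex())

# GB18030 covers all of Unicode, while GBK does not.
emoji = "\U0001F600"
print("gbk can encode emoji:", can_encode(emoji, "gbk"))
print("gb18030 can encode emoji:", can_encode(emoji, "gb18030"))
```

The practical consequence is that a decoder for GB18030 can also read GB2312 and GBK data, but Big5 needs its own decoder, and bytes in one family cannot be reliably interpreted as the other.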