| To generate or modify mapping headers | |
| ------------------------------------- | |
| Mapping headers are imported from CJKCodecs in pre-generated form. | |
| If you need to tweak or add to them, see the tools/ subdirectory | |
| of the CJKCodecs distribution. | |
| Notes on implementation characteristics of each codec | |
| ----------------------------------------------------- | |
| 1) Big5 codec | |
| The big5 codec maps the following characters the way cp950 does, | |
| rather than following Unicode.org's mapping, which maps them to 0xFFFD. | |
| BIG5    Unicode  Description | |
| 0xA15A  0x2574   SPACING UNDERSCORE | |
| 0xA1C3  0xFFE3   SPACING HEAVY OVERSCORE | |
| 0xA1C5  0x02CD   SPACING HEAVY UNDERSCORE | |
| 0xA1FE  0xFF0F   LT DIAG UP RIGHT TO LOW LEFT | |
| 0xA240  0xFF3C   LT DIAG UP LEFT TO LOW RIGHT | |
| 0xA2CC  0x5341   HANGZHOU NUMERAL TEN | |
| 0xA2CE  0x5345   HANGZHOU NUMERAL THIRTY | |
| Because Unicode 0x5341, 0x5345, 0xFF0F and 0xFF3C are already mapped to | |
| other big5 codes, round-trip compatibility is not guaranteed for them. | |
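The round-trip caveat above can be checked with a short Python sketch. It assumes the `big5` codec shipped with CPython behaves as these notes describe; `AMBIGUOUS` and `big5_roundtrip` are hypothetical names used only for illustration.

```python
# Big5 byte pairs from the table above whose Unicode targets already
# have a different Big5 encoding (per the note above).
AMBIGUOUS = [b"\xa1\xfe", b"\xa2\x40", b"\xa2\xcc", b"\xa2\xce"]


def big5_roundtrip(raw):
    """Decode raw Big5 bytes, re-encode the text, and return both results."""
    text = raw.decode("big5")
    return text, text.encode("big5")


if __name__ == "__main__":
    for raw in AMBIGUOUS:
        text, again = big5_roundtrip(raw)
        status = "round-trips" if again == raw else "does not round-trip"
        print(raw.hex(), "-> U+%04X ->" % ord(text), again.hex(), status)
```

If the output shows that a byte pair does not round-trip, data decoded with big5 and written back may differ at the byte level from the input even though the text is the same.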
| 2) cp932 codec | |
| To conform to Windows's actual mapping, the cp932 codec maps the following | |
| codepoints in addition to the official cp932 mapping. | |
| CP932  Unicode  Description | |
| 0x80   0x80     UNDEFINED | |
| 0xA0   0xF8F0   UNDEFINED | |
| 0xFD   0xF8F1   UNDEFINED | |
| 0xFE   0xF8F2   UNDEFINED | |
| 0xFF   0xF8F3   UNDEFINED | |
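As an illustration, the extra single-byte mappings in the table can be exercised from Python. This is a minimal sketch assuming CPython's `cp932` codec; `CP932_EXTRA` and `check_cp932_extras` are hypothetical names, not part of any library.

```python
# Extra single-byte mappings listed in the table above.
CP932_EXTRA = {
    b"\x80": "\x80",
    b"\xa0": "\uf8f0",
    b"\xfd": "\uf8f1",
    b"\xfe": "\uf8f2",
    b"\xff": "\uf8f3",
}


def check_cp932_extras():
    """Return True if every extra byte decodes and encodes as tabulated."""
    return all(
        raw.decode("cp932") == ch and ch.encode("cp932") == raw
        for raw, ch in CP932_EXTRA.items()
    )


if __name__ == "__main__":
    print("cp932 extras match the table:", check_cp932_extras())
```

The targets U+F8F0 to U+F8F3 are private-use code points, so they carry no meaning outside this Windows-compatible mapping.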
| 3) euc-jisx0213 codec | |
| The euc-jisx0213 codec maps JIS X 0213 Plane 1 code 0x2140 to | |
| Unicode U+FF3C instead of U+005C as in unicode.org's mapping. | |
| Because euc-jisx0213 already has REVERSE SOLIDUS at 0x5c and A1C0 | |
| is displayed as a full-width character, mapping to U+FF3C makes | |
| more sense. | |
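A small sketch of this mapping, assuming CPython's `euc_jisx0213` codec. The helper `jis_to_euc` is hypothetical; it only sets the high bit of each byte, which is how a two-byte JIS code such as 0x2140 becomes the EUC bytes 0xA1 0xC0.

```python
def jis_to_euc(jis_code):
    """Convert a two-byte JIS code to EUC by setting the high bit of each byte."""
    return bytes([(jis_code >> 8) | 0x80, (jis_code & 0xFF) | 0x80])


raw = jis_to_euc(0x2140)                       # b"\xa1\xc0"
fullwidth = raw.decode("euc_jisx0213")         # FULLWIDTH REVERSE SOLIDUS
plain = b"\x5c".decode("euc_jisx0213")         # ASCII REVERSE SOLIDUS

if __name__ == "__main__":
    print("0x2140 -> U+%04X" % ord(fullwidth))
    print("0x5c   -> U+%04X" % ord(plain))
```

Keeping the two code points distinct means both forms of the backslash survive a decode and re-encode cycle.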
| The euc-jisx0213 codec can also decode JIS X 0212 codes on | |
| codeset 2. Because JIS X 0212 and JIS X 0213 Plane 2 do not | |
| overlap, this does not break standards conformance | |
| (and JIS X 0213 Plane 2 is designed to be used this way). When | |
| encoding, the codec tries to encode kanji characters in this | |
| order: | |
| JIS X 0213 Plane 1 -> JIS X 0213 Plane 2 -> JIS X 0212 | |
| 4) euc-jp codec | |
| The euc-jp codec is a compatibility variant in these respects: | |
| - U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to EUC-JP A1C0 (and vice versa) | |
| - U+00A5 YEN SIGN is mapped to EUC-JP 0x5c. (one way) | |
| - U+203E OVERLINE is mapped to EUC-JP 0x7e. (one way) | |
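The three points above can be observed directly. The following sketch assumes CPython's `euc_jp` codec; `euc_jp_roundtrip` is a hypothetical helper that encodes one character and decodes the result.

```python
def euc_jp_roundtrip(ch):
    """Encode one character to EUC-JP and decode it back; return both results."""
    raw = ch.encode("euc_jp")
    return raw, raw.decode("euc_jp")


if __name__ == "__main__":
    for ch in ("\uff3c", "\u00a5", "\u203e"):
        raw, back = euc_jp_roundtrip(ch)
        kind = "two-way" if back == ch else "one-way"
        print("U+%04X -> %s -> U+%04X (%s)" % (ord(ch), raw.hex(), ord(back), kind))
```

The one-way mappings mean that YEN SIGN and OVERLINE come back as ASCII backslash and tilde after a round trip.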
| 5) shift-jis codec | |
| The shift-jis codec maps the 0x20-0x7e range directly to U+20-U+7E | |
| instead of using JIS X 0201, for compatibility. The differences are: | |
| - U+005C REVERSE SOLIDUS is mapped to SHIFT-JIS 0x5c. | |
| - U+007E TILDE is mapped to SHIFT-JIS 0x7e. | |
| - U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to SHIFT-JIS 815f. | |
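A brief sketch of these three mappings, assuming CPython's `shift_jis` codec. `SAMPLES` and `encoded` are hypothetical names used only for this illustration.

```python
# Characters from the list above: ASCII backslash, ASCII tilde,
# and FULLWIDTH REVERSE SOLIDUS.
SAMPLES = ["\\", "~", "\uff3c"]

# Map each sample character to its Shift-JIS byte sequence.
encoded = {ch: ch.encode("shift_jis") for ch in SAMPLES}

if __name__ == "__main__":
    for ch, raw in encoded.items():
        back = raw.decode("shift_jis")
        print("U+%04X -> %s -> U+%04X" % (ord(ch), raw.hex(), ord(back)))
```

Under a strict JIS X 0201 reading, bytes 0x5c and 0x7e would mean YEN SIGN and OVERLINE, so the direct ASCII mapping is what keeps paths and escapes containing a backslash intact.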