EUC


EUC stands for Extended Unix Code. It is a multibyte encoding standard developed by AT&T and supported on all System V implementations, used to represent large Asian character sets. There are several variants; two of them are for Chinese.
It defines both a fixed-length and a variable-length encoding, and it is an 8-bit coding method.

If codeset 0 is ASCII, then the EUC codeset is ASCII-transparent. Often codeset 0 is the local version of ASCII. A legal EUC codeset must follow these rules:
1) Each character of an EUC multibyte string is chosen from among four distinct codesets (0, 1, 2, and 3).
2) Codeset 0 must be a 7-bit codeset.
3) No multibyte character of codeset 1 will use either SS2 or SS3 as its first byte.
4) Characters from codeset 2 will be preceded by the single-shift byte SS2 (0x8E).
5) Characters from codeset 3 will be preceded by the single-shift byte SS3 (0x8F).
6) For codesets 1, 2, and 3, every byte of every character must have the eighth bit set.
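The rules above mean that the first byte of a character is enough to tell which codeset it belongs to. A minimal Python sketch of that dispatch follows; the function name `codeset_of` is hypothetical, and the byte values 0x8E and 0x8F are the conventional values for SS2 and SS3. It only classifies the lead byte and does not know how many bytes each codeset uses, since that depends on the EUC variant.

```python
SS2 = 0x8E  # single shift 2: announces a codeset 2 character
SS3 = 0x8F  # single shift 3: announces a codeset 3 character


def codeset_of(first_byte: int) -> int:
    """Return the EUC codeset (0-3) announced by a character's first byte.

    Hypothetical helper: it applies only the generic rules listed above.
    """
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("not a byte value")
    if first_byte < 0x80:
        return 0  # eighth bit clear: 7-bit codeset 0 (usually ASCII)
    if first_byte == SS2:
        return 2
    if first_byte == SS3:
        return 3
    return 1  # any other byte with the eighth bit set


if __name__ == "__main__":
    for b in (0x41, 0xD6, SS2, SS3):
        print(hex(b), "-> codeset", codeset_of(b))
```

Real implementations usually restrict codeset 1 bytes further (commonly to the range 0xA1-0xFE); that restriction is left out here because the rules above do not state it.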

EUC-TW

  • codeset 0 : ASCII
  • codeset 1 : CNS 11643-1992 plane 1
  • codeset 2 : CNS 11643-1992 planes 2-16
  • codeset 3 : [not used]
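To make the EUC-TW layout concrete, here is a rough Python sketch that splits a byte string into characters. It rests on assumptions not stated in the list above: codeset 1 characters are taken to be two bytes in the range 0xA1-0xFE, and codeset 2 characters four bytes (SS2 = 0x8E, a plane byte equal to 0xA0 plus the plane number, then two bytes in 0xA1-0xFE). The function name `split_euc_tw` is hypothetical, and the sketch only separates characters; it does not map them to Unicode.

```python
SS2 = 0x8E  # single shift 2, introduces a codeset 2 character


def _is_gr(b: int) -> bool:
    """True if b lies in the assumed 0xA1-0xFE range for multibyte data."""
    return 0xA1 <= b <= 0xFE


def split_euc_tw(data: bytes):
    """Split EUC-TW bytes into (codeset, plane, raw_bytes) tuples.

    Hypothetical helper based on the assumed byte layout described above.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Codeset 0: a single ASCII byte, no CNS plane involved.
            out.append((0, None, data[i:i + 1]))
            i += 1
        elif b == SS2:
            # Codeset 2: SS2, plane byte, then two data bytes.
            chunk = data[i:i + 4]
            if (len(chunk) != 4
                    or not 0xA1 <= chunk[1] <= 0xB0
                    or not all(_is_gr(x) for x in chunk[2:])):
                raise ValueError(f"bad codeset 2 sequence at offset {i}")
            out.append((2, chunk[1] - 0xA0, chunk))
            i += 4
        elif _is_gr(b):
            # Codeset 1: two data bytes, always CNS 11643 plane 1.
            chunk = data[i:i + 2]
            if len(chunk) != 2 or not _is_gr(chunk[1]):
                raise ValueError(f"bad codeset 1 sequence at offset {i}")
            out.append((1, 1, chunk))
            i += 2
        else:
            raise ValueError(f"unexpected byte {b:#04x} at offset {i}")
    return out
```

For example, `split_euc_tw(b"A\xa4\xa1\x8e\xa2\xa1\xa1")` yields one ASCII character, one plane 1 character, and one plane 2 character. The plane byte check accepts 0xA1 as well, so the four-byte form could also address plane 1; whether a given converter allows that is not covered here.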

 

EUC-CN

  • codeset 0 : ASCII
  • codeset 1 : GB 2312-80
  • codeset 2 : [not used]
  • codeset 3 : [not used]
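EUC-CN can be tried out directly in Python, whose standard `gb2312` codec produces the EUC-CN byte form of GB 2312-80. The short sketch below (the sample string is only an illustration) shows that ASCII characters pass through unchanged as single bytes, while each Chinese character becomes two bytes with the eighth bit set.

```python
text = "EUC 中文"

# Encode with Python's built-in gb2312 codec (EUC-CN byte form).
encoded = text.encode("gb2312")

# ASCII characters stay as single 7-bit bytes (codeset 0).
ascii_part = encoded[:4]

# The two Chinese characters take two bytes each (codeset 1).
chinese_part = encoded[4:]

# Rule 6: every byte of a codeset 1 character has the eighth bit set.
all_high = all(b & 0x80 for b in chinese_part)

if __name__ == "__main__":
    print(encoded.hex(" "))
    print("ASCII part:", ascii_part)
    print("eighth bit set on all codeset 1 bytes:", all_high)
```

Decoding works the same way in reverse: `encoded.decode("gb2312")` gives back the original string.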

© Seba - contact at seba at ulyssis dot org