Java

Java comes with two classes, InputStreamReader and OutputStreamWriter, that translate local encodings into and out of Unicode. Two of the supported encodings are GB2312 and Big5.
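
As a rough illustration of the two classes mentioned above, here is a minimal sketch that decodes bytes from one encoding into Unicode chars and re-encodes them in another. The class name EncodingConverter and the method convert are made up for this example, and it assumes your Java runtime ships the GB2312 and Big5 charsets (standard JDK builds normally do).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

// Hypothetical helper class, not part of the JDK.
class EncodingConverter {

    /** Decodes input from fromEncoding into Unicode chars, then re-encodes them as toEncoding. */
    static byte[] convert(byte[] input, String fromEncoding, String toEncoding) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(input), fromEncoding);
             Writer writer = new OutputStreamWriter(out, toEncoding)) {
            char[] buffer = new char[1024];
            int n;
            // Between the reader and the writer the text is plain Unicode (UTF-16 chars).
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        } // closing the writer flushes the encoded bytes into out
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // The two characters U+4E2D U+6587 encoded in GB2312.
        byte[] gb2312 = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};
        byte[] big5 = convert(gb2312, "GB2312", "Big5");
        for (byte b : big5) {
            System.out.printf("%02X ", b);
        }
        System.out.println();
    }
}
```

For real files you would wrap a FileInputStream and a FileOutputStream in the same way instead of the in-memory streams used here.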

Java 2 allows the programmer to access the fonts on the machine directly. Before the introduction of Swing, a set of peerless (lightweight) Java components built on top of the AWT, Java could not display Chinese except on Chinese operating systems. With Swing, you can display Chinese in any component, provided you have fonts that support Chinese on your system. So recent versions of Java can display Chinese, Japanese, and Korean text directly if the corresponding fonts are installed.
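
To check which installed fonts can actually show Chinese text, something like the following sketch may help. ChineseFontFinder and fontsSupporting are names invented for this example; the calls it relies on (GraphicsEnvironment.getAllFonts, Font.canDisplayUpTo, Font.getFontName) are standard AWT API.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the JDK.
class ChineseFontFinder {

    /** Returns the names of the candidate fonts that have a glyph for every character in sample. */
    static List<String> fontsSupporting(Font[] candidates, String sample) {
        List<String> names = new ArrayList<>();
        for (Font font : candidates) {
            // canDisplayUpTo returns -1 when the font can display the whole string.
            if (font.canDisplayUpTo(sample) == -1) {
                names.add(font.getFontName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Font[] installed = GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts();
        String sample = "\u4e2d\u6587"; // two common Chinese characters
        for (String name : fontsSupporting(installed, sample)) {
            System.out.println(name);
        }
    }
}
```

Any name it prints can be passed to new Font(name, Font.PLAIN, 14) and set on a Swing component with setFont. If nothing is printed, no installed font covers the sample text.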

Double-byte character set support for Java Server Pages
East Asian languages such as Japanese, Chinese, and Korean are usually encoded in double-byte character sets (DBCS). An individual character takes two bytes, as opposed to a single byte for an English-language character. For example, Japanese needs 16 bits per character to represent its roughly 32,000 double-byte characters.
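The one-byte versus two-byte difference is easy to see from plain Java. The sketch below simply counts the bytes a string occupies in a given encoding; ByteWidthDemo and encodedLength are names made up for this example, and it assumes the GB2312 and Big5 charsets are available in your runtime.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of the JDK.
class ByteWidthDemo {

    /** Returns how many bytes text occupies when encoded with the named charset. */
    static int encodedLength(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String latin = "A";
        String chinese = "\u4e2d"; // one Chinese character
        for (String cs : new String[] {"GB2312", "Big5", "UTF-8"}) {
            System.out.println(cs + ": A = " + encodedLength(latin, cs)
                    + " byte(s), U+4E2D = " + encodedLength(chinese, cs) + " byte(s)");
        }
    }
}
```

Note that UTF-8 is not a double-byte encoding: the same Chinese character takes three bytes there.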
To support DBCS JSP pages, the JSP compiler checks the page directive to determine which character set to use: it takes the encoding of the JSP page from the charset value of the contentType attribute. If charset is not defined, ISO-8859-1 encoding is assumed. For example, to specify an encoding for simplified Chinese characters, set the contentType attribute to the appropriate character set in a page directive:
<%@ page contentType="text/html;charset=GB2312" %>
The generated Java code always uses UTF-8 encoding.
The browser displaying the JSP must also support the character set. Browsers that comply with HTML 4.0 support the Basic Multilingual Plane of Unicode, a standardized 16-bit character repertoire that covers most of the world's languages. The browser must also have the required fonts to display the characters of your target language correctly.

How to find CJK characters in a String
I found this code on the internet; it shows how to find Japanese characters in a String. You can alter it to make it work with Chinese ;)
/** Returns true if the String s contains any "double-byte" characters. */
public boolean containsDoubleByte(String s) {
    for (int i = 0; i < s.length(); i++) {
        if (isDoubleByte(s.charAt(i))) {
            return true;
        }
    }
    return false;
}

/** Returns true if the char c is a "double-byte" character, i.e. lies above U+00FF (outside Latin-1). */
public boolean isDoubleByte(char c) {
    return c > '\u00ff';
}

/** Returns true if the String s contains any Japanese characters. */
public boolean containsJapanese(String s) {
    for (int i = 0; i < s.length(); i++) {
        if (isJapanese(s.charAt(i))) {
            return true;
        }
    }
    return false;
}

/** Returns true if the char c is a Japanese character. */
public boolean isJapanese(char c) {
    // Katakana
    if (c >= '\u30a0' && c <= '\u30ff') return true;
    // Hiragana
    if (c >= '\u3040' && c <= '\u309f') return true;
    // CJK Unified Ideographs
    if (c >= '\u4e00' && c <= '\u9fff') return true;
    // CJK Symbols and Punctuation
    if (c >= '\u3000' && c <= '\u303f') return true;
    // Kangxi Radicals
    if (c >= '\u2f00' && c <= '\u2fdf') return true;
    // Kanbun
    if (c >= '\u3190' && c <= '\u319f') return true;
    // CJK Unified Ideographs Extension A
    if (c >= '\u3400' && c <= '\u4dbf') return true;
    // CJK Compatibility Forms
    if (c >= '\ufe30' && c <= '\ufe4f') return true;
    // CJK Compatibility
    if (c >= '\u3300' && c <= '\u33ff') return true;
    // CJK Radicals Supplement
    if (c >= '\u2e80' && c <= '\u2eff') return true;
    // any other character
    return false;
}
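
One possible way to make this work with Chinese is sketched below. It is not the code I found: instead of hard-coded ranges it asks the JDK which Unicode script each code point belongs to (Character.UnicodeScript, available since Java 7), which also covers ideographs outside the 16-bit range. HanDetector and containsHan are names made up for this example. Keep in mind that Han ideographs are shared between Chinese and Japanese kanji, so this detects Han characters, not the language.

```java
// Hypothetical helper class, not part of the JDK.
class HanDetector {

    /** Returns true if s contains at least one code point whose Unicode script is Han. */
    static boolean containsHan(String s) {
        // codePoints() walks whole code points, so surrogate pairs are handled correctly.
        return s.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    public static void main(String[] args) {
        System.out.println(containsHan("hello"));
        System.out.println(containsHan("hello \u4e2d\u6587"));
    }
}
```

Hiragana and katakana belong to their own scripts, so text written only in kana is not reported as Han.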


Links
Old Java i18n FAQ
Java 1.4 i18n FAQ
Internationalization of Java applications


© Seba - contact at seba at ulyssis dot org