This very short article is about the romanization of Chinese characters (known as PinYin)
and how to display the characters, with their PinYin, using Java (J2SE 5.0).
Introduction >
It is not that easy to find development resources on PinYin and how to use it within J2SE.
I spent a few days working out how to simply display the Chinese characters,
their Unicode code points and their definitions.
You will find a Java Web Start application that you can launch from the Resources section at the bottom of this page.
(Note that you must have at least JRE 5.0 installed.)
The Unihan database >
I started by looking at the Unihan database from Unicode.org.
This is a flat text file of about 25 megabytes that contains everything that you need:
- The Unicode scalar value (or code point): 4 to 6 hexadecimal digits
- The English definition (kDefinition field)
- The Mandarin pronunciation for the character in PinYin (kMandarin field)
There is other useful information, like the simplified variant (if any) for that character, and so on.
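To make the record layout concrete, here is a minimal sketch that splits one line of the file with a regular expression. It assumes the usual Unihan layout of three tab-separated columns (U+ code point, field name, value) with comment lines starting with #. The class name UnihanLine and the sample values are only illustrative and are not taken from the original application.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** One parsed record of the Unihan text file (hypothetical helper class). */
final class UnihanLine {
    // Assumed layout: "U+<4 to 6 hex digits>" TAB "<field name>" TAB "<value>"
    private static final Pattern LINE =
        Pattern.compile("^U\\+([0-9A-F]{4,6})\\t(k\\w+)\\t(.*)$");

    final int codePoint;
    final String field;
    final String value;

    private UnihanLine(int codePoint, String field, String value) {
        this.codePoint = codePoint;
        this.field = field;
        this.value = value;
    }

    /** Returns null for comments, blank lines and anything else that does not match. */
    static UnihanLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new UnihanLine(Integer.parseInt(m.group(1), 16), m.group(2), m.group(3));
    }

    public static void main(String[] args) {
        // Sample line made up for illustration.
        UnihanLine l = parse("U+4E2D\tkMandarin\tZHONG1 ZHONG4");
        System.out.println(Integer.toHexString(l.codePoint) + " " + l.field + " " + l.value);
    }
}
```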
Extraction of information from Unihan text file >
My next step was to parse the file, extract the information I needed, and populate an object model that is serialised for later use.
I used the regular expression (java.util.regex) and NIO (java.nio) packages from J2SE 5.0.
The whole process of parsing the 25 MB file, creating the object model and serialising it takes around 8 seconds on a 2.66 GHz Pentium 4.
The serialised object model is about 1.2 MB in size.
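The pipeline described above could look roughly like the sketch below. It is not the original code: the class UnihanModel, its Entry type and the choice to keep only kDefinition and kMandarin are assumptions for illustration. The file is memory-mapped through a FileChannel and decoded as UTF-8 (an assumption about the file encoding), and the model is serialised to a byte array here rather than to a file so that the example stays self-contained.

```java
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical object model: code point -> definition and Mandarin reading. */
final class UnihanModel {
    static final class Entry implements Serializable {
        private static final long serialVersionUID = 1L;
        String definition;
        String mandarin;
    }

    // MULTILINE lets ^ and $ match at every line of the decoded file.
    private static final Pattern LINE = Pattern.compile(
        "^U\\+([0-9A-F]{4,6})\\t(kDefinition|kMandarin)\\t(.*)$", Pattern.MULTILINE);

    /** Maps the whole file into memory with NIO and decodes it (UTF-8 assumed). */
    static CharSequence readAll(String path) throws IOException {
        FileInputStream in = new FileInputStream(path);
        try {
            FileChannel ch = in.getChannel();
            ByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return Charset.forName("UTF-8").decode(buf);
        } finally {
            in.close();
        }
    }

    /** Scans the text once and collects the two fields of interest per code point. */
    static Map<Integer, Entry> build(CharSequence text) {
        Map<Integer, Entry> model = new TreeMap<Integer, Entry>();
        Matcher m = LINE.matcher(text);
        while (m.find()) {
            Integer cp = Integer.valueOf(m.group(1), 16);
            Entry e = model.get(cp);
            if (e == null) {
                e = new Entry();
                model.put(cp, e);
            }
            if ("kDefinition".equals(m.group(2))) {
                e.definition = m.group(3);
            } else {
                e.mandarin = m.group(3);
            }
        }
        return model;
    }

    static byte[] serialize(Map<Integer, Entry> model) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(model);
        out.close();
        return bytes.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static Map<Integer, Entry> deserialize(byte[] data)
            throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data));
        try {
            return (Map<Integer, Entry>) in.readObject();
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to the Unihan text file.
        Map<Integer, Entry> model = build(readAll(args[0]));
        System.out.println(model.size() + " characters, "
            + serialize(model).length + " bytes serialised");
    }
}
```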
The only tricky part was to build the Unicode string from the hexadecimal digits of the code point.
Fortunately, the Character class has a new method in J2SE 5.0: public static char[] toChars(int codePoint)
that takes a code point (as an integer) and returns an array of chars (two chars, a surrogate pair, for code points above U+FFFF). A char array can then be passed to a String constructor.
The process is to take the Unicode code point in hexadecimal, convert it to an integer, and create the String.
To parse the hexadecimal digits into an integer, just use the method
public static Integer valueOf(String s, int radix)
with a radix of 16. It returns an Integer, which J2SE 5.0 unboxes automatically to the int that toChars expects.
That’s it!
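Put together, the conversion fits in a few lines. This is a small sketch of the steps just described; the class and method names (CodePoints, fromHex) are only illustrative.

```java
/** Turns the hexadecimal digits of a code point into a displayable String. */
final class CodePoints {
    static String fromHex(String hex) {
        // Parse the hexadecimal digits (radix 16) into an int code point.
        int codePoint = Integer.valueOf(hex, 16).intValue();
        // toChars returns one char for the BMP, two (a surrogate pair) above U+FFFF.
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String bmp = fromHex("4E2D");            // a character in the Basic Multilingual Plane
        String supplementary = fromHex("20000"); // a character beyond U+FFFF
        System.out.println(bmp + " uses " + bmp.length() + " char");
        System.out.println(supplementary + " uses " + supplementary.length() + " chars");
    }
}
```

Because supplementary characters occupy two chars, String.length() is not a reliable character count for them; codePointCount, also new in J2SE 5.0, is the safer choice.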
Displaying the results >
I used a simple JTable from the Swing package to display the results. It looks pretty!
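For readers who want to reproduce the display, here is one possible way to feed a JTable, not necessarily how the original application does it. The HanTableModel class, its column layout and the sample row are assumptions made for this sketch.

```java
import javax.swing.JFrame;
import javax.swing.JScrollPane;
import javax.swing.JTable;
import javax.swing.SwingUtilities;
import javax.swing.table.AbstractTableModel;

/** Hypothetical table model: each row holds hex code point, PinYin, definition. */
final class HanTableModel extends AbstractTableModel {
    private static final String[] COLUMNS =
        {"Character", "Code point", "PinYin", "Definition"};
    private final String[][] rows;

    HanTableModel(String[][] rows) {
        this.rows = rows;
    }

    public int getRowCount() { return rows.length; }
    public int getColumnCount() { return COLUMNS.length; }
    public String getColumnName(int column) { return COLUMNS[column]; }

    public Object getValueAt(int row, int column) {
        String[] r = rows[row];
        switch (column) {
            case 0:  // the character itself, built from its code point
                return new String(Character.toChars(Integer.parseInt(r[0], 16)));
            case 1:  // the code point in U+ notation
                return "U+" + r[0];
            default: // PinYin (column 2) or definition (column 3)
                return r[column - 1];
        }
    }

    public static void main(String[] args) {
        // Sample row made up for illustration.
        final HanTableModel model = new HanTableModel(new String[][] {
            {"4E2D", "ZHONG1 ZHONG4", "central; center, middle"}
        });
        SwingUtilities.invokeLater(new Runnable() {
            public void run() {
                JFrame frame = new JFrame("Unihan");
                frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
                frame.add(new JScrollPane(new JTable(model)));
                frame.pack();
                frame.setVisible(true);
            }
        });
    }
}
```

Whether the characters actually render depends on the fonts installed on the machine; without a font that covers CJK, the first column shows empty boxes.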
You can try it by clicking on the JNLP file in the Resources section. You need to have J2SE 5.0 installed!
What’s next? >
The next steps would be to:
- First of all, learn Chinese. At least enough to be able to read it.
I must find a contract in Hong Kong or Beijing :-) Can you help?
- Second, it would be very useful to type in the PinYin or the English words and have the software come up with a list of matching Chinese characters.
It is fairly easy to implement. I just need the time.
- Third, implement the second point in a quick and dirty way, or,
- Fourth, use the Input Method Framework instead.
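As a rough idea of what the quick and dirty PinYin lookup might look like, the sketch below builds a reverse index from each reading to the characters that carry it. Nothing here comes from the existing application: the PinyinIndex class is hypothetical, and it assumes that a character's readings arrive as one whitespace-separated string.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical reverse index: PinYin reading -> characters with that reading. */
final class PinyinIndex {
    private final Map<String, List<String>> byReading =
        new HashMap<String, List<String>>();

    /** Registers a character under each of its whitespace-separated readings. */
    void add(int codePoint, String readings) {
        String character = new String(Character.toChars(codePoint));
        for (String reading : readings.trim().split("\\s+")) {
            if (reading.length() == 0) {
                continue;
            }
            String key = reading.toLowerCase(Locale.ENGLISH);
            List<String> chars = byReading.get(key);
            if (chars == null) {
                chars = new ArrayList<String>();
                byReading.put(key, chars);
            }
            chars.add(character);
        }
    }

    /** Case-insensitive exact match; returns an empty list when nothing is found. */
    List<String> lookup(String reading) {
        List<String> chars = byReading.get(reading.trim().toLowerCase(Locale.ENGLISH));
        return chars == null ? Collections.<String>emptyList() : chars;
    }

    public static void main(String[] args) {
        PinyinIndex index = new PinyinIndex();
        // Sample readings made up for illustration.
        index.add(0x4E2D, "ZHONG1 ZHONG4");
        index.add(0x5FE0, "ZHONG1");
        System.out.println(index.lookup("zhong1"));
    }
}
```

An exact-match map like this ignores tone-less queries and English keywords; those would need extra normalisation or a second index.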
Resources >
Author
> Thierry Janaudy