Introduction to Unicode


What is Unicode?

Unicode is (or at least is intended to be) a universal character set supporting every written script used on Earth (and some that aren't). Unicode attempts to draw a line between a character, a unit of phonetic or semantic meaning, and a glyph (or rune), a character's visual representation (see Han unification). Thus a single character that has different appearances has multiple glyphs. (E.g., Arabic letters change appearance depending on their position in a word.)
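Unicode normally encodes only the character and leaves the choice of positional glyph to the renderer. For compatibility with older systems it also includes some Arabic positional forms as separate code points. The minimal Python sketch below uses the standard `unicodedata` module to show that NFKC normalization folds four positional forms of the letter beh back into the single character U+0628. The helper name `underlying_characters` is illustrative only.

```python
import unicodedata

# Arabic Presentation Forms-B: isolated, initial, medial and final glyphs
# of the letter BEH, encoded separately only for compatibility.
POSITIONAL_FORMS = ["\uFE8F", "\uFE91", "\uFE92", "\uFE90"]

def underlying_characters(forms):
    """Map each compatibility glyph code point to the character it stands for."""
    return {unicodedata.normalize("NFKC", f) for f in forms}

for form in POSITIONAL_FORMS:
    base = unicodedata.normalize("NFKC", form)
    print(f"U+{ord(form):04X} {unicodedata.name(form)}"
          f" -> U+{ord(base):04X} {unicodedata.name(base)}")

# All four glyphs collapse to one character.
print(underlying_characters(POSITIONAL_FORMS))
```

In running text a renderer would store only U+0628 and pick the right shape from context; the presentation-form code points exist mainly for round-tripping legacy data.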

Other Names:

ISO 10646
Unicode and ISO 10646 began as separate, concurrent efforts but have since been unified, so both now define the same character repertoire
UTF-X
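Not a separate character set but a family of Unicode Transformation Formats (such as UTF-8) that encode Unicode characters as byte streams for use in files and file systems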
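To show how a transformation format turns characters into a byte stream, here is a minimal Python sketch of a UTF-8 encoder for a single code point, checked against Python's built-in `str.encode("utf-8")`. The function name `utf8_encode` is our own, not a library API. ASCII characters keep their one-byte values, which is what lets UTF-8 pass through existing byte-oriented files and file systems.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (sketch; rejects surrogates)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if code_point < 0x80:          # 0xxxxxxx: ASCII is unchanged
        return bytes([code_point])
    if code_point < 0x800:         # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:       # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])

for ch in "A\u00e9\u4e16\U0001F600":
    mine = utf8_encode(ord(ch))
    assert mine == ch.encode("utf-8")  # agrees with Python's built-in codec
    print(f"U+{ord(ch):04X} -> {mine.hex(' ')}")
```

The leading bits of the first byte announce the sequence length, and every continuation byte starts with `10`, so a reader can resynchronize after a damaged byte.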

Advantages of Unicode

Universal
As an international standard, Unicode has the potential to relieve some of the conflicts that arise from non-standard encodings of non-Latin characters, such as remapping the upper ASCII range (byte values 128-255) to different characters in different code pages.
Fixed width
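In the original 16-bit design, every character is 16 bits wide, so the number of characters in a string can be gleaned from its length in bytes (half the byte count). Since Unicode 2.0, characters beyond U+FFFF are represented by pairs of 16-bit surrogates, so this holds only for text within the Basic Multilingual Plane.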
Font composition
Because full Unicode fonts are unwieldy in size, many systems have several partial fonts, each covering a portion of the Unicode range. This means I can have many fonts for the characters I use most, with neither loss of generality nor multiple copies of the whole character set.
Han unification
Because of the distinction between characters and glyphs, and also to save code space, Unicode unifies the ideographs shared by Japanese, Korean, and Chinese into one set of code points. The decision was made that these are the same characters, with different appearances in the different languages.
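Two of the advantages above can be checked with Python's standard codecs. The first part decodes the single byte 0xE9 under three 8-bit code pages (Latin-1, Windows-1251 Cyrillic, ISO 8859-7 Greek) and gets three different characters, each with its own Unicode code point. The second part counts 16-bit units in UTF-16: inside the Basic Multilingual Plane, byte length divided by two equals the number of characters, but a character beyond U+FFFF (here an emoji) needs a surrogate pair and is counted twice. The helper name `count_16bit_units` is illustrative only.

```python
# 1. The "upper ASCII" problem: one byte, three different characters,
#    depending on which 8-bit code page the reader assumes.
BYTE = b"\xe9"
for codec in ("latin-1", "cp1251", "iso8859_7"):
    ch = BYTE.decode(codec)
    print(f"{codec:>9}: {ch!r} = U+{ord(ch):04X}")

# 2. Counting characters from the byte length of a 16-bit encoding.
def count_16bit_units(text: str) -> int:
    """Characters implied by 'byte length / 2' in UTF-16 without a byte-order mark."""
    return len(text.encode("utf-16-le")) // 2

for text in ("na\u00efve", "\U0001F600"):
    print(repr(text), "code points:", len(text),
          "16-bit units:", count_16bit_units(text))
```

For "naïve" both counts agree; for the emoji the 16-bit count is one too high, which is exactly where the fixed-width assumption breaks.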

Problems with Unicode

Size of Fonts
Doubles the size of files for Western European languages (when every character is stored in 16 bits)
Not easily convertible to any of the Japanese standards
Han unification
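The size and Japanese-conversion problems can both be seen with Python's standard codecs. The sketch encodes a short German word under Latin-1, UTF-16 and UTF-8, then decodes one Shift JIS byte pair under two mapping tables that disagree: Python's `shift_jis` codec follows JIS X 0208 and yields WAVE DASH, while `cp932` (Microsoft's variant) yields FULLWIDTH TILDE. This illustrates one source of conversion trouble, not a complete survey of the Japanese standards.

```python
import unicodedata

TEXT = "Gr\u00f6\u00dfe"  # "Größe": five Western European characters

# Storage cost under an 8-bit code page, a 16-bit encoding, and UTF-8.
sizes = {codec: len(TEXT.encode(codec))
         for codec in ("latin-1", "utf-16-le", "utf-8")}
print(sizes)

# One Shift JIS byte pair, two different Unicode characters depending on
# which mapping table the converter uses.
raw = b"\x81\x60"
for codec in ("shift_jis", "cp932"):
    ch = raw.decode(codec)
    print(f"{codec:>9}: U+{ord(ch):04X} {unicodedata.name(ch)}")
```

UTF-16 doubles the five Latin-1 bytes to ten, while UTF-8 needs only seven here because the two accented letters take two bytes each. Text converted with one table and converted back with the other will not round-trip.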

Who is using Unicode?

Operating Systems
Plan 9
Linux
Windows NT
Applications
Sam editor
9Term
MASS
Fonts
Everson Mono fonts
Ifcss

Other Links

The Official Unicode Site
See what Digital has to say about Unicode


Eric Eastman