Introduction to Unicode
What is Unicode?
Unicode is (or at least is intended to be) a Universal Character Set
supporting every written script used on Earth (and some that aren't). Unicode
attempts to draw a line between a character, a unit of phonetic or
semantic meaning, and a rune or glyph, a character's visual representation
(see Han unification below). Thus a single character that has several
different appearances has multiple runes; for example, Arabic letters change
shape depending on their position in a word.
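The character/glyph distinction can be seen with Python's standard `unicodedata` module. The sketch below assumes the Arabic letter beh as the example: the character itself is U+0628, while its isolated, final, initial and medial shapes also exist as legacy "presentation form" code points kept for compatibility. Compatibility normalization (NFKC) folds each of those shapes back to the one underlying character. In ordinary text only U+0628 is stored, and the rendering engine picks the shape.

```python
import unicodedata

# The character: one code point regardless of position in a word.
BEH = "\u0628"  # ARABIC LETTER BEH

# Legacy compatibility code points for its positional glyphs
# (Arabic Presentation Forms-B block).
positional_forms = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}

for position, glyph in positional_forms.items():
    # NFKC maps each presentation form back to the base character.
    folded = unicodedata.normalize("NFKC", glyph)
    print(f"{position:<8} U+{ord(glyph):04X} -> U+{ord(folded):04X} "
          f"{unicodedata.name(folded)}")
```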
Other Names:
- ISO 10646
  - Unicode and ISO 10646 started concurrently, but have since merged
- UTF-X (e.g. UTF-8)
  - An encoding of Unicode used for byte streams and file systems
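UTF-8 is the best-known of these byte-stream encodings. Below is a minimal Python sketch of its encoding rule, checked against Python's built-in `utf-8` codec. The function name `utf8_encode` is our own illustration, not a standard API. ASCII stays one byte, and every byte of a multi-byte sequence has its high bit set, which is why UTF-8 text passes safely through byte-oriented software that treats NUL and `/` specially.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (sketch only: surrogate
    code points U+D800-U+DFFF are not rejected)."""
    if code_point < 0:
        raise ValueError("code point out of range")
    if code_point < 0x80:
        # ASCII passes through unchanged as a single byte.
        return bytes([code_point])
    if code_point < 0x800:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0x10FFFF:
        # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (code_point >> 18),
                      0x80 | ((code_point >> 12) & 0x3F),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("code point out of range")


for ch in "A\u00e9\u76f4":
    encoded = utf8_encode(ord(ch))
    # Cross-check against Python's own UTF-8 codec.
    assert encoded == ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')}")
```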
Advantages of Unicode
- Universal
  - As an international standard, Unicode has the potential to relieve some
    of the conflicts that arise from non-standard encodings of non-Latin
    characters, such as remapping the upper ASCII range (bytes 0x80-0xFF).
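- Fixed width
  - In the original 16-bit design, every character is 16 bits, so the number
    of characters in a string can be gleaned from its length in bytes (length
    divided by two). Since Unicode 2.0, characters beyond U+FFFF need two
    16-bit units (a surrogate pair), so this holds only for text within the
    Basic Multilingual Plane.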
- Font composition
  - Because full Unicode fonts are unwieldy in size, many systems use several
    partial fonts that each cover a portion of the Unicode range. This means I
    can have many fonts for the characters I use most, with neither loss of
    generality nor multiple copies of the whole character set.
- Han unification
  - Because of the distinction between characters and runes, and also to save
    code space, Unicode unifies the Chinese, Japanese, and Korean ideographs.
    The decision was made that they are the same characters with different
    appearances in the different languages.
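The "remapping the upper ASCII range" conflict listed under Universal is easy to reproduce with Python's standard codecs. This sketch assumes three common legacy code pages (`latin-1`, `cp1251`, `cp437`) and decodes the same byte, 0xE9, through each; the result is three unrelated characters, whereas a Unicode code point names exactly one character.

```python
import unicodedata

# One byte above 0x7F, read through three legacy 8-bit code pages.
byte = bytes([0xE9])
legacy_codecs = ("latin-1", "cp1251", "cp437")
decoded = {codec: byte.decode(codec) for codec in legacy_codecs}

for codec, ch in decoded.items():
    # Print the Unicode name rather than the glyph to keep output ASCII.
    print(f"0xE9 as {codec:<8} -> U+{ord(ch):04X} {unicodedata.name(ch)}")

# Three code pages, three different characters for the same byte.
assert len(set(decoded.values())) == len(legacy_codecs)
```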
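The Fixed width point can be checked directly. The helper `count_from_utf16_bytes` below is hypothetical; it simply applies the "length divided by two" rule to UTF-16 bytes (little-endian, no byte-order mark), and Python's `len()` on a `str`, which counts code points, serves as the reference. The rule holds for text in the Basic Multilingual Plane and over-counts once a surrogate pair appears.

```python
def count_from_utf16_bytes(data: bytes) -> int:
    """Hypothetical helper: character count under the 16-bit assumption."""
    return len(data) // 2

# Text inside the Basic Multilingual Plane: the shortcut works.
bmp_text = "na\u00efve \u76f4"
bmp_bytes = bmp_text.encode("utf-16-le")  # little-endian, no byte-order mark
print(len(bmp_bytes), count_from_utf16_bytes(bmp_bytes), len(bmp_text))

# A character beyond U+FFFF is stored as a surrogate pair (4 bytes),
# so the shortcut reports 2 characters where there is only 1.
astral = "\U0001F600"
astral_bytes = astral.encode("utf-16-le")
print(len(astral_bytes), count_from_utf16_bytes(astral_bytes), len(astral))
```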
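The Font composition idea amounts to a lookup from code point to whichever partial font covers it. The sketch below is purely illustrative: the font names and their coverage ranges are hypothetical, and real systems use richer fallback rules, but it shows how several small fonts can together cover a text without any single font holding the whole character set.

```python
from typing import List, Optional, Tuple

# Hypothetical partial fonts, each covering some ranges of code points.
PARTIAL_FONTS: List[Tuple[str, List[Tuple[int, int]]]] = [
    ("LatinFont", [(0x0000, 0x024F)]),
    ("GreekCyrillicFont", [(0x0370, 0x03FF), (0x0400, 0x04FF)]),
    ("CJKFont", [(0x3000, 0x30FF), (0x4E00, 0x9FFF)]),
]


def font_for(ch: str) -> Optional[str]:
    """Return the first partial font whose ranges cover ch, or None."""
    cp = ord(ch)
    for name, ranges in PARTIAL_FONTS:
        if any(lo <= cp <= hi for lo, hi in ranges):
            return name
    return None  # no installed partial font covers this character


for ch in "A\u03a9\u76f4\u0628":
    print(f"U+{ord(ch):04X} -> {font_for(ch)}")
```

Checking the fonts in list order means a preferred font can be placed first; a character nobody covers (here the Arabic U+0628) is reported rather than silently drawn in the wrong font.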
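Han unification means a single code point serves Chinese, Japanese, and Korean text alike. The sketch below uses U+76F4 as an example ideograph; `unicodedata` gives it one language-neutral name. The `(text, language)` pairs are an assumed, illustrative way of carrying the language separately so that a renderer could choose a Chinese, Japanese, or Korean glyph style; Unicode itself does not store that choice in the code point.

```python
import unicodedata

ideograph = "\u76f4"
# One code point and one name, with no language attached.
name = unicodedata.name(ideograph)
print(f"U+{ord(ideograph):04X} {name}")

# Hypothetical tagged runs: identical text, different language tags.
# A renderer would use the tag, not the code point, to pick the glyph style.
tagged_runs = [(ideograph, "zh"), (ideograph, "ja"), (ideograph, "ko")]
distinct_texts = {text for text, _ in tagged_runs}
print(len(distinct_texts), "distinct code point(s) for", len(tagged_runs), "languages")
```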
Problems with Unicode
- Size of fonts
- Doubles the size of files for Western European languages (16 bits per
  character instead of 8)
- Not easily convertible to any of the Japanese standards
- Han unification
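The file-size cost is simple arithmetic, shown here with Python's standard codecs on a short German sample (an assumed example). An 8-bit code page such as Latin-1 stores one byte per character, 16-bit UTF-16 stores two, and the byte-stream encoding UTF-8 uses one byte for ASCII and two for these accented letters, so it avoids most of the doubling for Western European text.

```python
sample = "Gr\u00fc\u00dfe aus K\u00f6ln"  # German text: 14 characters

# Encoded size in bytes under each encoding.
sizes = {
    codec: len(sample.encode(codec))
    for codec in ("latin-1", "utf-16-le", "utf-8")
}
print(len(sample), sizes)
```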
Who is using Unicode?
- Operating Systems
  - Plan 9
  - Linux
  - Windows NT
- Applications
  - Sam editor
  - 9term
  - MASS
- Fonts
  - Everson Mono fonts
  - Ifcss
Other Links
The Official Unicode Site
See what Digital has to say about Unicode
Eric Eastman