
Re: [LI] Double byte characters support Linux



On Thu, Nov 18, 1999 at 10:52:47AM -0800, Sudhakar Chandrasekharan wrote:
> From my understanding of unicode I was under the impression that unicode is
> mainly useful for storing multi-byte characters in databases and things
> like that.  i.e. storing data in the computer.  unicode, if I understand it
> right, is not useful to end users.

That's not right. With 8-bit characters, you are limited to a character
set of 2^8 = 256 characters. With this, you cannot open a file containing
English, Chinese and Devanagari scripts at the same time.
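
To make the limit concrete, here is a minimal sketch in present-day
Python 3 (my choice of language for illustration, not something used in
this thread). The sample string and the helper name can_encode are
made up for the example; Latin-1 stands in for "some 8-bit character set".

```python
# One Latin letter, one Chinese ideograph, one Devanagari letter.
text = "A \u4e2d \u0905"

def can_encode(s, encoding):
    """Return True if every character of s fits in the given encoding."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

latin1_ok = can_encode(text, "latin-1")  # an 8-bit, 256-character set
utf16_ok = can_encode(text, "utf-16")    # a 16-bit Unicode encoding

print("latin-1:", latin1_ok)
print("utf-16: ", utf16_ok)
```

The 8-bit set has no slot for the Chinese or Devanagari characters, so
the first check fails while the Unicode encoding handles all three.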

Unicode allows you to do just that. With 16 bits per character, you have
2^16 = 65,536 different code points (surrogate pairs extend the range
beyond that). The Unicode Consortium (http://www.unicode.org) has
partitioned this space into non-overlapping blocks for different scripts,
i.e. the Unicode number for the Devanagari (Hindi) "a" will not conflict
with the English "A", and so on.
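
As a rough illustration of the non-overlapping numbering, the sketch
below (Python 3, using only the standard unicodedata module) looks up
the code points of the two letters. The variable names are mine, chosen
for the example.

```python
import unicodedata

latin_a = "A"           # English capital A
devanagari_a = "\u0905" # Devanagari letter A

latin_cp = ord(latin_a)
devanagari_cp = ord(devanagari_a)

# Each letter has its own number and its own official name.
print(f"U+{latin_cp:04X}", unicodedata.name(latin_a))
print(f"U+{devanagari_cp:04X}", unicodedata.name(devanagari_a))

# The Devanagari block occupies U+0900..U+097F, far from ASCII (U+0000..U+007F).
in_devanagari_block = 0x0900 <= devanagari_cp <= 0x097F
in_ascii_range = latin_cp <= 0x7F
```

The two letters land in different blocks, so a document can hold both
without any ambiguity about which one a given number means.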

So you could potentially open up a document and see all three languages
simultaneously. Of course, this requires that you have a font file which
has glyphs for all 2^16 characters. Someone associated with the GNU project
was trying to assemble one the last I looked. Microsoft has a downloadable
Unicode font file that covers a large part of the space, but it is still
incomplete.

Unfortunately, most of the various Indian language fonts that crop up
every day on the net are not Unicode compliant. They just overload the
code positions occupied by the current ASCII character set.

The hard problem with Indian languages is not representation. It's
the rendering of that representation on the screen.
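
A small sketch of why this is so, again in Python 3 with the standard
unicodedata module. The example syllable is my own pick: it is stored as
four code points in logical order, yet a renderer has to draw it as one
conjunct cluster, with the vowel sign placed to the left of the consonants.

```python
import unicodedata

# KA + VIRAMA + SSA + VOWEL SIGN I, stored in this logical order.
syllable = "\u0915\u094d\u0937\u093f"

names = [unicodedata.name(ch) for ch in syllable]
for ch, name in zip(syllable, names):
    print(f"U+{ord(ch):04X}  {name}")

# Storage is simple: just four 16-bit numbers.
stored_units = len(syllable)
```

Storing the four numbers is trivial. Turning them into the right shape
on screen needs a font and a rendering engine that know the conjunct and
reordering rules of the script.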

	-Arun

--------------------------------------------------------------------
The Linux India Mailing List Archives are now available.  Please search
the archive at http://lists.linux-india.org/ before posting your question
to avoid repetition and save bandwidth.