
Re: [LI] Double byte characters support Linux



Does anybody know of a mailer designed for editing digests
(besides EMACS :))?

> Date: Thu, 18 Nov 1999 12:41:29 -0800
> From: Arun Sharma <adsharma@xxxxxxxxxxxxxxx>
> Subject: Re: [LI] Double byte characters support Linux
>
> On Thu, Nov 18, 1999 at 10:52:47AM -0800, Sudhakar Chandrasekharan wrote:
> > From my understanding of unicode I was under the impression that
> > unicode is mainly useful for storing multi-byte characters in
> > databases and things like that.  i.e. storing data in the computer.
> > unicode, if I understand it right, is not useful to end users.
>
> That's not right. With 8 bit characters, you are limited to a
> character set of 2^8 = 256 characters. With this, you can not open a
> file containing english, chinese and devanagari scripts at the same
> time.
>
> Unicode allows you to do just that. With 16 bits per char, you have
> 2^16 different representations. http://www.unicode.org has partitioned
> this space into different languages. This partitioning is
> non-overlapping i.e. Unicode number for Hindi "a" will not conflict
> with english "A" and so on.

Actually, two bodies are working on the standardization of a universal
encoding: the ISO/IEC JTC1 group (non-commercial) and the Unicode group
(commercial). The ISO group had been working on a 4-byte (UCS4)
encoding for a long time but had not come out with a standard by the
time Unicode was published. Once Unicode was published, they decided to
adopt it as the 2-byte (UCS2) encoding for the Basic Multilingual Plane
(BMP), which is a subset of UCS4 in the ISO10646 standard. You can get
more info from the links at:
http://www.cl.cam.ac.uk/~mgk25/unicode.html
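
To make the 2-byte vs 4-byte point concrete, here is a small Python
sketch. It is only an illustration under one assumption: Python's
standard codecs are named for UTF-16 and UTF-32, and I use them as
stand-ins because for characters inside the BMP they give the same
bytes as UCS2 and UCS4. The helper name code_units is made up for this
example.

```python
def code_units(ch):
    """Return (code point, 2-byte form, 4-byte form) for one BMP character.

    Assumption: utf-16-be / utf-32-be stand in for UCS2 / UCS4, which is
    only valid for code points up to U+FFFF (the BMP).
    """
    cp = ord(ch)
    if cp > 0xFFFF:
        raise ValueError("outside the BMP; UCS2 cannot represent it")
    return cp, ch.encode("utf-16-be"), ch.encode("utf-32-be")


# English "A" and Hindi "a" (DEVANAGARI LETTER A) get distinct numbers.
for ch in ("A", "\u0905"):
    cp, two, four = code_units(ch)
    print(f"U+{cp:04X}  2-byte={two.hex()}  4-byte={four.hex()}")
```

The two characters land on different code points (U+0041 and U+0905),
and the 4-byte form is just the 2-byte form padded with leading zeros,
which is what "the BMP is a subset of UCS4" means in practice.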

>
> So you could potentially open up a document and see all these
> languages simultaneously. Of course, this requires that you have a
> font file, which has glyphs for 2^16 characters. Some guy associated
> with the GNU project was trying to assemble that the last I looked.
> Microsoft has a downloadable Unicode font file that covers a large
> space - but is still incomplete.

The GNU effort is called UNIFONT and is available at
http://czyborra.com/unifont/ and the UCS fonts effort is at
http://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html

If you do not want all the glyphs but only a part covering, say,
English and Hindi, then you can create a font with only those slots
filled, i.e. 0000-00FF and 0900-097F, like the ones distributed at
Sandeep Sibal's website.
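
As a rough way to check whether such a partial font would be enough for
a given text, something like the following Python sketch could be used.
The names FONT_RANGES and uncovered are made up for this example, and
it only tests code point ranges; it says nothing about whether the font
can actually render conjuncts correctly.

```python
# Slots mentioned above: Latin (0000-00FF) and Devanagari (0900-097F).
FONT_RANGES = [(0x0000, 0x00FF), (0x0900, 0x097F)]


def uncovered(text, ranges=FONT_RANGES):
    """Return the characters in text that fall outside every range."""
    return [ch for ch in text
            if not any(lo <= ord(ch) <= hi for lo, hi in ranges)]


# "Hindi" written in Devanagari, mixed with English text.
sample = "Hindi: \u0939\u093f\u0928\u094d\u0926\u0940"
print(uncovered(sample))     # empty list: every character has a slot
print(uncovered("\u4e2d"))   # a Chinese character is not covered
```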

>
> Unfortunately, most of the various Indian language fonts that crop up
> everyday on the net are not unicode compliant. They just overload the
> space occupied by the current ASCII character set.

There is a utility to convert Jagran and ITRANS encoded pages to UTF8,
distributed by Mark/me on the UTF8 mailing list. If somebody wants it,
I can provide the link/file.
>
> The hard problem about Indian languages is not representation. It's
> the rendering of the representation on the screen.

Rightly said. Does anybody on this mailing list have good knowledge of
font/glyph rendering? Such a person could really help out the GScript
project at GTK I18N [link http://www.gtk.org ] or the IScript project
at aczone.com

HTH
-swapan
