- Subject: Re: unicode (was Re: Minor error message change)
- From: Pavel Roskin <proski@xxxxxxx>
- Date: Mon, 30 Jun 2003 16:25:10 -0400 (EDT)
On Mon, 30 Jun 2003, Goran Koruga wrote:
> Have you looked at this URL :
> http://www.cl.cam.ac.uk/~mgk25/unicode.html
I'm looking at it now.
> It suggests you use langinfo() to get info about it.
I'm not sure it's relevant to the original question. You probably mean
this part:
    To do this, on any X/Open compliant systems, where <langinfo.h> is
    available, you can use a line such as

        utf8_mode = (strcmp(nl_langinfo(CODESET), "UTF-8") == 0);

    in order to detect whether the current locale uses the UTF-8 encoding.
    You have of course to add a setlocale(LC_CTYPE, "") at the beginning of
    your application to set the locale according to the environment
    variables first.
This is not related to the properties of the terminal. Most likely
nl_langinfo(CODESET) just takes the encoding from the LC_CTYPE locale,
and failing that it probably returns the "default" encoding for the given
language/locale combination. As I said in my previous message, LC_CTYPE
affects the assumed encoding of the data and of the regular expressions
supplied by the program.
There is no way for libc to know whether the terminal supports UTF-8
output.
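To make that concrete, here is a minimal C sketch of the check quoted
above. The helper names codeset_is_utf8() and locale_uses_utf8() are made
up for this example; only setlocale() and nl_langinfo(CODESET) come from
the quoted text. Nothing in it ever queries the terminal.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does this codeset name denote UTF-8?
 * Uses the exact comparison from the quoted text; some systems may
 * spell the name differently, which this sketch does not handle. */
int codeset_is_utf8(const char *codeset)
{
    return codeset != NULL && strcmp(codeset, "UTF-8") == 0;
}

/* Hypothetical helper: adopt the LC_CTYPE locale from the environment
 * (LC_ALL, LC_CTYPE, LANG) and report whether its codeset is UTF-8.
 * The answer describes the locale only, not what the terminal can show. */
int locale_uses_utf8(void)
{
    if (setlocale(LC_CTYPE, "") == NULL)
        return 0;   /* requested locale is not available */
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

A program built around locale_uses_utf8() and started with
LC_ALL=en_US.UTF-8 would report UTF-8 (assuming that locale is installed)
even on a terminal that can only display koi8-r.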
It seems that we need some kind of "summit" on this topic, involving the
major players in the i18n development and text terminal software on POSIX
and Linux in particular (e.g. John E. Davis, Thomas Dickey, Bruno Haible,
Ulrich Drepper, Eric Raymond, Ted Ts'o). It's time to create a better
standard.
The existing specifications are unclear and don't address several issues.
The separation between locale categories is artificial, and it's even more
artificial when encodings are added to them.
It makes no sense for a user to set encodings for different locale
categories. The encodings should be set for the terminal (the only
setting irrelevant for GUI), for the default text (e.g. if most of my
files are in koi8-r, I set it as default), and maybe for legacy programs
that use regular expressions but don't specify their charset.
There should be a standard way to check the encoding of the terminal.
Maybe it should be another capability or another environment variable.
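As a rough illustration of the environment variable option, a program
could do something like the following. The variable name TERM_ENCODING
and both function names are purely hypothetical; no such convention
exists, and this is only a sketch of what a check might look like.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical: read the terminal encoding from an environment variable.
 * TERM_ENCODING is an invented name, not an existing standard.
 * Returns `fallback` when the variable is unset or empty. */
const char *terminal_encoding(const char *fallback)
{
    const char *value = getenv("TERM_ENCODING");

    if (value == NULL || value[0] == '\0')
        return fallback;
    return value;
}

/* Hypothetical: true only if the terminal explicitly declares UTF-8. */
int terminal_is_utf8(void)
{
    return strcmp(terminal_encoding("unknown"), "UTF-8") == 0;
}
```

The point of keeping this separate from the locale is that the terminal
could declare UTF-8 while the default text encoding stays koi8-r.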
If we don't create such a standard, somebody will create it for us, poorly.
And then we'll waste our time gluing ad-hoc solutions together.
--
Regards,
Pavel Roskin