- Subject: [slang-users] Re: unicode (was Re: Minor error message change)
- From: "John E. Davis" <davis@xxxxxxxxxxxxx>
- Date: Mon, 16 Aug 2004 11:58:34 -0400
[This followup is to comments Pavel made a while back on the old
slang mailing list. I feel that his comments are right on target and are
particularly relevant to the upcoming slang2 release. For this
reason, the original message is quoted in its entirety.]
Pavel Roskin <proski@xxxxxxx> wrote:
>On Mon, 30 Jun 2003, Goran Koruga wrote:
>
>> Have you looked at this URL :
>> http://www.cl.cam.ac.uk/~mgk25/unicode.html
>
>I'm looking at it now.
>
>> It suggests you use langinfo() to get info about it.
>
>I'm not sure it's relevant to the original question. You probably mean
>this part:
>
> To do this, on any X/Open compliant systems, where <langinfo.h> is
> available, you can use a line such as
>
> utf8_mode = (strcmp(nl_langinfo(CODESET), "UTF-8") == 0);
>
> in order to detect whether the current locale uses the UTF-8 encoding.
> You have of course to add a setlocale(LC_CTYPE, "") at the beginning of
> your application to set the locale according to the environment
> variables first.
>
>This is not related to the properties of the terminal. Most likely it
>just takes the encoding from the LC_CTYPE locale, and failing that it
>probably returns the "default" encoding for the given language/locale
>combination. As I said in my previous message, LC_CTYPE affects the
>assumed encoding of the data and of the regular expressions supplied by
>the program.
>
>There is no way for libc to know whether the terminal supports UTF-8
>output.
>
>It seems that we need some kind of "summit" on this topic, involving the
>major players in the i18n development and text terminal software on POSIX
>and Linux in particular (e.g. John E. Davis, Thomas Dickey, Bruno Haible,
>Ulrich Drepper, Eric Raymond, Ted Ts'o). It's time to create a better
>standard.
>
>The existing specifications are unclear and don't address several issues.
>The separation between locale categories is artificial, and it's even more
>artificial when encodings are added to them.
>
>It makes no sense for a user to set encodings for different locale
>categories. The encodings should be set for the terminal (the only
>setting irrelevant for GUI), for the default text (e.g. if most of my files
>are in koi8-r, I set it as default), and maybe for legacy programs that
>use regular expressions but don't specify their charset.
>
>There should be a standard way to check the encoding of the terminal.
>Maybe it should be another capability or another environment variable.
>
>If we don't create such a standard, somebody will create it for us, poorly.
>And then we'll waste our time gluing ad-hoc solutions together.
>
>--
>Regards,
>Pavel Roskin
>
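The check from Markus Kuhn's page that Pavel quotes can be wrapped in a small C function. This is a minimal sketch; the function name locale_is_utf8 is hypothetical. As Pavel notes, the answer describes the LC_CTYPE locale only, not what the terminal can actually display.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the current LC_CTYPE locale uses
 * UTF-8, 0 otherwise.  This reports the locale's encoding only; it says
 * nothing about whether the terminal can display UTF-8. */
int locale_is_utf8 (void)
{
   const char *codeset = nl_langinfo (CODESET);

   return (codeset != NULL) && (strcmp (codeset, "UTF-8") == 0);
}
```

As the quoted text says, a program would call setlocale (LC_CTYPE, "") once at startup so that the locale reflects the environment variables, and only then call locale_is_utf8.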
I think that Pavel is correct regarding the existing locale
specifications, and I have encountered their weaknesses in adding UTF-8
support to jed. For example, I would like to edit and view UTF-8
documents while at the same time dealing with documents using other
character sets, particularly the iso-latin character sets. I realize that
converting from one character set to another is more or less a solved
problem and, as such, it is not an issue. But, as Pavel pointed out, the
terminal (xterm, rxvt, etc.) is the problem.
Since most of the time I deal with an iso-latin character set, I use
an ordinary xterm that has no Unicode support. To allow me to deal
with UTF-8 encoded files on a non-UTF-8 terminal, I have added the
ability to turn UTF-8 support on or off in the various slang
layers. For example, writing to the terminal using slang's SLsmg
functions involves at least two interfaces: the higher-level SLsmg
layer and the lower-level SLtt layer. In an ordinary xterm,
the SLsmg layer would have UTF-8 support activated but the SLtt layer
would run with UTF-8 support deactivated. When jed is started with UTF-8
support, the interpreter will always run in UTF-8 mode.
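To make the split between the layers concrete, here is a hypothetical sketch of the work a lower layer has to do when the upper layer holds UTF-8 text but the terminal only understands ISO-8859-1: decode each UTF-8 sequence and emit the corresponding single byte, substituting '?' for characters the terminal cannot show. The function names are invented for illustration and this is not how SLtt is implemented; the decoder is deliberately minimal and does not reject overlong or surrogate encodings.

```c
#include <stddef.h>

/* Decode one UTF-8 sequence at s (n bytes available).  Stores the code
 * point in *cp and returns the number of bytes consumed; an invalid or
 * truncated sequence consumes 1 byte and yields U+FFFD. */
static size_t utf8_decode (const unsigned char *s, size_t n, unsigned long *cp)
{
   unsigned char c = s[0];
   size_t len, i;
   unsigned long v;

   if (c < 0x80) { *cp = c; return 1; }
   if ((c & 0xE0) == 0xC0) { len = 2; v = c & 0x1F; }
   else if ((c & 0xF0) == 0xE0) { len = 3; v = c & 0x0F; }
   else if ((c & 0xF8) == 0xF0) { len = 4; v = c & 0x07; }
   else { *cp = 0xFFFD; return 1; }

   if (len > n) { *cp = 0xFFFD; return 1; }
   for (i = 1; i < len; i++)
     {
        if ((s[i] & 0xC0) != 0x80) { *cp = 0xFFFD; return 1; }
        v = (v << 6) | (s[i] & 0x3F);
     }
   *cp = v;
   return len;
}

/* Hypothetical lower-layer routine: convert n bytes of UTF-8 into bytes
 * for an ISO-8859-1 terminal.  Code points above 0xFF cannot be shown
 * and become '?'.  out must hold at least n bytes, since each decoded
 * character consumes at least one input byte and emits exactly one. */
size_t utf8_to_latin1 (const char *in, size_t n, unsigned char *out)
{
   const unsigned char *s = (const unsigned char *) in;
   size_t i = 0, j = 0;

   while (i < n)
     {
        unsigned long cp;
        i += utf8_decode (s + i, n - i, &cp);
        out[j++] = (cp <= 0xFF) ? (unsigned char) cp : '?';
     }
   return j;
}
```

For example, the 5-byte UTF-8 string "caf\xC3\xA9" becomes the 4 Latin-1 bytes "caf\xE9", while the euro sign (U+20AC) has no Latin-1 equivalent and becomes '?'.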
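Currently I use the following bit of code in jed to activate slang's UTF-8
capabilities:
    Jed_UTF8_Mode = SLutf8_enable (-1);  /* -1: base UTF-8 support follows the locale */
    SLsmg_utf8_enable (1);               /* force the SLsmg layer into UTF-8 mode */
    SLinterp_utf8_enable (1);            /* force the interpreter into UTF-8 mode */
    Jed_UTF8_Mode = 1;                   /* force jed to use UTF-8 internally */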
SLutf8_enable is a slang function that activates base support for
UTF-8. Passing -1 as I have done above causes it to use the current
locale. If the locale indicates UTF-8, or UTF-8 support is forced (by
passing +1 as the argument to SLutf8_enable), then all interfaces will
use UTF-8 unless told otherwise via calls to an interface-specific
SLxxx_utf8_enable function. In the above, the SLsmg and SLinterp
layers are forced into UTF-8 mode by calls to the appropriate
layer-specific functions. Since jed does not call SLtt_utf8_enable,
whether the terminal itself (SLtt layer) has its UTF-8 support activated
depends upon the locale.
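The activation rules just described can be modeled as one base setting plus per-layer overrides. The following is a hypothetical sketch of that model, not slang's implementation: the type and function names (Utf8_Config_Type, config_init, config_layer_enable, config_layer_is_utf8) are invented for illustration, and the convention that a per-layer value of -1 means "inherit the base setting" is an assumption.

```c
/* Hypothetical model of the UTF-8 activation rules; not slang's code. */
enum { LAYER_SMG, LAYER_TT, LAYER_INTERP, NUM_LAYERS };

typedef struct
{
   int global;              /* base setting after initialization: 0 or 1 */
   int layer[NUM_LAYERS];   /* -1 = inherit base, 0 = off, 1 = on */
}
Utf8_Config_Type;

/* mode: -1 = follow the locale, 0 = off, 1 = force on.
 * locale_utf8 is nonzero if the locale indicates UTF-8. */
void config_init (Utf8_Config_Type *cfg, int mode, int locale_utf8)
{
   int i;

   cfg->global = (mode == -1) ? (locale_utf8 != 0) : (mode != 0);
   for (i = 0; i < NUM_LAYERS; i++)
     cfg->layer[i] = -1;    /* every layer starts out inheriting */
}

/* Layer-specific override; passing -1 reverts to inheriting. */
void config_layer_enable (Utf8_Config_Type *cfg, int which, int mode)
{
   cfg->layer[which] = mode;
}

/* A layer uses UTF-8 if explicitly enabled, or if it inherits an
 * enabled base setting. */
int config_layer_is_utf8 (const Utf8_Config_Type *cfg, int which)
{
   int m = cfg->layer[which];

   return (m == -1) ? cfg->global : (m != 0);
}
```

Calling config_init (&cfg, -1, 0) for a non-UTF-8 locale and then enabling LAYER_SMG and LAYER_INTERP leaves LAYER_TT reporting 0 while the other two report 1, which mirrors jed running in an ordinary xterm.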
The above outlines how I have handled UTF-8 activation for jed, but
this may not be the best way to do it and may change if better
suggestions come along. Nevertheless, I wanted to illustrate that the
slang library has provisions for fine-grained control of its UTF-8
support and it should be fairly easy to accommodate a "better standard"
if one arises.
Sometime soon I will announce the availability of a slang 2 snapshot.
Thanks,
--John