- Subject: unicode (was Re: Minor error message change)
- From: "John E. Davis" <davis>
- Date: Thu, 29 May 2003 16:03:09 -0400
Pavel Roskin <proski@xxxxxxx> wrote:
>If TERM is not set, S-Lang prints "TERM environment variable needs set"
>(see sldisply.c in s-lang 1.4.9).
>
>It just doesn't sound right to me. How about "TERM environment variable is
>not set" or "TERM environment variable needs to be set"?
Many of the error messages are old and stem from a time when I tried
to get all global data to fit in 64K. I am reviewing all error
messages for slang 2.
BTW, I did not see my email about unicode appear on this list, and I
suspect that I was not the only one. For that reason, I am including
it below. Thanks, --John
>Hi,
>
> As many of you know, the next slang release will provide full
>support for unicode using the UTF-8 encoding. Right now, full support
>has been added to the SLsmg/SLtt slang interfaces. By full support I
>mean support for 32 bit unicode characters accounting for combining
>characters (up to 4), double width characters, illegal UTF-8 encoded
>strings, etc. In fact, the SLsmg/SLtt interfaces are done.
>
> At the same time, I am adding support for UTF-8 to jed, which will
>serve to test the library. (See http://www.jedsoft.org/images/jedutf8.png
>for an image.) In doing so, I came across the following
>"issue": what should the interpreter's strlen function return?
>Currently, it knows nothing about the encoding and returns the number
>of bytes making up the string. However, it could be modified to
>return one of the following:
>
> 1. The number of bytes in the string.
> 2. The number of characters in the string, including combining
> characters.
> 3. The number of characters in the string, not counting the
> combining characters.
>
>Keep in mind that in the UTF-8 encoding, a character is represented by
>1 to 6 bytes. Hence, one needs to be careful when using the term
>"character". A so-called combining character can be thought of as an
>"overstrike" character. For example, the Spanish "enye" character
>may be represented as 2 characters: an 'n' and a '~' combined. In
>this case, the tilde (U+0303) is a combining character.
>
>When looking at the way jed's .sl files use strlen, I noticed that
>most of the code using strlen would not have to be changed assuming
>strlen behaved according to the semantics of #3. Hence, I propose the
>following for slang v2:
>
> strlen: returns the number of characters (not bytes!) in a string.
> Any combining characters will not be included in the sum.
>
>In addition, I propose two new functions:
>
> strbytelen: Returns the number of bytes in a string.
>
> strcharlen: Returns the number of chars in a string, counting the
> combining characters.
>
>Comments about this proposal?
>Thanks,
>--John
>
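To make the three proposed semantics concrete, here is a minimal sketch in
Python (not S-Lang, and not the library's implementation). The function names
mirror the proposal above; the test for "combining character" uses a nonzero
canonical combining class from Python's unicodedata module, which is only an
assumed approximation of whatever rule slang 2 ends up using.

```python
import unicodedata

def strbytelen(s: str) -> int:
    # Number of bytes in the UTF-8 encoding of the string.
    return len(s.encode("utf-8"))

def strcharlen(s: str) -> int:
    # Number of code points, counting combining characters.
    return len(s)

def strlen(s: str) -> int:
    # Number of code points, skipping combining characters.
    # Assumption: "combining" means a nonzero canonical combining class.
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)

# "enye" written as 'n' followed by the combining tilde U+0303
enye = "n\u0303"
print(strbytelen(enye), strcharlen(enye), strlen(enye))  # 3 2 1
```

With the decomposed "enye", the three functions give 3, 2, and 1: the 'n' takes
one byte and U+0303 takes two, there are two code points, and only one of them
is a non-combining character.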