slang-users mailing list

unicode (was Re: Minor error message change)

Subject: unicode (was Re: Minor error message change)
From: "John E. Davis" <davis>
Date: Thu, 29 May 2003 16:03:09 -0400

Pavel Roskin <proski@xxxxxxx> wrote:
>If TERM is not set, S-Lang prints "TERM environment variable needs set"
>(see sldisply.c in s-lang 1.4.9).
>
>It just doesn't sound right to me.  How about "TERM environment variable is
>not set" or "TERM environment variable needs to be set"?

Many of the error messages are old and stem from a time when I tried
to get all global data to fit in 64K.  I am reviewing all error
messages for slang 2.

BTW, I did not see my email about unicode appear on this list, and I
suspect that I was not the only one.  For that reason, I am including
it below.  Thanks, --John 

>Hi,
>
>   As many of you know, the next slang release will provide full
>support for Unicode using the UTF-8 encoding.  Right now, full support
>has been added to the SLsmg/SLtt slang interfaces.  By full support I
>mean support for 32-bit Unicode characters, accounting for combining
>characters (up to 4), double-width characters, illegal UTF-8 encoded
>strings, etc.  In fact, the SLsmg/SLtt interfaces are done.
>
>   At the same time, I am adding support for UTF-8 to jed, which will
>serve to test the library.  (See http://www.jedsoft.org/images/jedutf8.png
>for an image.)  In doing so, I came across the following
>"issue".  What should the interpreter's strlen function return?
>Currently, it knows nothing about the encoding and returns the number
>of bytes making up the string.  However, it could be modified to
>return one of the following:
>
>   1.  The number of bytes in the string.
>   2.  The number of characters in the string, including combining
>       characters.
>   3.  The number of characters in the string, not counting the
>       combining characters.
>
>Keep in mind that in the UTF-8 encoding, a character is represented by
>1 to 6 bytes.  Hence, one needs to be careful when using the term
>"character".  A so-called combining character can be thought of as an
>"overstrike" character.  For example, the Spanish "enye" character
>may be represented as 2 characters: an 'n' followed by a tilde.  In
>this case, the tilde (U+0303) is a combining character.
>
>When looking at the way jed's .sl files use strlen, I noticed that
>most of the code using strlen would not have to be changed assuming
>strlen behaved according to the semantics of #3. Hence, I propose the
>following for slang v2:
>
>  strlen: returns the number of characters (not bytes!) in a string.
>          Any combining characters will not be included in the sum.
>
>In addition, I propose two new functions:
>
>  strbytelen: Returns the number of bytes in a string.
>
>  strcharlen: Returns the number of chars in a string, counting the
>              combining characters.
>
>Comments about this proposal?
>Thanks,
>--John
>
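
The three counts discussed in the quoted message can be illustrated with a
short sketch.  It is written in Python rather than S-Lang, and the names
str_byte_len, str_char_len and str_len are hypothetical stand-ins for the
proposed strbytelen, strcharlen and strlen; they are not the slang
implementation.  It also assumes that "combining character" can be
approximated by a nonzero canonical combining class, which does not cover
every kind of combining mark.

```python
import unicodedata

def str_byte_len(s: str) -> int:
    """Option 1: number of bytes in the UTF-8 encoding of s."""
    return len(s.encode("utf-8"))

def str_char_len(s: str) -> int:
    """Option 2: number of characters, combining characters included."""
    return len(s)

def str_len(s: str) -> int:
    """Option 3: number of characters, combining characters not counted.

    Assumption: a character is treated as combining when its canonical
    combining class is nonzero.
    """
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)

# The "enye" example from the message: 'n' followed by U+0303.
enye = "n\u0303"
print(str_byte_len(enye), str_char_len(enye), str_len(enye))  # 3 2 1
```

Here the 'n' takes one byte and U+0303 takes two, so the byte count is 3,
the count including the combining character is 2, and the count under the
semantics of #3 is 1.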

References:
- Minor error message change
  - From: Pavel Roskin
