- Subject: Re: slang: UTF-8 and strlen
- From: Romano Giannetti <romano@xxxxxxxxxxxxxxxx>
- Date: Tue, 13 May 2003 10:18:15 +0200
On Sun, May 11, 2003 at 01:27:37PM -0400, John E. Davis wrote:
> At the same time, I am adding support for UTF-8 to jed, which will
> serve to test the library. (See http://www.jedsoft.org/images/jedutf8.png
> for an image)
Nice! I really can't wait to have it. Right now I have to do strange
contortions between jed and yudit when I want to reply in the linguistics
newsgroup (using IPA, etc.).
It would be nice to always have the encoding you are using in view: a
"UTF-8" or "ISO-...-15" somewhere in the main window. And a function to
switch encodings, too.
> In doing so, I came across the following
> "issue". What should the interpreter's strlen function return?
> Currently, it knows nothing about the encoding and returns the number
> of bytes making up the string. However, it could be modified to
> return one of the following:
>
> 1. The number of bytes in the string.
> 2. The number of characters in the string, including combining
> characters.
> 3. The number of characters in the string, not counting the
> combining characters.
>
Well, the problem is this: if strlen is mainly used to count how much visual
space the string occupies on screen, option #3 is the correct one. On top of
that, you should take into account wide characters that occupy 2 columns.
But I do not know how this would interact with searching and so on.
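
To make the three candidate return values concrete, here is a minimal
sketch in Python (not S-Lang, and not how the interpreter does or will
implement it). The function names are made up for illustration, and
"combining character" is approximated here as Unicode general category
Mn or Me, which is an assumption rather than a definition taken from the
thread.

```python
import unicodedata


def byte_length(s: str, encoding: str = "utf-8") -> int:
    """Option 1: number of bytes in the encoded string."""
    return len(s.encode(encoding))


def char_length(s: str) -> int:
    """Option 2: number of code points, combining characters included."""
    return len(s)


def base_char_length(s: str) -> int:
    """Option 3: number of code points, combining characters excluded.

    Assumption: a combining character is one whose general category is
    Mn (nonspacing mark) or Me (enclosing mark).
    """
    return sum(1 for ch in s if unicodedata.category(ch) not in ("Mn", "Me"))


# 'e' + COMBINING ACUTE ACCENT (U+0301), then 't', then precomposed U+00E9
sample = "e\u0301t\u00e9"

print(byte_length(sample), char_length(sample), base_char_length(sample))
```

The same four code points give three different answers, which is why the
choice matters for any code that uses strlen to lay out text.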
I would suggest borrowing "wcswidth", "wcslen" (man 3 wcswidth) and
company, i.e. the POSIX wide-character functions for string length and
visual width. Or add an optional "encoding" parameter to strlen.
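
A rough Python sketch of both suggestions follows. display_width only
imitates the idea behind wcswidth, using the Unicode East Asian Width
property; unlike the real wcswidth it does not return -1 for
non-printable characters and it ignores the locale. strlen_enc is a
hypothetical name for a strlen with an optional encoding argument; it is
not an existing S-Lang function.

```python
import unicodedata
from typing import Optional


def display_width(s: str) -> int:
    """Approximate number of terminal columns needed to show s."""
    width = 0
    for ch in s:
        if unicodedata.category(ch) in ("Mn", "Me"):
            continue  # combining marks are assumed to occupy no column
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # wide and fullwidth characters take two columns
        else:
            width += 1
    return width


def strlen_enc(data: bytes, encoding: Optional[str] = None) -> int:
    """Hypothetical strlen with an optional encoding parameter.

    Without an encoding it counts bytes, as strlen does today; with one
    it decodes first and counts characters.
    """
    if encoding is None:
        return len(data)
    return len(data.decode(encoding))


raw = "\u65e5\u672c\u8a9e".encode("utf-8")  # three CJK characters

print(strlen_enc(raw), strlen_enc(raw, "utf-8"), display_width(raw.decode("utf-8")))
```

Keeping the byte count as the default would leave existing callers
untouched, while code that positions text on screen could opt in to the
width function.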
The really neat thing would be to deprecate strlen and force the
user/programmer to use the correct function, but I understand this is
hardly feasible in practice.
Thanks,
Romano
--
Romano Giannetti - Univ. Pontificia Comillas (Madrid, Spain)
Electronic Engineer - phone +34 915 422 800 ext 2416 fax +34 915 411 132