- Subject: Re: slang: UTF-8 and strlen
 
- From: Romano Giannetti <romano@xxxxxxxxxxxxxxxx>
 
- Date: Tue, 13 May 2003 10:18:15 +0200
 
On Sun, May 11, 2003 at 01:27:37PM -0400, John E. Davis wrote:
>    At the same time, I am adding support for UTF-8 to jed, which will
> serve to test the library.  (See http://www.jedsoft.org/images/jedutf8.png 
> for an image) 
Nice! I really can't wait to have it. Right now I have to go through strange
contortions between jed and yudit when I want to reply on the linguistics
newsgroup (using IPA, etc.).
It would be nice to always have the current encoding in view: a clear
"UTF-8" or "ISO-...-15" somewhere in the main window.
And a function to switch encodings, too.
> In doing so, I came across the following
> "issue".  What should the interpreter's strlen function return?
> Currently, it knows nothing about the encoding and returns the number
> of bytes making up the string.  However, it could be modified to
> return one of the following:
> 
>    1.  The number of bytes in the string.
>    2.  The number of characters in the string, including combining
>        characters.
>    3.  The number of characters in the string, not counting the
>        combining characters.
> 
Well, the problem is this: if strlen is mainly used to measure how much
visual space the string occupies on screen, option #3 is the correct one;
moreover, you should also take into account wide characters that occupy two
columns. But I do not know how this would interact with searching and so on.
I would suggest borrowing wcswidth, wcslen (see man 3 wcswidth) and
company, i.e. the standard C/POSIX wide-character functions that give a
string's length in characters and its display width in columns. Or add an
optional "encoding" parameter to strlen.
The really neat thing would be to deprecate strlen and force the
user/programmer to use the correct function, but I understand this is
impractical.
Thanks,
           Romano 
-- 
Romano Giannetti             -  Univ. Pontificia Comillas (Madrid, Spain)
Electronic Engineer - phone +34 915 422 800 ext 2416  fax +34 915 411 132
--------------------------
To unsubscribe send email to <jed-users-request@xxxxxxxxxxx> with
the word "unsubscribe" in the message body.
Need help? Email <jed-users-owner@xxxxxxxxxxx>.