- Subject: Re: UTF-8 and Regular Expressions
- From: Jörg Sommer <joerg@xxxxxxxxxxxx>
- Date: Wed, 18 Apr 2007 13:17:00 +0000 (UTC)
Hello G.,
"G. Milde" <milde@xxxxxxxxxxxxxxxxxxxxx> wrote:
> On 11.04.07, John E. Davis wrote:
>> In testing, one problem has come up: When used in UTF-8 mode, PCRE
>> cannot tolerate malformed text. This can be a problem when jed is
>> running in UTF-8 mode, but one is editing text in some other encoding,
>> e.g., ISO-Latin-1.
>
> However, when editing text in Jed-U, "the right thing" would be to
> convert it transparently to UTF-8 in a find_file_hook and convert it back
> when saving (analogous to compress.sl).
But UTF-8 text can also contain malformed sequences; see UTF-8-test.txt.
What would happen if PCRE saw such a malformed sequence? Would jed
die? Couldn't an exception (InvalidUTF8Error) be thrown instead?
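To make the question concrete: a buffer could be checked for well-formedness before it is handed to a regex engine running in UTF-8 mode. The sketch below assumes the rules of RFC 3629 (no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF). `utf8_is_valid` is a hypothetical helper for illustration, not part of jed, S-Lang or PCRE.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if buf[0..len) is well-formed UTF-8
 * per RFC 3629, 0 otherwise. */
static int utf8_is_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t n;            /* number of continuation bytes expected */
        unsigned long cp;    /* code point being decoded */

        if (c < 0x80) {                       /* plain ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {  /* 2-byte sequence */
            n = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {  /* 3-byte sequence */
            n = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {  /* 4-byte sequence */
            n = 3; cp = c & 0x07;
        } else {
            /* 0x80-0xBF: stray continuation byte;
             * 0xC0-0xC1: always overlong; 0xF5-0xFF: never valid */
            return 0;
        }

        if (len - i <= n)                     /* sequence is truncated */
            return 0;

        for (size_t k = 1; k <= n; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)          /* not a continuation byte */
                return 0;
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (n == 2 && cp < 0x800)             /* overlong 3-byte form */
            return 0;
        if (n == 3 && (cp < 0x10000 || cp > 0x10FFFF))
            return 0;                         /* overlong or out of range */
        if (cp >= 0xD800 && cp <= 0xDFFF)     /* UTF-16 surrogates */
            return 0;

        i += n + 1;
    }
    return 1;
}
```

For example, the single Latin-1 byte 0xE4 ("ä") is rejected, while its UTF-8 form C3 A4 is accepted. This is exactly the situation John describes when a Latin-1 file is edited with jed in UTF-8 mode.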
> Conversion could be done by `iconv`, `recode` or (from|to latin-1) a
Are those converters accessible via SLang? (I mean iconv(3).) That
would be great, but I doubt it; they aren't available everywhere.
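For the Latin-1 case specifically, an external converter is not strictly needed. Every ISO-8859-1 byte maps to the Unicode code point with the same value (U+0000 to U+00FF). Below is a minimal sketch of both directions, assuming whole buffers held in memory. The function names are hypothetical; this is not iconv(3) and not anything jed ships.

```c
#include <stddef.h>

/* Hypothetical helper: ISO-8859-1 -> UTF-8.
 * out must have room for 2 * len bytes; returns the number of bytes written. */
static size_t latin1_to_utf8(const unsigned char *in, size_t len,
                             unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = in[i];
        if (b < 0x80) {
            out[o++] = b;                                  /* ASCII unchanged */
        } else {
            out[o++] = (unsigned char)(0xC0 | (b >> 6));   /* lead: 0xC2 or 0xC3 */
            out[o++] = (unsigned char)(0x80 | (b & 0x3F)); /* continuation byte */
        }
    }
    return o;
}

/* Hypothetical helper: UTF-8 -> ISO-8859-1.
 * out must have room for len bytes; returns the number of bytes written,
 * or (size_t)-1 if the input is malformed or holds a character above U+00FF. */
static size_t utf8_to_latin1(const unsigned char *in, size_t len,
                             unsigned char *out)
{
    size_t o = 0, i = 0;
    while (i < len) {
        unsigned char b = in[i];
        if (b < 0x80) {
            out[o++] = b;
            i++;
        } else if ((b == 0xC2 || b == 0xC3) && i + 1 < len
                   && (in[i + 1] & 0xC0) == 0x80) {
            /* two-byte sequence encoding U+0080..U+00FF */
            out[o++] = (unsigned char)(((b & 0x1F) << 6) | (in[i + 1] & 0x3F));
            i += 2;
        } else {
            return (size_t)-1;   /* not representable in Latin-1 */
        }
    }
    return o;
}
```

Converting back fails as soon as the buffer contains a character outside Latin-1, such as "€". A save hook built on this would have to decide what to do in that case. Other 8-bit encodings have no such direct mapping and would still need iconv(3) or recode.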
Bye, Jörg.
--
Mathematicians eating cake (from real life):
J: I suppose you're working out how to divide your piece optimally?
K: Yes, I'm just applying the simplex algorithm to it.
C: Look, you've already got four corners there.
--------------------------
To unsubscribe send email to <jed-users-request@xxxxxxxxxxx> with
the word "unsubscribe" in the message body.
Need help? Email <jed-users-owner@xxxxxxxxxxx>.