- Subject: Re: Non-ascii chars in UTF-8 mode not bindable
- From: "John E. Davis" <davis@xxxxxxxxxxxxx>
- Date: Fri, 1 Jun 2007 13:45:54 -0400
G. Milde <milde@xxxxxxxxxxxxxxxxxxxxx> wrote:
>I did some more testing to track down the problem:
There is really no mystery to what is happening. The fact is that the
keymap routines use byte-semantics. In general, a keymap is a series
of 256 element lookup tables. If the tables were naively expanded
from 256 to the maximum allowable unicode character (~1 million), then
the tables would be unacceptably large. The work-around that I posted
(and later corrected) avoids this problem.
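
To put rough numbers on the table-size argument, here is a back-of-the-envelope sketch in Python. The 8-byte entry size is purely an assumption for illustration (one pointer per entry on a 64-bit machine), not a statement about how the editor actually stores its tables.

```python
# Rough size comparison: a byte-indexed table vs. one indexed by code point.
BYTE_TABLE_ENTRIES = 256
UNICODE_TABLE_ENTRIES = 0x10FFFF + 1   # highest Unicode code point, plus one

ASSUMED_ENTRY_BYTES = 8                # assumption: one 64-bit pointer per entry


def table_bytes(entries, entry_size=ASSUMED_ENTRY_BYTES):
    """Return the memory needed by a single lookup table with `entries` slots."""
    return entries * entry_size


small = table_bytes(BYTE_TABLE_ENTRIES)
large = table_bytes(UNICODE_TABLE_ENTRIES)
print(f"256-entry table:     {small} bytes")
print(f"Unicode-sized table: {large} bytes ({large / 2**20:.1f} MiB)")
```

Since every key prefix gets its own table, that cost would be paid once per prefix rather than once overall.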
In the case you considered, the four keys correspond to the following
byte strings:
Key: ´ : "\c2\b4"
Key: ¬ : "\c2\ac"
Key: ° : "\c2\b0"
Key: ¼ : "\c2\bc"
The default bindings of the bytes 0xc2, 0xb4, 0xac, 0xb0, and 0xbc are
to "self_insert_cmd". So when the editor sees a byte sequence such as
0xc2 0xb4, it simply inserts both bytes into the buffer. When in
UTF-8 mode, this combination is interpreted as the single unicode
character '´' (\u{00b4}).
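
The byte pairs listed above can be checked independently of the editor. The following short Python sketch simply encodes the four characters as UTF-8; the helper name `utf8_hex` is made up for this illustration.

```python
# Show the UTF-8 byte sequence that each of the four keys produces.
keys = ["´", "¬", "°", "¼"]


def utf8_hex(ch):
    """Return the UTF-8 encoding of `ch` as space-separated hex bytes."""
    return " ".join(f"0x{b:02x}" for b in ch.encode("utf-8"))


encodings = {ch: utf8_hex(ch) for ch in keys}
for ch, enc in encodings.items():
    print(f"U+{ord(ch):04X} {ch} -> {enc}")
```

All four share the lead byte 0xc2, which is why a binding for one of them affects the others.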
When you bound the byte sequence "\c2\b4" to something, that
effectively created a keymap for sequences beginning with 0xc2. As a
result, 0xc2 was no longer bound to "self_insert_cmd", and a sequence
such as "\c2\bc" would not do anything since "\bc" is unbound in the
0xc2-based keymap.
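
The shadowing effect can be reproduced with a toy model. The Python sketch below is not the editor's implementation: it assumes a keymap can be modelled as a dictionary from byte to either a command name or a nested dictionary, and the names `new_root_keymap`, `setkey`, `dispatch` and `my_cmd` are hypothetical stand-ins.

```python
SELF_INSERT = "self_insert_cmd"


def new_root_keymap():
    """Top-level table: every one of the 256 bytes starts out self-inserting."""
    return {b: SELF_INSERT for b in range(256)}


def setkey(keymap, seq, cmd):
    """Bind byte sequence `seq` to `cmd`, creating sub-keymaps for its prefixes."""
    for b in seq[:-1]:
        if not isinstance(keymap.get(b), dict):
            keymap[b] = {}          # the prefix byte loses its old binding here
        keymap = keymap[b]
    keymap[seq[-1]] = cmd


def dispatch(root, seq):
    """Feed bytes one at a time; return the commands run (None means unbound)."""
    executed, current = [], root
    for b in seq:
        entry = current.get(b)
        if isinstance(entry, dict):
            current = entry         # prefix key: descend and wait for more input
        else:
            executed.append(entry)  # command name, or None if nothing is bound
            current = root
    return executed


root = new_root_keymap()
before = dispatch(root, b"\xc2\xbc")       # both bytes self-insert

setkey(root, b"\xc2\xb4", "my_cmd")        # hypothetical user binding
bound = dispatch(root, b"\xc2\xb4")        # runs the new command
shadowed = dispatch(root, b"\xc2\xbc")     # 0xbc is unbound under 0xc2

print(before, bound, shadowed)
```

In this model, the only way to get the other 0xc2 sequences back is to bind them explicitly inside the new sub-keymap.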
At some point, I will integrate the work-around that I posted into the
setkey functions. I posted the slang version to give others an
immediate solution to the problem, although I suspect only a few will
ever run into this issue.
I hope this clarifies things a bit.
Thanks,
--John