- Subject: Re: Jed and utf-8... a pre-pre-pre-plea :-)
- From: Romano Giannetti <romano@xxxxxxxxxxxxxxxx>
- Date: Thu, 19 Jun 2003 10:32:00 +0200
On Thu, Jun 19, 2003 at 09:15:59AM +0200, Günter Milde wrote:
> If I understood well, doing set_charset("latin1") would also mean that the
> buffer is saved in latin1 encoding, i.e. although you change bytes in the
> buffer, doing find_file("test.tex"); set_charset("latin1");
> save_buffer() will not change the file.
Yes, if test.tex was originally in latin1. I think the problem arises if I have
some .tex files in latin1, some in latin2, and some in latin9. Maybe we could
have a "raw8bit" mode that works as if the locale were not utf-8... but that's
tricky at best...
> Of course one can then have a latex_mode_hook that calls
> set_charset("latin1"), so everything is transparent and fairly automatic.
Best would be a function that searches for "\usepackage[charset]{inputenc}"
and sets the charset appropriately.
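
To give an idea of what such a function would have to do, here is a minimal
sketch in Python (not Jed's S-Lang). The names INPUTENC_RE and detect_inputenc,
and the 50-line limit, are made up for illustration; a real version would live
in a latex_mode_hook and pass the result to set_charset.

```python
import re

# Matches e.g. \usepackage[latin1]{inputenc}, allowing optional whitespace.
INPUTENC_RE = re.compile(r"\\usepackage\s*\[([^\]]*)\]\s*\{inputenc\}")

def detect_inputenc(text, max_lines=50):
    """Return the inputenc option found near the top of a TeX file, or None.

    max_lines is an arbitrary assumption: the preamble is expected to be short.
    """
    for line in text.splitlines()[:max_lines]:
        # Strip TeX comments so a commented-out \usepackage is ignored.
        code = re.sub(r"(?<!\\)%.*", "", line)
        match = INPUTENC_RE.search(code)
        if match:
            return match.group(1).strip() or None
    return None
```

The returned name ("latin1", "latin2", ...) is the short form used by inputenc,
so it may still need mapping to whatever names set_charset accepts.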
>
> > One of the things that I need to think about is an "API" for the
> > charset-mapping functions. I mentioned one function, "set_charset",
> > but perhaps there should be more. Can you think of other
> > charset-related functions that may be useful? "get_charset" comes to
> > mind.
When converting from utf-8 to an 8-bit charset, the rules differ with the
charset in use. I do not think that you will implement all the conversions, so
a mechanism (hooks) to add conversion functions from a generic charset to
utf-8 and back should be implemented, with the possibility of reporting errors:
for example, if I type a "long umlaut uppercase U" (TeX \H{U}, utf-8 0xc5
0xb0), it can be translated to latin2 (0xdb) but _not_ to latin1, where the
same code is for \^U ... in this case, for example, yudit writes an ASCII
\u0170 (the unicode value) to the output file. This could be a point
where a hook would be useful (what to do with characters that are
unrepresentable in the requested file charset; that would be really nice for
TeX things, since you can fake almost all composing accents with macros,
without the need for a special font).
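
As an illustration of the "unrepresentable character" hook, here is a rough
sketch using Python's codecs.register_error, which offers this kind of
callback. The handler name "tex_fallback" and the TEX_FALLBACK table are
invented for the example; nothing like this exists in Jed.

```python
import codecs

# Hypothetical fallback table: Unicode code point -> TeX macro.
TEX_FALLBACK = {0x0170: r"\H{U}", 0x0171: r"\H{u}"}

def tex_fallback(err):
    """Replace characters the target charset cannot represent.

    Known characters become TeX macros; anything else becomes an ASCII
    \\uXXXX escape, similar to what yudit is described as doing.
    """
    if not isinstance(err, UnicodeEncodeError):
        raise err
    bad = err.object[err.start:err.end]
    out = "".join(TEX_FALLBACK.get(ord(ch), "\\u%04x" % ord(ch)) for ch in bad)
    # Return the replacement text and the position where encoding resumes.
    return out, err.end

codecs.register_error("tex_fallback", tex_fallback)

# U+0170 exists in latin2, so no hook is needed there.
latin2_bytes = "\u0170".encode("iso8859-2")
# latin1 has no such character, so the hook supplies a TeX macro instead.
latin1_bytes = "\u0170".encode("latin-1", "tex_fallback")
```

The point of the design is that the fallback policy is chosen per call, so a
TeX buffer and a plain text buffer could use different handlers for the same
charset.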
I do not know if I have explained myself well... trying to summarize (it's a
horrible interface, but I just want to give the idea):
add_translation_from_charset_to_utf8(charset,f_one);
add_translation_from_utf8_to_charset(charset,f_two,f_three);
f_one gets a byte and outputs a sequence of bytes
f_two gets a sequence of bytes representing a utf-8 char and outputs a byte
if possible, or calls f_three if impossible
f_three gets a sequence of bytes representing a utf-8 char that has no
representation in the charset and does something.
f_one is called when reading a file, the other two when writing it.
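
The interface above could be sketched roughly as follows, in Python rather
than S-Lang. Only the two add_translation_* names come from the proposal;
read_bytes, write_chars, and the latin1 helper functions are assumptions added
to make the example runnable. One deviation: here the dispatcher, not f_two,
calls f_three when f_two returns None.

```python
# Registries: charset name -> conversion callables.
_to_utf8 = {}
_from_utf8 = {}

def add_translation_from_charset_to_utf8(charset, f_one):
    _to_utf8[charset] = f_one

def add_translation_from_utf8_to_charset(charset, f_two, f_three):
    _from_utf8[charset] = (f_two, f_three)

def read_bytes(charset, data):
    """Reading a file: convert each 8-bit byte to its utf-8 sequence."""
    f_one = _to_utf8[charset]
    return b"".join(f_one(bytes([b])) for b in data)

def write_chars(charset, utf8_chars):
    """Writing a file: utf8_chars is a list of utf-8 sequences, one per char."""
    f_two, f_three = _from_utf8[charset]
    out = bytearray()
    for seq in utf8_chars:
        byte = f_two(seq)
        # f_two returns None when the char has no representation.
        out += byte if byte is not None else f_three(seq)
    return bytes(out)

# Example registration for latin1, leaning on Python's built-in codecs.
def latin1_to_utf8(byte):
    return byte.decode("latin-1").encode("utf-8")

def utf8_to_latin1(seq):
    try:
        return seq.decode("utf-8").encode("latin-1")
    except UnicodeEncodeError:
        return None

def write_unicode_escape(seq):
    # Fallback: emit an ASCII \uXXXX escape for the unrepresentable char.
    return ("\\u%04x" % ord(seq.decode("utf-8"))).encode("ascii")

add_translation_from_charset_to_utf8("latin1", latin1_to_utf8)
add_translation_from_utf8_to_charset("latin1", utf8_to_latin1,
                                     write_unicode_escape)
```

With this registration, write_chars("latin1", ...) passes u-umlaut through as
a single byte and turns U+0170 into the six ASCII characters \u0170.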
More trivially, a function to map between short ("latin1", "latin9") and
canonical names ("iso-8859-1", "iso-8859-15" respectively).
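
Such a lookup is little more than a pair of tables. A minimal Python sketch,
with invented function names and only three charsets filled in:

```python
# Short name -> canonical name; extend as needed.
CANONICAL = {
    "latin1": "iso-8859-1",
    "latin2": "iso-8859-2",
    "latin9": "iso-8859-15",
}
# Reverse table: canonical name -> short name.
SHORT = {canonical: short for short, canonical in CANONICAL.items()}

def canonical_charset_name(name):
    """Return the canonical name for a short or canonical name, else None."""
    key = name.strip().lower()
    if key in SHORT:
        return key
    return CANONICAL.get(key)

def short_charset_name(name):
    """Return the short name for a short or canonical name, else None."""
    key = name.strip().lower()
    if key in CANONICAL:
        return key
    return SHORT.get(key)
```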
> As well as a Buffers>Character_Set menu entry...
Yes. And
* a "buffer-hook" like -*-charset: bla-*- in the first lines of the
file
* a %<something> format for showing the buffer charset (well, it
should be "charset of the file associated with the buffer", but well) in
the status line.
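
For the -*-charset: bla-*- idea, a rough Python sketch of the detection step
could look like this. COOKIE_RE, charset_from_cookie, and the two-line limit
are assumptions; the syntax is only the one suggested above, possibly mixed
with other "key: value" pairs between the -*- markers.

```python
import re

# Finds "charset: NAME" between a pair of -*- markers on one line.
COOKIE_RE = re.compile(r"-\*-.*?\bcharset\s*:\s*([A-Za-z0-9_.-]+).*?-\*-")

def charset_from_cookie(text, max_lines=2):
    """Return the charset named in a -*- ... -*- line near the top, or None."""
    for line in text.splitlines()[:max_lines]:
        match = COOKIE_RE.search(line)
        if match:
            return match.group(1)
    return None
```

Only the first lines are inspected, so a similar-looking string deeper in the
file does not change the buffer charset.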
have a nice day,
Romano
--
Romano Giannetti - Univ. Pontificia Comillas (Madrid, Spain)
Electronic Engineer - phone +34 915 422 800 ext 2416 fax +34 915 411 132