Zsh Mailing List Archive

unpatch: metafying zle line



I have a patch for the first step in making completion work with wide
characters.  It's quite long, so instead of posting a patch that will
fill people's mailboxes and that no one will read, I'll describe it here
and commit it some time during the day.

People who aren't interested in the implementation should simply note
that it might make completion unstable for a while.  This is in a good
cause, and any bug reports will help the Unicode development, even if
you build without ZLE_UNICODE_SUPPORT, the definition that turns the
wide character code on (at the moment it is off unless you alter zsh.h).

Now the details.

Completion works on a multibyte string: that's tied to the way the main
shell works, because of the degree of interaction between them.(*) So
when we enter the completion section we take the variables zleline,
zlecs and zlell and turn them into metafied multibyte counterparts,
zlemetaline, zlemetacs and zlemetall.  The metafication exists already;
the new feature is the conversion from wide characters to multibyte
strings at the same time.  This necessitated different variables for the
line itself; it didn't do so for the cursor and length, but it seemed
neater to use different variables for those as well.  It's supposed to
be the case that only the metafied line is used from the main shell; the
interface with lex.c and hist.c is on the horrific side, but I don't
propose to play with that for the time being.
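
As a rough illustration of the conversion described above (this is not
the code in the patch), the sketch below turns a wide character line
into a metafied multibyte string and maps the cursor from a character
index to a byte offset.  The names wide_to_metafied(), needs_meta() and
META_BYTE are hypothetical, and the set of bytes that needs_meta()
escapes is only a stand-in for whatever the shell really treats as
special; the Meta-plus-XOR-32 encoding is assumed from the existing
metafication scheme.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Assumption: stand-in for the shell's Meta marker byte. */
#define META_BYTE 0x83

/* Hypothetical predicate: bytes that must be escaped in a metafied
 * string.  The real set is defined by the shell; this range is only
 * an illustrative guess. */
static int needs_meta(unsigned char c)
{
    return c == 0 || (c >= 0x83 && c <= 0xa2);
}

/*
 * Convert `len` wide characters to a newly allocated, NUL-terminated
 * metafied multibyte string.  `cs` is the cursor as a character index;
 * *metacs receives the matching byte offset and *metall the byte length.
 * Returns NULL on allocation or conversion failure.
 */
static char *wide_to_metafied(const wchar_t *line, size_t len, size_t cs,
                              size_t *metall, size_t *metacs)
{
    mbstate_t st;
    char buf[MB_LEN_MAX];
    char *out, *p;
    size_t i, j;

    assert(line != NULL && metall != NULL && metacs != NULL);
    memset(&st, 0, sizeof st);

    /* Worst case: every byte of every character gets a Meta prefix. */
    out = malloc(len * MB_LEN_MAX * 2 + 1);
    if (!out)
        return NULL;
    p = out;

    for (i = 0; i < len; i++) {
        size_t n;

        if (i == cs)
            *metacs = (size_t)(p - out);
        n = wcrtomb(buf, line[i], &st);
        if (n == (size_t)-1) {
            free(out);
            return NULL;
        }
        for (j = 0; j < n; j++) {
            unsigned char c = (unsigned char)buf[j];

            if (needs_meta(c)) {
                *p++ = (char)META_BYTE;
                *p++ = (char)(c ^ 32);
            } else {
                *p++ = (char)c;
            }
        }
    }
    if (cs >= len)
        *metacs = (size_t)(p - out);
    *p = '\0';
    *metall = (size_t)(p - out);
    return out;
}
```

The point of the sketch is that the cursor and length change meaning
across the conversion: they count characters on one side and bytes
(including any Meta prefixes) on the other.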

I had to catch various places (inevitably rather more places than I had
hoped) where metafy_line()/unmetafy_line() pairs were needed, for
example:  some occurrences of zrefresh(); the read-only uses of variables
BUFFER, LBUFFER, RBUFFER, CURSOR; zle -M and relatives.  If it's any
consolation, the lack of such pairs was already a bug in the existing
code and could have resulted in Meta characters popping up on the
command line.  With the new system, the likely result of a missed case
is a crash, since I deliberately set zleline or zlemetaline to NULL when
in the other state.
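
The "NULL when in the other state" discipline might look roughly like
the following.  This is a simplified sketch, not the patch itself:
struct line_state, to_meta(), from_meta() and dup_string() are
hypothetical names, both forms are plain char strings here, and the
conversion is a straight copy standing in for the real work.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the two forms of the line.  Exactly one of
 * the pointers is non-NULL at any time. */
struct line_state {
    char *line;      /* editing form; NULL while metafied */
    char *metaline;  /* metafied form; NULL while unmetafied */
};

/* Copy a string; stands in for the real conversion in either direction. */
static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);

    if (copy)
        memcpy(copy, s, n);
    return copy;
}

/* Switch to the metafied form and invalidate the editing form. */
static int to_meta(struct line_state *s)
{
    assert(s->line != NULL && s->metaline == NULL);
    s->metaline = dup_string(s->line);
    if (!s->metaline)
        return -1;
    free(s->line);
    s->line = NULL;   /* a stray use of the editing form now faults */
    return 0;
}

/* Switch back to the editing form and invalidate the metafied form. */
static int from_meta(struct line_state *s)
{
    assert(s->metaline != NULL && s->line == NULL);
    s->line = dup_string(s->metaline);
    if (!s->line)
        return -1;
    free(s->metaline);
    s->metaline = NULL;
    return 0;
}
```

Any code that touches the line while it is in the wrong state then
dereferences NULL at once, instead of quietly reading stale or
wrongly encoded text.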

Note the existence of METACHECK() and UNMETACHECK() macros which are
turned on with the DEBUG definition and report the file name and line number
if the metafication state is wrong.  (By the way, there's a good
argument for adopting the use of file name and line number in the
standard DPUTS definition.)  These can be added to any other places
where it seems appropriate.
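
For anyone wanting the general shape of such checks, here is a minimal
sketch.  It is not the definition in the patch: the flag
line_is_metafied, the counter metacheck_failures and the helper
report_meta_state() are hypothetical, and DEBUG is forced on here only
so that the example reports something when compiled on its own.

```c
#include <assert.h>
#include <stdio.h>

/* Normally set by the build; defined here only for the sketch. */
#ifndef DEBUG
# define DEBUG 1
#endif

/* Hypothetical state: nonzero while the line is in metafied form. */
static int line_is_metafied = 0;
/* Hypothetical counter so that callers (and tests) can see failures. */
static int metacheck_failures = 0;

/* Report a wrong metafication state with its source location. */
static void report_meta_state(const char *file, int line, const char *msg)
{
    assert(file != NULL && msg != NULL);
    metacheck_failures++;
    fprintf(stderr, "%s:%d: %s\n", file, line, msg);
}

#ifdef DEBUG
/* Complain unless the line is currently metafied. */
# define METACHECK() \
    do { \
        if (!line_is_metafied) \
            report_meta_state(__FILE__, __LINE__, "line not metafied"); \
    } while (0)
/* Complain unless the line is currently unmetafied. */
# define UNMETACHECK() \
    do { \
        if (line_is_metafied) \
            report_meta_state(__FILE__, __LINE__, "line metafied"); \
    } while (0)
#else
# define METACHECK()   do { } while (0)
# define UNMETACHECK() do { } while (0)
#endif
```

Because __FILE__ and __LINE__ expand at the point of use, the message
names the call site of the check rather than the helper function.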

One thing I haven't done is handle the mark variable properly.  This
should really be updated with metafy_line() and unmetafy_line(), too,
but it was too minor for me to worry about at this stage.

There is one additional fix, for the lprompt output in singlerefresh().
This is completely untested!  singlerefresh() isn't used much nowadays.
There is still no attempt to get character widths correct.  (That would
need either multibyte or wide character support in the prompt code.)

Indeed, I haven't yet tried any of this with ZLE_UNICODE_SUPPORT turned
on.  That's the next step: the first step is to ensure that the patch
works with the old system.  When that's done, we can be reasonably sure
that the conversion between zleline and zlemetaline is sound.  Then, I
hope, the basics of completion with ZLE_UNICODE_SUPPORT will work
without too much extra work, in that any text with only single-byte
characters will (in theory) be handled straight away.

The step after that will be to teach the metafied areas of completion,
as well as (ouch) the main shell about multibyte characters.  However,
at this point it should be possible to turn on ZLE_UNICODE_SUPPORT for
systems that have all the required support(**) as the functionality
should be no worse than what we have at present without
ZLE_UNICODE_SUPPORT.  This will make life easier since we will at least
be debugging a basically working system.

(*) It might theoretically be possible to move the interface so that
more of the completion code runs with wide characters; I have simply
intercepted the points where metafy_line currently runs (or should run).
However, that's substantially more work.  As I said before, changing the
main shell to use wide chars throughout isn't an option: we don't even
know that the entire byte stream input maps one-to-one to characters at
all (indeed, with tokens in it we know for sure it doesn't); the only
safe assumption is that a 7-bit subset of it contains characters we can
interpret.  Then we provide additional facilities (all TBD) when the
user knows a certain chunk is multibyte characters.

(**) Once we turn on ZLE_UNICODE_SUPPORT by default for systems where it
works, we can start thinking about relaxing some of the assumptions
underlying it since, as Oliver pointed out, they shouldn't all be
necessary.  However, we'll probably have to do this system by system to
find out which assumptions we can safely relax.  One interesting case is
Solaris 8 which currently doesn't meet all the tests but should
nonetheless have all the features we actually require.

-- 
Peter Stephenson <pws@xxxxxxx>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070




