Zsh Mailing List Archive

Re: PATCH: PCRE support for embedded NUL characters



On Mon, 17 Sep 2012 15:04:23 -0400
Phil Pennock <zsh-workers+phil.pennock@xxxxxxxxxxxx> wrote:
> Yeah, but correlating offsets in unmetafied strings to the metafied
> strings for then counting is non-trivial (or so it seems to me).

It's not so difficult: we already do most of this conversion for other,
similar cases of pattern matching, where we need to convert offsets in
octets to offsets in characters.  The only difference here is the
metafication, which just means the loop over the characters is slightly
different.  If anything, it's marginally easier, since metafication is a
pure zsh invention.
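
A minimal sketch of the kind of loop meant here, not the shell's actual
code.  It assumes the usual zsh encoding, in which a metafied byte is
stored as the Meta byte (0x83) followed by the original byte XOR 32.  The
function name unmeta_to_meta_offset() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: zsh's Meta byte; the byte after it is the original XOR 32. */
#define META 0x83

/*
 * Map a byte offset in the unmetafied string (as reported by the regex
 * library) to the corresponding byte offset in the metafied string.
 * Each Meta pair in the metafied string stands for one unmetafied byte.
 * Offsets past the end are clamped to the metafied length.
 */
size_t unmeta_to_meta_offset(const char *meta, size_t unmeta_off)
{
    const unsigned char *start = (const unsigned char *)meta;
    const unsigned char *p = start;
    size_t seen = 0;   /* unmetafied bytes walked so far */

    assert(meta != NULL);
    while (seen < unmeta_off && *p) {
        if (*p == META && p[1])
            p++;       /* skip the Meta marker itself */
        p++;           /* skip the (possibly XORed) payload byte */
        seen++;
    }
    return (size_t)(p - start);
}
```

For the metafied form of "a\0b", which is the four bytes 'a', 0x83, 0x20,
'b', unmetafied offset 2 maps to metafied offset 3.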

> And wcwidth() tells how many display cells are needed for a given
> character, assuming a monospace layout.  For this, instead, mblen() is
> needed, on a character-by-character basis.  Given that mblen() is C99, I
> opted to avoid it, and implement this just for UTF-8 with bit-pattern
> examination to quickly count past characters.  We only initialise PCRE
> for wide characters with UTF-8.  I've no idea how much effort we want to
> put into supporting non-UTF-8 wide-character PCRE across multiple OSes.

Doing it just for UTF-8 is incompatible with the rest of the shell.  It
should be possible to do it similarly to mb_metastrlen() in utils.c.
The only differences are that it would use an explicit length rather than
NUL termination, and that it would have no internal test for Meta
characters.
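
As a rough illustration of that shape (a sketch, not a copy of
mb_metastrlen()), the following counts characters in a buffer of known
length using the locale-aware mbrtowc() rather than UTF-8 bit patterns.
The name mb_count_chars() is hypothetical, and treating each invalid or
incomplete byte as one character is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Count the characters in the first len bytes of s according to the
 * current locale.  The buffer may contain embedded NUL bytes and need
 * not be NUL-terminated; there is no Meta handling.
 */
size_t mb_count_chars(const char *s, size_t len)
{
    mbstate_t st;
    size_t count = 0;

    assert(s != NULL || len == 0);
    memset(&st, 0, sizeof st);
    while (len > 0) {
        size_t r = mbrtowc(NULL, s, len, &st);

        if (r == (size_t)-1 || r == (size_t)-2) {
            /* Invalid or truncated sequence: count one byte as one
             * character and restart from the initial shift state. */
            memset(&st, 0, sizeof st);
            r = 1;
        } else if (r == 0) {
            /* Embedded NUL: mbrtowc() reports 0; this sketch assumes
             * the NUL occupies a single byte. */
            r = 1;
        }
        s += r;
        len -= r;
        count++;
    }
    return count;
}
```

The caller is expected to have set the locale already; in a UTF-8 locale
a two-byte sequence such as "\xc3\xa9" would count as one character,
while in the default "C" locale every byte counts separately.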

-- 
Peter Stephenson <pws@xxxxxxx>            Software Engineer
Tel: +44 (0)1223 692070                   Cambridge Silicon Radio Limited
Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, UK




