Zsh Mailing List Archive
Pattern changes, part 2
- X-seq: zsh-workers 20500
- From: Peter Stephenson <pws@xxxxxxx>
- To: zsh-workers@xxxxxxxxxx (Zsh hackers list)
- Subject: Pattern changes, part 2
- Date: Mon, 18 Oct 2004 12:47:02 +0100
- Mailing-list: contact zsh-workers-help@xxxxxxxxxx; run by ezmlm
I did some more work on pattern matching over the weekend. The main
idea is to make it easier to handle multibyte characters by using
the normal string representation whenever convenient. All tests still
pass.
- The test string is now unmetafied for comparing against the pattern.
Literal strings in the pattern are also unmetafied. I've turned
the METAINCs in the pattern matcher into CHARINCs where appropriate;
this is currently a trivial increment but is a placeholder to say "go
to next character". (There is no change in places where the string
remains metafied; these will still need more thought.) The new code
should be significantly more efficient during pattern matching,
since it doesn't have to test for Meta characters in many
places, although I haven't benchmarked it.
- Character sets [...] are still metafied; we need the special
characters to indicate ranges and Posix ctype names.
- Pure strings are still metafied. (These are signalled by a special
flag indicating the value stored is a string rather than the normal
pattern programme.) It became clear that changing this would be
inefficient, particularly in globbing where we use the result of the
pattern matcher to add to the (metafied) path buffer. There are
actually two cases:
o We can spot immediately that the string doesn't have special
characters. This is the normal case and is handled fairly
efficiently.
o There are special characters around but nonetheless the string is a
pure string. There is one case where we need to handle this
properly, which is when the string in question is ".." or ".", since
those are never matched by globbing. An example where this could
occur would be a path segment (#i).. with extended globbing. Here,
we only find out we have a pure string after unmetafying into the
pattern programme, so we need to metafy again. This isn't so hot,
but it's actually a rare corner case.
- The interface used by parameter substitutions has been tidied up.
o The call patmatchlen() gets the length of the match, so that nothing
outside pattern.c needs pointers into the test string. This was
necessary since the strings may now be reallocated, but is neater
anyway. (This is the metafied length, which is what the parameter
code needs --- and this will probably continue; I don't think
there's a case for unmetafying there. There is some minor
inefficiency in counting metafiable characters in the matched part
of the trial string.)
o The horrible global patoffset has disappeared. Now the offset to
be added to indices into parameters is passed as an argument. I
should have done it this way all along.
- Minor fix for numeric ranges: <num-> will now match any integer that
is too large to represent in the internal integer type. This has
worked for <-> for some time, but it wasn't special-cased if there was a
lower bound.
I will commit this directly (with a ChangeLog entry, this time).
By the way, we really need a lot more tests which require the use of the
Meta character, and not just for pattern matching. Adding this while
the character representation is in flux is probably not particularly
useful, however.
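
For readers unfamiliar with the Meta character referred to throughout,
here is a minimal sketch of the representation, under stated
assumptions: the Meta byte is taken to be 0x83 and an escaped byte is
stored as Meta followed by the byte XORed with 32. The predicate
sk_needs_meta is a simplified, hypothetical stand-in for the shell's
real test of which bytes need escaping, and all sk_ names are
illustrative only. sk_metalen shows the kind of counting needed to get
the metafied length of an unmetafied span, as mentioned for
patmatchlen() above.

```c
#include <assert.h>
#include <stddef.h>

#define SK_META 0x83	/* assumed value of the Meta byte */

/* Hypothetical, simplified predicate: does this byte need escaping? */
static int sk_needs_meta(unsigned char c)
{
    return c == 0 || (c >= 0x83 && c <= 0xa2);
}

/*
 * Metafy len raw bytes from src into dst and NUL-terminate.
 * dst must have room for 2 * len + 1 bytes.
 * Returns the metafied length.
 */
static size_t sk_metafy(const unsigned char *src, size_t len,
			unsigned char *dst)
{
    size_t i, n = 0;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < len; i++) {
	if (sk_needs_meta(src[i])) {
	    dst[n++] = SK_META;
	    dst[n++] = src[i] ^ 32;
	} else
	    dst[n++] = src[i];
    }
    dst[n] = '\0';
    return n;
}

/* Unmetafy a NUL-terminated metafied string in place; return raw length. */
static size_t sk_unmetafy(unsigned char *s)
{
    unsigned char *r = s, *w = s;

    assert(s != NULL);
    while (*r) {
	if (*r == SK_META && r[1]) {
	    *w++ = r[1] ^ 32;
	    r += 2;
	} else
	    *w++ = *r++;
    }
    *w = '\0';
    return (size_t)(w - s);
}

/* Metafied length of a raw span, without building the metafied copy. */
static size_t sk_metalen(const unsigned char *raw, size_t len)
{
    size_t i, n = len;

    assert(raw != NULL);
    for (i = 0; i < len; i++)
	if (sk_needs_meta(raw[i]))
	    n++;
    return n;
}
```

Because a raw string may contain NUL bytes, the unmetafied form has to
carry an explicit length, whereas the metafied form can be treated as an
ordinary NUL-terminated C string.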
--
Peter Stephenson <pws@xxxxxxx> Software Engineer
CSR Ltd., Science Park, Milton Road,
Cambridge, CB4 0WH, UK Tel: +44 (0)1223 692070