Zsh Mailing List Archive

Re: file globbing



2021-08-05 09:05:19 -0700, Ray Andrews:
> On 2021-08-05 8:36 a.m., Peter Stephenson wrote:
> > > On 05 August 2021 at 16:27 Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx> wrote:
> > > 
> > d,1^[[:digit:]]*
> > 
> > 
> Cool.  I thought zsh always used ' [:digit:]' not the double bracket form. 
[...]

I would avoid [[:digit:]] in zsh globs / patterns especially for
input validation.

[:digit:] within a bracket expression is a POSIX character
class; it is a POSIX invention. It is recognised, but within
bracket expressions only, by anything that is specified by POSIX
and uses shell filename patterns or regular expressions (basic
or extended), such as sh (for globs or case constructs), find
(for -name/-path matching) or grep/sed/ed...

[X[:digit:]] would match on any character that is either X or
any character classified as decimal digit in the locale.

What that matches in practice depends on the system and locale. 
In 2016, someone pointed out to POSIX that isdigit() in the C
standard was not locale dependent and matched on 0123456789 only
(https://www.austingroupbugs.net/view.php?id=1078), so, to align
with that, future versions of the standard will restrict
[:digit:] to match on 0123456789 only and will forbid it from
matching on any other decimal digits. I wouldn't be surprised if
that's later reverted again though, as it's quite unintuitive /
inconsistent.

Still, there are systems where iswdigit() matches on a lot more
than 0123456789 in some locales, and as a consequence, the
[[:digit:]] of zsh globs and most other tools will too. For
instance, on FreeBSD 12.2 and in an en_US.UTF-8 locale, [[:digit:]]
matches on
0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९০১২৩৪৫৬৭৮৯੦੧੨੩੪੫੬੭੮੯૦૧૨૩૪૫૬૭૮૯୦୧୨୩୪୫୬୭୮୯௦௧௨௩௪௫௬௭௮௯౦౧౨౩౪౫౬౭౮౯೦೧೨೩೪೫೬೭೮೯൦൧൨൩൪൫൬൭൮൯෦෧෨෩෪෫෬෭෮෯๐๑๒๓๔๕๖๗๘๙໐໑໒໓໔໕໖໗໘໙༠༡༢༣༤༥༦༧༨༩၀၁၂၃၄၅၆၇၈၉႐႑႒႓႔႕႖႗႘႙០១២៣៤៥៦៧៨៩᠐᠑᠒᠓᠔᠕᠖᠗᠘᠙᥆᥇᥈᥉᥊᥋᥌᥍᥎᥏᧐᧑᧒᧓᧔᧕᧖᧗᧘᧙᪀᪁᪂᪃᪄᪅᪆᪇᪈᪉᪐᪑᪒᪓᪔᪕᪖᪗᪘᪙᭐᭑᭒᭓᭔᭕᭖᭗᭘᭙᮰᮱᮲᮳᮴᮵᮶᮷᮸᮹᱀᱁᱂᱃᱄᱅᱆᱇᱈᱉᱐᱑᱒᱓᱔᱕᱖᱗᱘᱙꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩꣐꣑꣒꣓꣔꣕꣖꣗꣘꣙꤀꤁꤂꤃꤄꤅꤆꤇꤈꤉꧐꧑꧒꧓꧔꧕꧖꧗꧘꧙꧰꧱꧲꧳꧴꧵꧶꧷꧸꧹꩐꩑꩒꩓꩔꩕꩖꩗꩘꩙꯰꯱꯲꯳꯴꯵꯶꯷꯸꯹0123456789𐒠𐒡𐒢𐒣𐒤𐒥𐒦𐒧𐒨𐒩𐴰𐴱𐴲𐴳𐴴𐴵𐴶𐴷𐴸𐴹𑁦𑁧𑁨𑁩𑁪𑁫𑁬𑁭𑁮𑁯𑃰𑃱𑃲𑃳𑃴𑃵𑃶𑃷𑃸𑃹𑄶𑄷𑄸𑄹𑄺𑄻𑄼𑄽𑄾𑄿𑇐𑇑𑇒𑇓𑇔𑇕𑇖𑇗𑇘𑇙𑋰𑋱𑋲𑋳𑋴𑋵𑋶𑋷𑋸𑋹𑑐𑑑𑑒𑑓𑑔𑑕𑑖𑑗𑑘𑑙𑓐𑓑𑓒𑓓𑓔𑓕𑓖𑓗𑓘𑓙𑙐𑙑𑙒𑙓𑙔𑙕𑙖𑙗𑙘𑙙𑛀𑛁𑛂𑛃𑛄𑛅𑛆𑛇𑛈𑛉𑜰𑜱𑜲𑜳𑜴𑜵𑜶𑜷𑜸𑜹𑣠𑣡𑣢𑣣𑣤𑣥𑣦𑣧𑣨𑣩𑱐𑱑𑱒𑱓𑱔𑱕𑱖𑱗𑱘𑱙𑵐𑵑𑵒𑵓𑵔𑵕𑵖𑵗𑵘𑵙𑶠𑶡𑶢𑶣𑶤𑶥𑶦𑶧𑶨𑶩𖩠𖩡𖩢𖩣𖩤𖩥𖩦𖩧𖩨𖩩𖭐𖭑𖭒𖭓𖭔𖭕𖭖𖭗𖭘𖭙𝟎𝟏𝟐𝟑𝟒𝟓𝟔𝟕𝟖𝟗𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡𝟢𝟣𝟤𝟥𝟦𝟧𝟨𝟩𝟪𝟫𝟬𝟭𝟮𝟯𝟰𝟱𝟲𝟳𝟴𝟵𝟶𝟷𝟸𝟹𝟺𝟻𝟼𝟽𝟾𝟿𞥐𞥑𞥒𞥓𞥔𞥕𞥖𞥗𞥘𞥙

Those are all decimal digits: some are variations on the
0123456789 Arabic ones, others are decimal digits in other
scripts.
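
To see what a given system does, a rough probe along these lines
can help. It is only a sketch: classify_digit is a made-up helper
name, the non-ASCII test character is written literally (so the
script has to be saved as UTF-8), and whether the [[:digit:]]
branch is taken for such a character depends on the shell, libc
and locale, so no output is promised for that last call.

```shell
# classify_digit CHAR  (hypothetical helper, not part of any shell)
# Reports whether CHAR matches the explicit list, only the
# [[:digit:]] class, or neither. Patterns are tried in order.
classify_digit() {
  case $1 in
    [0123456789]) echo "explicit-list" ;;
    [[:digit:]])  echo "digit-class-only" ;;
    *)            echo "neither" ;;
  esac
}

classify_digit 7    # explicit-list
classify_digit x    # neither
classify_digit '٣'  # depends on the system and locale
```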

[0-9] itself, in general, is even worse. No two systems,
utilities or library functions (or versions thereof) agree on
which characters are ranked between 0 and 9. It could even match
on sequences of characters (collating elements).

That's not the case for zsh globs though, where [0-9] only matches
on 0123456789, as ranges in zsh are based on the wide char value
of the characters (or byte value if the multibyte option is
off), and for those 0123456789 characters specifically, in
practice, the wide char values are consecutive and in that order
regardless of the locale and system.

Beware though that it only applies to zsh globs. It doesn't
apply to [0-9] in regexps which use the system's extended
regexps matching functions (or pcre with the rematchpcre
option; see also \d there).

The only thing guaranteed to match only 0123456789 regardless of
locale and system is [0123456789], do not use [[:digit:]] or \d
for that. In zsh, you can use [0-9] but only with globs.

[[ $d = [0-9] ]] && echo is one of 0123456789

is correct (in zsh, not in bash / ksh93)

[[ $d =~ '^[0-9]$' ]] && echo is one of 0123456789

is not (at least on some systems/locales).

With set -o rematchpcre

[[ $d =~ '^[0-9]\Z' ]] && echo is one of 0123456789

should be OK (so would the same with \d, though I wouldn't trust
it as it could vary with the version and what flags are passed
to the matcher as \d can be told to match other digits under
some circumstances).
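
For code that also has to run in shells other than zsh, the
explicit [0123456789] list can be used in a standard sh case
construct, which avoids the regexp engines altogether. A minimal
sketch, where is_ascii_uint is a hypothetical helper name and the
input is assumed to be text, checking that a whole string is made
only of 0123456789:

```shell
# is_ascii_uint STRING  (hypothetical helper)
# Succeeds only if STRING is non-empty and consists solely of the
# characters 0123456789, spelled out explicitly rather than given
# as a range or a character class.
is_ascii_uint() {
  case $1 in
    '' | *[!0123456789]*) return 1 ;;  # empty, or some other char
    *) return 0 ;;
  esac
}

is_ascii_uint 2021 && echo "2021: ok"
is_ascii_uint 12a  || echo "12a: rejected"
is_ascii_uint ''   || echo "empty: rejected"
```

The pattern is written as a rejection test (any character
outside the list) because plain sh patterns have no way to say
"one or more of".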

Also beware that regexp matching doesn't work properly on non-text.

See also
https://www.mail-archive.com/bug-bash@xxxxxxx/msg25885.html for
a glimpse at the (even messier) situation in the bash shell.

-- 
Stephane



