Zsh Mailing List Archive
Messages sorted by:
Reverse Date,
Date,
Thread,
Author
Re: utf-8
18.12.2014, 20:38, "Ray Andrews" <rayandrews@xxxxxxxxxxx>:
> On 12/18/2014 01:25 AM, Peter Stephenson wrote:
>
> Mikael, Peter:
>> Chapter 5 of the FAQ is the best place to start. You can see this
>> online at http://zsh.sourceforge.net/FAQ/zshfaq05.html#l52. The
>> version in Etc of the source is newer but I don't think there are
>> significant differences. pws
>
> Very nicely written. That's exactly what I wanted to learn. And though I
> knew it previously, I had semi-forgotten the difference between Unicode and
> UTF-8, which led to the fuzzy question. To ask it again more accurately,
> where are extended Unicode characters permitted? Or perhaps that's better
> reversed: where are they *not* permitted? Can a variable have a name beyond
> ASCII? I see that zsh is transparent to UTF-8 everywhere, but that does not
> imply that one has use of the entire Unicode charset in all situations.
They are permitted at least in variable and function names. I cannot find anything relevant in the manual, but the code that implements the `isident` function, which is used to check variable names (not function names; I do not know that part), indirectly uses the library function `iswalnum`, which in turn knows about Unicode character classes (depending on LC_CTYPE).
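One way to check this in a running shell is a small probe. This is only a sketch: `try_name` is a hypothetical helper, not part of zsh, and the result for the non-ASCII name depends on the locale (LC_CTYPE), on multibyte support being enabled, and, as far as I can tell, on the POSIX_IDENTIFIERS option being unset.

```shell
# try_name is a hypothetical helper, not part of zsh.
# It reports whether the running shell parses "$1=1" as an assignment.
try_name() {
  # Run in a subshell so a successful assignment does not leak out.
  # If the name is not a valid identifier, the shell treats "name=1"
  # as a command word, fails to find it, and returns non-zero.
  if ( eval "$1=1" ) 2>/dev/null; then
    echo "accepted: $1"
  else
    echo "rejected: $1"
  fi
}

try_name ascii_name    # plain ASCII identifier
try_name größe         # non-ASCII letters; result is locale-dependent
try_name 'bad-name'    # "-" is not alphanumeric in any locale
```

Running it once under a UTF-8 locale and once with LC_ALL=C should show whether the second name is treated as an identifier in each case.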
AFAIK a function name can be anything that is not parsed as something else. The following definition works:
'()' () {
    echo Test
}
\(\)
# Outputs Test.
More:
$PATH () {
    echo Test
}
/home/zyx/.gem/ruby/1.9.1/bin:<skip>:/opt/ekopath/bin
# Outputs Test as well.
It looks like the zsh code was intentionally modified to use `iswalnum` in `itype_end`, which is called from `isident`. It also appears that UTF-8 characters in IFS are recognized: `itype_end` handles them as well, and I do not think such handling was added without a reason. Everything is locale-bound in any case, because libc functions are used rather than something like ICU.
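The IFS behaviour can be probed in a similar way. The following is a minimal sketch, not a statement of what zsh guarantees: `split_first` is a hypothetical helper, and the outcome for the multibyte separator depends on the locale and on the shell being multibyte-aware.

```shell
# split_first is a hypothetical helper, not part of zsh.
# It reads one line with IFS set to the given separator and prints
# the first field and the remainder, each in brackets.
split_first() {
  printf '%s\n' "$2" | {
    IFS=$1 read -r first rest
    printf '[%s] [%s]\n' "$first" "$rest"
  }
}

split_first ':' 'alpha:beta:gamma'   # ASCII separator, control case
split_first '·' 'alpha·beta·gamma'   # multibyte separator (U+00B7)
```

If the separator is handled as one character, the second call should split the same way as the first; a shell that works byte by byte would leave stray bytes in the remainder.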