Zsh Mailing List Archive
Re: read -d $'\200' doesn't work with set +o multibyte (and [PATCH])
> 2022/12/16 17:29, Oliver Kiddle <opk@xxxxxxx> wrote:
>
>>> + read -ed $'\xc2'
>>> +0:read delimited by a single byte terminates if the byte is part of a multibyte character
>>> +<one£two
>>> +>one
>>
>> Is this really what the standard requires (or will require)?
>> Breaking in the middle of a valid multibyte character looks
>> rather odd to me.
>
> The proposed standard wording appears to only talk about the case of the
> delimiter consisting of "one single-byte character". $'\xc2' is not a
> valid UTF-8 character so my interpretation is that they are leaving this
> undefined.
I thought the "one single-byte character" etc. applies only when the C or
POSIX locale is in use.
> Behaviour that treats the input as raw bytes for a raw byte delimiter
> is consistent. This retains compatibility with the way things
> work for a non-multibyte locale. Not all files are valid UTF-8 and it
> can be useful to force things to work at a raw byte level.
I was thinking it would be enough if we can do 'byte-by-byte' analysis by
using the C/POSIX locale (or by turning the MULTIBYTE option off).
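To make the byte-level view concrete, here is a minimal sketch (not part of the patch, and not using zsh's read at all). It assumes the test input "one£two" is UTF-8 encoded, so that £ is the two bytes 0xC2 0xA3, and it uses plain POSIX printf, od, tr and head under LC_ALL=C to show where a split on the single byte 0xC2 would fall. The helper name input_bytes is only for illustration.

```shell
# Emit the test input "one£two" as explicit UTF-8 bytes
# (octal 302 243 = hex C2 A3), independent of this file's encoding.
input_bytes() { printf 'one\302\243two'; }

# Dump the bytes in hex; 0xC2 appears as the lead byte of the pound sign.
input_bytes | od -An -tx1

# Byte-wise split under the C locale: turn every 0xC2 byte into a
# newline and keep the first "record".
first=$(input_bytes | LC_ALL=C tr '\302' '\n' | head -n 1)
printf '%s\n' "$first"    # prints: one
```

This only shows what a raw-byte interpretation of the delimiter gives for this input; whether read should behave this way outside the C/POSIX locale is exactly the open question.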
On the web page Stephane mentioned:
https://austingroupbugs.net/view.php?id=243#c6091
"When the current locale is not the C or POSIX locale, pathnames can contain bytes that do not form part of a valid character, and therefore portable applications need to ensure that the current locale is the C or POSIX locale when using read with arbitrary pathnames as input."
But I'm not familiar with this type of document.