Zsh Mailing List Archive

Re: num_in_chars incremented after each mbrtowc()

X-seq: zsh-workers 37330
From: Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx>
To: Sebastian Gniazdowski <sgniazdowski@xxxxxxxxx>
Subject: Re: num_in_chars incremented after each mbrtowc()
Date: Sun, 6 Dec 2015 17:33:55 +0000
Cc: Zsh hackers list <zsh-workers@xxxxxxx>
In-reply-to: <CAKc7PVDCiREGLtGP+SQbpoG74bBFtdn-EnSo31dYwr7xMNx+og@mail.gmail.com>
References: <CAKc7PVCmApbPrfcArqxJcDX0nHN9NmuE7zqcrSZb51bSfGyuuQ@mail.gmail.com> <20151206154956.104b10c6@ntlworld.com> <CAKc7PVDCiREGLtGP+SQbpoG74bBFtdn-EnSo31dYwr7xMNx+og@mail.gmail.com>

On Sun, 6 Dec 2015 18:03:08 +0100, Sebastian Gniazdowski wrote:
> It should be:
>     return num + num_in_char > 0 ? 1 : 0;

OK, here's my full explanation of what this function actually does; it's
not what you say it does, but that doesn't mean I've got it right, of
course.  However, the real question is the one in the previous message,
about what API you actually need for what you're doing.

num is the "real" answer for real characters.  It counts:

- 1 for MB_INVALID: we consume only the first octet from the input
string, then carry on down the input string for more.  We assume that
octet will be represented as a single-width character.

- 1 or the width for a valid character, depending on what the caller
requested.

Under these circumstances num_in_char is irrelevant.

num_in_char is only useful if we get some number of bytes for
MB_INCOMPLETE.  That can only happen at the very end of the string, with
no more input to follow, since otherwise we would get MB_INVALID, not
MB_INCOMPLETE.  In
this case, since we're never going to produce anything else, we *assume*
(and this assumption may be wrong) that the right way to deal with it is
as individual octets.  As there is no standard (for obvious reasons) as
to how to deal with incomplete characters, this may not be the most
convenient answer in practice.
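
To make the description above concrete, here is a minimal sketch of the
counting, not the zsh source.  It assumes UTF-8 input and uses a small
hand-rolled classifier as a stand-in for mbrtowc(), so that it doesn't
depend on the locale; it also counts 1 per valid character and ignores
widths.  The names utf8_seq, count_chars, SEQ_INVALID and SEQ_INCOMPLETE
are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the MB_INVALID / MB_INCOMPLETE statuses. */
enum { SEQ_INVALID = -1, SEQ_INCOMPLETE = -2 };

/*
 * Classify the UTF-8 sequence starting at s, with n > 0 bytes remaining.
 * Returns its length in bytes if complete, SEQ_INCOMPLETE if the input
 * ends part-way through it, SEQ_INVALID otherwise.  Simplified: it does
 * not reject overlong encodings or surrogates.
 */
static int utf8_seq(const unsigned char *s, size_t n)
{
    size_t need, i;

    (void)n;
    if (s[0] < 0x80)
        return 1;
    else if ((s[0] & 0xE0) == 0xC0)
        need = 2;
    else if ((s[0] & 0xF0) == 0xE0)
        need = 3;
    else if ((s[0] & 0xF8) == 0xF0)
        need = 4;
    else
        return SEQ_INVALID;

    for (i = 1; i < need; i++) {
        if (i == n)
            return SEQ_INCOMPLETE;      /* ran out of input mid-character */
        if ((s[i] & 0xC0) != 0x80)
            return SEQ_INVALID;         /* not a continuation byte */
    }
    return (int)need;
}

/*
 * Count characters in str[0..len).  If one_for_incomplete is zero, a
 * trailing incomplete sequence counts as individual octets; otherwise
 * it counts as a single character.
 */
static size_t count_chars(const char *str, size_t len, int one_for_incomplete)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t num = 0, num_in_char = 0;

    assert(str != NULL);
    while (len > 0) {
        int r = utf8_seq(s, len);

        if (r == SEQ_INCOMPLETE) {
            /* Only possible at the end: every remaining byte is part of it. */
            num_in_char = len;
            break;
        }
        if (r == SEQ_INVALID)
            r = 1;                      /* consume one octet, count it as 1 */
        num++;
        s += r;
        len -= (size_t)r;
    }
    if (one_for_incomplete)
        return num + (num_in_char > 0 ? 1 : 0);
    return num + num_in_char;
}
```

With this sketch, the four bytes "ab\xe2\x82" (two ASCII characters
followed by a truncated three-byte sequence) count as 4 when the tail is
treated as octets and as 3 when it is treated as one character.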

If you think there is a logical error in the above, please state it.

Are you saying, for example, that a trailing set of characters that are
MB_INCOMPLETE should appear as a single output (albeit invalid) character
(I guess with a single width)?  That would mean the right return value was

     return num + (num_in_char > 0 ? 1 : 0);

(perhaps that was even what you meant above?)
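
The parentheses matter because in C both + and > bind more tightly than
?:, so the unparenthesised form collapses the whole sum to 0 or 1.  A
small illustration, with hypothetical function names and int arguments
chosen purely for the example:

```c
/* As quoted: parses as ((num + num_in_char) > 0) ? 1 : 0. */
static int ret_as_quoted(int num, int num_in_char)
{
    return num + num_in_char > 0 ? 1 : 0;
}

/* With parentheses: num, plus 1 if there were trailing incomplete bytes. */
static int ret_parenthesised(int num, int num_in_char)
{
    return num + (num_in_char > 0 ? 1 : 0);
}
```

For num = 3 and num_in_char = 2 the first gives 1 and the second gives 4.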

pws

Follow-Ups:
- Re: num_in_chars incremented after each mbrtowc()
  - From: Peter Stephenson

References:
- num_in_chars incremented after each mbrtowc()
  - From: Sebastian Gniazdowski
- Re: num_in_chars incremented after each mbrtowc()
  - From: Peter Stephenson
- Re: num_in_chars incremented after each mbrtowc()
  - From: Sebastian Gniazdowski
