Zsh Mailing List Archive
Messages sorted by:
Reverse Date,
Date,
Thread,
Author
Re: PATCH: (large) initial support for combining characters in ZLE.
- X-seq: zsh-workers 24837
- From: Peter Stephenson <pws@xxxxxxx>
- To: zsh-workers@xxxxxxxxxx
- Subject: Re: PATCH: (large) initial support for combining characters in ZLE.
- Date: Fri, 18 Apr 2008 10:40:16 +0100
- In-reply-to: <9F0DCF1B-F5FB-4150-A4FF-C441DE615404@xxxxxxxxxxxxxxxxx>
- Mailing-list: contact zsh-workers-help@xxxxxxxxxx; run by ezmlm
- Organization: CSR
- References: <20080413175442.0e95a241@pws-pc> <9F0DCF1B-F5FB-4150-A4FF-C441DE615404@xxxxxxxxxxxxxxxxx>
On Fri, 18 Apr 2008 03:33:36 +0900
"Jun T." <takimoto-j@xxxxxxxxxxxxxxxxx> wrote:
> At 17:54 +0100 08.4.13, Peter Stephenson wrote:
> >the base character must be an alphanumeric (and
> >I'm not sure about the numeric, I need to find a better definition), and
>
> I think this is too restrictive, because in some Asian languages
> (Japanese, Korean, Thai, etc.) the base character can be non-alphabetic.
> For example, in Japanese, Hiragana/Katakana can be combined with
> U+3099 (VOICED SOUND MARK) or U+309A (SEMI-VOICED SOUND MARK).
> Example: U+3057 U+3099 = "じ"
> the base character U+3057 = "し" is not an alphanumeric.
It's treated as alphanumeric here, but what you say doesn't surprise me. I
think we can widen it without problems to anything that isn't a whitespace
or a control character, at least, so iswgraph() might be the thing. We
definitely need to avoid special whitespace characters (tabs, form feeds,
newlines, carriage returns) since we don't know what's going to happen.
Also, as far as I can see marking a character as "control" is a signal not
to print it directly.
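
As a minimal sketch of the widened base-character test described above (the
helper name is hypothetical and this is not the code in the patch), the check
reduces to a single iswgraph() call. The result depends on the current
LC_CTYPE locale, so the caller is assumed to have called setlocale() already.

```c
#include <stdbool.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper, not taken from the zsh source.
 * Accept any graphic character as a possible base character.
 * iswgraph() is false for the space character, for special whitespace
 * such as tab and newline, and for control characters.
 */
static bool is_base_candidate(wint_t wc)
{
    return iswgraph(wc) != 0;
}
```

In a Japanese UTF-8 locale one would expect this to accept U+3057 as well,
but that depends on the locale data installed on the system.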
> >the zero-width characters afterwards (I haven't imposed a limit on how
> >many there are) must be punctuation.
>
> I guess this is also too restrictive. I have run code like the
> following on Fedora 7:
>
>   wchar_t w;
>   setlocale(LC_ALL,"");
>   for(w=1; w<0x2ffff; ++w) {
>     if(wcwidth(w)==0 && iswpunct(w)==0) {
>       printf("%05x: %lc\n",w,w);
>     }
>   }
>
> It listed 166 characters, all of which seem to be combining chars in
> Thai or Korean (U+0e4e and U+1160 may not be combining, I'm not sure).
Looking for a graphic zero-width character is probably good enough.
There may not be any control characters with zero width, anyway.
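
A rough sketch of that rule, with hypothetical helper names that are not
taken from the patch: a combining candidate is a graphic character for which
wcwidth() returns 0, and any number of them may follow a graphic base
character. The _XOPEN_SOURCE define is an assumption about what glibc needs
in order to declare wcwidth(); results depend on the LC_CTYPE locale in
effect.

```c
#define _XOPEN_SOURCE 700   /* assumed necessary for wcwidth() on glibc */
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical: zero-width and graphic, so control characters are excluded. */
static bool is_combining_candidate(wchar_t wc)
{
    return wc != L'\0' && wcwidth(wc) == 0 && iswgraph((wint_t)wc) != 0;
}

/*
 * Hypothetical: count the zero-width graphic characters that follow s[0].
 * Returns 0 if s[0] is not a graphic character. No upper limit is imposed
 * on the length of the run; the terminating NUL ends it.
 */
static size_t combining_run_length(const wchar_t *s)
{
    size_t n = 0;

    if (s[0] == L'\0' || !iswgraph((wint_t)s[0]))
        return 0;
    while (is_combining_candidate(s[1 + n]))
        n++;
    return n;
}
```
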
--
Peter Stephenson <pws@xxxxxxx> Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK Tel: +44 (0)1223 692070