On Jun 1, 2014, at 11:25 AM, Daniel Shahaf <d.s@xxxxxxxxxxxxxxxxxx> wrote:

> Bart Schaefer wrote on Sat, May 31, 2014 at 14:29:26 -0700:
>> On May 31, 8:16pm, Peter Stephenson wrote:
>> }
>> } I'm currently wondering if there is scope for normalising keyboard input
>> } really early --- before we feed it back to the shell --- and turning it
>> } back into the usual keyboard form right at the end
>>
>> Per thread with Chet, I think normalizing the filesystem is the easier
>> way to go. Keyboard input is already as close to normalized as it needs
>> to be, I think, and with only a couple of exceptions all the names we
>> get from the filesystem come through zreaddir().
>
> What about, say, people doing 'ls' and copy-pasting a filename from the
> output into a command line? Wouldn't that result in NFD keyboard
> input?
>
> FWIW, while OS X always returns NFD filenames, one could also imagine an
> OS that is normalization-aware (forbids creating a file if its
> normalized name is the same as the normalized name of an existing file)
> but octet-sequence-preserving, and on such an OS both the readdir()
> output and the user input would need to be normalized.
>
> Also, other unixes allow you to have both the NFC-form and NFD-form in
> the same directory, e.g., 'touch fooá fooá' works just fine on linux
> ext4 (the first filename is composed, the second decomposed); in such
> cases normalization magic should not be done.
>
> Fun! :-)
>
> Daniel

Fortunately, I think Mac OS X can handle input in decomposed or composed form.
Here's some code I tested:

================ hangul.c =========================
#include <stdio.h>
#include <dirent.h>

int main(void)
{
    const char *fname = "한글/가나다";   /* composed in the source file */
    const char *dirname = "한글";
    DIR *dirp = opendir(dirname);
    struct dirent *direntry = NULL;
    FILE *fp = fopen(fname, "r");
    char buf[512];
    size_t n;

    if (dirp == NULL) {
        printf("Failed to read the directory: %s\n", dirname);
        if (fp != NULL)
            fclose(fp);
        return -1;
    }
    /* Print every entry, including "." and "..". */
    while ((direntry = readdir(dirp)) != NULL)
        printf("file name: %s\n", direntry->d_name);
    closedir(dirp);

    if (fp == NULL) {
        printf("Failed to read %s\n", fname);
        return -1;
    }
    /* fread() does not terminate the buffer, so leave room and do it here. */
    n = fread(buf, 1, sizeof(buf) - 1, fp);
    buf[n] = '\0';
    printf("%s\n", buf);
    fclose(fp);
    return 0;
}
======= END ========

And the output is:

> mkdir 한글
> touch 한글/가나다
> echo "test success!" > 한글/가나다
> clang -g hangul.c
> ./a.out
file name: .
file name: ..
file name: 가나다
test success!

I checked the contents of memory using lldb and confirmed that fname
holds composed UTF-8 characters, while the filename returned by readdir()
holds decomposed UTF-8 characters. Even so, file operations work
perfectly (reading, as in the code above, and writing as well).

So I think we can convert decomposed filenames into composed form after
readdir(). That will work at least for Korean: detecting, composing, and
decomposing Hangul can be done easily.
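To illustrate why the Hangul case is easy, here is a minimal sketch of the
composition step. It is not zsh code; the function name hangul_compose() and
its interface are made up for this example. It assumes the filename has
already been decoded from UTF-8 into an array of code points, and it only
handles the modern conjoining jamo (leading consonant, vowel, optional
trailing consonant), using the arithmetic mapping defined by the Unicode
Standard. Everything else is copied through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants of the Unicode Hangul syllable composition algorithm. */
enum {
    S_BASE = 0xAC00,  /* first precomposed syllable */
    L_BASE = 0x1100,  /* first leading consonant */
    V_BASE = 0x1161,  /* first vowel */
    T_BASE = 0x11A7,  /* one before the first trailing consonant */
    L_COUNT = 19, V_COUNT = 21, T_COUNT = 28
};

static int is_lead(uint32_t c)  { return c >= L_BASE && c < L_BASE + L_COUNT; }
static int is_vowel(uint32_t c) { return c >= V_BASE && c < V_BASE + V_COUNT; }
static int is_trail(uint32_t c) { return c > T_BASE && c < T_BASE + T_COUNT; }

/*
 * Hypothetical helper: compose jamo sequences in in[0..n) into precomposed
 * syllables. out must have room for n code points; the composed length is
 * returned. Code points that are not part of a lead+vowel pair are copied.
 */
size_t hangul_compose(const uint32_t *in, size_t n, uint32_t *out)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);
    while (i < n) {
        if (i + 1 < n && is_lead(in[i]) && is_vowel(in[i + 1])) {
            uint32_t s = S_BASE
                + ((in[i] - L_BASE) * V_COUNT + (in[i + 1] - V_BASE)) * T_COUNT;
            i += 2;
            if (i < n && is_trail(in[i]))   /* optional trailing consonant */
                s += in[i++] - T_BASE;
            out[o++] = s;
        } else {
            out[o++] = in[i++];
        }
    }
    return o;
}
```

For example, the decomposed sequence U+1112 U+1161 U+11AB composes to U+D55C
(한). A real implementation would still need the UTF-8 decoding and encoding
around this, and would have to decide what to do with non-Hangul combining
sequences, which need table lookups rather than arithmetic.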