Normalizing filenames between Linux and Mac OS data backups/mirrors

If you are mirroring data whose filenames contain international (non-ASCII) UTF-8 characters between Linux and Mac OS X (through 10.6 at least), you may notice that tools such as rsync have trouble recognizing the files as the same. Mac OS (HFS+ primarily*) forces filenames into Unicode NFD (decomposed: ö is stored as "o" plus a combining diaeresis, two code points), while Linux software generally produces NFC (composed: ö is a single code point).
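To make the difference concrete, here is a small POSIX shell sketch that builds both spellings of ö and compares them byte for byte, which is roughly how rsync matches names when it does no charset conversion. Octal escapes are used instead of `\x` so it runs under plain `sh` as well as bash.

```shell
#!/bin/sh
# Two spellings of "ö" in UTF-8:
nfc=$(printf '\303\266')     # NFC: precomposed U+00F6, bytes c3 b6
nfd=$(printf 'o\314\210')    # NFD: "o" + U+0308 COMBINING DIAERESIS, bytes 6f cc 88

# They render identically, but as byte strings they are not equal.
if [ "$nfc" = "$nfd" ]; then
  echo "same"
else
  echo "different"
fi

# Dump the raw bytes of each form in hex.
printf '%s' "$nfc" | od -An -tx1
printf '%s' "$nfd" | od -An -tx1
```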

Here are two solutions: one coherent, one stubborn.

  1. Linux is smart: it allows either form and merely prefers NFC. Convert your Linux filenames to NFD.

convmv -r -f utf8 -t utf8 --nfd /path/to/data

  2. Fuck that noise. Mac OS is a bitch for forcing NFD; NFC is better. If it's workable (infrequent syncs/restores, a fairly small amount of data in files with colliding UTF-8 names), convert back to NFC on Linux after transferring data off the mirror.

convmv -r -f utf8 -t utf8 --nfc /path/to/data

By default convmv only runs in test mode and prints what it would rename. Add --notest when you are ready to actually rename the files.
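The "Linux allows either" point is easy to check. The sketch below assumes a Linux host whose temporary directory sits on ext4 or another non-normalizing filesystem; the name schön.txt is only an example. It creates the same visible name in both NFC and NFD form and counts the resulting files.

```shell
#!/bin/sh
# Scratch directory for the demo (removed at the end).
dir=$(mktemp -d)

nfc=$(printf 'sch\303\266n.txt')    # "schön.txt" with precomposed U+00F6 (NFC)
nfd=$(printf 'scho\314\210n.txt')   # "schön.txt" with "o" + U+0308 (NFD)

touch "$dir/$nfc" "$dir/$nfd"

# On a non-normalizing Linux filesystem names are plain byte strings,
# so both touches create separate files that look identical in ls.
count=$(find "$dir" -type f | wc -l | tr -d ' ')
echo "$count"

rm -rf "$dir"
```

This is also how a mirror ends up with apparent duplicates: once both spellings exist on the Linux side, they are two distinct files as far as the filesystem and rsync are concerned. On HFS+ the second touch should resolve to the same file, leaving a single entry.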

  • Note of interest: this problem also affected MacZFS volumes (during rsync data replication). I'm not sure whether that is a design decision by MacZFS or something inherited through core filesystem-related libraries.