Internationalized Domain Names (IDN)

Aside: It's more than a little ironic that I'd prefer to spell that "Internationalised". We can't even agree on how to write words in the same language.

How do you get an IDN like http://pógmothóin.now.ie/ to work?

1. DNS

Punycode is the method chosen to encode Unicode characters into US-ASCII for interoperability with the DNS. Each encoded label gets an xn-- prefix. VeriSign have an online converter from native script; put your desired DNS name into it. Be warned: punycode domain labels are hella ugly.
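
If you'd rather not paste names into a web form, the conversion can be sketched locally. This assumes Python 3 and its built-in idna codec, which follows the older IDNA 2003 rules; the helper names to_ascii and to_native are mine, not part of any library.

```python
def to_ascii(hostname: str) -> str:
    """Convert a native-script hostname to its punycode (ASCII) form."""
    # The "idna" codec handles each dot-separated label on its own and
    # adds the "xn--" prefix only to labels containing non-ASCII.
    return hostname.encode("idna").decode("ascii")


def to_native(hostname: str) -> str:
    """Convert a punycode hostname back to native script."""
    return hostname.encode("ascii").decode("idna")


print(to_ascii("pógmothóin.now.ie"))
print(to_native("xn--pgmothin-v3af.now.ie"))
```

Plain ASCII labels such as now and ie pass through unchanged, so it's safe to feed the whole hostname in rather than just the accented label.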

pógmothóin.now.ie comes out as xn--pgmothin-v3af.now.ie. Add a DNS record for that name which directs users to your web server. I chose to use a CNAME to my web server's canonical name.

BIND

xn--pgmothin-v3af.now.ie.  IN  CNAME  a.mx.now.ie.

tinydns (djbdns)

Cxn--pgmothin-v3af.now.ie:a.mx.now.ie

2. Web server

I'm using Apache 2.2 with name-based virtual hosting, so everything here is based on that setup.

NameVirtualHost *:80

<VirtualHost *:80>
  ServerName pógmothóin.now.ie
  ServerAlias xn--pgmothin-v3af.now.ie
  [ ... snip other VirtualHost directives ... ]
</VirtualHost>

Gotchas

Native characters won't match in ServerName

Web browsers make their HTTP request using the punycode string in the Host header so only that will match the VirtualHost. Make sure the punycode version is specified either as a ServerName or ServerAlias. I prefer to make the native version be the ServerName (for purely stubborn reasons).
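
To see why only the punycode name matches, here is a rough sketch of what a client does with the URL before sending the request. It's Python 3 using only the standard library; host_header is a made-up helper, and real browsers have their own IDNA implementations, so treat it as an illustration rather than their actual code.

```python
from urllib.parse import urlsplit


def host_header(url: str) -> str:
    """Build the Host header value a client would send for url."""
    parts = urlsplit(url)
    # The hostname is converted to its ASCII (punycode) form before
    # it goes on the wire, so the server never sees the native script.
    host = parts.hostname.encode("idna").decode("ascii")
    # Assumption: plain http, so only a port other than 80 is appended.
    if parts.port is not None and parts.port != 80:
        host = f"{host}:{parts.port}"
    return host


print("Host: " + host_header("http://pógmothóin.now.ie/"))
```

Whatever that prints is the string Apache compares against ServerName and ServerAlias.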

This also means that Apache will refer to itself using the punycode name. For example, an HTTP 404 error ends with: Apache Server at xn--pgmothin-v3af.now.ie Port 80. Turning the UseCanonicalName directive on doesn't really help: Apache then uses the native ServerName, but error pages aren't expected to be in Unicode, so clients will interpret the non-ASCII characters as gibberish.

VirtualHost name in logs

The native ServerName is printed with its non-ASCII bytes escaped: p\xc3\xb3gmoth\xc3\xb3in.now.ie
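
Each \xc3\xb3 pair is the UTF-8 encoding of ó. If you post-process the logs, a small sketch like this turns the field back into readable text. It assumes Python 3 and that every escape in the field is of the \xHH form; unescape_log_field is a hypothetical helper name.

```python
import re

# Matches one escaped byte such as \xc3
_ESCAPED_BYTE = re.compile(rb"\\x([0-9a-fA-F]{2})")


def unescape_log_field(field: str) -> str:
    """Turn \\xHH escapes in a log field back into UTF-8 text."""
    raw = _ESCAPED_BYTE.sub(
        lambda m: bytes([int(m.group(1), 16)]),  # hex digits -> one raw byte
        field.encode("ascii"),
    )
    return raw.decode("utf-8")


print(unescape_log_field(r"p\xc3\xb3gmoth\xc3\xb3in.now.ie"))
```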


Client oddness

Firefox 2 & 3

An IDN entered into the address bar is replaced by its punycode representation after the page has loaded, unless the top-level domain in the URL is on the Mozilla list of approved IDN TLDs.

Internet Explorer 7/8

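IE's behaviour is based on your language settings: a name is shown in native script only if its characters belong to one of the languages you have configured, otherwise the address bar shows the punycode.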

Safari

Apple maintain a text file of scripts that are allowed in IDN URLs.