Posted by Virus Bulletin on Mar 3, 2011
Non-Latin characters in URLs used to trick filters.
By using internationalized domain names (IDN), spammers manage to avoid detection of URLs in their messages.
IDNs were introduced in 2003 and allow for domain names in non-Latin scripts, such as Russian, Chinese and Arabic, as well as in Latin script with diacritics. On top of that, last year saw the introduction of non-Latin top-level domains (TLDs). With the globalization of Internet usage - and, for instance, the Chinese now forming the largest national group on the net - these developments are understandable.
However, many experts have pointed out that IDNs are a goldmine for those with less honest intentions. The much-quoted example is that of a phisher who manages to register a domain that looks identical to that of a well-known bank, but with the Cyrillic lower-case 'а' substituted for its visually indistinguishable Latin equivalent 'a' in the word 'bank' - the domains are different, but end-users are unlikely to notice.
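To make the look-alike problem concrete, here is a minimal Python sketch of one possible check: flagging domain labels that mix letters from more than one script. The function names are hypothetical, and using the first word of each character's Unicode name as a stand-in for its script is a simplifying assumption, not how any particular filter is known to work.

```python
import unicodedata
from typing import Set


def scripts_in_label(label: str) -> Set[str]:
    """Return the scripts used by the letters of one domain label.

    Assumption: the first word of a character's Unicode name
    (e.g. "LATIN", "CYRILLIC") is a good enough proxy for its script.
    Digits and hyphens are ignored.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


def is_mixed_script(domain: str) -> bool:
    """True if any single label of the domain mixes two or more scripts."""
    return any(len(scripts_in_label(label)) > 1 for label in domain.split("."))


# "b\u0430nk" is 'bank' with the Cyrillic 'а' (U+0430) in place of Latin 'a'.
print(is_mixed_script("b\u0430nk.com"))
print(is_mixed_script("bank.com"))
```

A check like this would not catch a name written entirely in one non-Latin script, so it can only be one signal among several.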
International TLDs provide even more opportunities for spammers. Symantec reports that its researchers discovered a wave of pill spam, aimed at German speakers, but using domains on the .рф TLD: the international TLD for Russia. With URL detection forming an important part of spam filters, and even more so in anti-phishing tools, the use of IDNs and non-Latin TLDs will require developers to update their solutions.
The Register quotes Symantec's Paul Wood, who says that if URLs are used in Punycode form (a way to encode IDNs using ASCII characters), adapting URL detectors is rather straightforward. However, if IDNs are not encoded in Punycode, "more work could be required, particularly given the various different character encodings that could be used to represent these characters."
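The point about Punycode and character encodings can be illustrated with a short Python sketch. It assumes a detector that normalizes every host name to its ASCII (Punycode) form before comparing it against known-bad domains. The helper names are hypothetical, the charset of the raw bytes is assumed to be known (in practice it would have to be taken from the message headers or guessed), and Python's built-in `idna` codec implements the 2003 version of the IDNA standard only.

```python
def to_ascii_host(host: str) -> str:
    """Normalize a host name to lower-case ASCII (Punycode) form.

    Host names that are already ASCII, including ones already in
    Punycode, pass through unchanged apart from lower-casing.
    """
    return host.lower().encode("idna").decode("ascii")


def host_from_bytes(raw: bytes, charset: str) -> str:
    """Decode a host name from raw message bytes, then normalize it.

    The charset must be supplied by the caller; this sketch does not
    try to detect it.
    """
    return to_ascii_host(raw.decode(charset))


# The same host under the .рф TLD, as it might appear in three messages.
host = "example.\u0440\u0444"
print(to_ascii_host(host))
print(host_from_bytes(host.encode("koi8-r"), "koi8-r"))
print(host_from_bytes(host.encode("cp1251"), "cp1251"))
```

Once every variant has been reduced to the same ASCII string, the existing URL matching logic can be reused; the extra work lies in decoding the raw bytes correctly in the first place.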