Virus linguistics - searching for ethnic words

Masaki Suenaga, Symantec

Most viruses 'speak' English, and English-'speaking' mass-mailing worms tend to spread worldwide. Virus analysts can generally understand English too, so we can tell our customers what kind of message a virus sends or what it targets.

Even if a virus speaks French or Portuguese, we may still be able to extract the correct text from it, although there is considerable room for error when extracting Portuguese words.
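
The abstract does not say where the Portuguese errors come from. One plausible source, assumed here purely for illustration, is that a scanner looking only for runs of printable ASCII bytes splits words at accented letters such as ç and ã, which are single non-ASCII bytes in the Western European code page (cp1252). This Python sketch contrasts such a scan with one that decodes the bytes as cp1252 first; the function names and the sample sentence are hypothetical.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII bytes (0x20-0x7E) of at least min_len bytes,
    roughly what a `strings`-style scan reports."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [run.decode("ascii") for run in re.findall(pattern, data)]


def cp1252_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Decode as the Western European code page first, then keep runs of
    characters that are not control characters or undecodable bytes."""
    text = data.decode("cp1252", errors="replace")  # undefined bytes become U+FFFD
    return re.findall(r"[^\x00-\x1f\x7f\ufffd]{%d,}" % min_len, text)


# Hypothetical Portuguese message text, stored in the Western European code page.
sample = "Atenção: veja a informação em anexo".encode("cp1252")
print(ascii_strings(sample))   # ['Aten', 'o: veja a informa', 'o em anexo']
print(cp1252_strings(sample))  # ['Atenção: veja a informação em anexo']
```

Decoding before scanning only helps once the code page is known; guessing it is the harder problem.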

If the text is not written in the Western European code page, however, we have to guess which code page was used. If we guess wrong, we get nothing readable, and so we cannot give customers the same level of precise information that we could for English text.
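
As a rough illustration of such guessing, a minimal Python sketch can decode the bytes with each candidate code page and check whether the letters it produces belong to that code page's own script. Western European text decoded as cp1251, for instance, comes out as mostly ASCII letters with a few stray Cyrillic ones. The candidate list, the script keywords, the 0.5 threshold and the cp1252 fallback are all assumptions made for this example, not the method described in the paper.

```python
import unicodedata

# Hypothetical candidate list: code page -> substrings expected in the Unicode
# names of letters written in that code page's script.
CANDIDATES = {
    "cp1251": ("CYRILLIC",),
    "shift_jis": ("HIRAGANA", "KATAKANA", "CJK"),
    "gbk": ("CJK",),
    "euc_kr": ("HANGUL",),
}
FALLBACK = "cp1252"  # Western European code page
THRESHOLD = 0.5      # arbitrary cut-off chosen for illustration


def script_score(text: str, keywords: tuple[str, ...]) -> float:
    """Fraction of letters in `text` whose Unicode name mentions one of `keywords`."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    hits = sum(1 for c in letters
               if any(k in unicodedata.name(c, "") for k in keywords))
    return hits / len(letters)


def guess_code_page(data: bytes) -> str:
    """Return the candidate whose decoding looks most like its own script,
    or the Western European fallback if no candidate scores well enough."""
    best, best_score = FALLBACK, 0.0
    for codec, keywords in CANDIDATES.items():
        try:
            text = data.decode(codec)  # strict: invalid byte sequences rule a codec out
        except UnicodeDecodeError:
            continue
        score = script_score(text, keywords)
        if score > best_score:  # ties keep the earlier candidate
            best, best_score = codec, score
    return best if best_score >= THRESHOLD else FALLBACK


print(guess_code_page("Привет".encode("cp1251")))       # cp1251
print(guess_code_page("ウイルス".encode("shift_jis")))  # shift_jis
print(guess_code_page("informação".encode("cp1252")))   # cp1252
```

Single-byte code pages overlap heavily (most bytes from 0xC0 to 0xFF decode to letters in both cp1252 and cp1251), so a heuristic like this can still be fooled, especially on short strings or on text mixed with English.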

Encrypted English strings can be decrypted by technical means. Text in a natural language we do not know, on the other hand, can look like hieroglyphics. Machine translation is widely used nowadays and can be very useful, but only once we know the correct strings and which language they are written in. The question is how to determine these. This paper provides some tips.


