John Graham-Cumming thinks it's time for an information-rich naming scheme that can be used to refer to spammer and phisher content tricks.
Copyright © 2006 Virus Bulletin
I have been tracking the tricks used by spammers in the bodies of their messages since January 2003. Three years on, I have collected 55 distinct tricks and published them on The Spammers' Compendium website. When I first started publishing the site I gave each of the tricks a humorous name (such as 'Camouflage' or 'Honey, I shrunk the font'), and some of these names have entered popular use (such as 'Hypertextus Interruptus', which is enshrined in the SpamAssassin test INTERRUPTUS).
The trick count has been growing steadily over the last three years: Figure 1 shows the number of tricks in The Spammers' Compendium by calendar quarter. It is interesting to note that trick innovation or discovery seems to slow down in the fourth quarter of each year – perhaps indicating that spammers are in the middle of spamming their Christmas campaigns at that time, and not spending time on modifying their software.
Entries are added to The Spammers' Compendium when I identify a trick in spam caught in the wild by my spam traps, or in spam emailed to me by volunteers. Submitters receive credit in The Spammers' Compendium for submitting a new trick.
While the humorous names make good copy for journalists writing about the latest devious spammer trickery, they are less useful to people working in anti-spam research because they do not, in themselves, convey much information. In this article (and the related blog post) I propose a drier, but more information-rich, naming scheme that can be used to refer to spammer and phisher content tricks.
At the 2004 Virus Bulletin conference I presented a paper, 'The Waxing and Waning of Spammers' Trickery', in which I analysed some trends in the use of spammers' tricks by examining the appearance of various tricks (as extracted from The Spammers' Compendium) in a large corpus of spam supplied by Sophos. One of the problems in that analysis was that I had to write code to identify the tricks in The Spammers' Compendium, and I also had to explain each trick because the names conveyed little information.
To remedy that situation, and to provide a foundation on which other authors and vendors can build research into spammer trickery, I think it's time for a uniform naming scheme for these tricks.
In the uniform naming scheme, which I am calling the Spam/Phish Uniform Trick Repository, or SPUTR, each name consists of three '!'-separated parts: a purpose, a name, and a technology. The purpose is the reason for the trick (for example, the trick is used to obscure a URL, or to insert innocent words). The name is derived from the current Spammers' Compendium pejorative name. The technology identifies the way in which the trick is coded (for example, with HTML or MIME).
Table 1 contains a list of proposed 'purposes' that can be used to categorize tricks.
| Identifier | Purpose | Description |
|---|---|---|
| BWO | Bad word obfuscation | Making it hard for a filter to parse potentially bad words (e.g. Viagra). |
| GWI | Good word insertion | Adding words likely to confuse a statistical filter. |
| HB | Hash busting | Inserting randomness designed to make message hashing hard. |
| TA | Tokenization avoidance | Preventing a filter from tokenizing a message. |
| UH | URL hiding | Hiding a URL so that a user is fooled into clicking an incorrect link. |
| UO | URL obfuscation | Making it hard for a filter to identify a URL and check it against a black list. |
| WB | Web bugs | Inserting a beacon that tells the spammer that a message has been read. |

Table 1. Trick purposes.
For a single name there could be multiple tricks using different technologies (e.g. some tricks might be implemented using HTML or CSS), or tricks intended for different purposes (words might be inserted to fool a Bayesian filter or break a hash).
Table 2 shows the 'technologies' that would be recognized in the naming scheme:
| Identifier | Technology |
|---|---|
| CSS | Use of CSS |
| HTML | Any HTML without using CSS |
| MIME | Manipulation of MIME |

Table 2. Technology identifiers.
For example, the original Invisible Ink trick, written using HTML, would be referred to as:
GWI!Invisible!HTML
while a CSS variant would be:
GWI!Invisible!CSS
Names would be generated only for tricks that have been seen in the wild.
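To make the three-part structure concrete, here is a minimal Python sketch of how a SPUTR name might be split and validated. It is illustrative only and not part of the proposal: the names `SputrName` and `parse_sputr` are hypothetical, the purpose code for good word insertion is taken as 'GWI' (the form used in the name mapping), and 'Plain' is accepted as a technology because it appears in the mapped names even though Table 2 does not list it.

```python
from typing import NamedTuple

# Purpose codes from Table 1 ("GWI" is the form used in the name mapping).
PURPOSES = {
    "BWO": "Bad word obfuscation",
    "GWI": "Good word insertion",
    "HB": "Hash busting",
    "TA": "Tokenization avoidance",
    "UH": "URL hiding",
    "UO": "URL obfuscation",
    "WB": "Web bugs",
}

# Technology identifiers from Table 2, plus "Plain", which is used in the
# name mapping but not listed in Table 2 (accepting it is an assumption).
TECHNOLOGIES = {"CSS", "HTML", "MIME", "Plain"}


class SputrName(NamedTuple):
    """Hypothetical container for the three parts of a SPUTR name."""
    purpose: str
    name: str
    technology: str

    def __str__(self) -> str:
        return "!".join(self)


def parse_sputr(text: str) -> SputrName:
    """Split a 'purpose!name!technology' string and validate each part."""
    parts = text.strip().split("!")
    if len(parts) != 3:
        raise ValueError(f"expected three '!'-separated parts: {text!r}")
    purpose, name, technology = parts
    if purpose not in PURPOSES:
        raise ValueError(f"unknown purpose {purpose!r} in {text!r}")
    if not name:
        raise ValueError(f"empty trick name in {text!r}")
    if technology not in TECHNOLOGIES:
        raise ValueError(f"unknown technology {technology!r} in {text!r}")
    return SputrName(purpose, name, technology)


if __name__ == "__main__":
    parsed = parse_sputr("GWI!Invisible!CSS")
    print(parsed.purpose, PURPOSES[parsed.purpose], parsed.technology)
```

Keeping the purpose and technology in separate fields means that variants of one trick (for example the HTML and CSS forms of Invisible Ink) can be grouped by name or compared by technology without any further string handling.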
With such uniform naming it would be possible to analyse spams and phishes (perhaps even writing a specific recognizer for each trick) and to build up trends over time showing how individual tricks and individual classes of tricks are changing.
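As a rough illustration of the kind of trend analysis uniform names would allow, the following Python sketch tallies sightings by calendar quarter and by purpose. The function names and the sample observations are invented for the example; no real measurements are implied.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Tuple


def quarter(d: date) -> str:
    """Return a label such as '2006Q1' for the calendar quarter of d."""
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"


def purpose_trends(observations: Iterable[Tuple[date, str]]) -> Counter:
    """Count sightings per (quarter, purpose) from (date, SPUTR name) pairs."""
    counts: Counter = Counter()
    for seen_on, sputr_name in observations:
        # The purpose is the first '!'-separated part of the name.
        purpose = sputr_name.split("!", 1)[0]
        counts[(quarter(seen_on), purpose)] += 1
    return counts


# Made-up sightings, for illustration only.
sample = [
    (date(2006, 1, 15), "GWI!Invisible!HTML"),
    (date(2006, 2, 3), "GWI!Invisible!CSS"),
    (date(2006, 2, 20), "BWO!Space!Plain"),
    (date(2006, 5, 9), "UO!TurlingTest!Plain"),
]

if __name__ == "__main__":
    for (q, purpose), n in sorted(purpose_trends(sample).items()):
        print(q, purpose, n)
```

The same tally could be keyed on the full name or on the technology instead of the purpose, which is the practical benefit of encoding all three in the name itself.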
Table 3 shows the proposed mapping from the current Spammers' Compendium names to the SPUTR name.
| Spammers' Compendium name | SPUTR name |
|---|---|
| The Big Picture | TA!BigPicture!HTML |
| Invisible Ink | GWI!Invisible!HTML and GWI!Invisible!CSS |
| The Daily News | GWI!BigTag!HTML |
| Slice and Dice | TA!SliceNDice!HTML |
| MIME is money | GWI!PlainNotHTML!MIME |
| Lost in Space | BWO!Space!Plain |
| Ze Foreign Accent | BWO!Accent!Plain |
| Speaking in Tongues | HB!Tongues!Plain |
| The Black Hole | BWO!BlackHole!HTML |
| A Numbers Game | BWO!Numbers!HTML |
| Honey, I Shrunk the Font | GWI!ShrunkFont!HTML |
| No Whitespace, No Cry | TA!NoWhitespace!Plain |
| And in the Right Corner | HB!RightCorner!Plain |
| A Form of Desperation | GWI!Form!HTML and BWO!Form!HTML |
| It's Mini Marquee! | GWI!Marquee!HTML |
| You've Been Framed | BWO!Framed!HTML |
| Don't Cramp My Style | GWI!Style!CSS |
| Style Wars: Episode 1 | Included in other tricks |
| The tURLing Test | UO!TurlingTest!Plain |
| Sound of Silence | WB!Silence!HTML |
| Doing the Splits | BWO!Splits!Plain |
| But is it Art? | BWO!ASCIIArt!Plain |
| Absolute Zero | Same as Control Freak |
| Catch a Wave | TA!Wave!HTML |
| You Cannot be Serious | UO!Mcenroe!HTML |
| The Small Picture | TA!SmallPicture!HTML |
| Now you see it; now you don't | BWO!Copperfield!CSS |
| Slick Click Trick | UH!Caption!HTML |
| Whiter Shade of Pale | TA!Pale!HTML |

Table 3. Trick name mapping.
If the anti-spam and anti-phish community gets together now, it may be able to avoid the mess that exists in the anti-virus industry, where vendors compete to release information about viruses and each has its own way of naming them.
Worse, the current unifying malware scheme maintained by MITRE (the Common Malware Enumeration or CME; see http://cme.mitre.org/) unifies virus names by providing a simple identifier for each that contains absolutely no information. For example, the Kukudro.C worm is currently assigned the uninformative name 'CME136'.
In order to help the anti-spam and anti-phish community I propose to:
Maintain a website containing the uniform naming scheme and keep it updated as new spammer tricks are reported to me;
Allow any organization to use the names freely and identify themselves as a user by including their name or logo on an appropriate page on the site without any form of compensation;
Accept reports of new spammer and phisher trickery for inclusion on the website;
Host a mailing list for all interested parties so that tricks can be discussed and named;
Manage an open source project that creates software that can analyse an RFC822 message and output the tricks used.
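The last item, software that reads an RFC822 message and reports the tricks it uses, could start from something like the Python sketch below. It is only a skeleton under stated assumptions: the registry `HTML_RECOGNIZERS` and the function `tricks_in_message` are hypothetical, and the single recognizer shown (a check for a `<marquee>` tag, reported as GWI!Marquee!HTML) is a deliberately crude placeholder, not the Compendium's definition of that trick.

```python
import email
import re
from typing import Callable, Dict, List


def has_marquee(html: str) -> bool:
    """Placeholder recognizer: true if the HTML contains a <marquee> tag."""
    return re.search(r"<\s*marquee\b", html, re.IGNORECASE) is not None


# Hypothetical registry mapping a SPUTR name to a recognizer for HTML parts.
HTML_RECOGNIZERS: Dict[str, Callable[[str], bool]] = {
    "GWI!Marquee!HTML": has_marquee,
}


def tricks_in_message(raw: str) -> List[str]:
    """Return the SPUTR names whose recognizers match any text/html part."""
    msg = email.message_from_string(raw)
    found: List[str] = []
    for part in msg.walk():
        if part.get_content_type() != "text/html":
            continue
        payload = part.get_payload(decode=True)  # bytes, transfer-decoded
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            html = payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label: fall back to a lossless byte mapping.
            html = payload.decode("latin-1")
        for name, recognizer in HTML_RECOGNIZERS.items():
            if name not in found and recognizer(html):
                found.append(name)
    return found


if __name__ == "__main__":
    raw = (
        "From: sender@example.com\n"
        "To: rcpt@example.com\n"
        "Subject: test\n"
        "MIME-Version: 1.0\n"
        "Content-Type: text/html; charset=us-ascii\n"
        "\n"
        "<html><body><marquee>hello</marquee></body></html>\n"
    )
    print(tricks_in_message(raw))
```

A registry keyed by SPUTR name keeps each recognizer independent, so contributors could add or refine one trick at a time; plain-text and MIME-level tricks would need their own registries operating on other parts of the message.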
In order to do that, I would like the support of at least five major email security companies in the form of a decision to use the SPUTR names in their own research and publications.
Undoubtedly there will be many things about this proposal that old anti-virus hands and those fighting email security problems would like to modify or comment on; please send your comments to
[1] The Spammers' Compendium. http://www.jgc.org/tsc/.
[2] Graham-Cumming J. Proposed uniform naming scheme for spammer/phisher content trickery. http://www.jgc.org/blog/2006/06/proposed-uniform-naming-scheme-for.html.
[3] Graham-Cumming J. The Waxing and Waning of Spammers' Trickery. Proceedings of the Virus Bulletin International Conference, 2004. http://www.virusbtn.com/conference/vb2004/abstracts/jgrahamcumming.xml.