The rise and rise of image-based spam

2006-11-01

John Graham-Cumming

Independent consultant, France
Editor: Helen Martin

Abstract

John Graham-Cumming charts the rise of image-based spam.


Introduction

Anyone looking in their quarantined spam folder will soon notice that a lot of spam these days is being sent using images instead of text.

In a paper presented at the Virus Bulletin conference last month [1], Dmitry Samosseiko and Ross Thomas of Sophos reported that, on some days, 40% of the spam seen in SophosLabs' spam traps is image-based, and that the amount of image-based spam has doubled since the start of 2006.

At a birds-of-a-feather meeting on image spam held during the same conference, a representative of one major anti-spam service provider reported that on some days image-based spam peaks at 95% of all spam sent, although 20–40% is typical.

Nothing new

However, image-based spam is not new. The very first trick entered in the Spammers' Compendium [2] in January 2003 is 'The Big Picture', which entails sending a spam that consists merely of an embedded picture.

Samosseiko and Thomas [1] report having seen image-based spams in Russian in September 2004. Users of SpamAssassin have been discussing the use of optical character recognition (OCR) techniques since 2002, which indicates that image-based spam has been with us for a number of years. According to IronPort statistics [4], 1% of spam was image-based in June 2005. By June 2006 that figure had risen to 16%.

Within the last six months image-based spam has become a major problem for anti-spam vendors, all of whom have adapted their tactics and issued press releases touting their solutions. The spammers, meanwhile, have not remained idle and have modified the types of image-based spam they are sending in an attempt to avoid filtering.

Clearly, spammers believe that the right battleground between spam filters and spam is in image processing – and accordingly they have switched from the use of ever more complex text obfuscation (which spam filters easily see through – see for example [3]) to image obfuscation.

The big picture

In 2003, when I first reported on image-based spam [2], the images used were very simple. Figure 1 shows a spam image from late 2003.

At that time, images were typically loaded from remote websites using a simple HTML <img src=> tag. The use of remote images died out as email clients (such as Mozilla Thunderbird, and Outlook Express 6 in Windows XP Service Pack 2) began blocking remote images by default, and spammers switched to sending their images as MIME-encoded attachments (as they still do). The image in Figure 1 was sent as an attached GIF file.

Figure 1. Late 2003 image-based spam.
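
Because the images now travel inside the message itself, a filter has to find them among the MIME parts before it can hash or OCR them. The following is a minimal Python sketch using only the standard library's `email` package. The function name `image_attachments` is hypothetical, and the sketch simply treats every non-multipart part with an `image/*` content type as a candidate. Note that it trusts the declared content type, which spammers can falsify.

```python
from email import message_from_bytes
from email.message import Message


def image_attachments(raw: bytes) -> list[tuple[str, bytes]]:
    """Return (content_type, decoded_bytes) for every image/* part of a message."""
    msg: Message = message_from_bytes(raw)
    images = []
    for part in msg.walk():
        # Multipart containers only wrap other parts; they carry no image data.
        if part.is_multipart():
            continue
        ctype = part.get_content_type()
        if ctype.startswith("image/"):
            # decode=True undoes the base64 or quoted-printable transfer encoding.
            payload = part.get_payload(decode=True) or b""
            images.append((ctype, payload))
    return images
```

The decoded bytes are what later stages (hashing, OCR, metadata checks) would operate on.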

2006: The year of image-based spam

Although image-based spam had existed for a while, there was little innovation in it until 2006. In January 2006 the trick named 'The Small Picture' was added to [2]. Within the next couple of months, OCR plug-ins were announced for SpamAssassin.

'The Small Picture' involves embedding GIF images in the email message. Each image consists of a single letter and is positioned strategically within the text of an HTML-based email to form readable text. Figure 2 shows an example of spam using the 'Small Picture' trick.

Figure 2. 'The Small Picture'.

In Figure 2 the letter 'm' in Ambien, 'o' in Propecia, the first 'a' in Xanax, 'e' in Levitra, the first 'A' in VIAGRA and 'a' in Soma are embedded images.
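
One cheap signal for 'The Small Picture' is an HTML body containing several tiny inline images. The sketch below counts `<img>` tags whose declared `width` and `height` are both at most a threshold. The class and function names, the 20-pixel default and the reliance on declared size attributes are all assumptions for illustration. A spam that omits the attributes would need the dimensions read from the attached images instead.

```python
from html.parser import HTMLParser


class TinyImageCounter(HTMLParser):
    """Count <img> tags whose declared width and height are both small."""

    def __init__(self, max_side: int = 20) -> None:
        super().__init__()
        self.max_side = max_side
        self.tiny = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "img":
            return
        sizes = dict(attrs)
        try:
            width = int(sizes.get("width"))
            height = int(sizes.get("height"))
        except (TypeError, ValueError):
            return  # size missing, valueless or non-numeric: this sketch skips it
        if width <= self.max_side and height <= self.max_side:
            self.tiny += 1


def count_tiny_images(html: str, max_side: int = 20) -> int:
    """Number of tiny inline images in an HTML body (a 'Small Picture' hint)."""
    parser = TinyImageCounter(max_side)
    parser.feed(html)
    parser.close()
    return parser.tiny
```

A high count relative to the amount of visible text would be one input to a filter, not a verdict on its own.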

Chop GUI

Towards the middle of January 2006 the trick 'Chop GUI' was added to [2]. Here, the spammer attempted to avoid detection (and possibly OCR) by chopping a single image into multiple, randomly chosen rectangles, and then reconstructing the original image using HTML. Figure 3 shows an example of 'Chop GUI' with the boundaries between the individual images highlighted; in the real spam there were no boundaries.

Figure 3. Simple 'Chop GUI'.

Figure 3 is a rather simple example, in which the spammer has cut a single image horizontally through the text. A much more complex example is shown in Figure 4, once again with the boundaries highlighted. Here the spammer chose random cuts in the image and used HTML to reconstruct it.

Figure 4. Complex 'Chop GUI'.
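
A filter that wants to hash or OCR the whole picture, rather than its fragments, first has to put the pieces back together. Assuming the offset of each rectangle has already been recovered from the HTML layout (that parsing step is not shown), reassembly amounts to pasting each tile onto a blank canvas. In this hypothetical sketch, pixels are plain integers in nested lists rather than a real image type.

```python
Tile = tuple[int, int, list[list[int]]]  # (left, top, rows of pixel values)


def reassemble(tiles: list[Tile], background: int = 0) -> list[list[int]]:
    """Paste rectangular tiles back onto a single canvas at their offsets."""
    # The canvas must be large enough to hold the right-most and lowest tiles.
    width = max(x + len(rows[0]) for x, _, rows in tiles)
    height = max(y + len(rows) for _, y, rows in tiles)
    canvas = [[background] * width for _ in range(height)]
    for x, y, rows in tiles:
        for dy, row in enumerate(rows):
            # Same-length slice assignment overwrites pixels in place.
            canvas[y + dy][x : x + len(row)] = row
    return canvas
```

Whether the cuts are simple horizontal slices (Figure 3) or random rectangles (Figure 4) makes no difference to this step.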

At around the same time, spammers began trying to defeat optical character recognition of their images by overlaying the text with random lines (as shown in Figure 5) or by randomly stippling the background, as shown in Figure 6 (stippling can also be used to change the hash value of the image at will).

Figure 5. Random lines to avoid OCR.

Figure 6. Random pixels to avoid OCR and hashing.
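
Stippling works against hash lists because an exact hash changes completely when even one pixel changes. The sketch below illustrates this with MD5 over a raw 8-bit pixel buffer. MD5 is chosen only for illustration, since the article does not say which hashes vendors use, and `stipple` is a hypothetical stand-in for what a spam generator might do.

```python
import hashlib
import random


def exact_hash(pixels: bytes) -> str:
    """MD5 of the raw pixel buffer: identical images give identical digests."""
    return hashlib.md5(pixels).hexdigest()


def stipple(pixels: bytes, n: int, seed: int) -> bytes:
    """Copy an 8-bit-per-pixel buffer and alter n distinct, randomly chosen pixels."""
    rng = random.Random(seed)
    out = bytearray(pixels)
    # sample() picks distinct indices, so no pixel is flipped back by accident.
    for i in rng.sample(range(len(out)), n):
        out[i] ^= 0x01  # flip the low bit so the value is guaranteed to change
    return bytes(out)
```

Hashing `stipple(original, 1, seed=...)` gives a different digest from hashing `original`, even though only a single pixel differs. A hash intended to survive stippling would have to tolerate such small changes.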

More recently, spammers have tried using fonts that are almost unreadable even to humans as a way to avoid optical character recognition. This tactic appears to have died out, however. Given that the fonts are hard even for the human recipient to read (see Figure 7), one assumes that the tactic declined because the spams were ineffective.

Figure 7. Using fonts that are difficult to OCR.

Notice that the spam shown in Figure 7 also includes a block of random pixels in the bottom left-hand corner. The purpose of this is to change the hash value of the image each time it is generated.

Animated spam

As spam filters have become adept at filtering these images (even with random elements added in an attempt to defeat hashing or detection), spammers have turned to animated GIFs. In August 2006 the first animation-based trick was added to [2]: 'Animated Noise'.

In 'Animated Noise' the spammer sends an animated GIF with a number of decoy frames that consist solely of random noise, and a single frame that contains the actual spam message. The real frame appears for a long period of time (for example, it may stay visible for ten minutes), whereas the decoy frames appear before and after the real frame and last mere milliseconds. The spammer is attempting to fool the spam filter into missing the real frame, although examination of the animation times makes the real frame easy to detect.

Figure 8 shows three decoy frames used in a real 'Animated Noise' spam.

Figure 8. Decoy frames that were displayed for 100ms before and after a real frame.
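
The animation times are stored in the file itself. In GIF89a, each frame may be preceded by a Graphic Control Extension whose delay field is a 16-bit little-endian count of hundredths of a second. The following is a minimal sketch that walks the GIF block structure to collect one delay per frame. It does not decode pixel data and treats a frame without a Graphic Control Extension as having delay 0. It will also raise on truncated or deliberately corrupted files, which a production filter would have to handle.

```python
import struct


def gif_frame_delays(data: bytes) -> list[int]:
    """Return each frame's delay in hundredths of a second (GIF89a units)."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF")
    packed = data[10]
    pos = 13  # 6-byte header + 7-byte logical screen descriptor
    if packed & 0x80:  # global colour table present
        pos += 3 * (2 ** ((packed & 0x07) + 1))
    delays: list[int] = []
    pending = 0  # delay from the most recent Graphic Control Extension
    while pos < len(data):
        block = data[pos]
        if block == 0x3B:  # trailer: end of file
            break
        if block == 0x21:  # extension: introducer, label, then sub-blocks
            label = data[pos + 1]
            if label == 0xF9 and data[pos + 2] == 4:
                # GCE body: packed byte, 2-byte little-endian delay, transparent index
                (pending,) = struct.unpack_from("<H", data, pos + 4)
            pos = _skip_sub_blocks(data, pos + 2)
        elif block == 0x2C:  # image descriptor: one frame
            img_packed = data[pos + 9]
            pos += 10
            if img_packed & 0x80:  # local colour table present
                pos += 3 * (2 ** ((img_packed & 0x07) + 1))
            pos = _skip_sub_blocks(data, pos + 1)  # +1 skips the LZW minimum code size
            delays.append(pending)
            pending = 0
        else:
            raise ValueError(f"unexpected block 0x{block:02x} at offset {pos}")
    return delays


def _skip_sub_blocks(data: bytes, pos: int) -> int:
    """Skip a chain of length-prefixed sub-blocks ending in a zero-length block."""
    while data[pos] != 0:
        pos += data[pos] + 1
    return pos + 1
```

On this scale a decoy shown for 100ms appears as 10, and a frame shown for 17s appears as 1700. Other extensions (comments, looping blocks) are skipped like any other chain of sub-blocks.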

A refinement of the 'Animated Noise' trick was to use the rapidly shown decoy frames to display a 'subliminal' message. As well as random noise, the decoy frames contained the word 'BUY' in random positions. Figure 9 shows two frames from a 'subliminal' spam.

Figure 9. Subliminal spam (top frame shown for 10ms; bottom frame shown for 17s).

Each frame containing the word 'BUY' (there were three such decoy frames) was flashed on screen for 10ms, while the frame containing the real spam message remained visible for 17s.

Since finding the real frame is relatively easy (a spam filter need only look for the frame that is displayed for the longest time, or perhaps for the first frame that is displayed for many seconds), spammers have begun combining animation with GIF transparency.
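
Both heuristics in the paragraph above are easy to express once the per-frame delays are known (in hundredths of a second, as stored in a GIF). In this sketch the function names and the two-second default standing in for 'many seconds' are assumptions, not values from the article.

```python
from typing import Optional


def longest_frame(delays: list[int]) -> int:
    """Index of the frame displayed longest; ties go to the earliest frame."""
    if not delays:
        raise ValueError("no frames")
    # max() returns the first maximal item, which gives the tie-break above.
    return max(range(len(delays)), key=lambda i: delays[i])


def first_long_frame(delays: list[int], min_seconds: float = 2.0) -> Optional[int]:
    """Index of the first frame shown for at least min_seconds, or None."""
    for i, hundredths in enumerate(delays):
        if hundredths / 100 >= min_seconds:
            return i
    return None
```

The chosen frame is then the one to hash or OCR, while the short-lived decoys can be ignored.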

Strip mining

The first attempt at combining animation and transparency follows the 'Chop GUI' style of splitting an image into parts (in this case, strips). Each strip of the image is a single frame of the animation, drawn on a transparent background. As the strips are displayed one after another, the transparent areas of each new frame let the strips already drawn show through, building up the complete picture. This is called the 'Strip Mining' trick in [2].

Figure 10 shows two frames from a 'Strip Mining' spam (the blank areas are transparent) and Figure 11 shows the final image after the animation has completed.

Figure 10. 'Strip Mining'.

Figure 11. Final image of a 'Strip Mining' spam.
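
The effect relied on here can be modelled as drawing frames in order without clearing the canvas between them, which corresponds to GIF89a's 'do not dispose' disposal method. Opaque pixels overwrite what is already there, and transparent pixels leave earlier frames visible. The hypothetical sketch below simplifies by assuming every frame covers the whole logical screen and by using `None` for transparent pixels. A filter can run the same compositing to obtain the final image before OCR.

```python
from typing import Optional

Frame = list[list[Optional[int]]]  # None marks a transparent pixel


def composite(frames: list[Frame], width: int, height: int,
              background: int = 0) -> list[list[int]]:
    """Draw frames in order onto one canvas that is never cleared in between."""
    canvas = [[background] * width for _ in range(height)]
    for frame in frames:
        for y in range(height):
            for x in range(width):
                # Only opaque pixels replace what earlier frames left behind.
                if frame[y][x] is not None:
                    canvas[y][x] = frame[y][x]
    return canvas
```

Compositing only the first strip yields a partial picture; compositing all of them yields the image shown in Figure 11.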

Most recently, spammers have taken animation plus transparency to a new extreme in their battle against OCR. Starting from a single spam image, they randomly assign each of its pixels to one of two animated frames. In this way neither frame contains text that is readable (by a human or a machine), but the final merged image is readable.

Clever as this scheme is, the developers of the SpamAssassin OCR plug-in report that the latest version of the plug-in merges these image spams and OCRs them successfully.

Figure 12 shows an example of two frames from such a spam, and Figure 13 shows the merged result.

Figure 12. Random pixel stripping between two animated, transparent frames.

Figure 13. Result of merging the two frames using animation.
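
The pixel-splitting scheme is a random partition. Each pixel lands in exactly one of the two frames, so neither frame alone carries the full text, yet overlaying them reproduces the original exactly. A small sketch of both the split and the merge follows, with `None` standing for a transparent pixel. The function names and the 50/50 split probability are assumptions for illustration, and the article does not describe how the SpamAssassin plug-in performs its merge.

```python
import random
from typing import Optional

Image = list[list[int]]
Frame = list[list[Optional[int]]]  # None = transparent


def split_pixels(image: Image, seed: int) -> tuple[Frame, Frame]:
    """Randomly assign every pixel to exactly one of two otherwise transparent frames."""
    rng = random.Random(seed)
    a: Frame = [[None] * len(row) for row in image]
    b: Frame = [[None] * len(row) for row in image]
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            (a if rng.random() < 0.5 else b)[y][x] = value
    return a, b


def merge(a: Frame, b: Frame) -> Frame:
    """Overlay b on a; because the pixel sets are disjoint, order does not matter."""
    return [
        [pb if pb is not None else pa for pa, pb in zip(ra, rb)]
        for ra, rb in zip(a, b)
    ]
```

Because the two frames never both hold a pixel at the same position, the merge is lossless, which is why OCR on the merged result can succeed where OCR on either frame fails.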

Conclusion

Despite the cleverness of the trickery being used by spammers, current techniques for filtering image-based spam are working. Anti-spam vendors report using a mixture of image hashing, regular expressions and examination of image metadata (such as the palette, the presence of animation and the compression ratio) to catch image-based spam. A great danger for spammers is that (as with text-based spam) they become enamoured with the obfuscation possibilities of image-based spam, only to see their spams easily filtered by detecting the obfuscations themselves.
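
Some of this metadata can be read without decoding the image at all. As an illustration, the sketch below pulls the dimensions and global palette size from a GIF's 13-byte header and divides file size by pixel count as a crude proxy for compression ratio. The feature names are hypothetical, and the article does not say what thresholds or models vendors apply to such features.

```python
def gif_header_features(data: bytes) -> dict[str, float]:
    """Cheap features from the GIF header plus the total file size."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF")
    if len(data) < 13:
        raise ValueError("truncated header")
    width = int.from_bytes(data[6:8], "little")
    height = int.from_bytes(data[8:10], "little")
    packed = data[10]
    # Bit 7 flags a global colour table; bits 0-2 give its size as 2^(n+1) entries.
    palette = 2 ** ((packed & 0x07) + 1) if packed & 0x80 else 0
    pixels = width * height
    return {
        "width": width,
        "height": height,
        "global_palette_size": palette,
        # File bytes per pixel: a crude stand-in for the compression ratio.
        "bytes_per_pixel": len(data) / pixels if pixels else float("inf"),
    }
```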

Spam filter authors will need to be on the lookout for new image-based spam techniques, as spammers are actively innovating to avoid detection. Recently spammers have started to switch from GIF-formatted images to PNG, some are deliberately corrupting their images to make decompression difficult, and others are labelling an image as a JPEG when it is, in fact, a GIF.
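
A mislabelled image can be caught by comparing the declared content type with the file's leading 'magic' bytes, and the mismatch is itself a useful spam signal, in the spirit of detecting the obfuscation rather than seeing through it. The signatures below are the standard GIF, PNG and JPEG file headers. The function names are hypothetical.

```python
from typing import Optional

# Standard leading signatures for the three formats the article mentions.
MAGIC = {
    "image/gif": (b"GIF87a", b"GIF89a"),
    "image/png": (b"\x89PNG\r\n\x1a\n",),
    "image/jpeg": (b"\xff\xd8\xff",),
}


def sniff_image_type(data: bytes) -> Optional[str]:
    """Guess the real format from the first bytes, or None if unrecognised."""
    for ctype, signatures in MAGIC.items():
        if data.startswith(signatures):
            return ctype
    return None


def declared_type_mismatch(declared: str, data: bytes) -> bool:
    """True when the bytes are a known format that differs from the declared one."""
    actual = sniff_image_type(data)
    # Drop parameters such as '; name="x.jpg"' before comparing.
    declared_base = declared.split(";")[0].strip().lower()
    return actual is not None and actual != declared_base
```

An unrecognised or corrupted header yields `None` here rather than a mismatch. A filter might reasonably score that case separately.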

Image-based spam remains fertile ground for spammers and spam filter authors.

Bibliography

[1] Samosseiko, D.; Thomas, R. The game goes on: an analysis of modern spam techniques. Proceedings of the 16th Virus Bulletin International Conference 2006.

[2] Graham-Cumming, J. The Spammers' Compendium. http://www.jgc.org/tsc/.

[3] Sharma, V.; Lewis, S. Exploiting spammers' tactics of obfuscations for better corporate-level spam filtering. Proceedings of the 16th Virus Bulletin International Conference 2006.

[4] Image-based spam makes a comeback. http://www.dmconfidential.com/blogs/column/Web_Trends/916/.
