The scourge of blog spam

2007-06-01

Jessica Baumgart

Renesys, USA

Editor: Helen Martin

Abstract

Jessica Baumgart, an active blogger since April 2003, has contributed to a variety of weblogs on at least seven different platforms, helps lead a support group for bloggers, and eagerly deletes spam. In this article she describes the lesser-known but growing problem of blog spam.

Table of contents

Introduction
Blog spam
Search engine page ranking
Spam-blocking tools
Turings and captchas
The way to go
Concluding remarks

Introduction

Weblog spam doesn’t get as much attention as email spam because not as many people are plagued by it. However, it is a growing problem and one that has important consequences for the Internet and technology community. Blog developers, search engine companies and blog administrators will need to change their approaches to handling unwanted and unwelcome blog comments and fake weblogs.

Blog spam

Most weblog spam falls into one of three categories: comment spam, trackback spam, or spam weblogs. Comment spam is an unsolicited and sometimes unrelated comment on a weblog that might advertise a product, service, or website. Some comment spam is added manually by a person to a particular weblog entry. Much of it, however, comes from scripts that can automatically add many comments to one post, or to many posts simultaneously.

Trackback spam happens as a result of the ability of some weblog platforms to show links to posts linking to a particular post. In some cases, real web pages of a spam-like nature create a genuine trackback by linking to a blog post. In other cases, bots exploit the weblog software to place fake trackbacks advertising products, services, and/or websites on weblogs.

A spam weblog, or splog, is a blog created for no better purpose than to advertise various products, services, or websites. Spam blogs can be found on some free weblog sites, like Blogger. While sometimes invisible to and ignored by many people, these blogs can often cause problems for search engines.

Bots and scripts are often a faster, more efficient, and less expensive way to create weblog spam than hiring people. However, some unsolicited blog comments and fake weblogs are the result of humans putting fingers to keys. For targeted spam, such as a comment whose text relates to the subject of the post, or to circumvent a tricky registration process on a high-profile blog, humans are sometimes better suited to the task than a bot or a script.

A lot of blog spam is very similar to email spam. It advertises medication, vacation destinations, and all sorts of items that might also appear in an email inbox. It is often easy to determine that a comment, trackback, or weblog is garbage. However, it is not always simple to decipher whether something is blog spam. A popular comment goes something like this: ‘I like your site. I find it very useful. Please visit mine’, and includes a link. Until the blogger or comment reader follows the link or reads many identical comments on the same weblog, it might not be obvious that the comment is spam.
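One way an administrator might surface the 'many identical comments' case described above is to normalise each comment and count repeats. The following Python sketch is illustrative only: the function names, the link pattern, and the threshold of three are assumptions, not features of any particular blog platform.

```python
import re
from collections import Counter

# Assumed pattern: anything starting with http:// or https:// is a link.
URL_RE = re.compile(r"https?://\S+")


def normalise(comment):
    """Lower-case the comment, drop links, and strip punctuation and extra spaces."""
    without_links = URL_RE.sub(" ", comment.lower())
    return " ".join(re.findall(r"\w+", without_links))


def repeated_comments(comments, threshold=3):
    """Return comments whose normalised text occurs at least `threshold` times."""
    counts = Counter(normalise(c) for c in comments)
    flagged = []
    for comment in comments:
        key = normalise(comment)
        # Skip comments that are empty once links are removed.
        if key and counts[key] >= threshold:
            flagged.append(comment)
    return flagged


comments = [
    "I like your site. I find it very useful. Please visit mine http://spam-one.example/",
    "Great point about trackbacks, thanks for writing this up.",
    "I like your site.  I find it very useful.  Please visit mine http://spam-two.example/",
    "i like your site, i find it very useful, please visit mine",
]
for suspect in repeated_comments(comments):
    print("Possible spam:", suspect)
```

A check like this only catches verbatim or near-verbatim repeats; a single copy of the same message would still need a human to follow the link.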

Some comments are rambling digressions that might or might not be genuine comments posted by real readers. Others are lists of links, which might be pure spam, or might have been posted by a reader in a hurry who wants to point out some related and possibly useful websites. Since the Internet has a global reach, it can also be difficult to deduce the nature of a comment posted in a foreign language and to determine whether or not it should be removed.

Some spam can be offensive and may relate to illegal activities, such as child pornography. Other spam points to malware sites that cause problems for people who follow the links. Enough trashy comments on a weblog can cause readers to change their minds about the value of the blog. Links to sites hosting malware and other unsavoury content might result in a weblog being excluded from certain search engines. Bloggers and blog administrators who choose to ignore spam on their weblogs are unlikely to be making the wisest decision.

Search engine page ranking

Spammers post comments and trackbacks and create spam blogs for a variety of reasons. Much like email spammers, they want people to use certain products and services or visit certain websites. Unlike with email spam, however, search engine rankings play a large role in blog spam.

People blogging on servers that have high page rankings in Google might find themselves besieged by more spam than someone blogging on their own personal server that does not have a high Google ranking. If Google thinks a site it ranks highly is linking to another site that appears lower in its search results, it might raise the rank of that site based on the authority of the links.

A high page rank often means heavy site traffic. If the site in question is selling anything (whether selling directly to the customer, or indirectly through advertising), heavier site traffic can translate into increased revenue. Thus, spammers use a variety of methods of blog spam to increase their page rank.

When weblog developers realized spammers were taking advantage of links in comments and trackbacks to increase their search engine page rank, many implemented a ‘nofollow’ tag in their link code. Adopters of the tag include LiveJournal and Drupal. The tag (strictly speaking, a value of the link’s rel attribute) instructs search engine spiders not to follow the marked links, which is intended to quash spammers’ attempts to use those links to increase their page rank.

In the HTML code, the tag appears inside the link element, for example:

<a href="http://www.virusbtn.com/" rel="nofollow">VB</a>

The blog software adds the tag to links automatically. Spiders from Google, Yahoo, and MSN Search obey the tag.
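As a rough illustration of how blog software might add the tag automatically, the Python sketch below builds the HTML for a reader-submitted link. The function name `render_comment_link` is hypothetical and is not taken from any of the platforms mentioned; the sketch assumes the URL and link text arrive as plain strings and escapes both before output.

```python
from html import escape


def render_comment_link(url, text):
    """Build an anchor element for a reader-submitted link, marked rel="nofollow"."""
    safe_url = escape(url, quote=True)   # escape quotes so the attribute cannot be broken
    safe_text = escape(text)             # escape markup in the visible link text
    return '<a href="{}" rel="nofollow">{}</a>'.format(safe_url, safe_text)


print(render_comment_link("http://www.virusbtn.com/", "VB"))
```

Because the attribute is added when the link is rendered, commenters cannot opt out of it, which is the point of applying it in the software rather than relying on the person posting.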

Spam-blocking tools

Present-day spam-handling tools at the blog administrator level vary based on the blog software. Many platforms offer ways to delete comments and trackbacks en masse, at the post-level, or by other means.

On some group blogs, only the primary administrators have spam-deleting capabilities; ordinary contributors do not. WordPress has the Akismet plug-in, which learns what comments and trackbacks might be spam, captures them, and holds them for optional manual moderation before deleting them after a number of days. In Manila and several other platforms, it might be necessary to navigate through individual posts to control trackback or comment spam. Some tools, like Blogware, include an option at the user level to delete and block selected spam.
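The hold-then-delete workflow described above can be sketched in a few lines of Python. This is not Akismet's code or API: the class name `HeldComments`, its methods, and the 15-day retention period are all assumptions made for illustration, and the step that decides whether a comment looks like spam is left out entirely.

```python
from datetime import datetime, timedelta


class HeldComments:
    """Hypothetical holding area for suspected spam awaiting optional review."""

    def __init__(self, retention_days=15):  # assumed retention period
        self.retention = timedelta(days=retention_days)
        self._held = {}  # comment id -> (text, time the comment was held)

    def hold(self, comment_id, text, now):
        """Store a suspected spam comment instead of publishing it."""
        self._held[comment_id] = (text, now)

    def approve(self, comment_id):
        """A moderator rescues a comment; return its text for publishing."""
        text, _ = self._held.pop(comment_id)
        return text

    def purge(self, now):
        """Delete comments held for at least the retention period; return their ids."""
        expired = [cid for cid, (_, held_at) in self._held.items()
                   if now - held_at >= self.retention]
        for cid in expired:
            del self._held[cid]
        return expired


queue = HeldComments()
start = datetime(2007, 6, 1)
queue.hold(1, "Please visit mine", start)
queue.hold(2, "A genuine but odd-looking comment", start)
print(queue.approve(2))                          # moderator rescues comment 2
print(queue.purge(start + timedelta(days=30)))   # ids of comments deleted unreviewed
```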

Blocking spammers is not always the best or easiest option, though. Many spammers do not use the same IP address to attack weblogs, especially those who send out bots or scripts. Blocking an IP address is often, at best, a temporary measure and not very easy. It might be something the server administrator needs to do. Some spammers use common IP addresses, such as those from cafes, coffee shops, libraries and other places offering free wireless Internet access, or IP addresses from a particular Internet service provider (ISP). Blocking an IP address associated with one of these public locations or with an ISP might result in legitimate blog readers and even bloggers losing access to the weblog(s).
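The collateral damage from blocking shared addresses can be seen in a small Python sketch using the standard `ipaddress` module. The blocklist and all addresses below are made up (they come from ranges reserved for documentation), and the `is_blocked` helper is hypothetical rather than part of any blog platform.

```python
import ipaddress

# Assumed blocklist: one whole range (say, a cafe's wireless network) and one single host.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_blocked(address):
    """Return True if the address falls inside any blocked network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BLOCKED_NETWORKS)


visitors = {
    "spam script at the cafe": "203.0.113.10",
    "legitimate reader at the same cafe": "203.0.113.200",
    "spammer after switching address": "198.51.100.8",
}
for who, address in visitors.items():
    print(who, "->", "blocked" if is_blocked(address) else "allowed")
```

In this sketch the legitimate reader sharing the cafe's range is locked out, while the spammer who moves to a neighbouring address gets through, which is why a block is often only a temporary measure.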

Some software gives bloggers the option of requiring people to register on the weblog before they can submit comments. While this keeps some spammers and some scripts out, it is not foolproof. If the spammer is a human rather than a bot, that person might simply register for access to the weblog. Also, some bots and scripts can get around required registration on some platforms. Blocking specific people from a weblog might be possible, but a persistent spammer will change their user profile to gain access again.

Many platforms allow bloggers to choose whether the blog should offer comments. Since comments have become one of the main vehicles for blog spam, many managing editors consider turning them off. However, the comments form one of the main components of many weblogs. The blogosphere thrives on dialogue. Many bloggers want to be able to foster community, and turning off comments is often not what blog owners want to do.

Turings and captchas

It might be possible to add tougher registration requirements to weblogs. One option is the captcha (a completely automated public Turing test to tell computers and humans apart), such as a test that requires someone to describe an image, or to type out text displayed in an image, before they can post a comment to a weblog. Captchas do not work, however, when the spammer is a human.

One method shows many different images and asks the viewer which one has a particular attribute, like a red umbrella, or fits an adjective, like ‘cute’. While bots might not be able to handle a challenge-response test, it is probably only a matter of time before such bots include a way to test all of the possible image combinations and get past the captcha.
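To get a feel for why brute force is a concern with the image-selection method, consider a bot that simply guesses. The Python sketch below assumes each challenge shows a fixed number of images with exactly one correct answer, and that every attempt is an independent random guess; both are simplifying assumptions, and the function name is hypothetical.

```python
def chance_of_passing(num_images, attempts):
    """Probability that at least one of `attempts` random guesses is correct,
    assuming one correct image out of `num_images` per challenge."""
    miss_once = 1 - 1 / num_images
    return 1 - miss_once ** attempts


for attempts in (1, 5, 20, 50):
    print(attempts, "attempts:", round(chance_of_passing(9, attempts), 3))
```

Under these assumptions the odds of getting through rise quickly with the number of tries, so a scheme of this kind would presumably also need to limit how many attempts one visitor is allowed.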

For now, captchas might help curb the number of spam blogs (unless a human is creating them). The free service Blogger once had such a severe problem with spam blogs that it caused problems for people using search engines like Feedster, one of the search engines specializing in blogs and XML feeds. Splogs were clogging the search results so badly that it was difficult to find legitimate weblogs. Circumstances like that not only make searching difficult, but can make a search engine look like it is full of worthless content.

It is possible that someone could hack into a free weblog-hosting site and use scripts to set up many spam blogs instead of having people create them manually. Security on that kind of blog-hosting site is often much better than it was a few years ago, when blogging was new and spammers had not yet completely usurped it. Blogger addressed the problem it was having a few years ago, but splogs still exist. A search for ‘Viagra’ and ‘Blogspot’ in Technorati might reveal several splogs on Blogger to which that search engine gives top billing.

The way to go

A blogger battling spam might often feel quite overwhelmed and as if he/she is fighting a losing battle. It seems like the spammers are always either way ahead of us, or else not very far behind. Whenever a new spam-curbing tool becomes available, spammers seem able to break through the barrier in a matter of weeks. Even armed with the best and toughest tools available, perhaps the best thing a blogger can do is hope that one day bloggers will win and spammers will lose. Regrettably, though, that does not seem likely.

Some developers believe that better captcha tools with wider adoption are the way to go. Many popular platforms do not yet offer those tools, and no one can really predict how well they might work until they are widely implemented and results are available.

Many bloggers want better server blocks. But what should they be and how should we implement them? Should individual bloggers be able to control those blocks or should only server administrators have that level of control?

What’s particularly important, but sometimes lost in the shuffle, is for the servers on which bloggers work to be properly maintained, updated and secured. An insecure server can leave holes for spammers to exploit and lead to a variety of problems that are worse than the average blogger might imagine.

With the constant spread of badware, keeping blogging systems up to date and securely patched has become critical. Several reports have floated through the blogosphere recently about blogs that have been hacked or spammed with links to badware. If such an attack is severe enough, it will take a considerable amount of effort and energy to repair the damage and make the weblog usable in a safe manner again.

Concluding remarks

Despite the increases in spam during the last few years, many people are still happily blogging and reading weblogs. Although spammers pose lots of challenges to blog developers, the software is continuing to improve and offer more protection against spammers.

The only sure way to defeat blog spammers might be to stop blogging. For many people, that is simply not a desirable option.
