Search engines in research and vulnerability assessment

2007-11-01

Alex Eckelberry

Sunbelt Software, USA
Editor: Helen Martin

Abstract

'Search engines are free, powerful and efficient tools that can be used to find vulnerabilities and hacked sites on the web, and even in your own organization.' Alex Eckelberry, Sunbelt Software.


On 2 October an odd thing happened: state and government websites in California started shutting down. It turned out that the US General Services Administration (GSA) had overreacted somewhat to reports of pornography being hosted on a website for the Transportation Authority of Marin County, and simply pulled the plug on the entire ca.gov domain.

News that the Marin County website was serving porn came as no great surprise to malware researchers. A number of individuals had contacted the owners of the site in mid-September, alerting them to the problem. Unfortunately these alerts were ignored, as they were believed to be ‘phishing’ attempts.

The primary source of the problem was an apparent DNS hack, which redirected parts of the Marin County website to pornographic sites. These were redirects – the government site itself was not hosting porn. The hack occurred at an outsourced provider, which didn’t have the tightest security practices in place.

As many in the research community know, finding these types of hack is trivial work – it can often be a matter of using simple Google searches such as sex porn site:gov.

Malware researchers are quite familiar with the power of search engines in conducting research. Search engines are free, powerful and efficient tools that can be used to find vulnerabilities and hacked sites on the web, and even in your own organization.

Generally, one sees websites compromised through stolen FTP credentials; unpatched (usually open source) software, including poorly maintained LAMP stacks; the increasing use of collaborative, ‘web 2.0’-type software (wikis, tikis, etc.); DNS hacks; poorly written ASP code; sloppy PHP work and SQL hacks.

So-called ‘Google dorks’ can be useful in finding compromised and malware-hosting websites. The term was coined by Johnny Long on his website johnny.ihackstuff.com, where he described the practice (which had been around for some time) of using Google searches to find ‘dorks’ – people who expose too much information on the web.

‘Google dorking’ has evolved, and malware researchers continue to fine-tune their searches to find vulnerable websites and malware. Furthermore, by using Google’s Alerts feature, one can input a number of different searches and receive alerts when a matching site is found – useful in finding newly compromised or rogue sites.

As an example, some broad starter queries might be any of the following: inurl:traff site:.biz (or .info), inurl:in.cgi site:.info, inurl:klik site:.info, or intitle:"index of" followed by any of a number of terms, such as "love exe", "jpg exe", porn exe, xxx pif, "bot exe" or "gif exe". A complete search might thus be intitle:"index of" xxx pif -filetype:html -filetype:php -filetype:htm or, as another example, intitle:"index of" "bot exe" -filetype:html -filetype:php -filetype:htm.
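Researchers who maintain long lists of such terms often generate the queries rather than typing them by hand. A minimal sketch of that idea, using the example terms above (the function name and structure are illustrative, not any particular researcher's tooling):

```python
# Sketch: assemble intitle:"index of" dork queries from a list of
# suspicious filename terms, excluding common web-page filetypes so
# that directory listings, not ordinary pages, dominate the results.
TERMS = ['"love exe"', '"jpg exe"', 'porn exe', 'xxx pif', '"bot exe"', '"gif exe"']
EXCLUDE = ["html", "php", "htm"]

def build_dork(term, exclusions=EXCLUDE):
    """Return one complete 'index of' query string for a single term."""
    negatives = " ".join(f"-filetype:{ext}" for ext in exclusions)
    return f'intitle:"index of" {term} {negatives}'

queries = [build_dork(t) for t in TERMS]
for q in queries:
    print(q)
```

Each generated string can then be pasted into a search box, or fed into an Alerts-style watch list one query at a time.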

Since ‘jump pages’ are often created with the sole purpose of being indexed on search engines and redirecting visitors to other content, running searches on something like site:nm.ru might prove useful. Alternatively, searches can be performed for specific directory structures of frameworks used in malware, such as /stata/index.php.

Pornography and malware distributors commonly hack into websites for search engine optimization and increased distribution (ironically, Google's work in marking sites as 'unsafe' in search results is likely driving malware and porn distributors to rely increasingly on hacking 'good sites' to perform redirections to their own bad sites). Finding these hacked sites is similarly trivial. One can simply look for any combination of terms, such as ‘porn, free ringtones, free casino’, followed by some operators to narrow down the search.

Some knowledge of the language used by the distributors also helps – ‘sesso’ and ‘fottilo’, for example, are often used by Italian malware and porn distributors (such as Gromozon). At the time of writing this article, the search sesso OR gratuito porno OR fottilo site:gov produces some rather interesting (and sometimes very dangerous) results.

One can continue to experiment by adding different domains and additional operators to the searches. It’s common to find plenty of comment spam using these methods, but very often you’ll also find compromised websites.

Organizations large and small can use similar searches to find vulnerabilities on their own sites. This holds especially true for larger organizations that work in collaborative environments, such as academic institutions and some governmental organizations. For example, problems with vulnerabilities commonly exist in colleges and universities, where students are often provided with their own websites, academic discourse is encouraged through open source collaborative software, and servers are managed by different groups throughout a campus. It’s a recipe for disaster, and that’s often exactly what happens – finding hacked university websites is almost trite work. IT administrators could complement their security toolboxes with search engines, seeking inappropriate content on their own domains.
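The self-audit approach described above amounts to scoping a handful of suspicious terms to one's own domain. A minimal sketch, where the domain "example.edu" and the term list are purely illustrative assumptions:

```python
# Sketch: generate site-scoped search queries an IT administrator
# could run against their own domain to look for inappropriate
# content. The domain and terms below are illustrative only.
SUSPECT_TERMS = ["porn", '"free ringtones"', '"free casino"']

def self_audit_queries(domain, terms=SUSPECT_TERMS):
    """Yield one search query per suspicious term, scoped to the domain."""
    for term in terms:
        yield f"{term} site:{domain}"

for query in self_audit_queries("example.edu"):
    print(query)
```

Running each query on a schedule (manually, or via an alerting feature) turns an occasional spot check into a cheap ongoing monitor.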

In addition to finding malware on the web, there are numerous (and often hair-raising) searches available that can be used to find vulnerabilities on a site. Queries are limited only by creativity, technical acumen and knowledge of data structures.

Finally, there are distinct differences between search engines. Yahoo, Google and Live.com present similar data, but sometimes one provides clearer results than the others. Live.com has the powerful and unique feature of allowing IP searches. Searching for common malware IPs produces profitable results, such as ip:89.28.13.208, ip:89.28.13.213, and so on.

Some researchers are frustrated by the inability to search within the source of web pages – which, if provided, would open up a mother lode of information, obviating the need to use proprietary spiders. For now, however, one can find plenty of information using simple searches, enabling the research community to find bad things before they get out broadly to the public – and in the process, hopefully making some impact on the safety of the Internet experience for users.

Sunbelt researchers Francesco Benedini and Adam Thomas contributed to this article.
