Last-minute presentation: Human-based computation: how crowd-sourcing can solve some of the trickiest security problems

Sumesh Jaiswal, Symantec

Human-based computation (HBC) is a class of hybrid techniques in which a computer program outsources certain steps of its function to humans. Normally, a person asks a computer to perform a task and receives the result. However, not all problems can be solved accurately by computers, and this is where HBC reverses the roles by handing the parts that computers cannot solve back to humans. Google, for example, uses HBC to label billions of images in its archives by having people play online games that, as a side effect, label the images.

In the security space, we regularly come across problems that cannot be completely solved by computers and that are good candidates for HBC. This paper discusses solutions to four such problems in the security domain: phishing website identification, typo analysis, data classification and spam identification. Two operational models work well with HBC: paid (such as Amazon's Mechanical Turk) and game-based (such as Google Image Labeler).

Automated methods for identifying phishing websites suffer from poor accuracy. This paper describes an HBC game that displays screenshots of genuine and suspected phishing websites to human players who accurately identify phishing websites while playing the game.
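The abstract does not say how the answers of individual players are combined into a verdict. One common approach in crowd-sourced labelling, shown here purely as an illustrative sketch, is to show the same screenshot to several independent players and accept a label only when enough of them agree. The function name, the label strings and both thresholds below are assumptions, not details taken from the paper.

```python
from collections import Counter


def aggregate_votes(votes, min_votes=3, min_agreement=0.8):
    """Combine player answers for one screenshot (hypothetical helper).

    votes: list of labels such as "phishing" or "genuine".
    Returns the agreed label, or None if the crowd has not yet
    produced enough votes or enough agreement.
    """
    if len(votes) < min_votes:
        return None  # not enough players have seen this screenshot
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return label
    return None  # players disagree; keep showing the screenshot


if __name__ == "__main__":
    print(aggregate_votes(["phishing"] * 5 + ["genuine"]))
    print(aggregate_votes(["phishing", "genuine", "phishing"]))
```

Requiring agreement among several players is what lets a game tolerate careless or malicious individuals; a screenshot that stays undecided can simply be shown to more players.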

Keyboard typographical analysis is a critical step in developing statistical models to detect cyber-squatting. This requires the collection and analysis of the keyboard typographical error patterns of millions of users, which makes it an excellent problem for HBC. We present an HBC-based online game that collects this data, which is ultimately used to develop probabilistic and statistical models that detect and rank URL and email typos.
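As a rough illustration of how collected typing data could feed a ranking model, the sketch below estimates per-key substitution probabilities from observed (intended key, typed key) pairs and uses them to rank single-character typos of a name. This is a deliberately simplified assumption about the model: it covers substitutions only, and the function names and data layout are hypothetical rather than taken from the paper.

```python
from collections import Counter


def substitution_probs(observations):
    """Estimate P(typed | intended) from (intended, typed) key pairs.

    Correct keystrokes are included as pairs where both keys match.
    """
    pair_counts = Counter(observations)
    totals = Counter()
    for (intended, _typed), n in pair_counts.items():
        totals[intended] += n
    return {pair: n / totals[pair[0]] for pair, n in pair_counts.items()}


def rank_typos(name, probs):
    """Rank single-substitution typos of `name`, most likely first."""
    scored = {}
    for i, ch in enumerate(name):
        for (intended, typed), p in probs.items():
            if intended == ch and typed != ch:
                typo = name[:i] + typed + name[i + 1:]
                scored[typo] = scored.get(typo, 0.0) + p
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy data standing in for keystrokes gathered from players.
    data = ([("o", "o")] * 8 + [("o", "p")] * 2
            + [("g", "g")] * 9 + [("g", "h")] * 1)
    for typo, score in rank_typos("go", substitution_probs(data)):
        print(typo, round(score, 2))
```

A fuller model would also need to account for omitted, doubled and transposed characters, but the same idea applies: the more often players make a given mistake, the higher the corresponding typo ranks.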

Data classification based on machine-learning techniques requires humans to pre-classify a training corpus. We present this problem in the form of an HBC game that is capable of pre-classifying huge corpora in a very short period of time.

While existing anti-spam technologies easily handle textual spam, they are ineffective when it comes to image spam, VoIP spam and video spam. The fourth HBC-based game described in this paper has been designed to accurately identify such spam using real human users.