“Spam and the ongoing battle for the Inbox”

 

Definitions:

 

Spam - Unsolicited email, often advertising a product or service. Spam can occasionally “flood” an individual or ISP to the point that it significantly slows the flow of data.

Phishing - In a computing context, phishing is the impersonation of a corporation or other trusted institution in order to extract passwords or other sensitive information from the victim. It is a form of criminal activity that relies on social engineering techniques, and it is typically carried out over e-mail or an instant messaging program. The message is made to appear to come from an authentic source so that the victim will either respond directly or open a link to a fake web site run by the criminals.

 

Spam Filter - Software that uses various techniques to redirect unwanted email away from a user’s inbox. These filters can be based on a variety of criteria, including the sender's email address and specific words in the subject or message body, and can be implemented by end users as well as ISPs.

Unique Clicks - The number of different individuals who click on an ad link within a specific period of time.

Human interaction proof (CAPTCHA) - HIPs (also known as "completely automated public Turing tests to tell computers and humans apart," or CAPTCHAs, or simply Turing tests) are a key component in preventing abuse. The most common type of HIP is an image of a sequence of letters and digits that has been automatically distorted. One common use is at sign-up for free email accounts, where users are required to solve a HIP by correctly entering the sequence of letters and digits shown in the image. Without HIPs, spammers would use these services to produce a torrent of spam. Several products (such as MailBlocks and Matador) have used HIP challenges for suspected spam as a kind of economic approach. HIPs also prevent, for instance, the automated harvesting of Web site data and automated attempts to steal passwords.

What percentage of overall mail volume is spam?

“Spam has increased from approximately 10% of overall mail volume in 1998, constituting an annoyance, to as much as 80% today” 

Give two examples of the escalation of technology taking place between spammers and spam filters.

Training Machine Method

Using machine learning to train a spam filter has been widely accepted. Naïve Bayes, among other spam-filtering methods, uses a learning algorithm. “The Naïve Bayes method is used to find the characteristics of the spam mail versus those of the good mail. Future messages can be automatically categorized as highly likely to be spam, highly likely to be good, or somewhere in between. The earliest learning approaches were fairly simple, using the Naive Bayes algorithm to count how often each word or other feature occurs in spam messages and in good messages. To be effective these methods need training data (known spam and known good mail) to train the system.”
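The word-counting idea in the quoted passage can be sketched roughly as follows. This is a minimal illustration, not the implementation used by any real filter: the function names (`tokenize`, `train`, `spam_probability`), the whitespace tokenizer, and the add-one smoothing are all assumptions made for the example, and both training lists are assumed to be non-empty.

```python
import math
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def train(spam_msgs, good_msgs):
    # Count how often each word occurs in known spam and known good mail.
    spam_counts = Counter(w for m in spam_msgs for w in tokenize(m))
    good_counts = Counter(w for m in good_msgs for w in tokenize(m))
    return {
        "spam": spam_counts,
        "good": good_counts,
        "vocab": set(spam_counts) | set(good_counts),
        "prior_spam": len(spam_msgs) / (len(spam_msgs) + len(good_msgs)),
    }

def spam_probability(model, message):
    # Sum log-probabilities of each known word under both classes.
    log_spam = math.log(model["prior_spam"])
    log_good = math.log(1 - model["prior_spam"])
    vocab_size = len(model["vocab"])
    spam_total = sum(model["spam"].values())
    good_total = sum(model["good"].values())
    for word in tokenize(message):
        if word not in model["vocab"]:
            continue  # words never seen in training carry no evidence
        # Add-one smoothing (an assumption) avoids zero probabilities.
        log_spam += math.log((model["spam"][word] + 1) / (spam_total + vocab_size))
        log_good += math.log((model["good"][word] + 1) / (good_total + vocab_size))
    # Convert the two log scores into a probability of spam.
    return 1 / (1 + math.exp(log_good - log_spam))
```

A score near 1 corresponds to "highly likely to be spam", a score near 0 to "highly likely to be good", and scores around 0.5 to the "somewhere in between" case.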

Other Training Algorithm Methods

More sophisticated algorithms have since emerged that put “weights” on words. Each word has a specific weight, and if the weights of the words in an email message total more than a specified threshold, the message is flagged as spam. These algorithms "learn" a weight for each word in a message. The weights are carefully adjusted so results derived from the training examples of both spam and good email are as accurate as possible. The learning process may require repeatedly adjusting tens of thousands or even hundreds of thousands of weights, a potentially time-consuming process. Fortunately, progress in machine learning over the past few years has made such computation possible.
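As a rough sketch of how such weights might be learned, the example below repeatedly nudges each word's weight up or down depending on how wrong the current prediction is. The notes do not name a specific algorithm, so the logistic-regression-style update used here, along with the names `train_weights`, `score`, and `is_spam` and the default learning rate, should be read as assumptions for illustration only.

```python
import math

def score(weights, words):
    # Total weight of the distinct words in a message (unknown words count as 0).
    return sum(weights.get(w, 0.0) for w in set(words))

def train_weights(examples, epochs=50, rate=0.5):
    # examples: list of (message_text, is_spam) pairs of known spam and good mail.
    weights = {}
    for _ in range(epochs):
        for text, label in examples:
            words = text.lower().split()
            predicted = 1 / (1 + math.exp(-score(weights, words)))
            error = (1.0 if label else 0.0) - predicted
            # Move every word in the message toward the correct answer.
            for w in set(words):
                weights[w] = weights.get(w, 0.0) + rate * error
    return weights

def is_spam(weights, text, threshold=0.0):
    # Flag the message when its total weight exceeds the threshold.
    return score(weights, text.lower().split()) > threshold
```

With tens of thousands of words and many training messages, the inner loop runs a very large number of times, which is the "potentially time-consuming process" mentioned above.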

Compression Technique

“Compression-based technique is more effective for spam filtering than traditional machine learning systems. Compression-based systems build a model of spam and a model of good email. A new message is compressed using both the spam model and the good-email model. If the message compresses better with the spam model, the message is likely spam; if it compresses better with the good-email model, the message is more likely legitimate.”
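The comparison described in the quote can be approximated with a general-purpose compressor. In this sketch, Python's standard `zlib` module is swapped in as a stand-in for a purpose-built compression model; the approach of measuring how many extra bytes a message adds to each corpus, and the names `extra_bytes` and `classify`, are assumptions for illustration rather than the method from the quoted source.

```python
import zlib

def compressed_size(data: bytes) -> int:
    # Size in bytes after compressing at the highest zlib level.
    return len(zlib.compress(data, 9))

def extra_bytes(corpus: str, message: str) -> int:
    # How many additional compressed bytes the message costs
    # when appended to a corpus the compressor has already seen.
    base = compressed_size(corpus.encode("utf-8"))
    combined = compressed_size((corpus + "\n" + message).encode("utf-8"))
    return combined - base

def classify(spam_corpus: str, good_corpus: str, message: str) -> str:
    # The message "compresses better" with whichever corpus makes it cheaper.
    spam_cost = extra_bytes(spam_corpus, message)
    good_cost = extra_bytes(good_corpus, message)
    return "spam" if spam_cost < good_cost else "good"
```

A message that reuses phrases from the spam corpus costs only a few extra bytes there, because the compressor can point back to text it has already seen, while the same message costs more against the good-email corpus.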

 

IP Address Filtering Method

IP address filtering has been used to block spam, but spammers circumvent it by taking control of other people's machines to create a horde of zombies, or “bot-net,” and using those machines as drones to do the dirty work of sending their spam from unblocked IP addresses.
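A blocklist check of this kind might look like the sketch below, which uses Python's standard `ipaddress` module. The blocked ranges are reserved documentation addresses chosen purely as placeholders, and `BLOCKED_NETWORKS` and `is_blocked` are hypothetical names.

```python
import ipaddress

# Placeholder blocklist built from reserved documentation address ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]

def is_blocked(sender_ip: str) -> bool:
    # Reject mail when the sending address falls inside any blocked network.
    addr = ipaddress.ip_address(sender_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)
```

The sketch also shows why bot-nets defeat the approach: a zombie machine sending from an address that is not yet on the list passes the check.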

Certificates & Security ID Method

Certificates or Security ID methods have been robust to most attacks but too difficult to deploy for practical reasons. They typically focus on the identity of a person rather than the identity of an email address, thus requiring a certifying agency of some sort. Some proposals would require all Internet users to go to their local Post Office and pay a fee to get a certificate. In addition, these proposals usually require some form of attachment or inclusion in the email message itself, confusing some users.

Similarity Matching Method

Similarity-matching solutions are among the most widely deployed spam filtering techniques. They attempt to find examples of known spam; for example, email that has gone to a special trap account that should receive no legitimate email, or email that users have complained about. They then try to match new messages to this known spam. Spammers actively randomize their email in an attempt to defeat these matching systems. In some cases (such as spam where the primary content is an image meant to defeat both matching-based and machine-learning-based text-oriented filters), spammers even randomize the image to defeat image-matching technologies.
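One way to match a new message against known spam while tolerating small randomized changes is to compare overlapping word sequences. The notes do not say which matching technique deployed systems use, so the word shingles, Jaccard similarity, and the 0.5 threshold below are assumptions chosen for illustration.

```python
def shingles(text, k=3):
    # Break a message into overlapping runs of k consecutive words.
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    # Fraction of shingles the two messages share.
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def matches_known_spam(message, known_spam, threshold=0.5):
    # Flag the message if it is close enough to any trapped or reported spam.
    msg = shingles(message)
    return any(jaccard(msg, shingles(s)) >= threshold for s in known_spam)
```

Appending a random token to a known spam message changes only one shingle, so the message still matches; heavier randomization lowers the overlap, which is why spammers rely on it.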

What have spammers done to circumvent anti-spam methods?

Spammers have not sat idle while algorithms have been ramped up. “Traditional machine learning for spam filtering has many weaknesses. Initially, spammers sought to overcome these filters by making sure that words with large (spammy) weights, like "free," did not appear verbatim in their messages. For instance, they might break the word into multiple pieces using an HTML comment (fr<!--><-->ee) or encode it with HTML ASCII codes (fr&#101;e). When displayed to a user, both these examples look like "free," but for spam-filtering software, especially on servers, any sort of complex HTML processing is too computationally expensive, so the systems do not detect the word "free."”
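To make the obfuscation concrete, the sketch below undoes the two tricks with Python's standard `re` and `html` modules. It is a simplified, regex-based illustration and not a real HTML parser; the name `normalize` and the sample strings in the usage note are assumptions, and the quoted passage notes that fuller HTML processing can be too costly at server scale.

```python
import html
import re

COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)  # HTML comments
TAG = re.compile(r"<[^>]+>")                    # any remaining tags

def normalize(raw: str) -> str:
    # Remove comments first so a word split by one is rejoined.
    no_comments = COMMENT.sub("", raw)
    no_tags = TAG.sub("", no_comments)
    # Decode character references such as &#101; back to plain letters.
    return html.unescape(no_tags)
```

For instance, `normalize("fr<!-- -->ee")` and `normalize("fr&#101;e")` both yield the plain word that a word-weight filter would otherwise miss.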

Anti-Spam Links & Tips for you!

 

FTC Spam website - http://www.ftc.gov/spam/

Sign up here so that your email address does not receive junk email.

 

Make A Difference! - http://spam.abuse.net/bits/makeadifference.shtml

How you can help out in the fight against spammers.


How to Prevent Spam - http://www.spamlaws.com/prevent-spam.html

A list of guidelines to help you remove spam and decrease the amount you receive.