2002 November 22 Friday
Whitelists To Be Solution For Spam Junk Email?

Within a year or two, more than half of all email will be spam junk mail. What to do about it? One approach is to use whitelists to exclude all email from people you do not know:

But the filters are running out of gas. The spammers keep multiplying, and they keep finding clever ways to fool the systems designed to stop them. Promising newcomers such as CloudMark, which taps the collective power of e-mail recipients to identify spam, may improve things for a while. But there will always be a trade-off between catching all the spam and ensuring that every piece of legitimate e-mail gets through.

So, sophisticated Internet users are turning to a new approach. Instead of trying to block spam while allowing everything else, these users employ software that blocks everything except messages from already known, accepted senders. These systems, called "whitelists," change e-mail from an open system to a closed one.

There are practical problems with whitelists. Among the reasons why legitimate email could be filtered out:

  • People have more than one email address. So, for instance, you might have a home address for someone on your list, but then they might try to send you email thru a work address.
  • People change their email addresses when they change internet service providers.
  • Someone could get your email address from, for instance, classmates.com in order to contact you for legitimate reasons. The first time that email comes in, the sender's address is new to the recipient.
  • Automated tools could send email to notify you about some problem (eg a list admin daemon could send a warning that some email being sent to your account is bouncing due to conventional junk mail filtering done by an ISP). The sending address would be a new address from your standpoint.
  • A public figure (commentator, politician, etc) might want to make an email address public in order to get comments from the larger public. A whitelist is not a realistic option for such email addresses.
  • A large variety of email addresses are used for reporting problems (eg web site main admin addresses and some tech support email addresses) from users who are often totally unknown to an organization before they first send in a message.

The basic problem is that there are a variety of legitimate reasons why email gets sent from addresses which wouldn't already be in the receiver's address book. Another problem is that junk mail senders can fake the originating email address. So junk mail that pretends to be from an address on a whitelist could get thru.
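To make both problems concrete, here is a minimal sketch of a whitelist check in Python. It assumes the filter looks only at the From header; the function name and the addresses are made up for illustration and are not taken from any particular product.

```python
from email.utils import parseaddr

def is_whitelisted(from_header, whitelist):
    """Return True if the address in a From header is on the whitelist."""
    # parseaddr splits 'Name <user@host>' into (name, address).
    _, address = parseaddr(from_header)
    return address.lower() in whitelist

# Hypothetical whitelist holding one known correspondent.
whitelist = {"alice@example.com"}

# The known home address passes.
print(is_whitelisted("Alice <alice@example.com>", whitelist))
# The same person writing from a work address is blocked.
print(is_whitelisted("Alice <alice@work.example>", whitelist))
# A junk mailer who forges the From header passes, because the
# check has no way to tell who really sent the message.
print(is_whitelisted("Cheap Pills <alice@example.com>", whitelist))
```

The check only ever sees what the sender chose to put in the header, which is why a forged address gets thru while a real correspondent with a new address does not.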

There are a few methods proposed for dealing with this problem of legitimate email from senders who aren't already on a whitelist. One could put it in a folder that the user would occasionally glance thru to look for what might be legitimate email. Many of us do that with existing email that our filters route to junk mail folders. Another option would be to have automated software respond to the suspect mail, asking the originator to read a GIF image and identify a keyword embedded in a thatched pattern. The originator would then either go to a web page linked from the response mail or reply with an email that contains the keyword. Basically, the idea is to ensure that a human cares enough about getting the email thru to look at the response and do something to get registered as a real human sender of individual email messages.
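The challenge and response idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, a random token stands in for the keyword hidden in the image, and actually sending the challenge mail is left out.

```python
import secrets

class ChallengeFilter:
    """Hold mail from unknown senders until they answer a challenge."""

    def __init__(self, whitelist):
        self.whitelist = {address.lower() for address in whitelist}
        self.pending = {}  # token -> (sender, message)

    def receive(self, sender, message):
        sender = sender.lower()
        if sender in self.whitelist:
            return ("deliver", None)
        # Unknown sender: hold the message and issue a token that
        # would be embedded in the challenge sent back to them.
        token = secrets.token_hex(8)
        self.pending[token] = (sender, message)
        return ("challenge", token)

    def confirm(self, token):
        """Release a held message if the token matches; else None."""
        entry = self.pending.pop(token, None)
        if entry is None:
            return None
        sender, message = entry
        self.whitelist.add(sender)  # the sender is now known
        return message

mail_filter = ChallengeFilter(["friend@example.com"])
action, token = mail_filter.receive("new@example.org", "Hello")
print(action)                      # the first message is held
print(mail_filter.confirm(token))  # answering the challenge releases it
```

Once a sender has answered one challenge, later messages from that address are delivered without another round trip.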

The sharing of whitelists has been proposed. That way, for instance, everyone in a company that deals with some other set of companies could use the whitelists for those other companies. One problem is that these shared whitelists would become valuable for junk mailers to acquire. After all, the bulk of their entries would tend to be real, in-use email addresses that could be added to junk mailing lists. Plus, by analysing whitelists the junk mailers can choose originating addresses to fake. It is easy for spammers to put a fake value in the From address field, and picking one from a whitelist would up the odds that a junk mail message gets thru.

One response to the problem of spammers using whitelists as part of their toolbox would be to obscure the email addresses in the whitelists with a one-way hash. Dan Brickley has proposed using an RDF format file to allow sharing of whitelists. He calls this approach FOAF, for Friend Of A Friend. He proposes hashing the addresses with SHA-1 to hide them:

This is an experiment based on the idea of sharing lists of garbled email addresses, ie instead of sharing 'mailto:danbri@w3.org' we might share '357fdd378d61684762ed88277192cfdf001189af', which is what we get when we feed that address to the sha1 algorithm. Consumers of this data can do the same thing with addresses from incoming mail, and then check to see if the resulting value is on the (garbled) whitelist.
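A rough Python sketch of that check follows. It assumes the hash is taken over the exact string "mailto:" plus the address, as in the quoted example; the function names are made up for illustration.

```python
import hashlib

def garble(address):
    """Return the SHA-1 hex digest of the mailto: URI for an address."""
    uri = "mailto:" + address
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()

def sender_on_whitelist(address, hashed_whitelist):
    # Hash the incoming address the same way and look it up.
    return garble(address) in hashed_whitelist

# A shared whitelist holds only the garbled values.
shared = {garble("danbri@w3.org")}

print(sender_on_whitelist("danbri@w3.org", shared))
print(sender_on_whitelist("stranger@example.com", shared))
```

Because the hash covers the exact characters, any difference in spelling or letter case yields a different value, so the publisher and the consumer of a list would have to agree on how addresses are written.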

One problem with hashed shared whitelists is that if someone were to give you one, you'd have no way of knowing who you are opening yourself up to receiving email from. Another problem is that a junk emailer who has a huge database of email addresses could get a copy of a whitelist, run all of those addresses thru the same hashing algorithm, and compare the output to the entries in the whitelist. The irony here is that the junk emailers, because they have such large numbers of email addresses in their databases, are in a better position than anyone else to figure out which addresses the hashed values in the whitelists stand for.
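The comparison a junk emailer would run takes only a few lines. This sketch assumes the whitelist entries are SHA-1 hashes of "mailto:" plus the address, as in Brickley's example; the addresses and function names are invented for illustration.

```python
import hashlib

def sha1_mailto(address):
    """Hash an address the same way the shared whitelist does."""
    return hashlib.sha1(("mailto:" + address).encode("utf-8")).hexdigest()

def recover_addresses(hashed_whitelist, known_addresses):
    """Return the known addresses whose hashes appear on the whitelist."""
    return {a for a in known_addresses if sha1_mailto(a) in hashed_whitelist}

# A whitelist as it would be shared: hashes only.
hashed_whitelist = {
    sha1_mailto("alice@example.com"),
    sha1_mailto("bob@example.org"),
}

# The junk emailer's existing database of harvested addresses.
harvested = ["alice@example.com", "carol@example.net", "bob@example.org"]

print(sorted(recover_addresses(hashed_whitelist, harvested)))
```

Each address costs one hash to test, so the effort grows only with the size of the junk emailer's database, not with the strength of the hash.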

It might be possible to prevent spammers from faking at least some From addresses by creating a group of trusted mail servers (the SMTP servers that send and receive mail, as opposed to the POP servers that users download it from) in which each server knows which From domains every other server is allowed to originate email for. The sending servers would have to enforce on senders that they can only send email with the specific From addresses that have been assigned to them. The receiving servers would have to know what domains each sending server can legitimately use. If an email arrives from an untrusted server and its From field contains a domain that is "owned" by a trusted server, then the receiving server would know to reject the email.
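The receiving side of that scheme might look something like the sketch below. The table of which servers may send for which domains, the server names, and the function name are all hypothetical; how such a table would be distributed and kept honest is left open here.

```python
def accept(from_address, sending_server, domain_owners):
    """Decide whether to accept mail based on who may send for a domain.

    domain_owners maps a From domain to the set of servers trusted
    to originate mail for it.
    """
    domain = from_address.rsplit("@", 1)[-1].lower()
    allowed_servers = domain_owners.get(domain)
    if allowed_servers is None:
        # No trusted server has claimed this domain, so this check
        # cannot say anything about the message either way.
        return True
    return sending_server in allowed_servers

# Hypothetical registry shared among the trusted servers.
domain_owners = {"example.com": {"mail.example.com"}}

# Sent by the server that owns the domain: accepted.
print(accept("alice@example.com", "mail.example.com", domain_owners))
# Same From domain, but sent by some other server: rejected.
print(accept("alice@example.com", "relay.spammer.example", domain_owners))
# Domain nobody has claimed: passed along to other filtering.
print(accept("bob@unclaimed.example", "relay.spammer.example", domain_owners))
```

Note that this only protects domains whose owners take part; a spammer can still fake addresses in any domain that is not in the table.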

Posted by Randall Parker at November 22, 2002 08:34 AM