Karl A. Krueger has written an interesting article reviewing various spam-fighting techniques, entitled The Spam Battle 2002: A Tactical Update.
Vernon Schryver's DCC: Measuring Bulkiness
DCC, short for Distributed Checksum Clearinghouse, is a client/server system for the detection of bulk mail. (Schryver) A DCC client is usually an SMTP server, though it may also be a mail user agent (MUA -- a mail client). Whenever it receives a message, it calculates several checksums of that message, and transmits them to a server, which returns the number of times it has seen each of those checksums. If a message has been seen many times by DCC clients, these numbers will be high, indicating that the message is likely bulk mail. DCC servers can also exchange checksums with one another, forming a redundant server-network similar in structure to that of IRC.
As the above description should make clear, DCC does not attempt to judge whether a message is spam. Vernon Schryver, the system's creator, believes that it is not feasible for an unintelligent system to accurately discern whether a particular message is spam. What DCC judges is the "bulkiness" of the message -- how many copies of it have been transmitted. As a result, clients which reject mail on this basis must also maintain a whitelist of non-spam bulk mail senders, such as legitimate mailing lists. This imposes some overhead on DCC users, but presumably not as much as maintaining a local blacklist of every spam source.
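To make the counting idea concrete, here is a minimal sketch of the client/server exchange described above. It is an illustration only, not DCC's actual protocol or code: the names `ToyClearinghouse`, `checksums_of`, and `is_unwanted_bulk`, the single exact SHA-256 checksum, the threshold of 3, and the choice to skip reporting for whitelisted senders are all assumptions made for the example.

```python
import hashlib
from collections import Counter


class ToyClearinghouse:
    """Hypothetical in-memory stand-in for a DCC server (not the real protocol)."""

    def __init__(self):
        self.counts = Counter()

    def report(self, checksums):
        # Record one sighting of each checksum and return the running totals.
        totals = []
        for checksum in checksums:
            self.counts[checksum] += 1
            totals.append(self.counts[checksum])
        return totals


def checksums_of(message):
    # Assumption: one exact digest of the whole message. The article says the
    # real client computes several checksums, including fuzzy ones.
    return [hashlib.sha256(message.encode("utf-8")).hexdigest()]


def is_unwanted_bulk(server, sender, message, whitelist, threshold=3):
    # Whitelisted bulk senders (e.g. legitimate mailing lists) are accepted.
    # Assumption: this sketch does not report their mail to the server at all.
    if sender in whitelist:
        return False
    totals = server.report(checksums_of(message))
    # "Bulkiness" is a count of sightings, not a judgment about content.
    return max(totals) >= threshold


server = ToyClearinghouse()
whitelist = {"list@example.org"}
verdicts = [
    is_unwanted_bulk(server, "someone@example.com", "Buy now", whitelist)
    for _ in range(3)
]
```

In this toy run the first two copies pass and the third reaches the threshold, while the same text from the whitelisted address would be accepted regardless of the count.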
The checksums that DCC uses are not the same kind of checksums used by cryptographic algorithms. A crypto checksum or message digest is designed to maximize the output change caused by a small input change. Since spammers usually add changing elements such as tracking numbers to spam messages, such a checksum would not work for spam. Instead, the DCC checksums are fuzzy checksums under which such small input changes do not change the output. These work by checksumming not the bits of the message, but the arrangement of meaningful elements such as letters and URLs.
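The difference between the two kinds of checksum can be sketched as follows. This is not DCC's real fuzzy algorithm, which the article does not spell out; it is one simple way to get the property described, under the assumption that the changing elements are digits and URL paths. The function names and the normalization rules (keep runs of letters, reduce each URL to its host) are hypothetical.

```python
import hashlib
import re

# Matches an http(s) URL and captures only its host part.
URL_RE = re.compile(r"https?://([^/\s]+)\S*")


def exact_checksum(text):
    # An ordinary message digest: any changed character changes the output.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def fuzzy_checksum(text):
    # Reduce each URL to a "url:host" token, dropping path and query string.
    text = URL_RE.sub(lambda m: " url:" + m.group(1).lower() + " ", text)
    # Keep URL tokens and runs of letters; digits and punctuation are dropped,
    # so numeric tracking codes no longer affect the result.
    tokens = re.findall(r"url:\S+|[a-z]+", text.lower())
    return hashlib.sha256(" ".join(tokens).encode("utf-8")).hexdigest()


first = "Buy now at http://example.com/offer?id=12345 ref 98765"
second = "Buy now at http://example.com/offer?id=54321 ref 11111"

same_exact = exact_checksum(first) == exact_checksum(second)
same_fuzzy = fuzzy_checksum(first) == fuzzy_checksum(second)
```

Here the two messages differ only in their tracking numbers, so the exact digests differ while the fuzzy ones match. A spammer who varies letters rather than digits would defeat this particular sketch, which is why a real system needs a more careful notion of "meaningful elements".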
The New Scientist reports on a new technique for fighting spam developed by AT&T researcher John Ioannidis. It involves the use of special encrypted email addresses.
The Single Purpose addresses consist of a few dozen characters before the @ sign. The reply conditions are encoded using a secret cryptographic key, so that a spammer cannot create fake addresses. The addresses might look like nonsense but could easily be processed by computers, Ioannidis says. They could be posted to the web or used to subscribe to a mailing list without fear of receiving a barrage of spam in return. A much simpler "unlimited use" address would be kept for personal correspondence, he says.
This article really doesn't explain how this technique works. Does the sender make a public key available for reading the address so that receivers can know who it is from and that it really is a valid originating address? Does each receiver need to know the public decrypting key of each sender he gets email from? Or are the keys shared at the level of POP servers?
Is the purpose to allow only each receiver to be able to reply to a given sender with the customized response address? I don't think so. Or is the purpose to allow receivers to know that the original sender is really who he says he is?
Posted by Randall Parker at December 06, 2002 09:36 AM