CEAS 2007 Live Spam Challenge

Competition Guidelines
Version 1.0

Overview

The CEAS Live Spam Challenge is an online anti-spam filtering competition in which anti-spam filters are tested on a live stream of spam and ham messages. Each filter will be asked to process a live e-mail stream and label each incoming message as spam or ham. The competition environment accurately simulates a typical e-mail installation, so most existing anti-spam systems can be used essentially out of the box.

Competition Model

Each participant will be assigned a subdomain of ceas-challenge.cc to filter. The competition e-mail stream will be multiplexed to each participating filter such that each filter receives an essentially identical set of messages. The messages received by each filter will only differ in that e-mail addresses will be rewritten to match the subdomain assigned to the destination filter.

The test e-mail stream will be collected from several production SMTP servers. The collection process records the original SMTP envelope and the original message contents and relays that data to the Competition Controller. The Competition Controller modifies the message to appear to be addressed to each contestant's subdomain and then relays it to each contestant as appropriate. The relayed message is identical to what the anti-spam filter would have received if the message had actually been sent to the perimeter server for the simulated domain and relayed to the anti-spam filter. The SMTP "MAIL From" and "RCPT To" addresses will be presented to each filter as they were presented to the capturing server. Exactly one Received header will be added to the captured message. The "from part" of the Received header will record the SMTP "HELO" and connection information using a Sendmail-style Received line. For example,

   Received: from spammer.bulkmail.com (openrelay.com [1.2.3.4])
	     by ceas-challenge.cc (8.13.1/8.13.1)
	     with ESMTP ID l68FDvK031975; Mon, 16 Jul 2007 11:13:57 -0400

This header indicates a HELO of "spammer.bulkmail.com", a connecting IP address of 1.2.3.4, and a reverse DNS lookup on 1.2.3.4 that returned "openrelay.com". The first Received line can safely be used for IP blacklisting or SPF checks.
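As an illustration only, the fields described above could be pulled out of the first Received header with a short Python sketch. The regular expression and the function name parse_first_received are assumptions made for this example, not part of the competition software, and the sample value assumes the parenthesized Sendmail layout "from HELO (rDNS [IP])".

```python
import re

# Pattern for the "from part" of a Sendmail-style Received header:
#   from <HELO> (<reverse-DNS name> [<connecting IP>])
# This pattern is an assumption based on the example above.
RECEIVED_FROM = re.compile(
    r"from\s+(?P<helo>\S+)\s+"
    r"\((?P<rdns>[^\s\[\]]*)\s*\[(?P<ip>[0-9A-Fa-f:.]+)\]\)"
)


def parse_first_received(value):
    """Return the HELO name, reverse-DNS name, and IP, or None if no match."""
    match = RECEIVED_FROM.search(value)
    if match is None:
        return None
    return {
        "helo": match.group("helo"),
        # The reverse-DNS name may be empty if the lookup failed.
        "rdns": match.group("rdns") or None,
        "ip": match.group("ip"),
    }


# Sample header value modeled on the example in this document.
sample = (
    "from spammer.bulkmail.com (openrelay.com [1.2.3.4])\n"
    "\tby ceas-challenge.cc (8.13.1/8.13.1)\n"
    "\twith ESMTP id l68FDvK031975; Mon, 16 Jul 2007 11:13:57 -0400"
)
info = parse_first_received(sample)
print(info)
```

The extracted IP address is the value a filter would typically pass to its blacklist or SPF lookup.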

Most messages will be delivered within minutes of their original receipt. The test stream may contain some previously recorded messages. Recorded messages will be updated so that they contain appropriate dates, IP addresses, and DKIM signatures, which should allow most widely known anti-spam algorithms to behave correctly.

Competition Rules

  1. The competition will occur on August 2nd. The start time will be 12:15pm PST and the competition will last for 24 hours.
  2. The first hour of the competition will be used to train learning filters and will not be scored.
  3. All contestants must register to participate in the competition.
  4. Each group can enter up to two anti-spam filters.
  5. All contestants must sign and return the contestant agreement to participate in the competition. All signed agreements must be received by July 31, 2007.
  6. Filters can be located anywhere on the internet.
  7. Filters must not send bounce messages, challenge-response messages, or make any attempt to contact the message originator. Failure to adhere to this requirement may result in disqualification.
  8. Filters can use any available resource to make filtering decisions. This includes both public and private resources, as well as teams of human labelers. We ask that each contestant report what resources they use, especially any human labelers.
  9. Filters should not update any public resources such as a shared signature database. Any updates made to public databases may be available to competing filters before they are required to classify the message. Any contestant found to have intentionally tampered with a public resource will be disqualified.
  10. Filters will be given one minute to classify each message. If a response is not received in one minute, the message will be scored using the filter's default response.
  11. The first valid response received from a filter for a message will be scored. Subsequent responses for that message will be dropped.
  12. Invalid responses will be discarded. If no valid response is received in the one minute time limit, the message will be scored using the filter's default response.
  13. Contestants are responsible for ensuring the continuous operation of their anti-spam filter. The one-minute timeout will be strictly enforced.
  14. Learning-based filters can (and should) be pre-trained with any or all public and private data available to the contestant.
  15. Feedback on all messages will be provided for the first hour of the competition. Thereafter, feedback will be provided for a subset of the e-mail stream.
  16. Feedback will be sent immediately for the first hour of the competition. Thereafter, feedback will be delayed to simulate real-user behavior. The delay may range from minutes to hours.
  17. Feedback will contain the official judgment for a message. There will be no attempt to simulate user labeling errors.
  18. Filters will be evaluated using the lam() metric. Lam() calculates the average of a filter's false-positive and false-negative rates, but performs the calculation in logit space. The exact formula used for the competition will be:
    
    	FPrate = (#ham-errors + 0.5) / (#ham + 0.5)
    	FNrate = (#spam-errors + 0.5) / (#spam + 0.5)
    	lam = invlogit((logit(FPrate) + logit(FNrate)) / 2)
    
    where log is the natural logarithm and
    	logit(p) = log(p/(1-p))
    	invlogit(x) = (e^x)/(1 + e^x)
    
    The winner will be the filter with the lowest lam() score.
  19. All practical measures will be taken to ensure the accuracy of the official message judgments. We reserve the right to change the official judgment of a message or remove a message from the competition as needed. Any reasonable disagreement on whether a message is spam will cause the message to be dropped.
  20. Feedback for incorrectly judged messages may be sent to all competing anti-spam filters before the error can be corrected. Participating filters are responsible for smoothly handling any erroneous feedback received.
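
The scoring described in rules 10 through 12 and rule 18 can be sketched in Python as follows. This is a minimal illustration, not the official scoring code: the function names, the label strings "spam" and "ham", and the representation of responses as (seconds elapsed, label) pairs are all assumptions made for the example. It also assumes that a response arriving at exactly 60 seconds still counts.

```python
import math

VALID_LABELS = {"spam", "ham"}   # assumed label strings
TIME_LIMIT_SECONDS = 60.0        # one-minute limit from rule 10


def scored_label(responses, default_label):
    """Pick the label that is scored for one message.

    responses: (seconds_since_delivery, label) pairs in arrival order.
    The first valid, on-time response wins (rule 11); invalid responses
    are skipped (rule 12); otherwise the default applies (rules 10, 12).
    """
    for elapsed, label in responses:
        if elapsed <= TIME_LIMIT_SECONDS and label in VALID_LABELS:
            return label
    return default_label


def logit(p):
    return math.log(p / (1.0 - p))


def invlogit(x):
    return math.exp(x) / (1.0 + math.exp(x))


def lam(ham_errors, ham, spam_errors, spam):
    """Compute lam() from error counts using the formula in rule 18.

    Note: as written, the formula is undefined when every message of a
    class is misclassified (the rate becomes 1 and logit divides by zero).
    """
    fp_rate = (ham_errors + 0.5) / (ham + 0.5)
    fn_rate = (spam_errors + 0.5) / (spam + 0.5)
    return invlogit((logit(fp_rate) + logit(fn_rate)) / 2.0)


# Hypothetical counts: 1 error on 1000 ham, 20 errors on 2000 spam.
score = lam(ham_errors=1, ham=1000, spam_errors=20, spam=2000)
print(f"lam = {score:.6f}")
```

Because the average is taken in logit space, the score always lies between the two error rates, and when both rates are equal lam() simply returns that rate.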