CEAS 2008 Spam-Filter Challenge Live Spam Task

Competition Guidelines
Version 1.0

NOTE: The following rules are tentative and subject to change. The rules for active learning have not yet been finalized and therefore are not included below. All rule updates will be posted here and announced on the mailing list.

1.0 Overview

The CEAS Spam-Filter Challenge Live Spam Task is an online anti-spam filtering competition in which anti-spam filters are tested on a live stream of spam and ham messages. Each filter will be asked to process a live e-mail stream and label each incoming message as spam or ham. The competition environment accurately simulates typical e-mail installations so that most existing anti-spam systems can be used essentially out of the box.

2.0 Competition Model

Each participant will be assigned a subdomain of ceas-challenge.cc to filter. The competition e-mail stream will be multiplexed to each participating filter such that each filter receives an essentially identical set of messages. The messages received by each filter will only differ in that e-mail addresses will be rewritten to match the subdomain assigned to the destination filter.
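The address rewriting described above can be illustrated with a minimal sketch. The exact rewriting scheme is not specified in these guidelines; the sketch assumes that the local part of the address is kept and only the domain is replaced. The function name rewrite_address and the subdomain label "filter01" are hypothetical.

```python
# Illustrative only: the Competition Controller's real rewriting logic is not
# documented here. ASSUMPTION: the local part is preserved and the domain is
# replaced by the participant's subdomain of ceas-challenge.cc.
BASE_DOMAIN = "ceas-challenge.cc"


def rewrite_address(address, participant):
    """Rewrite 'local@any.domain' to 'local@<participant>.ceas-challenge.cc'."""
    if "@" not in address:
        raise ValueError("not an e-mail address: %r" % address)
    # Split on the last '@' so unusual local parts are kept intact.
    local_part, _, _ = address.rpartition("@")
    return "%s@%s.%s" % (local_part, participant, BASE_DOMAIN)


# "filter01" is a made-up subdomain label used only for this example.
print(rewrite_address("alice@example.com", "filter01"))
```

Under this assumption, two filters would see the same message with recipients that differ only in the subdomain.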

The test e-mail stream will be collected from several production SMTP servers and relayed to the Competition Controller. The Competition Controller will modify each message to appear to be addressed to each contestant's subdomain and then relay it to each contestant as appropriate. The relayed message will be as close as possible to what the anti-spam filter would have received if the message had actually been sent to the perimeter server for the simulated domain and relayed to the anti-spam filter. Every attempt will be made to preserve the original SMTP "HELO", "MAIL From", and "RCPT To" commands as they were originally presented to the capturing server. If we are unable to preserve the original headers, simulated headers will be provided that respect the properties most likely to be used by anti-spam systems. The properties preserved by the simulated headers include but are not limited to DNS, rDNS, DNSRBL, and SPF data for all envelope addresses. Unfortunately, due to the anonymization process, we cannot support DKIM signatures. All DKIM headers will be stripped from the test stream.

Exactly one Received header will be added to the captured message. The "from part" of the Received header will record the SMTP "HELO" and connection information using a Sendmail-style received line. For example,

Received: from spammer.bulkmail.com (openrelay.com [1.2.3.4])
          by ceas-challenge.cc (8.13.1/8.13.1)
          with ESMTP ID l68FDvK031975; Mon, 16 Jul 2007 11:13:57 -0400

indicates a HELO of "spammer.bulkmail.com", a connecting IP address of 1.2.3.4, and a reverse DNS check on 1.2.3.4 that returned "openrelay.com". The first Received line can safely be used for IP blacklisting or SPF checks.
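As a rough illustration of how a filter might read these fields, the following Python sketch extracts the HELO name, the reverse DNS name, and the connecting IP address from a Received header shaped like the example above. The pattern and the function name parse_received are assumptions made for this sketch; headers that deviate from the example (for instance, one with no reverse DNS name) would need a more tolerant pattern.

```python
import re

# Matches the "from part" of the example header:
#   from <HELO> (<rDNS> [<IP>])
# ASSUMPTION: the header follows the example layout exactly.
RECEIVED_FROM = re.compile(
    r"from\s+(?P<helo>\S+)\s+\((?P<rdns>\S+)\s+\[(?P<ip>[0-9a-fA-F.:]+)\]\)"
)


def parse_received(header_value):
    """Return a dict with 'helo', 'rdns', and 'ip', or None if no match."""
    # Unfold the header: collapse line breaks and runs of whitespace.
    unfolded = " ".join(header_value.split())
    match = RECEIVED_FROM.search(unfolded)
    if match is None:
        return None
    return match.groupdict()


example = """from spammer.bulkmail.com (openrelay.com [1.2.3.4])
          by ceas-challenge.cc (8.13.1/8.13.1)
          with ESMTP ID l68FDvK031975; Mon, 16 Jul 2007 11:13:57 -0400"""

info = parse_received(example)
print(info["helo"], info["rdns"], info["ip"])
```

The extracted IP address is the value a filter would pass to its own blacklist or SPF lookup.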

Most messages will be delivered within minutes of their original receipt. The test stream may contain some previously recorded messages. Recorded messages will also be updated so that they contain appropriate dates, IP addresses, and DKIM signatures, ensuring that most widely known anti-spam algorithms will behave correctly.

3.0 Competition Rules

  1. The competition will occur on August 5th - August 7th. The start time will be 14:00 UTC and the competition will last for 72 hours.
  2. The first hour of the competition will be used to train learning filters and will not be scored.
  3. All contestants must register to participate in the competition.
  4. Each group can enter up to two anti-spam filters.
  5. All contestants must sign and return the contestant agreement to participate in the competition. All signed agreements must be received by July 31, 2008.
  6. Filters can be located anywhere on the internet.
  7. Filters must not send bounce messages, challenge-response messages, or make any attempt to contact the message originator. Failure to adhere to this requirement may result in disqualification.
  8. Filters can use any resource available to make filtering decisions. This includes both public and private resources, as well as teams of human labelers. We ask that each contestant report what resources they use, especially any human labelers.
  9. Filters should not update any public resources such as a shared signature database. Any updates made to public databases may be available to competing filters before they are required to classify the message. Any contestant found to have intentionally tampered with a public resource will be disqualified.
  10. Filters will be given ten minutes to classify each message. If a response is not received in ten minutes, the message will be scored using the filter's default response.
  11. The first valid response received from a filter will be scored. Subsequent responses will be dropped.
  12. Invalid responses will be discarded. If no valid response is received within the ten-minute time limit, the message will be scored using the filter's default response.
  13. Contestants are responsible for ensuring the continuous operation of their anti-spam filter. The ten-minute timeout will be strictly enforced.
  14. Learning-based filters can (and should) be pre-trained with any or all public and private data available to the contestant.
  15. Feedback on all messages will be provided for the first hour of the competition. Thereafter, feedback will be provided for a subset of the e-mail stream.
  16. Feedback will be sent immediately for the first hour of the competition. Thereafter, feedback will be delayed to simulate real-user behavior. The delay may range from minutes to hours.
  17. Feedback will contain the official judgment for a message. There will be no attempt to simulate user labeling errors.
  18. Filters will be evaluated using two measures, LAM (logistic average misclassification) and AUC (area under the ROC curve), which combine the percentage of spam blocked and the false positive rate.

    Filters that cannot provide a numeric score are welcome; however, such filters will be evaluated using only the LAM measure. The LAM calculation will be smoothed. The exact formula used for the competition will be:

    FPrate = (#ham-errors + 0.5) / (#ham + 0.5)
    FNrate = (#spam-errors + 0.5) / (#spam + 0.5)
    lam = invlogit((logit(FPrate) + logit(FNrate)) / 2)

    where

    logit(p) = log(p/(1-p))
    invlogit(x) = (e^x)/(1 + e^x)

    We will declare two winners, one based on lowest LAM and one based on highest AUC.
  19. All practical measures will be taken to ensure the accuracy of the official message judgments. We reserve the right to change the official judgment of a message or remove a message from the competition as needed. Any reasonable disagreement on whether a message is spam will cause a message to be dropped.
  20. Feedback for incorrectly judged messages may be sent to all competing anti-spam filters before the error can be corrected. Participating filters are responsible for smoothly handling any erroneous feedback received.
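
The smoothed LAM formula in rule 18 can be checked with a short Python sketch. It is a direct transcription of the formula as written above, not the official scoring code; the function and parameter names are chosen for illustration. Note that with this smoothing a filter that misclassifies every ham (or every spam) message yields a rate of exactly 1, for which logit is undefined, so the sketch raises an error in that case.

```python
import math


def logit(p):
    """logit(p) = log(p / (1 - p))"""
    return math.log(p / (1.0 - p))


def invlogit(x):
    """invlogit(x) = e^x / (1 + e^x)"""
    return math.exp(x) / (1.0 + math.exp(x))


def smoothed_lam(ham_errors, ham_total, spam_errors, spam_total):
    """Smoothed LAM, transcribed from the formula in rule 18."""
    fp_rate = (ham_errors + 0.5) / (ham_total + 0.5)
    fn_rate = (spam_errors + 0.5) / (spam_total + 0.5)
    if fp_rate >= 1.0 or fn_rate >= 1.0:
        # logit(1) is undefined; happens when every message of a class is wrong.
        raise ValueError("LAM is undefined when an error rate equals 1")
    return invlogit((logit(fp_rate) + logit(fn_rate)) / 2.0)


# Hypothetical counts: 1 of 1000 ham and 20 of 2000 spam misclassified.
print(smoothed_lam(1, 1000, 20, 2000))
```

When the two error rates are equal, LAM equals that common rate; otherwise it lies between the two, so a lower LAM requires keeping both kinds of error low.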