NOTE: The Competition Controller currently does not support the reporting of numerical scores nor does it support active learning. These features will be added in the next two weeks. Updates to this document will be posted here and announced to the mailing list.
The goal of the CEAS Spam-Filter Challenge Live Spam Task is to evaluate anti-spam filters in a realistic environment that closely approximates real-world anti-spam installations. Each filter will be asked to process a live e-mail stream and label each incoming message as spam or ham. The competition environment has been designed to allow most existing SMTP-based anti-spam systems to be used essentially out of the box by accurately simulating typical e-mail installations. This year IMAP4 and POP3 will not be directly supported. Contestants with IMAP4 or POP3 filters can participate by setting up an SMTP server and connecting it to their IMAP4 or POP3 filter.
The competition environment is managed by the Competition Controller. You can access the Competition Controller at
https://comp.ceas-challenge.cc
Please note that, for security reasons, the controller is only available via HTTPS (SSL). Each participant will be assigned a subdomain of ceas-challenge.cc to filter (i.e. <contestant>.ceas-challenge.cc). The Competition Controller acts as a perimeter SMTP server for each filter's subdomain. The controller delivers every test message to each anti-spam filter via SMTP. The details of how each message is delivered and how each filter is expected to respond are discussed below.
The e-mail for the competition will be collected from several production SMTP servers. The collection process records the original SMTP envelope and the original message contents and relays that data to the Competition Controller. The Competition Controller modifies the message to appear to be addressed to each contestant's simulated domain and then relays it to each contestant as appropriate. The relayed message is as close as possible to what the anti-spam filter would have received if the message had actually been sent to the perimeter server for the simulated domain and relayed to the anti-spam filter.
The Competition Controller simulates a simple user feedback model in which the recipient is given the opportunity to tag any message they receive as spam or ham. These labels are sent to each anti-spam filter to be used as training data. The simulation attempts to model real users in that only a fraction of the e-mail received will be labeled by the user, and that there may be a significant delay between when a message is delivered and when it is labeled. All feedback will be accurate and no attempt will be made to simulate user errors.
Please note that the toolkit and the test stream use a trivial feedback model in which feedback is provided for every message with a fixed delay. This model will NOT be used in the actual competition.
The configuration of each filter is specified using a Java-style properties file. The configuration file tells the Competition Controller how to communicate with each filter. The configuration tab on the Competition Controller can be used to upload and install your configuration file.
There are three primary types of communication that must occur between the Competition Controller and a participating filter: submit a message to be classified, receive the classification response, and send training examples. The configuration file allows the communication parameters for each step to be specified separately.
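As a rough illustration of how such a file breaks down into these three steps, the following Python sketch parses `key: value` lines and groups them by their leading component. The helpers `parse_properties` and `group_by_step` are hypothetical names, and the parser handles only the simple subset of the properties syntax used in this document; the Competition Controller's own parser may behave differently.

```python
SAMPLE = """\
# Typical SMTP configuration file
filter.type: smtp
filter.smtp.server: comp.contestant.com
response.type: smtp
response.smtp.autolabel: regex
response.smtp.header: x-ceas-label
response.smtp.regex: (?i:spam)
response.default: spam
feedback.spam.type: smtp
feedback.spam.smtp.recipient: report-spam
feedback.ham.type: smtp
feedback.ham.smtp.recipient: report-ham
"""


def parse_properties(text):
    """Parse a simple subset of the Java properties format into a dict."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Properties files accept ':' or '=' as the separator;
        # split on whichever appears first so values may contain ':'.
        positions = [p for p in (line.find(":"), line.find("=")) if p != -1]
        if not positions:
            continue  # this sketch ignores lines without a separator
        cut = min(positions)
        props[line[:cut].strip()] = line[cut + 1:].strip()
    return props


def group_by_step(props):
    """Group settings by communication step: filter, response, feedback."""
    groups = {"filter": {}, "response": {}, "feedback": {}}
    for key, value in props.items():
        step, _, rest = key.partition(".")
        if step in groups:
            groups[step][rest] = value
    return groups
```

Calling `group_by_step(parse_properties(SAMPLE))` yields one dictionary per step, which makes it easy to check that each step has the settings it needs before uploading the file.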
Each message sent between the Competition Controller and the anti-spam filter must contain a special 'x-ceas-tracking' header. This header is used to authenticate messages to the Competition Controller as well as to correlate each of the three message types described above with the original test message. This header will be included in all messages sent to the anti-spam filter. The header must be copied verbatim to all response messages sent by the anti-spam filter. Response messages that do not contain the appropriate tracking header will be discarded.
The tracking header is also useful to correlate classification requests and filter judgments. The tracking header is formatted as a MIME-style attribute-value list. The same message identifier will be used for classification requests and the corresponding user-feedback messages.
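For a filter that builds a new response message instead of relaying the original, the copy-verbatim requirement might be handled as in the sketch below. It uses Python's standard `email` package; the function name `copy_tracking_header` and the sample header value are made up for illustration, since the actual attribute names inside the tracking header are not specified here.

```python
from email import message_from_string
from email.message import Message

TRACKING_HEADER = "x-ceas-tracking"


def copy_tracking_header(request, response):
    """Copy the tracking header unchanged from request to response."""
    value = request.get(TRACKING_HEADER)
    if value is None:
        # Without this header the controller would discard the response.
        raise ValueError("request has no x-ceas-tracking header")
    del response[TRACKING_HEADER]  # drop any stale copy; no error if absent
    response[TRACKING_HEADER] = value
    return response


# Hypothetical request; the tracking value below is a placeholder.
request = message_from_string(
    "x-ceas-tracking: id=EXAMPLE\nSubject: hello\n\nbody\n"
)
response = copy_tracking_header(request, Message())
```

Relaying the original message untouched avoids this step entirely, because the header is then already present.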
The Competition Controller delivers incoming mail to each anti-spam filter via SMTP. The anti-spam filter is expected to classify each message as spam or ham, add a field to the message indicating its classification, and relay the message back to the Competition Controller for scoring. The Competition Controller records the filter's response by extracting the appropriate header field and analyzing its contents. The header used to indicate the filter's decision, as well as its format, is configurable.
User feedback is provided by sending the original message back to the anti-spam filter. Typically, this is done by addressing feedback to special "report-spam" and "report-ham" addresses based on the message's correct label.
This section describes how to configure a typical SMTP-based anti-spam filter. The configuration of procmail-based filters is described in the next section. If your filter does not fit the model presented below, please see the REFERENCE section for additional options. A basic configuration file for SMTP is as follows:
# Typical SMTP configuration file
filter.type: smtp
filter.smtp.server: comp.contestant.com
response.type: smtp
response.smtp.autolabel: regex
response.smtp.header: x-ceas-label
response.smtp.regex: (?i:spam)
response.default: spam
feedback.spam.type: smtp
feedback.spam.smtp.recipient: report-spam
feedback.ham.type: smtp
feedback.ham.smtp.recipient: report-ham
The first two lines:
filter.type: smtp
filter.smtp.server: comp.contestant.com
request that the test stream be delivered by SMTP to comp.contestant.com. Your filter should be configured to accept e-mail for the sub-domain assigned to your anti-spam filter. The sub-domain will be a concatenation of the filter name you request and "ceas-challenge.cc". For instance, myfilter.ceas-challenge.cc.
The list of valid users at myfilter.ceas-challenge.cc will not be provided. Configure your anti-spam filter to relay all local mail back to the Competition Controller.
The lines
response.type: smtp
response.smtp.autolabel: regex
response.smtp.header: x-ceas-label
response.smtp.regex: (?i:spam)
response.default: spam
define how the Competition Controller should expect to receive your filter's classifications and how they should be interpreted. The first line configures the server to expect responses via SMTP. The second line declares that responses will contain a special response header that can be parsed with a regular expression. The following two lines define the response header and a regular expression that will only match the response header of spam messages. As configured, all response messages with the header:
x-ceas-label: spam
will be scored as being labeled spam. Any other value of 'x-ceas-label' (ignoring case) will be scored as being labeled ham. Note, the regular expression must match the entire header value. The regular expression "(?i:spam)" does not match the line
x-ceas-label: spam, score=30
as it contains additional text. If it is desired to match the above, a regular expression such as "(?i:.*spam.*)" should be used. The regular expression must be written in the java.util.regex format. See http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html for details.
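The whole-value matching rule can be tried out locally before the competition. The sketch below uses Python's `re.fullmatch` as a stand-in for the entire-value match described above; the two patterns shown behave the same way in Python and in java.util.regex, but more complex expressions should be checked against the Java documentation. The helper name `is_spam_header` is hypothetical, and stripping surrounding whitespace from the header value is an assumption.

```python
import re


def is_spam_header(value, pattern="(?i:spam)"):
    """Return True when the whole header value matches the pattern."""
    # Assumption: whitespace around the header value is not significant.
    return re.fullmatch(pattern, value.strip()) is not None


strict = is_spam_header("spam, score=30")                  # strict pattern
loose = is_spam_header("spam, score=30", "(?i:.*spam.*)")  # looser pattern
```

Here `strict` is False and `loose` is True. Note that the looser pattern accepts any value that contains "spam" anywhere, so a filter that emits values such as "notspam" for ham would need a more specific expression.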
Using this configuration file, if a response message is not received before the response timeout expires, the message will be classified using the default response of "spam".
In order for your filter to send the appropriate response messages, you should configure your filter to act as a relay and deliver all filtered e-mail back to the Competition Controller. The Competition Controller is designed to ensure that this loopback configuration will not create a mail loop.
Feedback is provided by sending the original message back to your SMTP server with an added 'x-ceas-judgment' header with the official judgment. The configuration settings:
feedback.spam.type: smtp
feedback.spam.smtp.recipient: report-spam
feedback.ham.type: smtp
feedback.ham.smtp.recipient: report-ham
request that spam feedback be delivered to the contestant's SMTP server using the special address "report-spam". That is, spam judgments are delivered to report-spam@myfilter.ceas-challenge.cc. Similarly, they request that ham judgments be delivered to report-ham@myfilter.ceas-challenge.cc.
If reporting is not done through a special address, see the REFERENCE section for other alternatives. If user feedback is not supported at all, the following feedback block should be used:
feedback.spam.type: none
feedback.ham.type: none
This section describes how to configure a typical procmail-based anti-spam filter. The description assumes you have read the SMTP configuration section above. If your filter does not fit the model presented below, please see the REFERENCE section for additional options. How you configure your procmail server depends on what control you have over your local SMTP server. To retain the most SMTP envelope information that is possible, use the following configuration:
# Typical procmail configuration file
filter.type: smtp
filter.smtp.server: comp.contestant.com
filter.smtp.recipient: filterd
response.type: smtp
response.smtp.autolabel: regex
response.smtp.header: x-ceas-label
response.smtp.regex: (?i:spam)
response.default: spam
feedback.spam.type: smtp
feedback.spam.smtp.recipient: filterd
feedback.ham.type: smtp
feedback.ham.smtp.recipient: filterd
This configuration is almost identical to the configuration described for SMTP above. The key difference is the addition of the three lines:
filter.smtp.recipient: filterd
feedback.spam.smtp.recipient: filterd
feedback.ham.smtp.recipient: filterd
These lines request that all e-mail sent to your filter be delivered to the special e-mail address 'filterd'. The full destination address passed to your SMTP server will include the simulated sub-domain assigned to your filter: filterd@myfilter.ceas-challenge.cc. As a result, this configuration requires that you have sufficient control over your SMTP server to accept mail addressed to your filter's subdomain. If you do not, a different configuration is recommended:
# Alternative procmail configuration file
filter.type: email
filter.email.address: filterd@yourdomain.com
response.type: smtp
response.smtp.autolabel: regex
response.smtp.header: x-ceas-label
response.smtp.regex: (?i:spam)
response.default: spam
feedback.spam.type: email
feedback.spam.email.address: filterd@yourdomain.com
feedback.ham.type: email
feedback.ham.email.address: filterd@yourdomain.com
This configuration uses the alternative 'email' delivery method. This setup sends the test stream and feedback messages using ordinary e-mail.
In either setup, the competition test stream and feedback messages will be delivered to the provided user. Set up the user's procmailrc file to pipe each message into your anti-spam filter. With these configuration files, the anti-spam filter will be expected to add an 'x-ceas-label' header with the value of either 'spam' or 'ham' as appropriate. See the REFERENCE section for alternative methods for providing your filter's label.
Also set up the procmailrc file of the target user to forward the filtered message back to the Competition Controller. You can e-mail your responses to the address 'response@ceas-challenge.cc'.
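A filter program that procmail pipes each message into could look roughly like the Python sketch below. It reads one message, adds the 'x-ceas-label' header, and writes the result back out, leaving every other header (including 'x-ceas-tracking') in place. The names `label_message`, `toy_classifier`, and `main` are hypothetical, the classifier is a placeholder rather than a real spam test, and feedback messages are not treated specially here.

```python
import sys
from email import message_from_bytes

LABEL_HEADER = "x-ceas-label"


def toy_classifier(msg):
    """Placeholder decision rule; a real filter would go here."""
    subject = str(msg.get("Subject", ""))
    return "viagra" in subject.lower()


def label_message(raw, classify):
    """Return the message bytes with an x-ceas-label header added."""
    msg = message_from_bytes(raw)
    del msg[LABEL_HEADER]  # remove any label already present in the input
    msg[LABEL_HEADER] = "spam" if classify(msg) else "ham"
    return msg.as_bytes()


def main(stdin=None, stdout=None):
    """Filter one message from stdin to stdout, as a pipe would."""
    if stdin is None:
        stdin = sys.stdin.buffer
    if stdout is None:
        stdout = sys.stdout.buffer
    stdout.write(label_message(stdin.read(), toy_classifier))
```

A script used from procmailrc would call `main()` at the end of the file; the labeled output is what then needs to be forwarded back to the Competition Controller.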
filter.type: <delivery-type>
Required
Delivery method that will be used to send the test stream to the anti-spam filter. Allowable values are:
smtp
Relay the competition e-mail stream via SMTP. The message stream will be relayed with the original SMTP "MAIL From" and "RCPT To" address lists. The SMTP "HELO" and TCP/IP connection information will be stored in the first received line.
email
Relay the competition e-mail stream via ordinary e-mail to an address provided below. The original SMTP envelope information will not be available.
filter.smtp.server: <host-address>
Required for SMTP
Destination host name or IP address for SMTP-based delivery.
filter.smtp.recipient: <username>
Optional for SMTP
Configures SMTP delivery to send all messages to <username> rather than the message's original recipient list.
filter.email.address: <email-address>
Required for EMAIL
Destination address for e-mail based delivery.
Classifications are sent back to the Competition Controller using SMTP. All responses take the form of an e-mail message. It is recommended that the original e-mail message be sent back for this purpose as it helps ensure the response is correctly recorded. At a minimum, the "x-ceas-tracking" header present in the original classification request must be sent back with the filter's response. For security reasons, any response received without the original tracking header will be discarded.
response.type: <type>
Required
Specifies the mechanism used by the anti-spam filter to return responses to the Competition Controller for scoring. Currently the only supported method is SMTP:
smtp
The anti-spam filter will relay the original message back to the Competition Controller via SMTP with its response stored in a special header.
response.smtp.autolabel: <autolabel-mode>
Required
Determines the method used to label incoming response messages. Allowable values are:
regex
The response will contain a special MIME header with the filter's classification. The header value will be matched against a regular expression. The regular expression should match the special header if and only if the message is classified as spam.
ham
The filter will discard spam messages and only relay ham messages to the competition controller. All relayed messages received by the controller will be scored as being classified as ham by the filter. Any message not relayed within the timeout interval will be scored as a spam response.
spam
The filter will discard ham messages and only relay spam messages to the competition controller. All relayed messages received by the controller will be scored as being classified as spam by the filter. Any message not relayed within the timeout interval will be scored as a ham response.
response.smtp.header: <mime-header>
Optional for REGEX
Configures the MIME header used to store the filter's SMTP-based response. If this property is not specified, a value of 'x-ceas-label' is assumed.
response.smtp.regex: <java-regex>
Optional for REGEX
Specifies the regular expression for determining whether the filter's response header represents a spam classification. Any message in which the response header matches <java-regex> will be scored as spam. All other responses will be treated as ham classifications.
The regular expression uses the Java regular expression syntax (see java.util.regex package for details), which in turn is based on the Perl regular expression syntax. The default is "(?i:spam)" which performs a case insensitive match for the string literal "spam".
response.default: <default-response>
Optional
If no response message is received from a filter before the timeout period expires, the message will be scored using the default response. The default response will be "ham" unless set to spam using this option.
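The interaction between the autolabel mode, the response header, the regular expression, and the default response can be summarized in a short Python sketch. This is only a reading of the rules stated above, not the Competition Controller's code: the function `score_response` is hypothetical, Python's `re.fullmatch` stands in for java.util.regex, and trimming whitespace from the header value is an assumption.

```python
import re


def score_response(config, headers):
    """Return 'spam' or 'ham' for one test message.

    config  -- dict of configuration properties
    headers -- dict mapping lower-case header names to values, or None
               when no response arrived before the timeout
    """
    mode = config["response.smtp.autolabel"]
    if mode == "ham":
        # Only ham is relayed; a missing response counts as spam.
        return "ham" if headers is not None else "spam"
    if mode == "spam":
        # Only spam is relayed; a missing response counts as ham.
        return "spam" if headers is not None else "ham"
    if mode != "regex":
        raise ValueError("unknown autolabel mode: " + mode)
    if headers is None:
        return config.get("response.default", "ham")
    name = config.get("response.smtp.header", "x-ceas-label").lower()
    pattern = config.get("response.smtp.regex", "(?i:spam)")
    value = headers.get(name)
    # A missing or non-matching header is treated as a ham classification.
    if value is not None and re.fullmatch(pattern, value.strip()):
        return "spam"
    return "ham"
```

For example, with `{"response.smtp.autolabel": "regex", "response.default": "spam"}` a response carrying `x-ceas-label: SPAM` is scored as spam, a response carrying `x-ceas-label: spam, score=30` is scored as ham, and a missing response is scored as spam.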
The mechanisms available for sending judgments closely parallel those available for sending the test stream. Separate delivery options can be specified for spam and ham judgments. Judgments will be sent using the original test message but with an 'x-ceas-judgment' header added with the value of "spam" or "ham" as appropriate. The original message and the judgment message will be sent with identical message IDs stored in their respective "x-ceas-tracking" headers.
feedback.spam.type: <delivery-type>
feedback.ham.type: <delivery-type>
Required
Specifies the primary delivery mechanism for sending judgments to the filter. Allowable values are:
smtp
Relay judgments via SMTP to the SMTP server configured to receive the test stream. The judgment messages can be distinguished from the test stream by the presence of the 'x-ceas-judgment' header or by using unique recipient addresses (see below).
email
Relay judgments via ordinary e-mail to an e-mail address provided below.
none
Do not send judgments.
feedback.spam.smtp.recipient: <username>
feedback.ham.smtp.recipient: <username>
Optional
Configure judgments to be sent to <username> rather than the message's original recipient list. This setting is useful for solutions that provide special addresses for sending false positive and false negative reports.
feedback.spam.email.address: <email-address>
feedback.ham.email.address: <email-address>
Required for EMAIL
Destination address for sending judgments via e-mail.
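When judgments and the test stream arrive at the same server or address, a filter can separate them by checking for the 'x-ceas-judgment' header, as in the Python sketch below. The function name `route_incoming` is hypothetical. The sketch returns the raw 'x-ceas-tracking' value for correlation; extracting the message ID from that attribute-value list depends on the header's exact layout, which is not given here, so that step is left out.

```python
from email import message_from_string


def route_incoming(msg):
    """Classify an incoming message as a test message or a judgment.

    Returns (kind, label, tracking) where kind is 'classify' or 'train',
    label is 'spam', 'ham', or None, and tracking is the raw
    x-ceas-tracking header value.
    """
    tracking = msg.get("x-ceas-tracking")
    judgment = msg.get("x-ceas-judgment")
    if judgment is None:
        return ("classify", None, tracking)  # ordinary test-stream message
    label = judgment.strip().lower()
    if label not in ("spam", "ham"):
        raise ValueError("unexpected judgment value: " + judgment)
    return ("train", label, tracking)


# Hypothetical messages; the tracking value is a placeholder.
test_msg = message_from_string("x-ceas-tracking: id=EXAMPLE\n\nbody\n")
feedback = message_from_string(
    "x-ceas-tracking: id=EXAMPLE\nx-ceas-judgment: spam\n\nbody\n"
)
```

Here `route_incoming(test_msg)` returns `("classify", None, "id=EXAMPLE")` and `route_incoming(feedback)` returns `("train", "spam", "id=EXAMPLE")`.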