The goal of the CEAS Live Spam Challenge is to evaluate anti-spam filters in a realistic environment that closely approximates real-world anti-spam installations. Each filter will be asked to process a live e-mail stream and label each incoming message as spam or ham. The competition environment has been designed to allow most existing anti-spam systems to be used essentially out of the box by accurately simulating typical e-mail installations. This is accomplished by supporting several delivery modes including SMTP, POP3, IMAP4, and the TREC toolkit. The Competition Toolkit provides the ability to run a local copy of the competition environment and several related tools. The toolkit is not needed for integrating most anti-spam solutions into the competition environment.
The competition environment is managed by the Competition Controller. Each participant will be assigned a subdomain of ceas-challenge.cc to filter (i.e., <contestant>.ceas-challenge.cc). The Competition Controller acts as a perimeter SMTP server for each filter's subdomain. The controller delivers each message to each anti-spam filter using the delivery mechanism configured for that filter (e.g., SMTP). How each message is delivered and how each filter is expected to respond depend on the delivery mechanism chosen. The details for each delivery method are discussed below.
The e-mail for the competition will be collected from several production SMTP servers. The collection process records the original SMTP envelope and the original message contents and relays that data to the Competition Controller. The Competition Controller modifies the message to appear to be addressed to each contestant's simulated domain and then relays it to each contestant as appropriate. The relayed message is identical to what the anti-spam filter would have received if the message had actually been sent to the perimeter server for the simulated domain and relayed to the anti-spam filter.
The primary methods used by the Competition Controller to relay test messages are SMTP and POP3/IMAP4. The SMTP mechanism supports SMTP-based solutions, and the POP3/IMAP4 mechanism supports POP3/IMAP4 solutions. In addition, the POP3/IMAP4 solution can be combined with other tools to support the TREC Toolkit as well as operation behind a firewall.
The Competition Controller simulates a simple user feedback model in which the recipient is given the opportunity to tag any message they receive as spam or ham. These labels are sent to each anti-spam filter to be used as training data. The simulation attempts to model real users in that only a fraction of the e-mail received will be labeled by the user, and that there may be a significant delay between when a message is delivered and when it is labeled. All feedback will be accurate and no attempt will be made to simulate user errors.
Please note that the toolkit and the test stream use a trivial feedback model in which feedback is provided for every message with a fixed delay. This model will NOT be used in the actual competition.
The configuration of each filter is specified using a Java-style properties file. Each contestant will be asked to submit a configuration file to be installed on the Competition Controller. The configuration file tells the Competition Controller how to communicate with each filter.
There are three primary types of communication that must occur between the Competition Controller and a participating filter: submitting a message to be classified, receiving the classification response, and sending training examples. The configuration file allows the communication parameters for each step to be specified separately. For example, a filter can be configured to accept classification requests via SMTP, provide classifications via POP3/IMAP4, and accept judgments via e-mail to a special address. Filters can also be configured with separate communication parameters for spam and ham judgments.
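The mixed setup described above can be sketched as a properties file plus a small reader. This is only an illustration: the property names come from the reference section of this document, while the host name, the report address, and the parse_properties helper are hypothetical. The parser handles only simple "key: value", "key = value", and "key value" lines, not the full Java properties syntax (no escapes or line continuations).

```python
def parse_properties(text):
    """Parse a small subset of the Java properties format into a dict."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines ('#' or '!').
        if not line or line[0] in "#!":
            continue
        # The key ends at the first ':', '=', or whitespace character.
        for i, ch in enumerate(line):
            if ch in ":=" or ch.isspace():
                key, rest = line[:i], line[i:].lstrip()
                break
        else:
            key, rest = line, ""
        # Drop the separator itself, e.g. the ':' in "key : value".
        if rest[:1] in (":", "="):
            rest = rest[1:]
        props[key] = rest.strip()
    return props


# Hypothetical configuration: requests via SMTP, responses via IMAP4,
# spam judgments via e-mail, ham judgments dropped.
SAMPLE = """\
# Example only; host and address are placeholders.
filter.type: smtp
filter.smtp.server: mail.example.org
response.type: imap4
feedback.spam.type: email
feedback.spam.email.address: spam-report@example.org
feedback.ham.type: none
"""

config = parse_properties(SAMPLE)
```

Reading the file this way makes it easy to check, before submitting it, that each of the three communication steps has the settings you intended.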
Each message sent between the Competition Controller and the anti-spam filter should contain a special 'x-ceas-tracking' header. This header is used to authenticate messages to the Competition Controller as well as to correlate each of the three message types described above with the original test message. This header will be included in all messages sent to the anti-spam filter. The header must be copied verbatim to all response messages sent by the anti-spam filter. Response messages that do not contain the appropriate tracking header will be discarded.
The tracking header is also useful to correlate classification requests and filter judgments. The tracking header is formatted as a MIME-style attribute-value list. The same message identifier will be used for classification requests and the corresponding user-feedback message.
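A filter that wants to match a user judgment to its own earlier verdict could split the tracking header into attributes, as in the sketch below. The attribute name "id" and the sample header values are assumptions made for illustration; this document does not list the actual attribute names. The header itself must still be returned verbatim, so the parsed values are for the filter's internal bookkeeping only.

```python
def parse_tracking(value):
    """Split 'name=value; name=value' into a dict.

    Simplified: does not handle ';' inside quoted values.
    """
    attrs = {}
    for part in value.split(";"):
        name, sep, val = part.partition("=")
        if not sep:
            continue  # ignore fragments without '='
        attrs[name.strip().lower()] = val.strip().strip('"')
    return attrs


# Verdicts the filter has issued, keyed by the (assumed) "id" attribute.
verdicts = {}


def remember_verdict(tracking_value, verdict):
    verdicts[parse_tracking(tracking_value)["id"]] = verdict


def was_mistake(tracking_value, judgment):
    """Return True if the user's judgment contradicts the stored verdict."""
    earlier = verdicts.get(parse_tracking(tracking_value)["id"])
    return earlier is not None and earlier != judgment


# Hypothetical header values for one request and its later judgment.
remember_verdict('id=12345; token="abc"', "ham")
mistake = was_mistake('id=12345; token="abc"', "spam")
```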
This section describes several methods for integrating anti-spam servers into the test environment. The methods described here should cover a majority of use cases. A complete description of all the configuration options available appears in Section 3.0.
The Competition Toolkit contains template configuration files for each of the scenarios described below. The template files also include descriptions of the most important configuration options.
In the basic SMTP model, the Competition Controller delivers incoming mail to the anti-spam filter via SMTP. The anti-spam filter is expected to classify each message as spam or ham, add a header field to the message indicating its classification, and relay the message back to the Competition Controller for scoring. The Competition Controller records the filter's response by extracting the appropriate header field and analyzing its contents. Both the header used to indicate the filter's decision and its format are configurable.
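The tagging step can be sketched with Python's standard email package. The function name tag_message and the sample message are hypothetical, and the value "ham" for non-spam is an assumption (this document only specifies the value "spam"). All original headers, including x-ceas-tracking, are left in place so the relayed message can still be matched to the request.

```python
from email import message_from_string


def tag_message(raw_message, is_spam, header="x-ceas-label"):
    """Return the message text with a classification header added."""
    msg = message_from_string(raw_message)
    # Remove any stale label; deleting a missing header is a no-op.
    del msg[header]
    msg[header] = "spam" if is_spam else "ham"
    return msg.as_string()


# Hypothetical test message; the tracking value is a placeholder.
RAW = (
    "From: sender@example.org\n"
    "To: user@myfilter.ceas-challenge.cc\n"
    "x-ceas-tracking: id=12345\n"
    "Subject: hello\n"
    "\n"
    "body text\n"
)

tagged = tag_message(RAW, is_spam=True)
```

The tagged text is what the filter would hand to its outgoing SMTP relay, which in this setup points back at the Competition Controller.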
User feedback is provided by sending the original message back to the anti-spam filter. Typically, this is done by addressing feedback to special "spam-report" and "ham-report" addresses based on the message's correct label.
Set the address of your SMTP server:
Configure your filter to act as a relay between a perimeter SMTP server and a local delivery server. Set both the incoming and the outgoing server to the address of the Competition Controller. The Competition Controller is designed to ensure that this loopback configuration will not create a mail loop.
The server must be configured to accept e-mail for the sub-domain assigned to your anti-spam filter. The sub-domain will be a concatenation of the filter name you request and "ceas-challenge.cc". For instance, myfilter.ceas-challenge.cc.
The list of valid users at myfilter.ceas-challenge.cc will not be provided. Configure your anti-spam server to relay all local mail back to the Competition Controller.
The anti-spam filter should be configured to tag incoming messages as spam or ham. By default, the Competition Controller expects spam e-mail to be tagged with an "x-ceas-label" header with the value "spam". If a different header is needed, set the following properties in the configuration file:
The regular expression must be written in the java.util.regex format. See http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html for details.
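Before submitting a custom expression, it can help to try it against the header values your filter actually emits. The sketch below uses Python's re module as a stand-in for java.util.regex; the two syntaxes agree for simple patterns such as the documented default "(?i:spam)", but they are not identical in general. Whether the controller requires a full or a partial match is not stated here, so the use of a partial match (re.search) is an assumption, as are the function name and the sample header values.

```python
import re

# Default pattern documented in the reference section.
DEFAULT_SPAM_PATTERN = re.compile(r"(?i:spam)")


def scored_as_spam(header_value, pattern=DEFAULT_SPAM_PATTERN):
    """Return True if the response header value would count as spam.

    A missing header (None) is treated as a ham classification.
    """
    if header_value is None:
        return False
    return pattern.search(header_value) is not None


# Hypothetical pattern for a filter that writes "YES" or "NO" to its header.
yes_pattern = re.compile(r"(?i:yes)")

results = [
    scored_as_spam("SPAM"),
    scored_as_spam("ham"),
    scored_as_spam("Yes, score=9.1", yes_pattern),
]
```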
Alternatively, the anti-spam filter can be configured to block all spam messages and only deliver ham. If this setup is desired, comment out the settings for "response.smtp.header" and "response.smtp.regex" and add the following:
Many anti-spam filters allow user feedback to be provided via addresses for reporting spam and ham. If your filter supports such an address, set one or both of the following properties:
The Competition Controller server includes an IMAP4/POP3 server, and each contestant will be provided an account on that server. Contestants can request that all incoming e-mail be delivered to their account for filtering by their IMAP4- or POP3-based anti-spam filter.
IMAP4-based filters will be expected to move any message identified as spam into an appropriate spam folder. The Competition Controller will monitor the contestant's spam folder and credit the filter for detecting any spam message that is placed in the spam folder. Any feedback provided by the user will be delivered directly to special "judgedspam" and "judgedham" folders. The anti-spam filter can monitor these folders and train on the messages found accordingly.
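For filters without built-in IMAP4 support, the move-to-spam-folder behavior could be approximated with Python's imaplib, roughly as sketched below. The function name process_inbox and the is_spam callback are hypothetical; conn is expected to be a logged-in imaplib.IMAP4 (or IMAP4_SSL) object. Searching for UNSEEN messages is a design choice of this sketch so that each message is classified once, since fetching the body marks it as seen.

```python
def process_inbox(conn, is_spam, spam_folder="spam"):
    """Move every unseen INBOX message flagged by is_spam() to spam_folder.

    conn    -- a logged-in imaplib.IMAP4 or imaplib.IMAP4_SSL connection
    is_spam -- callable taking the raw message bytes, returning True/False
    Returns the number of messages moved.
    """
    conn.select("INBOX")
    _status, data = conn.search(None, "UNSEEN")
    moved = 0
    for num in data[0].split():
        _status, parts = conn.fetch(num, "(RFC822)")
        raw = parts[0][1]  # message bytes from the first response tuple
        if is_spam(raw):
            # IMAP4 has no portable "move": copy, then flag the original.
            conn.copy(num, spam_folder)
            conn.store(num, "+FLAGS", "\\Deleted")
            moved += 1
    # Remove the flagged originals only after the loop, so message
    # sequence numbers stay valid while iterating.
    conn.expunge()
    return moved


# Usage sketch (host, account, and password are placeholders):
#   import imaplib
#   conn = imaplib.IMAP4("controller.example.org")
#   conn.login("myfilter", "secret")
#   process_inbox(conn, my_classifier)
#   conn.logout()
```

The same loop pointed at the "judgedspam" and "judgedham" folders would give the filter its training examples.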
POP3-based filters will be expected to delete from the server any message identified as spam, and to leave on the server any message classified as ham. There is no feedback mechanism for POP3-based solutions. POP3-based solutions will either have to function without feedback, or use a non-POP3 based feedback mechanism such as SMTP.
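The delete-spam, leave-ham contract can be sketched with Python's poplib as follows. The function name process_pop3 and the is_spam callback are hypothetical; conn is expected to be a poplib.POP3 (or POP3_SSL) object that has already authenticated. This simplified version downloads every message on each run; a real client would likely remember which messages it has already classified.

```python
def process_pop3(conn, is_spam):
    """Delete messages flagged by is_spam() and leave all others in place.

    conn    -- an authenticated poplib.POP3 or poplib.POP3_SSL connection
    is_spam -- callable taking the raw message bytes, returning True/False
    Returns the number of messages deleted.
    """
    count, _mailbox_size = conn.stat()
    deleted = 0
    # POP3 message numbers start at 1.
    for number in range(1, count + 1):
        _response, lines, _octets = conn.retr(number)
        raw = b"\r\n".join(lines)
        if is_spam(raw):
            conn.dele(number)
            deleted += 1
    # Deletions take effect only when the session ends with QUIT.
    conn.quit()
    return deleted


# Usage sketch (host, account, and password are placeholders):
#   import poplib
#   conn = poplib.POP3("controller.example.org")
#   conn.user("myfilter")
#   conn.pass_("secret")
#   process_pop3(conn, my_classifier)
```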
Configure your filter to access the Competition Controller IMAP4 server. Set your filter's IMAP4 account and password as appropriate. For locally run Competition Controllers, this will be the account and password you created during the installation process. For the public test server, that data will have been provided when you requested access to the test server.
Set up your filter to process all e-mail received in the filter's inbox. Any messages identified as spam should be moved to the filter's "spam" folder. If a different folder is desired, modify the following property setting to read:
Messages tagged as spam by the user will be placed in the "judgedspam" folder, and messages tagged as ham will be placed in the "judgedham" folder. If supported, configure your filter to scan these folders for new training examples. If different folder locations are required, modify the following property settings:
feedback.spam.imap.folder: <my-train-spam-folder>
feedback.ham.imap.folder: <my-train-ham-folder>
Adjust the polling interval on your server to ensure that messages are processed within the one minute time limit.
Configure your filter to access the Competition Controller POP3 server. Set your filter's POP3 account and password as appropriate. For locally run Competition Controllers, this will be the account and password you created during the installation process. For the public test server, that data will have been provided when you requested access to the test server.
Set up your filter to process all e-mail received in the filter's inbox. Any messages identified as spam should be deleted. Any message identified as ham must be left on the server. However, some POP3 filters cannot be configured to leave ham messages on the server. For such filters, the standard POP3 template will not work and a custom setup is required. See the reference section for possible alternatives.
The POP3 template does not provide any mechanism for receiving user feedback. See the reference section for alternative pathways for accessing feedback data.
Adjust the polling interval on your server to ensure that messages are processed within the one minute time limit.
The TREC toolkit is a framework for testing anti-spam filters that was developed for the TREC anti-spam track. We provide a bridge to the TREC toolkit as many existing anti-spam systems already support the TREC framework. Support for the TREC toolkit is implemented by combining the IMAP4 and SMTP models above with scripts for classifying messages from the IMAP4 server and sending classification results back via SMTP.
The TREC Toolkit bridge works by combining the IMAP4 delivery method, fetchmail, and shell scripts for converting fetchmail actions into calls to TREC-based filters.
The TREC Toolkit bridge is run with the following command:
trec-bridge <controller-address> <filter-name> <trec-filter-directory>
The standard SMTP setup will not work for participants whose anti-spam filter sits behind a firewall. One good solution is to pull the test stream from the Competition Controller's IMAP4 server and relay each message to the anti-spam filter using fetchmail.
A sample fetchmail configuration file for performing this task is provided in fetchmail/smtp-firewall.fetchmail. The configuration file will read all mail in the IMAP inbox, judgedspam, and judgedham folders and relay it to the specified SMTP server. The anti-spam filter can detect which IMAP folder each message originated from based on the presence and value of the 'X-CEAS-Judgment' header. If no header exists, the message is from the INBOX and should be classified. If the header exists, then the message is a judgment and the "X-CEAS-Judgment" header indicates the correct classification of the message.
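The header check described above amounts to a small dispatch step on the filter side. The sketch below shows one way to write it with Python's standard email package; the function name route_message, the returned action names, and the sample messages are hypothetical.

```python
from email import message_from_string


def route_message(raw_message):
    """Decide what to do with a message relayed by fetchmail.

    Returns ("classify", None) for test messages from the INBOX, or
    ("train", label) for judgments, where label is the header value
    ("spam" or "ham") in lower case.
    """
    msg = message_from_string(raw_message)
    # Header lookup in the email package is case-insensitive.
    judgment = msg.get("X-CEAS-Judgment")
    if judgment is None:
        return ("classify", None)
    return ("train", judgment.strip().lower())


# Hypothetical messages; the tracking values are placeholders.
request = "x-ceas-tracking: id=1\nSubject: offer\n\nbody\n"
feedback = "x-ceas-tracking: id=1\nx-ceas-judgment: Spam\nSubject: offer\n\nbody\n"

actions = [route_message(request), route_message(feedback)]
```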
The configuration file provided is just a sample and will not work for all filters. However, fetchmail is a very powerful tool and it should be able to connect most anti-spam filters behind a firewall into the competition environment.
Configure your anti-spam filter as described in Section 2.1, but do not modify any values in smtp-firewall.properties that begin with "filter.*" or "feedback.*".
Download the Competition Toolkit. Make a copy of
If you are using a local Competition Controller, edit the above file and replace:
fetchmail -f smtp-firewall-copy.fetchmailrc --user <filter-name> -d 10s
REFERENCE
---------

3.1 Submitting Messages for Classification
------------------------------------------

filter.type: <delivery-type>    [required]

    Specify the primary delivery mechanism for submitting messages to be
    classified. Allowable values are:

    smtp    Relay the competition e-mail stream via SMTP. The message will
            be relayed with the original SMTP "MAIL From" and "RCPT To"
            address lists. The SMTP "HELO" and TCP/IP connection
            information will be stored in the first "Received" line.

    imap4   Relay the competition e-mail stream to the local POP3/IMAP4
            server. The message will be delivered to the contestant's
            account on the competition POP3/IMAP4 server. The SMTP
            "MAIL From" will be recorded in the "Return-Path" header. The
            SMTP recipient list will be recorded in the "Delivered-To"
            header.

    email   The competition stream will be sent to the specified e-mail
            address. The message will be sent with the original SMTP
            "MAIL From". The original recipient list is not available
            when e-mail delivery is used.

    none    The competition stream will not be delivered anywhere. This
            is of little use for the test stream itself, but the same
            delivery type is useful for dropping feedback messages.

filter.smtp.server: <host-address>    [required for smtp]
filter.smtp.port: <port-number>       [optional]

    Sets the host address and port used to relay competition messages
    when the filter type is set to "smtp". The port number defaults to 25.

filter.smtp.recipient: <username>    [optional]

    Configure SMTP delivery to send all messages to <username> rather
    than to the message's original recipient list.

filter.imap.folder: <folder-name>    [optional]

    Configure POP3/IMAP4 delivery to redirect messages to the indicated
    folder instead of the default, the user's inbox.

filter.email.address: <email-address>    [required for e-mail]

    Destination address for e-mail based delivery.

3.2 Sending Back Classifications
--------------------------------

Classifications are sent back to the Competition Controller using either SMTP or POP3/IMAP4 mailboxes.

All responses take the form of an e-mail message. It is recommended that the original e-mail message be sent back for this purpose, as it helps ensure the response is correctly recorded. At a minimum, the "x-ceas-tracking" header present in the original classification request must be sent back with the filter's response. For security reasons, any response received without the original tracking header will be discarded.

response.type: <type>    [required]

    Specify the mechanism used by the anti-spam filter to return
    responses to the Competition Controller for scoring. Allowable
    values are:

    smtp    The anti-spam filter will relay the original message back to
            the Competition Controller via SMTP with its response stored
            in a special header.

    imap4   The anti-spam filter will move all spam messages into a
            special junk folder. The Competition Controller will scan the
            specified junk folder and score each message found in that
            folder as spam. All other messages are assumed to be
            classified as ham.

    pop3    The anti-spam filter will delete all spam from its POP3/IMAP4
            account. The Competition Controller will scan the filter's
            INBOX and score as spam any message that is deleted. Messages
            that remain in the INBOX after the response timeout period
            has expired will be marked ham.

response.smtp.header: <mime-header>    [optional]

    Configures the MIME header used to store the filter's SMTP-based
    response. If this property is not specified, a value of
    'x-ceas-label' is assumed.

response.smtp.regex: <java-regex>    [optional]

    Specifies the regular expression for determining whether the
    filter's response header represents a spam classification. Any
    message in which the response header matches <java-regex> will be
    scored as spam. All other responses will be treated as ham
    classifications. The regular expression uses the Java regular
    expression syntax (see the java.util.regex package for details),
    which in turn is based on the Perl regular expression syntax.

    The default is "(?i:spam)", which performs a case-insensitive match
    for the string literal "spam".

response.spam.imap.folder: <mailbox>    [optional]

    Specify the IMAP4 folder to check for IMAP4-based spam responses.
    The default is "spam".

3.3 Sending Judgments
---------------------

The mechanisms available for sending judgments closely parallel those available for sending the test stream. Separate delivery options can be specified for spam and ham judgments. Judgments will be sent using the original test message, but with an 'x-ceas-judgment' header added with the value "spam" or "ham" as appropriate. The original message and the judgment message will be sent with identical message IDs stored in their respective "x-ceas-tracking" headers.

feedback.spam.type: <delivery-type>    [required]
feedback.ham.type: <delivery-type>     [required]

    Specify the primary delivery mechanism for sending judgments to the
    filter. Allowable values are:

    smtp    Relay judgments via SMTP.

    imap4   Relay judgment messages to the filter's judgment folders on
            the Competition Controller's POP3/IMAP4 server. By default,
            spam judgments go to "judgedspam" and ham judgments go to
            "judgedham".

    email   Send judgments via e-mail.

    none    Do not send judgments.

feedback.spam.smtp.server    [optional]
feedback.spam.smtp.port
feedback.ham.smtp.server
feedback.ham.smtp.port

    Sets the host address and port used to relay judgment messages when
    the feedback type is set to "smtp". If not specified, the feedback
    server address and port number default to the server and port
    defined in "filter.smtp.server" and "filter.smtp.port".

feedback.spam.smtp.recipient: <username>    [optional]
feedback.ham.smtp.recipient: <username>

    Configure judgments to be sent to <username> rather than to the
    message's original recipient list. This setting is useful for
    solutions that provide special addresses for sending false positive
    and false negative reports.

feedback.spam.imap.folder: <folder-name>    [optional]
feedback.ham.imap.folder: <folder-name>

    Configure judgments to be sent to the indicated folder instead of
    the default.

feedback.spam.email.address: <email-address>    [required for e-mail]
feedback.ham.email.address: <email-address>

    Destination address for sending judgments via e-mail.
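The [required] markers and allowable values listed in this reference can be turned into a quick self-check to run before submitting a configuration file. The sketch below is only a convenience built from the rules stated above; it is not the Competition Controller's own validation, and the names ALLOWED_TYPES, REQUIRED_WITH, and validate are hypothetical. It assumes the configuration has already been read into a dictionary of property names to values.

```python
# Allowable values for each required "type" property, as listed above.
ALLOWED_TYPES = {
    "filter.type": {"smtp", "imap4", "email", "none"},
    "response.type": {"smtp", "imap4", "pop3"},
    "feedback.spam.type": {"smtp", "imap4", "email", "none"},
    "feedback.ham.type": {"smtp", "imap4", "email", "none"},
}

# Properties marked "required for ..." a particular delivery type.
REQUIRED_WITH = {
    ("filter.type", "smtp"): "filter.smtp.server",
    ("filter.type", "email"): "filter.email.address",
    ("feedback.spam.type", "email"): "feedback.spam.email.address",
    ("feedback.ham.type", "email"): "feedback.ham.email.address",
}


def validate(config):
    """Return a list of human-readable problems; empty if none found."""
    errors = []
    for key, allowed in ALLOWED_TYPES.items():
        value = config.get(key)
        if value is None:
            errors.append("missing required property: " + key)
        elif value not in allowed:
            errors.append("invalid value for %s: %s" % (key, value))
    for (type_key, type_value), needed in REQUIRED_WITH.items():
        if config.get(type_key) == type_value and not config.get(needed):
            errors.append(
                "%s is required when %s is %s" % (needed, type_key, type_value)
            )
    return errors


# Hypothetical configuration with two mistakes and two omissions.
problems = validate({"filter.type": "email", "response.type": "ftp"})
```

Note that the feedback SMTP server settings are not checked here because they fall back to "filter.smtp.server" and "filter.smtp.port" when omitted.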