[Dspam-user] DSPAM Not Working Very Effectively

Jerry Gardner
I set up a new mail server about four months ago with DSPAM as the spam filter. While it is filtering out some spam, it completely misses most of it. I trained it for the first 2500 emails, but it doesn't seem to be getting any better at filtering after that. I regularly retrain any false negatives and false positives it finds. For example, I get a few dozen spams a day advertising vacations in Belize. They're all almost identical, and I've been getting them consistently for months, yet DSPAM is still not filtering them out and is marking them as

X-DSPAM-Confidence: 0.9803
X-DSPAM-Probability: 0.0000
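To check how DSPAM scored each of the Belize messages, the two headers above can be read back out of the delivered mail. This is a minimal sketch using Python's standard-library `email` parser. It assumes that X-DSPAM-Probability is the spam probability and that X-DSPAM-Confidence is how sure DSPAM is of its verdict. The sample message and the `is_confident_miss` cutoffs are hypothetical.

```python
from email import message_from_string


def dspam_scores(raw_message: str) -> tuple[float, float]:
    """Return (confidence, probability) read from the X-DSPAM-* headers."""
    msg = message_from_string(raw_message)
    confidence = float(msg.get("X-DSPAM-Confidence", "nan"))
    probability = float(msg.get("X-DSPAM-Probability", "nan"))
    return confidence, probability


def is_confident_miss(confidence: float, probability: float,
                      prob_cutoff: float = 0.5, conf_cutoff: float = 0.9) -> bool:
    """Hypothetical check for a known spam that DSPAM scored as innocent
    (low spam probability) while being highly confident about it."""
    return probability < prob_cutoff and confidence >= conf_cutoff


# Hypothetical sample message carrying the header values quoted above.
SAMPLE = (
    "Subject: Vacation in Belize\n"
    "X-DSPAM-Confidence: 0.9803\n"
    "X-DSPAM-Probability: 0.0000\n"
    "\n"
    "Book your trip today.\n"
)

conf, prob = dspam_scores(SAMPLE)
print(conf, prob, is_confident_miss(conf, prob))
```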

Here's the output of dspam_stats:

    TP: 19302 TN: 17498 FP:    45 FN:  1337 SC:     0 NC:     0

Here's the output of dspam_stats -H:

                FP False Positives:                   45
                FN False Negatives:                 1337
                SC Spam Corpusfed:                     0
                NC Nonspam Corpusfed:                  0
                TL Training Left:                      0
                SHR Spam Hit Rate                 93.52%
                HSR Ham Strike Rate:               0.26%
                PPV Positive predictive value:    99.77%
                OCA Overall Accuracy:             96.38%


One thing I don't understand is TN: according to dspam_stats it's 17498, yet I have received only around 1900 non-spams in total since I set up this server.
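The percentages in the dspam_stats -H output can all be derived from the four raw counts. The sketch below assumes the usual confusion-matrix definitions. These are inferred rather than taken from the DSPAM source, but they reproduce all four reported figures. The function name is hypothetical.

```python
def dspam_rates(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Recompute the dspam_stats -H percentages from raw counts
    (assumed confusion-matrix definitions)."""
    return {
        "SHR": 100 * tp / (tp + fn),                   # spam hit rate
        "HSR": 100 * fp / (tn + fp),                   # ham strike rate
        "PPV": 100 * tp / (tp + fp),                   # positive predictive value
        "OCA": 100 * (tp + tn) / (tp + tn + fp + fn),  # overall accuracy
    }


rates = dspam_rates(tp=19302, tn=17498, fp=45, fn=1337)
for name, value in rates.items():
    print(f"{name}: {value:.2f}%")
```

TN appears in both HSR and OCA. An inflated TN count (17498 recorded against roughly 1900 non-spams actually received) therefore makes the ham strike rate look lower and the overall accuracy look higher than the real mail stream would justify. SHR and PPV do not depend on TN.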

What can I do to help DSPAM learn faster so I can cut down all of the spam I have to manually deal with each day?

Here's my dspam.conf:

## $Id: dspam.conf.in,v 1.103 2011/11/10 00:27:34 tomhendr Exp $
## dspam.conf -- DSPAM configuration file
##

#
# DSPAM Home: Specifies the base directory to be used for DSPAM storage
#
Home /var/spool/dspam

#
# StorageDriver: Specifies the storage driver backend (library) to use.
# You'll only need to set this if you are using dynamic storage driver plugins
# from a binary distribution. The default build statically links the storage
# driver (when only one is specified at configure time), overriding this
# setting, which only comes into play if multiple storage drivers are specified
# at configure time. When using dynamic linking, be sure to include the path
# to the library if necessary, and some systems may use an extension other
# than .so (e.g. OSX uses .dylib).
#
# Options include:
#
#   libmysql_drv.so     libpgsql_drv.so
#   libsqlite3_drv.so   libhash_drv.so
#
# IMPORTANT: Switching storage drivers requires more than merely changing
# this option. If you do not wish to lose all of your data, you will need to
# migrate it to the new backend before making this change.
#
StorageDriver /usr/lib/x86_64-linux-gnu/dspam/libhash_drv.so
#StorageDriver /usr/lib/dspam/libmysql_drv.so

#
# Trusted Delivery Agent: Specifies the local delivery agent DSPAM should call
# when delivering mail as a trusted user. Use %u to specify the user DSPAM is
# processing mail for. It is generally a good idea to allow the MTA to specify
# the pass-through arguments at run-time, but they may also be specified here.
#
# Most operating system defaults:
#TrustedDeliveryAgent "/usr/bin/procmail"       # Linux
#TrustedDeliveryAgent "/usr/bin/mail"           # Solaris
#TrustedDeliveryAgent "/usr/libexec/mail.local" # FreeBSD
#TrustedDeliveryAgent "/usr/bin/procmail"       # Cygwin
#
# Other popular configurations:
#TrustedDeliveryAgent "/usr/cyrus/bin/deliver"  # Cyrus
#TrustedDeliveryAgent "/bin/maildrop"           # Maildrop
#TrustedDeliveryAgent "/usr/local/sbin/exim -oMr spam-scanned -oi" # Exim
#
TrustedDeliveryAgent "/usr/bin/procmail"

#
# Untrusted Delivery Agent: Specifies the local delivery agent and arguments
# DSPAM should use when delivering mail and running in untrusted user mode.
# Because DSPAM will not allow pass-through arguments to be specified to
# untrusted users, all arguments should be specified here. Use %u to specify
# the user DSPAM is processing mail for. This configuration parameter is only
# necessary if you plan on allowing untrusted processing.
#
#UntrustedDeliveryAgent "/usr/bin/procmail -d %u"

#
# SMTP or LMTP Delivery: Alternatively, you may wish to use SMTP or LMTP
# delivery to deliver your message to the mail server instead of using a
# delivery agent. You will need to configure with --enable-daemon to use host
# delivery, however you do not need to operate in daemon mode. Specify an IP
# address or UNIX path to a domain socket below as a host.
#
# If you would like to set up DeliveryHost's on a per-domain basis, use
# the syntax: DeliveryHost.example.org 1.2.3.4
#
DeliveryHost            127.0.0.1
#DeliveryPort           2424
DeliveryPort            10026
DeliveryIdent           localhost
#DeliveryProto          LMTP
DeliveryProto           SMTP

#
# FallbackDomains: If you want to specify certain domains as fallback domains,
# enable this option. For example, you could create a user @example.org, and
# if [hidden email] does not resolve to a known user on the system, the user
# could default to your @example.org user. NOTE: This also requires designating
# fallbackDomain for the domain name;
# e.g. dspam_admin ch pref example.org fallbackDomain on
#
FallbackDomains on
FallbackDomain xm23.net

#
# Quarantine Agent: DSPAM's default behavior is to quarantine all mail it
# thinks is spam. If you wish to override this behavior, you may specify
# a quarantine agent which will be called with all messages DSPAM thinks is
# spam. Use %u to specify the user DSPAM is processing mail for.
#
#QuarantineAgent        "/usr/bin/procmail -d spam"

#
# DSPAM can optionally process "plused users" (addresses in the user+detail
# form) by truncating the username just before the "+", so all internal
# processing occurs for "user", but delivery will be performed for
# "user+detail". This is only useful if the LDA can handle "plused users"
# (for example Cyrus IMAP) and when configured for LMTP delivery above
#
#EnablePlusedDetail     on

#
# Character to use as separator between user names and address extensions.
# If you change this value then please adjust QuarantineMailbox to use the
# new specified character. The default is '+'.
#
#PlusedCharacter        +

#
# Turn this feature on if you want to force DSPAM to lowercase the "plused
# users" username.
#
#PlusedUserLowercase    on

#
# Quarantine Mailbox: DSPAM's LMTP code can send spam mail using LMTP to a
# "plused" mailbox (such as user+quarantine) leaving quarantine processing
# for retraining or deletion to be performed by the LDA and the mail client.
# "plused" mailboxes are supported by Cyrus IMAP and possibly other LDAs. If
# you don't set/change PlusedCharacter then the mailbox name must have the +
# since the + is the default used character.
#
#QuarantineMailbox      +quarantine

#
# OnFail: What to do if local delivery or quarantine should fail. If set
# to "unlearn", DSPAM will unlearn the message prior to exiting with an
# unsuccessful return code. The default option, "error", will not unlearn
# the message but return the appropriate error code. The unlearn option
# is useful on some systems where local delivery failures will cause the
# message to be requeued for delivery, and could result in the message
# being processed multiple times. During a very large failure, however,
# this could cause a significant load increase.
#
OnFail error

#
# Trusted Users: Only the users specified below will be allowed to perform
# administrative functions in DSPAM such as setting the active user and
# accessing tools. All other users attempting to run DSPAM will be restricted;
# their uids will be forced to match the active username and they will not be
# able to specify delivery agent privileges or use tools.
#
Trust root
Trust dspam
Trust www-data
Trust mail
Trust daemon
Trust amavis
#Trust nobody
#Trust majordomo

#
# Debugging: Enables debugging for some or all users. IMPORTANT: DSPAM must
# be compiled with debug support in order to use this option. DSPAM should
# never be running in production with debug active unless you are
# troubleshooting problems.
#
# DebugOpt: One or more of: process, classify, spam, fp, inoculation, corpus
#   process     standard message processing
#   classify    message classification using --classify
#   spam        error correction of missed spam
#   fp          error correction of false positives
#   inoculation message inoculations (source=inoculation)
#   corpus      corpusfed messages (source=corpus)
#
Debug process
#Debug bob bill
#
#DebugOpt process spam fp

#
# ClassAlias: Alias a particular class to spam/nonspam. This is useful if
# classifying things other than spam.
#
#ClassAliasSpam badstuff
#ClassAliasNonspam goodstuff

#
# Training Mode: The default training mode to use for all operations, when
# one has not been specified on the commandline or in the user's preferences.
# Acceptable values are:
#     toe     Train on Error (Only)
#     teft    Train Everything (Trains on every message)
#     tum     Train Until Mature (Train only tokens without enough data)
#     notrain Do not train or store signatures (large ISP systems, post-train)
#
TrainingMode teft

#
# TestConditionalTraining: By default, dspam will retrain certain errors
# until the condition is no longer met. This usually accelerates learning.
# Some people argue that this can increase the risk of errors, however.
#
TestConditionalTraining on

#
# Features: Specify features to activate by default; can also be specified
# on the commandline. See the documentation for a list of available features.
# If _any_ features are specified on the commandline, these are ignored.
#
Feature noise
Feature whitelist

# Training Buffer: The training buffer waters down statistics during training.
# It is designed to prevent false positives, but can also dramatically reduce
# dspam's catch rate during initial training. This can be a number from 0
# (no buffering) to 10 (maximum buffering). If you are paranoid about false
# positives, you should probably enable this option.
#
#Feature tb=5

#
# Algorithms: Specify the statistical algorithms to use, overriding any
# defaults configured in the build. The options are:
#    naive       Naive-Bayesian (All Tokens)
#    graham      Graham-Bayesian ("A Plan for Spam")
#    burton      Burton-Bayesian (SpamProbe)
#    robinson    Robinson's Geometric Mean Test (Obsolete)
#    chi-square  Fisher-Robinson's Chi-Square Algorithm
#
# You may have multiple algorithms active simultaneously, but it is strongly
# recommended that you group Bayesian algorithms with other Bayesian
# algorithms, and any use of Chi-Square remain exclusive.
#
# NOTE: For standard "CRM114" Markovian weighting, use 'naive', or consider
#       using 'burton' for slightly better accuracy
#
# Don't mess with this unless you know what you're doing
#
#Algorithm chi-square
#Algorithm naive
Algorithm graham burton

#
# Tokenizer: Specify the tokenizer to use. The tokenizer is the piece
# responsible for parsing the message into individual tokens. Depending on
# how many resources you are willing to trade off vs. accuracy, you may
# choose to use a less or more detailed tokenizer:
#   word    uniGram (single word) tokenizer
#           Tokenizes message into single individual words/tokens
#           example: "free" and "viagra"
#   chain   biGram (chained tokens) tokenizer (default)
#           Single words + chains adjacent tokens together
#           example: "free" and "viagra" and "free viagra"
#   sbph    Sparse Binary Polynomial Hashing tokenizer
#           Creates sparse token patterns across sliding window of 5-tokens
#           example: "the quick * fox jumped" and "the * * fox jumped"
#   osb     Orthogonal Sparse biGram tokenizer
#           Similar to SBPH, but only uses the biGrams
#           example: "the * * fox" and "the * * * jumped"
#
# In general the recommendation is to use 'osb' for new installations.
# The default value of 'chain' remains here so as not to surprise anyone upgrading
# that has not changed from the default value.
#
Tokenizer chain

#
# PValue: Specify the technique used for calculating Probability Values,
# overriding any defaults configured in the build. These options are:
#    bcr         Bayesian Chain Rule (Graham's Technique - "A Plan for Spam")
#    robinson    Robinson's Technique (used in Chi-Square)
#    markov      Markovian Weighted Technique (for Markovian discrimination)
#
# Unlike the "Algorithms" property, you may only have one of these defined.
# Use of the chi-square algorithm automatically changes this to robinson.
#
# Don't mess with this unless you know what you're doing.
#
#PValue robinson
#PValue markov
PValue bcr

#
# WebStats: Enable this if you are using the CGI, which writes .stats files
WebStats on

#
# ImprobabilityDrive: Calculate odds-ratios for ham/spam, and add to
# X-DSPAM-Improbability headers
#
#ImprobabilityDrive on

#
# Preferences: Specify any preferences to set by default, unless otherwise
# overridden by the user (see next section) or a default.prefs file.
# If user or default.prefs are found, the user's preferences will override any
# defaults.
#
Preference "trainingMode=TEFT"          # { TOE | TUM | TEFT | NOTRAIN } -> default:teft
Preference "spamAction=tag"             # { quarantine | tag | deliver } -> default:quarantine
Preference "spamSubject=[SPAM]"         # { string } -> default:[SPAM]
Preference "statisticalSedation=5"      # { 0 - 10 } -> default:0
Preference "enableBNR=on"               # { on | off } -> default:off
Preference "enableWhitelist=on"         # { on | off } -> default:on
Preference "signatureLocation=message"  # { message | headers } -> default:message
Preference "tagSpam=off"                # { on | off }
Preference "tagNonspam=off"             # { on | off }
Preference "showFactors=off"            # { on | off } -> default:off
Preference "optIn=off"                  # { on | off }
Preference "optOut=off"                 # { on | off }
Preference "whitelistThreshold=10"      # { Integer } -> default:10
Preference "makeCorpus=off"             # { on | off } -> default:off
Preference "storeFragments=off"         # { on | off } -> default:off
Preference "localStore="                # { on | off } -> default:username
Preference "processorBias=on"           # { on | off } -> default:on
Preference "fallbackDomain=off"         # { on | off } -> default:off
Preference "trainPristine=off"          # { on | off } -> default:off
Preference "optOutClamAV=off"           # { on | off } -> default:off
Preference "ignoreRBLLookups=off"       # { on | off } -> default:off
Preference "RBLInoculate=off"           # { on | off } -> default:off
Preference "notifications=off"          # { on | off } -> default:off

#
# Overrides: Specifies the user preferences which may override configuration
# and commandline defaults. Any other preferences supplied by an untrusted user
# will be ignored.
#
AllowOverride enableBNR
AllowOverride enableWhitelist
AllowOverride fallbackDomain
AllowOverride ignoreGroups
AllowOverride ignoreRBLLookups
AllowOverride localStore
AllowOverride makeCorpus
AllowOverride optIn
AllowOverride optOut
AllowOverride optOutClamAV
AllowOverride processorBias
AllowOverride RBLInoculate
AllowOverride showFactors
AllowOverride signatureLocation
AllowOverride spamAction
AllowOverride spamSubject
AllowOverride statisticalSedation
AllowOverride storeFragments
AllowOverride tagNonspam
AllowOverride tagSpam
AllowOverride trainPristine
AllowOverride trainingMode
AllowOverride whitelistThreshold
AllowOverride dailyQuarantineSummary
AllowOverride notifications

# --- Profiles ---

#
# You can specify multiple storage profiles, and specify the server to
# use on the commandline with --profile. For example:
#
#Profile DECAlpha
#MySQLServer.DECAlpha   10.0.0.1
#MySQLPort.DECAlpha     3306
#MySQLUser.DECAlpha     dspam
#MySQLPass.DECAlpha     changeme
#MySQLDb.DECAlpha       dspam
#MySQLCompress.DECAlpha true
#MySQLReconnect.DECAlpha        true
#
#Profile Sun420R
#MySQLServer.Sun420R    10.0.0.2
#MySQLPort.Sun420R      3306
#MySQLUser.Sun420R      dspam
#MySQLPass.Sun420R      changeme
#MySQLDb.Sun420R        dspam
#MySQLCompress.Sun420R  false
#MySQLReconnect.Sun420R true
#
#DefaultProfile DECAlpha

#
# If you're using storage profiles, you can set failovers for each profile.
# Of course, if you'll be failing over to another database, that database
# must have the same information as the first. If you're using a global
# database with no training, this should be relatively simple. If you're
# configuring per-user data, however, you'll need to set up some type of
# replication between databases.
#
#Failover.DECAlpha      SUN420R
#Failover.Sun420R       DECAlpha

# If the storage fails, the agent will follow each profile's failover up to
# a maximum number of failover attempts. This should be set to a maximum of
# the number of profiles you have, otherwise the agent could loop and try
# the same profile multiple times (unless this is your desired behavior).
#
#FailoverAttempts       1

#
# Ignored headers: If DSPAM is behind other tools which may add a header to
# incoming emails, it may be beneficial to ignore these headers - especially
# if they are coming from another spam filter. If you are _not_ using one of
# these tools, however, leaving the appropriate headers commented out will
# allow DSPAM to use them as telltale signs of forged email.
#
#IgnoreHeader X-Spam-Status
#IgnoreHeader X-Spam-Scanned
#IgnoreHeader X-Virus-Scanner-Result

#
# Lookup: Perform lookups on streamlined blackhole list servers (see
# http://www.nuclearelephant.com/projects/sbl/). The streamlined blacklist
# server is a machine-automated, unsupervised blacklisting system designed to
# provide real-time and highly accurate blacklisting based on network spread.
# When performing a lookup, DSPAM will automatically learn the inbound message
# as spam if the source IP is listed. Until an official public RABL server is
# available, this feature is only useful if you are running your own
# streamlined blackhole list server for internal reporting among multiple mail
# servers. Provide the name of the lookup zone below to use.
#
# This function performs standard reverse-octet.domain lookups, and while it
# will function with many RBLs, it's strongly discouraged to use those
# maintained by humans as they're often inaccurate and could hurt filter
# learning and accuracy.
#
#Lookup         "sbl.example.org"

#
# RBLInoculate: If you want to inoculate the user from RBL'd messages it would
# have otherwise missed, set this to on.
#
#RBLInoculate   off

#
# Notifications: Enable the sending of notification emails to users (first
# message, quarantine full, etc.)
#
Notifications   off

# TxtDirectory: the directory that holds the templates for notification
# messages (see Notifications) and tagging (see tagSpam/tagNonspam).
#
#TxtDirectory /etc/dspam/txt

#
# QuarantineWarnSize: You may specify a size when DSPAM should send a "Quarantine
# Full" message to each user. This only works if you enable notifications
# (see above). Value is in bytes. Default is 2097152 -> 2MB.
#
#QuarantineWarnSize 2097152

#
# Purge configuration: Set dspam_clean purge default options, if not otherwise
# specified on the commandline
#
PurgeSignatures 14      # Stale signatures
PurgeNeutral    90      # Tokens with neutralish probabilities
PurgeUnused     90      # Unused tokens
PurgeHapaxes    30      # Tokens with less than 5 hits (hapaxes)
PurgeHits1S     15      # Tokens with only 1 spam hit
PurgeHits1I     15      # Tokens with only 1 innocent hit

#
# Purge configuration for SQL-based installations using purge.sql
#
#PurgeSignature off     # Specified in purge.sql
#PurgeNeutral   90
#PurgeUnused    off     # Specified in purge.sql
#PurgeHapaxes   off     # Specified in purge.sql
#PurgeHits1S    off     # Specified in purge.sql
#PurgeHits1I    off     # Specified in purge.sql

#
# Local Mail Exchangers: Used for source address tracking, tells DSPAM which
# mail exchangers are local and therefore should be ignored in the Received:
# header when tracking the source of an email. Note: you should use the address
# of the host as appears between brackets [ ] in the Received header.
# By default DSPAM is considering the following IPs always as LocalMX:
#       10.0.0.0/8      - Private IP addresses (RFC 1918)
#       127.0.0.0/8     - Localhost Loopback Address (RFC 1700)
#       169.254.0.0/16  - Zeroconf / APIPA (RFC 3330)
#       172.16.0.0/12   - Private IP addresses (RFC 1918)
#       192.168.0.0/16  - Private IP addresses (RFC 1918)
#
LocalMX 127.0.0.1

#
# Logging: Disabling logging for users will make usage graphs unavailable to
# them. Disabling system logging will make admin graphs unavailable.
#
SystemLog       on
UserLog         on

#
# TrainPristine: for systems where the original message remains server side
# and can therefore be presented in pristine format for retraining. This option
# will cause DSPAM to cease all writing of signatures and DSPAM headers to the
# message, and deliver the message in as pristine format as possible. This mode
# REQUIRES that the original message in its pristine format (as of delivery)
# be presented for retraining, as in the case of webmail, imap, or other
# applications where the message is actually kept server-side during reading,
# and is preserved. DO NOT use this switch unless the original message can be
# presented for retraining with the ORIGINAL HEADERS and NO MODIFICATIONS.
#
# NOTE: You can't use this setting with dspam_train; if you're going to use it,
#       wait until after you train any corpora.
#
#TrainPristine on

#
# Opt: in or out; determines DSPAM's default filtering behavior. If this value
# is set to in, users must opt-in to filtering by dropping a .dspam file in
# /var/dspam/opt-in/user.dspam (or if you have homedirs configured, a .dspam
# folder in their home directory).  The default is opt-out, which means all
# users will be filtered unless a .nodspam file is dropped in
# /var/dspam/opt-out/user.nodspam
#
Opt out

#
# TrackSources: specify which (if any) source addresses to track and report
# them to syslog (mail.info). This is useful if you're running a firewall or
# blacklist and would like to use this information. Spam reporting also drops
# RABL blacklist files (see http://www.nuclearelephant.com/projects/rabl/).
#
#TrackSources spam nonspam virus

#
# ParseToHeaders: In lieu of setting up individual aliases for each user,
# DSPAM can be configured to automatically parse the To: address for spam and
# false positive forwards. From there, it can be configured to either set the
# DSPAM user based on the username specified in the header and/or change the
# training class and source accordingly. The options below can be used to
# customize most common types of header parsing behavior to avoid the need for
# multiple aliases, or if using LMTP, aliases entirely.
#
# ParseToHeader: Parse the To: headers of an incoming message. This must be
#                set to 'on' to use either of the following features.
#
# ChangeModeOnParse: Automatically change the class (to spam or innocent)
#   depending on whether spam- or notspam- was specified, and change the source
#   to 'error'. This is convenient if you're not using aliases at all, but
#   are delivering via LMTP.
#
# ChangeUserOnParse: Automatically change the username to match that specified
#   in the To: header. For example, [hidden email] will set the username
#   to bob, ignoring any --user passed in. This may not always be desirable if
#   you are using virtual email addresses as usernames. Options:
#     on or user        take the portion before the @ sign only
#     full              take everything after the initial {spam,notspam}-.
#
ParseToHeaders on
ChangeModeOnParse on
#ChangeUserOnParse on
ChangeUserOnParse full

#
# Broken MTA Options: Some MTAs don't support the proper functionality
# necessary. In these cases you can activate certain features in DSPAM to
# compensate. 'returnCodes' causes DSPAM to return an exit code of 99 if
# the message is spam, 0 if not, or a negative code if an error has occurred.
# Specifying 'case' causes DSPAM to force the input usernames to lowercase.
# Specifying 'lineStripping' causes DSPAM to strip ^M's from messages passed
# in.
#
#Broken returnCodes
#Broken case
#Broken lineStripping

#
# MaxMessageSize: You may specify a maximum message size for DSPAM to process.
# If the message is larger than the maximum size, it will be delivered
# without processing. Value is in bytes.
#
#MaxMessageSize 4194304

# --- ClamAV ---

#
# Virus Checking: If you are running clamd, DSPAM can perform stream-based
# virus checking using TCP. Uncomment the values below to enable virus
# checking.
#
# ClamAVResponse: reject (reject or drop the message with a permanent failure)
#                 accept (accept the message and quietly drop the message)
#                 spam   (treat as spam and quarantine/tag/whatever)
#
#ClamAVPort             3310
#ClamAVHost             127.0.0.1
#ClamAVResponse         accept

# --- CLIENT / SERVER ---

#
# Daemonized Server: If you are running DSPAM as a daemonized server using
# --daemon, the following parameters will override the default. Use the
# ServerPass option to set up accounts for each client machine. The DSPAM
# server will process and deliver the message based on the parameters
# specified. If you want the client machine to perform delivery, use
# the --stdout option in conjunction with a local setup.
#
# ServerHost: Not enabling ServerHost will bind DSPAM server to all available
# interfaces.
#
# ServerPort: Default upstream configuration is to run dspam daemon on port
# 24. On Debian, dspam being run as a unprivileged user, default port is
# set to 2424.
#
#ServerHost             127.0.0.1
#ServerPort             2424
#ServerQueueSize        32
ServerPID               /var/run/dspam/dspam.pid

#
# ServerMode specifies the type of LMTP server to start. This can be one of:
#     dspam: DSPAM-proprietary DLMTP server, for communicating with dspamc
#  standard: Standard LMTP server, for communicating with Postfix or other MTA
#      auto: Speak both DLMTP and LMTP; auto-detect by ServerPass.IDENT
#
#ServerMode dspam
ServerMode auto

# If supporting DLMTP (dspam) mode, dspam clients will require authentication
# as they will be passing in parameters. The idents below will be used to
# determine which clients will be speaking DLMTP, so if you will be using
# both LMTP and DLMTP from the same host, be sure to use something other
# than the server's hostname below (which will be sent by the MTA during a
# standard LMTP LHLO).
#
#ServerPass.Relay1      "secret"
#ServerPass.Relay2      "password"

# If supporting standard LMTP mode, server parameters will need to be specified
# here, as they will not be passed in by the mail server. The ServerIdent
# specifies the 250 response code ident sent back to connecting clients and
# should be set to the hostname of your server, or an alias.
#
# NOTE: If you specify --user in ServerParameters, the RCPT TO will be
#       used only for delivery, and not set as the active user for processing.
#
#ServerParameters       "--deliver=innocent -d %u"
ServerParameters        "--deliver=innocent"
#ServerIdent            "localhost.localdomain"
ServerIdent             "localhost.xm23.net"

# If you wish to use a local domain socket instead of a TCP socket, uncomment
# the following. It is strongly recommended you use local domain sockets if
# you are running the client and server on the same machine, as it eliminates
# much of the bandwidth overhead.
#
#ServerDomainSocketPath "/var/run/dspam/dspam.sock"
ServerDomainSocketPath  "/var/spool/postfix/tmp/dspam.sock"

#
# Client Mode: If you are running DSPAM in client/server mode, uncomment and
# set these variables. A ClientHost beginning with a / will be treated as
# a domain socket.
#
#ClientHost     /var/run/dspam/dspam.sock
#ClientIdent    "secret@Relay1"
#
#ClientHost     127.0.0.1
#ClientPort     2424
#ClientIdent    "secret@Relay1"

# --- RABL ---

# RABLQueue: Touch files in the RABL queue
# If you are a reporting streamlined blackhole list participant, you can
# touch ip addresses within the directory the rabl_client process is watching.
#
#RABLQueue      /var/spool/rabl

# ---  ---

# DataSource: If you are using any type of data source that does not include
# email-like headers (such as documents), uncomment the line below. This
# will cause the entire input to be treated like a message "body"
#
#DataSource document

# ProcessorWordFrequency: By default, words are only counted once per message.
# If you are classifying large documents, however, you may wish to count once
# per occurrence instead.
#
#ProcessorWordFrequency occurrence

# ProcessorURLContext: By default, a URL context is generated for URLs, which
# records their tokens as separate from words found in documents. To use
# URL tokens in the same context as words, turn this feature off.
#
ProcessorURLContext on

# ProcessorBias: Bias causes the filter to lean more toward 'innocent', and
# usually greatly reduces false positives. It is the default behavior of
# most Bayesian filters (including dspam).
#
# NOTE: You probably DON'T want this if you're using Markovian Weighting, unless
# you are paranoid about false positives.
#
ProcessorBias on

# StripRcptDomain: Cut the domain (including the at sign) from recipients.
# This is particularly useful if the recipient name is equal to real user
# accounts as recipients with domains tend to cause permission issues with
# dspam-web.
#
StripRcptDomain off

# GroupConfig: The configuration file for groups. See the README file
# for details on how to enable users to combine their training data to
# get better results.
GroupConfig /var/spool/dspam/group

# --- Split Configuration File Support ---

# Include a directory with configuration items.
Include /etc/dspam/dspam.d/

# ---  ---

IgnoreHeader Accept-Language
IgnoreHeader Authentication-Results
IgnoreHeader Content-Type
IgnoreHeader DKIM-Signature
IgnoreHeader Date
IgnoreHeader DomainKey-Signature
IgnoreHeader Importance
IgnoreHeader In-Reply-To
IgnoreHeader List-Archive
IgnoreHeader List-Help
IgnoreHeader List-Id
IgnoreHeader List-Post
IgnoreHeader List-Subscribe
IgnoreHeader List-Unsubscribe
IgnoreHeader Message-ID
IgnoreHeader Message-Id
IgnoreHeader Organization
IgnoreHeader Received
IgnoreHeader Received-SPF
IgnoreHeader References
IgnoreHeader Reply-To
IgnoreHeader Resent-Date
IgnoreHeader Resent-From
IgnoreHeader Thread-Index
IgnoreHeader Thread-Topic
IgnoreHeader User-Agent
IgnoreHeader X-policyd-weight
IgnoreHeader thread-index
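The Tokenizer comments in the dspam.conf above contrast 'chain' with 'osb'. The simplified Python sketch below shows the two token sets. It is built only from those comments and their "the quick brown fox jumped" example. DSPAM's real tokenizer also handles headers, URLs and hashing, and none of that is modeled here. The function names are hypothetical.

```python
def chain_tokens(words: list[str]) -> list[str]:
    """'chain' as described in the config: every single word plus
    each pair of adjacent words."""
    unigrams = list(words)
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return unigrams + bigrams


def osb_tokens(words: list[str], window: int = 5) -> list[str]:
    """'osb' as described in the config: pair each word with each of the
    next window-1 words, writing '*' for every skipped position."""
    tokens = []
    for i, first in enumerate(words):
        for gap in range(1, window):
            j = i + gap
            if j >= len(words):
                break
            tokens.append(" ".join([first] + ["*"] * (gap - 1) + [words[j]]))
    return tokens


sentence = "the quick brown fox jumped".split()
print(chain_tokens(sentence))
print(osb_tokens(sentence))
```

In this sketch, "the * * fox" is produced regardless of which two words sit between "the" and "fox". Two messages that differ only in the intervening words therefore still share that token.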


------------------------------------------------------------------------------

_______________________________________________
Dspam-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/dspam-user

Re: [Dspam-user] DSPAM Not Working Very Effectively

David A. Desrosiers
On 9/2/16 3:44 PM, Jerry Gardner wrote:
> What can I do to help DSPAM learn faster so I can cut down all of the
> spam I have to manually deal with each day?

Did you feed it the initial corpus to train it with both ham and spam?

Re: [Dspam-user] DSPAM Not Working Very Effectively

ktm@rice.edu
Hi Jerry,

First, I see that you have teft set as the training mode. Change that
to toe, instead. Otherwise your accuracy can degrade over time. Second,
you are using the hash driver and the chain tokenizer, even though the
comments in the config file recommend osb instead. Make those two
changes and start over.
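
For reference, here is roughly what the two changes Ken describes would look like in the dspam.conf posted above. This is an untested sketch. Jerry's file sets the training mode in two places: the `TrainingMode teft` directive and `Preference "trainingMode=TEFT"`. The file's own comments say `TrainingMode` applies only when no mode is given on the command line or in the user's preferences, so the sketch assumes both lines need changing. Because `AllowOverride trainingMode` is also set, an existing per-user preference would presumably still take precedence. The `StorageDriver` line is left alone. Ken only notes that the hash driver is in use without naming a replacement, and the file warns that switching drivers requires migrating the existing data.

```conf
# Training mode: train on error only (was: TrainingMode teft)
TrainingMode toe

# Default per-user preference; valid values per the file's comments are
# TOE | TUM | TEFT | NOTRAIN (was: Preference "trainingMode=TEFT")
Preference "trainingMode=TOE"

# Orthogonal Sparse biGram tokenizer, which the file's comments recommend
# for new installations (was: Tokenizer chain)
Tokenizer osb
```

"Start over" presumably means retraining from scratch under the new settings. How to clear the existing data depends on the storage driver and isn't covered in this thread.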

Regards,
Ken
On Fri, Sep 02, 2016 at 12:44:14PM -0700, Jerry Gardner wrote:

> What can I do to help DSPAM learn faster so I can cut down all of the spam
> I have to manually deal with each day?
>
> Here's my dspam.conf:
>
> ## $Id: dspam.conf.in,v 1.103 2011/11/10 00:27:34 tomhendr Exp $
> ## dspam.conf -- DSPAM configuration file
> ##
>
> #
> # DSPAM Home: Specifies the base directory to be used for DSPAM storage
> #
> Home /var/spool/dspam
>
> #
> # StorageDriver: Specifies the storage driver backend (library) to use.
> # You'll only need to set this if you are using dynamic storage driver
> plugins
> # from a binary distribution. The default build statically links the storage
> # driver (when only one is specified at configure time), overriding this
> # setting, which only comes into play if multiple storage drivers are
> specified
> # at configure time. When using dynamic linking, be sure to include the path
> # to the library if necessary, and some systems may use an extension other
> # than .so (e.g. OSX uses .dylib).
> #
> # Options include:
> #
> #   libmysql_drv.so     libpgsql_drv.so
> #   libsqlite3_drv.so   libhash_drv.so
> #
> # IMPORTANT: Switching storage drivers requires more than merely changing
> # this option. If you do not wish to lose all of your data, you will need to
> # migrate it to the new backend before making this change.
> #
> StorageDriver /usr/lib/x86_64-linux-gnu/dspam/libhash_drv.so
> #StorageDriver /usr/lib/dspam/libmysql_drv.so
>
> #
> # Trusted Delivery Agent: Specifies the local delivery agent DSPAM should
> call
> # when delivering mail as a trusted user. Use %u to specify the user DSPAM
> is
> # processing mail for. It is generally a good idea to allow the MTA to
> specify
> # the pass-through arguments at run-time, but they may also be specified
> here.
> #
> # Most operating system defaults:
> #TrustedDeliveryAgent "/usr/bin/procmail"       # Linux
> #TrustedDeliveryAgent "/usr/bin/mail"           # Solaris
> #TrustedDeliveryAgent "/usr/libexec/mail.local" # FreeBSD
> #TrustedDeliveryAgent "/usr/bin/procmail"       # Cygwin
> #
> # Other popular configurations:
> #TrustedDeliveryAgent "/usr/cyrus/bin/deliver"  # Cyrus
> #TrustedDeliveryAgent "/bin/maildrop"           # Maildrop
> #TrustedDeliveryAgent "/usr/local/sbin/exim -oMr spam-scanned -oi" # Exim
> #
> TrustedDeliveryAgent "/usr/bin/procmail"
>
> #
> # Untrusted Delivery Agent: Specifies the local delivery agent and arguments
> # DSPAM should use when delivering mail and running in untrusted user mode.
> # Because DSPAM will not allow pass-through arguments to be specified to
> # untrusted users, all arguments should be specified here. Use %u to specify
> # the user DSPAM is processing mail for. This configuration parameter is
> only
> # necessary if you plan on allowing untrusted processing.
> #
> #UntrustedDeliveryAgent "/usr/bin/procmail -d %u"
>
> #
> # SMTP or LMTP Delivery: Alternatively, you may wish to use SMTP or LMTP
> # delivery to deliver your message to the mail server instead of using a
> # delivery agent. You will need to configure with --enable-daemon to use
> host
> # delivery, however you do not need to operate in daemon mode. Specify an IP
> # address or UNIX path to a domain socket below as a host.
> #
> # If you would like to set up DeliveryHost's on a per-domain basis, use
> # the syntax: DeliveryHost.example.org 1.2.3.4
> #
> DeliveryHost            127.0.0.1
> #DeliveryPort           2424
> DeliveryPort            10026
> DeliveryIdent           localhost
> #DeliveryProto          LMTP
> DeliveryProto           SMTP
>
> #
> # FallbackDomains: If you want to specify certain domains as fallback
> domains,
> # enable this option. For example, you could create a user @example.org, and
> # if [hidden email] does not resolve to a known user on the system, the
> user
> # could default to your @example.org user. NOTE: This also requires
> designating
> # fallbackDomain for the domain name;
> # e.g. dspam_admin ch pref example.org fallbackDomain on
> #
> FallbackDomains on
> FallbackDomain xm23.net
>
> #
> # Quarantine Agent: DSPAM's default behavior is to quarantine all mail it
> # thinks is spam. If you wish to override this behavior, you may specify
> # a quarantine agent which will be called with all messages DSPAM thinks is
> # spam. Use %u to specify the user DSPAM is processing mail for.
> #
> #QuarantineAgent        "/usr/bin/procmail -d spam"
>
> #
> # DSPAM can optionally process "plused users" (addresses in the user+detail
> # form) by truncating the username just before the "+", so all internal
> # processing occurs for "user", but delivery will be performed for
> # "user+detail". This is only useful if the LDA can handle "plused users"
> # (for example Cyrus IMAP) and when configured for LMTP delivery above
> #
> #EnablePlusedDetail     on
>
> #
> # Character to use as seperator between user names and address extensions.
> # If you change this value then please adjust QuarantineMailbox to use the
> # new specified character. The default is '+'.
> #
> #PlusedCharacter        +
>
> #
> # Turn this feature on if you want to force DSPAM to lowercase the "plused
> # users" username.
> #
> #PlusedUserLowercase    on
>
> #
> # Quarantine Mailbox: DSPAM's LMTP code can send spam mail using LMTP to a
> # "plused" mailbox (such as user+quarantine) leaving quarantine processing
> # for retraining or deletion to be performed by the LDA and the mail client.
> # "plused" mailboxes are supported by Cyrus IMAP and possibly other LDAs. If
> # you don't set/change PlusedCharacter then the mailbox name must have the +
> # since the + is the default used character.
> #
> #QuarantineMailbox      +quarantine
>
> #
> # OnFail: What to do if local delivery or quarantine should fail. If set
> # to "unlearn", DSPAM will unlearn the message prior to exiting with an
> # un successful return code. The default option, "error" will not unlearn
> # the message but return the appropriate error code. The unlearn option
> # is use-ful on some systems where local delivery failures will cause the
> # message to be requeued for delivery, and could result in the message
> # being processed multiple times. During a very large failure, however,
> # this could cause a significant load increase.
> #
> OnFail error
>
> #
> # Trusted Users: Only the users specified below will be allowed to perform
> # administrative functions in DSPAM such as setting the active user and
> # accessing tools. All other users attempting to run DSPAM will be
> restricted;
> # their uids will be forced to match the active username and they will not
> be
> # able to specify delivery agent privileges or use tools.
> #
> Trust root
> Trust dspam
> Trust www-data
> Trust mail
> Trust daemon
> Trust amavis
> #Trust nobody
> #Trust majordomo
>
> #
> # Debugging: Enables debugging for some or all users. IMPORTANT: DSPAM must
> # be compiled with debug support in order to use this option. DSPAM should
> # never be running in production with debug active unless you are
> # troubleshooting problems.
> #
> # DebugOpt: One or more of: process, classify, spam, fp, inoculation, corpus
> #   process     standard message processing
> #   classify    message classification using --classify
> #   spam        error correction of missed spam
> #   fp          error correction of false positives
> #   inoculation message inoculations (source=inoculation)
> #   corpus      corpusfed messages (source=corpus)
> #
> Debug process
> #Debug bob bill
> #
> #DebugOpt process spam fp
>
> #
> # ClassAlias: Alias a particular class to spam/nonspam. This is useful if
> # classifying things other than spam.
> #
> #ClassAliasSpam badstuff
> #ClassAliasNonspam goodstuff
>
> #
> # Training Mode: The default training mode to use for all operations, when
> # one has not been specified on the commandline or in the user's
> preferences.
> # Acceptable values are:
> #     toe     Train on Error (Only)
> #     teft    Train Everything (Trains on every message)
> #     tum     Train Until Mature (Train only tokens without enough data)
> #     notrain Do not train or store signatures (large ISP systems,
> post-train)
> #
> TrainingMode teft
>
> #
> # TestConditionalTraining: By default, dspam will retrain certain errors
> # until the condition is no longer met. This usually accelerates learning.
> # Some people argue that this can increase the risk of errors, however.
> #
> TestConditionalTraining on
>
> #
> # Features: Specify features to activate by default; can also be specified
> # on the commandline. See the documentation for a list of available
> features.
> # If _any_ features are specified on the commandline, these are ignored.
> #
> Feature noise
> Feature whitelist
>
> # Training Buffer: The training buffer waters down statistics during
> training.
> # It is designed to prevent false positives, but can also dramatically
> reduce
> # dspam's catch rate during initial training. This can be a number from 0
> # (no buffering) to 10 (maximum buffering). If you are paranoid about false
> # positives, you should probably enable this option.
> #
> #Feature tb=5
>
> #
> # Algorithms: Specify the statistical algorithms to use, overriding any
> # defaults configured in the build. The options are:
> #    naive       Naive-Bayesian (All Tokens)
> #    graham      Graham-Bayesian ("A Plan for Spam")
> #    burton      Burton-Bayesian (SpamProbe)
> #    robinson    Robinson's Geometric Mean Test (Obsolete)
> #    chi-square  Fisher-Robinson's Chi-Square Algorithm
> #
> # You may have multiple algorithms active simultaneously, but it is strongly
> # recommended that you group Bayesian algorithms with other Bayesian
> # algorithms, and any use of Chi-Square remain exclusive.
> #
> # NOTE: For standard "CRM114" Markovian weighting, use 'naive', or consider
> #       using 'burton' for slightly better accuracy
> #
> # Don't mess with this unless you know what you're doing
> #
> #Algorithm chi-square
> #Algorithm naive
> Algorithm graham burton
>
> #
> # Tokenizer: Specify the tokenizer to use. The tokenizer is the piece
> # responsible for parsing the message into individual tokens. Depending on
> # how many resources you are willing to trade off vs. accuracy, you may
> # choose to use a less or more detailed tokenizer:
> #   word    uniGram (single word) tokenizer
> #           Tokenizes message into single individual words/tokens
> #           example: "free" and "viagra"
> #   chain   biGram (chained tokens) tokenizer (default)
> #           Single words + chains adjacent tokens together
> #           example: "free" and "viagra" and "free viagra"
> #   sbph    Sparse Binary Polynomial Hashing tokenizer
> #           Creates sparse token patterns across sliding window of 5-tokens
> #           example: "the quick * fox jumped" and "the * * fox jumped"
> #   osb     Orthogonal Sparse biGram tokenizer
> #           Similar to SBPH, but only uses the biGrams
> #           example: "the * * fox" and "the * * * jumped"
> #
> # In general the reccomendation is to use 'osb' for new installations.
> # The default value of 'chain' remains here as not to surprise anyone
> upgrading
> # that has not changed from the default value.
> #
> Tokenizer chain
>
> #
> # PValue: Specify the technique used for calculating Probability Values,
> # overriding any defaults configured in the build. These options are:
> #    bcr         Bayesian Chain Rule (Graham's Technique - "A Plan for
> Spam")
> #    robinson    Robinson's Technique (used in Chi-Square)
> #    markov      Markovian Weighted Technique (for Markovian discrimination)
> #
> # Unlike the "Algorithms" property, you may only have one of these defined.
> # Use of the chi-square algorithm automatically changes this to robinson.
> #
> # Don't mess with this unless you know what you're doing.
> #
> #PValue robinson
> #PValue markov
> PValue bcr
>
> #
> # WebStats: Enable this if you are using the CGI, which writes .stats files
> WebStats on
>
> #
> # ImprobabilityDrive: Calculate odds-ratios for ham/spam, and add to
> # X-DSPAM-Improbability headers
> #
> #ImprobabilityDrive on
>
> #
> # Preferences: Specify any preferences to set by default, unless otherwise
> # overridden by the user (see next section) or a default.prefs file.
> # If user or default.prefs are found, the user's preferences will override
> any
> # defaults.
> #
> Preference "trainingMode=TEFT"          # { TOE | TUM | TEFT | NOTRAIN } ->
> default:teft
> Preference "spamAction=tag"             # { quarantine | tag | deliver } ->
> default:quarantine
> Preference "spamSubject=[SPAM]"         # { string } -> default:[SPAM]
> Preference "statisticalSedation=5"      # { 0 - 10 } -> default:0
> Preference "enableBNR=on"               # { on | off } -> default:off
> Preference "enableWhitelist=on"         # { on | off } -> default:on
> Preference "signatureLocation=message"  # { message | headers } ->
> default:message
> Preference "tagSpam=off"                # { on | off }
> Preference "tagNonspam=off"             # { on | off }
> Preference "showFactors=off"            # { on | off } -> default:off
> Preference "optIn=off"                  # { on | off }
> Preference "optOut=off"                 # { on | off }
> Preference "whitelistThreshold=10"      # { Integer } -> default:10
> Preference "makeCorpus=off"             # { on | off } -> default:off
> Preference "storeFragments=off"         # { on | off } -> default:off
> Preference "localStore="                # { on | off } -> default:username
> Preference "processorBias=on"           # { on | off } -> default:on
> Preference "fallbackDomain=off"         # { on | off } -> default:off
> Preference "trainPristine=off"          # { on | off } -> default:off
> Preference "optOutClamAV=off"           # { on | off } -> default:off
> Preference "ignoreRBLLookups=off"       # { on | off } -> default:off
> Preference "RBLInoculate=off"           # { on | off } -> default:off
> Preference "notifications=off"          # { on | off } -> default:off
>
> #
> # Overrides: Specifies the user preferences which may override configuration
> # and commandline defaults. Any other preferences supplied by an untrusted
> user
> # will be ignored.
> #
> AllowOverride enableBNR
> AllowOverride enableWhitelist
> AllowOverride fallbackDomain
> AllowOverride ignoreGroups
> AllowOverride ignoreRBLLookups
> AllowOverride localStore
> AllowOverride makeCorpus
> AllowOverride optIn
> AllowOverride optOut
> AllowOverride optOutClamAV
> AllowOverride processorBias
> AllowOverride RBLInoculate
> AllowOverride showFactors
> AllowOverride signatureLocation
> AllowOverride spamAction
> AllowOverride spamSubject
> AllowOverride statisticalSedation
> AllowOverride storeFragments
> AllowOverride tagNonspam
> AllowOverride tagSpam
> AllowOverride trainPristine
> AllowOverride trainingMode
> AllowOverride whitelistThreshold
> AllowOverride dailyQuarantineSummary
> AllowOverride notifications
>
> # --- Profiles ---
>
> #
> # You can specify multiple storage profiles, and specify the server to
> # use on the commandline with --profile. For example:
> #
> #Profile DECAlpha
> #MySQLServer.DECAlpha   10.0.0.1
> #MySQLPort.DECAlpha     3306
> #MySQLUser.DECAlpha     dspam
> #MySQLPass.DECAlpha     changeme
> #MySQLDb.DECAlpha       dspam
> #MySQLCompress.DECAlpha true
> #MySQLReconnect.DECAlpha        true
> #
> #Profile Sun420R
> #MySQLServer.Sun420R    10.0.0.2
> #MySQLPort.Sun420R      3306
> #MySQLUser.Sun420R      dspam
> #MySQLPass.Sun420R      changeme
> #MySQLDb.Sun420R        dspam
> #MySQLCompress.Sun420R  false
> #MySQLReconnect.Sun420R true
> #
> #DefaultProfile DECAlpha
>
> #
> # If you're using storage profiles, you can set failovers for each profile.
> # Of course, if you'll be failing over to another database, that database
> # must have the same information as the first. If you're using a global
> # database with no training, this should be relatively simple. If you're
> # configuring per-user data, however, you'll need to set up some type of
> # replication between databases.
> #
> #Failover.DECAlpha      SUN420R
> #Failover.Sun420R       DECAlpha
>
> # If the storage fails, the agent will follow each profile's failover up to
> # a maximum number of failover attempts. This should be set to a maximum of
> # the number of profiles you have, otherwise the agent could loop and try
> # the same profile multiple times (unless this is your desired behavior).
> #
> #FailoverAttempts       1
>
> #
> # Ignored headers: If DSPAM is behind other tools which may add a header to
> # incoming emails, it may be beneficial to ignore these headers - especially
> # if they are coming from another spam filter. If you are _not_ using one of
> # these tools, however, leaving the appropriate headers commented out will
> # allow DSPAM to use them as telltale signs of forged email.
> #
> #IgnoreHeader X-Spam-Status
> #IgnoreHeader X-Spam-Scanned
> #IgnoreHeader X-Virus-Scanner-Result
>
> #
> # Lookup: Perform lookups on streamlined blackhole list servers (see
> # http://www.nuclearelephant.com/projects/sbl/). The streamlined blacklist
> # server is machine-automated, unsupervised blacklisting system designed to
> # provide real-time and highly accurate blacklisting based on network
> spread.
> # When performing a lookup, DSPAM will automatically learn the inbound
> message
> # as spam if the source IP is listed. Until an official public RABL server
> is
> # available, this feature is only useful if you are running your own
> # streamlined blackhole list server for internal reporting among multiple
> mail
> # servers. Provide the name of the lookup zone below to use.
> #
> # This function performs standard reverse-octet.domain lookups, and while it
> # will function with many RBLs, it's strongly discouraged to use those
> # maintained by humans as they're often inaccurate and could hurt filter
> # learning and accuracy.
> #
> #Lookup         "sbl.example.org"
>
> #
> # RBLInoculate: If you want to inoculate the user from RBL'd messages it
> would
> # have otherwise missed, set this to on.
> #
> #RBLInoculate   off
>
> #
> # Notifications: Enable the sending of notification emails to users (first
> # message, quarantine full, etc.)
> #
> Notifications   off
>
> # TxtDirectory: the directory that holds the templates for notification
> # messages (see Notifications) and tagging (see tagSpam/tagNonspam).
> #
> #TxtDirectory /etc/dspam/txt
>
> #
> # QuarantineWarnSize: You may specify a size when DSPAM should send a
> "Quarantine
> # Full" message to each user. This is only working if you enable
> notifications
> # (see above). Value is in bytes. Default is 2097152 -> 2MB.
> #
> #QuarantineWarnSize 2097152
>
> #
> # Purge configuration: Set dspam_clean purge default options, if not
> otherwise
> # specified on the commandline
> #
> PurgeSignatures 14      # Stale signatures
> PurgeNeutral    90      # Tokens with neutralish probabilities
> PurgeUnused     90      # Unused tokens
> PurgeHapaxes    30      # Tokens with less than 5 hits (hapaxes)
> PurgeHits1S     15      # Tokens with only 1 spam hit
> PurgeHits1I     15      # Tokens with only 1 innocent hit
>
> #
> # Purge configuration for SQL-based installations using purge.sql
> #
> #PurgeSignature off     # Specified in purge.sql
> #PurgeNeutral   90
> #PurgeUnused    off     # Specified in purge.sql
> #PurgeHapaxes   off     # Specified in purge.sql
> #PurgeHits1S    off     # Specified in purge.sql
> #PurgeHits1I    off     # Specified in purge.sql
>
> #
> # Local Mail Exchangers: Used for source address tracking, tells DSPAM which
> # mail exchangers are local and therefore should be ignored in the Received:
> # header when tracking the source of an email. Note: you should use the
> address
> # of the host as appears between brackets [ ] in the Received header.
> # By default DSPAM is considering the following IPs always as LocalMX:
> #       10.0.0.0/8      - Private IP addresses (RFC 1918)
> #       127.0.0.0/8     - Localhost Loopback Address (RFC 1700)
> #       169.254.0.0/16  - Zeroconf / APIPA (RFC 3330)
> #       172.16.0.0/12   - Private IP addresses (RFC 1918)
> #       192.168.0.0/16  - Private IP addresses (RFC 1918)
> #
> LocalMX 127.0.0.1
>
> #
> # Logging: Disabling logging for users will make usage graphs unavailable to
> # them. Disabling system logging will make admin graphs unavailable.
> #
> SystemLog       on
> UserLog         on
>
> #
> # TrainPristine: for systems where the original message remains server side
> # and can therefore be presented in pristine format for retraining. This
> option
> # will cause DSPAM to cease all writing of signatures and DSPAM headers to
> the
> # message, and deliver the message in as pristine format as possible. This
> mode
> # REQUIRES that the original message in its pristine format (as of delivery)
> # be presented for retraining, as in the case of webmail, imap, or other
> # applications where the message is actually kept server-side during
> reading,
> # and is preserved. DO NOT use this switch unless the original message can
> be
> # presented for retraining with the ORIGINAL HEADERS and NO MODIFICATIONS.
> #
> # NOTE: You can't use this setting with dspam_trian; if you're going to use
> it,
> #       wait until after you train any corpora.
> #
> #TrainPristine on
>
> #
> # Opt: in or out; determines DSPAM's default filtering behavior. If this
> value
> # is set to in, users must opt-in to filtering by dropping a .dspam file in
> # /var/dspam/opt-in/user.dspam (or if you have homedirs configured, a .dspam
> # folder in their home directory).  The default is opt-out, which means all
> # users will be filtered unless a .nodspam file is dropped in
> # /var/dspam/opt-out/user.nodspam
> #
> Opt out
>
> #
> # TrackSources: specify which (if any) source addresses to track and report
> # them to syslog (mail.info). This is useful if you're running a firewall or
> # blacklist and would like to use this information. Spam reporting also
> drops
> # RABL blacklist files (see http://www.nuclearelephant.com/projects/rabl/).
> #
> #TrackSources spam nonspam virus
>
> #
> # ParseToHeaders: In lieu of setting up individual aliases for each user,
> # DSPAM can be configured to automatically parse the To: address for spam
> and
> # false positive forwards. From there, it can be configured to either set
> the
> # DSPAM user based on the username specified in the header and/or change the
> # training class and source accordingly. The options below can be used to
> # customize most common types of header parsing behavior to avoid the need
> for
> # multiple aliases, or if using LMTP, aliases entirely..
> #
> # ParseToHeader: Parse the To: headers of an incoming message. This must be
> #                set to 'on' to use either of the following features.
> #
> # ChangeModeOnParse: Automatically change the class (to spam or innocent)
> #   depending on whether spam- or notspam- was specified, and change the
> source
> #   to 'error'. This is convenient if you're not using aliases at all, but
> #   are delivering via LMTP.
> #
> # ChangeUserOnParse: Automatically change the username to match that
> specified
> #   in the To: header. For example, [hidden email] will set the
> username
> #   to bob, ignoring any --user passed in. This may not always be desirable
> if
> #   you are using virtual email addresses as usernames. Options:
> #     on or user        take the portion before the @ sign only
> #     full              take everything after the initial {spam,notspam}-.
> #
> ParseToHeaders on
> ChangeModeOnParse on
> #ChangeUserOnParse on
> ChangeUserOnParse full
>
> #
> # Broken MTA Options: Some MTAs don't support the proper functionality
> # necessary. In these cases you can activate certain features in DSPAM to
> # compensate. 'returnCodes' causes DSPAM to return an exit code of 99 if
> # the message is spam, 0 if not, or a negative code if an error has occured.
> # Specifying 'case' causes DSPAM to force the input usernames to lowercase.
> # Specifying 'lineStripping' causes DSPAM to strip ^M's from messages passed
> # in.
> #
> #Broken returnCodes
> #Broken case
> #Broken lineStripping
>
> #
> # MaxMessageSize: You may specify a maximum message size for DSPAM to
> process.
> # If the message is larger than the maximum size, it will be delivered
> # without processing. Value is in bytes.
> #
> #MaxMessageSize 4194304
>
> # --- ClamAV ---
>
> #
> # Virus Checking: If you are running clamd, DSPAM can perform stream-based
> # virus checking using TCP. Uncomment the values below to enable virus
> # checking.
> #
> # ClamAVResponse: reject (reject or drop the message with a permanent
> failure)
> #                 accept (accept the message and quietly drop the message)
> #                 spam   (treat as spam and quarantine/tag/whatever)
> #
> #ClamAVPort             3310
> #ClamAVHost             127.0.0.1
> #ClamAVResponse         accept
>
> # --- CLIENT / SERVER ---
>
> #
> # Daemonized Server: If you are running DSPAM as a daemonized server using
> # --daemon, the following parameters will override the default. Use the
> # ServerPass option to set up accounts for each client machine. The DSPAM
> # server will process and deliver the message based on the parameters
> # specified. If you want the client machine to perform delivery, use
> # the --stdout option in conjunction with a local setup.
> #
> # ServerHost: Not enabling ServerHost will bind DSPAM server to all
> available
> # interfaces.
> #
> # ServerPort: Default upstream configuration is to run dspam daemon on port
> # 24. On Debian, dspam being run as a unprivileged user, default port is
> # set to 2424.
> #
> #ServerHost             127.0.0.1
> #ServerPort             2424
> #ServerQueueSize        32
> ServerPID               /var/run/dspam/dspam.pid
>
> #
> # ServerMode specifies the type of LMTP server to start. This can be one of:
> #     dspam: DSPAM-proprietary DLMTP server, for communicating with dspamc
> #  standard: Standard LMTP server, for communicating with Postfix or other
> MTA
> #      auto: Speak both DLMTP and LMTP; auto-detect by ServerPass.IDENT
> #
> #ServerMode dspam
> ServerMode auto
>
> # If supporting DLMTP (dspam) mode, dspam clients will require
> authentication
> # as they will be passing in parameters. The idents below will be used to
> # determine which clients will be speaking DLMTP, so if you will be using
> # both LMTP and DLMTP from the same host, be sure to use something other
> # than the server's hostname below (which will be sent by the MTA during a
> # standard LMTP LHLO).
> #
> #ServerPass.Relay1      "secret"
> #ServerPass.Relay2      "password"
>
> # If supporting standard LMTP mode, server parameters will need to be
> specified
> # here, as they will not be passed in by the mail server. The ServerIdent
> # specifies the 250 response code ident sent back to connecting clients and
> # should be set to the hostname of your server, or an alias.
> #
> # NOTE: If you specify --user in ServerParameters, the RCPT TO will be
> #       used only for delivery, and not set as the active user for
> processing.
> #
> #ServerParameters       "--deliver=innocent -d %u"
> ServerParameters        "--deliver=innocent"
> #ServerIdent            "localhost.localdomain"
> ServerIdent             "localhost.xm23.net"
>
> # If you wish to use a local domain socket instead of a TCP socket,
> uncomment
> # the following. It is strongly recommended you use local domain sockets if
> # you are running the client and server on the same machine, as it
> eliminates
> # much of the bandwidth overhead.
> #
> #ServerDomainSocketPath "/var/run/dspam/dspam.sock"
> ServerDomainSocketPath  "/var/spool/postfix/tmp/dspam.sock"
>
> #
> # Client Mode: If you are running DSPAM in client/server mode, uncomment and
> # set these variables. A ClientHost beginning with a / will be treated as
> # a domain socket.
> #
> #ClientHost     /var/run/dspam/dspam.sock
> #ClientIdent    "secret@Relay1"
> #
> #ClientHost     127.0.0.1
> #ClientPort     2424
> #ClientIdent    "secret@Relay1"
>
> # --- RABL ---
>
> # RABLQueue: Touch files in the RABL queue
> # If you are a reporting streamlined blackhole list participant, you can
> # touch ip addresses within the directory the rabl_client process is
> watching.
> #
> #RABLQueue      /var/spool/rabl
>
> # ---  ---
>
> # DataSource: If you are using any type of data source that does not include
> # email-like headers (such as documents), uncomment the line below. This
> # will cause the entire input to be treated like a message "body"
> #
> #DataSource document
>
> # ProcessorWordFrequency: By default, words are only counted once per
> message.
> # If you are classifying large documents, however, you may wish to count
> once
> # per occurrence instead.
> #
> #ProcessorWordFrequency occurrence
>
> # ProcessorURLContext: By default, a URL context is generated for URLs,
> which
> # records their tokens as separate from words found in documents. To use
> # URL tokens in the same context as words, turn this feature off.
> #
> ProcessorURLContext on
>
> # ProcessorBias: Bias causes the filter to lean more toward 'innocent', and
> # usually greatly reduces false positives. It is the default behavior of
> # most Bayesian filters (including dspam).
> #
> # NOTE: You probably DONT want this if you're using Markovian Weighting,
> unless
> # you are paranoid about false positives.
> #
> ProcessorBias on
>
> # StripRcptDomain: Cut the domain (including the at sign) from recipients.
> # This is particularly useful if the recipient name is equal to real user
> # accounts as recipients with domains tend to cause permission issues with
> # dspam-web.
> #
> StripRcptDomain off
>
> # GroupConfig: The configuration file for groups. See the README file
> # for details on how to enable users to combine their training data to
> # get better results.
> GroupConfig /var/spool/dspam/group
>
> # --- Split Configuration File Support ---
>
> # Include a directory with configuration items.
> Include /etc/dspam/dspam.d/
>
> # ---  ---
>
> IgnoreHeader Accept-Language
> IgnoreHeader Authentication-Results
> IgnoreHeader Content-Type
> IgnoreHeader DKIM-Signature
> IgnoreHeader Date
> IgnoreHeader DomainKey-Signature
> IgnoreHeader Importance
> IgnoreHeader In-Reply-To
> IgnoreHeader List-Archive
> IgnoreHeader List-Help
> IgnoreHeader List-Id
> IgnoreHeader List-Post
> IgnoreHeader List-Subscribe
> IgnoreHeader List-Unsubscribe
> IgnoreHeader Message-ID
> IgnoreHeader Message-Id
> IgnoreHeader Organization
> IgnoreHeader Received
> IgnoreHeader Received-SPF
> IgnoreHeader References
> IgnoreHeader Reply-To
> IgnoreHeader Resent-Date
> IgnoreHeader Resent-From
> IgnoreHeader Thread-Index
> IgnoreHeader Thread-Topic
> IgnoreHeader User-Agent
> IgnoreHeader X-policyd-weight
> IgnoreHeader thread-index

> ------------------------------------------------------------------------------

> _______________________________________________
> Dspam-user mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/dspam-user
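The quoted config controls what the tokenizer sees: the IgnoreHeader lines name headers to leave out of tokenization, and ProcessorWordFrequency decides whether a repeated word counts once per message or once per occurrence. The toy Python tokenizer below illustrates both switches. Its token format (header words prefixed with the header name and `*`) and the small subset of ignored headers are assumptions for illustration, not DSPAM's actual internals.

```python
import re
from collections import Counter
from email import message_from_string

# Hypothetical subset of the IgnoreHeader entries in the quoted dspam.conf.
IGNORED_HEADERS = {"received", "message-id", "date", "dkim-signature"}
WORD = re.compile(r"[a-z0-9$']+")


def toy_tokenize(raw, per_occurrence=False):
    """Toy tokenizer (not DSPAM's real token format).

    Ignored headers contribute no tokens; words from other headers are
    prefixed with the lowercased header name. With per_occurrence=False each
    token counts once per message; with True every occurrence counts.
    """
    msg = message_from_string(raw)
    tokens = []
    for name, value in msg.items():
        if name.lower() in IGNORED_HEADERS:
            continue
        tokens += [f"{name.lower()}*{w}" for w in WORD.findall(value.lower())]
    tokens += WORD.findall(msg.get_payload().lower())
    counts = Counter(tokens)
    if not per_occurrence:
        counts = Counter({token: 1 for token in counts})
    return counts


raw = "Subject: Visit Belize\nReceived: from relay.example\n\nBelize Belize vacation deals\n"
print(toy_tokenize(raw)["belize"])                       # 1
print(toy_tokenize(raw, per_occurrence=True)["belize"])  # 2
print("received*from" in toy_tokenize(raw))              # False
```

For short emails the difference is small; the config comment suggests per-occurrence counting mainly for large documents, where a word repeated many times would otherwise count the same as a single mention.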



Re: [Dspam-user] DSPAM Not Working Very Effectively

Jerry Gardner
In reply to this post by David A. Desrosiers
Yes.

On Fri, Sep 2, 2016 at 1:14 PM, David A. Desrosiers <[hidden email]> wrote:
> On 9/2/16 3:44 PM, Jerry Gardner wrote:
> > What can I do to help DSPAM learn faster so I can cut down all of the
> > spam I have to manually deal with each day?
>
> Did you feed it the initial corpus to train it with both ham and spam?


Re: [Dspam-user] DSPAM Not Working Very Effectively

Jerry Gardner
In reply to this post by ktm@rice.edu
On Fri, Sep 2, 2016 at 1:14 PM, Kenneth Marshall <[hidden email]> wrote:
> Hi Jerry,
>
> First, I see that you have teft set as the training mode. Change that
> to toe, instead. Otherwise your accuracy can degrade over time. Second,
> you are using the hash driver and the chain tokenizer, even though the
> comments in the config file recommend osb instead. Make those two
> changes and start over.
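Ken's second point concerns the tokenizer. Roughly, a chained tokenizer emits single words plus pairs of adjacent words, while orthogonal sparse bigrams (OSB) pair each word with each of the next few words and record the gap between them. The sketch below contrasts the two on a short phrase. The five-word window, the `+` and `<skip>` notation, and keeping single words only in `chain_features` are assumptions; DSPAM's own implementations may differ in detail (for example in how features are hashed and stored).

```python
def chain_features(words):
    """Single words plus each pair of adjacent words (simplified 'chain')."""
    return list(words) + [f"{a}+{b}" for a, b in zip(words, words[1:])]


def osb_features(words, window=5):
    """Orthogonal sparse bigrams: pair each word with each of the next
    window-1 words, marking every skipped position with '<skip>'."""
    features = []
    for i, first in enumerate(words):
        for gap in range(1, window):
            j = i + gap
            if j >= len(words):
                break
            features.append(first + " " + "<skip> " * (gap - 1) + words[j])
    return features


words = ["cheap", "belize", "vacation", "deals"]
print(chain_features(words))
# ['cheap', 'belize', 'vacation', 'deals', 'cheap+belize', 'belize+vacation', 'vacation+deals']
print(osb_features(words))
# ['cheap belize', 'cheap <skip> vacation', 'cheap <skip> <skip> deals',
#  'belize vacation', 'belize <skip> deals', 'vacation deals']
```

On longer texts OSB produces up to window-1 pairs per word instead of one, so it captures word relationships that span a few positions rather than only immediate neighbours.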

Hi Ken,

Thanks for the reply. When you say "start over" do you mean to say that I should delete my existing DSPAM databases and recreate them from scratch after changing the dspam.conf file to change to osb and toe?

I'm willing to give this a try. What's the best way to delete them and start over? Just do an "rm /var/spool/dspam/*"?

Thanks,
Jerry

 


Re: [Dspam-user] DSPAM Not Working Very Effectively

ktm@rice.edu
On Tue, Sep 06, 2016 at 10:06:11AM -0700, Jerry Gardner wrote:

> On Fri, Sep 2, 2016 at 1:14 PM, Kenneth Marshall <[hidden email]> wrote:
>
> > Hi Jerry,
> >
> > First, I see that you have teft set as the training mode. Change that
> > to toe, instead. Otherwise your accuracy can degrade over time. Second,
> > you are using the hash driver and the chain tokenizer, even though the
> > comments in the config file recommend osb instead. Make those two
> > changes and start over.
> >
>
> Hi Ken,
>
> Thanks for the reply. When you say "start over" do you mean to say that I
> should delete my existing DSPAM databases and recreate them from scratch
> after changing the dspam.conf file to change to osb and toe?
>
> I'm willing to give this a try. What's the best way to delete them and
> start over? Just do an "rm /var/spool/dspam/*"?
>
> Thanks,
> Jerry

Hi Jerry,

Yes, just delete them. You can look at /etc/dspam.conf to see where they
are stored.
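Ken's answer points at dspam.conf for the storage location. Note that the quoted config sets `GroupConfig /var/spool/dspam/group`, so a blanket `rm /var/spool/dspam/*` would also delete the group file. The hypothetical helper below reads directives from dspam.conf-style text, finds `Home`, and lists other configured paths that live inside it, so they can be set aside before wiping. The `Home` value in the example is assumed for illustration.

```python
from pathlib import PurePosixPath


def read_directives(conf_text):
    """Yield (key, value) pairs from non-comment dspam.conf-style lines."""
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            yield parts[0], parts[1].strip()


def home_and_contents(conf_text):
    """Return the Home directory and other configured paths located inside it."""
    pairs = list(read_directives(conf_text))
    home = next((value for key, value in pairs if key == "Home"), None)
    if home is None:
        return None, []
    home_path = PurePosixPath(home)
    inside = [(key, value) for key, value in pairs
              if key != "Home" and home_path in PurePosixPath(value).parents]
    return home, inside


conf = """
# Home value assumed for illustration
Home /var/spool/dspam
GroupConfig /var/spool/dspam/group
IgnoreHeader Received
"""
print(home_and_contents(conf))
# ('/var/spool/dspam', [('GroupConfig', '/var/spool/dspam/group')])
```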

Regards,
Ken


Re: [Dspam-user] DSPAM Not Working Very Effectively

Jerry Gardner


On Tue, Sep 6, 2016 at 10:30 AM, Kenneth Marshall <[hidden email]> wrote:

> Yes, just delete them. You can look at /etc/dspam.conf to see where they
> are stored.

Okay, thanks. I've saved all spam and non-spam I've received since I set this server up. Should I use it to train DSPAM before I change teft to toe, or should I train it with it set to toe?
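The question above turns on what the two training modes do. In TEFT (train everything), every classified message is learned as whatever DSPAM decided, while TOE (train on error) only learns when a misclassification is reported. The simplified model below, which is not DSPAM's actual bookkeeping, shows one plausible way TEFT can drift, in line with Ken's warning that accuracy can degrade: a missed spam that is never reported gets learned as ham, while TOE leaves the counts untouched.

```python
from collections import Counter


def train_stream(stream, mode):
    """Return (spam_counts, ham_counts) learned under a simplified model.

    stream: iterable of (tokens, predicted_spam, actual_spam, reported).
    mode "teft": learn every message as classified, then undo and relearn
                 any misclassification the user reports.
    mode "toe":  learn only from reported misclassifications.
    """
    spam, ham = Counter(), Counter()
    for tokens, predicted, actual, reported in stream:
        if mode == "teft":
            (spam if predicted else ham).update(tokens)
        if reported and predicted != actual:
            if mode == "teft":
                (spam if predicted else ham).subtract(tokens)  # undo wrong lesson
            (spam if actual else ham).update(tokens)
    return spam, ham


stream = [
    (["belize", "vacation"], False, True, False),  # missed spam, never reported
    (["belize", "deals"], False, True, True),      # missed spam, retrained
]
for mode in ("teft", "toe"):
    spam, ham = train_stream(stream, mode)
    print(mode, spam["belize"], ham["belize"])
# teft 1 1
# toe 1 0
```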




Re: [Dspam-user] DSPAM Not Working Very Effectively

ktm@rice.edu
On Tue, Sep 06, 2016 at 10:40:15AM -0700, Jerry Gardner wrote:

> On Tue, Sep 6, 2016 at 10:30 AM, Kenneth Marshall <[hidden email]> wrote:
>
> >
> > Yes, just delete them. You can look at /etc/dspam.conf to see where they
> > are stored.
> >
>
> Okay, thanks. I've saved all spam and non-spam I've received since I set
> this server up. Should I use it to train DSPAM before I change teft to toe,
> or should I train it with it set to toe?

Hi Jerry,

Just set it to TOE. Also, train a balanced set with an equal number of SPAM
and not-SPAM messages.
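Ken's suggestion of a balanced training set can be prepared ahead of time: take the same number of saved spam and non-spam messages and feed them alternately. The sketch below only builds the schedule. The shuffling, the alternating order, and the `"innocent"` label (borrowed from the term the quoted config uses for non-spam) are assumptions. The actual training call is left out because it depends on how DSPAM is invoked on this server.

```python
import random


def balanced_schedule(spam, ham, seed=0):
    """Interleave equal numbers of spam and non-spam messages.

    Surplus messages from the larger pile are dropped; each pile is sampled
    in random order so training does not simply follow arrival order.
    """
    rng = random.Random(seed)
    n = min(len(spam), len(ham))
    schedule = []
    for s, h in zip(rng.sample(spam, n), rng.sample(ham, n)):
        schedule += [(s, "spam"), (h, "innocent")]
    return schedule


spam_files = [f"spam/{i}.eml" for i in range(5)]   # hypothetical paths
ham_files = ["ham/0.eml", "ham/1.eml"]
for path, label in balanced_schedule(spam_files, ham_files):
    print(label, path)
```

With far more saved spam than non-spam, the non-spam count sets the size of the training set.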

Regards,
Ken


Re: [Dspam-user] DSPAM Not Working Very Effectively

Jerry Gardner
Ken,

I erased the DSPAM database, reconfigured it to toe/osb, and retrained. Now it's working much better. Spam hit rate is >90%.
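The spam hit rate Jerry quotes is one of the `dspam_stats -H` figures. The formulas below are inferred rather than taken from DSPAM's source, but they reproduce the four percentages in the `dspam_stats -H` output from the first post when given its TP/TN/FP/FN counters. The corpus-fed counts (SC, NC), both zero there, are left out, so how dspam_stats treats them is not covered.

```python
def dspam_rates(tp, tn, fp, fn):
    """Percentages matching the dspam_stats -H figures (formulas inferred)."""
    return {
        "SHR": 100 * tp / (tp + fn),                   # share of spam caught
        "HSR": 100 * fp / (fp + tn),                   # share of ham flagged as spam
        "PPV": 100 * tp / (tp + fp),                   # flagged messages that were spam
        "OCA": 100 * (tp + tn) / (tp + tn + fp + fn),  # overall accuracy
    }


before = dspam_rates(tp=19302, tn=17498, fp=45, fn=1337)
print({k: round(v, 2) for k, v in before.items()})
# {'SHR': 93.52, 'HSR': 0.26, 'PPV': 99.77, 'OCA': 96.38}
```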

Thanks!



