dspam not training

waterdog
I'm running postfix, dspam version 3.10.2, and dovecot with the dspam plugin but it doesn't seem to be classifying email or training properly.  I've configured everything according to the documentation and have been pulling my hair out for the past two days trying to figure out why it's not working.  I've attached the relevant configuration files and would be very grateful if someone here could explain what's wrong.

dspam.conf
20-imap.conf
master.cf
main.cf

I'm confused about what to expect from dspam on the command line.  Using the training corpuses, I get the following results:

dspam --user <username> --classify < ham/00250.c7603b27a45284d12b49adf767b2b6fa
X-DSPAM-Result: <username>; result="Innocent"; class="Innocent"; probability=0.0000; confidence=1.00; signature=N/A

dspam --user <username> --classify < spam/00500.85b72f09f6778a085dc8b6821965a76f
X-DSPAM-Result: <username>; result="Innocent"; class="Innocent"; probability=0.0000; confidence=1.00; signature=N/A

I would expect to get a result of Spam for a message in the SPAM corpus but dspam is giving it an Innocent result.  Also, when I run this command on a known clean message in my inbox with a tag of X-dspam-result: Innocent, I get the following:

dspam --user <username> --classify < 1436832517.M373995P7093.www,S=117373,W=120324:2,
X-DSPAM-Result: <username>; result="Spam"; class="Spam"; probability=1.0000; confidence=1.00; signature=55a4530550068593316169

Now, I know this is not SPAM!  Here are the current stats for this user:

Training Snapshot:
<username>          TP:     0 TN:   752 FP:     0 FN:     0 SC:     0 NC:     0
                  SHR:  100.00%       HSR:    0.00%       OCA:  100.00%

Overall Statistics:
<username>          TP:    64 TN: 14985 FP:     0 FN:  5710 SC:     0 NC:     0
                  SHR:    1.11%       HSR:    0.00%       OCA:   72.49%
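For reference, the percentages in that output can be reproduced from the four counters. This is only a sketch, assuming the usual definitions SHR = TP/(TP+FN), HSR = FP/(FP+TN) and OCA = (TP+TN)/total. The function name `dspam_rates` is made up for illustration and is not part of dspam.

```shell
# dspam_rates TP TN FP FN
# Prints SHR, HSR and OCA as percentages with two decimals.
# Hypothetical helper; the formulas are assumed, not taken from the dspam source.
dspam_rates() {
    awk -v tp="$1" -v tn="$2" -v fp="$3" -v fn="$4" 'BEGIN {
        total = tp + tn + fp + fn
        # Guard each ratio against a zero denominator.
        shr = (tp + fn > 0) ? 100 * tp / (tp + fn) : 0
        hsr = (fp + tn > 0) ? 100 * fp / (fp + tn) : 0
        oca = (total > 0)   ? 100 * (tp + tn) / total : 0
        printf "SHR: %.2f%% HSR: %.2f%% OCA: %.2f%%\n", shr, hsr, oca
    }'
}

# Counters from the "Overall Statistics" line
dspam_rates 64 14985 0 5710
```

With the overall counters this prints SHR 1.11%, HSR 0.00% and OCA 72.49%, matching the dspam output. Note that dspam printed SHR 100.00% for the snapshot with no spam counted at all, so the zero-denominator handling in this sketch differs from dspam's.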

Also, it doesn't seem like the dovecot antispam plugin is working either.  I get repeated junk mail that I move into my Junk folder but more of the same continues to get delivered to my inbox every day.

Re: [Dspam-user] dspam not training

Eric Broch
I train only spam, never corpus. I only train when spam email was
classified as innocent by DSPAM and ends up in the inbox.
Using IMAP I have users move spam (marked as innocent by DSPAM) from
their inbox to their spam folder and loop through every email (server
side) and learn with the following command:

dspam --user "$USER@$DOMAIN" --mode=teft --class=spam --source=error < "$email_file"
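The server-side loop described above could look roughly like this. It is only a sketch: the `train_spam_dir` function name, the `DSPAM_BIN` variable and the example Maildir path are assumptions for illustration, and the dspam options are simply the ones from the command above.

```shell
# train_spam_dir USER DIR
# Feeds every regular file in DIR to dspam as a missed spam for USER.
# DSPAM_BIN is a hypothetical override so the loop can be dry-run,
# e.g. with DSPAM_BIN=echo.
train_spam_dir() {
    user=$1
    dir=$2
    for msg in "$dir"/*; do
        # Skip subdirectories and the unexpanded glob of an empty folder.
        [ -f "$msg" ] || continue
        "${DSPAM_BIN:-dspam}" --user "$user" --mode=teft --class=spam --source=error < "$msg"
    done
}

# Example call (the path is an assumption about the Maildir layout):
# train_spam_dir "$USER@$DOMAIN" "/home/$USER/Maildir/.Spam/cur"
```

Setting DSPAM_BIN=echo first prints the arguments that would be passed for each message without touching the database.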

I get roughly 98-99% accuracy presently.

If I had it to do over, I would set my configuration to Train-On-Error
(TOE), but it works well enough as it is and I don't feel like
retraining. It took between 100 and 200 emails to start seeing results, but
when we did, and boy did we, it was refreshing for clients to have
uncluttered inboxes.





On 7/14/2015 5:34 PM, waterdog wrote:

> I'm running postfix, dspam version 3.10.2, and dovecot with the dspam plugin
> but it doesn't seem to be classifying email or training properly. [...]
>
> dspam.conf
> <http://dspam-users.2290790.n4.nabble.com/file/n4641961/dspam.conf>
> 20-imap.conf
> <http://dspam-users.2290790.n4.nabble.com/file/n4641961/20-imap.conf>
> master.cf <http://dspam-users.2290790.n4.nabble.com/file/n4641961/master.cf>
> main.cf <http://dspam-users.2290790.n4.nabble.com/file/n4641961/main.cf>



------------------------------------------------------------------------------
Don't Limit Your Business. Reach for the Cloud.
GigeNET's Cloud Solutions provide you with the tools and support that
you need to offload your IT needs and focus on growing your business.
Configured For All Businesses. Start Your Cloud Today.
https://www.gigenetcloud.com/
_______________________________________________
Dspam-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/dspam-user

Re: [Dspam-user] dspam not training

waterdog
Eric,

This doesn't seem to be working right.  Here is an example of running dspam on a known clean email in my inbox:

1) Initially, dspam incorrectly classifies the message as Spam even though it delivered the email properly.

dspam --user <username> --classify < 1436977475.M667188P27913.www,S=22671,W=23095:2,S
X-DSPAM-Result: <username>; result="Spam"; class="Spam"; probability=1.0000; confidence=1.00; signature=55a6894350061878812237

2) Then, I tell dspam to reclassify the same message as Innocent.

dspam --user <username> --mode=teft --class=innocent --source=error < 1436977475.M667188P27913.www,S=22671,W=23095:2,S

3) However, it still incorrectly classifies the message as Spam even after I reclassify it as Innocent.

dspam --user <username> --classify < 1436977475.M667188P27913.www,S=22671,W=23095:2,S
X-DSPAM-Result: <username>; result="Spam"; class="Spam"; probability=1.0000; confidence=1.00;  signature=55a6894350061878812237
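When repeating the classify / retrain / classify sequence above, it can help to pull just the verdict fields out of the X-DSPAM-Result line so the before and after values are easy to compare. A small sketch: the function name `dspam_field` is made up, and the field layout is assumed to match the output shown above.

```shell
# dspam_field NAME
# Reads an X-DSPAM-Result line on stdin and prints the value of NAME
# (e.g. result, probability, signature) without surrounding quotes.
# Hypothetical helper, not part of dspam.
dspam_field() {
    sed -n "s/.*[; ]$1=\"\{0,1\}\([^\";]*\)\"\{0,1\}.*/\1/p"
}

line='X-DSPAM-Result: <username>; result="Spam"; class="Spam"; probability=1.0000; confidence=1.00; signature=55a6894350061878812237'
printf '%s\n' "$line" | dspam_field result
printf '%s\n' "$line" | dspam_field probability
```

Capturing the probability before step 2 and again after step 3 would show at a glance whether the retraining changed anything for that message.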

Am I missing something or interpreting this wrong?  I can't understand why dspam continues to tag messages incorrectly.  How can I clean the dspam database entirely and start over?  The man page for dspam_clean says that it doesn't work for the hash storage driver and I can't find any clear instructions.

Also, you mention that you retrain messages from the SPAM folder.  Do you use dovecot and have you tried the dovecot antispam plugin?  This is supposed to automatically train dspam when users move email between their SPAM folder and Inbox.  However, this doesn't seem to be working either.

Re: [Dspam-user] dspam not training

Eric Broch
First, I would move to a MySQL (or Postfix) backend. I use MySQL because
I'm familiar with it. This would resolve any need to clean your hash
database.
I think persistent misclassification of email points (at least in my case)
to a problematic database and is reason to clear it. I had a similar issue
in the past and cleared the database because of it; I never could resolve
the issue by training. Maybe someone with more experience could chime in on
this.
I am familiar with the dovecot antispam plugin but don't use it. I may
give it a try some time in the future. One reason I don't use it now is
that Outlook 2013, which some clients use, strips headers out of
email, making DSPAM training difficult or impossible.

Here's how I create a MySQL DSPAM DB:

1) Install MySQL/MariaDB server



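2) Create DSPAM DB (minus the <begin> and <end> tags):
<begin>
# MySQL root password (placeholder)
MYSQLPW=mypassword
# Drop any existing dspam database (mysqladmin asks for confirmation) and recreate it empty
mysqladmin drop dspam -uroot -p"$MYSQLPW"
mysqladmin create dspam -uroot -p"$MYSQLPW"
mysqladmin -uroot -p"$MYSQLPW" reload
mysqladmin -uroot -p"$MYSQLPW" refresh

# Create the dspam user and give it full access to the dspam database
echo "GRANT ALL ON dspam.* TO dspam@localhost IDENTIFIED BY 'mydspampw'" | mysql -uroot -p"$MYSQLPW"
mysqladmin -uroot -p"$MYSQLPW" reload
mysqladmin -uroot -p"$MYSQLPW" refresh

# Load the table definitions from the dspamdb.sql file in step 3
mysql -uroot -p"$MYSQLPW" dspam < dspamdb.sql
mysqladmin -uroot -p"$MYSQLPW" reload
mysqladmin -uroot -p"$MYSQLPW" refresh
<end>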




3) dspamdb.sql file (minus the <begin> and <end> tags):

<begin>
--
-- MySQL dump 10.13  Distrib 5.1.71, for redhat-linux-gnu (x86_64)
--
-- Host: localhost    Database: dspam
-- ------------------------------------------------------
-- Server version       5.1.71

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
/*!40103 SET TIME_ZONE='+00:00' */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;

--
-- Table structure for table `dspam_preferences`
--

DROP TABLE IF EXISTS `dspam_preferences`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `dspam_preferences` (
  `uid` int(10) unsigned NOT NULL,
  `preference` varchar(32) COLLATE latin1_general_ci NOT NULL,
  `value` varchar(64) COLLATE latin1_general_ci NOT NULL,
  UNIQUE KEY `id_preferences_01` (`uid`,`preference`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Table structure for table `dspam_signature_data`
--

DROP TABLE IF EXISTS `dspam_signature_data`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `dspam_signature_data` (
  `uid` int(10) unsigned NOT NULL,
  `signature` char(32) COLLATE latin1_general_ci NOT NULL,
  `data` longblob NOT NULL,
  `length` int(10) unsigned NOT NULL,
  `created_on` date NOT NULL,
  UNIQUE KEY `id_signature_data_01` (`uid`,`signature`),
  KEY `id_signature_data_02` (`created_on`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci
MAX_ROWS=2500000 AVG_ROW_LENGTH=8096;
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Table structure for table `dspam_stats`
--

DROP TABLE IF EXISTS `dspam_stats`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `dspam_stats` (
  `uid` int(10) unsigned NOT NULL,
  `spam_learned` bigint(20) unsigned NOT NULL,
  `innocent_learned` bigint(20) unsigned NOT NULL,
  `spam_misclassified` bigint(20) unsigned NOT NULL,
  `innocent_misclassified` bigint(20) unsigned NOT NULL,
  `spam_corpusfed` bigint(20) unsigned NOT NULL,
  `innocent_corpusfed` bigint(20) unsigned NOT NULL,
  `spam_classified` bigint(20) unsigned NOT NULL,
  `innocent_classified` bigint(20) unsigned NOT NULL,
  PRIMARY KEY (`uid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Table structure for table `dspam_token_data`
--

DROP TABLE IF EXISTS `dspam_token_data`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `dspam_token_data` (
  `uid` int(10) unsigned NOT NULL,
  `token` bigint(20) unsigned NOT NULL,
  `spam_hits` bigint(20) unsigned NOT NULL,
  `innocent_hits` bigint(20) unsigned NOT NULL,
  `last_hit` date NOT NULL,
  UNIQUE KEY `id_token_data_01` (`uid`,`token`),
  KEY `spam_hits` (`spam_hits`),
  KEY `innocent_hits` (`innocent_hits`),
  KEY `last_hit` (`last_hit`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci
PACK_KEYS=1;
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Table structure for table `dspam_virtual_uids`
--

DROP TABLE IF EXISTS `dspam_virtual_uids`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `dspam_virtual_uids` (
  `uid` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `username` varchar(128) DEFAULT NULL,
  PRIMARY KEY (`uid`),
  UNIQUE KEY `id_virtual_uids_01` (`username`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;
/*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;

/*!40101 SET SQL_MODE=@OLD_SQL_MODE */;
/*!40014 SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS */;
/*!40014 SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS */;
/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
/*!40111 SET SQL_NOTES=@OLD_SQL_NOTES */;

-- Dump completed on 2014-01-29 10:00:12

<end>




4) Storage Driver settings in dspam.conf

<begin>
StorageDriver /usr/local/lib/dspam/libmysql_drv.so

MySQLServer             /var/lib/mysql/mysql.sock
MySQLUser               dspam
MySQLPass               mydspampw
MySQLDb                 dspam
MySQLCompress           true
MySQLReconnect          true
<end>

Eric


On 7/15/2015 11:37 AM, waterdog wrote:

> Eric,
>
> This doesn't seem to be working right.  Here is an example of running dspam
> on a known clean email in my inbox: [...]

Re: [Dspam-user] dspam not training

Eric Broch
Sorry, not Postfix, but PostgreSQL.

On 7/15/2015 1:42 PM, Eric Broch wrote:

> First, I would move to a MySQL (or Postfix) backend. [...]

Re: [Dspam-user] dspam not training

waterdog
Okay, I've transitioned to mysql and made several other changes to my postfix/dspam/dovecot configs but dspam is still not filtering SPAM.  At least now, postfix is calling dspam to filter incoming email but SPAM continues to get delivered even after multiple training attempts using my Junk folder.  Here's what the mail.log shows:

Jul 22 09:15:35 www postfix/pipe[18013]: 01EE2180E58: to=<[hidden email]>, relay=dspam, delay=10, delays=9.7/0/0/0.26, dsn=2.0.0, status=sent (delivered via dspam service)
Jul 22 09:15:35 www postfix/qmgr[17707]: 01EE2180E58: removed
Jul 22 09:15:35 www postfix/local[18017]: 2FD56180E84: to=<[hidden email]>, relay=local, delay=0.2, delays=0.16/0/0/0.04, dsn=2.0.0, status=sent (delivered to maildir)

The dspam_stats for this user don't look too good even after multiple training attempts:

                TP True Positives:                    0
                TN True Negatives:                    4
                FP False Positives:                2353
                FN False Negatives:                1947
                SC Spam Corpusfed:                    0
                NC Nonspam Corpusfed:                 0
                TL Training Left:                   143
                SHR Spam Hit Rate                 0.00%
                HSR Ham Strike Rate:             99.83%
                PPV Positive predictive value:    0.00%
                OCA Overall Accuracy:             0.09%

And the dovecot antispam plugin is still not working.  Whenever I move SPAM to the Junk folder, I get the following errors in mail.log:

Jul 22 08:06:32 www dspam[17541]: Signature retrieval for '%s' failed
Jul 22 08:06:32 www dspam[17541]: Unable to find a valid signature. Aborting.
Jul 22 08:06:32 www dspam[17541]: process_message returned error -5.  dropping message.
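One thing that may be worth checking with these errors is whether the messages in the Maildir actually carry a DSPAM signature, since retraining by signature cannot work without one. A rough sketch, assuming the signature is stored in an `X-DSPAM-Signature` header; the function name `has_dspam_signature` is made up for illustration.

```shell
# has_dspam_signature FILE
# Succeeds if the header section of FILE (everything before the first
# empty line) contains an X-DSPAM-Signature header, and prints its value.
# Hypothetical helper; the header name is an assumption about the setup.
has_dspam_signature() {
    awk '
        /^$/ { exit }                          # end of headers
        tolower($0) ~ /^x-dspam-signature:/ {
            sub(/^[^:]*:[ \t]*/, "")           # drop the header name
            print
            found = 1
            exit
        }
        END { exit found ? 0 : 1 }
    ' "$1"
}
```

Running it over a few files from the Junk folder would show whether the signature header survives delivery and the move between folders.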

I've been trying to get SPAM processing working for several days and remain perplexed.  I don't know if I'm dealing with configuration issues or software bugs.  I could try dropping the dspam database tables and training fresh but I've already tried training and retraining multiple times so I'm not too optimistic about that idea.  I consider myself pretty good at resolving IT issues but, at this point, I'm about ready to scrap dspam and try a different solution.

Here are my current configs:

main.cf
master.cf
dspam.conf
20-imap.conf

I will be happy to provide any other configs, command output, or logs upon request.  I would appreciate any ideas or pointers anyone may have to offer.

TIA, ~Gary

Re: [Dspam-user] dspam not training

waterdog
BTW, even though I think I have dspam configured to log debug messages, I haven't seen any new dspam log entries to /var/log/dspam/dspam.debug in several days.

/etc/default/dspam options:

START=yes
USER=dspam
OPTIONS="--debug"
MAINTENANCE_OPTIONS="--with-sql-autoupdate --with-sql-optimization"
RUN_NOTIFY="no"

/etc/dspam/dspam.conf debug options:

Debug *
DebugOpt process spam fp

ps -ef| grep dspam
dspam     5438     1  0 Jul19 ?        00:00:03 /usr/bin/dspam --daemon --debug

It seems odd that it's not logging any debug messages and consequently, it's not helping me to figure out the problem.  Am I missing a debug switch somewhere?

Re: [Dspam-user] dspam not training

Eric Broch
How many messages have you trained?

On 7/22/2015 6:48 PM, waterdog wrote:

> Okay, I've transitioned to mysql and made several other changes to my
> postfix/dspam/dovecot configs but dspam is still not filtering SPAM. [...]
>
> Here are my current configs:
>
> main.cf <http://dspam-users.2290790.n4.nabble.com/file/n4641971/main.cf>  
> master.cf <http://dspam-users.2290790.n4.nabble.com/file/n4641971/master.cf>  
> dspam.conf
> <http://dspam-users.2290790.n4.nabble.com/file/n4641971/dspam.conf>  
> 20-imap.conf
> <http://dspam-users.2290790.n4.nabble.com/file/n4641971/20-imap.conf>  
>
> I will be happy to provide any other configs, command output, or logs upon
> request.  I would appreciate any ideas or pointers anyone may have to offer.
>
> TIA, ~Gary


------------------------------------------------------------------------------
_______________________________________________
Dspam-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/dspam-user

Re: [Dspam-user] dspam not training

Nathanael D. Noblet
In reply to this post by waterdog
On Wed, 2015-07-22 at 17:48 -0700, waterdog wrote:

> The dspam_stats for this user don't look too good even after multiple
> training attempts:
>
>                 TP True Positives:                     0
>                 TN True Negatives:                    4
>                 FP False Positives:                    2353
>                 FN False Negatives:                  1947
>                 SC Spam Corpusfed:                 0
>                 NC Nonspam Corpusfed:           0
>                 TL Training Left:                        143

You can see from this line that it needs to receive another 143
messages before it is out of training. It requires about 2500 messages
before it flips a switch. I can't remember what switch but it flips
one.

When I set myself up years ago, I found a corpus of spam, and I fed it
my entire mailbox + the spam. Now you can see my stats years later:

        TP True Positives:                  3354
        TN True Negatives:                239849
        FP False Positives:                 1448
        FN False Negatives:                  981
        SC Spam Corpusfed:                     0
        NC Nonspam Corpusfed:                  0
        TL Training Left:                      0
        SHR Spam Hit Rate                 77.37%
        HSR Ham Strike Rate:               0.60%
        PPV Positive predictive value:    69.85%
        OCA Overall Accuracy:             99.01%

You don't have enough data for dspam to reliably do anything.
Retraining one message as spam will *not* automatically get it to be
classified as spam on the *next* classification.

Watch the numbers in your stats, which show whether training is
occurring. If you have a false negative (spam delivered as ham), train it
and you should see the FN increment. If it does, dspam is working as expected.
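
One way to do that check without eyeballing the listings is to save the dspam_stats -H output before and after a training step and compare the counters. A minimal Python sketch, assuming the listing format posted in this thread; parse_counters and changed_counters are hypothetical helper names, not part of dspam, and the two sample snapshots are made up for illustration:

```python
import re

# Matches counter lines such as "TP True Positives:   3354" from `dspam_stats -H`.
# The percentage lines (SHR, HSR, PPV, OCA) are deliberately not matched.
COUNTER_LINE = re.compile(r"^\s*(TP|TN|FP|FN|SC|NC|TL)\b[^:]*:\s*(\d+)\s*$")


def parse_counters(stats_text):
    """Return {'TP': int, ...} from the text of a `dspam_stats -H` listing."""
    counters = {}
    for line in stats_text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def changed_counters(before_text, after_text):
    """Return {name: delta} for every counter that differs between snapshots."""
    before = parse_counters(before_text)
    after = parse_counters(after_text)
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }


if __name__ == "__main__":
    # Hypothetical snapshots: one missed spam was retrained between them.
    before = "TP True Positives:   10\nFN False Negatives:   5\n"
    after = "TP True Positives:   10\nFN False Negatives:   6\n"
    print(changed_counters(before, after))  # {'FN': 1}
```

An empty result after a retrain step would suggest the training call never reached dspam for that user.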

The other implied part of your question is 'Why isn't dspam effective
yet?'. That is partly due to the amount of mail you've received so
far, the type of spam, and the dspam settings. I used to set people up
with TEFT, as that was the recommendation and, I think, the default.
Over the years I've seen it mentioned on this list multiple times that
you should use TOE by default.

I also use

Algorithm graham burton
Tokenizer osb

because of users of this list back in the day explaining that they were
better defaults.




Re: [Dspam-user] dspam not training

Al Zick
Hi,

Here are my stats after retraining 100's of messages. Both spam and ham:

{227} dspam_stats -H
antispam:
                 TP True Positives:                  4818
                 TN True Negatives:                 22115
                 FP False Positives:                    4
                 FN False Negatives:                    5
                 SC Spam Corpusfed:                     0
                 NC Nonspam Corpusfed:                  0
                 TL Training Left:                      0
                 SHR Spam Hit Rate                 99.90%
                 HSR Ham Strike Rate:               0.02%
                 PPV Positive predictive value:    99.92%
                 OCA Overall Accuracy:             99.97%

Last night it caught maybe 100 emails, but I had many more than that
in my inbox.

Kind Regards,
Al





Re: [Dspam-user] dspam not training

ktm@rice.edu
On Fri, Jul 24, 2015 at 07:17:14AM -0400, Al Zick wrote:

> Hi,
>
> Here are my stats after retraining 100's of messages. Both spam and ham:
>
> {227} dspam_stats -H
> antispam:
>                  TP True Positives:                  4818
>                  TN True Negatives:                 22115
>                  FP False Positives:                    4
>                  FN False Negatives:                    5
>                  SC Spam Corpusfed:                     0
>                  NC Nonspam Corpusfed:                  0
>                  TL Training Left:                      0
>                  SHR Spam Hit Rate                 99.90%
>                  HSR Ham Strike Rate:               0.02%
>                  PPV Positive predictive value:    99.92%
>                  OCA Overall Accuracy:             99.97%
>
> Last night it caught maybe 100 emails, but I had much more than that  
> in my inbox.
>
> Kind Regards,
> Al
>

Hi Al,

It does not look like your training is working. If you retrained 100's
of messages, you should have appropriate counts in the FP/FN fields.

Regards,
Ken


Re: [Dspam-user] dspam not training

waterdog
In reply to this post by Eric Broch
Okay, I apologize for all the following questions but, the more I troubleshoot dspam without progress, the more questions I have.

Are there recommendations/documentation on how to properly train?  It seems that some users do corpus training and other users just train based on actual messages.

What are the pros/cons of using a corpus vs. actual messages?

Does it help to retrain multiple times using the same corpus and/or messages?

What are the specific stats that one should look to achieve to determine if dspam has had enough training?

Does TL need to be at zero before dspam will work at all?

Do you have to train separately for each user or can all users share the same training?  

I've tried training and retraining multiple times using corpuses and actual messages but don't seem to be making any real progress.  Here are my current stats after training with a corpus:

sudo dspam_train <username> spam_2 easy_ham_2

sudo dspam_stats -H <username>

                TP True Positives:                     0
                TN True Negatives:                  1315
                FP False Positives:                 2443
                FN False Negatives:                 2154
                SC Spam Corpusfed:                     0
                NC Nonspam Corpusfed:                  0
                TL Training Left:                      0
                SHR Spam Hit Rate                  0.00%
                HSR Ham Strike Rate:              65.01%
                PPV Positive predictive value:     0.00%
                OCA Overall Accuracy:             22.24%

As you can see, the OCA is still low but better than it was before.
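
For reference, the percentages in these listings appear to follow directly from the four counters. A minimal Python sketch; the formulas are inferred from the numbers posted in this thread, not taken from the dspam source, and rates_from_counters is a hypothetical name. How dspam itself reports a rate whose denominator is zero is not assumed here, so the sketch returns None in that case:

```python
def rates_from_counters(tp, tn, fp, fn):
    """Derive SHR, HSR, PPV and OCA (as percentages) from the four counters."""

    def pct(numerator, denominator):
        # Undefined when there is nothing to divide by; return None rather
        # than guess what dspam_stats would print.
        if denominator == 0:
            return None
        return 100.0 * numerator / denominator

    return {
        "SHR": pct(tp, tp + fn),                  # share of all spam that was caught
        "HSR": pct(fp, fp + tn),                  # share of all ham flagged as spam
        "PPV": pct(tp, tp + fp),                  # share of spam verdicts that were right
        "OCA": pct(tp + tn, tp + tn + fp + fn),   # share of all verdicts that were right
    }


if __name__ == "__main__":
    # Counters from the listing above: TP 0, TN 1315, FP 2443, FN 2154.
    for name, value in rates_from_counters(0, 1315, 2443, 2154).items():
        print(name, "n/a" if value is None else f"{value:.2f}%")
```

Read this way, an SHR of 0.00% with FN in the thousands means every spam message seen so far was recorded as a miss, and an HSR of 65.01% means roughly two out of three ham messages were recorded as false positives.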

It might help if someone could post working configurations for postfix, dspam, dovecot, and clamAV for comparison.  I've tried to follow the online documentation but apparently I'm missing something.

Re: [Dspam-user] dspam not training

Eric Broch
I train on error only, though my settings are TEFT in dspam.conf--I'll
change this for the next DSPAM setup.
I've never fed any corpus to DSPAM and I'm getting between a 98 and 99
percent success rate, if not higher.
I use DSPAM on a per-user basis--no global user. I use spamassassin
(bayes) as a global spam filter.
I train individual messages by looping through each one in every user's
'.spam' folder, to which they move unmarked spam and to which the imap
client moves marked spam. I do not train spam already marked by DSPAM as
spam; I skip and delete those.
I have never trained ham.

The above scenario has brought success at 2 sites, however, it was done
while I was relatively inexperienced with DSPAM.
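
The selection step of a loop like that could look roughly as follows. This is a Python sketch under stated assumptions, not the script actually used at those sites: it assumes each message is a separate file in the folder passed in (for a Maildir that would be a cur or new subdirectory), that DSPAM's verdict is in an X-DSPAM-Result header, and the function names are hypothetical. The training call itself is left out on purpose.

```python
import re
from email import message_from_bytes
from pathlib import Path

# A verdict of "Spam" either as the whole header value or as result="Spam".
# Matching the bare word anywhere would misfire on user names like "antispam".
SPAM_VERDICT = re.compile(r'^\s*Spam\s*$|result="Spam"', re.IGNORECASE)


def already_marked_spam(raw_message):
    """True if any X-DSPAM-Result header on the message carries a Spam verdict."""
    message = message_from_bytes(raw_message)
    # Header names are matched case-insensitively; a message may carry several.
    results = message.get_all("X-DSPAM-Result", [])
    return any(SPAM_VERDICT.search(str(value)) is not None for value in results)


def messages_to_train(folder):
    """Yield message files in `folder` that DSPAM did not itself mark as spam."""
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and not already_marked_spam(path.read_bytes()):
            yield path
```

Only the files yielded by messages_to_train would be handed to the retraining step; everything else in the folder was already caught and can be deleted.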

Eric


 

Re: [Dspam-user] dspam not training

ktm@rice.edu
In reply to this post by waterdog
On Fri, Jul 24, 2015 at 10:17:18AM -0700, waterdog wrote:

> Okay, I apologize for all the following questions but, the more I
> troubleshoot dspam without progress, the more questions I have.
>
> Are there recommendations/documentation on how to properly train?  It seems
> that some users do corpus training and other users just train based on
> actual messages.
>
> What are the pros/cons of using a corpus vs. actual messages?
>
> Does it help to retrain multiple times using the same corpus and/or
> messages?
>
> What are the specific stats that one should look to achieve to determine if
> dspam has had enough training?
>
> Does TL need to be at zero before dspam will work at all?
>
> Do you have to train separately for each user or can all users share the
> same training?  
>
> I've tried training and retraining multiple times using corpuses and actual
> messages but don't seem to be making any real progress.  Here are my current
> stats after training with a corpus:
>
> sudo dspam_train <username> spam_2 easy_ham_2
>
> sudo dspam_stats -H <username>
>
>                 TP True Positives:                     0
>                 TN True Negatives:                  1315
>                 FP False Positives:                 2443
>                 FN False Negatives:                 2154
>                 SC Spam Corpusfed:                     0
>                 NC Nonspam Corpusfed:                  0
>                 TL Training Left:                      0
>                 SHR Spam Hit Rate                  0.00%
>                 HSR Ham Strike Rate:              65.01%
>                 PPV Positive predictive value:     0.00%
>                 OCA Overall Accuracy:             22.24%
>
> As you can see, the OCA is still low but better than it was before.
>
> It might help if someone could post working configurations for postfix,
> dspam, dovecot, and clamAV for comparison.  I've tried to follow the online
> documentation but apparently I'm missing something.
>
Hi,

I am not sure what your training corpus looks like, but those are pretty bad
results. Training a global/merged group can reduce the accuracy hit at
the beginning, but in general, a train-on-error setup with no initial
training would probably be better. Training with some valid good content is
good if your ham/spam ratio is very small. The accuracy is best with an even
mix of spam/ham to start. Then the TOE will keep it balanced.
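
If you do want to seed with a corpus, one way to get that even mix is to draw the same number of messages from each side before training. A rough Python sketch; balanced_pick and balanced_sample are hypothetical helpers, and the directory layout (one message per file) is an assumption:

```python
import random
from pathlib import Path


def balanced_pick(spam_items, ham_items, seed=None):
    """Return (spam, ham) random samples of equal size from two lists."""
    size = min(len(spam_items), len(ham_items))
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    return rng.sample(spam_items, size), rng.sample(ham_items, size)


def balanced_sample(spam_dir, ham_dir, seed=None):
    """Same as balanced_pick, but over the message files in two directories."""
    spam = sorted(p for p in Path(spam_dir).iterdir() if p.is_file())
    ham = sorted(p for p in Path(ham_dir).iterdir() if p.is_file())
    return balanced_pick(spam, ham, seed)
```

With the directories from the dspam_train command quoted above, balanced_sample("spam_2", "easy_ham_2") would return two equally long lists of files, sized to the smaller directory.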

Regards,
Ken


Re: [Dspam-user] dspam not training

Al Zick
In reply to this post by ktm@rice.edu

On Jul 24, 2015, at 9:53 AM, [hidden email] wrote:

> On Fri, Jul 24, 2015 at 07:17:14AM -0400, Al Zick wrote:
>> Hi,
>>
>> Here are my stats after retraining 100's of messages. Both spam and ham:
>>
>> {227} dspam_stats -H
>> antispam:
>>                  TP True Positives:                  4818
>>                  TN True Negatives:                 22115
>>                  FP False Positives:                    4
>>                  FN False Negatives:                    5
>>                  SC Spam Corpusfed:                     0
>>                  NC Nonspam Corpusfed:                  0
>>                  TL Training Left:                      0
>>                  SHR Spam Hit Rate                 99.90%
>>                  HSR Ham Strike Rate:               0.02%
>>                  PPV Positive predictive value:    99.92%
>>                  OCA Overall Accuracy:             99.97%
>>
>> Last night it caught maybe 100 emails, but I had much more than that
>> in my inbox.
>>
>> Kind Regards,
>> Al
>
> Hi Al,
>
> It does not look like your training is working. If you retrained 100's
> of messages, you should have appropriate counts in the FP/FN fields.
>
> Regards,
> Ken


Hi Ken,

What do I need to do to fix this?

Kind Regards,
Al



Re: [Dspam-user] dspam not training

ktm@rice.edu
On Fri, Jul 24, 2015 at 02:36:49PM -0400, Al Zick wrote:

>
> >Hi Al,
> >
> >It does not look like your training is working. If you retrained 100's
> >of messages, you should have appropriate counts in the FP/FN fields.
> >
> >Regards,
> >Ken
> >
>
> Hi Ken,
>
> What do I need to do to fix this?
>
> Kind Regards,
> Al
>

Hi Al,

I would trace through your re-train process one step at a time. Run each manually
to verify that it is working. You should see counts increase when a message is
trained as either spam or not-spam/ham. Many times it is an ACL, access right,
or other permission or ownership setting that causes the failure. Then, once you
have a verified process, make sure that that IS the process that is called when
you get a message misidentification. I apologize for the hand-waving, but the
details really are completely dependent on your setup and configuration.
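
As a starting point for the permission part, something like the following Python sketch can be run as the same account that performs the retraining (for example, the user the mail server runs the training command as) to see what that account can actually read and write. describe_access is a hypothetical helper; the two paths in the example are the ones mentioned earlier in this thread and may differ on your system:

```python
import os
import stat
from pathlib import Path


def describe_access(path):
    """Summarise ownership, mode and the current process's access to one path."""
    p = Path(path)
    if not p.exists():
        return {"path": str(p), "exists": False}
    info = p.stat()
    return {
        "path": str(p),
        "exists": True,
        "uid": info.st_uid,
        "gid": info.st_gid,
        "mode": stat.filemode(info.st_mode),  # e.g. "-rw-r-----"
        "readable": os.access(p, os.R_OK),
        "writable": os.access(p, os.W_OK),
    }


if __name__ == "__main__":
    for candidate in ("/etc/dspam/dspam.conf", "/var/log/dspam/dspam.debug"):
        print(describe_access(candidate))
```

A path that shows up as unreadable or unwritable for the retraining account is a good first suspect; ACLs beyond the basic mode bits are not shown by this sketch and would need checking separately.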

Regards,
Ken
