I am trying to train a recently installed copy of SpamAssassin, and I have the impression that Bayesian learning isn't working.
First of all: yes, spamd is running with the --allow-tell option.
Now, I have a piece of spam. I first run it through SpamAssassin and get this score:
[paulo@myserver ~]$ spamc -R < spam6.txt
2.9/5.0
Spam detection software, running on the system "myserver",
has NOT identified this incoming email as spam. The original
message has been attached to this so you can view it or label
similar future email. If you have any questions, see
the administrator of that system for details.
Content preview: Nombre - herbertrl1 E-mail: - mu18@atsushi1010.masumi76.pushmail.fun
Asunto - Mensaje - New sexy website is available on the web http://porndreamscene.sexjanet.com/?katarina
porn star carl paula blum porn double d hamster porn video oiled porn clitoris
massage free young nubile porn [...]
Content analysis details: (2.9 points, 5.0 required)
pts rule name description
---- ---------------------- --------------------------------------------------
1.2 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
[Blocked - see <https://www.spamcop.net/bl.shtml?164.132.34.35>]
1.7 URIBL_BLACK Contains an URL listed in the URIBL blacklist
[URIs: sexjanet.com]
0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record
So I feed it to spamc using the -L option:
[paulo@myserver ~]$ spamc -L spam < spam6.txt
Message successfully un/learned
And then I try to analyze it with spamc again... and I get the exact same score:
[paulo@myserver ~]$ spamc -R < spam6.txt
2.9/5.0
Spam detection software, running on the system "myserver",
has NOT identified this incoming email as spam. The original
message has been attached to this so you can view it or label
similar future email. If you have any questions, see
the administrator of that system for details.
Content preview: Nombre - herbertrl1 E-mail: - mu18@atsushi1010.masumi76.pushmail.fun
Asunto - Mensaje - New sexy website is available on the web http://porndreamscene.sexjanet.com/?katarina
porn star carl paula blum porn double d hamster porn video oiled porn clitoris
massage free young nubile porn [...]
Content analysis details: (2.9 points, 5.0 required)
pts rule name description
---- ---------------------- --------------------------------------------------
1.2 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
[Blocked - see <https://www.spamcop.net/bl.shtml?164.132.34.35>]
1.7 URIBL_BLACK Contains an URL listed in the URIBL blacklist
[URIs: sexjanet.com]
0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record
Am I missing something?
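With the default configuration, SpamAssassin does not use Bayes for scoring until it has learned at least 200 spam and 200 ham messages. Learning a single message is recorded, but it will not change the score yet; that is also why no BAYES_* rule shows up in your report. You can run sa-learn --dump magic to check how many messages have been learned so far.
From man Mail::SpamAssassin::Conf (SpamAssassin version 3.1):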
bayes_min_ham_num (Default: 200)
bayes_min_spam_num (Default: 200)
To be accurate, the Bayes system does not activate until a certain number of ham (non-spam) and spam have been learned. The default is 200 of each ham and spam, but you can tune these up or down with these two settings.
$ sa-learn --dump magic
[…]
0.000 0 2508 0 non-token data: nspam
0.000 0 508 0 non-token data: nham
[…]
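The nspam and nham lines above are the ones to compare against the two thresholds. As a minimal sketch, assuming the dump keeps the layout shown above (the message count in the third column), the check could be scripted like this. The function name bayes_ready is hypothetical, and the default of 200 only mirrors the documented defaults of bayes_min_spam_num and bayes_min_ham_num; pass other values if you changed them.

```shell
# Hypothetical helper: reads `sa-learn --dump magic` output on stdin and
# reports whether enough spam and ham have been learned for Bayes to be used.
# Arguments (optional): minimum spam count, minimum ham count (default 200 each).
bayes_ready() {
    min_spam=${1:-200}
    min_ham=${2:-200}
    awk -v min_spam="$min_spam" -v min_ham="$min_ham" '
        # Assumption: the learned-message count is the third field.
        /non-token data: nspam/ { nspam = $3 + 0 }
        /non-token data: nham/  { nham  = $3 + 0 }
        END {
            printf "nspam=%d nham=%d\n", nspam, nham
            if (nspam >= min_spam + 0 && nham >= min_ham + 0)
                print "bayes: active"
            else
                print "bayes: not yet active"
        }'
}
```

It would be used as `sa-learn --dump magic | bayes_ready`. Run it as the user whose Bayes database spamd actually uses, since sa-learn reads the database of the user invoking it.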