[Dspam-user] dspam for foreign languages and spam pictures

[Dspam-user] dspam for foreign languages and spam pictures

ML mail
Hello,

Two questions:

How well does dspam perform with more "exotic" foreign languages such as Arabic, Chinese, etc.?

And how does dspam handle spam mails whose content is embedded in pictures (JPEG, PNG, etc.)?

Regards
ML

------------------------------------------------------------------------------
Learn Graph Databases - Download FREE O'Reilly Book
"Graph Databases" is the definitive new guide to graph databases and their
applications. Written by three acclaimed leaders in the field,
this first edition is now available. Download your free book today!
http://p.sf.net/sfu/13534_NeoTech
_______________________________________________
Dspam-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/dspam-user

Re: [Dspam-user] dspam for foreign languages and spam pictures

Tom Hendrikx
On 12-03-14 14:29, ML mail wrote:
> Hello,
>
> Two questions:
>
> How well does dspam perform with more "exotic" foreign languages
> such as arabic, chinese, etc?

As long as dspam can break the strings up into tokens, it should be able
to do something with it. I don't know whether the charset actually has any
effect; maybe that's more a question for Stevan...

>
> and how does dspam also work for fighting spam mails which include
> their content in pictures jpeg/png/etc ?

Dspam does nothing with pictures. Extracting text from images (OCR) is
a completely different task, which dspam has no support for. You could
look into the various projects that support OCR.

Tom

Re: [Dspam-user] dspam for foreign languages and spam pictures

ML mail
Hi Tom,

Thanks for your feedback...

@Stevan, any input on dspam's support for various charsets?

Regards
ML


On Thursday, March 13, 2014 9:53 PM, Tom Hendrikx <[hidden email]> wrote:

Re: [Dspam-user] dspam for foreign languages and spam pictures

Rick Leir
The problem is more complex. Some charsets are used without the space character, so where do you break the string into tokens? If kana and katakana are present, then I have been told that you should break at transitions between charsets. Chinese characters could each be considered a token. Stevan, please tell more.

Though most of my mail is in English, charset support would be useful because some correspondents have Cantonese in their signatures.
Cheers,
Rick

--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
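The two heuristics described above (break at transitions between scripts, and treat each Chinese character as its own token) can be sketched roughly as follows. This is a hypothetical Python illustration, not something dspam implements: the names `segment` and `script_of` are made up, only the Hiragana, Katakana and CJK Unified Ideographs blocks are recognized, and any Unicode whitespace, punctuation, symbol or separator is assumed to act as a delimiter.

```python
import unicodedata

# Hypothetical segmenter illustrating the heuristics from the thread; dspam does not do this.


def script_of(ch: str) -> str:
    """Coarse script class based on Unicode block ranges (deliberately incomplete)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "han"  # CJK Unified Ideographs (Chinese characters / Japanese kanji)
    return "other"


def segment(text: str) -> list[str]:
    tokens: list[str] = []
    current = ""
    current_script = None
    for ch in text:
        # Whitespace, punctuation (P*), symbols (S*) and separators (Z*) end a token.
        if ch.isspace() or unicodedata.category(ch)[0] in "PSZ":
            if current:
                tokens.append(current)
            current, current_script = "", None
            continue
        script = script_of(ch)
        if script == "han":
            # Each Han character becomes a token of its own.
            if current:
                tokens.append(current)
            tokens.append(ch)
            current, current_script = "", None
            continue
        # Break when the script changes, e.g. katakana -> hiragana.
        if current and script != current_script:
            tokens.append(current)
            current = ""
        current += ch
        current_script = script
    if current:
        tokens.append(current)
    return tokens


print(segment("テスト中です"))  # ['テスト', '中', 'です']
print(segment("你好，世界"))    # ['你', '好', '世', '界']
```

Real Japanese and Chinese word segmentation usually needs dictionaries; per-character and script-transition splitting are only cheap approximations, but they at least give a statistical filter stable, repeatable tokens to learn from.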

Re: [Dspam-user] dspam for foreign languages and spam pictures

Tom Hendrikx
Hi,

I took a quick look at the source code (tokenizer.c, config.h), and the
delimiters for both headers and body seem to be a list of non-alphabetic
ASCII characters:

" .,;:\"/\\[]}{=+_()<>|&\n\t\r@-*~`?#$%^"

or a subset of those, depending on the tokenizer. So there is no support
for non-ASCII token delimiters.
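To make the consequence concrete, here is a simplified Python model of delimiter-based splitting. It is not dspam's actual C tokenizer: it assumes the message text is UTF-8 and splits only on the byte values in the delimiter string quoted above, and the helper name `tokenize` is hypothetical. Because every byte of a multi-byte UTF-8 character is 0x80 or higher, none of them can match an ASCII delimiter, so a run of Chinese text without ASCII spaces or punctuation comes out as one long token, and a full-width comma does not split it either.

```python
# Simplified model of ASCII-delimiter tokenization; NOT dspam's real tokenizer.c.
# The delimiter set is the one quoted from dspam's source in this thread.
DELIMITERS = b" .,;:\"/\\[]}{=+_()<>|&\n\t\r@-*~`?#$%^"


def tokenize(text: str) -> list[bytes]:
    """Split UTF-8 encoded text on ASCII delimiter bytes only (hypothetical helper)."""
    tokens: list[bytes] = []
    current = bytearray()
    for byte in text.encode("utf-8"):
        # Iterating over bytes yields ints; `in` on a bytes object tests for that byte value.
        if byte in DELIMITERS:
            if current:
                tokens.append(bytes(current))
                current.clear()
        else:
            # Non-delimiter bytes, including every byte of a multi-byte UTF-8
            # character (all >= 0x80), are appended to the current token.
            current.append(byte)
    if current:
        tokens.append(bytes(current))
    return tokens


print(tokenize("Buy cheap pills now"))  # [b'Buy', b'cheap', b'pills', b'now']
print(len(tokenize("免费发票代开")))     # 1: the whole run is one token
print(len(tokenize("你好，世界")))       # 1: the full-width comma is not a delimiter
```

A whole unsegmented sentence then behaves like a single rare "word", which gives a Bayesian filter very little to learn from compared with English text split into individual words.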


Tom

On 14-03-14 13:03, [hidden email] wrote:
