Thursday, February 02, 2006
Regex Spam Fighting
If you are fighting wiki spam, you have likely found that regular expressions are a common way to block it. They are of course useful for a lot more than just blocking URLs, and you can find regexes in a lot of places, but for this post I am sticking to antispam uses.
John Walling uses keyword blocking. I worry about false positives with this method, but if you are careful with your regexes and think about the keywords carefully, hopefully that won't be a frequent problem. For example, a bare match on cialis also blocks specialist. But there are words like tramadol or hydrocodone that aren't likely to be necessary unless you are running a medical wiki.
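One way to avoid the cialis/specialist problem is to anchor each keyword at word boundaries. What follows is only a rough sketch in Python, not the filter John Walling or any particular wiki engine actually uses; the keyword list and the function name contains_blocked_keyword are made up for illustration.

```python
import re

# Hypothetical keyword list: pick terms your wiki never legitimately needs.
BLOCKED_KEYWORDS = ["cialis", "tramadol", "hydrocodone"]

# \b matches at a word boundary, so "cialis" is only caught as a whole word
# and no longer matches inside "specialist". re.escape guards against
# keywords that happen to contain regex metacharacters.
KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in BLOCKED_KEYWORDS) + r")\b",
    re.IGNORECASE,
)

def contains_blocked_keyword(text):
    """Return True if the text contains any blocked keyword as a whole word."""
    return KEYWORD_RE.search(text) is not None
```

With this, an edit containing "Buy cialis now" would be flagged while "Ask a specialist" would pass. Spammers can still dodge it with misspellings or punctuation tricks, so treat it as one filter among several.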
You can also block spam based on patterns commonly found in spam. This is the approach used in blocking CSS Hidden Spam. It doesn't catch all spam, but it will catch any spam that uses this technique, whether or not a keyword or URL rule would have caught it.
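To give a feel for pattern-based blocking, here is a small Python sketch. I am assuming that "CSS hidden" means links tucked inside an element whose inline style hides it (display:none or visibility:hidden); real spam uses other tricks too, and the pattern and the name looks_like_hidden_spam are my own illustration, not a rule taken from any particular blocklist.

```python
import re

# Assumption: the spam hides its links with an inline style attribute.
# This matches style="..." or style='...' containing display:none or
# visibility:hidden, allowing optional whitespace around "=" and ":".
HIDDEN_STYLE_RE = re.compile(
    r"""style\s*=\s*["'][^"']*(?:display\s*:\s*none|visibility\s*:\s*hidden)""",
    re.IGNORECASE,
)

def looks_like_hidden_spam(text):
    """Return True if the text contains an inline style that hides content."""
    return HIDDEN_STYLE_RE.search(text) is not None
```

Notice that the rule never looks at the link or the words being advertised. It keys on the hiding technique itself, which is why it keeps working when the spammer changes domains or products.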
Well, the main purpose of this post was to let you know about Regular-Expressions.info's regex tutorial. I saw lots of stuff there I didn't realize was possible. Regular expressions are very powerful tools, and you don't need a lot of experience to take advantage of some simple rules. The more people who can come up with their own spam blocking methods, the worse off spammers will be. No single method (except locking down your wiki) is going to solve the problem.