Friday, May 26, 2006

Google Bug

Does this look strange to anyone? Clicking the link doesn't get you anywhere. Since when do URLs start with a %20? That is a space. It was the only site in the results with this error.

Radiant Extra Condensed font - preview & download
Radiant Extra Condensed font. SearchFreeFonts.com has the best selection of downloadable, design quality, True Type and PostScript fonts for Mac and ...
%20www.searchfreefonts.com/font/radiant-extra-condensed.htm - 7k -
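
For anyone who wants to check the "%20 is a space" claim, here is a minimal Python sketch using the standard library's `urllib.parse.unquote`. The helper name `clean_display_url` is made up for illustration, and stripping the whitespace is just one guess at how the stray character could be cleaned up; I have no idea how Google actually builds that line.

```python
from urllib.parse import unquote

def clean_display_url(raw: str) -> str:
    """Percent-decode a displayed URL and strip stray surrounding whitespace.

    Hypothetical helper for illustration only.
    """
    return unquote(raw).strip()

raw = "%20www.searchfreefonts.com/font/radiant-extra-condensed.htm"

decoded = unquote(raw)
# repr() makes the leading space visible in the output.
print(repr(decoded))
print(clean_display_url(raw))
```

Decoding turns the `%20` into a literal leading space, which is why the result does not work as a hostname.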

Wednesday, May 24, 2006

Web Spam Detector

I just read about a search engine spam detector on Search Engine Watch. The idea behind the site is "To persuade webmasters and SEO to think." If an independent Italian software developer can, on his own, create a simple tool capable of detecting these techniques, just think what search engines can do or are already doing.

I didn't see an overall rating of the website it scanned; I guess you have to study the individual results. My blog isn't very spammy according to this test; a while back I removed some of the more spammy-looking parts. It didn't like the antispam CSS rules I posted here. It called them unnatural text. It also detected unnatural text in a spam email I posted recently.

Kind of makes me wish I had time to do something like this myself. But I already have too many unfinished spam research projects going.

Nigerian Scammer Article

CNN has a story on internet scammers in Lagos, Nigeria. The main subject of the article is a 14-year-old boy who gets paid for scamming people. His boss takes 60% of the revenue, uses another 20% to pay off law enforcement and teachers, and still leaves plenty for the teenage boy to be the main support for "his family and legions of relatives."

He and many others like him do their work out of internet cafes. The cafe mentioned in the article has a sign on the door that starts, "WE DO NOT TOLERATE SCAMS IN THIS PLACE." The article calls that sign a joke; obtaining things by trickery is a national pastime.

Wednesday, May 03, 2006

Bad Behavior 2 Alphas

Michael has released Alpha 4 of BadBehavior 2. I have not tested it, but from his post it sounds like a major improvement. He is again ahead of the spammers. And he now supports MediaWiki without the hack that used to be required. He is especially interested in testing by MediaWiki users since actual support for it was just added. I would give it a shot, but the only MediaWiki I run is my honeypot. I may still give it a try anyway for a few days just to see, but I wouldn't want to run a clean wiki. Spam attracts spam, and in the case of a honeypot, that is exactly what I want.

Even though I don't use it, I am a big supporter of BadBehavior. It clearly can't always stop all spam, but other than disabling all user-submitted content on your site, nothing will. As long as he is able to keep just a tiny bit ahead of spammers, it will make their job harder and ours easier.

ScamFly AntiScam Forum

Yesterday I got a comment on my FBI email scam post. As you know, when I get really good comments, I must share them as new posts:

There are so many west coast wellness scam online these days. I see it even where i live in Florida. I did come across a forum though that i can post about west coast wellness scam and its free. Here is the url http://www.scamfly.com/forum/

Posted by ChignikLake_561769 at 5/02/2006 10:17:53 AM


What an interesting idea. Let's look into it further. The blogger who left the comment had only 4 profile views when I checked shortly after the comment arrived. The domain was registered on May 1st for only one year with RegisterFly and their whois info protection service. They are building up content fast. There were 13 non-default forums with a total of 56 posts when I visited. All the posts were by the forum's single registered user, admin.

An example of one of the posts in the site's forum:

Bedford to fight senior citizen fraud - Lynchburg News and Advance

Bedford to fight senior citizen fraudLynchburg News and Advance, VA - Apr 29, 2006... “Senior citizens are targeted,” Wheeler said. “They tend to be a lot more vulnerable to the door-to-door and the telemarketing scams.”. ...


Check out the link on that top line; it looks like a link direct from Google News's RSS feed. The funny characters in the post are common in badly parsed or badly formed RSS feeds, as is its truncated length.

The main site says they provide free blog hosting where you can "Tell your story" on your "I've been scammed" blog. What a nice guy, there just aren't enough free blog providers out there. Currently the only active blog listed is titled "affiliates" and has a couple posts on "VITAL HEALTH CARE PRODUCTS." There is one more blog named "pits," but so far it only has the default welcome post and Google Ads. Oddly, both blogs use the same Google Ad Client ID.

From the Features page, I discovered why the IDs match:

We offer this free blog service for a number of reasons. You will notice that on our free hosting platform we have a small Google adsense ad unit on the blog. This is a small way to help us cover bandwidth costs.


The logo in the upper left corner of the main site gives away some more info. It says ClubBlogs.com, but does not link to it. There are a couple of other mentions of that domain, but no links. Going there yesterday gave this message along with some spamdexing-looking content:

We are certain you will be delighted with the fruits of our online test which has sourced good file about to mail you to. An essential strategic assistance that cyberspace organizations obtain over non-net based organizations is that they can constitute changes when changes are demanded.

Detailed test all over the net resulted in this location about and capture the best sites from all those available for you to visit. Our web location is yet expanding so we have not much managed to comprehend volumes of support, however what we have done so far is researched the too best sites on the net. Our web location is yet expanding so we have not much managed to provide lots of file, however what we have done so far is researched the too best


Today the site looks totally different and that text is gone; it is an actual WordPress blog now.

Another site mentioned on ScamFly, but not linked, was blogsilla.com. Today it contained this text with a somewhat similar layout to yesterday's clubblogs site:

We are certain you will be delighted with the fruits of our online test which has sourced good file about wikipedia to mail you to. An essential strategic assistance that cyberspace wikipedia organizations obtain over non-net based wikipedia organizations is that they can constitute changes when changes are demanded.

Our wikipedia website is too advanced so we have not much managed to assistance lots of content, however what we have done so far is researched the too best wikipedia sites on the net.

Locating essential and good wikipedia requires a considerably extent of skill. So we came up with this website. As online commerce continues to grow wikipedia.


Don't forget to check out blogsilla's sitemap. They have pages on lots of varied topics including: wikipedia, mesothelioma, health care, Titleist, and Diablo 2. But each page is basically the same with no actual topical content. It just inserts the page's keyword into the above text (where it says wikipedia).
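
The keyword-swapping trick is easy to test for. Below is a rough Python sketch, assuming you already have the text of two pages and know each page's keyword: remove the keyword, normalize whitespace and case, and see if what is left is identical. The function names are my own, and the "Titleist" page is a hypothetical stand-in built from the wikipedia sentence quoted above, not text I copied from the site.

```python
import re

def strip_keyword(text: str, keyword: str) -> str:
    """Remove every occurrence of the keyword, then normalize whitespace and case."""
    without = re.sub(re.escape(keyword), " ", text, flags=re.IGNORECASE)
    return " ".join(without.split()).lower()

def same_template(page_a: str, kw_a: str, page_b: str, kw_b: str) -> bool:
    """True if two pages are identical once their own keywords are removed."""
    return strip_keyword(page_a, kw_a) == strip_keyword(page_b, kw_b)

wikipedia_page = (
    "We are certain you will be delighted with the fruits of our online "
    "test which has sourced good file about wikipedia to mail you to."
)
# Hypothetical second page: the same sentence with another sitemap keyword.
titleist_page = wikipedia_page.replace("wikipedia", "Titleist")

print(same_template(wikipedia_page, "wikipedia", titleist_page, "Titleist"))
```

An exact match is a strict test; pages that shuffle sentences around would need a fuzzier comparison.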

Through domaintools.com, I discovered a related site with a design similar to today's clubblogs design, 106th-park.com. The whois info even says "Visit: http://clubblogs.com." Wow, I didn't know you could advertise your other sites in your whois info.

So getting back to scamfly.com, is it just a case of spamdexing or a legitimate site designed to help scam victims? I will leave the conclusion to the reader.

Tuesday, May 02, 2006

SpamOrHam.org

John Graham-Cumming, the inventor of POPFile, has a new project called SpamOrHam. He is attempting to improve spam filter testing by verifying (or correcting) the accuracy of one of the few large sources of real-world test emails, the TREC 2005 Public Spam Corpus. The more accurately classified the test data is, the more accurately spam filters can be tested, which hopefully will lead to better filters. To do this, he needs volunteers to visit the site and classify a few messages. If enough people participate, the corpus could be done in no time.

The Public Spam Corpus is a body of email collected from a large corporation during the process of a legal trial. Normally it is hard to get such a real-world collection of email due to privacy issues, etc. But thanks to the legal system, this is a collection of mail from 150 recipients in the company containing business and personal emails as well as spam. After a few messages you will understand more about the source and the types of business they deal with. This was a major company that was covered heavily in the news recently. It is a bit voyeuristic reading some of these emails (though I have seen no juicy details). I found it rather addictive.

Apparently this project is such a good idea that within just two days of its launch it has already seen hacking attempts, which, thanks to John's planning ahead, were already being prevented, partly through a CAPTCHA and logging.

Update: Looks like John has made some improvements. Now you get feedback after each classification based on what the automated classification of the email was. It also keeps a running total of your classifications, possible errors, and those where you and the filter agreed. I am trying to recover a hard drive in Knoppix right now and had some free time while it copies, so I just did 200 emails. I disagreed with the filter or was unsure 13 times. Considering that many of those were ones I couldn't decide on, it looks like the corpus is pretty well filtered.
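
To put those numbers in perspective, here is the arithmetic as a tiny Python sketch. The variable names are mine, and the 13 lumps together real disagreements and the ones where I was just unsure, so treat the result as an upper bound on how often I disagreed with the filter.

```python
total_classified = 200       # emails I classified in this session
disagreed_or_unsure = 13     # times I disagreed with the filter or could not decide

# Multiply before dividing so the percentages come out as clean floats.
disagreement_pct = 100 * disagreed_or_unsure / total_classified
agreement_pct = 100 - disagreement_pct

print(f"disagreed or unsure: {disagreement_pct}%")
print(f"agreed with filter:  {agreement_pct}%")
```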