Tuesday, June 08, 2004
What is Chongqing?
Chongqing, China is the city where the spammer who inspired us to start this campaign lives and works. We quickly discovered that this spammer likes to spam lots of wikis, blogs, and guestbooks with his links and the keyword Chongqing. Once we realized this guy was a big-time web spammer, we decided the best solution (in addition to blocking him) was to wreck his Google PageRank for his favorite keywords. We are now also actively going after a number of other wiki spammers (cases and nominees). If you or someone you know has had a wiki spammed, let us know on our submission page.
Most people who don't see wiki spam as a problem don't understand what a wiki really is. Some say any open content on the net is just asking to be spammed. Wiki owners often find that the easiest solution is to stop allowing anonymous posting. But that is not in the spirit of a wiki: wikis are meant to be open to anonymous users. Being open encourages everyone to add their knowledge of the topic for everyone's benefit. When you post to a wiki, the information or link should benefit other users, not your PageRank.
Many say to add a robots.txt so Google won't index the sandbox page. They don't realize that spammers aren't limiting themselves to sandbox pages. No spammer ever hit the sandbox page at POPFile (maybe because it's not named SandBox); they just spammed the main page and the FAQ index. On other wikis, most of the spam I clean up is not on sandbox pages either. I do strongly agree that there are good reasons to use robots.txt or noindex meta tags, though. Previous history versions, diffs, deleted pages, about user pages, and sandboxes shouldn't be indexed. But it's not a total solution.
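To make the robots.txt point concrete, here is a minimal sketch in Python using the standard library's `urllib.robotparser`. The robots.txt rules and every path in it (`/wiki/SandBox`, `/wiki/history/`, and so on) are hypothetical, made up for illustration and not taken from POPFile or any real wiki engine. It shows that excluding the sandbox, history, and diff pages still leaves the main page and FAQ open to indexing, which is exactly where the spam lands.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a wiki. All paths are illustrative assumptions.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/SandBox
Disallow: /wiki/history/
Disallow: /wiki/diff/
Disallow: /wiki/deleted/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawl_report(parser: RobotFileParser, paths, agent: str = "Googlebot") -> dict:
    """Map each path to True if the given crawler may fetch it."""
    return {path: parser.can_fetch(agent, path) for path in paths}


parser = build_parser(ROBOTS_TXT)
report = crawl_report(
    parser,
    [
        "/wiki/MainPage",          # spammed in practice, still indexed
        "/wiki/FAQ",               # spammed in practice, still indexed
        "/wiki/SandBox",           # excluded from indexing
        "/wiki/history/MainPage",  # excluded from indexing
    ],
)

for path, allowed in report.items():
    print(f"{path}: {'indexable' if allowed else 'blocked'}")
```

The pages that matter most to readers stay indexable, so links spammed onto them still count. That is why robots.txt helps with the housekeeping pages but does nothing for the main content.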
A few people even say you should block the entire wiki from Google. But many wikis have very useful information and should be in Google so people can find it. Wikis are a very popular home for open source project documentation (either supplemental or primary). GNOME, Apache, Gentoo, POPFile, and K-Meleon are just a few examples of OSS projects that make good use of wikis. Keeping the documentation in a wiki gives non-programmers a way to contribute to the software they use.
I just discovered that what I have been calling webspamming is apparently also called spamdexing:
Wikipedia: Blog spam is a form of spamming known as spamdexing...