BlogSpot SPAM Heuristics

Like many people, I get a lot of SPAM containing links to BlogSpot pages. The whole point is for the recipient to see a fairly trustworthy domain in the e-mail, click the link, and then be quickly redirected to the intended site. I have always found this strange: while the e-mails themselves cannot be stopped, Google could run heuristics-based scans of BlogSpot pages for questionable HTML and JavaScript.

Taking a few minutes, I visited one of these BlogSpot SPAM pages with JavaScript turned off. I fully expected to see an assignment to window.location; instead, the redirect looked similar to:

document.write("<meta content='0;URL=http://www.example.com/?"+location.search.substring(1)+"' http-equiv='refresh'/>");
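
To spell out what this line does: location.search is the query string of the BlogSpot URL, including its leading "?", so substring(1) drops the "?" and appends the rest to the target address. The meta refresh has a delay of 0 seconds, so the browser moves on immediately and carries along whatever parameters were in the e-mail link. Because the tag only exists once the script runs, turning JavaScript off leaves the page sitting there. The sketch below isolates just that URL construction. The function names are hypothetical, the example query string is made up, and www.example.com stands in for the real destination, as in the quote.

```javascript
// Hypothetical helper mirroring the URL construction in the quoted snippet.
// `search` plays the role of location.search, e.g. "?id=123", leading "?" included.
function buildRedirectTarget(search) {
  // substring(1) drops the leading "?"; for an empty string it returns "".
  return "http://www.example.com/?" + search.substring(1);
}

// Builds the same meta tag string that the spam page passes to document.write.
function buildMetaRefresh(search) {
  return "<meta content='0;URL=" + buildRedirectTarget(search) + "' http-equiv='refresh'/>";
}

// Made-up query string standing in for whatever the e-mail link carried.
console.log(buildRedirectTarget("?id=123&src=mail"));
// http://www.example.com/?id=123&src=mail
```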

Apart from that, the page was mostly blank. It wouldn’t be much of a stretch to assume that many other BlogSpot SPAM pages employ similar methods. I tried to find a way to make Google Search look only within page source code, but was unable to.
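
As a rough illustration of the kind of heuristic I have in mind, here is a minimal sketch that scores a page's raw HTML against three signals taken from what I saw: scripts that document.write a refresh meta tag, scripts that change the location directly, and a page with almost no visible text. It assumes the HTML source is already in hand as a string. The function name scoreRedirectPage, the regular expressions, the weights, and the threshold are all illustrative assumptions rather than a tested filter, and plain regex matching like this is easy for a determined spammer to evade.

```javascript
// Hypothetical heuristic scorer for BlogSpot-style redirect pages.
// Takes raw HTML source as a string. The rules, weights, and threshold
// are illustrative assumptions, not a tuned spam filter.
function scoreRedirectPage(html, threshold = 3) {
  const hits = [];

  // Gather the contents of every inline <script> block.
  const scriptRe = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;
  const scripts = [];
  let m;
  while ((m = scriptRe.exec(html)) !== null) {
    scripts.push(m[1]);
  }
  const js = scripts.join("\n");

  // Rule 1: scripts call document.write and mention a refresh meta tag.
  if (/document\.write\s*\(/.test(js) && /http-equiv\s*=\s*['"]?refresh/i.test(js)) {
    hits.push({ rule: "document.write meta refresh", weight: 3 });
  }

  // Rule 2: scripts assign to location / location.href or call replace/assign.
  if (/\blocation(\.href)?\s*=(?!=)/.test(js) || /\blocation\.(replace|assign)\s*\(/.test(js)) {
    hits.push({ rule: "scripted location change", weight: 2 });
  }

  // Rule 3: almost no visible text once scripts, styles, and tags are stripped.
  const text = html
    .replace(/<script\b[\s\S]*?<\/script>/gi, " ")
    .replace(/<style\b[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
  if (text.length < 200) {
    hits.push({ rule: "mostly blank page", weight: 1 });
  }

  const score = hits.reduce((sum, h) => sum + h.weight, 0);
  return { score, flagged: score >= threshold, hits };
}

// Example: the redirect snippet quoted above, wrapped in a minimal page.
const sample = `<html><body><script>document.write("<meta content='0;URL=http://www.example.com/?"+location.search.substring(1)+"' http-equiv='refresh'/>");</script></body></html>`;
console.log(scoreRedirectPage(sample).score); // 4
```

Weighting the document.write trick highest is a judgment call: a plain window.location assignment shows up on plenty of legitimate pages, so in this sketch it only tips a page over the threshold when the page is also nearly empty.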

I hope that efforts to do this are already underway within Google, or could at least be considered. Even with just the right starting point, I’d be glad to help come up with a set of heuristics to quickly flag these sites. If anyone reading this knows of a search engine capable of searching HTML source, let me know.