in orbit

I mostly talk about video games and the world wide web

Sep 29, 2006

Spam: A Lament

I hate spam. A lot. And spammers. Today at work our SMTP server was (and still is being) brought to its knees because it's getting hammered by spambots attempting to use it for outgoing mail. The guy who runs it says it's getting about four messages a second. Nobody can send e-mail and there's not much we can do. Hooray for Fridays! Fortunately we have been able to use an alternative port for outgoing e-mail. It was set up a little while back to allow people to use our SMTP server from their home ISP (most ISPs, as I'm sure most of you are aware, do not let you use an outgoing mail server that isn't their own), but it's got a new use today.

A different but equally annoying type of spam is web page comment spam. This blog gets maybe one or two a week at most because a) the content-management system is non-standard and b) it's very low traffic. However, it's still obviously spammable because I still have to nuke spammy comments every so often. Now, I run another web site using this same CMS (well, a slightly outdated one with a few custom changes to fit the content of the page) which is linked on the left and called The Mantis-Eye Experiment. It's a site I put entirely too much time into and is centered around one of my favorite shows, The Venture Bros. It is high traffic. Just to give you an idea, this blog gets anywhere from 1000 to 2200 hits per month (low of 1037, high of 2256), whereas Mantis-Eye got 51,606 hits in June, 72,318 hits in July and 91,469 hits in August since the new season of the show started back in late June. Before then it pulled in anywhere from 10k to 15k hits per month.

Since I implemented the new comment system back in May (I believe) the page has gotten 24,469 user comments. Of these, 14,524 were not nuked, meaning 9,945 comments have been nuked (which means they stay in the database but are not displayed to users). 9,525 comments have been flagged as spam and auto-nuked by the system's content scanning (which is very simple and pretty much just looks for common spam words like 'viagra' or any posts with excessive use of bbcode).
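For the curious, the content scanning described above amounts to something like the sketch below. This is not the site's actual code (that is PHP); it is a rough Python illustration. Only 'viagra' comes from the real filter as described here; the other words, the tag limit and all the names (`SPAM_WORDS`, `MAX_BBCODE_TAGS`, `looks_like_spam`) are made up for the example.

```python
import re

# Hypothetical values: only 'viagra' is mentioned in the post,
# the rest of the list and the tag limit are assumptions.
SPAM_WORDS = {"viagra", "cialis", "casino"}
MAX_BBCODE_TAGS = 5

# Matches bbcode tags such as [b], [/b] or [url=http://example.com]
BBCODE_TAG = re.compile(r"\[/?[a-z]+(?:=[^\]]*)?\]", re.IGNORECASE)


def looks_like_spam(comment: str) -> bool:
    """Flag a comment if it contains a known spam word or too much bbcode."""
    words = set(re.findall(r"[a-z0-9']+", comment.lower()))
    if words & SPAM_WORDS:
        return True
    return len(BBCODE_TAG.findall(comment)) > MAX_BBCODE_TAGS
```

A comment that trips either check would be auto-nuked: kept in the database but hidden from users.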

420 comments have been nuked but are not flagged spam. A couple of those are double-posts or me nuking someone for posting how or where to pirate the show, but that's not overly common. So in about four months I have had to deal with and personally nuke 420 comments. That's not too bad, but I can't be around all the time, and the spam is really annoying and just makes a page look shitty in general.

Today I think I found a very simple way of tricking spam bots. Apparently what they do is scan the HTML of a page and look for the first <form> block they see. They fill in the inputs and submit the data and go along their merry way. Over the last month I have noticed that only the first article on the site would get spammed consistently with the other ones getting spammed only rarely. This is because the comment submission boxes for each article are already on the main page, hidden from view thanks to CSS display properties.

So, using my comment box output function I added a single line of code which would output a new comment submission box with an id of zero. The id normally relates to the update id number, but there is no update zero. So, when the backend PHP gets hold of the submitted comments it notes that the id is zero and instead of putting it into the comments table it simply increments a counter I have set up (this step is unnecessary and really just there to feed my curiosity) and then informs the spambot that it's a big jerk, though I doubt spambots care about the redirect after they submit their wonderful data.
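Here is roughly what the trick looks like, as a minimal Python sketch rather than the real PHP. The field names, the `/comment` URL, the in-memory `CommentStore` and the return strings are all hypothetical stand-ins; the real thing uses a database table for the counter and a redirect for the response. I am also assuming the decoy box is output before the real ones and hidden the same way, since the bots go for the first form they find.

```python
from dataclasses import dataclass, field

DECOY_ID = 0  # no real update has id zero, so only bots post to it


def render_comment_forms(update_ids):
    """Return HTML with the decoy form first, then one hidden form per update."""
    ids = [DECOY_ID] + list(update_ids)
    return "\n".join(
        f'<form method="post" action="/comment" style="display:none">'
        f'<input type="hidden" name="update_id" value="{i}">'
        f'<input name="name"><textarea name="comment"></textarea>'
        f'</form>'
        for i in ids
    )


@dataclass
class CommentStore:
    """In-memory stand-in for the comments table and the decoy counter."""
    comments: list = field(default_factory=list)
    decoy_hits: int = 0

    def submit(self, update_id, name, comment):
        if update_id == DECOY_ID:
            # Decoy form: count it for curiosity's sake and store nothing.
            self.decoy_hits += 1
            return "rejected"
        self.comments.append((update_id, name, comment))
        return "stored"
```

Real visitors only ever submit the form attached to an actual update, so they never hit the decoy branch.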

Thus far it's actually working. I implemented it at around 10:30am today and right now the count is up to 18 and there is no new spam in any of the latest articles. So, where my content filtering failed, this simple solution has been quite successful in its short five-hour run. I know there are tons of other things I could implement, like a Bayesian filter, or utilize cookies or sessions, but this required only a few extra lines of PHP and a single optional MySQL table and seems to be working just as well. Now if they get the bright idea to scan pages for multiple <form> blocks I might be screwed and forced to do some actual work, but for now it's at least a temporary victory for me.

Nobody tell my secret to the spammer guys, okay?
SPAM! Fry it up with a side of scrambled eggs! =D

Sep. 30, 2006 (1:57am EST)

... or you could install Wordpress with the Spam Karma 2 plug-in. No spam at all for the whole summer. I also don't get any traffic... but still - you've been served, son!

Sep. 30, 2006 (11:42am EST)

#3 - Mike
yes but existing technology that I didn't make scares me :(

Sep. 30, 2006 (3:56pm EST)

Same with vaginas. :(

Oct. 1, 2006 (4:30am EST)

"most ISPs, as I'm sure most of you are aware, do not let you use an outgoing mail server that isn't their own"

This isn't true in my experience. Many of them, such as Comcast, block SMTP servers that are running inside their network, other than their own SMTP server. But I use several different SMTP servers that aren't Comcast's and they work just fine. I do have to use SMTP authentication in all cases, but they are all still on port 25.

Oct. 3, 2006 (3:51am EST)

Yeah this has been my experience as well. Usually one can only use an ISP's own SMTP server if they're a customer for authentication, but there's nothing really stopping you from using an SMTP server of your choice so long as you have authorized access to it.

Also hooray for custom built blog software and go team venture!

Oct. 3, 2006 (7:30pm EST)