
Contact Forms, Bots, and Shameless Spam

Anti-spam is a huge topic with email. People have been trying to keep their inboxes clean for years with everything from the annoying spamarrest tools to simple junk-mail filters with their uncanny knack for false positives. In fact, email spam became such a nuisance that the government even stepped in (imagine that) and wrote laws about it. But there’s one form of spam that’s still up to the vigilantes to catch… and I’m hoping you’ll help me.

It started as just an annoyance, but it grew into an all-out business interruption. Every form submission automatically triggers a lead in my CRM, which meant that for the past year or so my CRM has been full of “leads” that turned out to be SEO companies promising to get me on page 1 of Google. So, I set out to create a home-brew form handler that would begin to identify and eliminate these nasty spammers WITHOUT risk of a false positive. Because, after all, while I hate spam, I hate a lost opportunity even more.

To start out, I boiled the types of spam that I could feasibly eliminate down to two categories:

  1. The real human who submits erroneous data just to get to that cookie behind the form… the free trial, the free white paper, the drip marketing content, etc.
  2. The bots that crawl the web submitting affiliate links and erroneous data to any form they can find.

Also, as part of this little collaborative project (which you may join by commenting here), let me add the following parameter: NO CAPTCHA. I can’t read the dang things myself half the time, and there is reason to fear that CAPTCHA itself reduces lead conversion through difficulty alone.

So, the trick is to create a series of logical tests against which one can run the form submitted data that will positively identify spam a significant percentage of the time while almost never blocking legitimate leads.

Here’s where I’m at:

  1. Insert an input into the form with type="text" but style="display: none;". Bots will naturally inject a value into any text input field in an effort to bypass required-field checkers. So if this particular field is submitted with data in it, we can know with certainty that a human did not do it.
  2. Check for “asdf.”  Simple, I know, but a report of historic spam showed that this was a rather popular form of false submissions.  If the string asdf appears in any field, it’s spam.
  3. Check for repeating characters.  I tried and tried, but I could not think of a legitimate reason that any character should repeat more than 3 times in a name, company name, or address field.  If you can convince me otherwise, great.  As for now, “XXXX Consulting Company” will not become a lead for me.
  4. Check for identical strings.  Other than Tim Allen’s neighbor, Wilson Wilson, nobody I know has the same string value in all fields of a contact form.  If too many fields are identical, it’s spam.
  5. Finally, and this is key: check for URLs where they don’t belong. One of the most classic spam moves is to place a URL in a field where it has no business being. Outside of the text-area “message” box, a URL should not be used for one’s name, phone number, company name, or otherwise. If they try it, it’s spam.
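
Here is a minimal sketch of how those five tests might look in code. It is written in Python purely for illustration and is not the handler running behind our forms. The field names (`website_confirm`, `name`, `company`, `address`, `message`) and the "three identical fields" threshold are assumptions invented for the example.

```python
import re

# Hypothetical field names; adjust them to match the real form.
HONEYPOT_FIELD = "website_confirm"               # the hidden display:none input
FREE_TEXT_FIELDS = {"message"}                   # URLs are allowed here
REPEAT_CHECK_FIELDS = {"name", "company", "address"}
MAX_IDENTICAL_FIELDS = 3                         # assumed threshold for test 4

URL_PATTERN = re.compile(r"(https?://|www\.)", re.IGNORECASE)
REPEAT_PATTERN = re.compile(r"(\S)\1{3,}")       # same character 4+ times in a row


def spam_reasons(form):
    """Return the names of the tests a submission fails (empty list = looks clean)."""
    reasons = []
    # Every submitted field except the honeypot, with surrounding whitespace removed.
    fields = {k: v.strip() for k, v in form.items() if k != HONEYPOT_FIELD}
    short_fields = {k: v for k, v in fields.items() if k not in FREE_TEXT_FIELDS}

    # 1. The hidden field should always come back empty.
    if form.get(HONEYPOT_FIELD, "").strip():
        reasons.append("honeypot")

    # 2. "asdf" anywhere in any field.
    if any("asdf" in v.lower() for v in fields.values()):
        reasons.append("asdf")

    # 3. A character repeated more than 3 times in a name, company, or address.
    if any(REPEAT_PATTERN.search(v) for k, v in fields.items()
           if k in REPEAT_CHECK_FIELDS):
        reasons.append("repeated_chars")

    # 4. Too many fields carrying the same value.
    values = [v.lower() for v in short_fields.values() if v]
    if values and max(values.count(v) for v in set(values)) >= MAX_IDENTICAL_FIELDS:
        reasons.append("identical_fields")

    # 5. A URL anywhere outside the free-text message box.
    if any(URL_PATTERN.search(v) for v in short_fields.values()):
        reasons.append("url_in_field")

    return reasons
```

Returning the list of failed tests, rather than a bare yes/no, makes it easier to log why a submission was rejected and to spot a rule that is catching real people. With the threshold set at three, Wilson Wilson still gets through.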

These 5 logical tests have reduced spam submissions by well over 70% in the past month on our free contact form product. I would love to get that figure even higher. Most of the spam that still sneaks by is disreputable SEO offers. So, here’s the next challenge: can you come up with a list of key terms and a density threshold that would reasonably indicate a submission is talking about SEO? Of course, this might be a bad idea for the guys at SlingShot to implement on their site, but for the rest of us, it would fit.
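
To get the ball rolling, here is a rough sketch of what a keyword-density check could look like. Everything specific in it is a guess: the term list, the 8% threshold, and the 10-word minimum are placeholders that would need tuning against real submissions before anyone trusts them.

```python
import re

# Hypothetical term list and thresholds; tune both against real spam reports.
SEO_TERMS = {"seo", "backlinks", "ranking", "rankings", "serp",
             "keywords", "traffic", "google"}
DENSITY_THRESHOLD = 0.08   # flag when more than 8% of the words are SEO terms
MIN_WORDS = 10             # very short messages are too noisy to judge


def seo_density(message):
    """Fraction of the words in the message that appear in SEO_TERMS."""
    words = re.findall(r"[a-z0-9']+", message.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in SEO_TERMS)
    return hits / len(words)


def looks_like_seo_pitch(message):
    """True when the message is long enough to judge and dense with SEO terms."""
    word_count = len(re.findall(r"[a-z0-9']+", message.lower()))
    if word_count < MIN_WORDS:
        return False
    return seo_density(message) > DENSITY_THRESHOLD
```

The word-count floor is there to protect real leads: a three-word note like "Need SEO help" has a very high density but is left alone.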

Web developers unite: what else should be tested?

5 Comments

  1. 1

    I absolutely love the idea of adding a field with display:none. It’s ingenious! I wrote a post many moons ago about how terrible a technology Captcha is… it punishes the innocent and adds an additional, unnecessary step for users. It’s the antithesis of user experience. I may put your hidden field to the test!

  3. 3

    It does work really well, but if you roll it out on existing forms it may take a while for the effect to propagate. Bots often cache your form and post to it as they saw it weeks ago until they come back around and see it again. So, as long as they’re posting to your cached form, they’ll get through. Within about a month, you should start to see results.

  4. 4

    1. A timer;
    2. Hard to guess form field names;
    3. server-side form validation;
    4. a form field not expected to have a value;
    5. having JavaScript update a hidden field w/ a form submit;
    6. change form attributes on submit w/ JavaScript;

    #1 is my favorite. Start a timer as soon as the contact page (or any page) is loaded. On the server side, set a minimum time that filling out the form is expected to take. If the form is submitted too soon, the user sees a message/the account is disabled/the admin receives an email/etc. This one actually eliminates 99.9% of any type of bot activity.

    #2 store field names in a session and give the fields random names. Makes it hard for a bot to learn.

    #3 this one is important. Email can be verified very accurately w/ regular expressions, a phone number field is supposed to contain 10 digits, 2 or more fields w/ the same value = bot, etc.

    #4 is explained in your article; #5 and #6 are some script options.
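
For anyone wanting to try the timer from #1, one possible way to do it without server-side sessions is to put a signed timestamp in a hidden field when the page is rendered and check it when the form comes back. This is only a sketch under assumptions: the secret key, the 5-second minimum, and the function names are all made up for the example.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"   # hypothetical; keep the real key out of source control
MIN_SECONDS = 5             # assumed minimum time a human needs to fill the form


def _sign(timestamp_text):
    """HMAC-SHA256 signature so the timestamp cannot be forged client-side."""
    return hmac.new(SECRET_KEY, timestamp_text.encode(), hashlib.sha256).hexdigest()


def issue_token(now=None):
    """Create the value for a hidden field at the moment the form is rendered."""
    timestamp_text = str(int(time.time() if now is None else now))
    return f"{timestamp_text}:{_sign(timestamp_text)}"


def submitted_too_fast(token, now=None):
    """True if the token is malformed, tampered with, or came back too quickly."""
    try:
        timestamp_text, signature = token.split(":", 1)
        issued_at = int(timestamp_text)
    except ValueError:
        return True
    expected = _sign(timestamp_text)
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return True
    current = time.time() if now is None else now
    return current - issued_at < MIN_SECONDS
```

The `now` argument exists only so the check can be tried with fixed times; in production both calls would use the server clock. A maximum age could be added the same way to reject forms that bots cached weeks ago.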

  5. 5

    Thanks for the post, Nick. Appreciate the share.

    Martin – I think the timer is a great idea. I assume a bot would zip through it and the threshold would be somewhat low… maybe 5 seconds? I am just curious because of prefilled forms for actual users, as well as users who come back to the page and know immediately that they want to fill out the form. Just my two pennies. I know I’m about a year late on this post so I’m not expecting much of a reply, just putting it out there in hopes 🙂

    thanks again!

    -Dave
