Anti-spam is a huge topic in email. People have been trying to keep their inboxes clean for years with everything from the annoying Spam Arrest tools to simple junk-mail filters with their uncanny knack for false positives. In fact, email spam became such a nuisance that the government even stepped in (imagine that) and wrote laws about it. But there’s one form of spam that’s still up to the vigilantes to catch… and I’m hoping you’ll help me.
It started as a mere annoyance, but it grew into an all-out business interruption. Every form submission automatically creates a lead in my CRM, which meant that for the past year or so, I’ve had a heck of a lot of “leads” to sell to, namely SEO companies who can get me on page 1 of Google. So I set out to build a home-brew form handler that would identify and eliminate these nasty spammers WITHOUT risking false positives. Because, after all, while I hate spam, I hate a lost opportunity even more.
To start out, I boiled the types of spam that I could feasibly eliminate down to two categories:
- Real humans who submit bogus data just to get to the cookie behind the form… the free trial, the free white paper, the drip marketing content, etc.
- Bots that crawl the web submitting affiliate links and junk data to any form they can find.
Also, as part of this little collaborative project (which you can join by commenting here), let me add one constraint: NO CAPTCHA. I can’t read the dang things myself half the time, and there is reason to suspect that CAPTCHA reduces lead conversion through sheer difficulty alone.
So the trick is to create a series of logical tests to run against submitted form data that will positively identify spam a significant percentage of the time while almost never blocking legitimate leads.
Here’s where I’m at:
- Insert an input into the form with type="text" but style="display: none;". Bots will typically inject a value into every text input they find in an effort to get past required-field checks. A human never sees this field, however, so if it comes back with data in it, we can be confident that a human did not fill it in.
- Check for “asdf.” Simple, I know, but a review of our historical spam showed that this was a popular choice for fake submissions. If the string asdf appears in any field, it’s spam.
- Check for repeating characters. I tried and tried, but I could not think of a legitimate reason for any character to repeat more than 3 times in a row in a name, company name, or address field. If you can convince me otherwise, great. For now, “XXXX Consulting Company” will not become a lead for me.
- Check for identical strings. Other than Tim Allen’s neighbor, Wilson Wilson, nobody I know has the same string value in all fields of a contact form. If too many fields are identical, it’s spam.
- Finally, and this is key: check for URLs where they don’t belong. One of the most classic spam moves is to put a URL in a field where it doesn’t belong. Outside the “message” textarea, a URL has no business in one’s name, phone number, company name, or anything else. If they try it, it’s spam.
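The five tests above can be sketched as a single Python function that returns the names of the tests a submission fails. This is a minimal sketch, not the actual form handler described here: the field names `hp_website` (the hidden honeypot input) and `message`, the function name `spam_reasons`, the `max_identical` threshold, and the particular URL markers are all hypothetical choices for illustration.

```python
import re
from collections import Counter

# Hypothetical field names; rename them to match your own form.
HONEYPOT_FIELD = "hp_website"   # the hidden (display: none) text input
FREE_TEXT_FIELDS = {"message"}  # fields where URLs are allowed

# A letter (not a digit) repeated 4+ times in a row, e.g. "XXXX".
# Digits are excluded so street numbers like "10000" are not flagged.
REPEATED_LETTER = re.compile(r"([^\W\d_])\1{3,}")

# Simple URL markers (an assumption): a scheme, "www.", an HTML anchor, or BBCode [url].
URL_MARKER = re.compile(r"https?://|www\.|<a\s|\[url", re.IGNORECASE)


def spam_reasons(form: dict[str, str], max_identical: int = 2) -> list[str]:
    """Return the names of the tests this submission fails (empty list = looks legit)."""
    reasons = []

    # Test 1: honeypot. A human never sees this field, so it should be empty.
    if form.get(HONEYPOT_FIELD, "").strip():
        reasons.append("honeypot")

    # Every other field, plus the subset where URLs are not allowed.
    fields = {k: v.strip() for k, v in form.items() if k != HONEYPOT_FIELD}
    short_fields = {k: v for k, v in fields.items() if k not in FREE_TEXT_FIELDS}

    # Test 2: the keyboard-mash string "asdf" anywhere, case-insensitive.
    if any("asdf" in v.lower() for v in fields.values()):
        reasons.append("asdf")

    # Test 3: a run of one repeated letter in name/company/address-style fields.
    if any(REPEATED_LETTER.search(v.lower()) for v in short_fields.values()):
        reasons.append("repeated characters")

    # Test 4: the same non-empty value in more than `max_identical` fields.
    counts = Counter(v.lower() for v in fields.values() if v)
    if counts and max(counts.values()) > max_identical:
        reasons.append("identical fields")

    # Test 5: URLs outside the free-text message box.
    if any(URL_MARKER.search(v) for v in short_fields.values()):
        reasons.append("URL outside message")

    return reasons
```

With the default `max_identical=2`, a Wilson Wilson whose first and last names match still gets through; only a value repeated in three or more fields trips test 4. Returning the list of failed tests, rather than a bare true/false, makes it easy to log why a submission was rejected and audit the rejections for false positives. The URL check deliberately skips bare domains such as `example.com`, since a plain domain pattern would also match every email address.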
These five logical tests have cut spam submissions by well over 70% in the past month on our free contact form product. I would love to push that figure even higher. The largest share of the spam that still sneaks through is disreputable SEO offers. So here’s the next challenge: can you come up with a set of key terms and a density threshold that would reasonably indicate that a submission is pitching SEO? Of course, this might be a bad idea for the guys at SlingShot to implement on their site, but for the rest of us, it would fit.
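As one possible starting point for that challenge, here is a hedged Python sketch of a keyword-density check: it measures the share of words in a submission that belong to known SEO phrases. The term list `SEO_TERMS`, the 8% `threshold`, and the `min_words` floor are hypothetical values chosen for illustration, not tested figures; they would need tuning against your own spam archive.

```python
import re

TOKEN = re.compile(r"[a-z0-9]+")

# Hypothetical starter list of SEO-pitch phrases; extend it from your own spam.
SEO_TERMS = (
    "seo", "search engine optimization", "search engine", "page 1",
    "first page", "google", "ranking", "rankings", "backlinks",
    "link building", "organic traffic", "keywords",
)


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return TOKEN.findall(text.lower())


def seo_coverage(text: str, terms: tuple[str, ...] = SEO_TERMS) -> tuple[int, int]:
    """Return (number of words that belong to an SEO term, total number of words)."""
    words = tokenize(text)
    covered: set[int] = set()
    for term in terms:
        term_words = tokenize(term)
        n = len(term_words)
        # Slide the phrase across the text and mark every matching word position.
        for i in range(len(words) - n + 1):
            if words[i:i + n] == term_words:
                covered.update(range(i, i + n))
    return len(covered), len(words)


def looks_like_seo_pitch(text: str, threshold: float = 0.08, min_words: int = 3) -> bool:
    """Flag text whose SEO-term density meets `threshold` (hypothetical defaults)."""
    hits, total = seo_coverage(text)
    if total == 0:
        return False
    return hits >= min_words and hits / total >= threshold
```

Marking word positions in a set, rather than summing raw matches, keeps overlapping phrases such as “search engine” and “search engine optimization” from being counted twice. The `min_words` floor stops a short, legitimate message that mentions Google once from being judged on density alone.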
Web developers unite: what else should be tested?