Just as email spam has cemented itself as an unstoppable scourge in our daily digital existence, content scraping bots are polluting the web by slowing sites down, stealing content, and generally running amok. And until now, there hasn’t been much we could do about it.
Arlington, Virginia-based Distil believes it has the answer to bad bots: It has developed a Content Protection Network that detects bots and prevents web scraping. The company has been so successful that it recently surpassed 1 billion blocked bots since it started counting around seven months ago.
Bots affect virtually anyone who runs a website. They scrape content from digital publishers like VentureBeat and repurpose it on their own ad-filled sites. Bots can also be used to gather intelligence about your business, which can give your competitors an advantage. And bots increase latency and server load on your website, which means a worse experience for legitimate visitors (and higher bills on your end).
Distil doesn’t just block IP addresses, which is typically the first course of action for online security companies. Its secret sauce is in how it identifies bots.
Rami Essaid, Distil’s founder and chief executive, points out that IP addresses change frequently, so they’re not the best way to stop malicious people. Instead, the company uses a variety of approaches to discover bots, including behavioral analytics, session rate limiting, and a unique method of fingerprinting visitors.
“We look at the combination of your unique browser running on your OS, the settings of the browser and settings of your OS, and we combine that into something that’s unique about you,” Essaid explained in an interview with VentureBeat. “This is trackable on the server side — in the age where Do Not Track cookies are starting to be a bigger and bigger deal, something you can track server side gives us control over who we track and how. … We’ve got 2-3 years before people start saying we’re invading users’ privacy.”
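The fingerprinting approach Essaid describes can be sketched in a few lines. The following is a minimal, hypothetical illustration only: it hashes a handful of request headers that a server can observe into a stable identifier. Distil's actual signal set is proprietary and certainly far broader than these example headers.

```python
import hashlib

def fingerprint(headers: dict) -> str:
    """Combine server-observable request signals into one stable ID.

    The header names below are illustrative stand-ins; a real system
    would draw on many more browser and OS characteristics.
    """
    signals = [
        headers.get("User-Agent", ""),
        headers.get("Accept", ""),
        headers.get("Accept-Language", ""),
        headers.get("Accept-Encoding", ""),
    ]
    # Hash the joined signals so the ID is fixed-length and opaque.
    return hashlib.sha256("|".join(signals).encode("utf-8")).hexdigest()

# Two requests with the same header profile map to the same ID, so
# repeat visits can be correlated server-side without any cookie.
a = fingerprint({"User-Agent": "Mozilla/5.0", "Accept-Language": "en-US"})
b = fingerprint({"User-Agent": "Mozilla/5.0", "Accept-Language": "en-US"})
assert a == b
```

Because the identifier is computed on the server from signals the visitor sends anyway, it keeps working even when visitors disable cookies or enable Do Not Track, which is exactly the trade-off Essaid flags as a looming privacy question.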