Inc. has an interesting story about how startup Distil Networks is attempting to detect bots online. This passage was particularly illuminating:
To determine who on the internet is real and who is fake, Distil’s system studies a range of variables. Does the actor’s cursor wander around the page like a person’s would, for example, or does it make a beeline for its target? When it clicks a button, does it press right in the center every time? Do its web browsing patterns across other web sites resemble those of a real person? After quickly collecting that data, the system uses a series of algorithms, strengthened over time by machine learning, to make a determination as to whether the actor is real or a bot. (source: It’s 10 A.M. Do You Know How Many Bots Have Influenced You Today? | Inc.com: https://www.inc.com/kevin-j-ryan/distil-networks-fights-malicious-online-bots.html)
This seems like a more promising method to me than many of the “bot spotting” tools and techniques typically found online. These usually focus on Twitter, and in particular on an account’s tweeting patterns and profile characteristics: aspects of the profile (creation date, description, profile image, etc.) and of the tweets (e.g. rate of tweets, retweets vs. original tweets, etc.).
While these methods certainly do tag some actual bots, they also risk false positives. They can also be fooled by savvy bot creators who set up their profiles in a way that avoids characteristics known to trigger “bot spotters,” both human and algorithmic.
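To make both failure modes concrete, here is a minimal sketch of the kind of rule-based scoring such tools might apply to the profile and tweet characteristics listed above. It is not taken from any particular tool: the `Account` fields, the thresholds, and the two example accounts are all hypothetical, chosen only to illustrate the point.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical summary of a Twitter account's profile and tweet history."""
    age_days: int            # days since the account was created
    has_default_image: bool  # still using the default profile image
    has_description: bool    # profile description filled in
    tweets_per_day: float    # average posting rate
    retweet_ratio: float     # retweets / total tweets, from 0.0 to 1.0


def bot_score(account: Account) -> int:
    """Count how many illustrative 'bot-like' rules the account trips."""
    score = 0
    if account.age_days < 30:          # assumed threshold: very new account
        score += 1
    if account.has_default_image:
        score += 1
    if not account.has_description:
        score += 1
    if account.tweets_per_day > 50:    # assumed threshold: very high volume
        score += 1
    if account.retweet_ratio > 0.9:    # assumed threshold: almost all retweets
        score += 1
    return score


def looks_like_bot(account: Account, threshold: int = 2) -> bool:
    """Flag the account when it trips at least `threshold` rules."""
    return bot_score(account) >= threshold


# A real person who just joined and schedules a lot of tweets.
new_human_curator = Account(age_days=20, has_default_image=False,
                            has_description=True, tweets_per_day=80,
                            retweet_ratio=0.3)

# An automated account tuned to sit just under every threshold.
tuned_bot = Account(age_days=400, has_default_image=False,
                    has_description=True, tweets_per_day=45,
                    retweet_ratio=0.85)

print(looks_like_bot(new_human_curator))  # True: a false positive
print(looks_like_bot(tuned_bot))          # False: the bot slips through
```

Under these assumed rules, the human is flagged and the bot is not, which is exactly the weakness of relying on known profile characteristics alone.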
Of course, the other major drawback of much of the current “bot spotting” is its almost exclusive focus on Twitter. Bots are affecting other platforms now, and this will only increase in the future. One way this already happens, and will happen more often, is through browser automation, using tools like Selenium (for example, through its Python bindings) or even simple browser extensions like iMacros for Chrome. Being able to detect automated control of the browser, as Distil’s technology seems to do, will be crucial to spotting these kinds of bots.
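Two of the signals mentioned in the Inc. passage, whether the cursor makes a beeline for its target and whether clicks land dead center, are easy to express as numbers. The sketch below is only an illustration of that idea, not Distil’s actual method, which the article does not describe in detail. The function names, the sample cursor paths, and the button geometry are all assumptions.

```python
import math


def path_straightness(points):
    """Ratio of straight-line distance to distance actually travelled.

    `points` is a list of (x, y) cursor positions in order. A value near
    1.0 means the cursor went almost directly to its target; lower values
    mean it wandered.
    """
    if len(points) < 2:
        raise ValueError("need at least two cursor positions")
    travelled = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if travelled == 0:
        return 1.0  # the cursor never moved
    return math.dist(points[0], points[-1]) / travelled


def click_center_offset(click, box):
    """Distance of a click from the center of a button, scaled to 0..1.

    `box` is (left, top, width, height). 0.0 is the exact center and 1.0
    is a corner of the button.
    """
    left, top, width, height = box
    center = (left + width / 2, top + height / 2)
    half_diagonal = math.hypot(width / 2, height / 2)
    return math.dist(click, center) / half_diagonal


# Hypothetical cursor paths ending at the same point.
beeline_path = [(0, 0), (50, 50), (100, 100)]
wandering_path = [(0, 0), (30, 60), (80, 20), (60, 90), (100, 100)]

# Hypothetical button and two clicks on it.
button = (100, 200, 80, 40)
scripted_click = (140, 220)   # exactly in the center
human_click = (150, 228)      # slightly off center

print(path_straightness(beeline_path))
print(path_straightness(wandering_path))
print(click_center_offset(scripted_click, button))
print(click_center_offset(human_click, button))
```

A real system would presumably combine many such measurements across many page views rather than rely on any one of them, since a single straight path or centered click proves little by itself.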
Nonetheless, I still worry about false positives or what I think of as “bot shaming” or “cyborg shaming.” There are many legitimate uses on social media for bots and automation, including semi-automated tools that help individuals find, curate, and distribute more content than would otherwise have been possible. I think, for example, of my own use of Buffer to schedule tweets and even a simple Twitter bot built on a Google Spreadsheet to post links to books related to my area of research. Just this semester, one of my students wrote a simple iMacros script to automate the process of searching and saving snapshots of results on the TwXplorer tool. This is precisely the kind of activity that Distil’s tools would likely identify as bot behavior. But in this case, it is a creative use of automation by a student to improve his research efficiency for a class project.
At the end of the day, tools like those being developed by Distil can provide better means to detect bots. But ultimately, we will still need human judgement. And in applying that judgement, we should be cautious not to veer into bot hysteria.