Resurrection
09.11.10, 04:32
Easy PHP Blackhole Trap with WHOIS Lookup for Bad Bots (http://perishablepress.com/press/2010/07/14/blackhole-bad-bots/)
In five easy steps, you can set up your own Blackhole to trap bad bots and protect your site from evil scripts, bandwidth thieves, content scrapers, spammers, and other malicious behavior.
One of my favorite security measures here at Perishable Press (http://perishablepress.com/) is the site’s virtual Blackhole trap for bad bots. The concept is simple: include a hidden link to a robots.txt-forbidden directory somewhere on your pages. Bots that ignore or disobey your robots rules will crawl the link and fall into the trap, which then performs a WHOIS Lookup (https://ws.arin.net/whois/) and records the event in the blackhole data file. Once added to that file, bad bots are immediately denied access to your site. I call it the “one-strike” rule: bots have one chance to follow the robots.txt protocol (http://www.robotstxt.org/), check the site’s robots.txt file, and obey its directives. Failure to comply results in immediate banishment. The best part is that the Blackhole only affects bad bots: normal users never see the hidden link, and good bots obey the robots rules in the first place.
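To make the "one-strike" logic concrete, here is a minimal sketch of the idea in Python. It is not the Perishable Press script (which is PHP and is not reproduced in this post). The trap path /blackhole/, the robots.txt rules, and the names trap_is_forbidden and Blackhole are assumptions made up for illustration. The WHOIS lookup is left out, and an in-memory set stands in for the data file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical trap directory and robots rules (assumptions, not from the article).
TRAP_PATH = "/blackhole/"
ROBOTS_TXT = """\
User-agent: *
Disallow: /blackhole/
"""


def trap_is_forbidden(robots_txt, path, user_agent="GoodBot"):
    """Return True if a rule-abiding bot must not fetch the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, path)


class Blackhole:
    """One-strike trap: any client that requests the trap path is banned."""

    def __init__(self, trap_path=TRAP_PATH):
        self.trap_path = trap_path
        self.blacklist = set()  # in-memory stand-in for the data file

    def handle_request(self, ip, path):
        """Return an HTTP-style status code for a request from this IP."""
        if ip in self.blacklist:
            return 403  # already trapped: deny everything
        if path.startswith(self.trap_path):
            self.blacklist.add(ip)  # the one strike: record and ban
            return 403
        return 200  # normal visitor or well-behaved bot


if __name__ == "__main__":
    hole = Blackhole()
    print(trap_is_forbidden(ROBOTS_TXT, TRAP_PATH))
    print(hole.handle_request("203.0.113.5", "/index.html"))
    print(hole.handle_request("203.0.113.5", "/blackhole/"))
    print(hole.handle_request("203.0.113.5", "/index.html"))
```

A bot that honors robots.txt never requests the trap path, so it never lands in the blacklist. A bot that ignores the rules is denied from its first trap request onward, and other visitors are unaffected.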
The Perishable Press Blackhole is released under the GNU General Public License. Check the Creative Commons for a summary and/or see the Blackhole source code for additional information. Also note that by downloading the Blackhole, you agree to accept full responsibility for its use. In no way shall the author be held accountable for anything that happens after the file has been downloaded.
Perishable Press Blackhole for Bad Bots (http://perishablepress.com/press/download-manager.php?id=62)