- https://github.com/DannyBen/webcache - hassle-free caching for HTTP download
- https://github.com/DannyBen/lightly - a file cache for performing heavy tasks, lightly
- https://github.com/gurgeous/sinew - a ruby DSL for structured web crawling, with a robust caching system
- Gopher ? => Webgo e.g. Webgo.get - Why? Why not?
Well-known crawlers (and user agent strings):
- Googlebot by Google
- Bingbot by Microsoft
- Slurp by Yahoo!
- ??
- more: http://www.robotstxt.org/db.html
User-agent: *
Crawl-Delay: 20
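
The two lines above are a robots.txt fragment asking every crawler (`*`) to wait 20 seconds between requests. `Crawl-delay` is a widely seen but non-standard directive, and not every crawler honors it. As a rough sketch of how a polite crawler might read it, here is a small Ruby helper. The name `crawl_delay` and its matching rules (case-insensitive fields, substring match on the user agent, fallback to `*`) are assumptions made for illustration; it is not part of webcache, lightly, or sinew.

```ruby
# Hypothetical helper, for illustration only.
# Returns the Crawl-delay (in seconds) that applies to user_agent, or nil.
def crawl_delay(robots_txt, user_agent = "*")
  delays   = {}    # downcased user-agent token => delay in seconds
  agents   = []    # user agents of the group currently being read
  in_rules = false # true once the current group has seen a rule line

  robots_txt.each_line do |line|
    line = line.sub(/#.*/, "").strip          # drop comments and whitespace
    next if line.empty?

    field, value = line.split(":", 2).map(&:strip)
    next if value.nil?                        # not a "field: value" line

    case field.downcase
    when "user-agent"
      # A User-agent line after rule lines starts a new group.
      if in_rules
        agents = []
        in_rules = false
      end
      agents << value.downcase
    when "crawl-delay"
      in_rules = true
      delay = Float(value, exception: false)  # nil if not a number
      agents.each { |agent| delays[agent] ||= delay } if delay
    else
      in_rules = true                         # Allow, Disallow, Sitemap, ...
    end
  end

  # Prefer a group that names this crawler; otherwise fall back to "*".
  ua = user_agent.downcase
  match = delays.find { |token, _| token != "*" && ua.include?(token) }
  match ? match[1] : delays["*"]
end

robots = "User-agent: *\nCrawl-Delay: 20\n"
delay  = crawl_delay(robots, "Webgo")         # "Webgo" is a made-up agent name
puts "wait #{delay} seconds between requests" if delay
```

A crawler could then call `sleep(delay)` between requests to the same host when a delay is present.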