Thursday, August 24, 2006

HTTP Spider

HTTP Spider has been added to my catalogue of Kamaelia examples.
Given an initial URL it downloads that page and then, by a breadth-first search, all pages that are linked to by downloaded pages. It can be limited to only download pages with a given prefix.
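
To make the breadth-first idea concrete, here is a minimal sketch in plain Python using only the standard library. It is not the Kamaelia example itself and uses no Kamaelia components. The names LinkParser, fetch_page and spider, and the max_pages limit, are assumptions made for illustration. Like the example, it saves nothing to disc and ignores robots.txt.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_page(url):
    """Download a page and return its body as text (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def spider(start_url, prefix="", fetch=fetch_page, max_pages=50):
    """Breadth-first crawl from start_url; returns URLs in visit order.

    Only links starting with `prefix` are followed. `max_pages` is an
    assumed safety limit, not something the original example specifies.
    """
    queue = deque([start_url])   # FIFO queue gives breadth-first order
    seen = {start_url}           # every URL ever queued, to avoid repeats
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except (OSError, ValueError):
            continue  # unreachable or malformed URL: skip it
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            link, _fragment = urldefrag(urljoin(url, href))
            if link.startswith(prefix) and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling spider("http://example.com/", prefix="http://example.com/") would crawl only pages under that (placeholder) site. Passing a different fetch function makes it possible to try the traversal without touching the network.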

This example is fairly basic: it neither saves pages to disc nor honours robots.txt files. It does demonstrate the principle, though, and could fairly easily be adapted to look for 404s or to index a website for searching.

HTTP Spider


