Re-writing the spiders 08 Nov 03

I have been very busy lately re-writing the spiders for the search engine. I have decided to write up what I did to build the spider in the vain hope that someone may find it useful one day. I digressed several times and had some fun writing a recursive one but I eventually settled on writing an iterative robot that uses Postgres to store the links. This was partly due to already having a database with several million links in it already. Please see the link above for more details. I have also managed to download a few thousand documents for the search engine, hence the increase in the links found, this was caused by me parsing the documents that I had found when experimenting with the new robots .
85.0 Million links found

Leave a Reply

Your email address will not be published.