I want to extend this project to make it a bit more interesting. I was thinking the other night: what if I could run it like distributed.net? We could get a lot more done. My proposal is for members to volunteer to run their own link harvesters and to upload the results to a central repository after indexing. I intend to purchase some more RAM and some big IDE drives (unless someone wants to donate some for this project, beg beg).
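For anyone wondering what "upload after indexing" might look like in practice, here is a rough Perl sketch of the volunteer side: dump whatever the local harvester has indexed into a flat file ready to ship to the central repository. The database name, table, and column are guesses of mine for illustration, not the actual schema.

#!/usr/bin/perl
# Sketch only: dump locally harvested links to a flat file for upload.
# The database, table and column names are assumptions, not the real schema.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=harvester', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

my $sth = $dbh->prepare('SELECT url FROM links_found');
$sth->execute;

open my $out, '>', 'links_found.dump' or die "open: $!";
while (my ($url) = $sth->fetchrow_array) {
    print {$out} "$url\n";
}
close $out;
$dbh->disconnect;
# The dump could then be gzipped and uploaded to the central box.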
As far as I am aware it's not the bandwidth or the harvesting that is costly, it's the actual searching, so any distributed search engine would need to be able to search across the whole distributed network. This would probably require some standardisation, i.e. some sort of search data exchange protocol that keeps the calculation at the front end cheap.
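To make that concrete, here is a minimal Perl sketch of how a front end could merge pre-scored hits coming back from several nodes. The tab-separated format, the field names and the functions are all invented for illustration; the point is only that if every node returns hits already scored in a common format, the front-end work reduces to a cheap merge.

#!/usr/bin/perl
# Sketch of a hypothetical exchange format: each node answers a query with
# one tab-separated line per hit (url, score, snippet), and the front end
# merges the streams. Everything here is invented for illustration.
use strict;
use warnings;

# Turn one line of the exchange format into a hash ref.
sub parse_hit {
    my ($line) = @_;
    chomp $line;
    my ($url, $score, $snippet) = split /\t/, $line, 3;
    return { url => $url, score => $score, snippet => $snippet };
}

# Merge result sets from several nodes and keep the best $n by score.
sub merge_hits {
    my ($n, @sets) = @_;
    my @all = sort { $b->{score} <=> $a->{score} } map { @$_ } @sets;
    $n = @all if $n > @all;
    return @all[0 .. $n - 1];
}

# Fake responses from two harvester nodes.
my @node_a = map { parse_hit($_) } (
    "http://example.org/\t0.9\tExample home page",
    "http://example.org/faq\t0.4\tFrequently asked questions",
);
my @node_b = map { parse_hit($_) } (
    "http://example.net/\t0.7\tAnother site entirely",
);

for my $hit (merge_hits(2, \@node_a, \@node_b)) {
    print "$hit->{score}\t$hit->{url}\t$hit->{snippet}\n";
}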
Does anyone want to volunteer for some harvesting? I can provide all the source and directions on how to get started. I would prefer people with some knowledge of Postgres and Perl, and a dial-up connection is probably not much use either. You can contact me at harry[ at ]hjackson[ dot ]org. If we got enough members we could even start thinking about building a distributed search engine for a laugh.
Anyway, I am off to have a few beers in Portsmouth at a birthday party, so the robots are going off for a while. Enjoy the rest of the weekend.
Current counts:
links_found: 27M
home_page: 4.4M