Search engine restarted

Well, I have managed to get my PC back in action and now have a decent-sized disk (SATA 160GB) installed, so I started the robots again. They have been running for a couple of days now and so far I have collected 100,000 pages and dumped them on disk. This is on a 600k connection and it’s not running all the time. My target for testing is 1 million pages, so I should have these by the end of May, and then the robots will be tamed a bit.
Once the pages are down I then need to figure out how I’m going to represent the documents on disk. There are various methods for this but I am intending to emulate an already popular search engine 😉 or at least do it the way they started and figure it out as I go along.
I intend to use C++ to do all the document parsing etc. This choice was made simply because I have not got time to roll my own binary trees etc. or to learn a new library. I am fairly familiar with the STL so I will work with it.

Silenx 14dB Power Supply

I received the Silenx 450 Watt 14 dB power supply today. It was a piece of cake to install. Just comparing weights between this supply and my old ones was a good indication that it is a lot better.
It is certainly quieter than my old one but there is still a lot I could do to make the PC even quieter. For a start the fans are blowing against crap old grills that not only restrict the air but also cause turbulence, which makes a lot of the noise. I also have one really noisy 20GB IDE disk which will be getting yanked at some stage and replaced with something a bit more sociable.

Kernel Panic Unable To Mount root fs

To say that SATA on Linux is a pain in the ass is a bit of an understatement. The 2.4 kernel has limited support for SATA, which is why I was very careful with what motherboard I bought. Paul Nasrat checked with some of the Fedora guys which chipsets were supported, and the Promise PDC20378 and VIA VT8237 on the MSI and Asus motherboards are supported.
I spent quite a while trying to get a 2.6.4 kernel working on my machine. Lucky for me the netinst CD is also a recovery CD.
Anyway, to save some poor person like me spending a long time wondering why their 2.6.4 kernel won’t work, here is a pointer on a problem that stumped me for a wee while.
I am no guru with the Linux kernel so some of this might very well be nonsense.
If you have compiled a kernel and are getting an error as follows:
VFS: Cannot open root device “342” or unknown-block(3,66)
Please append a correct “root=” boot option
Kernel Panic VFS: Unable to mount root fs on unknown-block(3,2)
here are a few things to try.
First make sure that the file system that you are trying to use, i.e. ext2, ext3, reiserfs or whatever, is NOT being compiled as a module. To do this you need to locate the “.config” file that is created when you compile your kernel. The normal place for this is
/boot/config-2.6.4
or something like that.
In this file there are lots of options that determine whether the kernel will have each driver built in or as a loadable module. If you think about it, some drivers need to be in the kernel in order for the kernel to boot and read off the filesystem, i.e. your filesystem drivers (in my case EXT2). Make sure these are configured so that they are built in to the kernel.
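A quick way to check this is to look the option up in the config file. The following is only a rough sketch: fs_build_mode is a name I made up for illustration, and it assumes the usual config layout where a built-in driver reads CONFIG_EXT2_FS=y, a module reads CONFIG_EXT2_FS=m, and a disabled option is commented out.

```shell
#!/bin/sh
# Report how a kernel option is configured: built-in, module, or not set.
# fs_build_mode is a hypothetical helper, not part of any kernel tooling.
# Usage: fs_build_mode CONFIG_FILE OPTION
fs_build_mode() {
    config_file=$1
    option=$2
    # Pull out whatever follows "OPTION=" on the first matching line.
    value=$(sed -n "s/^${option}=\(.*\)$/\1/p" "$config_file" | head -n 1)
    case "$value" in
        y) echo "built-in" ;;
        m) echo "module" ;;
        *) echo "not set" ;;
    esac
}

# Example call (the path is the one mentioned above and may differ for you):
# fs_build_mode /boot/config-2.6.4 CONFIG_EXT2_FS
```

If this reports "module" or "not set" for the filesystem your root partition uses, that is the thing to change before rebuilding.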
Another problem that I had was to do with using make-kpkg to build my kernel.
Most people who use make-kpkg finish building their kernel with
make-kpkg kernel_image
When I did this I got the same error as above. To fix this you can do the following
make-kpkg --initrd kernel_image
This will create an initrd image in the /boot/ partition, which you then need to link like
/initrd.img -> /boot/initrd.img-2.6.4
the command to create soft links is
cd /
ln -s /boot/initrd.img-2.6.4 initrd.img
or something similar. This solved the above problem for me.
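As an aside on the error message itself: the “342” is, as far as I can tell, the root device number in hex, and the pair in brackets is that same number split into major and minor. Here is a minimal sketch of the arithmetic, assuming the old 16-bit layout with the major number in the high byte and the minor in the low byte. decode_root_dev is just a name I made up for illustration.

```shell
#!/bin/sh
# Split an old-style 16-bit device number, given as a hex string,
# into "major,minor". decode_root_dev is a hypothetical helper.
decode_root_dev() {
    dev=$((0x$1))
    # High byte is the major number, low byte is the minor number.
    echo "$((dev >> 8)),$((dev & 0xff))"
}

decode_root_dev 342   # prints 3,66
```

If I read the kernel device list right, major 3 is the first IDE channel and minor 66 is the second partition on the slave disk (hdb2), so it is worth checking that this really is where your root filesystem lives.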

AMD64 and MSI K8T

Well, I got the AMD64 today and got it all put together with a few little mishaps. I am using an MSI K8T Neo motherboard, which was bought simply because I couldn’t wait 5 days for the Asus, which I would have preferred simply because it can take an extra gig of RAM over the MSI. I also had a problem with the power supply that was in the tower. Who would have thought that an ATX power supply wouldn’t have the proper 4 pin connector required to power the chip? I ripped Jen’s power supply out and used it. I can now safely say that an AMD64 with one SATA 160GB disk, two 20GB hard disks, a CD player and a 32MB nvidia video card will run on a 300 Watt power supply, or at least it will on mine. I will be ordering a bigger, quieter one but more on that later.
I used an old netinst CD that I had lying around to put woody on and then immediately upgraded to sarge. All this went with very little mishap and so far the hardware has been running fine. All I need now is SATA support; I’ll do that tomorrow.

fvprotect.exe

I found this on the Windows PC today. AVG made a valiant attempt at cleaning it off the PC but I had to resort to editing the registry and various files to actually remove the buggers. This virus scans various files on your hard drive looking for email addresses and uses the addresses it finds to email itself. This is how it spreads.
If you get this virus make sure you kill the fvprotect.exe process before wiping files or they will just come back. To do this you need to open the task manager and look for the process with the same name. Kill it, then go about removing infected files.
To edit the registry click on “Start”, select “Run” and type “regedit”, then use the menus at the top to do a search for “fvprotect.exe”. I deleted all entries found.
This virus managed to get onto the PC because the AVG virus database was not up to date. I hardly ever use the Windows PC so wasn’t really paying attention to it very much. I recommend updating the database at least every day.
Even when logged in as the administrator the system refused to delete the offending files. This is a cock up on Microsoft’s part that they allow this behavior. If you are in as the administrator you should be allowed to do whatever the hell you like.
NEW
Since I seem to be getting some hits about fvprotect I decided to try and provide some links for people who want to remove the virus from their machine. So:
For immediate help on removing fvprotect do the following (this is from memory, I normally use Linux):
Terminate the FVPROTECT.EXE process using Windows Task Manager. This can be done by right-clicking the mouse over the bottom bar on your desktop and selecting the Task Manager. Then select the Processes tab and sort on name. When you see fvprotect.exe or something very similar, highlight it by clicking on it and select “End Process”. This will terminate the fvprotect process. If you try to remove the files first the fvprotect process will just recreate them, so YOU MUST KILL FVPROTECT FIRST.
Delete the following files from your Windows directory (typically c:\windows or c:\winnt):
* FVPROTECT.EXE
* USERCONFIG9X.DLL
* BASE64.TMP
* ZIP1.TMP
* ZIP2.TMP
* ZIP3.TMP
* ZIPPED.TMP
File names could be in UPPER or lower case or any combination, so check for this. The worm will also have deposited lots of files on your disk, most of which will have pornographic names. You must either remove these manually or update the virus scanner and then let it remove them.
Edit the registry:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run
Delete the entry “Norton Anti virus AV”, which points to %WinDir%\FVProtect.exe.
I recommend searching the registry for all occurrences of the word “fvprotect” and removing them.
For more information visit.
Sophos website:
General information about fvprotect and netskyb
Removing fvprotect.exe and netskyb

Tie::RDBM::Cached failing tests

Bugger, damn and blast. It was such an obvious mistake. Tie::RDBM::Cached is failing tests due to dependency problems. As soon as I have a working PC I will fix that and wait for the next one 😉

My PC Gave up the Ghost

I am not a happy bunny. Although the death of my PC (which was damaged by faulty products from www.scan.co.uk ) was on the cards, I was hoping it would hold out a little longer, or at least until I could get some cash saved up for the hardware that I want.
I would really have liked a dual Opteron system but the price is way out of my league at the moment. If you are wondering why I want a faster system, it’s because of my search engine hobby.
The database is several gigabytes in size and growing very rapidly. This means that backups are starting to take a long time to complete due to the lack of grunt in my current XP1700. It takes about 1 hour to gzip the database up. Don’t ask me about bzip2, I gave up waiting. I also need more grunt when actually trying to create the vector space and get some results back. It’s fast enough now but this is only for a few thousand documents, and I am aiming for the magic one million. Parsing the files is also exceedingly slow (I know I should just do it in C) and more grunt would also help with this.
I am probably going to go for a single AMD64 and hope that it will last for several months before I need more power at which point I may be able to afford a dual system. I am also going to try and do more of the grunt work in C for obvious reasons.

Google Adsense

I have decided to add some Google AdSense ads to my Uklug website. I have several reasons for doing this, not least of which is trying to make a little money to pay for the hosting of the site etc.
Google AdSense has a filtering facility with which I could, if I wanted, block ads that are in direct competition with me or ads I do not agree with. In keeping with Google’s ethic for their own search results, i.e. they do not filter their competitors out, I will not block any ads on competition grounds, so you should see plenty of ads for other job sites in my Google ads.
Happy Hunting!