I spoke to someone today who expressed an interest in the distributed search engine. I am hoping that he will be able to take on some of the work, which would speed the project up and give us a total bandwidth of 1.2Mb. Although small, that would double my current spidering capability. I may also have piqued someone else’s interest but I need to chase that lead up, so it’s all good fun cultivating people who are a bit bananas.
Word Parser in C++
I have been writing a word parser in C++ for a while now and noticed two things that had quite a distinct effect on performance. I was storing the words in a “map”, which is a standard associative container. I knew I should have been using a hash_map, but when writing the parser I only had access to a minimal Borland library on a Windows box, so hash_map was out of the question.
When I got home and back on the Linux box, however, I had a poke around and discovered “ext/hash_map”, which is not officially part of the C++ standard but is widely used and so, in my eyes, OK.
This change gave a marked improvement once the word list grew to just over a few thousand words. With a word list in excess of a few hundred thousand words, the improvement in performance was vast.
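To illustrate the map versus hash map point, here is a minimal sketch of a word counter. It is not the parser itself. It assumes a C++11 or later compiler and uses std::unordered_map, the standardised hash container, in place of ext/hash_map. The count_words name and the definition of a word as a run of ASCII letters are made up for the example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>

// Count how often each word occurs in `text`.
// Assumption for this sketch: a word is a run of ASCII letters, lower-cased.
std::unordered_map<std::string, std::size_t> count_words(const std::string& text) {
    std::unordered_map<std::string, std::size_t> counts;
    std::string word;
    for (unsigned char c : text) {
        if (std::isalpha(c)) {
            word += static_cast<char>(std::tolower(c));
        } else if (!word.empty()) {
            ++counts[word];  // hashed lookup: average O(1), versus O(log n) for std::map
            word.clear();
        }
    }
    if (!word.empty()) {
        ++counts[word];  // flush the last word if the text did not end on a separator
    }
    return counts;
}
```

Swapping std::unordered_map for std::map in the return type and the local variable is enough to compare the two containers on the same input.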
Another improvement was in the way I was reading files into a single string object and then parsing the string for its words. I was originally using “getline” and appending to the string, but this is slow even if you reserve space for the string.
If you are looking for a faster, more elegant way to read a file into a string, look no further.
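A minimal sketch of one common approach, which may or may not be the one meant above: copy the file's stream buffer into a string stream in a single operation instead of looping over getline. The read_stream and read_file names are made up for the example.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Copy everything remaining in `in` into one std::string.
std::string read_stream(std::istream& in) {
    std::ostringstream buffer;
    buffer << in.rdbuf();  // bulk copy of the underlying stream buffer
    return buffer.str();
}

// Read the whole file at `path` into one std::string.
// Throws std::runtime_error if the file cannot be opened.
std::string read_file(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    return read_stream(in);
}
```

Opening the file in binary mode avoids newline translation on platforms that do it, so the string holds the bytes exactly as they are on disk.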
Text Parser
While on holiday I wrote a simple parser in C++ that will be used to create the word list for the lexicon. It now needs to be tested to see if it’s fast enough for use. It might just turn out that my C++ is a bit rusty and it’s a pile of crap. Nonetheless it’s pretty close to working.
I will post some times etc whenever I get them.
Search engine restarted
Well, I have managed to get my PC back in action and now have a decent sized disk (SATA 160Gb) installed, so I started the robots again. They have been running for a couple of days now and so far I have collected 100,000 pages and dumped them on disk. This is on a 600k connection and it’s not running all the time. My target for testing is 1 million pages, so I should have these by the end of May, and then the robots will be tamed a bit.
Once the pages are down I then need to figure out how I’m going to represent the documents on disk. There are various methods for this but I am intending to emulate an already popular search engine 😉 or at least do it the way they started and figure it out as I go along.
I intend to use C++ to do all the document parsing etc. This choice was made simply because I have not got time to roll my own binary trees etc or to learn a new library. I am fairly familiar with the STL so I will work with it.
Kernel Oops
I got a series of kernel oops when using SATA on an MSI K8T Neo motherboard. I have written a short article on what I did to get it working. Luckily for me it turned out to be a solvable problem or at least it hasn’t oopsed on me since.
Silenx 14dB Power Supply
I received the Silenx 450 Watt 14 dB power supply today. It was a piece of cake to install. Just comparing weights between this supply and my old ones was a good indication that it is a lot better.
It is certainly quieter than my old one but there is still a lot I could do to make the PC even quieter. For a start the fans are blowing against crap old grills that not only restrict the air but also cause turbulence, which makes a lot of the noise. I also have one really noisy 20Gb IDE disk which will be getting yanked at some stage and replaced with something a bit more sociable.
Kernel Panic Unable To Mount root fs
To say that SATA on Linux is a pain in the ass is a bit of an understatement. The 2.4 kernel has limited support for SATA, which is why I was very careful with what motherboard I bought. Paul Nasrat checked with some of the Fedora guys which chipsets were supported, and the Promise PDC20378 and VIA VT8237 on the MSI and Asus motherboards are supported.
I spent quite a while trying to get a 2.6.4 kernel working on my machine. Lucky for me the netinst CD is also a recovery CD.
Anyway, to save some poor person like me spending a long time wondering why their 2.6.4 kernel won’t work, here is a pointer on a problem that stumped me for a wee while.
I am no guru with the Linux kernel so some of this might very well be nonsense.
If you have compiled a kernel and are getting an error as follows
VFS: Cannot open root device “342” or unknown-block(3,66)
Please append a correct “root=” boot option
Kernel Panic VFS: Unable to mount root fs on unknown-block(3,2)
here are a few things to try.
First make sure that the file system that you are trying to use, i.e. ext2, ext3, reiserfs or whatever, is NOT being compiled as a module. To do this you need to locate the “.config” file that is used when you compile your kernel. The normal place for this is
/boot/config-2.6.4
or something like that.
In this file there are lots of options that determine whether each driver is built in to the kernel or built as a loadable module. If you think about it, some drivers need to be in the kernel in order for the kernel to boot and read off the filesystem, i.e. your filesystem drivers, in my case EXT2. Make sure these are configured so that they are built in to the kernel.
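The check above can be sketched as a small C++ helper. In a kernel config file an option set to “=y” is built in and one set to “=m” is a module, so the helper reports which of the options you care about are set to “=m”. The built_as_module name is made up for the example, and the sketch assumes the config contents have already been read into a string.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Return the options from `wanted` that `config_text` sets to "=m" (module).
// Anything set to "=y" (built in) or left unset is not reported.
std::vector<std::string> built_as_module(const std::string& config_text,
                                         const std::vector<std::string>& wanted) {
    std::vector<std::string> modules;
    std::istringstream config(config_text);
    std::string line;
    while (std::getline(config, line)) {
        for (const std::string& name : wanted) {
            if (line == name + "=m") {
                modules.push_back(name);
            }
        }
    }
    return modules;
}
```

For the filesystems mentioned above the options to pass in would be CONFIG_EXT2_FS, CONFIG_EXT3_FS and CONFIG_REISERFS_FS. If the one for your root filesystem comes back in the result, change it to “=y” and rebuild.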
Another problem that I had was to do with using make-kpkg to build my kernel.
Most people using make-kpkg finish building their kernel with
make-kpkg kernel_image
When I did this I got the same error as above. To fix this you can do the following
make-kpkg --initrd kernel_image
This will create an initrd image in the /boot/ partition which you then need to link like
/initrd.img -> boot/initrd.img-2.6.4
The command to create soft links is
cd /
ln -s /boot/initrd-kernelversion initrd.img
or something similar. This solved the above problem for me.
AMD64 and MSI K8T
Well, I got the AMD64 today and got it all put together with a few little mishaps. I am using an MSI K8T Neo motherboard, which was bought simply because I couldn’t wait 5 days for the Asus, which I would have preferred simply because it can take an extra gig of RAM over the MSI. I also had a problem with the power supply that was in the tower. Who would have thought that an ATX power supply wouldn’t have the proper 4 pin connector required to power the chip? I ripped Jen’s power supply out and used it. I can now safely say that an AMD64 with one SATA 160Gb disk, two 20Gb hard disks, a CD player and a 32Mb nvidia video card will run on a 300 Watt power supply, or at least it will on mine. I will be ordering a bigger, quieter one but more on that later.
I used an old netinst CD that I had lying around to put woody on and then immediately upgraded to sarge. All this went with very little mishap and so far the hardware has been running fine. All I need now is SATA support; I’ll do that tomorrow.
fvprotect.exe
I found this on the Windows PC today. AVG made a valiant attempt at cleaning it off the PC but I had to resort to editing the registry and various files to actually remove the buggers. This virus scans various files on your hard drive looking for email addresses and uses the addresses it finds to email itself. This is how it spreads.
If you get this virus make sure you kill the fvprotect.exe process before wiping files or they will just come back. To do this you need to open the task manager and look for the process with the same name. Kill it, then go about removing infected files.
To edit the registry click on “Start”, select “Run” and type “regedit”, then use the menus at the top to do a search for “fvprotect.exe”. I deleted all entries found.
This virus managed to get onto the PC because the AVG virus database was not up to date. I hardly ever use the Windows PC so wasn’t really paying attention to it very much. I recommend updating the database at least every day.
Even when logged in as the administrator the system refused to delete the offending files. This is a cock up on Microsoft’s part that they allow this behavior. If you are in as the administrator you should be allowed to do whatever the hell you like.
NEW
Since I seem to be getting some hits about fvprotect I decided to try and provide some links for people who want to remove the virus from their machine. So:
For immediate help on removing fvprotect do the following (this is from memory; I normally use Linux):
Terminate the FVPROTECT.EXE process using Windows Task Manager. This can be done by “Right Clicking” the mouse over the bottom bar on your desktop and selecting the Task Manager. Then Select the Processes Tab and sort on name. When you see fvprotect.exe or something very similar then highlight it by clicking on it and select “End Now”. This will terminate the fvprotect process. If you try and remove the files first the fvprotect process will just recreate them so YOU MUST KILL FVPROTECT FIRST.
Delete the following files from your Windows directory (typically c:\windows or c:\winnt):
* FVPROTECT.EXE
* USERCONFIG9X.DLL
* BASE64.TMP
* ZIP1.TMP
* ZIP2.TMP
* ZIP3.TMP
* ZIPPED.TMP
Files could be in UPPER or lower case or any combination so check for this. The worm will have deposited lots of files on your disk most of which will have pornographic names. You must either remove these manually or have the virus scanner updated and then let it remove them.
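Because the case can vary, any check against the list above has to ignore case. A minimal sketch of such a check follows. The is_worm_file and to_upper names are made up for the example, and it only covers the seven file names listed above, not the other files the worm drops.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cctype>
#include <string>

// Upper-case an ASCII string so names can be compared without regard to case.
std::string to_upper(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}

// True if `filename` matches one of the listed file names, ignoring case.
bool is_worm_file(const std::string& filename) {
    static const std::array<const char*, 7> names = {{
        "FVPROTECT.EXE", "USERCONFIG9X.DLL", "BASE64.TMP",
        "ZIP1.TMP", "ZIP2.TMP", "ZIP3.TMP", "ZIPPED.TMP"
    }};
    const std::string upper = to_upper(filename);
    return std::any_of(names.begin(), names.end(),
                       [&upper](const char* name) { return upper == name; });
}
```

Run over the file names in the Windows directory, this would flag “fvprotect.exe”, “FvProtect.Exe” and “FVPROTECT.EXE” alike.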
Edit the registry:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run
Delete the entry “Norton Anti virus AV” that points to %WinDir%\FVProtect.exe
I recommend searching the registry for all occurrences of the word “fvprotect” and removing them from the registry.
For more information visit.
Sophos website:
General information about fvprotect and netskyb
Removing fvprotect.exe and netskyb
Tie::RDBM::Cached failing tests
Bugger, damn and blast. It was such an obvious mistake. Tie::RDBM::Cached is failing tests due to dependency problems. As soon as I have a working PC I will fix that and wait for the next one 😉