SCSI or SATA

There seems to be a bit of confusion for some people as to what they should be using when it comes to hard drives.
There are those who will disagree with this and say they use SATA in a server or SCSI at home. If you are one of them then you probably know what you are doing, so reading this is a bit moot.
If you are a home user then use SATA. Spending megabucks on SCSI would be a waste of your money and you are quite unlikely to see the benefits of it over SATA. SATA is fast enough for everything I have needed at home, and more.
If you are running a server that is on 24/7, you expect it to remain that way for a very long time, and you don’t want to be called out at 4am to replace a disk, use SCSI.

ia_archiver DOS attack

I had to send Alexa the following message tonight after suffering a Denial of Service attack from their crawler. They have been hitting me about 5 times a second for quite a while now and it does not look as if it is going to ease up.

Your ia_archiver robot is hitting my website 5 times a second.
www.uklug.co.uk
This is crazy, please stop it now. I have now added it to the robots.txt file. Don’t you know that what you are doing to my server is against the law? I can hardly use it at all. Get someone in the tech department to fire the idiot who wrote/runs that spider, they’re clueless.

I blocked them using mod_rewrite and my advice to anyone that has more than a few pages would be to do the same. They are not playing the same ball game as Google and Yahoo; they obviously don’t give a damn about the people on the receiving end of these attacks. Put the following in your .htaccess file.
RewriteCond %{HTTP_USER_AGENT} ^.*ia_archiver.* [NC]
RewriteRule ^.*$ - [F]
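A quick way to check the block is working is to fake the user agent with curl; you should get a 403 back (uklug.co.uk is my site, substitute your own):
curl -s -o /dev/null -w "%{http_code}\n" -A "ia_archiver" http://www.uklug.co.uk/
403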
They recommend adding the following to your robots.txt file.
User-agent: ia_archiver
Disallow: /
I have heard that it is sometimes ignored though, so my advice is to block them for good using mod_rewrite. They are only leeching the data for their own gain anyway. At least Google forwards you some traffic.

Rack Space and Servers

Are a pain in the ass to find. Actually I should narrow that down a bit. It’s hard to find cheap rack space and cheap servers that are in or close to London and from a company or someone you can trust. There are various offerings on the market, most of which are downright rip-offs. It’s amazing how the prices can differ between 2 suppliers in RedBus or how the prices of 2 servers can differ.
Out of curiosity I priced up a quarter rack ages ago (Aug 2003) to see what I could get and the cheapest was £3300 with a 1Mb connection and power. This also came with a £350 setup fee. I have had a look around a few forums and the cheapest I could see tonight was around £3000 with a similar setup fee. It would seem that rack space is holding its price regardless of how many people seem to be in on it.
I only need 1u or 2u to start with and the prices are still quite steep. I am currently looking at UKFSN because I know people with a box there and the guy who runs it (Jason Clifford) has a very good reputation. The proceeds also go to help the free software movement, which is something I am interested in. I have seen a few cheaper than this but none by any great margin, at least not yet.
I also need to get a nice 1u, possibly 2u, rack mount server. I like Dell because parts are fairly cheap and common on eBay, but I also like HP, whose servers are not cheap and whose parts are not as common on eBay. If the HP goes duff there is a good chance you will need to go back to the manufacturer, and when you do, make sure you have visited Boots for some KY jelly.
As for which one is more reliable, I would need to speak to several unbiased sysadmins who had been using both for years. There are good and bad reports for both online.
Given a choice between the two I would pick a Dell, simply because there are more spare parts floating around, although as with HP, if you buy direct don’t forget to visit Boots.

Backup to CDRW

For too long now I have been lazy with my backup procedure, which was normally a quick rsync to a different hard drive. This is hardly ideal and up until now I have been quite lucky. I also tend to have a blast every so often and burn some bits to CD, but I have been doing a lot of work lately and I know what it’s like to lose a few days worth of it.
For those with the luxury of a large LaCie drive or a decent tape drive, deciding what to back up is relatively easy. I decided to limit myself to a single CD-RW, which is a measly 650MB. I did this because I am a hoarder and it was about time I cleaned house. Besides, if push comes to shove I have a 20GB drive that is not plugged in due to noise, so if the backup takes more than 650MB then I will use it rather than risk losing the data.
I have four users on this machine that I use to do all my work; all other users are used by the system and for the most part I am not worried about the data they generate. I cannot back up each user’s entire home directory because there is too much data, so I limited myself to certain directories in each user’s home directory.
The first thing I did was create an area where the backups are going to take place. I have 19GB free in one partition on a SATA drive so that’s where it’s going.
The basic idea is as follows.
1. Determine which directories in each users home directory are important.
2. Determine if there are any files outside of these directories that are important.
3. Move those files inside one of the directories.
4. Weed any crap from the directories, either delete it or move it out to another area on disk.
5. Write the backup script.
For step 2 there are some files outside these directories that I would like backed up. Some examples are:
/boot
/etc/
/var/spool/cron
/var/spool/mail
The backup script itself does the following.
rsync -av /home/user1/dir1 rsync_dir/user1/
rsync -av /home/user1/dir2 rsync_dir/user1/
rsync -av /home/user1/dir3 rsync_dir/user1/
tar -czvf tar_dir/user1.tar.gz -C /home/user1 dir1 dir2 dir3
………
……… DO THE SAME FOR ALL USERS
………
rsync -av /var/spool/cron/crontabs rsync_dir/system/
rsync -av /var/spool/mail rsync_dir/system/
……… rsync each system directory
tar -czvf tar_dir/system.tar.gz -C / boot etc var/spool/cron var/spool/mail
I have deliberately created separate tar.gz files because it’s easier and faster to extract them individually when we just want a couple of files out of the backup. One thing to note is that when you tar the files up you want the paths stored in the archive to match the original files on disk, not the rsynced copies we just made. This is for sanity checking later.
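Pulling it all together, here is a minimal sketch of the whole script. The user names, directory names and staging paths are examples, so adjust them to suit:
#!/bin/bash
# Staging area on the partition with space to spare -- example paths.
RSYNC_DIR=/backup/rsync_dir
TAR_DIR=/backup/tar_dir
mkdir -p "$RSYNC_DIR" "$TAR_DIR"

# The users and the directories in each home worth keeping.
USERS="user1 user2 user3 user4"
DIRS="dir1 dir2 dir3"

for user in $USERS; do
    mkdir -p "$RSYNC_DIR/$user"
    for dir in $DIRS; do
        rsync -av "/home/$user/$dir" "$RSYNC_DIR/$user/"
    done
    # -C keeps the archive paths relative to the user's home directory
    tar -czf "$TAR_DIR/$user.tar.gz" -C "/home/$user" $DIRS
done

# System files that live outside the home directories.
mkdir -p "$RSYNC_DIR/system"
for dir in /boot /etc /var/spool/cron /var/spool/mail; do
    rsync -av "$dir" "$RSYNC_DIR/system/"
done
tar -czf "$TAR_DIR/system.tar.gz" -C / boot etc var/spool/cron var/spool/mail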
To get the tar files onto a CD we first need to make an ISO image of them as follows.
mkisofs -r -J -l -o backup.iso tar_dir/
Once the ISO has been created we need to make sure that it is not too big (< 650MB, check with "ls -lah") and then we can burn it to our CD-RW as follows.
cdrecord -v dev=1,5,0 blank=fast
cdrecord -v speed=8 dev=1,5,0 backup.iso
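Since this runs unattended from cron it is worth automating the size check rather than eyeballing ls; a small sketch, assuming GNU stat:
SIZE=$(stat -c %s backup.iso)
if [ "$SIZE" -gt $((650 * 1024 * 1024)) ]; then
    echo "backup.iso is too big for a 650MB CD-RW" >&2
    exit 1
fi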
Note that I am blanking the CD-RW before I write to it. The above script is now being managed by cron so I no longer need to worry too much about the backup as long as I have a CD-RW in the drive. The next step is to check to make sure that the backup actually worked.
The best way to do this is to use tar.
From the man page:
-d, --diff, --compare
find differences between archive and file system
cd rsync_dir/user1/
mount /cdrom/
tar -zdf /cdrom/user1.tar.gz
If you don't see any output then you have a clean backup that matches the file system. If you are unsure whether anything happened, edit a file on the file system and try it again. You will get a message similar to this if all you do is change the mod time of a file:
bin/document_parser.pl: Mod time differs
bin/indexer.pl: Mod time differs
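To check every user's archive on the CD in one pass, a sketch along the same lines (assuming the layout from the script sketch above):
mount /cdrom
for user in user1 user2 user3 user4; do
    echo "== $user"
    ( cd "/backup/rsync_dir/$user" && tar -zdf "/cdrom/$user.tar.gz" )
done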
The next thing we should do is start rotating the media, so I am off into town tomorrow during my lunch break to get a couple more rewritable CDs.
The above is a very simplified version of what I have done. There are lots of options to rsync and tar that can make things much easier, so go and have a look. I also have some websites not on the local machine that I am backing up manually.
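Those remote sites could be folded into the same script by running rsync over ssh; a one-line sketch, with the hostname and path made up for illustration:
rsync -avz -e ssh user@remote.example.com:/var/www/mysite "$RSYNC_DIR/remote/"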
I also need to get myself another large disk for the machine. I would ideally like to use SCSI but it’s very expensive. Another big SATA drive may be just what I am after, or perhaps one of those LaCie drives…

Image Theft

I just noticed that some idiot has decided to steal bandwidth by linking to an image on my website. I went hunting around looking for a suitable replacement when I came across an entry by Jeremy Zawodny.
Unfortunately the site has decreased the size of the image to 24×24, which means you don’t get the full effect of the image. I did consider using the infamous goatse image but decided against it.
I used mod_rewrite to change the image by adding a couple more rules to my .htaccess file.
RewriteCond %{HTTP_REFERER} ^http://(www\.)?agneasy\.splinder\.com.*$
RewriteRule ^.*pics/badger_logo\.png$ /pics/babyshit\.jpg [L]
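If you would rather stop all hotlinking in one go, a more general set of rules blocks any image request with a foreign referer (example.com standing in for your own domain):
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com/ [NC]
RewriteRule \.(gif|jpe?g|png)$ - [F]
The first condition lets through requests with no referer at all, such as people typing the URL in directly.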
After a while I will just block the IP and have done with it.

Apache Error

I created a duplicate website on my box today for testing the rss jobs site and lo and behold a famous error.
Invalid command 'PerlHandler', perhaps mis-spelled or defined by a module not included in the server configuration.
Basically I had not got mod_perl set up for the localhost server. I edited httpd.conf, added the following, and we were in business.
AddModule mod_perl.c

Alias /perl/ /var/www/perl/
<Location /perl/>
    SetHandler perl-script
    PerlHandler Apache::Registry
    Options +ExecCGI
</Location>

I now have a mirror of the live site on the local machine for testing.
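A quick way to prove it is working is to drop a trivial script into the Registry directory and fetch it (test.pl is just a made-up name; depending on your setup you may also need PerlSendHeader On for scripts that print raw headers):
cat > /var/www/perl/test.pl <<'EOF'
print "Content-type: text/plain\n\n";
print "mod_perl is alive\n";
EOF
chmod 755 /var/www/perl/test.pl
curl http://localhost/perl/test.pl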

Postgresql: Function foo() does not exist

Very often we are humbled by the simplest things and tonight I got a good one.
I had created a plpgsql function that was called from within a trigger to check for some duplicates in a table, blah blah. The function was working because I had tested it. It was registered in the “pg_proc” table and the two “proargtypes” were type 1043, i.e. varchar. This function was there, I could see it, and if I was a character in Tron I could touch it, so why the hell, when one of my scripts ran, did I get: function foo does not exist?
I’ll tell you why, I was not using the correct schema. Ahhhhhhhhhhhh.
I had tested it while logged in as the user who created it, and that user’s schema is different from the schema of the user that needs to call it from the website. A quick
grant usage on schema foo to bar;
sorted that problem.
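For reference, the fuller set of statements. The schema and user names are the placeholders from above and the argument types match the pg_proc entry:
grant usage on schema foo to bar;
grant execute on function foo.foo(varchar, varchar) to bar;
-- or make the schema visible to the caller without qualifying every call
set search_path to foo, public;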

rsync and vfat filesystems

When I have added a few CDs to my Ogg collection I need to copy them across to my H320. This is fine if you copy each CD over as soon as it has been ripped, but I don’t normally work that way. We just got some CDs for Christmas and I already had a few copied to the hard drive, so as usual I used rsync to do it.
I noticed that rsync would always copy everything rather than just update the files that had changed. I found out that this is because FAT stores file modification times with only two-second resolution, so the timestamps never match exactly. The following command sorted it.
rsync -av --modify-window=1 * /mnt/usb/

Counting files of a particular type

I am pretty sure there is an easier way to do this but this is how I did it the first time.
find ./ -name "*.ogg" -print | perl -wne '$i++; END{print "$i\n"}'
I then thought about it a bit and came up with
find ./ -name "*.ogg" -print | xargs ls | wc -l
now that is much neater 😉 but wait, we don't need the "xargs ls"
find ./ -name "*.ogg" -print | wc -l
I then thought about doing it this way
ls -R | grep "\.ogg$" | wc -l
I’m going to leave it there 😉
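Well, almost. For the truly paranoid, a filename containing a newline would throw all of the counts above off; GNU find can sidestep that by printing one character per file and counting characters instead:
find ./ -name "*.ogg" -printf x | wc -c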

iRiver Playlist Error

I tried to make up a few playlists today for my H320 iRiver. I am using vim to make them, not an application, and this had me stumped for a while.
Then I realised that the iRiver expects a DOS formatted file. Basically Linux text files use "\n" for a newline and MS Windows uses "\r\n". To get it to work I had to save the file in DOS format. This can be done in vim as follows.
:set fileformat=dos
For those that don’t like vim, a Perl one-liner will do the same thing.
perl -i.bak -pe 's/(?<!\r)\n/\r\n/g;' *.m3u
It is safe to run this on a file that has already been converted to the msdos format. It also creates a backup file in case anything goes horribly wrong.
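An easy way to check which format a playlist ended up in is the file command (playlist.m3u is just an example name):
file playlist.m3u
playlist.m3u: ASCII text, with CRLF line terminators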
So a simple example of an m3u playlist is as follows:

#EXTM3U
\cypress_hill\a_to_the_k.ogg

Of course the "\r\n" is not visible but it’s there 😉 The example m3u file above is in the ROOT directory of the mp3 player. Remember that on the H320, and probably on other iRiver models as well, you need to use the A-B button to view your playlist when the player has stopped.