For too long now I have been lazy with my backup procedures, which normally meant a quick rsync to a different hard drive. This is hardly ideal and up until now I have been quite lucky. I also tend to have a blast every so often and burn some bits to CD, but I have been doing a lot of work lately and I know what it's like to lose a few days' worth of it.
For those with the luxury of a large Lacie drive or a decent tape drive, deciding what to back up is relatively easy. I decided to limit myself to a single CD-RW, which is a measly 650MB. I did this because I am a hoarder and it was about time I cleaned house. Besides, if push comes to shove I have a 20GB drive that is not plugged in due to noise, so if the backup takes more than 650MB I will use it rather than risk losing the data.
I have four users on this machine that I use to do all my work; all other users are used by the system and for the most part I am not worried about the data they generate. I cannot back up each user's entire directory because there is too much data, so I limited myself to certain directories in each user's home directory.
The first thing I did was create an area where the backups are going to take place. I have 19GB free in one partition on a SATA drive so that's where it's going.
The basic idea is as follows.
1. Determine which directories in each user's home directory are important.
2. Determine if there are any files outside of these directories that are important.
3. Move those files inside one of the directories.
4. Weed any crap from the directories, either delete it or move it out to another area on disk.
5. Write the backup script.
For step 2 there are some files outside these directories that I would like backed up. Some examples are.
/boot
/etc/
/var/spool/cron
/var/spool/mail
The backup script itself.
The backup script does the following.
rsync -av user1/dir1 rsync_dir/user1/
rsync -av user1/dir2 rsync_dir/user1/
rsync -av user1/dir3 rsync_dir/user1/
tar -czvf user1.tar.gz tar_dir/user1/
………
……… DO THE SAME FOR ALL USERS
………
rsync -av /var/spool/cron/crontabs rsync_dir/system/
rsync -av /var/spool/mail rsync_dir/system/
……… rsync each system directory
tar -czvf system.tar.gz tar_dir/system/
I have deliberately created separate tar.gz files because it's easier and faster to extract them individually when we just want a couple of files out of the backup. One thing to note is that when you tar the files up you want the paths in the archive to match the paths of the original files on disk, not the rsynced copies that we just made. This is for sanity checking later.
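A minimal sketch of how the two halves of the script could be wrapped into functions follows. It is not the actual script: the `/backup/...` locations and the function names `stage_user` and `archive_tree` are made up for illustration, and storing paths relative to the top of each tree (via `tar -C`) is only one reading of the note about paths above.

```shell
#!/bin/sh
# Hypothetical locations; override them in the environment as needed.
RSYNC_DIR=${RSYNC_DIR:-/backup/rsync_dir}   # staging copies made by rsync
TAR_DIR=${TAR_DIR:-/backup/tar_dir}         # finished tar.gz files

# Copy the chosen directories of one user into the staging area.
# usage: stage_user NAME HOME DIR...
stage_user() {
    name=$1
    home=$2
    shift 2
    mkdir -p "$RSYNC_DIR/$name"
    for dir in "$@"; do
        rsync -a "$home/$dir" "$RSYNC_DIR/$name/"
    done
}

# Archive one staged tree. -C makes tar change into the tree first, so the
# stored paths are relative (./dir1/file) rather than prefixed by the
# staging location.
# usage: archive_tree NAME
archive_tree() {
    name=$1
    mkdir -p "$TAR_DIR"
    tar -czf "$TAR_DIR/$name.tar.gz" -C "$RSYNC_DIR/$name" .
}
```

Something like `stage_user user1 /home/user1 dir1 dir2 dir3` followed by `archive_tree user1` would then cover one user, with one archive per user as described above.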
To burn the tar files we first need to make an ISO image of them as follows.
mkisofs -r -J -l -o backup.iso tar_dir/
Once the ISO has been written we need to make sure that it is not too big (under 650MB; check with "ls -lah") and then we can burn it to our CD-RW as follows.
cdrecord -v blank=fast
cdrecord -v speed=8 dev=1,5,0 backup.iso
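When the script runs unattended, nobody is there to read the "ls -lah" output, so the size check can be scripted. This is only a sketch: the function name `fits_on_cd` is hypothetical, and the default limit of 650 x 1024 x 1024 bytes is an assumption about what "650MB" means for the disc in use.

```shell
# Succeed only if FILE is no larger than LIMIT_BYTES.
# The default limit (650 * 1024 * 1024 bytes) is an assumed figure.
# usage: fits_on_cd FILE [LIMIT_BYTES]
fits_on_cd() {
    limit=${2:-$((650 * 1024 * 1024))}
    # The arithmetic expansion strips any padding some wc versions add.
    size=$(( $(wc -c < "$1") ))
    [ "$size" -le "$limit" ]
}
```

In a cron-driven script, `fits_on_cd backup.iso || exit 1` would guard the two cdrecord commands above, so an oversized image is never burned.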
Note that I am blanking the CD-RW before I write to it. The above script is now being managed by cron, so I no longer need to worry too much about the backup as long as I have a CD-RW in the drive. The next step is to check that the backup actually worked.
The best way to do this is to use tar.
From the man page:
-d, --diff, --compare
find differences between archive and file system
cd rsync_dir/user1/
mount /cdrom/
tar -zdf /cdrom/user1.tar.gz
If you don't see any output then the backup matches the file system. If you are unsure whether anything happened, edit a file on the file system and try again. You would get a message similar to this if all you did was change the mod time of the files:
bin/document_parser.pl: Mod time differs
bin/indexer.pl: Mod time differs
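Since the backup itself runs from cron, the comparison can be wrapped up so that a script can act on the result. The sketch below assumes GNU tar, which exits with a non-zero status when `--diff` finds differences; the function name `verify_archive` is made up for illustration.

```shell
# Compare a tar.gz archive against a directory tree.
# Assumes GNU tar: -d (--diff) exits non-zero when files differ.
# usage: verify_archive ARCHIVE TREE
verify_archive() {
    archive=$1
    tree=$2
    # -C changes into TREE first, so relative paths in the archive line up.
    if tar -zdf "$archive" -C "$tree"; then
        echo "OK: $archive matches $tree"
    else
        echo "DIFFERS: $archive does not match $tree" >&2
        return 1
    fi
}
```

With the layout used above this would be called as `verify_archive /cdrom/user1.tar.gz rsync_dir/user1`.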
The next thing to do is start rotating the media, so I am off into town tomorrow during my lunch break to get a couple more rewritable CDs.
The above is a very simplified version of what I have done. There are lots of options to rsync and tar that can make things much easier so go have a look. I also have some websites not on the local machine that I am doing manually.
I also need to get myself another large disk for the machine. I would ideally like to use SCSI but it's very expensive. Another big SATA drive may just be what I am after or perhaps one of those Lacie drives...
More RSS Job Feeds
I managed to find another 3 RSS job feeds today for the RSS Jobs website.
That takes me back up to 19 job feeds in total. I had to take 3 of the feeds off earlier in the week because they were just duplicating jobs, so I decided to go looking for some more.
As usual, if you find an RSS job feed please let me know and I will add it to the database.
Image Theft
I just noticed that some idiot has decided to steal bandwidth by linking to an image on my website. I went hunting around looking for a suitable replacement when I came across an entry by Jeremy Zawodny.
Unfortunately the site has decreased the size of the image to 24×24 which means you don’t get the full effect of the image. I did consider using the infamous goatse image but decided against it.
I used mod_rewrite to change the image by adding a couple more rules to my .htaccess file.
RewriteCond %{HTTP_REFERER} ^http://(www\.)?agneasy\.splinder\.com.*$
RewriteRule ^.*pics/badger_logo\.png$ /pics/babyshit\.jpg [L]
After a while I will just block the IP and have done with it.
Job Search Statistics
I have added some job search statistics to the RSS Job feed website. Every search is now added to a table and counted so that I can see which searches are the most popular.
Most Common Job Search Words
Apache Error
I created a duplicate website on my box today for testing the RSS jobs site and, lo and behold, a famous error.
Invalid command 'PerlHandler', perhaps mis-spelled or defined by a module not included in the server configuration.
Basically I had not got mod_perl working for the localhost server. I edited httpd.conf and added the following and we are in business.
AddModule mod_perl.c
Alias /perl/ /var/www/perl/
<Location /perl/>
SetHandler perl-script
PerlHandler Apache::Registry
Options +ExecCGI
</Location>
I now have a mirror of the live site on the local machine for testing.
PostgreSQL: Function foo() does not exist
Very often we are humbled by the simplest things and tonight I got a good one.
I had created a plpgsql function that was called from within a trigger to check for some duplicates in a table, blah blah. The function was working because I had tested it. It was registered in the “pg_proc” table and the two “proargtypes” were type 1043, i.e. varchar. This function was there, I could see it and if I was a character in Tron I could touch it, so why the hell, when one of my scripts ran, did I get: function foo does not exist?
I’ll tell you why: I was not using the correct schema. Ahhhhhhhhhhhh.
I had tested it while logged in as the user who created it, and that user’s schema is different from that of the user that needs to use it from the website. A quick
grant usage on schema foo to bar;
sorted that problem.
Uklug Cleanup
I started cleaning up the back end of the RSS jobs site today. This mainly involved tidying up some cron jobs. There was one change that I had to make because a couple of the larger rss job feeds that I get have duplicate entries in them. This means people searching the site would get lots of jobs duplicated in their results. This is not good so I added some checks to stop the duplicates getting in. There are probably still a few problems with the smaller sites but I can deal with them as I encounter them.
I would dearly love to move the site to templates but that is a fairly large job so it can wait. Besides, why fix that which is not broken?
I suppose I should really fix the HTML at some point but that can wait until the move to Template Toolkit 😉
rsync and vfat filesystems
When I have added a few CDs to my ogg collection I need to copy them across to my H320. This is OK if you copy the CD over as soon as it has been copied but I don’t normally work that way. We just got some CDs for Christmas and I already had a few copied to the hard drive, so as usual I used rsync to do it.
I noticed that rsync would always copy everything rather than do an update of the files. I found out that this is because of a limitation of the vfat file system: it stores modification times with only 2-second resolution, so the timestamps on the two sides rarely match exactly. The following command sorted it.
rsync -av --modify-window=1 * /mnt/usb/
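To see what the modify window is doing, here is a rough illustration of the comparison in plain shell. It is not how rsync is implemented, just the same idea: two timestamps count as equal if they differ by no more than the window. The function name `within_modify_window` is made up, and `stat -c %Y` assumes GNU coreutils.

```shell
# Succeed if the modification times of two files differ by at most
# WINDOW seconds (default 0, i.e. an exact match is required).
# Assumes GNU stat, where -c %Y prints the mtime in epoch seconds.
# usage: within_modify_window FILE_A FILE_B [WINDOW]
within_modify_window() {
    a=$(stat -c %Y "$1")
    b=$(stat -c %Y "$2")
    window=${3:-0}
    diff=$((a - b))
    if [ "$diff" -lt 0 ]; then
        diff=$((-diff))
    fi
    [ "$diff" -le "$window" ]
}
```

A file whose time was rounded by one second on the vfat side fails the exact comparison but passes with a window of 1, which is why the files stop being copied again.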
Counting files of a particular type
I am pretty sure there is an easier way to do this but this is how I did it the first time.
find ./ -name "*.ogg" -print | perl -wne '$i++; END{print "$i\n"}'
I then thought about it a bit and came up with
find ./ -name "*.ogg" -print | xargs ls | wc -l
Now that is much neater 😉 but wait, we don’t need the “xargs ls”
find ./ -name "*.ogg" -print | wc -l
I then thought about doing it this way
ls -R | grep '\.ogg$' | wc -l
I’m going to leave it there 😉
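One caveat with all of the versions above is that they count lines, so a file name containing a newline is counted twice. A sketch that sidesteps this, assuming a find that supports `-print0` (GNU and BSD find both do), counts NUL terminators instead. The function name `count_ogg` is made up for illustration.

```shell
# Count regular files ending in .ogg under DIR (default: current directory).
# -print0 ends each name with a NUL byte; tr keeps only those bytes and
# wc -c counts them, so odd characters in file names cannot skew the result.
# usage: count_ogg [DIR]
count_ogg() {
    echo $(( $(find "${1:-.}" -type f -name '*.ogg' -print0 | tr -cd '\0' | wc -c) ))
}
```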
iRiver Playlist Error
I tried to make up a few playlists today for my H320 iRiver. I am using vim to make them, not an application, and this had me stumped for a while.
Then I realised that the iRiver expects a DOS formatted file. Basically Linux text files use “\n” for a newline and MS Windows uses “\r\n”. To get it to work I had to save the file as a DOS file. This can be done in vim as follows.
:set fileformat=dos
For those that don’t like vim, a Perl one-liner will do the same thing.
perl -i.bak -pe 's/(?<!\r)\n/\r\n/g;' *.m3u
It is safe to run this on a file that has already been converted to the msdos format. It also creates a backup file in case anything goes horribly wrong.
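To check whether a playlist really has been converted, a small test along these lines can help. It is a sketch only, and the function name `is_dos_file` is made up: it succeeds when every line in the file ends with a carriage return.

```shell
# Succeed if every line of FILE ends in CR (i.e. the file uses \r\n).
# usage: is_dos_file FILE
is_dos_file() {
    cr=$(printf '\r')
    # grep -v selects lines that do NOT end in CR; -q only sets the status.
    # If grep finds such a line the file is not fully converted.
    ! grep -qv "${cr}\$" "$1"
}
```

For example, `is_dos_file playlist.m3u || echo "needs converting"` flags files that still have Unix line endings.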
So a simple example of an m3u playlist is as follows:
#EXTM3U
\cypress_hill\a_to_the_k.ogg
Of course the “\r\n” is not visible but it’s there 😉 The example m3u file above is in the ROOT directory of the mp3 player. Remember that on the H320, and probably on the other iRiver models as well, you need to use the A-B button to view your playlist when the player has stopped.
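Writing playlists by hand gets tedious, so here is a sketch of generating one in the format shown above: a #EXTM3U header, backslash-separated paths starting from the player's root, and “\r\n” line endings. The function name `make_playlist` and the idea of pointing it at the mounted player are assumptions for illustration, and it does not cope with newlines in file names.

```shell
# Write an m3u playlist listing every .ogg file under ROOT.
# Paths are written relative to ROOT with backslashes, and every line
# ends in \r\n as the player expects.
# usage: make_playlist ROOT OUTPUT_FILE
make_playlist() {
    root=$1
    out=$2
    printf '#EXTM3U\r\n' > "$out"
    ( cd "$root" && find . -type f -name '*.ogg' | sort ) |
    while IFS= read -r path; do
        path=${path#./}                                  # drop the leading ./
        dos_path=$(printf '%s' "$path" | tr '/' '\\')    # / becomes \
        printf '\\%s\r\n' "$dos_path" >> "$out"
    done
}
```

Assuming the player is mounted at /mnt/usb, `make_playlist /mnt/usb /mnt/usb/all.m3u` would put a playlist of everything in the root directory of the player.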