After the amount of moving around that I did yesterday it is hard to believe that nothing went wrong. Plenty of things did go wrong, but I will not bore you with them; they are trivial compared to what I did today.
I started moving the “/usr/local/pgsql/data/base/links_database/files” around to make room in various places and managed to corrupt one of them. You can imagine my panic. I quickly put everything back where it was, started Postgres, and checked whether the file I had corrupted belonged to an index or a table (the check is shown below). If it had been an index I could have just dropped and recreated it, but it was the links_found table. However, because I make regular backups of everything, this gave me a chance to test them out.
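Telling the two apart is a one-line query against pg_class; the filenode here is the one my corrupted file turned out to be named after:
links=# select relname, relkind from pg_class where relfilenode = 71509890;
A relkind of “r” means a table and “i” means an index.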
Just recently I mentioned that I was going to use gzip, rather than bzip2, for my backups, because of the amount of time that bzip2 takes to do anything. Trust my luck: the gzip backup that I had taken recently was corrupted, giving all sorts of end-of-file errors. I need to investigate this, because I cannot afford the time it takes to use bzip2 for my backups.
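For context, the backup itself is nothing clever; it is a pipeline along these lines, with the output going to whichever disk has room that day (the /backup path is illustrative):
]# pg_dump links | bzip2 > /backup/links_database_`date +%d_%m_%y_%H:%M:%S`.sql.bz2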
I used an old backup file from a few days ago. This is how I recreated the database, for anyone who is interested.
]# psql template1
template1=# drop database links;
template1=# \q
]# createdb links
]# cat /links_database_01_10_03_15\:12\:57.sql.bz2 | bunzip2 | psql links
It looks like a very old backup, from 7 days ago, but the robots had not been running for most of that time, so there was little data loss. The whole operation took a couple of hours. I also tried something a bit dangerous: when the database was halfway through recreating itself I checked which data files had been created and had reached the 1Gb segment limit. I then moved the oldest of these to another file system and created a soft link in its place (a sketch below). I know there is probably an easier way to do this, but because the database is bigger than any of my file systems I needed a quick and dirty method to free space and avoid running out of room. If anyone knows a better way to do this I would like to hear it; you know how to contact me.
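The move itself was nothing fancier than the following; pick whichever full 1Gb segment is the oldest and a file system with room, here the old hdc5 mount:
]# mv /usr/local/pgsql/data/base/71509876/71509890.1 /opt/links/hdc5/
]# ln -s /opt/links/hdc5/71509890.1 /usr/local/pgsql/data/base/71509876/71509890.1
Postgres does not seem to mind the symlink, though I would not leave it in place any longer than necessary.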
Just when I thought everything was ok, I got the following error.
links=# select now (), count(*) from links_found;
ERROR: cannot read block 477160 of links_found: Input/output error
links=# select relname, relfilenode, relpages from pg_class order by relpages desc limit 10;
            relname             | relfilenode | relpages
--------------------------------+-------------+----------
 links_found                    |    71509890 |   456987
 links_found_pkey               |   112850939 |   418056
 lf_found_url_idx               |   112850933 |   397954
 home_page                      |    71509882 |    90268
 home_page_pkey                 |   112850935 |    77280
 home_page_url_key              |   112850937 |    74141
 hp_url_id_index                |   112850934 |    11990
 pg_proc_proname_args_nsp_index |       16640 |      125
 pg_proc                        |        1255 |       58
 pg_depend                      |       16598 |       20
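As an aside, turning a relfilenode into a real file is just a find away; the paths below are how things are laid out on my box, the .3 suffix being the fourth 1Gb segment:
]# find /usr/local/pgsql/data/base -name "71509890*"
]# df /usr/local/pgsql/data/base/71509876/71509890.3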
I knew this could mean a problem on my file system; I was having visions of one of my disks now being completely screwed. I found out which file system the table's files were on using the relfilenode above, and then did the following.
[root@harry 71509876]# /etc/init.d/postgres stop
Stopping PostgreSQL: ok
[root@harry 71509876]# cd
[root@harry root]# umount /dev/sda5
[root@harry root]# e2fsck -c /dev/sda5
e2fsck 1.26 (3-Feb-2002)
Checking for bad blocks (read-only test): done
Pass 1: Checking inodes, blocks, and sizes
Duplicate blocks found... invoking duplicate block passes.
Pass 1B: Rescan for duplicate/bad blocks
Duplicate/bad block(s) in inode 766: 206483
Pass 1C: Scan directories for inodes with dup blocks.
Pass 1D: Reconciling duplicate blocks
(There are 1 inodes containing duplicate/bad blocks.)
File /data/base/71509876/71509890.3 (inode #766, mod time Wed Oct 8 13:16:55 2003)
has 1 duplicate block(s), shared with 1 file(s):
[root@harry 71509876]#
We can see straight away that the problem is on my links_found table again. To fix this I ran e2fsck with the “-f” option and chose the defaults when asked questions. I ran it again to make sure that the defaults were not causing any trouble, and the database is now back in business.
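For completeness, the repair boiled down to the sequence below; the second run is just to confirm a clean bill of health, and the bare mount works because /dev/sda5 is in fstab:
]# e2fsck -f /dev/sda5
]# e2fsck -f /dev/sda5
]# mount /dev/sda5
]# /etc/init.d/postgres start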
40 Million links found
6 Million unique links found
Scan.co.uk are shit 07 Oct 03
I had another bad day today. I need to return all the hardware to www.scan.co.uk, which means moving various bits and bobs around and restructuring my file systems to make as much space as possible. The following was my layout before I started playing with it.
harry]# df -m
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/hdc2                 3938       961      2777  26% /
/dev/hdc1                   30         9        20  30% /boot
/dev/sda1                 3937      3006       731  81% /home
none                       505         0       504   0% /dev/shm
/dev/hdc7                 1969        33      1836   2% /tmp
/dev/hdc3                 3938      2177      1561  59% /usr
/dev/hdc5                 3938      3108       630  84% /opt/links/hdc5
/dev/hdb1                 9628      6955      2184  77% /opt/oracle
/dev/hdb2                 9605      4884      4233  54% /opt/oracle/oradata/ide12
/dev/sda2                 3937      1019      2719  28% /opt/oracle/oradata/scsi02
/dev/sda5                 9365      8625       272  97% /opt/oracle/oradata/scsi03
/dev/hde1               156319        33    148470   1% /opt/oracle/oradata/hde1
After a bit of moving around and shifting of file systems I managed to get it into the following layout. It is not ideal yet, but it gets the robots started again.
[root@cpc3-lutn1-6-0-cust26 root]# df -m
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/hdc2                 3938      2567      1170  69% /
/dev/hdc1                   30         9        20  30% /boot
none                       505         0       504   0% /dev/shm
/dev/hdc3                 3938      2177      1561  59% /usr
/dev/hdc5                 8439        33      7985   1% /links/pg_xlog
/dev/hdb1                 9628      4222      4917  47% /links/tables
/dev/hdb2                 9605       752      8365   9% /links/temp
/dev/sda5                17364     10500      5996  64% /links/database
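In case anyone wants to copy the pg_xlog arrangement, it is the usual stop, move and symlink dance; the sketch below assumes the default data directory from earlier:
]# /etc/init.d/postgres stop
]# mv /usr/local/pgsql/data/pg_xlog/* /links/pg_xlog/
]# rmdir /usr/local/pgsql/data/pg_xlog
]# ln -s /links/pg_xlog /usr/local/pgsql/data/pg_xlog
]# /etc/init.d/postgres start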
I now have a few extra Gb for the database that I did not have before. Something that was causing me some concern was what to do if I ever had to restore a backup: I had to make sure that I had a single contiguous file system big enough to contain the whole database. This is the reason I made the SCSI disk into a single file system.
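Checking whether a restore will actually fit is then a one-minute job: compare what the database takes on disk with what the big file system has free.
]# du -sm /usr/local/pgsql/data/base
]# df -m /links/database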
35.876 Million links found
5.476 Million unique links found
Nightmare 06 Oct 03
I have had an absolute nightmare of a day trying to get various bits of hardware to operate together. After further investigation we quickly came to the conclusion that www.scan.co.uk's RAM had shagged the PC. I knew that, as the law currently stands, I have not got a leg to stand on; or at least, I would have one if I had the money to spend the next several months chasing them to prove that their hardware was to blame.
Before anyone says that I am not really qualified to make such a general statement about their hardware, I should probably state that I was “PACE” qualified for 3 years in a row and spent 5 years as a satellite engineer, so I know my way around a circuit board better than most.
For reference, I have also tried to get the Highpoint card running better. It was giving me some really slow speeds when measured with hdparm. So I downloaded the driver from the website and installed it; for a few minutes it tripled my hdparm speeds, then it froze the PC. I tried it twice before giving up.
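For anyone who has not used it, the measurement is a single command; /dev/hde is where the card's first drive shows up on my box:
]# hdparm -tT /dev/hde
The -T figure times cached reads, which is essentially testing RAM, and -t times buffered reads from the disk itself.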
BOLLOCKS BOLLOCKS BOLLOCKS BOLLOCKS BOLLOCKS BOLLOCKS
End of OU coursework
Well, the coursework is finished, thank god. I am waiting for the arrival of the hardware. I managed to get a little spidering done today.
32.8 Million links found
RAM and shmmax 03 Oct 03
Watford Electronics only do 512Mb sticks, so I am limiting my upgrade options for the future, but I need the RAM, so they got the sale. I put the two 512Mb sticks into the machine and it worked a treat. Now all I need to do is compile a kernel that has support for the Highpoint HPTxxx controllers and edit the postgresql.conf file and the shmmax setting so that we can take advantage of the new RAM.
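For reference, letting Postgres at the extra RAM boils down to two small edits; the numbers here are illustrative rather than the values I finally settled on:
]# echo 536870912 > /proc/sys/kernel/shmmax
That raises the kernel's shared memory ceiling to 512Mb until the next reboot. Then in postgresql.conf something along the lines of:
shared_buffers = 32768
which at 8Kb a page is roughly 256Mb, comfortably inside the new shmmax.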
Unfortunately, compiling a kernel for this beast is not really that straightforward. I needed to get the latest up2date packages and keys from Red Hat, then download the kernel source rpm and install that. I do not have an old .config file for this machine, so I had to use one of the configs provided in the source install and customise it for my requirements. I was a bit surprised when it worked first time. Needless to say I have left out a lot of the details, because it was a very frustrating exercise.
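For the record, the build itself went roughly like this from the unpacked source tree; the config file name varies by release, so treat it as a sketch:
]# cp configs/kernel-2.4.20-i686.config .config
]# make oldconfig
]# make dep bzImage modules
]# make modules_install install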
I managed to move the 20Gb disk out of the PC for J’s dad. This involved moving several of the Postgres files around, onto disks that are not really ideal places to have them. I will sort this out before I start the robots again in a few days' time. So I am off to build a PC; they need it because they are researching cycling across Canada on a tandem.
35.876 Million links found
5.476 Million unique links found
RAM and shmmax 02 Oct 03
I am going to build a PC for J’s parents from old bits that I have in storage, or should I say bits that I have no room for here. I am taking the 512Mb of PC133 from this machine and adding the 1Gb stick that arrived today, along with the Maxtor 160 SATA drive and Highpoint 1542-4 controller. I am also giving him one of my 20Gb drives, because he has only got a 1.7Gb and a 127Mb drive in his old PC.
Needless to say, the hardware that arrived from www.scan.co.uk today was a bit dodgy. The RAM completely trashed one of our PCs and we are now unable to get it running again. I now need to get more RAM from Watford Electronics to replace the piece of crap www.scan.co.uk sent. Unfortunately Watford Electronics are not open until tomorrow, so it's back to the old config for a little more spidering.
35.876 Million links found
5.476 Million unique links found