Postgresql Database Recovery

If you have found this page then you probably have a serious problem with a corrupt Postgres database. If like me you found that your database seems to be missing then you have come to the correct place. I am not saying this will cure your problem or even come close but it might give you a few ideas on where to look next.
If you are going to follow this article I strongly suggest you read it ALL before starting to do anything. I have written it as I done it and I went down some blind alleys before I recovered the data. If you start doing what I have done as you read it you might waste a lot of time or fsck something up, so read it first and decide what you need to do. In most situations it might be a very simple single command that needs to be run and you will be sorted, others may have to do it a bit different.
I would also highly recommend getting onto the Postgres mailing lists and asking some pertinent questions. Make sure if you do post questions you give as much detail as possible i.e. version numbers and full debug level log output. There are people on there who have done this a lot more than I have and I am sure they have seen some nastier cases than the one I’ve got.
This article does not cover how to fix file system errors, see fsck or e2fsck for that if you have them. You might also want to investigate setting
zero_damaged_pages = true
in your postgresql.conf file if you are expecting corruption in your files. Ask on the postgresql maiing lists about this before doing it though.
.
My problems started as follows
postgres@debian:~$ psql links
Welcome to psql 7.4.3, the PostgreSQL interactive terminal.
Type: \copyright for distribution terms
\h for help with SQL commands
\? for help on internal slash commands
\g or terminate with semicolon to execute query
\q to quit
links=# \dt
no relations found
links=# select relname from pg_class where relname not like ‘pg_%’;

none of my tables were present in the output, this is where I had a sudden urge to go to the toilet.
I don’t often use the links database but every now and then I start a set of spiders that use the database to traverse the internet. I have been doing this for about 2 years and the database is huge or at least for a home machine it is.
I only have a backup from several months ago (lesson to be learned here) and I really didn’t want to loose the data I had collected over the last few months.
On hunting around the internet I noticed that there have been a lot of people who have had corrupt postgres databases and managed to recover them with varying degrees of success. Most causes of these corruptions seemed to be hardware related and some with bodged upgrades. Mine could have been one of 2 things:
1. A glitch on the 160Gb SATA drive the database is stored on. This happened the other night.
2. Recent Debian upgrade to the database.
At this point figuring out what went wrong was less important than getting the data back so I decided not to bother on a witch hunt and cure it instead.
This was the point where I asked myself what was more important.
1. Recover as much of the data as possible.
2. Data Integrity.
For me the choice was quite simple. I wanted the data, I also needed to be able to retrieve the vast majority of it otherwise I might as well just use most recent dump which would guarantee the integrity of the database but would set me back a few months.
First thing I did was stop the postgres server:
/etc/init.d/postgres stop
I then took a full copy of the “data” directory, this is the one you may have used the “initdb” command to setup.
Once the backup has been made make sure that nothing happens to the original directory, don’t do anything to it at all because we may need it later. All subsequent actions will use the copy of the database not the original.
At this point it might be an idea to a little data gathering. For me I needed to know what table was the largest etc. Doing
ls -la /var/lib/postgres/data/base/17142/
This will list all the files in the directory. I was pretty sure that the biggest table was going to be either the “child_links” or home_page table and it was easy to see in my case which was the largest tables. I also turned on full logging on all the postgres databases by editing the postgresql.conf file. Be aware that any more database created by “initdb” will create separate config files in the data directory and these will need to be edited. I suggest copying over them with a single common one. Another thing that you will need to know is the last Transaction ID or at least as close to value as possible. When you start stop the postgres database it write and entry to the log file and this contains a TID. I used grep to find mine ie
grep “next transaction ID” /var/log/syslog
This produced a list of TID’s. (I log to syslog you might not, check postgresql.conf to find out)
The next thing I did was create a new database away from both the copy and the original databases. I did this using the initdb command as follows
initdb /var/lib/postgres/recovery_database
This creates a skeleton database data directory ready for action. Make sure no postmaster instances are running. I then started my new database as follows
/path/to/postmaster -D /var/lib/postgres/recovery_database
This database as it stands is not really much use so:
createdb links
I then fished out the create table script for my database (you may not need this). And created an empty database using it. I then had all my original tables with no data in them. The next thing I did was
links=#select relfilenode from pg_class where relname = ‘child_links’;
relfilenode
————-
17160
(1 row
This gave me the name of the file on disk where the table data was. I stopped the database and then:
cp /copy/base/17142/172345 /path/recovery_database/base/17142/17160
cp /copy/base/17142/172345.1 /path/recovery_database/base/17142/17160.1
cp /copy/base/17142/172345.2 /path/recovery_database/base/17142/17160.2
I know I could have just soft linked them but I was being cautious. If I take copies the originals are safe from harm if I make a cock up.
Restart the recovery database again using the following. (please read the man page before using pg_resetxlog). I used 90812030 here because this number was the largest Transaction ID I could get from the logs.
pg_resetxlog -x 90812030 /var/lib/postgres/recovery_database
/path/to/postmaster -D /var/lib/postgres/recovery_database
I then used
postgres@debian:~$ psql links
Welcome to psql 7.4.3, the PostgreSQL interactive terminal.
Type: \copyright for distribution terms
\h for help with SQL commands
\? for help on internal slash commands
\g or terminate with semicolon to execute query
\q to quit
links=# select count(*) from child_links;
count
———-
16341924
(1 row)
I immediately recognized that this count was wrong. 16 Million rows looked more like the home_page table to me. I decided to try
links=# select * from child_links;
……..ERROR………
This threw an error straight away which was a good indication that although it would count rows etc the structure was completely different. I then ran
links=#select relfilenode from pg_class where relname = ‘home_page’;
relfilenode
————-
17152
(1 row
I stopped the database again and now that I had the filename of the recovery databases home_page table I was able to soft link the files I copied earlier to it as follows
cd /var/lib/postgres/recovery_database/base/17142
ln -s 17160 17152
ln -s 17160.1 17152.1
ln -s 17160.2 17152.2
This was a bodge to save me time copying the files back over from the copy. If I corrupt these files the copies are safe anyway so a soft link is a quick way to see if this would work.
I restarted the recovery database
/path/to/postmaster -D /var/lib/postgres/recovery_database
I then used
postgres@debian:~$ psql links
Welcome to psql 7.4.3, the PostgreSQL interactive terminal.
Type: \copyright for distribution terms
\h for help with SQL commands
\? for help on internal slash commands
\g or terminate with semicolon to execute query
\q to quit
links=# select count(*) from home_page;
count
———-
16341924
(1 row)
links=# select * from home_page limit 10;

This returned 10 rows of data which is all when and good.
links=# select * from home_page;

This first seemed to be doing something but it eventually failed with an error similar to the following.
FATAL: XX000: xlog flush request 29/5BEF8A58 is not satisfied — flush ed only to 0/62000050
This meant I had to start looking to see what would have caused this. I managed to find some info on this and by Googling I gotthis on one of the postgres mailing lists
From what I could tell I needed to use pg_resetxlog but this time I had to give pg_resetxlog some info about the WAL setting. Following the instructions in the man page I stopped the database and issued
:~$ pg_resetxlog -x 90812030 -l 0x58,0x25 /var/lib/postgres/recovery_database
I then restarted the server and
postgres@debian:~$ psql links
Welcome to psql 7.4.3, the PostgreSQL interactive terminal.
Type: \copyright for distribution terms
\h for help with SQL commands
\? for help on internal slash commands
\g or terminate with semicolon to execute query
\q to quit
links=# select * from home_page;

I then used pg_dump to dump this table out to a backup file. The prospect of doing the above for each table made me cringe so I decided to risk trying something on the original database. I do not recommend you taking this short cut if you are really that worried about your data. I have the luxury in my case that the data is neither critical or really important. If you value your data don’t do this to the original repository do it somewhere else first.
I shut down the recovery instance. I then ran
:~$ pg_resetxlog -x 90812030 -l 0x58,0x25 /var/lib/postgres/data
against the original database. I restarted the database using
/etc/init.d/postgresql start
I logged into the database and I was now able to see all the tables etc. I ran a few select statement and everything looked fine. I then logged out and
pg_dump links | gzip > links_dump.gz
This dumped the entire database out.
I then created another directory for a new database as follows.
initdb /var/lib/postgres/new_database
Remember to edit the new config files ie postgresql.conf. You might want to add a higher setting to checkpoint_segments, I used 15. I then ran
cat /var/lib/postgres/full_dump_links.gz | gunzip | psql links
This finished with the following errors.
ERROR: there is no unique constraint matching given keys for referenced table “home_page”
ERROR: insert or update on table “rev_index” violates foreign key constraint “lexicon_id_fk”
DETAIL: Key (lexicon_id)=(22342) is not present in table “lexicon”.
ERROR: there is no unique constraint matching given keys for referenced table “home_page”
There are basically some constraints broken. I had some errors indicating that a few of the unique indexes and primary keys could not be created. This is not really concerning me too much since this is easily remedied. Everything after this point is simple administration tasks ie dropping and creating tables etc and fixing the broken constraints. I also had to check the integrity of the data which is simple enough.
That was it for me. A fully recovered postgresql database.

Amost posioned by a crap Chinese restaurant

We got a delivery from a Chinese restaurant tonight:
The Oriental
2A Granville Road
Luton
Beds
LU1 1PA
When we actually got the meal it we found that there was raw chicken in it. I got a refund and reported them to the local council. I have no idea what else we can do about it. I think they shouldn’t be trading if they can’t be bothered to cook the food properly. It’ll be the last time we order anything from them.

Continue reading “Amost posioned by a crap Chinese restaurant”

Kernel 2.6.5 and 2.6.9 fun

I upgraded from kernel 2.6.5 to kernel 2.6.9 because I was getting DMA errors when ripping CD’s to disc. I was also getting major errors with the SATA disk when copying the CD’s to my mp3 player so I have bit the bullet and decided to try and upgrade.
During boot I came across what I thought was some king of bug. When I rebooted the kernel fsck complained about a bad file system, no indication as to which device just the error message.
I logged back in and my SATA disk was gone. It had not been mounted during boot which was a bit of a bummer. I had a look at the dmesg output and lo and behold it is now a SCSI device and my three old mount points are now invalid. I am using the VIA controller ie
CONFIG_SCSI_SATA_VIA=y
in my config. I find these changes very disconcerting but then I am not a kernel hacker. I wonder if there is an easy way to see changes like this without having to read through tons of Change logs. Its even worse when you are jumping several versions.
Kernel upgrades, like life are is just full of little surprises.

Free Gmail Accounts

I have no more gmail accounts to give away.
I have some gmail accounts to give away. If someone wants one email me.
I need your.
First Name.
Last Name.
Email Address.

Unsolicited Email

Last year I was looking for work so I registered with gojobsite.co.uk and stuck my CV up their. I actaully did get some leads from it but nothing substantial.
Anyway I seem to get random emails from recruiters that are completely irrelevant. I don’t get them often but I do get them and I know that its a bulk mailing system they are using. They have not even looked at the details they just do a mix and match hodge podge and come up with a list, the more the merrier because they know there are plenty of dumb ass people who forward job offers emails all over the place. If you think there is nothing wrong with this then have a read of Sharp Practices.
Anyway I got another one today and I politely asked them to take me off the mailing list. The reply was as follows

The email was sent through an internet candidate database. Your details
have come up in a search of keywords from your CV.
Inevitably, there are mistakes made with this type of search, meaning
job descriptions may not be perfectly suited to your profile.
However, we invite candidates to send their CV so that we can update
their profile accordingly on our in-house database
If you no longer wish to receive emails through internet databases,
then this point needs to be raised with any Recruitment websites you are
subscribed to.
If you wish to be included on our in-house database, please reply with
your most up-to-date CV, detailing your ideal position in the body of
the email. We will then call, or email you with details of more suitable
positions
Any questions, do not hesitate in contacting me directly

If I had registered with several recruitment sites how am I meant to tell which one I am to go back to and change. The other thing is, I might be very happy with the other companies that send me job advertisements except his and no internet recruitment company I have come across allows you to selectively choose which “Partners” of theirs are allowed to send you emails.
I know that the laws changed on December 11 2003 so I decided to go looking to see if there was anything I could do about it.
This is what I found
I have asked them to take me off their email list and they have failed to comply. I have replied with pointers to the appropriate materials online and I am awaiting a reply to that. If I get no joy then I will report them.

Copyright protected CD’s

The music industry is quite crazy. For years they have been milking us like cattle. They continually tout illegal ripping of CD’s as the main reason for their high price. This seems like a nice cop out. Its in their interests to keep the price high because there is always a base of people who buy CD’s regardless of how easy they are to copy.
There are 3 types of people.
1. People who buy CD’s
2. People who buy CD’s and also accept illegal copies.
3. People who deal mainly with illegal copies
I am one of those Tyoe 1 suckers. People of Type 2 would be more likely to become Type 1 if prices where lower. The people of Type 3 are unlikely to mend their ways unless prices drop significantly but there will always be a pirate market its the size of it that matters.
I like to own the originals but prices are extortionate and illegal copies are looking very tempting. I buy a CD every couple of weeks and always look for the bargains, rarely will I ever buy a CD that costs me more than £8 (still expensive). One exception was a Metallica’s Black album that cost a wopping £15. I really wanted it so I had to take another shafting up the arse from the industry to buy it.
When I come home I burn the CD then put it back in its case for safe keeping. I need to do this because CD’s are so bloody expensive and CD’s get scratched. I also like to have them on my mp3 player for the train journey to work.
We now have Copyright protected CD’s which means I am not able to make copies of them. I know I can copy them but its against the law to make a copy for my own use, this is a pile of crap. I for one will not be buying any Copyright protected CD’s. In the age of MP3’s there is really not much point in owning a CD that you cannot copy so why buy it. Then of course some players won’t actually play these CD’s.
I think the music industry are hanging themeselves. They are making it harder and harder for people to use the music the way they want it and more an more people are sick to death of paying high prices for it. They need to make the prices low enough so that its not worth the effort of burning them when you can get them for next to nothing.
There will be a loser in this war and I can guarantee that it will not be the public. Music is getting easier and easier to get hold of or copy and prices are not going down so its only going to get worse. Its about time the music industry realised that you can only shaft Joe public for so long before it all goes horribly wrong.

kernel: cdrom: dropping to single frame dma

This error manifests itself when using grip to encode some CD’s to ogg. When the CD is scratched it takes a long time to rip it to disk so I would normally set these aside and do them later. When I tried to abort the ripping grip fails with an application error and if I check the logs I see
kernel: cdrom: dropping to single frame dma
ripping after this point fails at around 90% of each track. I have hunted high and low for a solution and so far I have not found one. I am using a SCSI cd burner
‘YAMAHA ‘ ‘CRW2100S ‘ ‘1.0H’ Removable CD-ROM
For the time being one solution that I seen mentioned was to reload the kernel module for the CD drive. This is not easy for me because mine was compiled into the kernel. I decided to download 2.6.8 and see if it works instead. I compiled my SCSI card driver as a module just in case anything went wrong.
Something weird did happen. When I booted into the new kernel I was unable to mount the H320 USB device which is a bit of a bummer. Worse than that was XFree86 started using between 60 and 90 percent CPU. Something is definitely not right with the the new kernel. I decided to recompile the kernel with the SCSI device built in in the off chance that it may have caused this but when I booted back in everything was fine for a few minutes and then X went mad again. I was still unable to see the USB device either. I was also unable to mount my SATA drive either which is where the CD collection is stored so I am switching back to the 2.6.5 kernel and will just reboot when it happens.
For the lowdown on the problem have a look at the following thread

iRiver H320 on Linux

I just bought two of these and decided to get them working on Linux. This is a very rough guide on how to get it running, it is not a guide on how to compile a kernel. For Debian I wrote a page on Compiling a kernel for Debian that you could use as a guide but for other systems see the Kernel Rebuild Guide
First off these are USB Mass storage devices so you need to have USB enabled properly in your kernel. The appropriate options that I had to add to my kernel config file are as follows.

# USB support
#
CONFIG_USB=y
CONFIG_USB_DEBUG=y
# Miscellaneous USB options
#
CONFIG_USB_DEVICEFS=y
# USB Host Controller Drivers
#
CONFIG_USB_EHCI_HCD=m
CONFIG_USB_OHCI_HCD=m
CONFIG_USB_UHCI_HCD=m
# USB Device Class drivers
#
CONFIG_USB_BLUETOOTH_TTY=m
CONFIG_USB_ACM=m
CONFIG_USB_PRINTER=m
CONFIG_USB_STORAGE=y
CONFIG_USB_STORAGE_DEBUG=y
CONFIG_USB_STORAGE_DPCM=y
CONFIG_USB_STORAGE_JUMPSHOT=y
# USB Human Interface Devices (HID)
CONFIG_USB_HID=m
CONFIG_USB_HIDINPUT=y

For those that don’t know what the kernel config file is this is the file that is used to configure the kernel 😉 When I recompiled my kernel I used
make menuconfig
this edits the config file before you compile and install the kernel. After running “make menuconfig” then go to drivers and at the bottom you should see USB device option select this and then select the devices you have on your machine.
To see what devices are on your machine you need to enable them in your BIOS and then you can use
lspci -v | grep HCI
to have a look at what USB controller your motherboard or PCI card is using. Mine was running a VIA controller.
To get the usb to appear when you plug it in you need to have the hotplug scripts installed. On Debian this is a simple
apt-get install hotplug
and thats sorted. I also added the following to my fstab file so that I can browse the devices.
/dev/sda1 /mnt/usb vfat defaults,auto,user,sync 0 0
That was it. I now have the iRiver H320 on my machine and it looks like a 20Gb hard drive. Now to get my CD Collection converted to OGG’s.

iRiver H320

I bought two of these as Christmas presents ie one for me 😉 and one for Jenny. I am not really a gadget person but I knew I wanted plenty of storage and the H320 has 20Gb which should keep me happy.
What I didn’t know is that iRiver released an American version that has had the hardware castrated by removing USB TO GO (USBTG). This was the facility to plug another USB device into the other and copy files between them in particular cameras. From what I understand hardly any hardware was supported but it still seemed to be a daft decision. I got both the units in Manhattan so I have the American version, bollix.
For the low down on the USBTG issue visit www.misticriver.net
My initial impressions were very good, it looks sleek enough and although there are plenty of complaints about it being a bit big I like the size of it. I have also read a lot of complaints about usability and I for one am getting along fine with it. Of course, I read the instructions which some people seem to miss and assume that it should just do it the way their last one did or the same way an iPod works. It is easy to figure out an like most new bits of tech with a lot of feature it takes a wee bit of getting used to.
I plugged it into the PC and after a while (it needs some charge before it will work on the PC) I tried to write some files to it. This was a pain in the ass because the device is unable to be charged and do this stuff at the same time so I had to wait a while before I could switch data mode and move some files on.
First off it does not tell me in the manual which port I am meant to use to transfer files, I guessed data and managed to get it to work but being curious I tried to see what would happen if I plugged in the media port. On the manual the media port is meant to show up as a device the same way the data port did but it kept asking me for a driver so I can only assume there is something very wrong there since it is touted as plug and play with no need for drivers unless you are using Win98. This annoyed me because now I have a port on the unit and I have no idea what it is for.
Earphones: I have no idea why they supplied these because they are complete shit. They don’t fit and keep falling out but then they all do in my ears but what makes these crap is that as the volume goes up I start to get all sort of crap coming through them. I am assuming they are faulty because they shouldn’t sound that bad out of the box. I have tried them on several different songs using different bit rates etc and they are still crap.
Case: This is OK. Its sturdy and will save the unit from some damage but I have to ask why they blocked the view of the screen. I suppose they want you to purchase the remote control or buy a better case.
Radio: I am unable to get more than three stations and this is in the center of London so I think the radio is a bit crap as well. I will keep trying because this is one feature I really need, I like Radio 2 and Radio 4 so if I cannot get these I will be sending them back. (I know about the European setting on the radio)
Would I buy the iRiver again?
Not at UK prices, its a rip off. I also need to spend more money getting headphones which is another pain. My advice is if you can get either the HI120 or the H140 it would be money better spent. There is just too many little things that tell me that the H320 is a half arsed attempt at an update for the H100 series.
As a first impression with the company I am disappointed so will be purchasing somewhere else in future.
I am currently trying to get the H320 to work in Linux but more in that in another post.