Increasing performance on the database 10 Oct 03

I am in the process of reducing the total size of the database and increasing performance a bit in the process. As it stands the links_found table is holding duplicate copies of the home_page table. This was necessary because I wanted to be able to see what links belong to what web pages at the start for testing purposes. I no longer need to do this. What I am going to change is the format that they are stored in. I am going to make the following changes to the link_found table.
FORM
links_found
(
parent_url_id int4,
found_url varchar(2000)
);
TO
child_links
(
parent_url_id int4,
found_url int4
);
I have written a postgres function to carry out the migration. As you can see beneath the space savings are fantastic. I should also see an improvement in my indexes on this table. I am trying very hard to postpone buying any extra hardware. It will make it a bit more awkward to use the data now but this is not my main priority at the moment. When I have lots of disk space I can the create temporary tables for any manipulations as I require them.
You can see below that I have created a new table called child_links. I have converted all the URL’s into url_id’s which are “int4” types. I have also created indexes on this table. I can now remove all relations relating to the links_found table.
links# select relname, relfilenode, relpages from pg_class order by relpages limit 10;
relname | relfilenode | relpages
——————-+————-+———-
links_found | 188163825 | 644114
links_found_pkey | 246168688 | 588185
lf_found_url_idx | 246168682 | 559585
child_links | 246168690 | 255817
child_links_pkey | 246168692 | 216185
home_page | 188163817 | 118338
parent_url_id_idx | 299992353 | 116508
child_url_id_idx | 299992259 | 116231
home_page_pkey | 246168684 | 103223
home_page_url_key | 246168686 | 100120
hp_url_id_index | 246168683 | 15857
hp_url_id_idx | 301324542 | 15660
(12 rows)
Before I remove any of the relations I wanted to see my actual disk savings.
File system 1M-blocks Used Available Use% Mounted on
/dev/hdc2 3938 3125 613 84% /
/dev/hdc1 30 9 20 30% /boot
none 505 0 504 0% /dev/shm
/dev/hdc3 3938 2177 1561 59% /usr
/dev/hdc5 8439 1466 6551 19% /links/pg_xlog
/dev/hdb1 9628 5523 3616 61% /links/tables
/dev/hdb2 9605 7044 2073 78% /links/temp
/dev/sda5 17364 13812 2684 84% /links/database
So that you can see what I had done to the files and where they where all pointing here is a listing of the links database directory for all the big relations.
-rw——- 1 postgres postgres 8.0k Oct 12 03:54 188163815
lrwxrwxrwx 1 postgres postgres 33 Oct 10 23:17 188163817 -> /links/pg_xlog/postgres/188163817
lrwxrwxrwx 1 postgres postgres 30 Oct 10 22:35 188163825 -> /links/temp/postgres/188163825
lrwxrwxrwx 1 postgres postgres 32 Oct 10 22:37 188163825.1 -> /links/temp/postgres/188163825.1
lrwxrwxrwx 1 postgres postgres 32 Oct 10 22:38 188163825.2 -> /links/temp/postgres/188163825.2
lrwxrwxrwx 1 postgres postgres 32 Oct 10 22:39 188163825.3 -> /links/temp/postgres/188163825.3
lrwxrwxrwx 1 postgres postgres 32 Oct 10 22:41 188163825.4 -> /links/temp/postgres/188163825.4
lrwxrwxrwx 1 postgres postgres 32 Oct 10 23:30 246168682 -> /links/tables/postgres/246168682
lrwxrwxrwx 1 postgres postgres 34 Oct 10 23:44 246168682.1 -> /links/tables/postgres/246168682.1
lrwxrwxrwx 1 postgres postgres 34 Oct 11 00:41 246168682.2 -> /links/tables/postgres/246168682.2
lrwxrwxrwx 1 postgres postgres 34 Oct 11 00:41 246168682.3 -> /links/tables/postgres/246168682.3
lrwxrwxrwx 1 postgres postgres 34 Oct 11 00:42 246168682.4 -> /links/tables/postgres/246168682.4
-rw——- 1 postgres postgres 132M Oct 12 03:54 246168683
-rw——- 1 postgres postgres 855M Oct 12 03:55 246168684
-rw——- 1 postgres postgres 871M Oct 12 03:55 246168686
-rw——- 1 postgres postgres 1.0G Oct 11 01:59 246168688
-rw——- 1 postgres postgres 1.0G Oct 11 01:38 246168688.1
-rw——- 1 postgres postgres 1.0G Oct 11 01:54 246168688.2
-rw——- 1 postgres postgres 1.0G Oct 11 01:59 246168688.3
-rw——- 1 postgres postgres 499M Oct 11 01:59 246168688.4
-rw——- 1 postgres postgres 1.0G Oct 11 14:32 246168690
-rw——- 1 postgres postgres 1.0G Oct 12 00:42 246168690.1
-rw——- 1 postgres postgres 52M Oct 12 03:55 246168690.2
-rw——- 1 postgres postgres 1.0G Oct 12 03:52 246168692
-rw——- 1 postgres postgres 750M Oct 12 03:55 246168692.1
-rw——- 1 postgres postgres 1005M Oct 12 03:55 299992259
-rw——- 1 postgres postgres 995M Oct 12 03:55 299992353
-rw——- 1 postgres postgres 130M Oct 12 03:55 301324542
After droping the relations and deleting all trace of them I had the following results. You may notice that the filenames are completely different. I had a major problem on the SCSI disk again. I had to recreate the database using initdb because some of my pg_clog files went missing. To allow me to complete a vacuum I copied the last pg_clog file to the file that was missing. I am pretty sure that this is very dangerous but it allowed me to complete the vacuum on the table. I was hoping to see some more errors but I got none. I completely dropped the database reformatted the hard disk and created a new file system on it. I had originally used the “largefile4” option of mke2fs but I have now left it at default.
]# df -m
File system 1M-blocks Used Available Use% Mounted on
/dev/hdc2 3938 3124 613 84% /
/dev/hdc1 30 9 20 30% /boot
none 505 0 504 0% /dev/shm
/dev/hdc3 3938 2177 1561 59% /usr
/dev/hdc5 8439 2369 5648 30% /links/pg_xlog
/dev/hdb1 9628 2034 7105 23% /links/tables
/dev/hdb2 9605 2007 7110 23% /links/temp
/dev/sda5 17093 7513 8712 47% /links/database
Interesting bits from my base directory
-rw——- 1 postgres postgres 1.0G Oct 12 2003 16992
-rw——- 1 postgres postgres 927M Oct 12 2003 16992.1
-rw——- 1 postgres postgres 121M Oct 12 2003 58021849
-rw——- 1 postgres postgres 783M Oct 12 2003 58021850
-rw——- 1 postgres postgres 752M Oct 12 2003 58021852
-rw——- 1 postgres postgres 1.0G Oct 12 2003 58021854
-rw——- 1 postgres postgres 68M Oct 12 2003 58021854.1
-rw——- 1 postgres postgres 872M Oct 12 2003 58021856
-rw——- 1 postgres postgres 872M Oct 12 2003 58021857
We can see that we have dropped the total size of the database considerably.
links=# select relname, relfilenode, relpages from pg_class order by relpages desc limit 9;
relname | relfilenode | relpages
——————————–+————-+———-
child_links | 16992 | 249791
child_links_pkey | 58021854 | 139723
home_page | 16982 | 116267
parent_url_id_idx | 58021856 | 111577
child_url_id_idx | 58021857 | 111577
home_page_pkey | 58021850 | 100253
home_page_url_key | 58021852 | 96209
hp_url_id_index | 58021849 | 15434
pg_proc_proname_args_nsp_index | 16640 | 125
50 Million links found
6.7 Million unique links found

Leave a Reply

Your email address will not be published. Required fields are marked *