Warning
First off, let me say that I am by no means a kernel hacker. I am just documenting my experiences with the kernel, nothing more.
MSI K8T Neo kernel 2.6 Oops errors.
I bought the MSI K8T Neo motherboard with an AMD Athlon 64 3200+ chip, and whenever I put the filesystem under heavy load it would Oops. I noticed this when trying to restore a postgres database.
I searched high and low for a decent guide on deciphering the
Oops output but could not find much that I could use. After
much rummaging around I found that I needed to save the Oops output to
a file and then use ksymoops to get some info from it. This was a disaster because
I got nothing but warnings and errors about how untrustworthy my findings were. On
further investigation I noticed that you could compile the kernel with
the option
CONFIG_KALLSYMS=y
set in the config file.
What this means is that when the kernel decides to oops it produces
the symbolic output that you would otherwise expect from ksymoops. This was
nice to have and put my mind at rest: the kernel still oopsed with the
option set, so the errors were real and I wasn't going mad.
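Before rebuilding, it can be worth checking whether the kernel you already run was built with the option. A minimal sketch, assuming your distribution keeps a copy of the build configuration in a file such as /boot/config-$(uname -r). That path is an assumption and varies between systems, and has_kallsyms is a name made up for this example.

```shell
# has_kallsyms FILE
# Succeeds (exit status 0) if the kernel config FILE contains CONFIG_KALLSYMS=y.
# has_kallsyms is a hypothetical helper name, not a standard tool.
has_kallsyms() {
    grep -q '^CONFIG_KALLSYMS=y$' "$1"
}

# Example use; the config path is an assumption and differs between distributions.
config="/boot/config-$(uname -r)"
if [ -r "$config" ]; then
    if has_kallsyms "$config"; then
        echo "CONFIG_KALLSYMS is enabled"
    else
        echo "CONFIG_KALLSYMS is not enabled, the kernel needs rebuilding"
    fi
else
    echo "no readable config at $config"
fi
```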
One thing to note is that once you have had an Oops, any further Oops cannot be trusted. This is because you are effectively running a broken system. It might not appear broken but it is; the only thing an Oopsed system is good for is debugging.
How to read an Oops
I would love to find a tutorial on this because I spent ages looking for one and the ones I did find were a bit above my head. I did try to debug the problem I had but I was a bit shocked at the amount of stuff that you need to know just to fart without following through when dealing with the Linux kernel. The kernel is complicated, complicated, complicated and, in case you didn't hear me the first time, complicated. 1600 Pennsylvania Avenue and its occupants can only dream of appearing this complicated. I have to say appearances can be deceiving: in this case 1600 P. Avenue is easy to understand, the kernel unfortunately isn't, or at least this has been my (limited) experience of it.
The first thing you need to work with an oops is the oops itself. I am only
going to talk about oopses that have had their symbols resolved via ksymoops or by
using the
CONFIG_KALLSYMS=y
kernel option. I use the kernel option because I don't need to remember
it, and for someone like me that's a good thing.
Kernel Bugs
As soon as I got the bug I wanted to send it straight to the Linux kernel mailing list on the assumption that someone there would have seen it before, but heeding many warnings, and having seen scorch-marked newbies before, I refrained from this and decided to investigate a little first. This was in part due to my own curiosity; I also didn't want to make an arse of myself posting a bug that was solved in kernel 0.86.
The first thing I looked for was some guidance on how to go about reading the oops output. I found various snippets of text here and there, and from these I gathered that a normal oops output is not in human readable format (when I say human I am disregarding kernel wizards; these people have yet to be classified / categorised). Prodding deeper I discovered that I needed to take the oops file and use a tool to get it into a format that was readable, or at least more readable. This tool is known as "ksymoops" and should be installed on your system. I did not have much luck using ksymoops because I got various warnings about how unreliable my oops was. On further investigation I noticed that I could compile a kernel to give me an oops in a readable format (please see the notes above). I recompiled my kernel with the option set and set about reproducing the oops and, lo and behold, the oops was produced in a different unintelligible format, or at least at first glance this is what it looked like. Below you can see the oops that I got.
Unable to handle kernel paging request at virtual address 00ff0744
printing eip:
c01e5351
*pde = 00000000
Oops: 0000 [#1]
CPU: 0
0060:[generic_make_request+17/384] Not tainted
EFLAGS: 00010282 (2.6.5)
EIP is at generic_make_request+0x11/0x180
eax: 00000202 ebx: 007d8008 ecx: 00ff0740 edx: e8eea300
esi: fb001000 edi: e8eea300 ebp: 00000040 esp: f3ee5d70
ds: 007b es: 007b ss: 0068
Process kjournald (pid: 868, threadinfo=f3ee4000 task=f572b2a0)
Stack: f3ee5d70 f3ee5d70 f3ee5da4 00000082 f1b9f8c0 e8eea300 00000000 00000000
       00000010 c01495ab f7fed8a0 00000010 00000000 e908fbd0 00000001 007d8008
       00000013 00000001 00000040 c01e54fd e8eea300 e908fbd0 c0148fb0 00000001
Call Trace:
[bio_alloc+203/416] bio_alloc+0xcb/0x1a0
[submit_bio+61/112] submit_bio+0x3d/0x70
[ll_rw_block+96/128] ll_rw_block+0x60/0x80
[journal_commit_transaction+3533/4048] journal_commit_transaction+0xdcd/0xfd0
[autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
[autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
[kjournald+180/464] kjournald+0xb4/0x1d0
[autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
[autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
[ret_from_fork+6/20] ret_from_fork+0x6/0x14
[commit_timeout+0/16] commit_timeout+0x0/0x10
[kjournald+0/464] kjournald+0x0/0x1d0
[kernel_thread_helper+5/20] kernel_thread_helper+0x5/0x14
Code: 8b 41 04 c1 ee 09 8b 50 38 8b 40 34 0f ac d0 09 85 c0 89 c3
The oops above appears incomprehensible but it isn't. I knew absolutely nothing about them until today and I managed to track my problem down. I am not saying I know much more about them now but I now know that the line that is of the most interest in the oops above is:
0060:[generic_make_request+17/384] Not tainted
It's this line that tells me where the oops occurred, i.e. what function the scene of the crime took place in. The "Not tainted" bit at the end is a signal for kernel hackers to easily check whether I have been loading proprietary drivers or doing some odd non-standard stuff to the kernel. If the kernel is tainted it has gone off the beaten track and gets much harder to debug. Some kernel hackers won't help you if your kernel has been tainted.
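When an oops has been saved to a file, the same information can be pulled out mechanically. A small sketch, assuming the oops contains an "EIP is at" line like the ones on this page (i386 kernels of this vintage; other architectures and other kernel versions word it differently). oops_fault_site is a name invented for this example.

```shell
# oops_fault_site: read oops text on stdin and print
# "function offset function_length" taken from the "EIP is at" line.
# (Hypothetical helper; assumes the i386 2.6-style wording shown on this page.)
oops_fault_site() {
    sed -n 's|^.*EIP is at \([A-Za-z0-9_.]*\)+\(0x[0-9a-f]*\)/\(0x[0-9a-f]*\).*$|\1 \2 \3|p'
}

# Example with the relevant line of the first oops.
printf '%s\n' 'EIP is at generic_make_request+0x11/0x180' | oops_fault_site
```

The first number is how far into the function the faulting instruction sits and the second is the length of the function. 0x11/0x180 in hex is the same as the 17/384 shown in the bracketed form.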
The next job was to try and locate where the function was being called from. To do this I issued the following commands
cd /usr/src/linux/
grep -r "generic_make_request" *
This produced a lot of output but only one of the files will give a line similar to
drivers/block/ll_rw_blk.c:EXPORT_SYMBOL(generic_make_request);
This is where the function belongs, i.e. the file in which it is defined and exported. It was at this point that I got a bit stuck. I know enough C to be dangerous, but anyone who can code "Hello World" in C is dangerous. C is a firm supporter of the 2nd amendment and your right to bear arms; how you use them is entirely up to you. C is just the arms dealer. I rummaged around in
/usr/src/linux/drivers/block/ll_rw_blk.c
for a while until I realised that I was getting nowhere, even after having looked at various header files etc.
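Searching only for the EXPORT_SYMBOL line cuts the noise from the grep considerably. A sketch under assumptions: find_export is a made-up helper, it relies on grep's -r and --include options (present in GNU grep), and it only finds functions that are exported at all; a static function has no such line.

```shell
# find_export SRCDIR SYMBOL
# List the .c files under SRCDIR that export SYMBOL via EXPORT_SYMBOL or
# EXPORT_SYMBOL_GPL. (Hypothetical helper; needs a grep with -r and --include.)
find_export() {
    grep -rlE --include='*.c' "EXPORT_SYMBOL(_GPL)?\($2\)" "$1"
}

# Example: find_export /usr/src/linux generic_make_request
```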
Repeating an Oops
I read somewhere that repeating an oops can sometimes be quite a good way of tracking it down. Even though I had had this problem several times I had not actually compared several oopses to see if they were all similar or occurring in the same function, so this is what I did next. I rebooted the machine and re-created the oops. The next oops can be seen below
Unable to handle kernel paging request at virtual address 00650d50
printing eip:
c0161fb4
*pde = 00000000
Oops: 0000 [#1]
CPU: 0
EIP: 0060:[mpage_writepage+116/1344] Not tainted
EFLAGS: 00010246 (2.6.5)
EIP is at mpage_writepage+0x74/0x540
eax: 2000102d ebx: 00000000 ecx: 0000000c edx: 00650d50
esi: c10a9998 edi: 00650d50 ebp: f753e180 esp: c1ba9d10
ds: 007b es: 007b ss: 0068
Process pdflush (pid: 6, threadinfo=c1ba8000 task=c1bab700)
Stack: eaf57800 c10a99c0 00001000 00000000 00000000 00000000 00000000 00000000
       00000000 00000001 ec79e6c0 00000001 0000000c f753e20c ec79e6c0 4c5a0d66
       0000e2c2 99f3269a 00000071 c1ba9d8c 00000082 00000001 c0112c3d 00000000
Call Trace:
[scheduler_tick+109/1296] scheduler_tick+0x6d/0x510
[schedule+740/1280] schedule+0x2e4/0x500
[mpage_writepages+596/704] mpage_writepages+0x254/0x2c0
[ext2_get_block+0/880] ext2_get_block+0x0/0x370
[ext2_writepages+31/48] ext2_writepages+0x1f/0x30
[ext2_get_block+0/880] ext2_get_block+0x0/0x370
[do_writepages+30/64] do_writepages+0x1e/0x40
[__sync_single_inode+169/480] __sync_single_inode+0xa9/0x1e0
[sync_sb_inodes+331/496] sync_sb_inodes+0x14b/0x1f0
[writeback_inodes+51/80] writeback_inodes+0x33/0x50
[background_writeout+123/192] background_writeout+0x7b/0xc0
[pdflush+0/48] pdflush+0x0/0x30
[__pdflush+159/336] __pdflush+0x9f/0x150
[pdflush+40/48] pdflush+0x28/0x30
[background_writeout+0/192] background_writeout+0x0/0xc0
[pdflush+0/48] pdflush+0x0/0x30
[kthread+165/176] kthread+0xa5/0xb0
[kthread+0/176] kthread+0x0/0xb0
[kernel_thread_helper+5/20] kernel_thread_helper+0x5/0x14
Code: 8b 02 a8 04 0f 85 f2 02 00 00 8b 02 a8 10 0f 85 8c 02 00 00
I noticed a major difference between this one and the last. The oopses were being generated in different functions. This was a bit odd and not what I expected at all. I would have expected the two oopsen to look at least vaguely similar but they didn't. This did not bode well for me because it suggests a spurious error, which is normally hardware related.
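With a few oopses saved to files, this comparison can be done in one go. A rough sketch, assuming each oops was saved to its own file and contains an "EIP is at" line; the name oops_sites and the file names in the example are invented, and the -o and -h options need a grep that supports them (GNU grep does).

```shell
# oops_sites FILE...
# Print each distinct faulting function found in the given oops files,
# preceded by how many times it occurred. (Hypothetical helper.)
oops_sites() {
    grep -oh 'EIP is at [A-Za-z0-9_.]*' "$@" | sed 's/^EIP is at //' | sort | uniq -c
}

# Example: oops_sites oops1.txt oops2.txt oops3.txt
```

One function with a high count points at something repeatable in that code. A different function every time hints at something more random.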
Reproducing the oops again gave me
Unable to handle kernel paging request at virtual address 000e1b58
printing eip:
c0133e61
*pde = 00000000
Oops: 0002 [#1]
CPU: 0
EIP: 0060:[activate_page+49/128] Not tainted
EFLAGS: 00010046 (2.6.5)
EIP is at activate_page+0x31/0x80
eax: c17f7a50 ebx: c10ebe60 ecx: c10ebe78 edx: 000e1b58
esi: c0300fd8 edi: c10ebe60 ebp: d1883ae0 esp: d588fd68
ds: 007b es: 007b ss: 0068
Process postmaster (pid: 823, threadinfo=d588e000 task=df3cacc0)
Stack: c10ebe60 00001000 c0133ed8 00000000 c012df8b d93a70c0 c10ebe60 00000000
       00001000 d588fdf4 00000001 00000001 00000337 c62a5c40 c014a1fd c1b49600
       00000000 00000001 00001000 00000000 d588fdf4 00000000 d1883a54 40a10d00
Call Trace:
[mark_page_accessed+40/48] mark_page_accessed+0x28/0x30
[generic_file_aio_write_nolock+1099/2672] generic_file_aio_write_nolock+0x44b/0xa70
[bio_hw_segments+45/48] bio_hw_segments+0x2d/0x30
[scheduler_tick+31/1296] scheduler_tick+0x1f/0x510
[buffered_rmqueue+191/352] buffered_rmqueue+0xbf/0x160
[update_process_times+70/96] update_process_times+0x46/0x60
[update_wall_time+11/64] update_wall_time+0xb/0x40
[do_timer+223/240] do_timer+0xdf/0xf0
[generic_file_aio_write+119/160] generic_file_aio_write+0x77/0xa0
[ext3_file_write+68/192] ext3_file_write+0x44/0xc0
[do_sync_write+139/192] do_sync_write+0x8b/0xc0
[permission+70/80] permission+0x46/0x50
[permission+70/80] permission+0x46/0x50
[get_empty_filp+104/224] get_empty_filp+0x68/0xe0
[update_process_times+70/96] update_process_times+0x46/0x60
[dentry_open+282/432] dentry_open+0x11a/0x1b0
[filp_open+98/112] filp_open+0x62/0x70
[do_sync_write+0/192] do_sync_write+0x0/0xc0
[vfs_write+184/304] vfs_write+0xb8/0x130
[sys_write+66/112] sys_write+0x42/0x70
[syscall_call+7/11] syscall_call+0x7/0xb
Code: 89 02 c7 41 04 00 02 20 00 c7 43 18 00 01 10 00 ff 4e 2c 0f
This is another oops which appears completely different from the last one, so I started to think that my problem was hardware related and not a kernel problem at all. It was at this point that I decided to send a bug report to the kernel mailing list. I can hear some of you ask why I sent it if I knew it was hardware. The answer is that I didn't know it was hardware; I was just guessing based on prior experience of electronics. The bug report is at the bottom of the page.
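One detail that does differ in a readable way between the three is the number after "Oops:". On i386 this is the page fault error code. A small sketch, assuming the usual meaning of its low three bits (bit 0 set for a protection fault rather than a missing page, bit 1 set for a write rather than a read, bit 2 set for user mode rather than kernel mode); decode_oops_code is an invented name.

```shell
# decode_oops_code CODE
# Explain the hex number printed after "Oops:" on i386, assuming the usual
# page fault error code layout: bit 0 = protection fault (else page not present),
# bit 1 = write (else read), bit 2 = user mode (else kernel mode).
# (decode_oops_code is a hypothetical helper name.)
decode_oops_code() {
    code=$((0x$1))
    if [ $((code & 1)) -ne 0 ]; then what="protection fault"; else what="page not present"; fi
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $((code & 4)) -ne 0 ]; then mode="user mode"; else mode="kernel mode"; fi
    echo "$access, $what, $mode"
}

decode_oops_code 0000   # the first two oopses
decode_oops_code 0002   # the third oops
```

Read this way, the first two oopses were reads and the third was a write, all from kernel mode to addresses with no page behind them.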
I got a couple of replies directly from people, one of whom suggested using memtest86+ to test my memory. I had already downloaded it so I decided to give it a whirl. To start memtest86+ you need to configure lilo to boot it rather than the kernel. The following lines are what I added to /etc/lilo.conf
image=/boot/memtest86+.bin
label=memtest86
Remember to change the "default=label_name" entry to use the new image, and run lilo afterwards so that the change is written to the boot loader. Do not reboot the machine unless you have a working rescue disk or boot disk, otherwise you may not be able to get back into the machine.
Memtest86 errors
Please be aware that just because memtest finds errors it does not necessarily mean you have dodgy RAM. You could have a bad motherboard or chipset, or the RAM may not be seated correctly. When I ran memtest86 I got errors during tests 5 and 6. This was disappointing because hardware bugs are a bit more terminal than software bugs and usually involve spending more money or wrangling with your supplier to get the parts replaced. I ran memtest a few times and got errors during the same tests without fail, so I resigned myself to the fact that my memory was dodgy. This is when I noticed something quite obvious that I should have noticed before. The RAM speed was showing DDR333. This rang alarm bells with me: I had bought the RAM quite a while ago and was pretty sure that it was slow stuff, i.e. DDR266 or something similar. I rebooted the machine and found the DDR timings, which were set to "Auto". I reduced the timing to DDR300, reran memtest and, lo and behold, no more errors.
Below is a bug report I sent to the linux kernel mailing list.
I have not submitted a bug report before so I hope this is enough information. If any more is required please let me know.

[1] Getting Oops's during heavy filesystem access

[2] I initially thought this was a hardware problem because I was trying to use the SATA on a MSI K8T Neo motherboard. I switched to using normal IDE disks and got another Oops using the 2.6.5 kernel. I reverted back to the old binary 2.2.20 kernel and tried to reproduce the problem but was unable to.

[3] IDE SATA

[4.] kernel 2.6.5

[5.] I have had three separate Oops all of which look completely different, at least to me. I have only included the first Oops from each occurrence.

Unable to handle kernel paging request at virtual address 00ff0744
printing eip:
c01e5351
*pde = 00000000
Oops: 0000 [#1]
CPU: 0
0060:[generic_make_request+17/384] Not tainted
EFLAGS: 00010282 (2.6.5)
EIP is at generic_make_request+0x11/0x180
eax: 00000202 ebx: 007d8008 ecx: 00ff0740 edx: e8eea300
esi: fb001000 edi: e8eea300 ebp: 00000040 esp: f3ee5d70
ds: 007b es: 007b ss: 0068
Process kjournald (pid: 868, threadinfo=f3ee4000 task=f572b2a0)
Stack: f3ee5d70 f3ee5d70 f3ee5da4 00000082 f1b9f8c0 e8eea300 00000000 00000000
       00000010 c01495ab f7fed8a0 00000010 00000000 e908fbd0 00000001 007d8008
       00000013 00000001 00000040 c01e54fd e8eea300 e908fbd0 c0148fb0 00000001
Call Trace:
[bio_alloc+203/416] bio_alloc+0xcb/0x1a0
[submit_bio+61/112] submit_bio+0x3d/0x70
[ll_rw_block+96/128] ll_rw_block+0x60/0x80
[journal_commit_transaction+3533/4048] journal_commit_transaction+0xdcd/0xfd0
[autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
[autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
[kjournald+180/464] kjournald+0xb4/0x1d0
[autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
[autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
[ret_from_fork+6/20] ret_from_fork+0x6/0x14
[commit_timeout+0/16] commit_timeout+0x0/0x10
[kjournald+0/464] kjournald+0x0/0x1d0
[kernel_thread_helper+5/20] kernel_thread_helper+0x5/0x14
Code: 8b 41 04 c1 ee 09 8b 50 38 8b 40 34 0f ac d0 09 85 c0 89 c3

Unable to handle kernel paging request at virtual address 00650d50
printing eip:
c0161fb4
*pde = 00000000
Oops: 0000 [#1]
CPU: 0
EIP: 0060:[mpage_writepage+116/1344] Not tainted
EFLAGS: 00010246 (2.6.5)
EIP is at mpage_writepage+0x74/0x540
eax: 2000102d ebx: 00000000 ecx: 0000000c edx: 00650d50
esi: c10a9998 edi: 00650d50 ebp: f753e180 esp: c1ba9d10
ds: 007b es: 007b ss: 0068
Process pdflush (pid: 6, threadinfo=c1ba8000 task=c1bab700)
Stack: eaf57800 c10a99c0 00001000 00000000 00000000 00000000 00000000 00000000
       00000000 00000001 ec79e6c0 00000001 0000000c f753e20c ec79e6c0 4c5a0d66
       0000e2c2 99f3269a 00000071 c1ba9d8c 00000082 00000001 c0112c3d 00000000
Call Trace:
[scheduler_tick+109/1296] scheduler_tick+0x6d/0x510
[schedule+740/1280] schedule+0x2e4/0x500
[mpage_writepages+596/704] mpage_writepages+0x254/0x2c0
[ext2_get_block+0/880] ext2_get_block+0x0/0x370
[ext2_writepages+31/48] ext2_writepages+0x1f/0x30
[ext2_get_block+0/880] ext2_get_block+0x0/0x370
[do_writepages+30/64] do_writepages+0x1e/0x40
[__sync_single_inode+169/480] __sync_single_inode+0xa9/0x1e0
[sync_sb_inodes+331/496] sync_sb_inodes+0x14b/0x1f0
[writeback_inodes+51/80] writeback_inodes+0x33/0x50
[background_writeout+123/192] background_writeout+0x7b/0xc0
[pdflush+0/48] pdflush+0x0/0x30
[__pdflush+159/336] __pdflush+0x9f/0x150
[pdflush+40/48] pdflush+0x28/0x30
[background_writeout+0/192] background_writeout+0x0/0xc0
[pdflush+0/48] pdflush+0x0/0x30
[kthread+165/176] kthread+0xa5/0xb0
[kthread+0/176] kthread+0x0/0xb0
[kernel_thread_helper+5/20] kernel_thread_helper+0x5/0x14
Code: 8b 02 a8 04 0f 85 f2 02 00 00 8b 02 a8 10 0f 85 8c 02 00 00

Unable to handle kernel paging request at virtual address 000e1b58
printing eip:
c0133e61
*pde = 00000000
Oops: 0002 [#1]
CPU: 0
EIP: 0060:[activate_page+49/128] Not tainted
EFLAGS: 00010046 (2.6.5)
EIP is at activate_page+0x31/0x80
eax: c17f7a50 ebx: c10ebe60 ecx: c10ebe78 edx: 000e1b58
esi: c0300fd8 edi: c10ebe60 ebp: d1883ae0 esp: d588fd68
ds: 007b es: 007b ss: 0068
Process postmaster (pid: 823, threadinfo=d588e000 task=df3cacc0)
Stack: c10ebe60 00001000 c0133ed8 00000000 c012df8b d93a70c0 c10ebe60 00000000
       00001000 d588fdf4 00000001 00000001 00000337 c62a5c40 c014a1fd c1b49600
       00000000 00000001 00001000 00000000 d588fdf4 00000000 d1883a54 40a10d00
Call Trace:
[mark_page_accessed+40/48] mark_page_accessed+0x28/0x30
[generic_file_aio_write_nolock+1099/2672] generic_file_aio_write_nolock+0x44b/0xa70
[bio_hw_segments+45/48] bio_hw_segments+0x2d/0x30
[scheduler_tick+31/1296] scheduler_tick+0x1f/0x510
[buffered_rmqueue+191/352] buffered_rmqueue+0xbf/0x160
[update_process_times+70/96] update_process_times+0x46/0x60
[update_wall_time+11/64] update_wall_time+0xb/0x40
[do_timer+223/240] do_timer+0xdf/0xf0
[generic_file_aio_write+119/160] generic_file_aio_write+0x77/0xa0
[ext3_file_write+68/192] ext3_file_write+0x44/0xc0
[do_sync_write+139/192] do_sync_write+0x8b/0xc0
[permission+70/80] permission+0x46/0x50
[permission+70/80] permission+0x46/0x50
[get_empty_filp+104/224] get_empty_filp+0x68/0xe0
[update_process_times+70/96] update_process_times+0x46/0x60
[dentry_open+282/432] dentry_open+0x11a/0x1b0
[filp_open+98/112] filp_open+0x62/0x70
[do_sync_write+0/192] do_sync_write+0x0/0xc0
[vfs_write+184/304] vfs_write+0xb8/0x130
[sys_write+66/112] sys_write+0x42/0x70
[syscall_call+7/11] syscall_call+0x7/0xb
Code: 89 02 c7 41 04 00 02 20 00 c7 43 18 00 01 10 00 ff 4e 2c 0f

[6] I found the problem while restoring a database
cat database.gz | gunzip | psql dbname
where database.gz is a 600Mb file

[7] Debian sarge ( mild and sunny ;-)

[7.1]
debian:~# /usr/src/kernel-source-2.6.5/scripts/ver_linux
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.

Linux debian 2.6.5 #1 Sun Apr 25 19:53:20 BST 2004 i686 GNU/Linux

Gnu C                  3.3.3
Gnu make               3.80
binutils               2.14.90.0.7
util-linux             2.12
mount                  2.12
module-init-tools      3.0-pre10
e2fsprogs              1.35
pcmcia-cs              3.2.5
PPP                    2.4.2
Linux C Library        2.3.2
Dynamic linker (ldd)   2.3.2
Procps                 3.2.0
Net-tools              1.60
Console-tools          0.2.3
Sh-utils               5.0.91
Modules Loaded         tulip crc32 af_packet

[7.2.]
debian:~# cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 4
model name : AMD Athlon(tm) 64 Processor 3200+
stepping : 8
cpu MHz : 2001.027
cache size : 1024 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall mmxext lm 3dnowext 3dnow
bogomips : 3940.35

[7.3.]
debian:~# cat /proc/modules
tulip 36640 0 - Live 0xf88bd000
crc32 3840 1 tulip, Live 0xf88a8000
af_packet 12552 2 - Live 0xf88aa000

[7.4.]
debian:~# cat /proc/ioports
0000-001f : dma1
0020-0021 : pic1
0040-005f : timer
0060-006f : keyboard
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
01f0-01f7 : ide0
0376-0376 : ide1
03c0-03df : vga+
03f6-03f6 : ide0
0cf8-0cff : PCI conf1
bc00-bcff : 0000:00:11.5
c000-c0ff : 0000:00:0f.0
c400-c40f : 0000:00:0f.0
  c400-c407 : ide2
  c408-c40f : ide3
c800-c803 : 0000:00:0f.0
  c802-c802 : ide3
cc00-cc07 : 0000:00:0f.0
  cc00-cc07 : ide3
d000-d003 : 0000:00:0f.0
d400-d407 : 0000:00:0f.0
d800-d87f : 0000:00:0e.0
dc00-dcff : 0000:00:0b.0
e000-e0ff : 0000:00:07.0
  e000-e0ff : tulip
e400-e47f : 0000:00:0d.0
  e400-e47f : sata_promise
e800-e80f : 0000:00:0d.0
  e800-e80f : sata_promise
ec00-ec3f : 0000:00:0d.0
  ec00-ec3f : sata_promise
fc00-fc0f : 0000:00:0f.1
  fc00-fc07 : ide0
  fc08-fc0f : ide1

debian:~# cat /proc/iomem
00000000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000cc800-000cd7ff : Extension ROM
000e0000-000effff : Extension ROM
000f0000-000fffff : System ROM
00100000-3ffeffff : System RAM
  00100000-002b24f6 : Kernel code
  002b24f7-0033d13f : Kernel data
3fff0000-3fff7fff : ACPI Tables
3fff8000-3fffffff : ACPI Non-volatile Storage
bdc00000-cdbfffff : PCI Bus #01
  c0000000-c7ffffff : 0000:01:00.0
cdd00000-cfdfffff : PCI Bus #01
  ce000000-ceffffff : 0000:01:00.0
cff60000-cff7ffff : 0000:00:0d.0
  cff60000-cff7ffff : sata_promise
cfffe000-cfffefff : 0000:00:0d.0
  cfffe000-cfffefff : sata_promise
cffff000-cffff7ff : 0000:00:0e.0
cffffe00-cffffeff : 0000:00:0b.0
cfffff00-cfffffff : 0000:00:07.0
  cfffff00-cfffffff : tulip
d0000000-d1ffffff : 0000:00:00.0
fec00000-fec00fff : reserved
fee00000-fee00fff : reserved
fff80000-ffffffff : reserved

[7.5.]
0000:00:00.0 Host bridge: VIA Technologies, Inc. VT8385 [K8T800 AGP] Host Bridge (rev 01)
  Subsystem: VIA Technologies, Inc. VT8385 [K8T800 AGP] Host Bridge
  Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
  Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-SERR-
  Capabilities: [c0] #08 [0060]
  Capabilities: [68] Power Management version 2
    Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
    Status: D0 PME-Enable- DSel=0 DScale=0 PME-
  Capabilities: [58] #08 [8001]

0000:00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI bridge [K8T800 South] (prog-if 00 [Normal decode])
  Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
  Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- Reset- FastB2B-
  Capabilities: [80] Power Management version 2
    Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
    Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:00:07.0 Ethernet controller: Lite-On Communications Inc LNE100TX (rev 20)
  Subsystem: Netgear FA310TX
  Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
  Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR-

[7.6.]
debian:~# cat /proc/scsi/scsi
Attached devices:

[X.] Since I am unable to reproduce the problem with the old binary 2.2.20 on normal IDE disks but I can on the same disks when using the 2.6.5 compiled kernel I am making the wild assumption that it is not hardware related. I tried to find roughly where the problem was using
objdump -d /mnt/hdc2/usr/src/kernel-source-2.6.4/drivers/block/ll_rw_blk.o
objdump -d /usr/src/linux/fs/bio.o
objdump -d fs/mpage.o
and trying to use the offsets from the oops to see where the problem was but I was unable to locate the offset in each of the files. This is probably more my inexperience than anything else. If there is a decent tutorial on how to do this sort of thing I would appreciate a pointer or two. So far the only thing I can think of is that my compiler is dodgy or I am having spurious memory problems.

yours
Harry
The more observant among you will have noticed that it is fairly lengthy. This is because I followed the kernel bug reporting procedure and tried to provide as much information as possible. If you want a reply you would do well to follow the guidelines and help the kernel hackers as much as you can; they are busy people.
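On the objdump attempt mentioned in the report: in the disassembly of an object file each function starts at a labelled address, and the faulting instruction should sit at that address plus the offset from the "EIP is at" line. A minimal sketch of the arithmetic only. The start address 340 used below is an invented value for illustration, not read from a real build, and fault_address is an invented name.

```shell
# fault_address START OFFSET
# Add the offset from the oops to the function's start address as shown by
# objdump -d, and print the result in hex. START is hex without a 0x prefix,
# OFFSET is written as in the oops (for example 0x74).
# (Hypothetical helper; the start address below is a made-up example value.)
fault_address() {
    printf '%x\n' $((0x$1 + $2))
}

fault_address 340 0x74
```

If the object file really matches the running kernel, the instruction at the printed address in the objdump output is the one to look at.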