Comment 8 for bug 230878

Léobaillard (leobaillard) wrote :

I launched the same ab test, but the machine froze before I could see the results. However, the freeze itself did not produce an OOM; the OOM came later, but not within the 4-hour limit.

Here is the initial OOM of apache2, which seems to have taken down other processes before locking up the machine completely:

May 20 01:30:00 yoda kernel: [347664.832222] apache2 invoked oom-killer: gfp_mask=0x1201d2, order=0, oomkilladj=-17
May 20 01:30:00 yoda kernel: [347664.832229] Pid: 6899, comm: apache2 Not tainted 2.6.24-16-server #1
May 20 01:30:00 yoda kernel: [347664.832246] [oom_kill_process+0x10a/0x120] oom_kill_process+0x10a/0x120
May 20 01:30:00 yoda kernel: [347664.832274] [out_of_memory+0x167/0x1a0] out_of_memory+0x167/0x1a0
May 20 01:30:01 yoda kernel: [347664.832299] [agpgart:__alloc_pages+0x34c/0x380] __alloc_pages+0x34c/0x380
May 20 01:30:03 yoda kernel: [347664.832334] [__do_page_cache_readahead+0x11d/0x240] __do_page_cache_readahead+0x11d/0x240
May 20 01:30:03 yoda kernel: [347664.832337] [sync_page+0x0/0x40] sync_page+0x0/0x40
May 20 01:30:03 yoda kernel: [347664.832368] [do_page_cache_readahead+0x4c/0x70] do_page_cache_readahead+0x4c/0x70
May 20 01:30:03 yoda kernel: [347664.832380] [filemap_fault+0x2f2/0x420] filemap_fault+0x2f2/0x420
May 20 01:30:05 yoda kernel: [347664.832387] [kmap_atomic_prot+0xfc/0x130] kmap_atomic_prot+0xfc/0x130
May 20 01:30:06 yoda kernel: [347664.832417] [__do_fault+0x83/0x4c0] __do_fault+0x83/0x4c0
May 20 01:30:06 yoda kernel: [347664.832445] [kmap_atomic_prot+0xfc/0x130] kmap_atomic_prot+0xfc/0x130
May 20 01:30:06 yoda kernel: [347664.832474] [handle_mm_fault+0x21b/0xb80] handle_mm_fault+0x21b/0xb80
May 20 01:30:06 yoda kernel: [347664.832543] [set_process_cpu_timer+0xa6/0xc0] set_process_cpu_timer+0xa6/0xc0
May 20 01:30:31 yoda kernel: [347664.832574] [do_page_fault+0x143/0x900] do_page_fault+0x143/0x900
May 20 01:30:32 yoda kernel: [347664.832580] [do_sigaction+0x65/0x170] do_sigaction+0x65/0x170
May 20 01:30:33 yoda kernel: [347664.832598] [recalc_sigpending+0xb/0x40] recalc_sigpending+0xb/0x40
May 20 01:30:33 yoda kernel: [347664.832609] [sys_rt_sigprocmask+0xed/0x110] sys_rt_sigprocmask+0xed/0x110
May 20 01:30:33 yoda kernel: [347664.832620] [do_page_fault+0x0/0x900] do_page_fault+0x0/0x900
May 20 01:30:33 yoda kernel: [347664.832624] [error_code+0x72/0x78] error_code+0x72/0x78
May 20 01:30:33 yoda kernel: [347664.832662] =======================
May 20 01:30:33 yoda kernel: [347664.832663] Mem-info:
May 20 01:30:33 yoda kernel: [347664.832665] DMA per-cpu:
May 20 01:30:33 yoda kernel: [347664.832667] CPU 0: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
May 20 01:30:33 yoda kernel: [347664.832669] Normal per-cpu:
May 20 01:30:33 yoda kernel: [347664.832670] CPU 0: Hot: hi: 186, btch: 31 usd: 177 Cold: hi: 62, btch: 15 usd: 58
May 20 01:30:33 yoda kernel: [347664.832672] HighMem per-cpu:
May 20 01:30:33 yoda kernel: [347664.832674] CPU 0: Hot: hi: 42, btch: 7 usd: 3 Cold: hi: 14, btch: 3 usd: 2
May 20 01:30:33 yoda kernel: [347664.832677] Active:195460 inactive:49533 dirty:0 writeback:0 unstable:0
May 20 01:30:33 yoda kernel: [347664.832678] free:2987 slab:3693 mapped:11 pagetables:2018 bounce:0
May 20 01:30:33 yoda kernel: [347664.832681] DMA free:4056kB min:68kB low:84kB high:100kB active:4588kB inactive:3344kB present:16256kB pages_scanned:13377 all_unreclaimable? yes
May 20 01:30:33 yoda kernel: [347664.832683] lowmem_reserve[]: 0 873 999 999
May 20 01:30:33 yoda kernel: [347664.832686] Normal free:7772kB min:3744kB low:4680kB high:5616kB active:724200kB inactive:126740kB present:894080kB pages_scanned:2424320 all_unreclaimable? yes
May 20 01:30:33 yoda kernel: [347664.832688] lowmem_reserve[]: 0 0 1014 1014
May 20 01:30:33 yoda kernel: [347664.832691] HighMem free:120kB min:128kB low:264kB high:400kB active:53052kB inactive:68048kB present:129796kB pages_scanned:604459 all_unreclaimable? yes
May 20 01:30:33 yoda kernel: [347664.832694] lowmem_reserve[]: 0 0 0 0
May 20 01:30:33 yoda kernel: [347664.832696] DMA: 2*4kB 2*8kB 0*16kB 0*32kB 3*64kB 0*128kB 3*256kB 2*512kB 0*1024kB 1*2048kB 0*4096kB = 4056kB
May 20 01:30:33 yoda kernel: [347664.832701] Normal: 485*4kB 1*8kB 8*16kB 4*32kB 3*64kB 6*128kB 4*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 7772kB
May 20 01:30:33 yoda kernel: [347664.832706] HighMem: 2*4kB 0*8kB 1*16kB 1*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 120kB
May 20 01:30:33 yoda kernel: [347664.832712] Swap cache: add 253027, delete 253022, find 609448/615602, race 0+12
May 20 01:30:33 yoda kernel: [347664.832714] Free swap = 0kB
May 20 01:30:33 yoda kernel: [347664.832715] Total swap = 779144kB
May 20 01:30:33 yoda kernel: [347664.832716] Free swap: 0kB
May 20 01:30:33 yoda kernel: [347664.836648] 262080 pages of RAM
May 20 01:30:33 yoda kernel: [347664.836650] 32704 pages of HIGHMEM
May 20 01:30:33 yoda kernel: [347664.836651] 3318 reserved pages
May 20 01:30:33 yoda kernel: [347664.836652] 12389 pages shared
May 20 01:30:33 yoda kernel: [347664.836653] 5 pages swap cached
May 20 01:30:33 yoda kernel: [347664.836654] 0 pages dirty
May 20 01:30:33 yoda kernel: [347664.836655] 0 pages writeback
May 20 01:30:33 yoda kernel: [347664.836656] 11 pages mapped
May 20 01:30:33 yoda kernel: [347664.836657] 3693 pages slab
May 20 01:30:33 yoda kernel: [347664.836658] 2018 pages pagetables
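
To make the Mem-info dump above easier to read, here is a minimal Python sketch that compares each zone's free memory with its "min" watermark. It assumes the three zone lines are copied by hand from the log with the syslog prefix stripped, and that a page is 4 kB (which the 4kB entries in the free lists suggest). The names ZONE_LINES, parse_zone and PAGE_KB are purely illustrative and not part of any tool.

```python
import re

# Zone summary lines copied from the log above (syslog prefix stripped).
ZONE_LINES = [
    "DMA free:4056kB min:68kB low:84kB high:100kB active:4588kB "
    "inactive:3344kB present:16256kB pages_scanned:13377 all_unreclaimable? yes",
    "Normal free:7772kB min:3744kB low:4680kB high:5616kB active:724200kB "
    "inactive:126740kB present:894080kB pages_scanned:2424320 all_unreclaimable? yes",
    "HighMem free:120kB min:128kB low:264kB high:400kB active:53052kB "
    "inactive:68048kB present:129796kB pages_scanned:604459 all_unreclaimable? yes",
]

# Matches "name:<number>kB" pairs such as "free:4056kB".
FIELD_RE = re.compile(r"(\w+):(\d+)kB")

PAGE_KB = 4  # assumption: 4 kB pages


def parse_zone(line):
    """Return (zone_name, {field: kB}) for one zone summary line."""
    name = line.split()[0]
    fields = {key: int(value) for key, value in FIELD_RE.findall(line)}
    fields["all_unreclaimable"] = line.rstrip().endswith("yes")
    return name, fields


zones = dict(parse_zone(line) for line in ZONE_LINES)

for name, f in zones.items():
    state = "BELOW min" if f["free"] < f["min"] else "above min"
    print(f"{name:8s} free={f['free']}kB min={f['min']}kB -> {state}")

# "262080 pages of RAM" from the log, converted to kB under the 4 kB assumption.
ram_kb = 262080 * PAGE_KB
print(f"RAM: {ram_kb} kB, swap: 779144 kB total, 0 kB free")
```

With the values from this log, HighMem is the only zone whose free memory (120kB) is below its min watermark (128kB), all three zones are flagged all_unreclaimable, and swap is completely used.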

We can see here that it is 1:30 AM. I launched the last ab test at approximately 00:10, so this is 1h20 later. This problem is getting very weird. I'm desperate to find a solution because it's becoming urgent: I'm hosting about 100 websites and I've been forced to launch another httpd to display an apology message.

I'm attaching to this message the last screenshot of htop, taken at 00:12, before the SSH connection got stuck.

Good luck finding out what the problem is, and thanks a lot. I really need your help :)