Kernel Oops - BUG: unable to handle kernel NULL pointer dereference at (null); EIP is at radeon_suspend_kms+0x78/0x1e0 [radeon]
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux (Ubuntu) | Expired | Undecided | Unassigned | | |
Bug Description
XZ
ProblemType: KernelOops
DistroRelease: Ubuntu 11.04
Package: linux-image-
Regression: Yes
Reproducible: Yes
ProcVersionSign
Uname: Linux 2.6.38-8-generic i686
NonfreeKernelMo
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.23.
Annotation: Your system might become unstable now and might need to be restarted.
Architecture: i386
ArecordDevices:
**** List of CAPTURE Hardware Devices ****
card 0: Intel [HDA Intel], device 0: ALC272 Analog [ALC272 Analog]
Subdevices: 1/1
Subdevice #0: subdevice #0
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/
/dev/snd/pcmC0D0p: phobos 1604 F...m pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
Card hw:0 'Intel'/'HDA Intel at 0xf0800000 irq 47'
Mixer name : 'Realtek ALC272'
Components : 'HDA:10ec0272,
Controls : 19
Simple ctrls : 11
Card1.Amixer.info:
Card hw:1 'HDMI'/'HDA ATI HDMI at 0xf0410000 irq 48'
Mixer name : 'ATI R6xx HDMI'
Components : 'HDA:1002aa01,
Controls : 4
Simple ctrls : 1
Card1.Amixer.
Simple mixer control 'IEC958',0
Capabilities: pswitch pswitch-joined penum
Playback channels: Mono
Mono: Playback [on]
Date: Sun Apr 10 17:04:47 2011
Failure: oops
HibernationDevice: RESUME=
MachineType: LENOVO 20032
ProcKernelCmdLine: BOOT_IMAGE=
RelatedPackageV
linux-
linux-
linux-firmware 1.50
SourcePackage: linux
Title: BUG: unable to handle kernel NULL pointer dereference at (null)
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 01/20/2010
dmi.bios.vendor: LENOVO
dmi.bios.version: 22CN35WW(V2.02)
dmi.board.name: NIUR1
dmi.board.vendor: LENOVO
dmi.board.version: REFERENCE
dmi.chassis.
dmi.chassis.type: 10
dmi.chassis.vendor: No Enclosure
dmi.chassis.
dmi.modalias: dmi:bvnLENOVO:
dmi.product.name: 20032
dmi.product.
dmi.sys.vendor: LENOVO
summary: |
BUG: unable to handle kernel NULL pointer dereference at (null) + radeon_suspend_kms+0x78/0x1e0 [radeon] from + radeon_switcheroo_set_state+0x4b/0xa0 |
Changed in linux (Ubuntu): | |
status: | New → Confirmed |
tags: | added: radeon |
tags: | added: kernel-driver-radeon |
A friend of mine (Mayank Rungta), whom I helped crack a radeon driver Oops on suspend (bug 820746), pointed me to this bug:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/820746/
For some reason he wanted me to take a look at this issue as well, since it has been inactive for some time :)
Below is a detailed RCA, plus some action items for whoever reproduced this. It is evident that 3 connectors (or displays) were attached to the radeon driver at boot time, and possibly at suspend time as well (LVDS laptop display + VGA connector display + HDMI connector). I need to know how this was reproduced, in particular whether the laptop lid was closed or the machine was suspended with all 3 connectors attached. Then I can ask my friend, who has similar hardware and the same radeon driver (the same guy who reproduced 820746), to reproduce this.
Read on for the full story:
=======
Again using the objdump disassembly of the radeon driver (radeon.ko.out) attached to bug 820746 (same 2.6.38 kernel), I managed to crack the place that is causing the Oops.
Reverse engineering the Oops to the assembly and mapping the assembly back to C code, the panic was triggered by this code in radeon_suspend_kms, on radeon driver suspend:
radeon_suspend_kms() (radeon_device.c):
/* turn off display hw */
list_for_each_entry(connector, &dev->mode_config.connector_list, head) {
	drm_helper_connector_dpms(connector, DRM_MODE_DPMS_OFF);
}
In the above list_head iteration over connector_list for the radeon drm_device on SUSPEND, dev->mode_config.connector_list.next is NULL.
In other words, the DRM device connector list is _corrupted_. Most likely a device connector was detached or destroyed while suspend was trying to switch off the display on all the connectors.
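As a rough illustration of the loop above: list_for_each_entry starts at the list head's next pointer and follows next links until it arrives back at the head. The user-space sketch below walks such a list and reports a NULL link instead of dereferencing it, a check the kernel macro itself does not make. Note that struct list_head is re-declared locally here and count_connectors is a hypothetical helper for illustration, not a kernel or radeon function.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the kernel's doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

/*
 * Hypothetical helper: walk a circular list the way list_for_each_entry
 * does (start at list->next, stop when back at the head), but bail out
 * on a NULL link. Returns the number of nodes, or -1 if the list is
 * corrupted by a NULL next pointer.
 */
static int count_connectors(const struct list_head *list)
{
    const struct list_head *pos;
    int n = 0;

    assert(list != NULL);
    for (pos = list->next; pos != list; pos = pos->next) {
        if (pos == NULL)
            return -1;  /* the situation described in this Oops */
        n++;
    }
    return n;
}
```

With a properly initialised empty list (next and prev pointing back at the head) the helper returns 0. With a head whose next is NULL it returns -1, where the unchecked kernel loop would go on to compute an entry pointer from NULL.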
With dev->mode_config.connector_list.next being NULL,
the faulting instruction at EIP was triggered by a value derived from that NULL in register EBX.
The EBX value is 0xfffffea8,
which is nothing but: NULL pointer - 0x158,
which is nothing but: ~0U - 0x157.
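The arithmetic can be checked in a few lines of C. This is only a sketch: pointers are modelled as uint32_t because the crash is on i386, the 0x158 offset is the one read from this build's disassembly, and the macro and function names are made up for illustration.

```c
#include <stdint.h>

/* Offset of the list_head member inside struct drm_connector for this
 * particular build, as seen in "sub $0x158,%ebx". */
#define CONNECTOR_HEAD_OFFSET 0x158u

/* Entry pointer computed from a list node address. Unsigned 32-bit
 * arithmetic wraps modulo 2^32, like a pointer subtraction on i386. */
static uint32_t entry_from_node(uint32_t node_addr)
{
    return node_addr - CONNECTOR_HEAD_OFFSET;
}

/* Address read by "mov 0x158(%ebx),%eax" for a given EBX value. */
static uint32_t mov_load_address(uint32_t ebx)
{
    return ebx + CONNECTOR_HEAD_OFFSET;
}
```

entry_from_node(0) gives 0xfffffea8, the EBX value in the Oops, and mov_load_address(0xfffffea8) wraps back to 0, which is consistent with the "NULL pointer dereference at (null)" in the Oops title.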
The Oops EIP is at radeon_suspend_kms+0x78,
which in the objdump disassembly maps to address 19888:
19888: 8b 83 58 01 00 00 mov 0x158(%ebx),%eax
Bingo: that is the list_entry macro iterating over the "struct drm_connector" list. The list head field sits at offset 0x158 inside struct drm_connector, and that offset has to be subtracted from the list_head pointer to arrive at the drm_connector.
So at panic time, the radeon driver Oopsed while trying to suspend the display on each of the connected devices,
because the connector list was corrupted.
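For readers unfamiliar with list_entry: it recovers a pointer to the containing structure from a pointer to the list_head embedded in it, by subtracting the member's offset. Here is a minimal user-space sketch of that idea. struct mock_connector and mock_list_entry are hypothetical stand-ins (the real struct drm_connector has actual fields ahead of its head member); the padding is sized only so that head lands at the 0x158 offset seen in this build.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical stand-in for struct drm_connector: opaque padding places
 * the embedded list node at offset 0x158, as in the disassembly. */
struct mock_connector {
    char other_fields[0x158];
    struct list_head head;
};

/* Same idea as the kernel's list_entry/container_of: step back from the
 * member's address by the member's offset within the container. */
#define mock_list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static struct mock_connector *connector_of(struct list_head *node)
{
    assert(node != NULL);  /* the kernel macro performs no such check */
    return mock_list_entry(node, struct mock_connector, head);
}
```

For a valid node, connector_of(&c.head) yields &c. The kernel macro does the same subtraction unconditionally, so a NULL node turns into a pointer 0x158 bytes "below" NULL rather than an immediate fault.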
The Oops code dump also exactly matches the objdump disassembly bytes at the point of the panic:
81 eb 58 01 00 00 <8b> 83 58 01 00 00 0f
The byte in angle brackets, <8b>, is the first byte of the faulting instruction, the "mov".
This matches the list_head walk over the drm connectors in the objdump disassembly of the radeon_suspend_kms function:
19882: 81 eb 58 01 00 00 sub $0x158,%ebx
19888: 8b 83 58 01 00 00 mov 0x158(%ebx),%eax -> PANIC EIP is here.
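The byte matching above can also be done mechanically. The sketch below parses an Oops code string, finds the byte marked with angle brackets, and splits the ModRM byte that follows an opcode so the two instructions can be checked by hand (opcode 0x81 with reg field 5 is SUB r/m32, imm32; opcode 0x8b is MOV r32, r/m32). All function and type names here are hypothetical helpers written for this comment, and only the two encodings seen in this Oops are considered, so this is not a general x86 decoder.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse "81 eb ... <8b> 83 ..." into out[]. Sets *fault_idx to the index
 * of the <xx> byte (or -1 if none). Returns the byte count, or -1 on
 * malformed input or if more than max bytes are present. */
static int parse_oops_code(const char *s, uint8_t *out, int max, int *fault_idx)
{
    int n = 0;

    assert(s != NULL && out != NULL && fault_idx != NULL);
    *fault_idx = -1;
    while (*s) {
        if (isspace((unsigned char)*s)) {
            s++;
            continue;
        }
        int marked = (*s == '<');
        if (marked)
            s++;
        if (!isxdigit((unsigned char)s[0]) || !isxdigit((unsigned char)s[1]))
            return -1;
        if (n == max)
            return -1;
        char hex[3] = { s[0], s[1], '\0' };
        out[n] = (uint8_t)strtoul(hex, NULL, 16);
        s += 2;
        if (marked) {
            if (*s != '>')
                return -1;
            s++;
            *fault_idx = n;
        }
        n++;
    }
    return n;
}

/* The three bit fields of an x86 ModRM byte. */
struct modrm {
    unsigned mod, reg, rm;
};

static struct modrm split_modrm(uint8_t b)
{
    struct modrm m = { (unsigned)b >> 6, ((unsigned)b >> 3) & 7u, (unsigned)b & 7u };
    return m;
}

/* Read a little-endian 32-bit immediate or displacement. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* 32-bit register names indexed by the ModRM reg/rm field value. */
static const char *const reg32[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};
```

Fed the code string from this Oops, the marked byte is at index 6, so the six bytes before it are the sub at 19882 and the marked byte starts the mov at 19888. For the mov, ModRM 0x83 splits into mod=2 (32-bit displacement), reg=0 (eax), rm=3 (ebx), followed by the displacement 0x158.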
Now that we know the C code and the reason for the Oops (the NULL pointer field), we have to trace backwards through the code and see how the drm connector list can be corrupted, or can have NULL as a list element or a co...