[drm:i915_gem_idle] *ERROR* hardware wedged

Bug #358574 reported by Matt Zimmerman
This bug affects 2 people
Affects         Status   Importance  Assigned to  Milestone
linux (Ubuntu)  Triaged  Medium      Unassigned
Jaunty          Triaged  Medium      Unassigned

Bug Description

After updating to the latest Jaunty (20090408), I wanted to check whether graphics performance had improved for me as a result of bug 349314. I booted into the latest kernel, logged in, and tried to stress the graphics hardware. I started a bunch of programs on multiple workspaces, including a web browser, Totem playing video, etc., and switched rapidly between them. Eventually, it froze and didn't recover. alt+sysrq+k killed the X server, but the text console was corrupt. I was able to sync/unmount with sysrq, but not reboot, and had to cycle power.

After rebooting, I found this message (and some similar ones) in syslog.

ProblemType: Bug
Architecture: amd64
DistroRelease: Ubuntu 9.04
MachineType: LENOVO 6465CTO
Package: linux-image-2.6.28-11-generic 2.6.28-11.41
ProcCmdLine: root=UUID=305dde78-d20a-4248-aaf4-09447b7c5791 ro quiet splash
ProcEnviron:
 LC_COLLATE=C
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/zsh
ProcVersionSignature: Ubuntu 2.6.28-11.41-generic
SourcePackage: linux

Matt Zimmerman (mdz) wrote :

Apr 8 13:52:00 perseus kernel: [ 1024.833410] SysRq : SAK
Apr 8 13:52:00 perseus kernel: [ 1024.833456] SAK: killed process 2893 (Xorg): task_session_nr(p)==tty->session
Apr 8 13:52:00 perseus kernel: [ 1024.833659] SAK: killed process 2893 (Xorg): task_session_nr(p)==tty->session
Apr 8 13:52:01 perseus kernel: [ 1025.886867] SysRq : HELP : loglevel0-8 reBoot Crashdump tErm Full kIll saK aLlcpus showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks Unmount shoW-blocked-tasks
Apr 8 13:52:02 perseus kernel: [ 1026.576118] [drm:i915_gem_idle] *ERROR* hardware wedged
Apr 8 13:52:02 perseus kernel: [ 1026.651310] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 1
Apr 8 13:52:02 perseus x-session-manager[3326]: WARNING: Detected that screensaver has left the bus
Apr 8 13:52:02 perseus gdm[2888]: WARNING: gdm_slave_xioerror_handler: Fatal X error - Restarting :0
Apr 8 13:52:03 perseus bonobo-activation-server (mdz-4978): could not associate with desktop session: Failed to connect to socket /tmp/dbus-Si1QOBTtkn: Connection refused
Apr 8 13:52:04 perseus acpid: client connected from 5039[0:0]
Apr 8 13:52:05 perseus kernel: [ 1029.087158] [drm:i915_setparam] *ERROR* unknown parameter 4
Apr 8 13:52:05 perseus kernel: [ 1029.258894] [drm:i915_gem_entervt_ioctl] *ERROR* Reenabling wedged hardware, good luck
Apr 8 13:52:08 perseus bonobo-activation-server (mdz-5056): could not associate with desktop session: Failed to connect to socket /tmp/dbus-Si1QOBTtkn: Connection refused
Apr 8 13:52:28 perseus kernel: [ 1052.072692] SysRq : SAK
Apr 8 13:52:28 perseus kernel: [ 1052.072776] SAK: killed process 5039 (Xorg): task_session_nr(p)==tty->session
Apr 8 13:52:28 perseus kernel: [ 1052.073027] SAK: killed process 5039 (Xorg): task_session_nr(p)==tty->session
Apr 8 13:52:29 perseus kernel: [ 1053.060083] SysRq : Emergency Sync
Apr 8 13:52:29 perseus kernel: [ 1053.060417] Emergency Sync complete
Apr 8 13:52:29 perseus kernel: [ 1053.756115] [drm:i915_gem_idle] *ERROR* hardware wedged
Apr 8 13:52:34 perseus kernel: [ 1058.317742] SysRq : Emergency Remount R/O
Apr 8 13:53:46 perseus syslogd 1.5.0#5ubuntu3: restart.
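
As a rough aid for picking these messages out of a longer syslog, the following Python sketch filters for DRM error lines. The regular expression and the function name `drm_errors` are illustrative assumptions based only on the line format shown in the excerpt above; other kernels or syslog daemons may format their lines differently.

```python
import re

# Assumed line shape, taken from the excerpt above:
#   ... kernel: [ 1026.576118] [drm:i915_gem_idle] *ERROR* hardware wedged
DRM_ERROR = re.compile(
    r"kernel: \[\s*(?P<uptime>\d+\.\d+)\] "
    r"\[drm:(?P<func>\w+)\] \*ERROR\* (?P<msg>.*)"
)


def drm_errors(lines):
    """Return (uptime_seconds, function, message) for each DRM error line."""
    found = []
    for line in lines:
        match = DRM_ERROR.search(line)
        if match:
            found.append(
                (
                    float(match.group("uptime")),
                    match.group("func"),
                    match.group("msg").strip(),
                )
            )
    return found


# Lines copied from the log excerpt above.
sample = [
    "Apr 8 13:52:00 perseus kernel: [ 1024.833410] SysRq : SAK",
    "Apr 8 13:52:02 perseus kernel: [ 1026.576118] [drm:i915_gem_idle] *ERROR* hardware wedged",
    "Apr 8 13:52:02 perseus kernel: [ 1026.651310] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 1",
    "Apr 8 13:52:05 perseus kernel: [ 1029.258894] [drm:i915_gem_entervt_ioctl] *ERROR* Reenabling wedged hardware, good luck",
]

for uptime, func, msg in drm_errors(sample):
    print(f"{uptime:12.6f}  {func}: {msg}")
```

To scan a real log, the same function could be given an open file object, for example the system's syslog file, wherever that lives on the machine in question.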

Steve Langasek (vorlon)
Changed in linux (Ubuntu Jaunty):
importance: Undecided → Medium
Steve Langasek (vorlon) wrote :

Importance set based on the apparent difficulty of reproducing the failure (one reported occurrence, under heavy load); but feel free to raise the importance if appropriate.

Matt Zimmerman (mdz) wrote : Re: [Bug 358574] Re: [drm:i915_gem_idle] *ERROR* hardware wedged

On Thu, Apr 09, 2009 at 06:39:25PM -0000, Steve Langasek wrote:
> Importance set based on the apparent difficulty of reproducing the
> failure (one reported occurrence, under heavy load); but feel free to
> raise the importance if appropriate.

I would like to add that my system has been plenty stable on the whole with
recent Jaunty kernels including .39. The only graphics-related change which
went into .40 was reverted in .41, so I don't suspect this is a regression,
but probably a rare bug.

--
 - mdz

Changed in linux (Ubuntu Jaunty):
status: New → Triaged
Ömer Fadıl USTA (omerusta) wrote :

Apr 13 20:35:42 mylaptop kernel: [ 3195.456056] events/0 D f5a6e000 0 6 2
Apr 13 20:35:42 mylaptop kernel: [ 3195.456061] f6dd9f20 00000046 c19ce600 f5a6e000 f6541920 f5bb31c0 c0687340 c07b4500
Apr 13 20:35:42 mylaptop kernel: [ 3195.456069] f6c6bed0 f6c6c148 c19ce600 f6dd9f1c c19ce600 e2656e0c 000002c7 f6c6c148
Apr 13 20:35:42 mylaptop kernel: [ 3195.456076] 00000001 f59f581c f6513418 f651341c ffffffff f6dd9f48 c0501c1e f6c6bed0
Apr 13 20:35:42 mylaptop kernel: [ 3195.456082] Call Trace:
Apr 13 20:35:42 mylaptop kernel: [ 3195.456094] [<c0501c1e>] __mutex_lock_slowpath+0x6e/0xb0
Apr 13 20:35:42 mylaptop kernel: [ 3195.456098] [<c0501a57>] mutex_lock+0x17/0x20
Apr 13 20:35:42 mylaptop kernel: [ 3195.456113] [<f80db8d8>] i915_gem_retire_work_handler+0x28/0x70 [i915]
Apr 13 20:35:42 mylaptop kernel: [ 3195.456121] [<f80db8b0>] ? i915_gem_retire_work_handler+0x0/0x70 [i915]
Apr 13 20:35:42 mylaptop kernel: [ 3195.456127] [<c014afbd>] run_workqueue+0x8d/0x150
Apr 13 20:35:42 mylaptop kernel: [ 3195.456132] [<c014ef0a>] ? prepare_to_wait+0x3a/0x70
Apr 13 20:35:42 mylaptop kernel: [ 3195.456135] [<c014b238>] worker_thread+0x88/0xf0
Apr 13 20:35:42 mylaptop kernel: [ 3195.456139] [<c014ecb0>] ? autoremove_wake_function+0x0/0x50
Apr 13 20:35:42 mylaptop kernel: [ 3195.456143] [<c014b1b0>] ? worker_thread+0x0/0xf0
Apr 13 20:35:42 mylaptop kernel: [ 3195.456146] [<c014e90c>] kthread+0x3c/0x70
Apr 13 20:35:42 mylaptop kernel: [ 3195.456149] [<c014e8d0>] ? kthread+0x0/0x70
Apr 13 20:35:42 mylaptop kernel: [ 3195.456154] [<c0105477>] kernel_thread_helper+0x7/0x10
Apr 13 20:37:12 mylaptop syslogd 1.5.0#5ubuntu3: restart.

Bryce Harrington (bryce) wrote :

I'm going to go ahead and dupe this to 359392 since the (relative!) ease of reproducing it matches my experience with that bug.

However, I am certain we have multiple X freeze bugs (most of which seem to be either extremely rare, or corner cases), so I can't say with 100% confidence yet that this is the same bug. So, after 359392 is fixed, you'll want to re-test this, and undupe if you can still reproduce the problem.

Bryce Harrington (bryce) wrote :

Oh, and fwiw I don't think this is a kernel bug.

With X freezes, most of the time the situation is that the GPU gets locked up. The kernel is probably just noticing it and going, "hey, something wedged the gpu!"

Typically, the way to distinguish an X freeze from a kernel freeze is to see if the machine is still ssh'able (sometimes via ethernet, if your wireless is driven by your login session). With X freezes, the underlying system is still all there.
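
A minimal sketch of that check, assuming the frozen machine normally runs an SSH server on the default port 22. The helper name `ssh_reachable` is hypothetical, and it only tests whether a TCP connection can be opened from another machine. A successful connection suggests the kernel and network stack are still alive; it does not log in or prove the system is otherwise healthy.

```python
import socket


def ssh_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 22 is the conventional SSH port; adjust it if the target
    machine listens elsewhere (an assumption about the setup).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run from a second machine against the frozen host's name or address: a result of True points toward an X or GPU freeze, while False is more consistent with a full kernel hang or a network connection that went down with the login session.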

@Omer, I'm not sure what you're posting, but it doesn't seem to match mdz's output, so I'm guessing you have some other bug? You should probably report that separately.

Ted Pritchard (k-launchpad-pritchard-uk-net) wrote :

I'm also getting the same problem on a (rather old) Dell Dimension 2400. The problem can occur after logging in and leaving the PC alone with no further interaction. Sometimes it takes 10-20 minutes, and other times the machine might last a couple of hours of active use (mainly Firefox).

I've attached a sample of the log file and the output of lspci -vv.
