[i945GME] X freeze on Dell Mini 10v in Karmic

Bug #424055 reported by Matt Zimmerman
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
xf86-video-intel | Invalid | Critical | |
xserver-xorg-video-intel (Ubuntu) | Fix Released | High | Unassigned |

Bug Description

Binary package hint: xserver-xorg-video-intel

I'm afraid I don't have very much information about this, but I wanted to get a bug report open because it's potentially serious. I know of three reports in the past week or so of GPU hangs on Intel chipsets:

1. Me, on a Dell Mini 10v (hardware info attached to this bug)
2. Tim Gardner, on his laptop (hardware info to be posted here)
3. Pete Graner, on a new netbook (hardware info to be posted here)

I personally observed #2 as well as #1. In the #2 case, Tim was able to ssh in, and I checked debugfs, which showed that the GPU appeared to be hung. I ran intel_gpu_dump and saved the output to Tim's home directory (to be attached here).

ProblemType: Bug
Architecture: i386
Date: Thu Sep 3 16:16:40 2009
DistroRelease: Ubuntu 9.10
MachineType: Dell Inc. Inspiron 1011
NonfreeKernelModules: wl
Package: xorg 1:7.4+3ubuntu5
PccardctlIdent:

PccardctlStatus:

ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.31-9-generic root=UUID=0e0805c9-684a-4646-9724-954e05f7dd01 ro quiet splash
ProcEnviron:
 LC_COLLATE=C
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/zsh
ProcVersionSignature: Ubuntu 2.6.31-9.29-generic
RelatedPackageVersions:
 xserver-xorg 1:7.4+3ubuntu5
 libgl1-mesa-glx 7.6.0~git20090817.7c422387-0ubuntu3
 libdrm2 2.4.12+git20090801.45078630-0ubuntu1
 xserver-xorg-video-intel 2:2.8.1-1ubuntu1
 xserver-xorg-video-ati 1:6.12.99+git20090629.f39cafc5-0ubuntu6
SourcePackage: xorg
Tags: ubuntu-unr
Uname: Linux 2.6.31-9-generic i686
dmi.bios.date: 03/20/2009
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A00
dmi.board.name: CN0Y53
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.chassis.version: A00
dmi.modalias: dmi:bvnDellInc.:bvrA00:bd03/20/2009:svnDellInc.:pnInspiron1011:pvrA00:rvnDellInc.:rnCN0Y53:rvrA00:cvnDellInc.:ct8:cvrA00:
dmi.product.name: Inspiron 1011
dmi.product.version: A00
dmi.sys.vendor: Dell Inc.
fglrx: Not loaded
system:
 distro: Ubuntu
 architecture: i686
 kernel: 2.6.31-9-generic

Matt Zimmerman (mdz) wrote :

To anyone experiencing this issue, there is some information on how to capture debug information automatically at http://mdzlog.alcor.net/2009/06/17/collecting-debug-information-when-your-gpu-hangs/#comments

Bryce may have something more up-to-date though.

Changed in xserver-xorg-video-intel (Ubuntu):
importance: Undecided → High
assignee: nobody → Bryce (bryce)
Matt Zimmerman (mdz) wrote :

I wasn't able to collect any additional information when I observed a hang, because I didn't have access to another system on the same network to ssh in. Based on my observations, though, I'm confident that only the GPU was hung: the system responded to SysRq+K, although the console was never properly restored.

If there is a way other than ssh to get a text login on the system when it is in this state, I would like to know so that I can try it if it happens again.

Matt Zimmerman (mdz) wrote :

Pete, Tim, please file separate bugs with your observations, and mention them here, rather than commenting here (just in case they're different root causes).

Changed in xserver-xorg-video-intel (Ubuntu):
assignee: Bryce (bryce) → Bryce Harrington (bryceharrington)
Matt Zimmerman (mdz) wrote :

This appeared in syslog around the time of the hang, confirming it's a GPU lockup:

Sep 3 15:46:24 atomicity kernel: [21000.796177] INFO: task i915/0:828 blocked for more than 120 seconds.
Sep 3 15:46:24 atomicity kernel: [21000.796193] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 3 15:46:24 atomicity kernel: [21000.796206] i915/0 D c0805380 0 828 2 0x00000000
Sep 3 15:46:24 atomicity kernel: [21000.796227] ef44bf04 00000046 f707e000 c0805380 ef4ecdf8 c0805380 1b290057 000012f1
Sep 3 15:46:24 atomicity kernel: [21000.796256] c0805380 c0805380 ef4ecdf8 c0805380 00000000 000012f1 c0805380 eec77880
Sep 3 15:46:24 atomicity kernel: [21000.796283] ef4ecb60 ef52a414 ef52a418 ffffffff ef44bf30 c0567556 c073ce80 ef52a41c
Sep 3 15:46:24 atomicity kernel: [21000.796310] Call Trace:
Sep 3 15:46:24 atomicity kernel: [21000.796339] [<c0567556>] __mutex_lock_slowpath+0xc6/0x130
Sep 3 15:46:24 atomicity kernel: [21000.796356] [<c0567470>] mutex_lock+0x20/0x40
Sep 3 15:46:24 atomicity kernel: [21000.796424] [<f824e94a>] i915_gem_retire_work_handler+0x2a/0x70 [i915]
Sep 3 15:46:24 atomicity kernel: [21000.796459] [<c015312e>] run_workqueue+0x6e/0x140
Sep 3 15:46:24 atomicity kernel: [21000.796515] [<f824e920>] ? i915_gem_retire_work_handler+0x0/0x70 [i915]
Sep 3 15:46:24 atomicity kernel: [21000.796547] [<c0153288>] worker_thread+0x88/0xe0
Sep 3 15:46:24 atomicity kernel: [21000.796569] [<c0157930>] ? autoremove_wake_function+0x0/0x40
Sep 3 15:46:24 atomicity kernel: [21000.796590] [<c0153200>] ? worker_thread+0x0/0xe0
Sep 3 15:46:24 atomicity kernel: [21000.796604] [<c015763c>] kthread+0x7c/0x90
Sep 3 15:46:24 atomicity kernel: [21000.796618] [<c01575c0>] ? kthread+0x0/0x90
Sep 3 15:46:24 atomicity kernel: [21000.796635] [<c0103f17>] kernel_thread_helper+0x7/0x10
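
For anyone who wants to scan their own syslog for the same signature, here is a rough sketch of how the hung-task report above could be picked out automatically. It is only an illustration: the function names (find_hung_tasks, involves_module) are hypothetical, and the regular expressions are assumptions derived solely from the line format in this excerpt, so other kernel versions may log these lines differently.

```python
import re

# Assumed pattern for the hung-task report line, based only on the excerpt above.
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# Assumed pattern for call-trace frames, e.g.
# "[<f824e94a>] i915_gem_retire_work_handler+0x2a/0x70 [i915]".
# A leading "? " marks a frame the kernel considers unreliable.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\] (?P<uncertain>\? )?(?P<symbol>\w+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<module>\w+)\])?"
)


def find_hung_tasks(lines):
    """Return one dict per hung-task report with task name, pid, timeout and frames."""
    reports = []
    current = None
    for line in lines:
        header = HUNG_TASK.search(line)
        if header:
            current = {
                "task": header["name"],
                "pid": int(header["pid"]),
                "seconds": int(header["secs"]),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME.search(line)
        if frame and current is not None:
            # Each frame is (symbol, module or None, is_uncertain).
            current["frames"].append(
                (frame["symbol"], frame["module"], frame["uncertain"] is not None)
            )
    return reports


def involves_module(report, module):
    """True if any frame in the report belongs to the given kernel module."""
    return any(mod == module for _, mod, _ in report["frames"])


# A few lines copied from the syslog excerpt in this comment.
SAMPLE = """\
Sep 3 15:46:24 atomicity kernel: [21000.796177] INFO: task i915/0:828 blocked for more than 120 seconds.
Sep 3 15:46:24 atomicity kernel: [21000.796310] Call Trace:
Sep 3 15:46:24 atomicity kernel: [21000.796339] [<c0567556>] __mutex_lock_slowpath+0xc6/0x130
Sep 3 15:46:24 atomicity kernel: [21000.796356] [<c0567470>] mutex_lock+0x20/0x40
Sep 3 15:46:24 atomicity kernel: [21000.796424] [<f824e94a>] i915_gem_retire_work_handler+0x2a/0x70 [i915]
"""

if __name__ == "__main__":
    for report in find_hung_tasks(SAMPLE.splitlines()):
        print(report["task"], report["pid"], involves_module(report, "i915"))
```

A report whose trace includes frames from the i915 module, as here, is only a hint that the graphics driver is involved; it does not replace a batchbuffer dump.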

Bryce Harrington (bryce) wrote :

Upstream will need a batchbuffer dump before they can start analysis on this. Or if there is a specific way to reproduce the issue (aside from 'randomly') you can detail, that might be sufficient to get them going.

Changed in xserver-xorg-video-intel (Ubuntu):
status: New → Incomplete
Matt Zimmerman (mdz) wrote : Re: [Bug 424055] Re: Karmic Intel GPU hang

On Fri, Sep 04, 2009 at 12:13:40AM -0000, Bryce Harrington wrote:
> Upstream will need a batchbuffer dump before they can start analysis on
> this. Or if there is a specific way to reproduce the issue (aside from
> 'randomly') you can detail, that might be sufficient to get them going.

Assuming intel_gpu_dump provides that, Tim has one. See the thread I
started on ubuntu-devel about how we need to make it more straightforward to
collect this information (I wasn't able to).

--
 - mdz

In , Bryce Harrington (bryce) wrote :

Forwarding this bug from Ubuntu reporter Matt Zimmerman:
http://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/424055

[Problem]
Karmic Intel GPU hang

[Original Description]
I'm afraid I don't have very much information about this, but I wanted to get a bug report open because it's potentially serious.

I wasn't able to collect any additional information when I observed a hang, because I didn't have access to another system on the same network to ssh in. Based on my observations, though, I'm confident that only the GPU was hung: the system responded to SysRq+K, although the console was never properly restored.

If there is a way other than ssh to get a text login on the system when it is in this state, I would like to know so that I can try it if it happens again.

This appeared in syslog around the time of the hang, confirming it's a GPU lockup:

Sep 3 15:46:24 atomicity kernel: [21000.796177] INFO: task i915/0:828 blocked for more than 120 seconds.
Sep 3 15:46:24 atomicity kernel: [21000.796193] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 3 15:46:24 atomicity kernel: [21000.796206] i915/0 D c0805380 0 828 2 0x00000000
Sep 3 15:46:24 atomicity kernel: [21000.796227] ef44bf04 00000046 f707e000 c0805380 ef4ecdf8 c0805380 1b290057 000012f1
Sep 3 15:46:24 atomicity kernel: [21000.796256] c0805380 c0805380 ef4ecdf8 c0805380 00000000 000012f1 c0805380 eec77880
Sep 3 15:46:24 atomicity kernel: [21000.796283] ef4ecb60 ef52a414 ef52a418 ffffffff ef44bf30 c0567556 c073ce80 ef52a41c
Sep 3 15:46:24 atomicity kernel: [21000.796310] Call Trace:
Sep 3 15:46:24 atomicity kernel: [21000.796339] [<c0567556>] __mutex_lock_slowpath+0xc6/0x130
Sep 3 15:46:24 atomicity kernel: [21000.796356] [<c0567470>] mutex_lock+0x20/0x40
Sep 3 15:46:24 atomicity kernel: [21000.796424] [<f824e94a>] i915_gem_retire_work_handler+0x2a/0x70 [i915]
Sep 3 15:46:24 atomicity kernel: [21000.796459] [<c015312e>] run_workqueue+0x6e/0x140
Sep 3 15:46:24 atomicity kernel: [21000.796515] [<f824e920>] ? i915_gem_retire_work_handler+0x0/0x70 [i915]
Sep 3 15:46:24 atomicity kernel: [21000.796547] [<c0153288>] worker_thread+0x88/0xe0
Sep 3 15:46:24 atomicity kernel: [21000.796569] [<c0157930>] ? autoremove_wake_function+0x0/0x40
Sep 3 15:46:24 atomicity kernel: [21000.796590] [<c0153200>] ? worker_thread+0x0/0xe0
Sep 3 15:46:24 atomicity kernel: [21000.796604] [<c015763c>] kthread+0x7c/0x90
Sep 3 15:46:24 atomicity kernel: [21000.796618] [<c01575c0>] ? kthread+0x0/0x90
Sep 3 15:46:24 atomicity kernel: [21000.796635] [<c0103f17>] kernel_thread_helper+0x7/0x10

Architecture: i386
Date: Thu Sep 3 16:16:40 2009
DistroRelease: Ubuntu 9.10
MachineType: Dell Inc. Inspiron 1011
NonfreeKernelModules: wl
Package: xorg 1:7.4+3ubuntu5
PccardctlIdent:

PccardctlStatus:

ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.31-9-generic root=UUID=0e0805c9-684a-4646-9724-954e05f7dd01 ro quiet splash
ProcEnviron:
 LC_COLLATE=C
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/zsh
ProcVersionSignature: Ubuntu 2.6.31-9.29-generic
RelatedPackageVersions:
 xserver-xorg 1:7.4+3ubuntu5
 libgl1-mesa-glx 7.6.0~git20090817.7c422387-0ubuntu3
 libdrm2 2.4.12+git20090801.4507863...


In , Bryce Harrington (bryce) wrote :

Created an attachment (id=29202)
XorgLogOld.txt

In , Bryce Harrington (bryce) wrote :

Created an attachment (id=29203)
XorgLog.txt

In , Bryce Harrington (bryce) wrote :

Created an attachment (id=29204)
CurrentDmesg.txt

In , Bryce Harrington (bryce) wrote :

Created an attachment (id=29205)
BootDmesg.txt

Tim Gardner (timg-tpi) wrote : Re: Karmic Intel GPU hang

It's not exactly reproducible; it only froze once in 4 days of constant activity. It's also a prototype laptop, so it may not have released hardware.

Bryce Harrington (bryce) wrote :

Well, like I said, I'm fairly sure upstream is going to require a batchbuffer dump and stuff before they'll look into this issue, but I've forwarded this bug upstream to https://bugs.freedesktop.org/show_bug.cgi?id=23699 and subscribed you to it just in case. Please follow up with them in case they need further information or wish you to test something.

Changed in xserver-xorg-video-intel (Ubuntu):
status: Incomplete → Triaged
assignee: Bryce Harrington (bryceharrington) → nobody
summary: - Karmic Intel GPU hang
+ Karmic Intel GPU hang on Dell Mini 10v
Changed in xserver-xorg-video-intel:
status: Unknown → Confirmed
Geir Ove Myhr (gomyhr)
summary: - Karmic Intel GPU hang on Dell Mini 10v
+ [i945GME] Karmic Intel GPU hang on Dell Mini 10v
tags: added: 945gme freeze
In , Gordon Jin (gordon-jin) wrote :

How to reproduce it?

If there are no reliable reproduction steps and no GPU dump, I don't suggest giving this higher priority than other GPU hang bugs.

Bryce Harrington (bryce) wrote : Re: [i945GME] Karmic Intel GPU hang on Dell Mini 10v

mdz, I've noticed in several other recent freeze bugs mentioned to me, that they only occur after you have resumed at least one time. Does this hold true for your case as well?

Matt Zimmerman (mdz) wrote : Re: [Bug 424055] Re: [i945GME] Karmic Intel GPU hang on Dell Mini 10v

On Tue, Sep 08, 2009 at 05:16:12PM -0000, Bryce Harrington wrote:
> mdz, I've noticed in several other recent freeze bugs mentioned to me,
> that they only occur after you have resumed at least one time. Does
> this hold true for your case as well?

I had definitely resumed at least once (I generally do so multiple times per
day on this netbook).

--
 - mdz

Bryce Harrington (bryce)
summary: - [i945GME] Karmic Intel GPU hang on Dell Mini 10v
+ [i945GME] X freeze on Dell Mini 10v in Karmic
Andy Whitcroft (apw) wrote :

We did have a suspend/resume-related fix for i915; the bug it addressed could have had any number of symptoms, mostly bad. I would be particularly interested in repeats of this with kernels later than Ubuntu 2.6.31-10.32.
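
To check whether a given report was filed against a kernel newer than that one, something like the following sketch could be used on the ProcVersionSignature field from the apport data. The names parse_signature and is_later_than_baseline are hypothetical, and comparing integer tuples is a simplifying assumption: it is not the full Debian version ordering, only enough for signatures shaped like "Ubuntu 2.6.31-9.29-generic".

```python
import re

# "Ubuntu 2.6.31-10.32", the version named in this comment, as a tuple of integers.
FIX_BASELINE = (2, 6, 31, 10, 32)


def parse_signature(signature):
    """Extract (major, minor, patch, abi, upload) from a ProcVersionSignature string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)", signature)
    if match is None:
        raise ValueError(f"unrecognised version signature: {signature!r}")
    return tuple(int(part) for part in match.groups())


def is_later_than_baseline(signature):
    """True if the kernel in the signature is strictly newer than the baseline."""
    return parse_signature(signature) > FIX_BASELINE


if __name__ == "__main__":
    # Signature taken from the original report's ProcVersionSignature field.
    print(is_later_than_baseline("Ubuntu 2.6.31-9.29-generic"))
```

The kernel in the original report (2.6.31-9.29) predates the baseline, so a hang seen on it says nothing about whether the later fix helps.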

In , Alberto Milone (albertomilone) wrote :

(In reply to comment #5)
> How to reproduce it?
>
> If there's no steady reproduce steps, nor gpu dump, I don't suggest this to be
> higher priority than other gpu hang bugs.
>

I'm no longer able to reproduce the problem. It's likely that Eric's mesa patch (about the relocation delta for WM surfaces) mentioned here fixed it:
https://bugs.freedesktop.org/show_bug.cgi?id=23932

Alberto Milone (albertomilone) wrote :

Can you try to reproduce the problem with a current Ubuntu image, please?

I can't reproduce the problem with my Dell Mini 10v anymore.

Andy Whitcroft (apw) wrote :

I believe this was a symptom of the mesa bug which was presenting bad low address bits in the reloc field. I also have not been able to reproduce this with recent karmic kernels on my Mini 10v.

Matt Zimmerman (mdz) wrote : Re: [Bug 424055] Re: [i945GME] X freeze on Dell Mini 10v in Karmic

On Fri, Oct 02, 2009 at 07:46:24AM -0000, Alberto Milone wrote:
> Can you try to reproduce the problem with a current Ubuntu image,
> please?

I was never able to reproduce it; it happened randomly. I have not seen it
at all recently, though.

--
 - mdz

Alberto Milone (albertomilone) wrote :

I asked because I think it's fixed (for the reasons that Andy mentioned).

I'm closing this bug report but if you or any of the subscribers to this report manage to reproduce the problem, feel free to reopen the report.

Changed in xserver-xorg-video-intel (Ubuntu):
status: Triaged → Fix Released
Luka Renko (lure) wrote :

OK, since I am still having occasional hangs in Karmic on an X200s/GM45, I have submitted a new bug, 440523.

In , Gordon Jin (gordon-jin) wrote :

Good to know this. Let's close it.

*** This bug has been marked as a duplicate of bug 23932 ***

Changed in xserver-xorg-video-intel:
status: Confirmed → Invalid
Changed in xserver-xorg-video-intel:
importance: Unknown → Critical
status: Invalid → Unknown
Changed in xserver-xorg-video-intel:
importance: Critical → Unknown
Changed in xserver-xorg-video-intel:
importance: Unknown → Critical
Changed in xserver-xorg-video-intel:
status: Unknown → Invalid