Uses 100% CPU with latest mesa/libdrm update

Bug #419264 reported by Rami Al-Rfou'
This bug affects 12 people
Affects            Status        Importance  Assigned to      Milestone
X.Org X server     Fix Released  Critical    -                -
compiz (Ubuntu)    Fix Released  High        Robert Ancell    -
  Karmic           Invalid       High        Robert Ancell    -
linux (Ubuntu)     Fix Released  High        Leann Ogasawara  -
  Karmic           Fix Released  High        Leann Ogasawara  -

Bug Description

Binary package hint: compiz

compiz eats 100% of the CPU, even after restarting! Only kill -9 is able to close the crazy compiz process.

ProblemType: Bug
Architecture: i386
CompizPlugins: [core,ccp,dbus,place,mousepoll,gnomecompat,move,resize,decoration,png,svg,imgjpeg,text,neg,video,wall,snap,animation,scale,scaleaddon,expo,staticswitcher,regex,resizeinfo,workarounds,ezoom,vpswitch,extrawm,fade,session,shift,wobbly]
Date: Wed Aug 26 17:47:57 2009
DistroRelease: Ubuntu 9.10
MachineType: LENOVO 8933Y16
Package: compiz 1:0.8.2-0ubuntu16
PackageArchitecture: all
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
PciDisplay: 00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller [8086:2a02] (rev 0c)
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.31-7-generic root=UUID=8920ca3c-8a9b-4b68-893c-1fec8a7cf652 ro quiet splash
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-7.27-generic
RelatedPackageVersions:
 xserver-xorg 1:7.4+3ubuntu5
 libgl1-mesa-glx 7.6.0~git20090817.7c422387-0ubuntu2
 libdrm2 2.4.12+git20090801.45078630-0ubuntu1
 xserver-xorg-video-intel 2:2.8.0-0ubuntu2
 xserver-xorg-video-ati 1:6.12.99+git20090629.f39cafc5-0ubuntu6
SourcePackage: compiz
Uname: Linux 2.6.31-7-generic i686
XorgConf: Error: [Errno 2] No such file or directory: '/etc/X11/xorg.conf'
dmi.bios.date: 06/28/2007
dmi.bios.vendor: LENOVO
dmi.bios.version: 7OET24WW (1.03 )
dmi.board.name: 8933Y16
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr7OET24WW(1.03):bd06/28/2007:svnLENOVO:pn8933Y16:pvrThinkPadR61/R61i:rvnLENOVO:rn8933Y16:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 8933Y16
dmi.product.version: ThinkPad R61/R61i
dmi.sys.vendor: LENOVO
system: distro = Ubuntu, architecture = i686, kernel = 2.6.31-7-generic

Revision history for this message
Rami Al-Rfou' (rmyeid) wrote :
Changed in compiz (Ubuntu):
importance: Undecided → High
assignee: nobody → Robert Ancell (robert-ancell)
Revision history for this message
Robert Ancell (robert-ancell) wrote :

Appears to be a combination of compiz+intel. Rick Spencer had this occur since dist-upgrading today. I am not reproducing on my upgraded compiz+amd system.

Revision history for this message
Robert Ancell (robert-ancell) wrote :

Still occurs using compiz 0.8.3 in PPA:
https://launchpad.net/~compiz/+archive/ppa

summary: - compiz eats 100% of the CPU
+ Uses 100% CPU with Intel drivers
Changed in compiz (Ubuntu):
status: New → Triaged
Revision history for this message
Travis Watkins (amaranth) wrote : Re: Uses 100% CPU with Intel drivers

Can you install the dbgsym packages for compiz, then attach gdb to the running compiz and use 'where' to see what it is doing? Also, is metacity still running at this time? There was a bug where metacity kept getting restarted by the session manager and trying to take over, which makes compiz use lots of CPU. Perhaps it is related.
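
For anyone who wants to script the gdb step, here is a minimal sketch. It assumes gdb is installed and that you are allowed to attach to the process; the function names are only illustrative. It relies on gdb's standard `-p`, `-batch` and `-ex` options to attach, run 'where' once, and exit.

```python
import subprocess


def gdb_where_command(pid):
    """Build a gdb invocation that attaches to pid, runs 'where', then exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "where"]


def capture_backtrace(pid):
    """Run gdb against a live process and return whatever it printed."""
    result = subprocess.run(gdb_where_command(pid), capture_output=True, text=True)
    return result.stdout
```

Call `capture_backtrace()` with the PID of the spinning compiz process, a few times in a row, and attach the output. Without the debug symbols the trace will mostly show raw addresses.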

Revision history for this message
Juan Sebastián Marulanda (juanchito2006) wrote :

This CPU hang is triggered as easily as by pressing Alt+Tab.

Revision history for this message
Rick Spencer (rick-rickspencer3) wrote :

Travis -
1. I did not see metacity running
2. I tried to install dbgsym, but the packages were out of sync

Changed in compiz (Ubuntu):
milestone: none → karmic-alpha-6
Revision history for this message
Bryce Harrington (bryce) wrote :

I agree with Travis that having some gdb data would help pinpoint where the failure is occurring.

Aside from that, there's a hypothesis that the recent mesa upgrade caused this issue. That can be tested by downgrading to the earlier version of mesa (7.5) and verifying that the issue goes away. You may still have the old mesa debs in your /var/cache/apt/archives/; if not, they can also be obtained from the x-retro PPA: https://edge.launchpad.net/~ubuntu-x-swat/+archive/x-retro

If it can be confirmed that the issue goes away when downgrading to mesa 7.5, then the next steps would be to move the bug report from compiz to mesa, and I will send it upstream to get priority attention.
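
To check whether the old packages are still in the apt cache, something like the following sketch may help. It assumes the usual `<package>_<version>_<arch>.deb` file naming in the cache directory; the function name and its parameters are made up for illustration.

```python
from pathlib import Path


def cached_debs(package_prefix, version_prefix, cache_dir="/var/cache/apt/archives"):
    """List cached .deb files whose package and version start with the given prefixes."""
    matches = []
    for deb in sorted(Path(cache_dir).glob(f"{package_prefix}*.deb")):
        # File names are expected to look like <package>_<version>_<arch>.deb
        parts = deb.name[:-len(".deb")].split("_")
        if len(parts) == 3 and parts[1].startswith(version_prefix):
            matches.append(deb)
    return matches
```

For example, `cached_debs("libgl1-mesa", "7.5")` would list any mesa 7.5 packages still on disk, which could then be installed with `dpkg -i` for the downgrade test.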

Revision history for this message
Bryce Harrington (bryce) wrote :

I am not able to reproduce this on my 965 hardware updated to current karmic. (Also did not see the error on this hardware with karmic updated as of late last week.)

Revision history for this message
In , Bryce Harrington (bryce) wrote :

Forwarding this bug from Ubuntu:
https://bugs.edge.launchpad.net/ubuntu/+bug/419264

[Problem]
Compiz locks up the system, using 100% CPU and preventing mouse or keyboard input until it is killed, when running with recent git snapshots of libdrm and mesa. After downgrading to mesa 7.5 and libdrm 2.4.12, the issue goes away.

[Original Report]
compiz eats 100% of the CPU, even after restarting! Only kill -9 is able to close the crazy compiz process.

Revision history for this message
In , Bryce Harrington (bryce) wrote :

Created an attachment (id=28969)
BootDmesg.txt

Revision history for this message
In , Bryce Harrington (bryce) wrote :

Created an attachment (id=28970)
CurrentDmesg.txt

Revision history for this message
In , Bryce Harrington (bryce) wrote :

Created an attachment (id=28971)
Dependencies.txt

Revision history for this message
In , Bryce Harrington (bryce) wrote :

Created an attachment (id=28972)
XorgLog.txt

Revision history for this message
In , Bryce Harrington (bryce) wrote :

Created an attachment (id=28973)
XsessionErrors.txt

Revision history for this message
In , Bryce Harrington (bryce) wrote :

Created an attachment (id=28974)
gdb.txt

Revision history for this message
In , Chris Wilson (ickle) wrote :

Just a quick question to clarify: Is it spinning inside drawWindowTexture() chain or are we doing lots of counter-productive work?

Another couple of gdb traces, or ideally a sysprof, whilst it is spinning would be useful.

Revision history for this message
In , Bryce Harrington (bryce) wrote :

I took several additional gdb traces but they all look more or less the same - something stuck in _mesa_copy_rect ().

Revision history for this message
Bryce Harrington (bryce) wrote : Re: Uses 100% CPU with Intel drivers

I reproduced it finally

Revision history for this message
Bryce Harrington (bryce) wrote :

I'm also having some trouble figuring out how to get the dbgsyms for compiz installed - Robert or Travis please give guidance on this.

Revision history for this message
Bryce Harrington (bryce) wrote :

After downgrading libdrm and mesa to the versions in the x-retro PPA, the issue seems to have gone away. Previously, compiz would freeze reliably after <5 min usage. With the x-retro ppa I've been running now 20 minutes with nary a problem.

Changed in mesa (Ubuntu Karmic):
importance: Undecided → High
status: New → Confirmed
Revision history for this message
Bryce Harrington (bryce) wrote :

I'm gathering that either there's a bug in mesa or libdrm, or else compiz needs to be updated in order to work properly with the new mesa.

I've reviewed the list of changes subsequent to the snapshot date for both our libdrm and mesa, and do not see an obvious commit that fixes this issue.

Changed in mesa (Ubuntu Karmic):
assignee: nobody → Bryce Harrington (bryceharrington)
summary: - Uses 100% CPU with Intel drivers
+ Uses 100% CPU with latest mesa/libdrm update
Revision history for this message
Robert Ancell (robert-ancell) wrote :

I've opened bug 420321 to add a -dbg package to compiz. In the meantime you can use the build from my PPA:
https://launchpad.net/~robert-ancell/+archive/ppa

Revision history for this message
Bryce Harrington (bryce) wrote :

I'm still not certain whether this is going to need a fix in compiz or in something mesa/libdrm/intel-ish, but I've gone ahead and filed the bug upstream with X.org at https://bugs.freedesktop.org/show_bug.cgi?id=23566 and flagged it to our Intel rep. The fact that it showed up after we upped mesa, and goes away after downgrading to 7.5 seems too suspicious for coincidence. (I'd like to hear if this holds true for others.)

Changed in xorg-server:
status: Unknown → Confirmed
Revision history for this message
In , Eric Anholt (eric-anholt) wrote :

This is probably yet another case of the lack of LRUs on our fences causing failure. Writing a patch.

Revision history for this message
Matt Zimmerman (mdz) wrote :

FYI I don't see this issue on my mini 10v, which hasn't been upgraded since 25 August (so mesa 7.5)

Revision history for this message
Martin Albisetti (beuno) wrote :

I can reliably reproduce this on 965 as well. One thing that I've found that triggers it instantly is opening up gwibber (if that's of any help).

Revision history for this message
Martin Albisetti (beuno) wrote :

Downgrading to libgl1-mesa-dri 7.5-1ubuntu1 makes the issue go away for me as well (sorry for the noise, the computer hung in the middle!)

Revision history for this message
Bryce Harrington (bryce) wrote : Re: [Bug 419264] Re: Uses 100% CPU with latest mesa/libdrm update

On Fri, Aug 28, 2009 at 08:16:35PM -0000, Martin Albisetti wrote:
> I can reliably reproduce this on 965 as well. One thing that I've found
> that triggers it instantly is opening up gwibber (if that's of any
> help).

I wonder if the issue is particular to i965. Anyone reproduced it on
945?

Bryce

Revision history for this message
Martin Albisetti (beuno) wrote :

My netbook is 945, and I can't reproduce it.
It's using mesa 7.6 with everything up-to-date.

Revision history for this message
Bryce Harrington (bryce) wrote :

Interesting, I've reproduced the compiz 100% cpu freeze with mesa 7.1 as well.

Revision history for this message
Bryce Harrington (bryce) wrote :

Typo in previous comment. "7.1" should be "7.5-1ubuntu1"

Revision history for this message
In , Eric Anholt (eric-anholt) wrote :

*** Bug 23220 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Eric Anholt (eric-anholt) wrote :

*** Bug 23253 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Eric Anholt (eric-anholt) wrote :

*** Bug 23366 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Eric Anholt (eric-anholt) wrote :

pull request sent.

commit a09ba7faf75fa4b21980d81de8e5f3d5c0785ccf
Author: Eric Anholt <email address hidden>
Date: Sat Aug 29 12:49:51 2009 -0700

    drm/i915: Fix CPU-spinning hangs related to fence usage by using an LRU.

    The lack of a proper LRU was partially worked around by taking the fence
    from the object containing the oldest seqno. But if there are multiple
    objects inactive, then they don't have seqnos and the first fence reg
    among them would be chosen. If you were trying to copy data between two
    mappings, this could result in each page fault stealing the fence from
    the other argument, and your application hanging.

    https://bugs.freedesktop.org/show_bug.cgi?id=23566
    https://bugs.freedesktop.org/show_bug.cgi?id=23220
    https://bugs.freedesktop.org/show_bug.cgi?id=23253
    https://bugs.freedesktop.org/show_bug.cgi?id=23366

    Cc: Stable Team <email address hidden>
    Signed-off-by: Eric Anholt <email address hidden>
    Reviewed-by: Jesse Barnes <email address hidden>
    Reviewed-by: Chris Wilson <email address hidden>
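
The ping-pong described in the commit message can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions, not the i915 code: the class and function names are hypothetical, the pool has 4 registers, every register starts out owned by an idle object, and the pre-fix policy is reduced to "take the first register" because none of the owners has a seqno.

```python
class FencePool:
    """Toy model of a fixed pool of fence registers; not the i915 implementation."""

    def __init__(self, num_regs, use_lru):
        self.use_lru = use_lru
        # Every register starts out owned by some idle (inactive) object.
        self.owner = [f"idle{i}" for i in range(num_regs)]
        # Register indices ordered from least to most recently allocated.
        self.lru = list(range(num_regs))
        self.faults = 0

    def access(self, obj):
        """Simulate CPU access to obj through its mapping."""
        if obj in self.owner:
            return  # obj still holds a fence, so no page fault occurs
        self.faults += 1
        if self.use_lru:
            reg = self.lru.pop(0)   # steal the least recently allocated register
            self.lru.append(reg)    # it is now the most recently allocated
        else:
            reg = 0                 # all owners are inactive: first register wins
        self.owner[reg] = obj       # previous owner loses its fence


def faults_during_copy(use_lru, pages=1000, num_regs=4):
    """Count page faults while copying page by page from 'src' to 'dst'."""
    pool = FencePool(num_regs, use_lru)
    for _ in range(pages):
        pool.access("src")
        pool.access("dst")
    return pool.faults


print(faults_during_copy(use_lru=False))  # 2000: every access steals the fence back
print(faults_during_copy(use_lru=True))   # 2: src and dst each keep their own fence
```

In this model the first policy faults on every single access because "src" and "dst" keep evicting each other from register 0, while the LRU policy hands them different registers after two faults. The real driver is more involved, but the difference in behaviour is the same idea.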

Revision history for this message
Bryce Harrington (bryce) wrote :

From the upstream bug, it appears that solving this issue will need a kernel fix. Leann and Andy, mind taking a look at including this patch for Karmic?

http://lists.freedesktop.org/archives/intel-gfx/2009-August/003981.html

affects: mesa (Ubuntu Karmic) → linux (Ubuntu Karmic)
Changed in linux (Ubuntu Karmic):
assignee: Bryce Harrington (bryceharrington) → Leann Ogasawara (leannogasawara)
status: Confirmed → Triaged
Changed in xorg-server:
status: Confirmed → Fix Released
Revision history for this message
Sebastien Bacher (seb128) wrote :

I have been getting this issue every time I use Alt+Tab since I upgraded.

Revision history for this message
In , quanxian (quanxian-wang) wrote :

(In reply to comment #14)
Hi, Eric
I have applied the patch to 2.6.31-rc7, but the problem is still there. Are more patches needed?
My environment is libdrm 2.4.12, Mesa 7.6, xserver 1.6.3, xf86-video-intel 2.8.1.

> pull request sent.
>
> commit a09ba7faf75fa4b21980d81de8e5f3d5c0785ccf
> Author: Eric Anholt <email address hidden>
> Date: Sat Aug 29 12:49:51 2009 -0700
>
> drm/i915: Fix CPU-spinning hangs related to fence usage by using an LRU.
>
> The lack of a proper LRU was partially worked around by taking the fence
> from the object containing the oldest seqno. But if there are multiple
> objects inactive, then they don't have seqnos and the first fence reg
> among them would be chosen. If you were trying to copy data between two
> mappings, this could result in each page fault stealing the fence from
> the other argument, and your application hanging.
>
> https://bugs.freedesktop.org/show_bug.cgi?id=23566
> https://bugs.freedesktop.org/show_bug.cgi?id=23220
> https://bugs.freedesktop.org/show_bug.cgi?id=23253
> https://bugs.freedesktop.org/show_bug.cgi?id=23366
>
> Cc: Stable Team <email address hidden>
> Signed-off-by: Eric Anholt <email address hidden>
> Reviewed-by: Jesse Barnes <email address hidden>
> Reviewed-by: Chris Wilson <email address hidden>
>

Revision history for this message
In , Sven Arvidsson (sa) wrote :

I don't know about Compiz, but at least the problems I reported with Warzone 2100 and ETQW have been fixed with this patch.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Hi Guys,

Just a quick note that I'll build a Karmic test kernel with the patch Bryce mentioned from comment #22. I'll post a link to the build when it's ready. Thanks.

Changed in linux (Ubuntu Karmic):
status: Triaged → In Progress
tags: added: xorg-needs-kernel-fix
Revision history for this message
Bryce Harrington (bryce) wrote :

Sebastien, downgrade to mesa 7.5 (https://edge.launchpad.net/~ubuntu-x-swat/+archive/x-retro) and see if that solves it for you.

Revision history for this message
In , Eric Anholt (eric-anholt) wrote :

Reclose since it's reported fixed by Bryce.

Revision history for this message
In , Bryce Harrington (bryce) wrote :

Yes, since updating to a kernel which includes this patch, I have been unable to reproduce the bug so far. I'll continue to keep an eye out for it, and encourage others to likewise test, but so far it appears this patch solved it.

Revision history for this message
Sebastien Bacher (seb128) wrote :

Downgrading to libgl1-mesa-dri=7.5-1ubuntu1 and restarting compiz fixes the issue.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

As noted in comment #24, I've built and posted a test kernel with the patch referenced in comment #22. Please test and let us know your results. Thanks.

http://people.canonical.com/~ogasawara/lp419264/

Revision history for this message
Bryce Harrington (bryce) wrote :

Thanks, I've installed and booted the lp419264 kernel with mesa 7.6 and so far am not able to reproduce the hang. I'll keep running it for a bit, since downgrading mesa seemed to fix it but the issue reappeared after a few hours of heavy use.

Revision history for this message
Bryce Harrington (bryce) wrote :

Sebastien, it would be helpful if you could also test Leann's kernel with the mesa 7.6 package and see whether it solves the hang you're seeing as well, because if it does, this will be the preferred fix for this bug.

Revision history for this message
Rami Al-Rfou' (rmyeid) wrote :

As in comment 5, I trigger the problem by pressing Alt+Tab. I am using the xorg stack from the xorg-edgers PPA. After installing Leann's kernel, I cannot reproduce the problem.

Revision history for this message
Sebastien Bacher (seb128) wrote :

The issue is fixed here too, using the karmic libgl1-mesa-dri and Leann's linux version.

Revision history for this message
Sebastien Bacher (seb128) wrote :

To be clear, the karmic libgl1-mesa-dri = current 7.6.

Revision history for this message
Bryce Harrington (bryce) wrote :

Thanks everyone for the quick testing. Ogasawara said she'll get the patch queued for the kernel, so I think this bug is nearly in the bag now.

Changed in compiz (Ubuntu Karmic):
status: Triaged → Invalid
Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

I've posted the request to the Ubuntu kernel-team mailing list that this be included in Karmic:

https://lists.ubuntu.com/archives/kernel-team/2009-September/007128.html

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Tim just applied this patch to Karmic. It appears this patch also just hit upstream so we'll only need to carry this until the next rebase with upstream.

ogasawara@emiko:~/linux-2.6$ git log -p a09ba7faf75fa4b21980d81de8e5f3d5c0785ccf
commit a09ba7faf75fa4b21980d81de8e5f3d5c0785ccf
Author: Eric Anholt <email address hidden>
Date: Sat Aug 29 12:49:51 2009 -0700

    drm/i915: Fix CPU-spinning hangs related to fence usage by using an LRU.

Revision history for this message
Karunadheera (karunadheera) wrote :

Hi,

Can anyone please advise us on how to apply this patch to our kernel?

00:00.0 Host bridge: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub (rev 03)
00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 03)
00:02.1 Display controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 03)

prageeth@prageeth-laptop:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu karmic (development branch)
Release: 9.10
Codename: karmic
prageeth@prageeth-laptop:~$ uname -a
Linux prageeth-laptop 2.6.31-9-generic #29-Ubuntu SMP Sun Aug 30 17:39:26 UTC 2009 x86_64 GNU/Linux

Revision history for this message
Shwan (shwan-ciyako) wrote :

Every time I resumed from suspend, compiz went crazy; now, with the kernel above, it is working normally.
I also had another problem: in xournal, if I held the mouse pointer over a button, its tooltip text would not show and the whole screen just flickered, as if going on and off very fast. Now that is working normally too, so the problem was bigger than just some hangs.

Revision history for this message
Juan Sebastián Marulanda (juanchito2006) wrote :

In the latest Kernel update (2.6.31-10.30), Compiz seems to work without issues.

Revision history for this message
Łukasz Kuryło (lukasz-kurylo) wrote :

That's a negative. For me mesa 7.6.0~git20090817.7c422387-0ubuntu2 is still a no-go. Even after the kernel upgrade to 2.6.31-10, I got a kernel panic when I moved the mouse while the glMatrix screensaver was running. The next time, glMatrix just froze, which was better since I could do the SysRq + RSIUB kung fu.
My bug was marked as a dupe of this one (#419264); maybe it's another issue then.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Indeed the patch noted in comment #35 should be in the latest 2.6.31-10.30 kernel. I'm marking the linux kernel task as Fix Released.

@Łukasz Kuryło, I'm going to undup your bug from this as it does seem you are experiencing a different issue which was not resolved by this patch. Thanks.

Changed in linux (Ubuntu Karmic):
status: In Progress → Fix Released
Revision history for this message
In , Gordon Jin (gordon-jin) wrote :

Good, this commit went into 2.6.31.

blankgus (blankgus)
Changed in linux (Ubuntu Karmic):
status: Fix Released → Fix Committed
Revision history for this message
Travis Watkins (amaranth) wrote :

This was fixed in the 2.6.31-10.30 kernel and we've got 2.6.31-10.32 now so this is Fix Released.
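
A quick way to tell whether a given Karmic kernel should already carry the fix is to compare its version signature against 2.6.31-10.30. The sketch below assumes the signature format shown in the report (e.g. "Ubuntu 2.6.31-7.27-generic", which Ubuntu kernels typically expose in /proc/version_signature); the function names are hypothetical and the comparison only makes sense for Ubuntu Karmic kernels.

```python
import re

# First Karmic kernel reported in this thread to include the fence LRU patch.
FIXED_IN = (2, 6, 31, 10, 30)


def parse_signature(signature):
    """Turn 'Ubuntu 2.6.31-7.27-generic' into (2, 6, 31, 7, 27)."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)", signature)
    if match is None:
        raise ValueError(f"unrecognised version signature: {signature!r}")
    return tuple(int(part) for part in match.groups())


def has_fence_lru_fix(signature):
    """True if the kernel is at least as new as the first fixed Karmic build."""
    return parse_signature(signature) >= FIXED_IN
```

With the reporter's original kernel, `has_fence_lru_fix("Ubuntu 2.6.31-7.27-generic")` is false, while a 2.6.31-10.32 signature passes the check.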

Changed in linux (Ubuntu Karmic):
status: Fix Committed → Fix Released
Changed in xorg-server:
importance: Unknown → Critical
Changed in xorg-server:
importance: Critical → Unknown
Changed in compiz (Ubuntu):
status: Invalid → Fix Released
Changed in xorg-server:
importance: Unknown → Critical