Comment 523 for bug 359392

santorc (saxvodafome) wrote : R: [Bug 359392] Re: [i965] X freezes starting on April 3rd

I confirmed this bug on Ubuntu 9.04 with kernel 2.6.28-11.

-----Original Message-----
From: <email address hidden> on behalf of zwaldowski
Sent: Tue 02/06/2009 23.12
To: <email address hidden>
Subject: [Bug 359392] Re: [i965] X freezes starting on April 3rd

Is the kernel-side fix in 2.6.30? Can it be backported to 2.6.29 or
even 2.6.28 (Jaunty kernel)? How about the actual fixes for X? Can
they be backported?

--
[i965] X freezes starting on April 3rd
https://bugs.launchpad.net/bugs/359392

Status in X.org xf86-video-intel: Confirmed
Status in "compiz" source package in Ubuntu: In Progress
Status in "xserver-xorg-video-intel" source package in Ubuntu: Fix Released
Status in compiz in Ubuntu Jaunty: Fix Committed
Status in xserver-xorg-video-intel in Ubuntu Jaunty: Confirmed
Status in compiz in Ubuntu Karmic: In Progress
Status in xserver-xorg-video-intel in Ubuntu Karmic: Fix Released

Bug description:
[Problem]
Starting around 4/3, when mesa was upgraded from 7.3 to 7.4, several i965 users began seeing X freeze after several hours of use. The freezes are triggered by application usage and are especially noticeable with (but not unique to) compiz enabled.

[Impact]
The freeze bug affects a subset of i965-based systems, most particularly those using compiz. Exact numbers cannot be determined, but rough estimates suggest that as many as 25-50% of these systems may be affected.

The problem is severe: an unpredictable lockup of the system that requires a power cycle to recover from. For some users it occurs within minutes, while for others it occurs after a few hours of use.

[How Addressed in Development Version]
For now, the patch being proposed for Jaunty has been uploaded to Karmic.

Longer term, we plan to move from EXA to UXA once the latter is stable enough. While UXA exhibits other kinds of freezes, we have not yet been able to reproduce this particular freeze there.

[Patch for Jaunty]
A low-risk workaround that has proven effective at eliminating freezes, or at least greatly reducing their frequency, is to increase the Virtual framebuffer size. Some users do this locally as a matter of course to gain dual-head support, so this setting has received extremely widespread testing already.

The patch for Jaunty causes the Virtual size to be set to 2048x2048 if it is not otherwise specified. Users can still override this with their own settings, larger or smaller, as desired.
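For users who want to set or override the Virtual size themselves, a minimal xorg.conf sketch follows. The Identifier and Device names are assumptions and must match the names already used in your own /etc/X11/xorg.conf; the 2048x2048 value is the default described above, and a larger or smaller size can be substituted.

```
# Sketch only: the Identifier and Device names below are assumed, not taken
# from this bug report. Match them to your existing xorg.conf.
Section "Screen"
    Identifier "Default Screen"
    Device     "Configured Video Device"
    SubSection "Display"
        # Width and height of the virtual framebuffer, in pixels
        Virtual 2048 2048
    EndSubSection
EndSection
```

X must be restarted for the change to take effect.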

A side effect of this patch is that it also mitigates bug 158415 to a degree, which will make projector usage somewhat easier for this hardware because it will enable X to recognize higher resolutions available from the external monitor than were available on initial boot.

[Test Case]
The best method found to reproduce the bug is:
  a. Enable compiz
  b. Set your desktop to a 6x1 workspace layout
  c. Run http://launchpadlibrarian.net/25683477/repro.sh
  d. The system will typically freeze within 1-20 minutes
  e. Power-button shutdown is required to reset the system

[Regression Potential]
In general, Virtual has been widely and extensively tested, so we do not expect this patch to trigger regressions.

The patch is coded to only take effect on i965 systems, so the scope of any regression that might conceivably be triggered will be limited to just that hardware.

[Suspects]
Omitting obviously trivial, unrelated changes, here is what changed in several suspect packages in the timeframe in question:

* intel driver:
  4/01: 118_drop_legacy3d.patch: Removed Legacy3D
  4/03: 114_fix_xv_with_non_gem.patch: Dropped since it caused regression
  4/06: 119_drm_bo_unreference_needs_null.patch: Fixes various nullptr derefs
  4/08: 120_fix_vt_switch.patch: Fix nullptr deref in video playback

* xserver:
  4/08: 177_animated_cursor_change_master.patch: fixes animated cursors
  4/06: 174_set_bg_pixmap_of_cow_to_none.patch: Sets bg pixmap of
         composite overlay window to None
  3/30: 172_cwgetbackingpicture_nullptr_check.patch: fix race condition
         when minimizing/maximizing firefox with flash video playing.

* mesa:
  4/03: 7.4 released
    * Added MESA_GLX_FORCE_DIRECT env var for Xlib/software driver
    * GLSL version 1.20 is returned by the GL_SHADING_LANGUAGE_VERSION query
    * glGetActiveUniform() returned wrong size for some array types
    * Fixed some error checking in glUniform()
    * Fixed a potential glTexImage('proxy target') segfault
    * Fixed bad reference counting for 1D/2D texture arrays
    * Fixed VBO + glPush/PopClientAttrib() bug #19835
    * Assorted i965 driver bug fixes
    * Fixed a Windows compilation failure in s_triangle.c
    * Fixed a GLSL array indexing bug
    * Fixes for building on Haiku

* linux:
  4/04: 2.6.28-11.41: Revert MCHBAR patch
  4/02: 2.6.28-11.40: Add MCHBAR patch

* libdrm:
  4/04: 02_libdrm_nouveau_update.patch: Only affects nouveau code
  3/29: libdrm-nouveau1.symbols: Probably innocuous

[Workarounds]
Various people have found that one or more of the following helps to reduce the frequency of the freezes or eliminate them:

  * Turn off compiz
  * AccelMethod UXA
  * Set NoAccel true
  * Set Virtual to something high
  * MigrationHeuristic greedy
  * Revert to 2.4 -intel driver
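
As a rough illustration, the driver-level workarounds in this list could be expressed in the Device section of xorg.conf as sketched below. The Identifier name is an assumption and must match your own configuration. The options are meant to be tried one at a time, so all but one are commented out here. MigrationHeuristic is an EXA setting, so it is only expected to matter when EXA (not UXA) is the acceleration method. The Virtual setting is not shown because it belongs in the Display subsection of the Screen section, not here.

```
# Sketch only: the Identifier name is assumed. Enable one option at a time.
Section "Device"
    Identifier "Configured Video Device"
    Driver     "intel"
    # Switch from EXA to UXA acceleration
    Option     "AccelMethod" "UXA"
    # Disable 2D acceleration entirely (slow, but a useful test)
    # Option   "NoAccel" "true"
    # EXA pixmap migration policy (relevant only with EXA)
    # Option   "MigrationHeuristic" "greedy"
EndSection
```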

[Current Theory]
A 3D memory buffer accumulates data until something gets in there incorrectly and leads to a freeze. This can happen either slowly over a long period of light use, or fairly soon if using the system heavily. It is not simply a matter of filling the memory up, so the trigger often seems to be random, but usually follows some sort of graphics transition (such as the 3D effect from alt-tab, or closing a firefox window).

It is fairly certain that there are multiple freeze bugs present in X.org with the -intel driver. This causes significant confusion when people having different bugs think they might have the same one, and find the symptoms and workarounds don't match. It is also suspected that the same root bug may have multiple different ways of triggering it.

Increasing the Virtual settings seems to either eliminate or greatly reduce the frequency of these freezes. Presumably this allocates larger memory buffers so the chances of something hitting a wrong thing are greatly lessened. However, none of this is well understood.

[Original Report]
I am using Kubuntu Jaunty beta. My system is freezing randomly. I have the latest updates.
After a freeze the mouse pointer still works, and sometimes the power-off button works. On powering off, when the Kubuntu logo stage is reached, it shows a distorted image.

This happens with or without desktop effects. The freezes are very random, but mostly occur when I run new applications.
I am using Intel graphics.

[lspci]
00:00.0 Host bridge [0600]: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub [8086:2a00] (rev 0c)
     Subsystem: Hewlett-Packard Company Device [103c:30be]
00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller [8086:2a02] (rev 0c)
     Subsystem: Hewlett-Packard Company Device [103c:30be]