[i945] spontaneous black screen (major pipe-A underrun)

Bug #311895 reported by Martin Pitt
This bug affects 10 people
Affects             Status        Importance  Assigned to  Milestone
xf86-video-intel    Fix Released  High
linux (Ubuntu)      Fix Released  High        Unassigned

Bug Description

Binary package hint: xserver-xorg-video-intel

I am using a Dell Latitude D430 with an Intel GM945.

When I use my external 19" TFT (through DVI, 1280x1024), I occasionally get a black screen. This is not triggered by anything obvious; it just happens spontaneously. It is impossible to recover from this by restarting X; only a reboot cures it.

Further investigation shows that this is caused by a long series of

  (EE) intel(0): underrun on pipe A!

errors (some 10,000 lines in the log). I get short underruns pretty often, which make the screen flicker for a split second, but when the long series happens, the screen stays black forever.

A comparison of the intel_reg_dump output (-: works, +: black screen) confirms this as well:

-(II): PIPEASTAT: 0x00000203 (status: VSYNC_INT_STATUS VBLANK_INT_STATUS OREG_UPDATE_STATUS)
+(II): PIPEASTAT: 0x80000000 (status: FIFO_UNDERRUN)

I have not observed this behaviour when I use the laptop undocked, with the internal screen (1280x800).

This is current Jaunty with -intel 2:2.5.1-1ubuntu7. It also happened in Intrepid, but back then I didn't know about intel_reg_dumper.

ProblemType: Bug
Architecture: i386
DistroRelease: Ubuntu 9.04
Package: xserver-xorg-video-intel 2:2.5.1-1ubuntu7
ProcEnviron:
 PATH: custom, user
 LANG=de_DE.UTF-8
 SHELL=/bin/bash
ProcVersion: Linux version 2.6.28-4-generic (buildd@palmer) (gcc version 4.3.3 20081217 (prerelease) (Ubuntu 4.3.2-2ubuntu9) ) #5-Ubuntu SMP Fri Dec 26 22:48:51 UTC 2008

SourcePackage: xserver-xorg-video-intel
Uname: Linux 2.6.28-4-generic i686
xkbcomp:

[lspci]
00:00.0 Host bridge [0600]: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express Memory Controller Hub [8086:27a0] (rev 03)
     Subsystem: Dell Device [1028:0201]
00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller [8086:27a2] (rev 03)
     Subsystem: Dell Device [1028:0201]

Revision history for this message
In , Martin Pitt (pitti) wrote :

Created an attachment (id=21511)
intel_reg_dump when working

Revision history for this message
In , Martin Pitt (pitti) wrote :

Created an attachment (id=21512)
intel_reg_dump after going black

Revision history for this message
In , Martin Pitt (pitti) wrote :

Created an attachment (id=21513)
Xorg.0.log

X.org log which shows the plethora of pipe-A underruns.

Revision history for this message
In , Martin Pitt (pitti) wrote :

When this happens, I get the following kernel messages:

Dec 28 10:25:07 tick kernel: [ 5559.025081] mtrr: no MTRR for d0000000,10000000 found
Dec 28 10:25:08 tick kernel: [ 5560.478087] apm: BIOS not found.

Revision history for this message
In , Martin Pitt (pitti) wrote :

$ diff -U 0 intel_regs.works.txt intel_regs.black.txt
--- intel_regs.works.txt 2008-12-28 10:33:22.000000000 +0100
+++ intel_regs.black.txt 2008-12-28 10:24:50.000000000 +0100
@@ -57 +57 @@
-(II): PIPEASTAT: 0x00000203 (status: VSYNC_INT_STATUS VBLANK_INT_STATUS OREG_UPDATE_STATUS)
+(II): PIPEASTAT: 0x80000000 (status: FIFO_UNDERRUN)
@@ -132 +132 @@
-(II): FBC_CONTROL: 0x43e847e2
+(II): FBC_CONTROL: 0xc3e847e2
@@ -134 +134 @@
-(II): FBC_STATUS: 0x20000000
+(II): FBC_STATUS: 0x60000000
@@ -138 +138 @@
-(II): MI_MODE: 0x00000200
+(II): MI_MODE: 0x00000000

Revision history for this message
Martin Pitt (pitti) wrote :
Revision history for this message
Martin Pitt (pitti) wrote :

This is the register dump when it works.

The pipe-A underruns can be seen in XorgLogOld.txt.

dmesg output when this happens:
Dec 28 10:25:07 tick kernel: [ 5559.025081] mtrr: no MTRR for d0000000,10000000 found
Dec 28 10:25:08 tick kernel: [ 5560.478087] apm: BIOS not found.

Revision history for this message
Martin Pitt (pitti) wrote :

This is the register dump after it goes black.

Changed in xserver-xorg-video-intel:
status: Unknown → Confirmed
Revision history for this message
reini (rrumberger) wrote :

I have what appears to be the same problem with Hardy on my Dell Inspiron 1525 with an Intel GM965, using the internal LCD screen (1280x800).
My Hardy installation is up to date and my xserver-xorg-video-intel version is 2:2.2.1-1ubuntu13.8.
I have apport installed & running, but there seem to be no files in /var/crash/.

Revision history for this message
In , Jesse Barnes (jbarnes-virtuousgeek) wrote :

I think this might be a DUP, can you try the patch in 18651?

*** This bug has been marked as a duplicate of bug 18651 ***

Bryce Harrington (bryce)
Changed in xserver-xorg-video-intel:
status: New → Confirmed
Revision history for this message
In , Martin Pitt (pitti) wrote :

I'm building/installing -intel with this patch applied. I'll report back in a day or two, since the underruns only start to happen after a couple of hours (presumably when I'm doing particular things with my computer, but I'm not able to pinpoint what triggers it).

Thanks!

Revision history for this message
In , Martin Pitt (pitti) wrote :

I found out that starting kvm and doing some other window juggling triggers the quick underrun (i. e. the flickering, not the total blackout) pretty reliably.

With the proposed patch applied, I still get underruns, though. I'll let it run for a couple of days to see whether I get any black screen still.

Revision history for this message
Martin Pitt (pitti) wrote :

I'm building an -intel package with the patch in https://bugs.freedesktop.org/show_bug.cgi?id=18651 applied, and will let it run for two days. (The flicker underruns only seem to happen after some hours, and the complete blackout only happens every other day).

Revision history for this message
Martin Pitt (pitti) wrote :

Hm, if only I could actually build it..

../doltcompile gcc -DHAVE_CONFIG_H -I. -I../../src -I.. -I/usr/include/xorg -I/usr/include/pixman-1 -I/usr/include/drm -Wall -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wnested-externs -fno-strict-aliasing -I/usr/include/xorg -I/usr/include/pixman-1 -I/usr/include/drm -I/usr/include/X11/dri -I../../uxa -DI830_XV -DI830_USE_XAA -DI830_USE_EXA -Wall -g -O2 -MT i830_dri.lo -MD -MP -MF .deps/i830_dri.Tpo -c -o i830_dri.lo ../../src/i830_dri.c
../../src/i830_dri.c: In function 'I830DRISwapContext':
../../src/i830_dri.c:1162: error: 'drm_i915_flip_t' undeclared (first use in this function)
../../src/i830_dri.c:1162: error: (Each undeclared identifier is reported only once
../../src/i830_dri.c:1162: error: for each function it appears in.)
../../src/i830_dri.c:1162: error: expected ';' before 'flip'
../../src/i830_dri.c:1168: error: 'flip' undeclared (first use in this function)

I cannot find "drm_i915_flip_t" anywhere. I also downgraded linux-libc-dev to -3.4 (which the current Jaunty .deb was built with). NB: this is not caused by the patch I was applying; that was already existing code. I grepped /usr/include/ and the source tree and found nothing.

Revision history for this message
Martin Pitt (pitti) wrote :

I worked around this by explicitly adding

typedef struct drm_i915_flip {
   int pipes;
} drm_i915_flip_t;

(Copied from http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/commit/?id=ba55ff15df974197bebd871e28bb96d817ae41c7)

Revision history for this message
Martin Pitt (pitti) wrote :

I found out that starting kvm and doing some other window juggling triggers the quick underrun (i. e. the flickering, not the total blackout) pretty reliably. With the upstream patch applied, I still get underruns, though. I'll let it run for a couple of days to see whether I get any black screen still.

Revision history for this message
Søren Holm (sgh) wrote :

It happens to me 4 times a day.

Revision history for this message
Wolfgang (wt-lists) wrote :

I have the same problem on an Apple Mac Mini (Intel i945): the screen flickers from time to time and even "freezes" occasionally. I can switch to a console with CTRL-ALT-F1, but to get X back alive the system must be rebooted.

Xlog says:
...
(EE) intel(0): underrun on pipe A!
(EE) intel(0): underrun on pipe A!
(EE) intel(0): underrun on pipe A!
(EE) intel(0): underrun on pipe A!
...

The problem has existed since the update to 8.10; 8.04 was stable...

Revision history for this message
Raghu (raghua1111+list) wrote :

I installed a fresh Ubuntu 8.10 on an EEE Box 202 (Atom, 945GME) a couple of days ago and am seeing the exact 'blank screen' problem with "underrun on pipe A!" errors. The Xorg log looks pretty much the same.

Changed in xserver-xorg-video-intel:
status: Confirmed → Invalid
Bryce Harrington (bryce)
description: updated
Revision history for this message
Raghu (raghua1111+list) wrote :

Martin,

Can we have access to the package you built? I will try it out on my Eee Box.

This is the biggest problem I currently see with my Ubuntu. I am used to not having to reboot for months.
Thanks.

Revision history for this message
Raghu (raghua1111+list) wrote :

Martin,

I just saw your comment over at https://bugs.freedesktop.org/show_bug.cgi?id=18651 that said the patch does not fix the problem.

Are there less optimal workarounds for this? Could I use a generic driver that might be more stable? I don't need 3D performance.

thanks.
Raghu.

Revision history for this message
rhtme52 (reinhard-enders) wrote :

Happened to me reliably with my Acer Aspire One 110L, when watching movies with mplayer (usually after 5 - 15 minutes). In my case switching off framebuffer compression worked (See also https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/193419):

Section "Device"
 Identifier "Configured Video Device"
 Driver "intel"
 Option "FramebufferCompression" "off"
EndSection

Bryce Harrington (bryce)
Changed in xserver-xorg-video-intel:
status: Invalid → Unknown
importance: Undecided → High
status: Confirmed → Triaged
Revision history for this message
In , Jesse Barnes (jbarnes-virtuousgeek) wrote :

Looks separate from 18651 unfortunately.

Revision history for this message
Raghu (raghua1111+list) wrote :

Thanks for the info. I am trying out the config now and will see how it works over multiple days. I was using the VESA driver as the workaround for this issue.

Bryce, thanks for increasing priority. This affects a lot of users.

Changed in xserver-xorg-video-intel:
status: Unknown → Confirmed
Revision history for this message
In , Martin Pitt (pitti) wrote :

I have used the suggestion in https://bugs.launchpad.net/bugs/311895
since yesterday (Option "FramebufferCompression" "off"), and that *seems* to do
the trick. I want to test it a little longer before fully confirming,
especially since the most recent X.org stopped logging the underruns in
Xorg.0.log, and I got too used to the occasional screen flicker, so I might
well have ignored them.

But my screen went black (or brown, or white) irrecoverably after a day or two
without that option. If that doesn't happen any more either, I'll report back here.

Revision history for this message
Raghu (raghua1111+list) wrote :

Looks like "Assigned To" field should change from 'freedesktop-bugs #18651' to 'freedesktop-bugs #19304' since 19304 is not a duplicate of 18651 anymore.

Martin Pitt (pitti)
Changed in xserver-xorg-video-intel:
status: Confirmed → Unknown
Changed in xserver-xorg-video-intel:
status: Unknown → Confirmed
Revision history for this message
In , Martin Pitt (pitti) wrote :

Two days have passed with Option "FramebufferCompression" "off", and I didn't notice a single flickering, nor encounter another black screen. Thus I'm fairly sure that this is at least a very good (if not perfect) workaround for the problem, and might also point to the root cause.

Just reiterating that I never ever observed those problems with the internal LVDS (1280x800), just with the external TFT (1280x1024).

</facts>

<wild and unqualified speculations>
Might it be possible that compressing the framebuffer occasionally takes too long once it grows beyond a critical threshold (which lies somewhere between 1280x800 and 1280x1024 pixels)? Any idea why it would sometimes not recover from this at all, perhaps because it takes too long and cannot 'catch up' any more?

Thanks!

Revision history for this message
In , Dave-justdave (dave-justdave) wrote :

That mirrors my experience, too.

I'm on a Mac Mini with a GM945 video... using the TV-out at 1024x768 for several months I never had any issues, and when I changed to using DVI->HDMI output on it at 1280x720, I started getting the solid color screen really frequently. Disabling the FramebufferCompression about three weeks ago did make the machine usable again. I've run the thing for 5 or 6 hours per day on a daily basis (I have it hooked up to a TV using MythTV on it), and although I have still gotten that solid color screen since then, it's only happened once in all that time (as opposed to every 5 or 10 minutes before). I was getting that periodic flicker before, too, and that's infrequent enough that I don't notice it anymore if it's still happening at all.

Revision history for this message
Jean-Paul Calderone (exarkun) wrote :

I can add another data point for the "FramebufferCompression" "Off" fix. I have a Mac Mini. 8.10 is the first release to come even close to being able to drive a display from it. I've been experiencing screen flickering and black screens as described in the initial report as well. mplayer triggers this behavior more quickly than anything else I've found, usually blacking the screen and requiring a reboot within 5 or 10 minutes of video playing. Other media players trigger the behavior too, but less frequently. Everything causes flickering to happen now and then - including non-media playing, like browsing around the filesystem with nautilus. I also get frequent pipe underrun reports in my X log file.

After adding `Option "FramebufferCompression" "off"` to my xorg.conf (as described by rhtme52 above), the system has been stable for several days. The flickering is gone and the display hasn't required a reboot to unblack it since.

Revision history for this message
Raghu (raghua1111+list) wrote :

Another happy customer with the "FramebufferCompression" "off". No flicker and no need to reboot for last two weeks.

I am willing to try any proposed patches.

Revision history for this message
reini (rrumberger) wrote :

I had to add "FramebufferCompression" "off" to the Monitor section (see below). While the issue sometimes only occurred when watching two videos in parallel and making some overlay (KDE kicker tooltip) appear over them, it no longer occurs, even when watching 4 videos + overlay. IOW, issue closed for me. (Kubuntu 8.04.2 BTW)

Section "Monitor"
        Identifier "Configured Monitor"
        Option "FramebufferCompression" "off"
        Option "Ignore" "false"
EndSection

Revision history for this message
In , Jesse Barnes (jbarnes-virtuousgeek) wrote :

In #18491 there's a patch (https://bugs.freedesktop.org/attachment.cgi?id=22319) to mess with the FIFO watermark values that might help. But more than that, it includes a patch that adds the FIFO watermark regs to the intel_reg_dumper tool. Can someone apply it and capture a reg dump both before and after starting X on their machine with the patch applied?

The spontaneous black screen is almost surely caused by a series of pipe underruns. That generally happens if our memory arbitration settings are off (so a given pipe can't get its pixels due to some other pipe hogging the memory interface) or the FIFO watermark regs being incorrect (we fetch a new chunk of pixels too late and end up missing our window of time to feed them to the pipe).

The framebuffer compression hardware periodically compresses the framebuffer into a private section of memory (the compressed buffer), temporarily increasing memory activity; it could be that we're not accounting for that in the FIFO settings, so the screen goes black after the first compression pass (which is usually after about 15s iirc).

Revision history for this message
In , Martin Pitt (pitti) wrote :

I enabled FB compression again and applied the patch in bug 18491. It had quite a dramatic regressive effect: the screen now flickers at each hard disk access, mouse movement, or key press, and only stands still if absolutely nothing happens.

I captured the registers right after boot, then after X and gdm started, and finally after GNOME was fully running.

Revision history for this message
In , Martin Pitt (pitti) wrote :

Created an attachment (id=22944)
regs with patch from #18491: right after boot

Revision history for this message
In , Martin Pitt (pitti) wrote :

Created an attachment (id=22945)
regs with patch from #18491: after X and gdm start

Revision history for this message
In , Martin Pitt (pitti) wrote :

Created an attachment (id=22946)
regs with patch from #18491: GNOME fully running

That's the watermark change you asked for:

--- boot-nox.regs 2009-02-14 15:49:55.000000000 +0100
+++ boot-gdm.regs 2009-02-14 15:50:15.000000000 +0100
@@ -31,2 +31,2 @@
-(II): FWATER_BLC: 0x03060106
-(II): FWATER_BLC2: 0x00000306
+(II): FWATER_BLC: 0x033f033f
+(II): FWATER_BLC2: 0x0000033f

It doesn't change any further after starting GNOME (which does xrandr stuff, etc.) Other registers do change during GNOME startup, though.

Revision history for this message
In , Jesse Barnes (jbarnes-virtuousgeek) wrote :

Heh, I think I had the watermark regs backwards... I'll have to spin a new patch, but you could try changing the watermark value in the patch in the meantime:

watermark = (3 << 8) | 0x3f

should instead be something like

watermark = (3 << 8) | 1

Revision history for this message
In , Martin Pitt (pitti) wrote :

I did that change, much better. :-) It doesn't flicker so badly any more, and the watermark reg diff is now

$ diff -U 0 boot-nox.regs boot-gnome.regs |grep WATER
-(II): FWATER_BLC: 0x03060106
-(II): FWATER_BLC2: 0x00000306
+(II): FWATER_BLC: 0x03010301
+(II): FWATER_BLC2: 0x00000301

I have to run now, so I can't do the full test which triggers the original underrun; will report back tomorrow or Monday.

Thank you so far and have a nice weekend!

Revision history for this message
In , Martin Pitt (pitti) wrote :

Created an attachment (id=22956)
regs with fixed patch from #18491: right after boot

Martin Pitt (pitti)
Changed in xserver-xorg-video-intel:
assignee: nobody → pitti
Martin Pitt (pitti)
Changed in xserver-xorg-video-intel:
status: Triaged → Fix Committed
Changed in xserver-xorg-video-intel:
status: Fix Committed → Fix Released
Martin Pitt (pitti)
Changed in xserver-xorg-video-intel:
status: Fix Released → Confirmed
assignee: pitti → nobody
Martin Pitt (pitti)
Changed in xserver-xorg-video-intel (Ubuntu):
assignee: nobody → pitti
status: Confirmed → In Progress
Martin Pitt (pitti)
Changed in xserver-xorg-video-intel (Ubuntu):
assignee: pitti → nobody
status: In Progress → Triaged
Changed in xserver-xorg-video-intel:
status: Confirmed → In Progress
Bryce Harrington (bryce)
tags: added: black-screen
[91 comments hidden; 171 comments in total]
Revision history for this message
In , Jesse Barnes (jbarnes-virtuousgeek) wrote :

Yay, fix pushed!

commit 7662c8bd6545c12ac7b2b39e4554c3ba34789c50
Author: Shaohua Li <email address hidden>
Date: Fri Jun 26 11:23:55 2009 +0800

    drm/i915: add FIFO watermark support

Revision history for this message
In , Martin Pitt (pitti) wrote :

Oops, I am terribly sorry. We currently put i915 into the initramfs, and it gets loaded from there. When I built the module with the patch, I forgot to update the initramfs, so all these successful tests were actually done with the original i915 from 2.6.31rc1.

Later this afternoon some other package updated the initramfs, and now the screen goes entirely and irrecoverably black when booting, both when docked (external DVI) and when undocked (internal LVDS).

So, perhaps you should revert this from your tree until this is investigated further? So far, I don't seem to have this underrun problem at all with 2.6.31rc1, thus I leave the bug as "resolved".

Revision history for this message
In , Jesse Barnes (jbarnes-virtuousgeek) wrote :

Uh-oh, ok thanks for the heads-up. I'll look at this. Can you modprobe your drm with debug=1 so we can see what the watermark values end up being on your machine? It would help if you could confirm that this particular patch caused the problem too, was that the only change or was there another kernel update as well?

Revision history for this message
In , Martin Pitt (pitti) wrote :

It wasn't the only patch, I also applied the tiny patch from bug 20520 (register restoring ordering fix for resuming). However, I tested that patch in isolation before, and it worked fine. Also, I don't think that code path is active on boot. There was no other kernel update.

I'll send detailed debugging information tomorrow (I hope I can still ssh into the machine, or that it gets logged far enough); bed time for today. I just wanted to give you an early warning so you can perhaps defer propagation of the patch (or just revert it for now, since it works without it).

Revision history for this message
In , Martin Pitt (pitti) wrote :

Created an attachment (id=27329)
logs for early/late i915 loading with drm debugging

So, first I turned on DRM debugging and dmesg capturing:

$ cat /etc/rcS.d/S80dmesg
#!/bin/sh
dmesg > /var/log/dmesg-`date +%T`
$ cat /etc/modprobe.d/drmdebug.conf
options drm debug=1

In the attached logs I renamed the dmesg files from timestamps to situation descriptions, such as "dmesg-31rc1-vanilla-early-2GB-ok.txt"

Then I tested all possible combinations of 2.6.31rc1 with/without this patch, with 1GB or 2 GB RAM, and with "early" or "late" loading of i915/drm.

early: modules are contained in and loaded by the initramfs, i.e. pretty much as one of the first things after the kernel starts to boot

late: I booted without an initramfs, thus init starts readahead, sets the hostname and keyboard layout, and then starts udev, which does a "udev trigger" and causes modules such as drm and i915 to be loaded, which in turn does KMS.

In earlier Karmic (2.6.30 release candidates), we didn't put i915/drm into the initramfs, and it worked fine (it just looked a bit ugly since the mode got switched halfway through boot). Now I noticed that this late loading does not work any more for some reason: not with 2.6.30 final, not with 31rc1, and not with 31rc1+your patch. That is a bug in itself, and sounds pretty unrelated to this pipe underrun issue, so perhaps I should report it separately?

Results from this testing:
 * late loading never works, I always get LVDS and DVI turned off
 * early loading works with .30 final and .31rc1 vanilla
 * with this patch applied, it never works, and worse, I don't even get a dmesg captured; this means that the boot doesn't even get to rcS/70. Sounds like it wedges the display and causes a kernel panic? Anything I can do to debug this?
 * 1 GB/2 GB does not make any difference in any test case

Revision history for this message
In , Jesse Barnes (jbarnes-virtuousgeek) wrote :

(In reply to comment #87)
> Then I tested all possible combinations of 2.6.31rc1 with/without this patch,
> with 1GB or 2 GB RAM, and with "early" or "late" loading of i915/drm.
>
> early: modules are contained and loaded by initramfs, i. e. pretty much as one
> of the first things after the kernel starts to boot
>
> late: I booted without an initramfs, thus init starts readahead, sets the
> hostname and keyboard layout, and then starts udev which does an "udev trigger"
> and causes modules such as drm and i915 to be loaded, which in turn does KMS.

Sounds like a good set of combinations, thanks for testing.

> In earlier Karmic (2.6.30 release candidates), we didn't put i915/drm into the
> initramfs, and it worked fine (just looked a bit ugly since mode got switched
> halfway through boot). Now I noticed that this late loading does not work any
> more for some reason, not with 2.6.30 final, not with 31rc1, or with
> 31rc1+your patch. That is a bug in itself, and sounds pretty unrelated to this
> pipe underrun issue, so
> perhaps I should report it separately?

One thing jumped out between the early (working) and late (broken) logs: in the broken ones there's no line for the fbcon loading & initializing. Which would leave your display blank if/until X starts. Maybe that's missing from the load in the late case?

> Results from this testing:
> * late loading never works, I always get LVDS and DVI turned off
> * early loading works with .30 final and .31rc1 vanilla
> * with this patch applied, it never works, and worse, I don't even get a dmesg
> captured; this means that the boot doesn't even get to rcS/70. Sounds like it
> wedges display and causes a kernel panic? Anything I can do to debug this?
> * 1 GB/2 GB does not make any difference in any test case

Ugh, ok so it's probably not a pipe underrun then if it kills the whole machine (at least I hope not); could be a kernel panic. You could try netconsole (modprobe netconsole netconsole=<params> and then use nc on another machine, the kernel Documentation/ directory has some info on that); it might capture a panic if you load the module by hand with the netconsole running.

Changed in xserver-xorg-video-intel:
status: In Progress → Fix Released
Revision history for this message
In , Martin Pitt (pitti) wrote :

> One thing jumped out between the early (working) and late (broken) logs: in the
> broken ones there's no line for the fbcon loading & initializing. Which would
> leave your display blank if/until X starts. Maybe that's missing from the load
> in the late case?

Indeed, I discussed that with our initramfs/boot guru. So that's not a concern here.

> Ugh, ok so it's probably not a pipe underrun then if it kills the whole machine
> (at least I hope not); could be a kernel panic. You could try netconsole

Thanks for the netconsole hint, that worked beautifully. Indeed it catches a nice trace in the watermark updating:

[ 489.298734] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
[ 489.298908] IP: [<ffffffffa030f1af>] intel_update_watermarks+0xcf/0xd40 [i915]
[ 489.299056] PGD 0
[ 489.299152] Oops: 0000 [#1] SMP
[ 489.299289] last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/drm/card0/dev
[ 489.299384] CPU 0
[ 489.299481] Modules linked in: i915(+) drm netconsole i2c_algo_bit configfs snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm arc4 joydev ecb snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer iwl3945 iwlcore iTCO_wdt iTCO_vendor_support snd_seq_device mac80211 led_class snd psmouse dell_wmi dell_laptop cfg80211 soundcore snd_page_alloc usb_storage usbhid serio_raw dcdbas video output tg3 fbcon tileblit font bitblit softcursor intel_agp [last unloaded: drm]
[ 489.300005] Pid: 2208, comm: work_for_cpu Not tainted 2.6.31-1-generic #14-Ubuntu Latitude D430
[ 489.300005] RIP: 0010:[<ffffffffa030f1af>] [<ffffffffa030f1af>] intel_update_watermarks+0xcf/0xd40 [i915]
[ 489.300005] RSP: 0018:ffff8800229e98b0 EFLAGS: 00010202
[ 489.300005] RAX: 0000000000000000 RBX: ffff880022966800 RCX: ffffffffa03244fb
[ 489.300005] RDX: ffffffffa0321a20 RSI: ffffffffa0324518 RDI: 0000000000000001
[ 489.300005] RBP: ffff8800229e9930 R08: 0000000000000000 R09: 000000000001a400
[ 489.300005] R10: 0000000000000500 R11: 0000000000000000 R12: ffff880022967000
[ 489.300005] R13: 000000000001a400 R14: ffff8800229674a0 R15: 0000000000000001
[ 489.300005] FS: 0000000000000000(0000) GS:ffff8800019b4000(0000) knlGS:0000000000000000
[ 489.300005] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 489.300005] CR2: 0000000000000038 CR3: 0000000001001000 CR4: 00000000000006b0
[ 489.300005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 489.300005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 489.300005] Process work_for_cpu (pid: 2208, threadinfo ffff8800229e8000, task ffff88003d5416b0)
[ 489.300005] Stack:
[ 489.300005] ffff8800229e9910 ffffffffa0317a5a ffff000100000038 ffff8800229e98f0
[ 489.300005] <0> ffff000100010038 ffff8800229e98e0 0000000000000001 0000000000000002
[ 489.300005] <0> ffff8800229e0009 0000000000000000 ffff8800229e9920 ffff880022f3b000
[ 489.300005] Call Trace:
[ 489.300005] [<ffffffffa0317a5a>] ? intel_sdvo_read_byte+0x6a/0xc0 [i915]
[ 489.300005] [<ffffffffa031161c>] intel_crtc_dpms+0xb0c/0xef0 [i915]
[ 489.300005] [<ffffffffa0317cff>] ? intel_sdvo_set_active...


Revision history for this message
In , Martin Pitt (pitti) wrote :

jbarnes| pitti: just wondering if you can gdb your i915.o and do a "list *intel_update_watermarks+0xcf"

Seems I need to build the module with debugging or so:

(gdb) list *intel_update_watermarks+0xcf
No symbol table is loaded. Use the "file" command.

Sorry, this kernel debugging is all new to me :/

I now built the module with "CONFIG_DEBUG_INFO=1 make -C /usr/src/linux-headers-2.6.31-1-generic/ M=`pwd` modules", so they have debug info now and gdb works. But I guess due to the rebuild the offsets were all scrambled, so I need to get the backtrace again. Stay tuned..

Revision history for this message
In , Martin Pitt (pitti) wrote :

So apparently the offset is even stable across rebuilds. I captured the trace again, and it looks exactly like the previous trace, so I'm not copying that again.

(gdb) list *intel_update_watermarks+0xcf
0x101af is in intel_update_watermarks (/home/martin/ubuntu/kernel/linux-2.6.31/drivers/gpu/drm/i915/intel_display.c:1918).
1913 intel_crtc->pipe, crtc->mode.clock);
1914 planeb_clock = crtc->mode.clock;
1915 }
1916 sr_hdisplay = crtc->mode.hdisplay;
1917 sr_clock = crtc->mode.clock;
1918 pixel_size = crtc->fb->bits_per_pixel / 8;
1919 }
1920 }
1921
1922 /* Single pipe configs can enable self refresh */

So I guess it crashes because crtc->fb is NULL, since fbcon is not loaded yet?

Revision history for this message
In , Martin Pitt (pitti) wrote :

BTW, this happens whether or not 'fbcon' gets loaded before.

Also confirmed when applying the patch to 2.6.31rc2.

Jesse Barnes (jbarnes-virtuousgeek) wrote:

On Mon, 6 Jul 2009 23:03:19 -0700 (PDT)
<email address hidden> wrote:
> --- Comment #91 from Martin Pitt <email address hidden> 2009-07-06 23:03:18 PST ---
> So apparently the offset is even stable across rebuilds. I captured the
> trace again, and it looks exactly like the previous trace, so I'm not
> copying that again.
>
> (gdb) list *intel_update_watermarks+0xcf
> 0x101af is in intel_update_watermarks
> (/home/martin/ubuntu/kernel/linux-2.6.31/drivers/gpu/drm/i915/intel_display.c:1918).
> 1913 intel_crtc->pipe, crtc->mode.clock);
> 1914 planeb_clock = crtc->mode.clock;
> 1915 }
> 1916 sr_hdisplay = crtc->mode.hdisplay;
> 1917 sr_clock = crtc->mode.clock;
> 1918 pixel_size = crtc->fb->bits_per_pixel / 8;
> 1919 }
> 1920 }
> 1921
> 1922 /* Single pipe configs can enable self refresh */
>
> So I guess it crashes because crtc->fb is NULL, since fbcon is not
> loaded yet?

Ah yes, that helps a lot, thanks. I'll fix that up.

Jesse Barnes (jbarnes-virtuousgeek) wrote:

Created an attachment (id=27540)
fix up FIFO programming

The stuff that went upstream falls into the "how did that ever work" category. We were just getting lucky that the calculations always resulted in the most aggressive FIFO programming. This corrects that and should also fix your hang.
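To make "FIFO programming" more concrete, the following C sketch models the kind of watermark arithmetic under discussion: estimate how many cachelines a plane drains during one memory-latency window, subtract that plus a guard band from the FIFO size, and clamp the result. The argument order mirrors the intel_calculate_wm(planea_clock, &i830_wm_info, pixel_size, latency_ns) call that appears in a later patch in this thread, and fifo_size also appears there; every other field name, the rounding, and the clamping rules are assumptions. Treat it as a simplified model, not the driver's code, and note that it does not model how the hardware interprets the resulting value.

```c
#include <assert.h>

/* Hypothetical parameter block; only fifo_size is taken from the thread.
 * FIFO sizes and watermarks are in cachelines. */
struct sketch_wm_params {
	long fifo_size;       /* cachelines available to the plane */
	long max_wm;          /* largest value the register field can hold */
	long default_wm;      /* fallback when the FIFO is too small */
	long guard_size;      /* safety margin, in cachelines */
	long cacheline_size;  /* bytes per cacheline */
};

/* Simplified model of a watermark calculation: FIFO size minus the
 * cachelines consumed during one latency window minus a guard band,
 * clamped to [default_wm, max_wm]. */
static long sketch_calculate_wm(unsigned long clock_khz,
				const struct sketch_wm_params *wm,
				int pixel_size, unsigned long latency_ns)
{
	assert(wm->cacheline_size > 0);

	/* Bytes fetched during latency_ns: pixels per us * bytes per pixel * us. */
	unsigned long bytes = (clock_khz / 1000) * (unsigned long)pixel_size
			      * latency_ns / 1000;
	/* Round up to whole cachelines. */
	unsigned long line = (unsigned long)wm->cacheline_size;
	long entries = (long)((bytes + line - 1) / line);
	long wm_size = wm->fifo_size - (entries + wm->guard_size);

	if (wm_size > wm->max_wm)
		wm_size = wm->max_wm;
	if (wm_size <= 0)
		wm_size = wm->default_wm;
	return wm_size;
}
```

As an example with assumed values: fifo_size 96, max_wm 63, default_wm 8, guard 2, 64-byte cachelines, a 108000 kHz pixel clock (1280x1024 at 60 Hz), 4 bytes per pixel and 5000 ns latency give 2160 bytes, i.e. 34 cachelines, and a watermark of 60.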

Jesse Barnes (jbarnes-virtuousgeek) wrote:

Re-opening this as the FIFO master bug.

Jesse Barnes (jbarnes-virtuousgeek) wrote:

*** Bug 18702 has been marked as a duplicate of this bug. ***

Jesse Barnes (jbarnes-virtuousgeek) wrote:

*** Bug 18491 has been marked as a duplicate of this bug. ***

Martin Pitt (pitti) wrote:

Does that patch go on top of the "most recent, KMS version of the patch" (https://bugs.freedesktop.org/attachment.cgi?id=26930) or does it replace it? I suppose the latter, since the new one doesn't touch crtc->fb at all, but it looks very different from the older one.

Thanks! Martin

Jesse Barnes (jbarnes-virtuousgeek) wrote:

It sits on top of current drm-intel-next bits.

Changed in xserver-xorg-video-intel:
status: Fix Released → Confirmed
Jesse Barnes (jbarnes-virtuousgeek) wrote:

Created an attachment (id=27575)
more fixes for FIFO programming

I tested on my 855 machine and found some bugs in that configuration. So I cleaned up the code a little more and fixed things up. This one applies on top of the drm-intel-next branch.

Martin Pitt (pitti) wrote:

For the record, I get a warning after applying the patch to drm-intel-next:

/home/martin/ubuntu/kernel/drm-intel-next/i915/intel_display.c: In function ‘intel_find_pll_g4x_dp’:
/home/martin/ubuntu/kernel/drm-intel-next/i915/intel_display.c:834: warning: ‘clock.vco’ is used uninitialized in this function

Will test now.

Martin Pitt (pitti) wrote:

Applied on top of current drm-intel-next, so far no noticeable difference (in other words, everything still works just fine). I'll use that driver for a few days now and will report back if anything regresses.

Scott (firecat53) wrote:

Hey Jesse, sorry I haven't been able to try the patch that you sent me yet. I did a quick install of the newest version of the video-intel driver, which on Arch is 2.7.99.901-3. This is on the 2.6.30 kernel (i686). It still exhibits the same behavior (flickering after resume from suspend to RAM), but the frequency of the flicker is substantially reduced... it's actually usable now, with just the occasional flicker. Better performance than the vesa driver!!

I'll still attempt the patch at some point when I get a chance. Send me a new one if this info changes anything.

Scott

Scott (firecat53) wrote:

Sorry guys....I have to retract my previous post after using intel-video-newest for a couple of hours. Worked fine with normal browsing, and program open/closing, but as soon as a non-flash video (avi) played, the flicker went back to making it unusable (well, highly unpleasant at least) for the duration of the movie. Flash video doesn't seem to trigger the flicker, except periodically.

Scott

Jesse Barnes (jbarnes-virtuousgeek) wrote:

The last patch I attached here is a kernel patch; it should make things better for you if you've got a KMS enabled configuration. Is there any way for you to try that, Scott?

Scott (firecat53) wrote:

(In reply to comment #105)
> The last patch I attached here is a kernel patch; it should make things better
> for you if you've got a KMS enabled configuration. Is there any way for you to
> try that, Scott?
>

Tried with the kernel source from the Arch repos and got:

patching file drivers/gpu/drm/i915/i915_reg.h
Hunk #1 FAILED at 1618.
1 out of 1 hunk FAILED -- saving rejects to file drivers/gpu/drm/i915/i915_reg.h.rej
patching file drivers/gpu/drm/i915/intel_display.c
Hunk #1 FAILED at 1623.
Hunk #2 FAILED at 1822.
Hunk #3 FAILED at 1869.
Hunk #4 FAILED at 2022.
4 out of 4 hunks FAILED -- saving rejects to file drivers/gpu/drm/i915/intel_display.c.rej

I did make sure this patch was applied before the standard arch patches. Can you send the link for the other kernel source you had me use last time?

Thanks!
Scott

Jesse Barnes (jbarnes-virtuousgeek) wrote:

Fix has been pushed to drm-intel-next; that's probably the easiest way to get it now:

author Jesse Barnes <email address hidden>
commit dff33cfcefa31c30b72c57f44586754ea9e8f3e2

drm/i915: FIFO watermark calculation fixes

Changed in xserver-xorg-video-intel:
status: Confirmed → Fix Released
Scott (firecat53) wrote:

Ok, got and compiled the kernel from git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel.git.

uname -a = 2.6.31-rc2-drm-intel-26127-gdff33cf #1 SMP PREEMPT Thu Jul 16 20:23:01 PDT 2009 i686 Intel(R) Pentium(R) 4 CPU 1.80GHz GenuineIntel GNU/Linux

xf86-video-intel-newest 2.7.99.902-1 : X.org Intel i810\/i830\/i915\/945G\/G965+ video drivers (2.8.0 RC2).

Enabled KMS. Same flicker behavior following suspend to RAM, possibly even worse than with the stock kernel and no KMS. Darn it, I was hoping we had this solved!

Well, let me know what other information you need from me. I can't remember where to find the source for the intel_reg_dump program you had me use several months ago, if you need that.

Thanks!
Scott

Changed in xserver-xorg-video-intel:
status: Fix Released → Confirmed
Jesse Barnes (jbarnes-virtuousgeek) wrote:

The bug that keeps on giving. Please check this one out; Eric found the same thing for his high res configs:

http://lists.freedesktop.org/archives/intel-gfx/2009-July/003471.html

Scott (firecat53) wrote:

Jesse, no change with that patch. Still horrible flickering of the whole screen after resuming from suspend to RAM.

What's next? :)

Scott

Jesse Barnes (jbarnes-virtuousgeek) wrote:

Can you attach your kernel log after you've loaded drm with debug=1? (Note, I'm assuming you're using KMS here.)

Scott (firecat53) wrote:

Boot was at 09:18; I suspended and resumed a few minutes later. Debug sure fills up the log quickly! Sorry it's so big... it was too big to post here, so here's the link. Booted with drm.debug=1 and i915.modeset=1. KMS is definitely working, because switching between virtual terminals is so fast. Cool!

http://scottandchrystie.homeip.net/kernel.log.gz

Scott

Jesse Barnes (jbarnes-virtuousgeek) wrote:

Ah thanks, that helps a lot. What chipset do you have? I should be able to give you a fix pretty quickly...

Scott (firecat53) wrote:

Jesse: Graphics device from lspci --

00:00.0 Host bridge: Intel Corporation 82845G/GL[Brookdale-G]/GE/PE DRAM Controller/Host-Hub Interface (rev 01)
00:02.0 VGA compatible controller: Intel Corporation 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device (rev 01)

Scott

Jesse Barnes (jbarnes-virtuousgeek) wrote:

Hm, I was hoping it was something simple like I'd just read the 845 docs incorrectly, but afaict things are actually correct for that case. But the plane A FIFO allocation does look suspiciously high; this patch assumes 845G actually measures FIFO entries in DSPARB as 16 byte values rather than 64, so it might help. I'll have to check some more docs before I know for sure though.

--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -1844,6 +1844,9 @@ static int intel_get_fifo_size(struct drm_device *dev, int
                        size = ((dsparb >> DSPARB_BEND_SHIFT) & 0x1ff) -
                                (dsparb & 0x1ff);
                size >>= 1; /* Convert to cachelines */
+       } else if (IS_845G(dev)){
+               size = dsparb & 0x7f;
+               size >>= 2; /* Convert to cachelines */
        } else {
                size = dsparb & 0x7f;
                size >>= 1; /* Convert to cachelines */
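The arithmetic in this patch can be checked in isolation. The C sketch below models only the final else branch and the proposed 845G branch: it takes a raw DSPARB value (plane A's size in the low 7 bits, per the patch) and converts it to cachelines with a shift of 1, or 2 under the unconfirmed 845G assumption. The function name is hypothetical and no register access is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: plane A FIFO size in cachelines from a raw DSPARB
 * value, following the last two branches of the patch above. */
static int sketch_fifo_size_cachelines(uint32_t dsparb, bool is_845g)
{
	/* Plane A's FIFO allocation lives in the low 7 bits. */
	int size = (int)(dsparb & 0x7f);

	if (is_845g)
		size >>= 2;  /* proposed: 845G assumed to count in smaller units */
	else
		size >>= 1;  /* existing conversion to cachelines */
	return size;
}
```

For example, a low field of 0x5f (95) yields 47 cachelines on the existing path and 23 on the proposed 845G path, so the patch roughly halves the FIFO size the watermark code works with, in line with the allocation looking "suspiciously high".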

Scott (firecat53) wrote:

That didn't work, Jesse. I just get a black screen when it switches to the framebuffer on boot. The machine is still functioning because I can ssh in, but no display.

Let me know if you need the logs for this.

Scott

Jesse Barnes (jbarnes-virtuousgeek) wrote:

Ah, I was looking at the wrong code path. In the 830/845 case I think I might be clobbering some important bits; this should preserve them and hopefully set the right values.

--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -1943,14 +1943,16 @@ static void i830_update_wm(struct drm_device *dev, int planea_clock,
       int pixel_size)
 {
  struct drm_i915_private *dev_priv = dev->dev_private;
- uint32_t fwater_lo = I915_READ(FW_BLC) & MM_FIFO_WATERMARK;
+ uint32_t fwater_lo = I915_READ(FW_BLC) & ~0xfff;
  int planea_wm;

  i830_wm_info.fifo_size = intel_get_fifo_size(dev, 0);

  planea_wm = intel_calculate_wm(planea_clock, &i830_wm_info,
            pixel_size, latency_ns);
- fwater_lo = fwater_lo | planea_wm;
+ fwater_lo |= (3<<8) | planea_wm;
+
+ DRM_DEBUG("Setting FIFO watermarks - A: %d\n", planea_wm);

  I915_WRITE(FW_BLC, fwater_lo);
 }
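The register update in this patch is plain bit manipulation, so it can be sketched without hardware. The C function below (a hypothetical name) takes the old FW_BLC value and the plane A watermark and returns the value the patched code would write: bits 12 to 31 preserved, bits 0 to 11 cleared, 3 placed in bits 8 and 9, and the watermark ORed into the low bits. The thread does not say what bits 8 and 9 control, and the range check on the watermark is an addition of this sketch, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the FW_BLC value the patched i830_update_wm()
 * would write, given the value it read and the computed watermark. */
static uint32_t sketch_fw_blc_value(uint32_t old_fw_blc, uint32_t planea_wm)
{
	/* Sketch-only check: the patch ORs the watermark in unmasked, so a
	 * value above 0xff would spill into bits 8 and above. */
	assert(planea_wm <= 0xff);

	uint32_t fwater_lo = old_fw_blc & ~0xfffu;  /* keep bits 12-31 */
	fwater_lo |= (3u << 8) | planea_wm;         /* bits 8-9 = 3, low bits = wm */
	return fwater_lo;
}
```

For an old value of 0xABCD1234 and a watermark of 0x3c, the result is 0xABCD133C.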

Scott (firecat53) wrote:

Jesse, I think this is an improvement. Still get occasional flickers with normal browsing and window movements following suspend. DVD and other movie playback still triggers strong flickering, although it seems somewhat better than the last patch. Flash doesn't seem to trigger the flicker, even running full screen. Here's the link for the kernel log (drm.debug=1).

http://scottandchrystie.homeip.net/kernel.log.gz

Just so you know, the last two patches you posted have been "malformed patches" right around line 4. I've had to patch manually to get it working :) Not sure if it's a cut-and-paste artifact, but the other ones you posted worked fine as patch files.

Thanks!
Scott

Jesse Barnes (jbarnes-virtuousgeek) wrote:

OK, so we're slowly improving. :) What if you apply both patches? I still can't find docs for the 845G FIFO and cache line sizes, so that could still be an issue.

Scott (firecat53) wrote:

Awesome! That did it!! Not a flicker to be seen so far! Nice work :) Let me know if you need anything else and when the patches actually make it into the kernel.

Thanks very much!

Scott

Changed in xserver-xorg-video-intel:
status: Confirmed → Fix Released
Bryce Harrington (bryce) wrote:

Hi Martin, I notice the upstream bug has been closed as fixed, although your last comment on the upstream bug was a bit uncertain whether it was resolved. Could you report whether the bug is still an issue, and what patch(es), if any, need to be pulled in?

From the patches being discussed upstream, it appears this bug needs a kernel patch, so I'm refiling against the kernel and tagging accordingly.

affects: xserver-xorg-video-intel (Ubuntu) → linux (Ubuntu)
Martin Pitt (pitti) wrote:

With KMS I never had this problem. Upstream said it worked by sheer accident, but now they are feeding the real patches upstream. I don't think we need to track this any more.

Changed in linux (Ubuntu):
status: Triaged → Fix Released
Changed in xserver-xorg-video-intel:
importance: Unknown → High
Changed in xserver-xorg-video-intel:
importance: High → Unknown
Changed in xserver-xorg-video-intel:
importance: Unknown → High