GTK applications crashing reproducibly when using vesa

Bug #246585 reported by Matt Zimmerman
Affects                           Status         Importance  Assigned to       Milestone
X.Org X server                    Fix Released   Medium
xorg-server (Ubuntu)              Fix Released   Critical    Bryce Harrington
xserver-xorg-video-vesa (Debian)  Fix Released   Unknown

Bug Description

Binary package hint: xserver-xorg-video-vesa

Because I had some problems with -intel after upgrading to Intrepid (bug 246581), I fell back to vesa. However, it was immediately apparent that things weren't right: the gdm greeter (both the standard one and xfailsafedialog) was segfaulting. gnome-panel wouldn't start up either, segfaulting in panel_multiscreen_width. gnome-terminal would run, but it segfaulted when I clicked a drop-down menu, after printing this error:

(gnome-terminal:8546): Gdk-CRITICAL **: get_monitor: assertion `monitor_num < screen_x11->n_monitors' failed

Once I was able to work around bug 246581, things worked fine with the -intel driver, so this seems to be specific to -vesa.

ProblemType: Bug
Architecture: i386
Date: Tue Jul 8 14:17:39 2008
DistroRelease: Ubuntu 8.10
Package: xserver-xorg-video-vesa 1:2.0.0-1ubuntu2
PackageArchitecture: i386
ProcEnviron:
 LC_COLLATE=C
 PATH=/home/username/bin:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/sbin:/usr/sbin:/usr/games:/usr/lib/surfraw
 LANG=en_US.UTF-8
 SHELL=/bin/zsh
SourcePackage: xserver-xorg-video-vesa
Uname: Linux 2.6.24-19-generic i686

Revision history for this message
Matt Zimmerman (mdz) wrote :

I noticed some mtrr errors in dmesg which may be related:

[ 711.736923] mtrr: base(0xe0000000) is not aligned on a size(0x770000) boundary
[ 716.265719] mtrr: base(0xe0000000) is not aligned on a size(0x770000) boundary
[ 721.748276] gnome-terminal[6790]: segfault at 00000000 eip b78a1a79 esp bfcf13e0 error 4
[ 730.166023] mtrr: base(0xe0000000) is not aligned on a size(0x770000) boundary
[ 733.543826] gnome-terminal[6963]: segfault at 00000000 eip b78f3a79 esp bf97d870 error 4

Revision history for this message
Matt Zimmerman (mdz) wrote :

xterm seemed to run perfectly well. metacity behaved like gnome-terminal, in that it would only break when I clicked a menu.

Revision history for this message
Bryce Harrington (bryce) wrote :

Could you reproduce this by setting the driver to "vesa" in an otherwise working xorg.conf, and get a backtrace (/var/log/Xorg.0.log as well)?

Changed in xserver-xorg-video-vesa:
status: New → Incomplete
Revision history for this message
Matt Zimmerman (mdz) wrote :

As I mentioned in the original report, the crashes are in different places in different programs, but here's one example.

Revision history for this message
Matt Zimmerman (mdz) wrote :

With stack trace this time

Changed in xserver-xorg-video-vesa:
status: Incomplete → Confirmed
Revision history for this message
Matt Zimmerman (mdz) wrote :

Further debugging on IRC reveals that gdk_screen_get_n_monitors() is returning 0:

$ python
>>> import gtk
>>> gtk.gdk.screen_get_default().get_n_monitors()
0

However, both xrandr and xinerama seem to return sane values:

$ xrandr --verbose
Screen 0: minimum 1680 x 1050, current 1680 x 1050, maximum 1680 x 1050
default connected (normal)
 Identifier: 0x63
 Timestamp: 3676687
 Subpixel: horizontal rgb
 Clones:
 CRTCs: 0
  1680x1050 (0x64) 0.0MHz
        h: width 1680 start 0 end 0 total 1680 skew 0 clock 0.0KHz
        v: height 1050 start 0 end 0 total 1050 clock 0.0Hz

$ xdpyinfo -ext XINERAMA
[...]
number of screens: 1
[...]
XINERAMA version 1.1 opcode: 150
  Xinerama is inactive.

Revision history for this message
Bryce Harrington (bryce) wrote :

<mdz> bryce: I'm using sudo -b X :1 -ac -noreset -config xorg.conf.test
<mdz> for convenience, though it definitely happens when it's the only X server running as well

Revision history for this message
Bryce Harrington (bryce) wrote :

I am reproducing this behavior with -vesa on an amd64 box.

Changed in xserver-xorg-video-vesa:
assignee: nobody → bryceharrington
importance: Undecided → Critical
status: Confirmed → Triaged
Revision history for this message
Bryce Harrington (bryce) wrote :

Interesting, I've been able to reproduce everything in comment #12, and I'm seeing the gdm greeter crash, but not the gnome-terminal crash. Also, I'm not seeing anything in /var/crash (do I need to do something to switch apport on?)

0. This is with X.Org X Server 1.4.99.905 (1.5.0 RC 5) on nVidia G70 [GeForce 7600 GT]

1. Boot with driver set to "nv" (or omit it entirely).
   + gdm comes up, login works fine.
   + get_n_monitors() reports "1"
   + xrandr --verbose looks fine
   + xdpyinfo -ext XINERAMA reports
        XINERAMA version 1.1 opcode 150
          head #0: 1280x1024 @ 0,0
   + gnome-terminal loads up fine

2. Set driver to "vesa" in xorg.conf, restart X (ctrl-alt-backspace)
   + Displays greeter application crashing error dialog
   + Can switch to a tty and login as root okay
   + From tty, I get an error when importing gtk that display couldn't be opened
   + Clicking on the greeter error dialog, screen flickers, and this time import gtk works from the tty
   + gtk.gdk.screen_get_default().get_n_monitors() presents an error 'NoneType' object has no attribute 'get_n_monitors'

3. Shut down X via `/etc/init.d/gdm stop`
   + seems to take longer than expected, but it completes with OK
   + An X process is still running however. kill -9 it
   + Now restart X using `cd /etc/X11; X :0 -ac -noreset -config xorg.conf`
   + It presents the gnome-less grey default X session
   + Xorg.0.log confirms that it's using driver VESA
   + gtk.gdk.screen_get_default().get_n_monitors() returns "0"
   + xrandr --verbose reports "CRTCs: 0"; displays resolutions correctly
   + xdpyinfo -ext XINERAMA reports:
       XINERAMA version 1.1 opcode: 150
         Xinerama is inactive
   + DISPLAY=:0 gnome-terminal seems to run fine

4. Leaving "vesa" specified in xorg.conf, reboot.
   + greeter error dialog comes up as in #2
   + Hit ok multiple (~6) times. Displays a whiptail dialog, "It is likely that something bad is going on."

5. Shutdown gdm, start up X as in #3
    + Results are the same as in #3

So... I can see the issue with gdm greeter not starting up, and will continue investigating that, but I'm not seeing the GTK application crashes.

Revision history for this message
Bryce Harrington (bryce) wrote :

I'm able to reproduce the crash in gnome-panel. In panel_multiscreen_init(), it calls:

        screens = gdk_display_get_n_screens (display);

which appears to work correctly:

(gdb) print screens
$33 = 1
(gdb) print display
$34 = (GdkDisplay *) 0xa2b000

but then the gdk_screen_get_n_monitors() call seems to be returning a 0:

                monitors [i] = gdk_screen_get_n_monitors (screen);

this then causes a NULL pointer to be set here:

                geometries [i] = g_new0 (GdkRectangle, monitors [i]);

which then propagates down to this point:

int
panel_multiscreen_width (GdkScreen *screen,
                         int        monitor)
{
        int n_screen;

        n_screen = gdk_screen_get_number (screen);

        g_return_val_if_fail (n_screen >= 0 && n_screen < screens, 0);
        g_return_val_if_fail (monitor >= 0 || monitor < monitors [n_screen], 0);

        return geometries [n_screen][monitor].width;
}

Breakpoint 1, panel_multiscreen_width (screen=<value optimized out>, monitor=0) at panel-multiscreen.c:180
180 in panel-multiscreen.c

(gdb) print n_screen
$9 = 0
(gdb) print monitor
$10 = 0
(gdb) print geometries[0][0]
Cannot access memory at address 0x0

And in referencing this NULL pointer, we get our crash.

So gnome-panel assumes that gdk_display_get_screen () never returns 0, which it in fact appears to be doing now when using -vesa. I imagine other GTK apps have similar logic that doesn't check this return value, and are also crashing on NULL pointers.

The attached patch peppers in some null pointer checks that probably should be there if 0 is a valid gdk_display_get_screen() return value. It won't fix the problem but will make it crash earlier on, where the bug actually occurs. I'll look at gdk_display_get_screen() next...
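
A minimal sketch of the failure mode traced above, modelled in Python rather than the original C. All function names here are illustrative and not taken from gnome-panel. It assumes only what the gdb session shows: a monitor count of 0 leaves the geometries pointer NULL (g_new0 returns NULL for a zero count), and the second guard in panel_multiscreen_width uses `||`, so it still passes for monitor 0. The `and` variant is a hypothetical stricter check, not necessarily what the attached patch does.

```python
# Illustrative model only; none of these names exist in gnome-panel.

def alloc_geometries(n_monitors):
    """Mimic g_new0(GdkRectangle, n): GLib returns NULL when n is 0."""
    if n_monitors == 0:
        return None
    return [{"width": 0, "height": 0} for _ in range(n_monitors)]


def guard_as_written(monitor, n_monitors):
    # Mirrors the quoted check: monitor >= 0 || monitor < monitors[n_screen]
    return monitor >= 0 or monitor < n_monitors


def guard_with_and(monitor, n_monitors):
    # Hypothetical stricter check: both bounds must hold.
    return monitor >= 0 and monitor < n_monitors


def panel_width(geometries, monitor, n_monitors, guard):
    """Return the monitor width, or 0 when the guard rejects the request."""
    if not guard(monitor, n_monitors):
        return 0
    if geometries is None:
        # Stands in for the NULL dereference seen in gdb.
        raise RuntimeError("NULL geometries dereferenced")
    return geometries[monitor]["width"]
```

With zero monitors, the guard as written lets monitor 0 through and the model hits the NULL case, while the stricter guard returns 0 first. Either way the underlying problem remains the monitor count of 0.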

Revision history for this message
Bryce Harrington (bryce) wrote :

Looking at gdk_display_get_screen(), it returns NULL if there are no screens defined:

GdkScreen *
gdk_display_get_screen (GdkDisplay *display,
                        gint screen_num)
{
  g_return_val_if_fail (GDK_IS_DISPLAY (display), NULL);
  g_return_val_if_fail (ScreenCount (GDK_DISPLAY_X11 (display)->xdisplay) > screen_num, NULL);

  return GDK_DISPLAY_X11 (display)->screens[screen_num];
}

However, I'm not sure why it's doing so in this case...

(gdb) print ((GdkDisplayX11*)display)->xdisplay->nscreens
$45 = 1
(gdb) print screen_num
$46 = 0
(gdb) print ((GdkDisplayX11*)display)->xdisplay->screens[0]
$47 = {ext_data = 0x0, display = 0x9c7610, root = 92, width = 1280, height = 1024, mwidth = 382, mheight = 302, ndepths = 7,
  depths = 0x9ce250, root_depth = 24, root_visual = 0x9ce2d0, default_gc = 0x9ce350, cmap = 32, white_pixel = 16777215,
  black_pixel = 0, max_maps = 1, min_maps = 1, backing_store = 0, save_unders = 0, root_input_mask = 0}

Revision history for this message
Matt Zimmerman (mdz) wrote : Re: [Bug 246585] Re: GTK applications crashing reproducibly when using vesa

On Wed, Jul 09, 2008 at 10:57:21PM -0000, Bryce Harrington wrote:
> Interesting, I've been able to reproduce everything in comment #12, and
> I'm seeing the gdm greeter crash, but not the gnome-terminal crash.

You need to click on a menu to get gnome-terminal to crash (see the original
report).

> Also, I'm not seeing anything in /var/crash (do I need to do something
> to switch apport on?)

Yes, /etc/default/apport.

> 2. Set driver to "vesa" in xorg.conf, restart X (ctrl-alt-backspace)
> + Displays greeter application crashing error dialog
> + Can switch to a tty and login as root okay
> + From tty, I get an error when importing gtk that display couldn't be opened
> + Clicking on the greeter error dialog, screen flickers, and this time import gtk works from the tty
> + gtk.gdk.screen_get_default().get_n_monitors() presents an error 'NoneType' object has no attribute 'get_n_monitors'

That's odd; I get an otherwise valid return of 0. Did you remember to set
DISPLAY in the tty session?

--
 - mdz

Revision history for this message
Bryce Harrington (bryce) wrote :

Thanks.

I've been debugging this further today. The problem seems to originate in GDK's init_randr12() routine. It calls XRRGetOutputInfo() (via libXrandr, against the xserver) and receives back an output with output->crtc = 0, which then leads to the problems mentioned above.

So, in other words, on the client side it's calling the server to retrieve output info, and it's getting a structure back that doesn't have a crtc attached to it. Odd.

Anyway, my best guess is that it's a bug introduced with either -vesa 2.0.0 or xserver 1.5 (or perhaps both). GDK possibly deserves a bug too, since it's not properly checking for this failure condition. I've filed this upstream with Debian.
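
As a rough illustration of the client-side behaviour described above, here is a small Python model. It is a sketch, not GDK code. It assumes that monitors are counted only for outputs with a CRTC attached, as the description of init_randr12() implies. The `Output` class, both function names, and the single-monitor fallback are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Output:
    """Hypothetical stand-in for the output info returned by the server."""
    name: str
    crtc: int  # 0 models "no CRTC attached", as seen with -vesa


def count_monitors(outputs):
    # Assumed behaviour: an output without a CRTC contributes no monitor.
    return sum(1 for out in outputs if out.crtc != 0)


def count_monitors_with_fallback(outputs, fallback=1):
    # Hypothetical defensive variant: never report zero monitors
    # for a screen that exists.
    n = count_monitors(outputs)
    return n if n > 0 else fallback
```

With a single output whose crtc is 0, the first function yields 0 monitors, matching the get_n_monitors() result reported earlier. The fallback variant would report 1 instead.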

Changed in xorg-server:
status: Unknown → Confirmed
Changed in xserver-xorg-video-vesa:
status: Unknown → New
Changed in xserver-xorg-video-vesa:
status: New → Confirmed
Revision history for this message
Bryce Harrington (bryce) wrote :

Ajax provided a patch to the xserver that reputedly fixes the issue. Mind giving this a shot on a system you can reproduce the problem on?

http://people.ubuntu.com/~bryce/Testing/xserver/ *1.4.99.905-0ubuntu3*.deb

Changed in xserver-xorg-video-vesa:
status: Triaged → In Progress
Revision history for this message
Matt Zimmerman (mdz) wrote :

On Tue, Jul 15, 2008 at 10:05:32AM -0000, Bryce Harrington wrote:
> Ajax provided a patch to the xserver that reputedly fixes the issue.
> Mind giving this a shot on a system you can reproduce the problem on?
>
> http://people.ubuntu.com/~bryce/Testing/xserver/
> *1.4.99.905-0ubuntu3*.deb

Confirmed, this seems to fix the problem for me.

--
 - mdz

Revision history for this message
Bryce Harrington (bryce) wrote :

Excellent, thanks, I'll push this to intrepid.

Changed in xorg-server:
status: In Progress → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package xorg-server - 2:1.4.99.905-0ubuntu3

---------------
xorg-server (2:1.4.99.905-0ubuntu3) intrepid; urgency=low

   * patches/124_fix_randr_no_crtc.patch:
     + In certain circumstances, xrandr multiscreen initialization fails
       to associate crtcs with monitors, resulting in startup failures in
       some GDK-based applications when using -vesa. This occurs because
       mode->Clock, mode->HTotal, and mode->VTotal are all 0. (LP: #246585)

 -- Bryce Harrington <email address hidden> Tue, 15 Jul 2008 02:16:40 -0700
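
The patch itself is not reproduced in this report, so the following is not its logic. It is only a hedged Python sketch of why all-zero mode timings need special handling. It uses the usual relation between pixel clock and total line and frame sizes; the function name and the choice to return 0.0 are assumptions.

```python
def vertical_refresh_hz(clock_khz, htotal, vtotal):
    """Refresh rate from mode timings; 0.0 when the timings are unusable."""
    if htotal == 0 or vtotal == 0:
        # Without this guard, all-zero timings would divide by zero.
        return 0.0
    return clock_khz * 1000.0 / (htotal * vtotal)
```

Code that derives values from mode timings in this way has to decide what an all-zero mode means, rather than treating it as an ordinary mode.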

Changed in xorg-server:
status: Fix Committed → Fix Released
Changed in xorg-server:
status: Confirmed → Fix Released
Changed in xserver-xorg-video-vesa:
status: Confirmed → Fix Released
Changed in xorg-server:
importance: Unknown → Medium
Changed in xorg-server:
importance: Medium → Unknown
Changed in xorg-server:
importance: Unknown → Medium