[Hardy] Race condition in Xgl startup process

Bug #176515 reported by Jean-Baptiste Lallement
Affects: xserver-xgl (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

Binary package hint: xserver-xgl

Under certain circumstances with Xgl enabled, gnome-session fails to start because it can't find a display.
This is because gnome-session starts before Xgl has started.

The scenario is:
1. After boot, the first time you start a gnome-session from gdm with Xgl enabled, the session starts correctly.
2. On the next logout/login, the session fails with a message in .xsession-errors:

cannot open display:
Run '/usr/bin/seahorse-agent --help' to see a full list of available command line options.

The workaround is to wait in /usr/share/xserver-xgl/Xgl-session for the Xgl process to start before continuing the startup process and running the session.

Here is a patch for /usr/share/xserver-xgl/Xgl-session
144,154d143
< #Wait for Xgl process to start
< TIMEOUT=5
< DISPLAYNUM=$( echo $XGL_DISPLAY | sed s/.*:// )
< while ! test -e "/tmp/.X$DISPLAYNUM-lock" ; do
< sleep 1;
< TIMEOUT=$(( $TIMEOUT - 1))
<
< # TODO If Xgl doesn't start then continue without it
< [ $TIMEOUT -lt 0 ] && exit 1
< done
<

This happens on
Hardy
DELL D630 - cpu intel T7300
Nvidia Quadro NVS135

Revision history for this message
Slash123 (manic-laughter) wrote :

I'm having a similar error in Hardy (cannot open display :1)
Could you elaborate on this patch please?
1) My guess is that the line about seahorse in your .xsession-error is unrelated. Correct?
2) Where in /usr/share/xserver-xgl/Xgl-session should the code you added above be pasted? Could you provide the contents of the previous and following lines?
I tried adding the commands between:
"...
verbose "Starting Xgl with options: " $XGL_ACCEL_OPTS $XGL_OPTS "\n"
$XGL_WRAPPER $XGL_DISPLAY $XGL_ACCEL_OPTS $XGL_OPTS &
DISPLAY=$XGL_DISPLAY"
and
"#Don't use Shift+Backspace as terminate_server
xmodmap -e "keycode 22 = BackSpace"
else
..."

But the patch did not seem to help. On logging in, the system just paused. Admittedly, I did not get the error in .xsession-errors, but Gnome did not load and the PC just waited. I had to disable Xgl (reinstating the file 'disable' in ~/.config/xserver-xgl) to login as usual.

If possible, please attach a copy of your Xgl-session that works.

My config:
Hardy
Compaq nc6230
ATI Mobility radeon

Thanks!

PS> Is your .xsession-error similar to Bug 174408 (https://bugs.launchpad.net/ubuntu/+bug/174408)?

Revision history for this message
Jean-Baptiste Lallement (jibel) wrote :

Sorry, in my hurry I forgot to post the file. Here is the modified version of Xgl-session.

The command "/usr/bin/seahorse-agent --execute /usr/bin/gnome-session " is passed as an argument to the Xgl-session script and executed at the end.
If the display in $DISPLAY is not ready, the command fails with "can't open display" and gnome-session is not executed.

Adding traces to this script, I've observed that the "exec $@" is executed _before_ the Xgl command in the wrapper script "Xgl-lockfile-wrapper" (except the first time you log in). In this case the session can't start because the display is not ready.
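
The ordering problem described here can be reproduced in miniature. The following is a toy sketch only: it does not touch a real X server, and fake_server and the temporary directory are stand-ins invented for illustration, not part of xserver-xgl. It shows that a command started with "&" returns control immediately, so whatever runs next can get ahead of the server unless the script waits for a readiness signal such as the lock file.

```shell
#!/bin/sh
# Toy reproduction of the ordering problem; nothing here talks to a real X server.
# fake_server and demo_dir are hypothetical stand-ins, not part of xserver-xgl.
demo_dir=$(mktemp -d)
lock="$demo_dir/.X1-lock"

# Stand-in for "$XGL_WRAPPER ... &": the lock file appears only after a delay.
fake_server() {
    sleep 1
    touch "$lock"
}

fake_server &
server_pid=$!

# Right after backgrounding, the lock file is normally not there yet.
if test -e "$lock"; then early=ready; else early=not-ready; fi

# Once the background job has finished, the lock file is guaranteed to exist.
wait "$server_pid"
if test -e "$lock"; then late=ready; else late=not-ready; fi

echo "right after '&': $early; after waiting: $late"
rm -rf "$demo_dir"
```

In the real script the server keeps running, so "wait" on the process is not an option; polling for the lock file, as the patch does, is one way to get the same guarantee.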

My .xsession-errors was similar to the one in bug 174408.

The wait loop added to the startup script also solved the xmodmap error.

Revision history for this message
Slash123 (manic-laughter) wrote :

Great. I got Xgl to load on DISPLAY :1 using your Xgl-session attachment.
However, gnome (or for that matter KDE) refused to start. On entering login details, I can see the hard disk activity till Xgl loads, then it just stops with the human background and a busy pointer.
I noticed that if I run compiz from tty1 (# DISPLAY=:1 compiz &), compiz finds Xgl and loads without an error, but still no desktop.
However, if I run # DISPLAY=:0 xterm, I get an xterm on vt7. I can then start gnome-panel and nautilus to get a working gnome session. But, as this is :0 (whereas Xgl loaded on :1), when I try running compiz from this xterm, it exits with an error saying that it couldn't find Xgl.

~/.xsession-errors now does not have any errors that could give a clue. Neither does /var/log/Xorg.0.log.

Any suggestions?

Revision history for this message
Chris Halse Rogers (raof) wrote :

Your patch seems a fine solution for this problem. I'll look at adding something like this to the scripts sometime over the holidays, but don't hold your breath too soon! Feel free to push this along (by providing a candidate debdiff, as per «https://wiki.ubuntu.com/PackagingGuide/Recipes/Debdiff») if you'd like a solution before I get around to it.

Changed in xserver-xgl:
importance: Undecided → Medium
status: New → Triaged
Revision history for this message
Jean-Baptiste Lallement (jibel) wrote :

My very first debdiff attempt. Hope I did it fine.

@Slash123: It seems to be a problem with your gnome session. Look into ~/.gnome2/session, or try removing your ~/.gnome2 directory to see if that solves it. But this is off topic, and this place is not a support forum. I can't help any further here.

Revision history for this message
Tux (peter-hoogkamer) wrote :

Has anyone already tested this debdiff??

Revision history for this message
Chris Halse Rogers (raof) wrote :

The debdiff has unrelated changes in it (to config.guess, and another autogen'd file). I'll extend your patch to fix the TODO and upload a new snapshot soon.
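
One possible way to handle the TODO from the patch (continue without Xgl if it does not start) is sketched below. This is not the change that was uploaded; start_session_display, its arguments, the default timeout, and the fallback policy are all assumptions made for illustration. The only thing taken from the patch is the /tmp/.X<n>-lock readiness check.

```shell
#!/bin/sh
# Sketch only: start_session_display and its arguments are hypothetical names,
# and falling back to another display is an assumed policy.

# Print the display the session should use: the Xgl display if its lock file
# shows up within the timeout, otherwise the fallback display.
start_session_display() {
    xgl_display="$1"
    fallback_display="$2"
    timeout="${3:-10}"
    lock_dir="${4:-/tmp}"
    # Same extraction as the patch: keep what follows the last colon.
    num=$(echo "$xgl_display" | sed 's/.*://')
    while [ "$timeout" -gt 0 ] && ! test -e "$lock_dir/.X${num}-lock"; do
        sleep 1
        timeout=$((timeout - 1))
    done
    if test -e "$lock_dir/.X${num}-lock"; then
        echo "$xgl_display"
    else
        # Report on stderr so that stdout carries only the chosen display.
        echo "Xgl did not start; continuing on $fallback_display" >&2
        echo "$fallback_display"
    fi
}
```

A caller could then write something like DISPLAY=$(start_session_display "$XGL_DISPLAY" :0) before exporting DISPLAY and running the session command, so that a failed Xgl start degrades to the plain X server instead of aborting the login.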

Revision history for this message
harrydb (harrydeboer) wrote :

The cause of this bug is probably the same as the following
https://bugs.launchpad.net/ubuntu/+bug/174408

I would not call it a race condition since it is always reproducible btw.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package xserver-xgl - 1:1.1.99.1~git20080115-0ubuntu1

---------------
xserver-xgl (1:1.1.99.1~git20080115-0ubuntu1) hardy; urgency=low

  * New git snapshot. Fixes FTBFS on AMD64.
  * debian/rules:
    + Add get-orig-source target
  * debian/copyright:
    + Add git URL
  * debian/Xgl-session
    + Wait for Xgl to finish starting (as determined by lockfiles in /tmp)
      before spawning other processes (LP: #176515).
      Thanks to Jean-Baptiste Lallement.

 -- Christopher James Halse Rogers <email address hidden> Tue, 15 Jan 2008 23:34:10 +1100

Changed in xserver-xgl:
status: Triaged → Fix Released
Revision history for this message
hnmb (kernelghost) wrote :

The bug is still present for me with the latest xserver-xgl (1:1.1.99.1~git20080115-0ubuntu1). There is no change in the symptoms at all.

xmodmap still waits during the loading of the desktop. When I kill it, seahorse starts and then waits as well. When I disable Xgl, plain Xorg uses 60% of the CPU.

cpu pentium m 2.0
vga radeon mobile x700 with fglrx

Revision history for this message
hnmb (kernelghost) wrote :

OK, Xorg was busy with glxinfo, which was also using quite a lot of CPU (30%). Killing glxinfo eases the load on Xorg, and the system becomes normal (without Xgl).

Revision history for this message
hnmb (kernelghost) wrote :

I found some spare time to look at it.
In my case, applications started within the session always get stuck in select(), endlessly waiting for a response.

I had prepended a gnome-terminal before xmodmap in Xgl-session. Here is its backtrace

Core was generated by `/usr/bin/gnome-terminal'.
#0 0xb7fc3410 in __kernel_vsyscall ()
(gdb) bt
#0 0xb7fc3410 in __kernel_vsyscall ()
#1 0xb75fb2dd in select () from /lib/tls/i686/cmov/libc.so.6
#2 0xb70c155a in _xcb_in_read_block (c=0x80a7ca0, buf=0x80a7328, len=8) at xcb_in.c:248
#3 0xb70c07fc in xcb_connect_to_fd (fd=3, auth_info=0xbfb79c10) at xcb_conn.c:131
#4 0xb70c2f21 in xcb_connect (displayname=0x0, screenp=0x0) at xcb_util.c:279
#5 0xb76d57f7 in _XConnectXCB () from /usr/lib/libX11.so.6
#6 0xb76be109 in XOpenDisplay () from /usr/lib/libX11.so.6
#7 0xb7a1f032 in gdk_display_open () from /usr/lib/libgdk-x11-2.0.so.0
#8 0xb79fca0d in gdk_display_open_default_libgtk_only () from /usr/lib/libgdk-x11-2.0.so.0
#9 0xb7bb2e4f in gtk_init_check () from /usr/lib/libgtk-x11-2.0.so.0
#10 0xb7bb2e84 in gtk_init () from /usr/lib/libgtk-x11-2.0.so.0

I also checked whether xauth really adds the token for Xgl, and it does so correctly.

Revision history for this message
Slash123 (manic-laughter) wrote :

Wanted to confirm hnmb's point: the updated xserver-xgl has not solved the problem on my computer. I have reinstalled (and now updated) gdm, but the system continues to pause at gdm load when xserver-xgl is enabled.

There are no errors in Xorg.0.log and .xsession-errors.

Revision history for this message
Slash123 (manic-laughter) wrote :

As of today's updates (including an upgrade of the kernel), Xgl seems to be working well.

Haven't been able to figure out what exactly put things right.

Thanks to whoever put this right :)

Revision history for this message
Slash123 (manic-laughter) wrote :

Ooops... Jumped the gun there.

Though fglrxinfo was reporting that the appropriate driver was in use, I found the graphics to be a bit too slow. So I checked glxgears and found that I was getting frame rates of ~300.

I reinstalled the ATI driver (8.01), only to find the same blank screen on login as before. .xsession-errors ends with the line 'Waiting 10 seconds for Xgl to start'. Nothing more.

FWIW: Was also getting a gnome-settings-daemon error at startup (bug 199960).

Have now reverted to no Xgl and metacity. Performance is back to normal (1700 fps in glxgears).
