Xorg crashed with SIGSEGV in XisbRead()

Bug #324368 reported by Matt Zimmerman
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
X.Org X server | Fix Released | High | |
xorg-server (Ubuntu) | Fix Released | Medium | Bryce Harrington |

Bug Description

This happened during, or immediately after, a resume from RAM. I've suspended and resumed many other times without incident, so this may not be reproducible.

#0 XisbRead (b=0x0) at ../../../../hw/xfree86/common/xisb.c:101
 ret = <value optimized out>
#1 0x00007f06594f4309 in ?? ()
   from /usr/lib/xorg/modules/input//synaptics_drv.so
#2 0x00007f06594f0c8a in ?? ()
   from /usr/lib/xorg/modules/input//synaptics_drv.so
#3 0x00000000004858bb in xf86Wakeup (blockData=<value optimized out>,
    err=<value optimized out>, pReadmask=<value optimized out>)
    at ../../../../hw/xfree86/common/xf86Events.c:271
 sigstate = 1
 LastSelectMask = (fd_set *) 0x7ddf20
 devicesWithInput = {fds_bits = {16384, 0 <repeats 15 times>}}
 pInfo = (InputInfoPtr) 0x23ddfd0
#4 0x0000000000451cdb in WakeupHandler (result=1, pReadmask=0x7ddf20)
    at ../../dix/dixutils.c:418
 i = 0
#5 0x00000000004ee4bf in WaitForSomething (pClientsReady=0x23dfaf0)
    at ../../os/WaitFor.c:231
 i = 1
 waittime = {tv_sec = 0, tv_usec = 923976}
 wt = (struct timeval *) 0x7fff79106740
 timeout = <value optimized out>
 clientsReadable = {fds_bits = {0 <repeats 16 times>}}
 clientsWritable = {fds_bits = {35506112, 35585208, 35477280,
    139665605120774, 108834960, 139665575263122, 126616504, 5472109,
    35591776, 35591776, 35591776, 108834960, 35585208, 139665608215040,
    84144416, 23161482}}
 curclient = <value optimized out>
 selecterr = 11
 nready = <value optimized out>
 devicesReadable = {fds_bits = {0 <repeats 16 times>}}
 now = <value optimized out>
 someReady = 0
#6 0x000000000044dea0 in Dispatch () at ../../dix/dispatch.c:367
 result = 0
 client = (ClientPtr) 0x27bbe30
 nready = -1
 start_tick = <value optimized out>
#7 0x0000000000433c0d in main (argc=10, argv=0x7fff79106938,
    envp=<value optimized out>) at ../../dix/main.c:397
 i = 1
 alwaysCheckForInput = {0, 1}

ProblemType: Crash
Architecture: amd64
DistroRelease: Ubuntu 9.04
ExecutablePath: /usr/bin/Xorg
Package: xserver-xorg-core 2:1.5.99.902-0ubuntu1
ProcAttrCurrent: unconfined
ProcCmdline: /usr/X11R6/bin/X :0 -br -audit 0 -auth /var/lib/gdm/:0.Xauth -nolisten tcp vt7
ProcEnviron:
 LC_COLLATE=C
 PATH=(custom, no user)
 LANG=en_GB.UTF-8
 SHELL=/bin/zsh
ProcVersion: Linux version 2.6.28-6-generic (buildd@crested) (gcc version 4.3.3 (Ubuntu 4.3.3-3ubuntu1) ) #17-Ubuntu SMP Fri Jan 30 15:35:08 UTC 2009

Signal: 11
SourcePackage: xorg-server
StacktraceTop:
 XisbRead ()
 ?? ()
 ?? ()
 xf86Wakeup ()
 WakeupHandler ()
Title: Xorg crashed with SIGSEGV in XisbRead()
Uname: Linux 2.6.28-6-generic x86_64
UserGroups:

[lspci]
00:00.0 Host bridge [0600]: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub [8086:2a00] (rev 0c)
     Subsystem: Lenovo Device [17aa:20b3]
00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller [8086:2a02] (rev 0c)
     Subsystem: Lenovo Device [17aa:20b5]

Tags: apport-crash

Revision history for this message
In , Bryce Harrington (bryce) wrote :

Created an attachment (id=22501)
xorg.conf

Revision history for this message
In , Bryce Harrington (bryce) wrote :

Created an attachment (id=22502)
lshal

Revision history for this message
In , Bryce Harrington (bryce) wrote :

Created an attachment (id=22503)
lspci output

Revision history for this message
In , Bryce Harrington (bryce) wrote :

Created an attachment (id=22504)
Xorg.0.log

Revision history for this message
In , Bryce Harrington (bryce) wrote :

Created an attachment (id=22505)
nullptr_xisbread.patch

Checks for null pointer. (But why did -synaptics pass in a null ptr to begin with?)
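
For readers who have not opened the attachment, the general shape of such a guard can be sketched as follows. This is only an illustration under stated assumptions: `DemoBuffer` and `demo_buffer_read` are hypothetical stand-ins, not the real XISBuffer layout and not the contents of nullptr_xisbread.patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for an input buffer; NOT the real XISBuffer layout. */
typedef struct {
    const unsigned char *data; /* bytes already read from the device */
    size_t current;            /* index of the next unread byte */
    size_t end;                /* number of valid bytes in data */
} DemoBuffer;

/* Returns the next byte (0..255), or -1 if the buffer is missing or empty. */
int demo_buffer_read(DemoBuffer *b)
{
    if (b == NULL)             /* the guard: a NULL buffer is reported, not dereferenced */
        return -1;
    if (b->current >= b->end)  /* nothing left to hand out */
        return -1;
    return b->data[b->current++];
}
```

A guard like this turns the segfault into an error return. It says nothing about why the caller had a NULL buffer in the first place, which is the open question here.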

Revision history for this message
Apport retracing service (apport) wrote : Symbolic stack trace

StacktraceTop:
 ?? ()
 ?? ()
 xf86Wakeup (blockData=<value optimized out>,
 WakeupHandler (result=1, pReadmask=0x7ddf20)
 WaitForSomething (pClientsReady=0x23dfaf0)

Revision history for this message
Apport retracing service (apport) wrote : Symbolic threaded stack trace
Bryce Harrington (bryce)
description: updated
Changed in xorg-server:
assignee: nobody → bryceharrington
importance: Undecided → High
status: New → Triaged
Revision history for this message
Bryce Harrington (bryce) wrote :

Catches the null pointer at least, though I'm not sure why synaptics would be passing a null in here.

Changed in xorg-server:
status: Unknown → Confirmed
Revision history for this message
Bryce Harrington (bryce) wrote :

Hi Matt, I've forwarded this issue upstream to https://bugs.freedesktop.org/show_bug.cgi?id=19918 - please subscribe to that bug in case upstream has further questions or wishes you to test something. Thanks ahead of time.

Bryce

Revision history for this message
In , Matt Zimmerman (mdz) wrote :

I'm the original bug reporter, and am subscribed to this bug now if you need further information.

Revision history for this message
Matt Zimmerman (mdz) wrote :

Kees just experienced the same crash during a resume, so this is no longer an isolated incident.

Revision history for this message
In , Peter Hutterer (peter-hutterer) wrote :

I looked at that code, but couldn't really find anything. Just putting a check
for NULL in isn't really a solution either, we need to find the root of the
problem, not just fix the symptom.
Anything that makes this bug reproducible is appreciated.

Revision history for this message
In , Bryce Harrington (bryce) wrote :

Seems not to be easily reproducible. Both Matt and Kees saw the same crash, but only once each.

They both saw it at the end of a convention in Berlin. Kees suspects it was related to yanking out a projector before/during/after a resume. I suspect Matt was probably doing something similar (perhaps with the same model of projector).

Unfortunately, as the conference is now over and the projectors were rented, we cannot test that hypothesis. But I hope it might provide a small clue.

Revision history for this message
Bryce Harrington (bryce) wrote :

Patching for the null ptr seems to be an insufficient fix. But for a better fix upstream needs a reproducible test case.

Kees or Matt, either of you have steps for reproducing this crash? If it was a one-time only thing, it could be hard to find a fix.

Changed in xorg-server:
status: Triaged → Incomplete
Revision history for this message
Bryce Harrington (bryce) wrote :
Revision history for this message
Bryce Harrington (bryce) wrote :

Erf, ignore that debdiff. Trying to do too many things simultaneously.

Revision history for this message
Matt Zimmerman (mdz) wrote : Re: [Bug 324368] Re: Xorg crashed with SIGSEGV in XisbRead()

On Tue, Feb 10, 2009 at 05:51:59PM -0000, Bryce Harrington wrote:
> Patching for the null ptr seems to be an insufficient fix. But for a
> better fix upstream needs a reproducible test case.
>
> Kees or Matt, either of you have steps for reproducing this crash? If
> it was a one-time only thing, it could be hard to find a fix.

Kees and I both experienced it independently, so it wasn't a one-time thing.
There is surely a bug here. Have you considered possible cases where the
null pointer could be passed in via the synaptics driver?

--
 - mdz

Revision history for this message
Bryce Harrington (bryce) wrote :

> Kees and I both experienced it independently, so it wasn't a one-time thing.
> There is surely a bug here.

Right, pretty clearly from the backtrace something did go wrong. It is exceptionally weird that this happened for both you and Kees at roughly the same time, yet I gather that neither of you has seen it since then, and it appears few others have seen it (neither Google nor Launchpad turns up other bugs with 'XisbRead' crashes). It makes me think there is something very specific that is done to produce this bug, that both of you did. Kees mentioned he'd been plugging/unplugging from projectors, which seems unlikely to cause a crash in the mouse code, but stranger things have been known to happen.

> Have you considered possible cases where the null pointer could be passed in via the synaptics driver?

Yes. Unfortunately the backtrace did not include symbols for the synaptics routines, so it takes a bit of detective work to guess at the codepath. But XisbRead() only gets called in a few places, and it is always passed the same parameter: a comm buffer. There are a few places where this is set to NULL: once at the beginning of driver initialization, and again in the code to turn the driver on and off. However, there are errors/warnings that would normally get printed in those conditions, which aren't appearing in your log. Very strange.

I wonder if perhaps there is a race condition where the suspend/resume occurred while the driver was in the middle of either the init, or in the middle of turning the device on/off. Some of the other warnings in the log indicate it was having trouble opening the device and retrying after a delay; maybe suspending while it was in the middle of a delay revealed the problem. I wish we had kees' Xorg.0.log for comparison.
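
To make that hypothesis concrete, here is a minimal sketch of the suspected ordering problem. It is not taken from the synaptics source: `DemoDevice`, `demo_device_on`, `demo_device_off` and `demo_wakeup_would_pass_null` are hypothetical names, and the model assumes only what is described above, namely that the comm buffer is NULL while the device is off and that the wakeup path acts on readiness recorded earlier.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's comm buffer. */
typedef struct { int unused; } DemoBuffer;

/* Hypothetical per-device state, reduced to the two fields that matter here. */
typedef struct {
    DemoBuffer *buffer;   /* NULL while the device is off */
    bool        fd_ready; /* readiness recorded by an earlier select() */
} DemoDevice;

static DemoBuffer demo_storage;

/* Turning the device on gives it a buffer. */
void demo_device_on(DemoDevice *d)  { d->buffer = &demo_storage; }

/* Turning it off drops the buffer but leaves fd_ready untouched (stale). */
void demo_device_off(DemoDevice *d) { d->buffer = NULL; }

/* True when the wakeup path would hand a NULL buffer to the read routine:
 * the fd was marked ready before the device was switched off. */
bool demo_wakeup_would_pass_null(const DemoDevice *d)
{
    return d->fd_ready && d->buffer == NULL;
}
```

If something like this is what happened, a device-off that lands between "fd marked ready" and "read handler runs" would produce exactly the b=0x0 seen in frame #0, without any of the usual warnings being logged.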

I'm tentatively going to drop the priority on this to Medium for now since while it's a bad issue, it seems to occur quite infrequently. Although if we see more reports of this problem I'll bump it back up to High.

Changed in xorg-server:
importance: High → Medium
Revision history for this message
Kees Cook (kees) wrote :

On Tue, Feb 17, 2009 at 09:26:33PM -0000, Bryce Harrington wrote:
> I wish we had kees' Xorg.0.log for comparison.

This is one of the down-sides of the dup-detector, as it encourages people
to not file a bug when one already exists for a given crash. I think it'd
be better for it to finish filing the bug, but then dup it (to gain the
backtraces and logs).

Revision history for this message
Matt Zimmerman (mdz) wrote :

On Tue, Feb 17, 2009 at 09:50:15PM -0000, Kees Cook wrote:
> On Tue, Feb 17, 2009 at 09:26:33PM -0000, Bryce Harrington wrote:
> > I wish we had kees' Xorg.0.log for comparison.
>
> This is one of the down-sides of the dup-detector, as it encourages people
> to not file a bug when one already exists for a given crash. I think it'd
> be better for it to finish filing the bug, but then dup it (to gain the
> backtraces and logs).

In this case, of course, you decided not to file the bug when I said that I
had already done so. :-)

The dupe detector does work as you describe. It sounds like maybe you're
referring to bugpatterns, which are generally only used to suppress invalid
reports or to block bugs which are getting reported too frequently.

--
 - mdz

Revision history for this message
Bryce Harrington (bryce) wrote :

Both this and bug 328035 are crashes within xf86Wakeup on the same hardware. The backtraces and symptoms are different, but it's entirely possible they share a similar root cause.

I've patched in some extra debug messages to xf86Wakeup, which might give some additional clues (it spews quite a bit to the log though). I stuck the package on my ppa:

  https://edge.launchpad.net/~bryceharrington/+archive/ppa

Revision history for this message
Bryce Harrington (bryce) wrote :

For reference, here's my log when running with this patched Xserver. The log grows at a pretty quick clip, so I wouldn't recommend running this version except when trying to deliberately reproduce the behavior.

Bryce Harrington (bryce)
description: updated
Revision history for this message
Bryce Harrington (bryce) wrote :

Upstream seems to be waiting for better steps to reproduce. But given their silence, I'm assuming kees and mdz are not seeing the crash again.

I've left my nullptr check patch aside up 'til now in the hope that further crashes would turn up so we could pinpoint the steps to reproduce. However, I feel it makes little sense to release Ubuntu without the check since we already know it can crash here even if infrequently. We can disable the patch during karmic development if we want to explore it further.

Changed in xorg-server (Ubuntu):
status: Incomplete → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package xorg-server - 2:1.6.0-0ubuntu3

---------------
xorg-server (2:1.6.0-0ubuntu3) jaunty; urgency=low

  * Add 165_man_xorg_conf_no_device_ident.patch:
    - Device identifier no longer necessary in Screen section of
      xorg.conf. Update man page accordingly.
      (LP: #261577)
  * Add 166_nullptr_xinerama_keyrepeat.patch:
    - Avoids null pointer dereference when holding down keys on
      non-primary screen when using TwinView / Xinerama on -nvidia.
      (LP: #324465)
  * Add 167_nullptr_xisbread.patch:
    - Avoids null pointer dereference in XisbRead to prevent a (difficult
      to reproduce) crash during or after a resume from RAM.
      (LP: #324368)

 -- Bryce Harrington <email address hidden> Thu, 19 Mar 2009 00:17:40 -0700

Changed in xorg-server:
status: Fix Committed → Fix Released
Revision history for this message
In , Brian M. Carlson (sandals) wrote :

Please note that Debian bugs 532375 and 541259 are also about this bug. As the submitter of one of those bugs, I was able to reproduce the problem so regularly that I had to uninstall the synaptics driver so that I didn't lose my session the majority of the times I resumed. I'm happy to provide more information or do more tests if that's needed.

Revision history for this message
In , Peter Hutterer (peter-hutterer) wrote :

Created an attachment (id=30903)
0001-eventcomm-don-t-use-the-Xisb-buffers-for-reading.patch

Janitor patch - don't use the Xisb buffers for eventcomm devices.

This doesn't resolve the actual problem but since the use of the Xisb buffers was a bit dubious anyway it should rid it of that problem. Please let me know whether this patch fixes the issue.
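
As a rough illustration of the approach (reading fixed-size events straight from the file descriptor instead of going through an intermediate byte buffer), consider the sketch below. It is not the attached patch: `DemoEvent` and `demo_read_event` are hypothetical, and the record layout is a stand-in for whatever event structure the device actually delivers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical fixed-size event record, used here only for illustration. */
typedef struct {
    uint16_t type;
    uint16_t code;
    int32_t  value;
} DemoEvent;

/* Reads one whole event directly from fd, with no intermediate buffer object.
 * Returns true on success; false on EOF, error, or a short read. */
bool demo_read_event(int fd, DemoEvent *ev)
{
    ssize_t n;

    do {
        n = read(fd, ev, sizeof(*ev));   /* retry if interrupted by a signal */
    } while (n < 0 && errno == EINTR);

    return n == (ssize_t)sizeof(*ev);
}
```

With this shape there is no separately allocated buffer pointer that can be NULL at read time; the only state the read path needs is the file descriptor itself.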

Revision history for this message
In , Brian M. Carlson (sandals) wrote :

The patch in comment #10 fixes the problem.

Revision history for this message
In , Peter Hutterer (peter-hutterer) wrote :

Pushed as commit 33413529dc35f0afc585d4297f86199393d19684. Thanks for testing!

Changed in xorg-server:
status: Confirmed → Fix Released
Changed in xorg-server:
importance: Unknown → High
Changed in xorg-server:
importance: High → Unknown
Changed in xorg-server:
importance: Unknown → High