Long delay before new/compose window appears

Bug #386199 reported by Craig Ringer
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Evolution
Invalid
Undecided
Unassigned
GtkHTML
Invalid
Undecided
Unassigned
gtk+2.0 (Ubuntu)
Fix Released
Low
Ubuntu Desktop Bugs

Bug Description

On my remote X11 clients, Evolution takes 20 - 30 seconds to display the compose / reply window. This happens on all clients, including a PXE-booted KVM virtual machine and a Xephyr X server. The problem occurs whether the X11 connection is ssh-tunneled or is a direct connection in the clear (using XDMCP).

Performance is otherwise normal, in Evolution and elsewhere.

Tracker has been removed. libtrackerclient is still installed as it's required by Totem, but Evolution does not link to it according to `ldd'.

IPv6 has been disabled, as previous bug reports have indicated that IPv6 may be the issue. `ip -6 r s' shows no routes, there are no inet6 addresses in `ip addr show', and the disable_ipv6 flag has been set in sysctl for all system interfaces.

Probably related bugs: bug #366101, bug #159153 . I've filed this separately because of the involvement of networked sessions. (though, as I haven't been able to test a physical console yet, there's no guarantee it's network only).

I'll attach a trace of the X11 traffic taken while Evolution is opening the compose window after clicking on the "new" button in the toolbar.

System details:
--------------------
$ lsb_release -rd
Description: Ubuntu 9.04
Release: 9.04

evolution:
  Installed: 2.26.1-0ubuntu2
tracker:
  Installed: (none)
evolution-data-server:
  Installed: 2.26.1-0ubuntu2
--------------------

Related branches

Revision history for this message
Craig Ringer (ringerc) wrote :
Revision history for this message
Craig Ringer (ringerc) wrote :

To examine the libpcap trace, open it with Wireshark, right-click on the first packet, and choose "Decode as". In the window that appears select the "Transport" tab, then the "X11" protocol in the list. Hit apply, then close.

You'll now see the decoded X11 traffic. (Note that this traffic is for the WHOLE SERVER, not just Evolution).

It's also informative to use the I/O graph. From the main menu, hit Statistics->I/O graphs. The first hump is moving the pointer to the "new" button and clicking on it, then moving the pointer out of the Xephyr window to wait for the compose window to appear. There's then a long period of steady low-rate traffic corresponding to the period when Evolution is non-responsive (from around seconds 6-18 in this particular trace), followed by a spike as it sets up and draws the window.

(I've attached a PNG of the graph, but if you actually fire up Wireshark and generate it yourself you'll be able to click on parts of it to move the main window's selection to that part of the timeline.)

The long flat period, when examined, appears to be a continuous loop of the following X11 operations:

(C: X11 client; S: X11 server):

C->S: Requests: Grabserver, QueryPointer
S->C: Reply: QueryPointer
C->S: Requests: QueryPointer
S->C: Reply: QueryPointer
C->S: UngrabServer

There are over 400 such repetitions in the particular sample I'm examining right now. Most are looping one after the other, but some are intermixed with other operations before or after the long slow period. However, at this point I'm not 100% sure they're even from Evolution.

Revision history for this message
Craig Ringer (ringerc) wrote :
Revision history for this message
Craig Ringer (ringerc) wrote :
Revision history for this message
Craig Ringer (ringerc) wrote :

I've attached a better trace now. It contains a full Evolution session from start to finish, and excludes X11 traffic from the rest of the desktop. I used `netstat' to see which clients were connected to the Xephyr server before starting evo, fired up a tcpdump, started evo, did the testing, used 'netstat' to see what clients were connected now, and exited evo. I then diffed the two netstat results, found the new client source ports, and filtered the dump to include only those (sport: 37625, 37627).

The compose window display took 17 seconds from clicking on the "new" button to the finish of the window drawing. The time range was 11:37:30-> 11:37:47

When filtered for evo traffic only things look a fair bit different; see the I/O graph I'll attach shortly. There are several distinct phases apparent in window creation, each of which take several seconds. I'm not sure what's going on yet, and it's far from clear that network round trips / throughput are even the issue.

Revision history for this message
Craig Ringer (ringerc) wrote :

In the trace attached to the last comment, the new button click is packet 3664 at 11:37:30.98 . The window appears, as noted, around 11:37:47.

The attached graph and the one in the next comment show I/O in packets/second and bytes/second during this period.

Revision history for this message
Craig Ringer (ringerc) wrote :
Revision history for this message
Craig Ringer (ringerc) wrote :

As you can see the packets per second graph is rather more interesting.

When we look at what the traffic actually is (note: using "view -> time display format -> time of day" is useful), it's possible to see the same sequence of packets repeated during the long delay:

(C: client; S: server)

C -> S: Req: TranslateCoordinates
S -> C: Rep: TranslateCoordinates
C -> S: Req: GrabServer, QueryPointer
S -> C: Rep: QueryPointer
(* now, repeated seven times *):
* C -> S: Req: QueryPointer
* S -> C: Rep: QueryPointer
(* end repeated segment *)
C -> S: Req: UngrabServer

The whole segment above repeats hundreds of times during the long delay before the compose window appears.

I've attached a pcap trace containing one of these repeated segments so you can examine it easily without having to dig through the whole trace. Also attached is a text version.

Revision history for this message
Craig Ringer (ringerc) wrote :
Revision history for this message
Craig Ringer (ringerc) wrote :

Forgot to mention:

libx11-6
2:1.1.99.2-1ubuntu2

libgtk2.0-0
2.16.1-0ubuntu2

libglib2.0-0
2.20.1-0ubuntu2

running kernel: 2.6.28-11-server

Revision history for this message
Craig Ringer (ringerc) wrote :

A sequence of backtraces taken across all threads while waiting for the compose window to appear.

I can do more specific, or controlled, debugging if someone can offer a suggestion as to where to start. Right now, I don't know evo's guts well enough to have much idea, though I haven't yet dug through the traces in detail.

See the comments inserted in "backtrace.txt" for info on what was happening at the time of each backtrace.

Revision history for this message
Sebastien Bacher (seb128) wrote :

Thank you for your bug report. The issue is an upstream one and it would be nice if somebody having it could send the bug the to the people writting the software (https://wiki.ubuntu.com/Bugs/Upstream/GNOME)

Usually such bugs are due to tracker or non local or broken bookmarks but the stacktrace doesn't seem to indicate that there

Changed in evolution (Ubuntu):
assignee: nobody → Ubuntu Desktop Bugs (desktop-bugs)
importance: Undecided → Low
Changed in evolution:
status: Unknown → New
Revision history for this message
Craig Ringer (ringerc) wrote :

Cloning my comment from the GNOME bug, since I've identified pretty much exactly what factor controls whether or not this issue happens:

----
If you enable your local X server to listen on TCP/IP (on most GNOME desktops, edit /etc/gdm/gdm.conf (may be /etc/X11/gdm/gdm.conf depending on distro/version) and change "DisallowTCP=true" to "DisallowTCP=false" then restart GDM, you can reproduce this running directly on your normal X server.

If you run with DISPLAY=:0.0 (assuming you're on DISPLAY 0) the compose window appears in about a second. If you run with DISPLAY=127.0.0.1:0.0 it takes 15 or more seconds to appear.

Here's an interesting example. I have two terminals, with prompts "T1$ " and "T2$ ". In T1 I'll launch Evolution, talking on a UNIX domain socket. On T2 I'll then time how long it takes to open a compose window by using the command "time evolution mailto:<email address hidden>". The remote invocation only terminates when the mail window has displayed, so we can time how long it takes to display quite accurately. Once I've timed that, I'll exit evolution on T1, reopen it with a TCP/IP connection, and repeat the test on T2. Observe the difference:

T1$ DISPLAY=:0.0 evolution
** (evolution:26183): DEBUG: EI: SHELL STARTUP
** (evolution:26183): DEBUG: EI: mail_read_notify
** (evolution:26183): DEBUG: MAIL SERVER: Count changed: 0
T2$ time evolution mailto:<email address hidden>
real 0m0.830s
user 0m0.160s
sys 0m0.030s
T1$ DISPLAY=127.0.0.1:0 evolution
** (evolution:26236): DEBUG: EI: SHELL STARTUP
** (evolution:26236): DEBUG: EI: mail_read_notify
** (evolution:26236): DEBUG: MAIL SERVER: Count changed: 0
T2$ time evolution mailto:<email address hidden>
real 0m16.742s
user 0m0.180s
sys 0m0.030s

0.8 seconds vs 16.7 seconds. The only difference is TCP/IP vs UNIX domain socket comms.

Changed in gtkhtml:
status: Unknown → New
Revision history for this message
Craig Ringer (ringerc) wrote :
Revision history for this message
Craig Ringer (ringerc) wrote :

It's a gtkhtml bug, or possibly a libgtk+ bug.

A workaround is to patch libgtkhtml and rebuild it. Instructions:

$ sudo apt-get install fakeroot build-essential wget
$ sudo apt-get build-dep gtkhtml3.14
$ mkdir $HOME/gtkhtml
$ cd $HOME/gtkhtml
$ apt-get source gtkhtml3.14
$ wget http://launchpadlibrarian.net/27859768/bug585626.diff
$ patch -p0 < bug585626.diff
$ ( cd gtkhtml3.14-* && fakeroot debian/rules binary )
$ sudo dpkg -i *.deb

Your changes WILL BE AUTOMATICALLY REPLACED next time gtkhtml is updated, so this is only a quick-and-dirty fix. (Yes, you can use pinning or the like to retain them, but if you need to know how to unpin later, remember to , etc, or things may go pear-shaped, so I won't explain that here).

Changed in evolution (Ubuntu):
status: New → Triaged
Revision history for this message
Craig Ringer (ringerc) wrote :

Matthias Clasen from Red Hat tackled the underlying gtk+ bug ( http://bugzilla.gnome.org/show_bug.cgi?id=585626 ) and has committed a patch to gtk+ head that dramatically improves things.

See the gtk+ bug I've linked to for his patch, and some instructions I've written on easily applying the fix to your Ubuntu system without having to change anything else if you suffer from the same issue.

Changed in evolution (Ubuntu):
status: Triaged → In Progress
Changed in gtkhtml:
importance: Unknown → Undecided
status: New → Invalid
Changed in evolution:
importance: Unknown → Undecided
status: New → Invalid
affects: evolution (Ubuntu) → gtk+2.0 (Ubuntu)
Changed in gtk+2.0 (Ubuntu):
status: In Progress → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package gtk+2.0 - 2.17.3-0ubuntu1

---------------
gtk+2.0 (2.17.3-0ubuntu1) karmic; urgency=low

  * New upstream version:
    GtkFileChooser:
    - Shows the size column by default now
    GtkStatusIcon:
    - Has a title property, which can be used by ATs when they read status icons
    GtkInfoBar:
    - The default theme now includes color definitions for infobars
    - The ::use-tooltip-style style property has been removed
    GtkMountOperation now supports interaction during unmount operations.
    The client-side windows branch has been merged; GDK now maintains
    its own window hierarchy client-side, and only uses X windows where
    unavoidable. Some of the benefits of this change are
    - Reduced flicker
    - The ability to do transformed and animated rendering of widgets
    - Easier embedding of GTK+ widgets e.g. into Clutter scene graphs
      This is a fundamental change to the way GDK works,
      so watch out for regressions.
    Bugs fixed:
    586315 Crash in GTK+ 2.14 when calling gtk.FileChooser.list_shor...
    461944 pressing the volume icon in full screen shuts down the sound
    (lp: #137041)
    490724 iconview item's height would be changed event with same m...
    564063 regression: Left margin in popup menus
    582025 Accelerators fail for submenus
    585626 Setting widget tooltip hammers X11 server on any TCP/IP X...
    (lp: #386199)
    585802 Add API to make GtkStatusIcon accessible with a name
    585858 right-click Add-to-Booksmarks is sometimes greyed out
    586330 GtkButton ignores user_underline when an image is set
    318807 Offscreen windows and window redirection
    587716 GtkInfoBar broken on resize
    587485 GMountOperation::show-processes support
    Updated translations
  * debian/control.in:
    - updated libglib requirement
  * debian/patches/090_directfb_csw_build.patch:
    - new patch from Cody Russel to fix the directfb build
  * debian/patches/091_directfb_build_workaround.patch:
    - workaround build issue
  * debian/rules:
    - updated shlibs version

 -- Sebastien Bacher <email address hidden> Thu, 09 Jul 2009 17:15:04 +0200

Changed in gtk+2.0 (Ubuntu):
status: Fix Committed → Fix Released
Changed in libgtk:
status: Unknown → Fix Released
Changed in libgtk:
importance: Unknown → Medium
To post a comment you must log in.