gdm leaking filehandles, causing "too many open files"

Bug #1053217 reported by Stuart Longland
This bug affects 3 people
Affects: gdm (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (not set)

Bug Description

At a client's site (a large mining company here in Queensland) we have an Ubuntu 10.04 virtual machine running MacroView SCADA in a DRBD high availability cluster. Workstations connect to the server via XDMCP for control of the local plant.

In the past they ran two discrete Ubuntu servers, which used to run reliably for months at a time. Since the switch to the high availability cluster, they now find gdm will refuse to accept new logins after about 6 weeks of operation until the gdm service is restarted. The restart of gdm has the effect of booting off everyone currently logged in -- so for a few seconds, everyone loses control of the processing plant while people re-connect, every 6 weeks.

The logs show the following:
Sep 20 08:23:38 cgmv1 gdm-binary[31050]: CRITICAL: could not add display to access file: Too many open files
Sep 20 08:23:38 cgmv1 gdm-binary[31050]: WARNING: Unable to set up access control for display 1691
Sep 20 08:23:38 cgmv1 gdm-binary[31050]: WARNING: GdmDisplay: display lasted 0.010690 seconds
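
The "Too many open files" errors mean gdm-binary has hit its per-process file descriptor limit. A rough way to see how close a process is to that limit is sketched below. The function name fd_usage is made up for illustration, and the sketch assumes a Linux /proc layout where the limits file has a "Max open files" row with the soft limit in the fourth column.

```shell
#!/bin/sh
# fd_usage DIR: DIR is a /proc/<pid> directory.
# Prints the number of open descriptors and the soft "Max open files" limit.
fd_usage() {
    procdir=$1
    # Each entry under fd/ is one open descriptor.
    open=$(ls "$procdir/fd" | wc -l | tr -d ' ')
    # Row format: "Max open files  <soft>  <hard>  files"
    limit=$(awk '/^Max open files/ {print $4}' "$procdir/limits")
    echo "open=$open limit=$limit"
}

# Usage (as root, since gdm-binary runs as root):
#   fd_usage "/proc/$(pidof -s gdm-binary)"
```

If the open count creeps towards the limit between restarts, the leak is the likely cause of the refused logins.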

A search suggests this is the file-handle leak that was reported to Red Hat back in February 2010:

https://bugzilla.redhat.com/show_bug.cgi?id=562143

Comment #2 of that bug has a patch that allegedly fixes the problem. I have hand-applied that patch against the latest stable source package of gdm (2.30.2.is.2.30.0); the resulting patch is attached here and I am in the process of testing it.

Tags: patch
Stuart Longland (redhatter) wrote :

Okay, I can confirm this would appear to fix the problem.

Test procedure:

1. Log in remotely to the affected machine using ssh, run the following command:

   $ sudo watch ls -l /proc/$( pidof gdm-binary )/fd

2. Start a remote X session with that host (through Xnest, plain X, whatever)... observe that new files are opened.

3. Close the X session and observe that the files are not removed from the list. In my case, the output looked like this:

Vanilla Ubuntu gdm build.
After initial start-up.

total 0
lrwx------ 1 root root 64 2012-09-20 14:34 0 -> /dev/null
lrwx------ 1 root root 64 2012-09-20 14:34 1 -> /dev/null
lrwx------ 1 root root 64 2012-09-20 14:34 2 -> /dev/null
lrwx------ 1 root root 64 2012-09-20 14:34 3 -> socket:[78942]
lr-x------ 1 root root 64 2012-09-20 14:34 4 -> pipe:[78945]
l-wx------ 1 root root 64 2012-09-20 14:34 5 -> pipe:[78945]
lr-x------ 1 root root 64 2012-09-20 14:34 6 -> inotify
lr-x------ 1 root root 64 2012-09-20 14:34 7 -> pipe:[78952]
l-wx------ 1 root root 64 2012-09-20 14:34 8 -> pipe:[78952]
lrwx------ 1 root root 64 2012-09-20 14:34 9 -> /var/run/gdm/auth-for-gdm-omI8J6/database
lr-x------ 1 root root 64 2012-09-20 14:34 10 -> pipe:[78971]
l-wx------ 1 root root 64 2012-09-20 14:34 11 -> pipe:[78971]
lr-x------ 1 root root 64 2012-09-20 14:34 12 -> pipe:[78937]
lrwx------ 1 root root 64 2012-09-20 14:34 13 -> socket:[78973]
lrwx------ 1 root root 64 2012-09-20 14:34 14 -> socket:[78974]

After a few logins...

total 0
lrwx------ 1 root root 64 2012-09-20 14:38 0 -> /dev/null
lrwx------ 1 root root 64 2012-09-20 14:38 1 -> /dev/null
lrwx------ 1 root root 64 2012-09-20 14:38 2 -> /dev/null
lrwx------ 1 root root 64 2012-09-20 14:38 3 -> socket:[78942]
lr-x------ 1 root root 64 2012-09-20 14:38 4 -> pipe:[78945]
l-wx------ 1 root root 64 2012-09-20 14:38 5 -> pipe:[78945]
lr-x------ 1 root root 64 2012-09-20 14:38 6 -> inotify
lr-x------ 1 root root 64 2012-09-20 14:38 7 -> pipe:[78952]
l-wx------ 1 root root 64 2012-09-20 14:38 8 -> pipe:[78952]
lrwx------ 1 root root 64 2012-09-20 14:38 9 -> /var/run/gdm/auth-for-gdm-omI8J6/database
lr-x------ 1 root root 64 2012-09-20 14:38 10 -> pipe:[78971]
l-wx------ 1 root root 64 2012-09-20 14:38 11 -> pipe:[78971]
lr-x------ 1 root root 64 2012-09-20 14:38 12 -> pipe:[78937]
lrwx------ 1 root root 64 2012-09-20 14:38 13 -> socket:[78973]
lrwx------ 1 root root 64 2012-09-20 14:38 14 -> socket:[78974]
lrwx------ 1 root root 64 2012-09-20 14:38 15 -> /var/run/gdm/auth-for-gdm-bqKAEb/database
lrwx------ 1 root root 64 2012-09-20 14:38 16 -> /var/run/gdm/auth-for-gdm-QC37Jk/database
lrwx------ 1 root root 64 2012-09-20 14:38 17 -> /var/run/gdm/auth-for-vrtadmin-4oPw6L/database
lrwx------ 1 root root 64 2012-09-20 14:38 18 -> /var/run/gdm/auth-for-gdm-ePg83T/database
lrwx------ 1 root root 64 2012-09-20 14:38 19 -> /var/run/gdm/auth-for-vrtadmin-YMcsem/database
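
The leaked entries are the per-display auth databases, so the growth is easier to track as a single number. A minimal sketch, assuming the leaked descriptors always point at /var/run/gdm/auth-for-*/database as in the listings above; the function name count_auth_fds is hypothetical. Fed the two listings above, it would report 1 after start-up and 6 after the logins.

```shell
#!/bin/sh
# count_auth_fds: read `ls -l /proc/<pid>/fd` output on stdin and print how
# many descriptors point at a gdm auth database.
count_auth_fds() {
    # grep -c prints 0 but exits non-zero when nothing matches; `|| true`
    # keeps the function usable under `set -e`.
    grep -c -e '-> /var/run/gdm/auth-for-.*/database$' || true
}

# Usage (as root), repeated after each remote login/logout cycle:
#   ls -l "/proc/$(pidof -s gdm-binary)/fd" | count_auth_fds
```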

Now build gdm with the patch attached here, restart gdm, and repeat the same procedure. In my case, I see:

Patched gdm build.
After initial start-up.

total 0
lrwx------ 1 root root 64 2012-09-20 14:49 0 -> /dev/null
lrwx------ 1 root root 64 2012-09-20 14:49 1 -> /dev/null
lrwx------ 1 root root 64 2012-09...

Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "The aforementioned patch for gdm" of this bug report has been identified as being a patch. The ubuntu-reviewers team has been subscribed to the bug report so that they can review the patch. In the event that this is in fact not a patch, you can resolve this situation by removing the tag 'patch' from the bug report and editing the attachment so that it is not flagged as a patch. Additionally, if you are a member of the ubuntu-reviewers team, please also unsubscribe the team from this bug report.

[This is an automated message performed by a Launchpad user owned by Brian Murray. Please contact him regarding any issues with the action taken in this bug report.]

tags: added: patch
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in gdm (Ubuntu):
status: New → Confirmed
Stuart Longland (redhatter) wrote :

Hi,

Is there any possibility of this patch being reviewed and included in the Ubuntu 10.04 series? The customer at this site rang today complaining that they couldn't log in. It turns out something reverted our patched gdm package back to the original Ubuntu one.

Think: big industrial system, and when this problem occurs we have to boot everyone off to restart gdm. Everything was fine until the old package was reinstated and the service restarted, which brought the old problem back. This is our problem, but you could help us a little if you put out an updated gdm binary.

Or, as a workaround: maybe tell us a way we can stop the update manager from replacing the package we have now.
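
One common approach to that kind of workaround is to put the package on hold with dpkg, sketched below. This assumes the reversion came through a normal package upgrade and that the installed package is simply named gdm; the helper name hold_selection is made up for illustration, and whether a hold would have prevented this particular reversion is not confirmed here.

```shell
#!/bin/sh
# hold_selection PKG: emit a dpkg selection line that marks PKG as held.
hold_selection() {
    printf '%s hold\n' "$1"
}

# Usage (as root), then check the recorded state:
#   hold_selection gdm | dpkg --set-selections
#   dpkg --get-selections gdm
```

A held package is normally skipped by routine upgrades until the hold is cleared by setting its selection back to "install".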

The original patch is linked in the original post, my revised patch is supplied, and I've given detail on how to reproduce the problem and how to check if the problem is fixed. Is there anything else I need to supply in order to get this bug resolved?
